Abstract
This paper describes a new method for quantifying the similarity of the lexical distribution of phonemes in different varieties of a language (in this case English). In addition to introducing the method, it discusses phonological problems which must be addressed if any comparison of this sort is to be attempted, and applies the method to a limited data set of varieties of English. Since the method assesses their structural similarity, it will be useful for analysing the historical development of varieties of English and the relationships (either as a result of common origin or of contact) that hold between them.
INTRODUCTION
In recent years considerable progress has been made in assessing the relationships between linguistic varieties by measuring the similarity between strictly comparable sets of phonetic data. In particular, measurement of Levenshtein Distance (see, for example, Nerbonne, Heeringa, and Kleiweg, 1999; Nerbonne and Heeringa, 2001; Heeringa, 2004) has proved useful for determining the relationships between closely related varieties. The ‘Sound Comparisons’ method for assessing the distance between varieties provides a very promising alternative technique for investigating the changing relationships between both closely related and less closely related varieties (Heggarty, McMahon and McMahon, 2005; McMahon, Heggarty, McMahon and Maguire, 2007).
Phonetic comparison algorithms of this sort are not, however, without their problems. Firstly, they often depend upon auditory phonetic transcriptions of one degree of fineness or another, with all the associated issues of transcriber isoglosses, inaccuracies and realism that this method brings (see Milroy and Gordon, 2003: 144–152 for a discussion of the issues).