1. Introduction
Heritage speakers are children and adults who belong to a linguistic minority and have been exposed to their home language as well as the majority language that is spoken in their country of residence. Heritage speakers are a type of early bilingual, and their proficiency of the heritage language can vary (Mailhammer & Zeidan Reference Mailhammer and Ronia2019). They may be fully proficient in the home language or they may achieve partial command of it while having a strong command of the majority language; the latter may occur if they do not have access to education in their home language (Montrul Reference Montrul2010).
In many minority language communities, where speakers also use the majority language, the whole community is bilingual. Community bilingualism can result in language transfer (e.g., Flege & Eefting Reference Flege and Eefting1987) and the emergence of a contact variety (Mayr & Siddika Reference Mayr and Siddika2018; McCarthy et al. Reference McCarthy, Evans and Mahon2013; Nagy Reference Nagy2015; Treffers-Daller & Mougeon Reference Treffers-Daller and Mougeon2005). Western Armenian is a heritage language with communities of speakers in a number of countries, including Lebanon and the US (e.g., Bolsajian Reference Bolsajian2018). The goals of the current study are to examine the phonetic realization of the stop voicing contrast in this language, to investigate whether it is as has been described in the phonological literature, and also to determine whether the same patterns are found in both populations. This research provides a phonetic description of the voicing contrast in Western Armenian, as well as insight into whether transfer from the majority language has occurred among speakers.
1.1 Multilingualism and transfer
Multilingual speakers may show phonetic transfer effects in their languages, meaning that one language can influence the other (Flege & Eefting Reference Flege and Eefting1987; Lein et al. Reference Lein, Kupisch and van de Weijer2016; Mayr & Siddika Reference Mayr and Siddika2018; McCarthy et al. Reference McCarthy, Evans and Mahon2013; Nagy Reference Nagy2015; Treffers-Daller & Mougeon Reference Treffers-Daller and Mougeon2005), and this can happen in either direction (Major Reference Major1992; Mennen Reference Mennen2004). Jarvis & Pavlenko (Reference Jarvis and Pavlenko2009) categorize transfer into three types: forward transfer (L1 to L2), reverse transfer (L2 to L1) and lateral transfer (L2 to L3). Some multilingual speakers have been found to have productions that are intermediate between their two languages (Flege & Eefting Reference Flege and Eefting1987; Flege Reference Flege1995).
In terms of models of L2 learning, Flege (Reference Flege1995) suggests that in L2 production, phonemes are classified as new or similar through the L1 phonological system. This model proposed that new L2 phonemes that are distinct from those in L1 are classified in separate categories. In the case of L2 phonemes that are similar to their L1 counterparts, a process called equivalence classification might take place, whereby L2 speakers classify these phonemes as phonetic realizations of an L1 category. This mainly occurs in the early stages of second language acquisition, but as learners are more exposed to the L2, they are able to create a new category for the L2 sound and more accurately produce it (Flege Reference Flege1995; Flege et al. Reference Flege, Schirru and MacKay2003). The Perceptual Assimilation Model (PAM) (Best Reference Best1994), which was extended to PAM-L2 to include issues related to phonological encoding and phonetic realization, suggests that L2 phonemes are perceived based on their similarity to and difference from L1 phonemes. Accordingly, L2 phonemes are assimilated to the L1 phonological space in three ways: one L2 sound to one L1 category, two L2 sounds to one L1 category, and no L1-L2 category assimilation (Best & Tyler Reference Best and Tyler2007).
L1 to L2 transfer has been documented in a number of phonetic studies. The phonetic implementation of the stop voicing contrast in Arabic (/t-d/ and /k-ɡ/) produced by native speakers of Saudi Arabic was compared with the stop voicing contrast in English (/p-b/, /t-d/, and /k-ɡ/) produced by those speakers as well as native speakers of American English (Flege & Port Reference Flege and Port1981). The results of their study demonstrate transfer from L1 to L2 since the Arabic speakers were found to produce the English contrast with values similar to how they produce it in Arabic. In a similar L1 to L2 phonetic transfer study, Cox & Palethorpe (Reference Cox and Palethorpe2005) investigated the variety of English spoken by L1 speakers of Lebanese Arabic who were born in Australia. This Lebanese Australian English was found to share characteristics of both Lebanese Arabic and Standard Australian English; while the vowel spectral features and temporal patterns of the rime were the same as Australian English, there were final voicing and vowel duration effects that were the same as Lebanese Arabic. As explained by Cox & Palethorpe, this phenomenon can be referred to as ‘stabilized transference’, which is the creation of a new dialect based on transfer of characteristics from the ‘substratum’ language (Cox & Palethorpe Reference Cox and Palethorpe2005). Sometimes, stabilized transference can result in a contact variety, influenced by both languages. One example of this is Welsh English, which was found to have some correlates of lexical stress that were in line with Southern Standard British English and some that were in line with Welsh, and some that were intermediate between the two languages (Mennen et al. Reference Mennen, Kelly, Mayr and Morris2020).
Cross-linguistic transfer might also occur when code-switching. The effect of spontaneous code-switching on voice onset time (VOT; see section 1.3) values of Spanish and English in the speech of New Mexican Spanish-English bilinguals was investigated by Balukas & Koops (Reference Balukas and Koops2015). The results showed more Spanish-like VOT values when the speakers code-switched to English. On the other hand, no effect of code-switching was detected in Spanish. This pattern was also evident in the similarity of VOT values of Spanish spoken by the bilingual participants with the VOT values of non-contact Spanish. Interestingly, the English spoken by New Mexican Spanish-English bilinguals displayed the effects of language contact since the VOT values were more in line with those of Spanish, while the VOT values of Spanish remained unaffected by the patterns of English.
Bilingual proficiency plays a major role in transfer. Some research has found that language-specific phonetic realizations can be learned for the same phoneme by highly proficient bilinguals without transfer effects (Chen & Mok Reference Chen and Mok2019). That study examined the acoustic and articulatory characteristics of English and Mandarin /ɹ/ as produced by highly proficient Mandarin-English bilinguals. Even though the Mandarin /ɹ/ is phonetically different from the English /ɹ/, speakers were able to produce the English sound with no transfer from Mandarin, their L1. In accordance with PAM-L2 (Best & Tyler Reference Best and Tyler2007), the Mandarin-English bilinguals were able to assimilate an L2 category to an L1 category where the two share phonetic details, while still differentiating the two sounds. However, a study of simultaneous bilinguals produced different findings. Sundara et al. (Reference Sundara, Polka and Baum2006) focused on the phonetics of /d/ and /t/ production by Canadian French–English simultaneous bilinguals, as well as monolinguals. The researchers investigated differences in VOT, vowel formants, relative burst intensity and spectral measures. Participants read carrier sentences that included disyllabic real words with /d/ and /t/ in word-initial position. For the simultaneous bilinguals, the VOT values for French /t/ and /d/ and English /t/ were similar to those of monolingual speakers of each language; however, the VOT values for English /d/ were different from those of monolingual English speakers, showing a more French-like pattern, that is, voicing during the closure.
In L2 acquisition and production, reverse transfer may also occur, where the heritage language (L1) is influenced by the majority language (L2). This may be attributed to the process of equivalence classification (Flege Reference Flege1987; Flege & Eefting Reference Flege and Eefting1987; Flege Reference Flege1995), which causes sounds in the L1 to be influenced by sounds in the L2. Flege (Reference Flege1987) examined the production of the /t/ sound of L1 English speakers with L2 French living in Paris, L1 French speakers with L2 English living in Chicago, as well as monolingual speakers of each language. English /t/ produced by L1 English speakers with proficient French was influenced by French (the L2) and, accordingly, produced with shorter VOT than the English monolinguals. Similarly, the L1 French speakers with a high level of English produced French /t/ with longer VOT (more English-like) than the French monolinguals.
Heselwood & McChrystal (Reference Heselwood and McChrystal1999) found that in the UK Bradford Panjabi community, speakers who acquired their heritage language in the UK and those who acquired it in their home country produced voiceless aspirated Panjabi stops in the same way. However, this was not the case for the voiced Panjabi stop, where the speakers who were born in the UK used a pattern similar to that of British English. This change is due to the influence of the phonetics of British English stops, and may also be an indicator of language dominance, since those who were born in Pakistan showed less transfer from English (Heselwood & McChrystal Reference Heselwood and McChrystal1999). Similarly, in a study of stops produced by speakers of Sylheti living in the UK, McCarthy et al. (Reference McCarthy, Evans and Mahon2013) found that speakers who were born in the UK or arrived early had productions in English that were similar to monolingual speakers of British English. Speakers of Sylheti who arrived in the UK when they were older tended to produce English stops with the Sylheti production pattern, showing L1 to L2 transfer. In an investigation of VOT among speakers of Welsh in Patagonia, Argentina, Sleeper (Reference Sleeper2020) found that younger speakers had shorter VOT in voiceless stops, indicating a transfer effect from Spanish. The values found among these speakers were also shorter than those found among speakers of a similar age in Wales, where voiceless stops still have long VOT. In another study, the VOT values of Japanese and English produced by early Japanese-English bilinguals were compared to those of monolingual speakers of both languages (Harada Reference Harada2003). Japanese was the speakers’ heritage language and their L1, while English was their L2. The findings showed that the speakers were able to distinguish the VOT pattern between English and Japanese through creating a new category for English VOT. However, this was not the case for Japanese VOT values, which were longer (more English-like) than those of Japanese monolinguals, thereby reflecting a case of transfer from L2 to L1.
The VOT patterns across three generations of heritage speakers of Italian, Russian and Ukrainian in Toronto, Canada were examined in a study by Nagy (Reference Nagy2015). For heritage speakers of Italian, the results showed that VOT patterns were the same across the three generations. However, heritage speakers of Russian and Ukrainian differed from non-heritage speakers. For first and second generations of Russian speakers, the VOT values were similar to Homeland Russian, but for third generation speakers, the VOT patterns were drifting towards the range for Canadian English. As for Ukrainian speakers, the findings indicated that longer VOTs were exhibited across the generations, but the values did not reach the range for English. To investigate if these results are due to incomplete acquisition or attrition, Nagy (Reference Nagy2015) tested for correlations between ethnic orientation and VOT values. There were no significant correlations between the results from the Ethnic Orientation Questionnaire and VOT. Accordingly, Nagy (Reference Nagy2015) concluded that speakers who do not use their heritage language or have negative attitudes towards it do not necessarily exhibit less Homeland-like patterns; hence, the drift in VOT values was neither attributed to influence of the dominant language nor to incomplete acquisition of Homeland patterns.
Cases of reverse transfer in bilingual speakers are not only restricted to stops. One study explored the realization of post-vocalic /ɹ/ in varieties of English and German spoken by L1 German speakers (Ulbrich & Ordin Reference Ulbrich and Mikhail2014). The study focused on the distinction in the implementation of post-vocalic /ɹ/ by speakers of a non-rhotic variety of German spoken in Berlin and rhotic and non-rhotic varieties of English used in Belfast and Oxford. The results showed that being exposed to a rhotic variety of English - the speakers’ L2 - led to the realization of a post-vocalic /ɹ/ in the non-rhotic variety of German, the L1.
Aside from phonetic and phonological factors, sociolinguistic factors also play a role in the phonetic realization of a sound in bilingual speakers. Studies have shown that among bilingual speakers, the dominance of one language over the other can be in part determined by the status of the language and its recognition as an official language in the context concerned. Kaland et al. (Reference Kaland, Galatà, Spreafico and Vietti2017) examined reading recordings of bilinguals from South Tyrol in Italy, who speak both Tyrolean and Italian. Tyrolean-dominant speakers were found to have adopted variants of the /ɹ/ sound that are used in Italian more than Italian-dominant speakers adopted variants of /ɹ/ that are common in Tyrolean. This is because Italian has a more dominant status in public life. In the case of heritage language speakers, it is the host language that is the dominant language in public life and the official language in the country; however, this does not deny the effect of the speakers’ attitude towards their heritage language. In accordance with this, attitudes towards languages may also determine the influence of one language over the other in relation to transfer. Law (Reference Law2017) showed that Cantonese-English bilinguals living in Hong Kong who had stronger attitudes towards their L1 (Cantonese) resisted the influence of their L2 on their L1 both at segmental and suprasegmental levels. Identity may also affect the language use of heritage speakers. Previously conducted sociolinguistic studies have argued that identity and social networks influence the realization of phonetic variants (Alam & Stuart-Smith Reference Alam and Stuart-Smith2011; Samant Reference Samant2010). This might be in cases where the speakers have more contacts within their heritage community and identify with it more than the host community.
1.2 The Western Armenian linguistic situation
Armenian is a distinct branch in the Indo-European language family, spoken by six million people. The two main varieties of this language are Western Armenian (WA) and Eastern Armenian (EA). WA is based on the dialect spoken in modern-day Istanbul while EA developed from the one spoken in the Ararat Valley and Yerevan. EA is the official language in present-day Armenia and is also spoken by Armenians living in Iran, India, and the former Soviet Union (Sakayan Reference Sakayan2007). WA is spoken across the Middle East as well as in Europe, South America and the US (Vaux Reference Vaux1998; Godson Reference Godson2004). Armenians have been in Lebanon for over two centuries. The most recent group arrived during World War I, and were refugees from Western Armenia. This has resulted in a substantial Armenian community in Lebanon, estimated at 156,000 people (Lebanon Overview 2008). As for Armenian Americans, Armenians immigrated to the United States in three groups. The first group arrived during the 20th century and included those who escaped the Ottoman Empire, specifically the 1915 Genocide. The second group included Armenians who escaped the civil war in Lebanon (1975-1990) as well as the Revolution in Iran (1978-1979). The third group arrived in the 1980s and included Armenians from the Soviet Union; this influx of Armenians continued even after the collapse of the Soviet Union, where many left the newly independent Republic of Armenia. Accordingly, while WA was originally the predominant variety of Armenian Americans, following the inflow from Iran and the Republic of Armenia, the number of EA speakers considerably increased, specifically in Southern California (Chahinian & Bakalian Reference Chahinian and Bakalian2016). Today, the number of Armenians in the US is speculated to be as many as 1,500,000 with California being home to the largest Armenian population in the country (Bolsajian Reference Bolsajian2018).
In relation to WA and language transfer, Godson (Reference Godson2004) conducted a study on WA heritage speakers in the US, examining the effects of L2 English on the L1 (WA), particularly in vowel production. It was found that the age at which the speakers were exposed to English had a significant effect on the production of WA vowels. The participants were of two types: those who learned English before the age of eight and those who learned it as adults. The results showed that even though English had a stronger influence on the WA vowels of speakers who learned it before age eight, it still had an effect on the vowels of those who learned it in adulthood. However, this effect was evident only in the WA vowels that are already acoustically close to English, that is, the front vowels /i/, /ε/ and /a/, where they are similar to English in the case of the speakers who were exposed to it as children.
1.3 Stop voicing
Stops, or plosives, are made up of a closure and release, which is sometimes followed by aspiration. Voice onset time (VOT) refers to the time lapse that occurs between the release of closure and the onset of vocal fold vibration (Lisker & Abramson Reference Lisker and Abramson1964). Lisker & Abramson (Reference Lisker and Abramson1964) studied the voicing patterns of 11 languages, noting that the timing of glottal pulsing relative to supraglottal articulation determines consonantal characteristics such as voicing and aspiration, as well as features related to ‘force of articulation’. Voiced stops are sometimes described as ‘pre-voiced’, that is, produced with vibration of the vocal folds beginning during the closure, also known as negative VOT or ‘voicing lead’. Voiceless stops have no voicing during the closure; whether voicing begins immediately after the release or after a period of aspiration is the difference between voiceless unaspirated (zero or near-zero VOT, ‘short lag’) and voiceless aspirated (positive VOT, ‘long lag’) stops (Lisker & Abramson Reference Lisker and Abramson1964; Lisker & Abramson Reference Lisker and Abramson1970). Ladefoged & Maddieson (Reference Ladefoged and Ian1996, 70) define aspiration as ‘a period after the release of a stricture and before the start of regular voicing (or the start of another segment, or the completion of an utterance) in which the vocal folds are markedly further apart than they are in modally voiced sounds’, resulting in an expulsion of air. Cross-linguistic research has found that some languages have a voicing contrast between voiced stops and voiceless unaspirated stops, such as Portuguese (e.g., /d/ vs /t/), and other languages have a contrast between voiceless aspirated and voiceless unaspirated stops, such as English (e.g., /tʰ/ vs /t/), while some languages have a three-way contrast, such as Thai (Cho & Docherty Reference Cho, Whalen and Docherty2019). These are illustrated in Figure 1.
In the traditional approach, the phonological treatment of the contrast describes both patterns as a contrast of the feature [voice] (Honeybone Reference Honeybone2005), where in a binary system, with a contrast as in Portuguese, the voiced stop is [+voice] and the voiceless unaspirated one is [-voice]. In a binary description of a language like English, the traditional approach would consider the voiceless unaspirated stop [+voice] and the voiceless aspirated one [-voice]. In the framework of laryngeal realism, the contrast between truly voiced and voiceless unaspirated stops has been described in terms of the phonological feature [voice], while the contrast between voiceless aspirated and voiceless unaspirated stops is described in terms of [spread glottis] (Iverson & Ahn Reference Iverson and Ahn2007; Schwarz et al. Reference Schwarz, Sonderegger and Goad2019). In a binary system, this means that in a language like Portuguese, the voiced stop is [+voice] and the voiceless unaspirated one is [-voice], while in a language like English, the voiceless aspirated stop is [+spread glottis] and the voiceless unaspirated one is [-spread glottis]. In a privative approach, the contrasts are described with the presence versus absence of the relevant feature.
In English, what are written as voiced stops are in fact voiceless unaspirated, as found by research on a number of varieties including American, British and Irish English (Lisker & Abramson Reference Lisker and Abramson1964; Wells Reference Wells1982; Kelly Reference Kelly2019). In their meta-analysis of VOT contrasts, Cho & Docherty (Reference Cho, Whalen and Docherty2019) report average VOT of 22 msec for ‘voiced’ stops and 94 msec for voiceless (aspirated) stops in American English. It should also be noted that in American English, some speakers have been found to have truly voiced stops, that is, their contrast is between voiceless aspirated stops and voiced stops (Lisker & Abramson Reference Lisker and Abramson1964; Schertz Reference Schertz2013). One of the four speakers examined by Lisker & Abramson (Reference Lisker and Abramson1964) produced /b,d,ɡ/ with voicing during the closure. In an examination of VOT in emphatic (corrective) speech among American English speakers, Schertz (Reference Schertz2013) also found that some speakers (nine out of 12) produced some voicing during the closure in initial voiced stops (in total, about 30% of tokens). These findings suggest that, at least in American English, the production of /b,d,ɡ/ may vary between truly voiced stops and voiceless unaspirated stops.
In relation to VOT in Arabic dialects, Bellem (Reference Bellem2014) noted that Lebanese Arabic can be categorized as having a two-way contrast between truly voiced stops and voiceless unaspirated stops. Different varieties of Arabic have been found to differ in the phonetic implementation of the voicing contrast. In Najdi Arabic, spoken in Saudi Arabia, measures of VOT showed a contrast between voiced stops and voiceless aspirated stops (Al-Gamdi et al. Reference Al-Gamdi, Al-Tamimi and Khattab2019). For /t/, the average VOT was 68.5 msec and for /d/, -74.7 msec. A similar pattern has been described for Qatari Arabic, where Kulikov (Reference Kulikov2016); Kulikov (Reference Kulikov2020) found an average VOT of 55 msec for voiceless stops and -69 msec for voiced stops. When Flege & Port (Reference Flege and Port1981) examined Saudi Arabic, they found longer VOT in initial position for voiceless stops /t/ and /k/ than was previously found in Lebanese Arabic by Yeni-Komshian et al. (Reference Yeni-Komshian, Caramazza and Preston1977). In the Lebanese study, /t/ had a mean VOT of between 20 and 30 msec (depending on the following vowel), while /d/ was between -40 and -70 msec (Yeni-Komshian et al. Reference Yeni-Komshian, Caramazza and Preston1977). In a recent phonetic examination of stop voicing in Lebanese Arabic, Al-Tamimi & Khattab (Reference Al-Tamimi and Khattab2018) measured a variety of acoustic correlates and found an average of -67 msec VOT for voiced (singleton) consonants and 8.7 msec for voiceless consonants in word-medial position.
Some work on VOT has also found differences depending on the position of the stop in the word. In American English, the contrast between the voiced and voiceless alveolar stop is neutralized in some word-medial positions, especially after a stressed syllable, where it becomes a tap (e.g., Ladefoged & Maddieson Reference Ladefoged and Ian1996; Iverson & Salmons Reference Iverson and Salmons1995). In research on Ixcatec (Oto-Manguean), DiCanio (Reference DiCanio2011) found that in word-medial position, voiceless aspirated stops had shorter duration of the closure and aspiration than in word-initial position.
1.3.1 Stop voicing in Armenian
Phonological and historical research on Armenian notes the following: ‘Classical Armenian used a three-way voicing contrast between voiceless aspirates, plain voiceless, and plain voiced consonants. [Eastern Armenian] has essentially the same system as Classical, while [WA] has plain voiced consonants for the second series and voiceless aspirates for the third series, thus neutralizing the system to a two-way contrast’ (Baronian Reference Baronian2017, 11). WA maintained the voiceless aspirates of Classical Armenian, but changed the voiced into voiceless aspirates, while also turning the voiceless unaspirated into voiced, as shown in Table 1. As such, it has flipped the voicing of the latter two, and ended up with a two-way contrast, as shown in Figure 2.
If we consider the phonetic pattern to be representing phonological specification, this contrast is typologically uncommon, insofar as it contrasts the two extremes of the VOT continuum, those being truly voiced stops with voiceless aspirated stops. Under the laryngeal realism approach, a language that has a phonological contrast between the two extremes of the continuum would be over-specified, since it involves both [+voice] and [+spread glottis] (Beckman et al. Reference Beckman, Helgason, McMurray and Ringen2011). While uncommon, this pattern has been found for Swedish (Beckman et al. Reference Beckman, Helgason, McMurray and Ringen2011) as well as Qatari Arabic (Kulikov Reference Kulikov2020) and Najdi Arabic (Al-Gamdi et al. Reference Al-Gamdi, Al-Tamimi and Khattab2019), described above. However, Salmons (Reference Salmons2020) notes that it is important to distinguish between phonetics and phonology in this context. What would be an unusual pattern phonologically might be simply explained in terms of phonetics. For example, voicing during the closure in a language like Swedish or Qatari Arabic could be analyzed more simply as a phonetic enhancement of the contrast, without requiring a phonological specification. Under a laryngeal realism approach, which has been argued for Germanic languages (Iverson & Salmons Reference Iverson and Salmons1995), this would mean that these languages use the [spread glottis] pattern, and speakers enhance the contrast by adding some voicing to the unmarked configuration. Salmons (Reference Salmons2020) describes the voiceless (aspirated) series as being marked for the [spread glottis] pattern, while the ‘voiced’ stops (usually produced as voiceless unaspirated) are phonologically unspecified, with voicing then added to ‘overmark’ the contrast (127). In this way, the contrast in Swedish would be the same as that among speakers of American English who produce voicing during the closure for /b,d,ɡ/.
A small amount of research has examined the stop contrast in Armenian. In a phonetic study of the voicing contrast in bilabial stops in EA, three speakers of the Tehran variety were examined, and it was found that the three-way contrast was maintained in various prosodic conditions (Hacopian Reference Hacopian2003). Average VOT in word-internal position was 66-92 msec for aspirated stops, 7-31 msec for unaspirated stops, and -7 to -82 msec for voiced stops. In another recent study on the acoustics of the three-way stop contrast in EA, Seyfarth & Garellek (Reference Seyfarth and Marc2018) found that VOT and aspiration were significantly different among all three categories. Examining stops and affricates in nonce words produced by speakers of WA in Lebanon, Kelly & Keshishian (Reference Kelly and Keshishian2019) found an Arabic-like pattern of a contrast between voiced obstruents and voiceless unaspirated obstruents. This suggests an effect of the majority language on the variety of WA spoken in Lebanon. They also found that f0 was higher, and intensity lower, following voiceless sounds than voiced sounds, for both stops and affricates.
1.4 Current study
With the exception of a study of nonce words (Kelly & Keshishian Reference Kelly and Keshishian2019), no research has examined the phonetic realization of the WA voicing contrast. The current study expands on that work by focusing only on stops and by including real words. Furthermore, since WA is generally spoken as a heritage language, examining the voicing contrast in two different communities with different patterns of voicing in the majority language can provide insight into transfer patterns among heritage speakers of this language. Experiment 1 examines nonce words containing sounds that have been described in the WA literature as voiced stops and voiceless aspirated stops, both word-initially and word-medially. This experiment involves speakers from Lebanon only, in order to determine whether they have the contrast as described, and to determine whether the stops are produced differently in word-initial versus word-medial position. Experiment 2 expands upon the results of Experiment 1, by examining both real words and nonce words produced by speakers from Lebanon as well as speakers from the US.
2 Experiment 1: Nonce words in WA in Lebanon
2.1 Hypotheses
If the contrast remains as described in the WA literature, voiceless stops should be aspirated and voiced stops should have voicing during the closure (Fairbanks Reference Fairbanks1948; Baronian Reference Baronian2017). However, if there is an influence of Lebanese Arabic, we would expect there to be a contrast between truly voiced stops and voiceless unaspirated stops. It was also predicted that stops in word-medial position would have shorter durations across all relevant measures than stops in word-initial position.
2.2 Methodology
2.2.1 Participants
Eight native speakers of Armenian (four female, four male) who grew up in Beirut were recorded. Six of them (three female, three male) were aged 24-32, and the other two speakers were aged 53 and 59. All attended school through Armenian, spoke Armenian at home, were also native speakers of Lebanese Arabic, and were proficient in English.
2.2.2 Stimuli
Stimuli were nonce words that contrast voiced and voiceless coronal stops in word-initial and word-medial position: /tʰatʰa, dada/. Each speaker produced each word four times, each time in a carrier sentence. This led to a total of 128 segments for analysis (8 speakers * 4 repetitions * 2 sounds * 2 word positions). The carrier sentence was ‘I say X again’, where X is the target word. In Armenian this is [nu'ɾεn X ɡɘ, sεm]. This ensured that the target word was always sentence-medial. Footnote 1
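The token count above can be checked by enumerating the design cells. The following is a minimal Python sketch for illustration only; the speaker labels and variable names are hypothetical and do not come from the study's materials.

```python
from itertools import product

# Hypothetical labels; only the number of levels per factor is taken from the text.
speakers = [f"S{i}" for i in range(1, 9)]   # 8 speakers
repetitions = range(1, 5)                   # 4 repetitions of each word
stops = ["voiceless", "voiced"]             # 2 sounds
positions = ["initial", "medial"]           # 2 word positions

# One entry per stop segment available for analysis.
tokens = list(product(speakers, repetitions, stops, positions))
print(len(tokens))  # 128
```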
2.2.3 Procedure
Speakers were recorded with a Zoom H5 recorder (.wav file, 44.1kHz) in a quiet room in their own homes. The sentences were presented in the Armenian script on paper, in a block together, after a block with other sentences for a separate experiment. The speakers were allowed to read through the list before recording, and none of them appeared to have (or reported having) any difficulty with the nonce words.
2.2.4 Measurements
Target words were labelled in Praat (Boersma & Weenink Reference Bolsajian2018) for whatever was present of closure, burst and aspiration, as shown in Figure 3. The closure, whether voiced or voiceless, was defined by a sudden drop in intensity shown by either silence (for voiceless stops) or a low-intensity waveform and a voicing bar (for voiced stops). The burst was identified by a sudden excitation of the waveform and a clear vertical bar in the spectrogram. Aspiration was identified by low-intensity aperiodic noise in the waveform and spectrogram, which ended with the onset of the vowel, defined by the beginning of regular voicing pulses in the spectrogram and an increase in intensity. VOT was calculated for voiced sounds as the duration of voicing during the closure until the release, this being coded as a negative number, and for voiceless sounds as the duration of the release burst plus aspiration.
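The VOT coding rule just described can be sketched as a small function. This is an illustrative Python sketch, not the procedure actually used for the measurements; the function name, parameter names and example durations are hypothetical.

```python
def vot_ms(voiced, closure_voicing_ms=0.0, burst_ms=0.0, aspiration_ms=0.0):
    """Return VOT in msec from labelled interval durations (hypothetical helper).

    Voiced stops: duration of voicing during the closure up to the release,
    coded as a negative number (voicing lead).
    Voiceless stops: duration of the release burst plus aspiration.
    """
    if voiced:
        return -closure_voicing_ms
    return burst_ms + aspiration_ms


# Illustrative durations only, not measurements from the study.
print(vot_ms(True, closure_voicing_ms=80.0))              # -80.0
print(vot_ms(False, burst_ms=10.0, aspiration_ms=50.0))   # 60.0
```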
The independent variables were Voicing (voiced vs voiceless) and Word position (initial vs medial). A linear mixed effects regression analysis was run using the lmer function from the lmerTest package (Kuznetsova et al. 2017) in R (R 2008) to determine whether VOT could be predicted by the independent variables. An alpha-level of 0.01 was chosen. Models were built up term by term and compared using the anova function: first, a model with one independent variable was created, then a second version with another independent variable added, and the two were compared. If the more complex model was a significantly better fit for the data, it was chosen. Models with and without interactions were compared in the same way until the best model for the data was found. For clarity and relevance, we report only the results of the best model.
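The logic of comparing a simpler and a more complex nested model can be illustrated with a likelihood-ratio test, which is the kind of comparison R's anova function reports for two nested lmer fits. The sketch below is in Python rather than R, assumes the two models differ by exactly one parameter, and uses made-up log-likelihood values; the function names are hypothetical and this is not the authors' analysis.

```python
import math

ALPHA = 0.01  # alpha level stated in the text


def lrt_df1(loglik_simple, loglik_complex):
    """Likelihood-ratio test for nested models differing by ONE parameter.

    With 1 degree of freedom, the chi-square upper tail has the closed form
    P(X > x) = erfc(sqrt(x / 2)).
    Returns (test statistic, p-value).
    """
    stat = max(2.0 * (loglik_complex - loglik_simple), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


def prefer_complex(loglik_simple, loglik_complex, alpha=ALPHA):
    """Keep the added term only if it significantly improves the fit."""
    _, p_value = lrt_df1(loglik_simple, loglik_complex)
    return p_value < alpha


# Made-up log-likelihoods for illustration only.
print(prefer_complex(-650.0, -600.0))  # large improvement in fit
print(prefer_complex(-600.0, -599.5))  # negligible improvement in fit
```

Under this rule, a term such as Word position is dropped whenever adding it does not improve the fit at the chosen alpha level, so the reported model is the smallest one that survives each comparison.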
2.3 Results
Figures 4 and 5 show examples of a voiced stop and voiceless stop from initial position in the nonce words, with annotations of the closure, burst, aspiration and vowel.
The independent variables tested in the models were Voicing (voiced vs voiceless) and Word position (initial vs medial), and the possible random factors were Speaker, Sex, Age and Token. (Age was a categorical variable with two levels: younger (aged 24-32) and older (aged 53 and 59).) The best model with the dependent variable VOT was one with only Voicing as an independent variable and Speaker as a random factor (R code: lmer(VOT ~ Voicing + (1|Spk))). These results (Table 2) show that VOT was significantly longer for voiceless stops than for voiced stops. The best model did not include Word position, meaning that adding this variable did not improve the model’s fit for the data. This can be seen in Figure 6. Table 3 shows means and standard deviations for VOT for each sound.
2.4 Discussion
The results for Experiment 1 showed that, contrary to the hypothesis and previous work on other languages (e.g., DiCanio 2011), word-medial stops were not any shorter than word-initial stops. This is shown by the fact that including Word position as an independent variable did not improve the model. However, it is also possible that this lack of a difference was found because stress fell on the final syllable. In both EA and WA, stress has been described as falling on the final full vowel in the word (Vaux 1998; Fairbanks 1948; Dolatian 2019), and this is what occurred in the current experiment. These results are similar to what has been found for Polish, where the patterns of VOT are not affected by whether the stop is in initial or intervocalic position (Keating 1984).
As expected, voiced stops had voicing during the closure, meaning they are truly voiced. Voiceless stops were predicted to be aspirated if they followed previous phonological descriptions of WA (Fairbanks 1948; Vaux 1998; Baronian 2017), or unaspirated if there is an effect of the majority language (Kelly & Keshishian 2019). The average VOT for voiceless stops was 20.6 msec, with aspiration of 14.5 msec. When compared to the results for EA, which found voiceless aspirated stops to have an average of 66-92 msec VOT (three speakers) and voiceless unaspirated 7-31 msec VOT (Hacopian 2003), the current results are more in line with voiceless unaspirated stops. That study examined bilabial stops, and other work has found that coronal stops usually have even longer VOT than bilabial stops (Maddieson 1997; Cho & Ladefoged 1999; Kelly 2019). Also for EA, Seyfarth & Garellek (2018) found aspiration of word-initial voiceless aspirated stops of around 75 msec, and voiceless unaspirated of around 15 msec (bearing in mind these were part of a three-way distinction). Cross-linguistic research by Cho et al. (2019) found that voiceless aspirated (denti-)alveolar sounds usually have VOT of 57-97 msec, and unaspirated 1.4-21 msec. As such, the current results for voiceless stops fit in better typologically with voiceless unaspirated stops than voiceless aspirated stops (Lisker & Abramson 1964). Based on these findings, the voicing contrast found among WA speakers in Lebanon can be illustrated as in Figure 7.
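The comparison made in this paragraph can be spelled out as a simple range check. The sketch below is in Python and uses only the VOT ranges and the mean value quoted above; the dictionary layout, labels and function name are illustrative assumptions, and a range check of this kind is not a statistical test.

```python
# VOT ranges in ms, as quoted in the paragraph above.
REFERENCE_RANGES = {
    "EA bilabial stops": {
        "unaspirated": (7.0, 31.0),
        "aspirated": (66.0, 92.0),
    },
    "cross-linguistic (denti-)alveolar stops": {
        "unaspirated": (1.4, 21.0),
        "aspirated": (57.0, 97.0),
    },
}


def matching_categories(vot_ms, ranges=REFERENCE_RANGES):
    """Return, per reference set, the categories whose range contains vot_ms."""
    result = {}
    for source, categories in ranges.items():
        result[source] = [
            name
            for name, (low, high) in categories.items()
            if low <= vot_ms <= high
        ]
    return result


mean_voiceless_vot = 20.6  # mean VOT reported for voiceless stops in Experiment 1
matches = matching_categories(mean_voiceless_vot)
print(matches)
```

The reported mean falls inside the unaspirated range of both reference sets and outside both aspirated ranges, which is the basis for grouping these stops with voiceless unaspirated stops.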
Since Lebanese Arabic has been described as having a contrast between voiceless unaspirated and voiced stops (Al-Tamimi & Khattab 2018), it is likely that WA as spoken in Lebanon has been influenced by this. Heritage speakers of WA in the US have been found to produce vowels that are influenced by English (Godson 2004). It is also possible that, since WA has a two-way contrast, the voiceless aspirated sounds have gradually reduced aspiration to reframe the two-way contrast by removing the wide gap between voiceless aspirated and voiced stops. This is discussed further in section 4.
In order to truly clarify whether the pattern here is influenced by Arabic, it is useful to compare these results to speakers of WA who do not speak Arabic. If we find the same pattern for both groups, this suggests WA has undergone a change whereby it lost aspiration in its voiceless stops, making the contrast more typologically common. If the groups are different, this suggests an effect of the majority language on the WA contrast.
3. Experiment 2: Nonce words and real words in WA in Lebanon and the US
Based on the results of Experiment 1, stimuli for Experiment 2 did not focus on word-initial versus word-medial sounds, and instead only examined stops in word-initial position. This time, the majority of the target words were real words.
Lebanese Arabic has a contrast between voiceless unaspirated and voiced stops (Yeni-Komshian et al. 1977; Al-Tamimi & Khattab 2018), and this pattern was found in WA nonce words in Experiment 1. In comparison, for the majority of American English speakers there is a contrast between voiceless unaspirated stops, such as /t/, which is written as <d>, and voiceless aspirated stops, such as /tʰ/, written as <t>.
3.1 Hypotheses
Based on the findings for Experiment 1, it was expected that voiced and voiceless stops would be significantly different from one another for both groups of speakers. However, the actual acoustic patterns were expected to differ between the groups, with VOT reflecting the patterns found in Arabic or English, depending on the L2/majority language of the speakers. Speakers from Lebanon were expected to show the pattern found in Experiment 1, that is, voiced stops would have voicing during the closure (negative VOT) and voiceless stops would be unaspirated. Speakers from the US were hypothesized to have a more English-like pattern, that is, that voiced stops would be voiceless unaspirated while voiceless stops would be highly aspirated. It was expected that there would be no difference between real words and nonce words.
3.2 Methodology
3.2.1 Participants
The participants in Lebanon were six of the speakers from Experiment 1, aged 24-59 (four female, two male). The participants in the US were six native speakers of WA who grew up in California, aged 21-41 (five female, one male). None of the US participants spoke Arabic, and all spoke American English outside the home. They all spoke WA at home and considered it their first language. All participants had two WA-speaking parents.
3.2.2 Stimuli
Target words (listed in Figure 8) were all disyllabic with final stress, seven beginning with /d/, seven /t/, produced twice, as well as the two nonce words from Experiment 1, produced four times, all in the same carrier sentence as in Experiment 1. This gave: 12 speakers * 14 real words * 2 repetitions = 336 tokens, and 12 speakers * 2 nonce words * 4 repetitions = 96 tokens; 336 + 96 = 432 tokens, with one removed due to a reading error, leaving 431 tokens.
3.2.3 Procedure
Speakers were recorded with a Zoom H5 recorder (.wav file, 44.1kHz). The participants in Lebanon were recorded in a quiet room in their own homes. The participants in the US were recorded in the recording studio at the Institute of Armenian Studies at the University of Southern California. The sentences were presented in the Armenian script on paper, in a block together, with the nonce words interspersed with the real words.
3.2.4 Measurements
Again, the word-initial stops were labelled in Praat (Boersma & Weenink 2018) for whatever was present of closure, burst and aspiration, using the definitions as in Experiment 1, and VOT was then calculated. A linear mixed effects regression analysis was run, as in Experiment 1. The independent variables tested were Voicing (voiced vs voiceless), Group (Lebanon vs US) and Word type (real vs nonce). Possible random factors were again Speaker, Sex, Age and Token. The chosen alpha-level was again 0.01.
3.3 Results
The best model with the dependent variable VOT was one with all three independent variables (Voicing, Group and Word type) and their interactions, and Speaker as a random factor (R code: lmer(VOT ~ Voicing * Group * Word.type + (1|Spk))). These results (Table 4) show a highly significant main effect for all three independent variables, as well as a significant interaction between Voicing and Word type. The main effect of Group showed longer VOT for the US speakers. Because of the interaction between Voicing and Word type, a post-hoc pairwise test was run using the emmeans function (emmeans package; Lenth 2019), with a Tukey adjustment for running multiple tests. This showed that voiceless stops had significantly longer VOT than voiced stops for both real words and nonce words. While real and nonce words did not differ from one another for voiceless stops, the results for voiced stops showed that real words had shorter VOT, meaning a shorter closure, than nonce words. This can be seen in Figure 9.
These results show that WA speakers whose L2 is Arabic have significantly different productions from speakers whose L2 is English. The speakers with an L2 of English have much more highly aspirated voiceless stops than those with an L2 of Arabic. The contrast between voiced and voiceless differs between the groups: for speakers of Arabic, voiced stops have voicing during the closure and voiceless stops have a small amount of aspiration, while for speakers of English, voiced stops vary widely in the amount of voicing during the closure and voiceless stops have a higher amount of aspiration, typical of English (Lisker & Abramson 1964; Cho et al. 2019). The higher aspiration is shown in a representative token in Figure 10. Figures 11 and 12 show examples of voiced stops by a US speaker, one with voicing during the closure and one with short aspiration.
For voiced stops, nonce words had a longer closure than real words. This can also be seen in comparing the results for Lebanese speakers in Table 5 to those in Table 3. The voiceless stops, in contrast, are quite similar across the two experiments.
Figure 9 shows an interesting pattern whereby voiced stops produced by US speakers have a lot of variation. In order to determine if speakers were doing different things, we examined the results by speaker, as shown in Figure 13, and in the histogram in Figure 14.
It can be seen that Speakers 24 and 25 have an English-like pattern, where their voiced stops are voiceless unaspirated and their voiceless stops are highly aspirated. For the other four speakers, the voiceless stops are also aspirated. However, their voiced stops have a large amount of variation, meaning that their [d] sometimes has voicing during the closure, and is sometimes more like a voiceless unaspirated sound. There appeared to be no difference in speakers’ linguistic background that could explain this pattern.
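The by-speaker inspection described above can be approximated by computing, for each speaker, the share of voiced-category tokens that have voicing lead (negative VOT). The sketch below is in Python; the speaker labels and VOT values are made up for illustration, the function name is hypothetical, and it is not the procedure used to produce Figures 13 and 14.

```python
from collections import defaultdict


def prevoicing_share(tokens):
    """Proportion of voiced-category tokens with negative VOT, per speaker.

    tokens: iterable of (speaker_id, vot_ms) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # speaker -> [prevoiced, total]
    for speaker, vot_ms in tokens:
        counts[speaker][1] += 1
        if vot_ms < 0:
            counts[speaker][0] += 1
    return {spk: lead / total for spk, (lead, total) in counts.items()}


# Made-up tokens for two hypothetical speakers (not data from the study).
tokens = [
    ("A", -85.0), ("A", -70.0), ("A", 12.0), ("A", -90.0),
    ("B", 10.0), ("B", 14.0), ("B", 9.0), ("B", 11.0),
]
shares = prevoicing_share(tokens)
print(shares)
```

A share near 0 would correspond to the English-like pattern of consistently voiceless unaspirated realizations, while an intermediate share would correspond to the variable pattern described for the other four speakers.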
Based on these findings, the voicing contrast found among speakers in the US can be illustrated as in Figure 15.
3.4 Discussion
The difference between voiced stops when they were in nonce words versus real words suggests that nonce words are not always fully reflective of patterns found in natural words. While real words had a shorter closure, they are still within the category of voiced stops, having voicing during the closure.
The differences found between the two groups can be explained by an interaction between languages. Overall, WA speakers in Lebanon had little aspiration in voiceless stops, and voicing during the closure for voiced stops. Since WA was described as having voiced stops (Baronian 2017), one interpretation of these findings is that these speakers maintained that pattern of voicing during the closure, but the voiceless sound lost its original aspiration through contact with Arabic in this multilingual community.
The situation among US speakers is more complex, with more variation in this group. This variation occurred in two ways. First, voiced stops had a range of realizations, from voicing during the closure (negative VOT/lead time) all the way to a voiceless unaspirated realization (0 VOT/short lag). Second, there was variation across speakers, with two of the six speakers showing a fully English-like pattern. This raises the question of how to interpret these data in the context of the linguistic background of these speakers, and previous findings on transfer effects. The voiceless stop for all of the US speakers is consistently aspirated, which is the previously described pattern for voiceless stops in WA. The voiced stop is variable for four speakers, suggesting one of two interpretations. It is possible that they have maintained the original voicing during the closure of WA, but that contact with English has made their /d/ category more variable in production. In other words, contact with English has influenced their /d/ to sometimes be realized as voiceless unaspirated [t]. Another interpretation, related to the work by Lisker & Abramson (1964) and Schertz (2013) mentioned in Section 1.3, is that these four speakers produce the voiced stop in line with the inconsistent productions of it by some speakers of American English.
4. General discussion
There is no monolingual community of WA (Godson 2004), so there is no ‘control group’ to compare against in order to determine whether WA has undergone a general change to make the voicing contrast more efficient and typologically common, or whether the patterns we found are fully due to transfer effects. In any case, if WA had undergone a change itself, this would not have any effect on the present-day speakers across the world unless the change had happened before the movement of speakers began. As such, the most straightforward conclusion here, since we find different production patterns in the two groups, is that WA has been influenced by the voicing patterns of the majority language, similar to findings by Heselwood & McChrystal (1999) and McCarthy et al. (2013). Such situations of community-wide multilingualism can lead to an influence of the majority language on the heritage language. This was found by Godson (2004) in relation to the production of vowels by heritage speakers of WA in the US. The differences in VOT patterns in WA between US and Lebanese speakers appear to be due to cross-linguistic transfer that results from language contact of WA with the majority language of the speakers. Earlier studies have detected transfer of phonetic features from L1 to L2 (Flege & Port 1981; Cox & Palethorpe 2005; Balukas & Koops 2015) as well as L2 to L1 (Flege 1987; Heselwood & McChrystal 1999; Harada 2003; Sundara et al. 2006; Ulbrich & Ordin 2014; Nagy 2015), the latter of which is likely the case in this study.
Both groups have maintained the sound that is the same as in the majority language: the voiced stop for Arabic and the voiceless aspirated stop for English. The stop that differs from the original pattern described for WA is different for the two communities: in Lebanon, WA speakers have de-aspirated the voiceless stop, while in the US, some speakers have replaced the voiced stop with a voiceless unaspirated stop, and others do so variably.
Our results therefore indicate that the transfer effect is a phonetic one rather than being at the phonological level. In this interpretation, WA is specified for [voice] but has a phonetic enhancement (Salmons 2020) in the form of a spread glottis gesture for the voiceless series, similar to what Avery & Idsardi (2001) argue for Japanese. Footnote 2 The voiceless stops do not get voiced in word-medial position because of this spread glottis gesture. In the Arabic language environment, the spread glottis gesture is attenuated, similar to what happens in English when a voiceless stop occurs at the onset of an unstressed syllable (Keating 1984; Iverson & Salmons 1995) - but in this case due to phonetic transfer from Arabic. In the English language environment, in contrast, the aspiration from the spread glottis gesture is maintained in the voiceless stops, because it also occurs in English, but the voicing in voiced stops is more variable due to phonetic transfer from American English.
These results are in line with the process of equivalence classification proposed by Flege (1987, 1995) and Flege & Eefting (1987), as well as PAM-L2 (Best & Tyler 2007), since there is a phonetic assimilation of the heritage language category (L1) to the majority language (L2) pattern. The results are similar to the findings of Sundara et al. (2006), where, in the case of bilinguals, the phonetic system of one language is influenced by the other. The pattern observed here is also similar to the pattern among heritage Russian and Ukrainian speakers in Toronto (Nagy 2015), with the VOT patterns drifting toward that of the majority language in both the US and Lebanon. However, since language attitudes and ethnic orientation were not studied as independent factors, we cannot determine whether this drift is also due to sociolinguistic attributes.
Bilingual proficiency can also play an important role in language transfer. As argued by Chen & Mok (2019), bilingual speakers who are proficient in both languages would be able to learn the language-specific phonetic realizations of the same phoneme without undergoing transfer. Conversely, if bilingual speakers are not equally proficient in both languages, transfer from one of the languages can act as a marker of which is the dominant language for a particular speaker (Kaland et al. 2017). Since WA is a heritage language in both the Lebanese and US contexts, transfer may be regarded as a marker of language dominance. This would need to be investigated in a sociolinguistic study. Sociolinguistic features can affect language change, specifically issues related to prestige and language identity (Alam & Stuart-Smith 2011; Samant 2010). In order to explore these effects, it would be beneficial to use Ethnic Orientation questionnaires to test whether there is any correlation between the speakers’ VOT patterns and their attitudes toward WA as their heritage language.
One important distinction we found between the two populations was that there is more variation among the US group than the Lebanese group. It is also interesting to note that speakers of both groups anecdotally mentioned that the other group sounds different. This may be due to segmental or suprasegmental factors, which would require further research to tease apart, but there is a sense that there are perceptible differences between the two communities in their WA productions. Measuring VOT in stops from natural speech may also reveal insights into the phonology of WA from Lebanon and the US. Future research could also examine the perception of stops by both groups, for example, to see whether the speakers who produce variation in their voiced stops also perceive this variation.
Acknowledgements
Special thanks to the Institute of Armenian Studies at the University of Southern California, and in particular, to Shushan Karapetian, for connecting us with participants and providing access to the recording studio. Thanks to all speakers for their participation and also to the reviewers and editors for helpful comments.