Didn't hear that coming: Effects of withholding phonetic cues to code-switching

Alice Shen; Susanne Gahl; Keith Johnson

doi:10.1017/S1366728919000877

Published online by Cambridge University Press: 31 January 2020

Alice Shen*: Affiliation:
University of California Berkeley
Susanne Gahl: Affiliation:
University of California Berkeley
Keith Johnson: Affiliation:
University of California Berkeley
*: Address for correspondence: Alice Shen, E-mail: [email protected]

Abstract

Code-switching has been found to incur a processing cost in auditory comprehension. However, listeners may have access to anticipatory phonetic cues to code-switches (Piccinini & Garellek, 2014; Fricke et al., 2016), thus mitigating switch cost. We investigated effects of withholding anticipatory phonetic cues on code-switched word recognition by splicing English-to-Mandarin code-switches into unilingual English sentences. In a concept monitoring experiment, Mandarin–English bilinguals took longer to recognize code-switches, suggesting a switch cost. In an eye tracking experiment, the average proportion of all participants' looks to pictures corresponding to sentence-medial code-switches decreased when cues were withheld. Acoustic analysis of stimuli revealed tone-specific pitch contours before English-to-Mandarin code-switches, consistent with previous work on tonal coarticulation. We conclude that withholding anticipatory phonetic cues can negatively affect code-switched recognition: therefore, bilingual listeners use phonetic cues in processing code-switches under normal conditions. We discuss the implications of tonal coarticulation for mechanisms underlying phonetic cues to code-switching.

Keywords

code-switching; bilingualism; language comprehension; phonetic cues

Type: Research Article
Information: Bilingualism: Language and Cognition, Volume 23, Issue 5, November 2020, pp. 1020–1031

DOI: https://doi.org/10.1017/S1366728919000877
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s) 2020

Introduction

Bilinguals frequently switch between languages mid-utterance. Many psycholinguistic studies on code-switching have reported a ‘switch cost’, i.e., an increased processing difficulty, in production (Meuter & Allport, 1999; Thomas & Allport, 2000; Costa & Santesteban, 2004; Gollan & Ferreira, 2009, although see Kleinman & Gollan, 2016), recognition (Soares & Grosjean, 1984), and comprehension (Olson, 2017). How then do bilingual listeners manage the potentially difficult processing task of recognizing a code-switched word? A recent line of research points to subtle details of pronunciation as a possible key to this question.

For instance, Fricke, Kroll and Dussias (2016) report subtle shifts in voice onset time (VOT) before an English-to-Spanish code-switch, while Piccinini and Garellek (2014) report subtle shifts in intonation prior to code-switches in either direction. They further found that bilingual listeners use shifts in VOT and intonation as cues to anticipate code-switches. Phonetic cues to upcoming code-switches (henceforth ‘code-switching pronunciation’) may thus mitigate switch cost.

On the other hand, code-switching pronunciation could potentially make the comprehension process more difficult: perseverative coarticulation of matrix language phonetics into the code-switch – or indeed of the switch language back into the matrix language – might be detrimental to recognition.

There are at least three possible mechanisms by which code-switching pronunciation might arise. One is a ‘blending’ mechanism by which code-switching pronunciation might represent a blend of the phonetic features of both languages (Grosjean, 2012; Olson, 2013): the matrix language may come to sound more like the switch language, or vice versa. For example, Piccinini and Garellek (2014) observed that stressed syllable pitch patterns in Spanish/English code-switched contexts were intermediate between those observed in unilingual contexts in either language. If such ‘intermediate’ prosodic contours are characteristic of utterances containing code-switches, they could serve as cues to an upcoming code-switch.

Another possibility is a ‘preparation’ mechanism by which code-switching pronunciation might reflect articulatory gestures that are preparatory to the production of a specific code-switched target.

These two explanations are mutually compatible, but entail slightly different empirical predictions. Under blending, code-switching pronunciation would be independent of specific upcoming code-switched targets. Under preparation, by contrast, the acoustic consequences of speakers preparing code-switched targets would depend on the articulatory gestures needed to prepare a specific target. Of course, code-switched utterances might very well be characterized both by general code-switching pronunciation patterns, such as the prosodic contours found in Piccinini and Garellek's (2014) study, and by context-specific pronunciations arising in preparation for a specific code-switching target.

A third possibility is that code-switching pronunciation might reflect global cognitive costs of code-switching: if code-switching incurs a processing cost for the speaker, that increased processing load might cause an overall slowed speaking rate, for example. Under this scenario, the existence and degree of ‘code-switching pronunciation’ would depend on the degree of processing load (see e.g., Gollan, Kleinman & Wierenga, 2014, for evidence showing that code-switching does not necessarily or consistently entail a processing cost in production). We will not pursue this possibility further here, except to note that it is in principle compatible with both the blending and preparation scenarios: code-switching pronunciation may be a variable phenomenon modulated by processing demands of a specific code-switching context.

Phonetic consequences of code-switching may also differ across language pairs. The literature on phonetic reflexes of code-switching has so far been limited to English–Spanish, English–French, and English–Greek code-switching. One goal of the current study is to widen the evidence base on the possible role of phonetic reflexes of code-switching in comprehension, by examining English–Mandarin code-switches.

We hypothesized that the comprehension of code-switched targets would differ depending on whether code-switched targets were spliced into utterances that were originally unilingual vs. utterances that originally contained code-switches. If that is the case, it would strongly suggest that there must be phonetic differences between unilingual vs. code-switched utterances, and that listeners use these differences as cues to upcoming code-switches. In other words, if it is true that bilingual speakers produce phonetic cues and listeners use them in comprehension, then manipulating the acoustic signal to remove those cues should impede recognition of the code-switch: if phonetic preparation acts as a ramp to ease the gradual transition to another language or to highlight the phonetic contrast between the languages, then removing the phonetic ‘ramp’ should make code-switches phonetically abrupt and difficult to anticipate.

While the current study was primarily designed to target the possible role of phonetic reflexes of code-switches on the comprehension process, we also analyzed the pitch contours of our stimuli, as a step towards pinpointing what acoustic events might be responsible for effects of the splicing manipulation on comprehension and to explore whether phonetic cues to code-switching were target-specific.

We focus on pitch contours because Mandarin has lexical tone while English does not. It is conceivable therefore that pitch patterns in English contexts preceding switches into Mandarin might reflect tonal properties of the Mandarin target. For example, pitch might dip in anticipation of a low tone, such that there is assimilatory anticipatory coarticulation with pitch ramping to meet the low onset of that low tone.

Tonal coarticulation has been observed in unilingual Mandarin speech (Xu, 1997), which we describe in detail in the Acoustic Analysis section later on. English-to-Mandarin code-switching pronunciation might result in patterns resembling patterns of unilingual Mandarin tonal coarticulation. Alternatively, English-to-Mandarin code-switching pronunciation might differ from unilingual Mandarin tonal coarticulation: English does not have lexical tone, so pitch contours can in principle vary more freely in English than in Mandarin.

Tone-specific patterns are expected under the ‘preparation’ explanation for code-switching pronunciation, but not under the ‘blending’ explanation. Exploring these patterns can aid us in understanding the potential role of anticipatory coarticulation in code-switching pronunciation.

To test the hypothesis that anticipatory phonetic cues aid in processing code-switches, we conducted a concept monitoring experiment and an eye tracking experiment.

For both experiments, we spliced Mandarin code-switched target words from English-Mandarin code-switched sentences (e.g., I saw a màozi) into English sentences that were originally unilingual (e.g., I saw a hat) to withhold any anticipatory phonetic cues to the code-switch. The resulting spliced stimulus should bias the listener toward expecting the utterance to continue in English, as code-switch cues are absent. We compared listeners’ reaction times and proportions of looks to English and Mandarin targets spliced into English utterances that originally did vs. did not contain Mandarin targets, as illustrated in Figure 1. This resulted in four conditions: code-switched spliced, code-switched unspliced, unilingual spliced, and unilingual unspliced.

Fig. 1. Splicing auditory stimuli. The speaker recorded two sentence frames per experimental item: unilingual English sentences were recorded twice, and code-switched sentences were additionally recorded as unilingual English sentences. Target words were then cut from the unilingual or code-switched sentence frame and spliced into the fully English sentence frame.

Our prediction was that listeners would take longer to recognize code-switched target words, especially when spliced into unilingual utterances, since there would be no code-switching pronunciation to cue listeners to the upcoming code-switch.

Experiment 1: concept monitoring

This experiment tests whether listeners are slower to recognize Mandarin target words in English sentences if anticipatory phonetic cues to the code-switch are absent from the acoustic signal. This is tested by comparing reaction times to spliced and unspliced stimuli, in a concept monitoring experiment where participants see a pictured object and press a button when they hear the object named in an auditorily presented sentence. Spliced code-switched stimuli consist of a Mandarin target word spliced into an originally unilingual English utterance, so that the pronunciation in the portion of the utterance leading up to the target word will incorrectly bias the listener toward expecting an English target word. Unspliced code-switched stimuli consist of an originally code-switched English sentence with a Mandarin target word, so that the code-switching pronunciation leading up to the code-switched target word might aid in recognition of the code-switch. The prediction is that listeners will be slower to recognize the target when the phonetic information available is incongruent with the code-switch, so reaction times to spliced code-switched stimuli will be slower than to unspliced code-switched stimuli.

Method

Speaker

A 21-year-old female Mandarin–English bilingual produced all of the auditory stimuli. She self-reported balanced usage of both languages in home and school environments, having acquired Mandarin from birth and English around age four. The speaker completed a written language background questionnaire asking for speaking, listening, reading, and writing proficiency self-ratings in both languages. She rated herself as proficient in English and Mandarin on a scale of 0–6, with 0 being low and 6 being high, as shown in Table 1. The speaker read over the list of stimuli before recording, to check for grammaticality, and to ensure familiarity with the sentences to avoid hesitations during recording. The speaker was also administered the Bilingual Language Profile (Birdsong, Gertken & Amengual, 2012), on which she scored −23 on a scale from −218 (very Mandarin-dominant) to 218 (very English-dominant), suggesting that she is a relatively balanced bilingual, though slightly more dominant in Mandarin. In addition, she reported having a positive attitude toward code-switching, frequently code-switching with friends, and occasionally with family.

Table 1. Speaker self-rated proficiency.

Participant screening

Participants were screened for proficiency prior to the experiments with two tasks. First, they were administered the same written language background questionnaire as was given to the speaker. They then completed a familiarization task, to check vocabulary size and to ensure association of the appropriate Mandarin and English names with the pictured objects. Participants were presented all visual stimuli one by one on a computer screen, along with printed English and Mandarin names for the pictured objects. The positions of the English and Mandarin names (left or right underneath the picture) were randomized. The task was self-paced, and participants were given an index card to note down any English and Mandarin words that they were unfamiliar with or that they would not typically use to name the pictured object. If the participant was not proficient enough according to the questionnaire (i.e., scoring below 3 on the 1 (low) – 4 (high) understanding and speaking proficiency scales on the language background questionnaire) or their vocabulary was too limited based on the familiarization task, they were disqualified from participating. A substantial vocabulary in both English and Mandarin, as well as familiarity with specific names of pictured objects, was desirable, as the study relied on participants’ being able to associate pictures with their spoken names in both languages. Therefore, any participant who marked more than ten words (of a total of 224) as unfamiliar or not their primary choice for describing the picture in either language (e.g., due to dialectal differences) was disqualified from participating. The entire screening process for each participant lasted approximately twenty minutes.

Participant language background

A total of 42 Mandarin–English bilinguals (35 female, 7 male) with no reported speech or hearing defects qualified for participation in this study. All participants but one completed both this experiment and Experiment 2. The participants’ linguistic backgrounds and, consequently, their language dominance, varied. Thirty-five participants were L1 Mandarin speakers, one participant was an L1 English speaker, while six participants were simultaneous bilinguals. Twenty-three participants reported also speaking other languages, and four participants reported both Mandarin and other Chinese languages as their L1s: Wu (Shanghainese), Yue (Cantonese), and Southern Min. The average age was 20.4 years (SD = 2.2). While most participants were 18–24, one male participant was 31 years of age. The average age of arrival to the U.S. was 15 years (SD = 7), although two participants first lived in Canada starting at ages four and eight, before moving to the U.S. at ages 12 and 18, respectively. Additionally, several participants grew up in Singapore, where English is an official language and most of the population code-switches frequently. Most participants moved from China to the U.S. for college, while two each moved from Malaysia and Singapore, and one each from Taiwan and Hong Kong. Four participants were born and raised in the U.S. All participants reported occasionally or regularly code-switching with friends or family. Three participants were left-handed.

We quantified participants’ language dominance using the Bilingual Language Profile (Birdsong et al., 2012), a questionnaire that assesses language dominance. The participants’ scores ranged from −159 to 96, averaging −31 (SD = 59), meaning that most participants leaned Mandarin-dominant. Twenty-seven participants had negative scores, suggesting Mandarin dominance, while the other fifteen had positive scores, suggesting English dominance. Following a reviewer's suggestion, these dominance scores are included as part of a separate model in the Results section, to ascertain whether this diversity affected findings.

Table 2 provides participants’ average age of acquisition of English and Mandarin, as well as their self-rated proficiency in each language on a scale of 0–6, where 0 means “not well at all” and 6 means “very well.” Participants rated themselves as being almost equally proficient in speaking, understanding, reading, and writing both languages.

Table 2. Mean participant age of acquisition and self-rated proficiency. Standard deviations are in parentheses.

Visual stimuli

Visual stimuli consisted of 80 pictures from the Rossion and Pourtois (2004) colored line drawing database, or other public domain colored line drawings that visually resembled the Rossion and Pourtois (2004) set. All pictures depicted common objects, and were modified to the same dimensions. Of the 80 pictures, 64 were target experimental items, in that the pictured objects were mentioned in the corresponding auditory stimulus sentence. The other 16 pictures functioned as part of catch trials, where the pictured object was not mentioned in the corresponding auditory stimulus.

Auditory stimuli

Auditory stimuli consisted of 144 spoken English sentences: (a) 64 sentences that mentioned the paired visual stimulus, (b) the spliced versions of those 64 sentences that mentioned the paired visual stimulus, and (c) 16 sentences that functioned as catch trials, thereby not mentioning the paired visual stimulus.

The target experimental items included the 64 spoken English sentences with either English target words (32 unilingual sentences) or Mandarin target words (32 code-switched sentences), recorded by the speaker in random order. Sentences were constructed so that each mentioned a picturable noun. Picturable nouns occurred sentence-medially in half of the sentences and sentence-finally in the other half. This gave a total of 16 English sentences with medial nouns, 16 English sentences with final nouns, 16 code-switched sentences with medial nouns, and 16 code-switched sentences with final nouns. Sentences were designed with similar syntactic structures to control for intonational patterns: either 1) a main clause beginning with a subject pronoun, followed by a transitive verb and direct object, ending with a prepositional phrase, or 2) a subject pronoun, main verb, and embedded clause. In the former case, medial targets occupied the direct object position, while final targets were located in the prepositional phrase. In the latter case, final targets were located in the embedded clause. Target words were introduced by either a definite article, indefinite article, or possessive pronoun. Spliced versions of these 64 sentences were also constructed, as described in the Splicing section.
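The stimulus counts described above can be checked with a short sketch. The Python snippet below is illustrative only and is not part of the study materials; all variable names are hypothetical.

```python
from itertools import product

# Design factors as described in the text (labels are illustrative).
TARGET_LANGUAGES = ("English", "Mandarin")  # unilingual vs. code-switched target
POSITIONS = ("medial", "final")             # position of the picturable noun
SENTENCES_PER_CELL = 16
CATCH_TRIALS = 16

# One cell per combination of target language and position.
cells = {cell: SENTENCES_PER_CELL for cell in product(TARGET_LANGUAGES, POSITIONS)}

recorded_targets = sum(cells.values())                 # distinct target sentences
with_spliced_versions = 2 * recorded_targets           # unspliced + spliced versions
total_auditory = with_spliced_versions + CATCH_TRIALS  # plus catch trials

print(len(cells), recorded_targets, with_spliced_versions, total_auditory)
```

The four cells of 16 sentences give the 64 target sentences; doubling for the spliced versions and adding the 16 catch trials gives the 144 auditory stimuli reported in the text.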

Additionally, 16 sentences were not target trials but functioned as catch trials instead, in that none of the picturable nouns heard in the auditory stimuli matched the pictured objects on the screen. For instance, participants might hear “I saw a raccoon behind the plant,” while being presented a picture of a zebra. The inclusion of these catch trials was to ensure that target loci were not predictable from the similar syntactic structures of the stimulus sentences. The catch trials were split evenly among the four kinds of stimuli, regarding position of the picturable noun and whether there was a code-switch. The intention was to prevent participants possibly using syntactic or contextual predictability to respond whenever they expected to hear a noun, e.g., pushing a button when they heard the determiner preceding the target noun.

These sentences can be found in Appendix S1.

Splicing

This study utilizes a splicing manipulation in both experiments to test the prediction that listeners will have relatively more difficulty recognizing a code-switch (manifesting as slower reaction time) if anticipatory phonetic cues to the code-switch are withheld. The speaker recorded multiple repetitions of each auditory stimulus sentence, including English-only versions of code-switched sentences to use as frames in the splicing condition.

To eliminate any phonetic information provided in the sentence leading up to the target word that could cue the language of the target word, stimuli were cross-spliced, so that a Mandarin target originally recorded in a code-switched sentence was spliced into what was originally a unilingual English sentence. To control for any effects of the splicing manipulation itself, English sentence stimuli were recorded twice, and English targets were identity-spliced into a separate repetition of the same English sentence. This procedure is illustrated in Figure 1.
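The logic of cross-splicing and identity-splicing can be sketched in a few lines of Python. This is a schematic illustration, not the procedure actually used to edit the recordings: each "recording" is a toy list of (recording label, word) pairs rather than audio samples, the example sentence and the span indices are hypothetical, and the function name `splice` is our own.

```python
def splice(frame, frame_span, donor, donor_span):
    """Return a copy of `frame` whose target region is replaced by the donor's target."""
    f_start, f_end = frame_span
    d_start, d_end = donor_span
    return frame[:f_start] + donor[d_start:d_end] + frame[f_end:]


def toy_recording(label, words):
    """Tag every word with the recording it came from (stand-in for audio samples)."""
    return [(label, word) for word in words]


english_words = ["I", "saw", "a", "hat", "on", "the", "table"]
switched_words = ["I", "saw", "a", "màozi", "on", "the", "table"]
target_span = (3, 4)  # hypothetical position of the target word

english_rep1 = toy_recording("eng1", english_words)    # first English repetition
english_rep2 = toy_recording("eng2", english_words)    # second English repetition
switched = toy_recording("cs", switched_words)         # code-switched recording
english_frame = toy_recording("frame", english_words)  # English-only version of the code-switched item

# Cross-splicing: Mandarin target placed into an originally unilingual frame.
cross_spliced = splice(english_frame, target_span, switched, target_span)

# Identity-splicing: English target placed into another repetition of the same sentence.
identity_spliced = splice(english_rep2, target_span, english_rep1, target_span)

print(cross_spliced)
print(identity_spliced)
```

In the cross-spliced result, only the target word originates from the code-switched recording, so nothing in the material leading up to it was produced in anticipation of a Mandarin word.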

Since the spliced and unspliced versions of each sentence were identical content-wise and would sound identical aside from the splicing effect, two lists were created in each experiment to avoid participants hearing the same sentence both spliced and unspliced. In each list, half of the items were spliced. The concept monitoring experiment had 64 distinct target sentences, so that each list had 32 spliced items (along with the 16 catch trial sentences). Participants were randomly assigned to one of two lists at the start of each experiment, with an equal number of participants assigned to each list.
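One way to build two such lists is sketched below in Python. The text does not state how items were distributed across the lists, so alternating by item index is an assumption, as are the function names and the fixed random seed.

```python
import random


def build_lists(item_ids):
    """Give each item a spliced version in one list and an unspliced version in the other."""
    list_a, list_b = {}, {}
    for index, item in enumerate(item_ids):
        spliced_in_a = index % 2 == 0  # assumption: alternate items between lists
        list_a[item] = "spliced" if spliced_in_a else "unspliced"
        list_b[item] = "unspliced" if spliced_in_a else "spliced"
    return list_a, list_b


def assign_participants(participant_ids, seed=0):
    """Randomly split participants into two equally sized groups, one per list."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


items = [f"item{n:02d}" for n in range(1, 65)]  # 64 target sentences
list_a, list_b = build_lists(items)
groups = assign_participants(range(1, 43))      # 42 participants

print(sum(v == "spliced" for v in list_a.values()),
      sum(v == "spliced" for v in list_b.values()))
print(len(groups["A"]), len(groups["B"]))
```

With 64 items, each list contains 32 spliced and 32 unspliced sentences, and no participant hears both versions of the same sentence.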

Procedure

Data collection took place in a sound-attenuated booth in the PhonLab in the Department of Linguistics at the University of California Berkeley. Prior to the experiment, participants were presented with printed English instructions on a computer screen, informing them that they would hear a sentence while an image is displayed on the screen. Instructions stated that participants would hear both English and Mandarin throughout the experiment, and asked that they press a button if they heard the pictured object mentioned in the sentence. An experimenter was present to answer questions, as well as to clarify that: a) the pictured object would sometimes not be mentioned (i.e., in catch trials), and in that case, not to press a button, and b) participants were to press a button if the pictured object were named at all, in either language. Auditory stimuli were presented through headphones. During each trial, participants saw a picture in the center of the computer screen, and heard a spoken sentence that mentioned the pictured object. The task was to press a button as soon as they heard the object mentioned in the sentence. Presentation of trials was randomized, and a 1000 ms delay occurred between trials. Each trial lasted 3000 ms. The experiment lasted approximately fifteen minutes.

This experiment (concept monitoring) was counter-balanced with the next experiment (eye tracking); participants were randomly assigned the order in which to complete the two experiments. After completion of both experiments, participants were administered the Bilingual Language Profile (Birdsong et al., 2012) as well as a questionnaire asking about their code-switching attitudes and behaviors. The entire study lasted around 45 minutes, and participants were compensated $5 for the completion of each of the three components.

Reaction times were measured as the latency between the onset of the target word and the subject's keypress response. Catch trials were first excluded from analysis, so that there were a total of 2688 target trials (64 unique stimuli x 42 participants). Data was then trimmed to remove trials with reaction times that were under 200 ms or longer than the trial duration. This resulted in the loss of 47 observations. Additionally, trials with target words that participants noted as unfamiliar during the familiarization task were excluded. Finally, each participant's mean was calculated, and any reaction times that were more than two standard deviations from that participant's mean were excluded from analysis. Only two observations were removed as outliers in this manner. After trimming, 2506 observations remained for analysis, so that approximately 7% of the target data was excluded.
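The trimming steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis script used in the study: the trial record layout and function name are hypothetical, the upper cutoff is taken to be the 3000 ms trial duration, and the two-standard-deviation criterion is applied to raw (not log-transformed) reaction times.

```python
from statistics import mean, stdev

MIN_RT_MS = 200            # lower cutoff stated in the text
TRIAL_DURATION_MS = 3000   # assumed upper cutoff (trial duration)


def trim_reaction_times(trials, unfamiliar):
    """Apply the exclusion steps in the order described in the text.

    trials: list of dicts with keys "participant", "item", "rt", "is_catch".
    unfamiliar: set of (participant, item) pairs noted during familiarization.
    """
    # Steps 1-3: drop catch trials, out-of-range RTs, and unfamiliar targets.
    kept = [
        t for t in trials
        if not t["is_catch"]
        and MIN_RT_MS <= t["rt"] <= TRIAL_DURATION_MS
        and (t["participant"], t["item"]) not in unfamiliar
    ]

    # Step 4: drop RTs more than two SDs from each participant's own mean.
    rts_by_participant = {}
    for t in kept:
        rts_by_participant.setdefault(t["participant"], []).append(t["rt"])

    trimmed = []
    for t in kept:
        rts = rts_by_participant[t["participant"]]
        if len(rts) < 2:  # SD is undefined for a single observation
            trimmed.append(t)
            continue
        if abs(t["rt"] - mean(rts)) <= 2 * stdev(rts):
            trimmed.append(t)
    return trimmed
```

Because the per-participant mean and standard deviation are computed after the first three exclusions, very fast, very slow, and unfamiliar-word trials do not influence the outlier criterion in this sketch.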

Data analysis

The log-transformed data was modeled with a linear mixed effects regression, shown in Table S1. The model considers an interaction between whether a target word is a code-switch or not (Switch), spliced or unspliced (Splice), and sentence-medial or sentence-final (Position), and includes random slopes for Splice-by-item and Switch-by-subject (Baayen, Davidson & Bates, 2008). As a follow-up analysis, we fitted an alternate model including an interaction of participants’ BLP scores (Dominance) with Switch, Splice, and Position.
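For readers who prefer a formula, the structure of the first model can be written out roughly as follows, with i indexing subjects and j indexing items. This is a sketch based on the verbal description only: the by-subject and by-item random intercepts (u_{0i}, w_{0j}), the coding of the predictors, and the symbol names are assumptions, and the fitted specification is the one reported in Table S1.

```latex
\begin{aligned}
\log(\mathrm{RT}_{ij}) ={}& \beta_0 + \beta_1\,\mathrm{Switch}_j + \beta_2\,\mathrm{Splice}_{ij} + \beta_3\,\mathrm{Position}_j \\
&+ \beta_4\,\mathrm{Switch}_j\,\mathrm{Splice}_{ij} + \beta_5\,\mathrm{Switch}_j\,\mathrm{Position}_j + \beta_6\,\mathrm{Splice}_{ij}\,\mathrm{Position}_j \\
&+ \beta_7\,\mathrm{Switch}_j\,\mathrm{Splice}_{ij}\,\mathrm{Position}_j \\
&+ u_{0i} + u_{1i}\,\mathrm{Switch}_j + w_{0j} + w_{1j}\,\mathrm{Splice}_{ij} + \varepsilon_{ij}
\end{aligned}
```

Switch and Position are properties of an item, whereas Splice depends on both the item and the list to which a subject was assigned. This is why the random slope for Switch is placed on subjects (u_{1i}) and the random slope for Splice on items (w_{1j}).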

Results

Concept monitoring

Table 3 shows average reaction time (in milliseconds) as a function of Switch, Splice, and Position. Generally, reaction times to code-switched targets were slower than to English targets (with the exception of final, unspliced targets), and reaction times to spliced targets were slower than to unspliced targets. However, the most noticeable difference is between reaction times to sentence-medial and sentence-final targets.

Table 3. Mean raw reaction times (ms), as a function of switch, splice, and position. Standard deviations are in parentheses.

Since the data distribution was right-skewed, reaction times were log-transformed.

The linear mixed effects regression model summarized in Table S1 and plotted in Figure 2 suggests that there was a significant effect of Position, and a tendency for Switch to affect reaction time. The target being code-switched is associated with longer reaction times (β = .091, t = 1.912, p = .059). Reaction times to sentence-medial words were significantly longer than those to sentence-final words (β = .217, t = 4.705, p < .001). However, Splice was not a significant effect (β = .047, t = 1.382, p = .172). Additionally, the interaction between Switch and Splice is not significant, suggesting that reaction times for code-switched trials are not predicted to differ significantly depending on whether they were spliced or unspliced.
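Because the model was fit to log-transformed reaction times, the coefficients can be read as approximate proportional changes. The Python sketch below back-transforms the three reported estimates. It assumes a natural-log transform, which the text does not specify, and the function name is hypothetical. Since the model includes interactions, each value describes the effect at the reference levels of the other predictors rather than an overall average.

```python
import math

# Coefficients as reported in the text (log scale).
coefficients = {
    "Switch (code-switched target)": 0.091,
    "Position (sentence-medial target)": 0.217,
    "Splice (spliced target)": 0.047,
}


def percent_change(beta):
    """Convert a log-scale coefficient to a percent change in reaction time."""
    return (math.exp(beta) - 1) * 100


for name, beta in coefficients.items():
    print(f"{name}: about {percent_change(beta):.1f}% longer reaction times")
```

Under these assumptions, the Switch estimate corresponds to reaction times roughly 9.5% longer, the Position estimate to roughly 24.2% longer, and the non-significant Splice estimate to roughly 4.8% longer.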

Fig. 2. Log-transformed mean reaction times, by position, switch, and splicing conditions. Vertical lines represent standard errors.

We also fit a model that included participants' language dominance scores from the Bilingual Language Profile, since our participants’ linguistic backgrounds varied. This model had an interaction between Switch, Splice, Position, and Dominance. While this model performed worse than the one in Table S1 as evaluated by both models’ Akaike Information Criteria and log likelihoods, Switch was significant in this model (β = .013, t = 2.62, p = .01), as was the interaction between Switch and Dominance (β = .0001, t = 2.52, p = .01), and Switch, Position, and Dominance (β = .0001, t = −2.98, p = .003). This suggests that participants with a more positive BLP score (i.e., more English-dominant participants) had slower reaction times to code-switches, especially sentence-medial code-switches.

Due to participants’ different backgrounds in country of origin, age of arrival to the U.S., and age of acquisition of English, we performed further analyses with the original model to determine whether excluding the ten participants who were not born and raised in China (i.e., from Singapore, Malaysia, Taiwan, Hong Kong, or the U.S.) affected the results. This was not the case; the pattern of the results was unchanged. Exclusion of the five simultaneous bilinguals, who were also not born and raised in China, but rather in the U.S., Singapore, and Hong Kong, also did not affect results.

Discussion

The results of this experiment are consistent with the switch cost findings in previous studies: Listeners were slower to recognize code-switched words compared to words in a unilingual utterance. However, the absence of anticipatory phonetic cues did not have an apparent effect on the recognition of the code-switch, contrary to our initial hypothesis.

Assuming the intended anticipatory phonetic cues are present in the speech signal, this result suggests that perhaps Mandarin–English bilingual listeners did not detect or use such cues. However, while the reaction time measure used in this experiment revealed that Mandarin–English bilinguals are slower overall to recognize code-switches, it is possible that phonetic cues did affect the recognition process prior to and at the beginning of the code-switch, but that these effects had already dissipated before the button-press in the concept monitoring task.

The position of the target word had an interesting influence on recognition of code-switches. Though target word position was originally varied to prevent participants from predicting its location in the sentence by using syntactic cues like determiners and possessive pronouns, listeners took longer to recognize sentence-medial targets compared to sentence-final targets, regardless of whether the target was a code-switch or not. This difference could potentially be attributed to the reduction of uncertainty as the sentence progresses. After participants experience several trials, it might become clear that targets only occur medially, finally, or not at all, especially because sentences are controlled for syntactic structure. If participants are strategically expecting targets by syntactic position, rather than monitoring for the concept, then sentence-final targets might be easier. For example, if the participant has already heard the main clause but not the target, then the target is either sentence-final or will not occur.

Alternatively, listeners’ use of phonetic information in sentence processing could be affected by the amount of time they have to incorporate such information; all sentence stimuli were of similar length, so trials with sentence-final targets are preceded by a longer utterance than trials with sentence-medial targets. Future work can manipulate sentence length, word position, and number of catch trials to investigate the difference between medial and final targets.

The model including Dominance suggests that dominant language is a factor in code-switched recognition. English-dominant bilinguals were slower to respond to code-switched, i.e., Mandarin, targets. One interpretation is that switching out of one's dominant language and into the non-dominant language is difficult. Perhaps bilinguals can recognize code-switches more easily if the switch occurs in their dominant language. This pattern is reminiscent of the Inhibitory Control Model (Green, 1998), and what Olson (2017) found in comprehension, though with a different effect of dominant language: instead of switching back into the dominant language being more costly because the dominant language requires stronger inhibition, our bilinguals took longer to switch into their non-dominant language.

Experiment 2: eye tracking

Experiment 1 showed that Mandarin–English bilinguals are slower to recognize code-switched words, but failed to show an effect of the absence of anticipatory phonetic cues on concept monitoring times. While an offline task like the concept monitoring experiment can reveal whether code-switched recognition incurs a switch cost, it may not give insight into the time course of recognition and whether and when phonetic cues are incorporated.

Online tasks such as eye tracking are advantageous for understanding the time course of lexical activation during spoken language comprehension (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995). The visual world paradigm in eye tracking is a particularly good method for studying spoken word recognition (Allopenna, Magnuson & Tanenhaus, 1998; Altmann & Kamide, 2009; Huettig, Rommers & Meyer, 2011).

The visual world paradigm involves a visual display of pictures, with a simultaneous auditory stimulus naming one of the pictures. The pictures represent the target word and various lexical competitors, with participants’ eye movements revealing when certain lexical items are activated during spoken word recognition. The auditory stimulus can be manipulated to test the role of different phonetic details in the process of recognizing a spoken word.

Experiment 2 uses the visual world eye tracking paradigm and splicing to investigate whether withholding anticipatory phonetic cues affects code-switched recognition. The visual world involves a display of four pictures, each corresponding to a different type of lexical candidate, and a simultaneous auditory stimulus so that the time course of lexical access is elucidated by the participant's fixations to pictures during perception of that continuous speech. The goal of this experiment is to probe which lexical candidates are considered during the processing of a code-switch, and whether bilingual listeners use phonetic information to constrain recognition to candidates in the expected language.

We predict that recognition of a code-switch will be hindered by a lack of phonetic cues to that switch. Therefore, in the spliced code-switched condition, we predict that listeners will fixate less on the target as compared to the unspliced code-switched condition, because the phonetic context will lack switch cues and bias them away from Mandarin. Listeners might therefore look at an English competitor early on, expecting a target in the same language as the sentence frame. In the unspliced code-switched condition, listeners will fixate more on the target, since available phonetic cues will bias them toward expecting a Mandarin code-switch. Listeners might also look toward the Mandarin competitor more than in any other condition, since only the unspliced code-switched condition involves phonetic cues signaling an upcoming Mandarin word.