1. Introduction
When viewing a visual scene, several factors influence what features are attended to, and in what order. Some of these factors are the visual saliency of the entities in the scene, the relations between these entities, and a person’s familiarity with, experience with, and general world knowledge about the event. A factor that has also proven important is the language background of the viewer. Cross-linguistic studies have shown that, under certain circumstances, the language of a speaker can bias their attention toward certain aspects of an event, namely those aspects that are typically and frequently mentioned in event descriptions in a given language (Slobin, 1996, 2004), particularly when language is explicitly involved in the task (e.g., Papafragou, Hulbert, & Trueswell, 2008; Sakarias & Flecken, 2019; Sauppe, 2016). The present study examined bilinguals and the influence of language background on attention patterns to events. Specifically, we used event-related potentials (ERPs) to investigate whether the two languages of early bilinguals influence their attention during passive viewing of motion events.
A motion event typically involves the change in place of an entity (figure), tracing a specific trajectory toward a potential endpoint (path of motion) (Talmy, 1985, 2000). Talmy categorized the world’s languages into verb-framed and satellite-framed languages. Satellite-framed languages, such as English, Dutch, and German, have a rich repertoire of motion verbs expressing information about the manner of motion of a moving entity, for example, to walk, stroll, run, and tiptoe. In these languages, information about the path of motion is typically expressed by elements associated with the main verb, the so-called satellites, such as prepositions, particles, and prefixes, for example, to walk in(to), out (of), across, to, and along X. In verb-framed languages, such as Spanish, Turkish, and French, motion verbs typically contain path information, for example, entrer ‘enter’ in French, çıkış ‘exit’ in Turkish, cruzar ‘cross’ in Spanish, and not manner information. The linguistic encoding of manner information is mostly optional in sentences. It can be expressed outside the verb phrase using adjectives (a running man enters X), prepositional phrases (a man enters X on foot), or gerunds, for example, to enter running, if there is a certain contextual or pragmatic need to specify this type of information.
Such cross-linguistic differences in linguistic encoding of motion have been shown to affect perception of motion, particularly when language is used overtly or covertly during experimental tasks. The Thinking-for-Speaking hypothesis (Slobin, 1996, 2004) states that when preparing to speak, people conceive of events in language-specific ways, attending to event aspects that are relevant for verbalization. As a result, with respect to motion, native speakers of satellite-framed languages are more likely to focus on the manner of motion than native speakers of verb-framed languages. The focus on path is comparable between the two language types, as path is a required component of a motion event in all languages (Slobin, 2004). Supporting this view, Gennari et al. (2002) found language-specific attention patterns in verb-framed Spanish and satellite-framed English monolinguals, but only in the condition when participants had to verbally encode the scenes before making similarity judgments about them. Similarly, in Finkbeiner et al. (2002), English native speakers were more likely to judge similarity of the clips of novel motions based on their manner, but only when they had to memorize the target clips in the current view for subsequent similarity judgment. Papafragou, Hulbert, and Trueswell (2008) used eye tracking to measure attention allocation to manner and path regions (endpoints) of scenes in speakers of English and verb-framed Greek. Cross-linguistic differences were found in gaze fixation patterns when the task was to prepare to verbally describe the scenes, but not when the participants were instructed to silently inspect the scenes for memorization.
The present study is interested in bilinguals. We asked whether speaking two typologically distinct languages from early childhood leads to a hybrid pattern in event processing, that is, a pattern of convergence, where bilinguals’ verbal and nonverbal behaviors include features of both their first (L1) and second (L2) language (Alferink & Gullberg, 2014; Jarvis & Pavlenko, 2008). For example, when a bilingual speaker uses a verb-framed language, they would need to ‘overwrite’ habitual specification of manner of motion found in their satellite-framed language, unless it carries important information for a specific situation. Likewise, when using their satellite-framed language, the bilingual speaker would have to specify the manner of motion in most, if not all, descriptions of motion events. Further, a speaker would need to keep in mind that manner information encoded in motion verbs can be combined with satellites encoding path information, a pattern not typical in a verb-framed language (e.g., Berthele & Stocker, 2017; Flecken, Carroll, et al., 2015; Stefanowitsch, 2013). To fulfill these language-specific requirements, a bilingual speaker would need to develop a pattern of event processing that would efficiently and flexibly account for both patterns. This could result in greater reliance on a feature that is acceptable in both languages, that is, motion path, or a decrease in the specification of the information that is only relevant in one of the bilingual’s languages (i.e., manner of motion).
Research suggests that bilingual performance is susceptible to a variety of factors that can tilt the scale toward either L1 or L2 patterns in verbal and nonverbal behaviors. In verbal behaviors, for example, in Berthele and Stocker (2017), German–French bilinguals showed a preference for manner verbs when describing motion video clips in satellite-framed German. But when functioning in a bilingual mode (i.e., using both German and French to perform the task), they converged toward the French pattern, using more path verbs in satellite-framed German. Furthermore, Park (2020) found that while Korean–English sequential (late) bilinguals tended to verbally describe motion events following the satellite-framed English (manner-salient) pattern depending on L2 proficiency, their nonverbal similarity judgments of motion events were mostly consistent with verb-framed L1 Korean. In Flecken, Carroll, et al. (2015), late L2 German speakers whose L1 was verb-framed French watched and described motion events while their gaze fixations were recorded. In verbal descriptions, L2 German speakers did not differ from L1 German speakers in terms of their usage of manner verbs. However, L2 German speakers had a tendency to verbally encode the location of the moving entity (e.g., a car is driving on the road), which is a pattern typical of their L1 French, but not common in L1 German. Such entrenchment of L1 patterns was also reflected in gaze allocation to manner-elements in the scenes, early during speech planning, which patterned with L1 French participants performing the task in their native language.
In nonverbal behaviors, only a few studies examined the perception of motion in bilinguals. Bylund and colleagues examined the perception of motion in bilinguals using a motion event categorization task, where participants had to judge similarity between motion event videos based on either manner or path of motion. They found altered categorization preferences as a function of L2 use, exposure, or frequency of use in late bilinguals (Bylund, Athanasopoulos, & Oostendorp, 2013) and late multilinguals (Bylund & Athanasopoulos, 2014). Lai, Garrido Rodriguez, and Narasimhan (2014) examined the effect of task language on the perception of motion in Spanish–English bilinguals using a similarity judgment task. Participants first watched a video clip of a motion event, listened to a description of the event, and repeated that description in one of their two languages. Then, they were presented with a manner-match and a path-match video on a split screen and were asked to judge which video was more similar to the original video. It was found that, when tested in Spanish, late bilinguals thought that the path-match video was more similar to the original video. But when tested in English, late bilinguals thought that the manner-match video was more similar to the original video. Similarly, in Montero-Melis, Jaeger, and Bylund (2016), Swedish–Spanish late bilinguals judged similarity of motion clips in three conditions: primed with path-describing sentences, manner-describing sentences, and with nothing (control), while performing the task in verb-framed L2 Spanish. Manner-primed participants relied on manner more often than controls, while path-primed individuals did not differ from the control group.
The above findings are consistent with the proposal that the acquisition of an L2 can go hand in hand with the incorporation of new conceptual distinctions or conceptualizations, altering one’s behavior and perhaps cognitive representations of motion, called restructuring (see, e.g., Bassetti & Cook, 2011; Park & Ziegler, 2014; Pavlenko, 2011; Wang & Wei, 2019).
Findings so far pertain to late, or sequential, bilinguals, but the situation may be different in early bilinguals who are brought up in a bilingual household or a household that speaks a language that is different from the language outside of the household. In the lexical domain, early bilinguals tend to converge on both language patterns, rather than build two entirely separate language-specific representations (Ameel et al., 2005, 2009). Presumably, this ensures efficiency of the cognitive system. Consistent with this proposal, in Lai, Garrido Rodriguez, and Narasimhan (2014), regardless of the task language, early Spanish–English bilinguals consistently judged motion event similarity based on the path. Similarly, in Filipović (2011), early Spanish–English bilinguals adhered to a single pattern of motion event lexicalization in a task where they had to watch motion event videos, such that their descriptions in both languages were path-based (similar to monolingual Spanish speakers), and showed no effects of task language (English or Spanish). These findings suggest that early bilinguals chose the (path-salient) pattern that is acceptable in both English and Spanish. In contrast, in Kersten et al. (2010), early bilinguals patterned with English monolinguals regardless of the task language in a motion category discrimination task that featured a variety of novel motions. The novelty of the manners in the motion stimuli might have drawn attention to manner for the purpose of task success.
Based on these findings, we suggest that early bilinguals may develop two systems with significant overlap, where items falling under the overlapping categories show a substantial degree of similarity, while certain language-specificity is still retained (Ameel et al., 2005; Pavlenko & Malt, 2011). Thus, unlike late bilinguals, early bilinguals show a tendency to converge on the patterns shared between their languages, but also demonstrate certain flexibility, choosing the most efficient pattern for the task at hand.
Previous research investigating cross-linguistic differences in motion cognition relied mainly on behavioral measures, while neural correlates of behavioral differences remain understudied. ERPs that index attentional processes and stimulus evaluation can help reveal such language effects. In this study, we investigated whether previously reported behavioral differences in attention to motion events can be found and characterized in a nonverbal task in early bilinguals who acquired two typologically distinct languages simultaneously.
2. The present study
The present study examines to what extent two early acquired typologically distinct language systems in one mind co-determine attention allocation during the viewing of motion events in a nonverbal context, where language use is not required to perform the task. To this end, we examined the effect of language background on the perception of complex motion event scenes in early Turkish–Dutch bilinguals. Non-Turkish speaking Dutch participants were tested as a control group. Turkish is a verb-framed language with low manner saliency (Aksu-Koç, 1994; Özçalışkan & Slobin, 2003), whereas Dutch is satellite-framed with higher manner saliency. Testing early bilinguals who speak two typologically different languages and reside in the same cultural environment allowed us to attribute effects to differences between their two languages instead of their culture. That is, any pattern in the Turkish–Dutch bilingual group that diverges from the Dutch control group may be interpreted as an effect of their knowledge of Turkish. To ensure maximum comparability of the groups, we used Dutch as the language of instruction for both groups.
Following recent studies of the interactions between language and perception using ERPs (Flecken, Athanasopoulos, et al., 2015; Flecken & Van Bergen, 2020; Thierry et al., 2009), we used a design inspired by the visual oddball paradigm, where the frequency of conditions is manipulated to elicit a response. Traditional oddball designs probe early visual perception (e.g., Thierry et al., 2009) that precedes any potential language effects. Given our interest in the processing of complex motion scenes and the interaction with language, we aimed to elicit an ERP P300 response, which was reported previously for complex language–perception interactions in oddball designs (Flecken, Athanasopoulos, et al., 2015). The P300 component is known to reflect attentional processing and evaluation of the incoming stimulus (see Polich, 2007, for a review). Due to its latency, the P300 likely reflects processes prior to sentence formulation, that is, specifically visual attentional processes (Flecken, Athanasopoulos, et al., 2015). Accordingly, the relative magnitude of the P300 effect reflects the degree of the visually perceived match between the oddball stimulus and the preceding stimulus.
During the experiment, bilinguals and Dutch controls performed a motion-matching task with no overt use of language while EEG was recorded. In each trial, participants first watched a short video clip of a motion event, and then viewed a still picture that related to the preceding video clip in one of four ways: full match (response oddball condition; 10% of trials), full mismatch (standard condition; 70%), manner-of-motion match (critical oddball 1; 10%), and endpoint match (critical oddball 2; 10%). Participants were instructed to press a button only in the full-match condition, where the target picture fully matched the preceding video.
We expected a P300 oddball effect for the infrequent full match response condition relative to the frequent full mismatch condition, for both the bilingual and the control groups: The P300 for the full match (response oddball) condition should be more positive than the full mismatch (standard) condition, indicative of heightened task-relevant attentional processing (Polich, 2007). In the two critical oddball conditions (manner match and endpoint match), we planned on comparing the P300 effect (subtracting the standard full mismatch P300 from each of the oddball conditions) for each of them between the Turkish–Dutch bilinguals, who were dominant in their L2 Dutch, and the Dutch control group. One possibility was an enhanced P300 effect for the manner match (relative to the standard full mismatch) in Dutch controls, as compared to Turkish–Dutch bilinguals. This would be because Dutch is a manner-salient language and therefore there might be enhanced attention to manner in Dutch monolingual speakers, as compared to the bilingual speakers for whom manner matters predominantly in only one of their languages (Dutch), and less so in their other language (Turkish). The other, opposite possibility was an enhanced P300 effect for the endpoint match (relative to the standard full mismatch) in bilinguals, as compared to Dutch controls. This would be because motion event endpoints, as an important part of the motion path, represent a particularly important dimension of a motion event (Talmy, 2000). This holds in both satellite-framed and verb-framed languages. In bilingual speakers of a satellite- and a verb-framed language, this might give rise to a pattern of convergence (Alferink & Gullberg, 2014; Jarvis & Pavlenko, 2008), reflected in a strong reliance on the pattern that overlaps in the bilinguals’ two languages.
In this case, this would show up as an attentional bias toward trials showing a match in motion event endpoints.
3. Methods
3.1. Participants
Sixty-one right-handed participants took part in the experiment for payment. All had normal or corrected-to-normal vision, and gave written consent to participate in the experiment, which was approved by the local ethics committee. None reported neurological or psychological disorders. Data from four participants were excluded because fewer than 30 segments remained after pre-processing in at least one of the conditions. Additionally, data from five participants were excluded due to low performance on the behavioral task. The final pool thus consisted of 52 participants. Their language and social backgrounds, collected using an extensive sociolinguistic web questionnaire (TUNE, 2013), are summarized in Table 1.
The Dutch control group included 25 native Dutch speaking students recruited from Radboud University Nijmegen, the Netherlands. All had an intermediate to high proficiency in an L2 (or L3; mainly English and German), but none of them was proficient in a verb-framed language. They reported frequent use of and exposure to languages other than Dutch (mainly English, German) in their environment (at work, at university, through the media, and so on), but all of them reported Dutch as their dominant language of use. They were all born in the Netherlands to two Dutch native-speaking parents, who had also been born in the Netherlands. They had started learning English around the age of 10 (seventh grade in the Dutch school system).
Twenty-seven Turkish–Dutch early bilingual students were recruited from the Turkish student association at Radboud University Nijmegen, from the campus mosque, and from elsewhere in the Netherlands. The bilingual participants were also all born in the Netherlands, but to two Turkish native-speaking parents. They typically spoke Turkish in a family setting at home or with Turkish–Dutch bilingual peers. They typically spent 2 months in Turkey each summer, during which they spoke only Turkish. They started learning and being exposed to Dutch (through teachers and peers) when they entered pre-school around the age of 4. They also started learning English around the age of 10 in classroom settings, similar to the Dutch control group. Bilingual participants’ formal proficiencies in Turkish and Dutch were assessed with the Boston Naming Test (BNT) (Kaplan, Goodglass, & Weintraub, 1983). BNT scores indicated that Turkish–Dutch bilinguals were most proficient in Dutch (Turkish: M = 69.54, SD = 15.93; Dutch: M = 109.88, SD = 14.11).
3.2. Materials and procedure
The materials consisted of 40 1.5-second clip-art animations and still pictures based on each animation (Fig. 1). Each animation depicted a motion event, in which a schematic human figure moved in a specific manner of motion toward a specific endpoint object or location (path of motion). The manner of motion of a figure was expressed either with a specific instrument (e.g., a sleigh, a bicycle) or without it (e.g., figure crawling, jumping, and dancing). Each figure corresponded to one specific manner of motion. The path of motion was operationalized as an endpoint toward which the figure moved. The trajectory of motion was not controlled for, as it was not relevant to the task. Eighteen figures moved along a horizontal trajectory, another 18 along a diagonal trajectory, and 8 along a vertical trajectory. Each figure moved along only one type of trajectory. The endpoint was an object, which could be entered (e.g., a tunnel and a door) or not (e.g., a mirror, a bench, and a ramp). In all scenes, the path of motion was represented by the endpoints and by the end of the video the moving figure arrived at the endpoint (i.e., the sliding image of the figure stopped right at the still endpoint-object by the time the video froze). The pictures showed a similar constellation of manner- and path-elements (figure and endpoint). Thus, during the task participants compared two scene elements (manner of motion depicted by the figure and path of motion depicted by the endpoint) in the pictures to the same two elements previously witnessed in the animation. We did not control for the degree of salience of either manner or path of motion scenes used as stimuli. Stimuli were created using Microsoft clip-art images and Adobe Premiere.
The clip-art animations and pictures were combined into 400 video–picture pairs, which fell into four conditions: (1) full match (manner of motion and endpoint (path) were identical between the video and the picture), (2) manner match (manner of motion identical, endpoint (path) different), (3) endpoint match (endpoint (path) identical, manner of motion different), and (4) full mismatch (both elements different in video and picture). See Fig. 1 for an example and Table A1 for a full list of motion manners and endpoints used as stimuli. In total, there were 40 unique motion event constellations.
In the EEG session, 70% of trials (280 trials) were full mismatch trials and the three other conditions each occurred in 10% of all trials (40 trials each). Each picture was preceded by the full match, the manner match, or the endpoint match video only once during the experiment (40 trials for each condition), whereas it was preceded by a fully mismatching video seven times (280 trials for the full mismatch condition). Four pseudo-randomized lists with four blocks of 100 trials were built following a Latin-square design, and the order of the conditions for each item was varied (e.g., list 1: item 1, condition 1 first; list 2: item 1, condition 2 first). Furthermore, lists were constructed such that the response trial (full match condition) and the two critical conditions (manner match, endpoint match) each appeared only once every 10 trials. Stimuli were presented using a script written for Presentation (Neurobehavioral Systems).
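The trial proportions and the once-per-10-trials constraint described above can be illustrated with a short sketch. It is not the authors' list-generation procedure: it assumes that "once every 10 trials" means once per consecutive chunk of 10 trials, it ignores the assignment of items to conditions and the Latin-square rotation across lists, and the names `build_trial_sequence` and `split_into_blocks` are hypothetical.

```python
import random

# Condition counts taken from the text: 400 trials in total.
CONDITION_COUNTS = {
    "full_mismatch": 280,   # standard, 70%
    "full_match": 40,       # response oddball, 10%
    "manner_match": 40,     # critical oddball 1, 10%
    "endpoint_match": 40,   # critical oddball 2, 10%
}
ODDBALLS = ["full_match", "manner_match", "endpoint_match"]


def build_trial_sequence(seed=0, chunk_size=10):
    """Return condition labels so that every consecutive chunk of
    `chunk_size` trials holds exactly one trial per oddball condition
    (assumed reading of 'once every 10 trials')."""
    rng = random.Random(seed)
    n_chunks = sum(CONDITION_COUNTS.values()) // chunk_size  # 40 chunks
    sequence = []
    for _ in range(n_chunks):
        # One of each oddball, the remaining slots are standards.
        chunk = list(ODDBALLS)
        chunk += ["full_mismatch"] * (chunk_size - len(chunk))
        rng.shuffle(chunk)
        sequence.extend(chunk)
    return sequence


def split_into_blocks(sequence, block_size=100):
    """Cut the sequence into consecutive blocks of `block_size` trials."""
    return [sequence[i:i + block_size]
            for i in range(0, len(sequence), block_size)]


sequence = build_trial_sequence(seed=1)
blocks = split_into_blocks(sequence)
```

With 40 chunks of 10 trials, the sketch reproduces the reported totals (280 standards and 40 trials per oddball condition) in four blocks of 100 trials.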
Participants were instructed to press a button only if the picture looked exactly like the preceding video clip (full match condition). They were told that the picture was always smaller than the video, and would only imply motion. Stimuli were presented against a white background on a 19-inch CRT monitor, with the pictures appearing in centered position, covering 250 pixels in the middle of the screen, and 7 cm in length as well as width. Participants were seated 100 cm from the screen, ensuring that the visual angle covered no more than 2° for each eye. The video was shown for its full duration of 1,500 ms, followed by a white screen with a focus point of 500 ms. Then, the target picture appeared for 200 ms, after which a white screen was shown for 800 ms. Participants were instructed to hold their response (if necessary) until a black question mark appeared on the screen (for 1,000 ms), and then to respond as quickly as possible. They were also instructed to blink after the question mark appeared on the screen. No additional measures were taken to actively prevent participants from any potential implicit or tacit use of language during the task.
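As a rough check on the viewing geometry and trial timing reported above, the following sketch computes the visual angle of a 7 cm picture viewed from 100 cm and sums the trial phases. The function and variable names are illustrative only, and reading the reported 2° as the angle from fixation to the picture edge is an assumption.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle (degrees) subtended by a stimulus of the given
    size, centred on the line of sight."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


def eccentricity_deg(size_cm, distance_cm):
    """Angle (degrees) from fixation to the edge of a centred stimulus."""
    return math.degrees(math.atan((size_cm / 2) / distance_cm))


# Values from the text: 7 cm picture, 100 cm viewing distance.
full_deg = visual_angle_deg(7, 100)
half_deg = eccentricity_deg(7, 100)

# Trial phases and durations (ms) as listed in the text.
TRIAL_PHASES_MS = [
    ("video", 1500),
    ("fixation", 500),
    ("target picture", 200),
    ("blank screen", 800),
    ("response prompt", 1000),
]
trial_ms = sum(duration for _, duration in TRIAL_PHASES_MS)
session_min = 400 * trial_ms / 1000 / 60  # 400 trials, breaks not included
```

Under these assumptions the picture spans roughly 4.0° in total, or about 2.0° to either side of fixation, and one trial lasts 4,000 ms, so 400 trials take a little under 27 minutes of stimulation time.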
The experiment started with written instructions (in Dutch) on the computer screen, followed by a practice session of 10 trials, including 1 full match response trial. The experimenter, a native Dutch speaker, gave feedback with respect to the timings of button presses and blinks. The EEG session lasted for about 50 minutes including capping. After the EEG session, participants performed the Boston Naming Test in both Turkish and Dutch (order was counterbalanced, duration was 5 minutes each). This was followed by another, unrelated study. In total, the procedure lasted for 120 minutes.
3.3. EEG recording and data pre-processing
Electrophysiological data were recorded from 28 cap-mounted electrodes (Acticap), placed according to the 10–20 convention, at a sampling rate of 1 kHz, using BrainVision Recorder 1.1. An additional two electrodes were placed at the outer canthus of each eye to monitor horizontal eye movements, and another two above and below the left eye to monitor blinks and other vertical eye movements. One electrode was placed on the right mastoid. EEG was recorded in reference to the left mastoid. Impedances were kept below 10 kΩ. Offline, the data were re-referenced to the average of the two mastoids. The data were preprocessed using BrainVision Analyzer 2. EEG activity was filtered offline with a zero phase-shift bandpass filter (low cut-off: 0.1 Hz; high cut-off: 30 Hz, 24 dB/oct). Blinks and horizontal eye movements were corrected on the basis of the four electrodes used for recording eye movements, using Independent Component Analysis with the Infomax algorithm. Data were segmented into epochs ranging from −200 to 1,000 ms relative to the onset of the target picture and baseline corrected in reference to the 200 ms of pre-stimulus activity. Automatic artifact rejection discarded all epochs with an activity value difference exceeding ±100 µV. Epochs were also visually inspected for contamination by muscle movement due to premature button presses (trials with a button press within 800 ms after picture onset were excluded). Contaminated trials were removed. The remaining segments were averaged per participant and per condition. At least 30 trials were included in each (oddball) condition for each participant (full match: M = 38.73 (30–40), SD = 2.09; manner match: M = 38.60 (32–40), SD = 1.99; endpoint match: M = 38.29 (31–40), SD = 2.38; full mismatch: M = 267.65 (220–280), SD = 15.37).
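The epoch-level steps described above (baseline correction, amplitude-based rejection, averaging) can be sketched for a single channel as follows. This is only an illustration in plain Python, not the BrainVision Analyzer pipeline used in the study. In particular, treating the rejection criterion as a peak-to-peak difference within the epoch is an assumption, and all function names are hypothetical.

```python
def baseline_correct(samples, times_ms, baseline=(-200, 0)):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    start, end = baseline
    base = [v for v, t in zip(samples, times_ms) if start <= t < end]
    offset = sum(base) / len(base)
    return [v - offset for v in samples]


def exceeds_threshold(samples, max_diff_uv=100.0):
    """Flag an epoch whose peak-to-peak amplitude exceeds the threshold
    (assumed reading of the 'activity value difference' criterion)."""
    return max(samples) - min(samples) > max_diff_uv


def average_epochs(epochs):
    """Point-by-point mean across epochs of equal length."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


# Toy single-channel data (microvolts), one sample every 100 ms
# from -200 ms to 900 ms relative to picture onset.
times_ms = list(range(-200, 1000, 100))
clean = [1.0, 3.0] + [2.0 + i for i in range(10)]
noisy = list(clean)
noisy[5] = 150.0  # simulated artifact

corrected = [baseline_correct(e, times_ms) for e in (clean, noisy)]
kept = [e for e in corrected if not exceeds_threshold(e)]
erp = average_epochs(kept)
```

In the toy data, the epoch with the simulated artifact is discarded and the average is computed over the remaining epoch only.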
3.4. Statistical analyses
We used the following approach for the analyses: first, to verify that our experimental manipulation worked, we entered mean amplitudes for the P300 time window in the oddball condition requiring a response (full match) and the standard condition (full mismatch) into a mixed ANOVA with Condition (full match and full mismatch) and Region (frontal-central and central-parietal) as within-subjects factors and Group (Turkish–Dutch bilinguals, Dutch controls) as the between-subjects factor. Due to averaging of the EEG signal per condition, item variance was not incorporated into the analyses. Next, we conducted focused analyses of the critical oddball conditions (endpoint match and manner match) using mean amplitudes of difference waves (each critical oddball minus the full mismatch) in the P300 time window as well as the late positivity (LP) time window (exploratory analysis). Mean amplitudes of difference waves were entered into a mixed ANOVA with Condition (endpoint match and manner match) and Region (frontal-central and central-parietal) as within-subjects factors and Group (Turkish–Dutch bilinguals and Dutch controls) as the between-subjects factor. For all analyses, in case of a three-way interaction, the data were split by Region and subjected to follow-up mixed ANOVAs with Condition and Group as fixed factors. In case of significant two-way interactions with Region, the data were split by Region and follow-up mixed ANOVA pairwise comparisons were conducted within each region. In case of a significant Group by Condition interaction, the data were split by Group and Repeated Measures ANOVA (RM-ANOVA) pairwise comparisons were conducted comparing conditions within each group. In case of main effects, post hoc paired samples t-tests were conducted.
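The dependent measure in these analyses, the mean amplitude of a difference wave within a time window, can be sketched as below. The sketch is illustrative rather than the authors' R code: it works on one averaged waveform per condition, uses the 350–700 ms and 700–1,000 ms windows reported in the ERP results, treats the windows as half-open intervals (an assumption), and uses hypothetical function names and toy data.

```python
P300_WINDOW = (350, 700)   # ms, from the ERP results section
LP_WINDOW = (700, 1000)    # ms, exploratory late positivity window


def difference_wave(oddball, standard):
    """Pointwise oddball-minus-standard difference of two averaged ERPs."""
    return [o - s for o, s in zip(oddball, standard)]


def mean_amplitude(erp, times_ms, window):
    """Mean of the samples whose time stamps fall in [start, end)."""
    start, end = window
    values = [v for v, t in zip(erp, times_ms) if start <= t < end]
    return sum(values) / len(values)


# Toy averaged waveforms (microvolts), one sample every 50 ms.
times_ms = list(range(-200, 1000, 50))
full_mismatch = [0.5 for _ in times_ms]
endpoint_match = [2.5 if 350 <= t < 700 else 0.5 for t in times_ms]

diff = difference_wave(endpoint_match, full_mismatch)
p300_effect = mean_amplitude(diff, times_ms, P300_WINDOW)
lp_effect = mean_amplitude(diff, times_ms, LP_WINDOW)
```

One such value per participant, condition, and region would then enter the mixed ANOVA described above.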
All analyses and plotting were carried out in R (Version 4.0.3). ANOVAs were fitted using R package ez ver. 4.4.0 (Lawrence, 2016). T-tests were conducted using R package stats ver. 4.0.3 (R Core Team, 2020). Effect sizes were calculated using R packages effectsize ver. 0.4.0 (Ben-Shachar, Lüdecke, & Makowski, 2020) and psychReport ver. 3.0.1 (Mackenzie & Dudschig, 2021). Tidyverse packages (Wickham et al., 2019) as well as packages splitstackshape ver. 1.4.8 (Mahto, 2019), plyr ver. 1.8.6 (Wickham, 2011), gdata ver. 2.18.0 (Warnes et al., 2017), rstatix ver. 0.6.0 (Kassambara, 2020), and psych ver. 2.0.9 (Revelle, 2020) were used for the preparation of exported preprocessed and grand-averaged data for analyses. Graphs were plotted using tidyverse packages (Wickham et al., 2019) as well as R packages grid ver. 4.0.3 (R Core Team, 2020) and ggpubr ver. 0.4.0 (Kassambara, 2020).
Data files as well as analyses and plotting scripts are accessible at https://tinyurl.com/2fjhxa3m.
4. Results
4.1. Behavioral results
The mean numbers of responses (button presses) in each group and each condition are listed in Table 2. Data were entered in a mixed ANOVA of 4 conditions (full match, manner match, endpoint match, full mismatch) by 2 groups (bilinguals, Dutch control). There was a main effect of Condition (F(3,150) = 970.09, p < 0.001, ηp² = 0.95). There were more responses for the full match (response) condition than each of the other conditions (full match vs. full mismatch, p < 0.001 Holm corrected, Cohen’s d = 5.38; full match vs. manner match, p < 0.001 Holm corrected, Cohen’s d = 5.38; full match vs. endpoint match, p < 0.001 Holm corrected, Cohen’s d = 4.33). There was no significant Condition by Group interaction (F(3,150) = 0.13, p = 0.94, ηp² = 0.003), and no Group main effect (F(1,50) = 0.68, p = 0.41, ηp² = 0.01), which validated correct performance on the task in both groups.
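The reported effect sizes can be cross-checked against the F values using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is a minimal sketch with a hypothetical name; it is not part of the authors' analysis scripts, which used the R packages listed in the statistical analyses section.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its
    degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in the text.
condition = partial_eta_squared(970.09, 3, 150)
interaction = partial_eta_squared(0.13, 3, 150)
group = partial_eta_squared(0.68, 1, 50)
```

Rounded to the precision used in the text, these come out at 0.95, 0.003, and 0.01, in line with the reported values. The error degrees of freedom also agree with the sample: 52 participants in 2 groups give 50, and 3 × 50 = 150 for the within-subjects terms.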
Reaction times were not analyzed because of a delay between target picture onset and response and because of the limited number of data points (button presses were only required for full matches, accounting for 10% of trials).
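The pairwise comparisons in this subsection are reported as Holm corrected. For readers unfamiliar with the procedure, the sketch below shows the Holm step-down adjustment in Python. It is an illustration under stated assumptions: the function name and the example p-values are hypothetical and are not taken from the study, whose analyses were carried out in R.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = (m - rank) * p_values[idx]
        # Adjusted values may never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (not from the study).
print(holm_adjust([0.01, 0.04, 0.03]))
```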
4.2. ERP results
Grand-averaged ERP waveforms in the frontal-central (F3, F4, Fz, F7, F8, FC1, FC2, FC5, and FC6) and central-parietal (CP1, CP2, CP5, CP6, P3, P4, Pz, P7, and P8) electrode groups for the Dutch control group and the Turkish–Dutch bilinguals are shown in Figs. 2 and 3, respectively.
Visual inspection indicated that the full match condition (the response oddball) was more positive than the other three conditions (full mismatch, manner match, and endpoint match) across both electrode groups in both participant groups; we identified this positivity as the P300. At frontal-central electrodes, the positive deflection started as early as 150 ms and peaked around 350 ms, while at central-parietal electrodes, the positive wave peaked later in the Dutch control group, at ~500 ms, and at ~350 ms in the Turkish–Dutch bilingual group, consistent with a typical P300 (Kok, Reference Kok2001; Polich, Reference Polich2007). In the Dutch control group, the critical oddballs (manner match and endpoint match) did not visually differ from each other in any of the electrode groups. In the bilingual group, the critical oddballs (manner match and endpoint match) diverged from each other beginning at ~600 ms, with this difference sustained through 1,000 ms, particularly at frontal-central electrodes, such that the endpoint match appeared more positive than the manner match.
In the statistical analyses, the selections of time windows and electrodes were based on related prior oddball paradigm studies (Flecken, Athanasopoulos, et al., Reference Flecken, Athanasopoulos, Kuipers and Thierry2015; Flecken & Van Bergen, Reference Flecken and Van Bergen2020) as well as visual inspection of the current data. We used 350–700 ms as the P300 time window. Based on visual inspection, we also selected an additional 700–1,000 ms time window for exploratory analyses of the LP observed in the bilingual group. We focused on two regions: the frontal-central region (F3, F4, Fz, F7, F8, FC1, FC2, FCz, FC5, and FC6) and the central-parietal region (CP1, CP2, CP5, CP6, P3, P4, Pz, P7, and P8).
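To make the windowing concrete, the following Python sketch averages an ERP over the electrodes of a region and the samples of a time window, using the regions and windows listed above. Everything else is an assumption made for illustration: the data are random placeholders, the 500 Hz sampling grid and epoch limits are hypothetical, and treating each window as half-open (start included, end excluded) is a choice of this sketch, not something stated in the text.

```python
import numpy as np

# Regions and time windows as listed in the text.
REGIONS = {
    "frontal-central": ["F3", "F4", "Fz", "F7", "F8", "FC1", "FC2", "FCz", "FC5", "FC6"],
    "central-parietal": ["CP1", "CP2", "CP5", "CP6", "P3", "P4", "Pz", "P7", "P8"],
}
WINDOWS_MS = {"P300": (350, 700), "LP": (700, 1000)}


def mean_amplitude(erp, ch_names, times_ms, region, window):
    """Average an ERP (channels x samples) over a region's electrodes and a time window."""
    rows = [ch_names.index(ch) for ch in REGIONS[region]]
    start, stop = WINDOWS_MS[window]
    cols = (times_ms >= start) & (times_ms < stop)  # half-open window (assumption)
    return float(erp[rows][:, cols].mean())


# Placeholder data: random values standing in for one participant's averaged ERP.
ch_names = REGIONS["frontal-central"] + REGIONS["central-parietal"]
times_ms = np.arange(-200, 1000, 2)  # hypothetical 500 Hz sampling grid
rng = np.random.default_rng(0)
erp = rng.normal(0.0, 1.0, size=(len(ch_names), times_ms.size))

for region in REGIONS:
    for window in WINDOWS_MS:
        value = mean_amplitude(erp, ch_names, times_ms, region, window)
        print(f"{region}, {window}: {value:.3f}")
```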
4.2.1. P300 (350–700 ms)
First, we verified that our experimental manipulation worked. That is, is there a P300 effect for the response oddball (full match) condition in bilinguals and controls?
There was a significant interaction of Condition by Region (F(1,50) = 5.94, p = 0.01, ηp² = 0.11) as well as main effects of Group (F(1,50) = 6.95, p = 0.01, ηp² = 0.12), Condition (F(1,50) = 54.46, p < 0.001, ηp² = 0.52), and Region (F(1,50) = 58.02, p < 0.001, ηp² = 0.54). To investigate the interaction, we collapsed Group, split the data by Region, and conducted RM-ANOVAs within each region with Condition as the within-subjects factor. Both yielded a main effect of Condition (frontal-central: F(1,51) = 34.76, p < 0.001, ηp² = 0.41; central-parietal: F(1,51) = 61.29, p < 0.001, ηp² = 0.55). Post hoc tests indicated that mean raw amplitudes in the full match condition were more positive than in the full mismatch condition in both regions (frontal-central: t(51) = 5.90, p = 0.001 corrected, Cohen’s d = 0.82; central-parietal: t(51) = 7.83, p < 0.001 corrected, Cohen’s d = 1.09), confirming the classic P300 effect in the response oddball condition (full match). There were no significant interactions with Group (all p > 0.05), suggesting that the two groups were matched in terms of their classic oddball reactions.
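The Cohen’s d values reported for the paired post hoc tests are numerically consistent with the within-subject conversion d = t / √n, where n = df + 1 is the number of participants. The Python sketch below applies that conversion to the reported t statistics. That this was the exact formula used is an assumption; the text states only that effect sizes were computed with R packages, and the function name here is hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value: float, df: int) -> float:
    """Convert a paired-samples t statistic to Cohen's d (d_z) using n = df + 1."""
    n = df + 1
    return t_value / math.sqrt(n)


# Paired post hoc tests as reported (full match vs. full mismatch).
for region, t_value, df in [("frontal-central", 5.90, 51), ("central-parietal", 7.83, 51)]:
    print(f"{region}: d = {cohens_d_from_paired_t(t_value, df):.3f}")
```

Rounded to two decimals, the results match the reported values of 0.82 and 1.09.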
Next, we conducted a focused analysis of the two critical oddball conditions (manner match and endpoint match) using mean difference waves. Mean amplitudes of the difference waves were obtained by subtracting the standard full mismatch condition from each of the oddball manner match and the oddball endpoint match conditions. There were no main effects or interactions (all p > 0.3). No further analyses were conducted.
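The difference-wave step described above amounts to a sample-by-sample subtraction followed by averaging within the analysis window. The Python sketch below illustrates this on synthetic waveforms for a single region. The waveform values, the sampling grid, and the function names are hypothetical and do not reproduce the study’s data or scripts.

```python
import numpy as np


def difference_wave(oddball, standard):
    """Subtract the standard (full mismatch) waveform from an oddball waveform."""
    return np.asarray(oddball, dtype=float) - np.asarray(standard, dtype=float)


def window_mean(wave, times_ms, start_ms, stop_ms):
    """Mean amplitude of a waveform within a half-open time window (assumption)."""
    mask = (times_ms >= start_ms) & (times_ms < stop_ms)
    return float(wave[mask].mean())


# Synthetic single-region waveforms on a hypothetical 500 Hz sampling grid.
times_ms = np.arange(0, 1000, 2)
standard = np.full(times_ms.size, 1.0)   # full mismatch (standard)
manner_match = standard + 0.5            # arbitrary offsets, for illustration only
endpoint_match = standard + 0.25

for name, wave in [("manner match", manner_match), ("endpoint match", endpoint_match)]:
    diff = difference_wave(wave, standard)
    print(f"{name}: {window_mean(diff, times_ms, 350, 700):.2f}")
```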
4.2.2. Late positivity (700–1,000 ms)
Next, we conducted exploratory analyses. Our question for this analysis was: Do the LP components for the critical oddball conditions differ across bilinguals and controls?
The mixed ANOVA on the mean difference waves with Condition and Region as within-subjects factors and Group as the between-subjects factor yielded a Group by Condition interaction (F(1,50) = 6.00, p = 0.02, ηp² = 0.11) as well as a main effect of Region (F(1,50) = 15.41, p < 0.001, ηp² = 0.24). To explore the interaction further, we conducted separate RM-ANOVAs comparing conditions within each group. In the Turkish–Dutch bilingual group, there was a main effect of Condition (F(1,26) = 6.33, p = 0.02, ηp² = 0.20), such that the endpoint match elicited more positive amplitudes than the manner match condition (t(26) = 2.52, p = 0.02 corrected, Cohen’s d = 0.48) and a main effect of Region (F(1,27), p = 0.005, ηp² = 0.26). In the Dutch control group, only a main effect of Region was found (F(1,24) = 6.55, p = 0.02, ηp² = 0.21). There was no difference between conditions (p > 0.2).
5. Discussion
The present study tested whether a bilingual background influences motion event perception. Participants were Turkish–Dutch early bilinguals whose two languages are typologically different in terms of motion event encoding, and a control group of (non-Turkish speaking) Dutch native speakers. During the experiment, in each trial, participants viewed a motion event video showing a figure moving in a specific manner of motion (e.g., skiing) toward a specific endpoint (e.g., a tunnel) first. Then, they viewed a still picture depicting a motion event that matched or mismatched the prior video in four conditions (10% full match, 10% manner match, 10% endpoint match, and 70% full mismatch). Their task was to judge whether the video and the still picture matched fully and press a button when they did.
In the ERP P300, which indexes attention to explicit task requirement (Polich, Reference Polich2007), we found a P300 effect for the oddball full match condition that required a response, in comparison to the standard full mismatch condition, across both groups. This suggests that both the Turkish–Dutch bilinguals and Dutch controls were equal in their abilities to attend to the full match condition as required by the task. In our critical comparison between the oddball conditions of manner match and endpoint match, the P300 effects in the bilinguals and the Dutch controls did not differ. This suggests that first, in Dutch monolinguals, task-related attention to manner was similar to attention to endpoints in motion event depictions. Second, the bilinguals did not show enhanced endpoint saliency, as compared to Dutch controls. This suggests that the bilinguals’ knowledge of Turkish, a verb-framed language with more focus on path and less focus on manner, did not affect early task-related attention in a nonverbal picture-video matching task. Our results suggest that the habitual use of a satellite-framed language like Dutch, either as an early Turkish–Dutch bilingual or a non-Turkish speaking Dutch native speaker, renders equal early task-related attention to both manner and path elements, without a stronger overt attentional focus on either when performing the picture-video matching task. Path of motion is also highly relevant in satellite-framed languages, given that it reflects the core schema of a motion event (Slobin, Reference Slobin, Strömqvist and Verhoeven2004; Talmy, Reference Talmy2000). Accordingly, our findings suggest that, with manner-encoding verbs being more ubiquitous in satellite-framed languages, the saliency of manner is on a par with path in speakers of satellite-framed languages.
An LP was found in the exploratory analysis of the 700–1,000 ms time window, more positive for the endpoint-matching oddball stimuli than the manner-matching oddballs, with a parietal scalp distribution. This effect was significant only in the Turkish–Dutch bilingual group, not in the Dutch controls. In what follows, we provide several possible interpretations of the observed LP effect: (1) reanalysis of an initially processed feature given more context, (2) recollection of a specific feature, (3) reorientation of attention to a specific feature, and (4) language-modulated attention where a bilingual’s less dominant language exerts its influence in this time window.
A similar late positive component (often termed LPC, also P600) has been previously reported for nonmatching stimuli in both verbal and nonverbal domains. In the nonverbal domain, such a P600 effect was found in nonlinguistic but syntax-like processing (e.g., Christiansen, Conway, & Onnis, Reference Christiansen, Conway and Onnis2012). In the verbal domain, the P600 effect was originally viewed as an index of syntactic/structural processing (Hagoort, Brown, & Groothusen, Reference Hagoort, Brown and Groothusen1993; Hagoort, Brown, & Osterhout, Reference Hagoort, Brown and Osterhout1999; Osterhout & Holcomb, Reference Osterhout and Holcomb1992). However, it was later shown that the P600 is not specific to structural processing, but is also sensitive to semantic anomalies (Sitnikova et al., Reference Sitnikova, Holcomb, Kiyonaga and Kuperberg2008; Vissers et al., Reference Vissers, Kolk, Van de Meerendonk and Chwilla2008). Vissers et al. (Reference Vissers, Kolk, Van de Meerendonk and Chwilla2008) proposed the monitoring theory, which suggests that the P600 reflects a reanalysis of the stimulus resulting from a misalignment between syntactic and plausibility analyses. Yet another theory suggests that the P600 is a member of the domain-general P300 ERP family (Sassenhagen et al., Reference Sassenhagen, Schlesewsky and Bornkessel-Schlesewsky2014; Sassenhagen & Fiebach, Reference Sassenhagen and Fiebach2019), reflecting response selection or classification resulting from subjective salience of the stimulus. Synthesizing the above-mentioned literature and taking into account our current data, our view is that the LP/LPC/P600 reflects a reanalysis process that seeks to resolve an incongruence introduced by the ill fit of a specific feature in a given (verbal or nonverbal) context.
The second possible interpretation of the observed LP is recollection of features. In the dual-process model of recognition memory (e.g., Yonelinas, Reference Yonelinas2001), when processing a stimulus that is recognized as one that has been experienced previously, two processes are engendered: familiarity, or a feeling of knowing, and recollection, which reflects a retrieval of qualitative information related to the recognized item (Rugg & Curran, Reference Rugg and Curran2007). Recollection is conceived of as a slower process that involves accessing not only the prior occurrence of the episode, but also its specific features. An ERP signature of this process is a late positive shift, often posteriorly distributed, which has been found for the recollection of words and pictures (Curran, Reference Curran2000; Curran & Cleary, Reference Curran and Cleary2003; Duarte et al., Reference Duarte, Ranganath, Winward, Hayward and Knight2004; Galli & Otten, Reference Galli and Otten2011; Johansson et al., Reference Johansson, Stenberg, Lindgren and Rosén2002; Kuo & Van Petten, Reference Kuo and Van Petten2006; Rugg et al., Reference Rugg, Mark, Walla, Schloerscheidt, Birch and Allan1998; Woodruff, Hayama, & Rugg, Reference Woodruff, Hayama and Rugg2006). Accordingly, we suggest that the observed LP might indicate that Turkish–Dutch bilinguals engaged in feature recollection when they encountered endpoint-matching stimuli.
The third potential explanation is reorientation of attention toward a specific feature. In our experimental context, the feature that triggered such later processing could be either the matching endpoint or the mismatching manner. If the matching endpoint was the feature that gave rise to the LP, it could very well be a delayed P300 component, reflecting the matching of the current still picture to the mental representation (Kok, Reference Kok2001) based on the preceding motion video. In this scenario, despite task-relevant attention indexed by the P300 being distributed between manner and endpoint, bilinguals’ later processing of the stimuli indexed by the LP could nevertheless reflect an attention bias toward the matching endpoint. This suggests a two-stage model of how Turkish–Dutch bilinguals process motion events: First, in the P300 time window (350–700 ms), they attended to manner and path simultaneously. Second, in a later time window (700–1,000 ms), enhanced processing of path was evidenced. We speculate that the second stage could indicate reorientation of attention toward the endpoints to reassess the degree of match. Such a delayed effect could stem from Dutch dominance in this early bilingual group. That is, language-specific monitoring (Kolk et al., Reference Kolk, Chwilla, Van Herten and Oor2003) could be run consecutively, rather than simultaneously, with the dominant-Dutch attention pattern preceding the pattern consistent with the nondominant Turkish, partially due to the task language being Dutch.
Finally, the last potential interpretation of LP is that it was triggered by the mismatching manner. According to our predictions, manner information could be less salient in bilinguals whose languages belong to different typological categories. Nevertheless, it could become a salient feature for the purpose of the matching task (cf. Kersten et al., Reference Kersten, Meissner, Lechuga, Schwartz, Albrechtsen and Iglesias2010), and thus require attention. There is an important linguistic consideration that needs to be accounted for in this scenario. In Turkish sentences, the verb typically comes at the end of the sentence (e.g., Adam trafik ışığ-ı-na doğru ilerl-(i)yor ‘The man toward the traffic light is proceeding’). If a Turkish speaker decides to encode manner, it could be encoded in the verb at the end of a sentence (e.g., Adam trafik ışığ-ı-na doğru bisiklet sürüyor ‘The man toward the traffic light is cycling’), or in an adjunct preceding the verb (e.g., Adam trafik ışığ-ı-na doğru bisiklet sür-erek ilerl-(i)yor ‘The man toward the traffic light by riding the bicycle is proceeding’). While this highlights the individual preference nature of manner encoding in Turkish, it is also important to point out that, in case the manner verb is used, the manner information would be encoded after the endpoint information, and thus would require initial attention toward endpoint, and only following that, attention to manner. In this scenario, LP could again indicate two stages of visual attention toward motion in Turkish–Dutch bilinguals, where Dutch-consistent task-related attention pattern is used first (resulting in equal attention toward path and manner in the P300 domain), and Turkish-consistent later attention toward manner comes second.
The absence of P300 effects for the critical oddballs between groups could further be explained by a number of other factors. First, the experiment was conducted in Dutch, where no Turkish cues were given to prompt a Turkish or a bilingual mode (Grosjean, Reference Grosjean and Nicol2001). Literature suggests that early bilinguals may have heightened attention toward their environment language (e.g., Kuipers & Thierry, Reference Kuipers and Thierry2010). Specifically, in the motion event perception domain, Lai, Garrido Rodriguez, and Narasimhan (Reference Lai, Garrido Rodriguez and Narasimhan2014) found that (late) bilinguals oriented toward manner more when they used Spanish (verb-framed) language during the experiments, not when they used English (satellite-framed) language. Our findings suggest that in the case of Dutch-dominant early Turkish–Dutch bilinguals, the language of the instructions (and the testing environment more generally) might have enhanced the activation of Dutch, which happened to be beneficial for the task performance. This speaks to the role of the dominant, more activated language in typology-driven attention effects in early bilinguals: the dominant, most activated language may affect early attentional processes, while the weaker, less activated language likely affects later processing, reflected by LP. However, to confirm this possibility as well as to distinguish between specific contributions of language dominance and relative activation, future research with closely matched bilingual groups performing the task with Turkish and Dutch instructions is needed.
Second, our design did not include a verbal interference task to prevent participants from using language as a tool to memorize the motion details in the video clip right before the presentation of the still picture, during which they had to make a decision about the match. In other words, it is possible that participants could rely on their verbal working memory (Athanasopoulos & Bylund, Reference Athanasopoulos and Bylund2013) in either of the languages that they knew. It is thus possible that the bilingual group relied on Dutch for this task.
Third, the P300 we were expecting is task-related. The lack of such a task-related P300 effect could reflect a reduction of effort due to bilinguals’ efficient strategy of attending to the most task-relevant features. There has been evidence that early Spanish–English bilinguals assumed a more flexible English-like pattern of attention toward both manner and path in a nonverbal motion categorization task (cf. Kersten et al., Reference Kersten, Meissner, Lechuga, Schwartz, Albrechtsen and Iglesias2010). Our Turkish–Dutch bilinguals, likewise, could have assumed the most flexible attention pattern, which was consistent with the Dutch pattern, to ensure successful task performance.
Fourth and finally, typological differences or the current classification of the typology may not have a strong enough influence on earlier processes in visual perception of motion. Talmy’s typology has been criticized, and manner or path biases are a trend rather than a rule. For example, Pavlenko and Volynsky (Reference Pavlenko and Volynsky2015) pointed out that the degree of obligatoriness of manner encoding across satellite-framed languages matters for manner bias (cf. Montero-Melis et al., Reference Montero-Melis, Eisenbeiss, Narasimhan, Ibarretxe-Antuñano, Kita, Kopecka, Lüpke, Nikitina, Tragel, Jaeger and Bohnemeyer2017). Similarly, in a picture-matching study by Flecken and Van Bergen (Reference Flecken and Van Bergen2020), ERPs recorded from native English and Dutch participants for mismatching object configurations also yielded no group differences in the P300 domain, likely due to probabilistic encoding of object position in English. In contrast, grammatical features are encoded more regularly and may have a stronger overall effect on event conceptualization and early attention (cf. Flecken, Athanasopoulos, et al., Reference Flecken, Athanasopoulos, Kuipers and Thierry2015).
Our findings have important implications for theories regarding the mechanisms underlying the relationship between language and attentional processes. The nonverbal ERP paradigm used in our study likely involved the recruitment of highly automatized visual processing routines reflected in the earlier, P300, time window. The lack of a P300 effect for critical oddballs suggests that these routines draw on long-term motion representations that highlight both manner and path elements in both Dutch-dominant early Turkish–Dutch bilinguals and Dutch controls. However, the extent to which such routines are entrenched in the dominant language or are purely task-driven is for future research to determine. We did not find evidence of an effect of experience with verb-framed Turkish on long-term motion representations in early Turkish–Dutch bilinguals.
The LP effect found in our study suggests that experience with typologically distinct languages results in a second wave of attentional processing. This later wave of attentional processing provides an insight into the electrophysiological correlates of Thinking for Speaking effects. With both languages active in parallel, but to varying degrees, early bilinguals’ attention to motion events is two-staged. First, the task-driven attentional process, consistent with the dominant language, is employed. Second, the weaker, less activated language drives attentional processing downstream. An intriguing possibility is that this second wave of attention is the correlate of behavioral differences, such as those found in motion event categorization (Lai, Garrido Rodriguez, & Narasimhan, Reference Lai, Garrido Rodriguez and Narasimhan2014; Park, Reference Park2020) and visual attention allocation patterns (Flecken, Carroll, et al., Reference Flecken, Carroll, Weimar and Von Stutterheim2015), in early cross-typological bilinguals. Clearly, more research is needed to confirm this possibility.
Lastly, our study has a number of limitations. First, our design precluded us from manipulating the language of instructions. All instructions were provided in Dutch to both Turkish–Dutch early bilinguals and Dutch controls. This likely activated Dutch in the bilingual group and may have reduced the potential impact of Turkish. Our findings thus could be at least partially influenced by stronger activation of Dutch in the bilingual participants. Although the use of Dutch as the language of instructions in our study was motivated by ensuring utmost comparability with the Dutch control group, future research should investigate whether instructions provided in Turkish would affect the attention patterns in closely matched early bilingual Turkish–Dutch groups. Second, our early Turkish–Dutch bilingual group was more proficient in Dutch than Turkish. Comparison with a more balanced or a Turkish-dominant group might reveal a more fine-grained picture of the relationship between language experience and attentional processes. Third, some limitations are related to the stimuli used in our experiment. Schematic motion scenes could potentially diminish the salience of motion elements (manner and endpoint). Although the animation clips included figures moving toward endpoints, the figures themselves were not animated to imitate the manner of motion in a naturalistic way. This could have diminished the salience of the manner of motion overall. The path of motion was represented by the endpoint, rather than the trajectory of motion. Although the trajectory was not relevant for the task, it is possible that typological differences may have a stronger effect on attention to the trajectory than on attention to the endpoint of motion. Finally, we did not control for the visual salience of the elements included in the motion scenes, and neither our design nor our statistical analyses accounted for potential differences in the processing of individual items. Item variation could be a contributing factor in studies investigating attention patterns. Thus, future research should assume a more varied approach to the selection of stimulus items and their processing, as this could reveal patterns in line with effects of motion feature salience in motion event similarity judgments observed earlier (e.g., Bohnemeyer, Eisenbeiss, & Narasimhan, Reference Bohnemeyer, Eisenbeiss and Narasimhan2006).
In conclusion, the present study examined potential attentional biases toward manner or path of motion in Turkish–Dutch early bilinguals, compared to a control group of non-Turkish speaking Dutch participants. We found no difference in the oddball P300 effects between groups, but we did find an LP effect that was more positive for the endpoint-match condition in the bilinguals and not in the control group. We suggest that the oddball P300 reflects attention to the explicit task, and that the LP likely reflects late attention processes that could be influenced by language. We conclude that bilinguals who speak two typologically different languages showed a dual attention pattern toward path and manner.
Acknowledgments
We thank Fatih Bayram, Mehtap Acar, Serdar Acar, Elif Burhan Horasanlı, and the reviewers of the manuscript for their helpful comments and insights.
Data Availability Statement
Data files as well as analyses and plotting scripts are accessible at https://tinyurl.com/2fjhxa3m.
Conflict of Interests
The authors report no conflict of interests.
A. Appendix
Note. In the animations, each manner was paired with two different endpoints. Likewise, each endpoint was paired with two manners.