1. Introduction
A remarkable feature of multilingual speakers is the ability to engage with several acquired languages, seemingly without effort. In this paper, we will broadly refer to multilinguals as those language users who have acquired one or more non-native language(s) in addition to their native language, L1 (Cenoz, 2013; De Groot, 2017). Over the past decades, numerous studies have attempted to capture the complexity of the multilingual experience. In particular, they focused on the cognitive, structural and functional consequences of managing several languages in the brain (Abutalebi & Green, 2007; Bialystok, Craik, & Luk, 2012; Green, 1998; Kroll, Dussias, Bice, & Perrotti, 2015; Mosca, 2017; Pliatsikas, 2020; Schwieter, 2016; Sebastián-Gallés & Kroll, 2003).
A well-established aspect of the cognitive architecture of multilingualism is the parallel activation of languages across a range of proficiency levels, language combinations and linguistic domains (Blumenfeld & Marian, 2013; Colomé, 2001; Costa, Caramazza, & Sebastián-Gallés, 2000; Dijkstra, Van Heuven, & Grainger, 1998; Guo & Peng, 2006; Hoshino & Thierry, 2011). In order to successfully mitigate parallel activation and to ultimately select the appropriate target language, multilinguals must employ a language control mechanism on the non-target language (Abutalebi & Green, 2007; Christoffels, Firk, & Schiller, 2007; Costa & Santesteban, 2004; Declerck, Koch, Duñabeitia, Grainger, & Stephan, 2019; Green, 1998). Here, language control is conceptualised as a collection of control mechanisms applied to multilingual speech production and comprehension (Abutalebi, 2008; Green & Abutalebi, 2013). From a theoretical point of view, this notion is featured in Green's (1998) Inhibitory Control (IC) model of language control, which postulates that the non-target language needs to be suppressed prior to the linguistic output.
The exact nature of the mechanisms underlying language control is yet to be established. There is a substantial amount of evidence suggesting that language control is strongly associated with domain-general inhibitory control, also termed cognitive control or executive control (Bialystok et al., 2012; Declerck, Meade, Midgley, Holcomb, Roelofs, & Emmorey, 2021; Festman, Rodriguez-Fornells, & Münte, 2010). Inhibitory control is an executive function used to regulate and inhibit irrelevant information with respect to thoughts or behaviour, as well as to switch attention (Diamond, 2013; Miyake, Friedman, Emerson, Witzki, Howerter, & Wager, 2000). Some studies indicate that language control impacts executive functions – for example, inhibitory control (Bialystok, 2010; Bialystok, Craik, Klein, & Viswanathan, 2004a; Green & Abutalebi, 2013; Kroll & Bialystok, 2013; Miyake et al., 2000; Wiseheart, Viswanathan, & Bialystok, 2016).
Critically, evidence further suggests that language control may share some underlying processing mechanisms with inhibitory control (Declerck et al., 2021; Green, 1998; Linck, Hoshino, & Kroll, 2005; Weissberger, Gollan, Bondi, Clark, & Wierenga, 2015), although this notion is still debated (Branzi, Della Rosa, Canini, Costa, & Abutalebi, 2016; Calabria, Hernandez, Branzi, & Costa, 2012).
In the current study, we investigated the impact of multilingualism on inhibitory control performance. More specifically, we examined whether the typological similarity between languages of a multilingual plays a role in modulating inhibitory control performance. Typological similarity, also termed typological distance or language similarity, refers to linguistic and structural (dis)similarities across different languages spoken by multilinguals (Foote, 2009; Putnam, Carlson, & Reitter, 2018; Westergaard, Mitrofanova, Mykhaylyk, & Rodina, 2017). For example, Italian and Spanish may be considered more typologically similar than a language pair such as Dutch and Spanish because of the larger degree of overlap in morphosyntax, gender systems and cognates (Paolieri, Padilla, Koreneva, Morales, & Macizo, 2019; Schepens, Dijkstra, & Grootjen, 2012; Serratrice, Sorace, Filiaci, & Baldo, 2012).
Several studies have focused on the modulating effects of typological similarity on language control – for example, within the context of a classical Stroop paradigm (Brauer, 1998; Coderre, Van Heuven, & Conklin, 2013; Van Heuven, Conklin, Coderre, Guo, & Dijkstra, 2011). However, studies directly investigating the effect of typological similarity on domain-general inhibitory control performance are scarce (but see Bialystok, Craik, Grady, Chau, Ishii, Gunji, & Pantev, 2005; Linck et al., 2005; Yamasaki, Stocco, & Prat, 2018). Typical experimental paradigms to explore domain-general inhibitory control are the Simon task (Bialystok et al., 2004a, 2005; Simon & Small, 1969), and the spatial Stroop task (Hilbert, Nakagawa, Bindl, & Bühner, 2014; Lu & Proctor, 1995; Luo & Proctor, 2013). The core feature of the Simon task is a conflict between the physical location of a stimulus and the response, e.g., a stimulus appearing on the right side of a screen while the corresponding response button is located on the left side. The Simon effect quantifies the difference in response times (RTs) between trials in which stimulus and response location match and trials in which stimulus and response location mismatch. Typically, longer RTs are linked to the mismatch trials.
Accordingly, a smaller Simon effect reflects better inhibitory control performance, whereas a larger Simon effect reflects lower inhibitory control performance (Bialystok, Craik, Klein, & Viswanathan, 2004b).
In this study, we used the spatial Stroop task (Hilbert et al., 2014; Lu & Proctor, 1995), which is a combination of the Simon task and the classical colour-word Stroop task (MacLeod, 1992; Stroop, 1935). While the classical Stroop task involves naming the ink colour of a colour word written in either the matching ink colour (congruent trial), e.g., the word RED written in red ink, or the mismatching ink colour (incongruent trial), e.g., the word RED written in blue ink, the spatial Stroop task focuses on spatial stimulus-stimulus conflicts. The basic feature of the spatial Stroop task is that a target word (“left”, “right”, “up”, “down”) either matches its location on the screen, e.g., LEFT shown on the left side of the screen (congruent trial), or it does not match its location on the screen, e.g., LEFT shown on the right side of the screen (incongruent trial). The key to success in this task is to inhibit the irrelevant spatial stimulus property (e.g., the location of the word) and to instead focus on the relevant target stimulus property (the target word itself). In this task, inhibitory control performance is reflected in the spatial Stroop effect, which describes the quantitative difference in RTs between congruent and incongruent trials (Hilbert et al., 2014; La Heij, Van der Heijden, & Plooij, 2001; Marian, Blumenfeld, Mizrahi, Kania, & Cordes, 2013; Roelofs, 2021; Van Heuven et al., 2011).
As with the Simon task, a smaller Stroop effect is reported to indicate better inhibitory control performance (Bialystok & Martin, 2004; Costa, Hernández, & Sebastián-Gallés, 2008; Heidlmayr, Moutier, Hemforth, Courtin, Tanzmeister, & Isel, 2014; Pardo, Pardo, Janer, & Raichle, 1990).
In the current study, the critical question we sought to answer was the following: does typological similarity between the two languages significantly modulate inhibitory control performance in multilinguals? A relevant theoretical framework for this particular question is the Conditional Routing Model (CRM) by Stocco, Yamasaki, Natalenko, and Prat (2014). The model is based upon the notion that the multilingual experience dynamically impacts domain-general executive functions, including inhibitory control, as a result of the parallel activation of the languages (Bialystok & Martin, 2004; Festman et al., 2010). Here, the model postulates that executive functions are effectively trained over time (Kroll & Bialystok, 2013; Yamasaki et al., 2018), which results in a strengthening of the neural circuits underlying these executive functions. When the languages within a multilingual system are highly typologically similar, one may predict a higher degree of cross-language interference (Cenoz, 2001; Chen, Zhao, Zhaxi, & Liu, 2020; De Bot, 2004). In turn, this implies that speakers of these languages develop better inhibitory control skills compared to speakers of typologically less similar languages (Yamasaki et al., 2018). Therefore, the CRM provides us with a testable prediction for the effect of typological similarity on inhibitory control performance: speakers of typologically similar languages should exhibit a better inhibitory control performance compared to speakers of typologically less similar languages.
Applied to the spatial Stroop task used in this study, speakers of typologically similar languages (e.g., Italian–Spanish speakers) should therefore show a smaller Stroop effect compared to speakers of typologically less similar languages (e.g., Dutch–Spanish speakers).
1.1. The current study
We explored the modulatory role of typological similarity on inhibitory control performance in a spatial Stroop task (hereafter simply Stroop task) in two groups of speakers with differing degrees of typological similarity. Participants were native Italian learners of Spanish, and native Dutch learners of Spanish. On the basis of typological work by Schepens et al. (2012) and Van der Slik (2010), we defined our Italian–Spanish group as our typologically similar group, and our Dutch–Spanish group as our typologically dissimilar group. All participants had a Spanish proficiency level in the B1/B2 range within the CEFR framework (Council of Europe, 2001). We followed a spatial Stroop paradigm inspired by Hilbert et al. (2014), who used the location words “left”, “right”, “up” and “down” to study the Stroop effect in native speakers of German (see also Hodgson, Parris, Gregory, & Jarvis, 2009; Lu & Proctor, 1995; Shor, 1970). In our paradigm, we exploited the conflict between the target word and the location of the target word on the screen – for example, the Spanish location word [izquierda] “left” displayed on the right side of the screen, or the Spanish word [derecha] “right” displayed on the left side of the screen. The translation equivalents for [izquierda] “left” and [derecha] “right” are “sinistra” and “destra” in Italian, and “links” and “rechts” in Dutch, respectively. In the congruent condition, the target word and the target word location matched. In contrast, in the incongruent condition, the target word and the target word location did not match. We measured accuracy and RTs during this task. Post-experiment, we calculated the Stroop effect by subtracting the RTs for congruent trials from RTs for incongruent trials.
Importantly, we employed an equiprobable Stroop task design, whereby the probability of each condition occurring in the subsequent trial is identical. Within the framework of the Dual Mechanisms of Control (DMC) model (Braver, 2012), an equiprobable Stroop task design is linked to a proactive control strategy. At the core of this particular strategy is the maintenance of goal-relevant information over time to succeed at the task (Braver, 2012; Gonthier, Braver, & Bugg, 2016). Therefore, our Stroop task taps not only into inhibitory control performance per se, but also into the cognitive mechanisms of monitoring the task.
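The subtraction described above can be illustrated with a minimal Python sketch. The trial records, field names and RT values below are invented for illustration, and restricting the calculation to correct trials is an assumption made here (the text only states that RTs for correct trials were modelled).

```python
from statistics import mean

def stroop_effect(trials):
    """Mean incongruent RT minus mean congruent RT (in ms).

    Assumption for this sketch: only correct trials enter the means.
    """
    congruent = [t["rt"] for t in trials
                 if t["correct"] and t["condition"] == "congruent"]
    incongruent = [t["rt"] for t in trials
                   if t["correct"] and t["condition"] == "incongruent"]
    return mean(incongruent) - mean(congruent)

# Hypothetical trials for one participant (values invented for illustration).
trials = [
    {"condition": "congruent", "rt": 480, "correct": True},
    {"condition": "congruent", "rt": 520, "correct": True},
    {"condition": "incongruent", "rt": 530, "correct": True},
    {"condition": "incongruent", "rt": 570, "correct": True},
    {"condition": "incongruent", "rt": 900, "correct": False},  # error trial, excluded
]

effect = stroop_effect(trials)
print(effect)
```

A positive value indicates slower responses on incongruent trials. Under the reasoning in the Introduction, a smaller value would be read as better inhibitory control performance.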
Research questions
Our research questions were the following: first, is there a difference in terms of RTs as a function of typological similarity (typologically similar vs. typologically dissimilar)? Secondly, connected to this first question, is the Stroop effect larger for one group compared to the other, thereby reflecting an effect of typological similarity on inhibitory control performance?
Hypotheses
Based on the literature outlined above, we first predicted a Stroop effect for both the Italian–Spanish group and the Dutch–Spanish group. Behaviourally speaking, this would be reflected in higher accuracy and shorter RTs for congruent trials compared to incongruent trials. Next, in line with the CRM (Stocco et al., 2014) we hypothesised overall shorter RTs for the Italian–Spanish group compared to the Dutch–Spanish group. Finally, we expected a difference in inhibitory control performance as a function of typological similarity: we expected an interaction effect of condition (congruent vs. incongruent) and typological similarity (typologically similar vs. typologically dissimilar) on Stroop effect sizes. A smaller Stroop effect for the Italian–Spanish group would imply that the overall inhibitory control performance is better for the typologically similar languages compared to the less typologically similar Dutch–Spanish group. In turn, this would support the CRM (Stocco et al., 2014).
2. Methods
In addition to the spatial Stroop task, we asked participants to complete the Language Experience and Proficiency Questionnaire, LEAP-Q (Marian, Blumenfeld, & Kaushanskaya, 2007). The LEAP-Q is a questionnaire designed to obtain a measure for the linguistic profile of our participants in terms of their proficiency levels and experiences with the languages within their multilingual system (Marian et al., 2007). Finally, participants also completed the Lextale-Esp (Izura, Cuetos, & Brysbaert, 2014), a lexical decision task to establish vocabulary size in Spanish, for descriptive purposes.
2.1. Participants
For the Italian–Spanish group, we recruited and tested 33 healthy, right-handed native speakers of Italian (26 females) with a B1/B2 level of Spanish at Pompeu Fabra University (Barcelona, Spain). Mean age of the Italian–Spanish group was 27.12 years (SD = 4.08). Our recruitment criteria for this group were the following: no additional language learnt before the age of three; age of acquisition of Spanish from fourteen years onwards; a maximum time spent in a Spanish-speaking country of no longer than one year; no psychological, neurological, visual, auditory, or language-related impairments; and finally, an age range between 18 and 35 years. For the Dutch–Spanish group, we recruited and tested 25 healthy, right-handed native speakers of Dutch (16 females) with a B1/B2 level of Spanish at Leiden University (Leiden, The Netherlands). Mean age of the Dutch–Spanish group was 22.84 years (SD = 3.05). Our recruitment criteria were identical to those for the Italian–Spanish group, except that the cap on maximum time spent in a Spanish-speaking country was less stringent due to the testing location. Data from the LEAP-Q were analysed to establish a detailed linguistic profile of each participant. See Appendix A and Appendix B for an overview of the profiles for the Italian–Spanish group and the Dutch–Spanish group, respectively.
LEAP-Q: Italian–Spanish group
With respect to their linguistic profile in Spanish, two participants acquired Spanish as first foreign language, whereas eighteen participants acquired Spanish as second foreign language. Spanish was the third foreign language for ten participants, and three participants acquired Spanish as fourth foreign language (Appendix A). The mean age of acquisition (AoA) of Spanish was 23.93 years (SD = 5.07). On average, participants reported to be fluent in Spanish at the age of 24.88 years (SD = 4.48), to have started reading in Spanish at the age of 24.36 years (SD = 4.91) and to be fluent readers by the age of 24.24 (SD = 4.82). On average, participants spent 0.46 years (SD = 0.343) in a Spanish-speaking country and had learnt Spanish for 0.93 years (SD = 1.17) either at school as a foreign language, or as a language course in Spain. Twenty-five participants were completing or had completed a formal Spanish language course that was not part of the school curriculum shortly before or upon their arrival in Spain (mean length of course: 0.53 years, SD = 0.889 years). Finally, participants quantified their current daily exposure to Spanish as 40% (SD = 18.37%) of the time with respect to the other languages spoken. In terms of dominance, thirteen participants classified Spanish as their most dominant language after Italian, fourteen participants as their second most dominant language after Italian, five participants as their third most dominant language after Italian, and one participant as their fourth most dominant language after Italian. On a ten-point scale, ten being maximally proficient, participants rated their speaking proficiency at 5.95 (SD = 2.02), comprehension proficiency at 7.20 (SD = 1.71) and their reading proficiency at 7.33 (SD = 1.47).
LEAP-Q: Dutch–Spanish group
In this group, nine participants stated that they acquired Spanish as their second foreign language, nine participants as their third foreign language and seven participants as their fourth foreign language (Appendix B). Mean AoA of Spanish was 17.84 years (SD = 3.16). On average, participants reported being fluent in Spanish at the age of 19.6 years (SD = 2.52), having started reading in Spanish at the age of 18.44 years (SD = 3.24), and being fluent readers by 19.76 years (SD = 3.41). Eighteen out of the twenty-five participants spent on average 0.57 years (SD = 0.66) in a Spanish-speaking country (e.g., Spain, Argentina, Colombia, Mexico). Compared to the other languages, participants quantified their daily exposure to Spanish as 12.96% (SD = 10.07). Critically, two participants reported Spanish as their second most dominant language, nineteen as their third most dominant, three as their fourth most dominant and one participant as their fifth most dominant language following Dutch. On a ten-point scale (ten being maximally proficient), participants reported an average speaking proficiency in Spanish of 6.4 (SD = 1.47), a comprehension proficiency of 7.08 (SD = 1.32) and a reading proficiency of 7.08 (SD = 1.22). These ratings are highly comparable with those of the Italian–Spanish group.
2.2. Materials and design
Prior to the experiment, participants completed the LEAP-Q (Marian et al., 2007) at home to reduce self-report biases frequently induced in laboratory settings (Rosenman, Tennekoon, & Hill, 2011). During the experiment, we first asked participants to complete the Lextale-Esp (Izura et al., 2014), followed by the Stroop task.
Tasks and stimuli: Lextale-Esp
We administered the Lextale-Esp to establish vocabulary size in Spanish. The task was programmed in E-prime2 (Schneider, Eschman, & Zuccolotto, 2002), using the exact same stimuli as in the original version by Izura et al. (2014).
Tasks and stimuli: Stroop task
We administered the Stroop task to measure inhibitory control performance in our Italian–Spanish speakers and Dutch–Spanish speakers. We again generated an E-prime2 (Schneider et al., 2002) script for this task. The target words were the written Spanish words [izquierda] “left” and [derecha] “right”.
2.3. Procedure
Prior to initiating the experiment, participants were provided with an information sheet and the opportunity to ask clarification questions. Then, participants signed the consent form in compliance with the ethics code for linguistic research at the Faculty of Humanities at Leiden University. Before each task, we provided participants with written task instructions in Spanish. Upon completion of all tasks, participants were provided with a debrief sheet, signed the final consent form and received monetary compensation for their participation.
Lextale-Esp
The Lextale-Esp procedure was identical for both groups. We asked participants to indicate via a button press whether the string corresponded to a Spanish word (e.g., [secuestro] “kidnapping”) or a pseudoword (e.g., plaudir). Participants were instructed that incorrectly assigning a word status to a pseudoword and vice versa would lead to a deduction in the score. The trial procedure was as follows: first, a black fixation cross was displayed for 1,000 ms on a white screen. Then, a letter string corresponding to either a word or a pseudoword was displayed in the centre of the screen. The letter string remained on the screen until the participants’ response. After the participants’ response, the next trial was initiated. Sixty trials were Spanish word trials, whereas thirty were pseudoword trials. Trial order was randomised so that each participant was presented with a unique trial order.
Stroop task
The procedure for the Stroop task was the same for both groups. Participants were asked to focus on the target word while ignoring the location of the target word on the screen and to respond to the target word via button presses. The trial procedure was as follows: first, participants saw a black fixation cross for 500 ms in the centre of a white screen. Next, they saw a Spanish target word appear on either the left or right side of the screen along the horizontal midline. This target word was either [izquierda] “left” or [derecha] “right”. The target word was visible on the screen until participants responded or for a maximum display time of 1,000 ms (Figure 1). The next trial was initiated after participants’ response, or if the response time limit was reached. Half of the trials were congruent trials, where the target word matched the location on the screen. The other half of the trials were incongruent trials, where the target word and the location on the screen did not match. There were 24 trials for each combination of target word (izquierda/derecha) and target location on the screen (left side/right side), amounting to 48 trials for the congruent condition and 48 trials for the incongruent condition. Prior to the start of the main experimental round, there was a short practice round to familiarise participants with the task procedure. Trial order was randomised in the practice round and in the main experimental round.
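The trial counts described above follow from a fully crossed 2 (target word) × 2 (screen side) design with 24 repetitions per cell. The Python sketch below illustrates that arithmetic. It is not the authors' E-prime2 script: the names are hypothetical, and the unconstrained shuffle is an assumption, since the text only states that trial order was randomised.

```python
import random
from itertools import product

WORDS = {"izquierda": "left", "derecha": "right"}  # Spanish target word -> meaning
SIDES = ("left", "right")                          # screen side of presentation
REPS = 24                                          # trials per word x side cell

def build_trials(seed=None):
    """Return a shuffled list of 96 trials, 48 congruent and 48 incongruent."""
    rng = random.Random(seed)
    trials = []
    for (word, meaning), side in product(WORDS.items(), SIDES):
        # A trial is congruent when the word's meaning matches its screen side.
        condition = "congruent" if meaning == side else "incongruent"
        trials.extend(
            {"word": word, "side": side, "condition": condition}
            for _ in range(REPS)
        )
    rng.shuffle(trials)  # assumption: simple shuffle without ordering constraints
    return trials

trials = build_trials(seed=1)
n_congruent = sum(t["condition"] == "congruent" for t in trials)
print(len(trials), n_congruent)
```

Because both conditions contribute the same number of trials, the design is equiprobable in the sense described in Section 1.1.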
3. Results
3.1. Data exclusion
For the Italian–Spanish group, data from one participant were lost due to a technical failure. Therefore, we included 32 datasets in the analysis. For the Dutch–Spanish group, we included all 25 datasets in the analysis, giving a total of 57 datasets.
3.2. Data analysis
We analysed our behavioural data using R, Version 4.0.3 (R Core Team, 2020) in RStudio, Version 1.4.1106. We employed a single-trial generalised linear mixed effects modelling approach using the lme4 package (Bates, Mächler, Bolker, & Walker, 2014; Bates, Mächler, Bolker, Walker, Christensen, Singmann, Dai, Scheipl, Grothendieck, Green, Fox, Bauer, & Krivitsky, 2020). We first modelled the outcome variables accuracy and RTs separately for each individual group. Next, we pooled our data from both groups for a group comparison analysis to study potential effects of typological similarity on Stroop effect sizes. For both the individual group analyses and the group comparison analysis, we applied the following model fitting procedure: first, we constructed a theoretically plausible model with a maximal random effects structure as supported by our data (Barr, Levy, Scheepers, & Tily, 2013; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). In our case, the maximal model was a random-intercept and random-slope model for both accuracy and RTs. In the case of non-convergence or singular fit, we simplified our random effects structure. Next, we generated the model of best fit in a top-down procedure, whereby we simplified the fixed effects structure in a stepwise fashion. After fitting each model, we performed model diagnostics to establish the goodness of fit using the DHARMa package (Hartig, 2020). This involved the plotting of the model residuals against the predicted values, and closely investigating the distribution of the residuals and the presence of influential data points to identify issues in terms of the model fit.
Then, we compared models with different fixed effects structures to establish the model of best fit using the anova() function, which is based on the Akaike Information Criterion, AIC (Akaike, 1974), the Bayesian Information Criterion, BIC (Neath & Cavanaugh, 2012) and the log-likelihood ratio. To test for the significance of the terms in the fixed effects structure, absolute t-values greater than 1.96 were interpreted as statistically significant at α = 0.05 (Alday, Schlesewsky, & Bornkessel-Schlesewsky, 2017; Matuschek et al., 2017). Finally, the models of best fit for RTs were re-fitted using the REML criterion (Bates et al., 2014; Verbyla, 2019). All best-fitting models and model parameters are reported in Appendices C, D and E.
To model accuracy, we used the glmer() function with a binomial distribution. This particular function from the lme4 package uses maximum likelihood estimation via the Laplace approximation (Bates et al., 2020). In contrast, we used the lmer() function with a normal distribution to model RTs for correct trials. For the individual group analysis, our fixed effect of interest was condition (congruent vs. incongruent), whereas subject and item were included as random effects. For the group comparison analysis, we used the lmer() function to model the interaction effect of condition (congruent vs. incongruent) and typological similarity (typologically similar vs. typologically dissimilar) as well as their main effects on RTs. Subject and item were again included as random effects. To control for potential covariates, we included Lextale-Esp score and order of acquisition of Spanish as fixed effects in all analyses.
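The comparison criteria named above can be made concrete with a small Python sketch. It uses the standard textbook definitions of AIC, BIC and the likelihood-ratio statistic together with the |t| > 1.96 rule; it does not reproduce the lme4 output. The log-likelihoods, parameter counts and number of observations are hypothetical values chosen for illustration.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

def lr_statistic(log_lik_reduced, log_lik_full):
    """Likelihood-ratio statistic for two nested models."""
    return 2 * (log_lik_full - log_lik_reduced)

def is_significant(t_value, threshold=1.96):
    """Decision rule from the text: |t| > 1.96 at alpha = 0.05."""
    return abs(t_value) > threshold

# Hypothetical nested models (values invented for illustration).
full = {"log_lik": -1500.0, "k": 6}
reduced = {"log_lik": -1502.0, "k": 5}
n_obs = 1000

print(aic(full["log_lik"], full["k"]), aic(reduced["log_lik"], reduced["k"]))
print(bic(full["log_lik"], full["k"], n_obs), bic(reduced["log_lik"], reduced["k"], n_obs))
print(lr_statistic(reduced["log_lik"], full["log_lik"]))
```

Lower AIC and BIC values indicate the preferred model. The likelihood-ratio statistic is conventionally referred to a chi-square distribution whose degrees of freedom equal the difference in the number of parameters.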
3.3. Lextale-Esp
Post-experiment, we calculated Lextale-Esp vocabulary size scores. We subtracted the percentage of incorrectly identified pseudowords from the percentage of correctly identified words (Izura et al., 2014). For the Italian–Spanish group, the mean Lextale-Esp score was 26.30 (SD = 14.04). Large individual differences were evident from the range of scores, which ran from −7.37 to 49.30. In contrast, the mean Lextale-Esp score for the Dutch–Spanish group was 22.69 (SD = 17.19). The range was from −11.92 to 54.73, indicating similarly large individual differences between participants. A two-sample t-test yielded no statistically significant difference in Lextale-Esp scores between the two groups, with t(45.90) = 0.851, p = 0.399. According to calculations provided by Lemhöfer and Broersma (2012), all of our speakers were at or below the B2 level for Spanish according to CEFR standards (Council of Europe, 2001), in line with our recruitment criteria.
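The scoring rule and the group comparison can be sketched in Python as follows. Two assumptions are made here and are not stated in the text: the two-sample test is treated as Welch's unequal-variance t-test, which the fractional degrees of freedom suggest, and the group sizes are taken as the 32 and 25 datasets retained in Section 3.1. The response counts passed to the scoring function are invented.

```python
import math

def lextale_esp_score(words_correct, n_words, pseudo_accepted, n_pseudo):
    """Percentage of words correctly accepted minus percentage of
    pseudowords incorrectly accepted as words."""
    pct_words = 100 * words_correct / n_words
    pct_false_alarms = 100 * pseudo_accepted / n_pseudo
    return pct_words - pct_false_alarms

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical participant: 45 of 60 words accepted, 6 of 30 pseudowords accepted.
score = lextale_esp_score(45, 60, 6, 30)

# Summary statistics reported in the text; group sizes are an assumption (see above).
t, df = welch_t(26.30, 14.04, 32, 22.69, 17.19, 25)
print(score, round(t, 3), round(df, 2))
```

Under these assumptions the sketch gives values close to the reported t(45.90) = 0.851. Small deviations are expected because the inputs are rounded summary statistics. The scoring function also shows how negative scores arise: a participant who accepts a larger percentage of pseudowords than of real words receives a score below zero.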
3.4. Stroop task
We first computed descriptive statistics for accuracy and RTs for both groups. See Table 1 for descriptive mean accuracy, mean RTs and Stroop effects for the Italian–Spanish group and the Dutch–Spanish group. Descriptively speaking, results yielded overall longer RTs for the Italian–Spanish group compared to the Dutch–Spanish group. Moreover, the Stroop effect was descriptively larger for the typologically similar languages compared to the typologically dissimilar languages. We first discuss the individual analysis for the Italian–Spanish group and the Dutch–Spanish group, respectively. Then, we discuss the group comparison for the Stroop effect size.
Italian–Spanish group: Accuracy
For the Italian–Spanish group, the model of best fit included condition as fixed effect, as well as subject and item as random effects. The by-condition random slopes for subjects led to singular fit and were therefore dropped from the model fitting procedure. The fixed effects Lextale-Esp score and order of acquisition of Spanish did not significantly improve the model fit. Participants were significantly more accurate for congruent trials compared to incongruent trials with β = 0.560, SE = 0.171, z = −3.38, p = .001 (see Appendix C for the full model parameters). See Figure 2 for mean accuracy for the Italian–Spanish group.
Italian–Spanish group: Response times
For the Italian–Spanish group, the model of best fit yielded an effect of condition, a random effect for subject and item and a by-subject random slope for condition (Figure 2). Neither LexTALE-Esp score nor order of acquisition of Spanish significantly modulated the outcome variable or improved the model fit. These two fixed effects were therefore excluded from the model fitting procedure. Participants were significantly faster in responding in the congruent condition compared to the incongruent condition with β = 29.46, SE = 5.79, t = 5.09, p < .001 (see Appendix C).
Dutch–Spanish group: Accuracy
For the Dutch–Spanish group, the model of best fit included a fixed effect of condition and LexTALE-Esp score, as well as by-subject random slopes for condition and subject as random effect. Item led to singular fit and was excluded from the model fitting procedure. Further, the fixed effect of order of acquisition of Spanish did not significantly improve the model fit. Participants were significantly more accurate in the congruent compared to the incongruent condition with β = 0.466, SE = 0.243, z = −3.13, p = .002. Despite being included in the final model, the fixed effect of LexTALE-Esp score was not statistically significant with β = 0.986, SE = 0.009, z = −1.67, p = .096 (see Appendix D for the full model parameters). See Figure 3 for mean accuracy for the Dutch–Spanish group.
Dutch–Spanish group: Response times
For the Dutch–Spanish group, we found that the model of best fit included condition as fixed effect, subject as random effect and by-subject random slopes for condition (Figure 3). The random effect for item was not supported by our data and was therefore excluded from the random effects structure. Neither LexTALE-Esp score nor order of acquisition of Spanish significantly improved the model fit, and both were subsequently dropped from the model selection procedure. Participants were significantly faster in responding in the congruent condition compared to the incongruent condition, with β = 15.31, SE = 5.56, t = 2.75, p = .006 (see Appendix D).
In sum, data from both the Italian–Spanish group and the Dutch–Spanish group suggest that participants were significantly more accurate and faster in the congruent condition compared to the incongruent condition. Therefore, both groups displayed the Stroop effect.
Stroop effect: group comparison
Finally, we compared the Stroop effect (RTs on incongruent trials minus RTs on congruent trials) between the Italian–Spanish and Dutch–Spanish groups to explore the possible impact of typological similarity. Here, we explored the interaction effect between condition and typological similarity on the size of the Stroop effect. Descriptively speaking, the Stroop effect was larger for the Italian–Spanish group compared to the Dutch–Spanish group. The model of best fit yielded a main effect of condition, with participants being faster for congruent trials compared to incongruent trials with β = 23.20, SE = 4.40, t = 5.27, p < .001. The model also included a main effect of typological similarity, with participants from the typologically similar group (Italian–Spanish) being significantly slower compared to the typologically dissimilar group (Dutch–Spanish) with β = 30.70, SE = 11.72, t = 2.62, p = .009 (see Appendix E). However, there was no evidence for an interaction effect between condition and typological similarity, meaning that the descriptive group difference in the size of the Stroop effect was not statistically supported. See Appendix E for full model specification details, as well as a comparison between the model that included the interaction term and the best-fitting model that did not include the interaction term. Further, the model of best fit also included a by-subject random slope for condition as well as item as random effect. The fixed effects LexTALE-Esp score and order of acquisition of Spanish did not significantly contribute to improving the model fit and were therefore not included in the final model. See Figure 4 for the comparison of the Stroop effect across the Italian–Spanish and Dutch–Spanish groups.
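As a descriptive illustration of the Stroop effect defined above (RTs on incongruent trials minus RTs on congruent trials), the following Python sketch computes a per-participant effect and averages it by group. The data layout, the function names and the example values are hypothetical, and the sketch does not reproduce the mixed-effects models reported here, which test the group difference via the interaction between condition and typological similarity.

```python
from collections import defaultdict
from statistics import mean


def stroop_effects(trials):
    """Return {subject: mean incongruent RT - mean congruent RT}.

    `trials` is an iterable of (subject, condition, rt_ms) tuples, where
    condition is "congruent" or "incongruent" (hypothetical layout).
    """
    rts = defaultdict(lambda: {"congruent": [], "incongruent": []})
    for subject, condition, rt_ms in trials:
        rts[subject][condition].append(rt_ms)
    return {
        subject: mean(by_cond["incongruent"]) - mean(by_cond["congruent"])
        for subject, by_cond in rts.items()
    }


def group_mean_effect(effects, groups, group):
    """Average the per-subject Stroop effects of one group."""
    return mean(effects[s] for s in effects if groups[s] == group)


# Made-up trials for two participants, one per group.
trials = [
    ("p1", "congruent", 500), ("p1", "congruent", 520),
    ("p1", "incongruent", 540), ("p1", "incongruent", 560),
    ("p2", "congruent", 600), ("p2", "incongruent", 615),
]
groups = {"p1": "Italian-Spanish", "p2": "Dutch-Spanish"}
effects = stroop_effects(trials)
```

Overall speed cancels out in this difference score: a group can be slower on both trial types and still show the same Stroop effect, which is the pattern that two main effects without an interaction describe.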
4. Discussion
In this study, we explored the effect of typological similarity on inhibitory control performance in a group of Italian–Spanish speakers and a group of Dutch–Spanish speakers via a spatial Stroop task. The goal of this study was twofold: first, we examined whether or not the typologically similar (Italian–Spanish) group showed a general processing advantage over the typologically dissimilar (Dutch–Spanish) group in terms of RTs. Secondly, we studied whether typological similarity yielded a difference between the two groups in terms of Stroop effect sizes (difference in RTs between incongruent and congruent trials). Here, a smaller Stroop effect would be indicative of better inhibitory control performance. On the basis of the CRM (Stocco et al., Reference Stocco, Yamasaki, Natalenko and Prat2014), we expected shorter RTs and a smaller Stroop effect for the Italian–Spanish group compared to the Dutch–Spanish group.
Stroop data from both the Italian–Spanish and the Dutch–Spanish group showed that participants were sensitive to the inherent task conflict. More specifically, results demonstrated higher accuracy and shorter RTs for congruent compared to incongruent trials. This yields the typical Stroop effect, which is a measure of inhibitory control performance in this task. To succeed at this task, participants had to ignore the irrelevant information (location of the target) and instead focus on the target word itself to provide a correct response. Further, as discussed in the introduction, participants had to employ a proactive control strategy and monitor the goal-relevant information during the task, as described in the DMC model (Braver, Reference Braver2012; Gonthier et al., Reference Gonthier, Braver and Bugg2016). Therefore, the presence of a Stroop effect in both groups reflects not only a measure for inhibitory control performance, but also a monitoring strategy to solve this task.
With respect to the first research question, the group comparison analysis showed that the typologically dissimilar (Dutch–Spanish) group was comparatively faster than the typologically similar (Italian–Spanish) group in this task. This finding contrasts with our predictions. The original prediction on the basis of the CRM (Stocco et al., Reference Stocco, Yamasaki, Natalenko and Prat2014) was a processing advantage for the typologically similar Italian–Spanish group compared to the Dutch–Spanish group due to continuous training of executive functions and inhibitory control skills over time. In contrast, our findings suggest that the typologically dissimilar Dutch–Spanish group had a processing advantage in terms of RTs over the Italian–Spanish group. In the literature, similar findings were reported by Bialystok et al. (Reference Bialystok, Craik, Grady, Chau, Ishii, Gunji and Pantev2005), who investigated the role of typological similarity on the performance during a Simon task in highly proficient Cantonese–English speakers (typologically dissimilar group) and highly proficient French–English speakers (typologically more similar group). Results showed a processing advantage for Cantonese–English speakers compared to the French–English speakers in the form of faster RTs on the Simon task for Cantonese–English speakers (see also Linck et al., Reference Linck, Hoshino and Kroll2005). Our results are comparable to those of Bialystok et al. (Reference Bialystok, Craik, Grady, Chau, Ishii, Gunji and Pantev2005), and suggest that in this particular task, typological dissimilarity was advantageous over typological similarity. Moreover, these results suggest a qualitative difference between the Italian–Spanish and the Dutch–Spanish group – namely, a more efficient inhibitory control strategy for the speakers of the less typologically similar languages.
Within the framework of the DMC model (Braver, Reference Braver2012) and the application of proactive control strategies during this task (Braver, Reference Braver2012; Gonthier et al., Reference Gonthier, Braver and Bugg2016), this implies that Dutch–Spanish speakers were more effective at employing a proactive control strategy, as reflected in overall shorter RTs. In other words, speakers of typologically more dissimilar languages were better at monitoring and actively maintaining goal-related information compared to speakers of typologically similar languages. This has critical implications for the conceptualisation of the underlying cognitive mechanisms for typologically similar vs. dissimilar language combinations.
With respect to our second research question, there was a descriptive trend of a smaller Stroop effect for the Dutch–Spanish group compared to the Italian–Spanish group. However, the overall processing advantage of the Dutch–Spanish group over the Italian–Spanish group was not reflected in the size of the Stroop effect. More concretely, we did not find a statistical difference between the Stroop effect size for the Italian–Spanish group compared to the Dutch–Spanish group. This finding was somewhat surprising and contrasts with our original predictions. Our result suggested, first, that the Stroop effect was unaffected by typological similarity, and second, that speakers of both groups demonstrated a highly comparable inhibitory control performance in this task. Importantly, the CRM framework proposed by Stocco et al. (Reference Stocco, Yamasaki, Natalenko and Prat2014) does not fully account for these specific findings. Instead, our findings strongly suggest a limited modulatory role of typological similarity on inhibitory control performance in this study. One arising question here is the following: why were the Dutch–Spanish speakers faster, but not better, compared to the Italian–Spanish speakers at performing the Stroop task?
One interpretation of our findings could be that factors other than typological similarity influence inhibitory control performance in this task. These other potentially modulating factors exert their influence such that one group had an inhibitory control advantage in terms of processing speed, but not in terms of overall performance. A well-established modulatory factor in language control, but less in inhibitory control, is language proficiency, as postulated in the IC model (Green, Reference Green1998). Previous studies have shown that multilingual children with a low non-native proficiency display unilateral cross-language interactions from the L1 into the L2 compared to multilingual children with high non-native proficiency (Brenders, Van Hell, & Dijkstra, Reference Brenders, Van Hell and Dijkstra2011; Poarch & Van Hell, Reference Poarch and Van Hell2012a). As outlined in Poarch and Van Hell (Reference Poarch and Van Hell2012b), this could indicate that less language control effort is needed to manage the native and the non-native languages. In turn, this implies less training of more general executive control functions such as inhibitory control if the difference in proficiency levels between the native and non-native language is considerable.
More specific to our intermediate late learners of Spanish, one could argue that our participants have not yet sufficiently trained their inhibitory control skills given their intermediate level of non-native proficiency, in turn accounting for a limited effect of typological similarity in this study. Therefore, one possibility is that there is an interaction effect between typological similarity and non-native proficiency, and only a particular degree of typological similarity paired with a specific proficiency level leads to training of the inhibitory control skills. This tentative hypothesis is partially in line with language control research by Brauer (Reference Brauer, Healy and Bourne1998). This study explored the effect of typological similarity on language control via the within-language Stroop effect and the between-language Stroop effect in speakers of typologically similar languages (German–English) and typologically dissimilar languages (English–Greek and English–Chinese) in the classical Stroop paradigm. The within-language Stroop effect refers to the differences in RTs between the congruent and incongruent condition when the stimulus and response languages are identical. On the other hand, the between-language Stroop effect quantifies the differences in RTs between the congruent and incongruent condition when the stimulus and response languages are different (Brauer, Reference Brauer, Healy and Bourne1998; Marian et al., Reference Marian, Blumenfeld, Mizrahi, Kania and Cordes2013; Van Heuven et al., Reference Van Heuven, Conklin, Coderre, Guo and Dijkstra2011). Critically, Brauer (Reference Brauer, Healy and Bourne1998) included low- and high-proficient speakers to also explore the effect of proficiency on inhibitory control performance. All three groups showed a within-language and a between-language Stroop effect. 
On the one hand, low proficiency in the non-native language was linked to larger differences between the within-language and the between-language Stroop effect across the native and non-native language, irrespective of typological similarity. On the other hand, highly proficient speakers in the typologically dissimilar group were linked to larger within-language compared to between-language Stroop effects in both the native and non-native language. Importantly, highly proficient speakers in the typologically similar group showed no difference between the within-language and the between-language Stroop effect. Therefore, these results suggest that when the difference in proficiency levels is considerable (i.e., low proficiency in the non-native language), the effect of typological similarity on language control performance may be limited, potentially because the amount of “training” of the inhibitory skills has not yet been sufficient to elicit any typological similarity effects.
Given the strong link between language control and domain-general inhibitory control (Bialystok et al., Reference Bialystok, Craik and Luk2012; Declerck et al., Reference Declerck, Meade, Midgley, Holcomb, Roelofs and Emmorey2021; Festman et al., Reference Festman, Rodriguez-Fornells and Münte2010), this argument could be applied to our study: our Italian–Spanish and Dutch–Spanish speakers were late language learners of Spanish who had a B1/B2 proficiency level in Spanish. We therefore postulate that the difference in proficiency between the native language (i.e., Italian or Dutch) and the non-native language Spanish was too substantial to elicit a typological similarity effect on inhibitory control performance, even at intermediate B1/B2 proficiency levels. However, we anticipate that with increasing non-native proficiency levels, a typological similarity effect on inhibitory control may be more pronounced. In view of this, it may not be surprising that inhibitory control performance (i.e., the size of the Stroop effect) was statistically equal given that our groups had highly comparable proficiency levels in their non-native language Spanish. Thus, while our findings are not fully compatible with the CRM framework proposed in the introduction (Stocco et al., Reference Stocco, Yamasaki, Natalenko and Prat2014), they suggest that at intermediate non-native proficiency levels, the modulating role of typological similarity is not yet traceable at the behavioural level.
A second interpretation of our findings could be that managing cross-language interference between two typologically similar languages does not directly transfer to strengthening the networks underlying inhibitory control. While we know that speaking multiple languages has a direct impact on language control (Coderre & Van Heuven, Reference Coderre and Van Heuven2014; Coderre et al., Reference Coderre, Van Heuven and Conklin2013; Green, Reference Green1998; Green & Abutalebi, Reference Green and Abutalebi2013; Mosca, Reference Mosca2019), this may not generalise to broader executive functions such as inhibitory control. Contrary to the predictions by the CRM (Stocco et al., Reference Stocco, Yamasaki, Natalenko and Prat2014), it may be the case that speaking typologically similar languages does not result in a quantitative difference in the amount of training of executive functions over time compared to typologically dissimilar languages. Therefore, the link between speaking typologically similar languages, language control and inhibitory control needs to be more closely inspected in future studies, specifically, the association between language control and inhibitory control.
The current study takes a step towards understanding the relative contribution of typological similarity to inhibitory control performance. Taken together, our results suggest, first, that typological similarity plays only a limited role in modulating inhibitory control performance, even at a stage when there is a moderate difference in proficiency levels between the native and the non-native language. Typological similarity may start to play a role only once non-native proficiency becomes more native-like. Second, our findings suggest a more complex link between managing multiple languages and more general inhibitory control skills. This could imply that multilingualism primarily influences language control, but that it has only a limited effect on domain-general inhibitory control mechanisms. Therefore, our results have important implications for the conceptualisation of the underlying processes of inhibitory control and add novel evidence to the debate around the role of typological similarity in inhibitory control performance.
5. Conclusions
In this study, we used a spatial Stroop task to examine whether and how inhibitory control performance measured via the Stroop effect was modulated by typological similarity. We found that the typologically dissimilar (Dutch–Spanish) group was faster in performing the task compared to the typologically similar (Italian–Spanish) group. This implied that the Dutch–Spanish group was better at monitoring goal-related information throughout the task compared to the Italian–Spanish group. Critically, this did not impact the overall Stroop task performance. Instead, the size of the Stroop effect, and in turn inhibitory control performance, were similar across both groups, irrespective of typological similarity. Therefore, our results suggest that typological similarity plays a limited role in modulating inhibitory control performance, particularly in intermediate proficient multilinguals with considerable differences in proficiency between their L1 and non-native language(s).
5.1. Future directions
Our findings open new avenues to expand on current theoretical frameworks describing the impact of typological similarity on inhibitory control. An emerging line of research could focus on quantifying the degree of interference between typologically similar vs. dissimilar languages and the consequences for language control and/or inhibitory control. For this, future studies should, first, investigate language pairs with varying degrees of typological similarity; second, include separate measures for both language control and inhibitory control performance; and, finally, recruit speakers of different proficiency levels to tease apart the potentially critical effects of proficiency in modulating inhibitory control performance. Recent years have also seen an increase in research on the neurocognition of inhibitory control which combines behavioural measures with electrophysiological and neuroimaging methods (Abutalebi et al., Reference Abutalebi, Della Rosa, Green, Hernandez, Scifo, Keim, Cappa and Costa2012; Christoffels et al., Reference Christoffels, Firk and Schiller2007; Constantinidis & Luna, Reference Constantinidis and Luna2019; Grundy, Anderson, & Bialystok, Reference Grundy, Anderson and Bialystok2017). Future studies in this area of research should also incorporate both offline and online measures such as electroencephalography or fMRI measures to model the cognitive and neural mechanisms underlying inhibitory control performance in multilingual language processing.
Acknowledgments
We thank Núria Sebastián-Gallés and the CBC laboratory team from Pompeu Fabra University for providing the facilities for the data collection. A special thanks goes to Philipp Flieger and Ifeoluwa Olusayo Oloruntuyi for their support during the data collection at Leiden University and their feedback on the first draft. We also thank Michal Korenar for his helpful comments. Finally, we would like to thank our anonymous reviewers and all our participants for their time.
CRediT Author Contribution Statement
Sarah Von Grebmer Zu Wolfsthurn: Conceptualisation, Methodology, Software, Investigation, Formal Analysis, Data Curation, Writing-Original Draft, Writing-Review & Editing, Visualisation. Anna Gupta: Conceptualisation, Investigation, Formal Analysis, Data Curation, Writing-Original Draft, Writing-Review & Editing. Leticia Pablos: Conceptualisation, Methodology, Writing-Review & Editing, Supervision. Niels O. Schiller: Conceptualisation, Writing-Review & Editing, Supervision, Funding Acquisition.
Declaration of Competing Interests
No known competing interests.
Funding Statement
This project has received funding from the European Union's Horizon2020 research and innovation programme under the Marie Skłodowska Curie grant agreement No 765556 – The Multilingual Mind.
Citation Diversity Statement
Within academia, research is witnessing a systematic underrepresentation of female researchers and members of minorities in published articles (Dworkin, Linn, Teich, Zurn, Shinohara, & Bassett, Reference Dworkin, Linn, Teich, Zurn, Shinohara and Bassett2020; Rust & Mehrpour, Reference Rust and Mehrpour2020; Torres, Blevins, Bassett, & Eliassi-Rad, Reference Torres, Blevins, Bassett and Eliassi-Rad2020; Zurn, Bassett, & Rust, Reference Zurn, Bassett and Rust2020). With this Citation Diversity Statement, we aim to raise awareness about this issue. We classified the first and last author based on their preferred gender for each reference in our reference list (wherever this information was available). Our reference list contained 25% woman/woman authors, 42% man/man, 13% woman/man and finally, 16% man/woman authors. For lack of direct comparison in the psycholinguistic field, we compared this to 6.7% for woman/woman, 58.4% for man/man, 25.5% woman/man, and lastly, 9.4% for man/woman authored references for the field of neuroscience (Dworkin et al., Reference Dworkin, Linn, Teich, Zurn, Shinohara and Bassett2020). Note that the limitations of this classification are twofold: first, we need to develop adequate comparison metrics for every research field; and second, we need to improve this rudimentary binary gender classification system. However, we are confident that future work will address both issues.
Data availability statement
The data that support the findings of this study are openly available in Open Science Framework at https://osf.io/e5ba9/?view_only=e8c7dbdf07984cbebb0fce50a132a605 [View-Only link]
Appendix
Appendix A. Linguistic profile of the Italian–Spanish group (N = 33) according to the LEAP-Q (Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007)
Appendix B. Linguistic profile of the Dutch–Spanish group (N = 25) according to the LEAP-Q (Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007)
Appendix C. Models of best fit for accuracy and RTs, including odds ratios/estimates, confidence intervals, test statistics and p-values for the Italian–Spanish group
Appendix D. Models of best fit for accuracy and RTs, including odds ratios/estimates, confidence intervals, test statistics and p-values for the Dutch–Spanish group
Appendix E. Comparison between the model with the interaction effect of condition and typological similarity (left) and the best-fitting model (right) with a main effect for condition and typological similarity