The last two decades have witnessed a growing interest in bilingualism. As the world becomes more globalized, the number of bilinguals has increased to the point that today there are more bilingual than monolingual speakers (Bialystok, 2017). In this globalized world, not only has bilingualism increased, but so has the number of people with some level of second-language proficiency. It is increasingly common for people to learn and use a second language at some point in their lives. This context has favored research on the effects of learning a second language on general human functioning (for reviews, see Klimova, 2018; Lehtonen et al., 2018; Quinteros Baumgart & Billick, 2018).
Paralleling the increasing attention to multilingualism, in recent decades the study of false memories has attracted considerable scientific effort, as it provides insight into the constructive nature of memory (Schacter & Slotnick, 2004; for reviews, see Brainerd & Reyna, 2005; Gallo, 2006, 2010). Several experimental procedures have been used to induce memory distortions, but one in particular has been widely employed: the Deese/Roediger–McDermott (DRM) paradigm (Deese, 1959; Roediger & McDermott, 1995). In this paradigm, participants study lists of words associated with a non-presented critical lure. In a later memory test, not only are studied words retrieved (true memories), but critical lures are also often falsely recalled or recognized, producing the false memory effect (e.g., Arndt, 2012; Cadavid & Beato, 2017; Wang et al., 2018; Yonelinas et al., 2010).
The DRM paradigm was created in an English-speaking environment, but the robustness of the false memory effect has fascinated memory researchers worldwide (e.g., Arndt, 2015; Beato & Arndt, 2017; Beato et al., 2023; Cadavid et al., 2021; Huff et al., 2014; Wang et al., 2019). Consequently, DRM materials have been used in different languages by applying two different procedures. On the one hand, researchers have created new DRM lists based on free association norms in their own language to study false memories (e.g., Chinese: Chen et al., 2008; Geng et al., 2007; Lee et al., 2007; Dutch: Van Damme & D’Ydewalle, 2009a, 2009b; French: Dubuisson et al., 2012; Plancher et al., 2008; Hebrew: Anaki et al., 2005; Ben-Artzi et al., 2009; Italian: Iacullo & Marucci, 2016; Japanese: Abe et al., 2008; Kawasaki & Yama, 2006; Polish: Ulatowska & Olszewska, 2013; Brazilian Portuguese: Stein et al., 2006; European Portuguese: Albuquerque, 2005; Albuquerque & Pimentel, 2005; Rocha & Albuquerque, 2003; European Spanish: Beato & Arndt, 2014; Boldini et al., 2013; Mexican Spanish: Anastasi, De Leon, et al., 2005; Swedish: Johansson & Stenberg, 2002). On the other hand, to study the false memory effect, researchers have also translated the original English lists (Roediger & McDermott, 1995; Stadler et al., 1999) into a second language, such as Chinese (Mao et al., 2010), Dutch (Zeelenberg & Pecher, 2002), French (Cabeza & Lennartson, 2005; Howe et al., 2008), German (Diekelmann et al., 2008, 2010; Rummer et al., 2009), Italian (Ciaramelli et al., 2006), Brazilian Portuguese (Stein & Pergher, 2001), or European Spanish (García-Bajos & Migueles, 1997), among others. In this regard, although direct translation does not seem to be the best practice for adapting DRM materials (for reviews of methodological issues, see Graves & Altarriba, 2014; Marmolejo et al., 2009), robust false memory effects have been reported both when translating the original English lists into a second language and when creating new lists based on free association norms of the language of interest.
One of the main theories that account for the DRM false memory illusion is the Activation-Monitoring Framework or AMF (Roediger et al., 2001). The AMF assumes that two different memory processes work in opposition to create a false memory: activation and monitoring. Specifically, false memory occurs when, first, the critical lure is automatically activated due to preexisting associations between the studied words and the critical lure, and, subsequently, monitoring processes fail. Another prominent theory of DRM false memories is the Fuzzy-Trace Theory or FTT (Brainerd & Reyna, 2002), which also posits the existence of two opposing processes in false memory formation. In this framework, false memory appears when the gist information of the list is extracted, this gist trace matches the critical lure, and the retrieval of verbatim representations is not enough to reject the critical lure (i.e., no recollection rejection).
Several models have also been proposed to explain how second languages are represented in the brain. Currently, the most accepted models are the Revised Hierarchical Model (Kroll & Stewart, 1994; Kroll et al., 2010) and the connectionist models of bilingual representation (Hernandez et al., 2005). These models differ on whether second-language lexical entries are developed and stored separately (Kroll & Stewart, 1994) or not (Hernandez et al., 2005). However, they agree on a critical characteristic: first-language lexical entries are quicker to fully activate conceptual representations or concepts than second-language lexical entries. Previous research has pointed to this shared characteristic to establish straightforward predictions on false memory effects across first or dominant languages (L1, hereafter) and second or non-dominant languages (L2, hereafter) (e.g., Arndt & Beato, 2017; Beato & Arndt, 2021). Specifically, false memory is expected to be higher in the L1 than in the L2 because access to conceptual representations would be more automatic from the dominant language. This effect is known as the language dominance effect. Furthermore, according to the Revised Hierarchical Model, second-language learners or unbalanced bilinguals access concepts from L2 words through their L1 translation. That is, participants with low L2 proficiency would not directly access the concept from an L2 word but would need to translate that word into their L1 and subsequently access the conceptual representation.
After reviewing the literature, we found that, despite the aforementioned growing interest in both false memory and bilingualism, false memory in a non-dominant language is still a poorly studied and understood phenomenon. Moreover, the few studies that have investigated false memory in both the L1 and L2 with the DRM paradigm have used different experimental conditions and reached inconsistent conclusions. The goal of the current research is to address this gap. We studied memory processes in the L1 and L2, focusing on the role of automatic associations, which requires bridging the false memory and language literature traditions. Concerning the false memory literature, in this particular research, both the AMF and the FTT would make the same predictions because the associative strength between the studied words and the critical lure correlates with processes that allow gist extraction (Cann et al., 2011; Huff et al., 2021). Regarding the language literature, the Revised Hierarchical Model (RHM) and the connectionist models rely on associative activations, just like the AMF.
It is important to understand how false memories are generated not only in dominant but also in non-dominant languages because it allows us, first, to establish the nature of the false memory phenomenon in the L1 and L2, and second, to explore the role that automatic access to conceptual representations plays in false memory. The next section presents an exhaustive review of research focused on false memory in the L1 and L2 in order to understand why studies sometimes reached different conclusions and how to move forward in this field.
Previous findings in within- and between-language false memory
An exhaustive literature review on false memory revealed that different experimental conditions have been used, with some studies including conditions in which study and test language matched (i.e., within-language conditions, L1L1 and L2L2) (Anastasi, Rhodes, et al., 2005; Arndt & Beato, 2017), and others including conditions in which study and test languages did not match (i.e., between-language conditions, L1L2 and L2L1) (Cabeza & Lennartson, 2005; Howe et al., 2008; Kawasaki-Miyaji et al., 2003; Marmolejo et al., 2009; Sahlin et al., 2005; see Graves & Altarriba, 2014, for a review).
Firstly, regarding within-language studies, the results showed that participants who were highly proficient in the L1 and less proficient in the L2 systematically produced higher false recognition when words were studied in the L1 (i.e., L1L1) than in the L2 (i.e., L2L2) (Anastasi, Rhodes, et al., 2005, Experiments 3 and 4; Arndt & Beato, 2017; Beato & Arndt, 2021; Howe et al., 2008; Kawasaki-Miyaji et al., 2003; Marmolejo et al., 2009; Sahlin et al., 2005). Only when participants were highly exposed to their L2 (they lived in an L2 environment) did researchers find higher levels of memory distortion in the L2 than in the L1 (Anastasi, Rhodes, et al., 2005, Experiment 2). In turn, when L1 and L2 proficiency was similar, false recognition did not differ between languages (Cabeza & Lennartson, 2005). Thus, it seems clear that participants in within-language conditions were more prone to distort their memory in their most proficient language (for a review, see Suarez & Beato, 2021). These results are in line with the Revised Hierarchical Model (Kroll & Stewart, 1994), which states that the higher the proficiency, the greater the automatic access to the conceptual representations, leading to higher false memory in the L1 than in the L2.
Secondly, research including both within- and between-language conditions was interested in understanding what happened with false memories when study and test language matched and did not match. This comparison between within- and between-language conditions has been called the effect of language shift. Studies that have analyzed this effect have not reached a clear conclusion, as all possible results have been reported. Specifically, it is possible to find studies with higher false recognition in within- than between-language conditions (Cabeza & Lennartson, 2005; Sahlin et al., 2005), but also studies with the opposite result (Howe et al., 2008; Marmolejo et al., 2009). It was even possible to find similar false memory for within- and between-language conditions, although this difference was not statistically tested (see Figure 1 in Kawasaki-Miyaji et al., 2003).
The effect of memory instructions: restrictive versus inclusive instructions
After carefully analyzing the previous literature on within- and between-language false memory, we found that the studies differ in an important respect: the instructions given to the participants at the memory test. Researchers have used two types of instructions that triggered different strategies at retrieval and that led participants to respond based on two very different criteria. We have called these instructions restrictive and inclusive memory instructions.
In restrictive memory instructions, participants were asked to endorse the studied items in the memory test only when the language at study and test matched, that is, in within-language conditions (Cabeza & Lennartson, 2005; Kawasaki-Miyaji et al., 2003; Sahlin et al., 2005). Therefore, when study and test languages did not match, that is, in between-language conditions, participants should reject translated studied words. Thus, with these restrictive instructions, participants are explicitly asked to make judgments requiring retrieval of language information to confirm whether the language at study and test match. Hence, with restrictive instructions, participants would adopt a more conservative criterion during the recognition test by engaging in source-monitoring processes about the study language (i.e., AMF) or by searching for verbatim traces (i.e., FTT). These two mechanisms could help avoid false memories, especially in between-language conditions.
In contrast, inclusive memory instructions led participants to endorse all the studied items, even when presented in different languages at study and test (Howe et al., 2008). Thus, these inclusive instructions should promote a more lenient criterion at the recognition test, since participants should endorse studied words regardless of whether the study and test language matched or not. Therefore, with inclusive instructions, it was not necessary to retrieve language information to check whether the study and test languages were the same. Hence, inclusive instructions in within- and between-language conditions give us a clearer picture of the activation of conceptual representations (i.e., AMF) or the strength of the gist traces (i.e., FTT). Given our assumption that restrictive and inclusive memory instructions triggered different strategies at retrieval and led participants to respond according to two different criteria, it is not surprising that previous literature has reported mixed findings.
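The contrast between the two instruction types can be summarized as a scoring rule. The following Python sketch is only an illustration of that rule as described above; the names `Instruction` and `instructed_response` are hypothetical and do not come from any of the reviewed studies.

```python
from enum import Enum


class Instruction(Enum):
    RESTRICTIVE = "restrictive"
    INCLUSIVE = "inclusive"


def instructed_response(concept_studied: bool, study_lang: str,
                        test_lang: str, instruction: Instruction) -> str:
    """Return the response ("old" or "new") that the instructions ask for.

    concept_studied is True when the test word, or its translation,
    appeared in a study list. Hypothetical helper for illustration only.
    """
    if not concept_studied:
        # Critical lures and unrelated distractors were never presented.
        return "new"
    if instruction is Instruction.INCLUSIVE:
        # Inclusive: endorse studied words regardless of the test language.
        return "old"
    # Restrictive: endorse only when study and test languages match.
    return "old" if study_lang == test_lang else "new"


# A word studied in the L1 and tested in its L2 translation:
print(instructed_response(True, "L1", "L2", Instruction.RESTRICTIVE))
print(instructed_response(True, "L1", "L2", Instruction.INCLUSIVE))
```

Under this rule, the instructed answer to a critical lure is always "new", so the instruction type changes only how translated studied words should be treated and, with it, whether language information needs to be retrieved at test.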
Regarding the effect of language shift on false memory, on the one hand, with restrictive memory instructions, we expect to find lower false recognition in between- than within-language conditions, as participants should only endorse studied items when study and test language match. Therefore, even if participants falsely recognize a critical lure as a studied item in between-language conditions, they would reject it at test when they identify it as a translated studied item (i.e., by correctly retrieving the study language). Indeed, this was the result found in previous studies that used this type of instructions (Cabeza & Lennartson, 2005; Sahlin et al., 2005).
On the other hand, with inclusive memory instructions, previous studies by Howe et al. (2008) and Marmolejo et al. (2009) have found higher false recognition in between- than within-language conditions, although these studies raise some concerns that we should be aware of. First, the false recognition obtained by Howe et al. (2008) is challenging to interpret because the authors included in the between-language conditions two conditions in which study and recognition test language matched. Specifically, the L1L2L1 and L2L1L2 conditions (study-recall-recognition language) were included as between-language conditions when, in our opinion, they were rather within-language conditions in terms of the recognition memory test. Second, although Marmolejo et al. (2009) concluded that false recognition was higher in between-language conditions (i.e., L1L2 and L2L1) than in within-language conditions (i.e., L1L1 and L2L2), they did not find differences in either of the two comparisons of interest: L1L1 versus L1L2 (.80 vs. .87, respectively) and L2L2 versus L2L1 (.73 vs. .80, respectively). Third, both studies employed a free recall task before the recognition test, and therefore, the false recognition should be interpreted cautiously, as these results could be contaminated by the preceding recall memory test (e.g., Gallo, 2006; Roediger & McDermott, 1995). Finally, it should be noted that these studies included participants with high L2 proficiency and, therefore, their results may not generalize to unbalanced bilinguals or second-language learners, such as those included in our study.
In summary, it is not possible to get a clear picture of the effect of language shift on false memories based on previous research because the only five studies that have compared false memory in within- and between-language conditions have used different instructions and have sometimes included different memory tests (Cabeza & Lennartson, 2005; Howe et al., 2008; Kawasaki-Miyaji et al., 2003; Marmolejo et al., 2009; Sahlin et al., 2005). Therefore, the main goal of the present study was to examine the effect of language shift (within- vs. between-language conditions) on false recognition while manipulating the memory instructions. Specifically, when restrictive instructions were used (Experiment 1), participants needed to make judgments requiring retrieval of language information, whereas with inclusive instructions (Experiment 2), participants did not need to retrieve the language information. It is worth noting that only a recognition test was included in the retrieval phase to avoid any possible data contamination by an initial free recall test. To our knowledge, this is the first study to examine the role that restrictive and inclusive memory instructions play in the effects of language shift on false recognition in second-language learners.
Experiment 1
In Experiment 1, we studied the effect of language shift on false recognition when participants had to make judgments requiring retrieval of the study language. Specifically, second-language learners were presented with restrictive memory instructions that led them to endorse an item in the recognition test only when the language at study and test matched (i.e., within-language conditions) but not when it did not match (i.e., between-language conditions).
As in previous research that used within-language conditions, and in which highly proficient participants in the L1 with less proficiency in the L2 were included, we expected to find higher false recognition in L1L1 than in L2L2 (see Suarez & Beato, 2021, for a review). This outcome could be explained by the RHM (Kroll & Stewart, 1994). According to this model, the activation of the critical lures will be greater when words are studied in the L1 than in the L2 due to the greater automatic access to the conceptual representations in the L1. In turn, false recognition will be higher in L1L1 than in L2L2.
More importantly, regarding the effect of language shift, with these restrictive memory instructions, we expected to find lower false recognition in between- than in within-language conditions, as in previous studies (Cabeza & Lennartson, 2005; Sahlin et al., 2005), both when words were studied in the L1 (i.e., L1L1 > L1L2) and L2 (i.e., L2L2 > L2L1). This is because even though participants might falsely recognize a critical lure in between-language conditions as a translated studied word, assuming that they are able to retrieve the study language, the instructions themselves would lead them to reject that word because it is not in the same language in which they studied it.
Method
Participants
Ninety undergraduate students (19 to 35 years old, M = 20.61, SD = 2.88) participated voluntarily and signed an informed consent form. Participants were native Spanish speakers (72% women) and were living in Spain at the time of the experiment. The Spanish school system includes mandatory English language training in primary and secondary school, but young adults typically do not speak or listen to English on a daily basis outside of formal instruction. To measure Spanish (L1) and English (L2) proficiency, a self-report scale ranging from 1 (elementary knowledge) to 10 (native speaker proficiency) was used. All participants self-rated their L1 proficiency with a perfect score (M = 10, SD = 0.00) and their L2 proficiency as moderate, giving scores that ranged from 1 to 8, with only one participant rating 10 (M = 5.68, SD = 1.70). Taking this into account, participants were considered unbalanced bilinguals, with Spanish as their dominant language and English as their non-dominant language. The study was approved by the Research Ethics Committee of the University of Salamanca.
Materials
Sixteen DRM lists with 10 words per list were used in the study phase (materials are freely available at https://osf.io/bz4g9/), 8 lists in Spanish and 8 lists in English (see Appendix). Spanish lists were built using the Fernandez et al. (2003) free association norms for Spanish words and normed on native Spanish speakers (Alonso et al., 2004). For their part, English lists were built using the Nelson et al. (1998) free association norms for English words and normed on native English speakers (Stadler et al., 1999).
Since we were aware of the traditionally high variability in false memory rates produced by DRM lists, we decided to match our L1 and L2 DRM lists for their level of false recognition. That is, Spanish and English lists had similar false recognition rates (54.00% and 54.38%, respectively) when they were applied to native-speaker participants in previous normative studies (Alonso et al., 2004; Stadler et al., 1999), t(14) = −.048, p = .962, Cohen’s d = 0.03, 95% CI [−0.17, 0.16]. In other words, Spanish and English lists share the same capacity to produce false recognition. Consequently, if the lists show different levels of false recognition in the present study, it is unlikely that these differences are due to the lists. Furthermore, eight DRM lists (four in English and four in Spanish) with similar false recognition rates in both languages were used as unrelated distractors in the recognition memory test.
The 96-item recognition memory test included 48 studied words (three per study list, serial positions 1, 6, and 10) and 48 non-studied words (16 critical lures and 32 unrelated distractors), half in English and half in Spanish. Words were randomly presented. Half of the critical lures were presented in the same language as their corresponding studied lists (i.e., within-language conditions: L1L1 or Spanish-Spanish, and L2L2 or English-English). The other half of the critical lures was translated into the other language (i.e., between-language conditions: L1L2 or Spanish-English, and L2L1 or English-Spanish). Critical lures were evenly assigned to within- and between-language conditions.
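The composition of the recognition test can be checked with simple arithmetic. The sketch below is illustrative: the variable names are hypothetical, and the split of four critical lures per study-test cell is an assumption derived from the statement that lures were evenly assigned across the eight Spanish and eight English lists.

```python
# Counts taken from the description of the recognition test.
N_STUDY_LISTS = 16          # 8 Spanish (L1) + 8 English (L2)
LISTS_PER_LANGUAGE = 8
STUDIED_PER_LIST = 3        # serial positions 1, 6, and 10
N_UNRELATED = 32            # unrelated distractors

studied_items = N_STUDY_LISTS * STUDIED_PER_LIST    # studied words at test
critical_lures = N_STUDY_LISTS                      # one lure per study list
total_items = studied_items + critical_lures + N_UNRELATED

# Assumption: within each study language, half of the lures are tested in
# the same language (within) and half in translation (between).
lures_per_cell = {}
for study in ("L1", "L2"):
    other = "L2" if study == "L1" else "L1"
    lures_per_cell[study + study] = LISTS_PER_LANGUAGE // 2   # within-language
    lures_per_cell[study + other] = LISTS_PER_LANGUAGE // 2   # between-language

print(total_items)
print(lures_per_cell)
```

With four lures per cell, each participant's false recognition percentage in a given condition can only take the values 0, 25, 50, 75, or 100, which is worth keeping in mind when reading the condition means.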
Procedure
Participants were run in groups of up to 22 and were presented with 16 study lists (eight in Spanish and eight in English). Each word was visually presented on a computer screen for 2 s. The associates within each list were presented in decreasing order of associative strength, and lists were randomly presented. Participants were instructed in Spanish (L1) to pay attention to each word in preparation for a subsequent memory test (approximate English translation): “In this part of the experiment, words will be presented one at a time on the computer screen. Words will appear at a constant rate in the center of the screen; pay attention to them. Your task is to study the words as best you can because afterward you will have to perform a memory test. Some words will be presented in English and others in Spanish. Do you have any questions?”
Following the study phase, participants were administered the recognition test, in which words were presented one at a time on the computer screen. Participants were asked to decide whether each word had been presented (i.e., “old” word) or not (i.e., “new” word) at the study phase and to respond by pressing the corresponding key. As in previous research (Cabeza & Lennartson, 2005), restrictive memory instructions were provided, and participants had to respond “old” only to the studied words that appeared in the test phase in the same language as in the study phase. Otherwise, participants had to respond “new” to the word. The approximate English translation of the instructions is: “In this part of the experiment, we will test your memory. You will be presented with words one at a time on the computer screen. Your task is to determine whether each word was presented, or not, in the study phase. If the word was presented in the list of words that you just studied, please press the “E” key to indicate that it was STUDIED. If the word was not presented in the list of words you just studied, please press the “N” key to indicate that it is NEW. Be careful because you may have studied a word in Spanish and now, in the memory test, the word appears translated into English. In this case, you must indicate that it is a NEW word and press the “N” key. That is, you have to consider as STUDIED only the words that are written in the same language in which they were previously studied. For example, if you studied the word “FRANCIA” and the word “FRANCE” appears in the memory test, you should consider it as a NEW word by pressing the “N” key. You must do the same with the words that you studied in English and now are presented in Spanish. Do you have any questions?”
Results and discussion
Complete analysis code and data are freely available at https://osf.io/bz4g9/
Language dominance effect on true memory
To analyze the language dominance effect on true memory, we compared the percentages of “old” responses to studied words in the dominant language or L1 (i.e., Spanish) and the non-dominant language or L2 (i.e., English). The paired t-test indicated that participants correctly recognized fewer words studied in the L1 (M = 68.84, SD = 16.95) than in the L2 (M = 77.45, SD = 13.66), t(89) = −5.484, p < .001, Cohen’s d = −0.56, 95% CI [−11.73, −5.49]. Previous studies conducted with second-language learners have reported a similar pattern of results, that is, higher true recognition in the non-dominant language (e.g., Arndt & Beato, 2017; Beato & Arndt, 2021).
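As a reading aid, the reported statistics for this comparison can be cross-checked against one another. The sketch below uses only the values reported above; the variable names are hypothetical, and the standard error is inferred from the t value rather than taken from the data.

```python
# Values as reported for true recognition (percent "old" responses).
mean_l1, mean_l2 = 68.84, 77.45
t_value = -5.484                 # paired t-test, df = 89
ci_low, ci_high = -11.73, -5.49  # 95% CI of the mean difference

# The mean difference should sit at the center of the confidence interval.
mean_diff = mean_l1 - mean_l2
ci_center = (ci_low + ci_high) / 2

# Inferred standard error of the difference: t = mean_diff / se.
se_diff = mean_diff / t_value

# Critical t value implied by the interval's half-width; for df = 89 this
# should be close to 2.
implied_critical_t = ((ci_high - ci_low) / 2) / se_diff

print(round(mean_diff, 2), round(ci_center, 2))
print(round(se_diff, 2), round(implied_critical_t, 2))
```

The mean difference, the t value, and the confidence interval are mutually consistent under this check, which suggests that the interval refers to the raw mean difference in percentage points rather than to the effect size.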
Language dominance effect on false memory
To evaluate whether there was a language dominance effect on false recognition in both within- and between-language conditions, we conducted two separate two-way ANOVAs (see Table 1 for descriptives).
Note (Table 1). L1 = Spanish; L2 = English. Standard deviations are reported in parentheses.
First, a 2 (study language: L1 [Spanish], L2 [English]) × 2 (type of word: critical lure, unrelated distractor) repeated-measures ANOVA was performed on the percentage of “old” responses given to each type of word in within-language conditions. The analysis revealed a significant main effect of study language, F(1, 89) = 34.38, p < .001, ηp² = .279, with more “old” responses in the L1 than in the L2 condition (26.20 vs. 16.44, respectively), 95% CI [6.46, 13.08]. A significant main effect of type of word, F(1, 89) = 122.85, p < .001, ηp² = .580, showed that “old” responses to critical lures (M = 34.58) were more likely than “old” responses to unrelated distractors (M = 8.06), 95% CI [21.77, 31.28]. Finally, there was a significant Study Language × Type of Word interaction, F(1, 89) = 86.30, p < .001, ηp² = .492. False alarms to critical lures were higher than false alarms to unrelated distractors in both the L1 (46.67 vs. 5.74, respectively), 95% CI [34.77, 47.08], p < .001, Cohen’s d = 1.87, and the L2 (22.50 vs. 10.37, respectively), 95% CI [7.00, 17.26], p < .001, Cohen’s d = 0.65, but the interaction occurred because the effect of type of word was larger in the L1 than in the L2 condition. These data confirmed that critical lures, regardless of the language, produced above-baseline levels of false recognition. Moreover, as expected, false recognition was higher in critical lures associated with words studied in the L1 (i.e., L1L1 condition, M = 46.67) than in the L2 condition (i.e., L2L2 condition, M = 22.50), p < .001, Cohen’s d = 0.89, 95% CI [18.12, 30.22]. False alarms to unrelated distractors were higher in the L2 (M = 10.37) than in the L1 (M = 5.74), p < .001, Cohen’s d = 0.47, 95% CI [2.57, 6.69], a trend opposite to that for critical lures.
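The partial eta squared values reported for this ANOVA can be recovered from each F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three effects above; the function name `partial_eta_squared` is hypothetical, and the code is a cross-check rather than the authors' analysis script.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for the within-language ANOVA, all with df = (1, 89).
reported_f = {
    "study language": 34.38,
    "type of word": 122.85,
    "interaction": 86.30,
}

effect_sizes = {
    name: round(partial_eta_squared(f, 1, 89), 3)
    for name, f in reported_f.items()
}
print(effect_sizes)
```

The same function applies to the other ANOVAs in this section, since every effect is tested with 1 and 89 degrees of freedom.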
Second, we analyzed the language dominance effect on false recognition in between-language conditions. In these conditions, the critical lures at test were presented in a different language from that in which their associates were studied, and participants should give “old” responses only to those words that matched language at study and test. The 2 (study language: L1, L2) × 2 (type of word: critical lure, unrelated distractor) repeated-measures ANOVA performed on the percentage of “old” responses given to each type of word in between-language conditions revealed a significant main effect of type of word, F(1, 89) = 18.41, p < .001, ηp² = .171, where false alarms to critical lures (M = 14.86) were higher than false alarms to unrelated distractors (M = 8.06), 95% CI [3.65, 9.96]. Therefore, we confirmed that, as in previous studies with the same restrictive memory instructions, there was false recognition in between-language conditions (e.g., Cabeza & Lennartson, 2005). In contrast, neither the main effect of study language, F(1, 89) = 2.66, p = .106, ηp² = .029, nor the Study Language × Type of Word interaction, F(1, 89) = 3.41, p = .068, ηp² = .037, was statistically significant.
In summary, using restrictive memory instructions that required participants to retrieve the study language, critical lures were falsely recognized in both within- and between-language conditions, showing a false memory effect. It is worth mentioning that in the between-language conditions, critical lures were not only absent from the study phase but also appeared translated in the recognition test; despite this, false memory was found, confirming the robustness of the effect. This finding shows that participants were not able to retrieve the language information of all the studied lists, as they were unable to reject all the translated critical lures. Furthermore, a language dominance effect was only found in within-language conditions. Specifically, false recognition was higher when words were studied in the dominant than in the non-dominant language as long as the study and test language matched (e.g., Anastasi, Rhodes, et al., Reference Anastasi, Rhodes, Marquez and Velino2005; Arndt & Beato, Reference Arndt and Beato2017; Beato & Arndt, Reference Beato and Arndt2021; Sahlin et al., Reference Sahlin, Harding and Seamon2005), but there were no differences when the languages at study and test did not match (e.g., Kawasaki-Miyaji et al., Reference Kawasaki-Miyaji, Inoue and Yama2003; Sahlin et al., Reference Sahlin, Harding and Seamon2005).
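For readers who wish to check the effect sizes, the partial eta squared values reported for these ANOVAs can be recovered from each F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that arithmetic; the function name is hypothetical and the code is not taken from the authors' analysis scripts.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and effect sizes as reported in the text for the within-language
# ANOVA of Experiment 1; every effect has df = (1, 89).
reported = {
    "study language": (34.38, 0.279),
    "type of word": (122.85, 0.580),
    "study language x type of word": (86.30, 0.492),
}

for effect, (f_value, eta_reported) in reported.items():
    eta_computed = partial_eta_squared(f_value, df_effect=1, df_error=89)
    print(f"{effect}: computed {eta_computed:.3f}, reported {eta_reported:.3f}")
```

Because the identity depends only on F and the degrees of freedom, the same check applies to the between-language ANOVA and to the analyses of Experiment 2.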
Effect of language shift on false memory
In order to examine the effect of language shift on false memory, we conducted a 2 (study language: L1, L2) × 2 (language shift: within-language conditions [L1L1, L2L2], between-language conditions [L1L2, L2L1]) ANOVA on false recognition (i.e., false alarms to critical lures). This ANOVA yielded a significant main effect of study language, F(1, 89) = 47.28, p < .001, ηp² = .347, where false recognition was higher when words were studied in the L1 (M = 30.83) than in the L2 (M = 18.61), 95% CI [8.69, 15.75]. Also, there was a main effect of language shift, F(1, 89) = 56.02, p < .001, ηp² = .386, showing a higher false recognition in within-language conditions (i.e., L1L1 and L2L2) than in between-language conditions (i.e., L1L2 and L2L1) (34.58 vs. 14.86, respectively), 95% CI [14.49, 24.96].
Finally, there was a significant Study Language × Language Shift interaction, F(1, 89) = 31.82, p < .001, ηp² = .263. As can be seen in Figure 1, false recognition was higher when words were studied in the L1 (M = 46.67) than in the L2 (M = 22.50) in within-language conditions, p < .001, Cohen’s d = 0.90, 95% CI [18.12, 30.22], but not in between-language conditions (15.00 vs. 14.72, respectively), p = .910, Cohen’s d = 0.02, 95% CI [−4.59, 5.15]. Furthermore, regarding the effect of language shift, false recognition was higher in within- than in between-language conditions when words were studied in both the L1 (L1L1: 46.67 vs. L1L2: 15.00), p < .001, Cohen’s d = 1.28, 95% CI [24.61, 38.72], and the L2 (L2L2: 22.50 vs. L2L1: 14.72), p = .017, Cohen’s d = 0.36, 95% CI [1.42, 14.14], but this difference was larger when words were studied in the L1.
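The marginal means behind the two main effects, and the size of the interaction, follow directly from the four cell means of this design. The sketch below illustrates that arithmetic with the Experiment 1 values reported in the text. It assumes equal weighting of cells, which holds only if all 90 participants contributed to every condition, and the variable and function names are hypothetical.

```python
# Mean percentage of false alarms to critical lures in Experiment 1,
# keyed by (study language, language shift), as reported in the text.
cell_means = {
    ("L1", "within"): 46.67,   # L1L1
    ("L2", "within"): 22.50,   # L2L2
    ("L1", "between"): 15.00,  # L1L2
    ("L2", "between"): 14.72,  # L2L1
}


def marginal_mean(cells, factor_index, level):
    """Average the cells sharing one level of a factor (assumes equal cell weights)."""
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


def dominance_effect(cells, shift):
    """L1 minus L2 false recognition within one level of language shift."""
    return cells[("L1", shift)] - cells[("L2", shift)]


l1_mean = marginal_mean(cell_means, 0, "L1")            # compare with M = 30.83
l2_mean = marginal_mean(cell_means, 0, "L2")            # compare with M = 18.61
within_mean = marginal_mean(cell_means, 1, "within")    # compare with 34.58
between_mean = marginal_mean(cell_means, 1, "between")  # compare with 14.86

# The interaction is the difference between the two simple dominance effects.
interaction_contrast = (
    dominance_effect(cell_means, "within") - dominance_effect(cell_means, "between")
)
print(l1_mean, l2_mean, within_mean, between_mean, interaction_contrast)
```

A contrast far from zero, as here, is what the significant Study Language × Language Shift interaction reflects: the L1 advantage is large when study and test languages match and close to zero when they do not.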
Therefore, regarding the effect of language shift when restrictive memory instructions were used, the results indicated that false recognition was higher in within- than in between-language conditions regardless of the language in which the words were studied, replicating the findings of previous research that used the same restrictive instructions (Cabeza & Lennartson, Reference Cabeza and Lennartson2005; Sahlin et al., Reference Sahlin, Harding and Seamon2005). There are two possible explanations for this result: there was less false memory formation in between- than in within-language conditions or there was a correct retrieval of language information in between-language conditions. One could argue that participants rejected a higher number of critical lures in between-language conditions due to a lack of memory traces for those concepts. However, when we compare within- and between-language conditions (i.e., L1L1 vs. L1L2, or L2L2 vs. L2L1), we are contrasting conditions in which the study phase was the same. Since activation and gist extraction mainly occur in the study phase, in the comparison between L1L1 versus L1L2 (or L2L2 vs. L2L1), we expect that the critical lures were equally activated or the gist memory traces were the same in both conditions. If that was the case, we suggest that participants rejected more critical lures in between- than in within-language conditions because the restrictive memory instructions forced them to do so by correctly retrieving the study language, the second possible explanation. That is, it could be that participants (falsely) thought that critical lures were studied items but presented in a different language to that used at study and as the language did not match at study and test, following the instructions, participants rejected those critical lures. 
In this sense, the restrictive memory instructions made it difficult to know what was happening with critical lures: in between-language conditions, did participants have memory traces for the rejected critical lures or not? Experiment 2 sought to answer this question by analyzing false recognition in within- and between-language conditions with different memory instructions that would allow us to fully capture the false memory illusion.
Experiment 2
Experiment 2 studied the effect of language shift on false recognition when second-language learners do not need to retrieve the study language. Concretely, participants were presented with inclusive memory instructions that consisted of responding “old” in the recognition test when the item had been presented in the study phase, regardless of whether the study and test language matched or not. With these inclusive instructions, first, we expected to find, as in Experiment 1, higher false recognition when words were studied in the L1 (i.e., L1L1) than in the L2 (i.e., L2L2). Second, regarding the effect of language shift on false recognition, we consider that, in Experiment 1, participants had memory traces for the rejected critical lures in between-language conditions, but they were rejected because participants correctly retrieved the study language. In contrast, in Experiment 2, as participants have to endorse all studied words regardless of the language in which they were presented (i.e., inclusive memory instructions), we expect to find an increase in between-language false recognition (i.e., L1L2 and L2L1) reaching a similar level to the false recognition in within-language conditions (i.e., L1L1 and L2L2).
Method
Participants
We recruited 90 undergraduate students (85.56% women), native Spanish speakers living in Spain at the time of the experiment. None of them had participated in Experiment 1. Participants’ ages ranged from 19 to 29 years (M = 20.24; SD = 2.21). They self-rated their proficiency (using a 10-point scale) in Spanish (L1) with a perfect score (M = 10, SD = 0.00) and their proficiency in English (L2) close to the midpoint (M = 4.94, SD = 1.65), with scores ranging from 1 to 8. All participants were volunteers and signed an informed consent form. The study was approved by the Research Ethics Committee of the University of Salamanca.
Materials
In Experiment 2, we used the same materials as in Experiment 1.
Procedure
The procedure was similar to that used in Experiment 1, except for the instructions provided to the participants in the recognition test. In this experiment, participants were instructed in their L1 to respond “old” to words previously presented in the study phase, regardless of the language in which those words had appeared (i.e., inclusive memory instructions). That is, they should endorse the items presented at the study phase, regardless of whether the study and test language matched or not. Similar instructions were also used in Howe et al.’s (Reference Howe, Gagnon and Thouas2008) study. An approximate English translation of the instructions for Experiment 2 is: “In this part of the experiment, we will test your memory. You will be presented with words one at a time on the computer screen. Your task is to determine whether each word was presented, or not, in the study phase. If the word was presented in the list of words that you just studied, please press the “E” key to indicate it is STUDIED. If the word was not presented on the list of words you just studied, please press the “N” key to indicate it is NEW. Be careful because you may have studied a word in Spanish and now in the memory test the word appears translated into English. In this case, you must indicate that it was STUDIED and press the “E” key. That is, you have to consider as STUDIED the words that were presented in the study phase, regardless of the language (Spanish or English). For example, if you studied the word “FRANCIA” and the word “FRANCE” appears in the memory test, you should consider it as a STUDIED word by pressing the “E” key. You must do the same with the words that you studied in English and now are presented in Spanish. Do you have any questions?”
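The difference between the two instruction sets can be summarized as a decision rule for the response that the instructions call for on each test item. The following sketch is purely illustrative: the class, field, and function names are hypothetical and do not come from the experimental software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecognitionItem:
    word: str
    test_language: str             # "L1" (Spanish) or "L2" (English)
    study_language: Optional[str]  # None for critical lures and unrelated distractors


def expected_response(item: RecognitionItem, instructions: str) -> str:
    """Return the response ("old" or "new") that the instructions call for."""
    if instructions not in ("restrictive", "inclusive"):
        raise ValueError("instructions must be 'restrictive' or 'inclusive'")
    if item.study_language is None:
        # Never-studied items should be rejected under both instruction sets.
        return "new"
    if instructions == "inclusive":
        # Experiment 2: endorse every studied word, translated or not.
        return "old"
    # Experiment 1: endorse a studied word only if study and test languages match.
    return "old" if item.study_language == item.test_language else "new"


# FRANCIA was studied in Spanish (L1) and is tested as FRANCE in English (L2).
france = RecognitionItem(word="FRANCE", test_language="L2", study_language="L1")
print(expected_response(france, "inclusive"))
print(expected_response(france, "restrictive"))
```

Under either rule an “old” response to a critical lure is a false alarm, since lures were never studied; the instructions change only how translated studied words should be treated.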
Results and discussion
Complete analysis code and data are freely available at https://osf.io/bz4g9/
Language dominance effect on true memory
In order to check whether, as in Experiment 1, true memory was significantly higher in the non-dominant language than in the dominant language, we compared the “old” responses to words studied in the L1 (i.e., Spanish) and L2 (i.e., English). The paired t-test indicated that true recognition was higher for words studied in the L2 (M = 80.93, SD = 12.43) than in the L1 (M = 74.77, SD = 15.47), t(89) = −3.45, p = .001, Cohen’s d = −0.44, 95% CI [−9.70, −2.62]. This result replicated findings from our Experiment 1 and previous studies conducted with second-language learners (e.g., Arndt & Beato, Reference Arndt and Beato2017; Beato & Arndt, Reference Beato and Arndt2021).
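Cohen’s d for a paired comparison can be computed in more than one way, and the text does not state which formula was used. The sketch below contrasts two common definitions using the summary statistics reported for true recognition in Experiment 2. That the reported value matches the first definition is an inference from those statistics, not a statement from the source, and the function names are hypothetical.

```python
import math


def cohens_d_average_sd(mean_a, sd_a, mean_b, sd_b):
    """Mean difference divided by the root mean square of the two condition SDs."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def cohens_d_z(t_value, n):
    """Mean difference divided by the SD of the difference scores (d_z = t / sqrt(n))."""
    return t_value / math.sqrt(n)


# Summary statistics as reported: L1 (M = 74.77, SD = 15.47), L2 (M = 80.93, SD = 12.43),
# t(89) = -3.45 with n = 90 participants.
d_average = cohens_d_average_sd(74.77, 15.47, 80.93, 12.43)
d_z = cohens_d_z(-3.45, 90)
print(f"d based on average SD: {d_average:.2f}")
print(f"d_z based on t and n:  {d_z:.2f}")
```

The two definitions diverge whenever the correlation between conditions differs from .5, so comparisons of effect sizes across studies should confirm that the same definition was used.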
Language dominance effect on false memory
As in Experiment 1, to assess whether there was a language dominance effect on false recognition in both within- and between-language conditions, we conducted two separate two-way ANOVAs (see Table 1).
First, a 2 (study language: L1, L2) × 2 (type of word: critical lure, unrelated distractor) repeated-measures ANOVA was performed on the percentage of “old” responses in within-language conditions. This analysis revealed a significant main effect of study language, F(1, 89) = 24.73, p < .001, ηp² = .217, showing a higher percentage of “old” responses in the L1 than in the L2 condition (33.10 vs. 24.44, respectively), p < .001, 95% CI [5.20, 12.12]. A significant main effect of type of word, F(1, 89) = 195.03, p < .001, ηp² = .687, showed that “old” responses to critical lures (M = 45.69) were more likely than “old” responses to unrelated distractors (M = 11.85), 95% CI [29.03, 38.66], confirming the existence of false recognition. Finally, there was a significant Study Language × Type of Word interaction, F(1, 89) = 57.50, p < .001, ηp² = .392. Specifically, as in Experiment 1, this interaction indicated that although “old” responses to critical lures were higher than “old” responses to unrelated distractors in both the L1 (56.94 vs. 9.26, respectively), p < .001, Cohen’s d = 2.19, 95% CI [41.43, 53.94], and L2 (34.44 vs. 14.44, respectively), p < .001, Cohen’s d = 0.97, 95% CI [14.20, 25.80], this difference was greater in the L1 than in the L2. Moreover, as in Experiment 1, false recognition was higher in critical lures associated with words studied in the L1 (M = 56.94) than in the L2 (M = 34.44), p < .001, Cohen’s d = 0.82, 95% CI [15.88, 29.12]. Also as in Experiment 1, false alarms to unrelated distractors had an inverse pattern with higher values in the L2 (M = 14.44) than in the L1 condition (M = 9.26), p < .001, Cohen’s d = 0.43, 95% CI [2.64, 7.73].
Therefore, Experiment 2 replicated the results of Experiment 1 in the within-language conditions. That is, we found false recognition in both dominant and non-dominant languages. Furthermore, false recognition proved to be higher in the dominant than in the non-dominant language (e.g., Howe et al., Reference Howe, Gagnon and Thouas2008).
Second, regarding the language dominance effect on false recognition in between-language conditions, a 2 (study language: L1, L2) × 2 (type of word: critical lure, unrelated distractor) repeated-measures ANOVA was performed on the percentage of “old” responses. As in Experiment 1, the ANOVA revealed a significant main effect of type of word, F(1, 89) = 244.94, p < .001, ηp² = .733, with higher “old” responses to critical lures (M = 43.19) than to unrelated distractors (M = 11.85), p < .001, 95% CI [27.36, 35.32]. In addition, there was no significant main effect of study language, F(1, 89) = 0.79, p = .378, ηp² = .009. Finally, and in contrast to Experiment 1, there was a significant Study Language × Type of Word interaction, F(1, 89) = 12.01, p = .001, ηp² = .119. Specifically, Bonferroni post hoc tests indicated that there was a false memory effect in the dominant and the non-dominant language, with “old” responses to critical lures higher than “old” responses to unrelated distractors in both the L1 (47.50 vs. 9.26, respectively), p < .001, Cohen’s d = 1.94, 95% CI [33.00, 43.48], and L2 (38.89 vs. 14.44, respectively), p < .001, Cohen’s d = 1.07, 95% CI [18.49, 30.41]. Again, the difference between critical lures and unrelated distractors was larger in the L1 than in the L2. Furthermore, analyses revealed that false recognition was higher in critical lures associated with words studied in the L1 (M = 47.50) than in the L2 (M = 38.89), p = .023, Cohen’s d = 0.31, 95% CI [1.24, 15.98], but false alarms to unrelated distractors were higher in the L2 (M = 14.44) than in the L1 (M = 9.26), p < .001, Cohen’s d = 0.43, 95% CI [2.64, 7.73]. Thus, unlike what happened in the same condition in Experiment 1, we also found a language dominance effect on false recognition in the between-language conditions.
That is, the use of inclusive memory instructions allowed us to find that critical lures associated with words studied in the L1 were more falsely recognized than critical lures whose associates were studied in the L2, not only in within-language conditions but also in between-language conditions.
Effect of language shift on false memory
To analyze the effect of language shift on false recognition, a 2 (study language: L1, L2) × 2 (language shift: within-language conditions [L1L1, L2L2], between-language conditions [L1L2, L2L1]) ANOVA was carried out. The analysis revealed a significant main effect of study language, F(1, 89) = 34.78, p < .001, ηp² = .281. Specifically, as in Experiment 1, false recognition was higher when words were studied in the L1 (M = 52.22) than in the L2 (M = 36.67), 95% CI [10.32, 20.80]. Moreover, as expected, there was no significant main effect of language shift, F(1, 89) = 0.808, p = .371, ηp² = .009, 95% CI [−3.03, 8.03].
Finally, there was a significant Study Language × Language Shift interaction, F(1, 89) = 8.82, p = .004, ηp² = .09. Bonferroni post hoc tests showed that false recognition was higher when words were studied in the L1 than in the L2, both in within-language conditions (i.e., L1L1 vs. L2L2) (56.94 vs. 34.44, respectively), p < .001, Cohen’s d = 0.82, 95% CI [15.88, 29.12], and in between-language conditions (i.e., L1L2 vs. L2L1) (47.50 vs. 38.89), p = .023, Cohen’s d = 0.31, 95% CI [1.24, 15.98], but this difference was much larger for within- than between-language conditions (see Figure 2). Furthermore, regarding the effect of language shift, false recognition was higher in within- than in between-language conditions only when words were studied in the L1 (i.e., L1L1 vs. L1L2; 56.94 vs. 47.50, respectively), p = .016, Cohen’s d = 0.34, 95% CI [1.77, 17.12], but not in the L2 (i.e., L2L2 vs. L2L1; 34.44 vs. 38.89, respectively), p = .193, Cohen’s d = −0.16, 95% CI [−11.18, 2.29].
General discussion
The aim of this research was to examine the effect of language shift on false recognition (i.e., to compare false recognition in conditions in which study and test language matched or not) in second-language learners while manipulating the memory instructions, so that participants either did or did not need to retrieve language information to make their memory judgments.
For this purpose, we conducted two experiments in which we manipulated the instructions given at the recognition test, while the study phase instructions always remained the same. Specifically, in Experiment 1, we used restrictive memory instructions that required participants to endorse the studied items only when the language at study and test was the same (i.e., within-language conditions: L1L1, L2L2) and therefore, to reject the translated studied items (i.e., between-language conditions: L1L2, L2L1). In other words, with restrictive memory instructions, participants were forced to make judgments requiring retrieval of language information (i.e., to engage in source-monitoring processes, as predicted by the AMF, or to search for verbatim traces, as predicted by the FTT). In Experiment 2, inclusive memory instructions were used. These instructions required participants to endorse all the studied items regardless of the language in which they were presented. Thus, with inclusive memory instructions, participants were not required to make judgments requiring retrieval of word-specific language information (whether that be via monitoring or retrieval of verbatim traces), as they had to endorse the studied items in both within- and between-language conditions.
Regarding true memory, in both experiments, we found that true recognition was higher in the L2 or non-dominant language than in the L1 or dominant language, replicating previous findings of studies conducted with second-language learners or unbalanced bilinguals (e.g., Arndt & Beato, Reference Arndt and Beato2017; Beato & Arndt, Reference Beato and Arndt2021). Specifically, these authors suggested it is more effortful to access concepts’ lexical and semantic representations in the non-dominant than in the dominant language. This higher difficulty level benefits true recognition in the L2 as compared to the L1 (e.g., Bjork & Bjork, Reference Bjork, Bjork, Gernsbacher, Pew, Hough and Pomerantz2011).
Looking into the false memory effect, we found false recognition in within- and between-language conditions using not only inclusive, but also restrictive memory instructions. In other words, although the restrictive instructions at test promote the engagement of source-monitoring processes or the search for verbatim traces that reduce the false memory effect, this effect appeared with those restrictive instructions, even in the between-language conditions. It may seem surprising that participants falsely recognized critical lures in between-language conditions when they were instructed to reject translated studied words (i.e., restrictive memory instructions). This finding can be explained by the AMF and the FTT. On the one hand, the AMF states that, in the DRM paradigm, memory distortions arise when the activation of preexisting associations between the studied words and the critical lure occurs (activation processes that are automatic to some extent), and subsequently, monitoring processes fail. On the other hand, the FTT posits that memory distortions appear when gist information of the list is extracted, this gist trace matches the critical lure, and the retrieval of verbatim traces of the studied words is not enough to reject the critical lure. Furthermore, our false memory results with restrictive memory instructions seem to indicate that participants were able to engage in some language source monitoring (AMF), or to retrieve verbatim traces (FTT), as they produced a lower false memory rate when study and test language did not match (between-language conditions) than when they matched (within-language conditions).
However, the false memory effect with the DRM paradigm is so robust that these instructions could not eliminate the effect (see previous research using warning instructions for similar results; e.g., Carneiro & Fernandez, Reference Carneiro and Fernandez2010; McDermott & Roediger, Reference McDermott and Roediger1998; Watson et al., Reference Watson, McDermott and Balota2004).
We were also interested in examining the language dominance effect on false memory for within- and between-language conditions. On the one hand, in within-language conditions, as expected, we found a higher false recognition in the dominant (i.e., L1L1) than in the non-dominant language (i.e., L2L2) regardless of whether participants had or did not have to retrieve the language information (restrictive and inclusive memory instructions, respectively). In other words, a language dominance effect was found in false recognition in within-language conditions, as in previous studies (e.g., Anastasi, Rhodes, et al., Reference Anastasi, Rhodes, Marquez and Velino2005; Arndt & Beato, Reference Arndt and Beato2017; Sahlin et al., Reference Sahlin, Harding and Seamon2005; for a review, see Suarez & Beato, Reference Suarez and Beato2021), using restrictive as well as inclusive memory instructions. On the other hand, in between-language conditions, we also observed a higher false recognition for critical lures associated with words studied in the dominant language (i.e., L1L2) than in the non-dominant language (i.e., L2L1), but only when participants were required to endorse the studied items regardless of the language in which they were presented. That is, we found a language dominance effect in false recognition in between-language conditions only when inclusive memory instructions (Experiment 2) were used.
The findings regarding the language dominance effect on false memory could be explained in terms of the RHM (Kroll & Stewart, Reference Kroll and Stewart1994), the AMF (Roediger et al., Reference Roediger, Balota, Watson, Roediger, Nairne, Neath and Surprenant2001), and the FTT (Brainerd & Reyna, Reference Brainerd and Reyna2002). In relation to the RHM and AMF, conceptual representations will be activated more quickly and automatically from L1 than L2 words. Thus, the conceptual representation of the critical lures would be more activated if the lists were studied in the L1 than in the L2. For its part, according to the FTT, it can be assumed that there is a greater strength of gist memory traces in L1 than in L2 word lists because gist traces are supposed to store many conceptually based elements of an experience, such as its meaning (Arndt & Gould, Reference Arndt and Gould2006). These arguments would explain the language dominance effect on false memory in within-language conditions. In the between-language conditions with restrictive instructions, participants had to retrieve the study language and, since study and test languages did not match, they had to reject the falsely recognized translated critical lures. This would explain why no language dominance effect on false memory was found in between-language conditions with restrictive instructions. However, this effect was found in between-language conditions when using inclusive instructions since participants did not have to engage in language source-monitoring processes or search for verbatim traces. That is, inclusive instructions allowed us to capture either all the activation of critical lures or the strength of gist memory traces, being in both cases stronger in the L1 than the L2.
The difference in the language dominance effect in between-language conditions when using restrictive versus inclusive memory instructions (Experiment 1 and 2, respectively) provides evidence of the existence of episodic details in false memories. To understand how this result pattern shows evidence for recollection processes in false memories, we present the following example. In between-language conditions, participants studied a list, for example, in their L1 (mesa, sillón, sentarse, descanso, asiento, cansancio, sofá, taburete, cocina, respaldo) and were presented in the recognition test with the critical lure in their L2 (e.g., CHAIR). First, with inclusive instructions, the L1L2 condition showed higher levels of false recognition than the L2L1 condition (i.e., language dominance effect). That is, the critical lure CHAIR (L2), when its list was studied in the L1, would be more endorsed than, for example, the critical lure AGUJA (L1) when its list was studied in the L2 (thread, pin, eye, sewing, sharp, point, prick, thimble, haystack, thorn). This result can be explained in terms of the AMF and the FTT. According to the AMF, the conceptual representations of critical lures derived from studied lists in the dominant language (L1L2) would receive a more pronounced activation than in the non-dominant language (L2L1). Regarding the FTT, these data show that the gist memory traces from studied lists in the dominant language (L1L2) were stronger than those from studied lists in the non-dominant language (L2L1). Second, the fact that the language dominance effect was not found when using restrictive instructions (Experiment 1), as noted above, seems to be due to the fact that participants were able to engage in language source monitoring (AMF) or to retrieve verbatim memory traces (FTT). 
That is, in L1L2 and L2L1 conditions, when provided with restrictive instructions, participants seem to avoid committing false memories to some extent by recollecting the language in which the critical lure was “studied” (the same language in which the associates of that list were studied). In other words, participants seem to retrieve language information as episodic details (i.e., source-monitoring in AMF or verbatim traces in the FTT) about their false memory to avoid identifying some critical lures as studied items in between-language conditions.
Lastly, the results of the effect of language shift on false recognition will be discussed. In Experiment 1 with restrictive instructions, we found higher false recognition in within- than in between-language conditions, regardless of the language in which the words were studied (i.e., L1L1 > L1L2, and L2L2 > L2L1). This finding replicated the results of previous studies that have used these restrictive instructions (Cabeza & Lennartson, Reference Cabeza and Lennartson2005; Sahlin et al., Reference Sahlin, Harding and Seamon2005), and there are two possible explanations for it. First, it could be that in between-language conditions participants were quite resistant to the DRM false memory illusion because study and test languages did not match. Second, it could be the case that participants rejected more critical lures in between- than within-language conditions because they were able to retrieve the language information and (falsely) thought that critical lures were translated studied items. In other words, participants had false memories of those critical lures but answered “no” in the recognition test because they could remember the language in which the critical lures’ lists were presented and did not match the test language. The difference between these two possible explanations is substantial since there would be no false recognition in the former, while in the latter, there would be.
To disentangle whether or not there was false recognition, in Experiment 2, we used inclusive memory instructions. These instructions did not require participants to retrieve the study language and would allow them to endorse translated critical lures if they had a (false) memory for those critical lures. Therefore, when there is no need to retrieve the language, we can capture the false memory illusion more accurately. As expected, with inclusive memory instructions, in general, we found an increase in false recognition rates in between-language conditions (i.e., L1L2, L2L1) up to a similar level to the false recognition in within-language conditions (i.e., L1L1, L2L2). This increase in false recognition in between-language conditions when there was no need to retrieve the language (inclusive memory instructions) confirmed that the conceptual representations associated with critical lures in these conditions were activated, or that the gist trace was extracted and matched the critical lure. Consequently, in Experiment 1, when the instructions did not allow participants to endorse translated studied items (restrictive instructions), they rejected more critical lures in between- than in within-language conditions because they were able to retrieve the language information (to correctly monitor the study language, according to the AMF, or to retrieve verbatim traces, following the FTT) and not because of a lack of false memories. This finding is particularly interesting as it shows that: (1) participants created a memory trace for the critical lures, and (2) the memories for the critical lures contained episodic details (i.e., the study language of the critical lures’ lists).
Interestingly, analyzing the interaction between study language and language shift with inclusive instructions, we found an effect of language shift on false recognition only when words were studied in the L1 (i.e., higher false recognition in L1L1 than in L1L2), but not in the L2 (i.e., no differences in false recognition between L2L2 and L2L1). Predictions from the RHM could explain these results. According to the RHM, first, speakers usually have strong direct links from L1 words to their conceptual representations and, second, they access the conceptual representations from L2 words differently depending on their L2 proficiency. Specifically, participants with a high L2 proficiency would access the concepts directly from L2 words, while second-language learners would take a different route through the L1 translation. Thus, when participants studied words in their L1 (i.e., L1L1 and L1L2), they rapidly and automatically accessed the concepts. Furthermore, according to the AMF, that activation spread to associated concepts reaching the critical lures, or regarding the FTT, a strong gist memory trace for each list would be extracted. Later, on the one hand, when the critical lures were presented in their L1 at the recognition test (i.e., L1L1), participants falsely recognized them as studied items. On the other hand, when our participants with low L2 proficiency encountered the critical lures translated into their L2 at the recognition test (i.e., L1L2), they needed to translate those words into their L1 to access the conceptual representations. In this case, if participants were not able to translate some of the L2 words they found at test, they would not access those conceptual representations, and therefore, those translated critical lures would be rejected, finding lower false recognition in L1L2 than in L1L1 (i.e., the effect of language shift).
By contrast, when second-language learners studied words in their L2 (i.e., L2L2 and L2L1), as noted above, they accessed the conceptual representations by translating those words into their L1. Again, those conceptual representations would only be accessed when participants knew the translation of those L2 words, but in this case, the L2 knowledge would similarly affect both the L2L2 and L2L1 conditions. If the concepts associated with the critical lure were not activated (AMF) or gist traces from studied lists were not extracted (FTT) during the study phase, it does not matter in which language the critical lure is presented in the recognition test, as a similar false recognition in L2L2 and L2L1 would be expected, just as we found in our study.
This finding differs from previous studies that have used inclusive memory instructions and have found higher false recognition rates in between- than in within-language conditions (Howe et al., Reference Howe, Gagnon and Thouas2008; Marmolejo et al., Reference Marmolejo, Diliberto-Macaluso and Altarriba2009). This different pattern of results could be due to the fact that other researchers included bilingual participants while we tested second-language learners with a lower L2 proficiency. Moreover, as mentioned in the introduction, previous studies raised some methodological concerns that might also influence their results. Concretely, they employed a free recall task before the recognition test, and the language of this free recall task may or may not match the language of the study phase and the recognition test. Moreover, Howe et al. (Reference Howe, Gagnon and Thouas2008) included as between-language conditions some conditions where the language at study and at the recognition test was the same (i.e., within-language conditions in terms of the recognition memory test). Finally, although Marmolejo et al. (Reference Marmolejo, Diliberto-Macaluso and Altarriba2009) concluded that false recognition was higher in between- than in within-language conditions, they failed to find differences between our two comparisons of interest (i.e., L1L1 vs. L1L2 and L2L2 vs. L2L1).
We would like to point out that, although we have no translation data showing that participants failed to translate some L2 words correctly, previous studies with Spanish second-language learners of English have shown that participants had far from perfect knowledge of the L2 words presented in the experiment (Beato & Arndt, 2021). Future research might benefit from conducting a translation test at the end of the experiment to verify the explanation of our results proposed here. Additionally, further research could manipulate the need to make judgments requiring retrieval of word-specific language information not only at the test phase (as we have done in this research) but also during the study phase. The effect of language shift on false recognition would be expected to be greater if, before the study phase, participants were asked to remember the language of the studied words.
In summary, we must consider two important factors when studying the effect of language shift on false memory with second-language learners. First, we need to consider in which direction the shift occurs as, according to the RHM, second-language learners use two different routes when accessing the conceptual representations from L1 and L2 words. Second, we need to bear in mind the memory instructions used because, according to our results, only the inclusive memory instructions allowed us to fully capture the false memory illusion in both within- and between-language conditions. Furthermore, only the restrictive memory instructions allowed us to provide evidence regarding the existence of episodic details in false memories.
Taken together, the current findings advance our understanding of the effect of language shift on false memories. The false memory effect elicited with the DRM paradigm is so robust that it appears in between-language conditions tested in moderately proficient L2 speakers. Moreover, the only five studies that had compared false memory in within- and between-language conditions had reported mixed findings, as they had used different memory instructions. The present study provides a clearer picture of the role that instructions play in producing false recognition in both within- and between-language conditions.
Replication package
All research materials, data, and analysis code are available at https://osf.io/bz4g9/
Financial support
This study was partially conducted at the Psychology Research Centre (CIPsi/UM), School of Psychology, University of Minho, supported by the Foundation for Science and Technology (FCT) through the Portuguese State Budget (UIDB/01662/2020), and partially supported by the University of Salamanca and Banco Santander through a pre-doctoral research contract attributed to MS.
Conflicts of interest
The authors declare none.
Appendix