Introduction
Traditionally, metaphor has been regarded as a linguistic act rather than a cognitive way of thinking. So when processing a metaphor such as “our marriage is a dead-end street”, conventional comprehension models would suggest that the encoding depends on finding a shared similarity between the semantic concepts of “marriage” and “dead-end street” (e.g., Bowdle & Gentner, 2005) or treating “dead-end street” as an ad hoc category that the concept of “marriage” can be placed into, such as “things that are going nowhere” (e.g., Glucksberg & Keysar, 1990; see Holyoak & Stamenković, 2018, for a review). These models more or less assume that the concepts in a metaphor are already understood by the comprehender, and that metaphor processing depends on stretching the meaning of these words to find a point of comparison between the two concepts. In contrast to this view, Lakoff and Johnson (1980) argued that metaphor is not only a rhetorical device, but also a fundamental part of human thinking. This pioneering work, which became known as Conceptual Metaphor Theory (CMT), opened the door to the study of metaphor systems in cognitive linguistics.
According to CMT, conceptual metaphors are encoded and represented through an underlying conceptual system. For instance, abstract concepts (e.g., arguments) are constructed and represented entirely through metaphor. The essence of metaphor is to understand and experience an abstract concept through a concrete concept that has structural similarities with the abstract one. For instance, in the conceptual metaphor “ARGUMENT IS WAR”, the properties of the concrete concept “WAR” are mapped onto “ARGUMENT”, which leads to a set of correspondences between the two domains: both involve an unpleasant relationship, two opposing sides, careful strategizing, conflicts, etc. In this way, conceptual metaphors allow abstract concepts that cannot be touched or seen to be represented in terms of concrete concepts that are directly perceived or experienced. As such, CMT has made a major contribution to the area of embodied cognition, which argues that cognition is grounded in perception, action and experience (e.g., Barsalou, 1999, 2008; Gibbs, 2006). A major challenge for embodied cognition is to explain how abstract concepts are represented in an embodied way, and CMT offers a feasible explanation.
Although CMT focuses on concept representation, semantic information, and language understanding, this theory has received only limited experimental examination in cognitive psychology, the area most concerned with experimentally investigating concept representation, human memory and language processing. There are several reasons for this lack of examination (see Gibbs, 2009). For instance, some have argued that the theory is underspecified for cognitive experiments and that the supportive evidence can be explained by more common mechanisms, such as accounts based on association (McGlone, 2011). The experiments that have been conducted on CMT have focused on discourse reading (Glucksberg, Brown, & McGlone, 1993; Gong & Ahrens, 2007; Keysar, Shen, Glucksberg, & Horton, 2000; Nayak & Gibbs, 1990; Thibodeau & Durgin, 2008) and the embodiment of abstract concepts (Casasanto, 2008; Casasanto & Boroditsky, 2008; Gibbs, 2013; Matthews & Matlock, 2011; Wilson & Gibbs, 2007; Yang et al., 2015, 2021; Zhong & Leonardelli, 2008; see Holyoak & Stamenković, 2018, for a review), but little research has examined whether conceptual metaphors play a role in other areas associated with cognition, such as learning, categorization, and memory.
Conceptual metaphors and memory
A small group of studies conducted by Katz and colleagues has examined the psychological reality of conceptual metaphors through memory experiments. First, Katz and Taylor (2008) employed several semantic and episodic memory tasks to show that the conceptual metaphor “LIFE IS A JOURNEY” structured participants' semantic knowledge of typical life events. They found that when participants were asked to imagine the ideal life events of a 70-year-old, they tended to produce events in a chronological forward-temporal order and had highly consistent views on the age at which events would happen, the emotion associated with events, and whether the events would really happen or not. Katz and Taylor interpreted these findings to indicate that participants conceptualize typical lives as journeys with landmarks representing significant events in one's life, consistent with the LIFE IS A JOURNEY conceptual metaphor. Second, Katz and Law (2010) demonstrated in an episodic memory task that reading consecutive lists of metaphorical expressions based on the same conceptual metaphor elicited proactive interference analogous to what is observed for word lists based on taxonomic categories (see also Katz & Reid, 2020). Furthermore, a shift in the conceptual metaphor (e.g., switching from TIME IS MONEY to LOVE IS A JOURNEY) elicited a “release” from proactive interference akin to what happens when the taxonomic category is shifted between word lists (e.g., switching from exemplars of birds to fruit). This finding suggests that metaphorical expressions are organized according to conceptual metaphor categories in semantic memory, supporting their psychological reality.
Recently, Reid and Katz (2018) used a modified version of the Deese, Roediger and McDermott (DRM) memory paradigm to further test the psychological reality of conceptual metaphor theory (Deese, 1959; Roediger & McDermott, 1995). In a typical DRM task, participants read or hear a list of 15 words (e.g., jazz, horn, concert, orchestra, rhythm, piano, band, note, instrument, art, sound, symphony, radio, melody) and afterwards are asked to recall as many as possible (or recognize them from a list). Critically, each of the 15 words is associated with one non-presented word (i.e., the “critical lure”; in this case, the word music). Participants typically falsely remember this associated word at a very high rate even though it was not presented. This effect is also known as the “associative memory illusion,” and a large number of experimental studies have been conducted over the years, using a variety of DRM paradigm variants and controlling many different experimental variables to investigate the factors affecting the illusion (see Gallo, 2006, 2010; Chang & Brainerd, 2021, for reviews).
Lakoff (2008) proposed that when a person encounters a metaphorical expression, the conceptual metaphor that underlies the expression is activated automatically. On this account, when people are confronted with a phrase like “how did you spend the summer vacation?”, its conceptual metaphor, “TIME IS MONEY”, is automatically activated in their minds. Moreover, this should hold for metaphors in general use, not only for those identified by Lakoff. Thus, Reid and Katz (2018) hypothesized that after reading several metaphorical expressions based on one conceptual metaphor, people should mistakenly remember other metaphorical expressions based on the same conceptual metaphor. These non-presented expressions should be associated with the studied expressions, and therefore, should be activated in memory just as words are activated by their list of associates in the DRM memory illusion. This hypothesis was confirmed: after reading a list of phrases based on a common conceptual metaphor, participants were more likely to falsely recognize new phrases based on the same conceptual metaphor mapping than control phrases that did not share this mapping from the source domain to the target domain. Reid and Katz (2022) recently replicated the experiment but included a divided attention condition that limited participants’ conscious processing of the phrases. Even under divided attention, participants still displayed a robust false recognition effect, suggesting that conceptual metaphors are accessed automatically and effortlessly. These studies supply experimental evidence for the automatic activation of conceptual metaphors in English and validate the use of episodic memory tasks to explore the tenets of CMT.
The aim of the current research was to employ the same false memory paradigm adopted by Reid and Katz (2018) to examine the automatic activation of conceptual metaphors in Chinese–English bilinguals. Reid and Katz only tested native English speakers; however, conceptual metaphors are generally considered to be cross-lingual and universal as they have been identified across many languages, including Chinese (Li, 2010; Lv & Zhang, 2012; Yu, 1995, 2003, 2008), Thai (Han, 2019), Dutch (Forceville, 2007), Tagalog (Palmer, 2003), Japanese (Berendt & Tanita, 2011; Nomura, 1996), Spanish (Soriano, 2003), and Brazilian Portuguese (Gibbs, Lima, & Francozo, 2004), to name a few. Although different languages do not always have the exact same conceptual metaphors or instantiations of these metaphors (Yu, 2008, 2017), these studies indicate that cross-domain mappings appear in many different languages and cultures. Furthermore, Türker (2016) found that bilinguals can better comprehend metaphorical expressions in their L2 when they are based on conceptual metaphors that also exist in their L1, suggesting that conceptual metaphor activation is important when processing metaphorical expressions in L2. As such, people should show evidence of conceptual metaphor activation when reading both Chinese and English sentences, which would support the view that not only the existence of conceptual metaphors in language, but also their psychological reality, is universal. Before describing the experiments in detail, we will briefly review the literature on bilingual metaphor processing.
Metaphor processing in bilinguals
Metaphor comprehension is also a popular topic in the bilingual literature, with studies indicating that there are important differences in processing between L1 and L2. Several studies indicate that conventional metaphors tend to be processed as novel metaphors in L2. For instance, Mashal, Borodkin, Maliniak, and Faust (2015) found that L2 speakers considered conventional metaphoric word pairs (e.g., sweet revenge) to be more novel than L1 speakers did. While native speakers processed conventional metaphors more quickly when presented to the left hemisphere (LH) than when presented to the right hemisphere (RH), L2 speakers processed conventional metaphors faster when presented to the RH than the LH (Mashal et al., 2015). The RH is known to play a vital role in the processing of less prominent figurative expressions (e.g., novel metaphors) and more literal expressions (Cardillo, Watson, Schmidt, Kranjec, & Chatterjee, 2012; Forgács, Lukács, & Pléh, 2014; Kasparian, 2013). In an ERP study, late bilinguals were shown to respond similarly in their L2 to novel and conventional metaphorical expressions, demonstrating equal amplitude of a late positive component, whereas divergence between the two expressions was reported on the same component in their L1 (Jankowiak, Rataj, & Naskręcki, 2017). This suggests that L2 speakers engaged in continued effortful information retrieval for both conventional and novel metaphoric expressions, whereas L1 speakers only showed this continued effort for novel expressions.
Additionally, Ikuta and Miwa (2021) presented L2 speakers with forward and reversed metaphors (e.g., “some babies are angels” and “some angels are babies”, respectively) for various durations ranging from 500 ms to 8000 ms. Whereas previous research indicates that L1 speakers rate reversed metaphors as less comprehensible with increased presentation duration (Wolff & Gentner, 2011), Ikuta and Miwa found that L2 speakers instead rated reversed metaphors as more comprehensible with increased duration. This suggests that metaphor processing depends more on symmetrical comparison in L2. According to the Career of Metaphor theory (Bowdle & Gentner, 2005), novel metaphors are also initially processed via symmetrical comparison, whereas conventional metaphors are processed more asymmetrically. Therefore, this study also suggests that metaphors tend to be processed as more novel in L2 compared to L1.
Other research indicates that metaphors tend to activate more literal meaning when processed in bilinguals’ L2. Heredia and Cieślicka (2016) examined metaphor processing in Spanish–English and English–Spanish bilingual readers. Participants read English passages biasing either a literal meaning or a metaphorical meaning. The results indicated that meaning activation was determined by language dominance: Spanish-dominant bilinguals tended to activate the literal meaning, whereas English-dominant bilinguals and balanced bilinguals tended to access both the literal and metaphorical meanings. Moreover, Citron, Michaelis, and Goldberg (2020) found that metaphorical processing activity is higher in native speakers and that left amygdala activation rises as Metaphoricity increases. However, L2 speakers did not exhibit any significant activity beyond the caudate nucleus when Metaphoricity increased, suggesting that L2 speakers were less influenced by Metaphoricity than native speakers. These findings support the theory that metaphorical language is more engaging for native speakers but not always for L2 speakers, and that L2 speakers tend to process conventional metaphors more similarly to literal sentences than L1 speakers do (Citron et al., 2020). Along similar lines, Chen, Peng and Zhao (2013) explored the neural mechanisms of bilinguals’ metaphor processing and found that, for Chinese–English bilinguals, the amplitude of the N400 component was more negative for English metaphor expressions than for Chinese literal, English literal, and Chinese metaphor expressions, suggesting that these individuals had more difficulty rejecting the literal meaning in their second language.
Behavioral evidence also indicates that inhibiting literal meaning is difficult for L2 speakers as they often erroneously interpret metaphorical statements as literal (Picken, 2005).
The finding that L2 speakers show difficulty inhibiting literal meaning when processing metaphors in their L2 is consistent with the graded salience hypothesis (Giora, 2003). This hypothesis posits that metaphoric meaning is activated depending on which meaning (literal or figurative) is more salient, with salient meanings accessed more readily. Salience is influenced by factors like word frequency, familiarity, and conventionality. Non-salient meanings are less frequently used, less familiar, and need more time to be activated (Giora, 2003, p. 491). Although the graded salience hypothesis is not a bilingual theory, it suggests that metaphor comprehension in bilinguals could be influenced by bilingual factors, such as language proficiency and language dominance. So, for less proficient bilinguals, the literal meaning of a metaphor expression could be more dominant and salient in their second language. Along similar lines, the literal-salience resonant model (Cieślicka, 2006, 2015) posits that when bilinguals process idioms in their L2, the literal meanings of the idioms’ constituents are more salient than their figurative meanings.
In sum, there is evidence that bilinguals show some differences in processing metaphor expressions in their first language versus their second language, particularly regarding the activation of literal meaning. Therefore, it is important to explore how conceptual metaphor activation may differ depending on whether bilinguals are reading in their first or second language.
The current study
In this experiment, Chinese–English bilingual participants completed the episodic memory task used by Reid and Katz (2018, 2022). Participants read lists of metaphorical expressions (e.g., “How did you spend the summer?”) based on the same conceptual metaphor (e.g., TIME IS MONEY), and were then tested for recognition of new expressions that either used the same conceptual metaphor (e.g., “that cost me a day”) or did not (e.g., “the weekend is approaching”). If conceptual metaphors are activated automatically after reading a series of metaphor expressions, non-presented metaphor expressions based on the same conceptual metaphor should elicit more memory errors in the subsequent old/new sentence recognition test than control sentences.
Participants completed both Chinese and English versions of the episodic memory task, with each version being identical aside from the language. Through this bilingual design, this research can compare metaphor processing in native language and second language reading. According to the graded salience hypothesis, one could speculate that the first language has advantages in metaphor processing as the metaphorical meaning of the expressions might be more salient in participants’ first language. Therefore, there may be more memory errors for non-presented expressions that use the same underlying conceptual metaphor when participants read in their L1. However, as outlined above, bilinguals tend to process conventional metaphors as novel in their L2. Conceptual metaphors typically have a larger influence on the processing of novel metaphors than conventional metaphors (Gentner, Bowdle, Wolff, & Boronat, 2001; Keysar et al., 2000; Thibodeau & Durgin, 2008). Therefore, there may be more memory errors for expressions using the same conceptual metaphor in L2 than L1 if conceptual metaphor activation depends on novelty.
The other major finding outlined above is that bilinguals have difficulty inhibiting literal meaning when processing metaphors in their L2. If the literal meaning is more salient for expressions in the bilinguals’ second language, they may also show increased memory errors for literal control sentences associated with the source domain of the conceptual metaphor. That is, expressions based on the TIME IS MONEY conceptual metaphor may more strongly activate meanings related to the literal meaning of MONEY, and therefore, participants may show more memory errors for literal sentences about money (e.g., “he makes biweekly payments”) on the recognition tests in their L2.
Method
Participants
Forty Chinese–English bilinguals (sample age: mean = 19, SD = 1.73) from Zhejiang Gongshang University participated in this experiment in return for a small gift. Of the 40 participants, 36 were female and 4 were male. All participants were born in China and lived there at the time of the experiment. All indicated that they were proficient in reading Simplified Chinese as well as in reading English, that they had learned Chinese first and English second, and that Chinese was their dominant language at the time of the experiment. Sixteen participants had passed the Test for English Majors-Band 4 (TEM 4), 16 had completed the College English Test Band 6 (CET 6) with a mean score of 562, 19 had completed the College English Test Band 4 (CET 4) with a mean score of 605, and 3 had completed the IELTS with a mean score of 7. All had normal or corrected-to-normal vision and no reading disorders. More details about participants’ language backgrounds are displayed in Table 1.
Materials
Study lists
Eight study lists (four Chinese and four English) were created following the procedures outlined in Reid and Katz (2018). Each study list contained 16 metaphorical phrases based on one specific conceptual metaphor. The lists were based on four well-known conceptual metaphors that appear in both Chinese and English: “LOVE IS A JOURNEY,” “TIME IS MONEY,” “ECONOMY IS A HUMAN BEING,” and “ARGUMENT IS WAR”. Therefore, the same conceptual metaphors were used in both languages, although the expressions in these lists differed as metaphorical expressions vary across languages and cultures, even when the underlying mapping is the same (Yu, 2008). Each expression in the lists framed a concept from the target domain of the conceptual metaphor in terms of a concept from the source domain of the metaphor. For example, for the “TIME IS MONEY” list, English study expressions included “How did you spend the summer break?” and “The diversion should buy him a few minutes.” Here, “spend” and “buy” are associated with the source domain (i.e., MONEY) whereas “summer break” and “a few minutes” are associated with the target domain (i.e., TIME). Chinese expressions included “你把暑假的时间都花在哪里了 (English translation: How did you spend the summer break?)” and “他投资了一年的时间做实验 (English translation: He invested a year in the experiment.)”. Here, “花 (English translation: spend)” and “投资 (English translation: invested)” are associated with the source domain whereas “暑假 (English translation: summer break)” and “一年 (English translation: a year)” are associated with the target domain. Each participant read only 3 of the 4 study lists in each language for the purposes of the recognition test (described below). This yielded 4 versions of the experiment, each of which included a different non-presented study list. Participants were randomly assigned to the different versions.
The study lists were presented in blocks according to language, and the order of language (i.e., Chinese lists first or English lists first) was counterbalanced across participants.
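The design described above (four versions, each withholding a different list, crossed with a counterbalanced language order) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' procedure: the function names, the fixed seed, the alternation rule used for counterbalancing, and the assumption that the same list is withheld in both languages are all hypothetical.

```python
import random

# The four conceptual metaphors named in the text.
METAPHORS = [
    "LOVE IS A JOURNEY",
    "TIME IS MONEY",
    "ECONOMY IS A HUMAN BEING",
    "ARGUMENT IS WAR",
]


def build_versions(metaphors):
    """Return one version per metaphor; each version withholds a different list."""
    return [
        {"withheld": held_out,
         "presented": [m for m in metaphors if m != held_out]}
        for held_out in metaphors
    ]


def assign_participants(n_participants, versions, seed=0):
    """Randomly pick a version per participant and alternate language order.

    Alternating by participant index is an assumed counterbalancing rule;
    the text only states that language order was counterbalanced.
    """
    rng = random.Random(seed)
    assignments = []
    for pid in range(n_participants):
        version = rng.choice(versions)
        if pid % 2 == 0:
            language_order = ("Chinese", "English")
        else:
            language_order = ("English", "Chinese")
        assignments.append({
            "participant": pid,
            "withheld": version["withheld"],
            "presented": version["presented"],
            "language_order": language_order,
        })
    return assignments


versions = build_versions(METAPHORS)
assignments = assign_participants(40, versions)
```

With 40 participants, the alternation rule places exactly 20 in each language order, while version membership is left to chance, as in the random assignment described above.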
Distractor task
After each study list, there was a short distractor task consisting of 10 simple mathematical questions that required participants to follow the proper order of operations (e.g., 5 + 6 ÷ (9 − 3)). The purpose of this task was to prevent participants from rehearsing the study list expressions in short-term memory, which would affect the subsequent results of the recognition tests. Participants were asked to complete these math questions mentally without using any tools, such as a pen and paper or a calculator. There was no time limit for the math questions, but participants were asked to answer them as quickly and accurately as possible.
Recognition tests
After completing each study list and distractor task, participants completed a recognition task on the study list they had just finished. There was no time limit. Each recognition test consisted of 26 or 27 phrases, divided into 14 old items and 12-13 new items, or “lures”. The old items included all the phrases in the previous study list except the phrases in the second and the fourteenth serial positions. The new items included four types: 3 critical consistent lures, 3 control metaphor lures, 3 control literal lures and 3 to 4 unrelated lures.
The 3 critical consistent lures were expressions based on the same conceptual metaphor as the study list expressions but were not presented previously in the study list. The expression “That cost me a day” is an example of a critical consistent lure, which frames a TIME concept (“day”) in terms of a MONEY concept (“cost”).
The 3 control metaphor lures were phrases that framed the same target domain but with different source domains. For example, for the “LOVE IS A JOURNEY” list, one of the metaphor control lures was based on the “LOVE IS SWEET” conceptual metaphor. The purpose of the control metaphor lures was to eliminate the possibility that the participants would simply encode the target domain of the study list instead of activating a conceptual mapping from the source domain to the target domain. For instance, this ruled out that participants simply memorized that all the phrases were metaphorical expressions about “LOVE” rather than activating the specific metaphor mapping, “LOVE IS A JOURNEY”.
The 3 control literal lures were related to the source domain. For example, for the “LOVE IS A JOURNEY” list, a control literal lure was “This pathway is a short walk”, which is a literal statement about a JOURNEY. The purpose of the control literal lures was to eliminate the possibility that the participants simply encoded the source domain of the study list. For instance, this ruled out that participants simply attended to the fact that all of the sentences in the “LOVE IS A JOURNEY” list mentioned something about journeys, and used this criterion to make recognition decisions.
Finally, there were 3 to 4 unrelated lures that did not overlap with the study list in terms of either the target or source domain. These were drawn from the materials of the non-presented list: its 3 critical consistent lures, 3 control metaphor lures, 3 control literal lures, and the conceptual metaphor label associated with that list (e.g., “argument is war”). These 10 phrases were split into 3 parts with 3 to 4 phrases each and randomly assigned to different study lists to serve as the unrelated lures. We expected false recognition to be quite low for these items as they had little overlap with the studied items.
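To make the composition of a recognition test concrete, the following Python sketch assembles one test from a 16-phrase study list and the four lure types. It is illustrative only: the function names and placeholder phrases are hypothetical, the real stimuli are not reproduced, and the shuffled presentation order is an assumption, since the text does not state how test items were ordered.

```python
import random


def old_items(study_list):
    """Keep all studied phrases except serial positions 2 and 14 (1-indexed)."""
    return [p for pos, p in enumerate(study_list, start=1) if pos not in (2, 14)]


def split_unrelated(pool, n_parts=3):
    """Split the non-presented list's phrases into n_parts nearly equal parts."""
    base, extra = divmod(len(pool), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(pool[start:start + size])
        start += size
    return parts


def build_recognition_test(study_list, consistent, control_metaphor,
                           control_literal, unrelated, seed=0):
    """Return (phrase, item_type) pairs; shuffling the order is an assumption."""
    items = [(p, "old") for p in old_items(study_list)]
    lure_sets = {
        "critical_consistent": consistent,
        "control_metaphor": control_metaphor,
        "control_literal": control_literal,
        "unrelated": unrelated,
    }
    for item_type, lures in lure_sets.items():
        items.extend((p, item_type) for p in lures)
    random.Random(seed).shuffle(items)
    return items


# Placeholder phrases only (hypothetical stand-ins for the real stimuli).
study = [f"study phrase {i}" for i in range(1, 17)]
pool = [f"withheld-list phrase {i}" for i in range(1, 11)]
parts = split_unrelated(pool)
test_long = build_recognition_test(
    study, ["c1", "c2", "c3"], ["m1", "m2", "m3"], ["l1", "l2", "l3"], parts[0])
test_short = build_recognition_test(
    study, ["c1", "c2", "c3"], ["m1", "m2", "m3"], ["l1", "l2", "l3"], parts[1])
```

Splitting the 10 withheld-list phrases three ways gives one part of 4 and two parts of 3, which is why a test contains either 27 or 26 phrases.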
To ensure that the different types of lure sentences did not differ in terms of factors that could influence memory, we asked 12 Chinese–English bilingual volunteers to rate the affective valence and familiarity of the expressions. The items were rated on two seven-point scales, one for familiarity, and one for affective valence (1 = very negative/very unfamiliar, 7 = very positive/very familiar). We then compared the sentence length, valence, and familiarity values between the critical consistent, control metaphor, control literal and unrelated lures. For the Chinese stimuli, there were no sentence length differences between those four conditions, F = 0.891, p = 0.450 (with mean lengths of 11, 9, 10, and 10, respectively) nor differences in emotional valence, F = 0.418, p = 0.740 (with mean emotional valences of 3.5, 3.3, 3.8, and 3.6, respectively). There were also no sentence familiarity differences between the four conditions, F = 0.064, p = 0.979 (with mean familiarities of 4.8, 4.7, 4.9, and 4.8, respectively).
For the English stimuli, there were no length differences between the four conditions, F = 0.386, p = 0.763 (with mean lengths of 7, 6, 6, and 6, respectively). There were also no emotional valence differences between those four conditions, F = 1.554, p = 0.208 (with mean emotional valences of 3.3, 3.3, 4.1, and 3.6, respectively), nor differences in sentence familiarity, F = 0.932, p = 0.430 (with mean familiarities of 4.6, 5.0, 4.7, and 4.8, respectively). All in all, both the Chinese and English sentences were matched well in terms of sentence length, emotional valence and familiarity across the four main comparison conditions.
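The F values above compare the four lure conditions on each stimulus property. Assuming these come from a one-way between-items ANOVA (the text reports F and p but does not name the test or its degrees of freedom), the statistic can be computed as in the following sketch. The function name and the toy numbers in the usage line are hypothetical and are not the study's ratings.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists, one list of item-level values per condition.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviations from each group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy values for illustration only (not data from the study).
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A small F relative to its degrees of freedom, as reported for length, valence and familiarity, indicates that the condition means differ little compared with the item-to-item variability within conditions.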
Procedure
The procedure was similar to that used in Reid and Katz (2018). After participants arrived at the laboratory, they read the letter of information about this experiment, and verbal consent was obtained from each participant before they began. Participants were told that this was an experiment on language, memory and mental mathematics ability. This description was intended to encourage participants to attend closely to the distractor task so that it would interfere with rehearsal of the study lists as much as possible.
After the explanation, the experimenter opened the experiment on the computer. E-prime 2.0 software (Psychology Software Tools, Pittsburgh, PA; see Schneider et al., 2002) was used for data collection. After the participants read the experiment descriptions that were displayed on the computer, they pressed the space bar and started the test. The Chinese and English versions of the experiment were counterbalanced across different participants. Each version was divided into practice and formal experiment trials. The practice trial was designed to familiarize participants with the task and provided feedback with the correct answers to the math questions and recognition items after participants responded. Moreover, the practice trial items were metaphor and literal phrases created by the experimenter, and none of these items appeared later in the formal experiment trials. Each version contained one practice trial and three formal trials. The only difference between these two parts was that the practice trial displayed correct answers to the distractor task and recognition test whereas the formal trials did not. On each trial, participants first saw the study list, then finished the math distractor, and finally answered the recognition test for the study list they had just read. This process was repeated for each of the presented study lists. In some DRM experiments, all study lists are presented first and are followed by a large recognition test drawing from all lists (e.g., Gallo, Roediger, & McDermott, 2001; Soro et al., 2017). In contrast, due to the difficulty of remembering full sentences, we tested recognition following each list (as done in Reid & Katz, 2018; see also Kawasaki & Yama, 2006).
Each phrase in each study list was preceded by a fixation cross “+” presented for 500 milliseconds in the center of the screen, and then the phrase itself was presented for 3 seconds. After the study list was presented, the 10 math questions in the distractor task followed. Finally, the recognition test took place with each phrase presented one at a time in the center of the screen. Participants were instructed to identify phrases as old or new by pressing the “O” or “N” buttons, respectively, on the keyboard. Similar to the distractor task, the recognition task did not have a time limit.
The entire memory task took about 25–40 minutes to complete. In order to understand participants’ Chinese and English language proficiency and provide a reference basis for later data analysis, participants also completed a language experience questionnaire after the memory task, which took about 5–10 minutes. Therefore, the whole process took about 30–50 minutes. This research was approved by the Foreign Languages Department of Zhejiang Gongshang University.
Results
Generalized linear mixed-effects models (GLMMs) from the lme4 package in R were used to analyze false recognition (Bates, Mächler, Bolker, & Walker, 2015; Lo & Andrews, 2015; R Core Team, 2015), treating subjects and items as random effects and treating Lure Type and Language as fixed effects (Baayen, 2008; Baayen, Davidson, & Bates, 2008). The function Anova in the car package (Fox & Weisberg, 2016) was used to determine p-values from Wald chi-square tests for fixed effects with more than two levels. For the false recognition analysis, the model was Accuracy = glmer(accuracy ~ `Lure Type` * Language + (1 | subject) + (1 | item), family = "binomial", control = glmerControl(optimizer = "bobyqa")).
For completeness, we also examined response latencies for the lure sentences when they were correctly identified as “new” items (as will be seen, there were too few falsely recognized items in some categories to do a meaningful analysis of response latencies for false recognition). In false recognition studies with word lists, participants typically take longer to identify new words as “new” when they are related to the studied words because these words seem more familiar and are more difficult to reject as “old” (see Gallo, 2006, for a review). Therefore, if critical consistent lures have longer response latencies when they are correctly identified as “new”, this would suggest that they seemed more familiar to participants, supporting the hypothesis that their corresponding conceptual metaphors were activated at study.
Response latencies shorter than 250 ms or more than 3 standard deviations from the participant's mean response time, as well as false recognitions (14.3% of the data), were excluded from the latency analyses. Before running the model, R-default treatment contrasts were altered to sum-to-zero contrasts (Levy, 2014; Singmann & Kellen, 2017). The R code for the model was: RT = glmer(RT ~ `Lure Type` * Language + (1 | subject) + (1 | item), family = Gamma(link = "identity"), control = glmerControl(optimizer = "bobyqa")).
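The latency exclusion rule can be illustrated with a short sketch. It is not the authors' code: the function name `trim_latencies` is hypothetical, and two details the text leaves open are assumed here, namely that the participant's mean and standard deviation are computed over all of that participant's correct-rejection latencies before any trimming, and that the sample standard deviation is used.

```python
from statistics import mean, stdev

MIN_RT_MS = 250  # latencies shorter than this are excluded
SD_CUTOFF = 3    # latencies more than 3 SDs from the participant's mean are excluded


def trim_latencies(rts_ms):
    """Return one participant's latencies that survive both exclusion rules.

    `rts_ms` is assumed to hold only correct rejections of lures, since
    false recognitions are dropped from the latency analysis beforehand.
    """
    m = mean(rts_ms)
    sd = stdev(rts_ms)  # sample SD (assumption; the text does not specify)
    return [
        rt for rt in rts_ms
        if rt >= MIN_RT_MS and abs(rt - m) <= SD_CUTOFF * sd
    ]
```

For example, with nineteen latencies of 1000 ms plus one of 200 ms and one of 9000 ms, the 200 ms response falls below the floor and the 9000 ms response lies more than 3 SDs above the mean, so only the nineteen 1000 ms responses remain.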
More complex models that included all relevant random structures were used in our initial analyses, but we used the simpler models noted above due to convergence failures with the more complex random slope models (Barr, 2013). Different distribution families were used for the false recognition and latency analyses because the Gamma family is appropriate for positively skewed continuous data, which are characteristic of response latencies, and the binomial family is suitable for recognition data, in which each trial is a dichotomous datapoint (the item is categorized as either “old” or “new”). More details about data analysis with GLMMs can be found in Fox and Weisberg (2018). The percentage of false recognition and mean response latencies from a subject-based analysis for the four types of lures are shown in Table 2.
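For readers who prefer the two models in equation form, the following is one way to write them. The notation is introduced here for illustration only: i indexes subjects, j indexes items, the vector x_ij holds the coded Lure Type, Language, and interaction terms, and u_i and w_j are the random intercepts. The logit link is the default for the binomial family in R and is assumed, since the text does not name it; the identity link for the Gamma model is stated in the model call.

```latex
\begin{align*}
\eta_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + w_j,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2) \\
y_{ij} &\sim \mathrm{Bernoulli}(p_{ij}),
  \qquad \operatorname{logit}(p_{ij}) = \eta_{ij}
  \qquad \text{(false recognition)} \\
\mathrm{RT}_{ij} &\sim \mathrm{Gamma}\ \text{with mean } \mu_{ij},
  \qquad \mu_{ij} = \eta_{ij}
  \qquad \text{(latency, identity link)}
\end{align*}
```

Both models share the same linear predictor and random-intercept structure; they differ only in the response distribution and the link that connects the predictor to the mean.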
Analyses
Preliminary analysis
We first conducted a preliminary analysis on the critical consistent, control metaphor, and control literal lures when they were unrelated to the study lists. Although the sentences were matched well on length, familiarity, and affective valence, this analysis was conducted to directly examine the likelihood of false recognition for these items when they were not related to the presented items (in other words, to ensure that some lures were not more likely than others to be falsely recognized for reasons other than conceptual metaphor activation). A 2 (Language: Chinese or English) by 3 (Unrelated Lure Type: critical consistent, control metaphor, or control literal) within-subject GLMM analysis was conducted. In the analysis for false recognition, there were no main effects of Lure Type (χ² = 2.008, p = .366) or Language (χ² = 2.383, p = .123), nor was the interaction between Language and Lure Type significant, χ² = 2.178, p = .337. We also conducted an analysis of response latencies, which revealed no significant main effect of Lure Type (χ² = 4.567, p = .102), but a significant main effect of Language (χ² = 241.517, p < .001), suggesting that correct rejections of unrelated lures were faster overall in Chinese (1068 ms) than in English (1964 ms). This was not surprising as it was expected that participants would take longer to process and make recognition decisions for sentences in their L2. Critically, there was no significant interaction between Language and Lure Type, χ² = 1.786, p = .406. Therefore, any differences between the lure types in the subsequent analyses cannot be attributed to pre-existing differences in the lure items.
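As a worked check on how a Wald chi-square statistic maps to a p-value, the sketch below computes the upper-tail probability of the chi-square distribution in closed form for small degrees of freedom. The degrees of freedom are an assumption, since they are not reported in the text: a factor with k levels is taken to contribute k − 1, giving df = 1 for Language and df = 2 for the three-level Unrelated Lure Type factor and its interaction with Language. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Only df = 1, 2, 3 are covered, using the standard closed forms.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are covered in this sketch")


# False recognition results from the preliminary analysis.
p_lure_type = chi2_sf(2.008, df=2)    # main effect of Lure Type
p_language = chi2_sf(2.383, df=1)     # main effect of Language
p_interaction = chi2_sf(2.178, df=2)  # Language x Lure Type
```

Under these assumed degrees of freedom, the three values round to .366, .123, and .337, matching the p-values reported for the false recognition analysis.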
Main analysis
Participants completed a recognition test following each study list. The data presented in Table 2 are based on participants’ false recognition across all study lists. Of primary interest was the contrast in false recognition rates among the four lure types. We conducted a 2 (Language: Chinese or English) by 4 (Lure Type: critical consistent, control metaphor, control literal, or unrelated) repeated-measures GLMM analysis on false recognition and on response latencies for lures correctly identified as new.
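A subject-based false recognition percentage of the kind summarized in Table 2 can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the trial format (subject, lure type, response) and the function name `false_recognition_rates` are hypothetical, and "subject-based" is taken to mean that a rate is computed for each participant first and then averaged across participants.

```python
from collections import defaultdict


def false_recognition_rates(trials):
    """Mean percentage of lures called "old", averaged over subjects.

    `trials` is an iterable of (subject, lure_type, response) tuples for
    lure items only, where response is "old" or "new".
    """
    counts = defaultdict(lambda: [0, 0])  # (subject, lure_type) -> [n_old, n_total]
    for subject, lure_type, response in trials:
        cell = counts[(subject, lure_type)]
        if response == "old":  # an "old" response to a lure is a false recognition
            cell[0] += 1
        cell[1] += 1

    per_subject = defaultdict(list)  # lure_type -> list of per-subject proportions
    for (subject, lure_type), (n_old, n_total) in counts.items():
        per_subject[lure_type].append(n_old / n_total)

    return {lt: 100 * sum(r) / len(r) for lt, r in per_subject.items()}
```

With toy data in which one participant falsely recognizes one of two critical consistent lures and another falsely recognizes both, the subject-based rate for that lure type is (50% + 100%) / 2 = 75%.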
False recognition analysis
The false recognition analysis revealed a significant main effect of Lure Type (χ² = 115.723, p < .001), indicating that mean false recognition varied significantly between the different lure types. Moreover, there was a significant interaction between Language and Lure Type, χ² = 13.419, p = .004. Lastly, the main effect of Language did not approach significance, χ² = 1.065, p = .302.
In order to determine which lure types differed statistically, we conducted post-hoc comparisons between the false recognition rates for the different lure types within each language. For Chinese, the false recognition rate of the critical consistent lures was significantly higher than for control metaphor lures (z = −4.003, p < .001), control literal lures (z = −6.212, p < .001) and unrelated lures (z = −7.881, p < .001), replicating the findings of Reid and Katz (2018) in Chinese. The false recognition rates also differed significantly between the control metaphor lures and the control literal lures (z = −3.061, p = .012), and between the control metaphor lures and unrelated lures (z = −4.209, p < .001). For English, the false recognition rate of the critical consistent lures was also significantly higher than for the control metaphor lures (z = −3.289, p = .006), control literal lures (z = −3.313, p = .005) and unrelated lures (z = −7.230, p < .001), indicating that the conceptual metaphor false memory effect is robust even in participants’ second language. The false recognition rates also differed significantly between the control metaphor lures and the unrelated lures (z = 4.056, p < .001). Unlike in Chinese, however, the false recognition rates did not differ significantly between the control metaphor lures and the control literal lures (z = 0.123, p = .999).
As mentioned above, there was also a significant interaction between Language and Lure Type. To explore this interaction, we conducted a series of paired t-tests to compare each Chinese lure type to its English counterpart. Alpha was adjusted using the Bonferroni correction, yielding an adjusted value of .013. The only contrast that reached significance was between control literal lures, t(39) = −4.22, p < .001, Cohen's d = −0.89 (all other p's > .025). Therefore, the interaction between Language and Lure Type was driven mainly by increased false recognition for literal sentences about the source domain in English, the participants’ L2, relative to their first language of Chinese.
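The adjusted alpha can be reproduced with a few lines of arithmetic. The sketch assumes a family-wise alpha of .05 and four comparisons (one Chinese-versus-English contrast per lure type); neither number is stated explicitly, but together they give .05 / 4 = .0125, which is consistent with the reported value of .013. The function names are hypothetical.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha under the Bonferroni correction."""
    return family_alpha / n_comparisons


def is_significant(p_value, alpha):
    """A contrast is significant when its p-value falls below the adjusted alpha."""
    return p_value < alpha


# Assumed: family-wise alpha of .05 split over four lure-type contrasts.
alpha_adj = bonferroni_alpha(0.05, 4)  # 0.0125, reported as .013
```

Any p-value below .001, as reported for the control literal contrast, clears this threshold, whereas p-values above .025, as reported for the remaining contrasts, do not.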
To examine the testing order effect statistically, we also conducted a mixed-design GLMM analysis with Testing Order (first block vs. second block), Language (Chinese vs. English) and Lure Type (critical consistent, control metaphor, control literal and unrelated lures) as fixed factors. The results revealed that the main effect of Lure Type and interaction between Language and Lure Type were still significant (p's < .05), and that the main effect of Language remained nonsignificant (p = .369). The interaction between Language and Testing Order was significant, χ² = 11.702, p = .001, suggesting that Chinese–English bilinguals had fewer false recognitions when reading English metaphors than when reading Chinese metaphors in the first block, whereas the pattern was reversed in the second block. Critically, however, the interaction between Lure Type and Testing Order was not significant, χ² = 1.095, p = .778, nor was the three-way interaction between Testing Order, Language and Lure Type, χ² = 1.043, p = .791, indicating that the Testing Order did not impact the pattern of false recognition between the different lure types across the two languages.
Latency analysis
The latency analysis on lures correctly identified as “new” revealed significant main effects of Lure Type (χ² = 706.608, p < .001) and Language (χ² = 1842.498, p < .001), indicating that recognition latencies varied significantly between the different lure types and languages. There was also a significant interaction between Language and Lure Type, χ² = 34.161, p < .001. The recognition latencies (and standard deviations) for the different lure types are also displayed in Table 2.
Post-hoc comparisons were conducted to examine whether the recognition latencies differed between specific pairs of lure types. For Chinese, the mean latency for critical consistent lures was significantly longer than for control metaphor lures (z = −4.508, p < .001), control literal lures (z = −7.045, p < .001) and unrelated lures (z = −7.263, p < .001). The mean latency for the control metaphor lures was also significantly longer than for the control literal lures (z = −4.945, p < .001) and unrelated lures (z = −7.339, p < .001). For English, the mean latency of the critical consistent lures was also significantly longer than for the control metaphor lures (z = −2.876, p = .021), control literal lures (z = −9.980, p < .001) and unrelated lures (z = −8.548, p < .001), indicating a similar pattern of conceptual metaphor activation in participants’ second language. Mean latency was also significantly longer for the control metaphor lures than the control literal lures (z = −7.010, p < .001) and unrelated lures (z = −7.018, p < .001). These data suggest that even when critical consistent lures were not falsely recognized, they took longer to identify as “new” compared to the other lure types in both Chinese and English. This was likely because the critical consistent lures were more familiar, and therefore, harder to reject as “old” (see Gallo, 2006). As such, this is consistent with the false recognition results and further suggests that the conceptual metaphors were activated at study.
In sum, across both languages, non-presented expressions that were based on the same underlying conceptual metaphor as the study list items were falsely recognized more often than controls. This replicates the memory effect found by Reid and Katz (2018) both in a completely different language (Simplified Chinese), and in English with Chinese–English bilinguals who speak English as a second language. Metaphorically consistent expressions also took longer to identify as “new” when they were not falsely recognized, suggesting that they were more familiar to participants, and thus, harder to reject as “old”. The major difference between the languages was that control literal sentences about the source domains of the conceptual metaphors were falsely recognized more often in Chinese–English bilinguals’ second language, English.
General discussion and conclusion
The present study yields three major findings. Firstly, the results suggest that conceptual metaphors are activated automatically and immediately during sentence reading, which influences how such sentences are encoded into memory. After reading lists of metaphorical phrases based on the same conceptual metaphor, participants were more likely to falsely recognize non-presented phrases based on the same conceptual metaphor mapping (i.e., “critical consistent” lures), replicating the results obtained in English speakers (Reid & Katz, 2018). We interpret this finding to suggest that the underlying conceptual metaphor was activated while participants read the study list expressions, and as a result, participants found new expressions that used this conceptual metaphor to be more familiar at recognition. This could be due to a processing fluency advantage for critical consistent expressions. That is, because the conceptual mapping needed to process these items was already activated from the study list, these expressions were processed more fluently during recognition, whereas for the control expressions, a new conceptual mapping would need to be activated, yielding relatively slower processing. Processing fluency is a heuristic for judging familiarity in recognition tasks and has been proposed as a key mechanism for false memory effects (Doss, Bluestone, & Gallo, 2016; Gallo & Roediger, 2003; Whittlesea, 2002; see Footnote 2). Alternatively, from the perspective of fuzzy-trace theory (Brainerd & Reyna, 2002; Reyna et al., 2016), the conceptual metaphor may represent the overall theme, or “gist”, of each list; and new expressions that use the same metaphor mapping may be deemed by participants as being more consistent with this gist, yielding higher false recognition.
Regardless of the specific memory theory, the fact that the critical consistent expressions were falsely recognized significantly more often than controls suggests that conceptual metaphors influence how expressions are encoded into memory, supporting their psychological reality.
Secondly, our results also support the universality of conceptual metaphors as the memory effect is observed not only in English speakers, but also in Chinese speakers. This suggests that the effect found by Reid and Katz (2018) was not just due to the particular stimuli used, but that it generalizes to an entirely new set of expressions from a different language. We also employed two lists based on new conceptual metaphors, ARGUMENT IS WAR and THE ECONOMY IS A HUMAN BEING, that were not explored by Reid and Katz, suggesting the memory effect is robust across different conceptual mappings as well.
Thirdly, the memory effect was also observed in participants’ second language of English. The robustness of the effect in participants’ L2 is somewhat surprising as metaphor comprehension often poses problems for second language learners (see Nacey, 2016, for a review). Nonetheless, the Chinese–English bilinguals in this study showed a significant memory effect for critical consistent lures in their L2, and after correcting for multiple comparisons, the false recognition rate for these lures did not differ between L1 and L2. This demonstrates that participants were able to activate the appropriate conceptual mappings even in their L2.
Although the conceptual metaphor false memory effect was observed in both participants’ L1 and L2, the major difference that emerged between the languages was for the control literal lures, in which there was a higher false recognition rate in participants’ L2. Bilinguals often show difficulty inhibiting the literal meaning of metaphorical expressions in their second language (Chen et al., 2013) and sometimes interpret metaphorical expressions literally when there is a lack of context (Picken, 2005). As such, we interpret the difference for control literal lures as indicating that the study list expressions activated a literal representation of the source domain to a greater extent in participants’ second language. Because the literal source domain was activated during reading, when participants subsequently recognized items, they were more likely to make memory errors for literal sentences about the source domain. In contrast, when participants read the study lists in their L1, they more easily inhibited the literal representation of the source domain to focus on its metaphorical aspects, resulting in very few memory errors for the control literal lures in their L1. This is also consistent with the graded salience model (Giora, 2003), which posits that less salient meanings take longer to be activated. We hypothesize that metaphorical meanings are more salient in participants’ L1 due to their expertise in the language, but in L2, these meanings may be less salient, leaving more room for literal representations to activate.
Although we have discussed our results mostly in terms of CMT, it is important to consider other metaphor theories that could possibly explain these effects. Another dominant approach to metaphor is Gentner and colleagues’ Career of Metaphor Theory (Bowdle & Gentner, 2005; Gentner et al., 2001), which posits that novel metaphors are processed through structural alignment, much like analogies. Over time, if a metaphor becomes highly conventionalized, the metaphoric meaning associated with the source term is abstracted and becomes a secondary, lexicalized meaning of the source term. At this point, this conventionalized meaning resembles a metaphoric category, consistent with the attributive category theory of metaphor (Glucksberg & Keysar, 1990). For instance, “goldmine” can refer literally to a place where gold is mined, or to an abstract metaphor category of “things that are valuable”.
Some aspects of our data are consistent with Career of Metaphor theory. The finding that participants have more false recognitions in L2 for literal sentences associated with the source domains of the metaphors is consistent with Career of Metaphor because these metaphors should presumably be more novel in participants’ L2. According to Career of Metaphor, novel metaphors involve structurally aligning the concrete and literal representations of the target and source, which means the literal representation of the source should be activated for novel metaphors. In contrast, for highly conventional metaphors, the metaphoric meaning is lexicalized and can be accessed directly. Therefore, it makes sense that literal representations should be less activated in participants’ L1 with metaphors that are highly familiar as they should be able to directly access the metaphorical meaning of the source terms used in these metaphors. Therefore, the increased rate of false recognition for literal control sentences in participants’ L2 is consistent with Career of Metaphor theory.
From our perspective, it is more difficult to explain the pattern of false recognition in L1 in terms of Career of Metaphor theory. According to this theory, highly conventional metaphors, such as those used in this study, can be processed by categorization. Proponents of the attributive category theory of metaphor have rejected the notion that conceptual metaphors play a role in processing conventional metaphor statements (Glucksberg, Brown, & McGlone, 1993; Keysar, Shen, Glucksberg, & Horton, 2000; McGlone, 1996, 2007, 2011), arguing instead that people access stereotypical meanings of the source term that could be attributed to the target term. Furthermore, Gentner et al. (2001) argue that category-based approaches are “localist” in that they posit that metaphors highlight categorical relations between only the specific terms used in their statements, and therefore, do not have a mechanism to explain extended metaphorical mappings. If conceptual metaphors are not activated when processing conventional metaphors, we may expect a false memory effect in L2 where the expressions are novel, but there is no reason to expect that critical consistent lures should be falsely recognized more often than control metaphor lures in L1 where the expressions are conventional, unless false recognition was caused by another factor such as overall word similarity. However, Reid and Katz (2018) demonstrated that word similarity alone cannot account for false recognition effects based on conceptual metaphors (see also Katz & Reid, 2020).
Therefore, although the increased rate of false recognition for literal control sentences in L2 aligns with Career of Metaphor theory, the pattern of false recognition in L1 is inconsistent with the theory. Nonetheless, we do not reject the notion that structural alignment is an important mechanism in metaphor comprehension, and it is likely important for how conceptual metaphors are initially learned (Gentner et al., 2001; Holyoak & Stamenković, 2018).
There are some limitations in the present study. For instance, we only recruited medium-proficiency Chinese–English bilinguals, but according to the graded salience hypothesis, language proficiency should also influence metaphor processing, so future research should compare conceptual metaphor activation in bilinguals with different levels of proficiency. Furthermore, we extended the evidence of conceptual metaphor activation only to Chinese–English bilinguals, but it may be interesting to explore the activation of conceptual metaphors in other languages to test the universality claimed by Conceptual Metaphor Theory. Finally, another avenue for future research may be to compare English monolinguals reading English metaphoric expressions to Chinese monolinguals reading Chinese expressions to explore cultural differences in conceptual metaphor processing. Chinese culture emphasizes holistic processing, which involves a focus on how elements are interconnected, whereas Western culture prefers analytical processing, focusing on the properties of individual objects (Li, Masuda, Hamamura, & Ishii, 2018). As such, Chinese readers may focus more on how the study list expressions are connected through an underlying metaphor compared to English readers, who may focus more on the aspects of each expression that discriminate it from the others. Therefore, Chinese participants may show larger memory effects for critical consistent expressions than English participants. Furthermore, recent research has found that culture influences the DRM memory illusion (Wang et al., 2021). Therefore, it may be interesting to explore the role of culture in false memories elicited from conceptual metaphor activation.
In conclusion, through a false memory task conducted with Chinese–English bilinguals, we provide evidence for the activation of conceptual metaphors during reading in the Chinese language, and for individuals reading expressions in their second language, English. This replicates and extends the findings of Reid and Katz (2018), generalizing the effect to both another language (Simplified Chinese) and another language group (second language learners). The data also suggest that participants had difficulty inhibiting literal meaning in their L2, supporting a bilingual extension of the graded salience hypothesis (Giora, 2003). Future research should consider other languages, as well as the role culture plays in memory for language based on conceptual metaphors.
Acknowledgements
The data and material are available upon request from the first author. This research was supported by the Zhejiang Provincial Philosophy and Social Science Planning Project (22NDQN238YB) and the 2021 Zhejiang Gongshang University Provincial platform teaching project (1070XJ0520111-31) to Huilan Yang, and the 2022 Zhejiang Gongshang University school-level graduate education reform project (YJG2022104) to Sumin Zhang.
Conflicts of interest
None.
Appendix
Examples of expressions in each condition used in the Chinese experiment
时间就是金钱 (TIME IS MONEY study list)
你把暑假的时间都花在哪里了
我还有足够的时间完成这项工作
计划好你的时间
周末很宝贵
我没有时间花在这件事上
给你三个月的时间
老人剩下的时间不多了
他投资了一年的时间做实验
为此留出几天时间
你能花一个下午做这个吗
时间都被浪费了
游戏规定的时间快用完了
免费时间很宝贵
在他身上花那么多时间不值得
这将为我节省很多时间
Recognition test list
(Critical CM)
时间就是金钱
(Critical consistent lures)
借给我几分钟
那花了我一天的时间
不要浪费你的时间
(Control metaphor lures)
时间好像静止了
时间给他开了个玩笑
截止日期快到了
(Control literal lures)
你每月房租多少
他每两周付款一次
她借了低息贷款
(Unrelated items)
他们之间的谈话总是唇枪舌剑的
我们进行了非常生动的讨论
战争和疾病使得人口减少
Examples of expressions in each condition used in the English experiment
TIME IS MONEY study list
How did you spend the summer break
I have some days off banked from last month
Budget your hours
Weekends are precious
I don't have the hours for this
I'll give you a minute
Is that worth your while
Years are invested
Put aside a few days for this
Can you spare an afternoon
Hours are wasted
How many minutes do I have left
Free hours are valuable
The diversion should buy him a few minutes
This will save me many hours
Recognition test list
(Critical CM)
Time is money
(Critical consistent lures)
Lend me a few minutes
That cost me a day
You don't use your hours profitably
(Control metaphor lures)
The weekend seems so far away
The years have not been kind to him
The deadline is approaching
(Control literal lures)
How much is your rent per month
He makes biweekly payments
She took out a low-interest loan
(Unrelated items)
Your claims are indefensible
We had a very lively discussion
We need to protect our allies