
Not all verbal labels grease the wheels of odor categories

Published online by Cambridge University Press:  04 February 2025

Yaxiong Cao
Affiliation:
School of Cultures, Languages and Linguistics, University of Auckland, Auckland, New Zealand
Asifa Majid
Affiliation:
Department of Experimental Psychology, University of Oxford, Oxford, UK
Norbert Vanek*
Affiliation:
School of Cultures, Languages and Linguistics, University of Auckland, Auckland, New Zealand; Experimental Research on Central European Languages Lab, Charles University, Prague, Czechia
*Corresponding author: Norbert Vanek; Email: [email protected]

Abstract

Language is known to play a crucial role in how humans perceive and categorize sensory stimuli, including odors. This study investigated the impact of linguistic labeling on odor categorization among bilingual participants proficient in Chinese (L1) and English (L2). We hypothesized that L1-like linguistic labels would propel the learning of new olfactory categories more robustly than a condition without language, and that more familiar labels would better support odor category learning. The analysis compared the learning trajectories and odor categorization performance of four groups: three in which odors were paired with different sets of verbal labels, and a control group that categorized odors without any verbal labels. After four days of intensive training, the groups with verbal labels numerically outperformed the control group, and the less familiar the labels sounded, the more successful categorization became. However, between-group differences did not reach statistical significance. These findings, while not conclusively supporting our hypotheses, provide insights into the complex relationship between linguistic familiarity and odor category formation. Interpreted within the Ad Hoc Cognition framework, the results suggest that variations in linguistic familiarity may not induce contextual changes robust enough to differentially affect how odor categories are formed.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

The interplay between language and sensory perception, particularly in odor categorization, is a fascinating area of cognitive research. This study investigates how linguistic labeling affects odor category formation in Chinese-English bilinguals, exploring whether familiarity with verbal labels influences the efficiency of learning and categorizing odors. The relationship between words and categories continues to receive considerable scholarly attention. One widely held assumption across a vast array of approaches to language and cognition (e.g., Evans, 2009; Jackendoff, 2002; Langacker, 2008) is that words are linguistic labels with largely stable, conventionalized pairings of form and meaning stored in memory. Part of this assumption is that, in response to a linguistic label, people activate the core of the concept and meaning (i.e., the network of stored information linked to a concept) that the label denotes (Armstrong et al., 1983). This approach embraces the idea that the core of a concept is stable and static, inasmuch as it defies contextual variation. In sharp contrast to this view, Barsalou (1993, p. 29) argued that “a concept is a temporary construction in working memory, derived from a larger body of knowledge in long-term memory to represent a category” so that “across context, a given person’s concept for the same category may change, utilizing different knowledge from long-term memory, at least to some extent” (see also, e.g., Barsalou, 1983, 1987, 1989).
More recently, Casasanto and Lupyan (2015) proposed the Ad Hoc Cognition (AHC) framework, within which every concept, category, and word meaning is created on a case-by-case basis and can vary from one occurrence to another, both within and among individuals and groups. Within AHC, concepts, categories, and meanings are best defined by their emergence through the activation of a context-specific network of stored information triggered by contextual cues. Studies show that linguistic labels can act as particularly useful cues to support ad hoc categorization, even when encountered only temporarily in highly context-specific pairings with sensory input of various kinds, including visual (Lupyan et al., 2007), tactile (Miller et al., 2018), and olfactory stimuli (Vanek et al., 2021).

Lupyan et al. (2007) conducted a visual categorization study in which participants were trained to distinguish between approachable and non-approachable “aliens.” Visual stimuli were either co-presented with meaningless pseudowords or presented on their own. Participants who learned categories with verbal labels achieved significantly higher accuracy than those who learned without them. In a separate experiment, the effect of written verbal labels was tested alongside visual location-based spatial cues and auditory verbal cues. When all verbal data (written, auditory) were pooled and compared with all nonverbal data (no cue, location), verbal cues facilitated category learning more than nonverbal cues or no cues. These findings support the idea that categories are created ad hoc on the basis of retrieval cues, and they also suggest that verbal cues can activate stored information more effectively than nonverbal ones. A further investigation (Lupyan & Casasanto, 2015) compared the contribution of novel words (foove, crelch) to category learning with that of conventional words (round, pointy) and reported comparable learning outcomes. While novel verbal labels, despite lacking actual meaning, may help construct representations more robustly than when representations are constructed without verbal labels (Lupyan et al., 2007), an open question is whether new verbal cues activate stored information when concept-word pairings are truly random. Fixed pseudoword-concept pairings without randomization are not uncommon in the literature (e.g., Althaus & Plunkett, 2016; Perry & Lupyan, 2014).
However, unless such pairings are counterbalanced, critics may argue that foove activates ‘round’ and crelch activates ‘pointy’ because of sound symbolism (Hinton et al., 2006) rather than because of the context in which the label and the meaning are co-instantiated.

Using fully randomized pairings of novel verbal labels and sensory input, Miller et al. (2018) explored whether associations established between labels and co-presented tactile input could enhance the ability to discriminate subtle differences between vibration patterns. In the experiment, participants were presented with native-language-like pseudowords alongside differing tactile sensations. Importantly, participants were unaware that these pseudoword-tactile pairs were co-presented with either high (70%) or low (25%) consistency. After a week of training, participants showed significantly improved tactile discrimination in the high-consistency but not the low-consistency condition, even though stimulus frequency and exposure duration were equal across conditions. These findings align with the AHC framework, specifically with its tenets that concept construction happens ad hoc in a context-driven manner and varies across individuals (each learning to associate a different stimulus pair) and within individuals (high versus low pairing consistency).¹ One important aspect of AHC that Miller et al. did not control for is bodily or experiential relativity (Casasanto, 2011, 2014), a more enduring part of individuals’ experiential context, which in this case could include differences in general tactile discrimination ability.

This limitation was addressed in a later study by Vanek et al. (2021). Focusing on olfaction, Vanek et al. (2021) first ensured that participants’ odor discrimination abilities were comparable across conditions and then examined word-assisted odor category learning with native-language-sounding pseudowords randomly paired with odors. After four days of training in which individuals categorized odors alongside meaningless pseudowords, those trained with more consistent (81%) odor-pseudoword pairs learned the target categories significantly better than individuals who received the same perceptual input but were trained on less consistent (25%) odor-label pairs. These findings further highlight how new and initially meaningless verbal labels, at least those conforming to the phonotactic rules of the native language, can facilitate category formation and learning. Within the AHC framework, a consistency effect on categorization accuracy, despite randomness in odor-label pairings, demonstrates how context can give rise to stability from pervasive variability. One point these earlier studies did not address is whether the nature of the verbal labels matters in promoting category learning.

1.1. The present study

Previous research on word-assisted category formation has explored the effect of verbal labels on accuracy gains and learning trajectories. However, the (pseudo)words used to date have typically conformed to the phonotactic rules of the participants’ native language. This approach, while informative, overlooks the increasingly common reality of multilingual environments and the potential differences in how first (L1) and second (L2) language labels might influence cognitive processes.

We ask whether comparable verbal facilitation effects arise when categorization is accompanied by non-native-sounding verbal labels. This question is particularly pertinent given that most individuals frequently navigate between multiple languages. If initially meaningless verbal labels can implicitly form meaningful associations that support category construction, does it matter how familiar the labels sound? Relatedly, if non-native-sounding words are ‘sieved through’ the weights tuned by the listener’s first language (Flege & MacKay, 2004; Sebastián-Gallés & Soto-Faraco, 1999; Strange, 1995), does previous linguistic experience affect how more versus less familiar verbal labels help listeners form new categories?

These questions are not only theoretically interesting but also have considerable ecological validity. In many real-world scenarios, individuals must categorize and learn about new stimuli using labels from their L2 or even from completely unfamiliar languages. Understanding how this process differs from L1 labeling could have important implications for fields including second language acquisition, cross-cultural communication, and even marketing in global markets. To investigate these questions, the present study tracked the implicit associations formed between hard-to-name stimuli and verbal labels that varied as a function of participants’ previous linguistic experience. The labels conformed to the phonotactic rules of the participants’ first language or second language, or came from a completely unfamiliar language. Participants were trained to categorize smells.

Three reasons motivated the choice of olfaction as the target modality: its high and often overlooked ecological validity; its generally low effability, which makes smells hospitable to various labeling configurations; and a well-established body of work showing that words make a difference to how odors are judged and perceived. Of vision, hearing, taste, touch, and smell, smell often receives the least attention, despite its significant role in daily life. Its significance goes beyond mere sensory perception: smell provides information about the surrounding environment, and the loss of smell can indicate underlying health issues. For instance, olfactory dysfunction has been associated with obesity (Peng et al., 2019) and is often an early symptom of neurodegenerative conditions such as Alzheimer’s disease (Velayudhan et al., 2013) and Parkinson’s disease (Haehner et al., 2009). More recently, smell loss has been identified as a key symptom of COVID-19 (Lechien et al., 2020).
As for effability, apart from languages with more elaborate smell vocabularies, such as Jahai (Malaysia), with 12 basic smell terms (Burenhult & Majid, 2011), and Cha’palaa (Ecuador), with 15 (Floyd et al., 2018), consistency in the use of smell labels is low in most languages (Majid et al., 2018), although not in all (e.g., Majid & Burenhult, 2014; Majid & Kruspe, 2018). The low codability of smells is an advantage when the goal is to explore how odor categories are learned with linguistic labels (Vanek et al., 2021).

Two main research questions motivated this study. The first asked to what extent language facilitates odor category formation in a previously untested group of participants with Chinese as L1 and English as L2. The hypothesis was that L1-like linguistic labels would propel the learning of new olfactory categories more robustly than a condition without language. We predicted that participants trained with L1-like labels would show higher odor categorization accuracy and faster learning than participants trained with olfactory stimuli alone. The second question explored whether the nature of verbal labels affects the trainability of odor categorization. The hypothesis was that the more familiar the label, the better it would support odor category learning. Specifically, we predicted that Chinese speakers of L2 English co-exposed to odors paired with Chinese-like labels would learn odor categories better and faster than those co-exposed to odors paired with English-like labels, who in turn would learn better and faster than those co-exposed to odors paired with completely unfamiliar-sounding labels (in this case, Georgian pseudowords). Negligible differences were expected between the group with unfamiliar-sounding Georgian pseudoword labels and the control group with no labels.

2. Methodology

2.1. Participants

Eighty Chinese-English bilinguals (with Chinese as L1 and English as L2) without olfactory impairments constituted the participant pool. They were randomly assigned to one of four groups of 20 individuals each. This sample size is in line with established studies in the area (e.g., Miller et al., 2018; Vanek et al., 2021). However, we acknowledge that with four groups, the sample size per group is relatively small, which may limit the statistical power to detect smaller effects. Participants had to be native speakers of Chinese with high proficiency in English, certified via CET-6, TEM-8, IELTS ≥ 6.5, or an equivalent qualification. Another inclusion criterion was complete unfamiliarity with the Georgian language. Recruitment focused on university students in China majoring in English.
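The random allocation of 80 participants into four groups of 20 can be illustrated with a short Python sketch. The text does not describe the allocation mechanics, so the function name `assign_groups` and the shuffle-then-deal approach are hypothetical illustrations, not the authors' procedure.

```python
import random

# Condition names follow the text; the allocation method itself is hypothetical.
GROUPS = ["Chinese", "English", "Georgian", "Control"]

def assign_groups(participant_ids, groups=GROUPS, seed=None):
    """Randomly split participants into equally sized groups.

    The IDs are shuffled and then dealt out round-robin, so every group gets
    the same number of participants whenever len(ids) is a multiple of len(groups).
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # seeded generator for a reproducible allocation
    rng.shuffle(ids)
    assignment = {g: [] for g in groups}
    for i, pid in enumerate(ids):
        assignment[groups[i % len(groups)]].append(pid)
    return assignment

if __name__ == "__main__":
    allocation = assign_groups(range(1, 81), seed=2025)
    for name, members in allocation.items():
        print(name, len(members))  # 20 per group
```

Shuffling before dealing keeps the group sizes exactly balanced, which independent per-participant coin flips would not guarantee.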

Participants underwent a comprehensive screening process involving a self-report questionnaire based on Manescu et al. (2014) and Vanek et al. (2021), which was also used in Cao et al. (2024). This questionnaire covered olfactory function, language proficiency, and educational background. The inclusion criteria were the absence of nasal or nervous system diseases affecting olfactory function, no illnesses impacting olfaction, no use of drugs, alcohol, or cigarettes, and no eating or use of aromatic products in the hour before the test. Prior to the formal experiment, participants also completed an odor discrimination test using the Sniffin’ Sticks kit (Hummel et al., 2007). Participants needed 10 or more correct answers out of 16 to pass this test and qualify for the main experiment. After being randomly allocated to one of the four groups (Chinese pseudoword labels, English pseudoword labels, Georgian pseudoword labels, or the Control group without any verbal labels), all participants took part in four days of triad-matching training, following the perceptual learning design of Vanek et al. (2021).
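As a rough illustration, the screening rules stated in this section can be encoded as a single predicate. The `Screening` record and its field names are hypothetical, and the "or equivalent" proficiency route is deliberately not modeled because the text does not specify which certificates counted as equivalent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Screening:
    """Hypothetical screening record; field names are illustrative only."""
    discrimination_correct: int                 # Sniffin' Sticks discrimination score out of 16
    certificate: str                            # e.g. "CET-6", "TEM-8", "IELTS"
    ielts_score: Optional[float] = None
    knows_georgian: bool = False
    olfactory_condition: bool = False           # nasal/nervous-system disease or illness affecting smell
    substance_use: bool = False                 # drugs, alcohol, or cigarettes
    aromatic_exposure_last_hour: bool = False   # ate or used aromatic products within 1 h of testing

def meets_inclusion(s: Screening) -> bool:
    """Apply the stated inclusion criteria ("or equivalent" certificates not modeled)."""
    proficient = s.certificate in {"CET-6", "TEM-8"} or (
        s.certificate == "IELTS" and s.ielts_score is not None and s.ielts_score >= 6.5
    )
    return (
        s.discrimination_correct >= 10          # pass mark: 10 or more correct out of 16
        and proficient
        and not s.knows_georgian
        and not s.olfactory_condition
        and not s.substance_use
        and not s.aromatic_exposure_last_hour
    )

print(meets_inclusion(Screening(discrimination_correct=12, certificate="TEM-8")))  # True
print(meets_inclusion(Screening(discrimination_correct=9, certificate="TEM-8")))   # False
```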

2.2. Olfactory stimuli

This study used odor pens from the Burghart Sniffin’ Sticks collection (Hummel et al., 1997). Following the methodology of Vanek et al. (2021), eighteen Sniffin’ Sticks were chosen as olfactory stimuli and organized into six triplets. Each triplet comprised two test odors (Odor A and Odor B) and one reference odor (Odor X), with the reference odor always presented after the two test odors. This triplet design aimed to minimize differences within a triplet while maximizing differences between triplets. The selection of these eighteen odor pens was informed by previous research (Vanek et al., 2021), ensuring optimized within-triplet similarity. Note that while these odors are part of the Sniffin’ Sticks collection, they form a subset that differs from the original 16-item identification test described in Hummel et al. (1997). The six triplets are shown in Figure 1. An additional odor triplet, ginger-anise-coke, was used to illustrate the task before the formal experiment.

Figure 1. Six odor triplets used in the experiment. (NOTE: Pictures and real-world labels are provided for illustration purposes only. The actual stimuli were odors and pseudowords).

2.3. Verbal stimuli

In total, fifty-four phonologically correct but meaningless pseudowords served as verbal stimuli. These comprised eighteen Chinese pseudowords, eighteen English pseudowords, and eighteen Georgian pseudowords. All pseudowords were presented in auditory form. A native Chinese speaker read aloud and recorded the Chinese pseudowords, a native English speaker performed the same for the English pseudowords, and a native Georgian speaker did so for the Georgian pseudowords. The audio files of all pseudowords used in the present study, as well as the odor categorization data, are available on the project website at https://osf.io/7m58j/.

2.3.1. English pseudowords

Eighteen English bisyllabic pseudowords served as verbal stimuli in the English group. These pseudowords were selected from a well-established collection (Rastle & Coltheart, 2000), with the constraint that they should not evoke olfactory associations. The chosen English pseudowords were vebous, nurhact, bomegoze, difboze, zabnart, neethime, quimhet, bimgant, famwise, tozkolt, gantmirt, hochic, holpbon, goonoze, vurtless, bitjeed, doomipe, and kabist.

2.3.2. Chinese pseudowords

The Chinese pseudowords consisted of two characters, reflecting the fact that two-character words constitute as much as 74% of the Chinese vocabulary (Modern Chinese Frequency Dictionary; Wang, 1986). Each Chinese pseudoword was constructed by pairing two existing two-character Chinese words and combining the first character of the first word with the second character of the second word. Following Xiao et al. (2005), three possible outcomes of this composition were considered: (1) the new word itself had a meaning (Figure 2a); (2) the new word itself had no meaning, and no other meaningful word had the same pronunciation (Figure 2b); (3) the new word itself had no meaning, but another meaningful word had the same pronunciation (Figure 2c). Because the verbal labels were presented auditorily, a pseudoword pronounced like a real word could lead participants to associate it with that word. Consequently, only pseudowords in the second category (Figure 2b) were deemed suitable for the experiment. Following these criteria, eighteen Chinese two-character pseudowords were generated (Figure 3): 队铁, 领荣, 鞋睛, 玻话, 火市, 电去, 脚台, 庭服, 天泡, 条钮, 方念, 宁局, 过体, 方度, 尺脑, 成造, 环息, 坦讲. Two native Chinese speakers independently verified that the chosen pseudowords were meaningless.
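The composition rule and the three-way classification can be sketched in Python using the examples from Figure 2. This is a toy illustration under strong assumptions: the miniature lexicon (including 水池 as the word for "pool") and the character-to-pinyin table are hypothetical stand-ins for a full dictionary, tones are compared per character using citation tones, and in the study itself meaninglessness was verified by native speakers rather than by code.

```python
# Toy lexicon (hypothetical subset of a real dictionary): word -> English gloss.
LEXICON = {
    "电话": "telephone",
    "水池": "pool",
    "电池": "battery",
    "玻璃": "glass",
    "临时": "temporary",
    "作业": "homework",
    "林业": "forestry",
}

# Toned pinyin per character (simplification: citation tones, no neutral tone).
CHAR_PINYIN = {
    "电": "dian4", "话": "hua4", "水": "shui3", "池": "chi2", "玻": "bo1",
    "璃": "li2", "临": "lin2", "时": "shi2", "作": "zuo4", "业": "ye4", "林": "lin2",
}

def compose(first_word: str, second_word: str) -> str:
    """First character of the first word + second character of the second word."""
    return first_word[0] + second_word[1]

def pronounce(word: str) -> str:
    """Concatenate per-character toned pinyin."""
    return " ".join(CHAR_PINYIN[ch] for ch in word)

def classify(candidate: str) -> str:
    """Map a candidate onto the three scenarios of Figure 2."""
    if candidate in LEXICON:
        return "real word"                   # scenario (a): excluded
    sound = pronounce(candidate)
    if any(pronounce(w) == sound for w in LEXICON):
        return "homophone of a real word"    # scenario (c): excluded
    return "acceptable"                      # scenario (b): kept

for first, second in [("电话", "水池"), ("玻璃", "电话"), ("临时", "作业")]:
    word = compose(first, second)
    print(word, classify(word))
# 电池 real word
# 玻话 acceptable
# 临业 homophone of a real word
```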

Figure 2. Composition processes of two-character Chinese pseudowords. (a) One excluded scenario illustrates the combination of the first character from a two-character Chinese word (e.g., the initial character in the Chinese word for “telephone”) with the second character from another two-character Chinese word (e.g., the second character in the Chinese word for “pool”). This combination results in the formation of a new real word (e.g., the newly created word in Chinese meaning “battery”). (b) Example of an acceptable scenario of composing a two-character Chinese pseudoword through combining the first character from a real two-character Chinese word (e.g., the initial character in the Chinese word for “glass”) with the second character from another two-character Chinese real word (e.g., the second character in the Chinese word for “telephone”). This combination results in the creation of a new, meaningless word (e.g., the newly formed word 玻话), and importantly, there is no other meaningful word that would have the same pronunciation as this new word. (c) Another excluded scenario involves combining the first character from one two-character Chinese word (e.g., the initial character in the Chinese word for “temporary”) with the second character from another two-character Chinese word (e.g., the second character in the Chinese word for “homework”). This combination results in the creation of a new, meaningless word (e.g., the newly formed word 临业). However, the pronunciation of 临业 is the same as 林业 (forestry), a meaningful word, rendering the new word unsuitable for use in this experiment.

Figure 3. The full set of eighteen Chinese two-character pseudowords used in the current experiment.

2.3.3. Georgian pseudowords

To investigate the potential influence of language familiarity on learning associations between verbal labels and odors, we included a third language, Georgian, which was unfamiliar to all participants. The aim was to include the phonotactics of a language belonging to neither the language family of English (Indo-European) nor that of Chinese (Sino-Tibetan); we therefore chose Georgian, a member of the Kartvelian language family.

Eighteen Georgian bisyllabic pseudowords were created for the experiment. The list comprised these pseudowords: ლინერს, ადწოჯდ, პოწმლისტ, ნილხმეწ, ბვუხბე, ნბხილეწს, ჯლიკნვსე, ილჯსკემ, ადმსონ, ქწიხვდე, ნლოწალ, ჯიდლოსწ, აპპლეჯ, ქიწომცხ, ერტდრეფ, ყომნვეწ, იწუტყ, and რაბილტ. Two native speakers of Georgian independently verified that none of the pseudowords are meaningful.

2.4. Procedure

First, participants completed a self-report questionnaire and the Sniffin’ Sticks odor discrimination test, which took approximately 20 minutes in total. Those who met the inclusion criteria proceeded to the next step, the four-day odor categorization training. The categorization experiment was programmed in Praat (Boersma & Weenink, 2014).

Before smelling each odor pen, participants in the verbal groups (Chinese, English, Georgian) heard a pseudoword played aloud. They were asked to pay attention to both the odors and the words. The Control group was trained without verbal labels. To illustrate the procedure in the verbal groups: in Triplet 1, participants heard pseudoword A for 2 seconds, followed by Odor A for 6 seconds. The same pattern followed for pseudoword B and Odor B, and for pseudoword X and Odor X. After every triplet, participants answered the question “Which odor (Odor A or Odor B) is more similar to Odor X?” by selecting A or B, after which feedback on their answer appeared on the screen for 1 second in the form of a green tick or a red cross. Importantly, the feedback was based on a pseudo-random allocation of A-X and B-X pairings, not on inherent similarities between the odors.
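One triplet trial can be laid out as a timed event list. This sketch assumes, hypothetically, that the Control group's trials simply omit the 2-second label slot and that there are no other inter-stimulus gaps, since the text does not report them; the response period is self-paced and therefore untimed. The pseudowords are taken from the English list purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

LABEL_S = 2.0     # pseudoword playback (verbal groups only)
ODOR_S = 6.0      # odor presentation
FEEDBACK_S = 1.0  # green tick or red cross

Event = Tuple[str, Optional[float]]

def triplet_events(labels: Optional[Dict[str, str]] = None,
                   order: Tuple[str, ...] = ("A", "B", "X")) -> List[Event]:
    """Build the event sequence for one triplet; labels=None models the Control group."""
    events: List[Event] = []
    for slot in order:
        if labels is not None:
            events.append((f"hear '{labels[slot]}'", LABEL_S))
        events.append((f"smell Odor {slot}", ODOR_S))
    events.append(("respond: is Odor A or Odor B more similar to Odor X?", None))  # self-paced
    events.append(("feedback", FEEDBACK_S))
    return events

def timed_seconds(events: List[Event]) -> float:
    """Sum the fixed-duration events, skipping the untimed response."""
    return sum(d for _, d in events if d is not None)

verbal = triplet_events({"A": "vebous", "B": "nurhact", "X": "kabist"})
control = triplet_events()
print(timed_seconds(verbal), timed_seconds(control))  # 25.0 19.0
```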

Participants were trained on six triplets per block and four blocks per day. After each block, there was a short break, during which the overall accuracy for that block was shown on the screen. This odor categorization training was conducted on four consecutive days. In total, participants were trained on twenty-four triplets (four blocks) per day, adding up to ninety-six triplets over the entire training. Each daily session took approximately 30 minutes. During the training phase, the smell-label pairings were consistent in either 81.25% or 25.00% of instances. After the four-day training period, participants took an odor categorization test (Vanek et al., 2021). For each participant, the test was identical to their four Day 1 blocks, with one important difference: no labels were presented. The odor categorization test took approximately 30 minutes to complete. The entire experiment took each person approximately two hours and forty-five minutes.
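The training volume stated above follows from simple multiplication, and the sketch below makes the counts explicit. The per-session stimulus time assumes a fixed 25 seconds per verbal-group triplet, derived from the trial description (3 × (2 s + 6 s) + 1 s of feedback); it excludes response time, breaks, and pen handling, none of which the text quantifies.

```python
TRIPLETS_PER_BLOCK = 6
BLOCKS_PER_DAY = 4
TRAINING_DAYS = 4
ODORS_PER_TRIPLET = 3
FIXED_SECONDS_PER_TRIPLET = 3 * (2 + 6) + 1  # label + odor per slot, plus feedback (verbal groups)

triplets_per_day = TRIPLETS_PER_BLOCK * BLOCKS_PER_DAY           # 24
training_triplets = triplets_per_day * TRAINING_DAYS             # 96
odor_presentations = training_triplets * ODORS_PER_TRIPLET       # 288
test_triplets = triplets_per_day                                 # test reuses the four Day 1 blocks
fixed_minutes_per_day = triplets_per_day * FIXED_SECONDS_PER_TRIPLET / 60  # 10.0

print(triplets_per_day, training_triplets, odor_presentations, test_triplets, fixed_minutes_per_day)
# 24 96 288 24 10.0
```

Under these assumptions, roughly 10 minutes of each ~30-minute session is fixed stimulus time; the remainder is presumably taken up by self-paced responses, breaks, and handling of the odor pens.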

2.5. Counterbalancing and randomization

The design involved four levels of counterbalancing and randomization. First, the pairing of odors and labels was randomized for each participant. Second, the presentation order of the triplets varied across blocks. For instance, in Group 1 on Day 1, the presentation order was Triplet 1-2-3-4-5-6 (Block 1), Triplet 2-3-4-5-6-1 (Block 2), Triplet 3-4-5-6-1-2 (Block 3), and Triplet 4-5-6-1-2-3 (Block 4). This pattern rotated on each subsequent day. Third, the presentation order of the three odors in a triplet differed between blocks and groups. For Groups 1 and 2, the presentation order was Odors A-B-X (Block 1), B-A-X (Block 2), A-B-X (Block 3), and B-A-X (Block 4). For Groups 3 and 4, the order was B-A-X (Block 1), A-B-X (Block 2), B-A-X (Block 3), and A-B-X (Block 4). Fourth, the correct answer (i.e., whether A or B was more similar to X) was pseudo-randomly determined for each triplet: 50% of participants received feedback that Odor A was more similar to Odor X, and the other 50% received feedback that Odor B was more similar to Odor X. The feedback (correct/incorrect) was therefore based on this predetermined allocation, not on any inherent similarity between the odors.
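The four counterbalancing levels can be sketched as small Python functions. The Day 1 triplet rotation and the A-B-X/B-A-X alternation reproduce the orders listed above; everything else is an assumption. The text does not say how the rotation continues across days (modeled by a hypothetical `day_offset`), whether the 50/50 feedback split was balanced within each group or across the whole sample (the sketch balances whatever ID list it is given), or how odors were identified (numbered 1-18 here).

```python
import random
from typing import Dict, List, Sequence, Tuple

N_TRIPLETS = 6

def pair_labels(odor_ids: Sequence[int], labels: Sequence[str], seed=None) -> Dict[int, str]:
    """Level 1: a fresh random one-to-one odor-label pairing for each participant."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(odor_ids, shuffled))

def triplet_order(block: int, day_offset: int = 0) -> List[int]:
    """Level 2: cyclic rotation of triplets 1-6 (block is 1-based)."""
    start = (block - 1 + day_offset) % N_TRIPLETS
    return [(start + i) % N_TRIPLETS + 1 for i in range(N_TRIPLETS)]

def odor_order(group: int, block: int) -> Tuple[str, str, str]:
    """Level 3: Groups 1-2 start with A-B-X, Groups 3-4 with B-A-X; alternates by block."""
    starts_with_a = group in (1, 2)
    if block % 2 == 0:
        starts_with_a = not starts_with_a
    return ("A", "B", "X") if starts_with_a else ("B", "A", "X")

def allocate_feedback(participant_ids: Sequence[int], seed=None) -> Dict[int, Dict[int, str]]:
    """Level 4: per triplet, half the participants are told A matches X, half B."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    half = len(ids) // 2
    allocation: Dict[int, Dict[int, str]] = {pid: {} for pid in ids}
    for triplet in range(1, N_TRIPLETS + 1):
        answers = ["A"] * half + ["B"] * (len(ids) - half)
        rng.shuffle(answers)
        for pid, answer in zip(ids, answers):
            allocation[pid][triplet] = answer
    return allocation

print([triplet_order(b) for b in range(1, 5)])  # Day 1 block orders
print([odor_order(3, b) for b in range(1, 5)])  # Groups 3-4: B-A-X first
```

With 20 participants per list, the feedback split is exactly 10/10 for every triplet; an odd-sized list would necessarily differ by one.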

3. Results

3.1. Learning gains

Improvement in odor categorization accuracy was assessed by calculating the difference between each participant’s percentage of correct responses in the test phase and in their very first session (Vanek et al., 2021). Changes in odor categorization accuracy were examined for each of the four conditions: without language labels (Control condition), with Chinese pseudoword labels (Chinese condition), with English pseudoword labels (English condition), and with Georgian pseudoword labels (Georgian condition). In the first session, accuracy was close to chance level and comparable across groups (MCON = 48.13%, SD = 14.01; MGEO = 47.29%, SD = 9.59; MCHI = 44.79%, SD = 8.54; MENG = 44.17%, SD = 12.85). Numerically, initial between-group differences were negligible (MCON-MGEO = 0.84%, MCON-MCHI = 3.34%, MCON-MENG = 3.96%).

Numerical between-group differences emerged in the test phase. The Georgian group scored highest (MGEO = 76.67%, SD = 15.44), followed by the English group (MENG = 69.17%, SD = 20.70), the Chinese group (MCHI = 68.33%, SD = 16.13), and the Control group (MCON = 68.13%, SD = 20.51). Compared with the Control group, test-phase accuracy of the Chinese and English groups differed minimally (MENG-MCON = 1.04%, MCHI-MCON = 0.20%), whereas the Georgian group showed a more pronounced advantage (MGEO-MCON = 8.54%). Learning gains (test score minus first-session score; Figure 4) followed the same ordering: Georgian group (MGEO = 29.38%, SD = 17.65), English group (MENG = 25.00%, SD = 24.67), Chinese group (MCHI = 23.54%, SD = 17.22), and Control group (MCON = 20.00%, SD = 21.44).
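Because every participant contributes one Session 1 score and one test score, a group's mean learning gain equals the difference between its two mean accuracies. The sketch below recomputes the gains and the test-phase differences from Control using only the group means reported in this section (for example, English minus Control comes out at +1.04 percentage points). It is a consistency check on the descriptive figures, not part of the authors' analysis.

```python
# Group means (%) reported in the text: (Session 1, Test, reported gain).
REPORTED = {
    "Control":  (48.13, 68.13, 20.00),
    "Chinese":  (44.79, 68.33, 23.54),
    "English":  (44.17, 69.17, 25.00),
    "Georgian": (47.29, 76.67, 29.38),
}

control_test = REPORTED["Control"][1]
for group, (session1, test, reported_gain) in REPORTED.items():
    gain = test - session1                   # mean of differences = difference of means
    assert abs(gain - reported_gain) < 0.01  # matches the reported gain up to rounding
    diff_from_control = test - control_test  # test-phase difference from Control
    print(f"{group:8s} gain = {gain:5.2f}  test - Control = {diff_from_control:+.2f}")
```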

Figure 4. Learning gains in odor categorization across groups (Georgian, English, Chinese, Control), calculated by subtracting each participant’s odor categorization accuracy in Session 1 from their accuracy in the test after the completion of training. The final test was conducted without verbal labels. The dashed line marks the mean of the Control group.

To assess whether these differences were statistically significant, we next modeled the relationship between group and odor categorization accuracy using logistic mixed-effects regression. Analyses were conducted in R (version 4.1.3; R Core Team, 2021) with the lme4 package (Baayen et al., 2008). All data are available at https://osf.io/7m58j/. The analysis focused on a subset of the data, namely responses gathered during Session 1 and the Test. The model included response accuracy as a binary outcome variable (accurate versus inaccurate). Fixed effects included group (Control, Chinese, English, Georgian), day (Session 1 versus Test), and their interaction. Both group and day were contrast-coded. The random-effects structure comprised by-participant random intercepts and slopes for day, and by-item random intercepts and slopes for group, day, and their interaction. The lme4 formula was accuracy ~ 1 + group * day + (1 + day | participant) + (1 + group * day | item).
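What "contrast-coded" implies for the fixed effects can be sketched by building the design-matrix rows for one pairwise comparison. The ±0.5 (deviation) coding and the choice of which level is positive are assumptions, since the text does not report the exact contrasts; only two group levels are shown, as in each pairwise comparison reported in this section, whereas a four-level group factor would need three contrast columns. In lme4 itself, such a model would be fit with `glmer()` and `family = binomial`.

```python
from itertools import product

# Hypothetical +/-0.5 (deviation) coding: the text says only that group and day were
# contrast-coded, so which level receives the positive value is an assumption.
GROUP = {"Control": -0.5, "Chinese": 0.5}
DAY = {"Session 1": -0.5, "Test": 0.5}
COLUMNS = ("intercept", "group", "day", "group:day")

def fixed_effects_row(group: str, day: str) -> tuple:
    """One row of the fixed-effects design matrix for `accuracy ~ 1 + group * day`."""
    g, d = GROUP[group], DAY[day]
    return (1.0, g, d, g * d)

rows = {(g, d): fixed_effects_row(g, d) for g, d in product(GROUP, DAY)}
for (g, d), row in rows.items():
    print(f"{g:8s} {d:10s}", dict(zip(COLUMNS, row)))

# Each non-intercept column sums to zero over the four balanced cells, and the
# columns are mutually orthogonal, so group and day estimate main effects
# averaged over the other factor rather than simple effects at a reference level.
for j in range(1, len(COLUMNS)):
    assert sum(row[j] for row in rows.values()) == 0
for a, b in [(1, 2), (1, 3), (2, 3)]:
    assert sum(row[a] * row[b] for row in rows.values()) == 0
```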

The main effects of group and day, along with their interaction, were examined in six pairwise comparisons (Control versus Chinese, Control versus Georgian, Control versus English, Chinese versus Georgian, Chinese versus English, Georgian versus English). For the Chinese and Control groups, accuracy increased significantly overall from Session 1 to the Test session (β = 0.986, SE = 0.299, z = 3.301, p < 0.001). The main effect of group was not statistically significant (β = −0.131, SE = 0.180, z = −0.727, p = 0.467), nor was the interaction term (GROUP = control × DAY = test; β = 0.103, SE = 0.333, z = 0.308, p = 0.758). These results do not support our hypothesis that adding Chinese pseudowords would facilitate odor category learning and yield greater accuracy gains than learning without verbal labels. In the Control group, accuracy improved from 48% to 68% between Session 1 and the Test session, while in the Chinese group it improved from 45% to 68%.
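The z and p values in these comparisons are Wald statistics of the kind printed in glmer model summaries: z is the estimate divided by its standard error, and p is the two-sided standard normal tail probability. The minimal sketch below recomputes them from the rounded β and SE reported for the Chinese versus Control model, so discrepancies in the third decimal place are expected; the same function applies unchanged to the other five comparisons.

```python
import math


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Estimates reported for the Chinese versus Control model: (beta, SE)
ESTIMATES = {
    "day": (0.986, 0.299),
    "group": (-0.131, 0.180),
    "group:day": (0.103, 0.333),
}

if __name__ == "__main__":
    for term, (b, se) in ESTIMATES.items():
        z, p = wald_test(b, se)
        print(f"{term:10s} z = {z:6.3f}  p = {p:.3f}")
```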

For the Georgian and Control comparison, accuracy again increased significantly from Session 1 to the Test session (β = 1.004, SE = 0.313, z = 3.207, p = 0.001). The Georgian group was numerically more accurate than the Control group, but the difference was not statistically significant (β = −0.023, SE = 0.231, z = −0.101, p = 0.920). Nor was the interaction term significant (GROUP = Georgian × DAY = test; β = 0.490, SE = 0.399, z = 1.228, p = 0.219); this term tested the prediction that unfamiliar pseudowords would support odor category learning substantially more than learning without words. Within the Georgian group, accuracy increased from 47% to 77% between Session 1 and the Test session.

For the English and Control comparison, the overall increase in accuracy from Session 1 to the Test session was significant (β = 1.026, SE = 0.344, z = 2.984, p = 0.003). The main effect of group was not statistically significant (β = −0.156, SE = 0.257, z = −0.608, p = 0.543), and neither was the group-by-day interaction (β = 0.244, SE = 0.456, z = 0.536, p = 0.592), indicating no significant difference in the rate of accuracy improvement between the English and Control groups. Accuracy within the English group rose from 44% to 69% between Session 1 and the Test session, but this gain with English pseudowords did not differ significantly from the gain achieved without verbal input.

The Chinese-Georgian comparison yielded similar results. There was a significant overall increase in accuracy from Session 1 to the Test session (β = 1.075, SE = 0.251, z = 4.292, p < 0.001), but participants in the Georgian group were not significantly more accurate than those in the Chinese group (β = 0.103, SE = 0.138, z = 0.744, p = 0.457). The group-by-day interaction was not significant (β = 0.375, SE = 0.319, z = 1.178, p = 0.239). This absence of an interaction runs counter to our initial hypothesis that familiar versus unfamiliar phonotactics would affect associative learning of odor-label correspondences.

The Chinese-English comparison showed a similar pattern. The overall increase in accuracy from Session 1 to the Test session was significant (β = 1.095, SE = 0.248, z = 4.413, p < 0.001), but participants in the Chinese group were not significantly less accurate than those in the English group (β = −0.027, SE = 0.152, z = −0.178, p = 0.859). The group-by-day interaction was not significant either (β = 0.134, SE = 0.354, z = 0.379, p = 0.705). This result is in line with the hypothesis that learning odor categories with first-language labels and with second-language labels is comparable.

Finally, in the Georgian-English comparison, accuracy increased significantly from Session 1 to the Test (β = 1.253, SE = 0.272, z = 4.607, p < 0.001), but overall accuracy did not differ significantly between the two groups (β = 0.129, SE = 0.146, z = 0.882, p = 0.378). The group-by-day interaction was not significant (β = 0.263, SE = 0.385, z = 0.682, p = 0.495), which runs counter to the hypothesis that variation in labels (L2 versus unfamiliar) would differentially affect associative learning of odor-label correspondences.

To check whether differences in olfactory function played a role in the changes in odor categorization accuracy (we refer to changes rather than gains because not all participants improved), we calculated Pearson correlations between participants’ pre-test scores (Sniffin’ Sticks test results) and their changes in odor categorization accuracy. As shown in Figure 5, none of the correlations was statistically significant (Control group: r(18) = 0.33, p = 0.16; Chinese group: r(18) = 0.11, p = 0.64; English group: r(18) = −0.24, p = 0.31; Georgian group: r(18) = 0.23, p = 0.33). In summary, improvements (or decrements) in categorization accuracy did not significantly correlate with participants’ ability to discriminate odors in any of the four groups (p > 0.05).
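The reported r(18) values can be related to their significance thresholds with the standard conversion t = r√(df / (1 − r²)), where df = n − 2 = 18 per group. The minimal sketch below uses the two-tailed critical value t(18) ≈ 2.101 for α = .05 from a standard t table (a constant supplied here, not taken from the text); inverting the formula shows that |r| would have needed to exceed roughly 0.44 to reach significance with 20 participants per group.

```python
import math

# Two-tailed critical t for alpha = .05 with df = 18, from a standard t table.
T_CRIT_DF18 = 2.101

# Pearson correlations as reported in the text (df = n - 2 = 18 per group).
REPORTED_R = {"Control": 0.33, "Chinese": 0.11, "English": -0.24, "Georgian": 0.23}


def r_to_t(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, i.e. t = r * sqrt(df / (1 - r^2))."""
    return r * math.sqrt(df / (1 - r * r))


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| reaching significance, obtained by inverting r_to_t."""
    return t_crit / math.sqrt(df + t_crit ** 2)


if __name__ == "__main__":
    for group, r in REPORTED_R.items():
        t = r_to_t(r, 18)
        print(f"{group:9s} r = {r:+.2f}  t(18) = {t:+.3f}  significant: {abs(t) > T_CRIT_DF18}")
    print(f"critical |r| at df = 18: {critical_r(T_CRIT_DF18, 18):.3f}")
```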

Figure 5. Correlation plots showing the absence of a significant relationship between participants’ initial odor discrimination ability and their learning gains across the four groups.

3.2. Learning trajectories

Participants’ odor categorization learning trajectories were examined in three steps, each with its own aim. The first aim was to trace the development of accuracy for each odor triplet across sessions in a line graph (Figure 6), and the development of accuracy collapsed over all triplets across sessions (Figure 7). The second aim was to zoom in on individual differences by means of scatter plots (Figure 8). The third aim was to statistically account for potential nonlinearities in accuracy scores in this perceptual learning task using Generalized Additive Mixed Modeling (GAMM; Baayen et al., 2017) (Figure 9).

Figure 6. Line graphs displaying changes in categorization accuracy per triplet over the four training sessions and the Test session. Average accuracy scores are shown separately for the Chinese (red), Control (green), English (blue), and Georgian (orange) groups. Unlike the four preceding training sessions, categorization in the Test session was without verbal labels. The dashed horizontal line indicates chance performance at 50%. The six odor triplets are, from left to right, top row: banana-pear-pineapple, caramel-coconut-coke, eucalyptus-peppermint-grass; bottom row: leather-smoked meat-mushroom, lilac-lavender-rose, peach-melon-raspberry.

Figure 7. Changes in accuracy collapsed for all triplets across sessions.

Figure 8. Development of odor categorization accuracy per participant across four training sessions and the test.

Figure 9. Visualized output of a Generalized Additive Mixed Model (GAMM) used to assess potential nonlinearities in categorization accuracy across groups (Control, Chinese, English, Georgian). Lines show the predicted accuracy, and the shaded areas are the 95% confidence intervals.

In an exploratory, descriptive analysis, we examined changes in categorization accuracy per triplet. Despite some variation, accuracy tended to increase nominally as training progressed. In the test phase, the Georgian group appeared to have higher accuracy than the other three groups, while the Chinese and English groups showed more variation in accuracy across odor triplets than the Control group. These differences were not tested for statistical significance and may be due to chance. Looking at the test scores only, pineapple-banana-pear and lilac-lavender-rose presented the greatest challenge for the Chinese group; for the English group, the most difficult triplets were mushroom-smoked meat-leather and coke-caramel-coconut; and the Control group had the lowest accuracy for grass-peppermint-eucalyptus and peach-melon-raspberry.

Learning gains also varied across groups and odor triplets. In the Georgian group, the largest nominal improvement in odor categorization accuracy was for mushroom-smoked meat-leather (Test − Session 1 = 38.75%), and the smallest for lilac-lavender-rose (16.25%). In the Chinese group, grass-peppermint-eucalyptus appeared to show the largest increase (37.5%) and lilac-lavender-rose the smallest (6.35%). The English group seemed to show a different pattern, with grass-peppermint-eucalyptus showing the largest nominal increase (32.5%) and peach-melon-raspberry the smallest (16.25%). For the Control group, the largest nominal increase was for coke-caramel-coconut (45%) and the smallest for peach-melon-raspberry (5%). Notably, for lilac-lavender-rose, the Control group appeared to show an increase in correct responses from the first to the second day, whereas the other three groups seemed to display a U-shaped dip in accuracy on the second day (Figure 6).

As Figure 7 shows, the Control group started with slightly higher accuracy than the other groups in both Session 1 and Session 2. This advantage disappeared from Session 3 onwards, and the Control group ended with the lowest accuracy in the subsequent test stage. Comparing the Chinese and English trajectories, the Chinese group showed a slight accuracy advantage over the English group in Session 1, which widened in Session 2. In Session 3, however, the accuracy of the English group increased sharply, surpassing all other groups, and the English group maintained some advantage over the Chinese group in Session 4 and the test stage. In contrast, accuracy in the Georgian group increased steadily from Session 1 to Session 3 without any notable advantage; from Session 4 onwards, however, the Georgian group’s accuracy surpassed that of the other three groups, reaching its largest advantage in the test stage. Notably, only the Control group showed a marked increase in accuracy in the second session, whereas the three verbal groups improved comparatively little at that point. Progress in the verbal groups was slower but more consistent, and by the end of the fourth session the cumulative accuracy gains of all three verbal groups exceeded that of the Control group.

Turning to individual differences in the learning trajectories, after four days of intensive training 87.5% of all participants showed improved odor categorization accuracy in the test, 3.75% showed no change, and the remaining 8.75% declined relative to their first-day scores. In the Control group, 79.17% of the participants improved their odor categorization accuracy, and 20.83% showed either no change or decreased accuracy. Variation was substantial: the best-performing participant improved by 62.5%, whereas the worst-performing participant’s accuracy decreased by 16.67%. In the Chinese group, 91.67% of the participants categorized odors more accurately after training, while 8.33% showed no change or a decrease in accuracy; the highest-performing individual improved by 62.5%, and at the opposite end, one participant’s accuracy decreased by 8.33%. In the English group, 91.67% of the participants improved their odor categorization accuracy, while 8.33% showed no change or a decline, the same proportions as in the Chinese group; the highest-performing individual improved by 66.67%, whereas the greatest drop in accuracy was 16.67%. The greatest increases on average were observed in the Georgian group, where 95.83% of the participants improved their odor categorization and only 4.17% showed no change or a decrease in accuracy; the top performer improved by 62.5% compared to their first day, and the worst performer’s accuracy decreased by 8.33%.

Finally, to examine the nonlinear learning trajectories over time more closely, we fitted generalized additive mixed models (GAMMs; Baayen et al., 2017; Sóskuthy, 2017; Vanek et al., 2021), whose predictions are visualized in Figure 9. The model predicted the Georgian group’s accuracy to increase steadily and surpass the other groups from Session 3 onwards (observed means for Sessions 1 to 4 [S1 to S4] and the Test: S1 M = 47.29%, SD = 9.59; S2 M = 51.46%, SD = 16.68; S3 M = 58.54%, SD = 16.25; S4 M = 65.83%, SD = 20.62; Test M = 76.67%, SD = 15.44). The trajectories of the Chinese group (S1 M = 44.79%, SD = 8.54; S2 M = 48.75%, SD = 12.47; S3 M = 56.88%, SD = 14.20; S4 M = 61.46%, SD = 15.29; Test M = 68.33%, SD = 16.13) and the English group (S1 M = 44.17%, SD = 12.85; S2 M = 46.88%, SD = 9.35; S3 M = 59.38%, SD = 12.23; S4 M = 63.96%, SD = 16.18; Test M = 69.17%, SD = 20.70) were predicted to be similar, with substantial overlap across sessions and a slightly more pronounced increase for the English group than for the Chinese group towards the end. Accuracy in the Control group (S1 M = 48.13%, SD = 14.01; S2 M = 56.46%, SD = 14.90; S3 M = 58.33%, SD = 15.41; S4 M = 64.58%, SD = 15.08; Test M = 68.13%, SD = 20.51) started highest but improved most slowly from the third day onwards.
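The descriptive claims about Session 2 and Session 4 made earlier in this section can be checked against the observed session means listed in the paragraph above. The minimal sketch below computes session-to-session increments and cumulative gains from those means; it works on observed group means only, not on the GAMM predictions, and says nothing about statistical significance.

```python
# Observed mean accuracy (%) per session, as listed in the paragraph above.
SESSIONS = ["S1", "S2", "S3", "S4", "Test"]
OBSERVED = {
    "Georgian": [47.29, 51.46, 58.54, 65.83, 76.67],
    "Chinese": [44.79, 48.75, 56.88, 61.46, 68.33],
    "English": [44.17, 46.88, 59.38, 63.96, 69.17],
    "Control": [48.13, 56.46, 58.33, 64.58, 68.13],
}


def increments(means: list[float]) -> list[float]:
    """Change in mean accuracy from each session to the next."""
    return [round(later - earlier, 2) for earlier, later in zip(means, means[1:])]


def cumulative_gain(means: list[float], session: str) -> float:
    """Gain from Session 1 to the named session."""
    return round(means[SESSIONS.index(session)] - means[0], 2)


if __name__ == "__main__":
    for group, means in OBSERVED.items():
        print(group, increments(means), cumulative_gain(means, "S4"))
```

With these means, the Control group has the largest Session 1 to Session 2 increment, while all three verbal groups have larger cumulative gains than the Control group by Session 4 (the Chinese group only narrowly).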

Statistically comparing the Chinese and Control trajectories, the difference between the learning curves did not reach significance in a chi-squared test applied to the corresponding smooths of the estimated inter-condition differences (estimated df = 1.000; χ² = 0.180; p = 0.671). Similarly, the Georgian and Control learning curves did not differ significantly in the same test (estimated df = 1.764; χ² = 3.600; p = 0.165). Following the same procedure, the English and Control learning curves did not differ significantly either (estimated df = 1.000; χ² = 0.714; p = 0.398).
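For the two comparisons with an estimated df of exactly 1, the reported p-values follow from a closed form: a chi-squared variable with one degree of freedom is the square of a standard normal, so its upper tail equals erfc(√(x/2)). The minimal sketch below covers only that df = 1 case; the Georgian versus Control test used a non-integer df (1.764), which this shortcut does not handle.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """P(X > x) for a chi-squared variable with 1 degree of freedom.

    A chi-squared(1) variable is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-squared statistics are non-negative")
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    # The two comparisons reported with estimated df = 1.000
    for label, chi2 in [("Chinese vs Control", 0.180), ("English vs Control", 0.714)]:
        print(f"{label}: p = {chi2_sf_df1(chi2):.3f}")
```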

3.3. Bayes factor analysis

To complement our earlier analyses of learning gains across conditions (Control, Chinese, Georgian, and English), we calculated Bayes factors (BFs; Wagenmakers et al., 2011; Wagenmakers et al., 2018) to examine the extent to which the data in each comparison support the null hypothesis. This approach reduces the risk of misinterpretation inherent in treating p > 0.05 alone as support for the null hypothesis. We used the brms package (Bürkner, 2017, 2018, 2021), which provides an interface for fitting Bayesian generalized (non-)linear multivariate multilevel models using Stan (Carpenter et al., 2017), a probabilistic programming language implemented in C++ for full Bayesian inference (see https://mc-stan.org/). Bayes factors (BF10) were calculated with brms and quantify the evidence for H1 relative to H0. The default configuration of this package employs the normal distribution (Dienes, 2014) as the prior distribution for fixed effects, and it adopts the half-Cauchy distribution (Bürkner, 2017) as the prior distribution for random effects.

Following Lee and Wagenmakers (2014), Bayes factors were categorized into evidence levels: BF10 ≤ 1/100 indicates extreme evidence for H0; 1/100 < BF10 ≤ 1/30, very strong evidence for H0; 1/30 < BF10 ≤ 1/10, strong evidence for H0; 1/10 < BF10 ≤ 1/3, moderate evidence for H0; and 1/3 < BF10 ≤ 3, no clear evidence for either H0 or H1 (i.e. similar degrees of support for both hypotheses). The Bayes factors obtained for the comparisons were as follows: Control versus Chinese (BF10 = 0.14995), moderate evidence for H0; Control versus Georgian (BF10 = 0.44573), no clear evidence for either H0 or H1; Control versus English (BF10 = 0.23949), moderate evidence for H0; Chinese versus Georgian (BF10 = 0.24932), moderate evidence for H0; Chinese versus English (BF10 = 0.15904), moderate evidence for H0; and Georgian versus English (BF10 = 0.20594), moderate evidence for H0.
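The classification scheme above is a simple threshold lookup, sketched below and applied to the six reported Bayes factors. The H0 and "no clear evidence" bands follow the thresholds listed in the text; the H1 bands (3, 10, 30, 100) are the symmetric counterparts from the same scheme, added here only for completeness since no comparison fell in that range. BF01 = 1/BF10 expresses the same evidence in favor of H0.

```python
# H0-side and "no clear evidence" thresholds follow the text; the H1-side
# bands are the symmetric counterparts, included only for completeness.
BANDS = [
    (1 / 100, "extreme evidence for H0"),
    (1 / 30, "very strong evidence for H0"),
    (1 / 10, "strong evidence for H0"),
    (1 / 3, "moderate evidence for H0"),
    (3, "no clear evidence for H0 or H1"),
    (10, "moderate evidence for H1"),
    (30, "strong evidence for H1"),
    (100, "very strong evidence for H1"),
]

REPORTED_BF10 = {
    "Control vs Chinese": 0.14995,
    "Control vs Georgian": 0.44573,
    "Control vs English": 0.23949,
    "Chinese vs Georgian": 0.24932,
    "Chinese vs English": 0.15904,
    "Georgian vs English": 0.20594,
}


def classify_bf10(bf10: float) -> str:
    """Map BF10 to an evidence label using upper-inclusive band limits."""
    if bf10 <= 0:
        raise ValueError("Bayes factors are positive")
    for upper, label in BANDS:
        if bf10 <= upper:
            return label
    return "extreme evidence for H1"


def bf01(bf10: float) -> float:
    """Evidence for H0 relative to H1."""
    return 1 / bf10


if __name__ == "__main__":
    for comparison, bf in REPORTED_BF10.items():
        print(f"{comparison:20s} BF10 = {bf:.5f}  BF01 = {bf01(bf):5.2f}  {classify_bf10(bf)}")
```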

This result for the Control versus Georgian comparison stands out from the others, as it falls in the range where we cannot confidently support either the null hypothesis (no difference between conditions) or the alternative hypothesis (there is a difference). This suggests that while most comparisons show moderate evidence for no difference between conditions, the potential difference between the Control and Georgian conditions remains inconclusive based on our data. These results largely align with our earlier findings from traditional statistical analyses, providing additional support for the lack of significant differences between most conditions in terms of learning gains.

4. Discussion

4.1. Learning new categories with and without labels

Two key findings emerged from this study: new odor categories are formed both in the presence and in the absence of verbal labels, and greater familiarity with the type of linguistic labeling does not seem to confer processing advantages for categorization, either in final accuracy scores or in learning trajectories. In relation to previous research, studies of the impact of verbal labels on category formation in the visual domain reported that labels facilitated response accuracy compared to a condition in which categories had to be learned without verbal labels (Lupyan et al., 2007; Lupyan & Casasanto, 2015). This verbal facilitation effect did not replicate in the present design with olfactory stimuli. Several reasons come to mind. First, the surveyed studies from the visual domain used 100% label-percept consistency, whereas the present design reduced labeling consistency to 81%, following Vanek et al. (2021). It is likely that the reduced pairing consistency weakened the label-percept associations and destabilized the experiential context to the extent that word-based cuing became comparable to sorting based merely on memorized perceptual features. Alternative accounts, more closely tied to odor processing and verbal labels, could be that the neural codes for olfaction and language are likely to interfere (Lorig, 1999), or that the olfactory (piriform) cortex is too close to the more ‘dominant’ cortical regions for language, which is why odors may lack fine-grained representations during label-percept integration (Olofsson & Gottfried, 2015).

With respect to the verbal labeling advantage documented in associative learning studies that targeted tactile discrimination ability (Miller et al., 2018), odor categorization (Vanek et al., 2021), or motion event categorization (Vanek, 2019), the present findings can neither confirm nor refute such an advantage. In terms of learning gains, all three groups that categorized odors with verbal labels showed numerically greater improvements than the Control group, suggesting that the involvement of language could to some extent enhance the learning process and contribute to the formation and retention of the target categories. Linguistic enhancement of category learning can be viewed as a process of optimizing attention to the perceptual features that are essential for category formation (Lupyan et al., 2007). On top of attentional tuning, errors are reduced as categorization takes place with feedback. The higher categorization accuracy of the verbal groups compared to categorization without verbal labels offers some signal of attentional tuning, but caution is warranted because the label/no-label differences did not reach statistical significance in the present study. What can account for this mismatch with previous findings in the field? A critical reader might question whether a task that requires implicitly learning associations between linguistic labels and percepts is directly comparable to a non-associative and arguably easier task of categorizing percepts on their own, in the absence of any verbal or other stimuli.
Early in training, accuracy was lower in the bimodal context, in which olfactory stimuli were paired with verbal labels, than in the unimodal context with odorants only (Control condition), with rare drops in Session 2 (Figure 7). This pattern is consistent with an added cognitive load attributable to labels when olfactory and verbal processing are engaged concurrently. In future studies, a control condition with nonverbal labels could provide an alternative baseline. Another potentially beneficial methodological change would be to compare a condition with odor-label pairings to a condition with picture-label or vibration-label pairings. While the present study controlled for the comparability of indirect, context-based grounding of pseudoword meaning across conditions, extracting meaning when verbal symbols (here, pseudowords) are associated with ‘more concrete’ sensory information (object shape or pin vibrations) might be easier than with arguably ‘less concrete’ stimuli like odors, which are known to be generally hard to access from words (Speed & Majid, 2018). One way to promote context-based distributional learning of odor-label pairs could be to up-regulate their relevance in the input, for instance by asking participants to repeat the pseudoword out loud after sniffing the corresponding odor.

4.2. Ad hoc cognition and semantic binding of odors with new words

The findings do not align with the hypothesis that the presence of new labels would significantly contribute to the perceptual processes involved in odor category formation, at least when compared to a no-label context. This outcome provides an opportunity to critically examine the ad hoc cognition (AHC) framework and its predictions in the context of olfactory perception and linguistic labeling.

Within the ad hoc cognition framework, two complementary accounts may help illuminate possible reasons for this finding. AHC assumes that concepts and categories are inseparable from context, which shapes how they are instantiated across three timescales: the scale of milliseconds, at the micro-level of every instantiation; the scale between two instantiations within an individual; and the scale between individuals and groups, at the macro-level of experiential histories (Casasanto & Lupyan, 2015). While the design of this study kept the scale between instantiations maximally controlled, the experiential histories of the tested group may not have been conducive enough to reinforce the importance of the link between odors and language.

This link may be critical, especially if linguistic experience is viewed as a conventionalized system guiding perception towards those aspects of concepts and categories that a particular language group finds collectively important (Slivac & Flecken, 2023). Considering how strongly linguistic communities vary in the consistency of odor labeling (e.g. Majid & Burenhult, 2014), better control of the macro timescale may be necessary, for instance by comparing the performance of a language group with high consistency in odor labeling, such as Jahai or Cha’palaa, with a group that exhibits low consistency in olfactory language, such as Mandarin or English.

However, this explanation also highlights a potential limitation of the AHC framework. The broad claim that concepts and categories are inseparable from context may be difficult to falsify, as any null result could potentially be attributed to some aspect of context not adequately accounted for. To strengthen the AHC framework, more precise predictions about which contextual factors should influence concept formation, and to what degree, are needed.

The timescale of milliseconds, when odors and labels are being integrated, could shed new light on the formation of lexical-olfactory links. Recent research has begun to characterize this rapid process. For instance, Zhou et al. (2019) used intracranial recordings to demonstrate that spoken odor words elicited olfactory cortex activity within approximately half a second, shortly after auditory cortex activity in response to the word was detected. This rapid activation suggests a strong, potentially automatic link between olfactory words and olfactory processing areas. However, the precise nature and implications of this rapid integration remain topics of ongoing research and debate.

In the context of the present study, time-sensitive measures such as olfactory event-related potentials (OERP; Krauel et al., 1999) could provide valuable insights. OERPs could indicate, for instance, whether inconsistent odor-label pairings elicit a larger mismatch negativity as the training sessions unfold. While our behavioral measures showed no significant differences between verbal groups in odor categorization scores, it is possible that more sensitive neurophysiological measures could capture processing differences induced by participants’ familiarity with the labels’ phonotactic properties.

Another limitation of this study is the relatively small sample size in each of the four groups (n = 20). Our power analysis indicates that while we had good power (89%) to detect medium to large effects, our power to detect smaller effects may have been limited. This could explain why some of our results did not reach statistical significance. Future studies should consider increasing the sample size to improve statistical power, particularly if smaller effect sizes are anticipated.

4.3. On the role of familiarity with how labels sound

Contrary to our hypothesis that non-native sounding words would be less ready to assist listeners in forming new categories, the learning gains of the group with the least familiar sounding labels (Georgian) exceeded those of the group with more familiar sounding labels (English), which in turn exceeded those of the group with the most familiar sounding labels (Chinese). Although these differences did not reach statistical significance, the reversal of the predicted order merits reflection, as the least phonetically familiar and culturally relevant labels seem to have helped the most. Two phenomena may be at play. First, Chinese and English labels were modeled on L1 and L2 phonotactics, respectively, which are familiar systems that could have automatically activated orthography (e.g. Taft et al., 2008). Activation of orthography could in turn have prompted associations with a relatively large number of existing phonological and orthographic L1 and/or L2 neighbors, potentially complicating odor-label mapping. In this sense, greater familiarity with the labels’ phonotactics could be disadvantageous. Conversely, the lack of familiarity with Georgian phonotactics could have favored a ‘clean-cut’, more association-free lexical-perceptual mapping between labels and odors based simply on (supra)segmental features such as intonation or phoneme sequences. Such surface associations could be more straightforward to process and potentially facilitate encoding in memory due to lower stimulus complexity (Pusch et al., 2023). A label recall task added at the debrief stage of the experiment could test whether this idea holds.

Second, a related reason for what appears to be the inverse of the label familiarity effect predicted in this study could be that more familiar-sounding labels drove participants to focus more on the labels themselves, potentially diverting their attention from odor features. Conversely, an unfamiliar-sounding label could reduce distraction and free up more cognitive resources for detailed odor inspection. Some support for sound familiarity affecting cognitive task performance comes from prior research indicating that more familiar background music can be more distracting than less familiar background music (Komlao, 2018). This idea is consistent with the results showing that Georgian-like labeling with minimal familiarity ranked above English-like labeling with medium familiarity, which in turn ranked above Chinese-like labeling with the most familiarity. This trend in the observed learning gains matches the view that odor categorization performance, particularly at the early stages of training, may have changed under the influence of familiarity with how the labels sound. Drops in accuracy in Session 2 for half of the triplets with L1/L2-like labeling (Figure 6) are consistent with the view that when Chinese-English bilinguals first encountered L1-like and L2-like pseudowords, these might initially have been more distracting than the unfamiliar-sounding pseudowords. Added distraction could be attributed to increased effort to differentiate new from existing entries in the mental lexicon. In contrast, completely unfamiliar-sounding Georgian pseudowords might have been less distracting to begin with (Figure 6), leaving more mental capacity for olfactory memory. In future research, norming the labels for how familiar they sound to participants could help probe this aspect further.

Taken together, the findings of this study showed that participants learn new categories of odors when verbal labels are introduced, but being more familiar with how specific linguistic labels sound does not appear to provide a processing benefit for the categorization task. The introduction of new labels did not notably enhance the perceptual mechanisms engaged in odor category formation ad hoc when compared with a context where no labels were used, and when examined on a timescale from one instantiation to the next within individuals. In a wider experimental context of the verbal labeling advantage often observed in associative learning studies, these findings highlight the fleeting nature of contextual effects when new words are more or less useful in the construction of categories.

Footnotes

1 These findings are also compatible with other learning theories, but we explore the relationship to AHC for the purposes of this special issue.

References

Althaus, N., & Plunkett, K. (2016). Categorization in infancy: Labeling induces a persisting focus on commonalities. Developmental Science, 19(5), 770–780. https://doi.org/10.1111/desc.12358
Armstrong, S. L., Gleitman, L. R., & Gleitman, H. (1983). What some concepts might not be. Cognition, 13(3), 263–308. https://doi.org/10.1016/0010-0277(83)90012-4
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Baayen, R. H., Vasishth, S., Kliegl, R., & Bates, D. (2017). The cave of shadows: Addressing the human factor with generalized additive mixed models. Journal of Memory and Language, 94, 206–234. https://doi.org/10.1016/j.jml.2016.11.006
Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11, 211–227. https://doi.org/10.3758/BF03196968
Barsalou, L. W. (1987). The instability of graded structure: Implications for the nature of concepts. In Neisser, U. (Ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization (pp. 101–140). New York: Cambridge University Press.
Barsalou, L. W. (1989). Intra-concept similarity and its implications for inter-concept similarity. In Vosniadou, S., & Ortony, A. (Eds.), Similarity and analogical reasoning (pp. 76–121). New York: Cambridge University Press.
Barsalou, L. W. (1993). Flexibility, structure, and linguistic vagary in concepts: Manifestations of a compositional system of perceptual symbols. In Collins, A., Gathercole, S., Conway, M., & Morris, P. (Eds.), Theories of memory (pp. 29–101). Hove: Lawrence Erlbaum Associates.
Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer (Version 5.3.64) [Computer program]. http://www.praat.org/
Burenhult, N., & Majid, A. (2011). Olfaction in Aslian ideology and language. The Senses and Society, 6(1), 19–29. https://doi.org/10.2752/174589311X12893982233597
Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P. C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017
Bürkner, P. C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. https://doi.org/10.18637/jss.v100.i05
Cao, Y., Majid, A., & Vanek, N. (2024). Verbal overshadowing in odor recognition. Proceedings of the Annual Meeting of the Cognitive Science Society, 46. https://escholarship.org/uc/item/46j0b47p
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1). https://doi.org/10.18637/jss.v076.i01
Casasanto, D. (2011). Different bodies, different minds: The body specificity of language and thought. Current Directions in Psychological Science, 20(6), 378–383. https://doi.org/10.1177/0963721411422058
Casasanto, D. (2014). Bodily relativity. In Shapiro, L. (Ed.), The Routledge handbook of embodied cognition (pp. 108–117). Routledge/Taylor & Francis Group.
Casasanto, D., & Lupyan, G. (2015). All concepts are ad hoc concepts. In Margolis, E., & Laurence, S. (Eds.), Concepts: New directions. Cambridge, MA: MIT Press.
Dienes, Z. (2014). Using Bayes to get the most out of nonsignificant results. Frontiers in Psychology, 5, 781. https://doi.org/10.3389/fpsyg.2014.00781
Evans, V. (2009). How words mean. Oxford: Oxford University Press.
Flege, J. E., & MacKay, I. R. A. (2004). Perceiving vowels in a second language. Studies in Second Language Acquisition, 26(1), 1–34. https://doi.org/10.1017/S0272263104261010
Floyd, S., et al. (2018). Smell is coded in grammar and frequent in discourse: Cha’palaa olfactory language in cross-linguistic perspective. Journal of Linguistic Anthropology, 28, 175–196. https://doi.org/10.1111/jola.12190
Haehner, A., Boesveldt, S., Berendse, H. W., Mackay-Sim, A., Fleischmann, J., Silburn, P. A., Johnston, A. N., Mellick, G. D., Herting, B., Reichmann, H., & Hummel, T. (2009). Prevalence of smell loss in Parkinson’s disease: A multicenter study. Parkinsonism & Related Disorders, 15(7), 490–494. https://doi.org/10.1016/j.parkreldis.2008.12.005
Hinton, L., Nichols, J., & Ohala, J. J. (Eds.). (2006). Sound symbolism. Cambridge University Press.
Hummel, T., Kobal, G., Gudziol, H., & Mackay-Sim, A. (2007). Normative data for the “Sniffin’ Sticks” including tests of odor identification, odor discrimination, and olfactory thresholds: An upgrade based on a group of more than 3,000 subjects. European Archives of Oto-Rhino-Laryngology, 264(3), 237–243. https://doi.org/10.1007/s00405-006-0173-0
Hummel, T., Sekinger, B., Wolf, S., Pauli, E., & Kobal, G. (1997). ‘Sniffin’ Sticks’: Olfactory performance assessed by the combined testing of odor identification, odor discrimination and olfactory threshold. Chemical Senses, 22(1), 39–52. https://doi.org/10.1093/chemse/22.1.39
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198270126.001.0001
Komlao, N. (2018). Variations of auditory distractions: The effect of familiarity with background music on cognitive performance on the concept shifting task. https://norma.ncirl.ie/id/eprint/3289
Krauel, K., Schott, P., Sojka, B., Pause, B. M., & Ferstl, R. (1999). Is there a mismatch negativity analogue in the olfactory event-related potential? Journal of Psychophysiology, 13(1), 49–55. https://doi.org/10.1027/0269-8803.13.1.49
Langacker, R. (2008). Cognitive grammar: A basic introduction. New York: Oxford University Press.
Lechien, J. R., Chiesa-Estomba, C. M., De Siati, D. R., Horoi, M., Le Bon, S. D., Rodriguez, A., Dequanter, D., Blecic, S., El Afia, F., Distinguin, L., Chekkoury-Idrissi, Y., Hans, S., Delgado, I. L., Calvo-Henriquez, C., Lavigne, P., Falanga, C., Barillari, M. R., Cammaroto, G., Khalife, M., Leich, P., … Saussez, S. (2020). Olfactory and gustatory dysfunctions as a clinical presentation of mild-to-moderate forms of the coronavirus disease (COVID-19): A multicenter European study. European Archives of Oto-Rhino-Laryngology: Official Journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS): Affiliated with the German Society for Oto-Rhino-Laryngology – Head and Neck Surgery, 277(8), 2251–2261. https://doi.org/10.1007/s00405-020-05965-1
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Lorig, T. S. (1999). On the similarity of odor and language perception. Neuroscience and Biobehavioral Reviews, 23(3), 391–398. https://doi.org/10.1016/s0149-7634(98)00041-4
Lupyan, G., & Casasanto, D. (2015). Meaningless words promote meaningful categorization. Language and Cognition, 7(2), 167–193. https://doi.org/10.1017/langcog.2014.21
Lupyan, G., Rakison, D. H., & McClelland, J. L. (2007). Language is not just for talking: Redundant labels facilitate learning of novel categories. Psychological Science, 18(12), 1077–1083. https://doi.org/10.1111/j.1467-9280.2007.02028.x
Majid, A., & Burenhult, N. (2014). Odors are expressible in language, as long as you speak the right language. Cognition, 130(2), 266–270. https://doi.org/10.1016/j.cognition.2013.11.004
Majid, A., & Kruspe, N. (2018). Hunter-gatherer olfaction is special. Current Biology, 28(3), 409–413. https://doi.org/10.1016/j.cub.2017.12.014
Majid, A., Roberts, S. G., Cilissen, L., Emmorey, K., Nicodemus, B., O’Grady, L., … Levinson, S. C. (2018). Differential coding of perception in the world’s languages. Proceedings of the National Academy of Sciences, 115(45), 11369–11376. https://doi.org/10.1073/pnas.1720419115
Manescu, S., Frasnelli, J., Lepore, F., & Djordjevic, J. (2014). Now you like me, now you don’t: Impact of labels on odor perception. Chemical Senses, 39(2), 167–175. https://doi.org/10.1093/chemse/bjt066
Miller, T. M., Schmidt, T. T., Blankenburg, F., & Pulvermüller, F. (2018). Verbal labels facilitate tactile perception. Cognition, 171, 172–179. https://doi.org/10.1016/j.cognition.2017.10.010
Olofsson, J. K., & Gottfried, J. A. (2015). The muted sense: Neurocognitive limitations of olfactory language. Trends in Cognitive Sciences, 19(6), 314–321. https://doi.org/10.1016/j.tics.2015.04.007
Peng, M., Coutts, D., Wang, T., & Cakmak, Y. O. (2019). Systematic review of olfactory shifts related to obesity. Obesity Reviews: An Official Journal of the International Association for the Study of Obesity, 20(2), 325–338. https://doi.org/10.1111/obr.12800
Perry, L. K., & Lupyan, G. (2014). The role of language in multi-dimensional categorization: Evidence from transcranial direct current stimulation and exposure to verbal labels. Brain and Language, 135, 66–72. https://doi.org/10.1016/j.bandl.2014.05.005
Pusch, R., Packheiser, J., Azizi, A. H., Sevincik, C. S., Rose, J., Cheng, S., Stüttgen, M. C., & Güntürkün, O. (2023). Working memory performance is tied to stimulus complexity. Communications Biology, 6(1), 1119. https://doi.org/10.1038/s42003-023-05486-7
R Development Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Rastle, K., & Coltheart, M. (2000). Lexical and non-lexical print-to-sound translation of disyllabic words and non-words. Journal of Memory and Language, 42, 342–364. https://doi.org/10.1006/jmla.1999.2687
Sebastián-Gallés, N., & Soto-Faraco, S. (1999). Online processing of native and non-native phonemic contrasts in early bilinguals. Cognition, 72(2), 111–123. https://doi.org/10.1016/s0010-0277(99)00024-4
Slivac, K., & Flecken, M. (2023). Linguistic priors for perception. Topics in Cognitive Science, 15(4), 657–661. https://doi.org/10.1111/tops.12672
Sóskuthy, M. (2017). Generalised additive mixed models for dynamic analysis in linguistics: A practical introduction. arXiv 1703.05339 [stat.AP].
Speed, L. J., & Majid, A. (2018). An exception to mental simulation: No evidence for embodied odor language. Cognitive Science, 42(4), 1146–1178. https://doi.org/10.1111/cogs.12593
Strange, W. (Ed.). (1995). Speech perception and linguistic experience: Issues in cross-language research. Baltimore: York Press.
Taft, M., Castles, A., Davis, C., Lazendic, G., & Nguyen-Hoan, M. (2008). Automatic activation of orthography in spoken word recognition: Pseudohomograph priming. Journal of Memory and Language, 58, 366–379. https://doi.org/10.1016/j.jml.2007.11.002
Vanek, N. (2019). Changing event categorisation in second language users through perceptual learning. Language Learning, 70(2), 309–348. https://doi.org/10.1111/lang.12377
Vanek, N., Sóskuthy, M., & Majid, A. (2021). Consistent verbal labels promote odor category learning. Cognition, 206, 104485. https://doi.org/10.1016/j.cognition.2020.104485
Velayudhan, L., Pritchard, M., Powell, J. F., Proitsi, P., & Lovestone, S. (2013). Smell identification function as a severity and progression marker in Alzheimer’s disease. International Psychogeriatrics, 25(7), 1157–1166. https://doi.org/10.1017/S1041610213000446
Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., … Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin and Review, 25(1), 58–76. https://doi.org/10.3758/s13423-017-1323-7
Wagenmakers, E. J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem. Journal of Personality and Social Psychology, 100(3), 426–432. https://doi.org/10.1037/a0022790
Wang, H. (1986). Modern Chinese Frequency Dictionary. Beijing, China: Beijing Language Institute Press.
Xiao, Z., Zhang, J. X., Wang, X., Wu, R., Hu, X., Weng, X., & Tan, L. H. (2005). Differential activity in left inferior frontal gyrus for pseudowords and real words: An event-related fMRI study on auditory lexical decision. Human Brain Mapping, 25(2), 212–221. https://doi.org/10.1002/hbm.20105
Zhou, G., Lane, G., Cooper, S. L., Kahnt, T., & Zelano, C. (2019). Characterizing functional pathways of the human olfactory system. eLife, 8, e47177. https://doi.org/10.7554/eLife.47177

Figure 1. Six odor triplets used in the experiment. (NOTE: Pictures and real-world labels are provided for illustration purposes only. The actual stimuli were odors and pseudowords).


Figure 2. Composition processes of two-character Chinese pseudowords. (a) One excluded scenario illustrates the combination of the first character from a two-character Chinese word (e.g., the initial character in the Chinese word for “telephone”) with the second character from another two-character Chinese word (e.g., the second character in the Chinese word for “pool”). This combination results in the formation of a new real word (e.g., the newly created word in Chinese meaning “battery”). (b) Example of an acceptable scenario of composing a two-character Chinese pseudoword through combining the first character from a real two-character Chinese word (e.g., the initial character in the Chinese word for “glass”) with the second character from another two-character Chinese real word (e.g., the second character in the Chinese word for “telephone”). This combination results in the creation of a new, meaningless word (e.g., the newly formed word 玻话), and importantly, there is no other meaningful word that would have the same pronunciation as this new word. (c) Another excluded scenario involves combining the first character from one two-character Chinese word (e.g., the initial character in the Chinese word for “temporary”) with the second character from another two-character Chinese word (e.g., the second character in the Chinese word for “homework”). This combination results in the creation of a new, meaningless word (e.g., the newly formed word 临业). However, the pronunciation of 临业 is the same as 林业 (forestry), a meaningful word, rendering the new word unsuitable for use in this experiment.
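The three scenarios in this caption amount to a two-step screen: reject a candidate if it is itself a real word, then reject it if it sounds like a real word. The Python sketch below is a minimal illustration under stated assumptions, not the authors' procedure. The toy lexicon and its numbered-pinyin transcriptions are hypothetical stand-ins for a real dictionary, the helper names compose and screen are invented for this example, each character is assumed to keep the reading it has in its source word, and homophony is assumed to be judged with tones, as in the 临业/林业 case.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: real two-character words -> numbered pinyin.
# Illustrative only; this is NOT the word list used in the study.
TOY_LEXICON: Dict[str, str] = {
    "电话": "dian4 hua4",  # telephone
    "水池": "shui3 chi2",  # pool
    "电池": "dian4 chi2",  # battery
    "玻璃": "bo1 li5",     # glass
    "临时": "lin2 shi2",   # temporary
    "作业": "zuo4 ye4",    # homework
    "林业": "lin2 ye4",    # forestry
}


def compose(first_word: str, second_word: str,
            lexicon: Dict[str, str] = TOY_LEXICON) -> Tuple[str, str]:
    """Join the first character of one word with the second character of
    another, assuming each character keeps its reading from its source word."""
    form = first_word[0] + second_word[1]
    syllable_1 = lexicon[first_word].split()[0]
    syllable_2 = lexicon[second_word].split()[1]
    return form, f"{syllable_1} {syllable_2}"


def screen(first_word: str, second_word: str,
           lexicon: Dict[str, str] = TOY_LEXICON) -> Tuple[str, str]:
    """Apply the two exclusion checks described in the caption."""
    form, pronunciation = compose(first_word, second_word, lexicon)
    if form in lexicon:                      # scenario (a): a real word emerges
        return form, "excluded: real word"
    if pronunciation in lexicon.values():    # scenario (c): homophone of a real word
        return form, "excluded: homophone of a real word"
    return form, "accepted"                  # scenario (b): usable pseudoword


for pair in [("电话", "水池"), ("玻璃", "电话"), ("临时", "作业")]:
    print(screen(*pair))
```

On the caption's three cases, screen excludes 电池 as a real word, accepts 玻话, and excludes 临业 as a homophone of 林业. A realistic version would need a full pronunciation dictionary, since many Chinese characters have more than one reading.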


Figure 3. The full set of eighteen Chinese two-character pseudowords used in the current experiment.


Figure 4. Learning gains in odor categorization across groups (Georgian, English, Chinese, Control), calculated by subtracting each participant’s odor categorization accuracy in Session 1 from their accuracy in the test after training was completed. The final test was conducted without verbal labels. The dashed line marks the mean of the Control group.
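The gain score plotted in Figure 4 is a per-participant difference, which is then averaged within each group. The sketch below assumes accuracy is stored as a proportion correct per participant. The function names, participant IDs, group assignments and values are hypothetical and do not reproduce the study's data. The Control group mean it returns corresponds to the dashed reference line.

```python
from statistics import mean


def learning_gains(session1_acc, test_acc):
    """Gain per participant: final-test accuracy minus Session 1 accuracy."""
    return {pid: test_acc[pid] - session1_acc[pid] for pid in session1_acc}


def group_mean_gains(gains, group_of):
    """Average the per-participant gains within each group."""
    by_group = {}
    for pid, gain in gains.items():
        by_group.setdefault(group_of[pid], []).append(gain)
    return {group: mean(values) for group, values in by_group.items()}


# Hypothetical example values (proportion correct), not study data.
session1 = {"p1": 0.50, "p2": 0.50, "p3": 0.75}
final_test = {"p1": 0.75, "p2": 0.625, "p3": 0.75}
groups = {"p1": "Georgian", "p2": "Georgian", "p3": "Control"}

gains = learning_gains(session1, final_test)
group_means = group_mean_gains(gains, groups)
control_line = group_means["Control"]  # height of the dashed reference line
print(group_means)  # {'Georgian': 0.1875, 'Control': 0.0}
```

Because the gain is computed within each participant, differences in baseline accuracy between individuals do not enter the group comparison directly.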


Figure 5. Correlation plots showing the absence of a significant relationship between participants’ initial odor discrimination ability and their learning gains across the four groups.
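One way to quantify the relationship shown in Figure 5 is to compute a correlation between initial discrimination score and learning gain separately within each group. The sketch below assumes Pearson's r as the coefficient, since the caption does not say which one was used. It computes only the coefficient and not the significance test behind the caption's claim. The function names and input dictionaries are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation for two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_by_group(discrimination, gains, group_of):
    """Correlate discrimination scores with learning gains within each group."""
    pairs = {}
    for pid, group in group_of.items():
        xs, ys = pairs.setdefault(group, ([], []))
        xs.append(discrimination[pid])
        ys.append(gains[pid])
    return {group: pearson_r(xs, ys) for group, (xs, ys) in pairs.items()}
```

In Python 3.10 and later, statistics.correlation returns the same coefficient and could replace the hand-written pearson_r.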


Figure 6. Line graphs displaying changes in categorization accuracy over four training sessions and the Test session per triplet. Average accuracy scores are shown separately for the Chinese (red), Control (green), English (blue), and Georgian (orange) groups. Categorization in the Test session was without verbal labels, unlike in the four preceding training sessions. The dashed horizontal line indicates chance performance at 50%. The six odor triplets left-to-right are (top): banana-pear-pineapple, caramel-coconut-coke, eucalyptus-peppermint-grass, (bottom): leather-smoked meat-mushroom, lilac-lavender-rose, peach-melon-raspberry.


Figure 7. Changes in accuracy collapsed for all triplets across sessions.


Figure 8. Development of odor categorization accuracy per participant across four training sessions and the test.


Figure 9. Visualized output of a Generalized Additive Mixed Model (GAMM) used to assess potential nonlinearities in categorization accuracy across groups (Control, Chinese, English, Georgian). Lines show the predicted accuracy, and the shaded areas are the 95% confidence intervals.