Introduction
Gesture and language development are “tightly coupled” (Iverson & Thelen, 1999, p. 20), and the parallel unfolding of gesture and spoken language development may be rooted in their shared symbolism (Capone & McGregor, 2004). Gesture initially grounds spoken language through sensorimotor experiences (Perniss & Vigliocco, 2014). The emergence of specific gesture types in later infancy and early toddlerhood precedes children’s language production milestones, including the onset of single words and two-word combinations (e.g., Crais, Watson & Baranek, 2009; Iverson & Goldin-Meadow, 2005). This is a developmental period during which children move from saying their first words to rapid vocabulary growth. After age 12 months and the onset of first words, children gradually add new words at a rate of about one to two new words weekly; after 24 months of age, word learning accelerates, with children producing 10 new words within a 14-day period (e.g., Hirsh-Pasek, Golinkoff & Hollich, 2000; Mervis & Bertrand, 1994).
One explanation for children’s rapidly accelerating word production is fast mapping, the process whereby children encode initial, incomplete word representations from brief exposures and incidental mappings of novel words to referents (Carey & Bartlett, 1978; Dollaghan, 1987; Gershkoff-Stowe & Hahn, 2007; Swingley, 2010). Children as young as 13 months show fast mapping, and there is substantial evidence indicating that typically developing toddlers are fast mapping successfully by age 2 years (Heibeck & Markman, 1987; Spiegel & Halberda, 2011). Fast mapping, however, is just one step in a word learning process that may not involve the same mechanisms needed for children’s development of full lexical representations (Bion, Borovsky & Fernald, 2013; Carey, 2010; Horst & Samuelson, 2008). Word learning can be considered a continuum starting with a person’s initial exposures and building to well-established understanding and use of words for effective communication. Fast mapping occurs early in word learning and can result from a single, incidental exposure to a word and referent. Following limited exposure, often in experimental tasks, fast mapping is usually assessed by immediate, forced-choice recognition or receptive identification. Subsequent steps in word learning can be termed slow mapping, defined by repeated linkages between a referent’s semantic information and the word form. Slow mapping, or extended word learning, may be assessed in recognition tasks following a time delay varying from minutes to days or weeks. Likewise, expressive naming can be viewed as evidence of slow mapping because it requires activating and speaking a stored representation (Capone & McGregor, 2005).
Our aim in this investigation was to test the role of gesture input in support of young children’s word learning. Early gestures in the baby’s environment, showing and pointing, can provide a foundation for first words. Caregivers engage in showing by shaking an object or moving an object up and down in front of the infant’s face while synchronously naming the object (Matatyaho & Gogate, 2008), and researchers have reported a type of gestural motherese consisting of pointing gestures paired with talking (Iverson, Capirci, Longobardi & Caselli, 1999; Zammit & Schafer, 2010). By 13 months of age, typically developing infants demonstrate understanding that the deictic, point gestures produced by adults reference objects in the environment (Gliga & Csibra, 2009). Pointing can harness an infant’s joint attention with the adult and an object, supporting the child’s mapping of the word spoken by the parent to the object that has been ostensively indicated. In this sense, point gestures are considered a type of social/pragmatic cue, indicating the pointer’s intended referent (Capone Singleton, 2012). Pointing by Italian mothers when interacting with their children aged 1;4 was positively correlated with their children’s vocabulary skills at 1;8 (Iverson et al., 1999).
In addition to deictic gestures such as showing and pointing, gestures can be iconic. Iconic gestures reflect a characteristic of a concrete referent, manually representing some element of meaning – the shape, action, or function features of the referent (Capone Singleton & Saks, 2015; Goldin-Meadow & Alibali, 2013). An example is when a caregiver moves his hand to his mouth to represent eat when asking a toddler, “Do you want to eat?” Gestures most often co-occur with speech, and the meaning conveyed by the gesture is often redundant with speech (Capone Singleton & Saks, 2015; Hostetter & Mainela-Arnold, 2015; Iverson et al., 1999). Zammit and Schafer (2010) reported an association between children’s comprehension of target words (aged 11 months) and their mothers’ verbal labeling of the items paired with iconic gestures at a time when the children (aged 9 months) had not yet acquired the words. In a meta-analysis of gesture studies, Hostetter (2011) concluded that listeners had better comprehension of speech when it was accompanied by gestures, but age was one of several moderating factors. Children benefitted more from gesture than adolescents or adults did, and individuals who were considered less verbally proficient (i.e., listeners with Down syndrome or autism) were more likely to benefit from gestures than unimpaired learners. Also, the positive effects of an iconic gesture were measured when motoric or spatial information was being conveyed but not for abstract information (Hostetter, 2011).
There is widespread belief in popular culture, as well as some research support, that children benefit from baby sign language or gesture instruction in the environment (Goodwyn, Acredolo & Brown, 2000; Lederer & Battaglia, 2015). This has led to the development and marketing of Baby Signs® (see http://www.babysignstoo.com/), one of now several similar programs that encourage use of manual signs with typically developing infants and toddlers to increase their expressive communication abilities with caregivers as their speech skills develop. The founders of Baby Signs®, Goodwyn et al. (2000), reported that children whose parents were trained to combine words with iconic gestures outperformed a no-training control group on expressive and receptive language measures. No differences in language development were found between the no-training control group and a verbal training group whose parents were trained to increase speech-only labeling. Studies by Goodwyn and Acredolo (1993, 1998) using the same cohort as Goodwyn et al. (2000) were critiqued by Johnston, Durieux-Smith, and Bloom (2005), who conducted a systematic review of the baby sign language research literature. Johnston et al. (2005) pointed out that Goodwyn et al. did not test and report comparisons between the sign-training group and the verbal-training group. Additionally, Goodwyn et al.’s finding of differences was statistically significant only at ages 15 and 24 months, not at 19, 30, or 36 months. Based on their review of this study and others, Johnston et al. (2005) concluded that there were no advantages in adding gestural communication to parental input beyond 24 months. Kirk, Howlett, Pine, and Fletcher (2013) reported no language development differences for children followed from ages 8 to 20 months whose mothers used baby sign language compared to control groups receiving symbolic gesture training, verbal training, or no intervention. The authors cautiously reported that there were three boys with relatively low ability whose expressive language learning appeared to be facilitated by participation in a group with sign language or symbolic gestures. Clearly, not all research has supported the premise that gesture enhances word learning. In a looking-paradigm study examining the fast-mapping abilities of infants, Puccini and Liszkowski (2012) found that children (aged 1;3) did not map words to referents accurately in the word-plus-gesture and gesture-alone conditions. The only statistically significant effect was found for the word-only condition. The authors concluded that spoken words alone are the optimal input for hearing children.
Differences among studies led us to conclude that any benefits derived by adding gesture to speech input may depend on the age and skills of the children as well as the type of gesture and its relationship to the referent. For example, in addition to the young age of their participants, Puccini and Liszkowski (2012) included American Sign Language gestures for yes and no. These gestures were arbitrary, not iconic, because there were no recognizable associations between the gestures and the study referents. Iconic gestures, as opposed to arbitrary gestures, are assumed to support word learning because they map semantic elements that can facilitate children’s representation of the referent. Several studies, however, have indicated that children under the ages of 3;6 to 4;0 cannot easily make use of iconic information (Lüke & Ritterfeld, 2014; Namy, 2008; Tolar, Lederberg, Gokhale & Tomasello, 2008). Lüke and Ritterfeld (2014) found no significant differences in typically developing preschoolers’ receptive fast mapping of novel cartoon character labels paired with iconic gestures versus arbitrary gestures, but word learning was supported by both gesture conditions when compared to a no-gesture condition. Namy (2008) found that children at 14, 18, and 22 months did not consistently recognize iconic action gestures when selecting target objects in test trials, and at 14 months, the children did not choose target objects more often than expected by chance. Only one age group, children who were 26 months, showed consistent selection of target objects above chance levels when the stimulus was an action gesture that matched conventional actions performed on objects (e.g., spinning a top-like novel object or scooping with a familiar object, a spoon). Namy concluded that recognition of gesture iconicity was fragile and fluctuating in children under age 2 years.
Tolar et al. (2008) also concluded that iconicity recognition was fragile for children aged 3 years and younger. They studied hearing children’s recognition of the iconicity of sign language signs at ages 2;6 to 4;6. Only in the age groups 3;6, 4;0, and 4;6 did 50% of the children successfully identify pictures based on iconic signs; children at ages 2;6 and 3;0 did not correctly associate iconic signs with pictures. Stimuli in their study varied in iconicity: some signs depicted pantomime or actions associated with the referent (e.g., “baby,” “write”), while others depicted perceptual aspects (e.g., “house,” “tornado”). Perniss and Vigliocco (2014) argued that the type of iconicity – action-based signs such as that for the word “push” versus a perception-based sign for “deer” – is a factor in language development and language processing.
Capone and McGregor (2005) compared iconic shape gestures versus iconic function gestures in a fast-mapping investigation with typically developing children aged 2;3 to 2;6. They hypothesized that an early visual or perceptual aspect such as the shape of a referent might be easily recognized and improve fast mapping compared to an action or function gesture that could require additional representational learning. Their hypothesis drew from literature proposing a shape bias as a mechanism supporting young children in learning words (Landau, Smith & Jones, 1988; Smith, 2000). Diesendruck and Bloom (2003) proposed that children’s attention to perceptual aspects such as the shape of objects is not a specific linguistic mechanism but rather a more general means of concept creation for a category or kind. Toddlers’ attention and selectivity to the perceptual feature of shape increase between 2;0 and 3;1 for generalizing novel object labels, and shape continues to be a significant factor for word learning beyond age 4 years (Davidson, Rainey, Vanegas & Hilvert, 2018; Diesendruck & Bloom, 2003; Landau et al., 1988; Smith, 2000). To test iconic shape gestures, Capone and McGregor (2005) contrasted three fast-mapping conditions: nonce words paired with shape gestures; nonce words paired with function gestures; and nonce words only as a no-gesture control condition. Results indicated that children fast mapped at levels above chance when the word was paired with a shape gesture: 68% of the novel item/nonce word pairs were fast mapped. In the function gesture and no-gesture conditions, performance was at chance. Retrieval of the labels for novel items trained in either gesture condition, shape or function iconic gestures, required fewer cues than for labels trained in the no-gesture condition. Capone Singleton (2012) extended these findings regarding shape cues to children who were two and three years old. When three novel words were taught in three gesture conditions – with a shape gesture, with a function gesture, and with a point – children’s naming of words taught with shape gestures was significantly more frequent compared to the other conditions and resulted in better categorization and naming of untaught exemplars. It was iconic shape gestures, not deictic gestures such as pointing, that enhanced the semantic representations underlying fast mapping and slow mapping processes for object naming.
Despite the evidence of gesture influences on language learning in typically developing toddlers, several unknowns remain. Given the conflicting findings of studies, one unknown is the extent to which an iconic gesture versus a deictic gesture might aid toddlers in fast mapping. A second unknown is whether fast mapping by children younger than 2;3 would be improved given adult input that combines spoken word labels with gesture aids. A clearer understanding of these factors is needed when advising parents of typically developing children regarding the use of gestural techniques to promote language learning. Of greater importance are implications for the use of gesture-speech input for language development and word learning in clinical populations with limited expressive language, including children with autism, Down syndrome, and even late talkers (Capone & McGregor, 2004; Capone Singleton & Anderson, 2020; Capone Singleton & Saks, 2015; Caselli, Vicari, Longobardi, Lami, Pizzoli & Stella, 1998; Özçalişkan, Adamson, Dimitrova, Bailey & Schmuck, 2016; Thal & Tobias, 1992; Vogt & Kauschke, 2017; Wang, Bernas & Eberhard, 2001; Ellis Weismer & Hesketh, 1993).
Research questions
Our purpose was to determine whether gesture input combined with speech facilitated toddlers’ fast mapping of nonce words to unfamiliar objects. We examined how participants from two age groups, children aged 1;4 to 1;8 and children aged 2;0 to 2;4, responded in three input conditions – an iconic shape gesture combined with speech, a deictic point gesture combined with speech, and a speech-only, no-gesture control condition. The following research questions and hypotheses were posed:
1. Is there a significant effect of gesture input on receptive fast mapping of unfamiliar target objects by toddlers? We hypothesized that participants would demonstrate more correct responses in the gesture conditions than in the speech-only control condition. We also hypothesized that the iconic shape gesture condition would have a greater proportion of accurate responses than the point gesture condition.
2. Is there a significant effect of participant demographic variables on receptive fast-mapping skills in toddlers? We expected that the older toddler group would have more accurate responses than the younger toddler group.
3. Is there a significant effect of gesture condition on expressive naming of unfamiliar objects following a brief word learning task? We hypothesized that gesture input would support successful naming of newly learned objects. In particular, we expected that the shape gesture would support mapping of some semantic information, thereby increasing the likelihood of successful encoding and retrieval of the newly learned name.
4. Is there a significant effect of participant demographic variables on toddlers’ accurate expressive naming of nonce words paired with unfamiliar objects immediately following a brief word learning task? As with the receptive task, we anticipated that older toddlers would show more accurate naming than younger toddlers.
Method
Participants
Recruitment and enrollment proceeded after Institutional Review Board human-subjects approval. Participants were 48 children from the northern Gulf Coast region of the United States who met the eligibility criteria: ≥ 10th percentile on the MacArthur-Bates Communicative Development Inventories (MBCDI; Fenson, Marchman, Thal, Dale & Reznick, 2007); a reported gestational age of ≥ 37 weeks; monolingual English-speaking parents/caregivers; and no known hearing impairments. Sixty-one toddlers were initially seen, but 13 (21%) were not enrolled: nine had MBCDI scores below the 10th percentile; three did not complete the experimental task; and one was excluded due to investigator error.
Two participant groups were formed: 24 toddlers (14 boys, 10 girls) aged 1;4 to 1;8 in the Younger Toddler group and 24 toddlers (10 boys, 14 girls) aged 2;0 to 2;4 in the Older Toddler group. See Table 1 for group demographic information. A one-way analysis of variance (ANOVA) revealed a significant difference in MBCDI Words Produced across the Gender and Age groups (F [3, 48] = 41.98, p < .001). Post hoc analyses using Bonferroni procedures indicated that the mean raw score for Older Toddler girls (M = 515.36, SD = 113.14) was significantly higher than that for Older Toddler boys (M = 344.20, SD = 134.90), which in turn was significantly higher than those for Younger Toddler boys (M = 64.07, SD = 60.65) and Younger Toddler girls (M = 157.10, SD = 144.32). Differences between Younger Toddler boys and girls were nonsignificant. Because gender differences were significant, gender was added as a participant demographic factor to our analyses. A one-way ANOVA revealed no significant differences in maternal education across the age groups or genders.
Note. MBCDI = MacArthur-Bates Communicative Development Inventories – Words and Sentences (Fenson et al., 2007).
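To make the group comparison concrete, the following is a minimal sketch of a one-way ANOVA with Bonferroni-corrected post hoc tests using Python’s statsmodels. This is not the authors’ analysis code; the column names (group, words), the equal cell sizes, and the simulated scores are illustrative assumptions.

```python
# Minimal sketch, assuming long-format data with one MBCDI raw score per child;
# the simulated scores below are placeholders, not study data.
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import MultiComparison

rng = np.random.default_rng(1)
groups = ["YoungerGirls", "YoungerBoys", "OlderGirls", "OlderBoys"]
df = pd.DataFrame({
    "group": np.repeat(groups, 12),             # four Age x Gender cells
    "words": rng.normal(250, 100, 48).round(),  # placeholder Words Produced scores
})

# Omnibus one-way ANOVA on Words Produced across the four groups.
fit = smf.ols("words ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(fit))

# Bonferroni-adjusted pairwise comparisons among the four groups.
posthoc = MultiComparison(df["words"], df["group"]).allpairtest(
    stats.ttest_ind, method="bonf")
print(posthoc[0])
```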
Stimuli creation
Nine white objects (see Table 2) – six unfamiliar objects and three familiar objects – were selected to be perceptually similar in color and general size except for their distinctive shapes. The six unfamiliar objects were divided into two subsets. Three unfamiliar objects were targets paired with iconic shape gestures: a triangular holder, an over-the-door hanger, and part of an onion blossom maker. To provide names for the unfamiliar target objects, the investigators studied monosyllabic, consonant-vowel-consonant nonce words from prior studies of word learning and fast mapping. Three nonce words, “tull” /tʌl/, “fim” /fɪm/, and “sep” /sɛp/, were selected because they had no phonemes in common with the familiar object names and they had high phonotactic probability (Vitevitch & Luce, 2004). Each word was randomly assigned to one unfamiliar target object (see Table 3).
Iconic shape gestures for each unfamiliar target object (see Table 3) were created with two hands making contact and presented statically in similar gesture spaces. Three unfamiliar objects served as foils: a pastry blender, a plastic ridged tube, and a cable T-fitting.
Five of our six unfamiliar objects had previously been established as unfamiliar – that is, un-nameable – objects (Beverly & Estis, 2003). Pilot testing of the shape-gesture-to-object mapping occurred prior to object selection. Specifically, 20 consenting adults were asked to identify an object when presented with its associated shape gesture in an array of seven unfamiliar objects. The three items selected as the unfamiliar target objects were the only three of the seven identified by 100% of these participants: the triangular holder, the over-the-door hanger, and the onion blossom maker part. The unfamiliar foil objects and the familiar objects – keys, a cup, and a sock – were white and sized to be similar to the unfamiliar target objects. The familiar objects were selected using a word frequency program based on lexical development data, the Lex2005 Database (Dale & Fenson, 1996), which generated proportion ratings indicating that more than 80% of toddlers are reported to comprehend these words.
Experimental design and procedures
The within-subjects design consisted of three word-learning conditions: Point, Shape, and Control. In the Point condition, the investigator (first author) pointed to the unfamiliar target object while saying the associated nonce word, with the index finger of the right hand extended within approximately six inches of the object. In the Shape condition, the investigator produced the iconic shape gesture next to, and within six inches of, the unfamiliar target object while saying the nonce word. In the Control condition, the investigator said the nonce word for the unfamiliar target object with no gesture. Objects were grouped and presented in a consistent order – the familiar object, the unfamiliar target object, and then the unfamiliar foil (see Figure 1). The administration of the conditions (A = Point, B = Shape, and C = Control) associated with the unfamiliar target word-object pairings was systematically varied across participants using six unique sequences to attain complete counterbalancing: ABC, BCA, CAB, CBA, BAC, and ACB. The nonce words – tull, sep, and fim – each labeling an unfamiliar object, were presented in the same order, but the six presentation lists resulted in nine word-gesture-condition pairings: “tull” + Point, “tull” + Shape, “tull” + Control; “sep” + Point, “sep” + Shape, “sep” + Control; and “fim” + Point, “fim” + Shape, “fim” + Control. In this manner, gesture condition differences would not be due to unexpected item effects.
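As an illustration of this counterbalancing logic (our sketch, not part of the study materials), the following snippet enumerates the six condition orders against the fixed word order and verifies that all nine word-condition pairings occur:

```python
# Sketch: verify that six counterbalanced orders over a fixed word order
# produce every word-condition pairing (ruling out item confounds).
orders = ["ABC", "BCA", "CAB", "CBA", "BAC", "ACB"]   # A = Point, B = Shape, C = Control
conditions = {"A": "Point", "B": "Shape", "C": "Control"}
words = ["tull", "sep", "fim"]                        # always presented in this order

pairings = set()
for order in orders:
    for word, code in zip(words, order):
        pairings.add((word, conditions[code]))

print(sorted(pairings))                  # nine distinct word-condition pairings
assert len(pairings) == len(words) * len(conditions)
```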
Experimental sessions were scripted (see Figure 1) and conducted live by the first author in one of three settings (65% in participants’ homes; 23% in a preschool/daycare; and 13% in the university-based lab setting) with a familiar adult present in the room. Participants were seated in a highchair, a booster seat, or their mother’s lap (see Figure 2). A brief 2- to 5-minute play period was used to establish rapport and to determine participants’ ability to follow simple commands.
The experimental procedure consisted of two phases, a fast-mapping phase and a testing phase, repeated for each of the three conditions. Within each fast-mapping phase, the investigator’s utterances were scripted (see Figure 1) and included producing the object name – familiar or unfamiliar target – a total of four times. The fast-mapping script was designed to initially call attention to the novel object. Then, the participant manipulated each object for approximately 10 seconds before dropping it in a bucket. Once the object was in the bucket, the investigator pretended to look for it and then quickly found it, providing one additional exposure.
A testing phase immediately followed each fast-mapping phase. During testing, the three objects were arranged in a line on a red mat placed on the high-chair tray, the table, or the floor in front of the participant (see Figure 1). Object position was predetermined and counterbalanced, such that the unfamiliar target object’s position varied across the three testing phases. First, the investigator instructed the participant to get the familiar object, and for this trial the investigator provided training to participants who did not correctly select the familiar object. Training consisted of repetition and scaffolded cues: holding one of the participant’s hands to promote a single-object selection, moving the familiar object closer to the participant, manipulating the familiar object while naming it, and hand-over-hand assistance to select the familiar object. Once the familiar object was removed, testing proceeded from a field of two unfamiliar objects (the target and the foil). Receptive assessment for the unfamiliar target object consisted of the name only, with no point or gesture. Noncontingent positive reinforcement was provided following each selection to promote continued participation. Lastly, an opportunity for naming was provided: the investigator held up the unfamiliar target object and asked, “What’s this?” Upon completion of the testing phase, the fast-mapping phase for the second condition in the counterbalanced sequence was conducted, followed by its testing phase. The third fast-mapping and testing phases completed the experimental procedure, which lasted approximately 8 minutes.
Receptive trials were scored as correct if the participant accurately selected the target object given the unfamiliar object name without prompting. Receptive trials were scored from the video recordings, and reliability checks for 50% of the data revealed 100% inter-judge agreement and 100% intra-judge agreement. Expressive trials were scored as correct if the participant said the correct nonce word for the unfamiliar target object. Expressive scoring was completed live, and the participant’s responses were transcribed as needed. Reliability using the video recordings was 97% (35 of 36 reviewed decisions) for inter-judge agreement and 97% for intra-judge agreement.
Statistical analyses
The dependent variables were the fast-mapping responses – receptive fast mapping and expressive naming. These were dichotomous, categorical variables with correct responses coded as 1 and incorrect responses coded as 0. Participants contributed one receptive fast-mapping score and one expressive naming score for each of the three conditions: Point, Shape, and Control. The mean proportions and standard deviations of participants responding correctly were computed for each gesture condition and each measure, receptive fast mapping and expressive naming.
To address the research questions, the factors of gesture input, age, and gender were examined using the generalized estimating equation (GEE; Liang & Zeger, 1986) approach in IBM SPSS (Version 28; IBM Corp., 2021). GEE is subsumed under generalized linear mixed models and offers a framework for examining repeated categorical outcome variables that are nested within participants who are selected at random (Heck, Thomas & Tabata, 2012). A two-level data hierarchy was constructed in which receptive and expressive responses for the three repeated gesture conditions (Level 1) were nested within each of the 48 toddlers (Level 2). Because the toddler participants constituted a random sample of the population, they were treated as random effects in the model, and the intercept of the random effect was allowed to vary freely.
Several models were constructed with main effect and interaction terms, including a full factorial framework. The models were compared using the quasilikelihood under independence model criterion (QIC) to determine which model had the best quality of fit. Smaller QIC values indicate better model fit; therefore, we used the statistical model with the lowest QIC for each analysis. Six models were specified for the two dependent variables, receptive fast mapping and expressive naming. Table 4 displays the models, the fixed effects, and the QIC statistics for each.
Note. QIC statistics were labeled as Failed when the statistical analysis procedures did not execute. This occurred because one participant group, the younger male toddlers, had all 0 values (i.e., no correct responses) for expressive naming in all gesture conditions.
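For readers who want to reproduce this kind of analysis outside SPSS, the following is a minimal sketch of a binomial GEE with responses clustered within children and QIC-based model comparison, using Python’s statsmodels. The column names, candidate formulas, and simulated data are illustrative assumptions, not the authors’ model specification.

```python
# Sketch: binomial GEE for repeated dichotomous responses nested within
# children, with QIC used to compare candidate mean structures.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, conds = 48, ["Point", "Shape", "Control"]
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), len(conds)),      # Level 2: child (cluster)
    "condition": np.tile(conds, n),                 # Level 1: repeated measure
    "age_group": np.repeat(rng.choice(["Younger", "Older"], n), len(conds)),
    "gender": np.repeat(rng.choice(["F", "M"], n), len(conds)),
    "correct": rng.integers(0, 2, n * len(conds)),  # 1 = correct, 0 = incorrect
})

# Candidate models differ in fixed effects; smaller QIC indicates better fit.
formulas = [
    "correct ~ C(condition)",
    "correct ~ C(condition) + C(age_group) + C(gender)",
    "correct ~ C(age_group) + C(condition) * C(gender)",
]
fits = {}
for f in formulas:
    fits[f] = smf.gee(f, "id", data=df,
                      family=sm.families.Binomial(),
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(f, "QIC =", fits[f].qic()[0])             # qic() returns (QIC, QICu)

best = min(fits, key=lambda f: fits[f].qic()[0])
print(fits[best].summary())                         # per-term Wald tests
```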
Results
Receptive fast mapping
Each of the 48 participants contributed one receptive fast-mapping score for each of the three conditions: Point, Shape, and Control. This resulted in a total of 144 receptive fast-mapping responses, scored 1 or 0 for correct and incorrect, respectively. In addition to the planned investigation of gesture conditions and participant age groups, the data are depicted with age groups divided by gender because of the significant gender difference between the participant groups on the eligibility language assessment, the MBCDI. Figure 3 displays the mean proportions of correct fast-mapping responses across the gesture conditions, age groups, and genders. Group means and standard deviations indicated that Older Toddler girls were accurate receptive fast mappers across conditions and that Older Toddler boys’ receptive fast mapping was greatest for objects presented in the Shape condition.
To test for statistically significant differences, a GEE analysis was conducted. Model 2 (see Table 4) demonstrated the lowest QIC, indicating the best quality of fit. Using GEE, the effects of gesture condition, age, and gender on receptive fast mapping were evaluated. Age was a significant main effect, Wald χ2 (1) = 5.551, p = .018, β = 1.047 (SE = .4446). The mean proportions of correct receptive fast-mapping responses for Older and Younger Toddlers were .74 (SD = .05) and .49 (SD = .10), respectively. When other factors were held constant, Older Toddlers were 1.5 times more likely to receptively fast map correctly than Younger Toddlers. The main effects of gesture input condition and gender were not statistically significant. The model was also repeated with the predictor factor of Presentation List to test for item effects; Presentation List was nonsignificant, and the significant main effect of age was unchanged.
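For transparency, the 1.5 figure can be recovered from the reported group proportions (our arithmetic, not the authors’); exponentiating the GEE coefficient would instead give an odds ratio:

$$\frac{.74}{.49} \approx 1.5, \qquad e^{\beta} = e^{1.047} \approx 2.85 \ (\text{odds ratio}).$$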
During each testing phase, toddlers were first asked to select a familiar object. This task consisted of a forced choice from a field of two. Means and standard deviations for the groups’ selection of familiar objects are shown in Table 5 and compared with their means and standard deviations for correct selection of the unfamiliar target objects. Group differences were statistically significant based on independent t tests (p values < .01): Older Toddlers had more correct selections of both familiar and unfamiliar objects than Younger Toddlers did.
Note. Means represent the group averages for correct object selection in three opportunities.
Expressive naming
The 48 participants were asked to expressively name the newly learned unfamiliar object labels in the three gesture input conditions; 10 responses were not collected, due either to toddler fatigue or to investigator error. Older Toddler girls contributed 41 responses, with 1 missing response in the Control condition. Older Toddler boys contributed 29 responses, with 1 missing response in the Point condition. Younger Toddler girls contributed 27 responses, with 1 missing response in the Shape condition and 2 missing responses in the Control condition. Younger Toddler boys contributed 37 responses, with 2 missing responses in the Point condition, 1 missing in the Shape condition, and 2 missing in the Control condition. This resulted in a total of 134 expressive naming responses, scored 1 or 0 for correct and incorrect, available for analysis. Figure 4 displays the proportions of correct expressive naming responses across the gesture conditions, age groups, and genders. As with the receptive fast-mapping responses, Older Toddler girls outperformed the other groups. The Older Toddler boys had a higher proportion of correctly named objects in the Control condition.
To test for statistical significance of the hypothesized factors on toddlers’ expressive naming, Model 5 was used, having demonstrated the best quality of fit based on the lowest QIC (see Table 4). In the GEE approach, the 10 missing data points are assumed to be missing at random. Despite this limitation, which can impact model efficiency and parameter estimates (Heck et al., 2012), the selected GEE model adequately accommodated the available expressive naming data, and imputations were unnecessary. To address our research questions, the main effects of gesture condition, age, and gender as well as the interaction of gesture condition and gender were evaluated; to rule out item effects, the model was again repeated with Presentation List tested. Results indicated two statistically significant main effects: age, Wald χ2 (1) = 9.369, p = .002, β = 3.145 (SE = 1.0276), and gender, Wald χ2 (1) = 7.761, p = .005, β = 2.683 (SE = 1.1605). A main effect of gesture condition was not statistically significant; however, the interaction between gesture condition and gender was statistically significant, Wald χ2 (2) = 9.822, p = .007, β = 2.388 (SE = 1.1857). Compared to boys, girls were nine times more likely to expressively name newly learned words in the Point and Shape conditions. While girls and boys performed similarly in the Control condition, in which speech with no accompanying gestures was used to teach the new words, boys expressively named more words in the Control condition than in the two gesture conditions. In contrast, girls expressively named fewer words in the Control condition than in the two gesture conditions.
Because of the homogeneity of responding by the younger toddler boys (i.e., no younger toddler boy named any item correctly), a separate analysis of expressive naming by the Older Toddlers only was conducted. There were two instances of missing data, resulting in 70 expressive naming responses from 24 participants available for analysis. A GEE approach (QIC = 93.556) was used to test the main effects of gesture condition and gender as well as the interaction between gesture condition and gender. Results revealed a statistically significant main effect of gender, Wald χ2 (1) = 6.286, p = .012, β = 2.485 (SE = 1.1844). A main effect of gesture condition was not statistically significant; however, the interaction between gesture condition and gender was statistically significant, Wald χ2 (2) = 8.142, p = .017, β = 2.234 (SE = 1.2320). These findings mirrored those of the previous analysis: Older Toddler girls were six times more likely than Older Toddler boys to expressively name newly learned words in the Point and Shape gesture conditions. The boys’ and girls’ performances were similar in the Control condition, but Older Toddler boys expressively named more words in the Control condition than in the two gesture conditions.
Demographic factors
Additional demographic factors were tested using the GEE approach. Analyses for the receptive fast-mapping responses and expressive naming (Models 2 and 5, respectively) were repeated with exposure to gesture communication (i.e., baby signs) and language skill (measured by MBCDI percentile score for words produced) added as covariates. Neither factor was statistically significant in the models, and the significant findings of main effects and interactions for hypothesized predictors remained constant.
Discussion
In this experimental study of fast mapping by young toddlers, we systematically manipulated the gestures provided with the linguistic input. A group of typically developing one-year-olds was compared with a group of young typically developing two-year-olds on mapping object names to novel referents during brief interactions with an unfamiliar adult. Our research questions addressed the effect of gesture combined with speech on the word learning skills of the toddlers.
The first dependent measure was receptive fast mapping; all participants contributed three data points, one unfamiliar-object selection for each nonce word and paired gesture condition. There was no statistically significant effect of gesture condition on the two toddler groups’ recognition of the target objects. We had hypothesized that participants would demonstrate more correct responses in the gesture conditions than in the word-only, no-gesture condition, and we proposed that the shape gesture condition would yield the greatest proportion of correct fast mapping. This hypothesized pattern for the shape condition emerged for the two-year-old boys, but the effect was not significant. Only a main effect of age was a significant predictor of the receptive fast-mapping selections.
Our second dependent measure was expressive naming of the unfamiliar target objects. Again, we hypothesized that gesture input would support successful naming. In particular, we expected that the shape gesture would support this mapping through at least partial encoding of some semantic information, the shape of the novel object; if the salient shape feature were mapped, the likelihood of successful encoding and retrieval of the newly learned name might increase. Again, there was no significant main effect of gesture input. Instead, age and gender were both significant predictors, and there was a gesture-gender interaction primarily characterized by more correct naming by two-year-old boys in the word-only, no-gesture condition.
Toddler age and fast mapping
Age was a significant predictor of both receptive fast mapping and expressive naming: the two-year-olds outperformed the one-year-olds on both measures. The older toddlers were 1.5 times more likely to correctly select the target object receptively, and 12 of the 24 older toddlers receptively identified all three novel referents correctly regardless of the gesture condition. Consistent with reports that typically developing toddlers fast map successfully by age two (Heibeck & Markman, 1987; Spiegel & Halberda, 2011), our results did not uncover fast mapping by children under 20 months.
The nature of our fast-mapping task and the testing phases may be factors impacting our findings. The linguistic context was ostensive (e.g., “It’s a X”; “See the X”), with words and phrases that serve a deictic function, and each nonce word and unfamiliar object pairing included four explicit labels, not just one or two exposures. The task also was multimodal in several ways. In addition to hearing the name and seeing the object, the investigator moved the target object to several locations (table, tray, and bucket) and allowed the child to briefly handle each object. Any benefits from this ostensive teaching and engaging interaction, however, may have been offset by the total number of targets presented in a sequence of brief exposures. That is, the fast-mapping task exposed the young toddlers to nine white objects (three named familiar objects, three target objects paired with nonce words and gesture conditions, and three foil objects labeled “it”). For each object, the investigator engaged in a scripted interaction during which she said the label four times, resulting in 36 labels total in the approximately eight-minute interaction. Given the very young age of the toddlers, this could be considered a challenging task compared to fast-mapping studies with fewer targets.
The first of our two measures, the receptive fast-mapping measure, was a forced-choice recognition task from a field of two that took place immediately after the brief interactive exposures to the three objects. The naturalistic behavioral methods that relied on toddlers’ object selections may have limited our findings. Specifically, the group of younger toddlers, aged 1;4 to 1;8, showed inconsistent skills on this assessment even when asked to select familiar objects with known names. Only 21% of the one-year-olds accurately selected all three familiar objects. As a group, the younger toddlers averaged fewer than two correct selections out of three opportunities, which was significantly fewer correct selections of familiar objects than the toddlers in the older group made. Namy (2008) also found that toddlers at 14, 18, and 22 months of age demonstrated inconsistent performance compared to toddlers at 26 months of age in a gesture recognition study using direct, manual object selection. Studies that implement looking paradigms may be more effective than behavioral studies for assessing fast mapping in toddlers younger than 2;0 (Gliga & Csibra, 2009; Puccini & Liszkowski, 2012). Bion et al. (2013) suggested that the forced-choice paradigm is truly a disambiguation task that may not be indicative of word learning. They reported looking-paradigm data differentiating the development of skills for disambiguation from word-object learning and retention in children from 1;6 to 2;6. Bion and colleagues concluded that recognition of familiar words, as well as disambiguating and learning new words, are skills that develop gradually.
Our second measure, expressive naming, was expected to be more challenging than the receptive fast-mapping task because toddlers had to encode and retrieve the phonological elements of the nonce word to correctly name the object. This was still an immediate assessment, not a test of retention with any time delay. Across gesture conditions, older toddlers named more of the newly learned nonce words than younger toddlers. Only one toddler in the younger group (n = 24) named any of the objects when presented with them, and she named two of the three. None of the one-year-old boys correctly named any target objects.
Interestingly, several older toddlers named the unfamiliar object, the triangular holder, a “triangle” during the fast-mapping phase and again during the expressive naming test. This was despite their correct selection of the object when it was named “tull” during the receptive fast-mapping test. If children already had a name for this object, “triangle,” then a mutual exclusivity assumption could interfere with mapping a new name to the object. Mutual exclusivity, the idea that an object has only one label, is one process hypothesized to support fast mapping new names to novel objects (Beverly & Estis, 2003).
Gender and word learning
Gender was found to be a significant predictor of toddlers’ expressive naming of the newly learned object labels. Sex differences in early language development, with girls outpacing boys, are not an uncommon research finding, even when boys meet age-level language expectations. Female toddlers achieve language milestones such as vocabulary and syntax use at earlier ages than males (Fenson, Dale, Reznick, Bates, Thal, Pethick, Tomasello, Mervis & Stiles, 1994), and Özçalişkan and Goldin-Meadow (2010) reported that girls’ gesture productions, like their spoken language skills, emerged ahead of boys’ gesture use, despite no significant differences in the gesture input by the children’s mothers. Our findings suggest that female toddlers have a referent-mapping advantage that promotes word learning. Female toddlers over the age of two appear primed to learn new words: they are flexible fast mappers whose performance may have been enhanced by, but was not dependent upon, gestural cues combined with linguistic input. Girls’ and boys’ vocabulary sizes become more similar in the preschool years; and yet, the fast-mapping advantage of female toddlers may be a factor that undergirds female language skills through adolescence and into adulthood (Özçalişkan & Goldin-Meadow, 2010).
In addition to a main effect of gender with girls outperforming boys, there was a significant interaction between gender and gesture condition for the older toddlers. Two-year-old girls were six times more likely than the boys to correctly name objects that were paired with point and shape gestures in the exposure task; however, naming by the boys and girls for the word-only, no gesture condition was similar. The boys were more accurate namers in the word-only, no-gesture condition. In fact, the gesture input appeared to interfere with the boys’ naming.
Puccini and Liszkowski (2012) concluded that multimodal input in the form of an arbitrary gesture paired with a spoken word is unnecessary for word learning and was potentially disruptive for their participants, who were aged 1;3. They questioned whether children under the age of 2;2 to 2;6 can benefit from multimodal input for word learning, particularly when the gesture is not a deictic point that can support joint attention processes. Puccini and Liszkowski hypothesized that mapping gesture-speech input to a novel referent is more complex than mapping speech to a referent: the multimodal input requires a three-way mapping of word plus gesture plus object, compared with the simpler two-term mapping of word plus object. Furthermore, encoding a representational gesture with an object requires coordinating competing visual information, whereas a spoken word in the auditory modality can be mapped synchronously to the visual referent in the environment.
The nature of iconic gesture
Iconicity is only one semantic feature that could support referent representation during fast mapping and word learning, and iconicity varies across gestures and signs such that some require understanding of the word and its referent (e.g., a sign for “cat” that depicts a cat’s whiskers) for the iconic relationship to be appreciated. The shape gesture, however, is an iconic cue consistent with research suggesting that young toddlers attend to the shape of objects in the process of rapidly learning new words (Landau et al., 1988; Namy, 2008; Smith, 2000), including some studies supporting the facilitative effect of a shape gesture for word learning (Capone & McGregor, 2005; Capone Singleton, 2012). Our results did not show a clear benefit to these toddlers from the co-occurring shape gesture. In addition to the younger ages of our participants, there were several study differences. Perhaps the most important one was the extended word learning, or slow mapping, nature of the Capone Singleton investigations. If, rather than being all-or-nothing, recognition of iconicity is developmental and requires repeated associations, then a fast-mapping task might not capture its impact on semantic encoding.
In language, iconicity is contrasted with arbitrariness, a critical aspect of language symbolism (Nielsen & Dingemanse, 2021). And yet, words that have more iconic sounds (e.g., roar, choo-choo) are often learned earlier by young children than words whose sounds have no relationship to the referent. Iconicity is assumed to provide some perceptual-motor grounding or imagistic information (Nielsen & Dingemanse, 2021). In this sense, we hypothesized that iconicity depicted by hand shapes that mimicked object shapes would be facilitative for very young children. Hodges, Özçalişkan, and Williamson (2018), however, found that toddlers (mean age of 2;8) did not match one subtype of iconic gestures, attribute gestures, to object photos, although three-year-olds did recognize iconic attribute gestures significantly more often than predicted by chance. Hearing three-year-olds who participated in a study of iconic shape gestures by Magid and Pyers (2017) did not reliably map shape gestures to referents, whereas children who were Deaf learners of sign language matched shape gestures to referents at age three. Novack, Filippi, Goldin-Meadow, and Woodward (2018) conducted a series of studies investigating two-year-olds’ interpretation of iconic gestures. They found that the toddlers correctly interpreted different handshapes embedded in a reaching gesture toward the object referents but not when the same handshapes were produced without the extended-arm reach. The investigators concluded that children have difficulty interpreting shape gestures as representational of objects.
Gesture and language learning
Multimodal motherese has been described in natural and experimental contexts for parents from several cultures (Cheung, Hartley & Monaghan, 2021; Gogate, Bahrick & Watson, 2000; Gogate, Maganti & Bahrick, 2015). This multimodal input, however, is characterized by showing, shaking, and moving unfamiliar objects, sometimes including touching the child with the object, and by deictic gestures that scaffold joint attention in the environment. Such synchronized movement has been proposed to reduce cognitive load for preverbal children, particularly when there is referent ambiguity; however, these studies have not assessed the benefits of multimodal motherese. After all, typically developing children effectively learn language even when their parents do not specifically use gesture to enhance the spoken input.
What is the role of gesture? A question often raised by parents and professionals is whether gesture programs, such as Baby Signs® and others, should be used to enhance word learning for typically developing toddlers. For spoken language learners, we know that gesture cannot be wholly sufficient for mapping words to referents; children need linguistic input to hear and then learn vocabulary. We had anticipated that gesture in the form of a deictic point would direct attention to the object for mapping, and that the shape gesture would encode some representational information supporting the mapping of words to target objects. Our results, however, did not support a facilitative effect of gesture input for very young toddlers. Gesture input appeared beneficial to the two-year-old girls; however, these female toddlers were generally better at word learning than their male counterparts. So, although the girls were significantly more likely than the boys to successfully name the novel objects in the gesture conditions, their performance in the control condition was not significantly different. Instead, the two-year-old boys named more items taught in the word-only condition than in the gesture conditions, a finding indicative of word learning disrupted by the multimodal input. As suggested in a body of work by Goldin-Meadow and colleagues (e.g., Breckinridge Church & Goldin-Meadow, 1986; Goldin-Meadow & Alibali, 2013; Goldin-Meadow, Kim & Singer, 1999; Goldin-Meadow, Nusbaum, Kelly & Wagner, 2001), gesture appears to lessen the cognitive burden when children are in the process of acquiring language. Our results, however, indicate that the role of gesture is moderated by children’s age-related language development and gender. Gesture is neither necessary nor sufficient, and very young language learners may not be able to capitalize on gesture cues in fast-mapping paradigms.
Limitations and implications
Primary limitations were the sample size, with participants drawn from a relatively homogeneous background, and the few binomial data points yielded by our study design. Additionally, an experimental word learning study such as this is limited by the nature of the experimental tasks and assessments, which differ from naturally occurring exposures and interactions. During our study, participants were seated in a child chair with a tray that restricted their movement, and the investigator introduced the objects and associated language in a highly scripted manner. Any effect of the object’s position to the side of the face-to-face interaction is unknown. Similarly, the shape gesture was produced next to the object rather than directly between the child and adult or closer to the adult’s body, as typically produced in sign languages. Another limitation was the lack of a familiarization phase; that is, participants’ first exposures to the novel objects were during the experimental task. This lack of prior hands-on exposure to the objects could have interfered with participants’ attention to the added gesture.
The question of clinical importance that remains is whether word learning by young children at risk for or exhibiting language disorders could be aided by spoken language combined with gesture. Ellis Weismer and Hesketh (1993) reported that manual gestures representing spatial concepts supported nonce word learning by kindergartners with specific language impairment. Vogt and Kauschke (2017) found that preschool children with specific language impairment performed similarly to typically developing age-matched children when novel words presented in a storybook were accompanied by gesture. In a single-subject intervention with four toddlers who were late talkers, Capone Singleton and Anderson (2020) showed that shape gestures paired with taught words increased learning of those words and supported generalization to untaught exemplars, compared to words taught with the deictic cues of touching, showing, or eye gaze. These studies provide an emerging evidence base that calls for larger, well-designed investigations of gesture treatments for children with language disorders.
Conclusion
Results suggested that the role of gesture input was circumscribed. There was no statistically significant effect of gesture input on toddlers’ receptive fast-mapping responses; only an effect of age emerged, such that the older toddlers outperformed the younger toddlers in identifying novel objects taught with nonce labels. When expressive naming was assessed, only one girl and no boys in the younger toddler group named any target objects. Gesture input significantly interacted with gender: gesture was facilitative for the two-year-old girls but appeared to interfere with naming for the two-year-old boys. There was no statistically significant unique effect of the iconic shape gesture compared to the deictic point for the older toddler girls, and the boys demonstrated more naming of the labels taught in the speech-only condition than in the two gesture-speech combined conditions. There remain ample opportunities for further investigation into the complex nature of language development in conjunction with child development factors, learning contexts, and the role of gesture input.
Acknowledgments
We wish to thank the following colleagues and graduate assistants: Paul Dagenais, Kelli Evans, Kate Darnall, and Brooke Terry Sorrells.