Introduction
Adults exaggerate properties of their speech to infants. Infant-directed speech (IDS) is characterized by higher and more variable pitch in shorter utterances (Fernald, 1989), hyperarticulation of vowels (Burnham et al., 2015; Kuhl et al., 1997; Weirich & Simpson, 2019) and consonants (Dilley et al., 2014), and increased positive affect (Singh et al., 2002) when compared to adult-directed speech (ADS). Numerous studies show that typically developing infants prefer IDS over ADS, a finding recently corroborated in a large-scale replication study across infancy (Frank et al., 2019). Enhanced attention to IDS benefits infants and young toddlers indirectly, because increased attention results in better information processing, and/or directly, because certain properties of speech in IDS are more accessible to the infant listener.
One proposed advantage of IDS for language learning is hyperarticulation. That is, the first and second formants (F1, F2) are produced at more extreme positions in vowel space, making the vowels clearer and therefore better exemplars. Hyperarticulation of vowel space is positively correlated with speech intelligibility in adults (Bradlow et al., 1996), suggesting that IDS promotes language learning in infancy through speech clarity. Hyperarticulation has been found in IDS vowels in a variety of languages: English (both American and Australian), Russian, Mandarin, German, and Swedish (Burnham et al., 2002, 2015; Kalashnikova & Burnham, 2018; Kuhl et al., 1997; Liu et al., 2003; Marklund & Gustavsson, 2020; Weirich & Simpson, 2019). The idea that hyperarticulation is a didactic adjustment to benefit early linguistic learning is called the hyperarticulation hypothesis: vowels and consonants are more perceptually distinct in IDS because they are more clearly articulated (Cristia & Seidl, 2014). Liu et al. (2003) found a significant positive correlation between vowel hyperarticulation in maternal IDS and infants’ phoneme discrimination. Others have found positive correlations between hyperarticulation in IDS and expressive vocabulary and word recognition in older infants (Hartman et al., 2017; Kalashnikova & Burnham, 2018).
Evidence consistent with the hyperarticulation hypothesis was found in a study that analyzed Australian mothers’ speech to their six-month-olds, to another adult, and to the mother’s pet cat or dog (Burnham et al., 2002). The resulting vowel space for the three point vowels (/i/, /u/, /a/) was significantly larger during IDS than during ADS or speech to pets (PDS). Vowel space did not differ significantly between PDS and ADS. A follow-up study included mothers’ speech to their infants, to their pets, to a parrot, and to another adult (Xu et al., 2015). Again, mothers’ vowel space was significantly larger in speech to the infant than in speech to the adult. Vowel space in speech to the adult did not differ significantly from that in speech to the dog or the parrot, although there was a trend toward a larger vowel space in speech to the parrot (consistent with the fact that parrots can potentially talk). Hyperarticulation was also seen in mothers’ speech to their infants, especially when the infants were young, but not in speech to their dogs (Gergely et al., 2017). In all of these studies, however, the adult participants were talking to their personal pets (dogs and cats), so the age of the non-human listener could vary widely.
Other research reveals a more complex picture regarding whether mothers hyperarticulate to their young listeners. Englund and Behne (2006) found no hyperarticulation in Norwegian mothers’ IDS vowels to one-month-olds. Cristia and Seidl (2014) found both hyper- and hypoarticulation (i.e., less clarity) in IDS to four- and 11-month-olds (see also Miyazawa et al., 2017). A negative correlation was found between hyperarticulation and age in Japanese mothers’ speech to six- to 22-month-olds (Dodane & Al-Tamimi, 2007). Dutch mothers showed hypoarticulation to their infants at 11 and 15 months of age, compared to ADS (Benders, 2013). Japanese mothers spoke more clearly during ADS than during IDS (Martin et al., 2015). Taken together, these studies show that mothers’ IDS vowels can be either clearer or less clear than their ADS vowels across a fairly wide infant age range, across more than just Western English cultures, across individual differences in infants (e.g., hearing impairment), and also across interactional contexts (e.g., free vs. structured speech; Gergely et al., 2017). Although it is certainly possible that some of the hyperarticulation found in maternal IDS stems from a didactic interest, there must be other factors that contribute to this overall variability (McMurray et al., 2013).
Positive vocal valence (i.e., speaking while also displaying high positive emotion) may help explain why some IDS is clearer, and why there is variability in the degree to which mothers hyperarticulate while speaking to infants. That is, in addition to differing in the extent to which they intentionally speak clearly in IDS, mothers may differ in the degree to which they express positive emotions in IDS. First, ‘happy’ speech is often perceived as more intelligible, in part because smiling while speaking often acts to shorten the vocal tract, widen the mouth, and raise F0 and certain formant frequencies (Kalashnikova et al., 2017). Second, ‘happy’ speech tends to be higher in F0 and wider in F0 range, the latter being another acoustic correlate of clear speech (Bradlow et al., 1996). Third, speech to infants that is high in positive valence often contains voiced segments that are longer in duration (Green et al., 2010), and increased vowel duration promotes hyperarticulation (Ferguson & Kewley-Port, 2007). Infants prefer speech that is rated as “happy” over that rated as neutral regardless of whether it is spoken to an infant or an adult (Singh et al., 2002). So the link between maternal hyperarticulation and infant/toddler speech processing may be due to increased attention to IDS that is positively valenced as well as to the clarity of the speech per se. Similarly, puppies responded more positively to dog-directed speech than did older dogs (Ben-Aderet et al., 2017).
Importantly, Burnham et al. (2002) noted that mothers’ pet-directed speech was lower in positive vocal valence than their IDS, but higher in positive valence than their ADS. Given that the mothers were speaking to cats and dogs in their homes, they may have been speaking to adult animals (possibly attenuating positive valence; see Ben-Aderet et al., 2016; de Mouzon et al., 2022). If positive valence increases with infant status (regardless of the species), there might be equivalent hyperarticulation in IDS to human and non-human infants. We compared mothers’ speech to their six-month-old infants, to 8- to 12-week-old puppies, and to adults. We anticipated that if hyperarticulation is primarily driven by its didactic purpose, it would be present in IDS to human infants, but not to puppies or adults. However, if vowel space is influenced by positive vocal valence, and puppies elicit as much positive emotion as human infants (which is greater than to adults), hyperarticulation would be equivalent in IDS to both human infants and puppies, but not to adults. Additionally, we aimed to document the acoustic and perceptual features of each speech type (e.g., pitch, vowel duration, positive valence, and hyperarticulation) for further exploratory analyses.
Method
Participants
Ten mothers with monolingual English-learning six-month-old infants were tested (mean infant age: 6 months and 12 days; range: 5 months, 29 days to 6 months, 19 days; 7 females), in line with the sample size of previous studies of vowel hyperarticulation to 5- and 6-month-olds (Burnham et al., 2002; Kuhl et al., 1997). In the current study, all mothers were White, with a mean age of 33 years (range: 28 to 39 years); six mothers were primiparous and the other four had one older child. The listeners also included two puppies (age range: 8-12 weeks) obtained from our local animal shelter and, as the adult listener, one of three undergraduate research assistants (two females).
Audio Recordings
Recordings of mothers took place in a sound-attenuated room, using a Marantz CD recorder and a lavalier lapel microphone. All mothers were told that we were interested in capturing both similarities and differences in how they talked to three different audiences: infant, puppy, adult. We asked them to speak to each listener (adult, infant, puppy) for 10 minutes, 30 minutes in total, about three small physical objects: a colorful bead, a doll-sized boot, and either a wooden box or a tennis ball. These object names provided numerous acoustic samples of the American English corner vowels /i/ (bead), /u/ (boot), /ɑ/ (ball) and /a/ (box). We substituted ball for box for the last four mothers because some of the earlier mothers’ /a/ productions in ‘box’ were so short that the F1 and F2 values seemed unstable. Mothers were given 1-2 minute breaks between conversations so that we could prepare the next listener.
Recording Procedure
All mothers were digitally recorded as they spoke to their own infants (IDS), to one puppy (PDS), and to one undergraduate assistant (ADS). The order of these registers and the order of the objects being referred to were counterbalanced across mothers. Mothers described each object, talked about its characteristics, and sometimes its function. Mothers were seated in front of a small table on which an infant seat was positioned (IDS), or facing another chair in which the assistant sat either holding the puppy en face with the mother (PDS) or alone, en face with the mother (ADS). Mothers were handed one object at a time and encouraged to use the name of the object as frequently as possible as they spoke to the intended listener.
Vowel Analysis
Target words (free from background noise) were excised from utterances (using Adobe Audition), and their vowels were isolated in Praat (v6.0) by combining visual inspection of spectrograms for physical evidence of vowels with listening to the transitions into and out of each vowel production. This resulted in 614 vowels included for formant and acoustic analysis (bead = 209 (n=10 moms); boot = 201 (n=10 moms); box = 99 (n=6 moms); ball = 105 (n=4 moms)). The dependent measure of primary interest was the vowel area formed by these target productions. To evaluate hyperarticulation, the first two formants (F1, F2), measured in Hz, were spectrally analyzed at the center position of each of the 614 excised vowels. Vowel space was then calculated from the means for F1 and F2 as the area of the vowel triangle: (F1/i/*(F2/u/ - F2/A/) + F1/u/*(F2/A/ - F2/i/) + F1/A/*(F2/i/ - F2/u/))/2, where /A/ stands for /a/ (for box) or /ɑ/ (for ball). Also, a mel conversion of these formant values was used in identical analyses and resulted in the same outcomes (see Supplemental Materials).
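The vowel-space measure described above is the area of the triangle whose corners are the mean (F1, F2) values of the three point vowels. A minimal Python sketch of that computation follows, using the standard shoelace (triangle-area) formula. The function name and the formant values in the usage line are hypothetical placeholders, not data from this study, and the absolute value is an added safeguard so that the result does not depend on the order in which the corners are listed.

```python
def vowel_space_area(i, u, a):
    """Area (Hz^2) of the vowel triangle; each corner is an (F1, F2) pair in Hz."""
    f1_i, f2_i = i
    f1_u, f2_u = u
    f1_a, f2_a = a
    # Shoelace formula: each F1 is weighted by the cyclic difference of the
    # other two corners' F2 values.
    twice_area = (
        f1_i * (f2_u - f2_a)
        + f1_u * (f2_a - f2_i)
        + f1_a * (f2_i - f2_u)
    )
    return abs(twice_area) / 2.0


# Hypothetical mean formants (Hz), for illustration only.
example_area = vowel_space_area(i=(300.0, 2500.0), u=(300.0, 900.0), a=(800.0, 1300.0))
```

With these placeholder values the triangle has a base of 1600 Hz along F2 and a height of 500 Hz along F1, so `example_area` is 400000.0.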
Positive Valence Analysis
To compare infant-, puppy-, and adult-directed utterances for valence, 31 undergraduates rated a random sample of 90 utterances drawn from the three listener groups (30 IDS, 30 PDS, 30 ADS), which were low-pass filtered at 400 Hz to reduce lexical access. The undergraduates used a valence rating scale ranging from -4 (very negative emotion) to +4 (very positive emotion), with 0 as neutral. The 90 utterances were presented to the raters in random order, with the constraint that no more than two utterances of the same listener category could occur sequentially. Undergraduates received no specific training, and were asked to rate the valence of each filtered utterance as it was presented. The undergraduates were recruited from an Introductory Psychology pool in which students can earn extra credit for taking part in a psychology experiment.
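The presentation constraint described above (no more than two consecutive utterances from the same listener category) can be met in several ways; the source does not say how the order was generated. The sketch below is one possible approach, not the authors' procedure: it builds the category sequence one item at a time, skips any category that would create a run of three, and restarts if it reaches a dead end. The function name, the seed, and the retry limit are all hypothetical.

```python
import random
from collections import Counter


def constrained_order(counts, max_run=2, seed=None, max_tries=1000):
    """Return a list of category labels with no run longer than max_run."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = Counter(counts)
        order = []
        while sum(remaining.values()) > 0:
            # Block the category that already fills the last max_run slots.
            blocked = None
            if len(order) >= max_run and len(set(order[-max_run:])) == 1:
                blocked = order[-1]
            options = [c for c, n in remaining.items() if n > 0 and c != blocked]
            if not options:
                break  # dead end: only the blocked category is left, so retry
            # Weight by remaining items so categories are used up evenly.
            weights = [remaining[c] for c in options]
            choice = rng.choices(options, weights=weights, k=1)[0]
            order.append(choice)
            remaining[choice] -= 1
        else:
            return order
    raise RuntimeError("no valid order found within max_tries")


order = constrained_order({"IDS": 30, "PDS": 30, "ADS": 30}, seed=1)
```

The result is a sequence of category labels; specific utterances would then be assigned by shuffling within each category. This construction does not sample uniformly from all valid orders, which may or may not matter for a given design.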
Results
A repeated measures analysis of variance (ANOVA; SPSS) on average vowel space across listener (IDS, PDS, ADS) revealed a significant main effect (F(2,18) = 4.07, p = .035, η² = .31). Pairwise comparisons indicated that average vowel spaces for IDS and PDS were not significantly different (t(9) = .45, p = .66, d = .14), that IDS vowel space was significantly greater than ADS vowel space (t(9) = 2.42, p = .039, d = 1.13), and that PDS vowel space was marginally but not significantly greater than ADS vowel space (t(9) = 2.04, p = .072, d = 1.01; see Figure 1). What is clear from the analysis, as well as from Figure 1, is that differences in F1 and F2 values were inconsistent across the point vowels. That is, IDS/PDS vowel space was more exaggerated for the closed vowels /i/ and /u/ than for the open vowels /a/ and /ɑ/. Paired t-tests were conducted between listener categories for both F1 and F2. Overall, there were no significant differences between F1 values for any vowels. For the open vowels /a/ (as in box) and /ɑ/ (as in ball), there were no significant differences between F2 values as a function of listener. In contrast, for the closed vowel /i/ (as in bead), F2 was significantly higher in IDS than ADS (t(9) = 5.04, p < .001, d = .48), and higher in PDS than ADS (t(9) = 2.53, p = .03, d = .80). IDS and PDS were not significantly different from each other in their /i/ F2. For the closed vowel /u/ (as in boot), F2 was significantly lower in IDS than ADS (t(9) = -5.53, p < .001, d = -1.74) as well as in PDS than ADS (t(9) = -6.84, p < .001, d = -2.16). IDS and PDS were not significantly different from each other in their /u/ F2.
Next, we compared adults’ perception of the positive valence of the low-pass filtered IDS, PDS, and ADS utterances. IDS and PDS utterances were each rated as significantly higher in positive valence than ADS utterances, and the valence ratings for IDS and PDS did not differ significantly. That is, a repeated measures ANOVA on positive valence showed a significant main effect of listener (F(2,60) = 131.51, p < .001, η² = .81), with IDS (M = .91, SD = .60) and PDS (M = .98, SD = .53) not significantly different from each other (t(30) = 1.56, p = .13, d = .28), but both significantly higher in positive valence than ADS (M = -.37, SD = .64; IDS: t(30) = 11.56, p = .001, d = 2.08; PDS: t(30) = 12.62, p = .001, d = 2.27; see Figure 2).
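As a consistency check on the two ANOVAs reported above, an effect size can be recovered from each F statistic and its degrees of freedom. The sketch below assumes that the reported η² values are partial eta squared, which is what SPSS prints for repeated measures designs; the source does not state this explicitly, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics and degrees of freedom as reported in the Results.
vowel_space_eta = partial_eta_squared(4.07, 2, 18)
valence_eta = partial_eta_squared(131.51, 2, 60)

print(round(vowel_space_eta, 2), round(valence_eta, 2))  # 0.31 0.81
```

Under that assumption, both recovered values match the effect sizes reported for vowel space (.31) and positive valence (.81).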
To describe the acoustic characteristics of individual vowel tokens (i.e., vowel duration and pitch; see Table 1), mixed models were fit using lmerTest in R (Kuznetsova et al., 2017), with listener (ADS as the baseline, IDS, PDS) in interaction with target word (ball as the baseline, bead, boot, box) as fixed effects, mother as random intercept, and random slopes by mother for listener (i.e., the formula was ~ listener * target_word + (1 + listener | mother)). Vowel duration was significantly longer in IDS than ADS (β = 0.072, SE = 0.031, p = .03), but not so in PDS than ADS (β = 0.029, SE = 0.026, p = .27), with no other main effects or interactions. Vowel pitch was not significantly higher in IDS than ADS (β = 26.300, SE = 26.726, p = .33), but it was higher in PDS than ADS (β = 46.251, SE = 22.560, p = .05).
Discussion
Consistent with previous studies, the current findings corroborate that English-speaking mothers exaggerated certain vocal attributes of their speech when speaking to their six-month-olds compared to an adult (i.e., hyperarticulation of closed vowels, more positive valence, increased vowel duration). Mothers also made some vocal adjustments when speaking to a young puppy compared to speaking to an adult (i.e., hyperarticulation of closed vowels, more positive valence, higher average pitch), although the overall mean vowel space of PDS was not significantly larger than that of ADS (p = .072). The lack of hyperarticulation on the open vowels (i.e., /a/ and /ɑ/) is similar to that in two separate studies analyzing mothers’ IDS in a story-book reading context. That is, mothers exaggerated their formant values (F1, F2) in the vowels /i/ and /u/, but not /a/ (E. Burnham et al., 2015). Kuhl et al. (1997) also found that Russian-speaking mothers did not shift their F2 values on the /a/ vowel when speaking to their infant. In contrast, Weirich and Simpson (2019) found hyperarticulation of /a/ in German mothers’ IDS compared to ADS (notably, this was “read” speech compared to natural interactive speech). When looking at the effect sizes of the total vowel space differences in the current study (these are reported as Cohen’s d), the IDS>ADS was d=1.13 whereas the PDS>ADS was d=1.01. When comparing effect sizes of only F2 differences for the /i/ vowel, the PDS>ADS was larger than the IDS>ADS (ds were .80 v. .48, respectively). This was also the case for the /u/ vowel, with PDS>ADS having a larger effect size than IDS>ADS (ds were -2.16 v. -1.74, respectively). Thus, we argue that, given the small sample size, the effect sizes lend credibility to our conclusion that mothers did hyperarticulate at least two of the three point vowels when speaking to both infants and puppies.
Importantly, analyses of lip movements have shown that mothers increase the size of their mouth openings during IDS compared to ADS when producing the high vowels /i/ and /u/ and the low vowels /a/ and /ɑ/ (Green et al., 2010). Interestingly, these enlarged mouth movements resulted in significant differences in F0 and F1 in all the IDS vowels, but significant F2 differences in only the IDS low vowels (/a/ and /ɑ/). In the current study, we found no significant differences across listener type in F1 and F2 for the low vowels, but significant changes in F2 for both high vowels (i.e., /i/ and /u/) in IDS and PDS compared to ADS. The lack of a significant difference in formant values for the low vowels in the current study could be due to the change from “box” to “ball”, which reduced their individual sample sizes. The fact that we did find significant F2 changes for the high vowels suggests that mothers adjusted their articulatory movements more easily when discussing the objects “bead” and “boot” with the infants and the puppies. Although it is possible that the increase in F2 for /i/ resulted from more smiling in IDS and PDS, this is unlikely for the F2 change in /u/. However, we have no independent measure of smiling while mothers talked about any of the objects to any of the three listener types. Future studies will benefit from video as well as audio recordings of the participants, and from comparing casual conversation with “teaching” language (e.g., asking mothers to specifically teach the listener about the object being named; see Gergely et al., 2017).
The positive valence ratings of the IDS and PDS in the current study were not significantly different from each other, but both were significantly higher than those of ADS. This is in contrast to the valence ratings found in Burnham et al. (2002), in which pet-directed speech was significantly lower in positive valence than IDS. Thus, even though dogs can be considered low in linguistic potential (see Xu et al., 2015), they can elicit hyperarticulation from a mother by simple virtue of their infant status (i.e., they are puppies). Ben-Aderet et al. (2017) refer to this as the “baby schema”, suggesting that humans should restrict the use of pet-directed speech to young puppies. Although these authors found that pet-directed speech was produced to pictures of dogs of all ages, only puppies actually responded to this speech in a playback condition.
Our results demonstrate that hyperarticulation in maternal speech can result from increased positive valence. This does not necessarily discount the hyperarticulation hypothesis; it remains possible that the mothers hyperarticulated their IDS primarily through vowel lengthening (in the service of being more clear) but the PDS primarily through raising F0 (in the service of being more emotionally positive). A future study could disentangle these possibilities by recording mothers talking to their infants, to a puppy and to an adult in two different conditions: (1) intentionally teaching the labels of the objects, and (2) intentionally capturing and holding attention of the listener.
Hyperarticulation in infant-directed speech is affected by a host of variables, including but not limited to, perceived linguistic potential, perceptual acuity, speaker gender, and emotion expression (Bradlow et al., 1996; Gergely et al., 2017). Mother-infant exchanges may afford hyperarticulation as the result of vocal emotion in addition to a linguistic strategy. In this view, it would be expected that mothers with significantly dampened emotional expressiveness (e.g., clinical depression; Lam-Cassettari & Kohlhoff, 2020) would show low levels of hyperarticulation to infants, even as the infant ages if mother’s depression is chronic. To our knowledge, no studies have examined the degree to which depressed mothers hyperarticulate during infant exchanges, but it has been documented that positive effects of IDS on associative learning are reduced in infants whose mothers are depressed (Kaplan et al., 2015). Another possible factor in promoting hyperarticulation may be based on the amount and type of feedback received from the listener. That is, mothers with infants who clearly respond positively to their vocalizations may automatically enhance vocal clarity. In support of this notion, Lam and Kitamura (2012) found that mothers did not hyperarticulate point vowels when their infants were not able to actually hear mothers’ voices. These authors also found hyperarticulation in one mother’s speech to her normal-hearing twin, but not to her hearing-impaired twin (Lam & Kitamura, 2010).
Future research should look for changes in the variables promoting hyperarticulation as infant competencies and mother intentions shift. If hyperarticulation is primarily influenced by maternal emotional expressiveness, hyperarticulation may be more frequent in speech to younger as opposed to older infants as the emotional intent of IDS decreases as infants age (Kitamura & Burnham, 2003). Likewise, if hyperarticulation is primarily influenced by didactic intent, hyperarticulation in older infants and toddlers (compared to infants younger than 6 months) may be more common as infants become immersed in phoneme awareness and word learning (however, see Kalashnikova & Burnham, 2018, who found consistent levels of hyperarticulation in IDS to infants seven to nineteen months of age).
Although our results bear on what mothers do, it is relevant to consider what implications these results may have for theories of language acquisition. The hypothesis that parents hyperarticulate with a didactic intent went hand in hand with the belief that parents’ changes in vowel pronunciation were beneficial for learning. This possibility has been studied using computational research, which can control for potential benefits infants derive from other aspects of infant-directed speech, since machines can be made to focus purely on learning sounds or words. This research has surprisingly shown that infant-directed speech does not entail net benefits for learning vowels (Ludusan et al., 2021) or words (Guevara-Rukoz et al., 2018). Given these computational results on learnability, it is possible that beneficial effects of hyperarticulation in IDS are actually due to cognitive and emotional effects of increased positive affect (which was manipulated here), or other factors that co-determine vowel space.
In sum, we argue that hyperarticulation in IDS is co-determined by several factors at any one time, whose role may vary over infant characteristics and contexts (Kalashnikova & Burnham, 2018). Computational modeling studies could help support or question causal links between increased positive valence and learning, as they did for hyperarticulation and vowel learning. This type of convergent methods approach will promote understanding the complexity of hyperarticulation and other important cues in IDS for infant and toddler language acquisition.
Acknowledgements
This publication is supported by the J. S. McDonnell Foundation Understanding Human Cognition Scholar Award to AC. The content is solely the responsibility of the authors and does not necessarily represent the official views of the J.S.MF. We thank HoJin Kim and Maria Diehl for their assistance with data collection and vowel extraction. We thank the mothers and their infants for participating in this study and the Montgomery County Animal Care and Adoption Center for allowing us to employ two puppies from their facility.
Competing interest
The authors declare none.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000923000296.