
Prosodic variation between contexts in infant-directed speech

Published online by Cambridge University Press:  13 January 2025

Jenna DiStefano*
Affiliation:
Center for Mind and Brain, University of California, Davis, USA Department of Psychology, University of California, Davis, USA
Michelle Cohn
Affiliation:
Department of Linguistics, University of California, Davis, USA
Georgia Zellou
Affiliation:
Department of Linguistics, University of California, Davis, USA
Katharine Graf Estes
Affiliation:
Center for Mind and Brain, University of California, Davis, USA Department of Psychology, University of California, Davis, USA
*
Corresponding author: Jenna DiStefano; Email: [email protected]

Abstract

Speakers consider their listeners and adjust the way they communicate. One well-studied example is the register of infant-directed speech (IDS), which differs acoustically from speech directed to adults. However, little work has explored how parents adjust speech to infants across different contexts. This is important because infants and parents engage in many activities throughout each day. The current study tests whether the properties of IDS in English vary across three in-lab tasks (sorting objects, free play, and storytelling). We analysed acoustic features associated with prosody, including mean fundamental frequency (F0, perceived as pitch), F0 range, and word rate. We found that both parents’ pitch ranges and word rates varied depending on the task in IDS. The storytelling task stood out among the tasks for having a wider pitch range and faster word rate. The results depict how context can drive parents’ speech adjustments to infants.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Speakers commonly consider their listeners when talking and adjust the characteristics of their speech accordingly (Bell, Reference Bell1984; Clark & Murphy, Reference Clark and Murphy1982). The concept of audience design applies to many different types of listeners, such as those who are hard-of-hearing, those who speak a different language, AI devices, and infants and children (Cohn et al., Reference Cohn, Segedin and Zellou2022; Lam & Kitamura, Reference Lam and Kitamura2012; Uther et al., Reference Uther, Knoll and Burnham2007). The register that adults use when interacting with infants is known as infant-directed speech (IDS). It is often characterised by heightened pitch (raised fundamental frequency, F0), slower rate, vowel space expansion/hyperarticulation, and longer vowel duration (Cooper & Aslin, Reference Cooper and Aslin1990; Cristia & Seidl, Reference Cristia and Seidl2014; Kuhl et al., Reference Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina, Stolyarova, Sundberg and Lacerda1997; but see Englund, Reference Englund2018). These features of IDS are well documented, but an open question is to what extent they vary across communicative contexts for a parent and child.

Like adults, infants do not experience language in a singular context. In day-to-day life, infants experience a variety of settings that alter the sounds, references, grammatical constructions, and lexical co-occurrences of a word. For example, “car” during story time can involve a narrative about an anthropomorphised car trying to win a race, whereas “car” on a road trip, viewed among trucks, vans, and sport utility vehicles (SUVs), can present opportunities to learn about categories (i.e., vehicles). However, the majority of studies examining IDS features have focused on a single type of interaction: spontaneous speech during play. A smaller number of studies have compared multiple contexts, usually two, such as read versus spontaneous speech (Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022). Thus, we have little knowledge of the ways that parents adjust acoustic features of IDS when communicating across a range of different contexts with different goals. Indeed, parents talk to their infants in many different settings with various communicative goals throughout their everyday lives, yet most studies do not take this into account. The current study addresses whether parents display acoustic variation across different contexts when talking with their infants. Studying acoustic characteristics of IDS across multiple contexts tests the degree to which audience design is sensitive to the specific communicative needs of the situation and not just defined by the type of interlocutor.

2. Acoustic features of IDS

Heightened fundamental frequency (F0), perceived as pitch, is one of the most salient features of IDS. Many studies have shown that mean pitch is typically raised in IDS relative to adult-directed speech (ADS) (Cooper & Aslin, Reference Cooper and Aslin1990; Kuhl et al., Reference Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina, Stolyarova, Sundberg and Lacerda1997; Narayan & McDermott, Reference Narayan and McDermott2016; Trainor & Desjardins, Reference Trainor and Desjardins2002), a finding that has been observed across a variety of languages and cultures (Broesch & Bryant, Reference Broesch and Bryant2015; Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022; Hilton et al., Reference Hilton, Moser, Bertolo, Lee-Rubin, Amir, Bainbridge and Mehr2022). In addition, pitch range is shown to be wider in IDS relative to ADS (Broesch & Bryant, Reference Broesch and Bryant2015; Fernald & Simon, Reference Fernald and Simon1984; Xu Rattanasone et al., Reference Xu Rattanasone, Burnham and Reilly2013). Increased mean pitch and pitch range in IDS are thought to serve a variety of functions such as directing infants’ attention (Nencheva & Lew-Williams, Reference Nencheva and Lew-Williams2022), highlighting important aspects of the language (Graf Estes & Hurley, Reference Graf Estes and Hurley2013; Werker et al., Reference Werker, Pons, Dietrich, Kajikawa, Fais and Amano2007) and communicating emotion (Trainor et al., Reference Trainor, Austin and Desjardins2000).

Another well-studied acoustic characteristic of IDS is a slower speaking rate (Cooper & Aslin, Reference Cooper and Aslin1990; Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022; Cristià, Reference Cristià2010; Fernald & Simon, Reference Fernald and Simon1984; Martin et al., Reference Martin, Igarashi, Jincho and Mazuka2016). Slower speech rate in IDS has been observed across languages and cultures (Broesch & Bryant, Reference Broesch and Bryant2015). Part of the slower rate is segmental lengthening; for example, vowel duration, or how long speakers produce vowels, is longer relative to ADS (Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022; Cristia & Seidl, Reference Cristia and Seidl2014; Hartman et al., Reference Hartman, Ratner and Newman2017; but see Martin et al., Reference Martin, Igarashi, Jincho and Mazuka2016). Slower utterances (both at the word and vowel level) appear to be easier for young listeners to process because they occur on a longer timescale, giving listeners more time to parse linguistic information. For example, Zangl et al. (Reference Zangl, Klarman, Thal, Fernald and Bates2005) found that infants better recognised words when presented in a slower rate of IDS, compared to those in a more challenging acoustic register (faster speech rate or low pass filtered). Other work has shown that children of parents who produce longer vowel durations and more expanded vowel spaces (i.e., referring to the size of the acoustic distinctions between vowels based on properties of the first and second formants) tend to perform better on speech discrimination tasks (Hartman et al., Reference Hartman, Ratner and Newman2017), suggesting that parents’ IDS adaptations might affect infant learning.

The work reviewed so far has demonstrated that there are distinct speech adjustments that parents make when interacting with infants. However, this literature has analysed IDS as a homogeneous speaking style, not accounting for ways that parents may adjust IDS in different contexts. The following sections will address whether parents have systematic patterns of speech variation within IDS, adapting their speech to infants to accommodate different communicative and/or social goals across contexts.

3. Proposed motivations of IDS

While several features of IDS, such as elevated pitch and slower speech rate, appear consistently across many studies and languages, there are differing views about what drives these adaptations. Some proposals argue that these adjustments serve to direct attention (Liu et al., Reference Liu, Kuhl and Tsao2003; Nencheva & Lew-Williams, Reference Nencheva and Lew-Williams2022; Räsänen et al., Reference Räsänen, Kakouros and Soderstrom2018), support language or cognitive development (Graf Estes & Hurley, Reference Graf Estes and Hurley2013; Hartman et al., Reference Hartman, Ratner and Newman2017; Kalashnikova & Burnham, Reference Kalashnikova and Burnham2018; Liu et al., Reference Liu, Kuhl and Tsao2003; Song et al., Reference Song, Demuth and Morgan2010), and express positive affect (Burnham et al., Reference Burnham, Kitamura and Vollmer-Conna2002; Kitamura & Burnham, Reference Kitamura and Burnham2003; Trainor et al., Reference Trainor, Austin and Desjardins2000; Uther et al., Reference Uther, Knoll and Burnham2007; Werker & McLeod, Reference Werker and McLeod1989). There is some work suggesting that parents’ intentions can be better perceived in IDS. Fernald et al. (Reference Fernald, Taeschner, Dunn, Papousek, De Boysson-Bardies and Fukui1989) found that adults were better at characterising the intents of phrases (e.g., getting attention, game-playing, etc.) produced in IDS compared with ADS, suggesting that acoustic signatures of IDS are flexible to context.

Among these proposed motivations of IDS, directing attention has been one of the most widely discussed (Liu et al., Reference Liu, Kuhl and Tsao2003; Nencheva & Lew-Williams, Reference Nencheva and Lew-Williams2022; Räsänen et al., Reference Räsänen, Kakouros and Soderstrom2018). The notion is that the prosodic fluctuations of IDS and the emphasis on novel words engage and sustain attention. Nencheva and Lew-Williams (Reference Nencheva and Lew-Williams2022) proposed that parents may produce new words with a higher pitch than those that are familiar, which would then direct the infant’s attention towards the novel word and highlight its properties. The neuroscience literature identifies a process called entrainment, where neural activity time-locks to sensory input, which is important for processing and attention (Jones et al., Reference Jones, Kidd and Wetzel1981). In IDS, the auditory stimuli infants receive may be formulated specifically for infants, which could enhance their entrainment. Specifically, the slower rhythm of IDS may be ideal for the neural rhythms present in the infant brain (Payne et al., Reference Payne, Post, Astruc, Prieto and Vanrell2015). The attention-getting features of IDS may contribute to language development by promoting attention to the speech signal and directing infants towards the important features of language, such as new words and phrase boundaries (Nelson et al., Reference Nelson, Hirsh-Pasek, Jusczyk and Cassidy1989; Nencheva & Lew-Williams, Reference Nencheva and Lew-Williams2022).

Infants’ preference for IDS may be linked to its greater positive affect as well as its comforting nature, compared to other speech registers such as ADS and foreigner-directed speech (Burnham et al., Reference Burnham, Kitamura and Vollmer-Conna2002; Kitamura & Burnham, Reference Kitamura and Burnham2003; Uther et al., Reference Uther, Knoll and Burnham2007; Werker & McLeod, Reference Werker and McLeod1989). Prosodic features of IDS such as heightened pitch, pitch variation, expanded vowel space, and slower speech rate are also thought to contribute to the greater positive affect of IDS (Benders, Reference Benders2013; Burnham et al., Reference Burnham, Kitamura and Vollmer-Conna2002). While positive affect is not the main concern of the current study, these findings show that speakers are aware of the emotional and more general needs of their addressee and adjust their speech register accordingly.

Specific prosodic properties of IDS, such as enhanced pitch range and slower speech rate, have been connected to emerging language skills in infants. Raneri et al. (Reference Raneri, Von Holzen, Newman and Ratner2020) found that parents who used slower speech rates to infants at 7 months had children with larger vocabularies at 2 years of age. Additionally, mothers’ enhancement of vowel duration in English correlated with infants’ expressive vocabulary, word recognition skills, and general language abilities (Hartman et al., Reference Hartman, Ratner and Newman2017; Kalashnikova & Burnham, Reference Kalashnikova and Burnham2018; Liu et al., Reference Liu, Kuhl and Tsao2003; Song et al., Reference Song, Demuth and Morgan2010). There is some experimental evidence that IDS prosody can affect language learning. Graf Estes and Hurley (Reference Graf Estes and Hurley2013) tested the idea that IDS could facilitate word learning by presenting infants with an object labelling task in ADS or IDS. They found that infants did not learn the object labels when presented in ADS prosody but learned those same labels when presented in IDS prosody, suggesting that IDS affects the way that infants connect sounds to meaning. Thiessen et al. (Reference Thiessen, Hill and Saffran2005) also found that infants were better able to segment words from fluent speech in IDS versus ADS. They proposed that IDS may sustain infants’ attention better than ADS, making it easier for them to extract linguistic information from the speech stream (Thiessen et al., Reference Thiessen, Hill and Saffran2005).

4. The contexts of IDS

IDS may serve a variety of goals, such as eliciting and maintaining attention, supporting learning, or promoting positive affect. Nevertheless, it is possible that these goals do not appear within the same context or activity. When adults interact with one another, they make alterations in the way they talk and change the content of their language depending on the listener and the purpose of the interaction (Giles, Reference Giles1973; Gumperz, Reference Gumperz1977). One well-studied way that parents adapt IDS is in response to their infant’s age and language proficiency (Julien & Munson, Reference Julien and Munson2012; Ko, Reference Ko2012). In one study by Julien and Munson (Reference Julien and Munson2012), adults rated the accuracy of 2- and 3-year-old children’s production of fricatives and then were asked to speak as if they were responding to that child. When adults rated a child’s productions as inaccurate, the adults produced longer fricatives, showing that adults’ speech production is responsive to children’s language proficiency. There is also evidence that as children get older and develop more advanced language skills, parents increase the complexity of the language they provide, such as producing longer utterances with greater lexical diversity to older children than to younger children (Huttenlocher et al., Reference Huttenlocher, Waterfall, Vasilyeva, Vevea and Hedges2010; Rowe, Reference Rowe2012).

It is possible that parents also employ different features of IDS based on the needs of the setting at hand with their infant, recognising that different contexts have unique goals. For instance, during reading time, the infant hears words and a story, potentially linking what they hear with the images they see. Meanwhile, during playtime, the infant is an explorer, viewing and manipulating objects in coordination with their parents. These interaction patterns are consistent with Bell’s (Reference Bell1984) claim that the addressee’s role is not passive; their responsiveness and goals actively shape the way speakers communicate. Furthermore, a study by Spence and Moore (Reference Spence and Moore2003) showed that 6-month-old English-learning infants can categorise different types of utterances in IDS, specifically approval and comfort utterance types. This is evidence not only that IDS varies based on the context but also that infants are able to discriminate between these sub-styles based on the acoustic properties of IDS.

Parents may adjust salient prosodic features of IDS, like pitch and rate, depending on the type of communicative context. There is some evidence that acoustic features of IDS, specifically mean pitch, vary depending on the experimental task. For example, early work found context-related differences in both the acoustic features of IDS and utterance-level characteristics such as utterance length and lexical diversity (Rondal, Reference Rondal1980; Stern et al., Reference Stern, Spieker, Barnett and MacKain1983). Additionally, a recent meta-analysis found that tasks consisting of spontaneous speech produced higher pitch compared to read speech across a wide range of infant ages and languages (Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022). However, this meta-analysis did not break down the analyses into more specific tasks; thus, it is difficult to draw conclusions based on just two broad categories (i.e., read speech and spontaneous speech).

Previous work has demonstrated that the specific type of shared activity shapes features of parents’ IDS (Gergely et al., Reference Gergely, Faragó, Galambos and Topál2017) and gesture use with their infant (Puccini et al., Reference Puccini, Hassemer, Salomo and Liszkowski2010). Puccini and colleagues found that parents used more pointing gestures in a room exploration task compared to free play. Gergely et al. (Reference Gergely, Faragó, Galambos and Topál2017) found acoustic differences in IDS between three different spontaneous-speech situations by Hungarian speakers and their 30-month-olds: teaching, book reading, and problem-solving. In the teaching task, speakers explained how to use a phone application. In the book reading task, speakers read a book to the infant. Finally, in the problem-solving task, parents were instructed to encourage the infant to complete a task based on their age (e.g., grabbing, displacing, and ordering objects). They found that parents produced the highest mean pitch in the problem-solving task, followed by book reading, with teaching showing the lowest mean pitch of the three tasks (Gergely et al., Reference Gergely, Faragó, Galambos and Topál2017). The authors attributed these context-related changes to parents’ awareness of the attentional state of the listener (Gergely et al., Reference Gergely, Faragó, Galambos and Topál2017). While this study provides evidence that acoustic features of IDS can change based on the context, the speech samples were analysed across all words in all utterances, and there was variation in the linguistic content across tasks.

The current study focuses on specific target words in English that are repeated across tasks, allowing for a more direct comparison of how the production of the same words can vary based on the goals of different contexts. Our analysis of specific target words allows us to attribute acoustic differences to the varying contexts, rather than to the specific phrases or words used. We tested three prosodic features of IDS at the word level (average pitch, pitch range, and rate). While increased pitch range is a common feature of IDS, related work has shown variation across individuals in whether it might increase or decrease across IDS contexts, such as conversational versus read speech (Shute & Wheldall, Reference Shute and Wheldall1999). As explained in the following section, we predict that parents’ adjustments of IDS might vary depending on the context of the interactions with their infants.

5. Current study

The current study tests how the prosodic characteristics of IDS in English are tuned across different activities. Specifically, the current study investigates how parents adjust their speech to infants when producing the same word across different contexts. We analysed prosodic features of IDS (mean pitch, pitch range, and word rate) in the same set of target words across three different tasks in IDS: one where parents played with their child as they would at home (we call this the “free play task”), a task where parents and children worked together to sort toys into three different category bins (“sorting task”), and one where parents told a story narrating a wordless picture book to their child (“storytelling task”). Here, we use word rate to refer to word-level speaking rate (i.e., number of syllables per second). We analysed characteristics of pitch and rate based on previous evidence establishing the widespread modifications of these dimensions in IDS compared to ADS (Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022; Kitamura et al., Reference Kitamura, Thanavishuth, Burnham and Luksaneeyanawin2001; Narayan & McDermott, Reference Narayan and McDermott2016).

During the IDS tasks, parents were instructed to focus on a specific set of target words while completing the activities with their infant. For the free play task, parents were given toys that represented the target words. They were asked to label the objects but to otherwise play with their infant as they would at home. In the sorting task, parents and infants had to work together to sort those same toys into the correct bins. One bin was labelled “living things,” one was “food,” and one was “objects.” Lastly, during the storytelling task, parents were given a wordless picture book that contained images of target items in scenes. They were prompted to make up a story aligning with the images in the book and interact with their infant as they would at home. These IDS tasks were designed so that parents had many opportunities to use the target words. Additionally, these tasks mimic activities that parents and their infants engage in during their day-to-day lives.

In this study, we predict that the type of activity will structure parents’ IDS prosody based on the goals of each distinct task. The goal of the storytelling task is to engage infants’ attention with the images and events in a book, which may result in exaggerated IDS features for target words, such as a slower rate, higher pitch, and wider pitch variation. A unique aspect of the storytelling task is that it is the only context where parents and their infants do not have physical referents for the target words in the form of a toy. The absence of physical referents may create more ambiguity in infants’ understanding of the words. Parents’ attention to a referent may be very clear when they can gesture and interact with a physical object (Trueswell et al., Reference Trueswell, Lin, Armstrong, Cartmill, Goldin-Meadow and Gleitman2016). Meanwhile, infants may have more difficulty tracking their parents’ attention in contexts where the referent is more abstract – such as in our storytelling context, where the only representations of the target words are images on a page. As a result of the potential ambiguity, parents may exaggerate IDS features to engage their infant. Relatedly, storytelling is more expressive compared to other spontaneous-speech contexts (Montaño & Alías, Reference Montaño and Alías2017). Within stories, there are many instances where speakers are conveying emotions related to the characters within the narrative. This inherently more expressive nature of storytelling may cause speakers in these contexts to produce more exaggerated speech reflected in pitch and rate, compared to other contexts (Cowie et al., Reference Cowie, Douglas-Cowie and Wichmann2002; Kuhn et al., Reference Kuhn, Schwanenflugel and Meisinger2010; Veenendaal et al., Reference Veenendaal, Groen and Verhoeven2014; Wolters et al., Reference Wolters, Kim and Szura2020). We predict that these unique characteristics of the storytelling task will drive prosodic variation.

The goal of the sorting task is to encourage infants to help perform a series of actions with the toys (picking up objects and placing them in buckets), which may lead parents to produce intermediate acoustic features of IDS because it involves encouraging infants’ involvement and compliance with a physical object. Gergely et al. (Reference Gergely, Faragó, Galambos and Topál2017) found that parents produced the highest pitch in the problem-solving task. Since our sorting task most closely mimics the problem-solving task of Gergely et al. (Reference Gergely, Faragó, Galambos and Topál2017), we predict that parents will have the highest mean pitch during the sorting task compared to the other two tasks because of the problem-solving nature of the interaction.

The goal of the free play task was to provide a naturalistic setting for word learning; parents were told to label each of the 14 objects but otherwise play with their infant as they would at home. Beech and Swingley (Reference Beech and Swingley2024) found that parents spoke with the greatest phonetic clarity when they were first referring to an object and the referents in that situation were also clear. They describe this as creating “conversational gems” where infants have the most informative opportunity for word learning. We predict a similar pattern with our free play task because it is the most likely time when parents will focus on word forms and referents. In particular, we predict that during this task parents will produce both more exaggerated pitch and a slower word rate than in the sorting and storytelling tasks. Because the free play task provides an opportunity for word learning, we predict that these prosodic adjustments in pitch and word rate will become less exaggerated over additional mentions of each word. It is important to note that the “vignettes” used by Beech and Swingley (Reference Beech and Swingley2024) were taken from a corpus of at-home parent–child interactions that occurred in a variety of contexts (Cartmill et al., Reference Cartmill, Armstrong, Gleitman, Goldin-Meadow, Medina and Trueswell2013). Our study was conducted in the lab with clear distinctions between experimental tasks; therefore, our task-based predictions differ somewhat from the prior work that shaped them.

Together, the storytelling, sorting, and free play tasks have distinct requirements; accordingly, we predict that the degree of IDS prosodic modifications (pitch and word rate) will reflect differences between these tasks and ways that parents support the linguistic needs of their infants. While we predict differences in mean pitch, pitch range, and speaking rate between tasks, certain features may be more context-dependent than others (Stern et al., Reference Stern, Spieker, Barnett and MacKain1983).

We designed the ADS tasks to collect repetitions of the same target words during adult interactions. In the ADS storytelling task, similar to the IDS storytelling task, the parent created a narrative containing target words and told it to an adult experimenter. In the ADS object description task, parents received the same target objects as in the IDS free play task. They were instructed to label each item and describe it in a few sentences to the experimenter. Lastly, parents completed the map task, a spatial matching game, in which the parent described the locations of images of the target objects to an experimenter. The tasks for both IDS and ADS were designed to give parents multiple opportunities to incorporate the target words in their speech in different contexts. We limited our analyses to the target words in order to examine how prosodic characteristics of a consistent set of words change across contexts.

6. Methods

6.1. Participants

The participants were 42 parents (36 females, 6 males), all native speakers of American English in California, and their infants. There were 25 10- to 12-month-olds and 17 18- to 20-month-olds (22 females, 20 males) (mean age: 14.32 months). Fourteen additional participants were excluded from the analysis due to missing audio data/recording issues (n = 11), infant fussiness (n = 2), and one case of a language screening issue.

6.2. Stimuli

Target words consisted of 14 items: apple, baby, ball, boat, bottle, camel, car, carrot, cat, cheetah, giraffe, hat, lettuce, and otter. Physical toys representing each target word were provided in a toy box (see Figure 1a and b). Some of the toys in IDS and ADS were different in order to make the toys in the ADS tasks less baby-like and more relatable for adults so the interaction would be more natural (see Figure 1b). For the storytelling task in IDS and ADS, we created two wordless picture books (see Figure 1c). Each picture book depicted a different event (going to school or going on vacation) across 20 pages. Across the pages, there were a series of actions or scenes connected with the overall theme of the story. The narrative of the event was not written on the pages, but the target items were labelled individually next to each item. Each target word appeared 3 times in each book.

Figure 1. Toys and books used in the IDS and ADS conditions. (a) Toys used during the IDS tasks representing the 14 target words. (b) Toys used during the ADS tasks representing the 14 target words. (c) Images of the picture books used in both the IDS and ADS conditions.

6.3. Procedures

Each parent completed the IDS register condition first, followed by the ADS condition (blocked). All parent–infant dyads participated in the IDS session first because pilot testing demonstrated that, when the ADS session came first, some infants became upset or fatigued while separated from the parent and could not then participate in the IDS tasks (see Footnote 1).

Within each register (IDS, ADS), the task orders were randomised. Parents completed three tasks with their infant (free play, storytelling, and sorting) to collect IDS samples, and three tasks with an experimenter (map, storytelling, and object description) to collect ADS samples. Each of the tasks lasted ~6 min. In each register (IDS, ADS) and each task, parents used the same set of target words, which were represented as physical toys, as images in books, or as isolated pictures. Figure 2 depicts each of the tasks. All sessions were video and audio recorded. For acoustic analyses, we used the audio recordings taken from a lavalier microphone clipped to the parent’s shirt and connected to a portable audio recorder.

Figure 2. Depiction of the tasks in IDS and ADS conditions.

6.3.1. IDS tasks

Free play. Parents were instructed to take each of the toys out of a box, label them, and then play with the toys as they would at home with their infant. After 4 min, a timer sounded, indicating that the parent and infant had 2 min to finish playing and put the toys back in the box.

Sorting. In this task, parents and infants had to work together to sort the toys into three different buckets with distinct labels: living things, objects, and food.

Storytelling. Parents were given one of two picture books, either “Daniel’s School Adventures” or “Baby’s First Vacation” (Figure 1c); the assignment of books to the IDS and ADS conditions was counterbalanced across participants. They had to make up a story as they went through the book, with the only instruction being that they incorporate the labelled items (i.e., the 14 target words) within the story. This task lasted 6 min.

6.3.2. ADS tasks

To collect ADS samples, the parent interacted with an experimenter and was instructed to talk as they normally would with any other adult. All experimenters were female native speakers of American English from California.

Object description. For this task, parents were asked to take each toy out of the box, say the name of the object, and then use the name in a few sentences, for example, describing the item’s physical appearance, what it does, or where it lives.

Map. Both the experimenter and parent were seated on either side of a privacy shield. Parents were given a sheet of paper that had images of target objects on it. The experimenter had a blank sheet of paper. The parent was tasked with describing the location of the objects to the experimenter so they could map the objects onto their blank sheet of paper. Parents were encouraged to talk about each object in as much detail as possible.

Storytelling. Parents received one of the picture books (the book not used in the IDS sample). They were asked to make up a story as they went through the book while incorporating the picture labels.

6.4. Acoustic analysis

Audio recordings were first separated by task using Praat (Boersma & Weenink, Reference Boersma and Weenink2021) and then transcribed using an online, automatic transcription tool (https://sonix.ai/). These transcriptions were then converted to TextGrids using the phonfieldwork R package (Moroz, Reference Moroz2020). Trained research assistants listened to each of the audio files to ensure that they were of high quality. They annotated any noise (infant crying, mispronunciations, loud gasps, laughing) while also correcting the transcriptions if there were any errors. Errors and noise were omitted from the final acoustic analysis. Following this process, audio files were converted to a mono channel and a sampling rate of 16,000 Hz. These files were then force-aligned using the Montreal Forced Aligner (MFA) (McAuliffe et al., Reference McAuliffe, Socolof, Mihuc, Wagner and Sonderegger2017). Word-level measurements (rate, mean word F0, F0 range over word) were extracted using Praat. F0 measurements were made with the autocorrelation algorithm in Praat using an adapted script (DiCanio, Reference DiCanio2007), which took 10 equidistant F0 measurements within each word to generate a contour, with plausible F0 maxima and minima set by speaker gender (female range: 150–350 Hz; male range: 78–350 Hz) (Cohn et al., Reference Cohn, Segedin and Zellou2022). We calculated mean F0 and F0 range from this contour to provide measurements that are more robust to artefacts (Cohn et al., Reference Cohn, Segedin and Zellou2022; Cohn et al., Reference Cohn, Mengesha, Lahav and Heldreth2024). Pitch values were converted to semitones (relative to 75 Hz) with the hqmisc R package (Quené, Reference Quené2022). Word rate was calculated by dividing the number of syllables by the duration of each word, yielding syllables per second (De Jong & Wempe, Reference De Jong and Wempe2009).
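To make these word-level measures concrete, the following R sketch illustrates the conversions described above; it is not the authors' extraction script. It assumes a hypothetical data frame, words, holding values already extracted from the Praat contours (columns f0_mean_hz, f0_min_hz, f0_max_hz, n_syllables, and duration_s). The paper used the hqmisc package for the semitone conversion; the standard formula is written out here instead.

# Convert Hz to semitones relative to a 75 Hz reference: 12 * log2(f / ref).
hz_to_semitones <- function(f_hz, ref_hz = 75) {
  12 * log2(f_hz / ref_hz)
}

# Mean pitch in semitones, and pitch range as the difference between the
# converted contour maximum and minimum (one way to compute the range).
words$f0_mean_st  <- hz_to_semitones(words$f0_mean_hz)
words$f0_range_st <- hz_to_semitones(words$f0_max_hz) - hz_to_semitones(words$f0_min_hz)

# Word rate: number of syllables divided by word duration (syllables per second).
words$word_rate <- words$n_syllables / words$duration_s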

6.5. Statistical analysis

We analysed the data using linear mixed-effects models fit with the lmer function of the lme4 package (Bates et al., Reference Bates, Mächler, Bolker and Walker2015) in R. We analysed speech from 42 parents across 12,151 target word productions. The mean number of occurrences for each of the 14 target words analysed is shown in Table 1. We modelled each acoustic property of interest (word rate, mean pitch, pitch range) in a separate model. In each model, we first attempted to fit a complex random effects structure (with by-subject random slopes for word occurrence, task, etc.) to account for inter-subject variability (Barr et al., Reference Barr, Levy, Scheepers and Tily2013). In the event of a convergence or singularity error, indicating that the model structure was not supported by the data, we simplified the random effects structure using a systematic approach (adapted from Barr et al., Reference Barr, Levy, Scheepers and Tily2013 and Cohn et al., Reference Cohn, Segedin and Zellou2022; e.g., removing random effects that account for 0 variance). In each model, we also tested for collinearity between the predictors with the performance R package (Lüdecke et al., Reference Lüdecke, Ben-Shachar, Patil, Waggoner and Makowski2021). The retained structure for each model is provided in the sections below.
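As an illustration of this fitting strategy, the R sketch below (using the lme4 and performance packages cited above) fits a fuller random-effects structure and falls back to a simpler one when the fit is singular; checking convergence warnings is omitted for brevity. The data frame ids_ads and its column names are hypothetical stand-ins, and the retained specifications actually used are given in Footnotes 2 and 3.

library(lme4)
library(performance)

# Attempt a more complex random-effects structure first.
m_full <- lmer(f0_mean_st ~ condition + word_occurrence_c +
                 (1 + word_occurrence_c | subject) + (1 | word),
               data = ids_ads)

# If the fit is singular (e.g., a random effect accounts for ~0 variance),
# refit with a simplified random-effects structure.
if (isSingular(m_full)) {
  m_retained <- lmer(f0_mean_st ~ condition + word_occurrence_c +
                       (1 | subject) + (1 | word),
                     data = ids_ads)
} else {
  m_retained <- m_full
}

# Check collinearity among the predictors (the models reported below all had VIF < 5).
check_collinearity(m_retained)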

Table 1. Descriptive statistics (mean and SD) for the number of repetitions of the target words in each task in IDS and ADS

6.6. Comparing IDS and ADS

In one set of models, we compared acoustic features of IDS and ADS, collapsing across tasks. Each model included fixed effects of condition (IDS, ADS; reference level = ADS) and previous word occurrence count (centered) for each subject across the experiment, by-speaker and by-word random intercepts, and by-speaker random slopes for word occurrence. The purpose of these analyses was to confirm that our dataset replicated previously well-established findings regarding pitch and rate in IDS.
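The sketch below shows how the condition model from Footnote 2 could be specified in R, including the centering of the cumulative word occurrence count; the data frame dat and the raw column word_occurrence_exp are illustrative assumptions, not the authors' code.

library(lme4)

# Set ADS as the reference level of the condition factor.
dat$condition <- relevel(factor(dat$condition), ref = "ADS")

# Center the running count of prior productions of each target word (computed per
# subject across the experiment); here, grand-mean centering of the precomputed count.
dat$word_occurrence_exp.c <- scale(dat$word_occurrence_exp, center = TRUE, scale = FALSE)[, 1]

# Retained model structure from Footnote 2, fit separately for each acoustic feature
# (mean F0 shown here).
m_cond <- lmer(f0_mean_st ~ condition + word_occurrence_exp.c +
                 (1 | subject) + (1 | word),
               data = dat)
summary(m_cond)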

6.7. Comparing IDS features across contexts

In another set of models, we examined word-level mean pitch, pitch range, and rate across tasks for the subset of IDS data (6,407 target word productions). Fixed effects included task (Free play, Sorting, Storytelling; reference level = Storytelling) and previous word occurrence count (centered) for each subject within the IDS condition. All three models also included by-speaker and by-word random intercepts and by-speaker random slopes for task and word occurrence. We tested for collinearity between the predictors with the performance R package (Lüdecke et al., Reference Lüdecke, Ben-Shachar, Patil, Waggoner and Makowski2021).
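The sketch below shows how one of these between-task models (Footnote 3) and the Bonferroni-adjusted pairwise comparisons reported in the Results could be implemented. The paper does not state which package was used for the post-hoc comparisons; emmeans is used here as one plausible choice, and ids is a hypothetical IDS-only subset of the word-level data.

library(lme4)
library(emmeans)

# Retained between-task model from Footnote 3, shown here for word-level F0 range.
m_task <- lmer(f0_range_st ~ task + word_occurrence_cond.c +
                 (1 + task | subject) + (1 | word),
               data = ids)

# Estimated marginal means per task and all pairwise task contrasts,
# with a Bonferroni adjustment (e.g., free play vs. sorting).
emm_task <- emmeans(m_task, ~ task)
pairs(emm_task, adjust = "bonferroni")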

7. Results

Results are presented for each acoustic feature, beginning with pitch (mean and range), followed by word rate. All models had low collinearity (VIF < 5), indicating that the inclusion of word occurrence was supported by the data.

7.1. IDS vs. ADS

The pitch model comparing IDS and ADS had a singularity error for the by-speaker random slopes for word occurrence. In the retained model (see Footnote 2; output shown in Table 2), we found that condition was a significant predictor of mean F0. As seen in Figure 3a, mean F0 was significantly higher in IDS than ADS, replicating previous findings (Fernald et al., Reference Fernald, Taeschner, Dunn, Papousek, De Boysson-Bardies and Fukui1989; Fernald & Simon, Reference Fernald and Simon1984). In addition, there was a significant effect of word occurrence, whereby mean pitch increased across additional repetitions of the target words.

Table 2. Fixed and random effects parameters of the between conditions (IDS vs ADS) models

Figure 3. Pitch and rate comparisons between conditions at the word level. (a) mean F0 in semitones between addressee conditions. (b) F0 range in semitones between addressee conditions. (c) word rate between addressee conditions. Error bars show the standard error of the mean.

The pitch range model had a singularity error. The retained model (output shown in Table 2) did not reveal a significant difference between conditions, or a significant effect of word occurrence (Figure 3b).

The word rate model had a singularity error. In the retained model, we observed that condition was a significant predictor of word rate (model output provided in Table 2). Figure 3c shows that the rate was slower in IDS compared to ADS, replicating previous findings (Cooper & Aslin, Reference Cooper and Aslin1994; Cox et al., Reference Cox, Bergmann, Fowler, Keren-Portnoy, Roepstorff, Bryant and Fusaroli2022; Fernald & Simon, Reference Fernald and Simon1984; Raneri et al., Reference Raneri, Von Holzen, Newman and Ratner2020). In addition, there was a significant effect of word occurrence where the rate became quicker across additional repetitions of the target words.

7.2. IDS task comparisons

In the models comparing different tasks within IDS, the more robust random effects structure was not supported for either of the two pitch measures or word rate as there were singularity/convergence errors when including the by-speaker random slopes for word occurrence. The most parsimonious retained model includes by-speaker random slopes for task and by-word random intercepts.

In the retained pitch model (see Footnote 3; output provided in Table 3), there was no effect of task. As seen in Figure 4a, parents did not differ in mean pitch across the storytelling, sorting, and free play tasks. However, there was a significant main effect of word occurrence, with mean pitch increasing across additional repetitions of the target words (Table 3).

Table 3. Fixed and random effects parameters of the IDS between tasks models

Figure 4. Pitch and word rate comparisons between IDS tasks at the word level. (a) mean F0 in semitones between tasks. (b) F0 range in semitones between tasks. (c) word rate between tasks. Error bars show the standard error of the mean.

In contrast, the retained pitch range model (see Footnote 3) revealed an effect of task (output provided in Table 3). As seen in Figure 4b, the free play task and the sorting task had a smaller F0 range than the storytelling task (P < 0.05). A post-hoc analysis using pairwise comparisons (Bonferroni adjusted) did not reveal a significant difference between the free play and sorting tasks (Coef = −0.06, SE = 0.13, z = −0.47, P = 0.89) (see Figure 4b). There was not a significant effect of word occurrence on pitch range.

In the retained word rate model (see Footnote 3; model output provided in Table 3), there was an effect of task. Specifically, the free play task had a slower word rate compared to the storytelling task (see Figure 4c). However, there was not a significant difference in word rate between the sorting and storytelling tasks. A post-hoc analysis using pairwise comparisons (Bonferroni adjusted) did not reveal a significant difference between the free play and sorting tasks (Coef = −0.23, SE = 0.21, z = −1.14, P = 0.49). In addition, there was a significant effect of word occurrence: word rate became quicker across additional repetitions of the target words (P < 0.001).

7.3. Post-hoc analysis: Changes over occurrence in ADS

Additionally, we tested whether changes in word occurrence in IDS across tasks were paralleled in ADS. The models (provided in Supplementary Material) included fixed effects for task (Object description, Map, Storytelling; reference level = Storytelling) and previous word occurrence counts (centered) for each subject across the experiment within the ADS condition, and by-speaker and by-word random intercepts, as well as by-speaker random slopes for task. All three of the ADS models had a singularity error for the by-speaker random slopes for word occurrence. The retained models revealed that mean pitch, pitch range, and word rate did not significantly change over additional word occurrences in ADS (P > 0.05) (see Supplementary Material).

8. Discussion

The present study explored how parents adjust their speech to infants when producing the same words across different contexts. Since most studies analyse IDS within a singular context, we were interested in how the way parents talk to their children may adapt to the different activities. We analysed acoustic features (mean F0, F0 range, and word rate) of parents’ speech to their 10- to 20-month-old infants during three different tasks that shared a common set of words. We found a higher mean pitch and slower word rate in IDS than ADS, replicating previously established findings (e.g., Cooper & Aslin, Reference Cooper and Aslin1994; Fernald & Simon, Reference Fernald and Simon1984). However, there was not a significant difference in pitch range between IDS and ADS.

Within IDS, we found evidence for different acoustic adjustments based on the context, specifically for our pitch range and word rate measures. Our prediction that pitch would be greatest in the sorting task based on Gergely et al. (Reference Gergely, Faragó, Galambos and Topál2017) was not borne out in the current study, where we found that mean pitch did not significantly vary between our tasks. However, when telling a story, parents produced larger pitch ranges compared to the free play and sorting tasks. They also produced a faster word rate, relative to the free play task. There are a few possible explanations for this pattern. First, each task had its own unique goals. In the sorting task, parents and their infants worked together to sort toys into buckets. During the free play task, parents played with their infants while labelling and describing toys. The storytelling task required parents to create their own narrative that incorporated the target words. The storytelling task was distinct from the other tasks because it involved telling a narrative and did not involve manipulating objects that represented target words. In addition, a likely goal for the parents during the storytelling task was to engage infants’ attention and keep them focused on the story itself. Fernald and Kuhl (Reference Fernald and Kuhl1987) found that infants listened longer to IDS than ADS when the IDS stream had exaggerated pitch contours, concluding that exaggerated pitch and infant attention go hand-in-hand. Our findings show that during storytelling parents had the widest pitch range compared to the other two tasks, suggesting that the demands and goals of a storybook might require more effort from parents to engage and sustain their infant’s attention. This idea is discussed further below.

Storytelling is typically expressive in nature, consisting of greater variation in pitch, tone, and pauses compared to other forms of spontaneous speech (Montaño & Alías, Reference Montaño and Alías2017). Expressiveness, specifically variations in pitch throughout a storytelling session, is theorised to be especially important to mark significant information in a story along with characters’ emotions, potentially aiding in reading comprehension (Cowie et al., Reference Cowie, Douglas-Cowie and Wichmann2002; Kuhn et al., Reference Kuhn, Schwanenflugel and Meisinger2010; Veenendaal et al., Reference Veenendaal, Groen and Verhoeven2014; Wolters et al., Reference Wolters, Kim and Szura2020). The greater pitch range and faster word rate observed in the present study may be due to the increased expressive nature of storytelling compared to the other two experimental tasks. Since the books provided to parents were wordless picture books, we can rule out that parents were simply communicating a predetermined narrative. In addition, when stories are conveyed in a more expressive manner via pitch changes, children are better able to comprehend the details of the story itself (Mira & Schwanenflugel, Reference Mira and Schwanenflugel2013). Parents are potentially aware of the benefits of expressive storytelling and adjust their speech accordingly to maintain their infant’s attention. Thus, it is plausible to conclude that storytelling drives increased prosodic variation in IDS, especially for pitch range and word rate. Further, it is possible that parents were more concerned about word learning and teaching in the free play task, thereby slowing their word rate to optimise learning compared to storytelling where expressiveness and infant engagement were of greater importance.

The storytelling task was also unique in that it was the only task where parents and their children did not have a physical, tactile referent for each of the target words. Instead, each target word was depicted by an image in the book. It is possible that not having a physical way to interact with each target word leads parents to use other strategies in order to engage their infants’ attention with the words and referents. In this case, parents exaggerated the target words in the book by producing them with a larger pitch range. At the same time, parents produced the target words at a faster rate in storytelling than free play, making these words somewhat less clear. Previous work found that phonetic clarity (as measured by external visual clarity ratings and acoustic analyses) and referential clarity were related in IDS (Beech & Swingley, Reference Beech and Swingley2024). Simply put, when referents in the physical environment were easy to detect, parents’ initial naming of the referent was rated as being clearer compared to referents that were not in the immediate physical environment (Beech & Swingley, Reference Beech and Swingley2024). The present study is consistent with these findings; referents were more obvious in the free play task than storytelling task, and this is where parents had the slowest and clearest speech when naming the target words, potentially creating more “conversational gems” (Beech & Swingley, Reference Beech and Swingley2024). Future studies can investigate speaker clarity and determine if there is variation between conditions and activities.

The finding that the word rate is faster in IDS storytelling may seem to contradict the idea that a slower word rate enhances infants’ word recognition. However, it does not necessarily mean that infants are not learning important linguistic information during storytelling. Wang et al. (Reference Wang, Llanos and Seidl2017) investigated the developmental time course of how infants adjust to variations in speech rate. They found that while 11-month-old infants were unable to distinguish between familiar and unfamiliar words in a fast speech stream, 14-month-olds were successful at doing so. In addition, the researchers found that infants relied on the speech rate of the surrounding contextual information and not just the rate of the target words (Wang et al., Reference Wang, Llanos and Seidl2017). The mean age of the infants in the present study was 14.32 months, suggesting that many of the infants in our study could capture important linguistic information during the storytelling context, despite its faster word rate. To compensate, infants may use the surrounding contextual information about the story to process the target words.

Gergely et al. (Reference Gergely, Faragó, Galambos and Topál2017) also found that pitch characteristics differed across tasks designed to elicit IDS, although their findings differed from ours. They found that parents produced the highest mean pitch in the problem-solving task, followed by storytelling, with teaching having the lowest mean pitch in IDS. This pattern of pitch variation across tasks differs from our findings, potentially because we presented a different array of activities and goals. The present tasks mimicked activities that parents and children participate in during their everyday lives. For example, the sorting task involved behaviours that many parents and children complete during a post-playtime clean-up session. The free play task was similar to playtime, and the storytelling task was similar to storytime. These tasks relate to each other and typical activities in different ways from Gergely et al.’s problem-solving, storytelling, and teaching tasks. Additionally, our analyses considered only the target words, allowing us to (1) more accurately control for intra- and inter-speaker variation and (2) explore how the production of the same words can vary depending on the context at hand. This second point is particularly interesting because even if the same word is used across contexts, the discourse and meaning of that word can change based on the goals of communication.

The present findings identify ways that prosodic characteristics of IDS are responsive to the different types of communicative contexts that parents and children engage in. Specifically, these results provide evidence that audience design is sensitive to the context and specific communicative needs of the situation, and not just defined by the interlocutor (Bell, Reference Bell1984; Clark & Murphy, Reference Clark and Murphy1982). Our findings suggest that parents are aware of the different communicative needs of the context and their infant and adapt their speech accordingly. This supports Lindblom’s (Reference Lindblom1990) idea that listener adaptations are on a continuum, with speakers adjusting based on both the individual and contextual needs of the listener. Crucially, the listener may have different needs depending on the context, requiring the speaker to adjust further to accommodate those needs, leading to different levels of hypo- and hyper-articulation (Lindblom, Reference Lindblom1990).

We also examined how parents’ IDS prosody changed across repeated occurrences of the target words, particularly if there were any reduction effects present. Reduction effects refer to the phenomenon that when information is repeated, speakers tend to hypo-articulate previously mentioned words (Lindblom, Reference Lindblom1990). For example, when adults talk to one another, acoustic features, such as speech rate and pitch, become less emphasised across repetitions of the same word (Fowler & Housum, Reference Fowler and Housum1987). Our findings showed that parents increased their mean pitch and word rate across repeated mentions of a word in IDS. On the one hand, the increase in mean pitch could reflect parents’ attempts to capture their infant’s attention and/or provide more information for word learning; rather than hypo-articulating “old” words, parents may treat all words as “new” to support their linguistic needs as a young language learner. At the same time, we saw an increased word rate with repetition in IDS, consistent with ADS. A possible explanation for this is that word rate is more responsive to infant feedback throughout an interaction than pitch may be. Work with adults has shown that when addressees demonstrate understanding of a referent, speakers produce acoustically shortened words compared to when they do not demonstrate understanding (Arnold et al., Reference Arnold, Kahn and Pancani2012). In IDS, Smith and Trainor (Reference Smith and Trainor2008) found that mothers raised their pitch more when they were aware that increased pitch resulted in positive engagement from the infant, demonstrating that mothers are responsive to infants’ attentional cues and feedback. In this study, parents might be aware of their infant’s word understanding and attention over the course of the interaction via infants’ actions or vocalisations, thereby supporting their consistent use of exaggerated IDS via pitch. Future studies can investigate the relationship between infant vocalisations, infant attention/engagement, and parents’ prosodic adjustments. One prediction is that the between-task prosodic differences we have observed in the present study are related to infants’ task-dependent attentional differences.

The design and results of the present study provide many directions for future research. First, even though the tasks that parents and their infants completed are more diverse than those in previous studies, they are still within the confines of an experimental, lab setting. We predict that larger effects would occur within a more naturalistic setting, such as at-home environments. At home, there may be clearer distinctions between different activities and contexts, which may reveal even greater acoustic differences between them. In lab settings, parents may feel uncomfortable, especially with the knowledge that they are being recorded, which can shape their speech as well (e.g., Cohn et al., Reference Cohn, Liang, Sarian, Zellou and Yu2021). Future work can analyse changes in IDS prosody in at-home recordings. Day-long home recordings would also allow us to observe how parents change the way they communicate with their infants throughout the day during different activities such as mealtime, playtime, and bedtime, each of which has distinct demands and goals. The prediction here is that parents will adjust the manner in which they speak to their infants based on the specific task, using acoustic variation as a way to differentiate between goals. Future research should also investigate the conditions that affect prosody in ADS. While we found overall differences in mean pitch and speaking rate between IDS and ADS, we did not find that IDS had a larger pitch range. This may have occurred because of the teaching and instructional style of the ADS tasks (i.e., describing objects, explaining locations and stories), a possibility that remains to be investigated.

The ADS condition could have felt unnatural to parents; reading a book to another adult and describing toys are not things adults typically do with one another. The unusual task demands may have contributed to the surprising similarity in F0 range across IDS and ADS. At the same time, IDS tasks always occurred before ADS tasks in the current study. This was to ensure that infants were in the most content and alert state as possible. However, we recognise that this is a limitation because of the potential for speaker fatigue during the ADS condition, further affecting the present acoustic analyses and interpretations.

Second, the present study does not consider the types of utterance functions parents used. Past work has shown that pitch patterns differ between polar interrogatives and declaratives in IDS (Geffen & Mintz, Reference Geffen and Mintz2017), although this work is limited. Future analyses will incorporate utterance functions (e.g., directives, interrogatives) into the acoustic analysis models to better understand other factors that may create variation within IDS. Lastly, we recognise that there may be sentence-level characteristics that shape context-related prosodic differences in IDS. Specific target words may fall in distinct positions within the utterance, affecting the way the word is produced. Previous work has found that IDS contains shorter utterances, leading to phrase-final lengthening for a greater proportion of words, among other characteristics (Martin et al., Reference Martin, Igarashi, Jincho and Mazuka2016; Wang et al., Reference Wang, Yu, Huang and Lany2023). This may affect the interpretation of the pitch and word rate results. Additionally, tasks may yield varying utterance lengths. Together with utterance function, future analyses will include sentence-level information about parents’ utterances, such as word position and utterance length, to provide more nuanced information about what drives the prosodic differences between tasks observed in the current study, as sketched below.
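
To make this planned extension concrete, one possible specification would add such sentence-level terms to the between-tasks model given in footnote 3. This is only an illustrative sketch in R; the predictor names utterance_function, word_position.c, and utterance_length.c, and the data frame ids_word_data, are hypothetical labels rather than variables from the present analyses.

# Hypothetical extension of the between-tasks lmer model (see footnote 3).
# All names below are illustrative placeholders, not the study's variables.
extended_formula <- feature ~ task + word_occurrence_cond.c +
  utterance_function +   # e.g., declarative vs. interrogative vs. directive
  word_position.c +      # centred position of the target word in the utterance
  utterance_length.c +   # centred utterance length (in words)
  (1 + task | subject) + (1 | word)

# The formula would then be passed to lmer() with a word-level IDS data frame:
# m_task_extended <- lmer(extended_formula, data = ids_word_data)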

In summary, this study considered how parents adjust their speech across different contexts when interacting with their infants. We found that parents produced a greater IDS pitch range during storytelling compared to free play and object sorting. Parents also produced a faster word rate during storytelling compared to free play. These findings suggest that parents adjust the way they talk to their infants based on the communicative needs of the context. In particular, sharing a story with an infant seems to create a unique language experience. This may be due to the expressive nature of storytelling, especially storytelling to young children, which supports reading comprehension, attention, and language development.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S0305000924000709.

Acknowledgements

Thank you to the undergraduate researchers who assisted with data collection and data preparation for the project. Thank you to the families and infants who participated in the project.

Competing interest

The author(s) declare none.

Footnotes

1 We recognise that the lack of counterbalancing of the IDS and ADS task order is a limitation of the design. However, the primary focus of this study is to compare tasks within the IDS condition, not across the IDS and ADS conditions. Therefore, the order of the IDS and ADS conditions has minimal effects on the interpretation of the results. Consistently presenting the IDS condition first ensured that infants were as alert, happy, and interested as possible when participating in the crucial conditions of the study. Note that we included word occurrence in our models to account for prior productions of the target words across the ADS and IDS conditions and across contexts.

2 lmer syntax: feature ~ condition + word_occurrence_exp.c + (1 | subject) + (1 | word).

3 lmer syntax: feature ~ task + word_occurrence_cond.c + (1 + task | subject) + (1 | word).
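
For readers who wish to see how these specifications translate into runnable code, the following is a minimal sketch in R using the lme4 package (Bates et al., Reference Bates, Mächler, Bolker and Walker2015). The data frame word_data and its columns are simulated placeholders for illustration only; they are not the study’s data or analysis scripts, and only the model formulas follow footnotes 2 and 3.

# Minimal sketch of the footnoted lme4 models; the data are simulated placeholders.
library(lme4)

set.seed(1)
n <- 400
word_data <- data.frame(
  feature = rnorm(n),                                    # e.g., mean F0 in semitones
  condition = factor(rep(c("IDS", "ADS"), each = n / 2)),
  task = factor(sample(c("sorting", "play", "story"), n, replace = TRUE)),
  word_occurrence_exp.c = as.numeric(scale(rpois(n, 3), scale = FALSE)),
  word_occurrence_cond.c = as.numeric(scale(rpois(n, 3), scale = FALSE)),
  subject = factor(sample(1:20, n, replace = TRUE)),
  word = factor(sample(paste0("w", 1:14), n, replace = TRUE))
)

# Footnote 2: between-conditions (IDS vs. ADS) model
m_condition <- lmer(feature ~ condition + word_occurrence_exp.c +
                      (1 | subject) + (1 | word), data = word_data)

# Footnote 3: between-tasks model within IDS, with by-subject random slopes for task
m_task <- lmer(feature ~ task + word_occurrence_cond.c +
                 (1 + task | subject) + (1 | word),
               data = subset(word_data, condition == "IDS"))

summary(m_condition)
summary(m_task)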

References

Arnold, J. E., Kahn, J. M., & Pancani, G. C. (2012). Audience design affects acoustic reduction via production facilitation. Psychonomic Bulletin & Review, 19, 505–512.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Beech, C., & Swingley, D. (2024). Relating referential clarity and phonetic clarity in infant-directed speech. Developmental Science, 27, 334–341. https://doi.org/10.1111/desc.13442
Bell, A. (1984). Language style as audience design. Language in Society, 13(2), 145–204. https://doi.org/10.1017/S004740450001037X
Benders, T. (2013). Mommy is only happy! Dutch mothers’ realization of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development, 36(4), 847–862. https://doi.org/10.1016/j.infbeh.2013.09.001
Boersma, P., & Weenink, D. (2021). Praat: Doing phonetics by computer [Computer program]. Version 5.3.74.
Broesch, T. L., & Bryant, G. A. (2015). Prosody in infant-directed speech is similar across western and traditional cultures. Journal of Cognition and Development, 16(1), 31–43. https://doi.org/10.1080/15248372.2013.833923
Burnham, D., Kitamura, C., & Vollmer-Conna, U. (2002). What’s new, pussycat? On talking to babies and animals. Science, 296(5572), 1435. https://doi.org/10.1126/science.1069587
Cartmill, E. A., Armstrong, B. F., Gleitman, L. R., Goldin-Meadow, S., Medina, T. N., & Trueswell, J. C. (2013). Quality of early parent input predicts child vocabulary 3 years later. Proceedings of the National Academy of Sciences of the United States of America, 110(28), 11278–11283. https://doi.org/10.1073/pnas.1309518110
Clark, H. H., & Murphy, G. L. (1982). Audience design in meaning and reference. Advances in Psychology, 9, 287–299. North-Holland.
Cohn, M., Liang, K. H., Sarian, M., Zellou, G., & Yu, Z. (2021). Speech rate adjustments in conversations with an Amazon Alexa socialbot. Frontiers in Communication, 6, 671429. https://doi.org/10.3389/fcomm.2021.671429
Cohn, M., Mengesha, Z., Lahav, M., & Heldreth, C. (2024). African American English speakers’ pitch variation and rate adjustments for imagined technological and human addressees. JASA Express Letters, 4(4).
Cohn, M., Segedin, B. F., & Zellou, G. (2022). Acoustic-phonetic properties of Siri- and human-directed speech. Journal of Phonetics, 90. https://doi.org/10.1016/j.wocn.2021.101123
Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech in the first month after birth. Child Development, 61(5), 1584–1595. https://doi.org/10.2307/1130766
Cooper, R. P., & Aslin, R. N. (1994). Developmental differences in infant attention to the spectral properties of infant-directed speech. Child Development, 65(6), 1663–1677. https://doi.org/10.1111/j.1467-8624.1994.tb00841.x
Cowie, R., Douglas-Cowie, E., & Wichmann, A. (2002). Prosodic characteristics of skilled reading: Fluency and expressiveness in 8-10-year-old readers. Language and Speech, 45(1), 47–82. https://doi.org/10.1177/00238309020450010301
Cox, C., Bergmann, C., Fowler, E., Keren-Portnoy, T., Roepstorff, A., Bryant, G., & Fusaroli, R. (2022). A systematic review and Bayesian meta-analysis of the acoustic features of infant-directed speech. Nature Human Behaviour, 7, 114–133. https://doi.org/10.1038/s41562-022-01452-1
Cristià, A. (2010). Phonetic enhancement of sibilants in infant-directed speech. The Journal of the Acoustical Society of America, 128(1), 424–434. https://doi.org/10.1121/1.3436529
Cristia, A., & Seidl, A. (2014). The hyperarticulation hypothesis of infant-directed speech. Journal of Child Language, 41(4), 913–934. https://doi.org/10.1017/S0305000912000669
De Jong, N. H., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385–390.
Englund, K. T. (2018). Hypoarticulation in infant-directed speech. Applied Psycholinguistics, 39(1), 67–87. https://doi.org/10.1017/S0142716417000480
Fernald, A., & Simon, T. (1984). Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology, 20(1), 104.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., De Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16(3), 477–501. https://doi.org/10.1017/S0305000900010679
Fernald, A., & Kuhl, P. K. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior & Development, 10(3), 279–293. https://doi.org/10.1016/0163-6383(87)90017-8
Fowler, C. A., & Housum, J. (1987). Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language, 26(5), 489–504.
Geffen, S., & Mintz, T. H. (2017). Prosodic differences between declaratives and interrogatives in infant-directed speech. Journal of Child Language, 44(4), 968–994. https://doi.org/10.1017/S0305000916000349
Gergely, A., Faragó, T., Galambos, Á., & Topál, J. (2017). Differential effects of speech situations on mothers’ and fathers’ infant-directed and dog-directed speech: An acoustic analysis. Scientific Reports, 7(1), 13739. https://doi.org/10.1038/s41598-017-13883-2
Giles, H. (1973). Accent mobility: A model and some data. Anthropological Linguistics, 87–105.
Graf Estes, K., & Hurley, K. (2013). Infant-directed prosody helps infants map sounds to meanings. Infancy, 18(5), 797–824. https://doi.org/10.1111/infa.12006
Gumperz, J. J. (1977). The sociolinguistic significance of conversational code-switching. RELC Journal, 8(2), 1–34.
Hartman, K. M., Ratner, N. B., & Newman, R. S. (2017). Infant-directed speech (IDS) vowel clarity and child language outcomes. Journal of Child Language, 44(5), 1140–1162. https://doi.org/10.1017/S0305000916000520
Hilton, C. B., Moser, C. J., Bertolo, M., Lee-Rubin, H., Amir, D., Bainbridge, C. M., … Mehr, S. A. (2022). Acoustic regularities in infant-directed speech and song across cultures. Nature Human Behaviour, 6(11), 1545–1556. https://doi.org/10.1038/s41562-022-01410-x
Huttenlocher, J., Waterfall, H., Vasilyeva, M., Vevea, J., & Hedges, L. V. (2010). Sources of variability in children’s language growth. Cognitive Psychology, 61(4), 343–365. https://doi.org/10.1016/j.cogpsych.2010.08.002
Jones, M. R., Kidd, G., & Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7(5), 1059–1073.
Julien, H. M., & Munson, B. (2012). Modifying speech to children based on their perceived phonetic accuracy. Journal of Speech, Language, and Hearing Research, 55(6), 1836–1849. https://doi.org/10.1044/1092-4388(2012/11-0131)
Kalashnikova, M., & Burnham, D. (2018). Infant-directed speech from seven to nineteen months has similar acoustic properties but different functions. Journal of Child Language, 45(5), 1035–1053. https://doi.org/10.1017/S0305000917000629
Kitamura, C., & Burnham, D. (2003). Pitch and communicative intent in mother’s speech: Adjustments for age and sex in the first year. Infancy, 4(1), 85–110. https://doi.org/10.1207/S15327078IN0401_5
Kitamura, C., Thanavishuth, C., Burnham, D., & Luksaneeyanawin, S. (2001). Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and non-tonal language. Infant Behavior and Development, 24(4), 372–392. https://doi.org/10.1016/S0163-6383(02)00086-3
Ko, E.-S. (2012). Nonlinear development of speaking rate in child-directed speech. Lingua, 122(8), 841–857. https://doi.org/10.1016/j.lingua.2012.02.005
Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg, U., & Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277(5326), 684–686. https://doi.org/10.1126/science.277.5326.684
Kuhn, M. R., Schwanenflugel, P. J., & Meisinger, E. B. (2010). Aligning theory and assessment of reading fluency: Automaticity, prosody, and definitions of fluency. Reading Research Quarterly, 45(2), 230–251.
Lam, C., & Kitamura, C. (2012). Mommy, speak clearly: Induced hearing loss shapes vowel hyperarticulation. Developmental Science, 15(2), 212–221. https://doi.org/10.1111/j.1467-7687.2011.01118.x
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In Speech production and speech modelling (pp. 403–439). Dordrecht: Springer Netherlands.
Liu, H. M., Kuhl, P. K., & Tsao, F. M. (2003). An association between mothers’ speech clarity and infants’ speech discrimination skills. Developmental Science, 6(3). https://doi.org/10.1111/1467-7687.00275
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60).
Martin, A., Igarashi, Y., Jincho, N., & Mazuka, R. (2016). Utterances in infant-directed speech are shorter, not slower. Cognition, 156, 52–59. https://doi.org/10.1016/j.cognition.2016.07.015
Martin, A., Schatz, T., Versteegh, M., Miyazawa, K., Mazuka, R., Dupoux, E., & Cristia, A. (2015). Mothers speak less clearly to infants than to adults: A comprehensive test of the hyperarticulation hypothesis. Psychological Science, 26(3), 341–347. https://doi.org/10.1177/0956797614562453
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. Proc. Interspeech 2017, 498–502. https://doi.org/10.21437/Interspeech.2017-1386
Mira, W. A., & Schwanenflugel, P. J. (2013). The impact of reading expressiveness on the listening comprehension of storybooks by prekindergarten children. Language, Speech, and Hearing Services in Schools, 44(2), 183–194. https://doi.org/10.1044/0161-1461
Montaño, R., & Alías, F. (2017). The role of prosody and voice quality in indirect storytelling speech: A cross-narrator perspective in four European languages. Speech Communication, 88, 1–16. https://doi.org/10.1016/j.specom.2017.01.007
Moroz, G. (2020). Phonetic fieldwork and experiments with phonfieldwork package.
Narayan, C. R., & McDermott, L. C. (2016). Speech rate and pitch characteristics of infant-directed speech: Longitudinal and cross-linguistic observations. The Journal of the Acoustical Society of America, 139(3), 1272–1281. https://doi.org/10.1121/1.4944634
Nelson, D. G. K., Hirsh-Pasek, K., Jusczyk, P. W., & Cassidy, K. W. (1989). How the prosodic cues in motherese might assist language learning. Journal of Child Language, 16(1), 55–68. https://doi.org/10.1017/S030500090001343X
Nencheva, M. L., & Lew-Williams, C. (2022). Understanding why infant-directed speech supports learning: A dynamic attention perspective. Developmental Review, 66. https://doi.org/10.1016/j.dr.2022.101047
Payne, E. M., Post, B., Astruc, L., Prieto, P., & Vanrell, M. (2015). Rhythmic modification in child speech. https://ora.ox.ac.uk/objects/uuid:0248005f-c60449ce-9a07-33b1fb9fd8f2
Puccini, D., Hassemer, M., Salomo, D., & Liszkowski, U. (2010). The type of shared activity shapes caregiver and infant communication. Gesture, 10(2–3), 279–296. https://doi.org/10.1075/gest.10.2-3.08puc
Quené, H. (2022). Package ‘hqmisc’.
Raneri, D., Von Holzen, K., Newman, R., & Ratner, N. B. (2020). Change in maternal speech rate to preverbal infants over the first two years of life. Journal of Child Language, 47(6), 1263–1275.
Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015
Rondal, J. A. (1980). Fathers’ and mothers’ speech in early language development. Journal of Child Language, 7(2), 353–369. https://doi.org/10.1017/S0305000900002671
Rowe, M. L. (2012). A longitudinal investigation of the role of quantity and quality of child-directed speech in vocabulary development. Child Development, 83(5), 1762–1774. https://doi.org/10.1111/j.1467-8624.2012.01805.x
Shute, B., & Wheldall, K. (1999). Fundamental frequency and temporal modifications in the speech of British fathers to their children. Educational Psychology, 19(2), 221–233.
Smith, N. A., & Trainor, L. J. (2008). Infant-directed speech is modulated by infant feedback. Infancy, 13(4), 410–420. https://doi.org/10.1080/15250000802188719
Song, J. Y., Demuth, K., & Morgan, J. (2010). Effects of the acoustic properties of infant-directed speech on infant word recognition. The Journal of the Acoustical Society of America, 128(1), 389–400. https://doi.org/10.1121/1.3419786
Spence, M. J., & Moore, D. S. (2003). Categorization of infant-directed speech: Development from 4 to 6 months. Developmental Psychobiology, 42(1), 97–109. https://doi.org/10.1002/dev.10093
Stern, D. N., Spieker, S., Barnett, R. K., & MacKain, K. (1983). The prosody of maternal speech: Infant age and context related changes. Journal of Child Language, 10(1), 1–15. https://doi.org/10.1017/S0305000900005092
Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant-directed speech facilitates word segmentation. Infancy, 7(1), 53–71. https://doi.org/10.1207/s15327078in0701_5
Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11(3), 188–195. https://doi.org/10.1111/1467-9280.00240
Trainor, L. J., & Desjardins, R. N. (2002). Pitch characteristics of infant-directed speech affect infants’ ability to discriminate vowels. Psychonomic Bulletin & Review, 9(2), 335–340. https://doi.org/10.3758/BF03196290
Trueswell, J. C., Lin, Y., Armstrong, B. F., Cartmill, E. A., Goldin-Meadow, S., & Gleitman, L. R. (2016). Perceiving referential intent: Dynamics of reference in natural parent-child interactions. Cognition, 148, 117–135. https://doi.org/10.1016/j.cognition.2015.11.002
Uther, M., Knoll, M. A., & Burnham, D. (2007). Do you speak E-NG-L-I-SH? A comparison of foreigner- and infant-directed speech. Speech Communication, 49(1), 2–7. https://doi.org/10.1016/j.specom.2006.10.003
Veenendaal, N. J., Groen, M. A., & Verhoeven, L. (2014). The role of speech prosody and text reading prosody in children’s reading comprehension. British Journal of Educational Psychology, 84(4), 521–536. https://doi.org/10.1111/bjep.12036
Wang, T., Yu, E. C., Huang, R., & Lany, J. (2023). Acoustic cues to phrase and clause boundaries in infant-directed speech: Evidence from LENA recordings. Journal of Child Language, 1–20. https://doi.org/10.1017/S030500092300034X
Wang, Y., Llanos, F., & Seidl, A. (2017). Infants adapt to speaking rate differences in word segmentation. The Journal of the Acoustical Society of America, 141(4), 2569–2578. https://doi.org/10.1121/1.4979704
Werker, J. F., & McLeod, P. J. (1989). Infant preference for both male and female infant-directed talk: A developmental study of attentional and affective responsiveness. Canadian Journal of Psychology / Revue Canadienne de Psychologie, 43(2), 230–246. https://doi.org/10.1037/h0084224
Werker, J. F., Pons, F., Dietrich, C., Kajikawa, S., Fais, L., & Amano, S. (2007). Infant-directed speech supports phonetic category learning in English and Japanese. Cognition, 103(1), 147–162. https://doi.org/10.1016/j.cognition.2006.03.006
Wolters, A. P., Kim, Y.-S. G., & Szura, J. W. (2020). Is reading prosody related to reading comprehension? A meta-analysis. Scientific Studies of Reading, 26(1), 1–20. https://doi.org/10.1080/10888438.2020.1850733
Xu Rattanasone, N., Burnham, D., & Reilly, R. G. (2013). Tone and vowel enhancement in Cantonese infant-directed speech at 3, 6, 9, and 12 months of age. Journal of Phonetics, 41(5), 332–343. https://doi.org/10.1016/j.wocn.2013.06.001
Zangl, R., Klarman, L., Thal, D., Fernald, A., & Bates, E. (2005). Dynamics of word comprehension in infancy: Developments in timing, accuracy, and resistance to acoustic degradation. Journal of Cognition and Development, 6(2), 179–208. https://doi.org/10.1207/s15327647jcd0602_2
Figure 1. Toys and books used in the IDS and ADS conditions. (a) Toys used during the IDS tasks representing the 14 target words. (b) Toys used during the ADS tasks representing the 14 target words. (c) Images of the picture books used in both the IDS and ADS conditions.

Figure 2. Depiction of the tasks in the IDS and ADS conditions.

Table 1. Descriptive statistics (mean and SD) for the number of repetitions of the target words in each task in IDS and ADS.

Table 2. Fixed and random effects parameters of the between-conditions (IDS vs ADS) models.

Figure 3. Pitch and rate comparisons between conditions at the word level. (a) Mean F0 in semitones between addressee conditions. (b) F0 range in semitones between addressee conditions. (c) Word rate between addressee conditions. Error bars show the standard error of the mean.

Table 3. Fixed and random effects parameters of the IDS between-tasks models.

Figure 4. Pitch and word rate comparisons between IDS tasks at the word level. (a) Mean F0 in semitones between tasks. (b) F0 range in semitones between tasks. (c) Word rate between tasks. Error bars show the standard error of the mean.
