1. Introduction
Prosody can be used to chunk utterances into meaningful units by marking boundaries, a phenomenon known as prosodic phrasing (Keating, 2004; Ladd, 1986; Liberman & Prince, 1977; Selkirk, 1984). For example, English speakers usually insert a boundary within a list (e.g., jelly, beans) to differentiate it from a compound (e.g., jellybeans), using prosodic cues such as duration and pitch (Ladd, 1986; Wynne et al., 2018; Yuen et al., 2021). Previous studies have found that English-speaking preschoolers can use durational and pitch cues for phrasing by age 5 (Vogel & Raimy, 2002; Wells et al., 2004; Yoshida, 2007; Yuen et al., 2021). However, little is known about how prosodic phrasing is acquired by preschoolers speaking tonal languages, such as Mandarin, where pitch is used for both lexical meaning and prosodic phrasing simultaneously. Although compounding structures in English and Mandarin differ (Zhang et al., 2012), studies in Mandarin can provide cross-linguistic evidence regarding potential acoustic cues that children can use for boundary marking. Thus, the current study aims to investigate Mandarin-speaking preschoolers’ use of prosodic cues to contrast compounds and lists in their productions.
2. Prosodic phrasing in English
Different linguistic units, such as compounds and lists, can be disambiguated by the existence of a prosodic boundary, inserted in lists but not in compounds (Nespor & Vogel, 1986; Wells et al., 2004; Yuen et al., 2021). Durational cues, such as preboundary lengthening and pauses, mark the existence of a boundary. For example, the duration of jelly is longer in the list “jelly, beans …” when preceding a word boundary compared to that in the compound “jellybeans …” without a boundary. A pause is also inserted between jelly and beans in the list but not in the compound (De Pijper & Sanderman, 1994; Katz et al., 1996; Kentner & Féry, 2013; Krivokapić, 2007; Price et al., 1991; Turk & Shattuck-Hufnagel, 2007; Wagner, 2005; Yuen et al., 2021). Both preboundary lengthening and pauses are robust cues for marking boundaries and have been found across many (non-tonal) languages (Dutch: Cambier-Langeveld, 2000; French: Michelas & D’Imperio, 2012; German: Huttenlauch et al., 2021; Petrone et al., 2017).
English-learning children acquire these durational cues for boundary marking at around 4 to 5 years. For example, 5-year-olds can already produce preboundary lengthening and pauses to distinguish between compounds and lists in an adult-like manner, although they insert more pauses in compounds than adults (Dankovičová et al., 2004; Wells et al., 2004; Yoshida, 2007; Yuen et al., 2021; but see Katz et al., 1996, for different findings). These findings show that English-speaking preschoolers can correctly map prosodic cues onto phrasing. However, little is known about whether and when preschoolers speaking tonal languages acquire the prosodic cues required for boundary marking.
3. Prosodic phrasing in Mandarin
Like English, Mandarin also uses preboundary lengthening and pauses for boundary marking (Bu et al., 2022; Chen, 2022; Kuang et al., 2022; Shen, 1992; Wang et al., 2017). However, unlike English, Mandarin has four lexical tones with distinct pitch contours: a high-level tone (T1), a rising tone (T2), a low-dipping tone (T3), and a falling tone (T4). T1, T2, and T4 have simple contours while T3 has a complex contour (see Figure 1). These four lexical tones are associated with distinct lexical meanings (e.g., ma1 “mother,” ma2 “hemp,” ma3 “horse,” and ma4 “to scold”). The acoustic realization of lexical tones can be influenced by the existence of word boundaries (Kuang et al., 2022, 2023; Lin, 1999; Wang et al., 2017; Xu & Wang, 2009; Zhang, 2012). For example, the pitch range of dynamic tones is generally expanded when preceding a boundary (e.g., lists), whereas no pitch range expansion is expected when not preceding a boundary (e.g., compounds) (Xu et al., 2024; Zhang, 2012). For the level tone T1, instead of pitch range variation, its overall pitch height is higher when following a boundary, compared to the pitch preceding a boundary (Kuang et al., 2023; Wang et al., 2017).
T3 is a special tone as it involves a complex phonological process, that is, tone sandhi, in multisyllabic words (Chen, 2022; Chen & Yuan, 2007; Lai & Kuang, 2016; Shih, 1997; Speer et al., 1989; Zhang, 2022). T3 was not investigated further here since children only acquire this complex process after the preschool period, especially in multisyllabic words (Tang et al., 2019a).

Figure 1. Pitch contours (citation forms) of lexical tones in Mandarin.
To our knowledge, only a few studies have investigated the acquisition of prosodic cues for sentence-level boundary marking by Mandarin-speaking preschoolers. One study found that Mandarin-speaking 4- to 5-year-olds can use preboundary lengthening and pauses to mark sentence boundaries, albeit with longer duration and inconsistent insertion of pauses compared to adults (Liu et al., 2023). Another study found that children can also implement pitch range expansion when preceding a boundary between sentences (Yu et al., 2020). Around this age, children are also producing connected speech and tonal processes in disyllabic words (Tang et al., 2019a; Xu Rattanasone et al., 2018). Together, these findings suggest that by age 4 to 5, preschoolers can use both duration and pitch to mark sentence boundaries. However, it is as yet unclear whether they can also use these cues to mark word boundaries. Word boundaries are critical to understanding speech as they differentiate a single item (i.e., a compound) from multiple items (i.e., a list), which is essential for word recognition in connected speech (Beach et al., 1996; Frazier et al., 2006).
So far, there has been only one study exploring Mandarin-speaking preschoolers’ use of prosodic cues at the word level, focusing on the implementation of preboundary pitch range expansion (Xu et al., 2024). The results showed that Mandarin-speaking 6-year-olds can correctly implement preboundary pitch range expansion for T2 and T4, with adult-like productions. However, it remains unclear whether and when children can use durational cues to mark boundaries and when children’s use of durational and pitch cues for phrasing becomes adult-like. As the pitch modification for phrasing is superimposed on tonal implementations, Mandarin-speaking children might only be able to use pitch for prosodic phrasing after they have acquired lexical tones by age 3 (Hua & Dodd, 2000; Li & Thompson, 1977; Tang et al., 2019a; Xu Rattanasone et al., 2018). They might then only be able to use pitch in an adult-like way after their tonal productions become adult-like after age 5 (Xu Rattanasone et al., 2018). Furthermore, T1 is a level tone where pitch height is more critical than pitch range (Tupper et al., 2020; Zhang & Gu, 2023), which was not investigated by Xu et al. (2024). This study therefore investigates the productions by Mandarin-speaking 4- to 6-year-olds, examining both durational (i.e., preboundary lengthening and pause) and pitch cues (i.e., pitch range for T2/T4 and pitch height for T1).
4. The current study
This study addressed two research questions. First, can Mandarin-speaking preschoolers implement durational cues for word boundaries to disambiguate compounds (N1 + N2) and lists (N1, N2), together with the accompanying changes in pitch cues (preboundary pitch range expansion and pitch height changes)? Second, if so, are preschoolers’ prosodic productions adult-like? The following hypotheses were formulated to test these research questions.
4.1. Hypothesis 1 (H1): Preboundary lengthening
As preboundary lengthening is a robust cue for boundary marking in Mandarin and has been acquired by 5-year-olds for sentence-level boundary marking (Bu et al., 2022; Chen, 2022; Kuang et al., 2022; Shen, 1992; Wang et al., 2017), we expected that (a) Mandarin-speaking preschoolers should be able to use preboundary lengthening to distinguish compounds and lists (i.e., produce longer syllable durations for N1 in lists than in compounds), but (b) might produce longer durations in both conditions compared to adults due to a slower speaking rate (Nip & Green, 2013; Yang et al., 2021).
4.2. Hypothesis 2 (H2): Pause
Based on Liu et al. (2023) and Yuen et al. (2021), where Mandarin- and English-speaking 4- to 5-year-olds were found to insert pauses for boundary marking, we expected that Mandarin-speaking preschoolers would (a) insert pauses to disambiguate compounds and lists, with longer pause durations in lists than in compounds, but (b) might insert more pauses in compounds than adults and produce longer pauses.
4.3. Hypothesis 3 (H3): Pitch
Based on Xu et al. (2024), where 6-year-olds were found to expand the pitch range to mark word boundaries, and on Tang et al. (2019a) and Xu Rattanasone et al. (2018), who provide acoustic evidence for the realization of tonal processes in disyllabic words by younger children, we expected that (a) Mandarin-speaking preschoolers should be able to use an expanded pitch range (for T2 and T4) and a higher pitch (for T1) in lists compared to compounds, though (b) their productions might not be adult-like.
5. Methods
5.1. Participants
A total of 94 typically developing Mandarin-speaking preschoolers participated in this study, with 11 preschoolers subsequently excluded due to incomplete data (four 4-year-olds and seven 5-year-olds). The remaining 83 child participants were then divided into three age groups according to chronological age (see Table 1). Child participants were recruited from kindergartens in Hebei and Shenyang Province, where Standard Chinese is used in daily teaching activities. In China, children above the age of 6 can enrol in elementary school, but many are still in kindergarten as they are born later in the year and do not meet the age needed to enter elementary school when enrolment begins. Additionally, 43 adult native speakers of Mandarin (born in Beijing) were recruited from local universities as controls. None of the participants had any speech, hearing, or cognitive disorders. All received a gift for their participation. This study was conducted in accordance with the ethics protocol approved by Macquarie University’s Human Ethics Panel. Consent was provided by adult participants and by the principal of the kindergarten for child participants.
Table 1. Demographics of participants in each age group

5.2. Stimuli
A total of 16 picturable noun-noun items (N1-N2) were selected to form compounds (N1 + N2) and their related list forms (N1, N2). Of these, four items were used in the practice trials and twelve items in the test trials (see Supplementary Material 1, hereinafter, Supplementary Material S1). All items were selected from spontaneous speech produced by 1- to 6-year-olds in the following corpora: the Chang Corpus (Chang, 1998), the Tong Corpus (Deng & Yip, 2018), the Li/Zhou PeerTalk Corpus (Li & Zhou, 2008), the Zhou Corpus (Zhou, 2001), and the Zhou Narratives Corpus (Li & Zhou, 2011), all available as part of the CHILDES database (MacWhinney, 2000). As T3 is acquired later and involves a phonological process (cf. Tang et al., 2019a), only the lexical tones T1, T2, and T4 were used as N1.
Compounds and their related list forms were embedded into a carrier sentence “Zhe4-li3 you3…” (Here have …), followed by a filler noun (N3), yielding a two-item list in the compound condition (N1 + N2, N3) and a three-item list in the list condition (N1, N2, N3). All items were presented on the screen using coloured clipart pictures.
5.3. Procedure
At the beginning of the test, a picture-naming task was conducted to familiarize all participants with the target words and their corresponding pictorial representations. Each picture depicting a set of items was presented on the screen one at a time, and the participants were asked to name all items shown in each picture (see example in Figure 2). The experimenter provided the correct label if a participant did not produce a label or produced an incorrect one. After the picture-naming task, four practice trials (two compound items and two list items) were presented to familiarize participants with the carrier sentence Zhe4-li3 you3… “Here have …” (without any conjunctions between N1 and N2). If participants did insert a conjunction, for example, he2 “and,” during the training phase, they were given further training.

Figure 2. Examples of compounds (left) and lists (right) in the testing phase.
During the testing phase, either two pictures eliciting compounds or three pictures eliciting lists were shown on the screen, and the experimenter asked “Can you tell me what these are?” Participants were instructed to answer the question using the provided carrier sentence to name objects from left to right. For example, if given the compound picture in Figure 2, participants were expected to say Zhe4-li3 you3 xiong2-mao1 he2 xi1-gua1 “Here have panda and watermelon,” and Zhe4-li3 you3 xiong2, mao1 he2 xi1-gua1 “Here have bear, cat, and watermelon” when given the list form. At this stage, if participants inserted conjunctions between N1 and N2, they were returned to the training phase until they stopped using conjunctions. Testing was discontinued if a participant was still using conjunctions after five training attempts (11 preschoolers did not complete the study).
Compounds and lists were presented in a pseudo-randomized sequence, in which pictures of compounds (e.g., xiong2-mao1 “panda”) would never occur right after their corresponding list forms (e.g., xiong2, mao1 “bear, cat”) and vice versa. Two test versions were generated and counterbalanced across participants to avoid any list effects, with no more than three compounds or lists shown successively. The productions were audio-recorded using a Marantz PMD661 solid-state recorder and an AKG C520L head-worn microphone at a sampling rate of 44100 Hz. Each participant produced 24 target sentences, resulting in 1032 tokens from adults and 2256 tokens from preschoolers (744 tokens from 4-year-olds, 816 tokens from 5-year-olds, and 696 tokens from 6-year-olds).
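The two ordering constraints described above can be illustrated with a short sketch. This is not the procedure the authors used to build their test versions; it is a minimal rejection-sampling illustration, and the function names, the seed, and the `(item, condition)` trial representation are assumptions introduced here.

```python
import random

def is_valid(order, max_run=3):
    """Return True if an order of (item, condition) trials meets both constraints."""
    run = 1
    for prev, cur in zip(order, order[1:]):
        # A compound and its own list form share an item id and must not be adjacent.
        if prev[0] == cur[0]:
            return False
        # Count consecutive trials of the same condition; allow at most max_run.
        run = run + 1 if prev[1] == cur[1] else 1
        if run > max_run:
            return False
    return True

def pseudo_randomize(n_items=12, max_run=3, seed=0, max_tries=100_000):
    """Shuffle until an order satisfies the constraints (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [(item, cond) for item in range(n_items) for cond in ("compound", "list")]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if is_valid(trials, max_run):
            return list(trials)
    raise RuntimeError("no valid order found within max_tries")

order = pseudo_randomize(seed=1)
```

With 12 test items in two conditions, this yields the 24 trials per participant mentioned in the text; two different seeds would give two test versions.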
5.4. Annotation and measurements
All recordings were annotated in Praat (Boersma & Weenink, 2024). We adopted the following criteria to define the onset: (a) when the consonant was a plosive or an affricate, the onset was defined as the occurrence of a release burst in the spectrogram and the waveform; (b) for fricatives, the onset of a syllable was determined by the beginning of high energy noise in the spectrogram and a clear frication noise on the waveform; (c) for laterals, nasals, or glides, the onset was identified by the beginning of higher formants in the spectrogram and periodic waveform. Since pitch information in lexical tones is mainly located in the rhyme part of the syllable in Mandarin, the onset of the rhyme was defined as the beginning of the second formant (F2) and the beginning of the periodic waveform. All syllables were either open syllables or had coda nasals. Hence, the criterion for the offset was the end of high formants and the end of periodic waveform. In compounds, for syllables where the onset of the second syllable was a vowel or a nasal, the offset of the first syllable was defined as the transition of F2 and the point of change in the periodic waveform.
Acoustic parameters were extracted using a Praat script by Xiong (2017), including the syllable duration of N1, pause duration between N1 and N2, and pitch range and pitch height over N1. The threshold for the presence or absence of a pause was based on Xu (1986), who reported closure durations of 47 ms and 62 ms for aspirated and unaspirated plosives and 39 ms and 50 ms for aspirated and unaspirated affricates, respectively. Thus, a silent interval between N1 and N2 that exceeded the reference closure duration was coded as a pause. As acoustic parameters of plosives and affricates are stabilized by age 4, we adopted the same criterion to determine the presence or absence of a pause in preschoolers’ productions (Peng & Chen, 2020). Pause duration was calculated by subtracting the temporal offset of N1 from the onset of N2. Pitch tracks of annotated rhyme parts were checked and manually revised to avoid pitch doubling or halving errors. We also analysed normalized syllable and pause duration to account for speaking rate differences between children and adults, yielding results consistent with the raw data. For the sake of readability, all normalized durational data are included in the Supplementary Materials. Duration was normalized for each speaker using the following formula (1). Original F0 values were extracted in Hz at 10 equidistant points by the default algorithm in Praat. To better match human perception, F0 values in Hertz were further converted to semitones with a reference of 40 Hz. Pitch range was then derived by subtracting the lowest pitch from the highest pitch. The pitch height of T1 was normalized for each speaker to eliminate any individual differences in pitch height, using the following formula (2).


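As an illustration of the unnormalized measures described in this section, the sketch below codes pause presence against the reference closure durations and converts F0 to semitones using the standard relation st = 12 · log2(f / ref). It does not reproduce formulas (1) and (2) or the Praat script used in the study; the function names and the idea of looking up the threshold by the manner and aspiration of the following consonant are assumptions made for illustration.

```python
import math

# Reference closure durations (ms) from Xu (1986), as reported in the text.
CLOSURE_MS = {
    ("plosive", "aspirated"): 47,
    ("plosive", "unaspirated"): 62,
    ("affricate", "aspirated"): 39,
    ("affricate", "unaspirated"): 50,
}

def pause_duration_ms(n1_offset_ms, n2_onset_ms):
    """Silent interval between N1 and N2: onset of N2 minus offset of N1."""
    return n2_onset_ms - n1_offset_ms

def is_pause(silence_ms, manner, aspiration):
    """Code a pause when the silence exceeds the reference closure duration.

    Assumption: the reference is chosen by the consonant class given here.
    """
    return silence_ms > CLOSURE_MS[(manner, aspiration)]

def hz_to_semitones(f0_hz, ref_hz=40.0):
    """Convert F0 in Hz to semitones relative to ref_hz (40 Hz in the text)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def pitch_range_st(f0_track_hz, ref_hz=40.0):
    """Pitch range in semitones: highest minus lowest value in the track."""
    st = [hz_to_semitones(f, ref_hz) for f in f0_track_hz]
    return max(st) - min(st)
```

Because the semitone scale is logarithmic, the pitch range does not depend on the reference frequency; the reference only matters for absolute pitch height.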
5.5. Statistical analysis
A total of 984 tokens from adults and 1992 tokens from preschoolers (648 tokens from 4-year-olds, 648 tokens from 5-year-olds, and 696 tokens from 6-year-olds) were included for further statistical analysis. An additional 48 tokens from adults and 264 tokens from preschoolers (96 tokens from 4-year-olds and 168 tokens from 5-year-olds) were excluded due to the insertion of the conjunction hai3-you3 “and” between N1 and N2, the presence of creaky voice, and/or the presence of environmental noise. All data were processed and analysed in R (R Core Team, 2023). All models were fitted using the lme4 package (Bates et al., 2015). Syllable duration, pause duration, and pitch range were evaluated with linear mixed-effects models, and the occurrence of pauses was evaluated with a generalized linear mixed-effects model with a binomial distribution. To determine the model of best fit for each analysis, models with the maximal random structure allowed by the data (without singularity or convergence issues) were selected based on a likelihood ratio test (Pinheiro & Bates, 2009). Statistical significance in all models was evaluated using the Anova function from the car package (Fox & Weisberg, 2019), providing omnibus main effects and interactions among factors by conducting F-tests to estimate p-values. Pairwise comparisons were performed using the emmeans package with Tukey’s honestly significant difference (HSD) adjustment (Lenth, 2024).
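The token counts reported here and in the Procedure section can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the zero for 6-year-olds is inferred from the fact that no exclusions are reported for that group.

```python
SENTENCES_PER_PARTICIPANT = 24  # each participant produced 24 target sentences

# Tokens recorded per group (Procedure section).
recorded = {"4-year-olds": 744, "5-year-olds": 816, "6-year-olds": 696, "adults": 1032}
# Tokens excluded per group (this section); 6-year-olds inferred as 0.
excluded = {"4-year-olds": 96, "5-year-olds": 168, "6-year-olds": 0, "adults": 48}

# Tokens entering the statistical analysis.
analysed = {group: recorded[group] - excluded[group] for group in recorded}

child_groups = [group for group in recorded if group != "adults"]
child_recorded = sum(recorded[g] for g in child_groups)
child_excluded = sum(excluded[g] for g in child_groups)
child_analysed = sum(analysed[g] for g in child_groups)

# Number of adult speakers implied by the recorded adult tokens.
adult_speakers = recorded["adults"] // SENTENCES_PER_PARTICIPANT
```

The recorded, excluded, and analysed totals are mutually consistent: 2256 − 264 = 1992 for preschoolers and 1032 − 48 = 984 for adults.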
6. Results
The analyses are reported in three sections: preboundary lengthening first, followed by pause insertion, and finally pitch range expansion and pitch height.
6.1. Preboundary lengthening
This section tested H1a: whether Mandarin-speaking preschoolers produce longer syllable duration in lists than in compounds and use preboundary lengthening to contrast these two structures, and H1b: whether preschoolers’ productions are adult-like. A linear mixed-effect model was fitted with syllable duration as the dependent variable and Condition (Compound and List) and Age group (4-, 5-, and 6-year-olds and adults) as fixed effects. Subject and Item were entered as random intercepts and slopes. Figure 3 shows the mean duration of N1 in Compound and List for each age group.

Figure 3. Duration of N1 in Compound (N1 + N2) and List (N1, N2) with +/− 1SE (standard error) for each Age group.
The results (see Table 2) revealed significant main effects of Condition and Age group and a significant Condition $\times$ Age group interaction. See Supplementary Material S2 for parametric values of the model and Supplementary Material S3 for the results of the post hoc analysis. Results of normalized syllable duration can be found in Supplementary Material S4. Together, these results suggest that while all groups produced longer syllable duration in List (M = 579.25, SD = 175.52) than Compound (M = 377.43, SD = 118.18), all preschoolers (4-year-olds: M = 529.01, SD = 193.30; 5-year-olds: M = 556.21, SD = 197.87; 6-year-olds: M = 499.93, SD = 173.15) produced longer duration in List and Compound than adults (M = 381.23, SD = 109.59). The only difference between preschoolers was that 5-year-olds had a longer syllable duration in Compound than 6-year-olds.
Table 2. Statistical results of linear mixed-effect model in syllable duration. F-values, degrees of freedom (df), and p-values are provided

Note: Significant results are in bold. R code for the model: lmer(Duration ~ Condition * Age group + (Condition | Subject) + (Condition | Item)). *** p < .001.
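For readers less familiar with lme4 syntax, the model in the note above can be written out as follows. This is an illustrative expansion, assuming R’s default treatment (dummy) coding; the symbols are introduced here and do not appear in the original analysis. $C$ is the indicator for Condition, $A_k$ ($k = 1, 2, 3$) are the three indicators needed for the four age groups, $u$ and $w$ are the by-subject ($s$) and by-item ($i$) random intercepts and slopes, and $\varepsilon$ is the residual error.

$$ \mathrm{Duration}_{si} = \beta_0 + \beta_1 C_{si} + \sum_{k=1}^{3} \beta_{1+k} A_{k,s} + \sum_{k=1}^{3} \beta_{4+k}\, C_{si} A_{k,s} + \left(u_{0s} + u_{1s} C_{si}\right) + \left(w_{0i} + w_{1i} C_{si}\right) + \varepsilon_{si} $$

The interaction terms $\beta_{5}$ to $\beta_{7}$ capture how the Compound versus List difference varies across age groups.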
In sum, Mandarin-speaking preschoolers can use preboundary lengthening to contrast compounds and lists at age 4, though their productions are still not fully adult-like even by the age of 6, with a longer syllable duration in both compounds and lists than adults.
6.2. Pause
This section addressed H2a: whether Mandarin-speaking preschoolers can insert pauses in lists to distinguish them from compounds, and H2b: whether their pause insertion and duration are adult-like. Two mixed-effect models were fitted: the first examined the occurrence of pauses, and the second examined pause duration. Both models had Condition (Compound and List) and Age group (4-, 5-, and 6-year-olds and adults) as fixed effects. The pause occurrence model included a random intercept for Subject, and the pause duration model included a random intercept and slope for Subject.
The occurrence of pause is shown in Figure 4. The first model (see Table 3) detected significant main effects of Condition and Age group, but no significant interaction. See Supplementary Material S5 for estimated parameters of the model and Supplementary Material S6 for results of the post hoc analysis of Age group. The results showed that all groups were more likely to insert pauses in List (M = 95.23, SD = 10.42) than in Compound (M = 41.20, SD = 13.74), while preschoolers in all age groups (4-year-olds: M = 45.99, SD = 15.04; 5-year-olds: M = 45.06, SD = 12.92; 6-year-olds: M = 46.26, SD = 12.32) were more likely to insert pauses in Compound than adults (M = 31.91, SD = 9.30).

Figure 4. The occurrence of pause in Compound and List by preschoolers (4-, 5-, and 6-year-olds) and adults, with +/−1SE (standard error).
Table 3. Statistical results of pause occurrence. $\chi^2$, degrees of freedom (df), and p-values are provided

Note: Significant results are in bold. R Model: glmer(Pause occurrence ~ Condition * Age group + (1 | Subject), family = binomial). *** p < .001.
Figure 5 displays the pause duration in Compound and List in each Age group. The results (see Table 4) from the second model revealed significant main effects of Condition and Age group, as well as a significant Condition $\times$ Age group interaction. See Supplementary Material S7 for the estimated parameters of the model and Supplementary Material S8 for results of the post hoc analysis. Results of normalized pause duration can be found in Supplementary Material S9. Together, these results suggest that all groups produced a longer pause duration in List (M = 567.97, SD = 358.33) than in Compound (M = 118.04, SD = 71.03). Additionally, preschoolers (4-year-olds: M = 121.51, SD = 63.51; 5-year-olds: M = 152.90, SD = 108.08; 6-year-olds: M = 113.84, SD = 50.20) produced pause durations in Compound similar to those of adults (M = 86.62, SD = 19.05). In List, however, pause duration became more adult-like with age: 4-year-olds (M = 688.46, SD = 419.86) and 5-year-olds (M = 657.88, SD = 468.00) produced longer pauses than 6-year-olds (M = 515.09, SD = 331.75) and adults (M = 474.55, SD = 183.13), whereas 6-year-olds produced adult-like pause durations.

Figure 5. Pause duration in Compound and List by preschoolers (4-, 5-, and 6-year-olds) and adults, with +/−1 SE (standard error).
Table 4. Statistical results of pause duration. F-values, degrees of freedom (df), and p-values are provided

Note: Significant results are in bold. R Model: lmer(Duration ~ Condition * Age group + (Condition | Subject)). ** p < .01, *** p < .001.
In sum, the results showed that Mandarin-speaking preschoolers can use pauses to contrast compounds and lists at age 4, but their productions were not fully adult-like even at age 6, with more pauses in compounds as compared to those of adults. However, there was a developmental trend for pause duration, decreasing with age and becoming adult-like by age 6.
6.3. Pitch
This section addressed H3a: whether Mandarin-speaking preschoolers use preboundary pitch range expansion to disambiguate compounds and lists, as well as pitch height differences in T1, and H3b: whether their productions are adult-like. Two linear mixed-effect models were constructed: the first examined the use of preboundary pitch range expansion, and the second investigated pitch height in T1. Both models had Condition (Compound and List) and Age group (4-, 5-, and 6-year-olds, and adults) as fixed effects. By-Subject and by-Item intercepts as well as the slope for Condition were entered as random effects. Tone (T1, T2, and T4) was also entered in the first model as a fixed effect. The mean pitch range of each tone in different age groups is shown in Figure 6.

Figure 6. Pitch range of N1 in Compound and List in T1/T2/T4 by Age group, with +/− 1 SE (standard error).
The results (see Table 5) revealed significant main effects of Condition and Tone as well as significant Condition $\times$ Tone and Tone $\times$ Age group interactions. No other significant main effects or interactions were detected. See Supplementary Material S10 for the parametric values of the model. For the results of the post hoc analysis, see Table 6 and Supplementary Material S11. These results showed that the pitch range of the two contour tones was significantly more expanded in List (T2 (M = 7.04, SD = 3.13) & T4 (M = 10.81, SD = 4.38)) than in Compound (T2 (M = 5.58, SD = 2.61) & T4 (M = 6.52, SD = 2.62)), whereas no significant pitch range difference was found over the level T1 in Compound (M = 2.49, SD = 1.15) compared to T1 in List (M = 2.45, SD = 1.29). Additionally, only T2 produced by 4-year-olds (M = 7.92, SD = 3.92) showed a significantly broader pitch range than that by adults (M = 4.89, SD = 1.63). No other significant pitch range differences were found in other groups and lexical tones. This suggests that Mandarin-speaking 5- and 6-year-olds’ productions are adult-like.
Table 5. Statistical results of pitch range of N1. F-values, degrees of freedom (df), and p-values are provided

Note: Significant results are in bold. R Model: lmer(Pitch range ~ Condition * Tone * Age group + (1 | Subject) + (Condition | Item)). *** p < .001.
Table 6. Results of pairwise comparisons of Compound and List in different lexical tones

Note: Significant results are in bold. *** p < .001.
The mean pitch height in T1 can be found in Figure 7. See Supplementary Material S12 for the parametric values of the model. No significant main effects or interactions were detected. The results suggest that neither preschoolers nor adults produced T1 with significantly different pitch height in Compound versus List. There was also no significant pitch height difference among the groups (Table 7).

Figure 7. Pitch height in Compound and List in T1 by Age group, with +/− 1 SE (standard error).
Table 7. Statistical results of pitch height in T1. F-values, degrees of freedom (df), and p-values are provided

Note: R Model: lmer(Normalized pitch height ~ Condition * Age group + (Condition | Subject) + (Condition | Item)).
Taken together, the results indicated that Mandarin-speaking preschoolers can use preboundary pitch range expansion in T2 and T4 with no difference in pitch height on T1 for these two structures, like adults. Furthermore, their productions of pitch range and pitch height were in general adult-like at age 4, though the pitch range on T2 produced by 4-year-olds was broader than adults.
7. Discussion
This study examined whether Mandarin-speaking preschoolers (4- to 6-year-olds) can produce prosodic cues, including preboundary lengthening, pause, preboundary pitch range expansion, and pitch height, to disambiguate compounds and lists, and if so, whether their productions are adult-like. Our findings support our hypotheses that Mandarin-speaking preschoolers, like their non-tonal language-speaking peers, can employ preboundary lengthening and insert pauses to disambiguate compounds and lists, but even 6-year-olds produce longer syllable duration and insert more pauses than adults. Additionally, our findings support our predictions regarding pitch range, showing that Mandarin-speaking preschoolers have mastered preboundary pitch range expansion for different tonal categories, disambiguating compounds and lists, where the older preschoolers were adult-like. Finally, neither preschoolers nor adults produced a pitch height difference for T1 between compounds and lists.
Our durational results extend previous findings from English (Yuen et al., 2021), suggesting that 4-year-olds speaking a tonal language (Mandarin) have also already acquired the use of durational cues such as preboundary lengthening and pause to contrast compounds and lists, producing adult-like patterns for word boundary marking. However, Mandarin-speaking preschoolers tended to produce longer syllable durations than adults, indicating that their productions are not yet fully adult-like.
In addition to preboundary lengthening, Mandarin-speaking preschoolers can also insert pauses to disambiguate compounds and lists, although the duration of the pauses is not adult-like until age 6. These observations are in line with the performance of English-speaking preschoolers (Yuen et al., 2021). Beyond still-developing articulatory control, preschoolers’ longer pauses in lists might also reflect the ongoing development of working memory and planning skills. Liu et al. (2023) showed a negative correlation between pause frequency and verbal working memory in Mandarin-speaking 4- to 5-year-olds. Preschoolers need more time to select the correct words and plan speech due to limited cognitive resources in working memory, sometimes resulting in speech disfluencies. Therefore, Mandarin-speaking preschoolers could be producing longer pauses than adults due to a slower overall speaking rate and longer speech planning time. Similar to the children in Yoshida (2007) and Yuen et al. (2021), Mandarin-speaking preschoolers also inserted more pauses in compounds than adults, suggesting that they are less consistent in producing compounds as a single prosodic word.
Despite the potential challenge of simultaneously mapping pitch to both lexical and postlexical meanings, our results suggest that Mandarin-speaking preschoolers can implement appropriate preboundary pitch range expansion for dynamic tones (T2 and T4) to distinguish compounds and lists. However, neither children nor adults produced a pitch height difference for the level tone T1 to distinguish compounds and lists (see below for further discussion on this point). Child productions are also adult-like, with no significant group differences (except for T2 by 4-year-olds). This suggests that non-adult-like lexical tone productions do not appear to impede preschoolers’ ability to implement preboundary pitch range expansion to convey postlexical meaning. Taken together with Tang et al.’s (2019a) report that Mandarin-speaking 3-year-olds have acquired different tonal categories (even though the acoustic realizations are not yet fully adult-like), our findings indicate that acquiring tonal categories early might be sufficient for preschoolers to simultaneously map pitch to word-level and sentence-level meanings. Hence, our results suggest that Mandarin-speaking preschoolers can map pitch to multiple levels of linguistic structure.
Our results regarding pitch range expansion are also consistent with previous studies where no obvious changes were found for the level tone T1, but pitch range expansion was detected in dynamic tones before word boundaries (Lin, 1999; Shi & Wang, 2014; Wang et al., 2017; Xu et al., 2024). The absence of pitch changes in the level tone suggests that pitch is not a robust cue across lexical tones for word boundary marking in Mandarin. This, in turn, has implications for the role of pitch in boundary marking in other tonal languages.
7.1. Limitations
Although our results suggest that Mandarin-speaking 4-year-olds can already use durational cues and pitch range expansion to distinguish compounds from lists, they produce longer syllable durations and insert more pauses in compounds than adults. Future studies should include school-aged children to determine when their productions become adult-like.
While our results lend support to the claim that boundary cues trigger pitch range expansion in dynamic tones (Xu et al., 2024), we could not test this in the current study. Previous studies have shown cue-weighting among these prosodic cues for boundary marking, where pause insertion is more robust than preboundary lengthening (e.g., Zhang, 2012). Other studies have also pointed to the high correlation between prosodic cues and boundary strength in adults, such as longer pauses occurring with higher boundary strength (Cho, 2016; Frota, 2000; Horne et al., 1995; Kentner et al., 2023). These issues are worth re-examining in light of our findings.
8. Conclusion
This study is the first attempt to investigate whether Mandarin-speaking preschoolers can produce prosodic cues for word boundary marking to contrast compounds and lists and, if so, whether their productions are acoustically like those of adults. The results show that, similar to their non-tonal language-speaking peers, Mandarin-speaking preschoolers can use durational cues by age 4, though their productions are not yet adult-like: they produce longer durations and insert more pauses in compounds before age 6. We also found that pitch information does not appear to be a reliable cue for marking boundaries in Mandarin, raising questions about the multiple roles of pitch in other tonal languages. Finally, this study provides a baseline for examining the acquisition of prosodic cues by atypical populations, such as preschoolers with cochlear implants, who do not receive adequate pitch information due to device use and show delays in acquiring lexical tones (Tang et al., 2019b). It also raises questions regarding whether Mandarin-speaking preschoolers can perceive these prosodic cues to guide their identification and understanding of compounds versus lists.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000925000194.
Acknowledgements
We thank the Department of Linguistics and the Macquarie University Hearing Research Centre for their support. We also thank Ivan Yuen and Serje Robidoux for theoretical and statistical feedback and suggestions. We acknowledge the assistance of teachers at Tuofu, Doudou Wu, and Aishang kindergartens with data collection. This research was supported, in part, by a Macquarie University iMQRES scholarship to the first author, the National Social Science Fund of China (20CYY012) to the second author, and the Hearing Innovation Grant to the last two authors. The equipment was supported by the School of Foreign Studies at Nanjing University of Science and Technology.
Competing interests
The authors declare no potential competing interests.
Disclosure of use of artificial intelligence (AI) tools
No AI tools beyond spelling and grammar checking were used during the preparation of the manuscript.