Introduction
While most autistic children experience significant delays in the acquisition of language, a proportion of them do acquire language at a typical pace or with relatively modest delays (American Psychiatric Association [APA], 2013; Kim et al., 2014). Between three and five years of age, these autistic preschoolers pass all early language milestones (e.g., onset of preverbal vocalizations, canonical babbling, first words) and come to use phrase speech, like their typically developing (TD) peers. Despite appearing fully verbal on standardized measures of language, these autistic children may still struggle subtly with accurately selecting and combining linguistic elements (phonemes, words, morphemes, and phrases) to produce meaningful sentences and with delivering acoustically adequate speech. In clinical settings as well as in many scientific reports (e.g., Brynskov et al., 2016; Cleland et al., 2010; Peristeri et al., 2017), the language abilities of verbal autistic children are usually assessed using standardized assessments or highly controlled speech elicitation tasks (e.g., naming tasks). However, standardized assessments are probably too coarse a tool to identify the properties that may be specific to the speech of verbal autistic preschoolers who have followed what appears to be a typical trajectory of language development. Analyses of naturalistic language samples have been shown to capture the linguistic competence of young autistic children more accurately (Bacon et al., 2019).
In this paper, our aim is to describe the structural and acoustical specificities of the spontaneous speech of young verbal autistic children, collected during naturalistic interactions with a caregiver or a stranger.
Previous reports focusing on different areas of structural language skills, based on standardized assessments or naturalistic speech samples, have highlighted both strengths and weaknesses in the linguistic profiles of autistic children.
Phonology seems to be an area of relative strength for autistic children with functional expressive language. Although initial acquisition delays are documented, phonological skills are usually unimpaired, at least in autistic children with no or mild language disabilities (Cleland et al., 2010; Rapin et al., 2009; Shriberg et al., 2011; Wolk et al., 2016). However, most results were obtained from highly controlled standardized assessments and speech elicitation tasks and may therefore not reflect the phonological properties of spontaneous speech delivery.
Turning to the lexicon, relative to vocabulary-matched TD peers, autistic toddlers produce longer words and show a lesser tendency to acquire phonological neighbors (i.e., words that are highly phonologically similar) (Kover & Ellis Weismer, 2014). These two phenomena could be a consequence of a persistent use of echolalic speech in autism (Stiegler, 2015). If autistic children preferentially echo linguistic strings (usually drawn from cartoons, songs or idiomatic constructions frequently used by someone around them), their early word production should display fewer of the features that characterize early vocabulary in typical development: short words and high phonological resemblance. Another consequence of echolalia could be a more restricted, and thus less diverse, use of vocabulary items. Surprisingly though, lexical diversity has been found to be equal in a study comparing three- to six-year-old autistic children with younger TD children matched on nonverbal IQ and gender (Eigsti et al., 2007) and in a study comparing six- to 12-year-old autistic children with TD children matched on age, verbal IQ and expressive vocabulary during a narration task (Peristeri et al., 2017). These results suggest that expressive lexical skills are intact in verbal autistic children. It should be noted, however, that in a narration task, lexical production is guided by the images in the book, unlike in real-world situations such as a parent-child interaction. In addition, both Eigsti et al. (2007) and Peristeri et al. (2017) measured lexical diversity through the type-token ratio, which has been documented to be imperfect and highly sensitive to sample length variations (Fergadiotis et al., 2015).
The mastery of structural language also involves advanced morpho-syntactic skills to combine words into grammatical and meaningful sentences of increasing length. Syntactic complexity, as measured on manually coded transcriptions or standardized assessments, has been repeatedly shown to be lower in autistic than in TD children (Brynskov et al., 2016; Condouris et al., 2003; Eigsti et al., 2007; Park et al., 2012). However, Peristeri et al. (2017) did report that a subgroup of autistic children with high language abilities (based on verbal IQ and expressive vocabulary) displayed high syntactic complexity and did not differ from TD children.
Although acoustical atypicalities of autistic individuals’ speech are said to be evident early and to persist into adulthood regardless of language levels (Baltaxe & Simmons, 1985; DePape et al., 2012), objective acoustical studies of verbal autistic children’s voice report highly inconsistent results (see Fusaroli et al., 2017 for a meta-analysis). The most widely investigated measure of vocal atypicality is pitch (measured by fundamental frequency), and more precisely mean pitch and pitch range. Many studies report no group effects on mean pitch (Bonneh et al., 2011; Diehl & Paul, 2012, 2013; Diehl et al., 2009; Grossman et al., 2010; Lyakso et al., 2016; Nadig & Shaw, 2012; Nakai et al., 2014; Patel et al., 2020; Scharfstein et al., 2011) or pitch range (Diehl & Paul, 2012; Green & Tobin, 2009; Grossman et al., 2010; Hubbard & Trauner, 2007; Nakai et al., 2014; Patel et al., 2020; Scharfstein et al., 2011) between autistic and TD children.
Other studies, however, report higher mean pitch values (Filipe et al., 2014; Sharda et al., 2010) and wider pitch range (Bonneh et al., 2011; Diehl & Paul, 2013; Diehl et al., 2009; Filipe et al., 2014; Sharda et al., 2010) in autistic children in comparison to TD children. Despite somewhat inconsistent results and small to moderate effect sizes, autistic individuals’ atypical prosody seems to be generally characterized by a higher mean pitch and larger pitch range (Fusaroli et al., 2017, 2022). The same goes for another common measure of vocal atypicality: duration. Again, some studies report no group effects in speech duration (Hubbard & Trauner, 2007; Patel et al., 2020; Paul et al., 2008) or speech rate (Nadig & Shaw, 2012), while others indicate that duration is longer (Bonneh et al., 2011; Diehl & Paul, 2012; Filipe et al., 2014; Grossman et al., 2010) and speech rate is slower (Bonneh et al., 2011; Patel et al., 2020) in autistic children when compared to TD children.
While pitch and duration have received substantial attention in such acoustical descriptions of autistic children’s speech, other acoustical measures such as jitter, shimmer or formant frequencies (F1, F2, F3) have been studied far less. Jitter, an index of the cycle-to-cycle variation in the frequency of vocal cord vibration, and shimmer, an index of the cycle-to-cycle variation in the amplitude of the speech signal, are both acoustical measures of voice quality. Bone et al. (2014) found autistic children to have more jitter and jitter variability but found no difference in shimmer. As for autistic adults, two studies reported lower jitter and shimmer, and smaller F1-F3 dispersion, indicative of greater stability in both phonation and articulation in comparison to non-autistic adults (Kissine & Geelhand, 2019; Kissine et al., 2021).
An important confounding factor that can considerably influence the acoustic properties of speech, and that has not been controlled for very thoroughly in the studies cited above, is age. Age ranges are usually very large (median age range of cited studies: seven years), spanning long developmental periods and sometimes extending across childhood, adolescence, and adulthood (Diehl & Paul, 2012, 2013; Diehl et al., 2009; Green & Tobin, 2009; Hubbard & Trauner, 2007; Lyakso et al., 2016; Nadig & Shaw, 2012; Paul et al., 2008; Scharfstein et al., 2011). However, acoustical measures are strongly influenced by anatomical properties of the vocal tract, which differ between younger and older children. Moreover, verbal autistic preschoolers have received very little attention in the studies cited in this literature survey (only Bonneh et al., 2011 and Nakai et al., 2014 focus specifically on children younger than six).
In sum, even those verbal autistic preschoolers who have had little language delay are likely to display atypical speech delivery in terms of the structure and acoustics of their language. Verbal autistic preschoolers who communicate using phrase speech represent a relatively small proportion of the children on the spectrum (e.g., they represent 17% of the larger sample from which they were drawn for the present study; Maes et al., 2022; see Methods section). Such a linguistic profile is thus not very frequent in autistic preschoolers. The present study seeks to offer a thorough description of both the structure and acoustics of three- to five-year-old verbal autistic preschoolers’ language obtained from naturalistic spontaneous speech samples and compare it to the speech of TD children matched on age, nonverbal IQ and socioeconomic status. As mentioned above, analyses of naturalistic language samples capture the linguistic competence of autistic children more accurately than standardized assessments (Bacon et al., 2019), but are still relatively scarce in the literature. The linguistic description presented below is primarily qualitative, though highly detailed, and thus relies on a limited sample of both autistic and TD children. This study addresses several gaps in the literature by integrating measures of both structural language and acoustics, by restricting the age range to the preschool years, and by comparing the autistic children to a group of TD children strictly matched on variables known to impact language development: chronological age, nonverbal IQ and socioeconomic status. The limited sample size is thus counterbalanced by stringent matching and the depth of analyses. To the best of our knowledge, this is also the first description of French-speaking autistic preschoolers’ verbal abilities.
Based on the literature surveyed above, we expect verbal autistic children to perform less well than their TD peers on advanced aspects of structural language such as the lexicon and morpho-syntax. We also expect phonology to be a strength of autistic children. We expect autistic children’s speech delivery to be acoustically atypical, with higher fundamental frequency (F0) values and longer speech duration. We also expect their speech delivery to be less flexible than that of TD peers, as shown by lower jitter, shimmer, and formant dispersion.
Methods
Participants
Participants were ten verbal autistic and ten TD children. The autistic children had all received a formal diagnosis of Autism Spectrum Disorder (ASD) from a multi-disciplinary team before entering the study (or shortly after). The TD children had no known neurodevelopmental condition. All children were enrolled in French-speaking schools. Most children in both groups were raised in monolingual French-speaking settings. Some of the children (two in the ASD group and three in the TD group) were exposed to other languages (i.e., Lingala, Kinyarwanda, Dutch, Spanish, and Arabic) at home. However, these five children were reported to use French as a primary language with at least one of their parents and to be exposed to French more frequently than to their second language.
The children in the ASD group were drawn from a larger sample (n = 59) of three- to five-year-old autistic children who participated in a broader project on early linguistic development in autism. That project included many children with minimal spoken language abilities and was designed to target a critical window for speech emergence in autism, between three and less than six years of age (Anderson et al., 2007). The autistic children described in this paper were identified as the most verbal children of the larger sample (Maes et al., 2022) and were selected on the basis that, like the TD children, they were administered module 2 of the Autism Diagnostic Observation Schedule (ADOS-2; Lord et al., 2012) and not module 1. The ADOS-2 is a gold-standard semi-structured evaluation for the diagnosis of autism; module 1 requires no or little speech and module 2 requires phrase speech. The TD children were also drawn from a larger sample of children (n = 39) but were selected so that they would be matched pairwise with the autistic children on chronological age, nonverbal IQ (as measured with the Leiter International Performance Scale-Third Edition; Leiter-3; Roid et al., 2009) and socioeconomic status.
Following the workflow of Bang et al. (2020), we conducted the pairwise matching with propensity scores, which summarize several variables into a single score. We implemented the matching in R (R Core Development Team, 2019) using the matchit function from the MatchIt package (Ho et al., 2007) with the “nearest neighbor” method. This method selected the ten TD participants (out of the initial 39) whose propensity scores were closest to those of the autistic participants, one TD participant per autistic participant. The output of the matching procedure is therefore two groups of participants (TD and autistic) matched pairwise on chronological age, nonverbal IQ, and socioeconomic status.
Table 1 shows the descriptive statistics on a series of characteristics and confirms that the two groups are properly matched on the selected matching variables. It also shows that, incidentally, the two groups do not differ on verbal IQ (as measured with the Peabody Picture Vocabulary Test-Revised; PPVT-R; Dunn & Dunn, 2007) either. The two groups are also balanced in gender ratio (ASD: six boys, four girls; TD: seven boys, three girls).
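As an illustration of the matching logic only, the following Python sketch performs greedy one-to-one nearest-neighbor matching without replacement on precomputed propensity scores. It is not the authors' R code: the function name, participant identifiers and scores are hypothetical, and the processing order and tie-breaking used by MatchIt may differ from this simplified version.

```python
def nearest_neighbor_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping participant id -> propensity score.
    Returns a dict mapping each treated id to its matched control id.
    Assumption: participants are processed in the order given.
    """
    available = dict(controls)  # controls not yet matched
    pairs = {}
    for t_id, t_score in treated.items():
        # Pick the unmatched control whose score is closest to this one
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]  # without replacement
    return pairs


# Hypothetical propensity scores, for illustration only
asd_scores = {"A1": 0.62, "A2": 0.35}
td_scores = {"T1": 0.30, "T2": 0.60, "T3": 0.90}
print(nearest_neighbor_match(asd_scores, td_scores))
```

In this toy example, the third control (T3) is left unmatched, just as 29 of the 39 TD children were not retained in the matched sample.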
Autistic symptomatology severity is measured with ADOS-2 comparison scores (from 1 to 10, with higher scores indicating more severe symptomatology), nonverbal IQ with the Leiter-3, and verbal IQ with the PPVT-R. Socioeconomic status is based on parents’ economic and educational background and ranges from 0 to 19, with higher scores indicating higher socioeconomic status. Analyses of variance (ANOVAs) were used to assess between-group differences.
Language sample
Language samples were retrieved from two elicitation contexts: a parent-child free play and the administration of the ADOS-2, an experimenter-child semi-structured play (see Table 2). Both interactions were recorded using a Tascam DR-05 recorder located approximately 30 cm away from the child. The recording of one parent-child interaction is missing in the ASD group because we were unable to organize the play session with the parent.
The audio recordings were segmented and transcribed using Praat (Boersma & Weenink, 2013). For each audio recording, a TextGrid was created and divided, initially, into three tiers (see Fig. 1). In the target tier, the children’s verbal productions (i.e., word approximations, words, two-word combinations, full phrases) were isolated and labelled as “in” if they did not significantly overlap with any background noise or voices, and as “overlap” if they did. Utterance segmentation was driven by a pause-based rule: productions were segmented as independent utterances as soon as they were separated by a pause greater than 250 ms. Each segmented utterance was transcribed in a broad phonetic transcription using symbols of the Speech Assessment Methods Phonetic Alphabet (SAMPA) (Wells, 1995) in the phono tier and then converted into an orthographic transcription in the ortho tier.
Next, the phone segmentation function of EasyAlign was applied to each TextGrid to compute word, syllable, and phone segmentations in three new tiers (see Fig. 1). With the help of two trained master’s students in linguistics, we manually checked those three new tiers to correct any timing errors.
Finally, the first 110 complete, intelligible, clear utterances of each child in each recording context were coded for morpho-syntactic inflections in the mlu tier (see Fig. 1). False starts, interruptions, and unintelligible utterances were excluded. These utterances were coded following the conventions for the coding of French grammatical inflections established by Thordardottir (2005). A detailed description of the coding of grammatical inflections as well as examples are provided in the Supplemental material.
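The pause-based rule can be sketched as follows. This is a minimal Python illustration, not the Praat procedure actually used: it assumes that speech intervals are available as (start, end) pairs in milliseconds, and the function name and example values are hypothetical.

```python
PAUSE_THRESHOLD_MS = 250  # threshold stated in the text


def segment_utterances(intervals, threshold=PAUSE_THRESHOLD_MS):
    """Group (start_ms, end_ms) speech intervals into utterances.

    Consecutive intervals are merged into one utterance unless they
    are separated by a pause strictly greater than `threshold`.
    """
    utterances = []
    for start, end in sorted(intervals):
        if utterances and start - utterances[-1][1] <= threshold:
            # Pause too short to split: extend the current utterance
            prev_start, prev_end = utterances[-1]
            utterances[-1] = (prev_start, max(prev_end, end))
        else:
            # First interval, or pause longer than the threshold
            utterances.append((start, end))
    return utterances


# Hypothetical intervals: a 100 ms pause, then a 500 ms pause
print(segment_utterances([(0, 500), (600, 1000), (1500, 2000)]))
```

Here the first two intervals are merged because the pause between them is only 100 ms, while the 500 ms pause starts a new utterance.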
Procedure
The study reported in this paper was part of a larger four-session experiment on early linguistic development in autism. During the first session, parents were asked to take part in the free play with their child. During the second session, children took part in the ADOS-2 with the lab neuropsychologist, who holds an ADOS-2 certification. During the third and fourth sessions, they were administered the Leiter-3 and the PPVT-R, respectively. All sessions also included eye-tracking experiments unrelated to this paper.
Ethical approval was obtained for the entire experiment from the ULB-Erasme Ethics committee in accordance with the Declaration of Helsinki (approval code: P2018/499/B406201837514). Parents signed a written informed consent form for their child to participate in the study. The children were also asked for oral assent.
Structural language measures
Structural language abilities were assessed using three measures: phonetic inventories, lexical diversity, and Mean Length of Utterance (MLU) in morphemes.
Phonetic inventories were used as a measure of the phonemes acquired by the children and were extracted from the phono tier of each child in each context. Full phonetic inventory corresponded to the percentage of phonemes represented in the child’s productions over the total number of phonemes in the French sound system. Consonant inventory corresponded to the percentage of consonants represented in the child’s productions over the total number of consonants in the French sound system. Vowel inventory corresponded to the percentage of vowels represented in the child’s productions over the total number of vowels in the French sound system. For a phoneme to be included within a child’s inventory, it had to be produced at least three times by the child, in any position (see Chenausky et al., 2017). Phonemes that were produced as substitutions for other phonemes were also included. A detailed description of how the phonetic inventories were built is provided in the Supplemental material.
Lexical diversity was used as a measure of the extent of the vocabulary used by each child. Lexical diversity is a measure of the number of different words used (types) out of the total number of words uttered (tokens). The list of function and content (common and proper) words and word approximations produced by each child in each context was extracted from the word tier and manually converted into their lemmatic form. A detailed description of the word lemmatization procedure as well as examples are available in the Supplemental material. Lexical diversity was measured using the Moving-Average Type-Token Ratio (MATTR), which was computed in R (R Core Development Team, 2019) with the textstat_lexdiv function of the quanteda package. MATTR is resistant to sample length variations because it calculates the lexical diversity of a sample using a moving window that measures type-token ratios for each successive window of fixed length (Covington & McFall, 2010; Fergadiotis et al., 2015). The mean of the successive type-token ratios is the measure of lexical diversity of the language sample. The moving window was fixed to 100 words because it had to be smaller than the smallest sample in the dataset, and one TD child produced only 111 word tokens during the parent-child interaction (see Suppl. Table 4).
MLU in morphemes was computed as a measure of the morpho-syntactic complexity of each child’s speech. The list of morphemes produced by each child in each context was extracted from the mlu tier. Because the play is likely to be still under preparation and a child can feel a bit uncomfortable at the beginning of an evaluation, which may delay the onset of the conversation (especially for autistic children), the first ten utterances produced by each child in each context were discarded. MLU was computed as the total number of morphemes produced divided by the total number of included transcribed utterances, namely 100. It is important to note that three TD children produced fewer than 110 utterances during the parent-child interaction (namely, 27, 61 and 83 utterances). MLU can still be computed even though these children did not reach the cut-off of 110 utterances, as it represents the mean number of morphemes per utterance. For these children, however, the first ten utterances were not discarded as doing so would have reduced their data even more. Statistical analyses were conducted with and without those three children, and changes in the results are reported.
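The inventory computation can be illustrated with a short Python sketch. It assumes that a child's productions are available as a flat list of SAMPA phoneme symbols; the function name and the four-phoneme reference set are hypothetical and stand in for the full French sound system, which is described in the Supplemental material.

```python
from collections import Counter


def inventory_percentage(produced, reference, min_count=3):
    """Percentage of `reference` phonemes counted as acquired.

    A phoneme is counted as acquired if it occurs at least `min_count`
    times in `produced`, regardless of its position in the word.
    """
    counts = Counter(produced)
    acquired = {p for p in reference if counts[p] >= min_count}
    return 100 * len(acquired) / len(reference)


# Toy reference set (NOT the real French inventory)
toy_reference = {"a", "i", "p", "t"}
toy_productions = ["a"] * 3 + ["i"] * 2 + ["p"] * 5
print(inventory_percentage(toy_productions, toy_reference))  # 50.0
```

Only "a" and "p" reach the three-occurrence criterion here, so half of the toy reference set is counted as acquired. Restricting `reference` to consonants or to vowels yields the consonant and vowel inventories.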
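To make the moving-window computation concrete, here is a minimal Python sketch of MATTR as described above. It is an illustration rather than the quanteda implementation, whose handling of edge cases may differ; the function name and the toy token list are hypothetical.

```python
def mattr(tokens, window=100):
    """Moving-Average Type-Token Ratio.

    Computes the type-token ratio of every successive window of
    `window` tokens (shifted by one token each time) and returns
    the mean of those ratios.
    """
    if len(tokens) < window:
        raise ValueError("sample is shorter than the window")
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy lemmatized sample with a window of 2 tokens:
# windows are (a, a), (a, b), (b, c) -> ratios 0.5, 1.0, 1.0
print(mattr(["a", "a", "b", "c"], window=2))
```

Because every window has the same length, a long sample and a short sample are compared on equal terms, which is why the window must not exceed the smallest sample (111 tokens in this study).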
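The MLU rule, including the exception for children with fewer than 110 utterances, can be sketched in Python as follows. The input format (one morpheme count per utterance, in production order) and the function name are assumptions made for illustration.

```python
def mlu_in_morphemes(morphemes_per_utterance, discard=10, keep=100):
    """Mean Length of Utterance in morphemes.

    If at least `discard + keep` utterances are available, the first
    `discard` are dropped and the next `keep` are averaged. Otherwise
    all available utterances are averaged, with nothing discarded.
    """
    if not morphemes_per_utterance:
        raise ValueError("no utterances to average")
    if len(morphemes_per_utterance) >= discard + keep:
        sample = morphemes_per_utterance[discard:discard + keep]
    else:
        sample = morphemes_per_utterance
    return sum(sample) / len(sample)


# Hypothetical child with 110 utterances: the first ten (1 morpheme
# each) are discarded, the remaining 100 have 3 morphemes each
print(mlu_in_morphemes([1] * 10 + [3] * 100))  # 3.0
```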
Acoustical measures of speech
Specificities of the speech of verbal autistic children during the parent-child interaction and the ADOS-2 were assessed using different acoustical measures: syllable duration, fundamental frequency (F0), jitter, shimmer and a dispersion index of the first two formants (F1, F2). All acoustical analyses were conducted in Praat (Boersma & Weenink, 2013).
In order to measure syllable duration, the duration of all V, CV, VC and CVC syllables (V = vowel, C = consonant) was extracted from the syllable tier of each child in each context by subtracting the start time of each syllable from its end time. This analysis yielded a syllable sample of 37,017 tokens: 18,475 in the group of TD children and 18,542 in the group of autistic children.
Voice quality measures, however, were computed at the vowel level. In each audio file, each stretch of sound that corresponded to an interval containing a vowel in the phone tier was extracted as a single sound object. For each of these vowel-level sound objects, start time, duration, median F1, median F2, median F0, F0 range, local jitter, and local shimmer were computed and exported to the output file. F1, F2 and F0 measures as well as jitter and shimmer were extracted in a time window beginning at 25% and ending at 75% of the duration of the vowel, to obtain measures at the point where the vowel is most stable and least influenced by coarticulation. Furthermore, median F0 was log-transformed prior to all analyses detailed below.
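The following Python sketch illustrates two ingredients of this step: the 25% to 75% analysis window, and the usual definition of a local perturbation measure (mean absolute difference between consecutive cycles divided by the mean value), which corresponds to local jitter when applied to glottal periods and to local shimmer when applied to cycle amplitudes. It assumes that the periods or amplitudes have already been estimated; Praat's own cycle detection is not reproduced, and all names and values are hypothetical.

```python
def middle_window(start, duration, lower=0.25, upper=0.75):
    """Return the (start, end) times of the central part of a vowel."""
    return start + lower * duration, start + upper * duration


def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value.

    Applied to glottal periods this gives local jitter; applied to
    cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical vowel starting at 2.0 s and lasting 4.0 s
print(middle_window(2.0, 4.0))  # (3.0, 5.0)

# Hypothetical glottal periods in seconds
print(local_perturbation([0.010, 0.011, 0.010, 0.011]))
```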
Finally, an F1-F2 dispersion index was generated for each vowel of each participant (see Kissine & Geelhand, 2019), with higher values indicating less articulatory stability. First, for each participant, the mean value of median F1 ($T_{mF1}$) and median F2 ($T_{mF2}$) was computed for each type of vowel. Then, the Euclidean distance between the median F1 ($V_{F1}$) and median F2 ($V_{F2}$) of each vowel of each participant and the mean values of median F1 and median F2 of the corresponding vowel type was calculated as follows to generate the dispersion index:

$$\text{Dispersion index} = \sqrt{(V_{F1} - T_{mF1})^2 + (V_{F2} - T_{mF2})^2}$$
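A minimal Python sketch of this computation for a single participant is given below. It assumes each vowel token is represented as a (vowel type, median F1, median F2) triple with formants in Hz; the function name and the formant values are hypothetical.

```python
import math
from collections import defaultdict


def dispersion_indices(vowels):
    """F1-F2 dispersion index of each vowel token of ONE participant.

    vowels: list of (vowel_type, median_f1, median_f2) tuples.
    Returns, in the same order, the Euclidean distance between each
    token and the mean (F1, F2) of its vowel type.
    """
    # Accumulate the sum of F1, the sum of F2 and the count per type
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for vowel_type, f1, f2 in vowels:
        entry = totals[vowel_type]
        entry[0] += f1
        entry[1] += f2
        entry[2] += 1
    means = {v: (s1 / n, s2 / n) for v, (s1, s2, n) in totals.items()}
    return [
        math.hypot(f1 - means[v][0], f2 - means[v][1])
        for v, f1, f2 in vowels
    ]


# Two hypothetical tokens of /a/: their mean is (703, 1204), so each
# token lies at a distance of sqrt(3**2 + 4**2) = 5 Hz from it
print(dispersion_indices([("a", 700, 1200), ("a", 706, 1208)]))
```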
Acoustical measures of speech can be strongly influenced by overlapping noises and voices. All utterances labelled as “in” in the target tier were listened to by the second author who specified whether she agreed that those utterances were free of any background noises. Only vowels extracted from utterances labelled as “in” by both the first and second author were included in subsequent analyses, yielding a vowel sample of 11,946 tokens: 5,923 in the TD group and 6,023 in the ASD group.
Analytic plan
All statistical analyses were implemented in R (R Core Development Team, 2019). Between-group differences on phonetic inventories, lexical diversity and MLU were investigated using simple linear regressions with the lm function from the stats package. Inventory type (consonant vs. vowel), Group (ASD vs. TD), and Age (in months) were used as independent variables. Between-group differences on syllable duration and all voice quality measures were investigated using multilevel linear regressions with the lmer function from the lme4 package (Bates et al., 2015). Group (ASD vs. TD) was the independent variable.
Tables reporting the outputs of all stepwise comparisons of multilevel linear regressions on the acoustical measures of speech are available as Supplemental material (see Suppl. Table 6 and Suppl. Table 7).
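For readers unfamiliar with stepwise model comparisons, the Python sketch below shows how a p-value is obtained from the chi-square statistic of a likelihood-ratio test between two nested models that differ by one parameter (here, the fixed effect of Group). It relies on the fact that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)). The function name is hypothetical, and the sketch does not reproduce the model fitting done with lme4.

```python
import math


def lrt_p_value_df1(chi_square):
    """p-value of a likelihood-ratio test with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), where X follows a
    chi-square distribution.
    """
    if chi_square < 0:
        raise ValueError("the statistic must be non-negative")
    return math.erfc(math.sqrt(chi_square / 2))


# The statistic reported in the Results for syllable duration
print(round(lrt_p_value_df1(3.54), 2))  # 0.06
```

A statistic of about 3.84 corresponds to the conventional .05 threshold, which is why a value of 3.54 falls just short of significance.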
Results
Structural language: phonetic inventory
A simple linear regression model on the full phonetic inventory showed that the effect of group did not reach statistical significance (p = .3) when controlling for total recording length. Autistic children did not differ from TD children in the proportion of phonemes from the French sound system represented in their productions (see Table 3).
Subsequently, a simple linear regression on the proportion of acquired phonemes with the Group x Inventory type interaction as fixed effect and total recording length as controlling variable revealed no significant effect of Group or Group x Inventory type interaction (both p > .1). However, it did reveal a significant main effect of Inventory type (F(1, 35) = 144.26, p < .001). Vowel inventories were significantly larger than consonant inventories in both groups (β = 1.19 × 10⁻¹, p < .001). Further analyses showed that this effect was driven by the fact that vowel inventories were complete for all children in both groups, but consonant inventories were not (see Table 3).
Structural language: lexical diversity
A simple linear regression model on the MATTR index with Group as fixed effect reached significance (F(1, 37) = 8.73, p = .005, R² = .19). Further analyses showed that the MATTR index was significantly lower in the ASD group than in the TD group (β = -.04, p = .005), indicating that verbal autistic children had lower lexical diversity than TD children matched on age, nonverbal IQ and socioeconomic status (see Fig. 2 and Table 3). Interestingly, the between-group difference remained significant (β = -.03, p = .01) even when controlling for MLU in morphemes. That is, even when differences in grammatical abilities are taken into account, autistic children still use a less diverse range of vocabulary than TD children. Finally, we verified whether there was an effect of Age in both groups. Neither the simple effect of Age nor the interaction with Group reached significance (both p > .6).
Structural language: MLU in morphemes
A simple linear regression model with Group as fixed effect showed a trend towards a statistically significant difference between autistic and TD children (F(1, 37) = 4.13, p = .05, R² = .08), with children in the ASD group having slightly lower MLU in morphemes than children in the TD group (β = -.67, p = .05) (see Table 3). The exclusion of the three TD participants who did not produce 110 utterances did not substantially change the results of the regression (F(1, 34) = 4.05, p = .05, R² = .08). We checked whether there was an effect of Age in both groups. Again, neither the simple effect of Age nor the interaction with Group reached significance (both p > .3).
Acoustics: syllable duration
Between-group differences in syllable duration were investigated using stepwise comparisons of multilevel linear regression models with by-participant and by-syllable-structure random intercepts. Results showed that the addition of Group as fixed effect did not significantly improve the model fit but did tend towards significance (χ² (1) = 3.54, p = .06). Children in the ASD group had numerically longer syllable durations (see Table 3).
Acoustics: voice quality measures
Between-group differences on the four voice quality measures (log-transformed median F0, F0 range, jitter and shimmer) were each investigated using multilevel linear regression models with by participant and by vowel type random intercepts. Stepwise comparisons of models revealed an absence of group effect on log-transformed median F0 (p = .66), on F0 range (p = .97), on jitter (p = .91) and on shimmer (p = .43). Voice quality, as assessed by median F0, F0 range, jitter and shimmer, was similar across groups (see Table 3).
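For orientation, jitter and shimmer are cycle-to-cycle perturbation measures of the glottal period and of the peak amplitude, respectively. The sketch below uses the common "local" definition (mean absolute difference between consecutive cycles divided by the mean value). It is an assumption that this variant matches the one used here, since this section does not specify the variant or the extraction software; the function names are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_abs_diff / mean_value


def local_jitter(periods_s):
    """Local jitter from consecutive glottal periods (in seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Local shimmer from consecutive per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

A perfectly periodic voice would yield a value of 0 on both measures; larger values indicate more cycle-to-cycle irregularity.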
Acoustics: F1-F2 dispersion index
Between-group differences were investigated using stepwise comparisons of multilevel linear regression models with by participant and by vowel type random intercepts. The addition of group as fixed effect, as compared to the null model, did not improve the fit of the model (p = .55). The F1-F2 dispersion index was similar in autistic and TD children (see Table 3).
Discussion
Relying on measures of structural language and acoustic analyses, we compared naturalistic speech samples of highly verbal three- to five-year-old autistic children with those of TD children, matched pairwise on chronological age, nonverbal IQ and socioeconomic status, and groupwise on verbal IQ and gender. Overall, the results show that those autistic children who have developed high verbal skills by preschool age present few remaining language disabilities. Like TD peers, they communicate using phrase speech, make an almost complete use of the sounds of their language, and show typical patterns of voice acoustics (on measures of F0, formants and voice quality).
The main remaining weakness in verbal autistic children’s language highlighted in our results was that they produced a less diverse range of vocabulary items in their spontaneous speech than TD children. This is inconsistent with previous research (Eigsti et al., Reference Eigsti, Bennetto and Dadlani2007; Peristeri et al., Reference Peristeri, Andreou and Tsimpli2017), which found no differences in lexical diversity between autistic and TD children. However, children in Peristeri et al. (Reference Peristeri, Andreou and Tsimpli2017) were older (school age) and may have managed to catch up with an initial deficit in lexical diversity, reaching levels of age-matched TD children. Autistic children in Eigsti et al. (Reference Eigsti, Bennetto and Dadlani2007) are directly comparable to those in this study in terms of age, but were matched on nonverbal IQ and gender with younger TD children. As a result, the lack of between-group difference in lexical diversity in that study could be due to the age difference.
The fact that our two groups are also comparable in verbal IQ (as measured by the PPVT-R) makes the between-group difference in lexical diversity all the more interesting. That is, at overall comparable levels of verbal and linguistic proficiency, young autistic children still appear to be more restricted in their spontaneous use of the lexicon. As mentioned in the introduction, this aspect of young verbal autistic children’s language could reflect their persistent use of repetitive and stereotyped speech – a diagnostic characteristic of many autistic children’s speech (Kim et al., Reference Kim, Paul, Tager-Flusberg, Lord, Volkmar, Rogers, Paul and Pelphrey2014; Lord et al., Reference Lord, Rutter, DiLavore, Risi, Gotham and Bishop2012). If young autistic preschoolers are restricted in their spontaneous use of vocabulary by their production of repetitive and stereotyped speech, it is plausible that their lexical diversity may approach that of younger TD children’s speech (Eigsti et al., Reference Eigsti, Bennetto and Dadlani2007), even if they already have a more fluent use of phrase speech. For example, verbal autistic children may use (contextually accurate or not) delayed echolalia, i.e., verbatim repetition of linguistic chunks extracted from the speech of adults around them or from sources such as songs, cartoons, or TV programs. Subsequently, as they start breaking down their echoed utterances to generalize their use to spontaneous self-generated sentences that can be used in a variety of contexts, verbal autistic children may eventually catch up with TD children in terms of lexical diversity (Peristeri et al., Reference Peristeri, Andreou and Tsimpli2017). It makes sense to speculate, but only speculate at this stage, that a diminished lexical diversity could be a characteristic of verbal autistic children who are (still) relatively echolalic and stereotyped, and not a global characteristic of verbal autistic individuals’ speech.
Although the difference was not statistically significant, it is worth noting that morpho-syntactic complexity was somewhat lower for autistic children than for TD children. Although potentially influenced by our limited sample size, this result is consistent with the overall results from the literature that verbal autistic children produce utterances that are less complex in terms of morpho-syntax in comparison with TD children (Brynskov et al., Reference Brynskov, Eigsti, Jørgensen, Lemcke, Bohn and Krøjgaard2016; Condouris et al., Reference Condouris, Meyer and Tager-Flusberg2003; Eigsti et al., Reference Eigsti, Bennetto and Dadlani2007; Park et al., Reference Park, Yelland, Taffe and Gray2012), and finds a straightforward explanation in the fact that morpho-syntactic abilities are advanced language abilities. In that sense, autistic preschoolers may still be somewhat behind in that area, but may eventually catch up with TD children (Peristeri et al., Reference Peristeri, Andreou and Tsimpli2017).
Quite surprisingly, although we expected to find between-group differences in voice acoustics based on the surveyed literature, a large meta-analysis (Fusaroli et al., Reference Fusaroli, Lambrechts, Bang, Bowler and Gaigg2017), and our own listening to the recordings, none of the chosen acoustical measures differentiated the two groups.
While our expectations for group differences in jitter, shimmer or formant dispersion were based on limited literature, mean pitch and pitch range (as measured by F0) were identified by Fusaroli et al. (Reference Fusaroli, Lambrechts, Bang, Bowler and Gaigg2017) to be discriminating acoustical measures of the speech of autistic vs. TD individuals. However, those two measures did not prove sufficient to differentiate the children from the two groups in this study. This could be explained in several ways. First, consistent with our findings, many studies also report null results on pitch range and/or mean pitch (Bonneh et al., Reference Bonneh, Levanon, Dean-Pardo, Lossos and Adini2011; Diehl & Paul, Reference Diehl and Paul2012; Diehl et al., Reference Diehl, Watson, Bennetto, McDonough and Gunlogson2009; Green & Tobin, Reference Green and Tobin2009; Grossman et al., Reference Grossman, Bemis, Skwerer and Tager-Flusberg2010; Hubbard & Trauner, Reference Hubbard and Trauner2007; Nakai et al., Reference Nakai, Takashima, Takiguchi and Takada2014; Patel et al., Reference Patel, Nayar, Martin, Franich, Crawford, Diehl and Losh2020; Scharfstein et al., Reference Scharfstein, Beidel, Sims and Rendon Finnell2011). Second, our study, unlike others, targets a small age range, where significant changes in the vocal tract are unlikely to happen. Significant differences, especially in pitch variation, may be influenced by a confounding effect of age in studies with large age ranges. A third explanation for the lack of significant differences in pitch measures is that the two groups of children are extremely well matched, and our study therefore targets a very specific population.
Finally, syllable duration was found to be slightly higher in autistic children in comparison with TD children, but not in a statistically significant way. This is consistent with Bonneh et al. (Reference Bonneh, Levanon, Dean-Pardo, Lossos and Adini2011), who found utterance and word duration to be higher in young autistic children than in TD children. The difference in statistical significance between our study and theirs may be explained by sample size differences, by the robust matching of our groups or by the level of analysis (namely, word and utterance vs. syllable). While differences in syllable duration may be small, words and utterances are composed of several syllables, so these small differences accumulate and widen the duration gap between productions. Further, syllable duration may also vary within phrases in a person’s speech. For instance, autistic Cantonese-speaking adults were found to produce longer syllables than non-autistic peers in phrase-final position (Franich et al., Reference Franich, Wong, Yu and To2021).
In conclusion, a proportion of children diagnosed with autism are able to achieve high levels of language by preschool both in terms of structure and acoustics. The few remaining atypicalities seem to lie in a restricted use of different vocabulary items, a somewhat diminished morpho-syntactic complexity, and a slightly longer syllable duration. We proposed earlier that verbal autistic preschoolers’ lexical and morpho-syntactic difficulties could be consequences of a delayed acquisition of language or the persistent use of echoed and stereotyped speech. However, neither delays in language acquisition nor the use of echoed and stereotyped speech could explain why these children’s speech delivery is slightly slower than that of TD peers (based on syllable duration). Atypical prosody is considered a persistent characteristic of autistic individuals (DePape et al., Reference DePape, Chen, Hall and Trainor2012) and should therefore be found among most autistic individuals; syllable duration may be one of the most robust indicators of the peculiarities of the speech of autistic individuals.
Limitations and future directions
Despite what could be considered a limited sample size (i.e., ten children per group), the high language-level resemblance in the ASD group and the robust matching with the TD group strongly suggest that our results are robust. Mottron (Reference Mottron2021) recently argued that systematically favoring large sample size over within-group resemblance carries the risk of inflating noise levels in research on ASD, especially as increasing autism prevalence goes hand in hand with growing heterogeneity in individual characteristics. We believe that the high resemblance within our ASD group compensates for its low sample size, giving additional power to the reported results. Autistic children in this sample are indeed quite few, but they are prototypical of their population subgroup, i.e., highly verbal autistic preschoolers.
While certainly needed in the future, acoustical investigations of larger samples of autistic children’s speech should be cautious about their sampling and matching procedure as well as the age of the participants. Furthermore, longitudinal approaches could help disentangle whether persistent language difficulties in the speech of verbal autistic individuals are consequences of a delayed and slower language acquisition process or rather represent autism-specific language peculiarities. Finally, annotated naturalistic speech samples (like the ones described in the present paper) offer infinite analytical possibilities. For instance, taking the temporal and conversational dimensions into account (and, potentially, combining them with acoustical analyses) is a fruitful avenue for further research.
Acknowledgments
We would like to thank Morgane Colin, the lab neuropsychologist, for the help with recruitment and testing, and Béatrice Busson and Florence Merken, linguistics master students, for the help with data segmentation. We are also deeply grateful to all the children who took part in this study as well as their parents.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000923000417.
Competing interest
The authors declare none.
Funding statement
At the time the research was carried out, Pauline Maes was supported by a doctoral grant from the ROGER DE SPOELBERCH Foundation and Mikhail Kissine was a 2019–2022 Francqui Foundation Research Professor. Pauline Maes is now a postdoctoral researcher supported by an Excellence of Science (EOS) grant. Marielle Weyland is supported by a doctoral grant from the Marguerite-Marie Delacroix Support Fund.