Introduction
Infant-directed speech (IDS) is a type of register characterized by higher pitch, exaggerated prosody, simplified structure, longer pauses, and slower speech rate than adult-directed speech (ADS), among other distinctive features (e.g., Fernald et al., 1989; Kuhl et al., 1997). There are now over 50 years of research supporting the idea that IDS plays an important role in language acquisition (e.g., Karzon, 1985; Kemler Nelson et al., 1989; Shneidman & Goldin-Meadow, 2012; Snow & Ferguson, 1977; Trainor & Desjardins, 2002; Weisleder & Fernald, 2013). Within this body of research, numerous laboratory studies have demonstrated the benefits of various characteristics of IDS for language acquisition (e.g., Graf Estes & Hurley, 2013; Kempe et al., 2003; Ma et al., 2011; Mintz, 2003; Thiessen et al., 2005). For instance, under controlled laboratory conditions, it has been reported that IDS, relative to ADS, facilitates word segmentation (Thiessen et al., 2005), word recognition (Singh et al., 2009), and word learning (Graf Estes & Hurley, 2013).
Studies outside of the laboratory have also found correlations between caregivers’ use of IDS and child language outcomes, including vocabulary acquisition (e.g., Ramírez-Esparza et al., 2014; Shneidman et al., 2013; Shneidman & Goldin-Meadow, 2012; Weisleder & Fernald, 2013) and speech perception abilities (e.g., Trainor & Desjardins, 2002). A meta-analysis also found evidence that speech conforming more to the prosodic characteristics of IDS correlates with infants’ attention and with lexical development (Spinelli et al., 2017).
In tandem with this body of literature showing that exposure to IDS promotes language learning, there is a parallel body of literature showing that young infants prefer listening to IDS over ADS. The basic finding – that young infants prefer IDS – has been replicated across infants of different ages and language backgrounds (Cooper & Aslin, 1994; Cooper et al., 1997; Fernald, 1985; Hayashi et al., 2001; Kitamura & Lam, 2009; Newman & Hussain, 2006; Pegg et al., 1992; Santesso et al., 2007; Singh et al., 2002; Werker & McLeod, 1989), and is supported by a meta-analysis (Zettersten et al., 2024). However, a number of studies also report that the IDS preference (Newman & Hussain, 2006; Robertson et al., 2013) and the benefits of IDS during word learning (Ma et al., 2011) may begin to decrease with age. The goal of the present study is thus to systematically examine whether or not a preference for IDS (rather than exposure to IDS) predicts later language outcomes using a large, multi-lab sample of linguistically diverse infants.
To our knowledge, only one study at the time of writing had addressed this question directly. That study found that individual preferences for IDS over ADS between 6 and 12 months of age predict expressive language outcomes at 18 months, at least in typically developing infants (siblings of children with autism did not show this association; Droucker et al., 2013). The current study built on this finding with a larger and more diverse sample, which allowed us to more accurately measure the predictive effect of IDS preference and, with sufficient power, to examine additional questions about how other factors like age of testing influence this relationship.
A link between preference for IDS and language development may indicate a simple causal relationship: infants who have a greater preference for IDS fare better because their attention is drawn to the signal that is best matched to their learning needs. A wide variety of studies and theoretical papers suggest that IDS provides particularly rich language-learning opportunities, highlighting, for example, syntactic (e.g., Mintz, 2003; Soderstrom et al., 2008), morphological (e.g., Kempe et al., 2003), phonetic (e.g., Kuhl et al., 1997; Trainor & Desjardins, 2002; Werker et al., 2007; but see, e.g., Martin et al., 2015, for counterevidence), timbral (Piazza et al., 2017), and prosodic properties (e.g., Kemler Nelson et al., 1989). Other findings suggest that IDS might serve as a cue to infants that a speaker is a potential teacher or social partner who will provide learning opportunities (Begus et al., 2016). For example, infants prefer to look at a person who previously used IDS over a person who used ADS (Schachner & Hannon, 2011), and a speaker’s prior use of IDS was critical for eliciting infants’ subsequent gaze-following towards an object (Senju & Csibra, 2008).
It could therefore be argued that children who are able to preferentially focus on IDS as compared to ADS effectively enhance their exposure to the most appropriate type of input for language learning. However, more complex relationships might also be at play, as infants’ preferences may be driven by their experiences with IDS. For example, exposure to IDS may enhance a pre-existing but small early preference for IDS over ADS (as supported by findings that even newborn infants prefer IDS; e.g., Cooper & Aslin, 1990). As familiarity with this speech register increases, the infant develops greater interest in IDS, which leads to more attention to this kind of input. Given the diversity of individual experiences in exposure to IDS (e.g., Weisleder & Fernald, 2013), a relation between IDS preference and language outcome may thus be intimately connected with experience. Indeed, in children with hearing loss, preferences for IDS are more closely tied to the child’s hearing age than to their chronological age (Robertson et al., 2013). Additionally, the quality and quantity of IDS may be correlated with other beneficial aspects of infants’ environments, including social factors such as attachment. Although it may be difficult to disentangle the relative influence of infants’ underlying preferences for, versus experience with, IDS, determining whether IDS preference is indeed associated with language development presents an important starting point.
In order to directly address this question, the current project leveraged the unique opportunity afforded by the ManyBabies 1 project. In the ManyBabies 1 project (The ManyBabies Consortium, 2020), 67 laboratories contributed data from 2,329 infant participants, aged between 3 and 15 months and learning 13 languages, on their relative preference for samples of North American English IDS and ADS. The current study builds on the primary ManyBabies 1 project by assessing language development in the same participants whose IDS preferences were tested as infants, at two later age points. Evidence for the facilitative effects of IDS is most robust in lexical learning; thus we elected to measure our participants’ vocabulary size, using parental report data (Communicative Development Inventories – CDI; Fenson et al., 2007) collected at two time points commonly used for measuring productive vocabulary: 18 and 24 months. A productive measure was used for greater comparability across the 18- and 24-month time points. The 18–24 month age range is a time of considerable diversity in toddlers’ vocabulary size, characterized by a rapid rate of growth (Frank et al., 2017b; Ganger & Brent, 2004; McMurray, 2007). Data from toddlers at both 18 and 24 months therefore allow us not only to establish whether there is a relation between IDS preference and vocabulary acquisition generally, but also to elucidate whether the magnitude of this relation varies or remains constant throughout early language development. In total, 21 labs participated in this follow-up study, yielding a sample of N = 341 infants at 18 months and N = 327 at 24 months who contributed data on their preference for IDS and their later vocabulary size.
This sample is much larger than those that can typically be gathered by a single laboratory, which will allow more statistical power to measure the potential relation between IDS preference and vocabulary size.
In addition to querying the overall strength of this relation and how it changes over development, the characteristics of the sample allow us to address whether the relation between IDS preference and vocabulary is influenced by the child’s age at the time of data collection (or the chronological distance between collection of the two measures). Recent findings suggest that the importance of IDS may decrease over development. In one study, the amount of speech with IDS-like characteristics diminished from 24 to 33 months, and speech with fewer IDS-like characteristics, but not speech with more IDS-like characteristics, was associated with greater vocabulary at 33 months (Ramírez-Esparza et al., 2017). Thus, we might expect a smaller or less reliable relation with vocabulary development when IDS preference is measured at older ages.
Finally, the ManyBabies 1 sample is linguistically diverse, with participating labs testing infants learning 13 different languages, often including multiple language varieties (e.g., American, Canadian, British, and Australian English). This diversity can, to some extent, begin to address the impact of the over-representation of North American English in child language research. Numerous studies point to IDS as a cross-linguistic phenomenon (Blount, 1972, 1984; Blount & Padgug, 1976, 1977; Englund & Behne, 2006; Farran et al., 2016; Fernald & Morikawa, 1993; Fernald & Simon, 1984; Fernald et al., 1989; Grieser & Kuhl, 1988; Katz et al., 1996; Kitamura et al., 2001; Morikawa et al., 1988; Niwano & Sugai, 2002; Newman, 2003; Papoušek et al., 1987; Shute & Wheldall, 1995; Zeidner, 1983).
Infants’ preference for IDS is also a cross-linguistic (and even cross-modal; Masataka, 1996) phenomenon (Cooper & Aslin, 1994; Cooper et al., 1997; Fernald, 1985; Hayashi et al., 2001; Kitamura & Lam, 2009; Newman & Hussain, 2006; Pegg et al., 1992; Santesso et al., 2007; Singh et al., 2002; Werker & McLeod, 1989). However, there is ample evidence that North American IDS is particularly extreme in its characteristics (e.g., Fernald et al., 1989). Indeed, prosodic differences in IDS registers have been implicated as a source of difference in lexical learning between infants exposed to North American versus British English (Floccia et al., 2016). In the ManyBabies 1 project, slightly less than half of participating labs were from North America. We therefore used the cross-linguistic diversity of this sample to investigate the relation between specific linguistic experience, IDS preference (at least to North American IDS), and eventual vocabulary outcomes. This latter analysis is necessarily tentative in nature because linguistic experience/community is confounded with the measurement of vocabulary, given that many linguistic communities have their own language- or dialect-specific version of the CDI to measure vocabulary size, and because all language communities were tested on their “IDS preference” using North American English stimuli.
Ultimately, we were able to recruit sizeable samples of infants learning North American English, British English, and other languages, which allowed us to test our hypotheses with groups outside of North American English contexts.
In sum, the ManyBabies 1 project, which gathered data on preference for IDS from 2,329 infants, gave us the opportunity to follow up with N = 467 of these infants (N = 341 at 18 months and N = 327 at 24 months) by assessing their productive vocabulary size. These data allowed us to ask the following three research questions:
1. To what extent does infants’ preference for IDS as measured in a laboratory setting predict their vocabulary at 18 and 24 months?
2. Does the relation between IDS preference and vocabulary size change over development?
3. Are there systematic differences in the strength of this relation across the language communities in our sample (exploratory)?
Method
Participants
Participants were a subsample of the primary ManyBabies 1 dataset, based on participating laboratories’ interest and ability to collect the follow-up CDI data from their participants. Only monolingual infants (90% or more exposure to the primary language based on parental report or via a detailed questionnaire depending on the laboratory) were included. A total of 21 laboratories (9 North American, 4 UK, 2 German, 1 New Zealand English, 1 Dutch, 1 Korean, 1 French, 1 Norwegian, 1 Swiss German) collected follow-up data, with a minimum sample of 10 infants per laboratory. In addition, three other laboratories initially signed up but withdrew, one because of insufficient participant interest combined with many of the participants not maintaining monolingual status, and two because of author miscommunication. We asked that laboratories not impose any additional eligibility restrictions on their CDI collection beyond those of the primary study. The final sample consisted of 467 infants (228 North American English, 76 UK English, 163 other languages/dialects) who contributed data at 18 months (N = 341) and/or 24 months (N = 327). In total, the final sample consisted of 668 CDI contributions (333 North American English, 92 UK English, and 243 other languages/dialects).
Additional participants who were part of the initial sample of CDI measures were excluded for the following reasons: CDI data from 88 infants were collected when infants were outside of the target age range, 123 infants did not complete at least one pair of IDS and ADS trials to provide an IDS score, and 2 infants were excluded as the participating laboratories reported unusable and/or incomplete CDI data. See Table 1 for more information about the demographics and distribution of the participants.
Data
The data used in our analysis came from two sources. First, we made use of the ManyBabies 1 primary dataset, which can be found at https://github.com/manybabies/mb1-analysis-public. Details about the creation of this dataset can be found in the ManyBabies 1 published study (The ManyBabies Consortium, 2020), and information about the conceptual and methodological relevance of the ManyBabies Project can be found in Frank et al. (2017a). In brief, the ManyBabies 1 dataset contains basic participant information (e.g., age in days, gender), and looking time data comparing interest to IDS and ADS. Three looking time paradigms were used across the laboratories: Headturn Preference Procedure, Central Fixation, and Eyetracking. Only data from the participants from laboratories contributing to the CDI follow-up, and for whom CDI data were collected, were included in the current analysis. Note that in the ManyBabies study, all infants, regardless of their language background, were tested on the same set of stimuli recorded in North American English. Stimuli were recorded from mothers speaking to their infant (aged 4–8 months; i.e., IDS) and separately to an experimenter (i.e., ADS) about a set of objects. Clips from these recordings were then subjected to a selection process based on naïve raters’ judgments of the extent to which they sounded infant-directed vs. adult-directed, as well as other characteristics that were controlled for (e.g., naturalness, noisiness). Selected clips were also controlled for other characteristics such as object labels and speaker identity. These clips were then combined to create 16 total test trials of 18 s each.
The visual stimulus used in the Central Fixation and Eyetracking protocols, and by some of the labs using the Headturn Preference Procedure, was a colourful checkerboard pattern. The other Headturn Preference Procedure labs used lights or other visual displays.
Second, we collected four different types of CDI data. In the North American Primary Sample, we collected data from North American participants using webcdi, a web-based version of the MacArthur-Bates Communicative Development Inventories (Fenson et al., 2007). For this sample, data were collected at 18 months (16–20 months) and 24 months (22–26 months), and standardized scores were used in the analysis. In cases where standardized scores were unavailable, raw scores were used, and the age ranges were therefore narrowed to 17.5–18.5 and 23.5–24.5 months. In the UK Primary Sample, data were collected using an online version of the Oxford CDI (Hamilton et al., 2000). In the “Other Language/Dialects Primary Sample” we collected data from non-North American and non-UK language communities. For this sample, each laboratory selected the CDI that best matched their language community. Lastly, we allowed for the contribution of “samples of convenience” because some laboratories had specific policies already in place regarding the collection of CDI data for their participants at the time of testing (i.e., concurrently with the experimental test of IDS preference) or at other times than those specified in the primary collection protocol. Given the diversity of these latter two samples and insufficient details at the time of pre-registration, we planned to conduct exploratory analyses on these data. We conducted specific exploratory analyses for the “Other Language/Dialects Primary Sample”, but there were insufficient data collected from samples of convenience to conduct analyses. Instructions provided to contributing laboratories can be found here: https://osf.io/t9mk5.
Analysis and results
Data preparation
See https://osf.io/7z4u6 for the original Stage 1 registered analysis. Full details of our analysis pipeline can be found at https://github.com/manybabies/mb1-cdi-followup. A cloud-based “Docker” image to facilitate result reproducibility is also available at https://github.com/manybabies/mb1-cdi-followup/blob/master/Docker%20instructions.md. Each participating laboratory provided a spreadsheet with laboratory and participant ID codes, and summary vocabulary scores at 18 and/or 24 months for each infant. By-item data were also collected but are not included in the analyses. For the North American sample, standard percentile scores using the Fenson et al. (2007) norms were used for the preregistered analysis. For any other language communities, we planned to use raw scores (i.e., number of vocabulary words) if no standardized scoring was available. We used this approach for the UK preregistered analysis. However, ultimately we were able to obtain percentile scores for all of the samples as described below in the exploratory analyses. Laboratory and participant ID codes were used to match each infant with their mean looking times (mean preference to IDS and ADS samples) in the ManyBabies 1 dataset, as well as gender, age-in-days at test, and testing protocol (Headturn Preference, Central Fixation, or Eyetracking). A new variable “standardized mean preference for IDS” was created. To calculate this variable, looking time to each ADS trial was subtracted from its paired IDS trial to create a raw difference score. As a result, we excluded any trial pairs in which there were missing data based on the ManyBabies 1 criteria for trial inclusion. A mean difference score was then calculated for each child. This mean difference score was then divided by the mean looking time across both ADS and IDS trials for each infant, to control for differences in looking time due to methodological and age-related factors.
This score could vary from 1.6 (indicating a complete IDS preference) to -1.6 (indicating a complete ADS preference). In the analyses that follow, we make the assumption that the “standardized mean preference for IDS” can be used in a continuous fashion to represent the degree of preference to IDS for a given infant. We acknowledge that there are limitations to this assumption.
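The calculation of the “standardized mean preference for IDS” can be illustrated with a minimal Python sketch. This is not the project's analysis code (which is written in R and available in the repository cited above). The function name, the use of None to mark a missing trial, and the choice to compute the mean looking time over the retained pairs only are assumptions made for illustration. The final example assumes looking times bounded by a 2 s minimum and the 18 s trial length, which would reproduce the stated limits of ±1.6; the 2 s minimum is an assumption here, not a value given in this section.

```python
from statistics import mean

def standardized_ids_preference(trial_pairs):
    """Return the standardized mean IDS preference for one infant.

    trial_pairs: list of (ids_seconds, ads_seconds) tuples; None marks a
    missing trial (hypothetical encoding chosen for this sketch).
    """
    # Keep only pairs in which both the IDS and the ADS trial are present.
    complete = [(i, a) for i, a in trial_pairs if i is not None and a is not None]
    if not complete:
        return None  # no usable pair, so no IDS score for this infant
    # Mean of the per-pair raw difference scores (IDS minus ADS).
    mean_diff = mean(i - a for i, a in complete)
    # Mean looking time across all retained IDS and ADS trials (assumption:
    # only trials from complete pairs enter the denominator).
    mean_looking = mean(t for pair in complete for t in pair)
    return mean_diff / mean_looking

# One infant with three pairs, one of which has a missing ADS trial.
example = standardized_ids_preference([(12.0, 8.0), (10.0, None), (6.0, 4.0)])

# Extreme case under the assumed 2 s minimum and 18 s maximum looking times.
extreme = standardized_ids_preference([(18.0, 2.0)])
```

In the first example the two retained pairs give a mean difference of 3 s and a mean looking time of 7.5 s, so the score is 0.4; positive values indicate longer looking during IDS trials.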
Power analyses
Although there is no equivalent study on which to conduct a power analysis, the one study that has looked directly at the relation between preference for IDS and vocabulary size found a correlation of r = .504 for children without siblings with autism (Droucker et al., 2013). For the current study, we conducted a prospective power analysis and set the smallest effect of interest to a more modest r = .3, which would account for approximately 10% of the variance in the vocabulary size. A comparable level of effect size has been reported in studies investigating the relation between infants’ segmentation skills at 7.5 months and their productive vocabulary size at 24 months (Singh et al., 2012). A power analysis using the pwr package (Champely, 2015) in the R programming language (R Core Team, 2017) showed that a minimum of 84 infants would be necessary to detect a main effect of this size with a power threshold of 80% and alpha at 5% (see Footnote 1). Although our final sample was larger than 84 for all three of our main samples, there may be a reduction in power due to interactions involving the IDS preference and the greater model complexity. To address this concern, we pre-registered that we would conduct power checks during the analysis phase (see original Stage 1 registered report). Unfortunately, this was not possible due to singularity issues. Removing significant effects based on models that have singularity issues would not produce reliable results and thus we excluded this power analysis from the Stage 2 registered report. Instead, we conducted a sensitivity analysis (see Supplement 1).
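As a rough cross-check of the sample-size figure, the power to detect a correlation can be approximated with the Fisher z transformation. The sketch below is an illustration in Python using only the standard library; it is not the pwr computation reported above, and the function name is hypothetical. Because it relies on a normal approximation, its result may differ slightly from the value produced by pwr.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_power_correlation(r, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation r with n infants.

    Uses the Fisher z approximation: atanh(r_hat) is treated as normal with
    standard error 1 / sqrt(n - 3). This is an approximation, not the
    method implemented in the pwr package.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = atanh(r) * sqrt(n - 3)           # expected test statistic under r
    # Probability of exceeding the critical value in either tail.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)

# Smallest effect of interest (r = .3) at the reported minimum sample size.
power_at_84 = approx_power_correlation(0.3, 84)

# Share of vocabulary variance associated with r = .3 (r squared).
variance_explained = 0.3 ** 2
```

Under this approximation, 84 infants give power of about .80 for r = .3, in line with the figure reported above, and r = .3 corresponds to r² = .09.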
Confirmatory statistical models
Some necessary deviations from our original analytic plan were implemented. Please see Supplement 2 for details.
For our primary confirmatory analyses, we applied a series of mixed-effects models (one for the North American primary sample, one for the UK primary sample, and one combined across the two samples) using the lme4 package (Bates et al., 2015) in the latest version of the R programming language (4.2.3) available at the time of completing the analysis (R Core Team, 2017). The dependent variable of the models was the productive vocabulary score of each child in the CDI data (for the North American sample this is the standardized score, whereas raw scores were used for the UK sample). The models included the following predictors as fixed effects (note that the grand mean is interpreted at the reference levels for all binary variables and at the average levels for all continuous variables in the model):
ids_pref: Standardized mean preference for IDS (described above) as a centered continuous variable. The intercept of this factor represents the CDI value for the grand mean of ids_pref.
test_age: Age (in months) at time of IDS preference testing, entered as a centered continuous variable. The intercept of this factor represents the CDI value for the grand mean of test_age.
cdi_age: Age (in months) at which the CDI measure was taken, entered as a centered continuous variable. The intercept of this factor represents the CDI value for the grand mean of cdi_age.
gender: Male/female as an effect coded fixed factor to test for effects of gender. The intercept of this factor represents a hypothetical CDI value where gender is neither male nor female.
protocol: The testing protocol used to assess IDS preference (3 levels: central fixation, eye-tracking, and head-turn preference), entered as a deviation coded factor. The intercept of this factor represents the difference between CDI values for the mean of each level (e.g., central fixation) and the grand mean of all levels.
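The predictor coding described in this list can be sketched in a few lines of Python. This is an illustration only: the analysis itself was run in R, and the helper names, the +1/-1 values for effect coding, and the choice of which level receives the -1 codes are assumptions, since the text does not specify them.

```python
from statistics import mean

def center(values):
    """Center a continuous predictor on its mean (as for ids_pref, test_age, cdi_age)."""
    m = mean(values)
    return [v - m for v in values]

def effect_code(labels, positive, negative):
    """Effect-code a two-level factor as +1 / -1 (assumed values)."""
    codes = {positive: 1, negative: -1}
    return [codes[label] for label in labels]

def deviation_code(labels, levels):
    """Deviation (sum-to-zero) coding for a factor with k levels.

    Produces k - 1 columns; the last entry of `levels` is coded -1 in every
    column (an assumed convention), so each coefficient compares one level
    with the grand mean of all levels.
    """
    *coded_levels, last = levels
    rows = []
    for label in labels:
        if label == last:
            rows.append([-1] * len(coded_levels))
        elif label in coded_levels:
            rows.append([1 if label == lvl else 0 for lvl in coded_levels])
        else:
            raise ValueError(f"unknown level: {label}")
    return rows

protocols = ["central fixation", "eye-tracking", "head-turn preference"]
centered_age = center([17.0, 18.0, 19.0])
gender_codes = effect_code(["female", "male"], positive="female", negative="male")
protocol_codes = deviation_code(protocols, levels=protocols)
```

With centered continuous predictors and sum-to-zero factor codes, the model intercept corresponds to the predicted CDI score at the mean of each continuous predictor and averaged over the factor levels.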
In order to keep the models to a manageable level of complexity, we restricted the interaction terms to those that could be motivated theoretically. The main factor of interest, the effect of IDS preference, may be conditioned by age at time of IDS testing, age when CDI was taken, or the testing protocol used to test IDS preference. The model therefore included two-way interactions between ids_pref and test_age, ids_pref and cdi_age, and ids_pref and protocol, as well as main effects of ids_pref, test_age and cdi_age. The factor gender was included to address known gender differences in vocabulary size, and therefore entered only as a main effect. In addition, lab was entered as a random factor in order to control for variance between the participating laboratories. This is particularly important given the allowed variation in methodology across laboratories in the original ManyBabies study, even within a protocol. The resulting starting model for each of the NAE and UK primary samples had the following structure:
As noted above, we preregistered two criteria that this mixed-effects model needed to meet and for which the model was simplified if necessary. First, the model had to reach convergence. To achieve convergence, we iteratively simplified the random effects structure of the model by sequentially removing random slopes for lab, starting with the highest order interaction terms that explained the least amount of random variance (Barr et al., 2013). The final pruned model for the NAE sample was:
And the final pruned model for the UK sample was:
In both finalized models, we included a random intercept for participants to handle repeated measurements, because some participants completed the CDI twice. The lmerTest R package (Kuznetsova et al., 2017) was used to run the model using Type III sums of squares for consistency with the original ManyBabies 1 study.
Our second criterion involved a power calculation which we were unable to complete and was therefore not implemented (see above).
In addition, our pre-registered analysis plan included a Kappa test on the possibility of collinearity between test_age and cdi_age. A ‘c’ number higher than 10 from the Kappa test would result in residualizing test_age against cdi_age to allow the use of both in our models. We carried out the Kappa test and found a value of 3.26, suggesting that we did not violate the multicollinearity assumption and can include both age at CDI test and age at IDS test in the same planned mixed-effect models.
Separate NAE and UK models
A summary of the NAE and UK models can be found in Table 2. In the NAE model, one of the predictors in the model was statistically significant: child’s CDI age. In the UK model, there were two significant effects: the main effect of age at CDI test and the main effect of age at time of IDS preference testing. Neither the main effect of ids_pref (Research Question #1) nor the interaction(s) of ids_pref with age (Research Question #2) were significant. In addition, we ran a preregistered Bayesian analysis to probe the strength of the evidence in favor of the null effects for our research questions (see Supplement 2). Bayes factors ranged between .87 and 1.04, which did not reach our established threshold (.33) of support for the null hypothesis. These were calculated using the brms package in R (version 2.18.0; Bürkner, 2017).
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Combined NAE and UK model
For the third research question, we ran a third analysis parallel to the first two, but that combined across the North American and UK samples, and included a new variable:
Dialect: NA/UK as an effect coded fixed factor
For this analysis, only North American infants in the more restricted 17.5-18.5 and 23.5-24.5 age ranges were included, and proportional scores (raw score divided by the total number of items in the CDI) were used, to ensure greater comparability across the samples. The initial model fitted to the data had the following structure:
After pruning, the final model was:
As in the NAE and UK models, we included a random intercept for participants because of repeated CDI measurements (see Table 2 for a summary).
As with the individual models, we found a significant main effect of age at the time when CDI was collected. We also found a significant main effect of dialect, with UK infants showing a higher vocabulary proportional score than North American infants. Finally, we found a significant main effect of gender in which females have a higher proportion of vocabulary than males. However, as with the individual models, we did not find any significant effect of the IDS preference nor any significant interaction between IDS preference and other factors, including dialect (Research Question #3).
The calculated Bayes factor for the main effect of IDS preference was .75. For the IDS preference and age of IDS test interaction it was .35, and for the interaction between dialect and IDS preference it was .37. These did not quite reach our established threshold (.33) of support for the null hypothesis.
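The comparison of these Bayes factors against the threshold can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: it assumes the reported factors express evidence for the effect relative to the null (so that small values favor the null), and the upper bound (the reciprocal of .33) is an assumption added here for symmetry. The function name `interpret_bf` is hypothetical.

```python
def interpret_bf(bf, null_threshold=0.33):
    """Classify a Bayes factor (effect vs. null) against a preset threshold.

    null_threshold is the value quoted in the text (.33). Using its
    reciprocal as the bound for the alternative is an assumption of this sketch.
    """
    if bf <= null_threshold:
        return "supports null"
    if bf >= 1.0 / null_threshold:
        return "supports alternative"
    return "inconclusive"


# Bayes factors reported for the combined model
for label, bf in [("IDS preference", 0.75),
                  ("IDS preference x age at IDS test", 0.35),
                  ("IDS preference x dialect", 0.37)]:
    print(label, "->", interpret_bf(bf))
```

Under these assumptions all three reported values fall in the inconclusive band, which matches the statement that none reached the threshold for supporting the null.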
Exploratory analysis
Next we conducted an analysis including all of the data across all 21 laboratories to test our hypotheses with a larger and more diverse sample. At the time of preregistration, it was unknown if we would have enough non-English laboratories to conduct additional analyses, so the following analysis was not preregistered and should be considered exploratory. In addition to the NAE and UK English-speaking laboratories, we included data from German (including Swiss German), Dutch, French, Norwegian, and Korean-speaking laboratories, as well as an additional English sample from New Zealand.
For this exploratory analysis, we took a different approach from our confirmatory analysis, in two primary ways. First, we generated normed data for all of our datasets using a process described in Frank et al. (Reference Frank, Braginsky, Yurovsky and Marchman2017b), rather than relying on proportional scores. This approach better accounts for differences across the instruments and languages and is more sensitive to age effects within a sample (see below). Second, we used a beta regression model to better capture the structure of the percentile scores, which are bounded between 0 and 1 and are not normally distributed.
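To make the motivation for beta regression concrete, the sketch below writes out the log-likelihood of a single observation under a mean-precision parameterization with a logit link, which is a common formulation of beta regression. It is an illustration only: the function names, the rescaling of percentiles by 100, and the example values are assumptions, not details taken from the study.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the real line to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def percentile_to_unit(percentile):
    """Rescale a percentile rank (1..99) to the open interval (0, 1).

    Dividing by 100 is an assumption of this sketch.
    """
    return percentile / 100.0


def beta_loglik(y, mu, phi):
    """Log-density of y in (0, 1) under Beta(mu * phi, (1 - mu) * phi).

    mu is the mean and phi the precision; larger phi means less spread.
    """
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


# Hypothetical child at the 80th percentile, evaluated under two predicted means
y = percentile_to_unit(80)
print(beta_loglik(y, inv_logit(1.5), 10.0))   # predicted mean near 0.82
print(beta_loglik(y, inv_logit(-1.5), 10.0))  # predicted mean near 0.18
```

Because the density is defined only on the open interval (0, 1), predictions can never fall outside the range of possible percentile scores, which a linear model with normally distributed errors would not guarantee.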
To create the normed data that could be compared across instruments, vocabulary score norms for German, Dutch, French, Norwegian, Korean, British and North American English were retrieved from WordBank (Frank et al., Reference Frank, Braginsky, Yurovsky and Marchman2017b; retrieved April 14th, 2022). The vocabulary score norms from the countries’ participating labs were divided by age for each CDI instrument. Norming data for all instruments, except Norwegian (for which this process had already been conducted for a prior study), were collected using the child’s age in months, as this is how they are reported in the WordBank system. Given the rapid expansion of vocabulary during the period from 18-24 months, we wanted our norms to capture a more granular level of analysis at the level of individual days. Therefore, a quantile regression was performed for each country’s norming data. This was done using 1 percent quantiles with the R function gcrq from the package quantregGrowth (Muggeo, Reference Muggeo2023) and followed the procedure introduced in Kartushina et al. (Reference Kartushina, Mani, Aktan-Erciyes, Alaslani, Aldrich, Almohammadi, Alroqi, Anderson, Andonova, Aussems, Babineau, Barokova, Bergmann, Cashon, Custode, de Carvalho, Dimitrova, Dynak, Farah and Mayor2022). This process resulted in rankings from 1 to 99 for each infant age, in days, thereby controlling for vocabulary size differences attributable to infants’ sex, age and language. Each of these rankings interpolated data from months to days by dividing each month by the average length of a month in days (30.457 days). Raw scores from our participants were then compared to the raw score derived from the norms to the closest age in days. The column containing a CDI score with the closest value to our participant’s score was assigned as the participant’s percentile ranking. Participants outside the age range used for the CDI were removed from further analysis.
To adjust for the fact that reporting age in months in the norms would have been centered on the middle of the month, 15 days were added to the reported age of each child when comparing to the norms. For the Norwegian data, no adjustment was used since the data were originally collected in days. IDS preference and test_age were also z-transformed to more easily interpret the estimates.
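The lookup step described above (closest age in days, then the percentile column whose score is closest to the child's raw score) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's code: the norm table is represented as a dictionary from age in days to a list of 99 scores, the table contents are invented toy values, and the names `percentile_for` and `toy_norms` are hypothetical.

```python
def percentile_for(child_age_days, raw_score, norms, shift_days=15):
    """Return the percentile rank (1..99) for a raw CDI score, or None.

    norms maps an age in days to a list of 99 scores, one per percentile.
    shift_days is the mid-month adjustment described in the text; pass 0 for
    data whose norms were already collected in days (e.g., Norwegian).
    """
    target_age = child_age_days + shift_days
    ages = sorted(norms)
    if not ages[0] <= target_age <= ages[-1]:
        return None  # outside the normed age range: excluded from analysis
    closest_age = min(ages, key=lambda a: abs(a - target_age))
    scores = norms[closest_age]
    # Index of the percentile column whose score is closest to the raw score
    best = min(range(len(scores)), key=lambda i: abs(scores[i] - raw_score))
    return best + 1


# Invented toy norms: at each age, the p-th percentile score grows with age
toy_norms = {age: [p * (age - 500) / 10 for p in range(1, 100)]
             for age in range(540, 571)}

print(percentile_for(545, 120, toy_norms))                 # age shifted to 560
print(percentile_for(600, 120, toy_norms))                 # out of range
print(percentile_for(560, 120, toy_norms, shift_days=0))   # no adjustment
```

Returning `None` for out-of-range ages mirrors the exclusion rule stated in the text, so such cases can be filtered out before modeling.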
daily_percentile: The percentile vocabulary score normed to each language at the specific age in days for each child.
z.IDS_pref: Standardized mean preference for IDS (described above) as a centered continuous variable. The intercept of this factor represents the CDI value for the grand mean of ids_pref.
z.age_months: Age (in months) at time of IDS preference testing, entered as a centered continuous variable. The intercept of this factor represents the CDI value for the grand mean of test_age.
CDI.agerange: Age (in months) at which the CDI measure was taken, entered as a factor variable. The intercept of this factor represents the CDI value at 18 months old.
gender: Male/female as an effect coded fixed factor to test for effects of gender. The intercept of this factor represents a hypothetical CDI value where gender is neither male nor female.
method: The testing protocol used to assess IDS preference (3 levels: central fixation, eye-tracking, and head-turn preference), entered as a deviation coded factor. The intercept of this factor represents the difference between CDI values for the mean of each level (e.g., central fixation) and the grand mean of all levels.
nae: TRUE/FALSE coded fixed factor to test for effects of North-American English. The intercept of this factor represents FALSE.
For the model analysis, a beta regression was conducted using the function glmmTMB in the package of the same name (1.1.2; Brooks et al., Reference Brooks, Kristensen, Van Benthem, Magnusson, Berg, Nielsen, Skaug, Machler and Bolker2017). To evaluate model assumption of multicollinearity the function vif was used (R package car; version 3.0-12; Fox et al., Reference Fox, Weisberg, Price, Adler, Bates, Baud-Bovy and Bolker2019). A “full-null” model paradigm was used to account for the large number of viable possible models that could be used given our data (Forstmeier & Schielzeth, Reference Forstmeier and Schielzeth2011). Overdispersion was also investigated by checking that the dispersion parameter for the full model was not above 1.
Full Model:
Null Model:
The full-null model comparison was performed with the function anova (R package lmtest; version 0.9.40; Zeileis & Hothorn, Reference Zeileis and Hothorn2002).
Across all the labs, 668 data points were collected. Forty-three observations were removed from this initial sample because of incomplete data: specifically, no percentile could be computed because the child’s age fell outside the age range of the relevant CDI instrument. The final sample consisted of 625 data points from 447 unique infants, collected in 21 labs using 3 different methods.
There was no evidence of collinearity among the predictors (maximum VIF was 1.17), and no evidence of overdispersion (dispersion parameter = 0.5366). The likelihood ratio test of the full-null model comparison provided no evidence that the interactions of IDS preference with the other factors are associated with the daily percentile vocabulary scores (χ2 = 6.612, df = 6, p = 0.358). We therefore performed another model comparison in which IDS preference did not enter into any interactions, as seen below.
Full Model (No Interactions):
Null Model (No Interactions):
There was no evidence of overdispersion in this new full model without interactions (dispersion parameter = 0.532). The full-null model comparison again provided no evidence that including IDS preference in the model significantly improved model fit (χ2 = 1.441, df = 1, p = 0.230). See Table 3 for the model estimates for the full and null models without interactions.
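The p-values of the two full-null comparisons follow directly from the reported χ2 statistics and degrees of freedom. The sketch below recomputes them with a small standard-library implementation of the chi-square upper-tail probability for integer degrees of freedom. It is offered as a check on the arithmetic, not as the code used in the study; the helper names `chi2_sf` and `lr_statistic` are hypothetical.

```python
import math


def lr_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


print(round(chi2_sf(6.612, 6), 3))  # comparison with interactions
print(round(chi2_sf(1.441, 1), 3))  # comparison without interactions
```

Both values agree with the p-values reported in the text (0.358 and 0.230) to three decimal places.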
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
a The estimates from this model cannot be interpreted since the interaction effects did not improve model fit over the null model. The main effects are also uninterpretable due to the non-significant interaction effects (Engqvist, Reference Engqvist2005; Lorah, Reference Lorah2020).
In sum, our analyses do not support the hypothesis that preference for IDS as measured in the ManyBabies preference task is associated with later vocabulary (measured via CDI parental report), even with this expanded dataset and more granular level of analysis with respect to percentile scoring. We also did not find support for an interaction effect with age (either at the preference test or at CDI collection) or with method. We did find significant main effects in two of our nuisance variables: gender and language grouping.
Discussion
Our primary goal in this study was to test for a possible relation between infants’ preference for IDS (as measured in a large scale IDS preference study) and later vocabulary knowledge (as measured by CDI parental reports) at 18 and 24 months. Secondarily, we were interested in knowing whether any such relation might change over development or based on the infant’s linguistic experience. Across both our preregistered and exploratory analyses, we found no evidence for a relation between IDS preference and later vocabulary. However, Bayesian analyses did not reach our pre-established threshold to support the null hypothesis, so we cannot take this null finding as direct evidence that such a relation does not exist. Furthermore, the effect of age and language community on this relation was not significant.
Before exploring the implications of these null findings, a brief note on some unanticipated findings with “nuisance” variables is in order. We found significant effects of the age at which the CDI was collected, across several models. This result was expected (and would have been troubling if not found) for the UK-only model, which used raw scores – it is simply capturing that infants’ vocabulary grows from 18 to 24 months. However, we also found this effect in the North American English (NAE)-only model, which would not be predicted given these were percentile scores, which should not show a systematic increase with age, due to norming. It’s important to note that these scores were collected using the new web-CDI version of the North American CDI during an initial pilot phase (DeMayo et al., Reference DeMayo, Kellier, Braginsky, Bergmann, Hendriks, Rowland, Frank and Marchman2021). One possible explanation is that the percentile scores used were based on older normed data collected via a different approach, which may not have fully accounted for age effects in more recent web-based samples. However, we cannot be certain why this effect emerged. The effect of age that we found, although significant, was not large in this model and is unlikely to have an impact on the conclusion of our main research question.
There was also a significant main effect of the age of IDS preference testing in the model for the UK sample (but not the NAE sample) in predicting CDI scores. The reason for this finding is unknown, but it may have been an artifact of non-random assignment of infant age of testing in the CDI data collection – i.e., it is possible that the age window we imposed for follow-up CDI data collection might have unintentionally given rise to cohort effects within the IDS preference sampling. Finally, a significant main effect of the “language zone” in the combined model (i.e., UK vs. NAE) suggests that the proportional measure used as the outcome measure in that model did not fully calibrate between the two languages’ instruments. A similar main effect of language zone (NAE vs. others) emerged in our exploratory analysis which used percentile scores and an alternative analytic approach, suggesting that even with the percentile-based approach, we were not fully able to calibrate across the instruments. These findings reinforce the challenges for work that combines and/or compares across vocabulary instruments within a single analysis. However, despite these challenges, the failure to find a relation between IDS preference and vocabulary was consistent across the exploratory and confirmatory analyses. We also found an effect of gender in our analyses, with males scoring on average lower than females, as is common in the literature (e.g., Eriksson et al., Reference Eriksson, Marschik, Tulviste, Almgren, Pereira, Wehberg, Marjanovič, Gayraud, Kovacevic and Gallego2012; Frota et al., Reference Frota, Butler, Correia, Severino, Vicente and Vigário2016; Nylund et al., Reference Nylund, Ursin, Korpilahti and Rautakoski2021; Sansavini et al., Reference Sansavini, Bello, Guarini, Savini, Stefanini and Caselli2010; Schults & Tulviste, Reference Schults and Tulviste2016).
Reliability of individual differences in preference studies
One possible lens through which to understand our null findings is the question of whether infant preference measures of the type used in our study actually capture meaningful individual variation. At a much broader level, this question raises the often underappreciated distinction between “differential” research approaches, which emphasize individual differences and are thus optimized to maximize between-subjects variance, and “experimental” approaches, which emphasize group-level differences between (experimentally manipulated) conditions and are thus optimized to maximize within-subjects variance (Draheim et al., Reference Draheim, Mashburn, Martin and Engle2019). It is possible that a group of infants would show a robust preference for one stimulus type over another, without it being the case that individual variation in the size of that preference is meaningful. More concretely, although it is possible that larger differences in the measured looking toward IDS over ADS for a given infant capture real differences in that infant’s underlying preference for IDS relative to another infant who showed longer looking toward ADS, it is also possible that differences in performance are simply capturing attentional differences in the task, or transient effects of distraction or mood on the day of testing.
A way to probe this analytically is to examine the extent to which infant preference as measured in the laboratory is stable across repeated testing. A separate follow-on study (Schreiner et al., Reference Schreiner, Zettersten, Bergmann, Frank, Fritzsche, Gonzalez-Gomez, Hamlin, Kartushina, Kellier, Mani, Mayor, Saffran, Shukla, Silverstein, Soderstrom and Lippold2024) to the original ManyBabies 1 study did just this with a subset of the sample used in our analysis. Specifically, a total of 158 infants across 7 laboratories were brought back for a second day of testing about one week after the first test (range = 1–31 days). Although an IDS preference at the group level was also found in the retest session, replicating the group effect of preference for IDS, they found a lack of consistent evidence for test-retest reliability in measures of infants’ speech preference at the individual level.
A second analytic approach to examining the reliability of the IDS preference task is to examine its internal consistency – the extent to which infants show a consistent preference for IDS vs. ADS across trials within the same test session. Byers-Heinlein et al. (Reference Byers-Heinlein, Bergmann and Savalei2022) undertook such an analysis and found that the internal consistency measured via the intraclass correlation coefficient across the 8 trial pairs was .14. Note that values below .5 indicate poor reliability (Koo & Li, Reference Koo and Li2016), so again this analysis indicates that this task is not a reliable measure of individual preference.
The goal in the current study was to investigate the correlation between IDS preference in this task and CDI, and our ability to do so crucially depended on the reliability of both measures, as well as the sample size. The CDI is optimized to measure individual differences and test-retest reliability of the CDI is quite good, estimated to be between .86 and .90 (Dale et al., Reference Dale, Bates, Reznick and Morisset1989; Jahn-Samilo et al., Reference Jahn-Samilo, Goodman, Bates and Sweet2001; Simonsen et al., Reference Simonsen, Kristoffersen, Bleses, Wehberg and Jørgensen2014). Combined with the estimated reliability of the IDS preference task as well as the sample sizes of our groups, a sensitivity analysis revealed that our design had 80% power to detect a true correlation between IDS preference and CDI of .46 or greater for the NAE sample, and .89 or greater for the UK sample. Thus, given the possibly low reliability of the IDS preference task, even with the large samples we were able to collect, our study would have been underpowered to detect more moderately-sized correlations. A much larger sample, or ideally a more reliable measure of infants’ individual IDS preferences, would in the future be more revealing of whether a relation between attention to IDS and vocabulary development exists.
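The way low reliability inflates the smallest detectable true correlation can be illustrated with two textbook formulas: the Fisher z approximation for the power of a correlation test, and Spearman's correction for attenuation (observed r equals true r times the square root of the product of the two reliabilities). The sketch below is one common way to run such a calculation and is not necessarily the procedure the authors used; the sample size and reliability values in the example are hypothetical placeholders, and the function names are invented for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest observed correlation detectable with the given power.

    Uses the Fisher z approximation for a two-sided test with n pairs.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return tanh(crit / sqrt(n - 3))


def required_true_r(observed_r, rel_x, rel_y):
    """True correlation needed to yield observed_r after attenuation.

    Values above 1 mean that no true correlation would be detectable.
    """
    return observed_r / sqrt(rel_x * rel_y)


# Hypothetical inputs: 103 infants, reliability .25 for the preference
# measure and .90 for the CDI (placeholders, not the study's estimates)
r_obs = min_detectable_r(103)
print(round(r_obs, 3))
print(round(required_true_r(r_obs, 0.25, 0.90), 3))
```

With these placeholder inputs, an observable correlation of about .27 would require a true correlation of roughly .58, which shows how an unreliable predictor can push the detectable effect size well beyond the moderate range even when the sample is fairly large.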
Implications for theory
The above commentary raises concerns about the extent to which we were able to capture individual variation in IDS preference sufficiently well to detect an effect, and it is worth noting that our Bayes analysis did not permit us to claim direct evidence in favour of the null hypothesis. Indeed, our Bayes Factor for the key factor of interest was close to 1, indicating close to equal support for the null hypothesis and for an effect of IDS preference. Moreover, our findings are in contradiction to those of Droucker et al. (Reference Droucker, Curtin and Vouloumanos2013), who found a significant relationship between preference for IDS and CDI scores at 18 months. There are some methodological differences between that study and ours, including the sample size and population tested, their use of the Words and Gestures form rather than the Words and Sentences form, and the details of how preference for IDS was measured (e.g., number of trials, specific nature of the IDS and ADS stimuli). But it is not possible at this point to know whether a methodological difference led to the different findings or simple statistical variation. Therefore, we must nonetheless consider the implications of the possibility that our findings are a true null result – i.e., that there is no relation between an infant’s underlying individual preference for IDS and their later language development.
This finding needs to be contextualized within, on the one hand, experimental evidence for the benefits of IDS in infant language processing tasks (e.g., Ma et al., Reference Ma, Golinkoff, Houston and Hirsh-Pasek2011; Thiessen et al., Reference Thiessen, Hill and Saffran2005), widespread cross-cultural/cross-linguistic IDS usage (Hilton et al., Reference Hilton, Moser, Bertolo, Lee-Rubin, Amir, Bainbridge, Simson, Knox, Glowacki, Alemu, Galbarczyk, Jasienka, Ross, Neff, Martin, Cirelli, Trehub, Song, Kim and Mehr2022), and correlational findings of a relation between caregiver usage of IDS and infant language development within Western contexts (e.g., Weisleder & Fernald, Reference Weisleder and Fernald2013), and on the other, large cross-cultural differences in the usage of IDS that do not appear to be reflected in cultural differences in language acquisition milestones (e.g., Casillas et al., Reference Casillas, Brown and Levinson2020, Reference Casillas, Brown and Levinson2021; Cristia et al., Reference Cristia, Dupoux, Gurven and Stieglitz2019). In other words, there is compelling evidence that IDS can be important for language development, but not necessarily for all cultures.
One possibility, therefore, is that preference for IDS, rather than capturing a construct of relevance for language outcomes, is capturing individual differences in infant experience with IDS in a way that influences the extent to which IDS matters in the development of an individual child. Possible evidence in favour of this idea comes from the stronger preference for IDS found in the original ManyBabies 1 study for North American English-learning infants compared to infants learning other languages. This finding could be driven by differential experience with IDS across different languages, since North American English IDS is often considered to be a relatively extreme version of IDS (e.g., Byers-Heinlein et al., Reference Byers-Heinlein, Tsui, Bergmann, Black, Brown, Carbajal, Durrant, Fennell, Fievet, Frank, Gampe, Gervain, Gonzalez-Gomez, Hamlin, Havron, Hernik, Kerr, Killam, Klassen and Wermelinger2021), which could lead to systematic differences in the importance of IDS in the language development process, as infants’ perceptual systems tune to the ambient language experience. (However, there is an important confound in that study, in that infants were all tested with North American English IDS.) Alternatively, IDS might be similarly important to all infants regardless of cross-cultural/cross-linguistic differences in experience, but in a passive way, such that individual differences in preference are simply irrelevant to the IDS effect. This latter alternative is possible, but would contradict mainstream theories that a crucial role for IDS is in drawing the infant’s attention to the speech signal (e.g., Soderstrom, Reference Soderstrom2007). Finally, it is possible that the gap between the time when IDS preference was tested and the vocabulary size was reported was too large to reveal a reliable relation between IDS and language development, as it might be more robust when measured concurrently. 
The more extreme possibility, that IDS plays no role at all in supporting language development, seems even more unlikely given the wealth of evidence (at least in Western contexts) to the contrary.
Limitations and future directions
The primary limitation of this study, already discussed in detail, is the low test-retest reliability of the IDS preference measure for capturing individual differences. Relatedly, our sample, although larger than many infant preference studies, may still have been underpowered to detect a relation between our primary variables of interest. Similarly, while our study was more geographically diverse than many of its type, our sample was one of convenience and not representative of the populations from which it was sampled, let alone representative of global diversity.
Our findings, together with those of Schreiner et al. (Reference Schreiner, Zettersten, Bergmann, Frank, Fritzsche, Gonzalez-Gomez, Hamlin, Kartushina, Kellier, Mani, Mayor, Saffran, Shukla, Silverstein, Soderstrom and Lippold2024), point to the importance of further considering the relation between group-based effects like the preference for IDS and individual variation within those effects. This approach is important both for general theory (understanding the role that IDS plays in language development) and due to the potential for perceptual measures of this type to be used in the study of special developmental populations and intervention (e.g., Droucker et al., Reference Droucker, Curtin and Vouloumanos2013).
Conclusion
Across 447 infants from 21 labs and several analytic approaches, our findings provide little support for a relation between preference for infant-directed speech as measured by laboratory perceptual tests, and later vocabulary measured by parental reports. A lack of test-retest reliability suggests that we may not be sufficiently capturing individual variation in infant preference to robustly detect relations between IDS preferences and vocabulary, and points to the importance of differentiating between group effects and individual differences in interpreting infant preference data. Future research should strive to improve the reliability of preference measures and expand the sample diversity. By exploring the relation between group effects and individual differences, we can gain a deeper understanding of the complex interplay between IDS, infant preferences, and language development in diverse populations.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000924000254.
Acknowledgements
The authors acknowledge the following funding sources: Andrew Jessop was supported by ES/L008955/1 and ES/S007113/1; Samantha Durrant was supported by ES/S007113/1 & ES/L008955/1; Christine E. Potter was supported by NICHD F32 HD093139; Casey Lew-Williams was supported by NICHD R01 095912; Jessica F. Hay was supported by NIH 1R01HD083312; Eon-Suk Ko was supported by NRF 2021R1I1A2051993; Melanie Soderstrom was supported by NSERC Discovery RGPIN-2019-05367; Natalia Kartushina was supported by Research Council of Norway 301625 & Centres of Excellence funding scheme 223265; Janet F. Werker was supported by SSHRC Insight Grant 435-2019-0306 & NSERC DG 81103. We also thank the many research assistants and participants who contributed to this study.
Author contribution
Conceptualization: Melanie Soderstrom, Janet F. Werker, Amanda Seidl, Mitsuhiko Ota, Julien Mayor, Jessica F. Hay, Erin E. Hannon, Anja Gampe, Michael C. Frank, Samantha Durrant, Krista Byers-Heinlein, Alexis K. Black, and Christina Bergmann. Data curation: Melanie Soderstrom, Luis E. Muñoz, Yana Ryjova, Jennifer L. Rennels, Karli M. Nave, Julien Mayor, Christina Bergmann, Mohammed K. AlShakhori, Ali H. Al-Hoorie, and Angeline S. M. Tsui. Formal analysis: Melanie Soderstrom, Joscelin Rocha-Hidalgo, Luis E. Muñoz, Mitsuhiko Ota, Karli M. Nave, Julien Mayor, Natalia Kartushina, Andrew Jessop, Michael C. Frank, Veronica Boyce, Christina Bergmann, and Angeline S. M. Tsui. Funding acquisition: Casey Lew-Williams, Eon-Suk Ko, Natalia Kartushina, Jessica F. Hay, Anja Gampe, and Cara Cashon. Investigation: Melanie Soderstrom, Janet F. Werker, Barbora Skarabela, Amanda Seidl, Yana Ryjova, Jennifer L. Rennels, Christine E. Potter, Mitsuhiko Ota, Nonah M. Olesen, Karli M. Nave, Julien Mayor, Alia Martin, Lauren C. Machon, Casey Lew-Williams, Eon-Suk Ko, Hyunji Kim, Natalia Kartushina, Jessica F. Hay, Naomi Havron, Erin E. Hannon, J. Kiley Hamlin, Nayeli Gonzalez-Gomez, Anja Gampe, Tom Fritzsche, Samantha Durrant, Catherine Davies, Cara Cashon, Alexis K. Black, Christina Bergmann, and Laura Anderson. Methodology: Melanie Soderstrom, Agata Bochynska, Amanda Seidl, Julien Mayor, Samantha Durrant, and Angeline S. M. Tsui. Project administration: Melanie Soderstrom, Karli M. Nave, Julien Mayor, Eon-Suk Ko, and Anja Gampe. Resources: Casey Lew-Williams. Software: Karli M. Nave, Julien Mayor, Anja Gampe, Christina Bergmann, Mohammed K. AlShakhori, and Ali H. Al-Hoorie. Supervision: Melanie Soderstrom, Barbora Skarabela, Jennifer L. Rennels, Mitsuhiko Ota, Nonah M. Olesen, Julien Mayor, Alia Martin, Casey Lew-Williams, Eon-Suk Ko, Hyunji Kim, Natalia Kartushina, Jessica F. Hay, Erin E. Hannon, J. Kiley Hamlin, Anja Gampe, Samantha Durrant, Catherine Davies, Cara Cashon, Christina Bergmann, and Angeline S. M. Tsui. Validation: Melanie Soderstrom, Joscelin Rocha-Hidalgo, Agata Bochynska, Christina Bergmann, and Angeline S. M. Tsui. Visualization: Melanie Soderstrom, Joscelin Rocha-Hidalgo, Luis E. Muñoz, Julien Mayor, Anja Gampe, Christina Bergmann, and Angeline S. M. Tsui. Writing – original draft: Melanie Soderstrom, Joscelin Rocha-Hidalgo, Luis E. Muñoz, Mitsuhiko Ota, Julien Mayor, Erin E. Hannon, Krista Byers-Heinlein, Alexis K. Black, Christina Bergmann, and Angeline S. M. Tsui. Writing – review & editing: Melanie Soderstrom, Joscelin Rocha-Hidalgo, Luis E. Muñoz, Agata Bochynska, Amanda Seidl, Jennifer L. Rennels, Mitsuhiko Ota, Karli M. Nave, Julien Mayor, Erin E. Hannon, Nayeli Gonzalez-Gomez, Anja Gampe, Catherine Davies, Krista Byers-Heinlein, Mohammed K. AlShakhori, and Ali H. Al-Hoorie.
Competing interest
The authors declare no competing interests.