Most English verbs agree with their subjects in person and number via a single affix marking the third-person singular form in the present tense. The verb be has a particularly complex paradigm (for English), with agreement in both the present and past tense (e.g., the baby is/was versus the kids are/were) and a unique first-person singular present-tense form (I am). Variable agreement occurs in both prestige and vernacular Englishes around the world, particularly in plural existential sentences, as in (1).
In these and other contexts, speakers may produce either an agreeing verb form, as in (1a), or a nonagreeing form, as in (1b).
Agreement variation has been extensively documented. Nonagreement occurs more often in speech than in writing (Crawford, 2005; Martinez Insua & Palacios Martinez, 2003), more often among less- than among more-educated speakers (Britain & Sudbury, 2002; Hay & Schreier, 2004; Meechan & Foley, 1994; Tagliamonte, 1998), and is influenced by a variety of linguistic factors, including subject type, tense, and distance between subject and verb (Britain & Sudbury, 2002; Eisikovits, 1991; Hay & Schreier, 2004; Henry, 2016; Martinez Insua & Palacios Martinez, 2003; Meechan & Foley, 1994; Tagliamonte, 1998; Walker, 2021).
This combination of consistency and variability in English agreement creates a complex learning problem. Children must acquire a system in which the form of the verb usually depends on the properties of its subject, but in which certain combinations of social and linguistic factors occasionally lead to a different form. How do young children acquire both consistent agreement and the probabilistic patterns of agreement variation found in their communities? Does their developing model of this complex system align with their caregivers’?
English-learning children begin using agreeing verb forms in both production and comprehension around three years (Brown, 1973:271; Keeney & Wolfe, 1972; Lukyanenko & Fisher, 2016; Rissman, Legendre, & Landau, 2013; Theakston & Rowland, 2009), and reach high rates of accurate production around four years (Rice & Wexler, 2001:79). The comparatively few studies of children's acquisition of English agreement variation have found that Scottish and Northern Irish preschoolers produce variable agreement at rates that correlate with their input, and that their use, like their caregivers’, is conditioned by sentence type (Belfast, Northern Ireland: Henry, 2016; Buckie, Scotland: Smith & Durham, 2019, Ch. 7). This is consistent with other evidence that the acquisition of variation proceeds in tandem with the acquisition of categorical patterns in language (see Smith & Durham, 2019, Ch. 1).
In the current paper, we explore agreement consistency and variation in child and caregiver English in the United States, focusing on children between two and six years old. Our first study explores agreement in two relatively large single-family corpora from the CHILDES database (MacWhinney, 2000), and the second explores a smaller corpus of caregiver and child speech during a semistructured “Search-and-Find” activity in the lab. In both studies, we take a broad approach to potential conditioning factors, including sentence types in our analyses that are rarely variable in adult English and others that nearly always are, to allow for the possibility that children's envelope of variation may differ from their caregivers’. Using this data, we characterize the envelope of variation, rates, and conditioning factors on agreement variation in caregiver and child US English.
Background
Acquisition of linguistic variation
Despite the fact that child-directed speech often differs substantially from adult-directed speech in speed, vocabulary, and sentence structure (e.g., Snow, 1972), rates and patterns of variation in child-directed speech closely resemble those in adult-to-adult speech (Miller, 2013a; Smith, Durham, & Fortune, 2007). When patterns differ, it can be as a direct result of other differences between child- and adult-directed speech (e.g., slower speech reducing deletion; Miller, 2013a), or as a result of social pressure (e.g., caregivers avoiding stigmatized variants such as English ain't; Foulkes, Docherty, & Watt, 2005; Smith et al., 2007).
Children produce variation as young as 2-3 years, but acquisition of the factors that constrain variation continues gradually through childhood (Chevrot, Nardy, & Barbu, 2011; Guy & Boyd, 1990; Kovac & Adamson, 1981; Miller, 2013a, 2013b, 2019; Shin, 2016; Shin & Miller, 2022; Smith, Durham, & Fortune, 2009; Smith et al., 2007). For instance, Guy and Boyd (1990) found that adults’ -t/d omissions were conditioned by morphology, with more omissions on uninflected words (e.g., mist) than on regular (missed) or semiweak (kept) past-tense verbs. Children (ages 4-18), like adults, omitted -t/d more often on uninflected words, but also frequently omitted -t/d on semiweak verbs. Interestingly, when children fail to show sensitivity to a conditioning factor, they often appear to have regularized the input, using one variant near categorically (Shin & Miller, 2022, and references therein).
Variation can also affect the timing of acquisition. Children acquiring a morphological marker that is variably omitted often take longer to use the marker in comprehension (de Villiers & Johnson, 2007; Miller, 2007, Ch. 5; Miller & Schmitt, 2012), and children sometimes generalize variation to contexts where adults are categorical before arriving at adult patterns. Miller (2012) showed that children 3-5 years old who hear nonagreeing don't are more likely to produce nonagreeing do in yes-no questions (“Do your dad write with glitter glue?”) than children without nonagreeing don't in their input, even though neither group hears nonagreeing do (see also Radford, 1992).
Children learn other aspects of variation readily. For instance, young children pick up on many categorical-variable splits, including in the agreement system. Smith and Durham (2019) found that children 2-4 years old in Buckie, Scotland, like their caregivers, never produced nonagreeing verb forms with they but did with NP subjects. Even in negation, where Buckie children acquired the variants sequentially, they respected categorical constraints: Like adults, children never used bare na with third-person singular subjects.
Agreement variation in English
Agreement variation in plural existentials as in (1) has been documented across centuries (Nevalainen, 2006) and in an extremely wide range of English varieties, from Belfast (Henry, 2016) to the American Rockies (Antieau, 2011) to the Falkland Islands (Britain & Sudbury, 2002). In mainstream Englishes, agreement variation occurs almost exclusively in existential and similar constructions (i.e., here and where sentences; Biber, Johansson, Leech, Conrad, & Finegan, 1999:186; Chambers, 2004; Krejci & Hilton, 2017). In varieties where variation occurs more broadly (e.g., in the past tense They was/were very close), nonagreement is more common in existentials than in other contexts (Antieau, 2011; Tagliamonte, 1998).
In existentials, variation typically involves a singular verb form occurring with a plural postverbal noun phrase, as in (1b). For instance, Crawford (2005) showed that in mainstream American and British English there're and there are were almost invariably followed by a plural noun (0.4% nonagreement), but rates of nonagreement were higher with there's and there is (15.6% and 4.4%, respectively). Variation occurs with full forms of be but is most frequent with contracted 's (Crawford, 2005; Krejci & Hilton, 2017; Martinez Insua & Palacios Martinez, 2003; Meechan & Foley, 1994).
Overview of the current study
In the current study, we examine agreement variation in US English, extracting all instances of third-person tensed be and characterizing their subjects in two single-family corpora and one multifamily corpus. We examine and compare agreement in child and caregiver speech in a variety of contexts to explore whether children's envelope of variation matches their caregivers’, and more broadly to ask how categorical and variable patterns are simultaneously acquired.
The single-family corpora let us characterize patterns of agreement variation in two families with different socioeconomic backgrounds and afford large enough samples to compare variation in a particular child's speech with their caregivers’. The cross-sectional corpus lets us estimate the prevalence of agreement variation across families. Examining both the input and children's production lays the groundwork for a better understanding of the learning processes that connect the two.
Methods
CHILDES corpora
We first examined two corpora from CHILDES (MacWhinney, 2000): Sarah (ages 2;3-5;1 [Brown, 1973]) and Nina (ages 1;11-3;3 [Suppes, 1974]). These relatively large, single-family corpora of US English allowed us to explore agreement variation in caregivers’ child-directed speech, compare lower-SES and higher-SES caregivers’ use of agreement variation, and compare individual children's patterns of agreement variation with their caregivers’.
Data for both corpora were collected in the late 1960s and early 1970s. Sarah's family was working-class and lived in the Northeast United States (Brown, 1973:51). Nina's family was middle-class (Miller, 2013b:307), and Nina lived with her mother in California, sometimes visiting her father in the Northeast. The children overlap in age, but Nina's recordings began when she was a few months younger and Sarah's continued longer. Nina's linguistic development was somewhat precocious, so despite the age difference, the children show substantial overlap in linguistic milestones during the time they were recorded (Miller, 2013b).
Extraction and exclusions
All sentences with tensed (potentially) third-person forms of be (is, are, was, were, 's, ’re) were extracted from transcripts of child and caregiver speech. In these sentences, the subject was identified. Tokens were then coded for subject person (second, third) and subject number (singular/plural/ambiguous).
Second-person sentences and third-person sentences in which the subject had ambiguous number (e.g., mine, any), could independently elicit variable agreement (e.g., conjoined singulars; Lorimor, 2007, Ch. 5), or was missing or unintelligible (e.g., inside there is, I think xxx is a better idea) were excluded, as were sentences in which 's was ambiguous between contracted is, has, or does (e.g., What's he like?), and sentences in which the verb had been added by the transcriber (e.g., here ('s) horsie!, dere [:there's] a monkey.). In the event of a self-correction, only the correction was analyzed.
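A first pass over the transcripts could be automated before hand coding. The sketch below is only an illustration under stated assumptions: the pattern, the function name find_be_candidates, and the plain-string input are hypothetical and are not the procedure used for these corpora. It flags every candidate form; possessive 's and 's standing for has or does would still need to be excluded by hand, as described above.

```python
import re

# Forms of "be" listed in the text: is, are, was, were, 's, 're.
# Assumption: contractions are written with ' or ’ directly after a word.
BE_PATTERN = re.compile(r"\b(is|are|was|were)\b|(?<=\w)['’](s|re)\b", re.IGNORECASE)


def find_be_candidates(utterance):
    """Return candidate be forms in one utterance, in order of appearance."""
    forms = []
    for match in BE_PATTERN.finditer(utterance):
        if match.group(1):              # full form
            forms.append(match.group(1).lower())
        else:                           # contracted form
            forms.append("'" + match.group(2).lower())
    return forms


for line in ["where's his legs?", "there they are", "The cats are sleeping"]:
    print(line, "->", find_be_candidates(line))
```

Subject identification and the person and number coding are not attempted here; those steps depend on judgments that a pattern match cannot make.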
After exclusions, there were a total of 25,566 tokens of be with third-person singular or plural subjects for analysis (Sarah: n = 2866; Sarah's caregivers: n = 6513; Nina: n = 4624; Nina's caregivers: n = 11,563). Because there was essentially no variation in sentences with singular subjects, our analyses focused on the 3,247 sentences (13%) with plural subjects (Sarah: n = 162; Sarah's caregivers: n = 453; Nina: n = 685; Nina's caregivers: n = 1947). This includes both sentences where we expect adults’ production to be variable and sentences where we would expect categorical production (see Table 1). By analyzing both, we address the question of whether children extend variation to nonvariable contexts or whether they show an adult-like categorical-variable split. We return briefly to sentences with singular subjects for an analysis of contractedness.
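The reported totals can be checked directly from the per-speaker counts. The snippet below is a small arithmetic check using only the figures given in the text; the variable names are illustrative.

```python
# Token counts reported in the text, by speaker group.
all_tokens = {"Sarah": 2866, "Sarah's caregivers": 6513,
              "Nina": 4624, "Nina's caregivers": 11563}
plural_tokens = {"Sarah": 162, "Sarah's caregivers": 453,
                 "Nina": 685, "Nina's caregivers": 1947}

total = sum(all_tokens.values())
plural_total = sum(plural_tokens.values())
plural_share = plural_total / total

print(total, plural_total, f"{plural_share:.0%}")
# Share of plural-subject tokens within each speaker group.
for group in all_tokens:
    print(group, f"{plural_tokens[group] / all_tokens[group]:.1%}")
```

The sums reproduce the reported totals of 25,566 and 3,247 tokens and the 13% share of plural-subject sentences.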
Coding and conditioning factors
Tokens were coded for a variety of potential conditioning factors. Representative examples are shown in Table 1.
Verb number. Each token was coded as singular or plural.
Speaker. Each utterance was tagged for speaker and categorized as caregiver (adult family member) or the target child. Speech from other adults (e.g., the researcher) or children (e.g., neighbors and friends) was not analyzed.
Sentence type. Previous studies of agreement variation single out there constructions [Footnote 1] (Crawford, 2005; Martinez Insua & Palacios Martinez, 2003; Meechan & Foley, 1994), but other sources suggest that variation also occurs in superficially similar sentences with here and where (Biber et al., 1999; Chambers, 2004; Sparks, 1984). We therefore divided sentences into four categories: where, here, there, and other (see Table 1). Sentences that did not include here, where, or there, or in which the word was embedded inside another constituent (e.g., That's the bin [CP where your toys go], Are the toys [PP over there]?) were coded as other.
Subject type. Subject type often conditions agreement variation in non-there sentences (e.g., Hay & Schreier, 2004; Henry, 2016). In the current data, subjects were classified as either pronouns or nonpronouns. We defined the subject of the sentence as the (potential) agreement controller, regardless of its position in the sentence (e.g., where are your shoes?, there's his feet, circles are here).
Order. The fact that the verb precedes the agreement controller in there sentences is cited as a potential reason for variation in these sentences (e.g., Chambers, 2004). However, order is rarely examined directly, and when it is, findings are mixed (e.g., Britain & Sudbury, 2002; Cheshire & Fox, 2009). We coded the order of subject and verb: SV versus VS.
Verb type. Existential constructions have a high frequency of copula be. To ask if this plays a role in variation, we coded whether each instance of be was a copula (e.g., The cats are here), or an auxiliary (e.g., The cats are sleeping). The few sentences in which the verb was ambiguous were grouped with auxiliaries (e.g., Sure they are, Hot wheels are what?).
We chose not to consider contractedness as a potential conditioning factor in our main analyses. We included both contracted and full-form verbs. Prior studies have found higher rates of nonagreement with there's than there is (e.g., Crawford, 2005), but for our analyses, we were concerned about the degree to which contractedness is confounded with sentence type, subject type, and order of subject and verb. It is rare to have a there, where, or here sentence with a pronoun subject and a contracted verb, because pronouns precede the verb in such sentences and contraction is impossible phrase-finally (e.g., there they are, *there they're, *there they's). Furthermore, the base rate of contraction is much higher for singular present tense verb forms than plural ones in these contexts. A search of the spoken subcorpus of the Corpus of Contemporary American English (Davies, 2008) showed that there's represents 66% of third-person singular present tense there + be, while there're represents just 0.4% of third-person plural present tense there + be (see also Westergren Axelsson, 1998). Thus, in sentences with plural subjects, base rates alone would result in contracted forms appearing to promote nonagreement. Other studies have noted these confounds and conducted descriptive analyses to explore contractedness patterns (e.g., Hay & Schreier, 2004; Meechan & Foley, 1994), and we do the same (see both sections below titled Contractedness).
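The base-rate argument can be made concrete with a short calculation. The sketch below takes the two contraction rates from the corpus search cited above and asks what share of contracted and full-form tokens with plural subjects would be nonagreeing if contraction depended only on the verb form and had no effect of its own on agreement. The overall nonagreement rate of 20% is a hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
# Contraction rates from the corpus search cited in the text.
P_CONTRACTED_GIVEN_SINGULAR_VERB = 0.66   # there's out of there's + there is
P_CONTRACTED_GIVEN_PLURAL_VERB = 0.004    # there're out of there're + there are


def nonagreement_share(nonagreement_rate, contracted):
    """Share of singular verbs among contracted (or full) tokens with plural subjects.

    Assumes contraction depends only on the verb form, not on agreement.
    """
    if contracted:
        p_sg = P_CONTRACTED_GIVEN_SINGULAR_VERB
        p_pl = P_CONTRACTED_GIVEN_PLURAL_VERB
    else:
        p_sg = 1 - P_CONTRACTED_GIVEN_SINGULAR_VERB
        p_pl = 1 - P_CONTRACTED_GIVEN_PLURAL_VERB
    singular = nonagreement_rate * p_sg
    plural = (1 - nonagreement_rate) * p_pl
    return singular / (singular + plural)


rate = 0.2  # hypothetical overall nonagreement rate, for illustration only
print(round(nonagreement_share(rate, contracted=True), 2))
print(round(nonagreement_share(rate, contracted=False), 2))
```

Under these assumptions roughly 98% of contracted tokens but only about 8% of full-form tokens would be nonagreeing, even though contraction plays no causal role in the calculation.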
We also excluded child age from our analyses. Both corpora are longitudinal, but the relative sparsity of morphosyntactic variables made formal analysis of change across development impractical. Despite the sparsity, visual inspection of the data broken out over six-month age-windows suggested that the patterns we describe below were stable for both caregivers and children.
Analysis approach
We take a three-step analysis approach. We first describe the data and then present two types of inferential statistical analyses: generalized linear models and best conditional inference trees (Tagliamonte & Baayen, 2012). [Footnote 2] These analyses each provide different insights into the data: descriptive analyses present a general picture of how and where agreement variation occurs, generalized linear models estimate the contribution of each factor to the choice of a plural versus a singular verb, and conditional inference trees show which factors are most strongly predictive in which subsets of the data.
Results
Descriptive observations
Agreement production with plural subjects showed substantial variability between forms like those in (2a) and (2b) but was essentially categorical with singular subjects, as in (3).
Figure 1 shows the distribution of plural verb forms across sentences with plural subjects, split by corpus, speaker, and order across the top, and by subject type and sentence type on the left. Nonagreement (i.e., lower percentages of plural verb-forms, darker cells) is common but not ubiquitous. It is, as expected, largely confined to sentences with postverbal, nonpronoun subjects. The proportion of nonagreement is particularly high for Sarah and her caregivers and for Nina. Nina's caregivers only occasionally produce nonagreement. Sarah and her caregivers produce nonagreement in other sentences, but Nina and her caregivers largely do not. Data is somewhat sparse, as indicated by the low token numbers in many cells and blank cells where factor combinations did not occur.
Generalized linear models
To more systematically explore the contribution of each conditioning factor, we fit two generalized linear models of verb form in sentences with plural subjects, one for each corpus. The models included the categorical predictors speaker (caregiver/child), subject type (pronoun/nonpronoun), order (SV/VS), verb type (auxiliary/copula), and sentence type (other/where/there/here), and no interactions. The first four predictors were binary and were entered into the model using effects coding, with the first-listed level coded as −0.5 and the second as 0.5. The last predictor had four levels and was entered using three treatment-coded contrasts, comparing here, there, and where sentences respectively to other sentences as a baseline. The dependent variable was verb form, with plural coded as 1 and singular as 0. Negative estimates therefore indicate more nonagreement in the second-listed level.
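The coding scheme described above can be sketched as a small function that turns one coded token into numeric predictors. This is a minimal illustration, not the authors' analysis code: the dictionary keys, the function name code_token, and the column names are assumptions.

```python
# Binary predictors: first-listed level -> -0.5, second-listed level -> +0.5.
BINARY_LEVELS = {
    "speaker": ("caregiver", "child"),
    "subject_type": ("pronoun", "nonpronoun"),
    "order": ("SV", "VS"),
    "verb_type": ("auxiliary", "copula"),
}
# Sentence type: three treatment-coded contrasts against the baseline "other".
SENTENCE_CONTRASTS = ("here", "there", "where")


def code_token(token):
    """Map one coded token (a dict of factor labels) to numeric predictors."""
    row = {}
    for factor, (first, second) in BINARY_LEVELS.items():
        if token[factor] not in (first, second):
            raise ValueError(f"unexpected level for {factor}: {token[factor]}")
        row[factor] = -0.5 if token[factor] == first else 0.5
    for level in SENTENCE_CONTRASTS:
        row[f"sentence_{level}"] = 1 if token["sentence_type"] == level else 0
    return row


example = {"speaker": "child", "subject_type": "nonpronoun", "order": "VS",
           "verb_type": "copula", "sentence_type": "there"}
print(code_token(example))
```

With this coding, each binary estimate is the difference between the two levels of that factor, and each sentence-type estimate is a comparison with other sentences.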
Nina's model revealed reliable effects of all contrasts except verb type, which was marginal, as shown on the left in Table 2. That is, Nina produced more nonagreement in sentences with plural subjects than her caregivers did, and sentences with nonpronoun subjects, sentences with VS order, and there, where, and here sentences had more nonagreement than sentences with pronoun subjects, SV order, and other sentences, respectively. The results of Sarah's model were similar, except that the effect of speaker was marginal and strikingly smaller in magnitude (−0.79 versus −3.7 for Nina). This suggests that Sarah's production of plural verbs was more similar to her caregivers’ than Nina's was.
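To give a sense of scale for the two speaker estimates, they can be converted to odds ratios. This assumes the models use a logit link, which the text does not state explicitly but which is the usual choice for a binary outcome. Because the two speaker levels are coded −0.5 and 0.5, the estimate is the child-minus-caregiver difference in log-odds of a plural verb with the other predictors held fixed.

```python
import math

# Speaker estimates reported in the text (child relative to caregiver).
speaker_estimates = {"Nina": -3.7, "Sarah": -0.79}

# Odds of a plural verb for the child relative to the caregivers.
odds_ratios = {child: math.exp(b) for child, b in speaker_estimates.items()}

for child, ratio in odds_ratios.items():
    print(child, round(ratio, 3), round(1 / ratio, 1))
```

On this reading, Nina's odds of producing a plural verb are about 0.025 times her caregivers' (roughly 40 times lower), while Sarah's are about 0.45 times her caregivers' (roughly 2 times lower).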
Conditional inference trees
To explore the relationships among predictors, we fit a best conditional inference tree for each corpus. Best conditional inference trees are built in a series of binary splits. At each step, the data is split on the most strongly predictive factor: first the dataset as a whole, then each of the resulting subsets, until no further factors are reliably predictive. This results in a tree-like structure that reveals which factors are most strongly predictive in each subset of the data.
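The recursive splitting procedure can be illustrated with a deliberately simplified sketch. It is not the conditional inference framework itself, which selects splits using permutation-test p-values; here a plain Pearson chi-square statistic with a fixed cutoff stands in for that test, only two-level factors are handled, and the toy counts are hypothetical.

```python
CRITICAL = 3.84  # chi-square cutoff for df = 1, alpha = .05 (stand-in for a p-value test)


def chi_square(rows, factor):
    """Pearson chi-square for a two-level factor against verb number (0/1)."""
    levels = sorted({r[factor] for r in rows})
    if len(levels) != 2:
        return 0.0, levels
    counts = [[0, 0], [0, 0]]  # counts[level][plural_verb]
    for r in rows:
        counts[levels.index(r[factor])][r["plural_verb"]] += 1
    n = len(rows)
    row_tot = [sum(c) for c in counts]
    col_tot = [counts[0][j] + counts[1][j] for j in (0, 1)]
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row_tot[i] * col_tot[j] / n
            if expected == 0:  # no variation in the outcome: nothing to split on
                return 0.0, levels
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat, levels


def grow(rows, factors):
    """Split on the most strongly associated factor, then recurse on each subset."""
    stat, factor = max(((chi_square(rows, f)[0], f) for f in factors),
                       default=(0.0, None))
    if factor is None or stat < CRITICAL:
        plural = sum(r["plural_verb"] for r in rows)
        return {"n": len(rows), "prop_plural": plural / len(rows)}
    levels = chi_square(rows, factor)[1]
    rest = [f for f in factors if f != factor]
    return {"split": factor,
            "children": {lv: grow([r for r in rows if r[factor] == lv], rest)
                         for lv in levels}}


def make(order, speaker, plural, singular):
    """Build hypothetical tokens for one cell of the toy data set."""
    base = {"order": order, "speaker": speaker}
    return [dict(base, plural_verb=1)] * plural + [dict(base, plural_verb=0)] * singular


toy = (make("SV", "caregiver", 20, 0) + make("SV", "child", 20, 0)
       + make("VS", "caregiver", 14, 6) + make("VS", "child", 4, 16))
tree = grow(toy, ["order", "speaker"])
print(tree)
```

In the toy data the first split is on order; the SV branch is categorical and does not divide further, while the VS branch splits again by speaker. This mirrors how a tree can show a factor mattering in one subset of the data but not another.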
Figure 2 shows the best conditional inference tree for Nina's corpus. There are three key properties to notice. First, where, there, and here sentences are grouped together, opposite other sentences, suggesting that agreement is variable in all three structures. Second, speaker is a key factor on both main branches: Nina produces more singular verb forms than her caregivers for all sentence types. Third, splits below speaker differ. In there, where, and here sentences, Nina's caregivers differentiate between sentence types and, in here sentences, between SV and VS orders. In contrast, Nina differentiates primarily by subject type and order.
Figure 3 shows the best conditional inference tree for Sarah's corpus. In several respects, Sarah's data is like Nina's: where, there, and here sentences are grouped opposite other sentences and VS order favors singular verb forms. In contrast to Nina's data, Sarah does not differ strongly enough from her caregivers for speaker to appear as a predictor in the tree. There, where, and here sentences do not subdivide further, and there is substantial agreement variation even in other sentences, including effects of order and subject type.
Contractedness
Previous studies have found that plural subjects are more common with the reduced verb form 's than the full form is (e.g., Crawford, 2005; Hay & Schreier, 2004). Because of likely confounds, instead of including contractedness in our main analyses (see Coding and Conditioning Factors), we report an exploration of contractedness here. To avoid the baseline differences in contraction rates for is and are, we ask how often sentences with singular verb forms (is, was, 's) have singular or plural subjects.
First, we confirmed the previously observed pattern: Speakers produced more plural subjects with contracted singular verb forms than with full singular verb forms (Figure 4a). This was true for both caregivers and children, and despite the fact that our dataset includes a wider variety of sentence types than, for instance, Crawford's (2005) study of there + be.
What drives this pattern? Is there something about contracted verb forms that permits variation, or are contracted forms and variation independently common in the same environments? To determine the likelihood of contraction independent of agreement variation, we examined contraction rates in sentences with both singular subjects and singular verb forms. The proportion of full-form verbs showed two notable patterns. First, contraction was present in almost all cells and common in many. Second, the four contexts in which children and caregivers in both corpora were most likely to use contracted forms were there, where, and here sentences with nonpronoun subjects in VS order (here 4% full-form, there 22%, where 20%), and other sentences with pronoun subjects in SV order (19% full-form). These contexts overlap substantially with the sentence types that promote nonagreement.
The independently high likelihood of contraction in key variable contexts might be enough to drive the pattern in Figure 4a. If so, once we control for sentence type, we would expect contracted and uncontracted verb forms to occur with plural subjects at similar rates. In contrast, if contraction uniformly promotes nonagreement, we would expect higher rates of plural subjects with contracted forms in all sentence types. The factors may also interact, with contraction occurring more often with plural subjects in some sentence types but not others.
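The comparison this reasoning calls for is a rate of plural subjects computed separately for contracted and full singular verb forms within each sentence type. A minimal sketch follows; the tuple layout, the function name, and all counts are hypothetical and chosen only to show the shape of the calculation.

```python
from collections import defaultdict


def plural_subject_rates(tokens):
    """Rate of plural subjects by (sentence type, contracted?) among singular verb forms."""
    counts = defaultdict(lambda: [0, 0])  # key -> [plural subjects, all tokens]
    for sentence_type, contracted, plural_subject in tokens:
        cell = counts[(sentence_type, contracted)]
        cell[0] += int(plural_subject)
        cell[1] += 1
    return {key: plural / n for key, (plural, n) in counts.items()}


# Hypothetical tokens: (sentence type, verb contracted?, subject plural?)
toy = (
    [("there", True, True)] * 3 + [("there", True, False)] * 7
    + [("there", False, True)] * 1 + [("there", False, False)] * 9
    + [("other", True, True)] * 1 + [("other", True, False)] * 19
    + [("other", False, True)] * 1 + [("other", False, False)] * 19
)
rates = plural_subject_rates(toy)
for key in sorted(rates):
    print(key, rates[key])
```

In this toy set, contracted forms take more plural subjects than full forms in there sentences but not in other sentences, which corresponds to the interaction scenario rather than to a uniform effect of contraction or to no effect at all.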
Figure 4b shows the rate of plural subjects with contracted and full singular verb forms for five sentence types with high rates of contractedness: nonpronoun, VS there, where, here, and other sentences (e.g., there: there's your ginger ale waiting for you; where: where's his legs?; here: here's the monkeys swinging; other: what is the owl sitting on?), and pronoun, SV other sentences (e.g., she's not afraid). The plot shows that contractedness interacts with sentence type: caregivers produce more plural subjects with contracted verbs in VS there, where, and here sentences only. Rates are flat and low for SV other sentences and decrease for VS other sentences. Interestingly, the children are less consistent. Like their caregivers, they have more plural subjects with contracted singular verbs in VS here sentences, but they show flat or falling patterns for there and where sentences. Thus, while some of the effect in Figure 4a may come from the independent association of the same sentence types with contraction and variation, contractedness also promotes nonagreement in certain sentence types, particularly among caregivers.
Discussion
In our first study, we examined two corpora of caregiver and child speech and found substantial agreement variation. Variation occurred only in sentences with plural subjects, and, like previous studies, we saw effects of subject type, order of subject and verb, and sentence type. We also observed substantial differences between caregiver rates of variation in Nina and Sarah's corpora, and between Nina's rate of variation and her caregivers’. In an additional descriptive analysis, we saw a small effect of contractedness on variation only in there, where, and here sentences in caregivers’ speech.
These findings confirm that agreement variation is present in child-directed US English. The presence of variation, and the fact that the patterns echo those in previous studies, suggest that agreement variation is neither something that caregivers avoid nor something that independent properties of child-directed speech disfavor.
Another familiar pattern in these data is the difference between Sarah and Nina's caregivers. Previous studies have found that less-educated speakers tend to use higher rates of nonagreement (e.g., Meechan & Foley, 1994), and that variation in other sentences is largely absent in higher-prestige varieties of English (Chambers, 2004). Consistent with this, we see that rates of plural verb forms are much lower among Sarah's caregivers than among Nina's and that only Sarah's caregivers produce nonagreement in other sentences.
A new finding in the current data is the marked difference between Nina and her caregivers. While Sarah matches her caregivers relatively well, producing only slightly more nonagreement, Nina produces drastically more nonagreement than her caregivers. Why might this be? Looking at the descriptive analyses, Nina's nonagreement appears in the same cells as her caregivers’: VS where, there, and here sentences. This suggests that Nina's over-production of nonagreement does not result from general confusion about agreement or about where variation is possible. However, in the conditional inference trees, the patterns that predict variation for Nina and her caregivers differ substantially. For her caregivers, sentence type is more predictive than order, and for Nina the reverse holds. This mismatch suggests that Nina and her caregivers may be arriving at similar variation by different routes. This has interesting implications for acquisition: Children may be treating some potential conditioning factors as better bases for generalization than others, leading them to group there, where, and here sentences, even when they are differentiated in their input and despite the different linguistic analyses of their underlying structure. We return to this possibility in the General Discussion.
Search-and-Find corpus
Previous research suggests that agreement variation, particularly in existentials, is common not just across varieties but among individuals (e.g., Antieau, 2011; Hay & Schreier, 2004). To estimate its prevalence across families, we followed our analysis of Sarah and Nina's data by collecting a small corpus of caregiver and child speech using a Search-and-Find task. We first analyze data from all families together, providing information about patterns in the sample as a whole, and then we explore individual families’ patterns.
In the Search-and-Find task, caregivers sat with their children and worked through a simple Search-and-Find book (see Figure 5) that we designed to elicit there, where, and here constructions, and to promote plural subjects. Though we recorded and transcribed both parent and child speech, child speech made up a smaller proportion of the included sentences in this corpus (15% versus 31% for Sarah, 29% for Nina).
Methods
Participants
A total of one hundred English-speaking families participated in a Search-and-Find task over 105 sessions. [Footnote 3] Sessions ranged from three to 18.5 minutes of recorded conversation (mean 7 minutes), for a total of about 12.5 hours of data. Children ranged in age from 1;7-6;0 (mean = 3;11, median = 3;11). Data was collected in 2016 and 2017 in central Pennsylvania. Participating caregivers grew up primarily in the Northeast and Mid-Atlantic US (n = 77) with smaller numbers from the Western US (n = 8), the Midwest (n = 8), and the US South (n = 1). Only one caregiver grew up outside the US (Toronto, Canada). The remaining caregivers specified broader regions (East/East Coast/Atlantic, n = 5; USA, n = 1). Caregivers’ highest level of education ranged from high school (n = 1) to a PhD or MD (n = 16). The most common level was a Bachelor's degree (n = 42), and the median level was a Master's degree (n = 35). The remaining caregivers had some college or an Associate's degree (n = 6).
Two additional families participated in the task but were not included because the participating caregiver did not learn English in early childhood. [Footnote 4]
Materials
Families were given a Search-and-Find book, in which each pair of pages included a moderately complex display of objects, and a smaller set of labeled objects (Figure 5). There were five pairs of pages (toys, beach, picnic, farm, bedroom), each with three singular and four plural items to locate.
Procedure
Sessions of the Search-and-Find task included one caregiver and their child or children. Caregivers and children were seated in adjacent chairs or with the child on the caregiver's lap in a corner of a quiet testing room with an audio recorder on a small table beside them. The researcher explained the task, then activated the recorder and moved behind a partition for the duration of the session. Caregivers were asked to work through the book with their child as they would at home.
Transcription, coding, and exclusions
Caregiver and child speech in each session was divided into turns and orthographically transcribed. Coding procedures and exclusions were identical to those described for the Sarah and Nina corpora above.
After exclusions, there were a total of 5,292 tokens of be with third-person singular or plural subjects for analysis (children: 769 tokens, caregivers: 4,523 tokens). Of these, 26% (n = 1383) had plural subjects (children: n = 170, caregivers: n = 1213). Quantity of data varied between families. Children produced a median of five included sentences (range: 0-33), and caregivers produced a median of 39.5 (range: 3-139). As before, we focus on sentences with plural subjects and return to sentences with singular subjects only for an analysis of contractedness.
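The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is in Python purely for illustration (the authors' own analysis code is in R). The variable names are ours, and the per-speaker plural shares it prints are derived from the reported counts rather than stated in the text.

```python
# Token counts as reported in the text: tokens of "be" with
# third-person singular or plural subjects, after exclusions.
tokens = {"children": 769, "caregivers": 4523}
plural_subject_tokens = {"children": 170, "caregivers": 1213}

total_tokens = sum(tokens.values())
total_plural = sum(plural_subject_tokens.values())

# Share of all included sentences produced by children.
child_share = tokens["children"] / total_tokens

# Share of all included sentences that have plural subjects.
plural_share = total_plural / total_tokens

# Derived (not reported in the text): plural-subject share per speaker group.
plural_share_by_speaker = {
    speaker: plural_subject_tokens[speaker] / tokens[speaker]
    for speaker in tokens
}

print(f"total tokens: {total_tokens}")
print(f"plural-subject tokens: {total_plural}")
print(f"child share of tokens: {child_share:.1%}")
print(f"plural-subject share: {plural_share:.1%}")
for speaker, share in plural_share_by_speaker.items():
    print(f"plural-subject share, {speaker}: {share:.1%}")
```

The totals and the rounded percentages agree with the figures given in the text (5,292 tokens, 1,383 with plural subjects, about 15% child speech, about 26% plural subjects).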
Results
Descriptive observations
Figure 6 shows the distribution of plural and singular verb forms in sentences with plural subjects in the Search-and-Find corpus, split by speaker, order, subject type, and sentence type. Patterns were very similar to those in Sarah and Nina's corpora: Singular verb forms (i.e., nonagreement) occurred primarily and frequently in sentences with postverbal, nonpronoun subjects, particularly there, where, and here sentences.
Generalized linear model
We fit a mixed-effects generalized linear model of verb form with the categorical predictors speaker (caregiver/child), subject type (pronoun/nonpronoun), order (SV/VS), verb type (auxiliary/copula), and sentence type (other/where/there/here), and random intercepts by family. No interaction terms were included. Coding and contrasts were identical to the models for Sarah and Nina (see the section titled Generalized Linear Model above).
This model revealed reliable effects of speaker, order, subject type, and sentence types there and here versus other, as shown in Table 3. This means that children were less likely to provide plural verbs in sentences with plural subjects than their caregivers and that nonpronoun subjects, VS order, and there and here sentences all favored singular verb forms, as compared to pronoun subjects, SV order, and other sentences, respectively.
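To make the structure of such a model concrete, the sketch below shows how a predicted probability of a plural verb form could be assembled from additive fixed effects plus a family-level random intercept. It rests on several assumptions of ours: a logit link, treatment (dummy) coding with caregiver, pronoun, SV, copula, and other as reference levels, and coefficient values that are entirely made up. The numbers are not the estimates in Table 3; only their signs are chosen to match the directions reported above.

```python
import math

# HYPOTHETICAL log-odds coefficients, for illustration only.
# They are NOT the estimates reported in Table 3.
COEFS = {
    "intercept": 3.0,
    "speaker=child": -0.5,
    "subject=nonpronoun": -1.0,
    "order=VS": -1.5,
    "verb=auxiliary": 0.0,
    "sentence=where": -0.3,
    "sentence=there": -1.0,
    "sentence=here": -1.0,
}


def p_plural(speaker, subject, order, verb, sentence, family_intercept=0.0):
    """Probability of a plural verb form for a plural-subject sentence.

    No interaction terms: each predictor adds its own coefficient to the
    log-odds. family_intercept stands in for the by-family random intercept.
    """
    eta = COEFS["intercept"] + family_intercept
    for key in (
        f"speaker={speaker}",
        f"subject={subject}",
        f"order={order}",
        f"verb={verb}",
        f"sentence={sentence}",
    ):
        # Reference levels have no entry and contribute 0 to the log-odds.
        eta += COEFS.get(key, 0.0)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit


# A caregiver's SV sentence with a pronoun subject versus a child's
# VS "there" sentence with a nonpronoun subject.
print(p_plural("caregiver", "pronoun", "SV", "copula", "other"))
print(p_plural("child", "nonpronoun", "VS", "copula", "there"))
```

Because the model has no interaction terms, each factor shifts the log-odds by the same amount regardless of the others; the random intercept simply moves a whole family up or down on that scale.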
Conditional inference tree
As before, to better understand the relationships among these factors, we built the best conditional inference tree, shown in Figure 7. Three patterns stand out. First, this tree groups where, there, and here sentences opposite other sentences, even though the mixed-effects model did not flag the where-other contrast as reliable. Even if the pattern is somewhat less reliable for where sentences in this data, there are still important similarities between patterns in where sentences and those in there and here sentences. Second, subject type and order are again key predictors. Finally, children's rate of nonagreement in nonpronoun, VS, and where sentences is similar to the whole group's rates in nonpronoun, VS, there, and here sentences, although caregivers’ rates are lower.
Contractedness
The effects of contractedness in sentences with singular verbs in the Search-and-Find corpus strongly resembled those for Sarah and Nina. First, we saw that both caregivers and children produced more plural subjects with contracted singular than with full singular verbs (Figure 8a).
As for Sarah and Nina, contracted singular verbs were common and were particularly likely in VS there, where, and here sentences, even when singular was the expected form. Figure 8b shows the rate of plural subjects with contracted and full-form singular verbs for nonpronoun, VS there, where, here, and other sentences, and pronoun and SV other sentences. As before, we found an interaction: SV other sentences with pronoun subjects and singular verbs never had plural subjects despite high rates of contraction. When they appeared with contracted verbs, VS other sentences with nonpronoun subjects were less likely to occur with plural subjects, and VS there, where, and here sentences were more likely to do so. In contrast to Sarah and Nina, and like their caregivers, the children in the Search-and-Find corpus produced more plural subjects with contracted than with full-form singular verbs in nonpronoun VS here, there, and where sentences.
Prevalence of variation across families
The cross-sectional sample allows us to characterize the prevalence of variation across families. Most caregivers produced at least one singular verb form with a plural subject (58/99, 59%),Footnote 5 though some categorically produced plural agreement (41/99, 41%). If we look only at those families who produced at least one likely context for variation (i.e., a VS there, where, or here sentence with a plural, nonpronoun subject), the proportion who produced at least one instance of nonagreement is even higher (58/78, 74%).
Figure 9a shows that as the number of likely contexts for variation increases, the likelihood of producing at least one instance of nonagreement does also. In contrast, the average rate of plural verb forms in likely contexts for variation remains stable regardless of the number of contexts produced (Figure 9b). Extreme values are most common among caregivers who produce few likely contexts for variation and occur in both directions (all plural or all singular). This suggests that extreme values may be the result of sampling error, and that it is likely that all families vary at least occasionally.
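The sampling argument above can be made concrete under a simplifying assumption that is ours rather than the authors': each likely context independently yields a singular verb with the same probability p. Under that assumption, the chance of seeing at least one nonagreeing form in n contexts is 1 - (1 - p)^n, and the chance of an extreme outcome (all singular or all plural) is p^n + (1 - p)^n. The rate used below is a placeholder, not an estimate from the corpus.

```python
def p_at_least_one_nonagreement(rate, n_contexts):
    """P(at least one singular verb) across n independent likely contexts."""
    return 1.0 - (1.0 - rate) ** n_contexts


def p_extreme(rate, n_contexts):
    """P(all n contexts share one verb form: all singular or all plural)."""
    return rate ** n_contexts + (1.0 - rate) ** n_contexts


# HYPOTHETICAL per-context nonagreement rate, chosen only for illustration.
HYPOTHETICAL_RATE = 0.3

for n in (1, 2, 5, 10, 20):
    at_least_one = p_at_least_one_nonagreement(HYPOTHETICAL_RATE, n)
    extreme = p_extreme(HYPOTHETICAL_RATE, n)
    print(f"n={n:2d}  P(>=1 nonagreement)={at_least_one:.3f}  P(extreme)={extreme:.3f}")
```

With any fixed nonzero rate, the first quantity climbs toward 1 and the second falls toward 0 as n grows, which is the qualitative pattern described for Figure 9: categorical-looking speakers are expected mainly among those who produced few likely contexts.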
Discussion
Patterns in the Search-and-Find corpus echo those in Sarah and Nina's data: Agreement variation is present in caregivers’ child-directed speech, it occurs only with plural subjects, and there are familiar effects of subject type, order, and sentence type. As in Nina's corpus, we saw a reliable difference between caregivers and children, with children producing more nonagreement than adults. We saw a small effect of contractedness on caregivers’ and children's speech in VS there, where, and here sentences with nonpronoun subjects.
In this corpus we were also able to explore how widespread agreement variation is across families. We argue that the patterns are consistent with widespread or universal variation: as families produced more likely contexts for variation, the probability of observing at least one instance of nonagreement rapidly approached certainty.
General discussion
In two studies of caregiver and child US English, we found substantial agreement variation. Adults’ use of variable agreement in child-directed speech followed patterns familiar from studies of adult-to-adult speech. It occurred only with plural subjects, and nonagreement was more common in sentences with postverbal, nonpronoun subjects, particularly there, where, and here sentences.
Comparing children's variation to their caregivers’ resulted in more complicated patterns. In all three corpora, children's production of agreement was conditioned by many of the same key factors: Like adults, children consistently produced singular verb forms with singular subjects, near-categorically produced plural verb forms with plural pronoun subjects and plural nonpronoun subjects that preceded the verb, and variably produced singular and plural verb forms elsewhere. However, Nina and the children in the Search-and-Find corpus differed from their caregivers in rate of agreement and ranking of conditioning factors, producing patterns that looked much more like Sarah's, who in turn matched her caregivers well.
These findings (a) provide crucial background information about agreement in children's linguistic input, (b) demonstrate that children's agreement production reflects sophisticated knowledge of linguistic variation and enrich our understanding of how children learn language, and (c) inform analyses of the mechanism and sources of agreement variation. We briefly expand on each in turn.
Agreement variation in the input
Prior research on the acquisition of English verb agreement frequently assumes that it is categorical (e.g., Lukyanenko & Fisher, 2016; Theakston & Rowland, 2009). This is a reasonable simplifying assumption for studies using sentence types in which adults produce agreeing verb forms categorically (e.g., subject-first declaratives: the tigers are holding the pen; Theakston & Rowland, 2009:1454), and in the absence of detailed information about the presence and patterns of agreement variation in the appropriate variety of child-directed English. However, studying children's knowledge and use of variation stands to tell us at least as much about how children learn, categorize, and generalize as studying the places where behavior is categorical. Close examination of patterns of variability and consistency in the input is a crucial prerequisite to this work.
The current study demonstrates that agreement variation is common and widespread in child-directed US English, and that it patterns with adult-directed English (e.g., Crawford, 2005; Meechan & Foley, 1994; Walker, 2021). This provides important background information for future studies of English agreement acquisition. As a concrete example, it provides support for the speculation that variation may be a reason for the widely observed asymmetry between singular and plural verb forms in comprehension. In several studies, researchers have found that children are less likely to treat singular than plural forms of be as informative cues to subject number (Davies, Rattanasone, & Demuth, 2020; Lukyanenko & Fisher, 2016; Lukyanenko & Miller, 2018). This asymmetry does not appear to extend to other cues to number (e.g., nominal plural: Davies, Rattanasone, Schembri, & Demuth, 2019; demonstratives: Reuter, Sullivan, & Lew-Williams, 2022), or even to contexts where variation with be is less likely (yes-no questions: Deevy, Leonard, & Marchman, 2017, Figure 1), making variation a likely explanation. Children may be rightly treating is and 's as uninformative cues since, in their experience, singular forms of be aren't picky about their subjects. The current study opens the way for further investigation of this phenomenon and for other studies of how children's real-time use of agreement during comprehension is influenced by patterns of variability and consistency in their input (Lukyanenko & Miller, 2018).
Children's use of categorical and variable patterns and implications for acquisition
Agreement variation presents an interesting challenge for learners, since the same verb forms are used categorically in some contexts and variably in others. Our data showed no tendency for children to impose categorical structure on adults’ variability or to vary where adults were categorical. Instead, children's production of English agreement respected an adult-like categorical-variable split from early childhood.
Children consistently produced singular verb forms with singular subjects, and plural forms with plural preverbal and pronoun subjects. This is consistent with research demonstrating that young children respect categorical-variable splits (Johnson & White, 2019; Smith & Durham, 2019) and with findings from the acquisition literature that when children begin to produce agreement, they typically produce the expected form of agreeing verbs (e.g., Wexler, 2011).
Studies of artificial language learning suggest that children have a stronger tendency than adults to impose categorical structure, even on conditioned variation (Hudson Kam, 2015; Samara, Smith, Brown, & Wonnacott, 2017; Schwab, Lew-Williams, & Goldberg, 2018; Sneller & Newport, 2020). In the current study we saw no tendency for children to be more categorical than adults. Where children and caregivers differed, the children tended to be more variable. For instance, Nina's caregivers produced high rates of agreement with postverbal plural subjects in there sentences (123/134, 91.8%), but, rather than rounding up and producing only plurals, Nina regularly used both forms (10/30, 33.3% plural).
Our data show that children's agreement production reflects sensitivity to the patterns of variation in their caregivers’ speech. Children show substantial alternation among the available forms in appropriate sentence types, and the same factors promote verb nonagreement for adults and children. This adds to evidence that young children use variable English agreement in relatively adult-like ways (Henry, 2016; Smith & Durham, 2019:148-160) and contrasts with studies of other variables that find categorical production first, followed by variation (e.g., isn't versus ain't: Miller, 2015; negation: Smith & Durham, 2019:133-148; see Shin & Miller, 2022 for a review). It also suggests an alternative explanation for certain seemingly nonadult-like patterns observed in prior acquisition studies. It may be that children's more frequent “errors” with postverbal plural subjects (e.g., Theakston & Rowland, 2009) are not errors sparked by the high frequency of singular verb forms, but evidence of sensitivity to variation in the input.
Children were not entirely adult-like: Conditional inference trees indicated that both Nina and the children in the Search-and-Find corpus treat there, where, and here sentences more uniformly than their caregivers, and that Nina seems to place more weight on subject type and order than her caregivers. Differences of this kind represent an important area for future research. One possibility is that children track and represent different features than adults do. Preliminary evidence for this comes from a study demonstrating that children and adults differ in their use of singular forms in comprehension (Lukyanenko & Miller, 2018). In an eye-tracking task, adults treated full-form singular is as an informative cue to subject number, but children did not. Neither age group treated contracted 's as an informative cue. This suggests that adults track the contractedness of the verb, but that children do not. Consistent with this, in the current study, Sarah's and Nina's caregivers produced more plural subjects with contracted than with full-form singular verbs, but Sarah and Nina did not.
Analyses of English agreement variation
The current data have implications for linguistic analyses of English agreement variation. In particular, the commonalities among there, where, and here sentences that we quantify present a challenge for theoretical approaches that hinge on the existential structure of variable contexts (e.g., Meechan & Foley, 1994). Such analyses may successfully extend to presentational here sentences but will likely struggle to explain variation in where sentences, given the different relationship between the subject and the verb.
Another class of explanations hinges on the nature of 's. Some proposals suggest that speakers treat there's as a single unit that does not participate in agreement dependencies (e.g., Rupp & Britain, 2019; Smith & Durham, 2019). Other proposals suggest that 's is or is becoming a nonagreeing clitic (e.g., Krejci & Hilton, 2017). Our data suggest that neither is sufficient as a sole explanation. Variation in there, where, and here sentences requires the first group of proposals to posit many fused forms across a wider variety of sentence structures. Similarly, speakers’ categorical use of agreement with pronoun subjects despite high rates of contractedness likely requires the second group to posit two versions of 's, one that occurs with pronouns that is retaining its agreement features, and one that occurs in there, where, and here sentences that is losing them.
In our view, the patterns of variation we describe are most consistent with a processing explanation or some combination of processing and the explanations above. Processing explanations suggest that it is more effortful to select an agreeing form when one must look ahead for the agreement controller than when the controller has already been produced (e.g., Chambers, 2004; Cheshire & Fox, 2009). Such explanations are consistent with children's higher rates of nonagreement, as well as the higher rates of nonagreement in the youngest Buckie children noted by Smith and Durham (2019:158). Because children plan their sentences in shorter chunks than adults (McDaniel, McKee, & Garrett, 2010; Redford, 2013), they might use singular verb forms as a strategy for avoiding early number commitments in VS sentences.
Conclusion
Our analyses demonstrate the widespread presence of agreement variation in child-directed US English, children's accurate production of categorical agreement, sensitivity to the same conditioning factors in adults’ and children's production of variation, remarkable consistency in how variation patterns across there, where, and here sentences, and a tendency for some children to be more variable than their caregivers, but only in variable contexts.
The language acquisition literature and the sociolinguistics literature have a history of approaching children's language in very different ways, with acquisitionists focusing on categorical patterns (e.g., Keeney & Wolfe, 1972; Lukyanenko & Fisher, 2016; Theakston & Rowland, 2009), and the sociolinguists focusing on variation (e.g., Henry, 2016; Smith & Durham, 2019). We are not the first to notice that there is progress to be made by exploring consistent patterns, variability, and their interaction in acquisition (e.g., Johnson & White, 2019; Roberts, 1997). We hope that this study and others like it will inform future research in both fields.
Acknowledgments
Many thanks to the research assistants who helped run sessions of the Search-and-Find task and to transcribe, extract, and code the sentences analyzed here: Alaina Eck, Maggie Featherstone, Lorrin Mathias, Maddie Metzger, Katherine Muschler, Cassie Race, and Adriana Shevlin.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0954394523000054.
Data availability statement
Data and R code for all analyses in this paper may be accessed at https://osf.io/fwyvd/.
Competing interests
The authors declare none.