1 Introduction
The role of acoustic detail in phonologically identical forms is rather limited according to some well-established psycholinguistic and linguistic models, and semantic, syntactic, or morphological information is not expected to be reflected in the acoustics. For instance, the difference in morphological complexity between the English word laps, which is complex, and lapse, which is simplex, should not be expressed in the acoustic output, since the level between morphology and the acoustic output, namely phonology, produces the same form for both laps and lapse. As one case in point, psycholinguistic feed-forward models of speech production (see, e.g., Fromkin 1971/1973; Harley 1984; Levelt 1989, 1995; Roelofs 1997; Levelt, Roelofs & Meyer 1999) leave no room for acoustic variation in the presence of phonological identity. Once the discrete symbolic representations are specified at the phonological level, and are alike for two words like laps and lapse, acoustic differences are excluded as long as everything else, such as the context, is held constant. We find a similar prediction in linguistic models describing the interaction of morphology and phonology (e.g. Chomsky & Halle 1968; Kiparsky 1982; Bermúdez-Otero 2018). Here again, if laps and lapse are not distinct at the abstract, underlying level of lexical phonology, post-lexical phonology and phonetics should not cause acoustic variation. Although these psycholinguistic and linguistic models represent (or represented) the standard view, they have been challenged by many empirical studies showing that acoustic detail plays a greater role in the language system than previously assumed.
These findings are more compatible with exemplar-based accounts, which offer more flexibility in the speech production process and in which the acoustic realization of items can be directly affected by activated information in, say, the semantic, syntactic or morphological domain (see, e.g., Dell 1986; Pierrehumbert 2001, 2002).Footnote 2
The present article builds on previous research asking whether acoustic detail plays a more significant role in language than well-known psycholinguistic and linguistic models assume. Specifically, we investigate whether morphosyntactic agreement in English is reflected in the acoustics, namely in the duration of the word-final s of regular plural nouns. We examine both noun–determiner and noun–verb agreement: while the determiner these agrees overtly in number with the following plural noun (e.g. these cabs), the does not (e.g. the cabs); similarly, while a present tense verb agrees overtly with the noun (e.g. cabs break down), a past tense form does not (e.g. cabs broke down). Such an effect would be remarkable, and so we proceed cautiously. However, a previous experiment (Schlechtweg & Corbett 2021) gave a tantalizing hint that such an effect might exist, and we therefore decided to investigate further. For this purpose, we conducted a well-controlled reading experiment with native speakers of English.
Before presenting the details of this study in section 3, we provide the theoretical foundation of our analysis in section 2. This includes, first of all, a general overview of variables that seem to affect the acoustic realization of items. In a second step, we concentrate on one particular case, namely the duration of the word-final s in English, which has already been measured in several studies and which is also the response variable in our own study. In the third part of section 2, we consider why morphosyntactic agreement might be mirrored in acoustic detail, drawing on previous research on how informativeness and, crucially, predictability can influence the duration of linguistic material. Having presented our study in section 3, we discuss our findings in relation to previous research in section 4 and conclude in section 5.
2 Theoretical background
2.1 Phonological identity but acoustic variation: overview
Over the last decades, a large number of studies have revealed that phonologically identical forms can differ acoustically, for instance in their duration. The decisive question in this research area is which particular variables give rise to the acoustic variation. Four examples of such variables are frequency, syntactic category, morphosyntactic number and morphological status. Forms of higher frequency, such as the English noun time, are typically produced with a shorter duration than forms of lower frequency, like the phonologically identical word thyme (see, e.g., Whalen 1991; Gahl 2008; Drager 2011; Conwell 2018; Lohmann 2018a, 2018b; but see also, for conflicting results, Jurafsky, Bell & Girand 2002; Cohn et al. 2005). Moreover, Sereno & Jongman's (1995) data suggest that the syntactic category of an item affects its acoustics; they detected variation between words like answer (verb) and the corresponding noun (answer). Crucially, however, Lohmann (2020) did not replicate the effect. A further variable that seems to be reflected in acoustic detail is morphosyntactic number, since Schlechtweg & Heinrichs (2022) and Schlechtweg, Heinrichs & Linnenkohl (2020) found that German plural nouns (e.g. Schatten ‘shadows’) are longer than the phonologically identical singular forms (e.g. Schatten ‘shadow’). A fourth variable is morphological status.
Elements of morphologically complex words, like the dis prefix of the English verb discolor, differ in their acoustic properties from structures that are phonologically alike but lack a morphological function, such as dis in discover (see, e.g., Kemps et al. 2005a; Kemps et al. 2005b; Sugahara & Turk 2009; Smith, Baker & Hawkins 2012). Morphological status is also central to several studies examining the duration of the word-final s in English. Since we also measured the s duration in our own study, we consider this aspect in more detail in the next section.
2.2 Word-final s in English
After this general overview of variables potentially affecting the acoustics of phonologically identical forms, we now focus on research on the duration of word-final s in English. A central comparison in previous investigations was the duration of affixal and non-affixal s. On the one hand, there is evidence that affixal s, as in laps, is longer than non-affixal s, as in lapse (Walsh & Parker 1983; Schwarzlose & Bradlow 2001; Song et al. 2013; Seyfarth et al. 2018). On the other hand, the opposite effect, longer non-affixal s, was found in quite a few other studies (Zimmermann 2016; Plag et al. 2017; Schmitz, Baer-Henney & Plag 2021; Tomaschek et al. 2021). These conflicting findings are surprising at first, but several aspects must be taken into account. First, some of the studies are limited, and caution is needed when interpreting their data. As argued in Plag et al. (2017: 185), it is difficult to evaluate Schwarzlose & Bradlow (2001) and Walsh & Parker (1983), owing to small sample sizes and because many decisive details, including statistical details, are not reported. Second, on closer inspection, the results are not necessarily incompatible. Tomaschek et al. (2021: 128) point out that Seyfarth et al. (2018) predominantly looked at the voiced variant /z/; for this specific group, affixal s was longer than non-affixal s not only in Seyfarth et al. (2018) but also in Plag et al. (2017).
Apart from the comparison of affixal and non-affixal s, different types of affixal s have also been examined. Hsieh, Leonard & Swanson (1999), but not Song et al. (2013), found that plural s (e.g. laps) is longer than third-person singular s (e.g. plays), but they acknowledge that sentence position is a potential confound: plural forms occur more often than third-person singular forms at the end of a sentence, which might also be responsible for the effect. Plag et al.'s (2020) experiment revealed that plural-genitive s (e.g. colleagues’) is longer than plural s (e.g. colleagues). The authors consider the lower frequency of the plural genitive to be a possible reason for this result. In a recent study, Schlechtweg & Corbett (2021) concentrated on two other types of affixal s, namely the word-final s in regular plural nouns (e.g. toggles) and in pluralia tantum nouns (e.g. goggles). In a reading study, they tested 40 native speakers of English on nine pairs like toggles/goggles. The s was manually segmented, and no difference in duration was detected between the groups of interest. The null effect was attributed to the fact that regular plural and pluralia tantum nouns control morphosyntactic agreement in the same regular way (both take a plural verb form). However, the statistical analysis, based on linear mixed-effects models, revealed an interesting side effect. Before discussing this effect, let us look at the test sentences used in the experiment (see table 1).
In Schlechtweg & Corbett (2021), it was essential to control for potentially confounding variables across the two conditions, regular plural and pluralia tantum nouns. One way to achieve this was to use the same sentences in both conditions so that, say, toggles and goggles were read out in exactly the same environment. For the present purpose, however, we need to consider one dimension of variation between the test sentences: four were in the present tense, whereas five others contained a past tense verb. VerbTense was included in the mixed-effects model as a fixed effect, and three criteria, outlined in Plag et al. (2017: 194), showed that VerbTense played a crucial role in the study. First, after the elimination of non-significant fixed effects, VerbTense remained in the final model as a significant fixed effect with a t statistic smaller than -2. Second, VerbTense improved the fit of the model: the model with this fixed effect differed significantly from the model without it. Third, the Akaike Information Criterion (AIC) was smaller with VerbTense in the model than without it. Different models confirmed the finding, indicating that the effect was robust. Table 2 presents the details of one model, in which the effect of VerbTense on the s duration becomes clear.Footnote 3 The descriptive statistics showed mean values of 0.062 seconds for the sentences with present tense verbs (standard deviation (SD) = 0.016) and 0.070 seconds for those with past tense verbs (SD = 0.017).
In sum, two aspects of Schlechtweg & Corbett (2021) are relevant to the current article. First, the s of the respective nouns was shorter if the sentence contained a present tense verb than if it contained a past tense verb. That is, the s duration was reduced in the presence of overt morphosyntactic agreement, with the verb form functioning as an additional plurality marker. It could be that the longer s duration in sentences with a past tense verb compensates for the lack of such an additional marker. Second, as can be seen in table 1, the two groups, present tense (overt agreement) and past tense (no overt agreement), comprised entirely different test sentences. The effect must therefore be treated with caution, and a controlled experiment is needed to evaluate whether it is real. This is the objective of the current work. Apart from noun–verb agreement, we examine a second type of agreement in English, namely noun–determiner agreement, by contrasting sentences with these, which reflects overt plural agreement between noun and determiner, with sentences with the, which can precede both singular and plural nouns and hence does not signal number agreement overtly. Table 1 shows that there was overt noun–determiner agreement in some sentences (those with these) but not in others (those with the, his, our). Although no effect of DeterminerAgreement was detected in that study, we investigate this factor in a controlled experiment, too. Hence, the controlled experiment examines both noun–verb agreement (present versus past tense verb) and noun–determiner agreement (these versus the).
2.3 No overt versus overt agreement: why the s might differ in duration
Before presenting and discussing the controlled experiment, this section considers why distinct s durations might be theoretically plausible. On the basis of the data from the experiment described above and of two further reasons, the informative value and the syntagmatic probability of the s in the respective sentences, we hypothesize that overt morphosyntactic agreement leads to a reduced s. The first reason is that speakers commonly reduce less informative, or less relevant, material (see, e.g., Krasheninnikova 1979: 75; Demuth 2011). Engelhardt & Ferreira (2014) present evidence for this idea. They contrasted the acoustic realization of necessary and unnecessary modifiers: blue in the phrase the blue triangle is necessary if triangles of different colors are present in the same context, but unnecessary if only a single triangle is present. Unnecessary modifiers, which do not provide information essential for uniquely identifying the object (e.g. a triangle), were shorter in duration than necessary ones, which are informative and decisive for specifying the target object (e.g. the blue but not the purple triangle). Transferring these findings to the present project, we suggest that the s is most informative in sentences without overt plural agreement and hypothesize that its duration is longer there.
The second reason why agreement might affect the duration of the s is syntagmatic probability, or predictability. A well-known idea in psycholinguistics, with good empirical support, is that speakers tend to reduce predictable elements, since reduced speech requires less articulatory effort and successful communication remains likely because these elements are highly predictable (see, e.g., Jurafsky et al. 2001; Bell et al. 2003; Gahl & Garnsey 2004; Frank & Jaeger 2008; Bell et al. 2009; Moore-Cantwell 2013; Kurumada & Jaeger 2015; Norcliffe & Jaeger 2016; Kurumada & Grimm 2017; for an overview, see also Rose 2017: 3–4). In morphology, paradigmatic and syntagmatic predictability are kept apart (see, e.g., Cohen 2014; Rose 2017). Paradigmatic predictability specifies the probability of occurrence of one particular form of a word's paradigm relative to the probability of occurrence of the other forms of the same paradigm. We do not consider paradigmatic predictability further in the current paper. Instead, we focus on syntagmatic predictability, which describes how likely it is that a form occurs in a specific context or environment.
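Syntagmatic predictability is commonly estimated from corpus counts as a conditional probability. As an illustration only (this is the standard relative-frequency estimate, not a formula taken from the studies cited above), the predictability of a form w in a context c can be written as follows, where C(c, w) is the number of times w occurs in context c and C(c) is the total number of occurrences of c:

```latex
P(w \mid c) = \frac{C(c, w)}{C(c)}
```

On the hypothesis outlined above, a higher P(w | c) for the plural form, for example after these rather than after the, should go along with a shorter s.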
Some studies have analyzed the characteristics of s in terms of syntagmatic predictability. For Spanish, there is some, but overall inconclusive, evidence that s is more likely to be reduced or deleted if the grammatical information it expresses is redundant and, hence, highly predictable (see, e.g., Poplack 1980; Hundley 1987; Erker 2010; Torreira & Ernestus 2012). For instance, in un par de cervezas ‘a couple of beers’, the s attached to the noun is less important for detecting plural number, since un par de also signals plurality (see Hundley 1987: 893). For English, two studies are relevant in our context. Cohen (2014) found, among other things, that the duration of the English word-final verbal s suffix, which indicates singular agreement (e.g. reads), becomes shorter as the probability of singular agreement rises.
The key reference for our present investigation, however, is another one. Rose (2017: 12–13, chapter 3) investigated the effect of syntagmatic predictability on the duration of word-final s in New Zealand English. On the basis of corpus data, she found that the s is reduced if it and the plurality of the noun are more predictable from the environment. For instance, a plural noun carrying the s suffix is more probable after a word like various than after a word like pretty. Rose's (2017) analysis revealed that only the preceding context (e.g. various), not the following one, has an impact on the duration of the plural s. On the one hand, her work supports our hypothesis that the s becomes longer if syntagmatic predictability is lower. On the other hand, since our own study, presented in section 3, differs from Rose (2017) in several respects, it will contribute further insights into the effects of the environment on the acoustic realization of a suffix. A first, but minor, difference between Rose (2017) and our own analysis is the variety of English examined: while she concentrated on New Zealand English, our participants are speakers of North American English. Having data from more than one variety gives us a broader picture of the subject. Second, while Rose (2017) restricts her analysis to the word immediately preceding or following the target plural noun, our test sentences contain only cases in which the second word before or the second word after the target plural noun represents, or does not represent, an additional plurality marker (e.g. The/These blue cabs always break/broke down). The advantage of our design is that we can exclude a potential influence of the phonetic environment on the target noun.
That is, since blue is placed between the determiner and cabs, the distinct phonetic structure of the and these does not affect the acoustic realization of cabs. Third, while Rose (2017) relies on s durations extracted automatically from the corpora, our data were segmented manually following a clearly defined protocol. Although her dataset is quite large, manual segmentation is overall more reliable, in particular for conversational speech (see, e.g., Schiel, Draxler & Harrington 2011; Schuppler et al. 2014), and most of the corpora used in Rose (2017) were based on interviews, which contain conversational speech. Fourth, Rose (2017) is not interested in the morphosyntactic phenomenon of number agreement, as we are, but collapses a quite diverse set of items that signal plurality to a greater (e.g. various, six) or lesser extent (e.g. pretty, of). We, in contrast, are specifically concerned with two types of plurality markers, namely these and present tense verbs. Fifth, Rose (2017) includes both the voiceless and the voiced variant of the plural suffix; we concentrate on the voiced one only, since findings regarding voiced /z/ are generally more homogeneous than those for voiceless /s/ (see section 2.2). Sixth, and crucially, the results from Schlechtweg & Corbett (2021), which form the origin of the present study, are not compatible with those from Rose (2017): while she concludes that the plural s is longer if plurality is less predictable from the preceding word, Schlechtweg & Corbett (2021) did not find an effect for the determiner, that is, for the word preceding the target plural noun.
Moreover, while Rose (2017) did not detect an effect for the word following the plural noun, Schlechtweg & Corbett (2021) found evidence suggesting that verb tense, with the verb following the noun, plays a role, in that the s duration increases with past tense verb forms. These conflicting results, together with our more reliable segmentation strategy and the benefits of a less diverse and carefully controlled experiment, explain the need for the novel study presented in the next section.
3 Methodology
We conducted a study in which subjects read sentences containing English plural nouns in four different agreement conditions, created on the basis of the two factors Determiner (the versus these) and Tense (present versus past). We investigated whether the duration of the word-final s depends on overt morphosyntactic agreement.
3.1 Subjects
Thirty-eight native speakers of North American English with a mean age of 29.3 years (SD: 6.6 years) participated in the study (24 female, 14 male). They had an academic background, normal or corrected-to-normal vision, and reported no speech disorders.
3.2 Materials
Sixteen English nouns formed the core of the materials. They were monosyllabic, had regular plurals, were singular-dominant (i.e. more frequent in the singular than in the plural) and inanimate, and ended in voiced /z/ in the plural. The nouns were embedded in 16 different test sentences, each of which appeared in the four variants illustrated in (1).
(1)
(a) The blue cabs always break down.
(b) The blue cabs always broke down.
(c) These blue cabs always break down.
(d) These blue cabs always broke down.
The four versions of each sentence differed with respect to (i) the determiner at the beginning of the sentence (the versus these) and (ii) the verb tense (present versus past). All 16 test sentences and their variants are presented in appendix A. In (1a), the determiner the does not specify the number value of the following noun, which could be either singular or plural. By contrast, the verb form in (1a), a present tense form, clearly signals plurality, since a singular noun would take the verb form breaks. In (1b), neither the determiner nor the verb form indicates plurality; both could occur with a singular as well as a plural noun. In (1c), both the determiner and the verb tense signal plurality. Finally, in (1d), only the determiner does so. In sum, apart from the s suffix on the target noun (e.g. cabs), there are two additional plurality markers in (1c), one in (1a) and (1d), and none in (1b).
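The logic of the 2 × 2 design can be summarized as a small lookup table that counts the plurality markers in each condition besides the noun's own suffix. This is purely illustrative: the dictionary names and the Boolean coding are our own and are not part of the study's materials.

```python
from itertools import product

# Hypothetical coding: True means the word overtly marks plurality.
DETERMINERS = {"the": False, "these": True}
TENSES = {"present": True, "past": False}  # break (plural) vs breaks (singular)

def extra_plural_markers(determiner: str, tense: str) -> int:
    """Count plurality markers other than the -s suffix on the target noun."""
    return int(DETERMINERS[determiner]) + int(TENSES[tense])

# Build all four conditions of the 2 x 2 design.
conditions = {
    (det, tense): extra_plural_markers(det, tense)
    for det, tense in product(DETERMINERS, TENSES)
}

for (det, tense), n in conditions.items():
    print(f"{det:<6} {tense:<8} {n}")
```

The output reproduces the counts given in the text: (1a) the/present and (1d) these/past each have one additional marker, (1b) the/past has none, and (1c) these/present has two.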
Each of the 16 sentences contained an irregular verb with the same number of syllables in the present and past tense, so that the four test versions had the same length (see (1)). As illustrated in (1), the four sentence variants differed only minimally. With the exception of the determiner and the verb tense, the four variants of each sentence were identical, so our test materials were controlled for syntactic, phonological and phonetic aspects. The s suffix and the target noun, whose durations were measured in the analysis, occurred in the same sentence type and position, and between the same words, across conditions. In this way, we also controlled for the bigram frequencies of the sequences ‘preceding word + target noun’ and ‘target noun + following word’.
3.3 Procedure
The experiment was conducted in a quiet room. Subjects were seated about 30 centimeters (12 inches) from a large-diaphragm condenser microphoneFootnote 4 and 60 centimeters (24 inches) from a computer screen.Footnote 5 The sentences were first read silently and then aloud while the subjects were recorded with Praat (Boersma & Weenink 2020). All sentences were left-aligned, appeared on a single line in the middle of the screen, and were written in the same font type and size.
Participants produced each of the 16 test sentences in the four conditions introduced in (1), reading out a total of 64 test sentences. Subjects therefore served as their own controls, which balanced the design with respect to inter-subject variation. Moreover, we included 64 filler sentences in order to minimize the influence of one version of a sentence on another version of the same sentence. A further 31 sentences were placed between one version of a sentence (e.g. (1a)) and the next version of the same sentence (e.g. (1b)). The order of the four experimental conditions in (1) was counterbalanced both within and across subjects, and the order of the items varied across participants.
3.4 Data analysis
3.4.1 Data preparation and segmentation
A total of 2,432 test cases (38 subjects × 64 test cases per subject) were part of the experiment. The dataset was reduced by 98 files (4%) due to slips of the tongue and technical problems. The remaining 2,334 sound files were phonetically segmented in Praat. All productions of a particular noun (e.g. cabs) by the same speaker were analyzed together in order to increase segmentation consistency. Both the spectrogram and the waveform were used to detect the beginning and end of the word-final [z]. Spectrum settings of 5,000 to 11,000 Hertz (Hz) facilitated the recognition of the fricatives. We relied on the acoustic characteristics of the fricative and on segmentation steps from the literature to develop an appropriate segmentation strategy (see, e.g., Ladefoged & Maddieson 1996; Ladefoged 2003; Turk, Nakai & Sugahara 2006; Machač & Skarnitzl 2009; Schlechtweg & Härtl 2020), which was the same as the one used in Schlechtweg & Corbett (2021) (see also figure 1). That is, increased energy in the higher frequencies, visible in the spectrogram, functioned as the primary criterion for finding the beginning and end of the target fricative. Visible fricative noise in the waveform represented the second criterion. If the two criteria did not coincide, priority was given to the primary one.
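The dataset counts reported above can be double-checked with a few lines of arithmetic. This is only a consistency check of the numbers in the text; the variable names are ours.

```python
N_SUBJECTS = 38
CASES_PER_SUBJECT = 64   # 16 test sentences x 4 conditions
EXCLUDED_FILES = 98      # slips of the tongue and technical problems

total_cases = N_SUBJECTS * CASES_PER_SUBJECT
remaining_cases = total_cases - EXCLUDED_FILES
excluded_percent = 100 * EXCLUDED_FILES / total_cases

print(total_cases, remaining_cases, f"{excluded_percent:.1f}%")  # 2432 2334 4.0%
```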
3.4.2 Statistical analysis and modeling
Having segmented the sound files, we first considered the simple descriptive statistics of the data. In a carefully controlled study like ours, these values give us a first idea of how the different conditions behave. We then used the program R (R Core Team 2021), the lme4 package (Bates et al. 2015), and the lmerTest package (Kuznetsova et al. 2020) to analyze the data statistically with linear mixed-effects models (see, e.g., Winter 2020).Footnote 6 Models were fitted for two response variables: DurationSuffix (the absolute s duration) and RelativeDurationSuffix, defined as the quotient of the absolute s duration and the absolute word duration.Footnote 7 For each of the two response variables, the following steps were implemented.
Statistical outliers in the absolute or relative s durations, defined as values more than 2.5 standard deviations above or below the mean (see, e.g., Loewen & Plonsky 2016: 134), were discarded from the dataset. The s durations were then log-transformed (base 10). Determiner (the versus these), Tense (present versus past) and their interaction were entered as the central fixed effects in the models. Log10SpeechRate_z, the log-transformed (base 10), centered and standardized speech rate, was included as a control fixed effect. Speech rate is the number of syllables in the whole sentence divided by the duration of the sentence in seconds. We further included in the initial model Log10Frequency_z, the log-transformed (base 10), centered and standardized frequency of the target nouns in American English as given by the Google Books Ngram Viewer (https://books.google.com/ngrams), and Bigram_z, the centered and standardized counts of the sequence ‘target noun + following word’ (e.g. cabs always) in the Google Books Ngram Viewer. Because the bigram counts included zeroes, they were not log-transformed. Note that our experiment had already been controlled for many of these aspects by design. Since we used the same nouns and sentences in all conditions (with the exception of the/these and the verb tense), frequencies and bigrams were balanced across conditions. Nevertheless, we examined whether the two play a role overall, in that, for instance, higher frequency triggers shorter s durations. Since we are interested in the duration of the suffix, that is, of the end of the word, we consider only the bigram ‘target noun + following word’ (and not the bigram ‘preceding word + target noun’).
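The preprocessing steps just described can be sketched in Python. This is a hypothetical re-implementation for illustration only: the study itself used R, and the function names and toy values below are ours. `statistics.stdev` computes the sample standard deviation (denominator n − 1), as R's `sd()` does.

```python
import math
from statistics import mean, stdev

def drop_outliers(values, k=2.5):
    """Keep only values within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)  # sample SD (n - 1), like R's sd()
    return [v for v in values if abs(v - m) <= k * s]

def zscore(values):
    """Center (subtract the mean) and standardize (divide by the sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def speech_rate(n_syllables, sentence_duration_s):
    """Syllables per second, computed over the whole sentence."""
    return n_syllables / sentence_duration_s

def relative_duration(suffix_s, word_s):
    """RelativeDurationSuffix: absolute s duration divided by word duration."""
    return suffix_s / word_s

# Hypothetical toy data in seconds; these are not values from the study.
suffix_durations = [0.061, 0.058, 0.066, 0.070, 0.063,
                    0.059, 0.064, 0.068, 0.060, 0.250]
kept = drop_outliers(suffix_durations)            # 0.250 lies > 2.5 SD above the mean
log10_durations = [math.log10(d) for d in kept]   # base-10 log transform

# Log-transform first, then center and standardize (cf. Log10SpeechRate_z).
rates = [speech_rate(9, t) for t in (2.1, 2.4, 1.9, 2.6)]
log10_rate_z = zscore([math.log10(r) for r in rates])
```

Note the order of operations mirrored from the text: outliers are removed on the raw durations before the log transform, whereas the control predictors are log-transformed before being z-scored.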
For each response variable, model fitting started with the maximal random-effects structure, consisting of random intercepts for Subject and Item and four random slopes: Determiner by Subject, Determiner by Item, Tense by Subject and Tense by Item. The maximal structure and most of the reduced structures (those with both intercepts and three, two or one random slope(s)) resulted in singular fits, so we simplified the random-effects structure manually, step by step; as a result, three of the four random slopes were removed. It is well known that complex random-effects structures can cause such problems (see, e.g., Barr et al. 2013; Matuschek et al. 2017; Cohen & Kang 2018; Martin Schweinberger p.c.), and we therefore opted for the reduced model. In the analysis of the absolute s durations, the only appropriate model containing a random slope was the one with the slope for Determiner by Item; in the analysis of the relative s durations, it was the one with the slope for Determiner by Subject. Both random intercepts were part of these models, too.
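To make the search space explicit, the sketch below enumerates the 16 random-effects structures between the maximal one and the intercepts-only one, written as lme4 formula strings. It only generates strings: fitting each candidate and checking for a singular fit (for instance with lme4's `isSingular()`) would still happen in R. The enumeration order, from most to least complex, is our assumption; the text does not specify the exact order in which structures were tried.

```python
from itertools import combinations

# (predictor, grouping factor) pairs for the four random slopes of the maximal model
SLOPES = [("Determiner", "Subject"), ("Determiner", "Item"),
          ("Tense", "Subject"), ("Tense", "Item")]

# Fixed-effects part as described in the text; Determiner * Tense expands to
# both main effects plus their interaction.
FIXED = ("log10(DurationSuffix) ~ Determiner * Tense"
         " + Log10SpeechRate_z + Log10Frequency_z + Bigram_z")

def random_part(slopes):
    """Build lme4-style random-effects terms; both intercepts are always kept."""
    terms = []
    for group in ("Subject", "Item"):
        preds = [p for p, g in slopes if g == group]
        terms.append("(" + " + ".join(["1"] + preds) + f" | {group})")
    return " + ".join(terms)

def candidate_formulas():
    """Yield formulas from the maximal structure down to intercepts only."""
    for k in range(len(SLOPES), -1, -1):
        for subset in combinations(SLOPES, k):
            yield FIXED + " + " + random_part(subset)

for formula in candidate_formulas():
    print(formula)
```

For example, the structure retained for the absolute durations corresponds to `random_part([("Determiner", "Item")])`, i.e. `(1 | Subject) + (1 + Determiner | Item)`.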
The models containing the fixed and random effects structure specified above were then reduced step by step by removing non-significant fixed effects. At each step, we removed the factor with the highest value in the R output column ‘Pr(>|t|)’, provided that this value was greater than 0.05.Footnote 8 Once we had a model with significant fixed effects only, we additionally verified whether the criteria used in Plag et al. (2017: 194) pointed in the same direction. Plag et al. (2017: 194) relied on three criteria, or tests, to decide whether a specific factor remained in the model. First, the t-statistic had to be greater than 2 or smaller than -2 for a factor to remain in the model. Second, including the factor had to improve the fit of the model significantly in comparison to the model without it, as indicated by a p value smaller than .05 when the models with and without the respective factor were compared in an ANOVA. Third, the Akaike Information Criterion (AIC) had to be smaller for the model with the factor than for the model without it (see also, e.g., Pinheiro & Bates 2000: 10; Wu 2010: 90).
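The manual backward elimination and the three criteria of Plag et al. (2017: 194) can be summarized in a short Python sketch, offered as an illustration only. Model fitting is delegated to a hypothetical callback that returns p values, AIC is computed from a log-likelihood and a parameter count, and the likelihood-ratio p value is derived for a one-parameter difference only (which covers two-level factors, their interaction and a single continuous covariate). In R, the corresponding comparison would be run with anova() on the two fitted models, which by default refits lme4 models with maximum likelihood.

```python
import math


def backward_eliminate(factors, fit_pvalues, alpha=0.05):
    """Repeatedly drop the factor with the highest p value above alpha.

    fit_pvalues(factors) is a hypothetical callback that refits the model
    with the given fixed effects and returns {factor: p value}.
    """
    factors = list(factors)
    while factors:
        pvals = fit_pvalues(factors)
        worst = max(factors, key=lambda f: pvals[f])
        if pvals[worst] <= alpha:
            break  # every remaining factor is significant
        factors.remove(worst)
    return factors


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def lrt_p_value_df1(log_lik_with, log_lik_without):
    """Likelihood-ratio test p value for nested models differing by 1 parameter."""
    stat = max(2 * (log_lik_with - log_lik_without), 0.0)
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def keep_factor(t_value, ll_with, k_with, ll_without, k_without, alpha=0.05):
    """Apply the three criteria; all must hold for the factor to stay."""
    if k_with - k_without != 1:
        raise ValueError("this sketch only handles a 1-parameter difference")
    return (
        abs(t_value) > 2
        and lrt_p_value_df1(ll_with, ll_without) < alpha
        and aic(ll_with, k_with) < aic(ll_without, k_without)
    )
```

For example, with hypothetical log-likelihoods of -100 (with the factor, 6 parameters) and -105 (without it, 5 parameters) and t = -3.1, all three criteria are met: the likelihood-ratio statistic is 10, and the AIC drops from 220 to 212.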
After completing the manual reduction, we additionally performed an automatic elimination of the non-significant fixed effects using the step function of the lmerTest package (Kuznetsova et al. 2020; see also, e.g., Lohmann 2020: 436) to check whether it yielded the same result.
3.5 Results
Figures 2 to 7 summarize the descriptive statistics of the datasets without statistical outliers, for both the absolute and the relative s durations.
Overall, the differences between the mean values of the individual groups are subtle, and in some cases there is no difference at all. In an additional step of the descriptive analysis, we examine how consistent and stable these results are, using a method applied in Durvasula & Liter (2020: 197–8) (see also Schlechtweg & Corbett 2021). Figure 8 shows the cumulative absolute suffix durations of the four conditions for the 38 subjects. That is, ‘1’ on the x-axis refers to the average absolute suffix durations of the four conditions for the first subject only. ‘6’, however, does not simply refer to the sixth subject, but to the cumulative average absolute suffix durations of the four conditions for the first six subjects. The graph thus shows how our results develop as more and more subjects are added. The development of the four conditions is comparable and homogeneous from approximately ‘21’ on the x-axis onwards. Put differently, once 21 subjects had been tested, the curves of the conditions developed in more or less the same way. On the basis of this figure, we have no reason to assume that drastic changes between the conditions would arise if more subjects participated in the experiment. Hence, the picture drawn above appears robust: the differences across the conditions are consistently small.
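The cumulative averages plotted in figure 8 can be computed along the lines of the following Python sketch. It assumes (our assumption, not stated in the text) that the value at position k is the mean of the per-subject condition means of the first k subjects in testing order, and that every subject contributes data to every condition; the condition labels and toy durations are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_subject_means(records):
    """records: iterable of (subject, condition, duration_s) in testing order.

    Returns a list of (subject, {condition: mean duration}) in order of
    first appearance.
    """
    by_subject = {}
    for subject, condition, duration in records:
        by_subject.setdefault(subject, defaultdict(list))[condition].append(duration)
    return [
        (subject, {c: mean(v) for c, v in conds.items()})
        for subject, conds in by_subject.items()
    ]


def cumulative_condition_means(subject_means):
    """Entry k-1 maps each condition to its average over the first k subjects."""
    totals = defaultdict(float)
    curves = []
    for k, (_, cond_means) in enumerate(subject_means, start=1):
        for condition, value in cond_means.items():
            totals[condition] += value
        curves.append({c: total / k for c, total in totals.items()})
    return curves


# Hypothetical toy data for two subjects and two of the four conditions.
records = [
    ("S1", "the_present", 0.10), ("S1", "the_present", 0.12),
    ("S1", "these_present", 0.09),
    ("S2", "the_present", 0.13), ("S2", "these_present", 0.11),
]
curves = cumulative_condition_means(per_subject_means(records))
```

Plotting each condition's values against k (1 to 38 in the experiment) yields curves of the kind shown in figure 8; whether they have stabilized is then judged visually, as in the text.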
To sum up our findings so far, we can say that the differences between groups are either small or absent, and this trend is stable and robust. Nevertheless, an inferential statistical analysis is still needed to verify whether the differences are significant, even if they are small. Further, taking a potential influence of speech rate into account is essential, even in a thoroughly controlled experiment. The results for the fixed effects of our final mixed-effects models, after the exclusion of non-significant fixed effects, are given in tables 3 and 4; the results for the random effects are given in appendices B and C. Note that the same fixed-effects structures were found in the automatic analysis with the step function.
First, and unsurprisingly, the s duration decreases with increasing speech rate, as expressed by the negative estimate for Log10SpeechRate_z. This holds for both the analysis of absolute and the analysis of relative durations. Second, and more interestingly, we detect an effect of Determiner in the analysis of the absolute s durations: s durations are longer when the appears as the determiner than when these occurs. This difference is reflected in the negative estimate for Determinerthese, that is, the log-transformed s duration for these is smaller than for the baseline level Determinerthe, which is represented by the intercept. Back-transformed from the logarithm, the difference amounts to about 0.0036 seconds (3.6 ms). The criteria used in Plag et al. (2017: 194) support these findings, both for Determiner and for Log10SpeechRate_z. That is, the t statistics of the significant fixed effects are smaller than -2, each factor significantly improves the fit of the model, and the AIC is smaller if the factor is part of the model. Hence, we can state that (i) in absolute terms, the s is longer in combination with the than with these and (ii) the s duration increases with decreasing speech rate.
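The back-transformation can be made explicit. With log10 durations, the model predicts 10^intercept seconds for the baseline (the) and 10^(intercept + estimate) seconds for these, evaluated where the z-scored covariate equals 0, i.e. at mean (log) speech rate. The Python sketch below uses hypothetical coefficient values, not the estimates reported in tables 3 and 4.

```python
def back_transform(intercept, estimate):
    """Predicted durations (s) on the original scale from log10-scale coefficients."""
    baseline = 10 ** intercept               # Determinerthe (reference level)
    contrast = 10 ** (intercept + estimate)  # Determinerthese
    return baseline, contrast


# Hypothetical coefficients for illustration only.
the_s, these_s = back_transform(intercept=-1.0, estimate=-0.02)
difference_s = the_s - these_s  # absolute difference in seconds
ratio = these_s / the_s         # equals 10 ** estimate, independent of the intercept
```

A negative estimate thus always yields a shorter predicted s for these, but the size of the difference in seconds also depends on the intercept, whereas the ratio depends on the estimate alone.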
4 Summary and discussion
Previous research has shown that the duration of the English word-final s depends on both its function and its context. Two factors are at play here. On the one hand, variation exists between different types of s, such as affixal and non-affixal s, or different types of affixal s (see, e.g., Plag et al. 2017). On the other hand, reduction and lengthening of the s are connected to its predictability in a given context (see, e.g., Rose 2017). The current article extends research of the second type and examines whether overt morphosyntactic agreement affects the duration of affixal s. Two major results emerged from the analyses. First, noun–verb agreement did not affect the s duration. That is, the suffix duration on the noun did not differ when a past tense verb form followed (hence no overt agreement) as compared to when a present tense verb followed (hence overt agreement). The noun–verb agreement effect found in Schlechtweg & Corbett (2021) could not be replicated here. It is possible that the effect detected in this earlier study derived from other differences in the test sentences. Crucially, the current study was carefully controlled for such factors and included a much larger dataset, and the effect did not arise. Second, noun–determiner agreement did affect the duration of the s in the expected direction. This subtle effect occurred in the analysis of the absolute durations. If the preceded a plural noun (no overt noun–determiner agreement), the s was longer than if these was used (overtly agreeing with the plural noun). In sum, our experiment provides slight evidence for the idea that the s is reduced if the plural noun has an agreeing determiner (these). Noun–verb agreement, in contrast, has no impact on the duration of s.
Two aspects call for caution in interpreting the results: first, the differences between the the and these conditions are small and, second, the difference between the two reached significance only in the analysis of the absolute durations. Without an effect on the relative durations, we have no evidence that the proportion of the word taken up by the suffix increases when the precedes the target noun. Nevertheless, our results are based on a large dataset (2,307 test cases in the absolute duration analysis and 2,314 test cases in the relative duration analysis), which increases the reliability of the findings. Compared with other studies on the duration of the English s, our experiment is far more comprehensive than the investigations conducted by, for instance, Walsh & Parker (1983), Schwarzlose & Bradlow (2001), Plag et al. (2017), Seyfarth et al. (2018), Schmitz et al. (2021) and Schlechtweg & Corbett (2021). We therefore believe that the effect detected for the absolute suffix durations is not negligible, and we discuss it in more detail below.
Several established psycholinguistic and linguistic models with a feed-forward design (e.g. Levelt 1989) have been criticized on the basis of empirical data over the last two decades. Their theoretical conceptions seem too rigid when it comes to the interplay of different types of linguistic information; they cannot explain, for instance, why the acoustic output is affected by morphological complexity, since they assume no connection between the two domains. The effect detected in the present experiment is equally incompatible with such models. In a strict feed-forward world, the word form of the English plural noun would be created, its discrete phonological units would be specified, and the acoustic sequence would be realized. Since the phonological structure is identical regardless of whether the or these precedes the noun in the sentence, no acoustic distinctions are expected. Our study does, however, provide some evidence for such a contrast, and this calls for a more flexible approach, as described in, for instance, Dell (1986) and Pierrehumbert (2001, 2002), which allows higher-order domains such as morphosyntax to have a direct connection to phonetics and the concrete realization of a word or word part.
The direction of the determiner effect, with the leading to a longer s, finds support in the literature. In a phrase containing the, the s is more informative because it signals the number value on its own, that is, without an additional plurality marker on the determiner. In contrast, if these precedes a plural noun, it already specifies that the following noun is plural, and the s contributes no new information. Previous literature has shown that more informative elements are lengthened (e.g. Engelhardt & Ferreira 2014), and this is what happens in our data, too: if the s is preceded by the and plays the crucial role in expressing plurality, it is longer than in cases with these in the determiner position. With respect to syntagmatic predictability, there is evidence that the s is enhanced if it is less predictable (e.g. Rose 2017). So, if words like various precede a regular plural noun, they signal that the noun must contain the s, and the s can be reduced. Other words, such as pretty, are neutral and do not predict the occurrence of s, which is therefore likely to be lengthened. Again, our effect fits in nicely here, since the s turned out to be longer when the determiner (the) did not predict its occurrence.
Thus, while the effect that we report is surprising, it shows a reassuring regularity. In the current experiment, the effect is found with the determiner but not with the verb tense. The reverse would have been truly remarkable: it would imply that the length of affixal s is affected by the presence or absence of overt agreement on the verb, which is still to be pronounced. What might underlie the difference between our results for attributive agreement and for predicate agreement? There are two candidates: syntactic structure and linear precedence. In our sentence these blue cabs always break down, the determiner these is within the same nominal phrase as cabs, while break is more distant syntactically. Equally, these precedes cabs, while break follows it. Both syntactic structure and linear precedence are well established as affecting agreement (Corbett 2006: 180, 206–30), and either could explain why we detected an effect for Determiner but not for Tense.
5 Conclusion
It is by now well known that fine acoustic detail can mirror different types of linguistic information. A case in point in this research area is the duration of the English word-final s, which has been shown to be modulated by speakers on the basis of both its function and its context. On the functional side, affixal and non-affixal s differ, and even distinct types of affixal s are heterogeneous. The current experiment adds a further piece of evidence supporting the idea that the s duration is also adjusted in specific contexts: overt noun–determiner agreement leads to a reduction of the s. The effect is subtle, of course, and needs to be replicated, but it is compatible with research arguing for a more significant and flexible role of the acoustic output in language.
Appendix A: Test sentences
Appendix B: Random effects statistics of the mixed-effects model of absolute s durations in seconds