1. INTRODUCTION
1.1. Aims of this study
The mid vowels of Standard (Parisian) French, comprising three pairs of vowels with two degrees of aperture – close-mid /e, o, ø/ and open-mid /ɛ, ɔ, œ/ – have been described as subject to a set of phonetic shifts throughout the 20th and 21st centuries. Besides the fronting of /ɔ/ (Boula de Mareüil et al., 2010), the changes mainly involve vowel height evolution, leading to a partial convergence of close-mid and open-mid qualities, respectively, within the mid unrounded front /e, ɛ/ and rounded back /o, ɔ/ vowel pairs, in word-final and other syllabic positions (Hansen and Juillard, 2011; Gendrot and Audibert, 2019). A specific development of mid vowels in penultimate position has also been reported (Cecelewski et al., 2023), consisting of a reduction in the extent of Vowel Harmonisation (or Vowel Harmony, VH), which are the traditionally adopted terms in French phonology to describe the partial assimilation of mid vowels in height to the final vowel.
Given these changes, along with the merger of the low vowel pair /a/ - /ɑ/ into a single phoneme (Hansen, 2001; Cecelewski et al., 2024), the merger of /œ̃/ with /ɛ̃/ (Walter, 1976), and the chain shifts affecting the remaining nasal vowels /ɛ̃, ɑ̃, ɔ̃/ (Hansen, 2001), surprisingly few experimental studies have been dedicated to phonetic change within the French vowel system. Recent advances in both computational techniques and data collection now enable robust and reliable real-time acoustic studies of sound change (Yaeger-Dror, 1994; Stuart-Smith et al., 2017; Barkat-Defradas et al., 2022), which offer original insights into the evolutionary dynamics of speech over short diachronic periods.
In this study, we aim to examine the diachronic pathway of the empirical counterpart of the phonological category of aperture of mid vowels in Standard (Parisian) French – /e, ɛ, o, ɔ/ – to compare historical impressionistic observations and phonological survey results with real-time data (Bailey, 2013) spanning nearly a century, from 1925 to 2023. The phonological behavior of mid vowels in French is governed by a set of rules, including the positional law and VH, which interact with phonotactic constraints and a complex lexical or etymological distribution. It would be impractical to examine the diachronic development of all these components in a single analysis. Therefore, we limit our focus to the development of the open-mid vs close-mid contrast in word-final, historically tonic, positions, as well as the diachronic pathway of VH affecting mid vowels in penultimate syllables. We will attempt to show that the phonetic shifts affecting mid vowels in French during the 20th century were paralleled by changes in the realization of VH. By doing so, we aim to demonstrate that phonetic change can impact not only the general arrangement of acoustic targets of specific segments in the vowel space but also the realization of more subtle phonological categories, such as the propensity of vowels to undergo distant assimilation.
The variability in quality of archive recordings made under conditions beyond experimental control has long been considered a hindrance to the growth of experimental diachronic phonetics in the form of real-time studies. Working with nearly century-old recordings, whose speaking style has undergone significant changes compared to more recent periods, led us to supplement a real-time study with an imitation experiment, designed to better understand the behavior of the acoustic parameter studied, the first formant (F1), in the analysed corpora. In particular, we will focus on the potential impact of a declamatory and hyperarticulated speaking style, such as that observed in the archives from the 1920s to the 1950s, on the extent of VH in French.
In the following sections, we will provide the historical context for the description of mid vowels in French during the period studied. This will be complemented by elements of philological research on the evolution of the use of the term vowel harmonisation/vowel harmony as applied to French. Next, we will present the results of a real-time study of mid vowel aperture, combined with an imitation-based experiment, which aims to better isolate and describe phonetic changes that affected French mid vowels between 1925 and 2023.
As the discussion revolves around the evolution of the degree of aperture and not that of anteriority, we will focus solely on the analysis of F1 peaks. We acknowledge that changes over the last century have also affected the second formant, the acoustic correlate of the degree of anteriority (Landick, 1995; Boula de Mareüil et al., 2010). However, due to space constraints and to align with the phonological definition of VH, the degree of anteriority will not be addressed in our study.
1.2. Mid vowels in French and positional law
A brief overview of the literature dedicated to the distribution, phonological representation, or empirical counterparts of the mid vowel subsystem in Standard (Parisian) French leads to two overall observations: first, there is no consistency across studies on the exact distribution of close-mid and open-mid vowels; second, this lack of consistency appears to reflect large variations in quality and a profound instability in intermediate realizations that do not easily lend themselves to an unambiguous phonemic classification (Léon, 1978; Malmberg, 1969; Morin, 1986; Baraduc et al., 1989; Durand and Lyche, 2004).
Among the sources of variation in the realization of mid vowels are the type of postvocalic consonant, the positional law (loi de position), and VH. The positional law refers to a tendency for open-mid vowels to become close-mid in open syllables and close-mid vowels to become open-mid in closed syllables. The prevalence of this diachronic law is different in regional varieties of French, the law acting more uniformly and accepting fewer exceptions in southern France (Morin, 1986; Spence, 1988; Storme, 2017). The action of the positional law is more difficult to determine, so much so that in Modern French, the trend is rather towards the opening of /o/ in all non-final syllables (Straka, 1950: 275). Valdman (1978) completely rejects the positional law in French, and Morin (1986) proposes that the quality of contemporary vowels is related to their phonological length in Middle French rather than to synchronic syllable structure.
In the conservative norm of the Standard (Parisian) French variety, /e/ is prohibited in closed syllables, and /ɔ/ is prohibited in absolute final position, but there are still pairs such as pré ‘meadow’ /pʁe/ - prêt ‘loan’ /pʁɛ/ where there is an opposition between /e/ and /ɛ/ in open final syllable, or paume ‘palm’ /pom/ - pomme ‘apple’ /pɔm/ where there is an opposition between /o/ and /ɔ/ in closed final syllables.
In non-final syllables, specifically in penultimate syllables, which we discuss here, three constraints may be considered to govern the distribution of mid vowels in French: the positional law, VH, and faithfulness to the root (Durand and Lyche, 2004). Additionally, in all syllables, there is an influence, presumably less systematic, of spelling and analogy effects (Morin, 1986: 204). The combination of all these constraints leads to the realization of a more central vowel quality regarding aperture (or height) in penultimate position – a fact recognized since Rousselot and Laclotte (1902) – but it does not completely neutralize the oppositions, at least in a formal speaking style of Standard (Parisian) French (Landick, 2004).
Normative works maintain the opposition between /e/ and /ɛ/ in open penultimate syllables, for example, pêché ‘to fish, PTP’ /pɛʃe/ ∼ péché ‘sin’ /peʃe/ or raisonne ‘he/she/it reasons’ /ʁɛzɔn/ ∼ résonne ‘he/she/it resonates’ /ʁezɔn/. However, we can expect the combined action of the positional law and VH, which would tend to close the /ɛ/ of pêché, both because of the open penultimate syllable position and the presence of a close-mid /e/ in the final syllable. (Yet, the two rules conflict in the second example, where the positional law would partially close the /ɛ/ of raisonne in an open syllable, while VH would keep it open-mid before an open-mid /ɔ/.)
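The interaction just described can be made concrete with a minimal sketch. It is only an illustration under simplifying assumptions: the function names (`positional_law_pull`, `vh_pull`, `rules_conflict`) and the four-way height labels are hypothetical, both tendencies are reduced to a categorical direction, and the gradient nature of the effects is ignored.

```python
# Illustrative sketch only: both tendencies are reduced to a direction
# ("closing" or "opening"); real effects are gradual, not categorical.
CLOSER_HEIGHTS = {"close", "close-mid"}  # hypothetical height labels


def positional_law_pull(v1_height, open_syllable):
    """Direction in which the positional law pushes V1, or None if V1
    already has the quality favoured by its syllable type."""
    if open_syllable and v1_height == "open-mid":
        return "closing"
    if not open_syllable and v1_height == "close-mid":
        return "opening"
    return None


def vh_pull(v2_height):
    """Direction in which the final vowel (V2) pulls V1 under VH."""
    return "closing" if v2_height in CLOSER_HEIGHTS else "opening"


def rules_conflict(v1_height, open_syllable, v2_height):
    """True when the positional law and VH push V1 in opposite directions."""
    law = positional_law_pull(v1_height, open_syllable)
    return law is not None and law != vh_pull(v2_height)


# pêché /pɛʃe/: open-mid V1, open syllable, close-mid V2 (both favour closing)
print(rules_conflict("open-mid", True, "close-mid"))
# raisonne /ʁɛzɔn/: open-mid V1, open syllable, open-mid V2 (opposite pulls)
print(rules_conflict("open-mid", True, "open-mid"))
```

In this toy encoding, pêché comes out as a case of agreement between the two tendencies and raisonne as a case of conflict, matching the two examples above.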
The /o/ - /ɔ/ pair is governed by more specific phonotactic constraints not known to the /e/ - /ɛ/ pair. Specifically, in word-final syllable, /o/ never appears before /ʁ/, /ɡ/ and /ɲ/, and /ɔ/ never appears before /j/ (Léon, 1978: 57). Landick (2004: 29), however, notes rare exceptions to these rules, e.g., saur /soʁ/ and goy /gɔj/. In penultimate syllable, /ɔ/ and /o/ retain their identity, but only /o/ appears before /z/ (Malmberg, 1969). The spellings <eau, au, ô> and <o> followed by a voiced <s> (/z/) correspond to /o/ everywhere, and in some words ending in -osse, -one and -ome, but never if <au> is followed by <r>, in which case it is pronounced /ɔ/ (Malmberg, 1969).
1.3 Mid vowels in Vowel Harmonization contexts
VH in French is usually defined as a word-level anticipatory assimilation process involving the partial assimilation in aperture of mid vowels in penultimate syllable (V1) to the word-final (V2) vowel. (Let us remind the reader here that stress in French falls on the word-final (non-schwa) syllable of an accentual group.) This can be observed in the contrasting pronunciations of V1 in words like fêtard ‘party-goer’ [fɛtaʁ] and fêter ‘to celebrate’ [fe̞te], or collègue ‘colleague’ [kɔlɛg] and colline ‘hill’ [ko̞lin]. This essentially partial assimilation had, for a certain period, been described in terms of derivational rules, thus being treated as triggering a shift from the category of open-mid vowels to close-mid vowels and vice versa (Morin, 1971; Selkirk, 1972; Dell, 1973; Casagrande, 1984; Tranel, 1987). Under such accounts, the examples mentioned above would be transcribed as [fete] instead of [fe̞te] and [kolin] instead of [ko̞lin]. However, it is worth noting that these authors had already indicated that the rules they outlined should be treated as ‘tendencies’ (Morin, 1971: 98) or as optional rules (Dell, 1973), with the essentially gradual and highly variable nature of VH also confirmed by experimental acoustic studies (Fagyal et al., 2003; Nguyen and Fagyal, 2008; Turco et al., 2016a, 2016b).
In impressionistic studies and older treatises, VH has been the subject of numerous detailed discussions, as outlined by Landick (2004) and Fagyal et al. (2003). Two main areas of divergence arise. First, authors differ in the number and quality of V1 involved in VH. The pair /e/ - /ɛ/ has been universally accepted as subject to VH, sometimes being the sole focus (Fouché, 1956; Selkirk, 1972), to which some authors add only /ø/ - /œ/ (Straka, 1950; Malmberg, 1969), only /o/ - /ɔ/ (Grammont, 1914), or both pairs (Tranel, 1987). Second, authors diverge in terms of the subset of V2 vowels that can trigger VH, sometimes only V2 ∈{i, y, u} (Fouché, 1956), extending to all non-low vowels for Morin (1971).
Due to these divergences, we adopt, in this real-time study of a diachronic corpus, the definition of VH recently coined by Turco et al. (2016a, 2016b) to study this effect in large corpora of continuous speech, i.e., in a broader lexicon, without limiting the analysis to quasi-minimal pairs of VH, e.g., béquille ‘crutch’ ∼ bécasse ‘woodcock’, notice ∼ nota, typically used in experimental studies based on the reading of laboratory sentences (Fagyal et al., 2003; Nguyen and Fagyal, 2008). Specifically, Turco et al. (2016a, 2016b) did not exclude V1 in closed syllables, nor did they define a specific repertoire of V2, accepting all French vowels in final syllable, dividing them into two classes of open or open-mid vowels (hereafter referred to as “non-high”Footnote 1), including /ɛ, ɛ̃, œ, a, ɑ̃, ɔ, ɔ̃/ and close or close-mid vowels (hereafter referred to as “non-low”), including /i, y, e, ø, o, u/.
In our analysis, we will distinguish neutral and VH contexts of mid vowels in penultimate syllable based on the degree of aperture of V1 (open-mid or close-mid) and the degree of aperture of V2 (non-high or non-low), as displayed in Figure 1. The transcription adopted here for mid vowels in penultimate syllables adheres to the vowel’s aperture in the derivational base; therefore, rêver ‘to dream’ was considered to have V1 /ɛ/ because of rêve ‘dream’ [ʁɛv].
In the imitation-based experiment using a corpus designed specifically for this study, the definition of VH is the same, distinguishing between an opening VH affecting /e, o/ and a closing VH affecting /ɛ, ɔ/. However, we adopt a more restrictive choice of contexts, relying on the lists of quasi-minimal pairs used in the line of studies on VH in French using this type of corpus (Fagyal et al., 2003; Nguyen and Fagyal, 2008).
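The context classification described above can be sketched as a small lookup. This is a hypothetical illustration, not the procedure actually used in the study: the function name `vh_context` and the returned labels are assumptions, and the mapping (close-mid V1 before non-high V2 as opening VH, open-mid V1 before non-low V2 as closing VH, everything else as neutral) is our reading of the definitions given in the text.

```python
# Vowel classes as listed in the text; the labels and function name are
# illustrative assumptions, not taken from the original study.
NON_HIGH_V2 = {"ɛ", "ɛ̃", "œ", "a", "ɑ̃", "ɔ", "ɔ̃"}  # open or open-mid
NON_LOW_V2 = {"i", "y", "e", "ø", "o", "u"}          # close or close-mid
CLOSE_MID_V1 = {"e", "o"}
OPEN_MID_V1 = {"ɛ", "ɔ"}


def vh_context(v1, v2):
    """Classify a penultimate mid vowel (V1) given the final vowel (V2)."""
    if v1 not in CLOSE_MID_V1 | OPEN_MID_V1:
        raise ValueError(f"V1 must be one of /e, ɛ, o, ɔ/, got {v1!r}")
    if v2 not in NON_HIGH_V2 | NON_LOW_V2:
        raise ValueError(f"unknown final vowel {v2!r}")
    if v1 in CLOSE_MID_V1 and v2 in NON_HIGH_V2:
        return "opening VH"   # close-mid V1 before an open(-mid) V2
    if v1 in OPEN_MID_V1 and v2 in NON_LOW_V2:
        return "closing VH"   # open-mid V1 before a close(-mid) V2
    return "neutral"          # V1 and V2 already agree in aperture class


# rêver: V1 /ɛ/ (from the base rêve), V2 /e/
print(vh_context("ɛ", "e"))
# fêtard: V1 /ɛ/, V2 /a/
print(vh_context("ɛ", "a"))
```

The V1 values passed in are assumed to follow the transcription convention stated above, i.e. the aperture of the vowel in the derivational base.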
1.4 Evolution of /e, ɛ, o, ɔ/ in the recent history of French
A commonly accepted idea that runs through the literature on the subject is the presumed instability, even phonological confusion (Hansen and Juillard, 2011), of two-quality vowel pairs /e/ - /ɛ/ and /o/ - /ɔ/ which have been at play in Standard (Parisian) French since at least the mid-20th century (Martinet, 1945, 1969). Other authors note intermediate qualities where a more distinct contrast between the two phonemes was expected (Peretz, 1977; Peretz-Juillard, 1985; Landick, 1995).
While two-quality vowel pairs have been described as undergoing a loss of distinctiveness, each of them seems to have its own dynamics and evolutionary path. The opposition between /e/ and /ɛ/ is most frequently discussed, but accounts are far from unanimous. If, as Léon (1973) suggests, a relatively normative use of /e/ and /ɛ/ in word-final positions characterizes the speech of working classes, others have reported an overuse of the close /e/ quality within the same sociolect, alongside a rather hypercorrective use among speakers with higher social status (Peretz, 1977). A more recent empirical analysis confirms a significant acoustic convergence between /e/ and /ɛ/ in word-final position in spontaneous speech compared to the journalistic speaking style (Gendrot and Audibert, 2019); the merger is widespread in several French varieties (Armstrong and Pooley, 2010; Boula de Mareüil et al., 2013; Hall, 2019). Other isolated trends and potential sources of confusion were also noted, such as an overrepresentation of the close-mid quality before /ʁ/ among young speakers from the Paris suburbs (Conein and Gadet, 1998).
The opposition of /o, ɔ/, on the other hand, seems to remain the least affected of the four pairs of two-quality vowels. Some orthographic criteria (or lexical constraints manifested in word spelling) may come into play: ‘au’, ‘eau’ and ‘ô’ tend to be pronounced /o/, and /ɔ/ is considered the underlying form of the graphic ‘o’ elsewhere than in open final syllable (Walter, 1976; Walker, 2001; Boula de Mareüil et al., 2013). However, the results of a panel survey conducted by Hansen and Juillard (2011) show a shift toward the loss of this distinction from the 1980s to the 2010s.
As for mid vowels in penultimate syllables, one can wonder whether, alongside the general trend towards the acoustic convergence of open-mid and close-mid vowels, the effect of constraints such as the positional law or VH itself has evolved. Regarding the positional law, the interpretation proposed by Fouché (1935), but not yet verified experimentally, is that it reflects a diachronic tendency to regularize the system, which emerged at the end of the 15th century and “n’a pas encore épuisé toutes ses possibilités”Footnote 2 (Fouché, 1935: 47). According to Spence (1988), Northern French is still subject to the effects of this active law, with Southern French being in a more advanced state of its impact.
Regarding VH, it has long been described as stylistically marked, belonging to a spontaneous speaking style, and absent from a formal, elevated, or more controlled speaking style (Fouché, 1956; Tranel, 1987). More recently, a study by Turco et al. (2016a) on VH realizations in two large corpora of continuous speech highlighted a significant interaction between the F1 of harmonized (penultimate) vowels and the speaking style. The results were surprising, revealing that the degree of VH was significantly higher in the more conservative journalistic speech, as compared to casual speech. Drawing from these findings, Cecelewski et al. (2023) suggested that the observed synchronic variation between these two speaking styles might mirror a diachronic change in the extent of VH and concluded that the degree of VH tended to decrease from the 1940s to the 1990s in a corpus of journalistic archives, which is part of the diachronic corpora from 1925 to 2023 analysed here. These results are all the more interesting as, contrary to expectations, VH appears to be most relevant in the speech production of professional speakers from a generation taught by “professeurs de diction qui n’admettent pas l’harmonisation vocalique”Footnote 3 (Straka, 1950: 276, cited by Landick, 2004: 46).
The evolution of mid vowels in penultimate syllables, assuming the trend towards a reduction in the extent of VH, seems logically linked to the ongoing evolution of these vowels in the word-final position, described as a tendency to shift the pronunciation of mid vowels towards more closed qualities and closer acoustic proximity to each other. The results of the previous studies lead us to formulate the hypothesis that an acoustic convergence in terms of aperture has affected /e, ɛ/ on the one hand and /o, ɔ/ on the other, in word-final position, in the period under investigation.
As for mid vowels in penultimate syllables (VH contexts), the analyses by Turco et al. (2016a) and Cecelewski et al. (2023), which defined mid vowels as archiphonemes V1 ∈ {E, O}, as in Gadet’s (1989) minimal system, overlooked a potential distinct development of VH for the four phonological mid vowels, V1 ∈ {e, ɛ, o, ɔ}, in neutral and VH contexts. Our study aims to refine the diachronic analysis of VH by separately examining the four mid vowels and distinguishing between closing VH on /ɛ, ɔ/ and opening VH on /e, o/. Our hypothesis thus posits that VH has reduced during the period under investigation, impacting the four mid vowels V1 ∈ {e, ɛ, o, ɔ}.
The interest in distinguishing the four mid vowels in penultimate syllables instead of two archiphonemes is the following: the literature review suggests that we are dealing with either a common trend or two distinct trends. The common trend would consist of an acoustic convergence of /e, o/ and /ɛ, ɔ/, leading to a reduction in the extent of VH as described by Cecelewski et al. (2023). In this case, the prediction would be that if one of the pairs of vowels – /e, ɛ/ or /o, ɔ/ – shows convergence in terms of vowel quality, the vowels forming it would be affected by a reduction in the extent of VH during the same periods. If, on the other hand, these are two distinct trends, one towards acoustic convergence of mid vowels in word-final position and the other towards the reduction of the extent of VH, one might expect that different vowels would be affected by these trends in different ways, or possibly in different periods. The interest in the chronological extent of the data presented here (1925–2023) also lies in the ability to effectively distinguish different periods of evolution for each of the trends across different vowels among /e, ɛ, o, ɔ/. In this case, we might suppose that the reduction of VH is not a result of the convergence of close-mid and open-mid vowels in terms of aperture and therefore deserves a different interpretation within the scope of the evolution of the French vowel system.
1.5 Hypotheses
The predictions formulated thus lead us to put forward the following hypotheses:
Hypothesis 1 The oppositions between /e/ and /ɛ/, as well as between /o/ and /ɔ/, undergo acoustic convergence in word-final position in terms of aperture (F1), with a preference for closed variants during the period under investigation.
Hypothesis 2 The extent of VH tends to decrease for the four mid vowels /e, ɛ, o, ɔ/ in the period under investigation.
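One possible way to make the two hypotheses measurable is sketched below. It is a simplified illustration and not the statistical analysis carried out in the study: the function names `f1_distance` and `vh_extent` are hypothetical, the measures are plain differences of mean F1 (in Hz), and the numbers in the usage lines are invented placeholders, not measurements.

```python
from statistics import mean


def f1_distance(f1_open_mid, f1_close_mid):
    """Hypothesis 1: mean F1 of open-mid tokens minus mean F1 of close-mid
    tokens of one pair (e.g. /ɛ/ vs /e/) in word-final position.
    Convergence predicts that this distance shrinks from period to period."""
    return mean(f1_open_mid) - mean(f1_close_mid)


def vh_extent(f1_vh, f1_neutral):
    """Hypothesis 2: absolute difference in mean F1 of one V1 category
    between VH contexts and neutral contexts. A reduction of VH predicts
    that this difference shrinks from period to period."""
    return abs(mean(f1_vh) - mean(f1_neutral))


# INVENTED placeholder values in Hz, for illustration only.
early = f1_distance([560, 580], [380, 400])
late = f1_distance([500, 520], [400, 420])
print(early > late)  # a shrinking distance would be in line with Hypothesis 1
```

Under this sketch, Hypothesis 1 also predicts that the shrinking distance comes mainly from a lowering of F1 in the open-mid member of each pair, since closed variants are expected to be preferred.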
1.6 A few notes on VH-related terminology in the history of French phonology
From the very first experimental studies, a debate arose about the nature of VH in French. Indeed, its gradual and highly variable characteristics challenge its traditionally assumed phonological status (Fagyal et al., 2003; Nguyen and Fagyal, 2008), making the adoption of the term “vowel harmony”, which refers to a phonological phenomenon, delicate (Jakobson et al., 1952: 41; Walker, 2012: 575). Fagyal et al. (2003) reviewed various possible terms, including “umlaut” (mainly used in German linguistics), and recommended adopting the term “metaphony,” more familiar within the tradition of Romance linguistics and, unlike “vowel harmony”, specifically referring to regressive assimilation (Fagyal et al., 2003: 17). In this section, we aim to contribute to this debate by providing some historical elements on the origin of the term “vowel harmony” in the French phonological tradition.
Historically, the phenomenon of vowel-to-vowel assimilation affecting mid vowels in penultimate syllable, as described by Rousselot and Laclotte (1902), did not warrant a specific term until 1916, when Grammont introduced the term “harmonisation” (Grammont, 1916: 23). However, during this period, the French school of historical and comparative linguistics was already acquainted with the term “harmonie vocalique” and, less frequently, “harmonie des voyelles” (Adam, 1874). The French term “metaphonie” was introduced some 20 years later by Henry in his comparative grammar of English and German (Henry, 1893); he borrowed morphological elements from “umlaut” to create a terminological equivalent through scholarly translation into Greek. The English term “metaphony” came into being a year later, in an abbreviated version of his grammar treatise (Henry, 1894). We can thus assume that Grammont was familiar with these two terms. Nevertheless, Grammont did not use the term “metaphony”, although he was referring to regressive assimilation phenomena, often related to vowel height, typically affecting a non-final vowel under the influence of the word-final vowel. He did not use the term “harmonie” either, but instead introduced a new term specific to the phenomena he observed in French: “harmonisation vocalique”. Subsequently, many linguists (Martinet, 1945; Straka, 1950; Fouché, 1956; Malmberg, 1969) borrowed the term “harmonisation” from Grammont. The question of the phonological status of “harmonisation” was not raised at that time. In fact, it was not until 1973 that the term “harmonisation” was replaced by “harmonie” by Dell, who provided an interpretation in the form of a derivational rule, following the generative phonology approach (Dell, 1973).
Other linguists followed suit, and “harmonie”, rather than “harmonisation”, gradually became integrated into the French phonological tradition (Casagrande, 1984; Tranel, 1987; Durand and Lyche, 2004). Indeed, contrary to a claim prevalent in the literature (Fagyal et al., 2003), VH was considered as such for only a relatively brief terminological parenthesis, like a symbolic artifact stemming from a symbolic generative approach. If there is a term that might suit better than “harmonie” to describe the phenomenon in question, we propose that it be the first term used since Grammont (1916), namely “harmonisation”, which has the merit of referring to a systematic phenomenon of assimilation without implying phonological categorization.
2 REAL-TIME STUDY OF MID VOWEL APERTURE AND VOWEL HARMONISATION
2.1 Corpora
To investigate VH, we worked with vowels drawn from a dataset of four corpora of French recordings, comprising largely scripted, mainly broadcast speech. To already existing corpora (INA and ESTER corpora), we have added two corpora specially created for this study, allowing us to expand the temporal scope of our analysis to earlier (1925–1929) and more recent (2020–2023) data.
The recordings of the newly created corpora were transcribed using the Google Cloud Speech-to-Text API recognition software (as described in Shakhovska et al., 2019), achieving an initial accuracy of approximately 60% to 70% compared to the original content. The transcriptions were then carefully completed by hand to best match the original text and to optimize the automated processing. The recordings were then segmented into phonemes using Montreal Forced Aligner (McAuliffe et al., 2017).
Due to the limited number of female speakers in the early recordings, we confined our sample to male speakers throughout the dataset. This approach excludes potential variations introduced by female speaker productions, present in only a subset of our data. We acknowledge that this methodological choice constrains the applicability of our results. Therefore, we suggest that a more extensive study of vowel harmonization should also encompass female voices.
2.1.1 BnF 1925–1929 (“Archives de la Parole d’Hubert Pernot” collection)
The first corpus used in the present study comes from a collection of recordings made at the Sorbonne studio between 1924 and 1930, under the direction of Hubert Pernot, who became the director of the Archives de la Parole under the auspices of the Institute of Phonetics at the University of Paris in 1924. The corpus, specifically created for this study, comprises 23 recordings of approximately 3 minutes each, recorded between 1925 and 1929 on double-sided or single-sided 78 RPM records from Pathé and Presto companies and digitized by the Bibliothèque nationale de France (BnF). These recordings amount to over an hour of spoken language.
The recordings feature 11 male speakers delivering scripted speech in a declamatory style, corresponding to a hyperarticulated, either formal or literary elevated style of oral expression. The content of the recordings includes speeches related to the 1914–1918 war (9 recordings), unpublished literary texts read by their authors (7 recordings), and recollections of the scientific or political careers of intellectuals of that time (7 recordings). It is indeed a distinct style with its own characteristics, far from the colloquial style of everyday spontaneous speech, which is rarely found in the archives of that period.
Despite their age, the quality of the recordings was sufficiently consistent to ensure robust formant measurements. Figure 2 displays spectrograms and segmentations of VH contexts extracted from the BnF 1925–1929 corpus.
2.1.2 INA 1940–1997 corpus
The original dataset, spanning from 1940 to 1997, corresponds to the corpus used in a study by Boula de Mareüil et al. (2011) that analysed the evolution of prosody within journalistic speech and, more recently, in a pilot study by Cecelewski et al. (2023) examining the evolution of VH in French. This corpus comprises approximately 10 hours of scripted speech, encompassing about 160 documents with durations ranging from 20 seconds to 20 minutes. These documents are sourced from the French Institut National de l’Audiovisuel (INA) archives and encompass a variety of materials, including cinematographic newsreels and audiovisual broadcast news (Barras et al., 2002).
Despite substantial variations in speaking style and acoustic conditions, Barras et al. (2002) demonstrated that estimates of signal-to-noise ratios did not significantly increase for older documents. Moreover, Boula de Mareüil et al. (2011) found that the potential impact of background noise on measurements, calculated in terms of percentages of octave jumps between consecutive vowels and the percentage of vowels detected as non-voiced, was low. Due to the limited presence of female voices in journalistic archives from the second half of the last century, the few available female voices were excluded from the analysis. The data distribution within the corpus is notably imbalanced, with approximately 3000 words in the 1940s archives compared to 28000 words in the 1960s. Consequently, we grouped the data into three periods: 1940–1959, 1960–1979 and 1980–1997, following the approach adopted by Boula de Mareüil et al. (2011).
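The grouping by period can be sketched as a simple binning step. The snippet below is a minimal illustration using the period boundaries listed above; the names `INA_PERIODS` and `ina_period` are hypothetical and do not come from the original processing pipeline.

```python
# Period boundaries as listed in the text (inclusive on both ends).
INA_PERIODS = [(1940, 1959), (1960, 1979), (1980, 1997)]


def ina_period(year):
    """Return the label of the INA period containing a recording year."""
    for start, end in INA_PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the INA corpus (1940-1997)")


print(ina_period(1945))  # a 1940s newsreel falls in the first bin
```

Pooling adjacent decades in this way gives the sparsely documented 1940s a larger bin, at the cost of a coarser time resolution.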
Regarding the speaking style within the INA corpus, it is not consistent throughout the entire period covered, as demonstrated by studies conducted by Boula de Mareüil et al. (2011) on the evolution of journalistic prosody. In particular, the 1940s–1950s were characterized by the so-called Gaumont-Pathé style: in 1908, the Pathé company introduced newsreels, shown before feature films, accompanied by commentaries delivered by a professional announcer who addressed the audience “avec une dynamique d’annonceurs de foire”Footnote 4 (Boula de Mareüil et al., 2012: 108), often without amplification. The Gaumont-Pathé style persisted on the radio until the advent of the transistor and the popularization of TV.
For the purpose of acoustic analysis, the archive documents were segmented into phonemes using an automatic alignment system. This system employed context-independent acoustic models that were extensively trained and a pronunciation dictionary specifically adapted to the corpus. This machine learning-based transcription method used Gaussian mixture models (with 256 Gaussians per phoneme). Initially described in Adda-Decker et al. (Reference Adda-Decker, Boula de Mareüil, Adda and Lamel2005), this method has since been validated in various studies (Adda-Decker, Reference Adda-Decker2006; Gendrot and Adda-Decker, Reference Gendrot and Adda-Decker2005; Woehrling et al., Reference Woehrling, Boula de Mareüil, Adda-Decker and Lamel2008). Figure 3 displays spectrograms and segmentations of VH and neutral contexts extracted from the INA 1940–1997 corpus.
2.1.3 Corpus ESTER 1999–2004
The third corpus, frequently used in the French-speaking phonetics community, ESTER (Galliano et al., Reference Galliano, Geoffrois, Gravier, Bonastre, Mostefa and Choukri2006), has extended the temporal scope of our diachronic studies up to the first decade of the 21st century. The ESTER corpus comprises nearly 50 hours of journalistic speech, from male speakers, consisting of excerpts from radio broadcasts aired between 1999 and 2004, on various stations such as France Inter, Radio France International, France Culture and Radio Classique. Notably, the ESTER corpus has been extensively used in acoustic phonetics (Adda-Decker et al., Reference Adda-Decker, Boula de Mareüil, Adda and Lamel2005, Reference Adda-Decker, Gendrot and Nguyen2008, Reference Adda-Decker, Gendrot, Snoeren, Nguyen, Nguyen and Adda-Decker2013; Bürki et al., Reference Bürki, Gendrot, Gravier, Linares and Fougeron2008; Audibert et al., Reference Audibert, Fougeron, Gendrot and Adda-Decker2016, among others), particularly contributing to the establishment of reference formant values for French vowels through the analysis of extensive datasets (Gendrot and Adda-Decker, Reference Gendrot and Adda-Decker2005). It has also been used in two recent studies on VH in French (Turco et al., Reference Turco, Fougeron and Audibert2016a, Reference Turco, Fougeron and Audibert2016b). The ESTER corpus contains manual orthographic transcriptions, which have been phonetically transcribed and automatically aligned (Galliano et al., Reference Galliano, Geoffrois, Gravier, Bonastre, Mostefa and Choukri2006). We analysed 472 male speakers from ESTER for this study.
2.1.4 Corpus 2020–2023
The latest corpus was created specifically for this study with the aim of extending the diachronic analysis to the most recent data available. We selected recordings from six male speakers, aged approximately between 25 and 35 years, with durations ranging from 3 to 12 minutes, totalling over 75 minutes of speech, recorded between 2020 and 2023. The recordings were collected from a range of platforms, encompassing three speakers from the French public radio channel France Inter, two from popular science communication channels on YouTube, and one from a trending news channel on TikTok. The purpose of this selection was to build a representative sample of contemporary journalistic speaking styles within the multifaceted landscape of modern media.
More specifically, the choice to include audiovisual content from YouTube and TikTok was primarily driven by the observation that, in recent times, the speaking style of young radio presenters has shown a noticeable shift towards characteristics of spontaneous or partially spontaneous speech. Conversely, the speaking style present on these new media platforms closely aligns with the conventions observed in French media during the earlier periods covered by the INA and ESTER corpora, characterized by scripted and planned speech.
2.2 Method
2.2.1 Extracting contexts for the analysis
We began by extracting occurrences of vowels V ∈ {e, ɛ, o, ɔ} in word-final position (here “full vowels”), including monosyllabic words (e.g. cette ‘this’ /sɛt/, mot ‘word’ /mo/, consonne ‘consonant’ /kɔ̃sɔn/), from the corpora corresponding to the period 1925–2023. Among polysyllabic words, we analysed V2 in VH contexts where it was one of the mid vowels (e.g. méthode ‘method’ /metɔd/), as well as the final vowel in words outside of VH contexts (e.g. parole ‘speech’ /paʁɔl/).
In order to extract VH contexts, we started by filtering sequences V1+C0+V2, where V1 ∈ {e, ɛ, o, ɔ} and V2 ∈ {i, e, ɛ, ø, œ, o, ɔ, a, y, u, ɔ̃, ɑ̃, ɛ̃}. Subsequently, we selected contexts where V2 was in word-final syllables. Similar to the studies by Turco et al. (Reference Turco, Fougeron and Audibert2016a) and Cecelewski et al. (Reference Cecelewski, Gendrot, Adda-Decker and Boula de Mareüil2023), no constraints were imposed on word length nor on the sequence of consonants between V1 and V2. Also, we did not impose any a priori restrictions on the inventory of V2 vowels that could or could not influence the aperture of V1. Final vowels V2 were then categorized based on their degree of phonological aperture, thereby distinguishing non-high vowels, including /ɛ, ɛ̃, œ, a, ɑ̃, ɔ, ɔ̃/, from non-low vowels, including /i, y, e, ø, o, u/. It is worth noting that in cases like réitéraient ‘reiterate, IMPERF’ or communautaire ‘community, ADJ’ only the assimilation of /e, ɛ, o, ɔ/ in immediate pretonic position was considered as relevant in our examination. In Table 1, we provide the number of contexts eligible for analysis.
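The categorization of V2 described above can be sketched as a simple lookup. This is only an illustration: the vowel inventories are those listed in the text, whereas the constant and function names are hypothetical and do not come from the authors' scripts.

```python
# Illustrative sketch: names are hypothetical, inventories are those given in the text.
MID_V1 = {"e", "ɛ", "o", "ɔ"}
NON_HIGH_V2 = {"ɛ", "ɛ̃", "œ", "a", "ɑ̃", "ɔ", "ɔ̃"}
NON_LOW_V2 = {"i", "y", "e", "ø", "o", "u"}


def classify_vh_context(v1, v2):
    """Return the aperture category of V2 for a V1+C0+V2 sequence.

    Returns None when V1 is not a mid vowel or V2 is outside both inventories.
    """
    if v1 not in MID_V1:
        return None
    if v2 in NON_HIGH_V2:
        return "non-high"
    if v2 in NON_LOW_V2:
        return "non-low"
    return None


# méthode /metɔd/: V1 = /e/, V2 = /ɔ/
print(classify_vh_context("e", "ɔ"))
```

Under these assumptions, the two categories together cover the thirteen V2 vowels retained in the filtering step.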
Accepting words where the penultimate syllable was either open or closed among VH contexts necessitated controlling for the effect of the positional law in our data, even if it was not directly part of the analysis. When combined with the distinction between /e/ – /ɛ/ and /o/ – /ɔ/, the differentiation of the two V2 categories (non-low or non-high) and the structure of the penultimate syllable (open or closed), resulted in 14 possible combinatorial variants of VH and neutral contexts (let us recall that /e/ only occurs in French in an open syllable), as displayed in Figure 4.
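The count of 14 variants follows from crossing four V1 vowels, two V2 categories and two syllable types (16 combinations) and removing the two combinations in which /e/ would stand in a closed syllable. A minimal sketch of this arithmetic, with variable names chosen purely for illustration:

```python
from itertools import product

# Illustrative labels; only the counts matter here.
V1 = ["e", "ɛ", "o", "ɔ"]
V2_CATEGORY = ["non-low", "non-high"]
SYLLABLE = ["open", "closed"]

variants = [
    (v1, v2, syllable)
    for v1, v2, syllable in product(V1, V2_CATEGORY, SYLLABLE)
    if not (v1 == "e" and syllable == "closed")  # /e/ occurs only in open syllables
]

print(len(variants))
```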
To avoid an undesirable effect of syllable structure, we checked whether the proportion of open and closed penultimate syllables varied between the different categories of V1 and V2 in our corpora. We also verified whether the ratios of contexts where V1 was in a closed or open syllable remained relatively constant across the different periods analysed. We observed no trend over time that could have interfered with the diachronic evolution of the VH effect in our data. The distribution of open and closed syllables in each category resulting from the combination of one of the V1 and V2 is presented in Figure 5.
Additionally, we coded the consonant immediately following V1 as ‘C1’, classifying them into coronals ∈ {t, d, s, z, ʒ, l, n, ɲ}, dorsals ∈ {k, ɡ}, labials ∈ {p, b, m}, and ʁ. Similarly, we coded the consonant immediately following the word-final vowel as C2, using the same classifications, when the final syllable was closed. Our data also included the variable Duration, corresponding to the duration of the analysed vowel in milliseconds, as well as the variable Lexical frequency of each word. Lexical frequency – which is known to have an impact on sound change (Labov, Reference Labov1994; Hansen, Reference Hansen2001) – was calculated internally within the corpus and transformed into a logarithmic scale.
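The coding of consonant place and the corpus-internal lexical frequency can be sketched as follows. The consonant classes are those given in the text. The function names, the use of raw token counts and the choice of the natural logarithm are assumptions, since the text does not specify the base or any smoothing.

```python
import math
from collections import Counter

# Consonant classes as listed in the text; the dictionary itself is illustrative.
PLACE = {
    "coronal": {"t", "d", "s", "z", "ʒ", "l", "n", "ɲ"},
    "dorsal": {"k", "ɡ"},
    "labial": {"p", "b", "m"},
    "ʁ": {"ʁ"},
}


def place_of(consonant):
    """Return the place class of a consonant, or None if it is not listed."""
    for place, members in PLACE.items():
        if consonant in members:
            return place
    return None


def log_frequencies(word_tokens):
    """Map each word to the natural log of its token count in the corpus (assumed scale)."""
    counts = Counter(word_tokens)
    return {word: math.log(count) for word, count in counts.items()}
```

With this scale, a word attested once receives a log frequency of 0, and each further unit corresponds to a multiplicative increase in token count.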
2.2.2 Acoustic Measures
The values of the first formant peaks were extracted at ⅓, ½, and ⅔ of the duration of the analysed vowel (the onset and offset of the vowel were determined based on the boundaries imposed by the automatic alignment system, subsequently corrected), either in word-final position (including monosyllabic words) or in penultimate syllable (V1) in the contexts of VH. The three values were estimated with the Burg algorithm implemented in Praat (Boersma and Weenink, Reference Boersma and Weenink2016) and then averaged to obtain a single mean value. A script was used to automate the extraction of formant values with the following parameters: down-sampling to 44.1 kHz, pre-emphasis at 50 Hz, an upper detection limit of 4.9 kHz, and an analysis window of 25 ms.
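The sampling scheme can be illustrated independently of Praat. The sketch below only computes the three measurement times from the vowel boundaries and averages three F1 readings. The function names are hypothetical, and the formant estimation itself (Burg algorithm) is assumed to be done elsewhere.

```python
def measurement_times(onset, offset):
    """Return the times at 1/3, 1/2 and 2/3 of a vowel delimited by onset and offset (seconds)."""
    duration = offset - onset
    return [onset + duration * fraction for fraction in (1 / 3, 1 / 2, 2 / 3)]


def mean_f1(f1_values):
    """Average the F1 readings (Hz) taken at the three measurement times."""
    return sum(f1_values) / len(f1_values)
```

For a vowel aligned between 0.0 s and 0.3 s, the readings would be taken at about 0.10 s, 0.15 s and 0.20 s, and their mean would serve as the single F1 value for that token.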
In the subsequent step, we conducted outlier removal within each of the 24 categories resulting from the combination of the four phonemes (/e, ɛ, o, ɔ/) and six time intervals (1925–1929, 1940–1959, 1960–1979, 1980–1997, 2000–2004, 2020–2023). Extreme values were eliminated within each category using the thresholds of the 5th and 95th percentiles. Statistical modelling was conducted in the R environment (R Core Team, 2021).
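A percentile-based trimming step of this kind might look like the sketch below, applied separately to the F1 values of each phoneme-by-period category. The function name, the interpolation method and the decision to keep values lying exactly on a threshold are assumptions, as the text does not specify them.

```python
import statistics


def trim_outliers(values, lower_pct=5, upper_pct=95):
    """Keep only the values lying between the lower and upper percentiles (bounds included)."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [value for value in values if low <= value <= high]
```

In practice the function would be called once per category, so that the thresholds reflect the distribution of each phoneme in each period rather than that of the pooled data.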
We did not engage in formant normalization procedures, which usually require speaker individuation and are known for their limitations (Adank, Reference Adank2003). The absence of speaker individuation in our diachronic corpora, and the exclusive presence of male voices, along with the inherent uncertainties in these procedures, led us to consider formant measurement normalization as irrelevant (Gendrot, Reference Gendrot, Nguyen and Adda-Decker2013: 256–258).
2.3 Results
2.3.1 Real-time change to the degree of aperture of the vowels /e, ɛ, o, ɔ/ in word-final position
We commenced by examining the diachronic evolution in the degree of aperture of the four mid vowels V ∈ {e, ɛ, o, ɔ} in word-final position, including monosyllabic words, across the ten decades studied, as represented in Figure 6.
To statistically validate the results obtained for the front-mid vowels /e, ɛ/ and the back-mid vowels /o, ɔ/, we constructed, for each pair, a linear mixed-effects regression model using the lmer() function from the lme4 R package. The model included fixed effects for Phoneme (two levels: close-mid, open-mid, baseline open-mid), Period (six levels: 1925–1929, 1940–1959, 1960–1979, 1980–1997, 2000–2004, 2020–2023, baseline 1925–1929), an interaction term between these predictors, and fixed effect for Duration and Lexical frequency. Additionally, the model included random effect for Context (five levels: coronal, dorsal, labial, ʁ, #), with varying intercepts and slopes by period, as well as a random intercept for Word. It should be noted that the absence of speaker information within the INA corpus necessitated the exclusion of the random effect for Speaker from all statistical models applied to this dataset.
We discovered the best fit model by using manual forward model comparison with anova(), starting with F1 ∼ phoneme + period + (1|word), and adding terms and interactions to obtain the optimal model syntax to analyse both the /e, ɛ/ and /o, ɔ/ pairs simultaneously for a more reliable comparative interpretation. The models thus created included effects intended to address the question of whether there was acoustic convergence between open-mid and close-mid vowels (i.e., the interaction term between ‘Phoneme’ and ‘Period’), while including the effects of the duration of the segment analysed, lexical frequency within the corpus, and, among the covariates of the model, the potential effects of the consonant context following the analysed segment, in the case of an absolute final consonant. The analysis with anova() showed that the interaction terms between, respectively, ‘Duration’ and ‘Lexical frequency’ and ‘Period’ did not improve the AIC/BIC criteria of the models, which is why they were not included in the syntax. The following formula was thus used: lmer(F1 ∼ phoneme + period + duration + lexical_frequency + phoneme*period + (1 + context|period) + (1|word)).
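The logic of the forward comparison can be illustrated with the standard definitions of the two information criteria, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and L the maximized likelihood. The sketch below is not the authors' R code; the function names are hypothetical and any values passed to them would be illustrative only.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more heavily as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def prefers_larger_model(small, large):
    """Each argument is a (log_likelihood, n_params) tuple; True if the larger model lowers AIC."""
    return aic(*large) < aic(*small)
```

An added term is retained only when the gain in log-likelihood outweighs the penalty for the extra parameter, which is the sense in which the Duration and Lexical frequency interactions with Period did not improve the models.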
The pseudo-R² measurements of the two models were performed using the r2() function from the performance package. The significance of the terms of the models thus created, as well as the values of Conditional R² and Marginal R², are displayed in Table 2. The full coefficients of the models are presented in Appendices 1 and 2.
We used the emmeans() function from the emmeans package in R to extract post-hoc comparisons of the interaction term. The significance of the terms in the comparison using a “pairwise” approach for follow-up contrasts, with a p-value adjustment equivalent to the Tukey test, is presented in Table 3. We display comparisons between consecutive periods, as well as between every two subsequent periods. The full coefficients of the pairwise comparisons are presented in Appendices 3 and 4.
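The set of period contrasts reported can be enumerated as below. This sketch assumes that "every two subsequent periods" refers to periods two steps apart. That reading, like the function name, is an interpretation and not something the text states explicitly.

```python
PERIODS = [
    "1925–1929", "1940–1959", "1960–1979",
    "1980–1997", "2000–2004", "2020–2023",
]


def period_pairs(periods, lag):
    """Pair each period with the one located `lag` steps later."""
    return list(zip(periods, periods[lag:]))


consecutive = period_pairs(PERIODS, 1)  # e.g. 1925–1929 vs 1940–1959
two_apart = period_pairs(PERIODS, 2)    # e.g. 1925–1929 vs 1960–1979
print(len(consecutive), len(two_apart))
```

Under this reading, six periods yield five consecutive contrasts and four contrasts between periods two steps apart.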
In summary, both models indicated a significant effect of the close-mid degree of aperture, showing a significant difference in F1 between close-mid and open-mid vowels in our data. Additionally, the models showed a significant effect of lexical frequency for the /e, ɛ/ pair (with duration being non-significant) and a significant effect of duration for the /o, ɔ/ pair (with lexical frequency being non-significant).
Regarding the trends of F1 changes in the analysed vowels, two observations can be drawn from these data: firstly, the significance of the Period terms varied between the two pairs of vowels analysed depending on the given period compared to the baseline, with the significant effect indicating a lowering of F1 values over time for open-mid and close-mid vowels in each pair, respectively.
The second observation concerns a different pattern of change for front and back vowel pairs. The evolution of F1 in our dataset exhibits a similar pattern of change for /e/ and /ɛ/. The mean F1 values derived for /e/ and /ɛ/ in word-final position suggest an absence of acoustic rapprochement in terms of aperture between these two phonemes. Conversely, a distinct pattern emerges for /o, ɔ/: mean F1 values point to a consistent acoustic convergence of approximately 50 Hz between /o/ and /ɔ/ from 1925 to 2023. Pairwise comparisons corroborate these observations, as we found non-significant interactions for /e, ɛ/ and significant interactions between consecutive periods for /o, ɔ/, except for non-significant contrasts between periods 1960–1979 and 1980–1997, as well as 1980–1997 and 2000–2004, which indicate a lack of significant acoustic convergence in these periods.
Additionally, we observed a significant effect of duration of back mid vowels on F1 height, with longer vowels tending to lower the F1 of /o, ɔ/. The effect of lexical frequency, on the other hand, was significant only for front-mid vowels (/e, ɛ/), which exhibited a higher F1 in high-frequency lexical words.
2.3.2 Real-time changes in Vowel Harmonisation of /e, ɛ, o, ɔ/
To quantify the degree of VH on the /e, ɛ, o, ɔ/ vowels spanning the time period from 1925 to 2023, Figure 7 displays mean F1 values in penultimate syllables according to the phonological height of V2 contexts. The extent of VH is inferred from the difference between the solid and dashed black lines. For comparison, average F1 values for the /e, ɛ, o, ɔ/ vowels in all positions (all tokens) are represented by blue (/e, ɛ/) and red (/o, ɔ/) lines.
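The extent of VH read off the figure corresponds to a difference between two means. A minimal sketch is given below, in which the function name is hypothetical and the sign convention (non-high minus non-low) is an assumption.

```python
from statistics import mean


def vh_extent(f1_before_non_high, f1_before_non_low):
    """Difference (Hz) between the mean F1 of V1 before non-high V2 and before non-low V2."""
    return mean(f1_before_non_high) - mean(f1_before_non_low)
```

A positive value indicates that V1 is realized with a higher F1, that is, as more open, when the final vowel is non-high, which is the direction expected under assimilation in height.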
Separate linear regression models were performed for the four mid vowels in penultimate position, following the same model comparison protocol as for vowels in word-final position. We primarily studied the effect of the variable Aperture of V2 (2 levels: non-high, non-low, baseline: non-low), Period, and their interaction on the F1 of V1. The models included the variables Duration and Lexical frequency, as well as the following covariates: random intercepts and slopes for Context, with these effects varying by Period, and a random intercept for Word. It is important to note that the variable Context (comprising four levels: coronal, dorsal, labial, and ʁ) was used to code the place of articulation of the first consonant immediately following the penultimate vowel. Thus, the following formula was used: lmer(F1_V1 ∼ aperture_of_V2 + period + duration + lexical_frequency + period*aperture_of_V2 + (1 + context|period) + (1|word)).
The pseudo-R² measurements of the four models were performed using the r2() function from the performance package. The significance of the terms in the models, as well as the values of Conditional R² and Marginal R², are displayed in Table 4. The full coefficients of the models fitted for the four V1 vowels (/e, ɛ, o, ɔ/) are presented in Appendices 5, 7, 9 and 11, respectively.
Post-hoc comparisons for the interaction term aperture_of_V2*period were then computed using the emmeans() function of the emmeans R package. The significance of the terms in the comparisons, using a “pairwise” approach for follow-up contrasts with a p-value adjustment equivalent to the Tukey test, is presented in Table 5. Comparisons are displayed between consecutive periods as well as between every two subsequent periods. The full coefficients of the pairwise comparisons for the four models are presented in Appendices 6, 8, 10, and 12, respectively.
The results yield three key observations regarding the F1 behavior of penultimate vowels in VH context. Firstly, the significant effect of the Aperture of V2 factor confirms the presence of a VH effect for all vowels analysed from 1925 to 2023.
Secondly, the significance of the Period terms varied depending on the penultimate vowel and the specific period compared to the baseline, with a significant effect indicating a lowering of F1 values over time. Indeed, both front-mid and back-mid vowels showed a reduction in F1 values from 1925 to 2023 in penultimate position, aligning with trends observed for these vowels in word-final position.
Lastly, pairwise comparisons reveal a significant reduction in the extent of VH for all vowels analysed between 1925 and 2023, though the magnitude of this effect varied for different vowels across different periods. Specifically, close-mid vowels /e, o/ experienced the most substantial reduction in VH between the 1920s and 1970s, while open-mid vowels experienced the greatest reduction in VH between 1970 and 2000.
Beyond the key variables related to the evolution of the VH effect, we observed a significant effect of duration for all penultimate vowels analysed, with longer vowels showing higher F1 values. The effect of lexical frequency was more selective: high-frequency words tend to significantly lower the F1 of close-mid vowels (/e, o/), whereas this effect is not significant for open-mid vowels (/ɛ, ɔ/).
3. REAL-TIME STUDY: SUMMARY AND CONCLUSIONS
This initial real-time study has demonstrated that, between 1925 and 2023, the degree of aperture of mid vowels /e, ɛ, o, ɔ/ expressed in terms of mean F1 has continually changed, with several aspects of this evolution deserving attention. Firstly, the mean F1 of mid vowels in word-final (and penultimate) syllable has consistently decreased, with the highest values corresponding to the earliest decades, specifically 1925–1960.
It is important to note that this study focused on the empirical reality of the phonological category of aperture rather than backness. The observed convergence of segments in terms of F1 pertains solely to aperture. Therefore, we cannot conclude a merger by approximation (Trudgill and Foxcroft, Reference Trudgill, Foxcroft and Trudgill1978; Labov, Reference Labov1994) in the acoustic space without considering F2 data. These data (Cecelewski et al., Reference Cecelewski, Gendrot, Adda-Decker and Boula de Mareüil2024) indicate that, for our corpora, the fronting of /ɔ/ during the period studied maintains a relatively constant Euclidean distance, while allowing for convergence in terms of aperture between /o/ and /ɔ/ due to the fronting of the latter.
Finally, a reduction in the degree of assimilation in penultimate syllable within a VH context was observed for all four vowels /e, ɛ, o, ɔ/. The decrease in the extent of VH for the two close-mid vowels (/e, o/) occurred in an earlier period – between the 1920s and the 1970s – and was followed, a few decades later, by the reduction of VH for the two open-mid vowels (/ɛ, ɔ/).
Yet, the role of speaking style, which has evolved since 1925, remains to be determined, with moments of disruption due to the advent of radio and the professionalization of broadcast presenters in the 1940s, as well as the gradual abandonment of the Gaumont-Pathé style after the 1950s. We know that this speaking style is characterized by high f0 and F1 values (Boula de Mareüil et al., Reference Boula de Mareüil, Rilliard and Allauzen2011) and that these two parameters are acoustic correlates of an increased vocal effort (Liénard and Di Benedetto, Reference Liénard and Di Benedetto1999). However, the exact impact of this speaking style on the degree of VH is still to be ascertained. So far, we have observed that earlier periods were characterized by both a greater extent of VH and higher F1 values of the vowels analysed in our data, regardless of their exact phonological context and position within the word. Therefore, we may formulate the following hypothesis:
Hypothesis 3 The increased vocal effort measured in the old-fashioned declamatory speaking style is responsible for a greater extent of VH in the earlier decades.
Inspired by earlier studies (Zetterholm, Reference Zetterholm and Müller2007; Boula de Mareüil et al., Reference Boula de Mareüil, Rilliard, Allauzen, Bilger, Buscail and Mignon2017), a targeted imitation experiment was conducted to test this hypothesis, aiming to better distinguish between diachronic variation inherent to the French phonological system and changes resulting from the ongoing evolution of the broadcast news speaking style.
4. IMITATION EXPERIMENT
4.1 Corpus
For the purpose of an imitation-based experiment, we selected 37 pairs of disyllabic words, each pair containing both a VH context and a neutral context, representing an adjective, a noun or a verb. Within our corpus, 27 pairs were chosen from those analysed by Fagyal et al. (Reference Fagyal, Nguyen and Boula de Mareüil2003) and Nguyen and Fagyal (Reference Nguyen and Fagyal2008) for their potentially high usage frequency. In each pair, the first syllable, which is always open, contains a V1 ∈ {e, ɛ, o, ɔ}. The onset of the first syllable, if filled, is made up of either a single consonant C1 ∈ {p, k, d, s, ʒ, m, n, ʁ}, or a consonant cluster consisting of an obstruent + liquid sequence, chosen from the following: C1C2 ∈ {kl, fl, fʁ}. In each pair, one of the final vowels is close or close-mid, V2 ∈ {e, i, o, ø}, and the other is open or open-mid, V2 ∈ {ɛ, a, ɔ, œ}. In addition to the defined VH contexts, we included ten monosyllabic distractor words. The full list of VH pairs analysed is displayed in Appendix 13.
The selected words were inserted into the frame sentence used by Nguyen and Fagyal (Reference Nguyen and Fagyal2008), and previously by Fagyal et al. (Reference Fagyal, Nguyen and Boula de Mareüil2003): “Il retape ______ parfois, ______.” Thus, the same target word was produced twice, the first time within a rhythmic group and the second time at the end of the sentence. This was coded in the variable Repetition (2 levels, I, II). Apart from its somewhat artificial formulation, this frame sentence has the advantage of embedding the target word between two identical labial contexts, which are likely to facilitate unambiguous segmentation on one hand and minimize uncontrolled coarticulatory effects on the other.
4.2 Speakers and recordings
Six male subjects aged 21 to 31 years participated in the study. All of them were born in the Île-de-France region, where they still resided at the time of the recordings. None of them exhibited discernible dialectal pronunciation features nor voice and speech disorders. The recordings were conducted in an anechoic room using a high-quality microphone. The productions were recorded at a sampling frequency of 44.1 kHz (16 bits) and subsequently downsampled to 22.05 kHz.
The list of 84 sentences containing VH and neutral contexts was read twice by each speaker. During the first reading, the speakers were instructed to read each sentence at a normal pace without emphasizing a specific word. In the second reading, the speakers stood in a corner of the soundproof room to create the illusion of a slightly larger space. They were directed to speak loudly, as if addressing a person located in the opposite corner of the room, at a distance of approximately 2 to 2.5 m, producing highly articulated speech, akin to a declamatory style. Before each of the two recording sessions, the speakers were asked to maintain, as closely as possible, the same speech rate, rhythm, and intonational contour throughout their reading of the corpus. The two readings were coded under the variable Condition (2 levels: control, imitation).
4.3 Results
The mean F1 value of V1 ∈ {e, ɛ, o, ɔ} as a function of the phonological height of V2 (2 levels: non-high, non-low) and Condition (2 levels: control, imitation) is displayed in Figure 8.
To validate the results, we constructed linear mixed-effects models, using the lme4 R package (Bates et al., 2014), separately for the four V1 ∈ {e, ɛ, o, ɔ}, including the fixed effects corresponding to the phonological Aperture of V2 (two levels: non-high, non-low, baseline: non-low), Condition (two levels: control, imitation, baseline: control), and an interaction effect between these predictors, as well as random effects corresponding to Repetition (two levels: I, II), Speaker (6 levels) and Word. The following formula was used: lmer(F1_V1 ∼ aperture_of_V2 + condition + aperture_of_V2*condition + (1|speaker) + (1|repetition) + (1|word)). The p-values of the effects of the model are summarized in Table 6. The model coefficients are detailed in Appendix 14.
The results of the imitation-based experiment can be summarized as follows. Firstly, in this recent dataset, the VH effect is relatively subtle, achieving significance only for the open-mid vowels /ɛ, ɔ/, whereas the VH effect on the two close-mid vowels /e, o/ was not significant. As inferred from mean F1 values, the assimilation extent peaks at 25–30 Hz for /ɛ/ and hovers around 10–15 Hz for the vowels /e, o, ɔ/. This pattern may be indicative of a broader trend of reducing VH extent observed in our diachronic corpora. Secondly, a significant increase in F1 was noted in the speakers’ productions when using a hyper-articulated, loud speaking style, resulting in an F1 rise of approximately 60–80 Hz, and so a more open vowel quality for the vowels analysed. Finally, as indicated by the non-significant interaction terms aperture_of_V2*condition, there was no significant variation in VH due to the imitation condition. This suggests a consistent extent of assimilation across the four vowels analysed, regardless of the speaking style reproduced by the speakers. The greater vocal effort reminiscent of the Gaumont-Pathé style is therefore not sufficient to explain the greater VH tendency that was observed in the first decades of our archive corpus.
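The reasoning behind the interaction term can be made explicit as a difference of differences over the four cell means of the design (Aperture of V2 by Condition). The sketch below is illustrative only: the function name and dictionary layout are hypothetical, and it works on plain cell means, not on the estimates of the mixed-effects models reported above.

```python
def interaction_contrast(cell_means):
    """Change in VH extent (Hz) from the control to the imitation condition.

    cell_means maps (aperture_of_V2, condition) to the mean F1 of V1.
    """
    vh_control = cell_means[("non-high", "control")] - cell_means[("non-low", "control")]
    vh_imitation = cell_means[("non-high", "imitation")] - cell_means[("non-low", "imitation")]
    return vh_imitation - vh_control
```

If the loud, hyper-articulated style raises F1 by the same amount before non-high and non-low V2, the contrast is zero: vowels become more open overall, yet the extent of VH is unchanged, which is the pattern a non-significant interaction points to.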
5. DISCUSSION AND CONCLUSIONS
We conducted an acoustic study to investigate real-time change in the degree of aperture of the French mid vowels /e, ɛ, o, ɔ/ in word-final and penultimate positions, as well as the diachronic pathway of how VH operates on these vowels in a novel dataset formed from four corpora dating from 1925 to 2023. An auxiliary study conducted within the imitation paradigm was devoted to examining the sensitivity of F1 and VH to a hyperarticulated, loud speaking style.
The combination of these two studies enabled us to demonstrate that, during the period in question, /e, ɛ, o, ɔ/ exhibited a lowering of F1, primarily reflecting shifts in speaking style rather than an internal diachronic evolution of the phonological system. Cross-linguistically, the phenomena of F1 increasing alongside a higher fundamental frequency have already been reported in the literature (Assmann et al., Reference Assmann, Nearey, Bharadwaj, Hubbard and Jayaraman2008) and are not themselves a novelty. Furthermore, in contrast to the earlier findings of Hall (Reference Hall2019) or Gendrot and Audibert (Reference Gendrot and Audibert2019), we did not observe acoustic convergence, in terms of aperture, between the front vowels /e, ɛ/. However, we did observe a significant acoustic convergence in terms of aperture of over 50 Hz on average between /o/ and /ɔ/, consisting of a raising of /ɔ/ while maintaining the height of /o/. This enabled us to confirm Hypothesis 1 for back-mid vowels while rejecting it for front-mid unrounded vowels.
While adhering to the “general goal of making historical and phonetic data an integral part of the rationale for motivating and testing phonological theory” (Foulkes, Reference Foulkes1997: 272), we used the empirical reality of real-time data spanning the longest possible period in order to examine the diachronic pathway of the vowel aperture category – specifically for mid vowels, known for their instability in the French vocalic system over time (Hansen and Juillard, Reference Hansen and Juillard2011). In this context, the findings of this study can be summarized by stating that, over time, among Standard (Parisian) French (male) speakers, there has been a weakening of the phonological aperture opposition in back-mid vowels, specifically, through the raising of the open-mid vowel. Nevertheless, this shift did not result in a complete neutralization in the formal speaking style of the male speakers we studied – a style which is known for being less susceptible to change compared to casual speech. The raising of /ɔ/ aligns with the established patterns of vowel shift processes observed cross-linguistically; specifically, there is an asymmetry in diachronic vowel shifts, as they tend to raise vowel height (Labov, Reference Labov1994). This pattern is also evident in Romance metaphony, which predominantly involves a series of vowel raising phenomena (Cole, Reference Cole1998). The real-time study we presented here further reflects the gradual nature, as opposed to the scalar nature of traditional philology’s observation of change involving the slow erosion of opposition within a phonetic system (Labov, Reference Labov1994).
It is important to note that, in this study, we limited ourselves to examining the acoustic correlate of aperture (F1), without studying that of tongue body anteriority (F2). The acoustic convergence examined here thus concerns only a convergence in terms of aperture and not merger by approximation (Trudgill and Foxcroft, Reference Trudgill, Foxcroft and Trudgill1978; Labov, Reference Labov1994) in the vowel space. However, the evolution of mid vowels encompasses other phenomena such as the fronting of /ɔ/ (Boula de Mareüil et al., Reference Boula de Mareüil, Adda-Decker and Woehrling2010) confirmed by initial examinations of our data not presented here. Therefore, a complete picture of the evolution of mid vowels in French should also discuss the rearrangement of vowel segments that tends to counteract, acoustically and articulatorily, convergence on the low-high axis by spacing out the segments on the front-back axis.
Additionally, the merger of /a/ and /ɑ/ was completed during the 20th century. During this period, the French /a/ underwent backing and centralization, while /ɑ/ experienced slight fronting and centralization (Cecelewski et al., Reference Cecelewski, Gendrot, Adda-Decker and Boula de Mareüil2024). This type of neutralization can be characterized as a perfect example of merger by approximation, involving sound changes that occur subconsciously and progress gradually (Labov, Reference Labov1994), presumably driven by language-internal pressures (Guy, Reference Guy1990).
In summary, we can roughly visualize the development of the mid and low vowel system in French as undergoing a series of phonetic shifts, characterized by the convergence in terms of aperture of /o, ɔ/ alongside the fronting of /ɔ/, likely facilitated by the merger of /a, ɑ/. With the loss of /ɑ/ and the shift of /ɔ/ and /a/, the French vowel system appears to have undergone vertical compression, completed by a narrowing of the posterior region of the vowel space, while maintaining the distance between the remaining distinctive segments. In Figure 9, the black outline represents the state of the oral vowel system in word-final position at the beginning of the century, while the red outline depicts the current state of segment redispersion in the vowel space.
This raises the question of an asymmetry in the evolution of the front and back mid vowel pairs in French. Specifically, how can we explain the partial merger of back mid vowels over time, in contrast to the remarkable stability of unrounded front vowels? For example, Martinet views such dynamics as a consequence of articulatory asymmetry between the possible degrees of vowel height in the front and back of the vowel space: “pour le même nombre de phonèmes dans la série d’avant et d’arrière, les marges de sécurité seront plus étroites à l’arrière qu’à l’avant, et ceci peut, en partie, expliquer les divergences de comportement entre les deux séries” (Martinet, 1955: 99; see footnote 5).
Another possible explanation could involve vowel quantities. While this study does not directly investigate this aspect, we can observe that historically, in several languages, starting with Latin, differences in vowel quantity helped maintain quality distinctions between vowels sharing a common region of the vowel space (Leppänen and Alho, 2018). In our data, the relative duration change of back-mid vowels indeed shows a trend towards equalizing average durations between back-mid vowels spelled <ô, au, eau> and those spelled <o>, and these two categories of spelling roughly correspond to the historical opposition of distinctive vowel length (Morin, 1985). Conversely, the average duration ratios between historically long and short front mid vowels, as in maître ‘master’ and mètre ‘metre’, remain practically constant over time. During the period covered by our data, when vowel duration was only a faint reflection of the historical system of vowel quantity oppositions in French, we can assume that spelling criteria were at play in triggering the change of relative vowel duration. This asymmetry in terms of vowel duration evolution may thus have contributed to the weakening of the aperture opposition between back-mid vowels and the maintenance of the opposition in the unrounded front-mid vowel pair.
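As a purely illustrative aside, the notion of an "average duration ratio" between historically long and short vowels can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in this study: the token values, the period labels "early" and "late", and the function name duration_ratios are all hypothetical. A ratio drifting towards 1 corresponds to the equalization described for the back-mid vowels, whereas a stable ratio corresponds to the pattern described for the front-mid vowels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tokens: (period, historical length class, duration in ms).
# These values are invented for illustration; they are not the study's data.
TOKENS = [
    ("early", "long", 120.0), ("early", "long", 130.0),
    ("early", "short", 80.0), ("early", "short", 90.0),
    ("late", "long", 100.0), ("late", "long", 104.0),
    ("late", "short", 96.0), ("late", "short", 100.0),
]


def duration_ratios(tokens):
    """Return {period: mean long duration / mean short duration}."""
    by_cell = defaultdict(list)
    for period, length_class, duration_ms in tokens:
        by_cell[(period, length_class)].append(duration_ms)
    periods = sorted({period for period, _ in by_cell})
    return {
        p: mean(by_cell[(p, "long")]) / mean(by_cell[(p, "short")])
        for p in periods
    }


ratios = duration_ratios(TOKENS)
for period, ratio in ratios.items():
    print(f"{period}: long/short ratio = {ratio:.2f}")
```

With these invented values, the ratio is lower in the "late" period than in the "early" one, which is the kind of trend the paragraph above describes for vowels spelled <ô, au, eau> versus <o>.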
Another specific aspect of the diachronic pathway of French mid vowels revealed by this study is the reduction in the extent of VH affecting the vowels /e, ɛ, o, ɔ/, validating our Hypothesis 2. This is further supported by a weak VH effect, significant only for the vowels /ɛ, ɔ/ and not significant for /e, o/, in the well-controlled corpus recorded in the laboratory, which was specifically designed for the imitation-based study. More importantly, the results from the imitation study indicated that the degree of VH remained unaffected by speaking style shift for the four mid vowels under analysis, allowing us to reject the complementary Hypothesis 3. These results led us to postulate an internal phonetic evolution of the system rather than a style-related artifact, which is in line with the findings of the pilot study on VH conducted by Cecelewski et al. (2023). The authors examined the relationship between mean f0 and F1 from a diachronic perspective over half a century, specifically in the INA corpus (1940–1997), and came to a similar conclusion.
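For readers who prefer a concrete picture, one simple descriptive way to think about the "extent of VH" is the difference in mean F1 of a penultimate mid vowel between words with an open final vowel and words with a close final vowel. The sketch below makes that idea explicit. It is only an assumption-laden simplification and does not reproduce the statistical modelling used in this study: the token values, the context labels "open" and "close", the period labels, and the function name vh_extent are hypothetical.

```python
from statistics import mean

# Hypothetical tokens of one penultimate mid vowel:
# (period, height of the final vowel, F1 of the penultimate vowel in Hz).
# Values are invented for illustration only.
TOKENS = [
    ("early", "open", 470.0), ("early", "open", 490.0),
    ("early", "close", 400.0), ("early", "close", 420.0),
    ("late", "open", 450.0), ("late", "open", 460.0),
    ("late", "close", 435.0), ("late", "close", 445.0),
]


def vh_extent(tokens, period):
    """Mean F1 before open final vowels minus mean F1 before close ones."""
    f1_open = [f1 for p, ctx, f1 in tokens if p == period and ctx == "open"]
    f1_close = [f1 for p, ctx, f1 in tokens if p == period and ctx == "close"]
    return mean(f1_open) - mean(f1_close)


early = vh_extent(TOKENS, "early")
late = vh_extent(TOKENS, "late")
print(f"early: {early:.1f} Hz, late: {late:.1f} Hz")
```

A smaller difference in the later period would correspond to a reduced VH extent. A raw difference of means ignores factors such as vowel duration, speaker, and lexical frequency, which is why it can only serve as a first descriptive step.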
This phenomenon is interesting for several reasons. Firstly, the reduction of VH appears to be at least partly due to the narrowing of the mid vowel region of the vowel space. We can cite examples of languages where Vowel Harmony was historically lost along with mergers of vowel segments. For instance, Sandstedt (2020) notes that the positional merger of /ɛ/ and /e/ was a crucial factor in the gradual loss of height Vowel Harmony in Old Norwegian. However, it would be a stretch to directly compare long-term observable phenomena with known starting and ending points to the gradual changes of a partially phonologized phenomenon such as Vowel Harmonisation in the vowel system of Standard (Parisian) French.
It is worth noting that the degree of phonologization of VH in French has been discussed previously, particularly by Nguyen and Fagyal (2008). They argued that VH is too variable and limited in scope to be considered phonological. Our contribution, of a philological nature, has been to demonstrate that the history of the terms “vowel harmonisation” and “vowel harmony” shows that VH has only been considered to be phonological since the advent of generative phonology in French. Before that, the original term “vowel harmonisation”, dating back to the early 20th-century treatises on French phonology, appears to have been coined to emphasize the variability of this phenomenon and its “sub-phonemic” nature.
In any case, the reduction in the extent of VH over time seems to have been facilitated by the particular status of VH in French, which lies between a phonetic, biomechanical phenomenon and a near-categorical phenomenon governed by a robust phonological principle. Inherited from a period when the height differences between close-mid and open-mid vowels were more pronounced, VH has gradually diminished over the 20th and 21st centuries. This reduction in VH occurred alongside other vocalic shifts, serving as a compensatory adjustment to preserve the declining oppositions between open-mid and close-mid vowels for a longer period. This change paralleled hypercorrection phenomena (Ohala, 1981), likely influenced by orthographic factors. For instance, in their study of VH in continuous speech corpora, Turco et al. (2016a) found that orthographic representations, such as <é> for /e/ and <au, eau> for /o/, promoted more prototypical, close-mid realizations of these vowels, regardless of the opening VH context in the word-final syllable.
Another aspect is the paradoxical sociolinguistic behavior of VH in French. While historical accounts suggest that VH was only present in spontaneous speech and absent in formal style, we have shown that VH was indeed present in formal or conservative style in the early 20th century, despite diction teachers of the time explicitly discouraging VH (see Section 1.4). Our imitation experiment further revealed that VH in French was not affected by a strong, hyperarticulated, and projected voice, similar to the declamatory style of historical recordings. In the context of this study, the approach we adopted for differentiating between a speaking style shift and a genuine diachronic change was an imitation-based experiment (see, for example, Schlichting and Sullivan, 2013; Zetterholm, 2007; Boula de Mareüil et al., 2011, 2017), inspired by the principles of Historical Laboratory Phonology (Ohala, 1993):
If particular sound changes are posited to have a phonetic basis, then one should be able to duplicate the conditions under which they occurred historically and find experimental subjects producing ‘mini’ sound changes that parallel them (Ohala, 1993: 261).
A methodological choice was made not to engage a speech professional, capable of reproducing an acoustic impression close to an epoch-characteristic, declamatory, broadcast announcer style, as was done by Boula de Mareüil et al. (2011), but rather to opt for a group of non-professional speakers. As no imitation is free from the individual subject imprint (see Pinget, 2022), we had predicted a high inter-speaker variability, which was expected to increase the likelihood of detecting a correlation between voice type and degree of VH, even in a minority of speakers. This is something we might have missed by working with a single subject. The detailed results presented here allowed us to conclude that there was no impact of the projected voice on the extent of VH, for any of the six speakers who were recorded.
However, this type of imitation does not capture all the specific features of historical declamatory or journalistic styles. Other elements of these speaking styles, not yet studied empirically, may have contributed to the loss of VH. One such factor, currently under investigation, is the evolution of final lengthening in accentual groups (Vaissière, 1991). Over time, the stressed vowel has become shorter and is now often supported by a pre-pausal /ə/ (e.g., Fónagy, 1989), which may result in less assimilation of the penultimate vowel compared to archival recordings where final lengthening was more pronounced. A complete and accurate prosodic annotation of continuous speech corpora is an arduous and laborious task, which is why we did not include more precise factors characterizing final lengthening in our statistical modelling. Nevertheless, the fact that penultimate vowel duration had a significant effect on the F1 value of all the vowels analysed indicates that more precise prosodic phenomena not annotated in our corpus, such as the decrease of final lengthening, may have contributed to the loss of VH in the 20th century.
The last aspect of the archive-based study of changes in VH that warrants attention is the role of lexical frequency in these changes. The effects of frequency on phonetic change are complex and varied, with some arguing that high-frequency words drive the change (Bybee, 2000), while others suggest that low-frequency words can sometimes be the precursors of change, possibly because listeners retain detailed phonetic memories for specific words (Hay et al., 2015). Another perspective holds that lexical frequency has no effect on the propagation of change (see Labov, 1994). Our data indicate that, for penultimate syllable vowels, close-mid vowels /e, o/ were realized with a higher F1 in high-frequency lexical words. However, this effect was not significant for open-mid vowels /ɛ, ɔ/. More specifically, among words containing /e/ or /o/ in the penultimate syllable, high-frequency lexical words, pronounced with a more prototypical aperture, may have contributed to the reduction of assimilation to the final vowel over time. This finding is particularly intriguing as the most prominent reduction of VH occurred a few decades earlier for /e, o/ compared to /ɛ, ɔ/. While it is speculative to suggest that lexical frequency accelerated the VH reduction process in these vowels, and the timing may merely be coincidental, this aspect certainly deserves further investigation and detailed analysis.
In summary, a major difficulty in studies of this kind, inherent in the nature of the corpus, is distinguishing between features linked to a particular style or its temporal development and those that can be ascribed to a genuine diachronic change within the phonetic/phonological system. This challenge is further complicated by the scarcity of spontaneous speech and recordings of female voices, especially in French archives dating back to before the 1960s. Hence, it is essential to note that the results from the two studies presented here are applicable primarily to formal, controlled, and conservative Standard (Parisian) French, which is often assumed to exhibit greater resistance to phonetic innovations. Indeed, it would not be an overstatement to say that we could only work on a type of speech where changes are most subtle and scarce.
Despite these limitations, we have identified several changes affecting the French vowel system over the past century. We advocate for real-time studies using archival speech corpora, which have thus far been underutilized in French phonology research. Many questions remain unresolved, particularly regarding the causes of these changes and the overall dynamics of the vowel system, encompassing all oral vowel segments. We continue to develop the diachronic corpus presented here and are working to extend its temporal scope to include archives from the Bibliothèque nationale de France, spanning the years 1910–1914. Additionally, we aim to achieve more detailed prosodic annotations and increase the volume of data for each sub-period studied. This extension would enable us to trace the diachronic pathway of the French vowel system throughout the recorded history of metropolitan French.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0959269524000206
Acknowledgements
We would like to thank the reviewers for their time and effort in reviewing the manuscript. Their insightful comments and suggestions resulted in improvements to certain aspects of the analysis and inspired new ideas for future research.
This work was partially supported by the French Investissements d’Avenir - Labex EFL program (ANR-10-LABX-0083) and by the French National Research Agency (ANR) as part of the DIPVAR project (ANR-21-CE38-0019).
Competing interests
The authors declare none.