
An acoustic exploration of sibilant contrasts and sibilant merger in Mixean Basque

Published online by Cambridge University Press:  16 May 2024

Ander Egurtzegi
Affiliation:
CNRS – IKER UMR5478
Dorota Krajewska*
Affiliation:
University of the Basque Country UPV/EHU
Christopher Carignan
Affiliation:
University College London
Iñigo Urrestarazu-Porta
Affiliation:
CNRS – IKER UMR5478, UPV/EHU, UPPA
*Corresponding author.

Abstract

This exploratory study investigates sibilants in Mixean Low Navarrese, an endangered variety of Basque. This variety has been described with ten different contrastive sibilants: /s̻, s̺, ʃ, t͡s̻, t͡s̺, t͡ʃ, z̻, z̺, ʒ, d͡z̺/. The objective of the paper is to (a) provide a detailed description of the acoustics of Mixean sibilants, and (b) elucidate whether ten categories can be proposed based only on acoustical data, or whether fewer categories should be considered. The study is based on free-conversation data of ten subjects (three females, seven males) aged between 80 and 85 years. We analyze metrics reflecting the place of articulation (spectral moments, and especially the center of gravity (CoG)), including also the temporal dynamics of CoG (using the discrete cosine transform of CoG measurements of nine intervals of each phone). We also explore the acoustic correlates of the contrasts between (a) voiced and voiceless sounds and (b) fricative and affricate sounds. The results show that only seven categories can be proposed based on acoustic measurements. The lamino-alveolar series reliably contrasts with the rest, but the distinction does not hold between the apico-alveolar and the postalveolar series. We found minimal differences in the analysis of dynamic data, and none in the static analysis.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of The International Phonetic Association

1 Introduction

Mixean Low Navarrese is an endangered variety of Basque that has been described with ten different contrastive sibilants, including oppositions based on three places of articulation, voicing, and fricatives vs. affricates. Nevertheless, a merger collapsing the three apico-alveolar sibilants with the three corresponding postalveolars has been (preliminarily) proposed in the literature (Egurtzegi & Carignan 2020b).

The current exploratory study investigates these segments, in order to (a) provide a detailed description of the acoustics of sibilants in this variety, and (b) elucidate whether ten categories can be proposed based only on acoustical data, or whether fewer categories should be considered. At the same time, this paper aims to present a thorough exploration of how to approach the acoustic description of a variety of a language including an undetermined number of sibilant contrasts. To that end, we take into account an array of acoustic measures. To begin with, we include metrics which reflect the place of articulation (spectral moments; especially, the spectral center of gravity, i.e. CoG), including also the temporal dynamics, i.e. the variation CoG values show throughout the sound segment. We also take into account the acoustic correlates of the contrasts between (a) voiced and voiceless sounds and (b) fricative and affricate sounds.

The paper is organized as follows: Section 2 provides the necessary background information on sibilants and sibilant systems in the world’s languages, the Basque variety under study and its phonology, and sibilant contrasts in Basque and related acoustic studies. Section 3 lists our research questions. Section 4 introduces the data and the acoustic measures used to analyze it. Section 5 presents the results: a general exploration (5.1), the analysis of spectral moments (5.2), the temporal dynamics of CoG (5.3), the voicing contrast and analysis of duration (5.4), and distinction between fricative and affricate sounds (5.5). Section 6 discusses the results and Section 7 provides conclusions.

2 Background

2.1 Sibilants in the world’s languages

Sibilants are usually defined as fricatives with high-frequency spectral energy, in which a turbulent, random-frequency sound is generated by the strike of a jet of air against the teeth after passing at a high velocity through a narrow constriction, which is produced by the closeness of the flexible part of the tongue and the passive articulator (see e.g. Ladefoged & Maddieson 1996: 138). In the case of affricate sibilants, this sound is the second component of the segment, namely the release of a preceding homorganic stop. Sibilants are coronals by definition, and can be produced from the dental to the palatal region. They are more common than other fricatives, potentially due to their higher acoustic energy (Ladefoged 2001: 167). Most of the world’s languages have at least one fricative sound (93.4% in the UPSID database; Maddieson 1984: 42), which is typically a sibilant. A notable exception is Australian languages, many of which lack fricatives altogether (Maddieson 1984: 42).

The most frequent sibilant segment in the world’s languages is /s/, which typically denotes an alveolar fricative sibilant, although it might represent a dental or denti-alveolar in some linguistic descriptions. These sounds have often been collapsed in the literature, given that it is often difficult to discern them (Ladefoged & Maddieson 1996: 146), and they rarely contrast. In the UPSID database, where about 83% of the languages have some kind of s-sound, only four languages (1.3%) have both /s/ and /s̪/ (cf. Maddieson 1984: 44).

In the Phoible database (Moran & McCloy 2019), /s/ is represented in 2020 phonological inventories (out of 3020, i.e. 67%). Following /s/, we find /t̠ʃ/ (1218; 40%), /ʃ/ (1104; 37%), /z/ (893; 30%), /d̠ʒ/ (820; 27%), /ts/ (666; 22%), /ʒ/ (478; 16%), and /dz/ (312; 10%). Thus, the most frequent sibilants involve all possible voiceless/voiced and fricative/affricate combinations of the alveolar or postalveolar places of articulation (with no secondary articulation/release). Other sibilants are each represented in fewer than 200 phonological inventories of the Phoible database (less than 7% each).

Sibilants are most usually voiceless, although voiced sibilants are not uncommon (see Żygis et al. 2012 for a phonetic account of this biased distribution in sibilant affricates). Maddieson (1984: 45) observed that the ratio of voiced to voiceless sibilant fricatives was 0.36 for the (dental-)alveolar and 0.34 for the postalveolar pair. Non-sibilant voiced fricatives are more frequent, and thus non-sibilant fricative pairs show higher voicing ratios. Although not necessarily so, the presence of a voiced fricative in a segmental inventory generally implies that of its voiceless counterpart (Maddieson 1984: 47).

While many languages, such as English or Spanish, have a reduced number of sibilants, other languages have larger sibilant inventories. Polish (Jassem 2003: 103) is a classic example of a large sibilant inventory, with 12 sibilants (fricative/affricate and voiced/voiceless variants of dental/alveolar, (retroflex) postalveolar, and alveolo-palatal sibilants). Larger inventories imply more opportunities for mergers to occur, as in some dialects of Polish which reduce the three-way contrast to two places of articulation (Żygis 2003: 179). Mixean Basque, the language variety studied here, involves a lesser-known but nonetheless comparable case. Although previously described with a ten-sibilant system (/s̻, s̺, ʃ, t͡s̻, t͡s̺, t͡ʃ, z̻, z̺, ʒ, d͡z̺/), recent research has proposed a merger between the postalveolar and apico-alveolar categories (Egurtzegi & Carignan 2020b).

2.2 Contextualizing Mixean Basque

Within France, Mixe (Amiküze in Mixean Basque) is one of the regions forming the Pyrénées-Atlantiques department, in the South-West corner of France. Mixe is located in the North(-East) of the Basque Country, in the North of the historical province of Low Navarre. It shares borders with the Gascon Béarn to the North and the historical Basque province of Zuberoa to the East. The region is formed by 32 towns, the main one being Donapaleu (Saint-Palais in French). However, the total population of Mixe is relatively small (7856 people in 2015; L’institut national de la statistique et des études économiques (2019)) and the number of Basque speakers is even smaller. In fact, Mixean Basque (amiküzera in Mixean Basque) has been described as being on its way to disappearing (Camino 2016: 51): not even 10% of children are schooled in a Basque-speaking model (Zabalik 2016), and for those children the language of education is standard Basque, not the local variety.

Mixean Basque has been understudied until recently, perhaps because it is underrepresented in the literature and most Basque dialectologists have considered it, in synchronic classifications, as (Eastern) Low Navarrese (Mitxelena 2011 [1977]) or as a transition variety between Low Navarrese and Zuberoan. The endangerment of this variety urges researchers to study Mixean Basque now, while it is still possible, before it definitively disappears. Nevertheless, the recent thorough dialectological study by Camino (2016) and the general acoustic description by Egurtzegi & Carignan (2020b) have greatly helped improve our knowledge of Mixean Basque.

2.3 Mixean phonology

The phonological inventory of Mixean Basque consists of 34 contrastive consonants, including 12 stops /p, t, c, k, pʰ, tʰ, cʰ, kʰ, b, d, ɟ, ɡ/, ten sibilants /s̻, s̺, ʃ, t͡s̻, t͡s̺, t͡ʃ, z̻, z̺, ʒ, d͡z̺/, a labiodental fricative /f/, nine sonorants /m, n, ɲ, l, ʎ, ɾ, r, j, w/, and two laryngeals /h, h̃/, as well as six contrastive vowels /a, e, o, i, ʉ, u/. In addition, French loanwords include /ʀ, v, ɛ, œ/ and nasalized vowels. However, vowel nasalization is not phonemic in Mixean Basque (Camino 2016: 200); rather, nasalized vowels are produced due to coarticulation with any neighboring nasal segment. Some of these phonemic contrasts have been acoustically established, including the three contrastive stop series (voiced, voiceless unaspirated and voiceless aspirated; Egurtzegi & Carignan 2020b: 2797–2798), a sixth vowel quality /ʉ/, not present in other Basque dialects (Egurtzegi & Carignan 2020b: 2796–2797), and nasality as a contrastive feature distinguishing the two laryngeals /h, h̃/ (Egurtzegi & Carignan 2020a). However, other phonological oppositions have not been studied in enough detail. This is the case of the sibilant series, which is tentatively described with the high number of ten contrasting segments. Nevertheless, Egurtzegi & Carignan (2020b: 2797–2798) proposed a potential merger that would reduce their number to seven (see details below). This paper aims to fill these gaps with a more detailed study of the recordings of Mixean Basque already examined by Egurtzegi & Carignan (2020b), which were collected by Camino (2016).

2.4 Sibilant sounds in Basque

It is widely accepted that Basque once had six voiceless sibilants that were common to all dialects. These sibilants are most usually described as dorso- or lamino-alveolar /s̻/, apico-alveolar /s̺/, and palato-alveolar /ʃ/ fricatives, and their affricate counterparts, /t͡s̻, t͡s̺, t͡ʃ/ (i.a. Mitxelena 2011 [1977]; Hualde 2003; Egurtzegi 2013). Nevertheless, according to the description of the Northern High Navarrese variety of Bortziri by Yárnoz (2002a; 2002b), the six sibilants are better described as flat postalveolar (transcribed by the author as /ṣ, t͡ṣ/), denti-alveolar (/s̪, t͡s̪/), and palatalized postalveolar (/ɕ, t͡ɕ/). However, this description was only followed by one later work (Jurado 2011). In addition, some authors – including Larrasquet (1934), N’Diaye (1970), and Txillardegi (1982) – have described the apico-alveolar fricative of Eastern varieties as retroflex, restricting the apico-alveolar realizations to the varieties that are in contact with Spanish (N’Diaye 1970: 15).

The most striking part of the Basque sibilant inventory is the opposition between an apico-alveolar and a lamino-alveolar place of articulation, at least if we accept the most widespread descriptions of these segments (Hualde 2003; Hualde et al. 2010). To our knowledge, Basque and Mirandese (Rodrigues 2022), an endangered Western-Romance language spoken in North-Eastern Portugal which may exhibit such an opposition restricted to sibilant fricatives, are the only languages that would allow the study of a phonemic contrast between apico-alveolar and lamino-alveolar sibilants. However, many languages seem to use both configurations as non-contrastive. In Spanish, each is used in different geographic varieties (Hualde 2014: 34), while in English they might be in free variation: a study of the productions of 20 speakers of American English by Dart (1991: 38) found that 57.5% of the examined alveolar sibilant tokens had laminal articulations while 42.5% had apical articulations.

This six-sibilant system with three places of articulation and fricative and affricate voiceless sibilants is found in a number of varieties, including Standard Basque. However, not all varieties have the same sibilant inventory (Mitxelena 2011 [1977]; Hualde 2010). Some varieties spoken in the Southern Basque Country have merged the apico-alveolar and lamino-alveolar sibilants, resulting typically in an apical fricative and a laminal affricate, but some varieties retain only laminal sibilants (see Muxika-Loitzate (2017) and Beristain (2018; 2021) for recent acoustic studies on sibilant merger in Bizkaian and Gipuzkoan). Some Eastern varieties in contact with French have merged the apico-alveolar fricative and the post-alveolar fricative, as reported in Egurtzegi & Carignan (2020b) for Mixean Basque. Additionally, some Eastern varieties (including Mixean) have developed a series of voiced sibilants with the same places of articulation – such as /z̻, z̺, ʒ, d͡z̺/ – likely introduced through contact with Gascon and French. Voiced sibilants are mainly found in loanwords (e.g., etsamin /ed͡z̺amin/ ‘examination’) but they have also been developed in liaison (Larrasquet 1932, 1934; Lafon 1999: 129; Mitxelena 2011 [1977]), e.g., deus + ezdeuse [deuz̺e] ‘nothing’. In many varieties across the Basque-speaking area, voiced palato-alveolar sibilants /ʒ, d͡ʒ/ have also developed from glide fortition, e.g., jan [(d)ʒan] ‘eat’, mendija [mendiʒa] ‘mountain’.

2.5 Previous acoustic studies of Basque sibilants

Although the number of acoustic studies on Basque data was very low until the 2010s, the last ten years have slowly but steadily improved the state of this discipline. Virtually all recent studies that describe the sibilants of a given variety of Basque characterize them according to their CoG values. Overall, studies on High Navarrese have mostly reported maintenance of all six (fricative and affricate) voiceless sibilant phonemes through CoG measurements (e.g., Yárnoz 2002a), while other varieties often show mergers of different kinds [Footnote 1].

Nevertheless, many studies are reduced in scope (with regard to the number of speakers, tokens, and/or segments analyzed), and they typically analyze western and central varieties of the language. For example, Hualde (2010) reported the results of a single speaker; Gandarias et al. (2014) too reported the complete CoG results for a single speaker and they only analyzed three tokens for each sibilant; Iglesias et al. (2016) did not use lexical items but nonce words and analyzed just one speaker. Muxika-Loitzate (2017) and Beristain (2018) analyzed only fricatives, leaving affricates aside. Beristain (2021) reports results of a bigger corpus (including 18 female speakers, six per variety, and 80 tokens per speaker, i.e. 40 tokens per fricative), comparing the apico-alveolar and lamino-alveolar fricatives of three varieties of Basque (and Northern Spanish).

Concerning descriptions of sibilant sounds in our variety of interest, there are two previous works that deal with Mixean data. First, Urrutia et al. (1991) report the lower energy cut-off frequencies of the sibilants of three speakers of Eastern Low Navarrese, including one speaker of Mixean Low Navarrese. They report the mean lower energy cut-off frequency of three voiceless sibilant fricatives: apico-alveolar, dorso-alveolar, and palatal (Urrutia et al. 1991: 203–232) and their affricate counterparts (Urrutia et al. 1991: 233–272), with only four tokens of (partially) voiced sibilants across speakers, all of them of [z̺] (Urrutia et al. 1991: 227). Comparing their data to other modern acoustical analyses of sibilants is not easy, since all recent studies report the spectral CoG as the main (or only) metric. Besides, Urrutia et al. (1991) only offer aggregated data of three speakers of Eastern Low Navarrese, which combines Donibane Garazi (Cicean Low Navarrese), Donapaleu (Mixean Low Navarrese), and Salazar Valley (Salazarese) varieties. However, Mixean is markedly different from those geographically adjacent varieties, so that the results reported in this work do not necessarily represent the Mixean variety.

More recently, Egurtzegi & Carignan (2020b) provide an extensive acoustic description of Mixean Basque, their work likely being the most complete acoustic description available on any variety of the Basque language. They used data originally recorded by Camino (2016) through fieldwork carried out over the last 40 years, and analyzed the speech of ten speakers from ten towns within the Mixe region, for a total of 1494 sibilant tokens (ranging between 97 and 214 tokens per speaker). They found very similar CoG values for the three voiced fricative sibilants, /z̻/ being a little higher than /z̺/ and /ʒ/. Lamino-alveolar sibilants /s̻, t͡s̻/ showed the highest CoG values, with no meaningful differences between apico-alveolar /s̺, t͡s̺/ and alveolo-palatal /ʃ, t͡ʃ/. The authors proposed that this result suggests a merger in the place of articulation of the apical and postalveolar categories, both for voiced and voiceless as well as fricative and affricate sibilants (Egurtzegi & Carignan 2020b: 2799). Regarding the resulting place of articulation, the authors tentatively suggested that alveolo-palatal sibilants may have merged to the apico-(post)alveolar series in the Mixean variety (Egurtzegi & Carignan 2020b: 2800). There were no significant differences between the CoG frequencies of fricative and affricate segments (in their fricative portion) sharing place of articulation. Nevertheless, the description of Mixean Basque sibilants by Egurtzegi & Carignan (2020b) has its limitations: for example, they do not report the differences between voiced and voiceless sibilants, and they do not investigate the acoustic cues that differentiate fricatives and affricates.

In this paper, we go well beyond the limitations of all previous studies by examining variation in the acoustic properties of sibilant sounds, as well as acoustically differentiating fricative vs. affricate sibilants and voiced vs. voiceless sibilants. To this end, we report on metrics that go beyond CoG, including discrete cosine transform (DCT) coefficients to capture temporal information, relative intensity to distinguish between fricatives and affricates, and auto-correlation (AC) coefficients to differentiate voiced from voiceless segments. Thus, we aim at providing a full description of Mixean Basque sibilants that could be used as a basis for the analysis of the sibilant systems of other Basque varieties, or other languages with similar oppositions.

3 Research questions

In this exploratory acoustic study, we aim to analyze the properties of Mixean Basque sibilants by answering the following questions in order to present a detailed description of the sibilant oppositions in this variety:

  1. What are the spectral characteristics of sibilants in Mixean Basque? Which measure differentiates their place of articulation most accurately? Do we find evidence in line with the merger of the apico-alveolar and postalveolar sibilants proposed in Egurtzegi & Carignan (2020b)?

  2. What are the spectro-temporal properties of sibilants, and especially how does their spectral center of gravity change over time?

  3. Which acoustic metrics differentiate voiced and voiceless sibilants?

  4. Which acoustic metrics differentiate fricative and affricate sibilants?

  5. Taking all the acoustic measures into account, how many sibilants can we distinguish in Mixean Basque?

4 Methodology

4.1 Data

The present study takes a deeper look into the data examined by Egurtzegi & Carignan (2020b). It is the only data available that allows us to study the Mixean variety of Basque preceding the abrupt stop in transmission and major shift to French that started in the 1970s (Camino 2016). It is based on free-conversation data recorded in various villages of the Mixe region by Iñaki Camino (analyzed from a dialectological point of view in Camino (2016)). We selected recordings carried out between 2005 and 2015 with a SONY MZ-R30 minidisc recorder (at a sampling rate of 44.1 kHz), leaving out speakers over 85 years old as well as older recordings made with a DAT recorder, both because of the different recording method and the large timespan in the period of recording. Thus, our corpus comprises data from ten subjects (three female, seven male) aged between 80 and 85 years and from the following villages in the region of Mixe: Donapaleu, Uhartehiri, Sorhapürü, Arrüeta, Martxüeta, Labetze, Amendüze, Gamue, Zohota, and Arberatze. For each speaker, an average of 5.5 minutes of audio was analyzed (with a 3.5–8.5 min. range). The recordings were initially transcribed by Iñaki Camino, and then expanded and, when necessary, corrected by the first author, guided by the agreed-upon etymological forms of the Eastern varieties. The transcriptions were force-aligned using the WebMAUS application (Kisler et al. 2017) with the Basque (FR) language setting, specifically designed for the north-eastern varieties of Basque by the first author of this paper. The automatically generated TextGrids were later carefully hand-corrected.

In total, 1912 sibilants were gathered from the recordings. On average, speakers produced 191 sibilant tokens (ranging between 111 and 265, SD = 46.5). Between four and 716 tokens of each phone were extracted from the corpus, as shown in Table 1. This asymmetry is inherent to the frequency of the sounds in the language, where voiceless lamino-alveolars are the most frequent segments (being part of high-frequency grammatical words and affixes), while the voiced lamino-alveolar affricate /d͡z̻/ is limited to a handful of words (mostly recent loans). Due to the low number of examples, /d͡z̻/ was not used in regression analyses, but it is included whenever feasible in descriptive statistics.

Table 1. Total number of sibilant tokens in the study

The plots in the next section report speaker-normalized data. Normalization was performed to account for the physiological differences among speakers. We first computed by-speaker Lobanov normalization values (or z-scores) in R (R Core Team 2022), which we then converted back to their original scale by using the average standard deviation and the average of the means (grand mean) of the ten speakers. These normalized values retain the speaker-specific normalized structure of z-scores but in a more familiar scale. All statistical analyses were performed on non-normalized data. Instead of using speaker-normalized values, a random effect for ‘speaker’ was included in all models to account for this variation.
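The rescaling step described above can be sketched as follows. This is only an illustration in Python (the study itself used R); the function name `lobanov_rescaled` and the use of the sample standard deviation are assumptions, not details taken from the paper.

```python
import numpy as np

def lobanov_rescaled(values, speakers):
    """Z-score values within each speaker, then map them back to the
    original unit using the grand mean and the average by-speaker SD."""
    values = np.asarray(values, dtype=float)
    speakers = np.asarray(speakers)
    z = np.empty_like(values)
    means, sds = [], []
    for spk in np.unique(speakers):
        mask = speakers == spk
        m = values[mask].mean()
        s = values[mask].std(ddof=1)  # sample SD: an assumption of this sketch
        z[mask] = (values[mask] - m) / s
        means.append(m)
        sds.append(s)
    grand_mean = np.mean(means)  # average of the by-speaker means
    mean_sd = np.mean(sds)       # average of the by-speaker SDs
    return z * mean_sd + grand_mean

# Two hypothetical speakers with different ranges end up on a shared scale.
cog = [1000, 2000, 3000, 4000, 6000, 8000]
spk = ["A", "A", "A", "B", "B", "B"]
normalized = lobanov_rescaled(cog, spk)
```

After rescaling, both hypothetical speakers share the same mean and spread, so differences between phones are no longer confounded with differences between speakers.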

4.2 Acoustic metrics

4.2.1 Static values of spectral moments

The center of gravity of the fricative noise (CoG, in Hz) was measured in Praat (Boersma & Weenink 2023) for nine equal intervals of the phone between 5% and 95% of the duration of the phone, using a 300–19000 Hz pass Hann filter. For fricatives, the measurements used in the static analysis are those corresponding to the middle interval of the phone (i.e. 45–55% of the phone’s duration). For affricates, however, in order to make sure to draw data from the fricative phase, a window centered around 70% (65–75%) of the phone was used instead. CoG is the most commonly used measure to analyze spectral properties of sibilants (see, among others, Jongman et al. 2000; Gordon et al. 2002; or Muxika-Loitzate 2017; Beristain 2021 for Basque), but, for the sake of completeness, kurtosis, skewness, and spectral standard deviation (SSD) were also measured at the 45–55% window of the fricative sounds and 65–75% window of the affricate sounds in this study.
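As a rough illustration of these measures, the sketch below computes the nine interval boundaries and the four spectral moments from a power spectrum. It is not the Praat procedure used in the study: the function names are hypothetical, the power-weighted definition of the moments and the subtraction of 3 from kurtosis are assumptions, and the band-pass filtering step is not reproduced.

```python
import numpy as np

def interval_bounds(start, end, n=9, lo=0.05, hi=0.95):
    """Split the 5%-95% portion of a phone into n equal intervals."""
    edges = start + (end - start) * np.linspace(lo, hi, n + 1)
    return list(zip(edges[:-1], edges[1:]))

def spectral_moments(freqs, power):
    """Return (CoG, SD, skewness, excess kurtosis) of a power spectrum,
    treating the normalized power values as weights over frequency."""
    f = np.asarray(freqs, dtype=float)
    w = np.asarray(power, dtype=float)
    w = w / w.sum()
    cog = np.sum(f * w)             # first moment: center of gravity
    dev = f - cog
    var = np.sum(dev**2 * w)        # second central moment
    sd = np.sqrt(var)
    skew = np.sum(dev**3 * w) / sd**3
    kurt = np.sum(dev**4 * w) / var**2 - 3.0
    return cog, sd, skew, kurt

# For a phone spanning 0-1 s, the fifth interval is the 45-55% window
# and the seventh is the 65-75% window.
bounds = interval_bounds(0.0, 1.0)
fricative_window, affricate_window = bounds[4], bounds[6]
```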

4.2.2 Temporal dynamics of CoG

Reidy (2016) has shown that dynamic spectral analysis of sibilants reveals language-specific and consonant-specific information not available when using only static measures. In this study, the nine measurements of CoG were compressed into four coefficients using the discrete cosine transform (DCT) following methods used to track temporal changes in formants (Harrington et al. 2008; Harrington & Schiel 2017) or for the analysis of sibilants as in Stuart-Smith (2020). The dct function of the package emuR (Winkelmann et al. 2021) was used for this procedure. DCT decomposes a signal (in this case, a series of data points) into a series of cosinusoidal waves. The sum of these waves approximates the shape of the original data, so that the signal can be approximately reconstructed from them. Each coefficient (k0, k1, etc.) corresponds to the amplitude of its respective wave. The first three coefficients are typically used in acoustic studies: k0, k1 and k2. They are proportional to the mean, linear slope, and curvature of the trajectory, respectively (Harrington 2010: 305). Additionally, we also report the fourth coefficient (k3) to capture yet more fine-grained temporal change, which may potentially help distinguish affricates from fricatives. Figure 1 is an example showing the way DCT transforms the data. The four DCT coefficients for one affricate sound are plotted on the left. The plots on the right show the sum of the waves, comparing the effect of using three and four coefficients, and the original nine data points.
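The decomposition can be sketched in a few lines of Python. This is a generic orthonormal DCT-II written for illustration, with hypothetical function names; its scaling is an assumption and may differ from that of the emuR dct function, so the coefficient values are not directly comparable to those reported in the paper.

```python
import numpy as np

def _basis(m, N):
    """Scaled cosine basis function of order m over N points."""
    n = np.arange(N)
    scale = np.sqrt(1.0 / N) if m == 0 else np.sqrt(2.0 / N)
    return scale * np.cos((2 * n + 1) * m * np.pi / (2 * N))

def dct_coefficients(x, n_coef=4):
    """First n_coef DCT-II coefficients (k0, k1, ...) of a trajectory."""
    x = np.asarray(x, dtype=float)
    return np.array([np.sum(x * _basis(m, len(x))) for m in range(n_coef)])

def reconstruct(coefs, N):
    """Sum the cosine waves back into an N-point smoothed trajectory."""
    return sum(c * _basis(m, N) for m, c in enumerate(coefs))

# Hypothetical nine-point CoG trajectory (Hz) for a single token.
cog_track = np.array([3900, 4300, 4800, 5200, 5400, 5350, 5100, 4700, 4200.0])
k = dct_coefficients(cog_track, n_coef=4)  # k0..k3
smooth3 = reconstruct(k[:3], 9)            # sum of k0 to k2
smooth4 = reconstruct(k, 9)                # sum of k0 to k3
```

With this scaling, k0 equals the mean multiplied by the square root of the number of points, and keeping all nine coefficients reproduces the trajectory exactly; truncating to three or four yields the smoothed curves of the kind shown in Figure 1.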

Figure 1. [t͡s̻] produced by Speaker 15 (item 1860). Left: plots of DCT coefficients (k0–k3). Right: the sum of k0 to k2 coefficients, the sum of k0 to k3 coefficients and the original nine values of CoG.

4.2.3 Voicing and duration

Voicing contrasts can be described acoustically in different forms. For this study, we measured the auto-correlation (AC) coefficients and segment duration.

Following the protocols in Blevins et al. (2020: 311), AC values were calculated with the program EMU (Harrington 2010). We used the ESPS method, with a frame spacing of 10 ms, a window length of 7.5 ms, and pitch ranges of 60–400 Hz for male speakers and 90–600 Hz for female speakers, for a total of 17,957 measurements from all sibilants in our corpus. To aggregate the results while avoiding coarticulatory effects, we calculated the median values of the AC coefficients of each sibilant. Ceteris paribus, voiced phones are expected to show higher AC coefficients and, conversely, voiceless phones should show lower AC coefficients.

Duration may be another cue that helps distinguish voiced and voiceless sibilants. Some studies point to speakers perceiving shorter fricatives as voiced (Cole & Cooper 1975; Widdison 1995) and acoustic studies have shown voiced consonants tend to be in fact shorter (e.g. Smith 1997; Żygis et al. 2012). On the other hand, we do not expect to find remarkable differences within the voiced and voiceless categories (Gordon et al. 2002; Kochetov 2017). We relied on raw duration data (log-transformed) because it would offer a straightforward interpretation while still showing voicing distinctions.

4.2.4 Fricative/affricate distinction

Fricatives and affricates can be distinguished by various acoustic measures, such as frication duration (Kluender & Walsh 1992; Mitani et al. 2006) or amplitude rise time (Howell & Rosen 1983; Mitani et al. 2006). However, these measures require annotating the fricative portion of the phone. For this study, we used relative intensity, i.e. the difference between the highest intensity in the following vowel and the lowest intensity in the sibilant. Hualde et al. (2015) showed that this measure differentiates fricatives and affricates in Catalan. To calculate the relative intensity, we used Praat to extract the maximum intensity in the following phone and the minimum intensity in the sibilant. We did this after discarding the initial and final 5% of each phone and filtering the signal in the same way described above for the CoG measurements (i.e. a 300–19000 Hz pass Hann filter).
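The relative-intensity measure can be illustrated with a short Python sketch that operates on an already extracted intensity contour. The function names, the frame times, and the dB values are hypothetical; intensity extraction and filtering (done in Praat in the study) are assumed to have happened beforehand.

```python
import numpy as np

def trimmed(times, values, start, end, frac=0.05):
    """Keep the frames inside a phone after dropping its first and last 5%."""
    dur = end - start
    lo, hi = start + frac * dur, end - frac * dur
    return values[(times >= lo) & (times <= hi)]

def relative_intensity(times, intensity_db, sibilant, following):
    """Maximum intensity of the following phone minus minimum intensity
    of the sibilant; each phone is given as a (start, end) pair in seconds."""
    times = np.asarray(times, dtype=float)
    intensity_db = np.asarray(intensity_db, dtype=float)
    sib = trimmed(times, intensity_db, *sibilant)
    nxt = trimmed(times, intensity_db, *following)
    return nxt.max() - sib.min()

# Hypothetical contour: 20 frames at 10 ms, sibilant 0-0.1 s, vowel 0.1-0.2 s.
t = np.arange(20) * 0.01
db = np.array([30] + [50] * 4 + [42] + [50] * 4 + [90] + [65] * 4 + [72] + [65] * 4)
rel = relative_intensity(t, db, (0.0, 0.1), (0.1, 0.2))
```

In this toy contour the extreme values at the phone edges (30 dB and 90 dB) fall inside the discarded 5% margins, so they do not affect the result.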

4.3 Statistical analyses

For the statistical analyses we used Bayesian mixed-effects models fitted with brms (Bürkner 2017). We used weakly informative priors, which means that the influence of extreme values on the posterior is minimized, but the prior does not have a strong influence on the posterior. For each of the relevant factors, we report mean estimates and their 95% credible intervals (CIs). When analyzing contrasts between phones we use the emmeans package (Lenth et al. 2023) to obtain median estimates of the differences between pairs of phones based on the posterior distribution, together with highest density intervals (HDI). Then, we calculate the percentage of the 89% HDI contained in the region of practical equivalence (ROPE) using the bayestestR package (Makowski et al. 2019). We take the ROPE to be a range of ±0.1 SDs in the dependent variable. Our decision rule is that if less than 5% of the HDI falls within the ROPE we take it as strong evidence for the difference.
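The decision rule can be illustrated with a minimal Python sketch applied to a vector of posterior draws for a pairwise difference. It is a simplified stand-in, not a reimplementation of bayestestR: the function names are hypothetical, the HDI is taken as the narrowest interval containing 89% of the sorted draws, and the example draws are made up.

```python
import numpy as np

def hdi(samples, prob=0.89):
    """Narrowest interval containing a proportion `prob` of the draws."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.ceil(prob * n))        # number of draws inside the interval
    widths = s[k - 1:] - s[:n - k + 1]
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

def pct_in_rope(samples, rope, prob=0.89):
    """Percentage of the draws inside the HDI that also fall in the ROPE."""
    s = np.asarray(samples, dtype=float)
    lo, hi = hdi(s, prob)
    inside = s[(s >= lo) & (s <= hi)]
    return 100.0 * np.mean((inside >= rope[0]) & (inside <= rope[1]))

def strong_evidence(samples, sd_y, prob=0.89):
    """Apply the rule: ROPE = +/-0.1 SD of the dependent variable, and
    under 5% of the HDI inside the ROPE counts as strong evidence."""
    rope = (-0.1 * sd_y, 0.1 * sd_y)
    return pct_in_rope(samples, rope, prob) < 5.0

# Made-up posterior draws of a difference, in units where SD(y) = 1.
clear_difference = np.linspace(2.0, 3.0, 101)
no_difference = np.linspace(-0.05, 0.05, 101)
```

Here `strong_evidence(clear_difference, 1.0)` returns True because none of the HDI overlaps the ROPE, whereas `strong_evidence(no_difference, 1.0)` returns False because the whole HDI lies inside it.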

Further details of the models, as well as their diagnostics, are included in the supplementary materials.

5 Acoustic properties of Mixean sibilants

5.1 Exploration

A first visual inspection of spectrograms of the Basque sibilant sounds as produced by Mixean speakers shows differences between apico-alveolar and lamino-alveolar voiceless fricatives (Figure 2) and lamino-alveolar and palato-alveolar voiceless fricatives (Figure 3), but no clear discernable difference between apico-alveolar and palato-alveolar voiceless fricatives (Figure 4) or their affricate counterparts (Figure 5). In the following subsection we will present quantitative data to support these observations.

Figure 2. A spectrogram of the sequence [os̺iis̻] from jelosi izaiteik ‘(we didn’t have to) be jealous’ (speaker 10).

Figure 3. A spectrogram of the sequence [jʃoakajs̻o] from gaixoak aizorat juiten ‘the poor going to the neighbor’ (speaker 10).

Table 2. Mean spectral moments (SDs are shown in parentheses)

Figure 4. A spectrogram of the sequence [es̺katoʃe] from neskatoxe ‘girl’ (speaker 9).

Figure 5. A spectrogram of the sequence [ntʃaneont͡s̺i] from laboantxan eontsi ‘be engaged in agriculture’ (speaker 10).

Figure 6. Violin plots and superimposed boxplots of CoG by phone.

Figure 7. Boxplots of speaker-normalized skewness by phone.

Figure 8. Boxplots of speaker-normalized Spectral Standard Deviation by phone.

5.2 Spectral moments

Table 2 and Figures 6–9 (see footnote 2) show values of the spectral moments – CoG, SSD, kurtosis, and skewness – for each target sibilant. Lamino-alveolar voiceless sibilants have the highest mean CoG values (5362 Hz for the affricate and 5193 Hz for the fricative). Apico-alveolar and postalveolar voiceless sibilants have similar values (around 4400 Hz). The lowest CoG is that of voiced fricatives (ranging from 3289 Hz for /ʒ/ to 3998 Hz for /z̻/). SSD ranges from 3035 Hz for /t͡ʃ/ to 4733 Hz for /ʒ/. Mean kurtosis values are positive and range from 2.4 for laminal voiceless sibilants to 6.7 for /z̺/. Mean skewness values are also positive, suggesting a concentration of energy in the lower frequencies. The lowest values correspond to voiceless laminal sibilants and the highest to voiced apical and postalveolar sibilants.
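For readers less familiar with spectral moments, the following sketch shows how the four measures can be computed from a power spectrum treated as a distribution over frequency. It assumes the common definitions (excess kurtosis, i.e. with 3 subtracted), and the two toy spectra are hypothetical; the exact settings used in the study's Praat script may differ.

```python
import math


def spectral_moments(freqs, power):
    """Return (CoG, SD, skewness, excess kurtosis) of a power spectrum."""
    total = sum(power)
    weights = [p / total for p in power]  # spectrum normalized to sum to 1
    cog = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - cog) ** 2 * w for f, w in zip(freqs, weights))
    sd = math.sqrt(variance)
    skewness = sum((f - cog) ** 3 * w for f, w in zip(freqs, weights)) / sd ** 3
    kurtosis = sum((f - cog) ** 4 * w for f, w in zip(freqs, weights)) / sd ** 4 - 3
    return cog, sd, skewness, kurtosis


freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]  # Hz
low_heavy = [1, 8, 4, 2, 1, 0.5, 0.25, 0.25]  # energy concentrated at low frequencies
symmetric = [1, 2, 3, 4, 4, 3, 2, 1]          # energy symmetric around 4500 Hz

print(spectral_moments(freqs, low_heavy))
print(spectral_moments(freqs, symmetric))
```

The first toy spectrum has most of its energy below its long high-frequency tail, which yields a positive skewness, in line with the interpretation given above.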

Figure 9. Boxplots of speaker-normalized kurtosis by phone.

A Bayesian linear mixed-effects model was fitted to the data with CoG value as the response variable and phone as the fixed effect. A random effect of word was included, as well as that of speaker with correlated varying slope for the variable phone: CoG ∼ phone + (1 | word) + (phone | speaker). The following priors were specified for the CoG model:

Intercept ∼ Normal(4800, 1500)

β, τ, σ ∼ Normal(0, 1500)

By-subject correlation ∼ LKJcorr(1)

Results are presented in Table 3. Comparisons between the phones (Table 4) showed that there is no evidence for a difference between /s̺/ and /ʃ/, on the one hand, and /t͡s̺/ and /t͡ʃ/, on the other hand. The same holds for their voiced counterparts (/ʒ/ and /z̺/).

Table 3. Model’s estimates for CoG

Table 4. Contrasts between the values of PHONE within each manner category

Models for the remaining spectral moments were also fitted (see supplementary materials for detailed results). None showed evidence of differences between apico-alveolar and postalveolar sibilants, with one exception: the voiced apico-alveolar and postalveolar sibilants differ in the second spectral moment, i.e. standard deviation, with /ʒ/ having higher values than any other sibilant.

Finally, Figure 10 shows CoG values modeled for all speakers. The lack of difference between apical and postalveolar sibilants is generalized among the speakers.

Figure 10. Posterior predictive distribution of CoG (Hz) for each analyzed speaker (median, .66 and .95 CI).

5.3 Temporal dynamics of CoG

In this section we study the spectral temporal dynamics of sibilants, focusing on CoG values, which were measured in nine equal intervals of the phone. Figure 11 shows CoG trajectories for each phone (the mean for each speaker and the mean for all data). Figure 12 presents the mean CoG values for all data, separately for voiceless fricatives and affricates. It can be seen that:

  • The trajectories of voiced sounds are flatter than those of voiceless ones (though much less data is available for voiced sibilants).

  • The time point of the highest mean CoG value is registered at 55–65% of the interval for both alveolar fricatives, and at 45–55% for the postalveolar fricative. For affricates, it is 65–75% of the whole segment for alveolar phones and 55–65% for the postalveolar.

  • As regards the distinction between /s̺/ and /ʃ/, CoG values are the same for both in the middle portion of the sound, but their overall trajectories appear to have different shapes, with lower values for /s̺/ than for /ʃ/ in the onset. These higher values make /ʃ/ more similar to /s̻/ at the onset.

Figure 11. CoG trajectories. Each colored line represents speaker averages while black lines represent the pooled mean.

Figure 12. Average CoG values for voiceless sibilants (all data), fricatives plotted in the upper chart and affricates in the bottom one.

The nine equidistant data-points were reduced to four coefficients using DCT. Recall that DCT coefficients reflect the different properties of the signal curve: k0 corresponds to its mean, k1 reflects the signal’s slope, and k2 is proportional to its curvature (Harrington 2010: 305). Table 5 shows their mean values for each phone, followed by their standard deviation in parentheses.
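The reduction of a nine-point trajectory to four DCT coefficients can be sketched as follows. The scaling used here (a factor of 1/√2 on the zeroth coefficient and 2/N overall) is an assumption on our part; under it, k0 equals √2 times the mean of the trajectory. The trajectories are hypothetical toy values.

```python
import math


def dct_coefficients(values, n_coef=4):
    """First `n_coef` DCT coefficients (k0, k1, ...) of an equally spaced trajectory."""
    n = len(values)
    coefs = []
    for m in range(n_coef):
        scale = 1 / math.sqrt(2) if m == 0 else 1.0  # assumed scaling of k0
        total = sum(
            x * math.cos((2 * j + 1) * m * math.pi / (2 * n))
            for j, x in enumerate(values)
        )
        coefs.append(2 * scale / n * total)
    return coefs


flat = [5.0] * 9                          # constant trajectory
rising = [float(j) for j in range(9)]     # CoG increasing over the phone
arched = [-float((j - 4) ** 2) for j in range(9)]  # inverted-U trajectory

print(dct_coefficients(flat))
print(dct_coefficients(rising))
print(dct_coefficients(arched))
```

With this convention a rising trajectory yields a negative k1 and an arched (inverted-U) trajectory yields a negative k2, which matches the sign conventions used in the interpretation of the models.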

Table 5. Mean values of the first four DCT coefficients (SDs in parentheses)

Four models were fitted to the data, with k0, k1, k2, and k3 as the response variable, respectively, all with phone as the predictor and speaker and word as random effects (e.g., k0 ∼ phone + (phone | speaker) + (1 | word)). We provided different priors for each coefficient.

For k0:

Intercept ∼ Normal(6200, 1000)

β, τ, σ ∼ Normal(0, 1000)

By-subject correlation ∼ LKJcorr(1)

For k1:

Intercept ∼ Normal(–200, 500)

β, τ, σ ∼ Normal(0, 500)

By-subject correlation ∼ LKJcorr(1)

For k2:

Intercept ∼ Normal(–300, 300)

β, τ, σ ∼ Normal(0, 300)

By-subject correlation ∼ LKJcorr(1)

For k3:

Intercept, β, τ, σ ∼ Normal(0, 200)

By-subject correlation ∼ LKJcorr(1)

The estimates returned by the models are given in Table 6 and the most relevant contrasts (obtained with emmeans) are listed in Table 7 (see supplementary materials for all the data).

Table 6. Models of DCT coefficients

Table 7. Most relevant contrasts between phones in the four models (contrasts with ROPE < .05 are in bold)

The main results are the following:

  • k0. Results are similar to those obtained with CoG values.

  • k1. Positive values of k1 correspond to a negative slope (i.e. the value is lower at the end). For the data analyzed here, estimates are highest for voiced sibilants and /ʃ/, and, except for /ʒ/, all values are negative (i.e. we have an overall increase of CoG). As regards voiceless fricatives, the value is closest to zero for /ʃ/, and it is lower for /s̺/ than for /s̻/. In other words, the slope is steeper for /s̺/ than for other voiceless fricatives. For laminal sibilants, the value is more extreme for the affricate than for the fricative. The model shows some evidence of the difference between /ʃ/ and /s̺/ (4% in ROPE), which was not obtained with static CoG measures, as well as that between the laminal and the postalveolar. However, k1 is the same for the laminal and the apical.

  • k2. Positive values of this coefficient reflect a u-like trajectory, and negative values correspond to the inverse pattern. Values closer to zero represent flatter trajectories. All estimates are negative for our data. Laminal sibilants show the most arched trajectory. Values do not differ for fricative-affricate pairs. This model, however, shows few significant contrasts between phones.

  • k3. This coefficient has not often been used in other phonetic studies, but it might be relevant for the distinction between fricatives and affricates, as it shows low values for fricatives, but higher positive values for all affricates. This appears to correspond to the position of the peak, with higher values indicating a later peak. The model fitted to our data suggests that k3 distinguishes lamino-alveolar and apico-alveolar fricative vs. affricate pairs (but there is less evidence for the difference between the postalveolar fricative and affricate, with 9% in ROPE).

In general, looking at all pairs of sounds, k0 distinguishes more pairs than other coefficients. As regards the contrast between the apical and the postalveolar sounds, k1 is different for the fricatives, but none of the coefficients distinguishes the affricates.

5.4 Voicing

In this section we will present the results of the two cues we used for voicing, AC coefficient (Section 5.4.1) and duration (Section 5.4.2).

5.4.1 Auto-correlation coefficients

An acoustic analysis of voicing probability based on auto-correlation (AC) values results in a clear distinction between voiced and voiceless sibilants. Note that the zero-to-one scale on the y-axis in Figures 13–15 corresponds to fully voiceless (0, no correlation) to fully voiced and completely regular (1, perfect correlation) realizations, with extreme values (i.e. a perfect correlation between glottal cycles) being impossible in real data. Figure 13 illustrates the distribution of the realizations of each of the sibilants separately, while Figure 14 plots the aggregated AC coefficients of all voiced sibilants on the one hand, and all voiceless sibilants on the other.
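The intuition behind an auto-correlation-based voicing measure can be shown with a minimal sketch. This is not necessarily the procedure used in the study: it is a generic normalized auto-correlation peak searched over lags corresponding to an assumed F0 range (75–300 Hz), applied to synthetic signals (a sine wave standing in for periodic voicing and Gaussian noise standing in for voiceless frication).

```python
import math
import random


def ac_coefficient(frame, sample_rate, f0_min=75.0, f0_max=300.0):
    """Peak normalized auto-correlation over lags in an assumed F0 range (0 to 1)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best = 0.0  # negative correlations are treated as 0 (no periodicity)
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:n - lag], x[lag:]
        denominator = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        if denominator == 0:
            continue
        r = sum(p * q for p, q in zip(a, b)) / denominator
        best = max(best, r)
    return best


sample_rate = 8000
n_samples = 320  # a 40 ms frame
periodic = [math.sin(2 * math.pi * 125 * i / sample_rate) for i in range(n_samples)]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]

print(ac_coefficient(periodic, sample_rate))
print(ac_coefficient(noise, sample_rate))
```

The periodic signal yields a coefficient close to 1, whereas the noise signal stays close to 0, mirroring the two ends of the scale described above.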

Figure 13. Distribution of the AC coefficients of each sibilant.

Figure 14. Aggregated distribution of the AC coefficients of voiced vs. voiceless sibilants.

Figure 15 shows the AC coefficient as a function of normalized time of a voiceless and a voiced lamino-alveolar sibilant. These were the first sibilant sounds in the corpus, produced by the first speaker analyzed for this study (S06, male). They are presented here as a means of illustration of the contrast in clear cases of thorough voicelessness/voicedness.

We fitted a Bayesian generalized mixed effects model to the data. The response variable was AC coefficient and phone was the fixed effect. We included word as an intercept-only random effect and speaker as a by-phone correlated varying intercept and slope effect: AC ∼ phone + (1 | word) + (phone | speaker). We specified the following priors in the model:

Intercept ∼ Normal(0.5, 0.25)

β, τ, σ ∼ Normal(0, 0.25)

By-subject correlation ∼ LKJcorr(1)

The model estimates are summarized in Table 8. Table 9 shows the contrasts between sibilants with the same place of articulation. The posterior distributions of AC coefficients show a clear distinction between voiced and voiceless sibilants for every place of articulation.

Table 8. Model’s estimates for AC coefficients

Figure 15. AC coefficient as a function of normalized time of a voiceless and a voiced lamino-alveolar sibilant (produced by the speaker S06).

Table 9. Contrasts between fricative values of PHONE within each place of articulation category

5.4.2 Duration

In this section we focus on the duration of each phone. Figure 16 shows speaker-normalized duration values by phone. Voiced sibilants appear to be shorter than voiceless sibilants. Within the voiceless sibilants, the apical fricative seems to have a slightly shorter duration.

Figure 16. Boxplots of speaker-normalized duration by phone.

A Bayesian mixed effects model was fitted to the data, with duration as the response variable, phone as the fixed effect, word as a random-intercept effect and speaker as a correlated varying intercept and slope random effect for phone: duration ∼ phone + (1|word) + (phone|speaker). Unlike the rest of the models, we specified a lognormal family, to better account for the long tails produced by longer duration tokens. We specified the following priors:

Intercept ∼ Normal(–2.5, 2.5)

β, τ, σ ∼ Normal(0, 0.5)

By-subject correlation ∼ LKJcorr(1)
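Because a lognormal family places the linear predictor on the log scale, the priors above are easier to read after exponentiation. The sketch below is a worked illustration only; it assumes durations are expressed in seconds (as in Table 10), and the helper names are hypothetical.

```python
import math


def median_duration(intercept, beta=0.0):
    """Median duration (s) implied by a log-scale intercept plus a log-scale effect."""
    return math.exp(intercept + beta)


def duration_ratio(beta):
    """Multiplicative change in median duration implied by a log-scale effect."""
    return math.exp(beta)


# Centre of the intercept prior, Normal(-2.5, 2.5), back-transformed to seconds.
print(median_duration(-2.5))
# A log-scale effect at one prior SD (0.5) of the Normal(0, 0.5) prior.
print(duration_ratio(0.5))
```

The centre of the intercept prior corresponds to a median duration of roughly 82 ms, and an effect of one prior SD corresponds to a ratio of about 1.65 between two phones.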

Table 10 summarizes the estimated duration of each phone (in seconds). Table 11 shows the contrasts between voiced and voiceless phones with the same place of articulation, between voiceless fricative and affricate phones with the same place of articulation, and the contrasts between voiceless postalveolars and apico-alveolars. On average, duration distinguishes between voiced and voiceless fricatives with the same place of articulation, but fails to distinguish fricatives and affricates. Apical fricatives appear to be shorter than other voiceless sibilant phones.

Table 10. Model’s estimated mean duration for each phone

Table 11. Contrasts between the values of PHONE within each place of articulation category

5.5 Fricative/affricate distinction

In Section 5.3 we have shown that at least some fricative–affricate pairs differ in the temporal dynamics of CoG, in that the value of the k3 DCT coefficient is higher for affricates.

In this section we focus on differences in relative intensity, i.e. the lowest intensity value in the sibilant as compared to the highest intensity value in the following vowel. As shown by Hualde et al. (2015), the difference in intensity is higher in affricates than in fricatives, due to the oral closure present in affricates. An example from our data is given in Figure 17, which compares the intensity curves and waveforms for an affricate and a fricative sound.

Figure 18 shows relative intensities for our data taking into account only prevocalic sibilants. As can be seen, affricates have higher values. A Bayesian mixed-effects model was fitted to the data, with relative intensity as the response variable and phone as the fixed effect and with a random intercept for word and a random intercept plus slope for subject: relative intensity ∼ phone + (1 | word) + (phone | speaker). We specified the following priors:

Intercept ∼ Normal(15, 5)

β, τ, σ ∼ Normal(0, 5)

By-subject correlation ∼ LKJcorr(1)

Results are presented in Table 12. The model predicts higher relative intensity for affricates.

Figure 17. An example of the changes in intensity in an affricate and a fricative sound produced by Speaker 15.

Figure 18. Boxplots of relative intensity by phone for prevocalic sibilants.

Table 12. Model’s estimates for relative intensity

6 Discussion

As regards spectral moments, the CoG distinguishes lamino-alveolar sibilants from apico-alveolar and postalveolar sibilants. As in other acoustic studies of Basque sibilants, the CoG of lamino-alveolars is higher than that of other sibilants, which is expected due to the shorter front cavity found in the articulation of lamino-alveolars. We have also taken into account the remaining spectral moments, especially to find out whether they distinguish apico-alveolar and postalveolar sibilants. However, none of these measures shows clear differences between those series, for either voiceless or voiced phones. This is in line with Egurtzegi & Carignan’s (2020b) study of the same data set, where, on the basis of CoG values only, a possible merger between apico-alveolar and postalveolar sibilants was proposed for Mixean Basque. These might be the first acoustic documentations of xexeo (Hualde 2010), i.e. a merger between apical and postalveolar sibilants in favor of the latter due to contact with French. However, the lack of information on the directly preceding stage of the language hinders a suitable comparison of the resulting phone with the previously attested ones.

Nonetheless, differences in CoG between apico-alveolar and postalveolar sibilants have been documented in other varieties, mostly in High Navarrese, though they tend to be smaller than those between other sibilant pairs (Hualde 2010; Urrestarazu-Porta in prep.). As a means of comparison, Table 13 presents the mean and modeled CoG values of a 69-year-old female speaker of High Navarrese from Etxarri-Aranatz (from Urrestarazu-Porta in prep., plotted in Figure 19), alongside the values from Mixean Basque reported in Tables 2 and 3. Note that the High Navarrese measurements were extracted with the same script we used for the current study, and its model was constructed with the same population-level effect and the same priors as the Mixean one, but the experimental conditions and recording devices were different.

Table 13. Comparison between the mean spectral CoG (SDs in parentheses) and model’s estimates for the Mixean data and a speaker of High Navarrese

Figure 19. Violin plots and superimposed boxplots of CoG by phone (fem., High Navarrese).

The difference between the mean values of the CoG of /s̺/ and /ʃ/ is clearly smaller in Mixean (0 Hz) than in High Navarrese (789 Hz), and this is also true for their affricate counterparts (200 Hz in Mixean and 365 Hz in High Navarrese). In addition, we observe higher mean CoG values in /s̻/ and /t͡s̻/ in High Navarrese (6682 Hz and 6967 Hz, respectively) than in Mixean (5193 Hz and 5362 Hz, respectively), which is suggestive of a potential compression of the acoustic space of the fricatives and the displacement of lamino-alveolars after a potential merger between apico-alveolar and postalveolar segments.

Given that we are most interested in the difference in CoG between apico-alveolar and postalveolar fricative and affricate sibilants in Mixean (with a potentially lost opposition) and High Navarrese (where the opposition is still in effect), we have computed the contrast distribution between the posterior distributions of apico-alveolar and postalveolar sibilants from the two models’ results. To this end, we subtracted the posterior distribution of the CoG of the postalveolar sibilant from that of the apico-alveolar sibilant. The differences between categories are plotted in Figure 20 for each case (fricatives and affricates) and variety (Mixean and High Navarrese). Note that, while the differences are reliably above 0 for the High Navarrese speaker (with a mean difference of 779 Hz for fricatives and 363 Hz for affricates), 0 is close to the center of the distribution in both cases in the Mixean data (with a mean difference of 174 Hz for fricatives and 13 Hz for affricates). While the former points to a clear difference between categories in High Navarrese, the latter suggests a merger between the two categories in Mixean Basque. The narrower distribution of the High Navarrese might be a consequence of the fact that it involves a single speaker as well as the different experimental setting.
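The computation of a contrast distribution from posterior draws can be sketched as follows. The draws here are simulated and purely hypothetical (they are not the posterior samples of the models reported in this paper), and the helper names are our own.

```python
import random


def contrast(draws_a, draws_b):
    """Draw-by-draw difference between two posterior distributions (a minus b)."""
    return [a - b for a, b in zip(draws_a, draws_b)]


def summarize(differences):
    """Mean difference and share of the contrast distribution that lies above zero."""
    mean = sum(differences) / len(differences)
    above_zero = sum(1 for d in differences if d > 0) / len(differences)
    return {"mean": mean, "p_above_zero": above_zero}


random.seed(2)
# Simulated posterior draws of CoG (Hz) for two categories in two hypothetical varieties.
distinct_a = [random.gauss(5300, 120) for _ in range(4000)]
distinct_b = [random.gauss(4500, 120) for _ in range(4000)]
merged_a = [random.gauss(4420, 250) for _ in range(4000)]
merged_b = [random.gauss(4400, 250) for _ in range(4000)]

print(summarize(contrast(distinct_a, distinct_b)))
print(summarize(contrast(merged_a, merged_b)))
```

A contrast distribution lying almost entirely above zero points to distinct categories, while one centered near zero is compatible with a merger.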

Figure 20. Distribution of the difference between posterior distributions of CoG by manner and variety.

In order to better understand the spectral differences between sibilants in Mixean Basque, we also studied the way CoG changes through time (using the DCT of CoG values measured in nine equal intervals of the phone). Recent studies have suggested that the temporal dynamics of frequency values might show divergent patterns between different languages (Reidy 2016 found that Japanese and English /s/ differed in the shape of their trajectories) or between genders or age groups (Stuart-Smith 2020). It is not clear, however, if differences in the way CoG varies throughout the phone can be a cue salient enough to distinguish between sibilants within a given variety.

In general, our results converge with results of other studies. As regards the overall characteristics of the trajectory, Iskarous et al. (2011) found that CoG increases in the first half of English /s/, and that the average increase is around 1500 Hz. Nine 30-ms intervals distributed evenly through each token were used in the study and, in the figures provided in the paper, the first spectral moment appears to be highest around the sixth or seventh interval of the phone. This is in line with our results: for fricatives, the interval with the highest CoG was 55–65% for both alveolar voiceless fricatives, but 45–55% for the voiceless postalveolar fricative. As expected due to the initial closure, the CoG peak comes later in voiceless affricates: at 65–75% in both alveolar phones and 55–65% in the postalveolar.

As regards DCT coefficients, a study comparable to ours is Stuart-Smith (2020). Stuart-Smith reports that k1 values are higher for English /ʃ/ than for /s/. We have found that k1 coefficients for all analyzed sounds except /ʒ/ were negative, which corresponds to a temporally increasing CoG. In our study, k1 coefficients were also higher for /ʃ/ than for /s̻/ and /s̺/, but we did not find any difference between /s̻/ and /s̺/. As for k2, in Stuart-Smith’s (2020) study the curvature was similar for males, but females showed a more pronounced curvature for /s/ than for /ʃ/. For the Basque data, k2 is higher (thus pointing to a more curved pattern) for laminal phones than for the rest. Finally, the k3 coefficient, which is related to the position of the peak CoG value, has proved relevant to distinguish between fricatives and affricates.

With regard to voicing, our study, based on auto-correlation coefficients, points to the distinction between voiced and voiceless sibilants being consistent in Mixean Basque. Besides AC coefficients, raw duration appears to be a reliable metric to distinguish Mixean voiced and voiceless sibilants with the same place of articulation. Voiced sibilants are around 30 ms shorter than voiceless sibilants.

In addition to the duration differences between voiced and voiceless sibilants, we found /s̺/ to be shorter than /ʃ/. This result points in the direction of duration potentially being another cue speakers might use to distinguish between these two segments alongside the difference in k1. Nonetheless, this result should be taken cautiously. Firstly, the difference is small. Some studies show speakers rely on duration for categorizing shorter voiceless fricatives as being voiced and longer as voiceless (Cole & Cooper 1975; Widdison 1995). However, the difference in duration between voiced and voiceless sibilants is around 30 ms, i.e. voiceless sibilants are almost 50% longer than their voiced counterparts or, conversely, voiced sibilants are around 40% shorter than voiceless sibilants. In our study, the median estimate of the posterior distribution of /s̺/ is just around 14 ms shorter than the median estimate of the posterior distribution of /ʃ/, and a difference between 30 and 3 ms is highly credible given our data. We are not aware of any study that concludes that speakers are able to only rely on such a small difference in duration to distinguish between a pair of voiceless sibilants.

Secondly, we used the raw duration of naturalistic speech. This means that we did not control for some factors that are known to affect duration, such as speech rate. We assumed each person’s fluctuations of speech rate would be distributed normally and, thus, including the correlated varying intercept and slope effect for speaker would account for much of the variation produced by speech rate. However, the number of tokens for some phones was limited and there might be an effect related to speech rate that was unaccounted for.

Thirdly, /s̺/ and /ʃ/ are not distributed evenly across the language. In our data, 47% of the occurrences of /ʃ/ are between vowels (V.ʃV), 21% are word-initial (#ʃV), 17% are in syllable onsets preceded by a consonant (VC.ʃV) and 16% are in non-final syllable-codas (Vʃ.CV). In turn, 59% of /s̺/ in our data are in non-final syllable-codas (Vs̺.CV), 25% are between vowels (V.s̺V), 8% are in syllable-onsets preceded by a consonant (VC.s̺V), 7% are word-initial (#s̺V) and around 1% are word-final (Vs̺#). We speculate that the shorter estimation of /s̺/ may be due to it occurring mostly in coda position followed by a consonant.

Finally, the difference between prevocalic fricative and affricate sibilants seems to be appropriately accounted for through the analysis of relative intensity. In line with the results in Hualde et al. (2015), the difference in intensity was found to be higher in affricates than in fricatives, which is linked to the oral closure phase of the affricates.

In short, we can reliably distinguish between seven sibilants in Mixean Basque, but the acoustic difference between /s̺, t͡s̺, z̺/ and /ʃ, t͡ʃ, ʒ/, respectively, observed in our data might not be enough to speak of a perceptible distinction. Our results call for further perceptual and articulatory analyses that would corroborate our findings. However, the linguistic landscape of Amiküze has changed dramatically since the 1970s; the transmission of Mixean Basque was almost entirely interrupted and French has replaced Basque almost completely for most social interactions (Camino 2016). Thus, it may be an impossible endeavor to find Mixean speakers with a variety of the language comparable to the one analyzed in our study. This highlights the urgent need for the documentation and study of other endangered varieties and languages.

7 Conclusion

This paper has presented a detailed study of the sibilant system of Mixean Basque, a variety which has been described with ten sibilants: /s̻, s̺, ʃ, t͡s̻, t͡s̺, t͡ʃ, z̻, z̺, ʒ, d͡z̺/. Our results suggest that the phonological oppositions based on manner of articulation – i.e. voiced vs. voiceless and fricative vs. affricate – are still part of this variety, but two of the three historical places of articulation are not easily phonetically distinguishable in the modern language. While the lamino-alveolar series (/s̻, t͡s̻, z̻/) reliably contrasts with the rest, only minimal differences between the apico-alveolar series (/s̺, t͡s̺, z̺, d͡z̺/) and the postalveolar series (/ʃ, t͡ʃ, ʒ/) have been found in the analysis of dynamic data and duration, and none in the static analysis. A difference in CoG in the onset of apico-alveolars vs. postalveolars and a small difference in duration might not be enough to reliably differentiate the two sets of segments today, but it can potentially be considered a historical remnant of what was a fully working opposition in earlier generations. Our analysis reminds us that studying synchronic acoustic data can lead to interesting historical observations, and help us better understand the gradual nature of sound change in general, and segmental mergers in particular.

Acknowledgments

We are highly indebted to Iñaki Camino for letting us use his recordings, and two anonymous reviewers as well as the associate editor of JIPA for their many comments and suggestions, which have helped improve the final result of this paper. All remaining errors are ours. This research was partially funded by Modern approaches to diachronic phonology applied to Basque (MADPAB) (ANR-20-CE27-0007), Monumenta Linguae Vasconum VI (PID2020-118445GB-I00), and Diachronic Linguistics, Typology and the History of Basque (IT1534-22).

Author contribution

Both AE and DK have contributed to this paper equally and should be considered first authors.

Replication material

The measurements and code used for this research can be found at https://osf.io/vctgw/.

Footnotes

1 Yárnoz (2002a) does not report aggregated CoG values, but, as an example, a 69-year-old female High Navarrese speaker from Etxarri-Aranatz from the general study of sibilant production in different varieties of Basque in Urrestarazu-Porta (in preparation) shows the following mean (and SD) CoG values for each sibilant phoneme when extracted with the same script we used for the current study: /s̻/ 6685 Hz (706), /s̺/ 5316 Hz (490), /ʃ/ 4527 Hz (286), /t͡s̻/ 6967 Hz (393), /t͡s̺/ 5287 Hz (341), /t͡ʃ/ 4922 Hz (352).

2 In all boxplots in the paper the dot represents the mean and the horizontal line the median. The hinges correspond to Q1 and Q3 and whiskers to 1.5 IQR. The notches correspond to ∼95% confidence interval for the median.

References

Beristain, Ander. 2018. Basque dialectal substrate in the realization of /s/ in L2 Spanish. MA thesis, University of Illinois at Urbana-Champaign.Google Scholar
Beristain, Ander. 2021. Spectral properties of anterior sibilant fricatives in Northern Peninsular Spanish and sibilant-merging and non-merging varieties of Basque. Journal of the International Phonetic Association 52(3). 132. https://doi.org/10.1017/S0025100320000274.Google Scholar
Blevins, Juliette, Egurtzegi, Ander & Ullrich, Jan. 2020. Final obstruent voicing in Lakota: Phonetic evidence and phonological implications. Language 96(2). 294337. https://doi.org/10.1353/lan.2020.0022.CrossRefGoogle Scholar
Boersma, Paul & Weenink, David. 2023. Praat: Doing phonetics by computer. http://www.praat.org/.Google Scholar
Bürkner, Paul-Christian. 2017. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80. 128. https://doi.org/10.18637/jss.v080.i01.CrossRefGoogle Scholar
Camino, Iñaki. 2016. Amiküze eskualdeko (h)eskuara. Pamplona & Bilbao: Nafarroako Gobernua & Euskaltzaindia.Google Scholar
Cole, Ronald A. & Cooper, William E.. 1975. Perception of voicing in English affricates and fricatives. The Journal of the Acoustical Society of America 58(6). 12801287. https://doi.org/10.1121/1.380810.CrossRefGoogle ScholarPubMed
Dart, Sarah Northrop. 1991. Articulatory and acoustic properties of apical and laminal articulations. PhD thesis, University of California, Los Angeles. https://www.proquest.com/docview/303942751/abstract/2D991346B41A4FBEPQ/1.Google Scholar
Egurtzegi, Ander. 2013. Phonetics and phonology. In Mikel Martínez-Areta (ed.), Basque and proto-Basque. Language-internal and typological approaches to linguistic reconstruction, 119–173. Frankfurt am Main: Peter Lang.Google Scholar
Egurtzegi, Ander & Carignan, Christopher. 2020a. A typological rarity: The /h/ versus /h̃/ contrast of Mixean Basque. In. The University of British Columbia, Vancouver (held online).
Egurtzegi, Ander & Carignan, Christopher. 2020b. An acoustic description of Mixean Basque. The Journal of the Acoustical Society of America 147(4). 2791–2802. https://doi.org/10.1121/10.0000996.
Gandarias, Leire, Plaza, Jone & Gaminde, Iñaki. 2014. Lekeitioko txistukariez: frikariak eta afrikatuak. Euskalingua 24. 6–21.
Gordon, Matthew, Barthmaier, Paul & Sands, Kathy. 2002. A cross-linguistic acoustic study of voiceless fricatives. Journal of the International Phonetic Association 32(2). 141–174. https://doi.org/10.1017/S0025100302001020.
Harrington, Jonathan. 2010. Phonetic analysis of speech corpora. Chichester: Wiley-Blackwell.
Harrington, Jonathan, Kleber, Felicitas & Reubold, Ulrich. 2008. Compensation for coarticulation, /u/-fronting, and sound change in standard southern British: An acoustic and perceptual study. The Journal of the Acoustical Society of America 123(5). 2825–2835. https://doi.org/10.1121/1.2897042.
Harrington, Jonathan & Schiel, Florian. 2017. /u/-fronting and agent-based modeling: The relationship between the origin and spread of sound change. Language 93(2). 414–445. https://doi.org/10.1353/lan.2017.0019.
Howell, Peter & Rosen, Stuart. 1983. Production and perception of rise time in the voiceless affricate/fricative distinction. The Journal of the Acoustical Society of America 73(3). 976–984. https://doi.org/10.1121/1.389023.
Hualde, José Ignacio. 2003. Segmental phonology. In Hualde, José Ignacio & Ortiz de Urbina, Jon (eds.), A grammar of Basque, 15–64. Berlin: De Gruyter Mouton.
Hualde, José Ignacio. 2010. Neutralización de sibilantes vascas y seseo en castellano. Oihenart 25. 89–116.
Hualde, José Ignacio. 2014. Los sonidos del español. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511719943.
Hualde, José Ignacio, Eager, Christopher D. & Nadeu, Marianna. 2015. Catalan voiced prepalatals: Effects of nonphonetic factors on phonetic variation? Journal of the International Phonetic Association 45(3). 243–267. https://doi.org/10.1017/S0025100315000031.
Hualde, José Ignacio, Lujanbio, Oihana & Zubiri, Juan Joxe. 2010. Goizueta Basque. Journal of the International Phonetic Association 40(1). 113–127. https://doi.org/10.1017/S0025100309990260.
Iglesias, Aitor, Gandarias, Leire & Unamuno, Lorea. 2016. Euskararen txistukariak aztertzeko indize akustikoez. Euskalingua 28. 6–18.
Iskarous, Khalil, Shadle, Christine H. & Proctor, Michael I. 2011. Articulatory–acoustic kinematics: The production of American English /s/. The Journal of the Acoustical Society of America 129(2). 944–954. https://doi.org/10.1121/1.3514537.
Jassem, Wiktor. 2003. Polish. Journal of the International Phonetic Association 33(1). 103–107. https://doi.org/10.1017/S0025100303001191.
Jongman, Allard, Wayland, Ratree & Wong, Serena. 2000. Acoustic characteristics of English fricatives. The Journal of the Acoustical Society of America 108(3). 1252–1263. https://doi.org/10.1121/1.1288413.
Jurado, Mirari. 2011. Caracterización de sibilantes fricativas vascas y su percepción en el sistema fonético español. Anuario del Seminario de Filología Vasca “Julio de Urquijo” 45(1). 81–137. https://doi.org/10.1387/asju.9727.
Kisler, Thomas, Reichel, Uwe & Schiel, Florian. 2017. Multilingual processing of speech via web services. Computer Speech & Language 45. 326–347. https://doi.org/10.1016/j.csl.2017.01.005.
Kluender, Keith R. & Walsh, Margaret A. 1992. Amplitude rise time and the perception of the voiceless affricate/fricative distinction. Perception & Psychophysics 51(4). 328–333. https://doi.org/10.3758/BF03211626.
Kochetov, Alexei. 2017. Acoustics of Russian voiceless sibilant fricatives. Journal of the International Phonetic Association 47(3). 321–348. https://doi.org/10.1017/S0025100317000019.
Ladefoged, Peter. 2001. Vowels and consonants: An introduction to the sounds of languages. Malden, MA: Blackwell.
Ladefoged, Peter & Maddieson, Ian. 1996. The sounds of the world’s languages. Oxford: Blackwell.
Lafon, René. 1999. Contribution à l’étude phonologique du parler basque de Larrau (Haute-Soule). In Haritschelhar, Jean & Charritton, Pierre (eds.), Vasconiana, 113–133. Bilbao: Euskaltzaindia.
Larrasquet, Jean. 1932. Phonétique du basque du Larrajá (quartier du Barcus). Revista Internacional de los Estudios Vascos 23(1). 153–191.
Larrasquet, Jean. 1934. Le basque souletin nord-oriental. Paris: Floch. http://catalog.hathitrust.org/api/volumes/oclc/12866856.html.
Lenth, Russell V., Buerkner, Paul, Giné-Vázquez, Iago, Herve, Maxime, Jung, Maarten, Love, Jonathon, Miguez, Fernando, Riebl, Hannes & Singmann, Henrik. 2023. emmeans: Estimated Marginal Means, aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.
L’Institut national de la statistique et des études économiques. 2019. Populations legales 2016. https://www.insee.fr/fr/statistiques/zones/3681328.
Maddieson, Ian. 1984. Patterns of sounds. Cambridge: Cambridge University Press.
Makowski, Dominique, Ben-Shachar, Mattan S. & Lüdecke, Daniel. 2019. bayestestR: Describing effects and their uncertainty, existence and significance within the Bayesian framework. Journal of Open Source Software 4(40). 1541. https://doi.org/10.21105/joss.01541.
Mitani, Shigeki, Kitama, Toshihiro & Sato, Yu. 2006. Voiceless affricate/fricative distinction by frication duration and amplitude rise slope. The Journal of the Acoustical Society of America 120(3). 1600–1607. https://doi.org/10.1121/1.2221390.
Mitxelena, Koldo. 2011 [1977]. Fonética histórica vasca. In Joseba A. Lakarra & Iñigo Ruiz Arzalluz (eds.), Obras completas, vol. VI. Bilbao: UPV/EHU.
Moran, Steven & McCloy, Daniel (eds.). 2019. PHOIBLE 2.0. Jena: Max Planck Institute for the Science of Human History. https://phoible.org/.
Muxika-Loitzate, Oihane. 2017. Sibilant merger in the variety of Basque spoken in Amorebieta-Etxano. Languages 2(4). 25. https://doi.org/10.3390/languages2040025.
N’Diaye, Geneviève. 1970. Structure du dialecte basque de Maya. The Hague: Mouton. https://doi.org/10.1515/9783111349725.
R Core Team. 2022. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
Reidy, Patrick F. 2016. Spectral dynamics of sibilant fricatives are contrastive and language specific. The Journal of the Acoustical Society of America 140(4). 2518–2529. https://doi.org/10.1121/1.4964510.
Rodrigues, Alexandra Soares. 2022. Phonotactic conditions and morphotactic transparency in Mirandese word formation. Folia Linguistica 56(1). 87–122. https://doi.org/10.1515/flin-2021-2005.
Smith, Caroline L. 1997. The devoicing of /z/ in American English: Effects of local and prosodic context. Journal of Phonetics 25(4). 471–500. https://doi.org/10.1006/jpho.1997.0053.
Stuart-Smith, Jane. 2020. Changing perspectives on /s/ and gender over time in Glasgow. Linguistics Vanguard 6(s1). https://doi.org/10.1515/lingvan-2018-0064.
Txillardegi. 1982. Some acoustic data about the three Basque sibilants. In Proceedings of the First International Basque Conference in North America, 18–34. Fresno, CA: California State University.
Urrestarazu-Porta, Iñigo. in preparation. The acoustic properties of Basque sibilants.
Urrutia, Hernán, Etxebarria, Maitena, Túrrez, Itziar & Duque, Juan Carlos. 1991. Fonética vasca 3: Las sibilantes en los dialectos orientales. Bilbao: Universidad de Deusto.
Widdison, Kirk A. 1995. The perception of voicing in Spanish sibilants. In Proceedings of the 4th European Conference on Speech Communication and Technology (Eurospeech 1995), 2289–2292. https://doi.org/10.21437/Eurospeech.1995-521.
Winkelmann, Raphael, Jaensch, Klaus, Cassidy, Steve & Harrington, Jonathan. 2021. emuR: Main Package of the EMU Speech Database Management System.
Yárnoz, Belén. 2002a. Sibilants in the Basque dialect of Bortziri: An acoustic and perceptual study. Pamplona: Gobierno de Navarra.
Yárnoz, Belén. 2002b. Descripción de las sibilantes vascas mediante el parámetro Tongue shape. Euskalingua 1. 25–31.
Zabalik. 2016. Euskara gure eskoletan: dena eta ez dena. Zabalik. Amikuzeko euskalgintza. http://zabalik-amikuze.eus/euskara-gure-eskoletan-dena-eta-ez-dena/.
Żygis, Marzena. 2003. Phonetic and phonological aspects of Slavic sibilant fricatives. ZAS Papers in Linguistics 32. 175–213. https://doi.org/10.21248/zaspil.32.2003.191.
Żygis, Marzena, Fuchs, Susanne & Koenig, Laura L. 2012. Phonetic explanations for the infrequency of voiced sibilant affricates across languages. Laboratory Phonology 3(2). 299–336. https://doi.org/10.1515/lp-2012-0016.
Table 1. Total number of sibilant tokens in the study

Figure 1. [t͡s̻] produced by Speaker 15 (item 1860). Left: plots of DCT coefficients (k0–k3). Right: the sum of k0 to k2 coefficients, the sum of k0 to k3 coefficients and the original nine values of CoG.

Figure 2. A spectrogram of the sequence [os̺iis̻] from jelosi izaiteik ‘(we didn’t have to) be jealous’ (speaker 10).

Figure 3. A spectrogram of the sequence [jʃoakajs̻o] from gaixoak aizorat juiten ‘the poor going to the neighbor’ (speaker 10).

Table 2. Mean spectral moments (SDs in parentheses)

Figure 4. A spectrogram of the sequence [es̺katoʃe] from neskatoxe ‘girl’ (speaker 9).

Figure 5. A spectrogram of the sequence [ntʃaneont͡s̺i] from laboantxan eontsi ‘be engaged in agriculture’ (speaker 10).

Figure 6. Violin plots and superimposed boxplots of CoG by phone.

Figure 7. Boxplots of speaker-normalized skewness by phone.

Figure 8. Boxplots of speaker-normalized Spectral Standard Deviation by phone.

Figure 9. Boxplots of speaker-normalized kurtosis by phone.

Table 3. Model’s estimates for CoG

Table 4. Contrasts between the values of PHONE within each manner category

Figure 10. Posterior predictive distribution of CoG (Hz) for each analyzed speaker (median, .66 and .95 CI).

Figure 11. CoG trajectories. Each colored line represents speaker averages while black lines represent the pooled mean.

Figure 12. Average CoG values for voiceless sibilants (all data), with fricatives plotted in the upper chart and affricates in the lower one.

Table 5. Mean values of the first four DCT coefficients (SDs in parentheses)

Table 6. Models of DCT coefficients

Table 7. Most relevant contrasts between phones in the four models (contrasts with ROPE < .05 are in bold)

Figure 13. Distribution of the AC coefficients of each sibilant.

Figure 14. Aggregated distribution of the AC coefficients of voiced vs. voiceless sibilants.

Table 8. Model’s estimates for AC coefficients

Figure 15. AC coefficient as a function of normalized time of a voiceless and a voiced lamino-alveolar sibilant (produced by the speaker S06).

Table 9. Contrasts between fricative values of PHONE within each place of articulation category

Figure 16. Boxplots of speaker-normalized duration by phone.

Table 10. Model’s estimated mean duration for each phone

Table 11. Contrasts between the values of PHONE within each place of articulation category

Figure 17. An example of the changes in intensity in an affricate and a fricative sound produced by Speaker 15.

Figure 18. Boxplots of relative intensity by phone for prevocalic sibilants.

Table 12. Model’s estimates for relative intensity

Table 13. Comparison between the mean spectral CoG (SDs in parentheses) and model’s estimates for the Mixean data and a speaker of High Navarrese

Figure 19. Violin plots and superimposed boxplots of CoG by phone (fem., High Navarrese).

Figure 20. Distribution of the difference between posterior distributions of CoG by manner and variety.