1. Introduction
This is a review of literature reporting experiments on the physiology, acoustics, and perception of common Danish stød, published during the past three quarters of a century.
The term common Danish stød is a translation of fællesdansk stød – as opposed to vestjysk stød ‘West Jutlandic stød’. When specification is not required, common Danish stød is rendered as stød henceforth. It is found everywhere in Denmark north of a wavy line, the stød border, running from the west, between the islands Rømø and Mandø in the Wadden Sea, across Southern Jutland, South Funen, and the southernmost part of Zealand to the town of Præstø in southeast Zealand (Ringgaard 1973:22). Note, however, that stød is not distributed in the same manner in different Danish regions.
Some dialects south of the stød border have a tonal accent, as for instance on Rømø and Ærø. Others have no accent at all, whether tonal or stød, as for instance on Lolland and Bornholm (Ringgaard 1973).
Some authors merely characterize their speakers as Danish; others distinguish between speakers from Copenhagen and those who speak Standard Danish from a regional background. The only dialect study is that of Ringgaard (1960), discussed in Section 4.1.4.
Section 2 is a brief description of the phonetic and phonological properties of stød as they are generally recognized today. Section 3 recounts a few older phonetic descriptions and experiments that serve to put into perspective the twenty-two modern investigations recounted in Section 4. Section 5 is a summary, and Section 6 contains a discussion.
2. Properties of common Danish stød
Stød is a challenging subject, partly due to its phonetic properties, as evidenced in Sections 3 and 4, but partly also due to the role it plays in the phonology of Danish.
2.1 Phonetic characteristics
The prototypical manifestation of stød is generally described as a particular kind of creaky voice or laryngealization, that is, vocal fold vibrations with irregular variation in periodicity as well as amplitude, commonly associated with a decrease of fundamental frequency (F0) and intensity toward the end of the syllable. However, the manifestation may be weaker and resemble modal voice: merely a somewhat compressed voice quality, lacking vibratory irregularity and F0 perturbation at the end. Stød is perceptible in whisper but disappears in song, at least in bel canto style singing. To set it apart from creaky voice or laryngealization as described in other languages, the manifestation of stød is termed creakiness here. It is very reminiscent of Livonian stød, though structurally different (Kiparsky 2017). However, Livonian became extinct when its last speaker passed away in 2013 (Charter 2013). Danish stød may therefore now be unique among the languages of the world.
Stød is conventionally marked with a superscript glottal stop symbol – after the vowel if it is long, as in [b̥eːˀn] ben ‘leg’, or else after the first postvocalic sonorant consonant, as in [b̥enˀ] bind ‘bandage’ – even though stød is not, except in its strongest manifestation as under emphasis, produced as a glottal stop.
2.2 Phonological characteristics
Stød is a property of syllables. For a syllable to have stød it must satisfy two phonological conditions, one segmental, one prosodic. (i) It requires a long sonority rhyme, either a long vowel – as in [seːˀ] se ‘see’ – or else a short vowel followed by a sonorant consonant – as in [halˀ] hal ‘hall’. In either case, one or more consonants may follow in the syllable – as in [seːˀn] sen ‘late’ and [halˀm] halm ‘straw’. (ii) The syllable may not be unstressed. With long sonority rhymes and stress (whether primary or secondary), syllables have stød-basis in Basbøll’s account of stød (2005:277–278). Note that vowel length is distinctive in Danish, as in [ˈviːlə] hvile ‘rest’ versus [ˈvilə] ville ‘would’, and vowels with stød are long vowels phonologically. There is no corresponding length distinction in Danish consonants.
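The two conditions for stød-basis can be sketched as a small predicate. This is only an illustration under stated assumptions: the `Syllable` record, its field names, and the deliberately incomplete `SONORANTS` set are hypothetical conveniences introduced here, not part of Basbøll’s formalization.

```python
from dataclasses import dataclass

# Hypothetical and incomplete set of sonorant consonant symbols (glides omitted).
SONORANTS = {"m", "n", "ŋ", "l"}

@dataclass
class Syllable:
    vowel_long: bool   # True for a long vowel, as in se
    coda: tuple        # postvocalic consonants in order, e.g. ("l", "m") for halm
    stressed: bool     # primary or secondary stress

def has_long_sonority_rhyme(syl):
    # Condition (i): a long vowel, or a short vowel directly followed by a sonorant.
    if syl.vowel_long:
        return True
    return len(syl.coda) > 0 and syl.coda[0] in SONORANTS

def has_stod_basis(syl):
    # Condition (ii): the syllable may not be unstressed.
    return syl.stressed and has_long_sonority_rhyme(syl)

se = Syllable(vowel_long=True, coda=(), stressed=True)
halm = Syllable(vowel_long=False, coda=("l", "m"), stressed=True)
short_plus_obstruent = Syllable(vowel_long=False, coda=("s",), stressed=True)
unstressed = Syllable(vowel_long=True, coda=("n",), stressed=False)
```

Under these assumptions, `se` and `halm` satisfy both conditions, whereas a short vowel followed directly by an obstruent, or any unstressed syllable, does not. Stød-basis is a necessary condition only; whether the syllable actually carries stød depends on further principles.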
The fundamental phonological governing principle is that a syllable with stød-basis has stød if it is not penultimate in the lexeme, hence for instance [tˢanˀ] tand ‘tooth’ with stød but [ˈtˢand̥ə] tante ‘aunt’ without stød. There are lexically marked exceptions to this principle in certain monosyllables with a short vowel ending in one sonorant consonant, such as [føl] føl ‘foal’ and [vϵn] ven ‘friend’ without stød, and in many disyllables ending in /əl ən ər/, such as [ˈɑŋˀɡ̊əl] ankel ‘ankle’, [ˈlæːˀjən] lagen ‘sheet’, and [ˈfeːˀb̥ɐ] feber ‘fever’ with stød. In inflection and derivation complex morphological principles apply, giving rise to stød contrasts such as [ˈhuːˀsð̩] huset ‘the house’ with stød (from indef.sg. [ˈhuːˀs] hus ‘house’) versus [ˈhuːsð̩] huset ‘housed’ without stød (from inf [ˈhuːsə] huse ‘house’). See further Basbøll (2005, 2008, 2014, 2021, 2022) and Grønnum, Pharao & Basbøll (2020).
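The default principle and its lexical exceptions can be illustrated with a toy lookup. This is a minimal sketch, not an implementation of Basbøll’s model: the function names and the list-of-flags representation are assumptions made here, stød-basis is supplied by hand rather than computed, and the morphological principles governing inflection and derivation are ignored entirely.

```python
# Lexically marked exceptions quoted in the text (orthographic form -> has stød).
LEXICAL_EXCEPTIONS = {
    "føl": False, "ven": False,                    # monosyllables without stød
    "ankel": True, "lagen": True, "feber": True,   # disyllables with stød
}

def default_stod(basis_flags, index):
    # basis_flags: one bool per syllable of the lexeme (True = stød-basis).
    # A syllable with stød-basis has stød unless it is penultimate.
    is_penultimate = index == len(basis_flags) - 2
    return basis_flags[index] and not is_penultimate

def stod_on_syllable(word, basis_flags, index=0):
    # Lexical marking overrides the default principle.
    if word in LEXICAL_EXCEPTIONS:
        return LEXICAL_EXCEPTIONS[word]
    return default_stod(basis_flags, index)

tand = stod_on_syllable("tand", [True])            # monosyllable
tante = stod_on_syllable("tante", [True, False])   # stressed syllable is penultimate
ankel = stod_on_syllable("ankel", [True, False])   # lexical exception
```

The sketch yields stød for tand, none for tante, and stød for ankel only because of its lexical marking.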
Phonetic investigations of stød may be undertaken to illuminate questions about its physiology, acoustics, or perception. But the point of departure may also be a quest for a phonetic underpinning of a particular phonological analysis. Thus, for instance, building on prior investigations of vowel and consonant duration and the timing of the onset of creakiness (Petersen 1973; Fischer-Jørgensen 1987), Basbøll (1988) proposed that Danish syllables have moraic structure, and that stød is a property of the second mora in bimoraic syllables. He specified that non-sonorant consonants cannot be moraic in Danish, and that trimoraic syllables are excluded (Basbøll 1989, 2005:292, 2021). In other words, [seːˀ] se ‘see’ and [seːˀn] sen ‘late’ are equally bimoraic, as are [halˀ] hal ‘hall’ and [halˀm] halm ‘straw’. Basbøll also suggested that consonants with stød are long phonologically. His analyses occasioned additional investigations of duration (Grønnum & Basbøll 2001) as well as two perceptual experiments (Grønnum & Basbøll 2002, 2003).
3. Older phonetic descriptions and investigations, 1743–1943
Jens Pedersen Høysgaard (1743, 1747, 1769) was the first to identify stød phonetically and to recognize its distinctive function. The term stød is also due to him. It means ‘push’ or ‘thrust’, and he described it impressionistically as resembling a very little hiccup (et meget lidet hik) that arises when the pharynx closes and cuts off the breath (svælget lukker sig for ånden – Fischer-Jørgensen surmises that ‘Høysgaard … hardly made any distinction between pharynx and larynx’ 1987:72).
For a century and a half after Høysgaard, stød was commonly assumed to be a glottal stop. That was, for instance, the opinion of the highly influential Otto Jespersen (1897–1899:297), one that he maintained in the German edition of his book (Jespersen 1913:78–79), although Rousselot’s (1897–1901) exploratory kymograph experiments had shown that the vibrations in the larynx hardly ever died away completely in words with stød (Rousselot 1901:873–879). That same observation was subsequently made repeatedly – albeit in experiments of somewhat limited scope in terms of number of speakers and recorded words – thus also in the last stød investigations to rely exclusively on the kymograph, that is, Ekblom (1933) and Abrahams (1949). See Fischer-Jørgensen (1987:72–74) for a fuller account of older descriptions and instrumental investigations of stød.
The kymograph was not a very sensitive or accurate instrument, though it functioned reliably enough to show that stød is not normally a full glottal stop. Smith (1944) is the first phonetic investigation that also employed more advanced methods and equipment, but it took more than a decade before the kymograph was irrevocably retired and became a museum piece. Thus Ringgaard (1960) was the last to employ it in some of his initial experiments around 1958.
4. Modern instrumental investigations
Between 1944 and 1978, three Danish doctoral dissertations and eight papers saw the light of day. Then there was a lull until Eli Fischer-Jørgensen’s monumental report about experiments – spanning more than a decade since 1974 – appeared in 1987. Another lull ended in 2001 when Hans Basbøll and I published the first of six papers. The last period contains the four most recent contributions to empirical phonetic stød research. If I am not guilty of an oversight, there are no more modern empirical phonetic (acoustic, physiological, or perceptual) analyses of common Danish stød than the twenty-two books and papers reviewed below.
4.1 1944–1978
4.1.1 Smith 1944
Svend Smith’s (1944) dissertation – the first major, systematic, and detailed experimental phonetic investigation of Danish stød – challenged the generally accepted view of the larynx as the primary source of stød. In what for his time was a groundbreaking and difficult interdisciplinary undertaking, he recorded electromyographic (EMG) activity from surface electrodes placed on the lower abdomen of five speakers. Readers will appreciate the precariousness of the enterprise when they learn that in normal upright bodily position, the abdominal activity involved in maintaining this posture drowns out whatever specific activity speech production might entail. Only with the speakers lying down did Smith obtain valid recordings from the electrodes, registered by an oscillograph, and captured on film. The recorded material consisted of eleven pairs of stød/non-stød words (five monosyllabic, five disyllabic, and one trisyllabic pair), such as [vϵn] ven ‘friend’, [vϵnˀ] Vend! ‘Turn!’ and [ˈʁoːsən] rosen ‘the rose’, [ˈʁoːˀsən] rosen ‘the praise’. Each pair was pronounced several times, in isolation, by five speakers. Numerous recordings were discarded due to speaker errors and technical glitches, yielding merely eighty-three pairs of words suitable for analysis. Six were whispered. In fifty-six instances, three of them whispered, the word with stød exhibited an abrupt increase in abdominal EMG activity, succeeded by an equally abrupt decrease. However, Smith was unable to synchronize EMG and sound recordings and therefore could not be sure at what point in time the higher EMG activity began in words with stød. Nevertheless, he proposed that the causative factor in stød production is a ballistic contraction of the expiratory muscles. This sudden contraction creates a steep increase in subglottal pressure that in its turn (though not infallibly) induces a secondary, proprioceptive reflexive innervation of the vocal folds, giving rise to creakiness. The subsequent equally abrupt reduction in subglottal pressure results in a reduction of intensity and F0.
In a subsequent experiment, Smith investigated F0 in stød versus non-stød. Six speakers produced some forty pairs of words with and without stød, as well as a few short sentences. They were recorded with a microphone whose vibrations were transferred to an oscillograph and captured on film. The vibrations on the film strip were analysed with a Meyer-Schneider pitch meter, a time-consuming manual procedure. There was no significant F0 difference at the beginning of words with stød and non-stød. The F0 fall often observed at the end of syllables with stød is not a tonal phenomenon per se, notes Smith, but is to be explained in conjunction with the reduction in amplitude, and both are a consequence of the reduction in subglottal pressure that results from the abrupt cessation of innervation of the expiratory musculature. Finally, Smith observed a tendency for greater vibratory amplitude at the beginning of syllables with stød, indicative of greater tension, a tendency that became the subject of a third experiment.
Smith recorded oral airflow velocity in unvoiced initial consonants in three speakers on a kymograph. He found some evidence of greater flow velocity that he interpreted as greater articulatory energy at the beginning of words with stød. In the same vein, the only voiced initial consonant in the recorded material, [l], exhibited vocal fold oscillations of greater amplitude in words with stød, also indicative of more articulatory energy.
Smith introduced the concept of phases in stød production. The first phase is the increase of EMG activity in the abdominal musculature. The second phase is the succeeding decrease of EMG activity – often, but not invariably, accompanied by irregular vocal fold vibrations. This concept of a biphasic stød resonated with most researchers following Smith but with a slightly different, that is, acoustic connotation. The first phase became the interval with modal voice, the second phase became the interval with creakiness.
Summing up the results of all three experiments, Svend Smith’s final words (1944:120) are that ‘… Stødet herefter kan defineres fysiologisk som en Trykaccent, en særlig markerende Bevægelse foretaget ved en stødagtig Fremhæven af Lydmasse’ [… the stød henceforth can be defined physiologically as a stress-accent, a particular marking motion brought about by a push-like highlighting of sound mass].
In the years following Smith’s experiments, the explicit assumption became that the vocal folds (the vocalis muscles) are active, not passive participants in stød production. Common to every phonetic investigation of stød after Smith’s is the assertion that the vibratory pattern of the vocal folds in syllables with stød deviates in some manner from normal, modal vocal fold vibrations, and subsequent researchers set out to characterize in more detail the nature of this deviation. Some have also tried to establish the physiological mechanism underlying it.
4.1.2 Fischer-Jørgensen 1955
Eli Fischer-Jørgensen (1955) measured the duration of long vowels without stød, long vowels with stød, and short vowels in two triads – [ˈpʰiːb̥ɐ] piber ‘pipes’, [ˈpʰiːˀb̥ɐ] piber ‘whines’, [ˈpʰib̥ɐ] pipper ‘chirps’, and [ˈlϵːsɐ] læser ‘reader’, [ˈlϵːˀsɐ] læser ‘reads’, [ˈlϵsɐ] læsser ‘loads (prs)’ – recorded once in isolation by ten speakers. The same triads plus [ˈhyːlə] hyle ‘howl’, [ˈhyːˀlɐ] hyler ‘howls’, [ˈhylə] hylde ‘praise (inf)’ were embedded in short sentences and recorded three times by three speakers each. In isolation, long vowels with stød averaged 82% of the duration of long vowels without stød. In sentence context they averaged 89% of long stødless vowel duration. The short vowels averaged 50% in isolation and 60% in sentences, respectively, of long stødless vowel duration. Fischer-Jørgensen (1987) contains a more comprehensive investigation of vowel duration; see Section 4.2.1.
4.1.3 Faaborg-Andersen 1957
During a pioneering investigation of EMG activity in intrinsic laryngeal muscles in thirty-two healthy subjects and twenty-three patients with unilateral vocal fold paresis, Knud Faaborg-Andersen (1957) also recorded a pair of words with and without stød, [moːˀɐ̯] mord ‘murder’ and [moːɐ̯] mor ‘mother’, from six healthy speakers. (These words would be pronounced [moɐ̯ˀ] and [moɐ̯] today.) He obtained action potentials from two needle electrodes, one inserted via the mouth into the vocalis muscle and one inserted from the outside, through the skin and the cricothyroid cartilage, into the cricothyroid muscle. He obtained a total of nine and ten recordings from the vocalis and cricothyroid muscles, respectively. Table 11 (Faaborg-Andersen 1957:78) lists the maximum amplitude of the electric action potential in μV in the two muscles in the production of [moːˀɐ̯] and [moːɐ̯], respectively, in these nineteen instances. I have calculated the difference in maximum action potential between the two words. In the vocalis muscle, the average maximum action potential is 56.3 μV higher in the word with stød, whereas in the cricothyroid muscle, it is only 11.3 μV higher in the word with stød. However, the dispersion in the data is considerable and the difference is not statistically significant. I think the conclusion should be that Faaborg-Andersen demonstrated a tendency for higher activity in the vocalis muscle in stød than in non-stød, but no corresponding increase in the cricothyroid muscle. Faaborg-Andersen did not measure F0; therefore we cannot know whether the (in retrospect) surprising absence of substantial increase in cricothyroid activity indicates that there was no F0 difference between the two words.
4.1.4 Ringgaard 1960
In another pioneering research effort, Kristian Ringgaard (1960) investigated West Jutlandic stød. It occurs in the dialects of West Jutland, with an extension towards the east at Vejle and Horsens, and reaches across the Little Belt to the northernmost part of Funen (Ringgaard 1973:24). Lying north of the stød border, these regions also have the common stød, though with a different distribution than in the Copenhagen standard language. Monosyllables with a short vowel succeeded by a sonorant consonant and an obstruent, such as pels ‘fur’ and flink ‘nice’, invariably have stød in Copenhagen, [pʰϵlˀs] and [fl̥eŋˀɡ̊]. In Jutland and on Funen they are without stød, [pʰϵls] and [fl̥eŋɡ̊].
West Jutlandic stød occurs in stressed syllables before /p t k/ in intervocalic, or originally intervocalic, position (Ringgaard 1960:10). That is, it occurs in disyllables and in monosyllables that derive from apocopated disyllables, but never in original monosyllables, as in [ˈsɔʔɡ̊ə] sukker ‘sighs (prs)’ and apocopated [sɔʔɡ̊] sukke ‘sigh (inf)’, the latter contrasting with a stødless original monosyllable [ˈsɔɡ̊] suk ‘sigh (n)’.
On a kymograph, seven speakers recorded isolated words with West Jutlandic stød, with common stød, and without stød. Ringgaard found the kymograms well suited to establish durational relations, but they could not provide unambiguous information about the articulation of West Jutlandic stød, except that it is different from the common Danish stød. He then proceeded with spectrography of his own speech at Svend Smith’s Institut for Talelidende [Institute for the Speech Impaired] in Hellerup, the only phonetics laboratory in Denmark with a spectrograph in 1958. Ringgaard discovered that in the common stød of [d̥uˈkʰæːˀd̥] dukat (a gold coin), voicing continued until the onset of the final stop consonant, and the release of the stop consonant was affricated. That is, the glottis must have been open at the time of release. West Jutlandic stød, on the other hand, exhibited a ‘lydløst’ tomrum [‘soundless’ void] at the end of the vowel, obliterating the vowel formant transitions to the consonant, and no noise whatever after the release of the stop consonant (Ringgaard 1960:76).
Ringgaard was unable to determine what caused this soundless void, a vocal fold occlusion or ekspirationsstandsning [cessation of expiration]. To settle the issue, he had an X-ray movie of his vocal folds made at Århus Municipal Hospital, illuminated from the back of the neck and running at 48 frames per second. He pronounced [kʰad̥] kat ‘cat’ three times and [kʰaʔd̥] katte ‘cats’ six times. I regret that with my untrained eye I am unable to interpret the X-ray still pictures reproduced in Figures 27 and 28 (1960:78–81). But in a footnote (1960:77) Ringgaard quotes Chief Physician E. Mosekilde, who states that
in the word [kʰad̥] kat ‘cat’ the false and true vocal cords only just come together while preserving their natural relief. In [kʰaʔd̥] katte ‘cats’ the occlusion is far more massive … The true and false vocal cords and the sinus of Morgagni are completely obliterated [my paraphrased translation].
Ringgaard concludes that West Jutlandic stød is a glottal stop. He reasons that such a firm glottal occlusion must involve increased subglottal pressure.
Subglottal pressure was recorded via a 2 mm thick syringe, inserted into Ringgaard’s larynx just below the cricoid cartilage, and printed on an electrocardiograph, while he pronounced isolated words without stød, with common stød, and with West Jutlandic stød. He found that in stødless words as well as words with common stød, subglottal pressure rose and fell in a smooth curve with peak pressure around the midpoint in the vowel. But in West Jutlandic stød, subglottal pressure was higher and the peak later in the vowel. He had expected subglottal pressure to increase even further, after the occlusion in the glottis. But it did not. During the glottal and succeeding oral occlusions, subglottal pressure fell back to its level of equilibrium. That could only happen if the respiratory organs performed a sudden inspiratory movement.
Accordingly, in the last experiment in this impressive series, X-ray diaphragm kymography revealed that in four productions of [kʰaʔd̥] katte ‘cats’ Ringgaard’s diaphragm performed a downwards, that is, an inspiratory movement during the glottal and oral occlusions, only reverting to expiration at the end of the word.
Ringgaard summed up the results of his challenging experiments by noting that West Jutlandic stød, that is, the glottal stop, appears to be both a very energetic and a very complex articulation.
I have no reason to doubt Ringgaard’s methods and results. But I believe a cautious caveat is in order here, one that I could voice about many physiological investigations of speech. Because the invasive procedures are intensely uncomfortable, it is difficult to recruit enough subjects to confidently allow generalization of the results to all speakers of the language.
Finally, I want to point out that even more remarkable than the phonetic nature of West Jutlandic stød is the fact that West Jutland has two phonetically different – and phonologically distinctive – laryngeal phenomena, a glottal stop and a common Danish stød, and in almost perfectly mutually exclusive contexts. This is a rather unique situation. Nevertheless, West Jutlandic stød is alive in the dialects to this day (Torben Arboe, Yonatan Ungermann Goldshtein, and Inger Schoonderbeek Hansen, personal communication, March 2022).
4.1.5 Lauritsen 1968
Margaret Lauritsen’s (1968) short paper is from a Berkeley Phonology Laboratory Report that I have not been able to access. I quote her results from Fischer-Jørgensen (1987:77–78). Lauritsen published 33 spectrograms of words with and without stød, produced by one speaker. At the end of long vowels with stød, the spectrograms revealed a clear reduction in intensity, sometimes creakiness, and falling F0. Importantly, Lauritsen noted that the timing of creakiness is variable. In a long vowel it may spill over into a succeeding sonorant consonant, and in a sonorant consonant it may show up already in the preceding short vowel. She further noted that stød is audible in whispered speech through a reduction of the intensity of the whispery noise. I may add that the audibility of stød in whisper is something Danish speakers can easily verify for themselves. Contrasts like [vϵl] vel ‘well’ versus [vϵlˀ] væld ‘abundance’, [hans] hans ‘his’ versus [hanˀs] Hans (a boy’s name), and [ˈmæːlɐ] maler ‘painter’ versus [ˈmæːˀlɐ] maler ‘paints’ are perfectly perceptible in whispered speech, in isolation as well as in running speech.
4.1.6 Vihman 1971
In an appendix to her dissertation on Livonian phonology, Marilyn Vihman (1971) compared Danish and Livonian stød. Livonian stød is always associated with a tonal difference, one that may even substitute for creakiness. That is not the case in Danish. One speaker recorded a series of minimal pairs, words with and without stød, in isolation and in a small frame sentence. Vihman found creakiness as well as decreasing F0 at the end of most words with stød. But there were also instances of stød where no creakiness and no F0 decrease were detectable in the spectrograms. Those words, however, exhibited higher F0 at vowel onset than did non-stød words. Later in the vowel, F0 converged with the non-stød pattern.
Vihman’s observations were confirmed by nearly every acoustic analysis that followed.
4.1.7 Petersen 1973
Pia Riber Petersen (1973) analysed F0, intensity, and a few spectra in disyllables from six speakers. As did Lauritsen (1968), she found considerable variability in the acoustic manifestation of stød in long vowels: explicit or less explicit creakiness that started earlier or later in the vowel and was of shorter or longer duration. One subject (labelled ‘22’) stood out from the others by the complete absence of any visible signs of creakiness. However, his stød syllables exhibited higher F0 than non-stød syllables.
Petersen subsequently ran a listening test, made up of recordings of a pair of words, [ˈlϵːˀsɐ] læser ‘reads’ and [ˈlϵːsɐ] læser ‘reader’, spliced out from their sentence context, by speaker 22 and by another male speaker with acoustically explicit stød. [ˈlϵːˀsɐ] was very often identified as stødless [ˈlϵːsɐ] læser ‘reader’, whereas only rarely was stødless [ˈlϵːsɐ] identified as [ˈlϵːˀsɐ] læser ‘reads’. With that caveat, however, speaker 22’s renditions of [ˈlϵːˀsɐ] læser ‘reads’ were identified correctly more often than those of the acoustically more explicit speaker.
Petersen furthermore found long vowels with stød in her material – where test items were embedded in semantically natural sentences – to have the same duration as long vowels without stød. She found the creakiness to take up about one third of the total long vowel duration and to begin at about 110 ms from vowel onset. Consonants with stød after short vowels were longer by about 35% than consonants without stød. Short vowels before consonants with stød were slightly shorter than before consonants without stød. F0 maxima were higher in words with stød, and minima were lower. That is, the frequency range spanned was larger in stød than in non-stød words. Intensity displayed more of a fall in stød words, whether or not F0 was also falling (as it was not with speaker 22).
4.1.8 Thorsen [Grønnum] 1974
I was intrigued by Petersen’s results and ran a perceptual experiment. I found that when the end of a word with stød was cut off at the point in time where the vocal fold vibratory pattern changed visibly from modal voice to creakiness – or at any point earlier in time – stød was no longer perceptible. A similar test with stimuli from Petersen’s speaker 22 revealed that even his shortest stimuli were infallibly perceived to have stød. I suggested that this was due to two features: his voice sounded somewhat strained or compressed in words with stød and they also began on a significantly higher F0. In retrospect, I propose that these two features may be correlated; see the discussion in Section 6.1.
4.1.9 Fischer-Jørgensen & Hirose 1974a, 1974b
Eli Fischer-Jørgensen & Hajime Hirose (1974a) recorded EMG activity in labial and laryngeal muscles in stop consonants at the Haskins Laboratories in 1972. Among the words in the recorded material were also two pairs of stød/non-stød words, [man] man ‘one/you’ versus [manˀ] mand ‘man’ and [ˈpʰiːb̥ɐ] piber ‘pipes’ versus [ˈpʰiːˀb̥ɐ] piber ‘whines’. In a brief follow-up paper (Fischer-Jørgensen & Hirose 1974b) they noted that the vocalis muscle exhibits a clear peak of activity in the stød words, absent in non-stød, with the one speaker whose recordings could be reliably interpreted. The same speaker exhibited no difference in labial muscle activity, or in other words no evidence of a more forceful labial gesture in the onset of syllables with stød versus syllables without stød.
4.1.10 Fischer-Jørgensen 1974
This is a short note to say that the work that commenced with Hajime Hirose at the Haskins Laboratories in 1972 continued in May and June of 1974 in Copenhagen, where EMG recordings of the vocalis muscle were obtained from seven speakers in words with and without stød. Four speakers showed increased vocalis activity in vowels with stød, three speakers did not. The final words in this advance notice are: ‘More details will be given in later reports’ (Fischer-Jørgensen 1974:206), as indeed they were – in 1987.
4.1.11 Jørgensen 1978
John Jørgensen (1978) quotes Roman Jakobson’s (1941) dictum that phonological phenomena that are rare in the languages of the world are acquired late by children. He recorded and transcribed the speech of four two-year-old toddlers. Each child produced around 125 short, non-imitated phrases – with two or three syllables per phrase. In a total of about one thousand syllables, only forty-one, equally distributed among the four children, contained a mistake in stød. There were twenty-five instances of no stød where one should have been, ten instances of stød where none should have been, and six instances of stød on the wrong segment in the syllable. Jørgensen suggests that this early mastery is due to stød being a prosodic phenomenon, and prosodic phenomena (for instance, intonation) are known to be acquired very early in children’s linguistic development.
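The error counts quoted above can be tallied as a quick consistency check. This is a back-of-the-envelope sketch: the category labels and variable names are introduced here for illustration, and because the syllable total is given only as ‘about one thousand’, the resulting rate is approximate.

```python
# Error counts as quoted from Jørgensen's study; labels are illustrative.
errors = {
    "stød missing": 25,
    "stød where none should be": 10,
    "stød on wrong segment": 6,
}

total_errors = sum(errors.values())   # the three categories add up to 41

# The source gives the syllable total only as "about one thousand".
approx_syllables = 1000
approx_error_rate = total_errors / approx_syllables

# Share of each error type among all stød errors.
shares = {kind: count / total_errors for kind, count in errors.items()}
```

The three categories sum to the forty-one errors reported, which corresponds to roughly four per cent of the syllables; omissions of stød account for the majority of the errors.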
4.2 Fischer-Jørgensen 1987 (1989a, 1989b)
Fischer-Jørgensen (1989a) is a photocopy of a slightly revised version of the manuscript for 1987, and Fischer-Jørgensen (1989b) is an abbreviated version. I refer to the comprehensive 1987 annual report edition here, which has recently become available online. It reports the results of experiments conducted over more than a decade after 1974.
Various subgroups of fifteen speakers, some of whom spoke Standard Danish from a regional background, were analysed in the many different experiments reported in this mammoth paper. Eli Fischer-Jørgensen performed extensive acoustic analyses: duration, creakiness, F0, intensity, and spectral composition. She filmed the vertical movement of the larynx, made fibre optic (laryngoscopic) videos of the larynx, recorded oral airflow, subglottal pressure, pharyngeal pressure, and she made palatograms. She obtained EMG signals from the vocalis, the cricothyroid, the interarytenoid, the posterior cricoarytenoid, and the lateral cricoarytenoid muscles. A microphone signal was also always recorded and synchronized with the physiological recordings.
4.2.1 Acoustic analyses
Duration of vowels with stød in words read in isolation was less (by 15% on average) than of long vowels without stød. But in words recorded in sentences, vowels with and without stød were of equal duration. Consonants with stød were longer (by 33% on average) than consonants without stød, with a tendency for the preceding short vowel to be shorter before a consonant with stød. This is in line with Petersen’s (1973) results and with Fischer-Jørgensen’s own (1955) findings.
Creakiness was present in (only) 71% of 1700 words with stød, but there were considerable differences between speakers and even more considerable intrapersonal differences, associated with differences in clarity of pronunciation across different recording sessions. Fischer-Jørgensen noted explicitly that there was not a single instance among the 1700 words where stød appeared as a glottal stop.
Fundamental frequency was invariably, with every speaker, higher at the beginning of syllables with stød. At the end, F0 was lower in only about half of the 1700 words analysed. The timing of F0 lowering, when present, was variable.
Intensity was often higher at the beginning and nearly always, though not without exception, lower toward the end of a syllable with stød.
Spectral analysis showed more energy in the upper part of the spectrum (that is, acoustic compression) in vowels with stød, mostly only toward the end of the syllable, because – as demonstrated by inverse filtering – the sound produced by the vocal folds has relatively more energy in the higher harmonics in creakiness than in modal voice.
In summary: Fischer-Jørgensen found the manifestation of stød to be extremely variable acoustically. However, whether creakiness was present or not, F0 was invariably higher in the onset of syllables with stød, and intensity was nearly always lower at the end.
4.2.2 Physiological analyses
Larynx vertical movement is reflected in the movement of the thyroid prominence (the Adam’s apple). It was filmed in five male speakers – in profile – with a television camera and a video-recorder. A general tendency for the larynx to be lower in [u] than in [i] and [a] was confirmed, but no consistent difference was found between vowels with stød and vowels without stød.
Fibre optic video recordings of the vocal folds in seven speakers showed compression of the anterior part of the vocal folds toward the end of syllables with stød. Likewise, the ventricular folds were constricted, but with very considerable differences between speakers.
Oral airflow was recorded in five speakers and showed a tendency for stronger flow in prevocalic unvoiced consonants in syllables with stød – reminiscent of Smith’s (1944) observation – and a significantly diminished flow toward the end of the syllable.
Subglottal pressure was obtained in eight pairs of words from one speaker. It exhibited small but significant differences between stød and non-stød words: a steeper rise and a slightly, but consistently, higher peak pressure about 50 ms from vowel onset in syllables with stød. The reduction in airflow and the higher subglottal pressure together are indicative of increased resistance in the glottis – that is, the opening between the vocal folds – during creakiness. The effect should be a lower pharyngeal pressure above the glottis, and that was what appeared in three speakers. But the difference in pharyngeal pressure was small between stød and non-stød and not always statistically significant.
Palatograms of five speakers revealed a larger area of contact between the tongue and the hard palate in high vowels with stød versus non-stød.
Together, the results of these experiments are indicative of the expenditure of more articulatory energy at the onset of syllables with stød – with a cautious caveat about the lips; see the discussion of Fischer-Jørgensen & Hirose (1974b) in Section 4.1.9.
EMG recordings were obtained by hooked wire electrodes inserted into the muscles, five in all. The interarytenoid and posterior cricoarytenoid muscles are responsible for closing and opening the glottis, respectively. Recordings were obtained from three speakers and none of them revealed any activity specifically associated with stød. The lateral cricoarytenoid muscles bring the vocal processes together, increasing the medial compression of the vocal folds, but otherwise seem to have much the same function as the vocalis muscles. The vocalis muscles are active in all voiced sounds, relaxed in unvoiced sounds, and involved in the manifestation of glottal stops. The cricothyroid muscles lengthen the vocal folds, making them thinner and tenser, thus increasing F0. The vocalis muscles may also be involved when – within the chest register – F0 rises, possibly to counteract the slight opening of the glottis that would otherwise result from tension of the cricothyroid muscles. Note that except for the interarytenoid, these muscles are all paired. That is, there are, for instance, two vocalis muscles, but it is customary to record from one side only in a pair and to refer to it in the singular.
Vocalis and cricothyroid muscles were recorded in seven speakers. Five speakers exhibited a markedly stronger vocalis activity in syllables with stød: a steep rise beginning on average 20–40 ms after vowel onset, a well-defined peak at about 100–130 ms, and a subsequent steep fall, yielding a total duration of 100–200 ms. The cricothyroid activity increased and decreased in harmony with F0 movements. In words with stød, where F0 was high at vowel onset, the cricothyroid peak was correspondingly early, slightly preceding the F0 maximum. In words without stød, where F0 performed a rising movement to the post-tonic syllable, the cricothyroid peak was correspondingly later. And, notably, the level of activity was no higher in the early (stød) cricothyroid peaks than in later (non-stød) peaks. In other words, the cricothyroid is associated with F0 but not with creakiness per se, and vocalis was unambiguously a factor in the production of stød in these five speakers.
The situation was different for the remaining two speakers, who exhibited no apparent difference in vocalis activity between stød and non-stød words. In one speaker the electrode appears to have been dislocated, but the other speaker (labelled ‘JR’) likely did not employ his vocalis muscle in stød at all. Furthermore, what little creakiness he exhibited occurred on a rising F0 and there was no F0 drop at the end of the syllable. On the other hand, JR’s ventricular folds were the most constricted among the seven speakers, almost completely blocking the view of his vocal folds. Fischer-Jørgensen does not say so, but it is implicit that whatever creakiness is present in JR’s stød is a result of the shortening and thickening of the vocal folds that occur when the ventricular folds are constricted due to contraction of the aryepiglottic sphincter musculature. One might further speculate whether this ventricular constriction, present also to varying degrees in the other six speakers, may not also be a concomitant factor in their stød production, in addition to the vocalis activity.
4.2.3 Stød versus other phonation types
Fischer-Jørgensen considered how stød is distinguished from similar non-modal voice qualities and phonation types. Harsh voice exhibits aperiodicity but not irregular variation in intensity. The first phase of a syllable with stød – that is, the beginning without creakiness – shares features with tense voice, for instance higher subglottal pressure and higher intensity. But it does not exhibit the constricted glottis that characterizes tense voice. Constricted glottis only appears during the second phase, with the creakiness. Stød shares many features with creak and creaky voice, but they never exhibit the reduction in overall intensity level very often found toward the end of stød syllables. Therefore, Fischer-Jørgensen (1987:183) concluded, ‘Stød is not just creaky voice.’
4.2.4 Causal relations
The increase in fundamental frequency at the beginning of syllables with stød, which Vihman (1971) and Petersen (1973) also found, cannot be due to an increase in subglottal pressure, because that increase is too small to drive the F0 increase observed. Nor can a subsequent rapid decrease in subglottal pressure account for the frequently observed reduction of intensity, because intensity often begins to lower before the peak in subglottal pressure is reached. But might a subglottal pressure increase induce at least a startle reflex, in terms of a vocalis contraction? Not as a general and consistent event, because in several individual instances, the vocalis activity peak preceded the subglottal pressure peak. Furthermore, contraction of the aryepiglottic sphincter musculature above the glottis, constricting the ventricular folds, can hardly be the result of changes in subglottal pressure. The conclusion is inescapable that vocalis muscle activity in stød is an independent activity and not a secondary effect of changes in subglottal pressure.
There are further potentially related parameters. They are partly acoustic (high F0 and high intensity at syllable onset; creakiness, lowering of intensity, lowering of F0, and increased energy in the upper part of the voice source spectrum toward the end of the syllable), partly physiological (articulatory energy; increased vocalis and cricothyroid muscle activity; constriction of the ventricular folds). Fischer-Jørgensen ascertained that high F0 and cricothyroid activity are obviously associated. (I would add that higher intensity may be merely a function of higher F0, not an independent factor, because when frequency increases, intensity increases with the square of the frequency, ceteris paribus (Divell 2010).) But the F0 lowering and the creakiness, often observed in the second phase of stød syllables, cannot be due to a cricothyroid relaxation that already begins early in the first phase. Falling F0 and creakiness are more likely the result of a strong contraction of the vocalis muscle (and the lateral cricoarytenoid), often accompanied by constriction of the ventricular folds. In other words: cricothyroid activity at syllable onset increases F0, and vocalis activity causes creakiness and lowers F0 toward the end of syllables with stød.
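The square-law relation mentioned in the parenthesis can be put in numbers. The following is a minimal sketch, assuming an idealized tone whose amplitude stays constant while its frequency changes, so that intensity scales with frequency squared. The function names are illustrative and not taken from any of the works reviewed.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency ratio corresponding to an interval in semitones."""
    return 2 ** (semitones / 12)

def intensity_gain_db(freq_ratio):
    """Intensity change in dB if intensity grows with frequency squared (assumption)."""
    return 10 * math.log10(freq_ratio ** 2)

one_semitone_gain = intensity_gain_db(semitones_to_ratio(1))
octave_gain = intensity_gain_db(semitones_to_ratio(12))
print(f"{one_semitone_gain:.2f} dB per semitone, {octave_gain:.2f} dB per octave")
```

Under that assumption, an onset F0 that is one semitone higher would raise intensity by only about half a decibel.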
4.3 Grønnum & Basbøll 2001–2012, Grønnum 2015
In a series of investigations from 2001 to 2012, Hans Basbøll and I addressed various acoustic and perceptual issues, followed in 2015 by my most recent phonetic investigation of stød to date.
4.3.1 Grønnum & Basbøll 2001
In Grønnum & Basbøll (2001) we tested the phonetic reality of Basbøll’s (1988) proposal that consonants with stød are phonologically long, and we also recorded a small corpus to measure vowel duration. Consonants with and without stød were measured in five different conditions: (i) word finally in utterance final position, (ii) word finally in utterance medial position, (iii) word medial position between a stressed and a post-tonic vowel, (iv) word medial position between a vowel and a consonant, and (v) syllabic when schwa is assimilated. Three vowel qualities ([i y ϵ]) were tested short, long with stød, and long without stød. Seventy-one different words in all, each embedded in six different semantically normal sentences, yielded 426 different utterances, recorded by five speakers.
Vowel duration measurements in five speakers across three vowel qualities averaged 128 ms for long vowels without stød, 121 ms for vowels with stød, and 90 ms for short vowels. The 7 ms difference between long vowels with and without stød is not statistically significant, so the two are of essentially equal duration. Short vowel duration is 70% of long vowel duration, in agreement with both Petersen (1973) and Fischer-Jørgensen (1987).
Sonorant consonant duration measurements did not confirm Petersen’s (1973) and Fischer-Jørgensen’s (1987) findings that consonants with stød are longer than consonants without stød, at least not consistently across positions. Word final position in particular turned out to be intriguing. In utterance medial position, word final [nˀ] and [lˀ] were between 4 and 27 ms longer than [n] and [l], but utterance finally, [nˀ] and [lˀ] were between 19 and 48 ms shorter, not just relative to [n] and [l] in the same context but compared with [n] and [l] in all other positions, as shown in Table 1.
We wondered why utterance final consonants with stød should be so short and suggested that opposing forces in stød production and pause anticipation might be at work. We recalled Fischer-Jørgensen’s (1987) laryngoscopic investigation that showed a transverse narrowing of the glottis (that is, compression of the anterior part of the vocal folds) during the stød phase, and we suggested two different scenarios (Grønnum & Basbøll 2001:242).
(i) If a narrower glottis is a prerequisite for stød, then abduction of the vocal cords, i.e., opening of the glottis, preparatory to the pause, will inhibit creaky voice more than modal voice, and the vibrations will accordingly cease sooner.
(ii) Subglottal pressure decreases before a pause. Therefore, volume velocity through the glottis diminishes, and the aerodynamic driving force in the vibrations – the so-called Bernoulli Effect (Bernoulli 1738; Grønnum 2005:69) – will not be sufficient to maintain vibrations against the adductive forces involved in stød. In other words, vibrations are interrupted by a glottal closure.
We noted that certain items in our recordings appeared to be instances of the latter scenario, others fit the first option better. Syllabic consonants with and without stød (in pairs like [ˈsɡ̊yll̩] skylde ‘owe’ and [b̥eˈsɡ̊ylˀl̩] beskylde ‘accuse’) did not exhibit consistent differences.
Overall, our data on consonant duration did not provide a general phonetic underpinning of a phonological analysis of consonants with stød as being long. The only condition in Table 1 that exhibited noticeably longer stød consonants was utterance medial, word final position (68–55 = 13 ms). In retrospect, it is perhaps relevant that in the sentences read by the speakers, these words were almost exclusively succeeded by a word beginning with an unvoiced obstruent. Therefore, spill-over of creakiness to a succeeding syllable could not occur. Perhaps for the stød to run its full course, the consonant carrying it may be slightly lengthened. There is some support for such an assumption in the values in Table 1 from word medial position before [s], where [nˀ lˀ] were very slightly but nevertheless significantly longer than [n l] (65–60 = 5 ms).
Stød onset timing and duration measurements confirmed previous authors’ observations of considerable variability in onset and duration of creakiness. It may start as early as 10 ms after vowel onset and as late as 130 ms, averaging 67 ms across short and long vowels alike. Quite apart from the fact that creakiness is not universally present, we noted that this variability challenges the traditional view, developed over the years since Smith (1944), of a biphasic stød (in long vowels) that begins with modal voice and approximately halfway through the vowel turns into creakiness.
4.3.2 Grønnum & Basbøll 2002
In Grønnum & Basbøll (2002) we conducted a perceptual experiment to find out whether listeners perceive vowels with stød to be long, in accordance with the physical facts and their phonological status. We created forty disyllabic nonce words with four different stressed syllable types, Vːˀ, Vː, VCˀ, and VC, such as [ˈtˢiːˀlən], [ˈtˢiːlən], [ˈtˢilˀən], and [ˈtˢilən]. Each syllable type occurred in ten different instantiations, exemplified with VC, thus: [ˈtˢilən ˈb̥ϵməð ˈsd̥ϵnəð ˈmyləð ˈɡ̊øləð ˈsølən ˈsd̥ulən ˈtˢɔməð ˈkʰʁ̥ɑlən ˈtˢʁ̥ɑməð]. We recorded these forty words embedded in semantically reasonably natural utterances. The words were then excised from their context and arranged in an ABX test where X was to be judged as most resembling either A or B. A, B, and X were always different. For instance, the subjects would hear [ˈmylˀəð ˈmyːləð ˈmyːˀləð] and had to decide whether [ˈmyːˀləð] sounded more like [ˈmylˀəð] or more like [ˈmyːləð]. Twenty-four different ABX combinations, times ten instantiations, times two voices yielded 480 different ABX stimuli, arranged in a random sequence, that were presented to twenty-two subjects. Six listeners responded in a completely random fashion, but sixteen were consistent in their responses, albeit in two different ways, as evidenced in Table 2.
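The stimulus count can be checked by enumeration. The sketch below assumes that an ABX combination is an ordered triple of three different syllable types, which is one reading of the design that yields the twenty-four combinations reported. The labels are plain-text stand-ins for the four types.

```python
from itertools import permutations

# Stand-ins for the four stressed syllable types (long/short vowel, with/without stød).
syllable_types = ["long_V_stod", "long_V", "short_V_C_stod", "short_V_C"]
instantiations = 10
voices = 2

# Ordered triples (A, B, X) in which all three members differ.
abx_combinations = list(permutations(syllable_types, 3))
total_stimuli = len(abx_combinations) * instantiations * voices
print(len(abx_combinations), total_stimuli)
```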
We interpreted values above 67% as similarity, below 33% as dissimilarity (bold in the table) and percentages between the two as indeterminacy. There were, as chance would have it, eight listeners in each group. Listeners in group 1 appeared to base their similarity judgements on vowel length: (words with) phonologically long vowels resemble each other, irrespective of the presence of stød (69%), and (words with) phonologically short vowels resemble each other, irrespective of the presence or absence of stød in the succeeding consonant (82%). Group 2 founded their judgements on the presence or absence of stød. That is, (words with) stød syllables resemble each other, irrespective of phonological vowel length (72%), and (words with) stødless syllables likewise resemble each other (71%). Both groups rejected any resemblance between (words with) syllables that are different with respect to both stød and phonological vowel length (30%/29%; 26%/25%). We concluded: ‘In brief: to half of our subjects stød vowels resemble phonologically long stødless vowels, to the other half syllables with stød resemble each other, irrespective of the segmental composition.’
4.3.3 Grønnum & Basbøll 2003
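The three-way interpretation of the percentages can be written out as a small rule. This is a sketch only: the thresholds are those stated above, while the function name and the treatment of values exactly at 33% or 67% as indeterminate are assumptions.

```python
def interpret_similarity(percent):
    """Classify an ABX response percentage using the 67% / 33% thresholds."""
    if percent > 67:
        return "similarity"
    if percent < 33:
        return "dissimilarity"
    return "indeterminacy"

# Percentages quoted in the text for the two listener groups.
for value in (69, 82, 72, 71, 30, 29, 26, 25):
    print(value, interpret_similarity(value))
```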
New in this paper is a perception experiment to test whether listeners could discriminate different timings of onset of creakiness. From the recorded material in our 2001 experiment, we extracted twenty disyllabic words, four words from each of five speakers, instances of [ˈd̥iːˀsən] disen ‘the fog’, [ˈsb̥iːˀsɐ] spiser ‘eats’, [ˈviːˀsɐ] viser ‘shows (prs)’, [ˈpʰiːˀb̥ɐ] piber ‘whines’, [ˈɡ̊yːˀsɐ] gyser ‘shivers’, [ˈlϵːˀsɐ] læser ‘reads’, and [ˈlϵːˀnɐ] læner ‘leans’. Two items by each speaker had visible creakiness in waveform and spectrum and two did not, even though stød was unambiguously audible in every word.
Listeners’ task was a visual analogue scaling where they had to mark, on 10 cm long lines on sheets of paper, where in the vowel they thought the stød began (early to the left, late to the right). Eighty-one university students participated. No evidence of bipartition of long stød vowels was found. That is, responses did not cluster around the midpoint of the line. There was a clear overall tendency for perceived stød onset at the very beginning of the vowel. Nevertheless, some stimuli had a slightly more pronounced tendency for early judgements than others. We extracted seven acoustic parameters from the comprehensive selection available in Praat (Boersma & Weenink 2006): HNR (harmonics-to-noise ratio) average across the vowel; HNR maximum; distance of HNR maximum from vowel onset; average F0 through the vowel; F0 maximum; F0 minimum; F0 range. We also measured vowel duration. The only factor that showed a modest correlation (r = 0.75) with listeners’ responses was, in fact, vowel duration. Stød was perceived to begin earlier in the high vowels [iːˀ] and [yːˀ] that are intrinsically shorter than lower [ϵːˀ]. The listeners, apparently at a loss to do what was asked of them, searched for something else in the stimuli to differentiate them, and found the acoustic property that distinguishes [iːˀ yːˀ] from [ϵːˀ], namely duration. That was our conclusion in 2003. I would like to offer another tentative explanation now. If stød perception is essentially cued by higher onset F0, perhaps as a concomitant of greater acoustic compression, the intrinsically higher F0 of [iːˀ] and [yːˀ] versus [ϵːˀ] may have contributed to a perception of earlier stød onset.
In retrospect, I want to stress what I consider to be the most interesting aspect of this experiment, namely that listeners apparently did not perceive long vowels with stød as having two parts, but – by and large – perceived stød throughout the vowel.
4.3.4 Grønnum & Basbøll 2007
In Grønnum & Basbøll (2007) we reviewed the phonetic results from our previous investigations and concluded with a proposal for a characterization of stød in articulatory terms, thus (Grønnum & Basbøll 2007:200):
The laryngeal activity is a ballistic gesture that – minimally – makes for a slightly compressed voice quality, at one end of a continuum, and – maximally – creates a distinctly creaky voice at the other. Under emphasis it may become a complete glottal closure. It is a property of the sonorant syllable rhyme. It is aligned with the onset of the rhyme. It is variable with respect to strength and to temporal extension.
We interpreted the ballistic gesture as a lowpass filtered muscular response to a transient neural command, timed to coincide with the onset of the rhyme. The neural command may be stronger or weaker, making for more irregular or less irregular vocal fold vibration of shorter or longer duration, but when the command is executed, the speaker can no longer control the way the vocal folds respond to it, just as one can no longer control the trajectory of a tennis ball once it has bounced off the racket.
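The idea of a lowpass filtered response to a transient command can be illustrated with a toy simulation. The sketch below is not the authors' model: the two cascaded one-pole filters, the 30 ms time constant, the 20 ms rectangular command pulse, and the threshold standing in for the onset of irregular vibration are all assumptions chosen for illustration.

```python
def ballistic_response(strength, pulse_ms=20, tau_ms=30.0, total_ms=400):
    """Response of two cascaded one-pole lowpass filters to a brief command pulse (1 ms steps)."""
    alpha = 1.0 / tau_ms  # smoothing factor per 1 ms step (assumed time constant)
    stage1 = stage2 = 0.0
    response = []
    for t in range(total_ms):
        command = strength if t < pulse_ms else 0.0  # the command is over after pulse_ms
        stage1 += alpha * (command - stage1)
        stage2 += alpha * (stage1 - stage2)
        response.append(stage2)
    return response

def peak_time_ms(response):
    """Time (ms) of the response maximum."""
    return max(range(len(response)), key=response.__getitem__)

def time_above_ms(response, threshold):
    """Number of 1 ms steps during which the response exceeds the threshold."""
    return sum(1 for value in response if value > threshold)

weak = ballistic_response(strength=1.0)
strong = ballistic_response(strength=2.0)
threshold = 0.5 * max(weak)  # hypothetical level above which vibration turns irregular

print(peak_time_ms(weak), peak_time_ms(strong))
print(time_above_ms(weak, threshold), time_above_ms(strong, threshold))
```

In this toy setting the strength of the command changes the height of the response and the time it spends above the threshold, but once the pulse has ended the remaining trajectory is fixed by the filter alone, which is the sense of "ballistic" used above.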
Figure 1 is an illustration of two very different stød manifestations and for comparison a word without stød, in words excised from larger utterance contexts. Note that stød is equally perceptible in the two leftmost words.
Our proposal is consistent with the fact that speakers cannot choose to increase the duration of stød ad infinitum, the way one may choose to lengthen creaky voice at the end of an utterance. In the same vein, parents addressing their infants may produce extremely long stødless vowels, but they do not similarly lengthen vowels with stød (Bleses et al. 2008). Our account of the laryngeal activity is also consistent with the way stød behaves acoustically: more explicit or less explicit creakiness; variable timing of the onset of creakiness in waveform and spectrum; variable total duration that often makes the creakiness continue well into the following syllable. Furthermore, our proposal is consistent with Fischer-Jørgensen’s (1987) EMG data: the higher vocalis muscle activity in creakiness relative to modal voice increases and decreases (roughly) gradually. It would be quite noteworthy if the actual mechanical change in vocal fold vibration mode were not also (roughly) gradual. We had no concrete evidence to suggest some underlying systematic factor beneath the considerable variability in stød strength and timing, but we conjectured that analysis of stød in non-scripted speech would reveal variation of stød manifestation as a function, inter alia, of the degree of prominence on the syllable.
The paper also contains a report on a small and informal but revealing experiment. We excised the word [ˈlϵːˀsɐ] læser ‘reads’ (Figure 2, right) from an utterance and spliced out the [l] and the first half of the vowel, to the left of the dotted red line. On three separate occasions I presented the resulting truncated item (Figure 2, left) to three groups of Danish university students (about 15, 20, and 70 persons, respectively) in a lecture room. I told the audience that they would hear a word spliced out from a larger context and that it might sound somewhat abrupt. I played the truncated word back five times and asked the audience: ‘What do you hear? Could this be a Danish word?’ Despite the creakiness (admittedly not very explicit though the stød was perceptibly unmistakable) in what was left of the vowel, it was unambiguously of short duration, and apparently duration trumps creakiness when the two are in conflict, because the audience unhesitatingly responded with a word in the Danish lexicon: [ˈϵsɐ] esser ‘aces (n.pl)’. This is my redacted formulation of our result. Our reasoning in the paper is more complicated and somewhat convoluted.
Similarly, the [l] and the first half of the vowel were removed from the word [ˈlϵːˀnɐ] læner ‘leans’ (Figure 3, right). The creakiness has spilled over into the following nasal consonant. As such, the truncated item (Figure 3, left) has a perfect match in the lexicon, [ˈϵnˀɐ] ænder ‘ducks’, and that is how listeners responded. In other words, the accidental spill-over was perceived at face value as ordinary creakiness in the [n]. If that had not been the case, listeners would have had recourse to another short vowel lexical item, [ˈϵnɐ] ender ‘ends (prs/n.pl)’.
4.3.5 Grønnum & Basbøll 2012
This paper is not primarily a study in the phonetics of stød, but it contains a section about psycholinguistic issues (Grønnum & Basbøll 2012:38–40) with two observations that add to the phonetic picture of stød.
(i) We noted that words with stød enter a child’s lexicon no later than corresponding words without stød. Nor do stød alternations in inflection delay children’s acquisition of morphology (Kjærbæk & Basbøll 2010:15, 25).
That is in line with Jørgensen’s (1978) observation that two-year-old toddlers make very few stød mistakes (see Section 4.1.11), and I take it to mean that the complex laryngeal articulation involved in stød is not an obstacle to children’s acquisition of their mother tongue.
(ii) During the phonetic annotation of the DanPASS corpus (almost ten hours of non-scripted speech, monologues as well as dialogues, by 22 speakers; see Grønnum 2009), the transcribers (myself among them) found that despite highly variable acoustic manifestations, stød was very nearly always clearly identifiable perceptually and – even more noticeably – did not seem to suffer any weakening in less distinct and/or more rapid passages. This is in obvious contrast to the manifestation of most segments. Thus, for instance, obstruents vary from the most clearly pronounced prototypical stops and fricatives to the weakest possible approximants, and they may be deleted altogether. Stød is never likewise deleted.
4.3.6 Grønnum 2015
Fischer-Jørgensen (1987) observed that F0 is consistently higher at the beginning of syllables with stød. It is in fact the most stable acoustic difference between syllables with and without stød. In Grønnum (2015) I widened the scope of analysis to running, non-scripted speech and included measurements in post-tonic syllables. I culled the data from the DanPASS corpus (Grønnum 2009). In order that any observed differences in vowel onset F0 be ascribed solely to the stød/non-stød difference, initial consonants in the syllable had to be of similar glottal and oral configurations (voiced or not, aspirated or not, fricative or stop) and the vowels should have the same tongue height. That left a meagre six acceptable words for analysis: [ˈvanˌfalˀ] vandfald ‘waterfall’, [b̥aˈnæːˀnˌpʰalmə] bananpalme ‘banana palm’, [ˈlasd̥ˌb̥iːˀl] lastbil ‘lorry’, [d̥iaˈmanˀd̥ˌmiːnə] diamantmine ‘diamond mine’, [ˈpʰʌsd̥ˌhuːˀs] posthus ‘post office’, and [ˈfϵŋˀsəl] fængsel ‘prison’. By way of compensation, there were sixteen speakers and often more than one instantiation by the same speaker. In isolated words, onset F0 in stressed syllables with stød ([ˈnæːˀn], [ˈmanˀd̥]) was significantly higher (by about one semitone) than without stød ([ˈvan], [ˈlasd̥]). There was a similar difference in the post-tonic syllable ([ˌfalˀ], [ˌpʰal]; [ˌb̥iːˀl], [ˌmiː]), but of lesser magnitude (about half a semitone) and not statistically significant. In words extracted from non-scripted monologues the same trends were observed, but the less constrained production in non-scripted speech, with its expanded frequency range, introduced a considerable dispersion in the measurements and the differences were not statistically significant.
The lively exchanges between speakers in the non-scripted dialogues caused an even more considerable increase in overall frequency range, and semantic and pragmatic factors would consistently place some words higher in the speakers’ range than others, thus obscuring any modest increase in F0 caused by stød. In a parenthetical note: This is a neat demonstration that read speech and controlled speech materials are justified and necessary when looking at some of the finer details of prosody that are easily obscured in less constrained speech.
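Differences such as "about one semitone" are computed on a logarithmic frequency scale. As a minimal sketch, with frequency values that are purely hypothetical and not taken from the study, the conversion is:

```python
import math

def semitone_difference(f_high_hz, f_low_hz):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12 * math.log2(f_high_hz / f_low_hz)

# Hypothetical onset F0 values, for illustration only.
print(round(semitone_difference(106.0, 100.0), 2))
print(round(semitone_difference(200.0, 100.0), 2))
```

A one-semitone difference thus corresponds to a frequency ratio of roughly 1.06, whatever the speaker's pitch level.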
I concluded that despite its perceptual robustness, and apart from the stable F0 difference at syllable onset, there is abundant evidence that the acoustic properties of stød are extremely variable: Vocal fold vibrations may or may not be explicitly irregular. The timing of the irregularity – when present – may be earlier or later in the sonorant syllable rhyme and it may continue into a succeeding post-tonic syllable. Likewise, syllables with stød often exhibit lowering of F0 toward the end of the rhyme. In retrospect, I suggest that the results support a characterization of stød as basically physiologically compressed voice that in its weakest form merely increases F0 in the beginning, whereas stronger compression activates the vocalis muscles and turns the modal vibratory pattern into creakiness and/or lowers F0. What governs this borderline between compressed modal-like vibratory patterns throughout the syllable versus explicit – occasionally weak – creakiness from some time after vowel onset is, however, an open and intriguing question.
4.4 The most recent studies, 2015–2022
Three studies are concerned with the nature and timing of the creakiness that characterizes stød, and Kirkedal (2016) studied the importance of stød for automatic speech recognition.
4.4.1 Hansen 2015
The point of departure for Gert Foget Hansen’s (2015) dissertation is the perceptual equivalence of different acoustic manifestations of stød. He challenges Grønnum & Basbøll’s (2007) suggestion that the action of the vocal folds is a ballistic response to a transient neural command. Instead, he proposes a controlled dynamic voice quality gesture, a progressively increasing and decreasing compression of the vocal folds (Hansen 2015:58–62). Whether or not creakiness occurs is a question of the degree of compression. (Note that Hansen and I were mutually unaware of each other’s writing in 2015.) The difference between the two proposals, he suggests, is in the control mechanism, and in Hansen’s view his hypothesis is better equipped to explain the perceptual equivalence of different manifestations: it is the dynamic change – the increase and decrease – in compression that Danes perceive as stød, irrespective of its magnitude, that is, whether creakiness occurs or not.
Compressed vocal fold vibrations entail a relatively shorter open period during each vibratory cycle. The acoustic result is a voice source spectrum with harmonics of relatively greater intensity in the upper part of the spectrum, that is, acoustic compression. The intensity relation between the upper and lower parts of the source spectrum can be expressed formally in several different ways. Hansen selected three such formal expressions to apply in the acoustic analysis.
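One simple way to express the intensity relation between the upper and lower parts of a source spectrum is a band-energy ratio in decibels. The sketch below is an illustration under stated assumptions and is not one of the three measures Hansen used: the idealized constant spectral tilts (-12 and -6 dB per octave), the 1000 Hz cutoff, and the 100 Hz fundamental are all hypothetical.

```python
import math

def harmonic_amplitudes(f0_hz, tilt_db_per_octave, max_freq_hz=5000.0):
    """Return (frequency, linear amplitude) pairs for harmonics of f0 with a constant tilt."""
    harmonics = []
    k = 1
    while k * f0_hz <= max_freq_hz:
        level_db = tilt_db_per_octave * math.log2(k)  # 0 dB at the first harmonic
        harmonics.append((k * f0_hz, 10 ** (level_db / 20)))
        k += 1
    return harmonics

def spectral_balance_db(harmonics, cutoff_hz=1000.0):
    """Energy above the cutoff relative to energy at or below it, in dB."""
    low = sum(a * a for f, a in harmonics if f <= cutoff_hz)
    high = sum(a * a for f, a in harmonics if f > cutoff_hz)
    return 10 * math.log10(high / low)

# A steeper tilt stands in for modal voice, a shallower tilt for compressed voice (assumption).
modal = spectral_balance_db(harmonic_amplitudes(100.0, -12.0))
compressed = spectral_balance_db(harmonic_amplitudes(100.0, -6.0))
print(round(modal, 1), round(compressed, 1))
```

A shallower tilt leaves relatively more energy above the cutoff, so the balance value rises, which is the direction of change the text calls acoustic compression.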
The material consisted of thirty-four words with stød and their non-stød counterparts, inserted in sentences, read four to six times each by one male speaker of Standard Copenhagen Danish.
After a series of painstakingly detailed – token by token – acoustic observations, it turned out that in a considerable number of instances, maximum acoustic compression occurred early in the syllable, before the onset of irregular vibrations. If acoustic compression is a consequence of physiological compression of the vocal folds, and if the observed maximum acoustic compression precedes the onset of creakiness, it follows that increased vocal fold compression cannot unambiguously be held responsible for creakiness. Some other factor must be at play. Hansen felt compelled to abandon his hypothesis and refrained from speculating any further. He concluded (Hansen Reference Hansen2015:189):
Når kompression ikke nødvendigvis hænger sammen med uregelmæssige svingninger og med intensitetsdykket sådan som hypotetiseret, så bidrager kompressionsforløbet ikke til at forklare den variabilitet der ses især med hensyn til forekomsten af uregelmæssige svingninger i forbindelse med stød. [When compression is not necessarily correlated with irregular vibrations and with the reduction of intensity as hypothesized, the course of compression does not contribute to explaining the variability observed, particularly with respect to the occurrence of irregular vibrations in connection with stød.]
4.4.2 Kirkedal Reference Kirkedal2016
Anders Søeborg Kirkedal (Reference Kirkedal2016) found that automatic speech recognition improves when it is supplied with information about stød – in read as well as non-scripted speech. He tested seventeen acoustic parameters in a search for the one that contributes most to correct identification of stød in the speech signal. The highest ranking among the seventeen – even above measures of compression – are two expressions of phase distortion.
I think this is an interesting finding. Phase distortion is a derivative of phase shift, the time shift between the lowest harmonic in the spectrum and harmonics at higher frequencies. It is a direct reflection of changes in the intricate, three-dimensional vibratory pattern of the vocal folds: front-to-back, side-to-side, bottom-to-top. Any change from the modal vibratory pattern, through minimally compressed vocal fold vibrations to explicit creakiness, will involve phase shifts. But the vibratory pattern is not an independently controlled feature. It is a consequence of changes in other parameters, for instance vocal fold length and tension, the degree of transverse glottis opening, and compression of the vibrating mass in the vertical dimension when the ventricular folds are constricted. I suspect that phase shift and phase distortion are very sensitive measures of change in the vibratory pattern and might be detectable earlier than acoustic evidence of compression and irregular vibrations. If it were easy to extract from the speech signal, phase distortion might be a good indicator of the timing of the stød command.
4.4.3 Esling, Moisik, Benner & Crevier-Buchman Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019
Note that voice quality in (the title of) this book refers not only to the vocal folds but also encompasses what is customarily called base of articulation or articulatory setting. Nor is laryngeal to be understood literally: it encompasses the larynx as well as the epilarynx and the pharynx.
The book’s central message is that the generally accepted description of speech as a source∼filter function – a sound emitted by the glottis and filtered by the resonance chambers above the glottis – is an inadequate dichotomy. Modifications in the epilarynx and the pharynx – the laryngeal cavity in the authors’ terms – are not merely responsible for modifying resonance frequencies but are active participants in differentiating the many ways the vocal folds may be configured and vibrate. In other words, the laryngeal cavity is an articulator. The Laryngeal Articulator Model (LAM) has three components: (i) tongue retraction, (ii) narrowing of the aryepiglottic folds through contraction of the complex sphincter/constrictor musculature, and (iii) raising of the larynx. These three components, in various combinations and to varying degrees, are involved in all the different phonation types, with their different vocal and ventricular fold configurations. Exceptions are aspiration, modal voice, breathy voice, falsetto, and breathy falsetto: they are exclusively properties of the vocal folds, of their different degrees of tension and different degrees of opening between the arytenoids.
Among the accompanying media files are a wealth of laryngoscopic video recordings, including recordings of five stød and contrasting non-stød words spoken by two Danes. They ‘illustrate how degrees of laryngeal constriction produce the phonetic effects that accompany instances of stød … all consistent with narrowing/tightening of the laryngeal constrictor mechanism’ (Esling et al. Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019:149, video legend). In the body of the text the authors state that the account of stød in Grønnum (Reference Grønnum1998), Grønnum, Vazquez-Larruscaín & Basbøll (Reference Grønnum, Vazquez-Larruscaín and Basbøll2013), and Hansen (Reference Hansen2015) ‘is consistent with our description of laryngeal constrictor action’ and that the stød/non-stød distinction ‘is accounted for globally by laryngeal constriction and its co-related effects’ (Esling et al. Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019:148). We are not told what these co-related effects are, but ‘Fischer-Jørgensen (1989) offers a comprehensive instrumental account of stød effects, as does Hansen (Reference Hansen2015), corresponding elegantly with our LAM interpretation’ (Esling et al. Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019:148). They continue: ‘Both also observe that the stops /t, k/ affricate when they occur initially in syllables containing stød – perhaps an enhancement in force across the syllable.’ Neither Fischer-Jørgensen nor Hansen makes any such statement. Nor could they have, because the facts are different: /t/ is always affricated in syllable initial position – stød or no stød – and /k/ is never affricated.
There is no mention in the book of Fischer-Jørgensen’s EMG recordings and hence no hint as to how laryngeal constrictor and vocalis muscle activity might interact in stød.
The authors state that, with the LAM model as the point of departure, much remains to be investigated. They note expressly (Esling et al. Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019:15) that the relation between F0 and the laryngeal articulator is not well understood. Evidently, research in that area would be very relevant to the description of stød.
4.4.4 Peña Reference Peña2022
Jailyn M. Peña’s (Reference Peña2022) paper was published online on 24 February 2022. Accordingly, the ‘three quarters of a century’ in the title of this review is a liberal interpretation of what – at the latest possible stage in the preparation of the manuscript – became seventy-nine years.
Peña subscribes to the generally accepted view that syllables with stød have two phases, a first phase with modal voicing and high F0, and a second phase typically with lower F0, irregular vibrations, and lower intensity. She finds that the sonorant portion of the whole syllable rhyme ‘acts as a single unit relative to stød’s gestural constellation in Danish’ (Peña Reference Peña2022:2) and that the onset of the stød phase proper is timed relative to the centre of that unit. This is in explicit contradistinction to Basbøll’s mora analysis (see Section 2.2), where stød is a property of the second mora in bimoraic syllables. That is, the creakiness turns up in the second half of a long vowel or in the sonorant consonant after a short vowel. In Basbøll’s analysis, any succeeding sonorant consonants, as in [b̥eːˀn] ben ‘leg’ and [jϵlˀm] hjelm ‘helmet’, are irrelevant.
Peña analyses four different types of monosyllables with stød (C = any consonant, V = any vowel, S = sonorant consonant, O = obstruent consonant): 21 CVːˀ words (as in [søːˀ] sø ‘lake’), 40 CVːˀO words (as in [huːˀs] hus ‘house’), 55 CVːˀS words (as in [d̥æːˀl] dal ‘valley’), 63 CVSˀ words (as in [nϵmˀ] nem ‘easy’), and 29 CVS words without stød for reference (as in [sœn] søn ‘son’). Nine Danish speakers recorded the words, embedded in natural sentences.
F0 runs the expected course, that is, it begins higher in words with stød than in words without stød. Another expected result is the stød spill-over to the succeeding word, present in 88% of Peña’s data.
Peña argues that if mora is the relevant temporal domain, stød should begin in the middle of a long vowel, or at the boundary between a short vowel and the succeeding sonorant consonant. If, on the other hand, the whole sonority rhyme is the relevant domain, stød should begin at the sonority rhyme centre. The predictions of the two approaches will be indistinguishable in CVːˀ and CVːˀO words, where the long vowel constitutes the whole sonority rhyme. But they make different predictions for CVːˀS and CVSˀ words: stød onset will be later than the vowel midpoint in CVːˀS words under the sonority rhyme centre hypothesis. And if the vowel is longer than the consonant, stød onset will come earlier than the V-S boundary in CVSˀ words.
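The diverging predictions can be made concrete with a minimal sketch. The segment durations are hypothetical illustration values, not Peña’s measurements, and the function names are introduced here for illustration only.

```python
def mora_onset(vowel_ms, long_vowel):
    """Predicted stød onset (ms after vowel onset) under the mora hypothesis."""
    # Long vowel: the second mora begins at the vowel midpoint.
    # Short vowel + sonorant: the second mora begins at the V-S boundary.
    return vowel_ms / 2 if long_vowel else vowel_ms


def rhyme_centre_onset(vowel_ms, sonorant_ms):
    """Predicted stød onset under the sonority rhyme centre hypothesis."""
    return (vowel_ms + sonorant_ms) / 2


# Hypothetical durations in ms: (vowel, following sonorant, long vowel?)
WORD_TYPES = {
    "CVːˀ": (200.0, 0.0, True),
    "CVːˀO": (200.0, 0.0, True),   # the obstruent is outside the sonority rhyme
    "CVːˀS": (200.0, 80.0, True),
    "CVSˀ": (100.0, 80.0, False),  # vowel longer than the sonorant
}


def predictions():
    """Map each word type to (mora prediction, rhyme centre prediction)."""
    return {
        name: (mora_onset(v, is_long), rhyme_centre_onset(v, s))
        for name, (v, s, is_long) in WORD_TYPES.items()
    }


if __name__ == "__main__":
    for name, (mora, centre) in predictions().items():
        print(f"{name}: mora {mora:.0f} ms, rhyme centre {centre:.0f} ms")
```

With these assumed durations the two hypotheses coincide for CVːˀ and CVːˀO words, the rhyme centre prediction is later than the vowel midpoint in CVːˀS words, and it is earlier than the V-S boundary in CVSˀ words.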
To eliminate the confounding influence from differences in vowel duration in the recorded material, within and across word types, stød onset times are represented as distances (i) to the centre of the long vowel (in CVːˀ, CVːˀO, and CVːˀS words), (ii) to the V-S boundary (in CVSˀ words), and (iii) to the sonority rhyme centre (in CVSˀ and CVːˀS words), and they are expressed as proportions of the total duration of the respective domains. An example is the formula in (1) for the relative distance of stød onset to the long vowel midpoint (Peña Reference Peña2022:20):
The distance measure that yields the least amount of variability as expressed by its standard deviation is considered the best representation of how stød is timed in the syllable.
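The selection criterion can be sketched as follows. The normalization is assumed to have the form (stød onset − anchor) / domain duration, which is a reading of the verbal description above rather than a reproduction of formula (1), and both measures are normalized by the sonority rhyme duration for simplicity. The four tokens are invented and deliberately constructed so that the rhyme centre measure comes out as more stable; they illustrate the procedure only and say nothing about real data.

```python
from statistics import stdev


def relative_distance(onset_ms, anchor_ms, domain_ms):
    """Distance from anchor to stød onset as a proportion of the domain duration.

    Assumed form of the normalization, not a quotation of formula (1).
    """
    return (onset_ms - anchor_ms) / domain_ms


def most_stable(measures):
    """Return the name of the measure with the smallest standard deviation."""
    return min(measures, key=lambda name: stdev(measures[name]))


# Invented CVSˀ tokens: (vowel ms, sonorant ms, stød onset in ms after vowel onset)
tokens = [
    (100.0, 80.0, 92.0),
    (120.0, 60.0, 88.0),
    (90.0, 110.0, 101.0),
    (110.0, 90.0, 98.0),
]

measures = {
    # Anchor at the boundary between the short vowel and the sonorant.
    "V-S boundary": [relative_distance(o, v, v + s) for v, s, o in tokens],
    # Anchor at the midpoint of the sonority rhyme (vowel + sonorant).
    "rhyme centre": [relative_distance(o, (v + s) / 2, v + s) for v, s, o in tokens],
}

if __name__ == "__main__":
    for name, values in measures.items():
        print(f"{name}: SD = {stdev(values):.3f}")
    print("most stable:", most_stable(measures))
```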
Table 3 presents the results, adapted from Peña’s Table 9 (Reference Peña2022:20). Note that the rightmost column, distance from vowel onset, is my add-on.
Standard deviations are extremely large and the dispersion of the underlying data therefore considerable. That is only to be expected, given what we know about the variability in stød onset timing, from early to late after vowel onset, and anything in between. But standard deviations, even if very large, are nonetheless somewhat smaller in the ‘distance-to-sonority-rhyme-centre’ means. Peña concludes that the sonority rhyme centre represents the most stable timing relationship between stød and the syllable. That is, morae do not capture the reality of stød onset timing as well as do sonority rhyme centres.
Peña does not consider the inescapable implication for speech production: To produce different stød onset times in words with sonority rhymes of different duration, for instance [sɡ̊ϵlˀ] skæl ‘dandruff’ and [sɡ̊ϵlˀm] skælm ‘rogue (n)’, or [oɐ̯ˀ] ord ‘word’ and [oɐ̯ˀm] orm ‘worm’, the speaker must look ahead and anticipate the end of the sonority rhyme. Then the rhyme must be halved before the speaker can initiate the stød gesture. One wonders if that is really a likely production strategy.
Peña does not quote the non-normalized stød onset times – that is, the distance of stød from the onset of the vowel. But the averages of this measure can be retrieved from the values in her Table 7 (vowel and consonant durations, Reference Peña2022:19) and the results in Table 9 (Reference Peña2022:20), reproduced in Table 3 here. For example, to calculate stød onset distance from vowel onset in CVːˀ words, I looked up the average duration of the long vowel in Peña’s Table 7 (Reference Peña2022:19): 131 ms. Its midpoint is then at 65.5 ms. Onset distance from the midpoint is given as 9.08 ms. If we feed these values into formula (1) above, we get:
And then we rewrite the expression and find the distance of stød onset from vowel onset:
Likewise for the remaining three word types. This procedure does not yield standard deviations, so we cannot know how stød onset times relative to vowel onset would rank statistically compared with sonority rhyme centre measures.
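The retrieval step for CVːˀ words can be sketched in a few lines. The sketch assumes that formula (1) has the form (stød onset − vowel midpoint) / vowel duration and that the tabulated value 9.08 is a percentage of vowel duration, as the description of the normalization suggests; both points are assumptions, and the function name is hypothetical. If 9.08 is instead read directly as milliseconds, the result is simply 65.5 + 9.08 = 74.58 ms.

```python
def onset_from_vowel_onset(vowel_ms, rel_distance_percent):
    """Invert the assumed form of formula (1).

    Assumption: rel_distance_percent = 100 * (onset - midpoint) / vowel_ms,
    so onset = midpoint + rel_distance_percent / 100 * vowel_ms.
    """
    midpoint_ms = vowel_ms / 2
    return midpoint_ms + rel_distance_percent / 100 * vowel_ms


if __name__ == "__main__":
    # Values quoted in the text: mean long-vowel duration 131 ms, distance 9.08.
    onset = onset_from_vowel_onset(131.0, 9.08)
    print(f"stød onset about {onset:.1f} ms after vowel onset")
```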
I am inclined to believe that a fixed reference for the control and timing of creakiness in syllables with stød, like the onset of the stressed vowel, is a more likely candidate than an anchor that floats in time as a function of the specific segmental composition of the syllable sonority rhyme. This is an empirical question deserving to be settled.
5. Summary
As far as I know, no empirical phonetic investigations of stød have been published since 1944 other than those reviewed in Section 4, whose results are summarized below.
5.1 Acoustic properties that distinguish syllables with stød
The list of differentiating qualities is long.
(i) F0 is higher at the onset of the vowel (Vihman Reference Vihman1971; Petersen Reference Petersen1973; Thorsen Reference Thorsen1974; Fischer-Jørgensen Reference Fischer-Jørgensen1987; Grønnum Reference Grønnum2015; Peña Reference Peña2022).
(ii) The voice source spectrum contains relatively more energy in the upper part of the spectrum (Fischer-Jørgensen Reference Fischer-Jørgensen1987; Hansen Reference Hansen2015).
(iii) Maximum acoustic compression often occurs early in the syllable before the onset of irregular vocal fold vibrations (Hansen Reference Hansen2015).
(iv) Spectral phase shift is more pronounced (Kirkedal Reference Kirkedal2016).
(v) Modal voice typically – but not universally – turns into creakiness at some point during the sonorant syllable rhyme (Lauritsen Reference Lauritsen1968; Vihman Reference Vihman1971; Petersen Reference Petersen1973; Fischer-Jørgensen Reference Fischer-Jørgensen1987; Grønnum & Basbøll Reference Grønnum and Basbøll2001, Reference Grønnum, Basbøll, Solé, Speeter Beddor and Ohala2007; Peña Reference Peña2022).
(vi) F0 typically – but not universally – falls toward the end of the sonorant syllable rhyme (Smith Reference Smith1944; Lauritsen Reference Lauritsen1968; Vihman Reference Vihman1971; Petersen Reference Petersen1973; Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(vii) Intensity nearly always falls toward the end of the sonorant syllable rhyme (Smith Reference Smith1944; Lauritsen Reference Lauritsen1968; Petersen Reference Petersen1973; Fischer-Jørgensen Reference Fischer-Jørgensen1987; Peña Reference Peña2022).
(viii) Creakiness typically spills over into a succeeding syllable if its onset is a vowel or a sonorant consonant (Lauritsen Reference Lauritsen1968; Grønnum & Basbøll Reference Grønnum, Basbøll, Solé, Speeter Beddor and Ohala2007; Peña Reference Peña2022).
5.2 Physiological properties that distinguish syllables with stød
This list is a little longer.
(i) The expiratory musculature (the diaphragm) contracts and relaxes abruptly (Smith Reference Smith1944). Nobody has since verified that, unless the tendency for greater oral airflow in syllable initial unvoiced consonants (Smith Reference Smith1944; Fischer-Jørgensen Reference Fischer-Jørgensen1987) is to be explained by higher expiratory activity.
(ii) Oral airflow diminishes toward the end of the syllable (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(iii) Subglottal pressure is higher, indicating increased resistance in the glottis (not necessarily greater expiratory pressure) that in its turn derives from compression of the vocal folds (Smith Reference Smith1944; Ringgaard Reference Ringgaard1960; Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(iv) The anterior parts of the vocal folds are visually compressed at the end of the syllable (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(v) The cricothyroid muscle is active, its maximum activity preceding the onset of the vowel, resulting in higher F0 at vowel onset (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(vi) The aryepiglottic sphincter musculature contracts, specifically constricting the ventricular folds (Esling et al. Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019) – perhaps more so in speakers with weaker vocalis activity (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(vii) The vocalis muscle is active in creakiness (Faaborg-Andersen Reference Faaborg-Andersen1957; Fischer-Jørgensen & Hirose Reference Fischer-Jørgensen and Hirose1974a, Reference Fischer-Jørgensen and Hirose1974b) – perhaps less so in speakers with more ventricular fold constriction (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
(viii) The creakiness in syllables with stød cannot be sustained ad infinitum (Grønnum & Basbøll Reference Grønnum, Basbøll, Solé, Speeter Beddor and Ohala2007).
(ix) The contact area between the tongue dorsum and the hard palate is larger, indicative of the expenditure of more articulatory energy during creakiness (Fischer-Jørgensen Reference Fischer-Jørgensen1987).
Considering the complexity of the acoustics and physiology of stød, as summarized above, it is thought-provoking that it should be acquired so early in children’s linguistic development. Two-year-old children make hardly any stød mistakes (Jørgensen Reference Jørgensen1978).
5.3 Perceptual properties
The perception of stød is a much less explored field.
(i) Lauritsen (Reference Lauritsen1968) observed that stød is audible in whisper due to reduction of noise intensity. This is something any Dane from north of the stød border can verify.
(ii) Grønnum & Basbøll (Reference Grønnum, Basbøll, Bel and Martin2002) found that vowels with stød were not perceived to have two parts or phases.
(iii) Grønnum & Basbøll (Reference Grønnum and Basbøll2003) found that listeners by and large perceive the onset of stød to coincide with stressed vowel onset.
(iv) Grønnum & Basbøll (Reference Grønnum, Basbøll and Niebuhr2012) repeated the observation from the annotation of the DanPASS corpus that stød is perceptually very robust and withstands considerable variation in tempo and clarity of pronunciation.
(v) Stød is not heard in bel canto song. Fischer-Jørgensen (Reference Fischer-Jørgensen1987) mentions this fact – not that she performed any experiment – but this is something any Dane from north of the stød border is able to verify if only he can carry a tune.
6. Discussion
Is it possible to make an integrated whole of all the facts presented above? Three major topics invite speculation.
6.1 F0 and acoustic compression
The first question concerns the relation between F0 and the acoustic compression that results from a longer glottal closure time in each vibratory cycle. Fischer-Jørgensen (Reference Fischer-Jørgensen1987) found that the peak in cricothyroid muscle activity preceded the onset of the vowel in syllables with stød, accounting for the high F0 at vowel onset. She also found acoustic compression to mostly coincide with creakiness. That is different from Hansen (Reference Hansen2015) who found that peak acoustic compression often preceded the onset of creakiness. I do not know how to reconcile the two accounts. How different are the physical facts behind ‘mostly coincide’ and ‘often precede’? For the sake of the argument, I will adopt Hansen’s (Reference Hansen2015) account.
Fischer-Jørgensen leaves no doubt that creakiness is a result of increased vocalis muscle activity. If that is so, and if acoustic compression peaks earlier than the onset of non-modal vibrations, acoustic compression and non-modal vocal fold vibrations cannot have the same cause. But perhaps high F0 and acoustic compression can both be attributed to the lengthening and tensing induced by the cricothyroid muscles. Even more tentatively: Could the acoustic compression be the independent variable, making higher F0 the dependent variable? If so, any argument that high F0 at syllable onset is a separate tonal gesture in stød production becomes void.
If indeed acoustic compression early in syllables with stød is the salient feature – common to all manifestations of stød and in conjunction with high F0 – it could be the property listeners extract from the stream of speech. Not the higher F0 per se, and not the creakiness that is not always present anyway. That would explain the perceptual equivalence of the many different acoustic manifestations of stød. It may not be straightforward, but I believe the proposal is testable in experiments with synthetic speech where one can manipulate F0, acoustic compression, and creakiness independently.
6.2 Ventricular folds and vocal folds
The second major question is the coordination of the constriction of the ventricular folds and vocalis muscle activity. Are they two independent gestures? Is there a trade-off relation between them? Fischer-Jørgensen’s speaker JR apparently did not have any stød-specific vocalis muscle activity at all. The weak creakiness in his stød appeared due solely to the shortening and thickening of the vocal folds, resulting from constriction of the ventricular folds, presumably due to contraction of the aryepiglottic sphincter musculature.
6.3 Cricothyroid, ventricular folds and vocal folds
The third conundrum is how cricothyroid activity relates to the ventricular fold constriction and the vocalis muscles. Is one neural command imaginable that first induces the cricothyroid to create acoustic compression (hence also higher F0), and then after some variable time lag triggers the sphincter and vocalis muscles to produce creakiness? If there are two separate neural commands, how are they coordinated? It must be a foolproof control, because we have never seen the non-creaky and the creaky intervals reversed in time. It will take a combination of Fischer-Jørgensen’s (Reference Fischer-Jørgensen1987) EMG study (including the sphincter musculature) and Esling et al.’s (Reference Esling, Reid Moisik, Benner and Crevier-Buchman2019) laryngoscopic experiments to shed light on this question.
6.4 Two simpler questions
Why would there be more articulatory energy at the onset of syllables with stød? That is something that perhaps needs to be explored in more comprehensive experiments. But increased articulatory effort may account for an anecdotal and very commonplace observation among Danish phoneticians: Ask linguistically naïve speakers of Danish what the difference is between pairs of words with and without stød, for instance [ˈmuːˀsən] musen ‘the mouse’ and [ˈmuːsən] musen ‘the muse’. They invariably reply along the lines of Der er ligesom mere tryk på [ˈmuːˀsən] ‘There is kind of like more stress on the mouse.’
Petersen (Reference Petersen1973) found that out of context [ˈlϵːˀsɐ] læser ‘reads’ was often identified as stødless [ˈlϵːsɐ] læser ‘reader’, whereas only rarely was stødless [ˈlϵːsɐ] identified as [ˈlϵːˀsɐ] læser ‘reads’. This irreversibility should be verified in a more comprehensive experiment.
Clearly, there is work to be done!
Acknowledgement
I am grateful to Hans Basbøll, John Esling, Birgit Hutters, two anonymous reviewers, and the editor for insightful and inspiring comments and questions. Also, many thanks to Torben Arboe, Yonatan Ungermann Goldshtein, and Inger Schoonderbeek Hansen for their ready replies to my enquiry about West Jutlandic stød in contemporary Danish dialects. And, finally, thanks to Bob Ladd for solving problems of style, in writing as well as in song.
Competing interests
The author declares none.