
The intonation of yes–no questions in Luganda

Published online by Cambridge University Press:  12 April 2021

Scott Myers*
Affiliation:
The University of Texas at Austin

Abstract

The intonation of yes–no questions in Luganda (Bantu, Uganda) has only been sketched in passing. Hyman states that Luganda yes–no questions are marked by a ‘super-high tone’ immediately following the last lexical high tone in the sentence, but there is little agreement in the literature about the intonation of yes–no questions if there is no lexical high tone in the sentence. To clarify the differences between statements and yes–no questions in Luganda, an acoustic production study was conducted. Nineteen speakers read aloud sentences differing in the location of the last lexical high tone relative to the end of the sentence. Each sentence was produced as a statement and as a question. Analysis of f0 measurements supported Hyman’s description of sentences with a lexical high tone, since the questions had an f0 peak that was higher and later than in the corresponding statements. For sentences without a lexical high tone, yes–no questions were found to begin with an interval in which f0 is higher than in corresponding statements, and end with a final f0 value lower than in statements. It is proposed that the yes–no question marker is a phrase accent (H). Like the high phrase accent posited by Pierrehumbert for English, this intonational tone is associated after the last tone in the phrase, but in Luganda that last tone is lexical, rather than being an intonational focus marker as in English. This H accent is subject to upstep in the position after a high tone.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of the International Phonetic Association

1 Introduction

In tone languages, differences in the f0 trajectory distinguish one lexical item from another. It is therefore a challenge in tone languages to accommodate intonational tones, since these are also differences in f0 trajectory, distinguishing one sentence type from another. Hyman & Monaka (2011) note that in some tone languages, such as Coreguaje (Tukanoan, Colombia), the intonational tones marking statements and questions override and replace lexical tones. In other tone languages, intonational tones are just added to the sequence of lexical tones. For example, intonational boundary tones are placed in the last syllable of a phrase following all lexical tones in Swedish (Bruce 1977), Japanese (Pierrehumbert & Beckman 1988), Kinande (Hyman 1990), Chichewa (Myers 1996), and Akan (Genzel & Kügler 2020).

The yes–no question construction in Luganda, a Bantu tone language spoken in Uganda, provides an interesting case of the interaction of lexical tone with intonational tones. The lexical tone patterns of the language have been thoroughly documented in work by Tucker (1962), Cole (1967), Stevick (1969a), Hyman (1982), and Hyman & Katamba (1993, 2010), but this extensive literature contains only brief mentions of intonation in Luganda.

Hyman (1982: 28) states that in a sentence in which there is a lexical high tone, the yes–no question is marked by ‘a super-high interrogative tone’ immediately following that high tone, as in (1b). This interrogative tone is absent in the corresponding statement in (1a). The standard orthography does not mark tone, so here and henceforth the orthographic representations of the examples are augmented with acute accents indicating lexical high tones (see footnote 1).

If there is no lexical high tone in the sentence, according to Hyman (1982), there is a super-high question-marking tone on the second syllable, as in (2b).

However, in later work, Hyman (1990: 122) and Hyman & Katamba (2011: 71–72) provide a different description of questions with no lexical high tones, stating that such a question is low-toned throughout, as in (3).

The brief description of yes–no questions given by Stevick (1969a: 27) treats the case in which there is a lexical high tone in a way similar to Hyman (1982), except that Stevick does not describe the intonational tone as super-high. In the case in which the final word in the sentence has no lexical high tone, Stevick says that ‘the final syllable is extremely low in pitch’.

These descriptions do not agree on some rather important factual matters: whether the intonational tone marking yes–no questions is higher in pitch than a lexical high tone in the same position, and whether there is such an intonational high tone in yes–no questions that have no lexical high tones. There are also some cases that they do not cover, such as sentences with more than one lexical high tone, or ones in which the last lexical high tone in the sentence is earlier than the final word.

However, these descriptions are in agreement that the question-marking high tone in Luganda is not limited to the final syllable of an intonational phrase, which means that it is not a boundary tone (H%), in the sense of Pierrehumbert (1980) or Beckman & Hirschberg (1994). It is instead parallel in distribution to the phrase accents (H and L) of Pierrehumbert (1980), intonational tones which occur immediately following the final pitch accent in the phrase in English. The last lexical high tone in Luganda is not a pitch accent in the sense of Pierrehumbert (1980), since the syllable it is associated with is not stressed, but it is parallel to the nuclear accent in English in being the last tone before the end of the phrase. This class of intonational tone is modelled after the ‘sentence accent’ that Bruce (1977) posited for Swedish. Grice, Ladd & Arvaniti (2000) survey a number of intonational patterns in European languages in which an intonational tone occurs in this zone after the last pitch accent. It will be proposed in this paper that yes–no questions in Luganda are marked by a H phrase accent.

The positioning of the question-marking high tone in Luganda yes–no questions is unusual from a comparative perspective. A high boundary tone on the phrase-final syllable distinguishes questions from statements in Swedish (Hadding-Koch & Studdert-Kennedy 1964), Venda (Ziervogel, Wentzel & Makuya 1972: 147), Kinyarwanda (Sibomana 1974: 185), English (Pierrehumbert 1980), Japanese (Pierrehumbert & Beckman 1988: 75), Kinande (Hyman 1990: 114), Chichewa (Myers 1996), and German (Féry 1993: 73). Questions are also marked by raising of the pitch range and/or reduction of phrasal pitch downtrends, as in Lingala (Guthrie 1940), Kongo (Carter 1973), Danish (Thorsen 1978), Kikuyu (Clements & Ford 1981), Hausa (Inkelas & Leben 1990), Jita (Downing 1995), and Kipare (Herman 1996). Final low tones mark yes–no questions in languages such as Chickasaw (Gordon 2005), Akan (Genzel & Kügler 2020), and the languages surveyed in Rialland (2009). But the use of a high tone with phrase accent positioning has not been reported before for yes–no questions.

The previous descriptions of yes–no question intonation in Luganda were based on impressionistic transcriptions. The present study aims to clarify the difference between yes–no questions and statements in Luganda with an acoustic production experiment, comparing the two kinds of sentences across a range of lexical tone configurations.

2 Background on Luganda tone

There are three contrasting lexical tone categories in Luganda: high, low, and falling (Tucker 1962; Cole 1967; Stevick 1969a; Hyman 1982; Hyman & Katamba 1993, 2010). The minimal pair in (4) provides an example of the contrast between high and low tone (Snoxall 1967).

The distribution of the three tone categories depends on syllable type. Luganda has a phonemic length contrast in both vowels and consonants, as exemplified by minimal pairs such as kumala ‘to finish’ – kumaala ‘to plaster’, and kuba ‘to be’ – kubba ‘to steal’ (Tucker 1962, Snoxall 1967, Clements 1986). Tucker (1962) defines the distribution of the tones in terms of a distinction between long and short syllables. A long syllable, according to Tucker, is one with a long vowel or a coda (i.e. the first half of a long consonant), while a short syllable is an open one with a short vowel. There is a contrast between high and low tone in both long and short syllables, but the contrast between falling and high tone only occurs in long syllables. Both high and falling tone are characterized by a rise in f0 followed immediately by a fall, but they differ in that the f0 peak occurs earlier in the syllable in falling tone than in high tone (Myers, Namyalo & Kiriggwajjo 2019).

A high tone in Luganda is subject to unbounded leftward tone spread. Within a tone phrase, as defined by Hyman, Katamba & Walusimbi (1987) and Pak (2008), a high tone extends leftward to the second syllable after a high-toned syllable, or, if there is no such syllable, to the second syllable of the tone phrase (Hyman & Katamba 2010). The result of this process is an extended sequence of high-toned syllables, as in (5), where the high sequence is boldfaced. Spaces are omitted between the words in this and subsequent transcriptions in order to incorporate the long vowels and diphthongs that arise when a vowel at the end of one word is juxtaposed with a vowel at the beginning of the next word (Tucker 1962, Clements 1986, Myers 2020).

The only lexical high tone in this sentence belongs to the second syllable of the object noun nnamúnye ‘bird’. This extends leftward to the second syllable of the verb amira ‘he/she is swallowing’, which is the first word of the tone phrase that contains nnamúnye. The f0 trajectory of such multi-syllable high-tone spans is described by Myers, Selkirk & Fainleib (2018).

Another high tone that is subject to this high tone spread is an intonational boundary tone H% found in statements. This high tone is manifested as a plateau of high-toned syllables extending from the sentence-final syllable leftward to the second syllable after a high-toned syllable, or the second syllable of the tone phrase. The final high-toned span is boldfaced in the example in (6).

In this sentence, there are no words with a lexical high tone, yet there is a span of high-toned syllables extending from the final syllable of the sentence to the second syllable of the verb alera ‘he/she is carrying’, the first word of the final tone phrase. Hyman & Katamba (1993) attribute such spans to a boundary tone H%. Its meaning and distribution are unclear: Hyman (1982) describes it as characteristic of list intonation, while Hyman & Katamba (2010) describe it as indicating ‘finality’. The citation-form transcriptions in Cole (1967), Snoxall (1967), and Stevick (1969a) all include this intonational high tone for all items ending in two or more syllables without a lexical high tone. Hyman & Katamba (2010) describe the final H% in statements as optional, but Myers et al. (2018) report that it occurred in every statement in their sample of statements ending in final phrases lacking lexical high tones.

3 Experiment

The difference between yes–no questions and statements has been described in Luganda, but the description has been incomplete, and has the vagueness and subjectivity to be found in any generalizations about speech production based on impressionistic transcriptions. The present study reports the results of an acoustic production experiment in which yes–no questions and statements were compared across conditions differing in the location of the last lexical high tone.

Both the peak f0 value and the timing of the f0 peak were measured. In those sentences with a lexical high tone, Hyman’s (1982) description of the question-marking tone as ‘super-high’ leads to the expectation that the f0 maximum would be greater in questions than in statements. Moreover, since this super-high tone is described as following the lexical high tone, we would hypothesize that the f0 peak will be later in questions than in statements in the case of sentences with a lexical high tone. The materials differed in the location of the last lexical high tone in order to test the claim that the location of the super-high intonational tone depends on the location of that lexical tone.

For sentences without lexical high tone, on the other hand, the description of Hyman (1982) would lead us to expect a local f0 peak on the second syllable of the sentence, at an f0 level higher than that for a lexical high tone in the same position. The later descriptions of Hyman (1990) and Hyman & Katamba (2011), by contrast, would lead us to expect that the questions should have lower f0 than statements throughout the final interval. The description of Stevick (1969a: 27) predicts in particular that f0 at the end of the utterance will be lower in such questions than in corresponding statements.

3.1 Method

3.1.1 Participants

Nineteen adult native speakers of Luganda participated in the study, ranging in age from 24 to 82. Six were female, and 13 were male. The relevant information for each one is listed in Table 1.

The participants came from all over the Central region of Uganda, where Luganda is spoken. Seventeen of them lived in the Kampala area at the time of the experiment, and two lived in the United States. They all grew up speaking Luganda, and at the time of the experiment spoke Luganda every day. They were also all fluent English speakers, English being a national language and lingua franca in Uganda.

Table 1 Participants.

Special effort was made to recruit participants from a broad range of age groups. The experimental descriptions of Luganda tone by Myers et al. (2019) and Myers et al. (2018) differed in important aspects from the description found in the previous literature (e.g. Tucker 1962, Cole 1967, Stevick 1969a), which could be due to a change in pronunciation between speakers of that earlier time and contemporary speakers. Such a change would be evidenced by systematic differences in the measured properties between older and younger speakers, which could only be detected in a subject pool that varies sufficiently in age.

3.1.2 Materials

There were four classes of sentence in the study, depending on the position of the last lexical high tone in the sentence: HLL (lexical high tone on the antepenultimate syllable), LHL (lexical high tone on the penultimate syllable), LLH (lexical high tone on the final syllable), and LLL (no lexical high tone in the sentence). Examples of each class of sentence are provided in (7a–c) and all test sentences are listed in the appendix.

In all sentence types, the last lexical high tone in the sentence was in a short syllable, and there were sonorant consonants preceding and following the vowel of that syllable. In none of the sentences did the lexical high tone meet the conditions for leftward spread (described in Section 2), so each lexical high tone was associated with just a single syllable. Half of the sentences produced were statements, and half were yes–no questions. The only orthographic difference between the yes–no question and the corresponding statement was the final punctuation mark: a question mark, or a period.

There were between four and six sentences in each class. All sentences were presented to the participants more than once in the recording session, yielding 15 tokens per participant per condition. There were thus 15 tokens × 2 levels of Speech Act (Statement/Question) × 4 levels of Tone Position (HLL/LHL/LLH/LLL) = 120 tokens per speaker. There were 19 speakers, so a total of 2280 sentences in the study.

A total of 207 tokens were excluded from the analysis. All 120 tokens were excluded for participant S16, whose yes–no questions displayed no consistent pattern of question marking, differing in the number and location of f0 peaks even in different repetitions of the same question. For the other participants, 27 tokens were excluded because they included a pause within the measurement interval. Twenty-seven question tokens were excluded (all but one produced by S3), because they displayed an alternative question-marking strategy with an f0 peak on the sentence-final syllable, as in English. This might well represent a genuine alternative construction within Luganda, but the measurements for such tokens are not comparable to the ones with the usual Luganda question-marking pattern (which also made up the majority of questions for S3). Six tokens were excluded because the test word was replaced by another word, and 22 because the test word was produced with a lexical tone pattern violating the criteria for that sentence class. Two tokens had such reduced consonants that it was impossible to delimit the test syllable, and three had interruption of modal voicing in a critical part of the f0 trajectory of the test syllable. These exclusions left 2073 tokens for analysis.
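The token counts reported above can be checked with a few lines of arithmetic. The following Python sketch simply restates the figures given in the text; the variable names and the labels used as dictionary keys are chosen here for illustration.

```python
# Design figures taken from the text.
tokens_per_condition = 15
speech_acts = ["Statement", "Question"]
tone_positions = ["HLL", "LHL", "LLH", "LLL"]
n_speakers = 19

tokens_per_speaker = tokens_per_condition * len(speech_acts) * len(tone_positions)
total_tokens = tokens_per_speaker * n_speakers

# Exclusion counts as reported; the keys are labels chosen for this sketch.
exclusions = {
    "all tokens from S16": 120,
    "pause within measurement interval": 27,
    "alternative sentence-final question peak": 27,
    "test word replaced": 6,
    "lexical tone pattern outside class criteria": 22,
    "test syllable could not be delimited": 2,
    "modal voicing interrupted": 3,
}
total_excluded = sum(exclusions.values())
analyzed_tokens = total_tokens - total_excluded

print(tokens_per_speaker, total_tokens, total_excluded, analyzed_tokens)
```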

3.1.3 Procedure

Seventeen of the participants were recorded at Makerere University in Kampala, Uganda, using a Shure SM10A head-mounted microphone and a Zoom H4n solid-state recorder, with a sampling rate of 44.1 kHz and 16-bit amplitude resolution. Two of the participants were recorded at their home in the United States, using the same microphone, sampling rate and amplitude resolution, but with a Marantz PMD 670 solid-state recorder.

The sentences were presented to the participants in a PowerPoint slideshow on a laptop computer, with each sentence on a separate slide. To avoid confusion, since the questions and statements differ orthographically only in the final punctuation, questions and statements were elicited in separate blocks. Within each block, the order of sentence stimuli was randomized. The sentences for this study were interspersed with sentences for other studies, which acted as distractors for this study.

Participants were instructed to read each sentence to themselves, and then to produce it without internal pauses, as a separate utterance (rather than as a member of a list). Participants were told that if they were not satisfied with their initial production, they could keep saying the sentence until they felt they had it right. They proceeded at their own pace through the sentences, but were instructed to finish saying a sentence before pressing the key to bring on the next one, and were asked to redo any sentence that was not preceded by a sufficient pause from the preceding sentence. When the participant produced a particular stimulus sentence more than once, the last one was selected for analysis, unless it had a clear internal pause or slip.

3.1.4 Measurements

Acoustic measurements were made using Praat (Boersma & Weenink 2013). For sentence types with a lexical high tone (HLL, LHL, LLH), the measurement interval extended from the onset of the syllable with the lexical high tone (S1) to the end of the immediately following syllable (S2), if there was one, and otherwise to the end of S1. The onset of each syllable was marked at the end of the amplitude drop from the preceding vowel. The offset of the utterance was marked at the end of voicing.

The durations of both S1 and S2 were measured. F0 measurements were made automatically using a script, with adjustments for each speaker for pitch range, voicing threshold, silence threshold, and octave jump cost. All f0 measurements were subjected to 10 Hz smoothing. In sentence types with a lexical high tone, which were those with a localized f0 peak on a particular syllable, the following f0 measurements were made in the S1–S2 measurement domain:

(8)
  a. F0 maximum: The maximum f0 within the measurement interval
  b. Offset f0: The f0 at the end of voicing in the utterance
  c. Peak delay: The duration of the interval from the onset of S1 to the f0 maximum
  d. Relative peak delay: Peak delay divided by the duration of S1
  e. F0 sequence: F0 at each 10% increment of the duration of S1 and the duration of S2
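As a rough illustration of how the measures in (8) relate to one another, the following Python sketch computes them from a single f0 track. It is not the Praat script used in the study: the function names, the use of linear interpolation, the inclusion of both syllable edges in the f0 sequence, the millisecond time base, and the sample values are all assumptions made for the example.

```python
def interpolate(times, values, t):
    """Linearly interpolate values at time t; times must be increasing."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def measure_token(times, f0, s1_on, s1_off, s2_off=None):
    """Compute the measures in (8) for one token with a lexical high tone.

    times, f0: voiced f0 frames up to the end of voicing in the utterance.
    s2_off is None when the high-toned syllable is sentence-final (LLH).
    """
    interval_end = s1_off if s2_off is None else s2_off
    frames = [(t, v) for t, v in zip(times, f0) if s1_on <= t <= interval_end]
    peak_time, f0_max = max(frames, key=lambda frame: frame[1])
    peak_delay = peak_time - s1_on

    def tenths(start, stop):
        # F0 at 0%, 10%, ..., 100% of the syllable (an assumption of this sketch).
        return [interpolate(times, f0, start + (stop - start) * k / 10)
                for k in range(11)]

    sequence = {"S1": tenths(s1_on, s1_off)}
    if s2_off is not None:
        sequence["S2"] = tenths(s1_off, s2_off)

    return {
        "f0_max": f0_max,                                      # (8a)
        "offset_f0": f0[-1],                                   # (8b)
        "peak_delay": peak_delay,                              # (8c)
        "relative_peak_delay": peak_delay / (s1_off - s1_on),  # (8d)
        "f0_sequence": sequence,                               # (8e)
    }


# Invented f0 track (Hz) sampled every 20 ms; not data from the study.
times_ms = list(range(0, 201, 20))
f0_hz = [100, 105, 110, 120, 130, 140, 150, 145, 130, 115, 100]
result = measure_token(times_ms, f0_hz, s1_on=0, s1_off=100, s2_off=200)
```

A relative peak delay above 1, as in the invented track here, indicates that the f0 maximum falls after the end of S1, that is, within S2.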

Where there was no lexical high tone (LLL), there was no local peak corresponding to a lexical high tone, and the measurement interval was the whole utterance. The second syllable of the verb (Smedial) was marked off, since that would be the onset of the final H% plateau in LLL statements, according to the usual pattern of tone spread (Hyman 1982). This divided the utterance into an initial interval leading up to this syllable, and a final interval extending from that syllable to the end of voicing in the utterance. F0 was sampled at every 20% increment within each of these two intervals.

3.1.5 Predicted differences

The primary goal of the experiment was to test the hypotheses about how Luganda yes–no questions differ in f0 from statements. In those sentences with a lexical high tone (HLL, LHL, LLH), the f0 maximum is expected to be greater in yes–no questions than in statements, reflecting the ‘super-high’ nature of the question-marking intonational tone. Relative peak delay is further expected to be greater in yes–no questions than in statements, reflecting the position of that intonational tone immediately following the lexical high tone.

In LLL sentences, there is no local f0 peak on a particular syllable. In statements, the final stretch of syllables extending from the sentence-final syllable to the second syllable of the verb is expected to have a higher f0 than the syllables preceding it in the sentence, due to the final H% span (Myers et al. 2018). In yes–no questions, on the other hand, the description of Hyman (1982) would lead us to expect a local f0 peak on the second syllable of the sentence, at an f0 level higher than that for a lexical high tone in the same position. The later descriptions of Hyman (1990) and Hyman & Katamba (2011), by contrast, would lead us to expect that the questions should have lower f0 than statements throughout the final interval corresponding to the H% in statements. Following the description of Stevick (1969a: 27), offset f0 would be expected to be lower in LLL questions than in LLL statements. The literature has not made clear whether this difference in offset f0 between questions and statements extends to sentences with a lexical high tone.

The location of the last lexical high tone was varied in this experiment in order to test the distributional claim that the question-marking intonational tone is lodged immediately following that lexical tone. But the location of this high-toned syllable relative to the end of the phrase would also be expected to affect the measurements. For example, the f0 maximum might be lower for peaks closer to the end of the sentence, given the pervasiveness of downtrends in f0 values over the course of the phrase in languages (Liberman & Pierrehumbert 1984, Poser 1984, Pierrehumbert & Beckman 1988).

Relative peak delay is also expected to be greater the farther that lexical high tone is from the end of the phrase. In other words, the f0 peak is expected to occur earlier in the syllable if that syllable is closer to the end of the phrase. Such a gradient effect of phrase-position on peak delay has been observed in Spanish (Prieto, van Santen & Hirschberg 1995) and Persian (Sadeghi 2017). Earlier f0 peaks for high tones in final compared to nonfinal syllables have been reported in English (Silverman & Pierrehumbert 1990), Spanish (Prieto et al. 1995), Palermo Italian (Grice 1995), Chichewa (Myers 1999), Kinyarwanda (Myers 2003), Moroccan Arabic (Yeou 2004), Serbian (Smiljanić 2006), German (Mücke & Hermes 2007), and Chickasaw (Gordon 2008).

Because relative peak delay is derived by dividing peak delay by syllable duration, a difference in relative peak delay according to sentence position could be due to differences in either of these component measurements. A lower relative peak delay in one condition can be attained by a shorter f0 rise, reflected in peak delay. Or it could be due to a longer syllable, reflected in S1 duration. These measurements will be examined to unpack any effects on relative peak delay.
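The two routes to a lower relative peak delay can be illustrated with a small numerical example. The values in this Python sketch are invented for the purpose and are not measurements from the study; the function name is likewise chosen here.

```python
def relative_peak_delay(peak_delay_ms, s1_duration_ms):
    """Peak delay expressed as a proportion of the duration of S1."""
    return peak_delay_ms / s1_duration_ms


# Illustrative values only (milliseconds).
baseline = relative_peak_delay(60, 120)         # reference case
earlier_peak = relative_peak_delay(48, 120)     # shorter f0 rise, same syllable duration
longer_syllable = relative_peak_delay(60, 150)  # same f0 rise, longer syllable

print(baseline, earlier_peak, longer_syllable)
```

Both modifications lower the ratio by the same amount relative to the reference case, which is why peak delay and S1 duration need to be inspected separately.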

With respect to S1 duration, phrase-final segments are generally longer than comparable phrase-medial ones (Klatt 1975). This pattern of final lengthening has been found in languages all over the world (Myers & Hansen 2007). It is gradient, in the sense that the effect is greater the closer the relevant segment is to the end of the phrase (Lindblom, Lyberg & Holmgren 1981, Turk 1999), reflecting a gradual deceleration of articulatory movements as the speaker approaches pause (Edwards, Beckman & Fletcher 1991). From these considerations, one might expect that S1 would have a greater duration the closer it was to the end of the phrase (in this case, the end of the sentence).

3.1.6 Statistical analysis

Mixed linear regression models were fit to the data using the packages lme4 (Bates et al. 2014) and lmerTest (Kuznetsova, Brockhoff & Christensen 2014) in R (R Core Team 2017). For analyses of the sentences with a lexical high tone, the fixed effects were Speech Act (Question/Statement), and Tone Position (the number of syllables separating the lexical high-toned syllable from the end of the phrase: HLL = 2, LHL = 1, LLH = 0). Pairwise comparisons of the three levels in Tone Position were performed using the Tukey method in the emmeans package (Lenth et al. 2020). For the analysis of LLL, the only fixed effect was Speech Act. Random intercepts were included for Speaker and for Item (sentence), and random slopes for the interaction of Speaker with fixed effects. If the model failed to converge, the analysis was re-run with the random slopes omitted one by one until convergence was attained. The alpha level was .05.
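The stepwise simplification of the random-effects structure can be sketched as follows. This Python fragment only generates lme4-style formula strings and walks through them; it is not the analysis code of the study. The response name F0max, the additive fixed-effects structure, the order in which slopes are dropped (last listed first), and the callable `fit` are all hypothetical.

```python
def candidate_formulas(response, fixed_effects, slopes):
    """Yield lme4-style formulas, dropping by-speaker random slopes one at a time.

    The drop order (last listed slope first) is an assumption of this sketch.
    """
    fixed = " + ".join(fixed_effects)
    remaining = list(slopes)
    while True:
        speaker_terms = " + ".join(["1"] + remaining)
        yield f"{response} ~ {fixed} + ({speaker_terms} | Speaker) + (1 | Item)"
        if not remaining:
            break
        remaining.pop()


def fit_until_converged(formulas, fit):
    """Return the first (formula, model) pair for which fit reports convergence.

    `fit` is a hypothetical callable returning (model, converged).
    """
    for formula in formulas:
        model, converged = fit(formula)
        if converged:
            return formula, model
    return None, None


formulas = list(candidate_formulas(
    "F0max",
    fixed_effects=["SpeechAct", "TonePosition"],
    slopes=["SpeechAct", "TonePosition"],
))
for formula in formulas:
    print(formula)
```

Each formula string could then be passed to a model-fitting routine of the reader's choice; the loop stops at the first structure that converges.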

3.2 Results

Figures 1–4 present sample annotated pitch tracks for representative sentences from Speaker 6. Syllables in the test word are marked off by vertical lines in the pitch track. For glosses and morpheme breakdowns of the examples, see the appendix.

Figure 1 Sample pitch tracks of HLL sentences. (a) Statement: Omulangira yalámula. ‘The prince judged’. (b) Question: Omulangira yalámula? ‘Did the prince judge?’.

Figure 2 Sample pitch tracks of LHL sentences. (a) Statement: Omulongo yalúma. ‘The twin bit’. (b) Question: Omulongo yalúma? ‘Did the twin bite?’.

Figure 3 Sample pitch tracks of LLH sentences. (a) Statement: Omulimi mulalú. ‘The farmer is crazy’. (b) Question: Omulimi mulalú? ‘Is the farmer crazy?’.

Figure 4 Sample pitch tracks of LLL sentences. (a) Statement: Omulimi alima ennimiro. ‘The farmer cultivates a garden’. (b) Question: Omulimi alima ennimiro? ‘Is the farmer cultivating a garden?’.

In Figures 1 and 2, displaying sentences with a nonfinal lexical high tone, the statement has an f0 rise beginning near the onset of the high-toned syllable and ending in an f0 peak near the end of that syllable. In the corresponding questions, the f0 rise starts at about the same point as in the statement, but continues on to a peak in the syllable following that one. This f0 peak is higher than in the corresponding statement.

In Figure 3, the lexical high tone is in the final syllable of the sentence. The f0 peak is higher and occurs later in the syllable in the question than in the statement. The test syllable is also longer than in the HLL and LHL conditions.

In Figure 4, there is no lexical high tone, and no single-syllable f0 peak in either the statement or the question. The annotation marks the three words of the sentence, and also the three syllables of the medial verb (alima). In the statement in Figure 4a, the final H% plateau extends from the second syllable of the verb to the end of the sentence. In the question in Figure 4b, on the other hand, f0 at the start of the sentence is higher than in the statement, and there is a gradual f0 fall to the end of the sentence.

Pooling across speakers, Figure 5 presents the mean f0 sequence within the measurement interval for each condition with a lexical high tone. F0 measurements were made at each 10% increment of the duration of the target S1 syllable, and the same for S2 (the following syllable), if there was one. In order to allow the pooling of measurements from participants with quite different pitch ranges, f0 measurements have been normalized relative to the mean and standard deviation of the individual speaker, so that each plotting point represents the average for that timepoint of the normalized f0 in z-scores. Questions are marked by filled circles and statements by hollow triangles. In Figures 5a–b, the first panel shows S1, and the second panel shows S2. In Figure 5c, there is just one panel, displaying the sentence-final syllable with the lexical high tone. These normalized plots show that in the conditions with a lexical high tone, the question has a later and a higher f0 peak than the corresponding statement.
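The speaker normalization described above amounts to a z-score transformation within each speaker. A minimal Python sketch follows; the function name, the record layout, the sample values, and the use of the sample (rather than population) standard deviation are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev


def normalize_by_speaker(records):
    """Convert f0 values (Hz) to z-scores within each speaker.

    records: list of (speaker, f0_hz) pairs.
    Returns (speaker, z) pairs in the same order.
    """
    by_speaker = {}
    for speaker, f0 in records:
        by_speaker.setdefault(speaker, []).append(f0)
    # Per-speaker mean and sample standard deviation (an assumption).
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, (f0 - stats[s][0]) / stats[s][1]) for s, f0 in records]


# Invented values for two speakers with different pitch ranges.
records = [("A", 100), ("A", 120), ("A", 140),
           ("B", 200), ("B", 220), ("B", 240)]
normalized = normalize_by_speaker(records)
```

After normalization the two invented speakers have identical z-score sequences despite their different pitch ranges, which is what allows their measurements to be pooled.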

Figure 5 Mean normalized f0 (z) by proportion of measurement interval: (a) HLL, (b) LHL, (c) LLH. Hollow triangles mark measurement points in statements, and filled circles mark those in questions.
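The speaker normalization described above can be sketched as follows. This is a minimal illustration, not the author's script: the function name, the data layout, and the use of the sample standard deviation are assumptions, since the text states only that f0 was normalized relative to each speaker's mean and standard deviation.

```python
from statistics import mean, stdev


def normalize_by_speaker(measurements):
    """Convert (speaker_id, f0_hz) pairs to (speaker_id, z) pairs.

    Each f0 value is expressed relative to the mean and standard deviation
    of the speaker who produced it. Assumption: sample standard deviation,
    so every speaker needs at least two measurements.
    """
    by_speaker = {}
    for speaker, f0 in measurements:
        by_speaker.setdefault(speaker, []).append(f0)

    # Per-speaker (mean, standard deviation) of the raw f0 values.
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}

    return [(s, (f0 - stats[s][0]) / stats[s][1]) for s, f0 in measurements]


# Hypothetical values: two speakers with different pitch ranges.
tokens = [("A", 100.0), ("A", 120.0), ("A", 140.0),
          ("B", 200.0), ("B", 220.0), ("B", 240.0)]
for speaker, z in normalize_by_speaker(tokens):
    print(speaker, round(z, 2))
```

After normalization the two hypothetical speakers have the same z-scores despite their different pitch ranges, which is what allows measurements to be pooled across participants.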

Figure 6 presents the normalized f0 trajectory for LLL sentences. Since there are no local f0 peaks in these sentences without lexical high tones, the whole sentence is represented in two intervals: the initial interval from the utterance onset to the onset of the second syllable of the verb, and a final interval from the offset of that syllable to the offset of the utterance. F0 was measured at the onset and offset of each interval, and at each 20% increment of the interval duration.

Figure 6 Mean normalized f0 (z) by proportion of measurement interval in the Initial and the Final intervals for LLL sentences.
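The time-normalized sampling used for the LLL sentences (onset, offset, and each 20% increment of an interval) can be illustrated with a short sketch. The function name and the example times are hypothetical. Only the sampling scheme itself comes from the text.

```python
def sample_times(onset, offset, step_percent=20):
    """Return measurement times at the onset, the offset, and every
    step_percent of the interval duration in between."""
    if offset <= onset:
        raise ValueError("offset must be later than onset")
    n_steps = 100 // step_percent
    duration = offset - onset
    return [onset + duration * k / n_steps for k in range(n_steps + 1)]


# Hypothetical interval from 0.50 s to 1.50 s: six measurement points.
print(sample_times(0.50, 1.50))
```

With 20% steps, each interval contributes six measurement points, so trajectories of different absolute durations can be compared point by point.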

The normalized f0 is higher throughout the initial interval in questions than in statements. In statements, f0 rises from the onset of Smedial (at the end of the first panel) to the offset of that same syllable (at the beginning of the second panel), representing the beginning of the final H% plateau. In questions, on the other hand, normalized f0 begins to fall at the offset of Smedial, ending at an offset f0 value lower than that for statements.

3.2.1 Sentences with a lexical high tone (HLL, LHL, LLH)

Figure 7 presents the mean normalized f0 maximum in sentences with a lexical high tone, broken down by Tone Position and Speech Act. The mean normalized f0 maximum was greater in questions (1.65) than in statements (0.52), and greater in lexical high tones that were farther from the end of the sentence: HLL (1.45), LHL (1.07), LLH (0.73). The pattern held across participants, all of whom had a higher mean normalized f0 maximum in questions than in statements. Except for S3, all participants also had the same descending pattern for position: HLL > LHL > LLH. For S3, the mean normalized f0 maximum was higher for HLL than for LHL, but LHL had a lower mean than LLH.

Figure 7 Mean normalized maximum f0 (z) by Tone Position and Speech Act. (Error bars represent standard deviation.)

A model of maximum f0 is presented in Table 2. Here the dependent variable is unnormalized maximum f0 (Hz), rather than the normalized values (z) depicted in Figure 7, since inclusion of Participant as a random effect takes into account the variation among participants in mean f0 level. Significant effects (p < .05) are highlighted in this and subsequent tables by boldface. Both the main effects of Speech Act and Tone Position were significant, and there was no significant interaction between them. The coefficient for Speech Act is 31.73, indicating that the predicted value for questions (the marked level) is 31.73 Hz greater than that for statements (the default level), when all other effects are factored out. The coefficient for Tone Position is 9.88, indicating that each syllable that separates the last lexical high tone from the end of the sentence increases the predicted f0 maximum by 9.88 Hz. The three tone position classes were compared pairwise to each other, and all pairs were significantly different.

Table 2 Model of Maximum f0. Maximum f0 ~ Speech Act × Tone Position + (1 + Speech Act| Participant) + (1| Item).

A reviewer points out that three of the sentences (in the LHL and LLH conditions) include a lexical high tone preceding the test high tone, and suggests that the effect of Tone Position could plausibly be due to downstep triggered by the preceding high tone in these items. However, Tone Position still has a significant effect in the same direction if these sentences are excluded.

The offset f0 was the f0 value at the end of the utterance. The group means for normalized offset f0 are presented in Figure 8.

Figure 8 Mean normalized offset f0 (z) by Tone Position and Speech Act. (Error bars represent standard deviation.)

Overall, mean normalized offset f0 was higher in questions (−0.29) than in statements (−0.88), and it was higher when the lexical high tone was closer to the end of the sentence than when it was farther from the end: HLL (−0.82), LHL (−0.71), LLH (−0.24). The difference between question and statement is greater in LLH than in LHL, and greater in LHL than in HLL. These effects on offset f0 are modeled in Table 3. There is a main effect of Speech Act, and an interaction of Speech Act with Tone Position. The factor Speech Act has a positive coefficient in the model (36.95), while the interaction has a negative coefficient (−17.13). This indicates that questions (the marked level of Speech Act) have a significantly higher offset f0 than statements, but this effect is reduced with each syllable that separates the lexical high tone from the end of the sentence.

Table 3 Model of Offset f0. Offset f0 ~ Speech Act × Tone Position + (1 + Speech Act| Participant) + (1| Item).
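To make the interaction concrete, the two coefficients quoted above can be combined into a predicted question-minus-statement difference in offset f0 at each tone position. The sketch below assumes that Speech Act is coded 0 for statements and 1 for questions, and that Tone Position is coded as the number of syllables separating the lexical high tone from the end of the sentence (LLH = 0, LHL = 1, HLL = 2). That coding is inferred from the description of the coefficients and is not stated explicitly for this model.

```python
# Coefficients quoted in the text for the offset f0 model (Hz).
SPEECH_ACT_COEF = 36.95      # question vs. statement
INTERACTION_COEF = -17.13    # Speech Act x Tone Position


def question_effect_hz(syllables_from_end):
    """Predicted offset f0 difference (question minus statement), in Hz.

    Assumes Tone Position is coded as the number of syllables between the
    last lexical high tone and the end of the sentence (0, 1 or 2).
    """
    return SPEECH_ACT_COEF + INTERACTION_COEF * syllables_from_end


for label, distance in [("LLH", 0), ("LHL", 1), ("HLL", 2)]:
    print(label, round(question_effect_hz(distance), 2))
```

Under these assumptions the predicted difference shrinks from about 37 Hz in LLH to about 20 Hz in LHL and under 3 Hz in HLL, in line with the subset analyses, where the difference was significant in LLH and LHL but not in HLL.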

To explore the interaction, each Tone Position subset was submitted to an analysis with the same structure as that in Table 3, but without the Tone Position factor. The results for the Speech Act variable are given in Table 4. The analysis is then broken down by Speech Act class in Table 5. Offset f0 was significantly greater in questions than in statements in Tone Position classes LLH and LHL, but not in HLL. On the other hand, the effect of Tone Position on offset f0 was limited to questions.

Table 4 Models of Offset f0, broken down by Tone Position subset. Offset f0 ~ Speech Act + (1| Participant) + (1| Item).

Table 5 Models of Offset f0, broken down by Speech Act subset. Offset f0 ~ Tone Position + (1| Participant) + (1| Item).

Peak delay for each token in the sample is plotted in Figure 9 against S1 duration, broken down by Tone Position and Speech Act. The dashed line marks x=y, so points on that line would mark instances in which the f0 peak is exactly at the end of the syllable. Points below that line mark f0 peaks within S1, while points above that line mark f0 peaks in the next syllable. The two solid lines are the regression lines through the question and statement points. Both lines slope upward in all three graphs, indicating that longer S1 duration is associated with longer peak delay. The crosses marking questions are generally above the triangles marking statements, reflecting the fact that the f0 peak delay in questions (mean = 252 ms) was generally greater than in statements (mean = 99 ms). The statement markers are generally below the x=y line, indicating that in statements the peak is within S1, while the question markers in HLL and LHL lie above that line, indicating that in these cases the peak for questions is in the syllable following S1. There is no such syllable in the case of LLH, so in that case the question points are above the statement points, but within S1.

Figure 9 Peak delay (ms) by S1 duration (ms), Tone Position and Speech Act: (a) HLL, (b) LHL, (c) LLH.

Relative peak delay provides a measure of how the peak is timed with respect to the syllable, since it gives the proportion of S1 duration at which the f0 peak is attained. If the value is below 1, the f0 peak lies within S1, and if it is over 1, the peak lies in the following syllable. The group means for relative peak delay are presented in Figure 10. Across tone positions, mean relative peak delay was higher in questions (1.64) than in statements (0.67). It was also higher in lexical high-tone positions that are farther from the end of the sentence: HLL (1.59) > LHL (1.41) > LLH (0.61). The difference between questions and statements was greater in the earlier tone positions than in the later ones. Both the differences due to Speech Act and those due to Tone Position held for each participant considered separately.

Figure 10 Mean relative peak delay by Tone Position and Speech Act. (Error bars represent standard deviation.)
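The derived measure can be written out in a few lines. This is a sketch with hypothetical token values. It assumes that peak delay is measured from the onset of S1, which is consistent with the x=y line in Figure 9 marking peaks at the end of the syllable.

```python
def relative_peak_delay(peak_delay_ms, s1_duration_ms):
    """Proportion of S1 duration at which the f0 peak is reached."""
    if s1_duration_ms <= 0:
        raise ValueError("S1 duration must be positive")
    return peak_delay_ms / s1_duration_ms


def peak_location(peak_delay_ms, s1_duration_ms):
    """Classify the peak relative to S1 using the threshold of 1."""
    ratio = relative_peak_delay(peak_delay_ms, s1_duration_ms)
    if ratio < 1:
        return "within S1"
    if ratio > 1:
        return "after S1"
    return "at S1 offset"


# Hypothetical tokens: (peak delay in ms, S1 duration in ms).
print(peak_location(90, 120))   # peak before the end of S1
print(peak_location(240, 120))  # peak in the following syllable
```

Because the measure is a ratio, a change in either peak delay or S1 duration changes it, which is why the two components are examined separately later in this section.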

A model of relative peak delay is presented in Table 6. Both main effects are significant, as well as the interaction. In a pairwise test of the three levels in Tone Position, all pairs were found to be significantly different.

Table 6 Model of Relative Peak Delay. Relative peak delay ~ Speech Act × Tone Position + (1 + Speech Act + Tone Position| Participant) + (1| Item).

To explore the interaction, we examine the three Tone Position subsets. Each subset was submitted to an analysis with the same structure as that in Table 6, but without the Tone Position factor. The results for the Speech Act factor for each Tone Position subset are given in Table 7.

Table 7 Model of Relative Peak Delay, broken down by Tone Position subset. Relative peak delay ~ Speech Act + (1| Participant) + (1| Item).

The peak is significantly later in questions than in corresponding statements in all three Tone Position classes, but the effect of Speech Act is greater for Tone Position classes with the lexical high tone farther from the end of the sentence.

However, the interpretation of these results is complicated by the fact that relative peak delay is a derived measurement calculated by dividing peak delay by S1 duration. The effects of Speech Act and Tone Position on relative peak delay, as seen in Tables 6 and 7, could therefore be due to effects on peak delay or S1 duration. We therefore examine these measurements next.

The mean duration for the high-toned syllable S1 (ms) is presented broken down by Speech Act and Tone Position in Figure 11. Mean S1 duration was greater in later sentence positions than in earlier ones: LLH (260 ms) > LHL (146 ms) > HLL (116 ms). It was also greater in questions (187 ms) than in statements (162 ms), but this effect was largely limited to the LLH position class.

Figure 11 Mean S1 duration by Tone Position and Speech Act. (Error bars represent standard deviation.)

The analysis of S1 duration is presented in Table 8. There is a main effect of Tone Position, and a significant interaction of Speech Act with Tone Position. To investigate the interaction, Tone Position subsets were each submitted to an analysis with the same structure as that in Table 8, but without the Tone Position factor. Speech Act did not have a significant effect in any of these models. In the pairwise comparison of the Tone Position classes, S1 duration was significantly greater in LLH than in HLL and LHL, but there was no difference between HLL and LHL. The results thus provide evidence that the test syllable was longer when it was phrase-final than when it was phrase-medial.

Table 8 Model of S1 duration. S1 duration ~ Speech Act × Tone Position + (1 + Speech Act| Participant) + (1| Item).

Group means for peak delay, the other component of relative peak delay, are presented in Figure 12. Mean peak delay was greater in questions (252 ms) than in statements (99 ms), but it did not vary greatly according to Tone Position: HLL (181 ms), LHL (180 ms), LLH (165 ms).

Figure 12 Mean peak delay by Tone Position and Speech Act. (Error bars represent standard deviation.)

The analysis is presented in Table 9. The main effect of Speech Act is significant, but not the effect of Tone Position, or the interaction.

Table 9 Model of Peak Delay. Peak delay ~ Speech Act × Tone Position + (1 + Speech Act + Tone Position| Participant) + (1| Item).

The difference in relative peak delay between questions and statements (Tables 6 and 7) was reflected in a parallel difference in peak delay (Table 9), and so was due to the longer absolute peak delay in questions than in statements. On the other hand, the fact that relative peak delay was greater in earlier tone positions than in later ones was reflected in a parallel trend in syllable duration (Table 8), and so was due to the fact that phrase-final syllables were longer.

3.2.2 Sentences without a lexical high tone (LLL)

In the LLL sentences without a lexical high tone, there is no local f0 maximum to measure. Instead, f0 was sampled at 20% increments of the interval up to Smedial and the interval from the offset of that syllable to the end of the utterance. The f0 values for questions and statements were compared at each measurement position, and the results are presented in Table 10. The column ‘Random slope’ indicates whether or not the random slope factor Participant × Speech Act was included in the final model.

Table 10 Models of F0 in LLL, broken down by timepoint. F0 at timepoint ~ Speech Act + (1 + Speech Act| Speaker) + (1| Item).

Questions had a significantly higher f0 value at each measurement point up through the one 20% through the interval from the end of Smedial to the end of the utterance. The next measurement point after that had no significant difference between questions and statements, and then in the last three measurement points, the f0 value in questions was significantly lower than in statements.

The LLL sentences were by far the most variable sentence type in their f0 production. In particular, participants produced the yes–no questions in this sentence type with greater hesitancy and more attempts than in any other condition. Figure 13 presents the time-normalized f0 trajectories for LLL questions and statements for each individual participant (solid circles marking questions and hollow triangles marking statements).

Figure 13 Mean normalized f0 (z) by proportion of measurement interval in the Initial and the Final intervals for LLL sentences for each speaker. Hollow triangles mark measurement points in statements, and black circles mark those in questions.

For 14 of the 18 participants, f0 in questions was higher than in statements throughout the initial interval (in the first panel). This was not the case for S9, S11, S13, or S14. Furthermore, 14 of 18 participants had a high plateau in questions extending from the beginning of the sentence to the first syllable of the second interval, with the final f0 fall beginning in the final interval. But 4 of the participants had a peak early in the first interval in questions, followed by a steady fall to the end (S1, S6, S9, S17). All participants had an increase in f0 in statements from the end of the first interval to the beginning of the second, on the second syllable of the verb, but participants differed in how large this increase was, and in the slope of the subsequent trajectory in the second interval. The final f0 measurement point was lower in questions than in statements for all participants, but the difference between them was very small for S10 and S15. None of these patterns of variation among the participants coincided with groups defined by gender, home district, or age (Table 1).

4 Discussion

4.1 The difference between yes–no questions and statements in Luganda

This study has clarified how yes–no questions in Luganda differ from corresponding statements. It has done so partly because it is the first study of the topic based on objective acoustic measurements, but also because it included cases that have not been covered in the quite brief discussions of the matter in the previous literature.

In sentences with a lexical high tone, Hyman (1982) described the yes–no question as differing from the corresponding statement in having a ‘super-high’ tone immediately following the last lexical high tone. We expressed this quantitatively in the hypothesis that the yes–no question would have an f0 peak that was significantly higher and later than in the corresponding statement. This hypothesis was supported in our study. In both the question and the statement, the f0 rise began near the beginning of the S1 syllable (the one with the last lexical high tone in the sentence), and it rose in both speech act types throughout most of that syllable. In the question, however, f0 continued to rise during the syllable following S1, attaining an f0 peak in that syllable that was higher than would be found in a lexical high tone in a comparable position. Both the f0 maximum and the relative peak delay were greater in the yes–no question than in the corresponding statement.

In sentence types in which there was a syllable following the lexical high-tone syllable, i.e. in HLL and LHL, the super-high f0 peak marking yes–no question occurred in that following syllable. But in LLH, in which the final lexical high tone occurred in the sentence-final syllable, the question-marking peak occurred in that same final syllable, but later in the syllable than the peak in the corresponding statements. This is consistent with how the pattern is described by Hyman (1982, 1990) and Stevick (1969a), though examples of the LLH case were not provided in any of those works.

All of the examples in Hyman (1982, 1990) and Hyman & Katamba (2011) have just one lexical high tone, but Hyman (1990: 122) specifies that the yes–no question marker occurs after the last lexical high tone. The materials in the current experiment included examples in which there was more than one lexical high tone, and these confirmed that this generalization was correct. In the sentences in (9), for example, there are lexical high tones on the penultimate syllables of both the verb yanóna and the object omulére, and the yes–no question in (9b) has the super-high tone only after the second of those.

In all cases in this sample with more than one lexical high tone, the question-marking super-high tone occurred only after the last one.

Stevick (1969a: 27) stated that the question-marking pitch rise only occurred when the sentence-final word had a lexical high tone. Hyman (1982, 1990) did not restrict the pattern to the final word, but only cited one-word examples. It was therefore unclear from these descriptions what the intonation of a yes–no question would be if the last lexical high tone was not in the final word of the sentence. As it happens, all the sentences in this experiment that have a lexical high tone have it in the final word of the sentence. But in the preliminary pilot work leading up to this experiment, there were sentences such as those in (10), in which the last lexical high tone was in a nonfinal word.

Here the last lexical high tone in the statement in (10a) is on the second syllable of the verb, yamánya ‘he/she knew’, which is the third word from the end of the sentence. In the corresponding question in (10b), the three speakers who produced these test items consistently put the super-high tone on the syllable following the lexical high tone in the verb. This confirms the generalization of Hyman (1990) that it is the last high tone that is relevant, and provides evidence against the generalization of Stevick (1969a) that the relevant high tone must be in the last word of the sentence.

In sentences without a lexical high tone (LLL), Hyman (1982) described the yes–no question as having a super-high tone on the second syllable. This description would be supported if there was a peak on the second syllable of the yes–no question with a greater f0 maximum than in the corresponding syllable of the statement. As it turned out, there was no one-syllable f0 maximum anywhere in either the question or the statement in these sentences, so this description was not supported in this study.

On the other hand, Stevick (1969a), Hyman (1990) and Hyman & Katamba (2011) all described LLL questions as having low tone throughout with lower final pitch than in the corresponding statement, while the statement had a final high plateau throughout the final phrase. These descriptions would be supported if f0 was low throughout the initial interval in both questions and statements, and then higher in statements than in questions for the final interval (reflecting the final H% in statements).

This description was only partially supported by our findings. Most participants (14 out of 18) had higher f0 in questions than in statements throughout the initial interval of the sentence from the beginning to the onset of Smedial, the second syllable of the verb. Generalizing across speakers, f0 was significantly higher in questions than in statements at each measurement point throughout this interval. This was not expected based on any of the previous descriptions, and it suggests the presence of an intonational high tone in this interval of the yes–no questions that is absent in the corresponding statements.

On the other hand, the f0 trajectories in the second interval, starting with Smedial, were more in line with these descriptions. All speakers showed an f0 rise in the LLL statements over the course of the syllable Smedial, though they varied in whether that syllable was followed by a plateau or a gradual decline. F0 was significantly lower in the question than in the statement at all measurement points in the final 40% of this interval, with the greatest differences in the utterance-final measurement point. The difference in offset f0 is consistent with the accounts of Stevick (1969a) and Hyman (1990), who described this as an important difference between statements and yes–no questions in sentences without lexical high tones.

4.2 Effects of Tone Position

In sentences with a lexical high tone, the position of the final lexical high tone was systematically varied, in order to test whether that would affect the position of the intonational tone marking yes–no intonation. The f0 maximum in yes–no questions occurred immediately following the position of the final f0 peak in the corresponding statement, supporting the description of Hyman (1982, 1990), according to which the super-high tone marking yes–no questions is associated immediately following the last lexical high tone in the sentence.

This variation in the position of the final lexical high tone also had other measurable effects. Maximum f0 was greater the farther that lexical high tone was from the end of the phrase: HLL > LHL > LLH. Such an effect reflects the general downtrend in f0 values over the course of the phrase, and in particular the effects of f0 lowering in phrase-final position (Liberman & Pierrehumbert 1984, Poser 1984, Pierrehumbert & Beckman 1988). The effect was gradient in Luganda, in that a high tone in the antepenultimate syllable had a significantly higher maximum f0 value than one in the penultimate syllable, and the latter in turn had a significantly higher maximum f0 value than one in the final syllable.

The effect of Tone Position on offset f0, however, went in the opposite direction from the lowering effects seen with maximum f0. In questions, f0 at the end of the utterance was higher when the lexical high tone was closer to the end: LLH > LHL > HLL. This can be interpreted as a coarticulatory effect of the high f0 peak in questions. It takes time for f0 to return to baseline values after such a high peak, and the less time there is for this recovery from the peak, the higher f0 will be when time runs out at the end of the utterance.

Tone Position also had an effect on the timing of the f0 peak. Relative peak delay was greater the farther that lexical high tone was from the end of the phrase. In other words, the f0 peak occurred earlier in the syllable if that syllable was closer to the end of the phrase. Such a gradient effect of phrase position on f0 timing has been observed in Spanish (Prieto et al. 1995) and Persian (Sadeghi 2017). The difference in Luganda was not due to differences in the duration of the f0 rise, but instead was associated with longer S1 duration in the final syllable of the phrase compared to nonfinal syllables (final lengthening). Peak delay is in general greater if syllable duration is greater, as we saw in Figure 9, but the additional syllable duration due to phrase position does not seem to count for this relation, leading to proportionally earlier f0 peaks in lengthened syllables, as found in English by Silverman & Pierrehumbert (1990).

4.3 Representations

The f0 patterns described above reflect sequences of tone categories, both lexical and intonational, and a context-sensitive system of phonetic implementation mapping those sequences to f0 trajectories (Pierrehumbert 1980, Beckman & Pierrehumbert 1988). In this section we will consider the question of what kinds of tone categories could lead to the observed f0 patterns.

A tone category is associated with a syllable or, in the case of tone spread, a sequence of syllables. It is reflected in the f0 trajectory by a movement in f0 that is timed with respect to that syllable or sequence of syllables. Lexical tones belong to particular lexical items, and whether a lexical tone occurs in the utterance depends on which lexical items are there.

It is assumed here that the contrast between high tone and low tone in Luganda is a privative one between the presence of a high f0 target (H) and the absence of such a target (Stevick 1969b, Myers 1998). H tone in Luganda is realized with an f0 rise from the default low f0 level to a relatively high value, followed by a return to the default level (Myers et al. 2019), as in the realization of H* in English (Pierrehumbert 1980). The realization of the falling tone of Luganda, which was not included in the current study, is the same as the high tone, except with an earlier f0 peak (Myers et al. 2019).

The final H% in statements is a boundary tone which is associated with the final syllable in the sentence (Hyman 1990). It is an intonational tone because it does not belong to any of the words that make up the sentence, and whether it occurs or not depends on what kind of sentence it is. This intonational high tone, like a lexical high tone, is subject to unbounded spread, as described above in Section 2, which spreads the tone leftward up to the syllable after a high-toned syllable, or the syllable after the onset of the phrase (Hyman & Katamba 2010). Hyman & Katamba (2010: 71) describe this boundary tone as optional in statements, but in our sample it occurred reliably in any statement that ended in a sequence of three or more lexically toneless syllables. As shown by Myers et al. (2018), the statement H% is realized with a lower f0 maximum than a comparable lexical high tone.

Yes–no questions in Luganda must be marked by an intonational high tone, since they differ from the corresponding statements in having an additional interval of raised f0, whether following a lexical high tone, or forming a peak or plateau at the beginning of the sentence. This is not a boundary H%, since it is not constrained to the final syllable of the intonational phrase. Rather, its positioning is parallel to the H of English, which Pierrehumbert (1980) describes as occurring immediately following the nuclear pitch accent in the sentence. The question phrase accent H occurs immediately following the final lexical H, i.e. as early as it can occur without preceding such a tone, as in (11a–c). If there is no lexical high tone, it occurs on the second syllable in the domain, either the sentence, as in (11d), or the tone phrase, as in (11e).
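The placement generalization just stated can be expressed as a small function. This is an illustrative sketch rather than a formal analysis: the function name and the list-of-tones representation are hypothetical, the choice of domain (sentence or tone phrase) is left to the caller, and the treatment of a sentence-final lexical H (the question H lands on that same final syllable) follows the LLH results reported above.

```python
def question_h_syllable(tones):
    """Return the 0-based index of the syllable hosting the question H.

    tones: one entry per syllable of the domain, "H" for a lexical high
    tone and "L" for a toneless syllable.
    """
    if not tones:
        raise ValueError("the domain must contain at least one syllable")
    last_index = len(tones) - 1
    h_positions = [i for i, tone in enumerate(tones) if tone == "H"]
    if h_positions:
        # Immediately after the last lexical H; if that H is on the final
        # syllable, the question H stays on that same syllable.
        return min(h_positions[-1] + 1, last_index)
    # No lexical H: second syllable of the domain.
    return min(1, last_index)


for pattern in ["HLL", "LHL", "LLH", "LLL"]:
    print(pattern, question_h_syllable(list(pattern)))
```

When a sentence contains more than one lexical high tone, only the last one matters, so a hypothetical input such as L H L H L places the question H on the final syllable.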

In LLL sentences, there is no lexical H tone to block the leftward path of H. For those speakers with an early f0 peak in LLL questions, the H is associated with the second syllable in the sentence, as in (11d). For those speakers with an initial f0 plateau, the H is instead associated with the second syllable of the final tone phrase, as defined by Hyman et al. (1987), which in these sentences corresponds to the final verb phrase (the verb and the following complement or modifier). The H is blocked from landing on the first syllable of the phrase, just as the leftward spread of a high tone within that tone phrase would be blocked from that first syllable (Hyman 1982). From that docking site, the H is subject to unbounded leftward spread to the second syllable of the sentence, as described by Hyman & Katamba (2010). The result is the representation in (11e).

The H is phonetically interpreted like other high tones in the language, except that it is assigned a higher f0 value when it occurs immediately following another high tone. According to the model in Table 2, the f0 peak in a question with a lexical high tone is 31.7 Hz higher than the corresponding statement, all else being equal. This upstep effect is similar to the raising of H% after H in English polar questions (Pierrehumbert 1980), or the raising of intonational H after H* in circumflex question intonation in Spanish (Torreira & Grice 2018).

The offset f0 value of a LLL yes–no question is lower than in a corresponding statement. This cannot be due to the H in questions, which is not at the end of the sentence in LLL sentences. Instead, it can be attributed to the fact that LLL statements end in a H% plateau. There is no such high boundary tone in the corresponding questions, so they end at a lower f0 level.

According to this description, Luganda yes–no questions occupy a previously unattested place in intonational typology. These questions are marked by an intonational high tone, as is the case in many languages, but it is a mobile phrase accent, not a boundary tone restricted to the final syllable of the phrase, as in English (Pierrehumbert 1980), Japanese (Pierrehumbert & Beckman 1988) or Chichewa (Myers 1996). This phrase accent is positioned in the zone between the last pitch prominence in the phrase and the end of the phrase, as proposed for other languages by Pierrehumbert (1980) and Grice et al. (2000), but in Luganda that last pitch prominence is a lexical tone, not associated with a stress prominence.

The distribution of the H tone after the last marked element and otherwise at the beginning of the domain is reminiscent of parallel patterns of distribution in nonlinear phonology. For example, stress falls on the last heavy syllable and otherwise the first syllable in Eastern Cheremis or Huasteco (Hayes 1981). In Chaha, the 3rd masculine singular objective in perfectives is marked by labialization of the last labializable consonant in the base (McCarthy 1983). In Japanese mimetics, the ‘uncontrolled’ variant is marked by palatalization of the last coronal in the form and otherwise the initial consonant (Mester & Itô 1989). In each of these cases, an entity is positioned as close to one end as it can get without crossing a designated obstacle. The result is a pattern of distribution in which the entity occurs at the leftmost/rightmost obstacle, and in the absence of obstacles, at the rightmost/leftmost end.

5 Conclusion

In this paper, experimental acoustic evidence has been provided to test descriptive claims about the yes–no question intonation in Luganda. The results support a model in which the yes–no question is marked by an intonational high tone that is positioned immediately after the last lexical high tone, if there is one, and otherwise on the second syllable of the domain (tone phrase or sentence). In its positioning after the last tone target in the phrase, this intonational tone is parallel to the H phrase accent posited in analyses of the intonational systems of European languages (Bruce 1977, Pierrehumbert 1980, Grice et al. 2000).

Acknowledgements

The author would like to thank Dr. Saudah Namyalo, Sam and Rose Musoke, Anatole Kiriggwajjo, and Paul Bbosa for their invaluable help in designing and running this experiment, the 19 participants for sharing their knowledge of Luganda, and the editor, associate editor and reviewers for the feedback that helped get this article into shape.

Appendix. Test sentences

In the following list of test sentences, the statement form of each sentence is given. The representation is the standard orthography, augmented with acute accents marking lexical high tones. The question is identical in the orthography except for a question mark at the end of the sentence. The number in parentheses after the gloss indicates the number of repetitions of that sentence per speaker for each sentence type.

Supplementary material

To view supplementary material for this article (including audio files to accompany the language examples), please visit https://doi.org/10.1017/S0025100321000025.

Footnotes

1 Glossing abbreviations are as in the Leipzig Glossing Rules (http://www.eva.mpg.de/lingua/resources/glossing-rules.php) except for: 1–23 = noun class, fv = final vowel (terminal verb suffix), iv = initial vowel (i.e. augment).

References

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker. 2014. lme4: Linear mixed-effects models using Eigen and S4. http://www.R-project.org.
Beckman, Mary & Julia Hirschberg. 1994. The ToBI annotation conventions. http://www.speech.cs.cmu.edu/tobi/ToBI.6.html.
Boersma, Paul & David Weenink. 2013. Praat: Doing phonetics by computer (version 5.3.53). http://www.praat.org/.
Bruce, Gösta. 1977. Swedish word accents in sentence perspective. Lund: CWK Gleerup.
Carter, Hazel. 1973. Syntax and tone in Kongo. London: School of Oriental and African Studies (SOAS).
Clements, G. N. 1986. Compensatory lengthening and consonant gemination in LuGanda. In Leo Wetzels & Engin Sezer (eds.), Studies in compensatory lengthening, 39–77. Dordrecht: Foris.
Clements, G. N. & Kevin Ford. 1981. On the phonological status of downstep in Kikuyu. In Didier Goyvaerts (ed.), Phonology in the 1980's, 309–357. Amsterdam: John Benjamins.
Cole, D. T. 1967. Some features of Ganda linguistic structure. Johannesburg: Witwatersrand University Press.
Downing, Laura. 1995. The metrical domain of register raising in Jita: An optimality approach. In Francis Katamba (ed.), Bantu phonology and morphology, 28–39. Munich: Lincom Europa.
Edwards, Jan, Mary E. Beckman & Janet Fletcher. 1991. The articulatory kinematics of final lengthening. The Journal of the Acoustical Society of America 89, 369–382.
Féry, Caroline. 1993. German intonational patterns. Tübingen: Max Niemeyer Verlag.
Genzel, Susanne & Frank Kügler. 2020. Production and perception of question prosody in Akan. Journal of the International Phonetic Association 50(1), 61–92.
Gordon, Matthew. 2005. Intonational phonology of Chickasaw. In Sun-Ah Jun (ed.), Prosodic typology: The phonology of intonation and phrasing, 301–330. Oxford: Oxford University Press.
Gordon, Matthew. 2008. Pitch accent timing and scaling in Chickasaw. Journal of Phonetics 36, 521–535.
Grice, Martine. 1995. The intonation of interrogation in Palermo Italian. Tübingen: Niemeyer.
Grice, Martine, D. Robert Ladd & Amalia Arvaniti. 2000. On the place of phrase accents in intonational phonology. Phonology 17, 143–185.
Guthrie, Malcolm. 1940. Tone ranges in a two-tone language (Lingala). Bulletin of the School of Oriental and African Studies 10, 469–478.
Hadding-Koch, Kerstin & Michael Studdert-Kennedy. 1964. An experimental study of some intonation contours. Phonetica 11, 175–185.
Hayes, Bruce. 1981. A metrical theory of stress rules. Bloomington, IN: Indiana University Linguistics Club.
Herman, Rebecca. 1996. Final lowering in Kipare. Phonology 13, 171–196.
Hyman, Larry M. 1982. Globality and the accentual analysis of Luganda tone. Journal of Linguistic Research 2, 1–40.
Hyman, Larry M. 1990. Boundary tonology and the prosodic hierarchy. In Sharon Inkelas & Draga Zec (eds.), The phonology–syntax connection, 109–125. Chicago, IL: The University of Chicago Press.
Hyman, Larry M. & Francis X. Katamba. 1993. A new approach to tone in Luganda. Language 69, 34–67.
Hyman, Larry M. & Francis X. Katamba. 2010. Tone, syntax and prosodic domains in Luganda. In Laura Downing, Annie Rialland, Jean-Marc Beltzung, Sophie Manus, Cédric Patin & Kristina Riedel (eds.), Papers from the Workshop on Bantu Relative Clauses, 69–98. Berlin: Zentrum für Allgemeine Sprachwissenschaft.
Hyman, Larry M. & Francis X. Katamba. 2011. The tonology of WH questions in Luganda. ZAS Papers in Linguistics 55, 65–81.
Hyman, Larry M., Francis X. Katamba & Livingstone Walusimbi. 1987. Luganda and the strict layer hypothesis. Phonology Yearbook 4, 87–108.
Hyman, Larry M. & Kemmonye C. Monaka. 2011. Tonal and non-tonal intonation in Shekgalagari. In Sonia Frota, Gorka Elordieta & Pilar Prieto (eds.), Prosodic categories: Production, perception and comprehension, 267–290. Dordrecht: Springer.
Inkelas, Sharon & William Leben. 1990. Where phonology and phonetics intersect: The case of Hausa intonation. In Kingston & Beckman (eds.), 17–34.
Kingston, John & Mary E. Beckman (eds.). 1990. Papers in Laboratory Phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Klatt, Dennis. 1975. Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3, 129–140.
Kuznetsova, Alexandra, Per B. Brockhoff & Rune H. B. Christensen. 2014. Tests in linear mixed effects models. http://www.R-project.org.
Lenth, Russell, Henrik Singmann, Jonathon Love, Paul Buerkner & Maxime Herve. 2020. emmeans: Estimated marginal means, aka least-squares means. https://cran.r-project.org/web/packages/emmeans/index.html.
Liberman, Mark & Janet [B.] Pierrehumbert. 1984. Intonational invariance under changes in pitch range and length. In Mark Aronoff & Richard Oehrle, with Frances Kelley & Bonnie Wilker Stephens (eds.), Language sound structure, 157–233. Cambridge, MA: MIT Press.
Lindblom, Björn, Bertil Lyberg & Karin Holmgren. 1981. Durational patterns of Swedish phonology: Do they reflect short-term memory processes? Bloomington, IN: Indiana University Linguistics Club.
McCarthy, John J. 1983. Consonantal morphology in the Chaha verb. In Ulrike Steindl, Thomas Borer, Huilin Fang, Alfredo García Pardo, Peter Guekguezian, Brian Hsu, Charlie O’Hara & Iris Chuoying Ouyang (eds.), West Coast Conference on Formal Linguistics 32 (WCCFL 32), 176–188. Somerville, MA: Cascadilla Proceedings Project.
Mester, R. Armin & Junko Itô. 1989. Feature predictability and underspecification: Palatal prosody in Japanese mimetics. Language 65, 258–293.
Mücke, Doris & Anne Hermes. 2007. Phrase boundaries and peak alignment: An acoustic and articulatory study. Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS XVI), Saarbrücken, 997–1000. http://www.icphs2007.de.
Myers, Scott. 1996. Boundary tones and the phonetic implementation of tone in Chichewa. Studies in African Linguistics 25, 29–60.
Myers, Scott. 1998. Surface underspecification of tone in Chichewa. Phonology 15, 367–391.
Myers, Scott. 1999. Tone association and f0 timing in Chichewa. Studies in African Linguistics 28, 215–239.
Myers, Scott. 2003. F0 timing in Kinyarwanda. Phonetica 60, 71–97.
Myers, Scott. 2020. An acoustic study of sandhi vowel hiatus in Luganda. Language and Speech 63(3), 506–525.
Myers, Scott & Benjamin B. Hansen. 2007. The origin of vowel length neutralization in final position: Evidence from Finnish speakers. Natural Language & Linguistic Theory 25, 157–193.
Myers, Scott, Saudah Namyalo & Anatole Kiriggwajjo. 2019. F0 timing and tone contrasts in Luganda. Phonetica 76(1), 55–81.
Myers, Scott, Elisabeth Selkirk & Yelena Fainleib. 2018. Phonetic implementation of high-tone spans in Luganda. Laboratory Phonology 9(1), 1–22.
Pak, Marjorie. 2008. The postsyntactic derivation and its phonological reflexes. Ph.D. dissertation, University of Pennsylvania.
Pierrehumbert, Janet B[reckenridge]. 1980. The phonetics and phonology of English intonation. Bloomington, IN: Indiana University Linguistics Club.
Pierrehumbert, Janet B. & Mary E. Beckman. 1988. Japanese tone structure. Cambridge, MA: MIT Press.
Poser, William. 1984. The phonetics and phonology of tone and intonation in Japanese. Ph.D. dissertation, MIT.
Prieto, Pilar, Jan van Santen & Julia Hirschberg. 1995. Tonal alignment patterns in Spanish. Journal of Phonetics 23, 429–451.
R Core Team. 2017. R: A language and environment for statistical computing (version 3.3.3). http://www.R-project.org/.
Rialland, Annie. 2009. The African lax question prosody: Its realization and geographical distribution. Lingua 119, 928–949.
Sadeghi, Vahid. 2017. The timing of pre-nuclear pitch accents in Persian. Journal of the International Phonetic Association 49(3), 305–329.
Sibomana, Leonidas. 1974. Deskriptive Tonologie des Kinyarwanda. Hamburg: Helmut Buske.
Silverman, Kim E. A. & Janet B. Pierrehumbert. 1990. The timing of prenuclear high accents in English. In Kingston & Beckman (eds.), 72–106.
Smiljanić, Rajka. 2006. Early vs. late focus: Pitch-peak alignment in two dialects of Serbian and Croatian. In Louis Goldstein, Doug Whalen & Catherine Best (eds.), Laboratory Phonology 8, 495–518. Berlin: Mouton de Gruyter.
Snoxall, R. A. 1967. Luganda–English dictionary. Oxford: Oxford University Press.
Stevick, Earl. 1969a. Pitch and duration in Ganda. Journal of African Languages 8, 1–28.
Stevick, Earl. 1969b. Tone in Bantu. International Journal of American Linguistics 35, 330–341.
Thorsen, Nina. 1978. An acoustical investigation of Danish intonation. Journal of Phonetics 6, 151–175.
Torreira, Francisco & Martine Grice. 2018. Melodic constructions in Spanish: Metrical structure determines the association properties of intonational tones. Journal of the International Phonetic Association 48, 9–32.
Tucker, A. N. 1962. The syllable in Luganda: A prosodic approach. Journal of African Languages 1, 122–166.
Turk, Alice. 1999. Structural influences on boundary-related lengthening in English. In John J. Ohala, Yoko Hasegawa, Manjari Ohala, Daniel Granville & Ashlee C. Bailey (eds.), Proceedings of the XIVth International Congress of Phonetic Sciences (ICPhS XIV), 237–240. Berkeley, CA: Linguistics Department, University of California at Berkeley.
Yeou, Mohamed. 2004. Effects of focus, position and syllable structure on f0 alignment patterns in Arabic. Paper presented at Arabic Language Processing, Fez. http://www.afcp-parole.org/doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/arabe2004/PAMY11.pdf
Ziervogel, Dirk, Petrus Wentzel & T. N. Makuya. 1972. A handbook of the Venda language, 2nd edn. Pretoria: UNISA.
Table 1 Participants.

Figure 1 Sample pitch tracks of HLL sentences. (a) Statement: Omulangira yalámula. ‘The prince judged’. (b) Question: Omulangira yalámula? ‘Did the prince judge?’.

Figure 2 Sample pitch tracks of LHL sentences. (a) Statement: Omulongo yalúma. ‘The twin bit’. (b) Question: Omulongo yalúma? ‘Did the twin bite?’.

Figure 3 Sample pitch tracks of LLH sentences. (a) Statement: Omulimi mulalú. ‘The farmer is crazy’. (b) Question: Omulimi mulalú? ‘Is the farmer crazy?’.

Figure 4 Sample pitch tracks of LLL sentences. (a) Statement: Omulimi alima ennimiro. ‘The farmer cultivates a garden’. (b) Question: Omulimi alima ennimiro? ‘Is the farmer cultivating a garden?’.

Figure 5 Mean normalized f0 (z) by proportion of measurement interval: (a) HLL, (b) LHL, (c) LLH. Hollow triangles mark measurement points in statements, and filled circles mark those in questions.

Figure 6 Mean normalized f0 (z) by proportion of measurement interval in the Initial and the Final intervals for LLL sentences.

Figure 7 Mean normalized maximum f0 (z) by Tone Position and Speech Act. (Error bars represent standard deviation.)

Table 2 Model of Maximum f0. Maximum f0 ~ Speech Act × Tone Position + (1 + Speech Act | Participant) + (1 | Item).

Figure 8 Mean normalized offset f0 (z) by Tone Position and Speech Act. (Error bars represent standard deviation.)

Table 3 Model of Offset f0. Offset f0 ~ Speech Act × Tone Position + (1 + Speech Act | Participant) + (1 | Item).

Table 4 Models of Offset f0, broken down by Tone Position subset. Offset f0 ~ Speech Act + (1 | Participant) + (1 | Item).

Table 5 Models of Offset f0, broken down by Speech Act subset. Offset f0 ~ Tone Position + (1 | Participant) + (1 | Item).

Figure 9 Peak delay (ms) by S1 duration (ms), Tone Position and Speech Act: (a) HLL, (b) LHL, (c) LLH.

Figure 10 Mean relative peak delay by Tone Position and Speech Act. (Error bars represent standard deviation.)

Table 6 Model of Relative Peak Delay. Relative peak delay ~ Speech Act × Tone Position + (1 + Speech Act + Tone Position | Participant) + (1 | Item).

Table 7 Model of Relative Peak Delay, broken down by Tone Position subset. Relative peak delay ~ Speech Act + (1 | Participant) + (1 | Item).

Figure 11 Mean S1 duration by Tone Position and Speech Act. (Error bars represent standard deviation.)

Table 8 Model of S1 duration. S1 duration ~ Speech Act × Tone Position + (1 + Speech Act | Participant) + (1 | Item).

Figure 12 Mean peak delay by Tone Position and Speech Act. (Error bars represent standard deviation.)

Table 9 Model of Peak Delay. Peak delay ~ Speech Act × Tone Position + (1 + Speech Act + Tone Position | Participant) + (1 | Item).

Table 10 Models of F0 in LLL, broken down by timepoint. F0 at timepoint ~ Speech Act + (1 + Speech Act | Speaker) + (1 | Item).

Figure 13 Mean normalized f0 (z) by proportion of measurement interval in the Initial and the Final intervals for LLL sentences for each speaker. Hollow triangles mark measurement points in statements, and black circles mark those in questions.
