Tianjin Mandarin is a member of the northern Mandarin Chinese family (ISO 639-3: [cmn]). It is spoken in the urban areas of the Tianjin Municipality (CN-12) in the People's Republic of China, about 120 kilometers southeast of Beijing. Existing studies on Tianjin Mandarin have focused mainly on its tonal aspects, especially its intriguing tone sandhi system, with few studies examining the segmental aspects (on tone, see e.g. Li & Liu 1985, Shi 1986, Liu 1993, Lu 1997, Wang & Jiang 1997, Chen 2000, Liu & Gao 2003, Ma 2005, Ma & Jia 2006, Zhang & Liu 2011, Li & Chen 2016; on segmental aspects, see e.g. Han 1993a, b; Wee, Yan & Chen 2005). As also noted in Wee et al. (2005), this is probably due to the similarity in segmental structures between Tianjin Mandarin and Standard Chinese, especially among speakers of the younger generation; what differentiates the two Mandarin varieties is most notably their tonal systems. The aim of the present description is therefore to provide a systematic phonetic description of both segmental and tonal aspects of Tianjin Mandarin, with the main focus on the tonal aspects.
The sound files illustrated in the present description were produced by a male speaker born in the 1980s. The speaker grew up in the Nankai District of Tianjin, one of the oldest urban districts of Tianjin. He has lived mostly in Tianjin, with the exception of a four-year stay in Shanghai for university education. He speaks exclusively Tianjin Mandarin both at home and at work. While the current illustration is based on data from the younger generation, variation between our speaker's generation and older speakers is noted where appropriate. For further details on the generational differences in Tianjin Mandarin, readers are referred to Lu (2004). Note that lexical tones are marked with superscript numbers throughout the paper, i.e. 1 for Tone 1, 2 for Tone 2, 3 for Tone 3 and 4 for Tone 4. (See section ‘Lexical tones’ for more details on lexical tones in Tianjin Mandarin.)
Consonants
There are 25 consonants in Tianjin Mandarin. To facilitate the comparison of Tianjin Mandarin with Standard Chinese, a closely related Mandarin variety, we elicited whenever possible the same words as in Lee & Zee (2003) for illustration. Note that although the dentoalveolars are very often not marked with dental diacritics in the literature (e.g. Lee & Zee 2003, Wee et al. 2005), the dental diacritics are marked in the current description following the IPA illustration requirements. (But see Lee & Zee 2003, where a similar ‘denti-alveolar’ term was used.)
Plosives
Plosives in Tianjin Mandarin differentiate three places of articulation: bilabial /p pʰ/, as in /pɐ1/ ‘eight’ and /pʰɐ2/ ‘to climb’, dentoalveolar /t̪ t̪ʰ/, as in /t̪ɐ1/ ‘to build’ and /t̪ʰɐ1/ ‘he/she’, and velar /k kʰ/, as in /kɤ1/ ‘song’ and /kʰɤ1/ ‘subject’. They contrast in aspiration, and the contrast holds for all places of articulation. Table 1 shows the mean VOT of aspirated and unaspirated plosives in different places of articulation.
The measurements in Table 1 were made over 923 monosyllabic morphemes with plosive onsets, all of which were selected from a dataset of 3935 monosyllabic morphemes produced by our speaker. Averaged across different places of articulation, the mean VOT for the 439 aspirated plosive tokens is 102 ms, while the mean VOT for the 484 unaspirated plosive tokens is 23 ms. A one-way ANOVA was conducted to test the effect of PLACE OF ARTICULATION (three levels: BILABIAL, DENTOALVEOLAR, and VELAR) on VOT over the 923 plosive tokens. Results revealed a significant effect of PLACE OF ARTICULATION (F(2,920) = 20.44, p < .001). A post-hoc Tukey HSD test further showed that velar plosives have significantly longer VOT than bilabial and dentoalveolar plosives (VELAR vs. BILABIAL: Diff. = 20 ms, p-adj. < .001; VELAR vs. DENTOALVEOLAR: Diff. = 10 ms, p-adj. < .001), while the bilabial and dentoalveolar plosives are not significantly different.
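The degrees of freedom reported above follow directly from the design: three places of articulation give 3 − 1 = 2 between-group degrees of freedom, and 923 tokens give 923 − 3 = 920 within-group degrees of freedom. The one-way ANOVA computation can be sketched as below. This is only an illustration: the function name `one_way_anova` and the toy VOT values are hypothetical and are not the measurements behind Table 1, and the sketch does not reproduce the Tukey HSD step.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Each element of `groups` holds the observations for one factor level,
    e.g. VOT values in ms for one place of articulation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy VOT values (ms), one list per place of articulation.
toy_vot = [[20, 22, 24], [21, 23, 25], [30, 32, 34]]
f_stat, df_b, df_w = one_way_anova(toy_vot)
```

With three groups and N tokens the function always returns degrees of freedom (2, N − 3), which matches the F(2,920) reported for the 923 plosive tokens.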
Affricates
Affricates in Tianjin Mandarin display the same two-way distinction in aspiration as plosives. They have three places of articulation: dentoalveolar /t tʰ/, as in /tɐ1/ ‘to circle’ and /tʰɐ1/ ‘to wipe’, postalveolar /t tʰ/, as in /tɐ1/ ‘residue’ and /tʰɐ1/ ‘to insert’, and alveolo-palatal /tɕ tɕʰ/ as in /tɕjɐ1/ ‘to add’ and /tɕʰjɐ1/ ‘to nip off’. Like dentoalveolar plosives, the dentoalveolar affricates are produced with the tip of the tongue against the upper front teeth and the tongue blade against the alveolar ridge. The postalveolar affricates are apical, pronounced with the tongue tip raised against the postalveolar region. Both dentoalveolar and postalveolar affricates in Tianjin Mandarin are very similar to those in Standard Chinese (Lee & Zee 2014). The alveolo-palatal affricates are pronounced with the tongue tip down behind the lower front teeth and with the dorsum of the tongue against the area between the alveolar ridge and the hard palate (Ladefoged & Wu 1984).
The postalveolar consonants are conventionally called ‘retroflexes’ (Chao 1948). However, as discussed in Lee & Zee (2014), this series lacks the action of curling the tongue tip up and back, which is a key feature of typical retroflex articulation (Ladefoged 2006). We thus adopt ‘postalveolar’ rather than ‘retroflex’ for this series. Note that there are some word-specific generational differences in the specific place of articulation for this series of sounds. As observed in Han (1993a, b), while the young-variety Tianjin Mandarin speakers produce some words using the postalveolar consonants (i.e. /t tʰ /), older-generation speakers typically produce them with their dentoalveolar counterparts (i.e. /t tʰ /, respectively) (see Wee et al. 2005).
The alveolo-palatal affricates in Tianjin Mandarin are obligatorily followed by a palatal glide if the consonant is not followed by a high vowel, such as in /tɕjɐ1/ ‘to add’ and /tɕʰjɐ1/ ‘to nip off’. Figure 1 shows the spectrogram of /tɕjɐ1/ ‘to add’, where we see a transition (glide /j/) between the consonant /tɕ/ and the vowel /ɐ/, taking up between a quarter and a third of the total rhyme length. The F1 of the glide starts with a low value (around 500 Hz) and gradually increases up to about 850 Hz, the value of the vowel /ɐ/; the F2 of the glide starts around 2000 Hz and ends around 1500 Hz. The F1 and F2 values of the glide onset therefore resemble those of a high front vowel. Glides in such contexts have been traditionally considered part of the rhyme and transcribed as a vowel, such as in /tɕia1/ and /tɕʰia1/ (e.g. Wee et al. 2005). In line with Lin (2007), however, we treat /j/ as a glide which constitutes part of the onset.
Figure 2 plots the spectrograms of three monomorphemic words with the same rhyme: /ɑɔ1/ ‘knife’, /jɑɔ1/ ‘to hold in the mouth’, and /tɕjɑɔ1/ ‘to teach’. There is a clear glide-like transition in /tɕjɑɔ1/ (Figure 2(c)), which is similar to that in /jɑɔ1/ (Figure 2(b)), where there is a real glide. Both are different from that in /ɑɔ1/ (Figure 2(a)), where there is only subtle phonetic coarticulation. We take this comparison as additional evidence that in Tianjin Mandarin, there is an underlying glide target between an alveolo-palatal consonant and a non-high vowel. This is different from what is reported by Chen & Gussenhoven (2015) for Shanghai Chinese, where they found only a brief phonetic coarticulatory transition between an alveolo-palatal onset and its following vowel rhyme, suggesting the absence of a glide target.
Nasals
Tianjin Mandarin has nasals in three places of articulation: bilabial /m/, as in /mɐ1/ ‘mother’, dentoalveolar //, as in /ɐ4/ ‘to include’, and velar /ŋ/, as in /ɑŋ2/ ‘to raise’. /m/ can only occur in the onset position while /ŋ/ only in the coda position.
Fricatives
Fricatives in Tianjin Mandarin differentiate five places of articulation: labiodental /f/, as in /fɐ1/ ‘to send’, dentoalveolar //, as in /ɐ1/ ‘to cast’, postalveolar //, as in /ɐ1/ ‘sand’, alveolo-palatal /ɕ/, as in /ɕjɐ1/ ‘shrimp’, and velar /x/, as in /xɤ1/ ‘to drink’. The alveolo-palatal fricative in Tianjin Mandarin is obligatorily followed by a palatal glide (as in /ɕjɐ1/ ‘shrimp’ and /ɕɥe1/ ‘boots’) or a high vowel (as in /ɕi1/ ‘west’). (See section ‘Lateral and approximants’ for more details on glides /j, ɥ/.) The velar fricative /x/ is realized with the uvular fricative [χ] when followed by a low vowel, as in /xaɛ2/ ([χaɛ2]) ‘child’ and /xɑɔ3/ ([χɑɔ3]) ‘good’.
Lateral and approximants
Tianjin Mandarin has one lateral //, as in /ɐ1/ ‘to pull’, and four approximants: /w j ɥ/, as in /wɐ1/ ‘frog’, /ə2/ ‘person’, /jɐ1/ ‘duck’, and /ɥe1/ ‘to restrict’. Among the four approximants, /w j ɥ/ can serve as syllable onset or part of a complex onset. As a syllable onset, /w/ is sometimes pronounced as the labiodental voiced consonant [v], as also reported in Han (1993b). /j/ and /ɥ/ are both palatal, with the main contrast in lip rounding. They do not occur in the same contexts except before the vowel /e/, as in /je1/ ‘to pinch’ vs. /ɥe4/ ‘to abuse’, /tɕje1/ ‘to connect’ vs. /tɕɥe2/ ‘to feel’, /tɕhje1/ ‘to cut’ vs. /tɕhɥe1/ ‘to lack’, and /ɕje1/ ‘to rest’ vs. /ɕɥe1/ ‘boots’. According to Han (1993a, b), as also noted by one of the reviewers, the onset // can be pronounced as [j] in the old variety of Tianjin Mandarin in words such as /əʊ4/ ([jəʊ4]) ‘meat’; interestingly, the onset /j/ can also be pronounced as [] in words such as /jɑŋ4/ ([ɑŋ4]) ‘to brim over’. Further research is needed to understand this lexically specific swap. Furthermore, /ɥ/ was reported to be produced as [j] in the old variety (Han 1993b).
Syllabic consonants
Tianjin Mandarin has two syllabic consonants (notation in line with Chao 1948): the dentoalveolar //, as in /t3/ ‘son’, and postalveolar //, as in /t3/ ‘paper’. These two phonemes are traditionally referred to as ‘apical vowels’ by Sinologists (e.g. Karlgren 1915−1926). Ladefoged & Maddieson (1996) refer to them as ‘fricative vowels’. With evidence from both ultrasound imaging and acoustic data, Lee-Kim (2014) shows that these two phonemes are neither vowels nor fricatives, but more comparable with approximants in nature, i.e. syllabic dental approximant // and retroflex approximant /ɻ/, respectively. This is similar to observations made in Lee & Zee (2003, 2014), in which, however, the two sounds have been transcribed with the same syllabic approximant //.
The dentoalveolar // only follows dentoalveolar consonants /t th / (as in /t3/ ‘son’, /tʰ2/ ‘word’, /1/ ‘to think’), while // follows postalveolar consonants /t th / (as in /t3/ ‘paper’, /tʰ1/ ‘to eat’, /1/ ‘poem’). Their tongue configurations are similar to those of the preceding homorganic consonants, i.e. // is homorganic with /t, th, /, and // is homorganic with /t th /. In addition, // can occur by itself as in /4/ ‘the sun’. To highlight the homorganicity of the preceding consonants and the following syllabic approximants, we transcribe the two syllabic consonants with two independent symbols as in Lee-Kim (2014), but we adopt the postalveolar // symbol rather than the retroflex //.
For more information on the debates regarding both the phonemic status and notation of the two sounds, as well as their acoustic and articulatory realizations, readers are referred to Lee-Kim (2014) and Lee & Zee (2014).
Vowels
Monophthongs
Tianjin Mandarin has 14 monophthongs. /i y e ɐ o ɤ u/ occur in open syllables, /ɪ ʏ ɛ a ɑ ʊ/ occur in closed syllables, and /ə/ occurs in both open and closed syllables.
Among the seven vowels that occur only in open syllables, /i/ and /y/ (as in /i1/ ‘low’ and /y2/ ‘donkey’) are high front vowels contrasting in lip rounding. /e/ is a mid-high front vowel which obligatorily follows an onset glide (i.e. /j ɥ w/), as in /je1/ ‘dad’. /ɐ/ is a low mid vowel, as in /ɐ1/ ‘to build’. /u/ is a high back rounded vowel, as in /u1/ ‘metropolis’. Both /ɤ/ and /o/ are mid-high back vowels differing mainly in lip rounding, as in /ɤ2/ ‘to get’ and /wo1/ ‘more’ (where an onglide /w/ is obligatory before /o/). (But see Han 1993b, which reported that all syllables with /o/ are pronounced with /ɤ/ by speakers of the old variety of Tianjin Mandarin.)
/ɪ/ (as in /ɪ2/ ‘forest’ and /ɪŋ2/ ‘zero’) and /ʏ/ (as in /tɕʏ1/ ‘army’) both occur only in closed syllables. They are the lax counterparts of /i/ and /y/, respectively. /ɪ/ occurs before both the dentoalveolar and the velar nasal coda, while /ʏ/ only occurs before the dentoalveolar nasal coda. When /ɪ/ and /ʏ/ are followed by nasal codas, an offglide [ə] is inserted as the articulation of the vowel transitions to the nasal in the following coda. This is illustrated in Figure 3, where the spectrogram of the syllable /hɪŋ1/ ([hɪəŋ1] ‘to listen’) in Tianjin Mandarin is plotted against that of the syllable /sɪŋ/ ‘sing’ in American English (Ladefoged 1999). In the latter, there is a clear and sharp acoustic boundary between the vowel and the nasal coda (as shown with an arrow in Figure 3(b)) without the presence of a transitional schwa. One of our reviewers pointed out that the nasal codas in Tianjin Mandarin are possibly nasal glides while the English nasal codas are nasal stops (see also Wang 1997), which, we agree, might be the reason behind the vowel−coda juncture difference. Further studies are needed to verify this possibility.
/ɛ/ is a mid-low front vowel which only occurs before the dentoalveolar nasal coda, as in /jɛ1/ ‘bump’. /ɛ/ is often treated as an allophone of /a/ in traditional descriptions of Tianjin Mandarin, for example in Han (1993a) and Wee et al. (2005). /a/ and /ɑ/ are both low vowels, which, however, occur in different contexts. The low front /a/ occurs before the dentoalveolar nasal coda (as in /a1/ ‘single’), and the low back vowel /ɑ/ before the velar nasal coda (as in /ɑŋ1/ ‘when’). /ʊ/ is the lax counterpart of /u/, which occurs before the velar nasal coda as in /ʊŋ1/ ‘east’.
/ə/ is a central vowel which can appear in both open and closed syllables although an open syllable with /ə/ is exclusively a neutral-tone syllable (e.g. /ə/ possessive marker in /wo3 ə/ ‘mine’). (See section ‘Neutral tone’ below for more details on neutral tone.) In closed syllables, /ə/ can occur before both nasal codas // and /ŋ/ (as in /ə4/ ‘to drag’ and /əŋ1/ ‘lamp’). Before the dentoalveolar nasal coda, /ə/ is slightly more fronted than before the velar nasal coda /ŋ/.
Figure 4 shows the mean F1 and F2 values of each monophthong occurring in open and closed syllables. The formant data in Figure 4(a) were based on 50 samples of each monophthong produced in open syllables by measuring the vowel midpoint. /ə/ is not included in Figure 4(a) because open syllables with /ə/ can only occur in neutral-tone syllables, which do not occur in isolation. Monophthongs occurring in closed syllables with dentoalveolar and velar nasal coda are plotted in Figure 4(b). Formant values in both graphs were converted from Hertz to Bark following Boersma & Weenink (2017).
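The conversion formula itself is not reproduced in the text here. As a hedged sketch, the function below assumes that the conversion meant is the Hertz-to-Bark function documented for Praat (Boersma & Weenink), namely Bark = 7 · ln(f/650 + √(1 + (f/650)²)); the function name `hertz_to_bark` is our own.

```python
import math


def hertz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hertz to Bark.

    Assumes the Praat-style formula
        bark = 7 * ln(f/650 + sqrt(1 + (f/650)**2)),
    which is mathematically the same as 7 * asinh(f/650).
    """
    x = f_hz / 650.0
    return 7.0 * math.log(x + math.sqrt(1.0 + x * x))


# Example: convert a pair of formant values (Hz) before plotting.
f1_bark = hertz_to_bark(850.0)
f2_bark = hertz_to_bark(1500.0)
```

Because the mapping is monotonic, converting to Bark preserves the ordering of the vowels on each axis while compressing the higher frequencies, which is why F2 differences look smaller on a Bark plot than on a Hertz plot.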
It can be seen from Figure 4 that vowels in closed syllables (Figure 4(b)) are more centralized compared to those in open syllables (Figure 4(a)). Furthermore, the realization of central vowel /ə/ is influenced by different following nasal codas due to their closure gestures at different places of articulation. To be specific, /ə/ is more fronted if followed by the dentoalveolar nasal coda //, but more backward if followed by the velar nasal coda /ŋ/.
Note that vowels in closed syllables have often been treated as allophonic variants of vowels in open syllables, mainly based on the fact that they are mutually non-contrastive and occur in different contexts. For example, Wee et al. (2005) use /a/ for both open syllables (e.g. /pha2/ ‘to climb’) and closed syllables (e.g. /san1/ ‘three’ and /taŋ3/ ‘party’). However, given the clearly different vowel quality, here we adopt an alternative view and treat them as different phonemes (also see Lin 2007 and Chen & Gussenhoven 2015 for similar treatments). This is to highlight the phonological non-equivalence of pairs of open-syllable vowel vs. closed-syllable vowel (Chen & Gussenhoven 2015).
Diphthongs
There are four diphthongs in Tianjin Mandarin, with /eɪ aɛ/ gliding towards the front (as in /eɪ2/ ‘thunder’ and /aɛ1/ ‘dull’) and /ɑɔ əʊ/ towards the back (as in /ɑɔ1/ ‘knife’ and /əʊ1/ ‘all’). All diphthongs only occur in open syllables. Figure 5 shows the mean F1 and F2 values of 50 samples for each diphthong, measured at the respective midpoints of the two parts of the vowel. All samples were selected from the 3935 monosyllabic words produced by our speaker. Arrows in Figure 5 show the trajectories of the gliding. Formant data were converted from Hertz to Bark in the same way as for the monophthongs (after Boersma & Weenink 2017).
/eɪ/ and /aɛ/ are frequently transcribed as /ei/ and /ai/, respectively, both gliding towards the same high front target /i/ (e.g. Han 1993a, b; Wee et al. 2005). Figure 5 shows that neither /eɪ/ nor /aɛ/ in Tianjin Mandarin really reaches the high front region at the offset part.
To further illustrate the different end points of /eɪ/ and /aɛ/, Figure 6 compares their spectrograms. As shown in Figure 6, the end points of /eɪ/ vs. /aɛ/ are very different especially in terms of F1 and F2, where the offset part of /eɪ/ shows clearly lower F1 (about 600 Hz) but higher F2 values (about 2200 Hz) than that of /aɛ/ (F1: about 700 Hz; F2: about 1800 Hz).
Similar differences can also be observed for /ɑɔ/ vs. /əʊ/, both of which have been frequently described as gliding towards the high back vowel /u/ (e.g. Han 1993a, b; Wee et al. 2005). Figure 7 compares the spectrograms of /ɑɔ/ vs. /əʊ/, which again show different qualities of the two vowels at the end. To be specific, /əʊ/ (Figure 7(b)) shows lower F1 (450 Hz) and F2 (850 Hz) than those of /ɑɔ/ (F1: 650 Hz; F2: 950 Hz; see Figure 7(a)).
Rhotic vowel and er-hua
Tianjin Mandarin has a rhotic vowel /ə˞/, which is produced as an r-colored schwa with the tip of tongue raised. /ə˞/ is syllabic, as in /ə˞2/ ‘son’, /ə˞3/ ‘ear’ and /ə˞4/ ‘two’. When /ə˞/ is produced in a neutral-tone syllable, it is used as a diminutive suffix.
When the suffix /ə˞/ is added to a noun, the two syllables are typically coalesced into one rhotacized syllable in the output form. The vowel part of the preceding syllable is directly rhotacized and only the lexical tone of the preceding syllable is kept, e.g. /i3/ + /ə˞/ → [i˞3] ‘remnants’ (as compared to [i3] ‘bottom’). This process is known as ‘rhotacization’ or ‘er-hua’ in Chinese. Figure 8 compares the spectrograms of non-rhotacized [i] and its rhotacized counterpart [i˞]. As shown in Figure 8(b), the first quarter of [i˞] is realized similarly to [i] (as in Figure 8(a)). The remaining three-quarters, however, are realized with a clearly lowered mean F3 (from about 2500 Hz to 1920 Hz), which is a typical acoustic cue of rhotacization (Ladefoged & Maddieson 1996). In addition, the process of rhotacization often changes the F1 and F2 values. For example, in Figure 8(b), the F1 of /i/ has changed from 440 Hz to 670 Hz and the F2 from 2000 Hz to 1400 Hz. Due to the changes in F1 and F2, rhotacization has been traditionally transcribed as /ə˞/-insertion, e.g. /i/ + /ə˞/ → [iə˞] (see e.g. Han 1993a). However, given the clear F3 lowering in the rhotacized vowel, we regard the F1 and F2 changes as by-products of rhotacization rather than as the insertion of a rhotacized schwa.
There are also other rhotacizing processes in both open and closed syllable structures. In open syllables, if the vowel is /aɛ/, only the /a/ part is rhotacized while /ɛ/ is deleted, e.g. /pʰaɛ2/ + /ə˞2/ → [pʰa˞2] ‘badge’ (as compared to [pʰaɛ2] ‘card’); if the rhyme is /e/, /eɪ/ or a syllabic consonant, the entire rhyme part is replaced with /ə˞/, e.g. /peɪ4/ + /ə˞2/ → [pə˞4] ‘very’ (as compared to [peɪ4] ‘double’). In closed syllables, if the coda is //, the vowel is rhotacized, while the coda is deleted, e.g. /pa4/ + /ə˞2/ → [pa˞4] ‘partner’ (as compared to [pa4] ‘companion’); if the coda is /ŋ/, the vowel is nasalized and rhotacized, e.g. /xwɑŋ2/ + /ə˞2/ → [xw2] ‘yolk’ (as compared to [xwɑŋ2] ‘yellow’).
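The rhotacization patterns described above can be summarized as an ordered set of rules over the rhyme. The sketch below is only an illustration of that rule ordering, not an implementation used in this study. The function name `rhotacize` is hypothetical, the plain letter `n` stands in for the dentoalveolar nasal coda, the nasalized output is written with a combining tilde as an assumed notation, and syllabic-consonant rhymes are left out.

```python
RHOTIC_HOOK = "\u02DE"      # modifier letter rhotic hook, as in [i˞]
COMBINING_TILDE = "\u0303"  # assumed notation for nasalization


def rhotacize(nucleus: str, coda: str = "") -> str:
    """Return the rhotacized rhyme for a nucleus and an optional coda.

    Rules, in the order described in the text:
      1. coda /ŋ/: the vowel is nasalized and rhotacized;
      2. dentoalveolar nasal coda (written "n" here): the vowel is
         rhotacized and the coda is deleted;
      3. /aɛ/: only /a/ is rhotacized and /ɛ/ is deleted;
      4. /e/ or /eɪ/: the whole rhyme is replaced with /ə˞/;
      5. otherwise the vowel is rhotacized directly.
    """
    if coda == "ŋ":
        return nucleus + COMBINING_TILDE + RHOTIC_HOOK
    if coda == "n":
        return nucleus + RHOTIC_HOOK
    if nucleus == "aɛ":
        return "a" + RHOTIC_HOOK
    if nucleus in {"e", "eɪ"}:
        return "ə" + RHOTIC_HOOK
    return nucleus + RHOTIC_HOOK


# Rhymes taken from the examples in the text (tones are kept separately).
examples = [rhotacize("i"), rhotacize("aɛ"), rhotacize("eɪ"), rhotacize("a", "n")]
```

Tone is deliberately not handled here, since the text states that only the lexical tone of the stem syllable survives and can therefore simply be carried over unchanged.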
Syllable structure and phonotactics
The syllable structure in Tianjin Mandarin is (C)(G)V(C). C stands for consonant, G for glide, and V for vowel. Except for /ŋ/, all consonants can occur at syllable onset. Glides can also serve as syllable onset, such as /j/ in /jɐ1/ ‘duck’, /w/ in /wɐ1/ ‘frog’ and /ɥ/ in /ɥe1/ ‘to restrict’. Onsetless syllables are also possible, as in /aɛ1/ ‘sad’. However, in the old variety of Tianjin Mandarin, as reported in Han (1993b), onsetless syllables are not allowed before the rhymes /a/, /ɑŋ/, /ə/, /ɤ/, /aɛ/, /ɑɔ/, or /əʊ/; a nasal onset // is obligatory. The coda is optional, and only // and /ŋ/ are allowed (as in /a1/ ‘single’ or /ɑŋ1/ ‘when’). // is the only consonant in the language that can appear both at the beginning and at the end of a syllable (as in /ɐ4/ ‘to include’ and /a1/ ‘single’).
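The (C)(G)V(C) template and the onset and coda restrictions just described can be expressed as a small well-formedness check. This is a minimal sketch under stated assumptions: the function name `is_wellformed` is hypothetical, the plain letter `n` stands in for the dentoalveolar nasal, and the check covers only the template and the onset/coda restrictions of the younger variety, not the consonant-vowel co-occurrence restrictions.

```python
GLIDES = {"j", "w", "ɥ"}
CODAS = {"", "n", "ŋ"}  # "n" is a stand-in for the dentoalveolar nasal


def is_wellformed(onset: str, glide: str, nucleus: str, coda: str) -> bool:
    """Check a syllable against the (C)(G)V(C) template.

    Empty strings mark absent optional slots. Only three constraints
    from the text are encoded: the nucleus is obligatory, /ŋ/ cannot be
    an onset, and the coda (if any) must be one of the two nasals.
    """
    if not nucleus:
        return False  # V is the only obligatory slot
    if onset == "ŋ":
        return False  # /ŋ/ is excluded from onset position
    if glide and glide not in GLIDES:
        return False  # the G slot only hosts /j w ɥ/
    return coda in CODAS


# /jɐ/ 'duck': a glide in the G slot with no consonant before it.
duck_ok = is_wellformed("", "j", "ɐ", "")
```

Treating a syllable-initial glide as filling the G slot with an empty C slot is one possible encoding; the text itself describes such glides as serving as the syllable onset.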
In addition, there are some co-occurrence restrictions of consonants and vowels in Tianjin Mandarin:
(i) High front vowels /i y ɪ ʏ/ and the corresponding glides /j ɥ/ cannot follow dentoalveolar obstruents /t th /, postalveolar /t th /, or velar consonants /k kh x/. However, /y/ and /ɥ/ can follow dentoalveolar / /, as in /y3/ ‘female’, /y2/ ‘donkey’, /ɥe4/ ‘to abuse’, /ɥe4/ ‘to omit’.
(ii) Alveolo-palatals /tɕ tɕh ɕ/ can only occur before high front vowels /i y ɪ ʏ/, as in /tɕi1/ ‘chicken’, /tɕhi1/ ‘seven’, /ɕi1/ ‘west’, /tɕy1/ ‘to live’, /tɕhy1/ ‘maggot’, /ɕy1/ ‘needs’, /tɕɪ1/ ‘gold’, /tɕhɪ1/ ‘to invade’, /ɕɪ1/ ‘heart’, /tɕʏ1/ ‘army’, /tɕhʏ2/ ‘skirt’, /ɕʏ1/ ‘to fumigate’, or their corresponding glides /j ɥ/, as in /tɕje1/ ‘to connect’, /tɕhje1/ ‘to cut’, /ɕje1/ ‘to rest’, /tɕɥe2/ ‘to feel’, /tɕhɥe1/ ‘to lack’, /ɕɥe1/ ‘boots’. Given this restriction, the phonemic status of the alveolo-palatals relative to the dentoalveolar /t th /, the postalveolar /t th /, and the velar /k kh x/ has been a matter of debate. A full discussion of their status is beyond the scope of the present paper; interested readers are referred to Lin (2014) for further details.
(iii) Mid-high vowels /e o/ and mid-low vowel /ɛ/ have to co-occur with glides /j w/, respectively, as in /je1/ ‘dad’, /wo1/ ‘more’, /jɛ1/ ‘bump’. Mid-high back vowel /ɤ/ cannot follow labial consonants /p ph m f/ in the younger-variety Tianjin Mandarin; the structure of /p ph m f/ + /ɤ/ has only been reported for the old variety in Han (1993b).
(iv) Glide /w/ cannot be followed by front vowels while /ɥ/ can only precede front vowels, as in /ɥe1/ ‘to restrict’. /j/, however, can be followed by front, central and back vowels, as in /ji1/ ‘one’, /jɐ1/ ‘duck’, /jɑŋ1/ ‘central’.
(v) Han (1993b) noted that in the old-variety Tianjin Mandarin, // or // do not co-occur with the rounded high front vowel /y/ in a number of open syllables. So, words like /y3/ ‘female’ and /y4/ ‘green’ in the young-variety Tianjin Mandarin would be pronounced as /weɪ3/ and /weɪ4/ in the old-variety Tianjin Mandarin.
Tones
Lexical tones
There are four full lexical tones in Tianjin Mandarin. Figure 9 shows the f0 contours of the four lexical tones elicited in isolation with obstruent onsets. Each tonal contour was obtained by averaging across 50 samples, all of which were selected from the 3935 monosyllabic words produced by our speaker. The f0 values were normalized using the T-normalization method developed by Shi (1986), so that f0 can be interpreted in terms of the five-scale pitch system. The intervals 0−1, 1−2, 2−3, 3−4, and 4−5 correspond to pitch levels 1–5 in Chao's (1920) lexical tone annotation system, respectively.
As illustrated in Figure 9, Tone 1 (hereafter referred to as T1) is a low-falling tone, whose pitch contour falls from the mid to the lower end of the speaker's pitch range, as in /ɑɔ1/ ‘to dredge up’. Tone 2 (T2) is a high-rising tone, whose pitch contour rises from the mid to the upper end of the pitch range, as in /ɑɔ2/ ‘hard-working’. Tone 3 (T3) is a low-dipping or low-rising tone, which falls slightly from the lower pitch range, stays at the bottom and then rises to the mid pitch range of the speaker, as in /ɑɔ3/ ‘old’. Tone 4 (T4) is a high-falling tone which falls from the upper end to the mid of the pitch range, as in /ɑɔ4/ ‘to flood’. It is noticeable that T1 and T4 differ in overall tonal height: T1 is realized in a lower pitch range and T4 in a relatively higher one. Furthermore, T4 has a high plateau/rise at the beginning, while T1 does not. Adopting the pitch range scale in Chao (1920), T1 can be transcribed as /31/, T2 as /45/, T3 as /213/ or /13/, and T4 as /53/.
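The normalization step can be sketched as follows. The sketch assumes that the T-normalization takes the commonly cited logarithmic form T = 5 × (lg x − lg min) / (lg max − lg min), where min and max are the lower and upper limits of the speaker's f0 range; the exact procedure used for Figure 9 is not spelled out in the text. The function names and the treatment of interval boundaries (each boundary value is assigned to the higher level, except that T = 5 stays in level 5) are our own assumptions.

```python
import math


def t_value(f0_hz: float, f0_min: float, f0_max: float) -> float:
    """Map an f0 value (Hz) onto a 0-5 scale relative to a speaker's range.

    Assumed form: T = 5 * (lg x - lg min) / (lg max - lg min).
    """
    span = math.log10(f0_max) - math.log10(f0_min)
    return 5.0 * (math.log10(f0_hz) - math.log10(f0_min)) / span


def chao_level(t: float) -> int:
    """Convert a T value in [0, 5] to a pitch level from 1 to 5.

    Intervals 0-1, 1-2, 2-3, 3-4 and 4-5 map to levels 1 to 5.
    """
    if not 0.0 <= t <= 5.0:
        raise ValueError("T value must lie between 0 and 5")
    return min(5, int(math.floor(t)) + 1)


# Hypothetical speaker range of 100-200 Hz, for illustration only.
level_at_floor = chao_level(t_value(100.0, 100.0, 200.0))
level_at_ceiling = chao_level(t_value(200.0, 100.0, 200.0))
```

Working on a logarithmic scale means that equal T steps correspond to equal pitch ratios rather than equal differences in Hertz, which is what allows contours from speakers with different ranges to be compared on the same five-level scale.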
Previous studies on Tianjin lexical tones have been mainly based on impressionistic observations. Researchers have varied greatly in their annotation of the four lexical tones, as summarized in Table 2. It is worth noting that although most studies differ in the absolute pitch values for the four tones, at a more abstract level, the basic f0 patterns of the four lexical tones in Tianjin Mandarin can be described as low-falling (T1), high-rising (T2), low-dipping/low-rising (T3), and high-falling (T4).
Tonal variability
When lexical tones are produced in connected speech, their f0 realizations usually deviate from the canonical tonal contours that are produced in isolation, due to different contextual tonal variation processes such as tonal coarticulation and tone sandhi.
Tonal coarticulation
Tonal coarticulation in Tianjin Mandarin is bi-directional, including the left-to-right carryover effects as well as the right-to-left anticipatory effects. Carryover tonal coarticulation in Tianjin Mandarin is assimilatory in nature, while anticipatory tonal coarticulation tends to be dissimilatory. In Tianjin Mandarin, the carryover tonal coarticulation can be observed in all tonal contexts except when the second tone is the low-falling T1 (Zhang & Liu 2011), while the anticipatory tonal coarticulation is only triggered by low tones (i.e. T1 and T3) (Li & Chen 2016; but see Zhang & Liu 2011, which reports an anticipatory effect only before T3). Figure 10 illustrates the two coarticulatory effects in Tianjin Mandarin. Each tonal contour was obtained by averaging across 12 disyllabic samples produced by our speaker. For more details on tonal coarticulation in Tianjin Mandarin, readers are referred to Zhang & Liu (2011) and Li & Chen (2016).
As shown in Figure 10(a), the f0 of a tone can be realized differently due to different preceding tones: when T2 is preceded by a high tone such as T4 (as in /tɕi4 məʊ2/ ‘stratagem’), the onset f0 realization of the second T2 is clearly higher than that following a low tone such as T1 (as in /kweɪ1 mwo2/ ‘scale’). Similar carryover effects could be observed from the comparison of T4T4 (as in /eɪ4 mu4/ ‘inside story’) vs. T1T4 (as in /kweɪ1 mi4/ ‘best female friend’).
Figure 10(b) illustrates the anticipatory effects, where the first tone is realized differently due to different following tones: when T2 is followed by a low tone such as T1 (as in /paɛ2 mɑɔ1/ ‘white cat’), the offset f0 realization of the first T2 shows faster rate of f0 rise than that before a high tone such as T4 (as in /thəʊ2 mi4/ ‘dense’). Similar anticipatory effects could also be observed from the comparison of T4T2 (as in /tɕi4 məʊ2/ ‘stratagem’) vs. T4T3 (as in /mi4 mɐ3/ ‘password’).
Tone sandhi
Previous impressionistic studies on Tianjin Mandarin have proposed four disyllabic tone sandhi patterns: T1T1, T3T3, T4T1, and T4T4 (e.g. Li & Liu 1985, Hung 1987, Tan 1987, Zhang 1987, Chen 2000, Wang 2002, Hyman 2007, but see Wee et al. 2005 for two more disyllabic sandhi patterns: T3T2 and T3T4). Among the four claimed tone sandhi patterns, only three have been confirmed with experimental data (Li & Chen 2016): T1T1, T3T3, T4T1. Figure 11 shows the f0 contours of the three tone sandhi patterns, with each tonal contour obtained by averaging across 12 samples produced by our speaker. These patterns have been further verified by an experimental study of five more speakers in Li & Chen (2016).
It can be seen from Figure 11 that in all the three tonal combinations, the first tone is realized with a drastically different f0 contour from that of its canonical form (compared to their respective contours in Figure 9 plotted here as the dashed lines). In T1T1, the first T1 does not have a low-falling f0 contour any more as its canonical form (dashed line in Figure 11(a)). Instead, the f0 offset of the sandhi tone (solid line in Figure 11(a)) is raised to a great extent (as in /tɕjɐ1 mɑɔ1/ ‘domestic cat’). In T3T3, the first T3 is realized with a high-rising f0 contour (solid line in Figure 11(b)), which is different from the low-dipping/low-rising f0 contour of the canonical T3 (dashed line in Figure 11(b)), as in /wu3 y3/ ‘dancing girl’. In T4T1, the first tone shows a high-rising f0 (solid line in Figure 11(c)), as in /xəʊ4 mɐ1/ ‘stepmother’. It is again very different from the high-falling f0 contour when the tone was pronounced in isolation (dashed line in Figure 11(c)).
Another aspect of tone sandhi worth noting is that in previous impressionistic studies, tone sandhi has always been argued to involve categorical changes from one lexical tone to another within the language's tonal inventory, i.e. T1 + T1 → T3 + T1, T3 + T3 → T2 + T3, and T4 + T1 → T2 + T1 (e.g. Li & Liu Reference Li and Liu1985, Hung Reference Hung1987, Tan Reference Tan1987, Zhang Reference Zhang1987, Chen Reference Chen2000, Wang Reference Wang2002, Hyman Reference Hyman, Riad and Gussenhoven2007). This view, however, needs to be rectified.
As shown in Figure 12, among the three tone sandhi sequences, T3T3 (solid line in Figure 12(b)) is near-merged with the claimed sandhi output tone sequence T2T3 (dashed line in Figure 12(b); as in /hwo2 jɑɔ3/ ‘ostrich’). T1T1 (solid line in Figure 12(a)) and T4T1 (solid line in Figure 12(c)), however, are far from their purported sandhi-derived lexical tonal contours, i.e. T3T1 (dashed line in Figure 12(a); as in /aɛ3 mɐ1/ ‘nanny’) and T2T1 (dashed line in Figure 12(b); as in /paɛ2 mɑɔ1/ ‘white cat’), respectively. This argues strongly against the view that sandhi variations involve the change of one lexical tone to another. For more experimental data on disyllabic tone sandhi in Tianjin Mandarin, see Zhang & Liu (Reference Zhang and Liu2011) and Li & Chen (Reference Li and Chen2016).
Note that the disyllabic tone sandhi patterns plotted in Figures 11 and 12 (as well as in Li & Chen Reference Li and Chen2016) are based on the speech of the young generation. Tianjin tone sandhi has been reported to have undergone several diachronic changes. For example, Shi & Wang (Reference Shi, Wang, Shi and Shen2004) noted that T4T4 sandhi is applied among older speakers although it is no longer observed among middle-aged and young speakers. Liu & Gao (Reference Liu and Gao2003) also reported T4T4 sandhi as ‘obsolete’. In contrast, T4T1 sandhi seems to be an innovation and is observed only among middle-aged and young speakers; no T4T1 sandhi has been observed among old speakers (Shi & Wang Reference Shi, Wang, Shi and Shen2004). Lu (Reference Lu1997) and Shi & Wang (Reference Shi, Wang, Shi and Shen2004) reported a high-level f0 realization of the first T1 of T1T1 among young speakers, which, however, is not observed in our dataset.
When these disyllabic tonal sequences occur in a larger domain such as trisyllabic sequences, tone sandhi has been claimed to apply consistently regardless of the alignment of the disyllabic sequence within a trisyllabic constituent (e.g. Li & Liu Reference Li and Liu1985, Chen Reference Chen2000, Ma Reference Ma2005, Wee et al. Reference Wee, Yan and Chen2005). The purported consistent application of disyllabic sandhi in trisyllabic sequences has given rise to much complexity in the analysis of sandhi application, posing great challenges to theories of tonal alternation (e.g. Chen Reference Chen2000, Yip Reference Yip2002, Hyman Reference Hyman, Riad and Gussenhoven2007). For example, T1T1 sandhi has been claimed to apply both when the pattern is left-aligned (e.g. T1T1T2) and when it is right-aligned (e.g. T2T1T1). This, again, is not supported by our data, which are based on the speech of the younger generation.
Among the three tone sandhi sequences, T3T3 is realized with its sandhi change consistently and regardless of its alignment in the trisyllabic sequences, as illustrated in Figure 13. In both T3T3T2 (Figure 13(a); as in /xwo3 pɐ3 tɕje2/ ‘torch festival’) and T2T3T3 (Figure 13(b); as in /xwɐ2 pjɑɔ3 tɕjɑŋ3/ ‘Marble Pillar Award’), the first T3 is realized with a rising f0 contour (comparable to that in Figure 11(b)), indicating the application of tone sandhi in both contexts.
By contrast, T1T1 sandhi and T4T1 sandhi are only applied when the patterns are right-aligned in the trisyllabic sequences, as shown in Figures 14 and 15, respectively. In T2T1T1 (Figure 14(b)), where the T1T1 sequence is right-aligned as in /xʊŋ2 əŋ1 tɕhy1/ ‘red light district’, the middle T1 is realized with a rising f0 contour comparable to that in Figure 11(a), suggesting the application of tone sandhi in this case. When T1T1 is left-aligned (Figure 14(a)) as in /kʊŋ1 ɑŋ1 tɕy2/ ‘Trade and Industry Bureau’, tone sandhi does not apply, since the first T1 is realized with a falling f0 contour comparable to its canonical form in Figure 9. Similarly, right-aligned sandhi application can also be observed for the T4T1 sequence as in T2T4T1 (Figure 15(b); as in /tɐ2 tɕi4 pan1/ ‘acrobatic team’), while tone sandhi does not apply when the sequence is left-aligned as in T4T1T2 (Figure 15(a); as in /haɛ4 khʊŋ1 thwan2/ ‘spaceship’). For more experimental data on trisyllabic tone sandhi in Tianjin Mandarin, see Li & Chen (Reference Li and Chen2016).
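The alignment-dependent patterns reported above for our speaker can be summarized as a small lookup. The sketch below is only a restatement of those observations, not a phonological analysis; the function and set names are hypothetical. It covers disyllabic and trisyllabic sequences only, and for sequences with overlapping sandhi contexts (e.g. T3T3T3), which are not discussed here, its output is purely mechanical.

```python
# Tone pairs (first tone, second tone) as described in the text.
ALIGNMENT_FREE = {(3, 3)}               # sandhi observed in either position
RIGHT_ALIGNED_ONLY = {(1, 1), (4, 1)}   # sandhi observed only on the final pair


def sandhi_positions(tones):
    """Return the indices of syllables expected to surface with a sandhi form.

    `tones` is a list of lexical tone numbers (1-4) for a disyllabic or
    trisyllabic sequence, e.g. [2, 1, 1] for T2T1T1.
    """
    if len(tones) not in (2, 3):
        raise ValueError("only di- and trisyllabic sequences are covered")
    positions = []
    last_pair_start = len(tones) - 2  # index where the right-aligned pair begins
    for i in range(len(tones) - 1):
        pair = (tones[i], tones[i + 1])
        if pair in ALIGNMENT_FREE:
            positions.append(i)
        elif pair in RIGHT_ALIGNED_ONLY and i == last_pair_start:
            positions.append(i)
    return positions
```

In a disyllabic sequence the only pair is trivially right-aligned, so all three sandhi patterns apply there, in line with Figure 11.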
Neutral tone
As in many other varieties of Mandarin Chinese, neutral tone also exists in Tianjin Mandarin. The neutral-tone syllables in Mandarin Chinese typically do not surface with any of the lexical tones (Chen Reference Chen and Sybesma2015). As these syllables always occur in the prosodically weak positions (Chen & Xu Reference Chen and Xu2006), they are usually produced with acoustic reduction at the segmental level, where the onset consonant of the neutral-tone syllable is sometimes voiced, and the vowel might be centralized or even deleted. For example, in the word /kɤ1 kɤ/ ‘elder brother’, in which the second syllable is a neutral-tone syllable, its onset consonant is often voiced, and the vowel can be reduced to a schwa, realized as [kɤ1 gə]. Neutral-tone syllables are usually produced with short duration (typically about half of the duration for a full lexical tone syllable) and their f0 realization also exhibits much variability.
Neutral-tone syllables in Tianjin Mandarin are common in grammatical morphemes (e.g. possessive marker /ə/ in /wo3 ə/ ‘mine’), lexical items (e.g. /i/ in /pwo1 i/ ‘glass’), diminutive words (e.g. /kɤ/ in /kɤ1 kɤ/ ‘elder brother’), and reduplication (e.g. /kʰa/ in /kʰa4 kʰa/ ‘to take a look’).
The f0 realization of neutral tone in Tianjin Mandarin is influenced by the preceding tone and shows varied patterns (e.g. Wang Reference Wang2002, Li & Chen Reference Li, Chen, Lee and Zee2011). In particular, neutral tone before the low-falling T1 is very often realized with a rising f0 contour, typically when there is only one neutral tone embedded between two full tones (as in /peɪ1 tə mɐ1/ ‘carrying Mom on the back’). This is clearly shown with the rising f0 contour over the middle neutral-tone syllable (N) in Figure 16, in which each tonal contour was obtained by averaging across six samples produced by our speaker. This has led to the proposal that neutral tone in Tianjin Mandarin has a special high offset tonal target before the lexical T1, different from the typical low neutral-tone offset target before other lexical tones in the language (Wang Reference Wang2002).
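Averaging a tonal contour across repetitions, as done for Figure 16, can be sketched as a pointwise mean. This is a minimal illustration under the assumption that every sample has already been time-normalized to the same number of f0 points; the function name is hypothetical and the actual processing steps behind the figure are not specified here.

```python
def mean_contour(samples):
    """Pointwise mean of several f0 contours.

    `samples` is a list of contours (lists of f0 values), each assumed to
    contain the same number of time-normalized points.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n_points = len(samples[0])
    if n_points == 0 or any(len(s) != n_points for s in samples):
        raise ValueError("all samples must have the same non-zero length")
    return [sum(s[i] for s in samples) / len(samples) for i in range(n_points)]
```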
However, experimental data suggest that the rising neutral-tone realization could be due to the general raising effect of T1 upon its preceding tones (Li & Chen Reference Li and Chen2016). When there are multiple neutral tones like those shown in Figure 17(a), the mid-low neutral-tone target is approached first by the end of the second neutral tone (N2 in Figure 17(a)), as in the example sentence /thɐ1 wo1 mɐ1 mɐ mə ə mɑɔ1 hi1 wa4 ə ɐ4 kə ɕjɛ4 tɕhjəʊ2/ ‘He said mothers’ cats messed up that cotton ball’. The raised f0 realization of neutral tone could only be observed over the very last neutral-tone syllable (N3 in Figure 17(a)). Importantly, the raising effect can be blocked by a major prosodic boundary as in Figure 17(b), as in the example sentence /hɐ1 wo1 mɐ1 mɐ mə ə təŋ1 tɕjɐ1 lə a1 paɛ3 khwaɛ4 tɕhjɛ2/ ‘He said mothers’ had increased by 300 yuan’.
Transcription of recorded passage ‘North Wind and the Sun’
The passage is transcribed phonemically with the symbols described in the consonant and vowel sections. Full lexical tones are marked with superscript tone numbers instead of tonal values. As we do not follow the tradition of transcribing sandhi tones as some other tone within the inventory, sandhi tones are marked with parentheses around the original superscript tone number, e.g. (1) for sandhi-T1. Neutral-tone syllables are not marked with tone numbers. Syllable boundaries are marked by a space, | marks the end of a major phrase, and || marks the end of an utterance.
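The marking conventions above are regular enough to be read mechanically. The sketch below is one possible way to do so, assuming a plain-text rendering in which the superscript tone number appears as an ordinary digit at the end of the syllable (e.g. mɐ1) and a sandhi tone as a parenthesized digit (e.g. tɕjɐ(1)); the function name and the output format are hypothetical.

```python
import re

# Segments, optionally followed by a tone digit or a parenthesized sandhi digit.
SYLLABLE = re.compile(
    r"^(?P<seg>[^0-9()|]+)(?:(?P<tone>[1-4])|\((?P<sandhi>[1-4])\))?$"
)


def parse_transcription(text):
    """Split a transcription into syllable and boundary tokens."""
    tokens = []
    for item in text.split():
        if item == "||":
            tokens.append({"type": "utterance_end"})
        elif item == "|":
            tokens.append({"type": "phrase_end"})
        else:
            match = SYLLABLE.match(item)
            if match is None:
                raise ValueError(f"unrecognized token: {item!r}")
            if match["sandhi"]:
                status, tone = "sandhi", int(match["sandhi"])
            elif match["tone"]:
                status, tone = "lexical", int(match["tone"])
            else:
                status, tone = "neutral", None  # no tone number marked
            tokens.append({"type": "syllable", "segments": match["seg"],
                           "tone": tone, "status": status})
    return tokens
```

Keeping the original tone number on sandhi syllables, rather than replacing it with another tone, mirrors the decision not to equate sandhi tones with other members of the inventory.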
Orthographic Transcription
Acknowledgements
This research was funded by the China Postdoctoral Science Foundation and the China Scholarship Council (Qian Li), as well as the ERC Starting grant and the KNAW−China Exchange grant (Yiya Chen). We thank the editors Amalia Arvaniti and Adrian Simpson, our copyeditor Ewa Jaworska, and two anonymous reviewers for their constructive suggestions. We also thank Shipeng Shao for stimuli recording. Thanks to Wen Cao and Feng Shi for making the recording booths available. We are grateful to Menghui Shi for sharing scripts. We also benefited greatly from the inspiring discussions with Ting Zeng.