
Explaining the PENTA model: a reply to Arvaniti and Ladd*

Published online by Cambridge University Press:  15 February 2016

Yi Xu*
Affiliation:
University College London
Albert Lee*
Affiliation:
University of Hong Kong
Santitham Prom-on*
Affiliation:
King Mongkut's University of Technology Thonburi
Fang Liu*
Affiliation:
University of Essex

Abstract

This paper presents an overview of the Parallel Encoding and Target Approximation (PENTA) model of speech prosody, in response to an extensive critique by Arvaniti & Ladd (2009). PENTA is a framework for conceptually and computationally linking communicative meanings to fine-grained prosodic details, based on an articulatory-functional view of speech. Target Approximation simulates the articulatory realisation of underlying pitch targets – the prosodic primitives in the framework. Parallel Encoding provides an operational scheme that enables simultaneous encoding of multiple communicative functions. We also outline how PENTA can be computationally tested with a set of software tools. With the help of one of the tools, we offer a PENTA-based hypothetical account of the Greek intonational patterns reported by Arvaniti & Ladd, showing how it is possible to predict the prosodic shapes of an utterance based on the lexical and postlexical meanings it conveys.

Type
Squibs and Replies
Copyright
Copyright © Cambridge University Press 2016 


Footnotes

*

We would like to thank Amalia Arvaniti, Antonis Botinis, Bronwen Evans, Bob Ladd and four anonymous reviewers for their comments on earlier drafts of this paper. This work received support from the following sources: the National Science Foundation (NSF BCS-1355479 to the first author), the Royal Society and the Royal Academy of Engineering through the Newton International Fellowship Scheme (to the third author) and the Thai Research Fund through a Research Grant for New Researchers (TRG5680096 to the third author).

References

Arvaniti, Amalia & Ladd, D. Robert (2009). Greek wh-questions and the phonology of intonation. Phonology 26. 43–74.
Bailly, Gérard & Holm, Bleicke (2005). SFC: a trainable prosodic model. Speech Communication 46. 348–364.
Beckman, Mary E. & Pierrehumbert, Janet B. (1986). Intonational structure in Japanese and English. Phonology Yearbook 3. 255–309.
Birkholz, Peter, Kröger, Bernd J. & Neuschaefer-Rube, Christiane (2011). Model-based reproduction of articulatory trajectories for consonant–vowel sequences. IEEE Transactions on Audio, Speech, and Language Processing 19. 1422–1433.
Black, Alan & Hunt, Andrew (1996). Generating F0 contours from ToBI labels using linear regression. Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP 96). Vol. 3. 1385–1388.
Bolinger, Dwight L. (1986). Intonation and its parts: melody in spoken English. London: Arnold.
Broe, Michael B. & Pierrehumbert, Janet B. (eds.) (2000). Papers in laboratory phonology V: acquisition and the lexicon. Cambridge: Cambridge University Press.
Chen, Matthew Y. (2000). Tone sandhi: patterns across Chinese dialects. Cambridge: Cambridge University Press.
Chen, Yiya & Xu, Yi (2006). Production of weak elements in speech: evidence from F0 patterns of neutral tone in Standard Chinese. Phonetica 63. 47–75.
Cooper, William E., Eady, Stephen J. & Mueller, Pamela R. (1985). Acoustical aspects of contrastive stress in question–answer contexts. JASA 77. 2142–2156.
de Jong, Kenneth (2004). Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration. JPh 32. 493–516.
Doupe, Allison J. & Kuhl, Patricia K. (1999). Birdsong and human speech: common themes and mechanisms. Annual Review of Neuroscience 22. 567–631.
Fujisaki, Hiroya (1983). Dynamic characteristics of voice fundamental frequency in speech and singing. In MacNeilage, Peter F. (ed.) The production of speech. New York: Springer. 39–55.
Grice, Martine, Ladd, D. Robert & Arvaniti, Amalia (2000). On the place of phrase accents in intonational phonology. Phonology 17. 143–185.
Gussenhoven, Carlos (2000). The boundary tones are coming: on the nonperipheral realization of boundary tones. In Broe & Pierrehumbert (2000). 132–151.
Gussenhoven, Carlos (2004). The phonology of tone and intonation. Cambridge: Cambridge University Press.
Hart, Johan 't, Collier, René & Cohen, Antonie (1990). A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.
Heldner, Mattias (2003). On the reliability of overall intensity and spectral emphasis as acoustic correlates of focal accents in Swedish. JPh 31. 39–62.
Hirst, D. J. (2005). Form and function in the representation of speech prosody. Speech Communication 46. 334–347.
Jun, Sun-Ah (ed.) (2005). Prosodic typology: the phonology of intonation and phrasing. Oxford: Oxford University Press.
Kochanski, Greg & Shih, Chilin (2003). Prosody modeling with soft templates. Speech Communication 39. 311–352.
Ladd, D. Robert (2008). Intonational phonology. 2nd edn. Cambridge: Cambridge University Press.
Lee, Albert, Xu, Yi & Prom-on, Santitham (2014). Modeling Japanese F0 contours using the PENTAtrainers and AMtrainer. Proceedings of the 4th International Symposium on Tonal Aspects of Languages (TAL2014). 164–167.
Liu, Fang & Xu, Yi (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica 62. 70–87.
Liu, Fang, Xu, Yi, Prom-on, Santitham & Yu, Alan (2013). Morpheme-like prosodic functions: evidence from acoustic analysis and computational modelling. Journal of Speech Sciences 3. 85–140.
Nick, Teresa A. (2014). Models of vocal learning in the songbird: historical frameworks and the stabilizing critic. Developmental Neurobiology. DOI:10.1002/dneu.22189.
O'Connor, J. D. & Arnold, G. F. (1973). Intonation of colloquial English: a practical handbook. 2nd edn. London: Longman.
Peng, Shu-Hui (2000). Lexical versus ‘phonological’ representations of Mandarin sandhi tones. In Broe & Pierrehumbert (2000). 152–167.
Pierrehumbert, Janet B. (1980). The phonology and phonetics of English intonation. PhD dissertation, MIT.
Pierrehumbert, Janet B. (1981). Synthesizing intonation. JASA 70. 985–995.
Pierrehumbert, Janet B. (2000). Tonal elements and their alignment. In Horne, Merle (ed.) Prosody: theory and experiment. Studies presented to Gösta Bruce. Dordrecht: Kluwer. 11–36.
Pierrehumbert, Janet B. & Beckman, Mary E. (1988). Japanese tone structure. Cambridge, Mass.: MIT Press.
Pierrehumbert, Janet B. & Hirschberg, Julia (1990). The meaning of intonational contours in the interpretation of discourse. In Cohen, Philip R., Morgan, Jerry & Pollack, Martha E. (eds.) Intentions in communication. Cambridge, Mass.: MIT Press. 271–311.
Prom-on, Santitham, Birkholz, Peter & Xu, Yi (2013). Training an articulatory synthesizer with continuous acoustic data. Proceedings of Interspeech 2013. 349–353.
Prom-on, Santitham & Xu, Yi (2012). PENTATrainer2: a hypothesis-driven prosody modeling tool. In Antonis Botinis (ed.) Proceedings of the 5th IESL Conference on Experimental Linguistics, Athens, Greece. 93–100.
Prom-on, Santitham, Xu, Yi & Thipakorn, Bundit (2009). Modeling tone and intonation in Mandarin and English as a process of target approximation. JASA 125. 405–424.
Raidt, S., Bailly, G., Holm, B. & Mixdorff, H. (2004). Automatic generation of prosody: comparing two superpositional systems. In Bel, Bernard & Marlien, Isabelle (eds.) Speech prosody 2004. Nara, Japan. Available (October 2015) at http://www.isca-speech.org/archive/sp2004. 417–420.
Saltzman, Elliot & Munhall, Kevin G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1. 333–382.
Sun, Xuejing (2002). The determination, analysis, and synthesis of fundamental frequency. PhD dissertation, Northwestern University.
Taylor, Paul (2000). Analysis and synthesis of intonation using the Tilt model. JASA 107. 1697–1714.
Wang, Bei & Xu, Yi (2011). Differential prosodic encoding of topic and focus in sentence-initial position in Mandarin Chinese. JPh 39. 595–611.
Xu, Ching X. & Xu, Yi (2003). Effects of consonant aspiration on Mandarin tones. Journal of the International Phonetic Association 33. 165–181.
Xu, Ching X., Xu, Yi & Luo, Li-Shi (1999). A pitch target approximation model for F0 contours in Mandarin. In Ohala, John J., Hasegawa, Yoko, Ohala, Manjari, Granville, Daniel & Bailey, Ashlee C. (eds.) Proceedings of the 14th International Congress of Phonetic Sciences. Berkeley: University of California. 2359–2362.
Xu, Yi (1997). Contextual tonal variations in Mandarin. JPh 25. 61–83.
Xu, Yi (2005). Speech melody as articulatorily implemented communicative functions. Speech Communication 46. 220–251.
Xu, Yi (2011a). Speech prosody: a methodological review. Journal of Speech Sciences 1. 85–115.
Xu, Yi (2011b). Post-focus compression: cross-linguistic distribution and historical origin. In Lee, Wai-Sum & Zee, Eric (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong 2011. Hong Kong: University of Hong Kong. 152–155.
Xu, Yi, Chen, Szu-Wei & Wang, Bei (2012). Prosodic focus with and without post-focus compression: a typological divide within the same language family? The Linguistic Review 29. 131–147.
Xu, Yi, Kelly, Andrew & Smillie, Cameron (2013). Emotional expressions as communicative signals. In Hancil, Sylvie & Hirst, Daniel (eds.) Prosody and iconicity. Amsterdam & Philadelphia: Benjamins. 33–59.
Xu, Yi, Lee, Albert, Wu, Wing-Li, Liu, Xuan & Birkholz, Peter (2013). Human vocal attractiveness as signaled by body size projection. PLoS ONE 8. e62397. Available at http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062397.
Xu, Yi & Liu, Fang (2006). Tonal alignment, syllable structure and coarticulation: toward an integrated model. Rivista di Linguistica 18. 125–159.
Xu, Yi & Liu, Fang (2012). Intrinsic coherence of prosodic and segmental aspects of speech. In Niebuhr, Oliver (ed.) Understanding prosody: the role of context, function and communication. Berlin & Boston: de Gruyter. 1–26.
Xu, Yi & Prom-on, Santitham (2010–14). PENTAtrainer1: a Praat script for extracting pitch targets from individual sound files. Available (October 2015) at http://www.phon.ucl.ac.uk/home/yi/PENTAtrainer1.
Xu, Yi & Prom-on, Santitham (2014). Toward invariant functional representations of variable surface fundamental frequency contours: synthesizing speech melody via model-based stochastic learning. Speech Communication 57. 181–208.
Xu, Yi & Wang, Q. Emily (2001). Pitch targets and their realization: evidence from Mandarin Chinese. Speech Communication 33. 319–337.
Xu, Yi & Xu, Ching X. (2005). Phonetic realization of focus in English declarative intonation. JPh 33. 159–197.