
13 - Fundamental Frequency and Pitch

from Section III - Measuring Speech

Published online by Cambridge University Press:  11 November 2021

Rachael-Anne Knight (City, University of London)
Jane Setter (University of Reading)

Summary

Pitch, the subjective impression that a speech sound is relatively high or low, is an important characteristic of spoken language: in some languages it contributes to the lexical identity of words, and in all languages it contributes to the perception of the intonation pattern of utterances. Pitch corresponds to a physiological parameter, the frequency of vibration of the vocal folds, known as the fundamental frequency (F0), which is measured in cycles per second, or hertz (Hz). Estimating and measuring fundamental frequency, and modelling pitch, are not easy. After presenting some automatic models of pitch, we address issues in the detection and measurement of fundamental frequency, including tracking and detection errors, and explain how many of these errors can be avoided by choosing appropriate pitch floor and ceiling settings. Finally, we discuss the use of acoustic scales (linear, logarithmic and psychoacoustic) for measuring pitch. Based on recent findings in neuroanatomy, neurophysiology, behavioural studies and speech production, we suggest that a new scale, the Octave-Median (OMe) scale, appears to be more natural for the study of speech prosody.
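Why the pitch floor and ceiling matter can be seen with a minimal sketch, assuming a plain (unnormalised) autocorrelation estimator rather than the algorithm any particular program such as Praat actually uses. The ceiling fixes the shortest period the search may choose and the floor the longest, so a ceiling set below the true F0 forces the estimator onto twice the true period, and the reported value drops by an octave. The function names (`autocorr`, `estimate_f0`, `sine_frame`) and the synthetic sine input are illustrative assumptions.

```python
import math


def autocorr(x: list[float], lag: int) -> float:
    """Unnormalised autocorrelation: sum of x[n] * x[n + lag] over the overlap."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_f0(frame: list[float], fs: int, floor_hz: float, ceiling_hz: float) -> float:
    """Estimate F0 (Hz) of one voiced frame, searching only lags allowed by floor/ceiling."""
    # The ceiling sets the shortest period (smallest lag) that may be chosen;
    # the floor sets the longest period (largest lag).
    min_lag = math.ceil(fs / ceiling_hz)
    max_lag = min(math.floor(fs / floor_hz), len(frame) - 1)
    if min_lag > max_lag:
        raise ValueError("search range is empty for this frame, floor and ceiling")
    # Pick the lag with the strongest self-similarity inside the allowed range.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(frame, lag))
    return fs / best_lag


def sine_frame(f0: float, fs: int, duration_s: float) -> list[float]:
    """Synthetic stand-in for a voiced frame: a pure sine wave at f0 Hz."""
    return [math.sin(2 * math.pi * f0 * n / fs) for n in range(int(fs * duration_s))]


if __name__ == "__main__":
    fs = 16000
    frame = sine_frame(200.0, fs, 0.05)  # 50 ms at 200 Hz
    print(estimate_f0(frame, fs, floor_hz=75, ceiling_hz=500))  # 200.0
    print(estimate_f0(frame, fs, floor_hz=75, ceiling_hz=150))  # 100.0, an octave too low
```

Real speech frames are not pure sines, so a working tracker also needs voicing decisions, normalisation and smoothing across frames; the sketch isolates only the effect of the search range.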
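One way to choose speaker-appropriate floor and ceiling settings is a two-pass procedure: track F0 once with a deliberately wide range, then derive a narrower range from the distribution of the first-pass values. The sketch below assumes that approach; the multipliers 0.75 (applied to the first quartile) and 1.5 (applied to the third quartile) follow one commonly cited version of the heuristic and are assumptions here, not values taken from this chapter. The function name `two_pass_range` and the sample values are hypothetical.

```python
import statistics


def two_pass_range(first_pass_f0: list[float],
                   floor_factor: float = 0.75,
                   ceiling_factor: float = 1.5) -> tuple[float, float]:
    """Derive a pitch floor and ceiling (Hz) from a wide-range first-pass F0 track.

    Quartiles are used instead of the minimum and maximum so that isolated
    octave errors in the first pass do not stretch the resulting range.
    """
    q1, _median, q3 = statistics.quantiles(first_pass_f0, n=4)
    return floor_factor * q1, ceiling_factor * q3


if __name__ == "__main__":
    # Hypothetical voiced-frame values: mostly 110-140 Hz, plus two octave errors.
    track = [60, 110, 115, 120, 125, 130, 135, 140, 250]
    print(two_pass_range(track))  # (84.375, 206.25)
```

With these values the derived range excludes both the halving error (60 Hz) and the doubling error (250 Hz), so a second tracking pass restricted to it cannot reproduce them.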

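The acoustic scales mentioned in the summary can be compared with a few conversion functions. This is a sketch under stated assumptions: the OMe value is taken to be F0 in octaves relative to the speaker's median, the mel formula is the widely used 2595·log10(1 + f/700), the Bark formula is Traunmüller's (1990) approximation, and the ERB-rate formula is Moore & Glasberg's (1983), with frequency in kHz. The function names are illustrative.

```python
import math


def hz_to_semitones(f_hz: float, ref_hz: float) -> float:
    """Logarithmic scale: 12 semitones per doubling of frequency, relative to ref_hz."""
    return 12 * math.log2(f_hz / ref_hz)


def hz_to_ome(f_hz: float, median_hz: float) -> float:
    """OMe (assumed definition): octaves relative to the speaker's median F0."""
    return math.log2(f_hz / median_hz)


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, using the common formula 2595 * log10(1 + f / 700)."""
    return 2595 * math.log10(1 + f_hz / 700)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, using Traunmüller's (1990) analytical approximation."""
    return 26.81 * f_hz / (1960 + f_hz) - 0.53


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate, using Moore & Glasberg's (1983) formula (frequency in kHz)."""
    f_khz = f_hz / 1000
    return 11.17 * math.log((f_khz + 0.312) / (f_khz + 14.675)) + 43.0


if __name__ == "__main__":
    median = 120.0  # hypothetical speaker median F0 in Hz
    for f in (60.0, 120.0, 240.0):
        print(f, hz_to_semitones(f, median), hz_to_ome(f, median),
              round(hz_to_mel(f), 1), round(hz_to_bark(f), 2), round(hz_to_erb_rate(f), 2))
```

Because the OMe value is a ratio to the speaker's own median, +1 always means one octave above that median, whatever the speaker's register; the linear and psychoacoustic scales do not have this property.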
Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


13.7 References

Beranek, L. L. (1949). Acoustical Measurements. Melville, NY: Acoustical Society of America [revised edition 1988].
Bigi, B. (2015). SPPAS – Multi-lingual approaches to the automatic annotation of speech. The Phonetician (International Society of Phonetic Sciences), 111–112(I–II), 54–69.
Boersma, P. & Weenink, D. (2019). Praat: Doing Phonetics by Computer [computer program]. Version 6.0.56, June 2019, www.praat.org.
Braun, M. (2001). Speech mirrors norm-tones: Absolute pitch as a normal but precognitive trait. Acoustics Research Letters Online, 2(3), 85–90.
Braun, M. (2006). A retrospective study of the spectral probability of spontaneous otoacoustic emissions: Rise of octave shifted second mode after infancy. Hearing Research, 215, 39–46.
Braun, M. & Chaloupka, V. (2005). Carbamazepine induced pitch shift and octave space representation. Hearing Research, 210, 85–92.
Brøndsted, T. (1997). Intonation contours distorted by tone patterns of stress groups and word accent. In Botinis, A., ed., Intonation: Theory, Models and Applications (Proceedings of an ISCA workshop). Athens: Athanasopoulos, pp. 55–8.
Chentir, A., Guerti, M. & Hirst, D. J. (2009). Extraction of standard Arabic micromelody. Journal of Computer Science, 5(2), 86–9.
Cho, H. & Rauzy, S. (2008). Phonetic pitch movements of accentual phrases in Korean read speech. In Proceedings of the 4th International Conference on Speech Prosody, Campinas, Brazil.
De Looze, C. (2010). Analyse et interprétation de l’empan temporel des variations prosodiques en français et en anglais. PhD thesis, Université de Provence, Aix-en-Provence, France.
De Looze, C. & Hirst, D. J. (2008). Detecting changes in key and range for the automatic modelling and coding of intonation. In Proceedings of the 4th International Conference on Speech Prosody, Campinas, Brazil, pp. 135–8.
De Looze, C. & Hirst, D. J. (2014). The OMe (Octave-Median) scale: A natural scale for speech melody. In Proceedings of the 7th International Conference on Speech Prosody, Dublin, pp. 910–13.
Di Cristo, A. & Hirst, D. J. (1986). Modelling French micromelody: Analysis and synthesis. Phonetica, 43(1–3), 11–30.
Fant, G. (1968). Analysis and synthesis of speech processes. In Malmberg, B., ed., Manual of Phonetics. Amsterdam: North Holland, pp. 173–7.
Fant, G. (2004). Speech Acoustics and Phonetics. Dordrecht: Kluwer.
Fourcin, A. J. & Abberton, E. (1971). First applications of a new laryngograph. Medical and Biological Illustration, 21, 172–82.
Fujisaki, H. (2004). Information, prosody, and modeling – with emphasis on tonal features of speech. In Proceedings of the Second International Conference on Speech Prosody, Nara, Japan, pp. 1–10.
Fujisaki, H. & Nagashima, S. (1969). A model for the synthesis of pitch contours of connected speech. Annual Report of the Engineering Research Institute, 28, 53–60.
Gårding, E. (1998). Intonation in Swedish. In Hirst, D. J. & Di Cristo, A., eds., Intonation Systems: A Survey of Twenty Languages. Cambridge: Cambridge University Press, pp. 117–36.
Goldsmith, J. A. (1990). Autosegmental and Metrical Phonology. Cambridge, MA: Blackwell.
Graddol, D. (1986). Discourse specific pitch behaviour. In Johns-Lewis, C., ed., Intonation in Discourse. Edinburgh: Croom Helm, pp. 221–38.
Halle, M. & Vergnaud, J.-R. (1987). An Essay on Stress. Cambridge, MA: MIT Press.
Hanson, H. (2009). Effects of obstruent consonants on fundamental frequency at vowel onset in English. Journal of the Acoustical Society of America, 125, 425–41.
’t Hart, J., Collier, R. & Cohen, A. (1990). A Perceptual Study of Intonation: An Experimental-Phonetic Approach to Speech Melody. Cambridge: Cambridge University Press.
Hermes, D. I. & van Gestel, I. E. (1991). The frequency scale of speech intonation. Journal of the Acoustical Society of America, 90, 97–102.
Hess, W. (1983). Pitch Determination of Speech Signals: Algorithms and Devices. Berlin: Springer-Verlag.
Hirst, D. J. (1981). Phonological implications of a production model of English intonation. Phonologica 1980, 195–201.
Hirst, D. J. (1983). Structures and categories in prosodic representations. In Cutler, A. & Ladd, D. R., eds., Prosody: Models & Measurements. Berlin: Springer, pp. 93–109.
Hirst, D. J. (2007). A Praat plugin for Momel and INTSINT with improved algorithms for modelling and coding intonation. In Proceedings of the XVIth International Congress of Phonetic Sciences (paper 1443), Saarbrücken, pp. 1233–6.
Hirst, D. J. (2012). Diapason.praat. Praat script. www.researchgate.net/publication/327764721_diapason.
Hirst, D. J. (2015). ProZed: A speech prosody editor for linguists, using analysis-by-synthesis. In Hirose, K. & Tao, J., eds., Speech Prosody in Speech Synthesis: Modeling and Generation of Prosody for High Quality and Flexible Speech Synthesis. Berlin: Springer-Verlag, pp. 3–17.
Hirst, D. J. & Espesser, R. (1993). Automatic modelling of fundamental frequency using a quadratic spline function. Travaux de l’Institut de Phonétique d’Aix, 15, 75–85.
Hirst, D. J., Di Cristo, A. & Espesser, R. (2000). Levels of representation and levels of analysis for intonation. In Horne, M., ed., Prosody: Theory and Experiment. Dordrecht: Kluwer Academic Publishers, pp. 51–87.
Hirst, D. J., Cho, H., Kim, S. & Yu, H. (2007). Evaluating two versions of the Momel pitch modeling algorithm on a corpus of read speech in Korean. In Proceedings of INTERSPEECH, VIII. Antwerp, Belgium, pp. 1649–52.
House, A. & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America, 25, 105–13.
House, D. (1990). Tonal Perception in Speech. Lund: Lund University Press.
Iivonen, A. (1998). Intonation in Finnish. In Hirst, D. J. & Di Cristo, A., eds., Intonation Systems: A Survey of Twenty Languages. Cambridge: Cambridge University Press, pp. 331–47.
Imig, T. J. & Morel, A. (1985). Tonotopic organization in ventral nucleus of medial geniculate body in the cat. Journal of Neurophysiology, 53, 309–40.
Jassem, W. (1952). Intonation of Conversational English (educated Southern British). Wrocław: Wrocławskie Towarzystwo Naukowe [PDF available from the Speech and Language Data Repository, http://sldr.org/sldr000777/en].
Jones, D. (1909). Intonation Curves. Leipzig: Teubner.
Kiessling, A., Kompe, R., Niemann, H., Nöth, E. & Batliner, A. (1995). Voice source state as a source of information in speech recognition: Detection of laryngealizations. NATO ASI Series (Computer and Systems Sciences), 147, 329–32.
Kuttner, F. A. (1975). Prince Chu Tsai-Yu’s life and work: A re-evaluation of his contribution to equal temperament theory. Ethnomusicology, 19(2), 163–206.
Liberman, M. (2017). Pitch contour perception. http://languagelog.ldc.upenn.edu/nll/?p=34251.
Lindley, M. (2001). Well-tempered clavier. In Sadie, S. & Tyrrell, J., eds., The New Grove Dictionary of Music and Musicians, 2nd ed. London: Macmillan.
Liu, J., Wang, N., Li, J., Shi, B. & Wang, H. (2009). Frequency distribution of synchronized spontaneous otoacoustic emissions showing sex-dependent differences and asymmetry between ears in 2- to 4-day-old neonates. International Journal of Pediatric Otorhinolaryngology, 73(5), 731–6.
Maghbouleh, A. (1998). ToBI accent type recognition. In Proceedings of the Sixth International Conference on Spoken Language Processing, Paper 0632.
Martin, P. (1981). Extraction de la fréquence fondamentale par intercorrélation avec une fonction peigne. 12e Journées d’Etude sur la Parole, SFA, Montréal.
Mertens, P. (2004). The Prosogram: Semi-automatic transcription of prosody based on a tonal perception model. In Proceedings of the 2nd International Conference on Speech Prosody, Nara, Japan, pp. 549–52.
Mertens, P. (2018). Prosogram, v 2.15. Pitch contour stylization based on a tonal perception model. https://sites.google.com/site/prosogram/home.
Mertens, P. & d’Alessandro, C. (1995). Pitch contour stylization using a tonal perception model. In Proceedings of the 13th International Congress of Phonetic Sciences, vol. 4, pp. 228–31.
Mixdorff, H.-J. (1999). A novel approach to the fully automated extraction of Fujisaki model parameters. In Proceedings of ICASSP 1999, pp. 1281–4.
Moore, B. C. J. & Glasberg, B. R. (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. Journal of the Acoustical Society of America, 74, 750–3.
Moore, B. C. J. & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model. Acta Acustica, 82, 335–45.
Morel, A. (1980). Codage des sons dans le corps genouillé médian du chat: évaluation de l’organisation tonotopique de ses différents noyaux. PhD dissertation, Université de Lausanne, Juris, Zurich.
Morest, D. K. (1965). The laminar structure of the medial geniculate body of the cat. Journal of Anatomy, 99, 143–60.
Nolan, F. (2003). Intonational equivalence: An experimental evaluation of pitch scales. In Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 771–4.
Nooteboom, S. (1999). The prosody of speech melody and rhythm. In Hardcastle, W. J. & Laver, J., eds., The Handbook of Phonetic Sciences. London: Blackwell, pp. 640–73.
O’Shaughnessy, D. (1987). Speech Communication: Human and Machine. Reading, MA: Addison-Wesley, p. 150.
Paeschke, A. & Sendlmeier, W. F. (2000). Prosodic characteristics of emotional speech: Measurements of fundamental frequency movements. In Proceedings of the ISCA Workshop on Speech and Emotion, Belfast, Ireland, pp. 75–80.
Rossi, M. (1971). Le seuil de glissando ou seuil de perception des variations tonales pour les sons de la parole. Phonetica, 23, 1–33.
Silverman, K. (1986). F0 segmental cues depend on intonation: The case of the rise after voiced stops. Phonetica, 43(1–3), 76–91.
Steele, J. (1779). Prosodia Rationalis: or, an Essay towards Establishing the Melody and Measure of Speech, to be Expressed and Perpetuated by Peculiar Symbols, 2nd ed. London: J. Nichols.
Stevens, S., Volkmann, J. & Newman, E. (1937). A scale for the measurement of the psychological magnitude of pitch. Journal of the Acoustical Society of America, 8, 185–90.
Taylor, P. (1995). The rise/fall/connection model of intonation. Speech Communication, 15(1–2), 169–86.
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. Journal of the Acoustical Society of America, 88, 97–100.
Traunmüller, H. (1997). Auditory scales of frequency representation. www2.ling.su.se/staff/hartmut/bark.htm.
Umesh, S., Cohen, L. & Nelson, D. (1999). Fitting the Mel-scale. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, Phoenix, Arizona, USA, March 1999, pp. 217–20.
Véronis, J., Hirst, D. J. & Ide, N. (1994). NL and speech in the Multext project. In Proceedings of the AAAI Workshop on Integration of Natural Language and Speech, Seattle, USA, pp. 72–8.
Wightman, C. & Campbell, N. (1995). Improved labeling of prosodic structure. IEEE Transactions on Speech and Audio Processing.
Wikipedia. (2018). Pitch detection algorithm. https://en.wikipedia.org/wiki/Pitch_detection_algorithm.
Wright, A. A., Rivera, J. J., Hulse, S. H., Shyan, M. & Neiworth, J. J. (2000). Music perception and octave generalization in rhesus monkeys. Journal of Experimental Psychology: General, 129(3), 291–307.
Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands (Frequenzgruppen). Journal of the Acoustical Society of America, 33, 248.
Zwirner, E. & Zwirner, Z. K. (1937). Über das Hören und Messen der Sprachmelodie. Archiv für vergleichende Phonetik, 1, 35–47.
