Where is Female Synthetic Speech?

Published online by Cambridge University Press: 27 April 2009

Caroline Henton
Affiliation:
fonix corporation, 180 West Election Road, Draper, UT 84020

Abstract

There is widespread, immediate and enduring demand for high-quality, natural, intelligible synthetic female voices in the expanding speech technology industry. Yet synthetic female voices are scarce, both in parametric text-to-speech (TTS) systems and in concatenative ones. Current female synthetic speech largely lacks naturalness, pleasantness and tolerability. Some acoustic specifications of female voices that are relevant to synthesis are discussed in detail. Recent research pertaining to female voice quality is reported and a ranking of these various considerations is proposed. This paper reviews the present situation and considers why there is a paucity of female voice synthesis.

Type
Articles
Copyright
Copyright © Journal of the International Phonetic Association 1999
