Multilingual pronunciation by analogy

TASANAWAN SOONKLANG; ROBERT I. DAMPER; YANNICK MARCHAND

doi:10.1017/S1351324908004737


Published online by Cambridge University Press: 01 October 2008


TASANAWAN SOONKLANG: Affiliation:
Information: Signals, Images, Systems (ISIS) Research Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
ROBERT I. DAMPER: Affiliation:
Information: Signals, Images, Systems (ISIS) Research Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
YANNICK MARCHAND: Affiliation:
Institute for Biodiagnostics (Atlantic), National Research Council Canada, Neuroimaging Research Laboratory, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7


Abstract

Automatic pronunciation of unknown words (i.e., those not in the system dictionary) is a difficult problem in text-to-speech (TTS) synthesis. Currently, many data-driven approaches have been applied to the problem, as a backup strategy for those cases where dictionary matching fails. The difficulty of the problem depends on the complexity of spelling-to-sound mappings according to the particular writing system of the language. Hence, the degree of success achieved varies widely across languages but also across dictionaries, even for the same language with the same method. Further, the sizes of the training and test sets are an important consideration in data-driven approaches. In this paper, we study the variation of letter-to-phoneme transcription accuracy across seven European languages with twelve different lexicons. We also study the relationship between the size of dictionary and the accuracy obtained. The largest dictionaries of each language have been partitioned into ten approximately equal-sized subsets and combined to give ten different-sized test sets. In view of its superior performance in previous work, the transcription method used is pronunciation by analogy (PbA). Best results are obtained for Spanish, generally believed to have a very regular (‘shallow’) orthography, and poorest results for English, a language whose irregular spelling system is legendary. For those languages for which multiple dictionaries were available (i.e., French and English), results were found to vary across dictionaries. For the relationship between dictionary size and transcription accuracy, we find that as dictionary size grows, so performance grows monotonically. However, the performance gain decelerates (tends to saturate) as the dictionary increases in size; the relation can simply be described by a logarithmic regression, one parameter of which (α) can be taken as quantifying the depth of orthography of a language. 
We find that α for a language is significantly correlated with transcription performance on a small dictionary (approximately 10,000 words) for that language, but less so for asymptotic performance. This may be because our measure of asymptotic performance is unreliable, being extrapolated from the fitted logarithmic regression.
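The logarithmic regression mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact functional form, so the code below assumes accuracy = α · ln(N) + β, where N is the dictionary size and β is an intercept name introduced here for illustration only. The sizes and accuracies are synthetic values generated from a known α and β, not results from the paper; the function name `fit_log_regression` is likewise hypothetical.

```python
import math


def fit_log_regression(sizes, accuracies):
    """Ordinary least-squares fit of accuracy = alpha * ln(size) + beta.

    The functional form is an assumption made for this sketch; the
    source only states that a logarithmic regression was used.
    """
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count
    # Closed-form simple linear regression on the log-transformed sizes.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Synthetic data only: ten cumulative dictionary sizes (mirroring the
# ten different-sized sets described in the abstract) and accuracies
# generated from made-up parameters, so the fit can be checked.
TRUE_ALPHA, TRUE_BETA = 5.0, 20.0
sizes = [1000 * k for k in range(1, 11)]
accuracies = [TRUE_ALPHA * math.log(n) + TRUE_BETA for n in sizes]

alpha, beta = fit_log_regression(sizes, accuracies)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

With real data, the accuracy measured on each of the ten dictionary sizes would replace the synthetic `accuracies` list, and the fitted α could then be compared across languages.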

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 4, October 2008, pp. 527–546

DOI: https://doi.org/10.1017/S1351324908004737
Copyright: © Cambridge University Press 2008


