from Part III - Machine Synthesis of Social Signals
Published online by Cambridge University Press: 13 July 2017
Introduction
Speech synthesis (also called text-to-speech synthesis) is the automatic conversion of natural language text into speech. It has many potential applications: as an aid to people with disabilities (see Challenges for the Future), for generating the output of spoken dialogue systems (Lemon et al., 2006; Georgila et al., 2010), for speech-to-speech translation (Schultz et al., 2006), and for computer games, among others.
Current state-of-the-art speech synthesizers can simulate neutral read-aloud speech (i.e., speech that sounds like reading from a text) quite well, in terms of both naturalness and intelligibility (Karaiskos et al., 2008). Nevertheless, many commercial applications that require speech output still rely on prerecorded system prompts rather than synthetic speech. The reason is that, despite much progress in speech synthesis over the last twenty years, current state-of-the-art synthetic voices still lack the expressiveness of human voices. Prerecorded speech, however, has several drawbacks of its own. Recording is very expensive and often has to start from scratch for each new application. Moreover, if an application needs to be enhanced with new prompts, it is quite likely that the person (usually an actor) who recorded the initial prompts will no longer be available. Furthermore, human recordings cannot be used to generate content on the fly: all the utterances that an application will use must be determined and recorded in advance, which is not always possible. For example, the number of names in the database of an automatic directory assistance service can be huge, and most databases are continuously updated. In such cases, speech output is generated by mixing prerecorded speech (for prompts) with synthetic speech (for names) (Georgila et al., 2003), and the result can sound quite awkward.
The discussion above shows that there is great motivation for further advances in the field of speech synthesis. Below we provide an overview of the current state of the art in speech synthesis, and present challenges for future work.