Book contents
- Frontmatter
- Contents
- List of contributors
- 1 Multimodal signal processing for meetings: an introduction
- 2 Data collection
- 3 Microphone arrays and beamforming
- 4 Speaker diarization
- 5 Speech recognition
- 6 Sampling techniques for audio-visual tracking and head pose estimation
- 7 Video processing and recognition
- 8 Language structure
- 9 Multimodal analysis of small-group conversational dynamics
- 10 Summarization
- 11 User requirements for meeting support technology
- 12 Meeting browsers and meeting assistants
- 13 Evaluation of meeting support technology
- 14 Conclusion and perspectives
- References
- Index
5 - Speech recognition
Published online by Cambridge University Press: 05 July 2012
Summary
General overview
Meetings are a rich source of information that, in practice, remains largely untouched by any form of information processing. Even now it is rare for meetings to be recorded, and fewer still are annotated for later access. Examples of the latter are essentially limited to meetings held in parliaments, courts, hospitals, banks, and similar settings, where a record is required for decision tracking or to meet legal obligations; in these cases a labor-intensive manual transcription of the spoken words is produced. Giving much wider access to this rich content is the main aim of the AMI consortium projects, and interest in such access is now evident in the release of commercial hardware and software services. Especially with the advent of high-quality telephone and videoconferencing systems, the opportunity to record, process, recognize, and categorize the interactions in meetings is acknowledged even by skeptics of speech and language processing technology.
Meetings are, of course, an audio-visual experience by nature, and humans make extensive use of visual and other sensory information. Illustrating this rich landscape of information is the purpose of this book, and many applications can be implemented without ever looking at the spoken word. Nevertheless, verbal communication forms the backbone of most meetings and accounts for the bulk of the information transferred between participants. Hence automatic speech recognition (ASR) is key to accessing the information exchanged and is the most important component required for most higher-level processing.
- Type: Chapter
- Information: Multimodal Signal Processing: Human Interactions in Meetings, pp. 56-83. Publisher: Cambridge University Press. Print publication year: 2012.