Book contents
- Frontmatter
- Contents
- List of contributors
- 1 Multimodal signal processing for meetings: an introduction
- 2 Data collection
- 3 Microphone arrays and beamforming
- 4 Speaker diarization
- 5 Speech recognition
- 6 Sampling techniques for audio-visual tracking and head pose estimation
- 7 Video processing and recognition
- 8 Language structure
- 9 Multimodal analysis of small-group conversational dynamics
- 10 Summarization
- 11 User requirements for meeting support technology
- 12 Meeting browsers and meeting assistants
- 13 Evaluation of meeting support technology
- 14 Conclusion and perspectives
- References
- Index
14 - Conclusion and perspectives
Published online by Cambridge University Press: 05 July 2012
Summary
Goals and achievements
Money has been spent: about 20 million euros over six years through the European AMI and AMIDA projects, complemented by a number of satellite projects and national initiatives, including the large IM2 Swiss NSF National Center of Competence in Research. This book has provided a unique opportunity to review this research, and we conclude by attempting to make a fair assessment of what has been achieved compared to the initial vision and goals.
Our vision was to develop multimodal signal processing technologies to capture, analyze, understand, and enhance human interactions. Although we had the overall goal of modeling communicative interactions in general, we focused our efforts on enhancing the value of multimodal meeting recordings and on the development of real-time tools to enhance human interaction in meetings. We pursued these goals through the development of smart meeting rooms and new tools for computer-supported cooperative work and communication, and through the design of new ways to search and browse meetings.
The dominant multimodal research paradigm in the late 1990s was centered on the design of multimodal human-computer interfaces. In the AMI and AMIDA projects we switched the focus to multimodal interactions between people, partly as a way to develop more natural communicative interfaces for human-computer interaction. As discussed in Chapter 1, and similar to what was done in some other projects around the same time, our main idea was to put the computer within the human interaction loop (as explicitly referred to by the EU CHIL project), where computers are primarily used as mediators to enhance human communication and collaborative potential.
This approach raised a number of major research challenges, while also offering application opportunities. Human communication is one of the most complex processes we know, characterized by fast, highly sensitive multimodal processing in which information is received and analyzed from multiple simultaneous inputs in real time, with little apparent effort.
- Type: Chapter
- Information: Multimodal Signal Processing: Human Interactions in Meetings, pp. 232-237
- Publisher: Cambridge University Press
- Print publication year: 2012