This case study demonstrates how semiotic methods can be used for systems analysis, while the other stages of systems development are covered by other approaches. The chapter also discusses why, after other systems analysis methods had been tried, the Semantic Analysis method was chosen for the project.
Background
CONTEST (COMputerised TEST construction system) is a software engineering project which started as research and then evolved into a commercial project. In this project, Semantic Analysis has been applied for requirements analysis. An object-oriented system design has been produced based on the semantic model and, further, the system has been constructed using object-oriented programming languages and tools.
The project grew out of two doctoral research projects, in which the major theoretical investigation and some experiments were conducted (Adema 1990, Boekkooi-Timminga 1989). A systems analysis and design were performed which led to a prototype. The research project has since been continued and expanded with the input of additional resources. The current objective of the project is to produce a practical, useful, computerised system for the automatic management and production of tests.
CONTEST project
Tests are traditionally managed and produced manually. The questions (or items, as educational professionals term them) are written by item composers, then centrally collected and stored with their attributes in an item bank at a test agency.
The motion of the human body can in itself be a useful human-machine interface, and computer vision can provide a method for tracking the body in an unobtrusive fashion. In this chapter, we describe a system capable of tracking the human arm in 3D using only a single camera and no special markers on the body. The real-time implementation has a manipulation resolution of 1 cm and has been tested as a novel 3D input device.
Introduction and Motivation
Visual estimation and tracking of the motion and gestures of the human body is an interesting and exciting computational problem for two reasons: (a) from the engineering standpoint, a non-invasive machine that could track body motion would be invaluable in facilitating most human-machine interactions, and (b) it is an important scientific problem in its own right. Observing the human body in motion is key to a large number of activities and applications:
Security – In museums, factories, and other locations that are either dangerous or sensitive, it is crucial to detect the presence of humans and to monitor and classify their behavior based on their gait and gestures.
Animation – The entertainment industry makes increasing use of actor-to-cartoon animation where the motion of cartoon figures and rendered models is obtained by tracking the motion of a real person.
Virtual reality – The motion of the user of a virtual reality system must be tracked in order to adjust display parameters and animations.
This chapter describes progress in building computer systems that understand people, and can work with them in the manner of an attentive human-like assistant. To accomplish this, I have built a series of real-time experimental testbeds, called Smart Rooms. These testbeds are instrumented with cameras and microphones, and perform audio-visual interpretation of human users. Real-time capabilities include 3D tracking of head, hands, and feet, and recognition of hand/body gestures. The system can also support face recognition and interpretation of face expression.
Introduction
My goal is to make it possible for computers to function like attentive, human-like assistants. I believe that the most important step toward achieving this goal is to give computers an ability that I call perceptual intelligence. They have to be able to characterize their current situation by answering questions such as who, what, when, where, and why, just as writers are taught to do.
In the language of cognitive science, perceptual intelligence is the ability to solve the frame problem: it is being able to classify the current situation, so that you know what variables are important, and thus can act appropriately. Once a computer has the perceptual intelligence to know who, what, when, where, and why, then simple statistical learning methods have been shown to be sufficient for the computer to determine which aspects of the situation are significant, and to choose a helpful course of action [205].
Computer vision-based sensing of people enables a new class of public multi-user computer interfaces. Computing resources in public spaces, such as automated, information-dispensing kiosks, represent a computing paradigm that differs from the conventional desktop environment and, correspondingly, requires a user-interface metaphor quite unlike the traditional WIMP interface. This chapter describes a prototype public computer interface which employs color and stereo tracking to sense the users' activity, and an animated speaking agent to attract attention and communicate through visual and audio modalities.
Introduction
An automated, information-dispensing Smart Kiosk, which is situated in a public space for use by a general clientele, poses a challenging human computer interface problem. A public kiosk interface must be able to actively initiate and terminate interactions with users and divide its resources among multiple customers in an equitable manner. This interaction scenario represents a significant departure from the standard WIMP (windows, icons, mouse, pointer) paradigm, but will become increasingly important as computing resources migrate off the desktop and into public spaces. We are exploring a social interface paradigm for a Smart Kiosk, in which computer vision techniques are used to sense people and a graphical speaking agent is used to output information and communicate cues such as focus of attention.
Human sensing techniques from computer vision can play a significant role in public user-interfaces for kiosk-like appliances. Using unobtrusive video cameras, they can provide a wealth of information about users, ranging from their three-dimensional location to their facial expressions and body language.
This chapter describes the work on human-computer interaction being carried out in our laboratory at the University of Osaka. Recognition of human expressions is necessary for human-computer interactive applications. A vision system is well suited to recognizing human expression because it relies on passive sensing, so the gestures of the hand, body, and face can be recognized without any discomfort for the user. Moreover, the computer should not restrict the user's movements to the area in front of it. We therefore study methods of looking at people using a network of active cameras.
Introduction
Sensing of human expressions is very important for human-computer interactive applications such as virtual reality, gesture recognition, and communication. A vision system is well suited to human-computer interaction because it relies on passive sensing, so the gestures of the hand, body, and face can be recognized without any discomfort for the user. We therefore use cameras as the sensors in our research to estimate human motion and gestures.
Facial expression is a natural human expression and is necessary to communicate emotions such as happiness, surprise, and sadness to others. A large number of studies have been made on machine recognition of human facial expression. Many of them are based on multi-resolution monochrome images and template pattern matching techniques [62, 293, 324]. This kind of approach requires some averaging of the face model, or blurring of the input image, to cope with the differing appearance of faces across images.
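As a rough illustration of the template pattern matching techniques cited above, the following sketch (in Python with OpenCV) compares a blurred, downsampled face image against a small set of averaged expression templates using normalized cross-correlation. The template file names, pyramid depth, and score handling are placeholders assumed for illustration; this is not the cited authors' code.

import cv2

# Hypothetical averaged templates, one per expression; file names are placeholders.
TEMPLATES = {
    "happiness": "avg_happy.png",
    "surprise": "avg_surprise.png",
    "sadness": "avg_sad.png",
}

def blur_and_downsample(gray, levels=2):
    # Blur and halve the image a few times so that individual facial
    # differences are smoothed away before matching.
    for _ in range(levels):
        gray = cv2.pyrDown(cv2.GaussianBlur(gray, (5, 5), 0))
    return gray

def classify_expression(face_gray):
    # Return the label and score of the best-matching expression template.
    # Templates are assumed to be cropped no larger than the probe face.
    probe = blur_and_downsample(face_gray)
    best_label, best_score = None, -1.0
    for label, path in TEMPLATES.items():
        templ = blur_and_downsample(cv2.imread(path, cv2.IMREAD_GRAYSCALE))
        scores = cv2.matchTemplate(probe, templ, cv2.TM_CCOEFF_NORMED)
        score = float(scores.max())
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

Here the blurring and downsampling stand in for the averaging step mentioned above, which lets a single template tolerate moderate variation between individual faces.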
Face it. Butlers cannot be blind. Secretaries cannot be deaf. But somehow we take it for granted that computers can be both.
Human-computer interface dogma was first dominated by direct manipulation and then delegation. The tacit assumption of both styles of interaction has been that the human will be explicit, unambiguous and fully attentive. Equivocation, contradiction and preoccupation are unthinkable even though they are very human behaviors. Not allowed. We are expected to be disciplined, fully focused, single-minded and ‘there’ with every attending muscle in our body. Worse, we accept it.
Times will change. Cipolla, Pentland et al. fly in the face (pun intended) of traditional human-computer interface research. The questions they pose and answers they provide have the common thread of concurrency. Namely, by combining modes of communication, the resulting richness of expression is not only far greater than the sum of the parts, but allows for one channel to disambiguate the other. Look. There's an example right there. Where? Well, you can't see it, because you cannot see me, where I am looking, what's around me. So the example is left to your imagination.
That's fine in literature and for well codified tasks. Works for making plane reservations, buying and selling stocks and, think of it, almost everything we do with computers today. But this kind of categorical computing is crummy for design, debate and deliberation. It is really useless when the purpose of communication is to collect our own thoughts.
A key issue in advanced interface design is the development of friendly tools for natural interaction between user and machine. In this chapter, we propose an approach to non-intrusive human-computer interfacing in which the user's head and pupils are monitored by computer vision for interaction control within on-screen environments. Two different visual pointers are defined, allowing simultaneous and decoupled navigation and selection in 3D and 2D graphic scenarios. The pointers intercept user actions, whose geometry is then remapped onto the environment by a drag and click metaphor providing dialogue with a natural semantics.
Introduction
In the last few years, a huge effort has been made towards building advanced environments for human-machine interaction and for human-human communication mediated by computers. Such environments can improve both the activity and satisfaction of individual users and computer-supported cooperative work. Apart from some obvious implementation and design differences, virtual reality [255], augmented reality [309] and smart room [235] environments share the very same principle of providing users with a more natural dialogue with (and through) the computer than in the past. This is obtained through careful interface design involving interface languages that mimic everyday experience and advanced interaction techniques.
Recently, the simultaneous growth in computing power and decrease in hardware costs, together with the development of specific algorithms and techniques, have encouraged the use of computer vision as a non-intrusive technology for advanced human-machine interaction.
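The abstract above describes two decoupled visual pointers driven by the head and the pupils. A small illustrative sketch of such a mapping follows; it is not the chapter's implementation, and the gains, dwell time, and tracker interface are all assumptions made for illustration.

import time

class VisualPointers:
    # Illustrative only: head_offset and pupil_offset are assumed to be (dx, dy)
    # pairs in [-1, 1] produced by the head and pupil trackers.
    def __init__(self, nav_gain=0.05, sel_gain=800.0, dwell_s=0.8):
        self.nav_gain = nav_gain      # scene units moved per unit head offset
        self.sel_gain = sel_gain      # screen pixels moved per unit pupil offset
        self.dwell_s = dwell_s        # seconds of dwell interpreted as a "click"
        self.viewpoint = [0.0, 0.0]
        self.cursor = (0, 0)
        self._dwell_start = None

    def update(self, head_offset, pupil_offset, now=None):
        # Returns True when a dwell "click" occurs on the current target.
        now = time.monotonic() if now is None else now
        # Navigation pointer: the head pans the viewpoint (decoupled from selection).
        self.viewpoint[0] += self.nav_gain * head_offset[0]
        self.viewpoint[1] += self.nav_gain * head_offset[1]
        # Selection pointer: the pupils move the on-screen cursor.
        new_cursor = (int(self.sel_gain * pupil_offset[0]),
                      int(self.sel_gain * pupil_offset[1]))
        moved = abs(new_cursor[0] - self.cursor[0]) + abs(new_cursor[1] - self.cursor[1]) > 5
        self.cursor = new_cursor
        # Dwell-based click: a cursor that stays put for dwell_s seconds selects the target.
        if moved or self._dwell_start is None:
            self._dwell_start = now
            return False
        return (now - self._dwell_start) >= self.dwell_s

The point of the sketch is the decoupling: navigation and selection are driven by independent measurements, so one channel can be used without disturbing the other.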
In this chapter, we present our approach to recognizing hand signs. It addresses three key aspects of hand sign interpretation: hand location, hand shape, and hand movement. The approach has two major components: (a) a prediction-and-verification segmentation scheme to segment the moving hand from its background; and (b) a recognizer that identifies the hand sign from the temporal sequence of segmented hand shapes together with global motion information. The segmentation scheme can deal with a large number of different hand shapes against complex backgrounds. In the recognition part, we use multiclass, multi-dimensional discriminant analysis in every internal node of a recursive partition tree to automatically select the most discriminating linear features for gesture classification. The method has been tested on 28 classes of hand signs. The experimental results show that the system achieves a 93% recognition rate on test sequences that were not used in the training phase.
Introduction
The ability to interpret hand gestures is essential if computer systems are to interact with human users in a natural way. In this chapter, we present a new vision-based framework which allows the computer to interact with users through hand signs.
Since its first known dictionary was printed in 1856 [61], American Sign Language (ASL) has been widely used in the deaf community, as well as by people with disabilities who are not deaf [49].
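As a sketch of the discriminant analysis step described in the abstract above, the code below computes, for the training samples reaching one internal node of a recursive partition tree, the linear features that maximize between-class scatter relative to within-class scatter (Fisher's criterion). The array shapes, the ridge term, and the function name are assumptions for illustration; this is not the chapter's implementation.

import numpy as np

def most_discriminating_features(X, y, n_features, ridge=1e-6):
    # X: (n_samples, dim) feature vectors reaching this node; y: (n_samples,) class labels.
    # Returns a (dim, n_features) projection whose columns are the most
    # discriminating linear features under Fisher's criterion.
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))      # within-class scatter
    Sb = np.zeros((dim, dim))      # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Leading eigenvectors of inv(Sw) Sb maximize between-class scatter relative
    # to within-class scatter; a small ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(dim), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_features]].real

Each internal node would then project its samples onto these features and split them, for example by nearest class centre, to form its children.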
Face and hand gestures are an important means of communication between humans. Similarly, automatic face and gesture recognition systems could be used for contact-less human-machine interaction. Developing such systems is difficult, however, because faces and hands are both complex and highly variable structures. We describe how flexible models can be used to represent the varying appearance of faces and hands and how these models can be used for tracking and interpretation. Experimental results are presented for face pose recovery, face identification, expression recognition, gender recognition and gesture interpretation.
Introduction
This chapter addresses the problem of locating and interpreting faces and hand gestures in images. By interpreting face images we mean recovering the 3D pose, identifying the individual, and recognizing the expression and gender; for hand images we mean recognizing the configuration of the fingers. In both cases different instances of the same class are not identical; for example, face images belonging to the same individual will vary because of changes in expression, lighting conditions, 3D pose and so on. Similarly, hand images displaying the same gesture will vary in form.
We have approached these problems by modeling the ways in which the appearance of faces and hands can vary, using parametrised deformable models which take into account all the main sources of variability. A robust image search method [90, 89] is used to fit the models to new face/hand images recovering compact parametric descriptions.
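A compact sketch in the spirit of such flexible models is given below: it assumes a PCA-based point distribution model, which may differ in detail from the models of [90, 89]. Aligned landmark shapes are summarised by a mean shape plus a few modes of variation, and any new face or hand instance is described by a small parameter vector b.

import numpy as np

def build_shape_model(shapes, var_kept=0.98):
    # shapes: (n_examples, 2 * n_landmarks) array of aligned landmark coordinates.
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Keep enough modes to explain var_kept of the total variance.
    n_modes = int(np.searchsorted(np.cumsum(evals) / evals.sum(), var_kept)) + 1
    return mean, evecs[:, :n_modes], evals[:n_modes]

def fit_parameters(shape, mean, modes, evals):
    # Project a new aligned shape into the model and clamp the parameters
    # so that reconstructed shapes stay plausible.
    b = modes.T @ (shape - mean)
    limit = 3.0 * np.sqrt(evals)
    return np.clip(b, -limit, limit)

def reconstruct(mean, modes, b):
    # Generate the landmark shape described by the compact parameter vector b.
    return mean + modes @ b

Clamping each parameter to a few standard deviations of its mode keeps reconstructed shapes plausible during image search, which is what makes the fitted parametric descriptions compact and robust.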
In this chapter I describe ongoing research that seeks to provide a common framework for the generation and interpretation of spontaneous gesture in the context of speech. I present a testbed for this framework in the form of a program that generates speech, gesture, and facial expression from underlying rules specifying (a) what speech and gesture are generated on the basis of a given communicative intent, (b) how communicative intent is distributed across communicative modalities, and (c) where one can expect to find gestures with respect to the other communicative acts. Finally, I describe a system that has the capacity to interpret communicative facial, gestural, intonational, and verbal behaviors.
Introduction
I am addressing in this chapter one very particular use of the term “gesture” – that is, hand gestures that co-occur with spoken language. Why such a narrow focus, given that so much of the work on gesture in the human-computer interface community has focused on gestures as their own language – gestures that might replace the keyboard or mouse or speech as a direct command language? Because I don't believe that everyday human users have any more experience with, or natural affinity for, a “gestural language” than they have with DOS commands. We have plenty of experience with actions and the manipulation of objects. But the type of gestures defined by Väänänen & Böhm (1993) as “body movements which are used to convey some information from one person to another” is in fact primarily found in association with spoken language (90% of gestures occur in the context of speech, according to McNeill, 1992).