This book is, in essence, the dissertation I submitted to the University of Edinburgh in early January 1994. My examiners, Peter Harrison of Imperial College and Stuart Anderson of the University of Edinburgh, suggested some corrections and revisions. Apart from those changes, most chapters remain unaltered except for minor corrections and reformatting. The exceptions are the first and final chapters.
Since the final chapter discusses several possible directions for future work, it is now supplemented with a section reviewing the progress made in each of these directions since January 1994. There are now many more people interested in stochastic process algebras and their application to performance modelling. Moreover, since these researchers have backgrounds and motivations different from my own, some of the most interesting new developments lie outside the areas identified in the original conclusions of the thesis. The book therefore concludes with a brief overview of the current status of the field, including many recent references. This change to the structure of the book is reflected in the summary given in Chapter 1. No other chapters of the thesis have been updated to reflect more recent developments. A modified version of Chapter 8 appeared in the proceedings of the 2nd International Workshop on Numerical Solution of Markov Chains, January 1995.
I would like to thank my supervisor, Rob Pooley, for introducing me to performance modelling and giving me the job which brought me to Edinburgh initially.
This thesis has developed a coherent framework for analysing image sequences based on the affine camera, and has demonstrated the practical feasibility of recovering 3D structure and motion in a bottom-up fashion, using "corner" features. New algorithms have been proposed to compute affine structure, and these have then been applied to the problems of clustering and view transfer. The theory of affine epipolar geometry has been derived and applied to outlier rejection and rigid motion estimation. Due consideration has been paid to error and noise models, with a χ² test serving as a termination criterion for cluster growth and outlier detection, and confidence limits in the motion parameters facilitating Kalman filtering.
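To make the role of the χ² test concrete, the following is a minimal sketch of such a gating criterion; the function name, the isotropic-noise assumption and the confidence level are illustrative choices, not the interface used in the thesis.

```python
# Sketch: chi-squared gating of residuals, as used to terminate cluster
# growth or to flag outliers. The isotropic noise model (a single sigma)
# is an illustrative assumption; the thesis derives its own error models.
import numpy as np
from scipy.stats import chi2

def passes_chi2_test(residuals, sigma, confidence=0.95):
    """Accept a hypothesis (e.g. 'this point belongs to the cluster') if
    the normalised sum of squared residuals falls below the chi-squared
    threshold at the given confidence level."""
    residuals = np.asarray(residuals, dtype=float)
    statistic = np.sum((residuals / sigma) ** 2)   # ~ chi^2 with len(residuals) d.o.f.
    threshold = chi2.ppf(confidence, df=residuals.size)
    return statistic <= threshold

# Example: image-plane residuals (pixels) under noise of sigma = 1 pixel.
print(passes_chi2_test([0.6, -1.1, 0.4, 0.9], sigma=1.0))  # True: consistent with noise
print(passes_chi2_test([4.0, -5.2, 3.8, 6.1], sigma=1.0))  # False: reject as outlier
```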
On a practical level, all the algorithms have been implemented and tested on a wide range of sequences. The use of n points and m frames has led to enhanced noise immunity and has also simplified the algorithms in important ways, e.g. local coordinate frames are no longer needed to compute affine structure or rigid motion parameters. Finally, the use of 3D information without explicit depth has been illustrated in a working system (e.g. for transfer).
In summary, the affine camera has been shown to provide a solid foundation both for understanding structure and motion under parallel projection, and for devising reliable algorithms.
Future work
There are many interesting problems for future work to address. First, the CI space interpretation of the motion segmentation problem is that each independently moving object contributes a different 3D linear subspace.
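The flavour of this claim can be seen in a small synthetic experiment (a toy sketch under random affine cameras, not a proposed segmentation algorithm): the trajectory vectors of each rigid object span a rank-4 subspace, and two independent motions together double the rank.

```python
# Toy illustration: under affine projection, the 2m-dimensional trajectory
# vectors of points on a single rigid object lie in a low-dimensional linear
# subspace; independently moving objects occupy different subspaces.
# All quantities here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 10                       # frames, points per object

def trajectories(n_pts):
    """Project random 3D points of one rigid object through a random
    2x4 affine camera per frame; returns a 2m x n measurement matrix."""
    X = np.vstack([rng.normal(size=(3, n_pts)),
                   np.ones((1, n_pts))])            # homogeneous 3D points
    return np.vstack([rng.normal(size=(2, 4)) @ X for _ in range(m)])

W1, W2 = trajectories(n), trajectories(n)           # two independent motions
print(np.linalg.matrix_rank(W1))                    # 4 (one subspace, generically)
print(np.linalg.matrix_rank(np.hstack([W1, W2])))   # 8 (two distinct subspaces)
```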
The first competence required of a motion analysis system is the accurate and robust measurement of image motion. This chapter addresses the problem of tracking independently-moving (and possibly non-rigid) objects in a long, monocular image sequence. "Corner features" are automatically identified in the images and tracked through successive frames, generating image trajectories. This system forms the low-level front-end of our architecture (cf. Figure 1.1), making reliable trajectory computation of the utmost importance, for these trajectories underpin all subsequent segmentation and motion estimation processes.
We build largely on the work of Wang and Brady [156, 157], and extend their successful corner-based stereo algorithm to the motion domain. Their key idea was to base correspondence on both similarity of local image structure and geometric proximity. There are, however, several ways in which motion correspondence is more complex than stereo correspondence [90]. For one thing, objects can change between temporal viewpoints in ways that they cannot between spatial viewpoints, e.g. their shape and reflectance can alter. For another, the epipolar constraint is no longer hard-wired by once-off calibration of a stereo rig; motion induces variable epipolar geometry which has to be continuously updated (if the constraint is to be used). Furthermore, motion leads to arbitrarily long image sequences (instead of frame pairs), which requires additional tracking machinery. The benefits are that temporal integration facilitates noise resistance, resolves ambiguities over time, and speeds up matching (via prediction).
Our framework has two parts: the matcher performs two-frame correspondence while the tracker maintains the multi-frame trajectories. Each corner is treated as an independent feature at this level (i.e. assigned an individual tracker as in [26]), and is tracked purely within the image plane. Section 2.2 justifies this feature-based approach and establishes the utility of corners as correspondence tokens.
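As an illustration of the two matching cues, the following sketch scores candidates by patch similarity and penalises distance; the weights, search radius and normalised cross-correlation measure are illustrative assumptions rather than the actual parameters of the matcher.

```python
# Minimal sketch of the matcher's two cues: local image-structure similarity
# (normalised cross-correlation of patches) combined with geometric proximity.
import numpy as np

def ncc(p, q):
    """Normalised cross-correlation between two equally sized patches."""
    p = (p - p.mean()) / (p.std() + 1e-9)
    q = (q - q.mean()) / (q.std() + 1e-9)
    return float((p * q).mean())

def match_corner(corner, patch, candidates, image2, half=3,
                 radius=20.0, w_sim=1.0, w_prox=0.5):
    """Match one corner from frame t to the best candidate in frame t+1.
    Proximity acts as a hard gate (candidates outside the search radius
    are excluded); survivors are scored by similarity minus a distance
    penalty. Candidates whose patch overruns the image border are skipped."""
    best, best_score = None, -np.inf
    for (x, y) in candidates:
        d = np.hypot(x - corner[0], y - corner[1])
        if d > radius:
            continue
        cand_patch = image2[y - half:y + half + 1, x - half:x + half + 1]
        if cand_patch.shape != patch.shape:
            continue
        score = w_sim * ncc(patch, cand_patch) - w_prox * (d / radius)
        if score > best_score:
            best, best_score = (x, y), score
    return best
```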
This chapter tackles the motion estimation problem, using affine epipolar geometry as the tool. Given m distinct views of n points located on a rigid object, the task is to compute its 3D motion without any prior 3D knowledge. There are several reasons why many existing point-based motion algorithms are of limited practical use: the inevitable presence of noise is often ignored; unreasonable demands are often made on prior processing (e.g. a suitable perceptual frame must first be selected, the features must appear in every frame, etc.); algorithms often only work in special cases (e.g. rotation about a fixed axis); and some algorithms require batch processing, rather than more natural sequential processing.
Although the epipolar constraint has been widely used in perspective and projective motion applications [43, 57, 87] (e.g. to aid correspondence, recover the translation direction and compute rigid motion parameters), it has seldom been used under affine viewing conditions (though see [66, 79]). This chapter therefore makes the following contributions:
Affine epipolar geometry is related to the rigid motion parameters, and Koenderink and van Doorn's novel motion representation is formalised [79]. The scale, cyclotorsion angle and projected axis of rotation are then computed directly from the epipolar geometry (i.e. using two views); a least-squares sketch of the constraint itself follows this list. The only camera calibration parameter needed here is the aspect ratio. A suitable error model is also derived.
Images are processed in successive pairs of frames, facilitating extension to the m-view case in a sequential (rather than batch) processing mode.
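For concreteness, the affine epipolar constraint is linear in the image coordinates, a x′ + b y′ + c x + d y + e = 0, so its coefficients can be recovered from n ≥ 4 correspondences by total least squares. The sketch below shows this baseline fit only; the thesis couples such a fit with an explicit error model and outlier rejection.

```python
# Sketch: least-squares estimation of the affine epipolar constraint
#     a x' + b y' + c x + d y + e = 0
# from n >= 4 point correspondences, via the smallest singular vector of
# the centred measurement matrix.
import numpy as np

def affine_epipolar_fit(pts1, pts2):
    """pts1, pts2: (n, 2) arrays of corresponding points in two views.
    Returns (a, b, c, d, e) minimising the sum of squared algebraic
    residuals subject to a^2 + b^2 + c^2 + d^2 = 1."""
    X = np.hstack([pts2, pts1])            # rows: [x', y', x, y]
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean)     # total least squares on the hyperplane
    n_vec = Vt[-1]                         # (a, b, c, d), unit norm
    e = -float(n_vec @ mean)               # recover the offset from the centroids
    return (*n_vec, e)
```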
Once the corner tracker has generated a set of image trajectories, the next task is to group these points into putative objects. The practice of classifying objects into sensible groupings is termed “clustering”, and is fundamental to many scientific disciplines. This chapter presents a novel clustering technique that groups points together on the basis of their affine structure and motion. The system copes with sparse, noisy and partially incorrect input data, and with scenes containing multiple, independently moving objects undergoing general 3D motion. The key contributions are as follows:
A graph theory framework is employed (in the spirit of [119]) using maximum affinity spanning trees (MASTs), and the clusters are computed by a local, parallel network, with each unit performing simple operations; a minimal sketch of such cluster growth follows this list. The use of such networks has long been championed by Ullman, who has used them to fill in subjective contours [150], compute apparent motion [151] and detect salient curves [132].
Clustering occurs over multiple frames, unlike the more familiar two-frame formulations (e.g. [3, 80, 134]).
A graduated motion analysis scheme extends the simplistic image motion models in common use; e.g. grouping on the basis of parallel and equal image velocity vectors (as in [80, 134]) is only valid for a fronto-parallel plane translating parallel to the image. The layered complexity of our models utilises full 3D information where available, but does not use a more complex model than is required.
The termination criteria (to control cluster growth) are based on sound statistical noise models, in contrast to many heuristic measures and thresholds (e.g. [119, 134]).
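The promised sketch of MAST-style cluster growth appears below. It uses a Prim-style greedy rule on a generic affinity matrix, with a plain threshold standing in for the thesis's χ²-based termination test; the affinity measure itself (compatibility of affine structure and motion) is abstracted away.

```python
# Sketch: growing a cluster along a maximum affinity spanning tree (MAST).
# A simple threshold replaces the chi-squared termination criterion, and
# the affinity matrix is assumed given.
import numpy as np

def grow_mast_cluster(affinity, seed, min_affinity):
    """Prim-style growth: repeatedly absorb the unclustered point with the
    highest affinity to the current cluster, stopping when the best
    remaining affinity falls below the termination threshold.
    affinity: (n, n) symmetric matrix; seed: index of the starting point."""
    n = affinity.shape[0]
    in_cluster = np.zeros(n, dtype=bool)
    in_cluster[seed] = True
    best = affinity[seed].copy()        # best link from cluster to each point
    while True:
        best[in_cluster] = -np.inf      # never re-absorb cluster members
        j = int(np.argmax(best))
        if best[j] < min_affinity:      # termination test (chi-squared in the thesis)
            break
        in_cluster[j] = True
        best = np.maximum(best, affinity[j])
    return np.flatnonzero(in_cluster)
```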
Sight is the sense that provides the highest information content – in engineering terms, the highest bandwidth – to the human brain. A computer vision system, essentially a "TV camera connected to a computer", aims to perform on a machine the tasks which our own visual system seems to perform so effortlessly. Since the world is constantly in motion, it comes as no surprise that time-varying imagery reveals valuable information about the environment. Indeed, some information is easier to obtain from an image sequence than from a single image [62]. Thus, as noted by Murray and Buxton, "understanding motion is a principal requirement for a machine or organism to interact meaningfully with its environment" [100] (page 1). For this reason, the analysis of image sequences to extract 3D motion and structure has been at the heart of computer vision research for the past decade [172].
The problem involves two key difficulties. First, the useful content of an image sequence is intricately coded and implicit in an enormous volume of sensory data. Making this information explicit entails significant data reduction, to decode the spatio-temporal correlations of the intensity values and eliminate redundancy. Second, information is lost in projecting the three spatial dimensions of the world onto the two dimensions of the image. Assumptions about the camera model and imaging geometry are therefore required.
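For example, the affine camera on which this thesis builds (its standard form, stated here for concreteness) maps a world point X to an image point x linearly, up to an offset:

\[
  \mathbf{x} \;=\; M\,\mathbf{X} + \mathbf{t},
  \qquad M \in \mathbb{R}^{2 \times 3},\;
  \mathbf{t} \in \mathbb{R}^{2},
\]

a model which subsumes orthographic, weak-perspective and para-perspective projection as special cases.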
This thesis develops new algorithms to interpret visual motion using a single camera, and demonstrates the practical feasibility of recovering scene structure and motion in a data-driven (or “bottom-up”) fashion. Section 1.2 outlines the basic themes and describes the system architecture.