Book contents
- Frontmatter
- Contents
- Preface
- Notation
- 1 Introduction and Examples
- 2 Statistical Decision Theory
- 3 Linear Discriminant Analysis
- 4 Flexible Discriminants
- 5 Feed-forward Neural Networks
- 6 Non-parametric Methods
- 7 Tree-structured Classifiers
- 8 Belief Networks
- 9 Unsupervised Methods
- 10 Finding Good Pattern Features
- A Statistical Sidelines
- Glossary
- References
- Author Index
- Subject Index
9 - Unsupervised Methods
Published online by Cambridge University Press: 05 August 2014
Summary
Unsupervised methods are used when no classes are defined a priori, or when they are but the data are to be used to confirm that these are suitable classes. Examples of the latter type are quite common in biology, where species are often defined by physical characteristics, and datasets of biochemical measurements become available. The interesting question is then whether the physical and biochemical measurements define the same classification. A variant of this occurs with our Leptograpsus crabs data. There the division into species was based on colour, and the interesting question is whether this is supported by morphological differences. Our analyses hitherto have been to find supporting morphological differences, but this begs the question of whether there might be even more striking differences unrelated to colour.
Unsupervised methods are generally designed for visualization, either to show views of the data which indicate groups, or to show affinities between the examples by displaying similar examples close together. Dendrograms are a one-dimensional display of similarity, with the height of the join indicating (dis)similarity. For example, Figure 9.1 shows a dendrogram of the Cushing's syndrome data. Each pair is joined in the tree, and the height at which they are joined is an indication of their dissimilarity. This plot shows clearly that one point (labelled u) is very different from the rest, and does tend to group the diseases together, imperfectly. However, this is two-dimensional data, and the data can be plotted as in Figure 1.2 on page 11.
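To make the notion of join height concrete, the following is a minimal sketch of how the merges underlying a dendrogram can be computed. It assumes single-linkage agglomerative clustering with Euclidean distance, which is only one of several possible choices and is not necessarily the method behind Figure 9.1. The function name `single_linkage` and the five data points are hypothetical and made up for illustration; they are not the Cushing's syndrome data.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Agglomerative clustering with single linkage (an assumed choice).

    Returns the merges in the order they happen, each as
    (members_a, members_b, height), where height is the smallest
    Euclidean distance between a point of one cluster and a point
    of the other.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        height, a, b = min(
            (min(dist(points[i], points[j]) for i in ca for j in cb), ca, cb)
            for ca, cb in combinations(clusters, 2)
        )
        # Replace the two clusters by their union and record the join.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges


# Made-up two-dimensional data: two tight groups and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 20.0)]
merges = single_linkage(points)
for a, b, height in merges:
    print(f"join {a} and {b} at height {height:.2f}")
```

In this toy example the two tight groups form first at small heights, and the distant point (index 4) is joined last at a much greater height, which is how a very different example such as the one labelled u would stand out in a dendrogram.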
- Type: Chapter
- Information: Pattern Recognition and Neural Networks, pp. 287-326. Publisher: Cambridge University Press. Print publication year: 1996