Unsupervised Methods

Brian D. Ripley

doi:10.1017/CBO9780511812651.010

Unsupervised methods are used when no classes are defined a priori, or when classes are defined but the data are to be used to confirm that they are suitable. Examples of the latter type are quite common in biology, where species are often defined by physical characteristics, and datasets of biochemical measurements subsequently become available. The interesting question is then whether the physical and biochemical measurements define the same classification. A variant of this occurs with our Leptograpsus crabs data. There the division into species was based on colour, and the interesting question is whether this division is supported by morphological differences. Our analyses hitherto have sought morphological differences that support it, but this raises the question of whether there might be even more striking differences unrelated to colour.

Unsupervised methods are generally designed for visualization, either to show views of the data that indicate groups, or to show affinities between the examples by displaying similar examples close together. Dendrograms are a one-dimensional display of similarity, with the height of each join indicating (dis)similarity. For example, Figure 9.1 shows a dendrogram of the Cushing's syndrome data. Every pair of examples is joined at some point in the tree, and the height at which they are joined indicates their dissimilarity. This plot clearly shows that one point (labelled u) is very different from the rest, and the tree does tend to group the diseases together, albeit imperfectly. However, these data are only two-dimensional, so they can also be plotted directly, as in Figure 1.2 on page 11.
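A dendrogram of this kind is usually built by agglomerative hierarchical clustering: start with every example as its own group, repeatedly join the two closest groups, and record the height of each join. The sketch below assumes Euclidean dissimilarities and average linkage by default (the text does not say which linkage was used for Figure 9.1); the function names and the five toy points are hypothetical and are not the Cushing's syndrome data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, linkage="average"):
    """Agglomerative clustering (hypothetical helper, not from the source).

    Returns a list of merges (cluster_a, cluster_b, height), where each cluster
    is a frozenset of point indices and height is the linkage dissimilarity at
    which the two clusters are joined, i.e. the height drawn in a dendrogram.
    """
    # Pairwise dissimilarities between individual points, keyed by (i, j), i < j.
    d = {(i, j): euclidean(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)}

    def dist(c1, c2):
        # All point-to-point dissimilarities between the two (disjoint) clusters.
        pair_d = [d[(min(i, j), max(i, j))] for i in c1 for j in c2]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        return sum(pair_d) / len(pair_d)  # average linkage (UPGMA)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of current clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical toy data: two tight pairs plus one isolated point (index 4).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 0)]

for a, b, h in agglomerate(points):
    print(sorted(a), sorted(b), round(h, 3))
```

On these toy points the two tight pairs join at height 1, the pairs join each other at about 7.09, and the isolated point 4 joins last at about 18.0, the same kind of pattern that singles out point u in Figure 9.1. Passing `linkage="single"` or `linkage="complete"` changes how group-to-group dissimilarity is defined, and the resulting trees can differ noticeably.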
