9 - Clustering, classification and data mining

Eric D. Feigelson; G. Jogesh Babu

doi:10.1017/CBO9781139015653.010

Published online by Cambridge University Press: 05 November 2012


Eric D. Feigelson, Pennsylvania State University
G. Jogesh Babu, Pennsylvania State University


Summary

Multivariate analysis, discussed in Chapter 8, seeks to characterize structural relationships among the p variables that may be present in addition to random scatter. When the dataset comprises several distinct subpopulations, however, the primary structural relations may merely link the subpopulations to one another without characterizing the structure of any one population.
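The following minimal sketch uses plain Python with only the standard library. It simulates two hypothetical subpopulations that have no x-y relationship within either one, but whose centers are offset along the diagonal. The helper names (`simulate_subpopulation`, `pearson`), the sample sizes and the offset are illustrative assumptions, not taken from the chapter.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_subpopulation(n, mean_x, mean_y, sigma, rng):
    """Draw n points with independent Gaussian scatter about (mean_x, mean_y)."""
    xs = [rng.gauss(mean_x, sigma) for _ in range(n)]
    ys = [rng.gauss(mean_y, sigma) for _ in range(n)]
    return xs, ys


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Two hypothetical subpopulations: x and y are independent within each one,
# but the second group's center is shifted by (5, 5) relative to the first.
ax, ay = simulate_subpopulation(500, 0.0, 0.0, 1.0, rng)
bx, by = simulate_subpopulation(500, 5.0, 5.0, 1.0, rng)

r_a = pearson(ax, ay)                     # within subpopulation A
r_b = pearson(bx, by)                     # within subpopulation B
r_pooled = pearson(ax + bx, ay + by)      # both groups analyzed together

print(f"within A: {r_a:+.2f}, within B: {r_b:+.2f}, pooled: {r_pooled:+.2f}")
```

The pooled correlation coefficient is large even though each subpopulation, taken alone, shows essentially none. The apparent structure describes how the two groups are offset from each other, not the internal structure of either group.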

When distinct subpopulations are present, the scientist should first attempt to discriminate them. This is the subject of multivariate clustering and classification. Clustering refers to situations where the subpopulations must be estimated from the dataset alone, whereas classification refers to situations where training datasets of known populations are available independently of the dataset under study. When the datasets are very large and well-characterized training sets exist, classification is a major component of data mining. The search for concentrations in a multivariate distribution of points is closely allied with the clustering analysis of spatial distributions when p = 2 or 3; for such low-p problems, the reader is encouraged to examine Chapter 12 along with the present discussion.
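To make the distinction between clustering and classification concrete, the following standard-library Python sketch treats the same two-dimensional (p = 2) dataset both ways:

- `kmeans` is a plain Lloyd's k-means with farthest-first initialization. It sees only the unlabeled points.
- `nearest_centroid_classifier` learns one centroid per class from a separate, labeled training set.

Both are hypothetical helpers chosen for illustration, not the specific procedures developed in the chapter. The simulated groups are invented.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def centroid(points: List[Point]) -> Point:
    """Mean position of a non-empty list of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def kmeans(points: List[Point], k: int, n_iter: int = 20) -> List[int]:
    """Clustering: estimate k groups from the unlabeled data alone (Lloyd's k-means)."""
    # Farthest-first initialization: start at the first point, then repeatedly
    # add the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda pt: min(dist(pt, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest current center.
        labels = [min(range(k), key=lambda j: dist(pt, centers[j])) for pt in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


def nearest_centroid_classifier(training: Dict[str, List[Point]]) -> Callable[[Point], str]:
    """Classification: learn one centroid per known class from a labeled training set."""
    centers = {name: centroid(pts) for name, pts in training.items()}

    def classify(pt: Point) -> str:
        # Assign the object to the class whose training centroid is nearest.
        return min(centers, key=lambda name: dist(pt, centers[name]))

    return classify


rng = random.Random(1)  # fixed seed so the sketch is reproducible


def blob(cx: float, cy: float, n: int) -> List[Point]:
    """n points with isotropic Gaussian scatter (sigma = 0.5) about (cx, cy)."""
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


# Dataset under study: 200 unlabeled points drawn from two hypothetical groups.
data = blob(0.0, 0.0, 100) + blob(4.0, 4.0, 100)
cluster_labels = kmeans(data, k=2)  # arbitrary integer labels, no class names

# Independent training set whose class memberships are known in advance.
training = {"class_1": blob(0.0, 0.0, 50), "class_2": blob(4.0, 4.0, 50)}
classify = nearest_centroid_classifier(training)
predicted = [classify(pt) for pt in data]
```

The two approaches return different kinds of labels:

- The clustering step returns arbitrary integer labels. These carry no physical meaning until the scientist interprets them.
- The classifier returns the names of classes defined in advance by the training set.

Real problems rarely have groups this well separated. In addition, k-means tends to favor roughly spherical clusters of similar size.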

The astronomical context

Since the advent of astrophotography and spectroscopy in the nineteenth century, astronomers have faced the challenge of characterizing and understanding vast numbers of asteroids, stars, galaxies and other cosmic populations. A crucial step towards astrophysical understanding was the classification of objects into distinct, and often ordered, categories whose members share similar properties. Over a century ago, A. J. Cannon examined hundreds of thousands of low-resolution photographic stellar spectra, classifying them in the OBAFGKM sequence of decreasing surface temperature.

Type: Chapter
Information: Modern Statistical Methods for Astronomy: With R Applications, pp. 222-260


Publisher: Cambridge University Press

Print publication year: 2012
