
9 - Clustering, classification and data mining

Published online by Cambridge University Press:  05 November 2012

Eric D. Feigelson, Pennsylvania State University
G. Jogesh Babu, Pennsylvania State University

Summary

Multivariate analysis discussed in Chapter 8 seeks to characterize structural relationships among the p variables that may be present in addition to random scatter. The primary structural relations may link the subpopulations without characterizing the structure of any one population.

In such cases, the scientist should first attempt to discriminate the subpopulations. This is the subject of multivariate clustering and classification. Clustering refers to situations where the subpopulations must be estimated from the dataset alone, whereas classification refers to situations where training datasets of known populations are available independently of the dataset under study. When the datasets are very large and have well-characterized training sets, classification is a major component of data mining. The efforts to find concentrations in a multivariate distribution of points are closely allied with clustering analysis of spatial distributions when p = 2 or 3; for such low-p problems, the reader is encouraged to examine Chapter 12 along with the present discussion.
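The distinction between clustering and classification can be illustrated with a minimal sketch. It is not taken from the chapter: the toy data, the function names, and the choice of k-means and a nearest-centroid rule are all assumptions made for illustration, and Python is used here only for convenience. Clustering estimates the groups from the dataset alone, while classification assigns objects using an independent training set of known membership.

```python
import math

def centroid(points):
    """Mean position of a non-empty list of p-dimensional points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(points, init_centers, n_iter=20):
    """Clustering: group memberships are estimated from the data alone."""
    centers = list(init_centers)
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = centroid(members)
    labels = [nearest(p, centers) for p in points]
    return labels, centers

def fit_nearest_centroid(train_points, train_labels):
    """Classification: class centroids are learned from a labelled training set."""
    return {
        c: centroid([p for p, lab in zip(train_points, train_labels) if lab == c])
        for c in sorted(set(train_labels))
    }

def classify(point, class_centroids):
    """Assign point to the class whose training centroid is closest."""
    return min(class_centroids, key=lambda c: math.dist(point, class_centroids[c]))

# Hypothetical dataset under study: two well-separated groups with p = 2.
data = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.2), (5.2, 4.8)]

# Clustering uses only `data`; the initial centers are an arbitrary choice.
cluster_labels, centers = kmeans(data, init_centers=[data[0], data[3]])

# Classification uses a separate (hypothetical) training set of known classes.
train_points = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (4.7, 5.3)]
train_labels = ["A", "A", "B", "B"]
model = fit_nearest_centroid(train_points, train_labels)
predicted = [classify(p, model) for p in data]
```

In this sketch the cluster labels are arbitrary integers with no physical meaning attached, whereas the predicted classes inherit the names given in the training set.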

The astronomical context

Since the advent of astrophotography and spectroscopy over a century ago, astronomers have faced the challenge of characterizing and understanding vast numbers of asteroids, stars, galaxies and other cosmic populations. A crucial step towards astrophysical understanding was the classification of objects into distinct, and often ordered, categories which contain objects sharing similar properties. Over a century ago, A. J. Cannon examined hundreds of thousands of low-resolution photographic stellar spectra, classifying them in the OBAFGKM sequence of decreasing surface temperature.

Type: Chapter
Information: Modern Statistical Methods for Astronomy: With R Applications, pp. 222-260
Publisher: Cambridge University Press
Print publication year: 2012
