Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Probability
- 3 Statistical inference
- 4 Probability distribution functions
- 5 Nonparametric statistics
- 6 Data smoothing: density estimation
- 7 Regression
- 8 Multivariate analysis
- 9 Clustering, classification and data mining
- 10 Nondetections: censored and truncated data
- 11 Time series analysis
- 12 Spatial point processes
- Appendix A Notation and acronyms
- Appendix B Getting started with R
- Appendix C Astronomical datasets
- References
- Subject index
- R and CRAN commands
- Plate section
8 - Multivariate analysis
Published online by Cambridge University Press: 05 November 2012
Summary
The astronomical context
Whenever an astronomer is faced with a dataset that can be presented as a table — rows representing celestial objects and columns representing measured or inferred properties — then the many tools of multivariate statistics come into play. Multivariate datasets also arise in other situations. Astronomical images can be viewed as tables of three variables: right ascension, declination and brightness. Here the spatial variables are in a fixed lattice while the brightness is a random variable. An astronomical datacube has a fourth variable that may be wavelength (for spectro-imaging) or time (for multi-epoch imaging). High-energy (X-ray, gamma-ray, neutrino) detectors give tables where each row is a photon or event with columns representing properties such as arrival direction and energy. Calculations arising from astrophysical models also produce outputs that can be formulated as multivariate datasets, such as N-body simulations of star or galaxy interactions, or hydrodynamical simulations of gas densities and motion.
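As a minimal sketch of the image-as-table view described above, the following Python snippet flattens a small two-dimensional image into rows of (right ascension, declination, brightness). The pixel values, the function name `image_to_table`, and the lattice parameters `ra0`, `dec0` and `step` are all hypothetical, chosen only for illustration; a real image would need a proper world-coordinate transformation.

```python
# Hypothetical 2 x 3 image: brightness measured on a fixed pixel lattice.
image = [
    [0.1, 0.4, 0.2],
    [0.3, 0.9, 0.5],
]


def image_to_table(image, ra0=0.0, dec0=0.0, step=1.0):
    """Flatten a 2-D image into rows of (ra, dec, brightness).

    ra0, dec0 and step are illustrative lattice parameters (assumptions),
    not a real world-coordinate-system transformation.
    """
    rows = []
    for j, line in enumerate(image):            # j indexes declination
        for i, brightness in enumerate(line):   # i indexes right ascension
            rows.append((ra0 + i * step, dec0 + j * step, brightness))
    return rows


table = image_to_table(image)
n = len(table)      # number of rows (pixels)
p = len(table[0])   # number of variables: ra, dec, brightness
```

Here the two spatial columns are fixed by the lattice, while only the third column, brightness, would be treated as a random variable.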
For multivariate datasets, we designate n for the number of objects in the dataset and p for the number of variables, the dimensionality of the problem. In traditional multivariate analysis, n is large compared to p; statistical methods for high-dimensional problems with p > n are now under development. The variables can have a variety of forms: real numbers representing measurements in any physical unit; integer values representing counts of some variable; ordinal values representing a sequence; binary variables representing “Yes/No” categories; or nonsequential categorical indicators.
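The notation can be illustrated with a small sketch in Python. The catalog below is entirely made up: the column names, the values, and the helper names `shape` and `is_high_dimensional` are assumptions introduced only to show how n, p and the variable types relate.

```python
# Hypothetical column descriptions, one per variable type named in the text.
columns = {
    "flux_mjy": "real",            # measurement in a physical unit
    "photon_count": "count",       # integer count
    "brightness_rank": "ordinal",  # position in a sequence
    "detected": "binary",          # Yes/No category
    "survey": "categorical",       # nonsequential category
}

# Hypothetical catalog: each row is one object, each column one property.
rows = [
    (12.3, 40, 1, True, "A"),
    (3.7, 11, 3, False, "B"),
    (8.1, 25, 2, True, "A"),
]


def shape(rows):
    """Return (n, p): number of objects and number of variables."""
    return len(rows), len(rows[0])


def is_high_dimensional(n, p):
    """True when there are more variables than objects (p > n)."""
    return p > n


n, p = shape(rows)
high_dim = is_high_dimensional(n, p)
```

With three objects and five variables, this toy table falls in the p > n regime that traditional multivariate methods do not cover; a typical catalog would have many more rows than columns.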
We address multivariate issues in several chapters of this volume. The present chapter on multivariate analysis considers datasets that are commonly displayed in a table of objects and properties.
- Type: Chapter
- Information: Modern Statistical Methods for Astronomy: With R Applications, pp. 190-221
- Publisher: Cambridge University Press
- Print publication year: 2012