Book contents
- Frontmatter
- Contents
- Preface
- List of abbreviations
- 1 Basic notions in classical data analysis
- 2 Linear multivariate statistical analysis
- 3 Basic time series analysis
- 4 Feed-forward neural network models
- 5 Nonlinear optimization
- 6 Learning and generalization
- 7 Kernel methods
- 8 Nonlinear classification
- 9 Nonlinear regression
- 10 Nonlinear principal component analysis
- 11 Nonlinear canonical correlation analysis
- 12 Applications in environmental sciences
- Appendices
- References
- Index
2 - Linear multivariate statistical analysis
Summary
As one often encounters datasets with more than a few variables, multivariate statistical techniques are needed to extract the information contained in these datasets effectively. In the environmental sciences, examples of multivariate datasets are ubiquitous – the air temperatures recorded by all the weather stations around the globe, the satellite infrared images composed of numerous small pixels, the gridded output from a general circulation model, etc. The number of variables or time series in these datasets ranges from thousands to millions. Without a mastery of multivariate techniques, one is overwhelmed by these gigantic datasets. In this chapter, we review the principal component analysis method and its many variants, as well as the canonical correlation analysis method. These methods, which use standard matrix techniques such as singular value decomposition, are relatively easy to apply, but suffer from being linear, a limitation which will be lifted with neural network and kernel methods in later chapters.
Principal component analysis (PCA)
Geometric approach to PCA
We have a dataset with variables y_1, …, y_m. These variables have been sampled n times. In many situations, the m variables are m time series, each containing n observations in time. For instance, one may have a dataset containing the monthly air temperature measured at m stations over n months.
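As a rough illustration of this setup (a minimal sketch, not code from the book), the following NumPy snippet arranges the n samples of the m variables into an n × m data matrix, removes each variable's time mean, and extracts principal components via singular value decomposition. The dimensions and all variable names are illustrative, and random numbers stand in for real station temperatures.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 120, 5                        # e.g. 120 months of data at 5 stations
Y = rng.standard_normal((n, m))      # placeholder for real temperature records

# Work with anomalies: remove each station's time mean
Y_anom = Y - Y.mean(axis=0)

# SVD of the (n x m) anomaly matrix: Y_anom = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Y_anom, full_matrices=False)

eofs = Vt                            # rows: spatial patterns (eigenvectors)
pcs = U * s                          # columns: principal component time series
frac_var = s**2 / np.sum(s**2)       # fraction of total variance in each mode

print(np.round(frac_var, 3))
```

In this sketch the rows of Vt play the role of the eigenvectors (spatial patterns) and the columns of U scaled by the singular values give the corresponding time coefficients, the decomposition developed geometrically in the remainder of this section.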
Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels, pp. 20-57. Cambridge University Press, 2009.