Book contents
- Frontmatter
- Contents
- Preface
- 1 Basic Concepts in Probability and Statistics
- 2 Hypothesis Tests
- 3 Confidence Intervals
- 4 Statistical Tests Based on Ranks
- 5 Introduction to Stochastic Processes
- 6 The Power Spectrum
- 7 Introduction to Multivariate Methods
- 8 Linear Regression: Least Squares Estimation
- 9 Linear Regression: Inference
- 10 Model Selection
- 11 Screening: A Pitfall in Statistics
- 12 Principal Component Analysis
- 13 Field Significance
- 14 Multivariate Linear Regression
- 15 Canonical Correlation Analysis
- 16 Covariance Discriminant Analysis
- 17 Analysis of Variance and Predictability
- 18 Predictable Component Analysis
- 19 Extreme Value Theory
- 20 Data Assimilation
- 21 Ensemble Square Root Filters
- Appendix
- References
- Index
11 - Screening: A Pitfall in Statistics
Published online by Cambridge University Press: 03 February 2022
Summary
Scientists often propose hypotheses based on patterns seen in data. However, if a scientist tests a hypothesis using the same data that suggested the hypothesis, then that scientist has violated a rule of science. The rule is: test hypotheses with independent data. This rule may sound so obvious as to be hardly worth mentioning. In fact, this mistake occurs frequently, especially when analyzing large data sets. Among the many pitfalls in statistics, screening is particularly serious. Screening is the process of evaluating a property for a large number of samples and then selecting samples in which that property is extreme. Screening is closely related to data fishing, data dredging, or data snooping. After a sample has been selected through screening, classical hypothesis tests exhibit selection bias. Quantifying the effect of screening often reveals that it creates biases that are substantially larger than one might guess. This chapter explains the concept of screening and illustrates it through examples from selecting predictors, interpreting correlation maps, and identifying change points.
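The selection bias described in the summary can be illustrated with a small Monte Carlo experiment. The sketch below is not taken from the chapter; it is a minimal illustration under assumed settings (30 samples, 100 candidate predictors, all generated independently of the target, so every null hypothesis is true). The function name `screening_false_positive_rates` and all parameter values are hypothetical. The 5% critical value for a single correlation is estimated empirically from the simulated null distribution rather than from a t-table.

```python
import numpy as np

def screening_false_positive_rates(n_samples=30, n_predictors=100,
                                   n_trials=1000, seed=0):
    """Compare false positive rates with and without screening.

    All predictors are independent of the target, so any 'significant'
    correlation is a false positive. Settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_trials, n_samples))
    x = rng.standard_normal((n_trials, n_predictors, n_samples))

    # Center and scale to unit norm so a dot product equals the
    # sample correlation coefficient.
    y = y - y.mean(axis=1, keepdims=True)
    x = x - x.mean(axis=2, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    x = x / np.linalg.norm(x, axis=2, keepdims=True)

    # abs_r[t, p] = |correlation| between predictor p and target in trial t.
    abs_r = np.abs(np.einsum("tpn,tn->tp", x, y))

    # Empirical 5% two-sided critical value for ONE pre-specified predictor.
    crit = np.quantile(abs_r, 0.95)

    # Predictor chosen before looking at the data (here: always the first).
    rate_prespecified = np.mean(abs_r[:, 0] > crit)

    # Predictor chosen by screening: the largest |correlation| in each trial,
    # then tested as if it had been specified in advance.
    rate_screened = np.mean(abs_r.max(axis=1) > crit)

    return crit, rate_prespecified, rate_screened

if __name__ == "__main__":
    crit, pre, scr = screening_false_positive_rates()
    print(f"critical |r| for a single test: {crit:.3f}")
    print(f"false positive rate, pre-specified predictor: {pre:.3f}")
    print(f"false positive rate, screened predictor: {scr:.3f}")
```

With 100 independent candidates, the chance that at least one exceeds a 5% threshold is 1 - 0.95^100, or roughly 0.99, so the screened predictor is declared significant in almost every trial even though no real relationship exists. The pre-specified predictor, tested against the same threshold, stays near the nominal 5% rate.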
- Type: Chapter
- Information: Statistical Methods for Climate Scientists, pp. 255-272
- Publisher: Cambridge University Press
- Print publication year: 2022