Book contents
- Frontmatter
- Contents
- Preface
- Acronyms
- 1 Introduction
- 2 Machine Learning and Statistics Overview
- 3 Performance Measures I
- 4 Performance Measures II
- 5 Error Estimation
- 6 Statistical Significance Testing
- 7 Datasets and Experimental Framework
- 8 Recent Developments
- 9 Conclusion
- Appendix A Statistical Tables
- Appendix B Additional Information on the Data
- Appendix C Two Case Studies
- Bibliography
- Index
6 - Statistical Significance Testing
Published online by Cambridge University Press: 05 August 2011
- Frontmatter
- Contents
- Preface
- Acronyms
- 1 Introduction
- 2 Machine Learning and Statistics Overview
- 3 Performance Measures I
- 4 Performance Measures II
- 5 Error Estimation
- 6 Statistical Significance Testing
- 7 Datasets and Experimental Framework
- 8 Recent Developments
- 9 Conclusion
- Appendix A Statistical Tables
- Appendix B Additional Information on the Data
- Appendix C Two Case Studies
- Bibliography
- Index
Summary
The advances in performance measure characterization discussed in Chapters 3 and 4 have armed researchers with more precise estimates of classifier performance. However, these are not by themselves sufficient to fully evaluate the difference in performances between classifiers on one or more test domains. More precisely, even though the performance of different classifiers may be shown to be different on specified sets of data, it needs to be confirmed whether the observed differences are statistically significant and not merely coincidental. Chapter 5 started to look at this issue, but focused primarily on the objectivity and stability of the results. This can be construed as the first step to assessing the significance of a difference. Only in the case of the comparison of two classifiers on a single domain did the discussion actually move on to significance issues. Statistical significance testing, which is the subject of this chapter, enables researchers to move on to more precise assessments of significance of the results obtained (within certain constraints). The importance of statistical significance testing hence cannot be overstated. Nonetheless, the use of available statistical tools for such testing in the fields of machine learning and data mining has been limited at best. Researchers have concentrated on using the paired t test, many times inappropriately, to confirm the difference in classifiers' performance. Moreover, this has sometimes been done at the cost of excluding other, more appropriate, tests.
- Type
- Chapter
- Information
- Evaluating Learning AlgorithmsA Classification Perspective, pp. 206 - 291Publisher: Cambridge University PressPrint publication year: 2011
- 1
- Cited by