Book contents
- Frontmatter
- Contents
- Preface
- Acronyms
- 1 Introduction
- 2 Machine Learning and Statistics Overview
- 3 Performance Measures I
- 4 Performance Measures II
- 5 Error Estimation
- 6 Statistical Significance Testing
- 7 Datasets and Experimental Framework
- 8 Recent Developments
- 9 Conclusion
- Appendix A Statistical Tables
- Appendix B Additional Information on the Data
- Appendix C Two Case Studies
- Bibliography
- Index
3 - Performance Measures I
Published online by Cambridge University Press: 05 August 2011
Summary
The evaluation of learning algorithms both in absolute terms and in relation to other algorithms involves addressing four main components:
- performance measures,
- error estimation,
- statistical significance testing, and
- test benchmark selection.
The first component concerns the property of the algorithm's performance that one wishes to measure. Here we seek answers to questions such as these: Do we measure how accurate the algorithm is? If so, how do we define accuracy? Do we value one aspect of the algorithm's performance more than others? These and related issues are the focus of this and the next chapter. Once a performance measure is chosen, the next major concern is to estimate it in as unbiased a manner as possible, making the best possible use of the available data. This is the focus of Chapter 5, on performance estimation. Chapter 6 then investigates whether the differences in performance obtained by the algorithm, alone or in relation to others, are statistically significant. Finally, we complete the picture with a discussion of which domains can be deemed suitable as benchmarks for evaluating learning approaches. This is the focus of Chapter 7.
Performance measures have arguably received the greatest amount of attention in the field. Because machine learning tasks are inherently multidisciplinary, different variants of these performance measures have been influenced by approaches from a variety of disciplines, including statistics, medicine, and information retrieval.
- Type: Chapter
- Information: Evaluating Learning Algorithms: A Classification Perspective, pp. 74-110
- Publisher: Cambridge University Press
- Print publication year: 2011