Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- Part 1 Foundations
- Part 2 From Theory to Algorithms
- 9 Linear Predictors
- 10 Boosting
- 11 Model Selection and Validation
- 12 Convex Learning Problems
- 13 Regularization and Stability
- 14 Stochastic Gradient Descent
- 15 Support Vector Machines
- 16 Kernel Methods
- 17 Multiclass, Ranking, and Complex Prediction Problems
- 18 Decision Trees
- 19 Nearest Neighbor
- 20 Neural Networks
- Part 3 Additional Learning Models
- Part 4 Advanced Theory
- Appendix A Technical Lemmas
- Appendix B Measure Concentration
- Appendix C Linear Algebra
- References
- Index
13 - Regularization and Stability
from Part 2 - From Theory to Algorithms
Published online by Cambridge University Press: 05 July 2014
Summary
In the previous chapter we introduced the families of convex-Lipschitz-bounded and convex-smooth-bounded learning problems. In this chapter we show that all learning problems in these two families are learnable. For some learning problems of this type it is possible to show that uniform convergence holds, and hence they are learnable using the ERM rule; however, this is not true for all such problems. Nevertheless, we will introduce another learning rule and show that it learns all convex-Lipschitz-bounded and convex-smooth-bounded learning problems.
The new learning paradigm we introduce in this chapter is called Regularized Loss Minimization, or RLM for short. In RLM we minimize the sum of the empirical risk and a regularization function. Intuitively, the regularization function measures the complexity of hypotheses. Indeed, one interpretation of the regularization function is the structural risk minimization paradigm we discussed in Chapter 7. Another view of regularization is as a stabilizer of the learning algorithm: an algorithm is considered stable if a slight change of its input does not change its output much. We will formally define the notion of stability (what we mean by "slight change of input" and by "does not change the output much") and prove its close relation to learnability. Finally, we will show that using the squared ℓ2 norm as a regularization function stabilizes all convex-Lipschitz or convex-smooth learning problems. Hence, RLM can be used as a general learning rule for these families of learning problems.
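To make the two ideas in the summary concrete, here is a minimal sketch of RLM with the squared ℓ2 regularizer on the squared loss (i.e., ridge regression), together with a stability check that refits after replacing a single training example. All names, the gradient-descent solver, and the hyperparameter values (`lam`, `lr`, `steps`) are illustrative choices, not the book's notation:

```python
import numpy as np

def rlm_ridge(X, y, lam=0.1, lr=0.1, steps=500):
    # RLM objective with the squared l2 regularizer:
    #   min_w  (1/m) * sum_i (w . x_i - y_i)^2  +  lam * ||w||^2
    # Solved here by plain gradient descent (a simple, illustrative solver).
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / m) * X.T @ (X @ w - y) + 2.0 * lam * w
        w -= lr * grad
    return w

# Toy data: y is (approximately) a linear function of x.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=50)

w = rlm_ridge(X, y, lam=0.01)

# Stability check: replace one training example and refit.
# For a stable rule the two outputs should be close.
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=2), rng.normal()
w2 = rlm_ridge(X2, y2, lam=0.01)
print(np.linalg.norm(w - w2))  # small for a stable algorithm
```

Note the trade-off the regularizer controls: a larger `lam` shrinks `w` toward zero (more stability, more bias), while `lam = 0` recovers unregularized ERM on the squared loss.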
- Understanding Machine Learning: From Theory to Algorithms, pp. 137–149. Publisher: Cambridge University Press. Print publication year: 2014.