Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- Part 1 Foundations
- Part 2 From Theory to Algorithms
- 9 Linear Predictors
- 10 Boosting
- 11 Model Selection and Validation
- 12 Convex Learning Problems
- 13 Regularization and Stability
- 14 Stochastic Gradient Descent
- 15 Support Vector Machines
- 16 Kernel Methods
- 17 Multiclass, Ranking, and Complex Prediction Problems
- 18 Decision Trees
- 19 Nearest Neighbor
- 20 Neural Networks
- Part 3 Additional Learning Models
- Part 4 Advanced Theory
- Appendix A Technical Lemmas
- Appendix B Measure Concentration
- Appendix C Linear Algebra
- References
- Index
16 - Kernel Methods
from Part 2 - From Theory to Algorithms
Published online by Cambridge University Press: 05 July 2014
Summary
In the previous chapter we described the SVM paradigm for learning halfspaces in high dimensional feature spaces. This enables us to enrich the expressive power of halfspaces by first mapping the data into a high dimensional feature space, and then learning a linear predictor in that space. This is similar to the AdaBoost algorithm, which learns a composition of a halfspace over base hypotheses. While this approach greatly extends the expressiveness of halfspace predictors, it raises both sample complexity and computational complexity challenges. In the previous chapter we tackled the sample complexity issue using the concept of margin. In this chapter we tackle the computational complexity challenge using the method of kernels.
We start the chapter by describing the idea of embedding the data into a high dimensional feature space. We then introduce the idea of kernels. A kernel is a type of similarity measure between instances. The special property of kernel similarities is that they can be viewed as inner products in some Hilbert space (or Euclidean space of some high dimension) into which the instance space is virtually embedded. We introduce the “kernel trick,” which enables computationally efficient implementation of learning without explicitly handling the high dimensional representation of the domain instances. Kernel-based learning algorithms, and in particular kernel-SVM, are very useful and popular machine learning tools. Their success may be attributed both to being flexible in accommodating domain-specific prior knowledge and to having a well-developed set of efficient implementation algorithms.
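To make the inner-product view of kernels concrete, the following is a minimal Python sketch (not taken from the chapter) checking that the degree-2 polynomial kernel K(x, x') = (1 + ⟨x, x'⟩)² coincides with an explicit feature-map inner product ψ(x) · ψ(x') for instances in R²; the names poly_kernel and psi are illustrative only.

```python
import numpy as np

def poly_kernel(x, xp):
    # Degree-2 polynomial kernel, evaluated directly in the original (low dimensional) space.
    return (1.0 + np.dot(x, xp)) ** 2

def psi(x):
    # Explicit feature map for x in R^2 whose inner product reproduces poly_kernel;
    # it is used here only to verify the identity, never during "kernelized" learning.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2.0) * x1, np.sqrt(2.0) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2.0) * x1 * x2])

x, xp = np.array([0.5, -1.0]), np.array([2.0, 0.3])

# Both quantities agree, so a learner that accesses the data only through inner
# products can work with poly_kernel and never build the higher dimensional representation.
print(poly_kernel(x, xp))
print(np.dot(psi(x), psi(xp)))
```

This is the essence of the kernel trick described above: the cost of evaluating the kernel depends on the original dimension, not on the dimension of the (possibly huge) embedding space.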
- Type: Chapter
- Information: Understanding Machine Learning: From Theory to Algorithms, pp. 179–189. Publisher: Cambridge University Press. Print publication year: 2014.