Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Basic Probability Inequalities for Sums of Independent Random Variables
- 3 Uniform Convergence and Generalization Analysis
- 4 Empirical Covering Number Analysis and Symmetrization
- 5 Covering Number Estimates
- 6 Rademacher Complexity and Concentration Inequalities
- 7 Algorithmic Stability Analysis
- 8 Model Selection
- 9 Analysis of Kernel Methods
- 10 Additive and Sparse Models
- 11 Analysis of Neural Networks
- 12 Lower Bounds and Minimax Analysis
- 13 Probability Inequalities for Sequential Random Variables
- 14 Basic Concepts of Online Learning
- 15 Online Aggregation and Second-Order Algorithms
- 16 Multiarmed Bandits
- 17 Contextual Bandits
- 18 Reinforcement Learning
- Appendix A Basics of Convex Analysis
- Appendix B f-divergence of Probability Measures
- References
- Author Index
- Subject Index
17 - Contextual Bandits
Published online by Cambridge University Press: 20 July 2023
Summary
In the standard multiarmed bandit problem, one observes a fixed number of arms. To achieve optimal regret bounds, one estimates a confidence interval for each arm by counting how often it has been pulled. In the contextual bandit problem, one also observes side information for each arm, which can be used as features for more accurate confidence interval estimation. This chapter studies contextual bandit problems with both linear and nonlinear models.
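The contrast between counting-based and feature-based confidence intervals can be illustrated with a minimal sketch. It is not taken from the chapter: the function `count_width` uses the generic UCB1-style width sqrt(2 ln t / n), and the hypothetical class `LinearConfidence` uses the width sqrt(x^T A^{-1} x) common in linear bandit methods, where A is a regularized sum of outer products of observed features. The names, the regularization parameter `lam`, and the constants are all assumptions made for illustration.

```python
import math


def count_width(t, n_pulls):
    """Counting-based width: shrinks only as this particular arm is pulled."""
    return math.sqrt(2.0 * math.log(t) / n_pulls)


class LinearConfidence:
    """Feature-based width sqrt(x^T A^{-1} x), with A = lam * I + sum of x x^T.

    Illustrative assumption: arms sharing feature directions share information.
    """

    def __init__(self, dim, lam=1.0):
        # Start from A^{-1} = (1 / lam) * I.
        self.a_inv = [
            [(1.0 / lam if i == j else 0.0) for j in range(dim)]
            for i in range(dim)
        ]

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def width(self, x):
        ax = self._matvec(x)
        return math.sqrt(sum(xi * ai for xi, ai in zip(x, ax)))

    def update(self, x):
        # Sherman-Morrison rank-one update of A^{-1} after observing feature x.
        # A^{-1} is symmetric, so (A^{-1} x)(x^T A^{-1}) equals ax ax^T.
        ax = self._matvec(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom


if __name__ == "__main__":
    print(count_width(100, 10), count_width(100, 50))
    conf = LinearConfidence(dim=2)
    conf.update([1.0, 0.0])
    print(conf.width([1.0, 0.0]), conf.width([0.0, 1.0]))
```

In this sketch, observing one feature vector narrows the interval for every arm whose features overlap with it, whereas the counting-based width for an arm changes only when that arm itself is pulled.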
- Type: Chapter
- Information: Mathematical Analysis of Machine Learning Algorithms, pp. 345-372
- Publisher: Cambridge University Press
- Print publication year: 2023