During the past half-century, exponential families have attained a position at the center of parametric statistical inference. Theoretical advances have been matched, and more than matched, in the world of applications, where logistic regression by itself has become the go-to methodology in medical statistics, computer-based prediction algorithms, and the social sciences. This book is based on a one-semester graduate course for first-year Ph.D. and advanced master's students. After presenting the basic structure of univariate and multivariate exponential families, their application to generalized linear models including logistic and Poisson regression is described in detail, emphasizing geometrical ideas, computational practice, and the analogy with ordinary linear regression. Connections are made with a variety of current statistical methodologies: missing data, survival analysis and proportional hazards, false discovery rates, bootstrapping, and empirical Bayes analysis. The book connects exponential family theory with its applications in a way that doesn't require advanced mathematical preparation.
As introduced in Chapter 4, setting up a learning problem requires the selection of an inductive bias, which consists of a model class and a training algorithm. By the no-free-lunch theorem, this first step is essential in order to make generalization possible. A trained model generalizes if it performs well outside the training set, on average with respect to the unknown population distribution.
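A small sketch can make the distinction between training loss and population loss concrete. Everything below is a hypothetical choice for illustration:

- the population distribution;
- the training set;
- the model class (linear predictors t̂ = θx without intercept);
- the training algorithm (least squares under the quadratic loss).

The population is made finite and fully known only so that the population loss can be computed exactly. That is never possible in a real learning problem.

```python
import itertools

import numpy as np


def population_support():
    """Hypothetical population distribution (unknown in practice): x is uniform
    on {1, 2, 3, 4} and t = x + e, with e = -1 or +1 equiprobably.
    Returns the support points (x, t) and their probabilities."""
    xs, ts, ps = [], [], []
    for x, e in itertools.product([1.0, 2.0, 3.0, 4.0], [-1.0, 1.0]):
        xs.append(x)
        ts.append(x + e)
        ps.append(1.0 / 8.0)
    return np.array(xs), np.array(ts), np.array(ps)


def train(x, t):
    """Training algorithm: least-squares fit within the model class t_hat = theta * x."""
    return float(np.dot(x, t) / np.dot(x, x))


def training_loss(theta, x, t):
    """Average quadratic loss over the training set."""
    return float(np.mean((t - theta * x) ** 2))


def population_loss(theta):
    """Expected quadratic loss under the (here known) population distribution."""
    x, t, p = population_support()
    return float(np.sum(p * (t - theta * x) ** 2))


# A small hypothetical training set; each pair lies in the population's support.
x_train = np.array([1.0, 2.0, 3.0])
t_train = np.array([2.0, 1.0, 4.0])

theta = train(x_train, t_train)
train_loss = training_loss(theta, x_train, t_train)
pop_loss = population_loss(theta)
gap = pop_loss - train_loss  # generalization gap
print(f"theta={theta:.4f} training={train_loss:.4f} population={pop_loss:.4f} gap={gap:.4f}")
```

With these numbers the trained parameter is θ = 8/7. The training loss is 19/21 ≈ 0.905, while the population loss is 1 + 7.5/49 ≈ 1.153. The trained model therefore does somewhat worse on the population than on the data it was fitted to.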
As discussed in Chapter 2, learning is needed when a “physics”-based mathematical model for the data generation mechanism is not available or is too complex to use for design purposes. As an essential benchmark setting, this chapter discusses the ideal case in which an accurate mathematical model is known, and hence learning is not necessary. As in much of machine learning, we focus specifically on the problem of prediction. The goal is to predict a target variable given the observation of an input variable, based on a mathematical model that describes the joint generation of both variables. Model-based prediction is also known as inference.
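Here is a minimal sketch of model-based prediction. It assumes that the joint distribution of a binary input x and a ternary target t is known exactly as a table; the numbers, and the helper names `optimal_predictor` and `detection_error_loss`, are hypothetical. For each observed x, the optimal prediction minimizes the expected loss over t. Under the detection-error (0-1) loss, this amounts to picking the most probable value of t for that x. No training data are involved.

```python
import numpy as np

# Hypothetical joint pmf p(x, t): rows index x in {0, 1}, columns index t in {0, 1, 2}.
JOINT = np.array([
    [0.30, 0.10, 0.05],
    [0.05, 0.20, 0.30],
])
T_VALUES = np.array([0, 1, 2])


def detection_error_loss(t, t_hat):
    """0-1 (detection-error) loss: 1 if the prediction is wrong, else 0."""
    return float(t != t_hat)


def optimal_predictor(joint, t_values, loss):
    """For each x, return the candidate t_hat minimizing sum_t p(x, t) * loss(t, t_hat).

    Dividing by p(x) would give the conditional expected loss E[loss | x]; since p(x)
    does not depend on t_hat, the minimizer is the same. Candidates are the values of t.
    """
    predictions = []
    for row in joint:
        risks = [sum(p * loss(t, t_hat) for p, t in zip(row, t_values)) for t_hat in t_values]
        predictions.append(int(t_values[int(np.argmin(risks))]))
    return predictions


predictions = optimal_predictor(JOINT, T_VALUES, detection_error_loss)
print(predictions)
```

The function accepts any loss over the finite candidate set `T_VALUES`. For the quadratic loss, however, the unrestricted optimum is the conditional mean, which need not lie in that set.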
Previous chapters have formulated learning problems within a frequentist framework. Frequentist learning aims to determine a value of the model parameter that approximately minimizes the population loss. Since the population loss is not known, in practice this is done by minimizing an estimate of the population loss based on training data: the training loss.
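In symbols, frequentist learning can be summarized as follows. The notation is chosen here for illustration and may differ from the book's: ℓ is the loss, t̂(x|θ) is the model's prediction, p(x,t) is the population distribution, and D = {(x_n, t_n)}, n = 1, …, N, is the training set.

```latex
L_p(\theta) = \mathbb{E}_{(x,t)\sim p(x,t)}\big[\ell\big(t, \hat{t}(x|\theta)\big)\big],
\qquad
L_{\mathcal{D}}(\theta) = \frac{1}{N}\sum_{n=1}^{N} \ell\big(t_n, \hat{t}(x_n|\theta)\big),
\qquad
\theta_{\mathrm{ERM}} = \arg\min_{\theta} L_{\mathcal{D}}(\theta).
```

When the training examples are drawn independently from p(x,t), the expectation of L_D(θ) over the draw of D equals L_p(θ) for any fixed θ. This is the sense in which the training loss estimates the population loss.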
As seen in the preceding chapter, when a reliable model is available to describe the probabilistic relationship between input variable x and target variable t, one is faced with a model-based prediction problem, also known as inference. Inference can in principle be optimally addressed by evaluating functions of the posterior distribution of the output t given the input x.
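Here is a minimal sketch of inference through the posterior. It assumes a hypothetical discrete model specified by a prior p(t) and a likelihood p(x|t). Bayes' rule gives p(t|x), and two common functions of it serve as predictions:

- the posterior mode, which is optimal under the detection-error loss;
- the posterior mean, which is optimal under the quadratic loss.

```python
import numpy as np

# Hypothetical model: prior p(t) over t in {0, 1} and likelihood p(x | t) over x in {0, 1, 2}.
T_VALUES = np.array([0.0, 1.0])
PRIOR = np.array([0.6, 0.4])
LIKELIHOOD = np.array([
    [0.7, 0.2, 0.1],  # p(x | t = 0)
    [0.1, 0.3, 0.6],  # p(x | t = 1)
])


def posterior(x):
    """Bayes' rule: p(t | x) is proportional to p(t) * p(x | t)."""
    unnormalized = PRIOR * LIKELIHOOD[:, x]
    return unnormalized / unnormalized.sum()


post = posterior(2)
map_prediction = T_VALUES[np.argmax(post)]       # posterior mode
mean_prediction = float(np.dot(post, T_VALUES))  # posterior mean
print(post, map_prediction, mean_prediction)
```

For the observation x = 2, the posterior is (0.2, 0.8). The mode therefore predicts t = 1, and the posterior mean is 0.8.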
This chapter provides a refresher on probability and linear algebra with the aim of reviewing the necessary background for the rest of the book. Readers not familiar with probability and linear algebra are invited to first consult one of the standard textbooks mentioned in Recommended Resources, Sec. 2.14. Readers well versed in these topics may briefly skim through this chapter to get a sense of the notation used in the book.
In the examples studied in Chapter 4, the exact optimization of the (regularized) training loss was feasible through simple numerical procedures or via closed-form analytical solutions. In practice, exact optimization is often computationally intractable, and scalable implementations must rely on approximate optimization methods that perform local, iterative updates in search of an optimized solution. This chapter provides an introduction to local optimization methods for machine learning.
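As a minimal sketch of a local, iterative optimization method, the code below runs gradient descent on a least-squares training loss. This is one common such method, not necessarily the one the chapter develops first. The data are hypothetical. This problem also has a closed-form solution, so the iterates can be checked against it.

```python
import numpy as np

# Hypothetical training data: inputs with a bias column, and scalar targets.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
t = np.array([1.0, 3.0, 5.0, 8.0])
N = len(t)


def training_loss(theta):
    """Quadratic training loss (1/N) * ||X theta - t||^2."""
    r = X @ theta - t
    return float(r @ r) / N


def gradient(theta):
    """Gradient of the training loss: (2/N) * X^T (X theta - t)."""
    return (2.0 / N) * X.T @ (X @ theta - t)


def gradient_descent(theta0, step_size, num_iters):
    """Local, iterative updates: theta <- theta - step_size * gradient(theta)."""
    theta = theta0.copy()
    for _ in range(num_iters):
        theta = theta - step_size * gradient(theta)
    return theta


# The Hessian of this loss is (2/N) X^T X; a step size of 1/L, with L its largest
# eigenvalue (the smoothness constant), guarantees convergence for this convex loss.
L = np.linalg.eigvalsh((2.0 / N) * X.T @ X).max()
theta_gd = gradient_descent(np.zeros(2), step_size=1.0 / L, num_iters=1000)

# Closed-form least-squares solution, for comparison.
theta_ls, *_ = np.linalg.lstsq(X, t, rcond=None)
print(theta_gd, theta_ls, training_loss(theta_gd))
```

For these data the closed-form solution is θ = (0.8, 2.3), and gradient descent reaches it to numerical precision. Stochastic gradient descent is the usual variant for large data sets. It replaces the full gradient with an estimate computed on a mini-batch of examples.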
The previous chapter, as well as Chapter 4, has focused on supervised learning problems, which assume the availability of a labeled training set. A labeled data set consists of examples in the form of pairs (x, t) of input x and desired output t.
This chapter aims to motivate the study of machine learning for an intended audience of students and researchers with an engineering background.