In Chapter 10, the book turns to practical considerations. In particular, it surveys the software engineering discipline with its rigorous software testing methods and asks how these techniques can be adapted to machine learning. The adaptation is not straightforward, as machine learning algorithms behave in non-deterministic ways, a difficulty aggravated by data, algorithm, and platform imperfections. These issues are discussed and some of the steps taken to handle them are reviewed. The chapter then turns to the practice of online testing and addresses the ethics of machine learning deployment. It concludes with a discussion of current industry practice, along with suggestions on how to improve the safety of industrial deployment in the future.
Chapter 5 starts with an analysis of the classification metrics presented in Chapter 4, outlining their strengths and weaknesses. It then presents more advanced metrics such as Cohen’s kappa, Youden’s index, and likelihood ratios. This is followed by a discussion of data and classifier complexities, such as the class imbalance problem and classifier uncertainty, that require particular scrutiny to ensure that the results are trustworthy. The chapter concludes with a detailed discussion of ROC analysis, complementing its introduction in Chapter 4, and a presentation of other visualization metrics.
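As a concrete illustration, the sketch below computes two of the metrics named above, Cohen’s kappa and Youden’s index, with scikit-learn (the library used for the implementations on the book’s website); the labels are hypothetical.

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Hypothetical ground-truth labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Cohen's kappa: agreement between predictions and truth, corrected for chance.
kappa = cohen_kappa_score(y_true, y_pred)

# Youden's index J = sensitivity + specificity - 1, derived from the
# binary confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
youden_j = tp / (tp + fn) + tn / (tn + fp) - 1

print(f"kappa = {kappa:.3f}, Youden's J = {youden_j:.3f}")
```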
Chapter 3 discusses the field of machine learning from a theoretical perspective. This review lays the groundwork for the treatment of advanced metrics in Chapter 5 and of error estimation methods in Chapter 6. The specific concepts surveyed in this chapter include loss functions, empirical risk, generalization error, empirical and structural risk minimization, regularization, and learning bias. The unsupervised learning paradigm is also reviewed, and the chapter concludes with a discussion of the bias/variance tradeoff.
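To make one of these concepts concrete, the following minimal sketch computes the empirical risk of a classifier under the 0-1 loss, that is, the average loss over an observed sample; the labels and predictions are hypothetical.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """0-1 loss: 1 for each misclassified example, 0 otherwise."""
    return (np.asarray(y_true) != np.asarray(y_pred)).astype(float)

def empirical_risk(y_true, y_pred, loss=zero_one_loss):
    """Empirical risk: the sample mean of the per-example losses."""
    return loss(y_true, y_pred).mean()

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(empirical_risk(y_true, y_pred))  # 0.2 (one error in five examples)
```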
Chapter 9 is devoted to evaluation methods for an important category of classical learning paradigms left out of Chapter 8 so as to receive fuller coverage: unsupervised learning. In this chapter, a number of different unsupervised learning schemes are considered and their evaluation discussed. The particular tasks considered are clustering and hierarchical clustering, dimensionality reduction, latent variable modeling, and generative models including probabilistic PCA, variational autoencoders, and GANs. Evaluation methodology is discussed for each of these tasks.
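As one example of the methodology involved, the sketch below evaluates a clustering with the silhouette coefficient, an internal validity measure available in scikit-learn; the data are synthetic and the choice of three clusters is an assumption of the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Cluster, then score cohesion vs. separation (silhouette lies in [-1, 1]).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(f"silhouette = {silhouette_score(X, labels):.3f}")
```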
Chapter 11 completes the discussion of Chapter 10 by raising the question of how to practice machine learning in a responsible manner. It describes the dangers of data bias, and surveys data bias detection and mitigation methods; it lists the benefits of explainability and discusses techniques, such as LIME and SHAP, that have been proposed to explain the decisions made by opaque models; it underlines the risks of discrimination and discusses how to enhance fairness and prevent discrimination in machine learning algorithms. The issues of privacy and security are then presented, and the need to practice human-centered machine learning emphasized. The chapter concludes with the important issues of repeatability, reproducibility, and replicability in machine learning.
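LIME and SHAP themselves are covered in the chapter; as a simpler stand-in for model-agnostic explanation, the sketch below uses scikit-learn’s permutation importance, which ranks features by how much shuffling each one degrades held-out performance. The data and model are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: 5 features, only 3 of which are informative.
X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Permute each feature on the test set and measure the drop in accuracy.
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```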
Chapter 1 discusses the motivation for the book and the rationale for its organization into four parts: preliminary considerations, evaluation for classification, evaluation in other settings, and evaluation from a practical perspective. In more detail, the first part provides the statistical tools necessary for evaluation and reviews the main machine learning principles as well as frequently used evaluation practices. The second part discusses the most common setting in which machine learning evaluation has been applied: classification. The third part extends the discussion to other paradigms such as multi-label classification, regression analysis, data stream mining, and unsupervised learning. The fourth part broadens the conversation by moving it from the laboratory setting to the practical setting, specifically discussing issues of robustness and responsible deployment.
Chapter 8 introduces evaluation procedures for paradigms other than classification. In particular, it discusses evaluation for classical problems such as regression analysis, time-series analysis, outlier detection, and reinforcement learning, along with evaluation approaches for newer tasks such as positive-unlabeled classification, ordinal classification, multi-label classification, image segmentation, text generation, data stream mining, and lifelong learning.
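As a small illustration of evaluation outside classification, the sketch below computes three standard regression metrics with scikit-learn; the data and model are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical noisy regression problem.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

y_hat = LinearRegression().fit(X_tr, y_tr).predict(X_te)
print(f"MAE = {mean_absolute_error(y_te, y_hat):.2f}")
print(f"MSE = {mean_squared_error(y_te, y_hat):.2f}")
print(f"R^2 = {r2_score(y_te, y_hat):.3f}")
```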
In Chapter 7, the history of statistical analysis is reviewed and its legacy discussed. Four situations of interest to machine learning evaluation are subsequently discussed within different statistical paradigms: the comparison of two classifiers on a single domain; the comparison of multiple classifiers on a single domain; the comparison of two classifiers on multiple domains; and the comparison of multiple classifiers on multiple domains. The three statistical paradigms considered for each of these situations are the null hypothesis statistical testing (NHST) setting; an enhanced Fisher-flavored methodology that adds the notions of confidence intervals, effect size, and power analysis to NHST; and a newer approach based on Bayesian reasoning.
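To ground the first of these situations, the sketch below compares two classifiers on a single domain with a Wilcoxon signed-rank test over paired per-fold accuracies, an NHST-style procedure; the fold scores are hypothetical.

```python
from scipy.stats import wilcoxon

# Hypothetical accuracies of classifiers A and B on the same 10 folds.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80, 0.79, 0.85]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.76, 0.81, 0.80, 0.78, 0.77, 0.82]

# Nonparametric paired test: is the median accuracy difference zero?
stat, p_value = wilcoxon(acc_a, acc_b)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.4f}")
```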
As machine learning gains widespread adoption and integration in a variety of applications, including safety- and mission-critical systems, the need for robust evaluation methods grows more urgent. This book compiles scattered information on the topic from research papers and blogs to provide a centralized resource that is accessible to students, practitioners, and researchers across the sciences. The book examines meaningful metrics for diverse types of learning paradigms and applications, unbiased estimation methods, rigorous statistical analysis, fair training sets, and meaningful explainability, all of which are essential to building robust and reliable machine learning products. In addition to standard classification, the book discusses unsupervised learning, regression, image segmentation, and anomaly detection. The book also covers topics such as industry-strength evaluation, fairness, and responsible AI. Implementations using Python and scikit-learn are available on the book's website.
Maximise student engagement and understanding of matrix methods in data-driven applications with this modern teaching package. Students are introduced to matrices in two preliminary chapters, before progressing to advanced topics such as the nuclear norm, proximal operators and convex optimization. Highlighted applications include low-rank approximation, matrix completion, subspace learning, logistic regression for binary classification, robust PCA, dimensionality reduction and Procrustes problems. Extensively classroom-tested, the book includes over 200 multiple-choice questions suitable for in-class interactive learning or quizzes, as well as homework exercises (with solutions available for instructors). It encourages active learning with engaging 'explore' questions, with answers at the end of each chapter, and Julia code examples to demonstrate how the mathematics is actually used in practice. A suite of computational notebooks offers a hands-on learning experience for students. This is a perfect textbook for upper-level undergraduates and first-year graduate students who have taken a prior course in linear algebra basics.
In many applications, dimensionality reduction is important. Uses of dimensionality reduction include visualization, removing noise, and decreasing compute and memory requirements, such as for image compression. This chapter focuses on low-rank approximation of a matrix. There are theoretical models for why big matrices should be approximately low rank. Low-rank approximations are also used to compress large neural network models to reduce computation and storage. The chapter begins with the classic approach to approximating a matrix by a low-rank matrix, using a nonconvex formulation that has a remarkably simple singular value decomposition solution. It then applies this approach to the source localization application via the multidimensional scaling method and to the photometric stereo application. It then turns to convex formulations of low-rank approximation based on proximal operators that involve singular value shrinkage. It discusses methods for choosing the rank of the approximation, and describes the optimal shrinkage method called OptShrink. It discusses related dimensionality reduction methods including (linear) autoencoders and principal component analysis. It applies the methods to learning low-dimensional subspaces from training data for subspace-based classification problems. Finally, it extends the method to streaming applications with time-varying data. This chapter bridges the classical singular value decomposition tool with modern applications in signal processing and machine learning.
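The classic approach mentioned above has a remarkably direct solution via the singular value decomposition; the following minimal NumPy sketch (hypothetical matrix and target rank) computes the best rank-r approximation in the Frobenius-norm sense.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))  # hypothetical data matrix
r = 5                               # hypothetical target rank

# Truncated SVD: keep only the r largest singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]  # best rank-r approximation (Eckart-Young)

err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(f"relative Frobenius error at rank {r}: {err:.3f}")
```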