Search results for Pattern Recognition and Machine Learning

13 - Quasi-Monte Carlo Integration
from Part Two - Optimal Recovery
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 102-112
- Chapter
Summary

This chapter is concerned with quasi-Monte Carlo rules, i.e., multivariate quadrature rules featuring equal weights and deterministically chosen evaluation points. The variation of a function and the star discrepancy of a set of points are defined as a prerequisite to the Koksma–Hlawka inequality, which bounds the error of a quasi-Monte Carlo rule by the product of the variation and the star discrepancy. Finally, some evaluation points with small star discrepancy are uncovered, namely the Halton sequence and the Hammersley set.
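
As a rough illustration of the equal-weight rule described in this summary, the Python sketch below builds Halton points from the radical inverse function and averages a function over them. The names `radical_inverse`, `halton`, and `qmc_integrate`, as well as the choice of bases 2 and 3, are assumptions made for this example and are not taken from the chapter.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points, bases=(2, 3)):
    """First `n_points` Halton points in [0, 1)^d, one coprime base per coordinate."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]


def qmc_integrate(f, n_points, bases=(2, 3)):
    """Equal-weight quadrature: average f over deterministic Halton points."""
    points = halton(n_points, bases)
    return sum(f(*p) for p in points) / n_points


if __name__ == "__main__":
    # Integrand x*y on the unit square; the exact integral is 1/4.
    print(qmc_integrate(lambda x, y: x * y, 1000))
```

With a smooth integrand such as `x * y`, the average over 1000 points should land close to the exact value 1/4, in line with the error bound by variation times star discrepancy.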

Appendix B - Probability Theory
from Appendices
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 259-273
- Chapter
Summary

This appendix recalls some key notions of probability theory, such as tails and moment generating functions. These notions are essential in the proof of some concentration inequalities, e.g., the McDiarmid inequality. In turn, these inequalities are used to establish the restricted isometry properties for sparse vectors and for low-rank matrices required earlier.

8 - Dimension Reduction
from Part One - Machine Learning
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 56-64
- Chapter
Summary

The high dimensionality of datapoints often constitutes an obstacle to efficient computations. This chapter investigates three workarounds that replace the datapoints by some substitutes selected in a lower dimensional set. The first workaround is principal component analysis, where the lower dimensional set is a linear space spanned by the top singular vectors of the data matrix. The second workaround is a Johnson–Lindenstrauss projection, where the lower dimensional set is a random linear space. The third workaround is locally linear embedding, where the lower dimensional set is not chosen as a linear space anymore.

4 - Support Vector Machines
from Part One - Machine Learning
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 23-30
- Chapter
Summary

This chapter studies binary classification from a non-statistical viewpoint. For data that are linearly separable, the perceptron algorithm is presented first. It is followed by an optimization program, known as the hard support vector machine (SVM), consisting in maximizing the margin. For data that are not exactly linearly separable, this optimization program is relaxed into soft SVM. Finally, for data that are linearly separable only after applying a feature map, the representer theorem is used to validate the so-called kernel trick.
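
The perceptron update mentioned in this summary can be sketched in a few lines of Python. This is a minimal version under stated assumptions: the classifier is homogeneous (a constant coordinate 1 is appended to each datapoint to play the role of a bias), labels are +1 or -1, and the name `perceptron` and the `max_passes` safeguard are choices made for this example rather than the chapter's notation.

```python
def perceptron(points, labels, max_passes=1000):
    """Return w with y * <w, x> > 0 for every (x, y), if found within max_passes."""
    w = [0.0] * len(points[0])
    for _ in range(max_passes):
        mistakes = 0
        for x, y in zip(points, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:
                # Misclassified (or on the boundary): move w towards y * x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w
    raise RuntimeError("no separating vector found within max_passes")


if __name__ == "__main__":
    # Toy data in the plane, with a trailing 1 acting as the bias coordinate.
    points = [(2, 1, 1), (1, 3, 1), (-1, -2, 1), (-2, -1, 1)]
    labels = [1, 1, -1, -1]
    print(perceptron(points, labels))
```

For linearly separable data the loop terminates after finitely many updates; the `max_passes` cap only guards against data that are not separable.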

26 - Various Advantages of Depth
from Part Five - Neural Networks
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 226-238
- Chapter
Summary

This chapter corroborates the empirical belief in the superiority of deep networks over shallow ones. It does so by highlighting three situations where a clear advantage can be demonstrated. First, using depth two, there are activation functions turning neural networks into universal approximators even when restricting the width. Second, depth overcomes the limitation that shallow ReLU networks cannot generate compactly supported functions. Third, the approximation rate of Lipschitz functions by deep ReLU networks is better than that of shallow ones.

Part Three - Compressive Sensing
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 113-114
- Chapter

Executive Summary
from Part One - Machine Learning
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 3-3
- Chapter

Executive Summary
from Part Four - Optimization
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 159-159
- Chapter

7 - Clustering
from Part One - Machine Learning
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 47-55
- Chapter
Summary

This chapter considers the unsupervised learning task known as clustering, which consists in grouping unlabeled datapoints based on some similarity information. The single-linkage algorithm is examined first. Then, the Lloyd algorithm is presented to illustrate the center-based clustering strategy. Finally, the problem of detecting two communities via spectral clustering is analyzed under the stochastic block model.
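
To make the center-based strategy concrete, here is a small Python sketch of the Lloyd iteration: assign each datapoint to its nearest center, then move each center to the mean of its cluster. The helper names (`sq_dist`, `centroid`, `assign`, `lloyd`), the squared Euclidean distance, and the rule of keeping a center in place when its cluster is empty are assumptions of this example, not details confirmed by the chapter.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a nonempty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        clusters[nearest].append(p)
    return clusters


def lloyd(points, initial_centers, max_iter=100):
    """Alternate assignment and re-centering until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Assumption: an empty cluster leaves its center where it was.
        new_centers = [
            centroid(c) if c else centers[k] for k, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(lloyd(data, [(0, 0), (10, 10)]))
```

The result depends on the initial centers, which are passed in explicitly here so that the sketch stays deterministic.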

24 - First Encounter with ReLU Networks
from Part Five - Neural Networks
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 208-215
- Chapter
Summary

This chapter starts by introducing the key concepts attached to neural networks, such as architecture, weights, biases, and activation function. It proceeds with the specific choice of the rectified linear unit (ReLU) as activation function. In this case, neural networks generate continuous piecewise linear (CPwL) functions. It is then shown that, in the univariate setting, any CPwL function can be generated by a shallow ReLU network. This is no longer true in the multivariate setting, for which it is nonetheless shown that any CPwL function can be generated by a deep ReLU network.
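
The univariate statement can be illustrated with a short Python sketch that writes a CPwL function as a one-hidden-layer ReLU network: one unit per breakpoint carries the change of slope there, and two more units reproduce the leftmost linear piece via x = ReLU(x) - ReLU(-x). The function name `shallow_relu_network` and its parametrization by breakpoints, slopes, and the value at zero are assumptions of this example, and this is one standard construction, not necessarily the one used in the chapter.

```python
def relu(x):
    return max(x, 0.0)


def shallow_relu_network(breakpoints, slopes, value_at_zero=0.0):
    """Build f(x) = c + s_0 (ReLU(x) - ReLU(-x)) + sum_i (s_i - s_{i-1}) ReLU(x - t_i).

    `breakpoints` are t_1 < ... < t_k and `slopes` are s_0, ..., s_k,
    the slopes of the k + 1 linear pieces from left to right.
    """
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need exactly one more slope than breakpoints")
    jumps = [slopes[i + 1] - slopes[i] for i in range(len(breakpoints))]
    # Output bias chosen so that f(0) equals value_at_zero.
    offset = value_at_zero - sum(j * relu(-t) for j, t in zip(jumps, breakpoints))

    def f(x):
        linear = slopes[0] * (relu(x) - relu(-x))
        kinks = sum(j * relu(x - t) for j, t in zip(jumps, breakpoints))
        return offset + linear + kinks

    return f


if __name__ == "__main__":
    # Hat function: 0 outside [0, 2], rising to 1 at x = 1.
    hat = shallow_relu_network([0.0, 1.0, 2.0], [0.0, 1.0, -1.0, 0.0])
    print([hat(x) for x in (-1.0, 0.5, 1.0, 1.5, 3.0)])
```

The hat function in the usage example is compactly supported, which is possible for shallow ReLU networks in one variable.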

Contents
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp vii-xii
- Chapter

27 - Tidbits on Neural Network Training
from Part Five - Neural Networks
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 239-246
- Chapter
Summary

This chapter touches on some aspects related to the training of neural networks. First, a method called backpropagation is presented as a way to efficiently compute gradients in descent algorithms when deep networks are used. Next, the chapter considers shallow networks in the overparametrized regime, and it is proved that the empirical-risk landscape, despite its nonconvexity, features no strict local minimizers. Finally, convolutional neural networks are briefly mentioned.

16 - Low-Rank Recovery from Linear Observations
from Part Three - Compressive Sensing
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 132-138
- Chapter
Summary

In this chapter, a variation of the standard compressive sensing problem is studied. In this variation, sparse vectors are replaced by low-rank matrices. Recovery is now performed by nuclear-norm minimization, with success characterized by an analog of the null space property for the observation map. This property holds with high probability for random observation maps, again as a consequence of an analog of the restricted isometry property. Finally, a formulation of nuclear norm minimization as a semidefinite program is justified.

Frontmatter
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp i-iv
- Chapter

Appendix C - Functional Analysis
from Appendices
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 274-284
- Chapter
Summary

This appendix states and proves several important results about completeness, convexity, and extreme points. These results, including the supporting hyperplane theorem and the Hahn–Banach extension theorem, are invoked throughout the text.

19 - Basic Convex Optimization
from Part Four - Optimization
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 160-168
- Chapter
Summary

This chapter introduces the key concepts of optimization, such as objective function, constraints, local and global minimizers, and gradient descent algorithms. The rate of convergence for the steepest descent algorithm is analyzed when the objective function is smooth and convex or smooth and strongly convex. The analysis is extended to the stochastic gradient descent algorithm.
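
A minimal Python sketch of a fixed-step gradient descent iteration on a smooth, strongly convex objective is given below. The function name `gradient_descent`, the quadratic test objective, and the step size 1/L (with L the largest curvature) are assumptions chosen for this example; the chapter's exact algorithm and step-size rule may differ.

```python
def gradient_descent(grad, x0, step, n_steps):
    """Iterate x <- x - step * grad(x) for a fixed number of steps."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Objective f(x) = 0.5 * (x_1^2 + 10 * x_2^2), minimized at the origin.
    # Its gradient is (x_1, 10 * x_2); the smoothness constant is L = 10.
    def grad(x):
        return [x[0], 10.0 * x[1]]

    print(gradient_descent(grad, [1.0, 1.0], step=0.1, n_steps=200))
```

On this quadratic each coordinate contracts by a constant factor per step, which is the linear convergence one expects in the smooth and strongly convex case.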

17 - Sparse Recovery from One-Bit Observations
from Part Three - Compressive Sensing
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 139-148
- Chapter
Summary

This chapter returns to the recovery of sparse vectors, but this time the linear measurements are quantized to retain only their signs. With the help of the restricted isometry property from ℓ2 to ℓ1, it is shown that the direction of sparse vectors can still be approximately recovered via a hard thresholding procedure or via a linear program. Furthermore, it is shown that the magnitude, too, can be recovered if an appropriate modification of the signed observations is allowed.

Appendix D - Matrix Analysis
from Appendices
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 285-296
- Chapter
Summary

This appendix establishes some crucial results about eigenvalues, singular values, and matrix norms. Of particular importance are the Mirsky inequality and the von Neumann trace inequality.

23 - Instances of Nonconvex Optimization
from Part Four - Optimization
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 194-204
- Chapter
Summary

This chapter presents three examples of nonconvex optimization programs that can be solved (almost) exactly. The first example concerns quadratically constrained quadratic programs, whose treatment relies on the so-called S-lemma. The second example is dynamic programming, which is utilized to compute best approximants by sparse and disjointed vectors. The third example consists of projected gradient descent algorithms, including iterative hard thresholding algorithms.

Index
Simon Foucart, Texas A & M University
Book:

Mathematical Pictures at a Data Science Exhibition

Published online:

21 April 2022

Print publication:

28 April 2022, pp 315-318
- Chapter

2190 results in Pattern Recognition and Machine Learning