Search results for Pattern Recognition and Machine Learning

9 - Survival Analysis and the EM Algorithm
from Part II - Early Computer-Age Methods
Bradley Efron, Stanford University, California, Trevor Hastie, Stanford University, California
Book: Computer Age Statistical Inference, Student Edition
Published online: 26 October 2021
Print publication: 17 June 2021, pp 138-162
- Chapter

20 - Inference After Model Selection
from Part III - Twenty-First-Century Topics
Bradley Efron, Stanford University, California, Trevor Hastie, Stanford University, California
Book: Computer Age Statistical Inference, Student Edition
Published online: 26 October 2021
Print publication: 17 June 2021, pp 407-434
- Chapter

21 - Empirical Bayes Estimation Strategies
from Part III - Twenty-First-Century Topics
Bradley Efron, Stanford University, California, Trevor Hastie, Stanford University, California
Book: Computer Age Statistical Inference, Student Edition
Published online: 26 October 2021
Print publication: 17 June 2021, pp 435-460
- Chapter

Deep Learning in Science

Pierre Baldi
Published online: 17 April 2021
Print publication: 01 July 2021
- Book
This is the first rigorous, self-contained treatment of the theory of deep learning. Starting with the foundations of the theory and building it up, this is essential reading for any scientists, instructors, and students interested in artificial intelligence and deep learning. It provides guidance on how to think about scientific questions, and leads readers through the history of the field and its fundamental connections to neuroscience. The author discusses many applications to beautiful problems in the natural sciences, in physics, chemistry, and biomedicine. Examples include the search for exotic particles and dark matter in experimental physics, the prediction of molecular properties and reaction outcomes in chemistry, and the prediction of protein structures and the diagnostic analysis of biomedical images in biomedicine. The text is accompanied by a full set of exercises at different difficulty levels and encourages out-of-the-box thinking.

6 - Uncertainty Relations and Sparse Signal Recovery
- By Erwin Riegler, Helmut Bölcskei
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 163-196
- Chapter
Summary

This chapter provides an introduction to uncertainty relations underlying sparse signal recovery. We start with the seminal work by Donoho and Stark (1989), which defines uncertainty relations as upper bounds on the operator norm of the band-limitation operator followed by the time-limitation operator, generalize this theory to arbitrary pairs of operators, and then develop, out of this generalization, the coherence-based uncertainty relations due to Elad and Bruckstein (2002), plus uncertainty relations in terms of concentration of the 1-norm or 2-norm. The theory is completed with set-theoretic uncertainty relations which lead to best possible recovery thresholds in terms of a general measure of parsimony, the Minkowski dimension. We also elaborate on the remarkable connection between uncertainty relations and the “large sieve,” a family of inequalities developed in analytic number theory. We show how uncertainty relations allow one to establish fundamental limits of practical signal recovery problems such as inpainting, declipping, super-resolution, and denoising of signals corrupted by impulse noise or narrowband interference.

3 - Compressed Sensing via Compression Codes
- By Shirin Jalali, H. Vincent Poor
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 72-103
- Chapter
Summary

In compressed sensing (CS) a signal x ∈ R^n is measured as y = Ax + z, where A ∈ R^(m×n) (m < n) and z ∈ R^m denote the sensing matrix and measurement noise. The goal is to recover x from measurements y when m < n. CS is possible because we typically want to capture highly structured signals, and recovery algorithms take advantage of a signal’s structure to solve the under-determined system of linear equations. As in CS, data-compression codes take advantage of a signal’s structure to encode it efficiently. Structures used by compression codes are much more elaborate than those used by CS algorithms. Using more complex structures in CS, like those employed by data-compression codes, potentially leads to more efficient recovery methods requiring fewer linear measurements or giving better reconstruction quality. We establish connections between data compression and CS, giving CS recovery methods based on compression codes, which indirectly take advantage of all structures used by compression codes. This elevates the class of structures used by CS algorithms to those used by compression codes, leading to more efficient CS recovery methods.
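
The measurement model in this summary can be illustrated with a small sketch. It is not the compression-code-based recovery method of the chapter; it swaps in iterative hard thresholding, a standard sparsity-based baseline, purely to show what recovering x from y = Ax + z with m < n looks like. The dimensions, noise level, random seed, and helper names (`matvec`, `rmatvec`, `hard_threshold`, `iht`) are all assumptions made for illustration.

```python
import random

def matvec(A, v):
    """Return A v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]

def iht(A, y, k, iters=300):
    """Iterative hard thresholding: gradient step on ||y - A x||^2, then sparsify."""
    # Conservative step size: the squared Frobenius norm upper-bounds
    # the squared spectral norm of A.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [yi - pi for yi, pi in zip(y, matvec(A, x))]
        grad = rmatvec(A, residual)
        x = hard_threshold([xj + step * gj for xj, gj in zip(x, grad)], k)
    return x

# Hypothetical problem sizes: n unknowns, m < n measurements, k nonzeros.
rng = random.Random(0)
n, m, k = 40, 20, 2
A = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]

x_true = [0.0] * n
for j in rng.sample(range(n), k):
    x_true[j] = rng.choice([-1.0, 1.0])

z = [rng.gauss(0.0, 0.01) for _ in range(m)]           # measurement noise
y = [p + e for p, e in zip(matvec(A, x_true), z)]      # y = A x + z

x_hat = iht(A, y, k)
print("true support:     ", sorted(j for j, v in enumerate(x_true) if v != 0.0))
print("estimated support:", sorted(j for j, v in enumerate(x_hat) if v != 0.0))
```

The only structure this baseline exploits is sparsity (at most k nonzero entries). The summary's point is that compression codes capture far richer structure than that.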

13 - Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits
- By Yihong Wu, Jiaming Xu
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 383-424
- Chapter
Summary

This chapter provides a survey of the common techniques for determining the sharp statistical and computational limits in high-dimensional statistical problems with planted structures, using community detection and submatrix detection problems as illustrative examples. We discuss tools including the first- and second-moment methods for analyzing the maximum-likelihood estimator, information-theoretic methods for proving impossibility results using mutual information and rate-distortion theory, and methods originating from statistical physics such as the interpolation method. To investigate computational limits, we describe a common recipe to construct a randomized polynomial-time reduction scheme that approximately maps instances of the planted clique problem to the problem of interest in total variation distance.

Index
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 529-538
- Chapter

15 - Network Functional Compression
- By Soheil Feizi, Muriel Médard
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 455-486
- Chapter
Summary

We study compression for function computation of sources at nodes in a network at receiver(s). The rate region of this problem has been considered under restrictive assumptions. We present results that significantly relax these assumptions. For a one-stage tree network, we characterize a rate region by a necessary and sufficient condition for any achievable coloring-based coding scheme, the coloring connectivity condition. We propose a modularized coding scheme based on graph colorings to perform arbitrarily closely to derived rate lower bounds. For a general tree network, we provide a rate lower bound based on graph entropies and show that it is tight for independent sources. We show that, in a general tree network case with independent sources, to achieve the rate lower bound, intermediate nodes should perform computations, but for a family of functions and random variables, which we call chain-rule proper sets, it suffices to have no computations at intermediate nodes to perform arbitrarily closely to the rate lower bound. We consider practicalities of coloring-based coding schemes and propose an efficient algorithm to compute a minimum-entropy coloring of a characteristic graph.

9 - Universal Clustering
- By Ravi Kiran Raman, Lav R. Varshney
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 263-301
- Chapter
Summary

Clustering is a general term for techniques that, given a set of objects, aim to select those that are closer to one another than to the rest, according to a chosen notion of closeness. It is an unsupervised-learning problem since objects are not externally labeled by category. Much effort has been expended on finding natural mathematical definitions of closeness and then developing/evaluating algorithms in these terms. Many have argued that there is no domain-independent mathematical notion of similarity but that it is context-dependent; categories are perhaps natural in that people can evaluate them when they see them. Some have dismissed the problem of unsupervised learning in favor of supervised learning, saying it is not a powerful natural phenomenon. Yet, most learning is unsupervised. We largely learn how to think through categories by observing the world in its unlabeled state. Drawing on universal information theory, we ask whether there are universal approaches to unsupervised clustering. In particular, we consider instances wherein the ground-truth clusters are defined by the unknown statistics governing the data to be clustered.

Dedication
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp v-vi
- Chapter

16 - An Introductory Guide to Fano’s Inequality with Applications in Statistical Estimation
- By Jonathan Scarlett, Volkan Cevher
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 487-528
- Chapter
Summary

Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano’s inequality. In this chapter, we provide a survey of Fano’s inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization.

14 - Distributed Statistical Inference with Compressed Data
- By Wenwen Zhao, Lifeng Lai
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 425-454
- Chapter
Summary

This chapter introduces basic ideas of information-theoretic models for distributed statistical inference problems with compressed data, and discusses current and future research directions and challenges in applying these models to various statistical learning problems. In these applications, data are distributed in multiple terminals, which can communicate with each other via limited-capacity channels. Instead of recovering data at a centralized location first and then performing inference, this chapter describes schemes that can perform statistical inference without recovering the underlying data. Information-theoretic tools are borrowed to characterize the fundamental limits of the classical statistical inference problems using compressed data directly. In this chapter, distributed statistical learning problems are first introduced. Then, models and results of distributed inference are discussed. Finally, new directions that generalize and improve the basic scenarios are described.

Frontmatter
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp i-iv
- Chapter

10 - Information-Theoretic Stability and Generalization
- By Maxim Raginsky, Alexander Rakhlin, Aolin Xu
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 302-329
- Chapter
Summary

Machine-learning algorithms can be viewed as stochastic transformations that map training data to hypotheses. Following Bousquet and Elisseeff, we say such an algorithm is stable if its output does not depend too much on any individual training example. Since stability is closely connected to generalization capabilities of learning algorithms, it is of interest to obtain sharp quantitative estimates on the generalization bias of machine-learning algorithms in terms of their stability properties. We describe several information-theoretic measures of algorithmic stability and illustrate their use for upper-bounding the generalization bias of learning algorithms. Specifically, we relate the expected generalization error of a learning algorithm to several information-theoretic quantities that capture the statistical dependence between the training data and the hypothesis. These include mutual information and erasure mutual information, and their counterparts induced by the total variation distance. We illustrate the general theory through examples, including the Gibbs algorithm and differentially private algorithms, and discuss strategies for controlling the generalization error.

11 - Information Bottleneck and Representation Learning
- By Pablo Piantanida, Leonardo Rey Vega
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 330-358
- Chapter
Summary

A grand challenge in representation learning is the development of computational algorithms that learn the explanatory factors of variation behind high-dimensional data. Representation models (encoders) are often determined for optimizing performance on training data when the real objective is to generalize well to other (unseen) data. This chapter provides an overview of fundamental concepts in statistical learning theory and the information-bottleneck principle. This serves as a mathematical basis for the technical results, in which an upper bound to the generalization gap corresponding to the cross-entropy risk is given. When this penalty term times a suitable multiplier and the cross-entropy empirical risk are minimized jointly, the problem is equivalent to optimizing the information-bottleneck objective with respect to the empirical data distribution. This result provides an interesting connection between mutual information and generalization, and helps to explain why noise injection during the training phase can improve the generalization ability of encoder models and enforce invariances in the resulting representations.

Contents
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp vii-xii
- Chapter

Preface
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp xiii-xvi
- Chapter

7 - Understanding Phase Transitions via Mutual Information and MMSE
- By Galen Reeves, Henry D. Pfister
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 197-228
- Chapter
Summary

The ability to understand and solve high-dimensional inference problems is essential for modern data science. This chapter examines high-dimensional inference problems through the lens of information theory and focuses on the standard linear model as a canonical example that is both rich enough to be practically useful and simple enough to be studied rigorously. In particular, this model can exhibit phase transitions where an arbitrarily small change in the model parameters can induce large changes in the quality of estimates. For this model, the performance of optimal inference can be studied using the replica method from statistical physics but, until recently, it was not known whether the resulting formulas were actually correct. In this chapter, we present a tutorial description of the standard linear model and its connection to information theory. We also describe the replica prediction for this model and outline the authors’ recent proof that it is exact.

2 - An Information-Theoretic Approach to Analog-to-Digital Compression
- By Alon Kipnis, Yonina C. Eldar, Andrea J. Goldsmith
Edited by Miguel R. D. Rodrigues, University College London, Yonina C. Eldar, Weizmann Institute of Science, Israel
Book: Information-Theoretic Methods in Data Science
Published online: 22 March 2021
Print publication: 08 April 2021, pp 44-71
- Chapter
Summary

Processing, storing, and communicating information that originates as an analog phenomenon involve conversion of the information to bits. This conversion can be described by the combined effect of sampling and quantization. The digital representation in this procedure is achieved by first sampling the analog signal so as to represent it by a set of discrete-time samples and then quantizing these samples to a finite number of bits. Traditionally, these two operations are considered separately. The sampler is designed to minimize information loss due to sampling based on prior assumptions about the continuous-time input. The quantizer is designed to represent the samples as accurately as possible, subject to the constraint on the number of bits that can be used in the representation. The goal of this chapter is to revisit this paradigm by considering the joint effect of these two operations and to illuminate the dependence between them.
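
The traditional two-stage pipeline described in this summary, sampling followed by quantization, can be sketched as below. This is a minimal illustration under stated assumptions, not the joint analysis the chapter develops: the test tone, sampling rate, amplitude range, bit depths, and the function names `sample` and `quantize` are all hypothetical choices.

```python
import math

def sample(signal, rate_hz, duration_s):
    """Represent a continuous-time signal by uniformly spaced samples."""
    count = int(rate_hz * duration_s)
    return [signal(i / rate_hz) for i in range(count)]

def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Uniform quantizer: map each sample to the midpoint of one of 2**bits cells."""
    levels = 2 ** bits
    width = (hi - lo) / levels
    out = []
    for s in samples:
        clipped = min(max(s, lo), hi)                        # clip to the input range
        index = min(int((clipped - lo) / width), levels - 1)  # cell index
        out.append(lo + (index + 0.5) * width)                # reconstruction point
    return out

def max_error(reference, approx):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(a - b) for a, b in zip(reference, approx))

# Assumed input: a 5 Hz tone sampled at 100 Hz for one second.
def tone(t):
    return math.sin(2 * math.pi * 5.0 * t)

samples = sample(tone, rate_hz=100.0, duration_s=1.0)
coarse = quantize(samples, bits=3)   # 8 levels
fine = quantize(samples, bits=8)     # 256 levels

print("max error at 3 bits:", max_error(samples, coarse))
print("max error at 8 bits:", max_error(samples, fine))
```

Here the sampler and the quantizer are designed independently, which is the separation the chapter sets out to revisit. The bit budget only enters through `quantize`, and nothing about it influences how `sample` is chosen.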

2190 results in Pattern Recognition and Machine Learning
