Book contents
- Frontmatter
- Contents
- List of Contributors
- Preface
- 1 An Introduction to High-Throughput Bioinformatics Data
- 2 Hierarchical Mixture Models for Expression Profiles
- 3 Bayesian Hierarchical Models for Inference in Microarray Data
- 4 Bayesian Process-Based Modeling of Two-Channel Microarray Experiments: Estimating Absolute mRNA Concentrations
- 5 Identification of Biomarkers in Classification and Clustering of High-Throughput Data
- 6 Modeling Nonlinear Gene Interactions Using Bayesian MARS
- 7 Models for Probability of Under- and Overexpression: The POE Scale
- 8 Sparse Statistical Modelling in Gene Expression Genomics
- 9 Bayesian Analysis of Cell Cycle Gene Expression Data
- 10 Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model
- 11 Interval Mapping for Expression Quantitative Trait Loci
- 12 Bayesian Mixture Models for Gene Expression and Protein Profiles
- 13 Shrinkage Estimation for SAGE Data Using a Mixture Dirichlet Prior
- 14 Analysis of Mass Spectrometry Data Using Bayesian Wavelet-Based Functional Mixed Models
- 15 Nonparametric Models for Proteomic Peak Identification and Quantification
- 16 Bayesian Modeling and Inference for Sequence Motif Discovery
- 17 Identification of DNA Regulatory Motifs and Regulators by Integrating Gene Expression and Sequence Data
- 18 A Misclassification Model for Inferring Transcriptional Regulatory Networks
- 19 Estimating Cellular Signaling from Transcription Data
- 20 Computational Methods for Learning Bayesian Networks from High-Throughput Biological Data
- 21 Bayesian Networks and Informative Priors: Transcriptional Regulatory Network Models
- 22 Sample Size Choice for Microarray Experiments
- Plate section
5 - Identification of Biomarkers in Classification and Clustering of High-Throughput Data
Published online by Cambridge University Press: 23 November 2009
Summary
Abstract
Variable selection has been the focus of much research in recent years. In this chapter we review our contributions to the development of Bayesian methods for variable selection in problems that aim at either classifying or clustering samples. These methods are particularly relevant for the analysis of genomic studies, where high-throughput technologies allow thousands of variables to be measured on individual samples. We illustrate the methodologies using a DNA microarray data example.
Introduction
One of the major challenges in analyzing genomic data is their high dimensionality. Such data comprise a very large number of variables, often substantially larger than the sample size. A typical example with this characteristic, and one that we use to illustrate our methodologies, is DNA microarray data. Commonly used approaches for analyzing gene expression data proceed in two steps. First, the dimension of the data is reduced, either by assessing genes one at a time and removing those that do not pass a certain threshold, or by using a dimension reduction technique such as principal component analysis. Then, in a second stage of the analysis, a statistical model is applied to the reduced data. A limitation of the univariate screening approach is that it does not assess the joint effect of multiple variables and could discard potentially valuable markers that are not significant individually but may be important in conjunction with other variables. A drawback of dimension reduction techniques is that the actual markers are not assessed, since principal components, for example, are linear combinations of all the original variables. The Bayesian methods reviewed here overcome these limitations and address the selection and prediction problems in a unified manner.
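The limitation of one-gene-at-a-time screening can be illustrated with a small simulation. The sketch below is not the chapter's method: the data are synthetic, the names `gene_a` and `gene_b` and the cutoff of 2.0 are hypothetical choices, and an ordinary least-squares fit stands in for a joint model purely for illustration. Two genes share a large common component, so neither looks informative on its own, yet their difference separates the two classes almost perfectly.

```python
import numpy as np

def two_sample_t(X, y):
    """Welch t-statistic for each column of X, comparing class 1 with class 0."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

rng = np.random.default_rng(0)
n_per_class, n_noise = 100, 50          # assumed sizes, for illustration only
y = np.repeat([0, 1], n_per_class)

# Shared component with identical values in both classes: no class signal.
z = np.tile(rng.normal(0.0, 5.0, n_per_class), 2)
gene_a = z
# gene_b adds a small class shift that is swamped by z when viewed alone.
gene_b = z + 0.25 * (2 * y - 1) + rng.normal(0.0, 0.1, 2 * n_per_class)
noise = rng.normal(0.0, 1.0, (2 * n_per_class, n_noise))
X = np.column_stack([gene_a, gene_b, noise])

# Step 1 of the two-step approach: univariate screening with a fixed cutoff.
t = two_sample_t(X, y)
kept = np.flatnonzero(np.abs(t) > 2.0)

# Joint assessment of the two genes via least squares on a +/-1 class code.
A = np.column_stack([np.ones(len(y)), gene_a, gene_b])
coef, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)
joint_accuracy = np.mean((A @ coef > 0) == (y == 1))

print("t-statistics for gene_a, gene_b:", t[:2])
print("genes passing the screen:", kept)
print("in-sample accuracy using both genes jointly:", joint_accuracy)
```

In this construction the screen discards both `gene_a` and `gene_b`, while the joint fit classifies nearly all samples correctly. The accuracy is computed in-sample and is meant only to show the contrast, not to estimate predictive performance.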
- Type: Chapter
- Information: Bayesian Inference for Gene Expression and Proteomics, pp. 97-115
- Publisher: Cambridge University Press
- Print publication year: 2006