Book contents
- Frontmatter
- Contents
- List of contributors
- Preface
- Part I Mathematical foundations
- Part II Big data over cyber networks
- Part III Big data over social networks
- Part IV Big data over biological networks
- 12 Inference of gene regulatory networks: validation and uncertainty
- 13 Inference of gene networks associated with the host response to infectious disease
- 14 Gene-set-based inference of biological network topologies from big molecular profiling data
- 15 Large-scale correlation mining for biomolecular network discovery
- Index
- References
13 - Inference of gene networks associated with the host response to infectious disease
from Part IV - Big data over biological networks
Published online by Cambridge University Press: 18 December 2015
Summary
Inspired by the problem of inferring gene networks associated with the host response to infectious diseases, a new framework for discriminative factor models is developed. Bayesian shrinkage priors are employed to impose (near) sparsity on the factor loadings, while nonparametric techniques are used to infer the number of factors needed to represent the data. Two discriminative Bayesian loss functions are investigated: the logistic log-loss and the max-margin hinge loss. Efficient mean-field variational Bayesian inference and Gibbs sampling are implemented, and an online version of variational Bayes is developed to handle large-scale datasets. Experimental results on two real-world microarray gene expression datasets show that the proposed framework achieves classification performance superior to that of the methods it is compared against, with model interpretation delivered via pathway association analysis.
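To make the two discriminative losses concrete, here is a minimal NumPy sketch. It assumes binary phenotype labels coded as y in {-1, +1} and a linear discriminant score f = beta^T s computed from a sample's factor scores s; the function names, `beta`, and the toy numbers are illustrative assumptions rather than the chapter's implementation, and the Bayesian machinery (priors, variational or Gibbs inference) is omitted entirely.

```python
import numpy as np

# Assumption: labels y are coded in {-1, +1} and f is a real-valued score.

def logistic_log_loss(y, f):
    """Logistic log-loss log(1 + exp(-y * f))."""
    # np.logaddexp(0, z) = log(exp(0) + exp(z)) = log(1 + exp(z)), computed stably
    return np.logaddexp(0.0, -y * f)

def hinge_loss(y, f):
    """Max-margin hinge loss max(0, 1 - y * f)."""
    return np.maximum(0.0, 1.0 - y * f)

# Hypothetical toy example: K = 4 factor scores for each of 3 samples,
# an illustrative classifier weight vector beta, and binary labels y.
rng = np.random.default_rng(0)
S = rng.normal(size=(3, 4))            # rows are samples, columns are factors
beta = np.array([1.0, -0.5, 0.0, 2.0])
y = np.array([1, -1, 1])
f = S @ beta                           # discriminant score f_n = beta^T s_n

print(logistic_log_loss(y, f))
print(hinge_loss(y, f))
```

Both losses penalize samples whose margin y·f is small or negative; the hinge loss is exactly zero once the margin exceeds 1, whereas the log-loss decays smoothly toward zero and never reaches it.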
Background
From a statistical-modeling perspective, gene expression analysis can be roughly divided into two phases: exploration and prediction. In exploration, the practitioner seeks a general understanding of a dataset by modeling its variability in an interpretable way, so that the inferred model can serve both as a feature extractor and as a hypothesis-generating mechanism for the underlying biological processes. Factor models are among the most widely used techniques for exploratory gene expression analysis [1, 2], with principal component analysis a popular special case [3]. Predictive modeling, on the other hand, is concerned with finding a relationship between gene expression and phenotypes that generalizes to unseen samples. Examples of predictive models include classification methods such as logistic regression and support vector machines [4, 5].
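The view of principal component analysis as a special case of a factor model can be illustrated with a short NumPy sketch: a thin SVD of the gene-centered expression matrix yields orthonormal loadings and per-sample factor scores whose product reconstructs the data at rank K. The function name `pca_factor_model`, the genes-by-samples layout, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def pca_factor_model(X, K):
    """Fit a rank-K PCA factor model: X - mean ≈ loadings @ scores.

    X is a (P genes x N samples) matrix (layout assumed for illustration).
    Returns P x K loadings, K x N factor scores, and the per-gene means.
    """
    mean = X.mean(axis=1, keepdims=True)       # center each gene across samples
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # thin SVD: Xc = U diag(s) Vt
    loadings = U[:, :K]                        # orthonormal gene loadings
    scores = s[:K, None] * Vt[:K, :]           # factor scores for each sample
    return loadings, scores, mean

# Hypothetical toy data: 50 genes, 20 samples, driven by 3 factors plus small noise.
rng = np.random.default_rng(1)
P, N, K = 50, 20, 3
X = rng.normal(size=(P, K)) @ rng.normal(size=(K, N)) + 0.01 * rng.normal(size=(P, N))

loadings, scores, mean = pca_factor_model(X, K)
X_hat = loadings @ scores + mean
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(loadings.shape, scores.shape, round(rel_err, 4))
```

PCA loadings obtained this way are generally nonzero for every gene, which makes gene-level interpretation harder than with the sparse loadings discussed next.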
Factor models infer a latent covariance structure among genes or biomarkers by modeling the data as a noisy low-rank matrix factorization, expressed in terms of a loadings matrix and a factor-scores matrix. Different specifications for these matrices give rise to special cases of factor models, such as principal component analysis [6], nonnegative matrix factorization [7], independent component analysis [8], and sparse factor models [1]. Factor models with a sparse loadings matrix are of particular interest in gene expression analysis, because the nonzero elements of the loadings matrix may be interpreted as networks of correlated genes [1, 2, 9].
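The link between sparse loadings and correlated gene networks can be seen in a small simulation, sketched below with assumed sizes and names (`Phi` for the loadings, `S` for the factor scores, `E` for the noise). It only samples from the generative model X = Phi S + E; it does not perform the shrinkage-prior inference developed in the chapter. Genes that share a nonzero loading on the same factor come out strongly correlated, while genes loading on different factors are approximately uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
P, N, K = 12, 2000, 2        # genes, samples, factors (hypothetical sizes)

# Hypothetical sparse loadings: factor 0 drives genes 0-3, factor 1 drives
# genes 4-7, and genes 8-11 load on no factor (pure noise).
Phi = np.zeros((P, K))
Phi[0:4, 0] = rng.uniform(1.0, 2.0, size=4)
Phi[4:8, 1] = rng.uniform(1.0, 2.0, size=4)

S = rng.normal(size=(K, N))            # factor scores, i.i.d. standard normal
E = 0.5 * rng.normal(size=(P, N))      # idiosyncratic noise, variance 0.25
X = Phi @ S + E                        # noisy low-rank factorization

# The nonzero support of each loadings column is read as one putative gene network.
modules = [np.flatnonzero(Phi[:, k]).tolist() for k in range(K)]
print(modules)                         # [[0, 1, 2, 3], [4, 5, 6, 7]]

# Empirical gene-gene correlations: same-module pairs vs cross-module pairs.
C = np.corrcoef(X)                     # P x P correlation matrix (rows are genes)
print(round(C[0, 1], 2), round(C[0, 4], 2))
```

Under this model the gene covariance is Phi Phi^T + 0.25 I, so two genes co-vary only when their loading rows share a nonzero factor; that is the sense in which the support of a sparse loadings column defines a correlated gene network.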
- Type: Chapter
- Information: Big Data over Networks, pp. 365-390
- Publisher: Cambridge University Press
- Print publication year: 2016