Gene-set-based inference of biological network topologies from big molecular profiling data

doi:10.1017/CBO9781316162750.015

14 - Gene-set-based inference of biological network topologies from big molecular profiling data

from Part IV - Big data over biological networks

Published online by Cambridge University Press: 18 December 2015

Lipi Acharya and Dongxiao Zhu

Edited by Shuguang Cui, Alfred O. Hero, III, Zhi-Quan Luo and José M. F. Moura

Affiliations:
Lipi Acharya: Dow AgroSciences LLC, USA
Dongxiao Zhu: Wayne State University, USA
Shuguang Cui: Texas A&M University
Alfred O. Hero, III: University of Michigan, Ann Arbor
Zhi-Quan Luo: University of Minnesota
José M. F. Moura: Carnegie Mellon University, Pennsylvania

Summary

Network discovery is of primary interest in many scientific domains. It is especially challenging in the biological domain because (1) such networks are not directly observable in experiments; (2) such networks are dynamic, i.e., different parts of the network are activated at different times and under different conditions; and (3) the increasingly available biological data are often big (volume), heterogeneous (variety), and error-prone (veracity). There is an urgent need for new methods, algorithms, and tools to discover networks from big biological data. In this chapter, we make two assumptions that lead to two approaches to network discovery from big biological data. (1) The true network topology is described by a distribution over candidate topologies. The challenge is that the exponential number of possible topologies is computationally intractable to characterize. Our strategy, gene set Gibbs sampling (GSGS), is to draw sample topologies and use them to infer the true topology, an approximate learning approach that falls within the stochastic algorithm framework. (2) The true network topology is deterministic. The challenge is the large search space, for which we design an artificial intelligence algorithm, gene set simulated annealing (GSSA), to explore the space of network structures efficiently. We use both simulated and real-world data to demonstrate the performance of our approaches relative to selected competing approaches.
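To make the second idea concrete, the following is a minimal sketch of generic simulated annealing over directed network structures. It is not the GSSA algorithm described in the chapter: the function names, the single-edge toggle proposal, the geometric cooling schedule, and the toy energy (a mismatch count against a known topology, standing in for a data-driven score such as a negative log-likelihood) are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(energy, n_nodes, n_steps=5000, t_start=2.0,
                        cooling=0.999, seed=0):
    """Search over directed edge sets for a low-energy network structure.

    `energy` maps a frozenset of (source, target) edges to a number;
    lower is better. All parameter names and defaults are illustrative.
    """
    rng = random.Random(seed)
    # Every ordered pair of distinct nodes is a candidate directed edge.
    pairs = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]

    current = frozenset()          # start from the empty network
    current_e = energy(current)
    best, best_e = current, current_e
    temp = t_start

    for _ in range(n_steps):
        # Proposal: toggle (add or remove) one randomly chosen edge.
        edge = rng.choice(pairs)
        candidate = current ^ {edge}
        cand_e = energy(candidate)
        delta = cand_e - current_e

        # Metropolis rule: always accept improvements; accept a worse
        # structure with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = current, current_e

        temp *= cooling            # geometric cooling (an assumption)

    return best, best_e


# Hypothetical toy target: a four-node chain 0 -> 1 -> 2 -> 3.
TOY_TRUTH = frozenset({(0, 1), (1, 2), (2, 3)})


def toy_energy(edges):
    """Number of edges that differ from the toy target topology."""
    return len(edges ^ TOY_TRUTH)


if __name__ == "__main__":
    structure, score = simulated_annealing(toy_energy, n_nodes=4)
    print(sorted(structure), score)
```

In a real setting the energy would be computed from the observed gene sets rather than from a known answer, and the proposal move and cooling schedule would need to be tuned to the size of the search space.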

Introduction

The past decade has witnessed a tremendous explosion in the amount of data generated through high-throughput molecular profiling technologies such as microarrays and next-generation sequencing. Big molecular profiling datasets are enabling a high-resolution view of biological systems and allowing scientists to interrogate the biomolecular activities of tens of thousands of genes simultaneously. However, challenges remain in analyzing big molecular profiling data and gaining meaningful insights into biomolecular interaction and regulation mechanisms. These mechanisms are often understood through the inference of biological networks using computational systems biology approaches. A wide range of methods has been proposed in the literature for inferring the structure of different types of biological networks, such as gene regulatory networks, protein–protein interaction networks, and signaling networks, in the form of Bayesian networks [1, 2], probabilistic Boolean networks (PBNs) [3, 4], mutual information networks [5–7], graphical Gaussian models [8–11], and other approaches [12–16].

Type: Chapter
Information: Big Data over Networks, pp. 391–408

DOI: https://doi.org/10.1017/CBO9781316162750.015

Publisher: Cambridge University Press

Print publication year: 2016

References

[1] N. Friedman, M. Linial, I. Nachman, and D. Peer, “Using Bayesian networks to analyze expression data,” J. Comput. Biol., vol. 7, pp. 601–620, 2000.

[2] E. Segal, M. Shapira, A. Regev, D. Peer, D. Botstein, D. Koller, and N. Friedman, “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data,” Nat. Genet., vol. 34, pp. 166–176, 2003.

[3] I. Shmulevich, E. R. Dougherty, S. Kim, and W. Zhang, “Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks,” Bioinformatics, vol. 18, pp. 261–274, 2002.

[4] I. Shmulevich, I. Gluhovsky, R. Hashimoto, E. R. Dougherty, and W. Zhang, “Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks,” Comp. Funct. Genomics, vol. 4, pp. 601–608, 2003.

[5] A. J. Butte and I. S. Kohane, “Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements,” Pac. Symp. Biocomput., vol. 5, pp. 415–426, 2000.

[6] G. Altay and F. Emmert-Streib, “Revealing differences in gene network inference algorithms on the network-level by ensemble methods,” Bioinformatics, vol. 26, no. 14, pp. 1738–1744, 2010.

[7] P. E. Meyer, K. Kontos, and G. Bontempi, “Information-theoretic inference of large transcriptional regulatory networks,” EURASIP J. Bioinform. Syst. Biol., 2007.

[8] H. Kishino and P. J. Waddell, “Correspondence analysis of genes and tissue types and finding genetic links from microarray data,” Genome Informatics, vol. 11, pp. 83–95, 2000.

[9] A. Dobra, C. Hans, B. Jones, J. R. Nevins, and M. West, “Sparse graphical models for exploring gene expression data,” J. Multiv. Anal., vol. 90, pp. 196–212, 2004.

[10] J. Schäfer and K. Strimmer, “An empirical Bayes approach to inferring large-scale gene association networks,” Bioinformatics, vol. 21, pp. 756–764, 2005.

[11] J. Schäfer and K. Strimmer, “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Stat. Appl. Genet. Mol. Biol., vol. 4, 2005.

[12] T. S. Gardner, D. di Bernardo, D. Lorenz, and J. J. Collins, “Inferring genetic networks and identifying compound mode of action via expression profiling,” Science, vol. 301, no. 5629, pp. 102–105, 2003.

[13] J. Tegner, M. K. S. Yeung, J. Hasty, and J. J. Collins, “Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling,” Proc. Natl Acad. Sci. USA, vol. 100, pp. 5944–5949, 2003.

[14] D. Zhu, A. O. Hero, Z. S. Qin, and A. Swaroop, “High throughput screening of co-expressed gene pairs with controlled False Discovery Rate (FDR) and Minimum Acceptable Strength (MAS),” J. Comput. Biol., vol. 12, pp. 1027–1043, 2005.

[15] A. L. Tarca, S. Draghici, P. Khatri, et al., “A novel signaling pathway impact analysis,” Bioinformatics, vol. 25, pp. 75–82, 2009.

[16] C. J. Vaske, S. C. Benz, J. Z. Sanborn, et al., “Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM,” Bioinformatics, vol. 26, pp. 237–245, 2010.

[17] A. Subramanian, P. Tamayo, V. K. Mootha, et al., “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles,” Proc. Natl Acad. Sci. USA, vol. 102, pp. 15 545–15 550, 2005.

[18] L. Tian, S. A. Greenberg, S. W. Kong, et al., “Discovering statistically significant pathways in expression profiling studies,” Proc. Natl Acad. Sci. USA, vol. 102, pp. 13 544–13 559, 2005.

[19] D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources,” Nat. Protoc., vol. 4, no. 1, pp. 44–57, 2009.

[20] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa, “KEGG for representation and analysis of molecular networks involving diseases and drugs,” Nucleic Acids Res., vol. 38, pp. 355–360, 2010.

[21] G. Dennis Jr., B. T. Sherman, D. A. Hosack, et al., “DAVID: database for annotation, visualization and integrated discovery,” Genome Biol., vol. 4, no. 5, 2003.

[22] L. Acharya, T. Judeh, Z. Duan, M. Rabbat, and D. Zhu, “GSGS: a computational approach to reconstruct signaling pathway structures from gene sets,” IEEE/ACM Trans. Comput. Biology Bioinform., vol. 9, no. 2, pp. 438–450, 2012.

[23] L. Acharya, T. Judeh, G. Wang, and D. Zhu, “Optimal structural inference of signaling pathways from unordered and overlapping gene sets,” Bioinformatics, vol. 28, no. 4, pp. 546–556, 2012.

[24] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 2nd edition, Chapman & Hall, 2003.

[25] G. H. Givens and J. A. Hoeting, Computational Statistics, Wiley Series in Probability and Statistics, 2005.

[26] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983.

[27] T. Judeh, T. Jayyousi, L. Acharya, R. G. Reynolds, and D. Zhu, “Gene set cultural algorithm: a cultural algorithm approach to reconstruct networks from gene sets,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (BCB), 2013.

[28] G. D. Bader, M. P. Cary, and C. Sander, “Pathguide: a pathway resource list,” Nucleic Acids Res., vol. 34, pp. 504–506, 2006.

[29] K. Y. Yeung, M. Medvedovic, and R. E. Bumgarner, “Clustering gene-expression data with repeated measurements,” Genome Biol., vol. 4, no. 5, 2003.

[30] M. Medvedovic and S. Sivaganesan, “Bayesian infinite mixture model based clustering of gene expression profiles,” Bioinformatics, vol. 18, pp. 1194–1206, 2002.

[31] M. Medvedovic, K. Y. Yeung, and R. E. Bumgarner, “Bayesian mixtures for clustering replicated microarray data,” Bioinformatics, vol. 20, pp. 1222–1232, 2004.

[32] M. Eisen, P. Spellman, P. O. Brown, and D. Botstein, “Cluster analysis and display of genome-wide expression patterns,” Proc. Natl Acad. Sci. USA, vol. 95, pp. 14 863–14 868, 1998.

[33] J. A. Hartigan and M. A. Wong, “A k-means clustering algorithm,” Applied Stat., vol. 28, pp. 100–108, 1979.

[34] G. J. McLachlan and D. Peel, Finite Mixture Models, Wiley Series in Probability and Mathematical Statistics, Applied Probability and Statistics Section, John Wiley & Sons, 2000.

[35] A. Ben-Hur and I. Guyon, “Detecting stable clusters using principal component analysis,” in Methods in Molecular Biology, Humana Press, 2003.

[36] P. M. Kim and B. Tidor, “Subsystem identification through dimensionality reduction of large-scale gene expression data,” Genome Res., vol. 13, no. 7, pp. 1706–1718, 2003.

[37] M. Koyuturk, A. Grama, and N. Ramakrishnan, “Compression, clustering and pattern discovery in very high dimensional discrete-attribute datasets,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, 2005.

[38] X. Yang, Y. Zhou, R. Jin, and C. Chan, “Reconstruct modular phenotype-specific gene networks by knowledge-driven matrix factorization,” Bioinformatics, vol. 25, pp. 2236–2243, 2009.

[39] S. Draghici, P. Khatri, R. P. Martins, G. C. Ostermeier, and S. A. Krawetz, “Global functional profiling of gene expression,” Genomics, vol. 81, no. 2, pp. 98–104, 2003.

[40] P. Khatri and S. Draghici, “Ontological analysis of gene expression data: current tools, limitations, and open problems,” Bioinformatics, vol. 21, pp. 3587–3595, 2005.

[41] A. Margolin, I. Nemenman, K. Basso, et al., “ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context,” BMC Bioinform., 2006.

[42] J. J. Faith, B. Hayete, J. T. Thaden, et al., “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles,” PLoS Biol., vol. 5, no. 1, 2007.

[43] G. F. Cooper and E. Herskovits, “A Bayesian method for the induction of probabilistic networks from data,” Machine Learning, vol. 9, no. 4, pp. 309–347, 1992.

[44] D. Marbach, T. Schaffter, C. Mattiussi, and D. Floreano, “Generating realistic in silico gene networks for performance assessment of reverse engineering methods,” J. Comput. Biol., vol. 16, no. 2, pp. 229–239, 2009.

[45] D. Marbach, R. J. Prill, T. Schaffter, et al., “Revealing strengths and weaknesses of methods for gene network inference,” Proc. Natl Acad. Sci. USA, vol. 107, no. 14, pp. 6286–6291, 2010.

[46] R. J. Prill, D. Marbach, J. Saez-Rodriguez, et al., “Towards a rigorous assessment of systems biology models: the DREAM3 challenges,” PLoS ONE, vol. 5, 2010.

[47] P. Mendes, Framework for Comparative Assessment of Parameter Estimation and Inference Methods in Systems Biology, MIT Press, Cambridge, MA, 2009.

[48] G. Stolovitzky, R. J. Prill, and A. Califano, “Lessons from the DREAM2 challenges,” Annals of the New York Academy of Sciences, G. Stolovitzky, P. Kahlem, and A. Califano, Eds., vol. 1158, pp. 159–195, 2009.

[49] B. Hajek, “Cooling schedules for optimal annealing,” Mathematics of Operations Research, vol. 13, no. 2, pp. 311–329, 1988.

[50] D. Chickering, “Optimal structure identification with greedy search,” J. Mach. Learn. Res., vol. 3, pp. 507–554, 2002.

[51] R. W. Robinson, Counting Unlabeled Acyclic Digraphs, Springer Lecture Notes in Mathematics, vol. 622, pp. 28–43, 1977.

[52] K. Murphy, “Active learning of causal Bayes net structure,” UC Berkeley, Tech. Rep., 2001.

[53] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal, vol. 49, pp. 291–307, 1970.

[54] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, Cambridge, MA, 1992.

[55] F. Glover, “Tabu Search – Part I,” ORSA J. Comp., vol. 1, no. 3, pp. 190–206, 1989.
