
A new, improved and generalizable approach for the analysis of biological data generated by -omic platforms

Published online by Cambridge University Press:  22 October 2014

A. B. Pleasants
Affiliation:
Mathematical Biology Department, AgResearch, Hamilton, New Zealand; Gravida National Centre for Growth and Development, Auckland, New Zealand
G. C. Wake
Affiliation:
Gravida National Centre for Growth and Development, Auckland, New Zealand; Department of Mathematics and Statistics, Massey University, Albany, New Zealand
P. R. Shorten
Affiliation:
Mathematical Biology Department, AgResearch, Hamilton, New Zealand; Gravida National Centre for Growth and Development, Auckland, New Zealand
C. Z. W. Hassell-Sweatman
Affiliation:
Liggins Institute, University of Auckland, Auckland, New Zealand
C. A. McLean
Affiliation:
Liggins Institute, University of Auckland, Auckland, New Zealand
J. D. Holbrook
Affiliation:
Singapore Institute for Clinical Sciences, National University of Singapore, Singapore
P. D. Gluckman
Affiliation:
Gravida National Centre for Growth and Development, Auckland, New Zealand; Liggins Institute, University of Auckland, Auckland, New Zealand; Singapore Institute for Clinical Sciences, National University of Singapore, Singapore
A. M. Sheppard*
Affiliation:
Gravida National Centre for Growth and Development, Auckland, New Zealand; Liggins Institute, University of Auckland, Auckland, New Zealand
*Address for Correspondence: Dr A. M. Sheppard, Liggins Institute, The University of Auckland, Private Bag 92019, Victoria Street West, Auckland 1142, New Zealand. (E-mail [email protected])

Abstract

The principles embodied by the Developmental Origins of Health and Disease (DOHaD) view of ‘life history’ trajectory are increasingly underpinned by biological data arising from molecular-based epigenomic and transcriptomic studies. Although a number of ‘omic’ platforms are now routinely and widely used in biology and medicine, the data they generate are frequently confounded by the frequency distribution of the measurement error (an inherent feature of the chemistry and physics of the measurement process), which adversely affects the accuracy of estimation and thus the inference of relationships to other biological measures such as phenotype. Based on empirically derived data, we have previously derived a probability density function to capture such errors and thus improve the confidence of estimation and inference based on such data. Here we use published open-source data sets to calculate parameter values relevant to the most widely used epigenomic and transcriptomic technologies. Then, using our own data sets, we illustrate the benefits of this approach by specific application, in this instance to the measurement of DNA methylation, in cases where the level of methylation at specific genomic sites represents either (1) a response variable or (2) an independent variable. Further, we extend this formulation to the ‘bivariate’ case, in which the co-dependency of methylation levels at two distinct genomic sites is tested for biological significance. These tools not only allow greater accuracy of measurement and improved confidence of functional inference but, in the case of epigenomic data at least, also reveal otherwise cryptic information.

Type
Original Article
Copyright
© Cambridge University Press and the International Society for Developmental Origins of Health and Disease 2014 

