Our collaboration on genetic effects on biochemical characteristics began in about 1979 when Nick Martin was at the Australian National University, in Canberra, and exploring the possibility of conducting what became the Alcohol Challenge Twin Study (ACTS; Martin et al., 1985). He visited Sydney and came to see me at Royal Prince Alfred Hospital, mainly to ask about laboratory tests to assess subjects’ alcohol intake. I had to tell him that the prospects for estimating alcohol intake accurately for an individual person were poor, but we went on to agree that doing a range of biochemical tests on twins and using the results to assess heritability would be valuable. At that time there were few studies of this kind, and they had mostly focused on lipids (particularly cholesterol) because of their relevance to cardiovascular disease.
Over the subsequent 40 years, our biochemical studies developed through a number of stages, as happened for other phenotypes of biomedical interest. From initial steps to establish the existence of genetic effects and estimate heritability using comparisons of monozygotic (MZ) and dizygotic (DZ) pair similarity, study size grew to allow consideration of genetic correlations between phenotypes. By about 1990, genotyping of variants in candidate genes was becoming possible and soon after that the typing of genomewide microsatellite markers led to (mostly unsuccessful) attempts to identify loci affecting quantitative variation by genetic linkage. Around 2005, the technical advances allowing manufacture of genotyping arrays, and the conceptual step from linkage to association testing (Risch & Merikangas, 1996), made genomewide association studies (GWAS) possible. Possible, that is, if one had access to samples for DNA extraction, phenotypic information, consent from study participants and funds to purchase the genotyping chips. Fortunately, we had the first three and this led gradually to the fourth.
The results of the GWAS revolution are still playing out, but developments so far include not only identification of loci affecting quantitative variation, but a greater understanding of the relationships between phenotypes (including between biomarkers and disease) and increasing use of genetic results to address questions of causation in epidemiology.
Heritability and Other Twin Pair Designs
Blood samples from the ACTS participants were used for a range of biochemical and hematological tests, and the results led to 10 papers that tended to estimate heritability and (because a subsample of ACTS participants were willing to return for a second time) repeatability. This combination clarified a fact that is still not sufficiently appreciated: when heritability and test–retest repeatability are similar, the long-term average of a diagnostic biomarker or risk factor is strongly dependent on genetic variation and environmental effects tend to be evanescent.
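As a rough illustration of how MZ and DZ pair similarity translates into a heritability estimate, the sketch below applies Falconer's formulas to a pair of twin correlations. This is a textbook approximation chosen for illustration, not the model-fitting approach used in the papers themselves (which is not described here); the function name and the input correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer-style variance shares from MZ and DZ twin correlations.

    Assumes purely additive genetic effects and equal shared environments
    for MZ and DZ pairs (simplifying assumptions of the classical design).
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz    # shared-environment share
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only to make the arithmetic easy to follow.
estimates = falconer_estimates(r_mz=0.60, r_dz=0.30)
print(estimates)
```

With these made-up inputs the shared-environment share is zero and the heritability equals the MZ correlation. If the test–retest repeatability of the measurement were also close to that value, there would be little room left for stable environmental effects, which is the point made in the paragraph above.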
One of these biochemical studies (Whitfield & Martin, 1983) was an early example of integration of a genetic marker into a twin study. It had been known for a long time that serum alkaline phosphatase activity is affected by the ABO and Lewis blood groups, and ABO grouping was one of the tests used to confirm self-reported zygosity in the twin pairs. About 15% of the genetic variance in alkaline phosphatase activity was associated with ABO type, which is still a large effect even in the GWAS era, and the ABO locus has turned out to be significant (for reasons which are not clear) in GWAS of many phenotypes.
Another variation on twin studies was the use of MZ pairs, and those who participated twice, to assess postulated genetic effects on sensitivity to environmental variation. The hypothesis (Magnus et al., 1981) was that some variants, which might or might not affect mean values for a phenotype, would affect the response of the phenotype to environmental variation. By genotyping MZ twin pairs for the genetic variant (the MN blood group), and measuring the phenotype (cholesterol) in each twin or in the same person on more than one occasion, it would be possible to test the hypothesis that within-pair or within-person differences would be associated with genotype. Such gene–environment interaction would be of considerable importance if, say, some people obtained benefit from change in diet and others did not. As so often occurs, the original hypothesis was not strongly supported by results (Martin et al., 1983), but a slightly different one (of effects on triglycerides) emerged. Subsequent multicentre data (Surakka et al., 2012) suggested that gene-by-environment interaction for lipid levels might exist, with a just-significant result (but for a different locus and phenotype) from genomewide testing. I mention these studies as an example of an attractive hypothesis, worth some effort to test, not being supported in practice. More generally, G × E interaction has only been shown infrequently despite the large amount of GWAS data now available.
Candidate Genes, Linkage
Association studies involving candidate genes have proved to be a trap, and it is widely accepted that they have led to many false positives through lack of consideration of the multiple testing problem when claiming significant results. The positive aspect has been an increased awareness of the need to set stringent p-value thresholds in genomewide studies and, as far as possible, to replicate results in independent cohorts. Linkage studies for quantitative phenotypes such as biochemical test results have mostly failed for a different reason: the effect sizes (with few exceptions) are too small to be detectable. Our experience with candidate genes and linkage generally followed this pattern.
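The multiple testing problem can be made concrete with a Bonferroni correction, sketched below. The conventional genomewide significance threshold of 5 × 10⁻⁸ corresponds roughly to a family-wise error rate of 0.05 spread over about one million independent tests; the number of candidate-gene tests used here is a made-up figure for illustration, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical candidate-gene study testing 20 variant-phenotype combinations.
candidate_threshold = bonferroni_threshold(0.05, 20)

# Approximate number of independent common-variant tests in a GWAS (assumption).
genomewide_threshold = bonferroni_threshold(0.05, 1_000_000)

print(f"candidate-gene threshold: {candidate_threshold:.4g}")
print(f"genomewide threshold:     {genomewide_threshold:.1e}")
```

A result reported at p < .05 in a study that quietly ran 20 tests would not meet even the first of these thresholds, which is one way the false positives described above could arise.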
One successful candidate gene study was to evaluate the effects of variation at the homeostatic iron regulator (HFE) gene, newly found to be necessary (but not sufficient) for hemochromatosis, on serum iron and related measures of iron status in the general population (Whitfield et al., 2000). This integrated HFE genotype information with the twin study method and showed that although HFE variants had significant effects on iron status, they only accounted for a small proportion of the genetic variance, an early example of missing heritability.
Because we had suitable data on related study participants, at first DZ twin pairs and later nontwin siblings, we made a number of attempts to identify loci affecting lipids through linkage analysis, but association analyses soon displaced linkage. One successful attempt was for serum butyrylcholinesterase, where a linkage peak was found on chromosome 3, overlapping the BCHE gene location. This was later substantiated by GWAS, but it should be admitted that linkage also identified a peak on chromosome 5 which did not show association in the later GWAS.
Blood Lead, from h² to GWAS
Lead is toxic and widely distributed in the environment, largely because of human mining and industrial processes including previous use in house paints and as a petrol additive. It has been implicated in a range of phenomena from the fall of the Roman Empire (now largely refuted; see Retief & Cilliers, 2006) to childhood behavior disorders and educational achievement (for which there is strong evidence of association, but with many potential confounders that make causation uncertain; Bellinger, 2008). Because of the presence of lead in the environment, it was taken for granted that variation in blood lead would be ‘environmental’ rather than ‘genetic’. A series of papers using data from twins and their relatives gave a different perspective.
First, the classical twin method (Whitfield et al., 2010) showed evidence for substantial heritability of blood lead concentration in adults (h² ≈ 40%), with no significant shared environment effect. Linkage analysis suggested that a region of chromosome 3 contained a variant affecting blood lead. This extensive region includes the solute carrier family 4 member 7 (SLC4A7) gene, which codes for a transporter affecting lead influx into erythrocytes. This was very encouraging, but the linkage result was not supported by later GWAS results.
Given the evidence for heritability and the possible localization of a variant having substantial effects on blood lead, the next step was to conduct a GWAS. This was done in collaboration with Dave Evans and used the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort from the UK in addition to our data. It found one significant locus (aminolevulinate dehydratase [ALAD]) using a combined sample size of 5400 people. However, there was no evidence for significant association at the chromosome 3 linkage region.
This example is important because it follows the stages of genetic investigation from heritability (of a seemingly environmental phenotype) to GWAS, with a diversion through linkage on the way. If it ever becomes possible to gather more data, a much larger GWAS for blood lead should identify more loci and permit the use of Mendelian randomization to assess whether associations between lead and childhood development are causal.
GWAS — Heart, Kidney, Liver
The panel of routine diagnostic tests which we ran on blood samples from twins and their families covered a number of organ systems or areas of risk: lipids for heart disease; creatinine, urea and uric acid for kidney function; enzyme tests for liver function; and C-reactive protein (CRP) for inflammation.
Although we accumulated biochemical data on around 17,000 adults (mostly with genotyping), the main value of this dataset came from collaboration with other groups who had similar data and from meta-analysis. Through these collaborations, sample sizes in the hundreds of thousands could be achieved, and discovery of significant variants has been far beyond what any single group could have managed (Ligthart et al., 2018; Tin et al., 2019; Willer et al., 2013; Wuttke et al., 2019). More important than listing significant variants, our data contributed to insights such as the causal role of triglycerides in coronary artery disease (Do et al., 2013); confirmation that most loci associated with kidney function assessed from creatinine results are also associated with urea and with diagnosed chronic kidney disease (Wuttke et al., 2019); and the finding that genes containing variants that affect C-reactive protein concentration cluster in two groups, representing immune and metabolic pathways (Ligthart et al., 2018).
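To show why pooling cohorts helps, the sketch below combines per-cohort effect estimates for a single variant using inverse-variance weighted fixed-effect meta-analysis. This is a standard approach assumed here for illustration; the text does not specify which method each cited consortium used, and the function name and numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas: list[float], ses: list[float]) -> tuple[float, float]:
    """Fixed-effect meta-analysis of per-cohort effect sizes for one variant.

    Each cohort is weighted by 1 / SE^2, so larger (more precise) cohorts
    contribute more to the pooled estimate.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Two hypothetical cohorts of equal precision reporting the same variant.
beta, se = inverse_variance_meta(betas=[0.10, 0.20], ses=[0.05, 0.05])
print(f"pooled beta = {beta:.3f}, pooled SE = {se:.4f}")
```

The pooled standard error is smaller than that of either cohort alone, which is how collaborations reach significance for variants that no single group could detect.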
GWAS — Other Phenotypes
Apart from the widely available tests mentioned above, we measured a number of other biochemical phenotypes. Despite the constraints imposed by small sample sizes (our studies plus one or just a few others), several important and/or interesting associations have been found.
As well as blood lead (discussed above), the method for lead estimation also gave results for six other toxic or essential elements in blood cells (As, Cd, Cu, Hg, Se, Zn). These also showed significant heritability, and there was a notable genetic correlation between concentrations of As and Hg (rG = .83, whereas rE = .34). GWAS for the essential elements, and meta-analysis with similar data from the ALSPAC cohort, showed a number of significant loci for Cu, Se and Zn with, in many cases, probable explanations in terms of gene functions (Evans et al., 2013).
An early study on iron and HFE genotypes was mentioned above. This was expanded to GWAS with our own data and then to meta-analysis of GWAS data from multiple groups, which included almost 50,000 participants. Eleven loci were identified as significant for one or more of the markers of iron status (Benyamin et al., 2014), and because of the biological importance of iron and its potential to cause tissue damage there have been a number of attempts to use the relevant genotypes as instrumental variables to test whether associations between iron and disease are causal.
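The use of genotypes as instrumental variables can be illustrated with the simplest Mendelian randomization estimator, the Wald ratio, sketched below. It divides a variant's effect on the outcome by its effect on the exposure, and it is valid only if the variant affects the outcome solely through the exposure. The function name and all numbers are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure effect.

```python
def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> tuple[float, float]:
    """Wald ratio estimate of the causal effect of an exposure on an outcome.

    beta_exposure: effect of the variant on the exposure (e.g. an iron marker)
    beta_outcome:  effect of the same variant on the outcome (e.g. disease risk)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical per-allele effects, for illustration only.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
print(f"causal estimate = {estimate:.2f} (SE {se:.2f}) per unit of exposure")
```

In practice several variants are usually combined, and sensitivity analyses are needed to check for pleiotropy; this single-variant version only shows the core arithmetic.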
Plasma cholinesterase (butyrylcholinesterase, BCHE) is an enzyme whose activity is associated with obesity and other aspects of metabolic syndrome, but its function and the reasons for these associations are unknown. Because BCHE measurement was included in our test profile, we carried out a GWAS with the expectation that identification of genes affecting BCHE variation would shed light on its function and relationships with other phenotypes. By far the strongest associations were within or near the BCHE gene, and other significant loci were not associated with metabolic risk factors. On the other hand, single nucleotide polymorphisms (SNPs) in genes associated with metabolic risk tended to have effects on BCHE, suggesting that BCHE variation is a consequence of metabolic abnormalities.
Carbohydrate-deficient transferrin (CDT) comprises transferrin isoforms that have fewer than the usual four terminal sialic acid residues on their glycan sidechains, and their relative concentration in serum is increased in people with high alcohol intake. Because of our interest in markers of alcohol use, and as an example of variation affecting protein glycosylation, we conducted a GWAS for CDT (Kutalik et al., 2011). This identified two loci, the transferrin (TF) gene itself and the phosphoglucomutase 1 (PGM1) gene, whose product catalyses an early step in synthesis of the carbohydrate side chains, showing that variation in both the protein structure and in formation of the glycan component can affect the product.
Proteolytic cleavage of chromogranins leads to formation of a number of bioactive peptides including catestatin, which has a role in control of blood pressure. Collaboration with Dan O’Connor, the major player in study of chromogranins and related peptides, led us through heritability, linkage and GWAS stages to discovery of two loci affecting catestatin formation (Benyamin et al., 2017). Each locus contained a gene for a proteolytic enzyme involved in the intrinsic pathway of coagulation, and review of published literature showed that this process is important for formation of several peptide hormones from their precursors.
Conclusions
Studies on the genetics of biomarkers carry the expectation that because the biomarkers are associated with disease, results will be translatable to the genetics of disease. GWAS results in general may give insight into the mechanisms that regulate or influence the phenotype; they can (depending on the genetic architecture and on study size) predict the phenotype of an individual or stratify their risk of disease; and they can establish or refute causal relationships between apparent risk factors and disease. Genetic studies on biochemical phenotypes have grown and developed over the past 40 years, from 412 participants in our early twin studies to over a million in recent collaborative meta-analyses. It should be remembered that the justification for mega-GWAS studies came from initial, smaller GWAS, and the justification for the initial GWAS usually came from the knowledge that the phenotypes had significant heritability.