The object of this paper is to show, by a numerical example, that the commonly held view of the Hardy–Weinberg principle rests on a misconception of its connection to random mating of parents and independent pairing of gametes to form zygotes. It follows that one cannot infer random mating in a population from the observation of Hardy–Weinberg equilibrium. Nevertheless, this inference is commonly made; for example, Tallis (1966, p. 121), after giving a set of Hardy–Weinberg frequencies, writes: ‘the population is assumed to be panmictic’.
Li (1988) introduced the term ‘pseudo-random mating’ when he specified a model showing that Hardy–Weinberg proportions could be maintained under nonrandom mating at a locus with two alleles. That this is possible is implicit in a formula given by Stark (1980), and Stark (2006) shows that Hardy–Weinberg proportions can be reached from an arbitrary distribution in a single round of nonrandom mating. In the light of the present paper, it can be conjectured confidently that the Hardy–Weinberg composition of a population can be maintained with nonrandom mating for any number of alleles.
The ABO blood group series is used here because it is an example of a locus with three allelomorphs. Crew (1947) is based on a course of genetics that the author gave to medical students for almost 30 years. When considering the difficulty of presenting such a course, he wrote (1947, p. viii): ‘It is not to be expected that, to one whose conditioned ambition it is to treat the sick or injured human individual, genetics can exercise a strong appeal.’ In presenting the ABO groups, Crew strives to overcome the resistance of medical students. The subdivision of antigen A into forms A1 and A2 noted by Penrose (1973, p. 25) is ignored here.
Crew points out the possible confusion that can arise as to whether A, B and O refer to the blood groups, the antigens or the genes. We use A, B and O for the genes and A, B, AB and O for the phenotypes. Crew gives a table of the rules to be observed for the transfusion of whole blood (1947, p. 66). The phenotypes and genotypes are grouped as follows: A (AA and AO), B (BB and BO), AB (AB) and O (OO). He gives the percentages of phenotypes as A: 42.2; B: 8.7; AB: 3.2; O: 45.8 (p. 67). Crew also gives a table of the 10 different parental mating combinations, showing both the possible phenotypes of offspring and the phenotypes lacking among the offspring (p. 68).
Yamamoto (2004) gives a more comprehensive review of the ABO system, including the evolution of the system in humans and other species.
In the main, Crew does not touch on population genetics theory. In relation to ‘total sex-linked inheritance’, such as red–green blindness and hemophilia, he makes the following observation (p. 49): ‘If the frequency of hemophilic males among the male population is p and the frequency of normal males is q, where (p + q) = 1, then with random mating, the frequency of hemophiliacs, carriers and normals among the female population is $p^2 : 2pq : q^2$.’ The extension of this distribution to an autosomal locus with three alleles is the subject of this paper.
Cavalli-Sforza and Bodmer (1971, p. 53) state: ‘The Hardy-Weinberg theorem (as shown by Weinberg, 1909) can be extended quite simply to cover multiple alleles. Thus, if we assume that random mating is equivalent to the random union of gametes, we may compute the frequencies of the various genotypes by the expansion of
where the $p_i$’s are the frequencies of the genes $A_i$.’
The above statement invites two comments: first, ‘random union of gametes’ should be pairing of gametes drawn independently from the gene pool, and second, Hardy–Weinberg frequencies can be maintained by nonrandom mating of parents, as is demonstrated in the next section.
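As an illustration of the expansion referred to in the quotation, the three-allele case can be written out in full. The symbols $p_1, p_2, p_3$ and $A_1, A_2, A_3$ follow the notation of the quotation; the choice of exactly three alleles is an assumption made here to match the ABO example.

$$
(p_1 A_1 + p_2 A_2 + p_3 A_3)^2 = p_1^2\,A_1A_1 + p_2^2\,A_2A_2 + p_3^2\,A_3A_3 + 2p_1p_2\,A_1A_2 + 2p_1p_3\,A_1A_3 + 2p_2p_3\,A_2A_3
$$

The coefficient of each genotype is its Hardy–Weinberg frequency, and the six coefficients sum to $(p_1 + p_2 + p_3)^2 = 1$.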
Nonrandom Mating and Hardy–Weinberg Frequencies
Phenotypic identities are ignored so that the focus is on the three genes denoted A, B, C and genotypes AA, BB, CC, AB, AC, BC, which are numbered, respectively, 1, 2, 3, 4, 5, 6. There are 36 possible mating combinations and the proportions are set out in a symmetric matrix with elements $c_{i,j}$, $i, j = 1, 2, \ldots, 6$.
The following identities give the conditions on the elements of the matrix which ensure that the offspring distribution is the same as the parental:
The above equations are satisfied if
Table 1 is an example that illustrates how Hardy–Weinberg frequencies can be maintained with nonrandom mating. The gene frequencies of A, B and C are 2/9, 3/9 and 4/9, respectively, and the genotype frequencies of AA, BB, CC, AB, AC and BC are 4/81, 9/81, 16/81, 12/81, 16/81 and 24/81. Each element in the table is to be divided by 6561 (that is, 81 × 81) to convert it to a fraction. Table 2 gives the corresponding matrix for random mating of couples.
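The claim can be checked numerically with a minimal sketch in Python. The random-mating matrix is built as the product of genotype frequencies, as the text describes for random mating of couples. The nonrandom matrix is a hypothetical construction made for this illustration only and is not the paper's Table 1: it adds a small multiple of $v v^{T}$ to the random-mating matrix, where the assumed vector v = (1, 1, 0, −2, 0, 0) contributes no net gametes of any allele. All function and variable names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

# Genotypes in the order used in the text: AA, BB, CC, AB, AC, BC.
GENOTYPES = ["AA", "BB", "CC", "AB", "AC", "BC"]

def hardy_weinberg(p):
    """Hardy-Weinberg genotype frequencies for allele frequencies p."""
    return [(1 if a == b else 2) * p[a] * p[b] for a, b in GENOTYPES]

def offspring_distribution(c):
    """Offspring genotype frequencies produced by a 6x6 mating matrix c.

    Within each mating, every parent transmits either of its two alleles
    with probability 1/2, so each allele pairing has probability 1/4.
    """
    freq = dict.fromkeys(GENOTYPES, Fraction(0))
    for (i, gi), (j, gj) in product(enumerate(GENOTYPES), repeat=2):
        for a, b in product(gi, gj):
            freq["".join(sorted(a + b))] += c[i][j] / 4
    return [freq[g] for g in GENOTYPES]

# Gene frequencies quoted in the text.
p = {"A": Fraction(2, 9), "B": Fraction(3, 9), "C": Fraction(4, 9)}
f = hardy_weinberg(p)  # 4/81, 9/81, 16/81, 12/81, 16/81, 24/81

# Random mating of couples: each element is a product of genotype frequencies.
random_mating = [[fi * fj for fj in f] for fi in f]

# Hypothetical nonrandom matrix (NOT the paper's Table 1): an excess of
# AA x AA, AA x BB, BB x BB and AB x AB matings, balanced by a deficit
# of AA x AB and BB x AB matings.
v = [1, 1, 0, -2, 0, 0]
eps = Fraction(10, 6561)
nonrandom = [[random_mating[i][j] + eps * v[i] * v[j] for j in range(6)]
             for i in range(6)]

# Both mating schemes reproduce the parental Hardy-Weinberg frequencies.
print(offspring_distribution(random_mating) == f)
print(offspring_distribution(nonrandom) == f)
```

Exact fractions are used so that the equality of parental and offspring distributions is tested without rounding error. The row sums of the hypothetical matrix also equal the genotype frequencies, so the two schemes differ only in how genotypes are paired.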
Estimating Gene Frequencies
Race and Sanger (1975) comment on the value of estimating gene frequencies and cite Bernstein (1930) in relation to ABO; for example: ‘knowing the gene frequencies we can calculate the expected frequency of children of different groups, from any type of mating’ (p. 12). Looking back over the 50 years preceding the 1975 edition of their work, they write:
We sometimes wonder whether since 1911, or say 1925 to take in Bernstein, the only contributions of the first magnitude to the system [ABO] are to be found in the biochemical work on the ABH substances and in the work on the ‘Bombay’ phenomenon; and in the recognition of the cis phenomenon and perhaps, on a more practical level, the finding of specific agglutinins in extracts of seeds and snails. (Race & Sanger, 1975, p. 15)
Presumably, Race and Sanger were not considering studies of associations between ABO phenotypes and diseases. Mueller and Young (1995, pp. 188–189) summarize the associations between ABO types and duodenal and gastric ulcers, some of which were investigated before 1975. Many other such studies have been done.
Hartl and Clark (1989, pp. 40–42) describe a method for calculating gene frequencies from a set of ABO phenotypic frequencies. The method assumes that the population is in Hardy–Weinberg equilibrium so that the genotypic proportions are
These authors use the following sample counts from Mourant et al. (1976):
O: 702; A: 862; B: 365; AB: 131, and give the estimated gene frequencies A: 0.2813; B: 0.1291; O: 0.5895.
Kempthorne (1957, pp. 172–177) gives a different method, also iterative, for estimating gene frequencies. He uses the following phenotypic counts taken from Taylor and Prior (1938):
O: 202; A: 179; B: 35; AB: 6, and gives the estimated gene frequencies A: 0.25156; B: 0.05001; O: 0.69843. He also gives standard errors of the estimates.
Both methods of estimating gene frequencies use the theoretical proportions of the Hardy–Weinberg distribution to split the A and B phenotypic counts into their homozygous and heterozygous parts. The same proportions are also used to calculate starting frequencies for the iterative methods. The method of Hartl and Clark uses gene counts to make repeated revisions of the gene frequency estimates. The important point is the crucial role played by the Hardy–Weinberg assumption.
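The role of the Hardy–Weinberg assumption can be made concrete with a minimal gene-counting sketch in Python. It is an illustration of the general idea described above, not a reproduction of either author's procedure. The function name `abo_gene_counting`, the symbols p, q, r for the A, B and O gene frequencies, the starting values and the fixed number of iterations are all assumptions of the sketch.

```python
from math import sqrt

def abo_gene_counting(n_O, n_A, n_B, n_AB, iterations=100):
    """Estimate A, B, O gene frequencies (p, q, r) by gene counting.

    Assumes Hardy-Weinberg proportions: AA p^2, AO 2pr, BB q^2,
    BO 2qr, AB 2pq, OO r^2.
    """
    n = n_O + n_A + n_B + n_AB

    # Starting values (an assumption of this sketch), taken from the
    # Hardy-Weinberg expectations r^2 for O and (q + r)^2 for B plus O.
    r = sqrt(n_O / n)
    p = 1 - sqrt((n_B + n_O) / n)
    q = 1 - p - r

    for _ in range(iterations):
        # Split phenotype A into AA and AO in the ratio p^2 : 2pr,
        # and phenotype B into BB and BO in the ratio q^2 : 2qr.
        n_AA = n_A * p / (p + 2 * r)
        n_AO = n_A - n_AA
        n_BB = n_B * q / (q + 2 * r)
        n_BO = n_B - n_BB

        # Count genes among the 2n genes in the sample.
        p = (2 * n_AA + n_AO + n_AB) / (2 * n)
        q = (2 * n_BB + n_BO + n_AB) / (2 * n)
        r = (2 * n_O + n_AO + n_BO) / (2 * n)
    return p, q, r

# Counts quoted in the text from Mourant et al. (1976).
p, q, r = abo_gene_counting(n_O=702, n_A=862, n_B=365, n_AB=131)
print(round(p, 4), round(q, 4), round(r, 4))
```

With the counts quoted from Mourant et al., the iteration should settle close to the estimates reported in the text (about 0.2813, 0.1291 and 0.5895). Every step of the split relies on the assumed Hardy–Weinberg proportions, which is the dependence the paragraph above draws attention to.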
Discussion
The goal of Zhu and colleagues (2020) was to explore the association between the ABO system and human longevity. They sampled 2201 centenarians (570 males, 1631 females) and a regionally matched control group of 2330 middle-aged individuals (793 males, 1537 females). They found no significant difference in ABO phenotypic frequencies between the two groups and so concluded that ABO had no effect on longevity. The A and B gene frequencies were each about 21%.
By contrast, the study of Groot et al. (2020) found the ABO blood group system to be associated with several parameters of healthy aging and disease development. The analysis was based on data from 406,755 unrelated individuals in the UK Biobank cohort. They summarized the result as follows:
In this large community-based population, we determined ABO blood group phenotypes based on inherited allelic combinations and observed numerous associations between the ABO blood group system with healthy aging and the development of a multitude of diseases. The ABO blood groups were primarily associated with cardiovascular outcomes. The present study observed that individuals with blood group A and B were at higher risk of developing thromboembolic diseases, but lower risk of hypertension, when compared with O-group individuals. Individuals with blood group A were at higher risk of developing hyperlipidemia, atherosclerosis, and heart failure compared with blood group O, whereas individuals with blood group B were at higher risk of myocardial infarction compared with individuals with blood group O. The observed differences suggest blood group-specific approaches for the maintenance of human health and the prevention and treatment of a multitude of diseases. (Groot et al., 2020, p. 834)
The gene frequencies in the sample are A: 0.2718; B: 0.0698; O: 0.6583.
The above studies show that there is considerable interest in exploring the relation between the ABO system and disease and the management of disease. They may appear to take a rather blunt approach compared with, for example, the HapMap method described by Collins (2010, pp. 64–68).
Collins (2010, p. 28) makes a general statement about common diseases such as diabetes, heart disease and cancer, referring to them as polygenic and noting that the ‘power of each individual genetic risk factor is generally quite low’. For heart disease, this might not appear convincing to Groot et al. (2020).
Using the HapMap approach, Levinsson et al. (2014) sought to find which NOS variants were most strongly associated with cardiovascular pathology. They studied 560 CHD cases and 2791 controls using 58 SNPs. They found that the strongest additive protective effect (OR 0.59) was related to the T-allele of rs3782218 in NOS1. Could this be connected in some way with ABO?
Acknowledgments
The author would like to thank a reviewer for suggesting ways to improve the paper.