A study on the mapping of quantitative trait loci in advanced populations derived from two inbred lines

CHEN-HUNG KAO; MIAO-HUI ZENG

doi:10.1017/S0016672309000081


Published online by Cambridge University Press: 27 April 2009


CHEN-HUNG KAO*: Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China
MIAO-HUI ZENG: Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China
*Corresponding author: Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, Republic of China. Tel: 886-2-2783-5611 ext. 418. Fax: 886-2-2783-1523. e-mail: [email protected]


Summary

In genetic and biological studies, the F2 population is one of the most popular and commonly used experimental populations, mainly because it can be readily produced and its genome structure possesses several niceties that allow for productive investigation. These niceties include the equivalence between the proportion of recombinants and the recombination rate, the capability of providing a complete set of three genotypes for every locus and an analytically attractive first-order Markovian property. Recently, there has been growing interest in using the progeny populations from F2 (advanced populations) because their genomes can be managed to meet specific purposes or can be used to enhance investigative studies. These advanced populations include recombinant inbred populations, advanced intercrossed populations, intermated recombinant inbred populations and immortalized F2 populations. Due to an increased number of meiosis cycles, the genomes of these advanced populations no longer possess the Markovian property and are more complicated than, and different from, the F2 genomes. Although issues related to quantitative trait locus (QTL) mapping using advanced populations have been well documented, these populations are still often investigated in the same way as F2 populations, using a first-order Markovian assumption. Therefore, more effort is needed to address the complexities of these advanced populations in more detail. In this article, we attempt to tackle these issues by first modifying current methods developed under this Markovian assumption to propose an ad hoc method (the Markovian method) and explore its possible problems. We then consider the specific genome structures present in the advanced populations without invoking this assumption to propose a more adequate method (the non-Markovian method) for QTL mapping. Further, some QTL mapping properties related to the confounding problems that result from ignoring epistasis and to mapping closely linked QTL are derived and investigated across the different populations. Simulations show that the non-Markovian method outperforms the Markovian method, especially in the advanced populations subject to selfing. The results presented here may give some clues to the use of advanced populations for more powerful and precise QTL mapping.

Type: Paper
Information: Genetics Research, Volume 91, Issue 2, April 2009, pp. 85-99

DOI: https://doi.org/10.1017/S0016672309000081
Copyright: Copyright © Cambridge University Press 2009

1. Introduction

Many quantitative trait locus (QTL) detection experiments and statistical QTL mapping methods have been conducted and developed on the basis of the backcross and F₂ populations. These two populations are popular mainly for economic reasons: they can be readily generated for use in experiments, saving time and money. Further, because these populations undergo just a single cycle of meiosis, they have several significant features that make them attractive for general-purpose genetic and biological studies (Lander & Botstein, 1989; Jansen, 1993; Zeng, 1994; Jiang & Zeng, 1997; Kao et al., 1999; Xu, 2007). For example, in these two populations the recombination rate between two loci is equivalent to the proportion of recombinants, and the genomes have a first-order Markovian property. The progeny populations after F₂ (advanced populations) have also been well devised and implemented in genetic studies. These advanced populations include recombinant inbred (RI) populations, advanced intercrossed (AI) populations, intermated recombinant inbred (IRI) populations and immortalized F₂ populations. For a review of these advanced populations, see e.g. Rockman & Kruglyak (2008).

These advanced populations have some very useful features in that their genomic structures allow investigators to achieve better performance in their studies. For example, the RI populations consist of nearly fixed genomes for multiple phenotyping and contain a specific genotype to increase the accuracy of assessment in studying quantitative traits (Lander & Botstein, 1989). Further, the AI populations can harbour more recombination events in a short chromosome segment for genetic fine mapping (Darvasi, 1998). Also, the IRI and RIX (recombinant inbred intercross) populations can be managed to have the advantages of both RI and AI populations (Liu et al., 1996; Hua et al., 2002; Winkler et al., 2003; Zou et al., 2005).

The RI and AI populations are derived by recurrently selfing (inbreeding) or randomly intermating the F₂ individuals, respectively, for several generations. The IRI populations are derived by first producing AI populations, followed by repeated selfing. The immortalized F₂ populations are obtained by first producing RI populations, followed by a generation of random mating. As a population advances beyond F₂, either by further selfing or by intermating, it undergoes multiple cycles of meiosis, so that crossovers accumulate and the proportion of recombinants increases (Haldane & Waddington, 1931; Liu et al., 1996; Darvasi, 1998; Winkler et al., 2003). In the literature, it has been noted that the proportion of recombinants in RI populations can be twice that in the F₂ populations for closely linked loci, and that linkage is broken down even more rapidly by random intercrossing in the AI populations (Haldane & Waddington, 1931; Darvasi, 1998). The increased number of recombinants provided by the advanced populations facilitates the construction of high-resolution genetic maps and the detection of closely linked QTLs (Liu et al., 1996; Darvasi, 1998). Further, cycles of inbreeding and/or random mating in a population will shape differences in the population genomic structures, such as the homozygosity, genotypic frequencies and variance components (Weir, 1996). As such, different advanced populations produce different genomic structures to be used for different breeding and study purposes (Liu et al., 1996; Hua et al., 2002; Winkler et al., 2003; Broman, 2005).

When using these advanced populations for QTL mapping, it should be noted that their genome structures no longer have a first-order Markovian property and that their genomic constitutions differ from those of the F₂ populations (Jiang & Zeng, 1997). So far, most current QTL mapping methods and related mapping properties have been developed and investigated for the genomes of backcross and F₂ populations with the Markovian property (Lander & Botstein, 1989; Jansen, 1993; Churchill & Doerge, 1994; Zeng, 1994; Kao et al., 1999; Kao & Zeng, 2002; Kao, 2004; Xu, 2007). Although issues related to using advanced populations in QTL mapping have been raised (Jiang & Zeng, 1997; Darvasi, 1998; Martin & Hospital, 2006), these populations are still investigated by invoking this Markovian assumption. It is therefore desirable to consider the specific structures of these advanced populations for QTL detection, so that their advantages can be utilized to enhance QTL resolution. In this paper, detailed analyses and discussions related to these advanced populations are given. For samples drawn from the advanced populations, statistical methods are developed both by considering and by ignoring their specific population genome structures (i.e. without and with a first-order Markovian assumption), and the two are compared under the multiple-QTL model for use in QTL mapping studies. In addition, the QTL mapping properties across different advanced populations are derived and discussed. Simulation studies are performed for purposes of evaluation and comparison. The results show that the proposed methods can improve the resolution of the genetic architecture of quantitative traits and serve as a tool for studying QTL mapping in various advanced populations derived from two inbred lines.

2. The genome structures of advanced populations

We refer to an AI (RI) F_t population as an AI (RI) population obtained by intercrossing (selfing) the F₂ individuals for t−2, t>3, generations. An IRI F_i:j population is a population produced by first randomly intercrossing the F₂ individuals for i−2 generations, followed by j, j⩾1, cycles of selfing, and an IF₂ population denotes an immortalized F₂ population.

(i) Genome structure

In an F₂ population, the genotypic frequencies of the P₁ homozygote, heterozygote and P₂ homozygote are 1/4, 1/2 and 1/4, respectively, for one locus, and the heterozygosity H_t is 0·5. The genotypic distribution for any two loci, say A and B, is also well known and characterized (see, for example, Kao & Zeng, 1997), and it has a simple relationship with the recombination rate between them (r). For example, the genotypic frequency of genotype AB/AB is (1−r)²/4, and the other nine genotypic frequencies have similarly simple relationships with r (see, for example, Table 2 in Kao & Zeng, 1997). Also, the proportion of recombinants (R) between A and B is equivalent to the recombination rate, i.e. R=r, and the linkage parameter between A and B is λ=1−2r in the population. Besides, a very important and convenient feature of the F₂ population is that the F₂ genomes have a first-order Markovian structure under the Haldane map function, so that the distribution of multiple genes can be obtained from the distributions of pairwise genes. For example, the probability distribution of three ordered genes, A, B and C, can be derived from the probability distributions of the first pair, A and B, and the second pair, B and C, i.e. P(ABC)=P(AB)×P(C | B)=P(AB)×P(BC)/P(B).

The heterozygosities for one locus in the RI F_t and IRI F_i:j populations are $1/2^{t-1}$ and $1/2^{j+1}$, respectively, which decrease as t and j increase, because selfing increases the homozygotes at the expense of heterozygotes; the heterozygosity is expected to be H_t=0·5 in the AI F_t and IF₂ (RIX) populations for any t due to random mating. Also, during the process of further meiosis, crossovers accumulate, so that the proportion of recombinants increases and becomes larger than the recombination rate (R>r), and the linkage disequilibrium coefficient decreases. To formulate these genetic parameters generally, we adopt the notation of Haldane & Waddington (1931) and define C as the frequency of each of the AB/AB and ab/ab genotypes, D as the frequency of each of the Ab/Ab and aB/aB genotypes, E as the frequency of each of the AB/Ab, AB/aB, Ab/ab and aB/ab genotypes, F as the frequency of the AB/ab genotype and G as the frequency of the Ab/aB genotype, for any two loci A and B. The heterozygosity (H), the proportion of recombinants (R) and the linkage disequilibrium coefficient (D_AB) in terms of C, D, E, F and G are

$$H = 2E + F + G, \quad R = 2D + 2E + G \quad \text{and} \quad D_{\mathrm{AB}} = \left(C + E + \tfrac{1}{2}F\right) - \tfrac{1}{4},$$

respectively, in any advanced population. In the F₂ population, the frequencies C, D, E, F and G in terms of r are C=(1−r)²/4, D=r²/4, E=r(1−r)/2, F=(1−r)²/2 and G=r²/2, which have simple relations with r, and H=1/2, R=r and D_AB=(1−2r)/4. In advanced populations, these values in terms of r become relatively complicated and vary with t, i and j, but they can be obtained without difficulty (Jennings, 1916; Robbins, 1918; Haldane & Waddington, 1931; Winkler et al., 2003). The more important and challenging part in this QTL mapping context, under the framework of the interval mapping procedure, is to characterize the genotypic distributions of three loci for various advanced populations, whose genomes do not have a first-order Markovian property.
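As an illustrative check of the two-locus notation, the following minimal Python sketch evaluates the five F₂ frequency classes for a given r and confirms that the ten genotype frequencies sum to one and that H, R and D_AB reduce to 1/2, r and (1−2r)/4. The function names are hypothetical, and the frequency of the Ab/Ab and aB/aB classes is taken as r²/4, the value for which the frequencies sum to one.

```python
def f2_two_locus(r):
    """Two-locus F2 genotype-class frequencies in the C, D, E, F, G notation."""
    C = (1 - r) ** 2 / 4      # each of AB/AB and ab/ab
    D = r ** 2 / 4            # each of Ab/Ab and aB/aB
    E = r * (1 - r) / 2       # each of AB/Ab, AB/aB, Ab/ab and aB/ab
    F = (1 - r) ** 2 / 2      # coupling double heterozygote AB/ab
    G = r ** 2 / 2            # repulsion double heterozygote Ab/aB
    return C, D, E, F, G


def summary_parameters(C, D, E, F, G):
    """Heterozygosity H, proportion of recombinants R and disequilibrium D_AB."""
    H = 2 * E + F + G
    R = 2 * D + 2 * E + G
    D_AB = (C + E + F / 2) - 0.25
    return H, R, D_AB


r = 0.1
C, D, E, F, G = f2_two_locus(r)
total = 2 * C + 2 * D + 4 * E + F + G   # ten genotypes in five frequency classes
H, R, D_AB = summary_parameters(C, D, E, F, G)
print(total, H, R, D_AB)
```

The same `summary_parameters` function applies to any advanced population once its C, D, E, F and G have been obtained; only `f2_two_locus` is specific to the F₂.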

3. Methods

(i) Data structure

Consider a sample of size n from an advanced population, such as an AI, RI, IRI or IF₂ population, derived from two inbred lines. The n individuals are genotyped for markers (X_i, i=1, 2, …, n) and phenotyped for traits (y_i, i=1, 2, …, n). When such a sample is used to detect QTL, two approaches under the framework of the interval mapping procedure are proposed here. The approach developed under the Markovian assumption will hereinafter be called the Markovian method, and the approach developed without the Markovian assumption will hereinafter be referred to as the non-Markovian method.

(ii) Genetic model and variance components

Consider a trait controlled by m QTLs, Q₁, Q₂, …, Q_m, so that there are 3^m possible QTL genotypes. For any individual i, its QTL genotype belongs to one of the 3^m genotypes, and the corresponding genotypic value, G_i, can be expressed as

$$
\begin{aligned}
G_i ={}& \mu + \sum_{j=1}^{m} a_j x_{ij}^{*} + \sum_{j=1}^{m} d_j z_{ij}^{*} + \sum_{j<k} (i_{aa})_{jk}\,(x_{ij}^{*} x_{ik}^{*}) \\
&+ \sum_{j<k} (i_{ad})_{jk}\,(x_{ij}^{*} z_{ik}^{*}) + \sum_{j<k} (i_{da})_{jk}\,(z_{ij}^{*} x_{ik}^{*}) \\
&+ \sum_{j<k} (i_{dd})_{jk}\,(z_{ij}^{*} z_{ik}^{*}),
\end{aligned}
\tag{1}
$$

where μ is the intercept, a_j and d_j are the additive and dominance effects of Q_j, j=1, 2, …, m, and (i_aa)_jk, (i_ad)_jk, (i_da)_jk and (i_dd)_jk are the additive×additive, additive×dominance, dominance×additive and dominance×dominance interaction effects between Q_j and Q_k. The variables x_ij* and z_ij*, associated with a_j and d_j, are coded as (1, −1/2), (0, 1/2) and (−1, −1/2) for genotypes Q_jQ_j, Q_jq_j and q_jq_j, respectively, according to Cockerham's model (Kao & Zeng, 2002). Under the genetic model (1), the genetic variance of a quantitative trait can be generally decomposed into 2m² variances and 2m⁴−m² covariances. In practice, the variance component structure is simpler in the advanced populations, as some covariances vanish because the two alleles at any locus have equal frequencies. Taking m=2 as an example, the genetic variance components are

$$
\begin{aligned}
V_G ={}& 2(C+D+E)(a_1^2+a_2^2) + \tfrac{1}{4}\left[1-(1-4E-2F-2G)^2\right](d_1^2+d_2^2) \\
&+ 2\left[C+D-2(C-D)^2\right] i_{aa}^2 + 4(C-D)\,a_1 a_2 \\
&+ \tfrac{1}{2}(C+D+E)(i_{ad}^2+i_{da}^2) + \tfrac{1}{16}\left[1-(1-8E)^2\right] i_{dd}^2 \\
&+ \tfrac{1}{4}\left[1-8E-(1-4E-2F-2G)^2\right] d_1 d_2 \\
&- (C-D)(4E+2F+2G)(d_1 i_{aa} + d_2 i_{aa}) \\
&+ (E-C-D)(a_1 i_{ad} + a_2 i_{da}) - (C-D)(a_2 i_{ad} + a_1 i_{da}) \\
&+ \tfrac{1}{2}(C-D)\, i_{ad} i_{da} - 4E(C-D)\, i_{aa} i_{dd} \\
&- E(1-4E-2F-2G)(d_1 i_{dd} + d_2 i_{dd}).
\end{aligned}
\tag{2}
$$

The component structures allow us to investigate some QTL mapping properties. For example, the additive (dominance) variances are found to increase (decrease) in the RI or IRI population, showing that these populations may facilitate (hinder) the estimation of the additive (dominance) effects (Kao, 2006). Also, the possible confounding problems in QTL estimation may be identified from the covariances between genetic effects (Kao & Zeng, 2002; Kao, 2006). If the two-locus model is expressed as a model of 15 parameters to distinguish each allelic effect, the genetic variance becomes even more complicated (Weir & Cockerham, 1977).
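To make the coding in model (1) concrete, the sketch below evaluates the genotypic value for m=2 under Cockerham's coding and computes the genetic variance numerically in an F₂ population. For a purely additive trait it compares the result with the additive terms of eqn (2), 2(C+D+E)(a₁²+a₂²)+4(C−D)a₁a₂. The function names and the numerical effects are illustrative assumptions, and only the additive terms are checked here.

```python
from itertools import product

# Cockerham coding (x*, z*) for genotypes QQ, Qq and qq, indexed by the number of Q alleles.
CODING = {2: (1.0, -0.5), 1: (0.0, 0.5), 0: (-1.0, -0.5)}


def genotypic_value(n1, n2, mu, a, d, i_aa=0.0, i_ad=0.0, i_da=0.0, i_dd=0.0):
    """Model (1) for m = 2; n1 and n2 count the Q alleles at Q1 and Q2."""
    x1, z1 = CODING[n1]
    x2, z2 = CODING[n2]
    return (mu + a[0] * x1 + a[1] * x2 + d[0] * z1 + d[1] * z2
            + i_aa * x1 * x2 + i_ad * x1 * z2 + i_da * z1 * x2 + i_dd * z1 * z2)


def f2_genotype_distribution(r):
    """Joint distribution of (n1, n2) in an F2, built from F1 gamete frequencies."""
    gametes = {(1, 1): (1 - r) / 2, (0, 0): (1 - r) / 2,
               (1, 0): r / 2, (0, 1): r / 2}
    dist = {}
    for (g, pg), (h, ph) in product(gametes.items(), repeat=2):
        key = (g[0] + h[0], g[1] + h[1])
        dist[key] = dist.get(key, 0.0) + pg * ph
    return dist


def genetic_variance(dist, **effects):
    """Variance of the genotypic value over a two-locus genotype distribution."""
    mean = sum(p * genotypic_value(n1, n2, **effects) for (n1, n2), p in dist.items())
    return sum(p * (genotypic_value(n1, n2, **effects) - mean) ** 2
               for (n1, n2), p in dist.items())


r, a = 0.1, (1.0, 0.5)   # illustrative recombination rate and additive effects
dist = f2_genotype_distribution(r)
numeric = genetic_variance(dist, mu=0.0, a=a, d=(0.0, 0.0))

# Additive terms of eqn (2) with the F2 values C = (1-r)^2/4, D = r^2/4, E = r(1-r)/2.
C, D, E = (1 - r) ** 2 / 4, r ** 2 / 4, r * (1 - r) / 2
closed_form = 2 * (C + D + E) * (a[0] ** 2 + a[1] ** 2) + 4 * (C - D) * a[0] * a[1]
print(numeric, closed_form)
```

Replacing `f2_genotype_distribution` by the genotype distribution of an advanced population would give the corresponding variance without changing the rest of the sketch.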

(iii) Markovian and non-Markovian methods

With the genetic model in eqn (1), the statistical model to relate a quantitative trait value, y, to the genotypic value, G, contributed from the m QTLs at positions, p₁, p₂, …, and p_m can be written as

$$y_i = G_i + \varepsilon_i, \tag{3}$$

where ε_i is the environmental deviation, assumed to follow a normal distribution with mean zero and variance σ². In QTL mapping, the QTLs are usually assumed to be located in the intervals and need to be estimated, so that the 3^m genotypes (x_ij* and z_ij*) may not be observed, and the model becomes a normal mixture model. For n individuals, the likelihood function for the parameters θ can be generally expressed as

$$L(\theta \mid \mathbf{Y}, \mathbf{X}) = \prod_{i=1}^{n} \left[ \sum_{j=1}^{3^{m}} p_{ij}\, N(\mu_j, \sigma^2) \right], \tag{4}$$

where the mixing proportions, p_ij, j=1, 2, …, 3^m, are the conditional probabilities of the putative QTL genotypes given marker genotypes, and the μ_j, j=1, 2, …, 3^m, correspond to the genotypic values of the 3^m different QTL genotypes. Using the interval mapping procedure (Lander & Botstein, 1989), the conditional probabilities can be predetermined by successively and jointly using the flanking markers of the putative QTL; hence they need not be estimated. The parameters θ involved in the statistical estimation of the normal mixture model are μ, σ², the a_i's, d_i's, i_aa's, i_ad's, i_da's and i_dd's. It should be pointed out in particular that the derivation of the conditional probabilities for each putative QTL using its flanking markers is not as straightforward in the advanced populations as it is for the F₂ and backcross populations (see below). When m putative QTLs are considered at a time, the joint conditional probability is approximated by the product of the m individual conditional probabilities. In the following, we propose two QTL mapping methods for the advanced populations under eqn (3). The one using the conditional probabilities derived from a first-order Markovian assumption as the mixing proportions will be called the Markovian method hereafter, and the other, using the conditional probabilities obtained without this assumption (by using the proposed transition equations) as the mixing proportions, will be called the non-Markovian method hereafter.
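The likelihood in eqn (4) can be evaluated directly once the mixing proportions are available. The following is a minimal sketch in plain Python, assuming the conditional probabilities p_ij have already been computed from the flanking markers; the function names and the small data set are made up for illustration only.

```python
import math


def normal_pdf(y, mean, sigma2):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def mixture_log_likelihood(y, p, mu, sigma2):
    """Logarithm of eqn (4).

    y[i]    trait value of individual i
    p[i][j] conditional probability of QTL genotype j given the markers of individual i
    mu[j]   genotypic value of QTL genotype j
    """
    loglik = 0.0
    for y_i, p_i in zip(y, p):
        density = sum(p_ij * normal_pdf(y_i, mu_j, sigma2) for p_ij, mu_j in zip(p_i, mu))
        loglik += math.log(density)
    return loglik


# Hypothetical example with one QTL (three genotypes QQ, Qq, qq) and four individuals.
y = [10.1, 8.2, 7.9, 6.0]
p = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.10, 0.80, 0.10],
     [0.00, 0.10, 0.90]]
mu = [10.0, 8.0, 6.0]
print(mixture_log_likelihood(y, p, mu, sigma2=1.0))
```

The Markovian and non-Markovian methods differ only in how the matrix `p` is obtained; the likelihood itself has the same form in both.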

(iv) Conditional probabilities of the putative QTL genotypes

The interval mapping approach computes the conditional probabilities of a putative QTL genotype by using the information from its two flanking markers. Let Q, with alleles Q and q, be the putative QTL, and let M, with alleles M and m, and N, with alleles N and n, be its flanking markers; assume that r, r₁ and r₂ are the recombination rates between M and N, between M and Q and between Q and N, respectively. To derive the conditional probability of the QTL genotype given the flanking marker genotype, P(Q | M, N)=P(MQN)/P(MN), for a population, the genotypic distributions of both two and three genes under generations of selfing and/or random mating are needed. The genotypic distribution of two genes, P(MN), under random mating and selfing is very well known (Jennings, 1916; Robbins, 1918; Haldane & Waddington, 1931). For the F₂ population, the genotypic distribution for three genes, P(MQN), is simple to derive and can be obtained from the probabilities of the two adjacent gene pairs, P(MQ) and P(QN), as its genomes have a first-order Markovian property. That is, P(MQN)=P(M)P(Q | M)P(N | Q, M)=P(M)P(Q | M)P(N | Q), as P(N | Q, M)=P(N | Q). However, for advanced populations, this Markovian property disappears, so that the genotypic distribution of three genes cannot be obtained directly from the distributions of two genes, i.e. by simply replacing the recombination rates (r₁, r₂ and r) by the frequencies of recombinants (R₁, R₂ and R), as suggested by Jiang & Zeng (1997) and Lynch & Walsh (1998). For example, it is suggested to approximate the two conditional gametic frequencies by Pr(Mqn | Mn)≈R₁(1−R₂)/R and Pr(MQn | Mn)≈(1−R₁)R₂/R in an advanced population. Such a replacement implicitly assumes that the genomes of the advanced populations still have a first-order Markovian property and, therefore, the obtained frequencies are approximate. Another obvious yet often unnoticed problem with this replacement is that the sum of the approximate probabilities may not be equal to one, as the Haldane map function does not hold for R (R≠R₁+R₂−2R₁R₂) in the advanced populations. Appropriate correction is needed when using these approximate probabilities. In this article, correction is made by dividing the approximate probabilities by their sum. The derivation of the exact genotypic distribution for three genes needs more delicate considerations, as provided below.
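The normalization issue can be illustrated with a short sketch. It evaluates the two approximate gametic probabilities given the marker gamete Mn and rescales them by their sum. As an assumption made only for this illustration, the proportions of recombinants are taken from the classical result R=2r/(1+2r) for fixed RI lines obtained by selfing, and the function names are hypothetical.

```python
def haldane(r1, r2):
    """Recombination rate across two adjacent intervals without interference."""
    return r1 + r2 - 2 * r1 * r2


def ril_selfing(r):
    """Proportion of recombinants in fixed RI lines obtained by selfing (assumed here)."""
    return 2 * r / (1 + 2 * r)


def markovian_gametic_probs(R1, R2, R):
    """Approximate Pr(MQn | Mn) and Pr(Mqn | Mn) with R's in place of r's."""
    p_Q = (1 - R1) * R2 / R
    p_q = R1 * (1 - R2) / R
    return p_Q, p_q


def normalise(probs):
    """Divide the approximate probabilities by their sum; also return that sum."""
    total = sum(probs)
    return tuple(p / total for p in probs), total


r1 = r2 = 0.1
r = haldane(r1, r2)

# F2: R = r, so the two approximations already sum to one.
f2_probs, f2_total = normalise(markovian_gametic_probs(r1, r2, r))

# Fixed selfing RI lines: R is no longer tied to R1 and R2 by the Haldane relation.
R1, R2, R = ril_selfing(r1), ril_selfing(r2), ril_selfing(r)
ri_probs, ri_total = normalise(markovian_gametic_probs(R1, R2, R))
print(f2_total, ri_total, ri_probs)
```

The rescaling restores a proper probability distribution, but the rescaled values remain approximations because the Markovian factorization itself is still assumed.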

The derivation of the genotypic frequencies of three genes for the advanced populations needs to consider two different types of mating systems: random mating and selfing. When mating is random, the frequency of a zygotic genotype is the product of two gametic frequencies in the previous population, and the focus is on deriving the transition equations for the frequencies of the eight different gametic types from generation to generation. For example, in AI F_t, the probability of the MQN (mqn) gamete, P_1,t, can be generally obtained as

$$
\begin{aligned}
P_{1,t} ={}& [(1-r_1)(1-r_2)+1]\,P_{1,t-1}^{2} + r_1 r_2\,P_{2,t-1}^{2} \\
&+ [(1-r_1)r_2]\,P_{3,t-1}^{2} + [r_1(1-r_2)]\,P_{4,t-1}^{2} \\
&+ [1+(1-r_1)(1-r_2)+r_1 r_2]\,P_{1,t-1}P_{2,t-1} \\
&+ (2-r_1)\,P_{1,t-1}P_{3,t-1} + (2-r_2)\,P_{1,t-1}P_{4,t-1} \\
&+ r_2\,P_{2,t-1}P_{3,t-1} + r_1\,P_{2,t-1}P_{4,t-1} \\
&+ [r_1(1-r_2)+r_2(1-r_1)]\,P_{3,t-1}P_{4,t-1},
\end{aligned}
\tag{5}
$$

where P_2,t−1 is the frequency of the MqN (mQn) gamete, P_3,t−1 is the frequency of the MQn (mqN) gamete and P_4,t−1 is the frequency of the mQN (Mqn) gamete in the previous population. An alternative iteration equation for P_1,t can be derived by using Geiringer's formulation (Geiringer, 1944). If the population is self-fertilized, the gametes of an individual unite at random within the individual and are not allowed to unite with gametes from different individuals, and the focus is on deriving the transition equations for the frequencies of the 36 different zygotes from generation to generation. For example, in the RI F_t population, the probability of the $\frac{MQN}{MQN}$ zygote is

$$
\begin{aligned}
P_t\left(\frac{MQN}{MQN}\right) ={}& P_{t-1}\left(\frac{MQN}{MQN}\right) + \frac{1}{4}\left[P_{t-1}\left(\frac{MQN}{MqN}\right) + P_{t-1}\left(\frac{MQN}{MQn}\right) + P_{t-1}\left(\frac{MQN}{mQN}\right)\right] \\
&+ \frac{(1-r_2)^2}{4}P_{t-1}\left(\frac{MQN}{Mqn}\right) + \frac{r_2^2}{4}P_{t-1}\left(\frac{MqN}{MQn}\right) + \frac{r_1^2 r_2^2}{4}P_{t-1}\left(\frac{MqN}{mQn}\right) \\
&+ \frac{(1-r_1)^2}{4}P_{t-1}\left(\frac{MQN}{mqN}\right) + \frac{r_1^2}{4}P_{t-1}\left(\frac{MqN}{mQN}\right) + \frac{[r_1(1-r_2)]^2}{4}P_{t-1}\left(\frac{Mqn}{mQN}\right) \\
&+ \frac{[(1-r_1)(1-r_2)+r_1 r_2]^2}{4}P_{t-1}\left(\frac{MQN}{mQn}\right) + \frac{[(1-r_1)r_2]^2}{4}P_{t-1}\left(\frac{mqN}{MQn}\right) \\
&+ \frac{[r_1(1-r_2)+r_2(1-r_1)]^2}{4}P_{t-1}\left(\frac{MQn}{mQN}\right) + \frac{[(1-r_1)(1-r_2)]^2}{4}P_{t-1}\left(\frac{MQN}{mqn}\right).
\end{aligned}
\tag{6}
$$

Similarly, the other transition equations for the three gamete frequencies under random mating and for the 35 zygote frequencies under selfing can be obtained (see Supplementary material). Used jointly, these transition equations are sufficient to obtain the gamete or genotypic frequencies needed to calculate all conditional probabilities for various fixed and unfixed advanced populations subject to different cycles of random mating and/or selfing. Teuscher & Broman (2007) developed an alternative technique that solves a set of linear equations to obtain the unknown tri-genic haplotype (gametic) probabilities for fixed RIL populations.
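The transition from generation to generation can also be sketched by direct enumeration. The Python code below is not the set of transition equations in the Supplementary material. It is a brute-force illustration that assumes no crossover interference, codes each haplotype as three 0/1 alleles at (M, Q, N) and uses hypothetical function names. It advances gamete frequencies by one generation of random mating, advances zygote frequencies by selfing, and evaluates the right-hand side of eqn (5) with all frequencies taken from the previous generation for comparison.

```python
from itertools import product

HAPS = list(product((0, 1), repeat=3))   # alleles at (M, Q, N); 0 = M/Q/N, 1 = m/q/n


def gametes_of(h1, h2, r1, r2):
    """Gamete distribution of zygote h1/h2, assuming no crossover interference."""
    pair, out = (h1, h2), {}
    for start, c1, c2 in product((0, 1), repeat=3):
        prob = 0.5 * (r1 if c1 else 1 - r1) * (r2 if c2 else 1 - r2)
        s_q = start ^ c1          # a crossover in M-Q switches strand
        s_n = s_q ^ c2            # a crossover in Q-N switches strand again
        g = (pair[start][0], pair[s_q][1], pair[s_n][2])
        out[g] = out.get(g, 0.0) + prob
    return out


def random_mating_step(hap, r1, r2):
    """Gamete frequencies produced by a random-mating population formed from pool `hap`."""
    new = dict.fromkeys(HAPS, 0.0)
    for h1, h2 in product(HAPS, repeat=2):
        w = hap[h1] * hap[h2]
        if w:
            for g, p in gametes_of(h1, h2, r1, r2).items():
                new[g] += w * p
    return new


def selfing_step(zyg, r1, r2):
    """Zygote frequencies after one generation of selfing (keys are ordered haplotype pairs)."""
    new = {}
    for (h1, h2), w in zyg.items():
        g = gametes_of(h1, h2, r1, r2)
        for (ga, pa), (gb, pb) in product(g.items(), repeat=2):
            new[(ga, gb)] = new.get((ga, gb), 0.0) + w * pa * pb
    return new


def eqn5(P1, P2, P3, P4, r1, r2):
    """Right-hand side of eqn (5), with all frequencies from generation t-1."""
    return ((1 + (1 - r1) * (1 - r2)) * P1 ** 2 + r1 * r2 * P2 ** 2
            + (1 - r1) * r2 * P3 ** 2 + r1 * (1 - r2) * P4 ** 2
            + (1 + (1 - r1) * (1 - r2) + r1 * r2) * P1 * P2
            + (2 - r1) * P1 * P3 + (2 - r2) * P1 * P4
            + r2 * P2 * P3 + r1 * P2 * P4
            + (r1 * (1 - r2) + r2 * (1 - r1)) * P3 * P4)


r1 = r2 = 0.1
F1 = ((0, 0, 0), (1, 1, 1))                    # MQN/mqn

hap_f2 = dict.fromkeys(HAPS, 0.0)
hap_f2.update(gametes_of(*F1, r1, r2))         # gametes that form the F2
hap_f3 = random_mating_step(hap_f2, r1, r2)    # gametes that form the AI F3
P1, P2, P3, P4 = (hap_f2[h] for h in ((0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)))
print(hap_f3[(0, 0, 0)], eqn5(P1, P2, P3, P4, r1, r2))

zyg = {F1: 1.0}
for _ in range(4):                             # F1 -> F2 -> F3 -> F4 -> F5 by selfing
    zyg = selfing_step(zyg, r1, r2)
het_q = sum(w for (h1, h2), w in zyg.items() if h1[1] != h2[1])
print(het_q)                                   # heterozygosity at Q in RI F5
```

From the resulting three-locus frequencies, the conditional probabilities P(Q | M, N)=P(MQN)/P(MN) follow by summing over the appropriate haplotypes or zygotes.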

The differences between the conditional probabilities of QTL genotypes given marker genotypes obtained with and without a first-order Markovian assumption can be substantial and in turn can have a considerable impact on QTL mapping (see below). Their differences for the QQ, Qq and qq genotypes given the marker genotype MN/MN for the case of r₁=r₂=0·1 in AI F_t, RI F_t, IRI F_10,t and RIX F_10,t populations are shown in Fig. 1(a)–(d) for illustration. For AI F_t populations, the differences are generally very minor (within ~0·01; see Fig. 1(a)). All three curves are below zero, implying that the probabilities of QTL genotypes are underestimated by the Markovian assumption. The differences between the conditional probabilities become larger (between ~−0·06 and 0·07; see Fig. 1(b)) in RI F_t populations than in the AI F_t populations. Such differences increase over the first few generations of selfing and become stable thereafter. For IRI F_10,t populations, the differences are very large (between ~−0·2 and 0·4) and increase as the number of selfing cycles increases. For RIX F_10,t populations, the differences are greatly reduced by intercrossing. In general, persistent selfing tends to enlarge the differences, and continuous intercrossing eventually mitigates them. The method with the Markovian assumption also overestimates the frequency of Qq and underestimates the other two frequencies during selfing. The sums of the three conditional probabilities are about 0·962–0·980, 0·977–0·995, 0·976–0·991 and 0·964–0·980, respectively, in the RI, AI, IRI and RIX populations. Fig. 2(a)–(d) shows the numerical differences in conditional probability for the QQ, Qq and qq genotypes given the marker genotype MN/Mn. More significant differences are observed in the Mn/Mn class, and the sum of the conditional probabilities may be up to 1·125 (not shown). Therefore, it is important to compute the correct conditional probabilities of the putative QTL genotypes, as they serve as the mixing proportions of the normal mixture model in QTL mapping. The problems of using incorrect (approximate) conditional probabilities of QTL genotypes include the loss of power and precision in QTL detection, as mentioned by Martin & Hospital (2006) and shown in this paper (see the Simulation studies section).

Fig. 1. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/MN obtained by using the Markovian and non-Markovian methods for the case of r₁=0·1 and r₂=0·1 in the AI, RI, IRI and RIX populations. A curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F_10,t populations. (d) RIX F_10,t populations.

Fig. 2. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/Mn obtained by using the Markovian and non-Markovian methods for the case of r₁=0·1 and r₂=0·1 in the AI, RI, IRI and RIX populations. A curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F_10,t populations. (d) RIX F_10,t populations.

(v) Maximum likelihood estimation

In parameter estimation, it is straightforward to treat the normal mixture model in eqn (4) as an incomplete-data problem by regarding the trait, Y, and markers, X, as observed data and the putative QTLs, x_ij*'s and z_ij*'s, as missing data. The EM algorithm (Dempster et al., 1977) can then be readily implemented to obtain the maximum likelihood estimates (MLEs) of the parameters. Alternatively, the marker genotypes and the unknown QTL genotypes can be treated as the observed state and hidden state in the set-up of the hidden Markov model (HMM; Koski, 2001) under the Markovian assumption along the genome. The EM algorithm is an iterative procedure; each iteration consists of an expectation step (E-step) followed by a maximization step (M-step). When applying the EM algorithm, the general formulae devised by Kao & Zeng (1997) can be implemented to obtain the MLEs. The E-step computes the posterior probabilities of the 3^m QTL genotypes. In the M-step, the coded variables associated with the m QTLs in all the 3^m possible genotypic values are assigned to the elements of the genetic design matrix. The E- and M-steps are iterated until convergence, and the converged values are the MLEs.
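The E- and M-steps can be illustrated with a deliberately simplified sketch. It assumes a single putative QTL (m=1) with three genotypic means and a common variance, and it takes the conditional probabilities of the QTL genotypes given the markers as known mixing proportions. It does not implement the general formulae of Kao & Zeng (1997) with the genetic design matrix; the function names `em_one_qtl` and `simulate`, and the marker-class probabilities used to generate data, are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def em_one_qtl(y, prior, max_iter=500, tol=1e-9):
    """EM for a normal mixture with known, individual-specific mixing proportions.

    y     : list of trait values
    prior : list of [P(QQ), P(Qq), P(qq)] per individual (given its markers)
    Returns the genotypic means, the residual variance and the log-likelihood trace.
    """
    n, k = len(y), len(prior[0])
    mean_y = sum(y) / n
    sigma2 = sum((v - mean_y) ** 2 for v in y) / n
    sd = math.sqrt(sigma2)
    # Spread the starting means around the overall mean (arbitrary choice).
    mu = [mean_y + sd * (0.5 - j / (k - 1)) for j in range(k)]
    trace = []
    for _ in range(max_iter):
        # E-step: posterior probability of each QTL genotype for each individual.
        post, loglik = [], 0.0
        for yi, p in zip(y, prior):
            w = [p[j] * normal_pdf(yi, mu[j], sigma2) for j in range(k)]
            total = sum(w)
            loglik += math.log(total)
            post.append([wj / total for wj in w])
        trace.append(loglik)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M-step: posterior-weighted means and pooled residual variance.
        for j in range(k):
            weight = sum(row[j] for row in post)
            if weight > 0.0:
                mu[j] = sum(row[j] * yi for row, yi in zip(post, y)) / weight
        sigma2 = sum(row[j] * (yi - mu[j]) ** 2
                     for row, yi in zip(post, y) for j in range(k)) / n
    return mu, sigma2, trace


def simulate(n, true_mu, true_sd, seed=1):
    """Toy data: three hypothetical marker classes, each with its own prior."""
    rng = random.Random(seed)
    classes = [[0.90, 0.08, 0.02], [0.05, 0.90, 0.05], [0.02, 0.08, 0.90]]
    y, prior = [], []
    for _ in range(n):
        p = rng.choice(classes)
        g = rng.choices(range(3), weights=p)[0]
        y.append(rng.gauss(true_mu[g], true_sd))
        prior.append(p)
    return y, prior


if __name__ == "__main__":
    y, prior = simulate(600, true_mu=[2.0, 0.5, -2.0], true_sd=1.0)
    mu_hat, sigma2_hat, trace = em_one_qtl(y, prior)
    print("estimated genotypic means:", [round(m, 3) for m in mu_hat])
    print("estimated residual variance:", round(sigma2_hat, 3))
    print("iterations:", len(trace))
```

Because the mixing proportions enter every E-step, replacing them with approximate (Markovian) values changes the posterior weights and hence the estimates, which is why the accuracy of the conditional probabilities matters.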

(vi) QTL mapping properties

To investigate and explore QTL mapping properties across populations, without loss of generality, assume that the quantitative trait is affected by the two linked epistatic QTLs, Q_A and Q_B, with complete effects. We consider the scenarios of using Q_A only and of using both Q_A and Q_B in the quantitative trait analysis. If the quantitative trait is regressed on Q_A only, the regression coefficient for the additive effect of Q_A is

(7)

$a_{\rm A} = a_{1} + \frac{C - D}{C + D + E}\,a_{2} + \frac{E - (C + D)}{2(C + D + E)}\,i_{ad} - \frac{C - D}{2(C + D + E)}\,i_{da},$

in an advanced population. Similarly, the regression coefficient for the dominance effect of Q_A, d_A, and the partial regression coefficient for the additive (dominance) effect of Q_A given the additive (dominance) effect of Q_B, $a_{{\rm A}.{\rm B}_{\rm a}}$ ($d_{{\rm A}.{\rm B}_{\rm d}}$), can be derived and their components are shown in Table 1. By analysing the coefficients, it is possible to decompose the regression coefficient into components and to trace the changes of these components for identifying the confounding problems as the population advances. Taking eqn (7) as an example, under selfing, the coefficient associated with a₂ (i_da) is positive (negative) and decreasing (increasing) from $1-2r$ ($-(1-2r)/2$) to $\frac{1-2r}{1+2r}$ ($-\frac{1-2r}{2(1+2r)}$), and the coefficient associated with i_ad is negative and decreasing from $-(1-2r)^{2}/2$ to $-1/2$, as generation proceeds (t increases). For t→∞ under selfing, $a_{\rm A} = a_{1} + \frac{1-2r}{1+2r}a_{2} - \frac{i_{ad}}{2} - \frac{1-2r}{2(1+2r)}i_{da}$. If mating is random, the coefficient can be generally expressed as $a_{\rm A} = a_{1} + (1-2r)(1-r)^{t}a_{2} - \frac{1}{2}(1-2r)^{2}(1-r)^{2t}i_{ad} - \frac{1}{2}(1-2r)(1-r)^{t}i_{da}$. The coefficients associated with a₂, i_ad and i_da approach zero as t→∞. Such analyses make it possible to clearly identify how the different genotypes and effects play a role in the confounding problem across populations. In general, the confounding problem becomes less severe as the generation proceeds under random mating. 
Under selfing, the confounding of i _ad becomes more severe and the confounding of i _da becomes less severe in the estimation of additive effects of Q_A as generation proceeds. The confounding of the i _dd becomes more severe, and i _aa will be always confounded in the estimation of the dominance effects as generation proceeds by selfing.
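As a numerical check of the limiting values quoted for eqn (7), the weights on a₂, i_ad and i_da can be evaluated directly. The sketch below assumes that C, D and E are the frequencies of the Q_AQ_A Q_BQ_B, Q_AQ_A q_Bq_B and Q_AQ_A Q_Bq_B genotypes, respectively. This reading is an assumption on our part, adopted because it reproduces the F₂ and t→∞ selfing values stated in the text; the helper names are hypothetical.

```python
def additive_coefficient_weights(C, D, E):
    """Weights on a2, i_ad and i_da in the regression coefficient a_A of eqn (7)."""
    total = C + D + E
    return {
        "a2": (C - D) / total,
        "i_ad": (E - (C + D)) / (2.0 * total),
        "i_da": -(C - D) / (2.0 * total),
    }


def f2_cde(r):
    """Assumed F2 frequencies of QAQA-QBQB (C), QAQA-qBqB (D), QAQA-QBqB (E)."""
    return (1.0 - r) ** 2 / 4.0, r ** 2 / 4.0, r * (1.0 - r) / 2.0


def ril_cde(r):
    """Assumed frequencies after indefinite selfing (no heterozygotes, so E = 0)."""
    return 1.0 / (2.0 * (1.0 + 2.0 * r)), r / (1.0 + 2.0 * r), 0.0


if __name__ == "__main__":
    r = 0.1
    print("F2 :", additive_coefficient_weights(*f2_cde(r)))
    print("RIL:", additive_coefficient_weights(*ril_cde(r)))
```

For r=0·1 the F₂ weights equal 1−2r, −(1−2r)²/2 and −(1−2r)/2, and the selfing limit gives (1−2r)/(1+2r), −1/2 and −(1−2r)/(2(1+2r)), in agreement with the expressions above.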

Table 1. The components of the regression coefficient and partial regression coefficient

Assume that the quantitative trait is controlled by two QTLs, Q_A and Q_B. a ₁ and d ₁ (a ₂ and d ₂) are the additive and dominance effects of Q_A (Q_B). i _aa, i _ad, i _da and i _dd are their epistatic effects.

a _A (d _A) is the regression coefficient for the additive (dominance) effect of Q_A, and $a_{{\rm A}.{\rm B}_{\rm a} }$ ( $d_{{\rm A}.{\rm B}_{\rm d} }$ ) is the partial regression coefficient for the additive (dominance) effect of Q_A given the additive (dominance) effect of Q_B.

(vii) Power of separating closely linked QTL

To simplify the discussion, we first consider two linked QTLs with additive effects only, a₁ and a₂, located at known markers; the QTL mapping model in eqn (3) then reduces to a regression model fitting two correlated variables, x_i1* and x_i2*. As derived above, the correlation between x_i1* and x_i2* is equivalent to the linkage parameter between the two QTLs, λ=(C−D)/(C+D+E), which can be interpreted as a measure of the difference between the non-recombinant (C) and recombinant (D) proportions in a population. We can expect the linkage parameter to decrease for genes that are farther apart or in later populations, as there are more recombinants and fewer non-recombinants in either case. In statistical modelling, fitting correlated variables into the model raises the problem of collinearity, e.g. inflated variances of â₁ and â₂, in estimation and testing (Marquardt, 1970), making it difficult to obtain simultaneously significant tests for QTL effects (successful separation of linked QTLs). For example, in the AI F_t population (under the process of random mating), C+D+E=1/4 and C−D=(1−2r′)/4, where r′=[1−(1−2r)(1−r)^{t−2}]/2, so that λ=1−2r′ decreases with t, shrinking by a factor of 1−r with each generation of random mating. Under selfing, λ also decreases, but at a much lower rate. In RIL, λ=(1−2r)/(1+2r), which is smaller than (1−2r) in the F₂. In general, the linkage parameter decreases and the collinearity problem is eased in the advanced population. As a consequence, the separation of closely linked QTLs can be more powerful when using a sample from the advanced population, especially from a population subject to several cycles of random mating.
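The behaviour of the linkage parameter can be tabulated with a short sketch based on the expressions above. The variance inflation factor 1/(1−λ²) used here is the standard result for a regression on two predictors with correlation λ; it is added only to indicate how collinearity eases as λ falls and is not a quantity reported in this paper. The function names are hypothetical.

```python
def lambda_f2(r):
    """Linkage parameter in the F2: 1 - 2r."""
    return 1.0 - 2.0 * r


def lambda_ai(r, t):
    """Linkage parameter in AI F_t, using r' = [1 - (1-2r)(1-r)^(t-2)] / 2."""
    r_prime = (1.0 - (1.0 - 2.0 * r) * (1.0 - r) ** (t - 2)) / 2.0
    return 1.0 - 2.0 * r_prime


def lambda_ril(r):
    """Linkage parameter in RIL: (1 - 2r) / (1 + 2r)."""
    return (1.0 - 2.0 * r) / (1.0 + 2.0 * r)


def variance_inflation(lam):
    """Standard variance inflation factor for two regressors with correlation lam."""
    return 1.0 / (1.0 - lam ** 2)


if __name__ == "__main__":
    r = 0.1
    print("F2   lambda = %.4f  VIF = %.3f" % (lambda_f2(r), variance_inflation(lambda_f2(r))))
    for t in (3, 4, 5, 10):
        lam = lambda_ai(r, t)
        print("AI F%-2d lambda = %.4f  VIF = %.3f" % (t, lam, variance_inflation(lam)))
    lam = lambda_ril(r)
    print("RIL  lambda = %.4f  VIF = %.3f" % (lam, variance_inflation(lam)))
```

With t=2 the AI expression reduces to the F₂ value 1−2r, and successive AI generations differ by the factor 1−r, as stated above.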

4. Simulation studies

Simulations were conducted to evaluate the performances of the non-Markovian and Markovian methods, to validate the derived mapping properties and to compare relative efficiencies of using different advanced populations in QTL mapping. A large set of fixed and unfixed populations, including RI, AI, IRI and IF₂ populations, was simulated as they are very popular in biological studies (Lee et al., 2002; Rockman & Kruglyak, 2008). For RI and AI populations, F₃, F₄, F₅ and F₁₀ populations were simulated. For IRI and RIX populations, IRI F_5:1, F_5:3 and IF₂ populations were simulated. For each population, two linked epistatic QTLs, Q_A and Q_B, with complete effects a₁=2, d₁=2, a₂=2, d₂=2, i_aa=2, i_da=2 and i_dd=2 are considered, and the heritability is assumed to be 0·05 (defined in the F₂ population under the Cockerham model by Kao & Zeng, 2002). With such parameter settings, the total genetic variance and environmental variance are 6·32 and 120·88, respectively, and the genetic variances contributed by the marginal effects and epistatic effects and genetic covariance are 3, 2·227 and −1·865, respectively. The positions of the two QTLs were assumed to be 30 cM apart and located at 25 and 55 cM along one 100 cM chromosome. Two marker maps are considered. The first map assumes 11 equally spaced markers (the sparse map hereinafter), and the second map assumes 19 markers placed at 0, 10, 15, 20, 24, 27, 30, 35, 40, 45, 50, 54, 57, 60, 65, 70, 80, 90 and 100 cM (the dense map hereinafter). The sample size is 1000 and the number of simulated replicates is 100 for each setting. The applied mapping models are all two-QTL models with different fixed numbers of effects. 
Except for the RI F₁₀ population (RIL), the mapping models applied to QTL detection include the eight-effect (complete-effect) model, the five-effect model (with a₁, d₁, a₂, d₂ and i_aa) and the four-effect model (with a₁, d₁, a₂ and d₂). For RIL, the three-effect model with epistasis (with a₁, a₂ and i_aa) and the two-effect model without epistasis (with a₁ and a₂) are applied to the analysis, as RIL has very few heterozygotes and low power to detect dominance components. These models are applied to a two-dimensional grid search on the chromosome for QTL. At the positions with the maximum value of the likelihood function, we test the significance of the first (second) QTL given the second (first) QTL by testing its main and epistatic effects jointly. For example, given the second (first) QTL, the hypothesis H₀: a₁=d₁=i_aa=i_ad=i_da=i_dd=0 (H₀: a₂=d₂=i_aa=i_ad=i_da=i_dd=0) is tested for the existence of the first (second) QTL at the positions if the complete-effect model is used. Similarly, if the five-effect (four-effect) model is used, the hypothesis H₀: a₁=d₁=i_aa=0 (H₀: a₁=d₁=0) is tested for the existence of the first QTL given the second QTL. If both LRT statistics are larger than the specified critical values at the 5% level, a successful detection of the two QTLs (separation of the two linked QTLs) is declared at the tested positions, and the corresponding estimated effects are reported as the MLEs of the effects. In QTL mapping, determining the critical value for declaring QTL detection is a complicated issue, and several methods have been suggested (for a review, see Zou & Zeng, 2008). Here, the critical values are evaluated using the quick method of Piepho (2001), as this method can handle a wide variety of experimental designs, such as the AI, RI, IRI and IF₂ populations considered here.
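The simulation settings above can be written down compactly, which also makes it easy to see which marker interval brackets each QTL under the two maps. In this sketch the sparse map is assumed to place its 11 equally spaced markers at 0, 10, ..., 100 cM (the text does not list the positions), and the helper names are hypothetical. The heritability check simply uses the stated genetic and environmental variances.

```python
from bisect import bisect_right

# Assumed sparse map: 11 equally spaced markers covering 0-100 cM.
SPARSE_MAP = [10 * i for i in range(11)]
# Dense map as listed in the text (19 markers, in cM).
DENSE_MAP = [0, 10, 15, 20, 24, 27, 30, 35, 40, 45, 50, 54, 57, 60, 65, 70, 80, 90, 100]
# True QTL positions (cM) used in the simulations.
QTL_POSITIONS = (25, 55)


def flanking_markers(marker_map, position):
    """Return the (left, right) marker positions of the interval containing position."""
    right = bisect_right(marker_map, position)
    right = min(max(right, 1), len(marker_map) - 1)  # keep the interval on the map
    return marker_map[right - 1], marker_map[right]


def heritability(genetic_var, env_var):
    """Broad-sense heritability: genetic variance over total phenotypic variance."""
    return genetic_var / (genetic_var + env_var)


if __name__ == "__main__":
    for name, marker_map in (("sparse", SPARSE_MAP), ("dense", DENSE_MAP)):
        for pos in QTL_POSITIONS:
            left, right = flanking_markers(marker_map, pos)
            print("%s map: QTL at %d cM lies in [%d, %d] cM" % (name, pos, left, right))
    print("heritability = %.4f" % heritability(6.32, 120.88))
```

Under these assumptions both QTLs sit in 10 cM intervals on the sparse map and in 3 cM intervals on the dense map, which is the contrast in marker information that the two maps are designed to provide.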

The non-Markovian method clearly performs better than the Markovian method in the populations subject to selfing, such as RI and IRI populations. For AI and RIX populations, the two methods have similar powers, but the non-Markovian method provides more precise and accurate estimates for the positions and effects. To condense tables, only the results under the sparse map are tabulated in Tables 2–4; those under the dense map are not tabulated but are described in the text. Table 2 shows the QTL mapping results under the sparse map in the RI populations. For the case of the sparse (dense) map, by applying the complete-effect model to QTL detection, the powers of separation in the RI F₃, F₄ and F₅ populations are 0·39 (0·18), 0·23 (0·05) and 0·10 (0·11), respectively, by the non-Markovian approach, and they are 0·29 (0·19), 0·16 (0·03) and 0·04 (0·13), respectively, by the Markovian approach. The complete-effect model becomes less powerful in the later RI populations due to loss of heterozygotes. When epistasis is completely ignored by applying the four-effect model to the analysis, the powers are lower than those by the complete-effect models. The powers by the non-Markovian method are 0·09 (0·00), 0·02 (0·03) and 0·04 (0·11) for the three populations, respectively, and they are 0·10 (0·01), 0·03 (0·04) and 0·07 (0·14), respectively, by the Markovian method under the sparse (dense) map. When applying the five-effect model by considering i_aa to QTL detection, the powers by the non-Markovian method are 0·55 (0·67), 0·73 (0·84) and 0·68 (0·98), respectively, and they are 0·57 (0·66), 0·74 (0·86) and 0·67 (0·99), respectively, by the Markovian method under the sparse (dense) map. In parameter estimation, for all models, the estimates of positions and effects obtained by the non-Markovian method have a better precision as compared with those by the Markovian method. 
For example, in the RI F₄ population under the sparse map, the means of the estimated Q_A and Q_B positions for the five-effect model are 25·36 (SD 5·94) and 54·90 (SD 5·74), respectively, by the non-Markovian method, and they are 26·08 (SD 5·98) and 56·70 (SD 5·67), respectively, by the Markovian method. The five-effect model by taking i _aa into account tends to be more powerful and precise than the other two models, and this model becomes more powerful in the later RI populations. For the RI F₁₀ population (RIL), when using the three-effect model, the powers of the non-Markovian (Markovian) method are 93% (94%) and 98% (97%) in the two maps. When using the two-effect model, the powers reduce dramatically to 5% (5%) and 8% (7%), respectively. This shows that the power to detect QTL can be greatly enhanced by taking i _aa into account in RIL. Confounding problems occur in the estimation of the effects if epistatic effects are not completely taken into account. For example, the means of the estimated a ₁, a ₂ and i _aa by the non-Markovian method are 1·031 (SD 0·388), 1·032 (SD 0·402) and 1·965 (SD 0·375), respectively (the predicted values by Table 1 are 1, 1 and 2) for RIL, under the dense map. It is interesting to compare these results with those in the F₂ population. The powers in the F₂ population are 0·42 (0·36), 0·21 (0·43) and 0·03 (0·05) for the complete-effect, five-effect and four-effect models, respectively, under the sparse (dense) map (Table 4). The more powerful performance of using the RI populations occurs only for the five-effect model and does not occur for the other two models.

Table 2. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the RI populations

A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, Q_A and Q_B. The heritability is 0·05 in the F₂ population. The critical values are determined by Piepho's method. P1 (P2): position of Q_A (Q_B). To save space, standard deviations (SDs; numbers in parentheses) are shown only for the complete-effect model. The reduced models usually show similar or larger SDs. SDs are smaller in RIL than in the RI F₃.

^a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 3. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the different AI populations

To save space, SDs (numbers in parentheses) are shown only for the complete-effect model in AI F₃ and F₄ populations. SDs for the reduced models are usually similar or larger. The SDs of AI F₅ have similar size in positions and main effects and larger size in epistatic effects as compared with those in AI F₃. The estimates in AI F₁₀ have a much larger SD. A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, Q_A and Q_B. The heritability is 0·05 in the F₂ population. The critical values are determined by Piepho's method.

^a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 4. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in F₂, IF₂ and IRI populations

To save space, SDs (numbers in parentheses) are shown only for the complete-effect model. SDs for the reduced models are usually similar or larger. A total of 100 replicates, each with sample size 1000, were analysed with two linked epistatic QTLs, Q_A and Q_B. The heritability is 0·05 in the F₂ population. The critical values are determined by Piepho's method. P1 (P2): position of Q_A (Q_B).

^a 8e/8a indicates the eight-effect model with the non-Markovian/Markovian method.

Table 3 presents the QTL mapping results for AI populations under the sparse map. Under the sparse (dense) map, when the complete-effect model is considered, the detecting powers by the non-Markovian method are 0·61 (0·61), 0·62 (0·52), 0·47 (0·79) and 0·06 (0·65), respectively, in the AI F₃, F₄, F₅ and F₁₀ populations, and they are 0·59 (0·65), 0·59 (0·51), 0·46 (0·79) and 0·07 (0·70), respectively, by the Markovian method. When epistasis is ignored by using the four-effect model, the powers are reduced to 0·11 (0·11), 0·09 (0·08), 0·25 (0·17) and 0·12 (0·05), respectively, by the non-Markovian (Markovian) method. If the five-effect model is considered under the sparse map, the powers by the non-Markovian (Markovian) method are 0·41 (0·39), 0·63 (0·37), 0·40 (0·41) and 0·09 (0·11), respectively, in the four populations. An increasing trend in power can be observed in the case of the dense map (not shown). However, such an increasing trend does not occur in the sparse map (Table 3). Also, by taking epistasis into account, the power can be much improved and the confounding problem can be avoided, and the means of the estimated effects are all very close to the true given parameters. Besides, the QTL positions are estimated with better precision in the AI populations as compared with those estimated in the RI populations. Among all the settings, the most powerful experimental population for QTL detection is the AI F₃ (AI F₅) population under the sparse (dense) map. The AI F₁₀ population is not the optimal design under either map, as the powers are about 0·05–0·12 and about 0·45–0·70, respectively, in the two maps. It is expected that a much denser map is required to ensure more powerful QTL detection in the AI F₁₀ population (see the Discussion section). When comparing the results of the AI and F₂ populations (Table 4), the AI populations show more powerful results than the F₂ population in all cases under the dense map.

Table 4 shows the QTL mapping results in the F₂, IF₂, IRI F_5:1 and IRI F_5:3 populations under the sparse maps. The QTL mapping results are better under the dense map in these later advanced populations as compared with those under the sparse map. For example, in the IF₂ population, the powers under the dense map are 0·59 (0·58), 0·54 (0·52) and 0·35 (0·35) by the non-Markovian (Markovian) method for the complete-effect, five-effect and four-effect models (not shown), respectively. Under the sparse map, they are 0·06 (0·05), 0·05 (0·05) and 0·02 (0·03), respectively. The estimated positions and effects are also found to be more precise in the dense map. For example, under the complete-effect model, the estimated effects of a ₁, d ₁, a ₂ and d ₂ by the non-Markovian method are 1·919 (SD 0·657), 1·689 (SD 1·010), 1·757 (SD 0·670) and 1·677 (SD 0·912), respectively, in the dense map (not shown), and they are 1·241 (SD 1·028), 0·766 (SD 1·381), 1·351 (SD 0·907) and 0·850 (SD 1·512), respectively, in the sparse map. Similar situations were also found in the IRI F_5:1 and IRI F_5:3 populations. Besides, the complete-effect model is not appropriate for the IRI populations, and the three-effect and five-effect models are more appropriate for these two populations. For example, the powers in the IRI F_5:1 population are 0·13 (0·01) and 0·07 (0·01) by the complete-effect model of the non-Markovian (Markovian) method in the two different maps, and the powers become 0·77 (0·76) and 0·92 (0·93) by the five-effect model, respectively. Also, taking the additive-by-additive effect into account can greatly benefit the QTL detection. A similar trend can be observed for the RIL.

5. Discussion

The genome structures of the advanced populations can be very different from each other and are no longer similar to that of the F₂ population, as mentioned before. This paper tries to distinguish between the genome structures of different populations to deal with the issues of QTL mapping. For QTL mapping in the advanced populations, we propose the Markovian and non-Markovian methods. Some important properties and issues in QTL mapping, such as mapping closely linked QTLs, confounding problems of ignoring epistasis and the choice of different mapping models, are also derived and discussed across different populations. Theoretically, the non-Markovian method should perform better than the Markovian method, as more accurate mixing proportions are used in the statistical modelling, as discussed. In fact, analytical and simulation studies show that the non-Markovian method does perform better than the Markovian method in the advanced populations, especially in populations subject to the selfing process. The advanced populations can also be designed to be more powerful than the F₂ population in QTL detection. Besides, the issues considered here are under the assumption of large sample size with no selection. In practice, selection and drift may play a role between generations, causing unequal allele frequencies and potential segregation distortion. As suggested by Teuscher & Broman (2007), the solution to these problems is the use of a dense marker set with which the actual recombination breakpoints can be precisely mapped. The results presented here can give some clues to the use of advanced populations for better investigation in genetical and biological studies.

The quality of QTL mapping relies on precisely deriving the conditional probabilities of putative QTL genotypes given marker genotypes and on applying appropriate statistical methods to link the quantitative traits with the putative QTL. When deriving the conditional genotypic distribution of a putative QTL, ideally, we would like to use the information from all the linked markers (as many linked markers as possible) to obtain it. This, however, is very challenging as the characterization of the genotypic distribution of many genes is not an easy task. The approach of interval mapping avoids this and proposes to use its two flanking markers instead in derivation, so that its task reduces to characterizing the genotypic distribution for three genes. Such an approach is optimal in capturing the QTL information for the genomes with a first-order Markovian property, but not for the genomes without this property. However, for the latter case, we believe that the closest marker pair may have already captured most of the information about QTLs. When multiple putative QTLs are considered in the advanced population, the joint conditional probability distribution used here is approximate and obtained by using conditional independence property as we are still not sure currently how to derive the exact conditional distribution for an arbitrary number of putative QTLs. In addition, when applying statistical models to detect QTLs, the specific genome structures of advanced populations have to be taken into account in modelling to benefit QTL detection. For example, in the RI or IRI populations, there are larger additive genetic variances (smaller dominance variances) and higher homozygosity (lower heterozygosity), and the applied models should consider that fitting the components involving additive effects into the model can benefit QTL detection and that fitting the components involving dominance effects into the model may deter QTL detection.

One of the most valuable features of the advanced populations is that they can generate more recombinants to improve the QTL resolution. From the viewpoint of statistical modelling, such an improvement takes advantage of more recombinants in a population to alleviate the collinearity problem in modelling linked putative QTLs (to disassociate the linkage disequilibrium between linked putative QTLs), so that QTL mapping can be more powerful and precise (see the subsection ‘Power of separating closely linked QTL’); nevertheless, more recombinants also reduce the linkage disequilibrium between markers and QTLs, blurring the information about the unobservable putative QTL. Therefore, to expect improved QTL mapping results in the advanced population, a denser marker map around the linked QTL region is required to ensure that the linkage disequilibrium is strong enough in the construction. In a marker interval with given width, the linkage disequilibrium between markers and putative QTLs is strongest in the F₂ population, and it becomes gradually weaker as generation advances. Taking a putative QTL Q in the middle of a 10 cM marker interval flanked by markers A and B as an example, the trigenic linkage disequilibrium defined as D_AQB=P_AQB−P_AP_QP_B (Wright, 1980) is 0·329 in the F₂ population, and it becomes 0·309 (0·300), 0·286 (0·292), 0·260 (0·290) and 0·111 (0·288) in the AI (RI) F₃, F₄, F₅ and F₁₀ populations, respectively. This shows that the linkage disequilibrium declines more rapidly under random mating. In general, once the designed populations, such as IF₂, IRI F_5:1 and AI F₁₀ populations, have undergone some generations of random mating, they usually require a much denser marker map to obtain improved results. 
Therefore, the marker density should be considered as a major factor not only in the comparison between the two proposed methods, but also in the issue of using advanced populations to improve QTL mapping results (see also the ‘Simulation studies’ section). Besides, the issues of the trade-off between generation number and marker density and of the extension to more than two founders (Mott et al., 2000; Broman, 2005) are interesting and worth pursuing in the future. Together with the (F_u/F_v, v⩾u) designs (Fisch et al., 1996; Kao, 2006) and the strategy of replicated trials (Hua et al., 2002), it is quite possible to design experimental populations to recover or remove those undetected or ghost QTLs (Lander & Botstein, 1989) in the F₂ population for high-resolution QTL mapping.

The authors are grateful to two anonymous reviewers for helpful comments. This work was supported by grant number NSC97-2118-M-001–008 from the National Science Council, Taiwan, Republic of China.

References

Broman, K. W. (2005). The genomes of recombinant inbred lines. Genetics 169, 1133–1146.

Churchill, G. A. & Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138, 967–971.

Darvasi, A. (1998). Experimental strategies for the genetic dissection of complex traits in animal models. Nature Genetics 18, 19–24.

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39, 1–38.

Fisch, R. D., Ragot, M. & Gay, G. (1996). A generalization of the mixture model in the mapping of quantitative trait loci for progeny from a biparental cross of inbred lines. Genetics 143, 571–577.

Geiringer, H. (1944). On the probability theory of linkage in Mendelian heredity. The Annals of Mathematical Statistics 15, 25–57.

Haldane, J. B. S. & Waddington, C. H. (1931). Inbreeding and linkage. Genetics 16, 357–374.

Hua, J. P., Xing, Y. Z., Xu, C. G., Sun, X. L., Yu, S. B. & Zhang, Q. (2002). Genetic dissection of an elite rice hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162, 1885–1895.

Jansen, R. C. (1993). Interval mapping of multiple quantitative trait loci. Genetics 135, 205–211.

Jennings, H. S. (1916). The numerical results of diverse systems of breeding. Genetics 1, 53–89.

Jiang, C.-J. & Zeng, Z.-B. (1997). Mapping quantitative trait loci with dominant and missing markers in various populations from inbred lines. Genetica 101, 47–85.

Kao, C.-H. (2004). Multiple interval mapping for quantitative trait loci controlling endosperm traits. Genetics 167, 1987–2002.

Kao, C.-H. (2006). Mapping quantitative trait loci using the experimental designs of recombinant inbred population. Genetics 174, 1373–1386.

Kao, C.-H. & Zeng, Z.-B. (1997). General formulas for obtaining the MLE and the asymptotic variance–covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53, 359–371.

Kao, C.-H. & Zeng, Z.-B. (2002). Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160, 1243–1261.

Kao, C.-H., Zeng, Z.-B. & Teasdale, R. D. (1999). Multiple interval mapping for quantitative trait loci. Genetics 152, 1203–1216.

Koski, T. (2001). Hidden Markov Models for Bioinformatics. Boston, MA: Kluwer Academic Publishers.

Lander, E. S. & Botstein, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.

Lee, M., Sharopova, N., Beavis, W. D., Grant, D., Katt, M., Blair, D. & Hallauer, A. (2002). Expanding the genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Molecular Biology 48, 453–461.

Liu, S.-C., Kowalski, S. P., Lan, T.-H., Feldmann, K. A. & Paterson, A. H. (1996). Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142, 247–258.

Lynch, M. & Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates.

Marquardt, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12, 591–612.

Martin, O. C. & Hospital, F. (2006). Two- and three-locus tests for linkage analysis using recombinant inbred lines. Genetics 173, 451–459.

Mott, R., Talbot, C. J., Turri, M. G., Collins, A. C. & Flint, J. (2000). From the cover: a method for fine mapping quantitative trait loci in outbred animal stocks. Proceedings of the National Academy of Sciences USA 97, 12648–12654.

Piepho, H. P. (2001). A quick method for computing approximate threshold for quantitative trait loci detection. Genetics 157, 425–432.

Robbins, R. B. (1918). Some applications of mathematics to breeding problems III. Genetics 3, 375–389.

Rockman, M. L. & Kruglyak, L. (2008). Breeding designs for recombinant inbred advanced intercross lines. Genetics 179, 1069–1078.

Teuscher, F. & Broman, K. W. (2007). Haplotype probabilities for multiple-strain recombinant inbred lines. Genetics 175, 1267–1274.

Weir, B. S. (1996). Genetic Data Analysis II. Sunderland, MA: Sinauer Associates.

Weir, B. S. & Cockerham, C. C. (1977). Two-locus theory in quantitative genetics. Proceedings of the International Conference on Quantitative Genetics (ed. Pollak, E., Kempthorne, O. & Bailey, T. B.), pp. 247–269. Ames, IA, USA: Iowa State University.

Winkler, C. R., Jensen, N. M., Cooper, M., Podlich, D. W. & Smith, O. S. (2003). On the determination of recombination rates in intermated recombinant inbred populations. Genetics 164, 741–745.

Wright, S. (1980). Genic and organismic selection. Evolution 34, 825–843.

Xu, S. (2007). An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63, 513–521.

Zeng, Z.-B. (1994). Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.

Zou, F., Gelfond, J. A. L., Airey, D. C., Lu, L., Manly, K. F., Williams, R. W. & Threadgill, D. W. (2005). Quantitative trait locus analysis using recombinant inbred intercrosses: theoretical and empirical considerations. Genetics 170, 1299–1311.

Zou, W. & Zeng, Z.-B. (2008). Statistical methods for mapping multiple QTL. International Journal of Plant Genomics, Article ID 286561, doi: 10.1155/2008/286561.

Fig. 1. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/MN obtained by using the Markovian and non-Markovian methods for the case of r1=0·1 and r2=0·1 in the AI, RI, IRI and RIX populations. The curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F10,t populations. (d) RIX F10,t populations.

Fig. 2. The differences between the conditional probabilities of QQ, Qq and qq genotypes given the flanking marker genotype MN/Mn obtained by using the Markovian and non-Markovian methods for the case of r1=0·1 and r2=0·1 in the AI, RI, IRI and RIX populations. The curve below zero implies that the probabilities of QTL genotypes are underestimated by using the Markovian method. (a) AI populations. (b) RI populations. (c) IRI F10,t populations. (d) RIX F10,t populations.

Table 1. The components of the regression coefficient and partial regression coefficient

Table 2. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the RI populations

Table 3. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in the different AI populations

Table 4. Simulation results of using different mapping models of the Markovian and non-Markovian methods under the sparse marker map in F2, IF2 and IRI populations

Kao supplementary material


