Inferring genetic values for quantitative traits non-parametrically

DANIEL GIANOLA; GUSTAVO de los CAMPOS

doi:10.1017/S0016672308009890

Inferring genetic values for quantitative traits non-parametrically

Published online by Cambridge University Press: 06 January 2009

DANIEL GIANOLA and

GUSTAVO de los CAMPOS

Show author details

DANIEL GIANOLA*: Affiliation:
Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway Institut National de la Recherche Agronomique, UR631 Station d'Amélioration Génétique des Animaux, BP 52627, 32326 Castanet-Tolosan, France Institut für Tierzucht und Haustiergenetik, Georg-August-Universität, Göttingen, Federal Republic of Germany
GUSTAVO de los CAMPOS: Affiliation:
Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
*: *Corresponding author. Tel: +1 6082652054. Fax. +1 6082625157. e-mail: [email protected]

Article contents

Summary
Introduction
Extant theory
Confronting complexity
Non-parametric breeding value
Illustration: infinitesimal model
Illustration: non-linear system
Conclusion
References

Rights & Permissions

Summary

Inferences about genetic values and prediction of phenotypes for a quantitative trait in the presence of complex forms of gene action, issues of importance in animal and plant breeding, and in evolutionary quantitative genetics, are discussed. Current methods for dealing with epistatic variability via variance component models are reviewed. Problems posed by cryptic, non-linear, forms of epistasis are identified and discussed. Alternative statistical procedures are suggested. Non-parametric definitions of additive effects (breeding values), with and without employing molecular information, are proposed, and it is shown how these can be inferred using reproducing kernel Hilbert spaces regression. Two stylized examples are presented to demonstrate the methods numerically. The first example falls in the domain of the infinitesimal model of quantitative genetics, with additive and dominance effects inferred both parametrically and non-parametrically. The second example tackles a non-linear genetic system with two loci, and the predictive ability of several models is evaluated.

Type: Paper
Information: Genetics Research , Volume 90 , Issue 6 , December 2008 , pp. 525 - 540

DOI: https://doi.org/10.1017/S0016672308009890 [Opens in a new window]
Copyright: Copyright © 2009 Cambridge University Press

1. Introduction

The problem considered here is that of inferring genetic values and of predicting phenotypes for a quantitative trait under complex forms of gene action, an issue of importance in animal and plant breeding, and in evolutionary quantitative genetics (Lynch & Walsh, Reference Lynch and Walsh1998). The discussion is streamlined as follows. Current methods for dealing with epistatic variability via variance component models are discussed in section 2. Problems posed by cryptic, non-linear, forms of epistasis are identified in section 3. Section 4 proposes non-parametric definitions of additive effects (breeding values), with and without employing molecular information, and shows how these could be inferred using reproducing kernel Hilbert spaces (RKHS) regression models. Sections 5 and 6 present stylized examples to demonstrate the methods. The first example uses the infinitesimal model of quantitative genetics, and the second one tackles a non-linear genetic system. The paper ends with concluding comments.

2. Extant theory

A standard decomposition of phenotypic value in quantitative genetics (Falconer & Mackay, Reference Falconer and Mackay1996) is

$y \equals \mu \plus a \plus d \plus i \plus e\comma$

where a, d and i are additive, dominance and epistatic effects, respectively, and e is a residual, reflecting environmental (residual) variability. This linear decomposition can also be used to describe variability of latent variables, especially if assumed Gaussian (e.g. Dempster & Lerner, Reference Dempster and Lerner1950; Gianola, Reference Gianola1982). The i effect can be decomposed into additive×additive, additive×dominance, dominance×dominance, etc., deviates. In what has been termed ‘statistical epistasis’ (Cheverud & Routman, Reference Cheverud and Routman1995), these deviates are assumed to be random draws from some distributions representing ‘interactions’ between loci. Under certain assumptions (Cockerham, Reference Cockerham1954; Kempthorne, Reference Kempthorne1954), the deviates are uncorrelated, leading to the standard variance decomposition σ²=σ_a²+σ_d²+σ_aa²+σ_ad²+σ_dd²+…+σ_e², where the σ²s are variance components, e.g., σ_ad² represents the contribution of additive by dominance effects to variance. The sum of all variance components other than σ_a², σ_d² and σ_e² is interpreted as variance ‘due to epistasis’. For the purposes of discussing problems this representation has, it suffices to assume that the only relevant epistatic effects are those of an additive×additive, additive×dominance and dominance×dominance nature.

(i) Henderson's methods for predicting epistatic effects

Using the Cockerham–Kempthorne (hereinafter, CK) assumptions, Henderson (Reference Henderson1985) extended their model to the infinitesimal domain, and vectorially, by writing

(1)

$\eqalign{ {\bf y} \equals \tab {\bf X} \bimbeta \plus \lpar {\bf a} \plus {\bf d} \plus {\bf i}_{aa} \plus {\bf i}_{ad} \plus {\bf i}_{dd} \rpar \plus {\bf e} \cr \equals \tab {\bf X} \bimbeta \plus {\bf g} \plus {\bf e}\comma \cr}$

where β is some nuisance location vector (equal to μ if it contains a single element); X is a known incidence matrix; a and d are vectors of additive and dominance effects, respectively; i_aa, i_ad and i_dd are epistatic effects, and g=a+d+i_aa+i_ad+i_dd is the ‘total’ genetic value. Assuming that g and e are uncorrelated, the variance–covariance decomposition is

(2)

${\bf V}_{y} \equals {\bf V}_{g} \plus {\bf V}_{e} \comma$

where V_y,V_g and V_e are the phenotypic, genetic and residual variance–covariance matrices, respectively. Further,

(3)

${\bf V}_{g} \equals {\bf A} \sigma _{a}^{\setnum{2}} \plus {\bf D} \sigma_{d}^{\setnum{2}} \plus \lpar {\bf A} \# {\bf A}\rpar \sigma _{aa}^{\setnum{2}} \plus \lpar {\bf A} \# {\bf D} \rpar \sigma _{ad}^{\setnum{2}} \plus \lpar {\bf D} \# {\bf D}\rpar \sigma_{dd}^{\setnum{2}}.$

Here, A is the numerator relationship matrix; D is a matrix due to dominance relationships which can be computed from entries in A, and the remaining matrices involve Hadamard (element by element) products of matrices A or D. Thus, under CK, all matrices can be computed from elements of A, as noted by Henderson (Reference Henderson1985), because absence of inbreeding and of linkage disequilibrium are assumed. Cockerham (Reference Cockerham1956) and Schnell (Reference Schnell, Hanson and Robinson1963) gave formulae for covariances between relatives in the presence of linkage, and difficulties posed by linkage disequilibrium are discussed by Gallais (Reference Gallais1974).

In CK–Henderson, with the extra assumption that all genetic effects (having null means) and the data (with mean vector Xβ) follow a multivariate normal distribution with known dispersion components, one has

(4)

$E \,\lpar {\bf a} \vert {\bf g} \rpar \equals {\rm Cov} \lpar {\bf a} \comma {\bf g} \rpar {\bf V}_g^{ \minus \setnum{1}}{\bf g} \equals \sigma_{a}^{\setnum{2}} {\bf AV}_{g}^{ \minus \setnum{1}} {\bf g}.$

The best predictor (Henderson, Reference Henderson1973; Bulmer, Reference Bulmer1980) of additive genetic value in the mean-squared error sense is

(5)

$\eqalign{E \, \lpar {\bf a} \vert {\bf y}\rpar \equals\tab E_{g \vert y} \lsqb E \, \lpar {\bf a}\vert {\bf g}\rpar \rsqb \equals E_{g \vert y} \lsqb \sigma _{a}^{\setnum{2}} {\bf AV}_{g}^{ \minus \setnum{1}} {\bf g}\rsqb\cr \equals\tab \sigma_{a}^{\setnum{2}} {\bf AV}_{g}^{ \minus \setnum{1}} \lsqb E_{g \vert y} \, \lpar {\bf g}\rpar \rsqb \comma}$

where E _g|y (g)=V_gV_y⁻¹ (y−Xβ) is the best predictor of g, and the matrix of regression coefficients V_gV_y⁻¹ is the multidimensional counterpart of heritability in the broad sense. With known variance components, β is typically estimated by generalized least-squares (equivalently, by maximum likelihood under normality) as . Then, ‘the empirical best predictor’ of the vector of additive effects is taken to be

(6)

$\eqalign{{\hats {E}} \, \lpar {\bf a}\vert {\bf y}\rpar \equals \tab \sigma _{a}^{\setnum{2}} {\bf AV}_{g}^{ \minus \setnum{1}} \lsqb {\hats {E}}_{g \vert y} \lpar {\bf g}\rpar \rsqb \cr \equals \tab \sigma_{a}^{\setnum{2}} {\bf AV}_{g}^{ \minus \setnum{1}} {\bf V}_{g} {\bf V}_{y}^{ \minus \setnum{1}} \, \lpar {{\bf y} \minus {\bf X} {{\hats\bimbeta}} } \rpar \cr \equals \tab \sigma_{a}^{\setnum{2}} {\bf AV}_{y}^{ \minus \setnum{1}} \, \lpar {{\bf y} \minus {\bf X} {{\hats\bimbeta}} } \rpar. \cr}$

This is precisely the best linear unbiased predictor (BLUP) of additive merit when model (1) holds. Likewise, the BLUP of additive×dominance deviations is

${\hats {E}} \, \lpar {\bf i}_{ad} \vert {\bf y} \rpar \equals \sigma _{ad}^{\setnum{2}} \lpar {\bf A} \# {\bf D} \rpar {\bf V}_{y}^{ \minus \setnum{1}} \lpar {{\bf y} \minus {\bf X} {\hats {\bimbeta}} } \rpar \comma$

and so on.

The BLUPs of a, d, i_aa, i_ad and i_dd in (1) can be computed simultaneously using Henderson's mixed model equations, but this requires forming the inverse matrices A⁻¹, D⁻¹, (A#A)⁻¹, (A#D)⁻¹ and (D#D)⁻¹. In a general setting, most of these inverses are impossible to obtain, contrary to that of A, which can be written directly from a genealogy.

(ii) Reformulation of Henderson's approach

Equivalently, as in de los Campos et al. (Reference de los Campos, Gianola and Rosa2008), one may rewrite (1) as

(7)

$\eqalign{{\bf y} \equals\tab {\bf X} \bimbeta \plus {\bf Aa} \ast \plus {\bf Dd}\ast \plus \lpar {\bf A} \# {\bf A}\rpar \, {\bf i}_{aa}{\hskip-7\ast } \cr\tab\plus \lpar {\bf A} \# {\bf D}\rpar \, {\bf i}_{ad}{\hskip-7\ast } \plus \lpar {\bf D} \# {\bf D}\rpar \, {\bf i}_{dd}{\hskip-6\ast } \plus {\bf e} \comma}$

where a*=A⁻¹a~(0, A⁻¹σ_a²),…, i_dd*=(D#D)⁻¹i_dd~(0, (D#D)⁻¹σ_dd²). Then, the BLUP of any of the transformed genetic effects can be found by solving a system of mixed linear model equations that does not involve inverses of any of the genetic variance–covariance matrices. For example, the β-equation is

$\eqalign{\tab{\bf X} \prime{\bf X} {\hats {\bimbeta}} \plus {\bf X} \prime {\bf A} {\bf\hat{a}} \ast \plus {\bf X} \prime {\bf D} {\bf \hatss{d}} \ast \plus {\bf X} \prime \, \lpar {\bf A} \# {\bf A} \rpar {\bf \hats{i}}_{aa} {\hskip-7\ast} \cr\tab\quad\plus {\bf X} \prime \, \lpar {\bf A} \# {\bf D} \rpar {\bf\hats{i}} _{ad} \plus {\bf X} \prime \, \lpar {\bf D} \# {\bf D}\rpar {\bf \hats{i}}_{dd} \equals {\bf X} \prime {\bf y} \comma$

and the i_aa-equation is

$\eqalign{\tab \lpar {\bf A} \# {\bf A} \rpar \, {\bf X} {\hats{\bimbeta}} \plus \lpar {\bf A} \# {\bf A} \rpar \, {\bf A} {\bf \hat{a}} \ast \plus \lpar {\bf A} \# {\bf A}\rpar {\bf D} {\bf \hatss{d}} \ast\cr\tab\quad \plus \left[ {\lpar {\bf A} \# {\bf A}\rpar ^{\setnum{2}} \plus {{\sigma _{e}^{\setnum{2}} } \over {\sigma _{aa}^{\setnum{2}} }}\lpar {\bf A} \# {\bf A}\rpar } \right] {\bf \hats{i}}_{aa}{\hskip-7\ast} \plus \lpar {\bf A} \# {\bf A} \rpar \, \lpar {\bf A} \# {\bf D} \rpar {\bf \hats{i}}_{ad}{\hskip-7\ast} \cr \tab\quad\plus \lpar {\bf A} \# {\bf A} \rpar \, \lpar {\bf D} \# {\bf D} \rpar {\bf \hats{i}}_{dd}{\hskip-7\ast} \equals \lpar {\bf A} \# {\bf A}\rpar {\bf y}.}$

Once the BLUPs of the (∗) genetic effects are obtained, linear invariance leads, for example, to ${\bf \hat{a}} \equals {\bf A}{\bf \hat{a}}\ast$ , and to ${\bf \hats{i}}_{dd} \equals \lpar {\bf D} \# {\bf D} \rpar {\bf \hats{i}}_{dd} {\hskip-7\vskip-1pt\ast\vskip1\hskip2}$ . A computational difficulty here is that A is typically not sparse, so all A#A, A#D, etc., are not sparse either. Note that the equations for the genetic effects in the ‘reparameterized’ model are similar to those in Henderson (Reference Henderson1984). For example, if in an animal model the a-equation

$\left( {{\bf I} \plus {\bf A}^{ \minus \setnum{1}} {{\sigma _{e}^{\setnum{2}} } \over {\sigma _{a}^{\setnum{2}} }}} \right) {\bf \hat{a}} \equals {\bf Z}\prime \, \lpar {\bf y} \minus {\bf X} \bimbeta \rpar$

is premultiplied by A, one obtains

$\left( {{\bf A} \plus {\bf I}{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{a}^{\setnum{2}} }}} \right){\bf\hat a} \equals {\bf AZ}\prime \, \lpar {\bf y} \minus {\bf X}\bimbeta \rpar.$

(iii) Bayesian implementation of the reformulated model

In the standard Bayesian linear model (Wang et al., Reference Wang, Rutledge and Gianola1993, Reference Wang, Rutledge and Gianola1994; Sorensen & Gianola, Reference Sorensen and Gianola2002), ${\hskip-2\bf \hat{a}} \ast \comma\hskip-1 {\bf \hats{i}}_{dd} {\hskip-7\ast\hskip2}$ , etc., are mean vectors of corresponding conditional posterior distributions, for which sampling procedures are well known. Further, with scaled inverse chi-square priors assigned to variance components, Gibbs sampling is straightforward, and does not require forming inverses either. For example, a draw from the conditional posterior distribution of the dominance×dominance variance component is obtained under the alternative parameterization as

$\sigma _{dd}^{\setnum{2}} \sim \lsqb {\bf i}_{dd}\hskip-6{\prime \ast} \lpar {\bf D}\# {\bf D}\rpar \, {\bf i}_{dd}{\hskip-7\ast} \plus \nu_{dd} S_{dd}^{\setnum{2}} \rsqb \comma$

where i_dd* is a draw from its conditional posterior distribution, given everything else (Sorensen & Gianola, Reference Sorensen and Gianola2002) and ν_dd and S _dd² are hyper-parameters.

While computations may still be formidable, inversion of the genetic covariance matrices is circumvented. Apart from computational issues, an important question, however, is whether or not the CK–Henderson construct can cope with complex genetic systems effectively.

3. Confronting complexity

Dealing with non-additive genetic variability may be much more difficult than what equations (1) and (3) suggest. Theoretically, at least in CK, epistatic variance can be partitioned into orthogonal additive×additive, additive×dominance, dominance×dominance, etc., variance components, only under idealized conditions. These include linkage equilibrium, absence of mutation and of selection, and no inbreeding or assortative mating (Cockerham, Reference Cockerham1954; Kempthorne, Reference Kempthorne1954). All these assumptions are violated in mature and in breeding programmes. Also, estimation of non-additive components of variance is very difficult, even under standard assumptions (Chang, Reference Chang1988), leading to imprecise inference. Linkage disequilibrium induces covariances between different types of effects; algebraically heroic attempts to deal with this problem are in Weir & Cockerham (Reference Weir and Cockerham1977) and Wang & Zeng (Reference Wang and Zeng2006). Gallais (Reference Gallais1974) derived expressions aimed to describe the impact of linkage disequilibrium on partition of genetic variance. The number of parameters is large, which makes estimation of the needed dispersion components intractable.

The question of whether or not standard random effects models for quantitative traits can account accurately for non-linear (non-additive) relationships between infinitesimal genotypes and phenotypes remains open. It is argued subsequently that parametric models cannot handle well complexity resulting from interactions between the hundreds or even thousands of genes expected to affect multifactorial traits, such as liability or resistance to disease. This was discussed by Templeton (Reference Templeton, Wolf, Brodie and Wade2000), Gianola et al. (Reference Gianola, Fernando and Stella2006) and Gianola and van Kaam (Reference Gianola and van Kaam2008).

In the standard theory for non-additive gene action, e.g. epistasis, interaction effects enter linearly into the phenotype, that is, the partial derivative of the model with respect to any effect, be it additive or of an interactive nature, is a constant that does not involve any genetic effect. This theory is unrealistic in non-linear systems, such as those used in metabolic control theory (Bost et al., Reference Bost, Dillmann and De Vienne1999). Arguing from the perspective of ‘perturbation in analysis’, Feldman & Lewontin (Reference Feldman and Lewontin1975) advanced the argument that linear (i.e. analysis of variance) models are ‘local’, presumably in the following sense. Suppose that the expected value of phenotype y, given some genetic values G (genotypes are discrete but their effects are continous), is some unknown function of effects of genotypes at L loci represented as f(G ₁, G ₂, …, G _L). A second-order approximation to the surface of means yields

$\eqalign{\tab E\lpar y \vert{\bf G}\rpar \equals f\hskip1\lpar G_{\setnum{1}} \comma G_{\setnum{2}} \comma \ldots \comma G_{L} \rpar \cr \tab\quad \approx f\hskip1\lpar \bars{G}_{\setnum{1}} \comma \bars{G}_{\setnum{2}} \comma \ldots \comma \bars{G}_{L} \rpar \plus {\bf f} \prime \, \lpar \bars{G}_{\rm \setnum{1}} \comma \bars{G}_{\setnum{2}} \comma \ldots \comma \bars{G}_{L} \rpar \lpar {{\bf G}} \minus {\bf\hskip4{\barss}\hskip-6{G}}\rpar \cr \tab\quad\quad \plus {1 \over 2}\lpar {{\bf G}} \minus {\bf\hskip4{\barss}\hskip-6{G}} \rpar \prime \left\{ {{{\partial ^{\rm \setnum{2}} } \over {\partial G_{i} \partial G_{j} }}f \, \lpar G_{\setnum{1}} \comma G_{\setnum{2}} \comma \ldots \comma G_{L} \rpar } \right\}_{{\bf G} \equals {\overline{\bf G}}}\cr\tab\quad\quad\times \lpar {\bf G} \minus {\bf\hskip4{\barss}\hskip-6{G}}\rpar \comma \cr}$

where $\bars{G}_{i}$ is the mean value of the effect of locus i (typically taken to be 0); G=(G ₁, G ₂, …, G _L)′ is a column vector and Ḡ=E(G); f′( $\overline{G}_{\setnum{1}} \comma\,\overline{G}_{\setnum{2}} \comma\, \ldots \comma\, \overline{G}_{L}$ ) is the row vector of first derivatives of f(.) with respect to G and

$\left\{ {{\partial^{\setnum{2}} \over \partial G_{i} \partial G_{j} }} f\hskip1 \lpar G_{\setnum{1}} \comma\, G_{\setnum{2}} \comma\, \cdots \comma\, G_{L} } \rpar\right\}_{{\bf G} \equals { \overline{\bf G}}}$

is the matrix of second derivatives, both evaluated at Ḡ in the approximation.

A linear-on-variance-components decomposition of variability may not be enlightening at all when a phenotype results, say, from a sum of sine and cosine waves. To illustrate, suppose that non-linearity (non-additivity) enters as E (y|G ₁, G ₂, G ₃)=G ₁ exp (G ₂G ₃). A log-transformation of the expected phenotypic value of individuals with genetic value (G ₁, G ₂, G ₃) would suggest linearity (in a log-scale) with respect to locus 1, and interaction between loci 2 and 3. For this hypothetical model, the first derivatives are

$\eqalign{{\bf f}\lpar G_{\setnum{1}} \comma G_{\setnum{2}} \comma G_{\setnum{3}} \rpar \equals\tab \left[ {\matrix{e^{G_{\setnum{2}} G_{\setnum{3}}} \cr G_{\setnum{1}} G_{\setnum{3}} e^{G_{\setnum{2}} G_{\setnum{3}}}\cr G_{\setnum{1}} G_{\setnum{2}} e^{G_{\setnum{2}} G_{\setnum{3}}}}} \right] \cr\equals \tab\left\{ {\matrix{ {\left[ {\matrix{ 1 \cr 0 \cr 0 \cr} } \right]}\comma \tab {{\rm when}\;\bars{G}_{\setnum{1}} \comma \bars{G}_{\setnum{2}} \comma \bars{G}_{\setnum{3}} \equals 0 \comma } \cr {\left[ {\matrix{ e \cr e \cr e \cr} } \right]}\comma \tab{{\rm when}\;\bars{G}_{\setnum{1}} \comma \bars{G}_{\setnum{2}} \comma \bars{G}_{\setnum{3}} \equals 1\comma } \cr} } \right. \cr \tab \cr}$

and the matrix of second derivatives is

$\eqalign{\tab\openup2\hskip-18pt \left\{ {{{\partial ^{\setnum{2}} } \over {\partial G_{i}\partial G_{j} }}f\left(. \right)} \right\}\cr\enspace\equals \tab \left[ \openup2{\matrix{ 0 \tab {G_{\setnum{3}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \tab {G_{\setnum{2}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \cr {G_{\setnum{3}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \tab {G_{\setnum{1}} G_{\setnum{3}}^{\setnum{2}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \tab {G_{\setnum{1}} e^{G_{\setnum{2}} G_{\setnum{3}} } \lpar G_{\setnum{2}} G_{\setnum{3}} \plus 1\rpar } \cr {G_{\setnum{2}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \tab {G_{\setnum{1}} e^{G_{\setnum{2}} G_{\setnum{3}} } \lpar G_{\setnum{2}} G_{\setnum{3}} \plus 1\rpar } \tab {G_{\setnum{1}} G_{\setnum{2}}^{\setnum{2}} e^{G_{\setnum{2}} G_{\setnum{3}} } } \cr} } \right] \cr\equals \tab \left\{\hskip-8 {\matrix{\hskip8{\left[ {\matrix{ 0 \tab 0 \tab 0 \cr 0 \tab 0 \tab 0 \cr 0 \tab 0 \tab 0 \cr} } \right]}\comma\quad{\hskip5{\rm when}\;\bars{G}_{\setnum{1}} \comma \bars{G}_{\setnum{2}} \comma \bars{G}_{\setnum{3}} \equals 0 \comma } \cr{\left[ {\matrix{ 0 \tab e \tab e \cr e \tab e \tab {2e} \cr e \tab {2e} \tab e \cr} } \right]}\comma\quad {{\rm when}\;\bars{G}_{\setnum{1}}\comma \bars{G}_{\setnum{2}} \comma \bars{G}_{\setnum{3}} \equals 1.} \cr} } \right. \cr}$

In the neighbourhood of 0 for all effects, the local approximation yields

(8)

$E\,\lpar y\vert G_{\setnum{1}} \comma G_{\setnum{2}} \comma G_{\setnum{3}} \rpar \approx G_{\setnum{1}}.$

On the other hand, near 1 the local approximation produces

(9)

$\eqalign{E\,\lpar y\vert G_{\setnum{1}} \comma G_{\setnum{2}} \comma G_{\setnum{3}} \rpar \approx\hskip2\tab e\bigg(3 \minus G_{\setnum{1}} \minus 3G_{\setnum{2}} \minus 3G_{\setnum{3}} \plus {1 \over 2}G_{\setnum{2}}^{\setnum{2}} \plus {1 \over 2}G_{\setnum{3}}^{\setnum{2}}\cr\tab\quad\plus G_{\setnum{1}} G_{\setnum{2}} \plus G_{\setnum{1}} G_{\setnum{3}} \plus 2G_{\setnum{2}} G_{\setnum{3}} \bigg) \comma \cr}\hskip-30$

A local approximation near 0 suggests that phenotypes are linear on the effect of locus 1, while an approximation in the neighbourhood of 1 points towards ‘dominance’ at loci 2 and 3, and at 2 – factor epistasis involving loci (1, 2), (1, 3) and (2, 3). This relates to work by Kojima (Reference Kojima1959), who studied conditions for equilibria when a fitness surface was affected by epistasis. He assumed free recombination, random mating and constancy of genotypic values. The arguments leading to (8) and (9) indicate that constancy of genotypic values is not tenable when local approximations are used to study the surface.

To the extent that a linear model provides a ‘local’ approximation only, it may not be surprising why attaining an understanding of epistasis within the classical paradigm has been elusive. Kempthorne (Reference Kempthorne1978) disagrees with this view, however, although his argument seems more a defence of the technique of the analysis of variance (ANOVA) per se than of the lack of ability of a linear model to describe complex, interacting, systems. However, even if a linear model holds at least locally, a standard fixed effects ANOVA of a highly dimensional, multifactorial system is not feasible, because one ‘runs out’ of degrees of freedom (df) (Gianola et al., Reference Gianola, Fernando and Stella2006). Nevertheless, methodologies for dealing with complex epistatic systems are becoming available and these include, for example, machine learning, regularized neural networks (Lee, Reference Lee2004), neural networks optimized with grammatical evolution computations (Motsinger-Reif et al., Reference Motsinger-Reif, Dudek, Hahn and Ritchie2008) and non-parametric regression (e.g. Gianola et al., Reference Gianola, Fernando and Stella2006; Gianola & van Kaam, Reference Gianola and van Kaam2008). The latter is discussed in what follows.

4. Non-parametric breeding value

There is a wide collection of procedures for non-parametric regression available. In particular, statistical models based on RKHS have been useful, inter alia, for regression (Wahba, Reference Wahba1990) and classification (Vapnik, Reference Vapnik1998). In many respects, RKHS regression is the ‘mother’ of non-parametric functional data analysis, since it includes splines, hard and soft classification and even best linear unbiased prediction (de los Campos et al., Reference de los Campos, Gianola and Rosa2008) as special cases. Contrary to many ad-hoc forms of non-parametric modelling, RKHS is a variational method based on maximizing penalized likelihoods over a rich space of functions defined on a Hilbert space. Also, it uses flexible kernels, which can be adapted to many different circumstances, and allows for varying classes of information inputs, e.g. pedigrees, continuous valued covariates and molecular markers of any type. The Bayesian view of RKHS regression has been used to motivate the methodology using Gaussian processes (Rasmussen & Williams, Reference Rasmussen and Williams2006). In quantitative genetics, Gianola et al. (Reference Gianola, Fernando and Stella2006) and Gianola & van Kaam (Reference Gianola and van Kaam2008) suggested this approach for incorporating information on dense whole-genome markers into models for prediction of genetic value of animals or plants for quantitative traits. The dense molecular data (e.g. SNPs) enter as covariates into a kernel (incidence) square matrix whose dimension is equal to the number of individuals with genotype information available. Thus, the dimensionality of the problem is reduced drastically, from that given by the number of SNPs to the number of individuals genotyped, typically much lower. González-Recio et al. (Reference González-Recio, Gianola, Long, Weigel, Rosa and Avendaño2008a, Reference Gonzalez-Recio, Gianola, Rosa, Weigel and Avendañob) present applications to mortality rate and feed conversion efficiency in broilers. In the absence of kernels based on substantive theory, genetic interactions are dealt with in RKHS in some form of ‘black box’. However, the focus of non-parametric regression is prediction rather than inference, an issue that will be emphasized in this paper. Relationships between RKHS regression and classical models of quantitative genetics are discussed by de los Campos et al. (Reference de los Campos, Gianola and Rosa2008). A question that arises naturally in plant and animal breeding is whether or not measures of breeding value can be derived from an RKHS regression model.

Briefly, a RKHS regression can be represented in terms of the linear model

(10)

${\bf y} \equals {\bf X}\bimbeta \plus {\bf K}_{\bf h} {\bimalpha } \plus {\bf e}\comma$

where Xβ is as before; K_h={k(x_i, x_j, h)} is an n×n symmetric positive-definite matrix of kernel entries dependent on covariates (e.g. SNPs) x_i and x_j (subscripts i and j define the individual whose phenotype is considered and some other individual in the sample, respectively), and possibly on a set of bandwith parameters h which must be tuned in some manner, as noted below. González-Recio et al. (Reference González-Recio, Gianola, Long, Weigel, Rosa and Avendaño2008a) discuss kernels that do not involve any h. Further, α in (10) is a set of non-parametric regression coefficients, one for each individual in the sample of data. For instance, if SNP genotypes enter into k(x_i, x_j, h), then K_hα can be construed as a vector of genetic effects marked by SNPs.

The choice of kernel k (x_i, x_j, h) is absolutely critical for attaining good predictions in RKHS regression, and it can be addressed via some suitable model comparison, e.g. by cross-validation. Each kernel is associated with a space of functions, and de los Campos et al. (Reference de los Campos, Gianola and Rosa2008) describe conditions under which a kernel may be expected to work well. For example, if a Hilbert space of functions associated with a given kernel spans functions of additive and non-additive genetic effects, the model would be expected to capture such effects. Otherwise, predictions may be very poor, even worse than those attained with a linear model (which, in some cases will also be a RKHS regression).

It can be shown that the solution to the RKHS regression problem is obtained by assuming that α~N(0, K_h⁻¹σ_α²), where σ_α² is a variance component entering into the smoothing parameter ${{\sigma _{_{e} }^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}$ , and by solving

(11)

$\left[ {\matrix{ {\bf X \prime} {\bf X} \tab {{\bf X \prime} {\bf K}_{\bf h} } \cr {{\bf K}_{\bf h}{\hskip-4\prime}\hskip3 {\bf X}} \tab {{\bf K}_{\bf h}{\hskip-4\prime}\hskip3 {\bf K}_{\bf h} \plus \displaystyle{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf K}_{\bf h} } \cr} } \right]\left[ {\matrix{ {{\hats\bimbeta }} \cr {\hat\bimalpha } \cr} } \right] \equals \left[ {\matrix{ {{\bf X \prime}{\bf y}} \cr {{\bf K}_{\bf h}\!\! \prime\,\, {\bf y}} \cr} } \right].$

The method can be implemented in either a BLUP – residual maximum likelihood (REML) context, given h assessed previously by cross-validation or generalized cross-validation (Craven & Wahba, Reference Craven and Wahba1979), or in a fully Bayesian manner, with all parameters assigned prior distributions.

The predicted genetic or genomic value is then K_hᾶ, as illustrated by González-Recio et al. (Reference González-Recio, Gianola, Long, Weigel, Rosa and Avendaño2008a) in an analysis of broiler mortality in which sires had been genotyped for thousands of SNPs, and by González-Recio et al. (Reference Gonzalez-Recio, Gianola, Rosa, Weigel and Avendaño2008b) in a similar study of food conversion ratio conducted in the same population. Their results suggest that RKHS regression using SNP information can produce more reliable prediction of current and future (offspring) phenotypes than the standard parametric additive model used in animal breeding. Also, albeit not unambiguously, some RKHS specifications with filtered SNPs tended to outperform parametric Bayesian regression models in which additive effects of all SNPs had been fitted.

In what follows, the RKHS machinery is used to develop non-parametric measures of breeding value.

(i) Infinitesimal non-parametric breeding value

Assume that some kernel matrix K_h has been found satisfactory, in the sense mentioned above. Given the context, vector K_hα in (10) is a molecularly marked counterpart of g=a+d+i_aa+i_ad+i_dd in (1). Consider now the positive-definite kernel matrix K=A+D+(A#A)+(A#D)+(D#D). This is positive-definite, because it consists of the sum of positive-definite matrices, and leads to the RKHS model

(12)

${\bf y} \equals {\bf X}{\bimbeta} \plus \lsqb {\bf A} \plus {\bf D} \plus \lpar {\bf A}\# {\bf A}\rpar \plus \lpar {\bf A}\# {\bf D}\rpar \plus \lpar {\bf D}\# {\bf D}\rpar \rsqb {\bimalpha } \plus {\bf e}\comma$

with α~N(0, K⁻¹ σ_α²). Here, K does not depend on any bandwidth parameters, which simplifies matters greatly, relative to the parametric specification (7), in which 6 variance components enter into the problem. To obtain the RKHS predictor of genetic value (and with only 2 variance components intervening), note that solving (11) is equivalent to solving

$\left[ {\matrix{ {{\bf X'X}} \tab {{\bf X'K}} \cr {\bf X} \tab {{\bf K} \plus \displaystyle{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf I}} \cr} } \right]\left[ {\matrix{ {{\hats\bimbeta} } \cr {{\hat\bimalpha }} \cr} } \right] \equals \left[ {\matrix{ {{\bf X'y}} \cr {\bf y} \cr} } \right]\comma$

and the genetic value is predicted as

(13)

$\eqalign{ {\bf \hat {g}} \equals \tab \lsqb {\bf A} \plus {\bf D} \plus \lpar {\bf A}\# {\bf A}\rpar \plus \lpar {\bf A}\# {\bf D}\rpar \plus \lpar {\bf D}\# {\bf D}\rpar \rsqb \hat{\bimalpha } \cr \equals \tab {\bf A}\hat{\bimalpha } \plus {\bf D}\hat{\bimalpha } \plus \lpar {\bf A}\# {\bf A}\rpar \hat{\bimalpha } \plus \lpar {\bf A}\# {\bf D}\rpar \hat{\bimalpha } \plus \lpar {\bf D}\# {\bf D}\rpar \hat{\bimalpha }. \cr}$

Here, the ‘non-parametric’ breeding value would be Aα, and its predictor is Aᾶ. The posterior distribution of Aα can be arrived at by drawing samples from the posterior distribution of α.

(ii) SNP-based non-parametric breeding value

Let now K_h be a kernel matrix whose entries depend on a string x of SNP genotypes or haplotypes available for a set of individuals having phenotypic records. For example, Gianola & van Kaam (Reference Gianola and van Kaam2008) consider the Gaussian kernel

(14)

$k_{\bf h} \lpar {\bf x}_{i} \comma {\bf x}_{j} \rpar \equals \exp \left[ { \minus {{\lpar {\bf x}_{i} \minus {\bf x}_{j} \rpar \prime \lpar {\bf x}_{i} \minus {\bf x}_{j} \rpar } \over h}} \right].$

Alternatively, one could use as kernel

$\eqalign{k_{\bf h} \lpar {\bf x}_{i} \comma {\bf x}_{j} \rpar \equals\tab \prod\limits_{k \equals \setnum{1}}^{{\rm number\ of\ chromosome}\;{\rm pairs}}\cr\tab{\exp \left[ { \minus {{\lpar {\bf x}_{ik} \minus {\bf x}_{jk} \rpar \prime \lpar {\bf x}_{ik} \minus {\bf x}_{jk} \rpar } \over {h_{k} }}} \right]} \comma$

where x_jk is the SNP string for chromosome k in individual j, and h ₁, h ₂, … , h _k are chromosome-specific positive bandwidth parameters.

Irrespective of the kernel adopted, let now

${\bf K}{\bimalpha } \equals {\bf a}_{K} \plus {\bimgamma} \comma$

where a_K is defined as non-parametric breeding value, and γ is independent of a_K. With A being the numerator relationship matrix between the individuals having SNP information, use of the reasoning leading to (4) produces

(15)

$\eqalign{{\bf a}_{K} \equals\tab \sigma _{a}^{\setnum{2}} {\bf A}\lsqb {\rm Var\lpar }{\bf K}{\bimalpha }{\rm \rpar }\rsqb ^{ \minus \setnum{1}} {\bf K}{\bimalpha } \cr\equals\tab \sigma _{a}^{\setnum{2}} {\bf A}\lsqb {\bf KK}^{ \minus \setnum{1}} \sigma _{\alpha }^{\setnum{2}} {\bf K}\rsqb ^{ \minus \setnum{1}} {\bf K}{\bimalpha } \equals {{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A\alpha }\comma$

where σ_a² is some estimate of additive genetic variance for the trait in question. Thus, the non-parametric breeding value of individual i in the sample has the form

$a_{K\comma i} \equals {{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}\mathop\sum\limits_{j \equals \setnum{1}}^{n} {a_{ij} \alpha _{j} \comma }$

so that, if individuals are unrelated, $a_{K\comma i} \equals {{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}\alpha _{i}$ , with variance $V\lpar a_{K\comma i} \rpar \equals \sigma _{a}^{\setnum{2}} \left( {{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}k^{ii} } \right)$ , where kⁱⁱ is the ith diagonal element of K⁻¹.

Note that

(16)

${\rm Var}\lpar {\bf a}_{K} \rpar \equals {\bf A}\sigma _{a}^{\setnum{2}} \left( {{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf K}^{ \minus \setnum{1}} {\bf A}} \right).$

This is equal to Aσ_a ² only if K=A and σ_α²=σ_a²; in this case, no molecular information is used at all. Suppose now that A=I, so that individuals are genetically unrelated. However, non-parametric breeding values turn out to be correlated, since

(17)

${\rm Var}\lpar {\bf a}_{K} \rpar \equals \sigma _{a}^{\setnum{2}} \left( {{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf K}^{ \minus \setnum{1}} } \right).$

This can be interpreted in the following manner, using a Gaussian kernel to illustrate. Here, the entries of the kernel matrix have the form

$\eqalign{k_{\bf h} \lpar {\bf x}_{i} \comma {\bf x}_{j} \rpar \equals \tab \exp \left[ { \minus {{\lpar {\bf x}_{i} \minus {\bf x}_{j} \rpar \prime \lpar {\bf x}_{i} \minus {\bf x}_{j} \rpar } \over h}} \right] \cr \equals \tab \prod\limits_{k \equals \setnum{1}}^{{\rm number\ of\ markers}} {\exp \left[ { \minus {{\lpar x_{ik} \minus x_{jk} \rpar ^{\setnum{2}} } \over h}} \right]} \comma \cr}$

and $\exp \left[ { \minus {{\lpar x_{ik} \minus x_{jk} \rpar ^{\setnum{2}} } \over h}} \right]$ is maximum when x_ik=x_jk, that is, when individuals i and j have the same genotype at marker k. This means that the elements of the kernel matrix (taking values between 0 and 1) will be larger for pairs of individuals that are more ‘molecularly alike’, even if unrelated by line of descent. This molecular similarity is then propagated into (16) and (17).

Given a point estimate of α(ᾶ), e.g. the posterior mean, the non-parametric estimate of breeding value is

(18)

${\bf \hat{a}}_{K} \equals {{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}\hat{\bimalpha }.$

If computations are carried out in a fully Bayesian Monte Carlo context (with m samples drawn from the posterior distribution), the posterior mean estimate is

${\bf \hat{a}}_{K} \equals E\left( {{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A\bimalpha }\vert {\rm DATA}} \right) \approx {{{\bf A}\sigma _{a}^{\setnum{2}} } \over m}\mathop\sum\limits_{i \equals \setnum{1}}^{m} {{{{\bf \bimalpha }^{\lpar i\rpar } \over {\sigma _{\alpha }^{\setnum{2}\lpar i\rpar } }}\comma }$

and the uncertainty measure (akin to prediction error variance–covariance in BLUP) is

$\eqalign{\tab {\rm Var}\left( {{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}\bimalpha \vert {\rm DATA}}\! \right) \cr \tab \quad\approx \sigma _{a}^{\setnum{4}} {\bf A}\left\{ {{1 \over m}\mathop\sum\limits_{i \equals \setnum{1}}^{m} {{{\bimalpha ^{\lpar i\rpar } \bimalpha ^{\lpar i\rpar}\prime } \over {\left( {\sigma _{\alpha }^{\setnum{2}}} ^{\lpar i\rpar } \right)}^{\setnum{2}}} \minus \left( {{1 \over m}} \right)^{\setnum{2}} \left[ {\mathop\sum\limits_{i \equals \setnum{1}}^{m} {{{\bimalpha ^{\lpar i\rpar } } \over {\left( {\sigma _{\alpha }^{\setnum{2}}} ^{\lpar i\rpar } \right)}}} } \right]}}\right.\cr\tab\left.\quad\quad\times\left[ {\mathop\sum\limits_{i \equals \setnum{1}}^{m} {{{\bimalpha ^{\lpar i\rpar} \prime } \over {\left( {\sigma _{\alpha }^{\setnum{2}}} ^{\lpar i\rpar } \right)}}} } \right]} } \right\}{\bf A}. \cr}$

Non-parametric breeding values of individuals that are not genotyped can be inferred as follows. Let A_(−,+) be the additive relationship matrix between individuals that are not genotyped (−) and those which are genotyped (+). The non-parametric breeding value of non-genotyped individuals would be

${\bf a}_{{\rm \lpar } \minus \comma K{\rm \rpar }} {\rm \equals }{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}_{\lpar \minus \comma \plus \rpar } \bimalpha$

with its point estimate being

${\bf\hat{a}}_{{\rm \lpar }\hskip-1 \minus \comma K{\rm \rpar }} {\rm \equals }{{\sigma _{a}^{\setnum{2}} } \over {\hat{\sigma }_{\alpha }^{\setnum{2}} }}{\bf A}_{\lpar \hskip-1\minus \comma \plus \rpar } \hat{\bimalpha }.$

Likewise, non-parametric breeding values of yet-to-be genotyped and phenotyped progeny would be

${\bf a}_{{\rm \lpar prog}\comma K{\rm \rpar }} {\rm \equals }{{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}_{\lpar {\rm prog}\comma K\rpar } \bimalpha \comma$

where A(_prog,+) is the additive relationship matrix between progenies and individuals with data.

(iii) Semi-parametric breeding value

Another possibility consists of treating additive genetic effects in a parametric manner and non-additive effects in the RKHS framework, as suggested by Gianola et al. (Reference Gianola, Fernando and Stella2006). Let now the model be

(19)

${\bf y \equals {\bf X}}{\bimbeta} {\plus {\bf u} \plus {\bf K}}\bimalpha {\rm \plus {\bf e}\comma }$

where u~(0, Aσ_a²) and Kα as before, but with

${\bf K \equals \lsqb }{\bf D}{\rm \# }{\bf K}_{h} {\rm \plus \lpar }{\bf A}{\rm \# }{\bf A}{\rm \# }{\bf K}_{h} {\rm \rpar \plus \lpar }{\bf A}{\rm \# }{\bf D}{\rm \# }{\bf K}_{h} {\rm \rpar \plus \lpar }{\bf D}{\rm \# }{\bf D}{\# {\bf K}}_{h} {\rm \rpar \rsqb \comma }$

where K_h is a positive-definite matrix of Gaussian kernels with entries as in (14), so that SNP information is used. Since K is the sum of Hadamard products of positive-definite matrices, it is positive-definite as well and, hence, it is a valid kernel for RKHS regression. The estimating equations (can be rendered symmetric by premultiplying the α-set of equations by K) take now the form

$\left[ {\matrix{ {{\bf X \prime} {\bf X}} \tab {{\bf X \prime} } \tab {{\bf X \prime} {\bf K}} \cr {\bf X} \tab {{{\bf I} \plus }\displaystyle{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}^{ \minus \setnum{1}} } \tab {\bf K} \cr {\bf X} \tab {\bf I} \tab {{{\bf K} \plus }\displaystyle{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf I}} \cr} } \right]\left[ {\matrix{ {\hats{\bimbeta }} \cr {\bf \hat{u}} \cr {\hat{\bimalpha }} \cr} } \right] \equals \left[ {\matrix{ {{\bf X \prime} {\bf y}} \cr {\bf y} \cr {\bf y} \cr} } \right] \comma$

where û is the predicted breeding value, and

$\eqalign{{\bf K}\hat{\bimalpha }{\rm \equals}\tab \lpar {\bf D}{\rm \# }{\bf K}_{h} {\rm \rpar }\hat{\bimalpha }{\rm \plus \lpar }{\bf A}{\rm \# }{\bf A}{\rm \# }{\bf K}_{h} {\rm \rpar }\hat{\bimalpha }{\rm \plus \lpar }{\bf A}{\rm \# }{\bf D}{\rm \# }{\bf K}_{h} {\rm \rpar }\hat{\bimalpha }\cr\tab{\rm \plus \lpar }{\bf D}{\rm \# }{\bf D}{\rm \# K}_{h} {\rm \rpar }\hat{\bimalpha }$

is the predicted (SNP-based) non-additive genetic value. For instance, (D#K_h)ᾶ is interpretable as a predicted dominance value, and so on. A natural extension of this consists of removing u from (19) and then taking as kernel matrix

${\bf K}{\rm \equals \lsqb }{\bf A}{\rm \plus }{\bf D}{\rm \plus \lpar }{\bf A}{\rm \# }{\bf A}{\rm \rpar \plus \lpar }{\bf A}{\rm \# D\rpar \plus \lpar }{\bf D}{\rm \# }{\bf D}{\rm \rpar \rsqb \# }{\bf K}_{h} \comma$

so that the predicted additive breeding value would now be (A#K_h)ᾶ, much along the lines of (18). This would give a completely non-parametric treatment of prediction of additive and non-additive genetic effects using dense molecular information.

5. Illustration: infinitesimal model

The toy example in Henderson (Reference Henderson1985) is considered. The problem is to infer additive and dominance genetic effects of five individuals using data consisting of phenotypic records for only four (subject 1 lacks a record). The data and model are:

$\left[ {\matrix{ {y_{\setnum{2}} \equals 5} \cr {y_{\setnum{3}} \equals 3} \cr {y_{\setnum{4}} \equals 7} \cr {y_{\setnum{5}} \equals 8} \cr} } \right] \equals \left[ {\matrix{ 1 \tab 2 \cr 1 \tab 3 \cr 1 \tab 1 \cr 1 \tab 5 \cr} } \right]\left[ {\matrix{ {\beta _{\setnum{0}} } \cr {\beta _{\setnum{1}} } \cr} } \right] \plus \left[ {\matrix{ 0 \tab 1 \tab 0 \tab 0 \tab 0 \cr 0 \tab 0 \tab 1 \tab 0 \tab 0 \cr 0 \tab 0 \tab 0 \tab 1 \tab 0 \cr 0 \tab 0 \tab 0 \tab 0 \tab 1 \cr} } \right]\left( {\left[ {\matrix{ {{a}_{\setnum{1}} } \cr {{a}_{\setnum{2}} } \cr {{a}_{\setnum{3}} } \cr {{a}_{\setnum{4}} } \cr {{a}_{\setnum{5}} } \cr} } \right] \plus \left[ {\matrix{ {d_{\setnum{1}} } \cr {d_{\setnum{2}} } \cr {d_{\setnum{3}} } \cr {d_{\setnum{4}} } \cr {d_{\setnum{5}} } \cr} } \right]} \right) \plus \left[ {\matrix{ {e_{\setnum{2}} } \cr {e_{\setnum{3}} } \cr {e_{\setnum{4}} } \cr {e_{\setnum{5}} } \cr} } \right] \equals {\bf X}\bimbeta {\plus {\bf Z}\lpar {\bf a} \plus {\bf d}\rpar \plus {\bf e}}{\rm.}$

Although 1 does not have a record, all additive and dominance effects are included in the model, zeroing out the incidence of a ₁ and d ₁ via a column of 0s in the Z incidence matrix. Henderson (Reference Henderson1985) assumed that the additive, dominance and residual variance components were σ_a²=5, σ_d²=4 and σ_e²=20, so that the phenotypic variance was 29, and used the additive and dominance relationship matrices

$\eqalign{{\bf A}\!{\rm \equals }\!\left[ \openup3{\matrix{ 1 \tab 0 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \cr 0 \tab 1 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 0 \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 1 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 1 \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab 0 \tab {{\textstyle{1 \over 4}}} \tab {{\textstyle{1 \over 4}}} \tab 1 \cr} } \right]\!\quad\! {\rm and}\quad {\bf D} \!\equals\! \left[ {\openup3\matrix{ 1 \tab 0 \tab 0 \tab 0 \tab 0 \cr 0 \tab 1 \tab 0 \tab 0 \tab 0 \cr 0 \tab 0 \tab 1 \tab {{\textstyle{1 \over 4}}} \tab 0 \cr 0 \tab 0 \tab {{\textstyle{1 \over 4}}} \tab 1 \tab 0 \cr 0 \tab 0 \tab 0 \tab 0 \tab 1 \cr} } \right].$

Application of BLUP leads to the unique solutions (rounded at the third decimal)

$\openup3 \eqalign{\tab\hskip4{\hats}{\hskip-6{\bimbeta}\prime} \equals \lsqb 5{\cdot}145\quad 0{\cdot}241\rsqb \comma \cr \tab {\bf{\hat{a }}} \prime \equals \lsqb 0{\cdot}045\quad \minus 0{\cdot}192\quad \minus 0{\cdot}343\quad 0{\cdot}096\quad 0{\cdot}242\rsqb \comma \cr \tab {\bf\hatss{d}} \prime \equals \lsqb 0\quad \minus 0{\cdot}073\quad \minus 0{\cdot}365\quad 0{\cdot}162\quad 0{\cdot}234\rsqb. \cr}$

The predicted total genetic value is ĝ=â+, yielding

${\bf \hat{g}} \equals \lsqb 0{\cdot}045\quad \minus 0{\cdot}265\quad \minus 0{\cdot}708\quad 0{\cdot}259\quad 0{\cdot}477\rsqb.$

Alternatively, a RKHS representation as in (12) is used now, with K=A+D as kernel matrix, which does not involve any bandwidth parameters. One has

${\bf K} \equals \left[ \openup3{\matrix{ 2 \tab 0 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \cr 0 \tab 2 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 0 \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 2 \tab {{\textstyle{3 \over 4}}} \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{3 \over 4}}} \tab 2 \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab 0 \tab {{\textstyle{1 \over 4}}} \tab {{\textstyle{1 \over 4}}} \tab 2 \cr} } \right].$

The RKHS model (using the part of K pertaining to individual with records) is

$\openup3\eqalign{\left[ {\matrix{ {y_{\setnum{2}} } \cr {y_{\setnum{3}} } \cr {y_{\setnum{4}} } \cr {y_{\setnum{5}} } \cr} } \right] \equals \tab \left[ {\matrix{ 1 \tab 2 \cr 1 \tab 3 \cr 1 \tab 1 \cr 1 \tab 5 \cr} } \right]\,\left[ {\matrix{ {\beta _{\setnum{0}} } \cr {\beta _{\setnum{1}} } \cr} } \right]\! \plus\! \left[ {\matrix{ 2 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 0 \cr {{\textstyle{1 \over 2}}} \tab 2 \tab {{\textstyle{3 \over 4}}} \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{3 \over 4}}} \tab 2 \tab {{\textstyle{1 \over 4}}} \cr 0 \tab {{\textstyle{1 \over 4}}} \tab {{\textstyle{1 \over 4}}} \tab 2 \cr} } \right]\,\left[ {\matrix{ {\alpha _{\setnum{2}} } \cr {\alpha _{\setnum{3}} } \cr {\alpha _{\setnum{4}} } \cr {\alpha _{\setnum{5}} } \cr} } \right]\! \plus\! \left[ {\matrix{ {e_{\setnum{2}} } \cr {e_{\setnum{3}} } \cr {e_{\setnum{4}} } \cr {e_{\setnum{5}} } \cr} } \right] \cr\equals\tab {\bf X}{\bimbeta} \plus {\bf K}{\bimalpha} {\rm \plus e}{\rm.}\cr}$

The value adopted for the sole smoothing parameter is σ_α²=σ_a²+σ_d²=9, and the estimating equation (11) and the solution to the RKHS mixed model equations is

$\openup5\eqalign{\tab\lsqb \hats{\beta} _{\setnum{0}} \equals 5{\cdot}289\quad \hats{\beta} _{\setnum{1}} \equals0{\cdot}200\quad \hat\alpha _{\setnum{2}} \equals \minus 0{\cdot}128\cr\tab\quad \hat\alpha _{\setnum{3}} \equals \minus 0{\cdot}781\quad \hat\alpha _{\setnum{4}} \equals 0{\cdot}487\quad \hat\alpha _{\setnum{5}} \equals 0{\cdot}422\rsqb.$

The non-parametric additive genetic effects are inferred using formula (18)

$\openup2\eqalign{\left[ {\matrix{ {\hat{a}_{K\comma \setnum{1}} } \cr {\hat{a}_{K\comma \setnum{2}} } \cr {\hat{a}_{K\comma \setnum{3}} } \cr {\hat{a}_{K\comma \setnum{4}} } \cr {\hat{a}_{K\comma \setnum{5}} } \cr} } \right] \tab\equals {{\sigma _{a}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf A}_{all\comma\! \plus} \hat{\bimalpha} \equals {5 \over 9}\left[ \openup3{\matrix{ 0 \tab {{1 \over 2}} \tab {{1 \over 2}} \tab {{1 \over 2}} \cr 1 \tab {{1 \over 2}} \tab {{1 \over 2}} \tab 0 \cr {{1 \over 2}} \tab 1 \tab {{1 \over 2}} \tab {{1 \over 4}} \cr {{1 \over 2}} \tab {{1 \over 2}} \tab 1 \tab {{1 \over 4}} \cr 0 \tab {{1 \over 4}} \tab {{1 \over 4}} \tab 1 \cr} } \right]\left[ {\matrix{ { \minus 0{\cdot}128} \cr { \minus 0{\cdot}781} \cr {0{\cdot}487} \cr {0{\cdot}422} \cr} } \right] \cr\tab\equals \left[ {\matrix{ {0{\cdot}036} \cr { \minus 0{\cdot}153} \cr { \minus 0{\cdot}276} \cr {0{\cdot}076} \cr {0{\cdot}194} \cr} } \right]\comma$

and the non-parametric estimates of dominance effects are

$\openup3\eqalign{\left[ {\matrix{ {\hatss{d}_{K\comma \setnum{1}} } \cr {\hatss{d}_{K\comma \setnum{2}} } \cr {\hatss{d}_{K\comma \setnum{3}} } \cr {\hatss{d}_{K\comma \setnum{4}} } \cr {\hatss{d}_{K\comma \setnum{5}} } \cr} } \right] \equals\tab {{\sigma _{d}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf D}_{all\comma\! \plus} \hat{\bimalpha} \equals {4 \over 9}\left[ {\matrix{ 0 \tab 0 \tab 0 \tab 0 \cr 1 \tab 0 \tab 0 \tab 0 \cr 0 \tab 1 \tab {{1 \over 4}} \tab 0 \cr 0 \tab {{1 \over 4}} \tab 1 \tab 0 \cr 0 \tab 0 \tab 0 \tab 1 \cr} } \right]\left[ {\matrix{ { \minus 0{\cdot}128} \cr { \minus 0{\cdot}781} \cr {0{\cdot}487} \cr {0{\cdot}422} \cr} } \right] \cr\equals\tab \left[ {\matrix{ 0 \cr { \minus 0{\cdot}057} \cr { \minus 0{\cdot}293} \cr {0{\cdot}130} \cr {0{\cdot}188} \cr} } \right].$

The total genetic value is estimated at

$\left[ {\matrix{ {\hat{g}_{K\comma \setnum{1}} } \cr {\hat{g}_{K\comma \setnum{2}} } \cr {\hat{g}_{K\comma \setnum{3}} } \cr {\hat{g}_{K\comma \setnum{4}} } \cr {\hat{g}_{K\comma \setnum{5}} } \cr} } \right] \equals \left[ {\matrix{ {\hat{a}_{K\comma \setnum{1}} } \cr {\hat{a}_{K\comma \setnum{2}} } \cr {\hat{a}_{K\comma \setnum{3}} } \cr {\hat{a}_{K\comma \setnum{4}} } \cr {\hat{a}_{K\comma \setnum{5}} } \cr} } \right] \plus \left[ {\matrix{ {\hatss{d}_{K\comma \setnum{1}} } \cr {\hatss{d}_{K\comma \setnum{2}} } \cr {\hatss{d}_{K\comma \setnum{3}} } \cr {\hatss{d}_{K\comma \setnum{4}} } \cr {\hatss{d}_{K\comma \setnum{5}} } \cr} } \right] \equals \left[ {\matrix{ {0{\cdot}036} \cr { \minus 0{\cdot}210} \cr { \minus 0{\cdot}569} \cr {0{\cdot}206} \cr {0{\cdot}382} \cr} } \right].$

Suppose now that one wishes to predict future performance of these five individuals, and assume that the record of 1 will be made under the same conditions as those for the record of 2, with all conditions remaining the same. The model for future records f is then

$\openup3\eqalign{\left[ {\matrix{ {y_{\setnum{1}}^{\hskip1f} } \cr {y_{\setnum{2}}^{\hskip1f} } \cr {y_{\setnum{3}}^{\hskip1f} } \cr {y_{\setnum{4}}^{\hskip1f} } \cr {y_{\setnum{5}}^{\hskip1f} } \cr} } \right] \equals\tab \left[ {\matrix{ 1 \tab 2 \cr 1 \tab 2 \cr 1 \tab 3 \cr 1 \tab 1 \cr 1 \tab 5 \cr} } \right]\left[ {\matrix{ {\beta _{\setnum{0}} } \cr {\beta _{\setnum{1}} } \cr} } \right] \plus \left[ {\matrix{ {a_{\setnum{1}} } \cr {a_{\setnum{2}} } \cr {a_{\setnum{3}} } \cr {a_{\setnum{4}} } \cr {a_{\setnum{5}} } \cr} } \right] \plus \left[ {\matrix{ {d_{\setnum{1}} } \cr {d_{\setnum{2}} } \cr {d_{\setnum{3}} } \cr {d_{\setnum{4}} } \cr {d_{\setnum{5}} } \cr} } \right] \plus \left[ {\matrix{ {e_{\setnum{1}}^{\hskip1f} } \cr {e_{\setnum{2}}^{\hskip1f} } \cr {e_{\setnum{3}}^{\hskip1f} } \cr {e_{\setnum{4}}^{\hskip1f} } \cr {e_{\setnum{5}}^{\hskip1f} } \cr} } \right] \cr\equals\tab {\bf M}_{P} { \bimtheta }_{P} \plus {\bf e}^{\hskip1f} \comma$

for the parametric model, whereas that for the RKHS treatment is

$\openup3\eqalign{\left[ {\matrix{ {y_{\setnum{1}}^{\hskip1f} } \cr {y_{\setnum{2}}^{\hskip1f} } \cr {y_{\setnum{3}}^{\hskip1f} } \cr {y_{\setnum{4}}^{\hskip1f} } \cr {y_{\setnum{5}}^{\hskip1f} } \cr} } \right] \!\equals\!\tab \left[ {\matrix{ 1 \tab 2 \cr 1 \tab 2 \cr 1 \tab 3 \cr 1 \tab 1 \cr 1 \tab 5 \cr} } \right]\left[ {\matrix{ {\beta _{\setnum{0}} } \cr {\beta _{\setnum{1}} } \cr} } \right] \!\plus\! \left[ {\matrix{ 0 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \cr 2 \tab {{\textstyle{1 \over 2}}} \tab {{\textstyle{1 \over 2}}} \tab 0 \cr {{\textstyle{1 \over 2}}} \tab 2 \tab {{\textstyle{3 \over 4}}} \tab {{\textstyle{1 \over 4}}} \cr {{\textstyle{1 \over 2}}} \tab {{\textstyle{3 \over 4}}} \tab 2 \tab {{\textstyle{1 \over 4}}} \cr 0 \tab {{\textstyle{1 \over 4}}} \tab {{\textstyle{1 \over 4}}} \tab 2 \cr} } \right]\! \plus\! \left[ {\matrix{ {\alpha _{\setnum{2}} } \cr {\alpha _{\setnum{3}} } \cr {\alpha _{\setnum{4}} } \cr {\alpha _{\setnum{5}} } \cr} } \right] \!\plus\! \left[ {\matrix{ {e_{\setnum{1}}^{\hskip1f} } \cr {e_{\setnum{2}}^{\hskip1f} } \cr {e_{\setnum{3}}^{\hskip1f} } \cr {e_{\setnum{4}}^{\hskip1f} } \cr {e_{\setnum{5}}^{\hskip1f} } \cr} } \right] \cr\!\equals\!\tab {\bf M}_{K} {\bimtheta} _{K} \plus {\bf e}^{\hskip1f}.$

Above M_. and θ_. denote incidence matrices and regression coefficients, respectively; P and K indicate parametric and RKHS treatments, respectively. Using a standard Bayesian argument (Sorensen & Gianola, Reference Sorensen and Gianola2002), the mean vector and covariance matrix of the predictive distributions are

(20)

$\openup3\eqalign{\tab \left[ {\matrix{ {y_{\setnum{1}}^{\hskip1f} } \cr {y_{\setnum{2}}^{\hskip1f} } \cr {y_{\setnum{3}}^{\hskip1f} } \cr {y_{\setnum{4}}^{\hskip1f} } \cr {y_{\setnum{5}}^{\hskip1f} } \cr} } \right]\vert \left[ {\matrix{ {y_{\setnum{2}} } \cr {y_{\setnum{3}} } \cr {y_{\setnum{4}} } \cr {y_{\setnum{5}} } \cr} } \right]\comma\ {\rm dispersion}\,{\rm \lpar smoothing\rpar }\,{\rm parameters} \cr \tab\quad \sim \lpar {\bf M}_{.}\hats{\bimtheta }_{.}\comma \lpar {\rm M}_{.}{\rm C}_{.}^{ \minus \setnum{1}}{\rm M}_{.}{\hskip-2\prime} \plus {\bf I}_{f} \rpar \sigma _{e}^{\setnum{2}} \rpar \comma \cr} \hskip-10pt$

where C is the corresponding coefficient matrix for each of the procedures; the 5×5 matrix I_fσ_e² conveys uncertainty stemming from the fact that the future records (f) have not been realized yet. For the two procedures, means and standard deviations of the predictive distributions are

$P \equals \left[ {\matrix{ {5{\cdot}674 \pm 6{\cdot}020} \cr {5{\cdot}364 \pm 5{\cdot}460} \cr {5{\cdot}162 \pm 5{\cdot}353} \cr {5{\cdot}646 \pm 5{\cdot}834} \cr {6{\cdot}828 \pm 6{\cdot}115} \cr} } \right]\semi \quad K \equals \left[ {\matrix{ {5{\cdot}754 \pm 5{\cdot}576} \cr {5{\cdot}286 \pm 5{\cdot}659} \cr {4{\cdot}735 \pm 5{\cdot}561} \cr {5{\cdot}919 \pm 5{\cdot}940} \cr {7{\cdot}061 \pm 6{\cdot}157} \cr} } \right].$

6. Illustration: non-linear system

(i) Two-locus model

A hypothetical system with two biallelic loci was simulated. It was assumed that phenotypes were generated according to the rule

(21)

$E\lpar y\vert \alpha _{i} \comma \alpha _{j} \comma \beta _{i} \comma \beta _{j} \rpar \equals \alpha _{i} \plus \alpha _{j} \plus \beta _{i} \beta _{j} \plus \alpha _{i} \alpha _{j} \sqrt {\beta _{i} \beta _{j}} \comma$

where α_i (β_i) and α_j (β_j) are effects of alleles i and j at the α (β) locus. The system is nonlinear on allelic effects, as indicated by the first derivatives of the conditional expectation function with respect to the αs or βs. For instance,

${{\partial E\lpar.\rpar } \over {\partial \alpha _{j} }} \equals 1 \plus \alpha _{i} \sqrt {\beta _{i} \beta _{j} } \semi \quad \quad {{\partial E\lpar.\rpar } \over {\partial \beta _{j} }} \equals \beta _{i} \plus {1 \over 2}\alpha _{i} \alpha _{j} \sqrt {{{\beta _{i} } \over {\beta _{j} }}}.$

The α-effects were two random draws from an exponential distribution with mean value equal to 2; the first draw was assigned to allele A and the second to allele a. The βs were drawn from a Weibull (2,1) distribution, having median, mean and mode equal to 0·347, 0·887 and 0·25; the first (second) deviate was the effect of allele B (b). The non-linearity of the system is illustrated in Fig. 1, where the derivative of the model for the expected phenotype with respect to the effect of Weibull allele b is plotted, with all other alleles evaluated at the values drawn. Variation in values of b produces drastic modifications near the origin, but phenotypes are essentially insensitive for b>1·5, even though these values are plausible in the Weibull process hypothesized.

Fig. 1. Rate of change of the expected phenotypic value $\left( {{\partial E\lpar.\rpar} \over {\partial \beta_{j}}}\equals \beta_{i} \plus {1 \over 2}\alpha_i \alpha_j \sqrt {{\beta_i} \over {\beta_j }} \right)$ with respect to the Weibull variable β_j.

Residuals were drawn from the normal distribution N (0, 20), and added to (21) to form phenotypes. The resulting phenotypic distribution is unknown, because y is a non-linear function of exponential and Weibull variates, plus an additive normally distributed residual. There were 5 individuals with records for each of the AABB, AABb, AAbb genotypes; 20 for each of AaBB, AaBb and Aabb and 5 for each of aaBB, aaBb and aabb. Thus, there were 90 individuals with phenotypic records, in total.

Since there are nine distinct mean genotypic values, variation among their average values can be explained completely with a linear model on 9 df; in the standard treatment, these df correspond to an overall mean, additive effects (2 df), dominance (2 df) and epistasis (4 df). Due to non-linearity, it is not straightforward to assess the proportion of variance ‘due to genetic effects’ using a random effects model based on the Weibull and exponential distributions, although an approximation can be arrived at. A linear approximation of the phenotype of an individual yields

$\eqalign{y \approx \tab 2\bar{\alpha } \plus \bars{\beta }^{\setnum{2}} \plus \bar{\alpha }^{\setnum{2}} \bars{\beta } \plus \lpar 2\bar{\alpha }\bars{\beta } \plus 2\rpar \lpar \alpha \minus \bar{\alpha }\rpar \cr \tab \plus\lpar \bar{\alpha }^{\setnum{2}} \plus 2\bars{\beta }\rpar \lpar \beta \minus \bars{\beta }\rpar \plus e \cr }\comma$

where ᾱ and are the means of the exponential and Weibull processes, respectively, and e~N(0, σ²). An approximation to heritability is

(22)

$h^{\setnum{2}} \equals {{\lpar 2\bar{\alpha }\bars{\beta } \plus 2\rpar ^{\setnum{2}} {\rm Var}\lpar \alpha \rpar \plus \lpar \bar{\alpha }^{\setnum{2}} \plus 2\bars{\beta }\rpar ^{\setnum{2}} {\rm Var}\lpar \beta \rpar } \over {\lpar 2\bar{\alpha }\bars{\beta } \plus 2\rpar ^{\setnum{2}} {\rm Var}\lpar \alpha \rpar \plus \lpar \bar{\alpha }^{\setnum{2}} \plus 2\bars{\beta }\rpar ^{\setnum{2}} {\rm Var} \lpar \beta \rpar \plus \sigma ^{\setnum{2}} }}.$

For the situation simulated, ྱ = 2, $\bars{\beta } \equals 0{\cdot}887$ , Var(α)=4, Var(β)=0·785, so that for σ²=1000, 100, 20, 10 one obtains h ²≈0·13, 0·60, 0·88 and 0·94, respectively. The simulation produced a high penetrance trait, that is, one with heritability near 0·88, which provides a challenge to any non-parametric treatment.

(ii) RKHS modelling

Among the many possible candidate kernels available, an arbitrarily chosen Gaussian kernel was adopted for the RKHS regression implementations, using as covariate a 2×1 vector consisting of the number of alleles at each of the two loci, e.g. x _AA=2, x _Aa=1 and x _aa=0. For example, the kernel entry for genotypes AABB and AAbb is

$\eqalign{k\lpar {\bf x}_{AABB} \comma {\bf x}_{AAbb} \comma h\rpar \equals \tab \exp \left[ { \minus {{\lpar 2 \minus 2\rpar ^{\setnum{2}} \plus \lpar 2 \minus 0\rpar ^{\setnum{2}} } \over h}} \right] \cr \equals \tab \exp \left[ { \minus {4 \over h}} \right]\comma$

and the 9×9 kernel matrix for all possible genotypes (labels for genotypes are included, to facilitate understanding of entries) is

${\bf K}_{h} \equals \left[\openup2 {\matrix{ {} \tab {AABB} \tab {AABb} \tab {AAbb} \tab {AaBB} \tab {AaBb} \tab {Aabb} \tab {aaBB} \tab {aaBb} \tab {aabb} \cr {AABB} \tab 1 \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{8}\sol h} } \cr {AABb} \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \cr {AAbb} \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{8}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \cr {AaBB} \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \cr {AaBb} \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \cr {Aabb} \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \cr {aaBB} \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{8}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab 1 \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \cr {aaBb} \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \tab {e^{ \minus \setnum{1}\sol h} } \cr {aabb} \tab {{\rm e}^{ \minus \setnum{8}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{5}\sol h} } \tab {{\rm e}^{ \minus \setnum{2}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab {{\rm e}^{ \minus \setnum{4}\sol h} } \tab {{\rm e}^{ \minus \setnum{1}\sol h} } \tab 1 \cr} } \right].$

There are only five distinct entries, a result of the measure of ‘allelic disimilarity’ adopted, e.g. e ^−8/h stems from disimilarity in 2 alleles at each of the A and B loci (AABB versus aabb). Figures 2 and 3 depict values of the kernel as a function of the bandwidth parameter h at different levels of S, which enters into exp(−S/h). Values of h larger than 10 (Fig. 2) produce strong ‘prior correlations’ between genotypes; also, the kernel matrix becomes more poorly conditioned as h increases. After evaluating the kernel matrix at h=6, 4, 2 and 1·75, it was decided to adopt h=1·75 as the bandwidth parameter, producing 6 unique entries in the K matrix: 1·0 (diagonal elements, the two individuals have identical genotypes); 0·565 (3 alleles in common in a pair of individuals); 0·319 (2 alleles in common, 1 per locus) or 0·102 (2 alleles in common at only one locus); 0·06 (1 allele in common) and 0·01 (no alleles shared).

Fig. 2. Kernel value $k\lpar.\comma.\semi h\rpar \equals \exp \left(\! { \minus {S \over h}} \right)$ against the bandwidth parameter h. Curves, from top to bottom, correspond to S=1, 2, 4, 5, 8.

Fig. 3. Kernel value $k\lpar.\comma.\semi h\rpar \equals \exp \left(\! { \minus {S \over h}} \right)$ against the bandwidth parameter h. Curves, from top to bottom, correspond to S=1, 2, 4, 5, 8.

(iii) Fitting the RKHS model to the means

A ‘means’ RKHS regression was fitted, including an intercept (β). The model for the 9×1 vector of averages ${\bf \barg{y}} \equals \lcub {\barg{y}}_{i} \rcub$ was

$\left[ {\matrix{ {\bar{y}}_{AABB} \cr {\bar{y}}_{AABb} \cr {\bar{y}}_{AAbb} \cr {\bar{y}}_{AaBB} \cr {\bar{y}}_{AaBb} \cr {\bar{y}}_{Aabb} \cr {\bar{y}}_{aaBB} \cr {\bar{y}}_{aaBb} \cr {\bar{y}}_{aabb} \cr} } \right] \equals \left[ {\matrix{ 1 \cr 1 \cr 1 \cr 1 \cr 1 \cr 1 \cr 1 \cr 1 \cr 1 \cr} } \right]\beta \plus {\bf K}_{\setnum{1{\cdot}75}} \left[ {\matrix{ {\alpha _{\setnum{1}} } \cr {\alpha _{\setnum{2}} } \cr {\alpha _{\setnum{3}} } \cr {\alpha _{\setnum{4}} } \cr {\alpha _{\setnum{5}} } \cr {\alpha _{\setnum{6}} } \cr {\alpha _{\setnum{7}} } \cr {\alpha _{\setnum{8}} } \cr {\alpha _{\setnum{9}} } \cr} } \right] \plus \left[ {\matrix{ {\bar{e}}_{AABB} \cr {\bar{e}}_{AABb} \cr {\bar{e}}_{AAbb} \cr {\bar{e}}_{AaBB} \cr {\bar{e}}_{AaBb} \cr {\bar{e}}_{Aabb} \cr {\bar{e}}_{aaBB} \cr {\bar{e}}_{aaBb} \cr {\bar{e}}_{aabb} \cr} } \right]\comma$

where the αs are the non-parametric regression coefficients; K_1·75 is the 9×9 kernel matrix with h=1·75 and ${\bar{e}}_{i}$ is the mean of the residuals pertaining to observations of individuals with genotype i. As before, the assumptions were $\bimalpha \vert \sigma _{\alpha }^{\setnum{2}} \sim N\lpar {\bf 0} \comma {\bf K}_{\setnum{1{\cdot}75}}^{ \minus \setnum{1}} \sigma_{\alpha }^{\setnum{2}} \rpar$ , and, for the 9×1 vector of average residuals, ${\bf \bar{e}} \vert \sigma _{e}^{\setnum{2}} \sim N \lpar {{\bf 0}\comma {\bf N}^{ \minus \setnum{1}} \sigma _{e}^{\setnum{2}} } \right)$ . Here, N=Diag {5, 5, 5, 20, 20, 20, 5, 5, 5} is a 9×9 diagonal matrix.

The solution to the RKHS regression problem was obtained as

$\openup2\left[ {\matrix{ {\hats{\beta} } \cr {\hat{\bimalpha} } \cr} } \right] \!\equals\! \left[ {\matrix{ {{\bf 1} \prime{\bf N} {\bf 1}} \tab {{\bf 1} \prime {\bf NK}_{{\setnum{1{\cdot}75}}} } \cr {{\bf K}_{{\setnum{1{\cdot}75}}} }{\bf N1}\hskip-24 \prime\hskip21 \tab {{\bf K}_{{\setnum{1{\cdot}75}}}\hskip-12pt\prime\hskip10 {\bf NK}_{\setnum{1{\cdot}75}} \plus \displaystyle{{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf K}_{{\setnum{1{\cdot}75}}} } \cr} } \right]}^{ \minus \setnum{1}}\! \left[ {\matrix{ {{\bf 1} \prime{\bf N}{\rm {\bf\bar{y}}}} \cr {{\bf K}_{{\setnum{1{\cdot}75}}} \hskip-11\prime\hskip10 {\bf N}{\rm {\bf\bar{y}}}} \cr} } \right].$

The model was fitted for each of the following values of the shrinkage ratio $\lambda \equals {{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}$ : 100 (strong shrinkage towards 0), 15, 1, ${1 \over {15}}$ and ${1 \over {100}}$ ; with λ=0 there is no shrinkage of solutions at all. For each of these values, the residual sum of squares

${\rm SSR}_{\lambda } \equals \left( { \bf \barg{y}} \minus {\bf 1} \hats{\beta} \minus {\bf K}_{\setnum{1{\cdot}75}} \hat{\bimalpha }} \right)\prime \left( { \bf \barg{y}} \minus {\bf 1} \hats{\beta } \minus {\bf K}_{\setnum{1{\cdot}75}} {\hat{\bimalpha }} \right)\comma$

and the weighted residual sum of squares

${\rm WSSR}_{\lambda } \equals \left( {\bf \barg{y}} \minus {\bf 1} \hats{\beta } \minus {\bf K}_{\setnum{1{\cdot}75}} \hat{\bimalpha }} \right) \prime\,{\bf N} \left( { \bf \barg{y}} \minus {\bf 1}\hats{\beta } \minus {\bf K}_{\setnum{1{\cdot}75}} {\hat{\bimalpha }} \right)\comma$

were computed, where 1 is a 9×1 vector of ones. Also, the effective number of parameters, or model df (Ruppert et al., Reference Ruppert, Wand and Carroll2003), was assessed as

$df_{\lambda } \equals {\rm Tr} \lpar {\bf W} \ast {\bf C}^{ \minus {\setnum{1}}} {\bf W} \ast \prime \rpar \comma$

where

${\bf W} \ast \equals {\bf N}^{1\sol2} \, \lsqb{\matrix {{\bf 1}\tab {\bf K}_{{\setnum1}\cdot{\setnum {75}}}} \rsqb \comma$

and ${\bf N}^{1\sol2}$ is a diagonal matrix containing the square roots of the entries of N. Note that, when ${{\sigma _{e}^{\setnum{2}} } \over {\sigma _{a}^{\setnum{2}} }} \equals 0$ , the estimating equations have an infinite number of solutions, as only 9 parameters are estimable. In this case, a solution is obtained by using a generalized inverse of the coefficient matrix, yielding

$\openup2\eqalign{\left[ {\matrix{ {\hats{\beta }}^{\setnum{0}} \cr {\hat{\bimalpha }}^{\setnum{0}} \cr} } \right] \equals \tab \left[ {\matrix{ 0 \tab {\bf 0} \cr {\bf 0} \tab {{\bf K}\prime_{\hskip-3{\rm \setnum{1{\cdot}75}}} {\bf NK}_{\setnum{1{\cdot}75}} } \cr} } \right]^{ \minus} \left[ {\matrix{ {{\bf 1}\prime{\bf N}{\bf \barg{y}}} \cr {{\bf K\prime}_{\hskip-3{\setnum{1{\cdot}75}}} {\bf N}{\bf \barg{y}}} \cr} } \right] \cr \equals \tab \left[ {\matrix{ 0 \tab {\bf 0} \cr {\bf 0} \tab {\lpar {\bf K\prime}_{\hskip-3{\setnum{1{\cdot}75}}} {\bf NK}_{\setnum{1}.\setnum{75}} \rpar ^{ \minus \setnum{1}} } \cr} } \right]\left[ {\matrix{ {{\bf 1}\prime{\bf N}{\bf \barg{y}}} \cr {{\bf K\prime}_{\hskip-3{ \setnum{1{\cdot}75}}} {\bf N}{ \bf \barg{y}}} \cr} } \right] \equals \left[ {\matrix{ 0 \cr {{\bf K}_{\setnum{1{\cdot}75}}^{ \minus \setnum{1}} {\bf \barg{y}}} \cr} } \right]. \cr}$

Here, the fitted values are ${\bf {\hats}{\hskip-5\barg{y}}} \equals {\bf 1} \times 0 \plus {\bf K}_{\setnum{1{\cdot}75}} {\bf K}_{\setnum{1{\cdot}75}}^{ \minus \setnum{1}} { \bf\barg{y}} \equals {\bf \barg{y}}$ so the model ‘copies’ the data, and the fit is perfect. Note that $\barg{y}_{i}$ is the least-squares estimate of μ_ij=γ_i+δ_j+(γδ)_ij where γ_i (i=AA, Aa, aa) and δ_j (j=BB, Bb, bb) are main effects of genotypes at loci A and B, respectively; this model accounts for 8 df ‘due to’ additive and dominance effects at each of the two loci, and additive×additive, additive×dominance, dominance×additive and dominance×dominance interactions.

For the sake of comparison, the following fixed effects model was fitted as well

$y \equals \mu \plus \gamma _{i} \plus \delta _{j} \plus \bar{e}_{i} \comma$

where, as before, γ_i (i=AA, Aa, aa) are main effects of genotypes at the A-locus, and δ_j (j=BB, Bb, bb) are the counterparts at the locus B. The linear model was parameterized as

(23)

$\eqalign{\left[ {\matrix{ {\bar{y}}_{AABB} \cr {\bar{y}}_{AABb} \cr {\bar{y}}_{AAbb} \cr {\bar{y}}_{AaBB} \cr {\bar{y}}_{AaBb} \cr {\bar{y}}_{Aabb} \cr {\bar{y}}_{aaBB} \cr {\bar{y}}_{aaBb} \cr {\bar{y}}_{aabb} \cr} } \right] \equals\tab \left[ {\matrix{ 1 \tab 1 \tab 0 \tab 0 \tab 1 \tab 0 \tab 0 \cr 1 \tab 1 \tab 0 \tab 0 \tab 0 \tab 1 \tab 0 \cr 1 \tab 1 \tab 0 \tab 0 \tab 0 \tab 0 \tab 1 \cr 1 \tab 0 \tab 1 \tab 0 \tab 1 \tab 0 \tab 0 \cr 1 \tab 0 \tab 1 \tab 0 \tab 0 \tab 1 \tab 0 \cr 1 \tab 0 \tab 1 \tab 0 \tab 0 \tab 0 \tab 1 \cr 1 \tab 0 \tab 0 \tab 1 \tab 1 \tab 0 \tab 0 \cr 1 \tab 0 \tab 0 \tab 1 \tab 0 \tab 1 \tab 0 \cr 1 \tab 0 \tab 0 \tab 1 \tab 0 \tab 0 \tab 1 \cr} } \right] \left[ {\matrix{ \mu \cr {\gamma _{\setnum{1}} } \cr {\gamma _{\setnum{2}} } \cr {\gamma _{\setnum{3}} } \cr {\delta _{\setnum{1}} } \cr {\delta _{\setnum{2}} } \cr {\delta _{\setnum{3}} } \cr} } \right]\cr\tab \plus \left[ {\matrix{ {\bar{e}}_{AABB} \cr {\bar{e}}_{AABb} \cr {\bar{e}}_{AAbb} \cr {\bar{e}}_{AaBB} \cr {\bar{e}}_{AaBb} \cr {\bar{e}}_{Aabb} \cr {\bar{e}}_{aaBB} \cr {\bar{e}}_{aaBb} \cr {\bar{e}}_{aabb} \cr} } \right].$

This model has 5 free parameters, interpretable as an ‘overall mean’ plus additive and dominance effects at each of the 2 loci. Estimates of estimable functions of fixed effects were obtained from a weighted least-squares approach with solution vector ${\bf b}^{\rm \setnum{0}} \equals \lpar {\bf X}\prime {\bf NX}\rpar ^\minus\times<$><$>{\bf X}\prime {\bf N} {\bf \barg{y}}\comma }$ where (X′NX)⁻ is a generalized inverse of X′NX and X is the 9×7 incidence matrix given above. Weighted residual sums of squares were computed as $\lpar {\bf \barg{y}} \minus {\bf Xb}^{\setnum{0}} \rpar \prime {\bf N} \lpar {\bf \barg{y}} \minus {\bf Xb}^{\rm \setnum{0}} \rpar$ . Mean values of A-locus and B-locus genotypes were estimated as

$\left[ {\matrix{ {\hat{\mu }_{AA} } \cr {\hat{\mu }_{Aa} } \cr {\hat{\mu }_{aa} } \cr {\hat{\mu }_{BB} } \cr {\hat{\mu }_{Bb} } \cr {\hat{\mu }_{bb} } \cr} } \right] \equals \left[\openup3 {\matrix{ 1 \tab 1 \tab 0 \tab 0 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \cr 1 \tab 0 \tab 1 \tab 0 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \cr 1 \tab 0 \tab 0 \tab 1 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \cr 1 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \tab 1 \tab 0 \tab 0 \cr 1 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \tab 0 \tab 1 \tab 0 \cr 1 \tab {{1 \over 3}} \tab {{1 \over 3}} \tab {{1 \over 3}} \tab 0 \tab 0 \tab 1 \cr} } \right]{\bf b}^{\setnum{0}} \equals \left[ {\matrix{ {8{\cdot}37} \cr {3{\cdot}97} \cr {1{\cdot}39} \cr {4{\cdot}87} \cr {5{\cdot}73} \cr {3{\cdot}12} \cr} } \right]\comma$

and dominance effects were inferred as

$\eqalign{\left[ {\matrix{ {\hatss{d}}_{A} {\hbox{-}} {\rm locus} \cr {\hats {d}_{B} {\hbox{-}} {\rm locus} \cr}} } \right] \tab \equals\left[ {\matrix{ 0 \tab { \minus {1 \over 2}} \tab 1 \tab { \minus {1 \over 2}} \tab 0 \tab 0 \tab 0 \cr 0 \tab 0 \tab 0 \tab 0 \tab { \minus {1 \over 2}} \tab 1 \tab { \minus {1 \over 2}} \cr} } \right]{\bf b}^{\setnum{0}}\cr\tab \equals \left[ {\matrix{ { \minus 0{\cdot}91} \cr {1{\cdot}73} \cr} } \right].$

This analysis would create the illusion of non-additivity and overdominance at the A and B loci, respectively, but without bringing light with respect to the non-linearities of (21). While this model does not have mechanistic relevance, it has predictive value, an issue which is illustrated below.

Table 1 gives estimates of the intercept β and of the RKHS regression coefficients α for each of the shrinkage ratio values employed, using h=1·75 as bandwidth parameter in all cases. The residual sum of squares (weighted and unweighted) and the effective number of parameters, or model df, are presented as well. As λ decreased from 10² to 10⁻², model fit improved, but the efective number of parameters increased from 1·46 to 8·95, near the maximum of 9. The implementations with λ=15⁻¹ and λ=10⁻² essentially produced a ‘saturated’ model, that is, one that fits to the means perfectly (as it is the case when λ=0). Large values of the variance ratio (15, 100) produced excessive shrinkage, as indicated by the small values of the RKHS regression coefficients α, and small values of the variance ratio tended to overfit, as noted. On the other hand, the linear 2-locus model on additive effects, with 5 parameters, had residual sum of squares, SSR=19·98, and weighted residual sum of squares, SSR=99·88. These values are more than 5 times those attained with the RKHS regression implementation with a variance ratio of 1 (SSR=3·55, WSSR=19·12), which are shown in Table 1.

Table 1. Estimates of the intercept (β) and of non-parametric regression coefficients (α_i) for each of the values of the variance ratio (λ=σ_e²/σ_α²) employed. RSS and WRSS are the residual and weighted residual sums of squares, respectively; df gives the model df, or effective number of parameters fitted

A more important issue, at least from the perspective taken in this paper, is ‘out of sample’ predictive ability. To examine this, 3 new (independent) samples of phenotypes were generated, assuming the residual distribution N (0, 20), as before, and with 5 individuals per genotype, i.e. there were 45 subjects in each sample. The predictive residual sums of squares (unweighted and weighted, using 5 as weight) were calculated, using the fitted values from the training sample employed to compute statistics in Table 1, and the ‘new sample’ phenotypes. The 2-locus additive model was also involved in the comparison. Results are shown in Table 2, where entries are the average, minimum and maximum values of the predictive sum of squares over the 3 new samples. As expected, predictive residual sum of squares were much larger than those observed in the training sample, notably for implementations in which overfitting to the training data was obvious $\lpar \lambda \equals {{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }} \equals 15^{ \minus \setnum{1}} \comma 10^{ \minus \setnum{2}} \comma 0\semi$ see Table 1). The specifications producing strong shrinkage towards 0 in the training sample had the worse predictive performance, whereas that with ${{\sigma _{e}^{\setnum{2}} } \over {\sigma _{e}^{\setnum{2}} }} \equals 1$ had the best performance, on average, albeit close to the 2-locus additive model. It is not surprising that small variance ratios led to reasonable predictive performance, be-cause the simulation mimicked high penetrance, i.e. phenotypes are very informative about genotypes, this being so because the approximate heritability (22) for a residual variance of 20 is about 0·88, as already noted.

Table 2. Predictive (weighted and unweighted by the number of individuals per geno-type) residual sums of squares for each of the variance ratios (λ=σ_e²/σ_α²) employed in the non-parametric regression implementation, and for the two-locus model with main effects of genotypes at each of the loci. Entries are average (boldface) from three predictive samples, with minimum and maximum values over samples in parentheses

For this example, the 2-locus additive model is untenable, at least mechanistically, yet it had a reasonable predictive performance. Apart from genetic considerations discussed later, this is because any of the models considered here can be associated with a linear smoother, with predictions having the form

${\bf \hat{y}} \equals {\bf L}{\bf y} \equals \lcub {\bf l \prime }_{\hskip-2pt i}{\bf y} \rcub \comma$

for some matrix L and where l_i′ is its ith row. For the 2-locus model,

${\bf L} \equals {\bf X} \lpar {\bf X}\prime{\bf NX}\rpar ^{ \minus } {\bf X}\prime {\bf N}\comma }$

whereas for any of the RKHS specifications

$\eqalign{\tab{\bf L} \equals \left[ {\matrix{ {\bf 1} \tab {{\bf K}_{\setnum{ 1{\cdot}75}} } \cr} } \right]\left[ {\matrix{ {{\bf 1}\prime{\bf N}{\bf 1}} \tab {{\bf 1}\prime{\bf NK}_{{\setnum{1{\cdot}75}}} } \cr {{\bf K\prime}_{\hskip-3pt{\setnum{1{\cdot}75}}} {\bf N}{\bf 1}} \tab {{\bf K\prime}_{\hskip-3pt{\setnum{1{\cdot}75}}} {\bf NK}_{\setnum{1{\cdot}75}} \plus {{\sigma _{e}^{\setnum{2}} } \over {\sigma _{\alpha }^{\setnum{2}} }}{\bf K}_{{\setnum{1{\cdot}75}}} } \cr} } \right]^{ \minus \setnum{1}}\cr\tab\quad\quad\vskip8\!\!\!\times\left[ {\matrix{ {\bf 1} \tab {{\bf K}_{\setnum{1{\cdot}75}} } \cr} } \right]{\prime }.$

Hence, all predictors can be viewed as consisting of different forms of averaging observations, with the RKHS-based averages having optimality properties, in some well defined sense (Kimeldorf & Wahba, Reference Kimeldorf and Wahba1971; Wahba, Reference Wahba1990; Gianola & van Kaam, Reference Gianola and van Kaam2008; de los Campos et al., Reference de los Campos, Gianola and Rosa2008). Since the data consisted of phenotypic averages, i.e. a non-parametric averaging method where a ‘bin’ is a given genotype, this represented a challenge for any of the smoothers, including the 2-locus additive model. However, some of the smoothers (e.g. RKHS with λ=1 and the 2-locus additive model) met the challenge satisfactorily.

(iv) Fitting the RKHS model to individual observations

The RKHS regression model (using the Gaussian kernel with h=1·75) was fitted again to the 90 data points in a newly simulated sample used to estimate (train) the intercept and the nine non-parametric coefficients. In addition, the following parametric specifications were fitted: additive model (3 location parameters: intercept and additive effects of the A and B loci), additive+dominance model (2 extra parameters corresponding to dominance effects at the two loci), and additive+dominance+epistasis (4 additional df pertaining to additive×additive, additive×dominance, dominance×additive and dominance×dominance interactions). The corresponding regression coefficients for the parametric models were ‘shrunken’ using a common variance ratio, corresponding to each of the 15 λ values employed in the RKHS fitting. Subsequently, 100 predictive samples of size 45 each were simulated, and the realized values were compared against the predictions obtained from the sample used to train either the RKHS regression or the three parametric models.

Figure 4 displays the average squared residual (over the 90 data points in the training sample) for each of the four models fitted. As expected, the RHKS and the additive+dominance+epistatic models fitted the data best at λ=0 (no shrinkage), because of having a larger effective number of parameters (9) than the additive (3) and additive+dominance (5) models. When effects were gradually shrunken (λ increased from 0 to 20), the parametric models maintained their relative standings, whereas RKHS voyaged through all three models, eventually giving a very poor fit, due to oversmoothing. The trajectory of the effective df is shown in Fig. 5, where the ability of RKHS to explore models of different degrees of complexity becomes clear. The out-of-sample predictive performance of the four models is shown in Fig. 6. The simplest model, that is, one with additive effects at each of the two loci fitted, had the best predictive performance, even better than the two additional parametric specifications although the simulated gene action was not additive at all! RKHS regression was competitive, but its predictive ability deteriorated markedly when λ was greater than 5 in the training sample.

Fig. 4. Average (over 90 data points) squared residual for four models plotted to the training sample (RKHS=RKHS regression with Gaussian kernel and h=1·75) for each value of the smoothing parameter λ.

Fig. 5. Effective df for four models plotted to the training sample (RKHS=RKHS regression with Gaussian kernel and h=1·75) at each value of the smoothing parameter λ.

Fig. 6. Average (over 100 samples with 45 realized observations in each) squared prediction error for four models plotted to the predictive sample (RKHS=RKHS regression with Gaussian kernel and h=1·75) for each value of the smoothing parameter λ.

How does one explain the paradox that a simple additive model had better predictive performance when gene action was non-linear, as simulated here? In order to address this question, consider the ‘true’ mean value of the 9 genotypes simulated:

The ‘corrected’ sum of squares among these means is 125·23. A fixed effects ANOVA of these ‘true’ values (assuming genotypes were equally frequent) gives the following partition of sequential sum of squares, apart from rounding errors: (i) additive effect of locus A: 82·8%; (ii) additive effect of locus B after accounting for A: 7·06%; (iii) dominance effects of loci A and B: 4·2%, and (iv) epistasis: 6·2%. Thus, even though the genetic system was non-linear, most of the variation among genotypic means can be accounted for with a linear model on additive effects. The additive model had the worst fit to the data (even worse than the models that assume dominance and epistasis) and, yet, it had the best predictive ability, followed by RKHS for (roughly) 0·5<λ<3.

The simulation was also carried out at larger values of the residual variance, σ_e²=100 and 500. Again, the purely additive model had the best predictive performance, but the difference between models essentially disappeared for λ>50. In particular, the average squared prediction error of RKHS was larger than that for the additive model by 5, 3 and 1% for σ_e²=20, 100 and 500, respectively, when evaluated at the ‘optimal’ λ in each case.

It should be noted that the RKHS implementation used here was completely arbitrary. For example, the kernel chosen was not the result of any formal model comparison, so predictive performance could be enhanced by a more judicious choice of kernel. As noted earlier, the choice of a good kernel is critical in this form of non-parametric modelling.

7. Conclusion

Inference about genotypes and future phenotypes for a complex quantitative trait was discussed in this paper. In particular, it was argued that the Kempthorne–Cockerham theory for partioning variance into additive, dominance and epistatic components has doubtful usefulness, because practically all assumptions required are violated in artificial and natural populations. This theory is probably illusory when genetic systems are complex and non-linear, in agreement with views in Feldman & Lewontin (Reference Feldman and Lewontin1975) and Karlin et al. (Reference Karlin, Cameron and Chakraborty1983). Further, an ANOVA-type decomposition is inadequate for a non-linear system (because the ANOVA model is linear in the parameters), and unfeasible in a highly dimensional and interactive genetic system involving hundreds or thousands of genes. As a minimum, the ANOVA treatment would encounter a huge number of empty cells, extreme lack of orthogonality, and high-order interactions would be extremely difficult to interpret, in the usual sense. Last but not the least, the ANOVA model would require more df than the number of data points available for analysis.

For these reasons, a predictive approach was advanced in this paper, focused on the use of non-parametric methods, especifically RKHS regression. Use of this methodogy in conjunction with standard theory of quantitative genetics leads to non-parametric estimates of additive, dominance and epistatic effects. These ideas were illustrated using a stylyzed example in Henderson (Reference Henderson1985), and it was shown how additive, dominance and total genetic values can be predicted using a single smoothing parameter (in addition to the residual variance) coupled with kernels based on substantive theory, a point that is also made in de los Campos et al. (Reference de los Campos, Gianola and Rosa2008). A non-linear 2-locus system was simulated as well, to illustrate the RKHS approach, which was found to have a better out-of-sample predictive performance of means than the standard 2-locus fixed effects model (with and without epistasis).

On the other hand, a 2-locus model with estimates of additive effects shrunken to different degrees had the best performance when predicting future individual observations. This was explained by the observation that most of the variation among genotypic means could be accounted for by ‘additive effects’. This is consistent with theoretical and empirical results presented by Hill et al. (Reference Hill, Goddard and Visscher2008), who gave evidence that, even in the presence of non-additive genetic action, most variance is of an additive type. Even though molecular geneticists view the additive model as irritatingly reductive, our results give reassurance to a common practice in animal breeding, i.e. predict genetic values using additive theory only. It is unknown, however, to what extent these results (from a predictive perspective) carry to more complex systems, difficult to be described suitably with naive linear models. In such situations, RKHS regression could be valuable, because of its ability to navigate through models of different degrees of complexity and, as shown here, it can be very competitive when the smoothing parameters are tuned properly.

In conclusion, it is felt that the non-parametric methods discussed here coupled with machine learning procedures, such as in Long et al. (Reference Long, Gianola, Rosa, Weigel and Avendaño2007) and Long et al. (Reference Long, Gianola, Rosa, Weigel and Avendaño2008), could play an important role in quantitative genetics in the post-genomic era.

Useful comments made by two anonymous reviewers and by Professor William G. Hill and Dr Andres Legarra are acknowledged. Professor Daniel Sorensen is thanked for suggesting that a 2-locus additive model with shrinkage of effects could have good predictive value, which was confirmed. Part of this work was carried out while the senior author was a Visiting Professor at Georg-August-Universität, Göttingen (Alexander von Humboldt Foundation Senior Researcher Award), and Visiting Scientist at the Station d'Amélioration Génétique des Animaux, Centre de Recherche de Toulouse (Chaire D'Excellence Pierre de Fermat, Agence Innovation, Midi-Pyreneés). Support by the Wisconsin Agriculture Experiment Station and by NSF DMS-NSF grant DMS-044371 to the first author is acknowledged.

References

Bost, B., Dillmann, C. & De Vienne, D. (1999). Fluxes and metabolic pools as model traits for quantitative genetics. Genetics 153, 2001–2012.CrossRef Google Scholar PubMed

Bulmer, M. G. (1980). The Mathematical Theory of Quantitative Genetics. Oxford: Clarendon Press.Google Scholar

Chang, H. L. A. (1988). Studies on estimation of genetic variances under nonadditive gene action. Ph.D. Thesis, University of Illinois at Urbana-Champaign.Google Scholar

Cheverud, J. M. & Routman, E. J. (1995). Epistasis and its contribution to genetic variance components. Genetics 139, 1455–1461.CrossRef Google Scholar PubMed

Cockerham, C. C. (1954). An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39, 859–882.CrossRef Google Scholar PubMed

Cockerham, C. C. (1956). Effect of linkage on the covariances between relatives. Genetics 41, 138–141.CrossRef Google Scholar PubMed

Craven, P. & Wahba, G. (1979). Smoothing noisy data with spline functions. Numerische Mathematik 31, 377–403.CrossRef Google Scholar

de los Campos, G., Gianola, D. & Rosa, G. J. M. (2008). Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. Journal of Animal Science (accepted).Google Scholar

Dempster, E. R. & Lerner, I. M. (1950). Heritability of threshold characters. Genetics 35, 212–236.CrossRef Google Scholar PubMed

Falconer, D. S. & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics, 4th edn. New York: Longman.Google Scholar

Feldman, M. W. & Lewontin, R. C. (1975). The heritability hang-up. Science 190, 1163–1168.CrossRef Google Scholar PubMed

Gallais, A. (1974). Covariances between arbitrary relatives with linkage and epistasis in the case of linkage disequilibrium. Biometrics 30, 429–446.CrossRef Google Scholar PubMed

Gianola, D. (1982). Theory and analysis of threshold characters. Journal of Animal Science 54, 1079–1096.CrossRef Google Scholar

Gianola, D., Fernando, R. L. & Stella, A. (2006). Genomic assisted prediction of genetic value with semi-parametric procedures. Genetics 173, 1761–1776.CrossRef Google Scholar

Gianola, D. & van Kaam, J. B. C. H. M. (2008). Reproducing kernel Hilbert spaces methods for genomic assisted prediction of quantitative traits. Genetics 178, 2289–2303.CrossRef Google Scholar PubMed

González-Recio, O., Gianola, D., Long, N., Weigel, K. A., Rosa, G. J. M. & Avendaño, S. (2008 a). Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178, 2305–2313.CrossRef Google Scholar PubMed

Gonzalez-Recio, O., Gianola, D., Rosa, G. J. M., Weigel, K. A. & Avendaño, S. (2008 b). Genome-assisted prediction of a quantitative trait in parents and progeny: application to food conversion rate in chickens. Genetics Selection Evolution, submitted.Google Scholar

Henderson, C. R. (1973). Sire evaluation and genetic trends. In Proceedings of the Animal Breeding Genetics Symposium in Honour of J. L. Lush, pp. 10–41. Champaign, IL: American Society of Animal Science and American Dairy Science Association.Google Scholar

Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. Guelph, ON: University of Guelph.Google Scholar

Henderson, C. R. (1985). Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. Journal of Animal Science 60, 111–117.CrossRef Google Scholar

Hill, W. G., Goddard, M. E. & Visscher, P. M. (2008). Data and theory point to mainly additive genetic variance for complex traits. PLOS Genetics 4, el000008.CrossRef Google Scholar PubMed

Karlin, S., Cameron, E. C. & Chakraborty, R. (1983). Path analysis in genetic epidemiology: a critique. American Journal of Human Genetics 35, 695–732.Google Scholar

Kempthorne, O. (1954). The correlation between relatives in a random mating population. Proceedings of the Royal Society of London, Series B 143, 103–113.Google Scholar

Kempthorne, O. (1978). Logical, epistemological and statistical aspects of nature-nurture data interpretation. Biometrics 34, 1–23.CrossRef Google Scholar PubMed

Kimeldorf, G. & Wahba, G. (1971). Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications 35, 82–95.CrossRef Google Scholar

Kojima, K. I. (1959). Role of epistasis and overdominance in stability of equilibria with selection. Proceedings of the National Academy of Sciences of the USA 45, 984–989.CrossRef Google Scholar PubMed

Lee, H. K. H. (2004). Bayesian Nonparametrics Via Neural Networks. Philadelphia, PA: ASA-SIAM.CrossRef Google Scholar

Long, N., Gianola, D., Rosa, G. J. M., Weigel, K. & Avendaño, S. (2007). Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. Journal of Animal Breeding and Genetics 124, 377–389.CrossRef Google Scholar PubMed

Long, N., Gianola, D., Rosa, G. J. M., Weigel, K. A. & Avendaño, S. (2008). Marker-assisted assessment of genotype by environment interaction: a case study of SNP–mortality association in broilers in two hygiene environments. Journal of Animal Science (in press).CrossRef Google Scholar

Lynch, M. & Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer.Google Scholar

Motsinger-Reif, A. A., Dudek, S. M., Hahn, L. W. & Ritchie, M. D. (2008). Comparison of approaches for machine learning optimization of neural networks for detecting gene–gene interactions in genetic epidemiology. Genetic Epidemiology 32, 325–340.CrossRef Google Scholar PubMed

Schnell, F. W. (1963). The covariance between relatives in the presence of linkage. In Statistical Genetics and Plant Breeding. (ed. Hanson, W. D. & Robinson, H. F.), pp. 463–483. Washington, DC: National Academy of Sciences – National Research Council.Google Scholar

Sorensen, D. & Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. New York: Springer.CrossRef Google Scholar

Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.Google Scholar

Ruppert, D., Wand, M. P. & Carroll, R. J. (2003). Semiparametric Regression. Cambridge, UK: Cambridge University Press.CrossRef Google Scholar

Templeton, A. R. (2000). Epistasis and complex traits. In Epistasis and the Evolutionary Process (ed. Wolf, J. B., Brodie, E. D.III and Wade, M. J.), pp. 41–57. New York: Oxford University Press.Google Scholar

Vapnik, V. (1998). Statistical Learning Theory. New York: Wiley.Google Scholar

Wahba, G. (1990). Spline Models for Observational Data. Philadelphia, PA: Society for Industrial and Applied Mathematics.CrossRef Google Scholar

Wang, C. S., Rutledge, J. J. & Gianola, D. (1993). Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genetics Selection Evolution 25, 41–62.CrossRef Google Scholar

Wang, C. S., Rutledge, J. J. & Gianola, D. (1994). Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genetics Selection Evolution 26, 91–115.CrossRef Google Scholar

Wang, T. & Zeng, Z.-B. (2006). Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genetics 7, 9.CrossRef Google Scholar PubMed

Weir, B. S. & Cockerham, C. C. (1977). Two-locus theory in quantitative genetics. In Proceedings of the International Conference on Quantitative Genetics, pp. 247–269. Ames, IA: Iowa State University Press.Google Scholar

Fig. 1. Rate of change of the expected phenotypic value \left( {{\partial E\lpar.\rpar} \over {\partial \beta_{j}}}\equals \beta_{i} \plus {1 \over 2}\alpha_i \alpha_j \sqrt {{\beta_i} \over {\beta_j }} \right) with respect to the Weibull variable βj.

Fig. 2. Kernel value k\lpar.\comma.\semi h\rpar \equals \exp \left(\! { \minus {S \over h}} \right) against the bandwidth parameter h. Curves, from top to bottom, correspond to S=1, 2, 4, 5, 8.

Fig. 3. Kernel value k\lpar.\comma.\semi h\rpar \equals \exp \left(\! { \minus {S \over h}} \right) against the bandwidth parameter h. Curves, from top to bottom, correspond to S=1, 2, 4, 5, 8.

Table 1. Estimates of the intercept (β) and of non-parametric regression coefficients (αi) for each of the values of the variance ratio (λ=σe2/σα2) employed. RSS and WRSS are the residual and weighted residual sums of squares, respectively; df gives the model df, or effective number of parameters fitted

Table 2. Predictive (weighted and unweighted by the number of individuals per geno-type) residual sums of squares for each of the variance ratios (λ=σe2/σα2) employed in the non-parametric regression implementation, and for the two-locus model with main effects of genotypes at each of the loci. Entries are average (boldface) from three predictive samples, with minimum and maximum values over samples in parentheses

Fig. 5. Effective df for four models plotted to the training sample (RKHS=RKHS regression with Gaussian kernel and h=1·75) at each value of the smoothing parameter λ.

Article contents

Inferring genetic values for quantitative traits non-parametrically

Summary

1. Introduction

2. Extant theory

(i) Henderson's methods for predicting epistatic effects

(ii) Reformulation of Henderson's approach

(iii) Bayesian implementation of the reformulated model

3. Confronting complexity

4. Non-parametric breeding value

(i) Infinitesimal non-parametric breeding value

(ii) SNP-based non-parametric breeding value

(iii) Semi-parametric breeding value

5. Illustration: infinitesimal model

6. Illustration: non-linear system

(i) Two-locus model

(ii) RKHS modelling

(iii) Fitting the RKHS model to the means

(iv) Fitting the RKHS model to individual observations

7. Conclusion

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests