Effect of cow reference group on validation reliability of genomic evaluation

M. Koivula; I. Strandén; G. P. Aamand; E. A. Mäntysaari

doi:10.1017/S1751731115002864


Published online by Cambridge University Press: 06 January 2016


M. Koivula*: Natural Resources Institute Finland (Luke), Green Technology, FI-31600 Jokioinen, Finland
I. Strandén: Natural Resources Institute Finland (Luke), Green Technology, FI-31600 Jokioinen, Finland
G. P. Aamand: NAV Nordic Cattle Genetic Evaluation, Agro Food Park 15, 8200 Aarhus N, Denmark
E. A. Mäntysaari: Natural Resources Institute Finland (Luke), Green Technology, FI-31600 Jokioinen, Finland
*E-mail: [email protected]


Abstract

We studied the effect of including genomic data for cows in the reference population of single-step evaluations. Deregressed individual cow genetic evaluations (DRP) from milk production evaluations of Nordic Red Dairy cattle were used to estimate the single-step breeding values. Validation reliability and bias of the evaluations were calculated with four data sets that included different amounts of DRP record information from genotyped cows in the reference population. The gain in reliability was from 2% to 4% units for the production traits, depending on the DRP data used and the amount of genomic data. Moreover, inclusion of genotyped bull dams and their genotyped daughters seemed to create some bias in the single-step evaluation. Still, genotyping cows and including them in the reference population is advantageous and should be encouraged.

Keywords

genomic evaluation; single step; reference population

Type: Research Article
Information: animal, Volume 10, Issue 6, June 2016, pp. 1061-1066

DOI: https://doi.org/10.1017/S1751731115002864
Copyright: © The Animal Consortium 2016

Implications

Our results indicated that the validation reliability of genomic breeding values increases when evaluations include genotyped cows with records in the reference population. The results also indicated that when cow genotypes are included in the analyses, their phenotypes should also be included in order to increase genetic gain and validation reliability and to decrease bias.

Introduction

Accurate genomic evaluations require large reference populations with reliable performance information such as estimated breeding values (EBVs) (Goddard and Hayes, 2009). The first dairy cattle genomic evaluations relied on reference populations consisting only of progeny-tested bulls, and were based only on the averaged performances of the bulls' daughters. Small dairy cattle populations are often restricted by small reference populations of progeny-tested bulls. These populations, therefore, have low reliabilities of genomic-enhanced breeding values (GEBVs) (Thomasen et al., 2012). By including genotyped cows in the reference population, the size of the reference group can be easily increased (Bapst et al., 2013; Dassonneville et al., 2012). For example, in the United States, cow genotypes have been included in the genomic evaluations since their beginning (Wiggans et al., 2011).

In the DFS countries (Denmark, Finland and Sweden), validation reliabilities for the genomic evaluations of Red Dairy Cattle (RDC) and Jersey have not been as high as for the Holstein breed. One important reason could be the smaller effective population size of the Holstein population (Su et al., 2012a). Goddard (2009) has shown that effective population size is an important factor influencing the accuracy of genomic evaluation. However, another important factor is that smaller populations cannot provide as many accurately evaluated bulls to be included in the reference population. To overcome this problem, the DFS breeding and AI companies started a cow genotyping project in 2014, called the LD-project (Langdahl, 2014), in which a low-cost low-density chip was offered to breeders with the aim of voluntary genotyping of all young animals in their herds.

Most genomic evaluations are based on a multi-step approach that requires (1) calculation of traditional EBVs without genomic information, (2) extraction of pseudo-observations, typically either daughter yield deviations (DYD) or deregressed EBVs (DRP) and (3) application of a genomic model for prediction of direct genomic values (DGV) (VanRaden, 2008; VanRaden et al., 2009). The multi-step genomic evaluations can be further improved by blending the DGVs and information from traditional EBV (e.g. VanRaden, 2008), yielding GEBVs. The multi-step approach is a complex system and includes several approximations. Each approximation reduces accuracy and can increase the bias in GEBVs, for example, by increasing the standard deviation of GEBVs.

Single-step evaluation (ssGBLUP; Misztal et al., 2009; Aguilar et al., 2010; Christensen and Lund, 2010) is a unified approach to calculate GEBVs. The ssGBLUP combines the phenotypic records, pedigree information and genomic information in the calculation of GEBVs. Whereas the usual multi-step genomic evaluations mostly rely on highly reliable AI sires as the reference population, the single-step approach adds genomic data to the traditional EBV analysis that includes all the phenotyped animals and can, thus, be computationally demanding with large data sets and multi-trait analysis (Su et al., 2012b). However, including cow genotypes and phenotypes in the single-step evaluation is much easier than in the multi-step estimation, because in the multi-step approach the phenotypes need to be carefully constructed in order to avoid double-counting information from genotyped daughters to their sires. The single-step method by Aguilar et al. (2010) and Christensen and Lund (2010) does not explicitly divide the population into a training group (reference population) and a prediction group (validation population); instead, the genomic data are included along with the phenotypic data and pedigree relationship information. Consequently, estimation of gains from including daughter genotype data in the single-step evaluation is more complicated, particularly in multiple trait evaluations that include records from several years for a cow. When observations in the single-step evaluation are from cows, all records from several years or parities of a young genotyped cow cannot be included in the evaluation unless records produced in the same year(s) by daughters of the validation bulls are also included. Alternatively, ssGBLUP can be computed using DRPs instead of the original phenotypic records. This allows the information of genotyped females to be included in the evaluation even if some production data are from the same years as the omitted records of daughters of validation bulls.

Wiggans et al. (2011) and Dassonneville et al. (2012) found that the inclusion of cow genotypes can even result in a decrease in the reliability of bull genomic evaluations. The reason for this decrease was assumed to be bias due to pre-selection of cows, where cows have been selected for genotyping based on their genetic merit or expectation for a high genetic evaluation. Thus, potential bull dams have been the first cows to be genotyped. This has also been the case in the DFS countries for the older genotyped cows. In order to avoid the pre-selection bias, the particular aim in the Nordic LD-project was to genotype all younger cows from the participating herds.

The main objectives of this study were to estimate (1) the effects of including different sets of phenotypes and genotypes of females in the reference population in single-step evaluations, and (2) how much single-step evaluation based on individual cow DRPs can improve the validation reliability of GEBVs. We also wanted to estimate the amount of bias in the evaluations due to the inclusion of genotyped bull dams and their genotyped daughters.

Material and methods

Marker data

Genotype data included 15 148 RDC animals of which 5534 were bulls and 9614 cows. Bulls were genotyped using the Illumina BovineSNP50 and cows with BovineLD Bead Chips imputed to the 50K chip (Illumina, San Diego, CA, USA). After applying editing criteria, 46 914 markers on the 29 bovine autosomes were used in the analysis. The used genotype markers were the same as in the official genomic evaluation of NAV (Nordic Cattle Genetic Evaluation) in June 2014.

Phenotypic data

We obtained EBVs and corresponding individual cow reliabilities from NAV for the 3.2 million cows with records in the May 2014 RDC evaluations for milk, protein and fat. These were used to derive the phenotypes for the cows. First, the cow reliabilities were used to derive the effective record contributions (ERC) (Taskinen et al., 2014). Second, the ERCs and the EBVs were used to calculate DRPs for all cows with ERC>0. The heritabilities (h²) used in the ERC approximation were 0.48 for milk, 0.48 for protein and 0.49 for fat, and the same values were used throughout the study. Deregression (Strandén and Mäntysaari, 2010) used the full pedigree of 5.1 million animals in the NAV evaluation. The three traits were deregressed simultaneously, but assuming genetic and residual correlations to be 0.

Validation candidate bulls were chosen from genotyped bulls born 2006-10 and having ERC⩾3 (corresponds to roughly 20 daughters with records) in the full cow DRP data. This gave 746 candidate bulls for the validation. All evaluations used 4413 genotyped bulls in the reference. Two different options for the number of genotyped reference animals were considered: only genotyped bulls or both genotyped bulls and cows. The full cow DRP data were used to make four different reduced data sets which differed in the amount of DRP information for the genotyped reference cows. DRP records of daughters of validation bulls and the non-genotyped daughters of reference bulls born after 2009 were removed from the data sets. Table 1 describes the four reduced data sets named AllG, nonBdG, nonBdDG and Control.

Table 1 Numbers of animals and deregressed proofs (DRP) in the reduced data sets

Control=DRPs of all genotyped cows excluded; AllG=DRPs of 7143 genotyped cows included; nonBdG=DRPs of genotyped bull dams excluded; nonBdDG=DRPs of genotyped bull dams and their genotyped daughters excluded from the reference population.

All data sets had the same number of reference bulls (N_b), but the number of genotyped cows with DRP in the reduced data set (N_c) varied. Total number of DRPs (N_DRP) gives the total number of cows having DRP in the data set.

The Control data (1) included all DRPs of non-genotyped cows until 2008 but no DRPs from the genotyped cows. In the AllG data (2), DRP records of the genotyped cows were added to the Control data set. The nonBdG data (3) were made by removing the DRPs of 52 genotyped bull dams from the AllG data. The nonBdDG data (4) were made by further removing the DRPs of the 104 genotyped daughters of bull dams from the nonBdG data.

Statistical analyses

Breeding values were estimated using the ssGBLUP model. Different reduced data sets and the two alternative genotype sets were used to solve GEBVs and EBVs for all animals in the pedigree. For the validation reliability calculation, full animal model EBVs were estimated using the full cow DRP data, and then DYDs were calculated for the validation bulls. For the validation bulls, the EBVs from the reduced data are hereinafter called parent averages (PA).

The single-step method (e.g. Aguilar et al., 2010; Christensen and Lund, 2010) was based on the model:

$${\bf y} = {\bf 1}\mu + {\bf W}{\bf a} + {\bf e}$$

where y is a vector of cow DRPs, a the vector of random additive genetic effects, W an incidence matrix relating breeding values a to appropriate DRP records in y, and e a vector of random residuals. The co-variance matrices for the random effects were $${\rm var}({\bf a}) = {\bf H}\sigma_{a}^{2}$$ and $${\rm var}({\bf e}) = {\bf D}^{-1}\sigma_{e}^{2}$$, where the diagonal matrix D consists of the ERCs of the animals. The unified relationship matrix H in the single-step method defines the relationships between all animals using pedigree and genotype information. The inverse of H is needed in the mixed model equations and has a simple structure (Aguilar et al., 2010; Christensen and Lund, 2010),

$${\bf H}^{-1} = {\bf A}^{-1} + \begin{bmatrix} {\bf 0} & {\bf 0} \\ {\bf 0} & {\bf G}^{-1} - {\bf A}_{22}^{-1} \end{bmatrix}$$

where A₂₂ is the sub matrix of the pedigree-based numerator relationship matrix A for the genotyped animals and G the relationship matrix constructed using genomic information. The genotypes of 15 148 RDC animals, including animals without offspring or records, were used to form the raw G matrix with method 1 in VanRaden (2008). Aguilar et al. (2010) and Christensen and Lund (2010) noted that if not all the genetic variance is accounted for by the single nucleotide polymorphism markers, the residual polygenic effects can be included in the model by replacing the genomic matrix G by G_w = (1−w)G + wA₂₂, where the constant w represents the proportion of polygenic variance unaccounted for by markers. Before the matrices G and A₂₂ were combined into G_w, the raw G matrix was scaled by the scalar $$t = {\rm tr}({\bf A}_{22}) / {\rm tr}({\bf G})$$, where tr denotes the trace of a matrix. Thus, the average of the diagonals of G, as well as of G_w, is the same as the average of the diagonals of the A₂₂ matrix.
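The construction of G_w described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the function names `vanraden_g` and `scaled_blended_g` are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts, allele frequencies are assumed to be estimated from the genotyped animals themselves, and the toy A₂₂ is an identity matrix (unrelated, non-inbred animals).

```python
import numpy as np


def vanraden_g(genotypes):
    """Raw genomic relationship matrix by VanRaden (2008) method 1.

    genotypes: (n_animals, n_markers) array of allele counts coded 0/1/2.
    Allele frequencies are taken from the data (an assumption of this sketch).
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                 # allele frequency per marker
    z = m - 2.0 * p                          # centred genotype matrix
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def scaled_blended_g(g_raw, a22, w=0.10):
    """Scale G by t = tr(A22)/tr(G), then blend: G_w = (1 - w) G + w A22."""
    t = np.trace(a22) / np.trace(g_raw)
    g = t * g_raw
    return (1.0 - w) * g + w * a22


# Toy example: 5 animals, 200 random markers, identity pedigree relationships.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(5, 200))
a22 = np.eye(5)
g_w = scaled_blended_g(vanraden_g(genotypes), a22, w=0.10)
```

Because the scaled G and A₂₂ have the same trace, the blended matrix `g_w` keeps that trace for any value of w, which is the property stated in the text.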

When the mixed model equations for the single-step method are considered, the difference from the normal animal model is the matrix block H²² = A²² + G⁻¹ − A₂₂⁻¹ between the genotyped animals. Here, the superscript 22 in H²² and A²² refers to the sub matrix block of genotyped animals, and the superscript position indicates that the block is a sub matrix of the full inverse (H⁻¹ or A⁻¹) of the matrix. To improve the properties of the single-step evaluation, different weights can be used for the component matrices in the H²² matrix. In Misztal et al. (2010) and Tsuruta et al. (2011), the H²² matrix was scaled as H²² = A²² + τG_w⁻¹ − ωA₂₂⁻¹. Misztal et al. (2013) suggested that optimal weights τ for G_w⁻¹ and ω for A₂₂⁻¹ decrease the possible inflation of GEBVs. The parameters τ and ω scale the size of the genomic and pedigree relationships. The larger τ is, the less weight is given to G, whereas a larger ω decreases the importance of pedigree relationships and increases the importance of genomic relationships. According to our preliminary analyses (Koivula et al., 2015), the highest prediction accuracy was achieved when we used a combination of τ=1.6 and ω=0.5, and the proportion of polygenic variance in G_w was fixed to w=0.10. This combination was also found to give the least inflation for genomic predictions in Koivula et al. (2014) and Koivula et al. (2015). Two different H²² matrix blocks were constructed: one with all genotyped animals included in the genomic relationship matrix (GEBV_a), and the second with only bull genotypes included in the genomic relationship matrix (GEBV_b).
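As a minimal sketch of the weighted block described above, the following Python function adds τG_w⁻¹ − ωA₂₂⁻¹ to the genotyped block of A⁻¹. The function name `h_inverse`, the dense-matrix representation and the four-animal toy pedigree are assumptions made for illustration; a real evaluation would use sparse matrices and dedicated software.

```python
import numpy as np


def h_inverse(a_inv, a22, g_w, genotyped_idx, tau=1.6, omega=0.5):
    """Return H^-1 with genotyped block A^22 + tau * G_w^-1 - omega * A22^-1.

    a_inv         : inverse of the full pedigree relationship matrix A
    a22           : block of A (not of A^-1) for the genotyped animals
    g_w           : scaled and blended genomic relationship matrix
    genotyped_idx : positions of the genotyped animals in A
    """
    h_inv = np.array(a_inv, dtype=float, copy=True)
    idx = np.asarray(genotyped_idx)
    block = tau * np.linalg.inv(g_w) - omega * np.linalg.inv(a22)
    h_inv[np.ix_(idx, idx)] += block         # only the genotyped block changes
    return h_inv


# Toy pedigree: animals 0 and 1 are unrelated parents of full sibs 2 and 3.
a = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
a_inv = np.linalg.inv(a)
genotyped = [2, 3]
a22 = a[np.ix_(genotyped, genotyped)]
g_w = np.array([[1.0, 0.6],
                [0.6, 1.0]])                 # hypothetical genomic relationships
h_inv = h_inverse(a_inv, a22, g_w, genotyped, tau=1.6, omega=0.5)
```

With τ=ω=1 and G_w equal to A₂₂, the added block is zero and the function returns A⁻¹ unchanged, which is a convenient sanity check.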

The GEBVs of the validation bulls were used to predict the DYDs as specified in the Interbull validation protocol (Mäntysaari et al., 2010):

$${\bf y} = {\bf 1}b_{0} + b_{1}\hat{\bf a} + {\bf e}$$

where y is a vector with DYDs of the validation bulls from the full data, b₀ and b₁ are regression coefficients, â contains GEBVs for these bulls from the analysis based on the reduced data and e is the vector of residuals. The validation reliability of the model was obtained from the R² (coefficient of determination) of the model (R²_model), after adjusting it by the average reliability of the DYDs $$(\overline{r}^{2}_{\rm DYD})$$ of the candidate bulls, that is, $$R^{2}_{\rm validation} = R^{2}_{\rm model} / \overline{r}^{2}_{\rm DYD}$$. The reliability of each individual DYD_i was calculated as $$r^{2}_{{\rm DYD}_{i}} = {\rm ERC}_{i} / ({\rm ERC}_{i} + \lambda)$$, where λ=(1−h²)/h². In order to estimate the further gain from the genomic information over the PA (VanRaden et al., 2009; Mäntysaari et al., 2010), the same validation tests were also applied to PA.
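The validation statistics defined above can be illustrated with a short Python sketch. It assumes an unweighted ordinary least squares regression of DYD on GEBV (the text does not state whether weights were used), and the function name `validation_stats` and the simulated input values are hypothetical.

```python
import numpy as np


def validation_stats(dyd, gebv, erc, h2):
    """Regress DYD on GEBV and adjust R2 by the mean DYD reliability.

    Returns (b0, b1, r2_model, r2_validation).
    Unweighted least squares is an assumption of this sketch.
    """
    dyd = np.asarray(dyd, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    erc = np.asarray(erc, dtype=float)

    b1, b0 = np.polyfit(gebv, dyd, 1)                # slope first, then intercept
    r2_model = np.corrcoef(gebv, dyd)[0, 1] ** 2     # R2 of a simple regression

    lam = (1.0 - h2) / h2                            # lambda = (1 - h2) / h2
    r2_dyd = erc / (erc + lam)                       # reliability of each DYD
    r2_validation = r2_model / r2_dyd.mean()
    return b0, b1, r2_model, r2_validation


# Simulated toy data for 50 hypothetical validation bulls.
rng = np.random.default_rng(7)
gebv = rng.normal(0.0, 10.0, size=50)
dyd = 0.9 * gebv + rng.normal(0.0, 8.0, size=50)
erc = rng.uniform(3.0, 30.0, size=50)
b0, b1, r2_model, r2_validation = validation_stats(dyd, gebv, erc, h2=0.48)
```

Since the mean DYD reliability is below 1, the adjusted validation reliability is always at least as large as the raw R² of the regression.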

The EBVs and GEBVs were solved by pre-conditioned conjugate gradient iteration using MiX99 software (Strandén and Lidauer, 1999). Confidence intervals (CIs) were estimated for the regression coefficients (b₁) and the validation reliabilities (R²_validation) using non-parametric bootstrap (Koivula et al., 2015). The boot and boot.ci functions of the boot package in R (R core team, 2012) were used to calculate 95% bootstrap CIs for candidate bulls. The number of bootstrap samples was 10 000. Bootstrap CIs were calculated using three methods: ‘basic’, ‘norm’ and ‘perc’. CIs by the ‘basic’ method are given because all methods gave about the same values.
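The idea of a non-parametric ‘basic’ bootstrap interval can be sketched in Python as follows. This is not the R code used in the study: the function names `slope` and `basic_bootstrap_ci` are hypothetical, bulls are resampled with replacement as (GEBV, DYD) pairs, and the quantile interpolation of NumPy may differ slightly from that of boot.ci.

```python
import numpy as np


def slope(x, y):
    """Regression coefficient b1 from regressing y on x."""
    return np.polyfit(x, y, 1)[0]


def basic_bootstrap_ci(x, y, stat=slope, n_boot=10_000, level=0.95, seed=0):
    """Basic bootstrap CI: (2*theta - q_upper, 2*theta - q_lower)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    theta_hat = stat(x, y)                       # estimate from the original data

    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)         # resample animals with replacement
        boots[b] = stat(x[idx], y[idx])

    alpha = 1.0 - level
    q_lower, q_upper = np.quantile(boots, [alpha / 2.0, 1.0 - alpha / 2.0])
    return 2.0 * theta_hat - q_upper, 2.0 * theta_hat - q_lower


# Simulated toy data for 60 hypothetical validation bulls.
rng = np.random.default_rng(3)
gebv = rng.normal(0.0, 10.0, size=60)
dyd = 0.9 * gebv + rng.normal(0.0, 8.0, size=60)
ci_low, ci_high = basic_bootstrap_ci(gebv, dyd, n_boot=2000)
```

The same function can be reused for the validation reliability by passing a different `stat` function that returns the adjusted R².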

Results and discussion

The number of iterations to convergence by the pre-conditioned conjugate gradient method differed by model and data. The number of iterations was 1070 for the EBVs from the full data animal model, varied from 1143 to 1145 for the EBVs from the reduced data, and varied from 874 to 884 for the single-step method, depending on which reduced data set and H²² were used. Computing time to solve the mixed model equations for the animal model was on average 58 min, which increased for the single-step method by 6 min when using only bull genotypes or by 57 min when using both bull and cow genotypes. Thus, process time per iteration was doubled for the single-step evaluation when cow genotypes were included in the H²² matrix. The main reason for the increase was the need to read and process the large H²² matrix in ssGBLUP.

The model validation results are in Table 2. The table has regression coefficients (b₁) and validation reliabilities (R²) with 95% bootstrap CIs based on 10 000 bootstrap samples. The improvement in R² due to inclusion of genotyped reference cows was from 3% to 4% units for milk, from 2% to 3% units for protein and from 3% to 4% units for fat (Table 2). In Koivula et al. (2014), genotyped reference cows increased the R² from 0.8% to 2.6% units for the production traits, but that study had less data. Our results indicate that genotyping cows and subsequently including them in the reference population is advantageous and is expected to further increase the reliabilities. This is in agreement, for example, with Thomasen et al. (2014), who found that the annual genetic gain and the reliability of genomic predictions were slightly higher when including more cows in the reference population. The current study used individual cow DRPs as phenotypes. However, this was done only to evaluate the value of cow genotype data. In practical single-step evaluations, the genotyped cows can be included along with all their contemporaries, and the gain from the information is most likely larger.

Table 2 Model validation results

Control=deregressed proofs of all genotyped cows excluded; AllG=deregressed proofs of 7143 genotyped cows included; nonBdG=deregressed proofs of genotyped bull dams excluded; nonBdDG=deregressed proofs of genotyped bull dams and their genotyped daughters excluded from the reference population.

Regression coefficients (b₁) and validation reliabilities (R² in %) and their 95% bootstrap confidence intervals (CIs) from the parent average (PA), and genomic-enhanced breeding values (GEBV). GEBV_a with all genotyped animals and GEBV_b including only bull genotypes in the genomic relationship matrix.

In general, the effect of genotyped cows was positive for the validation reliability of GEBV, but at the same time the inclusion of DRP information from genotyped cows seemed to create some bias. The degree of inflation is indicated by the coefficient of regression (b₁) of DYDs on GEBV. In the validation test, DYDs are considered as unbiased estimates of genetic values and, thus, optimal prediction of the genetic merit of young individuals should give 1 as the regression coefficient b₁. When b₁ is <1, the predictions are inflated and the differences in estimated genetic merit of young individuals are exaggerated compared with their future performance. Wiggans et al. (2011) and Dassonneville et al. (2012) found that the inclusion of cow genotypes can result in a decrease in the reliability of bull genomic evaluations. The reason for this was assumed to be pre-selection of cows, because selection of cows for genotyping is based on high EBVs or potential for a high genetic evaluation. Thus, potential bull dams have been the first cows to be genotyped. Indeed, genotyped bull dams and their genotyped daughters seemed to have some effect on bias and reliability. Although differences were small, it appeared that for milk and protein the exclusion of DRP data from both the genotyped bull dams and their genotyped daughters gave better validation results when the bias (b₁) and validation reliability (R²) are considered. However, for fat, exclusion of genotyped bull dams was enough to overcome bias (Table 2). Still, the regression coefficients deviated from 1. The 95% bootstrap CIs for the regression coefficients of the GEBVs always included 1.0 for milk, for fat with GEBV_a, and for protein with GEBV_a using either the Control or the nonBdDG data.

The regression coefficients for the PAs indicated large bias (b₁ varied from 0.69 to 0.90). Preferential treatment of the potential bull dams has been assumed to be one reason for this (Kuhn et al., 1994). In our case, this is unlikely to be the source of the bias because the regression coefficients for PA also deviated from 1 with the data sets nonBdG and nonBdDG. However, only a small proportion of bull dams was genotyped. Thus, removing DRPs of genotyped bull dams might not be enough to overcome the preferential treatment. However, we were unable to reduce the bias by removing DRPs of all dams of the validation bulls (b₁ of PA increased only for fat to 0.79, but decreased for milk and protein to 0.81 and 0.75, respectively). If the heritabilities used in the animal model were incorrect, this could also lead to bias. Therefore, we also tested the animal model using the average test-day heritabilities given in Lidauer et al. (2015). This decreased the bias by about 3% in milk, 5% in protein and 10% in fat, but the regression coefficients still deviated from 1.

The official Nordic RDC milk production evaluation includes test-day records from milk, fat and protein production. Production records from the first three lactations are in the same multiple trait model. Each trait has a random regression function for random genetic and permanent environmental effects (Lidauer et al., 2015). The original test-day model using the real phenotypic observations is very complicated. In this study, we used 305-day yields combined over three lactations to calculate cow DRPs. This process includes several approximations that may reduce accuracy and can inflate the resulting (G)EBVs. In the validation, we were in principle comparing the first lactation results of validation bulls with DYDs based on multiple lactations. As DYDs are based on much more information than the (G)EBVs, bias was expected to be larger in GEBVs. However, our results indicated that bias is smaller for GEBVs. This indicates that moving from the traditional pedigree-based evaluation to genomic evaluations improves the breeding value estimation.

Validation results of bulls did not gain from inclusion of DRPs of genotyped cows (Table 2) when the genotypes of the cows were not included in the H matrix. Both validation reliabilities and variance inflation b₁ were lower compared with results from analyses using cow genomic and DRP information. Inclusion of cow genomic information seems to give higher reliability and lower bias independent of the amount of cow DRP data. This supports results that cow genotypes are a valuable addition in the genomic evaluation (Tsuruta et al., 2013). In our case, cow genotypes lessened particularly the bias. The expected increase in validation reliability due to an increased reference population can be estimated by the non-linear equations suggested by Daetwyler et al. (2010) or Meuwissen et al. (2013). In these, the information content of the reference population is a product of the number of animals phenotyped and genotyped and their corresponding evaluation accuracy. Therefore, according to the formulas by Daetwyler et al. (2010) and Meuwissen et al. (2013), with given model reliabilities of bull and cow DRPs, each bull should not contribute much more information than three to four cows, because bull DRP reliability is high due to progeny information but cow DRP reliability is mostly due to own record information and accurate sire information. However, the value of added information depends on the amount of already available information, and the relationships among bulls and cows.

The trends in GEBVs for milk, protein and fat (Figure 1) show no difference whether DRPs of genotyped cows were included in the data or not. Trends are presented for GEBV from ssGBLUP using both cow and bull genotypes. Especially in reference bulls the trend lines go side by side. For the candidate bulls, trends seem to be a little higher if genotyped cows are in the reference compared with situation where DRPs of genotyped cows are excluded. The GEBV trends also follow nicely the EBVs calculated from the full cow DRP data. Thus, including information of genotyped cows seems not to induce any problems in genetic trends.

Figure 1 Genetic trends for (a) milk, (b) protein and (c) fat production using genomic-enhanced breeding values (GEBV_as) and estimated breeding values (EBVs) of reference and candidate bulls from different reduced data. For the candidate bulls, the EBV from reduced data are parent average (PA). Control=no deregressed proofs (DRP) of genotyped cows in the reference data and AllG=DRPs of 7143 genotyped cows in the reference population. EBVs (black solid line) were calculated from the full cow DRP data. Solid lines are for the reference bulls and dashed lines for the candidate bulls. EBVs and GEBV_as are expressed as standardized breeding values with SD of 10 units for bulls born between the years 2003 and 2005.

Koivula et al. (2015) showed that standard deviations of the EBV and GEBV for reference and candidate bulls differ depending on the method used to make the H²² matrix. The impact of changing τ and ω was an important one that affected standard deviations of both candidate and reference animals, whereas changes in the polygenic proportion, w, affected candidate animals to a larger degree (Koivula et al., 2015). Although GEBVs by the single-step method are less biased than PA, it is essential to consider the whole picture before choosing the method for use. Moreover, for different traits, a different polygenic proportion can be optimal. Use of a genomic relationship matrix that weights markers according to the analysed trait (e.g. VanRaden, 2008; Makgahlela et al., 2013) may better account for differences in genetic architecture. Therefore, there is still a need to study the most appropriate method to build the H²² matrix for the single-step evaluation.

In conclusion, we observed a consistent increase in validation reliability and smaller bias when cow genomic and record information were included in the reference population. Still, the number of genotyped cows was probably too small to produce a much larger improvement in validation reliability. However, genotyping cows and subsequently including them in the reference population is advantageous, and the number of genotyped cows should be increased in the Nordic RDC population. There is some evidence for small bias due to records of genotyped bull dams and their daughters. This should be studied further when more cow genotype information becomes available.

Acknowledgements

This work was a part of the Genomic Selection project originally established by Aarhus University and the Nordic cattle breeding organizations Viking Genetics, Nordic Cattle Genetic Evaluation (NAV) and Faba Coop.

References

Aguilar, I, Misztal, I, Johnson, DL, Legarra, A and Tsuruta, S 2010. Hot topic, a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science 93, 743–752.

Bapst, B, Base, C, Seefried, FR, Bieber, A, Simianer, H and Gredler, B 2013. Effect of cows in the reference population: First results in Swiss Brown Swiss. Interbull Bulletin 47, 187–191.

Christensen, OF and Lund, MS 2010. Genomic prediction when some animals are not genotyped. Genetics Selection Evolution 42, 2.

Daetwyler, HS, Pong-Wong, R, Villanueva, B and Woolliams, JA 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics 185, 1021–1031.

Dassonneville, R, Baur, A, Fritz, S, Boichard, D and Ducrocq, V 2012. Inclusion of cow records in genomic evaluations and impact on bias due to preferential treatment. Genetics Selection Evolution 44, 40.

Goddard, M 2009. Genomic selection, prediction of accuracy and maximisation of long term response. Genetica 136, 245–257.

Goddard, ME and Hayes, BJ 2009. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Reviews Genetics 10, 381–391.

Koivula, M, Strandén, I, Aamand, GP and Mäntysaari, EA 2014. Effect of cow reference group on validation accuracy of genomic evaluation. Proceedings of the 10th World Congress on Genetics Applied to Livestock Production, Vancouver, BC, Canada, 17–22 August, Article no: 083.

Koivula, M, Strandén, I, Pösö, J, Aamand, GP and Mäntysaari, EA 2015. Single-step genomic evaluation using multitrait random regression model and test-day data. Journal of Dairy Science 98, 2775–2784.

Kuhn, MT, Boettcher, PJ and Freeman, AE 1994. Potential biases in predicted transmitting abilities of females from preferential treatment. Journal of Dairy Science 77, 2428–2437.

Langhdahl, C 2014. Status on practical breeding program in VG. January 21–22, Copenhagen, Denmark. Retrieved August 1, 2015, from http://www.nordicebv.info/NR/rdonlyres/22510460-48C8-4BA4-AEC5-01AD1A6227F1/0/20140121_2GSworkshopJanuary.pdf.

Lidauer, M, Pösö, J, Pedersen, J, Lassen, J, Madsen, P, Mäntysaari, EA, Nielsen, US, Eriksson, J-Å, Johansson, K, Pitkänen, T and Strandén, I 2015. Across-country test-day model evaluations for Holstein, Nordic Red Cattle, and Jersey. Journal of Dairy Science 98, 1296–1309.

Makgahlela, ML, Mäntysaari, EA, Strandén, I, Koivula, M, Nielsen, US, Sillanpää, MJ and Juga, J 2013. Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. Journal of Animal Breeding and Genetics 130, 10–19.

Mäntysaari, EA, Liu, Z and VanRaden, P 2010. Interbull validation test for genomic evaluations. Interbull Bulletin 40, 1–5.

Meuwissen, T, Hayes, B and Goddard, M 2013. Accelerating improvement of livestock with genomic selection. Annual Review of Animal Biosciences 1, 221–237.

Misztal, I, Aguilar, I, Legarra, A and Lawlor, TJ 2010. Choice of parameters for single-step genomic evaluation for type. Journal of Dairy Science 93 (suppl. 1), 533.

Misztal, I, Legarra, A and Aguilar, I 2009. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. Journal of Dairy Science 92, 4648–4655.

Misztal, I, Tsuruta, S, Aguilar, I, Legarra, A, VanRaden, PM and Lawlor, TJ 2013. Methods to approximate reliabilities in single-step genomic evaluation. Journal of Dairy Science 96, 647–654.

R Core Team 2012. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Retrieved August 1, 2015, from http://www.R-project.org/.

Strandén, I and Lidauer, M 1999. Solving large mixed models using preconditioned conjugate gradient iteration. Journal of Dairy Science 82, 2779–2787.

Strandén, I and Mäntysaari, EA 2010. A recipe for multiple trait deregression. Interbull Bulletin 42, 21–24.

Su, G, Brøndum, RF, Ma, P, Guldbrandtsen, B, Aamand, GP and Lund, MS 2012a. Comparison of genomic predictions using medium-density (~54,000) and high-density (~777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations. Journal of Dairy Science 95, 4657–4665.

Su, G, Madsen, P, Nielsen, US, Mäntysaari, EA, Aamand, GP, Christensen, OF and Lund, MS 2012b. Genomic prediction for the Nordic Red Cattle using one-step and selection index blending approaches. Journal of Dairy Science 95, 909–917.

Taskinen, M, Mäntysaari, EA, Aamand, GP and Strandén, I 2014. Comparison of breeding values from single-step and bivariate blending methods. Proceedings of the 10th World Congress on Genetics Applied to Livestock Production, Vancouver, BC, Canada, 17–22 August, Article no: 507.

Thomasen, JR, Guldbrandtsen, B, Su, G, Brøndum, RF and Lund, MS 2012. Reliabilities of genomic estimated breeding values in Danish Jersey. Animal 6, 789–796.

Thomasen, JR, Sørensen, AC, Lund, MS and Guldbrandtsen, B 2014. Adding cows to the reference population makes a small dairy population competitive. Journal of Dairy Science 97, 5822–5832.

Tsuruta, S, Misztal, I, Aguilar, I and Lawlor, TJ 2011. Multiple-trait genomic evaluation of linear type traits using genomic and phenotypic data in US Holsteins. Journal of Dairy Science 94, 4198–4204.

Tsuruta, S, Misztal, I, Aguilar, I and Lawlor, TJ 2013. Genomic evaluations of final score for US Holsteins benefit from the inclusion of genotypes on cows. Journal of Dairy Science 96, 3332–3335.

VanRaden, PM 2008. Efficient methods to compute genomic predictions. Journal of Dairy Science 91, 4414–4423.

VanRaden, PM, Van Tassell, CP, Wiggans, GR, Sonstegard, TS and Schnabel, RD 2009. Invited review, reliability of genomic predictions for North American Holstein bulls. Journal of Dairy Science 92, 16–24.

Wiggans, GR, Cooper, TA, VanRaden, PM and Cole, JB 2011. Technical note, adjustment of traditional cow evaluations to improve accuracy of genomic predictions. Journal of Dairy Science 94, 6188–6193.

Table 1 Numbers of animals and deregressed proofs (DRP) in the reduced data sets

Table 2 Model validation results

Figure 1 Genetic trends for (a) milk, (b) protein and (c) fat production using genomic-enhanced breeding values (GEBVs) and estimated breeding values (EBVs) of reference and candidate bulls from different reduced data. For the candidate bulls, the EBVs from reduced data are parent averages (PA). Control=no deregressed proofs (DRP) of genotyped cows in the reference data and AllG=DRPs of 7143 genotyped cows in the reference population. EBVs (black solid line) were calculated from the full cow DRP data. Solid lines are for the reference bulls and dashed lines for the candidate bulls. EBVs and GEBVs are expressed as standardized breeding values with SD of 10 units for bulls born between the years 2003 and 2005.
