
Penalized classification for optimal statistical selection of markers from high-throughput genotyping: application in sheep breeds

Published online by Cambridge University Press: 24 October 2017

G. Sottile*
Affiliation:
Dipartimento Scienze Economiche, Aziendali e Statistiche, University of Palermo, Palermo, Italy
M. T. Sardina
Affiliation:
Dipartimento Scienze Agrarie, Alimentari e Forestali, University of Palermo, Palermo, Italy
S. Mastrangelo
Affiliation:
Dipartimento Scienze Agrarie, Alimentari e Forestali, University of Palermo, Palermo, Italy
R. Di Gerlando
Affiliation:
Dipartimento Scienze Agrarie, Alimentari e Forestali, University of Palermo, Palermo, Italy
M. Tolone
Affiliation:
Dipartimento Scienze Agrarie, Alimentari e Forestali, University of Palermo, Palermo, Italy
M. Chiodi
Affiliation:
Dipartimento Scienze Economiche, Aziendali e Statistiche, University of Palermo, Palermo, Italy
B. Portolano
Affiliation:
Dipartimento Scienze Agrarie, Alimentari e Forestali, University of Palermo, Palermo, Italy

Abstract

Identifying an individual's breed of origin has several practical applications in livestock and is useful in biological contexts such as conservation genetics, breeding and authentication of animal products. In this paper, penalized multinomial regression was applied to identify the minimum number of single nucleotide polymorphisms (SNPs) from high-throughput genotyping data needed to assign individuals to dairy sheep breeds reared in Sicily. The combined use of penalized multinomial regression and stability selection reduced the number of SNPs required to 48. In a final validation step on an independent population, 100% of individuals were correctly classified. Results from independent analyses, such as admixture, Fst, principal component analysis and random forest, confirmed the ability of these methods to select distinctive markers. The identified SNPs may constitute a starting point for the development of a SNP-based identification test as a tool for breed assignment and traceability of animal products.
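
For orientation, the two ingredients named in the abstract can be written down in their usual textbook form. This is only a sketch under stated assumptions: the abstract does not give the exact penalty, tuning grid, subsample size or threshold, so the lasso (L1) penalty, the half-sample scheme and all symbols below (n, p, K, x_i, y_ik, lambda, B, pi_thr) are illustrative choices and not necessarily the authors' settings.

```latex
% Assumed notation (not from the abstract): n individuals, p SNPs, K breeds,
% x_i \in \mathbb{R}^p the genotype vector of individual i,
% y_{ik} = 1 if individual i belongs to breed k and 0 otherwise.

% Multinomial (softmax) model for breed membership:
\Pr(Y_i = k \mid x_i)
  = \frac{\exp(\beta_{0k} + x_i^{\top}\beta_k)}
         {\sum_{l=1}^{K} \exp(\beta_{0l} + x_i^{\top}\beta_l)}

% Penalized log-likelihood, here with an assumed lasso penalty;
% larger \lambda sets more SNP coefficients exactly to zero:
\max_{\{\beta_{0k},\,\beta_k\}_{k=1}^{K}}
  \frac{1}{n}\sum_{i=1}^{n}
    \Bigl[\sum_{k=1}^{K} y_{ik}\,(\beta_{0k} + x_i^{\top}\beta_k)
          - \log\sum_{l=1}^{K}\exp(\beta_{0l} + x_i^{\top}\beta_l)\Bigr]
  \;-\; \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1

% Stability selection: refit on B random subsamples I_1, \dots, I_B
% (assumed size \lfloor n/2 \rfloor) and record how often SNP j is selected:
\hat{\Pi}_j^{\lambda}
  = \frac{1}{B}\sum_{b=1}^{B}
    \mathbf{1}\bigl\{\, j \in \hat{S}^{\lambda}(I_b) \,\bigr\},
\qquad
\hat{S}^{\lambda}(I_b)
  = \bigl\{\, j : \hat{\beta}_{jk}^{\lambda}(I_b) \neq 0
      \text{ for some } k \,\bigr\}

% SNPs retained in the final panel, for an assumed threshold \pi_{\mathrm{thr}}:
\hat{S}^{\mathrm{stable}}
  = \bigl\{\, j : \max_{\lambda \in \Lambda} \hat{\Pi}_j^{\lambda}
      \ge \pi_{\mathrm{thr}} \,\bigr\}
```

Read this way, the penalty performs the marker reduction within each fit, while the selection frequency keeps only SNPs that are chosen consistently across subsamples, which is the kind of filtering that could lead to a small panel such as the 48 SNPs reported.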

Type
Research Article
Copyright
© The Animal Consortium 2017 


Supplementary material

Sottile et al. supplementary material

Table S1 and Figures S1-S2
