From sequence reads to evolutionary inferences

doi:10.1017/CBO9781139236355.016

15 - From sequence reads to evolutionary inferences

from Part III - Next Generation Challenges and Questions

Published online by Cambridge University Press: 05 June 2016

Joseph Hughes and James A. Cotton (Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK)

Edited by Peter D. Olson (Natural History Museum, London), Joseph Hughes (University of Glasgow) and James A. Cotton (Wellcome Trust Sanger Institute, Cambridge)

Summary

Introduction

The history of molecular systematics can be caricatured as one of ever-increasing depth of sequence data, analysed by ever more complex models. In this respect, sequence data from whole genomes are the ultimate source of molecular markers that can act as characters for phylogenetic or population genetic analysis. While complete genomes in the strictest sense are only available for very few species, and fragmentary genome assemblies that capture the entire genome, but in many pieces, are also fairly restricted in scope beyond the prokaryotes, this is changing rapidly. More-or-less shallow genomic data, for example from EST sequencing projects, high-throughput transcriptome sequencing or some other kind of reduced-representation sequencing (see review by Davey et al. 2011) are now becoming widespread and of increasing utility in systematics and other areas of evolutionary biology. Studies using these kinds of data to reconstruct relationships between species have become known as ‘phylogenomics’, although the original usage of the term referred to using phylogenetic approaches to infer gene function (Eisen 1998), and the other parts of the research programme proposed under this name (Eisen and Fraser 2003) have been subsumed into the broader study of comparative and evolutionary genomics. Moreover, the term ‘phylogenomics’ has, perhaps, become over-extended, as datasets that claim this title vary in size and can be as few as 11 markers (Horvath et al. 2008) or as little as 30 kb of sequence data (Wiegmann et al. 2011), and in eukaryotic organisms, the ‘genomes’ in question are very often organelle (mitochondrial or chloroplast) genome sequences.

Sequence data from whole genomes have the potential to be a rich source of molecular phylogenetic markers for any systematic question, but there are two areas in which large-scale, highly multi-locus data appear most valuable – occupying the two extremes of the range of timescales over which inference about evolutionary history is made.

Genome-scale data promise the ability to resolve ancient divergences, and in particular, fairly rapid (at least in geological terms) ancient radiations that have been difficult to reliably reconstruct with more limited molecular datasets. In this context, phylogenomic data have been applied to a wide taxonomic range of phylogenetic questions. Early usage of whole-genome data was in prokaryote systematics (e.g. Daubin et al. 2002).

Type: Chapter
Information: Next Generation Systematics, pp. 305–335

DOI: https://doi.org/10.1017/CBO9781139236355.016

Publisher: Cambridge University Press

Print publication year: 2016

References

Aguinaldo, A. M., Turbeville, J. M., Linford, L. S., et al. (1997). Evidence for a clade of nematodes, arthropods and other moulting animals. Nature, 387, 489–93.

Altenhoff, A. M. and Dessimoz, C. (2012). Inferring orthology and paralogy. Methods in Molecular Biology, 855, 259–79.

Altshuler, D., Pollara, V. J., Cowles, C. R., et al. (2000). An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature, 407, 513–16.

Ané, C., Larget, B., Baum, D. A., Smith, S. D. and Rokas, A. (2007). Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution, 24, 412–26.

Anisimova, M. and Gascuel, O. (2006). Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Systematic Biology, 55, 539–52.

Assefa, S., Keane, T. M., Otto, T. D., Newbold, C. and Berriman, M. (2009). ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics, 25, 1968–9.

Baird, N. A., Etter, P. D., Atwood, T. S., et al. (2008). Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One, 3, e3376.

Bapteste, E., Susko, E., Leigh, J., et al. (2007). Alternative methods for concatenation of core genes indicate a lack of resolution in deep nodes of the prokaryotic phylogeny. Molecular Biology and Evolution, 25, 83–91.

Barry, D. and Hartigan, J. A. (1987). Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261–76.

Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379–406.

Blackshields, G., Wallace, I. M., Larkin, M. and Higgins, D. G. (2006). Analysis and comparison of benchmarks for multiple sequence alignment. In Silico Biology, 6, 321–39.

Blair, C. and Murphy, R. W. (2010). Recent trends in molecular phylogenetic analysis: where to next? Journal of Heredity, 102, 130–8.

Blair, J. E., Ikeo, K., Gojobori, T. and Hedges, S. B. (2002). The evolutionary position of nematodes. BMC Evolutionary Biology, 2, 7.

Blanquart, S. and Lartillot, N. (2008). A site- and time-heterogeneous model of amino acid replacement. Molecular Biology and Evolution, 25, 842–58.

Boetzer, M. and Pirovano, W. (2012). Toward almost closed genomes with GapFiller. Genome Biology, 13, R56.

Bradnam, K. R., Fass, J. N., Alexandrov, A., et al. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience, 2, 10.

Breese, M. R. and Liu, Y. (2013). NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics, 29, 494–6.

Brown, J. M. and Lemmon, A. R. (2007). The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Systematic Biology, 56, 643–55.

Browning, S. R. and Browning, B. L. (2011). Haplotype phasing: existing methods and new developments. Nature Reviews Genetics, 12, 703–14.

Bybee, S. M., Bracken-Grissom, H., Haynes, B. D., et al. (2011). Targeted Amplicon Sequencing (TAS): a scalable next-gen approach to multilocus, multitaxa phylogenetics. Genome Biology and Evolution, 3, 1312–23.

Capella-Gutierrez, S., Silla-Martinez, J. M. and Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25, 1972–3.

Carstens, B. C., Pelletier, T. A., Reid, N. M. and Satler, J. D. (2013). How to fail at species delimitation. Molecular Ecology, 22, 4369–83.

Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution, 17, 540–52.

Chain, P. S. G., Grafham, D. V., Fulton, R. S., et al. (2009). Genomics: genome project standards in a new era of sequencing. Science, 326, 236–7.

Choi, S. C. and Hey, J. (2011). Joint inference of population assignment and demographic history. Genetics, 189, 561–77.

Ciccarelli, F., Doerks, T., von Mering, C., et al. (2006). Toward automatic reconstruction of a highly resolved Tree of Life. Science, 311, 1283–7.

Compeau, P. E. C., Pevzner, P. A. and Tesler, G. (2011). How to apply de Bruijn graphs to genome assembly. Nature Biotechnology, 29, 987–91.

Cotton, J. A. and Page, R. D. M. (2005). Rates and patterns of gene duplication and loss in the human genome. Proceedings of the Royal Society B-Biological Sciences, 272, 277–83.

Cotton, J. A. and Wilkinson, M. (2009). Supertrees join the mainstream of phylogenetics. Trends in Ecology and Evolution, 24, 1–3.

Cox, C. J., Foster, P. G., Hirt, R. P., Harris, S. R. and Embley, T. M. (2008). The archaebacterial origin of eukaryotes. Proceedings of the National Academy of Sciences of the United States of America, 105, 20356–61.

Creevey, C. J., Muller, J., Doerks, T., et al. (2011). Identifying single copy orthologs in Metazoa. PLoS Computational Biology, 7, e1002269.

Csilléry, K., Blum, M. G. B., Gaggiotti, O. E. and François, O. (2010). Approximate Bayesian Computation (ABC) in practice. Trends in Ecology and Evolution, 25, 410–18.

Dagan, T. and Martin, W. (2006). The tree of one percent. Genome Biology, 7, 118.

Dalquen, D. A. and Dessimoz, C. (2013). Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biology and Evolution, 5, 1800–6.

Danecek, P., Auton, A., Abecasis, G., et al. (2011). The variant call format and VCFtools. Bioinformatics, 27, 2156–8.

Daubin, V., Gouy, M. and Perrière, G. (2002). A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Research, 12, 1080–90.

Davey, J. W., Hohenlohe, P. A., Etter, P. D., et al. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12, 499–510.

de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. and Pollock, D. D. (2011). Repetitive elements may comprise over two-thirds of the human genome. PLoS Genetics, 7, e1002384.

de Queiroz, A., Donoghue, M. J. and Kim, J. (1995). Separate versus combined analysis of phylogenetic evidence. Annual Review of Ecology and Systematics, 26, 657–81.

Degnan, J. H. and Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genetics, 2, e68.

DeLuca, D. S., Levin, J. Z., Sivachenko, A., et al. (2012). RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics, 28, 1530–2.

Downing, T., Imamura, H., Decuypere, S., et al. (2011). Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance. Genome Research, 21, 2143–56.

Dunn, C. W., Hejnol, A., Matus, D. Q., et al. (2008). Broad phylogenomic sampling improves resolution of the animal Tree of Life. Nature, 452, 745–9.

Dunn, C. W., Howison, M. and Zapata, F. (2013). Agalma: an automated phylogenomics workflow. BMC Bioinformatics, 14, 330.

Edgecombe, G. D., Giribet, G., Dunn, C. W., et al. (2011). Higher-level Metazoan relationships: recent progress and remaining questions. Organisms Diversity and Evolution, 11, 151–72.

Edwards, S. V., Liu, L. and Pearl, D. K. (2007). High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences of the United States of America, 104, 5936–41.

Eisen, J. A. (1998). Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research, 8, 163–7.

Eisen, J. A. and Fraser, C. M. (2003). Phylogenomics: intersection of evolution and genomics. Science, 300, 1706–7.

Erixon, P., Svennblad, B., Britton, T. and Oxelman, B. (2003). Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Systematic Biology, 52, 665–73.

Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3, 87–112.

Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. and Foll, M. (2013). Robust demographic inference from genomic and SNP data. PLoS Genetics, 9, e1003905.

Fedrigo, O., Naylor, G. and Collins, T. (2005). Choosing the best genes for the job: the case for stationary genes in genome-scale phylogenetics. Systematic Biology, 54, 493–500.

Flouri, T., Izquierdo-Carrasco, F., Darriba, D., et al. (2015). The phylogenetic likelihood library. Systematic Biology, 64, 356–62.

Fonseca, N. A., Rung, J., Brazma, A. and Marioni, J. C. (2012). Tools for mapping high-throughput sequencing data. Bioinformatics, 28, 3169–77.

Foster, P. G. (2004). Modeling compositional heterogeneity. Systematic Biology, 53, 485–95.

Galtier, N. and Gouy, M. (1998). Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Molecular Biology and Evolution, 15, 871–9.

Gascuel, O. and Steel, M. (2006). Neighbor-joining revealed. Molecular Biology and Evolution, 23, 1997–2000.

Gatesy, J. and Baker, R. (2005). Hidden likelihood support in genomic data: can forty-five wrongs make a right? Systematic Biology, 54, 483–92.

Gayral, P., Melo-Ferreira, J., Glémin, S., et al. (2013). Reference-free population genomics from next-generation transcriptome data and the vertebrate–invertebrate gap. PLoS Genetics, 9, e1003457.

Gee, H. (2003). Evolution: ending incongruence. Nature, 425, 782.

Gnirke, A., Melnikov, A., Maguire, J., et al. (2009). Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology, 27, 182–9.

Godden, G. T., Jordon-Thaden, I. E. and Chamala, S. (2012). Making next-generation sequencing work for you: approaches and practical considerations for marker development and phylogenetics. Plant Ecology and Diversity, 5, 427–50.

Goloboff, P. A., Farris, J. S. and Nixon, K. C. (2008). TNT, a free program for phylogenetic analysis. Cladistics, 24, 774–86.

Goodman, M., Czelusniak, J., Moore, G. W., Romero-Herrera, A. E. and Matsuda, G. (1979). Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms from globin sequences. Systematic Zoology, 28, 132–63.

Grant, J. R. and Katz, L. A. (2014). Building a phylogenomic pipeline for the eukaryotic tree of life – addressing deep phylogenies with genome-scale data. PLoS Currents Apr, 6.

Gremme, G., Steinbiss, S. and Kurtz, S. (2013). GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10, 645–56.

Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. and Siepel, A. (2011). Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43, 1031–4.

Guindon, S. and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696–704.

Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. and Bustamante, C. D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics, 5, e1000695.

Harris, K. and Nielsen, R. (2013). Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genetics, 9, e1003521.

Heled, J. and Drummond, A. J. (2010). Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27, 570–80.

Hess, J. and Goldman, N. (2011). Addressing inter-gene heterogeneity in maximum likelihood phylogenomic analysis: yeasts revisited. PLoS One, 6, e22783.

Hobolth, A., Christensen, O. F., Mailund, T. and Schierup, M. H. (2007). Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genetics, 3, e7.

Holland, B. R. (2004). Using consensus networks to visualize contradictory evidence for species phylogeny. Molecular Biology and Evolution, 21, 1459–61.

Holland, B. R., Jarvis, P. D. and Sumner, J. G. (2012). Low-parameter phylogenetic inference under the general Markov model. Systematic Biology, 62, 78–92.

Horvath, J. E., Weisrock, D. W., Embry, S. L., et al. (2008). Development and application of a phylogenomic toolkit: resolving the evolutionary history of Madagascar's lemurs. Genome Research, 18, 489–99.

Hunt, M., Newbold, C., Berriman, M. and Otto, T. D. (2014). A comprehensive evaluation of assembly scaffolding tools. Genome Biology, 15, R42.

Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. and McVean, G. (2012). De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics, 44, 226–32.

Jeffroy, O., Brinkmann, H., Delsuc, F. and Philippe, H. (2006). Phylogenomics: the beginning of incongruence? Trends in Genetics, 22, 225–31.

Jones, M. O., Koutsovoulos, G. D. and Blaxter, M. L. (2011). iPhy: an integrated phylogenetic workbench for supermatrix analyses. BMC Bioinformatics, 12, 30.

Kao, R. R., Haydon, D. T., Lycett, S. J. and Murcia, P. R. (2014). Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends in Microbiology, 22, 282–91.

Koren, S., Harhay, G. P., Smith, T. P., et al. (2013). Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biology, 14, R101.

Kubatko, L. S., Carstens, B. C. and Knowles, L. L. (2009). STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics, 25, 971–3.

Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L. and Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–72.

Landan, G. and Graur, D. (2007). Heads or tails: a simple reliability check for multiple sequence alignments. Molecular Biology and Evolution, 24, 1380–3.

Lanfear, R., Calcott, B., Ho, S. Y. W. and Guindon, S. (2012). PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution, 29, 1695–701.

Lartillot, N. and Philippe, H. (2004). A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution, 21, 1095–109.

Latreille, P., Norton, S., Goldman, B. S., et al. (2007). Optical mapping as a routine tool for bacterial genome sequence finishing. BMC Genomics, 8, 321.

Lee, E. K., Cibrian-Jaramillo, A., Kolokotronis, S.-O., et al. (2011). A functional phylogenomic view of the seed plants. PLoS Genetics, 7, e1002411.

Lemmon, A. R., Brown, J. M., Stanger-Hall, K. and Lemmon, E. M. (2009). The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Systematic Biology, 58, 130–45.

Lemmon, A. R., Emme, S. A. and Lemmon, E. M. (2012). Anchored hybrid enrichment for massively high-throughput phylogenomics. Systematic Biology, 61, 727–44.

Lemmon, E. M. and Lemmon, A. R. (2013). High-throughput genomic data in systematics and phylogenetics. Annual Review of Ecology, Evolution, and Systematics, 44, 99–121.

Li, H. and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature, 475, 493–6.

Li, H., Handsaker, B., Wysoker, A., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–9.

Li, H. and Homer, N. (2010). A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 11, 473–83.

Li, L., Stoeckert, C. J. and Roos, D. S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research, 13, 2178–89.

Li, R., Zhu, H., Ruan, J., et al. (2010). De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20, 265–72.

Liu, K., Raghavan, S., Nelesen, S., Linder, C. R. and Warnow, T. (2009). Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science, 324, 1561–4.

Liu, L., Yu, L., Kubatko, L., Pearl, D. K. and Edwards, S. V. (2009). Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution, 53, 320–8.

Löytynoja, A. and Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America, 102, 10557–62.

Löytynoja, A. and Milinkovitch, M. C. (2001). SOAP: cleaning multiple alignments from unstable blocks. Bioinformatics, 17, 573–4.

Maddison, W. and Knowles, L. (2006). Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21–30.

Mallatt, J. M., Garey, J. R. and Shultz, J. W. (2004). Ecdysozoan phylogeny and Bayesian inference: first use of nearly complete 28S and 18S rRNA gene sequences to classify the arthropods and their kin. Molecular Phylogenetics and Evolution, 31, 178–91.

Mamanova, L., Coffey, A. J., Scott, C. E., et al. (2010). Target-enrichment strategies for next-generation sequencing. Nature Methods, 7, 111–18.

Manske, M., Miotto, O., Campino, S., et al. (2012). Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature, 487, 375–9.

McCormack, J. E., Hird, S. M., Zellmer, A. J., Carstens, B. C. and Brumfield, R. T. (2013). Applications of next-generation sequencing to phylogeography and phylogenetics. Molecular Phylogenetics and Evolution, 66, 526–38.

McVean, G. A. T. and Cardin, N. J. (2005). Approximating the coalescent with recombination. Philosophical Transactions of the Royal Society B-Biological Sciences, 360, 1387–93.

Medvedev, P., Stanciu, M. and Brudno, M. (2009). Computational methods for discovering structural variation with next-generation sequencing. Nature Methods, 6, S13–S20.

Miller, J. R., Koren, S. and Sutton, G. (2010). Assembly algorithms for next-generation sequencing data. Genomics, 95, 315–27.

Morrison, D. A. and Ellis, J. T. (1997). Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Molecular Biology and Evolution, 14, 428–41.

Mullikin, J. C. and Ning, Z. (2003). The Phusion Assembler. Genome Research, 13, 81–90.

Nguyen-Dumont, T., Pope, B. J., Hammet, F., Southey, M. C. and Park, D. J. (2013). A high-plex PCR approach for massively parallel sequencing. BioTechniques, 55, 69–74.

Nichols, R. (2001). Gene trees and species trees are not the same. Trends in Ecology and Evolution, 16, 358–64.

Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. and Clark, A. G. (2007). Recent and ongoing selection in the human genome. Nature Reviews Genetics, 8, 857–68.

Nielsen, R., Paul, J. S., Albrechtsen, A. and Song, Y. S. (2011). Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics, 12, 443–51.

Nosenko, T., Schreiber, F., Adamska, M., et al. (2013). Deep metazoan phylogeny: when different genes tell different stories. Molecular Phylogenetics and Evolution, 67, 223–33.

Nylander, J. A. A., Ronquist, F., Huelsenbeck, J. P. and Nieves-Aldrey, J.-L. (2004). Bayesian phylogenetic analysis of combined data. Systematic Biology, 53, 47–67.

Ogden, T. H. and Rosenberg, M. S. (2006). Multiple sequence alignment accuracy and phylogenetic inference. Systematic Biology, 55, 314–28.

Page, R. D. and Charleston, M. A. (1997). From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution, 7, 231–40.

Pagel, M. and Meade, A. (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Systematic Biology, 53, 571–81.

Parkhill, J. (2002). The importance of complete genome sequences. Trends in Microbiology, 10, 219–20; author reply 220.

Penny, D., McComish, B. J., Charleston, M. A. and Hendy, M. D. (2014). Mathematical elegance with biochemical realism: the covarion model of molecular evolution. Journal of Molecular Evolution, 53, 711–23.

Perkel, J. (2008). SNP genotyping: six technologies that keyed a revolution. Nature Methods, 5, 447–53.

Philip, G. K., Creevey, C. J. and McInerney, J. O. (2005). The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Molecular Biology and Evolution, 22, 1175–84.

Philippe, H., Delsuc, F., Brinkmann, H. and Lartillot, N. (2005a). Phylogenomics. Annual Review of Ecology, Evolution, and Systematics, 36, 541–62.

Philippe, H., Lartillot, N. and Brinkmann, H. (2005b). Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Molecular Biology and Evolution, 22, 1246–53.

Phillips, M. J. (2004). Genome-scale phylogeny and the detection of systematic biases. Molecular Biology and Evolution, 21, 1455–8.

Pisani, D. (2004). Identifying and removing fast-evolving sites using compatibility analysis: an example from the Arthropoda. Systematic Biology, 53, 978–89.

Pisani, D., Cotton, J. A. and McInerney, J. O. (2007). Supertrees disentangle the chimerical origin of eukaryotic genomes. Molecular Biology and Evolution, 24, 1752–60.

Pons, J., Barraclough, T., Gómez-Zurita, J., et al. (2006). Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology, 55, 595–609.

Pool, J. E., Hellmann, I., Jensen, J. D. and Nielsen, R. (2010). Population genetic inference from genomic sequence variation. Genome Research, 20, 291–300.

Posada, D. and Buckley, T. (2004). Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53, 793–808.

Qiu, Y.-L., Li, L., Wang, B., et al. (2006). The deepest divergences in land plants inferred from phylogenomic evidence. Proceedings of the National Academy of Sciences of the United States of America, 103, 15511–16.

Quinlan, A. R. and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–2.

Rannala, B. and Yang, Z. (2003). Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164, 1645–56.

Rannala, B. and Yang, Z. (2008). Phylogenetic inference using whole genomes. Annual Review of Genomics and Human Genetics, 9, 217–31.

Rhaesa, A. S., Bartolomaeus, T., Lemburg, C., Ehlers, U. and Garey, J. R. (1998). The position of the Arthropoda in the phylogenetic system. Journal of Morphology, 238, 263–85.

Rodríguez-Ezpeleta, N., Brinkmann, H., Roure, B., et al. (2007). Detecting and overcoming systematic errors in genome-scale phylogenies. Systematic Biology, 56, 389–99.

Rokas, A., Williams, B. L., King, N. and Carroll, S. B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature, 425, 798–804.

Rosenberg, M. S., ed. (2011). Sequence Alignment: Methods, Models, Concepts, and Strategies. Oakland, CA, University of California Press.

Rosenberg, N. A. and Nordborg, M. (2002). Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews Genetics, 3, 380–90.CrossRef Google Scholar PubMed

Roth, A. C., Gonnet, G. H. and Dessimoz, C. (2009). Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics, 10, 220.CrossRef Google Scholar

Roure, B., Baurain, D. and Philippe, H. (2012). Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Molecular Biology and Evolution, 30, 197–214.Google Scholar PubMed

Salichos, L. and Rokas, A. (2014). Inferring ancient divergences requires genes with strong phylogenetic signals. Nature, 497, 327–31.Google Scholar

Sankoff, D., Morel, C. and Cedergren, R. J. (1973). Evolution of 5S RNA and the non-randomness of base replacement. Nature New Biology, 245, 232–4.CrossRef Google Scholar PubMed

Scheinfeldt, L. B. and Tishkoff, S. A. (2013). Recent human adaptation: genomic approaches, interpretation and insights. Nature Reviews Genetics, 14, 692–702.CrossRef Google Scholar PubMed

Schiffels, S. and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nature Genetics, 46, 919–25.CrossRef Google Scholar PubMed

Schneeberger, K., Ossowski, S., Ott, F., et al. (2011). Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proceedings of the National Academy of Sciences of the United States of America, 108, 10249–54.Google Scholar PubMed

Scholtz, G. (2002). The Articulata hypothesis – or what is a segment?Organisms Diversity and Evolution, 2, 197–215.CrossRef Google Scholar

Shapiro, B. (2005). Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Molecular Biology and Evolution, 23, 7–9.

Simpson, J. T. and Durbin, R. (2010). Efficient construction of an assembly string graph using the FM-Index. Bioinformatics, 26, i367–73.

Simpson, J. T. and Durbin, R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Research, 22, 549–56.

Simpson, J. T., Wong, K., Jackman, S. D., et al. (2009). ABySS: a parallel assembler for short read sequence data. Genome Research, 19, 1117–23.

Smith, B. T., Harvey, M. G., Faircloth, B. C., Glenn, T. C. and Brumfield, R. T. (2013). Target capture and massively parallel sequencing of ultraconserved elements for comparative studies at shallow evolutionary time scales. Systematic Biology, 63, 83–95.

Sousa, V. and Hey, J. (2013). Understanding the origin of species with genome-scale data: modelling gene flow. Nature Reviews Genetics, 14, 404–14.

Spang, A., Saw, J. H., Jørgensen, S. L., et al. (2015). Complex Archaea that bridge the gap between prokaryotes and eukaryotes. Nature, 521, 173–9.

Stamatakis, A. (2014). RAxML Version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30, 1312–13.

Stamatakis, A., Hoover, P. and Rougemont, J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology, 57, 758–71.

Steel, M. (2005). Should phylogenetic models be trying to “fit an elephant”? Trends in Genetics, 21, 307–9.

Struck, T. H., Paul, C., Hill, N., et al. (2011). Phylogenomic analyses unravel annelid evolution. Nature, 471, 95–8.

Suchard, M. A. and Rambaut, A. (2009). Many-core algorithms for statistical phylogenetics. Bioinformatics, 25, 1370–6.

Swain, M. T., Tsai, I. J., Assefa, S. A., et al. (2012). A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nature Protocols, 7, 1260–84.

Swofford, D. L., Olsen, G. J., Waddell, P. J. and Hillis, D. M. (1996). Phylogenetic inference. In Molecular Systematics, ed. Hillis, D. M., Moritz, C. and Mable, B. K. Sunderland, MA, Sinauer Associates; pp. 407–515.

Szöllősi, G. J., Tannier, E., Daubin, V. and Boussau, B. (2015). The inference of gene trees with species trees. Systematic Biology, 64, e42–e62.

Taylor, D. J. (2004). An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Molecular Biology and Evolution, 21, 1534–7.

Telford, M. J., Bourlat, S. J., Economou, A., Papillon, D. and Rota-Stabelli, O. (2008). The evolution of the Ecdysozoa. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 363, 1529–37.

Tewhey, R., Warner, J. B., Nakano, M., et al. (2009). Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nature Biotechnology, 27, 1025–31.

The 1000 Genomes Project Consortium (2013). An integrated map of genetic variation from 1,092 human genomes. Nature, 490, 56–65.

Thompson, J. D., Linard, B., Lecompte, O. and Poch, O. (2011). A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One, 6, e18093.

Thompson, J. F. and Milos, P. M. (2011). The properties and applications of single-molecule DNA sequencing. Genome Biology, 12, 217.

Timme, R. E., Bachvaroff, T. R. and Delwiche, C. F. (2012). Broad phylogenomic sampling and the sister lineage of land plants. PLoS One, 7, e29696.

Treangen, T. J. and Salzberg, S. L. (2011). Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics, 13, 36–46.

Trivedi, U. H. (2014). Quality control of next-generation sequencing data without a reference. Frontiers in Genetics, 5, 111.

Turner, E. H., Ng, S. B., Nickerson, D. A. and Shendure, J. (2009). Methods for genomic partitioning. Annual Review of Genomics and Human Genetics, 10, 263–84.

Vilella, A. J., Severin, J., Ureta-Vidal, A., et al. (2008). EnsemblCompara genetrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Research, 19, 327–35.

Vitti, J. J., Grossman, S. R. and Sabeti, P. C. (2013). Detecting natural selection in genomic data. Annual Review of Genetics, 47, 97–120.

Watson, M. (2014). Quality assessment and control of high-throughput sequencing data. Frontiers in Genetics, 5, 235.

Westesson, O., Barquist, L. and Holmes, I. (2012). HandAlign: Bayesian multiple sequence alignment, phylogeny and ancestral reconstruction. Bioinformatics, 28, 1170–1.

Wheeler, W. C. and Gladstein, D. S. (1994). MALIGN: a multiple sequence alignment program. Journal of Heredity, 85, 417–18.

Whelan, N. V., Kocot, K. M., Moroz, L. L. and Halanych, K. M. (2015). Error, signal, and the placement of Ctenophora sister to all other animals. Proceedings of the National Academy of Sciences of the United States of America, 112, 5773–8.

Whelan, S. (2008). Spatial and temporal heterogeneity in nucleotide sequence evolution. Molecular Biology and Evolution, 25, 1683–94.

Whitelaw, C. A., Barbazuk, W. B., Pertea, G., et al. (2003). Enrichment of gene-coding sequences in maize by genome filtration. Science, 302, 2118–20.

Wiegmann, B. M., Trautwein, M. D., Winkler, I. S., et al. (2011). Episodic radiations in the fly tree of life. Proceedings of the National Academy of Sciences of the United States of America, 108, 5690–5.

Wilkinson, M. (2006). Identifying stable reference taxa for phylogenetic nomenclature. Zoologica Scripta, 35, 109–12.

Williams, T. A., Foster, P. G., Cox, C. J. and Embley, T. M. (2014). An archaeal origin of eukaryotes supports only two primary domains of life. Nature, 504, 231–6.

Williams, T. A., Foster, P. G., Nye, T. M. W., Cox, C. J. and Embley, T. M. (2012). A congruent phylogenomic signal places eukaryotes within the Archaea. Proceedings of the Royal Society B – Biological Sciences, 279, 4870–9.

Wong, K. M., Suchard, M. A. and Huelsenbeck, J. P. (2008). Alignment uncertainty and genomic analysis. Science, 319, 473–6.

Wood, D. E. and Salzberg, S. L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology, 15, R46.

Wu, M., Chatterji, S. and Eisen, J. A. (2012). Accounting for alignment uncertainty in phylogenomics. PLoS One, 7, e30288.

Wu, M. and Eisen, J. A. (2008). A simple, fast, and accurate method of phylogenomic inference. Genome Biology, 9, R151.

Yalcin, B., Adams, D. J., Flint, J. and Keane, T. M. (2012). Next-generation sequencing of experimental mouse strains. Mammalian Genome, 23, 490–8.

Yang, Z. (1996a). Maximum-likelihood models for combined analyses of multiple sequence data. Journal of Molecular Evolution, 42, 587–96.

Yang, Z. (1996b). Among-site rate variation and its impact on phylogenetic analyses. Trends in Ecology and Evolution, 11, 367–72.

Zerbino, D. R. and Birney, E. (2008). Velvet: algorithms for de novo short read assembly using De Bruijn graphs. Genome Research, 18, 821–9.

Zhou, X. and Rokas, A. (2014). Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Molecular Ecology, 23, 1679–700.