Structure, not Bias

Published online by Cambridge University Press: 12 October 2017

Steven M. Holland*
Affiliation:
Department of Geology, University of Georgia, Athens, Georgia 30602-2501, USA

Type
Presidential Address
Copyright
Copyright © 2017, The Paleontological Society 

Darwin’s Origin of Species, Chapter 9. We have all read it. It is the one where Darwin points out the poor quality of the fossil record, how it does not sufficiently support his ideas. Biologists have also read it. For most scientists, Darwin’s words are their most lasting impression of the fossil record. Compounding matters, we have underscored and emphasized Darwin’s point for the past 150 years by routinely highlighting incompleteness and bias. And if bias was not good enough at scaring off the biologists, we have added megabias.

Through my career, I have considered the nature of the fossil record, what we call bias, and how we respond to it. Tonight, I want to suggest that we take a different path.

As a record of everything that has ever lived on earth, the fossil record is an imperfect and incomplete data set. We know this. However, all data sets are incomplete. For example, we often make a point in our classes about the rarity of fossilization, that only a tiny fraction of organisms that have lived are fossilized. This is true, but it is also true that only a tiny fraction of organisms alive today, much less over the history of earth, will ever be sequenced. We would never argue that all organisms or even that all species must be sequenced for molecular data to be useful. When we emphasize the rarity of fossilization, we hold the fossil record to an unfair standard.

We could examine data sets from many other fields, and if we approached those data as we approach our own, we would find incompleteness and bias. Our exaggerated emphasis on the imperfection of the fossil record feeds the perception among scientists in general that the fossil record is an unusually poor data set. It isn’t.

Part of this perception comes from how we understand what we have called bias, and how we respond to it. All data sets have a distribution. For some data, the distribution may be simple, like a normal or an exponential. When we analyze those data, we are required to use methods that are appropriate for that distribution. When we choose those methods, we are said to be specifying a model of the data. This is important, because if our data do not have the distribution required by our methods, the problem is not in the data—the problem is that we have chosen the wrong way to analyze them.
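The point about matching the method to the distribution can be made concrete with a minimal sketch. It assumes simulated values drawn from an exponential distribution (not real fossil data), and the function and variable names are illustrative only. Both a normal and an exponential model are fitted by maximum likelihood, and their log-likelihoods are compared.

```python
import math
import random


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential model (rate = 1 / mean)."""
    n = len(data)
    mean = sum(data) / n
    return -n * (math.log(mean) + 1.0)


def normal_loglik(data):
    """Maximized log-likelihood of a normal model (MLE mean and variance)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


# Simulated, hypothetical values with an exponential distribution (mean 5).
rng = random.Random(42)
values = [rng.expovariate(1.0 / 5.0) for _ in range(500)]

ll_exp = exponential_loglik(values)
ll_norm = normal_loglik(values)

print(f"exponential model log-likelihood: {ll_exp:.1f}")
print(f"normal model log-likelihood:      {ll_norm:.1f}")
```

The values are the same in both fits; only the model changes. The lower log-likelihood of the normal model is a symptom of a misspecified model, not of bad data.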

For example, in Bayesian phylogenetic methods, one has to make a set of assumptions called the priors, and one common assumption is the probability of fossil preservation through time, which is generally treated as a constant. There are other priors as well, and a model for each of those must also be specified. If these assumptions or priors are not valid, the approach of molecular phylogeneticists is not to say that the data are biased, but that the model is misspecified. Instead of stopping there, they revise the model so that it better reflects the nature of the world. We need to do the same. We need to think less about bias as an end, and more about model specification as a way forward.

We have a long history of focusing on bias and incompleteness, but we ought to be focusing on the structure of the fossil record, how the fossil record is actually assembled. Considering that structure will help us to be better at model specification, better at interpreting the fossil record.

We already know much about the structure of the fossil record. For starters, the fossil record is, by and large, time-averaged. For invertebrates, a typical bed contains organisms that lived over a time span on the order of a century (Kowalewski and Bambach, 2003). That structure imposes a lower limit on what we can resolve and therefore what processes we can study. On longer time scales, sequence stratigraphic architecture is the main control on the occurrence of fossils (Holland, 2000), and on even longer time scales, basin formation is what matters most (Holland, 2016). Knowing this structure will let us frame problems that we can test.

A skeptic might say that thinking about the fossil record in terms of structure and model specification rather than bias and incompleteness is merely the swapping of words, but it is much more than that. A focus on structure and model specification reflects a change in outlook and strategy, one that will improve our analyses. It will also improve how other scientists view what can be done with the fossil record and what they think about the science of paleontology in general. Our aim should be to emphasize how the fossil record informs us, not that it is biased. We also need to consider what our biological colleagues hear from us and what they see in print. Titles that shout incompleteness, bias, and megabias do us no favors.

I am not arguing that we should ignore the nature or quality of the fossil record. Absolutely we should consider them; an attention to the nature of our data is one of the strengths of our field. But we need to go beyond that, far beyond that. When we stop there and write yet another paper about bias in the fossil record, that is what our colleagues hear. When they hear this repeatedly, they conclude that the fossil record is not worth bothering with. We need to go the next step by sampling and analyzing the fossil record with its structure in mind. We need to use that structure to answer questions about the history of life over those long time scales where paleontology excels. We have real success stories, people that are already doing this, and these guide our way forward.

Conservation paleobiology is my first example. So many taphonomic studies of the 1970s and 1980s and onward catalogued the many ways in which the fossil record is so different from a modern ecological field sample. It was a message of bias and incompleteness, that our data would never satisfy a modern ecologist. Through her comprehensive examinations of live-dead comparisons, Susan Kidwell (2002, 2013) showed that the fossil record contains a high-fidelity record of species richness and especially abundance, a pattern both unexpected and most welcome. The field of conservation paleobiology is now a robust one, a model of how the fossil record is directly useful for establishing baselines for modern ecological studies. The key was to embrace the structure of the record. The key was that time-averaging is good; rather than apologize for it, we need to capitalize on it.

My second example comes from stratigraphic paleobiology (Patzkowsky and Holland, 2012). We have a tremendous desire to understand why ecosystems go off the rails during mass extinctions and biotic invasions, and how they recover afterwards. In the past, the tendency had been to go through a single stratigraphic column, documenting the upward changes in the fossils and treating that as a simple history or time series. We now know that most of these stratigraphic changes in faunal composition are the result of sampling different environments over time. By knowing the sequence stratigraphic architecture, we can now design sampling strategies that let us distinguish these environmental changes from temporal changes within one environment. This is not simply removing a bias: stratigraphic paleobiology lets us understand ecological changes over time across an entire landscape, as well as the variation among environments in their response to the same disturbance. Taking into account the structure of the record provides us with a richer interpretation.

My third example comes from phylogenetic studies, where, as I mentioned earlier, a common assumption is that preservation is constant over the earth and through time. Peter Wagner and Jonathan Marcot (2013) showed how, with a relatively simple segregation of their data in time bins on different continents, they could allow preservation probability to vary through time and space, producing superior estimates of divergence times. Others have had similar success (Sansom et al., 2014; Silvestro et al., 2014), and macrostratigraphy (Peters, 2006) has great promise for allowing these kinds of approaches to be done more widely. All of these hinge on understanding and embracing the structure of the fossil record and the sedimentary record in which it is found.
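The contrast between constant and varying preservation can be illustrated with a toy calculation. This is a minimal sketch, not the method of Wagner and Marcot or of any other cited study. It assumes fossil finds in each time bin follow a Poisson distribution with mean equal to a sampling rate times the sampled lineage duration. All counts, durations, and names are hypothetical. A single constant rate is compared with one rate per bin using AIC, where a lower value indicates the better-supported model.

```python
import math


def poisson_loglik(finds, exposures, rates):
    """Log-likelihood of fossil counts under Poisson sampling.

    Each bin has an expected count of rate * exposure.
    """
    total = 0.0
    for k, t, r in zip(finds, exposures, rates):
        mu = r * t
        # k * log(mu) is taken as 0 when k == 0, which avoids log(0).
        log_term = k * math.log(mu) if k > 0 else 0.0
        total += log_term - mu - math.lgamma(k + 1)
    return total


def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate better support."""
    return 2 * n_params - 2 * loglik


# Hypothetical data for four time bins: number of fossil finds and the
# summed lineage duration available to be sampled (in millions of years).
finds = [12, 40, 7, 55]
exposures = [20.0, 22.0, 18.0, 25.0]

# Model 1: one sampling rate shared by all bins (maximum-likelihood estimate).
constant_rate = sum(finds) / sum(exposures)
ll_constant = poisson_loglik(finds, exposures, [constant_rate] * len(finds))

# Model 2: a separate sampling rate for each bin.
varying_rates = [k / t for k, t in zip(finds, exposures)]
ll_varying = poisson_loglik(finds, exposures, varying_rates)

aic_constant = aic(ll_constant, 1)
aic_varying = aic(ll_varying, len(finds))

print(f"constant-rate model AIC: {aic_constant:.1f}")
print(f"varying-rate model AIC:  {aic_varying:.1f}")
```

With these made-up counts the varying-rate model has the lower AIC despite its extra parameters. In this framing the unevenness of the record is something the model describes, not a defect in the data.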

As paleontologists, we have an extraordinary data set at our disposal, and we have the expertise to understand it. We have something that no other field of biology has—time, deep time—and we need to play to that strength. We have access to worlds far different from our own, with biotas, geographies, and climates unlike anyone has seen. All of these offer opportunities to test ideas about how the biological world operates. We cannot test every modern biological process, because some of them operate on a time scale far too fast for us to resolve, but we can test those processes that operate over longer expanses of time that are utterly inaccessible to modern biology. This is what we have to offer. The start is to think about the structure of the fossil record and use that to frame our tests. That structure does not inhibit our analyses; it should guide how we do them. It is time for us to move on from bias and focus on structure.

References

Holland, S.M., 2000, The quality of the fossil record—a sequence stratigraphic perspective, in Erwin, D.H., and Wing, S.L., eds., Deep Time: Paleobiology’s Perspective: Lawrence, Kansas, The Paleontological Society, p. 148–168.
Holland, S.M., 2016, The non-uniformity of fossil preservation: Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, v. 371, no. 20150130–11.
Kidwell, S.M., 2002, Time-averaged molluscan death assemblages: palimpsests of richness, snapshots of abundance: Geology, v. 30, p. 803–806.
Kidwell, S.M., 2013, Time-averaging and fidelity of modern death assemblages: building a taphonomic foundation for conservation palaeobiology: Palaeontology, v. 56, p. 487–522.
Kowalewski, M., and Bambach, R.K., 2003, The limits of paleontological resolution, in Harries, P.J., ed., Approaches in High-Resolution Stratigraphic Paleontology: Dordrecht, Netherlands, Kluwer Academic Publishers, p. 1–48.
Patzkowsky, M.E., and Holland, S.M., 2012, Stratigraphic Paleobiology: Understanding the Distribution of Fossil Taxa in Time and Space: Chicago, The University of Chicago Press, 256 p.
Peters, S.E., 2006, Macrostratigraphy of North America: The Journal of Geology, v. 114, p. 391–412.
Sansom, R.S., Randle, E., and Donoghue, P.C.J., 2014, Discriminating signal from noise in the fossil record of early vertebrates reveals cryptic evolutionary history: Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, v. 282, no. 20142245.
Silvestro, D., Schnitzler, J., Liow, L.H., Antonelli, A., and Salamin, N., 2014, Bayesian estimation of speciation and extinction from incomplete fossil occurrence data: Systematic Biology, v. 63, p. 349–367.
Wagner, P.J., and Marcot, J.D., 2013, Modelling distributions of fossil sampling rates over time, space and taxa: assessment and implications for macroevolutionary studies: Methods in Ecology and Evolution, v. 4, p. 703–713.