Non-technical Summary
The study of the history of life (paleobiology) relies heavily on data preserved in various forms. Over the past 50 years, there has been a big shift toward using computers and other technology to store and analyze these data, opening up new possibilities for research. As a result, the amount of data available has increased dramatically. However, with this growth comes the responsibility to make sure everyone in the paleobiological community can collect, store, share, analyze, and use data in a fair and sustainable way. This review looks at how well we have done in this regard over the past five decades and what challenges still lie ahead. While progress has been made in creating tools for sharing digital data, there are still many issues we must address. These include the process of how fossil data are collected and biases based on social and economic factors (e.g., wealth and access to resources). To address these challenges, everyone in the paleobiological community needs to work together. We provide suggestions for actions that individuals, their teams, academic journals, and societies can take to promote equity in the field now and into the future.
Introduction
The history of life on Earth is uniquely preserved in paleobiological data. These data take numerous forms, including taxonomic, anatomical, molecular, morphological, (paleo)ecological, geographic, and stratigraphic. They have been used for centuries to answer fundamental and diverse questions about biodiversity and evolutionary patterns throughout the Phanerozoic and beyond (Phillips 1860; Raup 1972; Sepkoski 1996; Betts et al. 2018; Cohen and Kodner 2022; Finnegan et al. 2024).
The journal Paleobiology was founded in 1975 just as the field of paleobiology, and paleontology more broadly, was undergoing a computational revolution. This revolution opened multiple new avenues for paleobiologists to record, store, and analyze paleobiological data (Davies et al. 2017; Pandolfi et al. 2020). The research community that Paleobiology now encompasses builds on this rich history to explore exciting and dynamic research rooted in data-driven and computational methods (Raup 1991; Cunningham et al. 2014; Davies et al. 2017; Dillon et al. 2023). Over the last 50 years, as the variety of research questions and approaches has increased (e.g., Seddon et al. 2014), so too have compilations of paleobiological data. With an ever-growing amount of data, it is critical that our tools, systems, and repositories keep up with growing research demand and continue to innovate in order to best serve our diverse community (Payne et al. 2012; Seddon et al. 2014; Kaufman and PAGES 2k Special-Issue Editorial Team 2018; Smith et al. 2023b).
Data Equity
Data equity is a growing movement for more responsible data work, from data collection and storage to analysis and data-driven decision making (Jagadish et al. 2023; data.org 2024). “Equity” is the term given to the pursuit of ensuring fair treatment, equality of opportunity, and fairness in access to information and resources for all. Achieving equity in science, including paleobiology, is an ongoing challenge that requires sustained action from across the scientific community (Sugimoto et al. 2017; Bernard and Cooperdock 2018; Dutt 2020; North et al. 2020; Posselt 2020; Ranganathan et al. 2021; Muralidhar and Ananthanarayanan 2024). Equity strengthens science in a multitude of interconnected ways, providing diversity of experiences, knowledge, and skills (Dutt 2018; A.-M. Núñez et al. 2020; Emery et al. 2021; M. A. Nuñez et al. 2024).
Data equity is an essential component of equity in science. We define data equity as the responsible, accessible, and sustainable collection, sharing, analysis, and use of scientific data. In this article, we present an overview of data equity in paleobiology, focusing on the kinds of research typically published by Paleobiology. We outline where significant progress has been and is being made and point to current and future challenges that our field must systemically address in order to increase data equity over the next half century and beyond.
Data Collection
Fossil Collection and Study
Fossils are the primary unit of paleobiological data, and their associated geographic, stratigraphic, ecological, and morphological data underpin a vast array of paleobiological studies (i.e., the “extended specimen”; Webster 2017; Lendemer et al. 2020). The collection and documentation of fossil specimens is therefore foundational to any paleobiological investigation on any scale (Fig. 1). Principles of data equity apply to paleobiological data collection in that all data collection should be carried out in a responsible and sustainable way that is widely accessible to all paleobiologists.
It is widely documented that the known fossil record provides an invaluable but imperfect view of how biodiversity has changed over Earth’s history, due to various taphonomic, geological, and anthropogenically introduced sampling biases (Raup 1972; Behrensmeyer et al. 2000; Alroy et al. 2001; Smith and McGowan 2011; Vilhena and Smith 2013; Close et al. 2018; Whitaker and Kimmig 2020; Benson et al. 2021). While there is a large and growing body of research on how quantitative methods can alleviate some of these limitations imposed by fossil record biases (e.g., Warnock et al. 2020; Smith et al. 2022; Antell et al. 2023; Dillon et al. 2023; Reitan et al. 2024), considerably less attention has been paid to how we, as paleobiologists, impose additional biases through data collection and handling that are far less easy to solve through quantitative means. Inequities and global geographic biases in how paleobiological data are collected can influence downstream analysis of global/regional patterns.
There is a long history of fossils being collected by humans (e.g., Cortés-Sánchez et al. 2020), but the scale of collection increased dramatically as paleontology developed around an extractive process connected to mining, quarrying, and systematic mapping surveys (Schofer 2003; Manias 2015; Das and Lowe 2018; Monarrez et al. 2022; Stewens et al. 2022). Over centuries, this extraction and exploitation of fossil-rich and paleontologically significant regions of the world has continued, facilitated in particular by European colonialism in the nineteenth century (Aldrich 2009; Manias 2015; Zuroski 2017; Das and Lowe 2018; Yen 2024). Today, due to the fundamental nature of fossil specimens (as limited, sought-after physical specimens), their extraction continues to underpin paleobiological research. However, there is a growing awareness within the paleobiological community regarding the connection between fossil collecting and issues related to equity, ethics, socioeconomics, legality, environmental degradation, and distress for Indigenous communities (Bradley 2010; Cisneros et al. 2022; Dunne et al. 2022; Monarrez et al. 2022; Raja et al. 2022; Kempf et al. 2023).

Figure 1. Simplified flowchart illustrating generalized steps in paleobiological research processes and the various factors that introduce inequity with regard to data collection, storage, study, analysis, publication, and reuse. Note: inequitable factors may be relevant at more steps than indicated, but are anticipated to be acute where included. FAIR, Findable, Accessible, Interoperable, and Reusable; TRUST, Transparency, Responsibility, User focus, Sustainability, and Technology.
Global Inequalities in Knowledge Generation
Compilations of paleontological data exhibit a strong association between the production of published knowledge and wealthier, more politically stable countries, especially in North America and western Europe (Raja et al. 2022). This same asymmetrical pattern is also seen in disciplines allied to paleobiology, such as in modern biodiversity data compilations (Boakes et al. 2010; Amano and Sutherland 2013; Hughes et al. 2021; Trisos et al. 2021). Many socioeconomic factors related to wealth, access to education, security, and working conditions determine who can participate in scientific research (Bernard and Cooperdock 2018; Nuñez et al. 2021; Valenzuela-Toro and Viglino 2021). If we are to work toward a world with equitable knowledge production in paleobiology, it is important to not only be aware of these entrenched biases when working with paleobiological data, but also to actively work to mitigate and counteract them in our own research (e.g., more equitable sharing of funding, tools, and training; Table 1).
Table 1. Recommended actions for improving and enhancing data equity in paleobiology

Parachute science (also scientific colonialism or expropriation) refers to when researchers, typically from higher-income countries, “drop in” on lower-income countries to extract scientific material and do so without acknowledging local expertise or connecting with local communities (Stefanoudis et al. 2021; Asase et al. 2022; Cisneros et al. 2022). This practice is prevalent in paleontology and can threaten both the ethical and legal integrity of paleobiological data (Cisneros et al. 2022; Dunne et al. 2022; Raja and Dunne 2022; Raja et al. 2022). One such recent example is the widely publicized case of “Ubirajara jubatus”, a dinosaur fossil removed from Brazil contrary to long-established national laws and housed in Germany while being studied by German and British researchers until its eventual repatriation to Brazil in 2023 (Pérez Ortega 2022). Parachute science can also threaten the scientific integrity of the data; for instance, important contextual stratigraphic or geographic information might be overlooked or missing, a situation that could be greatly improved through collaboration with local experts. In the case of “Ubirajara jubatus”, the fossil specimen was left without a valid taxonomic name following the retraction of the original publication at the onset of the legal investigation (Cisneros et al. 2022).
While these issues appear at first to be restricted to fossil specimens and not their associated data, there is a growing informal discussion around the most ethical way to handle these data when they are part of larger compilations (e.g., see Dunne et al. 2022). As awareness grows, more actions, such as repatriation of fossil specimens, are being undertaken to remedy unethical and illegal actions (Harris 2015; Cisneros et al. 2021; Stewens et al. 2022). However, extraction goes beyond material objects. For example, parachute science may not only plunder geological, paleontological, or biological specimens, but also local knowledge about prospective study sites and how to navigate to those areas (Cisneros et al. 2022; Nóbrega et al. 2023; Raposo et al. 2023; Coningham et al. 2024).
(Un)ethical Collection Practices
Fossils have long been used not only for scientific reasons, but also for cultural and commercial purposes, notably by Indigenous peoples across the world (Mayor 2005; Cortés-Sánchez et al. 2020). Extractive practices have led to a worldwide paucity of input from Indigenous peoples and local communities in the generation of scientific and paleobiological data (Jennings et al. 2023; Carvalho et al. 2023; Kempf et al. 2023), highlighting a global lack of Indigenous Data Sovereignty. For example, the United States has a long history of fossil dispossession from Indigenous peoples in North America (Bradley 2010; Kempf et al. 2023). Data sovereignty, in the most general sense, ensures that data are subject to the laws and governance structures of the country or nation where they are collected. Indigenous Data Sovereignty is the right of Indigenous peoples to own and govern data about their communities, resources, and lands, meaning they are in control of how these data are accessed and used (Kukutai and Taylor 2016; Smith 2016; Rainie et al. 2019; McCartney et al. 2022; Diviacchi 2023).
Indigenous Data Sovereignty can be implemented through Indigenous data governance, which respects and leverages the values, traditions, and roles that communities have for the care and use of their data (Carroll et al. 2019; Jennings et al. 2023). The CARE Principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, and Ethics) were developed to ensure that data collected on Indigenous lands will ultimately benefit the peoples of those lands and that this will be conducted in a manner that is not harmful to their communities (Carroll et al. 2020; www.gida-global.org/care). The CARE Principles guide researchers to include Indigenous peoples in data governance while increasing their access to, use of, and benefit from these data (Carroll et al. 2021). Adopting the CARE Principles in paleobiological research is critical for establishing more mutually beneficial research projects involving Indigenous lands across the world (Jennings et al. 2023; Kempf et al. 2023). First, nonlocal paleobiologists must be highly proactive in their outreach to and engagement with local Indigenous peoples, tribes, and tribal-serving organizations, as it is the right of these groups to decide what data can be collected and shared and how that may be undertaken (Table 1). Visiting paleobiologists then need to work to understand and respect the wishes of Indigenous communities, especially when restrictions are placed on collection and sharing (publishing) of certain data (Jennings et al. 2023; Kempf et al. 2023). Implementation of the CARE Principles, in combination with locally tailored resources (e.g., the AIATSIS Code for Australian territories), is already reframing research partnerships and data stewardship in ecology, conservation science, and the geosciences beyond paleobiology (Taitingfong and Carroll 2023; O’Brien et al. 2024). In paleobiology, there is enormous potential to apply these principles to build more ethical data stewardship practices, infrastructures, and technologies.
Fossils are essential for all kinds of paleobiological research, yet not all fossils are made available for scientific use. Some argue that commercial fossil collecting (i.e., collecting fossils to sell for profit) ultimately benefits both science and the seller (Larson and Russell 2014), but many others are concerned about the loss of these specimens to science and the public, as it poses a threat to data equity and accessibility (Shimada et al. 2014). Numerous high-profile auctions of exceptionally complete vertebrate fossils have drawn criticism from the paleontological community, particularly with regard to their enormous price tags (Reynolds 2018; Greshko 2020). At millions of U.S. dollars, these specimens are often far outside the budget of natural history museums, which leads to the loss of paleobiological data to commercial ventures (Lukiv 2024). Even if some scientists were to have the financial means to buy certain fossils (e.g., specimens with lower price tags), this would eventually create a hierarchy among scientists: wealthier scientists would be able to gather more data than those without such resources. While the impact of these activities is primarily felt by vertebrate paleontologists, unethical practices related to commercial interests can lead to other critical issues, such as irreparable damage and loss of access to fossil sites (Raja and Dunne 2022; Swallow et al. 2023), which disproportionately affects countries with fewer resources to designate and protect important fossil sites and perpetuates global data inequity (Kumar 2018; Gutiérrez-Marco and García-Bellido 2022).
Building awareness of these issues is a necessary first step in improving data-collection practices in paleobiology and moving toward equity. Several paleontological societies have committed to developing and providing guidelines to their memberships. For example, the Society of Vertebrate Paleontology provides guidance documents for working with and publishing on amber from Myanmar, as well as the commercial sale of vertebrate fossils (see www.vertpaleo.org/governance-documents). These guidelines were developed through specially formed working groups composed of researchers with experience in these areas and have since been applied to the society’s journal, the Journal of Vertebrate Paleontology (Barrett and Johanson 2020). However, these actions can come with a significant time lag between the catalyst and implementation. In the case of Myanmar amber, the human rights abuses associated with amber mining in the north of the country were reported on by the United Nations several years before the paleontological community began developing guidance, and the commercial nature of fossils in Myanmar amber had been widely known for decades (Zin-Maung-Maung-Thein and Khin Zaw 2021; Dunne et al. 2022). Other academic publishers are becoming increasingly aware of unethical and inequitable paleobiological data collection and are establishing stricter editorial standards, including requesting official documents outlining fossil provenance and ethical declarations. For example, Nature Ecology and Evolution and Palaeontology have strict editorial policies on the publication of research surrounding Myanmar amber.
The majority of other journals catering to paleobiology currently lag further behind in their implementation of such policies, highlighting how much work is yet to be done to preserve ethical and equitable data collection and sharing (Table 1).
Data Storage and Curation
Paleobiological Databases
Paleobiological data come in various forms and are stored in a variety of different places: physically (as specimens in museums and research institutions) and digitally (in large databases, various dedicated online repositories, and supplementary files). The way in which paleobiological data are stored, maintained, and managed has far-reaching implications for data equity. The issue of data equity is an integral component of the FAIR Data Principles, which are aimed at making data Findable, Accessible, Interoperable, and Reusable (Wilkinson et al. 2016). These principles, which have rapidly come to define best practices in the management of research data, are relevant to all stakeholders involved with paleobiological data, including data collectors, curators, managers, publishers, repositories, and users. However, they have not yet been widely applied across all facets of our field, including as a required framework for data management and sharing in funding applications and as standard principles in research institutions (Table 1).
The establishment of online data repositories, such as the Paleobiology Database (PBDB; paleobiodb.org; Uhen et al. 2023), Neotoma (neotoma.org), Triton (Fenton et al. 2021), the Geobiodiversity Database (www.geobiodiversity.com; Fan et al. 2013), DigiMorph (https://digimorph.org), and MorphoSource (Boyer et al. 2016; https://www.morphosource.org), has undoubtedly increased the accessibility and usability of paleobiological data over the last half century. The majority of these databases were instigated by, are physically stored in, and are managed by teams based in countries of the Global North, particularly in North America and western Europe (Fig. 2). Many paleobiological databases are established with a particular goal in mind (e.g., BioDeepTime; biodeeptime.github.io: cross-scale time-series analysis; Smith et al. 2023a) or were developed to answer specific paleobiological questions (e.g., PBDB, magnitude of Phanerozoic diversification; Alroy et al. 2001). Over time, paleobiological databases can morph and expand in various directions, often surpassing their original purposes, and new databases are created for new research purposes. This dynamism poses challenges for the future of these repositories, such as increased need for better data integration, infrastructure updates, and long-term financial support, all of which have been highlighted extensively in biodiversity, ecology, and conservation science (Kamp et al. 2016; Kindsvater et al. 2018; Peterson and Soberón 2018; Isaac et al. 2020). This is further compounded by the current international funding landscape, which often does not cater to expenses related to digital infrastructure, producing disproportionate effects on the development of data infrastructure in regions of the Global South (Fig. 2). A notable exception in domestic funding is the NSF Geoinformatics funding scheme in the United States, which supports the development of community cyberinfrastructure to advance research and education in Earth science (https://new.nsf.gov/funding/opportunities/gi-geoinformatics). These challenges also point to the need for a multifaceted approach to paleobiological data storage and curation (e.g., systems that can integrate multiple forms of data input). Although several initiatives have attempted to do this for paleobiological data (e.g., ePANDDA and iDigBio; www.idigbio.org), an all-inclusive “data lake” for integration of data from across paleobiology and allied sciences is currently lacking and could facilitate even greater synthesis research in paleobiology and beyond. Importantly, this would also greatly enhance data equity through increased accessibility and data sharing (Drew et al. 2017).

Figure 2. Locations of non-governmental/community-developed digital databases that store paleobiological data or are regularly associated with studies in paleobiology. A tile grid map was used to avoid distorting the representation of the data that is typical of standard map projections.
Museum Collection Data
Large fossil occurrence databases, such as the PBDB and Neotoma, are invaluable resources for studies of past biodiversity. However, they rely primarily on published literature, which represents only a small proportion of the paleobiological data housed in natural history collections. One study conducted several decades ago demonstrated that the published record underestimates diversity within a specific time or geographic interval by three to five times, depending on the taxonomic group (Koch 1978). A more recent survey found that 9 museum collections in the United States held up to 23 times more unique localities for marine invertebrates than were contained in the equivalent geographic region in the PBDB (Marshall et al. 2018), highlighting how much data could be mobilized from museum collections, particularly invertebrate data. Indeed, some biodiversity databases, such as GBIF (gbif.org) and iDigBio (idigbio.org), do integrate specimen information alongside occurrence data, which could provide feasible examples for paleobiological databases.
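As a rough illustration of the kind of comparison reported by Marshall et al. (2018), the following Python sketch counts unique localities in two sources and identifies those found only in the museum records. The record structure, the coordinate-rounding rule used to define a locality, and all values are hypothetical; none are drawn from a real collection or from the PBDB.

```python
def locality_key(lat, lon, precision=2):
    """Round coordinates so that near-identical points count as one locality."""
    return (round(lat, precision), round(lon, precision))


def unique_localities(records, precision=2):
    """Return the set of distinct locality keys in a list of records."""
    return {locality_key(r["lat"], r["lon"], precision) for r in records}


# Hypothetical toy records; real data would come from collection catalogs
# and from a literature-based occurrence database.
museum_records = [
    {"lat": 34.051, "lon": -118.242},
    {"lat": 34.052, "lon": -118.243},  # same locality as above after rounding
    {"lat": 36.778, "lon": -119.418},
    {"lat": 37.774, "lon": -122.419},
    {"lat": 32.716, "lon": -117.161},
]
published_records = [
    {"lat": 34.05, "lon": -118.24},
    {"lat": 36.78, "lon": -119.42},
]

museum = unique_localities(museum_records)
published = unique_localities(published_records)

# Localities documented in collections but absent from the published source.
museum_only = museum - published
ratio = len(museum) / len(published)

print(f"museum: {len(museum)}, published: {len(published)}, "
      f"museum-only: {len(museum_only)}, ratio: {ratio:.1f}")
```

How a "locality" is defined (here, coordinates rounded to two decimal places) strongly affects such counts, so any real comparison would need to state and justify that choice.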
Mobilizing “dark data” has the potential to shine a light on underutilized fossil material that, due to a lack of resources, could be missing from the published literature, especially from institutions in the Global South (Kaiser et al. 2023). Yet mobilization of these data is restricted by several factors. Substantial time, money, and effort are required to move through what can be a tedious digitization process. This often includes verifying information about how, when, and where a specimen was collected (e.g., upholding Darwin Core standards); holding requisite taxonomic expertise to check and update taxonomic assignments; detailed photography with specialized equipment (e.g., StackShot photography); and entry into databasing software (e.g., Specify). This information then may or may not be integrated into broader tools such as iDigBio, GBIF, or the field-specific databases mentioned earlier (Nelson et al. 2012; Paterson et al. 2016; Allmon et al. 2018; Marshall et al. 2018). This extensive work can only be carried out by larger, resource-rich institutions (Booth et al. 2021), further highlighting the interconnectedness of data equity challenges in paleobiology.
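To make the verification step more concrete, the following Python sketch flags specimen records that lack basic collection information. The field names are Darwin Core terms, but the choice of which terms to treat as required is an assumption made here for illustration, not a rule of the standard or of any institution, and the example record is invented.

```python
# Darwin Core terms treated as required in this sketch (an assumption;
# institutions set their own minimum requirements).
REQUIRED_TERMS = (
    "catalogNumber",
    "scientificName",
    "eventDate",
    "recordedBy",
    "decimalLatitude",
    "decimalLongitude",
)


def audit_record(record):
    """Return a list of problems found in one specimen record (a dict)."""
    problems = []

    # Flag terms that are absent or blank.
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {term}")

    # Flag coordinates that fall outside valid geographic ranges.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("decimalLatitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("decimalLongitude out of range")

    return problems


# Invented record with a blank name, no date or collector, and a bad latitude.
example = {
    "catalogNumber": "X-1",
    "scientificName": "",
    "decimalLatitude": 95.0,
    "decimalLongitude": 10.0,
}

problems = audit_record(example)
for problem in problems:
    print(problem)
```

A check of this kind only identifies gaps; filling them still requires the archival research and taxonomic expertise described above.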
Just as data from online data repositories are not an accurate reflection of the fossil record, museum collections are not complete reflections of fossil-bearing outcrops (Lieberman and Kaesler 2000). Studies have found that both invertebrate (Whitaker and Kimmig 2020; Nanglu and Cullen 2023) and vertebrate (Davis and Pyenson 2007) collections show anthropogenically introduced biases, such as those based on gender, sex, and race, which can result in greater uncertainty in diversity and abundance estimates and have important implications for data equity (Das and Lowe 2018; Cooper et al. 2019). Specific collection criteria (e.g., set by institutions) and methodological choices by collectors (e.g., what pieces of information to record) are likely to be invisible to data users, meaning that these issues are challenging to identify and mitigate, especially for researchers who are more isolated due to geography, economics, and politics. Future work should endeavor to report the differences between collection protocols and initial sampling of material to provide transparency and allow these issues to be mitigated (e.g., Nanglu and Cullen 2023).
Natural history collections are bastions in which paleobiological research is rooted, yet they are also facing significant challenges across the globe. In the Global North, reduced government funding due to political and socioeconomic changes (Dalton Reference Dalton2003; Kemp Reference Kemp2015; Zamudio et al. Reference Zamudio, Kellner, Serejo, de Britto, Castro, Buckup and Pires2018) has led to museums needing to reorient themselves as market actors (DesRoches Reference DesRoches2015). One such example is related to “new museology,” a necessary and relevant discourse around the roles of museums in society and politics (Vergo Reference Vergo1989), following which the museum landscape has diversified to reorient museums to include a greater focus on societal and political issues (McCall and Gray Reference McCall and Gray2014). While this is a necessary step to building more equitable and inclusive museum spaces, it can also be argued that this has hastened a shift in museum priorities toward entertainment and education (DesRoches Reference DesRoches2015). This overall “marketization” of museums has therefore resulted in a change of organizational approach from an internal focus on scholarship and curation to an externally oriented corporatist model of growth (McCall and Gray Reference McCall and Gray2014; DesRoches Reference DesRoches2015). The prioritization of short-term economic goals over long-term values results in staff cuts, the commodification of labor, and reduced financial support for collection maintenance (Suarez and Tsutsui Reference Suarez and Tsutsui2004; DesRoches Reference DesRoches2015; Miller et al. Reference Miller, Barrow, Ehlman, Goodheart, Greiman, Lutz and Misiewicz2020). With improvements to collections being reduced to triage and more critical data (specimens, archive materials, etc.) being added year upon year, stockpiled information will be increasingly difficult to access. 
Further exacerbating this situation is the “widening role” of museum curators, who are increasingly tasked with expanded managerial, administrative, educational, outreach, conservation, and digitization responsibilities (discussed in the next section) alongside traditional curation (McCall and Gray Reference McCall and Gray2014). Inadequate staffing of museum collections can also lead to the loss and degradation of both physical material and data, creating a vicious cycle wherein the workload is ever increasing. In natural history museums of the Global South, these same challenges exist but are greatly exacerbated through the legacy of colonial practices and socioeconomic inequalities (Booth et al. Reference Booth, Navarrete and Ogundipe2021). Many fossil specimens from the Global South have been, and continue to be, transported to repositories in the Global North, making them largely inaccessible to local communities and curators. Repatriation of fossils is one important step in rebalancing global inequalities in paleobiology, but it requires careful considerations about infrastructure, as well as financial resources and available expertise (Cisneros et al. Reference Cisneros, Ghilardi, Raja and Stewens2021; Sebuliba et al. Reference Sebuliba, Wesche and Xylander2021; Zin-Maung-Maung-Thein and Khin Zaw Reference Zin-Maung-Maung-Thein and Zaw2021; Stewens et al. Reference Stewens, Raja and Dunne2022). This highlights the need for a multifaceted approach to improving data equity in paleobiology that not only encompasses technological advancements, but also historical, socioeconomic, and political factors (Table 1).
Digitization
Advances in technology and computing over the last 50 years have allowed paleobiologists to develop an increasing diversity of tools to collect, collate, store, share, analyze, and visualize paleobiological data. In particular, the process of digitization has democratized and greatly improved the accessibility of data in paleobiology and allied sciences (Maschner and Schou Reference Maschner and Schou2013; Drew et al. Reference Drew, Moreau and Stiassny2017; Science Europe Reference Europe2018; Nagaraj et al. Reference Nagaraj, Shears and de Vaan2020; Nagendra et al. Reference Nagendra, Goswami and Mundoli2024). Digitization transforms data from physical material, such as fossil specimens, into a digital format (e.g., digital images, occurrence data, and morphological measurements), typically stored in digital repositories and databases. However, access to these digital data, as well as to the resources needed to manage, store, and analyze them, is not equitable across the whole paleobiological community.
Importantly for paleobiology, it is not just the digital versions of the data that require long-term preservation; the physical materials (e.g., fossil specimens) that are foundational to these data also require it, which is often as challenging, or even more so. Natural history museums are critical for studies of past, present, and future life on Earth, particularly as they provide data that have enormous potential for tackling the current biodiversity crisis (Suarez and Tsutsui Reference Suarez and Tsutsui2004; Plotnick et al. Reference Plotnick, Smith and Lyons2016; Meineke et al. Reference Meineke, Davies, Daru and Davis2018). They not only serve research and educational needs, but also provide a high monetary and social return for communities (Booth et al. Reference Booth, Navarrete and Ogundipe2021; Popov et al. Reference Popov, Roychoudhury, Hardy, Livermore and Norris2021). Many natural history museums across the world are therefore committed to digitizing their collections, thus providing wider access to their fossil data (Nelson and Ellis Reference Nelson and Ellis2019; Bakker et al. Reference Bakker, Antonelli, Clarke, Cook, Edwards, Ericson and Faurby2020; Hedrick et al. Reference Hedrick, Heberling, Meineke, Turner, Grassa, Park and Kennedy2020; Sandramo et al. Reference Sandramo, Nicosia, Cianciullo, Muatinte and Guissamulo2021). Digital representations of physical fossil specimens (e.g., photographs, 3D scans, or genetic sequences) have dramatically expanded the impact of natural history collections, making them more accessible and transforming research and researchers (Drew et al. Reference Drew, Moreau and Stiassny2017; Blackburn et al. Reference Blackburn, Boyer, Gray, Winchester, Bates, Baumgart and Braker2024).
Several museums have already made subsets of their data publicly accessible on their websites, for example, the Natural History Museum in London, UK, the Smithsonian Institution in Washington DC, USA, and the Paleontological Research Institution in Ithaca, New York, USA (Hendricks et al. Reference Hendricks, Stigall and Lieberman2015). However, digitization (and the associated data storage) requires a significant amount of resources (e.g., financial) that are not available equally to all museums (Vollmar et al. Reference Vollmar, Macklin and Ford2010; Allmon et al. Reference Allmon, Dietl, Hendricks, Ross, Rosenberg and Clary2018). It also requires a diverse array of expertise, not only in terms of technological and museum expertise, but also regarding taxonomic expertise to ensure that preexisting errors and biases are not exaggerated. Above all else, digitization must fit the requirements and needs of those who use the data. Some digitization processes result in images being made available online, which is both attractive to the general public wanting to explore the content of various collections and useful for researchers and students wishing to access anatomical, morphological, and taxonomic information for specimens without the need to access the specimens in person. One example of a project to increase the accessibility of fossil specimens through digital 2D and 3D representations is the University of Michigan Museum of Paleontology’s UMORF project (University of Michigan Online Repository of Fossils; https://umorf.ummp.lsa.umich.edu/wp/). The goal of the project is to serve a range of different user communities, from researchers and students to the general public, as well as to highlight the type and figured collection (specimens that are the basis for descriptions of new species and new interpretations of known species) and the parts of the museum’s collection that could be used for comparative study.
Historical and Global Inequalities
Digitization of museum specimens should also be carried out with an awareness of historic injustices and inequalities; otherwise, it has the capacity to perpetuate these issues (Kaiser et al. Reference Kaiser, Heumann, Nadim, Keysar, Petersen, Korun and Berger2023). Many natural history collections have colonial or exploitative roots, and digitization of these data integrates assumptions about communities, capacities, and values that can reinforce inequalities in paleobiology (Heumann et al. Reference Heumann, Stoecker, Tamborini and Vennen2018; Cisneros et al. Reference Cisneros, Raja, Ghilardi, Dunne, Pinheiro, Fernández and Sales2022; Kaiser et al. Reference Kaiser, Heumann, Nadim, Keysar, Petersen, Korun and Berger2023). In 2021, the Biodiversity Heritage Library (BHL), the largest open access digital library for biodiversity literature and archives, adopted an Acknowledgement of Harmful Content. This initiative acknowledges the existence of harmful content in many of the BHL’s collections, which reflects centuries of historical decisions, practices, and colonial processes. The BHL website provides users with resources to critically evaluate content alongside an opportunity to report instances of harmful content through a feedback form (www.about.biodiversitylibrary.org/about/harmful-content). At the Museum für Naturkunde in Berlin, Germany, interdisciplinary researchers and museum staff are leading an ambitious digitization project that aims to increase the accessibility and equity of the Tendaguru dinosaur collection. Between 1909 and 1913, during German colonial rule, countless dinosaur fossils were taken from the Tendaguru Formation in southeastern Tanzania, then German East Africa (Schwarz and Heumann Reference Schwarz and Heumann2023).
The project, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft; DFG), is working to digitize a vast amount of data from the Tendaguru dinosaur collection, including the fossils themselves, through photographs, 3D models, and archival material associated with the colonial expeditions. The digitized material will be stored on a single data platform, which will enhance both the accessibility and transparency of the collection and will enable more equitable research development with Tanzanian colleagues. Furthermore, the project is specifically being conducted within the FAIR framework (Wilkinson et al. Reference Wilkinson, Dumontier, Aalbersberg, Appleton, Axton, Baak and Blomberg2016), highlighting the potential of this framework for similar future projects in paleobiology.
Data collection in paleobiology often involves visiting museums or gathering digital data from museum specimens, yet this can be logistically, financially, or politically infeasible—or even impossible—for many researchers across the globe (Bezuidenhout and Chakauya Reference Bezuidenhout and Chakauya2018). Most major natural history museums are located in high-income countries of the Global North, such as in the United States or Europe, and traveling to these museums from lower-income countries can carry a heavy financial and administrative burden. Some scientists simply will not be granted entry to certain countries solely on the basis of their citizenship (Talavera-Soza Reference Talavera-Soza2023; Chugh and Joseph Reference Chugh and Joseph2024). Further compounding this issue is the “digital divide” (Lythreatis et al. Reference Lythreatis, Singh and El-Kassar2022), whereby different demographic regions have varying degrees of access to the tools and resources necessary for processing and analyzing digital data. In paleobiology, the digital divide often shapes the kind of work that researchers in particular regions can engage in (Abungu Reference Abungu2002; Mogajane Reference Mogajane2022; Sánchez Membrilla Reference Sánchez Membrilla2024). For example, licensed computer software essential for the processing of 3D images and scans can be prohibitively expensive, especially when combined with the need for powerful devices to run such software. The urgent need for greater resources to be pooled into increasing the accessibility of museum collections was highlighted during the COVID-19 pandemic. At the height of the pandemic, researchers were not able to travel to collections for study, but with the urgency of the pandemic noticeably reduced and the changing landscape in the museum sector, the realities of making museum collections more accessible are becoming increasingly complex. 
Equitable research collaborations that are built on mutual trust and the sharing of resources are critical to overcoming many accessibility barriers. Researchers based in proximity to specimens or with expertise in certain data repositories can be invaluable connections for those who are more marginalized.
Data Sharing
Data Accessibility in Paleobiology
Since the 1970s, paleobiology has continued to embrace new and emerging digital technologies, and a knock-on effect of this is that data sharing has become increasingly simple. The sharing of data is an integral part of the research process, as it permits increased access to knowledge for all and improves transparency and reproducibility in science. Sharing research data can also have individual benefits, such as increased citation rates (Piwowar et al. Reference Piwowar, Day and Fridsma2007). However, uptake of open access (OA) and open data practices is not uniform across the world (see Fig. 2). For academics within low- and middle-income countries, current academic rewards (citations, altmetrics, institutional visibility, etc.) are severely compromised by structural limitations on data generation and aforementioned failures in data-collection practices, such as parachute science. This results in a disincentivization of participation in Open Data (Bezuidenhout and Chakauya Reference Bezuidenhout and Chakauya2018). Data sharing, consequently, requires an environment where there is sufficient academic security and benefit for the contributing scientists (Bezuidenhout and Chakauya Reference Bezuidenhout and Chakauya2018; Smith et al. Reference Smith, Raja, Clements, Dimitrijević, Dowding, Dunne and Gee2023b).
Efforts to increase social and scientific equity within paleontology have contributed to an overall positive trend toward open science and open data in recent years, fueled in part by new infrastructure and policies introduced by funding agencies and scientific journals (e.g., means-tested fee waivers for OA publishing). However, there is not yet a consistent requirement or protocol for the digital storage or sharing of paleobiological data (Rowe and Frank Reference Rowe and Frank2011; Davies et al. Reference Davies, Rahman, Lautenschlager, Cunningham, Asher, Barrett and Bates2017; Dillon et al. Reference Dillon, Dunne, Womack, Kouvari, Larina, Claytor and Ivkić2023; Smith et al. Reference Smith, Raja, Clements, Dimitrijević, Dowding, Dunne and Gee2023b). In this regard, paleobiology is lagging somewhat behind allied disciplines, such as geospatial research and the environmental sciences, that are embracing community standards for data repositories, access, and reuse (Seltmann et al. Reference Seltmann, Lafia, Paul, James, Bloom, Rios and Ellis2018; Kinkade and Shepherd Reference Kinkade and Shepherd2021; Crystal-Ornelas et al. Reference Crystal-Ornelas, Varadharajan, O’Ryan, Beilsmith, Bond-Lamberty, Boye and Burrus2022).
As paleontology data resources proliferate, the broad application of the TRUST principles (Transparency, Responsibility, User focus, Sustainability, and Technology) may serve to ensure good governance of shared resources and proper attribution for all contributions. The TRUST principles were developed to demonstrate the trustworthiness of digital repositories (Lin et al. Reference Lin, Crabtree, Dillo, Downs, Edmunds, Giaretta and De Giusti2020). This includes communicating a clear mission statement and repository policies, which would facilitate paleobiological data discovery and provide governance for necessary long-term preservation of data. These principles also provide a common framework to facilitate discussion and implementation of best practices in digital preservation by all stakeholders, including researchers, their institutions, academic publishers, and the digital repositories. For these principles to serve their intention, data equity must be centered in all research processes from data access to data entry, use, and attribution.
Principles in practice require guidelines and facilitation. Acknowledging the increase in quantitative studies in paleobiology due to data sharing, some journals have introduced data editors who ensure that all data related to a publication are available, together with associated materials (e.g., coding scripts) (Table 1). These editors serve to enhance the reproducibility of studies and are currently in place at journals such as Proceedings of the Royal Society B, Journal of Evolutionary Biology, and the American Naturalist. Some journals, such as those published by the British Ecological Society, have produced guides for authors on the topics of data management and reproducible code (www.britishecologicalsociety.org/publications/better-science). To further promote better data-sharing practices, paleobiological journals could enforce requirements for authors to make their data freely accessible in data repositories such as MorphoBank, the Paleobiology Database, and Phenome10K (phenome10k.org), instead of in supplementary data files. Indeed, some paleobiological journals (e.g., Paleobiology and Journal of Vertebrate Paleontology) already do this through associations with Figshare, Dryad, Zenodo, and MorphoBank. Although this entails an additional step in the publication process, authors are also likely to benefit through more citations of their work when data are shared (Colavizza et al. Reference Colavizza, Hrynaszkiewicz, Staden, Whitaker and McGillivray2020; Dorta-González et al. Reference Dorta-González, González-Betancor and Dorta-González2021).
Open Access Publishing
Open access (OA), the movement that seeks to grant free and open online access to academic information, is becoming increasingly widespread, including within the field of paleobiology (Fig. 3). Despite the promise to make science more inclusive, capacities to engage with OA vary considerably across regions, institutions, and demographics (Bezuidenhout and Chakauya Reference Bezuidenhout and Chakauya2018; Ross-Hellauer et al. Reference Ross-Hellauer, Reichmann, Cole, Fessl, Klebel and Pontika2022). For example, green and gold OA options do not charge fees to readers, but charge fees to authors, making them unattainable for many researchers and institutions who may lack the means to pay the large fees required. In India, for instance, many funding agencies do not support publication charges, making it impossible for researchers to choose OA for their publications. Without a free OA or generous waiver policy, these researchers are excluded from circulating their published articles to a wider audience.

Figure 3. The total number of articles published in the journal Paleobiology from 1977 to 2023 according to the data indexed by Web of Science (2024; see reference for access details). The number of open access (OA) articles has increased steadily over time since the 1990s.
In 2014, the Alliance of German Science Organisations initiated Projekt DEAL (now known as DEAL Konsortium; www.deal-konsortium.de), which sought to negotiate transformative OA agreements with the largest commercial publishers of academic journals (Elsevier, Springer Nature, and Wiley). The consortium was successful in negotiations and continues to work toward fair pricing structures for academic publishing with the aim of increasing accessibility and visibility of researchers’ work (Vogel Reference Vogel2023). In 2018, Plan S was launched by a consortium of national funding agencies in Europe, which requires researchers who benefit from state funding to publish their work in open repositories or journals. Although this mandate will only apply to authors who produced approximately 6% of the world’s papers (based on an estimate from 2017), it is still expected to make a sizable impact in the long term, as the mandate applies to about one-third of papers published in Nature and Science (Brainard Reference Brainard2020).
The diamond OA publishing model, in which fees are not charged to authors or readers, addresses many issues related to financial constraints. Some journals catering to paleobiological research already adhere to the diamond OA model, including Palaeontologia Electronica and Lethaia. Despite the obvious advantages for increasing access and data equity, there continues to be a noticeable lack of transparent open science publications in paleobiology (Drage and Wong Hearing Reference Drage and Hearing2023). Preprints are an increasingly popular option, not only for researchers to showcase their work to a public audience, but also to gain additional feedback beyond the closed peer-review system (Sarabipour et al. Reference Sarabipour, Debat, Emmott, Burgess, Schwessinger and Hensel2019). Preprint services, such as bioRxiv, EcoEvoRxiv, and EarthArXiv, that cater to paleontological research papers, are rapidly becoming more popular.
Paleobiology recently announced that it will be transitioning entirely to an OA model, with a four-tiered approach based on agreements with institutions and funding bodies, locations of authors, and a waiver request form. Encouragingly, when announced, Cambridge University Press and the Paleontological Society stated a commitment to approving all waivers not covered by other funding sources. This move appears to bring Paleobiology's publishing model in line with the criteria for diamond OA. Bibliometric data indexed by Web of Science show that the total number of articles published in Paleobiology has remained steady over the last five decades, while the number of OA articles has generally increased since the mid-1990s (Fig. 3). Fifteen percent of all Paleobiology articles indexed by Web of Science in the last 47 years are available under green, gold, or hybrid OA agreements (Web of Science 2024). In 2023, OA articles accounted for more than three-quarters (76%) of publications in Paleobiology (Fig. 3). Looking at the trends for Paleobiology, the field of paleobiology is comparatively more open than it was a decade ago (Fig. 3), yet there is much room for improvement. The majority of paleobiological journals still use a publishing model that is unfair, inequitable, and unsustainable for global science. Models such as diamond OA promote greater flexibility, accessibility, and data equity, and there is an increasing appetite within the research community for transformative changes to the status quo (Drage and Wong Hearing Reference Drage and Hearing2023).
Language Barriers
English remains the lingua franca of scientific research, including paleobiology. Today, 98% of all scientific publications are estimated to be in English (Gordin Reference Gordin2015). In the last 30 years, 92% of publications recorded in the PBDB were written in English, with Chinese, German, French, and Spanish making up the majority of the remainder (Raja et al. Reference Raja, Dunne, Matiwane, Khan, Nätscher, Ghilardi and Chattopadhyay2022). This dominance of English disadvantages paleobiologists for whom English is a secondary language or who are based in countries with low English proficiency (Ramírez-Castañeda Reference Ramírez-Castañeda2020). In paleobiological research, the dominance of English could lead to biases through the exclusion of non-English publications, for example, in literature searches, as has been demonstrated in ecology and biodiversity research (Amano et al. Reference Amano, González-Varo and Sutherland2016; Konno et al. Reference Konno, Akasaka, Koshida, Katayama, Osada, Spake and Amano2020; Nuñez and Amano Reference Nuñez and Amano2021). Within publishing, journals should adopt linguistically inclusive policies to overcome language barriers, including providing translation tools on their web pages, promoting the use of non–English language references, and providing author guidelines in multiple languages (Arenas-Castro et al. Reference Arenas-Castro, Berdejo-Espinola, Chowdhury, Rodríguez-Contreras, James, Raja and Dunne2024). Paleobiology is among a number of journals, including Palaeontologia Electronica, Integrative Organismal Biology, and Geodiversitas, that permit authors to submit manuscript abstracts in several different languages. In many cases, these journals are based in countries where English is not the first language, such as in the case of Revista Brasileira de Paleontologia, and multiple-language abstracts allow them to reach a wider audience and to be more discoverable in online searches (Amano et al. Reference Amano, Rios Rojas, Boum, Calvo and Misra2021a; Arenas-Castro et al. Reference Arenas-Castro, Berdejo-Espinola, Chowdhury, Rodríguez-Contreras, James, Raja and Dunne2024). Out of 55 journals catering to paleobiology, nearly a quarter (n = 13, 24%) are indexed by Web of Science as being multilingual in some way (Smith et al. Reference Smith, Raja, Clements, Dimitrijević, Dowding, Dunne and Gee2023b). Journals and venues catering to paleobiological research should continue to embrace multilingualism, as it is a simple step toward increasing data equity for both paleobiologists and the general public (Table 1).
Future Outlook and Actions
Over the past half century, paleobiology has undergone a computational revolution that has given rise to a multitude of new avenues for recording, storing, and analyzing data on the history of life on Earth. With these advances, the amount of data available for research has grown considerably, accompanied by an expansion in the definition of paleontological data. Paleobiological data once consisted almost exclusively of counts of taxa at localities from different geological times. Now, paleontological data consist of terabytes in single images and high-resolution 3D models, as well as databases of millions of fossil occurrences, stratigraphic units, paleoenvironmental variables, and even molecular signatures. This translates to an increasing array of exciting opportunities for new research questions, but also to a critical responsibility to ensure that our data tools and infrastructures continue to innovate in order to best serve our diverse community.
In this review, we have highlighted how individual and systemic action is required to continue increasing data equity in paleobiology and to tackle ongoing challenges related to inequality, accessibility, and sustainability.
Paleobiologists can engage with data equity in a multitude of ways. The actions recommended here are by no means exhaustive, but instead should be considered a minimum requirement for all those working within and governing the paleobiology research community. There is undoubtedly a need for greater governmental and institutional support for fundamental resources that increase data equity, such as for digitization and OA publishing. However, there are still numerous actions that can be taken by individuals, teams, institutions, funders, journals, and societies to center and enhance data equity in paleobiological research.
Individuals should be proactive in ensuring that their data are collected in an ethical, equitable, and sustainable manner and can work toward promoting data equity within their collaborative networks and teams (Table 1). This can include regularly engaging with the topic of data equity through literature and other media and conducting multilingual data and literature searches. Institutions can support paleobiologists, curators, and students by offering (or even mandating) training on data equity protocols and principles, as well as providing robust infrastructure for managing and sharing data (Table 1). Academic journals that are not already working toward increasing data equity and openness should make commitments to do so. Increasing data equity requires strict data repository protocols, such as those based on the FAIR, CARE, and TRUST principles (Table 1). Journals should strive to have linguistically inclusive policies, including translation services and multi-language abstracts (Amano et al. Reference Amano, Berdejo-Espinola, Christie, Willott, Akasaka, Báldi and Berthinussen2021b; Arenas-Castro et al. Reference Arenas-Castro, Berdejo-Espinola, Chowdhury, Rodríguez-Contreras, James, Raja and Dunne2024). Academic societies can also work to promote data equity, for example, through advocating to relevant government bodies for legal protection and safeguarding of geoheritage and museums (Table 1).
We cannot know with certainty what data advances will follow in the next 50 years or further into the future. Regardless, we paleobiologists have a responsibility to safeguard the potentiality of those future data for everyone in our community by developing and committing to ethical, sustainable, and equitable data practices today.
Acknowledgments
We are grateful to colleagues for invaluable wider discussions on the topics featured in this article, particularly Á. Kocsis, L. Mulvey, and T. Clements. We also thank the editors of the 50th Anniversary Issue for inviting us to contribute this article and promoting such a critical topic in our field. We are enormously grateful for the constructive and positive reviews of B. Allen, H. L. Kempf, and an anonymous reviewer. E. M. Dunne was supported by an Emerging Talents Initiative grant at FAU Erlangen-Nürnberg. D.C. was supported by a SERB Core Research Grant (CRG/2022/001658). C.D.D. was supported by a Royal Society grant (RF_ERE_210013). E. M. Dillon was supported by an Earl S. Tupper Postdoctoral Fellowship from the Smithsonian Tropical Research Institute. E. M. Dowding was supported by the Paleosynthesis Project and the Volkswagen Stiftung (Az 96 796). P.L.G. was supported by the University of São Paulo (grant no. 22.1.09345.01.2).
Competing Interests
The authors declare no competing interests.
Data Availability Statement
All data are available on GitHub: https://github.com/emmadunne/data_equity.
Code Availability Statement
All code used to manipulate and visualize data for this manuscript is available on GitHub: https://github.com/emmadunne/data_equity.