Systems thinking, or systems analysis, is an entire discipline that provides an approach and methodology for defining, exploring, and visualizing systems in order to understand and improve how they function (Forrester 1990; Gibson et al. 2017; Kauffman 1980; Meadows 2008). The approach is not new in archaeology; it has been used, for example, to identify how coherent interdependency emerges among past communities and cultures (Bentley and Maschner 2003:246–247). Systems thinking offers at least two notable benefits: (1) tools to move beyond “event-level” thinking and determine how long-term behaviors create the structures and interdependencies of a system, and (2) the concept that the behaviors and structures of a system are determined by the individuals and organizations (agents) that compose it. In other words, actors within the system create it and have the capability to change it (Bentley and Maschner 2003:245; Meadows 2008:2–3).
In this framework, a system is a set of things or a collection of parts that function together in such a way as to produce their own pattern of behavior over time, which is meant to achieve something (Kauffman 1980:5; Meadows 2008:11–12). However, systems are more than the sum of their parts. They are both dynamic (able to change) and evolving (having emergent properties). They consist of a web of elements, actors, agencies, nodes, stocks, or “parts” that are connected through flows of things and people. A system also has a boundary; here, we concentrate on the digital data system generated by archaeological work conducted in the United States. We have set that boundary based on the shared set of established federal, state, and local laws that guide how archaeological data are created and curated. Although these laws provide a framework for practice, it is important to remember that they were also conceptualized by humans, and we stress that it is individuals and organizations that drive the behavior of the system.
Our goal in this review is to succinctly explain and visualize the structure of the system in the United States that generates, manages, and curates digital archaeological data. Through this overview, we aim to demonstrate that although the growth of the digital archaeological system has been steady, scholarly efforts and investments in developing digital curation and data sharing practices have been ad hoc. This pattern has resulted in a web of loosely affiliated organizations, departments, and agencies of varying sizes that generate, manage, and share archaeological data in a variety of ways. In its current configuration, individual and organizational behaviors have created a system that supports and reinforces data generation without balancing it with data curation and sharing. This reinforcing loop is a causal mechanism that encourages individuals and organizations to keep generating and accumulating new data at the expense of making existing data findable, accessible, and (re)usable. As a result of this unbalanced practice, the labor of data curation and sharing falls disproportionately on a limited number of data curators and data publishers. This in turn reinforces an implicit break in the digital data life cycle and increases the likelihood that knowledge and skills in data caretaking will be lost, a situation that is ultimately not sustainable.
Over the last two decades, multiple visions for how stabilizing mechanisms should work have been offered, and several approaches have subsequently been put into practice. A key question that emerges from this review is whether the prioritization of a centralized disciplinary repository (Altschul et al. 2017, 2018; Kintigh 2006) has shifted as participation by many kinds of actors in creating, curating, sharing, and using digital data has grown. We suggest that more distributed models and broader-scale efforts to provide resources and digital and data literacy training may need to take priority for the system to truly find equilibrium.
MODELING THE US DIGITAL DATA SYSTEM
In the United States, there are at least 4,700 institutions that curate digital and physical archaeological data (Institute of Museum and Library Services 2019), and thousands more organizations and individuals are active partners in generating and using those data. There are undoubtedly petabytes of existing archaeological data (analog and digital) generated from more than half a century of archaeological research in the digital age. The early 2000s saw a shift to digitization (initially in archives, libraries, museums, and galleries) and to the use of earth observation sensors in research. These parallel movements significantly increased the volume of structured and unstructured digital data (both digitized and born digital). However, the US archaeological community has been slow to build capacity in digitization, gain familiarity with remote sensing technologies, invest in infrastructure, and develop skills in digital curation. This does not mean that archaeologists did not engage with digital technologies and digital data; rather, until recently, digital applications were typically limited to individual interests and often directed toward making existing processes seemingly more efficient and transparent (e.g., digital submission of permit applications or dashboards to track their progress).
The data deluge (Altschul et al. 2017; Bevan 2015) refers to the growing number of digitized artifact and field records, images, locational data, and grey literature generated each year by excavations (Petrosyan et al. 2021). Because the life cycle of digital archaeological data is divided between creators and curators, the large volume of data flowing into the system negatively affects the flows and feedback mechanisms that, if balanced, could channel data to a variety of users. When the infrastructure and skills in digital curation needed to handle this growing volume are not in place, processing and proper curation are delayed or, worse, never carried out, leading to data loss (Figure 1). Meadows (2008:117–119) notes that missing or delayed feedback is one of the most common causes of systems malfunction, as in the scenario shown in Figure 1a, where data generated from fieldwork are never curated, resulting in the loss of data and of opportunities for reevaluation. In such situations, a powerful intervention is necessary to restore information and balance to the system. One such intervention (Figure 1b) is adding curation (analog and digital) to save the data from being lost and to provide opportunities for feedback on how the data are generated and organized. Adding a third component to the system (Figure 1c), such as data sharing and reuse, can create more opportunities for feedback and intervention before future decisions are made. With this additional component, archaeologists make available information on which data already exist and on whether the methodologies used enable others to answer their research questions, and they can assess what kinds of data or documentation might be missing for reuse.
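The contrast between Figure 1a and Figure 1b can be sketched as a toy stock-and-flow model in the sense Meadows uses: fieldwork adds data to an uncurated backlog, curation drains that backlog at a limited rate, and whatever stays uncurated gradually decays into loss. This is a minimal sketch under stated assumptions: the yearly generation rate, curation capacity, and loss rate are hypothetical parameters rather than measured values, and the sharing and reuse feedback of Figure 1c is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical yearly parameters; illustrative only, not measured values.
    generated_per_year: float  # new records entering the system from fieldwork
    curation_capacity: float   # records curators can process per year (0 = no curation)
    loss_rate: float           # fraction of the uncurated backlog lost each year


def simulate(s: Scenario, years: int) -> dict:
    """Run a toy stock-and-flow model and return the final size of each stock."""
    backlog = 0.0  # stock: generated but not yet curated
    curated = 0.0  # stock: curated and available for reuse
    lost = 0.0     # stock: no longer recoverable
    for _ in range(years):
        backlog += s.generated_per_year                # inflow from fieldwork
        processed = min(backlog, s.curation_capacity)  # outflow limited by capacity
        backlog -= processed
        curated += processed
        decayed = backlog * s.loss_rate                # delayed curation leads to loss
        backlog -= decayed
        lost += decayed
    return {"backlog": backlog, "curated": curated, "lost": lost}


no_curation = simulate(Scenario(100, 0, 0.05), years=10)     # akin to Figure 1a
with_curation = simulate(Scenario(100, 80, 0.05), years=10)  # akin to Figure 1b
print(no_curation["lost"], with_curation["lost"])
```

In this toy model, any positive curation capacity shrinks the backlog and therefore the amount that decays into the lost stock; the point is the shape of the behavior, not the specific numbers.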
Responses to past and recent surveys (Rivers Cofield et al. 2024; Watts 2011) indicate that curators and repository managers are struggling to find the time, staff, and resources to properly digitize legacy materials from old excavations and to care for born-digital data. The most challenging aspects of managing data are keeping up with storage needs and allotting the time and training necessary to make data accessible to others within and outside the organization (Kansa and Kansa 2022; Rivers Cofield et al. 2024; Watts 2011; Williams and Williams 2019). Creating a stable digital system that follows best practices for curation and enables evidence-informed research and decision making (Kintigh et al. 2014; Marwick et al. 2017; Nicholson et al. 2023) requires that we all move beyond project-level and organizational-level thinking and evaluate actions and decisions with the larger system in mind. To that end, we use a systems thinking approach to briefly describe and visualize the system as it currently stands.
The System's Structure
Currently, the digital data system for US archaeology can be characterized as a complex open system: it functions without guidance from a centralized source (Bentley and Maschner 2003:246). The constituent parts of the system are the people and organizations that generate, curate, and share data and that use the existing digital infrastructure (e.g., software, hardware, and networks). Figure 2 illustrates the intersections and relationships between the different types of organizations that collect, curate, and share data. The number and variety of these organizations demonstrate the complexity of the infrastructure(s) built to handle digital data in archaeology. For example, individual researchers (academic researchers, students, and various university departments), cultural resource management (CRM) firms, and smaller governmental repositories regularly use in-house, custom-built Microsoft Access or FileMaker Pro databases or Excel spreadsheets to collect and store information (Kansa and Kansa 2021; Pavel 2010). The expectation of and ability to share those data with other parties vary widely depending on resources and funding. In practice, this means that individual organizations or small groups of organizations typically set and follow their own guidelines and use data and metadata nomenclature that is not necessarily mapped to larger standardized systems.
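One modest step toward mapping local nomenclature to larger standardized systems is a crosswalk: a lookup from an organization's in-house terms to a shared vocabulary, with unmatched terms reported rather than silently passed through. The sketch below assumes records exported from an in-house database or spreadsheet as Python dictionaries; the field name `artifact_type` and the vocabulary terms are hypothetical examples, not drawn from any published standard.

```python
# Hypothetical crosswalk from in-house terms to a shared vocabulary (illustrative only).
CROSSWALK = {
    "pp": "projectile point",
    "proj pt": "projectile point",
    "sherd": "ceramic sherd",
    "potsherd": "ceramic sherd",
    "deb": "lithic debitage",
}


def apply_crosswalk(records, field, crosswalk):
    """Add a `<field>_standard` value to each record and collect terms with no mapping.

    The original value is left untouched so the local usage stays documented.
    """
    mapped, unmapped = [], set()
    for rec in records:
        raw = rec.get(field, "")
        standard = crosswalk.get(raw.strip().lower())  # normalize case and whitespace
        if standard is None:
            unmapped.add(raw)
        mapped.append({**rec, f"{field}_standard": standard})
    return mapped, unmapped


records = [{"artifact_type": "PP"}, {"artifact_type": "Sherd "}, {"artifact_type": "biface"}]
mapped, unmapped = apply_crosswalk(records, "artifact_type", CROSSWALK)
```

Keeping the original value alongside the standardized one lets a later user check how each local term was interpreted, and the `unmapped` set lists the terms that someone still has to make a decision about.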
In a different vein, institutions such as private and public museums and state, tribal, and county historic preservation offices collect and curate data specifically to make them available and usable to other parties, generally researchers, land managers, the public, and descendant community members. These organizations may use discipline-specific digital tools designed for their needs or content management systems that are designed for use across a variety of disciplines and customized to work with archaeological data. For example, federal agencies such as the Department of the Interior and its bureaus, including the National Park Service and the Fish and Wildlife Service, use a third-party collections management system to help standardize data entry for the collections from all the parks or refuges in their jurisdiction. However, these independent units are not required to coordinate with other types of organizations that similarly serve as data aggregators, such as state or other federal agencies or nongovernmental repositories (e.g., the Digital Archaeological Record [tDAR]), and there is no shared standard for data nomenclature across all federal agencies or even within one agency. A major exception to this pattern is the recently realized effort by the Bureau of Land Management and more than 11 western State Historic Preservation Offices to create the National Cultural Resources Information Management System (NCRIMS), which provides normalized archaeological site data for 11 states and is used by the Bureau of Land Management for planning and decision-making.
Finally, another set of organizations, data hubs, do the additional work of storing, standardizing, and integrating data produced by others, and they make such data accessible to the public through web portals. Over the last two decades, a variety of aggregation strategies have been implemented (Nicholson et al. 2023:Table 2; Ortman and Altschul 2023:93–96). Some organizations focus on compiling examples of a specific and useful data type (e.g., ¹⁴C or tree-ring dates). Data publishers, such as Open Context, have applied this model to more expansive and varied datasets (Kansa 2016). Others recatalog large collections into a relational database using standardized methodologies and metadata protocols (Digital Archaeological Archive of Comparative Slavery [DAACS]; Galle et al. 2019). Still other hubs bring together existing databases, reports, and site files from particular regions or research foci (Colonial Encounters [King 2016], Chaco Research Archive [Plog and Heitman 2006], cyberSW, Digital Index of North American Archaeology [DINAA; Wells et al. 2014], National Archaeological Database [NADB]). Last are digital archives, designated spaces for the management, long-term retention, and retrieval of many different types of information. These can be repositories hosted by universities or archaeology-specific archives such as tDAR (McManamon et al. 2017; Nicholson et al. 2023). Such organizations have become increasingly vital in facilitating data curation since granting agencies began to mandate data management plans as part of project planning and grant proposals.
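At its simplest, the integration work that data hubs do involves merging records from several providers while keeping track of where each record came from. The sketch below assumes that each provider's records already share a site identifier (in practice, agreeing on identifiers is itself part of the standardization problem); the provider names, field names, and site IDs are hypothetical. Conflicting values are kept side by side rather than resolved, so a data user can see where sources disagree.

```python
from collections import defaultdict


def aggregate(sources):
    """Merge site records from several providers, keyed by a shared `site_id`.

    `sources` maps a provider name to a list of record dicts. Every merged record
    is tagged with its provider, and each site lists all providers that describe it.
    """
    merged = defaultdict(lambda: {"records": [], "providers": set()})
    for provider, records in sources.items():
        for rec in records:
            entry = merged[rec["site_id"]]
            entry["records"].append({**rec, "provider": provider})
            entry["providers"].add(provider)
    return dict(merged)


# Hypothetical providers and records, for illustration only.
sources = {
    "state_site_file": [{"site_id": "SITE-001", "period": "Archaic"}],
    "museum_catalog": [
        {"site_id": "SITE-001", "period": "Middle Archaic"},
        {"site_id": "SITE-002", "period": "Woodland"},
    ],
}
hub = aggregate(sources)
```

Whether a hub should keep every variant, as here, or reconcile them into one authoritative value is exactly the breadth-versus-detail trade-off such projects have to decide on.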
In addition to preserving data, these aggregation projects provide protocols and procedures for standardizing and integrating data from multiple archaeological datasets. Each involves trade-offs between the breadth and the detail of the data curated and shared, but all offer examples of best practices, and they support and fund data preservation, standardization, integration, and accessibility (Galle et al. 2019; Heitman 2017; Kansa and Kansa 2021; McManamon and Ellison 2022; McManamon et al. 2017). However, although these hubs have made interoperability of the data housed within their own systems a priority, only recently have coordination and interoperability among the various hubs begun to be addressed (Nicholson et al. 2023).
BRINGING BALANCE TO THE SYSTEM
Given the imbalance between data creators and curators described above, can we collectively change our behaviors to bring more equilibrium to the US archaeological data system? Figures 1 and 3 illustrate the causal loop relationships between fieldwork, data generation, curation, and data sharing that would move the US digital data system closer to equilibrium. In a more balanced system, where data are ready for reuse, archaeologists have reliable information to facilitate research, decision-making, and collaborative planning. Some have advocated for a centralized archaeological information infrastructure, whereas guiding principles such as FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics; Carroll et al. 2020) have received heightened attention recently. In addition, documenting paradata and metadata can make data more useful, and there has been a continuous push to increase digital and data literacy among archaeologists (Cobb et al. 2019; Daems 2020; Kansa and Kansa 2021).
One proposal for a centralized US infrastructure, first made nearly 20 years ago (Kintigh 2006), has evolved over time. Its core principle was the establishment of a national center that works with a distributed network to curate and synthesize existing primary datasets and their associated metadata. A related goal was to create products and deliverables that would inform planning, research, outreach, and coordination with descendant communities and other stakeholders (Altschul et al. 2017, 2018; Kintigh 2006; Kintigh et al. 2014:567–568, 2015; Ortman and Altschul 2023). In practice, that vision has been enacted in two ways in the US context: (1) tDAR, which focuses on data preservation and retention, and (2) the Coalition for Archaeological Synthesis (CfAS), a nonprofit organization established in 2017 as a group of partner organizations and individual associates that supports workshops and miniconferences fostering synthesis in archaeology to expand knowledge. Other aggregation projects mentioned previously (e.g., the Chaco Research Archive, cyberSW, DAACS, DINAA, Open Context) operate on similar principles but focus on specific areas of the United States or include other regions abroad.
Those advocating for more centralized infrastructure have noted that for synthesis to succeed, there must also be a coordinated effort to ensure standardization, integration, management, curation and maintenance of the data themselves, and indexing (i.e., linking to other organizations and institutions), along with a host of methodologies for properly handling sensitive data. An important consideration is what benefits are derived from data aggregation when the same groups hold the data and control both the technology and the narrative. Realizations about the value and necessity of data sharing and reuse, not only among archaeologists but also between archaeologists and descendant communities, have come to the forefront in the last decade. This aspect of coordination, however, has been slower to take shape. In the last few years, the FAIR and CARE principles have been conceived for the scientific data life cycle broadly and applied to US and Canadian archaeology specifically (Gupta et al. 2023; Nicholson et al. 2023). Figure 3 visualizes how these principles fit into the system.
The benefit of these principles is that they can be incorporated into existing practices and that they focus on adding documentation that increases transparency, builds connections, and creates feedback. For example, for data to be reusable, an archaeologist seeking to reuse a dataset, especially one from another data provider, must be able to understand how it originally came into being. No data are truly “raw,” and datasets are not “objective” (Gitelman 2013); they are created by people's decisions about what to observe, what to collect and document, and what to measure and how (Smith 2022:99). If an archaeologist uses digital datasets without knowledge of their context and transformations, they may misjudge the reliability of those datasets, which would affect subsequent analysis. The results would then carry the data provider's assumptions and limitations as well as the uncertainties of the archaeologist reusing the data (Allison 2008).
One way to inform the reuse of datasets is through “paradata,” the documentation of the research processes that result in the creation of datasets (Börjesson et al. 2020). Paradata refer to the methods, tools, and technologies used to create data, as well as contextual information on the decisions made during the collection and analysis of archaeological data (Atici et al. 2013). In the simplest terms, an archaeologist records the creation process and shares the “recipe” with others (Gupta 2020). These “data alongside data” (Denard 2013) inform users about an archaeologist's interactions with a dataset and provide critical contextual information for reuse.
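Recording the “recipe” can be as simple as appending a short entry for each step that transforms a dataset: what was done, with which tool, and why. The sketch below is one hypothetical way to structure such a log and serialize it as JSON so it can be shared with the dataset; the class name, fields, and example steps are assumptions for illustration, not an established paradata schema.

```python
import json
from datetime import datetime, timezone


class ParadataLog:
    """Hypothetical minimal paradata log: one entry per step that shaped a dataset."""

    def __init__(self, dataset_name):
        self.dataset_name = dataset_name
        self.steps = []

    def record(self, action, tool, rationale):
        """Append a step describing what was done, with what, and why."""
        self.steps.append({
            "step": len(self.steps) + 1,
            "action": action,
            "tool": tool,
            "rationale": rationale,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        """Serialize the log so it can travel with the dataset it describes."""
        return json.dumps({"dataset": self.dataset_name, "steps": self.steps}, indent=2)


log = ParadataLog("survey_2023_finds")
log.record("Dropped surface finds without coordinates", "spreadsheet filter",
           "Spatial analysis requires point locations")
log.record("Merged 'sherd' and 'potsherd' categories", "find-and-replace",
           "Both terms were used for the same class of artifact")
print(log.to_json())
```

The `rationale` field is the part most often missing from shared datasets; it is what lets a later user judge whether a decision such as dropping unlocated finds suits their own question.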
In addition to paradata, metadata are a powerful tool for reusing datasets. Metadata provide information on who made the data, when and where the data were made, what the variables mean, and which measurements were taken, among other things. Like paradata, metadata must stay connected with the datasets they describe. Some software, including ArcGIS and QGIS, provides options to store metadata within the application, whereas in other cases the archaeologist must use a text file to document the relevant metadata fields. This is one facet of digital curation that could be better implemented for digital archaeological data and that requires greater scholarly attention. For example, at present, it is unclear how often archaeologists use the built-in metadata tools in GIS software when sharing datasets, and how data recipients use that metadata during reuse.
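Where software offers no built-in metadata store, a sidecar file written next to the data file is one way to keep metadata connected with a dataset. The sketch below assumes a JSON sidecar named after the data file and a hypothetical minimum set of fields taken from the list above (who, when, where, and the variables); it is not a formal metadata standard, and a real project would follow whatever standard its repository requires.

```python
import json
from pathlib import Path

# Hypothetical minimum fields (who, when, where, variables); not a formal standard.
REQUIRED_FIELDS = ("creator", "date_created", "location", "variables")


def missing_fields(metadata):
    """List required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]


def sidecar_path(data_path):
    """Place the metadata file beside the data file, e.g. finds.csv -> finds.csv.metadata.json."""
    data_path = Path(data_path)
    return data_path.with_name(data_path.name + ".metadata.json")


def write_sidecar(data_path, metadata):
    """Validate the metadata, then write it as JSON next to the data file."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"metadata is missing required fields: {missing}")
    path = sidecar_path(data_path)
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

For instance, calling `write_sidecar("finds.csv", ...)` with a `creator`, `date_created`, `location`, and a `variables` dictionary describing each column would produce `finds.csv.metadata.json`; leaving out any of those fields raises an error instead of writing incomplete metadata.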
Equally important is adding cultural metadata to datasets so that information on the cultural provenance, permissions, and protocols associated with the data is shared with data users (Anderson and Hudson 2020; Gupta et al. 2023). Tribal Historic Preservation Offices, tribal communities, and other descendant communities need to be able to find, access, and maintain control over archaeological data that are about them and that are recovered on their lands (Neller et al. 2024). Developed for the use and sharing of Indigenous data, cultural metadata can enable archaeologists to “reconnect” a community with data related to it. Digital tags, such as Traditional Knowledge (TK) Labels and licenses, can be embedded within data so that they remain with the data when shared. By acknowledging the Indigenous cultural protocols, provenance, and permissions associated with datasets, archaeologists and organizations can facilitate both Indigenous stewardship of their data and active participation in all parts of the data system.
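The principle that cultural metadata should remain with the data when shared can be sketched as a package that bundles records with their provenance, labels, and a community contact, plus a check that refuses to hand out a copy whose cultural metadata has been stripped. This sketch is illustrative only: the field names, the placeholder community and contact, and the label strings (written to resemble TK Label names) are assumptions, and it does not implement any actual labeling system or any community's protocols.

```python
import copy


def package_with_cultural_metadata(records, community, labels, contact):
    """Bundle records with cultural metadata so provenance and protocols travel together.

    Field names are hypothetical; `labels` are free-text placeholders here.
    """
    return {
        "cultural_metadata": {
            "community": community,
            "labels": list(labels),
            "contact": contact,
        },
        "records": copy.deepcopy(records),
    }


def copy_for_sharing(package):
    """Return an independent copy for a data user, refusing if cultural metadata is missing."""
    meta = package.get("cultural_metadata") or {}
    if not meta.get("community") or not meta.get("labels"):
        raise ValueError("cultural metadata missing; consult the community before sharing")
    return copy.deepcopy(package)


# Placeholder values for illustration only.
pkg = package_with_cultural_metadata(
    [{"site_id": "SITE-001", "period": "Woodland"}],
    community="[community name]",
    labels=["TK Attribution", "TK Non-Commercial"],
    contact="[THPO contact]",
)
shared = copy_for_sharing(pkg)
```

The design choice is that records and cultural metadata are never shared separately: every copy handed to a data user carries the same labels and contact as the original.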
Another component, increasing data literacy, has also been slow to be implemented across the system. Data literacy training might focus on implementing FAIR and CARE practices in everyday workflows and processes. Knowing how to add paradata and metadata and preserve files properly is key, but literacy also includes the ability to interrogate datasets and analytic steps to (1) consider their implicit biases, blind spots, and tacit assumptions and (2) think about how data can and should relate to a broader context and system (Kansa and Kansa 2021:82–83). Although the creation of regional and national repositories is critical, the lack of a national data governance structure (Nicholson et al. 2023:67) and of training programs on best practices for preparing data for long-term curation and appropriate reuse in research and planning has continued to be a major hurdle to behavior change (Kansa and Kansa 2021; Watrall 2019).
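Interrogating a dataset for blind spots can start with something as basic as asking which fields are systematically empty, and for which sources. The sketch below computes, for each source, the share of records that have a value in each field; uneven completeness across sources can hint at differences in recording practice worth investigating before reuse. The field and source names are hypothetical, and a completeness table is only one narrow check, not a full assessment of bias.

```python
from collections import Counter, defaultdict


def completeness_by_source(records, fields, source_field="source"):
    """Return {source: {field: share of that source's records with a non-empty value}}."""
    totals = Counter()
    filled = defaultdict(Counter)
    for rec in records:
        src = rec.get(source_field, "unknown")
        totals[src] += 1
        for f in fields:
            if rec.get(f) not in (None, ""):
                filled[src][f] += 1
    return {src: {f: filled[src][f] / n for f in fields} for src, n in totals.items()}


# Hypothetical records from two sources, for illustration only.
records = [
    {"source": "CRM survey", "period": "Archaic", "depth_cm": 20},
    {"source": "CRM survey", "period": "", "depth_cm": 35},
    {"source": "legacy site file", "period": "Woodland", "depth_cm": None},
]
for src, shares in completeness_by_source(records, ["period", "depth_cm"]).items():
    print(src, shares)
```

A result such as depth being recorded for every survey record but never for legacy site files does not say which source is right; it flags a question to ask of the data's creators before the two are combined.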
Given the lack of systematic thinking to date, both top-down and bottom-up approaches that provide coherent guidelines and best practices for digital curation need to be implemented. More programs dedicated to training practitioners to prepare data and digital projects for long-term preservation and reuse are also needed, such as Michigan State University's Institute for Digital Archaeology Method and Practice and its Digital Culture Heritage program (Watrall 2019; Watrall and Goldstein 2022), Networking Archaeological Data and Communities (Alexandria Archive Institute 2023), and the Spatial Archaeology Residential and Online Institute (Klehm 2023). Other initiatives, such as online classes (Digital Archaeological Archive of Comparative Slavery 2023) and workshops at professional conferences, provide important educational opportunities for institutions with fewer resources to learn how to collect, prepare, and analyze data for their own research while preserving the data and resulting interpretations for sharing and appropriate reuse. Such efforts can help rebalance the system and reduce its vulnerability because they not only serve as intervention points that prevent technological obsolescence and data loss but also encourage effective data reuse in response to both grand and daily challenges (Flewellen et al. 2021; Kintigh et al. 2014; Rivers Cofield et al. 2024).
CONCLUSION
This discussion of the digital data system in US archaeology seeks not only to help orient practitioners to the parts, structure, intersections, and causal dynamics of the system but also to raise useful questions about our collective next steps. Systems-thinking concepts and diagrams offer helpful ways to conceptualize and illustrate the long-favored behaviors that have created causal pathways in US archaeology's digital data system. Like many others, we advocate for significant shifts in behavior to balance the system by increasing resources that prioritize data accessibility, sharing, and appropriate reuse. When we prioritize training and instilling behaviors that foster digital curation practices, our system can become more robust and better able to respond to change. This prepares us to adapt to increasingly volatile changes in digital technologies, which push toward a return to bad behaviors (i.e., the ever-present pressure to prioritize new excavations without giving equal time to proper curation and analysis of legacy data). By making the structure of archaeology's data system more visible to archaeologists and other specialists, we hope to encourage conversation about the imbalance in the system, as well as the actionable steps and scholarly shifts needed to benefit the archaeological community as a whole, both now and in the long term.