Over the last decade, R has gradually emerged as the archaeological language for data analysis (Schmidt and Marwick 2020). This trend reflects a renewed interest in formal approaches and the use of statistics in archaeology. Moreover, this usage is situated in a particular context: the 2010s witnessed two major and related developments. The first is the recognition of the reproducibility crisis spanning all disciplines (Baker 2016; Ioannidis 2005), including archaeology (Karoune and Plomp 2022; Marwick 2017). The second is the burgeoning open science movement, supported by national and international initiatives, with varying degrees of institutional commitment. All of this unfolds against a backdrop of substantial growth in the volume of collected and processed archaeological data.
R (R Core Team 2023) is a programming language for statistical computing released under the GNU General Public License. The freedoms offered by the GNU license and the modular structure of R allow the development of packages that provide additional functionality, usually dedicated to a specific task, making R a versatile tool.
The tesselle project (https://www.tesselle.org; Figure 1) is a collection of R packages for research and teaching in archaeology that emerged from this evolving landscape of research practices. This article describes the tesselle project, its objectives, and the main principles of its design, and it provides some reflections on the encountered challenges and future directions. The aim is to offer an overview of the project, presenting its key components, and encouraging the reader to continue exploring the documentation.
Motivation
The increasing use of R poses a number of new challenges for the archaeological community. These challenges can be grouped into two main categories, allowing us to distinguish between intrinsic and extrinsic difficulties in the use of programming languages—including R—in archaeology. Intrinsic difficulties pertain to the very use of programming languages and, in general, the use of any research software, especially in light of the challenges of research reproducibility. Despite all the care taken during development, no software is entirely free from bugs. Similarly, unintended uses by users can lead to unexpected, if not erroneous, results. Finally, each language—and by extension, each software—has its life cycle: more or less significant changes may occur over time (whether or not visible to the user), and maintenance may also cease.
This latter point echoes what can be termed extrinsic difficulties: those that do not directly relate to the use of programming languages but to the organization and functioning of the archaeological discipline as a community. Baptiste and Roe (2021) have highlighted the fragility of open-source archaeology: most projects have a short lifespan and rely on precarious work that often lacks professional and institutional recognition. Additionally, there is the issue of training for archaeologists; as emphasized by Schmidt and Marwick (2020), it is unlikely that established professionals would be motivated to program if they had not been trained to work with code early in their career. Reflection should be undertaken at the institutional level on how digital tools are becoming prominent in the professional context (Tufféry 2019) and on the additional workload that open science may represent (Hostler 2023).
The tesselle project was conceived as an attempt to respond to some of the intrinsic challenges associated with using R in archaeology. In doing so, the project encounters the same extrinsic challenges as the rest of open-source archaeology. This project is driven by two primary objectives: to move away from proprietary environments and advance toward more transparent and open methodologies in archaeological research. At the time of writing this article, there are over 20,000 packages available on the Comprehensive R Archive Network (CRAN; https://cran.r-project.org), providing a vast array of tools to meet most analytical needs. Furthermore, owing to the collective efforts of the community, a wealth of high-quality packages tailored to archaeology have been developed (for a comprehensive list, see Marwick et al. 2022).
The tesselle packages are centered on quantitative analysis methods specifically crafted for archaeology. They are designed to complement both general-purpose and other specialized statistical packages. These packages serve as a versatile toolbox, facilitating the exploration and analysis of common data types in archaeology—such as count data, compositional data, or chronological data—and enabling the construction of reproducible workflows.
Additionally, the project was designed with a focus on university-level teaching. Although this last point requires an in-depth discussion beyond the scope of this article, it is worth noting that improved statistical and scientific programming training contributes to addressing research reproducibility issues (Munafò et al. 2017). Numerous teaching resources are available (e.g., Carlson 2017), but the importance of these courses appears to vary widely across archaeology programs. The tesselle project also aims to help novice programmers start analyzing their data in R by offering a consistent toolbox.
Design Principles
The design of the tesselle project and its packages drew inspiration from certain aspects of the tidyverse (https://www.tidyverse.org), particularly its emphasis on prioritizing end users, given that R is primarily used by nonprogrammers (Wickham et al. 2019). This is manifested through the attention given to package documentation. Each package is accompanied by a website consolidating all the documentation, which is accessible from the portal https://packages.tesselle.org. Improving the documentation is one of the most significant ongoing endeavors: the goal is to provide novice users with sufficient resources to facilitate their initial use of the tools.
The tesselle project also aims to adhere to the recommendations of the tinyverse (https://www.tinyverse.org) by trying to minimize external hard dependencies to the bare essentials. This simplifies maintenance by avoiding external changes that might impact or break the project. Keeping the project as lightweight as possible also serves to minimize the impact on the end user, ensuring that the installation of one package does not entail installing dozens of others. Although not all packages in the tesselle project are entirely dependency-free (Figure 2), the dependencies, with a few exceptions, are internal to the project (the arkhe package, for example, was initially designed for internal use by other packages within the project).
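As a rough illustration of what a hard dependency means in practice, the following sketch lists the packages that an installed package declares in the Depends, Imports, and LinkingTo fields of its DESCRIPTION file. The helper name hard_deps is hypothetical and is not part of the tesselle project; stats is used only because it ships with every R installation, and any installed package name could be substituted.

```r
# Hypothetical helper (not part of tesselle): list the hard dependencies
# declared by an installed package in its DESCRIPTION file.
hard_deps <- function(pkg) {
  # Fail early if the package is not installed
  stopifnot(nzchar(system.file(package = pkg)))

  # Read only the fields that force other packages to be installed
  desc <- utils::packageDescription(
    pkg, fields = c("Depends", "Imports", "LinkingTo")
  )
  fields <- as.character(unlist(unclass(desc)))
  fields <- fields[!is.na(fields)]

  # Split the comma-separated lists and drop version constraints, e.g. "(>= 3.5)"
  deps <- unlist(strsplit(fields, ",", fixed = TRUE))
  deps <- trimws(gsub("\\([^)]*\\)", "", deps))

  # "R" itself is a version requirement, not a package
  setdiff(unique(deps[nzchar(deps)]), "R")
}

# "stats" is an illustrative choice; replace it with any installed package
hard_deps("stats")
```

The shorter the returned vector, the fewer additional packages a user has to install and the fewer upstream changes can break the package.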
The project is developed with transparency and reliability in mind, as indicated by the following:
• All packages are distributed under the GNU General Public License (https://www.gnu.org/licenses/gpl-3.0.html): this makes it possible to freely run, copy, distribute, study, change, and improve them.
• All packages are publicly maintained, with source code accessible and versioned on GitHub (https://github.com/tesselle/).
• All packages are rigorously tested, and their code coverage is monitored. Most of them are distributed on CRAN, which implies adherence to stringent standards (Chambers 2020).
However, some reservations must be addressed regarding the implementation of these guiding principles. Like much open-source software, the tesselle packages come without any warranty. As highlighted by Kreutzer et alia (2017), software quality assurance is a shared responsibility between developers and users. Even with adherence to rigorous development practices (testing, cross-validation, code coverage, etc.), incorrect or unexpected results may arise (flaws in design, corner cases, etc.), or breaking changes may be introduced.
End users must accurately report and cite any software used, along with its version number, to ensure transparency and reproducibility of published results. By doing so, the published results are associated with a specific state of the software, ensuring traceability in case a software error is discovered later. Within the tesselle project, semantic versioning (https://semver.org) is employed to assign version numbers. Semantic versioning is a scheme in which each part of a version number (major, minor, patch) conveys the nature of the change: it supports compatibility and stability, because it distinguishes between major changes that may require adjustments in existing code and minor changes that can be safely integrated without major disruptions. Furthermore, every version of each package is archived on Zenodo (https://zenodo.org) and receives a DOI to be easily citable.
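The reporting practice described above can be sketched with functions that ship with R. This is a minimal illustration rather than a prescribed workflow: stats stands in for whichever package was actually used, the two version numbers are hypothetical, and the helper major is introduced here only for the example.

```r
# "stats" is a stand-in because it ships with every R installation;
# substitute the package actually used in the analysis.
pkg <- "stats"

# Installed version, to be reported alongside the results
ver <- utils::packageVersion(pkg)

# Citation entry, to be included in the reference list
ref <- utils::citation(pkg)

# Semantic versioning reads MAJOR.MINOR.PATCH. Both versions below are
# hypothetical: one used for published results, one released later.
old <- package_version("1.4.2")
new <- package_version("2.0.0")

# Hypothetical helper: extract the first (major) component of a version
major <- function(v) unclass(v)[[1]][1]

# Under semantic versioning, a higher major number signals changes that
# may require adjustments to existing code.
breaking <- major(new) > major(old)

print(ver)
print(ref)
print(breaking)
```

Recording the output of utils::sessionInfo() alongside the results is another common way to capture the versions of all attached packages at once.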
Components
A meta-package, called tesselle, lets one download and install the project's core packages with a single R command:
install.packages("tesselle")
Using the library() function, one can then attach the core tesselle packages:
library("tesselle")
The following core packages are designed to work seamlessly together and can be used to explore and analyze common data types in archaeology:
• tabula (Frerebeau 2023a; https://packages.tesselle.org/tabula/) allows for the examination of archaeological count data. It provides several tests and measures of diversity: heterogeneity and evenness, richness and rarefaction, turnover, and similarity. This package makes it easy to visualize count data and statistical thresholds—rank versus abundance plots, heatmaps, and Ford and Bertin diagrams.
• kairos (Frerebeau 2024a; https://packages.tesselle.org/kairos/) provides a tool kit for absolute dating and the analysis of chronological patterns. This package includes functions for chronological modeling and dating of archaeological assemblages from count data. It provides methods for matrix seriation and allows for the computation of time point estimates and density estimates of the occupation and duration of an archaeological site. This package relies on aion (Frerebeau and Roe 2023; https://packages.tesselle.org/aion/), which makes it easier to work with time series in archaeology.
• nexus (Frerebeau and Philippe 2024; https://packages.tesselle.org/nexus/) allows for the exploration and analysis of compositional data. It provides tools for chemical fingerprinting and source tracking of ancient materials by chemical composition.
• dimensio (Frerebeau 2024b; https://packages.tesselle.org/dimensio/) offers methods to compute, extract, summarize, and visualize results of simple multivariate data analysis (Principal Components Analysis [PCA] and Correspondence Analysis [CA]).
• isopleuros (Frerebeau 2024c; https://packages.tesselle.org/isopleuros/) enables the creation of ternary plots and includes common ternary diagrams useful for archaeologists (e.g., soil texture charts, ceramic phase diagrams).
Additionally, companion packages complement these core packages for specific tasks, such as data visualization or preparation, and can be installed separately. khroma (Frerebeau 2024d; https://packages.tesselle.org/khroma/) provides accessible color schemes tailored for each type of data (qualitative, diverging, or sequential). alkahest (Frerebeau 2023b; https://packages.tesselle.org/alkahest/) is a toolbox for preprocessing XY data from experimental methods (i.e., any signal that can be measured along a continuous variable): it provides methods for baseline estimation and correction, smoothing, normalization, and more. For teaching purposes, folio (Frerebeau 2024e; https://packages.tesselle.org/folio/) offers several datasets related to broad topics in archaeology and paleontology, which can be used to illustrate statistical methods in the classroom.
Concluding Words
The tesselle project has reached a stable state and is actively being developed. This collection of R packages aims to contribute to the development of open-source computational archaeology. It provides a consistent and reproducible tool kit that can be easily extended. Users are invited to contribute, share feedback, request new features, or report bugs on GitHub: https://github.com/tesselle/.
Further reading—including examples, tutorials, and detailed documentation—can be found at http://www.tesselle.org.
Acknowledgments
The following contributors have made it possible to develop this project by providing helpful discussion and bringing in new ideas: Jean-Baptiste Fourvel, Ben Marwick, Anne Philippe, and Joe Roe. The author would like to thank Brice Lebrun for creating the project and package logos. The development and maintenance of packages within the tesselle project are greatly facilitated by the following packages: usethis (Wickham et al. 2024), devtools (Wickham, Hester, et al. 2022), pkgdown (Wickham, Hesselberth, and Salmon 2022), tinytest (van der Loo 2020), tinysnapshot (Arel-Bundock 2024), codemetar (Boettiger and Salmon 2022), and cffr (Hernangómez 2021). The project also benefits from these infrastructures built and maintained by the R community for package distribution: the Comprehensive R Archive Network (https://cran.r-project.org) and R-universe (Ooms 2021; https://r-universe.dev).
Funding Statement
This research received no specific grant funding from any funding agency or from commercial or not-for-profit sectors.
Data Availability Statement
No original data have been presented in this article. The source code for all R packages is available on GitHub (https://github.com/tesselle/) and archived on Zenodo (see references cited).
Competing Interests
The author declares none.