INTRODUCTION
Models of the coupled climate system are used to project how changes in natural and anthropogenic forcing will affect future climates (Collins et al., Reference Collins, Knutti, Arblaster, Dufresne, Fichefet, Friedlingstein, Gao, Gutowski, Johns and Krinner2013; Kirtman et al., Reference Kirtman, Power, Adedoyin, Boer, Bojariu, Camilloni, Doblas-Reyes, Fiore, Kimoto and Meehl2013). The expected changes in twenty-first-century forcing and climate are larger than those experienced during the recent historic period against which these models have been calibrated. Quaternary climate states in which the changes in forcing and climate are as large as those projected for the end of the twenty-first century are therefore now routinely used as an out-of-sample test of such models (Braconnot et al., Reference Braconnot, Harrison, Kageyama, Bartlein, Masson-delmotte, Abe-ouchi, Otto-bliesner and Zhao2012; Schmidt et al., Reference Schmidt, Annan, Bartlein, Cook, Guilyardi, Hargreaves and Harrison2014; Harrison et al., Reference Harrison, Bartlein, Izumi, Li, Annan, Hargreaves, Braconnot and Kageyama2015). This evaluation is crucially dependent on the availability of reliable quantitative reconstructions of multiple climate variables (Harrison et al., Reference Harrison, Bartlein, Brewer, Prentice, Boyd, Hessler, Holmgren, Izumi and Willis2014). Indications of past climate can be found in many different marine and terrestrial archives, but pollen analysis provides by far the most geographically widespread source of data for terrestrial palaeoclimate reconstructions (Bartlein et al., Reference Bartlein, Harrison, Brewer, Connor, Davis, Gajewski and Guiot2011; Marsicek et al., Reference Marsicek, Shuman, Bartlein, Shafer and Brewer2018 and references therein). There are many methods to make quantitative reconstructions of climate variables from pollen data using both statistical relationships and process-based modelling. Most reconstructions of terrestrial palaeoclimate rely on statistical relationships between modern pollen abundances and modern climate. There are two basic approaches (see discussion in ter Braak and Juggins, Reference ter Braak and Juggins1993): analogue methods select the modern climate associated with the modern pollen samples whose abundance patterns are most like a given fossil sample; regression-based methods calculate a transfer coefficient for each pollen taxon and the climate variable, which is then applied to taxon abundances in fossil pollen samples to reconstruct climate through time.
Weighted averaging partial least squares (WA-PLS; ter Braak and Juggins, Reference ter Braak and Juggins1993) is the most widely used of the regression-based methods. Weighted averaging is a form of regression adapted to the fact that taxon abundances generally show unimodal, rather than monotonic, responses to climate variables. The climate estimate for any given sample is the abundance-weighted mean of the estimated optima of all the taxa present. WA-PLS refines these optima by looking for further information in the residuals between the initial regression and the modern observations and repeats this until the incremental change ceases to be statistically significant. WA-PLS is considered more robust against spatial autocorrelation than modern analogue methods—that is, it is considered to be less affected by the fact that geographically clustered sites may inherently show similar taxon composition (Telford and Birks, Reference Telford and Birks2005). WA-PLS has been widely used for climate reconstructions based on biotic assemblages, including pollen, diatoms, chironomids, and foraminifera (Lotter et al., Reference Lotter, Birks, Hofmann and Marchetto1997; Brooks and Birks, Reference Brooks and Birks2001; Seppä et al., Reference Seppä, Bjune, Telford, Birks and Veski2009). In some more recent publications WA-PLS reconstructions have been presented alongside reconstructions using other statistical methods, such as modern analogue methods (Brewer et al., Reference Brewer, Guiot, Sánchez-Goñi and Klotz2008; Peyron et al., Reference Peyron, Goring, Dormoy, Kotthoff, Pross, de Beaulieu, Drescher-Schneider, Vannière and Magny2011; Sinopoli et al., Reference Sinopoli, Peyron, Masi, Holtvoeth, Francke, Wagner and Sadori2019).
A number of methodological choices have to be made in the application of WA-PLS, including the choice of a training data set and which taxa are included in the regression. Studies using WA-PLS generally rely on model performance statistics as a measure of the reliability of reconstructions. They are usually silent about how methodological decisions were reached, and the implications of specific choices for the quality of the reconstructions are rarely made explicit. However, a number of studies of quantitative reconstruction techniques (Birks and Seppä, Reference Birks and Seppä2004; Bjune et al., Reference Bjune, Birks, Peglar and Odland2010; Telford and Birks, Reference Telford and Birks2011; Juggins, Reference Juggins2013; Salonen et al., Reference Salonen, Seppä and Birks2013; Juggins et al., Reference Juggins, Simpson and Telford2015; Shennan et al., Reference Shennan, Long and Horton2015; Jonkers and Kučera, Reference Jonkers and Kučera2018) have examined some of these choices and suggested that they can influence the quality of the reconstructions. None of the previous studies provides a comprehensive analysis of the issues or discriminates between the quality of the statistical model and the impact on the climate reconstructions.
Our goal in this paper is to increase confidence in the quantitative reconstructions of past climates that are used for model evaluation through an analysis of the implications of methodological decisions both on model performance measures under modern conditions and the resulting quantitative reconstructions of glacial climates in southern Europe. We propose ways in which the implications of specific methodological choices can be evaluated in order to document the reliability of the resulting reconstructions.
METHODS
We test the impact of methodological choices by making climate reconstructions for selected fossil pollen records using subsets of a continental-scale modern training data set as inputs to WA-PLS (see “Modern pollen data”). In order to run these tests under climate conditions substantially different from present, we use fossil pollen records from southern Europe covering the last glacial period (see “Fossil pollen”). We examine the effects of methodological choices on WA-PLS model parameters and on the resulting climate reconstructions and their uncertainties (see “Application of WA-PLS”).
Modern pollen data
The modern pollen data set (Fig. 1, Supplementary Fig. 1), which we refer to as SMPDS, was constructed by combining records from the European Modern Pollen Database (EMPD) v3.0 (Davis et al., Reference Davis, Zanon, Collins, Mauri, Bakker, Barboni and Barthelmes2013), the Eastern Mediterranean-Black Sea-Caspian Corridor Biomes (EMBSeCBIO, which we abbreviate as EMB) database (Marinova et al., Reference Marinova, Harrison, Bragg, Connor, Laet, Leroy and Mudie2018), additional published records (see Supplementary Table 1) from the European Pollen Database (http://www.europeanpollendatabase.net/) or taken from Pangaea (https://www.pangaea.de/), and 73 modern surface samples from northern Spain (Wei et al., Reference Wei, González-Sampériz, Gil-Romera, Harrison and Prentice2019a). About two-thirds of the sites (4575) were derived from the EMPD v3.0 (Davis et al., Reference Davis, Zanon, Collins, Mauri, Bakker, Barboni and Barthelmes2013), and a further 1088 sites were derived from the EMBSeCBIO database (Marinova et al., Reference Marinova, Harrison, Bragg, Connor, Laet, Leroy and Mudie2018). Some of the sites in the EMPD also occur in the EMBSeCBIO database, and these duplicates have been removed. The final SMPDS data set consists of records from 6458 terrestrial sites (Harrison, Reference Harrison2019). We compare the EMPD and EMBSeCBIO subsets of the SMPDS data set to the full data set to examine the impact of the choice of training data set. The majority of the records were available as raw pollen counts; the remainder were percentages. The individual pollen records were taxonomically standardised and cleaned to remove obligate aquatics, insectivorous species, introduced species, and taxa that only occur as cultivars. The final data set expresses the counts as a percentage of the sum of all taxa remaining after this screening. There are 1558 taxa recorded in the SMPDS. Since some of these taxa are only recorded sporadically, we amalgamated them at higher taxonomic levels to produce a data set with 249 taxa (Harrison, Reference Harrison2019). We confined our analyses to taxa with ten or more occurrences in the SMPDS (n = 195).
Modern climate data
Modern climate data were derived from the Climate Research Unit CRU CL 2.0 data set, which provides monthly mean precipitation, monthly mean temperature, and fractional sunshine hours as long-term means for 1961 to 1990 at 10 minute spatial resolution (New et al., Reference New, Lister, Hulme and Makin2002). Geographically weighted regression (GWR) (Brunsdon et al., Reference Brunsdon, Fotheringham and Charlton2002) using latitude, longitude, and elevation as predictors was performed in ArcGIS (ESRI, 2014) to obtain the climate at the location and elevation of each modern pollen site. We then calculated three bioclimatic variables: (a) mean temperature of the coldest month (MTCO), (b) growing degree days above a baseline of 0°C (GDD0), and (c) moisture index (MI), the ratio of annual precipitation to annual potential evapotranspiration. MTCO was taken directly from GWR output. Daily values of temperature, sunshine fraction, and precipitation were derived using a mean-conserving interpolation (Rymes and Myers, Reference Rymes and Myers2001) of the monthly data. MI was calculated from the daily temperature, precipitation, and sunshine data using modified Python code from the Simple Process-Led Algorithms for Simulating Habitats (SPLASH) model (Davis et al., Reference Davis, Prentice, Stocker, Thomas, Whitley, Wang, Evans, Gallego-Sala, Sykes and Cramer2017).
These three bioclimatic variables have been shown to provide a good prediction of vegetation distribution both at global (Wang et al., Reference Wang, Prentice, Keenan, Davis, Wright, Cornwell, Evans and Peng2017) and regional (Wang et al., Reference Wang, Prentice and Ni2013) scales because they reflect important and distinct ecophysiological controls on plant growth (Harrison et al., Reference Harrison, Prentice, Barboni, Kohfeld, Ni and Sutra2010). MTCO is a surrogate for extreme winter temperatures and influences the survival of woody plants through a wide range of low temperature–tolerance mechanisms, GDD0 is a combined measure of the length and warmth of the growing season that determines potential annual carbon accumulation, and MI is a measure of plant-available moisture (Harrison et al., Reference Harrison, Prentice, Barboni, Kohfeld, Ni and Sutra2010). Canonical correspondence analysis (CCA) applied to the SMPDS data set (Wei et al., Reference Wei, González-Sampériz, Gil-Romera, Harrison and Prentice2019a) showed a strong correlation between species abundance and these climate variables. The modern climate data at each of the SMPDS sites (Harrison, Reference Harrison2020) are available online (doi: 10.5281/zenodo.3605003). Partial CCA carried out using each climate variable in turn, with the other two as covariates, shows that each variable has a highly significant effect (p < 0.001) that is independent of the others (Wei et al., Reference Wei, González-Sampériz, Gil-Romera, Harrison and Prentice2019a).
Fossil pollen
The fossil pollen data were taken from the Abrupt climate Changes and Environmental Responses (ACER) database (Sanchez Goñi et al., Reference Goñi, Desprat, Daniau, Bassinot, Polanco-Martínez, Harrison, Allen and J.R.M.2017). The ACER database includes 93 pollen records globally covering part or all of the last glacial period (73–15 ka) with a temporal resolution better than 1000 years. Glacial-age pollen records provide a test of reconstruction techniques under climate conditions substantially different from present and therefore allow rigorous testing of the adequacy of the modern training data set. As examples, we use eight lacustrine records from southern Europe from the ACER database (Supplementary Table 2). The database provides standardised age models for all these cores, but the sampling resolution varies among cores. The Ioannina core (northwestern Greece, 39° 45' N, 20° 51' E) is used here as the primary example because it covers a long period (79.7–10.6 ka) and has relatively high temporal resolution (mean 237 years). However, as discussed below, the conclusions based on this core are supported by analysis of the other fossil data sets.
Climate space analysis
One assumption of WA-PLS is that the taxa used for reconstruction show a unimodal response to the climate variable being reconstructed. In order to test whether this assumption holds true, particularly for amalgamations to higher taxonomic levels, the climate space occupied by individual pollen taxa was represented and visualised using Generalized Additive Models (GAM) (Guisan et al., Reference Guisan, Edwards and Hastie2002) implemented with the Mixed GAM Computation Vehicle (mgcv) R package (Wood, Reference Wood2017). This approach fits a response surface to the concentration of the pollen abundance in 3D climate space. Abundance is taken as the pollen percentage based on a pollen sum that includes all of the 195 taxa. Convex hulls are used to delineate the area of climate space that contains sampling points and thus avoid representing taxon abundances in climates not closely constrained by the modern pollen data. Convex hulls were fitted using the alphahull and ggplot2 packages in R (Pateiro-Lopez and Rodriguez-Casal, Reference Pateiro-Lopez and Rodriguez-Casal2016; Wickham, Reference Wickham2016). We used a square root transformation of MI, as differences between MI values at the low end of the MI scale are more important than differences at the high end in their effect on vegetation (Prentice et al., Reference Prentice, Cleator, Huang, Harrison and Roulstone2017); taking the square root “stretches” the lower values and “compresses” the higher ones. MTCO and GDD0 were not transformed. We do not include interactions among the climate variables since previous analyses (Wei et al., Reference Wei, González-Sampériz, Gil-Romera, Harrison and Prentice2019a) have shown that they each have a significant and independent influence on the distribution of plant taxa and vegetation types. For visualisation purposes, we show 2D slices (MTCO and MI) through the fitted 3D response surfaces at low, medium, and high values of GDD0. Here we present a number of illustrative GAMs; Wei et al. (Reference Wei, Harrison and Prentice2019b, Reference Wei, Prentice and Harrison2020) provide GAMs for all of the 195 pollen and pteridophyte spore taxa from the SMPDS data set.
Application of WA-PLS
The modern bioclimatic and pollen data were used to create pollen-climate transfer functions independently for MTCO, GDD0, and √MI using WA-PLS (ter Braak and Juggins, Reference ter Braak and Juggins1993). WA-PLS was implemented with the rioja R package (v0.9-15.1; Juggins, Reference Juggins2017). The performance of the calibration models was assessed through leave-one-out cross-validation. The number of components used in each model was estimated through a randomisation t-test on the results of this cross-validation (van der Voet, Reference van der Voet1994). We selected the significant component with the lowest root mean square error (RMSE), but only if there was a significant improvement in RMSE relative to a lower number of components—since including more components can result in over-fitting of the data so that model predictive value decreases. When making direct comparisons between different subsets of the data and SMPDS, we used the component that was considered significant for that subset (see Supplementary Table 3) rather than the same component across all sets. This choice does not affect the analysis, and for completeness we include an example comparison using the same components for all training sets in the Supplementary Information (Supplementary Figs. 2, 3, and 4).
The (in)stability of transfer coefficients and the statistical uncertainty of reconstructions were assessed by bootstrapping the modern sample set with replacement 1000 times (Efron, Reference Efron1979) and running WA-PLS each time to derive 1000 instances of both the taxon transfer coefficients and the reconstructed climate for each sample. The standard deviations (SDs) of the taxon transfer coefficients and the SDs of the reconstructions were then calculated from the 1000 bootstrap samples. The variability of a taxon coefficient, as measured by its SD, reflects how consistently the available modern samples including that taxon represent climate. The SD of the reconstruction calculated in this way represents the combined effect of the taxon coefficient variability across all taxa in a fossil sample. This approach differs from the standard method for calculating uncertainties in the rioja R package (Juggins, Reference Juggins2017), which is based only on bootstrapping of the modern data set, because it combines the uncertainties of individual taxa from the modern pollen data set. Data analysis and plotting were performed in R v3.5.1 (R Core Team, 2018).
RESULTS
WA-PLS provides a transfer coefficient for each taxon and each bioclimatic variable. The validity of this transfer coefficient depends on the taxon being sampled across the full range of its realised niche in climate space, and this in turn is determined by the climate space sampled in the modern pollen training data set. Our initial tests therefore focus on the impact of the sampling of climate space on WA-PLS performance metrics and the resulting reconstructions (see “Impact of choice of training data set” below). One of the factors that affects the width of the realised taxon niche is taxonomic resolution, with species in general occupying a more limited niche than genera or families. The use of higher taxonomic groupings increases the number of samples available and can improve the sampling of climate space. We therefore examine the impact of using higher taxonomic groupings on the sampling of climate (see “Impact of amalgamation of pollen taxa to higher taxonomic levels” below). The choice of taxa included in the WA-PLS model will also impact the overall width of the sampled climate space, and we also address the potential impact of including or excluding taxa on model performance and the resulting reconstructions (see “Impact of number of taxa” below).
Impact of choice of training data set
In order to test the impact of the choice of training data set on WA-PLS performance metrics and reconstructions, we compared results from the full SMPDS data set and those obtained using either the EMB or the EMPD subsets. The EMB samples warmer seasonal climates than the EMPD (see Fig. 1, Supplementary Fig. 5). The SMPDS samples much colder winter climates (as measured by MTCO) than either the EMB or EMPD data sets, and colder summers (as measured by GDD0) than the EMB set. These differences in the sampled climate space are reflected in reconstructions of winter (Fig. 2, see Supplementary Fig. 2) and summer (see Supplementary Fig. 3) temperatures, and also in moisture (as measured by MI) (see Supplementary Fig. 4).
Reconstructions based on the EMB data set are generally warmer than those based on the EMPD, and there is little overlap between the spread of the reconstruction estimates in the coldest intervals of the glacial period (see Fig. 2). The SMPDS data set produces temperature reconstructions similar to those based on the EMPD but has a considerably narrower reconstruction spread because it samples a wider climate space and contains more samples for each pollen taxon and thus a more complete sampling of the taxon's climatic range. The SDs of the coefficients are related to the number of occurrences (Fig. 3) of the taxon in the sampled data set.
The difference in the reconstructions and their stability is not reflected in differences in the WA-PLS model performance measures (Table 1). Model performance is poorer for EMB-based reconstructions than for SMPDS-based reconstructions, but all of the measures of model performance (r 2, RMSE, maximum bias) are better for EMPD than for the SMPDS data set because they are measures of how well the WA-PLS model replicates the sampled climate space, rather than whether the training data set encompasses a sufficiently large climate space. The standard performance measures are therefore an insufficient guide to the reliability of WA-PLS reconstructions. Comparison of observed versus predicted values (and their residuals) for the SMPDS data set show that this data set adequately reproduces modern climate (Supplementary Fig. 6).
Both the range and the continuity of the sampled climate are important. This is a consequence of the underlying assumption of WA-PLS that response curves are unimodal and extend across the full realised niche of the taxon; an abundance distribution which is truncated or has gaps cannot be expected to give reliable coefficients. This is borne out by analyses in which the density of sampling is artificially reduced. A proportion of the modern samples was progressively removed while preserving the overall range of climate space (Fig. 4, Supplementary Figs. 7, 8, and 9), having permuted the samples to avoid artificial similarities when samples which are neighbours in the database are also geographically close. Even a 70% reduction in the number of samples does not change the reconstructed winter (see Fig. 4, Supplementary Fig. 10) or summer (Supplementary Figs. 11 and 12) temperature, providing that the range of the sampled climate is maintained (see Supplementary Figs. 7 and 8). It does, however, lead to larger uncertainties because the number of times each taxon is sampled is reduced and the taxon coefficients are therefore less stable (Supplementary Fig. 9). Again, model performance measures do not discriminate between the quality of the reconstructions made with the reduced density or the full SMPDS data set (Table 2).
Tests in which samples from part of the climate range are systematically removed demonstrate the importance of continuous sampling of the climate range (Fig. 5). Removing blocks of the MTCO gradient representing equal numbers of samples shows that discontinuities result in greater uncertainties (as measured by the bootstrapped SDs of the taxon coefficients) than those of the full SMPDS set. With one marginal exception, the uncertainties are also greater than those of a set from which the same number of samples randomly selected has been removed. Width and continuity are both important for obtaining stable taxon coefficients. Thus, it is important to ensure that the modern pollen training data set encompasses a wide range of climate space, to avoid offsets in the climate reconstructions. It is also important that the training data set samples the bioclimatic gradient as continuously as possible, in order to provide the most precise coefficients.
Impact of amalgamation of pollen taxa to higher taxonomic levels
Large modern pollen data sets are created by combining data collected by many palynologists, often for different types of studies, and as a result the level of taxonomic discrimination varies among sites. Many pollen taxa are recorded at a limited number of sites in such data sets, either because they are not regularly identified or because they are genuinely rare. Using taxa that are only recorded rarely, or show geographic clustering suggesting that they have not been sampled across the whole of their potential climate range, leads to unstable WA-PLS coefficients because the bioclimatic range of the taxon is unlikely to have been sampled continuously. The principle of niche conservatism indicates that higher taxa commonly have coherent environmental distributions (Ackerly, Reference Ackerly2003; Wake et al., Reference Wake, Hadly and Ackerly2009) (Fig. 6), and this principle provides a basis for amalgamating taxa that are only recorded at a few sites in the training data set. The coherency of the climate space occupied by the amalgamated taxon can be assessed using, for example, GAMs. Wei et al. (Reference Wei, Harrison and Prentice2019b) have tested the assumption that the taxa used here for reconstructions show a unimodal response to individual climate variables. They show that some taxa are unimodal with respect to one climate variable but are insensitive to others; the classic example is Artemisia, which appears to be insensitive to temperature but shows a clear optimum with respect to moisture. The assumption of unimodality holds true for most taxa across all climate variables; amalgamation to higher taxonomic levels does not modify this response substantially. As we show here, amalgamating rare taxa into higher taxa improves the bootstrapped SD of the transfer coefficients (Table 3, Supplementary Information Tables 5 and 6) by providing a more continuous sampling of bioclimatic gradients, and as a result leads to reduced uncertainties in the reconstructions.
Impact of number of taxa
Some taxa appear to be relatively uninformative either because they have a wide climatic tolerance or because they are rarely sampled, which raises the issue of whether such taxa should be excluded from the WA-PLS regression. However, analyses of randomly sampled taxa show that increasing the number of taxa monotonically narrows the reconstruction spread (Fig. 7a). In our analyses, the impact of including more taxa is steep initially and becomes discernibly more gradual above about 80 taxa, after which the mean reconstruction remains relatively stable with minimal offsets (Fig. 7b).
The abundance of pollen does not directly correspond to the abundance of a taxon in the vegetation (Prentice and Parsons, Reference Prentice and Parsons1983; Prentice, Reference Prentice1985; Sugita, Reference Sugita2007; Hellman et al., Reference Hellman, Gaillard, Broström and Sugita2008). Some taxa are systematically overrepresented in pollen assemblages (e.g., Pinus) while others are systematically under-represented (e.g., Larix). Furthermore, the comparative ease of pollen transport from closed canopy vegetation such as forest into more open landscapes that produce and disperse less pollen means that arboreal pollen is systematically overrepresented at sites in open tundra or steppe-type vegetation (e.g., Edwards et al., Reference Edwards, Anderson, Brubaker, Ager, Andreev, Bigelow and Cwynar2000; Bigelow et al., Reference Bigelow, Brubaker, Edwards and Harrison2003; Marinova et al., Reference Marinova, Harrison, Bragg, Connor, Laet, Leroy and Mudie2018). However, the inclusion of taxa that are substantially overrepresented in modern assemblages compared to their abundance in the vegetation does not degrade model performance. Reconstructions made with the full SMPDS data set and with a data set that excludes Pinus (Supplementary Fig. 14) show little difference either in reconstructed MTCO or in reconstruction spread across multiple sites. Thus, although some studies (e.g., Sinopoli et al., Reference Sinopoli, Peyron, Masi, Holtvoeth, Francke, Wagner and Sadori2019) have excluded such taxa, there is no general a priori reason to do so.
Fossil pollen assemblages almost always contain fewer taxa than the training data set. Thus, it might seem logical to base reconstructions solely on the taxa present in the fossil assemblages. However, the absence of taxa can also provide information about climate and thus can contribute usefully to the WA-PLS reconstruction. Reconstructions of MTCO using only the taxa that are present in the fossil pollen samples at Lake Ioannina are not distinguishable from those made with the full SMPDS, either in terms of mean values or reconstruction spread (Supplementary Fig. 15b). However, at other sites, the use of only taxa present in the fossil record results in a difference in the reconstructions. At Megali Limni, for example, the reconstructions based on the fossil-taxa-only model are ca 1°C colder than those based on the full SMPDS data set (Supplementary Fig. 15a). At both Lake Ioannina and at Megali Limni, the model performance indicators show a substantial improvement using the full SMPDS data set: the r 2 increases from ~0.55 to 0.7 at both sites, and the RMSE decreases from 5.80 to 4.82 at Ioannina and from 5.88 to 4.82 at Megali Limni (Supplementary Table 3). Using the full SMPDS data set is beneficial because, although taxon transfer coefficients are nearly independent of each other (and thus are little affected by the presence/absence of other taxa), WA-PLS uses information from other taxa to refine the coefficients.
The coefficients of rare taxa are not stable (see Fig. 3), and this can cause problems if these taxa are anomalously abundant, even at relatively low levels, in fossil samples. Changes in reconstruction spread are one indication of what is essentially a poor-analogue problem. At Lake Ioannina, for example, the increased spread in reconstructed MTCO during the interval between 73 and 68 ka (see Fig. 2b) occurs because the abundance of Ulmus/Zelkova increases from ca 0.75% to 6–8% of the sample total. This taxon is recorded at only 31 sites in the modern data set and has a very large SD (14.7°C). Alternative methods, such as squared chord distance (Supplementary Fig. 16), confirm that this interval has poor analogues. Although little can be done about such poor-analogue situations, it is important to identify them through the use of a robust measure of reconstruction uncertainty for individual samples, such as the reconstruction spread measure used here.
DISCUSSION
We have shown that the breadth and continuity of the climate space sampled by the modern training data set are important determinants of the quality of the reconstructions, although this may not be reflected in the WA-PLS model performance indicators. The number of samples in the data set does not provide a measure of the adequacy of sampling. There is a risk that training sets that rely on a very small number of samples (e.g., Xu et al., Reference Xu, Xiao, Li, Tian and Nakagawa2010; Salonen et al., Reference Salonen, Ilvonen, Seppä, Holmström, Telford, Gaidamavičius, Stančikaitė and Subetto2012; Ding et al., Reference Ding, Xu and Tarasov2017) do not sample a large enough range of climate space to encompass the true past climate if this is very different from present. However, as our comparison between reconstructions based on the EMPD and the full SMPDS shows, even large data sets may not have a sufficiently continuous sampling of climate space to provide stable coefficients and hence robust reconstructions. Furthermore, large data sets may still not encompass more extreme climates. This issue could be particularly important if the goal is to reconstruct climates very different from present-day climates, such as those of the last glacial period. Although most studies specify the training data set used, very few discuss how the training set was chosen or whether it samples an appropriate range of climate for the target reconstruction. Our analyses suggest that a minimum requirement is to plot the sampled climate space and thus to evaluate whether it is fit for purpose (Table 4). From the perspective of improving existing training data sets, the focus should be on filling gaps in climate space rather than adding samples in climate space already represented.
Amalgamating pollen taxa to higher taxonomic levels can provide a way of increasing the continuity of the sampling of climate and reducing the statistical error of the reconstructions. Huntley et al. (Reference Huntley, Bartlein and Prentice1989) were the first to demonstrate niche conservatism in surface pollen data, showing that the abundance distribution of Fagus as recorded in surface pollen samples in North America could be predicted from the distribution of the genus in Europe and vice versa. Subsequent studies have emphasised the degree to which lineages are conservative in terms of climate preferences (Ackerly, Reference Ackerly2003; Wake et al., Reference Wake, Hadly and Ackerly2009). Williams and Shuman (Reference Williams and Shuman2008) have provided counterexamples, where congeneric species occur at different parts of the moisture gradient in eastern and western North America. However, they also point out that the impact of this on reconstructions of climate decreases as the number of taxa used increases. Nevertheless, it should not be assumed that taxon amalgamation is always or automatically justified. One way of testing whether taxon distributions are unimodal and whether amalgamations to higher taxonomic levels are defensible is to construct GAMs (Wei et al., Reference Wei, Harrison and Prentice2019b).
The SMPDS includes sites from across the whole of the Palaearctic phytogeographic region (Wallace, Reference Wallace1876; Good, Reference Good1974; Kreft and Jetz, Reference Kreft and Jetz2010; Dengler et al., Reference Dengler, Janišová, Török and Wellstein2014) in order to create a training data set that samples colder and more continental climates than would be available from a more limited region. Although the sampling of this region is incomplete, since there are large sampling gaps in Russia and northern China, the SMPDS provides samples from much colder winter climates and climates with a greater seasonal temperature range (see Fig. 1). The expansion to the Palaearctic phytogeographic region relies on the fact that this region is characterised by taxa that have diversified in relatively recent geological time, that is, during approximately the past 20 Ma. The climate preferences of these taxa are expected to be conservative, as has been demonstrated for many plant lineages (Ackerly, Reference Ackerly2003; Williams and Shuman, Reference Williams and Shuman2008; Wake et al., Reference Wake, Hadly and Ackerly2009). The expansion of a training data set to increase the diversity of sampled climates is useful in principle, because it increases the range of different climates that can be reconstructed. As in the case of amalgamations to higher taxonomic levels, however, it is both possible and desirable to test (e.g., using GAMs, see Table 4) whether the taxa considered retain a coherent, unimodal relationship to the climate variables of interest. There is no established rule about the choice of training data set, and the general hypothesis of climatic niche conservatism within phytogeographic regions would repay further research. However, the common practice of using training data sets with limited geographic coverage (Xu et al., Reference Xu, Xiao, Li, Tian and Nakagawa2010; Salonen et al., Reference Salonen, Ilvonen, Seppä, Holmström, Telford, Gaidamavičius, Stančikaitė and Subetto2012; Ding et al., Reference Ding, Xu and Tarasov2017) effectively limits the range of climates that can be reconstructed. Similarly, exclusion of modern pollen samples from particular vegetation types (Sinopoli et al., Reference Sinopoli, Peyron, Masi, Holtvoeth, Francke, Wagner and Sadori2019) prevents the reconstruction of climates characterised by these vegetation types.
We have shown that WA-PLS model performance statistics and the stability of the coefficients improve as taxa are added. Several studies have examined the impact of using a subset of important taxa, defined in terms of abundance (Birks, Reference Birks1994) or predictive importance (Racca et al., Reference Racca, Wild, Birks and Prairie2003), on reconstructions based on weighted averaging (WA). The results have been inconclusive. Juggins et al. (Reference Juggins, Simpson and Telford2015) showed that removing uninformative taxa resulted in better performance using artificially constructed data sets but did not have a similar impact on real data sets. Jonkers and Kučera (Reference Jonkers and Kučera2018) found no impact from reducing the number of foraminifera species on WA model performance but showed that the resulting sea-surface temperature reconstructions were significantly different between reduced- and full-taxa reconstructions. A similar conclusion was reached by Bjune et al. (Reference Bjune, Birks, Peglar and Odland2010) in a study contrasting climate reconstructions based on 191 vs. 321 core-top pollen samples.
The impact of including taxa that do not have a significant relationship to the climate variable being reconstructed and therefore have limited predictive power is closely tied to whether they are influenced by other climate or environmental factors, which would result in a degradation of model performance as shown by the analysis in Juggins et al. (Reference Juggins, Simpson and Telford2015). Thus, the choice of appropriate variables to reconstruct, and accounting for interactions among variables, is probably more useful than removing taxa that are thought to be uninformative.
Some studies have excluded taxa that are overrepresented in pollen assemblages, such as Pinus, on the assumption that these suppress the expression of the signal from less abundant taxa (e.g., Sinopoli et al., Reference Sinopoli, Peyron, Masi, Holtvoeth, Francke, Wagner and Sadori2019). Our analyses suggest there is no reason to exclude such taxa on a priori grounds. However, there may be circumstances that warrant exclusion of specific taxa. Wei et al. (Reference Wei, González-Sampériz, Gil-Romera, Harrison and Prentice2019a), for example, excluded both Poaceae and Polypodiales from the taxa used to reconstruct climate at El Cañizar de Villarquemado on the grounds that high abundances of these two taxa caused anomalous excursions in moisture reconstructions that were not seen in temperature reconstructions, and because independent sedimentological evidence supported the idea that anomalously high abundances of these two taxa in some samples represented reeds (aquatic grasses, in the case of Poaceae) or erosional inputs (in the case of Polypodiales). They showed that the removal of these taxa had no impact except on the MI reconstructions of the anomalous samples. Insofar as anomalous results can be attributed to the high abundance of specific taxa that could reflect the influence of non-climatic factors, exclusions may be warranted. It is, however, important to test the impact of these exclusions on the final reconstructions (see Table 4).
It is often tacitly assumed that better WA-PLS model performance means better reconstruction, but we have shown that WA-PLS models of equivalent performance can produce very different reconstructions (e.g., see Fig. 2). WA-PLS model performance only reflects how well the model replicates the climate space of the modern training data set but says nothing about whether that climate space is adequate or appropriate. We conclude that model performance must be considered alongside an assessment of the representativeness of climate at the data set and taxon level (see Table 4). This could be done, for example, by plotting the sites and taxon abundances in climate space and expanding the calibration data set to ensure that it samples a range of climates commensurate with the climate that might have been experienced at a given location. Although the RMSE of the modern calibration is generally used as a measure of reconstruction uncertainty, it is in fact a measure of the average uncertainty for a randomly selected surface sample. Thus, it provides the same estimate of uncertainty for all fossil pollen samples, regardless of whether the taxa in the fossil samples are well represented in the modern data set and have stable coefficients. It provides no information about uncertainties arising because of mismatches between the sampling of modern taxa and their representation in fossil assemblages. We have used an alternative approach that propagates the calibration errors into the downcore reconstructions by assessing the calibration error associated with the particular assemblage in an individual pollen sample. This approach provides a way of identifying specific intervals downcore where the reconstruction errors are well constrained and differentiating these from intervals with larger uncertainties. As we show in the Lake Ioannina example (see Fig. 2b), it is then possible to identify the specific cause of these larger uncertainties and, in particular, a way of identifying poor-analogue situations that would be overlooked using the standard approach to representing uncertainty. Thus, the bootstrapped reconstruction spread provides an alternative, and we would argue better, measure of the consistency and redundancy of the calibration set and thus indirectly the robustness of the reconstructions than the standard use of RMSE.
There will be a continuing need for quantitative estimates of past climates in order to evaluate climate model simulations, and it is important to continue to explore ways to demonstrate and, if possible, improve their robustness. While there are alternative approaches to statistical reconstruction, such as vegetation-model inversion (Guiot et al., Reference Guiot, Torre, Jolly, Peyron, Boreux and Cheddadi2000; Wu et al., Reference Wu, Guiot, Brewer and Guo2007, Reference Wu, Guiot, Peng and Guo2009; Izumi and Bartlein, Reference Izumi and Bartlein2016) or data assimilation (Goosse et al., Reference Goosse, Renssen, Timmermann, Bradley and Mann2006; Annan and Hargreaves, Reference Annan and Hargreaves2013; Cleator et al., Reference Cleator, Harrison, Nichols, Prentice and Roulstone2019), they do not provide a panacea—in the one case because they are heavily dependent on the specifics of the model, in the other because they require quantitative estimates as inputs. It has been suggested (Brewer et al., Reference Brewer, Guiot, Sánchez-Goñi and Klotz2008) that more robust estimates of past climates can be obtained by using multiple statistical approaches, and this has been done in an increasing number of studies (Xu et al., Reference Xu, Xiao, Li, Tian and Nakagawa2010; Guiot and de Vernal, Reference Guiot and de Vernal2011; Peyron et al., Reference Peyron, Goring, Dormoy, Kotthoff, Pross, de Beaulieu, Drescher-Schneider, Vannière and Magny2011; Jonkers and Kučera, Reference Jonkers and Kučera2018; Sinopoli et al., Reference Sinopoli, Peyron, Masi, Holtvoeth, Francke, Wagner and Sadori2019). However, many of the sampling issues described here apply equally to all analogue- or regression-based reconstruction methods. Thus, while using several methods together may provide an indication of the robustness of a reconstruction, such robustness does not necessarily demonstrate that the reconstructions are correct. Exploring and documenting methodological decisions, as suggested here, should lead to more transparent reconstructions and perhaps greater consistency between reconstructions at different sites.
CONCLUSIONS
Care needs to be exercised in the application of statistical reconstruction techniques, such as WA-PLS, in order to ensure that the results are robust and reasonable. We have demonstrated that WA-PLS model performance metrics are not a sufficient guide to the reliability of the reconstructions: models with similar performance metrics nevertheless provide different reconstructions of past climates. We argue that the modern pollen training data set should not only sample the full width of the climate niche of each taxon but also provide a continuous sampling of this range. The width and continuity of the sampling of climate space are more important than number of samples. Width and continuity can be improved by amalgamating taxa to higher taxonomic levels, providing the assumptions about the unimodality of these higher taxa with respect to individual climate variables are explicitly tested. We also argue that it is important for the modern training set to sample climates that encompass the full range of the climates that might be expected to have occurred in the past; for example, to sample colder and drier climates that might be found in the Mediterranean during the last glacial period. While it is difficult to know a priori how broad a sampling of climate space is necessary to ensure this for a specific site or region, ensuring that the training data set is broad enough to test explicit hypotheses about the expected climate change would be useful. These hypotheses could be based on alternative sources of palaeoclimate information, including model experiments.
Our analyses are not designed to provide a prescription for how to apply WA-PLS: we are not recommending the use of a specific modern pollen training data set or specific methods to test the usefulness of this data set. Instead, we are advocating that the robustness of statistical reconstructions should be explicitly tested and documented. We have suggested a number of straightforward checks (see Table 4) that should be made. These include: (1) testing the range and continuity of the sampling of climate gradients in the modern training data set by plotting the modern samples in climate space; (2) ensuring that the sampled climate covers a sufficient range to be able to reconstruct substantially different climates in the past, for example by ensuring that the training data set includes samples which are both much warmer/colder and wetter/drier than the modern climate of the region; (3) examining whether amalgamation to higher taxonomic levels provides a more continuous sampling of climate space by plotting the climate space of both individual taxa and the higher taxon into which they are placed; and (4) using the largest number of taxa in the reconstructions and avoiding a priori taxon exclusions. The reasons for any exclusions should be transparent and supported by independent evidence, and the impact of these exclusions on the reconstructed climate should be tested. The proposed checks do not necessarily identify whether specific methodological choices are correct, but they do provide insights into the uncertainties associated with the reconstructions.
ACKNOWLEDGMENTS
SPH and DW acknowledge support from the European Research Council (ERC)-funded project Global Change 2.0: Unlocking the past for a clearer future (GC2.0), grant number 694481. This research is a contribution to the AXA Chair Programme in Biosphere and Climate Impacts and the Imperial College initiative on Grand Challenges in Ecosystems and the Environment (ICP). ICP also acknowledges funding from the ERC under the European Union's Horizon 2020 research and innovation programme (grant agreement No: 787203 REALM). SPH also acknowledges support from the JPI-Belmont project “PAlaeo-Constraints on Monsoon Evolution and Dynamics (PACMEDY)” through the UK Natural Environmental Research Council (NERC). The pollen data used in our analyses were taken from the ACER database. We thank members of the ACER Working Group for this compilation and the creation of standardised age models. We also thank Cajo ter Braak and an anonymous reviewer for their helpful comments on this paper.
AUTHOR CONTRIBUTIONS
All authors contributed to the conceptual development. Formal analysis, methods, and software: MT with contributions from DW on climate and GAMs. Writing—original draft: MT and SPH. Writing—editing and review: all authors.
SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found at https://doi.org/10.1017/qua.2020.44.