Introduction
Surveillance of SARS-CoV-2 infections typically relies on event-based surveillance, with positive reverse transcription PCR (RT-PCR) test results being notified to health authorities. These data have proven useful in estimating trends in transmission dynamics over short periods of time. However, they do not reflect the true incidence of SARS-CoV-2 infection in the population for a variety of reasons, including the motivation of individuals to get tested, targeted testing or required testing from particular groups, accessibility of test centres, out-of-pocket expense, and the increasing use of at-home testing. Younger people with COVID-19, those with mild or no symptoms or with more limited access to healthcare may all be underrepresented [Reference Lipsitch1, Reference Tancredi, Anker, Rosella and Chiolero2].
An alternative approach to monitoring is the estimation of the population prevalence of SARS-CoV-2 infection using population-based RT-PCR testing of randomly selected samples. Because of the huge financial and logistical costs involved with individually testing the large number of participants required to reach acceptable precision (especially when the true prevalence is low), the pooled testing design has gained popularity. This approach, proposed by Robert Dorfman in 1943 ‘to weed out all syphilitic men called up for induction’ in the United States military [Reference Dorfman3], pools samples from randomly selected individuals in groups. One test is performed per pool to save resources. If the test result is negative, the entire pool is considered free of infection. If positive, the samples from the pool can be retested individually to identify cases. This approach is commonly used in blood banks [Reference García4], for disease screening [Reference Gaydos5], and other applications. It has also been used during the COVID-19 pandemic as part of the health protection response, to identify and isolate infectious cases [Reference Barak6–Reference Pikovski and Bentele11]. In addition to these objectives, it is also possible to use these data to estimate the prevalence of infection in the population from the number of positive pools, even without retesting individuals [Reference Bilder12].
Despite the long history of pooled testing, several challenges remain. These relate to the level of prevalence to which the method can sensibly be applied, the optimal pool size, the independence of infection risk within a pool, and the influence of the accuracy of the test used. For prevalence values below about 30%, an optimal pool size can be calculated that minimizes the total number of tests [Reference Hanel and Thurner9]. Above this level, the optimal pool size is 1 and the method is therefore irrelevant [Reference Ungar13]. The assumption that the individuals in pools are independent regarding their risk of infection is often violated, as pools typically group people who are linked (e.g. family members, classmates, or work colleagues). Counter-intuitively, the clustering of cases may improve the efficiency of pooled testing by increasing the separation between infected and non-infected pools [Reference Comess, Wang, Holmes and Donnat10]. Finally, several approaches have been developed to deal with test accuracy, and particularly imperfect sensitivity [Reference Pikovski and Bentele11, Reference Aprahamian, Bish and Bish14]. Daon et al. proposed a method accounting for imperfect sensitivity that takes the number of positive samples within a pool into account [Reference Daon, Huppert and Obolski15]. Sensitivity might also differ between nasopharyngeal swabs – the reference standard – and saliva samples, which are preferred for children. The uncertainty in these estimates of sensitivity should also be reflected in the results [Reference Gelman and Carpenter16]. A final challenge is dealing with the correlation of test results when pooled testing is repeated over time in the same population.
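The trade-off between prevalence and pool size can be illustrated with a minimal sketch of classical two-stage (Dorfman) pooling. It assumes a perfect test and independent infection status within a pool; the function names are hypothetical and are not taken from the cited works.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per individual under two-stage pooling."""
    if pool_size == 1:
        return 1.0  # individual testing: exactly one test per person
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by the pool, plus one retest per member
    # whenever the pool is positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def optimal_pool_size(prevalence: float, max_pool_size: int = 100) -> int:
    """Pool size that minimizes the expected number of tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda m: expected_tests_per_person(prevalence, m),
    )


# Prevalence above which pooling no longer beats individual testing:
# a pool of 3 saves tests only if (1 - p)^3 > 1/3.
pooling_threshold = 1.0 - 3.0 ** (-1.0 / 3.0)

for p in (0.01, 0.05, 0.30, 0.35):
    m = optimal_pool_size(p)
    print(p, m, round(expected_tests_per_person(p, m), 3))
```

Under these assumptions the threshold evaluates to roughly 0.307, and the optimal pool size falls back to 1 for any prevalence above it.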
We developed a Bayesian hierarchical model to estimate the prevalence of infection from pooled test data that (1) corrects for imperfect test sensitivity; (2) propagates the uncertainty in test sensitivity into the results; and (3) includes correlation over time and space using Gaussian processes (GPs) and a hierarchical structure. We examined the performance of this framework in a simulation study and applied it to real-world data from Switzerland.
Methods
Modelling pooled testing
We aimed to estimate the population prevalence over time, represented by the latent variable $ \pi (t) $ , from pooled testing data concerning $ {N}_t $ individuals divided into $ {P}_t $ pools of size $ {M}_t $ , of which $ {K}_t $ have a positive test. From the prevalence $ \pi (t) $ , we computed the probability $ \theta (t) $ that a single pooled test returns positive. In the case of perfect specificity and sensitivity of RT-PCR tests, the probability $ \theta (t) $ is
$$ \theta (t)=1-{\left(1-\pi (t)\right)}^{M_t}. $$
We then accounted for the imperfect sensitivity $ (Se) $ and specificity $ (Sp) $ of RT-PCR tests. We followed the approach of Daon et al., who observed that test sensitivity increases with pool size and assumed that imperfect specificity was caused by sample contamination and thus does not depend on pool size [Reference Daon, Huppert and Obolski15]. The probability $ \theta (t) $ that a single pooled test returns positive becomes:
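A form consistent with this description is sketched here under two assumptions that are ours rather than taken from the text: each infected sample in a pool is detected independently with probability $ Se $ , and a false positive arises at the pool level with probability $ 1- Sp $ , regardless of pool size. The exact parameterization used in the model may differ.

$$ \theta (t)=1- Sp\cdot {\left(1- Se\cdot \pi (t)\right)}^{M_t} $$

With $ Se= Sp=1 $ , this expression reduces to the perfect-test case, $ 1-{\left(1-\pi (t)\right)}^{M_t} $ .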
See Supplementary Material S1, Section 1.1 for more details.
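The link between prevalence and pool positivity can be sketched in a few lines. The perfect-test case follows directly from the independence of individuals within a pool. The adjustment for sensitivity and specificity is an assumed form (each infected sample detected independently with probability `sensitivity`, false positives at the pool level with probability `1 - specificity`), not necessarily the exact published formula, and all function names are hypothetical.

```python
def pool_positive_probability(prevalence: float, pool_size: int,
                              sensitivity: float = 1.0,
                              specificity: float = 1.0) -> float:
    """Probability that one pooled test returns positive (assumed form)."""
    return 1.0 - specificity * (1.0 - sensitivity * prevalence) ** pool_size


def prevalence_from_positivity(positivity: float, pool_size: int,
                               sensitivity: float = 1.0,
                               specificity: float = 1.0) -> float:
    """Invert pool_positive_probability to get a point estimate of prevalence."""
    if not 0.0 <= positivity < 1.0:
        raise ValueError("positivity must lie in [0, 1)")
    # Share of negative pools, rescaled by specificity; capped at 1 so that a
    # positivity below the false-positive rate maps to a prevalence of 0.
    ratio = min((1.0 - positivity) / specificity, 1.0)
    return (1.0 - ratio ** (1.0 / pool_size)) / sensitivity


# 20 positive pools out of 100 pools of size 10
print(prevalence_from_positivity(20 / 100, 10))                     # perfect test
print(prevalence_from_positivity(20 / 100, 10, sensitivity=0.846))  # imperfect Se
```

This point estimate ignores the uncertainty in both the pool counts and the sensitivity, which is what motivates the Bayesian treatment described next.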
Inference
The sensitivity and the specificity were treated differently. We assumed a fixed specificity of 100% (this assumption was relaxed in a sensitivity analysis, Supplementary Material S1, Section 2.3). On the other hand, due to varying estimates reported for the sensitivity, we treated it as a free parameter, propagating the uncertainty about $ Se $ into the population prevalence estimate. The sensitivity $ Se $ was jointly estimated alongside the prevalence using the number of positive RT-PCR tests $ L $ among $ M $ individuals with confirmed SARS-CoV-2 infection:
$$ L\sim \mathrm{Binomial}\left(M, Se\right). $$
For this, we used data from 20 studies, with a total sample size of 8,026 [Reference Marando, Tamburello, Gianella, Taylor, Bernasconi and Fusi-Schmidhauser17], resulting in a mean $ Se $ value of 84.6%. Finally, we linked the probability $ \theta (t) $ to the pooled test results using a binomial likelihood:
$$ {K}_t\sim \mathrm{Binomial}\left({P}_t,\theta (t)\right). $$
Information about prior distributions and parameter choice can be found in Supplementary Material S1, Section 1.2.
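How uncertainty about sensitivity carries over into the prevalence estimate can be illustrated with a grid approximation for a single week. This is a simplified sketch, not the Stan model used in the analysis: it assumes flat priors, perfect specificity, and the pool-positivity form $ 1-{\left(1- Se\cdot \pi \right)}^M $ . The count of 6,790 positives is an illustrative value chosen to match a sensitivity of about 84.6% among 8,026 confirmed infections; it is not reported in the text.

```python
import math


def log_binomial_pmf(successes: int, trials: int, prob: float) -> float:
    """Log of the binomial probability mass function, for 0 < prob < 1."""
    log_coef = (math.lgamma(trials + 1) - math.lgamma(successes + 1)
                - math.lgamma(trials - successes + 1))
    return (log_coef + successes * math.log(prob)
            + (trials - successes) * math.log1p(-prob))


def prevalence_posterior(positive_pools, n_pools, pool_size,
                         se_positives, se_total,
                         max_prevalence=0.2, n_prev=400, n_se=500):
    """Marginal posterior of prevalence on a grid, integrating over Se."""
    prev_grid = [(i + 0.5) * max_prevalence / n_prev for i in range(n_prev)]
    se_grid = [(j + 0.5) / n_se for j in range(n_se)]
    # Likelihood of the sensitivity data, evaluated once per Se grid point
    log_lik_se = [log_binomial_pmf(se_positives, se_total, se) for se in se_grid]
    log_joint = []
    for prev in prev_grid:
        row = []
        for se, ll_se in zip(se_grid, log_lik_se):
            theta = 1.0 - (1.0 - se * prev) ** pool_size  # assumed form
            row.append(log_binomial_pmf(positive_pools, n_pools, theta) + ll_se)
        log_joint.append(row)
    peak = max(max(row) for row in log_joint)
    # Sum over the Se dimension, then normalize over prevalence
    weights = [sum(math.exp(v - peak) for v in row) for row in log_joint]
    total = sum(weights)
    return prev_grid, [w / total for w in weights]


def summarize(prev_grid, weights):
    """Posterior mean and approximate 95% credible interval from grid weights."""
    mean = sum(p * w for p, w in zip(prev_grid, weights))
    cumulative, lower, upper = 0.0, None, None
    for p, w in zip(prev_grid, weights):
        cumulative += w
        if lower is None and cumulative >= 0.025:
            lower = p
        if upper is None and cumulative >= 0.975:
            upper = p
    return mean, lower, upper


# One hypothetical week: 20 positive pools out of 100 pools of size 10
mean, lower, upper = summarize(*prevalence_posterior(20, 100, 10, 6790, 8026))
print(round(mean, 4), round(lower, 4), round(upper, 4))
```

A grid is only practical for a single time point; the hierarchical and temporal structure described below requires a sampler such as the one provided by Stan.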
Spatio-temporal structure
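We used inverse-logit transformed GPs to express the temporal correlation of prevalence $ \pi (t) $ [Reference Riutort-Mayol, Bürkner, Andersen, Solin and Vehtari18]:
$$ \pi (t)={\mathrm{logit}}^{-1}\left(f(t)\right),\hskip1em f(t)\sim \mathrm{GP}\left(\mu, \varSigma \right), $$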
where the GP is defined by a mean $ \mu $ and a kernel $ \varSigma $ . $ \varSigma $ is characterized by an exponentiated quadratic kernel, where the covariance between prevalence at two time points decreases exponentially with the squared interval between these points. The two parameters of the exponentiated quadratic kernel, that is, the length scale and the variance parameters, were estimated during the fitting procedure. We used a hierarchical structure with two levels expressing the spatial correlation of prevalence. To provide an aggregated prevalence estimate for each higher-level area while allowing for some variation between the subareas belonging to the same higher-level area, we used a beta-binomial distribution [Reference Kim and Lee19]:
where $ i $ refers to the subarea, $ j $ refers to the higher-level area, and $ \kappa $ is a dispersion parameter. This approach can be adapted to any two nested geographical areas to estimate the prevalence at the highest level. For Switzerland, the subareas correspond to the cantons (NUTS-3 in the Eurostat nomenclature of territorial units), which are nested within regions (NUTS-2) or the entire country [20].
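The temporal component can be sketched by drawing one prevalence trajectory from an inverse-logit transformed GP with an exponentiated quadratic kernel. This is an illustration only: the parameter values are arbitrary, the function names are hypothetical, and the exact GP implementation used in the analysis may differ from this direct Cholesky-based draw.

```python
import math
import random


def exp_quad_kernel(t1, t2, variance, length_scale):
    """Exponentiated quadratic kernel: variance * exp(-(t1 - t2)^2 / (2 * l^2))."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def covariance_matrix(times, variance, length_scale, jitter=1e-6):
    """Kernel matrix with a small diagonal jitter for numerical stability."""
    n = len(times)
    return [[exp_quad_kernel(times[i], times[j], variance, length_scale)
             + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_prevalence_path(times, mean, variance, length_scale, rng):
    """One draw of pi(t) = inv_logit(f(t)), with f ~ GP(mean, kernel)."""
    lower = cholesky(covariance_matrix(times, variance, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    latent = [mean + sum(lower[i][k] * z[k] for k in range(i + 1))
              for i in range(len(times))]
    return [inv_logit(v) for v in latent]


weeks = list(range(30))
# mean = -3.5 on the logit scale corresponds to a prevalence of about 3%
path = sample_prevalence_path(weeks, mean=-3.5, variance=0.5,
                              length_scale=4.0, rng=random.Random(1))
print([round(p, 3) for p in path[:5]])
```

The length scale controls how quickly the correlation between two weeks decays, so larger values produce smoother trajectories and borrow more information across neighbouring weeks.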
Simulation study
We performed a simulation study to validate the ability of the model including a spatiotemporal structure (hereafter named the GP model) to estimate population prevalence. We simulated four scenarios mimicking different dynamics of SARS-CoV-2 prevalence in five hypothetical subareas over 30 weeks, using a deterministic susceptible-exposed-infected-recovered (SEIR) model (Supplementary Material S1, Section 1.3). In scenario 1, the prevalence was kept relatively stable at around 2.5%, while it increased from 0% to 5% in scenario 2, increased up to 7.5% in one wave in scenario 3, and increased up to 10% in two successive waves in scenario 4 (Figure 1a). We introduced heterogeneity in prevalence across subareas with a beta distribution. For each scenario and subarea, we simulated weekly pooled test data with different pool sizes (5, 10, or 20) and total sample sizes (100, 500, 1,000, or 5,000, equally spread over the five subareas). We then applied the GP model to the simulated data to estimate the weekly overall prevalence and compared estimates to the true values used in the simulation. For comparison, we also assessed the performance of a naive model that ignored the temporal correlation and estimated the prevalence for each week independently. We repeated the procedure 100 times for each scenario and combination of pool size and total sample size, and calculated two metrics: (1) the root mean squared error (RMSE) of the point estimate of prevalence, measuring accuracy, and (2) the half-width of the 95% credible interval (95%CrI) around the prevalence estimate, measuring sharpness [Reference Gneiting, Balabdaoui and Raftery21].
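The two performance metrics can be illustrated with a reduced version of this procedure for a single week and a single area. The sketch assumes a perfect test, a flat prior on pool positivity, and the posterior median as point estimate, so it corresponds to a stripped-down naive model rather than the GP model; function names and settings are hypothetical.

```python
import math
import random


def simulate_positive_pools(prevalence, n_pools, pool_size, rng):
    """Number of positive pools, assuming a perfect test and independence."""
    theta = 1.0 - (1.0 - prevalence) ** pool_size
    return sum(1 for _ in range(n_pools) if rng.random() < theta)


def naive_weekly_posterior(positive_pools, n_pools, pool_size, rng, n_draws=2000):
    """Posterior median and 95% CrI half-width for one week, ignoring time."""
    draws = []
    for _ in range(n_draws):
        # A flat prior on pool positivity gives a Beta(K + 1, P - K + 1) posterior
        theta = rng.betavariate(positive_pools + 1, n_pools - positive_pools + 1)
        draws.append(1.0 - (1.0 - theta) ** (1.0 / pool_size))
    draws.sort()
    median = draws[n_draws // 2]
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws)]
    return median, (upper - lower) / 2.0


def evaluate(true_prevalence, total_sample_size, pool_size, n_repeats=100, seed=1):
    """RMSE of the point estimate and mean CrI half-width over repeated simulations."""
    rng = random.Random(seed)
    n_pools = total_sample_size // pool_size
    squared_errors, half_widths = [], []
    for _ in range(n_repeats):
        k = simulate_positive_pools(true_prevalence, n_pools, pool_size, rng)
        estimate, half_width = naive_weekly_posterior(k, n_pools, pool_size, rng)
        squared_errors.append((estimate - true_prevalence) ** 2)
        half_widths.append(half_width)
    rmse = math.sqrt(sum(squared_errors) / n_repeats)
    return rmse, sum(half_widths) / n_repeats


rmse, half_width = evaluate(true_prevalence=0.025, total_sample_size=1000, pool_size=10)
print(round(rmse, 4), round(half_width, 4))
```

Repeating `evaluate` over a grid of pool sizes and sample sizes gives a baseline against which a model that shares information across weeks can be compared.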
Analysis of Swiss data
We applied the model to the Swiss weekly pooled test data from 19 April 2021 to 29 August 2022. During this period, repeated pooled testing was conducted as part of the wider health protection response, with the primary objective of identifying and isolating infectious cases of SARS-CoV-2 infection. It was not designed to produce prevalence estimates, so the data collection was secondary to the immediate use of information by local health authorities. The program was implemented in each canton independently but followed similar modalities. The implementation was supported by the federal government with seed financing of the software and logistical infrastructure, and cost coverage of each test. All samples were tested using RT-PCR, and most cantons used the same infrastructures and laboratories. All laboratories were accredited. The data were collected by the 26 Swiss cantons, which were legally required to send them to the Federal Office of Public Health (FOPH) weekly. Each canton had to report the weekly numbers of total pools, positive pools and participants. We summarized the data by the seven Swiss NUTS-2 regions (Central, Eastern, Lake Geneva, Middle (‘Mittelland’), Northwest, Ticino, and Zurich [20]) to obtain weekly regional prevalence estimates from multiple observations at the cantonal level (except for canton Ticino, which is also a NUTS-2 region by itself). We also summarized cantonal-level observations to obtain country-level estimates of prevalence for each week with the same approach. We used multiple imputation to impute missing pool sizes (0.7% of cases, Supplementary Material S1, Section 1.4) [Reference Azur, Stuart, Frangakis and Leaf22].
Pooled tests with a pool size of 4 or larger were financed by the state and performed in a selection of (1) schools, (2) care centres, and (3) workplaces to reduce the need for blanket control measures that ignore local conditions. Samples from schools included children and their teachers. The care setting included long-term care facilities, homes for elderly people and hospitals, and comprised both staff and patients or residents. Workplaces included several companies and public administrations. In most cases, saliva samples were pooled on-site before being transported to the laboratory, but the pooling could also be done at the laboratory. There were standardized recommendations for all processes including sample collection and pooling, transportation, waste management, and molecular analysis (Supplementary Material S2). In all settings, individuals could contribute repeatedly over successive weeks.
We considered the three settings separately and obtained prevalence estimates over time for each setting and NUTS-2 region, and at the national level. We used Spearman’s rank correlation coefficient on posterior samples to compare the prevalence estimates across areas and settings and with other data on the SARS-CoV-2 epidemic in Switzerland (counts of reported cases of SARS-CoV-2 infection, COVID-19 hospitalizations and COVID-19 deaths). We performed all the analyses in R version 4.2.1 [23] and Stan version 2.29.1 [Reference Carpenter24]. The code is available from https://github.com/erikstuder/poolprevBAG. We also developed the R package poolprev that can be used to apply these methods in other settings (https://github.com/anthonyhauser/poolprev).
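Spearman's rank correlation applied to posterior samples can be sketched as follows. The assumption here, which the text does not spell out, is that the coefficient is computed across weeks separately for each posterior draw and then summarized over draws; the helper names and the toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def spearman_over_draws(draws_a, draws_b):
    """One coefficient per posterior draw; each draw is a weekly series."""
    return [spearman(a, b) for a, b in zip(draws_a, draws_b)]


# Two toy posterior draws of weekly prevalence in two settings
schools = [[0.010, 0.020, 0.040, 0.030], [0.012, 0.018, 0.045, 0.028]]
workplaces = [[0.008, 0.015, 0.030, 0.020], [0.009, 0.020, 0.016, 0.030]]
print(spearman_over_draws(schools, workplaces))
```

Comparing a prevalence series with reported case counts works the same way, with the observed counts repeated as the second series for every draw.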
Results
Simulation study
The simulation study demonstrated the ability of the GP model to accurately estimate prevalence from pooled test data (Figure 1a). Figure 1b illustrates the superior model fit of the GP model compared to the naive model for scenario 3. The RMSE from the GP model was consistently below or equal to 1.5 percentage points across combinations of pool size (5, 10, and 20) and total sample size (100, 500, 1,000, and 5,000) (Figure 1c). Both the accuracy (measured by the RMSE) and the sharpness (measured by the half-width of the 95%CrI) of the prevalence estimates improved with larger sample sizes and, to a small extent, with smaller pool sizes (Figure 1c,d). The quality of the estimates dropped when the sample size was small (N = 100) and the pool size was large (M = 20). This deterioration of quality was, however, observed only for higher prevalence values (Figure 1e), suggesting that sample sizes of 500 or above with pool sizes between 5 and 10 are sufficient to produce reliable estimates for prevalence values below 5%. Compared to the naive model ignoring the temporal correlation, the GP model had better accuracy and sharpness in all situations.
Application to Swiss data
A total of 1,439,984 pooled tests were done in Switzerland over the study period (Table 1). Of these, 837,278 (58%) were from samples collected in schools, 169,634 (12%) in care settings, and 433,072 (30%) in workplaces (Figure 2a). The number of pooled tests varied over time (for example, in schools depending on holidays). Data collection stopped in schools in April 2022 and continued at lower levels in the other settings until August 2022. The distribution of pooled tests across the seven Swiss regions was unequal, with few data for the canton Ticino, which was excluded from region-level analyses. The average number of individuals per pool ranged between 4 and 48, with a median of 6 (interquartile range 5–8) (Table 1 and Figure 2b). The proportion of positive pools varied over time, with the average proportion of positive pools at the country level following similar patterns in the three settings (Figure 2c).
Note: In the column ‘Total number of pools’, the percentages in brackets indicate the distribution of the pools over the three settings and the seven regions. In the column ‘Number of positive pools’, the percentages in brackets are the pool positivities. In the column ‘Median pool size’, the numbers in brackets indicate the interquartile ranges.
Based on these data, we estimated SARS-CoV-2 prevalence in the three settings over time at the national level (Figure 3). In schools, there were three waves of increasing magnitude: a first small wave that peaked in September 2021 at an estimated prevalence of 0.7%, a second short wave peaking in early December 2021 at about 1.4%, immediately followed by a third, larger wave from January to March 2022 that peaked at 4.6% prevalence. The prevalence trajectory was similar in care centres and selected workplaces, with a large wave from December 2021 to the end of March 2022 peaking at 4.8% and 3.6%, respectively. While pooled testing in schools was discontinued at this time, it continued in the care centres and workplaces, with another wave in the summer of 2022 peaking at around 2%.
Prevalence in schools was generally slightly higher than in the other two settings (Figure 4a). This was especially apparent during the early phase of the largest wave, around February 2022, suggesting an earlier peak in schools. The dynamics of SARS-CoV-2 prevalence across regions of Switzerland were generally synchronous, with important differences in magnitude (Figure 4b,c). The largest wave of December 2021 to March 2022 started slightly earlier in the Lake Geneva region in all three settings, but reached higher values in the Central region, especially in schools. We also found very high estimates in the Northwest region during this period but concentrated in care centres and workplaces. The steep rise in prevalence estimates observed only in June 2022 in the selected workplaces of region Mittelland appears to be an artefact based on few data points, with very large uncertainty intervals (Supplementary Material S1, Section 2.1, Supplementary Figure S1).
Prevalence estimates from pooled testing were highly correlated across the three settings, with Spearman’s correlation coefficient ranging from 0.91 to 0.95 (Table 2). Correlations of prevalence estimates across areas and settings were also generally high, with some exceptions (Supplementary Material S1, Section 2.2, Supplementary Figure S2). The correlation of prevalence estimates from pooled test data with event-based surveillance data varied (Table 2). The estimated prevalence in schools was highly correlated with the weekly counts of reported cases, hospitalizations and deaths (Spearman’s correlation coefficients 0.84 to 0.90). For care centres and workplaces, the correlation with event-based surveillance data was weaker (range of coefficients 0.21–0.84). The correlation coefficients for reported cases across the three age groups were consistently high for the estimates from schools (range 0.89–0.93) but lower for the estimates from care centres and workplaces (range 0.38–0.93). The exception was the age group 60 years and above for which the correlation was high for estimates from all three settings (Table 2). Across regions, correlations between reported cases and prevalence were lower overall but highest for the prevalence estimates from the school setting (Supplementary Material S1, Section 2.1, Supplementary Figure S1).
Discussion
In this study, we demonstrated that the pooled test design can be useful to monitor the prevalence of SARS-CoV-2 infections when analyzed with the appropriate tools. We developed a reliable approach to analyze pooled test data while accounting for the spatial and temporal structure of the data and imperfect test performance. This approach was thoroughly validated in a simulation study and applied to real data from Switzerland. We estimated prevalence levels up to 4–5% during the omicron wave of winter 2021–2022 in three different settings: schools, care centres, and workplaces. Despite the apparent noisiness of the pooled test data, the estimated trajectories of prevalence were consistent across settings at the national level, showing high correlation across settings. These trajectories were also aligned with external data about the dynamics of reported cases of SARS-CoV-2 infection.
Reported incidence has been widely used to track SARS-CoV-2 transmission. Estimates of SARS-CoV-2 prevalence were rarely accessible due to the large sample size required to measure low prevalence levels. The Office for National Statistics (ONS) coronavirus infection survey in the United Kingdom and the REACT-1 study in England collected samples from hundreds of thousands of individuals to estimate prevalence over time [Reference Pouwels25, Reference Elliott26]. Here, we took advantage of pooled test data collected locally with limited oversight to reconstruct the dynamics of SARS-CoV-2 transmission in Switzerland. Prevalence estimates complement trends based on reported incidence, which mainly reflects newly symptomatic people and can lead to misleading interpretations when testing levels vary. For example, the reported incidence in 0–19 year olds dropped in March 2022, while the estimated prevalence in schools remained high. As this period coincides with a sharp decrease in testing in the 0–19 age group [27], this suggests that the observed decreasing trend is likely an artefact, and that reported incidence failed to capture the transmission dynamics.
Our estimates of SARS-CoV-2 prevalence in Switzerland provided other important insights. We found high prevalence in the three settings, reaching 4–5% at the national level during the omicron wave of winter 2021–2022. Such high prevalence levels have been observed in other European countries around the same time, for example, in England where the REACT-1 study estimated a peak prevalence of 6.4% in March 2022 [Reference Elliott26]. The comparison of prevalence estimates in the three settings shows a temporal lag between schools and workplaces or care centres. This suggests that the omicron wave in Switzerland occurred earlier in schools, supporting the literature about the role of schools in enhancing the spread of respiratory infections [Reference Cauchemez28]. The lower immunity levels in children, whose access to vaccination was limited, could also explain this discrepancy, which was also observed in the REACT-1 study [Reference Elliott26].
While the dynamics of prevalence were likely reliably captured, the prevalence estimated from these Swiss pooled testing data could be biased downwards. Pooled tests conducted in schools or workplaces are likely to at least partially exclude infected students, teachers, and workers showing symptoms, as these people would be encouraged to stay at home, and only asymptomatic or mild cases would remain in the pool. This likely creates a selection bias, leading to the underestimation of the community prevalence (although still representing the prevalence in the settings themselves). This bias is expected to be smaller in care centres, as patients and residents should remain even if symptomatic, although they could be discouraged from participating. Care settings are also more likely to implement stringent control measures and might have a lower prevalence, so direct comparison across settings does not allow for a better characterization of this potential selection bias. Interestingly, the pooled test data from schools and workplaces can be considered as complementary to reported cases, the former approach focusing on the asymptomatic and paucisymptomatic portion of the infected population, while the latter focuses on newly symptomatic people who receive a test leading to a notification to the surveillance system.
The general lack of information about individuals and local practices constitutes another important limitation of this work. The modalities of selection of individuals participating in pooled testing, as well as test sensitivity and specificity, could vary across time and space in unexpected fashion, with incentives to participate or not participate depending on local conditions and policies. The quality of the data reported by the local teams may also vary. As the primary objective of the program was practical and aimed at providing information to local authorities, a systematic protocol was not immediately established at the time of the earliest testing, and some information was not recorded. For instance, we could not access the detailed results of each pool, but only obtained the proportion of positive pools, together with the average pool sizes by week and canton. This could lead to an underestimation of the uncertainty in our prevalence estimates. For the same organizational reasons, pooled test data were not available at some times and places, emphasizing the importance of using Gaussian processes to stabilize the estimates over time.
Our approach has several important strengths, despite the limitations. The simulation study showed that our model based on Gaussian processes could provide reliable prevalence estimates even with sample sizes as small as 100 if the true prevalence remains below 5% (which was the case in most settings throughout the SARS-CoV-2 pandemic). We accounted for the temporal correlation of prevalence and thus limited the impact of missing or scarce data on the estimates. We modelled the imperfect sensitivity of RT-PCR in the context of pooled testing, and propagated the uncertainty in sensitivity into the results. We provide an R package that makes the application of our method easy with minimal pooled test data (number of pools, number of positive pools, and pool size). Since the beginning of the SARS-CoV-2 pandemic, other approaches that share commonalities with our model have been proposed to appropriately analyze pooled test data. McLure et al. have developed an R package that also provides flexible functions to model the prevalence over time, but does not account for test sensitivity [Reference McLure, O’Neill, Mayfield, Lau and McPherson29]. The European Centre for Disease Prevention and Control has also developed methods that adjust for imperfect test sensitivity, but they do not account for the correlation of pooled test results over time [30].
Conclusion
Pooled testing can be used as a reliable approach to monitor the dynamics of SARS-CoV-2, especially as part of a wider health protection response aimed at mitigating the immediate consequences of an outbreak by identifying and isolating infectious cases. It is more affordable than alternatives based on event-based surveillance or large-scale prevalence studies, yet can provide highly accurate estimates with relatively small sample sizes as long as (1) prevalence remains lower than 5% and (2) appropriate tools are used to account for imperfect testing and space–time correlation. For these reasons, it could be considered in pandemic preparedness plans as a potential addition to traditional surveillance strategies in situations of low to intermediate circulation of SARS-CoV-2 and other viruses.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0950268824000876.
Data availability statement
The COVID-19 surveillance data are publicly available [27]. The pooled test data are available on motivated request to the corresponding author. The code to reproduce the results presented in this article is available from https://github.com/erikstuder/poolprevBAG. We developed the R package poolprev that can be used to apply these methods in other settings (https://github.com/anthonyhauser/poolprev).
Author contribution
J.R. and A.H. conceived the study, carried out the simulations and analyses, and wrote the first draft of the manuscript. E.S., A.F., and T.M.S. collected the data. All authors contributed to the interpretation of results and the final manuscript.
Funding statement
This work was supported by the Swiss Federal Office of Public Health (mandate 142006323) and by Swiss National Science Foundation grant 189498.
Competing interest
The authors declare none.
Ethical standard
Not applicable – According to the Federal Act on Data Protection (235.1 AS vol. 3387; 1992), ethical clearance is not needed when working with fully anonymized governmental data.