The overestimation of the effect sizes of psychotherapies for depression in waitlist controlled trials: a meta-analytic comparison with usual care controlled trials

Pim Cuijpers; Clara Miguel; Mathias Harrer; Marketa Ciharova; Eirini Karyotaki

doi:10.1017/S2045796024000611

Published online by Cambridge University Press: 06 November 2024

Pim Cuijpers*: Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Babeș-Bolyai University, International Institute for Psychotherapy, Cluj-Napoca, Romania
Clara Miguel: Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
Mathias Harrer: Affiliation:
Psychology & Digital Mental Health Care, Technical University Munich, Munich, Germany
Marketa Ciharova: Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
Eirini Karyotaki: Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
*: Corresponding author: Pim Cuijpers; Email: [email protected]

Abstract

Aims

There is considerable evidence that waiting list (WL) control groups overestimate the effect sizes of psychotherapies for depression. It is not clear, however, what exactly causes this overestimation. We therefore conducted a meta-analytic study comparing trials of psychotherapy for depression with a WL control group against trials with a care-as-usual (CAU) control group.

Methods

We used an existing meta-analytic database of randomized trials comparing psychological treatments of adult depression with control groups and selected trials using a WL or a CAU control group. We used subgroup and meta-regression analyses to examine differences in effect sizes between WL and CAU controlled trials.

Results

We included 333 randomized controlled trials (472 comparisons; total number of participants: 41,480), 141 with a WL and 195 with a CAU control group (3 included both). We found several significant differences between WL and CAU controlled trials (in type of therapy examined, treatment format, recency, target group, recruitment strategy, number of treatment arms and number of depression outcome measures). The overall effect size indicating the difference between treatment and control at post-test for all comparisons was g = 0.77 (95% confidence interval [CI]: 0.71; 0.84), with high heterogeneity (I² = 84; 95% CI: 82; 85). A highly significant difference was observed between studies with a CAU control group (g = 0.63; 95% CI: 0.55; 0.71; I² = 85; 95% CI: 83; 86) and studies with a WL (g = 0.95; 95% CI: 0.85; 1.04; I² = 80; 95% CI: 78; 82; p for difference < 0.001). This difference remained significant in all sensitivity analyses, including a meta-regression analysis in which we adjusted for all differences in characteristics between studies with a WL and studies with a CAU control group. We also found that pre-post effect sizes within WL control conditions (g = 0.37; 95% CI: 0.28; 0.46) were significantly smaller than those within CAU conditions (g = 0.64; 95% CI: 0.50; 0.78). We found few indications that pre-post effect sizes within therapy conditions differed between WL and CAU controlled trials.

Conclusions

WL control conditions considerably overestimate the effect sizes of psychological treatments compared with CAU control conditions. This overestimation is probably caused by smaller improvement within WL conditions than within CAU conditions. WL control conditions should be avoided in randomized trials examining psychological treatments of adult depression.

Keywords

cognitive therapy; depression; randomized controlled trials; systematic reviews

Type: Original Article
Information: Epidemiology and Psychiatric Sciences, Volume 33, 2024, e56

DOI: https://doi.org/10.1017/S2045796024000611
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press.

Introduction

It is well established that psychological interventions are effective in the treatment of depression (Cuijpers et al., 2023a). Several types of therapy have been found to have small to moderate effects on depression, including cognitive behaviour therapy (CBT), interpersonal psychotherapy (IPT) and behavioural activation therapy, but also, for example, psychodynamic therapy and life review therapy (Cuijpers et al., 2021a). The effects of these interventions are typically examined in randomized controlled trials comparing them with waiting lists (WLs), care-as-usual (CAU) or other control conditions such as attention placebo (Cuijpers et al., 2020). Meta-analyses of these trials typically show that the effect sizes of therapies are larger when they are compared with WL control groups than when they are compared with other control groups (Cuijpers et al., 2016; Furukawa et al., 2014; Hesser et al., 2011; Michopoulos et al., 2021). WL control groups have also been found to inflate effect sizes in other conditions (e.g., Laws et al., 2022; Young, 2006; Zhu et al., 2014), although this may vary across conditions (Cunningham and McCambridge, 2013). In depression, however, this inflation of effect sizes is well established. In a large network meta-analysis of psychological treatments of depression, we found a standardized mean difference (SMD) of 0.29 (0.14–0.45) between being in a WL control group and being in a CAU control group (Cuijpers et al., 2021a), which is comparable to the difference between antidepressants and placebo (SMD = 0.30; Cipriani et al., 2018).

It is not clear why WLs inflate treatment effect sizes. It has been suggested that patients on WLs actually ‘wait’ to change until they receive the intervention (Miller and Rollnick, 2002), which may result in lower ‘spontaneous recovery’ rates in WL conditions. Expectancies have also been suggested as a mediating variable (Cunningham and McCambridge, 2013), with higher expectancies among people in WL conditions than in other control conditions. Furthermore, there is evidence that patients are disappointed when assigned to control conditions (Lindström et al., 2010; Skingley et al., 2014), and this disappointment may be smaller in WL conditions than in other control groups, because these participants do receive the intervention after the waiting period.

Overall, there is very little research on WLs in clinical settings (Cunningham and McCambridge, 2013), and many questions remain unanswered. To the best of our knowledge, no previous meta-analysis has compared the characteristics of trials using WL control conditions with those of trials using CAU control conditions, although such trials may differ in characteristics that could explain the larger effect sizes of WL controlled trials. It is also unclear whether the larger effect sizes found in WL controlled trials are caused by lower response rates in the control conditions or by higher response rates in the treatment conditions.

We therefore decided to conduct a meta-analysis with three goals: (1) to compare the characteristics of trials using WL control groups with those of trials using CAU; (2) to confirm the findings of previous meta-analyses that WL control groups result in larger effect sizes than CAU control groups, and to examine whether this difference remains significant after adjusting for the characteristics of the participants, treatments and studies; (3) to compare response rates and pre-post effect sizes in control and treatment conditions, to examine whether the larger effect sizes of WL controlled trials are related to smaller effects in the control conditions or to larger effects in the treatment conditions.

Methods

Identification and selection of studies

This study is part of a larger meta-analytic project on psychological treatments of depression (registered at the Open Science Framework; Cuijpers and Karyotaki, 2022; https://doi.org/10.17605/OSF.IO/825C6). The general methods of the project have been described in a separate paper (Harrer et al., submitted), and supplemental materials are available on the project website (http://www.metapsy.org). This database has been used in a series of previously published meta-analyses (Cuijpers et al., 2023b). The protocol for the current review and meta-analysis has been published at the Open Science Framework (Cuijpers, 2023c; https://doi.org/10.17605/OSF.IO/GWPV2).

The studies included in the current study were identified through the larger, existing database of randomized trials on the psychological treatment of depression. For this database, we searched four major bibliographical databases (PubMed, PsycINFO, Embase and the Cochrane Library), combining index and free terms indicative of depression and psychotherapies with filters for randomized controlled trials. The full search strings can be found on the project website (www.metapsy.org and docs.metapsy.org/databases/depression-psyctr/). Furthermore, we checked the references of earlier meta-analyses on psychological treatments of depression. The database was developed through a comprehensive literature search (from 1966 to 1 May 2023) and is updated every 4 months. All records were screened by two independent researchers, and all papers that could possibly meet the inclusion criteria according to at least one of the researchers were retrieved in full text. The decision to include or exclude a study in the database was also made by the two independent researchers, and disagreements were resolved through discussion.

For the current meta-analysis, we selected randomized controlled trials in which a psychological treatment of depression was compared with a WL or a CAU control group. We only included trials in adults. We allowed any treatment format as long as human support was available (individual, group, telephone, and digital or non-digital guided self-help). We excluded trials in which no human support was given (Cuijpers et al., 2019) and studies in inpatients (Cuijpers et al., 2021b). Depression could be defined as meeting criteria for a depressive disorder according to a diagnostic interview or as a score above the cut-off on a validated self-report depression measure.

Quality assessment and data extraction

We assessed the validity of included studies using four criteria of the Cochrane Collaboration’s ‘Risk of bias’ (RoB) assessment tool, version 1 (Higgins et al., 2011). We used version 1 rather than the revised tool (Sterne et al., 2019) because this meta-analysis is part of the broader meta-analytic project on psychological treatments of depression, which uses version 1. The RoB tool assesses possible sources of bias in randomized trials, including adequate generation of the allocation sequence; concealment of allocation to conditions; prevention of knowledge of the allocated intervention (masking of assessors); and dealing with incomplete outcome data (rated as positive when intention-to-treat analyses were conducted, meaning that all randomized patients were included in the analyses). We considered trials to have low risk of bias when they scored positive on all four domains. The validity assessment was conducted by two independent researchers, and disagreements were resolved through discussion.

We also coded participant characteristics (diagnostic method for participant inclusion; recruitment method; target group; mean age; proportion of women); characteristics of the psychological treatments in the experimental conditions (type of therapy, according to the classification developed earlier for this project (Cuijpers et al., 2008, 2020); treatment format; number of sessions); and general characteristics of the studies (publication year; country where the study was conducted; number of experimental conditions in the trial; number of outcome measures). Details of these characteristics can be found on the project website (docs.metapsy.org/databases/depression-psyctr).

Outcome measures

For each comparison between a psychological treatment and a control condition, we calculated the small-sample bias-corrected SMD between the two groups at post-test (Hedges’ g). When means and standard deviations were not reported, we used change scores, binary outcomes (converted to Hedges’ g) or other statistics (e.g., p value, t value) to calculate the effect size. We used one depression measure from each study to calculate effect sizes, selecting it according to how frequently the measures are used.
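As a minimal illustration of the post-test effect size, the sketch below computes Hedges’ g and its sampling variance from means, SDs and group sizes. It assumes that lower scores indicate fewer depressive symptoms (so a positive g favours the treatment) and uses the common approximation J = 1 − 3/(4df − 1) for the small-sample correction; it is not the metapsyTools code, and the function name and example numbers are hypothetical.

```python
import math


def hedges_g(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Return (g, variance) for a post-test treatment-control comparison.

    Assumption: lower scores mean fewer depressive symptoms, so a positive g
    indicates that the treatment group scored better than the control group.
    """
    df = n_treat + n_ctrl - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(
        ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    )
    d = (m_ctrl - m_treat) / sd_pooled
    # Approximate small-sample bias correction factor (Hedges' J)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, rescaled by J squared
    var_d = (n_treat + n_ctrl) / (n_treat * n_ctrl) + d**2 / (2 * (n_treat + n_ctrl))
    return g, j**2 * var_d


# Hypothetical trial: post-test means 12 vs 16 (SD 6 in both arms), 50 patients per arm
g, var_g = hedges_g(12, 6, 50, 16, 6, 50)
print(round(g, 3))  # 0.662
```

The correction factor shrinks d only slightly at this sample size; it matters more for the many small trials in a psychotherapy database.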

We also calculated pre-post effect sizes separately within the treatment and within the control groups, as the difference between the mean pre-test and mean post-test scores, divided by the standard deviation (SD) of the pre-test. We used the pre-test SD to avoid a potential impact of the usual care on the post-test SD (Harrer et al., 2021, chap. 3.3.1.3). We assumed a correlation of $\rho = 0.8$ between pre- and post-test scores.
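The within-group effect size can be sketched as follows. The variance formula 2(1 − ρ)/n + g²/(2n) is, to our understanding, the large-sample variance of a standardized mean change with pre-test (raw-score) standardization, as used for metafor’s "SMCR" measure; whether the project’s own code uses exactly this variance is an assumption here, and the function name and example values are hypothetical.

```python
import math


def prepost_g(mean_pre, mean_post, sd_pre, n, rho=0.8):
    """Return (g, variance) for the change from pre- to post-test within one arm.

    The change is standardized by the pre-test SD; a positive value means that
    symptom scores decreased. rho is the assumed pre-post correlation.
    """
    d = (mean_pre - mean_post) / sd_pre
    # Approximate small-sample correction with n - 1 degrees of freedom
    j = 1 - 3 / (4 * (n - 1) - 1)
    g = j * d
    # Large-sample variance of a raw-score standardized mean change
    var_g = 2 * (1 - rho) / n + g**2 / (2 * n)
    return g, var_g


# Hypothetical arm: 40 patients, mean 24 at baseline and 15 at post-test, pre-test SD 6
g, var_g = prepost_g(24, 15, 6, 40)
```

Because ρ enters only the variance, the assumed value of 0.8 changes the weights that studies receive when pooled, not the individual pre-post effect sizes.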

Treatment response was a secondary outcome. It was defined as the number of patients with at least 50% symptom reduction between baseline and post-test, divided by the total number of patients. Patients who were randomized but not included in the responder analyses of the original reports were assumed to be non-responders and were included in the analyses, in line with the intention-to-treat principle. We calculated response rates with a well-validated method that uses the baseline mean and the mean, standard deviation and number of patients at post-test (Furukawa et al., 2005).
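A minimal sketch of this imputation, as we understand the method of Furukawa et al. (2005): post-test scores are assumed to be normally distributed, a responder is a patient whose post-test score is at most half the baseline mean, and the imputed number of responders is divided by the number randomized so that patients not analysed count as non-responders. The function names and example numbers are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def imputed_response_rate(mean_pre, mean_post, sd_post, n_post, n_randomized):
    """Estimate the intention-to-treat response rate from summary statistics.

    Assumes normally distributed post-test scores and defines response as a
    post-test score of at most 50% of the baseline mean.
    """
    cutoff = 0.5 * mean_pre
    # Proportion of analysed patients expected to fall at or below the cut-off
    prop_analysed = normal_cdf((cutoff - mean_post) / sd_post)
    responders = prop_analysed * n_post
    # Patients not analysed are counted as non-responders
    return responders / n_randomized


# Hypothetical arm: baseline mean 24; post-test mean 12 (SD 6) in 40 of 50 randomized
rate = imputed_response_rate(24, 12, 6, 40, 50)
print(rate)  # 0.4
```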

Meta-analyses

Differences between characteristics of WL and CAU controlled trials

To compare baseline differences between trials using WL control groups and those with CAU control groups, we conducted bivariable analyses with χ²-tests for categorical variables and t-tests for continuous variables. To compare baseline severity across trials in primary and outpatient care, we converted scores on the most common depression measures (Beck Depression Inventory (BDI; Beck et al., 1961); Beck Depression Inventory II (BDI-II; Beck et al., 1996); Montgomery–Åsberg Depression Rating Scale (MADRS; Williams and Kobak, 2008); Patient Health Questionnaire-9 (PHQ-9; Kroenke et al., 2001); Edinburgh Postnatal Depression Scale (EPDS; Cox et al., 1987)) to the Hamilton Depression Rating Scale-17 (HDRS-17; Hamilton, 1960), using established conversion methods (Furukawa et al., 2020; Leucht et al., 2018; Wahl et al., 2014).
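For a binary trial characteristic (for example, whether a trial examined IPT), the χ² comparison reduces to a 2×2 table. The sketch below computes Pearson’s χ² without continuity correction and its p value for one degree of freedom using only the standard library; the function name and counts are hypothetical. For larger tables, a library routine such as `scipy.stats.chi2_contingency` (which applies Yates’ correction to 2×2 tables by default) would be the more practical choice.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows could be WL vs CAU trials; columns whether a characteristic is present.
    Returns (statistic, p value) with 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts: characteristic present in 20/20 trials of one type, 0/20 of the other
stat, p = chi2_2x2(20, 0, 0, 20)
```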

The differences between the effect sizes in trials using WL and CAU control conditions

These analyses were conducted using the ‘metapsyTools’ package (Harrer et al., 2022) in R (version 4.1.1) and RStudio (version 1.1.463 for Mac). The ‘metapsyTools’ package was developed specifically for our meta-analytic project and imports functionality from the ‘meta’ (Balduzzi et al., 2019), ‘metafor’ (Viechtbauer and Cheung, 2010) and ‘dmetar’ (Harrer et al., 2019) packages.

We first pooled the effect sizes of all trials (indicating the difference between treatment and control conditions at post-test) using a random-effects model. The between-study heterogeneity variance $\tau^2$ was estimated using restricted maximum likelihood (REML). We applied the Knapp–Hartung method to obtain robust confidence intervals (CIs) and significance tests for the overall effect (IntHout et al., 2014). We calculated the I² statistic and its 95% CI as an indicator of heterogeneity in percentages (Higgins et al., 2003).
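The pooling steps can be sketched from scratch as below, assuming effect sizes `y` with known sampling variances `v`. τ² is found by iterating the REML estimating equation, the Knapp–Hartung standard error rescales the variance of the pooled effect by the weighted residual variance, and I² is derived from Cochran’s Q. This is an illustration under those assumptions, not the metapsyTools, meta or metafor implementation, and it omits the CI for I²; the function name is hypothetical.

```python
import math


def pool_random_effects(y, v, tol=1e-10, max_iter=1000):
    """Random-effects pooling with REML tau^2, a Knapp-Hartung SE and I^2 (%)."""
    k = len(y)
    # Fixed-effect weights, used for Cochran's Q and I^2
    w_fe = [1 / vi for vi in v]
    mu_fe = sum(wi * yi for wi, yi in zip(w_fe, y)) / sum(w_fe)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w_fe, y))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # REML fixed-point iteration for tau^2, truncated at zero
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung: scale the variance of mu by the weighted residual variance
    resid = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se_kh = math.sqrt(resid / sum(w))
    return {"mu": mu, "tau2": tau2, "se_kh": se_kh, "df": k - 1, "i2": 100 * i2}


# Hypothetical effect sizes with equal sampling variances
res = pool_random_effects([0.2, 0.8, 0.5, 1.1], [0.02] * 4)
```

A 95% CI for the pooled effect would then be `mu ± t × se_kh`, with t the 97.5th percentile of the t-distribution with `df` degrees of freedom (obtainable, for example, from `scipy.stats.t.ppf(0.975, df)`).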

We conducted several sensitivity analyses. First, we pooled effects after excluding outliers, using the ‘non-overlapping confidence intervals’ approach (Harrer et al., 2021). Second, we estimated the pooled effect using only studies with low risk of bias. Third, we used Duval and Tweedie’s trim-and-fill procedure (Duval and Tweedie, 2000) to adjust for potential publication bias.
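The ‘non-overlapping confidence intervals’ rule can be expressed compactly: a study is flagged when its 95% CI lies entirely above or entirely below the 95% CI of the pooled effect. The sketch assumes normal-theory study CIs (estimate ± 1.96 × SE); the function name and study data are hypothetical, and the pooled interval of 0.71 to 0.84 is used purely as an example.

```python
import math


def non_overlapping_outliers(y, v, pooled_lower, pooled_upper, z=1.96):
    """Return indices of studies whose CI does not overlap the pooled CI."""
    outliers = []
    for i, (yi, vi) in enumerate(zip(y, v)):
        half_width = z * math.sqrt(vi)
        lower, upper = yi - half_width, yi + half_width
        # Flag the study if its whole interval lies above or below the pooled interval
        if upper < pooled_lower or lower > pooled_upper:
            outliers.append(i)
    return outliers


# Hypothetical effect sizes and variances, pooled 95% CI of 0.71 to 0.84
flagged = non_overlapping_outliers(
    [0.75, 2.5, 0.1, 0.9], [0.04, 0.04, 0.01, 0.09], 0.71, 0.84
)
print(flagged)  # [1, 2]
```

In the sensitivity analysis, the flagged studies are dropped and the remaining effect sizes are pooled again.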

We tested the difference between WL and CAU controlled trials in a subgroup analysis, using a mixed-effects model with group-specific $\tau^2$ estimates. We also conducted a meta-regression analysis with the effect size as the dependent variable. As predictors, we entered a dummy variable for the type of control group (WL vs CAU) and the study characteristics that were found to differ significantly between WL and CAU controlled trials.
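A meta-regression of this kind is a weighted least-squares fit with weights 1/(vᵢ + τ²). The sketch below assumes τ² (the residual heterogeneity) is already known, whereas software such as metafor estimates it (for example, by REML) and can apply a Knapp–Hartung adjustment, which is omitted here. Each row of the design matrix holds an intercept, the WL dummy and any further covariates; the function names and data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def meta_regression(y, v, X, tau2):
    """Weighted least squares with weights 1 / (v_i + tau2); returns (coefficients, SEs)."""
    k, p = len(y), len(X[0])
    w = [1 / (vi + tau2) for vi in v]
    xtwx = [[sum(w[i] * X[i][r] * X[i][c] for i in range(k)) for c in range(p)] for r in range(p)]
    xtwy = [sum(w[i] * X[i][r] * y[i] for i in range(k)) for r in range(p)]
    beta = solve(xtwx, xtwy)
    # Diagonal of the inverse of X'WX gives the sampling variances of the coefficients
    se = [math.sqrt(solve(xtwx, [1.0 if r == j else 0.0 for r in range(p)])[j]) for j in range(p)]
    return beta, se


# Hypothetical data: intercept plus a dummy that is 1 for WL controlled trials
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
beta, se = meta_regression([0.6, 0.7, 0.9, 1.0], [0.02] * 4, X, tau2=0.01)
print([round(b, 2) for b in beta])  # [0.65, 0.3]
```

With only the dummy as predictor, its coefficient equals the difference between the weighted mean effects of WL and CAU trials; adding covariates adjusts that difference for the study characteristics entered alongside it.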

We also compared the pre-post effect sizes within the control conditions, as well as those within the treatment conditions. We examined differences between trials with WL and CAU control conditions with subgroup analyses, and ran meta-regression analyses with the same predictors as the meta-regression analysis described above.

Differences in response rates between conditions

We first pooled the response rates, separately for the treatment and the control conditions, across all included trials using the ‘meta’ package in R (version 3.6.3). We synthesized the binary outcome data using normal-normal random-effects models after a logit transformation of the response rates. The summary estimates were then back-transformed to the proportion scale, and these estimates and their 95% CIs are presented. The between-study heterogeneity variance was estimated using the REML estimator.
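A minimal sketch of logit-based pooling of response rates, assuming every arm has at least one responder and one non-responder (otherwise a continuity correction or a different model would be needed). It estimates τ² with a REML fixed-point iteration and forms a normal-approximation 95% CI on the logit scale before back-transforming; it is not the ‘meta’ package’s own routine, and the function names and counts are hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit transformation."""
    return 1 / (1 + math.exp(-x))


def pool_logit_proportions(events, totals, tol=1e-10, max_iter=1000):
    """Random-effects pooled proportion with a 95% CI, computed on the logit scale."""
    # Logit-transformed proportions and their approximate sampling variances
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    # Back-transform the estimate and the CI limits to the proportion scale
    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se)


# Hypothetical control arms: 20 responders out of 100 in each of two trials
estimate, lower, upper = pool_logit_proportions([20, 20], [100, 100])
print(round(estimate, 3))  # 0.2
```

Pooling on the logit scale keeps the back-transformed CI inside the 0 to 1 range, which a CI computed directly on raw proportions does not guarantee.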

We conducted several sensitivity analyses: one in which we excluded outliers, one restricted to studies with low risk of bias, and a third in which we adjusted for small-study effects using Duval and Tweedie’s trim-and-fill procedure. Differences between studies with WL and CAU control conditions were examined with subgroup analyses and meta-regression analyses.

Results

Selection and inclusion of studies

After examining a total of 33,967 records (23,896 after the removal of duplicates), we retrieved 4,119 full-text papers for further consideration, of which 3,786 were excluded. The PRISMA flowchart describing the inclusion process is presented in Figure 1. A total of 333 randomized controlled trials (472 comparisons between therapy and control conditions) were included. The trials included 41,480 participants (18,104 in the control conditions and 23,376 in the treatment conditions). The references of the 333 trials are given in Appendix A and a summary of key characteristics in Appendix B.

Figure 1. Flowchart of the inclusion of studies.

Characteristics of included studies

The aggregated characteristics of the included studies are presented in Table 1. The majority of studies included an arm with CBT (55% of the studies), and none of the other therapies was examined in more than 9% of the trials. Overall, 38% of the studies examined individual therapy, 32% group therapy, 5% telephone therapy and 22% guided self-help, and 9% used a mixed or other format. The mean number of sessions across all interventions was 8.68 (SD = 4.59).

Table 1. Selected characteristics of randomized trials comparing psychotherapies with waiting list (WL) and care-as-usual (CAU) control groups^a

BAT: behavioural activation therapy; CAU: care-as-usual; CBT: cognitive behaviour therapy; IPT: interpersonal psychotherapy; M: means; SD: standard deviation; WL: waiting list.

^a There were 3 studies that included both a WL and a CAU control group; these studies were excluded from these analyses.

^b This row includes the 3 studies with both a WL and CAU control group.

^c At least one intervention arm includes this characteristic, which means that one trial can include more than one type of therapy or format.

^d One of the cells contains fewer than 5 studies, indicating that the p value may not be accurate.

Most studies (79%) had two arms; the remaining 21% had three or more arms. Most studies included one depression outcome measure (73%), and 36% were rated as low risk of bias across all four domains. Studies were most often conducted in Europe (38%), North America (30%) and East Asia (11%). The mean publication year was 2012 (SD = 9.75).

Most studies used a cut-off on a depression rating scale as inclusion criterion (56%), while the rest required a diagnosis according to a clinical interview. Studies were most often aimed at adults in general (37%), general medical patients (23%), women with perinatal depression (14%), older adults (8%) or college students (6%).

Participants were recruited through the community (37%), clinical referrals only (24%) or other sources (38%). The mean age across all samples was 43.21 years (SD = 16.23), the mean proportion of women was 0.74 (SD = 0.22) and the mean baseline severity score on the HDRS-17 was 17.22 (SD = 3.72).

Differences between WL and CAU controlled trials

We found several significant differences between studies using a WL and those using a CAU control group. Studies with a WL examined IPT less often (p = 0.02) and third-wave therapies more often (p = 0.02). WL studies less often examined an individual (p < 0.001) or telephone-delivered format (p = 0.02) and more often a guided self-help format (p < 0.001). Studies with a WL were older (mean publication year: 2012) than studies with CAU (mean: 2014; p < 0.001), differed in target group (more often aimed at adults in general and less often at women with perinatal depression; p < 0.001) and used different recruitment strategies (more often recruitment from the community; p < 0.001). WL controlled trials also more often had three or more treatment arms than trials with CAU (p < 0.001) and less often used only one depression outcome measure (p = 0.006).

Effect sizes of treatment versus control in studies with CAU and WL

The overall effect size of all 472 comparisons between therapy and control (either CAU or WL) was g = 0.77 (95% CI: 0.71; 0.84), with high heterogeneity (I² = 84; 95% CI: 82; 85; Table 2). This was somewhat smaller when outliers were excluded (g = 0.70), smaller when the analyses were limited to studies with low risk of bias (g = 0.62) and considerably smaller after adjustment for publication bias (g = 0.43).

Table 2. Effect sizes of treatment versus control, pre- to post within control conditions and pre- to post within therapy conditions across WL and CAU controlled trials

95% CI: 95% confidence interval; adj: adjusted; CAU: care-as-usual; g: standardized mean difference (Hedges’ g); k: number of studies; publ. bias: publication bias; RoB: risk of bias; WL: waiting list.

^a The effect sizes were calculated separately for each subgroup, and the significance of the difference between the two subgroups was calculated afterwards.

* p-value is significant.

Subgroup analyses indicated that WL controlled studies (g = 0.95; 95% CI: 0.85; 1.04; I² = 80; 95% CI: 78; 82) had significantly larger effects than studies with CAU (g = 0.63; 95% CI: 0.55; 0.71; I² = 85; 95% CI: 83; 86; p for difference < 0.001). This difference between WL and CAU controlled trials remained significant in all sensitivity analyses, including when outliers were excluded (p < 0.001), when the analysis was limited to studies with low risk of bias (p = 0.009) and after adjustment for publication bias (p = 0.004).
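As a rough check on this subgroup difference, the standard between-subgroup Q-test for two independent estimates can be applied to the published summary results. The sketch backs out standard errors from the reported 95% CIs with a normal approximation (CI width / 3.92). This is only approximate because Knapp–Hartung intervals are t-based, so the statistic will not match the software output exactly; the function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error from a symmetric 95% CI (normal approximation)."""
    return (upper - lower) / (2 * z)


def subgroup_difference(estimates, ses):
    """Between-subgroup Q-test for independent subgroup estimates (two groups only)."""
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * ei for wi, ei in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (ei - pooled) ** 2 for wi, ei in zip(w, estimates))
    if len(estimates) != 2:
        raise ValueError("the p value below is only valid for two subgroups (df = 1)")
    # For 1 df, P(chi2 > q) = erfc(sqrt(q / 2))
    p = math.erfc(math.sqrt(q_between / 2))
    return q_between, p


# Reported subgroup results: WL g = 0.95 (0.85; 1.04), CAU g = 0.63 (0.55; 0.71)
se_wl = se_from_ci(0.85, 1.04)
se_cau = se_from_ci(0.55, 0.71)
q, p = subgroup_difference([0.95, 0.63], [se_wl, se_cau])
print(round(q, 1), p < 0.001)
```

The implied p value falls far below 0.001, in line with the reported p for difference.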

We conducted a multivariable meta-regression with, as predictors, a dummy variable indicating whether a study used a CAU or a WL control group and all characteristics of the interventions, patients and studies that differed significantly between studies with a CAU and a WL control condition (Table 3). The dummy variable indicating the type of control group remained significantly associated with the effect size (p < 0.001).

Table 3. Meta-regression analyses of effect sizes of treatment versus control, pre- to post within control conditions and pre- to post within therapy conditions

CAU: care-as-usual; Coeff: coefficient; IPT: interpersonal psychotherapy; SE: standard error; WL: waiting list.

*p-value is significant.

Pre-post effect sizes within control conditions

We could calculate pre-post effect sizes for the CAU and WL control conditions in 280 trials (152 CAU and 128 WL). The overall pre-post effect size across all studies was g = 0.51 (95% CI: 0.43; 0.60; I² = 95; 95% CI: 94; 95). The pre-post effect size within the CAU conditions (g = 0.64; 95% CI: 0.50; 0.78; I² = 96; 95% CI: 96; 96) was significantly larger than that within the WL conditions (g = 0.37; 95% CI: 0.28; 0.46; I² = 91; 95% CI: 89; 92; p for difference: 0.001). This difference remained significant after excluding outliers (p < 0.001) and after limiting the studies to those with low risk of bias (p = 0.01), but not after adjustment for publication bias (p = 0.25).

We conducted another multivariable meta-regression with the dummy variable for type of control group and all characteristics of the interventions, patients and studies that differed significantly between studies with a CAU and a WL control condition as predictors (Table 3). The dummy variable indicating the type of control group remained significantly associated with the pre-post effect size within the control conditions (p = 0.01).

Pre-post effect sizes within the therapy conditions

The overall pre-post effect size within the 407 therapy conditions was g = 1.48 (95% CI: 1.40; 1.57; I² = 93; 95% CI: 93; 94; Table 2). We found no significant difference between the effect sizes of trials with a CAU control group (g = 1.44; 95% CI: 1.30; 1.58; I² = 95; 95% CI: 95; 96) and those with a WL control group (g = 1.52; 95% CI: 1.41; 1.62; I² = 88; 95% CI: 86; 89; p for difference: 0.38). None of the sensitivity analyses indicated a significant difference between the two groups of studies (p values from 0.06 to 0.81). In the multivariable meta-regression with the pre-post effect size within the therapy conditions as the dependent variable and the same predictors as the previous meta-regression analyses, the dummy variable for type of control group was also not significant (p = 0.06; Table 3).

Response rates

The overall response rate within control conditions was 0.17 (95% CI: 0.16; 0.19; I² = 73; 95% CI: 69; 76; Appendix C). The response rate in trials with a CAU control group (0.19; 95% CI: 0.17; 0.21; I² = 78; 95% CI: 74; 81) was significantly larger than in trials with a WL control group (0.16; 95% CI: 0.14; 0.18; I² = 55; 95% CI: 45; 63; p for difference < 0.001). In the sensitivity analyses, this higher response rate for CAU was confirmed (Appendix C).

The overall response rate within therapy conditions was 0.39 (95% CI: 0.37; 0.42; I² = 83; 95% CI: 81; 84). There was no significant difference between the response rates in trials with a CAU control group (0.36; 95% CI: 0.33; 0.40; I² = 88; 95% CI: 86; 89) and in trials with a WL control group (0.42; 95% CI: 0.39; 0.45; I² = 71; 95% CI: 67; 75; p for difference: 0.27). The sensitivity analyses also supported these findings, except that in the studies with low risk of bias, the response rate was higher for trials with a CAU control group (p = 0.03).

We conducted a multivariable meta-regression analysis of the response rates within the control groups, including a dummy variable indicating whether a study had a WL or CAU condition as well as the other characteristics of the studies. This analysis supported the finding that trials with a WL control group had lower response rates (Appendix D; p = 0.02). We also conducted a multivariable meta-regression analysis of the response rates within the treatment groups, with the same predictors (Appendix D). This analysis indicated that the rates were lower in trials with a CAU control group than in WL controlled trials (p = 0.01). Because of the large number of tests and the relatively high p values, these results should be interpreted with caution.

Discussion

We explored the difference between WL and CAU control conditions in randomized trials of psychological treatments of depression. We confirmed that WL controlled trials have larger effect sizes than trials with a CAU control group. This difference remained significant in all sensitivity analyses, as well as in a meta-regression analysis in which we adjusted for the differences in characteristics between trials with WL and CAU control groups. This finding is in line with previous research showing that WL control groups substantially overestimate the effects of treatments when compared with other control groups, such as CAU or placebo (Cuijpers et al., 2016, 2021c; Furukawa et al., 2014; Hesser et al., 2011).

We also examined whether this difference between WL and CAU controlled trials was caused by a smaller change within the control groups or a larger change within the treatment groups. We found that the change within the WL conditions was significantly smaller than the change within the CAU conditions. This remained significant in almost all sensitivity analyses, and also in the meta-regression analysis in which we adjusted for the differences in characteristics between trials with WL and CAU conditions.
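The mechanism described here, a smaller change within the WL arm rather than a larger change within the treated arm, can be made concrete with a between-group Hedges' g computed at post-test. In the hypothetical Python example below, both scenarios share an identical treated arm; only the improvement in the control arm differs. The numbers and the function name are invented for illustration, and this is not necessarily how effect sizes were computed in this meta-analysis.

```python
import math


def hedges_g(mean_control, mean_treatment, sd_control, sd_treatment,
             n_control, n_treatment):
    """Between-group Hedges' g on a symptom scale (positive = treatment arm lower, i.e. better)."""
    pooled_sd = math.sqrt(((n_control - 1) * sd_control ** 2
                           + (n_treatment - 1) * sd_treatment ** 2)
                          / (n_control + n_treatment - 2))
    d = (mean_control - mean_treatment) / pooled_sd
    correction = 1 - 3 / (4 * (n_control + n_treatment) - 9)  # small-sample factor J
    return correction * d


# Hypothetical trial: both arms start at 25 points and the treated arm ends at 15.
# Only the improvement in the control arm differs between the two scenarios.
g_cau = hedges_g(18, 15, 8, 8, 50, 50)  # usual care improves by 7 points
g_wl = hedges_g(21, 15, 8, 8, 50, 50)   # waiting list improves by only 4 points
print(f"g vs CAU = {g_cau:.2f}, g vs WL = {g_wl:.2f}")
```

With these inputs, g is about 0.37 against the CAU arm and about 0.74 against the WL arm, even though the treated arm is the same in both scenarios.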

This finding supports the hypothesis that patients on a WL actually ‘wait’ to change until they receive the intervention (Miller and Rollnick, 2002), while patients in CAU conditions try to change their problems more actively. It is also possible that patients on a WL are less disappointed than patients assigned to CAU (Lindström et al., 2010; Skingley et al., 2014), because they know they will receive the intervention after a period of waiting.

We found no evidence that patients in the treatment groups improve more or less in WL versus CAU controlled trials when they receive the treatment. Some sensitivity analyses did indicate differential effects, but these were not consistent and were probably the result of chance and the very high level of heterogeneity. This finding suggests that the difference between WL and CAU controlled trials is mostly caused by the difference in change within the control conditions.

We also found several significant differences in characteristics between the two groups of trials, including differences in type of therapy examined, treatment format, recency, target group, recruitment strategy, number of treatment arms and number of outcome measures. However, these differences could not explain the larger effect sizes of WL controlled trials.

It is difficult to understand the exact reasons for these differences, but it is important to note that such significant differences exist, because they may indicate that WL controlled trials examine different research questions. Recently, there have been attempts in other fields, such as smoking cessation, to develop models of between-comparator variability that allow effects to be re-estimated as if all interventions had been evaluated against the same standardized comparator (Glasziou and Zwar, 2023; Kraiss et al., 2023). Such fine-grained approaches, albeit time-consuming to implement, could also be investigated in psychotherapy research to further illuminate the causes of effect differences between and within comparators.

Control conditions are an essential element in evidence-based mental health research. Unfortunately, there is no optimal control condition when examining the effects of psychological interventions (Cuijpers, 2024). All types of control groups, including CAU, psychological placebo, psychoeducation and WL, have their own weaknesses and problems (Cuijpers, 2024; Harrer et al., 2023). However, WL control groups should be avoided in randomized trials, because they clearly inflate the effect size by artificially reducing the change within the WL control group.

This study has several strengths. First, it is a meta-analysis of a large sample of trials. It is also the first to examine differences in characteristics between WL controlled trials and trials using another control group (CAU). To the best of our knowledge, this is also the first meta-analysis to examine change scores within control and treatment conditions in the context of this research question.

However, this study also has several important limitations that must be taken into account when interpreting the outcomes. First, only 36% of the included studies were assessed as having low risk of bias. The main outcomes were confirmed when the analyses were limited to the studies with low risk of bias. However, the low quality of the full set of studies remains an important limitation of this meta-analysis. Second, we could only examine differences between the characteristics of trials using WL and CAU conditions that were available in our database. For example, we did not examine differences between the two groups of studies in terms of medication use. On the other hand, we found in a recent meta-analysis that the use of antidepressants is not related to the effects of psychotherapy (Cuijpers et al., 2023d).

A third important limitation of our study is that heterogeneity was high to very high in most analyses, both for the WL and CAU control conditions. Although this is typically found when examining pre-post effect sizes and response rates, it still means that the effect sizes found in the included studies varied considerably and that we could not explain these differences through subgroup and meta-regression analyses. Future research should focus more on potential causes of this heterogeneity, as was recently done in an elegant meta-analysis that examined different levels of CAU intensity (Munder et al., 2022). Such research should examine how the intensity of CAU can best be estimated and defined, but it should also take into account the setting in which CAU is delivered (primary, secondary, general medical or perinatal care; Cuijpers et al., 2021c) and, for example, whether and how CAU was ‘enhanced’. Such fine-grained research may also lead to a better understanding of why effect sizes are overestimated in WL controlled trials compared with other control groups.
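The I² values reported in this paper come with 95% CIs. One widely used way to obtain such an interval from Cochran's Q and the number of studies k is the test-based method of Higgins and Thompson (2002), sketched below in Python. We do not know which interval method the authors used, so this is an illustration of the technique with hypothetical inputs, not a reproduction of the reported values; the function name is our own.

```python
import math


def i2_with_ci(q, k, z=1.96):
    """I^2 (percent) with a test-based 95% confidence interval.

    Uses the Higgins and Thompson (2002) approximation for the standard error
    of ln(H), where H = sqrt(Q / (k - 1)). Requires Q > 0 and k >= 3.
    """
    if q <= 0 or k < 3:
        raise ValueError("needs Q > 0 and at least three studies")
    df = k - 1
    ln_h = 0.5 * math.log(q / df)
    if q > k:
        se = 0.5 * (math.log(q) - math.log(df)) / (math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:
        se = math.sqrt(1 / (2 * (k - 2)) * (1 - 1 / (3 * (k - 2) ** 2)))

    def to_i2(h):
        h = max(1.0, h)                   # H below 1 corresponds to I^2 = 0
        return 100 * (h ** 2 - 1) / h ** 2

    return (to_i2(math.exp(ln_h)),
            to_i2(math.exp(ln_h - z * se)),
            to_i2(math.exp(ln_h + z * se)))


# Hypothetical input: Cochran's Q = 50 from k = 11 studies.
point, lower, upper = i2_with_ci(50, 11)
print(f"I2 = {point:.0f} (95% CI {lower:.0f}; {upper:.0f})")
```

Even with eleven studies, the interval around a high I² is wide, which is one reason heterogeneity estimates of this kind are best read together with their CIs.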

Despite these limitations, we can conclude that trials with WL control conditions considerably overestimate the effect sizes of psychological treatments, compared to trials using CAU control conditions, and that the overestimation of effect sizes is probably caused by a smaller improvement within the WL condition compared to the improvement in the CAU condition.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S2045796024000611.

Availability of data and materials

Most data are available in the supplementary material; all other information about the datasets and individual studies can be found on the project website www.metapsy.org. For questions and additional information, please contact the first author.

Financial support

No financial support was received for this study.

Competing interests

All authors report no conflict of interest.

Ethical standards

No humans were involved in this study.

References

Balduzzi, S, Rücker, G and Schwarzer, G (2019) How to perform a meta-analysis with R: A practical tutorial. Evidence Based Mental Health 22, 153–160.

Beck, AT, Steer, RA and Brown, GK (1996) BDI-II. Beck Depression Inventory Second Edition: Manual. San Antonio: Psychological Corporation.

Beck, AT, Ward, CH, Mendelson, M, Mock, J and Erbaugh, J (1961) An inventory for measuring depression. Archives of General Psychiatry 4, 561–571.

Cipriani, A, Furukawa, TA, Salanti, G, Chaimani, A, Atkinson, LZ, Ogawa, Y, Leucht, S, Ruhe, HG, Turner, EH, Higgins, JPT, Egger, M, Takeshima, N, Hayasaka, Y, Imai, H, Shinohara, K, Tajika, A, Ioannidis, JPA and Geddes, JR (2018) Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: A systematic review and network meta-analysis. Lancet 391, 1357–1366.

Cox, JL, Holden, JM and Sagovsky, R (1987) Detection of postnatal depression: Development of the 10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry 150, 782–786.

Cuijpers, P (2023c) The overestimation of the effects of psychotherapies for depression in waitlist controlled trials: A meta-analytic comparison with usual care controlled trials.

Cuijpers, P (2024) Has the time come to stop using control groups in trials of psychosocial interventions? World Psychiatry: Official Journal of the World Psychiatric Association (WPA), in press.

Cuijpers, P, Ciharova, M, Miguel, C, Harrer, M, Ebert, DD, Brakemeier, EL and Karyotaki, E (2021b) Psychological treatment of depression in institutional settings: A meta-analytic review. Journal of Affective Disorders 286, 340–350.

Cuijpers, P, Cristea, IA, Karyotaki, E, Reijnders, M and Huibers, MJH (2016) How effective are cognitive behavior therapies for major depression and anxiety disorders? A meta-analytic update of the evidence. World Psychiatry: Official Journal of the World Psychiatric Association (WPA) 15, 245–258.

Cuijpers, P, Harrer, M, Miguel, C, Ciharova, M and Karyotaki, E (2023a) Five decades of research on psychological treatments of depression: A historical and meta-analytic overview. American Psychologist, epub ahead of print.

Cuijpers, P and Karyotaki, E (2022) A meta-analytic database of randomised trials on psychotherapies for depression.

Cuijpers, P, Karyotaki, E, de Wit, L and Ebert, DD (2020) The effects of fifteen evidence-supported therapies for adult depression: A meta-analytic review. Psychotherapy Research 30, 279–293.

Cuijpers, P, Miguel, C, Harrer, M, Ciharova, M and Karyotaki, E (2023d) Does the use of pharmacotherapy interact with the effects of psychotherapy? A meta-analytic review. European Psychiatry 66.

Cuijpers, P, Miguel, C, Harrer, M, Plessen, CY, Ciharova, M, Papola, D, Ebert, D and Karyotaki, E (2023b) Psychological treatment of depression: A systematic overview of a ‘Meta-Analytic Research Domain’. Journal of Affective Disorders 335, 141–151.

Cuijpers, P, Noma, H, Karyotaki, E, Cipriani, A and Furukawa, T (2019) Individual, group, telephone, self-help and internet-based cognitive behavior therapy for adult depression: A network meta-analysis of delivery methods. JAMA Psychiatry 76, 700–707.

Cuijpers, P, Quero, S, Noma, H, Ciharova, M, Miguel, C, Karyotaki, E, Cipriani, A, Cristea, I and Furukawa, TA (2021a) Psychotherapies for depression: A network meta-analysis covering efficacy, acceptability and long-term outcomes of all main treatment types. World Psychiatry: Official Journal of the World Psychiatric Association (WPA) 20, 283–293.

Cuijpers, P, Quero, S, Papola, D, Cristea, IA and Karyotaki, E (2021c) Care-as-usual control groups across different settings in randomised trials on psychotherapy for adult depression: A meta-analysis. Psychological Medicine 51, 634–644.

Cuijpers, P, van Straten, A, Andersson, G and van Oppen, P (2008) Psychotherapy for depression in adults: A meta-analysis of comparative outcome studies. Journal of Consulting and Clinical Psychology 76, 909–922.

Cunningham, KK and McCambridge, J (2013) Exploratory randomized controlled trial evaluating the impact of a waiting list control design. BMC Medical Research Methodology 13, 1–7.

Duval, S and Tweedie, R (2000) Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56, 455–463.

Furukawa, TA, Cipriani, A, Barbui, C, Brambilla, P and Watanabe, N (2005) Imputing response rates from means and standard deviations in meta-analyses. International Clinical Psychopharmacology 20, 49–52.

Furukawa, TA, Noma, H, Caldwell, DM, Honyashiki, M, Shinohara, K, Imai, H, Chen, P, Hunot, V and Churchill, R (2014) Waiting list may be a nocebo condition in psychotherapy trials: A contribution from network meta‐analysis. Acta Psychiatrica Scandinavica 130, 181–192.

Furukawa, TA, Reijnders, M, Kishimoto, S, Sakata, M, DeRubeis, RJ, Dimidjian, S, Dozois, DJ, Hegerl, U, Hollon, SD, Jarrett, RB, Lespérance, F, Segal, ZV, Mohr, DC, Simons, AD, Quilty, LC, Reynolds, CF, Gentili, C, Leucht, S, Engel, RR and Cuijpers, P (2020) Translating the BDI and BDI-II into the HAMD and vice versa with equipercentile linking. Epidemiology and Psychiatric Sciences 29.

Glasziou, PP and Zwar, NA (2023) Commentary on Kraiss et al.: Read the label – Improving the applicability of systematic reviews by coding and analysis of intervention elements. Addiction 118, 1851–1852.

Hamilton, M (1960) A rating scale for depression. Journal of Neurology, Neurosurgery & Psychiatry 23, 56–62.

Harrer, M, Cuijpers, P, Furukawa, TA and Ebert, DD (2021) Doing Meta-Analysis with R: A Hands-On Guide. Boca Raton, FL and London: Chapman & Hall/CRC Press.

Harrer, M, Cuijpers, P, Furukawa, T and Ebert, DD (2019) dmetar: Companion R package for the guide ‘Doing Meta-Analysis in R’. R package version 0.1.0. http://dmetar.protectlab.org/ (accessed 15 October 2024).

Harrer, M, Cuijpers, P, Schuurmans, LK, Kaiser, T, Buntrock, C, van Straten, A and Ebert, D (2023) Evaluation of randomized controlled trials: A primer and tutorial for mental health researchers. Trials 24.

Harrer, M, Kuper, P and Cuijpers, P (2022) metapsyTools: Several R helper functions for the “metapsy” database. R package version 0.3.2. https://tools.metapsy.org (accessed 15 October 2024).

Hesser, H, Weise, C, Rief, W and Andersson, G (2011) The effect of waiting: A meta-analysis of wait-list control groups in trials for tinnitus distress. Journal of Psychosomatic Research 70, 378–384.

Higgins, JP, Altman, DG, Gøtzsche, PC, Jüni, P, Moher, D, Oxman, AD, Savović, J, Schulz, KF, Weeks, L and Sterne, JA (2011) The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ 343.

Higgins, JPT, Thompson, SG, Deeks, JJ and Altman, DG (2003) Measuring inconsistency in meta-analyses. BMJ 327, 557–560.

IntHout, J, Ioannidis, JP and Borm, GF (2014) The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Medical Research Methodology 14.

Kraiss, J, Viechtbauer, W, Black, N, Johnston, M, Hartmann‐Boyce, J, Eisma, M, Javornik, N, Bricca, A, Michie, S, West, R and de Bruin, M (2023) Estimating the true effectiveness of smoking cessation interventions under variable comparator conditions: A systematic review and meta‐regression. Addiction 118, 1835–1850.

Kroenke, K, Spitzer, RL and Williams, JB (2001) The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine 16, 606–613.

Laws, KR, Pellegrini, L, Reid, JE, Drummond, LM and Fineberg, NA (2022) The inflating impact of waiting-list controls on effect size estimates. Frontiers in Psychiatry 13.

Leucht, S, Fennema, H, Engel, RR, Kaspers-Janssen, M and Szegedi, A (2018) Translating the HAM-D into the MADRS and vice versa with equipercentile linking. Journal of Affective Disorders 226, 326–331.

Lindström, D, Sundberg-Petersson, I, Adami, J and Tönnesen, H (2010) Disappointment and drop-out rate after being allocated to control group in a smoking cessation trial. Contemporary Clinical Trials 31, 22–26.

Michopoulos, I, Furukawa, TA, Noma, H, Kishimoto, S, Onishi, A, Ostinelli, EG, Ciharova, M, Miguel, C, Karyotaki, E and Cuijpers, P (2021) Different control conditions can produce different effect estimates in psychotherapy trials for depression. Journal of Clinical Epidemiology 132, 59–70.

Miller, WR and Rollnick, S (2002) Motivational Interviewing: Preparing People to Change Addictive Behavior, 2nd edition. New York, NY: Guilford Press.

Munder, T, Geisshüsler, A, Krieger, T, Zimmermann, J, Wolf, M, Berger, T and Watzke, B (2022) Intensity of treatment as usual and its impact on the effects of face-to-face and internet-based psychotherapy for depression: A preregistered meta-analysis of randomized controlled trials. Psychotherapy and Psychosomatics 91, 200–209.

Skingley, A, Bungay, H, Clift, S and Warden, J (2014) Experiences of being a control group: Lessons from a UK-based randomized controlled trial of group singing as a health promotion initiative for older people. Health Promotion International 29, 751–758.

Sterne, JA, Savović, J, Page, MJ, Elbers, RG, Blencowe, NS, Boutron, I, Cates, CJ, Cheng, HY, Corbett, MS, Eldridge, SM and Emberson, JR (2019) RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 366.

Viechtbauer, W and Cheung, MWL (2010) Outlier and influence diagnostics for meta-analysis. Research Synthesis Methods 1, 112–125.

Wahl, I, Löwe, B, Bjorner, JB, Fischer, F, Langs, G, Voderholzer, U, Aita, SA, Bergemann, N, Brähler, E and Rose, M (2014) Standardization of depression measurement: A common metric was developed for 11 self-report depression measures. Journal of Clinical Epidemiology 67, 73–86.

Williams, JBW and Kobak, KA (2008) Development and reliability of a structured interview guide for the Montgomery-Asberg Depression Rating Scale. British Journal of Psychiatry 192, 52–58.

Young, C (2006) What happens when people wait for therapy? Assessing the clinical significance of the changes observed over the waiting period for clients referred to a primary care psychology service. Primary Care and Mental Health 4, 113–119.

Zhu, Z, Zhang, L, Jiang, J, Li, W, Cao, X, Zhou, Z, Zhang, T and Li, C (2014) Comparison of psychological placebo and waiting list control conditions in the assessment of cognitive behavioral therapy for the treatment of generalized anxiety disorder: A meta-analysis. Shanghai Archives of Psychiatry 26, 319–331.

Figure 1. Flowchart of the inclusion of studies.

Table 1. Selected characteristics of randomized trials comparing psychotherapies with waiting list (WL) and care-as-usual (CAU) control groups

Table 2. Effect sizes of treatment versus control, pre- to post within control conditions and pre- to post within therapy conditions across WL and CAU controlled trials

Table 3. Meta-regression analyses of effect sizes of treatment versus control, pre- to post within control conditions and pre- to post within therapy conditions
