When trying to replicate some results of our meta-analysis,1 Kliem and colleagues reported some methodological discrepancies.2 These discrepancies, however, are due to modifications in their statistical approach as compared with the one we originally reported.
First, in contrast to our results,1 Kliem et al reported significant heterogeneity between studies for overall outcome, as indicated by the Q statistic. As stated in our meta-analysis, we aggregated the effect size estimates across studies using a random effects model, which is more appropriate than a fixed effects model if the aim is to make inferences beyond the observed sample of studies.1,2 Applying a random effects model, we found an aggregated effect size for overall outcome of 0.54, and heterogeneity was not significant (Q = 11.72, P = 0.23, I² = 23). Thus, there was no need for additional outlier analyses or for the exclusion of any study. As Rosenthal's fail-safe N was 66, which is above 60 (5K + 10), the effect can be regarded as robust. Kliem et al, however, apparently applied a fixed effects model to test for heterogeneity. A fixed effects model addresses another research question and consequently yields different results.
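For illustration, the following is a minimal sketch of random effects aggregation using the DerSimonian–Laird estimator, including Cochran's Q and I². The effect sizes and standard errors below are placeholders, not the values from the studies in our meta-analysis.

```python
import numpy as np

def random_effects_pool(d, se):
    """DerSimonian-Laird random-effects pooling of effect sizes.

    d  : per-study effect size estimates (e.g. Cohen's d)
    se : per-study standard errors
    Returns pooled effect, its standard error, Q and I^2 (in %).
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    k = len(d)

    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = 1.0 / se**2
    d_fixed = np.sum(w * d) / np.sum(w)

    # Cochran's Q statistic (heterogeneity) and I^2
    Q = np.sum(w * (d - d_fixed) ** 2)
    I2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0

    # DerSimonian-Laird estimate of between-study variance tau^2
    C = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / C)

    # Random-effects weights include tau^2, so the pooled estimate
    # and its confidence interval differ from the fixed effects model
    w_star = 1.0 / (se**2 + tau2)
    d_random = np.sum(w_star * d) / np.sum(w_star)
    se_random = np.sqrt(1.0 / np.sum(w_star))
    return d_random, se_random, Q, I2

# Placeholder data (NOT the studies from the meta-analysis)
d  = [0.4, 0.7, 0.3, 0.9, 0.5]
se = [0.20, 0.25, 0.15, 0.30, 0.22]
est, se_est, Q, I2 = random_effects_pool(d, se)
print(f"pooled d = {est:.2f}, 95% CI {est - 1.96*se_est:.2f} to {est + 1.96*se_est:.2f}")
print(f"Q = {Q:.2f}, I^2 = {I2:.0f}%")
```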
Furthermore, Kliem et al reported a larger confidence interval for the overall effect size. The confidence interval they calculated corresponds well to the respective interval we reported in Table 1, which was 0.26–0.83.1 A narrower confidence interval, however, was erroneously reported by us in the forest plot owing to a transcription error (Fig. 2, 95% CI 0.41–0.67)1 – see correction.
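For clarity, this is the generic formula for a random effects pooled estimate and its 95% confidence interval; the symbols (d_i for study effect sizes, v_i for within-study variances, tau-hat squared for the estimated between-study variance) are ours for illustration and are not taken from the original report. Because the weights include the between-study variance, this interval is at least as wide as the corresponding fixed effects interval.

```latex
\hat{\mu} = \frac{\sum_i w_i^{*} d_i}{\sum_i w_i^{*}},
\qquad w_i^{*} = \frac{1}{v_i + \hat{\tau}^{2}},
\qquad \mathrm{CI}_{95\%} = \hat{\mu} \pm 1.96 \sqrt{\frac{1}{\sum_i w_i^{*}}}
```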
Second, after excluding the study by Bateman & Fonagy,3 which they regarded as an outlier, Kliem et al reported a fail-safe N of 16, i.e. that 16 unpublished studies with an effect size of 0 would need to be added to the meta-analysis to change the result from significant to non-significant. However, we could not replicate these findings. After excluding the study by Bateman & Fonagy, we found a fail-safe N of 69, which is above 55 (5K + 10), again indicating that the effect is robust. Apparently, Kliem et al erroneously calculated the fail-safe N not according to Rosenthal but according to Orwin's method.4 Consequently, they did not assess how many studies with ES = 0 would have to be included to change the result from significant to non-significant, but how many would be needed to reduce the pooled effect to ‘not significantly different from 0.16’ – an irrelevant result.
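To show that the two statistics answer different questions, here is a minimal sketch of Rosenthal's and Orwin's fail-safe N formulas; the per-study z-scores and study count below are placeholders, not the values from the studies we analysed.

```python
import numpy as np

def rosenthal_failsafe_n(z_values, alpha_z=1.645):
    """Rosenthal's fail-safe N: number of unpublished studies averaging
    z = 0 needed to make the combined one-tailed p exceed .05."""
    z = np.asarray(z_values, float)
    k = len(z)
    return (z.sum() ** 2) / (alpha_z ** 2) - k

def orwin_failsafe_n(mean_d, k, d_criterion):
    """Orwin's fail-safe N: number of studies with d = 0 needed to bring
    the mean effect size down to d_criterion (not down to non-significance)."""
    return k * (mean_d - d_criterion) / d_criterion

# Placeholder values, NOT the data from the meta-analysis
z_per_study = [2.1, 1.8, 2.5, 1.2, 2.9]                 # per-study z-scores
print(rosenthal_failsafe_n(z_per_study))                 # robust if > 5K + 10
print(orwin_failsafe_n(mean_d=0.54, k=10, d_criterion=0.16))
```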
Third, the results of Bayesian meta-analyses depend largely on the specification of prior assumptions about the treatment effect and the between-trial variance. Since Kliem et al did not provide any information about the assumptions underlying their analyses, it is impossible to interpret the presented result reasonably.
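As a sketch of why those assumptions matter, the following grid approximation of a Bayesian random effects model shows how the posterior pooled effect shifts with the choice of priors on the treatment effect and the between-trial variance. The study data and prior scales are placeholders of our own, not values from either analysis.

```python
import numpy as np
from scipy import stats

# Placeholder study data (effect sizes and standard errors),
# NOT the studies from the meta-analysis.
y  = np.array([0.4, 0.7, 0.3, 0.9, 0.5])
se = np.array([0.20, 0.25, 0.15, 0.30, 0.22])

def posterior_mean_mu(prior_mu_sd, prior_tau_scale, grid=200):
    """Grid approximation of a Bayesian random-effects model:
       y_i ~ N(mu, se_i^2 + tau^2),  mu ~ N(0, prior_mu_sd^2),
       tau ~ Half-Normal(prior_tau_scale).
    Returns the posterior mean of the pooled effect mu."""
    mu  = np.linspace(-2, 2, grid)
    tau = np.linspace(0, 2, grid)
    M, T = np.meshgrid(mu, tau, indexing="ij")

    # log prior on (mu, tau)
    logp = stats.norm.logpdf(M, 0, prior_mu_sd) \
         + stats.halfnorm.logpdf(T, scale=prior_tau_scale)
    # marginal log likelihood, summed over studies
    for yi, si in zip(y, se):
        logp += stats.norm.logpdf(yi, M, np.sqrt(si**2 + T**2))

    post = np.exp(logp - logp.max())
    post /= post.sum()
    return np.sum(M * post)

# The pooled estimate shifts with the priors -- which is why they
# must be reported for the result to be interpretable.
print(posterior_mean_mu(prior_mu_sd=10.0, prior_tau_scale=1.0))
print(posterior_mean_mu(prior_mu_sd=0.2,  prior_tau_scale=0.1))
```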
In summary, we could not confirm the discrepancies reported by Kliem et al. We did not find substantial heterogeneity or any cogent indication of publication bias, and the effect in favour of long-term psychodynamic psychotherapy was confirmed as robust. Rather, we could show that most of these ‘discrepancies’ result from differing methodological approaches.