
Equivalence and non-inferiority testing in psychotherapy research

Published online by Cambridge University Press:  11 May 2018

Falk Leichsenring*
Affiliation:
Department of Psychosomatics and Psychotherapy, Justus-Liebig-University Giessen, Ludwigstr 76, D-35392 Giessen, Germany
Allan Abbass
Affiliation:
Department of Psychiatry, Dalhousie University, Centre for Emotions and Health, Halifax 8203-5909 Veterans Memorial Lane, Halifax, NS B3H 2E2, Canada
Ellen Driessen
Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Van der Boechorststraat 1, 1081 BT Amsterdam, The Netherlands
Mark Hilsenroth
Affiliation:
Derner School of Psychology, Adelphi University, Hy Weinberg Center, 1 South Avenue, Garden City, NY 11530-0701, USA
Patrick Luyten
Affiliation:
Faculty of Psychology and Educational Sciences, University of Leuven, Klinische Psychologie (OE), Tiensestraat 102 – bus 3722, 3000 Leuven, Belgium Research Department of Clinical, Educational and Health Psychology, University College London, Gower Street, London WC1E 6BT, UK
Sven Rabung
Affiliation:
Department of Psychology, Alpen-Adria-Universität Klagenfurt, Universitätsstr. 65-67, A-9020 Klagenfurt, Austria
Christiane Steinert
Affiliation:
Department of Psychosomatics and Psychotherapy, Justus-Liebig-University Giessen, Ludwigstr 76, D-35392 Giessen, Germany Department of Psychology, MSB Medical School Berlin, Calandrellistr. 1-9, 12247 Berlin, Germany
*
Author for correspondence: Falk Leichsenring, E-mail: [email protected]

Type: Correspondence

Copyright © Cambridge University Press 2018

With more than 100 non-inferiority or equivalence trials published per year across many areas of research (Piaggio et al., 2012), the statistical and methodological issues involved in these trials are becoming increasingly important. A recent article by Rief and Hofmann (2018) suggests, however, that some of these issues are not sufficiently clear. For this reason, central issues will be discussed here and some misunderstandings will be addressed.

Equivalence and non-inferiority margins

For defining a non-inferiority or equivalence margin (i.e. the smallest difference important enough to make treatments non-equivalent), no generally accepted standards exist. Across 332 equivalence or non-inferiority medical trials, a median margin of 0.50 standard deviations was found (Lange and Freitag, 2005), corresponding quite well to the value of 0.42 reported by Gladstone and Vach (2014). Only five studies used margins < 0.25 (Gladstone and Vach, 2014), and only 12% of studies used margins ⩽ 0.25 (Lange and Freitag, 2005).

In psychotherapy research, margins ranging from 0.24 to 0.60 have been proposed (e.g. Steinert et al., 2017, p. 944). In a meta-analysis of psychodynamic therapy (PDT) including different mental disorders, Steinert et al. (2017) chose a margin of g = 0.25, which is among the smallest margins ever used in psychotherapy and medical research (Gladstone and Vach, 2014, Figure 2; Steinert et al., 2017, p. 944). This margin is very close to both (a) the threshold for a minimally important difference specifically suggested for depression (0.24, Cuijpers et al., 2014), and (b) the margin recommended by Gladstone and Vach (2014) to protect against degradation of treatment effects in non-inferiority trials (d = −0.23).

In their recent correspondence article, Rief and Hofmann (2018) make a quite different proposal, recommending that margins not fall below 90% of the uncontrolled effect size of the established treatment. This proposal, however, is associated with several problems, described in more detail in Table 1, particularly regarding the clinical significance of the suggested margin and its implications for sample size determination, which would render non-inferiority trials in psychotherapy research virtually impossible (Table 1).
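The sample-size implications can be illustrated with a standard normal-approximation formula for a non-inferiority trial on a standardized (Cohen's d) effect-size scale, assuming a true difference of zero between treatments. The sketch below is a minimal illustration, not taken from any of the cited papers; the function name and the default values for alpha and power are our assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(margin, alpha=0.025, power=0.80):
    """Approximate per-group sample size for a non-inferiority trial.

    Uses n = 2 * (z_{1-alpha} + z_{power})^2 / margin^2, the usual
    normal approximation with the margin expressed in standard
    deviation units and a true between-group difference of zero.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided test level
    z_power = NormalDist().inv_cdf(power)      # desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / margin ** 2)


# A margin of 0.25 SD needs roughly 250 patients per group;
# halving the margin roughly quadruples the required sample size.
n_025 = n_per_group(0.25)
n_010 = n_per_group(0.10)
```

Under these assumptions, a margin of 0.25 requires about 252 patients per group, while a margin of 0.10 requires about 1570 per group, which illustrates why margins far smaller than those customary in the field quickly make trials impractical.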

Table 1. Further methodological issues of equivalence and non-inferiority testing

a Paul Crits-Christoph, personal communication, 16 February 2018.

b Paul Crits-Christoph, personal communication, 26 February 2018.

Statistical hypotheses in equivalence and non-inferiority testing

In equivalence testing, the null and alternative hypotheses of superiority testing are reversed, and the statistical alternative hypothesis is consistent with the assumption of equivalence (Lesaffre, 2008; Walker and Nowacki, 2011). To test for equivalence, two one-sided tests are performed, determining whether both the upper and the lower boundary of the confidence interval (CI) lie within the margin, whereas for testing non-inferiority a single one-sided test inspecting the lower boundary is used (Lesaffre, 2008; Walker and Nowacki, 2011). A statistically significant result here implies that the effect size and its CI fall within the margin, demonstrating equivalence or non-inferiority (Walker and Nowacki, 2011). A recent meta-analysis testing the equivalence of PDT to other approaches established in efficacy reported a significant result, indicating that the effect sizes and their CIs were completely included in the margin (Steinert et al., 2017). Thus, the interpretation recently given by Rief and Hofmann (2018, p. 2) that Steinert et al. (2017) '… found a significant disadvantage of PDT [psychodynamic therapy] compared with other treatments (including CBT)' is simply wrong (Lesaffre, 2008; Walker and Nowacki, 2011).
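The two-one-sided-tests (TOST) logic described above can be sketched as follows. This is a minimal illustration with made-up summary statistics: the effect-size difference, standard error, and margin below are hypothetical values, not those of any cited study.

```python
from statistics import NormalDist


def tost(diff, se, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence on a symmetric margin.

    Performing each one-sided test at level alpha is equivalent to
    checking whether the (1 - 2*alpha) confidence interval for the
    difference lies entirely inside (-margin, +margin).
    """
    z = NormalDist().inv_cdf(1 - alpha)           # ~1.645 for alpha = 0.05
    lower = diff - z * se                         # lower bound of 90% CI
    upper = diff + z * se                         # upper bound of 90% CI
    equivalent = -margin < lower and upper < margin  # both bounds inside
    non_inferior = -margin < lower                # only the lower bound matters
    return lower, upper, equivalent, non_inferior


# Hypothetical numbers: difference g = -0.15 with SE = 0.04, margin 0.25.
lo, hi, eq, ni = tost(-0.15, 0.04, 0.25)
```

A 'significant' result in this framework therefore means that the CI falls inside the margin: with these hypothetical numbers the 90% CI is roughly (−0.22, −0.08), so both equivalence and non-inferiority would be concluded, even though the point estimate is not exactly zero.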

Equivalence v. non-inferiority testing

Equivalence and non-inferiority testing need to be differentiated (Treadwell et al., 2012). In non-inferiority testing, for example, the test treatment is expected to be superior to the standard treatment on measures not related to efficacy, such as side effects or costs (Treadwell et al., 2012). Rief and Hofmann did not make this differentiation. In fact, the meta-analysis by Steinert et al. (2017) was a test of equivalence, not of non-inferiority as suggested by Rief and Hofmann (2018).

Assay sensitivity and constancy of study conditions

Equivalence and non-inferiority testing require that the efficacy of the comparator is ensured and that the study conditions are comparable with those in which the efficacy of the comparator was originally established (Treadwell et al., 2012). In this context, Rief and Hofmann (2018) claim that specific issues of (low) study quality favour non-inferiority results, e.g. low response rates found in specific studies or low treatment integrity. Again, however, these claims are not supported by evidence (Table 1). The same applies to several further issues put forward by Rief and Hofmann (2018), which are briefly discussed in Table 1, for example the relationship between equivalence testing and the number of studies available for a specific treatment.

Conclusions

Equivalence and non-inferiority testing pose specific methodological problems (Piaggio et al., 2012; Treadwell et al., 2012), for example in defining a margin, in statistical testing, and in ensuring the efficacy of the comparator and the comparability of study conditions (Table 1). We have presented conclusions about equivalence and non-inferiority testing that differ from those of Rief and Hofmann (2018) and that are more consistent with the available evidence and with usual standards across a range of scientific disciplines.

References

Connolly Gibbons, MB et al. (2016) Comparative effectiveness of cognitive therapy and dynamic psychotherapy for major depressive disorder in a community mental health setting: a randomized clinical noninferiority trial. JAMA Psychiatry 9, 904–911.
Cuijpers, P et al. (2014) What is the threshold for a clinically relevant effect? The case of major depressive disorders. Depression and Anxiety 31, 374–378.
Cuijpers, P et al. (2016) How effective are cognitive behavior therapies for major depression and anxiety disorders? A meta-analytic update of the evidence. World Psychiatry 15, 245–258.
Driessen, E et al. (2013) The efficacy of cognitive-behavioural therapy and psychodynamic therapy in the outpatient treatment of major depression: a randomized clinical trial. American Journal of Psychiatry 170, 1041–1050.
Gladstone, BP and Vach, W (2014) Choice of non-inferiority (NI) margins does not protect against degradation of treatment effects on average – an observational study of registered and published NI trials. PLoS ONE 9, e103616.
Lange, S and Freitag, G (2005) Therapeutic equivalence – clinical issues and statistical methodology in noninferiority trials: choice of delta – requirements and reality – results of a systematic review. Biometrical Journal 47, 12–27.
Lesaffre, E (2008) Superiority, equivalence, and non-inferiority trials. Bulletin of the NYU Hospital for Joint Diseases 66, 150–154.
McGlothlin, AE and Lewis, RJ (2014) Minimal clinically important difference: defining what really matters to patients. JAMA 312, 1342–1343.
Munder, T et al. (2013) Researcher allegiance in psychotherapy outcome research: an overview of reviews. Clinical Psychology Review 33, 501–511.
Persons, JB, Bostrom, A and Bertagnolli, A (1999) Results of randomized controlled trials of cognitive therapy for depression generalize to private practice. Cognitive Therapy and Research 23, 535–548.
Piaggio, G et al. (2012) Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 308, 2594–2604.
Rief, W and Hofmann, SG (2018) Some problems with non-inferiority tests in psychotherapy research: psychodynamic therapies as an example. Psychological Medicine, 1–3.
Steinert, C et al. (2017) Psychodynamic therapy: as efficacious as other empirically supported treatments? A meta-analysis testing equivalence of outcomes. American Journal of Psychiatry 174, 943–953.
Thoma, NC et al. (2012) A quality-based review of randomized controlled trials of cognitive-behavioral therapy for depression: an assessment and metaregression. American Journal of Psychiatry 169, 22–30.
Treadwell, JR et al. (2012) Assessing equivalence and noninferiority. Journal of Clinical Epidemiology 65, 1144–1149.
Walker, E and Nowacki, AS (2011) Understanding equivalence and noninferiority testing. Journal of General Internal Medicine 26, 192–196.
Webb, CA, deRubeis, RJ and Barber, J (2010) Therapist adherence/competence and treatment outcome: a meta-analytic review. Journal of Consulting and Clinical Psychology 78, 200–211.