Significant outcomes
∙ Future trials need to address various methodological and clinical considerations based on the ‘participants, interventions, comparisons, outcomes, and study designs (PICOS)’ criteria in order to further advance our understanding of the antidepressant impact of exercise for clinically depressed adults.
Limitations
∙ This is a selective critical review. It is therefore prone to bias, as this type of review ranks only moderately in the evidence hierarchy.
Introduction
A growing body of research provides substantial evidence that exercise is an efficacious treatment modality for mild to moderate depression (1–Reference Craft and Landers7). Previous high-profile reviews (Reference Cooney, Dwan and Greig8,Reference Lawlor and Hopker9), however, have cast doubt on the strength of the supporting evidence, highlighting methodological weaknesses in most of the reviewed randomised controlled trials (RCTs). For example, according to the latest edition of a Cochrane review (Reference Cooney, Dwan and Greig8), ‘analysis of methodologically robust trials only shows a smaller effect in favour of exercise’. A recent critique of the methodological details of this review, however, raised serious doubts about this conclusion owing, among other reasons, to the problematic inclusion and exclusion criteria by which trials were selected for the Cochrane review (Reference Ekkekakis10). It is, thus, conceivable that the antidepressant efficacy of exercise might have been underestimated (Reference Schuch, Vancampfort, Richards, Rosenbaum, Ward and Stubbs5,Reference Ekkekakis10), doing a disservice to patients and highlighting the central importance of sound methodological decisions in yielding accurate estimates of the magnitude of the efficacy of exercise as a treatment for depression (Reference Schuch and de Almeida Fleck11).
In order to advance our understanding of the antidepressant impact of exercise and to help elucidate the reasons for the discrepant conclusions currently found in the literature, we conducted a selective critical review of relevant studies. For this purpose, we drew on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement, focussing specifically on the participants, interventions, comparisons, outcomes, and study designs (PICOS) criteria (Reference Liberati, Altman and Tetzlaff12). Although the PICOS criteria can have a crucial impact on the outcome of any trial, critical assessments of these important factors are scarce in the existing literature on exercise and depression.
Participants
The use of participants with homogeneous symptoms and similar changes in response to exercise would reduce within-group variance and increase the effect size associated with exercise interventions. Depression, however, is a heterogeneous disorder (Reference Parker13,Reference Fried and Nesse14) comprising diverse symptomatology (e.g. psychomotor retardation or agitation, increased or decreased appetite, insomnia or hypersomnia). Hence, patients with similar scores on measures of depression may experience dissimilar symptoms (Reference Parker13,Reference Spanemberg, Caldieraro and Vares15,Reference Caldieraro, Baeza, Pinheiro, Ribeiro, Parker and Fleck16). This heterogeneity in symptoms may reflect differences in underlying neurobiological processes (Reference Parker13,Reference Spanemberg, Caldieraro and Vares15,Reference Caldieraro, Baeza, Pinheiro, Ribeiro, Parker and Fleck16), suggesting that the same exercise prescription may be less effective for some patients and more effective for others. The heterogeneity of depression also involves various biopsychosocial factors. For example, a recent review provided initial evidence that clinical (severity of somatic symptoms), biological [brain-derived neurotrophic factor (BDNF) and tumour necrosis factor-α], psychological (self-esteem and life satisfaction), and social factors (support and marital status) may moderate the antidepressant effects of exercise (Reference Schuch, Dunn, Kanitz, Delevatti and Fleck17). Hence, developing a typology that would allow matching a depression ‘type’ to the most appropriate exercise prescription would both help advance research and facilitate clinical application.
The issue of classification has been the subject of a long, yet inconclusive debate (Reference Fried and Nesse14) and the recent Diagnostic and Statistical Manual of the American Psychiatric Association (18) failed to provide new insights, especially from a neurobiological perspective (Reference Nemeroff19). In light of this uncertainty, researchers may consider the Research Domain Criteria (RDoC) project. The RDoC aims to develop a taxonomy of mental disorders by taking advantage of research advances. To this end, the core of RDoC is a transdiagnostic matrix of functional dimensions, grouped into various domains including, among others, cognitive and reward-related systems. These are studied through seven units of analysis: (1) genes, (2) molecules, (3) cells, (4) neural circuits, (5) physiology, (6) behaviours and (7) self-reports (Reference Insel, Cuthbert and Garvey20). The adoption of the RDoC as a research framework is recommended by the National Institute of Mental Health (Reference Insel, Cuthbert and Garvey20), with clear implications for exercise and mental health research.
Research focussing on identifying patients’ biomarkers and predictors of treatment response to exercise should be assigned high priority. This can be done by exploring moderators of the antidepressant effects of exercise, taking into account the following points (Reference Insel, Cuthbert and Garvey20). First, potential moderators must be chosen a priori, in a hypothesis-driven fashion, guided by theory or clinical experience. In this manner, the risk of ‘data dredging’ effects or spurious moderators is minimised. Second, data analysis should be based primarily on effect sizes instead of p values, as a large sample size may generate ‘statistically significant’ results of little clinical relevance (Reference Kraemer, Wilson, Fairburn and Agras21). Finally, antidepressant efficacy may be gauged by a battery of outcome measures, instead of a single measure, possibly enhancing reliability, reducing measurement error, and improving the odds of identifying important moderators (Reference Wallace, Frank and Kraemer22). Reporting the mean changes in symptoms on the outcome measure will also help gauge the clinical usefulness of interventions compared with control groups.
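The second point can be illustrated with a minimal sketch. It assumes two equal-sized groups and a normal approximation for the test of a standardised mean difference; the function name and all numbers are hypothetical and are not taken from any trial. The same trivially small effect moves from clearly non-significant to ‘statistically significant’ purely as the sample grows.

```python
import math


def two_sample_z_p_value(d: float, n_per_group: int) -> float:
    """Two-sided p value for a standardised mean difference d.

    Assumes two equal-sized groups and a normal approximation,
    so the standard error of d is roughly sqrt(2 / n_per_group).
    Illustrative only.
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(z / math.sqrt(2))


# Hypothetical, clinically negligible effect size
trivial_effect = 0.05

for n in (50, 500, 10_000):
    p = two_sample_z_p_value(trivial_effect, n)
    print(f"n per group = {n:>6}: d = {trivial_effect}, p = {p:.4f}")
```

The effect size is identical in every row; only the p value changes, which is why the effect size is the more informative quantity for judging clinical relevance.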
Interventions
Identifying the optimal ‘dose’ of exercise in terms of frequency, intensity, duration, or overall volume is also crucial in developing effective exercise interventions for depression. Quantifying exercise in terms of energy expenditure is one possible way of defining ‘dose’. For example, doses of 17.5 and 16.5 kcal/kg/week have been found to be effective as monotherapy and add-on therapy for patients with mild-to-moderate (Reference Dunn, Trivedi, Kampert, Clark and Chambliss23) and severe (Reference Schuch, Vasconcelos-Moreno, Borowsky and Fleck24,Reference Schuch, Vasconcelos-Moreno, Borowsky, Zimmermann, Rocha and Fleck25) depression, respectively. Interestingly, intensity in these trials was self-selected, possibly making exercise more suitable to patient preferences and needs.
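To make the unit concrete, the following sketch converts a dose expressed in kcal/kg/week into a weekly energy-expenditure target for an individual. The two doses are those quoted above; the body masses and the function name are hypothetical and chosen only for illustration.

```python
def weekly_energy_target_kcal(dose_kcal_per_kg_week: float, body_mass_kg: float) -> float:
    """Weekly energy-expenditure target implied by a body-mass-scaled dose."""
    return dose_kcal_per_kg_week * body_mass_kg


# Doses quoted in the text; body masses are hypothetical examples
for dose in (17.5, 16.5):
    for body_mass_kg in (60.0, 80.0):
        target = weekly_energy_target_kcal(dose, body_mass_kg)
        print(f"{dose} kcal/kg/week at {body_mass_kg:.0f} kg -> {target:.0f} kcal/week")
```

Because the dose scales with body mass, two patients given the same prescription can have quite different absolute weekly targets.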
The energy-expenditure approach to defining the exercise ‘dose’ may be consistent with public health recommendations for physical activity, but may not be the optimal method of deciphering potential biological mechanisms driving the antidepressant effects of exercise (Reference Schuch, Deslandes, Stubbs, Gosmann, Silva and Fleck26). Exercise prescriptions designed to specifically target putative mechanisms of the antidepressant effects of exercise could potentially achieve even greater benefits. A prescribed amount of exercise could be achieved via different combinations of frequency, duration, and intensity. However, each of these may be related to different mechanisms underlying the antidepressant effects of exercise, possibly resulting in a varying degree of symptom reduction. Some patients may reach the same prescribed amount of exercise via low-intensity and long-duration interventions (e.g. walking for a long distance) or via high intensity for a short duration (e.g. brief periods of running). Given the multitude of possible ways of reaching the same total amount of energy expenditure, identifying the frequency, duration, and intensity that optimally stimulate the biological mechanisms underlying the antidepressant effects of exercise becomes difficult, if not impossible. Instead, dose-response studies that manipulate specific parameters of the exercise dose (i.e. frequency, duration, intensity) specifically targeted for each individual, ideally in a factorial design, should be preferred, despite their high cost.
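The argument above can be sketched as a small factorial grid. The sketch assumes the common convention that 1 MET corresponds to roughly 1 kcal/kg/h of gross energy expenditure; the factor levels (sessions per week, minutes per session, and intensity in METs) are hypothetical and are not drawn from any of the cited trials.

```python
from itertools import product


def weekly_dose_kcal_per_kg(sessions_per_week: int,
                            minutes_per_session: int,
                            intensity_mets: float) -> float:
    """Approximate gross weekly dose, assuming 1 MET ~ 1 kcal/kg/h."""
    return sessions_per_week * (minutes_per_session / 60) * intensity_mets


# Hypothetical factor levels for a 2 x 2 x 2 factorial design
FREQUENCIES = (3, 5)            # sessions per week
DURATIONS_MIN = (30, 60)        # minutes per session
INTENSITIES_METS = (3.5, 7.0)   # assumed lower- and higher-intensity levels

# One trial arm per combination of factor levels
arms = [
    (freq, dur, mets, weekly_dose_kcal_per_kg(freq, dur, mets))
    for freq, dur, mets in product(FREQUENCIES, DURATIONS_MIN, INTENSITIES_METS)
]

for freq, dur, mets, dose in arms:
    print(f"{freq} x {dur} min at {mets} METs -> {dose:.2f} kcal/kg/week")
```

Under these assumptions, five 30-min sessions at 7.0 METs and five 60-min sessions at 3.5 METs yield the same weekly dose, so a trial defined only by energy expenditure could not tell these two arms apart, whereas a factorial design keeps frequency, duration, and intensity separately identifiable.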
Gaining a better understanding of the neurobiological processes underlying the antidepressant effects of exercise (Reference Schuch, Deslandes, Stubbs, Gosmann, Silva and Fleck26) will be crucial in designing more effective exercise interventions. For this purpose, hormonal, oxidative-stress, and neurogenetic pathways should be considered (Reference Schuch, Deslandes, Stubbs, Gosmann, Silva and Fleck26). The upregulation of BDNF (Reference Deslandes, Moraes and Ferreira27) may warrant particular attention. BDNF promotes the process of neurogenesis. Acute bouts of exercise have been found to increase the serum levels of BDNF in psychiatric patients (Reference Schuch, Deslandes, Stubbs, Gosmann, Silva and Fleck26,Reference Broocks, Ahrendt and Sommer28–Reference Schuch, da Silveira and de Zeni32). Both exercise intensity and volume play a significant role in the magnitude of the exercise-induced effect on BDNF (Reference Schmolesky, Webb and Hansen33). It should be noted, however, that the BDNF response to long-term exercise in people with major depression is presently less clear (Reference Schuch, Deslandes, Stubbs, Gosmann, Silva and Fleck26,Reference Schuch, Vasconcelos-Moreno and Borowsky34–Reference Toups, Greer and Kurian36).
Comparisons
Similar to other non-pharmacological treatments (Reference Donovan, Kwekkeboom, Rosenzweig and Ward37,Reference Weimer, Colloca and Enck38), it has not been possible to disentangle the relative contribution of the effect of exercise on depression versus other non-exercise-specific factors, which can also play a role. For example, in most cases, exercising involves social interaction, which may have its own antidepressant effects, independent of exercise (Reference Stathopoulou, Powers, Berry, Smits and Otto39). However, social support has not been shown to be a strong predictor of subsequent depression (Reference Morres, Van de Vliet, Knapen and VanCoppenolle40) or contributor to the antidepressant effects of exercise (Reference Callaghan, Khalil, Morres and Carter41). Nevertheless, trial arms must fully balance social influences to rule out any confounding impact on between-group comparisons and safeguard internal validity.
A common trend in RCTs investigating the antidepressant effects of exercise is to employ ‘control’ groups that also receive exercise, albeit of a different modality and/or intensity than that administered to the ‘treatment’ groups (e.g. aerobic exercise is compared with strength and flexibility training). The rationale typically offered for employing such groups as controls is that the current literature does not contain descriptions of specific mechanistic pathways by which these alternate exercise modalities might influence depression. Rather than being inert, however, alternate modalities of exercise have consistently been associated with statistically reliable and clinically meaningful reductions in depressive symptoms (Reference Stubbs, Vancampfort and Rosenbaum42). As all modalities of exercise share many common biological features (e.g. repeated cycles of muscular contraction and relaxation, some degree of stimulation of the cardiovascular system), the employment of comparators also engaged in a form of exercise as ‘control’ may introduce a critical confound in exercise studies for depression.
As one example, based on the results of an RCT comparing aerobic exercise, strength exercise, and a ‘control’ arm engaged in ‘relaxation,’ the authors stated: ‘Our findings do not support a biologically mediated effect of exercise on symptom severity in depressed patients’ (Reference Krogh, Saltin, Gluud and Nordentoft43). This conclusion was intended to reflect the lack of significant differences in post-intervention depression scores between the three groups. All groups, however, including the ‘control’ group that was described as engaged in ‘relaxation,’ exhibited meaningful improvements in physical fitness (aerobic capacity and strength) and, accordingly, also showed substantially reduced post-intervention depressive symptomatology. Specifically, from pre- to post-intervention, the aerobic exercise group exhibited a standardised mean difference (SMD) of −1.15 (95% CI −1.56 to −0.75), the strength intervention showed an SMD of −1.57 (95% CI −2.00 to −1.14), and the ‘relaxation’ group had an SMD of −1.27 (95% CI −1.68 to −0.86). Although the provision of a fitness-enhancing exercise intervention to the group designated as ‘control’ arguably turns the aforementioned conclusion from this trial on its head, this crucial confound received only a brief and rather inconspicuous acknowledgement in the discussion: ‘Limitations include confounding due to a possible antidepressant effect of the intervention used in our control group (relaxation training)’ (Reference Stathopoulou, Powers, Berry, Smits and Otto39).
In another RCT that compared a group engaged in aerobic exercise with a ‘control’ group engaged in a treatment described as ‘stretching exercise,’ the researchers concluded: ‘our trial data do not support any effect of aerobic exercise on depressive symptoms’ (Reference Krogh, Videbech, Thomsen, Gluud and Nordentoft44). In actuality, however, besides 20 min of stretching, participants in the ‘control’ group also engaged in 25 min of low-intensity aerobic exercise, including 10 min of ‘warm-up on a stationary bike’ and 15 min of ‘throwing and catching balls.’ As in the previous trial, it is noteworthy that both groups exhibited large reductions in depressive symptoms (aerobic, SMD −1.37, 95% CI −1.78 to −0.96; stretching, SMD −1.51, 95% CI −1.92 to −1.10), pointing to an entirely different (positive) picture of the antidepressant effect of exercise from the one presented by the trialists.
To avoid misleading conclusions, trialists, as well as authors, reviewers, editors, and readers engaged in critical appraisal, should consider the magnitude of changes in depressive symptoms in both the experimental and control groups, especially when the control groups also engaged in a type of exercise. The necessity of scrutinising the nature of treatments provided to groups described as ‘controls’ is also underscored by a recent meta-analysis of exercise trials for depression (Reference Stubbs, Vancampfort and Rosenbaum42). This study found uncommonly large reductions in depressive symptoms in ‘control’ groups receiving alternate exercise treatments (e.g. stretching) (Reference Stubbs, Vancampfort and Rosenbaum42). These reductions were twice as large as those experienced by control groups in trials of antidepressant medication.
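A reader appraising such results can run a few quick checks on the reported intervals. The sketch below takes the SMDs and 95% CIs quoted above for this trial and, assuming the intervals are symmetric normal-approximation intervals (an assumption, not a statement about how the estimates were actually derived), recovers the implied standard errors and checks whether the two within-group intervals overlap and exclude zero. The helper names are hypothetical.

```python
Z_95 = 1.96  # normal quantile assumed to underlie the reported 95% CIs

# (SMD, lower bound, upper bound) as quoted in the text for this trial
REPORTED = {
    "aerobic": (-1.37, -1.78, -0.96),
    "stretching": (-1.51, -1.92, -1.10),
}


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error implied by a symmetric normal-approximation CI."""
    return (upper - lower) / (2 * z)


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if two closed intervals (lower, upper) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for group, (smd, lower, upper) in REPORTED.items():
    midpoint = (lower + upper) / 2
    print(f"{group}: SMD {smd}, CI midpoint {midpoint:.2f}, "
          f"implied SE {se_from_ci(lower, upper):.3f}, "
          f"excludes zero: {upper < 0}")

overlap = intervals_overlap(REPORTED["aerobic"][1:], REPORTED["stretching"][1:])
print(f"Within-group intervals overlap: {overlap}")
```

Both intervals lie entirely below zero and overlap one another, which is consistent with the reading that the two arms improved to a similar, large degree rather than that exercise had no effect.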
Outcomes
Widely used outcome measures, such as the clinician-administered Hamilton Rating Scale for Depression (Reference Hamilton45), have been known to present various psychometric problems, including lack of unidimensionality and poor ability to detect changes among individuals with mild to moderate depressive symptoms (Reference Isacsson and Adler46,Reference Salum, Manfro and Fleck47). These problems may be particularly pertinent to studies investigating the antidepressant effects of exercise because exercise is a treatment that is specifically recommended for individuals with mild to moderate symptom severity. In the latest edition of the Cochrane review on exercise for depression (Reference Cooney, Dwan and Greig8), >90% of the reviewed trials included patients with mild to moderate levels of depressive symptoms. Hence, the Hamilton Rating Scale may not accurately reflect the magnitude of the antidepressant effect of exercise. Instead, recently developed scales based on Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) criteria, such as the Inventory of Depressive Symptomatology, may more accurately reflect changes across all levels of symptom severity (Reference Helmreich, Wagner and Mergl48). Thus, their use in future trials is strongly recommended.
Study design
RCTs are widely considered the ‘gold standard’ for establishing the causal effect of a given therapy on a certain clinical outcome. Establishing the causal effect, however, is conditional upon the degree of bias inherent in the experimental design or the extent to which the trials meet important criteria of methodological quality (Reference Akobeng49,Reference Moher, Schulz, Altman and Group50). Given that risk of bias has been shown to be inversely associated with the magnitude of intervention effects (Reference Wood, Egger and Gluud51), trials with a methodologically robust design are essential. Methodological criteria commonly considered hallmarks of a robust design include the generation of truly random allocation sequences, the successful concealment of group allocation, the blinding of outcome assessors, intention-to-treat analyses, the complete reporting of point estimates and variability indices on the primary outcome measures, and between-group baseline balance in the most important prognostic indicators including the primary outcome. Blinding of participants and treatment providers is also considered a crucial methodological criterion. However, the nature of an exercise intervention makes it impossible to blind either the participants or the individuals administering the intervention, and this is a particular challenge in the control arms of trials. Collecting and reporting data on drop-out rates, side effects, other adverse events, and the number needed to treat can also offer valuable insight into the acceptability of exercise as an intervention strategy. Finally, follow-up assessments can provide much-needed information about the enduring effects of exercise interventions.
Over the past 30 years, the methodological quality of RCTs investigating the antidepressant effects of exercise has improved, as reflected by improved adherence to reporting standards (Reference Perraton, Kumar and Machotka52–Reference Morres, Stathi, Martinsen and Sørensen54). Recent RCTs, in particular, tend to be of high methodological quality by commonly employed criteria (Reference Schuch, Vasconcelos-Moreno, Borowsky, Zimmermann, Rocha and Fleck25,Reference Hallgren, Kraepelien and Öjehagen55) and sample sizes have increased substantially (Reference Akobeng49). Pragmatic RCTs, however, are scarce (Reference Callaghan, Khalil, Morres and Carter41). Pragmatic RCTs are characterised by high external validity (outcome generalisability) by virtue of methodological features that are more closely aligned with ‘real life’ practice norms, such as interventions delivered in routine practice and inclusive (non-restrictive) eligibility criteria for participation (Reference Hotopf56). Thus, pragmatic RCTs can be invaluable in facilitating the translation of clinical trial results to routine practice. This is of major importance given that practitioners face a series of challenges in treating depressed patients, primarily limited time and resources. Thus, we encourage researchers to conduct pragmatic RCTs. Given that a number of clinicians report a lack of confidence in designing appropriate exercise prescriptions as a result of the aforementioned challenges, and that supervised trials present a lower drop-out rate (Reference Stubbs, Vancampfort and Rosenbaum57), it is noteworthy that a pragmatic trial involving supervised exercise of preferred (self-selected) intensity yielded promising results in terms of safety, compliance, and depressive symptom reductions (Reference Callaghan, Khalil, Morres and Carter41).
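As a minimal sketch of two of the acceptability indicators mentioned above, the number needed to treat can be computed as the reciprocal of the absolute difference in response rates between arms, rounded up by convention, and the drop-out rate as the proportion of randomised participants lost. The counts below are hypothetical and the function names are illustrative; exact fractions are used to avoid floating-point rounding before the final round-up.

```python
import math
from fractions import Fraction


def number_needed_to_treat(responders_treated: int, n_treated: int,
                           responders_control: int, n_control: int) -> int:
    """NNT = 1 / absolute risk reduction, rounded up to a whole patient."""
    arr = Fraction(responders_treated, n_treated) - Fraction(responders_control, n_control)
    if arr <= 0:
        raise ValueError("No absolute benefit in the treated arm; NNT is undefined here.")
    return math.ceil(1 / arr)


def dropout_rate(dropouts: int, n_randomised: int) -> float:
    """Proportion of randomised participants who did not complete the trial."""
    return dropouts / n_randomised


# Hypothetical trial: 30/60 responders with exercise versus 18/60 in the control arm
nnt = number_needed_to_treat(30, 60, 18, 60)
print(f"Number needed to treat: {nnt}")
print(f"Drop-out rate (hypothetical 9 of 60): {dropout_rate(9, 60):.0%}")
```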
Conclusion
Specifying and advancing the current understanding of the antidepressant effects of exercise in relation to the PICOS criteria is an important clinical topic with broad implications. The factors associated with the PICOS criteria that were reviewed here should be considered not only when designing future trials that aim to build on previous trial outcomes but also when critically appraising published RCTs. In particular, with reference to participants, researchers should consider the RDoC (not only the DSM) as a diagnostic framework and include moderator analyses. Regarding the design of exercise interventions, the relationships between potential biological mediators (e.g. BDNF) and components of the exercise dose (e.g. volume or intensity) should be considered. With regard to control or comparison groups, while balancing non-specific effects (e.g. social interaction) is essential in safeguarding internal validity, the use of alternate-modality exercise interventions as ‘controls’ creates serious confounds and should be avoided, especially as alternate exercise modalities bring about antidepressant effects. In evaluating the results of trials employing exercising groups as ‘controls,’ trialists and critical readers are urged to assess the clinical meaningfulness of pre-to-post changes in depressive symptoms within these groups in addition to between-group comparisons post-intervention. With respect to outcomes, psychometric scales that can accurately reflect changes at mild and moderate levels of depressive symptomatology should be preferred, as these are likely to be the most prevalent levels of symptom severity in recruited samples. Lastly, concerning study designs, researchers are encouraged to conduct more pragmatic (effectiveness) RCTs, to pave the way for the development of more practical and scalable exercise interventions that can be embedded in routine practice.
Acknowledgements
The authors would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and the Exercise Psychology and Quality of Life Laboratory, Department of Physical Education and Sport Science, University of Thessaly, Trikala, Greece.
Authors’ Contribution
F.B.S., I.D.M., P.E., S.R., and B.S. contributed to the manuscript design and writing. All authors agreed with the final version of the manuscript.
Financial Support
The authors received no specific funding for this review from any agency, commercial entity, or not-for-profit organisation.
Conflicts of Interest
The authors declare no potential conflict of interest.