Background
Depression is the leading cause of disability worldwide.1 After a first episode of depression, approximately half of patients will experience a relapse or recurrence (re-emergence of depressive symptoms after an initial improvement),Reference Beshai, Dobson, Bockting and Quigley2 and most do so within the first 6 months.Reference Ali, Rhodes, Moreea, McMillan, Gilbody and Leach3 Those who experience a relapse or recurrence are more likely to relapse again in the future than those who do not.Reference Burcusa and Iacono4 There is evidence to suggest that relapse or recurrence of depression results in an increased risk of subsequent relapseReference Burcusa and Iacono4 and, possibly, increased treatment resistance.Reference Post5 Reliable prediction of individuals’ risk of relapse and recurrence might enable a precision medicine approach to relapse prevention, personalising the allocation and potentially the type of relapse prevention interventions offered to ensure maximum benefit. Prognostic factors are variables that are associated with an outcome of interest, although they are not necessarily causal, and overall prognosis can be estimated within groups defined by the values of a prognostic factor. These are differentiated from prescriptive factors, which are associated with outcomes and also moderate treatment effects.
Prognostic factors associated with relapse and recurrence include childhood maltreatment, history of recurrent depression and presence of residual depressive symptoms, among others, whereas evidence for prescriptive factors remains limited.Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6 Multivariable prognostic models combine information about multiple prognostic factors for a particular person to provide individualised risk predictions.Reference Riley, van der Windt, Croft and Moons7 There have been an increasing number of attempts to derive and validate prognostic models to predict depression-related outcomes.Reference Bone, Simmonds-Buckley, Thwaites, Sandford, Merzhvynska and Rubel8–11 There has been no previous systematic review to identify all prognostic models designed to predict relapse or recurrence of depression.
Objectives
To identify and critically appraise prognostic model development and validation studies aimed at predicting relapse, recurrence, sustained remission or recovery in adults with major depressive disorder who meet the criteria for remission or recovery. In addition, we planned to summarise and meta-analyse their predictive performance, to describe the characteristics of the models identified, and to review the clinical utility (net benefit) of the identified models, where possible.
Method
The protocol was preregistered in the Cochrane Database of Systematic Reviews (CD013491)Reference Moriarty, Meader, Gilbody, Chew-Graham, Churchill and Ali12,Reference Moriarty, Meader, Snell, Riley, Paton and Chew-Graham13 and this review is reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline.Reference Page, McKenzie, Bossuyt, Boutron, Hoffmann and Mulrow14
Eligibility criteria
We specified the following inclusion criteria (see the Appendix for PICOTS criteria):Reference Debray, Damen, Snell, Ensor, Hooft and Reitsma15
(a) adult population (18 years and over) with major depressive disorder (defined using validated diagnostic criteria) who met criteria for remission or recovery (i.e. no longer meeting diagnostic criteria for major depressive episode) at the point of prediction;
(b) any setting (primary, secondary, or community care);
(c) all multivariable prognostic models developed to predict individual risk of relapse, recurrence, sustained remission, or recovery of depression over any time period.
Remission and recovery are terms used to describe an improvement in depressive symptoms; remission meaning improved but still ‘in episode’ and recovery being the resolution of the underlying episode (usually after 6 to 12 months of remission).Reference Bockting, Hollon, Jarrett, Kuyken and Dobson16 Relapse occurs following some level of remission but precedes recovery, whereas recurrence is the onset of a new episode of depression following recovery.Reference Frank, Prien, Jarrett, Keller, Kupfer and Lavori17,Reference Rush, Kraemer, Sackeim, Fava, Trivedi and Frank18 Sustained remission can be thought of as the inverse, or opposite of relapse; and recovery as the inverse of recurrence. Both of these hold potentially valuable prognostic information pertinent to relapse risk prediction models in depression, and are therefore included as outcomes in this review. The precise temporal cut-offs of these terms have not been robustly validated empirically and are inconsistently operationalised in the literature.Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6 For this reason, we accepted all definitions of these terms, as operationalised by the authors of the primary studies.
We included all three types of prognostic model study:
(a) development studies with internal validation (which derive a model for individualised prediction and quantify predictive performance in the development data-set);
(b) development with external validation (which develop a model and then quantify the performance in data external to the development set); and
(c) external validation only (which attempt to externally validate an existing model).Reference Wolff, Moons, Riley, Whiting, Westwood and Collins19
External validation did not include randomly splitting the development data-set to produce two separate data-sets (an approach more appropriately considered an inefficient form of internal validation),Reference Riley, van der Windt, Croft and Moons7 but did include studies where a validation data-set was produced by a non-random split, for example, participants from the same institution but at different time points (temporal validation) or by location (geographical validation).Reference Collins, De Groot, Dutton, Omar, Shanyinde and Tajar20
We excluded models developed in populations with comorbid severe mental illness (for example, schizophrenia and bipolar affective disorder), as these patients typically receive more intensive psychiatric input and the results would be less generalisable. We excluded studies where the intention was not to provide individualised risk predictions (for example those aimed at quantifying the adjusted effects of prognostic factors).
Information sources and search strategy
We searched the Cochrane Library (current issue); Ovid MEDLINE (1946 onwards); Ovid Embase (1980 onwards); and Ovid PsycINFO (1806 onwards) up to May 2021, using relevant subject headings (controlled vocabularies) and search syntax appropriate to each resource. We also searched several grey literature resources, primarily for dissertations and theses (Open Grey (www.opengrey.eu); ProQuest Dissertations & Theses Global (www.proquest.com/products-services/pqdtglobal.html); DART-Europe E-theses Portal (www.dart-europe.eu); EThOS, the British Library's e-theses online service (ethos.bl.uk); Open Access Theses and Dissertations (oatd.org)), also up to May 2021. We applied no restrictions by date, language or publication status. We checked the reference lists of all included articles and conducted forward citation searches on the Web of Science (12 March 2021 and 19 May 2021) to identify additional studies missed by the original electronic searches (for example unpublished or in-press citations). We contacted corresponding authors for information on unpublished or ongoing studies.
Selection of studies
Two review authors (A.S.M. and N.M.) independently reviewed the titles and abstracts of studies identified by the search strategy. We excluded prognostic model studies that clearly did not meet our inclusion criteria at the title and abstract screening stage. For any studies where there was uncertainty, we undertook a full-text review. We resolved disagreement in judgements through discussion or, if necessary, by referral to a third review author (K.I.E.S. or D.M.).
Data collection
Two independent review authors (A.S.M. and N.M.) conducted the data extraction, commencing 1 September 2020. The Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), which has been specifically designed for systematic reviews of prognostic models, was used to guide data extraction.Reference Debray, Damen, Snell, Ensor, Hooft and Reitsma15 This included the following measures of predictive performance, where available:
- calibration, which measures the extent to which risk predictions and observed outcomes are in agreement (measures extracted included calibration slope, ratio of observed (O) to expected (E) events (O:E ratio), calibration plots); and
- discrimination, the model's ability to separate patients who develop the outcome of interest from those who do not (usually measured using the concordance (C)-statistic or area under the receiver operating characteristic curve (AUC)).
Where these measures were not available directly, we planned to calculate them from other information available with reference to recent guidance.Reference Debray, Damen, Riley, Snell, Reitsma and Hooft21 We also planned to extract information on clinical utility, where available. Clinical utility is important to consider when a model's predicted risks are to be used to inform decision-making. It can be measured by the net benefit at a particular risk threshold, and by plotting decision curves of the net benefit across a range of relevant thresholds.Reference Vickers, Van Calster and Steyerberg22
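The performance measures described above can be illustrated with a minimal sketch for a binary outcome. This is not the analysis code of any included study: the function names are hypothetical, and the net benefit expression assumes the standard decision-curve formulation (true positives minus false positives weighted by the odds of the risk threshold, per participant).

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half. For a binary outcome
    this equals the AUC."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


def oe_ratio(y_true, y_prob):
    """Observed events divided by expected events (sum of predicted risks).
    A value of 1 indicates agreement on average (calibration-in-the-large)."""
    return sum(y_true) / sum(y_prob)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above
    the threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical outcomes (1 = relapse) and predicted risks, for illustration only.
    outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
    risks = [0.8, 0.3, 0.6, 0.4, 0.2, 0.35, 0.1, 0.5]
    print("C-statistic:", round(c_statistic(outcomes, risks), 3))
    print("O:E ratio:", round(oe_ratio(outcomes, risks), 3))
    print("Net benefit at 0.3:", round(net_benefit(outcomes, risks, 0.3), 3))
```

Evaluating `net_benefit` over a grid of thresholds and plotting the results would give a decision curve. Time-to-event models would need censoring-aware versions of these measures (for example Harrell's C-statistic), which this sketch does not cover.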
Data synthesis and meta-analysis approaches
If a sufficient number of external validation studies were identified for a particular model, we planned to conduct random-effects meta-analyses to summarise the performance of prognostic models, as the data were likely to be highly heterogeneous. In the absence of sufficient data for a meta-analysis, we have used a narrative synthesis instead.
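As a rough sketch of what such a random-effects meta-analysis could look like, the example below pools C-statistics on the logit scale using the DerSimonian-Laird estimator. The estimator is an assumption made for simplicity (other estimators, such as restricted maximum likelihood, are available), and the function names and input values are hypothetical rather than taken from the included studies.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def logit_se_from_ci(lower, upper, z=1.96):
    """Approximate standard error on the logit scale from a 95% CI
    reported on the original (0-1) scale."""
    return (logit(upper) - logit(lower)) / (2 * z)


def dersimonian_laird(estimates, variances):
    """Random-effects pooling; returns (pooled estimate, its SE, tau^2)."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures the observed between-study dispersion.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical external validations of one model: (C-statistic, 95% CI).
    studies = [(0.62, 0.57, 0.67), (0.70, 0.66, 0.74), (0.66, 0.60, 0.72)]
    est = [logit(c) for c, _, _ in studies]
    var = [logit_se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    pooled, se, tau2 = dersimonian_laird(est, var)
    print("Pooled C-statistic:", round(inv_logit(pooled), 3))
    print("95% CI:", round(inv_logit(pooled - 1.96 * se), 3),
          "to", round(inv_logit(pooled + 1.96 * se), 3))
    print("tau^2 (logit scale):", round(tau2, 4))
```

Pooling on the logit scale keeps the back-transformed interval within 0 and 1; the same function could be applied to log O:E ratios where calibration is reported.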
Risk of bias assessment in included studies
Two independent review authors (A.S.M. and N.M.) assessed risk of bias (ROB) using the Prediction model risk of bias assessment tool (PROBAST), which assesses ROB (low, high or unclear) over four domains (participants, predictors, outcomes and analysis) and applicability (concerns about applicability; also low, high, or unclear) in the first three of the domains.Reference Riley, van der Windt, Croft and Moons7,Reference Wolff, Moons, Riley, Whiting, Westwood and Collins19,Reference Moons, Wolff, Riley, Whiting, Westwood and Collins23
For the ‘Analysis’ domain, when determining whether an appropriate sample size was used, we adhered to PROBAST recommendations, which use the rule of thumb using events per candidate predictor parameter (EPP). The PROBAST guidance suggests an EPP of 20 and over for development studies (although those between 10 and 20 EPP can be rated ‘probably yes’ or ‘probably no’, depending on outcome frequency, overall model performance and distribution of predictors in the model) and 100 participants with the outcome and 100 without the outcome for external validation studies. For handling of missing data, multiple imputation is considered the most appropriate method when data are missing at randomReference Riley, van der Windt, Croft and Moons7 and is recommended by PROBAST.Reference Moons, Wolff, Riley, Whiting, Westwood and Collins23 The PROBAST tool has been developed primarily for studies that used a more traditional regression method and guidance on best practice for machine learning models is less widely available. In the case of any machine learning models identified, we applied the PROBAST guidance as described for traditional regression techniques, but judgements should be interpreted with these limitations in mind.
Results
Results of the search
We identified a total of 8694 studies initially, with one study located through a forward citation search performed on 12 March 2021.Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 Deduplicated records (n = 5777) underwent title and abstract screening by two independent review authors (A.S.M. and N.M.), 51 underwent full-text screening and 12 studies were included in the final review (2 full-text articles required referral to K.I.E.S. and were excluded following this referral). These included 11 unique prognostic models; 1 of the studiesReference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 externally validated a model developed elsewhere.Reference Van Loo, Aggen, Gardner and Kendler25 Studies excluded after full-text screening (n = 37) fell into two categories: not meeting study design criteria (i.e. model not intended for prediction) or not meeting participant population criteria. Two studies (awaiting further information) were conference proceedings; we were unable to obtain further information on these studies and so did not include them in the reviewReference Trivedi, Morrison, Daly, Singh, Fedgchin and Jamieson26,Reference Cohen, DeRubeis, Hayes, Watkins, Lewis and Byng27 (Fig. 1).
Description of studies
Of the included studies (Table 1), three were development and external validation studies,Reference Klein, Holtman, Bockting, Heymans and Burger28–Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 eight were development-only studiesReference Van Loo, Aggen, Gardner and Kendler25,Reference Backs-Dermott, Dobson and Jones31–Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 and oneReference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 was an external validation study. ThreeReference Van Loo, Aggen, Gardner and Kendler25,Reference Mocking, Naviaux, Li, Wang, Monk and Bright35,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 of the development-only studies reported internal validation. No prognostic model was externally validated in more than one included study and, therefore, a meta-analysis was not possible. All included studies used prospectively gathered data for developing the prognostic models. Four of the models were developed in secondary care,Reference Berlanga, Heinze, Torres, Apiquián and Caballero32–Reference Judd, Schettler and Rush34,Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 whereas the other seven were developed in primary careReference Klein, Holtman, Bockting, Heymans and Burger28,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 or community settings.Reference Van Loo, Aggen, Gardner and Kendler25,Reference van Loo, Aggen, Gardner and Kendler29–Reference Backs-Dermott, Dobson and Jones31,Reference Mocking, Naviaux, Li, Wang, Monk and Bright35 Van Loo et al (2020) used a data-set drawn from primary care, secondary care and community settings (the Netherlands Study of Depression and Anxiety (NESDA)) for external validation.Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 Further details of the studies can be found in Supplementary Table S1 (available at https://doi.org/10.1192/bjp.2021.218).
MDE, major depressive episode; NA, not applicable; SCID, Structured Clinical Interview for DSM-IV; RCT, randomised controlled trial; MDD, major depressive disorder; SCL-90, Symptom Checklist 90; SEM, standard error of the mean; IQR, interquartile range.
The Appendix summarises the specific outcome definitions used. The included studies covered a wide range of predictors (Table S2 outlines the different predictors included in the final models and how they were measured for the individual studies). Most commonly, these were disease-related characteristics and demographic factors. Some studies explored some less common predictors such as: neuropsychological predictors (emotional categorisation, emotional memory, and facial expression recognition);Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 personality characteristics such as neuroticism;Reference Berlanga, Heinze, Torres, Apiquián and Caballero32 psychosocial predictors such as life stress and interpersonal difficulties;Reference Backs-Dermott, Dobson and Jones31 biochemical predictors such as results from the corticotrophin-releasing factor test;Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 peripheral blood metabolomic markers;Reference Mocking, Naviaux, Li, Wang, Monk and Bright35 and combinations of items from the Symptom Checklist (SCL-90).Reference Judd, Schettler and Rush34
Of the 11 development studies, nine used regression analysis (five used logistic regressionReference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30,Reference Berlanga, Heinze, Torres, Apiquián and Caballero32–Reference Judd, Schettler and Rush34,Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 and four used Cox proportional hazards regression to study time to recurrenceReference Van Loo, Aggen, Gardner and Kendler25,Reference Klein, Holtman, Bockting, Heymans and Burger28,Reference van Loo, Aggen, Gardner and Kendler29,Reference Mocking, Naviaux, Li, Wang, Monk and Bright35). Of the remaining two included development studies, one used a machine learning support vector machine model to predict recurrence over a median period of 233 daysReference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 and the other used discriminant function analysis (DFA), a statistical method to identify which continuous variables (predictors) best discriminate between two or more groups (in this case, relapse or stable remission).Reference Backs-Dermott, Dobson and Jones31
Predictive performance of prognostic models
The predictive performance of all included models is summarised in Table S2. Six of the model development studies identifiedReference Van Loo, Aggen, Gardner and Kendler25,Reference Klein, Holtman, Bockting, Heymans and Burger28–Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30,Reference Mocking, Naviaux, Li, Wang, Monk and Bright35,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 reported internal validation to account for overfitting and optimism within the developed model. Three also reported external validation, using a data-set separate from the training data-set to give a truer reflection of model performance and generalisability.Reference Klein, Holtman, Bockting, Heymans and Burger28–Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 Van Loo (2020)Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 presented the external validation of the model developed in Van Loo (2018).Reference Van Loo, Aggen, Gardner and Kendler25
Klein (2018)Reference Klein, Holtman, Bockting, Heymans and Burger28 used a randomised controlled trial data-set separate from that used for development for external validation and presented a calibration slope of 0.56 (0.81 on internal validation) and a Harrell's C-statistic of 0.59 (0.56 on internal validation). Van Loo (2015)Reference van Loo, Aggen, Gardner and Kendler29 used a temporal cut-off to define their development and validation samples (temporal validation). They presented ‘comparable’ Kaplan–Meier curves as evidence that their prognostic model was well calibrated for people at lower risk of relapse but less so for higher-risk participants, and an AUC of 0.61 on external validation (0.79 on internal validation). Wang et al (2014)Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 used data from the same source but from a different geographical region (geographical validation) to define development and external validation data-sets. The authors presented a C-statistic of 0.72, indicating good discrimination, and presented the result of the Hosmer–Lemeshow goodness-of-fit test (3.51, P = 0.9) as evidence of ‘excellent calibration’.
Van Loo et al (2020)Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 presented the results of the developed model in two ‘test’ sets. One of these, the Virginia Adult Twin Study of Psychiatric and Substance Use Disorder (VATSPSUD), was data from the same sample used in Van Loo et al (2018)Reference Van Loo, Aggen, Gardner and Kendler25 for model development and we have therefore classified this as an internal validation. The second test sample (NESDA) is separate from the development data-set and we have focused on this as the external validation. Discrimination was reported as good (AUC = 0.68 (95% CI 0.66–0.71) predicting recurrence over 0 to 2 years; AUC = 0.72 (95% CI 0.69–0.75) predicting recurrence over 0 to 9 years); calibration was not reported. Of the external validations included in this review, only Van Loo et al (2020)Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 included 95% CI for measures of predictive performance.
Klein et al (2018)Reference Klein, Holtman, Bockting, Heymans and Burger28 was the only included study to present all of the regression coefficients for the predictors included in the final model as well as the intercept and associated 95% CI. This model could therefore be used based on the information provided in the primary source. None of the included studies explored net benefit analysis (clinical utility) with respect to the developed models.
ROB and applicability assessment of included studies
We rated 11 of the 12 included studies as being at high overall ROB (see Fig. 2(a) and Supplementary Table 3). Only one study, Klein et al (2018),Reference Klein, Holtman, Bockting, Heymans and Burger28 was assessed to be at low ROB in all four domains. ROB was generally assessed as being low for most studies in the domains of participants and predictors. ROB was unclear for 8 out of 12 of the studies in the domain of outcomes, because the studies did not state that outcomes were determined masked to the predictor information. For the fourth domain (analysis), the quality of the reported methods was variable, and some weaknesses and potential sources of bias were identified in this domain for 11 of the 12 included studies.
The most common weakness related to sample size or number of events, or both. An inadequate sample size seriously impairs the performance of a statistical model in the real world because of a significant risk of overfitting.Reference Riley, Ensor, Snell, Harrell, Martin and Reitsma38 Most studies did not describe how the sample size was determined. Only one studyReference Klein, Holtman, Bockting, Heymans and Burger28 reported sufficient EPP for model development (104 recurrences for eight candidate predictor parameters). All other regression modelsReference Van Loo, Aggen, Gardner and Kendler25,Reference van Loo, Aggen, Gardner and Kendler29,Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30,Reference Berlanga, Heinze, Torres, Apiquián and Caballero32–Reference Mocking, Naviaux, Li, Wang, Monk and Bright35,Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 had inadequate sample size, according to PROBAST (see Method). The sample size determination used by Backs-Dermott et al (2010),Reference Backs-Dermott, Dobson and Jones31 which used DFA, appeared to be appropriate according to their reported methods.
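The EPP rule of thumb summarised in the Method can be written out as a short sketch. The function names and category labels are hypothetical; the cut-offs (20 and over, between 10 and 20, and 100 participants with and without the outcome for external validation) are those stated in this review, and the treatment of the exact boundary values is an assumption.

```python
def events_per_parameter(n_events, n_candidate_parameters):
    """Events per candidate predictor parameter (EPP)."""
    if n_candidate_parameters <= 0:
        raise ValueError("need at least one candidate predictor parameter")
    return n_events / n_candidate_parameters


def development_sample_size_rating(epp):
    """Rule-of-thumb category for a model development study."""
    if epp >= 20:
        return "adequate"
    if epp >= 10:
        # 'Probably yes' or 'probably no', depending on outcome frequency,
        # overall model performance and distribution of predictors.
        return "judgement required"
    return "inadequate"


def validation_sample_size_adequate(n_with_outcome, n_without_outcome):
    """Rule of thumb for external validation studies."""
    return n_with_outcome >= 100 and n_without_outcome >= 100


if __name__ == "__main__":
    # Figures reported above: 104 recurrences, eight candidate predictor parameters.
    epp = events_per_parameter(104, 8)
    print(epp, development_sample_size_rating(epp))
```

With 104 recurrences and eight candidate predictor parameters the EPP is 13, which falls in the band where the rating depends on reviewer judgement rather than on the cut-off alone.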
Ruhe et al (2019)Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 used a machine learning approach for model development. Formal guidance is lacking to aid sample size determinations for prognostic model studies using machine learning techniques. The guidance and literature that do exist suggest that we should demand, if anything, significantly larger sample sizes when using a machine learning approach to prognostic model development, with one paper estimating that one would need more than ten times the EPP required for regression models to achieve a stable AUC and small optimism.Reference Van Der Ploeg, Austin and Steyerberg39 This study did not have an adequate sample size according to any of the existing guidance and recommendations. For Van Loo et al (2020),Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24 although it was not explicitly stated, we made the assessment that the sample size probably met PROBAST requirements for external validation (at least 100 events).
Another limitation of the majority of the included studies (n = 8) was their handling of missing data. Multiple imputation was used to handle missing data in only four of the identified studies.Reference van Loo, Bigdeli, Milaneschi, Aggen and Kendler24,Reference Van Loo, Aggen, Gardner and Kendler25,Reference Klein, Holtman, Bockting, Heymans and Burger28,Reference Judd, Schettler and Rush34 The remaining studies either did not report their approachReference Backs-Dermott, Dobson and Jones31–Reference Johansson, Lundh and Bjärehed33,Reference Pintor, Torres, Navarro, Martinez de Osaba, Matrai and Gastó37 or used non-PROBAST recommended approaches for handling missing data, such as imputing the meanReference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 or single imputation.Reference van Loo, Aggen, Gardner and Kendler29,Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 Finally, most studies (n = 11) did not present appropriate performance statistics. The PROBAST guidance recommends that, as a minimum, a calibration plot and discrimination statistics (C-statistic for binary and time-to-event outcome models) are presented as relevant performance measures for a prognostic model study.Reference Wolff, Moons, Riley, Whiting, Westwood and Collins19 Classification measures, such as sensitivity and specificity, can be presented in addition to calibration and discrimination statistics, but they have the drawback of loss of information and of requiring risk thresholds to be specified, often based on the data rather than on meaningful, clinical grounds. One studyReference Klein, Holtman, Bockting, Heymans and Burger28 presented both a calibration plot and C-statistic in line with minimum best practice.
We had low concern about applicability for all included studies except for one,Reference Berlanga, Heinze, Torres, Apiquián and Caballero32 which was rated at an unclear level of concern (Fig. 2(b)). It was unclear whether all participants had reached remission and it appears that a proportion of participants would have met the criteria for depression according to the Hamilton Rating Scale for Depression.
Discussion
This is the first systematic review of prognostic models predicting relapse and recurrence of depression. We have identified 11 unique models, across 12 included studies. None of the models underwent independent external validation (i.e. by researchers not involved in the original model development) or net benefit analysis to assess clinical utility. Only one of the included models was found to be at overall low ROBReference Klein, Holtman, Bockting, Heymans and Burger28 and the discrimination and calibration of this model were poor on external validation. We were guided by the recent prognosis literature and guidance in developing our review methods and searches, and in critically appraising the included studies. Our planned meta-analysis was not possible because of an insufficient number of studies reporting performance statistics for the same model.
Comparison with the previous literature
The findings from this review align with previous prognosis research in this area, the majority of which has focused on prognostic factors. In contrast to prognostic models, which provide individualised risk prediction of particular outcomes conditional on multiple factors, prognostic factor studies focus on the factors themselves and whether they add (causal or prognostic) value over existing factors. Two recent systematic reviews and meta-analyses have explored prognostic factors associated with relapse and recurrence of depression.Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6,Reference Wojnarowski, Firth, Finegan and Delgadillo40 There is ‘strong evidence’ that residual depressive symptoms are prognostic for relapse and recurrence, and ‘good’ evidence that the number of previous episodes is associated with increased risk of relapse and recurrence.Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6 In addition, the following factors are associated with relapse and recurrence: childhood maltreatment, comorbid anxiety, neuroticism, age at first onset, rumination,Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6 experiencing a higher number of dependent chronic stressors, or a severe independent life event post-treatment.Reference Wojnarowski, Firth, Finegan and Delgadillo40
Individual participant data meta-analyses have also been used to explore prognostic and prescriptive factorsReference Kuyken, Warren, Taylor, Whalley, Crane and Bondolfi41,Reference Breedvelt, Warren, Segal, Kuyken and Bockting42 and have been broadly in agreement, finding that younger age at onset, residual symptoms and a shorter duration of remission are associated with an increased risk of relapse. The prescriptive value of these factors remains uncertain. Previous research has also found higher odds of recurrence associated with both psychosocial impairment and poor coping skills, and that avoidant coping style and ‘daily hassles/life events’ were predictive of recurrence.Reference Beshai, Dobson, Bockting and Quigley2,Reference Hardeveld, Spijker, De Graaf, Nolen and Beekman43
The number of previous episodes was the most common included predictor across the models identified in this review (n = 6).Reference Van Loo, Aggen, Gardner and Kendler25,Reference Klein, Holtman, Bockting, Heymans and Burger28–Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30,Reference Johansson, Lundh and Bjärehed33,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 The presence of residual symptoms was used as a predictor only in one developed model.Reference Klein, Holtman, Bockting, Heymans and Burger28 Childhood maltreatment was included as a predictor in four of our included studies,Reference Van Loo, Aggen, Gardner and Kendler25,Reference van Loo, Aggen, Gardner and Kendler29,Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 comorbid anxiety in three,Reference Van Loo, Aggen, Gardner and Kendler25,Reference van Loo, Aggen, Gardner and Kendler29,Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 neuroticism in oneReference Berlanga, Heinze, Torres, Apiquián and Caballero32 and age of onset in two models.Reference Van Loo, Aggen, Gardner and Kendler25,Reference Ruhe, Mocking, Figueroa, Seeverens, Ikani and Tyborowska36 Notably, rumination was not explored as a predictor in any of the included prognostic models, despite good evidence that this is associated with increased risk of relapse.Reference Buckman, Underwood, Clarke, Saunders, Hollon and Fearon6,Reference Hardeveld, Spijker, De Graaf, Nolen and Beekman43
Wang et al (2014)Reference Wang, Patten, Sareen, Bolton, Schmitz and MacQueen30 found that marital status ‘contributed to’ the prediction of recurrence, whereas Johansson et al (2015)Reference Johansson, Lundh and Bjärehed33 included having a partner or not as one of the two predictors in their final model (odds ratio of 0.12 (95% CI 0.02–0.64), P = 0.01). The extant literature does not support marital status as a predictor of recurrenceReference Burcusa and Iacono4,Reference Evans, Hollon and DeRubeis44 and weaknesses in the methodology of the prognostic model studies mean that we cannot make conclusive statements about this but, given the strength of the association presented,Reference Johansson, Lundh and Bjärehed33 the prognostic significance of ‘having a partner or not’ warrants further investigation. The model development study by Van Loo et al (2018)Reference Van Loo, Aggen, Gardner and Kendler25 supports the findings of earlier research suggesting that gender is unlikely to be predictive of relapse.
There have been some previous attempts to derive and validate multivariable prognostic models to predict depression-related outcomes other than relapse and recurrence. Existing prognostic models for depression outcomes include a model (the Depression Outcomes Calculator-Six Items (DOC-6©)) to predict remission (C-statistic (AUC) of 0.62, 95% CI 0.57–0.66) or persistent depressive symptoms (C-statistic (AUC) of 0.67, 95% CI 0.61–0.72) at 6 months post-diagnosis;11 a model to predict persistent symptoms at 6 months (C-statistic not reported; R² of 0.40 in the development sample and 0.27 in the validation sample);Reference Rubenstein L, Rayburn, Keeler, Ford, Rost and Sherbourne45 and a model to predict onset of depression in general practice attendees who did not currently have depression (C-statistic of 0.79, 95% CI 0.77–0.81).11 The studies in this review present predictive performance statistics broadly in line with these, suggesting that successful individualised prediction might be possible for depression outcomes, but better quality studies and potentially different combinations of predictors are needed to explore this further.
Implications for clinical practice and research
Relapse and recurrence occur in a significant proportion of people with remitted depression and are a source of considerable morbidity. The economic burden of depression is higher in those who experience relapse or recurrence than in those who do notReference Gauthier, Mucha, Shi and Guerin46 and, although interventions to prevent relapse or recurrence of depression (including pharmacological and psychological approaches) can be resource-intensive, they are effectiveReference Clarke, Mayo-Wilson, Kenny and Pilling47–Reference Breedvelt, Brouwer, Harrer, Semkovska, Ebert and Cuijpers49 and cost-effective.Reference Klein, Wijnen, Lokkerbol, Buskens, Elgersma and van Rijsbergen50 Implementation research is needed to ensure that such interventions can be made available to a greater number of patients in a scalable and feasible way.
A potentially effective way of ensuring efficient allocation of relapse prevention interventions is to stratify patients according to their risk of relapse and recurrence. Interventions can then be provided to those most likely to benefit from them. The aetiology of depression and depressive relapse is multifaceted, and multivariable models are likely to be a more helpful approach to predicting outcomes than relying on the presence or absence of single prognostic factors. None of the prognostic models identified in this review had sufficiently high predictive performance to enable a personalised approach to relapse prevention for depression at present.
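As a purely illustrative sketch of what risk stratification could look like once a sufficiently accurate model exists, the snippet below groups people by predicted risk. The function name `stratify_by_risk` and the cut-offs of 0.2 and 0.5 are hypothetical assumptions chosen for illustration; they are not derived from this review or from any validated model, and real thresholds would need to be justified by an assessment of clinical utility.

```python
def stratify_by_risk(predicted_risk, low_cutoff=0.2, high_cutoff=0.5):
    """Assign a risk stratum from a predicted probability of relapse.

    The default cut-offs are hypothetical placeholders, not validated values.
    """
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability between 0 and 1")
    if not low_cutoff < high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")
    if predicted_risk < low_cutoff:
        return "low"
    if predicted_risk < high_cutoff:
        return "medium"
    return "high"


# Hypothetical predicted risks for four people (illustration only).
for risk in [0.05, 0.20, 0.45, 0.70]:
    print(risk, stratify_by_risk(risk))
```

In practice the strata would be linked to different intensities of relapse prevention, and the choice of cut-offs would trade off the resources required against the relapses prevented.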
We reported some key methodological weaknesses in the studies identified in this review, particularly with respect to sample size. Unless the sample size is adequate, there will be limitations to how far we can trust the predictive performance statistics presented by the model development study as overfitting is likely. Going forward, it might be that data from multiple sources should be combined and harmonised to increase the available sample size for model development. A further consideration is that the data in the included studies were taken from samples collected for other purposes, for example randomised controlled trials and longitudinal cohort studies. Although these are considered acceptable and feasible sources of data for prognostic model studies,Reference Pajouheshnia, Groenwold, Peelen, Reitsma and Moons51 there may be advantages to prospectively gathering data (in a pre-designed prospective cohort study) with the explicit purpose of prognostic model development.Reference Riley, van der Windt, Croft and Moons7 A benefit of this is that researchers can control the collection and ensure standardised measurement of predictor and outcome information, but such an approach is more costly and time-consuming than the secondary analysis of pre-existing data and would require a commitment to resource and fund such work. The International Taskforce for relapse prevention of depression (ITFRA) (www.itfra.org) have begun to address these issues by bringing together data from trials of existing relapse prevention interventions and aiming to harmonise predictor and outcome measurement to improve personalised medicine in this area. Work is also underway aiming to move beyond stratification to provide more robust evidence for treatment moderators and prescriptive factors in relapse prevention.Reference Breedvelt, Warren, Brouwer, Karyotaki, Kuyken and Cuijpers52
Most of the included predictors in the studies identified in this review were clinical or demographic variables. It is possible that including a greater number of biomarkers or genetic information may help move towards a precision medicine approach, as has shown promise in a number of other areas, including diagnosing mood disorders.Reference Le-Niculescu, Roseberry, Gill, Levey, Phalen and Mullen53 Nevertheless, such an approach may not be clinically feasible, and an important consideration for researchers is the context and setting in which a prognostic model is intended to be used. Models intended for a primary care setting, for example, may need to focus on a different set of predictors than those intended for use within a specialist service. Primary care-based models would ideally need to include predictors that are available and routinely collected in primary care, such as demographics, socioeconomic information, comorbidities and depression history characteristics.
This review has highlighted a range of statistical approaches to prognostic model development, from ‘traditional’ regression-based techniques to those using machine learning. Machine learning approaches offer the potential of greater predictive performance than more traditional approaches.Reference Tiffin and Paton54 However, this is not always the case, as some studiesReference Tate, McCabe, Larsson, Lundström, Lichtenstein and Kuja-Halkola55 have shown. The technique can also be criticised for a lack of interpretability and for variable reporting standards, although the forthcoming TRIPOD-AI may encourage greater consistency in this regard. When designing future prognosis research, researchers should be mindful of the relative benefits and disadvantages associated with different methodological approaches. Prognosis research has grown as an area over recent yearsReference Riley, van der Windt, Croft and Moons7 and, with the development of the PROGRESS initiative, there are now standards and guidelines for conducting,Reference Steyerberg, Moons, van der Windt, Hayden and Perel56 reportingReference Moons, Altman, Reitsma, Ioannidis, Macaskill and Steyerberg57 and appraisingReference Wolff, Moons, Riley, Whiting, Westwood and Collins19 prognostic model studies. Future studies looking to develop prognostic models for relapse and recurrence of depression should follow best practice guidance when designing methodology, and should be reported in line with the TRIPOD statement.Reference Moons, Altman, Reitsma, Ioannidis, Macaskill and Steyerberg57
In conclusion, this review identified 11 prognostic models developed to predict the risk of relapse or recurrence in people with remitted depression. The models were developed in a variety of clinical settings and patient populations and with a range of included predictors. We are not yet at the point where we can reliably predict outcomes for a given person with remitted depression based on their demographic, clinical and disease-level characteristics. This review suggests that this might be possible, although the studies identified here were limited by their high ROB because of methodological weaknesses. Researchers should conform to best practice when developing prognostic models in future. Beyond this, any such prognostic models will require good-quality external validation, assessment of clinical utility and evaluation of implementation before they can successfully be translated into clinical practice.
Supplementary material
Supplementary material is available online at https://doi.org/10.1192/bjp.2021.218.
Acknowledgements
This article is based on a Cochrane review published in the Cochrane Database of Systematic Reviews (CDSR) 2021, Issue 5, DOI: 10.1002/14651858.CD013491.pub2 (see www.cochranelibrary.com for information).13 Cochrane Reviews are regularly updated as new evidence emerges and in response to feedback, and the CDSR should be consulted for the most recent version of the review. We thank the Cochrane Prognosis Methods Group for providing guidance and the editorial team of the Cochrane Common Mental Disorders (CCMD) Group. The authors are grateful to the following Patient Advisory Group members who contributed to and provided constructive feedback on the final review: Gregory Ball, Joanne Castleton, Gillian Payne, Sue Penn and Emma Williams. The authors thank Professor Trevor Sheldon and Professor Paul Tiffin, who have provided comments and advice on drafts of this review through their roles as Thesis Advisory Panel members. Thanks to Johanna Damen (Cochrane Prognosis Methods Group), Professor Patty Chondros (Department of General Practice, University of Melbourne) and Karen Morley (Cochrane Consumer) who provided peer review on the original Cochrane review.
Author contribution
A.S.M.: lead author of the review. Responsible for screening and selection of studies, data extraction, ‘Characteristics of studies’ tables, ROB and applicability assessment. N.M.: contributed to the write-up of the review. Responsible for screening and selection of studies, data extraction, ‘Characteristics of studies’ tables, ROB and applicability assessment. K.I.E.S.: third review author in screening of references and selection of studies and ‘Risk of bias’ assessment. Contributed to the write-up of the review. Methodological expertise. R.D.R.: contributed to the write-up of the review. Methodological expertise. L.W.P.: contributed to the write-up of the review. S.D.: developed and conducted the information search strategy. J.H.: contributed to review and write-up of manuscript. S.G.: contributed to the conception of the review. Content expertise. C.A.C.G.: contributed to the conception and write-up of the review. Content expertise. R.C.: contributed to the write-up of the review. R.S.P.: commented on the final draft and provided methodological expertise. S.A.: contributed to the conception of the review and commented on the final draft. D.M.: contributed to the conception of the review and commented on the final draft. Content expertise.
Funding
A.S.M. is funded by a NIHR Doctoral Research Fellowship for this research project (NIHR Doctoral Research Fellowship, Dr Andrew Moriarty, DRF-2018-11-ST2-044). K.I.E.S. is funded by the NIHR School for Primary Care Research (SPCR Launching Fellowship). This publication presents independent research funded by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Declaration of interest
None.
Appendix
PICOTS criteria