Introduction
Predicting treatment outcomes and prognosis for psychiatric patients remains a daunting task. Precision psychiatry is a branch of research focused on this problem (Fernandes et al., 2017; Vieta, 2015). This field aims to improve the lives of people suffering from mental illness through ‘the development of tools capable of providing better and more accurate diagnosis, of ascertaining prognosis, guiding treatment and predicting response to treatment, and aiding the development of new and better pharmacological and non-pharmacological treatments’ (Fernandes et al., 2017). It has been suggested that tailoring treatments in psychiatry requires increasing the predictability of outcomes for individual patients (Bzdok, Varoquaux, & Steyerberg, 2021). The approach is inspired by precision medicine research based on data-driven analyses; machine learning algorithms are trained on multiple variables to make diagnostic classifications or predictions. The question is when we can reap the benefits of prediction algorithms in clinical practice (Chekroud et al., 2021; Stein et al., 2022).
Methodologically, the precision approach builds on the foundation of statistical prediction models. In the 1950s, Paul Meehl questioned clinicians' ability to make predictions based on their clinical assessments. He posited that statistical predictions outperform clinical judgments when it comes to diagnosis and treatment indication (Meehl, 1956). However, the integration of clinical assessments of an individual with group-level statistical information remained an unsolved problem. The first attempts to solve this issue with artificial intelligence date back to the 1970s, when so-called expert systems were introduced. Expert systems were computer programs tasked with mimicking human decision-making, including clinical decisions (Kassirer & Gorry, 1978). Although promising at the time, this work failed to transform clinical practice. The interest in biological psychiatry later shifted toward biomarker studies and biological subtyping (Kapur, Phillips, & Insel, 2012). With advances in machine learning methodology in the last decade, and the success of precision medicine approaches in other fields such as oncology, precision psychiatry gained interest. It has the advantage over the expert systems of the 1970s that the technology is more sophisticated, while big datasets containing a range of information sources are now available, as described by Topol: ‘The ability to digitize the medical essence of a human being is predicated on the integration of multi-scale data, akin to a Google map, which consists of superimposed layers of data such as street, traffic and satellite views. For a human being, these layers include demographics and the social graph, biosensors to capture the individual's physiome, imaging to depict the anatomy (often along with physiologic data), and the biology from the various omics [..]. In addition to all these layers, there is one's important environmental exposure data’ (Topol, 2014). Data-driven approaches may shed new light on pathophysiological pathways (Bzdok & Meyer-Lindenberg, 2018; Grzenda et al., 2021).
Recent reviews emphasize that the field is in an early stage, and suggest that attempts to move beyond trial-and-error treatments are leading to emerging new therapies – for example using brain-circuit-based approaches (Coutts, Koutsouleris, & McGuire, 2023; Scangos, State, Miller, Baker, & Williams, 2023). Several recent studies report promising results (Chekroud et al., 2021; Dwyer, Falkai, & Koutsouleris, 2018; Fernandes et al., 2017; Williams, 2016), for example by predicting antipsychotic treatment response and side-effects with high accuracy (Coutts et al., 2023; Dominicus et al., 2023; Koutsouleris et al., 2016). Unfortunately, validation and implementation largely remain unconsidered, and a closer look, from a clinical point of view, at the data currently used for such studies suggests that the desired clinical breakthrough is far from within reach (Fountoulakis, 2021). From this perspective, ten clinical and statistical issues in the precision psychiatry literature are discussed (Table 1). First, I will argue that the lack of a valid gold standard in psychiatric diagnoses makes prediction approaches the most promising way forward (challenge 1). I will then consider limitations of commonly used datasets (challenges 2–6) and outcome definitions (challenge 7) for the development of such models. I discuss why the focus of the field needs to shift from technical model development to real-world applicability (challenges 8 and 9), and conclude that complex dynamical systems approaches are the most promising way forward (challenge 10). References to relevant literature on these challenges are provided where available, while newly identified issues (in particular the challenges of treatment (non-)response) are discussed in more detail. Examples used in this paper mainly focus on schizophrenia spectrum disorders because this is the most studied population in precision psychiatry, but the topics discussed here are generalizable to other disorders. Some issues for model development based on retrospective datasets from clinical trials are identified, and promising ways forward are highlighted to make the translation to clinical implementation.
Classification or prediction: precisely what?
Prediction of future outcomes is the most clinically relevant application of the precision approach. Data-driven classification of patients compared to a ‘gold standard’ such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) (American Psychiatric Association, 2013) is of little value because the specific mix of an individual's symptoms and their evolution over time often poorly fit into one classification (Plana-Ripoll et al., 2019; Romero et al., 2022; Van Os et al., 2000). Heterogeneity in symptoms also exists between patients with the same classification, the classification itself is a poor indicator for treatment susceptibility, and while some possible pathophysiological associations have been identified, these do not form the basis of the diagnosis as they invariably have low diagnostic likelihood ratios (van Os, Guloksuz, Vijn, Hafkenscheid, & Delespaul, 2019).
Precision studies therefore focus on data-driven subtyping of patients based on existing datasets, subsequently comparing the prognosis or treatment susceptibility between categorical subtypes. Alternatively, retrospective studies may attempt to predict outcomes using data from completed treatment trials; prediction models based on randomized controlled trials (RCTs) often use data from the active treatment group to identify patient characteristics that may predict response. However, this approach also has several potential pitfalls that limit translation to clinical practice, as will be discussed in the following (Box 1).
Precision psychiatry – Precision in the context of precision medicine refers to similar outcomes with repeated measurements (Ashley, 2016). Interventions may be targeted with more precision when they are based on better characterization of similarities with other patients.
Personalized psychiatry – The term precision psychiatry is sometimes used as an interchangeable term for personalized psychiatry, but they have slightly different meanings. Personalized psychiatry aims to tailor interventions to specific individuals. Precision psychiatry may thus be used to develop models that help to inform patients more accurately about expected outcomes of interventions, and this information can aid personalized clinical decisions (National Research Council, 2011).
Biomarker – A biomarker is a measurable indicator of a biological state or condition. In the context of precision psychiatry, a biomarker could be used as an indicator of treatment response or prognosis (First et al., 2012).
Machine learning – A form of artificial intelligence where data and algorithms are used to imitate human learning, thereby improving task performance.
Predictor – An independent variable in a statistical model that contains information about the occurrence of an event.
Accuracy – Accuracy refers to the extent to which an outcome reflects the true state of the targeted construct or condition under investigation. An example is the fraction of correctly predicted outcomes of a prediction model. Accuracy may thus be used to evaluate the merit of precision approaches as compared to a randomized, one-size-fits-all approach.
Patient selection
Patients with psychiatric disorders described in the scientific literature on treatment response and/or prognosis were mostly required to give informed consent for study participation, and for good reasons. However, patients with certain characteristics, for example, those who are severely paranoid at the time of assessment, are systematically undersampled as a consequence (Taipale et al., 2022). Similarly, patients are often excluded if treated under judicial coercive measures (Luciano et al., 2014). Studies based on these data will thus consider, on average, moderately ill patients (Taipale et al., 2022). This is a well-known limitation of clinical trials for the generalizability of findings to other populations and settings, such as patients with severe psychosis. When clinical information is used as an input variable for a prediction model of, e.g., treatment outcome, this selection has an additional negative impact: the (distribution of) input information deviates from the data in real-world clinical settings, further reducing the generalizability of findings (Brand, de Boer, Dazzan, & Sommer, 2022). For psychosis treatment, male sex, unmet psychosocial needs, and functional deficits are examples of predictors of worse clinical outcome that also increase the likelihood of coercive measures being applied (Koutsouleris et al., 2016). As coercive measures are often an exclusion criterion of clinical trials, this will negatively impact prediction model performance in clinical practice.
Future studies should therefore train models based on real-world data with limited exclusion criteria where possible. Data harmonization initiatives that are currently being developed are crucial to ensure that naturalistic data are of sufficient quality to make generalizable inferences (‘Research Harmonisation Award Schizophrenia International Research Society’, n.d.).
Fairness
Diversity and inclusion are essential to consider in precision medicine approaches. This is particularly relevant in the field of psychiatry, as societal exclusion and discrimination are directly linked to the development of psychiatric disorders. Representation of groups sensitive to exclusion, for example based on gender, ethnicity, or sexual orientation, is therefore particularly relevant. Non-native speakers may have been excluded from studies because standardized interviews are otherwise not available, or data have been obtained in psychiatric hospitals which are less accessible for specific groups due to insurance discrimination (Mamun et al., 2019). Geographic underrepresentation of included samples is another factor that has been shown to limit the generalizability of precision prediction models (Meehan et al., 2022).
In the machine learning field, inclusion is closely related to the concept of fairness, which refers to the idea that machine learning models should not be biased or discriminatory (Mitchell, Potash, Barocas, D'Amour, & Lum, 2021). To address fairness, one approach is to ensure that the algorithms themselves are not biased or based on discriminatory variables (algorithmic fairness). Algorithms may be systematically biased toward assigning less favorable outcomes to specific groups (group fairness), such as patients with lower educational attainment, both in prediction models and clinical judgment (Sahin et al., 2024). Another approach is to consider the impact of the model on different groups of people (group fairness). For example, as noted above, non-native speakers may have been excluded from studies, or data may have been obtained in psychiatric hospitals that are less accessible for specific groups (Mamun et al., 2019). In addition to group fairness, individual fairness involves treating individual instances of data (i.e. similar individuals) equally. By ensuring that precision models are fair and unbiased, we can use them ethically and responsibly. There may be unresolved or unidentified issues related to diversity and inclusivity in precision psychiatry research. To address these issues and promote an inclusive approach, it is advisable to include a diversity and fairness statement in precision psychiatry papers for transparency, as has been suggested for citations (Zurn, Bassett, & Rust, 2020).
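As an illustration of how a group-fairness check might look in practice, the following minimal Python sketch compares the rate of favorable predictions between groups. The data, the group labels `A` and `B`, and the function names are hypothetical and chosen only for this example; the gap in favorable-prediction rates is one of several possible group-fairness measures and is not a method prescribed by the studies cited above.

```python
from collections import defaultdict

def favorable_rate_by_group(predictions, groups):
    """Fraction of favorable (1) predictions per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [favorable, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred)
        counts[group][1] += 1
    return {g: fav / total for g, (fav, total) in counts.items()}

def parity_gap(rates):
    """Largest difference in favorable-prediction rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical model output: 1 = favorable predicted outcome, 0 = unfavorable.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = favorable_rate_by_group(predictions, groups)
gap = parity_gap(rates)
print(rates, gap)
```

A large gap does not by itself show that a model is unfair, since outcome base rates may truly differ between groups, but reporting it per group would make such differences visible in a fairness statement.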
Treatment dose and duration
A substantial number of medication trials treated patients with a dose or duration that is insufficient for the evaluation of treatment efficacy (Howes et al., 2017). Many clinical trials were designed to demonstrate the efficacy of an agent rather than to determine the optimal dose and duration of treatment. Importantly, the optimal dose and minimal treatment duration to reach an effect may vary across subjects, while the optimal dose for treatment effects may often not be reached due to intolerable side effects (Kahn et al., 2018; Leucht et al., 2013; Zhu et al., 2017). Treatment tolerability is a very important but different issue from treatment effectiveness. Patients can therefore be labeled non-responders to a treatment that is in fact potentially effective because the minimally effective dose is never reached due to intolerability. Finally, many trials have a relatively short follow-up. This may lead to underestimation of the effectiveness (and overestimation of the tolerability) of the treatment in some patients because a longer follow-up was needed. It may also lead to overestimation of the effectiveness in others because the treatment effects were only evaluated under strict conditions (e.g. during hospital admission), which may not represent the real-world functioning of the patient (Fig. 1). Note that while these issues are addressed here for medication trials, similar issues can occur in studies of other interventions such as psychotherapy or brain stimulation. Minimally effective dose and duration should therefore be defined in outcome prediction studies but are currently rarely reported, and personalized estimates of dose and duration appropriateness should be obtained in prospective studies where possible.
Treatment response
A major limitation of retrospective prediction studies on clinical trial data is the lack of consideration of the placebo effect, the Hawthorne effect (the phenomenon where people modify their behavior and may experience symptom reduction due to the fact that they are being observed or studied) and the natural course of the disorder (Howick et al., 2013). Psychotropic medication or psychotherapeutic effects are likely at least partially based on separate (biological) mechanisms (Chopra et al., 2021). In psychiatry, placebo-effects are relatively stronger than active treatment effects (Leucht et al., 2017; van Os et al., 2019). For precision psychiatry studies aiming to predict treatment response, especially when based on biological data, this becomes a major problem.
A thought experiment of a study with a theoretical ‘perfect predictor’ shows the implications of placebo-induced bias. A perfect predictor will only label responders due to active treatment effects with a deviant prediction score, while all other patients will be labeled non-responder. If this predictor is truly specific to active treatment effects, this means that it will categorize ‘placebo-responders’ as non-responders: in these patients, there is no relationship between active treatment and reduction of symptoms.
According to an American Psychiatric Association (APA) consensus statement for (neuroimaging) markers, a biomarker must be at least 80% sensitive, 80% specific, and 80% accurate in order to be considered reliable (First, Botteron, Castellanos, Dickstein, & Hospital, 2012). For a perfectly reliable predictor to meet these requirements – be it a biomarker or a predictor of any other nature – a treatment would need to be at least four times (80%/20%) more effective than placebo in order to account for placebo response in the ‘gold standard’ data.
This level of effectiveness is far from reality for psychiatric treatments. For example, 51% of patients suffering from psychosis are estimated to show minimal response to antipsychotic treatment, in comparison to 30% to placebo treatment (Leucht et al., 2017). Thus, for every 51 patients classified as a responder, 30 may have recovered due to effects unrelated to the pharmacological antipsychotic treatment response (labeled false negatives). As a result, the sensitivity of our predictor will drop to 41% (21 true positives out of 51 responders) in the trial, and its accuracy will be 70% (21 true positives + 49 true negatives), failing the APA requirements (Fig. 2).
Setting a more stringent threshold for treatment response (which could be done because this threshold is arbitrary, as will be discussed later) cannot help to overcome this problem. In antipsychotic treatment trials, the response-ratio between active treatment (23%) and placebo treatment (14%) for 50% symptom reduction was similar to that for minimal response (defined as 20% symptom reduction) (Leucht et al., 2017). With this more stringent threshold for response, sensitivity will even drop to 39%.
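The arithmetic of this thought experiment can be reproduced with a minimal Python sketch. The function name and its simplifying assumptions are illustrative only: the predictor is assumed to flag exclusively drug-specific responders, and the placebo response rate is assumed to apply unchanged within the active arm. Under these assumptions, sensitivity equals (active rate − placebo rate) / active rate, so the 80% threshold is reached only when the drug-specific response is at least four times the placebo response.

```python
def perfect_predictor_metrics(active_rate, placebo_rate):
    """Confusion-matrix metrics for a hypothetical 'perfect predictor'.

    Assumptions (illustrative, not taken from a specific dataset):
    - active_rate: fraction of the active arm labeled 'responder' in the trial
    - placebo_rate: fraction that would respond without a drug-specific effect
    - the predictor flags only drug-specific responders and nobody else
    """
    true_pos = active_rate - placebo_rate   # drug-specific responders, flagged
    false_neg = placebo_rate                # placebo-type responders, not flagged
    true_neg = 1.0 - active_rate            # trial non-responders, not flagged
    false_pos = 0.0                         # a perfect predictor flags no one else
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    accuracy = true_pos + true_neg
    return sensitivity, specificity, accuracy

# Response rates quoted in the text for the two response thresholds.
minimal = perfect_predictor_metrics(0.51, 0.30)     # 20% symptom reduction
stringent = perfect_predictor_metrics(0.23, 0.14)   # 50% symptom reduction

for label, (sens, spec, acc) in (("minimal", minimal), ("stringent", stringent)):
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With the quoted rates, the sketch gives a sensitivity of about 0.41 for minimal response and about 0.39 for the more stringent threshold, matching the figures above.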
To summarize: in psychiatric treatment conditions where placebo effects and natural course of the disorder cannot be disentangled at the individual level, any theoretically perfect predictor will fail the reliability test in clinical trials. Studies reporting predictors of treatment response with high-performance levels without accounting for these issues should caution readers that the reliability of the model may be overestimated.
Treatment non-response
It may be argued that the effects of placebo and natural fluctuations in mental health can be circumvented by making non-response instead of response the target of our outcome predictor. However, several factors may cause false negatives (i.e. treatment is labeled ineffective for a person, even though it could have been beneficial) in the group of non-responders. For example, in patients with schizophrenia spectrum disorders, non-adherence to treatment is estimated at 50% (adherence is here defined as medication taken as prescribed at least 75% of the time) (Lacro, Dunn, Dolder, Leckband, & Jeste, 2002). In a study of our perfect predictor, these participants may be classified as responders while they are clinically classified as non-responders, and will therefore be considered as ‘false positives’. Even when placebo-effects are not considered, the accuracy in such a study would be around 75% (24 true negatives + 51 true positives), again failing the APA criteria. The Treatment Response and Resistance in Psychosis (TRRIP) Working Group made recommendations for adherence monitoring, but excluding non-adhering patients from trials will likely induce selection bias, and, in the best-case scenario, will lead to 72% adherence (Howes et al., 2017).
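The accuracy estimate above can be retraced with a short Python sketch. It rests on strong simplifying assumptions that are made here only for illustration: placebo effects are ignored, the 50% non-adherence rate is applied to the group of trial non-responders, and every non-adherent non-responder is assumed to be a patient who would have responded if the medication had been taken. The function name is hypothetical.

```python
def accuracy_with_nonadherence(response_rate, nonadherence_rate):
    """Accuracy of a hypothetical perfect predictor when non-adherence hides response.

    Assumptions (illustrative): placebo effects are ignored, and a fraction
    `nonadherence_rate` of the trial's non-responders would have responded had
    they taken the medication, so the predictor flags them as responders.
    """
    nonresponders = 1.0 - response_rate
    false_pos = nonresponders * nonadherence_rate   # flagged, but no response observed
    true_neg = nonresponders - false_pos            # not flagged, no response observed
    true_pos = response_rate                        # flagged and responded
    accuracy = true_pos + true_neg
    specificity = true_neg / nonresponders
    return accuracy, specificity

accuracy, specificity = accuracy_with_nonadherence(0.51, 0.50)
print(f"accuracy={accuracy:.3f} specificity={specificity:.2f}")
```

Under these assumptions the accuracy comes out at about 0.755, in line with the figure of around 75% given above, while specificity falls to 0.5.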
Social circumstances and external factors such as ongoing exposure to cannabis or (traumatic) stressors during treatment may further contribute to treatment ineffectiveness (Marsman et al., 2020; Patel et al., 2016). In clinical trials, these factors may be considered random noise in comparisons between active and placebo interventions, but this assumption is not necessarily helpful for the validation of outcome prediction models.
Possible ways forward are the inclusion of placebo-treatment data in prediction studies where ethically defensible and feasible, or open-label trials with blinded discontinuation. This would make it possible to predict the proportional improvement due to ‘true’ treatment effects (Hafliðadóttir et al., 2021). Similar approaches could be used to incorporate estimates of the natural course of the disorder or non-adherence, in order to improve the real-world performance of the model. Another promising approach in patients with relatively stable states of disorder and a focus on short-term treatment effects is the incorporation of information from multiple N = 1 trials, and subsequent meta-analysis thereof, where the impact of treatment is randomized within an individual (Hendrickson, Thomas, Schork, & Raskind, 2020).
Outcome definitions
Psychiatric disorders such as psychosis form a spectrum or continuum, ranging from chronically disabling illness to brief, transient, and non-clinical experiences (Guloksuz & Van Os, 2018). The spectrum is expressed at multiple levels, including symptom severity, genetic liability, neuroanatomical correlates, and functional outcomes after a psychotic episode (Guloksuz & Van Os, 2018; Ripke et al., 2014; Van Dellen et al., 2016; Van Os, Linscott, Myin-Germeys, Delespaul, & Krabbendam, 2009). Clinical translation of these insights remains an unsolved problem. Guidelines for clinical decisions in patients with psychosis are still largely based on research that uses the categorical concept of schizophrenia (van Os et al., 2019). The state-of-the-art consensus criteria for remission after treatment in psychosis research are the Andreasen remission criteria, which are based on a subset of Positive and Negative Syndrome Scale (PANSS) items (Andreasen et al., 2005). Patients diagnosed with psychosis may, however, already fulfill the remission criteria at baseline (Kahn et al., 2018). Alternatively, treatment response may be defined as an (arbitrarily defined) cut-off point in the reduction of symptom severity (e.g. 20% reduction on the PANSS) (Howes et al., 2017; Leucht et al., 2017). Recent trial data show that this will roughly result in a ‘median split’ dichotomization of the sample into treatment responders and non-responders (Kahn et al., 2018). This approach may help to gain statistical power and contrast but is unlikely to represent a (biologically or epidemiologically) plausible contrast between patients, as symptom reductions follow a Gaussian distribution (Fig. 3) (Fried, Flake, & Robinaugh, 2022; MacCallum, Zhang, Preacher, & Rucker, 2002). Prediction models of treatment response based on this approach are therefore unlikely to lead to meaningful insights that can be directly implemented in clinical practice. Continuous treatment outcome measures are more realistic and estimating change in symptom severity may be a way forward. Furthermore, absolute reductions rather than relative reductions in symptoms may be used as outcome measures, because treatment may be more effective in patients with more severe symptoms (Furukawa et al., 2015). At another level, outcomes are often defined based on symptom severity scores. Other outcomes – such as social and existential outcomes – are more relevant for patients, and therefore should be prioritized when an algorithm is used to indicate if a treatment would be suitable for the individual (Maj et al., 2021). A possible mismatch between modeled and desired outcome measures should therefore be considered.
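To illustrate what a median split does to a Gaussian outcome, the following minimal Python simulation compares how strongly a predictor is associated with the continuous symptom-reduction score and with the dichotomized responder label. Everything in it is hypothetical: the data are simulated rather than taken from a trial, and the predictor, the assumed correlation of 0.5, and the sample size are arbitrary choices for this sketch.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate(n=20000, rho=0.5, seed=0):
    """Simulate a hypothetical predictor and a Gaussian symptom-reduction score."""
    rng = random.Random(seed)
    predictor, reduction = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        predictor.append(x)
        # Outcome correlated with the predictor at approximately rho.
        reduction.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
    cutoff = sorted(reduction)[n // 2]   # median split into two groups
    responder = [1.0 if r > cutoff else 0.0 for r in reduction]
    return pearson(predictor, reduction), pearson(predictor, responder)

r_continuous, r_dichotomized = simulate()
print(f"continuous: {r_continuous:.2f}, dichotomized: {r_dichotomized:.2f}")
```

For bivariate normal data, a median split of one variable is expected to shrink the correlation by a factor of √(2/π) ≈ 0.80, so part of the predictor-outcome association is lost before any model is fitted. This is one reason a continuous outcome may be preferable.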
Validation and implementation
External validation of prediction models in independent, naturalistic cohorts across multiple settings is required in order to establish the generalizability of findings. In practice, validation studies rarely use the same methods as the original work they aim to replicate (if attempts to do so are made at all). Moreover, prediction algorithms need to be tested prospectively (and in multiple N = 1 studies where possible) before they can be clinically implemented. The current literature not only lacks such rigorous testing but also lacks a comparison of model performance to existing standards of care (Salazar De Pablo et al., 2021). The evaluation of these models based on symptom severity questionnaires may show a mismatch with patient outcomes if factors such as treatment tolerability are not taken into account (Chen & Asch, 2017). Prospective validation of prediction models across real-life outcomes and settings is thus crucial but rarely performed.
While a lot of research is devoted to the development of new outcome prediction models, few studies address how these models should be implemented in clinical care (Salazar De Pablo et al., 2021). Factors that may hamper implementation include potential harm to the service user, limited access to data from the local setting, and unfamiliarity with prediction models among practitioners and patients (Baldwin et al., 2022). This risk increases when the complexity of models increases and the implications and assumptions of the model become less transparent.
Finally, the implementation of prediction models may shape clinical practice, for example by causing a shift in the composition of the patient population. This can in turn impact the validity of the model. Certain treatment options can become more attractive when the outcome is more predictable (for example if potential severe side-effects of treatment can be ruled out in advance). This will change the population treated with this intervention, as this treatment may be considered earlier in the treatment protocol. Adaptive modeling approaches are therefore required, but this introduces new challenges, for example regarding privacy (Garralda et al., 2019). Federated learning – a learning paradigm to collectively train algorithms in local settings without exchanging the data itself – is an attractive approach to solving such issues. With this approach, models are dispatched to individual healthcare facilities without exchanging personal data. Parameters are optimized to the local setting and sent back to a central server for aggregation. This process actively addresses privacy concerns and minimizes exposure to personal data. Healthcare information processing systems should be transformed to facilitate such approaches (McMahan, Moore, Ramage, Hampson, & Arcas, 2017; Rieke et al., 2020).
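The dispatch, local optimization, and aggregation cycle described above can be sketched in a few lines of Python. This is a toy illustration in the spirit of federated averaging, not an implementation from the cited work: the one-variable linear model, the three simulated sites, the learning rate, and all function names are assumptions made for this example, and a real deployment would need further safeguards that are not shown here.

```python
import random

def local_update(w, b, data, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one site's data; raw data never leave the site."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(sites, rounds=50):
    """Central server loop: dispatch parameters, collect local updates, average them."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [(local_update(w, b, d), len(d)) for d in sites]
        # Only parameters are aggregated, weighted by local sample size.
        w = sum(params[0] * n for params, n in updates) / total
        b = sum(params[1] * n for params, n in updates) / total
    return w, b

def make_site(rng, n, slope=2.0, intercept=-1.0, noise=0.1):
    """Simulate one hypothetical site with a shared underlying linear relation."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, slope * x + intercept + rng.gauss(0, noise)))
    return data

rng = random.Random(0)
sites = [make_site(rng, n) for n in (50, 120, 200)]
w, b = federated_average(sites)
print(round(w, 2), round(b, 2))
```

In this sketch the server only ever sees the pairs `(w, b)` returned by each site, which is the property that makes the approach attractive from a privacy perspective.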
Contextual behavioral factors
From a contextual behavioral perspective, mental health emerges from the dynamic interaction between the individual and the environment (Ford & Urban, 1998). For studies aiming to predict treatment outcomes, this means that the effectiveness of interventions may vary within an individual depending on the setting and circumstances in which the intervention is provided. For example, the treatment response of medication may be (non-linearly) influenced by the setting: response to treatment could be different in clinical v. outpatient care settings with or without community treatment facilities in place. Other factors include the system of friends and family surrounding the patient, the local mental health care system (e.g. private v. public insurance systems), concomitant treatments (e.g. pharmacological treatment with or without parallel psychotherapy), and judiciary status (e.g. voluntary or coercive treatment) (Glick, Stekoll, & Hays, 2011; Kessing et al., 2013; Koutsouleris et al., 2016; Polese, Fornaro, Palermo, De Luca, & De Bartolomeis, 2019; Taipale et al., 2022). The fact that the impact of interventions is context-dependent is further illustrated by the increase in placebo response over time in psychiatric clinical trial data (Weimer, Colloca, & Enck, 2015).
Precision psychiatry studies often (implicitly) assume that treatment response markers are stable over time and context, which may not be the case; all the factors mentioned above may change over time within an individual. Cultural factors, beliefs, expectations, and values of the individual that are to be treated within a precision framework may also contribute to the distress caused by mental health symptoms, both in a positive and negative way (de Andino & de Mamani, Reference de Andino and de Mamani2022). Integrating the variability of mental health and behavior in the (often biologically oriented) precision framework is a major challenge (Köhne & Van Os, Reference Köhne and Van Os2021). A possible solution is the use of an integrative approach during model development, where static and dynamic factors contributing to outcomes are combined. Contextual predictive factors of interest that dynamically change over time (and therefore could be improved with targeted interventions) include the recognition of the impact of discrimination, (self)stigma, and value alignment of the therapy with the familial, social, and cultural context. In addition, quantitative and qualitative research, within the same study sample and in cocreation with patients, may strengthen model validity and may lead to additional insights.
From linear predictions to complex dynamics
Taken together, linear prediction models of outcomes in precision studies are unlikely to lead to improvements in clinical care. Even with complex machine learning approaches, the underlying assumption remains that a combination of factors at baseline will linearly lead to a predictable outcome (Van Os & Kohne, Reference Van Os and Kohne2021). It has also been argued that even successful implementation of precision medicine may only have a limited impact from a public health perspective (Joyner & Paneth, Reference Joyner and Paneth2015). So how to move forward?
There is compelling evidence that mental health is better understood as a complex dynamical system (Borsboom, Haslbeck, & Robinaugh, Reference Borsboom, Haslbeck and Robinaugh2022; Fried & Robinaugh, Reference Fried and Robinaugh2020). Complexity theory suggests that systems are unique and should be approached individually. There is rich diversity in the clinical symptoms of patients and in the contributing factors to their mental health (Fried et al., Reference Fried, Flake and Robinaugh2022; van Os et al., Reference van Os, Guloksuz, Vijn, Hafkenscheid and Delespaul2019). These factors include positive contributing factors in addition to psychiatric vulnerabilities (Huber et al., Reference Huber, Van Vliet, Giezenberg, Winkens, Heerkens, Dagnelie and Knottnerus2016). All these factors are interconnected in systems, and their interactions influence outcomes (Borsboom, Reference Borsboom2017).
Advances in psychiatric symptom network analysis are therefore promising and require further integration with biological, psychological, and social factors. Symptom network theory reconceptualizes mental disorders as intricate networks of interconnected nodes and edges rather than collections of co-occurring symptoms (Borsboom, Reference Borsboom2017). In this network, each symptom acts as a node, and the edges describe the relationships between symptoms (Epskamp & Fried, Reference Epskamp and Fried2018). An example of the potential value of symptom network analysis is a study that revealed how childhood trauma may be linked to psychosis through different paths. For some individuals, childhood trauma was connected to psychosis via depression, while in others, it was linked to impulse control (Isvoranu et al., Reference Isvoranu, Van Borkulo, Boyette, Wigman, Vinkers, Borsboom and Myin-Germeys2017). Symptom network analysis may also be used to capture dynamics in mental health. For example, delusions are often a core (central) symptom of psychosis in acute phases, but a few months later, this is no longer the case (Demyttenaere et al., Reference Demyttenaere, Leenaerts, Acsai, Sebe, Laszlovszky, Barabássy and Correll2022). Different antipsychotic treatments can uniquely modulate these symptom nodes, providing more evidence that the network approach offers a potential roadmap for dynamic, personalized treatments (Sun et al., Reference Sun, Zhang, Lu, Yan, Guo, Liao and Yue2023).
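To make the notion of a central symptom concrete, the sketch below represents a symptom network as weighted edges between nodes and computes strength centrality, taken here as the sum of absolute edge weights attached to each node. The symptom names and edge weights are invented for illustration and are not taken from any of the cited studies; the choice of strength as the centrality measure is likewise an assumption.

```python
def strength_centrality(edges):
    """Sum the absolute weights of all edges attached to each symptom node."""
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0.0) + abs(weight)
        strength[b] = strength.get(b, 0.0) + abs(weight)
    return strength


def most_central(edges):
    """Return the symptom with the highest strength centrality."""
    strength = strength_centrality(edges)
    return max(strength, key=strength.get)


# Hypothetical edge weights (e.g. partial correlations) in an acute phase.
acute = {
    ("delusions", "hallucinations"): 0.5,
    ("delusions", "anxiety"): 0.4,
    ("delusions", "sleep_problems"): 0.3,
    ("anxiety", "sleep_problems"): 0.2,
}

# Hypothetical edge weights for the same symptoms some months later.
follow_up = {
    ("delusions", "hallucinations"): 0.1,
    ("delusions", "anxiety"): 0.1,
    ("anxiety", "sleep_problems"): 0.4,
    ("anxiety", "low_mood"): 0.5,
    ("low_mood", "sleep_problems"): 0.3,
}

print("acute phase:", most_central(acute))
print("follow-up:", most_central(follow_up))
```

With these invented weights the most central node shifts between the two time points, which is the kind of change a dynamic network analysis is meant to detect. Estimating the edges from real data requires dedicated statistical methods that this sketch does not cover.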
In complex dynamical systems, the history of individual elements is crucial for the probability distribution of future outcomes. This again contrasts with the idea that outcomes of future patients can be made predictable based on retrospective analysis of data from others. It stresses the importance of prevention in mental health care and fits naturally in descriptive approaches used in clinical practice when we take patients' personal histories (Psaty, Dekkers, & Cooper, Reference Psaty, Dekkers and Cooper2018). Computational psychiatry and implementations of virtual trials based on personal data, as currently under development in neuroscience, may be important steps forward (de Haan, Reference de Haan2017; Huys, Maia, & Frank, Reference Huys, Maia and Frank2016). For example, virtual brain models are currently being developed to model the impact of resective surgery on epilepsy and brain tumors to inform surgical planning (Jirsa et al., Reference Jirsa, Wang, Triebkorn, Hashemi, Jha, Gonzalez-Martinez and Bartolomei2023; van Dellen et al., Reference van Dellen, Hillebrand, Douw, Heimans, Reijneveld and Stam2013). Similar approaches could be used to model the impact of interventions in psychiatry. Finally, as the survival of dynamical complex (eco)systems depends on their adaptivity or resilience (Gao, Barzel, & Barabási, Reference Gao, Barzel and Barabási2016), specific interventions should go hand in hand with an intervention that increases resilience and flexibility (Davydov, Stewart, Ritchie, & Chaudieu, Reference Davydov, Stewart, Ritchie and Chaudieu2010).
Conclusion
By leveraging advances in technology and the availability of large datasets, precision psychiatry approaches may contribute to the predictability of prognosis and response to prevention or treatment. Future research should consider limitations of currently available datasets including selection bias, fairness, and the noisy reality of treatment data from clinical trials, and incorporate contextual behavioral factors in a broader framework of mental health as a complex dynamical system. Research on methodological innovations should consider implementation in real-world settings early on in the process.
Acknowledgements
I thank Jim van Os and Arjen Slooter for their insightful comments on an earlier version of this manuscript.
Funding statement
This work was supported by The Netherlands Organization for Health Research and Development (ZonMW) GGZ fellowship, Award ID: 60-63600-98-711, and a Rudolf Magnus Fellowship from the UMC Utrecht Brain Center.
Competing interests
None.