A computational neuroimaging study of reinforcement learning and goal-directed exploration in schizophrenia spectrum disorders

A. J. Culbreth; E. K. Schwartz; M. J. Frank; E. C. Brown; Z. Xu; S. Chen; J. M. Gold; J. A. Waltz

doi:10.1017/S0033291722003993

Published online by Cambridge University Press: 08 February 2023

A. J. Culbreth: Affiliation:
Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
E. K. Schwartz: Affiliation:
Signant Health, San Diego, CA, USA
M. J. Frank: Affiliation:
Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI, USA; Department of Psychiatry and Brown Institute for Brain Science, Brown University, Providence, RI, USA
E. C. Brown: Affiliation:
School of Health and Care Management, Arden University, Berlin, Germany
Z. Xu: Affiliation:
Applied LifeSciences & Systems, Morrisville, NC, USA
S. Chen: Affiliation:
Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA; Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
J. M. Gold: Affiliation:
Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
J. A. Waltz*: Affiliation:
Department of Psychiatry, Maryland Psychiatric Research Center (MPRC), University of Maryland School of Medicine, Baltimore, MD, USA
*: Author for correspondence: J. A. Waltz, E-mail: [email protected]

Abstract

Background

Prior evidence indicates that negative symptom severity and cognitive deficits in people with schizophrenia (PSZ) relate to measures of reward-seeking and loss-avoidance behavior (implicating the ventral striatum/VS), as well as uncertainty-driven exploration (reliant on rostrolateral prefrontal cortex/rlPFC). While neural correlates of reward-seeking and loss-avoidance have been examined in PSZ, neural correlates of uncertainty-driven exploration have not. Understanding neural correlates of uncertainty-driven exploration is an important next step that could reveal how this mechanism of cognitive and negative symptoms manifests at a neural level.

Methods

We acquired fMRI data from 29 PSZ and 36 controls performing the Temporal Utility Integration decision-making task. Computational analyses estimated parameters corresponding to learning rates for both positive and negative reward prediction errors (RPEs) and the degree to which participants relied on representations of relative uncertainty. Trial-wise estimates of expected value, certainty, and RPEs were generated to model fMRI data.

Results

Behaviorally, PSZ demonstrated reduced reward-seeking behavior compared to controls, and negative symptoms were positively correlated with loss-avoidance behavior. This finding of a bias toward loss avoidance learning in PSZ is consistent with previous work. Surprisingly, neither behavioral measures of exploration nor neural correlates of uncertainty in the rlPFC differed significantly between groups. However, we showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished participants who engaged in exploratory behavior from those who did not. rlPFC activation was positively associated with intellectual function.

Conclusions

These results further elucidate the nature of reinforcement learning and decision-making in PSZ and healthy volunteers.

Keywords

Dopamine; motivation; psychosis; reward prediction error; rostrolateral prefrontal cortex; ventral striatum

Type: Original Article
Information: Psychological Medicine, Volume 53, Issue 14, October 2023, pp. 6600-6610

DOI: https://doi.org/10.1017/S0033291722003993
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

Previous literature indicates that cognitive and motivational systems are impaired in people with schizophrenia (PSZ), such that PSZ exhibit a reduced tendency to engage in goal-directed behavior (Gold, Waltz, Prentice, Morris, & Heerey, 2008; Kring & Barch, 2014). In this literature, goal-directed behavior is frequently examined with paradigms designed to elicit learning from prediction errors (PEs), mismatches between expectations and outcomes (McClure, Berns, & Montague, 2003; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). However, the precise mechanisms that give rise to reduced goal-directed behavior in PSZ remain unknown. Reduced goal-directed behavior could arise from multiple mechanisms including (1) deficient learning from positive PEs (Go-learning); (2) intact or enhanced learning from negative PEs (NoGo-learning); and/or (3) a reduced ability to engage in information-seeking behavior to determine the best actions in unfamiliar environments (diminished uncertainty-driven exploration). Our goal was to examine these contributors to goal-directed behavior and their neural correlates in PSZ and controls.

Goal-directed behavior is commonly examined through paradigms designed to elicit learning from PEs. These tasks are able to quantify the extent to which individuals use surprising positive and/or negative PEs to guide decision-making (Go-learning v. NoGo-learning). However, PE-driven learning is not the only signal relevant to goal-directed behavior. For example, in unfamiliar and changing environments individuals must decide between either repeating actions that have resulted in optimal outcomes (exploiting) or trying new actions that could yield even better results (exploring). Uncertainty-driven exploration refers to behaviors where the goal is to increase knowledge of reward contingencies of options about which the least is known, by selecting those options (Frank, Doll, Oas-Terpstra, & Moreno, 2009; Gershman, 2019; Wilson, Geana, White, Ludvig, & Cohen, 2014). In such circumstances, the relative uncertainty about the value of competing options is believed to be a key factor in the trade-off between the exploitation of known contingencies and information seeking about contingencies about which we know little (Cavanagh, Figueroa, Cohen, & Frank, 2012; Frank et al., 2009; Payzan-Lenestour & Bossaerts, 2011). Relative uncertainty is essentially the difference in certainty between ‘known’ and ‘unknown’ options, and thus it is distinct from the mean or overall uncertainty of all options.

Previous reports have provided evidence for deficient learning from positive outcomes, intact learning from negative outcomes, and reduced uncertainty-driven exploration in PSZ, with associations to negative symptoms. Several studies have found that PSZ show deficits in using positive reward PEs to drive behavior, and that such deficits are most robust in those with severe negative symptoms (Gold et al., 2012; Waltz, Frank, Wiecki, & Gold, 2011; Waltz & Gold, 2007). In contrast, learning from negative reward PEs has often been found to be intact in medicated PSZ (Culbreth, Westbrook, Xu, Barch, & Waltz, 2016b; Dowd, Frank, Collins, Gold, & Barch, 2016). Such findings provide an intriguing account of reduced goal-directed behavior wherein PSZ are capable of learning what not to do to avoid punishment, but not what to do to obtain reward. Finally, multiple studies have found evidence for reduced uncertainty-driven exploration in PSZ (Strauss et al., 2011; Waltz, Wilson, Albrecht, Frank, & Gold, 2020). For example, Strauss et al. (2011) found that PSZ exhibited a reduction in goal-directed exploration, which was related to anhedonia. Thus, negative symptoms have been associated with behavioral and neural signals underlying goal-directed behavior in PSZ.

Cognitive deficits in PSZ are also associated with behavioral and neural signals underlying goal-directed behavior. PE-driven learning and uncertainty-driven exploration rely on cognitive resources, in terms of being able to use active representations of value to detect differences among options and changes in environmental contingencies. A consistent finding in the literature is that PSZ show a reduced ability to use cognitively demanding learning strategies compared to controls (Culbreth, Westbrook, Daw, Botvinick, & Barch, 2016a; Gold et al., 2012). Further, reduced uncertainty-driven exploration in PSZ has been linked to cognitive deficits (Waltz et al., 2020).

In addition to detailing potential contributors to goal-directed behavior, work in the basic and clinical sciences has begun to delineate the neural correlates of these behavioral mechanisms. Numerous neuroimaging studies in non-psychiatric samples have shown that intact positive PE-driven learning depends on the intact functioning of ventral striatum (VS) and ventromedial prefrontal cortex (PFC), as these structures are involved in the representation of value and the signaling of positive PEs (Clark, Cools, & Robbins, 2004). Studies examining VS PE-related activation in PSZ have yielded mixed results, some showing impairments in mostly medication-naïve samples (Radua et al., 2015) and others showing intact signaling in medicated patients (Culbreth et al., 2016b). Finally, additional studies in the basic science literature point to a role for rostrolateral (rl) PFC in the representation of relative uncertainty of option value (Badre, Doll, Long, & Frank, 2012; Zajkowski, Kossut, & Wilson, 2017). Importantly, the neural correlates of exploratory behavior have yet to be examined in PSZ.

Our goal was to examine contributors to goal-directed behavior and their neural correlates in PSZ and controls. Behaviorally, we hypothesized that impairments in the execution of goal-directed behavior observed in psychosis would be characterized by both (1) impaired learning from positive PEs and (2) intact learning from negative PEs. However, we hypothesized that impairments in goal-directed behavior would extend beyond reduced PE-driven learning, involving mechanisms for adjudicating the explore/exploit trade-off. We employed a behavioral paradigm, the Temporal Utility Integration (TUI) task, which lends itself to computational modeling, allowing us to estimate parameters corresponding to learning from positive and negative outcomes, as well as exploratory behavior. Computational modeling also enabled us to estimate the contributions of representations of certainty about value in modulating decision-making.

Administration of our experimental task in conjunction with fMRI allowed for examination of the neural correlates associated with aspects of goal-directed behavior. Importantly, this is the first study to investigate neural mechanisms of exploratory behavior in PSZ. We predicted that activity in rlPFC would track the tendency to use relative uncertainty to drive exploration (distinguishing individuals prone to exploration from those who are not). Finally, we expected neural correlates to systematically relate to the severity of motivational deficits and cognitive impairment in PSZ.

Methods and materials

Recruiting and screening

Twenty-nine PSZ and 36 demographically matched controls provided written informed consent to protocols approved by the Institutional Review Board of the University of Maryland School of Medicine (Protocol HP-00051996) and successfully completed a behavioral task in the MRI scanner. All patients were chronic outpatients on stable antipsychotic medication regimens (no changes for four weeks). Diagnosis of schizophrenia or schizoaffective disorder in patients was confirmed using the Structured Clinical Interview for DSM-IV-R Disorders (First & Gibbon, 2004), as was the absence of Axis I and Axis II diagnoses in controls. Additional exclusionary criteria included pregnancy and admission of past substance dependence.

Assessment

Premorbid intellectual function was assessed using the Wechsler Test of Adult Reading (WTAR; Wechsler, 2001). Standard symptom ratings were obtained for PSZ using the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1989), the Brief Psychiatric Rating Scale (BPRS; Overall & Gorham, 1962), the Calgary Depression Scale (Addington, Addington, Maticka-Tyndale, & Joyce, 1992), and the Young Mania Rating Scale (Young, Biggs, Ziegler, & Meyer, 1978). We computed the experiential negative symptom factor score by averaging item scores for avolition/role-functioning and anhedonia/asociality from the SANS (Table 1). We computed individual psychosis scores for PSZ by averaging scores from the four psychosis items from the BPRS (Suspiciousness, Grandiosity, Hallucinations, and Unusual Thought Content).

Table 1. Demographic, clinical, and standard cognitive characterization of participants

WASI, Wechsler Abbreviated Scale of Intelligence; WTAR, Wechsler Test of Adult Reading; WRAT4, Wide Range Achievement Test, Reading Subtest; BPRS, Brief Psychiatric Rating Scale; SANS, Scale for the Assessment of Negative Symptoms; Avol/Anhed, Avolition/Anhedonia subscales; CDS, Calgary Depression Scale; YMRS, Young Mania Rating Scale.

Experimental task

In the TUI task (Frank et al., 2009), participants observed a clock hand that completed a single revolution over 5 s (Fig. 1). Participants were instructed to stop the clock hand by pressing a response button, after which a number of points were awarded (see online Supplemental Methods for verbatim instructions). The outcomes of trials (i.e. points earned) were displayed on the center of the clock face at the end of each trial. Participants could only win points on trials on which they responded in less than 5 s.

Fig. 1. Temporal Utility Integration (TUI) task. (a) Example clock-face stimulus; (b) the probability of reward occurring as a function of response time; (c) reward magnitude (contingent on RT); (d) expected value across trials for each time point. The functions are designed such that the expected value at the beginning in DEV is approximately equal to that at the end in IEV so that, if optimal, subjects should obtain the same average reward in both IEV and DEV. Within a given condition (consisting of a block of 40 trials) individuals can learn to produce responses that yield the most points, on average, by stopping the clock at the optimal time (with a key press). Without exploring multiple RTs, a subject might be reinforced often for a given RT, but never discover whether he/she might obtain a larger number of points if he/she had only explored other options. Note that CEV and CEVr have the same EV, so the black line represents EV for both conditions. The x-axis in all plots corresponds to the time after onset of the clock stimulus at which the response is made. Reprinted from Ref. 1.

Points were awarded with a probability and magnitude that varied as a function of response time (RT; Fig. 1). The Matlab code used to generate reward probability and magnitude, across time, within a trial, is provided in online Supplemental Methods S2. In three conditions, reward probability decreased with RT and reward magnitude increased with RT. However, the precise relationships between reward probability and RT, and between reward magnitude and RT, varied across conditions such that the product of reward probability and magnitude, expected value (EV), increased with increasing RT in one condition (the increasing expected value/IEV condition), decreased in another (the decreasing expected value/DEV condition), and remained constant in a third (the constant expected value/CEV condition). In a fourth condition (the constant expected value – reversed/CEVr condition), EV remained constant, but reward probability increased with RT and reward magnitude decreased with RT. The CEV and CEVr conditions served as control conditions. Because expected value is higher earlier in the clock in the DEV condition, subjects are more likely to receive a positive PE for speeding up responding; thus, the DEV condition primarily assesses the degree to which people learn to speed up responding following experience of positive PEs. Because expected value is higher later in the clock in the IEV condition, subjects are more likely to receive a negative PE for speeding up responding; thus, the IEV condition primarily assesses the degree to which people learn to slow down responding following experience of negative PEs. Participants performed two blocks each of the DEV and IEV conditions, and one block each of the CEV and CEVr conditions, for a total of six 40-trial blocks. The order of these blocks was randomized across participants.
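The relationship between reward probability, reward magnitude, and EV can be illustrated with a short sketch. The functions below are invented for illustration only and are not the functions used in the study (those are given in the online Supplemental Methods S2); the names `cev_probability` and `cev_magnitude` and all numeric constants are assumptions. The sketch shows how probability can fall with RT while magnitude rises, leaving their product constant, as in a CEV-style condition.

```python
def expected_value(prob, magnitude):
    """Expected value of a response: reward probability times reward magnitude."""
    return prob * magnitude


def cev_probability(rt):
    # Hypothetical: probability falls linearly from 0.9 at 0 s to 0.3 at 5 s.
    return 0.9 - 0.12 * rt


def cev_magnitude(rt, target_ev=30.0):
    # Hypothetical: magnitude is chosen so that probability * magnitude
    # stays at target_ev for every response time.
    return target_ev / cev_probability(rt)


rts = [0.5, 1.5, 2.5, 3.5, 4.5]  # response times in seconds
evs = [expected_value(cev_probability(rt), cev_magnitude(rt)) for rt in rts]
```

Making magnitude rise more or less steeply than this would produce an EV that increases or decreases with RT, which is the logic separating IEV-style and DEV-style conditions.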

Behavioral data analysis

Behavioral measures of performance are shown in Table 2. For each subject, we computed mean RT changes in each condition (first 10 trials–last 10 trials of each block). We also averaged RT differences between the previous trial and the current trial, as a function of the previous trial's outcome (wins: >0 points; non-wins: 0 points). This was done separately in the IEV and DEV conditions and was used as a behavioral measure of exploration. We refer to this variable as ‘RT swing’. Finally, we compared groups on the total number of points earned (online Supplementary Table S1). Groups did not differ in number of points earned for either the IEV or DEV conditions.
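As a rough illustration of these two behavioral measures, the following sketch computes a block-wise RT change and an outcome-conditioned RT swing. It assumes that the RT change is the mean of the first 10 trials minus the mean of the last 10 trials, and that RT swing is the mean absolute trial-to-trial RT difference; neither detail is confirmed by the text, and the function names are hypothetical.

```python
from statistics import mean


def rt_change(block_rts, n=10):
    """Mean RT of the first n trials minus mean RT of the last n trials.

    Under this (assumed) sign convention, positive values indicate speeding.
    """
    return mean(block_rts[:n]) - mean(block_rts[-n:])


def rt_swing(rts, points):
    """Mean absolute RT change from trial t-1 to trial t, split by the
    outcome of trial t-1 (win: > 0 points; non-win: 0 points)."""
    after_win, after_nonwin = [], []
    # zip stops at the shortest input, so points[i] pairs with rts[i], rts[i+1].
    for prev_rt, cur_rt, prev_points in zip(rts, rts[1:], points):
        swing = abs(cur_rt - prev_rt)
        (after_win if prev_points > 0 else after_nonwin).append(swing)
    return {
        "win": mean(after_win) if after_win else None,
        "non_win": mean(after_nonwin) if after_nonwin else None,
    }
```

In use, `rt_swing` would be applied separately to the trials of the IEV and DEV conditions for each participant.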

Table 2. Experimental variables of interest

Computational modeling

Original model

We used a previously validated computational model of the TUI paradigm to probe contributors to goal-directed behavior on a subject-wise basis (Badre et al., 2012; Frank et al., 2009; Strauss et al., 2011). This model estimates the degree to which individuals modulate RT as a function of positive and negative PEs, as well as uncertainty-driven exploration.

On each trial (t), our model estimates a response time, $\widehat{{RT}}( t )$:

$$\begin{aligned}\widehat{RT}(t) ={}& K + \lambda RT(t-1) + \nu [RT_{best} - RT_{avg}] - Go(t) + NoGo(t) \\ &+ \rho [\mu_{slow}(t) - \mu_{fast}(t)] + \varepsilon [\sigma_{slow}(t) - \sigma_{fast}(t)]\end{aligned}$$

Here, K is a free parameter capturing a participant's baseline response speed, λ represents the autocorrelation between the previous and current RT, and ν captures the tendency of individuals to adapt RT toward the single largest reward experienced thus far, RT_best, the ‘Going for the Gold’ parameter (Badre et al., 2012; Frank et al., 2009).

To probe contributions of exploitative and exploratory behavior, the model includes components that track the expected value (V) of two specific classes of actions separately, fast and slow RT, compared to a participant's local average RT. The local average RT (RT_avg) is calculated as follows:

$$RT_{avg}( t ) = RT_{avg}( {t-1} ) + \alpha [ {\;RT( {t-1} ) -\;RT_{avg}( {t-1} ) } ] $$

Even though RT is continuous, the reward functions are monotonic. Subjects are told that sometimes it will be better to respond faster and sometimes slower. Thus, participants only need to track the reward value of responding relatively faster or slower and adjust RT accordingly.

For both exploitative and exploratory components, value is updated using a delta rule. α is the rate at which new information is integrated into V and δ is the reward PE [Outcome(t–1)–V(t–1)]:

$$V( t ) = V( {t-1} ) + \alpha \delta ( {t-1} ) \;$$
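The two update rules above share the same incremental form and can be sketched in a few lines. The function names are hypothetical, and the sketch follows the text in using a single symbol α for both updates, which may not reflect how the fitted model parameterizes them.

```python
def update_running_average(rt_avg, rt_prev, alpha):
    """Move the local average RT toward the most recent RT at rate alpha."""
    return rt_avg + alpha * (rt_prev - rt_avg)


def update_value(value, outcome, alpha):
    """Delta rule: move expected value toward the last outcome.

    Returns the updated value and the reward prediction error (delta).
    """
    delta = outcome - value  # reward prediction error
    return value + alpha * delta, delta
```

For example, with V = 10, an outcome of 30 points, and α = 0.25, the prediction error is 20 and the updated value is 15.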

The exploitative component of the model tracks the expected value associated with both fast and slow RT, allowing participants to continuously modulate RT in proportion to their relative difference. More specifically, the model assumes that participants track the probability of obtaining a better than average outcome following fast or slow responses, which are separately computed via Bayesian integration:

$$P( {\theta {\rm \vert }\delta_1 \ldots \delta_n} ) \propto \;P( {\delta_1 \ldots \delta_n{\rm \vert }\theta } ) P( \theta ) \;$$

Here, θ represents the parameters in the probability distribution and δ₁…δ_n represent the prediction errors observed thus far in the experiment. The difference in expected value means (μ_slow, μ_fast) contributes to RT on trial t scaled by free parameter ρ:

$$\;\rho [ {\mu_{slow}( t ) -\;\mu_{\,fast}( t ) } ] $$

The exploratory component of the model capitalizes on the uncertainty of the probability distributions to strategically explore those responses for which reward statistics are most uncertain. Specifically, the model assumes that subjects explore uncertain RTs to reduce uncertainty. This component is computed as:

$$\varepsilon [ {\sigma_{slow}( t ) -\;\sigma_{\,fast}( t ) } ] $$

Here σ_slow and σ_fast are uncertainties, quantified in terms of standard deviations of the probability distributions tracked by the Bayesian update rule, and ɛ is a free parameter controlling the degree to which subjects make exploratory responses in proportion to relative uncertainty, σ_slow(t) − σ_fast(t).
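The exploit and explore terms can be made concrete with a small sketch. The text does not specify the distribution family, so the sketch assumes a Beta distribution over the probability of a positive prediction error for each response class, starting from a uniform Beta(1, 1) prior; this choice, the counting rule, and all function names are assumptions made for illustration.

```python
from math import sqrt


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def update_counts(a, b, delta):
    """Count a positive prediction error as a success, anything else as a failure."""
    return (a + 1, b) if delta > 0 else (a, b + 1)


def exploit_term(rho, slow, fast):
    """rho * (mu_slow - mu_fast): positive values push the predicted RT slower."""
    return rho * (beta_mean(*slow) - beta_mean(*fast))


def explore_term(epsilon, slow, fast):
    """epsilon * (sigma_slow - sigma_fast): with epsilon > 0, RT shifts
    toward the response class whose outcome statistics are more uncertain."""
    return epsilon * (beta_sd(*slow) - beta_sd(*fast))
```

After several positive prediction errors for fast responses, the fast distribution has a higher mean and a smaller standard deviation than the untouched slow distribution. The exploit term then favors responding faster, while a positive ɛ favors sampling the less-known slow responses.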

Finally, Go and NoGo learning reflect a striatal bias to speed responding as a function of positive RPEs and to slow responding as a function of negative RPEs. Evidence for speeding and slowing in the task is separately tracked:

$$Go(t) = Go(t-1) + \alpha_G \delta_{+}(t-1)$$

$$NoGo(t) = NoGo(t-1) + \alpha_N \delta_{-}(t-1)$$

where α_G and α_N are learning rates scaling the effects of positive (δ₊) and negative (δ₋) errors in expected value prediction V (i.e. positive and negative RPE).
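A minimal sketch of the separate Go and NoGo accumulators follows. It assumes that the negative prediction error enters the NoGo update as a magnitude, so that NoGo grows with negative RPEs and adds to the predicted RT; the text does not state this sign convention explicitly, and the function name is hypothetical.

```python
def update_go_nogo(go, nogo, delta, alpha_g, alpha_n):
    """Accumulate evidence for speeding (Go) or slowing (NoGo).

    Positive prediction errors increase Go, scaled by alpha_g.
    Negative prediction errors increase NoGo by their magnitude
    (an assumed convention), scaled by alpha_n.
    """
    if delta > 0:
        go += alpha_g * delta
    elif delta < 0:
        nogo += alpha_n * abs(delta)
    return go, nogo
```

Because Go is subtracted from and NoGo is added to the predicted RT, a participant with a larger α_G than α_N would be expected to speed up more after wins than they slow down after losses.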

Sticky choice model

We compared the model described above with a model testing the alternative idea that people are averse to uncertainty. To test uncertainty aversion, $\varepsilon$ was allowed to take on a negative value. To help distinguish uncertainty aversion from a simple tendency to repeat responses from previous trials, we added a free parameter, sticky choice, where λRT(t − 1) is replaced with λ sticky(t). This allowed for estimation of a decaying function of the RTs on previous trials, rather than accounting only for the RT on the immediately preceding trial:

$$sticky( t ) = RT( {t-1} ) + d \times sticky( {t-1} ) $$
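The recursion above amounts to a geometrically decaying sum of earlier RTs, which the following sketch makes explicit. It assumes the trace starts at zero before the first trial, which the text does not specify; the function name is hypothetical.

```python
def sticky_trace(rts, d):
    """Return sticky(t) for each trial following the RTs in `rts`.

    Element i is the trace used on the trial after rts[i], i.e.
    rts[i] + d * rts[i-1] + d**2 * rts[i-2] + ...
    The trace is assumed to be 0 before the first trial.
    """
    trace = 0.0
    traces = []
    for rt_prev in rts:
        trace = rt_prev + d * trace
        traces.append(trace)
    return traces
```

With d = 0 the trace reduces to the previous trial's RT, recovering the autocorrelation term of the original model; larger values of d give more weight to RTs from further back.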

Here, d is a decay parameter influencing the degree to which prior RTs influence the current RT estimate. To limit the number of free parameters in the sticky choice model, we removed the ‘going for the gold’ parameter, as it was not critical for hypotheses examining uncertainty aversion. See online Supplementary Table S2 for model comparison. The two models fit participant behavior roughly equivalently. In the following analyses, we report results from the original model; however, results from the sticky choice model can be found in the online Supplemental Data S1.

Modeling summary

The three parameters of interest in our study are α_P, α_N, and ɛ (Table 2), where α_P denotes the learning rate for positive RPEs (written α_G in the Go equation above). Similar to Badre et al. (2012), the estimation of the ɛ parameter also allowed us to classify participants as ‘Explorers’ and ‘Non-explorers’, based on whether this ɛ parameter was positive or not. Along with fitting parameters estimated for each participant, the computational model also provides trial-level estimates of expected value, relative uncertainty (the difference in uncertainty between fast v. slow RT), and prediction error. We use these trial-level variables in the neuroimaging analyses described below to identify the neural correlates of decision variables. See online Supplementary Fig. S1 for correlations among trial-level variables.

Analyses of event-related MRI data

Based on prior literature (Badre et al., 2012), we examined effects of relative uncertainty in right rlPFC. The ROI was a sphere of 10 mm radius centered on the Talairach coordinates [27 50 23] reported by Badre et al. (2012). Finally, we specified an ROI in the VS, to investigate RPE-evoked activity, consisting of two spheres of 5 mm radius centered on Talairach coordinates [±10 8 −4] (Culbreth et al., 2016b; Schlagenhauf et al., 2014).

In each ROI, we extracted mean beta values for the two regressors of interest described above (RPE and relative uncertainty) and then performed one-sample t tests comparing the overall mean contrast to zero, as well as two-sample t tests to test for between-group differences. In order to examine how symptom severity and measures of intellectual function modulated MRI responses, we performed Spearman correlations on the mean beta values from the ROIs. To characterize the influence of antipsychotic drugs on BOLD signal responses in ROIs, we converted antipsychotic doses for PSZ to haloperidol equivalents (Andreasen, Pressler, Nopoulos, Miller, & Ho, 2010).
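For readers unfamiliar with the Spearman correlation used here, the sketch below computes it as the Pearson correlation of rank-transformed data, assigning average ranks to ties. It is a plain-Python illustration rather than the software used in the study, it does not compute p values, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` would hold a clinical or cognitive score per participant and `y` the corresponding mean beta value from an ROI.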

As stated above, our computational modeling analyses allowed us to characterize participants as ‘Explorers’ v. ‘Non-Explorers’. In order to assess effects of diagnosis, premorbid IQ, and regional brain activity on explorer status, we performed logistic regression analyses with predictor variables including diagnosis, premorbid IQ, and relative-uncertainty-evoked rlPFC activity and dependent variable of explorer status.

Results

Behavioral measures of goal-directed behavior

PE-driven learning

The DEV condition assesses Go-learning, while the IEV condition assesses NoGo-learning (Frank et al., 2009; Moustafa, Cohen, Sherman, & Frank, 2008). Analyses revealed that PSZ showed reduced DEV acceleration, relative to controls (F_3,186 for group × block interaction = 2.981, p = 0.033), whereas groups did not differ in RT deceleration in the IEV condition (F_3,186 for group × block interaction = 0.437, p = 0.727; Fig. 2). Patients and controls also did not differ significantly in the control conditions (CEV and CEVr). Patients and controls did not differ in the total number of points earned in DEV conditions, IEV conditions, or across all six trial blocks (online Supplementary Table S1). In summary, the current results provide further evidence for reduced learning from positive outcomes and intact learning from negative outcomes in medicated PSZ. See online Supplementary Fig. S2 for raw trial-wise RT by trial number by condition.

Fig. 2. Behavioral performance of patients and controls in the four experimental conditions. (a) Relative to controls, PSZ showed reduced DEV acceleration, from the first to the last block of trials, whereas patients and controls did not differ in their increases in mean response latencies, from the first to the last block of trials, in the (b) IEV, (c) CEV, or (d) CEVr conditions.

In terms of symptom associations, motivational deficits and premorbid IQ correlated positively with IEV deceleration (Fig. 3; Table 3). Patients with the greatest motivational deficits and highest premorbid IQs were the most sensitive to loss. We observed no significant correlations between RL measures and positive symptom severity (Table 3).

Fig. 3. Overall slowing in the IEV condition (a measure of NoGo-learning) correlates with negative symptom severity and premorbid intellectual functioning. (a) Relationship between IEV slowing and ratings for experiential negative symptoms (SANS avolition/ anhedonia/asociality, or AAA). (b) Relationship between IEV slowing and premorbid IQ, as measured by the Wechsler Test of Adult Reading. Better NoGo-learning is associated with both greater avolition/anhedonia/asociality and higher premorbid intellectual functioning.

Table 3. Relationships among behavioral and neural measures of reinforcement learning and measures of symptom severity and intellectual capacity in people with schizophrenia

AAA, Avolition/Anhedonia/Asociality items from the Scale for the Assessment of Negative Symptoms; BPRS RD, Positive Symptom items from the Brief Psychiatric Rating Scale; WTAR, Premorbid IQ, estimated from the Wechsler Test of Adult Reading; DEV, Decreasing Expected Value condition; IEV, Increasing Expected Value condition; [a_P–a_N], contrast in learning rates for positive and negative RPEs; RPE, reward prediction error; VS, ventral striatum. **p < 0.01; *p < 0.05.

Exploratory behavior

Averaged RT differences between the previous trial and the current trial (‘RT swings’) served as our behavioral measure of exploration. Across groups, participants showed large RT swings following non-win outcomes, suggesting exploration after sub-optimal actions. Patients and controls did not differ in their mean non-win RT swings in the IEV and DEV conditions, or in mean win-shifts in the IEV condition (online Supplementary Table S3). However, we observed that controls showed larger mean win-shifts compared to PSZ in the DEV condition (t₆₁ = 2.222, p = 0.030). When we examined relationships between clinical measures and experimental measures of exploration, we observed that mean RT shifts following non-wins, in the IEV condition, positively correlated with the severity of motivational deficits in PSZ (Table 4). In both PSZ and controls, mean RT shifts following IEV non-wins positively correlated with premorbid IQ.

Table 4. Relationships among neural and behavioral measures of exploration, measures of symptom severity and measures of intellectual capacity in PSZ and controls

SANS AAA, Avolition/Anhedonia/Asociality items from the Scale for the Assessment of Negative Symptoms (a measure of experiential negative symptoms); WTAR, Premorbid IQ – estimated from the Wechsler Test of Adult Reading; DEV_NWshift, mean RT change after non-win (0 points) in DEV condition; IEV_NWshift, mean RT change after non-win (0 points) in IEV condition; e, explore parameter (contribution of relative uncertainty to RT change); RLPFC_RU, fMRI parameter estimate corresponding to relative-uncertainty-associated activity in rostrolateral prefrontal cortex. **p < 0.01; *p < 0.05; +p < 0.10.

Computational measures of goal-directed behavior

Computational modeling analyses recapitulated findings from behavioral measures of goal-directed behavior (online Supplementary Fig. S3). Specifically, the contrast in learning rates (α_P – α_N) correlated positively with the [DEV acceleration – IEV deceleration] contrast in both PSZ (ρ = 0.580, p = 0.001) and controls (ρ = 0.342, p = 0.048; online Supplementary Table S4). These associations lend credence to the idea that DEV acceleration reflects a bias toward greater learning from positive RPEs and that IEV deceleration reflects a bias toward greater learning from negative RPEs. However, the contrast in learning rates for positive and negative RPEs (α_P – α_N) did not differ significantly between groups (online Supplementary Table S5).
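The interpretation of the learning-rate contrast can be illustrated with a generic prediction-error learner that applies separate learning rates to positive and negative RPEs. This is a minimal sketch and not the model fitted in this study: the function names, the simple value-update rule, and the parameter values are all assumptions chosen only to show how a positive or negative (α_P – α_N) contrast biases what is learned from the same outcome sequence.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """One prediction-error update with valence-specific learning rates."""
    rpe = reward - value  # reward prediction error
    alpha = alpha_pos if rpe > 0 else alpha_neg
    return value + alpha * rpe


def run_learner(rewards, alpha_pos, alpha_neg, v0=0.5):
    """Apply the update to a sequence of rewards and return the final value."""
    value = v0
    for reward in rewards:
        value = update_value(value, reward, alpha_pos, alpha_neg)
    return value


rewards = [1, 0, 1, 0]  # toy outcome sequence (hypothetical)

# Positive contrast (alpha_P > alpha_N): gains move the estimate more than losses.
gain_biased = run_learner(rewards, alpha_pos=0.5, alpha_neg=0.1)

# Negative contrast (alpha_P < alpha_N): losses move the estimate more than gains.
loss_biased = run_learner(rewards, alpha_pos=0.1, alpha_neg=0.5)

print(gain_biased > loss_biased)
```

With identical outcomes, the learner with the positive contrast ends with the higher value estimate, which is the sense in which the contrast indexes a bias toward learning from positive versus negative RPEs.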

Surprisingly, we did not observe significant group differences or negative symptom associations when examining the ɛ parameter, our computational estimate of uncertainty-driven exploration (Table 4; online Supplementary Table S5). However, ɛ was positively associated with premorbid IQ, a measure of intellectual function. Further, we conducted analyses in which participants were classified as ‘Explorers’ v. ‘Non-Explorers’ based on ɛ > 0. Here, premorbid IQ showed a significant positive association with explorer status (beta = 0.12, p = 0.04; online Supplementary Table S6), whereas neither diagnosis nor the interaction between diagnosis and premorbid IQ did. Thus, measures of intellectual function were positively associated with exploration measures, but diagnosis was not.
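To make the role of ɛ concrete, the sketch below scales a relative-uncertainty signal by ɛ and applies the ɛ > 0 classification rule. Only the ɛ > 0 rule and the description of ɛ as the contribution of relative uncertainty to RT change come from the text; the specific form used here (ɛ multiplied by the difference in standard deviations of two Beta-distributed beliefs, one for slow and one for fast responses), the sign convention, and all names and numbers are assumptions for illustration.

```python
from math import sqrt


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def explore_term(epsilon, slow_ab, fast_ab):
    """RT adjustment proportional to relative uncertainty (assumed form).

    slow_ab, fast_ab : (a, b) parameters of hypothetical Beta beliefs about
                       outcomes following slow and fast responses.
    A positive result means the more uncertain option is the slow one.
    """
    relative_uncertainty = beta_sd(*slow_ab) - beta_sd(*fast_ab)
    return epsilon * relative_uncertainty


def is_explorer(epsilon):
    """Classification rule stated in the text: Explorer if epsilon > 0."""
    return epsilon > 0


# Slow responses barely sampled (wide belief), fast responses well sampled.
adjustment = explore_term(epsilon=2000, slow_ab=(1, 1), fast_ab=(5, 5))
print(round(adjustment, 2), is_explorer(2000), is_explorer(0.0))
```

Under this assumed form, a participant with ɛ = 0 shows no uncertainty-driven RT adjustment regardless of how uncertain either option is, which is why ɛ > 0 serves as a natural cut-off for explorer status.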

Neural correlates of goal-directed behavior

PE-driven learning

Given that prior research has demonstrated a role of the VS in the representation of value and signaling of RPEs, we hypothesized a similar finding in the present study. Indeed, the results of ROI analyses were indicative of the VS positively tracking RPE magnitude and valence (t₆₁ = 4.227, p < 0.01; online Supplementary Fig. S4). Of note, a stronger VS RPE signal was observed in controls who showed greater DEV acceleration (but not PSZ; online Supplementary Table S7). However, there was no significant effect of diagnosis on RPE-related VS activity (t₆₁ = −0.951, p = 0.346) (Fig. 4).

Fig. 4. Relationships among diagnosis, explorer status, relative-uncertainty-evoked rlPFC activity, and measures of intellectual function. Both (a) current IQ (estimated from the WASI) and (b) premorbid IQ (estimated from the WTAR) predict explorer status, without respect to diagnosis. (c) Diagnosis and relative-uncertainty-evoked rlPFC activity interact to predict explorer status, such that relative-uncertainty-evoked rlPFC activity distinguishes control explorers from control non-explorers. Relative-uncertainty-evoked rlPFC activity does not distinguish patient explorers from patient non-explorers.

Neural correlates of uncertainty-driven exploration

Based on Badre et al. (2012), we hypothesized that rlPFC activity would track the tendency to use relative uncertainty to drive exploration, thus distinguishing ‘Explorers’ from ‘Non-explorers’. Regression analyses utilizing diagnosis and rlPFC activity to predict explorer status yielded an overall model at a trend level of significance (χ² = 7.14, p = 0.07). The model indicated that rlPFC activity and the interaction between diagnosis and rlPFC were significant predictors (p = 0.04 and p = 0.03, respectively). Diagnosis alone, however, was not significant (p = 0.68).

A regression model that included diagnosis, premorbid IQ estimates (from the WTAR), and their interaction as predictors was significant (F₃,₅₇ = 2.95, p = 0.040) and indicated that both diagnosis and premorbid IQ were significant positive predictors of rlPFC activity (online Supplementary Table S8). Their interaction was observed at a trend level of significance. These analyses indicate that both diagnosis and cognitive measures are associated with rlPFC activity related to relative-uncertainty-driven exploration. No relationship was observed between measures of motivational deficits and rlPFC activity. We observed no significant correlations between ratings of positive symptom severity from the BPRS and behavioral, modeling, or neural measures of exploration.

Medication effects

Spearman correlation analyses revealed that antipsychotic medication dose was not significantly related to any measures of interest pertaining to symptoms, behavior, or neural activity among PSZ (online Supplementary Table S9).

Additional models and fit analyses

While we present analyses of the original model in the main manuscript, analyses in a subset of participants who showed good fit to the computational model, as well as analyses using the ‘sticky choice’ model can be found in the online Supplemental Results (Table S14). Results were largely similar.

Discussion

We examined contributors to goal-directed behavior and their neural correlates in PSZ and controls. Specifically, we probed (1) learning from positive PEs; (2) learning from negative PEs; and (3) uncertainty-driven exploration. Importantly, this was the first study to examine neural correlates of uncertainty-driven exploration in PSZ. Consistent with previous reports, PSZ behaviorally demonstrated reduced reward-seeking behavior (decreased DEV acceleration) (Gold et al., 2012; Strauss et al., 2011). In contrast, negative symptoms were positively correlated with IEV deceleration, such that patients with more severe motivational deficits showed greater IEV deceleration (enhanced NoGo-learning). This finding of a stronger bias toward NoGo-learning in PSZ, relative to controls, is also consistent with our previous work (Gold et al., 2012; Waltz et al., 2011; Waltz & Gold, 2007). Surprisingly, behavioral measures of uncertainty-driven exploration did not differ significantly between groups; however, exploration was positively associated with intellectual function. At a neural level, we found clear evidence of RPE signals in VS, but no between-group differences. Replicating Badre et al. (2012), we showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished participants who engaged in exploratory behavior from those who did not. This finding is consistent with work from Zajkowski et al. (2017), showing that rlPFC intervention impairs directed exploration. Further, mirroring behavioral analyses, uncertainty-related activation in rlPFC was positively associated with intellectual function. Unexpectedly, however, neural correlates of uncertainty in the rlPFC did not differ between groups.

Regarding exploration, we replicated prior modeling results indicating that representations of uncertainty contribute to decisions to sample options about which less is known, and that rlPFC plays a role in this process. While we observed no between-group difference in the exploration parameter, ɛ, and no significant correlations between clinical ratings for motivational deficits and ɛ (in contrast to Strauss et al., 2011), we observed effects of cognition on both ɛ and neural activity associated with uncertainty-driven exploration. Our finding of relationships between measures of intellectual function and goal-directed exploration indicates that those with decreased cognitive capacity are less likely to engage in uncertainty-driven exploration and is consistent with recent results from our group investigating uncertainty-driven exploration using a different paradigm (Waltz et al., 2020).

There are several possible explanations for the null findings regarding effects of diagnosis and negative symptom severity on uncertainty-driven exploration. First, reduced uncertainty-driven exploration may be characteristic of only a subset of PSZ. Thus, the effect may depend on the particular sample recruited. In our sample, participants exhibited relatively high cognition and relatively low negative symptoms. Second, the TUI task is not designed to isolate directed and random components of exploration. Whereas directed exploration is driven by information seeking, ‘random exploration’ pertains to behavioral variability that drives exploration by chance (Wilson et al., 2014). In our paper using a different paradigm (Waltz et al., 2020), we found that PSZ and controls differed on measures of directed exploration, but not measures of random exploration. Third, some PSZ may generate relatively accurate representations of uncertainty, which then either do not influence decision-making or even lead to the active avoidance of more uncertain options, a phenomenon called ambiguity aversion. Extreme ambiguity aversion was characteristic of a substantial fraction of PSZ in our recent behavioral study (Waltz et al., 2020). It is also possible that cognitive functioning has a stronger association with uncertainty-driven exploration and reinforcement learning, relative to negative symptomatology, than we previously appreciated.

Consistent with Badre et al. (2012), neural activity in rlPFC distinguished ‘Explorers’ from ‘Non-explorers’ in controls but not PSZ. This may result from the fact that uncertainty, even when adaptively represented, did not always lead to exploratory behavior in patients (sometimes, it led to the active avoidance of more uncertain options). Thus, uncertainty representations in the brain may have been less tightly coupled to exploration-related neural activity in rlPFC in patients. However, because uncertainty-driven exploration was most characteristic of participants with higher levels of cognitive function, it is understandable that greater relative-uncertainty-driven activity in rlPFC was observed in those with higher intellectual functioning overall.

Limitations

A modest sample size may have precluded the detection of behavioral and neural associations of goal-directed exploration in the present sample. Additionally, PSZ recruited for this study consisted of stable outpatients on antipsychotic medication. We also did not assess duration of illness. Although we observed no effects of standardized antipsychotic dose, possible effects of medication cannot be ruled out.

Summary

Schizophrenia patients demonstrated reduced reward-seeking behavior and showed IEV deceleration that correlated positively with experiential negative symptoms. Surprisingly, behavioral measures of uncertainty-driven exploration were not significantly different between groups. We showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished Explorers and Non-Explorers. Uncertainty-related activation in rlPFC was also positively associated with intellectual function. These results further elucidate the nature of reinforcement learning and decision-making in PSZ and controls, linking specific cognitive and computational processes to specific neural substrates, which could serve as biomarkers to quantify the effects of potential interventions.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722003993

Conflict of interest

The authors report no conflicts of interest related to the current manuscript.

References

Addington, D., Addington, J., Maticka-Tyndale, E., & Joyce, J. (1992). Reliability and validity of a depression rating scale for schizophrenics. Schizophrenia Research, 6(3), 201–208. https://doi.org/10.1016/0920-9964(92)90003-N

Andreasen, N. C. (1989). The Scale for the Assessment of Negative Symptoms (SANS): Conceptual and theoretical foundations. The British Journal of Psychiatry, 155(S7), 49–52.

Andreasen, N. C., Pressler, M., Nopoulos, P., Miller, D., & Ho, B. C. (2010). Antipsychotic dose equivalents and dose-years: A standardized method for comparing exposure to different drugs. Biological Psychiatry, 67(3), 255–262. https://doi.org/10.1016/j.biopsych.2009.08.040

Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595–607. https://doi.org/10.1016/J.NEURON.2011.12.025

Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral Cortex, 22(11), 2575–2586. https://doi.org/10.1093/cercor/bhr332

Clark, L., Cools, R., & Robbins, T. W. (2004). The neuropsychology of ventral prefrontal cortex: Decision-making and reversal learning. Brain and Cognition, 55(1), 41–53. https://doi.org/10.1016/S0278-2626(03)00284-7

Culbreth, A. J., Westbrook, A., Daw, N. D., Botvinick, M., & Barch, D. M. (2016a). Reduced model-based decision-making in schizophrenia. Journal of Abnormal Psychology, 125(6), 777–787. https://doi.org/10.1037/abn0000164

Culbreth, A. J., Westbrook, A., Xu, Z., Barch, D. M., & Waltz, J. A. (2016b). Intact ventral striatal prediction error signaling in medicated schizophrenia patients. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1(5), 474–483. https://doi.org/10.1016/j.bpsc.2016.07.007

Dowd, E. C., Frank, M. J., Collins, A., Gold, J. M., & Barch, D. M. (2016). Probabilistic reinforcement learning in patients with schizophrenia: Relationships to anhedonia and avolition. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1(5), 460–473. https://doi.org/10.1016/J.BPSC.2016.05.005

First, M. B., & Gibbon, M. (2004). The Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) and the Structured Clinical Interview for DSM-IV Axis II Disorders (SCID-II). In M. J. Hilsenroth & D. L. Segal (Eds.), Comprehensive handbook of psychological assessment, Vol. 2. Personality assessment (pp. 134–143). Hoboken, New Jersey: John Wiley & Sons, Inc.

Frank, M. J., Doll, B. B., Oas-Terpstra, J., & Moreno, F. (2009). Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience, 12(8), 1062–1068. https://doi.org/10.1038/nn.2342

Gershman, S. J. (2019). Uncertainty and exploration. Decision, 6(3), 277–286. https://doi.org/10.1037/dec0000101

Gold, J. M., Waltz, J. A., Matveeva, T. M., Kasanova, Z., Strauss, G. P., Herbener, E. S., … Frank, M. J. (2012). Negative symptoms and the failure to represent the expected reward value of actions: Behavioral and computational modeling evidence. Archives of General Psychiatry, 69(2), 129–138. https://doi.org/10.1001/archgenpsychiatry.2011.1269

Gold, J. M., Waltz, J. A., Prentice, K. J., Morris, S. E., & Heerey, E. A. (2008). Reward processing in schizophrenia: A deficit in the representation of value. Schizophrenia Bulletin, 34(5), 835–847. https://doi.org/10.1093/schbul/sbn068

Kring, A. M., & Barch, D. M. (2014). The motivation and pleasure dimension of negative symptoms: Neural substrates and behavioral outputs. European Neuropsychopharmacology, 24(5), 725–736. https://doi.org/10.1016/j.euroneuro.2013.06.007

McClure, S. M., Berns, G. S., & Montague, P. R. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38(2), 339–346. https://doi.org/10.1016/S0896-6273(03)00154-5

Moustafa, A. A., Cohen, M. X., Sherman, S. J., & Frank, M. J. (2008). A role for dopamine in temporal decision making and reward maximization in Parkinsonism. Journal of Neuroscience, 28(47), 12294–12304. https://doi.org/10.1523/JNEUROSCI.3116-08.2008

Overall, J. E., & Gorham, D. R. (1962). The brief psychiatric rating scale. Psychological Reports, 10(3), 799–812. https://doi.org/10.2466/pr0.1962.10.3.799

Payzan-LeNestour, E., & Bossaerts, P. (2011). Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Computational Biology, 7(1), e1001048. https://doi.org/10.1371/journal.pcbi.1001048

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. https://doi.org/10.1038/nature05051

Radua, J., Schmidt, A., Borgwardt, S., Heinz, A., Schlagenhauf, F., McGuire, P., & Fusar-Poli, P. (2015). Ventral striatal activation during reward processing in psychosis: A neurofunctional meta-analysis. JAMA Psychiatry, 72(12), 1243–1251. https://doi.org/10.1001/jamapsychiatry.2015.2196

Schlagenhauf, F., Huys, Q. J. M., Deserno, L., Rapp, M. A., Beck, A., Heinze, H.-J., … Heinz, A. (2014). Striatal dysfunction during reversal learning in unmedicated schizophrenia patients. NeuroImage, 89, 171–180. https://doi.org/10.1016/J.NEUROIMAGE.2013.11.034

Strauss, G. P., Frank, M. J., Waltz, J. A., Kasanova, Z., Herbener, E. S., & Gold, J. M. (2011). Deficits in positive reinforcement learning and uncertainty-driven exploration are associated with distinct aspects of negative symptoms in schizophrenia. Biological Psychiatry, 69(5), 424–431. https://doi.org/10.1016/J.BIOPSYCH.2010.10.015

Waltz, J. A., Frank, M. J., Wiecki, T. V., & Gold, J. M. (2011). Altered probabilistic learning and response biases in schizophrenia: Behavioral evidence and neurocomputational modeling. Neuropsychology, 25(1), 86–97. https://doi.org/10.1037/a0020882

Waltz, J. A., & Gold, J. M. (2007). Probabilistic reversal learning impairments in schizophrenia: Further evidence of orbitofrontal dysfunction. Schizophrenia Research, 93(1–3), 296–303. https://doi.org/10.1016/J.SCHRES.2007.03.010

Waltz, J. A., Wilson, R. C., Albrecht, M. A., Frank, M. J., & Gold, J. M. (2020). Differential effects of psychotic illness on directed and random exploration. Computational Psychiatry, 4(0), 18. https://doi.org/10.1162/cpsy_a_00027

Wechsler, D. (2001). Wechsler Test of Adult Reading (WTAR). The Psychological Corporation.

Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore-exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074–2081. https://doi.org/10.1037/a0038199

Young, R. C., Biggs, J. T., Ziegler, V. E., & Meyer, D. A. (1978). A rating scale for mania: Reliability, validity and sensitivity. British Journal of Psychiatry, 133(5), 429–435. https://doi.org/10.1192/bjp.133.5.429

Zajkowski, W. K., Kossut, M., & Wilson, R. C. (2017). A causal role for right frontopolar cortex in directed, but not random, exploration. eLife, 6, e27430. https://doi.org/10.7554/eLife.27430
