Introduction
Previous literature indicates that cognitive and motivational systems are impaired in people with schizophrenia (PSZ), such that PSZ exhibit a reduced tendency to engage in goal-directed behavior (Gold, Waltz, Prentice, Morris, & Heerey, 2008; Kring & Barch, 2014). In this literature, goal-directed behavior is frequently examined with paradigms designed to elicit learning from prediction errors (PEs), mismatches between expectations and outcomes (McClure, Berns, & Montague, 2003; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). However, the precise mechanisms that give rise to reduced goal-directed behavior in PSZ remain unknown. Reduced goal-directed behavior could arise from multiple mechanisms including (1) deficient learning from positive PEs (Go-learning); (2) intact or enhanced learning from negative PEs (NoGo-learning); and/or (3) a reduced ability to engage in information-seeking behavior to determine the best actions in unfamiliar environments (diminished uncertainty-driven exploration). Our goal was to examine these contributors to goal-directed behavior and their neural correlates in PSZ and controls.
Paradigms designed to elicit learning from PEs can quantify the extent to which individuals use surprising positive and/or negative PEs to guide decision-making (Go-learning v. NoGo-learning). However, PE-driven learning is not the only signal relevant to goal-directed behavior. For example, in unfamiliar and changing environments, individuals must decide between repeating actions that have resulted in optimal outcomes (exploiting) and trying new actions that could yield even better results (exploring). Uncertainty-driven exploration refers to behaviors where the goal is to increase knowledge of the reward contingencies of the options about which the least is known, by selecting those options (Frank, Doll, Oas-Terpstra, & Moreno, 2009; Gershman, 2019; Wilson, Geana, White, Ludvig, & Cohen, 2014). In such circumstances, the relative uncertainty about the value of competing options is believed to be a key factor in the trade-off between exploiting known contingencies and seeking information about contingencies about which little is known (Cavanagh, Figueroa, Cohen, & Frank, 2012; Frank et al., 2009; Payzan-LeNestour & Bossaerts, 2011). Relative uncertainty is essentially the difference in certainty between ‘known’ and ‘unknown’ options, and thus it is distinct from the mean or overall uncertainty of all options.
Previous reports have provided evidence for deficient learning from positive outcomes, intact learning from negative outcomes, and reduced uncertainty-driven exploration in PSZ, with associations to negative symptoms. Several studies have found that PSZ show deficits in using positive reward PEs to drive behavior, and that such deficits are most robust in those with severe negative symptoms (Gold et al., 2012; Waltz, Frank, Wiecki, & Gold, 2011; Waltz & Gold, 2007). In contrast, learning from negative reward PEs has often been found to be intact in medicated PSZ (Culbreth, Westbrook, Xu, Barch, & Waltz, 2016b; Dowd, Frank, Collins, Gold, & Barch, 2016). Such findings provide an intriguing account of reduced goal-directed behavior wherein PSZ are capable of learning what not to do (to avoid punishment) but not what to do (to obtain reward). Finally, multiple studies have found evidence for reduced uncertainty-driven exploration in PSZ (Strauss et al., 2011; Waltz, Wilson, Albrecht, Frank, & Gold, 2020). For example, Strauss et al. (2011) found that PSZ exhibited a reduction in goal-directed exploration, which was related to anhedonia. Thus, negative symptoms have been associated with behavioral and neural signals underlying goal-directed behavior in PSZ.
Cognitive deficits in PSZ are also associated with behavioral and neural signals underlying goal-directed behavior. PE-driven learning and uncertainty-driven exploration rely on cognitive resources, in terms of being able to use active representations of value to detect differences among options and changes in environmental contingencies. A consistent finding in the literature is that PSZ show a reduced ability to use cognitively demanding learning strategies compared to controls (Culbreth, Westbrook, Daw, Botvinick, & Barch, 2016a; Gold et al., 2012). Further, reduced uncertainty-driven exploration in PSZ has been linked to cognitive deficits (Waltz et al., 2020).
In addition to detailing potential contributors to goal-directed behavior, work in the basic and clinical sciences has begun to delineate the neural correlates of these behavioral mechanisms. Numerous neuroimaging studies in non-psychiatric samples have shown that intact positive PE-driven learning depends on the intact functioning of ventral striatum (VS) and ventromedial prefrontal cortex (PFC), as these structures are involved in the representation of value and the signaling of positive PEs (Clark, Cools, & Robbins, 2004). Findings on VS PE-related activation in PSZ have been mixed, with some studies showing impairments in mostly medication-naïve samples (Radua et al., 2015) and others showing intact signaling in medicated patients (Culbreth et al., 2016b). Finally, additional studies in the basic science literature point to a role for rostrolateral (rl) PFC in the representation of relative uncertainty of option value (Badre, Doll, Long, & Frank, 2012; Zajkowski, Kossut, & Wilson, 2017). Importantly, the neural correlates of exploratory behavior have yet to be examined in PSZ.
Our goal was to examine contributors to goal-directed behavior and their neural correlates in PSZ and controls. Behaviorally, we hypothesized that impairments in the execution of goal-directed behavior observed in psychosis would be characterized by both (1) impaired learning from positive PEs and (2) intact learning from negative PEs. However, we also hypothesized that impairments in goal-directed behavior would extend beyond reduced PE-driven learning, involving mechanisms for adjudicating the explore/exploit trade-off. We employed a behavioral paradigm, the Temporal Utility Integration (TUI) task, which lends itself to computational modeling, allowing us to estimate parameters corresponding to learning from positive and negative outcomes, as well as exploratory behavior. Computational modeling also enabled us to estimate the contributions of representations of certainty about value in modulating decision-making.
Administration of our experimental task in conjunction with fMRI allowed for examination of the neural correlates associated with aspects of goal-directed behavior. Importantly, this is the first study to investigate neural mechanisms of exploratory behavior in PSZ. We predicted that activity in rlPFC would track the tendency to use relative uncertainty to drive exploration (distinguishing individuals prone to explore from those who are not). Finally, we expected neural correlates to systematically relate to the severity of motivational deficits and cognitive impairment in PSZ.
Methods and materials
Recruiting and screening
Twenty-nine PSZ and 36 demographically matched controls provided written informed consent to protocols approved by the Institutional Review Board of the University of Maryland School of Medicine (Protocol HP-00051996) and successfully completed a behavioral task in the MRI scanner. All patients were chronic outpatients on stable antipsychotic medication regimens (no changes for four weeks). Diagnosis of schizophrenia or schizoaffective disorder in patients was confirmed using the Structured Clinical Interview for DSM-IV-R Disorders (First & Gibbon, 2004), as was the absence of Axis I and Axis II diagnoses in controls. Additional exclusionary criteria included pregnancy and admission of past substance dependence.
Assessment
Premorbid intellectual function was assessed using the Wechsler Test of Adult Reading (WTAR; Wechsler, 2001). Standard symptom ratings were obtained for PSZ using the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1989), the Brief Psychiatric Rating Scale (BPRS; Overall & Gorham, 1962), the Calgary Depression Scale (Addington, Addington, Maticka-Tyndale, & Joyce, 1992), and the Young Mania Rating Scale (Young, Biggs, Ziegler, & Meyer, 1978). We computed the experiential negative symptom factor score by averaging item scores for avolition/role-functioning and anhedonia/asociality from the SANS (Table 1). We computed individual psychosis scores for PSZ by averaging scores from the four psychosis items from the BPRS (Suspiciousness, Grandiosity, Hallucinations, and Unusual Thought Content).
Abbreviations: WASI, Wechsler Abbreviated Scale of Intelligence; WTAR, Wechsler Test of Adult Reading; WRAT4, Wide Range Achievement Test, Reading Subtest; BPRS, Brief Psychiatric Rating Scale; SANS, Scale for the Assessment of Negative Symptoms; Avol/Anhed, Avolition/Anhedonia subscales; CDS, Calgary Depression Scale; YMRS, Young Mania Rating Scale.
Experimental task
In the TUI task (Frank et al., 2009), participants observed a clock hand that completed a single revolution over 5 s (Fig. 1). Participants were instructed to stop the clock hand by pressing a response button, after which a number of points were awarded (see online Supplemental Methods for verbatim instructions). The outcome of each trial (i.e. points earned) was displayed at the center of the clock face at the end of the trial. Participants could win points only on trials on which they responded in less than 5 s.
Points were awarded with a probability and magnitude that varied as a function of response time (RT; Fig. 1). The Matlab code used to generate reward probability and magnitude, across time, within a trial, is provided in online Supplemental Methods S2. In three conditions, reward probability decreased with RT and reward magnitude increased with RT. However, the precise relationships between reward probability and RT, and between reward magnitude and RT, varied across conditions such that the product of reward probability and magnitude, expected value (EV), increased with increasing RT in one condition (the increasing expected value/IEV condition), decreased in another (the decreasing expected value/DEV condition), and remained constant in a third (the constant expected value/CEV condition). In a fourth condition (the constant expected value – reversed/CEVr condition), EV remained constant, but reward probability increased with RT and reward magnitude decreased with RT. The CEV and CEVr conditions served as control conditions. Because EV is higher earlier in the clock in the DEV condition, subjects are more likely to receive a positive PE for speeding up; thus, the DEV condition primarily assesses the degree to which people learn to speed up responding following positive PEs. Because EV is higher later in the clock in the IEV condition, subjects are more likely to receive a negative PE for speeding up; thus, the IEV condition primarily assesses the degree to which people learn to slow down responding following negative PEs. Participants performed two blocks each of the DEV and IEV conditions, and one block each of the CEV and CEVr conditions, for a total of six 40-trial blocks. Block order was randomized across participants.
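To make the condition logic concrete, the sketch below builds four probability/magnitude pairs whose product behaves as described for each condition. The functional forms, the constants (`BASE_EV`, the 0.8 slope), and all function names are illustrative assumptions; they are not the task's actual reward functions, which are given in the Matlab code in online Supplemental Methods S2.

```python
CLOCK_MS = 5000.0   # one revolution of the clock hand (5 s)
BASE_EV = 20.0      # hypothetical scaling constant, in points


def reward_probability(rt_ms, condition):
    """Hypothetical reward probability as a function of response time."""
    x = rt_ms / CLOCK_MS                      # fraction of the revolution elapsed
    if condition == "CEVr":
        return 0.2 + 0.8 * x                  # probability rises with RT
    return 1.0 - 0.8 * x                      # probability falls with RT


def expected_value(rt_ms, condition):
    """Target EV profile for each condition (illustrative shapes only)."""
    x = rt_ms / CLOCK_MS
    if condition == "IEV":
        return BASE_EV * (1.0 + x)            # EV increases with RT
    if condition == "DEV":
        return BASE_EV * (2.0 - x)            # EV decreases with RT
    return BASE_EV                            # CEV and CEVr: constant EV


def reward_magnitude(rt_ms, condition):
    """Magnitude chosen so that probability * magnitude equals the target EV."""
    return expected_value(rt_ms, condition) / reward_probability(rt_ms, condition)


if __name__ == "__main__":
    for cond in ("IEV", "DEV", "CEV", "CEVr"):
        evs = [round(reward_probability(t, cond) * reward_magnitude(t, cond), 1)
               for t in (500, 2500, 4500)]
        print(cond, evs)
```

With these assumed forms, magnitude still increases with RT in the DEV condition even though EV falls, because probability drops faster than magnitude grows.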
Behavioral data analysis
Behavioral measures of performance are shown in Table 2. For each subject, we computed the mean RT change in each condition (mean RT over the first 10 trials minus mean RT over the last 10 trials of each block). We also averaged RT differences between the previous trial and the current trial, as a function of the previous trial's outcome (wins: >0 points; non-wins: 0 points). This was done separately in the IEV and DEV conditions and was used as a behavioral measure of exploration. We refer to this variable as ‘RT swing’. Finally, we compared groups on the total number of points earned (online Supplementary Table S1). Groups did not differ in number of points earned for either the IEV or DEV conditions.
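The two behavioral measures can be sketched as follows for a single block. The function names are hypothetical, and the use of absolute (unsigned) trial-to-trial differences for RT swing is an assumption, since the text does not state whether signed or unsigned differences were averaged.

```python
from statistics import mean


def rt_change(block_rts, window=10):
    """Mean RT of the first `window` trials minus mean RT of the last `window`.

    Positive values indicate speeding up (acceleration) over the block.
    """
    return mean(block_rts[:window]) - mean(block_rts[-window:])


def rt_swing(block_rts, block_points):
    """Mean absolute trial-to-trial RT change, split by the previous outcome.

    Assumption: swings are unsigned; wins are trials with > 0 points.
    """
    after_win, after_nonwin = [], []
    for prev in range(len(block_rts) - 1):
        swing = abs(block_rts[prev + 1] - block_rts[prev])
        (after_win if block_points[prev] > 0 else after_nonwin).append(swing)
    return {
        "win": mean(after_win) if after_win else None,
        "non_win": mean(after_nonwin) if after_nonwin else None,
    }
```

For example, `rt_swing([1000, 1500, 1400, 2000], [0, 10, 0, 5])` averages the swings of 500 ms and 600 ms that followed non-wins separately from the 100 ms swing that followed a win.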
Computational modeling
Original model
We used a previously validated computational model of the TUI paradigm to probe contributors to goal-directed behavior on a subject-wise basis (Badre et al., 2012; Frank et al., 2009; Strauss et al., 2011). This model estimates the degree to which individuals modulate RT as a function of positive and negative PEs, as well as uncertainty-driven exploration.
On each trial (t), our model estimates a response time, $\widehat{{RT}}( t )$:
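A reconstruction of this expression from the term definitions in the surrounding text, assuming the form used in the cited Frank et al. (2009) model rather than quoting it, is:

```latex
\widehat{RT}(t) = K + \lambda\, RT(t-1) - Go(t) + NoGo(t)
  + \rho\,[\mu_{slow}(t) - \mu_{fast}(t)]
  + \nu\,[RT_{best} - RT_{avg}]
  + Explore(t)
```

The signs of the Go and NoGo terms are chosen to match the description that positive RPEs speed responding and negative RPEs slow it.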
Here, K is a free parameter capturing a participant's baseline response speed, λ represents the autocorrelation between the previous and current RT, and ν (the ‘Going for the Gold’ parameter) captures the tendency of individuals to adapt RT toward the RT that produced the single largest reward experienced thus far, $RT_{best}$ (Badre et al., 2012; Frank et al., 2009).
To probe contributions of exploitative and exploratory behavior, the model includes components that track the expected value (V) of two specific classes of actions separately, fast and slow RT, compared to a participant's local average RT. The local average RT (RTavg) is calculated as follows:
Even though RT is continuous, the reward functions are monotonic. Subjects are told that sometimes it will be better to respond faster and sometimes slower. Thus, participants only need to track the reward value of responding relatively faster or slower and adjust RT accordingly.
For both exploitative and exploratory components, value is updated using a delta rule. α is the rate at which new information is integrated into V and δ is the reward PE [Outcome(t–1)–V(t–1)]:
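Written out from the definitions just given (a sketch of the standard delta rule, using only the symbols defined in the sentence above), the update is:

```latex
V(t) = V(t-1) + \alpha\,\delta(t-1), \qquad
\delta(t-1) = Outcome(t-1) - V(t-1)
```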
The exploitative component of the model tracks the expected value associated with both fast and slow RT, allowing participants to continuously modulate RT in proportion to their relative difference. More specifically, the model assumes that participants track the probability of obtaining a better than average outcome following fast or slow responses, which are separately computed via Bayesian integration:
Here, θ represents the parameters in the probability distribution and $\delta_1 \ldots \delta_n$ represents the prediction errors observed thus far in the experiment. The difference in expected value means ($\mu_{slow}$, $\mu_{fast}$) contributes to RT on trial t scaled by free parameter ρ:
The exploratory component of the model capitalizes on the uncertainty of the probability distributions to strategically explore those responses for which reward statistics are most uncertain. Specifically, the model assumes that subjects explore uncertain RTs to reduce uncertainty. This component is computed as:
Here $\sigma_{slow}$ and $\sigma_{fast}$ are uncertainties, quantified in terms of standard deviations of the probability distributions tracked by the Bayesian update rule, and ɛ is a free parameter controlling the degree to which subjects make exploratory responses in proportion to relative uncertainty, $\sigma_{slow}(t) - \sigma_{fast}(t)$.
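The exploit and explore terms can be illustrated with a small sketch. It assumes that the probability of a better-than-expected outcome for each response class is tracked with a Beta distribution and a uniform prior; the text only says ‘probability distributions’, so the distribution family, the prior, and the class and function names here are assumptions made for illustration.

```python
from math import sqrt


class BetaBelief:
    """Belief about the probability of a better-than-expected outcome.

    Assumption: a Beta(eta, beta) distribution with a uniform Beta(1, 1) prior.
    """

    def __init__(self, eta=1.0, beta=1.0):
        self.eta = eta
        self.beta = beta

    def update(self, prediction_error):
        """Count a positive PE as a success and a negative PE as a failure."""
        if prediction_error > 0:
            self.eta += 1.0
        elif prediction_error < 0:
            self.beta += 1.0

    @property
    def mean(self):
        return self.eta / (self.eta + self.beta)

    @property
    def std(self):
        total = self.eta + self.beta
        return sqrt(self.eta * self.beta / (total ** 2 * (total + 1.0)))


def exploit_term(slow, fast, rho):
    """rho * (mu_slow - mu_fast): shifts RT toward the better response class."""
    return rho * (slow.mean - fast.mean)


def explore_term(slow, fast, epsilon):
    """epsilon * (sigma_slow - sigma_fast): shifts RT toward the less certain class."""
    return epsilon * (slow.std - fast.std)
```

After several fast responses with mostly positive PEs, the fast belief has a higher mean and a smaller standard deviation than the untouched slow belief, so the exploit term is negative (respond faster) while a positive ɛ makes the explore term positive (try slower responses).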
Finally, Go and NoGo learning reflect a striatal bias to speed responding as a function of positive reward PEs (RPEs) and to slow responding as a function of negative RPEs. Evidence for speeding and slowing in the task is separately tracked:
where $\alpha_G$ and $\alpha_N$ are learning rates scaling the effects of positive ($\delta_+$) and negative ($\delta_-$) errors in expected value prediction V (i.e. positive and negative RPE).
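A minimal sketch of how Go and NoGo evidence might accumulate and enter the RT prediction is given below. The function names, the use of the magnitude of negative RPEs, and the way the remaining terms are passed in as precomputed numbers are assumptions for illustration, not the authors' implementation.

```python
def update_go_nogo(go, nogo, delta, alpha_g, alpha_n):
    """Accumulate evidence for speeding (Go) and slowing (NoGo) from one RPE.

    Assumption: positive RPEs add to Go, and the magnitude of negative RPEs
    adds to NoGo, each scaled by its own learning rate.
    """
    if delta > 0:
        go += alpha_g * delta
    elif delta < 0:
        nogo += alpha_n * abs(delta)
    return go, nogo


def predict_rt(k, lam, prev_rt, go, nogo, exploit=0.0, gold=0.0, explore=0.0):
    """Combine the model components into a predicted RT (ms).

    Assumption: Go is subtracted and NoGo is added, so that positive RPEs
    speed responding and negative RPEs slow it, as described in the text.
    """
    return k + lam * prev_rt - go + nogo + exploit + gold + explore
```

For instance, starting from zero evidence, an RPE of +10 with a Go learning rate of 0.5 lowers the next predicted RT by 5 ms, while a subsequent RPE of −10 with a NoGo learning rate of 0.2 raises it by 2 ms.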
Sticky choice model
We compared the model described above with a model testing the alternative idea that people are averse to uncertainty. To test uncertainty aversion, $\varepsilon$ was allowed to take on a negative value. To help distinguish true uncertainty aversion from a simple tendency to respond as on previous trials, we added a further free parameter, sticky choice, where λ RT(t − 1) is replaced with λ sticky(t). This allowed for estimation of a decaying function of previous trials' RTs v. simply accounting for the previous trial's RT:
Here, d is a decay parameter influencing the degree to which prior RTs influence the current RT estimate. To limit the number of free parameters in the sticky choice model, we removed the ‘going for the gold’ parameter, as it was not critical for hypotheses examining uncertainty aversion. See online Supplementary Table S2 for model comparison. Both models appeared to fit participant behavior roughly equally well. In the following analyses, we report results from the original model; however, results from the sticky choice model can be found in the online Supplemental Data S1.
Modeling summary
The three parameters of interest in our study are $\alpha_P$ (the learning rate for positive RPEs, denoted $\alpha_G$ in the Go term above), $\alpha_N$, and ɛ (Table 2). Similar to Badre et al. (2012), the estimation of the ɛ parameter also allowed us to classify participants as ‘Explorers’ and ‘Non-explorers’, based on whether this ɛ parameter was positive or not. Along with fitting parameters estimated for each participant, the computational model also provides trial-level estimates of expected value, relative uncertainty (the difference in uncertainty between fast v. slow RT), and prediction error. We use these trial-level variables in the neuroimaging analyses described below to identify the neural correlates of decision variables. See online Supplementary Fig. S1 for correlations among trial-level variables.
Analyses of event-related MRI data
Based on prior literature (Badre et al., 2012), we examined effects of relative uncertainty in right rlPFC. The ROI was a sphere of 10 mm radius centered on Talairach coordinates [27 50 23], taken from Badre et al. (2012). Finally, we specified an ROI in the VS, to investigate RPE-evoked activity, consisting of two spheres of 5 mm radius centered on Talairach coordinates [±10 8 −4] (Culbreth et al., 2016b; Schlagenhauf et al., 2014).
In each ROI, we extracted mean beta values for the two regressors of interest described above (RPE and relative uncertainty) and then performed one-sample t tests comparing the overall mean contrast to zero, as well as two-sample t tests to test for between-group differences. In order to examine how symptom severity and measures of intellectual function modulated MRI responses, we performed Spearman correlations on the mean beta values from the ROIs. To characterize the influence of antipsychotic drugs on BOLD signal responses in ROIs, we converted antipsychotic doses for PSZ to haloperidol equivalents (Andreasen, Pressler, Nopoulos, Miller, & Ho, 2010).
As stated above, our computational modeling analyses allowed us to characterize participants as ‘Explorers’ v. ‘Non-Explorers’. In order to assess effects of diagnosis, premorbid IQ, and regional brain activity on explorer status, we performed logistic regression analyses with predictor variables including diagnosis, premorbid IQ, and relative-uncertainty-evoked rlPFC activity and dependent variable of explorer status.
Results
Behavioral measures of goal-directed behavior
PE-driven learning
The DEV condition assesses Go-learning, while the IEV condition assesses NoGo-learning (Frank et al., 2009; Moustafa, Cohen, Sherman, & Frank, 2008). Analyses revealed that PSZ showed reduced DEV acceleration, relative to controls (F 3,186 for group × block interaction = 2.981, p = 0.033), whereas groups did not differ in RT deceleration in the IEV condition (F 3,186 for group × block interaction = 0.437, p = 0.727; Fig. 2). Patients and controls also did not differ significantly in the control conditions (CEV and CEVr). Patients and controls did not differ in the total number of points earned in DEV conditions, IEV conditions, or across all six trial blocks (online Supplementary Table S1). In summary, the current results provide further evidence for reduced learning from positive outcomes and intact learning from negative outcomes in medicated PSZ. See online Supplementary Fig. S2 for raw trial-wise RT by trial number by condition.
In terms of symptom associations, motivational deficits and premorbid IQ correlated positively with IEV deceleration (Fig. 3; Table 3). Patients with the greatest motivational deficits and highest premorbid IQs were the most sensitive to loss. We observed no significant correlations between reinforcement learning (RL) measures and positive symptom severity (Table 3).
AAA, Avolition/Anhedonia/Asociality items from the Scale for the Assessment of Negative Symptoms; BPRS RD, Positive Symptom items from the Brief Psychiatric Rating Scale; WTAR, Premorbid IQ – estimated from the Wechsler Test of Adult Reading; DEV, Decreasing Expected Value condition; IEV, Increasing Expected Value condition; [aP–aN], contrast in learning rates for positive and negative RPEs; RPE, reward prediction error; VS, ventral striatum. **p < 0.01; *p < 0.05.
Exploratory behavior
Averaged RT differences between the previous trial and the current trial (‘RT swings’) served as our behavioral measure of exploration. Across groups, participants showed large RT swings following non-win outcomes, suggesting exploration after sub-optimal actions. Patients and controls did not differ in their mean no-win RT swings in the IEV and DEV conditions, or in mean win-shifts in the IEV condition (online Supplementary Table S3). However, we observed that controls showed larger mean win-shifts compared to PSZ in the DEV condition (t 61 = 2.222, p = 0.030). When we examined relationships between clinical measures and experimental measures of exploration, we observed that mean RT shifts following non-wins, in the IEV condition, positively correlated with the severity of motivational deficits in PSZ (Table 4). In both PSZ and controls, mean RT shifts following IEV non-wins positively correlated with premorbid IQ.
SANS AAA, Avolition/Anhedonia/Asociality items from the Scale for the Assessment of Negative Symptoms (a measure of experiential negative symptoms); WTAR, Premorbid IQ – estimated from the Wechsler Test of Adult Reading; DEV_NWshift, mean RT change after non-win (0 points) in DEV condition; IEV_NWshift, mean RT change after non-win (0 points) in IEV condition; e, explore parameter (contribution of relative uncertainty to RT change); RLPFC_RU, fMRI parameter estimate corresponding to relative-uncertainty-associated activity in rostrolateral prefrontal cortex. **p < 0.01; *p < 0.05; +p < 0.10.
Computational measures of goal-directed behavior
Computational modeling analyses recapitulated findings from behavioral measures of goal-directed behavior (online Supplementary Fig. S3). Specifically, the contrast in learning rates [$\alpha_P - \alpha_N$] correlated positively with the [DEV acceleration − IEV deceleration] contrast in both PSZ (ρ = 0.580, p = 0.001) and controls (ρ = 0.342, p = 0.048; online Supplementary Table S4). These associations lend credence to the idea that DEV acceleration reflects a bias toward greater learning from positive RPEs and that IEV deceleration reflects a bias toward greater learning from negative RPEs. However, the contrast in learning rates for positive and negative RPEs [$\alpha_P - \alpha_N$] did not differ significantly between groups (online Supplementary Table S5).
Surprisingly, we did not observe significant group differences or negative symptom associations when examining the ɛ parameter, our computational estimate of uncertainty-driven exploration (Table 4; online Supplementary Table S5). However, ɛ was positively associated with premorbid IQ, a measure of intellectual function. Further, we conducted analyses where participants were classified as ‘Explorers’ v. ‘Non-Explorers’ based on ɛ > 0. Here, we observed that premorbid IQ showed a significant positive association with explorer status (beta = 0.12, p = 0.04; online Supplementary Table S6), whereas diagnosis and the diagnosis × premorbid IQ interaction did not. Thus, measures of intellectual function were positively associated with exploration measures, but diagnosis was not.
Neural correlates of goal-directed behavior
PE-driven learning
Given that prior research has demonstrated a role of the VS in the representation of value and signaling of RPEs, we hypothesized a similar finding in the present study. Indeed, the results of ROI analyses were indicative of the VS positively tracking RPE magnitude and valence (t 61 = 4.227, p < 0.01; online Supplementary Fig. S4). Of note, a stronger VS RPE signal was observed in controls who showed greater DEV acceleration (but not PSZ; online Supplementary Table S7). However, there was no significant effect of diagnosis on RPE-related VS activity (t 61 = −0.951, p = 0.346) (Fig. 4).
Neural correlates of uncertainty-driven exploration
Based on Badre et al. (2012), we hypothesized that rlPFC activity would track the tendency to use relative uncertainty to drive exploration, thus distinguishing ‘Explorers’ from ‘Non-explorers’. Regression analyses utilizing diagnosis and rlPFC activity to predict explorer status yielded an overall model at a trend level of significance (χ2 = 7.14, p = 0.07). The model indicated that rlPFC activity and the interaction between diagnosis and rlPFC were significant predictors (p = 0.04 and p = 0.03, respectively). Diagnosis alone, however, was not significant (p = 0.68).
A regression model that included diagnosis, premorbid IQ estimates (from the WTAR), and their interaction as predictors was significant (F 3,57 = 2.95, p = 0.040) and indicated that both diagnosis and premorbid IQ were significant positive predictors of rlPFC activity (online Supplementary Table S8). Their interaction was observed at a trend level of significance. These analyses indicate that both diagnosis and cognitive measures are associated with relative-uncertainty-related rlPFC activity. No relationship was observed between measures of motivational deficits and rlPFC activity. We observed no significant correlations between ratings of positive symptom severity from the BPRS and behavioral, modeling, or neural measures of exploration.
Medication effects
Spearman correlation analyses revealed that antipsychotic medication dose was not significantly related to any measures of interest pertaining to symptoms, behavior, or neural activity among PSZ (online Supplementary Table S9).
Additional models and fit analyses
While we present analyses of the original model in the main manuscript, analyses in a subset of participants who showed good fit to the computational model, as well as analyses using the ‘sticky choice’ model can be found in the online Supplemental Results (Table S14). Results were largely similar.
Discussion
We examined contributors to goal-directed behavior and their neural correlates in PSZ and controls. Specifically, we probed (1) learning from positive PEs; (2) learning from negative PEs; and (3) uncertainty-driven exploration. Importantly, this was the first study to examine neural correlates of uncertainty-driven exploration in PSZ. Consistent with previous reports, PSZ behaviorally demonstrated reduced reward-seeking behavior (decreased DEV acceleration) (Gold et al., 2012; Strauss et al., 2011). In contrast, negative symptoms were positively correlated with IEV deceleration, such that patients with more severe motivational deficits showed greater IEV deceleration (enhanced NoGo-learning). This finding of a stronger bias toward NoGo-learning, in PSZ, relative to controls, is also consistent with our previous work (Gold et al., 2012; Waltz et al., 2011; Waltz & Gold, 2007). Surprisingly, behavioral measures of uncertainty-driven exploration did not differ significantly between groups; however, exploration was positively associated with intellectual function. At a neural level, we found clear evidence of RPE signals in VS, but no between-group differences. Replicating Badre et al. (2012), we showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished participants who engaged in exploratory behavior from those who did not. This finding is consistent with work from Zajkowski et al. (2017), showing that rlPFC intervention impairs directed exploration. Further, mirroring behavioral analyses, uncertainty-related activation in rlPFC was positively associated with intellectual function. Unexpectedly, however, neural correlates of uncertainty in the rlPFC did not differ between groups.
Regarding exploration, we replicated prior modeling results indicating that representations of uncertainty contribute to decisions to sample options about which less is known, and that rlPFC plays a role in this process. While we observed no between-group difference in the exploration parameter, ɛ, and no significant correlations between clinical ratings for motivational deficits and ɛ (in contrast to Strauss et al., 2011), we observed effects of cognition on both ɛ and neural activity associated with uncertainty-driven exploration. Our finding of relationships between measures of intellectual function and goal-directed exploration indicates that those with decreased cognitive capacity are less likely to engage in uncertainty-driven exploration and is consistent with recent results from our group investigating uncertainty-driven exploration using a different paradigm (Waltz et al., 2020).
There are several explanations for null findings, regarding effects of diagnosis and negative symptom severity on uncertainty-driven exploration. First, reduced uncertainty-driven exploration may be characteristic of only a subset of PSZ. Thus, the effect may depend on the particular sample recruited. In our sample, participants exhibited relatively high cognition and relatively low negative symptoms. Second, the TUI task is not designed to isolate directed and random components of exploration. Whereas directed exploration is driven by information seeking, ‘random exploration’ pertains to behavioral variability that drives exploration by chance (Wilson et al., 2014). In our paper using a different paradigm (Waltz et al., 2020), we found that PSZ and controls differed on measures of directed exploration, but not measures of random exploration. Third, some PSZ may generate relatively accurate representations of uncertainty, which then do not influence decision-making or even lead to the active avoidance of more uncertain options – a phenomenon called ambiguity aversion. Extreme ambiguity aversion was characteristic of a substantial fraction of PSZ in our recent behavioral study (Waltz et al., 2020). It is possible that cognitive functioning may have a stronger association with uncertainty-driven exploration and reinforcement learning, compared to negative symptomatology, than we previously appreciated.
Consistent with Badre et al. (2012), neural activity in rlPFC distinguished ‘Explorers’ from ‘Non-explorers’ in controls but not PSZ. This may result from the fact that uncertainty, even when adaptively represented, did not always lead to exploratory behavior in patients (sometimes, it led to the active avoidance of more uncertain options). Thus, uncertainty representations in the brain may have been less tightly coupled to exploration-related neural activity in rlPFC. However, because uncertainty-driven exploration was most characteristic of participants with higher levels of cognitive function, it is understandable that greater relative-uncertainty-driven activity in rlPFC was observed in those with higher intellectual functioning, overall.
Limitations
A modest sample size may have precluded the detection of behavioral and neural associations of goal-directed exploration in the present sample. Additionally, PSZ recruited for this study consisted of stable outpatients on antipsychotic medication. We also did not assess duration of illness. Although we observed no effects of standardized antipsychotic dose, possible effects of medication cannot be ruled out.
Summary
Schizophrenia patients demonstrated reduced reward-seeking behavior and showed IEV deceleration that correlated positively with experiential negative symptoms. Surprisingly, behavioral measures of uncertainty-driven exploration were not significantly different between groups. We showed that trial-wise estimates of relative uncertainty in the rlPFC distinguished Explorers and Non-Explorers. Uncertainty-related activation in rlPFC was also positively associated with intellectual function. These results further elucidate the nature of reinforcement learning and decision-making in PSZ and controls, linking specific cognitive and computational processes to specific neural substrates, which could serve as biomarkers to quantify the effects of potential interventions.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722003993
Conflict of interest
The authors report no conflicts of interest related to the current manuscript.