Altered Reinforcement Learning from Reward and Punishment in Anorexia Nervosa: Evidence from Computational Modeling

Christina E. Wierenga; Erin Reilly; Amanda Bischoff-Grethe; Walter H. Kaye; Gregory G. Brown

doi:10.1017/S1355617721001326

Published online by Cambridge University Press: 29 November 2021

Affiliations:
Christina E. Wierenga*: University of California, San Diego, CA, USA
Erin Reilly: Hofstra University, Hempstead, NY, USA
Amanda Bischoff-Grethe: University of California, San Diego, CA, USA
Walter H. Kaye: University of California, San Diego, CA, USA
Gregory G. Brown: University of California, San Diego, CA, USA
*Correspondence and reprint requests to: Christina E. Wierenga, Ph.D., Professor of Psychiatry, UCSD Eating Disorder Research and Treatment Program, UCSD Department of Psychiatry, University of California, Chancellor Park, 4510 Executive Dr., Suite 315, San Diego, CA, 92121, USA. E-mail: [email protected]

Abstract

Objectives:

Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior.

Methods:

This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge.

Results:

AN had lower learning rates than HC following both positive and negative PE (p < .02), and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome.

Conclusions:

This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.

Keywords

Eating disorders; prediction error; operant learning; decision-making; cognition; probabilistic associative learning

Type: Research Article
Information: Journal of the International Neuropsychological Society, Volume 28, Issue 10, November 2022, pp. 1003-1015

DOI: https://doi.org/10.1017/S1355617721001326
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © INS. Published by Cambridge University Press, 2021

INTRODUCTION

Anorexia nervosa (AN) is a serious eating disorder characterized by severe food avoidance and weight loss, an intense fear of gaining weight, and a distorted experience of one’s body (American Psychiatric Association, 2000). It is well known that individuals with AN tend to be cognitively inflexible and have impaired set-shifting, which may contribute to the high rates of chronicity and death (Papadopoulos, Ekbom, Brandt, & Ekselius, 2009; Roberts, Tchanturia, Stahl, Southgate, & Treasure, 2007; Roberts, Tchanturia, & Treasure, 2010; Tchanturia et al., 2012; Wu et al., 2014). Persistent dietary restriction despite negative consequences and evidence of altered reward and punishment sensitivity in AN (Bischoff-Grethe et al., 2013; Glashouwer, Bloot, Veensra, Franken, & de Jong, 2014; Harrison, O’Brien, Lopez, & Treasure, 2010; Harrison, Treasure, & Smillie, 2011; Jappe et al., 2011; Matton, Goossens, Braet, & Vervaet, 2013) raise the question of whether impaired learning from reward and loss might contribute to repeated engagement in maladaptive behavior and illness maintenance.

Dysfunction of reward processing in AN is well documented, with reduced subjective reward sensitivity and decreased limbic-striatal neural response to rewarding stimuli such as food or money (Brooks, Rask-Andersen, Benedict, & Schioth, 2012; Fladung, Schulze, Scholl, Bauer, & Gron, 2013; Jappe et al., 2011; Keating, Tilbrook, Rossell, Enticott, & Fitzgerald, 2012; O’Hara, Schmidt, & Campbell, 2015; Oberndorfer et al., 2013; Wierenga et al., 2014; Wu et al., 2016). Emerging evidence suggests processing of aversive stimuli may also be disrupted in AN; individuals with AN demonstrate elevated harm avoidance, intolerance of uncertainty, anxiety, and oversensitivity to punishment (Glashouwer et al., 2014; Harrison et al., 2010; Harrison et al., 2011; Jappe et al., 2011; Matton et al., 2013), which may contribute to an altered response to negative feedback or a bias to avoid outcomes perceived as aversive (Kaye et al., 2015).
Neuroimaging studies support a dysfunctional neural response to loss, with an exaggerated (Bischoff-Grethe et al., 2013) or undifferentiated (Wagner et al., 2007) striatal response to monetary losses compared to wins and a decreased response to aversive taste (Monteleone et al., 2017). However, much of the existing work in AN has focused on responsivity to reward and punishment, with less attention to learning from both reward and punishment (Bernardoni et al., 2018; Foerde & Steinglass, 2017).

The core idea of reinforcement learning is that the rate of learning is driven by violations of expectations, or prediction errors (PEs), which are operationalized as the received outcome minus the expected outcome, and are markers of dopamine activity (Pearce & Hall, 1980; Rescorla & Wagner, 1972; Sutton & Barto, 2018). Learning from experience occurs through updating expectations about the outcome in proportion to PE, so that the expected outcome converges to the actual outcome. The majority of studies of learning in AN have focused on passive Pavlovian conditioning (Schaefer & Steinglass, 2021), with evidence of elevated reward PE signals in the ventral striatum and orbitofrontal cortex in ill and remitted AN (Frank, Collier, Shott, & O’Reilly, 2016; Frank et al., 2012). However, Pavlovian tasks have demonstrated poor behavioral profiles (National Institute of Mental Health, 2016). Given the importance of choice behavior and decision-making in AN, instrumental response-outcome learning may be more relevant to psychopathology. Limited behavioral data (i.e., Acquired Equivalence Task) suggest reduced reward reinforcement learning in AN (Foerde & Steinglass, 2017; Shott et al., 2012).
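The updating principle described above can be illustrated with a minimal sketch. It assumes a single expectancy value and a single fixed learning rate; the function names and the learning rate of 0.3 are illustrative choices, not values from the study.

```python
def update_expectancy(expected, outcome, learning_rate):
    """Move the expectancy toward the outcome in proportion to the prediction error."""
    prediction_error = outcome - expected  # received outcome minus expected outcome
    return expected + learning_rate * prediction_error


def simulate(outcomes, learning_rate, initial=0.0):
    """Return the expectancy before the first trial and after each trial."""
    values = [initial]
    for outcome in outcomes:
        values.append(update_expectancy(values[-1], outcome, learning_rate))
    return values


# Twenty consecutive favorable outcomes (coded 1) with a hypothetical learning rate.
trajectory = simulate([1.0] * 20, learning_rate=0.3)
```

With a constant outcome, the expectancy approaches that outcome and the PE shrinks on every trial; a lower learning rate slows this convergence.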

To probe the influence of rewarding and punishing outcomes on instrumental reinforcement learning, we employed a well-studied two-choice feedback-based probabilistic associative learning task (PALT) that relies on the contingency between a participant’s response and outcome (i.e., whether or not they won or lost points) to facilitate learning (i.e., to select the optimal reward-based stimuli and avoid the nonoptimal punishment-based stimuli) (Bodi et al., 2009; Herzallah et al., 2017; Herzallah et al., 2013; Mattfeld, Gluck, & Stark, 2011; Myers et al., 2013). The PALT is sensitive to dopaminergic medication effects on reward and punishment processing in Parkinson’s disease (Bodi et al., 2009), has been applied to several psychiatric disorders (e.g., substance use, post-traumatic stress, depression; Beylergil et al., 2017; Herzallah et al., 2017; Myers et al., 2013), and corresponds to functional specialization within the striatum for reward and punishment PE estimates (Mattfeld et al., 2011).
Moreover, research over the past two decades has shown that the direction and magnitude of PE may be a marker of altered dopaminergic activity in AN (Glimcher, 2011; Schultz, Dayan, & Montague, 1997; Schultz, 2016; Steinberg et al., 2013).

Given the link between PE and reinforcement learning, it is tempting to infer group or individual differences in PE from observable reinforcement learning scores. Such an inference would be valid only if the observed scores were unidimensional and reflected PE-based learning. However, if PALT performance involved multiple processes, group or individual differences in the observed scores would be challenging to interpret because the differences might be due to any of the several processes that underlie the task (Sojitra, Lerner, Petok, & Gluck, 2018; Strauss & Smith, 2009). Before comparing AN and healthy control (HC) participants, we investigated the multidimensionality of data derived from the PALT by comparing the fits of three computational reinforcement learning models.

All of these models assumed that when a stimulus is presented, participants choose between two alternatives based on unobserved choice values that reflect the participant’s expectancy of obtaining a favorable outcome (see Supplement). Once a choice is made, the expectancy value associated with the choice made is updated based on the PE and PE learning rates, represented by the parameter η (Figure 1). In expectancy value-based learning models of this type, the difference between the expectancy values associated with the two choice alternatives is multiplied by a logistic regression weight, represented by the parameter β, to turn the value difference into a probability of choosing a particular alternative (Gershman, 2016; Supplement, Equation 1; Figure 1). Although the logistic regression weight has been called inverse temperature in some applications (Daw, 2011), it has been described as an explore-exploit parameter in the psychology literature and reflects how decisively participants make choices based on small differences in the expectancy values (Gershman, 2016; Moustafa, Gluck, Herzallah, & Myers, 2015).
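A minimal sketch of such a choice rule follows. It assumes the standard two-alternative logistic form; the exact expression used by the authors is Equation 1 of their Supplement, which is not reproduced here, and the function name and example values are illustrative.

```python
import math


def choice_probability(v_chosen, v_other, beta):
    """Probability of a choice given the expectancy difference scaled by beta."""
    return 1.0 / (1.0 + math.exp(-beta * (v_chosen - v_other)))


# The same small value difference yields more decisive choices as beta grows.
p_low_beta = choice_probability(0.6, 0.4, beta=1.0)
p_high_beta = choice_probability(0.6, 0.4, beta=10.0)
```

Under this form, equal expectancy values give a probability of .5 regardless of β, and a larger β means the participant more reliably exploits whichever alternative currently has the higher expectancy.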

Fig. 1. (A) Rather than setting all expectancy values, V, to zero on the first trial a stimulus, s_j, is presented, as in the No Bias model, they are set either to a bias value, bias(s_j), or to zero in the First Choice Bias model. The bias(s_j) values are sampled from a normal distribution with mean zero, indicating no bias, and a precision = 10, where precision = 1/variance. If the sampled bias value for stimulus s_j is positive, the choice that would yield the optimal long-term outcome is favored and its expectancy value for trial 1, V₁(c_Opt|s_j), is set to the sampled bias value, bias(s_j), whereas the expectancy value for the nonoptimal response, V₁(c_NonOpt|s_j), is set to zero. If the sampled bias value is negative, the nonoptimal choice is favored and the expectancy value for the nonoptimal choice is set to the absolute value of the bias, whereas the expectancy value for the optimal choice is set to zero. For the First Choice Bias (Singlet) model, the bias parameters for all stimuli are set to the same estimated value, bias(s.). (B) The expectancy value for trial t + 1 associated with the choice c_i made to stimulus s_j on trial t, V_t+1(c_i|s_j), is the expectancy value on trial t updated by the product of a learning rate with the prediction error. Different learning rates, η_p|n, are estimated for positive or negative prediction errors, PE_p|n. Learning rates are sampled from a beta distribution using values of the α and β parameters listed in Table 2 (also see Supplement). A logistic equation maps the difference between the expectancy value of the choice made on trial t, V_t(c_i|s_j), and the value of the choice not made, V_t(c̄_i|s_j), to the probability P_t(c_i|s_j) of making the chosen response c_i given that stimulus s_j was presented on trial t. The logistic regression weight β is sampled from a gamma distribution using values of the shape and rate parameters presented in Table 2 (also see Supplement).

As shown by Schultz (2016), positive and negative PEs differentially affect dopaminergic activity. Because differential levels of dopaminergic activity influence the amount of PE learning (Steinberg et al., 2013), positive and negative PE might be associated with different PE learning rates. All models discussed in this paper assume that separate learning parameters differentially update expectancy values depending on the positive or negative valence of the PE (Gershman, 2016). In particular, the No Bias model is composed of the explore-exploit parameter, β, and two learning rate parameters, one to update expectancy values when PE is positive, η_p, the other when it is negative, η_n.
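The valence-dependent update can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name and example rates are hypothetical, and a PE of exactly zero (which leaves the value unchanged under either rate) is arbitrarily routed to the positive-rate branch.

```python
def update_dual_rate(value, outcome, eta_pos, eta_neg):
    """Update an expectancy value with a learning rate chosen by the sign of the PE."""
    pe = outcome - value
    eta = eta_pos if pe >= 0 else eta_neg  # assumption: zero PE uses eta_pos (no effect)
    return value + eta * pe, pe


# Hypothetical rates: faster learning from positive than from negative PEs.
after_gain, pe_gain = update_dual_rate(0.5, 1.0, eta_pos=0.4, eta_neg=0.1)
after_miss, pe_miss = update_dual_rate(0.5, 0.0, eta_pos=0.4, eta_neg=0.1)
```

In this example the same absolute PE of 0.5 moves the expectancy by 0.2 after a better-than-expected outcome but only by 0.05 after a worse-than-expected one.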

The No Bias model assumes that the first choice made to a stimulus is unbiased. However, global choice biases, the tendency to choose one alternative over another regardless of previous outcomes, and choice inertia bias, the tendency to repeat choices, are commonly reported in the choice literature (Fritsche, Mostert, & de Lange, 2017; Garcia-Perez & Alcala-Quintana, 2013; Gold & Ding, 2013; Linares, Aguilar-Lleyda, & Lopez-Moliner, 2019; Morgan, Dillenburger, Raphael, & Solomon, 2012). It is during experimental conditions leading to uncertainty that choice biases are most likely to be observed (Morgan et al., 2012; Urai, Braun, & Donner, 2017). When a stimulus is first presented on the PALT, participants are doubly uncertain, neither knowing whether the trial is a reward or punishment trial nor knowing which category to choose. Given this uncertainty, initial choice biases might be due to a global choice bias or to a choice history bias, the latter occurring on the initial presentation of subsequent stimuli after the first PALT stimulus is presented. If choice biases occur on the PALT, they would be unobserved processes that would obscure the use of observed scores as markers of PE learning. In the First Choice Bias model, we modeled the impact of choice biases on the expectancy value of a choice when a stimulus is first presented, which is when uncertainty is likely maximal. This model included a separately estimated bias parameter, bias(s_j), for each of the four stimuli, s_j, presented on a trial set in addition to the explore-exploit parameter, β, and the two learning rate parameters, η_p and η_n.
The First Choice Bias (Singlet) model constrained estimates of the four bias parameters to be equal to a single estimated value.
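How a bias value could set the first-trial expectancies, following the description in the Figure 1 caption, is sketched below. The dictionary keys and function names are hypothetical, and drawing from the prior with Python's random module only stands in for the Bayesian estimation the authors performed in rjags.

```python
import math
import random

PRIOR_PRECISION = 10.0                         # from the Figure 1 caption
PRIOR_SD = math.sqrt(1.0 / PRIOR_PRECISION)    # precision = 1 / variance


def initial_expectancies(bias):
    """Map a signed bias to first-trial expectancy values for the two choices."""
    if bias > 0:   # optimal choice favored
        return {"optimal": bias, "nonoptimal": 0.0}
    if bias < 0:   # nonoptimal choice favored
        return {"optimal": 0.0, "nonoptimal": abs(bias)}
    return {"optimal": 0.0, "nonoptimal": 0.0}  # no bias, as in the No Bias model


def sample_prior_bias(rng):
    """Draw one bias value from the zero-mean normal prior."""
    return rng.gauss(0.0, PRIOR_SD)


# One draw per stimulus for the full model; the Singlet variant would reuse one value.
rng = random.Random(1)
full_model_start = [initial_expectancies(sample_prior_bias(rng)) for _ in range(4)]
```

Because later updates start from these values, a nonzero bias shifts choice probabilities on the first presentation and its influence carries forward through subsequent updates.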

Considering the importance of biases in accounting for choice performance, we predicted that the First Choice Bias model would provide a better fit to the data than would the No Bias model. Once the best fitting model was chosen, we tested the hypothesis that individuals with AN would demonstrate deficient reinforcement learning as evidenced by worse optimal response accuracy on reward and punishment trials and/or poorer learning rates, η_p|n, associated with positive and negative PEs compared to HCs. Moreover, within AN, differences between accuracy on reward and punishment trials or positive and negative PEs would indicate differential sensitivity to learning from rewarding or disappointing outcomes. Exploratory analyses examined associations between learning rates, size of PEs, and AN symptom severity and clinical outcome.

METHOD

Participants

Forty-two individuals meeting criteria for DSM-5 restricting-type AN (4 also endorsed purging; mean age = 22.8, range = 16–60) were compared to 38 HC volunteers (mean age = 21.6, range = 15–32; Table 1). Individuals with AN were recruited from the University of California, San Diego Eating Disorders Treatment and Research outpatient Partial Hospitalization Program (PHP). The PHP uses a blend of family-based treatment and dialectical behavior therapy adapted for intensive treatment settings. Patients received treatment 6 to 10 h/day, 6 days/week, including individual, family, group, and multi-family therapy, nutritional counseling, psychiatric care, and medical monitoring (Brown et al., 2018; Reilly et al., 2020). AN diagnosis was determined by semi-structured interview performed by program psychiatrists at treatment admission according to 2010 draft criteria for the DSM-5 (Hebebrand & Bulik, 2011) and included atypical and partially remitted AN (BMI range: 14.5–23.8 kg/m²). HCs were recruited from the San Diego community and did not have any eating disorder symptomatology or Axis I psychiatric disorder based on a modified version of the Structured Clinical Interview for DSM-IV-TR Module H (First, Spitzer, Gibbon, & Williams, 2002) and the Mini International Neuropsychiatric Interview (Sheehan et al., 1998). See Supplement for additional exclusion criteria.

Table 1. Demographic and clinical characteristics of the sample

Note: Welch’s two sample t-tests were used to assess statistical significance for between-group differences in continuous variables. Cronbach’s alphas for all self-report measures were strong (α = .84−.99). Self-report questionnaires were completed within 16.1 days of the PALT administration.

^aTwo AN did not complete this assessment.

^bOne AN did not complete this assessment.

^cSeventeen AN were prescribed only one class of medication, 6 AN were prescribed two classes, and 2 AN were prescribed 3 classes of medication. All medications with presumed dopaminergic action fell within the atypical antipsychotic classification.

BDI = Beck Depression Inventory-Second Edition (BDI-2; Beck, Steer, & Brown, 1996); BIS/BAS = Behavioral Inhibition/Behavioral Activation Scale (Carver & White, 1994); BMI = body mass index; EDE-Q = Eating Disorder Exam – Questionnaire (Fairburn & Beglin, 1994); SPSRQ = Sensitivity to Punishment Sensitivity to Reward Questionnaire (Torrubia, Avila, Molto, & Caseras, 2001); STAI = Spielberger State-Trait Anxiety Inventory (Spielberger, Gorsuch, & Lushene, 1970); TCI = Temperament and Character Inventory (Cloninger, Przybeck, Svrakic, & Wetzel, 1994).

Procedure

AN participants completed the PALT on average 19.8 days (SD = 19.9) after treatment admission. Weight and height, measured via digital scale and stadiometer, were obtained at admission, within two days of PALT completion, and at discharge for AN, and during the task visit for HC. Self-report questionnaires to assess anxiety, depression and temperament traits common in AN (e.g., reward/punishment sensitivity, inhibition, harm avoidance) that might relate to learning behavior (Table 1) were completed within 16.1 days (SD = 18.9) of the PALT in AN (Harrison, Treasure, & Smillie, 2011; Jappe et al., 2011; Wagner et al., 2006). The study was approved by the Institutional Review Board of the University of California, San Diego, research was completed in accordance with the Helsinki Declaration, and all participants gave written informed consent and received a stipend.

Probabilistic Associative Learning Task

The PALT (Figure 2) involves receiving 25 points when choosing the optimal response on reward trials, but losing 25 points when choosing the nonoptimal response on punishment trials (Bodi et al., 2009; Mattfeld et al., 2011; Myers et al., 2013). On each trial, participants saw one of four stimulus images and were prompted to decide whether it was associated with one of two categories, “A” or “B”, corresponding to different response keys. Two images were randomly assigned to be “reward” stimuli in that selection of the optimal category typically produced feedback and a gain of 25 points, whereas selection of the nonoptimal category typically produced no gain of points. The remaining two images were “punishment” stimuli in that selection of the nonoptimal category typically produced feedback and a loss of 25 points, whereas selection of the optimal category typically produced no loss of points. Reward learning trials and punishment learning trials were intermixed within the task, with a favorable outcome associated with a gain on reward trials and the avoidance of loss on punishment trials. Unfavorable outcomes led to no change in points on reward trials and a loss of 25 points on punishment trials. The participant’s cumulative point tally was shown at the bottom of the screen on each trial and was initialized to 500 points at the start of the experiment. As done in prior studies (Bodi et al., 2009; Mattfeld et al., 2011), two task sets were administered, each with a different set of pictures to increase the number of trials during which participants were actively learning new associations. The order of stimulus sets was counterbalanced across participants.
Each set contained 160 trials, divided into four 40-trial blocks. Within each block, each stimulus appeared 10 times; on 8 of these trials the optimal response was associated with the more favorable outcome, whereas on the other 2 the nonoptimal response was associated with the more favorable outcome. For each participant, trial order was randomized within a block. Trials lasted until the participant responded and were separated by a 2 s interval, during which time the screen was blank. On each trial, the computer recorded whether the participant made the optimal response, regardless of the actual outcome on that trial. The task took about 30 min to complete. The experiment was administered on a MacBook Pro and programmed in MATLAB version R2016b.
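The block structure and feedback rule described above can be sketched in a few lines. This is an illustrative reconstruction, not the original MATLAB task code; the stimulus labels (R1, R2, P1, P2) and function names are hypothetical.

```python
import random

# Hypothetical labels: two reward stimuli and two punishment stimuli per set.
STIMULI = {"R1": "reward", "R2": "reward", "P1": "punishment", "P2": "punishment"}


def build_block(rng):
    """Build one 40-trial block: each stimulus 10 times, optimal response favored on 8."""
    trials = []
    for stimulus in STIMULI:
        trials += [(stimulus, True)] * 8 + [(stimulus, False)] * 2
    rng.shuffle(trials)  # trial order randomized within a block
    return trials


def points_change(trial_type, chose_optimal, optimal_is_favored):
    """Return the change in points for a response on a given trial."""
    favorable = chose_optimal == optimal_is_favored
    if trial_type == "reward":
        return 25 if favorable else 0   # gain versus no change
    return 0 if favorable else -25      # avoided loss versus loss


block = build_block(random.Random(0))
```

Always choosing the optimal category therefore yields the favorable outcome on 80% of a stimulus's trials within a block, which is why accuracy is scored against the optimal response rather than the outcome actually received.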

Fig. 2. Probabilistic associative learning task (copied with permission from Mattfeld et al., 2011).

Computational Reinforcement Learning Models

Like Confirmatory Factor Analysis, computational models of cognitive processes embody assumptions about a model’s architecture and parameters that determine how observed data are related to latent processes. Whereas the assumptions fix the architecture of a model, varying the model’s parameters can fine-tune the model’s functioning (Farrell & Lewandowsky, 2018). Parameters estimated for each of the three models are listed in Table 2 and discussed in more detail in the caption of Figure 1 and in Supplemental Materials. To operationalize PE size, outcome was coded 1 for gains on reward trials, −1 for loss on punishment trials, and 0 for no change in points. Successful learning drives the expectancy values toward gains, coded 1, on reward trials and toward avoidance of loss, coded 0, on punishment trials. The No Bias model allowed positive and negative PE learning rate parameters, η_p and η_n, and the explore-exploit parameter, β, to vary and set initial expectancy values to zero. The First Choice Bias model (Figure 1) allowed β, η_p and η_n to vary, but also included four parameters that determined the initial expectancy values of choices made to each of the four stimuli in order to account for choice biases. Given how expectancy values are updated, the impact of these biases propagates to subsequent trials. The First Choice Bias (Singlet) model set the four bias parameters to the same estimated value. The full First Choice Bias model was selected as the best fitting model as assessed by deviance information criterion weights (see Supplement).
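The outcome coding described above can be made concrete with a short sketch. The function names are hypothetical, and the last example reflects one reading of the coding scheme rather than a statement from the authors: once a loss is expected on a punishment trial, avoiding it produces a positive PE.

```python
def code_outcome(points_change):
    """Map a change in points (+25, 0, -25) to the coded outcome (1, 0, -1)."""
    if points_change > 0:
        return 1
    if points_change < 0:
        return -1
    return 0


def prediction_error(points_change, expectancy):
    """Coded outcome received minus the current expectancy value."""
    return code_outcome(points_change) - expectancy


pe_gain = prediction_error(25, 0.0)      # gain on a reward trial, nothing expected
pe_loss = prediction_error(-25, 0.0)     # loss on a punishment trial, nothing expected
pe_avoided = prediction_error(0, -0.4)   # expected loss avoided on a punishment trial
```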

Table 2. Parameters estimated for each of the three models and their prior distributions

Note. Parameters η_p and η_n represent the learning rates for positive and negative prediction errors, respectively. Parameter Bias_r1 is the bias weight for the first reward stimulus; Bias_r2 is the bias weight for the second reward stimulus; Bias_p1 is the bias weight for the first punishment stimulus; Bias_p2 is the bias weight for the second punishment stimulus. ∼ signifies “distributed as.” The Gaussian distribution in rjags is parameterized as mean and precision, where precision = 1/variance.

Parameter estimation

We used the R routine rjags to generate Bayesian estimates of model parameters based on fits to trial-by-trial optimal response data for each stimulus (Plummer, 2017). See Supplement for details and model sensitivity analysis. The predicted block means for reward and punishment trials based on parameter estimates for the best fitting model are presented in Figure 3.

Fig. 3. Plots of the observed and predicted mean probability of selecting the optimal choice for AN and HC groups across the four blocks by trial type (reward, punishment) and picture set. We calculated for each participant the predicted block means for reward and punishment trials based on the participant’s full First Choice Bias model parameter estimates and present the average of these means for AN and HC groups for the two picture sets as black squares. As can be seen, in every instance the model-derived means are within the 95% confidence interval of the observed means, and most cover the data means, supporting the prediction model. (A) For observed data, on reward trials, results indicate improved performance over time across all participants, consistent with learning [main effect of Block, F(3,225) = 41.482, p < .001, η²_p = .356], and the HC group had a greater learning rate overall than the AN group [Group × Block interaction, F(3,225) = 5.771, p = .001, η²_p = .071]. However, AN performed better than HC on Set 1 and worse than HC on Set 2 [Group × Set interaction, F(1,75) = 5.556, p = .021, η²_p = .069]. No other main effects or interactions were significant for reward trials, ps > .3. (B) On punishment trials, performance improved over time across all participants [main effect of Block, F(3,225) = 3.711, p = .012, η²_p = .047], and HC performed better than AN [main effect of Group, F(1,75) = 6.833, p = .011, η²_p = .083]. No other main effects or interactions were significant for punishment trials, ps > .1.

Statistical Analysis

Behavioral performance

Choice behavior was analyzed using a repeated measures analysis of variance (rmANOVA) on optimal response accuracy with Group as a between subjects effect and Block and Set as within subject effects, separately for reward trials and punishment trials.

Model-generated parameters

Analyses were performed separately for reward and punishment trials. To compare groups on learning rate parameters, we performed a rmANOVA with Group as a between-subjects effect and Set and PE learning rates (η_p, η_n) as within-subject effects. We also performed a Group × Set rmANOVA to investigate group differences in the β parameter. To investigate the bias parameters, we averaged the two bias values for reward stimuli and the two bias values for punishment stimuli, then performed a Group × Set rmANOVA. To more completely examine group differences in level of learning from a PE perspective, we averaged the size of PEs over trials, separating values by PE type (positive or negative) within reward and punishment trials for each set (e.g., mean negative PE for punishment trials on Set 1), and submitted these means to Group × Set × PE type rmANOVAs.

Exploratory clinical associations

To examine whether standard clinical assessments are associated with learning in AN, Pearson correlational analyses examined relationships between 14 reinforcement learning model values (for each set: η_p, η_n, positive and negative PEs for each trial type, and β) and 9 AN clinical measures (age, admission BMI, EDE-Q Global score, TCI Harm Avoidance, TCI Novelty Seeking, BIS/BAS, SPSRQ, STAI, BDI) at time of study. To examine associations with treatment outcome, reinforcement learning model values were explored as predictors of BMI at discharge using hierarchical linear regression analyses, controlling for BMI at treatment admission, length of treatment, and medication status. The hierarchical linear regression analysis was repeated using each self-reported clinical measure as a predictor. Bonferroni correction for multiple comparisons was used to set the per-test significance threshold for the 14 learning model values (.05/14 ≈ .004) and the 9 clinical measures (.05/9 ≈ .006), holding the family-wise error rate at .05.
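The two corrected thresholds quoted above follow from dividing .05 by the number of tests in each family. A minimal sketch of that arithmetic is given here; the function name is hypothetical and is not taken from the authors' analysis code.

```python
def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Family sizes as stated in the text: 14 model values, 9 clinical measures.
model_threshold = bonferroni_threshold(0.05, 14)
clinical_threshold = bonferroni_threshold(0.05, 9)

# Rounded to three decimals, as the thresholds are reported in the text.
print(f"{model_threshold:.3f} {clinical_threshold:.3f}")
```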

Sensitivity analyses

To examine the potential impact of low weight and medication status on our results, we compared AN participants with a BMI below 18.5 kg/m² (n = 25; 59.5% of sample) to AN participants with a BMI above 18.5 kg/m² (n = 17; 40.5% of sample), and AN participants on medication (n = 25; 61% of sample) to AN participants not on medication (n = 16; 39% of sample) on clinical measures using Welch’s two-sample t-tests, and repeated the rmANOVAs described above for each subsample. Small samples precluded analysis of medication class (Table 1).

RESULTS

Sample Characteristics

AN and HC groups did not differ in age or education (Table 1). AN had significantly lower current BMI (p < .001). In AN, there was a significant increase in BMI from treatment admission to discharge (t(39) = 7.9, p < .001, Cohen’s d = 1.0).

Behavioral Performance

A Group × Block × Set rmANOVA on optimal responses for reward trials revealed a main effect of Block, indicating increased accuracy over time across all participants, consistent with learning, F(3,225) = 41.482, p < .001, η²_p = .356 (Figure 3A). We detected a Group × Block interaction, corresponding to faster learning rates in the HC group compared to AN, F(3,225) = 5.771, p = .001, η²_p = .071. A Group × Set interaction indicated AN were more accurate than HC on Set 1, but less accurate than HC on Set 2, F(1,75) = 5.556, p = .021, η²_p = .069.

For punishment trials, a Group × Block × Set rmANOVA revealed a main effect of Block, indicating increased accuracy over time, F(3,225) = 3.711, p = .012, η²_p = .047 (Figure 3B). A main effect of Group indicated AN performed worse than HC, F(1,75) = 6.833, p = .011, η²_p = .083. Taken together, both groups demonstrated greater accuracy over time (i.e., learning) for reward and punishment trials. Compared to HC, AN had slower overall learning on reward trials, with better overall accuracy on Set 1 and worse accuracy on Set 2 (possibly suggesting greater difficulty set-shifting and learning new associations; see Filoteo et al., 2014), and were less accurate across punishment trials.
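As a consistency check on the behavioral effect sizes, partial eta squared can be recovered from each F statistic and its degrees of freedom using the standard identity η²_p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the values reported in this section; the function name and the labels are illustrative and are not part of the authors' pipeline.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for the behavioral rmANOVAs.
reported = [
    ("reward: Block", 41.482, 3, 225),
    ("reward: Group x Block", 5.771, 3, 225),
    ("reward: Group x Set", 5.556, 1, 75),
    ("punishment: Block", 3.711, 3, 225),
    ("punishment: Group", 6.833, 1, 75),
]

for label, f_value, df_effect, df_error in reported:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.3f}")
```

Each recovered value can be compared against the effect size printed next to the corresponding F statistic.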

Model Generated Parameters

Prediction error learning rates (η)

A Group × Set × PE learning rate type (η_p vs. η_n) rmANOVA revealed a main effect of Group, indicating that AN learned more slowly than HC following both positive PEs and negative PEs, F(1,75) = 5.521, p = .021, η²_p = .069 (Table 3; Figure 4A). A main effect of PE type revealed faster learning rates following positive PEs compared to negative PEs across the entire sample, F(1,75) = 78.792, p < .001, η²_p = .512. That is, faster learning occurred when the outcomes were better than expected relative to when the outcomes were worse than expected.
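To make the role of the two learning rates concrete, the following is a minimal sketch of a generic delta-rule update in which the learning rate depends on the sign of the PE. It is an assumption-laden illustration, not the authors' full model (which also includes bias parameters and Bayesian estimation); the function names, outcome sequence, and rate values are hypothetical and are not taken from Table 3.

```python
def update_expectancy(expectancy, outcome, eta_pos, eta_neg):
    """One delta-rule step with separate learning rates by PE sign."""
    pe = outcome - expectancy              # prediction error
    eta = eta_pos if pe >= 0 else eta_neg  # rate depends on the sign of the PE
    return expectancy + eta * pe, pe


def simulate(outcomes, eta_pos, eta_neg, start=0.0):
    """Track the expectancy for one stimulus across a fixed outcome sequence."""
    expectancy = start
    trace = []
    for outcome in outcomes:
        expectancy, _ = update_expectancy(expectancy, outcome, eta_pos, eta_neg)
        trace.append(expectancy)
    return trace


# Hypothetical reward-trial outcomes (1 = reward, 0 = no reward); not study data.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
faster = simulate(outcomes, eta_pos=0.40, eta_neg=0.10)  # illustrative rates
slower = simulate(outcomes, eta_pos=0.20, eta_neg=0.05)  # illustrative rates

print([round(v, 3) for v in faster])
print([round(v, 3) for v in slower])
```

Under these assumptions, the learner with the lower rates moves its expectancy toward the outcome more slowly after the same sequence of PEs, which is the sense in which a lower η slows learning.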

Table 3. Reinforcement learning model generated parameters by group and set

Note: PE: prediction error; η_p: learning rate for positive PE; η_n: learning rate for negative PE; β: “inverse temperature” parameter representing the balance between exploring new choice rules and exploiting the rules learned. Two HC and one AN did not complete Set 2.

Fig. 4. (A) Plot of the mean learning rate by prediction error type and group collapsed across set demonstrating the main effect of Group resulting from the Group × Set × PE type ANOVA. The main effect of Group indicated that AN learn more slowly than HC following both positive PEs and negative PEs. A main effect of PE type revealed faster learning rates following positive PEs compared to negative PEs across the entire sample. Neither the main effect of Set nor any interactions were significant (all η²_p < .039). (B) Plot of explore-exploit values by group and set showing a main effect of Group. AN had lower β values than HC. Smaller values imply individuals are still exploring stimulus-response-outcome hypotheses and are less certain about exploiting learned rules. The main effect of Set was not significant, nor was the interaction of Group × Set (all η²_p < .030). (C) Plot of the change in BMI from admission to discharge with size of negative PE on punishment trials of Set 1. Error bars represent standard error of the mean; *p < .05, **p < .01, ***p < .001.

Prediction error size

To directly examine whether groups might have differed in accuracy as a result of better than or worse than expected outcomes on reward and punishment trials, we submitted average PE size to Group × Set × PE type rmANOVAs. These revealed no effects involving Group for reward trials (all η²_p < .025) or for punishment trials (all η²_p < .045).

Explore-exploit strategy (β)

A Group × Set rmANOVA for the explore-exploit parameter, β, revealed a main effect of Group, whereby AN had smaller β values than HC, F(1,75) = 6.366, p = .014, η²_p = .078 (Table 3; Figure 4B). Since smaller values imply individuals are exploring more than exploiting stimulus-response-outcome hypotheses, results indicate AN may make choices less decisively.
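The link between smaller β and less decisive choice can be illustrated with a two-option softmax rule, in which β acts as an inverse temperature on the difference between learned values. The sketch below assumes that standard form; the exact choice rule used by the authors is specified in their Methods and Supplement, and the values and β settings here are hypothetical rather than estimates from Table 3.

```python
import math


def p_optimal(value_optimal, value_other, beta):
    """Softmax (logistic) probability of the optimal choice among two options."""
    return 1.0 / (1.0 + math.exp(-beta * (value_optimal - value_other)))


# Hypothetical learned values and beta settings; not estimates from the study.
for beta in (0.5, 2.0, 8.0):
    print(beta, round(p_optimal(0.7, 0.3, beta), 3))
```

With the learned values held fixed, the probability of the optimal choice moves toward .5 as β shrinks, so a learner with smaller β continues to sample the alternative even when one option has the higher learned value.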

Choice bias parameters

To assess whether groups differed in the degree to which early reward and punishment reinforcement trials reflected choice biases, we submitted bias values to Group × Set rmANOVAs. The Group × Set interaction was significant only for reward trials, indicating that HC had a greater bias against making the optimal choice on Set 1, whereas AN had a greater bias against making the optimal choice on Set 2, F(1,75) = 10.651, p = .002, η²_p = .124 (Table 3; Figure S10). This is consistent with the behavioral response data indicating that AN outperformed HC on Set 1 and performed worse than HC on Set 2 on reward trials. No significant effects of choice bias were detected for punishment trials (all η²_p < .018).

Exploratory Clinical Associations

No associations between reinforcement learning model parameters and clinical variables were detected in AN (uncorrected p < .05). Separate hierarchical linear regression models indicated the size of positive PE and of negative PE on punishment trials in Set 1 significantly added to the prediction of discharge BMI controlling for admission BMI, treatment length, and medication status (positive PE: multiple R² = .56, F_change(1,34) = 9.528, p = .004; negative PE: multiple R² = .62, F_change(1,34) = 15.901, p < .001). Both models remained significant after Bonferroni correction.

To test whether both positive and negative PE predicted a portion of the change in BMI with treatment, we entered both into the regression model (multiple R² = .64, F_change(2,33) = 8.546, p = .001). Negative PE (Beta = −.348, t = −2.475, p = .019) more potently predicted discharge BMI than did positive PE (Beta = −.141, t = −1.063, p = .296) (Figure 4C). In other words, AN with smaller negative PE on punishment trials on Set 1, i.e., values closer to −1.0, gained the most weight. Negative PE will approach −1 on punishment trials when successful performers learn to expect outcomes that are close to the favorable outcome, coded 0, but instead receive an unfavorable outcome, coded −1. The eight AN participants with negative PE between −.85 and −1.0 in fact had an average expectancy of 0.013 on punishment trials when negative PE occurred (range for entire sample: −.467 to .545) (see Supplement). Moreover, on punishment trials where negative PE occurred, the regression of expectancy values onto negative PE produced a significant negative regression weight (b = −.419, p = .048), implying that AN participants with larger negative PE (i.e., closer to zero) had more negative expectancies about avoiding loss.
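The relation between expectancy and the size of the negative PE on punishment trials can be sketched as follows, assuming the PE is simply the outcome minus the expectancy with the outcome coding given in the text (0 = favorable, −1 = unfavorable). The function name and the expectancy values are hypothetical and chosen only to show the direction of the relation.

```python
def prediction_error(outcome, expectancy):
    """Prediction error as outcome minus expectancy (assumed form)."""
    return outcome - expectancy


UNFAVORABLE = -1.0  # coding stated in the text for a loss on punishment trials

# Hypothetical expectancies between the favorable (0) and unfavorable (-1) codes.
for expectancy in (0.0, -0.25, -0.5):
    pe = prediction_error(UNFAVORABLE, expectancy)
    print(f"expectancy {expectancy:+.2f} -> negative PE {pe:+.2f}")
```

Under this assumption, an expectancy near the favorable code yields a negative PE near −1, whereas a more negative expectancy yields a negative PE closer to zero, matching the direction of the regression weight reported above.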

Sensitivity Analyses

As expected, the low weight group had lower BMI at admission, time of study, and discharge (all ps < .001, all Cohen’s ds > 1.0), and showed greater change in BMI during treatment (p = .01, Cohen’s d = 1.1), but weight status groups did not differ on any other clinical measure. Medication status groups did not differ on any clinical measure, including BMI, change in BMI during treatment, length of treatment, or self-report questionnaires. The rmANOVA results from the full sample reported above were observed in the subsample contrasts. Regression results (PE on punishment trials predicting discharge BMI) were observed only in the low weight sample. Overall, sensitivity analyses suggest weight and medication status did not appreciably contribute to the full sample results.

DISCUSSION

This is the first study to apply computational models of reinforcement learning to assess learning from both reward and punishment in restricting-type AN using an instrumental probabilistic associative learning task. A unique aspect of this study is that we distinctly examined differences in instrumental reinforcement learning from better or worse than expected outcomes by deriving trial-specific PE estimates for both reward and punishment conditions. We then modeled and compared learning based on positive and negative PEs separately for reward and punishment trials to examine learning rate when a positive PE occurs (unexpectedly favorable outcome) and when a negative PE occurs (unexpectedly disappointing outcome). Model-based results indicated that both HC and AN learn better following positive PEs compared to negative PEs. Consistent with our hypotheses, individuals with AN have lower learning rates for positive and negative PEs compared to HC. This indicates that AN learn less than HCs from the same PE, slowing their learning of favorable choices. This deficit in learning to predict the most favorable choice was also evidenced in their optimal choice performance by a flatter learning curve on reward trials and by fewer optimal responses on punishment trials. These results are consistent with previous work showing poorer learning performance from reward-based feedback in ill AN (Foerde & Steinglass, 2017) and extend these findings to learning from loss-based feedback. Deficits in learning from punishment could help explain the rigid persistence of disordered eating behaviors despite negative consequences.

The degree to which cognitive inflexibility and difficulty set-shifting in AN contribute to altered reinforcement learning remains to be determined; assessing reversal learning may inform this issue. The lower explore-exploit β values observed in the AN group suggest that poor learning was not due to perseverative responding. Lower β values indicate that individuals with AN were less decisive about exploiting what they had learned and continued to explore stimulus-response outcomes rather than employing the same strategy across all trials, regardless of whether they were aware of the strategy employed. Clinically, AN is characterized by increased sensitivity to uncertainty (Kesby, Maguire, Brownlow, & Grisham, 2017). It is possible that diminished certainty in exploiting what they learned is secondary to uncertainty in the task contingencies, although this was not directly tested.

In addition to comparing groups on response accuracy and rate of learning, we also examined the size of PE as a determinant of learning level. Counter to our hypotheses, no group differences in magnitude of positive and negative PEs within reward or punishment trials were detected. However, within the AN group, the magnitude of negative PE when punishment was possible was most strongly associated with treatment outcome. Moreover, larger negative PEs were associated with more negative expectations on punishment trials, suggesting that AN individuals who gained the least amount of weight during the course of treatment held negative expectancies about avoiding loss on punishment trials. This negative expectancy is consistent with reports of elevated punishment sensitivity, increased lose-shift behavior on a reversal learning task (Geisler et al., 2017), negative interpretation bias for ambiguous social stimuli that involve the risk of rejection, and tendency to resolve ambiguity in a negative manner in AN (Cardi, Di Matteo, Gilbert, & Treasure, 2014; Cardi, Di Matteo, Corfield, & Treasure, 2012; Cardi et al., 2017). No other learning parameter or clinical measure predicted BMI change during treatment, and PEs were not associated with self-report measures of sensitivity to reward or punishment, suggesting that this learning metric may be a particularly sensitive prognostic indicator.

Other studies have observed a relationship between reward PE brain response and weight gain in AN (DeGuzman, Shott, Yang, Riederer, & Frank, 2017; Frank et al., 2018); for example, elevated absolute PE (positive and negative PE combined) response in the caudate, orbitofrontal cortex and insula has been associated with less weight gain during inpatient treatment. Taken together, our behavioral findings further support the role of altered PE in the pathophysiology of AN, extending prior findings to include operant learning in response to both reward and punishment, and are consistent with the hypothesis that a failure to appropriately modify expectancies may contribute to poor outcome.

Strengths of this study include novel aspects and refinements of the reinforcement learning model, which included modeling segregated learning for each of the four stimuli within a set, adding parameters to account for choice biases rapidly acquired on early trials, performing Bayesian estimates of model parameters for each subject, and modeling separate positive and negative PE learning rate parameters. However, reinforcement learning models are inherently limited by the parameters included in the model. While our models demonstrated good fit to the behavioral data, future work may consider testing models with additional parameters, such as a stickiness (or perseveration) parameter (Palminteri, Khamassi, Joffily, & Coricelli, 2015). To increase generalizability, we did not exclude participants for medication use or comorbidities. Prior studies in major depressive disorder (MDD) report worse learning to reward (Herzallah et al., 2017), and that SSRI antidepressants impair learning from negative feedback (Herzallah et al., 2013). Notably, 50% of our sample was prescribed antidepressants, and 20% of our sample had a comorbid MDD diagnosis. Although our sensitivity analysis suggests medication status did not contribute to overall results, larger, controlled studies are needed to examine the effects of these clinical variables on reinforcement learning. We also do not have neuropsychological data to characterize the general cognitive function of participants; however, groups did not differ on reaction time on the PALT (see Supplement), suggesting the AN group did not have slowed processing speed indicative of cognitive impairment or medication effects. Thus, it is unlikely that differences in reward/punishment learning in AN are reflective of broader cognitive impairment. Lastly, change in BMI is just one metric of treatment outcome; limited data on cognitive symptoms prevented analysis of other outcome measures.

Conclusions

Both AN and HC groups learned better following unexpectedly favorable outcomes (positive PEs) than unexpectedly disappointing outcomes (negative PEs), suggesting that maximizing positive PEs may potentiate learning in general. Moreover, individuals with AN demonstrated slower learning from both positive and negative experience compared to HC. Additionally, within AN, larger negative PEs (closer to zero) on punishment trials were associated with worse treatment outcome. Treatments that modify negative expectations about avoiding loss, or the perceived value of the outcomes themselves, either with medication or cognitive-behavioral strategies, may be effective in promoting recovery. Overall, findings support the potential of applying computational approaches to reinforcement learning in AN to enhance mechanistic explanations of behavior, identify new neurobehavioral constructs relevant to psychopathology and advance treatment development through target identification.

Supplementary material

For supplementary material accompanying this paper visit https://doi.org/10.1017/S1355617721001326

ACKNOWLEDGMENTS

We thank Noriko Coburn, Sarah Kouzi, Danika Peterson, and Emily Romero for assistance with participant screening and data collection. In addition, we thank the individuals who participated in this study for their time.

FINANCIAL SUPPORT

This work was supported in part by grants from the National Institute of Mental Health (R01MH113588 to ABG & CEW, R21MH118409 to CEW). The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official view of the NIH.

CONFLICT OF INTEREST

None of the authors have conflicts of interest to disclose.

ETHICAL STANDARDS

The study was approved by the Institutional Review Board of the University of California, San Diego, research was completed in accordance with the Helsinki Declaration, and all participants gave written informed consent and received a stipend.

References

American Psychiatric Association (2000). Diagnostic & Statistical Manual of Mental Disorders: DSM-IV-TR (4th ed.). Washington, DC: American Psychiatric Association.

Beck, A., Steer, R., & Brown, G. (1996). Beck Depression Inventory–Second Edition. Manual. San Antonio, TX: The Psychological Corporation.

Bernardoni, F., Geisler, D., King, J.A., Javadi, A.H., Ritschel, F., Murr, J., … Ehrlich, S. (2018). Altered medial frontal feedback learning signals in anorexia nervosa. Biological Psychiatry, 83(3), 235–243. doi: 10.1016/j.biopsych.2017.07.024.

Beylergil, S.B., Beck, A., Deserno, L., Lorenz, R., Rapp, M., Schlagenhauf, F., … Obermayer, K. (2017). Dorsolateral prefrontal cortex contributes to the impaired behavioral adaptation in alcohol dependence. Neuroimage Clinical, 15, 80–94. doi: 10.1016/j.nicl.2017.04.010.

Bischoff-Grethe, A., McCurdy, D., Grenesko-Stevens, E., Irvine, L., Wagner, A., Yau, W.-Y., … Kaye, W. (2013). Altered brain response to reward and punishment in adolescents with anorexia nervosa. Psychiatry Research Neuroimaging, 214(3), 331–340. doi: 10.1016/j.pscychresns.2013.07.004.

Bodi, N., Keri, S., Nagy, H., Moustafa, A., Myers, C.E., Daw, N., … Gluck, M. (2009). Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson’s patients. Brain, 132(Pt 9), 2385–2395. doi: 10.1093/brain/awp094.

Brooks, S., Rask-Andersen, M., Benedict, C., & Schioth, H. (2012). A debate on current eating disorder diagnoses in light of neurobiological findings: is it time for a spectrum model? BMC Psychiatry, 12, 76. doi: 10.1186/1471-244X-12-76.

Brown, T.A., Cusack, A., Anderson, L.K., Trim, J., Nakamura, T., Trunko, M.E., & Kaye, W.H. (2018). Efficacy of a partial hospital programme for adults with eating disorders. European Eating Disorders Review, 26(3), 241–252. doi: 10.1002/erv.2589.

Cardi, V., Di Matteo, R., Gilbert, P., & Treasure, J. (2014). Rank perception and self-evaluation in eating disorders. The International Journal of Eating Disorders, 47(5), 543–552. doi: 10.1002/eat.22261.

Cardi, V., Di Matteo, R., Corfield, F., & Treasure, J. (2012). Social reward and rejection sensitivity in eating disorders: An investigation of attentional bias and early experiences. The World Journal of Biological Psychiatry, 14(8), 622–633. doi: 10.3109/15622975.2012.665479.

Cardi, V., Turton, R., Schifano, S., Leppanen, J., Hirsch, C., & Treasure, J. (2017). Biased interpretation of ambiguous social scenarios in anorexia nervosa. European Eating Disorders Review, 25(1), 60–64. doi: 10.1002/erv.2493.

Carver, C. & White, T. (1994). Behavioral inhibition, behavioral activation, and affective responses to impending reward and punishment: the BIS/BAS Scales. Journal of Personality and Social Psychology, 67, 319–333. doi: 10.1037/0022-3514.67.2.319.

Cloninger, C., Przybeck, T., Svrakic, D., & Wetzel, R. (1994). The Temperament and Character Inventory (TCI): A Guide to Its Development and Use (Vol. 2, Chapter 4, pp. 19–28). St. Louis, MO: Center for Psychobiology of Personality, Washington University, ISBN 0-9642917-1-1.

Daw, N.D. (2011). Trial-by-trial data analysis using computational models. In Delgado, M.R., Phelps, E.A., & Robbins, T.W. (Eds.), Decision making, affect, and learning, attention and performance, Vol. XXIII (pp. 5–38). Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780199600434.003.0001.

DeGuzman, M., Shott, M., Yang, T., Riederer, J., & Frank, G. (2017). Association of elevated reward prediction error response with weight gain in adolescent anorexia nervosa. American Journal of Psychiatry, 174(6), 557–565. doi: 10.1176/appi.ajp.2016.16060671.

Fairburn, C.G. & Beglin, S. (1994). Assessment of eating disorders: interview or self-report questionnaire? The International Journal of Eating Disorders, 16, 363–370. doi: 10.1002/1098-108X(199412)16:4<363::AID-EAT2260160405>3.0.CO;2-#.

Farrell, S. & Lewandowsky, S. (2018). Computational Modeling of Cognition and Behavior. New York: Cambridge University Press.

Filoteo, J., Paul, E., Ashby, F., Frank, G., Helie, S., Rockwell, R., … Kaye, W. (2014). Simulating category learning and set shifting deficits in patients weight-restored from anorexia nervosa. Neuropsychology, 28(5), 741–751. doi: 10.1037/neu0000055.

First, M., Spitzer, R., Gibbon, M., & Williams, J. (2002). Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Patient Edition (SCID-I/P). New York: Biometrics Research, New York State Psychiatric Institute.

Fladung, A., Schulze, U., Scholl, F., Bauer, K., & Gron, G. (2013). Role of the ventral striatum in developing anorexia nervosa. Translational Psychiatry, 3, e315. doi: 10.1038/tp.2013.88.

Foerde, K. & Steinglass, J. (2017). Decreased feedback learning in anorexia nervosa persists after weight restoration. The International Journal of Eating Disorders, 50(4), 415–423. doi: 10.1002/eat.22709.

Frank, G., Collier, S., Shott, M., & O’Reilly, R. (2016). Prediction error and somatosensory insula activation in women recovered from anorexia nervosa. The Journal of Psychiatry & Neuroscience, 41(2), 304–311. doi: 10.1503/jpn.150103.

Frank, G., DeGuzman, M., Shott, M., Laudenslager, M., Rossi, B., & Pryor, T. (2018). Association of brain reward learning response with harm avoidance, weight gain, and hypothalamic effective connectivity in adolescent anorexia nervosa. JAMA Psychiatry, 75(10), 1071–1080. doi: 10.1001/jamapsychiatry.2018.2151.

Frank, G., Reynolds, J., Shott, M., Jappe, L., Yang, T., Tregellas, J., & O’Reilly, R. (2012). Anorexia nervosa and obesity are associated with opposite brain reward response. Neuropsychopharmacology, 37(9), 2031–2046. doi: 10.1038/npp.2012.51.

Fritsche, M., Mostert, P., & de Lange, F. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590–595. doi: 10.1016/j.cub.2017.01.006.

Garcia-Perez, M. & Alcala-Quintana, R. (2013). Shifts of the psychometric function: Distinguishing bias from perceptual effects. Quarterly Journal of Experimental Psychology, 66(3), 319–337. doi: 10.1080/17470218.2012.708761.

Geisler, D., Ritschel, F., King, J., Bernardoni, F., Seidel, M., Boehm, I., … Ehrlich, S. (2017). Increased anterior cingulate cortex response precedes behavioural adaptation in anorexia nervosa. Scientific Reports, 7, 42066. doi: 10.1038/srep42066.

Gershman, S. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6. doi: 10.1016/j.jmp.2016.01.006.

Glashouwer, K., Bloot, L., Veenstra, E., Franken, I., & de Jong, P. (2014). Heightened sensitivity to punishment and reward in anorexia nervosa. Appetite, 75, 97–102. doi: 10.1016/j.appet.2013.12.019.

Glimcher, P. (2011). Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences of the United States of America, 108(Suppl 3), 15647–15654. doi: 10.1073/pnas.1014269108.

Gold, J. & Ding, L. (2013). How mechanisms of perceptual decision-making affect the psychometric function. Progress in Neurobiology, 103, 98–114. doi: 10.1016/j.pneurobio.2012.05.008.

Harrison, A., O’Brien, N., Lopez, C., & Treasure, J. (2010). Sensitivity to reward and punishment in eating disorders. Psychiatry Research, 177(1–2), 1–11. doi: 10.1016/j.psychres.2009.06.010.

Harrison, A., Treasure, J., & Smillie, L. (2011). Approach and avoidance motivation in eating disorders. Psychiatry Research, 188(3), 396–401. doi: 10.1016/j.psychres.2011.04.022.

Hebebrand, J. & Bulik, C. (2011). Critical appraisal of the provisional DSM-5 criteria for anorexia nervosa and an alternative proposal. The International Journal of Eating Disorders, 44(8), 665–678. doi: 10.1002/eat.20875.

Herzallah, M., Khdour, H., Taha, A., Elmashala, A., Mousa, H., Taha, M., … Gluck, M. (2017). Depression reduces accuracy while Parkinsonism slows response time for processing positive feedback in patients with Parkinson’s Disease with comorbid major depressive disorder tested on a Probabilistic Category-Learning Task. Frontiers in Psychiatry, 8, 84. doi: 10.3389/fpsyt.2017.00084.

Herzallah, M., Moustafa, A., Natsheh, J., Abdellatif, S., Taha, M., Tayem, Y., … Gluck, M. (2013). Learning from negative feedback in patients with major depressive disorder is attenuated by SSRI antidepressants. Frontiers in Integrated Neuroscience, 7, 67. doi: 10.3389/fnint.2013.00067.

Jappe, L., Frank, G., Shott, M., Rollin, M., Pryor, T., Hagman, J., … Davis, E. (2011). Heightened sensitivity to reward and punishment in anorexia nervosa. The International Journal of Eating Disorders, 44(4), 317–324. doi: 10.1002/eat.20815.

Kaye, W., Wierenga, C., Knatz, S., Liang, J., Boutelle, K., Hill, L., & Eisler, I. (2015). Temperament-based treatment for anorexia nervosa. European Eating Disorders Review, 23(1), 12–18. doi: 10.1002/erv.2330.

Keating, C., Tilbrook, A., Rossell, S., Enticott, P., & Fitzgerald, P. (2012). Reward processing in anorexia nervosa. Neuropsychologia, 50(5), 567–575. doi: 10.1016/j.neuropsychologia.2012.01.036.

Kesby, A., Maguire, S., Brownlow, R., & Grisham, J. (2017). Intolerance of uncertainty in eating disorders: An update on the field. Clinical Psychology Review, 56, 94–105. doi: 10.1016/j.cpr.2017.07.002.

Linares, D., Aguilar-Lleyda, D., & Lopez-Moliner, J. (2019). Decoupling sensory from decisional choice biases in perceptual decision making. eLife, 8, e43994. doi: 10.7554/eLife.43994.

Mattfeld, A., Gluck, M., & Stark, C. (2011). Functional specialization within the striatum along both the dorsal/ventral and anterior/posterior axes during associative learning via reward and punishment. Learning & Memory, 18(11), 703–711. doi: 10.1101/lm.022889.111.

Matton, A., Goossens, L., Braet, C., & Vervaet, M. (2013). Punishment and reward sensitivity: Are naturally occurring clusters in these traits related to eating and weight problems in adolescents? European Eating Disorders Review, 21, 184–194. doi: 10.1002/erv.2226.

Monteleone, A., Monteleone, P., Esposito, F., Prinster, A., Volpe, U., Cantone, E., … Maj, M. (2017). Altered processing of rewarding and aversive basic taste stimuli in symptomatic women with anorexia nervosa and bulimia nervosa: An fMRI study. Journal of Psychiatric Research, 90, 94–101. doi: 10.1016/j.jpsychires.2017.02.013.

Morgan, M., Dillenburger, B., Raphael, S., & Solomon, J. (2012). Observers can voluntarily shift their psychometric functions without losing sensitivity. Attention, Perception & Psychophysics, 74(1), 185–193. doi: 10.3758/s13414-011-0222-7.

Moustafa, A., Gluck, M., Herzallah, M., & Myers, C. (2015). The influence of trial order on learning from reward vs. punishment in a probabilistic categorization task: experimental and computational analyses. Frontiers in Behavioral Neuroscience, 9, 153. doi: 10.3389/fnbeh.2015.00153.

Myers, C., Moustafa, A., Sheynin, J., Vanmeenen, K., Gilbertson, M., Orr, S., … Servatius, R. (2013). Learning to obtain reward, but not avoid punishment, is affected by presence of PTSD symptoms in male veterans: empirical data and computational model. PLoS One, 8(8), e72508. doi: 10.1371/journal.pone.0072508.

National Institute of Mental Health (2016). Behavioral assessment methods for RDoC constructs [Internet]. [cited 2020 Nov 28].

Oberndorfer, T., Frank, G., Fudge, J., Simmons, A., Paulus, M., Wagner, A., … Kaye, W. (2013). Altered insula response to sweet taste processing after recovery from anorexia and bulimia nervosa. American Journal of Psychiatry, 170(10), 1143–1151. doi: 10.1176/appi.ajp.2013.11111745.

O’Hara, C., Schmidt, U., & Campbell, I. (2015). A reward-centered model of anorexia nervosa: A focussed narrative review of the neurological and psychophysiological literature. Neuroscience and Biobehavioral Reviews, 52, 131–152. doi: 10.1016/j.neubiorev.2015.02.012.

Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6, 8096. doi: 10.1038/ncomms9096.

Papadopoulos, F., Ekbom, A., Brandt, L., & Ekselius, L. (2009). Excess mortality, causes of death and prognostic factors in anorexia nervosa. The British Journal of Psychiatry, 194(1), 10–17. doi: 10.1192/bjp.bp.108.054742.

Pearce, J. & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87(6), 532–552. doi: 10.1037/0033-295X.87.6.532.

Plummer, M. (2017). JAGS version 4.3.0 User Manual. https://web.sgh.waw.pl/~atoroj/ekonometria_bayesowska/jags_user_manual.pdf.

Reilly, E.E., Rockwell, R.E., Ramirez, A.L., Anderson, L.K., Brown, T.A., Wierenga, C.E., & Kaye, W.H. (2020). Naturalistic outcomes for a day-hospital programme in a mixed diagnostic sample of adolescents with eating disorders. European Eating Disorders Review, 28(2), 199–210. doi: 10.1002/erv.2716.

Rescorla, R.A. & Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In Black, A.H. & Prokasy, W.F. (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton Century Crofts.

Roberts, M., Tchanturia, K., Stahl, D., Southgate, L., & Treasure, J. (2007). A systematic review and meta-analysis of set-shifting ability in eating disorders. Psychological Medicine, 37(8), 1075–1084. doi: 10.1017/S0033291707009877.

Roberts, M., Tchanturia, K., & Treasure, J. (2010). Exploring the neurocognitive signature of poor set-shifting in anorexia and bulimia nervosa. Journal of Psychiatric Research, 44(14), 964–970. doi: 10.1016/j.jpsychires.2010.03.001.

Schaefer, L. & Steinglass, J. (2021). Reward learning through the lens of RDoC: A review of theory, assessment, and empirical findings in the eating disorders. Current Psychiatry Reports, 23(1), 2. doi: 10.1007/s11920-020-01213-9.

Schultz, W. (2016). Dopamine reward prediction error coding. Dialogues in Clinical Neuroscience, 18(1), 23–32. doi: 10.31887/DCNS.2016.18.1/wschultz.

Schultz, W., Dayan, P., & Montague, P. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. doi: 10.1126/science.275.5306.1593.

Sheehan, D.V., Lecrubier, Y., Sheehan, K.H., Amorim, P., Janavs, J., Weiller, E., … Dunbar, G.C. (1998). The Mini-International Neuropsychiatric Interview (M.I.N.I.): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(Suppl 20), 22–33; quiz 34–57.

Shott, M., Filoteo, J., Jappe, L., Pryor, T., Maddox, W., Rollin, M., … Frank, G. (2012). Altered implicit category learning in anorexia nervosa. Neuropsychology, 26(2), 191–201. doi: 10.1037/a0026771.

Sojitra, R., Lerner, I., Petok, J., & Gluck, M. (2018). Age affects reinforcement learning through dopamine-based learning imbalance and high decision noise-not through Parkinsonian mechanisms. Neurobiology of Aging, 68, 102–113. doi: 10.1016/j.neurobiolaging.2018.04.006.

Spielberger, C., Gorsuch, R., & Lushene, R. (1970). STAI Manual for the State Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.

Steinberg, E.E., Keiflin, R., Boivin, J., Witten, I., Deisseroth, K., & Janak, P. (2013). A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience, 16(7), 966–973. doi: 10.1038/nn.3413.

Strauss, M. & Smith, G. (2009). Construct validity: Advances in theory and methodology. Annual Review of Clinical Psychology, 5, 1–25. doi: 10.1146/annurev.clinpsy.032408.153639.

Sutton, R. & Barto, A. (2018). Reinforcement Learning: An Introduction (2nd ed.). Cambridge, MA: The MIT Press.

Tchanturia, K., Davies, H., Roberts, M., Harrison, A., Nakazato, M., Schmidt, U., … Morris, R. (2012). Poor cognitive flexibility in eating disorders: Examining the evidence using the Wisconsin Card Sorting Task. PLoS One, 7(1), e28331. doi: 10.1371/journal.pone.0028331.

Torrubia, R., Avila, C., Molto, J., & Caseras, X. (2001). The sensitivity to punishment and sensitivity to reward questionnaire (SPSRQ) as a measure of Gray’s anxiety and impulsivity dimensions. Personality and Individual Differences, 31(6), 837–862. doi: 10.1016/S0191-8869(00)00183-5.

Urai, A., Braun, A., & Donner, T. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications, 8, 14637. doi: 10.1038/ncomms14637.

Wagner, A., Aizenstein, H., Venkatraman, M., Fudge, J., May, J., Mazurkewicz, L., … Kaye, W.H. (2007). Altered reward processing in women recovered from anorexia nervosa. American Journal of Psychiatry, 164(12), 1842–1849. doi: 10.1176/appi.ajp.2007.07040575.

Wagner, A., Barbarich-Marsteller, N.C., Frank, G.K., Bailer, U.F., Wonderlich, S.A., Crosby, R.D., … Kaye, W.H. (2006). Personality traits after recovery from eating disorders: do subtypes differ? International Journal of Eating Disorders, 39(4), 276–284. doi: 10.1002/eat.20251.

Wierenga, C., Ely, A., Bischoff-Grethe, A., Bailer, U., Simmons, A., & Kaye, W. (2014). Are extremes of consumption in eating disorders related to an altered balance between reward and inhibition? Frontiers in Behavioral Neuroscience, 8, 410. doi: 10.3389/fnbeh.2014.00410.

Wu, M., Brockmeyer, T., Hartmann, M., Skunde, M., Herzog, W., & Friederich, H. (2014). Set-shifting ability across the spectrum of eating disorders and in overweight and obesity: a systematic review and meta-analysis. Psychological Medicine, 44(16), 3365–3385. doi: 10.1017/S0033291714000294.

Wu, M., Brockmeyer, T., Hartmann, M., Skunde, M., Herzog, W., & Friederich, H. (2016). Reward-related decision making in eating and weight disorders: A systematic review and meta-analysis of the evidence from neuropsychological studies. Neuroscience and Biobehavioral Reviews, 61, 177–196. doi: 10.1016/j.neubiorev.2015.11.017.

Fig. 1. (A) Rather than setting all expectancy values, V, to zero on the first trial on which a stimulus, sj, is presented, as in the No Bias model, the First Choice Bias model sets them either to a bias value, bias(sj), or to zero. The bias(sj) values are sampled from a normal distribution with mean zero, indicating no bias, and a precision of 10, where precision = 1/variance. If the sampled bias value for stimulus sj is positive, the choice that would yield the optimal long-term outcome is favored: its expectancy value for trial 1, V1(cOpt|sj), is set to the sampled bias value, bias(sj), whereas the expectancy value for the nonoptimal response, V1(cNonOpt|sj), is set to zero. If the sampled bias value is negative, the nonoptimal choice is favored: the expectancy value for the nonoptimal choice is set to the absolute value of the bias, whereas the expectancy value for the optimal choice is set to zero. For the First Choice Bias (Singlet) model, the bias parameter for every stimulus is set to the same estimated value, bias(s.). (B) The expectancy value for trial t + 1 associated with the choice ci made to stimulus sj on trial t, Vt+1(ci|sj), is the expectancy value on trial t updated by the product of a learning rate and the prediction error. Different learning rates, ηp|n, are estimated for positive and negative prediction errors, PEp|n. Learning rates are sampled from a beta distribution using the values of the α and β parameters listed in Table 2 (also see Supplement). A logistic equation maps the difference between the expectancy value of the choice made on trial t, Vt(ci|sj), and the value of the choice not made, $V_t(\bar{c}_i \mid s_j)$, to the probability Pt(ci|sj) of making the chosen response ci given that stimulus sj was presented on trial t. The logistic regression weight β is sampled from a gamma distribution using the values of the shape and rate parameters presented in Table 2 (also see Supplement).
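
The mechanics described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' estimation code (the reference list points to JAGS for the Bayesian fitting); it only shows the first-trial bias rule, the learning-rate update with separate rates for positive and negative prediction errors, and the logistic choice rule. The function names are hypothetical, and the definition of the prediction error as outcome minus current expectancy is an assumption taken from the standard Rescorla-Wagner form rather than from the caption itself.

```python
import math
import random


def sample_bias(precision=10.0, rng=random):
    """Draw bias(sj) from a zero-mean normal; sd = sqrt(1 / precision)."""
    return rng.gauss(0.0, math.sqrt(1.0 / precision))


def initial_values(bias):
    """Trial-1 expectancy values under the First Choice Bias rule.

    A positive bias favors the optimal choice, a negative bias favors the
    nonoptimal choice (using its absolute value); the other value is zero.
    """
    if bias >= 0:
        return {"opt": bias, "nonopt": 0.0}
    return {"opt": 0.0, "nonopt": abs(bias)}


def choice_probability(v_chosen, v_not_chosen, beta):
    """Logistic map from the expectancy difference to P(chosen response)."""
    return 1.0 / (1.0 + math.exp(-beta * (v_chosen - v_not_chosen)))


def update_value(v, outcome, eta_pos, eta_neg):
    """Return (updated expectancy, prediction error).

    Assumption: PE = outcome - v. The learning rate depends on the PE sign.
    """
    pe = outcome - v
    eta = eta_pos if pe >= 0 else eta_neg
    return v + eta * pe, pe
```

With β = 0 the choice probability is 0.5 regardless of the expectancy values, and larger β makes the same value difference translate into a more deterministic choice, which is the sense in which β is read as an explore-exploit parameter.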

Table 1. Demographic and clinical characteristics of the sample

Fig. 2. Probabilistic associative learning task (copied with permission from Mattfeld et al., 2011).

Table 2. Parameters estimated for each of the four models and their prior distributions

Fig. 3. Plots of the observed and predicted mean probability of selecting the optimal choice for AN and HC groups across the four blocks by trial type (reward, punishment) and picture set. We calculated for each participant the predicted block means for reward and punishment trials based on the participant’s full First Choice Bias model parameter estimates and present the average of these means for AN and HC groups for the two picture sets as black squares. As can be seen, in every instance the model-derived means are within the 95% confidence interval of the observed means, and most cover the data means, supporting the prediction model. (A) For observed data, on reward trials, results indicate improved performance over time across all participants, consistent with learning [main effect of Block, F(3,225) = 41.482, p < .001, η2p = .356], and the HC group had a greater learning rate overall than the AN group [Group × Block interaction, F(3,225) = 5.771, p = .001, η2p = .071]. However, AN performed better than HC on Set 1 and worse than HC on Set 2 [Group × Set interaction, F(1,75) = 5.556, p = .021, η2p = .069]. No other main effects or interactions were significant for reward trials, ps > .3. (B) On punishment trials, performance improved over time across all participants [main effect of Block, F(3,225) = 3.711, p = .012, η2p = .047], and HC performed better than AN [main effect of Group, F(1,75) = 6.833, p = .011, η2p = .083]. No other main effects or interactions were significant for punishment trials, ps > .1.
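
As a cross-check on the effect sizes quoted in the Fig. 3 caption, the short sketch below recomputes each value from its F statistic and degrees of freedom. It assumes that η2p denotes partial eta squared obtained from the usual identity η2p = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical and the inputs are the numbers quoted in the caption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) triples quoted in the Fig. 3 caption.
reported = {
    "reward: Block": (41.482, 3, 225),
    "reward: Group x Block": (5.771, 3, 225),
    "reward: Group x Set": (5.556, 1, 75),
    "punishment: Block": (3.711, 3, 225),
    "punishment: Group": (6.833, 1, 75),
}

for label, (f_value, df_effect, df_error) in reported.items():
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Under that assumption, the recomputed values round to the figures given in the caption (.356, .071, .069, .047, and .083).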

Table 3. Reinforcement learning model generated parameters by group and set

Fig. 4. (A) Plot of the mean learning rate by prediction error type and group, collapsed across set, demonstrating the main effect of Group resulting from the Group × Set × PE type ANOVA. The main effect of Group indicated that AN learned more slowly than HC following both positive PEs and negative PEs. A main effect of PE type revealed faster learning rates following positive PEs compared to negative PEs across the entire sample. Neither the main effect of Set nor any interactions were significant (all η2p < .039). (B) Plot of explore-exploit values by group and set showing a main effect of Group. AN had lower β values than HC. Smaller values imply individuals are still exploring stimulus-response-outcome hypotheses and are less certain about exploiting learned rules. The main effect of Set was not significant, nor was the Group × Set interaction (all η2p < .030). (C) Plot of the change in BMI from admission to discharge against the size of negative PE on punishment trials of Set 1. Error bars represent standard error of the mean; *p < .05, **p < .01, ***p < .001.
