1 Introduction
For centuries, scholars have investigated whether humans make rational decisions (e.g., Bernoulli, 1738/1954), where “rationality” is defined as the conformity of choices to an axiomatic system of preferences (e.g., von Neumann & Morgenstern, 1947; Luce, 1959). Across different situations, human decision makers seem to violate every single principle of economic rationality (e.g., Müller-Trede et al., 2015; see Rieskamp et al., 2006, for an overview), which sparked the development of descriptively more adequate decision-making theories (e.g., Roe et al., 2001; see Busemeyer et al., 2019, for a recent review). Many of these theories aim to explain violations of independence from irrelevant alternatives (Luce, 1959), according to which preferences between any two options should be unaffected by the addition or removal of other available options. Violations of this type are called context effects. Examples include the attraction effect (Huber et al., 1982), the compromise effect (Simonson, 1989), and the similarity effect (Tversky, 1972), which are among the most-studied phenomena in the decision-making literature (e.g., Trueblood et al., 2013).
Context effects are assumed to arise due to the multi-attribute nature of choice alternatives, with their attributes being evaluated not in isolation but in comparison to one another (e.g., Noguchi & Stewart, 2018; see Spektor et al., 2021, for a recent review). For example, a hiring officer might have to decide between job candidates who are equally qualified but differ in terms of their work experience and salary expectations. All theories that rely on a multi-attribute structure assume that attribute values are exactly known and accessible to the decision maker. However, in many situations, people have to infer the properties of choice alternatives from interactions with them. These decisions from experience have been shown to differ substantially from their description-based counterparts (Wulff et al., 2018). For example, when people are faced with a decision between two described lottery options with a discrete number of monetary rewards, they choose as if they were overweighting the probabilities of unlikely events (as proposed by prospect theory; Kahneman & Tversky, 1979). In contrast, when participants have to learn about the two options from experience, the opposite pattern occurs, a phenomenon that became known as the description–experience gap in risky choice (Hertwig & Erev, 2009). Research on experience-based choices holds considerable potential for understanding how contexts affect choices and the cognitive processes underlying them, as these types of decisions provide insights not only into how people make decisions but also into which representations of the options they obtain.
For example, traditional context-effect research often relies on decisions between lottery options that are characterized by a single non-zero outcome (e.g., Herne, 1999; Tversky, 1972; Wedell, 1991; Soltani et al., 2012), where the outcomes and their corresponding probabilities span a two-dimensional attribute space. While it has been shown that classical context effects that rely on a multi-attribute structure of options can arise when people obtain such a representation from experience (Hadar et al., 2018), the required representation does not always arise (Hadar et al., 2018; Spektor et al., 2019; Ert & Lejarraga, 2018).
Despite the evidence that individuals often do not obtain such a representation in experience-based choices, their choices were nevertheless systematically influenced by the context (Spektor et al., 2019). This influence, the accentuation effect, can best be described as follows: In a choice situation in which there are no clearly superior options, options whose rewards (outcomes) are particularly different (or distinct) from other rewards (repeatedly over trials) are chosen more often compared to a choice situation in which the very same outcomes are not as different from the other rewards. For example, consider the three stocks X, Y, and Z, whose values across six months are depicted in Table 1. The values of X and Y are negatively correlated, such that when X’s value rises, the value of Y tends to decrease. If the value of Z is positively correlated with that of Y (and negatively with that of X), then the value of X is particularly distinct from the other two and is therefore perceived as more attractive (see Footnote 1). For the accentuation effect to arise, it is not necessary for the decision maker to hold the kind of multi-dimensional attribute representation of the options that is available when options are described. Choice context affects choices by making certain rewards particularly distinct (e.g., from rewards that are negatively correlated over trials).
One of the main limitations of past research on context effects in experience-based choices has been the reliance on the full-feedback paradigm (Ert & Lejarraga, 2018; Spektor et al., 2019; but see Hadar et al., 2018, who used a different paradigm), in which individuals repeatedly make consequential choices between multiple options and receive feedback about the obtained and the forgone outcomes (i.e., from the options they did not choose). However, in many real-life situations, decision makers do not obtain counterfactual information about non-chosen options: for example, hiring officers might never find out how well the job candidates who were not hired would have performed had they joined the company. On the other hand, whenever forgone feedback is available, it is often highly relevant: if the hiring officers learn about the performance of the rejected job candidates, they might want to try hiring them at a later point in time. Therefore, the feedback stems from what we call relevant alternatives, as the decision maker is motivated to learn about the performance of the non-chosen alternatives (the rejected job candidates in the example) in order to try and choose (hire in the example) them in the future. However, processing the information from the chosen and non-chosen options is cognitively taxing. Imagine the effort that hiring officers would need to exert trying to follow up on the performance of rejected job candidates at other companies. It is therefore expected that people will try to reduce the cognitive costs associated with processing forgone outcomes by engaging in a heuristic process, at the cost of a potential loss of utility. For example, when tracking the performance of rejected job applicants, hiring officers might focus on those whose performance consistently differs from that of the other applicants (as in the case of the accentuation effect).
Past research has demonstrated that merely paying attention to choice options increases the propensity to choose them (Cavanagh et al., 2014; Gluth et al., 2018), so hiring officers who focus their attention on specific candidates would be more likely to hire candidates with unique profiles in the future.
The goal of the present work is to shed light on the role of information relevance in the manifestation of the accentuation effect. Specifically, we investigated whether the accentuation effect also occurs when some of the information presented about non-chosen options is not informative, that is, when it does not provide any new evidence in favor of or against choosing those non-chosen options. If this is the case, choices cannot be explained by a cost–benefit account, according to which context effects could arise as a by-product of the vast amount of information that has to be processed. Being aware of which pieces of information are informative, decision makers who maximize their rewards would ignore the ones that are not informative and focus on those that are. Foreshadowing our results, we found that individuals successfully ignored this information if it was not tied to the task they were solving. However, when information was task-related, it was processed similarly to how relevant information would be processed. This type of influence is consistent with a recently proposed learning model (Spektor et al., 2019) but not with other prominent theoretical accounts. For example, it cannot be explained by a higher decision weight for salient events (Bordalo et al., 2012) or by contextual value adaptation (Palminteri et al., 2015). Overall, our results suggest that context effects also occur in situations in which there is no need to allocate attentional resources across a large amount of information.
2 Experiment 1
In Experiment 1, we investigated the accentuation effect in a setting in which individuals did not obtain counterfactual information about non-chosen options (i.e., information about what rewards the other options would have yielded had they been chosen). However, in contrast to similar tasks, we provided individuals with reminders about what they had received from the non-chosen options when they chose them in the past. In contrast to a setting that provides counterfactual information, the additionally displayed reminders about the outcomes of non-chosen options do not contain any new information and are thus irrelevant outcomes from relevant options (relevant options since they are available for choice). Importantly, we argue that incorporating these outcomes into the preference formation of the chosen option reflects, in addition to violations of economic rationality, an inefficient allocation of cognitive resources: Since the information provided about non-chosen options is not new, a rational decision maker should have incorporated this information into her evaluation of the option’s value when that option was actually chosen. On every trial, she would fully focus on the outcome of the chosen option. Ignoring the irrelevant outcomes at the same time minimizes the amount of information that has to be processed and, therefore, the cognitive resources required. Moreover, this information is of unknown counterfactual validity to the individuals, so treating it as new information would bias their estimates.
2.1 Method
2.1.1 Participants and procedure
A total of 40 participants (29 female, 11 male, age 19–32, M = 22.62, SD = 3.34), mostly students of psychology at the University of Freiburg, with normal or corrected-to-normal vision, participated in the experiment. After giving informed consent, participants completed the experiment in individual cubicles. The procedure consisted of a demographic questionnaire, task instructions, a training block to assess learning performance, and two blocks of the experimental task (the order of which was counter-balanced across participants). In total, the experiment took approximately 45–60 min to complete and participants received the course-credit equivalent of an hour. Due to the hypothetical nature of the choices and the game-like framing of the task, we provided feedback about how many points they got in comparison to the other participants as motivation. We did not exclude any participants or trials. The behavioral data of both experiments and code for the computational models are available at https://osf.io/s52z8/.
2.1.2 Paradigm and materials
The paradigm used in the experiment was a heavily modified variant of the n-armed bandit problem (Sutton & Barto, 1998) with partial feedback. In an n-armed bandit problem, individuals repeatedly choose between (the same) n different options that provide monetary rewards according to their underlying outcome distributions. These outcome distributions are not known to the decision maker at the onset of the experiment. After each choice, decision makers obtain a realization from the respective outcome distribution, thus learning which options yield the highest rewards through trial and error. In the present experiment, each option’s outcome was the sum of three components: a systematic component, a constant (grand mean), and a noise component. The systematic component was based on three different events that occurred with certain probabilities, each associated with an option-specific outcome. Every time an event occurred, it yielded the same outcome. On top of this systematic component, a constant (or “grand mean”) was added on every trial. This grand mean differed between participants and changed multiple times during the experiment, stemming from the value ranges 25–35, 35–45, and 45–55. Finally, a non-systematic noise component from a standard normal distribution was added on top of the other two components (see Figure 1A for the information display from the perspective of the participants). We developed the paradigm so that forming an isolated value representation is complicated, whereas relating options’ trial-by-trial outcomes to past outcomes of non-chosen options is comparatively easy.
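The three-component structure of the outcomes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the event labels, and the event probabilities and option-specific components are hypothetical placeholders, and only the additive structure (systematic component, grand mean, standard-normal noise) is taken from the text.

```python
import random


def draw_event(event_probs, rng):
    """Draw one event label according to its probability."""
    labels = list(event_probs)
    weights = [event_probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def option_outcome(component, grand_mean, rng):
    """Systematic component + grand mean + standard-normal noise."""
    return component + grand_mean + rng.gauss(0.0, 1.0)


rng = random.Random(2024)

# Hypothetical values, chosen only to illustrate the structure.
event_probs = {"e1": 0.5, "e2": 0.3, "e3": 0.2}
components = {"e1": -4.0, "e2": 2.0, "e3": 7.0}  # one option's components

# One grand mean drawn from one of the stated value ranges.
grand_mean = rng.uniform(25, 35)

event = draw_event(event_probs, rng)
outcome = option_outcome(components[event], grand_mean, rng)
print(event, round(outcome, 1))
```

Because the event is drawn once per trial and shared by all options, the same `event` would be passed to every option's component table on that trial.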
After a short inter-trial interval (400–600 ms), participants made a self-paced choice. The chosen option was highlighted for 900–1,100 ms and the non-chosen options were blurred, after which feedback about the current value of the grand mean, the current event, and the outcome of the chosen option on that trial was presented. This outcome was added to the participants’ tally. Additionally, for every non-chosen option for which participants had encountered the same event (irrespective of the grand mean) at least once, they saw a reminder of the grand mean and the outcome at the last observation of that option with the respective event. If the current event had not yet occurred when another option was chosen, then that option’s reminder field remained blank. Importantly, the reminders did not contain any new information about the outcome distributions of the options. The feedback was presented for 4,000–4,500 ms, after which a new trial began.
The correlation between events and outcomes is an essential part of the experimental design. In a full-feedback setting, this structure is observable on a trial-by-trial basis. However, in a partial-feedback setting, non-chosen feedback is not provided, so it is not possible to establish a link between specific events and outcomes. For individuals to be able to relate events to outcomes, this information has to be conveyed explicitly. To do so and to increase task engagement, we framed the task as an extraterrestrial space mission (similar to Kool et al., 2016). Participants were told that a rare, valuable resource had been found on extraterrestrial planets and that they were in charge of trying to retrieve as much of that resource as possible. They had a selection of probes (representing the different options) available, where each of the probes would try to retrieve as much of the rare resource as possible before returning to Earth. Additionally, they were told that the amount retrieved was known to depend on the color of a nearby star (representing the different events) and on the visibility on the planet (representing the grand mean). They were not told how these components related to each other. Figure 1B provides an illustration of a choice trial from the perspective of a participant. In this illustration, past feedback is available only for one of the two non-chosen options.
2.1.3 Design
A training block contained 40 trials in which individuals chose between a high-valued (HV) and a low-valued (LV) option. The outcomes of the options depended on two events, ℰ1 and ℰ2, that occurred with probabilities Pr(ℰ1) = .6 and Pr(ℰ2) = .4. When ℰ1 occurred, HV had an option-specific component of 33 and LV an option-specific component of −1. When ℰ2 occurred, HV yielded −7 points and LV yielded −6 points. Therefore, the option-specific expected values (EVs) were EV(HV) = 17 and EV(LV) = −3. The training block acquainted the participants with the task and was used to assess general learning performance. The grand mean was initialized at the beginning of the training block and changed after 20 trials. Each grand mean was a single draw from U(25, 35), U(35, 45), or U(45, 55), and the distributions from which the grand means stemmed were drawn randomly without replacement.
The design of the experimental blocks was based on Experiment 4 (“The Accentuation Effect”) of Spektor et al. (2019), in which two options, B and C, were available for choice in two different choice sets of three options. The outcomes of the options depended on three events, ℰ1, ℰ2, and ℰ3, that occurred with probabilities Pr(ℰ1) = .6, Pr(ℰ2) = .3, and Pr(ℰ3) = .1. When ℰ1 occurred, B had an option-specific component of −5.5 and C an option-specific component of −1.5. When ℰ2 occurred, B yielded 6 points and C yielded 4 points. Finally, when ℰ3 occurred, B resulted in a gain of 23 points and C in a gain of 5 points. In total, the two options had identical EVs, with B being a riskier option (i.e., with a higher variance) than C.
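The expected values stated above, and the claim that B is riskier than C, can be checked directly from the event probabilities and option-specific components. The sketch below uses only the numbers given in the text; the helper names `expected_value` and `variance` are illustrative, and the computation ignores the grand mean and the noise component because they are added to all options alike.

```python
def expected_value(probs, values):
    """Probability-weighted mean of the option-specific components."""
    return sum(p * v for p, v in zip(probs, values))


def variance(probs, values):
    """Probability-weighted variance of the option-specific components."""
    ev = expected_value(probs, values)
    return sum(p * (v - ev) ** 2 for p, v in zip(probs, values))


train_probs = [0.6, 0.4]          # two events in the training block
exp_probs = [0.6, 0.3, 0.1]       # three events in the experimental blocks

options = {
    "HV": (train_probs, [33, -7]),
    "LV": (train_probs, [-1, -6]),
    "B": (exp_probs, [-5.5, 6, 23]),
    "C": (exp_probs, [-1.5, 4, 5]),
}

for name, (probs, values) in options.items():
    ev = expected_value(probs, values)
    var = variance(probs, values)
    print(f"{name}: EV = {ev:.2f}, variance = {var:.2f}")
```

Under these inputs, HV and LV come out at 17 and −3, and B and C share the same expected value while B has a markedly larger variance.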
The third option in each choice set of the experimental blocks, A or D, served as a decoy for C or B, respectively, therefore supposedly increasing that option’s choice proportion relative to the other option. In other words, in the choice set S1 = {A, B, C}, C should be perceived as more attractive relative to B, and in the choice set S2 = {D, B, C}, B should be perceived as more attractive relative to C. The outcome distributions of the options were constructed such that options A and B (C and D) yielded relatively similar outcomes on a trial-by-trial basis that were relatively dissimilar to those of option C (B): When event ℰ1 (ℰ2; ℰ3) occurred, A resulted in a loss of 7 points (gain of 7 points; gain of 33 points) and D resulted in a gain of 2 points (0 points; 0 points).
To illustrate the effect of salience, consider the case in which event ℰ3 occurs: In choice context S1, the options A, B, and C yield 33, 23, and 5 points, respectively. The most salient outcome (i.e., the outcome that is most dissimilar to the other outcomes) is the 5 points of option C. In S2, options B, C, and D yield 23, 5, and 0 points, respectively. Here, the 23 points of option B are most salient, even though the outcomes of both B and C are identical across choice contexts (see Table 2 for a full description of the options and the choice sets they appear in).
Note. Option-specific components of the options were tied to the occurrence of events. See Section 2.1.3 (Design) for details.
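The illustration for the third event can be made concrete with a simple distinctness index. The sketch below assumes, purely for illustration, that an outcome's distinctness is its mean absolute distance to the other outcomes shown on the same trial; this is one simple choice and not necessarily the salience measure used in the analyses. The function name is hypothetical, and the outcome values are the option-specific components given in the text.

```python
def mean_distance_to_others(outcomes):
    """Mean absolute distance of each outcome to all other outcomes."""
    result = {}
    for name, value in outcomes.items():
        others = [v for other, v in outcomes.items() if other != name]
        result[name] = sum(abs(value - v) for v in others) / len(others)
    return result


# Option-specific components when the third event occurs.
s1 = {"A": 33, "B": 23, "C": 5}
s2 = {"B": 23, "C": 5, "D": 0}

for label, choice_set in (("S1", s1), ("S2", s2)):
    distances = mean_distance_to_others(choice_set)
    most_distinct = max(distances, key=distances.get)
    print(label, distances, "most distinct:", most_distinct)
```

With this index, C stands out in the first choice set and B in the second, although the outcomes of B and C are the same in both.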
Individuals completed the experimental blocks in a counter-balanced order and made 150 choices in each choice set. Grand means were drawn the same way as in the training block and changed twice within a block, after 50 and 100 trials. The grand-mean changes were included to (1) encourage continuous learning in the task, (2) conceal that two of the options were identical across choice sets, and (3) invalidate the past outcomes as counterfactuals (across grand means). In all cases, event occurrences were pseudo-randomly generated such that, within each option, every 10 trials contained the events in proportion to their probabilities. To avoid perceptual saliency effects, outcomes were truncated at 10 and 99. Associations between stimuli and the events, options, and choice sets they represent were randomized across participants.
Learning performance was quantified using two different dependent variables: raw accuracy and corrected accuracy. Raw accuracy reflects the proportion of HV choices in the training block, Pr(HV). Values of Pr(HV) > .5 reflect that individuals were able to learn that HV has a higher EV than LV. However, due to the random nature of observed outcomes, LV may have yielded better outcomes than HV over a limited number of observations. Corrected accuracy controls for the influence of sampling error by computing the running mean (i.e., the mean of all outcomes previously observed) within each option. A “correct” response is therefore choosing the option that has the higher running mean.
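The corrected-accuracy measure described above can be sketched as follows. The function name and the handling of edge cases are assumptions made for this illustration: trials are scored only once every option has been observed at least once, and ties between running means are resolved in favor of the option observed first.

```python
def corrected_accuracy(choices, outcomes, n_options=2):
    """Share of scored trials on which the chosen option had the
    highest running mean of previously observed outcomes."""
    sums, counts = {}, {}
    correct = scored = 0
    for choice, outcome in zip(choices, outcomes):
        # Score the trial only if every option has been observed before.
        if len(counts) == n_options:
            means = {opt: sums[opt] / counts[opt] for opt in counts}
            best = max(means, key=means.get)
            scored += 1
            correct += int(choice == best)
        # Update the running sums with the outcome just observed.
        sums[choice] = sums.get(choice, 0.0) + outcome
        counts[choice] = counts.get(choice, 0) + 1
    return correct / scored if scored else float("nan")


# Hypothetical choice sequence and observed outcomes.
choices = ["HV", "LV", "LV", "HV"]
outcomes = [60, 70, 20, 65]
print(corrected_accuracy(choices, outcomes))
```

In the hypothetical sequence, choosing LV on the third trial counts as correct because LV had the higher running mean at that point, even though HV has the higher expected value.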
The manifestation of context effects was quantified using the relative choice share of the target (RST; Berkowitsch et al., 2014), where the target is the option whose attractiveness is supposed to increase according to the accentuation effect:

$$\mathrm{RST} = \frac{\Pr(T)}{\Pr(T) + \Pr(C)},$$

where Pr(T) is the proportion of target choices (i.e., C in choice set S1 and B in choice set S2, respectively) and Pr(C) is the proportion of competitor choices (i.e., B in choice set S1 and C in choice set S2, respectively). RST values range from 0 (competitor is always chosen) to 1 (target is always chosen), where RST = .50 indicates the absence of a context effect. RST > .50 indicates the presence of an accentuation effect. By using the RST as a dependent measure, we automatically control for individual prior preferences for low- or high-variance options (i.e., safe or risky options, respectively).
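A minimal sketch of the RST computation follows, assuming the usual definition of the measure as the target's share of all target and competitor choices (decoy choices are left out). The function name and the choice counts are hypothetical.

```python
def relative_choice_share(n_target, n_competitor):
    """Target choices as a share of target plus competitor choices."""
    total = n_target + n_competitor
    if total == 0:
        raise ValueError("no target or competitor choices observed")
    return n_target / total


# Hypothetical counts pooled over both choice sets for one participant:
# target = C in S1 and B in S2, competitor = B in S1 and C in S2.
n_target = 59
n_competitor = 41

rst = relative_choice_share(n_target, n_competitor)
print(round(rst, 2))                      # share of the target
print(round(n_target / n_competitor, 2))  # target-to-competitor ratio
```

The second printed value shows how an RST translates into a ratio: a share of .59 corresponds to the target being chosen about 1.44 times as often as the competitor.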
We also checked for violations of a less restrictive variant of the independence axiom, weak independence from irrelevant alternatives (see Rieskamp et al., 2006). This weaker axiom is violated if significantly more people prefer C over B in S1 while simultaneously preferring B over C in S2 than the other way round. In contrast to the stronger axiom, it does not restrict the choice proportions to be exactly equal but only requires the ordering of choices to remain stable across contexts.
2.1.4 Computational modeling
To assess the influence of irrelevant outcomes on a trial-by-trial basis, we analyzed the data in two different ways: First, we assessed whether the probability of repeating the same choice can be predicted by the obtained reward and the chosen option’s salience using a logistic regression. Second, in order to obtain a mechanistic understanding of the cognitive processes underlying learning and decision making in the task, we used a formal modeling approach.
Within the context of the regression analysis, we relied on two predictors:
- The difference between the obtained reward and the running mean of the chosen option as the first predictor. In other words, it is positive if the obtained outcome is above the average of that option’s previous outcomes and negative if it is below average, and it reflects the degree to which individuals are sensitive to outcomes. Essentially, it corresponds to a reward-prediction error, which is the standard learning signal in the literature (e.g., Schultz et al., 1997).
- The chosen option’s salience, which is the centered standardized mean pairwise Euclidean distance between all outcomes presented on the screen (i.e., the obtained outcome and the past outcomes of non-chosen options, if available). The standardization ensures that all saliences lie between 0 and 1, and the centering ensures that the predictor variable reflects the deviation from an “average” salience in every trial. This measure of salience rests on relatively few assumptions and satisfies the following properties: (1) If only one or two outcomes are observed, then salience has no influence, (2) values above 0 (below 0) indicate that the chosen option’s outcome is more (less) salient than the past outcomes from non-chosen options, and (3) a value of 0 indicates that the chosen option’s outcome has an average salience.
Regression weights of the reward predictor above 0 reflect that individuals are sensitive to rewards: If the obtained outcome is better than what they expected, they are more likely to choose the same option again. Regression weights of the chosen option’s salience above 0 reflect that individuals compare the chosen option’s outcome with past outcomes of non-chosen options in line with the similarity mechanism: If an option’s outcome is particularly salient on a given trial, then individuals are more likely to choose it again (compared to a situation in which the outcome is not as salient).
For the formal-modeling analysis, we fit a total of three nested reinforcement-learning models: two commonly used reinforcement-learning models that do not assume an influence of irrelevant outcomes, which we rigorously compared with the accentuation of differences model that assumes such an influence (see Spektor et al., 2019). The first and simplest model is the basic reinforcement learning model. It keeps track of the subjective expectation $Q_{i,t}$ of option $i$ on trial $t$ and updates it using the reward-prediction error:

$$Q_{i,t+1} = Q_{i,t} + \alpha \left(R_{i,t} - Q_{i,t}\right), \tag{1}$$

where $R_{i,t}$ is the reward obtained on trial $t$. The only parameter of the basic reinforcement learning model is the learning rate α, ranging from 0 to 1 and governing the degree to which individuals adapt to recent rewards.
The second model is the marginal utility function model, which nests the basic reinforcement learning model. In contrast to the latter, the former assumes a power function that maps observed rewards onto subjective utilities, which then enter the reward-prediction error in place of the raw rewards:

$$u(R_{i,t}) = R_{i,t}^{\gamma}. \tag{2}$$

In the case of γ = 1, the marginal utility function model reduces to the basic reinforcement learning model. Values of γ above 1 (between 0 and 1) represent risk-seeking (risk-averse) behavior.
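The two learning models described above can be sketched as a single update function. This is a minimal illustration that assumes the standard delta-rule form of a reward-prediction-error update and a power utility applied to positive rewards; the function names and the example rewards are hypothetical.

```python
def utility(reward, gamma=1.0):
    """Power utility; gamma = 1 leaves the reward unchanged.
    Assumes positive rewards, as in the task's displayed outcomes."""
    return reward ** gamma


def update_expectation(q, reward, alpha, gamma=1.0):
    """Move the expectation toward the (subjective) reward by a
    fraction alpha of the reward-prediction error."""
    prediction_error = utility(reward, gamma) - q
    return q + alpha * prediction_error


# Hypothetical reward sequence for one option.
rewards = [40, 42, 38, 41]

q_basic = 0.0     # basic model: gamma fixed at 1
q_utility = 0.0   # marginal utility model: gamma free
for r in rewards:
    q_basic = update_expectation(q_basic, r, alpha=0.04)
    q_utility = update_expectation(q_utility, r, alpha=0.04, gamma=0.61)

print(round(q_basic, 3), round(q_utility, 3))
```

A small learning rate such as the one used here makes the expectation change slowly, so that many past outcomes contribute to the current estimate.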
Finally, the accentuation of differences model assumes that subjective utilities are not evaluated in isolation but rather that particularly salient rewards receive more attention and are in turn perceived as more attractive (relative to less salient rewards). This intuition is implemented in form of an inhibitory similarity mechanism, which conceptually corresponds to inverse saliency. The same intuition applies: The more similar (i.e., closer on the number line) an option’s outcome is to the other options’ outcomes, the less attractive it becomes. Formally, in Equation 2 is replaced by
where Z is the average negative exponential distance between the respective option’s perceived reward and the other perceived rewards,
and is the average perceived reward of all outcomes that scales Z up from a (0, 1) to a standardized scale (i.e., average perceived reward). The set J contains the last-seen rewards of non-chosen options for the same event that occurred.
The core parameter that determines the degree to which individuals take the similarity mechanism into account is η; η = 0 reflects an agent that ignores saliency, η > 0 is the standard case in which saliency increases an option’s attractiveness, and η < 0 reflects a situation in which saliency reduces an option’s attractiveness. Additionally, the scaling parameter ψ determines the sensitivity to the numerical distance between outcomes (see Spektor et al., 2019, for additional details and validations in a full-feedback setting).
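To make the verbal description of the similarity mechanism more tangible, the sketch below implements one possible reading of it. The exact functional form is an assumption of this illustration and may differ from the published model: here Z is the mean of exp(−ψ · distance) between an option's perceived reward and the other perceived rewards, and the adjusted utility subtracts η · Z times the average perceived reward. The function name and all numerical values are hypothetical.

```python
import math


def accentuated_utility(own, others, eta, psi):
    """Assumed form: own utility minus eta * Z * mean perceived reward,
    where Z is the mean negative-exponential distance to the others."""
    if not others:
        # Nothing to compare against, so no similarity-based inhibition.
        return own
    z = sum(math.exp(-psi * abs(own - o)) for o in others) / len(others)
    mean_all = (own + sum(others)) / (len(others) + 1)
    return own - eta * z * mean_all


# Hypothetical perceived rewards on one trial (gamma = 1 for simplicity).
eta, psi = 0.36, 0.1
distinct = accentuated_utility(5.0, [33.0, 23.0], eta, psi)
similar = accentuated_utility(23.0, [33.0, 5.0], eta, psi)

print(round(5.0 - distinct, 3))   # inhibition of the distinct outcome
print(round(23.0 - similar, 3))   # inhibition of the less distinct outcome
```

Under this reading, η = 0 recovers the marginal utility function model, and an outcome that lies close to the other outcomes is inhibited more strongly than one that lies far from them.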
To transform subjective expectations into choice probabilities, we used a soft-max choice rule with choice-sensitivity parameter θ:

$$\Pr(i, t) = \frac{\exp(\theta\, Q_{i,t})}{\sum_{j} \exp(\theta\, Q_{j,t})},$$

where the sum runs over all options $j$ in the choice set.
We fit each model within a hierarchical Bayesian framework (see Gelman et al., 2013, for an introduction) and compared the models using the leave-one-out information criterion (LOOIC; Vehtari et al., 2017). The LOOIC quantifies how well a model can explain the data and penalizes complex models to avoid overfitting. It does so by computing the effective number of parameters, which does not depend directly on the number of free parameters but rather on how parameter values affect model predictions (see Vehtari et al., 2017, for details). Lower LOOICs reflect better penalized-for-complexity model fits. Models were specified using weakly informative priors.
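A soft-max rule of this kind can be sketched as follows, assuming the standard exponential form in which θ multiplies the expectations. The function name and the example expectations are hypothetical; subtracting the maximum before exponentiating is a numerical-stability detail that does not change the probabilities.

```python
import math


def softmax(expectations, theta):
    """Choice probabilities from subjective expectations.
    theta = 0 yields random choice; larger theta yields choices that
    follow the highest expectation more deterministically."""
    scaled = [theta * q for q in expectations]
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical expectations for three options.
q_values = [30.0, 32.0, 31.0]
for theta in (0.0, 0.5, 5.0):
    print(theta, [round(p, 3) for p in softmax(q_values, theta)])
```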
2.2 Results
2.2.1 Behavioral analyses
First, we checked whether participants chose the higher-valued option HV more often than they chose the lower-valued option LV in the training block. A one-sample t test on raw accuracy against .5 confirmed that HV was chosen in more than half of the cases (M = .65, SD = .10; t(39) = 9.45, p < .001, d = 1.49, 95% CI [1.04, 1.94]). A one-sample t test on corrected accuracy led to the same conclusion (M = .66, SD = .11; t(39) = 9.39, p < .001, d = 1.48, 95% CI [1.03, 1.93]).
Second, we checked whether individuals’ choices violated the independence axiom (and, therefore, economic rationality). A violation of independence would be reflected in a significant change of the relative preference of options B and C across the two choice sets S 1 and S 2 (see Figure 2, left panel, for mean choice proportions in the two choice sets and Figure 3, top row, for aggregated choice proportions in bins of 10 trials). Behavior in line with the independence axiom would result in RSTs = .5 and the presence of an accentuation effect would result in RSTs > .5. A one-sample t test on RSTs against .5 confirmed the presence of a substantial accentuation effect (MRST = .59, SDRST = .14; t(39) = 4.05, p < .001, d = 0.64, 95% CI [0.30, 0.98]), where people chose the target option on average almost 50% more often than the competitor.
We followed up this analysis with a test for violations of “weak” independence from irrelevant alternatives which, if violated, contradicts more fundamental principles of economic rationality. This principle states that while relative choice proportions (e.g., the RST) can vary across contexts, modal choices should not. In contrast to this notion, 22 out of 40 individuals (55%) chose C more often than B in choice set S1 but B more often than C in choice set S2. Only 4 out of 40 individuals (10%) showed the opposite pattern, which serves as a control for random fluctuation. A 2×2 χ² contingency test confirmed the difference in preference-shift proportions (χ²(1) = 16.47, p < .001).
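The contingency test above can be retraced by hand. The sketch below assumes a 2×2 table with the two preference-shift patterns as rows (22 of 40 versus 4 of 40 individuals showing the pattern) and applies Yates' continuity correction; both the table layout and the use of the correction are assumptions of this illustration, chosen because they reproduce the reported statistic. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square for the table [[a, b], [c, d]],
    optionally with Yates' continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: predicted shift vs. opposite shift; columns: shown vs. not shown.
corrected = chi_square_2x2(22, 18, 4, 36)
uncorrected = chi_square_2x2(22, 18, 4, 36, yates=False)
print(round(corrected, 2), round(uncorrected, 2))
```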
2.2.2 Computational modeling
We investigated whether the observed violation of economic rationality can be explained by an attentional salience mechanism. According to such a mechanism, particularly salient outcomes of the chosen option increase and particularly non-salient outcomes of the chosen option decrease its attractiveness. For each individual, we performed logistic regressions on the probability of choosing the same option again with the reward-prediction error and the chosen option’s outcome salience as predictors. A one-sample t test on the reward-prediction-error regression weight (M = 0.02, SD = 0.04) confirmed that individuals were sensitive to rewards (t(39) = 3.81, p < .001, d = 0.60, 95% CI [0.26, 0.94]). Crucially, the chosen option’s outcome salience also incrementally predicted choice-repetition probability, as confirmed by a one-sample t test on the respective regression weight (M = 3.24, SD = 4.58; t(39) = 4.47, p < .001, d = 0.71, 95% CI [0.36, 1.05]).
Finally, we compared the accentuation of differences model, a computational model that formalizes the cognitive processes supposedly underlying learning and decision making in the task, with two alternative models (both nested within the accentuation of differences model) in their ability to explain the trial-by-trial choices of individuals (see Table 3 for results of the model comparison). The critical difference between the accentuation of differences model and the other reinforcement-learning models is that the accentuation of differences model assumes a mechanism that leads to outcome-salience-dependent valuation of options. In line with the model-free analysis, our model comparison revealed that the accentuation of differences model provides a better account of the data (LOOIC = 22,389, SE = 112.69) than the marginal utility function model, the better of the other two models (LOOIC = 22,802, SE = 106.84), ΔLOOIC = 413 (SE = 44.36). This corresponds to a standardized effect size of 9.31σ, which means that the LOOIC difference between the two models amounts to 9.31 times its standard error (see Vehtari et al., 2017, for details). An additional model comparison with a basic reinforcement learning model that updates the expectations of non-chosen options using the reminders of past outcomes provided the worst account of the data (LOOIC = 23,650, SE = 115.06), a performance even below the chance level of 26,367 (after accounting for model complexity), ruling out that people simply confused the presented reminders with actual forgone outcomes or valid counterfactual outcomes.
Note. LOOIC = Leave-one-out information criterion (Vehtari et al., 2017). p LOOIC = Effective number of parameters. SE LOOIC = Standard error of the LOOIC. All measures are reported on the deviance scale.
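The model-comparison quantities for Experiment 1 can be retraced with a few lines of arithmetic. The sketch below assumes that the standardized effect size is the LOOIC difference divided by its standard error, and that the chance level is the deviance of uniformly random choices among three options over 12,000 trials (40 participants × 2 contexts × 150 trials, a count inferred from the design rather than stated in this section). All function names are hypothetical.

```python
import math


def standardized_looic_difference(looic_worse, looic_better, se_diff):
    """Return the LOOIC difference and the difference in units of its SE."""
    delta = looic_worse - looic_better
    return delta, delta / se_diff


def chance_level_deviance(n_trials, n_options):
    """Deviance (-2 * log-likelihood) of uniformly random choices."""
    return -2.0 * n_trials * math.log(1.0 / n_options)


# Reported values: utility-function model vs. accentuation of differences model.
delta, z = standardized_looic_difference(22802, 22389, 44.36)

# Assumption: 40 participants x 2 contexts x 150 trials, three options per trial.
n_trials = 40 * 2 * 150
chance = chance_level_deviance(n_trials, 3)

print(delta, round(z, 2), round(chance))
```

Under these assumptions, the sketch recovers the reported ΔLOOIC of 413, the standardized effect size of 9.31σ, and the chance level of 26,367.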
The obtained group-level parameter estimates of the accentuation of differences model shed light on the cognitive processes at work. With a mean learning rate of α = .04, individuals have a rather long time window of integration. The mean curvature of the utility function reflects a moderate degree of risk aversion, γ = 0.61, and the parameter that determines the degree of similarity-based inhibition is positive, η = 0.36 (however, its highest-density interval overlaps with 0, suggesting a substantial degree of individual differences). See Table 4 for a summary of the group-level posterior.
Note. See the model section of Experiment 1 for a detailed description of the model and its respective parameters.
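To illustrate why a learning rate of α = .04 implies a long time window of integration, the following sketch assumes a standard delta-rule update, Q ← Q + α(outcome − Q), under which an outcome observed k updates ago carries the weight α(1 − α)^k. The actual update rule of the accentuation of differences model may differ in its details; the function names are hypothetical.

```python
import math


def recency_weights(alpha, n_back):
    """Weight of the outcome observed k updates ago under a delta rule."""
    return [alpha * (1.0 - alpha) ** k for k in range(n_back)]


def half_life(alpha):
    """Number of updates after which an outcome's weight has halved."""
    return math.log(0.5) / math.log(1.0 - alpha)


alpha = 0.04  # reported group-level mean learning rate
weights = recency_weights(alpha, 25)

print(round(half_life(alpha), 1))  # updates until an outcome's weight is halved
print(round(sum(weights), 2))      # share of the estimate carried by the last 25 outcomes
```

Under the delta-rule assumption, an outcome's influence halves only after roughly 17 updates of the same option, and the 25 most recent outcomes together account for about 64% of the current expectation.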
2.3 Discussion
The accentuation effect is a context effect that emerges from the trial-by-trial salience of outcomes in a learning setting (Spektor et al., 2019). Experiment 1 investigated this effect in an experience-based setting with partial feedback and reminders of past outcomes from non-chosen options. We found that in such a setting, individuals showed an accentuation effect of considerable size. Analyses resting on varying degrees of assumptions, ranging from a logistic regression to a full-fledged model comparison, confirmed the assumed mechanism underlying the accentuation effect, namely that dissimilar outcomes are perceived as more attractive.
Previous comparable studies (Ert & Lejarraga, 2018; Spektor et al., 2019) used a full-feedback paradigm in which individuals obtained counterfactual feedback about the rewards of the non-chosen options (i.e., the reward they would have earned had they chosen them). Compared to partial feedback, full feedback substantially facilitates the task as individuals obtain information about forgone outcomes free of cost. In contrast, individuals in the partial-feedback situation have to trade off the rewards forgone by not choosing the option they think is best against the possibility of finding an option that is even better, that is, they face an exploration–exploitation dilemma (e.g., Navarro et al., 2016). More importantly, individuals in the partial-feedback situation cannot compare the outcomes of the options with each other, a necessary condition for certain rewards to be particularly salient and, therefore, for the accentuation effect to arise.
To compensate for this property of the partial-feedback paradigm, Experiment 1 substantially deviated from the full-feedback paradigm by not only leaving out the forgone feedback but also providing individuals with information about the structure of the environment (information about the star color and visibility referring to the grand-mean component and the event, respectively), information that individuals in the full-feedback paradigm did not obtain. Most notably, individuals also saw the outcomes they had obtained from non-chosen options in the past. While we found no evidence that individuals treated these outcomes as actual outcomes or valid counterfactuals, it is possible that individuals still perceived these reminders to be informative of what they would have obtained had they chosen the respective option, especially since these options were relevant for them.
Given the design of the experiment, whenever individuals had observed the outcomes of both non-chosen options in the same event and the same grand-mean component, the outcomes were in fact not too far away from being valid counterfactuals. However, this was not always the case, especially for the less frequent events: Only in 81% of the trials did participants actually obtain the reminders from both of the non-chosen options, and only in 77% of these cases (63% of the total trials) was the information from the same grand-mean component. Put differently: In 37% of all trials, paying any attention at all to the past outcomes would introduce a non-negligible bias into any estimate based on them. Only a sophisticated understanding of how the different components relate to each other could correct for this bias. It is unlikely that participants obtained such an understanding and corrected for the bias in order to draw counterfactual conclusions about the non-chosen options. We therefore argue that the past outcomes from non-chosen options were, indeed, entirely “irrelevant”. Experiment 2 aimed to explore how grave this misperception is: Do individuals blindly react to fully irrelevant information, much like with anchoring (Tversky & Kahneman, 1974), or does the effect only occur when the information is factually irrelevant but stems from relevant options?
3 Experiment 2
The first experiment demonstrated how normatively irrelevant feedback can lead to context-dependent learning in a partial-feedback setting. However, the irrelevant feedback came from relevant options, so individuals might process this information as if it were relevant. The goal of the second experiment was to investigate the effect of irrelevant information from irrelevant options. More precisely, it was to test whether the influence of irrelevant information is goal independent and occurs in any situation in which it is available (much like a purely perceptual phenomenon) or whether individuals only process such information if it is goal relevant.
In order to address this question, Experiment 2 inverted the logic of Experiment 1: Instead of providing de facto irrelevant information from relevant options, we provided individuals with valid counterfactual information from irrelevant sources. If the accentuation effect is mainly a perceptual phenomenon based on numerical saliency, it should arise in Experiment 2 as well; after all, the visual presentation is mostly identical in both experiments.
3.1 Method
The experiment was a modified variant of Experiment 1, with the following differences. A total of 51 participants (30 female, 21 male, age 18–22, M = 20.12, SD = 1.00), mostly students with different majors from the Universitat Pompeu Fabra, Barcelona, took part in the experiment. The experiment took approximately 30–40 min to complete and participants received a show-up fee of 5 Euro and a choice-dependent bonus of up to 4 Euro. We did not exclude any participants or trials.
The experimental part of the task was a modified version of the one from the first experiment. Instead of three options in the choice set (and 150 decisions), participants always chose between two options, namely B and C, for 100 trials in each context. Participants received feedback about the outcome of the chosen option, the current event, and the grand mean, much like in Experiment 1. However, the “irrelevant feedback” they received this time stemmed from two options that were explicitly labeled as unavailable to them. The outcomes of the non-available options corresponded to the counterfactual forgone outcome of the non-chosen option and hypothetical counterfactual outcomes of option A (context S 1) and option D (context S 2). For example, if option B was chosen, the outcomes of the non-available options were the forgone outcome of option C and the outcome of option A (or D in the other context). On each trial, these values were randomly mapped to the two non-available space ships so individuals could not learn that the outcome of one of the non-available ships in fact corresponded to the other available option. Figure 1C illustrates the difference between the two experiments.
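The feedback logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original experiment code: the function name, the context labels "S1" and "S2", and the outcome values are hypothetical.

```python
import random


def irrelevant_feedback(chosen, outcomes, context, rng):
    """Return the chosen option's outcome and the two values shown on the
    non-available ships, in random order."""
    if chosen not in ("B", "C"):
        raise ValueError("only options B and C are available in Experiment 2")
    forgone = "C" if chosen == "B" else "B"   # the other available option
    decoy = "A" if context == "S1" else "D"   # hypothetical third option
    shown = [outcomes[forgone], outcomes[decoy]]
    # Random mapping to the two ships, so that neither ship can be learned
    # to correspond to the other available option.
    rng.shuffle(shown)
    return outcomes[chosen], shown


rng = random.Random(0)
trial_outcomes = {"A": 10, "B": 20, "C": 30, "D": 40}  # made-up values
own, shown = irrelevant_feedback("B", trial_outcomes, "S1", rng)
print(own, sorted(shown))
```

Because the two values are shuffled on every trial, the position of a value on the screen carries no information about which option generated it.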
3.2 Results
We confirmed that participants chose the higher-valued option HV more often than the lower-valued option LV in the training block. A one-sample t test on raw accuracy against .5 confirmed that HV was chosen in more than half of the cases (M = .60, SD = .11; t(50) = 6.40, p < .001, d = 0.90, 95% CI [0.57, 1.22]). A one-sample t test on corrected accuracy led to the same conclusion (M = .61, SD = .12; t(50) = 6.50, p < .001, d = 0.91, 95% CI [0.58, 1.23]).
In contrast to Experiment 1, we did not find a significant change in the choice proportions, as reflected in a one-sample t test on RSTs against .5 (M RST = .52, SD RST = .10; t(50) = 1.32, p = .19, d = 0.18, 95% CI [-0.09, 0.46]). See Figure 2, right panel, for mean choice proportions in the two choice sets and Figure 3, bottom row, for aggregated choice proportions in bins of 10 trials. In line with this, the test for violations of the weak version of the independence principle showed that 13 out of 51 participants (25%) had chosen C more often than B in S 1 and at the same time B more often than C in S 2, with 7 individuals (14%) showing the opposite pattern. The difference was not significant, as confirmed by a 2×2 χ2 contingency test (χ2(1) = 1.55, p = .21).
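The contingency test can be retraced with a short sketch. It assumes a 2×2 table that crosses the two directions of the preference reversal with whether a participant showed that pattern (13 vs. 38 and 7 vs. 44 out of 51), and it applies Yates' continuity correction. Both the table layout and the correction are assumptions that reproduce the reported statistic; they are not confirmed details of the original analysis.

```python
import math


def yates_chi_square_2x2(table):
    """Chi-square test for a 2x2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            corrected = max(0.0, abs(observed - expected) - 0.5)
            chi2 += corrected ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: predicted pattern, opposite pattern; columns: shown, not shown.
table = [[13, 51 - 13], [7, 51 - 7]]
chi2, p_value = yates_chi_square_2x2(table)
print(round(chi2, 2), round(p_value, 2))
```

With this layout, the sketch yields χ2(1) = 1.55 and p = .21, matching the reported values.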
In line with the main behavioral results, the logistic regression revealed a significant influence of the reward-prediction error on the probability to repeat the previous choice (i.e., reward sensitivity; M = 0.01, SD = 0.03; t(50) = 2.24, p = .03, d = 0.31, 95% CI [0.03, 0.59]), but no influence of the chosen option’s outcome salience (M = 0.65, SD = 3.99; t(50) = 1.16, p = .25, d = 0.16, 95% CI [-0.12, 0.44]). The high individual variability in the salience weighting was reflected in the model comparison (see Table 3), where the accentuation of differences model provided the best account of the data (LOOIC = 13,332, SE = 56.52), but only by a small margin over the second-best model, with ΔLOOIC = 70 (SE = 22.99) and a standardized effect size of 3.04σ (see Table 4 for a summary of the group-level posterior of the accentuation of differences model).
Given that participants were not randomly allocated to the two experiments, a direct comparison between them is not possible. Nevertheless, descriptively, the effect sizes obtained in Experiment 2 are consistently lower than in Experiment 1, suggesting a lower degree of context dependency.
3.3 Discussion
Experiment 2 aimed to distinguish whether the accentuation effect is a purely perceptual phenomenon or whether it is related to goal-related processes. To do so, we inverted the logic of the first experiment by providing individuals with new information that stemmed from irrelevant alternatives. We found no evidence of a pronounced accentuation effect in this setting, a conclusion on which all analyses converged.
4 General Discussion
The present work investigated whether individuals form preferences in learning tasks independently of irrelevant outcomes. In contrast to notions of economic rationality, we found that preferences shift depending on the choice context and that these preference shifts are driven by irrelevant outcomes, but only if these irrelevant outcomes stem from relevant options. In such a situation, our results support the recently established notion that outcomes that are particularly salient on a trial-by-trial basis increase the perceived attractiveness of the option that produced them.
4.1 A model of the experiment
So far, the accentuation effect has been investigated in a full-feedback paradigm only (Spektor et al., 2019). In this paradigm, the psychological process supposedly underlying it is rather straightforward: It is easy to compare outcomes with one another on a trial-by-trial basis and discount options whose outcomes are similar. Not only does the exploration–exploitation dilemma make the partial-feedback setting considerably more complex, but it is also not possible to directly compare outcomes with each other. This increased complexity has been shown to result in slower learning (Yechiam & Busemeyer, 2005), lower choice accuracy (Rakow et al., 2015; Yechiam & Rakow, 2012; Palminteri et al., 2015), and a higher impact of surprising outcomes (Plonsky & Erev, 2017). In order to facilitate learning in the task and isolate the expected effect of outcome saliency on choices, we provided individuals with some structural information about the task, information that is typically not provided in experience-based paradigms.
In such a setting, we were able to show that the mere presence of irrelevant outcomes affects preference formation. However, this was only the case when the irrelevant information came from relevant choice alternatives. Whenever the information came from supposedly entirely irrelevant sources, individuals successfully ignored it. This indicates that the irrelevant information of relevant options is interpreted as relevant information and is considered in the decision-making process. These results speak against the notion that the accentuation effect is a perceptual phenomenon that is insensitive to the relevance of sources. Nevertheless, the use of irrelevant information from relevant alternatives constitutes a violation of normative principles. Any kind of reprocessing of past outcomes as relevant information would lead to a biased estimate of the perceived value of the currently chosen option, the non-chosen option, or both. Additionally, a comparison of the past outcome with the outcome of the currently chosen option (in line with the assumed mechanism underlying the accentuation effect) would violate the independence principle and lead to context effects. Even less rigid extensions of economic rationality, such as those that assume a cost–benefit analysis of information acquisition, would predict that the irrelevant information should be ignored: Irrespective of how it is processed, not processing it at all requires less effort than even the most heuristic kind of processing.
It is noteworthy that the experimental setting might have suggested to the individuals that the information presented to them is somehow relevant and that they should use it, despite explicit instructions about its factual irrelevance. In this case, individuals would be solving a different problem than the one the experimenters expect them to solve (e.g., Szollosi & Newell, 2020; Kellen, 2019). We see no strong reason to suspect that we are dealing with such a situation: A model that explicitly treats past outcome reminders as if they were valid information cannot account for the behavior observed in Experiment 1. Moreover, while Experiment 2 provided more information for individuals to make use of (even though they were not aware of that fact), it did not significantly affect participants’ behavior. Finally, even if individuals felt they had to use the information presented to them somehow, it is doubtful that the information carries any suggestion in line with the mechanism giving rise to the accentuation effect. In sum, the behavior observed in the present experiments is unlikely to be due to experimental demand effects.
4.2 Broader relevance of accentuation effects
While the present study was designed to elicit the strongest accentuation effect possible, this specific experimental setup is not necessary for accentuation effects to arise. Situations in which individuals get partial feedback along with reminders of past choices (e.g., when online shops remind their customers of past purchases or when streaming services provide reminders of already-watched movies) are quite common in everyday life, and accentuation effects are expected to occur in these situations as well. Importantly, these reminders are often of questionable informational relevance. The present study sheds light on how individuals are susceptible to the influence of particularly distinct outcomes in such situations and the role of informational relevance.
Traditionally, context-effects research has relied on choice options that are each described on two attribute dimensions (Tversky, 1972; Huber et al., 1982; Simonson, 1989; Trueblood et al., 2013). The accentuation effect breaks with this tradition by not being defined in terms of an interaction between attribute dimensions but by the trial-by-trial reward dynamics. Although both types of context effects constitute violations of the independence axiom, their qualitative differences raise the question of whether the effects belong to the same or to distinct categories of context effects. So far, the two types of context effects do not seem to arise within the same setting. Future research should clarify the degree to which a separate treatment is necessary.
4.3 Alternative models, possible explanations, and conclusion
The present study used a similarity mechanism with an underlying reinforcement-learning mechanism to interpret participants’ behavior. Here, we will discuss whether the observed behavior is compatible with alternative theoretical approaches, even though we are not aware of any alternative model that would be able to account for the observed choices without additional assumptions and adaptations.
The semantically most closely related model is surely salience theory (Bordalo et al., 2012), according to which particularly salient outcome states receive a higher decision weight, where salience is essentially the range of outcomes. Because salience theory is a theory of decisions under risk (that assumes perfect knowledge of the options’ outcome distributions), it has to be modified to apply to the setting of repeated choices, and two possible modifications come to mind. First, individuals might learn the reward contingencies explicitly (as displayed in Table 2). Second, individuals might use the trial-by-trial salience to determine the degree to which they update reward expectations (i.e., the learning rate). In both implementations, salience within each event would change only marginally across the two choice sets, and event E 3 would receive the highest decision weight, so that the theory fails to predict the choice pattern observed in choice set S 1.
Within the reinforcement-learning framework, a different approach to context-dependent preferences is contextual value adaptation (Palminteri et al., 2015). In the present experimental design, contextual value would not exert any influence as our design explicitly controlled for contextual value, where both decoy options had the same expected value. Non-reinforcement-learning models often rely on recall of instances from memory to form preferences (Erev & Roth, 2014; Gonzalez & Dutt, 2011). Within these frameworks, individuals draw a sample of single trials from memory, process that sample, and choose the option with the highest criterion. These models could be augmented with a mechanism resembling the similarity mechanism in various ways. For example, the values could be processed during the trial and these processed values (that already take into account outcome salience) could be stored in memory. Alternatively, people could recall an entire trial and then process it much like the reinforcement-learning model does. Irrespective of the concrete mechanistic implementation, the main phenomenon remains: Outcome salience in a context with relevant-but-invalid information affects preferences.