1 Introduction
Through lottery decisions, economic agents can reveal their level of risk tolerance. Agents can, however, make decisions that are inconsistent with most classical decision theory, namely, choices that are first-order stochastically dominated (FOSD). Such a choice is defined, roughly, as accepting a lesser prize or a lower probability of a higher prize.
Previous studies have investigated FOSD or inconsistent choice either as a necessity to explain subsets of their data (Holt & Laury, 2002) or to test the impacts of complexity in decisions under risk (Charness et al., 2007, 2018). These studies span both the laboratory (Loomes, 1991; Polisson et al., 2020; Dembo et al., 2021) and the field (Jacobson & Petrie, 2009; Galarza, 2009). Depending on the complexity of the lottery choice, task type and elicitation setting, FOSD violation rates (or inconsistent choices) have varied greatly across studies, ranging from under 10% to around 50%. The majority of these studies focus on a single decision or elicitation task type, often repeated with some slight variation in riskiness.
This paper contributes to this literature by documenting the prevalence of stochastically dominated choices across several commonly used elicitation tasks in a single experiment. Theoretically, I provide the conditions under which a risky decision over Arrow securities along a budget line yields the possibility of an FOSD violation, while empirically I check violation frequency in a set of important tasks against a pair of interesting benchmarks.
2 Data
The theoretical environment, experimental setting and data used in this report are from the recent risk elicitation paper of Friedman et al. (2022), henceforth referred to as VRE22. The experiment had 142 undergraduate students at UC Santa Cruz, each engaging with 56 risk elicitation trials using six different sorts of tasks. The design was entirely within subject, with variation occurring in price and probability ordering, task block ordering, and within task block monotonicity/randomness. See VRE22 for a full characterization of the design.
2.1 Experiments
In a given elicitation task, a subject chose a bundle $(x, y)$ of Arrow securities; the bundle delivers $x$ in state $X$ (probability $\pi_X$) and $y$ in state $Y$ (probability $\pi_Y$). The $x$ and $y$ securities have prices of $p_x$ and $p_y$, respectively. This means the agents solve the maximization problem
$$\max_{x,\,y} \; \pi_X u(x) + \pi_Y u(y) \quad \text{subject to} \quad p_x x + p_y y = m$$
according to standard decision theory. The endowment $m$ is set in each trial such that the corner bundle for the cheaper security holds 100 units of the said security. Here, $u$ is the agent’s smooth, strictly increasing Bernoulli function, representing her preferences over the securities’ payout.
After solving the first-order conditions, the Lagrangian multiplier $\lambda$ satisfies the following pair of equalities:
$$\lambda = \frac{\pi_X u'(x)}{p_x} = \frac{\pi_Y u'(y)}{p_y},$$
which when rearranged yield a new statement of the marginal rate of substitution (MRS):
$$\frac{u'(x)}{u'(y)} = \frac{\pi_Y \, p_x}{\pi_X \, p_y}.$$
As such, VRE22 defined statistic $L$ as the negative logarithm of the MRS:
$$L = \ln\frac{\pi_X}{\pi_Y} - \ln\frac{p_x}{p_y}.$$
A couple of special cases arise from such a definition. First, an $L$ of 0 corresponds to the price ratio $p_x/p_y$ being the reciprocal of $\pi_Y/\pi_X$. Second, a risk-neutral agent’s preference, with $u'$ equal to some positive constant, is only satisfied at $L = 0$; corner solutions are chosen when such a requirement is not met by the decision’s corresponding budget line. For CRRA preferences (as assumed in VRE22) with some coefficient of relative risk aversion $\gamma$, the agent’s MRS is $(x/y)^{-\gamma}$, yielding the equation
$$L = \gamma \ln\frac{x}{y}.$$
This allows the elicitation and recovery of an agent’s $\gamma$ at the decision level via the use of the decision space’s $L$. Intuitively, an increase in the magnitude of $L$ can be thought of as increasing the obviousness of which security to have more of in an agent’s portfolio.
While $L$ serves as the main regressor in VRE22’s extraction of subjects’ elicited risk aversion $\gamma$, this paper uses $L$ to establish a measure of FOSD violation severity. More specifically, a threshold is placed on the measure for each decision, where choices yielding an estimate below the threshold indicate a major violation of FOSD. Each trial seen by each subject can be associated with a single value of $L$.
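The relationship between the log ratio, a CRRA choice and the recovered risk-aversion coefficient can be sketched in a few lines. This is a minimal illustration, not VRE22's estimation code: it assumes the log ratio takes the form L = ln(pi_x / pi_y) - ln(p_x / p_y) and that an interior CRRA choice satisfies L = gamma * ln(x / y). The function names and the trial parameters are hypothetical.

```python
import math

def log_ratio(pi_x, pi_y, p_x, p_y):
    """L = -ln(MRS) = ln(pi_x / pi_y) - ln(p_x / p_y) (assumed form)."""
    return math.log(pi_x / pi_y) - math.log(p_x / p_y)

def crra_bundle(gamma, L, p_x, p_y, m):
    """Interior CRRA optimum on the budget line, from ln(x / y) = L / gamma."""
    ratio = math.exp(L / gamma)          # x / y
    y = m / (p_x * ratio + p_y)          # from p_x * x + p_y * y = m
    return ratio * y, y

def implied_gamma(x, y, L):
    """Decision-level gamma recovered from a choice: gamma = L / ln(x / y)."""
    log_xy = math.log(x / y)
    if log_xy == 0.0:
        raise ValueError("x == y: gamma is not pinned down by this choice")
    return L / log_xy

# Hypothetical trial: equally likely states, x cheaper than y, budget of 12.
L = log_ratio(0.5, 0.5, 0.4, 0.6)
x, y = crra_bundle(2.0, L, 0.4, 0.6, 12.0)
gamma_hat = implied_gamma(x, y, L)       # recovers the gamma used above
```

Because the round trip goes from a known gamma to a bundle and back, it doubles as a quick consistency check on the sign conventions.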
Of the six sorts of tasks considered, five offer opportunities for FOSD violations: Holt–Laury, Budget Line, two variations of a new task named Budget Jars, and a spatial version of Holt–Laury named Budget Dots–Holt–Laury. The Holt–Laury (HL) task, originating from Holt and Laury (2002), is a text-based multiple price list which has six (traditionally 10) consecutive choices between two lotteries. The Budget Line (BL) task, per Choi et al. (2007), asks subjects to choose a bundle along a budget line. Budget Jars, an elicitation task developed in VRE22, has subjects begin with a “jar” of cash and use sliders to spend the cash on two Arrow securities, with (BJ) and without (BJn) cash retention allowed. The final task type, Budget Dots–Holt–Laury (BDHL), portrays each of the six lines of HL as a separate budget line, with the two feasible choices appearing as dots on the line.
2.2 Simulations
Along with the experimental data described above, I use simulation data from VRE22. The simulations provide estimates for automated agents making choices across the same risk elicitation tasks as the human subjects while following behavior akin to that of random coefficient models (Wilcox, 2008; Apesteguia and Ballester, 2018).
Each simulated run has a batch of automated agents making choices across the same 56 elicitation tasks as seen in the human subject sessions. Each agent has a task-specific “true” value of $\gamma$, tied to the matching human subject’s percentile within the $\gamma$ distribution in each task. An independent draw is made for each of the decisions from a normal distribution with the mean set as the agent’s task-specific “true” $\gamma$ and the standard deviation matching the task-specific variation in the human data. As such, each simulation creates a parallel data set to that produced by the human subjects. A set of 1000 such simulations was run, against which the human data is compared and ranked.
3 FOSD characterization
Suppose that $p_x = 0.4$ and $p_y = 0.6$ while $\pi_X = \pi_Y = 0.5$. No matter what her risk preferences, an agent facing these prices and probabilities should never choose a point on the budget line with $x < y$. For example, suppose she considered choosing $(7.5, 15)$, exhausting her budget $m = 12$. Since the states are equally likely, she would be just as happy with (15, 7.5), no matter what her Bernoulli function is. But the portfolio (15, 7.5) costs only 10.5, so she could afford to spend 1.5 more on either Arrow security and be strictly better off than at $(7.5, 15)$.
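The arithmetic of this example can be checked directly. The sketch below assumes prices of 0.4 and 0.6 and a budget of 12, values chosen because they reproduce the costs quoted in the example (10.5 for the swapped portfolio, with 1.5 left over). The helper names are hypothetical.

```python
def cost(bundle, prices):
    """Expenditure on a bundle of Arrow securities."""
    return sum(q * p for q, p in zip(bundle, prices))

def cdf(bundle, probs, z):
    """Probability that the realized payoff is no greater than z."""
    return sum(pr for payoff, pr in zip(bundle, probs) if payoff <= z)

prices = (0.4, 0.6)        # assumed (p_x, p_y), chosen to match the quoted costs
probs = (0.5, 0.5)         # equally likely states
budget = 12.0              # assumed endowment

original = (7.5, 15.0)     # holds more of the expensive security
swapped = (15.0, 7.5)      # same payoff distribution, lower cost
slack = budget - cost(swapped, prices)
improved = (swapped[0] + slack / prices[0], swapped[1])   # spend the slack on x

# CDFs of two-state lotteries are step functions, so it is enough to
# compare them at the payoff values themselves.
grid = sorted(set(original) | set(improved))
dominates = (
    all(cdf(improved, probs, z) <= cdf(original, probs, z) for z in grid)
    and any(cdf(improved, probs, z) < cdf(original, probs, z) for z in grid)
)
```

Spending the slack on the y security instead would work equally well; the choice of x here is only for concreteness.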
The general result is expressed in terms of first-order stochastic dominance (FOSD). Recall that lottery A (strictly) FOSDs lottery B iff $F_A(x) \le F_B(x)$ for all x, with strict inequality for some x. The definition refers to the cumulative distribution function $F_Z(x)$, the probability that the realized payoff in lottery Z is no greater than x. Recall also (e.g., Mas-Colell, Whinston, and Green, 1995, p. 195) that every expected utility maximizing agent prefers lottery A to B iff A FOSDs B.
Proposition 1
A choice (x, y) on the budget line is strictly first-order stochastically dominated by another choice on the same budget line iff
a. one Arrow state (e.g., X) is more likely and its security is less expensive (e.g., $\pi_X \ge \pi_Y$ and $p_x \le p_y$), with at least one of these comparisons strict; and
b. the choice includes strictly less of the less-expensive–more-likely security (e.g., $x < y$).
See Appendix A for a proof, which can be generalized in a straightforward manner to cover Prospect Theory with symmetric probability weighting as well as Disappointment Aversion and some other generalizations of expected utility theory.
The Proposition tells us that every choice on the budget line can be rationalized by some Bernoulli function if the more likely state has a higher price, or if $\pi_X = \pi_Y$ and $p_x = p_y$. But some choices will be dominated when prices are equal and probabilities differ, or the reverse, and when the more likely state has a lower price. In those cases, I can test for the rationality of subjects without committing to a functional form.
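The two conditions of Proposition 1 translate into a short predicate. The sketch below is an illustrative reading of the proposition, with hypothetical argument names (`pi_x`, `pi_y` for state probabilities, `p_x`, `p_y` for prices); it is not code from VRE22.

```python
def is_dominated(x, y, pi_x, pi_y, p_x, p_y):
    """True if (x, y) is strictly FOSD'd by another bundle on its budget line.

    Condition (a): one state is weakly more likely and weakly cheaper,
    with at least one comparison strict. Condition (b): the bundle holds
    strictly less of that favoured security.
    """
    x_favoured = pi_x >= pi_y and p_x <= p_y and (pi_x > pi_y or p_x < p_y)
    y_favoured = pi_y >= pi_x and p_y <= p_x and (pi_y > pi_x or p_y < p_x)
    return (x_favoured and x < y) or (y_favoured and y < x)
```

When the more likely state is also the more expensive one, or when both prices and both probabilities are equal, neither flag is set and the function returns False for every bundle.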
With the above proposition defined, the major violation cutoff [$L \ln(x/y) < -1$] mentioned before can be properly interpreted. Among decision spaces that satisfy the conditions in Proposition 1, those with x as the less-expensive–more-likely good will have a positive $L$, while those with y depicted as such will have a negative $L$. Thus, in decisions where FOSD violations are possible, if $L$ is positive, then (weakly) more x should be purchased than y, which satisfies $\ln(x/y) \ge 0$. Similarly, $\ln(x/y) \le 0$ should be satisfied in $L < 0$ cases, as more y should be purchased than x. Combined, these conditions result in $L \ln(x/y) \ge 0$ when no FOSD violation is made, and therefore FOSD violations are associated with negative values of $L \ln(x/y)$.
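The severity measure can be sketched as a three-way classifier on the product of L and ln(x / y). This is a minimal illustration that applies only to decisions where a violation is possible. The function name and labels are hypothetical, and rejecting corner bundles (where the logarithm is undefined) is an assumption of this sketch rather than a rule taken from the paper.

```python
import math

def violation_severity(x, y, L, cutoff=-1.0):
    """Classify a choice by the product L * ln(x / y).

    Returns 'none' if the product is non-negative, 'major' if it falls
    below `cutoff`, and 'minor' otherwise.
    """
    if x <= 0 or y <= 0:
        # Assumption of this sketch: corner bundles are not classified.
        raise ValueError("requires strictly positive x and y")
    product = L * math.log(x / y)
    if product >= 0:
        return "none"
    return "major" if product < cutoff else "minor"

L = math.log(1.5)   # hypothetical trial in which x is the favoured security
labels = [violation_severity(x, y, L) for x, y in [(15.0, 7.5), (7.5, 15.0), (1.0, 99.0)]]
```

Passing a different `cutoff` (for example -0.2) reproduces the kind of sensitivity check reported for alternative thresholds.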
4 Empirical results
| | BL | BJ | BJn | HL | BDHL (0.81) | BDHL (0.58) |
|---|---|---|---|---|---|---|
| Opportunities | 1960 | 1278 | 1247 | 280 | 70 | 70 |
| Violations | 263 | 131 | 135 | 23 | 8 | 13 |
| (Sim. Avg.) | 141 | 197 | 146 | 13 | 6 | 6 |
| (Sim. Perc.) | 100 | 0 | 15 | 100 | 86 | 100 |
| (Random) | 761 | 497 | 484 | 253 | 63 | 63 |
| Major Violations | 17 | 6 | 16 | – | – | – |
| (Sim. Avg.) | 50 | 59 | 57 | – | – | – |
| (Sim. Perc.) | 0 | 0 | 0 | – | – | – |
| (Random) | 233 | 188 | 182 | – | – | – |
“Opportunities” is the number of trials for each task that allowed violations of FOSD. “Violations” is the number of such violations. “(Sim. Avg.)” reports, to the nearest integer, the average number of violations in each task across 1000 Monte Carlo simulations. “(Sim. Perc.)” is the percentile the human data falls into within the 1000 simulations. “(Random)” gives the expected number of violations given i.i.d. uniformly distributed random choices in each task. A violation (x, y) at L is deemed “major” if $L \ln(x/y) < -1$. Counts are for the 140 subjects used in the VRE22 analysis (see VRE22 for an explanation of the dropped subjects). Only half of the subject pool interacted with BDHL.
Table 1 shows the overall frequency of dominated choices in the experiment of Friedman et al. (2022) as well as two benchmarks. The first row in each panel reports human subject violation counts, while the second and third rows report average simulated violation counts and the experimental data’s percentile among the simulated data. The final row in each panel reports the expected number of violations under random choice.
Multicrossings in six-row HL or BDHL trials imply dominated choices (see Appendix C), and these appear in the last three columns of Table 1. The HL violation rate is 8.2%, which is slightly lower than those found in recent studies such as Charness et al. (2018), though the HL task in VRE22 yields fewer chances to multicross. BDHL follows relatively closely in both price = 0.81 trials (11.4% violation rate) and price = 0.58 trials (18.6%). The other columns report first-order stochastic dominance violations in the remaining tasks, where Proposition 1 applies. A violation is deemed “major” if its log ratio lies outside the rectangular hyperbola $L \ln(x/y) = -1$. Table 1 shows a fair number of minor violations of FOSD, but rather few major violations. Table 2 in Appendix B looks at tighter criteria for major violations and confirms that a large majority of actual violations are small, due to clicking just a few dozen pixels away from an undominated choice in the BL task, or to purchasing just a little of an asset that is more expensive but not more likely in the BJ tasks. To summarize,
Result 1. Dominated choices are uncommon in all tasks, and only about 1% of observations in relevant tasks are major violations of first-order stochastic dominance (FOSD).
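The rates quoted above follow from the counts in Table 1. The short calculation below uses only those counts; the dictionary keys are just labels for the table columns.

```python
opportunities = {"BL": 1960, "BJ": 1278, "BJn": 1247,
                 "HL": 280, "BDHL (0.81)": 70, "BDHL (0.58)": 70}
violations = {"BL": 263, "BJ": 131, "BJn": 135,
              "HL": 23, "BDHL (0.81)": 8, "BDHL (0.58)": 13}
major = {"BL": 17, "BJ": 6, "BJn": 16}   # only defined for the continuous tasks

# Per-task violation rate: violations divided by opportunities.
rates = {task: violations[task] / opportunities[task] for task in opportunities}

# Share of major violations among observations in the tasks where they are defined.
major_share = sum(major.values()) / sum(opportunities[task] for task in major)

for task, rate in rates.items():
    print(f"{task}: {100 * rate:.1f}%")
print(f"major share: {100 * major_share:.2f}%")
```

The major share pools BL, BJ and BJn, which is one natural reading of "observations in relevant tasks".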
Additionally, I check these counts against two theoretical benchmarks. The first check makes use of a set of 1000 Monte Carlo trials simulated in the style of Apesteguia and Ballester (2018). I find that the human subjects violated more often than the simulated agents in the majority of investigated tasks; the human subject data set fell in the 86th percentile for BDHL (price = 0.81) violations and had more violations than all 1000 simulated data sets for BL, HL, and BDHL (price = 0.58) trials. For the two Budget Jar tasks, however, the human data set fell below all simulated data sets when cash could be retained (BJ), and fell at the 15th percentile among the simulated data sets when cash retention was not allowed (BJn). Human data reported fewer major violations than any simulated data set for each possible task type. Uniform random choice serves as the other main benchmark. In all tasks, the human subjects made violating choices far less often than agents choosing randomly.
At the subject level, violation counts varied widely, ranging from 0 violations to 16. The number of elicitation trials which allowed for FOSD violations seen by each subject varied based on treatment/session, being either 33, 37 or 38. Within task, each subject had two FOSD-possible trials in the HL task, 0 or 1 in BDHL (price = 0.81), 0 or 1 in BDHL (price = 0.58), 13–15 in BL, 8–12 in BJ, and 8–10 in BJn. Fig. 1 shows cumulative distribution functions for subject-level violation percentages, as well as major violation percentages and task-specific percentages. While the average violation rate at the aggregate level is roughly 12%, the subject-level data show individual rates can be as high as just over 42% (16 violations in 38 opportunities), though this is rare. Zero violations were made by of subjects, one violation by another , and fewer than by of subjects. When focusing on major violations, no subject made more than six such choices and did not commit any major FOSD violations.
At the task–subject level, variation across tasks appears at low violation rates. Within the HL/BDHL cluster of trials, the one-time violation rate of HL sits between the violation rates for the two BDHL tasks, though all three reveal at least of the subjects make no violations. In the cluster of more continuous trials, BL/BJ/BJn, separation appears early on. Nearly twice as many subjects make at least one violation in BL as in BJ/BJn, with a sizeable gap persisting until around a violation rate.
Result 2. Subject-level violation percentages vary widely, while major violations are made by less than a quarter of the subjects.
5 Conclusion
I characterize FOSD violations in an important set of tasks. Using data from Friedman et al. (2022), I investigate FOSD violation rates across several elicitation methods. Violations are relatively uncommon, falling into the range generally seen in the literature, while major violations are very rare across all task types studied. Human subjects make violations more often than simulated agents inspired by Apesteguia and Ballester (2018) in most tasks, yet human-made violations are generally much less severe.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. This paper made use of data generated in Friedman et al. (2022), which was funded by the National Science Foundation via grant SES-1357867.
Data availability
Data are available upon request.
Appendix A: Proof of Proposition 1
A budget line is the set of lotteries $(x, y)$ satisfying $p_x x + p_y y = m$, where m is an (implicit or explicit) endowment of cash, and $p_x$ and $p_y$ are the prices of the two Arrow securities, with state probabilities $\pi_X$ and $\pi_Y$.
Recall that a lottery L FOSDs another lottery M if their cumulative distribution functions (cdf’s) satisfy $F_L(z) \le F_M(z)$ for all $z$, and that the lottery ordering is strict if the inequality is strict for some $z$.
Proof
First, consider the case and , and suppose that . The cdf for lottery (x, y) is
We will construct another lottery (a, b) on the same budget line as (x, y) in two steps, and show that it strictly FOSDs (x, y). First, set and , and let G be its corresponding cdf. Then, for and , but for , so the lottery weakly FOSDs (x, y). Now set , where by hypothesis, and let H be the cdf for the lottery (a, b). Clearly except for where . Thus, (a, b) strictly FOSDs and thus, by transitivity, strictly FOSDs (x, y). To complete the proof for the present case, we need only verify that the expenditure on (a, b) is the same as on (x, y):
The other cases have very similar proofs. For example, if and , then the conclusion follows from the fact that strictly FOSDs (x, y). Of course, we can only guarantee weak FOSD of (x, y) with when both and . To show that (x, y) with is FOSD’d when and , we use precisely the same approach interchanging the roles of X and Y.
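As a worked instance of the two-step construction, take the case $\pi_X \ge \pi_Y$, $p_x < p_y$ and $x < y$. The notation and the particular choice of $(a, b)$ below are assumptions made for illustration; other constructions (for instance, spending the savings on the $y$ security) would serve equally well.

```latex
% cdf of the original lottery (x, y) and of the swapped lottery (y, x)
F(z) = \begin{cases} 0, & z < x \\ \pi_X, & x \le z < y \\ 1, & z \ge y \end{cases}
\qquad
G(z) = \begin{cases} 0, & z < x \\ \pi_Y, & x \le z < y \\ 1, & z \ge y \end{cases}

% Step 1: since \pi_Y \le \pi_X, G(z) \le F(z) for all z, so (y, x) weakly FOSDs (x, y).
% The swap is strictly cheaper because p_x < p_y and x < y:
p_x y + p_y x = (p_x x + p_y y) - (p_y - p_x)(y - x) < m

% Step 2: spend the savings on the x security
a = y + \frac{(p_y - p_x)(y - x)}{p_x}, \qquad b = x

% The cdf H of (a, b) equals G except on [y, a), where H(z) = \pi_Y < 1 = G(z),
% so (a, b) strictly FOSDs (y, x) and, by transitivity, (x, y).

% Expenditure check: (a, b) lies on the original budget line
p_x a + p_y b = p_x y + (p_y - p_x)(y - x) + p_y x = p_x x + p_y y = m
```

With the values from the example in Section 3 ($p_x = 0.4$, $p_y = 0.6$, $(x, y) = (7.5, 15)$), this gives $(a, b) = (18.75, 7.5)$, which costs exactly 12.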
To complete the proof, we need only show that no lottery on the budget line strictly FOSDd when (i) and or (ii) and , and to check subcases where the inequalities are weak. Of course, the arguments are the same for (ii) as for (i) due to the symmetric roles of X and Y, so it suffices to consider only case (i). For this case, let F, G be the cdfs for lotteries on the same budget line. Since the line is negatively sloped, one of the points, say (x, y), is northwest of the other, so and . There are now three subcases.
1. Both points are above the diagonal . Since , we have . It follows that for but for Hence, neither point FOSD’s the other.
2. Both points are below the diagonal . Since , we have . It follows that for but for again, there is no FOSD ranking.
3. but . We cannot have , as this would imply that the budget line has -slope but the hypothesis implies -slope . The other three orderings and are possible, but each implies a change in the sign of . For example, with , we have for but for .
The subcases where the inequalities are weak follow from taking limits as and .
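Proposition 1 can also be checked numerically by scanning a budget line for a strictly dominating bundle. The sketch below is a brute-force illustration with hypothetical function names and parameter values; a grid search can miss dominating bundles in principle, so it complements the proof rather than replacing it.

```python
def cdf(bundle, probs, z):
    """Probability that the realized payoff is no greater than z."""
    return sum(pr for payoff, pr in zip(bundle, probs) if payoff <= z)

def strictly_fosd(a, b, probs, tol=1e-12):
    """True if lottery a strictly first-order stochastically dominates b."""
    grid = sorted(set(a) | set(b))       # step functions: jump points suffice
    diffs = [cdf(a, probs, z) - cdf(b, probs, z) for z in grid]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)

def dominated_by_grid_search(x, y, probs, prices, steps=2000):
    """Scan the budget line through (x, y) for a strictly dominating bundle."""
    p_x, p_y = prices
    m = p_x * x + p_y * y                # budget implied by the chosen bundle
    for i in range(steps + 1):
        a = (m / p_x) * i / steps        # amount of the x security
        b = (m - p_x * a) / p_y          # remaining budget goes to y
        if strictly_fosd((a, b), (x, y), probs):
            return True
    return False
```

For example, `dominated_by_grid_search(7.5, 15.0, (0.5, 0.5), (0.4, 0.6))` finds a dominating bundle, while bundles on a line where the more likely state is also the more expensive one return False, in line with the proposition.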
Appendix B: Additional tables/figures
B.1: Major violation cutoff robustness
Table 2 shows the progression of violations over a subset of cutoffs $c$. As the criterion for major violations, $L \ln(x/y) < c$, weakens from $c = -1$ toward 0, the number of major violations naturally increases. Even at $c = -0.05$, over half of the BL violations are still not considered major violations, indicating that the majority of violations come from being only a handful of pixels away from what was likely intended to be a choice along $x = y$.
Table 2: FOSD major violations over a range of cutoffs

| Major Cutoff c | −0.05 | −0.1 | −0.2 | −0.3 | −0.4 | −0.5 | −0.6 | −0.7 | −0.8 | −0.9 | −1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BL major violations | 92 | 73 | 49 | 36 | 28 | 27 | 24 | 23 | 21 | 17 | 17 |
| BL major random | 715 | 646 | 544 | 466 | 410 | 366 | 330 | 300 | 275 | 252 | 233 |
| BJ major violations | 57 | 38 | 21 | 17 | 15 | 12 | 8 | 7 | 7 | 6 | 6 |
| BJ major random | 477 | 445 | 392 | 350 | 317 | 288 | 263 | 241 | 222 | 204 | 188 |
| BJn major violations | 58 | 46 | 32 | 25 | 22 | 21 | 20 | 19 | 17 | 17 | 16 |
| BJn major random | 466 | 433 | 380 | 339 | 306 | 278 | 254 | 233 | 214 | 197 | 182 |
Each column header is a different cutoff value c for the inequality $L \ln(x/y) < c$.
Appendix C: HL FOSD characterization
Take a trial of HL, where each row is a choice between two lotteries, A and B. Let lottery A be the safe lottery (closer to ) and B be the risky lottery (closer to a corner of the budget line) in each row. Each row of the trial can be characterized as follows: (x, y) with probabilities versus with probabilities , where is the state probability for state j in row i. In the variants of HL used in this paper, the following hold: , , , , for , and for .
Suppose a subject multicrosses, meaning B is chosen in some row m, while in some row $n > m$, A is chosen. *Note that each subject is assumed to have started with a choice of A. Even if in practice a subject selects B in row 1, he is assumed to have selected A in a preceding row had it been shown.* I conjecture that choosing A in row m and B in row n (call this choice AB) FOSDs choosing B in row m and A in row n (call this BA).
Assuming the set of row choices, not including rows m and n, in the two scenarios are the same, we can simplify the relevant payoffs for AB and BA such that rows m and n will be chosen as the paying lottery with equal probability. Thus, we can define the cumulative distribution functions for AB and BA and call them $F_{AB}$ and $F_{BA}$, as follows:
and
Thus, we can see for all values of z except . Over this union, is clearly true, thus we have . By definition, AB FOSDs BA.
This sketch can be expanded to show more severe multicrossings (more than two crosses) are also dominated by a reordering which forms a single crossing.
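The dominance of AB over BA can be illustrated numerically. The sketch below uses the payoffs from the original Holt and Laury (2002) menu (2.00 or 1.60 for the safe lottery, 3.85 or 0.10 for the risky one) purely as an illustration; the six-row variant used in VRE22 may differ, and the row probabilities 0.3 and 0.7 are hypothetical.

```python
from collections import defaultdict

def mix(lotteries):
    """Equal-probability mixture of lotteries given as {payoff: probability}."""
    out = defaultdict(float)
    for lottery in lotteries:
        for payoff, prob in lottery.items():
            out[payoff] += prob / len(lotteries)
    return dict(out)

def cdf(lottery, z):
    """Probability that the realized payoff is no greater than z."""
    return sum(prob for payoff, prob in lottery.items() if payoff <= z)

def strictly_fosd(a, b, tol=1e-12):
    """True if lottery a strictly first-order stochastically dominates b."""
    grid = sorted(set(a) | set(b))
    diffs = [cdf(a, z) - cdf(b, z) for z in grid]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)

def safe(p):
    """Lottery A: high payoff with probability p (illustrative payoffs)."""
    return {2.00: p, 1.60: 1 - p}

def risky(p):
    """Lottery B: high payoff with probability p (illustrative payoffs)."""
    return {3.85: p, 0.10: 1 - p}

p_m, p_n = 0.3, 0.7                      # row m appears before row n
AB = mix([safe(p_m), risky(p_n)])        # A in row m, B in row n (single crossing)
BA = mix([risky(p_m), safe(p_n)])        # B in row m, A in row n (multicrossing)
ab_dominates = strictly_fosd(AB, BA)
```

The two mixtures put the same total weight on the risky lottery; AB simply takes the risk in the row where the high payoff is more likely.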
Appendix D: Simulation process
Gamma distributions: means.
1. For each human subject i, divide their data into three subsets: BL/BJ/BJn tasks (continuous tasks), HL tasks, and BDHL/BDEG tasks. Each screen in these subsets has an implied gamma associated with it. Thus, each subject has a set of continuous task implied gammas, a set of discrete task implied gammas, and a set of BD task implied gammas (if in the appropriate session type). Each of these sets has its own estimation process. As BDEG is not discussed in this paper, its process will remain in VRE22. These are briefly summarized as follows:
• Continuous (BL, BJ/n): We use Eqs. (6) and (7) for single and multiple trial extraction, indexed by subject i, trial number t and task type. A weighted average of the elicited gammas across tasks provides each subject’s continuous-task gamma.
• HL: We use the traditional method of eliciting the crossover point in the HL list, unless the subject is inconsistent, in which case a logit estimation process occurs (see Section 5.3 of VRE22).
• Budget Dots: For BDHL, we use the same extraction process as HL.
2. For each of these implied gamma sets, take the average of the implied gammas to get a subject’s individual-specific gamma means (one each for the continuous, HL, and BD sets).
Gamma distributions: variability.
For each subject, a task level of variability is established as:
• Continuous: For each of the tasks, we use the subject’s standard error from estimating Eq. (7).
• HL: If the subject is inconsistent, then we use the standard error from estimating the logit model using that subject’s HL data. If not, then we use 0.
• Budget Dots: For BDHL, we use the same process as HL.
Simulation (gamma draw) process.
• For each trial (row in the simulated data set), using the matching subject ID and task type, we draw a gamma to be associated with the said row (six draws per row in HL trials).
• For each row, draw from a normal distribution where the central tendency is the appropriate gamma mean and the standard deviation is the subject’s task-specific variability as described above.
– Continuous: The gamma mean used is the continuous-task gamma mean for that specific subject.
– For HL: The gamma mean used is the value from the distribution of HL gamma means that sits at the same percentile as that subject’s percentile in the Continuous distribution.
– For BD: The same process as HL is used, but with BDHL-appropriate distributions.
• For each drawn gamma, we back out the (x, y) pair that the subject would have chosen given this drawn gamma.
• Given these (x, y) pairs, we perform the same extraction process as was done with the human data.
• Given these (x, y) pairs, we perform the same extraction process as was done with the human data.
The above process creates one simulated data set parallel to the human data set, with the same number of simulated agents as there are human subjects in the experimental data set. We run 1000 such simulation runs.
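The draw-and-back-out step for the continuous tasks can be sketched as follows. This is a minimal illustration under stated assumptions, not the VRE22 simulation code: it assumes the bundle implied by a drawn gamma satisfies ln(x / y) = L / gamma for any nonzero gamma (so a negative draw produces a dominated choice), it clamps the exponent to avoid overflow, and all function names and trial parameters are hypothetical.

```python
import math
import random

def backed_out_bundle(gamma, L, p_x, p_y, m):
    """Bundle implied by ln(x / y) = L / gamma on the budget line (assumed rule)."""
    if gamma == 0.0:
        raise ValueError("gamma must be nonzero")
    log_ratio = max(-50.0, min(50.0, L / gamma))   # clamp to avoid overflow
    ratio = math.exp(log_ratio)                    # x / y
    y = m / (p_x * ratio + p_y)                    # from p_x * x + p_y * y = m
    return ratio * y, y

def simulate_agent(trials, gamma_mean, gamma_sd, rng):
    """One simulated agent: draw a gamma per trial, back out the implied choice."""
    choices = []
    for L, p_x, p_y, m in trials:
        gamma = rng.gauss(gamma_mean, gamma_sd)    # independent normal draw
        choices.append(backed_out_bundle(gamma, L, p_x, p_y, m))
    return choices

def count_violations(trials, choices):
    """Count choices with L * ln(x / y) < 0, i.e. FOSD violations."""
    return sum(1 for (L, *_), (x, y) in zip(trials, choices)
               if L * math.log(x / y) < 0)

# Hypothetical agent facing 20 identical trials: (L, p_x, p_y, m).
trials = [(math.log(1.5), 0.4, 0.6, 12.0)] * 20
rng = random.Random(0)                             # seeded for reproducibility
noisy_choices = simulate_agent(trials, gamma_mean=0.5, gamma_sd=2.0, rng=rng)
n_violations = count_violations(trials, noisy_choices)
```

Repeating this for every subject and task, then for 1000 runs, would give a distribution of violation counts against which the human count could be ranked.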