1 Introduction
Across many decision contexts, individuals do not strictly adhere to standards of rationality: judgments and choices are influenced by irrelevant factors such as changes in presentation format (e.g., Kahneman & Tversky, 1984), the presence of random anchors (e.g., Tversky & Kahneman, 1974), and many more. It is, however, socially desirable for the outcomes of legal cases to depend solely on laws and relevant facts and for influences of extraneous factors to be minimal. Decisions should not, for instance, be influenced by the order in which cases are presented or by whether the judge is exhausted or hungry.
Still, it has been demonstrated that judges show the same fallacies and biases as other individuals do (e.g., Englich, Mussweiler & Strack, 2006; Guthrie, Rachlinski & Wistrich, 2000, 2007). In psychology, the prevailing descriptive models consequently take into account that legal decision making does not follow a purely rational calculation, but involves some constructive and intuitive element, making it potentially malleable to irrelevant factors (e.g., Pennington & Hastie, 1992; Simon, 2004; Thagard, 2006).
Similarly, in the legal literature the traditional view that legal judgments can, in the vast majority of instances, be mechanically or logically derived from official legal materials (such as statutes and reported court cases) has been challenged by legal realism (e.g., Frank, 1930), which maintains that “legal doctrine […] is more malleable, less determinate, and less causal of judicial outcomes than the traditional view of law’s constraints supposes” (Schauer, 2013). Legal realism holds that, aside from official legal materials, extraneous factors such as the ideology or policy preferences of the judge and general judgment biases influence legal rulings. Similar to current approaches in psychology, it has also been argued that rulings are partially guided by intuition (Hutcheson, 1929; see Schauer, 2013, for a review). Legal realism has a long history and many facets, but it is often caricatured by the phrase that “justice is what the judge ate for breakfast”, which has also become a trope for legal realism in general.
In summary, there is clear evidence that judicial decision making is influenced to some degree by extraneous factors, which is also reflected in prevailing theories in law and psychology. Danziger, Levav and Avnaim-Pesso (2011a) (hereafter DLA) aim to add to this body of evidence by demonstrating that deciding multiple cases in a row influences legal outcomes of later cases. DLA analyzed 1,112 legal rulings of Israeli parole boards that cover about 40% of the parole requests of the country. They assessed the effect of the serial order in which cases are presented within a ruling session and took advantage of the fact that the ruling boards work on the cases in three sessions per day, separated by a late morning snack and a lunch break.
DLA found that the probability of a favorable decision drops from about 65% in the first ruling to almost 0% in the last ruling within each session (Figure 1). The rate of favorable rulings returns to 65% in the session following the break. DLA argue that this effect of ordering shows that judges are influenced by extraneous factors, and they speculate that the effect is caused by mental depletion (Muraven & Baumeister, 2000). The argument is that, after repeated decisions, judges become exhausted, hungry or mentally depleted and resort to the simple and less effortful strategy of sticking with the status quo by rejecting the request, resulting in what could be called an “irrational hungry judge effect”.
Considering the tremendous consequences for human beings, the large magnitude of the effect, and the fact that the investigated boards decide almost half of the parole requests in Israel, these results are unexpected and potentially alarming. Consequently, the article has attracted attention, and the supposed order effect is widely cited in psychology (e.g., Evans, Dillon, Goldin & Krueger, 2011), law (e.g., Schauer, 2013), economics (e.g., Kamenica, 2012), and beyond (e.g., Gibb, 2012; Yamada et al., 2012). [Footnote 1] The fact that, in line with the trope for legal realism mentioned above, eating (or not) is considered important for legal rulings according to DLA might have additionally contributed to the tendency to cite it heavily.
2 Critical Evaluation
One further factor that most likely contributed to the popularity of the article is the large magnitude of the effect. A drop of favorable decisions from 65% in the first trial to 5% in the last trial as observed in DLA is equivalent to an odds ratio of 35 or a standardized mean difference of d = 1.96 (Chinn, 2000). This is more than twice the size of the conventional threshold for large effects (d = 0.8). The meta-analytic estimate for the effect of mental depletion, which is considered as a potential explanation for the drop, is d = –0.10 to 0.25 (publication-bias corrected), meaning that on average only small effects of mental depletion can be expected (Carter & McCullough, 2013). [Footnote 2] Similarly, a recent multi-lab registered replication study involving 23 labs (N = 2,142) found an effect of d = 0.04 that was not significantly different from zero (Hagger & Chatzisarantis, 2016). Hence, under the assumption that mental depletion is causing the findings, the magnitude of the effect observed by DLA is surprisingly large. It might, however, be argued that manipulations of depletion and exhaustion are stronger in reality than in the lab, causing stronger effects.
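The effect-size figures quoted above can be checked with a few lines of Python. The sketch assumes the logit-based conversion d = ln(OR) · √3 / π, which is the rule commonly associated with Chinn (2000); the function names are illustrative and do not come from the article.

```python
import math


def odds_ratio(p_first: float, p_last: float) -> float:
    """Odds of a favorable ruling at the first position relative to the last."""
    return (p_first / (1 - p_first)) / (p_last / (1 - p_last))


def odds_ratio_to_d(or_value: float) -> float:
    """Logit-based conversion (assumed here): d = ln(OR) * sqrt(3) / pi."""
    return math.log(or_value) * math.sqrt(3) / math.pi


or_dla = odds_ratio(0.65, 0.05)   # 65% favorable first, 5% favorable last
d_dla = odds_ratio_to_d(or_dla)
print(round(or_dla, 1), round(d_dla, 2))  # prints: 35.3 1.96
```

Dividing the resulting d by the conventional large-effect threshold of 0.8 gives a factor of about 2.5, consistent with the "more than twice" statement.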
Considering the latter issue and taking into account that the potential costs for giving wrong advice are high, it seems justified to take a closer look at the results and the analyses on which they are based.
2.1 Non-random Order of Cases
One crucial assumption permitting conclusions concerning the effect of case ordering is that case ordering is random or at least not driven by hidden factors that are not taken into account in the analysis. If more severe cases went first, for example, and severe cases at the same time reduced the likelihood of favorable decisions, spurious correlations could result. In their regression analyses, DLA take this concern into account by including reasonable control variables for substantive factors that might influence both ordering and rulings. They show that the results remain robust when controlling statistically for severity of offence, previous imprisonment, months served, participation in a rehabilitation program, and proportion of previous favorable decisions.
Still, in a direct reply to DLA, it has been argued that case order is influenced by systematic factors that DLA did not account for (Weinshall-Margel & Shapard, 2011). Specifically, Weinshall-Margel and Shapard (2011) conducted informal interviews with persons involved in the parole decision process (including a panel judge) and came to the conclusion that case ordering is not random. They argue, among other things, that the downward trend might be due to the fact that, within each session, unrepresented prisoners usually go last and are less likely to be granted parole than prisoners who are represented by attorneys. In a response, Danziger, Levav and Avnaim-Pesso (2011b) show that the downward trend also holds when controlling for representation by an attorney, although they do not report whether the magnitude of the effect remains the same, which seems unlikely given the correlation pattern reported above. Note also the more general methodological problem that statistical control need not remove the full effect of a variable measured in rough categories (e.g., severity of offence) or with error.
2.2 Decision to Take a Break
A second, potentially more subtle, concern is that results might be driven by factors that systematically influence judges’ decisions to take a break. DLA analyze whether properties of a case influence the likelihood of taking a break afterwards. They report that the substantive case properties mentioned above do not predict when a break is taken. Furthermore, they argue that judges do not know details of the upcoming case such as whether the prisoner has a previous incarceration record or not. Interestingly, Weinshall-Margel and Shapard nevertheless report that, according to their interviewees, judges might aim to finish a set of cases (e.g., to complete all cases from one prison) within a session. This indicates that some organizational planning occurs. At first glance, however, it seems hard to understand how this mere organizational planning of when to end a session, without taking into account any details of the case, could contribute to the downward trend. I will discuss this issue in detail in the next section.
In summary, in their reply DLA (2011b) argue that they could rule out all alternative explanations. They therefore uphold their conclusion that parole decisions are influenced by legally irrelevant factors, in that repeated choice causes the likelihood of a favorable decision to decrease as the session progresses.
3 Rational Time Management and Selective Dropouts
If we accept that the effect of ordinal position also holds after all reasonable substantive factors that might have influenced ordering and decisions to take a break are ruled out, we must still ask whether more subtle factors could explain the observed effects, without assuming that judgments are influenced to a large degree by irrelevant factors. One major concern is the combined effect of selective dropout and rational time management, that is, deciding when to end a session so that cases or sets of cases are completed within it. Selective dropout in this context refers to the possibility that, for whatever reason, cases with favorable rulings are less likely than cases with unfavorable rulings to be in the sample of cases with higher ordinal numbers in a session.
DLA report that favorable rulings take longer (M = 7.37 min, SD = 5.11) than unfavorable rulings (M = 5.21 min, SD = 4.97). The number of cases completed in each session varies between 2 and 28 [Footnote 3], and DLA present rulings for 10 to 13 cases within each session, with the last ruling in each session having a probability of zero (or in one case close to zero) of being favorable. Consequently, the number of observations within each session decreases with ordinal position and the last observations in a session are likely to consist of a few observations only. Considering that favorable rulings take longer than unfavorable rulings, the dropout is not random. On average, sessions that consist of mainly unfavorable decisions will allow judges to make many rulings. Therefore, in the reduced sample of observations constituting the data for higher ordinal positions, the relative frequency of rulings from sessions with mainly unfavorable decisions increases. [Footnote 4]
Judges have to finish cases before they take a break. To avoid starving, they are likely to avoid starting potentially complex cases (or sets of cases) directly before the break. It seems reasonable to assume that simple surface features that are available before investigating the case in detail (e.g., amount of material, kind of the request, representation by an attorney, some specifics of the attorney, the prison, or the prisoner) allow judges roughly to estimate the time the next case will take above chance level. Importantly, such surface features could also be unrelated to the content features that could produce non-random ordering of cases and that DLA already control for in their analysis.
Still, as mentioned above, it is hard to see whether and to what degree not starting overly long cases before a break would lead to the observation of downward sloping effects without assuming that judgments are influenced by extraneous factors at all. I conducted simulations to make the effect visible.
4 Simulating Choice Patterns by a (hypothetical) Rational Judge
I simulated the rulings of an ideal judge who makes choices without errors and biases. I assume that she has a rough time limit for each session and works on cases until recognizing that the next case would go over this limit. That case is then no longer handled in the current session but becomes the first case of the next session. [Footnote 5]
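A minimal sketch of such a simulation, under stated assumptions, could look as follows. It is not the original simulation code. The 36% base rate and the mean durations (7.37 and 5.21 min) are borrowed from figures quoted elsewhere in the text; the common SD of 2 min, the 60-minute session limit, the 0.5-minute floor on durations and the name `simulate_foresight` are purely illustrative assumptions.

```python
import random
from collections import defaultdict


def simulate_foresight(n_cases=50_000, base_rate=0.36, limit=60.0,
                       mean_fav=7.37, mean_unfav=5.21, sd=2.0, seed=1):
    """Rational judge with foresight: a case that would overrun the session
    limit is not started but opens the next session instead.
    All parameter defaults are assumptions made for illustration."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # position -> [favorable, total]
    elapsed, position = 0.0, 0
    for _ in range(n_cases):
        # The outcome is fixed by the case itself, never by its position.
        favorable = rng.random() < base_rate
        mean = mean_fav if favorable else mean_unfav
        duration = max(0.5, rng.gauss(mean, sd))  # floor avoids negative times
        if position > 0 and elapsed + duration > limit:
            elapsed, position = 0.0, 0  # take a break; case opens next session
        position += 1
        elapsed += duration
        counts[position][0] += favorable
        counts[position][1] += 1
    return {pos: (fav / total, total)
            for pos, (fav, total) in sorted(counts.items())}


if __name__ == "__main__":
    for pos, (rate, n) in simulate_foresight().items():
        print(f"position {pos:2d}: favorable = {rate:.2f} (n = {n})")
```

Because rulings in this sketch are drawn independently of ordinal position, any slope in the printed proportions can only arise from which cases end up at which position, not from a change in the judge's behavior.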
The results indicate that, following the approach by DLA, a rational judge working on cases that are presented in random order would show a strongly decreasing probability of a favorable decision towards the end of the session. Even the shape of the curve and the magnitude of the effect are comparable to those observed by DLA. Simulations assuming normally distributed decision times (Figure 2, right) or more realistic positively skewed decision times that follow a Weibull distribution (Figure 2, left) lead to similar conclusions, and repeated simulations show that the qualitative pattern of results is robust to changes in distributional assumptions. As one could expect, however, estimates become unstable for higher decision numbers due to the low number of remaining observations (see Figure 2, size of circles), resulting in occasional peaks to high or zero percentages. Not surprisingly, statistical analysis reveals that the downward trend is significant and that first decisions are more favorable than later ones, as was found by DLA.
Figure 3 shows why this effect appears for the normally distributed case. Distributions of decision time have different means with favorable cases taking longer than unfavorable ones (left panel). Consequently, the relative frequency of favorable cases (in all cases) that would still fit in the session decreases with remaining time. In our example, if 15 minutes remain in the session, essentially all cases would still be started since such long times are rare both for favorable and unfavorable cases (Figure 3, left). The ratio of favorable and unfavorable cases therefore roughly reflects the overall ratio in the population. For 5 minutes remaining, however, only 12% of the favorable cases could still be included in the session, whereas the respective proportion for unfavorable cases is much higher at 46%. Hence, the relative frequency of favorable cases, as compared to all cases, decreases with the time that remains causing selective dropout.
The cumulative probability distribution for favorable decisions (taking into account differences in base rates for both events) is plotted in the right panel of Figure 3. For long remaining times, the proportion of favorable cases is close to the base rate of 36%. For short remaining times, the proportion approaches values close to zero. With an increasing decision number within a session, the remaining time decreases, causing the downward sloping effect. Since sessions can stop after 1 to 14 decisions, the stopping effect is not only found after case 14, but already to a smaller degree for earlier cases. Hence, the probability can be expected to drop from 36% to zero percent for later rulings.
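The mechanism can also be written down directly with Bayes' rule, as in the sketch below. It assumes normally distributed durations with means of 7.37 and 5.21 min and a common SD of 2 min. The SD is an assumption chosen because it yields inclusion proportions close to the 12% and 46% quoted for 5 remaining minutes; the simulation parameters themselves are not stated in this section. The function names are illustrative.

```python
from statistics import NormalDist


def share_that_fits(remaining, mean, sd=2.0):
    """Share of cases of one type whose duration fits into the remaining time."""
    return NormalDist(mean, sd).cdf(remaining)


def p_favorable_given_fits(remaining, base_rate=0.36,
                           mean_fav=7.37, mean_unfav=5.21, sd=2.0):
    """P(favorable | case fits in remaining time), weighting by base rates."""
    fav = base_rate * share_that_fits(remaining, mean_fav, sd)
    unfav = (1 - base_rate) * share_that_fits(remaining, mean_unfav, sd)
    return fav / (fav + unfav)


if __name__ == "__main__":
    for minutes in (15, 10, 5, 3):
        print(minutes, round(p_favorable_given_fits(minutes), 3))
```

Under these assumptions the function returns a value very close to the base rate of 0.36 for 15 remaining minutes and roughly 0.13 for 5 remaining minutes, and it keeps falling as the remaining time shrinks.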
It remains to be explained why the proportion of favorable rulings (in both the simulation and the DLA data) peaks beyond 36% in the first round. This “beginning effect” is indirectly caused by the above mechanism as well, since the session is more likely to end before a favorable ruling than before an unfavorable ruling. The probability mass that is missing in the last decision of the previous session adds to the probability mass of favorable cases in the first decision of the next session (either on the same day or the first session of the next day). If one assumes that planning is not only done for single cases, but also occasionally concerns sets of cases (Weinshall-Margel & Shapard, 2011), this would explain why the probability of a favorable decision in the second and third ruling in a session is also above the base rate of 36%. [Footnote 6] Furthermore, the observation by DLA that the overall length of sessions varies considerably does not speak against the planning explanation, since the effect also holds under the assumption that judges have implicit time limits that vary from session to session. Also, it should be noted that the planning described here is merely organizational and does not require any foresight concerning how the case will be decided. All it requires is that the judges have a rough estimate of whether the next case will be quick or take longer.
5 Further Factors: Autocorrelation and Censoring
Having shown that rational time management and selective dropout can cause dramatic drops in the rate of favorable rulings, I next examine the robustness of this finding and the influence of further factors. Two factors are considered. First, DLA report that they censor their data, in that the last 5% of the cases in each session are dropped, with the intention of eliminating small samples at higher ordinal positions. Second, as mentioned above (Footnote 4), results from DLA indicate that there is an autocorrelation in the time series, in that rulings correlate with previous ones. Since the consequences of these factors are again hard to anticipate, I conducted further analyses to explore their effects.
To investigate the effect of censoring, I dropped the last 5% of the rulings within each session in the normal distribution data set from above (Figure 2, right) and analyzed the data again. Results remained largely the same, but censoring increased the magnitude of the drop (Figure 4, left), which was also observed for the Weibull data set and was consistently replicated in further simulations. Hence, censoring artificially increases selective dropout, and therefore it should not be used when analyzing the effect of ordinal position on favorable rulings.
To investigate the effect of autocorrelation, I generated new data sets (N = 50,000) based on normally distributed response times with the same parameters as above in which, however, rulings correlated to a low degree with the rulings directly before them. Figure 4 (right) shows results from a data set with a (first-order) autocorrelation of r = .10 and including censoring as above. Results are generally comparable to the results from the independent data set, and autocorrelation did not noticeably change the magnitude of the artifact.
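One simple way to generate binary rulings with a low first-order autocorrelation is sketched below. This construction is an assumption for illustration and not necessarily the procedure used for Figure 4: with probability `rho` a ruling repeats the previous one, otherwise it is drawn afresh from the base rate. This keeps the marginal rate at the base rate and gives a lag-1 correlation of `rho`. The function names are hypothetical.

```python
import random


def autocorrelated_rulings(n=50_000, base_rate=0.36, rho=0.10, seed=1):
    """Binary rulings whose lag-1 autocorrelation is approximately rho."""
    rng = random.Random(seed)
    rulings = [rng.random() < base_rate]
    for _ in range(n - 1):
        if rng.random() < rho:
            rulings.append(rulings[-1])               # repeat previous ruling
        else:
            rulings.append(rng.random() < base_rate)  # fresh independent draw
    return rulings


def lag1_correlation(xs):
    """Pearson correlation between each ruling and its predecessor."""
    x, y = xs[:-1], xs[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    rulings = autocorrelated_rulings()
    print(round(sum(rulings) / len(rulings), 3),
          round(lag1_correlation(rulings), 3))
```

A sequence produced this way could replace the independent outcome draws in a session simulation to check whether the artifact changes.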
6 Rational Time Management without Foresight
One assumption underlying the simulations reported so far is that judges plan ahead and do not start a case that would be too long to finish within the time limit for a session. This planning would require some degree of foresight in that judges (or other people administratively involved) generate estimates of the time required for finishing the upcoming case. These estimates do not need to be exact to generate the artifact; they can be based on rough surface cues as mentioned above. Cues for time management might also be (consciously or unconsciously) conveyed by administrative persons involved in the process of handling cases (Pfungst, 1911). As mentioned above, DLA state that details about the upcoming case are not known to the members of the board in advance. Since, however, DLA did not have full control over the situations, the existence of such cues cannot be entirely ruled out.
Still, presuming foresight in many cases is admittedly a relatively strong assumption. I therefore tested whether the analysis conducted by DLA would also generate similar artifacts without foresight, in that judges stop after a case has gone over the available time limit. [Footnote 7] When conducting this analysis without censoring and autocorrelation, all artifacts disappear, as one would expect. Interestingly, however, when including censoring and autocorrelation, a downward sloping effect appears again (Figure 5). The reason for this is that cases with favorable rulings are more likely to hit the time limit than cases with unfavorable rulings due to the mere fact that they are longer. Dropping 5% of the cases at the end often means dropping this last case, which is more likely favorable than unfavorable. Hence, censoring causes selective dropout of favorable cases even without foresight and artificially induces a downward sloping effect of favorable rulings. The effect was, however, smaller than in the simulations with foresight and caused a drop of roughly 15% only.
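The no-foresight variant can be sketched in the same style. All parameter values are illustrative assumptions (36% base rate, mean durations of 7.37 and 5.21 min, a common SD of 2 min, a 60-minute limit). Censoring is approximated here by dropping the final ruling of every session, which is a stronger form of censoring than dropping 5% of cases, so the size of the resulting drop is not comparable to the roughly 15% reported above. Autocorrelation is left out for brevity.

```python
import random
from collections import defaultdict


def simulate_no_foresight(n_sessions=5_000, base_rate=0.36, limit=60.0,
                          mean_fav=7.37, mean_unfav=5.21, sd=2.0,
                          censor=True, seed=1):
    """Judge without foresight: every case that is started is finished, and
    the session ends once the time limit has been passed.
    All parameter defaults are assumptions made for illustration."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # position -> [favorable, total]
    for _ in range(n_sessions):
        rulings, elapsed = [], 0.0
        while elapsed <= limit:  # keep starting cases until the limit passes
            favorable = rng.random() < base_rate
            mean = mean_fav if favorable else mean_unfav
            elapsed += max(0.5, rng.gauss(mean, sd))
            rulings.append(favorable)
        if censor:
            # Assumed censoring rule: drop the case that overran the limit.
            rulings = rulings[:-1]
        for position, favorable in enumerate(rulings, start=1):
            counts[position][0] += favorable
            counts[position][1] += 1
    return {pos: (fav / total, total)
            for pos, (fav, total) in sorted(counts.items())}


if __name__ == "__main__":
    uncensored = simulate_no_foresight(censor=False)
    censored = simulate_no_foresight(censor=True)
    for pos in sorted(censored):
        print(pos, round(uncensored[pos][0], 2), round(censored[pos][0], 2))
```

With the same seed, both runs use identical sessions, so any difference between the two columns is attributable to the censoring step alone.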
7 Discussion
In a comprehensive analysis of legal rulings of Israeli parole boards, DLA found that the proportion of favorable rulings decreases with serial order within a session but goes back to the initial level after a session break that includes eating a meal. This finding is important as well as potentially alarming, since both serial order and food supply are clearly extraneous factors that should not affect whether a parole request is decided favorably or not. DLA argue that their findings indicate that extraneous variables influence judicial decisions and cautiously interpret their finding with reference to a mental depletion account.
I critically revisited this interpretation and tested whether the core of the conclusion (namely that order and mental depletion causally influence the outcome of legal judgments) can be drawn on the basis of the presented data. Specifically, I tested whether the observed downward trend could also result from selective dropout of favorable cases due to rational time management, censoring of data and autocorrelation. The analysis shows that large parts of the findings (but admittedly not all aspects, see below) could be accounted for by this explanation.
The simulations show that the seemingly dramatic drop of favorable rulings from 65% to almost 0% towards the end of each session does not conclusively indicate bias or error in judicial decision making. A drop of comparable — although somewhat smaller — magnitude would be produced by a (hypothetical) rational judge who aims to avoid starting work on cases that could not be completed in the time that remains in the current session. Furthermore, the simulations revealed that the practice of censoring data within a session is problematic and artificially induces a downward sloping effect even without foresight and under the less restrictive assumption that judges stop each session after a time limit has been passed. Hence, the analyses by DLA do not provide conclusive evidence for the hypothesis that extraneous factors influence legal rulings.
7.1 Caveats
It has to be acknowledged that the analyses reported in this paper do not preclude that serial order and mental depletion might have affected the legal judgments analyzed by DLA. The analysis, however, demonstrates that there is a possible alternative explanation for large parts of the results within a rational framework that does not require the assumption of any influence of extraneous factors. The strong downward-sloping effect could, at least in part, simply reflect a statistical artifact.
Still, rational time management and selective dropout cannot account for all aspects of the data by DLA. First, the magnitude of the effects reported in the simulations was somewhat smaller than the magnitude of the original effects. [Footnote 8] This was mainly due to the fact that, second, in the original data the percentage of favorable rulings started at a higher level than in the current simulations (i.e., 65% instead of 45%). In particular, the high starting rates at the beginning of the day (and not only after the breaks) are hard to explain by my account, since postponing cases to a different day and panel does not seem very likely. [Footnote 9] Third, since the statistical effects described here are driven by ordinal position, they cannot easily explain the effects of time on favorable rulings that DLA also report. Fourth, the shapes of the curves differ in some details, in that the empirical curve tended to be smoother whereas the simulated data showed stronger drops at the beginning and the end, with a flatter area in between. Finally, given that according to DLA the setting might have precluded direct foresight concerning the upcoming case to some degree, the remaining effects of rational time management could be estimated to account for a drop of 15% to 45% only.
In sum, rational time management and selective dropout, although potentially important, can explain the findings by DLA only in part. Hence, further factors may exist that contributed to the observed downward-sloping effect. The remaining differences could potentially be explained by other methodological factors such as the issue of non-random ordering, in that prisoners represented by attorneys went first (Weinshall-Margel & Shapard, 2011). Alternatively, extraneous factors such as causal effects of serial case ordering and mental depletion might have played a role. Since the data are not available [Footnote 10] for further detailed analyses and the exact circumstances under which the rulings were made cannot be fully reconstructed, these issues have to be addressed in further studies.
The analyses reported here indicate that the effect of serial order and mental depletion is overestimated in the original work by DLA. Rational time management concerning when to take a break and the effects of non-random ordering of cases, with represented prisoners going first, identified by Weinshall-Margel and Shapard (2011) are lumped together with potential effects of serial order and mental depletion, so that the latter are overestimated. Disentangling these influences should lead to more reasonable (smaller) estimates concerning the magnitude of the effect. According to previous findings on mental depletion, the “irrational hungry judge effect” should at best be small in magnitude (if it exists at all; see Carter & McCullough, 2013), which might render the observed extraneous influence less relevant from a practical point of view and the need for state interventions less urgent.
More generally, the analysis shows that sometimes there is a nonobvious rational basis for irrational-looking behavior. Computer simulations as well as formal mathematical analyses are tools for identifying such cases. Such analyses have revealed, for example, that whole strands of literature supposedly demonstrating irrational behavior, such as spreading-apart effects after choice (Chen & Risen, 2010), unrealistic optimism (Harris & Hahn, 2011) or the adaptive usage of simple heuristics (Jekel & Glöckner, in press), are methodological or statistical artifacts that would be shown by completely rational agents as well. I argue that simulations of rational agents and formal mathematical analyses should be used earlier and more intensely in the research process to investigate findings of supposedly hugely irrational behavior before jumping to the conclusion that legal actors, or any other individuals, are irrational.