1 Introduction
This paper is about the benefits of being wrong. The rise of modern science began with the falsification of a theory that had been considered indisputable truth for centuries: geocentrism. Copernicus, Kepler, and Galileo provided empirical evidence that falsified the notion that the earth is the center of the universe. Their work gave rise to the empirical sciences, governed by the notion that a proposition has to stand its ground against reality in order to be accepted.
If we really want to know whether a rule or regularity holds in the real world, we must not be satisfied with instances of verification. We have to seek critical tests to approximate truth.
This may appear to readers as a truism. One would expect scientists to seek strong rules (laws) and put them to critical tests. However, we observe a trend in the very opposite direction, at least in our field of judgment and decision making. Many theories are weakly formulated. They do not put forward strong rules and are, at least to some extent, immune to critical testing. Therefore, the number of theories that peacefully coexist in the literature is constantly growing. How many theories address judgment and decision making? Which one is superior to the others and makes the best predictions? We do not know, and neither may many readers of this article. We believe that theory accumulation and the coexistence of weak theories have their origins in the fact that we lack shared standards for theory formulation and publication. We offer some suggestions for a remedy. To lay the ground for these, we start with a summary of a contribution made by Karl Popper decades ago.
Popper (1934/2005) suggested that, prior to any empirical testing, scientific theories should be evaluated according to their empirical content (empirischer Gehalt), that is, the amount of information they convey about the world. He argues that “[t]he empirical content of a statement increases with its degree of falsifiability: the more a statement forbids, the more it says about the world of experience” (p. 103). Two criteria that determine the empirical content of a theory are its level of universality (Allgemeinheit) and its degree of precision (Bestimmtheit). The former specifies how many situations the theory can be applied to. The latter refers to the precision of its predictions, that is, how many “subclasses” of realizations it allows. As we will describe more formally later, the degree of precision is thereby used as a technical term for how much a theory forbids in the situations to which it applies. It does not refer to the absence of ambiguity about whether a prediction follows from the theory, because an ambiguous theory would not be considered a theory at all in the strict Popperian sense. To avoid confusion, we adopt the translation of this term from the English edition of Popper’s writings, although it might be debated whether it conveys the exact meaning of the concept. An alternative translation could be specificity (Footnote 1).
The empirical content of a theory is, a priori, independent of empirical testing and therefore also independent of the theory’s degree of corroboration (Bewährung) (Footnote 2). Accordingly, falsifiability is a feature of a theory that can be assessed by means of logical analysis. It pertains to the empirical content of a theory, which can be determined without conducting empirical tests. In the current paper, we adopt the Popperian perspective and reflect on the empirical content of models in judgment and decision making (JDM). We presuppose two things: a) theories (in the Popperian sense) are necessary to generalize findings beyond observed data, and therefore theory development is an important part of the daily work of scientists; and b) some central aspects of Popper’s philosophical theory of science, particularly concerning the definition of empirical content, are valid.
Note that both assumptions are very basic and do not even necessitate accepting the Popperian approach in general. It should, however, also be noted that other parts of Popper’s philosophical theory of science have been debated and/or extended in subsequent and contemporary philosophical and methodological writing. Particularly noteworthy are the strong inference approach (Platt, 1964) and Bayesian methods for evaluating the degree of corroboration of hypotheses and theories (e.g., Horwich, 1982; Wagenmakers, 2007). These developments are also reflected in work on JDM (e.g., strong inference using a critical property testing approach: Birnbaum, 1999, 2008a, 2008b; for applications of Bayesian approaches to JDM, see Bröder & Schiffer, 2003a; Lee & Newell, 2011; Matthews, 2011; Nilsson, Rieskamp, & Wagenmakers, in press). We will discuss these developments and core points of criticism in the last section of this paper and aim to clear up some misunderstandings.
2 Evaluation of Theories
2.1 Development and formulation of scientific theories
Popper suggests a standardized process for developing theories and selecting among them prior to empirical testing. It starts with putting forward a tentative idea from which conclusions are drawn by means of deductive logic. Then: (I) these conclusions are compared amongst themselves to test internal consistency; (II) the logical form of the system of conclusions is examined to determine whether it has the character of a scientific theory; and (III) the conclusions are compared with the predictions of existing theories to determine whether establishing them would constitute a scientific advantage.
Only after a theory has successfully passed evaluation on the logical level can it be subjected to critical empirical testing. In the current paper, we focus on the steps that come before empirical testing and start our discussion with Popper’s third criterion.
2.2 Scientific advantage (III): Empirical content, universality and precision
Popper argues in favor of replacing inductive with deductive testing of theories (later elaborated by Hempel & Oppenheim, 1948). The question why a phenomenon happened “is construed as meaning according to what general laws, and by virtue of what antecedent conditions does the phenomenon occur” (Hempel & Oppenheim, 1948, p. 136). A theory can be translated into a set of premises (law-like statements, definitions, etc.) that, given a set of antecedent conditions, allow predictions concerning empirical phenomena to be derived by logical deduction. In Popper’s notation, theories can therefore be written as a set of general implications of the form (x)(ϕ(x) → f(x)), which means that all values of x that satisfy the statement function ϕ(x) also satisfy the statement function f(x). Both statement functions can consist of multiple elements or conditions (connected by logical operators such as AND and OR).
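To make the deductive scheme concrete, the following minimal formalization (our illustration; only the general implication itself is Popper’s notation) spells out how a prediction is derived and how a falsification would proceed:

```latex
% A theory as a universal implication: whenever the antecedent conditions hold,
% the predicted phenomenon follows.
\forall x\,\bigl(\phi(x) \rightarrow f(x)\bigr)

% Illustrative instantiation for an expected-value (EV) theory of risky choice:
%   phi(x): x is a choice between two gambles A and B with EV(A) > EV(B)
%   f(x):   in x, the decision maker chooses gamble A
% Prediction (modus ponens):     from \phi(a) for a concrete task a, infer f(a).
% Falsification (modus tollens): observing \phi(a) \wedge \neg f(a) contradicts
% the universal implication.
```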
Level of Universality. A theory (Footnote 3) has a high level of universality if the antecedent conditions summarized in ϕ(x) include as many situations as possible. An expected value (EV) model, for example, which predicts “when selecting between gambles, people choose the gamble with the highest expected value”, has a higher universality than the priority heuristic (PH; Brandstätter, Gigerenzer, & Hertwig, 2006). PH has multiple antecedent conditions that all have to be fulfilled and that therefore reduce universality. Specifically, PH predicts “if people choose between pairs of gambles, and if the ratio of the expected values of the two gambles is below 1:2, and if neither gamble dominates the other, then people choose the gamble with the higher minimum payoff” (Footnote 4). The class “decisions between pairs of gambles with a limited ratio in expected values and without dominance”, for which PH is defined, is a subclass of the class “all decisions between gambles”, for which EV is defined. Therefore, EV has a higher level of universality than PH.
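As a rough illustration of how antecedent conditions restrict the class of situations a theory covers, the sketch below checks whether a pair of gambles falls into the class for which PH is defined, whereas EV is defined for any pair. It is a simplified reading of the conditions paraphrased above, not the full specification by Brandstätter et al.: the EV-ratio condition is read as “the expected values differ by less than a factor of two”, and the dominance check is left as a stub.

```python
# Sketch: antecedent conditions restrict the class of tasks to which a theory applies.
# Gambles are lists of (payoff, probability) pairs; all numbers are illustrative.

def expected_value(gamble):
    return sum(payoff * prob for payoff, prob in gamble)

def dominates(gamble_a, gamble_b):
    # Placeholder: a full stochastic-dominance check is omitted in this sketch.
    return False

def ev_applies(gamble_a, gamble_b):
    # The EV model is defined for all choices between pairs of gambles.
    return True

def ph_applies(gamble_a, gamble_b):
    # PH (as paraphrased above) is defined only if the expected values are similar
    # (read here as: differ by less than a factor of two) and neither gamble dominates.
    ev_a, ev_b = expected_value(gamble_a), expected_value(gamble_b)
    similar_evs = min(ev_a, ev_b) / max(ev_a, ev_b) > 0.5
    no_dominance = not dominates(gamble_a, gamble_b) and not dominates(gamble_b, gamble_a)
    return similar_evs and no_dominance

a = [(100, 0.5), (0, 0.5)]   # EV = 50
b = [(60, 0.8), (0, 0.2)]    # EV = 48
c = [(10, 0.9), (0, 0.1)]    # EV = 9
print(ph_applies(a, b), ph_applies(a, c))   # True False: PH covers only a subclass of tasks
print(ev_applies(a, b), ev_applies(a, c))   # True True:  EV covers all such tasks
```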
Degree of Precision. A theory’s degree of precision increases with the specificity of the predicted phenomenon or, more formally, with the classes of elements that the phenomenon specification function f(x) forbids. A theory EV+C that predicts that “when selecting between gambles, people choose the gamble with the higher expected value, and their confidence will increase with the difference between the expected values of the gambles” is more precise than the EV model mentioned above. It describes the behavior more specifically, and all findings that falsify EV also falsify EV+C, but not vice versa. For one dependent measure, a theory is more precise than another if it allows fewer different outcomes (e.g., it predicts judgments in a smaller range or allows choices of fewer alternatives).
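The following sketch (purely illustrative; the outcome coding and the numbers are our assumptions) treats precision as the size of the set of outcomes a theory still allows: EV restricts only the choice, whereas EV+C also restricts how confidence may vary across two tasks with different EV differences.

```python
# Sketch: precision as the set of outcomes a theory allows (a smaller set = more precise).
# Two tasks are compared; the "confidence order" codes whether confidence is higher in
# task 1, equal in both, or higher in task 2. All codings are illustrative.

ev_difference = {"task1": 10, "task2": 50}   # EV(A) - EV(B); gamble A is favored in both tasks

confidence_orders = ["task1 > task2", "task1 = task2", "task1 < task2"]
all_outcomes = {(choice, order) for choice in ("A", "B") for order in confidence_orders}

# EV restricts only the choice (A in both tasks); any confidence order remains allowed.
allowed_by_ev = {("A", order) for order in confidence_orders}

# EV+C additionally requires confidence to increase with the EV difference (larger in task 2).
allowed_by_ev_c = {("A", "task1 < task2")}

print(len(all_outcomes), len(allowed_by_ev), len(allowed_by_ev_c))   # 6 3 1
# EV+C forbids more of the possible outcomes and therefore has the higher precision.
```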
Empirical Content. The empirical content of a theory (in the Popperian sense) increases with universality and precision. A theory that is superior in both aspects to another has a higher empirical content. The scientific advantage of a new theory that is dominated on both aspects would be zero, and the theory should be disregarded even prior to any empirical testing (i.e., Popper’s criterion III). On the other hand, a theory that dominates another theory on at least one of the two dimensions has “unique” empirical content, which could constitute a scientific advantage if it holds up in empirical testing. The gist of the concept, however, is that the more a statement prohibits, the more it says about our world.
One way to annihilate empirical content entirely is to formulate the core premises of a theory as existential statements. A statement like “Some decision makers may sometimes use strategy X” can never be falsified because the modus tollens is not applicable. In this case, it makes no sense to search for violations of the theory because there is nothing to be violated. Even setting the logical issue aside, any instance of a decision maker applying another strategy Y is theoretically irrelevant because it is not forbidden by the propositions. Numerous decision makers relying on strategy Y would not count. Detecting a single user of strategy X would verify the existential statement that constitutes the theory (see Betsch & Pohl, 2002, for an example) (Footnote 5).
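In logical terms (a minimal sketch of the point just made; the predicate name is our own), the asymmetry looks as follows:

```latex
% A universal implication is refuted by a single counter-instance:
\forall x\,\bigl(\phi(x) \rightarrow f(x)\bigr)
  \quad\text{is contradicted by}\quad
  \exists x\,\bigl(\phi(x) \wedge \neg f(x)\bigr).

% An existential statement such as "some decision makers sometimes use strategy X",
\exists x\,\mathit{UsesX}(x),
% is verified by a single observed instance but is compatible with any finite set of
% observations of other strategies; modus tollens therefore never gets a grip on it.
```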
2.3 Internal consistency (I) and satisfying the character of a scientific theory (II)
Consistency. A theory should be consistent, avoiding contradictory predictions such as that people choose A and “not A” at the same time. Assume, for example, an imaginary decision theory postulating that people make all decisions concerning which of two cities is bigger by applying one of two strategies: the Recognition Heuristic (RH; Goldstein & Gigerenzer, 2002) and an Inverse-Recognition Heuristic (I-RH). RH predicts that, for all choices between options of which one is recognized and the other is not, people select the recognized object without considering further information. Assume that the I-RH predicts for the same comparisons that people select the unrecognized object. Note that the theory allows the application of both heuristics to the same task. The theory would consequently be inconsistent because it predicts choices both for and against recognized objects at the same time. The theory could, however, be made consistent by defining exclusive, non-overlapping subsets of tasks to which the two heuristics are applied.
Satisfying the Character of a Scientific Theory. One temptingly easy solution to the above problem would be to define the subsets conditional on the observed choices: the RH is defined only for tasks in which the recognized object is chosen, and the I-RH is defined only for tasks in which the unrecognized object is chosen. However, this definition is tautological in that the antecedent conditions defined in ϕ(x) are implied by the phenomenon specification function f(x), and therefore the theory does not satisfy the character of a scientific theory. Another solution would be to define the antecedent conditions in ϕ(x) such that people apply RH vs. I-RH if they have learned that the respective strategy is more successful in the respective environment. This solves the problem but reduces the universality of the theory in that it makes predictions only for cases in which previous learning experience with the strategies exists and in which the success rates differ.
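As a minimal sketch of the consistency check described above (the task representation and the predicates are hypothetical illustrations, not part of any published theory), a two-heuristic theory is inconsistent exactly when the application conditions of its heuristics overlap and their predictions conflict:

```python
# Sketch: checking the internal consistency of a two-heuristic theory on a set of tasks.
# The task representation and the predicates are hypothetical illustrations.

tasks = [
    {"id": 1, "recognized": "Berlin", "unrecognized": "Gotha"},
    {"id": 2, "recognized": "Munich", "unrecognized": "Hanau"},
]

def rh_applies(task):        # inconsistent version: both heuristics are defined for all tasks
    return True

def irh_applies(task):
    return True

def rh_prediction(task):
    return task["recognized"]

def irh_prediction(task):
    return task["unrecognized"]

def inconsistent(task_set):
    """True if some task receives contradictory predictions from the theory."""
    return any(
        rh_applies(t) and irh_applies(t) and rh_prediction(t) != irh_prediction(t)
        for t in task_set
    )

print(inconsistent(tasks))   # True: the theory predicts choices for and against recognition
# A consistent, non-tautological repair would make rh_applies / irh_applies depend on an
# independently measurable condition (e.g., prior learning about strategy success), not on
# the observed choice itself.
```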
3 General issues
3.1 Simplicity and parameters of a theory
According to Occam’s razor, it has been argued that, everything else being equal, simple theories should be preferred over complex ones (Brandstätter et al., 2006; Gigerenzer & Todd, 1999). However, “simplicity” is a vague concept that can be understood in many different ways (e.g., esthetic, pragmatic; Popper, 1934/2005). Is one model simpler than another because it assumes a particularly simple decision process, such as a non-compensatory heuristic, compared to an expected utility model? The definition of empirical content allows us to specify the aspect of “simplicity” that is relevant for theory selection and to avoid misunderstandings; for the logical analysis, simplicity essentially reduces to the empirical content and the falsifiability of a theory: according to Popper, a theory is “simpler” than another if a) it is more precise in its predictions but can be applied at least as broadly as the other theory; or b) it can be applied more broadly while being at least as precise as the other theory (Footnote 6).
Based on this understanding of the concept “simplicity”, some common arguments in JDM might require reconsideration. Comparing non-compensatory heuristics and weighted compensatory models, it is often argued that the former are simpler because they have fixed parameters, while the latter have free parameters and can therefore fit a broader set of behavior (e.g., Brandstätter et al., 2006, p. 428). According to the Popperian concept of “simplicity”, this argument is valid when comparing models that are fitted to the data (Roberts & Pashler, 2000), and the difference in flexibility has to be taken into account (Pitt & Myung, 2002; Pitt, Myung, & Zhang, 2002). The argument is, however, irrelevant if parameters are assumed to be fixed or can be estimated from independent measures. Hence, accepting the Popperian definition of simplicity, models are not per se “simpler” than others; this evaluation depends on the way the parameters are (and can be) determined. For example, an expected utility model is as “simple” as a non-compensatory heuristic if a) fixed parameters for the utility and probability transformation functions are used; or b) the coefficients are measured independently or using a cross-prediction procedure.
The “simplicity” of a theory decreases if multiple outcomes are allowed. This can result from the fact that different sets of parameters are possible or that multiple strategies are allowed simultaneously without specifying how to select between them. The phenomenon specification function f(x) would consequently contain many logical OR statements. Hence, a theory assuming that persons are risk-neutral, risk-seeking, or risk-averse and all apply the same utility function (e.g., U(x) = x^α) with three different sets of parameters (e.g., α = 1 for risk-neutral persons; α = 1.2 for risk-seekers; and α = 0.88 for risk-averse persons) would be less simple than a theory assuming one set of parameters for all (α = 0.88, as in Tversky & Kahneman, 1992). However, both theories would be equal in terms of Popperian simplicity if the risk-aversion type of each person is known because it has been measured using a separate scale (e.g., Holt & Laury, 2002) (Footnote 7).
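The sketch below (illustrative gambles and numbers of our own choosing) makes the point concrete: with a single fixed α, the theory forbids one of the two possible choices, whereas allowing several α values can leave both choices permitted, unless the risk type of each person is measured independently beforehand.

```python
# Sketch: allowing several parameter sets enlarges the set of outcomes a theory permits.
# Utility function U(x) = x**alpha as in the text; gambles and values are illustrative.

def utility(x, alpha):
    return x ** alpha

def predicted_choice(gamble_a, gamble_b, alpha):
    def expected_utility(gamble):
        return sum(p * utility(x, alpha) for x, p in gamble)
    return "A" if expected_utility(gamble_a) > expected_utility(gamble_b) else "B"

risky = [(100, 0.5), (0, 0.5)]   # gamble A
safe  = [(48, 1.0)]              # gamble B

fixed_theory    = {0.88}               # one parameter set for everyone
flexible_theory = {1.0, 1.2, 0.88}     # risk-neutral, risk-seeking, and risk-averse persons

allowed_fixed    = {predicted_choice(risky, safe, a) for a in fixed_theory}
allowed_flexible = {predicted_choice(risky, safe, a) for a in flexible_theory}

print(allowed_fixed, allowed_flexible)   # {'B'} vs. {'A', 'B'}
# The fixed-parameter theory forbids one of the two choices; the flexible theory allows
# both and is therefore less "simple" in the Popperian sense, unless each person's risk
# type is measured independently beforehand.
```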
3.2 Process models vs. outcome models
Many models in JDM predict choices or judgments only. These outcome models (also called “as-if models”, or paramorphic models) predict people’s choices or judgments, but they are silent concerning the cognitive operations that are used to reach them. Expected utility theory (Luce, 2000; Luce & Raiffa, 1957) and weighted-linear theories for judgment (e.g., Brunswik, 1955; Karelaia & Hogarth, 2008) are prominent examples. Both are models with a high degree of universality and they are precise concerning their predictions for choices or judgments, respectively (assuming fixed or measurable parameters as mentioned above).
Process models could, however, achieve higher precision by making additional predictions on further dependent variables such as decision time, confidence, information search, and others. A Weighted Additive Strategy (WADD) could, for instance, assume a stepwise deliberate calculation process consisting of elementary information processes (EIPs; Newell & Simon, 1972; Payne, Bettman, & Johnson, 1988) of information retrieval, multiplying, and summing. It makes predictions concerning multiple parameters of information search (Payne et al., 1988), decision time, and confidence (Glöckner & Betsch, 2008c). Everything else being equal, the empirical content of a theory increases with the number of (non-equivalent) dependent variables on which it makes falsifiable predictions. Process theories therefore potentially yield higher empirical content than outcome theories (Footnote 8).
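A minimal sketch of what such additional predictions might look like is given below; the EIP count and the monotone links to decision time and confidence are simplifying assumptions for illustration, not the exact formulations of the cited authors.

```python
# Sketch: a WADD process model predicts more than the choice. The EIP count and the
# monotone links to decision time and confidence are simplified illustrative assumptions.

def wadd_predictions(cue_values_a, cue_values_b, cue_weights):
    """cue_values_*: cue values per option (e.g., +1/-1); cue_weights: cue weights."""
    score_a = sum(w * v for w, v in zip(cue_weights, cue_values_a))
    score_b = sum(w * v for w, v in zip(cue_weights, cue_values_b))

    n_cues = len(cue_weights)
    eips = 2 * n_cues * 2 + 2   # retrieve + weight each cue for both options, sum, compare

    return {
        "choice": "A" if score_a > score_b else "B",
        "information_search": "all cues of both options inspected",   # exhaustive search
        "decision_time_rank": eips,                  # more EIPs -> longer decision times
        "confidence_rank": abs(score_a - score_b),   # larger evidence difference -> higher confidence
    }

print(wadd_predictions([1, 1, -1], [-1, 1, 1], [0.8, 0.5, 0.3]))
```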
3.3 Distinct predictions and universality
It seems trivial to assume that a new theory should make predictions that are distinct from those of previous theories. Following Popper’s criteria for evaluating theories according to their empirical content, however, this is not always true. If a new model has a higher universality and is equal in precision, it could constitute a scientific advantage even without distinct predictions, simply by replacing a set of theories that each predicted only part of the phenomena.
Furthermore, if the universality of a theory is reduced to a set of situations in which more universal theories make the same predictions, it becomes obsolete. Assume a theory predicting that people use the take-the-best (TTB) heuristic (Gigerenzer & Goldstein, 1996), which states that people decide according to the prediction of the most valid differentiating cue. Further, assume that the theory additionally defines the antecedent condition that TTB is used only in situations with high costs of information acquisition (i.e., monetary costs and cognitive costs of retrieving information from memory), and particularly in situations in which these costs are higher than the gain in expected utility from additional cues after retrieving the first cue. Such a theory would make exactly the same choice predictions as an expected utility theory, since expected utility theory would also predict utilization of only the first cue under this specific condition. The expected utility theory, however, is more universal, and hence the TTB theory would be obsolete (Footnote 9).
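For readers unfamiliar with TTB, a minimal sketch of its choice rule (hypothetical cue data; the cue order stands in for the validity hierarchy):

```python
# Sketch: take-the-best (TTB) decides according to the most valid cue that discriminates.
# Cue values and the validity order are hypothetical.

def ttb_choice(cues_a, cues_b, validity_order):
    """cues_*: dicts mapping cue name -> 0/1; validity_order: cue names, most valid first."""
    for cue in validity_order:
        if cues_a[cue] != cues_b[cue]:                       # first discriminating cue decides
            return "A" if cues_a[cue] > cues_b[cue] else "B"
    return None                                              # no cue discriminates: TTB must guess

city_a = {"capital": 1, "airport": 1, "university": 0}
city_b = {"capital": 0, "airport": 1, "university": 1}
print(ttb_choice(city_a, city_b, ["capital", "airport", "university"]))   # -> "A"
```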
4 Evaluating the empirical content of models in JDM
From the descriptions above, some evaluations of theories in JDM with respect to empirical content may already have become apparent. We summarize and condense them in the following.
4.1 Universal outcome theories
Universal outcome theories are theories that are defined for a wide range of tasks. They have a high level of universality. Universal outcome theories make predictions for choices or judgments but not for other outcome variables. Some may comprise free parameters. The degree of precision is intermediate to high. For example, expected utility theories, cumulative prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), and the transfer-of-attention-exchange model (Birnbaum, 2008) for risky choices belong to this class.
The theory that claims maximum universality in this category is a generalized expected utility theory by Gary Becker (“the economic approach”; Becker, 1976, p. 14) stating: “I am saying that the economic approach provides a valuable unified framework for understanding all human behavior […] all human behavior can be viewed as involving participants who maximize their utility from a stable set of preferences and accumulate an optimal amount of information and other inputs in a variety of markets.” Precision is, however, relatively low in that neither the set of preferences nor the transformation function for utility is defined or easily measurable (Footnote 10). If, by contrast, one suggested that persons have monetary preferences only and assumed that the utility of money follows a specific concave utility curve, precision would be high and the theory would have high empirical content.
4.2 Single heuristic theories
Single heuristic theories are theories that consist of a single heuristic for which a specific application area is defined. The heuristics PH, RH, and TTB mentioned above, and many more, such as the original formulation of the equal weight strategy (Dawes & Corrigan, 1974), elimination by aspects (Tversky, 1972), and the affect heuristic (Slovic, Finucane, Peters, & MacGregor, 2002), fall into this class. The application area defined for these heuristics is sometimes rather limited, and therefore the universality of these models can be low. The theories can become tautological if their application area is defined by their empirically observable application or by non-measurable variables, as described above. Single heuristic theories often make predictions concerning multiple dependent variables such as choices, decision time, confidence, and information search. Hence, their precision is potentially high. To achieve this degree of precision, however, existential statements and ambiguous quantifiers such as “some people use heuristic X” or “people may use heuristic X” must be avoided because they reduce precision and empirical content to zero. Replacing these statements with “under the defined conditions, more than 50% of individuals use heuristic X” (and will show the respective choice behavior) is one way to solve the problem. Given that sufficiently precise measurement of this proportion is possible (and only if this is the case, as argued by Hilbig, 2010a), modus tollens can be applied. If one wishes to be more conservative, a much lower limit (e.g., 1%) could be used, which then conveys important information concerning the universality of the theory (i.e., probabilistic universality).
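As a minimal sketch (illustrative numbers; the classification of individuals as users of heuristic X is assumed to come from some independent measurement procedure), such a probabilistic universality claim can be confronted with data, for example via an exact binomial computation:

```python
# Sketch: testing the claim "under the defined conditions, more than 50% of individuals
# use heuristic X". Each individual is assumed to have been classified as a user or
# non-user by some independent measurement procedure; numbers are illustrative.
from math import comb

def binomial_p_value(n_users, n_total, p0=0.5):
    """One-sided p-value for observing at most n_users successes if the true rate were p0.
    A small value speaks against the claim that more than p0 of individuals use X."""
    return sum(comb(n_total, k) * p0**k * (1 - p0)**(n_total - k)
               for k in range(n_users + 1))

n_total, n_users = 80, 28          # 28 of 80 participants classified as users of heuristic X
print(round(binomial_p_value(n_users, n_total), 4))
# If users are this rare, the proportion claim (and with it this part of the theory) can be
# rejected; a vague quantifier such as "some people use X" would permit no such confrontation.
```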
4.3 Multiple heuristic theories
These kinds of theories define multiple strategies or heuristics that are applied adaptively. Models in this class are the contingency model (Beach & Mitchell, 1978), the adaptive decision maker approach (Payne et al., 1988), and the adaptive toolbox (Gigerenzer & Todd, 1999). Their (potential) level of universality is higher than that of single heuristics because the theory’s scope is not limited to a certain decision domain. However, frameworks that propose an open set of heuristics without defining them have no empirical content because each deviation can be explained by adding a new heuristic (i.e., precision is zero; for a similar argument, see also Fiedler, 2010). To have empirical content, a theory has to be defined as a fixed (or at least somehow limited) set of heuristics. The degree of precision of such a fixed-set theory can still be low if many heuristics are included and no clear selection criteria among the heuristics are defined. Examples of selection criteria are trade-offs between the costs and benefits of applying strategies, selection contingent on time constraints, or selection based on previous learning experiences. All of these have to be measurable to increase empirical content. Concerning precision, the issues previously described for single heuristic theories apply.
4.4 Universal process theories
Universal process theories describe cognitive processes on a fine-grained level. They often rely on one general mechanism that is applied very broadly to explain many seemingly unrelated phenomena. Considering such basic mechanisms often allows replacing multiple theories that explain phenomena on a more abstract level (e.g., Dougherty, Gettys, & Ogden, 1999; Fiedler, 2000; Fiedler, Freytag, & Meiser, 2009; see also Tomlinson, Marewski, & Dougherty, 2011). Universal process theories have a high level of universality because they are applied not only to decision making but also to phenomena of perception or memory. They allow us to make predictions on many dependent variables (e.g., Glöckner & Herbold, 2011) and can therefore be very precise. One of the problems that has to be solved, however, is that universal process models sometimes have many free parameters which are hard to measure and might therefore decrease precision (Marewski, 2010; Roberts & Pashler, 2000) (Footnote 11). Examples are sampling approaches (Fiedler, 2000; Fiedler et al., 2009), evidence accumulation models (Busemeyer & Johnson, 2004; Busemeyer & Townsend, 1993; Pleskac & Busemeyer, 2010; Usher & McClelland, 2001), parallel constraint satisfaction models (Betsch & Glöckner, 2010; Glöckner & Betsch, 2008b; Holyoak & Simon, 1999; Simon, Krawczyk, & Holyoak, 2004), cognitive architectures assuming production rules (Anderson, 1987), and multi-trace memory models (Dougherty et al., 1999; Thomas, Dougherty, Sprenger, & Harbison, 2008).
5 Conclusions and remedies
We aimed to show that the Popperian criteria for evaluating the empirical content of theories are useful guides towards achieving scientific progress in JDM research. A rigorous analysis of the level of universality and degree of precision of theories increases sensitivity concerning how theories have to be formalized to maximize their empirical content. It also reveals the criteria for them to have empirical content in the first place, and it makes transparent which general weaknesses are present in current models. Many current theories lack precision. As we have shown, precision could easily be improved by adding simple assumptions or limitations to the theories (i.e., replace vague quantifiers; fix a set of parameters; define a method to measure parameters; limit the set of strategies; define precise selection criteria and application conditions).
Unfortunately, researchers could be caught in a social dilemma (Footnote 12): it is desirable for the scientific community to have sufficient theory specification to achieve scientific progress, but it is individually rational not to provide it, because this allows for an easier path to defending one’s own theory (assuming that retaining one’s own theory has high utility for researchers). Also, it could simply be convenient not to specify the theory, thus gaining more time to find out which specifications are most appropriate.
This dilemma could be solved on the level of scientific policy. The rigorous standards for empirical research provide the foundation of our science’s success and reputation. These standards are documented in catalogues such as the Publication Manual of the American Psychological Association, handed down in academic courses, and serve as guidelines for publication. We suggest that our publication standards should be extended by principles for the formulation of theories. Specifically, authors should be required to maximize empirical content in their formulation of theories. We suggest three criteria, or obligations, that may serve as standard evaluation criteria for theoretical papers in the review process.
(1) Specification
Authors must explicitly state a finite set of definitions and propositions that together constitute their theory. The gold standard for achieving this aim is formalization. Importantly, this set of premises must be separated from antecedent conditions and auxiliary assumptions (Lakatos, 1970). Moreover, authors should be obliged to state precisely how and to what extent their theory advances prior attempts at theorizing.
(2) Empirical content
Authors are obliged to formulate the propositions in such a way that the theory as a whole clearly predicts that particular states of the world will occur and others will not. Wording and formulations that are detrimental to empirical content (e.g., existential statements) must not be included in those premises that constitute the nomological part of the theory.
(3) Critical properties
The authors should be obliged to explain which kinds of empirical observations they would consider a fundamental violation of their theory. In other words, the authors themselves should make deductions from their postulated set of premises that could potentially be refuted on empirical grounds.
6 Epilog: Developments and some popular objections
In our own experience, psychologists sometimes feel uncomfortable with Popper’s rigid approach to theory construction and evaluation. As mentioned above, his corroboration approach has been criticized. Note, however, that our paper focuses on his ideas about theory construction and evaluation prior to empirical testing. These aspects have remained the basis for further methodological work, which we outline next.
6.1 Developments in theory testing
In his influential plea for using strong inference in scientific investigation, Platt (1964) suggests a stepwise procedure for scientific progress. It starts with devising alternative hypotheses, followed by developing and conducting crucial experiment(s) to exclude (branches of) hypotheses. This procedure should be iterated to gradually approach the truth. Scientific advance is achieved by conducting critical experiments that exclude branches of hypotheses. Such a procedure requires, as a fundamental prerequisite, that theories are formulated in a falsifiable fashion. To put it in Popper’s terminology, Platt’s approach necessitates that theories have empirical content. Platt suggested that researchers should use the following test questions to evaluate whether a hypothesis and experiment constitute a step forward: “But Sir, what experiment could disprove your hypothesis?”; or, on hearing a scientific experiment described, “But Sir, what hypothesis does your experiment disprove?” (p. 352).
In a similar vein, the issue of falsifiability is central in debates on the persuasiveness of good model fits and on ways to take model flexibility into account. Roberts and Pashler (2000) have argued that model fit per se is no persuasive evidence for a model, since one also has to take into account whether “plausible alternative outcomes would have been inconsistent with the theory” (p. 358). This idea is mathematically formalized in the normalized maximum likelihood (NML) approach (e.g., Davis-Stober & Brown, 2011; Myung, Navarro, & Pitt, 2006). In NML, the likelihood of observing the data vector given the theory is normalized (i.e., divided) by the average likelihood of observing any possible data vector given the theory.
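A minimal sketch of the idea for a toy binomial model follows (our own illustration of the description above, not the implementation of the cited authors): the likelihood is maximized over the parameter values the theory allows, and the normalizer sums these maximized likelihoods over all possible data vectors.

```python
# Sketch: normalized maximum likelihood (NML) for a toy model of n binary choices.
# The "theory" is a one-parameter binomial model whose success rate is restricted to a
# set of allowed values; a more flexible theory (wider set) fits almost any data vector
# well and is penalized by a larger normalizing sum. All numbers are illustrative.
from math import comb

def max_likelihood(k, n, allowed_thetas):
    """Maximized likelihood of k successes in n trials over the parameter values allowed."""
    return max(comb(n, k) * t**k * (1 - t)**(n - k) for t in allowed_thetas)

def nml(k_observed, n, allowed_thetas):
    numerator = max_likelihood(k_observed, n, allowed_thetas)
    normalizer = sum(max_likelihood(k, n, allowed_thetas) for k in range(n + 1))
    return numerator / normalizer

n, k = 20, 16
precise  = [i / 100 for i in range(70, 81)]   # theory allows success rates between .70 and .80
flexible = [i / 100 for i in range(1, 100)]   # theory allows almost any success rate

print(round(nml(k, n, precise), 3), round(nml(k, n, flexible), 3))
# The precise theory earns more credit for the same good fit because it could not have
# accommodated many alternative outcomes.
```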
Obviously, a single finding in conflict with a theory according to classic hypothesis testing with an alpha level of .05 does not suffice to falsify the theory. Bayesian approaches have been suggested that compare multiple hypotheses on an equal footing and allow evidence to be integrated easily over multiple studies (e.g., Matthews, 2011; Wagenmakers, 2007). The resulting posterior probabilities provide excellent quantitative measures of the degree of corroboration of a set of theories given the available data in total.
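A minimal sketch of such sequential Bayesian updating over studies for two precise hypotheses about the rate of heuristic use (all numbers are illustrative, not taken from any study):

```python
# Sketch: sequential Bayesian updating over several studies for two precise hypotheses
# about the proportion of heuristic users. All numbers are illustrative.
from math import comb

def likelihood(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

hypotheses = {"H1: theta = 0.7": 0.7, "H2: theta = 0.3": 0.3}
posterior = {h: 0.5 for h in hypotheses}          # equal prior probabilities

studies = [(18, 30), (22, 35), (15, 25)]          # (users, sample size) for each study

for k, n in studies:                              # evidence is integrated study by study
    unnormalized = {h: posterior[h] * likelihood(k, n, t) for h, t in hypotheses.items()}
    total = sum(unnormalized.values())
    posterior = {h: value / total for h, value in unnormalized.items()}

print({h: round(p, 3) for h, p in posterior.items()})   # the posterior strongly favors H1
```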
6.2 Objections
A frequently raised knockout argument is that Popper’s ideas are outdated. It is true, indeed, that falsificationism as a methodology has faced many critiques. The most striking one concerns its difficulties in dealing with probabilistic evidence. Presumably, the Bayesian approach mentioned in the previous section is more viable as a pragmatic methodology for testing theories than Popper’s underspecified program of corroboration (e.g., Howson & Urbach, 2006). Note, however, that this critique concerns the pragmatic, methodological level only. Pragmatics deal with the problem of how to treat a theory vis-à-vis empirical evidence (e.g., whether to keep, alter, or dismiss a theory). On the logical level, we deal with the formulation of a theory prior to testing. The question at this level is: does the theory allow us to derive precise predictions that could be critically tested? The pragmatic level concerns the issue of falsification (e.g., which methods are used when applying modus tollens), whereas the logical level concerns the logical falsifiability of a theory. Therefore, the critique raised at the pragmatic level does not apply to the logical level. The point we want to make here is that a theory lacking empirical content is a bad theory because it is immune to reality, regardless of what kind of testing methodology is applied. This follows purely from logical rules. These rules have not become outdated. If one is to criticize the notion of empirical content, one has to dismiss logic: “Of course you are free to dismiss the principles of logic, but then you have the obligation to provide new ones.” (Hans Albert, in a personal communication to the second author in 1987)
6.3 Existential statements in theory formulation and theory testing
Findings that show the existence of certain phenomena are crucial for the development of science. According to Platt (1964), this is particularly the case if a theory (or a whole class of theories) exists that explicitly prohibits the occurrence of the phenomenon. Consider, for example, the case of coherence effects, that is, the finding that information is modified during the decision process to achieve a coherent pattern (e.g., DeKay, Patino-Echeverri, & Fischbeck, 2009; Glöckner, Betsch, & Schindler, 2010; Russo, Carlson, Meloy, & Yong, 2008; Simon, Krawczyk, & Holyoak, 2004). Coherence effects violate normative and many descriptive models of decision making which assume informational stability. More generally, much valuable work in JDM has been devoted to demonstrating systematic deviations from rational behavior (i.e., biases, fallacies). Without any doubt, it is extremely important to know that these phenomena exist. They give fresh impetus to theorizing (e.g., Busemeyer, Pothos, Franco, & Trueblood, 2011) and to the modification of existing models. Furthermore, knowledge of these deviations can also be important for other reasons, such as providing important insights for public policy (e.g., Baron & Ritov, 2004; McCaffery & Baron, 2006).
The principle of empirical content is violated, however, if one treats existential statements as theories or if existential statements are used to constitute the core of a theory (Betsch & Pohl, 2002). One might think that this is sometimes the best we can do: to claim and to show that some people show some effects sometimes. Without any doubt, identifying new phenomena is an important first step in any science. However, framing an existential statement concerning an effect as a theory is not the best, but the worst thing we can do. Due to the principles of logic, an existential statement can never be falsified but only verified, as Popper brilliantly illuminated in his Logic of Scientific Discovery. A single instance of proof suffices to verify an existential statement. Thereafter, a verified existential premise cannot be overcome by means of empirical research. Existential statements are among the reasons responsible for theory accumulation in psychology.
It is, however, important to mention that much work in JDM is concerned with investigating moderator effects on the prevalence of certain types of behavior. Following this approach, showing the existence of the behavior can be a necessary first step. The existential statement “there are people who use lexicographic strategies” (which has zero empirical content) can, for example, be a first step towards developing and testing a theory stating that certain factors (e.g., memory retrieval costs; see Bröder & Gaissmaier, 2007; Bröder & Schiffer, 2003b; Glöckner & Hodges, 2011) increase the usage of lexicographic strategies.
6.4 Implications and outlook
Why should we bother about excessive theory accumulation? The community is liberal when it comes to theory (in contrast to empirical methodology!). It allows new researchers to enter the arena quickly by providing demonstrations of phenomena. Obviously, if our main goal is to promote the academic careers of our graduate students, we definitely should not bother with such aloof concepts as empirical content (for an insightful discussion of the social dilemma structure in scientific publication, see also Dawes, 1980, p. 177). We should advise our students to perform like hunters and gatherers: seek a demonstration of an effect, postulate a corresponding “theory”, and make the prediction not too strong in order to get published. If the goal, however, is to improve the quality of scientific predictions, we cannot be interested in maintaining the status quo. We accumulate effects and “theories”, but in many cases we have no guidelines on how to select a “theory” for making predictions. “Theory” accumulation is not proof of progress, but rather an indicator of the lack of a shared methodology for theory construction and testing in our science.
“Time will tell”, as optimists might conjecture; eventually, good theories will survive and the others will be forgotten. According to Popper’s corroboration method, theories should compete against each other. The criterion for this competition is withstanding critical tests. The prerequisite for such tests, however, is that theories have empirical content. Otherwise they cannot take part in the game of competition. They will survive or be forgotten based on arbitrary criteria that are unscientific in nature. More generally, the underlying social dilemma structure has to be broken by introducing and enforcing standards such as the ones described above. To avoid the tragedy of the commons, the standard method for solving dilemma structures has to be applied to theory construction in JDM as well: “mutual coercion, mutually agreed upon by the majority of the people affected” (Hardin, 1968).
We would like to close with a positive example. Over half a century ago, Ward Edwards (1954) introduced a powerful theory to psychologists. He imported utility theory together with its set of axioms implying strong prohibitions. The influence of this theory was exceptional because it was so strong and because it turned out to be wrong. We learned much about the processes of human choice from critical tests of this theory. The biases, the fallacies, and the violations of principles soon filled our textbooks. They stimulated researchers to think about how humans really make decisions. Our plea is simple. Strive to maximize empirical content in theorizing. Do not be afraid of failure. Failure breeds scientific advance. This is the gist of Popper’s logic of scientific discovery.