9.1 Introduction
In recent years, the “replication crisis,” or the finding that many attempts to replicate prominent published studies have failed to reproduce their original results, has roiled the medical, social science, public policy, and development research communities (Ioannidis 2005).[Footnote 1] This has led to efforts to change both procedures and cultures in carrying out and publishing research, including a de-emphasis of p-values in statistical research, preregistration of studies using experimental designs or observational statistics, and, in some journals, preacceptance of studies based on their designs rather than their results.
Although many of the projects whose results could not be replicated were experimental studies, one response in the program evaluation community has been to increase the emphasis on experiments. Done well, these research designs, including field experiments and natural experiments as well as lab and survey experiments, remain powerful tools in program evaluation. Yet experiments impose demanding methodological requirements (Cook 2018; Deaton and Cartwright 2018), they face challenges of external validity, and in some policy domains they are not practical for fiscal or ethical reasons. In addition, evaluators are often called upon to evaluate programs that were not set up as experiments, including programs instituted quickly to address pressing needs.
Thus, a second response in the evaluation community has been increased interest in “process tracing,” a method of causal inference that is applicable to single observational case studies (Bamanyaki and Holvoet 2016; Barnett and Munslow 2014; Befani and Mayne 2014; Befani and Stedman-Bryce 2017; Busetti and Dente 2017; Mendoza and Woolcock 2014; Punton and Welle 2015; Schmitt and Beach 2015; Stern et al. 2012; Wauters and Beach 2018). Process tracing has been common in political science for decades and has been the subject of recent methodological innovations, most notably the explicit use of Bayesian logic in making inferences about the alternative explanations for the outcomes of cases. Process tracing and program evaluation, or contribution analysis, have much in common, as they both involve causal inference on alternative explanations for the outcome of a single case (although process tracing can be combined with case comparisons as well). Evaluators are often interested in whether one particular explanation – the implicit or explicit theory of change behind a program – accounts for the outcome. Yet they still need to consider whether exogenous nonprogram factors (such as macroeconomic developments) account for the outcome, whether the program generated the outcome through some process other than the theory of change, and whether the program had additional or unintended consequences, either good or bad. Process tracing can address these questions, and it is also useful in assessing the validity of the assumptions behind natural, field, and lab experiments.
This chapter outlines the logic of process tracing and the ways in which it can be useful in program evaluation. It begins with a short discussion of the philosophy of science underlying process tracing and a definition of process tracing. It then turns to the role of process tracing in single case studies and in checking the underlying assumptions of experiments, field experiments, and natural experiments. Next, the chapter provides practical advice on process tracing for causal inference in individual cases and discusses the special considerations that arise in the use of process tracing in program evaluation. Finally, the chapter outlines an important recent development in process tracing methods: the explicit and transparent application of Bayesian logic to process tracing. It concludes that explicit Bayesian process tracing holds promise, but not yet proof, of improving the use of process tracing in causal inference and program evaluation.
9.2 The Philosophy of Science of Causal Mechanisms and Process Tracing
The increased interest in process tracing across the social and policy sciences is related to the turn in the philosophy of science over the last few decades toward a focus on causal mechanisms as the locus of causal explanation. Earlier, philosophers hoped that either “laws” or observed relations of statistical conditional dependence – analogous to what the philosopher David Hume called “constant conjunction” – would provide satisfactory accounts of causation and causal inference. The attempt to explain outcomes by reference to “laws” or “covering laws” foundered, however, when its advocates, including Carl Hempel, failed to come up with a justification or warrant for laws themselves (Salmon 1998, 69). In addition, Hempel’s approach, known as the “Deductive-Nomological (D-N) Model,” had difficulty distinguishing between causal and accidental regularities. In a common example, a barometer’s readings move up and down with changes in the weather, but they do not cause the weather. Rather, changes in air pressure, which are measured by a barometer, combine with changes in temperature and other factors (topography, humidity, ocean currents, etc.) to cause the weather. But the D-N model has trouble distinguishing between a barometer and a causal explanation of the weather, as the barometer readings exhibit strong law-like correlations with the weather.
In an effort to address these problems, philosopher of science Wesley Salmon attempted to work out a defensible schema of explanation based on conditional dependence, or, in Salmon’s terms, “statistical relevance.” After encountering several paradoxes and dead-ends in this effort, he ultimately concluded that “statistical relevance relations, in and of themselves, have no explanatory force. They have significance for scientific explanation only insofar as they provide evidence for causal relations … causal explanation, I argued, must appeal to such mechanisms as causal propagation and causal interactions, which are not explicated in statistical terms” (Salmon 2006, 166).
Many philosophers and social and other scientists thus turned to exploring the role of causal mechanisms and causal processes in causal explanation and the roles of different research methods (experiments, observational statistics, case studies, etc.) in uncovering evidence about the ways in which causal mechanisms work and the contexts in which they do and do not operate. Within philosophy, the discussion of causal mechanisms has generally gone under the label of “scientific realism” (related but not necessarily identical approaches include “causal realism” and “critical realism”). This is the school of thought that Ray Pawson, Nick Tilley, and others in the evaluation community have drawn upon in their discussions of “realist evaluation” (Astbury and Leeuw 2010; Dalkin et al. 2015; Pawson and Tilley 1997).
A detailed analysis of scientific realism and causal mechanisms, and of debates surrounding their definitions, is beyond the scope of the present chapter, but a brief summary will suffice. Realism argues that there is an ontological world independent of the mind of the observer or scientist, and causal mechanisms ultimately reside in that ontological world. Scientists have theories about how causal mechanisms work, and, to the extent that those theories are accurate, they can explain outcomes. In one widely cited formulation, causal mechanisms are “entities and activities, organized such that they are productive of regular changes from start or set-up to finish or termination conditions” (Machamer, Darden, and Craver 2000, 3). In another definition that also emphasizes a kind of regularity, mechanisms are processes that cannot be “turned off” through an intervention (Waldner 2012). Fire happens, for example, whenever there is combustible material, oxygen, and a sufficient ignition temperature; we can intervene on the presence of oxygen or materials or the temperature, but we cannot intervene on the mechanism of fire happening when the requisite materials and conditions exist.
Mechanisms are in the world, and theories about mechanisms are cognitive or social constructs in our heads. Scientists make inferences about the accuracy and explanatory power of theories about mechanisms by outlining the observable implications of these theories and testing them against evidence. In frequentist studies, the observable implications of theorized mechanisms lie at the population level, such as the correlations one would expect to find if a theory were true. In contrast, process tracing gets closer to mechanisms where they actually operate: in individual cases. The operation and interaction of causal mechanisms is realized in specific cases and contexts, and scientists and evaluators are interested in building theoretical understandings of the conditions under which mechanisms are activated or deactivated and the ways in which they interact with other mechanisms.
In studying individual cases, process tracers focus not just on the values of the independent and dependent variables, but on diagnostic evidence of sequences and processes that lie in the temporal space between the independent variables and the observed outcome. Process tracing uses this evidence to make inferences about which theories most likely offer true explanations of a case’s outcome, sometimes called “inference to the best explanation.” Process tracers continually ask “What should be true about the sequence of events between the independent variables and the dependent variable if a theory is a true explanation of the outcome of a case?” In the social sciences, this often takes the form of asking “Who should have conveyed what information to whom, when, and with what effect at each stage in the process if this explanation is true?” Diagnostic evidence, ideally, is information that allows inferences about which processes are in operation, but that does not itself represent an additional variable that independently affects the operation or outcome of these processes. Diagnostic evidence, in other words, is not an “intervening variable” in a process, as the term “variable” implies an independent entity with its own potential causal effects.
9.3 Definition of Process Tracing
Process tracing is “the analysis of evidence on processes, sequences, and conjunctures of events within a case for the purposes of either developing or testing hypotheses about causal mechanisms that might causally explain the case” (Bennett and Checkel 2015, 7).[Footnote 2] Process tracing is a within-case form of analysis: that is, it seeks to explain the outcomes of individual cases (sometimes called “historical explanation” or “token explanation”). At the same time, process tracing can be combined with cross-case comparisons or other methods. Researchers can use process tracing, for example, to assess whether differences between most-similar cases might account for these cases’ different outcomes. The theoretical explanations of case outcomes assessed through process tracing can be about individual mechanisms or processes, or combinations of mechanisms and processes. They can include structural mechanisms, agent-based mechanisms, or any combinations thereof.
A key difference between process tracing and frequentist statistical analysis is that statistical analysis faces the “ecological inference” problem: even if a statistical correlation correctly captures an average causal effect for a population, it does not necessarily explain the outcome for any particular case in that population. Process tracing, in contrast, focuses directly on the causal explanation of individual cases. It may or may not uncover strong evidence leading to a confident explanation of a case, but it does aspire to develop directly the strongest explanation of the case that the evidence allows. Rather than facing an ecological inference problem, process tracing explanations, even when strong, face challenges regarding the external validity or generalizability of findings from individual cases. As Chapter 4 argues, the challenges of generalizing the results of case studies, while real, are often misunderstood. The explanation of an individual case can indeed prove generalizable: a new understanding of how a causal mechanism works, derived from the study of an individual case, can give strong clues about the scope conditions in which we should expect that mechanism to operate.
While process tracing is most often focused on the explanation of case study outcomes, the logic of process tracing can also be used in interrogating the validity of the strong assumptions necessary for experiments, field experiments, and natural experiments. In lab experiments, in addition to carrying out various balance tests on the treatment and control groups, researchers can use process tracing to check the procedures through which individuals were assigned to one group or the other, to assess the ways in which and reasons for which individuals opted to drop out of one group or the other, and to check on the possible presence of unmeasured confounders. Similarly, in field and natural experiments, where there is less control over assignment to treatment and control groups, researchers can use process tracing to assess whether the actual assignment or election into treatment and control groups was “as if random,” and to evaluate evidence on whether the hypothesized process does indeed account for differences between the outcomes of the treatment and control groups (Dunning 2015).
Process tracing is much like detective work: the researcher is seeking an explanation of one case, and they can use both deductive and inductive inferences to find the best explanation. Deductively, the researcher starts with some “suspects” – the theories that have typically been applied to the outcome of interest. In program evaluation, this includes the theory of change explicitly or implicitly adopted by a program’s designers and managers, but it also includes alternative explanations that relate to variables exogenous to the theory of change, such as macroeconomic trends, demographic change, local and national political developments, wars, natural disasters, etc. The researcher then looks for evidence on the deductively derived observable implications of each potential theoretical explanation of the outcome of the case. Just as a detective can reason forward from suspects and backward from a crime to connect possible causes and consequences, researchers can trace processes in both directions. A researcher can trace sequences forward from the independent variables, asking whether each caused the next step in the hypothesized chain leading to the outcome, and the step after that, and so on to the outcome. She or he can also trace backward from the outcome, asking about the most proximate step in the process that caused the outcome, and the step prior to that, back to the independent variables.
Deductively derived implications of a theory are one type of “clue,” but researchers also gather other kinds of evidence or clues that they stumble upon inductively as they investigate or “soak and poke” in their cases. Inductively discovered evidence might point to an existing social science theory that the researcher had not identified as a possible explanation of the case, or it may lead to the development of an entirely new theory as a potential explanation of the case. It is possible that an inductively identified piece of evidence, even evidence for an entirely new theory or explanation, can be so strong – so uniquely consistent with one explanation and so inconsistent with all other explanations – that this theory could become the most likely explanation for the outcome even without further corroboration. This cuts against the common but erroneous intuition that a theory developed from a case can never be considered to have undergone a severe test from the evidence that led to the theory. Anyone who has done their own amateur home or car repairs knows the experience of finding physical evidence that not only suggests but makes highly likely a heretofore untheorized explanation for why a switch, appliance, or part is not working.
In addition, our confidence in a newly derived or newly added potential explanation of a case can be strengthened if the explanation entails additional observable implications within the same case that are then corroborated by additional evidence. This contravenes the frequent claim that one cannot develop a theory from a case and test it against the same case. We can develop a theory from a case and test it against different evidence from the same case that is independent of the evidence that gave rise to the theory. It would be illogical, for example, for a doctor to diagnose a rare illness in a patient based on an unexpected test result, and then insist on testing the diagnosis on a different patient, rather than on an additional diagnostic test in the first patient.
9.4 Practical Advice on Traditional Process Tracing
The general approach of process tracing is fairly intuitive as it follows a kind of inferential process that has been around as long as humankind. Yet despite its seeming simplicity and familiarity, researchers do not always do process tracing well, and, as the final section of this chapter argues, even trained researchers make common mistakes in employing the Bayesian logic that underlies process tracing. So how can we do process tracing well? Elsewhere I have elaborated with my co-author Jeffrey Checkel on ten best practices for being a good traditional process-tracing detective; here, I introduce these practices briefly and elaborate on the considerations of each that are most relevant to program evaluation (Bennett and Checkel 2013, 20–31). In the final section of this chapter I address how to carry out the more formal Bayesian variant of process tracing.
1. Cast the net widely for alternative explanations.
2. Be equally tough on the alternative explanations.
3. Consider the potential biases of evidentiary sources.
4. Take into account whether the case is most or least likely for alternative explanations.
5. Make a justifiable decision on when to start.
6. Be relentless in gathering diverse and relevant evidence, but make a justifiable decision on when to stop.
7. Combine process tracing with case comparisons when useful for the research goal and feasible.
8. Be open to inductive insights.
9. Use deduction to ask “If the explanation is true, what will be the specific process leading to the outcome?”
10. Remember that conclusive process tracing is good, but not all good process tracing is conclusive.
9.4.1 Cast the Net Widely for Alternative Explanations
One of the most common mistakes in case study research designs is the omission of a potentially viable explanation. It is important to consider a wide range of potential explanations, as the omission of a viable explanation can skew the interpretation of evidence on all the other explanations that a researcher does consider. Explanations for program outcomes need not be – and usually should not be – single-variable explanations. Rather, they can include combinations of interacting variables. There are four main sources of potential alternative explanations of program outcomes. The first is the program’s explicit or implicit theory of change, which should be evident in program documents and interviews with program managers. In practice, individuals may differ in how they view the theory of change or interpret its implications for how they administer the program, so it may be necessary to process trace different variants of the theory of change. As it is essential not to unduly privilege the theory of change, a second source of explanations includes those offered by other stakeholders (beneficiaries, government officials, members of communities who experience knock-on effects, etc.), as well as the implicit or explicit explanations news reporters give for program outcomes. A third range of candidate explanations consists of social science theories that researchers have typically applied to the kind of program or outcome in question. As there is a wide range of such theories, a useful checklist is to consider both explanations focused on variations among agents (their interests, capacities and resources, networks, ideas, etc.) and those focused on social structures (norms, institutional rules and transactions costs, and actors’ relative material resources).[Footnote 3] Fourth, it is useful to consider the standard list of potential confounding explanations for program outcomes and to do process tracing on any that are relevant. These include:[Footnote 4]
History: exogenous events (economic cycles, elections, natural disasters, wars, etc.) during the program period that can affect outcomes.
Maturation: program beneficiaries might go through aging processes that improve or degrade outcomes over time.
Instrumentation: changes in measurement instruments or technologies during the program can affect the assessment of outcomes.
Testing: exposure to testing or assessment can change the behavior of stakeholders.
Mortality: there may be selection bias regarding which stakeholders or recipients drop out of the program.
Sequencing: the order in which program treatments are implemented may affect outcomes.
Selection: if acceptance into the program is not random – for example, if the program chooses to address the easiest cases first (low-hanging fruit) or the hardest cases first (triage), there can be selection bias.
Diffusion: if stakeholders interact with each other due to the program, this can affect results.
Design contamination: competition among stakeholders can affect outcomes; those not selected as beneficiaries might try harder to improve their own outcomes, or they might become demoralized and not try as hard to succeed.
Multiple treatments: if governments or other organizations are administering programs targeted at similar outcomes, or if the program being evaluated includes multiple treatments, this can affect outcomes.
There can also be potential interactions among these factors that merit process tracing.
9.4.2 Be Equally Tough on the Alternative Explanations
Being fair to alternative explanations is an obvious goal for evaluation and causal inference, but it can be difficult to achieve in practice given the cognitive propensity for confirmation bias. A key contribution of rigorous research methods, whether qualitative, quantitative, or experimental, is to make it harder to engage in the well-known heuristics and biases through which individuals often make faulty inferences. Process tracing methods aim to achieve this by requiring that we consider not only what evidence would be consistent with each explanation, but also what other explanations might be equally or more consistent with that same evidence. They also require that we consider what evidence would be inconsistent with each explanation, and the degree to which other explanations would be (in)consistent with that evidence. This can prevent the temptation to focus mostly on affirming evidence for one explanation and to neglect how that same evidence could also fit other explanations. A common mistake occurs when researchers do deep process tracing on one theory, such as the theory of change, and only cursory process tracing on alternative explanations. An unbiased estimate of how likely it is that a theory is a good explanation of the outcome of a case requires that the alternative explanations receive scrutiny as well. Process tracing proceeds not only by finding evidence that fits one explanation better than the others, but also by eliminative induction of alternative explanations that do not fit the evidence. The discussion of Bayesianism in Section 9.5 gives a more formal assessment of how the relative likelihood of evidence given alternative explanations should affect the confidence we invest in those explanations.
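A minimal sketch of this logic may help, assuming purely illustrative priors and likelihoods that are not drawn from any actual evaluation. Each explanation's prior probability is multiplied by the probability of observing the evidence if that explanation were true, and the results are renormalized. The explanation names, the numbers, and the helper function `update` are all hypothetical.

```python
def update(priors, likelihoods):
    """Return posterior probabilities for rival explanations via Bayes' rule.

    priors:      dict mapping explanation -> prior probability (summing to 1)
    likelihoods: dict mapping explanation -> P(evidence | explanation)
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every explanation")
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical rival explanations of a program outcome (illustrative priors).
priors = {"theory_of_change": 0.4, "macro_trend": 0.4, "other_program": 0.2}

# Evidence that every explanation predicts equally well: it fits the theory
# of change, but it fits the rivals just as well, so it is uninformative.
uninformative = {"theory_of_change": 0.8, "macro_trend": 0.8, "other_program": 0.8}

# Evidence that the theory of change predicts far better than its rivals.
discriminating = {"theory_of_change": 0.8, "macro_trend": 0.1, "other_program": 0.1}

posterior_flat = update(priors, uninformative)    # identical to the priors
posterior_sharp = update(priors, discriminating)  # shifts toward theory_of_change
```

In this sketch, evidence that is merely consistent with the theory of change leaves confidence unchanged when the rivals predict it equally well. Only evidence that is more likely under one explanation than under the others shifts confidence, which is why cursory attention to alternative explanations can bias the conclusion.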
9.4.3 Consider the Potential Biases of Evidentiary Sources
The potential biases of stakeholders are sometimes fairly clear, but they can depend on institutional and contextual factors. A government official might want to cast a program in a good or bad light, for example, depending on their party affiliation. Program managers generally want to show that their program is succeeding, but they might be tempted to downplay the baseline achievements they inherited from their predecessors. It is important as well to consider not only motivated biases, but also unmotivated biases that can arise from the selective information streams to which individuals are exposed, or from procedures through which some documents are maintained and made accessible and others are discarded.
9.4.4 Take Into Account Whether the Case is Most or Least Likely for Alternative Explanations
This consideration applies to the ability to generalize the findings of a program evaluation to other contexts in which the program might be instituted. When a program succeeds in its least hospitable conditions, this can provide a warrant for arguing that it is likely to succeed in a wide range of conditions. When it fails in its most favorable context, this suggests a program is unlikely to succeed anywhere. For additional discussion, see Chapter 4.
9.4.5 Make a Justifiable Decision on When to Start
An obvious point in time at which to start an evaluation or establish a baseline is often at the initial implementation of a program. Different parts of a program may have started at different times, however, or they may have started at different times in different regions or for different groups of stakeholders. There can also be time lags between the proposal, approval, and implementation of a program, and during each period stakeholders might start to change their behavior in ways that either enhance or undermine program performance. For example, actors might try to corner the local market and increase the prices of local goods, properties, or services that will be in greater demand once a program starts. In addition, stakeholders may have had incentives to boost or depress some of a program’s indicators or measures to try to get initial baseline measures that suit their purposes. When such anticipatory behaviors are possible, it makes sense to consider beginning the evaluation period at the first point in time when actors became aware of the program (which might include private leaks of information, and rumors and misinformation, even before a program is publicly announced).
9.4.6 Be Relentless in Gathering Diverse and Relevant Evidence, but Make a Justifiable Decision on When to Stop
The Bayesian logic outlined at the end of this chapter gives rationales for why diverse evidence is important and for deciding on when it is reasonable to stop gathering additional evidence. Essentially, when we assess a particular kind of evidence, each successive piece of this evidence has less potential to strongly change our confidence in different explanations of a case. We will have already updated our views based on the earlier pieces of the same kind of evidence, so each new piece of this kind of evidence is less likely to surprise us, and at some point our time would be better spent looking at a different kind of evidence or a different observable implication of a potential explanation.
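The diminishing returns described here can be illustrated with a small sketch. It assumes, for simplicity, that each piece of evidence of a given kind is independent and carries the same hypothetical likelihood ratio of 3 in favor of an explanation; these values and the function name `posterior_after` are illustrative assumptions, not part of the method as presented in this chapter.

```python
def posterior_after(prior, likelihood_ratio, n_pieces):
    """Posterior probability after n independent pieces of evidence,
    each with the same likelihood ratio in favor of the explanation."""
    odds = prior / (1 - prior)              # convert probability to odds
    odds *= likelihood_ratio ** n_pieces    # apply each piece of evidence
    return odds / (1 + odds)                # convert odds back to probability


prior = 0.5             # hypothetical starting confidence
likelihood_ratio = 3.0  # hypothetical strength of each piece of evidence

# Confidence after 0, 1, 2, ... 5 pieces of the same kind of evidence.
trajectory = [posterior_after(prior, likelihood_ratio, n) for n in range(6)]

# Gain in confidence contributed by each successive piece.
gains = [later - earlier for earlier, later in zip(trajectory, trajectory[1:])]
```

Under these assumptions the first piece moves confidence from 0.50 to 0.75, while each later piece adds less than the one before it. If pieces of the same kind of evidence are correlated rather than independent, as they often are in practice, each additional piece may carry even less new information than this sketch suggests.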
At the same time, the appropriate “stopping rule” for looking at a particular kind of evidence depends not just on whether each successive piece of evidence is consistent with the story told by each previous piece, but also on how unexpected that story is in the first place. As the philosopher David Hume wrote, “No testimony is sufficient to establish a miracle, unless the testimony be of such a kind, that its falsehood would be more miraculous than the fact which it endeavors to establish” (Hume 1748, chp. 10).[Footnote 5] We would thus demand more voluminous, consistent, and diverse evidence to be convinced that a program had an astonishingly strong or weak effect than to be convinced that it did not.
A third consideration for determining a stopping rule for policy-relevant process tracing concerns the question of what is at stake. The greater the consequences of a type I (false positive) or type II (false negative) error in inferring whether the program worked, the higher the degree of confidence we will seek to establish based on the evidence. It makes sense, for example, to demand more conclusive evidence for medical treatments where lives are at stake than for programs that might at best modestly improve incomes or at worst leave them unchanged.
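Hume's maxim can be restated in the odds form of Bayes' rule. In the sketch below, H (the surprising claim, such as an astonishingly strong program effect), E (the accumulated evidence), and the numerical values in the comment are illustrative assumptions rather than quantities from any actual evaluation.

```latex
\[
\underbrace{\frac{P(H \mid E)}{P(\neg H \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(H)}{P(\neg H)}}_{\text{prior odds}}
\times
\underbrace{\frac{P(E \mid H)}{P(E \mid \neg H)}}_{\text{likelihood ratio}}
\]
% Example: if P(H) = 0.01, the prior odds are 1:99, so the posterior odds
% exceed 1:1 only when the likelihood ratio is greater than 99.
```

The less expected the claim, the larger the likelihood ratio that the evidence must supply before the claim becomes more probable than not.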
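One way to make the role of stakes concrete is a standard expected-loss rule from decision theory, offered here as an illustrative assumption rather than as a procedure this chapter prescribes. If wrongly concluding that a program worked costs `cost_false_positive` and wrongly concluding that it failed costs `cost_false_negative`, then concluding that it worked minimizes expected loss only when our confidence exceeds the threshold computed below. The function name and the cost figures are hypothetical.

```python
def confidence_threshold(cost_false_positive, cost_false_negative):
    """Minimum probability that the program worked at which concluding
    'it worked' has lower expected loss than concluding 'it did not'.

    Concluding 'worked' risks (1 - p) * cost_false_positive;
    concluding 'did not work' risks p * cost_false_negative.
    The two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    if cost_false_positive < 0 or cost_false_negative < 0:
        raise ValueError("costs must be non-negative")
    total = cost_false_positive + cost_false_negative
    if total == 0:
        raise ValueError("at least one cost must be positive")
    return cost_false_positive / total


# Hypothetical stakes: a false positive on a medical treatment is taken to be
# 99 times as costly as a false negative.
medical_threshold = confidence_threshold(99, 1)

# Hypothetical stakes: for a modest income program the two errors are taken
# to be about equally costly.
income_threshold = confidence_threshold(1, 1)
```

With these illustrative costs, the medical case calls for 99 percent confidence before concluding that the treatment worked, while the income program calls for only 50 percent. A higher threshold in turn implies gathering more evidence before stopping.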
9.4.7 Combine Process Tracing with Case Comparisons when Useful for the Research Goal and Feasible
Process tracing is a within-case form of analysis, but it can be combined with cross-case comparisons to strengthen inferences. In a “most-similar” case comparison, for example, a researcher selects two cases that are, ideally, similar in the values of all but one independent variable and that have different outcomes on the dependent variable. Before–after comparisons, which compare a preprogram baseline to postprogram outcomes, can be most-similar comparisons if important nonprogram variables do not change in the same time period. The goal in most-similar comparisons is to make an inference on whether the difference on the independent variable – or, here, the program intervention – accounts for the difference on the dependent variable. The key limitation of this design is that even if all but one of the independent variables are closely matched, there may be other untheorized differences between the two cases, including exogenous variables that change in the time period between the inception and the evaluation of a program, that might account for the difference in their outcomes. It is thus important to do process tracing on the independent variable that differs, or the program intervention, to show that it created a causal chain leading up to the outcome. The researcher should also process trace the hypothesized effects of any other potential independent variables that differ between the comparison cases, and to the extent that this reveals that they can be ruled out as causes of the cases’ differing outcomes, we can be more confident that the program’s theory of change generated the outcome.[Footnote 6]
9.4.8 Be Open to Inductive Insights
Because the omission of a viable candidate explanation can undermine inferences about a case, it is important to watch for potentially causal variables that were omitted from the initial list of candidate explanations. The feeling of surprise at discovering an unexpected potential causal factor is something to be savored rather than feared, as it signals that there may be something new to be learned about the process that led to the outcome. Cases where the outcome was surprisingly good or unexpectedly poor, or “deviant” or “outlier” cases, are good candidates for process tracing that puts added emphasis on inductive soaking and poking to identify and assess variables whose omission from researchers’ or practitioners’ prior theories might explain why one or both communities were surprised by the outcome.
9.4.9 Use Deduction to Ask “If the Explanation Is True, What Will Be the Specific Process Leading to the Outcome?”
Researchers need to think concretely about specific hypothesized processes in order to do process tracing well. Social science theories are usually stated in general terms, and it is necessary to adapt them to the case and circumstances at hand and ask what specific sequences and events they would predict if they were to constitute an adequate explanation for the outcome.
Consider the example of microfinance. On one level, the hypothesized mechanism through which such loans work (if they do) is simple: microloans give credit to businesses too small or informal to have access to conventional loans. Yet depending on the details of the microfinance program, several different mechanisms may be at work. In the process of applying for a microloan, applicants might receive feedback that improves their business plans, and those that receive loans may receive further monitoring and advice. Being accepted as a loan recipient might be seen as an indicator of the quality of the applicant’s business plan, opening the door to additional credit, whether from social networks or formal financial institutions. If the savings that provide the funds for loans come from local actors who also decide on which loans to make, as in solidarity lending, this can create social pressures – and social resources – for the business to succeed and for loan repayment. Transaction costs, interest rates, inflation, macroeconomic trends, and other factors can affect whether and how microloans work as well. It is necessary to specify concretely how each of these possible mechanisms might have worked in the case at hand, and to outline the observable implications for each, in order to carry out process tracing.
Educational programs provide another example of the importance of thinking concretely about how projects actually work. University scholarship programs aim to provide opportunities for students who could not otherwise afford higher education. It is relatively easy to measure inputs (how many scholarships were given out) and outputs (how many scholarship recipients graduated), but the challenge is to assess how such a program actually works and what its actual effects are compared to the counterfactual world in which the program did not exist. On what basis does it select students for funding? How does it establish and verify the criterion of financial need? Does it also advise students on how to apply to universities and how to prepare for and succeed once they begin attending? Does it get students into programs they would not otherwise attend, or to which they would not even apply without the possibility of a scholarship? What programs were students contemplating or applying to before and after they heard of the scholarship? Might the same students have received scholarships or loans that would allow them to get a university education at the same institutions? Does the scholarship lead to a higher rate of program completion for funded students compared to students who nearly won funding? Were funds provided in a timely way in each semester, or did delay cause dropouts or registration difficulties? Did scholarship students expand the capacities of universities and the numbers of students they accepted, or merely take the place of other students who then had to go to other universities? Did accepting the scholarship open up other funds or resources that the student would have used, creating opportunities for yet other students (including siblings, cousins, etc.)? Such concrete questions get us closer to assessing the actual outcomes that arose and the ways in which they came about.
9.4.10 Remember That Conclusive Process Tracing Is Good, but Not All Good Process Tracing Is Conclusive
When the evidence from a case sharply discriminates among alternative hypotheses – that is, when it is likely to be true under one hypothesis but very unlikely under the alternatives – this allows strong claims that the one hypothesis consistent with the evidence is a strong explanation of the outcome in the case. The evidence is not always strongly conclusive, however, and it is important not to overstate the certainty that the evidence allows. The evidence may be weak or mixed, and it is important to convey how strong the evidence is and how strong the inferences are that the evidence allows. As discussed later in this chapter, this can be expressed in informal terms, such as “smoking gun” versus “straw in the wind” evidence, and “high confidence” or “likely” explanations, or it can be conveyed in numerical point or range estimates of probabilities ranging from zero to one.
In addition, often a combination of factors rather than one factor alone explains the outcome of a case, and it can be difficult to figure out process tracing tests that discriminate among all the possible interactions of the variables of interest. For example, in a particular case of microfinance, it may be that expanded credit alone was sufficient for the outcome, or it may be that this together with business advice from the lender generated the outcome. To distinguish among these, an evaluator would have to think of observable implications that would be consistent with the “credit alone” explanation but not the “both together” explanation, and vice versa.
A third reason to be careful to not overstate the certitude that the evidence allows is that it is always possible that the outcome is due to an explanation that the evaluator did not consider. As discussed later in this chapter, the Bayesian logic in which process tracing is rooted requires exhaustive and mutually exclusive explanations in order to function completely, and it is never possible to know with certitude that one has considered all the possible explanations. This is one reason that Bayesians do not allow for 100 percent certitude in any inferences.
9.5 Program Evaluation Process Tracing versus Social Science Process Tracing
There is one key difference between program evaluation process tracing and social science process tracing, and it generates both advantages and challenges for program evaluators. This is the fact that the experts who design policy interventions have the opportunity to outline in advance diagnostic indicators that will later provide evidence on whether a program is working as its theory of change suggests. Moreover, officials can require that program implementers begin gathering and reporting evidence on these indicators from the inception of the program or even the preprogram baseline. If the indicators are well designed, and if they also include data on alternative causal processes that might affect program outcomes, this greatly eases the task of program evaluation. Social scientists, in contrast, usually have to devise their own process tracing tests and gather the relevant evidence themselves after the events under study have already taken place.
Predesignation of program indicators can present challenges as well, however. First, indicators may be poorly designed and fail to provide strong evidence on the mechanisms through which the theory of change is expected to operate. Program outcomes can be difficult to conceptualize and measure, which can create a tendency to rely on measuring inputs or outputs instead of outcomes (Castro 2011; Markiewicz and Patrick 2016; Van der Knaap 2016). Diagnostic process tracing evidence is not the same as measures of outputs or outcomes, as it focuses on hypothesized causal mechanisms and processes, but it can overlap with output measures. There can also be a temptation to focus on diagnostic measures that are easy to measure rather than those that provide strong evidence for causal inference.
Second, there is a risk that program managers and other stakeholders will “game” the measurement and reporting of indicators to slant them toward their desired evaluation results. It can be difficult to devise diagnostic measures that provide strong evidence on the causes of program outcomes and that are not also susceptible to gaming. Essentially, this requires devising diagnostic measures that program implementers cannot achieve unless they actually are faithfully carrying out the program in accordance with its theory of change. This can lead to another problem, however: if diagnostic measures are too demanding and detailed, or if program implementers think (rightly or wrongly) that the theory of change is imperfect and that their experience and skills (or changed circumstances) give them better ideas on how to achieve the program’s goals, these program managers will face unpleasant choices between following micromanaging guidelines that they think are inappropriate or departing from the prescribed practices and measures. This raises the classic dilemmas concerning how much authority and flexibility to delegate in principal–agent relations, how to monitor agents through management information systems, and whether and how to allow for changes in the middle of program implementation (Honig 2018). While there is no perfect solution to these dilemmas, consulting stakeholders and program managers on the design of appropriate diagnostic measures and putting in place procedures and decision-making processes for modification or adaptation of these measures can minimize the trade-offs between too much and too little delegation and oversight (Gooding et al. 2018).
Perhaps a more common challenge, however, arises when program designers had an under-specified theory of change or gave insufficient attention to developing and gathering evidence on indicators that would make later process tracing and program evaluation easy. Even when a theory of change is well specified, evaluators need to assess its coherence and consider alternative explanations that program managers may not have considered or on which they did not gather evidence. In this regard, program evaluators are often in a position similar to that of social scientists who design and gather evidence on alternative explanations only after the events of interest have taken place.
9.6 Bayesian Logic and Process Tracing
The best practices outlined earlier address the “traditional” process tracing that characterizes almost all published research and completed program evaluations to date. In the last few years, however, methodologists have begun to explore the possibility of applying more explicitly and formally the Bayesian logic that underlies process tracing. There are as yet few applications of this approach to empirical research, and there are strong pragmatic reasons why full formal Bayesian analysis of evidence from case studies is not appropriate in most research settings. Still, it is useful to understand the formal Bayesian logic that informs more informal process tracing practices, as this can lead to better implementation of these less formal practices. In addition, it may be useful to apply more formal Bayesian analysis to a few of the most important pieces of evidence in a study even if it is unduly cumbersome to do so for most of the evidence. While a full discussion of the Bayesian logic of process tracing is beyond the scope of this chapter, the brief outline that follows provides an introduction to the topic.Footnote 7
In Bayesian analysis, probability is conceived of as the degree of belief or confidence that we place in alternative explanations. This is quite different from the standard frequentist statistical conception of probability as representing the likelihood that a sample is or is not representative of a population. In Bayesian analysis of individual case studies, the analyst starts with a “prior,” or an initial guess regarding the likelihood that alternative explanations are true regarding the outcome of the case. The analyst uses the logic of the explanations, or of their underlying theories, to estimate how likely particular kinds of evidence are in the possible worlds represented by each explanation. The analyst then uses the laws of conditional probability to translate the likelihood of evidence given alternative explanations into the likelihood of alternative explanations given the evidence. This new, updated estimate of the likelihood that alternative explanations are true is called the “posterior” probability, or simply the posterior.Footnote 8
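The updating step described above can be illustrated with a minimal sketch. The function name `bayes_update`, the two hypothesis labels, and all numerical values are hypothetical and chosen only for illustration; the sketch assumes the hypotheses are mutually exclusive and exhaustive.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for mutually exclusive, exhaustive hypotheses.

    priors:      dict mapping hypothesis name -> P(H)
    likelihoods: dict mapping hypothesis name -> P(E | H)
    """
    # Joint probability of each hypothesis and the evidence: P(H) * P(E | H)
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Total probability of the evidence across all hypotheses: P(E)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    # Bayes' rule: P(H | E) = P(H) * P(E | H) / P(E)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical priors and likelihoods, for illustration only
priors = {"program": 0.5, "macroeconomy": 0.5}
likelihoods = {"program": 0.8, "macroeconomy": 0.2}

posterior = bayes_update(priors, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.2f}")
```

Because the evidence in this invented example is four times as likely under the program explanation as under the macroeconomic one, confidence in the program explanation rises above its prior.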
Bayesianism provides a formal language for discussing the relative strength or probative value of different pieces of evidence. We already have an informal language for this: “smoking gun” evidence strongly supports one explanation, but the absence of such evidence does not necessarily reduce confidence in that explanation. Passing a “hoop test” is asymmetrical in the other direction: an explanation is strongly undermined if it fails a hoop test, but we do not necessarily greatly increase our confidence in an explanation that passes a hoop test. These informal examples are points on a continuum: the “likelihood” of evidence taking on a certain value if a theory or an explanation is true can range from 0 to 1, and when we compare the likelihood of evidence under one explanation to its likelihood under an alternative – that is, when we divide the likelihoods – this ranges from 0 to infinity. The more likely evidence is under one explanation, and the less likely it is under the alternatives, the more strongly the discovery of that evidence affirms the one explanation it fits. It is the relative likelihood of the evidence under the alternative explanations, or the “likelihood ratio,” that matters, not the absolute likelihood that the evidence or data will take on a certain value if one explanation is true.Footnote 9
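The asymmetry between smoking gun and hoop tests can be sketched numerically. The likelihood values below are invented for illustration, and the helper names `likelihood_ratio` and `diagnostic_ratios` are hypothetical; the point is only that each test has one likelihood ratio for finding the evidence and another for failing to find it.

```python
def likelihood_ratio(p_e_given_h, p_e_given_alt):
    """Likelihood of the evidence under H divided by its likelihood under the alternative."""
    return p_e_given_h / p_e_given_alt


def diagnostic_ratios(p_e_given_h, p_e_given_alt):
    """Return (ratio if the evidence is found, ratio if the evidence is not found)."""
    found = likelihood_ratio(p_e_given_h, p_e_given_alt)
    # The probability of NOT observing the evidence is one minus its likelihood
    not_found = likelihood_ratio(1 - p_e_given_h, 1 - p_e_given_alt)
    return found, not_found


# Smoking gun (hypothetical values): evidence is uncommon even if H is true,
# but almost never appears if the alternative is true.
smoking_gun = diagnostic_ratios(0.20, 0.01)

# Hoop test (hypothetical values): evidence is almost certain if H is true,
# but also fairly common if the alternative is true.
hoop = diagnostic_ratios(0.95, 0.60)

print(f"smoking gun: found {smoking_gun[0]:.2f}, not found {smoking_gun[1]:.2f}")
print(f"hoop test:   passed {hoop[0]:.2f}, failed {hoop[1]:.2f}")
```

A ratio far above 1 strongly favors the explanation, a ratio far below 1 strongly undermines it, and a ratio near 1 changes little. Under these assumed values, finding the smoking gun is highly probative while its absence is not, and failing the hoop test is highly damaging while passing it is only mildly supportive.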
Bayesian inference, however, is only as good as the information that informs the analysis, which raises the obvious question: How do we estimate the priors and likelihoods? The prior, or our initial guess on the likelihood that a particular theory correctly explains the outcome of a case, in principle represents all of our “background knowledge,” or all of our conclusions and intuitions from previous research and experience. In some situations, such as when we have mountains of data, we can use well-informed priors, just as life insurance companies do when they use the ample data at their disposal to estimate life expectancies given a person’s age, health habits, and health indicators. Most of the time in social science settings, however, we lack a strong evidentiary basis for estimating priors. One option here is to use uninformed priors – that is, to give each alternative explanation an equal prior (such as a prior of 1/3 if there are three candidate explanations). Another option is to try the analysis with different priors to see how sensitive the conclusions are to the choice of the prior; if the evidence is strong, the estimate of the prior might not matter much to the estimate of the posterior (Bayesians call this the “washing out of priors”). A third approach that case study methodologists are beginning to assess is to “crowd source” estimates of priors, whether among subject matter experts or nonexperts.
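The second option, trying the analysis with different priors, can be sketched as a simple sensitivity check. The function `posterior_prob`, the likelihood ratios, and the range of priors are all hypothetical, and the sketch assumes the pieces of evidence are independent of one another so that their likelihood ratios can be multiplied.

```python
def posterior_prob(prior, likelihood_ratios):
    """Update a prior probability with a sequence of likelihood ratios (odds form).

    Assumes the pieces of evidence are conditionally independent.
    """
    odds = prior / (1 - prior)       # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio                # posterior odds = prior odds * likelihood ratio
    return odds / (1 + odds)         # convert odds back to probability


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


priors = (0.1, 1 / 3, 0.5, 0.9)      # hypothetical range of starting beliefs
strong_evidence = [20, 20, 20]       # three strongly discriminating pieces
weak_evidence = [1.5]                # one weakly discriminating piece

strong_posteriors = [posterior_prob(p, strong_evidence) for p in priors]
weak_posteriors = [posterior_prob(p, weak_evidence) for p in priors]

for p, s, w in zip(priors, strong_posteriors, weak_posteriors):
    print(f"prior {p:.2f} -> strong evidence {s:.4f}, weak evidence {w:.4f}")
print(f"spread with strong evidence: {spread(strong_posteriors):.4f}")
print(f"spread with weak evidence:   {spread(weak_posteriors):.4f}")
```

With the strong evidence assumed here, analysts who began with very different priors end up with nearly identical posteriors, which is the washing out of priors. With only weak evidence, the choice of prior still dominates the conclusion.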
Estimating likelihoods of evidence is challenging as well. This requires “inhabiting the world” of each hypothesis – that is, assuming that the hypothesis is true and then assessing the likelihood of a piece of evidence given the truth of the hypothesis. Estimating likelihood ratios requires performing this task for multiple hypotheses. On the other hand, it can be easier to assess the relative likelihood of evidence – to ask which of two hypotheses makes the evidence more likely, and even to estimate the ratio of these likelihoods – than to estimate the absolute likelihood of evidence given each hypothesis. As with estimating priors, researchers can try crowd-sourcing estimates of likelihoods.
A third challenge is arranging the alternative explanations, as Bayesian inference requires, in such ways that they are mutually exclusive and exhaustive. This includes explanations that combine several interacting theoretical variables or causal mechanisms, such as agents, institutions, norms, etc. In principle, this is possible for any group of hypotheses. To take a simple example, a criminal investigator might divide the explanations for a murder into four possibilities: the murder could have been committed by suspect A alone, by suspect B alone, by both A and B colluding together, or by neither A nor B. The next step is a bit more complex: the investigator has to think of the likelihood of different pieces of evidence under all these possible explanations, and, ideally, to find evidence that strongly discriminates among the explanations. This can be difficult for murder investigations: the detective has to ask what evidence would point to collusion that would not also be consistent with A or B acting alone. It is arguably even more challenging for social science researchers who are evaluating various combinations of structural, normative, macroeconomic, managerial, and other factors that can contribute to the success or failure of development programs.
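The murder example can be worked through as a rough sketch. The four hypotheses come from the text; the two pieces of evidence, their likelihoods, and the uniform prior are invented for illustration, and the sketch assumes the two pieces of evidence are conditionally independent.

```python
def bayes_update(priors, likelihoods):
    """Posterior over mutually exclusive, exhaustive hypotheses given one piece of evidence."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# The four hypotheses partition the possibilities: exactly one must be true.
hypotheses = ["A alone", "B alone", "A and B together", "neither"]
prior = {h: 1 / len(hypotheses) for h in hypotheses}   # uninformed prior

# Evidence 1 (hypothetical): A's fingerprints at the scene.
# Equally likely under "A alone" and under collusion, so it cannot separate them.
fingerprints = {"A alone": 0.90, "B alone": 0.05,
                "A and B together": 0.90, "neither": 0.05}

# Evidence 2 (hypothetical): repeated calls between A and B before the murder.
# Much more likely under collusion than under any other hypothesis.
phone_calls = {"A alone": 0.10, "B alone": 0.10,
               "A and B together": 0.80, "neither": 0.10}

after_first = bayes_update(prior, fingerprints)
after_second = bayes_update(after_first, phone_calls)

for h in hypotheses:
    print(f"{h:18s} prior {prior[h]:.2f} -> {after_first[h]:.3f} -> {after_second[h]:.3f}")
```

In this invented scenario, the first piece of evidence rules down "B alone" and "neither" but leaves "A alone" and collusion tied; only evidence that is specifically more likely under collusion separates the two.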
A final difficulty with formal Bayesian analysis is that the calculations it requires become tedious and lengthy to write up and to read even for a small number of pieces of evidence and alternative explanations, and much more so for multiple pieces of evidence and explanations. For this reason, even the methodologists who have begun to explore formal Bayesian process tracing argue against trying to implement it fully for all the evidence (Fairfield and Charman 2017).
Still, it can be useful to do formal Bayesian analysis on one or a few of the pieces of evidence that a researcher judges to be most powerful in discriminating among alternative explanations, as this can make the analysis more transparent. Specifically, understanding the Bayesian logic of process tracing can contribute to better process tracing practices in at least four ways. First, Bayesian logic provides a clear philosophical warrant for much of the practical advice methodologists have given regarding traditional process tracing, including the ten best practices of traditional process tracing discussed earlier. One reason to initially consider a wide range of alternative explanations, for example, is that failing to consider a viable explanation can bias the estimates of the likelihoods, and thus the posterior estimates, of all the explanations the analyst does consider. Bayesianism also gives a clear explication of what constitutes strong evidence, of why diverse and independent evidence is important, of the trade-offs involved in stopping too soon or too late in gathering and analyzing evidence, and of why we should never be 100 percent confident in any explanation.
Second, Bayesianism leads to counterintuitive insights. Evidence that is consistent with an explanation, for example, can actually make that explanation less likely to be true if the same evidence is even more consistent with an alternative explanation. Also, numerous pieces of weak evidence (or what might be called “circumstantial evidence” in a court), if they all or mostly point in the same direction, can jointly constitute strong evidence that considerably changes our confidence in alternative explanations.
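Both insights can be checked with a few lines of arithmetic. The numbers below are hypothetical, the helper `posterior_from_odds` is an illustrative name, and the second calculation assumes that the weak pieces of evidence are independent of one another.

```python
def posterior_from_odds(prior, likelihood_ratio):
    """Posterior probability of H1 versus a single alternative H2, using the odds form."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


# Insight 1: evidence "consistent with" H1 can still lower confidence in H1.
# Hypothetical likelihoods: the evidence is likely under H1 (0.6) but
# even more likely under the alternative H2 (0.9).
prior_h1 = 0.5
posterior_h1 = posterior_from_odds(prior_h1, 0.6 / 0.9)
print(f"H1 after consistent-but-less-likely evidence: {posterior_h1:.2f}")

# Insight 2: many weak pieces pointing the same way add up.
# Ten independent pieces, each only 1.5 times as likely under H1 as under H2.
weak_ratio = 1.5
combined_ratio = weak_ratio ** 10
posterior_single = posterior_from_odds(prior_h1, weak_ratio)
posterior_combined = posterior_from_odds(prior_h1, combined_ratio)
print(f"H1 after one weak piece:   {posterior_single:.2f}")
print(f"H1 after ten weak pieces:  {posterior_combined:.3f}")
```

If the pieces of evidence are not independent, for instance when several interviewees repeat the same source, the combined ratio would be smaller than this calculation suggests.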
Third, formal Bayesian analysis, even if it is done only on a few key pieces of evidence, provides a transparent form of inference that allows researchers and their readers or critics to identify exactly why their inferences diverge when they disagree on how to update their confidence in explanations in light of the evidence. Researchers and their readers can disagree about their priors, the likelihood of evidence under alternative explanations, and the interpretation or measurement of the evidence itself. Leaving estimates and interpretation of each of these ambiguous obscures where authors and readers agree and disagree. Making judgments on each of these clear, in contrast, can prompt researchers and their critics to reveal the background information that underlies their judgments, which can narrow areas of disagreement.
The fourth, and perhaps strongest, rationale for learning Bayesian analysis is that it illuminates the logic that traditional process tracers have used informally all along in order to make causal inferences from individual cases, and it can help them to use it better. Research on the psychology of decision-making indicates that people often make mistakes when they try to be intuitive Bayesians or first attempt formal Bayesian analysis (Casscells, Schoenberger, and Grayboys 1978). Other research shows that deeper training in Bayesian analysis can help improve forecasting (Tetlock and Gardner 2015). Additional research indicates that a few simple practices consistent with Bayesian process tracing, such as actively considering alternative explanations, can help debias judgments (Hirt and Markman 1995).
9.7 Conclusion
Process tracing and program evaluation, especially forms of evaluation that emphasize contribution analysis, have much in common. Both involve inferences on alternative explanations of outcomes of cases. It is not accidental that the evaluation community has taken a growing interest in process tracing, or that process tracing methodologists have become interested in program evaluation. The best practices developed in traditional social science process tracing are applicable, with modest adaptations, to the task of program evaluation. The biggest difference is that in contrast to researchers doing process tracing in the social sciences, program evaluators may have the opportunity to designate in advance, and to require reporting upon, diagnostic indicators about alternative processes as well as measures of inputs, outputs, and outcomes. This can make later evaluation easier, but it can also introduce potential distortions and biases as program managers and stakeholders might “game the system” once they know what measures will be tracked. Program designers and evaluators need to be creative and flexible in designing indicators that are useful in subsequent program evaluations, that cannot be achieved without also achieving the desired results at which a program aims, and that do not become a straitjacket on program managers when modifications to a program can better achieve its goals.
Program evaluators can benefit as well from exploring the emerging literature on formal Bayesian process tracing. This literature clarifies the logic behind traditional process tracing methods, and it is beginning to explore and outline new practices, such as crowd-sourcing of estimates of priors and likelihood ratios, that might further strengthen process tracing. Although formally analyzing the weight of every piece of evidence is impractical, it can be useful to formally assess a few of the strongest pieces of evidence. This can contribute to more logically consistent and analytically transparent assessments of alternative explanations of program outcomes.
10.1 Introduction
Case studies can contribute valuably to the study of development generally and to the implementation of development policy in particular. Case studies are uniquely well situated to identify and disentangle causal complexity and to interrogate the role of contextual factors in shaping outcomes, among other strengths. In this chapter, I focus on the potential insights that can be derived from the study of a particular kind of case – the deviant case – and, more specifically, on anomalies that exceed expectations, or “positive deviant” cases. I argue that the study of positive deviance can offer two distinct types of benefits for development policy. The first is methodological: Building on the literature on case selection in the social sciences, I emphasize the value of deviant cases for hypothesis generation and for the analysis of causal heterogeneity. The second potential contribution is less technical. Deviant cases can play an important inspirational role, signaling to practitioners, policy-makers, and local development actors that improvement is possible, even in resource-constrained environments. At the same time, the celebration of positive deviant cases must proceed with caution, not only because idiosyncratic factors may deter the replication of their experiences to other units or contexts, but also because other actors operating in the same sectors or communities may feel undermined if they are implicitly judged vis-à-vis similar institutions or actors deemed to be more successful.
In Section 10.2, I elaborate on the concept of positive deviance and highlight the potential value of positive deviant cases for development policy. The discussion focuses on both the methodological strengths and limits of positive outliers and on the value and risks of using such cases as sources of inspiration for local actors and development practitioners. The section identifies ways of selecting positive deviant cases in systematic and less systematic ways. In the subsequent section of the chapter, I illustrate the value of positive deviant cases for several projects on the quality of social service delivery in the Middle East and North Africa to which I have contributed. Section 10.3 summarizes the main arguments and raises additional issues related to the pros and cons of using positive deviance as a way to formulate and propel beneficial reforms in development policy.
10.2 What Can Development Policy Learn From Positive Deviance?
The analysis of high-performing outliers is a promising approach for the formulation and implementation of development policy on both technical and nontechnical grounds. The literature on case studies in the social sciences highlights the methodological value-added of positive deviant cases in research designs. An additional, nontechnical merit of the approach emphasizes the role of such cases in encouraging greater performance from other actors and institutions operating in the same field. In this section, I define positive deviance and review these distinct justifications for the study of positive deviance.
10.2.1 What are Positive Deviant Cases?
Positive deviant cases are outliers whose performance is superior to what a model would predict. In technical terms, this refers to cases selected to maximize |Yi – Ŷi|, or the difference between the actual value and the fitted value in a regression (Gerring, 2007: 89; Seawright, 2016: 16). In order to qualify as cases of positive deviance, however, cases must be more than ephemeral outliers operating in an environment conducive to good performance. Rather, examples of positive deviant organizations or other types of collective actors must exhibit sustained high performance in a context in which good results are uncommon.
Writing from the perspective of organizational behavior in the management literature, Spreitzer and Sonenshein (2004) present a somewhat distinct definition and operationalization of positive deviance that is especially useful when thinking about the role that deviant cases can play in the formulation and implementation of development policy. Their definition emphasizes a normative component of positive deviance, depicting the construct as behavior that departs from established norms in a referent group “in honorable ways” (Spreitzer and Sonenshein, 2004: 832). This understanding differs from the more neutral statistical perspective in which deviance – whether positive or negative – represents behavior that diverges from average or normal experiences (Spreitzer and Sonenshein, 2004: 830). In their analysis of positive deviance in agricultural development, Pant and Hambly Odame (2009: 160), too, describe positive deviants as those who break from prevailing norms to promote positive change, which in their study is measured by the facilitation of agricultural knowledge creation and application in the instances they highlight. As I hope to show, an understanding of positive deviance that incorporates a technical, statistical definition while including attention to deviation from prevailing norms and practices in a given sociopolitical context is essential if we view positive deviant cases in both methodological and inspirational terms.Footnote 1
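The statistical side of this definition can be sketched as follows. The data, unit labels, and function names are hypothetical, and a one-predictor least-squares fit stands in for whatever model a researcher would actually use. The sketch ranks units by the signed residual Yi – Ŷi so that only over-performers are returned; it does not capture the further requirement of sustained performance in an unfavorable context, which calls for repeated observations and qualitative judgment.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def positive_deviants(units, x, y, top_n=1):
    """Return the top_n units with the largest positive residuals (actual minus fitted)."""
    intercept, slope = fit_line(x, y)
    residuals = [(unit, yi - (intercept + slope * xi))
                 for unit, xi, yi in zip(units, x, y)]
    # Keep only units that exceed the model's prediction, largest gap first
    above = [(unit, r) for unit, r in residuals if r > 0]
    above.sort(key=lambda pair: pair[1], reverse=True)
    return above[:top_n]


# Hypothetical data: six facilities, a resource measure (x), and a performance score (y)
units = ["A", "B", "C", "D", "E", "F"]
resources = [1, 2, 3, 4, 5, 6]
performance = [2.0, 4.1, 5.9, 12.0, 10.1, 11.9]

top = positive_deviants(units, resources, performance, top_n=1)
for unit, residual in top:
    print(f"unit {unit} exceeds its predicted performance by {residual:.2f}")
```

In this invented data set, facility D performs well above what its resources would predict and would be the candidate for closer qualitative study.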
10.2.2 Methodological Justifications
The literature on research design in the social sciences identifies a range of case selection strategies and highlights the pros and cons of each type for distinct research goals (Collier and Mahoney, 1996; Eckstein, 2009; Geddes, 1990; Gerring, 2007; Przeworski and Teune, 1970; Seawright, 2016; Seawright and Gerring, 2008). What, if anything, can be learned from positive deviance in the context of development policy?
The consensus view on the role of deviant cases is that they are most useful for exploratory purposes or discovery. Anomalous cases, whether positive or negative, are valuable for theory building exercises and to search for new but as yet unspecified explanations (Odell, 2001: 166; Seawright, 2016; Seawright and Gerring, 2008: 302). They can also generate insights into the identification and operation of causal mechanisms not examined in the existing literature (George and Bennett, 2005: 20; Gerring, 2007: 89; Seawright and Gerring, 2008: 303). In the case of positive deviance, then, these outliers can show how specific causal mechanisms propelled high performance, indicating how specific values on a previously unexplored variable propel superior outcomes.Footnote 2
Deviant cases are well suited to pinpoint sources of causal heterogeneity (Seawright, 2016: 21, 25). This goal is particularly relevant for development policy because it is increasingly recognized that social context affects the implementation and results of the same types of interventions and policies (Pritchett and Sandefur, 2013). Policy interventions that have been validated through experimental research in one country or subnational unit often do not yield the same results in another place, limiting the broader value of studies based on experimental designs for development policy (Pritchett and Sandefur, 2013; Rodrik, 2008: 26–27). An advantage of a qualitative approach to studying positive deviance is that contextual factors can be identified and integrated into an analysis of what has or has not worked in a given unit (Bradley et al., 2009: 3). As a result, it is possible to study the way that social norms within an organization or in the community where it is embedded may affect the behavior of relevant actors and, hence, the performance of the organization as a whole.
At the same time, deviant cases have inbuilt limitations. By virtue of their status as outliers, they are obviously not useful for identifying the causal pathways of average or on-the-line cases. In addition, as Seawright (2016: 21) argues, they may not be as useful for identifying omitted variables as some have suggested (Bennett and Elman, 2006). Furthermore, as is true for other types of case studies, deviant cases cannot on their own refute a probabilistic theory. However, when a theory is based on a deterministic proposition, deviant cases can be used to disconfirm it (Seawright and Gerring, 2008: 302), although this is a less common causal premise in mainstream social science research.
It is also important to stress that identifying cases of positive deviance by selecting cases with exceptionally high values on the outcome of interest is subject to the usual pitfalls of selecting on the dependent variable (King, Keohane, and Verba, 1994). If we only study cases of high performers, we are liable to falsely ascribe their success to factors that may also be present in less successful cases. For this reason, the convention in qualitative research is to select on the independent variable whenever possible. But analyses of positive deviant cases among a field of otherwise similar cases that operate in the same context or area can be a valuable way to identify potential explanatory variables for exceptional performance. The hypothesized explanatory variables can then be incorporated in subsequent quantitative or qualitative studies in which their effects are evaluated more generally.
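The pitfall of selecting on the dependent variable can be made concrete with a small check. The sketch below is illustrative only: the case records, the `high_performer` flag, and the factor names are hypothetical and are not drawn from the studies discussed in this chapter. It compares how often a candidate factor appears among high performers and among the remaining cases.

```python
def factor_prevalence(cases, factor):
    """Share of high performers, and of all other cases, in which `factor` is present."""
    high = [c for c in cases if c["high_performer"]]
    rest = [c for c in cases if not c["high_performer"]]

    def share(group):
        return sum(1 for c in group if c[factor]) / len(group)

    return share(high), share(rest)


# Hypothetical cases: two high performers and four comparable units.
cases = [
    {"high_performer": True,  "staff_meetings": True,  "health_committee": True},
    {"high_performer": True,  "staff_meetings": True,  "health_committee": True},
    {"high_performer": False, "staff_meetings": True,  "health_committee": False},
    {"high_performer": False, "staff_meetings": True,  "health_committee": True},
    {"high_performer": False, "staff_meetings": True,  "health_committee": False},
    {"high_performer": False, "staff_meetings": False, "health_committee": False},
]

for factor in ("staff_meetings", "health_committee"):
    high_share, rest_share = factor_prevalence(cases, factor)
    print(factor, high_share, rest_share)
```

In this made-up data both factors are present in every high performer, but only the second is rare among the other cases. A study limited to the high performers could not tell the two apart, which is why hypothesized factors need to be evaluated in a broader set of cases.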
Despite these words of caution, taking a positive deviance approach has the potential to change the way researchers or development practitioners think about the phenomenon or outcome in question because the goal of analyzing outliers is to identify factors that explain why they do not fit the general pattern. From a statistical perspective, the ultimate goal of analyzing deviant cases is to aid in the development of a new model or the revision of an existing model so that the variable responsible for the anomalous value of a case is incorporated. The insights generated from a close analysis of the positive (or negative) deviant case can then be tested more generally in a broader set of cases. To the extent that this model works, the deviant case is no longer deviant (Seawright and Gerring, 2008: 302–303). Deviant cases, then, can play an important role in revising the conventional wisdom on the determinants of a particular development outcome.
Positive deviant cases can yield especially valuable insights when situated in a larger mixed methods approach. In particular, if a qualitative analysis of high performers precedes a quantitative study, this affords the possibility of identifying previously unconsidered hypotheses or of devising measures that can tap into potentially important constructs. Process tracing is a valuable technique for identifying the factors that explain deviant performance (George and Bennett, 2005: 215), which can then be assessed in a larger sample. For example, in their discussion of learning from positive deviance in health care delivery, Bradley et al. (2009) present an approach that entails (a) the identification of organizations that have consistently demonstrated high performance on a clear set of indicators, (b) intensive analysis using qualitative methods to generate hypotheses about the specific practices and policies that led to exceptional performance, (c) the use of statistical tests in a larger, representative sample of organizations to assess the broader applicability of the hypothesized explanatory factors, and (d) the dissemination of best practices to other organizations operating in the same field. The discussion of a multistep research project on the determinants of quality in social service provision in Jordan, which was led by the World Bank, is another example of this approach (see Section 10.3.1).
10.2.3 Inspirational Justifications
A less technical perspective highlights a distinct potential benefit of focusing on deviant cases, and especially positive deviant cases, with notable benefits for development policy. The identification and exploration of high performers can be motivating in and of itself through a signaling effect. In the context of low-resource environments, which development policy tends to target, cases of success against the odds indicate to others that there is hope. When the prospects for advancement seem dim, examples of high performers signal that it is possible to break, or at least start to weaken, development traps.
Actors embedded within resource-constrained communities may regard the experiences of other actors or organizations in similar contexts as applicable to their own circumstances, increasing the probability that they will seek to emulate their practices. This dynamic may facilitate the diffusion of best practices, in the same way that activists in the Arab region or in other regional waves of mass mobilization emulated the strategies and tactics of first movers because they perceived that they faced similar opportunities and constraints in their own countries (Patel, Bunce, and Wolchik, 2014). In this sense, positive outliers can help to stimulate a sense of “cognitive liberation” (McAdam, 1999), whereby practitioners and policy-makers come to believe that change is not only desirable, but possible. The mere act of shedding feelings of resignation and hopelessness can propel positive change by motivating greater effort and stimulating collaboration toward shared community goals.Footnote 3
In the day-to-day design and implementation of development policy, highlighting examples of positive deviance is likely to play a far more constructive role than pinpointing instances of underperformance or negative deviance. Positive incentives can be a source of motivation, potentially inducing greater effort and commitment to problem-solving among stakeholders, whereas sanctions may have a deterrent effect by increasing the sense of resentment and hostility to change (Ryan and Deci, 2000). Yet even as emphasizing instances of success may galvanize positive change, policy-makers and development practitioners must tread lightly when advertising success stories lest others in the community resent the fact that they, too, were not the objects of praise. As I note below, this is precisely the reason why Moroccan officials were hesitant to include identifying information in a short case study of high-performing public health centers included in a World Bank report (Belkâab and Cammett, 2014; Brixi, Lust, and Woolcock, 2015). For these reasons, how the lessons of positive deviance are shared can affect the probability that other organizations will adopt them. If the management of organizations with less-distinguished records feels threatened or marginalized when learning of the exceptional performance of others, then it will be difficult to foster acceptance of the identified best practices. If, on the other hand, others can be made to feel as if they are partners in the identification and scale-up of effective strategies, then dissemination may be more likely.
In short, in development practice it is imperative to distinguish between the empirical identification of positive deviant behavior for methodological purposes and the inspirational role of positive deviant actors or organizations in driving beneficial outcomes. Positive deviants in the context of development policy and outcomes may be more than statistically anomalous cases. From the standpoint of effecting tangible change in public goods provision or other development policy goals, they may be actors or groups who drive innovation and bring about solutions to problems that are not well addressed under the status quo; alternatively, they may just be those who, on an everyday basis, seek to find ways to manage existing resources in a way that yields strong performance against the odds. As such, an analysis of deviant cases can serve at least two roles. First, a close analysis of the actions and behaviors of deviant actors, and especially the identification of causal pathways linking these to outcomes, provides explanations for why a handful of cases exceeded expectations. Second, in the design and implementation of development policies and programs, positive deviants can play an important signaling role by demonstrating to others who toil under equally challenging circumstances that success (or at least improvement) is possible.
10.2.4 Selecting Cases of Positive Deviance
A systematic approach to the selection of positive deviance cases is important to avoid biases that can result in missed lessons or misleading interpretations of the causal factors leading to exceptional performance. Under ideal circumstances, case selection should occur with reference to a broader population of cases that are identified based on a general causal model. This approach, however, is contingent on the availability and validity of information on the population – a condition that is often hard to meet in some countries, whether due to lack of information or lack of transparency – and on the quality and nature of the model (Gerring, 2007: 106; Seawright, 2016).
Given a conventional understanding of the determinants of the outcome of interest, then, positive deviant cases can be selected because they are off the regression line in the direction corresponding to high performance, or to outcomes that imply beneficial or honorable behavior (Spreitzer and Sonenshein, 2004). Yet positive deviant cases are more than just outliers. Rather, to qualify as examples of positive deviance, high performers must excel in a context in which most other comparable units do not perform well. To pinpoint such cases, case selection might proceed in a two-stage process in which the researcher first identifies areas characterized by low overall results and then identifies high performers within those areas (Woolcock, 1998). At the same time, such exceptional performance in a field of otherwise poor results must be sustained, lest it arise from chance, ephemeral factors, or particular circumstances.
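The selection logic described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the procedure used in the studies discussed later: the facility records, the single predictor `resources`, the yearly `scores`, and the cutoff of one standard deviation are all hypothetical. The sketch fits a regression line, keeps only regions with below-average results, and then retains units whose performance lies well above the line in every observed year.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares fit with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def positive_deviants(units, z_cutoff=1.0):
    """Names of units far above the regression line, in low-performing
    regions, whose score beats the prediction in every observed year."""
    avg = {u["name"]: mean(u["scores"]) for u in units}
    intercept, slope = fit_line([u["resources"] for u in units],
                                [avg[u["name"]] for u in units])
    predicted = {u["name"]: intercept + slope * u["resources"] for u in units}
    residual_sd = pstdev([avg[n] - predicted[n] for n in avg])

    # Stage 1: regions whose average outcome is below the overall average.
    overall = mean(list(avg.values()))
    low_regions = set()
    for region in {u["region"] for u in units}:
        region_scores = [avg[u["name"]] for u in units if u["region"] == region]
        if mean(region_scores) < overall:
            low_regions.add(region)

    # Stage 2: sustained high performers inside those regions.
    selected = []
    for u in units:
        name = u["name"]
        z = (avg[name] - predicted[name]) / residual_sd
        sustained = all(score > predicted[name] for score in u["scores"])
        if u["region"] in low_regions and z >= z_cutoff and sustained:
            selected.append(name)
    return selected


# Hypothetical facilities: resources on an arbitrary scale, one score per year.
facilities = [
    {"name": "A", "region": "North", "resources": 1, "scores": [2, 2]},
    {"name": "B", "region": "North", "resources": 2, "scores": [3, 3]},
    {"name": "C", "region": "North", "resources": 3, "scores": [9, 9]},
    {"name": "D", "region": "South", "resources": 4, "scores": [8, 8]},
    {"name": "E", "region": "South", "resources": 5, "scores": [9, 9]},
    {"name": "F", "region": "South", "resources": 6, "scores": [10, 10]},
]

print(positive_deviants(facilities))  # ['C']
```

Requiring every yearly score to exceed the prediction is one simple way to express the "sustained" criterion; a unit with the same average but one exceptional year and one poor year would be excluded. With real data the model would normally include more than one predictor, and the choice of cutoff is a judgment call.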
As with any model, the utility of a regression line approach to selecting positive deviant cases also depends on the ways in which the outcome of interest is conceptualized and measured. Many goals central to development policy are multidimensional. A case or unit may perform exceptionally well with regard to some aspects of an outcome but may exhibit average or inferior values with respect to others. For example, in analyses of the quality of health care, the empirical focus of the examples discussed in the next section, dependent variables can focus on the infrastructural, process-oriented, or health outcome aspects of quality (Donabedian, 1988). Which aspects of quality are emphasized may be self-evident if there is a professional consensus on which dimensions are the key drivers of human development outcomes, or the choice may depend on the objectives of the researcher or development program.
In practice, it may be impossible to employ such a systematic approach to case selection due to a lack of data or restrictions on access to data. Should case studies of positive deviance then be abandoned? Not necessarily. It may be possible to identify actors or organizations that are deemed to be high performers in a less rigorous way by drawing on insights from local officials or community members. These actors know the terrain exceptionally well because they are embedded in the communities where policies or programs are implemented and have implicit or explicit rankings of local actors with respect to performance on development outcomes. Feedback from local actors can be elicited through interviews or by the administration of a survey, depending on resource availability.
That said, the measurement of positive deviance based on the reports of key informants is obviously vulnerable to a host of biases, even when respondents have the best of intentions. For example, officials or other stakeholders may inadvertently provide invalid information due to recall biases or because they wish to highlight favored actors or organizations at the expense of others. At a minimum, when cases are selected purposively, it is important to choose cases for in-depth analysis from a diverse sample that includes organizations varying in size, resource endowments, geographic location, and other factors that might affect performance (Bradley et al., 2009: 3). As an alternative or supplemental approach, researchers may draw on findings and insights derived from studies in other, comparable settings to use as a benchmark for the determination of high-performing organizations or actors.
10.3 Examples: What Explains High-Performance Service Delivery in Jordan and Morocco?
In this section, I aim to flesh out the potential contributions and limitations of case studies of apparent positive deviance with reference to two studies carried out in preparation for a World Bank report highlighting local success stories in service delivery in the Middle East and North Africa (Belkâab and Cammett, 2014; Brixi et al., 2015; Rabie et al., 2014). The first entailed a case study of high-performing public health centers in Jordan and evolved in two stages. The first part of the study was based on in-depth, qualitative analyses of the drivers of high performance at the facility level. The second stage of the research built on the findings of the case study to design a quantitative study based on a nationally representative sample. The Jordanian case therefore shows how a case study of positive deviance can fit into a larger mixed methods research design. The second study also focused on the drivers of high-quality health care through a case study of high-performing public health centers in provincial and semirural areas of Morocco. By describing the methods and findings of case studies of positive deviance in Jordan and Morocco, and detailing how they have contributed to larger-scale projects in the two countries, I hope to illustrate the value and limitations of this approach for development policy.
10.3.1 Jordan
In March 2014, I was part of a team of researchers that visited six health centers in four of Jordan’s twelve governorates, collecting data for a case study of positive deviance in the delivery of primary health care in the Jordanian public health system. The first order of business was to identify a set of indicators against which high performance would be assessed. A growing body of research emphasizes that several process-oriented aspects of health care quality, such as provider effort and human resource management at the facility level, are key factors contributing to the poor quality of health care in developing countries, with consequences for health outcomes (Das and Hammer, 2014; Dieleman, Gerretsen, and van der Wilt, 2009; Harris, Cortvriend, and Hyde, 2007: 450–452). At the same time, the quality of physical infrastructure may enhance or inhibit the ability of health workers to fulfill their professional obligations, and therefore it was also imperative to collect data on the availability and condition of medical equipment and supplies.Footnote 4
In order to identify appropriate cases, we aimed to benchmark the performance of the selected centers in the past three to five years against others within the same urban or rural region with similar socioeconomic profiles. Given the limitations of time, resources, and existing data on service quality, case selection faced significant challenges. In the end, the choice of facilities relied heavily on insights provided by the Ministry of Health and referrals made by the Health Care Accreditation Council (HCAC),Footnote 5 a nonprofit organization in Jordan that implemented an accreditation program funded by USAID. As a result of its role in identifying appropriate health centers to undergo the accreditation process, HCAC staff members had collected a database on a sample of high-performing facilities in the public health system and were willing to advise us on case selection even though they were not permitted to share the full database with us. Input from the HCAC was especially valuable for the purposes of carrying out case studies of positive deviance because the participating facilities had been selected in the first place due to a longer record of high performance, which was enhanced after completing the accreditation program.
During site visits, the team carried out a series of open-ended interviews with the chief medical officers, doctors, nurses, administrative staff, and, where applicable, the members of local health councils to gather information on multiple indicators of health care quality and to probe the underlying drivers of the quality of care. In conjunction with administrative data provided by the Ministry of Health and the centers themselves and with interviews with local, regional, and national government officials, the data collected during these visits provided the information used to write the case studies. Given that the accreditation process itself was an important driver of quality, the team was careful to collect data illuminating performance on quality indicators prior to and following participation in the program.
The in-depth studies of the selected health centers yielded a variety of insights about the determinants of high performance in the delivery of primary health services in Jordan. Because the accreditation program was implemented in a nonrandom fashion, the design of the study did not permit an analysis of the independent effects of participation in the program. However, in-depth interviews at the facility level indicated that the act of preparing for and undergoing the evaluation for accreditation drove major improvements in the quality of care, even at facilities that already performed above their peers. In particular, the findings indicated that preparation for accreditation in and of itself provides one mechanism for improving administrative procedures and the quality of service delivery. The procedures and requirements of the accreditation process helped to establish clear rules and regulations, increase transparency to clients and staff, develop more effective staff monitoring, and give greater voice and opportunities for participation to staff within facilities as well as to communities.
Beyond adherence to externally imposed standards and practices, leadership emerged as a key factor improving the operation of facilities. Indeed, strong, proactive chief medical officers at the health centers multiplied the positive outcomes from reforms implemented for accreditation or for other purposes. The head of one health center emphasized that he set clear expectations for his employees, fostered a collaborative work environment, and ensured that adequate feedback mechanisms existed for staff members to convey their concerns. Conversely, administrative reforms were less effective under weak leaders. For instance, the chief medical officer in one center noted that staff meetings were not held, since people came on different days and the assumption was that “everyone knows their job.”
A third finding pointed to the role of social networks in facilitating voice and participation and potentially in improving the extent and quality of service provision. Personal ties, whether among family or friends and neighbors, are particularly valuable for establishing priorities, extending public health outreach in the community, and mobilizing resources to support the activities and development of health centers. Interviews with staff and members of local committees at the selected facilities indicated that shared identity, particularly a common tribal affiliation, was especially valuable in rural areas but somewhat less relevant in urban centers, where people from diverse regions and backgrounds intermingle and many residents do not come from the major Transjordanian tribal families. In particular, when members of local health committees and local residents hail from the same tribes, the staff and governing board of a given center have an inbuilt channel through which to reach the community with vaccination and other health campaigns and to encourage greater compliance with medical advice. At one center, members of the local health committee claimed that the center’s public outreach initiatives are relatively successful because they come from the same families as the targeted beneficiaries.Footnote 6 Beyond family ties, we found that social networks and linkages to important local social institutions such as mosques, youth centers, and local schools also assist health workers in accomplishing their tasks. This insight provides a clear example of the ways in which case studies of positive deviance can generate new variables and causal pathways.
The information gathered through the case study of successful facilities helped to inform a subsequent quantitative study that the team undertook in Jordan. Based on a national sample of 100 health centers, the analysis aimed to explain variation in standard measures of provider effort, including absenteeism, adherence to clinical practice guidelines, rights-based practice, and time spent with provider. The main independent variables captured various aspects of within-facility governance, a neglected variable in the public health literature on quality and, more generally, in research on governance and the quality of service delivery. These included indicators to capture the presence of various management practices at the health center, including monitoring of professional staff, sanctions for underperformance, and financial and nonfinancial rewards for good performance. The main control variables tried to address both top-down monitoring and a limited component of citizen engagement, notably the presence of a community health committee within the local primary health center (PHC). Many other factors were not measured, however, due to a lack of resources and time. The findings indicated that monitoring is the most consistent predictor of improved provider effort, including adherence to clinical practice guidelines, the provision of rights-based and responsive care, and time spent with patients in clinical examinations. When considered independently, sanctions were either not associated with provider effort or were associated with worse provider effort, a finding that fits with existing studies in the management literature on the importance of positive work environments (Edmondson, 2003) and accords with what we saw at high-performing facilities in the case studies. The use of nonfinancial rewards to recognize good performance was not associated with provider effort, but this result likely obtained because the usage of this approach was too infrequent to be able to detect an effect.
Due to resource constraints, the study could not incorporate much attention to the impact of social context on the quality of service delivery. However, insights on the role of social ties in driving higher-quality health care derived from the case study of positive deviance facilities in Jordan have informed subsequent projects that my collaborators and I have undertaken. Thus, while we were not able to fully capitalize on the findings from deviant cases in the quantitative study in Jordan in order to build a new, more comprehensive model, we aim to do so in future research in relevant contexts.
10.3.2 Morocco
In May 2014, I traveled to Morocco to carry out a parallel case study of positive deviance primary health centers for the aforementioned World Bank report (Brixi et al., 2015). Initially, the strategy for case selection adopted a systematic approach by identifying positive outliers from data collected through a quantitative, nationally representative survey of the quality of primary health care in public centers carried out by the World Bank and the Ministry of Health (MOH). Given time constraints, it was not possible to visit all the centers I initially requested and, instead, I selected centers in several provinces with guidance from national and regional MOH officials. Although the sampling procedures were nonrandom, several systematic criteria guided the choice of facilities. First, half of the facilities participated in the Concours Qualité (CQ), a program introduced by the MOH to set up province-level competitions between public health facilities to be recognized for exceptional performance.Footnote 7 Second, they exhibited notable and sustained improvements in one or more key dimensions of the quality of health care provision in the last five years. Third, the centers were not located in major urban areas, where quality tends to be somewhat higher.
A consistent array of factors was observed at well-performing Moroccan public health centers, including larger patient loads; reasonable wait times despite high demand; good management and availability of stocks, consumables, and equipment; and detailed and regular maintenance of patient medical records, among other factors. Based on interviews with staff members at the facilities and MOH officials from multiple administrative jurisdictions, at least five factors seemed to be associated with improvements in the quality of care. First, the presence of dynamic, energetic, and visionary leadership helped to motivate staff members to carry out their duties competently and thoroughly, introduce new procedures and management systems, institute a “culture of quality” among staff members, inspire confidence in the community and local government officials, and attract additional resources to the facility. Second, a sense of a shared mission and collaborative ethic helped to motivate the staff and ensure that all staff members know and fulfill their responsibilities. Like leadership, a team spirit and a relatively flat organizational culture are especially important in the context of resource scarcity, in which staff members are required to make do with less.
Third, meetings with staff members from facilities that had and had not previously competed in the national CQ program demonstrated that the initiative has had a clear, positive effect on the management and administration of participating centers. The mere act of enrolling in the program generates significant transfer of knowledge and the adoption of new procedures within participating health centers. At the same time, it became clear that the program may be unsustainable because it requires a major investment on the part of staff members, taking them away from their primary professional duties, and has the paradoxical effect of increasing pressures on the successful facilities by boosting their patient load.
Fourth, effective coordination between the head doctor and local health officials was critical for the health centers to meet the needs of the populations in their catchment areas. Regular exchanges between the administrators of facilities and officials from the provincial delegation help to ensure that stock-outs of medications and equipment do not occur, that facilities receive resources when available, that local solutions are developed for local problems, and that good administrative and management practices are disseminated. Finally, partnerships with nearby groups, organizations, and prominent individuals help health centers to meet the needs of their surrounding communities more effectively. These findings were elaborated in a case study of positive deviance in the Moroccan primary health sector (Belkâab and Cammett, 2014). It is important to note that MOH officials explicitly requested that we refrain from naming specific health centers that performed exceptionally well in order to avoid resentment among staff members at other facilities not featured in the case study.
10.3.3 The Contributions of Positive Deviance Case Studies in Jordan and Morocco for Mixed Methods Research Designs
Case studies of high-performing health centers in two countries in the Middle East and North Africa suggest that, even in different sociopolitical contexts, some common factors at the facility and community levels affect the quality of service delivery. For example, leadership and management practices were important proximate determinants of quality in both contexts. This finding helped to inform a subsequent quantitative study in Jordan centered on the factors within facilities that incentivize greater provider effort.
Perhaps more interesting are the findings that facility health committees composed of elites, who hold either formal or informal positions of influence at the local level, and the extent and nature of social ties may have important causal effects on the quality of social services. From the perspective of the large body of social science research on clientelism and elite capture (Bardhan and Mookherjee, 2012; Dasgupta and Beard, 2007; Khemani, 2015; Platteau, 2004), the potentially constructive role of elites in driving improved process and outcome-based measures of health care quality may be surprising and deserves greater investigation. These findings raise questions about the conditions under which elites are likely to exert efforts to improve the quality of services and, more generally, to play a positive role in promoting inclusive access to services for local communities and to encourage citizens to adopt more health-seeking behavior. They also call for additional research into the nature of social ties and how they may mediate the effects of different types of interventions on the quality of social services. In both Jordan and Morocco, at least some local elites appear to have leveraged their influence to improve the quality of care delivered and to ensure that nonelites take greater responsibility for their own well-being.
10.4 Conclusion: Can Positive Deviance Inform Development Policy?
In this chapter, I have argued that positive deviant cases have the potential to contribute to the formulation and execution of development policy in at least two overarching ways. First, building on insights from the methodological literature on case selection in the social sciences, I reiterate the value of deviant cases – whether positive or negative – for identifying previously overlooked factors that may help to explain a phenomenon and distinct causal pathways that account for causal heterogeneity. At the same time, it is critical to add the caution that potential causal factors identified through small-N research based on selection on the dependent variable must be assessed more broadly before informing development policy and practice. In particular, follow-up studies, whether qualitative or quantitative, that select on the hypothesized explanatory variable would be a valuable complementary component of a larger research strategy.
Second, deviant cases, and especially positive deviant cases, can serve an important inspirational function for multiple audiences, whether development practitioners, policy-makers, or organizational staff and citizens who experience and shape development policy on a quotidian basis. By signaling that beneficial reforms or outcomes are possible, even in a resource-constrained environment, positive deviant cases can provide hope, empowering others to pursue similar approaches or outcomes. At the same time, the depiction and dissemination of lessons from positive deviant cases must proceed with caution in order to avoid inducing resentment on the part of similar actors or organizations that were not singled out for commendation.
Ultimately, the potential value of the study of positive deviant cases for development policy is contingent on a number of factors, only some of which are in the control of the researchers. Given data availability and consensus over the indicators that enable the classification of high performance, researchers can and should follow systematic principles of case selection by identifying and choosing positive outliers in carefully specified models that build on the state of knowledge on the outcome of interest.
In practice, however, many real-world factors, which are largely out of the control of researchers, may impede adherence to best practices in case selection. First and foremost, officials or other gatekeepers must be willing to share data or facilitate the collection of data that enables the identification of positive deviance. Once the cases have been identified and the lessons extracted from their experiences, other conditions must be in place to ensure that these examples spur reforms or constructive outcomes. In particular, other actors or organizations that have not distinguished themselves must be ready to experiment with new approaches or systems. If they are resistant to adopting lessons generated from more successful peer institutions, then the dissemination of best practices based on positive deviance will be hindered. Furthermore, government officials and other stakeholders must exhibit the political will to promote the dissemination of such practices.
The very fact that the lessons of positive deviance are derived from outlier cases raises the question of whether their experiences are idiosyncratic or can be generalized. For example, case studies of high-performing health centers in Jordan and Morocco underscored the value of strong and effective leadership at the facility level for high-quality service provision. If leaders are born and not made, or at least not easily cultivated, then this finding is less useful for generating policy prescriptions. In this case, the best that development practitioners and government officials can hope for is to recruit and retain employees with demonstrated and appropriate leadership skills. In most cases, however, policy lessons will not be so specific to individual personality traits or other idiosyncratic factors. Furthermore, factors that seem difficult to replicate may be less idiosyncratic than they appear; as studies of leadership attest (Nohria and Kurana, 2011), even this quality can be fostered.
To the extent that context matters, the same practices often do not have the same effects in different places, potentially limiting the generalizability of the lessons of positive deviance for development policy. Fortunately, the case study method itself may compensate for this potential drawback. Because case studies enable deep attention to context, it is possible to identify the factors that facilitated success and to adapt the lessons to other contexts or to identify similar contexts where the lessons may apply more readily. Attention to context also avoids a cookie-cutter or “best practices” approach to development policy by pointing to the ways in which local factors may moderate or completely alter the effects of a particular policy prescription and must be harnessed to design appropriate policy interventions.
11.1 Introduction
Analytic narratives (Bates et al., 1998, 2000; Levi, 2002, 2004) involve selecting a problem or puzzle, then building a model to explicate the logic of an explanation for the puzzle or problem, often in the context of a unique case. The method involves three steps. First, narrative is used to elucidate the principal players, their preferences, the key decision points and possible choices, and the rules of the game, all in a textured and sequenced account. Second, a model is built of the sequence of interaction, including predicted outcomes. Third, the model is evaluated through comparative statics and the testable implications it generates. The analytic narrative approach is most useful to scholars who seek to evaluate the strength of parsimonious causal mechanisms in the context of a specific and often unique case. The requirement of explicit formal theorizing (or at least theory that could be formalized) compels scholars to make causal statements and to identify a small number of variables as central to understanding the case.
Case studies abound in the study of development. A weakness of case studies per se is that there typically exist multiple ways to interpret a given case. How are we to know which interpretation makes most sense? What gives us confidence in the particular interpretation offered? This problem is particularly difficult where the uniqueness of the situation precludes the collection of a data set that encompasses multiple cases. Many scholars augment their case study with a model. The model adds some discipline to the account. For example, observed choices must be consistent with the assumption about preferences in the model. A model is necessary, but not sufficient: that is, a model alone does not an analytic narrative make.
The analytic narrative approach provides a means to help get around these questions. The essence of many cases is unique, including the French Revolution, the American Civil War, or the surprising and quick development of Spain following the death of long-time dictator, Francisco Franco, in 1975. Analytic narratives deal with these cases by building a model that has multiple implications, and then testing an implication of the model that provides the possibility for both confidence in the claims and comparison across cases.
For example, Weingast’s (1998) case study of the American Civil War builds on a unique feature of American institutions to explain long-term political stability – namely, the “balance rule”: the idea that both Northern free states and Southern slave states would be admitted in pairs, giving each set of states a veto over national policy. This institution fell apart in the 1850s. Weingast tests his account of this failure by using game theory to reveal a path not taken, given the interest calculations of those making choices about what path to take. This enabled him to estimate a counterfactual involving what would have happened had a contingency in the case study not occurred. Thus, the main thesis of the case – the balance rule – is unique and cannot be tested directly, but other implications of the approach can be tested so as to give confidence in the overall account.
A second example addresses the effect on public goods provision of a 2004 decentralization reform in postconflict Sierra Leone (Clayton et al. 2015). The specificities of most laws are unique; even when the words are replicated, implementation varies across and within countries. In this instance, the narrative reveals the key stakeholders at the local level: elected councilors and paramount chiefs. The interests of the former should lead them to prefer successful implementation, ceteris paribus; but in some localities they conceded to the paramount chiefs, who preferred the status quo. Given other reforms that increased the power of the councilors and reduced that of the chiefs, the question becomes why the councilors deferred. What are the relevant comparative statics? This question produced a series of testable implications. The case, although unique, sheds light on the more general problem of the variation in the impact of decentralization on the delivery of health and education services.
This chapter proceeds as follows. In Section 11.2, we discuss criteria for case selection. Section 11.3 discusses identifying processes and mechanisms, while Section 11.4 discusses the limitations of the approach. In Section 11.5, we briefly discuss the implications of the approach for the development context. Our take-aways follow.
11.2 Criteria for Case Selection
The analytic narrative approach combines a commitment to rational choice, a deep interest in a particular case, a method for devising a generalizable model of the case, and a means of providing empirical evidence, even in unique cases.
The combination also entails an aim most area specialists lack: to go beyond detailing the case to elaborate more general conditions for the problem or puzzle. This exercise requires criteria for selection of cases other than their intellectual appeal as puzzles demanding solutions. Standard approaches to case selection emphasize the bases for choice among a sample of cases which are informative about the causal chain of interest, because of the absence, presence, or extreme values of key variables. One traditional method advocates pairs of cases that are either “most similar,” hopefully allowing the analyst to identify similar mechanisms in the two cases, or “most different,” hopefully allowing the analyst the ability to isolate a mechanism that accounts for the differences. These traditional methods fail when more than one causal variable is relevant.
Bearing similarities to the analytic narratives approach is process tracing (George and Bennett 2005; Collier 2011; Bennett, Chapter 4, this volume), which shares an emphasis on both sequencing and fine-grained description as means for making causal inferences. Process tracing also shares a concern with generating testable implications, but its emphasis is on key variables rather than the key actors, their interaction, and their strategies. This makes game-theoretic analysis largely irrelevant to process tracing.
Analytic narratives include features that make the cases amenable to modeling, which not all puzzles or problems are. Essential to the model building is the choice of cases in which the key actors interact strategically. That is, the choices of one actor depend on the choices of the other. In addition, analytic narratives consider situations that can be modeled as an extensive-form game, which generates a subgame perfect equilibrium.
Another necessary feature of an analytic narrative is the opportunity to get at an important process or mechanism not easily accessible through other means. For example, the extensive-form game allows the analysis to demonstrate the existence of a self-enforcing institution that often solves an important economic or political problem through creating a credible commitment. The advantage of the game is that it reveals the logic of why, in equilibrium, it is in the interest of the players to fulfill their threats or promises against those who leave the equilibrium path.
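The logic of a credible commitment in an extensive-form game can be sketched in a few lines of code. The block below is a minimal illustration, not a model taken from any of the cases discussed here: the two-player investment game, the payoff numbers, and the names `solve`, `no_commitment`, and `with_commitment` are all hypothetical. It applies backward induction to find the subgame perfect outcome, first without and then with an institution that penalizes confiscation.

```python
# Minimal backward-induction sketch for a finite two-player extensive-form game.
# The game tree and all payoffs are hypothetical, chosen only for illustration.

def solve(node):
    """Return (payoffs, path) for the subgame perfect outcome starting at node."""
    if "payoffs" in node:                  # terminal node: no choice remains
        return node["payoffs"], []
    mover = node["player"]                 # index (0 or 1) of the player moving here
    best = None
    for action, child in node["moves"].items():
        payoffs, path = solve(child)       # solve each subgame before comparing
        if best is None or payoffs[mover] > best[0][mover]:
            best = (payoffs, [action] + path)
    return best

# Player 0 (a citizen) decides whether to invest; player 1 (a ruler) then
# decides whether to respect or confiscate the return. Payoffs: (citizen, ruler).
no_commitment = {
    "player": 0,
    "moves": {
        "invest": {
            "player": 1,
            "moves": {
                "respect": {"payoffs": (2, 2)},
                "confiscate": {"payoffs": (-1, 3)},
            },
        },
        "abstain": {"payoffs": (0, 0)},
    },
}

# Same game, except that an assumed institution imposes a cost of 2 on a
# ruler who confiscates, lowering that payoff from 3 to 1.
with_commitment = {
    "player": 0,
    "moves": {
        "invest": {
            "player": 1,
            "moves": {
                "respect": {"payoffs": (2, 2)},
                "confiscate": {"payoffs": (-1, 1)},
            },
        },
        "abstain": {"payoffs": (0, 0)},
    },
}

print(solve(no_commitment))    # ((0, 0), ['abstain'])
print(solve(with_commitment))  # ((2, 2), ['invest', 'respect'])
```

In the first tree the ruler would confiscate if given the chance, so the citizen abstains; in the second, the penalty makes respecting the investment the ruler's best reply at that node, and the promise becomes self-enforcing. The ruler's choice after an investment is what the equilibrium analysis turns on, even when that node is never reached in play.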
The formalization itself is not a requirement of a successful analytic narrative; indeed, in some cases, there are too many actors and no benefit from reducing the multiple players to the small number required for a game-theoretical model. Levi’s case on conscription (Bates et al. 1998) illustrates how one can still use the logic of extensive-form games to assess the strategies and actions – and paths not taken – without formalization. Another example is Ferrara’s (2003) analytic narrative of the Burmese uprising in 1988 as a means to understand both a particular historical event and the more general question of the relationship between coercion and protest.
The final expectation of an analytic narrative is that the causal mechanisms and the structures or relationships must be generalizable to other cases under specifiable conditions. We deal with this issue below.
11.3 Identifying Sequence and Mechanisms
Analytics, in this approach, refer to the building of models derived from rational choice, particularly the theory of extensive-form games. The steps toward building the model include:
First, extracting from the narratives the key actors, their goals, the sequence of options available to an actor at a given moment, and the effective rules that influence actors’ behaviors.
Second, elaborating the strategic interactions that produce an equilibrium that constrains some actions and facilitates others. By making clear and explicit the assumptions about who the key actors and their preferences are, it is possible to challenge the assumptions to produce new insights and competitive interpretations of the data.
Third, the equilibrium analysis leads to comparative static predictions that produce testable implications even if they’re not the main assertion of the case.
We emphasize this third criterion. An important advantage of relying on game theory is that this method often produces comparative statics – that is, predictions about how the equilibrium shifts in response to changes in the exogenous variables. This approach allows the analyst to identify the reasons for the shift from one equilibrium to another. It therefore produces expectations of behaviors in the form of testable implications if the key actors are staying on the equilibrium path and if they are not. A case study that includes a model may involve the first two criteria, but generally not the third. This is especially true for unique cases where it is hard to test the model directly. Both authors have written many case studies of this type. These cases may provide insights, but they are not analytic narratives (see, e.g., Levi 1988; Weingast 2004). Analytic narratives require testable implications derived from the comparative statics that the narrative helps reveal.
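A comparative static can be made concrete with a small numerical sketch. The stylized game below is an assumption made for illustration only: an opponent attacks when the expected gain is positive, the probability of winning is assumed to fall with the size of the home army as 1 / (1 + army), and the names `equilibrium_action` and `deterrence_threshold`, along with the parameter values, are hypothetical.

```python
# Comparative statics in a hypothetical one-shot deterrence game.
# Functional forms and parameter values are illustrative assumptions.

def equilibrium_action(army, prize=5.0, war_cost=2.0):
    """Opponent's equilibrium action given the exogenous size of the home army."""
    win_prob = 1.0 / (1.0 + army)               # assumed: larger army, lower odds
    attack_value = win_prob * prize - war_cost  # payoff from not attacking is 0
    return "attack" if attack_value > 0 else "no attack"

def deterrence_threshold(prize, war_cost):
    """Army size above which attacking no longer pays: prize / war_cost - 1."""
    return prize / war_cost - 1.0

# Sweep the exogenous variable and watch the equilibrium shift.
for army in [0, 1, 2, 3]:
    print(army, equilibrium_action(army))

# A second comparative static: a more valuable prize raises the threshold.
print(deterrence_threshold(5.0, 2.0))   # 1.5
print(deterrence_threshold(10.0, 2.0))  # 4.0
```

Under these assumptions the equilibrium moves from attack to no attack once the army exceeds 1.5, and doubling the prize raises that threshold to 4.0. Each such shift is a testable implication of the kind the third criterion requires, even when the main claim of the case cannot be tested directly.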
The narrative of analytic narratives establishes the principal players, their goals, and their preferences while also illuminating the effective rules of the game, constraints, and incentives. Narrative is the story being told but as a detailed and textured account of context and process, with concern for sequence, temporality, and key events. By meeting these criteria, the narrative offers a means to arbitrate among possible explanations for observational equivalences – that is, two distinct processes that lead to the same outcome.
Comparative statics are crucial for comparative research because they generate hypotheses of what could have taken place under different conditions. Comparative statics therefore clarify the relationship between the key endogenous and exogenous variables. Moreover, the consideration of “off-the-equilibrium-path” behavior typically reveals reasons and reasoning for why actors took one path and not another. Indeed, what actors believe will happen were they to make a different choice typically influences the choices they do make. As Niall Ferguson (1999) observed in his study of the causes of World War I, to understand why Britain entered what would otherwise have been a continental war, we need to know what the British believed would happen had they not entered the war. Another important aspect of the game-theoretic approach is that the off-the-path behavior of an equilibrium disciplines each player’s beliefs, for they must be consistent with all the other players’ strategies (see Weingast 1996).
For example, consider the illustrative “deterrence game” in the appendix of Analytic narratives (Bates et al. 1998). Two countries interact: the home country and an opponent. The home country maintains a large, expensive army; the opponent does not attack. Is the large army the reason for peace as it deters the opponent from attacking? Or is it a waste of resources because the opponent has no interest in attacking? These two hypotheses are observationally equivalent: both offer explanations for why the opponent does not attack, yet they differ dramatically as to the reason for the observation. Different people have different beliefs that can only be understood contextually:
[T]he observationally equivalent interpretations rest on markedly different theories of behavior. To settle upon an explanation, we must move outside the game and investigate empirical materials. We must determine how the opponent’s beliefs shape their behavior. This blend of strategic reasoning and empirical investigation helps to define the method of analytic narratives.
This approach provides the researcher with some discipline. As the deterrence game illustrates, absent a game and an equilibrium structure, it is possible to posit a wide range of beliefs that motivate action. How do we choose among these different accounts? In the context of a game, beliefs about another player’s actions are part of the equilibrium. Not just any sets of beliefs will work. In the deterrence game, the opponent must have a belief about how the home country will react to an attack; and, in equilibrium, this belief must hold in practice.
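The observational equivalence at issue can be illustrated with a stylized calculation. The sketch below is not the game in the Bates et al. appendix; it is a hypothetical simplification in which the opponent attacks only when the expected gain is positive, and the function `opponent_action`, the two opponent types, and all numbers are assumptions.

```python
# Observational equivalence in a hypothetical deterrence setting.
# Two opponent types behave identically when the home army is large and
# differ only in the counterfactual where the army is absent.

def opponent_action(prize, army, war_cost=2.0):
    """Opponent attacks only if the expected gain from war is positive."""
    win_prob = 1.0 / (1.0 + army)   # assumed: larger army, lower odds of winning
    return "attack" if win_prob * prize - war_cost > 0 else "no attack"

# How much each assumed type values the prize it could win by attacking.
opponent_types = {"deterred aggressor": 5.0, "uninterested neighbor": 0.0}

results = {}
for label, prize in opponent_types.items():
    observed = opponent_action(prize, army=4)        # what the record shows
    counterfactual = opponent_action(prize, army=0)  # the path not taken
    results[label] = (observed, counterfactual)
    print(label, "| observed:", observed, "| without army:", counterfactual)
```

Both types produce the same observed record of peace, so the record alone cannot distinguish them. Only the counterfactual column differs, which is why the narrative evidence about the opponent's beliefs and interests is needed to choose between the two interpretations.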
This form of explicit theory provides criteria to enable the researcher to distill the narrative and ensure that the explanation need not rely too much on factors outside the model.
11.4 Overcoming the Limits of Analytic Narratives
The analytic narrative approach, at least in its original formulation, had several potential limitations, some recognized by the authors and others revealed by various critiques.
11.4.1 Generalizing
The Achilles’ heel of analytic narratives – as with any approach to case studies – is in the capacity to generalize, given that each narrative represents an effort to account for a particular puzzle in a particular place and time with a model and theory tailored to that situation. Even so, it is possible to use the cases to make some more general points.
Although the approach is not straightforwardly deductive, it nonetheless relies on rational choice, which is a general theory of how structures shape individual choices and, consequently, collective outcomes. Rational choice, particularly in its game-theoretic form, highlights certain properties of the structure and strategic choices that arise. Although the specific game may not be portable, it may yield explanations that can be tested in the form of collective action problems, principal–agent issues, credible commitments, veto points, and the like. Analytic narratives provide a way to suggest the characteristics of situations to which these apply and in what ways. For example, the models of federalism, as initially developed by William Riker (1964) and further developed by Weingast and his collaborators (Weingast 1995; Montinola, Qian, and Weingast 1995), are useful in explicating a large number of problems in a wide range of countries, including the case Weingast (1998) addresses in his Analytic narratives chapter.
Moreover, the analytic narrative approach also demands identification of causal mechanisms. A wide range of mechanisms, such as emotions, resentment, and other aspects of behavioral economics, can offer a fine-grained explanation of the link between actions and alternatives (Elster 1998, 1999). Others have fruitfully made these links in such situations as insurgency in El Salvador (Wood 2001) and violence in Eastern Europe and the Balkans (Petersen 2002, 2011) while meeting the requirement that they “generate new predictions at the aggregate or structural level” (Stinchcombe 1991: 385).
11.4.2 Surprise, Contingency, and Conjunction
Daniel Carpenter’s (2000) critique of Analytic narratives raised several issues that the approach needed to confront to fulfill its promise. Carpenter worried that we narrowed the conceptualization of narrative in a way that was likely to neglect the surprises history offers, the contingencies that affect outcomes, and the conjunctures that make parsimony so difficult. The first and last are easiest to address since nothing about the method precludes either. The approach actually makes it possible to take surprises into account since they often take the form of events that would change comparative static outcomes. De Figueiredo, Rakove, and Weingast (2006) illustrate one means by which game-theoretic models can be generalized to encompass surprises. Those American colonists already suspicious of Britain were apt to believe the worst interpretation of any British act and to believe that large-scale rebellion was inevitable. The result was a self-confirming equilibrium to explain the surprise element in the eruption of the American Revolution. Nothing is foreordained by an analytic narrative, which, on the contrary, often reveals factors as significant that we might not otherwise have noted. For example, in Gretchen Helmke’s (2005) analysis of courts in autocratic regimes, her counterintuitive finding is that a nonindependent judiciary has the power, under certain circumstances, to rule against its government.
Carpenter also raises conjunctural analysis: the idea that multiple, interlacing factors occur at once – say, a war and a depression – so that the causal factors are difficult to disentangle. Carpenter’s concern (2000: 657–658) is that “[i]f one changes the values of two variables at once, or renders the values of one variable dependent on those of another – precisely as historians who rely on conjunctures tend to do – then the embedded independence assumption comes, well, with high costs.” Skocpol (2000) and Katznelson and Milner (2002) share Carpenter’s concern.
Conjunctures are a problem for every form of analysis, not just analytic narratives. Moreover, by relying on game theory, analytic narratives may be uniquely suited to addressing conjunctures. By providing a specific model of events, a game-theoretic model helps disentangle conjunctures by potentially making predictions about what would have happened had only one of the conjoining events occurred instead.
Carpenter claims that contingency disappears from the analytic narrative approach because, as he perceives it, there is less likelihood of multiple equilibria – that is, alternative stable states of the world. He goes on to say that “[i]t would have been theoretically appealing for the authors to give examples where history in some way ‘selects’ some equilibria and makes others impossible (kind of like a trembling hand, or stability, or coalition-proofness criterion)” (Carpenter 2000: 657).
But this criticism reflects a misreading of the analytic narrative approach. The use of game theory means that in many instances multiple equilibria will arise. Hence, the existence of multiple equilibria is part of the analytic narrative approach even if the case studies in the original volume do not make that evident. Contingency in the form of multiple equilibria is therefore a feature of the approach.
Even when there are clear focal points and strategies, factors in the situation can change unexpectedly. Some contextual changes may have clear and significant consequences, others have butterfly effects, and still others have little or no effect. The narrative is crucial here for sorting out what matters for what. In Rosenthal’s Analytic narratives chapter, the potential birth of a Catholic heir to James II affects the calculations of both monarch and elites, but its importance lies in how it changes the strategies of the elites even unto the point of revolution (Rosenthal 1998: 92). Why elites resorted to revolution rather than peaceful institutional change becomes apparent through the narrative and the associated model.
Uncertainty and lack of information are prevalent features of the unraveling of events in history, and they are major bases of contingency. Ahlquist and Levi’s work on leadership illustrates the effect of uncertainty (Ahlquist and Levi 2011, 2013). They find that followers, members, and citizens are very concerned to have competent representation; followers, members, and citizens therefore do their best to figure out who will be a good leader based on the track record of potential candidates. Nonetheless, unknowns remain, often in the form of other variables that are uncertain. For example, no one can know for sure how opponents will react to a given leader, what the economy will do, or how leaders will respond under circumstances distinctive from those in which they were selected. This uncertainty has direct consequences for other facets of the organization, such as its governance arrangements and mechanisms of accountability. If members knew and understood all the implications of their original choice, they might make a different one – if they could. Yet, uncertainty instead leads them to coordinate around a specific leader and leadership style, and they may well continue to maintain that person in office for years.
Analytic narratives must include problems of randomness or contingency, but not if they are too extreme. The example of unions makes the point. Members address their leadership problem in the face of uncertainty about the occurrence of strikes and only partial information about the reaction of employers to their demands. Because the interactions between unions and managers involve unpredictable elements, and because leaders cannot always deliver what they promise, leadership turnover may result. However, as Ahlquist and Levi (2013) show, this turnover is not only relatively rare but also highly delimited by the organizational culture and governance arrangements that ensure new leaders will share many of the characteristics of their predecessors.
The analytic narrative approach rests on cases where there is some, but hardly complete, contingency in the path of history, cases that the model helps in understanding what was likely to happen. Nothing about the approach, however, limits it to cases of determinateness or low contingency. Extensive-form games have long proved useful in studying settings of high uncertainty and contingency.
11.5 Analytic Narratives for Use in Development Policy and Practice
In this section we turn, briefly, to suggest the implications of the analytic narrative approach to problems of development.
Economists have long proposed an economic role for political institutions, such as the market infrastructure embodied in the provision of secure property rights, enforcement of contracts, and, generally, the provision of justice and the rule of law (Weingast 1995). Governments that use violence against minorities and opponents, confiscate citizens’ wealth, and create economic privileges (such as dispensing monopoly rights) fail to provide adequate market infrastructure. As Adam Smith recognized more than two centuries ago, the risk of violence and of plunder leads men to avoid hard work, initiative, and investment. In discussing settings in which “the occupiers of land in the country were exposed to every sort of violence,” Smith argued that “men in this defenceless state naturally content themselves with their necessary subsistence; because to acquire more might only tempt the injustice of their oppressors” (Smith 1776: III.iii.12:405). Further, a “person who can acquire no property, can have no other interest but to eat as much, and to labour as little as possible” (Smith 1776: III.ii.9:387–88).
But, if secure property rights, enforcement of contracts, and the provision of justice are necessary for economic development, how are such institutions built and, especially, sustained?
North and Weingast (1989) developed the hypothesis of credible commitments to answer this question. Governments seeking to implement the economists’ prescriptions for political institutions had to commit to honoring rights of citizens and to use agreed upon political procedures to make political decisions. They developed their hypothesis in the context of a unique case: the English Glorious Revolution of 1688–1689. Although this revolution and its institutional consequences were unique to that case, North and Weingast provided some important evidence favoring their larger, general argument about credible commitments. Focusing on public finance, they showed that the ability of the English government to borrow money changed dramatically. Government debt had never been much above 5 percent of estimated GDP in the seventeenth century. But in the eight years following the Glorious Revolution, it rose by nearly an order of magnitude, to 40 percent of estimated GDP. Because debt repayment depends critically on credible commitments, the massive increase in debt in a short time suggests that a new mechanism for making credible commitments had emerged.
Sure enough, subsequent studies have identified some of the devices used to create credible commitments and have lent support to the hypothesis. First, Cox (2012) has shown that a number of other variables also increased dramatically, consistent with the credible commitment hypothesis. Specifically, per the North and Weingast narrative, Cox demonstrates that parliament gained control over taxation and the issuance of public debt. Similarly, the ministerial responsibility system emerged: while parliament faced difficulties in holding the king accountable for public decisions, it could hold the king’s ministers accountable, forcing them to honor parliament’s interests. Second, scholars have undertaken a range of studies of public debt at similar events. For example, Summerhill (2015) has shown that nineteenth-century imperial Brazil provided the institutions for credible commitment to public debt, yet it failed to provide the institutional foundations for private financial markets and hence this fundamental basis for economic development. Mo and Weingast (2013: ch. 4) reveal the means by which the South Korean regime under President Park Chung-hee provided credibility to its promises to honor property rights and a range of other programs, such as education, underpinning that country’s economic development.
As a second illustration, consider political stability, another element widely agreed as important for economic development. Coups, civil wars, ethnic conflict, and other forms of disorder cripple a country’s ability to develop. Cox, North, and Weingast (2019) show that disorder in the form of violent takeover of regimes occurs surprisingly often in the developing world: the median regime of the poorest half of countries lasts only seven years. Just how do a minority of countries provide for political stability?
Mittal and Weingast (2012) provide three conditions for political stability, one of which they call the “limit condition”: the idea that all successful constitutions reduce the stakes of power, for example, by providing incentives for political officials to honor a range of citizen rights. Limited government does not imply small government (as modern political debate suggests), but a government that can honor restrictions on its behavior, such as abiding by election results, refraining from the use of violence to repress enemies, and, generally, honoring citizen rights.
The logic of the limit condition is that high stakes make it much more likely that people who feel threatened by the government will support coups. For example, landowners in Chile under the presidency of Salvador Allende supported the military coup in 1973 to protect themselves. Similar events led to disorder in Spain (1936–1939) and Kenya (2007–2008). One way that constitutions reduce the stakes of politics is through various forms of countermajoritarian institutions.
Mittal and Weingast develop their hypothesis in the context of the American case, where the institutional features of the US Constitution are unique. Subsequent work has revealed similar features in a range of cases of stable constitutions. Countermajoritarian provisions serve two valuable roles in preserving political stability. First, they often aid in the instantiation of democracy. When groups see themselves as potentially worse off under democracy, they are likely to resist democratization. The reality is that powerful – and sometimes inimical – groups often have the power to hold up democratization, such as slaveholders in the early American Republic, Whites in South Africa in the 1990s, the supporters of the authoritarian regime of Francisco Franco in Spain following the death of the dictator in 1975, the military dictatorship in Chile in the late 1980s and early 1990s, and the communist regime in Poland in 1989. In each of these cases, countermajoritarian provisions to protect powerful groups aided democratization and, eventually, the lightening of the relevant constraints. Similarly, electoral laws often bias elections in favor of constituencies that favor the previous regime (Chile and Spain). These provisions also become part of the limit condition once democracy has been initiated.
Others have also used analytic narratives to discuss the difficulties of achieving transitions to democracy. Ferrara (2003) suggests conditions under which widespread protest and uprising have little effect, given the strategic use of coercion. His case is Burma, but the implications are more general. On the other hand, Nalepa (2010) considers the conditions under which pacts and negotiated settlements among elites facilitate the transition from autocracy to more open access regimes. She finds that the transition will prove unstable (if it even takes place) unless a specific type of limit condition holds: namely, that key players receive credible commitments that the “skeletons in their closets” will not be revealed or that they will receive amnesty for politically problematic behavior during the old regime. Her analytic narrative focuses on Eastern Europe but is applicable to a wide range of cases where transitional justice is at issue.
Analytic narrative approaches are also useful in understanding why some reforms succeed while others do not in countries experiencing development. Methodologically, this demands explicit recognition of the comparative statics, on the one hand, and the off-the-path-behavior, on the other. These features distinguish analytic narratives from other case studies, enabling them to reveal processes and causal mechanisms that might otherwise go undetected and to provide the bases for generalizations that might otherwise not be possible. Some authors already self-consciously attempt analytic narratives (e.g., Hosman 2009 on Nigeria’s failed oil policies), but many do not. Even so, we could get additional leverage on reform by transforming existing case studies into analytic narratives.
Among the many accounts of reform, we have selected two categories of cases where analytic narratives clearly increase explanatory power for the particular instance as well as making the findings transportable to other situations.
The first set is where the same laws have different effects in different places. The study of the Sierra Leone decentralization reforms (Clayton, Noveck, and Levi 2015) displays not only varying impact but also reasons for that variation. Sierra Leone has a long history of tension among elites at different levels of government and a more recent history of tensions among key local elites. This case explores the consequences of the latter for effective public service delivery once decentralization is introduced.
A rich literature (cited in the case study) reveals contradictory expectations of the effects of local interelite dynamics. The narrative reveals considerable county-level variations in power sharing between the traditional power-holders, the paramount chiefs, and the newer power-holders, the elected councilors. This variation provides an opportunity to derive expectations specific to the case and then assess their plausibility. Indeed, from the narrative the authors hypothesized that competitive relationships between the two improve services while collusion reduces their quality.
The first challenge was to offer a measure of elite dynamics to be used in a statistical investigation of the implications of differences. Testing the impact of this relationship required, first, a measure of the nature of the interaction between chiefs and councilors. The probability of collusion was operationalized using data on the following direct relationships: the median number of times councilors report having contacted a chief in the previous month, the percentage of councilors who report having had a dispute with a chief during the past month, and the percentage of councilors who report that they are related to a Paramount Chief either through blood or marriage.
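As a rough illustration of how three indicators of this kind might be tabulated from councilor-level survey responses, consider the following sketch. The records, field names, and council labels are hypothetical and are not drawn from the study’s data; the sketch covers only the descriptive step of building the indicators, not the authors’ statistical analysis.

```python
from statistics import median

# Hypothetical councilor survey records; every value here is invented
# purely for illustration and does not come from the Sierra Leone study.
responses = [
    {"council": "A", "chief_contacts": 4, "dispute_with_chief": False, "related_to_chief": True},
    {"council": "A", "chief_contacts": 6, "dispute_with_chief": False, "related_to_chief": True},
    {"council": "A", "chief_contacts": 2, "dispute_with_chief": True, "related_to_chief": False},
    {"council": "B", "chief_contacts": 0, "dispute_with_chief": True, "related_to_chief": False},
    {"council": "B", "chief_contacts": 1, "dispute_with_chief": True, "related_to_chief": False},
]


def council_indicators(records):
    """Return, per council, the three relationship indicators named in the text."""
    by_council = {}
    for record in records:
        by_council.setdefault(record["council"], []).append(record)

    indicators = {}
    for council, rows in by_council.items():
        n = len(rows)
        indicators[council] = {
            # Median number of reported contacts with a chief in the previous month.
            "median_contacts": median(r["chief_contacts"] for r in rows),
            # Percentage of councilors reporting a dispute with a chief.
            "pct_dispute": 100 * sum(r["dispute_with_chief"] for r in rows) / n,
            # Percentage of councilors reporting kinship with a Paramount Chief.
            "pct_related": 100 * sum(r["related_to_chief"] for r in rows) / n,
        }
    return indicators


indicators = council_indicators(responses)
for council, values in sorted(indicators.items()):
    print(council, values)
```

In this invented data, council A would look more collusion-prone (frequent contact, few disputes, many kin ties) than council B. How such indicators are weighted or combined into a single measure is a modeling choice that the sketch deliberately leaves open.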
But what accounts for the distinctiveness of power-sharing arrangements? The strategic interaction underlying the implementation of the reforms reveals two possible equilibria of collusion and competition. Digging deeper uncovers factors that possibly change the strategic interactions among the key actors and, thus, the comparative statics. The authors considered the gender, age, and party of the councilors as well as the degree of electoral competition. But the most telling explanatory factor was one that could only be known by knowing the case in depth: the proportion of councilors who were in the Civilian Defense Force (CDF) during the war. Former CDF councilors were likely to have forged deep ties with the paramount chiefs who were the primary sources of funding.
The next step was to determine the extent to which these different kinds of power-sharing arrangements – and the probable causes of them – actually influenced service delivery. Drawing out testable implications that could in fact be explored with the available survey material enabled the authors to provide additional confidence in the hypotheses they had derived. Further statistical tests indeed suggested that collusion, particularly that produced by the proportion of councilors who had been in the CDF, leads to far poorer service delivery than does competition.
Bangladesh, Honduras, China, and the United States all have similar labor laws on their books but differ markedly in the actual protection of workers and enforcement of those laws in the supply chains of global brands. Berliner et al. (2015a, 2015b) investigate the clusters of stakeholders and what transforms the relations of power among them. To do this, they consider the strategic interactions among key players and what transforms the current equilibrium or status quo. Using the logic of game theory but not formalizing it, they are aware that it is off the equilibrium path for workers to organize and make demands unless they are assured that they will not be punished for their actions by losing their jobs or being sent to jail. [Footnote 3] That only happens when brands find it in their interest to improve worker rights and benefits, and this only occurs if government is upholding its laws or the reputation of brands among consumers is being threatened.
Unfortunately, both of these circumstances are most likely to occur when there is an unexpected (if predictable) catastrophe such as a major fire or building collapse where workers’ lives are tragically lost. Reflecting comparative statics, such a shift leads to reform, but whether the commitments are credible depends on the creation of legal institutions that are hard to change and that incorporate sufficient administrative capacity to implement the rules. The testable implications may differ among the cases, but they are the organizing principles of the cases. The findings are not promising for labor rights. Honduras and Bangladesh lack the government capacity to maintain a positive labor rights regime over a long period of time. In China, the government has the capacity but not the will to establish meaningful labor rights, although it does ensure some protections. The United States, which once had both the will and capacity, now lacks the first and possibly the second and so has undergone a reversal. The result in all these instances is that the pressure on brands to discipline their supply chains is episodic or nonexistent.
The second set of cases documents instances where societal interests come to trump private interests, making it possible to actually implement policies that will serve the population as a whole. As we saw with the labor illustration, it is difficult enough to ensure the protection of the interests of a neglected group within the society. It is arguably harder to protect general interests, as the case of corruption in Indonesia (Kuris 2012a, 2012b) documents. And it is arguably harder still to implement policies where the interests of the world at large are at issue, as the case of deforestation in Brazil (Jackson 2014, 2015) details.
Of course, in each of these cases a range of stakeholders are the beneficiaries or losers from policy change. To transform past practice required some combination of leadership, interests, expert knowledge disseminated widely, trust relationships, monitoring, new forms of direct enforcement, credible commitments, and mobilizations that changed the incentives of both government officials and recalcitrant stakeholders. All of these features are documented in these cases, and documented well. Lacking is a structure to the accounts that makes it possible to observe the causal mechanisms and derive testable implications. The comparative statics are not sufficiently explicit. While the Sierra Leone decentralization and labor standards cases do not provide an actual formalization of the game, the presentation of the material makes it possible not only to derive but also to test implications that enhance confidence in the claims of the authors and make them generalizable to other cases.
11.6 Creating Take-Aways
Multiple interpretations are inherent in the traditional case study method. Moving beyond traditional approaches, analytic narratives provide two methods for establishing the generalizability of findings from case studies. First, the model in an analytic narrative often affords a range of explanations and predictions. Although the main account of a unique case may not be testable, the model may yield other predictions that can be tested, either in this case or in other cases. Second, as with other methods, out-of-sample tests constitute an important route to generalization. The presumption today in social science research is that the authors will provide those tests themselves. However, the level of knowledge of the out-of-sample case seldom rivals the detailed understanding of the original case that puzzled the author. The demonstration of generalizability must rest on a larger community of scholars who take the findings applicable to one place and time to illuminate a very different place and time. Each case then becomes a case among many that are grist for the mill of scholars, experts on particular countries and sectors, and policy-makers who must work collaboratively to sort out the lessons learned.
In this chapter, we have outlined the analytic narrative approach and, in Section 11.5, suggested the potential value of the approach for problems of development. Reflecting the interest of the authors employing the approach, the applications tend to focus on political issues, such as political stability and violence. The approach also applies to case studies of particular economic reforms, and we believe it will produce valuable results in this area.
In summary, the goal of analytic narratives is to provide several forms of discipline on the structure of case studies, such as a game, with emphasis on comparative statics and on off-the-path-behavior, and on predictions that can be tested on aspects of the case even if the main assertion about the case cannot.
12.1 Introduction
This chapter considers how different types of development-focused organizations have introduced case studies into their operations, and explores the lessons from these experiences for other development organizations interested in using case studies to enhance their own implementation effectiveness. [Footnote 1] At one level, of course, case studies will be used differently depending on the organizational context; as such, a case study’s potential can be fully exploited only if it aligns with an organization’s specific reality: its history, mission, mandate, and capability. Actually doing this, however, requires undertaking the complex task of integrating cases into idiosyncratic organizational structures, rules, regulations and processes, and aligning them with a corporate culture that, at least initially, may or may not be favorably disposed to ‘learning’ in this way. In the sections that follow, we provide a comparative analysis of how this task has been conducted in four different development organizations, focusing in particular on how they select, prepare, and utilize case studies for collective learning.
A concern from the outset, and one that some regard as a pervasive weakness of case studies, is how to prepare cases that are both faithful to the unique particularities of each intervention and yet potentially usable by practitioners working elsewhere, perhaps even in different sectors, regions, and scales of operation. Indeed, “But how generalizable is that?” is a common critique levelled against case studies as a research method, where the concern is that the case itself is neither randomly selected nor “representative” of a larger population, but rather “cherry picked” to support predetermined conclusions. As methodological and empirical issues, these concerns are addressed elsewhere in this volume. [Footnote 2] For present purposes, we consider case studies not as “qualitative evaluations” nor as small-scale “impact assessments” of projects, but focus instead on their roles as diagnostic and pedagogical instruments within (and between) development agencies. In this sense, we consider how case studies are prepared and read in ways akin to their use in medicine, law, and public policy – which is to say, as instances of broader phenomena, wherein professionals use their seasoned experience (and, where appropriate, scientific knowledge) to learn from specific instances of how, why, where, and for whom particular outcomes emerged over the course of a project’s or policy’s implementation. If formal impact evaluations are concerned with assessing the “effects of causes” (e.g., Did this rice subsidy, on average, benefit the poor? Did that text message invoking sacred precepts increase credit card repayments?), then in this instance case studies primarily seek to discern the “causes of effects” (How was this village able to solve its water disputes so much more effectively than others? Why did that program for improving child nutrition fare so much better with younger mothers than older ones? Where were the weakest and strongest links in the implementation chain of this immunization program? Why do some development organizations seemingly learn more effectively than others?). [Footnote 3] It is in responding to these latter concerns that case studies have a distinctive comparative advantage; in this sense they should be seen as a key complement to, not a substitute for, more familiar evaluation tools used to engage with and learn from development interventions.
In this spirit, our concern here is to work backwards from broader concerns about the conditions under which development organizations ‘learn’ (or seek to learn), with a view to considering the role that case studies play in this process. Our discussion proceeds as follows. Section 12.2 considers four broad factors that seem especially important for understanding how organizations (not just their individual staff members) learn – that is, modify and/or improve their procedures and products in the light of experience and evidence. Section 12.3 then considers how these four factors have been deployed in case studies as used by four different organizations engaged with development issues: the World Bank, Germany’s GIZ (Gesellschaft für Internationale Zusammenarbeit), the Brookings Institution, and China’s Ministry of Finance. Section 12.4 concludes by categorizing how these different organizations are using case studies to learn across four organizational levels.
12.2 Organizational Learning Within Development Organizations
How do development organizations learn? A reading of the literature suggests that four broad factors seem to be especially important for understanding whether and how such learning takes place: motivation, environment, knowledge type, and practical use. We explore each of these factors by responding to four related questions.
12.2.1 Do Development Agencies Have the Motivation to Learn?
What motivates organizations to learn and invest in learning, and why might case studies be a suitable tool for doing so? For private sector organizations operating in today’s globalized economy, the motivation is clear: they must ‘adapt or die’ – that is, they must continually change in response to their fast-moving environments or risk becoming irrelevant. Indeed, in business theory and practice, an organization’s capacity to learn, and to apply and communicate knowledge, is considered a key strategic capability and is thus fundamental to its ability to produce value through innovation, improved quality, and efficiency (Drucker 1994). Management specialist Peter Senge (1990) goes so far as to argue that the rate at which organizations learn may become the only sustainable source of competitive advantage; to capture this, he introduced the idea of a ‘learning organization’ – namely, an organization which actively cultivates certain characteristics to harness value from continuous learning.
For the most part, however, development organizations tend to be mission- or impact-driven rather than profit-driven. As such, they operate in a somewhat different environment and are influenced by different forces. These organizations may not ‘die’ if they do not adapt – the fate of large development agencies whose mandates derive from nation-states, for example, is ultimately determined by political criteria. As such, and because their very existence serves the purposes of different powerful groups, public and nonprofit development agencies are unlikely to decline, at least in the short term, no matter what their level of “performance” is deemed to be. However, if a key driver of learning in organizations is typically to improve performance (Fiol and Lyles 1985), this can be a source of motivation common to all development organizations – mission-driven as well as profit-driven. So understood, the performance of development agencies can be broadly defined in terms of their key functions (e.g., client services, advocacy, distribution of funding, direct service delivery).
Factors both external and internal to the organization can help generate a strong need for learning which acts as an important motivator for action within an organization. Such a need generates the motivation to go from contentment (passive) to curiosity (actively seeking knowledge). A perceived need is therefore the antecedent to new learning (Scott 2011). For development organizations in the current environment, there are many factors that may generate a learning ‘need’. External forces, including large global political agendas such as meeting the Sustainable Development Goals, may motivate a learning need as the organization considers how to respond; similarly, the emergence of influential new rival agencies, such as the New Development Bank, may create pressures where previously there were none. Internal factors may also generate a need: the desire to improve communication; to share lessons and build relationships; or to build a culture that is open to discussing challenges.
12.2.2 Is the Organization’s Environment Conducive to Learning?
Any learning initiative will take place in the wider context of the organization’s approach to learning and knowledge management. The capacity and openness to learn must be designed into the organization and, in turn, be reflected across its structures, functions, and processes. To do this, an organization, and especially its key managers, must first be open to “unlearning” established ways (Hedberg 1981); indeed, Inkpen and Crossan (1995: 596) argue that “a rigid set of managerial beliefs associated with an unwillingness to cast off or unlearn past practices can severely limit the effectiveness of organization learning” (see also Nonaka and Konno 1998). More positively, Zack (1999: 135) defines a firm’s knowledge strategy “as the overall approach an organization intends to take to align its knowledge resources and capabilities to the intellectual requirements of its strategy.” While knowledge may transfer in the normal course of activities, organizations often introduce processes and knowledge management systems that actively facilitate the key processes of knowledge creation, transfer, and retention (Argote, Beckman, and Epple 1990). Schein (1990) suggested that a group’s learning over time becomes encapsulated as the group’s culture: in other words, it is both internalized as a set of assumptions and externalized as group norms or values.
The use of case studies should therefore be considered in the context of the organization’s learning intent, strategy, and culture, and as one of a number of possible organizational learning tools or methods. The production of a case study involves not just a product but also a process which in itself can provoke learning at multiple levels of the organization. Key characteristics of such a process include:
Individual learning: Individuals have generated knowledge through their practices and they have learned how to overcome challenges. Organizations are motivated to capture the tacit knowledge held within individuals in the system and to share this knowledge. Case studies are one tool which can be used to approach this task.
Group learning: Group engagement with producing a case study. Case studies can be used to engage individuals within a group in reflecting together, capturing the group’s knowledge and generating shared insights.
Organizational learning: Retention of knowledge within the organization. The case study process is a way of attempting to codify and share knowledge. Members of the organization can then access this knowledge through the case studies, which can be used to initiate and inform discussion. Learning at the organizational level typically requires support from the organization’s authorities.
Interorganizational learning: Case studies are shared between organizations to foster the collective learning of a wider community of practice. Knowledge is transferred through a learning network by the development of shared processes/systems. Creating a network expands the reach of any particular initiative.
We will categorize this multilevel learning as IGOIL (individual, group, organizational and interorganizational learning), where different institutions may operate actively on one or more levels relevant to their learning strategy.
12.2.3 What Types of Knowledge are Captured by Case Studies?
Drawing on the early work of Polanyi (1966), Nonaka (1994) distinguishes between two types of knowledge: explicit knowledge, which is easily identified and codified; and tacit knowledge, which is what we know but cannot easily describe, and relates to both cognitive capability (‘know what’) and action (‘know how’). Explicit knowledge can be shared and integrated via reports, databases, and lectures, whereas sharing tacit knowledge occurs through dialogue and practice. One can acquire and convey explicit knowledge about a bicycle (its wheels, frame, etc.) through study, but one only acquires the tacit knowledge required to ride the bicycle by persistent practice (i.e., by falling over many times until one’s brain figures out how to stay upright).
There is a lot of technical knowledge within development organizations, and a corresponding familiarity with discussing and recording what was done in a given situation in an attempt to discern and capture ‘best practice’. The case studies discussed in this chapter intend instead to capture knowledge about the way that things are done: ‘the how’ of implementation rather than ‘the what’ of end results. This type of knowledge is often held within an individual (or team) who has implemented or supported implementation of a program. From the social constructionist perspective on learning, Cook and Brown (1999) suggest that this type of knowledge is acquired “as people wrestle with the intricacies of real world challenges and improvise a way to a solution” (Brown 2011: 6). From this perspective, learning depends on social interaction and collaboration: one person’s knowledge is co-dependent on the contributions of peers and must be negotiated with them. Knowledge about ‘the how’ is often tacit, context specific, and complex; factors relating to behavior, politics, and institutions influence the process. This is difficult to capture as the more we try to codify tacit knowledge the more it loses its context; perhaps it can only be recorded to a degree. Case studies attempt to capture some of this type of knowledge through alternative devices (such as via narrative form and personalization).
The cases discussed in this chapter are written with a specific focus on ‘delivery challenges’ (see Box 12.1); they describe situations where groups wrestle with and sometimes overcome delivery challenges. By sharing this type of knowledge, it is thought that others in the organization may gain inspiration for wrestling with their own real-world challenges. The organization’s culture will influence the openness of its members to capturing and discussing this type of knowledge – that is, knowledge relating to challenges and failures rather than just success stories.
Delivery challenges are the nontechnical problems that hinder development interventions and that prevent practitioners from translating technical solutions into results on the ground. They are intimately related to development challenges, how interventions are implemented, and organizational issues. Delivery challenges should be the answer to the following questions: Why did intervention X, aimed at solving the development challenge Y, not work or not achieve its full potential? What were the main obstacles that intervention X faced during its implementation?
12.2.4 How Do Development Organizations Enhance the Practical Use of Case Studies?
It is widely accepted that learning requires changes in both cognition (knowing) and behavior (doing) (Argyris 1977; Crossan, Lane, and White 1999; Garvin 1993; Hedberg 1981; Stata and Almond 1989). As such, the practical value of using case studies lies not just in documenting the end product (what was achieved) but also the processes involved in getting there (how the end product was achieved). An advantage of the type of case study described in this chapter is that it remains close to practice. The cases capture stories of practice and should assist practitioners in implementing their work, thereby helping the organization achieve its mission.
Case studies can provide direct learning opportunities for practitioners to gain understanding of specific types of implementation challenges and how they were tackled, and/or to increase knowledge about specific development contexts. They aim to provide knowledge in a context-sensitive manner (unlike ‘best practices’). Since this type of knowledge is often best shared in person, additional value can be gained from the case study by using it as a catalyst to spark dialogue around implementation issues between practitioners within and between both sectors and organizations. As the focus is on challenges encountered during implementation, use of this type of case study may also contribute to wider discussions in an organization about challenges, including failures, and how to learn from them. Dissemination and promotion of engagement with case studies are therefore important activities that should take into consideration the specific audience, organizational context, and culture. Knowledge management systems which incorporate the compiling and coding of cases are a useful resource; however, it may not be sufficient to just share a case study with colleagues. Instead, learning platforms and opportunities should be designed with the intended audience in mind; for example, structured discussions and learning events may be appropriate mechanisms to translate knowledge into practice.
12.3 Using Case Studies for Organizational Learning in Four Development Agencies
Organizations have different ways of curating, documenting, and mobilizing knowledge. Generating and using case studies as a tool for organizational learning requires a considerable investment of an organization’s time and resources, and different organizations have deployed different approaches. This section presents the experiences of four different organizations engaged with development issues – a multilateral agency (the World Bank), a major bilateral agency (Germany’s GIZ), a leading think tank (Brookings Institution), and a key national ministry of a large developing country (China’s Ministry of Finance) – as they have developed their use of case studies within their individual contexts. Reflecting on the experience of these different types of organizations may assist other organizations in their decisions about whether and how best to incorporate case studies.
The organizations were selected on the basis of their participation in the Global Delivery Initiative (more on this below) as well as the type of organization they represent. They were assessed via oral interviews as well as complementary desktop research of secondary material. Based on this assessment, the chapter will now examine how the motivation for organizational learning, managing knowledge, and the use of case studies in managing knowledge can vary among different types of development organizations.
All four organizations are linked through their involvement in the Global Delivery Initiative (GDI; described below – see Box 12.2) and all have developed case studies and shared them through the GDI network, which allows for some comparison between methods and approaches used.
The Global Delivery Initiative (GDI) was a joint effort by multiple organizations to create a collective and cumulative evidence base on the ways in which challenges encountered during the delivery of development interventions are addressed. The GDI supported the science of delivery [Footnote 4] by building on the experience of its partners; connecting perspectives, people, and organizations across sectors and regions; and ensuring that staff and clients have the knowledge they need for effective implementation (see Gonzalez and Woolcock 2015). From the outset, the GDI deployed analytical case studies as its primary tool for acquiring, assessing, and disseminating knowledge on implementation dynamics: how particular teams, often implementing complex projects in difficult circumstances, successfully identify, prioritize, and resolve the problems that inherently accompany delivery.
In addition to producing case studies (and sharing them through its Global Delivery Library), the GDI convened partners to facilitate sharing of experiences and lessons learned on delivery; provided support to practitioners in member organizations as needed; trained prospective case writers; and identified common delivery challenges to provide support to practitioners. The goal was not to identify prescriptive universal ‘best practice’ solutions, but rather to share particular instances of how common problems were solved, with the expectation that these solutions could be adapted elsewhere as necessary by those who face similar challenges. Knowing that others have faced and overcome similar challenges can also be an important source of ideas and inspiration. Indeed, all professional communities – from brain surgeons to firefighters – have forums of one kind or another for sharing their experiences and soliciting the advice of colleagues as new challenges emerge; similarly, managers and front-line implementers of development projects should have ready access to people and materials that can help enhance their skills and effectiveness.
The steps by which a GDI case study was prepared emerged through an iterative process. The common principles underpinning the preparation of a GDI case study centered on treating it as an instance of applied research: beginning with a thorough desk review (documenting the project’s history, objectives, and performance to date); using this to generate specific questions pertaining to implementation challenges that formal documents cannot answer; and then outlining a pragmatic methodology whereby particular stakeholders (project staff, recipients, senior government counterparts, etc.) were interviewed and additional data generated. The case study was then prepared on the basis of this material (Global Delivery Initiative 2015). Unique to the GDI case study methodology was that it revolved around development and delivery challenges. Instead of focusing on (project and/or program) objectives, case studies were built around challenges that were cross-sectoral and allowed for learning across sectoral disciplines. The assumption was that this approach would spark a discussion on nontechnical matters amongst technical experts as well as related stakeholders (e.g., governments). This approach differed considerably from general practice in development organizations, wherein learning was focused on project reports, excluding knowledge on the “how to.”
12.3.1 Motivation for Using Case Studies for Organizational Learning
The motivation for using case studies varies widely across all assessed organizations, depending on organizational objectives, structures, and processes. For example, instead of focusing on ‘best practices’, China’s Ministry of Finance (MoF) seeks to tell the story of China’s development over the past decades in ways that capture insights to inform and possibly adapt planned or ongoing interventions in other countries (as well as in China) – the MoF invests in case studies because they are perceived as a suitable product for knowledge-sharing between China and the rest of the world. A case study is considered an additional product in documenting project results and hence will be disclosed and distributed publicly. More formally, the MoF’s objective(s) when producing case studies are to:
Shed light on underexplored projects that China has conducted together with the World Bank, producing implementation knowledge on how these projects were carried out.
Identify a platform and adequate tools to document its development experiences in order to share these with the world, especially with other developing countries as part of a “South–South Cooperation” agenda.
Name of Organization | Purpose of the Organization | Type of Organization |
---|---|---|
World Bank | To end extreme poverty (reducing to 3% the share of people living on less than $1.90 per day) and promote shared prosperity (fostering the income growth of the bottom 40% in every country) | Multilateral finance institution |
Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH (Government of Germany) | To contribute to sustainable development through services and approaches such as capacity development, sectoral and policy advice and change management, project management and logistics, network facilitation and mediation, and event management | Bilateral implementing agency |
Brookings Institution (Center for Universal Education, Millions Learning Project) | To conduct in-depth research that leads to new ideas for solving problems facing a society at the local, national, and global levels | Nonprofit public policy organization |
Ministry of Finance (MoF) of the People’s Republic of China | One of the ministries of the State Council, responsible for the financial affairs of the People’s Republic of China | National government |
The Millions Learning Program at the Center for Universal Education (Brookings Institution) decided that case studies were an appropriate strategy for capturing and sharing the process behind how education interventions around the world went to scale. In order to do so, the Millions Learning team globally scanned for programs and policies initiated by state and nonstate actors that demonstrated a measurable improvement in learning among a significant number of children or youth.
GIZ’s primary interest in case studies is to address specific delivery challenges, first by characterizing the most important failure behind an unclosed delivery gap, specifically the so-called “last mile delivery gap” for the poor. For example, in the case of water and sanitation programs, it is the missing access to clean water; in the case of the energy program, it is missing access to at least one important energy service. Case studies also address more complex issues at the governance level, such as the functioning of public administration systems overseeing police forces. They also deal with more institutional/political types of failure, such as the missing rights-based approach to public administration (South Caucasus) or political interventions in police reforms (Central America). Success is therefore always presented as a substantive response to an identified failure in public service delivery.
GIZ’s motivation in curating knowledge via case studies has varied depending on the case study in question. Some examples follow:
Starting a more general reflection process on specific program approaches (Water/Sanitation; Community Policing)
Promoting an innovative intervention with proven scale-up (Prison Reform/Bangladesh)
Presenting a proven technical/organizational innovation (Metering System Bangladesh)
Supporting regional learning processes (Community Policing, Administration Law South Caucasus)
Marketing program approaches (Cashew Initiative; Energizing Development).
12.3.2 Organizational Learning Environment
Work on case studies is usually embedded in organizational contexts such as units explicitly dealing with organizational learning and/or knowledge management. These linkages are important for ensuring that case studies reach their intended target audiences within each organization. Organizational culture – or in this case, learning culture – is the “breeding ground” that strongly shapes how case studies are perceived and acknowledged.
For China’s MoF, promoting adaptive learning is the core rationale for producing case studies; as such, case studies should ideally include stories of successful interventions as well as of course correction. However, shifting the focus from success to challenges has not always been easy for case writers in this context. Openly identifying, assessing, documenting, and communicating failure poses a distinct challenge in China’s otherwise “success-driven” environment.
Brookings’ Millions Learning project was initially interested in learning from case study “success stories” as well as from interventions that did not achieve their intended outcomes. However, the team quickly realized how challenging it was to publish “failure cases,” as people are often hesitant to publicly admit to failure. That is why in the project’s calls for case studies, the wording is highly important. For example, the team’s use of the term “failure” caused resistance, whereas the terms “challenges” and/or “course corrections” resulted in greater sharing among case study partners. Apart from semantics, the change in wording also strongly enhances the emphasis on learning and jointly improving from experiences (such as how challenges have been overcome).
To openly discuss challenges as well as failure is nothing new at GIZ, which for many years has been actively fostering a culture permitting failure to be openly addressed. Strategic evaluations, for example, are done with openness, highlighting deficits and failure. However, discussing failure and limitations is not yet a mainstreamed management attitude. GIZ acknowledged several common challenges to the process of writing case studies, as follows:
Identifying an appropriate delivery challenge
Updating the existing literature by internet research, and not just relying on existing institutional documents or reports
Identifying the most important causal mechanisms
Lack of recognition of the importance of governance structures/aspects at the national level
Comparative case studies require a different methodological approach. They are not an extension of a single case study
The process of organizing a case study depends on the specific demand and should not be too predetermined. (It is not the written document which counts, but the use of the knowledge that emerges by doing case studies.)
Unlike China’s MoF or the Millions Learning project at Brookings, the scope of GIZ’s case studies depends on the demand of its partner organizations and program managers. Consequently, GIZ’s approach to learning from case studies and its integration into corporate learning has several specific objectives:
To document the tacit implementation knowledge of different program interventions with different partner organizations. As a contribution to an internal reflection process, this type of case study needs a clear mandate from an internal network or community of practice and relies on the motivation of senior advisors to make their implicit knowledge explicit.
To introduce innovative approaches focused on a specific delivery gap at the country level, but also at regional or international levels. This type of case study is neither a policy document with general recommendations nor a detailed story of a specific program intervention at the country level. The case attempts to understand the most important causal mechanism responsible for the identified delivery challenge and to explain why and how the presented response to the delivery challenge has been effective.
To present a proven organizational or technical solution to an identified delivery gap, mainly at the local or micro-level. This type of case study starts by explaining why the established approach has not been effective in closing the delivery gap. Such case studies usually focus on the incentive structure, in particular on the incentives and behavioral attitudes of clients and partner organizations.
At the World Bank, the Independent Evaluation Group (IEG) has embarked on a series of reports to better understand how the Bank learns from its operations, embedded knowledge, and experiences (see IEG 2014, 2015). As a general conclusion, these reports state that the World Bank can do much better in learning from the knowledge it produces and that flows through its practice.Footnote 5 The Bank agrees that it needs a more strategic approach to learning, and that such a strategy should adapt to the different learning needs identified by these reports (needs related to operational policies and procedures, human resources policies and practices, and promoting an institutional environment with incentives and accountability to foster knowledge and learning).
As part of a recent full-fledged institutional change management process, the World Bank has created different sectoral responsibilities to manage learning and knowledge to help overcome development challenges. The new arrangement aims to build capacity for staff and to encourage clients to learn, share, and use knowledge derived from experience in addressing operational challenges, including assessing whether and how such experiences can be adapted elsewhere and scaled. One of these institutional responsibilities resided in the Global Delivery Initiative, which sought to package such knowledge and lessons into case studies and generate methods to develop such case studies for use within and between development organizations. For GDI, case studies on delivery provided a clearer understanding of the sequence of events and balanced the perspectives of key actors, helping us untangle cause and effect. More specifically, such case studies sought to outline how interventions were implemented. They provided insights into the results and challenges of implementation, and helped to identify why a particular outcome occurred. They explored interventions in their contexts, and described what was done, why, how, for whom, and with what results.
12.3.3 Types of Knowledge Curated Via Case Studies
Case studies are an appropriate tool to capture knowledge in a structured yet context-sensitive manner, allowing for narratives to unfold and implementation processes to be revealed without over-simplifying. The type of knowledge curated via case studies, however, varies according to each organization assessed.
Guidelines produced by the World Bank were used as the methodological backbone of all case study work initiated by China’s MoF. However, the Ministry would like to maintain a certain flexibility regarding its case studies that allows experienced case writers to add their individual styles and additional details. This is because China’s MoF strives to capture knowledge through case studies that informs the design of new interventions (projects) in China, as well as to inform the implementation of ongoing interventions (scaling up). Therefore, the selection criteria for case studies are primarily based on the quality of the project the case study will focus on, and whether it entails concrete experiences that are worth sharing within and beyond China. In a small number of cases, the MoF also selects case studies based on research interest.
Apart from publishing a final report and upcoming stand-alone case studies, the Millions Learning team periodically blogs about its case studies, report findings, and topics. The team is planning to release a series of two-minute videos that feature voices of case study partners to bring each featured case study to life. The Millions Learning team also disseminates a quarterly newsletter, tweets daily, and presents its report and case study findings at international events and conferences every few months. The vast majority of the case studies (80 percent) contained empirical findings from fieldwork and were not limited to desk research only. Fieldwork was conducted by staff at the Center for Universal Education at the Brookings Institution and consultants via in-person or phone interviews. The same people who undertook the field visits and data collection wrote the case studies (in-house researchers as well as external consultants). What is required of case writers is familiarity with the case study methodology as well as the topic of the case, the specific intervention, and the country.
GIZ has broad experience in using case studies and uses an existing methodology. One of the main learnings is that case studies are only valid in specific contexts and that knowledge cannot be directly transferred from one context to another. For instance, once a case study is developed, its results are only used by a couple of colleagues to feed into the development of specific programs. At times meta-evaluations are carried out for specific topics, but these do not always lead to changes in action as the conclusions tend to be fairly general. This has led to the understanding in GIZ that case studies are a necessary tool for specific programs but that generalization of results is tricky and obtaining evidence is highly resource-intensive and often impractical. Use of case studies falls outside the default reporting procedures at GIZ. Reporting requirements are linked to specific program cycles and implementation processes, whereas case studies take a broader view of the social and political context as well as behavioral and institutional aspects. They usually cover a greater period than a program cycle, as they focus on how delivery gaps have been closed (and not only on the impact of a given program intervention).
At the World Bank, the current objective is to gain in-depth and systematic knowledge on the causal mechanisms that explain development results. Based on systematizing causal mechanisms (which includes the identification of the key factors and enabling conditions) that explain the pathway to change, the Bank can identify lessons learned that may usefully inform decision-making in other contexts and scales. The case study method is useful for hypothesis generation: drilling deep into experiences and tracing the causal mechanisms of change (see Gerring 2017) helps to systematize the mechanisms behind the implementation process.
GDI’s cases, then, worked with a focus on the ‘how to’ of implementation. The type of knowledge curated revolved around those factors and pathways of change that explain a particular development result. The purpose of gathering such knowledge was to provide practitioners with evidence that can help them inform their own decision-making. As stated in GDI’s fact sheets,
The case study method encourages researchers to ask questions about underexplored complex delivery problems and processes that development stakeholders routinely grapple with: what they are, when they arise, and how they might be addressed, including detailed accounts of delivery techniques, strategies, and experiences of the twists and turns of the implementation process. Systematically investigating delivery in its own right will make it possible to distill the common delivery challenges – the institutional, political, behavioral, logistical, and other issues that affect the delivery of specific interventions. It will also inform practitioners when they are faced with similar delivery challenges in their own programs and projects.
12.3.4 Use of Case Studies for Organizational Learning
Apart from disseminating case studies via the Global Delivery Library of GDI, China’s MoF intends to publish all its case studies via the library of the Shanghai University of Finance and Economics, which is one of the partner universities of MoF China. Conferences and events organized by local government officials are equally important channels for dissemination of insights gained via case studies. For instance, the Ningbo government is planning to include the Wetland project case study in a book about Ningbo’s experience in implementing World Bank projects, and it will be shared with participants at a conference hosted by the Ningbo government. Additionally, all case studies by the MoF will be disseminated via the internal online platform to all bureaus and agencies affiliated with the Ministry. It is too soon to provide evidence on whether case studies have been used by decision-makers and officers in government. However, there has been strong interest by project managers in China to use and learn from these case studies. The MoF does not foresee any resistance or challenges in disseminating case studies. Even so, it has adapted its approach following feedback from a GDI training course so that now a selected group of dedicated academics will produce all case studies; this has significantly increased the quality of the cases.
The explicit objective of the Millions Learning project is to use case studies to provide a picture of the players, processes, and drivers behind the scaling process in education. It is evident that the project is interested in leveraging knowledge in education across organizational and national borders. The project also intends to learn from and build on research on scaling up which may be relevant across sectors – for example, health and nutrition, as well as other disciplines. It has been clear from the start that the project did not intend to publish a compendium of case studies, but instead preferred to focus on patterns across case studies that should be documented and shared. Case studies are referred to in order to provide examples. The team was also clear from the project’s inception that documentation of knowledge is more a means to an end than a final product. Therefore, the Millions Learning report is considered to be the starting point for knowledge-sharing, dialogue, and, ideally, action around selected topics and areas in education. Hence, it is outward facing, inviting organizations and individuals to share information and contribute to further shaping the debate around global education. To achieve this, the initiative continuously reaches out to organizations, agencies, and individuals from around the world to contribute to and feed into the process through interviews, conventions, and draft report reviews. The Millions Learning team also published stand-alone case studies in 2016, providing a deeper dive into the individual case studies discussed in the Millions Learning report.
To date, ten case studies using the GDI methodology have been developed by GIZ. There has been exchange across organizational boundaries, but not yet at scale. However, regional programs have used case studies for reflection processes across boundaries. Selected case studies have been presented at regional seminars and used as reference material in the formation of new interventions. Coming back to the different types of case studies GIZ has developed, the following lessons can be derived from experiences in writing and using case studies so far:
Case studies presenting innovative approaches focus first on design and analyze the real implementation issues related to the chosen design. The context is more related to regional or international experiences in the area or issues presented, and the country context is mainly taken into account for understanding the differences with other experiences. Comparison is more important than detailed understanding of specific case-related aspects of implementation and management. The main focus is on understanding similarities and differences due to specific country conditions.
Case studies which summarize implementation knowledge focus more on implementation than on design since the design has been proven effective under different conditions and situations. Thus, the main interest is to understand what works under which conditions and what kind of tacit knowledge should be taken into account when approaches have to be transferred and adapted to a “new” context.
Case studies which present a proven organizational/technical solution to a delivery gap at the local level focus on the “how” of the incentive structure. Therefore, feedback loops with clients and real-time impact monitoring are important tools.
At the World Bank, the GDI was one of the most interesting and productive initiatives using case studies as a learning source. The model of case studies that the GDI provided comprised a critical body of knowledge, with insights from the implementation process that helped practitioners identify the causal mechanisms explaining results in particular contexts. An understanding of the critical factors and enabling conditions in achieving results helped to inform projects operating outside the specific context of the case. The cases were also used as part of training sessions to develop the capacity of practitioners to use cases to inform their own practice and to populate the GDI’s case study repository, now managed by the Global Partnership for Effective Development Cooperation.Footnote 6 At the same time, the training agenda acted as a capacity building “train the trainers” strategy, with the aim of creating a global cadre of suitably qualified practitioners that not only gained skills as case writers but also benefited their own practice. Internally at the World Bank, the GDI trialed some case studies that were used as learning exercises for newcomer staff, in which they simulated how staff approach clients in different contexts and for different development problems.
12.4 Lessons Learned in Aligning Case Studies with an Organizational Learning Agenda
In the previous section we noted that case studies on development practice are used in different ways and with different levels of systematization for the purpose of organizational learning. Here we can make use of our IGOIL categorization to explain how case studies from these different organizations tap into different levels of learning.
As we see from Table 12.2, different organizations use case studies for learning purposes, but such purposes serve different objectives. We can use the MoF of China and the World Bank as two examples with different purposes. For China’s MoF, learning is external facing, with partners that want to learn from the experiences captured in the Chinese case studies. This external interest may come typically from other governments that want to learn how the Chinese government dealt with a particular development challenge. Learning is done mainly at the interorganizational level: the MoF selects and systematizes experiences to be disseminated, and this external demand is what guides the capture and systematization of knowledge by the MoF.
Learning Category | MoF, China | GIZ, Germany | Brookings – Millions Learning Initiative | World Bank – GDI |
---|---|---|---|---|
Individual | X | X | ||
Group | X | |||
Organizational | X | X | X | |
Interorganizational | X | X | X |
The World Bank’s approach is also very much about interorganizational learning, by sharing experiences among institutions on how to address development challenges. However, at the same time there is a specific focus on knowledge retention and organizational learning, with the goal of interpreting and using the knowledge collected through the case studies to support the organization’s business practices and improve performance. The GDI approach focused on contacting particular partners and using group discussion to advance this learning agenda; it also provided training for practitioners to not only become case writers, but to develop capacity at the individual level for transformational change by better understanding the change process.
Table 12.2 also points to some of the different motivations for using case studies as a learning tool. In the case of MoF China and Brookings, for instance, case studies are shown as exemplars of how to do things or ‘what and how things work’ in the spirit of sharing such knowledge outside the boundaries of the organization. At GIZ the focus is to provide practitioners, within and outside the organization, with examples of good practices. Finally, GIZ understands itself as a convener of experiences on transformational processes, with the role of promoting dialogue not only at the practitioner level but also across organizations and countries.
Table 12.2 and the preceding discussion show that case studies do not need to use the same knowledge-sharing strategy or audience to inform development processes. Case studies can be used as a learning tool to improve performance and implementation in internal practices. They may never be shared directly with other practitioners or stakeholders outside of that organization, but this approach may still spread lessons indirectly through changes in behavior and practices as a consequence of insights captured in the case study. On the other hand, case studies can be used directly to inform counterparts of experiences that provide insights on what works and how. In this instance cases may have more impact on an external organization receiving such knowledge.
Finally, the use of case studies as a learning tool also generates some knowledge value in the process of developing the case study itself, in addition to the output. As has been shown with MoF China, the GDI, and to some extent GIZ, case writers are trained to focus on a problem-driven approach to tackle case studies. These case writers are also practitioners involved in development projects who may be keen to incorporate this approach in future development practices. Further capacity building at an individual level may also take place among the key stakeholders involved. As a case study’s interviewees, they play a role in articulating their experiences, which are captured as knowledge on the “how to” of implementation. As experienced through the preparation of case studies by the four organizations discussed in this chapter, such engagement provides these key stakeholders with a new perspective on how to tackle challenges throughout the implementation cycle, and in the process perhaps generates a change of mindset.
13.1 Introduction
Former UK prime minister Tony Blair once said: “The problem isn’t vision. Often we know what to do. The real problem is getting things done.”Footnote 1 In 2013, the World Bank Group embraced this challenge as a part of a new “science of delivery” initiative championed by its president,Footnote 2 building on an ambition that Sir Michael Barber articulated in the service of the Blair government, manifest most conspicuously in his deployment of dedicated delivery units (see Barber 2015). At issue was whether organizations could develop and formalize reliable guidance about how best to translate good ideas into real impact.
As part of this effort to improve implementation, the qualitative case study has a special place. Randomized controlled trials and other tools used to assess program design or evaluate the effectiveness of specific interventions provide little leverage or practical insight when the breakdown between ideas and impact lies in the hows – the specific steps taken to deliver a service or change an institution. A case study can help improve the translation of policy into results by tracing these pathways, illuminating the effects of context, process, politics, and capacities on intermediate achievements and broader outcomes.
But practitioners can also use case studies to improve performance in a variety of other ways. While previous chapters have laid out a social scientific rationale for the use of qualitative case studies, proposed standards for assessing rigor, and offered examples, this chapter focuses on employing case studies for adaptation and learning, especially in governments or organizations that seek to promote economic growth and development. It proposes that case studies useful for this purpose have seven specific qualities, though they may differ widely in other respects. Additionally, it offers a brief user’s guide for policy planners, managers, and instructors.
Our observations build on insights from two programs: the World Bank’s Development Research Group and its leading operational unit deploying case studies, the Global Delivery Initiative (GDI), and Princeton University’s Innovations for Successful Societies (ISS) program, which develops policy-focused case studies of development.Footnote 3 Both programs worked for many years with people leading change in different contexts. From 2008 through 2021, the Princeton program helped a rising generation of leaders address the institution-building challenges facing governments in fragile states and neighborhoods, low-income countries, and crisis situations. Case studies were, and remain, the program’s medium for enabling public servants to share experience with each other in an accessible manner. Similarly, the World Bank-based GDI, which launched in 2014, began as a collaboration among various development partners to help practitioners build a more systematic understanding of program implementation, promote policy dialogue, and improve operational effectiveness. The Global Delivery Library, one of the GDI resources, became an open repository of cases that tapped the tacit knowledge of field-level practitioners about how to navigate delivery challenges, enabling future operations to draw upon wisdom from past interventions.
13.2 From the Science of Delivery to Adaptive Management
Blair’s observation – it’s not the vision but the how that’s the problem – had its roots in a prime minister’s struggle to improve service delivery across different sectors, especially education, health, and policing. In the United Kingdom, as in every country, implementation is often the great bugaboo on which great ideas stumble. But offering reliable generalizations to help guide the work of front-line providers, managers, and ministers poses many challenges. The social world cannot be reduced to a set of laws or principles as easily as the natural world.
Efforts to frame a science of delivery exposed two different policy worlds: one in which it was possible to base generalizations on credible evidence, and another in which tracing the influence of actions on impact was more difficult, though still valuable. In medicine and education, for example, there were some strong points of agreement about measures that could have a big impact on broad outcomes, as Wagstaff (2013) has correctly noted. Take the example of vaccination against childhood diseases. There is mounting evidence about how best to scale vaccination campaigns. Though not completely reducible to a formula – at least not to one that works the same way to the same extent in every setting – it is possible to think systematically about how to achieve results, including estimates of the participation rates needed to create herd immunity and innovations to help maintain the cold-chain when lack of electricity threatens vaccine viability. Wagstaff (2013) points out that it is unsurprising, then, that champions of a science of delivery – the testable, relatively stable understanding of cause and effect within the implementation process – often started their careers in a field such as public health and that journals such as Implementation Science were specific to this policy area.
This science came together as the confluence of many strands of research and multiple methods of investigation. It is notable that the contributions in the pages of the Centers for Disease Control’s Morbidity and Mortality Weekly Report have included not only analysis of epidemiological data, but also case studies based on field interviews.Footnote 4 The qualitative case studies help identify the nature of the many gaps between the release of a vaccine to a health worker and actual protection of an individual against the disease, and often point to remedies. By tracing the breakdowns in the process, they spur adaptation that could help improve the match between the numbers of people a campaign aimed to protect and actual levels of vaccine administration. Through multiple cases, as well as larger tracking studies, practitioners are able to come closer to answering the key “How?” questions that Behn (2017: 94) rightly highlights as essential elements of a science of delivery: “How does this strategy produce results? What exactly are the causal connections between the strategies employed and the resulting outputs or outcomes?”Footnote 5
Case studies have also aided understanding by enabling us to probe why outliers – exceptional successes or failures – differed from the patterns normally observed, thereby illuminating possible ways to improve performance across the board. This was the approach adopted by Brixi, Lust, and Woolcock (2015) to learn from local service success stories in parts of the Middle East and North Africa. Household survey data from several countries in the region indicated that student performance was often poor, despite the fact that school access and facilities had improved. If all schools in a country operated under the same set of regulations, these authors asked, why do some areas perform so much better than others, controlling for demographics? Did the differences stem from a condition outside the control of managers, or was it something that principals and teachers in one area just decided to do differently – a practice that, at least in principle, others could replicate? The household surveys did not contain the type of information that allowed them to answer these questions, so the team went to the successful schools and studied them. One hypothesis was that degree of parental engagement affected both teacher behavior and student performance. The questions the team posed therefore included several about interaction between school officials and the community. The case studies found that the successful schools were those where principals and teachers met with residents and there was more communication with families. The challenge was then to figure out how to generalize a practice that was at least partially sensitive to the orientations and aptitudes of school leaders. In this instance, qualitative case studies supported development of alternative explanations and illuminated a potential solution to the problem of low-performing schools.
Not all policy spheres look like either of these examples, however. In some policy arenas, implementation involves multiple changes at once, which means there are several possible causal explanations for outcomes. In Behn’s (2017: 96) words: “Thus, the manager’s ability to assign causal credit is difficult. And if the management team is just starting out – if this is the team’s first effort to improve performance – which of the team’s multiple actions deserves how much of the credit?” The answer to this question cannot be called “science,” he says. “It could, however, be an intelligent guess.”
An intelligent guess is a step in the right direction, a hypothesis rooted in facts, though it isn’t the same as an evidence-based handbook, the kind of product Behn (2017) suggests a science would produce. Where it is hard to winnow out which conditions, circumstances, or actions carry the most weight in delivering a development outcome, and where we are therefore likely to have a high ratio of intelligent guesses in decision-making, implementation may adhere to a different model. Continual review, learning, and mid-course correction become essential. Though long practiced, this approach has more recently gone under names such as “adaptive management” or AdaptDev, which now has its own Google Group,Footnote 6 “Doing Development Differently” (DDDFootnote 7), and “Problem-Driven Iterative Adaptation” (PDIAFootnote 8). The common idea across these new platforms is that where a traditional after-action review, for example, is conducted at the end of an initiative, the push instead should be for feedback and learning to occur throughout an effort to implement a policy or institutional change. Booth et al. (2018: 8) point to a process in which implementers, in response to complex challenges, “deliberately set themselves up to learn by trial and error, testing initial approaches and adjusting rapidly as evidence on possible avenues of change is acquired.” Matt Andrews (2018: 1), one of the key contributors to this approach, has written on the basis of his long experience: “We always ask of PDIA in practice: What did we do? What results emerged? What did we learn? What did we struggle with? What was next?”
Although both policy learning and learning-by-doing have a long history, the ambition of the Doing Development Differently and AdaptDev communities that have emerged in this space is to expand the practice of experimenting, learning, and adjusting in domains where broad evidence-based generalizations about implementation are out of reach. In these areas, the people responsible for translating ideas into practice will almost certainly encounter challenges and unexpected obstacles (Schon 1983; Pritchett, Samji, and Hammer 2013). If they do not step back, reflect, learn, and adapt, they risk persisting with interventions or strategies that are not well suited to the situation that they face. Therefore, these teams must be ready and willing to adapt mid-course, to experiment and scale up what works, and to iterate and integrate feedback into implementation. Together with careful planning and the elaboration of a clearly articulated theory of change, the incorporation of “rapid feedback loops” into an endeavor is crucial, as is using these processes for “learning in response to ongoing challenges” (Pritchett, Samji, and Hammer 2013: 1).
In this corner of the policy world, where causal relationships are less straightforward than they are in public health (and elsewhere), case studies help practitioners pool observations, recognize what has worked, identify where things aren’t turning out as anticipated, flag surprises, and open up space for adaptation. They help make the tacit knowledge practitioners have accumulated as explicit as possible. Although they may draw on focus groups, surveys, and quantitative evidence, they employ interviews to help trace the steps taken, departures from the roadmap, and intermediate results in order to help us better address both anticipated and unexpected circumstances and increase the probability of generating intended impacts.
In early experiments, embedding case development and data collection directly into projects not only strengthened the quality of evidence produced but also enabled managers to make mid-course corrections and secure stronger buy-in from other stakeholders. Innovative elements have sometimes included smartphone surveys to check whether a service reached intended beneficiaries or assess satisfaction, geotagged information displayed on maps to help spot service coverage issues, satellite photography to track crop conditions, and other information generated with relatively low-cost and flexible tools that have a broad variety of applications (e.g., see Danquah et al. 2019 on Sierra Leone). Workshops to document and review implementation steps taken to date help staff members spot omissions and bottlenecks and discuss creative ways to surmount unanticipated obstacles.
The World Bank’s Global Scaling Up Rural Sanitation program aptly illustrates this kind of effort. With the goal of making a dent in the 2.5 billion people worldwide without access to improved sanitation, the project launched pilots in three countries, which served as learning laboratories for developing a theory of change. After this pilot phase concluded, the project then made the necessary adjustments and scaled up to a further 10 countries; to date, it has provided some 22 million people in 13 countries with improved sanitation.
The use of pilots in the initial “learning laboratory” countries provided crucial knowledge about what worked and what did not. This information was then disseminated through a global network, allowing team members to reflect on and analyze the results of their actions. Team leaders were able to learn from these initial lessons in real time, allowing for quick adaptation. An iterative and adaptive approach was also hardwired into the program, giving task team leaders both the freedom and the mandate to apply lessons learned in their countries or areas of responsibility, while also adapting and correcting course as they scaled up and collecting their own evidence locally to target effective behavior changes and interventions.
A second example from the GDI illustrates a slightly different approach, this time in the context of improving access of Nigerians to sustainable, clean, potable water. A case study indicated that governance reforms were difficult to implement, trust in the system was low, and monitoring was weak – with the result that progress had stalled. It was crucial to establish trust, build networks, and enhance relationships with a wide variety of stakeholders. To design a new phase of the project, the World Bank decided to share the case study and solicit ideas from each major stakeholder. It organized a series of meetings to invite observations and proposals. The first convened its Nigeria task team leaders. The subsequent meetings took place in Abuja and involved participation from representatives of more than sixty agencies, including the head of the Federal Program Implementation Unit, the high representative of the Federal Ministry of Finance, State Ministers of Water, State heads of the program implementation units, and the World Bank Country Director. Participants had a chance to discuss the case itself and introduce other information, then they charted out concrete recommendations.
13.3 Seven Qualities That Make a Case Useful for Practitioners
For purposes of learning and mid-course adjustment, not all case study formats are created equal. Moreover, the information and format needed are not always the same as those academic colleagues seek. The GDI and Princeton’s ISS program both ambitiously tried to tailor what they do to serve three distinct audiences: practitioners who want to improve implementation success, policy researchers or scholars who want to ground a (social) science of delivery, and aspiring leaders completing courses of study in universities and staff colleges or executive education programs. The jury is still out on whether it is possible to serve three masters equally well. Nonetheless, the experience to date has generated some wisdom – not yet formally tested! – about what helps a case to meet the needs of practitioners. This wisdom can be spelled out in seven principles:
1. A good case draws on a clear, shared lexicon.
A good part of what makes some cases more useful than others in development policy is the conceptual structure that underlies them, the lexicon. A good case is far more than a heap of facts the reader must somehow fit together. A good case focuses on subject matter that is central to a decision or series of decisions and helps reveal the development challenge and choice architecture, as well as the conditions or circumstances that affected the options available and the degree of success. The utility of a case depends on the ability to attach general names to the core challenges and in so doing facilitate comparison and consideration of alternatives.
A lexicon precedes a theory. It is a conceptual map, the key or index a practitioner, instructor, or researcher needs to identify other instances in which the same issue arose. For example, the difficulty people have in coming together to provide a public good, like a litter-free street, is a collective action problem. To be useful, qualitative cases that address this issue either have to use the term or employ the definition, minus the jargon, so that we can draw them into the pool of shared experience.
To employ an analogy, many of us have probably had the experience of moderating a discussion in which people with diverse experiences share their recent work. The moderator’s job is to find the common ground, the shared problem on which the participants have something to say and could learn from each other. That job is much easier when the presenters share a lexicon and use that reference to define their focus and structure their remarks. Otherwise the moderator has to try to discern points of congruence based on fragmentary information – or ask the author, “this is a case of what?”
The ease with which we can learn from qualitative cases hinges partly on the degree to which the general names unlock the experience of others. It goes without saying that to be useful to development practitioners, this lexicon has to respond to how those practitioners think about their work and to what they seek to know. For example, to assist with implementation, both ISS and the GDI developed frameworks that featured a variety of delivery challenges (such as geographic fragmentation) and common impediments to success in achieving a broader development outcome (such as better health). But the aim was also to link users to broader theories and toolkits helpful for thinking outside the box and developing new approaches.
Located in an academic institution, ISS defined its lexicon by matching the problems governmental leaders said they encountered in trying to build more effective and accountable government with existing conceptual vocabularies in the social sciences. For example, some cases focus on coordination problems, and the program treats these in several different domains or policy spheres, including cabinet offices (centers of government), public financial management, disaster response, and business process improvement. In addition to coordination, collective action, and principal–agent/agency issues, the program focuses on problems that are especially difficult because they can lock a country into subpar performance: institutional traps, capacity traps, norm coordination traps, or thresholds, for example. (This approach led one reviewer to term the program’s work “trapology.”)
The GDI tried to secure a tighter fit between its lexicon and the mental maps of people in its diverse user base.Footnote 9 It reviewed more than 160 development publications to identify the delivery challenges most often encountered and conducted a text analysis on more than 4,000 Implementation Completion Reports from projects supervised by the World Bank and other development organizations. Focus groups reviewed the draft lists. The final result was a taxonomy with two levels. At the higher level, the program chose fifteen broad types of implementation problems across three dimensions: stakeholders, context, and project.Footnote 10 Below that were fifty-two additional keywords that presented a more granular view of specific delivery challenges. In the end, the effort yielded a taxonomy that included a mix of challenges, in several domains of application, mirroring the way many potential users searched for information and advice.
2. A good case has a structure that communicates what a practitioner needs to know and facilitates cross-case comparison.
Whatever the realm of use, a good case is a story with a particular spin, in the sense that it helps the user focus in on the information needed to draw conclusions. Structure is important for this reason, and the right structure depends on the intended purpose. If the focus is on implementation, then the case should track the stages of the implementation process, for example: problem recognition, likely delivery challenges, framing and strategy, steps taken to implement, adaptation processes, results obtained, and thoughts about what one might do differently. This ideal-type may not perfectly mimic the actual policy process in a given setting, but a decision-maker can easily follow the case narrative and relate to the subject matter if arrayed in this way, as well as compare and contrast with other cases.
The ISS program and the GDI both adopted templates to facilitate comprehension and comparison. With a few exceptions, the main actors – the “voice” of the story – are civil servants, civic leaders, task managers of projects, and occasionally managers based in international organizations. The text walks the reader through the context and the anticipated challenges (a set of hypotheses about potential sources of difficulty), and shows the options considered and the program design or strategy adopted to address these. Each case documents the new practices or policies a reform team created and the steps they took to win support, secure authorization, build awareness, reshape organizational cultures, and do the many other things often required to put a new system in place. In this respect, the approach resembles the classic Harvard Business School management case that puts the reader into the driver’s seat alongside the person who has to solve a problem. The cases also document unanticipated obstacles and happy surprises, then conclude with results and participants’ reflections on what they would do differently next time or in a different context.
3. A good case entertains multiple hypotheses.
Many different possible causes may account for an outcome. The case should make these visible to the reader and indicate where one or another appears to influence implementation, independently shape outcomes, or affect the scope conditions attached to solutions decision-makers employed. If the influence is negative, a work team can then think about how to solve the problem or mitigate the effects. If the influence is positive, the team might ask itself whether there are ways to amplify the impact. In this way, making hypotheses explicit facilitates adaptive management as well as instruction. This step also enhances the usefulness of a case for social scientists and policy-makers who aim to conduct cross-case comparison or internal process tracing to try to adjudicate among theories.
One sometimes hears that a good case must leverage a single underlying theory. But is that necessarily true? This approach is often too restrictive in practice, though it has its place. It would mean that, as in some kinds of social science research, the purpose of a case is to help us decide whether to accept or dismiss a particular account of results or impact. In areas where conditions may make a science of delivery achievable, as in aspects of public health, education, or economic policy, there is a rationale for constructing cases in this way. But for the purposes of adaptive management, in policy spheres where multiple causes are in play, it is preferable to entertain a range of theories and the hypotheses that flow from them.
There can be tension between the ultimate use of the case and making hypotheses explicit up front. The ISS program wrestled with this problem, sometimes with mixed success. Each series of its cases begins with a research design that highlights the many influences it wants to trace. Most of these become part of the challenges the decision-makers in the case confront, laid out in the second section. However, to ensure cases are engaging to read, ISS does not tag its hypotheses as such. Moreover, not all appear in the same section in every instance. Separate cross-cutting analysis carries the weight of this need. The decision to proceed in this way has consequences, however, and one is that many see the cases as purely inductive, scoping exercises. To conform more fully to a social science model, the program would have to produce a second, stylized version of each case that directly engaged hypotheses and shed other detail.
On the basis of its early experience, the GDI discerned five core categories of causal influence that development practitioners valued highly. Though not each was equally important in every instance, these dimensions provided an instructive set of entry points for assessing the dynamics of implementation and for gradually accumulating granular knowledge about the effects of contextual characteristics, political factors, and the actions of implementation teams on outcomes and impact.
The five dimensions (outlined below) were interconnected, complementing and enabling one another. Cases examined how particular challenges encountered along the way were managed with respect to:
a. Citizen demands and citizen outcomes: defining the goal as measurable gains in citizens’ well-being; identifying the nature of the problem based on a thorough understanding of citizens’ demands and local context; staying attentive to all factors that influence citizen outcomes, including, but not limited to, grassroots representation and bottom-up political pressure.
b. Collaboration: facilitating multistakeholder coalitions and multisectoral perspectives to identify and prioritize problems and coordinate (possible) solutions; convening varied development partners and building on their competitive advantages; tracing the impact of coordination structures on development outcomes.
c. Evidence to achieve results: using the best available evidence to identify problems and solutions; developing local evidence to refine solutions; collecting evidence of results throughout the project cycle; contributing to the global body of knowledge with the evidence collected for scaling up; whether outcomes were driven by evidence.
d. Leadership for change: understanding local political economies and drivers of change; identifying the incentives that motivate behaviors and integrating these into designing delivery solutions; evaluating whether incentive systems or political will accounted for outcomes.
e. Adaptive implementation: developing an adaptive implementation strategy that allows for iterative experimentation, feedback loops, and course correction; building a committed team with the right skills, experience, and institutional memory; maintaining the capacity to reflect on actions and their results; assessing whether institutional capacity for learning helped drive results.
GDI cases also included hypotheses drawn either from practitioner experience or research.
4. A good case contains essential operational detail.
To serve development practitioners well, a case must speak to the issues that managers face with sufficient granularity that a counterpart in another country can follow the steps laid out. This quality often runs counter to what we seek in academe, where the aim is to test highly parsimonious theories that have broad applicability or scope, and where both the content and analysis of cases focus on just a few key variables. The difficulty is to discern the difference between extraneous information and pertinent operational elements, which may include legal authority to act, the impact of political structures on jurisdiction, organizational routines, budget calendars, costs, information architecture, algorithms, and other elements, depending on the subject matter. From the perspective of someone trying to lead institutional change or implement a complex program, the devil is often in these details. An expert should see what she considers essential in a case and a novice should find the language easy enough to follow that the technical detail is clear.
When the person or team researching and writing the case (or facilitating case development) is unfamiliar with a subject area and the specific issues managers confront, reaching the right level of granularity may pose a problem. In some technical areas, both the ISS program and the GDI engaged experts to partner with them or to review initial briefings before case development began. Employing questions broad enough to allow practitioners to discuss their work in their own terms also helped the cases reach essential detail. It was always useful to ask, at the end of a conversation, “What would you like to know about how your counterparts in other countries have tried to reach the outcome you wanted to generate?”
5. A good case pays attention to political will but need not make political will its focus.
Whether in the limited sense of having approval (authorization) from a department head or in the larger sense of having the backing of the head of state, implementation cases usually cut into a problem after there is at least a modicum of political will to proceed with a program and after an opportunity or ripe moment has already materialized. Sometimes sustaining political will is indeed one of the obstacles, but usually addressing this issue is antecedent to the steps taken to deliver a result. If there is no will, there is no policy intervention, and for those of us interested in improving implementation know-how, the “no will” cases are generally less interesting than others (though sometimes good ideas and initiatives bubble up without leadership).
A good practitioner case identifies the source of political will, as well as changes in intensity or motivation that may flow from political transitions, rotation in office, changes in popular opinion, unexpected events, etc. The case should identify how political backing was sustained or grew, or whether it was simply irrelevant and why. Were there self-reinforcing incentives built into the program design? Did program popularity make it difficult to change once the program started to deliver results? Were citizens groups able to lobby? Did leaders become part of a professional community favorable to a program’s continued operation? It may be tempting in some instances to attribute a project’s initiation or durability to outside pressure from a development partner, but rarely is that true. A good case explains why officials acceded, if in fact they did so.
6. A good case discusses scope conditions.
One of the criticisms of randomized controlled trials is that they have limited external validity (Pritchett and Sandefur 2015). We often just do not have the information to know whether the same result would occur in other places, for other people, or during different periods in history (Woolcock 2013). Learning from qualitative case studies can be prone to this same problem, but an implementation case usually provides some grist for thinking more systematically about whether the experience highlighted holds lessons for others. That grist comes in the form of a clear specification of context and analysis of how context shaped the steps taken and the results achieved. Such an analysis provides some basis for understanding how a change in implementation circumstances (context, scale, population) might alter the result.
Beyond encapsulating these broad principles, both ISS and the GDI made it a practice to offer the people who did the hard work of putting a program into practice a chance to think about how their experience generalizes, thereby capturing some of the tacit knowledge in the heads of these experts. For analytical purposes it is important to establish the parameters within which the findings of a given case apply, and experienced practitioners are often keenly aware of how slight differences in legal authorization, public opinion, or institutional capacity could make it hard for others to emulate their successes.
7. A good case is fun to read.
Our two programs differ with respect to this seventh quality: the “engagement factor.” People are busy. Senior officials, especially political leaders, are exceptionally so, and gaining their attention can be hard. If the purpose of a case is adaptive learning or diffusing experience, then a case ought to draw the reader in and get to the point fast. For this reason, the ISS program opted to follow a Harvard Business School management case model that puts a decision-maker in the driver’s seat, uses names and quotes (cleared with the people interviewed), and keeps jargon to a minimum. Its cases put the reader right at the coal-face.
This approach had its pros and cons, however. In the program’s view, while it boosted engagement with many practitioners and with students, it sometimes hurt credibility with a social science research audience, for whom this approach seemed to imply a “great man” theory of history. In the program’s view these concerns were often misplaced. The style was similar to highly commended scholarly work on political development. The social science translation problem more often lay in the release of individual cases separately from cross-cutting analysis – and outside the realm of peer-reviewed journals.
For its part, the GDI, initially hosted within a multilateral organization, chose a different approach. Its cases usually treated an agency within a government or an institution as the lead actor, though they might mention the names of those involved. Because the GDI was a consortium of more than forty partner organizations, its case writing style had to balance adequate cross-program coherence against the particular preferences and imperatives of its affiliate members. This approach also came at a cost, sometimes obscuring the internal negotiation dynamics within the agency in favor of a cleaner or more administratively procedural account. That said, adopting such an approach also allowed communities of practice to stand back and evaluate a situation more dispassionately.
13.4 Putting Cases to Work: Moderating a Case Discussion
A case is not usually a stand-alone document, though it can be so. If an important purpose of case studies is to promote learning and adaptation, then much rides on their capacity to stimulate group reflection, deliberation, and innovation. This in turn raises another question: How does one effectively moderate a case discussion?
In our experience, the tone, sequence, and focus of a discussion vary depending on whether the aim is to teach – to introduce key concepts and ways of thinking about a problem – or to help people who have participated in implementation reflect on their work. For the first purpose, the moderator may play a strong role in directing the discussion so that a group reaches key points, pausing to elaborate these. By contrast, for adaptive learning, where the point of a discussion is to help the people who carried out the work reflect and solve problems, the moderator may stand back a bit more to give participants a bigger opportunity to shape the agenda and to get into specific operational details in more depth than one might in a classroom setting. In both situations, however, there are some shared objectives, most importantly stimulating creative thinking about ways to: overcome obstacles that continue to impede success; mitigate the downsides of a generally successful response; reach difficult (isolated, marginalized) communities; take the intervention to scale; or adapt an approach for different circumstances.
To use a case for classroom purposes, we usually begin by reminding the group of the broader issues at stake. Every case has a development challenge at its core, the public value the people at the center of the action seek to create: the desired impact on citizens’ lives. Every action also has an author, so naming names is important, or at least naming offices: “Minister Marina da Silva wanted to reduce the rate of deforestation in order to adhere to a new climate regime and preserve water quality and availability in her country”; “Sudarsono Osman wanted the land registries in Kuching to serve citizens faster, with fewer errors.” The discussion leader may want to add some additional facts to situate the issue, identify what created the space for change, and add some more detail about the lead decision-makers.
Next comes the dramatic moment: “But … something stood in the way.” The discussion leader then poses a series of questions, beginning with “What was the main problem, the main delivery challenge?” At this stage, it is important to ensure that everyone can identify the general form of at least the major implementation problem in a case – process efficiency, aligning the interests of a principal and an agent, collective action, or coordination, for example: “Mr. X is responsible for making the program work, but he’s stuck. At the start, what is his main problem? What is the general form of this problem?” Knowing the general form enables the case user to link to a general toolkit and consider whether solutions often considered in other settings might be useful in the circumstances at hand. The ability to abstract in this way enlarges problem-solving capacity. It is important to pause and sharpen familiarity with the general concept and the standard toolkit at this point.
Third, we help users connect with the context: “What do we know about the setting and the elements of context that might shape which tactics Mr. X can deploy?” Context is something that will come up throughout the discussion but especially at the end, when the focus is often on scale, scope conditions, and adaptations required to help a similar approach work in another setting. Context may include resource levels, diversity, socioeconomic conditions, government structure, legal authority, and many other conditions or circumstances, some of which may be malleable, while others remain fixed.
The real focus of the discussion comes after this point: “What options did they consider? Were there other possibilities and, if so, do we know why they weren’t considered? What motivated the choices they made?” And then: “Let’s work through the steps the team takes … ” The central objective is to develop a clear outline of the strategy and tactics employed. If the real issue the instructor wants to use as a focal point occurs later in the case, then it may be perfectly acceptable to expedite the discussion and simply put the key elements of the initial response on a PowerPoint slide: “So here are the steps they initially took … Have I got it right?” Usually, however, the aim is to pause to consider the purpose of each step, the appropriateness of the design, what proved difficult to do, any pleasant surprises, and how sensitive the actions taken were to the aptitudes of team leaders or context.
In the classroom, the instructor’s job is to help participants identify concepts useful for analyzing problems that emerge at each step, as well as to bring external information to bear, where warranted. One of Princeton’s Ebola response cases, for example, focuses on carrying out contact tracing in a very difficult context. If the group is unfamiliar with the key elements of contact tracing, it is helpful to call a short “time out” and explain these in some detail. Even if the elements are in the case text, pausing to reinforce the ideas is often helpful for nonspecialists.
Sometimes the focus of the discussion is not on the strategy or the main steps taken, but on an unanticipated obstacle a team confronts: “There is a big unanticipated obstacle in this case … They struggle to adapt. Put yourself in their shoes. How would you deal with this situation?” If the obstacle is minor and the response is successful, it is possible to fold this discussion into the previous stage of the conversation. If the obstacle is significant and incompletely resolved, the major part of the discussion could focus on this matter. The aim is then to help participants identify possible solutions by abstracting from the specific – giving the problem a general name that links to a toolbox – or by inviting each person to tap his or her own experiences and intuitions about how to solve the problem.
At this stage the moderator’s role is to ensure everyone has a chance to contribute and to provide two or three alternative ways to structure the problem under discussion, in the event that everyone is stuck. For example, in one Smart City case, a public health unit used sophisticated math modeling to identify households at risk of lead poisoning, but the effort temporarily ground to a halt over the question of whether it could enter houses at risk and intervene, given concerns for privacy, personal autonomy/consent, and data security. Did it matter that those most at risk were too young to make informed choices on their own behalf? Would the answer to these questions be different if the issue was secondhand cigarette smoke or some other kind of risk – and if so, why? The moderator stimulated thinking by highlighting the ethical principles at issue and inducing participants to think about the implications by pointing to analogous issue areas where the same quandary was a matter of settled law or procedure.
The discussion moderator may want to summarize the results actually achieved and move on, but it is also possible to craft two important conversations around this segment of the case: one focused on causation and the other focused on metrics. Often the conversation will jump to the impact on the broad development challenge, the outcome highlighted in the beginning. In most instances many things affect this type of outcome, so it is important to identify the other things that contribute – the potential confounders – and then try to identify the specific lines of influence through which policy implementation shaped this “public value.” To establish these lines of influence, we usually have to focus on intermediate outcomes or outputs, such as faster delivery times, lower rates of error, or more inclusive coverage. Useful questions include: “Were these the right metrics? Can you think of better metrics? If your office didn’t have much money, is there a way to assess effectiveness inexpensively?” “What contributed most to these improvements?” “On one important dimension, there was little improvement … Why?”
Finally, if the purpose of the discussion is to assess the extent to which lessons from the case are applicable in other contexts, then it is possible to skim through some of the other stages and focus on this matter. Identifying the scope conditions, or the central factors and processes shaping the effectiveness of the solution that the case protagonists deploy, is central to this task. It is also possible to focus this part of the discussion on ways to improve further, to mitigate the downsides of the tactics selected, or to borrow from other fields to get around some of the limitations associated with the tactics actually used.
Some moderators subdivide the cases, asking participants first to read just the opening sections that outline the problem and the delivery challenges (possibly also the options considered and framing), so that the group has a chance to think about tactical toolkits available and how to proceed. The moderator then hands out further sections of the case, and the next phase of the conversation picks up with what the decision-makers actually did and the pros and cons of the approach, improvements, etc. A third handout might focus on an unanticipated obstacle or on results, prompting another turn in the conversation.
Over the years we have come to share the view of Harvard Business Case Publishing that providing moderators with teaching notes or discussion guides improves usage and enhances the quality of discussion. These notes provide some of the general concepts, toolkits, conceptual puzzles, options, and additional background information that moderators often need to move a conversation forward and inspire creative thinking. Generating them should become a part of the case development process, and they usually flow well from the initial research design and from the cross-cutting analysis produced at the end, if there is one.
13.5 Using Case Studies as Part of Adaptive Management
Using cases for problem-solving or improvement within an organization entails a slightly different approach. In this setting, the case study becomes part of a participatory process designed to improve problem identification, foster development of solutions, and win agreement on accompanying changes in practice, including monitoring results. Since 2012, this form of adaptive management, long practiced in many major companies, has attracted a following in public sector development organizations. The United States Agency for International Development’s adaptive management principles, treated as requirements in some of its assistance packages or awards, include elements such as regular monitoring of results; practices to support mid-course review of strategy and implementation and course correction; rewarding “candid knowledge sharing” and collaborative learning; and sharing results widely.Footnote 11
The qualitative case study can play an important role in this approach. In some instances, the case writer’s role is to conduct interviews before a mid-course review begins and to assemble observations of individual team members and beneficiaries in a form the moderator can use to structure discussion of what has worked, why some steps did not succeed to the degree anticipated, and what to do next. The project manager may then use the results of the conversation to revise the program so that there is a record for comparison after the next attempt to improve delivery. Alternatively, a designated writer may skip the first step and become the recorder for the group discussion, creating a case as a record or after-action report. Qualitative cases drawn from other settings may also enter the moderated discussion at various points in order to spur reflection and creative thinking about what decision-makers should do next.
Those developing the PDIA approach have given this issue a lot of thought. In their experience, one of the challenges associated with learning and adaptation is to induce team members to think hard about the sources of success and difficulty. For this purpose, they employ some of the tools of the trade that Toyota has developed – for example, the “Five Whys” exercise that asks participants to push themselves beyond an initial statement about the proximate cause of a problem to deeper reasons: If A was the cause, why did A happen? If B caused A to happen, what caused B?Footnote 12 They go through this exercise at multiple points, creating a “fishbone diagram” to help provide a record of the discussion (see Figure 13.1). This discussion generates information and insight to incorporate in the next case draft, rendering the case study a collective, participatory product.
The next step is to encourage people to think outside the box in generating solutions for each problem the process identifies. The aim at this stage is to encourage people to draw on their own thinking. At this point it may be helpful to consider what others who have faced similar problems have done, drawing on cases from the libraries that the GDI and the ISS offer, or some other source. These stories take people out of their circumstances and surroundings, reduce defensiveness, and trigger new lines of thought. These conversations about other places usually quickly lead back to a more open discussion about the issues on the table. The moderator may summarize what another government or agency tried and then simply ask, “Would that work here, in your view?” “What would you do differently?” “What is the theory of change behind this idea?” “How will we know if this idea works?” Again, this part of the discussion can go into the case draft, if the case serves as the collective record.
But there is also a further step in adaptive learning. The PDIA authors ask participants to identify the space for change in connection with each problem identified in the previous step. That space includes three elements: Authority (Who has the authority to act?), Acceptance (Do the people who will be affected recognize the need for change?), and Ability (Is there capacity – time, money, skill – to act?). This phase of the discussion may help set priorities – if the suggestion is to move where there is space or leverage – or it may lead to creative thinking about how to expand the space for change. This information may also become part of the case record.
Both the GDI and the Princeton ISS program have contributed to learn-and-adapt initiatives. In its first years, the GDI’s Science of Delivery team helped more than sixty different projects use cases to broaden or deepen thinking during review of the initial concept note, decide how to address operational challenges, or present results. Participants sometimes convened their project staff to discuss and record their experiences as their work moved forward, resulting in the gradual development of a case, or they assembled at the conclusion of a project to develop an after-action report that documents the steps they took.
The GDI described its method, the Delivery Lab, as an opportunity to bring together thematic experts with specific operational knowledge from GDI’s partner organizations and other invited guests who are working to overcome the obstacles and bottlenecks that can impede development efforts. Each lab began with a practitioner (the challenge holder) sharing an operational challenge that he or she currently faced in the context of an ongoing project. This brief presentation was followed by a facilitated group discussion and brainstorming session where experts shared relevant experiences. Ultimately, participants worked together to cocreate actionable solutions. The sessions allowed for peer exchange of experience-based knowledge, as practitioners explored problems and thought through potential solutions.
13.6 Conclusion
Implementation-focused case studies play a vital role within the development community in the three key respects described here: (a) helping to develop better understanding of implementation dynamics (a science of delivery), (b) training, and (c) supporting adaptive management. But both the GDI and ISS program observe that practitioners have often employed qualitative cases for other purposes too.
Sometimes the aim is simply to help a manager or public servant structure a problem and think about the menu of options others have tried. A case study can provide a quick guide to key issues and enough operational knowledge to enable the decision-maker to figure out what s/he needs to know so as to pose the right questions in a more detailed person-to-person follow-up conversation. For instance, Princeton’s ISS program has documented the efforts of a number of governments to improve cabinet office coordination and support for policy decisions. These cases have helped chiefs of staff and deputy ministers learn from each other without having to take valuable time to travel abroad in search of ideas. But they have also facilitated face-to-face small group meetings that have matched those who have led impressive reforms with those who are just beginning to think about what to do.
To take another, similar example, the GDI used a case on accountability for mineral royalty funds to support Colombia’s peace process. In Colombia, royalty funds from mining and natural resources held potential for financing local projects and building legitimacy. However, early experiences in managing natural resource funds were unsuccessful in part because local governments lacked capacity to avoid misallocation, corruption, and poor planning, and the central government had no mechanism to remedy this problem. As a result, instead of building people’s confidence in their governments, the initial program undermined trust and the sense of government efficacy. The National Planning Department then created a new program that had flexibility to help local governments to build their capacity to implement projects, while also mobilizing community members to carry out “citizen visible audits.” The case study on this program, which helped ensure that money was not stolen or misplaced and that projects met the real needs of the citizenry, helped foster agreement among parties to the peace process. In Colombia, an actual example of how to build local accountability and legitimacy and equitably use natural resources to develop the country moved policy conversations forward.
Apart from this kind of use, the programs have also found that people who have played important roles in the changes a case documents value the record of achievement. Those who labored hard to make something happen often immediately move on to the next project or crisis. The case provides welcome recognition and helps them explain their own contributions to others. They say the acknowledgment helps fuel another round of effort. Indeed, organizations often ask the programs whether they will commit to developing a case study on a specific program so that managers can say to team members, “If we do well, we will become a model … ”
In other instances, people have written to say that they have used a case as a briefing to prepare for deployment to a new post. Operations documents and technical reports rarely contain names, but cases often do, thereby helping newcomers know to whom they can reach out for additional information while also offering historical context and an implicit heads up about sensitivities.
Finally, the case study is a vital tool for communicating to a wider audience what purpose a development initiative serves, the human story that unfolds around and within it, and the results achieved. It gives form and spirit to the numbers we often use to analyze policies. In an era when trust in governments and international organizations is low, the case study is a way to make the work practitioners do more accessible to fellow citizens and to rebuild shared understandings about the missions we pursue.