1. Introduction
The extent to which the high costs of rural electrification are justified by its impacts on societies and economies has been a matter of debate for decades (see, for example, Rose, 1940; Devine, 1983; Barnes and Binswanger, 1986; Barnes, 2010). In recent years, academic contributions to this discussion have been influenced considerably by the so-called credibility revolution in economics (see Angrist and Pischke, 2010). The claim is that ‘design-based research’ (Card, 2022) like randomized controlled trials (RCTs) and instrumental variables (IVs) leads to more credible and verifiable identification of causal effects. This ‘experimentalist paradigm’ (Biddle and Hamermesh, 2017) is closely linked to the vision of evidence-based policy: well-identified causal effects, so the narrative goes, will eventually tell us which interventions work and hence should be scaled to shape future policies (Young et al., 2002; Duflo, 2004, 2020; Panhans and Singleton, 2017).
In this paper, we examine the case of rural electrification in the Global South, documenting that design-based research is much less effective in improving policy than is often claimed. This is not a new verdict, and we build on previous critical reflections on the credibility revolution paradigm (Rodrik, 2008; Ravallion, 2009, 2020; Heckman and Urzua, 2010; Basu, 2014; Deaton and Cartwright, 2018; Deaton, 2020; Drèze, 2020; Muller, 2023). We extend this line of discussion with a specific application to rural electrification, an important area of development policy that absorbs large amounts of public funding (World Bank, 2018; Blimpo and Cosgrove-Davies, 2019). While national governments often justify investments in rural electrification from a social justice and hence a rights-based perspective, donor agencies and international development banks are under pressure to prove that the investment is worthwhile, following an explicit or implicit cost-benefit logic. There is also an interesting within-sector cost-effectiveness debate because expensive grid extension competes with infrastructure leapfrogging via lower-cost decentralized solutions like stand-alone solar or mini-grids (Levin and Thomas, 2016).
To inform this debate, many empirical studies have been published in recent years that examine the impacts of rural electrification, increasingly also using design-based methods from the credibility revolution toolkit. Several systematic reviews and meta-analyses have summarized this growing literature. In a nutshell, these reviews show that the literature is divided, with some studies finding very large effects, and others very modest or no effects. This divide is consequential for policy, especially considering that, for the extension of the power grid, large effects are required to justify the high costs. This holds true under a cost-benefit principle as applied by many donors, but also under a rights-based principle, because grid extension then competes with off-grid technologies on cost-effectiveness.
Such meta-analyses and systematic reviews are important because, while design-based research is good at generating well-identified causal effects, the external validity gap still needs to be bridged. For this, an accumulation of evidence is needed – something that Duflo (2020) refers to as the ‘pointillist painting,’ with each causal study being one dot on the painting. We use the case of rural electrification to show that even in a rich literature the pointillist painting is hard to compile and the dots on the canvas leave a lot of room for interpretation. We further argue that in highly contested policy areas, even well-meaning policymakers will use this wiggle room to pursue their interests. Next, we argue that the practice of design-based research, despite its intellectual beauty in identifying causality, is not immune to other biases stemming from questionable research practices, underpowered designs, overgeneralization, and publication bias. This further complicates the use of evidence in the policy landscape.
To conclude, we argue that this observation is not particular to electrification. We therefore call for a debate on what this implies for the science-policy interface. More research is needed on how evidence is generated and synthesized as well as how it is used for policy.
2. The credibility revolution in the electrification literature
Prior to the credibility revolution, empirical research on rural electrification had been conducted for many decades and had recurrently featured insightful studies based on various methods. Nonetheless, it is a showcase example of what the credibility revolution rightly criticized in the 2000s: many studies made some sort of causal inference based on a naive comparison of people or regions with and without access to electricity, without accounting for endogenous selection processes (see Peters, 2009). That has changed over the past 15 years or so, with an increasing number of published studies revealing more sensitivity to the problems of selection bias. The methodological portfolio first covered quasi-experimental matching and difference-in-difference designs, but increasingly also IVs and sometimes RCTs.
In fact, IVs have been used in many papers on grid-based electrification. Dinkelman (2011) and Lipscomb et al. (2013) are the earliest examples, and they have been influential and foundational for the literature. The decentralization of electricity access also facilitated randomization, so that the first RCTs appeared in the mid-2010s (Furukawa, 2014; Aklin et al., 2017; Grimm et al., 2017). RCTs for power infrastructure proved to be infeasible in most settings for political or budgetary reasons; Lee et al. (2020a) is a notable exception. Yet, for on-grid electrification, quasi-experimental methods and especially IVs continue to be the dominant identification strategies, while for off-grid solar several RCTs exist.
This wave of intense design-based research was followed by a battery of overview papers and systematic reviews (henceforth ‘reviews’) (Bernard, 2012; Peters and Sievert, 2016; Bonan et al., 2017; Jimenez, 2017; Bos et al., 2018; Morrissey, 2018; Blimpo and Cosgrove-Davies, 2019; Hamburger et al., 2019; Bayer et al., 2020; Perdana et al., 2020; Lee et al., 2020b; Jeuland et al., 2021). The research community has hence not only generated the dots on Duflo's pointillist painting but also invested in compiling what the painting shows. All these reviews diagnose a divide in the literature: one set of studies comes to very positive conclusions about the development effects of electrification, while another set observes small or no effects.
To understand the policy implication of this, the size of the effect must be assessed in relation to the costs. Here, it is important to distinguish between on-grid and off-grid electrification. Given the high cost of grid-based rural electrification, large positive effects are required to make the intervention cost-effective; merely modest positive effects would speak against the investment. Based on their finding of muted effects, Lee et al. (2020a) conclude that the investment into grid extension entails a ‘social surplus loss’. In contrast, for off-grid electrification such as small-scale solar, even modest effects can render a cost-benefit analysis positive and suggest that promoting this technology is cost-effective – because of the considerably lower investment cost (Grimm et al., 2020).
The reviews refer to several potential explanations of the divide, but in our reading, two narratives stand out: a regional divide and a methodological divide. Jeuland et al. (2021) is an insightful starting point. It does not delve into a narrative for the divide in the literature. Its main purpose, rather, is to comprehensively take stock of the literature. Jeuland et al. (2021) thereby illustrates how vast the evidence base is when a review is very inclusive. By covering a generous list of journals as well as the grey literature, it shows that the electrification literature comprises some 2,000 studies. As an extreme case, one can draw from this large pool to compile the pointillist painting, even if there is certainly a broad consensus that many of these 2,000 dots should be dismissed, for example because a study does not apply design-based methods. All other reviews employ much more exclusive selections of the literature and most include design-based studies only.
The regional narrative for the divide in the literature points to the different development potentials in different regions and target populations (see, for example, Peters and Sievert, 2016; Hamburger et al., 2019; Lee et al., 2020b). Hamburger et al. (2019) reveal that large parts of the design-based electrification literature are concentrated in just a few countries; Sub-Saharan Africa, especially, is largely ignored. Related to this, Peters and Sievert (2016) argue that the large effects observed in some Latin American and Asian countries cannot be generalized to Sub-Saharan Africa because of different economic conditions at baseline. They also provide evidence for small effects from several Sub-Saharan African countries, which contrast with the much larger effects in the pre-existing literature. In a similar vein, Lee et al. (2020b) emphasize that, historically, electrification in most industrialized countries happened while the economies were on a growth trajectory. Evidence from such contexts is hence not transferable to today's settings, where newly connected remote areas are barely integrated into economic development processes.
The methodological narrative is raised mainly in Bayer et al. (2020) and Lee et al. (2020b). Bayer et al. (2020) establish that studies using randomized designs typically deliver smaller effects than those using quasi-experimental designs. They explain this by the selection bias inherent to non-randomized methods, which inflates impact estimates. The pattern in their data is indeed striking, but an important caveat is that, with one exception, all RCTs were done on off-grid electrification technologies, not the grid. Grid extension programs are mostly evaluated using IVs, and sometimes regression discontinuity and difference-in-difference designs. Lee et al. (2020b: 131), focusing on grid electrification, point to the large number of IVs in that literature and suggest that ‘it is hard to rule out the possibility that the correlation between the instrument and the dependent variable runs through additional channels beyond electrification’.
In fact, the heavy reliance on observational data and especially IVs is conspicuous in the electrification literature, and it may introduce risks of bias. Above all, the geographic IVs that are often used in electrification evaluations, such as the land gradient or water flow, are suspected of violating exclusion restrictions because they affect the causal network through many pathways, not just through electrification, the instrumented variable. Another reason for concern is that these geographical IVs are often weak, which is not a problem per se if appropriate remedies are used. But these remedies are less effective if weakness concurs with violated exclusion restrictions (Bensch et al., 2020) and if scholars screen specifications based on first-stage strength (Ankel-Peters et al., 2023). Related to the screening aspect, IVs are suspected of being more prone to publication bias and p-hacking (Brodeur et al., 2020: 3,636), because ‘when using a non-experimental method like IV there are many points at which a researcher exercises discretion in ways that could affect statistical significance’. Relatedly, we are not aware of an IV-based study in the electrification literature with a null result (Bayer et al., 2020).
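The mechanics of this concern can be illustrated with a small simulation. The sketch below is purely stylized and not calibrated to any study in this literature; the instrument (a terrain-gradient-style variable), the direct channel from instrument to outcome, and all coefficients are hypothetical assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Stylized setup: a geographic instrument z shifts electrification status d,
# while u is an unobserved confounder of both d and the outcome y.
z = rng.normal(size=n)
u = rng.normal(size=n)
d = (0.5 * z + u + rng.normal(size=n) > 0).astype(float)  # electrification

true_effect = 1.0
e = rng.normal(size=n)
y_clean = true_effect * d + 0.8 * u + e   # exclusion restriction holds
y_violated = y_clean + 0.3 * z            # z also affects y directly

def iv_estimate(y, d, z):
    """Wald/2SLS estimator with a single instrument: cov(z, y) / cov(z, d)."""
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(iv_estimate(y_clean, d, z))     # close to the true effect of 1.0
print(iv_estimate(y_violated, d, z))  # inflated well above 1.0
```

Because cov(z, d) is small when the instrument is weak, even a modest direct channel from z to y gets scaled up by the IV formula – which is why weak instruments and violated exclusion restrictions are such a toxic combination.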
It is furthermore conspicuous that those more recent studies that find smaller effects use self-collected primary data to evaluate specific electrification interventions, irrespective of whether they are RCTs or difference-in-differences designs. This covers studies like Lee et al. (2020a), an RCT, but also Bensch et al. (2019), Chaplin et al. (2017), Masselus et al. (2024a) and Lenz et al. (2017) as well as Peters et al. (2011). We therefore raise the question of whether this evaluative setting – primary data and specific interventions under evaluation – could lead to fewer incentives to publish large effects. One reason could be that the specific interventions under evaluation are often large and well-known investments, making a null effect more interesting. Self-collected data also allows for tracking potential effects much more meticulously along a theoretical results chain (e.g., by eliciting appliance adoption, productive appliance adoption, jobs in electricity-using firms, etc.). This is not to say that such evaluations are without problems. Regional scope is limited and cooptation by funding development agencies is possible. Primary data also often covers shorter time periods (Nag and Stern, 2023; see Masselus et al. (2024a) for an exception).
In any case, the electrification literature should be evaluated in light of recent trends in the economics profession towards more transparency (Christensen and Miguel, 2018). This requires sensitivity to pre-specification and robustness replicability, as well as quantitative meta-analyses that account for potential publication bias (Andrews and Kasy, 2019; Carter et al., 2019; Irsova et al., 2024) – something that has hitherto not been done.
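To make concrete what such a quantitative meta-analysis could look like, the sketch below simulates a literature subject to selective publication and runs a simple FAT-PET (Egger-style) regression of estimates on their standard errors. All numbers are simulated for illustration; this is one of several bias-correction approaches, not a substitute for the methods cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.0
estimates, ses = [], []

# Simulated literature: positive-significant estimates (t >= 1.96) are always
# "published"; insignificant ones only with probability 0.1.
while len(estimates) < 500:
    se = rng.uniform(0.05, 0.5)        # study precision varies
    b = rng.normal(true_effect, se)    # study estimate around a true effect of 0
    if b / se >= 1.96 or rng.random() < 0.1:
        estimates.append(b)
        ses.append(se)

b, se = np.array(estimates), np.array(ses)

# FAT-PET regression b_i = beta0 + beta1 * se_i: a nonzero slope (FAT) signals
# funnel asymmetry; the intercept (PET) is a bias-corrected effect estimate.
X = np.column_stack([np.ones_like(se), se])
beta0, beta1 = np.linalg.lstsq(X, b, rcond=None)[0]
print(f"naive mean: {b.mean():.2f}")   # pulled above the true effect of zero
print(f"PET intercept: {beta0:.2f}, FAT slope: {beta1:.2f}")
```

The naive average of published estimates is inflated even though the true effect is zero, while the precision-weighted intercept recovers a value near zero – the basic intuition behind funnel-asymmetry corrections.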
3. Bayesian policymakers and reasoned intuition
The target audience of applied empirical research, according to the evidence-based policy paradigm, is policymakers. Economists have started to examine the conditions under which policymakers indeed make use of available evidence (Banuri et al., 2019; Hjort et al., 2021; Vivalt and Coville, 2023). The underlying assumption often is that the evidence provides a scientifically clear picture. In practice, though, the evidence is often murky and contradictory, and subject to debates about methodological issues. The electrification literature is a showcase example of this. It is therefore important to ask how policymakers form their beliefs.
Ideally, policymakers and we, their academic advisors, are Bayesians: We have a prior which we update as new evidence comes in. The prior's responsiveness is a function of the evidence's methodological quality. That is, the prior is firmer and less responsive to new evidence the better the already existing evidence is. Likewise, it responds more to methodologically sound new evidence. This type of thinking, though, requires repeated appraisals of the incoming evidence, and for this appraisal no standards exist. At best, these appraisals are based on experience and expertise. In other words, we must use what Basu (2014: 466) calls reasoned intuition: ‘intuition and gut feeling […] need to be held under the scanner of reason before we use them to translate experience and evidence into rules and behaviour and policy.’ Most policymakers have experience and expertise, so it is possible that reasoned intuition can work when policymakers come across new evidence.
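The precision-weighted logic described here can be written down explicitly for the textbook normal-normal case. The sketch below is a stylized illustration; the prior and the two hypothetical study estimates are invented numbers, not taken from the electrification literature.

```python
def update(prior_mean, prior_sd, est, est_sd):
    """Normal-normal Bayesian update: the posterior mean is a
    precision-weighted average of the prior and the new estimate."""
    w_prior, w_est = 1 / prior_sd**2, 1 / est_sd**2
    post_mean = (w_prior * prior_mean + w_est * est) / (w_prior + w_est)
    post_sd = (w_prior + w_est) ** -0.5
    return post_mean, post_sd

# Invented numbers: a prior of a 10% income effect (sd of 5 percentage points).
prior_mean, prior_sd = 0.10, 0.05

# A noisy new estimate barely moves the prior ...
print(update(prior_mean, prior_sd, est=0.40, est_sd=0.20))
# ... while an equally precise one moves it halfway towards the estimate.
print(update(prior_mean, prior_sd, est=0.02, est_sd=0.05))
```

The firmer the prior (the smaller prior_sd), the less any single study moves it. This is exactly the appraisal problem in the text: the weighting requires a judgment about each study's effective est_sd, including biases beyond sampling error, and for that judgment no standards exist.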
Yet, so far, we have assumed benevolent policymakers, while in practice they might have vested interests of some sort. This is, in many cases, not condemnable. For example, policymakers are typically civil servants and hence subscribe to a certain political agenda of the administration they represent. It is natural that policymakers extract from the evidence what serves their interest. A divided literature like the one on rural electrification provides the basis for confirmation bias, as empirically diagnosed by Banuri et al. (2019). In a similar vein, Vivalt and Coville (2023) provide empirical evidence for what they call ‘asymmetric optimism’: policymakers update more on good news than on bad news.
Policymakers managing electrification portfolios can have agendas. For example, major development banks have a long history of investing in large infrastructure through grants and lending, and it is understandable that they – or some of their staff members – prefer on-grid electrification over off-grid electrification. Confirmation bias and asymmetric optimism might tempt them to seize on that part of the literature that suggests substantial development effects of grid extension programs. Staff of solar advocacy organizations or private sector representatives seeking subsidies for their off-grid solar programs might, by contrast, prefer evidence suggesting only modest impacts of on-grid electrification. This would strengthen the cost-effectiveness case for off-grid technologies. The hawker's tray of the electrification literature has much evidence to offer for both camps.
An informed debate between these two camps based on reasoned intuition is hence problematic. An additional important layer of complexity is that applying reasoned intuition is harder the more prevalent methodological concerns are that are not well understood within academia. For example, academic debates do not converge when it comes to publication bias and how to account for it when making inferences. Likewise, controversies about robustness in replications and reproductions are hard to settle among replicators and original authors (Ozier, 2021; Ankel-Peters et al., 2024). And while external validity is an accepted barrier in economics between rigorous evidence and its policy relevance, the literature on how to account for it in the generalization of scientific results is nascent and so far inconclusive (Muller, 2015, 2021; Pritchett and Sandefur, 2015; Peters et al., 2018; Vivalt, 2020; Dehejia et al., 2021; Gechter, 2024). Concerns about construct validity are less widely discussed and virtually absent in the economics literature, although they are of utmost importance for generalization across supposedly similar interventions (Pritchett et al., 2013; Esterling et al., 2023; Masselus et al., 2024b). Such debates, including their ambiguous outcomes, are not a failure but rather a natural part of the scientific enterprise. Nevertheless, they do pose major hurdles for the evidence-policy interface.
4. Conclusion and way forward
In this paper, we have argued that the evidence-based policy paradigm reaches its limits in the case of rural electrification. Policymakers with vested interests of different kinds will each find support for their respective agenda. But even benevolent policymakers might get into difficulties because of unresolved methodological debates in the literature. It is overly simplistic, though, to merely blame policymakers for extracting only a partial interpretation of the evidence. Academic researchers bear part of the responsibility in that they often communicate results with what Manski (2011, 2019) calls incredible certitude.
Manski stresses that the logic of any inference is: assumptions + data ⇒ conclusions. In terms of data, the rural electrification research community deserves to be applauded for the many systematic reviews it has produced, to which we owe the consolidated understanding that this literature is divided. In terms of assumptions, though, most individual papers wishfully extrapolate (again, Manski's term) from their data to much too strong conclusions. These often heavy assumptions are only partly made transparent, and range from external validity concerns to a much weaker robustness than what is communicated in the papers.
The patterns we have diagnosed in this paper are not a peculiarity of rural electrification. Many literatures that have been subject to a myriad of design-based impact evaluations exhibit fuzzy pointillist paintings and methodological issues related to external validity and reproducibility. What are the implications for the learning model in the electrification literature and beyond? One response would be to do more and more design-based studies, accompanied by robustness replications ensuring that the right inference is being made, and hope for a clearer picture emerging in the literature soon. However, ‘the pace of politics is faster than the pace of scientific consensus formation’ (Collins and Evans, 2002: 241).
Theory-based evaluation will help to accelerate this process (Duflo et al., 2007; White, 2009). Clearly outlined theory can identify mechanisms, which are then tested in (quasi-)experiments. The hope is that such mechanisms are less context-dependent and hence more generalizable than the effects of the whole program, which is often a bundle of interventions (Ludwig et al., 2011). It is true that much of the literature on rural electrification lacks such a clear theory, and potential context-stable mechanisms such as productive use are rarely tested in a theory-grounded manner. This would require pre-specification of hypotheses, not just explorative heterogeneity analysis (which is indeed done in several studies). A clearly outlined theory would also render Manski's wishful extrapolation more difficult because the theoretical foundation would expose the assumptions underlying the extrapolation.
In the meantime, a pragmatic way forward for design-based research is to become humbler: impact evaluations could focus on informing the specific program under evaluation only and widely refrain from generalization to other contexts. Impact evaluations would then be a feature for internal program management rather than for global learning processes. Elements of this can also be found in proposals from within the credibility revolution movement (see Banerjee et al., 2017; Duflo, 2017). Yet the current reward system in academia and at funding agencies probably does not incentivize such a humbler approach.
More generally, more research is needed in the economics profession on how the science-policy interface can be improved. Absent formal evidence clearinghouses like the World Health Organization or the Intergovernmental Panel on Climate Change, policy often relies on in-house literature reviews or policy briefs, scientific advisory boards or bilateral consultations to be backed up by scientific expertise. That is fine, but policymakers need to be sensitized to the pitfalls of evidence-based policy advice outlined in this paper. Ultimately, we need a better methodology for how to organize and synthesize knowledge formation in economics – a slightly belated version of ‘studies of expertise and experience’ (Collins and Evans, 2002). This will raise many important downstream questions for the economics profession, ushering in a veritable research program.
Acknowledgements
We are grateful for valuable comments and suggestions from E. Somanathan, two anonymous referees, Gunther Bensch, Maximiliane Sievert, Colin Vance and from participants at the Sustainable Energy Transition Initiative (SETI) 2020 workshop, International Association of Energy Economics (IAEE) 2021 conference, the Power to Empower Emerging Africa 2020 workshop in Marrakesh and the 3rd Conference on Econometrics and the Environment 2020.
Competing interest
One of the authors has made several contributions to the literature under review in this paper. Beyond this, the authors declare no competing interests.