Policy Significance Statement
This commentary advocates for more rigorous empirical and policy examinations of the relationship between machine learning methodologies and conflict forecasting for civil wars. Rather than presuming machine learning as a panacea for conflict prediction, we advocate for a greater focus on interpretability during modeling and model heuristics in policymaking. If these conditions are met, we argue that predictive conflict models can help improve peacebuilding efforts by mitigating security risks facing peacekeeping operations, enabling more timely and judicious troop allocation, and testing different outcomes for negotiations during crisis diplomacy efforts.
1. Introduction
The world is witnessing a dramatic rise in the frequency and intensity of violent conflict. The year 2022 marked the bloodiest year for armed conflict since the Rwandan genocide in 1994, with the miasma of war, predominantly intra-state conflicts, spreading from Mali to Myanmar (Institute for Economics and Peace, 2023). Importantly, this swelling of violence cannot be attributed solely to any particular conflict, such as Russia’s invasion of Ukraine: the year before the onset of this war saw over 100,000 conflict-related deaths (Institute for Economics and Peace, 2023).
Many of these internal conflicts are particularly pernicious because of a failure of forecasting. When a conflict is not adequately anticipated by the international community, relevant actors cannot take the necessary steps (whether through greater humanitarian aid, a peacebuilding mission, or other support) to reduce the risk of a country’s drift into violence. Consider the resurgence of violence in Ethiopia in 2020, which was widely and fatally unanticipated by the international community. After becoming Prime Minister in 2018, Abiy Ahmed implemented scores of liberal reforms, which included securing a peace deal with Eritrea, a country with which Ethiopia had shared a tense and hostile history (Mokaddem, 2019, 1; Soliman and Demissie, 2019). The international community was generally under the impression that conflict had subsided, so much so that Ahmed was awarded the Nobel Peace Prize, with the US celebrating his “extraordinary efforts” to “advance peace and end conflict in our world” (U.S. Embassy in Ethiopia, 2019, np). Then, in November 2020, Ahmed put an end to this simulacrum of peace by sending troops into the Tigray region, paving the way for some of the most intense violence in recent Ethiopian history. The episode has been described as a “cautionary tale of how the West, desperate to find a new hero in Africa, got this leader spectacularly wrong” (Walsh, 2021, np).
Of course, the recent spread of violent conflict should not be wholly reduced to the failure of forecasting. Attention to the risk of violence is only one part of the problem; sometimes, actors are aware that a conflict is likely and still fail to respond. The widespread apathy before the onset of the 1994 Rwandan genocide is illustrative of this (Dallaire, 2009). However, even in the case of Rwanda, there is evidence of international actors’ failure to anticipate violence. Many organizations lacked the capacity necessary for prediction, with the Joint Evaluation of Emergency Assistance to Rwanda’s report claiming that the UN had “poorly-developed structures for systematically collecting and analyzing information in a manner relevant to preventive diplomacy and conflict management” despite ample evidence (Eriksson et al., 1996, np). [Footnote 1]
Is this failure of forecasting (or, at the very least, the failure to improve forecasting) a corollary of the extreme complexity of conflict (Tetlock, 2005; Chadefaux, 2017a)? Conflicts are sociopolitical events defined by highly convoluted, overlapping, and nonlinear factors, so one could argue that it is fundamentally impossible to accurately predict them. The self-immolation of a Tunisian vegetable store owner triggered the Arab Spring, the largest and most rapid wave of political protests the world has seen in centuries. The painting of graffiti on school walls by a group of teenage boys in Dara’a in 2011, and their subsequent torture by the Syrian authorities, sparked huge protests and eventually a full-scale civil war. Indeed, “the unexpected” is not only ubiquitous in world politics but seems to be built into its fabric, and perhaps this is inherently unpredictable (Taleb, 2010; Seybert and Katzenstein, 2011).
Yet, largely due to the twin developments of an explosion in the volume of data and advances in machine learning (ML) (Buchanan, 2020), it seems increasingly feasible to forecast conflicts more accurately than ever before. A growing literature suggests that there are sturdier, more repeatable patterns undergirding conflict than was previously thought, findings that can then be used to produce promising predictions (Guo et al., 2018). Ostensibly “random” onsets or inflammations of conflict may, in fact, be foreseeable, and this offers a glimmer of hope for curbing the rise of modern conflicts.
2. Overview of the literature and developments of the technology
Conflict forecasting (that is, the area of research surrounding methods for anticipating conflict) is similar to, but substantively different from, conflict modeling, which instead aims to identify the causal relationships between particular features of a nation or population and its risk of civil violence. At its most extreme, the predictive part of this field has been dismissed as “unscientific” or “pointless” (Chadefaux, 2017a, 8). More commonly, though, forecasting studies are critiqued for conflating correlation with causation (Shmueli, 2010). Unpacking the theories and causal drivers of conflict is necessary, but as Chadefaux (2017a, 8) argues, “both explanation and prediction are needed to generate and test theories”; in the last two decades, prediction has begun to assume a more prominent position as an actionable and meaningful objective. This commentary will focus primarily on the concept of civil warfare (or civil conflicts, used interchangeably here), as popularized by Fearon and Laitin (2003), who proposed a commonly accepted framework in which a civil conflict is one that causes at least 1000 total deaths, involves at least one state force, and is sustained at a minimum of 100 deaths each year (Blair and Sambanis, 2020).
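The threshold definition above can be written as a simple rule. The following is a minimal sketch in Python that takes the three conditions exactly as worded in this commentary; the function name, the input format (one death count per year of an episode), and the reading of "sustained" as a per-year minimum are assumptions made for illustration, and other operationalizations (for example, a yearly average) would give different labels.

```python
from typing import Sequence

# Thresholds as stated in the text; how they are operationalized here is an assumption.
TOTAL_DEATHS_MIN = 1000
ANNUAL_DEATHS_MIN = 100


def is_civil_conflict(annual_deaths: Sequence[int], state_force_involved: bool) -> bool:
    """Return True if a conflict episode meets the thresholds described above.

    annual_deaths holds one (hypothetical) death count per year of the episode.
    """
    if not state_force_involved or not annual_deaths:
        return False
    total_ok = sum(annual_deaths) >= TOTAL_DEATHS_MIN
    # "Sustained" is read here as: every year reaches the annual minimum.
    sustained_ok = min(annual_deaths) >= ANNUAL_DEATHS_MIN
    return total_ok and sustained_ok
```

Under this reading, an episode with yearly counts of 900, 50, and 200 would not qualify despite exceeding 1000 total deaths, which shows how sensitive the outcome label is to the coding rule chosen.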
Forecasting models are trained on data ranging from event-based data [Footnote 2], which detail daily events (such as riots and strikes), to variables that are theoretically understood to drive conflict, including democratic indices, measures of economic inequality, and climate variables like temperature change. [Footnote 3] This information is then used to train a model to predict a selected outcome variable, such as the number of conflict-related fatalities per month. Decisions about what, precisely, to predict, and when, have taken many forms. Many of these decisions are limited by the nuances of the chosen dataset. For instance, the Armed Conflict Location & Event Data Project (ACLED) is updated weekly with event-based data and fatality counts (Raleigh and Kishi, 2019), which enables researchers to model fatality counts with minimal lag time.
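As an illustration of how event-based records become an outcome variable, the sketch below aggregates individual events into fatalities per country-month. The records and the three-field schema are hypothetical and do not reproduce the format of ACLED or any other dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (country, event date, reported fatalities).
events = [
    ("A", date(2021, 1, 3), 4),
    ("A", date(2021, 1, 20), 1),
    ("A", date(2021, 2, 11), 7),
    ("B", date(2021, 1, 9), 0),
]


def monthly_fatalities(records):
    """Sum reported fatalities per (country, year, month)."""
    totals = defaultdict(int)
    for country, day, deaths in records:
        totals[(country, day.year, day.month)] += deaths
    return dict(totals)


# One possible outcome variable: fatalities per country-month.
targets = monthly_fatalities(events)
```

The choice of aggregation unit (country-month here) is itself one of the modeling decisions discussed above; a weekly or subnational unit would only require a different key.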
There has been research on predicting outcomes at various phases of conflict, from the onset of violence to its termination (Kerins and Burke, 2019; Arana-Catania et al., 2022). In this sense, the field is underpinned by the notion of negative peace and tends to focus less on how ML methods can bring about positive measures (Galtung, 1969). Several studies also aim to evaluate the change in death rates in conflict zones; this class of models focuses on forecasting the potential exacerbation of violence in areas currently at risk (Vesco et al., 2022). Others predict multi-class “conflict states” (Hegre et al., 2013), binary classifications of war or peace (Ward et al., 2010), or individual political events (Libel, 2022).
Conflict forecasting has attracted oscillating levels of skepticism and excitement over the last half-century. We draw upon Hegre et al. (2017) to briefly chronicle the field and highlight how it has been driven by changes in government interest, data ubiquity, technological sophistication, and computational capabilities.
2.1. First wave of interest (the 1960s)
In 1963, the foundational Correlates of War project was developed by Singer to accumulate quantitative knowledge about patterns of warfare (Correlates of War, 2022), and datasets produced by the project are referenced heavily (Lagazio and Russett, 2001; Gleditsch and Ward, 2013). This was bookended by work on the mathematics of war by Richardson (1960), Wright’s (1965) prediction equations, Sorokin’s (1957) “Social and Cultural Dynamics,” and Boulding’s theories on conflict from 1962 (Rummel, 1979). These academics are generally considered to be pioneers of the scientific analysis of conflict and helped usher in a new wave of data collection and quantitative study. These efforts continued more sporadically throughout the 1970s and 1980s (Gurr and Lichbach, 1986), but received considerably less attention.
2.2. Emergence of machine learning and the establishment of Political Instability Task Force (1990s)
In the late 1980s and early 1990s, the field was revitalized by researchers like Schrodt, who published several papers on the use of neural networks and other ML methods to predict interstate conflicts (Schrodt and Mintz, 1988). In 1989, King also published several pieces of work on extending event count variables to continuous models. From there, one of the notable pushes for further innovation came in 1994 with the creation of the Political Instability Task Force (PITF) under the CIA: a body of scholars from universities around the United States was convened to advise the federal government on states vulnerable to failure and instability (George Mason University, 2006). The group published several reports over the next decade outlining the causes of state failure and its implications for forecasting techniques (Goldstone and Gurr, 2000; Goldstone et al., 2010). In addition to drawing more attention to the field, the PITF reports also sparked an array of publications in response, some of which proposed more accurate statistical procedures (King and Zeng, 2001).
2.3. The modern generation (2010s–2020s)
In the early 2000s, scholars like Beck et al. (2000) and Lagazio and Russett (2001) continued to explore the use of neural networks in analyzing interstate conflicts. However, the greatest growth in the number of papers published came in the 2010s, beginning with widely cited papers like Ward et al. (2010), which challenged the assumption that statistically significant explanatory variables were well suited for use in prediction models and helped launch a shift in focus to localized indicators and sources of data.
The landscape of forecasting methods has experienced significant growth over the last 20 years, with a greater focus on model explainability (Baillie et al., 2021; Attina et al., 2022). Importantly, the period brought about a diversification of approaches. The introduction of sophisticated statistical techniques and computational models, including Monte Carlo methods, as discussed by Ward and Gleditsch (2002), and agent-based modeling (Epstein, 2012), greatly expanded the purview of conflict researchers. At the same time, the deployment of ML algorithms, such as random forest and gradient-boosted trees by Muchlinski et al. (2016), has enriched predictive capabilities through more complex, supervised learning methods. To address the temporal nature of conflict data, Markov-switching processes have been employed (Brandt et al., 2014) to better capture the dynamics of time series, reflecting the nonlinearities in many forecasting domains. Similarly, the adoption of natural language processing techniques has expanded, with Mueller and Rauh (2022) applying topic modeling to navigate the challenges of class-imbalanced data and Besse et al. (2012) utilizing N-gram models for deriving forecasts from sequential event data, among others. The integration of these diverse methods into larger ensemble models has also been a notable trend, often resulting in significant improvements in accuracy (Ettensperger, 2021). More recent, though still nascent, work has investigated whether forecasting in general might benefit from advances in transformers and large language models (LLMs). For instance, Google’s TimesFM paper introduced a transformer-based time series model with promising zero-shot performance on time series data (Das et al., 2024), and ensembles of LLMs have also been recently shown to rival the accuracy of groups of human forecasters in forecasting tournaments (Schoenegger et al., 2024).
2.4. State-of-the-art performance and major players
Beyond strictly technical advancements, a small group of specialized institutions has played a critical role in fostering research in conflict forecasting. Notably, ACLED publishes forecasts and general analysis on a monthly basis with regular regional briefs (Raleigh and Kishi, 2019). Since 2021, the Peace Research Institute Oslo has led competitions focused on predicting changes in death rates in unstable areas and other larger research initiatives, like ViEWS (Hegre et al., 2021). On the government front, the United States’ State Department Bureau of Conflict and Stabilization Operations maintains comprehensive conflict forecasting and monitoring projects like the Conflict Observatory and the Instability Monitoring & Analysis Platform (United States Department of State, 2024). Over the last decade, these efforts have spurred a wealth of research, elevated the field’s legitimacy, and generated several state-of-the-art models.
3. Technical bottlenecks and limitations
Despite the improvements in conflict forecasting systems, there remain some critical bottlenecks that must be addressed to improve the efficiency of these models. The first set of problems relates to data quality and the second to the interpretability of models.
3.1. Conflict data
While there are a host of issues that affect the quality of conflict data, this commentary cannot do justice to all of them and will focus on the primary concerns. First, key data are missing for many countries, which significantly hinders the ability to train performant models (Cederman and Weidmann, 2017). For example, Attina et al. (2022, 11) were unable to make forecasts for one-third of the countries in Africa because of “missing observations in the training set.” Similarly, researchers often find themselves unable to include certain types of data that could be potentially important determinants of violence (Racek et al., 2024). For instance, almost one in four countries around the world has not had an agricultural census for 15 years, which forces researchers to omit agricultural data from their models (Burke et al., 2021). [Footnote 4]
What makes this issue particularly difficult is that the discrepancies in the quality of the data tend to follow the chasm between wealthy and poor states. Since poorer states experience disproportionately higher levels of violent conflict than wealthier ones (Braithwaite et al., 2016) and are often caught in the “conflict trap,” this problem is especially acute (Collier and Nicholas, 2002). In half of all states in Africa, the average time between nationally representative livelihood surveys is 6.5 years, whereas in most wealthy countries, such surveys take place several times each year (Burke et al., 2021). The cost of undertaking national surveys is extremely high, and some leaders may be reluctant to carry them out in order to conceal a lack of economic progress (Burke et al., 2021). [Footnote 5] The process of collecting data relevant to conflict forecasting is also logistically difficult because of poor infrastructure and defective communications structures (Marivoet and De Herdt, 2014).
Reporting bias often leads to missing data or measurement errors. For example, the type of news outlet can have a large influence on what gets reported (Demarest and Langer, 2018). Herkenrath and Knoll (2011) found that in Argentina, Mexico, and Paraguay, the difference between national and international newspaper coverage was huge, with the latter reporting significantly fewer protest events than the former. While international sources may be less susceptible to partisan pressures, they exhibit a bias toward covering urban events over rural ones (Miller et al., 2022). Several studies detail bias from local sources too (Croicu and Kreutz, 2017), albeit with some disagreement on the effect of factors like press freedom (Drakos and Gofas, 2006; Urlacher, 2009).
Bias also stems from the methodological decisions made by the data collectors. Consider the discrepancy between how ACLED and the Uppsala Conflict Data Program’s Georeferenced Event Dataset (UCDP-GED) reported civilian deaths in Mexico in 2021. The former claimed that 6739 civilians had been killed, whereas the latter identified 28. This fissure stems from a methodological difference: ACLED counts violence by unnamed armed groups, whereas UCDP-GED does not (Raleigh et al., 2023).
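A toy example makes the mechanism concrete: the same set of events yields very different totals depending on a single inclusion rule. The records below are invented for illustration, and the rule is a deliberate simplification rather than the actual coding procedure of ACLED or UCDP-GED.

```python
# Hypothetical event records; actor=None marks an unnamed armed group.
events = [
    {"actor": "Group X", "civilian_deaths": 3},
    {"actor": None, "civilian_deaths": 40},
    {"actor": None, "civilian_deaths": 25},
    {"actor": "Group Y", "civilian_deaths": 2},
]


def count_civilian_deaths(records, include_unnamed_actors):
    """Total civilian deaths under a given (simplified) inclusion rule."""
    return sum(
        r["civilian_deaths"]
        for r in records
        if include_unnamed_actors or r["actor"] is not None
    )


inclusive = count_civilian_deaths(events, include_unnamed_actors=True)
restrictive = count_civilian_deaths(events, include_unnamed_actors=False)
```

A model trained on the inclusive count and one trained on the restrictive count would be learning from substantially different targets, even though both describe the same underlying events.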
In order to deal with the vast volume of event-based data, even a small segment of which would be incredibly challenging for humans to analyze, many organizations have opted to automate the process. While it has sped up the ingestion process, this approach has exacerbated a fourth complication: misclassification. GDELT has published billions of data points and releases new batches every 15 minutes, but with greater automation come more coding errors (Demarest and Langer, 2022). For example, ICEWS, GDELT, and Phoenix, which use machine-coded data, have loose “inclusion criteria,” casting an extremely wide net when searching for results (Raleigh and Kishi, 2019). Consequently, many events with little to no relevance are included in these datasets. In June 2019, for instance, ICEWS classified 25 events between the United States and Iran as being of comparable severity to a nuclear war (based on the “CAMEO” ontology), when in fact most of the events “capture Iran engaging in a ‘war of words’ without any conflict or threats with the US” (Raleigh et al., 2023, 6). Even more strikingly, Phoenix’s system inaccurately classified an article discussing a hippopotamus attack at Victoria Falls as a conflict between the United States and Zimbabwe (Raleigh and Kishi, 2019).
Finally, there are difficulties around duplication. At times, duplicate results can provide a valuable signal, as impactful events are likely to be discussed more widely, but when attempting to “measure changes of ‘ground-truth’ behavior,” duplication introduces real difficulties (Schrodt and Yonamine, 2012, 8). There have been efforts at deduplication, but these have generally proven insufficient. For example, GDELT checks to see whether any other documents have the same titles, but only within a 15-minute window around the time the post was first seen and within the sources it appeared in (Raleigh and Kishi, 2019). The proliferation of misinformation and disinformation, in part fueled by the same technology underpinning conflict forecasting, has heightened this problem (Bontridder and Poullet, 2021). With a higher volume of false information online, the likelihood of these machine-based data collection systems picking up on erroneous reports of events increases, resulting in misrepresentation of actions on the ground (Miller et al., 2022).
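To see why a short title-matching window is insufficient, consider the following minimal sketch of window-based deduplication. It is loosely modeled on the description above and is not GDELT's actual pipeline; the record schema, the title normalization, and the function name are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # window length taken from the text


def deduplicate(reports, window=WINDOW):
    """Drop a report if the same title was last seen within `window`.

    reports: iterable of (timestamp, title) tuples (hypothetical schema).
    """
    last_seen = {}
    kept = []
    for ts, title in sorted(reports):
        key = title.strip().lower()  # naive normalization, an assumption
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            kept.append((ts, title))
        last_seen[key] = ts
    return kept


# Hypothetical reports of a single underlying event.
reports = [
    (datetime(2024, 1, 1, 12, 0), "Clashes in X"),
    (datetime(2024, 1, 1, 12, 10), "clashes in x"),  # caught: inside the window
    (datetime(2024, 1, 1, 13, 0), "Clashes in X"),   # missed: outside the window
]
kept = deduplicate(reports)
```

In this example the repost one hour later survives as a second "event," and a reworded headline would evade the check entirely, which is the kind of inflation that complicates ground-truth measurement.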
3.2. Interpretability
In addition to the data quality challenges, several problems can arise during the modeling phase. When neural networks are used, there is often a trade-off between accuracy and interpretability (Deng and Ning, 2021). Indeed, this is a problem that is being addressed by researchers in many fields, such as AI-driven medical work and image classification (Goebel et al., 2018; Frasca et al., 2024). There have been promising developments in making these “black boxes” more interpretable (Deng et al., 2021), and an entire field known as Explainable AI has emerged to deal with the challenge (Dwivedi et al., 2023). Nevertheless, precisely which variables these systems lean on the most to make their predictions still tends to be hidden (Brandt et al., 2022). Amarasinghe et al. (2023) distinguish between intrinsically interpretable and opaque models, the latter of which can become more explainable if post hoc methods are introduced. This opacity poses a problem across a range of policy domains where ML is used, and the field of conflict forecasting is no different. If policymakers cannot understand why a country is likely to experience changes in the levels of violence (that is, which variables are driving the prediction), there will be insufficient trust and minimal political will to take action (Sunstein, 2023).
However, it seems increasingly possible to balance the trade-off such that the forecasts are simultaneously accurate and understandable. Montgomery et al. (2012) make the case for using ensemble Bayesian model averaging (EBMA) in the social sciences, an approach that pools and then averages predictions across multiple models, arguing that it improves out-of-sample forecasting. Ward and Beger (2017) utilized EBMA to generate 1- and 6-month predictions of conflict, and were able to interpret the conflict drivers while also achieving an area under the ROC curve (AUC) score of 0.823, although the quality of this measure depends on the specifications of the target variable. Colaresi et al. (2016) also applied EBMA to the challenge of conflict forecasting and found that while the precision was slightly lower than that of the best-performing model, their chosen model was able to balance true positives and negatives better. Their method performed well at forecasting several spikes in conflict in January 2012, namely in South Sudan and Somalia. Still, EBMA’s effectiveness in explainability remains relatively contingent on the transparency of its constituent models, suggesting the need for further work.
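The pooling step and the AUC metric can both be sketched briefly. The code below is a simplified illustration, not EBMA itself: in EBMA the model weights are estimated from each model's performance over a calibration period, whereas here they are fixed by hand. The labels, model outputs, and function names are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_average(predictions, weights):
    """Combine per-model probability lists using normalized weights."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*predictions)
    ]


# Hypothetical data: 1 = conflict observed, 0 = no conflict.
labels = [0, 0, 1, 1]
model_a = [0.10, 0.40, 0.35, 0.80]
model_b = [0.20, 0.10, 0.60, 0.70]

# Hand-picked equal weights (an assumption, not estimated as in EBMA).
ensemble = weighted_average([model_a, model_b], weights=[0.5, 0.5])
```

Because the ensemble is a transparent weighted sum, its output can be traced back to the component models, which is also why its explainability depends on how interpretable those components are.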
More recently, in terms of interpretability, Attina et al. (2022) used a dynamic elastic net (DynENet) to predict the number of fatalities caused by state-based conflict each month. The value of this adaptive approach is that each country is modeled individually, and the model is able to select the most relevant variables (out of 700 available). It was then possible for countries with similar conflict drivers to be grouped together, further improving the heuristic function of the research. This did come with a slight sacrifice in terms of accuracy: DynENet performed seventh best out of the tested models in terms of Mean Squared Error. However, it remained “well above the median performance of competing models on 12 out of 13 evaluation metrics,” demonstrating the possibility of balancing these two considerations (Attina et al., 2022, 13).
Although correlation does not prove causation, the deployment of these models has still provided some valuable heuristic insights about the contours of conflict. First, at a basic level, these models have lent credence to the “conflict trap,” which is the notion that countries or regions that have experienced conflict have a high likelihood of experiencing more conflict in the future (Collier and Nicholas, 2002). Simply put: conflict brings about conditions that are conducive to further conflict. Mueller and Rauh (2022) find that when one episode of conflict ends, the likelihood that another will start immediately after is 30%, but after 10 years without any conflict, the risk of a conflict breaking out is 0.5%. The corollary of this is that in the vast majority of cases, outbreaks of conflict can be accurately predicted simply by analyzing recent levels of conflict, but predicting these cases is less valuable as they are more generally anticipated by policymakers. It is the instances of conflict that exist outside of the conflict trap, which happen without clear and recent precedents of violence, like the violence in Tigray in 2020, that are “very unlikely and hard to forecast,” but possess the most potential policy impact because they are the most destabilizing cases (Mueller and Rauh, 2022, 2447).
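The point about recent conflict history can be illustrated with a naive baseline that predicts conflict whenever any conflict occurred in a recent window. This is a hypothetical sketch for intuition only, not the method of Mueller and Rauh (2022); the function name and the 12-period default are assumptions.

```python
def history_baseline(conflict_history, lookback=12):
    """Predict conflict next period if any occurred in the last `lookback` periods.

    conflict_history: sequence of 0/1 flags, oldest first (hypothetical input).
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    return any(conflict_history[-lookback:])
```

By construction, such a baseline always predicts peace for a country with a long quiet history, so it can never flag the onsets without recent precedent that matter most for policy.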
The deployment of these models has also revealed the continued importance of the relationship between physical geography and conflict, perhaps pushing back against those claiming that war has entered a “post-physical” era (Gregory, 2011; Möllers, 2021). For instance, Aquino et al. (2019) mapped the geospatial network of cities in an attempt to predict conflict. Since they used a dynamical model, they were able to see which connections between geographic nodes were most responsible for changes in violence (i.e., get some sense of the main driving factors). For example, they found that in Somalia, where their model predicted new violent events with 95% accuracy, the city of Burr Gaabo went from being in the state of “war” to “peace” when the connection with Kenia, another geographic node, went from “enemy” to “ally.” Put differently: it was the hostile relationship between the two locations that was largely responsible for the high risk of violence.
The final way in which these more interpretable models have served a heuristic function is through detailing the complex relationship between environmental factors and conflict. Scheffran et al. (2023) yoke together two strands of research, “tipping points” in risk/conflict and cooperation under conditions of climate stress, to investigate how climate factors could alter the risk of conflict. They deduce that poor quality governance exposes countries to “climate-induced tipping,” using the case of Lake Chad as an example, whereas having a robust civil society can act as a bulwark against climate-driven conflict (Scheffran et al., 2023, 12). This type of insight might help peacemakers who are operating in areas that are being acutely impacted by climate change, such as the Sahel. Overall, the field of conflict forecasting must continue to emphasize interpretability in addition to accuracy. It is only through understanding the determinants of conflict, as well as generating the predictions themselves, that the field stands to significantly support the work of policymakers and peacebuilders. [Footnote 6]
4. Risks and opportunities in policy implementation
Conflict forecasting is becoming a tangible reality in the policy domain, especially for the United Nations Peacekeeping Operations (PKOs). The Secretary-General stressed in his 2020–2022 Data Strategy report that predictive peacekeeping would bolster forecasts of armed violence, enable more accurate strategic decision-making, and encourage timely deployments of boots on the ground (United Nations, 2020). Yet the state of the field remains somewhat opaque, since early warning is only effective insofar as it leads to action rather than words alone. Broadly, the impact of predictive ML for PKOs should be gauged not only through how interpretable the analysis is, but also through the direct effect that this analysis has on enduring peacekeeping missions (Druet, 2021).
In this regard, much of the potential for conflict forecasting in PKOs stems from the use of The Situational Awareness Geospatial Enterprise (SAGE) database, which serves as the backbone event and incident tool for most UN peacekeeping missions (Druet, Reference Druet2021). The use of this data would be particularly useful for training predictive ML models; rather than training on a corpus of irrelevant data, which drains both labor and resources, models can be refined using mission-specific data with significantly greater predictive capacity. The UN’s Joint Mission Analysis Centre database in Darfur, for instance, contains troves of high-quality data on troop movements, anecdotal evidence from local informants, new rebel splits, environmental factors, and even positive measures for peace such as ceasefire talks and diplomatic engagements (Galtung, Reference Galtung1969; Duursma and Karlsrud, Reference Duursma and Karlsrud2019). To this end, the UN PKOs in Mali (MINUSMA) have succeeded in deploying predictive awareness analysis with mortar detection equipment to preemptively address threats against troops in many of Kidal’s most violent hotspot regions (Druet, Reference Druet2021). Here, lies one of the key opportunities of using predictive forecasting: the ability to identify threats against PKOs and strengthen camp security. The risk of insurgency against UN PKOs remains one of the key reasons for operational inefficiency, with 13 out of 24 UN civil conflict PKOs being attacked by rebel groups from 1989 to 2003 and peacekeeping troop fatalities drastically rising in MINUSMA to become the second most deadly mission in UN history (Salverda, Reference Salverda2013; Henke, Reference Henke2016; Rietjens and de Waard, Reference Rietjens and de Waard2017; United Nations Peacekeeping, 2024). 
Underpinned by contextualized SAGE data, conflict forecasting has vast potential to reduce operational risks during PKOs and ultimately pave the way for safer and more effective peacekeeping endeavors in the future.
Monitoring highly dynamic conflicts through forecasting is also important for making prudent choices about troop deployment. PKOs are often blamed for impotence due to their lack of presence beyond military bases, but they are also faced with deployment issues when conflicts spill over into noncontiguous geographic space (Duursma and Karlsrud, 2019). Consider, for instance, that just under half of all insurgency attacks in Darfur take place over 100 kilometers from the nearest peacekeeping camp (Duursma and Karlsrud, 2019). Evidently, operational range is an issue that plagues UN PKOs, and consequently predictive geospatial data can be particularly useful when making decisions on dynamically reallocating troops (Tuvdendarjaa, 2022), especially because the “when” question is just as important as the “where” question in conflict prediction, as discussed by the International Crisis Group in a report on PKOs in Sudan (International Crisis Group, 2005).
The third avenue of opportunity for ML in conflict forecasting lies in applications for “deep conflict resolution,” a term coined by Olsher (2015) to encapsulate more holistic approaches to peacebuilding involving local knowledge, social psychology, and stakeholder values. At present, there continue to be cognitive limitations in conflict resolution that often prevent the realization of lasting stability, namely insufficient expertise, groupthink within military communities, and ethnocentric biases, particularly in contexts where PKOs do not have sufficient time to learn the cultural specificities of the regions within which they operate (Olsher, 2015). Duffey (2000, 165), for example, argues that a lack of cultural and linguistic understanding contributed to the failure of the UN Operation in Somalia II mission, and that “improved efforts must be made toward understanding the cultural issues at all levels of interpersonal interaction and process implementation.” In such cases, with continued development, predictive ML tools like Olsher’s (2015, 282) cogSolv can be used to simulate and forecast different stakeholder views of the world based on field experts’ cultural models and real-time conflict data, allowing peacekeeping personnel to “find negotiation win-wins…, avoid offense, provide peacekeeping decision tools, and protect emergency responders’ health.” Importantly, this should not be used to further an at-a-distance foreign policy, where PKOs might be more inclined to manage from afar and therefore lessen “the ability to interact, understand and empathize with local populations” (Duursma and Karlsrud, 2019, 13). 
By working in line with gradual reduction in tensions theory, predictive negotiation tools can be used by civil society NGOs, UN civil affairs officers, and international diplomats to work alongside local communities in culturally and politically complex environments to maximize outcomes and foster lasting stability and peace (Duursma and Karlsrud, 2019).
Some key risks and obstacles remain barriers to the successful implementation of ML tools in peacebuilding. The growing focus on open-sourcing data is particularly instructive in the context of UN PKOs, since a secondary analysis of mission success could also involve sharing data internally between missions to learn from and analyze best practices and failures (Druet, 2021). Coupled with developments in reinforcement learning algorithms, a strong culture of data sharing could help deliver critical insights for optimizing missions and resources. However, many mission leaders continue to create internal friction when asked to provide data to UN headquarters, protesting that performance metrics lack context or could leak sensitive information (Druet, 2021). This “paradox of information ownership and sharing” within UN PKOs continues to impede the deployment of ML in effectively forecasting and responding to conflict (Druet, 2021, 17).
Researchers must also conduct dimension reduction at some point in the model life cycle, introducing the risk that personal biases become baked into model taxonomies. Most researchers try to follow the principle of Ockham’s razor (that the best model is the one with the fewest assumptions) and attempt to achieve this by reducing the number of features, and thus assumptions about causality (Wainwright and Mulligan, 2013; Piasini et al., 2023). Doing so can cause many issues in the context of conflict forecasting. For example, when analysts working on the UN PKO in the Congo (MONUSCO) attempted to amalgamate a large number of Mayi-Mayi militia groups under a single title, this caused issues further downstream in attributing attacks to perpetrators and led to communities being wrongfully accused of insurgency (Druet, 2021; United Nations, 2024).
Perhaps the most discussed risk in the conflict forecasting literature is that of information security and adversarial actors. Despite the sizeable human resources available, the UN and other peacebuilding bodies are not well-resourced enough to mitigate intrusions from highly sophisticated cyber adversaries (Druet, 2021). Because the UN is a supranational organization composed of many competing national interests, there is always a latent risk that training alone cannot obviate different national allegiances and, as a result, “states do not wish to share secrets with all the countries in the world and refuse to allow other states to send troops to spy on their own governance” (Martin-Brûlé, 2021, 494). This problem becomes especially acute in situations where misinformation and disinformation campaigns arise to splinter factions in UN PKOs, such as in the Central African Republic, which risks confidential data on informants and predictive analysis being leaked to direct adversaries who intend to further promote conflict (Druet, 2021). The logical response to this risk is to underscore the importance of data privacy, all the more so given that a significant amount of SAGE data is collected from local informants whose personal security is always jeopardized by insurgency groups (Druet, 2021).
The final risk in policy implementation is expectation management. Many of the challenges associated with using forecasting models in policy are rooted in the unique expectations placed upon the field of geopolitics, as analysts are often counted on to give a prophetic view into the future and to generate a wholly deterministic “upstream” understanding of events that have not yet happened (Gentry and Gordon, 2019). Politically speaking, this creates a danger that conflict forecasting will be appreciated only for its validation (that is, the predictive accuracy of models) rather than verification, which is “the process by which the model is checked to make sure that it is solving the equations correctly” (Clifford and Valentine, 2003, 286; Baillie et al., 2021). Gentry and Gordon (2019) discuss this risk with the “batting-average” metaphor used in intelligence communities, where analysts are measured on their overall frequency of “hits” rather than their analytic rigor, which only fosters negligence by focusing solely on accuracy instead of notions of error and interpretability (Jervis, 2010). To address this, Mueller and Rauh’s (2022) ML cost-based intervention framework uses a conflict weighting system to allocate degrees of emphasis on some potential false positive scenarios over others based on their possible scale of harm, allowing decision-makers to take fuller stock of developing cases that could rapidly escalate. Therefore, a sufficiently cautious application of forecasting can avoid the dangers of wrongfully viewing predictive models as a panacea for intelligence forecasting (Musumba et al., 2021). 
If used in concert with human agency while attempting to address the risks of implementation, forecasting models can be decisive tools for predicting, preventing, and responding to violent conflict.
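To make the logic of cost-weighted decision-making concrete, the following is a minimal sketch of how a forecast probability might be compared against a harm-dependent threshold. It is not Mueller and Rauh’s (2022) actual framework: the decision rule, the function and field names, and all numbers are illustrative assumptions. The rule assumes that waiting carries an expected cost of p × H (forecast probability times harm if conflict is missed), while intervening costs C and averts an assumed share e of that harm, so intervention is preferable when p exceeds C / (e × H).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A hypothetical forecast for one country-month (all fields are assumptions)."""
    name: str
    p_conflict: float      # model's predicted probability of conflict onset
    harm_if_missed: float  # assumed harm of an unanticipated conflict (arbitrary units)


def intervention_threshold(intervention_cost: float,
                           harm_if_missed: float,
                           share_averted: float) -> float:
    """Probability above which intervening has a lower expected cost than waiting.

    Waiting costs p * H in expectation; intervening costs C + p * (1 - e) * H.
    Intervening is cheaper when p > C / (e * H). The result is capped at 1.0.
    """
    if intervention_cost <= 0 or harm_if_missed <= 0:
        raise ValueError("costs and harms must be positive")
    if not 0 < share_averted <= 1:
        raise ValueError("share_averted must lie in (0, 1]")
    return min(1.0, intervention_cost / (share_averted * harm_if_missed))


def should_intervene(case: Case, intervention_cost: float,
                     share_averted: float) -> bool:
    """Flag a case when its forecast probability reaches its own threshold."""
    threshold = intervention_threshold(
        intervention_cost, case.harm_if_missed, share_averted
    )
    return case.p_conflict >= threshold


if __name__ == "__main__":
    # Two hypothetical cases with the same forecast but different potential harm.
    cases = [
        Case("high-harm case", p_conflict=0.10, harm_if_missed=100.0),
        Case("low-harm case", p_conflict=0.10, harm_if_missed=10.0),
    ]
    for case in cases:
        flagged = should_intervene(case, intervention_cost=2.0, share_averted=0.5)
        print(f"{case.name}: intervene={flagged}")
```

Under these assumed numbers, two cases with an identical 10% forecast receive different recommendations because the potential scale of harm differs, which is the sense in which a cost-weighted rule shifts attention away from raw predictive accuracy and toward the consequences of error.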
5. Discussion
Even if comprehensive and granular data encompassing a range of the drivers of civil war existed, it would be impossible to perfectly predict conflict, since it is a complex sociopolitical phenomenon laden with interlocking and nonlinear variables. However, in this commentary, we argue that ML can help forecast violent conflict with a meaningful degree of accuracy, and that these forecasts can, and should, be used to inform foreign policy and peacebuilding decisions. While we advocate for greater use of AI systems in conflict forecasting, we also encourage caution by emphasizing some critical considerations spanning both the technical process of building these systems and the policy implementation stage. The data used to train these models, including both event-based data and data that capture the determinants of conflict, such as economic indicators and climate variables, remain the key barrier to progress. Clausewitz’s claim, made 200 years ago, that “casualty reports on either side are never accurate, seldom truthful, and in most cases deliberately falsified” continues to hold true (Clausewitz, 1976 [1832], 234). Moreover, when building these models, there should be a strong emphasis on interpretability, even if it comes at a slight cost to accuracy. Understanding why a model presents specific predictions facilitates one of the key benefits of the deployment of this technology: the heuristic function. The use of these ML models has already highlighted some key characteristics of conflict in the 21st century, such as the continued presence of the conflict trap, the importance of physical geography, and the complex relationship between environmental factors and conflict. This improved understanding of conflict can then inform the protection of PKO personnel, troop deployment, and deep conflict resolution. 
What forecasting efforts can teach policymakers and peacebuilders about the character of conflict in the 21st century is of comparable value to the predictions themselves.
Provenance
This article was submitted for consideration for the 2024 Data for Policy Conference to be published in Data & Policy on the strength of the Conference review process.
Acknowledgments
The authors are grateful for the valuable feedback provided by Dr. Michael Kenwick (Rutgers University) and Dr. Juan Luis Manfredi Sánchez (Georgetown University).
Data availability statement
No original data was used in the production of this article.
Author contribution
Conceptualization: M.M.; E.S.; K.H. Methodology: M.M.; E.S.; K.H. Writing original draft: M.M.; E.S.; K.H. All authors approved the final submitted draft.
Competing interest
The authors declare no competing interests exist.