1. Introduction
Actuaries have used econometric and statistical models for decades, and just as statistical learning has fundamentally changed the way predictive models are built, actuaries have had to adapt to these new techniques. Neural networks, a concept rooted in the 1940s and inspired by the structure of the human brain, have surged in recent decades with the advent of massive data, enabling increasingly sophisticated architectures that capture more complex effects. This progress has made it possible to exploit, in practice, the universal approximation theorem, which had previously been of mainly theoretical interest. The 2024 Nobel Prize in Physics, awarded to Hopfield and Hinton, underscores their pivotal role in this revolution.
But the arrival of these artificial intelligence (AI) and machine learning models has not been without its problems. Breiman (2001) spoke of a cultural difference between data modelers and algorithmic modelers, but the difference is more profound. Econometric and statistical models are deeply probabilistic, whereas learning algorithms are not. In a Support Vector Machine (SVM), we try to place a separating plane in a cloud of points, based on distance, and if this allows us to separate images of dogs and cats, or individuals who are sick from those who are not, the question of the probability of belonging to a given group rarely arises. Yet it is precisely this quantity that actuaries need in order to construct a tariff. The actuary is not trying to predict who will die in a life insurance portfolio, but to estimate, as accurately as possible, the probability of death for each individual. Recent advances in insurance predictive analytics have brought new challenges, such as handling high-cardinality features, incorporating Poisson and Tweedie loss functions into machine learning models, and enforcing smoothness and monotonicity constraints.
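To make the distinction between classification and probability estimation concrete, the following minimal sketch (using scikit-learn on simulated data with hypothetical rating factors; it is an illustration, not a method discussed in this issue) contrasts the hard label returned by an SVM with the calibrated probability an actuary needs for pricing.

```python
# Illustrative only: a hard classifier returns a label, while pricing needs P(event | x).
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # hypothetical rating factors
p = 1 / (1 + np.exp(-(-2.0 + X @ np.array([0.8, 0.5, -0.3]))))
y = rng.binomial(1, p)                               # simulated death indicator

svm = SVC(kernel="linear").fit(X, y)                 # separating hyperplane: hard labels only
glm = LogisticRegression().fit(X, y)                 # probabilistic model: P(death | x)

x_new = X[:1]
print(svm.predict(x_new))                            # a 0/1 decision
print(glm.predict_proba(x_new)[:, 1])                # an estimated death probability
```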
As a highly regulated industry, insurance often requires models to be explainable, enabling regulators and stakeholders to understand the basis for decision-making. However, emerging machine learning and AI models are usually too complex and opaque to meet these explainability standards. This has created a need for new models and techniques that can harness the predictive power of these black-box models while maintaining the transparency and interpretability that insurance demands.
Issues of discrimination and fairness in insurance have long been debated. Yet AI and Big Data have added layers of complexity, as opaque algorithms and proxy discrimination introduce new concerns. Addressing these challenges requires multi-perspective and cross-disciplinary collaboration. Importantly, even to start addressing these challenges, interpretability and explainability are fundamental prerequisites.
To encourage further research in this area and support recent innovations, the Annals of Actuarial Science (AAS) has launched a special issue titled “Insurance Analytics: Prediction, Explainability, and Fairness.”
2. Predictive analytics in insurance
Statistical models have long been used in insurance, but their use raises profound epistemological questions. Von Mises (1939) explained that the “probability of death” applies to a group or class of individuals, not to any single person, as it has no meaning when referring to an individual, even with detailed knowledge of their life and health. In the frequentist approach, probabilities are constructed as asymptotic limits of frequencies, grounded in the law of large numbers. By reasoning in terms of “homogeneous risk classes,” actuaries historically relied on robust statistical techniques, both mathematically and philosophically, to rate policyholders. However, modern machine learning techniques now allow for pricing individual risks and personalizing premiums with increasing granularity. This shift introduces new challenges when applying advanced analytics in insurance, including managing high-cardinality features, classifying policyholders into unique subgroups, and incorporating Poisson and Tweedie deviance loss functions in boosting and tree-based methods.
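For completeness, the Poisson deviance and, for power parameter $1<p<2$, the Tweedie unit deviance referred to above take their standard forms:

$$ D_{\text{Pois}}(\boldsymbol{y},\hat{\boldsymbol{\mu}}) = 2\sum_{i=1}^{n}\left[y_i\log\frac{y_i}{\hat{\mu}_i}-(y_i-\hat{\mu}_i)\right], \qquad d_p(y,\mu) = 2\left[\frac{y^{2-p}}{(1-p)(2-p)}-\frac{y\,\mu^{1-p}}{1-p}+\frac{\mu^{2-p}}{2-p}\right], $$

with the convention $y_i\log y_i=0$ when $y_i=0$; boosting and tree-based methods then minimize these deviances in place of the squared-error loss.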
In this special issue, we present four papers that focus on predictive analytics in insurance.
Campo & Antonio (2024) proposed the data-driven Partitioning Hierarchical Risk-factors Adaptive Top-down (PHRAT) algorithm, which reduces hierarchically structured risk factors to their essence by grouping similar categories at each level. They also utilized embeddings of textual descriptions of economic activities to aid the grouping of categories used as inputs.
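As a generic illustration of that last idea (and not the PHRAT algorithm itself), one can embed the textual description of each category and cluster the embeddings to merge similar categories; the sketch below uses a simple TF-IDF representation as a stand-in for richer embeddings, with invented activity descriptions.

```python
# Generic illustration (not PHRAT): merge high-cardinality categories by clustering
# vector representations of their textual descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

descriptions = {                                     # hypothetical activity descriptions
    "A01": "growing of cereals and other crops",
    "A02": "raising of dairy cattle",
    "C10": "manufacture of bakery products",
    "C11": "manufacture of beverages",
}
codes = list(descriptions)
emb = TfidfVectorizer().fit_transform(descriptions.values())    # stand-in for richer embeddings
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(dict(zip(codes, labels)))                      # e.g. {"A01": 0, "A02": 0, "C10": 1, "C11": 1}
```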
Lee & Jeong (2024) modified the alternating direction method of multipliers (ADMM) for subgroup analysis to classify policyholders into unique groups. They interpreted the credibility problem using both random effects and fixed effects, which correspond to the ADMM approach and the classical Bayesian approach, respectively.
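The flavour of such subgroup analyses can be conveyed by a generic fusion-penalized objective (an illustrative formulation under standard assumptions, not necessarily the exact one used in the paper):

$$ \min_{\mu,\,\beta_1,\dots,\beta_n}\;\sum_{i=1}^{n}\ell\!\left(y_i;\,\mu+\beta_i\right)+\lambda\sum_{i<j}p\!\left(|\beta_i-\beta_j|\right), $$

where $\ell$ is a loss such as a deviance, $\beta_i$ is a policyholder-specific effect, and the penalty $p$ fuses similar effects so that policyholders with identical estimated effects form a subgroup; the non-separable pairwise penalty is what makes ADMM a natural optimization algorithm here.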
Willame et al. (2024) reviewed the use of boosting under the Poisson deviance loss function and log-link (following Wüthrich & Buser, 2019) and applied boosting with cost-complexity pruned trees to Tweedie responses (following Huyghe et al., 2022). They also introduced a new Boosting Trees package in R designed for insurance applications.
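For readers who want to experiment, the sketch below shows an analogous Poisson boosting fit using scikit-learn on simulated data; it is only an illustration of the loss and link choice, not the authors' R package.

```python
# Illustrative analogue: gradient boosting with a Poisson loss for claim frequencies.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 4))                        # hypothetical rating factors
lam = np.exp(-2.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1])    # true expected frequency (log-link scale)
y = rng.poisson(lam)                                  # simulated claim counts

model = HistGradientBoostingRegressor(loss="poisson", max_iter=200, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:5]))                           # fitted expected claim frequencies
```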
Wu et al. (2024) extended the traditional Lee-Carter model using Kernel Principal Component Analysis (KPCA) to enhance mortality rate predictions. They demonstrated the robustness of this model, particularly during the COVID-19 pandemic, showing its superior performance in volatile conditions.
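For reference, the classical Lee-Carter structure being extended is

$$ \log m_{x,t} = a_x + b_x\,k_t + \varepsilon_{x,t}, \qquad \sum_x b_x = 1, \quad \sum_t k_t = 0, $$

where $m_{x,t}$ is the central death rate at age $x$ in year $t$; classically, the age and period terms are extracted by a linear principal component (singular value) decomposition of the centered log-rates, and the KPCA extension, broadly speaking, kernelizes this dimension-reduction step.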
3. Explainability and interpretability
Insurance, as a high-stakes business, faces stringent regulatory requirements, particularly regarding explainability and interpretability. Traditional statistical models, such as Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs), are typically more interpretable than modern machine learning models like Gradient Boosting Machines, Random Forests, or Neural Networks. While a range of interpretability tools have been developed to increase transparency in these black-box models, they have not been without criticism (Hooker et al., 2021; Rudin, 2019; Xin et al., 2024). Balancing regulatory demands for explainability with the use of advanced machine learning models has become a pressing challenge for the insurance industry. Recent literature emphasizes the growing importance of model interpretation and transparency in this field, as highlighted by Aas et al. (2021), Delcaillau et al. (2022), and Richman & Wüthrich (2023).
In this special issue, we present four papers that focus on explainability and interpretability in insurance analytics:
Jose et al. (2024) developed a zero-inflated Poisson neural network (ZIPNN) following the combined actuarial neural network (CANN) approach to model admission rates. They extended this with zero-inflated combined actuarial neural network (ZIPCANN) models and adopted the LocalGLMnet method (Richman & Wüthrich, 2023) to interpret the models.
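The zero-inflated Poisson distribution underlying these models has the standard form

$$ \Pr(Y=0)=\pi+(1-\pi)e^{-\lambda}, \qquad \Pr(Y=y)=(1-\pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y=1,2,\dots, $$

where, in a neural-network formulation, both the zero-inflation probability $\pi$ and the Poisson mean $\lambda$ can be modelled as functions of the covariates, the CANN idea being to let the network act as a correction to a GLM baseline.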
Lindholm & Palmquist (2024) proposed a method for constructing categorical GLMs guided by information derived from a black-box predictor. They use partial dependence (PD) functions to create covariate partitions based on the black-box predictor, followed by an auto-calibration step and a lasso-penalized GLM fitting.
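For a fitted black-box predictor $\hat f$, the empirical PD function of covariate $j$ takes the standard form

$$ \widehat{\mathrm{PD}}_j(x_j)=\frac{1}{n}\sum_{i=1}^{n}\hat f\!\left(x_j,\boldsymbol{x}_{i,-j}\right), $$

where $\boldsymbol{x}_{i,-j}$ denotes the observed values of the other covariates for record $i$; partitioning the range of each covariate according to the levels of its PD function yields the categorical structure on which the GLM is then fitted.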
Maillart & Robert (2024) explored an approach to estimate a GAM with non-smooth feature functions. This method distills knowledge from an Additive Tree model, partitions the covariate space, and fits a GLM using binned covariates for each decision tree, followed by an ensemble approach and final GLM fitting for auto-calibration.
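A schematic sketch of the binning step (an illustration using scikit-learn, not the authors' procedure): a shallow tree trained on black-box predictions discretizes a covariate, and a GLM is then fitted on the resulting bins.

```python
# Schematic illustration: bin a covariate with a shallow tree trained on black-box
# predictions, then fit a Poisson GLM on the resulting bins.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(2)
x = rng.uniform(18, 80, size=(3000, 1))               # hypothetical covariate (e.g. age)
blackbox_pred = np.exp(-3.0 + 0.02 * x[:, 0])         # stand-in for a black-box frequency model
y = rng.poisson(blackbox_pred)                        # simulated claim counts

binner = DecisionTreeRegressor(max_leaf_nodes=5).fit(x, blackbox_pred)
leaf = binner.apply(x)                                # leaf index = non-smooth bin of the covariate
design = (leaf[:, None] == np.unique(leaf)[None, :]).astype(float)[:, 1:]  # dummy-coded bins
glm = PoissonRegressor(alpha=1e-6).fit(design, y)     # GLM on the binned covariate
print(np.round(glm.predict(design[:5]), 4))           # piecewise-constant fitted frequencies
```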
Richman & Wüthrich (2024) introduced ICEnet, a method that enforces smoothness and monotonicity constraints in deep neural networks. To train neural networks with these constraints, they augment datasets to produce pseudo-data that reflect the desired properties. A joint loss function is used to balance accurate predictions with constraint enforcement.
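A schematic, framework-agnostic sketch of the general idea (under our own simplifying assumptions, not the ICEnet implementation): each record is replicated over a grid of ages to form pseudo-data, and the joint loss adds monotonicity and smoothness penalties on the resulting prediction profiles to the usual Poisson deviance.

```python
# Schematic sketch only: a joint loss combining Poisson deviance with monotonicity
# and smoothness penalties evaluated on pseudo-data built along the age dimension.
import numpy as np

def joint_loss(predict, X, y, ages, w_mono=1.0, w_smooth=0.1):
    """predict(X) -> expected claim frequency; column 0 of X is assumed to be age."""
    # prediction term: Poisson deviance (terms constant in the model dropped)
    mu = predict(X)
    dev = 2 * np.mean(mu - y + np.where(y > 0, y * np.log(np.maximum(y, 1e-12) / mu), 0.0))

    # pseudo-data: replicate every record over a grid of ages, other features unchanged
    grid = [np.column_stack([np.full(len(X), a), X[:, 1:]]) for a in ages]
    profiles = np.stack([predict(g) for g in grid])          # shape: (n_ages, n_records)

    mono = np.mean(np.maximum(profiles[:-1] - profiles[1:], 0.0) ** 2)  # penalize decreases in age
    smooth = np.mean(np.diff(profiles, n=2, axis=0) ** 2)               # penalize curvature in age
    return dev + w_mono * mono + w_smooth * smooth
```

In practice, the penalties would be computed inside an automatic-differentiation framework so that they propagate gradients to the network weights; the NumPy version above only illustrates the structure of the loss.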
4. (Algorithmic) fairness
As in other industries, insurers are redefining their practices with the rise of Big Data and advanced AI algorithms, enabling them to detect previously unknown patterns, incorporate more rating factors, improve predictive accuracy, and move toward more granular risk classification. While these technologies expand the scope of what is possible, they do not fundamentally change the longstanding issues of insurance discrimination. In fact, in this rapidly evolving landscape, old challenges are becoming more pronounced. Concerns about indirect discrimination and the use of algorithmic proxies are growing, as insurers increasingly leverage vast datasets and sophisticated models.
Avraham (2017) argued that insurance faces unique moral and legal challenges. While policymakers seek to prevent discrimination based on factors like race, gender, and age, the insurance business inherently involves distinguishing between risky and non-risky insureds, which often correlates with those sensitive characteristics. Actuaries must remain vigilant to these issues and actively contribute to solutions that mitigate the risks of discrimination.
Recent research has begun addressing these issues for insurance applications from multiple perspectives (ethical, actuarial, statistical, economic, and legal), with contributions from Prince & Schwarcz (2019), Baumann & Loi (2023), Lindholm et al. (2022, 2024), Frees & Huang (2021), Xin & Huang (2023), Barry & Charpentier (2023), Charpentier (2024), Araiza Iturria et al. (2024), and Fahrenwaldt et al. (2024), among others.
We encourage more contributions to this crucial area of research, addressing the ongoing challenges of discrimination and fairness in insurance via multidisciplinary research and collaborations.
5. Conclusion
The integration of advanced analytics, machine learning, and AI into the insurance industry has presented significant opportunities to enhance predictive accuracy, streamline operations, and deliver more personalized services. However, with these advances come complex challenges, particularly in maintaining the explainability and fairness of increasingly opaque models. As insurers adopt these powerful tools, the responsibility to ensure ethical and responsible use becomes even more critical.
The papers in this special issue of the Annals of Actuarial Science highlight cutting-edge approaches to prediction and explainability in insurance analytics. They collectively demonstrate the potential of advanced methods to address industry challenges, but they also emphasize the need for further research to reconcile the power of these models with business, regulatory, and ethical considerations. We encourage continued research contributions to this critical area.
Data availability statement
Data availability is not applicable to this article as no new data were created or analyzed in this study.
Funding statement
There was no external funding.
Competing interests
None.