
Distill knowledge of additive tree models into generalized linear models: a new learning approach for non-smooth generalized additive models

Published online by Cambridge University Press:  14 November 2024

Arthur Maillart
Affiliation:
Detralytics, Saint-Josse-ten-Noode, Belgium; Université Lyon 1, Institut de Science Financière et d’Assurances, Lyon, France
Christian Robert*
Affiliation:
Université Lyon 1, Institut de Science Financière et d’Assurances, Lyon, France; Laboratory in Finance and Insurance - LFA, CREST - Center for Research in Economics and Statistics, Paris, France
Corresponding author: Christian Robert; Email: [email protected]

Abstract

Generalized additive models (GAMs) are a leading model class for interpretable machine learning. GAMs were originally defined with smooth shape functions of the predictor variables and trained using smoothing splines. Recently, tree-based GAMs, in which the shape functions are gradient-boosted ensembles of bagged trees, were proposed, opening the door to the estimation of a broader class of shape functions (e.g. the Explainable Boosting Machine (EBM)). In this paper, we introduce a competing three-step GAM learning approach that combines (i) the knowledge of how to split the covariate space provided by an additive tree model (ATM), (ii) an ensemble of predictive linear scores derived from generalized linear models (GLMs) using a binning strategy based on the ATM, and (iii) a final GLM that yields a prediction model ensuring auto-calibration. Numerical experiments on several datasets illustrate the competitive performance of our approach compared with GAMs with splines, EBM, and GLMs with binarsity penalization. A case study in trade credit insurance is also provided.
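The three-step procedure outlined in the abstract can be illustrated with a minimal, hedged prototype. The sketch below uses scikit-learn stand-ins and synthetic data; it fits a single Gaussian identity-link GLM in steps (ii) and (iii) instead of the paper's ensemble of GLMs and actuarial response distributions, so all class choices and variable names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: scikit-learn stand-ins for the paper's three steps.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + (X[:, 1] > 0.0) + rng.normal(0.0, 0.1, size=500)

# Step (i): fit an additive tree model (here, boosted stumps) and harvest its
# split thresholds per covariate to define a data-driven binning.
atm = GradientBoostingRegressor(max_depth=1, n_estimators=100, random_state=0).fit(X, y)
thresholds = [set() for _ in range(X.shape[1])]
for est in atm.estimators_.ravel():
    tree = est.tree_
    for node in range(tree.node_count):
        if tree.feature[node] >= 0:  # internal (split) nodes only
            thresholds[tree.feature[node]].add(float(tree.threshold[node]))
edges = [np.sort(np.array(sorted(t))) for t in thresholds]

# Step (ii): one-hot encode each covariate by its ATM-derived bins and fit a
# GLM on the binned design matrix (a Gaussian GLM here, for simplicity).
binned = np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])])
Z = OneHotEncoder(handle_unknown="ignore").fit_transform(binned)
score_glm = LinearRegression().fit(Z, y)
scores = score_glm.predict(Z)

# Step (iii): a final GLM on the predicted score, a simple proxy for the
# auto-calibration step described in the abstract.
final_glm = LinearRegression().fit(scores.reshape(-1, 1), y)
y_hat = final_glm.predict(scores.reshape(-1, 1))
mse = float(np.mean((y - y_hat) ** 2))
```

Because each shape function is piecewise constant on the ATM's bins, the resulting model is non-smooth by construction, in contrast to spline-based GAMs.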

Type
Original Research Paper
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries


References

Alaya, M. Z., Bussy, S., Gaïffas, S., & Guilloux, A. (2019). Binarsity: A penalization for one-hot encoded features in linear supervised learning. Journal of Machine Learning Research, 20(118), 1–34. http://jmlr.org/papers/v20/17-170.html
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Denuit, M., Charpentier, A., & Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics, 101, 485–497.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3).
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.
Henckaerts, R., Antonio, K., & Côté, M.-P. (2022). When stakes are high: Balancing accuracy and transparency with model-agnostic interpretable data-driven surrogates. Expert Systems with Applications, 202, 117230. doi: 10.1016/j.eswa.2022.117230
Hofner, B., Mayr, A., & Schmid, M. (2014). gamboostLSS: An R package for model building and variable selection in the GAMLSS framework. arXiv preprint arXiv:1407.1774.
Hooker, G. (2004). Discovering additive structure in black box functions. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 575–580).
Lindholm, M., Lindskog, F., & Palmquist, J. (2023). Local bias adjustment, duration-weighted probabilities, and automatic construction of tariff cells. Scandinavian Actuarial Journal, 2023(10), 1–28.
Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223.
Tan, S., Caruana, R., Hooker, G., Koch, P., & Gordo, A. (2018). Learning global additive explanations for neural nets using model distillation.
Tsang, M., Cheng, D., & Liu, Y. (2017). Detecting statistical interactions from neural network weights. arXiv preprint arXiv:1705.04977.
Wood, S. N. (2017). Generalized additive models: An introduction with R. CRC Press.
Wüthrich, M. V., & Ziegel, J. (2023). Isotonic recalibration under a low signal-to-noise ratio. arXiv preprint arXiv:2301.02692.