
Distill knowledge of additive tree models into generalized linear models: a new learning approach for non-smooth generalized additive models

Published online by Cambridge University Press:  14 November 2024

Arthur Maillart
Affiliation:
Detralytics, Saint-Josse-ten-Noode, Belgium; Université Lyon 1, Institut de Science Financière et d’Assurances, Lyon, France
Christian Robert*
Affiliation:
Université Lyon 1, Institut de Science Financière et d’Assurances, Lyon, France; Laboratory in Finance and Insurance - LFA, CREST - Center for Research in Economics and Statistics, Paris, France
Corresponding author: Christian Robert; Email: [email protected]

Abstract

Generalized additive models (GAMs) are a leading model class for interpretable machine learning. GAMs were originally defined with smooth shape functions of the predictor variables and trained using smoothing splines. Recently, tree-based GAMs, in which the shape functions are gradient-boosted ensembles of bagged trees, were proposed, opening the door to the estimation of a broader class of shape functions (e.g. the Explainable Boosting Machine (EBM)). In this paper, we introduce a competing three-step GAM learning approach that combines (i) the knowledge of how to split the covariate space provided by an additive tree model (ATM), (ii) an ensemble of predictive linear scores derived from generalized linear models (GLMs) using a binning strategy based on the ATM, and (iii) a final GLM that yields a prediction model ensuring auto-calibration. Numerical experiments on several datasets illustrate the competitive performance of our approach compared with GAMs with splines, EBM, and GLMs with binarsity penalization. A case study in trade credit insurance is also provided.
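The three-step procedure outlined in the abstract can be illustrated with a minimal, hedged prototype. The sketch below uses scikit-learn stand-ins and synthetic data; it fits a single Gaussian identity-link GLM in steps (ii) and (iii) instead of the paper's ensemble of GLMs and actuarial response distributions, so all class choices and variable names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: scikit-learn stand-ins for the paper's three steps.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + (X[:, 1] > 0.0) + rng.normal(0.0, 0.1, size=500)

# Step (i): fit an additive tree model (here, boosted stumps) and harvest its
# split thresholds per covariate to define a data-driven binning.
atm = GradientBoostingRegressor(max_depth=1, n_estimators=100, random_state=0).fit(X, y)
thresholds = [set() for _ in range(X.shape[1])]
for est in atm.estimators_.ravel():
    tree = est.tree_
    for node in range(tree.node_count):
        if tree.feature[node] >= 0:  # internal (split) nodes only
            thresholds[tree.feature[node]].add(float(tree.threshold[node]))
edges = [np.sort(np.array(sorted(t))) for t in thresholds]

# Step (ii): one-hot encode each covariate by its ATM-derived bins and fit a
# GLM on the binned design matrix (a Gaussian GLM here, for simplicity).
binned = np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])])
Z = OneHotEncoder(handle_unknown="ignore").fit_transform(binned)
score_glm = LinearRegression().fit(Z, y)
scores = score_glm.predict(Z)

# Step (iii): a final GLM on the predicted score, a simple proxy for the
# auto-calibration step described in the abstract.
final_glm = LinearRegression().fit(scores.reshape(-1, 1), y)
y_hat = final_glm.predict(scores.reshape(-1, 1))
mse = float(np.mean((y - y_hat) ** 2))
```

Because each shape function is piecewise constant on the ATM's bins, the resulting model is non-smooth by construction, in contrast to spline-based GAMs.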

Type
Original Research Paper
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries


References

Alaya, M. Z., Bussy, S., Gaïffas, S., & Guilloux, A. (2019). Binarsity: A penalization for one-hot encoded features in linear supervised learning. Journal of Machine Learning Research, 20(118), 1–34. http://jmlr.org/papers/v20/17-170.html
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Denuit, M., Charpentier, A., & Trufin, J. (2021). Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics, 101, 485–497.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3).
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.
Henckaerts, R., Antonio, K., & Côté, M.-P. (2022). When stakes are high: Balancing accuracy and transparency with model-agnostic interpretable data-driven surrogates. Expert Systems with Applications, 202, 117230. doi: 10.1016/j.eswa.2022.117230
Hofner, B., Mayr, A., & Schmid, M. (2014). gamboostLSS: An R package for model building and variable selection in the GAMLSS framework. arXiv preprint arXiv:1407.1774.
Hooker, G. (2004). Discovering additive structure in black box functions. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 575–580).
Lindholm, M., Lindskog, F., & Palmquist, J. (2023). Local bias adjustment, duration-weighted probabilities, and automatic construction of tariff cells. Scandinavian Actuarial Journal, 2023(10), 1–28.
Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223.
Tan, S., Caruana, R., Hooker, G., Koch, P., & Gordo, A. (2018). Learning global additive explanations for neural nets using model distillation.
Tsang, M., Cheng, D., & Liu, Y. (2017). Detecting statistical interactions from neural network weights. arXiv preprint arXiv:1705.04977.
Wood, S. N. (2017). Generalized additive models: An introduction with R. CRC Press.
Wüthrich, M. V., & Ziegel, J. (2023). Isotonic recalibration under a low signal-to-noise ratio. arXiv preprint arXiv:2301.02692.