AI in actuarial science – a review of recent advances – part 1

Ronald Richman

doi:10.1017/S1748499520000238

Published online by Cambridge University Press: 26 August 2020

Affiliation: QED Actuaries and Consultants

Abstract

Rapid advances in artificial intelligence (AI) and machine learning are creating products and services with the potential not only to change the environment in which actuaries operate but also to provide new opportunities within actuarial science. These advances are based on a modern approach to designing, fitting and applying neural networks, generally referred to as “Deep Learning.” This paper investigates how actuarial science may adapt and evolve in the coming years to incorporate these new techniques and methodologies. Part 1 of this paper provides background on machine learning and deep learning, as well as an heuristic for where actuaries might benefit from applying these techniques. Part 2 of the paper then surveys emerging applications of AI in actuarial science, with examples from mortality modelling, claims reserving, non-life pricing and telematics. For some of the examples, code has been provided on GitHub so that the interested reader can experiment with these techniques for themselves. Part 2 concludes with an outlook on the potential for actuaries to integrate deep learning into their activities. Finally, a supplementary appendix discusses further resources providing more in-depth background on machine learning and deep learning.

Keywords

Actuarial science; Deep learning; Machine learning; Insurance; Telematics

Type: Review
Information: Annals of Actuarial Science, Volume 15, Issue 2, July 2021, pp. 207–229

DOI: https://doi.org/10.1017/S1748499520000238
Copyright: © Institute and Faculty of Actuaries 2020
