COST-SENSITIVE MULTI-CLASS ADABOOST FOR UNDERSTANDING DRIVING BEHAVIOR BASED ON TELEMATICS

Banghee So; Jean-Philippe Boucher; Emiliano A. Valdez

doi:10.1017/asb.2021.22


Published online by Cambridge University Press: 31 August 2021


Banghee So: Affiliation:
Department of Mathematics, Towson University, 7800 York Rd, Towson, MD, 21252, USA, E-Mail: bso@towson.edu
Jean-Philippe Boucher: Affiliation:
Département de Mathématiques, Université du Québec à Montréal, 201 Avenue du Président-Kennedy, Montréal, Québec, H2X 3Y7, Canada, E-Mail: boucher.jean-philippe@uqam.ca
Emiliano A. Valdez*: Affiliation:
Department of Mathematics, University of Connecticut, 341 Mansfield Road, Storrs, CT, 06269-1009, USA, E-Mail: emiliano.valdez@uconn.edu
*: E-Mail: emiliano.valdez@uconn.edu


Abstract

Using telematics technology, insurers are able to capture a wide range of data to better decode driver behavior, such as distance traveled and how drivers brake, accelerate, or make turns. Such additional information also helps insurers improve risk assessments for usage-based insurance, a recent industry innovation. In this article, we explore the integration of telematics information into a classification model to determine driver heterogeneity. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero accidents, a lower proportion with exactly one accident, and a far lower proportion with two or more accidents. We here introduce a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm we call SAMME.C2 to handle such class imbalances. We calibrate the algorithm using empirical data collected from a telematics program in Canada and demonstrate an improved assessment of driving behavior using telematics compared with traditional risk variables. Using suitable performance metrics, we show that our algorithm outperforms other learning models designed to handle class imbalances.
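The cost-sensitive re-weighting idea mentioned in the abstract can be illustrated with a single boosting round. The sketch below is an assumption-laden illustration, not the SAMME.C2 algorithm as defined in the article: it combines a SAMME-style learner weight (with the log(K - 1) multi-class term) with per-class misclassification costs applied to the sample weights. The function name, the cost values, and the toy labels are all hypothetical.

```python
import math


def cost_weighted_boosting_round(y_true, y_pred, weights, costs, n_classes):
    """One illustrative round of cost-weighted multi-class boosting.

    Hypothetical sketch only; the article's SAMME.C2 updates may differ.
    y_true, y_pred: class labels per sample.
    weights: current sample weights (summing to 1).
    costs: dict mapping class label -> misclassification cost (assumed values).
    """
    # Cost-weighted mass of correctly and incorrectly classified samples.
    correct = sum(costs[y] * w for y, p, w in zip(y_true, y_pred, weights) if y == p)
    wrong = sum(costs[y] * w for y, p, w in zip(y_true, y_pred, weights) if y != p)
    if correct <= 0 or wrong <= 0:
        raise ValueError("need both correct and incorrect predictions in this sketch")

    # SAMME-style learner weight with the multi-class log(K - 1) term.
    alpha = math.log(correct / wrong) + math.log(n_classes - 1)

    # Scale each weight by its class cost; boost it further if misclassified.
    updated = [
        costs[y] * w * math.exp(alpha if y != p else 0.0)
        for y, p, w in zip(y_true, y_pred, weights)
    ]
    total = sum(updated)
    return alpha, [u / total for u in updated]


# Toy data: class 0 = no accident, 1 = one accident, 2 = two or more.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]          # the single class-1 driver is missed
weights = [1 / 6] * 6                # uniform starting weights
costs = {0: 1.0, 1: 3.0, 2: 5.0}     # hypothetical costs, higher for rarer classes

alpha, new_weights = cost_weighted_boosting_round(y_true, y_pred, weights, costs, 3)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

In this toy round, the misclassified minority-class sample ends up with the largest weight, so the next weak learner would concentrate on it. The costs here are fixed by hand; how they are chosen in practice is outside the scope of this sketch.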

Keywords

Vehicle telematics; usage-based insurance; cost-sensitive learning; AdaBoost; SMOTE; SAMME; SAMME.C2

JEL codes: C10, C51, C52

Type: Research Article
Information: ASTIN Bulletin: The Journal of the IAA, Volume 51, Issue 3, September 2021, pp. 719–751

DOI: https://doi.org/10.1017/asb.2021.22
Copyright: © The Author(s), 2021. Published by Cambridge University Press on behalf of The International Actuarial Association


