
Optimizing insurance risk assessment: a regression model based on a risk-loaded approach

Published online by Cambridge University Press:  31 May 2024

Zinoviy Landsman*
Affiliation:
Actuarial Research Center, Department of Statistics, University of Haifa, Haifa, Israel Faculty of Sciences, Holon Institute of Technology, Holon, Israel
Tomer Shushi
Affiliation:
Department of Business Administration, Guilford Glazer Faculty of Business and Management, Ben-Gurion University of the Negev, Beer-Sheva, Israel
*
Corresponding author: Zinoviy Landsman; Email: [email protected]

Abstract

Risk measurement and econometrics are the two pillars of actuarial science. Unlike econometrics, risk measurement allows taking into account decision-makers’ risk aversion when analyzing the risks. We propose a hybrid model that captures decision-makers’ regression-based approach to study risks, focusing on explanatory variables while paying attention to risk severity. Our model considers different loss functions that quantify the severity of the losses that are provided by the risk manager or the actuary. We present an explicit formula for the regression estimators for the proposed risk-based regression problem and study the proposed results. Finally, we provide a numerical study of the results using data from the insurance industry.

Type
Original Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries

1. Introduction

Regression theory plays an essential role in actuarial science. Using regression models, one obtains predictions for the losses based on a priori knowledge of the explanatory variables considered by the risk manager or actuary. Several papers developed and studied regression theory in the context of risks and risk measures (He et al., Reference He, Hou, Peng and Shen2020; Keilbar & Wang, Reference Keilbar and Wang2022; Gaglianone et al., Reference Gaglianone, Lima, Linton and Smith2011; Xiao et al., Reference Xiao, Guo and Lam2015; Barkai et al., Reference Barkai, Shushi and Yosef2021; Rennie & Srebro, Reference Rennie and Srebro2005; Buccini et al., Reference Buccini, De la Cruz Cabrera, Donatelli, Martinelli and Reichel2020; Ki Kang et al., Reference Ki Kang, Peng and Golub2020; Pitselis, Reference Pitselis2020; Jeong & Valdez, Reference Jeong and Valdez2020; Daouia et al., Reference Daouia, Gijbels and Stupfler2021). Unlike standard regression analysis, when dealing with risks one commonly wishes to focus on risk severity levels, which can be quantified by expected losses. In this paper, we develop a regression model that minimizes both the standard OLS error function and a loss function of the severity of the studied risk. We prove that the minimizing solution, that is, the vector of regression coefficients $\boldsymbol{\beta }^{\ast }$ of the model, can be obtained explicitly, which allows us to capture the fundamental behavior of the slopes for the given econometric problem. We then study prototypical examples of this general approach and test it with an empirical analysis.

Let $\mathbf{Y}=\left ( Y_{1},\ldots,Y_{n}\right ) ^{T}$ be a risk with historical data, $X$ the $n\times p$ design matrix whose rows are the explanatory variables $\left ( x_{1},x_{2},\ldots,x_{n}\right )$ , and $\boldsymbol{\beta }\in \mathbb{R}^{p}$ the vector of slopes. Then, the linear regression takes the form $\mathbf{Y}=X\boldsymbol{\beta }+\boldsymbol{\varepsilon },$ where $\boldsymbol{\varepsilon }$ is the error term, and the total error is given by $\varepsilon _{Total}\left ( \boldsymbol{\beta }\right ) =\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2},$ whose minimization with respect to $\boldsymbol{\beta }$ gives the least squares estimator:

(1) \begin{equation} \boldsymbol{\hat{\beta }}=\arg \inf _{\boldsymbol{\beta \in \mathbb{R}}^{p}}\varepsilon _{Total}\left ( \boldsymbol{\beta }\right ) =(X^{T}X)^{-1}(X^{T}\mathbf{Y}). \end{equation}
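For readers who wish to reproduce the computations, a minimal sketch of the estimator in (1) (not part of the original paper; the arrays `X` and `y` are assumed to hold the design matrix and response vector) could look as follows:

```python
import numpy as np

def ols_estimator(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least squares estimator beta_hat = (X^T X)^{-1} X^T y of equation (1)."""
    # Solving the normal equations avoids forming the inverse explicitly,
    # assuming Rank(X) = p as in condition (3).
    return np.linalg.solve(X.T @ X, X.T @ y)
```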

If

(2) \begin{equation} \boldsymbol{\varepsilon \backsim }N_{n}\left(\mathbf{0},\sigma ^{2}I_{n}\right), \end{equation}

where $\sigma \gt 0$ is a dispersion parameter and $I_{n}$ is the $n\times n$ identity matrix, then all of the BLUE assumptions are satisfied, and $\boldsymbol{\hat{\beta }}$ is the BLUE and also the maximum likelihood estimator of the multivariate parameter $\boldsymbol{\beta }.$ We consider the case in which the matrix $X^{T}X$ is nonsingular, that is,

(3) \begin{equation} Rank(X)=p. \end{equation}

We define the loss intensity of the problem by $I_{\mathbf{Y}}=E(\left \Vert \mathbf{Y}\right \Vert ^{2})=\sum _{i=1}^{n}EY_{i}^{2}=\boldsymbol{\beta }^{T}X^{T}X\boldsymbol{\beta }+n\sigma ^{2}$ , quantified by the total sum of the expected squares of the losses. Given a loss function $\mathcal{L}\;:\;\mathbb{R}_{\geq 0}\rightarrow \mathbb{R}_{\geq 0},$ we consider $\mathcal{L}\left ( I_{\mathbf{Y}}\right )$ as the severity level of the loss intensity $I_{\mathbf{Y}}$ . Following a standard linear regression $\mathbf{Y}=X\boldsymbol{\beta }+\boldsymbol{\varepsilon },$ our multi-objective problem is given by:

(4) \begin{equation} \min \left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}\quad \textrm{AND}\quad \min \mathcal{L}\left ( I_{\mathbf{Y}}\right ) . \end{equation}

In practical terms, the loss severity measure $\mathcal{L}\left ( I_{\mathbf{Y}}\right )$ can take many forms depending on the specific risks in the system. For example, in insurance, it might involve calculating the average claim payout per policyholder over a certain period. It can also be a way to assess the potential financial impact of different types of risks, such as natural disasters, cybersecurity breaches, or credit risks. The measure $\mathcal{L}\left ( I_{\mathbf{Y}}\right )$ has a special role in risk measurement. The square root of the loss intensity, $I_{\mathbf{Y}}^{1/2}=\sqrt{E(\left \Vert \mathbf{Y}\right \Vert ^{2})}$ , is an objective measure that provides an index of the magnitude of loss in the system in the units of the loss, for example, in US dollars, while the function $\mathcal{L}$ is subjective and depends on the decision-maker. The function $\mathcal{L}$ quantifies how risk-averse the decision-maker is toward the loss intensity, and the trade-off parameter $\lambda$ , introduced below, quantifies the weight we assign to the consideration of $\mathcal{L}$ , that is, how important it is to minimize $\mathcal{L}$ compared to the minimization of the regression error $\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}.$

As we wish to find $\boldsymbol{\beta }$ that minimizes both the error term $\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}$ and the loss function $\mathcal{L}\left ( I_{\mathbf{Y}}\right )$ , where the severity is quantified by the sum of the expected squares of the losses, we combine the two objectives into a goal function of the following standard trade-off form:

(5) \begin{equation} F_{\mathcal{L},\lambda }\left ( \boldsymbol{\beta }\right ) =\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}+\lambda \mathcal{L}\left ( I_{\mathbf{Y}}\right ), \end{equation}

with a trade-off parameter $\lambda \gt 0.$ $F_{\mathcal{L},\lambda }\left ( \boldsymbol{\beta }\right )$ balances the sum of two functionals: the error term and the penalty term of the minimized goal function.
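As an illustration only, the goal function (5) can be evaluated for a candidate $\boldsymbol{\beta }$ once the loss intensity is replaced by an empirical counterpart; the plug-in term `n_sigma2` below (an estimate of $n\sigma ^{2}$) is an assumption made for this sketch and is not prescribed by the paper:

```python
import numpy as np
from typing import Callable

def risk_loaded_objective(beta: np.ndarray, X: np.ndarray, y: np.ndarray,
                          loss: Callable[[float], float], lam: float,
                          n_sigma2: float = 0.0) -> float:
    """Trade-off goal function (5): ||y - X beta||^2 + lam * L(I_Y),
    with the loss intensity I_Y approximated by beta^T X^T X beta + n_sigma2."""
    resid = y - X @ beta
    intensity = float(beta @ (X.T @ X) @ beta) + n_sigma2  # plug-in for I_Y
    return float(resid @ resid) + lam * loss(intensity)
```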

This paper explores an extension of regression theory that captures the magnitude of loss in the system of risks. Taking the proposed goal function (5) for obtaining the regression parameters $\boldsymbol{\beta }^{\ast }$ provides a more general framework than the classical regression model, one that captures the minimization of the error $\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}$ while also taking into account the minimization of $\mathcal{L}\left ( I_{\mathbf{Y}}\right ) .$

In this paper, we explore the problem of minimizing (5) and prove that it has explicit closed-form solutions in both an unrestricted (Section 2) and a restricted (Section 3) model. The proposed findings hold crucial implications within actuarial analysis. Specifically, we demonstrate that the ratio of intensities satisfies:

\begin{equation*} r=\frac {I_{\mathbf {Y}^{\ast }}}{I_{\hat{\mathbf{Y}}}}\leq 1, \end{equation*}

where $I_{\mathbf{Y}^{\ast }}$ is the loss intensity obtained from the results of Sections 2 and 3, and $I_{\hat{\mathbf{Y}}}$ is the intensity obtained by the classical least squares method. Notice that our main theorems do not require the normality assumption for the residuals $\boldsymbol{\varepsilon }$ , as highlighted in Remark 1 of Section 2. This might be particularly useful for insurance data, where right-skewed distributions are often observed.

Regression analysis in actuarial science plays a vital role by modeling relationships between variables to assess risk and predict future outcomes. Actuaries often employ regression to analyze historical data, such as insurance claims or mortality rates, to develop models that estimate the impact of various factors on future events. By utilizing regression techniques, actuaries can make informed decisions regarding pricing, risk assessment, and policy development within the insurance and financial sectors. In what follows, Section 4 is devoted to a numerical illustration of the risk-loaded approach using real data. In our numerical study demonstrating the risk-loaded approach in regression analysis, we focus on the claims experience of a prominent property and casualty insurer in the Midwestern United States. Specifically, we consider private passenger automobile insurance, for which the dependent variable is the amount paid on closed claims, measured in US dollars.

2. Unrestricted $F_{\mathcal{L},\lambda }(\boldsymbol{\beta })$ model

In this section, we consider the problem of minimizing functional (5); for now, we do not impose any restrictions on the choice of the regression coefficients $\boldsymbol{\beta }=(\beta _{1},\ldots,\beta _{p})^{T}.$ The following theorem addresses just such a case:

Theorem 1. Assume that $\mathcal{L}^{\prime }\left ( y\right ) \geq 0$ and $\mathcal{L}^{\prime \prime }\left ( y\right ) \geq 0,$ and define $a=\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }}.$ If the univariate equation

(6) \begin{equation} w=\frac{1}{1+\lambda \mathcal{L}^{\prime }\left ( aw^{2}\right ) }, \end{equation}

has a solution $w^{\ast },$ then this solution is unique, and the solution of the problem of minimizing functional ( 5 ) has the following explicit form:

(7) \begin{equation} \boldsymbol{\beta }^{\ast }=\arg \inf _{\boldsymbol{\beta \in \mathbb{R}}^{p}}F_{\mathcal{L},\lambda }\left ( \boldsymbol{\beta }\right ) =w^{\ast }\boldsymbol{\hat{\beta }}, \end{equation}

where the coefficient $w^{\ast }$ depends on $a=\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }}.$

Proof. For the proof of the theorem, we will need the following Lemma, the proof of which we have included in the appendix.

Lemma 1. Let $\boldsymbol{\alpha \in }\mathbb{R}^{n}$ be some vector and $A\gt 0$ be some positive definite matrix. Consider the following optimization problem:

\begin{equation*} \mathcal {F}(\mathbf {x})=F(\boldsymbol {\alpha }^{T}\mathbf {x},\mathbf {x}^{T}A\mathbf {x})\rightarrow \inf . \end{equation*}

Suppose the bivariate function $F(x,y)$ is twice continuously differentiable in the feasible space $\mathcal{X}\subset \mathbb{R}\times \mathbb{R}_{+}$ and the functional $\mathcal{F}(\mathbf{x})$ is convex. Then, the analytic solution to this optimization problem is as follows:

\begin{equation*} \mathbf {x}^{\ast }=\frac {x_{1}^{\ast }}{A_{1\ast }^{-1}\boldsymbol {\alpha }}A^{-1}\boldsymbol {\alpha }, \end{equation*}

where $A_{1\ast }^{-1}$ is the first row of matrix $A^{-1}$ , that is, $A_{1\ast }^{-1}\boldsymbol{\alpha }=\Sigma _{j=1}^{n}A_{1j}^{-1}\boldsymbol{\alpha }_{j}$ , and $x_{1}^{\ast }$ is the root of the univariate equation:

(8) \begin{equation} x_{1}=-\frac{1}{2}G\left(ax_{1},bx_{1}^{2}\right)A_{1\ast }^{-1}\boldsymbol{\alpha}, \end{equation}

where

\begin{equation*} G(x,y)=\frac {F_{x}^{\prime }(x,y)}{F_{y}^{\prime }(x,y)}, \end{equation*}
\begin{equation*} a=\frac {1}{A_{1\ast }^{-1}\boldsymbol {\alpha }}\boldsymbol {\alpha }^{T}A^{-1}\boldsymbol {\alpha}, \end{equation*}

and

\begin{equation*} b=\frac {1}{(A_{1\ast }^{-1}\boldsymbol {\alpha })^{2}}\boldsymbol {\alpha }^{T}A^{-1}\boldsymbol {\alpha .} \end{equation*}

Using this Lemma, we now solve the problem of minimizing functional (5), which we represent as the following problem:

(9) \begin{equation} \mathcal{F}_{1}(\boldsymbol{\beta })=F(\boldsymbol{\alpha }^{T}\boldsymbol{\beta },\boldsymbol{\beta }^{T}A\boldsymbol{\beta })\rightarrow \inf, \end{equation}

where

(10) \begin{equation} F(x,y)=b-2x+y+\lambda \mathcal{L}\left ( y\right ), \end{equation}

and $\boldsymbol{\alpha }=X^{T}Y$ ; $A=X^{T}X$ ; $b=\mathbf{Y}^{T}\mathbf{Y.}$ Now we show that the functional (9) is convex. We first observe that

(11) \begin{eqnarray} F_{x}^{\prime }(x,y) &=&-2 \\[4pt] F_{y}^{\prime }(x,y) &=&1+\lambda \mathcal{L}^{\prime }\left ( y\right ) . \notag \\[4pt] F_{xx}^{\prime \prime }(x,y) &=&F_{xy}^{\prime \prime }(x,y)=0, \notag \\[4pt] F_{yy}^{\prime \prime }(x,y) &=&\lambda \mathcal{L}^{\prime \prime }\left ( y\right ), \notag \end{eqnarray}

and we have

(12) \begin{equation} \Delta =F_{xx}^{\prime \prime }(x,y)F_{yy}^{\prime \prime }(x,y)-F_{xy}^{\prime \prime }(x,y)^{2}=0. \end{equation}

As the first derivative of the function $\mathcal{L}\left ( y\right )$ is also nonnegative, we use Lemma 3.1 of Landsman et al. (Reference Landsman, Makov and Shushi2020) and conclude that the considered functional (9) is convex. Then, we use Lemma 1 and find an analytic solution of problem (5). Since

(13) \begin{eqnarray} F_{x}^{\prime }(x,y) &=&-2 \\[4pt] F_{y}^{\prime }(x,y) &=&1+\lambda \mathcal{L}^{\prime }\left ( y\right ), \notag \end{eqnarray}

we obtain $G(x,y)=-\frac{2}{1+\lambda \mathcal{L}^{\prime }\left ( y\right ) },$ and the solution of the optimization problem (5) is given by the following explicit form:

(14) \begin{eqnarray} \boldsymbol{\beta }^{\ast } &=&\frac{x_{1}^{\ast }}{(X^{T}X)_{1\ast }^{-1}(X^{T}\mathbf{Y)}}(X^{T}X)^{-1}(X^{T}\mathbf{Y)} \\[4pt] &=&\frac{x_{1}^{\ast }}{\hat{\beta }_{1}}\boldsymbol{\hat{\beta }}, \notag \end{eqnarray}

where $x_{1}^{\ast }$ is a solution of the following equation:

(15) \begin{equation} x_{1}=\frac{1}{1+\lambda \mathcal{L}^{\prime }\left ( \frac{a}{\hat{\beta }_{1}^{2}}x_{1}^{2}\right ) }\hat{\beta }_{1}\mathbf{.} \end{equation}

Recall and verify that $\hat{\beta }_{1}=(X^{T}X)_{1\ast }^{-1}(X^{T}\mathbf{Y}).$ After designating $w=x_{1}/\hat{\beta }_{1}$ , we reduce equation (15) to (6). Since the functional (9) is convex, the solution of equation (6) is unique, and the vector $\boldsymbol{\beta }^{\ast }$ presented in expression (14) is the minimum point of functional (9). It is worth noticing that

(16) \begin{equation} a=\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }=Y}^{T}K\mathbf{Y}\geq 0, \end{equation}

where $K=X(X^{T}X)^{-1}X^{T}$ is an idempotent and, consequently, nonnegative definite matrix. Therefore, $aw^{2}\in \mathbb{R}_{+}.$

Corollary 1. As $\mathcal{L}^{\prime }\left ( y\right ) \geq 0$ , it follows immediately from equations ( 6 ) and ( 16 ) that $0\leq w^{\ast }\leq 1.$

This corollary is important for understanding the actuarial and economic meaning of the risk-loaded regression result. Define $\hat{\mathbf{Y}}=X\boldsymbol{\hat{\beta }}$ and $\mathbf{Y}^{\ast }=X\boldsymbol{\beta }^{\ast }.$ Then, for the empirical intensities $I_{\hat{\mathbf{Y}}}=\boldsymbol{\hat{\beta }}^{T}X^{T}X\boldsymbol{\hat{\beta }}$ and $I_{\mathbf{Y}^{\ast }}=\boldsymbol{\beta }^{\ast T}X^{T}X\boldsymbol{\beta }^{\ast },$ it follows that the ratio of intensities satisfies:

(17) \begin{equation} r=\frac{I_{\mathbf{Y}^{\ast }}}{I_{\hat{\mathbf{Y}}}}=w^{\ast 2}\leq 1. \end{equation}

Remark 1. The conditions of Theorem 1 do not include the assumption of normal distribution of residuals $\boldsymbol{\varepsilon }$ (cf. equation (2)).

Remark 2. As can be seen from the proposed model, the desired coefficients $\boldsymbol{\beta }^{\ast }=w^{\ast }\boldsymbol{\hat{\beta }}$ are biased estimators, the bias stemming from the risk-loaded term $\lambda \mathcal{L}\left ( E(\mathbf{Y}^{T}\mathbf{Y})\right )$ ; the rate of the risk loading of $\boldsymbol{\beta }^{\ast }$ is $w^{\ast },$ since the unbiased estimator $\boldsymbol{\beta }^{\ast }=\boldsymbol{\hat{\beta }}$ is recovered when $\lambda =0$ or $\mathcal{L}\left ( E(\mathbf{Y}^{T}\mathbf{Y})\right ) =0.$

Remark 3. The proposed risk-loaded approach for regression analysis of risks modifies the standard concept of regression theory by adding a penalty term for the uncertainty of the risk and allows the decision-maker to choose different functionals for both the error term $\left \Vert \mathbf{Y}-X\boldsymbol{\beta }\right \Vert ^{2}$ and $E(\mathbf{Y}^{T}\mathbf{Y}).$ The simplest form of $\mathcal{L}$ , $\mathcal{L}\left ( u\right ) =u,$ yields the celebrated Ridge regression.
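To make Theorem 1 concrete, the scalar equation (6) can be solved numerically and the risk-loaded coefficients formed as $\boldsymbol{\beta }^{\ast }=w^{\ast }\boldsymbol{\hat{\beta }}$. The sketch below is an illustration rather than the authors' code; it assumes the derivative $\mathcal{L}^{\prime }$ is supplied as a function `dL` and uses a bracketed root search on $(0,1]$, which is justified by Corollary 1 whenever a solution exists there:

```python
import numpy as np
from scipy.optimize import brentq

def risk_loaded_beta(X, y, lam, dL, eps=1e-12):
    """Risk-loaded estimator of Theorem 1: beta* = w* beta_hat, where w* solves
    the fixed-point equation (6), w = 1 / (1 + lam * L'(a w^2))."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)          # classical OLS estimator
    a = float(y @ X @ beta_hat)                           # a = Y^T X beta_hat >= 0, eq. (16)
    g = lambda w: w * (1.0 + lam * dL(a * w * w)) - 1.0   # rearranged eq. (6)
    w_star = brentq(g, eps, 1.0)                          # Corollary 1: 0 <= w* <= 1
    return w_star * beta_hat, w_star

# Hypothetical usage with the power loss L(u) = u**1.1, so L'(u) = 1.1 * u**0.1:
# beta_star, w_star = risk_loaded_beta(X, y, lam=0.2, dL=lambda u: 1.1 * u**0.1)
```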

3. The equality constrained $F_{\mathcal{L},\lambda }(\beta )$ model

In certain situations, it is possible to possess non-sample information (a priori information on the $\boldsymbol{\beta }$ parameters), which can vary in nature. We specifically focus on precise a priori information regarding the coefficients. Let us now scrutinize the proposed risk-based regression model under equality constraints. Generally, this prior information on the coefficients can be represented as follows: we address problem (5) with the linear constraints:

(18) \begin{equation} R\boldsymbol{\beta }=\mathbf{r}, \end{equation}

where $R$ is an $m\times p$ matrix with $m\leq p$ , and $\mathbf{r}$ is an $m\times 1$ vector. A notable and significant example of such constraints is

(19) \begin{equation} \beta _{1}+\beta _{2}+\ldots +\beta _{p}=1, \end{equation}

where the matrix $R=\mathbf{1}^{T}$ , with $\mathbf{1}$ a column vector of ones, and $\mathbf{r}=1.$ This constraint allows us to regard the coefficients $\beta _{i}$ as weights of the factors involved in the regression model. To solve the optimization problem (5) under constraints (18), we use Theorem 3.1 of Landsman et al. (Reference Landsman, Makov and Shushi2020), where such a problem was solved under even more general conditions. Denote the following vectors

(20) \begin{eqnarray} \boldsymbol{\beta }_{0} &=&(X^{T}X)^{-1}R^{T}(R(X^{T}X)^{-1}R^{T})^{-1}\mathbf{r} \\[4pt] \boldsymbol{\beta }_{1} &=&(X^{T}X)^{-1}X^{T}\mathbf{Y}-(X^{T}X)^{-1}R^{T}(R(X^{T}X)^{-1}R^{T})^{-1}R(X^{T}X)^{-1}X^{T}\mathbf{Y} \notag \end{eqnarray}

and the following values

(21) \begin{equation} \text{ }\alpha _{1}=\mathbf{Y}^{T}X\boldsymbol{\beta }_{1},\quad \alpha _{2}=\boldsymbol{\beta }_{0}^{T}X^{T}X\boldsymbol{\beta }_{0}. \end{equation}

Theorem 2. Under the conditions of Theorem 1, if the following univariate equation

(22) \begin{equation} w=\frac{1}{1+\lambda \mathcal{L}^{\prime }\left ( \alpha _{2}+\alpha _{1}w^{2}\right ) }, \end{equation}

has a solution $w^{\ast },$ then this solution is unique, and the solution of the problem of minimizing functional ( 5 ) under the linear constraints ( 18 ) has the following explicit form:

(23) \begin{equation} \boldsymbol{\beta }_{R}^{\ast }=\arg \inf _{R\boldsymbol{\beta }=\mathbf{r}}F_{\mathcal{L},\lambda }\left ( \boldsymbol{\beta }\right ) =\boldsymbol{\beta }_{0}+w^{\ast }\boldsymbol{\beta }_{1}. \end{equation}

The vectors $\boldsymbol{\beta }_{0}$ and $\boldsymbol{\beta }_{1}$ are orthogonal with respect to $X^{T}X$ in the sense that

(24) \begin{equation} \boldsymbol{\beta }_{0}^{T}X^{T}X\boldsymbol{\beta }_{1}=0. \end{equation}

Proof. The proof immediately follows from Theorem 3.1 of Landsman et al. (Reference Landsman, Makov and Shushi2020), and we also refer to the main results given in Landsman et al. (Reference Landsman, Makov and Shushi2018). In fact, problem (5), subject to the system of linear constraints (18), is a special case of Theorem 3.1, where $F(x,y)$ has the form (10), equation (18) of Theorem 3.1 becomes (22), and the solution of the optimization problem (18) of Theorem 3.1 becomes (23). Vectors $\boldsymbol{\beta }_{0}$ and $\boldsymbol{\beta }_{1}$ are calculated from the vectors $\mathbf{x}^{0}$ and $\mathbf{z}$ given in Theorem 3.1, respectively, where the matrices are $\Sigma =X^{T}X,B=R,$ and the vectors are $\mathbf{c}=\mathbf{r}$ and $\boldsymbol{\mu }=X^{T}\mathbf{Y}$ . The orthogonality property (24) immediately follows from the orthogonality of the vectors $\mathbf{x}^{0}$ and $\mathbf{z}$ , see equation (19) of Theorem 3.1.

Note that the vectors $\boldsymbol{\beta }_{0}$ and $\boldsymbol{\beta }_{1}$ do not depend on the loss function $\mathcal{L}(u)$ but only on the matrix $X$ , the vector $\mathbf{Y},$ the restriction matrix $R,$ and the vector $\mathbf{r}.$ When $w^{\ast }=1,$ we have the classical least squares estimator under restriction (19):

(25) \begin{equation} \boldsymbol{\hat{\beta }}_{R}=\boldsymbol{\beta }_{0}+\boldsymbol{\beta }_{1}. \end{equation}

Function $\mathcal{L}(u),$ in turn, determines equations (6) and (22).

As in the unrestricted case considered in Section 2, we have the following

Corollary 2. As $\mathcal{L}^{\prime }\left ( y\right ) \geq 0$ , it follows immediately from equations ( 22 ) and ( 16 ) that $0\leq w^{\ast }\leq 1.$

We define the empirical intensities of $\hat{\mathbf{Y}}_{R}=$ $X\boldsymbol{\hat{\beta }}_{R}$ and $\mathbf{Y}_{R}^{\ast }=X\boldsymbol{\beta }_{R}^{\ast },$ by:

(26) \begin{equation} I_{\hat{\mathbf{Y}}_{R}}=\boldsymbol{\hat{\beta }}_{R}^{T}X^{T}X\boldsymbol{\hat{\beta }}_{R} \end{equation}

and

(27) \begin{equation} I_{\mathbf{Y}_{R}^{\ast }}=\boldsymbol{\beta }_{R}^{\ast T}X^{T}X\boldsymbol{\beta }_{R}^{\ast }, \end{equation}

respectively. Then, Corollary 2 and the orthogonality property of $\boldsymbol{\beta }_{0}$ and $\boldsymbol{\beta }_{1}$ (see (24)) immediately lead to the following intensity ratio:

(28) \begin{eqnarray} r &=&\frac{I_{\mathbf{Y}_{R}^{\ast }}}{I_{\hat{\mathbf{Y}}_{R}}}=\frac{\boldsymbol{\beta }_{R}^{\ast T}X^{T}X\boldsymbol{\beta }_{R}^{\ast }}{\boldsymbol{\hat{\beta }}_{R}^{T}X^{T}X\boldsymbol{\hat{\beta }}_{R}}=\frac{(\boldsymbol{\beta }_{0}+w^{\ast }\boldsymbol{\beta }_{1})^{T}X^{T}X(\boldsymbol{\beta }_{0}+w^{\ast }\boldsymbol{\beta }_{1})}{(\boldsymbol{\beta }_{0}+\boldsymbol{\beta }_{1})^{T}X^{T}X(\boldsymbol{\beta }_{0}+\boldsymbol{\beta }_{1})} \notag \\[4pt] &=&\frac{\boldsymbol{\beta }_{0}^{T}X^{T}X\boldsymbol{\beta }_{0}+w^{\ast 2}\boldsymbol{\beta }_{1}^{T}X^{T}X\boldsymbol{\beta }_{1}}{\boldsymbol{\beta }_{0}^{T}X^{T}X\boldsymbol{\beta }_{0}+\boldsymbol{\beta }_{1}^{T}X^{T}X\boldsymbol{\beta }_{1}}\leq 1. \end{eqnarray}

The above bound supports the preferability of a risk-loaded approach.
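The restricted estimator of Theorem 2 and the intensity ratio (28) can be assembled along the same lines: compute $\boldsymbol{\beta }_{0}$ and $\boldsymbol{\beta }_{1}$ from (20), the scalars $\alpha _{1},\alpha _{2}$ from (21), solve (22) for $w^{\ast }$, and form $\boldsymbol{\beta }_{R}^{\ast }=\boldsymbol{\beta }_{0}+w^{\ast }\boldsymbol{\beta }_{1}$. The sketch below is illustrative only, with `dL` again standing for $\mathcal{L}^{\prime }$:

```python
import numpy as np
from scipy.optimize import brentq

def restricted_risk_loaded_beta(X, y, R, r, lam, dL, eps=1e-12):
    """Restricted risk-loaded estimator of Theorem 2: beta_R* = beta_0 + w* beta_1."""
    S_inv = np.linalg.inv(X.T @ X)
    M = S_inv @ R.T @ np.linalg.inv(R @ S_inv @ R.T)
    beta_hat = S_inv @ (X.T @ y)
    beta0 = M @ r                                   # first line of eq. (20)
    beta1 = beta_hat - M @ (R @ beta_hat)           # second line of eq. (20)
    a1 = float(y @ X @ beta1)                       # alpha_1 of eq. (21)
    a2 = float(beta0 @ (X.T @ X) @ beta0)           # alpha_2 of eq. (21)
    g = lambda w: w * (1.0 + lam * dL(a2 + a1 * w * w)) - 1.0   # eq. (22)
    w_star = brentq(g, eps, 1.0)                    # Corollary 2: 0 <= w* <= 1
    beta_R = beta0 + w_star * beta1
    # Intensity ratio (28) relative to the restricted least squares estimator beta_0 + beta_1.
    beta_LS = beta0 + beta1
    ratio = float(beta_R @ (X.T @ X) @ beta_R) / float(beta_LS @ (X.T @ X) @ beta_LS)
    return beta_R, w_star, ratio

# For the sum-to-one constraint (19): R = np.ones((1, X.shape[1])), r = np.array([1.0]).
```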

In the following, we examine special cases for our proposed actuarial and econometric measure, for different functions $\ \mathcal{L}$ .

3.1 Classical case

The classical case is obtained when $\mathcal{L}\left ( u\right ) =u$ . Then, the solution of equations (6) and (22) reduces to

(29) \begin{equation} w^{\ast }=\frac{1}{1+\lambda }, \end{equation}

and the solution to the unrestricted optimization problem is $\boldsymbol{\beta }^{\ast }=\frac{1}{1+\lambda }\boldsymbol{\hat{\beta }.}$ For the case $\lambda =0$ , which is the classical least squares case, we have

(30) \begin{equation} \boldsymbol{\beta }^{\ast }=\boldsymbol{\hat{\beta }}, \end{equation}

which conforms well with the solution of the classical problem. Notice that as $\boldsymbol{\hat{\beta }}$ is an unbiased and consistent estimator of $\boldsymbol{\beta }$ , $\boldsymbol{\beta }^{\ast }$ is an unbiased and consistent estimator of $\frac{1}{1+\lambda }\boldsymbol{\beta .}$

In the context of the equality constrained model from Theorem 2 and (29), it immediately follows that

(31) \begin{equation} \boldsymbol{\beta }_{R}^{\ast }=\boldsymbol{\beta }_{0}+\frac{1}{1+\lambda }\boldsymbol{\beta }_{1}. \end{equation}

When $\lambda =0,$ which is the classical least squares case under the linear constraints (18), we have

(32) \begin{equation} \boldsymbol{\beta }_{R}^{\ast }=\boldsymbol{\beta }_{0}+\boldsymbol{\beta }_{1}=\boldsymbol{\hat{\beta }}_{R}, \end{equation}

which conforms well with the solution of the classical problem, cf. Őzkale & Selahattin (Reference Őzkale and Selahattin2007), eq (1.16).

3.2 Powered penalty function

Assume $\mathcal{L}\left ( u\right ) =u^{\delta },\delta \gt 0.$ Then equation (6) has the following form:

(33) \begin{equation} w=\frac{1}{1+\lambda \delta a^{\delta -1}w^{2\delta -2}}, \end{equation}

which can be reduced to the power equation:

(34) \begin{equation} w+\lambda \delta a^{\delta -1}w^{2\delta -1}=1. \end{equation}

A special case of this example is the case in which $\delta =1/2.$ Then, equation (34) has an analytic solution:

(35) \begin{equation} w^{\ast }=1-\frac{\lambda }{2\sqrt{a}}, \end{equation}

and, the explicit solution of the minimization problem becomes

(36) \begin{equation} \boldsymbol{\beta }^{\ast }=\left ( 1-\frac{\lambda }{2\sqrt{\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }}}}\right ) \boldsymbol{\hat{\beta }.} \end{equation}
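A quick numerical sanity check of the closed form (35) against the power equation (34) can be done with hypothetical values of $\lambda$ and $a$ (these numbers are chosen for illustration only and do not come from the paper):

```python
import numpy as np

lam, a, delta = 0.2, 4.0, 0.5                    # hypothetical values for illustration
w_star = 1.0 - lam / (2.0 * np.sqrt(a))          # closed form (35) for delta = 1/2
residual = w_star + lam * delta * a**(delta - 1.0) * w_star**(2.0 * delta - 1.0) - 1.0
print(w_star, residual)                          # residual of eq. (34) is zero up to rounding
```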

It is well known that, under the natural condition

(37) \begin{equation} \frac{1}{n}X^{T}X\rightarrow Q\ \textrm{as}\ n\rightarrow \infty, \end{equation}

where $Q$ is a $p\times p$ matrix, $\boldsymbol{\hat{\beta }}$ is a consistent estimator of $\boldsymbol{\beta }.$ We show that in the case where the residuals are normally distributed, $\boldsymbol{\varepsilon }\backsim N_{n}(\mathbf{0},\sigma ^{2}I_{n}),$ $\boldsymbol{\beta }^{\ast }$ is also a consistent estimator of $\boldsymbol{\beta }.$ Recall $\boldsymbol{\beta }^{\ast }=w^{\ast }\boldsymbol{\hat{\beta }}$ , where $w^{\ast }=1-\frac{\lambda }{2\sqrt{a}},$ and $a=\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }}=\mathbf{Y}^{T}K\mathbf{Y},$ where $K=X(X^{T}X)^{-1}X^{T}.$ As $K^{2}=X(X^{T}X)^{-1}X^{T}=K$ , the matrix $K$ is idempotent. Recall that the matrix $X$ has maximal rank equal to $p$ and the vector $\mathbf{Y}$ is distributed $N_{n}(X\boldsymbol{\beta },\sigma ^{2}I_{n}).$ Then $\mathbf{Y}/\sigma \boldsymbol{\backsim }N_{n}(X\boldsymbol{\beta }/\sigma,I_{n})$ and $a/\sigma ^{2}\boldsymbol{\backsim \chi }_{n}(\Delta )$ , where $\boldsymbol{\chi }_{n}(\Delta )$ is a non-central chi-squared distribution with $p$ degrees of freedom, and $\Delta =\frac{1}{\sigma ^{2}}\boldsymbol{\beta }^{T}X^{T}X\boldsymbol{\beta }$ is a non-centrality parameter. Then, $E(a/n)=(n+\Delta )/n=1+\frac{1}{n}\boldsymbol{\beta }^{T}X^{T}X\boldsymbol{\beta \rightarrow }1,$ and $Var(a)=2(n+2\Delta )=2n+4\boldsymbol{\beta }^{T}X^{T}X\boldsymbol{\beta }.$ From Chebyshev’s inequality, it follows that

(38) \begin{equation} P\left ( \left \vert \frac{a}{n}-E\left(\frac{a}{n}\right)\right \vert \gt \varepsilon \right ) \leq \frac{2n+4\boldsymbol{\beta }^{T}X^{T}X\boldsymbol{\beta }}{n^{2}\varepsilon ^{2}}=\frac{2}{n\varepsilon ^{2}}+\frac{4}{n\varepsilon ^{2}}\boldsymbol{\beta }^{T}\frac{X^{T}X}{n}\boldsymbol{\beta \rightarrow }0. \end{equation}

This implies that

\begin{equation*} \frac {a}{n}\overset {P}{\rightarrow }1 \end{equation*}

and then

\begin{equation*} w^{\ast }=1-\frac {\lambda }{2\sqrt {a}}\overset {P}{\rightarrow }1, \end{equation*}

that is, $\boldsymbol{\beta }^{\ast }$ is a consistent estimator of $\boldsymbol{\beta }$ for this case.

The same happens even for the more general case, $1/2\leq \delta \lt 1.$ In fact, it follows from equation (33) that:

(39) \begin{equation} w^{\ast }=\frac{1}{1+\lambda \delta \lbrack (a/n)^{1-\delta }n^{1-\delta }w^{\ast 2(1-\delta )}]^{-1}}\overset{p}{\rightarrow }1,n\rightarrow \infty . \end{equation}

Considering the second special case $\delta =1$ , we obtain from the power equation (34) that $w^{\ast }=\frac{1}{1+\lambda },$ which conforms well with Subsection 3.1, but in this case, $\boldsymbol{\beta }^{\ast }$ is not a consistent estimator of $\boldsymbol{\beta},$ but a consistent estimator of $\frac{1}{1+\lambda }\boldsymbol{\beta .}$

4. Numerical analysis

We illustrate the risk-loaded approach for regression analysis by conducting a numerical study. We consider claims experience from a large Midwestern (US) property and casualty insurer for private passenger automobile insurance. The dependent variable is the amount paid on a closed claim, in (US) dollars (claims that were not closed by year end are handled separately). The independent variables are State code, vehicle Class code, Gender, and Age. We obtained the data from Frees (Reference Frees2010, Table 4). To make the design matrix $X$ purely numerical, we encoded the vehicle class codes by numbers from 1 to 18 and the Gender variable by $1$ or $0.$ In Table 1, we present the first $10$ of the $n=6773$ observations, which are, in fact, the first 10 rows of the matrix $X$ and the vector $\mathbf{Y}$ :

Table 1. First 10 lines of matrix X and vector Y

Consider, first, the unrestricted case and powered penalty function. Then the solution of equation (34) is, in fact, the zero of function:

(40) \begin{equation} F(w)=w+\lambda \delta a^{\delta -1}w^{2\delta -1}-1. \end{equation}

In Fig. 1, we show the graph of function $F(w)$ for $\lambda =0.2$ and $\delta =1.1,$ and we can see that this function has the unique root $w^{\ast }=0.170.$

Figure 1. Graph of function $F(w)$ for $ \lambda =0.2$ and $ \delta =1.1$ ; $w^{\ast }=0.1701$ .
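For reproducibility, a root such as the one shown in Fig. 1 could be obtained with a bracketed scalar solver; this is only a sketch, and it assumes the scalar $a=\mathbf{Y}^{T}X\boldsymbol{\hat{\beta }}$ has already been computed from the data:

```python
from scipy.optimize import brentq

def solve_F(a, lam=0.2, delta=1.1):
    """Root of F(w) = w + lam*delta*a**(delta-1)*w**(2*delta-1) - 1 from eq. (40)."""
    F = lambda w: w + lam * delta * a**(delta - 1.0) * w**(2.0 * delta - 1.0) - 1.0
    # For delta > 1/2, F(0+) = -1 < 0 and F(1) = lam*delta*a**(delta-1) >= 0,
    # so the root lies in (0, 1].
    return brentq(F, 1e-12, 1.0)
```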

In Fig. 2, we provide the graph of the ratio of empirical intensities. One can see that the ratio of intensities decreases from 1 to 0.

Figure 2. Changing the ratio of empirical intensities $r=I_{Y^{\star }}/I_{\hat{Y}}$ when power parameter $ \delta$ increases from 0.5 to 1.5.

Now assume that we have the equality constraint (19). Then equation (22) in Theorem 2 takes the form:

(41) \begin{equation} F_{1}(w)=w+\lambda \delta w\left ( \alpha _{2}+\alpha _{1}w^{2}\right ) ^{\delta -1}-1=0. \end{equation}

The solution of this equation is a zero of function $F_{1}(w).$ In Fig. 3, we show the graph of function $F_{1}(w)$ for $\lambda =0.2$ and $\delta =1.1.$ We can see that this function has the unique root $w^{\ast }=0.341.$

Using the obtained $w^{\ast }$ , we can provide the solution of problem (5) under the linear constraints (18), that is, the coefficients $\boldsymbol{\beta }_{R}^{\ast }$ , as reported in Table 2.

In Table 3, we provide the classical estimators for comparison with the risk-loaded estimators.

In order to compare these two solutions of the least squares problem, we calculate the ratio of empirical intensities:

(42) \begin{equation} r=\frac{I_{\mathbf{Y}_{R}^{\ast }}}{I_{\hat{\mathbf{Y}}_{R}}} \end{equation}

using the expression (28). In the considered numerical case, $r=0.116$ , which implies that the intensity is significantly lower under the risk-loaded approach. In addition, we can say that as the $\boldsymbol{\beta }_{R}$ -coefficients satisfy the restriction (19), they indicate the weights of the factors Intercept, State, Vehicle Class, Gender, and Age in the considered regression model. Comparing the risk-loaded estimators reported in Table 2 with the classical estimators given in Table 3, we can conclude that the amplitude of variation of the weights for the risk-loaded estimators is much smaller than for the classical regression estimators. Regarding the difference between the unrestricted and restricted models, we observe that, for the considered numerical data, the intensity ratio for the unrestricted approach is $r=0.029$ , which is lower than for the restricted case. This result is quite natural because the unrestricted case attains the absolute minimum. However, in some sense, the unrestricted case is of limited use here because we cannot interpret the meaning of each $\boldsymbol{\beta }$ -coefficient as a weight.

Table 2. Solution of Theorem 2 under power risk-loaded function

Table 3. The classical minimum least squared estimator under restriction (19)

Figure 3. Graph of function $F_{1}(w)$ for $ \lambda =0.2$ and $ \delta =1.1;\;w^{\ast }=0.34146$ .

In Fig. 4, we show how the ratio of intensities changes when power parameter $\delta$ increases from $0.5$ to $1.5.$ In this figure, we see that the ratio of intensities decreases when $\delta$ increases from $0.5$ to $1.5$ and that its graph is very similar to the graph given in Fig. 2.

Figure 4. Changing of the ratio of empirical intensities when power parameter $ \delta$ increases from 0.5 to 1.5: Linear constraint is present.

Notice that sometimes the data need to be log-transformed. In our case, it is unnecessary because our factors (State, Vehicle Class, Gender, and Age) are integers, and the relatively low ratios ( $r$ ) indicate a reasonably good level of adequacy.

5. Conclusion

In this paper, we proposed a hybrid risk-loaded regression model that simultaneously minimizes the classical least squares loss function and a penalty loss function of the proposed loss intensity of the problem, defined as the sum of the expected squares of the responses $\mathbf{Y}=\left ( Y_{1},\ldots,Y_{n}\right ) ^{T}$ . Imposing rather general conditions on the model and the loss function, we found that the explicit solution of the minimization problem takes a proportional form of the least squares estimator $\boldsymbol{\hat{\beta }}$ , where the coefficient of proportionality depends on the response vector $\mathbf{Y}$ and the design matrix $X$ . Special attention was given to the powered penalty loss function, which essentially generalizes the classical case of the identity loss function. In addition to the unconstrained problem, we also considered a situation in which the solution satisfies linear equality constraints. In that case, we also found an explicit closed-form solution. In both the unconstrained and constrained cases, we demonstrated that the ratio of the empirical intensity of the proposed hybrid risk-loaded estimator to that of the classical least squares estimator never exceeds 1, which speaks in favor of the proposed method. We also provided a numerical illustration of the proposed model using real data.

Acknowledgment

The authors would like to thank the reviewer for the comments, which significantly improved the paper.

Data availability statement

Replication materials are available on request from the authors. The data and code that support the findings of this study are available from the corresponding author [Zinoviy Landsman] upon reasonable request.

Funding statement

This work received no specific grant from any funding agency, commercial, or not-for-profit sectors.

Competing interests

The author(s) declare none.

Appendix

Proof of Lemma 1:

To find the stationary points of the optimization problem (5), we obtain the following system of equations:

\begin{equation*} \frac {d}{d\mathbf {x}}F\big(\boldsymbol {\alpha }^{T}\mathbf {x},\mathbf {x}^{T}A\mathbf {x}\big)=F_{x}^{\prime }\big(\boldsymbol {\alpha }^{T}\mathbf {x},\mathbf {x}^{T}A\mathbf {x}\big)\boldsymbol {\alpha +}2F_{y}^{\prime }\big(\boldsymbol {\alpha }^{T}\mathbf {x},\mathbf {x}^{T}A\mathbf {x}\big)A\mathbf {x=0.} \end{equation*}

After algebraic calculations, we obtain $\mathbf{x=-}\frac{1}{2}G\left(\boldsymbol{\alpha }^{T}\mathbf{x},\mathbf{x}^{T}A\mathbf{x}\right)A^{-1}\boldsymbol{\alpha .}$ Thus, we have the following system of equations:

(A1) \begin{equation} \left \{ \begin{array}{c} x_{1}=\mathbf{-}\frac{1}{2}G\left(\boldsymbol{\alpha }^{T}\mathbf{x},\mathbf{x}^{T}A\mathbf{x}\right)A_{1\ast }^{-1}\boldsymbol{\alpha } \\[8pt] x_{2}=\mathbf{-}\frac{1}{2}G\left(\boldsymbol{\alpha }^{T}\mathbf{x},\mathbf{x}^{T}A\mathbf{x}\right)A_{2\ast }^{-1}\boldsymbol{\alpha } \\[4pt] \cdot \cdot \cdot \\[4pt] x_{n}=\mathbf{-}\frac{1}{2}G\left(\boldsymbol{\alpha }^{T}\mathbf{x},\mathbf{x}^{T}A\mathbf{x}\right)A_{n\ast }^{-1}\boldsymbol{\alpha }\end{array}\right ., \end{equation}

where $A_{i\ast }^{-1}$ is the $i$ -th row of matrix $A^{-1}.$ Dividing the $i$ -th equation, $i=2,\ldots,n,$ by the first equation, we obtain

\begin{equation*} \frac {x_{i}}{x_{1}}=\frac {A_{i\ast }^{-1}\boldsymbol {\alpha }}{A_{1\ast }^{-1}\boldsymbol {\alpha }},\quad i=2,\ldots,n, \end{equation*}

where, without loss of generality, we assume that $A_{1\ast }^{-1}\boldsymbol{\alpha }\neq 0.$ Then $x_{i}=\frac{A_{i\ast }^{-1}\boldsymbol{\alpha }}{A_{1\ast }^{-1}\boldsymbol{\alpha }}x_{1},i=2,\ldots,n,$ and hence $\mathbf{x}$ can be represented by the first variable $x_{1},$ in the following manner:

(A2) \begin{equation} \mathbf{x}=\frac{x_{1}}{A_{1\ast }^{-1}\boldsymbol{\alpha }}A^{-1}\boldsymbol{\alpha}, \end{equation}

where $x_{1}$ is the solution of the first equation of system (A1). Substituting (A2) into this equation, we obtain equation:

\begin{eqnarray*} x_{1} &=&\mathbf{-}\frac{1}{2}G\left(\boldsymbol{\alpha }^{T}\mathbf{x},\mathbf{x} ^{T}A\mathbf{x}\right)A_{1\ast }^{-1}\boldsymbol{\alpha =-}\frac{1}{2}G\left ( \frac{x_{1}}{A_{1\ast }^{-1}\boldsymbol{\alpha }}\boldsymbol{\alpha }^{T}A^{-1}\boldsymbol{\alpha },\frac{x_{1}^{2}}{\left(A_{1\ast }^{-1}\boldsymbol{\alpha }\right)^{2}}\boldsymbol{\alpha }^{T}A^{-1}AA^{-1}\boldsymbol{\alpha }\right ) A_{1\ast }^{-1}\boldsymbol{\alpha } \\[4pt] &=&\mathbf{-}\frac{1}{2}G\left ( \frac{x_{1}}{A_{1\ast }^{-1}\boldsymbol{\alpha }}\boldsymbol{\alpha }^{T}A^{-1}\boldsymbol{\alpha },\frac{x_{1}^{2}}{\left(A_{1\ast }^{-1}\boldsymbol{\alpha }\right)^{2}}\boldsymbol{\alpha }^{T}A^{-1}\boldsymbol{\alpha }\right ) A_{1\ast }^{-1}\boldsymbol{\alpha}, \end{eqnarray*}

If this equation has a solution, this solution is unique due to the convexity of the function $F$ , and the obtained stationary point given in (A2) is a minimum point. The Lemma is proved.

References

Barkai, I., Shushi, T., & Yosef, R. (2021). A cryptocurrency risk–return analysis for bull and bear regimes. The Journal of Alternative Investments, 24(1), 95–118.
Buccini, A., De la Cruz Cabrera, O., Donatelli, M., Martinelli, A., & Reichel, L. (2020). Large-scale regression with non-convex loss and penalty. Applied Numerical Mathematics, 157, 590–601.
Daouia, A., Gijbels, I., & Stupfler, G. (2021). Extremile regression. Journal of the American Statistical Association, 117, 1579–1586.
Frees, E. W. (2010). Instructors’ Manual for Regression Modeling with Actuarial and Financial Applications. Available at: https://instruction.bus.wisc.edu/jfrees/jfreesbooks/regression%20modeling/bookwebdec2010/DataDescriptions.pdf. Accessed 24 May 2024.
Gaglianone, W. P., Lima, L. R., Linton, O., & Smith, D. R. (2011). Evaluating value-at-risk models via quantile regression. Journal of Business & Economic Statistics, 29(1), 150–160.
He, Y., Hou, Y., Peng, L., & Shen, H. (2020). Inference for conditional value-at-risk of a predictive regression. The Annals of Statistics, 48(6), 3442–3464.
Jeong, H., & Valdez, E. A. (2020). Predictive compound risk models with dependence. Insurance: Mathematics and Economics, 94, 182–195.
Keilbar, G., & Wang, W. (2022). Modelling systemic risk using neural network quantile regression. Empirical Economics, 62(1), 93–118.
Ki Kang, S., Peng, L., & Golub, A. (2020). Two-step risk analysis in insurance ratemaking. Scandinavian Actuarial Journal, 2021, 532–542.
Landsman, Z., Makov, U., & Shushi, T. (2018). A generalized measure for the optimal portfolio selection problem and its explicit solution. Risks, 6(1), 19.
Landsman, Z., Makov, U., & Shushi, T. (2020). Portfolio optimization by a bivariate functional of the mean and variance. Journal of Optimization Theory and Applications, 185(2), 622–651.
Pitselis, G. (2020). Multi-stage nested classification credibility quantile regression model. Insurance: Mathematics and Economics, 92, 162–176.
Rennie, J. D., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling, Menlo Park, CA. AAAI Press, vol. 1.
Xiao, Z., Guo, H., & Lam, M. S. (2015). Quantile regression and value at risk. In Handbook of financial econometrics and statistics (pp. 1143–1167). Springer.
Őzkale, M. R., & Selahattin, K. (2007). The restricted and unrestricted two-parameter estimators. Communications in Statistics—Theory and Methods, 36(15), 2707–2725. doi: 10.1080/03610920701386877.