1. Introduction
Credibility theory is an experience rating method that combines information from the collective and from individual risks to obtain an accurate estimate of the premium of an insurance contract. In situations where exact credibility can be obtained, credibility theory determines how much weight should be assigned to the claim history of an individual. In Bayesian credibility theory, by contrast, we state our belief about the risk parameters in terms of a prior distribution. Then, given the past risk experience, this belief is updated and restated in terms of the posterior distribution. Finally, using this posterior distribution, we derive a predictive distribution to make inferences about future claims. In cases where the measurable space $\mathcal{X}$ (that is, the population) is heterogeneous and can be partitioned into finitely many homogeneous populations, the posterior distribution, and consequently the predictive distribution, cannot be represented in closed form. Therefore, inferential statistics about future claims, including the Bayesian credibility mean, cannot be derived explicitly.
The history of credibility theory began with the papers of Mowbray (1914) and Whitney (1918). They suggested a convex combination $P=\zeta{\bar X}+(1-\zeta)\mu$ of the collective premium, $\mu,$ and the individual premium, ${\bar X},$ with credibility factor $\zeta,$ as an appropriate premium for an insurance contract. In 1950, Bailey restated this premium (well known as the exact credibility premium) in the language of parametric Bayesian statistics. Bühlmann (1967) and Bühlmann & Straub (1970) extended the idea of the exact credibility premium to the model-based approach. After these seminal works, credibility theory became very popular in most areas of actuarial science. For a comprehensive discussion of various developments and methodologies in credibility theory, see Bühlmann & Gisler (2005) and Payandeh (2010). Classical credibility theory provides a relatively simple, but inflexible, approximation to the mean of the predictive distribution. Hong & Martin (2017, 2018) introduced a Dirichlet process mixture model as an alternative to classical credibility theory. They studied several theoretical properties and advantages of their approach and compared it with classical credibility theory. The precise choice of the prior distribution in Bayesian credibility theory has been studied by Hong & Martin (2020, 2022).
The Bayesian credibility mean under mixture distributions has been studied by several authors, such as Lau et al. (2006), Cai et al. (2015), Hong & Martin (2017, 2018), Zhang et al. (2018), Payandeh & Sakizadeh (2019, 2023), and Li et al. (2021), among others. All of their approaches are based on an approximation. For instance, (1) Payandeh & Sakizadeh (2019) approximated the complicated posterior distribution by a mixture distribution and then derived an approximation for the Bayesian credibility mean. Unfortunately, their approximation error rises as the number of past experiences increases; (2) Lau et al. (2006), following Lo (1984), restated the predictive distribution of $X_{n+1}$ given the past claim experience $X_1,\cdots, X_n$ as a finite sum over all possible partitions of the past claim experience. They then used the credibility premium, which is a convex combination of the collective premium (the prior mean) and the sample average of the past claim experience, to derive the Bayesian credibility mean. Under the exponential family of distributions, such a credibility premium coincides with the Bayesian credibility mean; see Payandeh (2010) for more details.
This article considers a random sample $X_1,\cdots,X_n$ from a K-component mixture distribution with the cdf $F_X({\cdot})=\sum_{l=1}^{K}\omega_lG_{l}({\cdot}),$ where $\sum_{l=1}^{K}\omega_l=1.$ Moreover, it assumes that for each random variable $X_i,$ $i=1,\cdots,n,$ there is additional information $Z_{i,1},\cdots, Z_{i,m}$ such that, given this information, one may determine probabilistically which component of the K-component mixture distribution the random variable $X_i$ belongs to, i.e., $P(X_i\sim G_{l}({\cdot})|Z_{i,1},\cdots, Z_{i,m})=\omega_l.$ Under these assumptions, this article provides (1) the Bayesian credibility premium for such a finite mixture distribution, (2) the exact credibility premium for such finite mixture distributions whenever the populations’ claim distributions belong to the exponential family of distributions and the corresponding prior distributions are conjugate to them, (3) a Logistic Regression Credibility (LRC) model for situations in which a 2-component mixture family of distributions is an appropriate choice for data modelling, and (4) a comparison between the Logistic Regression Credibility model and the well-known Regression Tree Credibility model.
The rest of this article is organised as follows. Section 2 collects some useful preliminaries and provides the technical notation and symbols used hereafter. The main results are presented in section 3. The exact credibility mean under the single-parameter exponential family of distributions, along with several examples, is given in section 4. For situations in which a 2-component mixture family of distributions is an appropriate choice for data modelling, section 5 suggests a probabilistic model to formulate such additional information and derives the Bayesian credibility mean for a finite mixture of distributions. Moreover, a comparison between the LRC model and its competitor, the Regression Tree Credibility (RTC) model, is given in section 5.1. Conclusions and suggestions are given in section 6.
2. Preliminaries
A single-parameter exponential family is a family of probability distributions whose probability density/mass function can be restated as
where $a({\cdot}),$ $\phi({\cdot}),$ and $t({\cdot})$ are given functions, and the normalising factor $c({\cdot})$ is defined based on the fact that $\int_{S_X} f(x|\theta)dx=1.$
By setting $\eta=-\phi(\theta),$ Jewell (1974) showed that, based upon a random sample $X_1,\cdots,X_n$ and under the conjugate prior distribution
the Bayesian credibility can be expressed based on the sufficient statistic $t({\cdot})$ as
where the credibility factor is $\zeta_n=n/(n+\alpha_0)$ and ${\bar t}_n=\sum_{i=1}^{n}t(x_i)/n.$
For example, consider the normal distribution with known mean $\mu_0$ and unknown variance $\sigma^2.$ To apply the findings of Jewell (1974), one may define the precision $\theta$ as $\theta=1/\sigma^2$ and set $t(x)=(x-\mu_0)^2/2.$ Considering the Gamma conjugate prior (with parameters $\alpha_0$ and $\beta_0$ ) for $\theta,$ we get
Therefore, the Bayesian credibility prediction for the variance of $X_{n+1}$ is a linear combination of the sample variance and the mean of the conjugate prior.
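The linear-combination structure of this example can be checked numerically. The following is a minimal sketch, assuming the common shape-rate parametrisation of the Gamma prior on the precision (an assumption; Jewell's parametrisation, and hence the exact form of the credibility factor, may differ). The function names are hypothetical. Under this assumption the credibility factor works out to $n/(n+2\alpha_0-2)$ rather than $n/(n+\alpha_0)$, which is only a change of parametrisation.

```python
import numpy as np

def variance_credibility(x, mu0, alpha0, beta0):
    """Credibility form of E(sigma^2 | data) for N(mu0, sigma^2) claims.

    Assumes theta = 1/sigma^2 ~ Gamma(shape=alpha0, rate=beta0) with alpha0 > 1,
    so that the prior mean of sigma^2, beta0 / (alpha0 - 1), exists.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sample_var = np.mean((x - mu0) ** 2)      # variance about the known mean
    prior_mean = beta0 / (alpha0 - 1.0)       # E(1/theta) under the prior
    zeta = n / (n + 2.0 * alpha0 - 2.0)       # credibility factor in this parametrisation
    return zeta * sample_var + (1.0 - zeta) * prior_mean, zeta

def variance_posterior_mean(x, mu0, alpha0, beta0):
    """Same quantity computed directly from the Gamma posterior of theta."""
    x = np.asarray(x, dtype=float)
    shape_post = alpha0 + x.size / 2.0
    rate_post = beta0 + 0.5 * np.sum((x - mu0) ** 2)
    return rate_post / (shape_post - 1.0)     # E(1/theta | data)

claims = [9.1, 10.4, 8.7, 11.2, 10.0]         # illustrative data, not from the article
cred, zeta = variance_credibility(claims, mu0=10.0, alpha0=3.0, beta0=2.0)
direct = variance_posterior_mean(claims, mu0=10.0, alpha0=3.0, beta0=2.0)
```

Both routes give the same number, which is the sense in which the posterior mean of the variance is a credibility-weighted average of the sample variance and the prior mean.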
A random variable X, given the parameter vector $\boldsymbol{\Psi},$ has a K-component finite mixture distribution if its cdf can be written as
$$F_X(x|\boldsymbol{\Psi})=\sum_{l=1}^{K}\omega_lG_l(x|\boldsymbol{\Psi}),\qquad (4)$$
where the $G_l(x|\boldsymbol{\Psi})$ are given cdfs, $\omega_l\in[0,1]$ for $l=1,\cdots,K,$ and $\sum_{l=1}^{K}\omega_l=1.$
Finite mixture distributions have proved remarkably useful in modelling an enormous variety of phenomena in a wide range of fields, including climatology, demographics, economics, actuarial science, statistics, healthcare, mixture-of-experts models, and engineering. Indeed, the shape of a finite mixture distribution is flexible and is able to capture many aspects of the collected data, such as multimodality, heavy tails, truncation, skewness, and kurtosis; see Miljkovic & Grün (2016), Blostein & Miljkovic (2019) and de Alencar et al. (2021), among others, for more details. Moreover, one of the main advantages of finite mixture distributions is that they capture aspects of complex systems that cannot be described by a single distribution; see McLachlan & Peel (2004), among others, for more details on mixture models. A finite mixture distribution is a simple and elementary model, but unfortunately, such simplicity does not extend to the derivation of either the maximum likelihood estimator or Bayes estimators (Lee et al., 2009). In fact, based upon a random sample $X_1,\cdots, X_n,$ the likelihood function of a K-component mixture distribution is a product of sums, which can be expanded into a sum of $K^n$ terms. Therefore, it is computationally too expensive to use for more than a few observations. To overcome this problem, several approaches have been introduced in the literature. For instance, Keatinge (1999) used the Karush-Kuhn-Tucker theorem to provide a maximum likelihood algorithm for estimating the weights of a finite mixture of exponential distributions. Other authors employed a demarginalisation argument (or missing data approach) to assign a random variable $X_i$ to a subgroup using a random latent variable, and then used an EM algorithm (Dempster et al., 1977) or the data augmentation algorithm (Carvajal et al., 2018) to derive an estimate. Some authors came up with an approximation technique; for instance, Payandeh & Sakizadeh (2019) approximated the Bayesian likelihood function for a mixture distribution by a practical and appropriate distribution. Unfortunately, the accuracy of their approximation technique reduces dramatically as the number of observations increases. All of these methods are time-consuming (Frühwirth-Schnatter, 2019) or suffer from low accuracy.
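The claim that the mixture likelihood expands into $K^n$ terms can be illustrated on a toy case. The sketch below assumes exponential components with known rates purely for illustration (the component family, the data, and all function names are hypothetical). It evaluates the likelihood once as a product of sums and once as an explicit sum over every assignment of observations to components.

```python
import itertools
import math

def exp_pdf(x, rate):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)

def likelihood_product(xs, weights, rates):
    """Mixture likelihood in its compact product-of-sums form (n * K evaluations)."""
    out = 1.0
    for x in xs:
        out *= sum(w * exp_pdf(x, r) for w, r in zip(weights, rates))
    return out

def likelihood_expanded(xs, weights, rates):
    """Same likelihood as an explicit sum over all K**n label assignments."""
    K = len(weights)
    total, n_terms = 0.0, 0
    for labels in itertools.product(range(K), repeat=len(xs)):
        term = 1.0
        for x, l in zip(xs, labels):
            term *= weights[l] * exp_pdf(x, rates[l])
        total += term
        n_terms += 1
    return total, n_terms

xs = [0.5, 1.2, 2.0, 0.3]                     # illustrative data
weights, rates = [0.2, 0.5, 0.3], [1.0, 2.0, 0.5]
compact = likelihood_product(xs, weights, rates)
expanded, n_terms = likelihood_expanded(xs, weights, rates)
```

The two values agree, but the expanded form already needs $3^4$ terms for four observations. The growth of that count is what makes any computation that must keep the terms separate (as posterior expectations do) expensive.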
A class of K-component finite mixture distributions is said to be identifiable whenever the equality of any two members $F({\cdot})$ and $F^*({\cdot})$ of this class implies equality of (1) their components, (2) their weights, and (3) their cdfs. Identifiability problems for finite and countable mixtures have been widely investigated. Teicher (1960, 1963) established a necessary and sufficient condition for the identifiability of the class of finite mixture distributions. Moreover, he proved the identifiability of the class of mixtures of Normal (or Gamma) distributions. Atienza et al. (2006) showed that the class of all finite mixture distributions generated by the union of Lognormal, Gamma, and Weibull distributions is identifiable. Unfortunately, most mixture distributions are not identifiable because they are invariant under permutations of the indices of their components. This problem is well known as the “label-switching problem.” The posterior distribution may also inherit the “label-switching problem” from its prior distribution (Rufo et al., 2006, 2007). Under the “label-switching problem,” there is a positive probability that at least one of the components in a finite mixture distribution does not contribute to any of the observations. Therefore, the random sample $x_1,\cdots,x_n$ does not carry any information on this component. Consequently, the unknown parameter(s) of such a component cannot be estimated under either the classical or the Bayesian framework.
A naïve solution to the “label-switching problem” in the classical approach is to impose some constraint on the parameter space (Maroufy & Marriott, 2017). In the Bayesian approach, constraints are added to the prior distribution so that the resulting posterior distribution does not suffer from the “label-switching problem” (Marin et al., 2005). Unfortunately, insufficient care in the choice of suitable identifiability constraints can lead to other problems (Rufo et al., 2006, 2007).
It is worthwhile to mention that if the random variable X, given $\boldsymbol{\Psi},$ has the cdf (4), one may not conclude that $P(X\in PoP_k|\boldsymbol{\Psi})=\omega_k,$ where $X\in PoP_k$ stands for “ $X|\boldsymbol{\Psi}\sim G_k$ .” To see this, consider a 2-component mixture distribution $F(x)=\omega_1 G_1(x)+\omega_2G_2(x).$ For a cdf $G_3({\cdot})$ (chosen so that $G^*_2$ below is itself a valid cdf), set $G^*_1(x)=G_3(x)$ and $G^*_2(x)=(\omega_1G_1(x)+\omega_2G_2(x)-\omega_1G_3(x))/\omega_2.$ Then $F^*(x)= \omega_1G_1^*(x)+\omega_2G_2^*(x)=F(x),$ although the components differ.
Note 1. Throughout this article, we write $X\in PoP_k$ interchangeably with $X\sim G_k.$
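Suppose the parameter vector $\boldsymbol{\Psi}$ can be written as $\boldsymbol{\Psi}=(\theta_1,\theta_2,\cdots,\theta_K).$ Based upon the random sample ${\tilde{\boldsymbol{{X}}}}=(X_1,X_2,\cdots,X_n),$ the likelihood function and the posterior distribution can be written, respectively, as
$$L(\boldsymbol{\Psi}|{\tilde{\boldsymbol{{X}}}})=\prod_{i=1}^{n}\sum_{k=1}^{K}\omega_kg_k(x_i|\theta_k),\qquad (5)$$
$$\pi(\boldsymbol{\Psi}|{\tilde{\boldsymbol{{X}}}})\propto\pi(\boldsymbol{\Psi})\prod_{i=1}^{n}\sum_{k=1}^{K}\omega_kg_k(x_i|\theta_k),\qquad (6)$$
where $\pi(\boldsymbol{\Psi})$ stands for the prior distribution on $\boldsymbol{\Psi}$ and $g_k({\cdot})$ is the density function of the $k^{th}$ component.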
To derive a maximum likelihood estimator (resp. a Bayesian estimator) using Equation (5) (resp. Equation (6)), the missing data approach is the most popular method.
The following note explains this approach.
Note 2. Suppose the random variables $X_1,X_2,\cdots,X_n$ corresponding to the observed sample $x_1,x_2,\cdots,x_n$ are accompanied by the latent binary random vectors $\tilde{\boldsymbol{{H}}}=\left(H_{1,l},H_{2,l},\cdots,H_{n,l}\right)^\prime,$ for $l=1,2,\cdots,K,$ which indicate the component/population from which each observation arises, i.e., $P\!\left(X_i\in PoP_{k}|H_{i,k}=1\right)=1$ and $P\!\left(X_i\notin PoP_{k}|H_{i,k}=0\right)=1.$ The likelihood function (5) and the posterior distribution (6) can then be restated, respectively, as
Now, in the $s{\textrm{th}}$ iteration of the E-step, one takes the expectation with respect to the conditional posterior distribution of the binary latent variable $H_{i,l},$ given the observed data and the parameters updated at the $(s-1){\textrm{th}}$ iteration.
Diebolt & Robert (1994) and Zhang et al. (2004) showed that such a missing data approach is very expensive from a computational viewpoint.
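As an illustration of the E-step described in Note 2, the sketch below computes the conditional expectation of each latent indicator $H_{i,l}$ given the data and the current parameter values, followed by one maximum likelihood M-step. Exponential components, the data, and the function names are assumptions made only for this illustration; the article's own equations for the augmented likelihood are not reproduced here.

```python
import numpy as np

def e_step(x, weights, rates):
    """E(H_{i,l} | x_i, current parameters) for an exponential mixture.

    Returns an (n, K) array whose rows sum to one.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    w = np.asarray(weights, dtype=float)[None, :]    # shape (1, K)
    r = np.asarray(rates, dtype=float)[None, :]      # shape (1, K)
    joint = w * r * np.exp(-r * x)                   # omega_l * g_l(x_i | theta_l)
    return joint / joint.sum(axis=1, keepdims=True)

def m_step(x, resp):
    """Weighted maximum likelihood updates of the weights and rates."""
    x = np.asarray(x, dtype=float)
    mass = resp.sum(axis=0)                          # expected component counts
    new_weights = mass / x.size
    new_rates = mass / (resp * x[:, None]).sum(axis=0)
    return new_weights, new_rates

x = [0.2, 0.4, 3.1, 2.7, 0.3, 4.0]                   # illustrative data
weights, rates = np.array([0.5, 0.5]), np.array([3.0, 0.3])
for _ in range(20):                                  # a few EM iterations
    resp = e_step(x, weights, rates)
    weights, rates = m_step(x, resp)
```

Each iteration costs only $n\times K$ density evaluations, but many iterations (or many posterior draws in the Bayesian version) may be needed, which is one source of the computational burden noted above.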
Working directly with the likelihood function (5) or the posterior distribution (6) is known as the combinatorial approach; see Marin et al. (2005) for a brief review. The combinatorial approach restates such product-of-sums expressions as sums of $K^n$ terms. To avoid a long presentation, we use the notation and symbols defined in Table 1.
Note: $\mathcal{S}^n,$ $\mathcal{S}^n_i,$ $B_{ir}$ and $B_{ir}^c$ are defined on the indices of the observations rather than on their values.
Before we go further, we provide a simple example.
Consider a 2-component mixture distribution function with density function $\omega_1f_1(x|\theta_1)+\omega_2f_2(x|\theta_2).$ Moreover, suppose that we have sample observation $X_1,X_2,X_3.$ Using Table 1’s symbols, we have
The likelihood function can be restated as
It is worthwhile to mention that a given K-component mixture distribution can be reformulated as
where
This type of presentation will be employed whenever we wish to estimate only the parameter of the $l{\textrm{th}}$ component.
Hereafter, we assume that the K-component mixture distribution (4) is an identifiable model.
The following lemma uses the combinatorial method to restate the likelihood function for the K-component mixture distribution (4).
Lemma 1. Suppose that random sample $X_1,\cdots,X_n$ comes from the K-component mixture distribution (4). The likelihood function for mixtures of distributions can be restated in the following recursive manner
where $L_{K-1}\!\left(\boldsymbol{\Psi}({-}K)|{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\right)$ stands for the likelihood function based upon the density function $g^*({\cdot})$ (given by Equation (7)) and the random sample ${\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}.$
Proof. Using the fact that
and such partitions are disjoint, one may restate the likelihood function as
The second equality follows from the assumption that, given the parameter vector $\boldsymbol{\Psi},$ the two random samples ${\tilde{\boldsymbol{{X}}}}_{B_{ir}}$ and ${\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}$ are independent.
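The recursive idea behind Lemma 1 can be sketched in code. The version below is an illustration under stated assumptions, not a transcription of the lemma's displayed formula: it assumes that the last component is split off as $\omega_K g_K+(1-\omega_K)g^*,$ with $g^*$ the mixture of the remaining components under weights renormalised by $1-\omega_K,$ that $B_{ir}$ runs over all index subsets of size $i,$ and that all weights are strictly between 0 and 1. Exponential components and all function names are hypothetical.

```python
import itertools
import math

def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)

def likelihood_product(xs, weights, rates):
    """Reference value: the product-of-sums form of the mixture likelihood."""
    return math.prod(
        sum(w * exp_pdf(x, r) for w, r in zip(weights, rates)) for x in xs
    )

def likelihood_recursive(xs, weights, rates):
    """Peel off the last component over every index subset B, then recurse."""
    K = len(weights)
    if K == 1:
        return math.prod(exp_pdf(x, rates[0]) for x in xs)
    w_last, r_last = weights[-1], rates[-1]
    rest_w = [w / (1.0 - w_last) for w in weights[:-1]]   # renormalised weights of g*
    n = len(xs)
    total = 0.0
    for i in range(n + 1):                                # i = size of B
        for B in itertools.combinations(range(n), i):     # subsets of indices, not values
            in_B = set(B)
            x_B = [xs[j] for j in B]
            x_Bc = [xs[j] for j in range(n) if j not in in_B]
            part_last = w_last ** i * math.prod(exp_pdf(x, r_last) for x in x_B)
            part_rest = (1.0 - w_last) ** (n - i) * likelihood_recursive(
                x_Bc, rest_w, rates[:-1]
            )
            total += part_last * part_rest
    return total

xs = [0.5, 1.2, 2.0, 0.3]                                 # illustrative data
weights, rates = [0.2, 0.5, 0.3], [1.0, 2.0, 0.5]
direct = likelihood_product(xs, weights, rates)
recursive = likelihood_recursive(xs, weights, rates)
```

The recursion reproduces the product form exactly, while making explicit the sum over subsets that later drives the cost of the Bayesian credibility mean.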
The Bayes estimator for a given parameter of the K-component mixture distribution (4) under the squared error loss function is given as follows.
Lemma 2. Assume that the random sample $X_1,\cdots,X_n$ comes from the K-component mixture distribution (4). Moreover, assume that $\pi(\theta_1,\theta_2,\cdots,\theta_K)=\prod_{j=1}^{K}\pi(\theta_j).$ Then, the Bayes estimator, under the squared error loss function, for the parameter $\theta_l$ is
where
Proof. The posterior distribution of $\Theta_l|{\tilde{\boldsymbol{{X}}}}$ can be restated as
where ${\mathbf{\int}}_{\psi({-}l)}$ stands for $\int_{\theta_1}\cdots\int_{\theta_{l-1}}\int_{\theta_{l+1}}\cdots \int_{\theta_{K}},$ ${\textbf{d}}\psi({-}l)=d\theta_1\cdots d\theta_{l-1}d\theta_{l+1}\cdots d\theta_K$ and $ \displaystyle C_{ir}^{(l)} =P\!\left({\tilde{\boldsymbol{{X}}}}_{B_{ir}}\in PoP_{l},{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\notin PoP_{l}|{\tilde{\boldsymbol{{X}}}}\in\bigcup_{l=1}^{K}PoP_l\right).$
Since the Bayes estimator under the squared error loss function is the posterior expectation, we obtain the desired result.
Now, we concentrate on the Bayesian credibility mean for the K-component mixture distribution (4).
3. A Recursive Formula for the Bayesian Credibility Mean
The Bayesian credibility mean of $X_{n+1}$ based upon the past information $X_1,X_2,\cdots,X_n$ is
The following theorem provides a recursive expression for the Bayesian credibility mean under the K-component mixture distribution (4).
Theorem 1. Assume that the observations $X_1,\cdots,X_n$ come from the K-component mixture distribution (4). Moreover, suppose that the prior distribution $\pi(\theta_1,\theta_2,\cdots,\theta_K)$ factorises over the components, i.e., $\pi(\theta_1,\theta_2,\cdots,\theta_K)=\prod_{k=1}^{K}\pi_k(\theta_k).$ The Bayesian credibility mean based upon such a random sample and the K-component mixture distribution is
where $\displaystyle C_{ir}^{(K)} =P\!\left({\tilde{\boldsymbol{{X}}}}_{B_{ir}}\in PoP_{K},{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\notin PoP_{K}|{\tilde{\boldsymbol{{X}}}}\in\bigcup_{k=1}^{K}PoP_k\right).$
Proof. Using the definition of $C_{ir}^{(K)},$ one may conclude that
Theorem 1 provides a recursive formula to evaluate the Bayesian credibility mean. In practice, applying this theorem is computationally very expensive, as the following example shows.
Teicher (1960, 1963) established the identifiability of the class of mixtures of Gamma distributions. Using this fact, the following example provides the Bayesian credibility mean (or premium) for a 2-component exponential mixture distribution with Gamma conjugate prior distributions.
Example 1. Suppose that, given the parameter vector $\boldsymbol{\Psi}=(\theta_1,\theta_2)$ , the random sample $X_1,X_2,\cdots,X_n$ is obtained from a 2-component exponential mixture distribution with density function
where $\omega_1,\omega_2\in[0,1]$ and $\omega_1+\omega_2=1.$ Moreover, consider the conjugate prior $Gamma(\alpha_i,\beta_i)$ for parameter $\theta_i,$ for $i=1,2.$ Now, we are interested in the Bayesian credibility premium under this setting.
To obtain the desired Bayesian credibility premium, we employ the result of Theorem 1. Application of this theorem proceeds in the following two steps:
Step 1) $C_{ir}^{(2)}=P\!\left({\tilde{\boldsymbol{{X}}}}_{B_{ir}}\in PoP_{2}\,\& \,{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\notin PoP_{2}|{\tilde{\boldsymbol{{X}}}}\in\bigcup_{l=1}^{2}PoP_l\right),$
Step 2) $\mathbf{E}_{1}\bigg(X_{n+1}|{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\in PoP_{1}\bigg)$ and $\mathbf{E}_{1}\bigg(X_{n+1}|{\tilde{\boldsymbol{{X}}}}_{B_{ir}}\in PoP_{2}\bigg).$
For Step 1 observe that:
Therefore, one has to calculate
Using the above findings, we have
Now observe that:
One may similarly calculate $\pi\!\left(\theta_1|{\tilde{\boldsymbol{{X}}}}_{B_{ir}^c}\in PoP_{1}\right).$
Now, we move to Step 2.
Similarly,
Therefore, using Theorem 1 the Bayesian credibility premium is
It is worth mentioning that when $\omega_1=1$ (equivalently, $\omega_2=0$ ), the only nonzero term of the summation $\sum_{i=0}^n\sum_{r=1}^{\binom{n}{i}}C_{ir}^{(2)}$ is the one for $i=0,$ for which $C_{01}^{(2)}=1.$ Therefore, under this setting, the Bayesian credibility premium given by Equation (9) reduces to the well-known Bayesian credibility premium under the Exponential-Gamma assumption; in other words,
where ${\bar{\boldsymbol{{x}}}}_{\mathcal{S}^n}={\bar{\boldsymbol{{x}}}}=\sum_{k=1}^nx_k/n.$
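The structure of Example 1 can be mimicked by brute force for small $n.$ The sketch below is an illustration, not a transcription of Equation (9): it computes the posterior predictive mean of a 2-component exponential mixture with known weights and independent Gamma priors by summing over every subset of observations assigned to the second component. It assumes exponential components parametrised by their rates, Gamma(shape, rate) priors with shape greater than 1, and hypothetical function names and data.

```python
import itertools
import math

def log_marginal(xs, alpha, beta):
    """Log marginal likelihood of exponential data under a Gamma(alpha, rate=beta) prior."""
    k, s = len(xs), sum(xs)
    return (alpha * math.log(beta) + math.lgamma(alpha + k)
            - math.lgamma(alpha) - (alpha + k) * math.log(beta + s))

def component_premium(xs, alpha, beta):
    """E(1/theta | xs): the single-component Exponential-Gamma credibility premium."""
    return (beta + sum(xs)) / (alpha + len(xs) - 1.0)

def mixture_premium(xs, w1, prior1, prior2):
    """Predictive mean E(X_{n+1} | xs) by enumerating all 2**n assignments."""
    w2 = 1.0 - w1
    n = len(xs)
    num = den = 0.0
    for i in range(n + 1):                              # i observations from component 2
        for B in itertools.combinations(range(n), i):
            chosen = set(B)
            x_B = [xs[j] for j in B]
            x_Bc = [xs[j] for j in range(n) if j not in chosen]
            # unnormalised posterior weight of this assignment
            weight = (w2 ** i) * (w1 ** (n - i)) * math.exp(
                log_marginal(x_B, *prior2) + log_marginal(x_Bc, *prior1))
            # predictive mean of the mixture given this assignment
            pred = (w1 * component_premium(x_Bc, *prior1)
                    + w2 * component_premium(x_B, *prior2))
            num += weight * pred
            den += weight
    return num / den

claims = [1.0, 2.5, 0.7, 3.1]                           # illustrative data
prior1, prior2 = (3.0, 2.0), (4.0, 6.0)                 # (shape, rate), hypothetical values
premium = mixture_premium(claims, 0.4, prior1, prior2)
degenerate = mixture_premium(claims, 1.0, prior1, prior2)
```

With $\omega_1=1$ only the empty subset carries weight, and the result collapses to the single-component premium $(\beta_1+\sum_k x_k)/(\alpha_1+n-1),$ in line with the remark above.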
The combinatorial structure of the Bayesian credibility mean (see Equation (9)) for Example 1 makes it very hard to use. Table 2 reports the number of combinations one has to consider when using Equation (9). As one may observe, implementing Equation (9) even for a sample size of $n=30$ is very expensive and is impractical on a regular computer.
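The growth in the number of terms is easy to tabulate. The following short sketch counts the subsets $B_{ir}$ in the double sum $\sum_{i=0}^n\sum_{r=1}^{\binom{n}{i}}$ for the 2-component case; it is not a reproduction of Table 2, and the function name is hypothetical.

```python
import math

def n_subsets(n):
    """Number of (i, r) pairs in the double sum: sum over i of C(n, i) = 2**n."""
    return sum(math.comb(n, i) for i in range(n + 1))

counts = {n: n_subsets(n) for n in (5, 10, 20, 30)}
```

For $n=30$ the count is $2^{30}=1{,}073{,}741{,}824,$ and each term requires its own posterior quantities.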
To remove this barrier, we have two possibilities:
(1) Approximate $C_{ir}^{(l)}$ by a function which depends only on i and l;
(2) Impose some restriction on our problem such that $C_{ir}^{(l)}$ does not depend on r.
The first approach was, in essence, employed by Lau et al. (2006). They used a sampling scheme based on a weighted Chinese Restaurant algorithm to estimate the Bayesian credibility for the infinite mixture model from observed data.
The next section considers a situation where the above recursive formula is simplified and the exact Bayesian credibility mean is obtained.
4. Exact Bayesian Credibility Mean
Hereafter, we follow the second approach. Therefore, we consider the following model assumption.
Model Assumption 1. Suppose that, given the parameter vector $\boldsymbol{\Psi},$ the random variables $X_1,\cdots, X_n$ are i.i.d. Moreover, suppose that there is additional information $Z_{i,1},\cdots, Z_{i,m}$ such that, given this information, the random variable $X_i$ has the cdf $G_l({\cdot})$ with probability $\omega_{l},$ for $l=1,2,\cdots,K,$ where $\sum_{l=1}^{K}\omega_{l}=1.$
The following lemma shows that, under the above model assumption, the random variables $X_1,\cdots, X_n$ follow the K-component mixture distribution (4).
Lemma 3. Under Model Assumption 1, given $\boldsymbol{\Psi},$ the random variables $X_1,\cdots, X_n$ follow the K-component mixture distribution (4).
Proof. Under Model Assumption 1, given $\boldsymbol{\Psi},$ the random variables $X_1,\cdots, X_n$ are i.i.d. Therefore, we just need to find the distribution of the random variable $X_1.$
Another useful property of Model Assumption 1 has been given by the following.
Lemma 4. Under Model Assumption 1, the $C_{ir}^{(l)}$ defined in Theorem 1 can be simplified as
Proof. Conditioning the $C_{ir}^{(l)}$ on $\boldsymbol{\Psi},$ one may restate
The last two equalities follow from the fact that $P({X_1}\in PoP_j|\boldsymbol{\Psi})=\omega_j$ and that the posterior distribution $\pi(\boldsymbol{\Psi}|X_1,X_2,\cdots,X_n)$ is a proper distribution.
Under Model Assumption 1, the result of Theorem 1 simplifies as follows.
Corollary 1. Under Model Assumption 1, the Bayesian credibility mean is
Now, by several examples, we develop the Bayesian credibility mean under the single-parameter exponential family of distributions.
For simplicity of presentation, hereafter we consider only the single-parameter exponential family of distributions given by Equation (1) with $\phi(\theta)=-\theta;$ for some possible extensions of our findings, see section 5.
Before moving further, it is useful to observe that
The identifiability of the class of mixtures of normal distributions has been established by Teicher (1960, 1963). Therefore, we may consider the following example.
Example 2. Suppose that under Model Assumption 1, the random sample $X_1,X_2,\cdots,X_n,$ given the parameter vector $\boldsymbol{\Psi}=(\theta_1,\theta_2,\theta_3)^\prime,$ is distributed according to the following 3-component normal mixture distribution
where $\sigma_1^2,$ $\sigma_2^2$ and $\sigma_3^2$ are given, $\omega_1,\omega_2,\omega_3\in[0,1]$ and $\omega_1+\omega_2+\omega_3=1.$
Moreover, suppose that, for $l=1,2,3,$ $\theta_l$ has the conjugate prior distribution $N\!\left(\mu_l,b_l^2\right).$
Now, using the result of Corollary 1, the recursive Bayesian credibility mean/premium is
Another application of Theorem 1 leads to
The exact Bayesian credibility mean under a 1-component normal mixture distribution helps us to conclude
see Bühlmann & Gisler (2005), among others, for more details.
Substituting the above findings in Equation (11) and an application of Equation (10), the Bayesian credibility mean $\mathbf{E}_{3}(X_{n+1}|{\tilde{\boldsymbol{{X}}}})$ can be restated as
where
To illustrate the recursive formula given in Theorem 1, the following example considers a 4-component mixture distribution.
Example 3. Suppose that under Model Assumption 1, the random sample $X_1,X_2,\cdots,X_n,$ given the parameter vector $\boldsymbol{\Psi}=(\theta_1,\theta_2,\theta_3,\theta_4)^\prime,$ is distributed according to the following 4-component normal mixture distribution
where, for $l=1,2,3,4,$ the variances $\sigma_l^2$ are given, $\omega_l\in[0,1]$ and $\omega_1+\omega_2+\omega_3+\omega_4=1.$
Moreover, suppose that, for $l=1,2,3,4,$ $\theta_l$ has the conjugate prior distribution $N\!\left(\mu_l,b_l^2\right).$
Now an application of Corollary 1 leads to the following Bayesian credibility mean
And again, application of Corollary 1 leads to
The exact credibility mean is well known for a 1-component normal mixture distribution (Bühlmann & Gisler, 2005). Using this knowledge, we have
Substituting the above findings in Equation (11), the Bayesian credibility mean is
where
In the above two examples, we considered only situations in which all components of the mixture distribution belong to the same family of distributions. The following example considers a case in which the mixture combines different families of distributions. Using the method of Atienza et al. (2006), we established (the proof is omitted for brevity) that mixtures formed from the union of Gamma, Lognormal and Weibull distributions constitute an identifiable class. Therefore, without any concern about identifiability, we may consider the following example.
Example 4. Suppose that under Model Assumption 1, the random sample $X_1,X_2,\cdots,X_n,$ given the parameter vector $\boldsymbol{\Psi}=(\theta_1,\theta_2,\theta_3),$ is distributed according to the following distribution
where parameters $\alpha,$ $\sigma_0^2,$ $\lambda$ are given and given weights $\omega_1,$ $\omega_2$ and $\omega_3$ satisfy $\omega_1+\omega_2+\omega_3=1.$
Moreover, suppose that $\theta_1,$ $\theta_2$ and $1/\theta_3,$ respectively, have the conjugate prior distribution $Gamma(\alpha_1,\beta_1),$ $N\!\left(\mu_2,b_2^2\right)$ and $Gamma(\alpha_3,\beta_3).$
It is well known that the exact Bayesian credibility means for a 1-component Gamma mixture, a 1-component Lognormal mixture and a 1-component Weibull mixture distribution are, respectively,
see Bühlmann & Gisler (2005), among others, for more details.
Using the above results along with double applications of Corollary 1, the Bayesian credibility mean is
where
The next section develops a practical idea based on logistic regression: a probabilistic model that uses the additional information $Z_{i,1},\cdots,Z_{i,m}$ to assign the populations’ weights whenever the measurable space $\mathcal{X}$ is partitioned into two populations.
5. Logistic Regression Credibility for Two Populations
Consider a situation in which the measurable space $\mathcal{X}$ can be partitioned into two populations. Moreover, suppose that for each random variable $X_i$ some additional information $Z_{i,1},\cdots,Z_{i,m}$ is available. Now, using logistic regression, one may evaluate the first population’s weight by
Therefore, the result of Corollary 1 simplifies as follows. Since this result originates from logistic regression, hereafter we call it “Logistic Regression Credibility.”
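A population weight of this kind can be computed as follows. This is a minimal sketch that assumes the standard logistic form $\omega=1/(1+e^{-\eta})$ with a linear predictor $\eta$ in the covariates; the intercept, coefficients and covariate values below are hypothetical and are not taken from the article.

```python
import math

def logistic_weight(z, intercept, coefs):
    """Probability of belonging to the first population under a logistic model.

    z         : covariates (Z_{i,1}, ..., Z_{i,m}) of one policyholder
    intercept : hypothetical fitted intercept
    coefs     : hypothetical fitted slopes, one per covariate
    """
    eta = intercept + sum(b * zi for b, zi in zip(coefs, z))   # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted model with two covariates
omega = logistic_weight(z=[1.0, 40.0], intercept=-0.5, coefs=[0.8, -0.02])
```

The complementary weight of the second population is then $1-\omega,$ so the two weights sum to one as Model Assumption 1 requires.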
Remark 1. Suppose that measurable space $\mathcal{X}$ can be partitioned into two populations, then, under Model Assumption 1, the Bayesian credibility mean is
where $\omega$ is given by Equation (13).
The following example presents a practical application of the Logistic Regression Credibility (given by Remark 1).
Example 5. Suppose an insurance company, based upon its past experience, has classified its policyholders into two homogeneous groups, labelled “Group 1” and “Group 2,” where the claim size distribution for Group 1 is a Normal distribution (with mean $\theta_1$ and variance $0.36$ ) and for Group 2 is a Normal distribution (with mean $\theta_2$ and variance $0.40$ ), and where $\theta_1$ and $\theta_2$ are distributed according to $N(9,0.25)$ and $N(10, 0.25),$ respectively. Moreover, suppose that the insurance company has developed the following logistic regression model to assign a policyholder to “Group 1”
where $z_1,\cdots, z_5,$ respectively, stand for Gender (0=male and 1=female), Marital Status (0=single and 1=Married), Age (ranges from 20 to 80), Occupation class (distinct values 1,2, 3, and 4) and location (distinct values 1 to 30).
Now consider a 40-year-old single man who lives in the location labelled 9 and whose occupation class is 3. Moreover, suppose that his loss reports for the past 10 years are $16.19502,$ $13.92823,$ $15.69760,$ $15.00515,$ $15.30293,$ $16.54005,$ $16.03626,$ $16.84823,$ $14.49716,$ $14.75258.$
Using Equation (14), the policyholder belongs to “Group 1” with probability $\omega=0.2378$ (and to “Group 2” with probability $1-\omega=0.7622$ ), and his Bayesian credibility premium for the next year is
The Logistic Regression Credibility (LRC) and the Regression Tree Credibility (RTC) share the same idea: both use a statistical model to partition the measurable space $\mathcal{X}$ into populations. However, the RTC method develops a credibility prediction for each population, while the LRC provides a single credibility prediction for all populations, with different weights. The following subsection shows that, at least in some cases, the LRC has a lower risk function.
5.1 Logistic regression credibility versus the regression tree credibility
Diao & Weng (2019) introduced the RTC model. In the first step, their model employs statistical techniques (such as logistic regression) to partition the measurable space $\mathcal{X}$ into small regions in which a simple model provides a good fit. Then, in the second step, it applies the Bühlmann-Straub credibility premium formula within each region to predict the credibility premium. More precisely, given the observed data $X_i$ and its associated information $Z_{i,1},\cdots,Z_{i,m},$ for $i=1,\cdots,n,$ a statistical model, such as the logistic regression given by Equation (13), determines the probability that the claim experience $X_1,X_2,\cdots,X_{n}$ arises from Population 1. If this probability exceeds $1/2,$ the credibility premium is predicted using the model developed for Population 1; otherwise, the model developed for Population 2 is used.
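The contrast between the two prediction rules can be made concrete with the numbers of Example 5. The sketch below rests on several assumptions: the second parameter of $N(9,0.25)$ and $N(10,0.25)$ is read as a variance, each population's premium uses the standard normal-normal credibility factor $n\tau^2/(n\tau^2+\sigma^2),$ the LRC premium is the $\omega$-weighted combination quoted in the proof of Lemma 5, and the RTC rule is the hard assignment described above. The function names are hypothetical.

```python
def normal_credibility(xbar, n, mu, tau2, sigma2):
    """Normal-normal credibility premium for a single population."""
    xi = n * tau2 / (n * tau2 + sigma2)        # credibility factor
    return xi * xbar + (1.0 - xi) * mu

def lrc_premium(xbar, n, omega, pop1, pop2):
    """Soft rule: weight both populations' premiums by omega and 1 - omega."""
    return (omega * normal_credibility(xbar, n, *pop1)
            + (1.0 - omega) * normal_credibility(xbar, n, *pop2))

def rtc_premium(xbar, n, omega, pop1, pop2):
    """Hard rule: use Population 1's model only if omega exceeds 1/2."""
    chosen = pop1 if omega > 0.5 else pop2
    return normal_credibility(xbar, n, *chosen)

losses = [16.19502, 13.92823, 15.69760, 15.00515, 15.30293,
          16.54005, 16.03626, 16.84823, 14.49716, 14.75258]
n = len(losses)
xbar = sum(losses) / n
pop1 = (9.0, 0.25, 0.36)    # (prior mean, prior variance, claim variance), Group 1
pop2 = (10.0, 0.25, 0.40)   # same for Group 2
omega = 0.2378              # Group 1 weight reported in Example 5

lrc = lrc_premium(xbar, n, omega, pop1, pop2)
rtc = rtc_premium(xbar, n, omega, pop1, pop2)
```

Since $\omega<1/2$ here, the hard rule returns Group 2's premium alone, whereas the soft rule lies between the two group premiums.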
Under the squared error loss function, the RTC method reduces the risk function of the prediction compared with the regular credibility method. Diao & Weng (Reference Diao and Weng2019) proved this for the situation where the measurable space $\mathcal{X}$ is partitioned into two distinct classes.
The following lemma shows that, at least for $\omega$ in an interval around $1/2,$ the LRC’s risk function dominates the RTC’s risk function.
Lemma 5. Under Model Assumption 1, consider the following two scenarios for predicting the credibility mean based upon the i.i.d. random claim experience $X_1,X_2,\cdots,X_{n}.$
- Scenario 1 (the LRC approach): The claim experience $X_1,X_2,\cdots,X_n,$ given the parameter vector $\boldsymbol{\Psi}=(\theta_1,\theta_2)^\prime,$ is distributed according to the following 2-component normal mixture distribution $\omega N\!\left(\theta_1,\sigma_1^2\right)+(1-\omega)N\!\left(\theta_2,\sigma_2^2\right),$ where $\sigma_1^2,$ $\sigma_2^2$ are given, $\omega\in[0,1]$ and, for $j=1,2,$ $\theta_j$ has the conjugate prior distribution $N\!\left(\mu_j,\tau_j^2\right).$
- Scenario 2 (the RTC approach): The measurable space $\mathcal{X}$ is partitioned into two populations such that, if the i.i.d. random claim experience $X_1,X_2,\cdots,X_{n}$ belongs to Population $j$ ($j=1,2$), then $E(X_i)=\mu_j,$ $Var(E(X_i|\Theta)) =\tau^2_j$ and $E( Var(X_i|\Theta))= \sigma^2_j.$
Then, under the squared error loss function, the LRC’s risk function dominates the RTC’s risk function at least when the population’s weight $\omega$ (given by Equation (13)) lies in the interval $I=\left[\left(\tau_2^2 - R_2\right)/\left(R_1+\tau_2^2\right),\,\, (R_1+R_2)/\left(R_2+\tau_1^2\right)\right],$ where $R_l =\sigma_l^2\tau_l^2/\left(n \tau_l^2+\sigma_l^2\right)$ for $l=1,2.$
Proof. As in Example 2, one may show that under Scenario 1 the Bayesian credibility premium is $\omega[\xi_1\bar{X}+(1-\xi_1)\mu_1] +(1-\omega)[\xi_2\bar{X}+(1-\xi_2)\mu_2]$ and its corresponding risk function under the squared error loss function is
where $\xi_1=\sum_{i=0}^{n} \omega^i (1-\omega)^{n-i}\binom{n}{i}\frac{i\tau_1^2}{i\tau_1^2 + \sigma_1^2}$ and $\xi_2=\sum_{i=0}^{n} \omega^i (1-\omega)^{n-i}\binom{n}{i}\frac{(n-i)\tau_2^2}{(n-i)\tau_2^2 + \sigma_2^2}.$
However, under Scenario 2, since the RTC method employs the Bühlmann-Straub credibility premium formula, its credibility premium is $ \alpha_j\bar{X} + \left(1- \alpha_j\right) \mu_j,$ where $\alpha_j = \frac{n}{n + \sigma^2_j/\tau^2_j},$ whenever Population $j$ ($j=1,2$) has been chosen. Therefore, its corresponding risk function under the squared error loss function is
where $\omega,$ the probability that the past claim experience $X_1,X_2,\cdots, X_n$ belongs to Population 1, is derived from Equation (13).
Now observe that the difference between the above two risk functions, $L_{LRC}(\omega) - L_{RTC}(\omega),$ can be restated as
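The two premium formulas quoted in the proof can be compared side by side in a short Python sketch. It uses the credibility factors $\xi_1,$ $\xi_2$ and $\alpha_j$ exactly as written above, and takes the ten loss reports and $\omega=0.2378$ from the numerical example. The group parameters `mu`, `sigma2` and `tau2` are hypothetical values chosen only for illustration, not the fitted values of the paper, and the function names are likewise assumptions.

```python
from math import comb


def lrc_factors(n, omega, sigma2, tau2):
    """Credibility factors xi_1 and xi_2 of Scenario 1 (binomial-weighted sums)."""
    xi1 = 0.0
    xi2 = 0.0
    for i in range(n + 1):
        weight = comb(n, i) * omega**i * (1 - omega) ** (n - i)
        xi1 += weight * i * tau2[0] / (i * tau2[0] + sigma2[0])
        xi2 += weight * (n - i) * tau2[1] / ((n - i) * tau2[1] + sigma2[1])
    return xi1, xi2


def lrc_premium(xbar, n, omega, mu, sigma2, tau2):
    """One premium that weights both populations by omega and 1 - omega."""
    xi1, xi2 = lrc_factors(n, omega, sigma2, tau2)
    part1 = xi1 * xbar + (1 - xi1) * mu[0]
    part2 = xi2 * xbar + (1 - xi2) * mu[1]
    return omega * part1 + (1 - omega) * part2


def rtc_premium(xbar, n, omega, mu, sigma2, tau2):
    """Buhlmann-Straub premium of the population selected by the 1/2 rule."""
    j = 0 if omega > 0.5 else 1
    alpha = n / (n + sigma2[j] / tau2[j])
    return alpha * xbar + (1 - alpha) * mu[j]


# Loss reports and omega from the numerical example in the text.
losses = [16.19502, 13.92823, 15.69760, 15.00515, 15.30293,
          16.54005, 16.03626, 16.84823, 14.49716, 14.75258]
n = len(losses)
xbar = sum(losses) / n
omega = 0.2378

# Hypothetical group parameters (assumed for illustration only).
mu = (10.0, 16.0)
sigma2 = (4.0, 1.0)
tau2 = (1.0, 1.0)

print("LRC premium:", lrc_premium(xbar, n, omega, mu, sigma2, tau2))
print("RTC premium:", rtc_premium(xbar, n, omega, mu, sigma2, tau2))
```

When $\omega$ equals 0 or 1 the binomial sums collapse to a single term, so in this sketch the two premiums coincide at those extremes; they differ for intermediate weights.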
where
Now, without loss of generality, assume that we can take a derivative with respect to i and observe that
at $i=0,$ $M_0<0,$ at $i=n,$ $M_n>0,$ and $\partial^2H_i(\omega)/\partial i^2>0.$ This means that $H_i(\omega)$ is a convex function of i, so over $0\leq i\leq n$ it attains its maximum at an endpoint, $i=0$ or $i=n.$
Therefore,
Imposing negativity on Equations (15) and (16), respectively, leads to $\omega\leq(R_1+R_2)/\left(\tau_1^2+R_2\right)$ and $1-\omega\leq(R_1+R_2)/\left(\tau_2^2+R_1\right).$ This observation yields the desired result.
One should note that the above $H_i(\omega)$ can also be stated as
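The two bounds on $\omega$ are easy to evaluate numerically. The sketch below, in Python, computes $R_1,$ $R_2$ and the endpoints of the interval $I$ from Lemma 5 for a given $(n,\sigma_1^2,\sigma_2^2,\tau_1^2,\tau_2^2).$ The function names and the parameter values are assumptions made for illustration. For some parameter choices the lower endpoint exceeds the upper one, so the sketch also reports whether the interval is non-empty.

```python
def r_term(n, sigma2, tau2):
    """R_l = sigma_l^2 * tau_l^2 / (n * tau_l^2 + sigma_l^2)."""
    return sigma2 * tau2 / (n * tau2 + sigma2)


def dominance_interval(n, sigma2_1, sigma2_2, tau2_1, tau2_2):
    """Endpoints of the interval I in Lemma 5, plus a non-emptiness flag."""
    r1 = r_term(n, sigma2_1, tau2_1)
    r2 = r_term(n, sigma2_2, tau2_2)
    lower = (tau2_2 - r2) / (r1 + tau2_2)   # from 1 - omega <= (R1 + R2) / (tau_2^2 + R1)
    upper = (r1 + r2) / (r2 + tau2_1)       # from omega <= (R1 + R2) / (tau_1^2 + R2)
    return lower, upper, lower <= upper


# Hypothetical inputs: n = 10 claims, sigma_l^2 = 9 and tau_l^2 = 1 in both groups.
lower, upper, non_empty = dominance_interval(10, 9.0, 9.0, 1.0, 1.0)
print(lower, upper, non_empty)

# With a smaller process variance (sigma_l^2 = 1) the two bounds cross.
print(dominance_interval(10, 1.0, 1.0, 1.0, 1.0))
```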
where $A_i^{(l)} = \frac{(n-i)^2 \tau_l^2+n\sigma_l^2}{\left((n-i)\tau_l^2+\sigma_l^2\right)^2} $ , $B^{(l)}=\frac{n^2\tau_l^2+n\sigma_l^2}{\left(n\tau_l^2+\sigma_l^2\right)^2} $ and $C^{(l)} =\frac{\sigma_l^2 \tau_l^2}{n} $ for $l=1,2.$
Since $A^{(l)}_i$ is an increasing function with respect to i, one may observe that $A^{(l)}_i\leq A^{(l)}_n$ and $A_0^{(l)} =B^{(l)}.$ This fact allows one to conclude that
and consequently $L_{LRC}(\omega=0.5) - L_{RTC}(\omega=0.5)\leq0.$ The continuity of $L_{LRC}(\omega) - L_{RTC}(\omega)$ in $\omega$ shows that, at least in an interval around $\omega=0.5,$ the LRC’s risk function dominates the RTC’s risk function. In other words, when the probability used to assign the past claim experience to one of the populations is only slightly above 50%, we suggest using the LRC method rather than deriving the credibility mean for the future claim with the RTC method.
Figure 1 illustrates the behaviour of $L_{LRC}(\omega) -L_{RTC}(\omega)$ with respect to $\omega,$ for some values of $(n,\sigma_1, \sigma_2, \tau_1, \tau_2).$
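The two properties of $A_i^{(l)}$ used in this argument, namely that it increases with i and that $A_0^{(l)}=B^{(l)},$ can be checked numerically. The Python sketch below does so for a single component; the values of $n,$ $\sigma_l^2$ and $\tau_l^2$ are hypothetical, and the check is an illustration rather than a substitute for the analytic argument.

```python
def a_term(i, n, sigma2, tau2):
    """A_i = ((n-i)^2 tau^2 + n sigma^2) / ((n-i) tau^2 + sigma^2)^2."""
    k = n - i
    return (k * k * tau2 + n * sigma2) / (k * tau2 + sigma2) ** 2


def b_term(n, sigma2, tau2):
    """B = (n^2 tau^2 + n sigma^2) / (n tau^2 + sigma^2)^2."""
    return (n * n * tau2 + n * sigma2) / (n * tau2 + sigma2) ** 2


# Hypothetical parameter values for one mixture component.
n, sigma2, tau2 = 10, 4.0, 1.5

values = [a_term(i, n, sigma2, tau2) for i in range(n + 1)]
is_increasing = all(x <= y for x, y in zip(values, values[1:]))
matches_b = abs(values[0] - b_term(n, sigma2, tau2)) < 1e-12

print("A_i increasing in i:", is_increasing)
print("A_0 equals B:", matches_b)
```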
6. Discussion and Suggestions
This article considered the Bayesian credibility prediction for the mean of $X_{n+1}$ under a finite class of mixture distributions. In the first step, it developed a recursive formula for the Bayesian credibility mean under such a class of distributions. Since the implementation of the recursive formula is very expensive (see Example 1), it imposed some additional conditions on the problem. More precisely, it assumed that the random variables $X_i,$ for $i=1,\cdots,n,$ corresponding to the observed sample $x_i$ are accompanied by additional information $Z_{i,1},\cdots,Z_{i,m},$ and that, under a probabilistic model, one may use this observable information to determine the population of each random variable $X_i$; see Model Assumption 1 for more details. Under this new assumption, it developed an exact Bayesian credibility mean whenever all members of such a class of mixture distributions belong to the single-parameter exponential family of distributions. Finally, for the situation where the measurable space $\mathcal{X}$ can be partitioned into two populations, it employed logistic regression and introduced the Logistic Regression Credibility, which, for some specific values of the population’s weight, dominates the Regression Tree Credibility in the sense of the risk function.
We should note that the assumption on the additional information $Z_{i,1},\cdots,Z_{i,m}$ differs slightly from the assumption on the latent variable $Z_{ij}$ in the EM algorithm (see Note 2). More precisely, under Model Assumption 1, $Z_{i,1},\cdots,Z_{i,m}$ are observable and give probabilistic information about the distribution of the random variable $X_i,$ namely the population’s weight. Under the missing data approach, by contrast, $Z_{ij}$ is a latent variable which provides definite (non-probabilistic) information about the distribution of $X_i.$ This leads us to claim that the assumptions in Model Assumption 1 are available and practicable in many cases; see Example 5 for evidence.
Our findings can be extended to (1) other indices of $X_{n+1},$ such as the variance of $X_{n+1},$ as represented in Equation (3), (2) the M-parameter exponential family of distributions, and (3) the Bayesian non-parametric credibility under the Dirichlet process mixture models, which were introduced by Fellingham et al. (Reference Fellingham, Kottas and Hartman2015) and enriched by Hong & Martin (Reference Hong and Martin2017, Reference Hong and Martin2018).
To see the second possible extension, the following recalls Jewell (Reference Jewell1974)’s findings for the M-parameter exponential family of distributions with probability density/mass function
where $a({\cdot}),$ $\phi_{m}({\cdot}),$ $t_{m}({\cdot}),$ for $m=1,2,\cdots,M,$ are given functions and the normalising factor $c({\cdot})$ is defined based on the fact that $\int_{S_X}f(x|\theta)dx=1.$ To derive the Bayesian credibility prediction for a given index of $X_{n+1}$ under the M-parameter exponential family of distributions, he set $\eta_m=-\phi_m(\theta),$ and considered the conjugate prior distribution
where $\Delta=(\eta_1,\eta_2,\cdots,\eta_M)^\prime.$ Then, he showed that the Bayesian credibility prediction can be expressed in terms of the sufficient statistics $t_m({\cdot})$ as
where the credibility factor is $\zeta_n=n/(n+\alpha_0)$ and ${\bar t}_{m,n}({\tilde{\boldsymbol{{x}}}})=\sum_{i=1}^{n}t_m(x_i)/n.$
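As a small illustration of the two ingredients named here, the following Python sketch computes the credibility factor $\zeta_n$ and the sample averages of the sufficient statistics. The claims, the value of $\alpha_0,$ and the choice of the statistics $x$ and $x^2$ (with $M=2$) are hypothetical and serve only to show the mechanics.

```python
def credibility_factor(n, alpha0):
    """zeta_n = n / (n + alpha_0)."""
    return n / (n + alpha0)


def mean_sufficient_statistics(xs, stats):
    """Average each sufficient statistic over the observed sample xs."""
    n = len(xs)
    return [sum(t(x) for x in xs) / n for t in stats]


# Hypothetical sample, hyperparameter and sufficient statistics (M = 2).
claims = [2.0, 4.0, 6.0]
stats = [lambda x: x, lambda x: x * x]

zeta = credibility_factor(len(claims), alpha0=3.0)
t_bar = mean_sufficient_statistics(claims, stats)
print(zeta, t_bar)
```

The factor $\zeta_n$ grows towards 1 as n increases, so more weight is placed on the observed averages when more claims are available.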
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments, which improved the theoretical foundation and presentation of this article.