1. Introduction
Statistics over the last few decades show an increase in life expectancy in many countries. For example, in Japan, life expectancy was 85 years in 2020, compared with 60 years in 1950. Such a rapid change in longevity is called the “Longevity Revolution”. For individuals, this trend broadens life choices and adds value to human life. However, it also gives rise to several medical, economic, and social welfare problems; for instance, the financial crisis of the Japanese national pension system is a pressing matter. Mortality prediction is therefore becoming a critical social issue worldwide.
Since the early 20th century, numerous authors have studied mortality prediction, and the methodology is well established. Most mortality models treat “death” as the first event of a time-inhomogeneous Poisson process. Let $T_x$ be the remaining lifetime of an individual aged x. It is assumed that
where $\mu(x,t)$ is a (possibly stochastic) intensity function, called the force of mortality in the insurance context. Many models for $\mu(x,t)$ have been derived. For instance, deterministic mortality models such as the Gompertz, Makeham, and Heligman–Pollard laws were introduced early on; more recently, numerous stochastic mortality models have been proposed, such as those of Olivieri (2001), Biffis (2005), Cairns et al. (2006b), Hainaut and Devolder (2008), Biffis et al. (2010), and Blackburn and Sherris (2013). Moreover, assuming that $\mu(x,\cdot)$ is constant on $(t,t+1]$, say equal to $m(x,t)$, allows one to model the mortality $m(x,t)$ directly. This approach underlies many established classical models, such as the Lee–Carter (Lee and Carter, 1992), Renshaw–Haberman (Renshaw and Haberman, 2006), and CBD models (Cairns et al., 2008, 2009), among others. We refer to these approaches as reduced-form approaches because they treat death merely as a stochastic event.
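As an illustration of the piecewise-constant assumption, the following Python sketch computes survival probabilities from one-year rates. It assumes a deterministic $m$ and uses $P(T_x>n)=\exp\{-\sum_{k=0}^{n-1} m(x+k,t+k)\}$ for an individual aged x at the start of calendar year t; the Gompertz-type rate `gompertz_rate` and its parameter values are hypothetical and serve only as an example.

```python
import math
from typing import Callable

# m(age, year): one-year force of mortality, assumed constant within each year.
RateFn = Callable[[int, int], float]


def survival_probability(m: RateFn, x: int, t: int, n: int) -> float:
    """P(T_x > n) for an individual aged x at the start of calendar year t,
    assuming a deterministic, piecewise-constant force of mortality."""
    cumulative_hazard = sum(m(x + k, t + k) for k in range(n))
    return math.exp(-cumulative_hazard)


def gompertz_rate(age: int, year: int) -> float:
    """Hypothetical Gompertz-type rate b * exp(c * age); b and c are
    illustrative values, and calendar-year effects are ignored."""
    b, c = 5e-5, 0.09
    return b * math.exp(c * age)


if __name__ == "__main__":
    # Probability that a 65-year-old in 2020 survives to age 85.
    print(survival_probability(gompertz_rate, x=65, t=2020, n=20))
```

With a stochastic $m$, one would average this quantity over simulated rate paths instead.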
Shimizu et al. (2020) proposed a structural approach under the “survival energy hypothesis”, which posits that every human being possesses a survival energy and that death occurs when this energy is exhausted. They modeled the survival energy $X^c=(X^c_t)_{t\ge 0}$ of cohort c by an inhomogeneous diffusion (ID) process as a cohort-wise survival energy model (SEM), called ID-SEM:
where $x_c$ is a positive constant representing the initial survival energy, $U_c$ and $V_c$ are deterministic functions on $\mathbb{R}_+\times \Theta$ with the parameter space $\Theta$ given below, and W is a standard Wiener process:
where $T_c$ is a known parameter, called the change point, at which the trend of survival energy changes drastically. The parameter space $\Theta$ of $\vartheta_c=(\alpha_c,\beta_c,\gamma_c,\kappa_c)$ is given by
Under this ID-SEM, they defined the time of death as the first time $X^c$ falls below zero, $\tau^c \,{:\!=}\, \inf\{t \gt 0\,|\, X^c_t \lt 0\}$, and showed that the mortality function
or more practically, the following conditional mortality function:
for a suitably chosen age S, fit their empirical counterparts computed from the Human Mortality Database (HMD); see also Remark 2.5. This indicates that the SEM provides an excellent parametric family for predicting future mortality functions, even though the survival energy itself is merely a hypothetical construct.
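To make the hitting-time definition concrete, the sketch below simulates ID-SEM-type paths with the Euler–Maruyama scheme and estimates the mortality function $P(\tau^c\le t)$ and a conditional version $P(\tau^c\le t\mid\tau^c>S)$ by Monte Carlo. The callables `drift` and `diffusion` stand in for the deterministic coefficients $U_c$ and $V_c$ of the ID-SEM; the constant values in the example are hypothetical, and reading the conditional mortality function as a conditional distribution function is an assumption. Because the path is checked only on the time grid, crossings between grid points are missed, so hitting times tend to be detected slightly late; a small `dt` reduces this bias.

```python
import math
import random
from typing import Callable, List, Optional

Coef = Callable[[float], float]  # deterministic coefficient: t -> value


def simulate_hitting_times(x0: float, drift: Coef, diffusion: Coef,
                           t_max: float, dt: float, n_paths: int,
                           seed: int = 0) -> List[Optional[float]]:
    """Euler-Maruyama paths of dX_t = drift(t) dt + diffusion(t) dW_t, X_0 = x0.
    Returns, per path, the first grid time at which X falls below zero,
    or None if the path survives up to t_max."""
    rng = random.Random(seed)
    n_steps = int(round(t_max / dt))
    sqrt_dt = math.sqrt(dt)
    times: List[Optional[float]] = []
    for _ in range(n_paths):
        x, hit = x0, None
        for i in range(n_steps):
            t = i * dt
            x += drift(t) * dt + diffusion(t) * sqrt_dt * rng.gauss(0.0, 1.0)
            if x < 0.0:
                hit = (i + 1) * dt
                break
        times.append(hit)
    return times


def mortality(times: List[Optional[float]], t: float) -> float:
    """Monte Carlo estimate of P(tau <= t)."""
    return sum(1 for s in times if s is not None and s <= t) / len(times)


def conditional_mortality(times: List[Optional[float]], t: float, S: float) -> float:
    """Monte Carlo estimate of P(tau <= t | tau > S) for t >= S (assumed definition)."""
    alive = [s for s in times if s is None or s > S]
    if not alive:
        return float("nan")
    return sum(1 for s in alive if s is not None and s <= t) / len(alive)


if __name__ == "__main__":
    # Hypothetical coefficients: constant negative drift, constant volatility.
    taus = simulate_hitting_times(1.0, lambda t: -0.015, lambda t: 0.08,
                                  t_max=110.0, dt=0.25, n_paths=1000)
    print([round(conditional_mortality(taus, a, 20.0), 3) for a in (40, 60, 80, 100)])
```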
As described in Shimizu et al. (2020), the term “structural approach” is borrowed from credit risk analysis, where the asset value of a firm is described by a stochastic process and the default time is defined as the first time this process hits a certain level. The two approaches are mathematically identical, but they differ significantly from a statistical perspective: in the credit risk context, the asset process can be observed, unlike the “survival energy.” Conversely, deaths are observed for many individuals, whereas defaults are generally not directly observed in default risk calculations (because such calculations are carried out in advance, before any default occurs). We estimate the parameters of the SEM family with careful attention to this point.
The main contribution of this paper is a novel SEM, proposed in Section 2. The mortality function of ID-SEM is sensitive to the change-point parameter $T_c$, which is difficult to predict for a future cohort because it shows no clear trend. To address this issue, we propose a new SEM, IG-SEM, a simple parametric family without such a change point that is nevertheless flexible enough to fit the training data. A further advantage is that its mortality function can be written explicitly.
Another contribution, presented in Section 3, is a methodology to improve long-term predictions. The prediction procedure of Shimizu et al. (2020) performs well for the mid-term future (approximately 10–30 years ahead) but not for the long-term future (e.g., a cohort 40 years ahead); occasionally, the predicted mortality function does not even fit the existing data. Therefore, we implement a two-step procedure: the first step is the same as in Shimizu et al. (2020), and in the second step, we refit the predicted mortality function to the existing data at younger ages, restricting the parameters to their 95% prediction intervals. We illustrate that this second step can drastically improve long-term predictions.
Section 4 compares ID-SEM and IG-SEM using Danish and Norwegian data from the HMD. Section 5 discusses some advantages of SEM over classical regression-type models in the reduced-form approach. Although this section mainly presents a theoretical discussion, experimental studies are ongoing; see, for example, Shirai and Shimizu (2022) for the prediction of full life expectancy via SEM.
Finally, Section 6 introduces the SEM project, which provides cohort- and country-wise mortality functions with explicit parameter values on a website.
2. A new SEM: Inverse Gaussian SEM
Let us introduce some notation needed to define a new SEM with an explicit mortality function.
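A random variable Y follows an inverse Gaussian distribution with mean a and variance $a^3/b$, written $Y\sim IG(a,b)$, if its probability density is given by
$$f(y) = \sqrt{\frac{b}{2\pi y^3}}\exp\left(-\frac{b(y-a)^2}{2a^2y}\right),\qquad y>0.$$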
Definition 2.1 (IG-SEM; Inverse Gaussian SEM). A survival energy process $X^c=(X_t^c)_{t\ge 0}$ follows the IG-SEM if
where $x_c>0$ is the initial energy and $Y^c\sim IG(\Lambda_c,\sigma_c)$ is an inverse Gaussian process with mean function $\Lambda_c$ and parameter $\sigma_c>0$; that is, $Y_0^c=0$ a.s., $Y^c$ has independent increments, and, for any $t>s>0$ and an increasing function $\Lambda_c$ with $\Lambda_c(0)=0$,
Remark 2.2. If $\Lambda (t) = t$, then Y is an inverse Gaussian Lévy process, which is a spectrally positive pure-jump subordinator. Hence, the IG-SEM path of survival energy can include jumps, whereas the ID-SEM path is continuous.
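The jump behavior in Remark 2.2 and the failure-time interpretation of Ye and Chen (2014) can be illustrated by simulation. The Python sketch below draws inverse Gaussian variates with the Michael–Schucany–Haas (1976) transformation method, a standard sampler added here for illustration, and builds a damage path from independent IG increments. The increment parametrization $Y_t-Y_s\sim IG(\Delta\Lambda,\eta\,\Delta\Lambda^2)$ with $\Delta\Lambda=\Lambda(t)-\Lambda(s)$ follows a common degradation-modeling convention and is an assumption here; the shape parameter `eta` is hypothetical, and its relation to $\sigma_c$ is not taken from the text. The failure time is detected as the first grid time at which the damage exceeds the threshold $x_c$.

```python
import math
import random
from typing import Callable, List, Optional


def ig_sample(a: float, b: float, rng: random.Random) -> float:
    """One draw from IG(a, b) (mean a, variance a**3 / b) by the
    Michael-Schucany-Haas transformation method."""
    w = rng.gauss(0.0, 1.0) ** 2
    # The roots of b*(x - a)**2 = a**2 * w * x multiply to a**2; computing the
    # larger root first avoids cancellation in the smaller one.
    big = a + a * a * w / (2.0 * b) + (a / (2.0 * b)) * math.sqrt(4.0 * a * b * w + (a * w) ** 2)
    small = a * a / big
    return small if rng.random() <= a / (a + small) else big


def simulate_ig_path(Lambda: Callable[[float], float], eta: float,
                     grid: List[float], rng: random.Random) -> List[float]:
    """Accumulated damage Y on a time grid starting at 0, with independent
    increments IG(dL, eta * dL**2), where dL is the increment of Lambda
    (hypothetical degradation-model parametrization)."""
    path = [0.0]
    for t_prev, t_next in zip(grid, grid[1:]):
        dL = Lambda(t_next) - Lambda(t_prev)
        path.append(path[-1] + ig_sample(dL, eta * dL * dL, rng))
    return path


def first_passage(path: List[float], grid: List[float], x_c: float) -> Optional[float]:
    """First grid time at which the damage exceeds x_c (None if it never does)."""
    for t, y in zip(grid, path):
        if y > x_c:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    grid = [float(k) for k in range(111)]  # ages 0, 1, ..., 110
    # Hypothetical choices: Lambda(t) = 0.02 t, eta = 20, initial energy x_c = 1.
    taus = [first_passage(simulate_ig_path(lambda t: 0.02 * t, 20.0, grid, rng), grid, 1.0)
            for _ in range(2000)]
    print(sum(1 for s in taus if s is not None and s <= 50.0) / len(taus))
```

Because the damage path is nondecreasing, $\tau^c\le t$ holds exactly when $Y^c_t$ exceeds $x_c$, so the simulated fraction can be checked against direct draws of $Y^c_t$.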
Such processes are used in engineering to model the time of system failure: failure occurs at $\tau^c = \inf\{t>0\,|\,Y^c_t \gt x_c\}$, when the accumulated damage $Y^c_t$ first exceeds a certain threshold $x_c$ (see Ye and Chen, 2014). This is the same idea as our survival energy hypothesis for human death. The following theorem provides the mortality function:
Theorem 2.3. The mortality function for IG-SEM is given by
where $\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,\textrm{d} z$ ,
In what follows, we use the following mean function $\Lambda_{\vartheta_c}$:
where
The parameters are estimated in the same way as for ID-SEM; see the data analysis in Section 4.
Remark 2.4. As described in Shimizu et al. (2020), the parameters and coefficients of the SDE can be interpreted. For example, in ID-SEM, the drift term represents the intrinsic survival power of a human, and the diffusion term reflects the influence of the social environment. In IG-SEM, $\Lambda_c$ may correspond to the drift term because it is the mean of the accumulated damage process $Y^c$, and $\sigma_c$ may be regarded as an environmental parameter because it affects the variance of the damage.
Remark 2.5. We estimate the parameters by least-squares fitting of the “conditional” mortality function $q_c(t|S)$ given in (1.3) to its empirical counterpart, which can be computed from HMD data, as explained in Shimizu et al. (2023). We recommend choosing a conditioning age S of approximately 20 years, because mortality at young ages is highly volatile and unstable, making it difficult to predict with simple models such as ours. The value of S should be determined empirically by examining the amount of data and the mortality rates at young ages, both of which depend on the country.
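A minimal sketch of this least-squares step is given below. It assumes that the empirical conditional mortality is computed from a cohort life-table survivor column $l_t$ as $\widehat q(t\mid S)=1-l_t/l_S$, and it minimizes the sum of squared errors by exhaustive grid search, a crude stand-in for a proper nonlinear optimizer. The Gompertz-type function `gompertz_conditional` is a hypothetical placeholder for $q_c(t\mid S)$ so that the example is self-contained, and the survivor counts in the usage example are synthetic, not HMD data.

```python
import itertools
import math
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float, Tuple[float, ...]], float]  # (age t, theta) -> q(t | S)


def empirical_conditional_mortality(lx: Dict[int, float], S: int) -> Dict[int, float]:
    """q_hat(t | S) = 1 - l_t / l_S for ages t >= S, from cohort survivors l_t
    (assumed definition of the empirical conditional mortality)."""
    return {t: 1.0 - l / lx[S] for t, l in sorted(lx.items()) if t >= S}


def lse(model: Model, theta: Tuple[float, ...], emp: Dict[int, float]) -> float:
    """Sum of squared differences between model and empirical values."""
    return sum((model(t, theta) - q) ** 2 for t, q in emp.items())


def grid_lse(model: Model, grids: Sequence[Sequence[float]],
             emp: Dict[int, float]) -> Tuple[float, ...]:
    """Crude least-squares estimate by exhaustive search over a parameter grid."""
    return min(itertools.product(*grids), key=lambda th: lse(model, th, emp))


def gompertz_conditional(t: float, theta: Tuple[float, ...], S: float = 20.0) -> float:
    """Hypothetical stand-in for q_c(t | S): Gompertz hazard b * exp(c * age)."""
    b, c = theta
    return 1.0 - math.exp(-(b / c) * (math.exp(c * t) - math.exp(c * S)))


if __name__ == "__main__":
    # Synthetic survivors generated from the placeholder model itself.
    lx = {t: 100_000 * (1.0 - gompertz_conditional(t, (3e-5, 0.1))) for t in range(20, 111)}
    emp = empirical_conditional_mortality(lx, S=20)
    b_grid = [k * 1e-5 for k in range(1, 11)]
    c_grid = [0.08 + 0.005 * k for k in range(9)]
    print(grid_lse(gompertz_conditional, (b_grid, c_grid), emp))
```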
3. Modification of estimated mortality functions
Suppose we have estimated the values of $\vartheta_c$ for some cohorts $c_1<c_2<\dots<c_m$, say $\widehat{\vartheta}_{c_1},\dots,\widehat{\vartheta}_{c_m}$, via LSE, as in Shimizu et al. (2020). We assume that the parameter $\vartheta_c$ of a (possibly future) cohort c is generated as follows:
for a deterministic but unknown mean function h. Regarding the estimates $\widehat{\vartheta}_{c_1},\widehat{\vartheta}_{c_2},\dots,\widehat{\vartheta}_{c_m}$ as realizations of $\vartheta_{c_i}\ (i=1,\dots,m)$, we estimate h, whose parametric form is chosen case by case as described in Section 4. Once h is estimated, say $\widehat{h}$, we predict the parameter $\vartheta_{c'}$ of a future cohort $c'$ by
and obtain a predicted mortality function (PMF) $q_{c'}(\cdot, \widehat{\vartheta}_{c'})$, as in Shimizu et al. (2020). In this study, we further modify this prediction to improve it.
Based on assumption (3.1), we can construct the $\alpha$ -prediction interval for $\vartheta_{c'}$ :
where $\widehat{\Sigma}_c$ is an estimator of $\Sigma_c$ in (3.1), and $z_\alpha$ is the $(1-\alpha)$ -percentile of N(0,1); that is,
(Numerical illustrations of these 95% prediction intervals are given in Figures 1 and 2 in the real data analysis.) Using the prediction interval $\widehat{I}_\alpha^{c',m}$, we readjust the parameters within it so that the mortality function fits the existing (younger-age) data for cohort $c'$, as follows:
Definition 3.1 (Modified PMF). When empirical data $\widehat{q}_{c'}(t|S)$ are available for $t= t_1,\dots,t_{d'}$, we reselect the predictor as
where $\widehat{I}_\alpha^{c',m}$ is given by (3.3). We use $q_{c'}(\cdot, \widetilde{\vartheta}_{c'})$ as the final predicted mortality function and refer to it as the modified predicted mortality function (MPMF).
In Section 4, we compare the direct prediction (3.2) with this modification (3.4) in several examples.
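The two steps of this section can be sketched in Python as follows. The sketch makes simplifying assumptions that are not stated in the text: each parameter component is treated separately with a linear trend h fitted by ordinary least squares (Remark 4.2 uses other forms for some components), $\widehat\Sigma_c$ is replaced by the residual variance of that fit, $z_\alpha$ is taken as the standard normal quantile at $1-\alpha$ (so $\alpha=0.025$ gives about 1.96 and a 95% interval), and the minimization in (3.4) is carried out by grid search over the box formed by the intervals. The `model` passed to `modified_fit` is any callable returning $q_{c'}(t\mid S)$.

```python
import itertools
import math
from statistics import NormalDist
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float, Tuple[float, ...]], float]  # (age t, theta) -> q(t | S)


def fit_linear_trend(cohorts: Sequence[float], values: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of one parameter component, h(c) = p0 + p1 * c.
    Returns (p0, p1, residual standard deviation)."""
    m = len(cohorts)
    c_bar = sum(cohorts) / m
    v_bar = sum(values) / m
    s_cc = sum((c - c_bar) ** 2 for c in cohorts)
    p1 = sum((c - c_bar) * (v - v_bar) for c, v in zip(cohorts, values)) / s_cc
    p0 = v_bar - p1 * c_bar
    sse = sum((v - p0 - p1 * c) ** 2 for c, v in zip(cohorts, values))
    return p0, p1, math.sqrt(sse / (m - 2))


def prediction_interval(center: float, sd: float, alpha: float) -> Tuple[float, float]:
    """center +/- z_alpha * sd, with z_alpha the standard normal quantile at 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return center - z * sd, center + z * sd


def modified_fit(model: Model, box: Sequence[Tuple[float, float]],
                 emp: Dict[int, float], n_grid: int = 21) -> Tuple[float, ...]:
    """Reselect theta within the box of prediction intervals by minimizing the
    squared error against the observed ages (crude grid search)."""
    axes = [[lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)] for lo, hi in box]

    def sse(theta: Tuple[float, ...]) -> float:
        return sum((model(t, theta) - q) ** 2 for t, q in emp.items())

    return min(itertools.product(*axes), key=sse)
```

In use, the predicted center for a component is `p0 + p1 * c_new`; one calls `prediction_interval` once per component of $\widehat{\vartheta}_{c'}$ and passes the resulting list of intervals as `box`, together with the training ages of cohort $c'$ as `emp`.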
4. Data analysis: ID-SEM versus IG-SEM
In this section, we compare ID-SEM and IG-SEM using actual data from the HMD and illustrate that the MPMF defined by (3.4) predicts future mortality considerably better than the unmodified PMF.
Remark 4.1. Determining the change-point parameter $T_c$ in ID-SEM is difficult. In principle, it should be estimated from data, but this is challenging because the estimated mortality function is highly sensitive to this parameter. In the following examples, we fix $T_c=50$, for which ID-SEM fits the training data relatively well.
4.1 Denmark
The first example is Denmark. We use the following mortality data from $m=25$ cohorts:
and suppose that the current year is 1951 (by then, data up to age 110 are available for the 1840 birth cohort). Based on these data, we predict the mortality functions from age 20 for the future cohorts $c'=1850$, 1870, and 1890, for females and males separately. Because the current year is assumed to be 1951, predictions are needed from age 101 onward for $c'=1850$, from age 81 for $c'=1870$, and from age 61 for $c'=1890$.
Data analysis was performed using the following procedure.
1. We estimate the parameters in $q_c^{ID}(t,\vartheta_c)$ and $q_c^{IG}(t,\vartheta_c)$ for the cohorts $c=1816$–$1840$ and predict the parameter values for the future cohorts $c'=1850$, 1870, and 1890 as in Section 3; see also Shimizu et al. (2020).
We report the (adjusted) coefficients of determination $R^2\, (\overline{R}^2)$ for ID-SEM and IG-SEM in tables, and show the regression curves with the width of the 95% prediction intervals (PIs) for males only; the corresponding figures for females are omitted.
2. To obtain the MPMF, we split the data into training and test data. For example, for $c'=1890$, we split the mortality data into two parts, ages 20–60 (training data: red dots) and ages 61–110 (test data: black dots), and use the training data for modification (3.4).
3. In Figure 3, we visually compare the two mortality curves with the test data (black dots) for males and females, for $c'=1890$ only. For the other cohorts ($c'=1850$ and 1870), we report only the MSE between the predicted and the actual empirical mortality functions in Table 3.
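Steps 2 and 3 amount to an age-based split followed by a mean squared error on the held-out ages. The following sketch assumes that the empirical mortality values are stored in a dictionary keyed by age; with the split used for $c'=1890$ (ages up to 60 for training), the test set covers ages 61–110. The function names are illustrative.

```python
from typing import Callable, Dict, Tuple


def split_by_age(emp: Dict[int, float],
                 last_train_age: int) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Training data: ages <= last_train_age; test data: the remaining ages."""
    train = {t: q for t, q in emp.items() if t <= last_train_age}
    test = {t: q for t, q in emp.items() if t > last_train_age}
    return train, test


def mse(predict: Callable[[int], float], data: Dict[int, float]) -> float:
    """Mean squared error between predicted and empirical mortality values."""
    return sum((predict(t) - q) ** 2 for t, q in data.items()) / len(data)
```

The training part is what enters the refit (3.4); the MSE is then computed on the test part only, so the held-out ages play no role in choosing the parameters.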
Remark 4.2. We employed the simplest feasible regression functions for ease of use. For $\alpha_c,\beta_c,\kappa_c$, we used a negative increasing function of the form $-c_1 e^{-c_2x} \lt 0$ because these values should be negative. Although $\gamma_c$ should be positive, modeling it by a linear function, among other possible forms, may be justified given the available data. Alternatively, one can use information criteria such as AIC or BIC to select a regression function; it is also possible to use a time-series model to predict future parameters. However, every model has merits and drawbacks; therefore, we kept the regression functions as simple as possible.
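For the negative increasing form $-c_1e^{-c_2x}$, a quick way to obtain the coefficients is to regress $\log(-y)$ on x by ordinary least squares, since $\log(-y)=\log c_1-c_2x$. This log-linear shortcut is an illustrative choice, not necessarily the fitting method used for the results here: it minimizes errors on the log scale and therefore differs from a nonlinear least-squares fit on the original scale, although it can serve as a starting value for one. Centering x (e.g., using years since the first cohort rather than calendar years) avoids extreme exponentials.

```python
import math
from typing import Sequence, Tuple


def fit_negative_exponential(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = -c1 * exp(-c2 * x) (all y < 0) by OLS on log(-y) = log(c1) - c2 * x.
    Returns (c1, c2)."""
    if any(y >= 0 for y in ys):
        raise ValueError("all responses must be negative")
    zs = [math.log(-y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / s_xx
    return math.exp(z_bar - slope * x_bar), -slope
```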
For this cohort (a relatively long-term prediction), the difference between ID-SEM and IG-SEM is more pronounced. Even the modified IG-SEM prediction performs poorly for males because of poor predictions of the parameters $a_c$ and $b_c$. This is an example in which ID-SEM, with its change point T and more parameters than IG-SEM, succeeds.
4.2 Norway
The second example is Norway. As for Denmark, we use the following mortality data from $m=25$ cohorts:
and assume that the current year is 1961 (by then, data up to age 110 are available for the 1850 birth cohort). Based on these data, we predict the mortality functions from age 20 for the future cohorts $c'=1860$, 1880, and 1900, for males and females. Predictions are needed from age 101 onward for $c'=1860$, from age 81 for $c'=1880$, and from age 61 for $c'=1900$. Only the results for $c'=1900$ are shown in Figure 4.
All other procedures are identical to those for Denmark. We estimate the parameters by nonlinear regression and obtain the PMFs before and after modification. We show the PMFs before and after modification only for $c'=1900$; for the other cohorts, we report only the nonlinear regression curves with the values of $R^2\,(\overline{R}^2)$ and their 95%-PIs in Tables 4 and 5. Table 6 lists the MSEs of the MPMFs.
Remark 4.3. In the Danish example, ID-SEM is superior to IG-SEM. However, IG-SEM is effective in this example and occasionally outperforms ID-SEM. Because it is challenging to determine a suitable change-point parameter T in ID-SEM, IG-SEM, which has fewer parameters, is also a good candidate model for predicting the mortality function.
In this example, IG-SEM is superior to ID-SEM for females but not for males. Accordingly, it is difficult to decide in advance which SEM to use for prediction and for computing quantities of interest. We should compute these quantities with both ID-SEM and IG-SEM and compare the values objectively before making a decision.
5. Advantages of SEM
5.1 Comparison with the classical model with cohort-effects
Shimizu et al. (2020) demonstrated that ID-SEM is superior to the classical Lee–Carter model. This section compares our SEMs with the Renshaw–Haberman model (RHM), which extends the Lee–Carter model by including cohort effects.
For comparison, we use the same data for Denmark and Norway as in the previous section. For RHM, we used ages 20–110 in calendar years 1911–1950 for Denmark and the same ages in 1921–1960 for Norway. We compared the modified mortality functions of the 1870 and 1890 birth cohorts under ID- and IG-SEM with the mortality functions of RHM. The results are shown in Figure 5 together with their MSEs. Overall, the prediction errors are similar, but ID-SEM is often superior to RHM at older ages.
Remark 5.1. We also considered the CBD model (e.g., Cairns et al., 2006a, 2008) as a candidate cohort model, but it was unsuitable for long-term prediction; therefore, these results are omitted from this study.
5.2 Reducing statistical errors
One advantage of the proposed SEM approach concerns the statistical estimation of actuarial quantities. Consider, for example, the single premium $A_x$ of a whole life insurance at age x. It is written as follows:
where $v \in (0,1)$ is the discount factor. If we use the Lee–Carter model, then it is written as
where $m_{x,t}$ is the (crude) mortality parameterized by
where $\alpha_x$ and $\beta_x$ are age-specific parameters, $\kappa_t$ is a time index whose future values are predicted by a time-series model with additional unknown parameters, and ${\epsilon}_{x,t}$ is a noise process. Hence, we must estimate numerous parameters $\{(\alpha_y,\beta_y)\}_{y=x,x+1,\dots}$ as well as those of $\kappa_t$, which can increase the statistical error of $A_x$. In contrast, with SEM, the cohort-wise computation
requires estimating only the single parameter vector $\vartheta_c$, because $\vartheta_c$ does not depend on $k=1,2,\dots$. This can make the statistical error smaller than that of classical mortality models.
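As a rough sketch of this cohort-wise computation, the following Python function computes $A_x$ with payment at the end of the year of death from a single conditional distribution function $s\mapsto P(\tau^c\le s\mid\tau^c>x)$, truncated at a maximum age. Both the end-of-year payment convention and this reading of the conditional mortality function are assumptions. For an SEM, the callable would be the model's conditional mortality function evaluated at the one estimated vector $\widehat\vartheta_c$, so no age-specific parameters are involved.

```python
import math
from typing import Callable

CondCDF = Callable[[float], float]  # s -> P(tau <= s | tau > x)


def whole_life_premium(cond_cdf: CondCDF, x: int, v: float, max_age: int) -> float:
    """A_x = sum_k v**(k+1) * P(x + k < tau <= x + k + 1 | tau > x),
    with end-of-year payment, truncated at max_age."""
    total = 0.0
    for k in range(max_age - x):
        death_prob = cond_cdf(x + k + 1) - cond_cdf(x + k)
        total += v ** (k + 1) * death_prob
    return total
```

For a constant force of mortality $\mu$, the untruncated sum equals $v(1-e^{-\mu})/(1-ve^{-\mu})$, which gives a convenient check of an implementation.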
5.3 Sensitivity analysis
As shown in the previous section, most actuarial quantities can be written as functionals of the mortality function $q_c(t,\vartheta_c)$, often in terms of the conditional mortality function $q_c(t,\vartheta_c|S)$, with only a few unknown parameters $\vartheta_c$. This makes them well suited to sensitivity analysis with respect to parameter changes.
Consider an actuarial quantity for age x and cohort c represented by a Stieltjes-type integral such as
where h is a measurable function on $[0,\infty)$ and the integral sign means $\int_0^\infty \,{:\!=}\, \int_{[0,\infty)}$. We assume that $\int_0^\infty$ and differentiation $\partial_{\vartheta}$ can be interchanged as needed:
which is continuous in $\vartheta$ .
Most actuarial quantities can be written in this form (see Shimizu et al., 2020). For example, the single premium $A_x$ of a whole life insurance at age x is given by
where $v\in (0,1)$. Moreover, the immediate-payment version
is given by $H(\vartheta)$ with
It follows from integration by parts that
We are interested in the difference $H(\vartheta) - H(\vartheta_c)$ for different values of parameters $\vartheta$ and $\vartheta_c$ . By Taylor’s formula,
The integral $\int_0^\infty \partial_{\vartheta} q_c(t,\vartheta_c|x)\,\textrm{d} h(t)$ can be evaluated by direct computation. For instance, we have the following estimates:
Lemma 5.2. For the mortality function of IG-SEM, $q_c^{IG}(t,\vartheta)$ with $\vartheta=(a,b,\sigma)$ , we obtain the following estimates:
where $x_c$ is the initial survival energy for cohort c and $\phi_{u,v}(x)$ is the probability density function of the normal distribution with mean u and variance v.
Proof. Note that
where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-z^2/2}\,\textrm{d} z$ .
In the final equality, we used
We use the following inequality for the error function: for $x>0$,
to obtain
We used equality (5.1) in the last equality. The estimate of the partial derivative $\partial_bq_c^{IG}$ is similar and is omitted.
For $\partial_\sigma q_c^{IG}$ , it follows from (5.1) that:
Hence, the same argument as above applies, which completes the proof. □
Corollary 5.3. Under the same model as in Lemma 5.2, assume that:
Then it follows that
For our LSE $\widehat{\vartheta}_c$ of $\vartheta_c$ given in Theorem 3.2 of Shimizu et al. (2020) (see also the erratum, Shimizu, 2022), computed from a sample of size $n_c$, the delta method yields
where the asymptotic variance $\Sigma_{c,x}$ can be estimated using the estimators of $R_d, Q_d, \Sigma$ in Theorem 3.2 of Shimizu et al. (2020) (with Shimizu, 2022) and the plug-in estimator $\int_0^\infty \partial_{\vartheta} q_c(t,\widehat{\vartheta}_c|x)\,\textrm{d} h(t)$. This yields the following confidence interval for $H(\vartheta_c)$:
where $z_\alpha$ is the upper $\alpha$ -percentile of the standard normal distribution and $\widehat{\Sigma}_{c,x}$ is an estimator of the asymptotic variance $\Sigma_{c,x}$ .
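The sensitivity and delta-method computations can be prototyped numerically. The sketch below treats H as a black-box function of $\vartheta$, approximates its gradient g by central finite differences (a substitute for evaluating $\int_0^\infty\partial_\vartheta q_c(t,\vartheta|x)\,\mathrm{d}h(t)$ directly), uses g for the first-order Taylor approximation of $H(\vartheta)-H(\vartheta_c)$, and forms the interval $H(\widehat\vartheta_c)\pm z_\alpha\sqrt{g^\top\Sigma g/n_c}$. Here `sigma` is a hypothetical estimate of the asymptotic covariance of $\sqrt{n_c}(\widehat\vartheta_c-\vartheta_c)$, and $z_\alpha$ is the upper $\alpha$-point of N(0,1).

```python
import math
from statistics import NormalDist
from typing import Callable, List, Sequence, Tuple

Functional = Callable[[Sequence[float]], float]  # theta -> H(theta)


def gradient(H: Functional, theta: Sequence[float], step: float = 1e-6) -> List[float]:
    """Central finite-difference approximation of the gradient of H at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += step
        down[i] -= step
        grad.append((H(up) - H(down)) / (2.0 * step))
    return grad


def first_order_change(H: Functional, theta: Sequence[float],
                       theta_new: Sequence[float]) -> float:
    """Taylor approximation of H(theta_new) - H(theta)."""
    g = gradient(H, theta)
    return sum(gi * (b - a) for gi, a, b in zip(g, theta, theta_new))


def delta_method_ci(H: Functional, theta_hat: Sequence[float],
                    sigma: Sequence[Sequence[float]], n: int,
                    alpha: float) -> Tuple[float, float]:
    """H(theta_hat) +/- z_alpha * sqrt(g' Sigma g / n)."""
    g = gradient(H, theta_hat)
    d = len(g)
    var = sum(g[i] * sigma[i][j] * g[j] for i in range(d) for j in range(d)) / n
    z = NormalDist().inv_cdf(1.0 - alpha)
    center = H(theta_hat)
    half = z * math.sqrt(var)
    return center - half, center + half
```

For instance, with a constant force of mortality $\mu$, the whole life premium $H(\mu)=v(1-e^{-\mu})/(1-ve^{-\mu})$ can be passed as `H` with $\vartheta=(\mu)$; its exact derivative $ve^{-\mu}(1-v)/(1-ve^{-\mu})^2$ is available for comparison.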
6. Conclusions
We proposed two types of parametric families for SEMs, ID-SEM and IG-SEM, which provide accurate cohort-wise PMFs. Using prediction intervals for the unknown parameters, we can modify the PMF into the MPMF so that it fits the existing data in a manner consistent with LSE (see Definition 3.1).
SEM is a viable alternative for modeling and predicting mortality. We illustrated that both SEMs have high potential for long-term mortality prediction and were superior to classical models, including those with cohort effects, such as the LC, RH, and CBD models. Moreover, SEM has several theoretical advantages: it is easy to understand for non-actuaries, it reduces estimation error owing to its small number of parameters, and it is useful for sensitivity analysis.
For further information on SEM, such as graphs and other topics, please refer to the supplementary article by Shimizu et al. (2023).
Acknowledgments
The author thanks the anonymous referees for their detailed suggestions and proposals, which have improved the paper extensively. This research was partially supported by the JSPS KAKENHI Grant-in-Aid for Scientific Research (C) #21K03358.