Hostname: page-component-cd9895bd7-8ctnn Total loading time: 0 Render date: 2024-12-26T20:50:37.639Z Has data issue: false hasContentIssue false

AffineMortality: An R package for estimation, analysis, and projection of affine mortality models

Published online by Cambridge University Press:  21 May 2024

Francesco Ungolo*
Affiliation:
School of Risk and Actuarial Studies, University of New South Wales, Kensington, NSW, Australia ARC Centre of Excellence in Population Ageing Research, University of New South Wales, Kensington, NSW, Australia Technical University of Munich, Garching, Germany
Len Patrick Dominic M. Garces
Affiliation:
ARC Centre of Excellence in Population Ageing Research, University of New South Wales, Kensington, NSW, Australia School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, NSW, Australia
Michael Sherris
Affiliation:
School of Risk and Actuarial Studies, University of New South Wales, Kensington, NSW, Australia ARC Centre of Excellence in Population Ageing Research, University of New South Wales, Kensington, NSW, Australia
Yuxin Zhou
Affiliation:
School of Risk and Actuarial Studies, University of New South Wales, Kensington, NSW, Australia ARC Centre of Excellence in Population Ageing Research, University of New South Wales, Kensington, NSW, Australia
*
Corresponding author: Francesco Ungolo; Email: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

This paper presents the AffineMortality R package which performs parameter estimation, goodness-of-fit analysis, simulation, and projection of future mortality rates for a set of affine mortality models for use in pricing and reserving. The computational routines build on the univariate Kalman Filtering approach of Koopman and Durbin ((2000). Journal of Time Series Analysis, 21(3), 281–296.) along other numerical methods to enhance the robustness of the results. This paper provides a discussion of how the package works in order to effectively estimate and project survival curves, and describes the available functions. Illustration of the package for mortality analysis of the US male data set is provided.

Type
Actuarial Software
Creative Commons
Creative Common License - CCCreative Common License - BYCreative Common License - NCCreative Common License - ND
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries

1. Introduction

The analysis of mortality rates is fundamental for actuaries as these are used to develop and set the premium for life insurance products, estimate liabilities, and develop the corresponding risk management strategies.

It is widely acknowledged in practice how the development of future mortality rates is the outcome of a stochastic process (Cairns et al., Reference Cairns, Blake and Dowd2006a). The seminal work of Lee & Carter (Reference Lee and Carter1992) attracted significant attention in the last three decades towards the development and the extension of stochastic mortality models which, to different degrees, are capable of capturing the different features of the mortality surface, such as the presence of the cohort effect (Willets, Reference Willets2004). These models have been originally developed for the analysis of data in discrete time, such as integer ages and integer calendar (or birth) year. Some examples include the Cairns–Blake–Dowd model (Cairns et al., Reference Cairns, Blake and Dowd2006b), the Poisson log-bilinear approach of Brouhns et al. (Reference Brouhns, Denuit and Vermunt2002), the Plat model (Plat, Reference Plat2009), and the functional approach of Hyndman & Shahid Ullah (Reference Hyndman and Shahid Ullah2007).

More recently, models developed using the tools of financial mathematics gained attention in the literature, following the work of Milevsky & Promislow (Reference Milevsky and Promislow2001). For example, the papers of Schrager (Reference Schrager2006), Biffis (Reference Biffis2005), Dahl (Reference Dahl2004), Blackburn & Sherris (Reference Blackburn and Sherris2013), Jevtić et al. (Reference Jevtić, Luciano and Vigna2013), Jevtić & Regis (Reference Jevtić and Regis2019), and Jevtić & Regis (Reference Jevtić and Regis2021) propose the use of the affine interest rate modeling framework developed by Duffie & Kan (Reference Duffie and Kan1996) for the analysis of mortality dynamics. In this way, it is possible to obtain closed-form formula for the survival curves, which can be used in pricing longevity-related securities and devise risk management strategies for these products. These models assume that the mortality intensity process is driven by a set of latent variables, whose dynamics are characterized by a stochastic differential equation with mean reversion. This implies that the closed-form survival curve is an exponentially affine function of the latent variables.

An advantage of the continuous-time approach for mortality modeling is the use of financial pricing techniques, which are familiar to market practitioners. Specifically, a no-arbitrage valuation framework can be used for pricing life-contingent products and developing appropriate risk management strategies for these products using the analytical results for affine processes (Biffis, Reference Biffis2005).

This paper describes the R package AffineMortality, which supports an extensive analysis of continuous time affine mortality models in the spirit of Blackburn & Sherris (Reference Blackburn and Sherris2013), Huang et al. (Reference Huang, Sherris, Villegas and Ziveyi2022), and Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023). More precisely, the package estimates the parameters of these models using mortality data collected at discrete time points and ages. In addition, the package facilitates the analysis of model fit and the simulation and projection of future mortality rates. These tasks require us to discretize the continuous-time models and to recast the inferential problem in a state-space form.

Schrager (Reference Schrager2006) and Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023) describe how the estimation of these models using the base Kalman filter can be problematic due to the numerical issues which follow from the multiplication and inversion of large-dimensional matrices. Hence, they advocate the use of the univariate Kalman filter of Koopman & Durbin (Reference Koopman and Durbin2000). For this reason, the computational routines in AffineMortality are implemented using the univariate Kalman filtering approach, together with other numerical tricks described in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023), which make the Kalman filtering procedure more stable.

Alternative methods for the parameter estimation of affine term structure models with latent factors are the simulated moments estimator and the Markov chain Monte Carlo, whose details are described in Singleton (Reference Singleton2006). These methods do not take advantage of the possibility to obtain a closed-form solution for the distribution of the latent variables, as can be possible for several affine specifications. Within the mortality modeling literature, Jevtić et al. (Reference Jevtić, Luciano and Vigna2013) use the differential evolution method to estimate the parameters by minimizing the root mean squared error based on the survival function, which in the findings of the authors can take long time to converge.

The package allows to implement and assess among others, the Blackburn–Sherris model (BS) with independent factors, the BS model with two and three dependent factors, the Arbitrage-Free Nelson-Siegel mortality model (AFNS) with independent and dependent factors, and the Cox–Ingersoll–Ross mortality model (CIR). The plan is to further expand the library of models available for analysis, and to extend the existing ones by accounting for cohort or period specific factors.

To the best of our knowledge, currently available software focus on the analysis of mortality models in discrete time. For example, the R package StMoMo (Villegas et al., Reference Villegas, Kaishev and Millossovich2018) allows for the analysis, among others, of the Lee-Carter (Lee & Carter, Reference Lee and Carter1992), CBD (Cairns et al., Reference Cairns, Blake and Dowd2006b), and the age-period-cohort model by Renshaw & Haberman (Reference Renshaw and Haberman2006). In a similar fashion, StanMoMo (Barigou et al., Reference Barigou, Goffard, Loisel and Salhi2023) performs a Bayesian analysis of stochastic mortality models using Stan. The R package demography (Hyndman et al., Reference Hyndman, Booth, Tickle and Maindonald2022) instead provides functions for demographic analysis including: lifetable calculations, Lee-Carter modeling, functional data analysis of mortality rates, fertility rates, net migration numbers, and stochastic population forecasting.

Furthermore, there are R packages for the analysis of state-space models, such as dse (Gilbert, Reference Gilbert2009), sspir (Dethlefsen et al., Reference Dethlefsen, Lundbye-Christensen and Christensen2022), dlm (Petris, Reference Petris2010), FKF (Luethi et al., Reference Luethi, Erb and Otziger2022), and KFAS (Helske, Reference Helske2017) (see also Tusell, Reference Tusell2011 for a comprehensive review). However, these packages do not readily adapt to the state-space model used for the analysis of affine mortality models. This is because the system matrices of the affine mortality models follow as the solution of an ordinary differential equation, as we briefly illustrate in Section 3, and its subsequent discretization.

R and RStudio may be subject to crashes. Since the optimization process may take a long time to perform, the functions in AffineMortality, which require a lot of time to execute (e.g. affine_fit() for estimating the model parameters), allow the user to stop the process without losing the computations performed thus far. The package deals with fault tolerance by allowing the user to input a directory where the work can be saved as an .Rdata file.

AffineMortality can be installed by running the following commands:Footnote 1

library(devtools)

install_github(“ungolof/AffineMortality”)

library(AffineMortality)

The source code of AffineMortality is available through the GitHub repository https://github.com/ungolof/AffineMortality .

The paper develops as follows: Section 2 introduces the data to be used as input for the analysis, Section 3 summarizes the affine mortality modeling framework and briefly describes the mortality models supported by the package, and Section 4 describes their parameter estimation procedure. Section 5 describes how the package can be used to perform the goodness-of-fit analysis and compare affine models, and Section 6 describes the function affine_project(), which can be called to project future cohort survival curves. The package provides two methods to analyze parameter uncertainty, described in Section 7: the first estimates the covariance of the parameter estimates by using the bootstrap, while the other implements a multiple imputation-based method. The step-by-step illustration of the package is provided in Section 8. Section 9 describes other functions in AffineMortality and Section 10 concludes.

2. Input data

Let $\mu _{x,t}$ denote the force of mortality for an individual aged $x$ last birthday in calendar year $t$ . Without loss of generality, $\mu _{x,t}$ is approximated as the central death rate $m_{x,t}$ , when we assume that the force of mortality is constant between each integer age $x$ and $x+1$ , and between each calendar year $t$ and $t+1$ . The central death rate $m_{x,t}$ is empirically estimated as the ratio between the observed number of deaths $d_{x,t}$ and the central exposure at risk years $E^c_{x,t}$ .

The period survival probability at time $t$ of an individual aged $x$ at time $t$ until age $x+T-t$ is given by:

(2.1) \begin{align} S_x\left (t,T\right ) = \displaystyle \prod _{j=1}^{T-t} \exp \left (-\mu _{x+j-1,t}\right ) = \exp \left (-\displaystyle \sum _{j=1}^{T-t} \mu _{x+j-1,t} \right ) \end{align}

The estimation of affine mortality models uses the average force of mortality denoted for each calendar year $t$ and each age as:

(2.2) \begin{align} \bar{\mu }_{x}\left (t,T\right ) =\frac{1}{T-t} \displaystyle \sum _{j=1}^{T-t} \mu _{x+j-1,t} \approx \frac{1}{T-t} \displaystyle \sum _{j=1}^{T-t} m_{x+j-1,t} \end{align}

Here, $x$ is fixed and denotes the smallest age in the age-range of interest (for example, equal to 50 in Ungolo et al., Reference Ungolo, Garces, Sherris and Zhou2023 and Huang et al., Reference Huang, Sherris, Villegas and Ziveyi2022). The dataset for the analysis is a matrix of dimension $N\times K$ , where $N$ is the number of ages in the age-range of interest and $K$ is the number of calendar years for the analysis.

Conversely, given the average force of mortality, we can obtain $\mu$ , by means of the following recursion, starting from $\mu _{x,t} = \bar{\mu }_{x}\left (t,t+1\right )$ :

(2.3) \begin{align} \mu _{x+i,t} = i \bar{\mu }_{x}\left (t,t+i\right ) - \left (i-1\right )\bar{\mu }_{x} \left (t,t+i-1\right ) \end{align}

for $i=2,\ldots,N$ . Furthermore, let us define the column vector $\bar{\mu }_{t} = [\bar{\mu }_{x}\!(t,t+1),\ldots,$ $\bar{\mu }_{x}\!(t,t+N)]^\top$ (we drop the reference on $x$ for practical reasons).

When processing the force of mortality data $\mu _{x,t}$ , the function rates2avg() in AffineMortality can be used to transform the $N\times K$ -dimensional matrix of $\mu _{x,t}$ rates into the corresponding matrix of $\bar{\mu }$ rates. The inverse operation can be carried out using the function avg2rates():

mu_bar <- rates2avg(mu_xt_rates)

mu_xt <- avg2rates(mu_bar)

When analyzing age-period mortality rates, each cell of $\bar{\mu }$ contains the average mortality rate between the base age $x$ and the age $x+y$ to which the $y$ th row refers in calendar year $t$ .

The matrix of average forces of mortality is the key input for the estimation of affine mortality models. The use of the average forces of mortality yields smoother data, which renders the estimation process more stable. This is the approach adopted within the interest rate literature (see Christensen et al., Reference Christensen, Diebold and Rudebusch2011), as well as in the analysis of affine mortality models (Blackburn & Sherris, Reference Blackburn and Sherris2013; Huang et al., Reference Huang, Sherris, Villegas and Ziveyi2022 and Jevtić & Regis, Reference Jevtić and Regis2019).

A similar data set can be set up for the analysis of age-cohort mortality rates, as discussed in Huang et al. (Reference Huang, Sherris, Villegas and Ziveyi2022) and Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023).

3. Affine mortality models

In this section, we provide a brief overview of the affine mortality modeling framework, and describe the different mortality models supported by AffineMortality. We first present the model setting in the real-world probability space $(\Omega,\mathcal{F},P)$ , where $P$ is the real-world probability measure. We then discuss the development of the mortality model under a risk-neutral pricing measure $Q$ that is equivalent to $P$ . The development of the model under $Q$ is important in view of applications in pricing and valuation of mortality-linked cash flows, since under $Q$ , the combined financial and actuarial market is assumed to be arbitrage-free. The mortality dynamics under $P$ and under $Q$ are linked by a careful choice of a Radon-Nikodým density process, which shall be discussed below.

Let $\tau$ denote the random time of death of an individual who is currently aged $x$ . We equip the probability space with a right-continuous and $P$ -complete filtration $\mathbb{F} = \{\mathcal{F}_t\}_{t\geq 0}$ , which can be decomposed as $\mathbb{F} = \mathbb{G} \vee \mathbb{H}$ , where $\mathbb{G}$ is a filtration containing all financial and actuarial information, except the actual time of death, and $\mathbb{H}$ is the smallest filtration such that $\tau$ is a $\mathbb{H}$ -stopping time. Hence, $\tau$ is also a $\mathbb{F}$ -stopping time. See, for example, Biffis (Reference Biffis2005) and Blackburn & Sherris (Reference Blackburn and Sherris2013) for further details.

We model $\tau$ as a doubly-stochastic stopping time that admits an intensity process $\mu _x(t)$ , which is a nonnegative, $\mathbb{G}$ -predictable process. See Biffis et al., (Reference Biffis, Denuit and Devolder2010, Section 2) for a detailed mathematical discussion. Furthermore, we assume that $\mu _x(t)$ is modeled as an affine function of an $M$ -dimensional, $\mathbb{F}$ -adapted, latent factor process $X(t)$ . That is, there exist $\rho _0^{(x)} \in \mathbb{R}$ and $\rho _1^{(x)}\in \mathbb{R}^M$ , possibly dependent on the base age $x$ , such that

(3.1) \begin{equation} \mu _x(t) = \rho _0^{(x)} + (\rho _1^{(x)})^\top X(t), \qquad t\geq 0. \end{equation}

The process $X(t)$ is assumed to be a solution of the (vector) stochastic differential equation (SDE)

(3.2) \begin{equation} dX(t) = K(\theta ^P - X(t)) dt + \Sigma D(X(t),t) dW^P(t), \qquad X(0) = x_0 \in \mathbb{R}^M, \end{equation}

where $K\in \mathbb{R}^{M\times M}$ , $\theta ^P\in \mathbb{R}^M$ , and $\Sigma \in \mathbb{R}^{M\times M}$ . We assume that $D(X(t),t)$ is an $M$ -dimensional diagonal matrix with diagonal elements $d_{ii}(X(t),t)$ given by

\begin{equation*}d_{ii}(X(t),t) = \sqrt {\alpha ^i(t) + \beta _1^i(t) X_1(t) + \dots + \beta _M^i(t) X_M(t)}, \qquad i=1,\dots,M,\end{equation*}

where $\alpha ^i$ and $\beta ^i\;:\!=\;(\beta ^i_1,\dots,\beta ^i_M)^\top$ are bounded and continuous functions, and $W^P$ is a standard $M$ -dimensional $ P$ -Brownian motion. The quantities $ K$ , $\theta ^{ P}$ , and $\Sigma$ represent the rate of mean reversion, the long-run mean, and the volatility of $X(t)$ , respectively. We identify $\mathbb{G}$ as the natural filtration generated by $W^P$ , so that $X(t)$ is a $\mathbb{G}$ -adapted process. Since $X(t)$ is also a (right-) continuous process, it follows that $\mu _x(t)$ is a $\mathbb{G}$ -predictable process (Cohen & Elliott, Reference Cohen and Elliott2015, Theorem 7.2.4)

Let $ S^P_x(t,T) \;:\!=\; \mathbb{E}^P[\exp \{-\int _t^T \mu _x(s) ds\} | \mathcal{G}_t]$ denote the (real-world) probability that an individual aged $x$ at time $t$ , conditional on being alive at time $t$ , survives up to time $T$ . Hence, following the affine framework set in Duffie & Kan (Reference Duffie and Kan1996) and Duffie et al. (Reference Duffie, Pan and Singleton2000), $S_x(t,T)$ is an exponentially affine function of $X(t)$ :

\begin{equation*}S_x^{ P}(t,T) = \exp \{A_x^{ P}(t,T) + (B_x^{ P}(t,T))^\top X(t)\},\end{equation*}

where $A_x^{ P}$ and $B_x^{ P}$ are solutions of the system of ordinary differential equations (ODEs)

\begin{align*} \frac{dB_x^{ P}(t,T)}{dt} & = \rho _1^{(x)} +{ K}^\top B_x^{ P}(t,T) - \frac{1}{2}\sum _{k=1}^M [\Sigma ^\top B_x^{ P}(t,T) B_x^{ P}(t,T)^\top \Sigma ]_{k,k}(\beta ^k(t))^\top \\ \frac{dA_x^{ P}(t,T)}{dt} & = \rho _0^{(x)} + B_x^{ P}(t,T)^\top{ K}\theta ^{ P} - \frac{1}{2}\sum _{k=1}^M [\Sigma ^\top B_x^{ P}(t,T) B_x^{ P}(t,T)^\top \Sigma ]_{k,k} \alpha ^k(t) \end{align*}

with terminal condition $A_x^P(T,T) = 0$ and $B_x^P(T,T) = \mathbf{0}$ .

Therefore, the average force of mortality over the period $[t,T]$ , defined as

\begin{equation*} \bar {\mu }_x^{ P}(t,T) \;:\!=\; -\frac {1}{T-t}\log S_x^{ P}(t,T) \end{equation*}

is an affine function of the latent state process $X\left (t\right )$ , i.e.

\begin{equation*}\bar {\mu }_x^{ P}(t,T) = -\frac {A_x^{ P}(t,T)}{T-t} - \frac {B_x^{ P}(t,T)^\top }{T-t} X(t).\end{equation*}

In correspondence to Equation (2.2), $\bar{\mu }_x^{ P}\textbf{}(t,T)$ represents the average force of mortality for an individual aged $x$ at time $t$ from ages $x$ to $x+(T-t)$ .

The affine representation of the average force of mortality allows us to cast the parameter estimation problem into that for a state-space model where $\bar{\mu }_x^{ P}(t,T)$ are the observations and $X(t)$ is the unknown state process.

While the model is developed with age-period mortality data in mind (as in Section 2), the approach here is also naturally cohort-based and can be applied to the analysis of age-cohort mortality data; see e.g. Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023).

In the following sections, we discuss how the affine mortality model can also be developed under a pricing measure $Q$ . Then, we discuss the affine mortality models we implement in AffineMortality. For simplicity, we shall specify the mortality models we implement in the package in terms of their dynamics under the pricing measure $Q$ , since certain models call for a specific structure in the stochastic dynamics under $Q$ (see for example the Arbitrage-Free Nelson-Siegel (AFNS) model and its extensions).

3.1 Affine mortality models under the pricing measure

At this stage, we assume that the combined financial and actuarial market is arbitrage-free. Thus, there exists a probability measure $Q$ on $(\Omega,\mathcal{F})$ , equivalent to $P$ , under which the discounted financial security prices are $Q$ -martingales. We define $Q$ via its Radon-Nikodým density process with respect to $P$ ,

\begin{equation*} \mathcal {L}(t) \;:\!=\; \frac {dQ}{dP}\bigg |_{\mathcal {F}_t} \;:\!=\; \exp \Bigg \{-\int _0^t \Lambda _s^\top dW^P(s) - \frac {1}{2} \int _0^t \|\Lambda _s\|^2 ds\Bigg \} \end{equation*}

for some $\mathbb{G}$ -predictable, $\mathbb{R}^M$ -valued process $\Lambda _t$ satisfying suitable integrability conditions (e.g. the Novikov condition) such that $\mathcal{L}(t)$ is a $P$ -martingale. Here, $\|\cdot \|$ denotes the Euclidean norm on $\mathbb{R}^M$ . By the Girsanov Theorem, the process $W^Q(t)$ defined by

\begin{equation*} W^Q(t) \;:\!=\; W^P(t) + \int _0^t \Lambda _s ds, \qquad t \geq 0 \end{equation*}

is a standard Brownian motion under $Q$ . In particular, since $\Lambda _t$ is $\mathbb{G}$ -predictable, by Biffis et al. (Reference Biffis, Denuit and Devolder2010, Proposition 3.2), $\tau$ is still a doubly stochastic stopping time under $Q$ with intensity process $\mu _x(t)$ given by Equation (3.1). As such, we can define the probability that the individual survives up to time $T$ , conditional on being alive at time $t$ , as

\begin{equation*} S_x(t,T) \;:\!=\; \mathbb {E}^Q\bigg [\exp \bigg \{-\int _t^T \mu _x(s) ds\bigg \} \Big | \mathcal {G}_t\bigg ]. \end{equation*}

To ensure the latent factor process $X(t)$ is affine under both $P$ and $Q$ , we further assume that $\Lambda _t$ is given by the essentially affine specification proposed by Duffee (Reference Duffee2002). Applied to our setting, we let

\begin{equation*} \Lambda _t = \begin {cases} \text {diag}\left (\sqrt {X_1(t)},\dots,\sqrt {X_M(t)}\right ) \lambda _0 & \text {for the CIR model} \\[5pt] \lambda _1 X(t) & \text {for all other models}, \end {cases} \end{equation*}

for some $\lambda _1 \in \mathbb{R}^{M\times M}$ and $\lambda _0 \in \mathbb{R}^M$ , which we refer to as the market price of risk parameters. Under this specification, the dynamics of $X(t)$ can be written as

(3.3) \begin{equation} dX(t) = \Delta (\theta ^Q - X(t)) dt + \Sigma D(X(t),t) dW^Q(t), \qquad X(0) = x_0, \end{equation}

where $\Delta \in \mathbb{R}^{M\times M}$ and $\theta ^Q \in \mathbb{R}^M$ , and $\Sigma$ and $D(X(t),t)$ are as in the $P$ -dynamics of $X(t)$ . Furthermore, under this construction, $\Lambda _t$ is indeed $\mathbb{G}$ -predictable.

Since $X(t)$ is also an affine process under $Q$ , the survival probability $S_x(t,T)$ has the exponential affine form

\begin{equation*} S_x(t,T) = \exp \{A_x(t,T) + B_x(t,T)^\top X(t)\}, \end{equation*}

where $A_x$ and $B_x$ are solutions of the system of ODEs

(3.4) \begin{align} \begin{split} \frac{dB_x(t,T)}{dt} & = \rho _1^{(x)} + \Delta ^\top B_x(t,T) - \frac{1}{2}\sum _{k=1}^M [\Sigma ^\top B_x(t,T) B_x(t,T)^\top \Sigma ]_{k,k}(\beta ^k(t))^\top \\[5pt] \frac{dA_x(t,T)}{dt} & = \rho _0^{(x)} + B_x(t,T)^\top \Delta \theta ^Q - \frac{1}{2}\sum _{k=1}^M [\Sigma ^\top B_x(t,T) B_x(t,T)^\top \Sigma ]_{k,k} \alpha ^k(t)\\[5pt] \end{split} \end{align}

with terminal condition $A_x(T,T) = 0$ and $B_x(T,T) = \mathbf{0}$ . As a result, the average force of mortality over the period $[t,T]$ , denoted by $\bar{\mu }_x(t,T)$ , is affine in the latent factor process

\begin{equation*} \bar {\mu }_x(t,T) = -\frac {A_x(t,T)}{T-t} - \frac {B_x(t,T)^\top }{T-t} X(t). \end{equation*}

One key advantage of the essentially affine specification of $\Lambda _t$ is that, given $\lambda _1$ or $\lambda _0$ , one can compute $\Delta$ and $\theta ^Q$ using the estimated values of $K$ and $\theta ^P$ ; see Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023, Section 2.3) for details. This is relevant if one derives the market price of risk parameters from other sources, e.g. empirical evidence on policyholder behavior or other existing calibration methods (see e.g. Bacinello et al., Reference Bacinello, Biffis and Millossovich2010, Section 4.5) and Biffis (Reference Biffis2005, Section 5.4). Furthermore, this also relates to the fact that the combined financial and actuarial market is incomplete, so there are infinitely many equivalent martingale measures $Q \sim P$ . Thus, there exist multiple values of $\lambda _0$ or $\lambda _1$ which are consistent with the above (essentially affine) framework and such that the process $\Lambda _t$ remains $\mathbb{G}$ -predictable. However, we do not discuss these points in detail as they fall outside the scope of the current paper. In the estimation method we employ in the package, the parameters under $P$ and $Q$ are simultaneously estimated from the data.

Another advantage is the flexibility of the choice of $\lambda _1$ or $\lambda _0$ . This implies that we are free to choose any desired structure for $K$ and $\theta ^P$ while retaining the required special structures for $\Delta$ and $\theta ^Q$ . This is relevant, especially for the AFNS model and its extensions, where the interpretation of the latent factors as level, slope, and curvature factors are directly related to the specification of $\Delta$ . To this end, we assume that $\theta ^P = 0$ for all models considered, except the CIR model, and assume that $K$ is a diagonal matrix with $K = \text{diag}(\kappa _1,\dots,\kappa _M)$ for all models.

In the sections that follow, we state the dynamics of the latent factor process under the pricing measure $Q$ for each model we consider in the package. For each model, the factor loadings $A_x(t,T)$ and $B_x(t,T)$ are available in closed form and are functions of $t$ and $T$ only via $T-t$ . For all models, except the Gompertz-Makeham model, the mortality intensity does not depend on the individual current age $x$ , so we drop this from the notation and write $\mu (t)$ .

3.2 Blackburn-Sherris model

The BS model assumes that

\begin{equation*}\mu (t) = X_1(t) + \dots + X_M(t),\end{equation*}

i.e. $\rho _0 = 0$ and $\rho _1 = (1,\dots,1)^\top$ , where $X(t) = (X_1(t),\dots,X_M(t))^\top$ with dynamics

(3.5) \begin{equation} dX(t) = -\Delta X(t) dt + \Sigma dW^Q(t). \end{equation}

The components of $X(t)$ can be assumed to be independent by specifying $\Delta$ and $\Sigma$ as diagonal matrices, hence $\Delta = \text{diag}(\delta _1,\dots,\delta _M)$ and $\Sigma = \text{diag}(\sigma _1,\dots,\sigma _M)$ . Dependence among the components of $X(t)$ can be induced by setting

\begin{equation*}\Delta = \begin {pmatrix} \delta _1 & 0 & 0 & \dots & 0\\[5pt] \delta _{12} & \delta _2 & 0 & \dots & 0 \\[5pt] \vdots & \vdots & \vdots & \ddots & \vdots \\[5pt] \delta _{1M} & \delta _{2M} & \delta _{3M} & \dots & \delta _M \end {pmatrix}, \quad \Sigma = \begin {pmatrix} \sigma _1 & 0 & 0 & \dots & 0\\[5pt] \sigma _{12} & \sigma _2 & 0 & \dots & 0 \\[5pt] \vdots & \vdots & \vdots & \ddots & \vdots \\[5pt] \sigma _{1M} & \sigma _{2M} & \sigma _{3M} & \dots & \sigma _M \end {pmatrix}.\end{equation*}

In AffineMortality, the user can specify as many latent factors as desired in the independent factor case. The factor loading expressions for the independent factor case can be found in Blackburn & Sherris (Reference Blackburn and Sherris2013). However, while the factor loadings are still available in closed form in the dependent factor case, one cannot obtain a scalable nested expression for the factor loadings. AffineMortality currently supports the two and the three-factor BS models with dependent factors. The factor loading expressions for the dependent case can be found in Huang et al. (Reference Huang, Sherris, Villegas and Ziveyi2022, Appendix A).

3.3 Gompertz-Makeham model

The Gompertz-Makeham model (see Schrager, Reference Schrager2006) introduces age-dependence in the mortality intensity process. Specifically, the model assumes that $X(t)$ has the same dynamics outlined in Equation (3.5), and

\begin{equation*}\mu _x(t) = X_1(t) + e^{\gamma x} X_2(t)\end{equation*}

for some $\gamma \gt 0$ . The functional specification of $\mu _x(t)$ implies that the mortality intensity increases exponentially in the base age $x$ and it is $X_2(t)$ , which drives the stochasticity in the intensity process at older ages. In this case, we have $\rho _0^{(x)} = \rho _0 = 0$ and $\rho _1^{(x)} = (1, e^{\gamma x})\top$ . As before, dependence between the latent factors can be introduced by replacing $\Delta$ and $\Sigma$ by lower-triangular matrices. The factor loading expressions for both the independent and dependent factor cases can be found in Appendix B.1.

3.4 Arbitrage-free Nelson-Siegel model

The Arbitrage-free Nelson-Siegel (AFNS) model, proposed by Christensen et al. (Reference Christensen, Diebold and Rudebusch2011) for modeling the term structure of interest rates, assumes that there are three latent factors, identified as Level ( $L$ ), Slope ( $S$ ), and Curvature ( $C$ ), with $Q$ -dynamics:

(3.6) \begin{align} \begin{pmatrix} dL(t) \\[5pt] dS(t) \\[5pt] dC(t) \end{pmatrix} = - \begin{pmatrix} 0 & 0 & 0 \\[5pt] 0 & \delta & -\delta \\[5pt] 0 & 0 & \delta \end{pmatrix} \begin{pmatrix} L(t) \\[5pt] S(t) \\[5pt] C(t) \end{pmatrix} dt + \begin{pmatrix} \sigma _{L} & 0 & 0 \\[5pt] 0 & \sigma _{S} & 0 \\[5pt] 0 & 0 & \sigma _{C} \end{pmatrix} \begin{pmatrix} dW_L^Q(t) \\[5pt] dW_S^Q(t) \\[5pt] dW_C^Q(t) \end{pmatrix}. \end{align}

The AFNS model assumes that the mortality intensity is the sum of the level and slope factors

\begin{equation*}\mu (t) = L(t) + S(t).\end{equation*}

The structure of $\Delta$ implies that the factor loadings $B_L$ , $B_S$ , and $B_C$ control for the shape (i.e. level, slope, and curvature, respectively) of the average force of mortality, with randomness driven by the dynamics of $L$ , $S$ , and $C$ ; see Christensen et al., (Reference Christensen, Diebold and Rudebusch2011, Proposition 1 and Section 2.3) for the closed-form expressions for the factor loadings.

A key feature of the AFNS model is the connection between the latent factors and the shape of the average force of mortality curve through the structure of $\Delta$ . We can induce factor dependence through the diffusion matrix $\Sigma$ . Specifically, we replace $\Sigma$ with a lower-triangular matrix in the dependent factor case. As such, $B_L$ , $B_S$ , and $B_C$ remain the same in the dependent factor case; see Christensen et al. (Reference Christensen, Diebold and Rudebusch2011, Appendix B) for the formula of $A(t,T)$ .

3.4.1 Arbitrage-free generalized Nelson-Siegel model

The Arbitrage-Free Generalized Nelson-Siegel (AFGNS) model is an extension of the AFNS model proposed by Christensen et al. (Reference Christensen, Diebold and Rudebusch2009), which includes an additional slope and curvature factor. The AFGNS model was proposed as an arbitrage-free version of the four-factor Nelson-Siegel-Svensson model, which extends the AFNS model by adding a second curvature factor. The AFGNS model is thus a five-factor model whose latent factors, in case of independence, satisfy the SDE

\begin{align*} \begin{pmatrix} \textrm{d}L \left (t\right ) \\[5pt] \textrm{d}S_1 \left (t\right ) \\[5pt] \textrm{d}S_2 \left (t\right ) \\[5pt] \textrm{d}C_1 \left (t\right ) \\[5pt] \textrm{d}C_2 \left (t\right ) \end{pmatrix} & = - \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\[5pt] 0 & \delta _1 & 0 & -\delta _1 & 0 \\[5pt] 0 & 0 & \delta _2 & 0 & -\delta _2 \\[5pt] 0 & 0 & 0 & \delta _1 & 0 \\[5pt] 0 & 0 & 0 & 0 & \delta _2 \end{pmatrix} \begin{pmatrix} L \left (t\right ) \\[5pt] S_1 \left (t\right ) \\[5pt] S_2 \left (t\right ) \\[5pt] C_1 \left (t\right ) \\[5pt] C_2 \left (t\right ) \end{pmatrix} \textrm{d}t \\[5pt] & \qquad + \begin{pmatrix} \sigma _{L} & 0 & 0 & 0 & 0 \\[5pt] 0 & \sigma _{S_1} & 0 & 0 & 0 \\[5pt] 0 & 0 & \sigma _{S_2} & 0 & 0 \\[5pt] 0 & 0 & 0 & \sigma _{C_1} & 0 \\[5pt] 0 & 0 & 0 & 0 & \sigma _{C_2} \end{pmatrix} \begin{pmatrix} \textrm{d}W_L^Q \left (t\right ) \\[5pt] \textrm{d}W_{S_1}^Q \left (t\right ) \\[5pt] \textrm{d}W_{S_2}^Q \left (t\right ) \\[5pt] \textrm{d}W_{C_1}^Q \left (t\right ) \\[5pt] \textrm{d}W_{C_2}^Q \left (t\right ) \end{pmatrix}, \end{align*}

where $\delta _1 \neq \delta _2$ . As in the AFNS model, the dependent factor case consists of replacing a diagonal diffusion matrix with a lower triangular one. As in the AFNS model, under the AFGNS model, the mortality intensity is modeled as the sum of the level and the two slope factors

\begin{equation*}\mu (t) = L(t) + S_1(t) + S_2(t).\end{equation*}

Factor loading expressions for the independent factor case can be found in Christensen et al. (Reference Christensen, Diebold and Rudebusch2009, Proposition 3.1). In the dependent factor case, only the form of $A(t,T)$ changes; this can be found in Christensen et al. (Reference Christensen, Diebold and Rudebusch2009, Appendix).

3.4.2 Arbitrage-free reduced Nelson-Siegel model

We introduce a version of the AFNS model without the curvature factor and call it the Arbitrage-Free Reduced Nelson-Siegel (AFRNS) model. This model has been included in the package in order to assess the effect of the curvature on the resulting model. Experiments we conducted on the mortality data of several countries showed that the presence of the curvature factor may produce negative mortality rates in some cases when projected 25 years ahead, despite the better in-sample performance of the AFNS and AFGNS models.

We consider two latent factors, with dynamics

\begin{equation*} \begin {pmatrix} dL(t) \\[5pt] dS(t) \end {pmatrix} = - \begin {pmatrix} 0 & 0 \\[5pt] 0 & \delta \end {pmatrix} \begin {pmatrix} L(t) \\[5pt] S(t) \end {pmatrix} dt + \begin {pmatrix} \sigma _{L} & 0 \\[5pt] 0 & \sigma _{S} \end {pmatrix} \begin {pmatrix} dW_L^Q(t) \\[5pt] dW_S^Q(t) \end {pmatrix}, \end{equation*}

for the independent factor case. As before, we replace the volatility matrix with a lower-triangular matrix in the dependent factor case. The resulting factor loadings for both the independent and dependent factor cases can be found in Appendix B.2.

3.4.3 Arbitrage-free unrestricted Nelson-Siegel model

We also introduce a variation of the AFNS model where the elements of the drift coefficient matrix $\Delta$ have possibly unequal values. We call this model the Arbitrage-Free Unrestricted Nelson-Siegel (AFUNS) model. The latent factor dynamics under the AFUNS model are given by

(3.7) \begin{equation} \begin{pmatrix} \textrm{d}L(t) \\[5pt] \textrm{d}S(t) \\[5pt] \textrm{d}C(t) \end{pmatrix} = - \begin{pmatrix} 0 & 0 & 0 \\[5pt] 0 & \delta _1 & \delta _2 \\[5pt] 0 & 0 & \delta _3 \end{pmatrix} \begin{pmatrix} L(t) \\[5pt] S(t) \\[5pt] C(t) \end{pmatrix} \textrm{d}t + \begin{pmatrix} \sigma _L & 0 & 0 \\[5pt] 0 & \sigma _S & 0 \\[5pt] 0 & 0 & \sigma _C \end{pmatrix} \begin{pmatrix} \textrm{d}W_L^Q(t) \\[5pt] \textrm{d}W_S^Q(t) \\[5pt] \textrm{d}W_C^Q(t) \end{pmatrix}. \end{equation}

As before, the mortality intensity is modeled as the sum of the level and slope factors

\begin{equation*}\mu (t) = L(t) + S(t).\end{equation*}

We recover the AFNS model by setting $\delta _2 = -\delta _1$ and in the limit as $\delta _3\to \delta _1$ . The dependent factor case is obtained by replacing the volatility matrix by a lower-triangular matrix.

3.5 Cox–Ingersoll–Ross model

Under the CIR model, the mortality intensity is modeled as the sum of the components of the latent factor process $X(t)$ ,

\begin{equation*}\mu (t) = X_1(t) + \dots + X_M(t),\end{equation*}

where each $X_i(t)$ is a square-root diffusion process given by

\begin{equation*}dX_i(t) = \delta _i (\theta _i^Q - X_i(t)) dt + \sigma _i \sqrt {X_i(t)} dW_i^Q(t).\end{equation*}

This implies that each component of $X(t)$ is nonnegative $Q$ -almost surely and is strictly positive $Q$ -almost surely if $X_i(0) \gt 0$ and $2\delta _i \theta _i^Q \geq \sigma _i^2$ . Each $X_i(t)$ is asymptotically gamma distributed (as $t\to \infty )$ (Cox et al., Reference Cox, Ingersoll and Ross1985), hence the CIR mortality model can capture the heterogeneity of mortality rates at older ages (Pitacco, Reference Pitacco2016). The factor loadings for the CIR model are given by

\begin{align*} B_i(t,T) & = - \frac{2(e^{\vartheta _i(T-t)} - 1)}{(\delta _i + \vartheta _i)(e^{\vartheta _i(T-t)} - 1) + 2 \vartheta _i}, \qquad i=1,2,\dots,M\\[5pt] A(t,T) & = \sum _{i=1}^M \frac{2\delta _i \theta _i^Q}{\sigma _i^2} \log \left [\frac{2 \vartheta _i e^{\frac{1}{2} (\delta _i + \vartheta _i) (T-t)}}{(\delta _i + \vartheta _i)(e^{\vartheta _i(T-t)} - 1) + 2\vartheta _i}\right ], \end{align*}

where $\vartheta _i = \sqrt{\delta _i^2 + 2\sigma _i^2}$ . Their application in the analysis of Human Mortality Database (HMD) national mortality data can be found in Huang et al. (Reference Huang, Sherris, Villegas and Ziveyi2022) and Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023).

4. Parameter estimation

The model parameters are estimated using data collected in discrete time, as illustrated in Section 2. In the remainder of the paper, we use the real-world probability measure to characterize the dynamics of $X(t)$ . By discretizing the stochastic differential equation of each model, and using the affine representation of the average force of mortality, we obtain the following equally time-spaced state-space formulation:

(4.1) \begin{align} X \left (t\right ) &= \Phi _{t} X \left (t-j\right ) + \eta _{t} \quad \quad \eta _{t}\sim N\left (0,R_t \right ) \end{align}
(4.2) \begin{align} \bar{\mu }_{t} &= A_{t} + B_{t} X \left (t\right ) + \epsilon _{t} \quad \quad \epsilon _{t} \sim N\left (0,H \right ) \quad \quad t=1,\ldots,T \end{align}

where $A_{t} = \left [A \left (t,t+1\right ),\ldots, A \left (t,t+N\right )\right ]^\top$ and $B_{t} = \left [B \left (t,t+1\right ),\ldots, B \left (t,t+N\right )\right ]^\top$ . The age subscript has been omitted for notational convenience.

The state equation (4.1) describes the dynamics of the factor as an autoregressive process of order 1 with system matrix $\Phi _{t} =e^{-\boldsymbol{\kappa } j}$ and stochastic noise $\eta _{t}\sim N\left (0,R_t \right )$ . The measurement equation (4.2) describes $\bar{\mu }_{t} \in \mathbb{R}^{N}$ as an affine function of the latent variable $X\left (t\right )$ with error term $\epsilon _{t} \sim N\left (0,H \right )$ . We assume that $\eta _{t}$ and $\epsilon _{t}$ are independently distributed.

For the models so far implemented in AffineMortality, $A$ , $B$ , $H$ , and $\Phi =e^{-\boldsymbol{\kappa } j}$ do not depend on $t$ . For Gaussian models, such as the BS and the AFNS models, we have $R_t=R$ :

(4.3) \begin{align} R &= \left [I - e^{-\boldsymbol{\kappa }j} \right ] \Sigma \Sigma ^\top \left [I - e^{-\boldsymbol{\kappa }j} \right ]^\top . \end{align}

For the CIR model, because of the independence between the factors, $R_t$ is a diagonal matrix with $k$ th diagonal $r_{t,k}$ equal to:

\begin{align*} r_{t,k} &= \sigma ^2 \left (\frac{1 - e^{-\kappa _k}}{\kappa _k} \right ) \left ( \frac{1}{2} \theta ^P \left (1 - e^{-\kappa _k} \right ) + e^{-\kappa _k} X \left (t\right ) \right ). \end{align*}

Here, $\Phi$ , $A$ , $B$ , $R_t$ , and $H$ depend on the parameters that we estimate based on the statistical inference for the dynamics of the mortality rates. Furthermore, $H$ is a diagonal matrix, where the diagonal elements $\omega ^2_i$ are equal to

(4.4) \begin{align} \omega ^2_i = r_c + r_1 \displaystyle \sum _{k=1}^i \exp \left (r_2 k \right )/ i \end{align}

for $i=1,\ldots,N$ . In this way, the measurement Equation (4.2) accounts for the increasing variation in the mortality rates at older ages.

The parameters are estimated using maximum likelihood. Let $\psi$ denote the vector of parameters to be estimated. The likelihood function is readily obtained from the univariate Kalman Filter recursion (see Koopman & Durbin, Reference Koopman and Durbin2000 and Ungolo et al., Reference Ungolo, Garces, Sherris and Zhou2023 for its implementation in the context of affine mortality models), given the observed average mortality rates $\bar{\mu }_{1:T}$ :

(4.5) \begin{align} \log L \left (\psi \mid \bar{\mu }_{1:T} \right ) = -\frac{TN}{2} \log 2\pi -\frac{1}{2} \displaystyle \sum _{t=1}^T \displaystyle \sum _{i=1}^N \left ( \log F_{t,i} + \nu _{t,i}^2 F^{-1}_{t,i} \right ), \end{align}

where $\nu _{t,i} = \bar{\mu }_{t,i} - a_i - b_{i} \widehat{x}_{t,i}$ is the measurement error, and $F_{t,i} = b_i \widehat{\Sigma }_{t,i} b_i^\top + \omega ^2_i$ is the covariance of $\bar{\mu }_{t,i}$ (denoting $\mu _{x}\left (t,t+i\right )$ ) taking into account the uncertainty about the latent state $X\left (t\right )$ . Here, $a_i$ denotes the $i$ th element of $A_t$ , $b_i$ the corresponding row of the matrix $B_t$ , and $\widehat{x}_{t,i}$ and $\widehat{\Sigma }_{t,i}$ are the univariate Kalman Filter updates of the moments of $X\left (t\right )$ (see Appendix A for additional details).

For the CIR mortality model, the estimated parameter vector $\widehat{\psi }$ corresponds to the quasi-maximum likelihood estimator. See Chen & Scott (Reference Chen and Scott2003) and Jevtić & Regis (Reference Jevtić and Regis2021) for additional details.

4.1 Implementation

The parameter estimation task includes the initial state variable $X\left (0\right )$ among the set of unknown parameters. Other numerical tricks to foster reasonable parameter estimates, e.g. ensuring the positive-definiteness of the covariance matrix, are described in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023).

We recommend the use of multiple starting values due to the high non-linearity of the log-likelihood function, which may have multiple local maxima. Some initial values are provided in AffineMortality through the list object sv_default, which we briefly illustrate in Section 8. These are based on previous analyses of country mortality rates. When fitting dependent factor models, we recommend the use as starting values of the parameter estimates obtained from the correspondent independent factor models.

4.2 affine_fit()

The function affine_fit() of the package AffineMortality allows us to carry out the parameter estimation task. The log-likelihood function of Equation (4.5) is optimized sequentially by group of parameters (Coordinate Ascent) through the gradient-free simplex Nelder-Mead method as recommended by Christensen et al. (Reference Christensen, Diebold and Rudebusch2011). This routine is readily available in R within the function optim.

The function affine_fit() takes the following arguments as input:

  • model = c(“BS”, “AFNS”, “AFGNS”, “AFUNS”, “AFRNS”, “CIR”, “GMk”), to select one of the model families to be fitted. Its default value is BS;

  • fact_dep=c(FALSE, TRUE), to select whether the model accounts for factor dependence (default FALSE);

  • n_factors, to select the number of factors (only for the BS and the CIR models; default set to 3);

  • data, the rectangular data set of $\bar{\mu }$ rates used for the analysis;

  • st_val, corresponding to the set of starting values for the parameters. These must be supplied as a list of parameters, e.g.

    st_val=list(x0=c(6.960591e-03, 9.017154e-03, 5.091784e-03),

    delta=c(0.04268782, −0.03122758, −0.08573677),

    kappa=c(1.162624e-02, 6.787268e-02, 5.061539e-03),

    sigma=exp(c(−6.806310, −6.790270, −7.559145)),

    r1=exp(−3.327060e + 01), r2=exp(−6.086479e-01),

    rc=exp(−1.553156e + 01))

    for the BS model with three independent factors. For dependent factor models, we instead supply sigma_dg, which is the parameter denoting the standard deviation, and Sigma_cov indicating the elements of the off-diagonal elements of the covariance matrix (generally a vector of zero, as suggested in Section 4.1);

  • max_iter: maximum number of iterations for the coordinate ascent algorithm (default 200);

  • tolerance: maximum log-likelihood value increase between iterations such that the optimizer can stop (default 0.1);

  • wd: working directory to save the intermediate values of the parameters throughout iterations;

This function returns a list with

  • model: same as input;

  • fit: list with:

    1. par_est: list of parameter estimates;

    2. log_lik: value of the log-likelihood function;

    3. CA_par: table listing the value of the parameters throughout the coordinate ascent algorithm iterations;

  • n.parameters: total number of estimated parameters;

  • AIC: value of Akaike information criterion (see Section 5);

  • BIC: value of Bayesian information criterion;

Using the pre-loaded age-cohort US data set, we can run the function affine_fit() as follows:

data(mu_bar) # - Load the US data set

affine_fit(model=“BS”, fact_dep=FALSE, n_factors=3, data=mu_bar,

st_val=st_val, max_iter=200, tolerance=0.1)

During its execution, the console shows the value of the parameters, the log-likelihood function, and the iteration number:

[1] “X(0)_1 0.005” “X(0)_2 0.006” “X(0)_3 0.004”

[1] “delta_1 0.053” “delta_2 -0.018” “delta_3 -0.087”

[1] “kappa_1 0.032” “kappa_2 0.007” “kappa_3 -0.003”

[1] “sigma_1 0.001” “sigma_2 0.001” “sigma_3 0”

[1] “r1 0” “r2 0.548” “rc 0”

[1] “log_lik 10538.37”

[1] “Iteration 1”

[1] “———————————”

[1] “X(0)_1 0” “X(0)_2 0.007” “X(0)_3 0.007”

[1] “delta_1 0.043” “delta_2 -0.022” “delta_3 -0.086”

[1] “kappa_1 0.017” “kappa_2 0.002” “kappa_3 0.01”

[1] “sigma_1 0.001” “sigma_2 0.001” “sigma_3 0”

[1] “r1 0” “r2 0.556” “rc 0”

[1] “log_lik 10623.92”

[1] “Iteration 2”

5. Goodness of fit

The fitted rates, denoted as $\widehat{\bar{\mu }}_t$ (which are $N$ -dimensional vectors, for $t=1,\ldots,K$ ), of use for the analysis of the goodness of fit of each model with respect to historical data, can be obtained using the function mubar_hat(), which takes the following arguments as input: model, fact_dep, n_factors, parameters, data. Again, the input parameters are supplied as a list, similar to the starting values of affine_fit(). The resulting fitted rates can then be used to analyze the goodness of fit of each model with respect to historical data.

5.1 Numerical measures

AffineMortality considers four goodness-of-fit measures, following Blackburn & Sherris (Reference Blackburn and Sherris2013) and Huang et al. (Reference Huang, Sherris, Villegas and Ziveyi2022):

  • Akaike information criterion (AIC, Akaike, Reference Akaike1974):

    (5.1) \begin{align} \text{AIC} = - 2 \log L \left (\widehat{\psi } \mid \bar{\mu }_{1:K} \right ) + 2 k \end{align}
  • Bayesian information criterion (BIC, Schwarz, Reference Schwarz1978):

    (5.2) \begin{align} \text{BIC} = - 2 \log L\left (\widehat{\psi } \mid \bar{\mu }_{1:K} \right ) + 2 k KN \end{align}
  • Root mean squared error (RMSE):

    (5.3) \begin{align} \text{RMSE} = \frac{1}{KN} \displaystyle \sum _x \displaystyle \sum _t \left ( \bar{\mu }_{x,t} - \widehat{\bar{\mu }}_{x,t} \right )^2 \end{align}
  • Mean absolute percentage error (MAPE, by age $x$ ):

    (5.4) \begin{align} \text{MAPE}_x = \frac{1}{K} \displaystyle \sum _{t=1}^K \frac{|\bar{\mu }_{x,t} - \widehat{\bar{\mu }}_{x,t}|}{\bar{\mu }_{x,t}} \end{align}

where $k$ is the number of parameters, $t=1,\ldots,K$ , and $N$ is the number of ages considered in the analysis.

The AIC and BIC can be obtained as the output of affine_fit() (see Section 4.2). The RMSE is obtained by running RMSE(fitted, observed), where fitted is the $N \times K$ -dimensional matrix of the fitted rates obtained using the function mubar_hat(), and observed is the corresponding matrix of the $\bar{\mu }$ -rates used for parameter estimation. The function MAPE_row(fitted, observed) yields an $N$ -dimensional vector, which can be used to assess how the model fits at every age in the range of interest.

Figure 1 Heatmap of the standardized residuals for the Blackburn-Sherris model with three dependent factors. Source: Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023).

A desirable characteristic of affine mortality models is that their parameters ensure that the probability of negative rates is negligible. This is a potential limitation of Gaussian models, since $X(t)$ can assume any real value.

For this purpose, the function prob_neg_mu() yields an $N$ -dimensional vector with the probability of negative mortality rates at each age (based on the input data set) for a specific $h$ -year ahead projection, based on the simulated values of $X(t)$ . prob_neg_mu() takes as input model, fact_dep, n_factors, parameters, data, years_proj and n_simulations (default value set to 100,000).

5.2 Residuals

The function std_res() returns an $N\times K$ -dimensional matrix of the standardized residuals for the model of interest. These are computed as $N$ -dimensional vectors for each year $t=1,\ldots,K$ as:

(5.5) \begin{align} r_{t} = \left (\sqrt{\widehat{\mathbb{V}\left (\bar{\mu }_{t} \right )}}\right )^{-1} \left (\bar{\mu }_{t} - \widehat{\bar{\mu }}_{t}\right ), \end{align}

Further details about this formula can be found in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023). A heatmap of the standardized residuals by age and year (Fig. 1) can be generated by using the function heatmap_res():

std_resid <- std_res(model=“BS”, fact_dep=TRUE, n_factors=3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar)

heatmap_res(residuals=std_resid, color=FALSE)

In this way, we can visually detect the presence of period effects (if analyzing age-cohort data) or of cohort effects (if analyzing age-period data).

6. Projection

The function affine_project() returns the projected survival curve for $h$ time periods ahead (cohort or calendar year, depending on the analyzed data set), based on the optimal forecast of the average force of mortality under the quadratic loss (Christensen et al., Reference Christensen, Diebold and Rudebusch2011):

(6.1) \begin{align} \bar{\mu }_{x} \left (t+h,T+h\right ) = -\frac{B_x\left (t,T \right )}{T - t} \mathbb{E} \left [X \left (t+h \right ) \mid X \left (t \right ) \right ] - \frac{A_x\left (t,T \right )}{T - t}, \end{align}

The corresponding survival probability is given by

(6.2) \begin{align} S_{x} \left (t+h,T+h\right ) = \exp \left (B_x\left (t,T \right )^\top \mathbb{E} \left [X \left (t+h \right ) \mid X\left (t \right ) \right ] + A_x\left (t,T \right )\right ), \end{align}

where the expectation is taken with respect to the real-world probability measure $P$ using equation (4.1). The function affine_project() is illustrated in Section 8 when analyzing the BS model with three dependent factors for the US dataset. Its input structure is similar to affine_fit(), with the additional argument years_proj corresponding to the $h$ time periods ahead for the projection.

7. Parameter uncertainty

A further source of risk when projecting cohort survival curves is the uncertainty inherited from the parameter estimation process.

The function par_cov(), returns the variance-covariance matrix of the parameter estimates and their corresponding standard errors. It allows the user to choose between two methods for estimating parameter uncertainty, namely multiple imputation and bootstrap.

The first method, described in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023), can be chosen by setting method=“MI” within par_cov(). It is recommended for Gaussian affine mortality models. Briefly, it consists of a procedure that randomly imputes a value of the latent state variable $X\left (t\right )$ sampled from the smoothing distribution at the value of the parameter estimates. In this way, we obtain a set of “completed” data sets such that parameters are then re-estimated. The number of completed data sets can be specified through the argument D_se. This method turns out useful, because, on one hand, it may not be possible to numerically compute the information matrix from the optimization process of the likelihood function due to its very flat surface. On the other hand, the alternative bootstrap method (briefly described later) may be computationally expensive if carried out hundreds of times, as recommended in practice. The downside of multiple imputation is that, unlike the bootstrap, it may tend to underestimate the standard errors since it is a delta method. From a computational perspective, a potential downside is the need to invert a Hessian matrix of larger dimensions, although this task is simpler compared to the inversion of the Hessian matrix from the estimation procedure (whose likelihood is marginalized with respect to the latent states). This method is not recommended, nor implemented for the CIR model, due to the lower truncation of the latent variable $X\left (t\right )$ .

When parameter uncertainty is assessed by multiple imputation, the function par_cov returns a list with two elements: Cov_par_est, the variance-covariance matrix of the parameters, and St_err, which is a list of the standard error of the parameters.

The bootstrap method draws on the work of Stoffer & Wall (Reference Stoffer and Wall2009), and was used by Blackburn & Sherris (Reference Blackburn and Sherris2013) in the context of affine mortality models. It can be implemented in AffineMortality by specifying the argument method=“Bootstrap” in the function par_cov(). In few words, this method consists of an iterative procedure that first computes the standardized innovations from the measurement error, in order to obtain a bootstrapped dataset of average mortality rates. These are used to obtain a new set of parameter estimates. Hence, once n_BS parameter estimates are obtained, their variance-covariance is computed. In this case, the function par_cov() provides an additional element given by the parameter estimates over the bootstrapped samples. The argument t_excl (default value set to 4 as suggested in Stoffer & Wall, Reference Stoffer and Wall2009) sets the number of the oldest residuals in terms of $t$ to adjust for the Kalman Filter startup irregularities.

Both multiple imputation and bootstrap algorithms are initialized at the parameter estimates in order to better explore the likelihood surface in the neighborhood of $\widehat{\psi }$ . The use of both methods is illustrated in Section 8.

We recommend using the bootstrap for models with a large number of parameters, such as the AFGNS, and for the CIR model with any number of factors.

8. Illustration

AffineMortality provides the data set of the average age-cohort mortality rates for the US males aged 50–99 born in the years 1883–1915 analyzed in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023), which can be loaded as follows:

data(mu_bar)

This dataset will be used for illustrating the analysis of the BS model with three dependent factors.

For this age-cohort dataset, the cell at the intersection of the row corresponding to age $y$ and of the column corresponding to birth year $t$ contains the average of the mortality rates for the corresponding cohort between the base age 50 and $y$ .

Parameter estimation

A critical aspect of the analysis of affine mortality models is the specification of the starting values for the algorithm. As emphasized in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023) and Blackburn & Sherris (Reference Blackburn and Sherris2013), the likelihood function can have multiple local optima, hence the fitting algorithm should be initialized several times. In AffineMortality, we provide a set of starting values (sv_default in an R list format), which can be used by the researcher for a first exploration of the models.

For example, to get the default starting values for the BS model with dependent factors, we can run the following code:

starting_values <- sv_default$BSd

For the other supported models, we can use sv_default$

  • BSi for the BS model up to four factors;

  • AFNSi and AFNSd for the AFNS model with independent and dependent factors, respectively;

  • GMki and GMkd for the Gompertz-Makeham model with independent and dependent factors, respectively;

  • AFGNSi and AFGNSd for the AFGNS model with independent and dependent factors, respectively. Similar for the AFRNS and AFUNS models;

  • CIR for the CIR model up to four factors;

As highlighted in Section 4.1, a general recommendation when analyzing models with dependent factors is to initialize affine_fit() with the parameter estimates of the corresponding independent factor models, and to set the starting values of the additional parameters (such as the off-diagonal elements of the covariance matrix of $X(t)$ and of the mean reversion matrix $\Delta$ ) to zero.

We can thus estimate the parameters with affine_fit() as follows:

pe_BSd_3F <- affine_fit(model=“BS”, fact_dep=TRUE, n_factors = 3,

data=mu_bar, st_val=starting_values, max_iter = 5, tolerance = 0.1,

wd=“working_folder_directory”)

In practice, we run the fitting process for a larger number of iterations. For example, in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023), the authors set max_iter = 200.

Goodness-of-fit analysis

Once we obtain the object pe_BSd_3F, we run the following lines to obtain the AIC and the BIC

>pe_BSd_3F$AIC

-21405.58

>pe_BSd_3F$BIC

-21292

This output can be used to make comparisons between different models. Suppose we want to compare it with the AFNS model with independent factor, then we run the following lines:

pe_AFNSi <- affine_fit(model=“AFNS”, fact_dep=FALSE, st_val=sv_default$AFNSi,

data=mu_bar, max_iter = 5, tolerance = 0.1)

>pe_AFNSi$AIC

-20842.76

>pe_AFNSi$BIC

-20772.45

Since the BS model with dependent factors has a smaller value of both AIC and BIC compared to the AFNS model with independent factors, then we conclude that the former shows a better in-sample fit.

The fitted average mortality rates can be then obtained by using mubar_hat()

fitted_BSd <- mubar_hat(model=“BS”, fact_dep=TRUE, n_factors = 3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar)

The fitted average mortality rates obtained using mubar_hat() can be used to calculate the RMSE and the MAPE for each age in the range of interest:

>RMSE(mu_bar, fitted_BSd)

[1] 0.002011956

MAPE_age(mu_bar, fitted_BSd)

which can be similarly used for comparing the models.

Furthermore, we can obtain the fitted $\mu$ -rates by using the function avg2rates() described in Section 2:

avg2rates(fitted_BSd)

In order to ensure that the probability of negative rates for the projected 10-years ahead cohort is negligible for the model under analysis, we run the function prob_neg_mu()

>prob_neg_mu(model=“BS”, fact_dep=TRUE, n_factors = 3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar, years_proj = 10,

n_simulations = 1000)

[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This shows vector of zeros indicating that the fitted model has zero probability of yielding negative rates for the cohort born in 1925.

Projection

The vector of the 1-year ahead (hence for the US males born in 1916) survival rates for the age-range of interest can be obtained by using the affine_project() function and then plotted (Fig. 2) as follows:

BSd_3F_proj <- affine_project(model=“BS”, fact_dep=TRUE, n_factors = 3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar, years_proj = 1)

plot(rownames(mu_bar), BSd_3F_proj, type=“l”, ylab = “S(t)”, xlab = “Age”)

Figure 2 1-year ahead projected survival curve for the Blackburn-Sherris model with three dependent factors.

Parameter uncertainty

The estimation of the parameter uncertainty by multiple imputation through the function par_cov() described in Section 7 can be carried out as follows:

par_unc_MI <- par_cov(method=“MI”, model=“BS”, fact_dep=TRUE, n_factors = 3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar, D_se = 5, max_iter = 10,

tolerance = 0.1, wd = 0)

We recommend setting the number of imputations D_se at least equal to 50 to obtain a more robust estimate of the parameter uncertainty. In this illustration, we have the following set of standard errors of the parameters as an R list object:

>par_unc_MI$St_err

$delta

[1] 0.0038258337 0.0738215285 0.0024289977 0.0529585515 0.0020585681 0.0004005845

$kappa

[1] 0.041057554 0.010218757 0.004028687

$sigma_dg

[1] 1.850683e-06 5.515050e-05 4.927546e-05

$Sigma_cov

[1] 0.000446849 0.015596337 0.005928911

$r1

[1] 8.016628e-16

$r2

[1] 0.005309632

$rc

[1] 1.330768e-09

Similarly, we can also estimate the parameter uncertainty by using the bootstrap method:

par_unc_Bts <- par_cov(method=“Bootstrap”, model=“BS”, fact_dep=TRUE,

n_factors = 3, parameters=pe_BSd_3F$fit$par_est, data=mu_bar, t_excl = 4,

BS_s = 5, max_iter = 3, tolerance = 10, wd = 0)

When using the bootstrap in practice, Blackburn & Sherris (Reference Blackburn and Sherris2013) set the number of bootstrap samples BS_s equal to 500, while the default value of the number of residuals to be excluded t_excl = 4 follows from Stoffer & Wall (Reference Stoffer and Wall2009). As for the function affine_fit(), the arguments max_iter and tolerance are usually set to 200 and 0.1, respectively, although the user can set any desired value.

9. Other functions

The function xfilter has the same input structure as mubar_hat and returns a list of conditional mean and covariance of the filtering distribution of the latent variable as obtained from the application of the univariate Kalman–Filter of Koopman & Durbin (Reference Koopman and Durbin2000). More precisely, the matrix X_t returns the value of $\mathbb{E}\left [X\left (t\right ) \mid \bar{\mu }_{1:t}\right ]$ (time-update step), for $t=0,\ldots,K$ while the matrix X_t_c returns the value of $\mathbb{E}\left [X\left (t\right ) \mid \bar{\mu }_{1:t-1}\right ]$ (forecasting step). S_t and S_t_c are the corresponding covariance matrix of $X\left (t\right )$ . For a brief illustration, this can be run as follows:

X_filtered <- xfilter(model=“BS”, fact_dep=TRUE, n_factors = 3,

parameters=pe_BSd_3F$fit$par_est, data=mu_bar)

The function xsmooth implements the Rauch-Tung-Striebel (Rauch et al., Reference Rauch, Tung and Striebel1965) smoothing procedure to obtain the conditional mean and covariance matrix of the distribution of $X\left (t\right )$ conditional to $\bar{\mu }_{1:T}$ , that is, the entire time series of the observations. It uses as input the results from the xfilter function and the value of the parameter $\mathbf{\kappa }$ driving the dynamics of the SDE illustrated in Section 3 under the real-world probability measure.

X_smoothed <- xsmooth(filterobject=X_filtered,

kappa=pe_BSd_3F$fit$par_est$kappa)

10. Conclusion and further developments

This paper describes the AffineMortality R package, which allows the user to estimate, compare, project, and assess the uncertainty of affine mortality models. These models can be analyzed from an age-period as well as from an age-cohort perspective. The package can be used to support researchers to answer a wide range of questions involving stochastic mortality, including pricing of mortality-contingent securities (Xu et al., Reference Xu, Sherris and Ziveyi2020a and Reference Xu, Sherris and Ziveyi2020b), risk management of mortality-contingent products, assessment of the natural hedging of life insurance policies, and life annuities (Blackburn et al., Reference Blackburn, Hanewald, Olivieri and Sherris2017 and Sherris et al., Reference Sherris, Xu and Ziveyi2020), as well as the design of innovative mortality pooling products.

The authors plan to further expand the range of models that can be fitted, such as the Squared Gaussian model used in interest rates modeling (Leippold & Wu, Reference Leippold and Wu2002), and incorporate other features, such as cohort-specific factors and other age-dependent models. Further additions will encompass the possibility to account for incomplete cohort data and the inclusion of models whose mean-reversion parameter is non zero.

Another strand of future developments of AffineMortality includes the possibility of using alternative optimization methods, such as the Subplex algorithm of Rowan (Reference Rowan1990), which is available in R through the package nloptr (Johnson, Reference Johnson2020).

Acknowledgments

The authors are grateful to Dr. Patrick J. Laub for his valuable help and insights in making the package available.

Data availability statement

The data and code supporting the findings of this study are openly available on GitHub at ungolof/AffineMortality. The results contained in the manuscript are reproducible, excluding environment-specific numerical errors. The ungolof/AffineMortality folder is registered with the unique Zenodo DOI https://doi.org/10.5281/zenodo.10828419.

Funding statement

Francesco Ungolo acknowledges financial support from ERGO Center of Excellence in Insurance. Michael Sherris and Yuxin Zhou acknowledge financial support from the Society of Actuaries Center of Actuarial Excellence Research Grant 2017-2020: Longevity Risk: Actuarial and Predictive Models, Retirement Product Innovation, and Risk Management Strategies; and from the Australian Research Council Discovery Grant DP170102275 Retirement Product Innovation. MS, Len Patrick Garces, and YZ also acknowledge support provided by the Australian Research Council Centre of Excellence in Population Ageing Research project number CE170100005.

Appendix

A. Univariate Kalman Filtering

Following the outline in Ungolo et al. (Reference Ungolo, Garces, Sherris and Zhou2023), the $i$ th element of the vector $\bar{\mu }_t$ can be written as:

(A.1) \begin{align} \bar{\mu }_{t,i} = a_i + b_i x_{t,i} + \epsilon _{t,i}, \quad \epsilon _{t,i} \sim N\left (0,\omega _i\right ), \end{align}

Hence, the state equation corresponding to each observation $\bar{\mu }_{t,i}$ is:

(A.2) \begin{align} x_{t+1,1} &= \Phi x_{t,N} + \eta _t, &&\\[5pt] x_{t,i+1} &= x_{t,i}\nonumber \end{align}

for $i=1,\ldots N-1$ and $t=1,\ldots,K$ , given initial state $x_{0,N}=X\left (0\right )$ . Let $\bar{\mu }_{1:t} = \left [\bar{\mu }_{1}, \ldots,\bar{\mu }_{t} \right ]$ and $\bar{\mu }_{t,1:i} = \left [\bar{\mu }_{t,1}, \ldots,\bar{\mu }_{t,i} \right ]$ .

Given initial state $x_{0,N}\;:\!=\;X\left (0\right )$ and initial conditional covariance $\Sigma _{0,N} = \text{diag}\left (10^{-10},\ldots,10^{-10} \right )$ :

  1. 1. Forecasting ( $i=1$ only):

    (A.3) \begin{align} \widehat{x}_{t,1} &=\mathbb{E}\left (x_{t,1} \mid \bar{\mu }_{1:t-1} \right ) = \Phi \widehat{x}_{t-1,N}, \\[5pt] \widehat{\Sigma }_{t,1} &=\mathbb{V}\left (x_{t,1} \mid \bar{\mu }_{1:t-1} \right ) = \Phi \widehat{\Sigma }_{t-1,N} \Phi ^\top + R;\nonumber \end{align}
  2. 2. Time-update ( $i=1,\ldots,N-1$ on the left-hand side):

    (A.4) \begin{align} \widehat{x}_{t,i+1} &= \mathbb{E}\left (x_{t,i+1} \mid \bar{\mu }_{1:t-1}, \bar{\mu }_{t,1:i} \right ) = \widehat{x}_{t,i} + K_{t,i} \nu _{t,i}, \\[5pt] \widehat{\Sigma }_{t,i+1} &=\mathbb{V}\left (x_{t,i+1} \mid \bar{\mu }_{1:t-1}, \bar{\mu }_{t,1:i} \right ) = \widehat{\Sigma }_{t,i} - K_{t,i} F_{t,i} K_{t,i}^\top \nonumber \\[5pt] &= \left (I - K_{t,i} b_i \right )\widehat{\Sigma }_{t,i}\left (I - K_{t,i} b_i \right )^\top + K_{t,i} \omega ^2_i K_{t,i}^\top,\nonumber \end{align}
    where the scalar quantities $\nu _{t,i}$ and $F_{t,i}$ , and the $M \times 1$ -dimensional vector $K_{t,i}$ , are given by
    (A.5) \begin{align} \nu _{t,i} &= \bar{\mu }_{t,i} - a_i - b_{i} \widehat{x}_{t,i}, \\[5pt] F_{t,i} &= b_i \widehat{\Sigma }_{t,i} b_i^\top + \omega ^2_i, \nonumber \\[5pt] K_{t,i} &= \widehat{\Sigma }_{t,i} b_i^\top F^{-1}_{t,i}.\nonumber \end{align}

B. Factor Loading Expressions

B.1 Gompertz-Makeham model

In the independent-factor Gompertz-Makeham model, the factor loadings are given by

\begin{align*} B_1(t,T) & = -\frac{1-e^{-\delta _1 (T-t)}}{\delta _1}\\[5pt] B_{2,x}(t,T) & = - e^{\gamma x} \frac{e^{(\gamma -\delta _2)(T-t)} - 1}{\delta _2 - \gamma }\\[5pt] A_x(t,T) & = -\sigma _1^2 \left [\frac{T-t}{2\delta _1^2} - \frac{1 - e^{-\delta _1(T-t)}}{\delta _1^3} + \frac{1 - e^{-2\delta _1(T-t)}}{4\delta _1^3}\right ]\\[5pt] & \qquad - \frac{\sigma _2^2 e^{2\gamma x (T-t)}}{(\delta _2 - \gamma )^2} \left [\frac{1 - e^{2\gamma (T-t)}}{2\gamma } - 2\frac{1 - e^{(\gamma - \delta _2)(T-t)}}{\delta _2 + \gamma } + \frac{1 - e^{2\delta _2(T-t)}}{2\delta _2}\right ]. \end{align*}

In the dependent factor case, we have

\begin{align*} B_1(t,T) & = (a_1 + a_2) e^{-\delta _1(T-t)} - (a_1 + a_2 e^{-\delta _2 (T-t)})\\[5pt] B_2(t,T) & = -\frac{e^{\gamma x}}{\delta _2}(1 - e^{-\delta _2}(t-T))\\[5pt] A(t,T) & = -\frac{3}{2\delta _2} \left [\frac{\sigma _2^2 e^{2\gamma x}}{2\delta _2^2} + \frac{\sigma _{12}^2 e^{2\gamma x}}{2\delta _2^3}\right ]\\[5pt] & \qquad + (T-t) \left [\frac{\sigma _2^2 e^{2\gamma x}}{2\delta _2^2} + \frac{\sigma _{12}^2 e^{2\gamma x}}{2\delta _2^3} + \frac{a_1 \sigma _1 \sigma _{12} e^{\gamma x}}{2\delta _2} + \frac{a_1 \sigma _1}{2} + \frac{a_1 \sigma _1 \sigma _{12} e^{\gamma x}}{4\delta _2}\right ]\\[5pt] & \qquad - (1 - e^{-\delta _1(T-t)}) \left [\frac{3\sigma _1\sigma _{12} e^{\gamma x}(a_1 + a_2)}{4\delta _1 \delta _2} + \frac{a_1 \sigma _1^2 (a_1 + a_2)}{\delta _1}\right ] \\ & \qquad + (1 - e^{-\delta _2(T-t)}) \left [\frac{(a_1-1) \sigma _1 \sigma _{12} e^{\gamma x}}{2 \delta _2^2} + \frac{a_1 a_2 (\sigma _1^2 + 2)}{2\delta _2} - \frac{\sigma _1 \sigma _{12} e^{\gamma x} (a_1 - a_2)}{4\delta _2^2}\right ]\\[5pt] & \qquad + (1 - e^{-2\delta _1 (T-t)}) \frac{\sigma _1^2 (a_1 + a_2)^2}{4 \delta _1}\\[5pt] & \qquad + (1 - e^{-2\delta _2(T-t)}) \left [\frac{a_2^2}{2\delta _2} - \frac{a_2 \sigma _1 \sigma _{12} e^{\gamma x}}{2\delta _2^2} - \frac{\sigma _1 \sigma _{12} e^{\gamma x}}{4\delta _2^2}\right ]\\[5pt] & \qquad - (1 - e^{-\delta _1 (T-t) - \delta _2 (T-t)}) \left [\frac{2 a_2^2 + a_2^2 \sigma _1^2 + 2 a_1 a_2 + a_1 a_2 \sigma _1}{2(\delta _1 + \delta _2)} + \frac{\sigma _1 \sigma _{12} e^{\gamma x}(a_1 + a_2)}{4 \delta _2(\delta _1 + \delta _2)}\right ]\\[5pt] & \qquad + e^{-2\delta _2(T-t)}(4e^{\delta _2 (T-t)} - 1) \left [\frac{\sigma _2^2 e^{2\gamma x}}{4\delta _2^3} + \frac{\sigma _{12}^2 e^{2\gamma x}}{4\delta _2^4}\right ],\end{align*}

where, for convenience, we define

\begin{equation*}a_1 = \frac {1}{\delta _1} - \frac {\delta _{12} e^{\gamma x}}{\delta _1 \delta _2}, \qquad a_2 = -\frac {\delta _{12} e^{\gamma x}}{\delta _2^2 - \delta _1 \delta _2}.\end{equation*}

B.2 Arbitrage-Free Reduced Nelson-Siegel (AFRNS) model

In the independent factor case, the factor loadings are given by

\begin{align*} B_L(t,T) & = -(T-t)\\[5pt] B_S(t,T) & = -\frac{1 - e^{-\delta (T-t)}}{\delta }\\[5pt] A(t,T) & = \frac{\sigma _1^2}{6}(T-t)^3 + \sigma _2^2 (T-t) \left [\frac{1}{2\delta ^2} - \frac{1 - e^{-\delta (T-t)}}{(T-t)\delta ^3} + \frac{1}{4} \frac{1 - e^{-2\delta (T-t)}}{\delta ^3}\right ] \end{align*}

On the other hand, in the dependent factor case, we have

\begin{align*} B_L(t,T) & = -(T-t)\\[5pt] B_S(t,T) & = -\frac{1 - e^{-\delta (T-t)}}{\delta }\\[5pt] A(t,T) & = \frac{1}{12 \delta ^3} \Big [6 \sigma _L \sigma _{LS} (e^{-\delta (T-t)}-1) + 6 \delta \sigma _L \sigma _{LS} e^{-\delta (T-t)}(T-t)\\[5pt] & \qquad \qquad + 2 \delta ^3 \sigma _L^2 (T-t)^3 + 3 \delta ^3 \sigma _L \sigma _{LS} (T-t)^2\Big ] \\[5pt] & \qquad - \frac{1}{4 \delta ^3} \Big [2 \sigma _L \sigma _{LS} + 3 (\sigma _{LS}^2 + \sigma _S^2) + \sigma _S^2 e^{-2 \delta (T-t)} - 2 (\sigma _L \sigma _{LS} - 2(\sigma _{LS}^2 + \sigma _S^2)) e^{-\delta (T-t)}\\[5pt] & \qquad \qquad - 2 \delta (\sigma _{LS}^2 + \sigma _S^2) (T-t) - \delta ^2 \sigma _L \sigma _{LS} (T-t)^2 - 2 \delta \sigma _L \sigma _{LS} e^{-\delta (T-t)} (T-t)\Big ]. \end{align*}

B.3 Arbitrage-Free Unrestricted Nelson-Siegel (AFUNS) model

In the independent factor case, we have the following expressions for the factor loadings

\begin{align*} B_L(t,T) & = -(T-t)\\[5pt] B_S(t,T) & = - \frac{1 - e^{-\delta _1 (T-t)}}{\delta _1}\\[5pt] B_C(t,T) & = -\delta _2 (T-t) \frac{1 - e^{-\delta _1 (T-t)}}{\delta _1} + \frac{1 - e^{-\delta _3 (T-t)}}{\delta _3 (\delta _1 - \delta _3)}\\[5pt] A(t,T) & = \frac{1}{6} \sigma _1^2 (T-t)^3 + \frac{\sigma _2^2}{2\delta _1^2}(T-t) - \frac{1 - e^{-\delta _1(T-t)}}{\delta _1^3} + \frac{1 - e^{-2 \delta _1 (T-t)}}{4 \delta _1^3}\\[5pt] & \qquad + \frac{T-t}{2} \left (\frac{\sigma _3 \delta _3}{\delta _1 - \delta _3}\right )^2 \Bigg \{ \Bigg [1 - \frac{2(1 - e^{-\delta _1 (T-t)})}{\delta _1 (T-t)} + \frac{1 - e^{-2\delta _1 (T-t)}}{2\delta _1 (T-t)} \Bigg ]\frac{1}{\delta _1^2}\\[5pt] & \hspace{75pt} + \Bigg [1 - \frac{2 (1 - e^{-\delta _3 (T-t)})}{\delta _3 (T-t)} + \frac{1 - e^{-2 \delta _3 (T-t)}}{\delta _3 (T-t)}\Bigg ] \frac{1}{\delta _3^2} \\[5pt] & \hspace{75pt} + 2 \Bigg [1 - \frac{1 - e^{-\delta _1(T-t)}}{\delta _1(T-t)} - \frac{1 - e^{-\delta _3 (T-t)}}{\delta _3 (T-t)} + \frac{1 - e^{-(\delta _1 + \delta _3) (T-t)}}{(\delta _1 + \delta _3) (T-t)}\Bigg ]\frac{1}{\delta _1 \delta _3} \Bigg \} \end{align*}

Meanwhile, the expression for $A(t,T)$ in the dependent factor case is given by

\begin{align*} A(t,T) & = \frac{1}{6}\sigma _L^2 (T-t)^3 + \frac{1}{2}\left [\frac{c_1}{\delta _1} + \frac{c_2 c_3}{\delta _1} + \frac{c_2 c_3}{\delta _3}\right ] (T-t)^2 \\[5pt] & \qquad + \left [\frac{c_4 + c_3 c_5 + 2 c_3}{2\delta _1^2} + \frac{c_3 c_5 + 2c_3 + 2c_3 c_6}{2\delta _1 \delta _3} + \left (\frac{1}{\delta _1} + \frac{1}{\delta _3}\right )\left (\frac{c_3^2 c_6 - c_3 c_5}{2}\right )\right ] (T-t)\\[5pt] & \qquad + \left [\frac{c_1 + 2c_2 c_3}{2 \delta _1^3}\right ] \left (e^{-\delta _1(T-t)} \delta _1 (t-T) - (1 - e^{-\delta _1 (T-t)})\right )\\[5pt] & \qquad + \frac{c_2 c_3}{\delta _3^3} \left (e^{-\delta _3(T-t)} \delta _3 (t-T) - (1 - e^{-\delta _3 (T-t)})\right )\\[5pt] & \qquad + \left [\frac{c_3 c_5 - c_4 + 2c_3 + c_3 c_5}{\delta _1^3} + \frac{c_3 - c_3 c_5 + c_3^2 c_6}{\delta _1^2 \delta _3} - \frac{c_3 c_5}{2\delta _1^2}\right ] \left (e^{-\delta _1 (T-t)} - 1\right )\\[5pt] & \qquad + \left [\frac{2 c_3^2 c_6 - c_3 c_5}{2\delta _3^2} + \frac{c_3 + c_3^2 c_6}{\delta _1 \delta _3^2}\right ] \left (e^{-\delta _3 (T-t)} - 1\right )\\[5pt] & \qquad + \left [\frac{c_3 c_5 - c_4 - 2c_3 + c_3^2 c_6}{4\delta _1^3} - \frac{c_3 c_5}{4 \delta _1^2 \delta _3}\right ] \left (e^{-2\delta _1 (T-t)} - 1\right )\\[5pt] & \qquad - \frac{c_3^2 c_6}{4\delta _3^3} \left (e^{-2\delta _3 (T-t)} - 1\right ) + \frac{c_3 + c_3^2 c_6}{\delta _1 \delta _3 (\delta _1 + \delta _3)} \left (e^{-\delta _1 (T-t) - \delta _3 (t-T)} - 1\right ) \end{align*}

where we have

\begin{align*} c_1 & = \sigma _L \sigma _{LS} & c_4 & = \sigma _S^2 + \sigma _{LS}^2\\[5pt] c_2 & = \sigma _L \sigma _{LC} & c_5 & = \sigma _S \sigma _{SC} + \sigma _{LC} \sigma _{LS}\\[5pt] c_3 & = \frac{\delta _2}{\delta _1 - \delta _3} & c_6 & = \sigma _C^2 + \sigma _{LC}^2 + \sigma _{SC}^2. \end{align*}

Footnotes

1 The required R package devtools should be already installed on the machine.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716723.CrossRefGoogle Scholar
Bacinello, A. R., Biffis, E., & Millossovich, P. (2010). Regression-based algorithms for life insurance contracts with surrender guarantees. Quantitative Finance, 10(9), 10771090.CrossRefGoogle Scholar
Barigou, K., Goffard, P.-O., Loisel, S., & Salhi, Y. (2023). Bayesian model averaging for mortality forecasting using leave-future-out validation. International Journal of Forecasting, 39(2), 674690. https://www.sciencedirect.com/science/article/pii/S0169207022000243 CrossRefGoogle Scholar
Biffis, E. (2005). Affine processes for dynamic mortality and actuarial valuations. Insurance: Mathematics and Economics, 37(3), 443468. https://www.sciencedirect.com/science/article/pii/S0167668705000600 Google Scholar
Biffis, E., Denuit, M., & Devolder, P. (2010). Stochastic mortality under measure changes. Scandinavian Actuarial Journal, 2010(4), 284311.CrossRefGoogle Scholar
Blackburn, C., Hanewald, K., Olivieri, A., & Sherris, M. (2017). Longevity risk management and shareholder value for a life annuity business. ASTIN Bulletin: The Journal of the IAA, 47(1), 4377.CrossRefGoogle Scholar
Blackburn, C., & Sherris, M. (2013). Consistent dynamic affine mortality models for longevity risk applications. Insurance: Mathematics and Economics, 53(1), 6473. http://www.sciencedirect.com/science/article/pii/S0167668713000632 Google Scholar
Brouhns, N., Denuit, M., & Vermunt, J. (2002). A Poisson log-bilinear regression approach to the construction of projected life tables. Insurance: Mathematics and Economics, 31, 373393.Google Scholar
Cairns, A. J., Blake, D., & Dowd, K. (2006a). Pricing death: Frameworks for the valuation and securitization of mortality risk. ASTIN Bulletin, 36(1), 79120.CrossRefGoogle Scholar
Cairns, A. J. G., Blake, D., & Dowd, K. (2006b). A two-factor model for stochastic mortality with parameter uncertainty: Theory and calibration. Journal of Risk and Insurance, 73, 687718.CrossRefGoogle Scholar
Chen, R.-R., & Scott, L. (2003). Multi-factor Cox-Ingersoll-Ross models of the term structure: Estimates and tests from a Kalman filter model. The Journal of Real Estate Finance and Economics, 27(488), 143172.CrossRefGoogle Scholar
Christensen, J. H., Diebold, F. X., & Rudebusch, G. D. (2011). The affine arbitrage-free class of Nelson-Siegel term structure models. Journal of Econometrics, 164(1), 420, Annals Issue on Forecasting. http://www.sciencedirect.com/science/article/pii/S0304407611000388 CrossRefGoogle Scholar
Christensen, J. H. E., Diebold, F. X., & Rudebusch, G. D. (2009). An arbitrage-free generalized Nelson-Siegel term structure model. The Econometrics Journal, 12(3), C33C64. http://www.jstor.org/stable/23116046 CrossRefGoogle Scholar
Cohen, S. N., & Elliott, R. J. (2015). Stochastic calculus and applications (2nd ed.). Springer.CrossRefGoogle Scholar
Cox, J. C., Ingersoll, J. E., & Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53(2), 385407. http://www.jstor.org/stable/1911242 CrossRefGoogle Scholar
Dahl, M. (2004). Stochastic mortality in life insurance: Market reserves and mortality-linked insurance contracts. Insurance: Mathematics and Economics, 35(1), 113136. https://www.sciencedirect.com/science/article/pii/S0167668704000642 Google Scholar
Dethlefsen, C., Lundbye-Christensen, S., & Christensen, A. L. (2022). sspir: State Space Models in R, R Foundation for Statistical Computing. Available at: http://CRAN.R-project.org/package=sspir Google Scholar
Duffee, G. R. (2002). Term premia and interest rate forecasts in affine models. The Journal of Finance, 57(1), 405443. https://onlinelibrary.wiley.com/doi/abs/10.1111/1540-6261.00426 CrossRefGoogle Scholar
Duffie, D., & Kan, R. (1996). A yield-factor model of interest rates. Mathematical Finance, 6(4), 379406. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9965.1996.tb00123.x CrossRefGoogle Scholar
Duffie, D., Pan, J., & Singleton, K. (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6), 13431376.CrossRefGoogle Scholar
Gilbert, P. D. (2009). Brief User’s Guide: Dynamic Systems Estimation, R Foundation for Statistical Computing. Available at: http://CRAN.R-project.org/package=dse Google Scholar
Helske, J. (2017). Kfas: Exponential family state space models in r. Journal of Statistical Software, 78(10), 139. https://www.jstatsoft.org/index.php/jss/article/view/v078i10 CrossRefGoogle Scholar
Huang, Z., Sherris, M., Villegas, A. M., & Ziveyi, J. (2022). Modelling USA age-cohort mortality: A comparison of multi-factor affine mortality models. Risks, 10(9). https://www.mdpi.com/2227-9091/10/9/183 CrossRefGoogle Scholar
Hyndman, R. J., Booth, H., Tickle, L., & Maindonald, J. (2022). demography: Forecasting Mortality, Fertility, Migration and Population Data, R Foundation for Statistical Computing. Available at: https://www.R-project.org/ Google Scholar
Hyndman, R. J., & Shahid Ullah, M. (2007). Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics and Data Analysis, 51(10), 49424956. https://www.sciencedirect.com/science/article/pii/S0167947306002453 CrossRefGoogle Scholar
Jevtić, P., Luciano, E., & Vigna, E. (2013). Mortality surface by means of continuous time cohort models. Insurance: Mathematics and Economics, 53(1), 122133. https://www.sciencedirect.com/science/article/pii/S0167668713000619 Google Scholar
Jevtić, P., & Regis, L. (2019). A continuous-time stochastic model for the mortality surface of multiple populations. Insurance: Mathematics and Economics, 88, 181195. https://www.sciencedirect.com/science/article/pii/S0167668716302906 Google Scholar
Jevtić, P., & Regis, L. (2021). A square-root factor-based multi-population extension of the mortality laws. Mathematics, 9(19). https://www.mdpi.com/2227-7390/9/19/2402 CrossRefGoogle Scholar
Johnson, S. G. (2020). The NLopt nonlinear-optimization package. Available at: http://github.com/stevengj/nlopt Google Scholar
Koopman, S. J., & Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3), 281296. https://onlinelibrary.wiley.com/doi/abs/10.1111/1467-9892.00186 CrossRefGoogle Scholar
Lee, R. D., & Carter, L. R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87(419), 659671. http://www.jstor.org/stable/2290201 Google Scholar
Leippold, M., & Wu, L. (2002). Asset pricing under the quadratic class. The Journal of Financial and Quantitative Analysis, 37(2), 271295. http://www.jstor.org/stable/3595006 CrossRefGoogle Scholar
Luethi, D., Erb, P., & Otziger, S. (2022). FKF: Fast Kalman Filter, R Foundation for Statistical Computing. Available at: http://CRAN.R-project.org/package=FKF Google Scholar
Milevsky, M., & Promislow, S. (2001). Mortality derivatives and the option to annuitize. Insurance: Mathematics and Economics, 29, 299318.Google Scholar
Petris, G. (2010). An r package for dynamic linear models. Journal of Statistical Software, 36(12), 116. https://www.jstatsoft.org/index.php/jss/article/view/v036i12 CrossRefGoogle Scholar
Pitacco, E. (2016). High Age Mortality and Frailty. Some remarks and hints for Actuarial Modeling. Available at CEPAR https://cepar.edu.au/sites/default/files/High_Age_Mortality_and_Frailty.pdf.Google Scholar
Plat, R. (2009). On stochastic mortality modeling. Insurance: Mathematics and Economics, 45(3), 393404. https://www.sciencedirect.com/science/article/pii/S0167668709000973 Google Scholar
Rauch, H. E., Tung, F., & Striebel, C. T. (1965). Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8), 14451450. https://doi.org/10.2514/3.3166 CrossRefGoogle Scholar
Renshaw, A., & Haberman, S. (2006). A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Mathematics and Economics, 38(3), 556570.Google Scholar
Rowan, T. H. (1990). Functional Stability Analysis of Numerical Algorithms, PhD thesis, University of Texas at Austin, USA.Google Scholar
Schrager, D. F. (2006). Affine stochastic mortality. Insurance: Mathematics and Economics, 38(1), 8197.Google Scholar
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461464.CrossRefGoogle Scholar
Sherris, M., Xu, Y., & Ziveyi, J. (2020). Cohort and value-based multi-country longevity risk management. Scandinavian Actuarial Journal, 2020(7), 650676. https://doi.org/10.1080/03461238.2019.1711450 CrossRefGoogle Scholar
Singleton, K. (2006). Empirical dynamic asset pricing: Model specification and econometric assessment. Princeton University Press.CrossRefGoogle Scholar
Stoffer, D., & Wall, K. (2009). Resampling in state space models. In State Space and Unobserved Component Models: Theory and Applications.Google Scholar
Tusell, F. (2011). Kalman filtering in r. Journal of Statistical Software, 39(2), 127. https://www.jstatsoft.org/index.php/jss/article/view/v039i02 CrossRefGoogle Scholar
Ungolo, F., Garces, L. P. D. M., Sherris, M., & Zhou, Y. (2023). Estimation, comparison, and projection of multifactor age–cohort affine mortality models. North American Actuarial Journal, 123. https://doi.org/10.1080/10920277.2023.2238793 CrossRefGoogle Scholar
Villegas, A. M., Kaishev, V. K., & Millossovich, P. (2018). StMoMo: An R package for stochastic mortality modeling. Journal of Statistical Software, 84(3), 138.CrossRefGoogle Scholar
Willets, R. C. (2004). The cohort effect: Insights and explanations. British Actuarial Journal, 10(4), 833877.CrossRefGoogle Scholar
Xu, Y., Sherris, M., & Ziveyi, J. (2020a). Continuous-time multi-cohort mortality modelling with affine processes. Scandinavian Actuarial Journal, 6(6), 526552. https://doi.org/10.1080/03461238.2019.1696223 CrossRefGoogle Scholar
Xu, Y., Sherris, M., & Ziveyi, J. (2020b). Market price of longevity risk for a multi-cohort mortality model with application to longevity bond option pricing. Journal of Risk and Insurance, 87(3), 571595. https://onlinelibrary.wiley.com/doi/abs/10.1111/jori.12273 CrossRefGoogle Scholar
Figure 0

Figure 1 Heatmap of the standardized residuals for the Blackburn-Sherris model with three dependent factors. Source: Ungolo et al. (2023).

Figure 1

Figure 2 1-year ahead projected survival curve for the Blackburn-Sherris model with three dependent factors.