1 Introduction
Mortality rates have been steadily declining in most of the industrialized world throughout the 20th century, and there are no signs of improvements decelerating. On the contrary, populations where mortality rates are already very low still experience rates of improvement of the same, or even higher, magnitude than seen historically.
A great number of models for modeling and projecting mortality rates have been proposed in recent years. However, the model formulated by leecar92 is still the most widely used. The Lee-Carter model describes the age-specific death rates (ASDRs) by the log-bilinear relation

(1) $\log m(t,x) = a_x + b_x k_t$,

where $a_x$ and $b_x$ are age-specific parameters and $k_t$ is a time-varying index. Projections are made by estimating the parameters of the model from observed data and extrapolating the time index by standard time series methods. Typically, $k_t$ is modeled as a random walk with drift. The combination of a linear forecast of $k_t$ and structure (1) implies that ASDRs are forecast to improve by the same age-specific factor each year.
The Lee-Carter model has gained widespread popularity due to its simplicity and ease of interpretation. Essentially, ASDRs are forecast to improve by the average rate of improvement over the estimation period. However, the assumption that rates of improvement are constant over time is not generally satisfied. Indeed, the mortality experience of several countries has shown increasing mortality improvement rates over time, in particular for older age groups, see e.g. leemil01; booetal02; renhab03; bon05. Consequently, Lee-Carter forecasts have in many situations underestimated the gains in old-age mortality.
The left panel of Figure 1 shows ASDRs for US males from 1950 to 2010 based on data from the Human Mortality Database (www.mortality.org). The death rates are plotted on a logarithmic scale, and increasing rates of improvement can be seen in the concavity, rather than linearity, of the age-specific curves. The consequence for Lee-Carter forecasts is exemplified in the right panel of Figure 1, which shows actual and Lee-Carter forecasted period remaining life expectancy of 60-year-old US males.^2 [Footnote 2: The period (remaining) life expectancy is a summary measure of the mortality profile of a population in a given year. It is calculated on the basis of the age-specific mortality rates for that year. Assuming death rates continue to decline, the life expectancy experienced by a given cohort will be higher; this quantity is termed the cohort life expectancy.] Due to increasing rates of improvement, the forecasts increase when a later estimation period is used. Still, all historic forecasts lie below the actual life expectancy experience.
In the light of changing rates of mortality improvement it has been suggested to fit the Lee-Carter model to shorter periods of data for which death rates comply better with the log-linearity assumption, see e.g. leemil01; tuletal00; booetal02. In an environment of increasing rates of improvement the use of shorter, more recent data periods can indeed result in better forecasts. However, future increases in old-age mortality improvement rates, beyond what has been seen historically, cannot be captured by this method.
1.1 Stochastic frailty models
In this paper we investigate the use of frailty theory to address the pattern of changing rates of improvement. Frailty theory rests on the assumption that cohorts are heterogeneous and that some people are more susceptible to death (frail) than others. Frail individuals tend to die sooner than stronger individuals, leading to old cohorts being dominated by low-mortality individuals. In effect, the selection mechanism causes the cohort intensity to "slow down", i.e. to increase less rapidly than the individual intensity. This has been used to explain the logistic form of old-age mortality observed in data, see e.g. tha99.
In the model introduced by vauetal79, frailty enters as an unobservable, positive random quantity acting multiplicatively on an underlying baseline mortality intensity. The observable (aggregate) mortality intensity takes the form

(2) $\bar\mu(t,x) = \bar z(t,x)\,\mu(t,x)$,

where $\bar z(t,x)$ denotes mean frailty at time $t$ for age $x$, and $\mu(t,x)$ is the underlying baseline intensity for an individual with frailty one. We assume without loss of generality that mean frailty at birth is one. For young ages with low mortality, mean frailty is close to one, while for old ages mean frailty gradually decreases as selection takes effect.
Frailty theory offers an explanation of the observed change in improvement rates of old-age mortality, and it suggests that we might expect to see higher rates of improvement in the future. As the probability of attaining a given (old) age increases over time, the selection effect weakens and mean frailty increases towards one. This increase partly offsets the underlying improvements in individual baseline intensity, cf. (2). Initially, when selection is high, old-age mortality rates will improve slowly (or even increase), but over time, as selection weakens, improvements in old-age mortality will gradually increase towards baseline improvement rates.

In this paper we will use structure (2) as the basis for mortality modeling. We will show how a given model for aggregate mortality, e.g. the Lee-Carter model, can be "fragilized" by using it as a model for the underlying individual mortality instead. This procedure enables us to model logistic-type old-age mortality and changing rates of improvement while preserving the structure of the original model and the underlying time-series dynamics. As an example, a "fragilized" Lee-Carter model takes the form

(3) $\bar\mu(t,x) = \bar z(t,x)\exp(a_x + b_x k_t)$.
In Section 4.2 we will compare this model to the ordinary Lee-Carter model.
We show how to perform maximum likelihood estimation and how to forecast in models of form (2). Estimation will be based on a Poisson pseudo-likelihood which allows for general specifications of the frailty distribution, e.g. the generalized stable laws of hou86. We also generalize (2) to include an additive (non-frailty) term and show how models of this form can be estimated by the EM-algorithm. Methods for joint maximum likelihood estimation of the frailty distribution and the baseline intensity are derived based on the (conditional) likelihood for fixed frailty distribution.
In contrast to the typical use of frailty theory in mortality modeling our approach does not rely on a specific parametric model. Indeed, we show how essentially any combination of frailty distribution and parametric or semiparametric model for baseline intensity can be estimated. The only requirement is that the underlying baseline mortality model can be estimated (by maximum likelihood) and that the Laplace transform of the frailty distribution is known.
Frailty theory is well-established in biostatistics and survival analysis, and several monographs are devoted to the topic, e.g. wie10; han11. In demographic and actuarial science frailty models are also known as heterogeneity models. They have been used in mortality modeling to fit the logistic form of old-age mortality, see e.g. wanbro98; buthab04; oli06; caibladow06; spretal13, and to allow for overdispersion in mortality data, cf. lietal09.
The present use of frailty theory is closest in spirit to jarkry11 in which a deterministic frailty model is used to describe the trend in international mortality. However, the methods developed in this paper greatly extend that work by incorporating frailty into any standard stochastic mortality model.
1.2 Outline
The rest of the paper is organized as follows. In Section 2 we give a brief introduction to frailty theory to establish the fundamental relation between cohort and individual mortality on which our approach relies. Section 3 is the main theoretical chapter in which stochastic frailty models are formalized and estimation and forecasting are discussed. This is followed in Section 4 by applications to a parametric time-series model and the Lee-Carter model. Section 5 contains a generalization of stochastic frailty models and explains how they can be estimated by the EM-algorithm. Finally, Section 6 offers some concluding remarks.
2 Frailty theory
In this section we go through the basic theory and establish the relations to be used in the following. For ease of exposition we consider only a single birth cohort in a continuous-time model; later on we will add a time parameter. As our starting point we take the frailty model of vauetal79 in which frailty is modeled as an unobservable, non-negative quantity acting multiplicatively on an underlying baseline (mortality) intensity. The intensity, or death rate, of an individual of age $x$ with frailty $z$ is given by

(4) $\mu(x; z) = z\,\mu(x)$,

where $\mu$ is the baseline intensity describing the age effect. The corresponding survival function, i.e. the probability of the person surviving to age $x$, is

(5) $S(x; z) = \exp(-z M(x))$,

where $M$ denotes the integrated baseline intensity,

(6) $M(x) = \int_0^x \mu(u)\,du$.

The survival function of the cohort, i.e. the expected fraction of the cohort surviving to age $x$, is then

(7) $\bar S(x) = \mathrm{E}[\exp(-Z M(x))] = L(M(x))$,

where $L$ denotes the Laplace transform of the frailty distribution (at birth),

(8) $L(s) = \mathrm{E}[\exp(-sZ)]$.

Assuming sufficient regularity, the (conditional) mean frailty of the cohort at age $x$ is given by

(9) $\bar z(x) = \mathrm{E}[Z \mid X > x] = -L'(M(x))/L(M(x))$,

where $'$ denotes differentiation and $X$ denotes the lifetime of an individual. It follows that the cohort intensity can be written as

(10) $\bar\mu(x) = \bar z(x)\,\mu(x)$.
The derivations above, relating mean frailty and cohort intensity to the frailty distribution and baseline intensity, provide the usual tools for frailty modeling. For later use we make the additional observation that mean frailty can also be expressed in terms of the integrated cohort intensity,

(11) $\bar M(x) = \int_0^x \bar\mu(u)\,du$.

The cohort survival function (7) provides the following link between $\bar M$ and $M$,

(12) $\exp(-\bar M(x)) = L(M(x))$,

from which we obtain $M(x) = L^{-1}(\exp(-\bar M(x)))$, and thereby $L'(M(x)) = L'(L^{-1}(\exp(-\bar M(x))))$. By (9) and (10) we then get the expression

(13) $\bar z(x) = -L'\big(L^{-1}(\exp(-\bar M(x)))\big)\exp(\bar M(x))$.
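As a small numerical check of the idea of expressing mean frailty through the integrated cohort intensity, the Python sketch below (with illustrative parameter values) inverts the Gamma Laplace transform and confirms that, for Gamma frailty with mean one and variance $\sigma^2$, the general expression $-L'(L^{-1}(e^{-\bar M}))e^{\bar M}$ collapses to $\exp(-\sigma^2\bar M(x))$:

```python
import math

def laplace_gamma(s, var):
    """Laplace transform of a Gamma frailty with mean one and variance var."""
    return (1.0 + var * s) ** (-1.0 / var)

def laplace_gamma_deriv(s, var):
    """First derivative of the Gamma Laplace transform."""
    return -(1.0 + var * s) ** (-1.0 / var - 1.0)

def laplace_gamma_inv(p, var):
    """Inverse of the Gamma Laplace transform: solve L(s) = p for s."""
    return (p ** (-var) - 1.0) / var

def mean_frailty_from_Mbar(Mbar, var):
    """Mean frailty among survivors, computed from the integrated cohort
    intensity Mbar as -L'(L^{-1}(exp(-Mbar))) * exp(Mbar)."""
    M = laplace_gamma_inv(math.exp(-Mbar), var)
    return -laplace_gamma_deriv(M, var) * math.exp(Mbar)

# Round trip of the inverse transform
assert abs(laplace_gamma(laplace_gamma_inv(0.7, 0.2), 0.2) - 0.7) < 1e-12

# For Gamma frailty the general expression collapses to exp(-var * Mbar)
for Mbar in (0.1, 0.5, 2.0):
    assert abs(mean_frailty_from_Mbar(Mbar, 0.2) - math.exp(-0.2 * Mbar)) < 1e-12
```

The same pattern works for any frailty family whose Laplace transform and inverse are available.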
The idea later on is to replace $\bar M$ by a nonparametric estimate obtained from data, i.e. the integrated observed (empirical) death rates. This substitution disentangles the frailty distribution and the baseline intensity, which greatly simplifies the estimation procedure.
In order to apply the theory we need to identify suitable frailty distributions and their Laplace transforms. Typically, the baseline intensity contains a scale parameter, in which case it is customary to assume that the frailty distribution has mean one to ensure identifiability. We will also make this assumption here.
2.1 Gamma and inverse Gaussian frailty
The frailty distributions most often employed in mortality modeling are the Gamma and inverse Gaussian distributions, see e.g. vauetal79; hou84; manetal86; buthab04; jarkry11; spretal13. Below we state their Laplace transforms and comment on their use.

When frailty is Gamma-distributed with mean one and variance $\sigma^2$, the Laplace transform and mean frailty are given by

(14) $L(s) = (1+\sigma^2 s)^{-1/\sigma^2}$,
(15) $\bar z(x) = (1+\sigma^2 M(x))^{-1}$.
It is well known that Gamma frailty in combination with a Gompertz or Makeham baseline intensity leads to a cohort intensity of logistic type, see e.g. Example 2.1 of jarkry11 for details.^3 [Footnote 3: A Gompertz intensity is of the form $\mu(x) = \beta\exp(\gamma x)$, while a Makeham intensity is of the form $\mu(x) = \alpha + \beta\exp(\gamma x)$. A Makeham intensity is also sometimes referred to as a Gompertz-Makeham intensity.] This is also known as the Perks model, per32, and it has been found to describe old-age mortality very well, see e.g. thaetal98; tha99; caibladow06. Gamma-distributed frailty is also mathematically tractable and allows explicit calculation of many quantities of interest; e.g. frailty among survivors to a given age is Gamma-distributed with known scale and shape parameters, cf. vauetal79.
When frailty follows an inverse Gaussian distribution with mean one and variance $\sigma^2$, we have for the Laplace transform and mean frailty

(16) $L(s) = \exp\big((1-\sqrt{1+2\sigma^2 s})/\sigma^2\big)$,
(17) $\bar z(x) = (1+2\sigma^2 M(x))^{-1/2}$.
Like Gamma frailty, inverse Gaussian frailty enjoys a number of nice theoretical properties: e.g. the frailty distribution among survivors to a given age is again inverse Gaussian, and an explicit formula for the density exists, see hou84. hou84 further shows that for inverse Gaussian frailty the coefficient of variation among survivors decreases with age, making the cohort more homogeneous as it gets older, while for Gamma frailty the coefficient of variation is constant. Thus the impact of frailty is smaller for inverse Gaussian frailty than for Gamma frailty. From (15) and (17) we also see directly how Gamma frailty "slows down" baseline intensity more than inverse Gaussian frailty.
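To see the difference concretely, a short Python sketch (illustrative values) compares mean frailty among survivors under the two distributions, at the same variance and the same integrated baseline intensity:

```python
def mean_frailty_gamma(M, var):
    # Mean frailty among survivors under Gamma frailty (mean one, variance var)
    return 1.0 / (1.0 + var * M)

def mean_frailty_invgauss(M, var):
    # Mean frailty among survivors under inverse Gaussian frailty
    return (1.0 + 2.0 * var * M) ** -0.5

# Gamma selection is stronger: mean frailty drops further below one
for M in (0.5, 1.0, 3.0):
    g = mean_frailty_gamma(M, 0.25)
    ig = mean_frailty_invgauss(M, 0.25)
    assert g < ig < 1.0
```

The inequality $(1+\sigma^2 M)^{-1} < (1+2\sigma^2 M)^{-1/2}$ holds for all $M > 0$, which is the "slows down more" statement in quantitative form.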
It is generally found that Gamma frailty and the associated logistic form provide a better description of old-age mortality than inverse Gaussian frailty, see e.g. buthab04; spretal13. Furthermore, abbber07 show that for a large class of initial frailty distributions the frailty distribution among survivors converges to a Gamma distribution as the integrated intensity tends to infinity. Thus, overall, the Gamma distribution is a good default choice.
That being said, it is not immediately clear that this conclusion carries over to the case where frailty is introduced to model increasing rates of improvement over time.
2.2 Positive stable frailty
hou86 introduced a family of generalized stable laws which includes the Gamma and inverse Gaussian distributions as special cases. The family is obtained by exponential tilting of stable densities with index $\alpha \in (0,1)$. The stable laws themselves have moments only of order strictly less than $\alpha$, while moments of all orders exist for the exponentially tilted densities. From the original three-parameter family we obtain a two-parameter family by imposing the condition that mean frailty is one.

When $Z$ follows a generalized stable law with index $\alpha$, mean one and variance $\sigma^2$, the Laplace transform and mean frailty are given by^4 [Footnote 4: The stated formulas are obtained from hou86 using the parametrization $\theta = (1-\alpha)/\sigma^2$ and $\delta = \theta^{1-\alpha}$.]

(18) $L(s) = \exp\Big(-\frac{1-\alpha}{\alpha\sigma^2}\big[(1+\frac{\sigma^2 s}{1-\alpha})^{\alpha} - 1\big]\Big)$,
(19) $\bar z(x) = \big(1+\frac{\sigma^2 M(x)}{1-\alpha}\big)^{\alpha-1}$.
We note that for $\alpha = 0$ (defined by continuity) the generalized stable law specializes to the Gamma distribution, while for $\alpha = 1/2$ it specializes to the inverse Gaussian distribution. While Gamma and inverse Gaussian densities are available in closed form, in general only series representations of (tilted) stable densities exist, cf. hou86 and Lemma XVII.6.1 of felII.
In either case, we have simple closed-form expressions for mean frailty in terms of the integrated baseline and cohort intensities, which is all we need for estimation and forecasting purposes. Moreover, the fact that the parametric family spans the two most commonly used frailty distributions, Gamma and inverse Gaussian, opens up a unified approach to joint estimation of frailty and baseline parameters.
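A quick numerical sanity check of these special cases, using one standardization of the tilted stable Laplace transform with mean one and variance $\sigma^2$ (the exact parametrization is an assumption here): the transform approaches the Gamma transform as $\alpha \to 0$ and matches the inverse Gaussian transform at $\alpha = 1/2$.

```python
import math

def laplace_stable(s, alpha, var):
    """Laplace transform of the exponentially tilted stable family,
    standardized to mean one and variance var (parametrization assumed)."""
    c = (1.0 - alpha) / (alpha * var)
    return math.exp(-c * ((1.0 + var * s / (1.0 - alpha)) ** alpha - 1.0))

def laplace_gamma(s, var):
    return (1.0 + var * s) ** (-1.0 / var)

def laplace_invgauss(s, var):
    return math.exp((1.0 - math.sqrt(1.0 + 2.0 * var * s)) / var)

# alpha -> 0 recovers the Gamma transform; alpha = 1/2 the inverse Gaussian one
s, var = 1.5, 0.3
assert abs(laplace_stable(s, 1e-8, var) - laplace_gamma(s, var)) < 1e-6
assert abs(laplace_stable(s, 0.5, var) - laplace_invgauss(s, var)) < 1e-12
```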
The family has been further generalized by aal88; aal92 to include also negative $\alpha$. Negative values of $\alpha$ correspond to compound Poisson distributions with positive probability of zero frailty. This is important for some applications when modeling the time to non-certain events, e.g. divorce or unemployment. The generalization is of no use in our context, where it would imply immortality for some individuals. However, it could be useful for cause-specific mortality modeling, as demonstrated by aal88.

3 Stochastic frailty models
We first include frailty in a continuous-time model spanning multiple birth cohorts and show the effect on the dynamics of age-specific death rates over time. Then we add frailty to stochastic mortality models and discuss estimation and forecasting of the resulting models.
Again, we start with the multiplicative frailty model. With a slight abuse of notation we will reuse the notation of Section 2 with an added time parameter. Assume that the intensity for an individual of age $x$ at time $t$ with frailty $z$ has the form

(20) $\mu(t,x; z) = z\,\mu(t,x)$,

where $\mu(t,x)$ is a baseline intensity describing the period (time) and age effects. The cohort intensity is then given by

(21) $\bar\mu(t,x) = \bar z(t,x)\,\mu(t,x)$,

where $\bar z(t,x)$ denotes the (conditional) mean frailty of the cohort of age $x$ at time $t$, i.e. the cohort born at time $t-x$.
Assume that all cohorts have the same frailty distribution at birth, and denote the Laplace transform of this common distribution by $L$. Define the integrated baseline and cohort intensities as

(22) $H(t,x) = \int_0^x \mu(t-x+u, u)\,du$,
(23) $\bar H(t,x) = \int_0^x \bar\mu(t-x+u, u)\,du$.

That is, we follow a specific birth cohort over time and age. We have as before the relation $\exp(-\bar H(t,x)) = L(H(t,x))$ and

(24) $\bar z(t,x) = -L'\big(L^{-1}(\exp(-\bar H(t,x)))\big)\exp(\bar H(t,x))$.
Following the notation of bon05, we define the rate of improvement in (senescent) mortality as $\bar\rho(t,x) = -\partial\log\bar\mu(t,x)/\partial t$. It follows from (21) that

(25) $\bar\rho(t,x) = \rho(t,x) - \partial\log\bar z(t,x)/\partial t$,

where $\rho(t,x) = -\partial\log\mu(t,x)/\partial t$ denotes the rate of improvement of the baseline intensity.
Suppose we model the period effect of mortality improvements by decreasing age-specific baseline mortality, i.e. $\rho(t,x) > 0$. For fixed $x$, the integrated baseline intensity will then also be decreasing over time, while the mean frailty will be increasing over time due to less and less selection. From (25) we see that this implies that the rate of improvement of cohort mortality will be smaller than the rate of improvement at the individual level.
At old ages, where death rates, and thereby selection, are high, the change in mean frailty over time can substantially offset improvements in baseline mortality, causing rates of improvement of cohort mortality to be close to zero. As improvements continue to occur at the individual level, the selection effect gradually disappears and rates of improvement of cohort mortality approach baseline rates of improvement. The resulting pattern of gradually changing rates of improvement of old-age mortality resembles what is seen in data. This will be illustrated in the applications in Section 4.
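To make the mechanism concrete, the following toy computation (in Python, with made-up parameter values) combines a Gompertz baseline improving at a constant rate rho with Gamma frailty and evaluates annual cohort improvement rates at a fixed old age: they start well below rho and approach it as selection wears off.

```python
import math

# Illustrative parameters (assumed, not fitted): individual mortality is
# mu(t, u) = exp(-rho * t) * beta * exp(gamma * u), with Gamma frailty.
rho, beta, gamma, var = 0.02, 5e-5, 0.1, 0.2
x = 90  # fixed old age

def H(t):
    # Integrated baseline intensity of the cohort aged x at time t:
    # H(t, x) = beta * exp(-rho*(t - x)) * (exp((gamma - rho)*x) - 1)/(gamma - rho)
    return beta * math.exp(-rho * (t - x)) * \
        (math.exp((gamma - rho) * x) - 1.0) / (gamma - rho)

def cohort_rate(t):
    # Cohort rate = mean frailty * baseline rate; Gamma: zbar = (1 + var*H)^(-1)
    return beta * math.exp(-rho * t + gamma * x) / (1.0 + var * H(t))

# Annual cohort improvement rates stay below rho and increase towards it
rates = [-math.log(cohort_rate(t + 1) / cohort_rate(t)) for t in (0, 100, 400)]
assert rates[0] < rates[1] < rates[2] < rho
```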
3.1 Data and terminology
Data are assumed to be of the form of death counts, $D(t,x)$, and corresponding exposures, $E(t,x)$, for a range of years, $t$, and ages, $x$. Data may or may not be gender-specific, but this will not be part of the notation. $D(t,x)$ denotes the number of deaths occurring in calendar year $t$ among people aged $x$. Correspondingly, $E(t,x)$ denotes the total number of years lived during calendar year $t$ by people of age $x$. For readers familiar with the Lexis diagram, $D(t,x)$ counts the number of deaths in the square $[t,t+1)\times[x,x+1)$ of the Lexis diagram and $E(t,x)$ gives the corresponding exposure, i.e. we work with so-called A-groups.
From the death counts and exposures we form the observed (empirical) death rates

(26) $m(t,x) = D(t,x)/E(t,x)$.

The death rate $m(t,x)$ is an estimate of the cohort intensity, $\bar\mu(t,x)$, which for modeling purposes is assumed constant over the square $[t,t+1)\times[x,x+1)$ in the following.
3.2 Fragilization of mortality models
Consider as given a stochastic model for baseline mortality, $\mu(t,x)$. For ease of presentation we will assume that it is of the form

(27) $\mu(t,x) = g(\theta_t, \kappa_x)$,

where $\theta_t$ and $\kappa_x$ are, possibly multidimensional, time- and age-dependent quantities and $g$ describes the functional dependence. We could also have included cohort effects and more general dependence structures if so desired.
The form (27) is chosen with two applications in mind: with $\kappa_x = (a_x, b_x)$, $\theta_t = k_t$ and $g(\theta_t, \kappa_x) = \exp(a_x + b_x k_t)$ we get the Lee-Carter model; with no age-dependent parameters we get parametric time-series models where the functional form of the age profile is determined by $g$. In the parametric case we could use for instance a Gompertz law, $g(\theta_t, x) = \exp(\theta_t^{(1)} + \theta_t^{(2)} x)$, or a logistic function. The latter is the model considered by caibladow06 for aggregate mortality. As mentioned in Section 2.1, the logistic model already has a frailty interpretation, and it is therefore not an obvious candidate for further fragilization; but it can nevertheless be done.
For presentational convenience we assume that both $\theta$ and $\kappa$ need to be estimated from data, although, as just seen, $\kappa$ may in fact be fixed. Once estimated, we can forecast $\theta_t$ by ARIMA time-series methods and thereby obtain forecasts of $\mu(t,x)$ by inserting the forecasted values of $\theta_t$ together with the estimated values of $\kappa_x$ into (27).
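For concreteness, the two specializations just discussed can be sketched as simple rate functions (the names and parameter values below are illustrative only):

```python
import math

def lee_carter(k_t, a_x, b_x):
    # Lee-Carter baseline: log-bilinear in the age parameters and time index
    return math.exp(a_x + b_x * k_t)

def gompertz(level_t, slope_t, x):
    # Parametric Gompertz age profile with time-varying level and slope
    return math.exp(level_t + slope_t * x)

assert abs(lee_carter(1.0, -4.0, 0.1) - math.exp(-3.9)) < 1e-12
assert gompertz(-10.0, 0.1, 70) > gompertz(-10.0, 0.1, 60)  # rates rise with age
```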
In order to proceed we assume that we have available a procedure for maximum likelihood estimation of $(\theta, \kappa)$ in the model where death counts are independent with

(28) $D(t,x) \sim \mathrm{Poisson}(\mu(t,x)E(t,x))$.

In the Lee-Carter case such a procedure is described in broetal02.
With these building blocks in place, we now consider the following "fragilized" version of the baseline model, in which death counts are independent with

(29) $D(t,x) \sim \mathrm{Poisson}(\bar\mu(t,x)E(t,x))$,
(30) $\bar\mu(t,x) = \bar z(t,x)\,\mu(t,x)$,

where $\mu(t,x)$ is given by (27) and $\bar z(t,x)$ denotes the conditional mean frailty of the cohort of age $x$ at time $t$. It is assumed that the frailty distribution at birth is the same for all cohorts and that the distribution belongs to a family indexed by $\phi$. The parameters of the model are thus $(\theta, \kappa, \phi)$. The Laplace transform of the frailty distribution with index $\phi$ is denoted $L_\phi$, and this is assumed available in explicit form. Further, as a matter of convention we assume that mean frailty is one at birth, i.e. $L'_\phi(0) = -1$.
Based on (22) and (24) we can write

(31) $\bar z(t,x) = -L'_\phi(H(t,x))/L_\phi(H(t,x))$,
(32) $H(t,x) = \int_0^x g(\theta_{t-x+u}, \kappa_u)\,du$,

and insert this into (29). In principle, we can estimate all parameters jointly from the resulting likelihood function. However, the likelihood function is intractable, with frailty and baseline parameters occurring in a complex mix. Consequently, estimation has to be handled on a case-by-case basis depending on the choice of frailty and baseline model. Below we propose an alternative, generally applicable pseudo-likelihood approach which greatly simplifies the estimation task.
3.3 Pseudo-likelihood function
From (24) we know that $\bar z(t,x) = -L'_\phi\big(L_\phi^{-1}(\exp(-\bar H(t,x)))\big)\exp(\bar H(t,x))$, where $\bar H$ is the integrated cohort intensity (23). At first sight this does not seem to help much, since $\bar H$ is at least as complicated as $H$. However, in contrast to $H$, we can obtain a (model-free) estimate of $\bar H$ directly from data as

(33) $\tilde H(t,x) = \sum_{u=0}^{x-1} m(t-x+u, u)$.
We thus propose to base estimation of (29)–(30) on a likelihood function in which the term $\bar H(t,x)$ is replaced by $\tilde H(t,x)$. The resulting approximate likelihood function is referred to as the pseudo-likelihood function,^5 [Footnote 5: The idea of basing inference on a pseudo-likelihood function was first introduced in spatial statistics, see bes75.]

(34) $\tilde L(\theta,\kappa,\phi) = \prod_{t,x} \exp(-\bar\mu(t,x)E(t,x))\,(\bar\mu(t,x)E(t,x))^{D(t,x)}/D(t,x)!$,
(35) $\bar\mu(t,x) = \bar z_\phi(t,x)\,g(\theta_t,\kappa_x)$.
Formally, it corresponds to estimating the modified model

(36) $D(t,x) \sim \mathrm{Poisson}(\bar z_\phi(t,x)\,g(\theta_t,\kappa_x)E(t,x))$,
(37) $\bar z_\phi(t,x) = -L'_\phi\big(L_\phi^{-1}(\exp(-\tilde H(t,x)))\big)\exp(\tilde H(t,x))$.

Model (36)–(37) is separable in frailty and baseline parameters, and it is therefore considerably easier to handle than (29)–(32).^6 [Footnote 6: Equation (37) might still look daunting, but it simplifies in specific cases. If frailty is assumed Gamma-distributed with variance $\sigma^2$ (and mean one) we get $\bar z_\phi(t,x) = \exp(-\sigma^2\tilde H(t,x))$, while inverse Gaussian frailty with variance $\sigma^2$ (and mean one) yields $\bar z_\phi(t,x) = (1+\sigma^2\tilde H(t,x))^{-1}$.] The separability allows for simple joint estimation based on the estimation procedure for the baseline model.
One issue remains before we turn to estimation in Section 3.4. To form the (pseudo-)likelihood function we need to compute $\tilde H(t,x)$ for all $t$ and $x$ in the data window. However, these quantities depend in part on data outside the data window. In principle, we need to know the death rates from birth to the present/maximum age for all cohorts entering the estimation. The gray area of Figure 2 illustrates the "missing" death rates.
For the purpose of calculating $\tilde H$, we suggest defining the death rates before and below the data window by

(38) $m(t,x) = m(t_0,x)$ for $t < t_0$, and $m(t,x) = 0$ for $x < x_0$,

where $t_0$ and $x_0$ denote the first year and first age of the data window. This corresponds to saying that selection prior to $t_0$ has happened according to initial rates (rather than actual rates), and that all cohorts have mean frailty one at age $x_0$ (rather than at birth). The latter assumption was also used by Vaupel in the contributed discussion part of tha99. With this extension of the death rates, $\tilde H$ and hence the likelihood function can be computed.
We note that in some situations, e.g. for many national data sets, we may have more data than we choose to use for estimation. In these situations some or all of the "missing" death rates may in fact be available and could hence be used when calculating $\tilde H$. In the present paper, however, we will calculate $\tilde H$ using the extension (38) even in this case. Assuming no selection prior to $t_0$ also makes forecasting more straightforward.
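A minimal Python sketch of the cumulation of observed rates along a cohort diagonal, with rates before the first year frozen at first-year values and rates below the first age set to zero (the exact summation convention, over completed ages, is an assumption here):

```python
def Htilde(m, t0, x0, t, x):
    """Integrated cohort intensity estimate for the cohort aged x in year t.
    m: dict year -> list of rates by age; t0, x0: first year/age of the window.
    Rates for years before t0 fall back to the t0 rates; ages below x0
    contribute zero (mean frailty one at age x0)."""
    total = 0.0
    for u in range(x0, x):       # completed ages x0, ..., x-1
        year = t - x + u         # calendar year in which the cohort was aged u
        total += m[max(year, t0)][u]
    return total

# Tiny made-up example: years 2000-2002 (t0 = 2000), ages 0-3 (x0 = 0)
m = {2000: [0.01, 0.02, 0.03, 0.05],
     2001: [0.01, 0.02, 0.03, 0.05],
     2002: [0.01, 0.02, 0.03, 0.05]}
# The cohort aged 3 in 2001 was aged 0 in 1998; pre-window years use 2000 rates
assert abs(Htilde(m, 2000, 0, 2001, 3) - (0.01 + 0.02 + 0.03)) < 1e-12
```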
3.4 Maximum likelihood estimation
We here discuss various ways to obtain maximum likelihood estimates in model (36)–(37) of Section 3.3. For computational efficiency and numerical stability one often considers the log-likelihood function (rather than the likelihood function),

(39) $\ell(\theta,\kappa,\phi) = \sum_{t,x}\big[D(t,x)\log(\bar z_\phi(t,x)g(\theta_t,\kappa_x)) - \bar z_\phi(t,x)g(\theta_t,\kappa_x)E(t,x)\big] + \text{constant}$,

where $\bar z_\phi(t,x)$ is given by (37) and the constant depends on data only. The simplest and most general method to optimize (39) is to form the profile log-likelihood function

(40) $\ell_p(\phi) = \ell(\hat\theta(\phi), \hat\kappa(\phi), \phi)$,

where $\hat\theta(\phi)$ and $\hat\kappa(\phi)$ denote the maximum likelihood estimates of $\theta$ and $\kappa$ for fixed frailty parameter $\phi$. Assuming that we know how to perform maximum likelihood estimation of the baseline model, $\hat\theta(\phi)$ and $\hat\kappa(\phi)$ can be obtained by estimating the baseline model with death counts $D(t,x)$ and exposures $\bar z_\phi(t,x)E(t,x)$. Since the frailty family is typically of low dimension, e.g. one or two, the profile log-likelihood function can normally be optimized reliably by general-purpose optimization routines, e.g. the optimize or optim routines in the free statistical software package R.
For a one-dimensional frailty family, e.g. Gamma with mean one and variance $\sigma^2$, we could use the following skeleton R code:

  # model is a list with functions for calculating
  # Htilde, mean frailty and MLE of the baseline model
  frail.optim <- function(D, E, model) {
    Htilde <- model$Htilde(D, E)
    profile.loglike <- function(phi) {
      EZ <- model$meanfrail(phi, Htilde)
      ret <- model$MLEbase(D, EZ * E)
      ret$value  # value of maximized log-likelihood
    }
    optret <- optimize(profile.loglike, interval = c(0, 2), maximum = TRUE)
    MLEphi <- optret$maximum
    EZ <- model$meanfrail(MLEphi, Htilde)
    ret <- model$MLEbase(D, EZ * E)
    MLEbase <- ret$par  # MLE of baseline parameters
    list(phi = MLEphi, base = MLEbase)
  }
Alternatively, maximum likelihood estimation can also be implemented by a switching algorithm. Starting from an initial value, $\phi^{(0)}$, of the frailty parameter and $i = 0$, the algorithm consists in repeating the following two steps until convergence; on each iteration $i$ is increased by one.

1. Calculate $(\theta^{(i+1)}, \kappa^{(i+1)})$ as the maximum likelihood estimate with $\phi^{(i)}$ fixed.

2. Calculate $\phi^{(i+1)}$ as the maximum likelihood estimate with $(\theta^{(i+1)}, \kappa^{(i+1)})$ fixed.

The estimation in the first step is, as before, performed by estimating the baseline model with death counts $D(t,x)$ and exposures $\bar z_{\phi^{(i)}}(t,x)E(t,x)$. The estimation in the second step is performed by maximizing $\ell(\theta^{(i+1)}, \kappa^{(i+1)}, \phi)$ as a function of $\phi$. With the dimension of $\phi$ typically low, this is easy to do by either Newton-Raphson, steepest ascent or, in the one-dimensional case, even a simple bisection algorithm.
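The switching scheme can be sketched on a toy model (in Python; constant baseline intensity, Gamma frailty, and made-up data) where both steps have simple forms: the baseline step is a closed-form Poisson MLE with adjusted exposure, and the frailty step is a one-dimensional search.

```python
import math

# Toy setup (illustrative, not the paper's data): Poisson mean for cell i is
# theta * exp(-phi * Htilde[i]) * E[i]; data are set to expected counts so the
# likelihood is maximized exactly at (theta_true, phi_true).
Htilde = [0.2, 0.8, 1.5, 3.0]
E = [1000.0, 800.0, 500.0, 200.0]
theta_true, phi_true = 0.05, 0.3
D = [theta_true * math.exp(-phi_true * h) * e for h, e in zip(Htilde, E)]

def loglike(theta, phi):
    return sum(d * math.log(theta * math.exp(-phi * h) * e)
               - theta * math.exp(-phi * h) * e
               for d, h, e in zip(D, Htilde, E))

def mle_theta(phi):
    # Step 1: baseline MLE for fixed phi (Poisson MLE with adjusted exposure)
    return sum(D) / sum(math.exp(-phi * h) * e for h, e in zip(Htilde, E))

def mle_phi(theta):
    # Step 2: maximize over phi; the log-likelihood is concave in phi here,
    # so a simple ternary search on [0, 2] suffices
    lo, hi = 0.0, 2.0
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if loglike(theta, m1) < loglike(theta, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

theta, phi = mle_theta(0.0), 0.0
for _ in range(50):  # switch until convergence
    phi = mle_phi(theta)
    theta = mle_theta(phi)

assert abs(theta - theta_true) < 1e-4 and abs(phi - phi_true) < 1e-3
```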
For the case of Gamma frailty with mean one and variance $\phi = \sigma^2$, we have for $\ell$ and its first two partial derivatives with respect to $\phi$

$\ell(\theta,\kappa,\phi) = \sum_{t,x}\big[D(\log g - \phi\tilde H) - e^{-\phi\tilde H}gE\big] + \text{constant}$,
$\partial\ell/\partial\phi = \sum_{t,x}\tilde H\big(e^{-\phi\tilde H}gE - D\big)$,
$\partial^2\ell/\partial\phi^2 = -\sum_{t,x}\tilde H^2 e^{-\phi\tilde H}gE$,

where we have suppressed the dependence on $t$ and $x$ of $D$, $E$, $g$ and $\tilde H$ to simplify notation. The calculations show that $\ell$ is strictly concave as a function of $\phi$, and thereby that a local maximum is also a global maximum. Hence any gradient method is guaranteed to converge to the global maximum. Note, however, that the maximum may not be interior, and therefore we do not necessarily have $\partial\ell/\partial\phi = 0$ at the maximum.
The switching algorithm will always converge to a (local) maximum. However, many iterations may be required, in particular if the likelihood has a "diagonal" ridge in the frailty and baseline parameters, i.e. if the frailty and baseline parameters to some extent describe the same feature of the data. The advantage of the switching algorithm is that it allows a more detailed analysis which can guide the choice of optimization routine, as in the Gamma example above.
We finally note that estimation can also be carried out by Newton-Raphson sweeps over frailty and baseline parameters, similar to the algorithm described in broetal02 for the Lee-Carter model. This may be more efficient in terms of computing time, but it requires a substantial amount of tailor-made code for each combination of frailty and baseline model.
Overall, we find that optimization of the profile loglikelihood function is the method of choice. It is particularly wellsuited in the exploratory phase due to its flexibility and ease of implementation. In our experience it is also sufficiently fast and robust to be of practical use.
3.5 Forecasting
We now assume that we have (maximum likelihood) estimates of $(\theta, \kappa, \phi)$, and we denote these by $(\hat\theta, \hat\kappa, \hat\phi)$. We will follow the usual approach in stochastic mortality modeling and forecast mortality based on a time-series model for the estimated $\hat\theta_t$'s. The time-series model is estimated treating these as observed, rather than estimated, quantities. Typically a simple (multidimensional) random walk with drift is used, see e.g. leecar92 and caibladow06, but models with more structure can also be used.
Assume that we have a forecast of $\theta_t$ for a given forecast horizon. The forecast can be either a deterministic (mean) forecast or a stochastic realization from the time-series model. The aim is to forecast baseline and cohort mortality for the ages of the data window; the forecast region is illustrated as the shaded box to the right in Figure 2.
We first note that the estimated mean frailties can be written

(41) $\bar z_{\hat\phi}(t,x) = -L'_{\hat\phi}\big(L_{\hat\phi}^{-1}(\exp(-\tilde H(t,x)))\big)\exp(\tilde H(t,x))$,

where $\tilde H(t,x)$ is given by (33). Expressing mean frailty in terms of $\tilde H$ provides the necessary link for forecasting mean frailty, and thereby cohort mortality.
Baseline mortality is readily forecast by inserting $\hat\theta_t$ and $\hat\kappa_x$ into (27),

(42) $\mu(t,x) = g(\hat\theta_t, \hat\kappa_x)$,

while cohort mortality is forecast by

(43) $\bar\mu(t,x) = \bar z_{\hat\phi}(t,x)\,g(\hat\theta_t, \hat\kappa_x)$,

where $\tilde H(t,x)$ in the forecast region is given by the recursion

(44) $\tilde H(t,x) = \tilde H(t-1,x-1) + \bar z_{\hat\phi}(t-1,x-1)\,g(\hat\theta_{t-1}, \hat\kappa_{x-1})$.

Note that in the data window $\tilde H$ is defined by transformation of the observed death rates to ensure consistency with the estimated model, while in the forecast region it is defined recursively in terms of the forecasted baseline mortality.
For Gamma frailty with mean one and estimated variance $\hat\sigma^2$ we will forecast cohort mortality by

(45) $\bar\mu(t,x) = \exp(-\hat\sigma^2\tilde H(t,x))\,g(\hat\theta_t, \hat\kappa_x)$,

where $\bar z(t,x)$ in the data window is given by $\exp(-\hat\sigma^2\tilde H(t,x))$. Inverse Gaussian frailty with mean one and estimated variance $\hat\sigma^2$ yields the cohort mortality forecast

(46) $\bar\mu(t,x) = (1+\hat\sigma^2\tilde H(t,x))^{-1}\,g(\hat\theta_t, \hat\kappa_x)$,

where $\bar z(t,x)$ in the data window is given by $(1+\hat\sigma^2\tilde H(t,x))^{-1}$.
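A Python sketch of the resulting Gamma-frailty forecast for a single cohort, carrying the integrated cohort intensity forward along the diagonal by adding the previous year's cohort rate (one natural discretization of the recursion, assumed here):

```python
import math

def forecast_cohort(baseline, Htilde0, var):
    """baseline: forecasted baseline rates for one cohort in successive years;
    Htilde0: integrated cohort intensity at the forecast jump-off;
    var: estimated Gamma frailty variance (mean frailty exp(-var * Htilde))."""
    H, out = Htilde0, []
    for mu in baseline:
        mu_bar = math.exp(-var * H) * mu   # cohort rate = mean frailty * baseline
        out.append(mu_bar)
        H += mu_bar                        # carry Htilde forward along the diagonal
    return out

rates = forecast_cohort([0.10, 0.09, 0.08], Htilde0=2.0, var=0.25)
# Selection keeps every cohort rate below the corresponding baseline rate
assert all(r < m for r, m in zip(rates, [0.10, 0.09, 0.08]))
assert rates[0] == math.exp(-0.25 * 2.0) * 0.10
```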
4 Applications to US male mortality
Broadly speaking, stochastic mortality models fall into two main categories: parametric time-series models and semiparametric models of Lee-Carter type. We consider two applications, one for each of these model classes. The applications are illustrative examples highlighting different aspects of the theory rather than full-blown statistical analyses.
In the first application we take a standard Makeham mortality intensity and show how the fit can be improved for different age groups by adding frailty. To give a flavor of the possibilities we consider models of the generalized, additive form presented in Section 5 in combination with the two-dimensional family of stable laws of Section 2.2. In the second application we add Gamma frailty to the Lee-Carter model and illustrate the effect on the forecasts.
Both applications use US male mortality data from 1950–2010 available at the Human Mortality Database (www.mortality.org). The first application compares a forecast based on the first half of the data with the actual mortality experience. The second application uses the second half of the data and makes a forecast into the future.
4.1 Parametric time-series application
The classical Makeham, or Gompertz-Makeham, mortality law states that age-specific mortality can be described by the functional form

(47) $\mu(x) = \alpha + \beta\exp(\gamma x)$,

where $x$ denotes age. The additive structure suggests an interpretation of mortality as composed of an age-dependent (senescent) part describing mortality as people grow older, and a part describing background mortality, e.g. accidents, which occurs independently of age. Despite its simplicity, the Makeham intensity provides a reasonable description for a wide range of adult ages. However, for the very old, and also for younger adults, the description is not adequate. The right panel of Figure 3 shows observed death rates in 1980 for US males aged 20 to 100 with a fitted Makeham curve superimposed (and two other curves which will be explained below). It is seen that the Makeham curve overstates old-age mortality and that the curvature is wrong for the younger ages.
For old-age mortality, the lack of fit of the Makeham law is well known, and logistic-type models have been proposed instead, see e.g. thaetal98. In caibladow06 a logit model for death probabilities (rather than rates) is suggested and applied to England & Wales males above age 60, for which it fits well. However, caietal09 report potential problems in fitting US male data with the pure logit model, and a variant is considered.

4.1.1 Stochastic frailty model
As an alternative to the logit model and its variants, we here consider the stochastic frailty model

(48) $\mu(t,x; z) = z\exp(g_t + s_t x) + c_t$,
(49) $\bar\mu(t,x) = \bar z(t,x)\exp(g_t + s_t x) + c_t$,

where the distribution of $Z$ belongs to the positive stable family with index $\alpha$ and variance $\sigma^2$ described in Section 2.2. The model is of the additive form introduced in Section 5, with frailty acting only on the senescent part of mortality. With this structure we keep the interpretation of the second term as background mortality, to which everyone is equally susceptible.
We recall from Section 2.2 that the stable family includes both Gamma ($\alpha = 0$) and inverse Gaussian ($\alpha = 1/2$) frailty as special cases. Gamma frailty leads to a logistic-type model, akin to the one used in tha99, while values of $\alpha$ greater than zero yield old-age mortality "in between" logistic and exponential. The Makeham model with time-varying parameters is also included and corresponds to $\sigma^2 = 0$ (for all values of $\alpha$).
We apply the model to US male mortality data for the period 1950–1980 and ages 20–100. The left panel of Figure 3 shows a contour plot of the profile log-likelihood function,
(50) 
where is the pseudo-likelihood function of Section 5.2 and and denote the maximum likelihood estimates for a fixed value of the frailty parameters. The profile log-likelihood function is calculated by the EM-algorithm of Section 5.3.
Interestingly, the profile log-likelihood is maximized for . This implies that the best fit is obtained for a model which is quite far from a logistic form. A closer look at the data reveals why this is the case. The right panel of Figure 3 shows the observed death rates in 1980 with three fitted curves superimposed. The curve labeled 'Makeham with stable frailty' is the one corresponding to the MLE, while the curve labeled 'Makeham' is the fitted Makeham curve mentioned earlier. If we restrict attention to Gamma frailty, i.e. maximize the profile log-likelihood along the axis , we find the maximum . The corresponding fitted curve is the one labeled 'Makeham with Gamma frailty'.
Compared to the Makeham curve, the MLE curve gives an improved fit at young ages and a similar fit at old ages, while the Gamma curve gives an improved fit at old ages and a similar fit at young ages. Since the exposure is much larger at young ages than at old ages, the global MLE is the one fitting young ages best. Ideally we would like a model which combines the MLE fit at young ages with the Gamma fit at old ages. Note that the observed death rates actually decrease from age 20 to around age 30. This, however, cannot be captured by any of the models considered.
4.1.2 Forecast
Figure 4 shows the estimated and forecasted level, , and slope, , parameters of individual senescent mortality. The forecast is produced as the mean forecast of a fitted two-dimensional random walk with drift,
(51) 
where is the drift and are independent, identically distributed two-dimensional normal variables with mean zero. Qualitatively similar plots hold for the Gamma frailty model (not shown).
There appears to be a structural break around 1970 where the annual drift in level and slope changes. With hindsight we now know that this change was genuine and that using only the period 1970–1980 would have resulted in a better forecast. Standing in 1980, however, it would have been far from obvious to use only the last 10 years of data when forecasting. To illustrate the forecast the method would likely have produced in 1980, we therefore use the mean forecast based on the average drift for the full period 1950–1980, as shown in Figure 4.
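The mechanics of the mean forecast can be sketched as follows, assuming the standard maximum likelihood estimator for a random walk with drift (the average one-step increment). The two-dimensional series below is synthetic, standing in for the estimated level and slope parameters; the drift and noise scales are illustrative.

```python
import numpy as np

# Sketch: mean forecast of a two-dimensional random walk with drift.
# theta_hist stands in for the estimated (level, slope) series; the
# numbers are synthetic, not the US male estimates.
rng = np.random.default_rng(1)
true_drift = np.array([-0.02, 0.0003])
steps = true_drift + rng.normal(scale=[0.01, 0.0002], size=(30, 2))
theta_hist = np.cumsum(steps, axis=0)            # "observed" parameters over 30 years

# The ML estimate of the drift is the average one-step increment.
drift_hat = np.diff(theta_hist, axis=0).mean(axis=0)

# Mean forecast: last observed value plus horizon times the estimated drift.
h = np.arange(1, 31)[:, None]                    # horizons 1..30 years
theta_fc = theta_hist[-1] + h * drift_hat
```

Restricting `theta_hist` to the last 10 observations before taking differences would give the post-break drift discussed above; using the full window averages the two regimes.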
[Figure 4 caption: estimated and forecasted parameters are shown as solid lines; 95% confidence intervals with and without parameter uncertainty on the drift are shown as dashed and solid lines, respectively.]

Cohort mortality with stable frailty is forecasted by
(52) 
where denotes the mean forecast of , cf. Section 5.4. Background mortality shows very little variation over the data window and is therefore kept fixed at the last value in the forecast, . Similarly, cohort mortality with Gamma frailty is forecasted by
(53) 
The fit and forecast with the two frailty specifications are shown in Figure 5 together with observed death rates from 1950 to 2010. In the data window, the model with stable frailty is seen to provide a very good fit for all ages except the oldest (age 100), while Gamma frailty leads to a good fit for all ages above 50, but a poor fit below. Neither model is able to fully predict the improvements that occurred from 1980 onwards, in particular for ages 60–80. Of the two, the Gamma frailty model does in fact come closer due to its more log-concave forecasts, cf. the discussion of Gamma versus inverse Gaussian frailty in Section 2.1.
Overall, the example shows that adding frailty can substantially improve the fit of a simple baseline model. However, as with all parsimonious models the fit is not perfect, in particular when we consider an age span as wide as 20 to 100 years. The example also shows that, with regard to both fit and forecast of old-age mortality, Gamma frailty performs better than (other) stable frailties.
4.2 Lee-Carter application
The model of leecar92 assumes a log-bilinear structure of mortality,
(54) \log \mu(x,t) = a_x + b_x k_t,
where a_x and b_x are age-specific parameters and k_t is a time-varying index. The index is typically modeled as a random walk with drift,
(55) k_t = k_{t-1} + \delta + \varepsilon_t,
where \delta is the drift and \varepsilon_t are independent, identically distributed normal variables with mean zero. The combination of (54) and (55) implies that forecasted age-specific death rates decay exponentially at a constant rate. However, the experience of old-age mortality has shown increasing rates of improvement in many countries and, consequently, forecasts based on the Lee-Carter methodology have a tendency to underestimate the actual gains. In the example to follow we illustrate how the addition of frailty can be used to improve the forecast of old-age mortality.
We consider the Poisson version of the Lee-Carter model,
(56) D(x,t) \sim \text{Poisson}\big(E(x,t)\, e^{a_x + b_x k_t}\big),
and we use the algorithm of broetal02 to obtain the maximum likelihood estimates of the parameters under the usual identifiability constraints,
(57) \sum_x b_x = 1, \qquad \sum_t k_t = 0.
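The structure of the estimation algorithm can be sketched on synthetic data: elementwise Newton updates on the Poisson log-likelihood, cycling through the three parameter sets and renormalizing so that the age responses sum to one and the index sums to zero. This is an illustration in the spirit of broetal02, not the authors' implementation; the synthetic dimensions and parameter values are arbitrary.

```python
import numpy as np

# Sketch of the Poisson Lee-Carter fit via elementwise Newton updates
# (in the spirit of broetal02) on synthetic deaths D and exposures E.
rng = np.random.default_rng(0)
ages, years = 10, 20
a_true = np.linspace(-6, -2, ages)
b_true = np.full(ages, 1.0 / ages)
k_true = np.linspace(10, -10, years)
E = np.full((ages, years), 1e5)
D = rng.poisson(E * np.exp(a_true[:, None] + b_true[:, None] * k_true[None, :]))

a = np.log(D.sum(axis=1) / E.sum(axis=1))        # start from crude rates
b = np.full(ages, 1.0 / ages)
k = np.zeros(years)

for _ in range(200):
    Dhat = E * np.exp(a[:, None] + b[:, None] * k[None, :])
    a += (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)
    Dhat = E * np.exp(a[:, None] + b[:, None] * k[None, :])
    k += ((D - Dhat) * b[:, None]).sum(axis=0) / (Dhat * b[:, None] ** 2).sum(axis=0)
    Dhat = E * np.exp(a[:, None] + b[:, None] * k[None, :])
    b += ((D - Dhat) * k[None, :]).sum(axis=1) / (Dhat * k[None, :] ** 2).sum(axis=1)
    # renormalize: sum(b) = 1, sum(k) = 0 (fitted rates are unchanged)
    a, k = a + b * k.mean(), k - k.mean()
    s = b.sum()
    b, k = b / s, k * s
```

Each renormalization is a reparametrization leaving the fitted rates, and hence the likelihood, unchanged, so it can be applied after every sweep without affecting convergence.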
We will use the Lee-Carter model with and without frailty to model mortality for ages 0 to 90, and assume a logistic form for higher ages,^7
^7 For each , and are estimated by least squares based on the relation , for ages .
(58) 
The extrapolation at the oldest ages is necessary to obtain reliable and stable rates, both historically, where data are sparse, and in forecasts. The left panel of Figure 6 shows the fit in 2010 for the full age span from 0 to 110 years. The plot illustrates the flexibility of semi-parametric models, like the Lee-Carter model, and their better fit to data compared with parametric models like those considered in Section 4.1. Parametric models, on the other hand, generally produce forecasts which better preserve the overall structure of the data.
The four Lee-Carter forecasts used in the introductory example in Section 1 are produced as described above. The mean of the index, , is used for forecasting, assuming an underlying random walk with drift. The period life expectancy used for illustration is calculated by the following formula with
(59) 
where denotes the integer part of . Arguably, the cohort life expectancy, which takes future improvements into account, is of more interest in practice. However, for our purposes the period life expectancy is more useful since it can be compared with the actual experience.
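The computation can be sketched as follows, using one common variant of the period life expectancy that treats the death rate as a constant intensity within each one-year age interval; this is an illustration of the general calculation, not necessarily the exact formula (59), and the Gompertz rates below are illustrative rather than US male data.

```python
import numpy as np

# Sketch: period remaining life expectancy from one year's age-specific
# death rates, assuming a constant intensity within each one-year age
# interval (a common variant; not necessarily formula (59)).
def period_life_expectancy(m):
    """m: death rates for consecutive one-year age groups from the chosen age."""
    p = np.exp(-m)                                # one-year survival probabilities
    S = np.concatenate(([1.0], np.cumprod(p)))    # survival curve from the chosen age
    return np.sum((S[:-1] + S[1:]) / 2.0)         # trapezoidal integral of survival

# Illustrative Gompertz rates from age 60 to 110.
x = np.arange(60, 111)
m = np.exp(-9.0 + 0.09 * x)
e60 = period_life_expectancy(m)
```

Applying the function to the forecasted rates for each future year, rather than the rates of a single observed year, gives the forecasted period life expectancy curves shown in the figures.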
4.2.1 Stochastic frailty model
Consider the following multiplicative stochastic frailty model
(60)  
(61) 
where is assumed to be Gamma distributed with mean one and variance . Due to the assumption of Gamma frailty we have the relations
(62) 
where and denote integrated cohort and baseline mortality, respectively. When the frailty variance is zero we recover the ordinary Lee-Carter model.
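The Gamma frailty relations can be verified by simulation. The check below uses the standard consequence of Gamma frailty with mean one, closely related to the relations in (62): the mean frailty among survivors to age x equals 1/(1 + sigma2 * I(x)), with I the integrated baseline. The Gompertz baseline and all parameter values are illustrative.

```python
import numpy as np

# Monte Carlo check of the Gamma frailty survivor relation:
# mean frailty of survivors to age x = 1 / (1 + sigma2 * I(x)).
rng = np.random.default_rng(7)
sigma2 = 0.5
n = 200_000
Z = rng.gamma(shape=1.0 / sigma2, scale=sigma2, size=n)   # mean 1, variance sigma2

# Illustrative Gompertz baseline; survival of an individual with frailty Z
# to age x requires a unit-exponential variable to exceed Z * I(x).
a, b = -9.0, 0.1
def I(x):                                   # integrated baseline from age 0
    return np.exp(a) * (np.exp(b * x) - 1.0) / b

x = 80.0
U = rng.exponential(size=n)
alive = U > Z * I(x)                        # survivors to age x
zbar_mc = Z[alive].mean()                   # empirical mean frailty of survivors
zbar_th = 1.0 / (1.0 + sigma2 * I(x))       # theoretical value
```

With 200,000 simulated individuals the empirical and theoretical values agree to well within Monte Carlo error, confirming the selection mechanism driving the logistic-type behavior.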
We will deviate slightly from the theory presented so far and use the period version of and rather than the cohort version previously used. Hence we will replace by the quantity
(63) 
when estimating the model. Similarly, we calculate and forecast by its period version . Note that with the period version we do not need to extend the data window.
In the context of the Lee-Carter model the period version appears to stabilize the estimates and forecasts, which is why we use it. We also note that the use of Gamma frailty on a period basis is consistent with the logistic extension above age 90.
For a fixed value of , model (60)–(61) is estimated as an ordinary Lee-Carter model with exposure . With the large number of parameters in the Lee-Carter model the addition of frailty makes little difference to the fit. This is illustrated in the left panel of Figure 6, where the fits of the Lee-Carter model with and without frailty are indistinguishable.
Since frailty is introduced with the specific aim of improving the forecast, we propose to estimate the frailty variance with a back-test approach rather than by maximum likelihood estimation. Specifically, we define the forecast fit by
(64) 
where denotes the forecast of model (60)–(61) estimated with data from 1970–2000 and ages 0–90 for a fixed value of . The model is forecasted as described in Section 4.2.2 below. Plotting shows a unimodal function which is maximized at (plot not shown). The chosen approach is simple and serves as an illustration; many other choices of fit measure and data period could have been made.
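The mechanics of the back-test can be sketched as a grid search. The hold-out data and forecast family below are synthetic stand-ins (the toy forecast is exact at a variance of 0.1 by construction), not the model of Section 4.2.2, but the criterion is the same idea: a hold-out Poisson log-likelihood maximized over a grid of candidate variances.

```python
import numpy as np

# Sketch of the back-test choice of the frailty variance: evaluate each
# candidate's forecast against hold-out deaths via a Poisson log-likelihood
# and keep the maximizer. Data and forecast family are synthetic stand-ins.
rng = np.random.default_rng(3)
E = np.full(50, 1e5)                               # hold-out exposures
m_true = np.exp(np.linspace(-7, -1, 50))           # hold-out "actual" rates
D = rng.poisson(E * m_true)                        # hold-out deaths

def forecast(sigma2):
    # toy stand-in: the forecast matches reality exactly at sigma2 = 0.1
    return m_true * np.exp(sigma2 - 0.1)

def fit(sigma2):
    m = forecast(sigma2)
    return np.sum(D * np.log(E * m) - E * m)       # Poisson log-likelihood

grid = np.linspace(0.0, 0.3, 31)
best = grid[np.argmax([fit(s) for s in grid])]
```

In the application the `forecast` function would be replaced by the frailty-adjusted Lee-Carter forecast estimated on 1970–2000, with 2001–2010 as the hold-out window.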
The right panel of Figure 6 shows the life expectancy of US males aged 60 resulting from the forecast for different values of . The impact of frailty is initially modest, but it gradually leads to substantial differences. Yet even the presence of frailty cannot fully explain the rapid increase in life expectancy of the last decade. Note that the optimal frailty variance is not the one leading to the highest forecast, but the one giving the best overall fit for ages 0–90 in the ten years, 2001–2010, following the estimation period.
4.2.2 Forecast
Forecasts of model (60)–(61) are obtained by
(65) 
where , and denote estimated parameters and is the forecasted index, either deterministic or stochastic. The integrated baseline intensity is given by in the data window and by
(66) 
in the forecast.
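Putting the period version and the forecast formulas together, the computation can be sketched as: forecast baseline rates from the Lee-Carter structure, integrate them over ages within each forecast year, and divide by one plus the frailty variance times that integral. All parameter values below are illustrative, not the fitted US male values.

```python
import numpy as np

# Sketch of the frailty-adjusted forecast in its period form: baseline
# rates exp(a_x + b_x k_t) divided by 1 + sigma2 times the baseline
# intensity integrated over ages within the same period.
ages = np.arange(0, 91)
a = -9.0 + 0.08 * ages                  # stand-in for the estimated age levels
b = np.full(len(ages), 1.0 / len(ages)) # stand-in for the age responses
k_fc = -0.5 * np.arange(1, 31)          # mean forecast of the index, 30 years
sigma2 = 0.1                            # illustrative frailty variance

m_base = np.exp(a[:, None] + b[:, None] * k_fc[None, :])   # baseline forecast
I_p = np.cumsum(m_base, axis=0) - 0.5 * m_base             # age-integration (midpoint rule)
m_frail = m_base / (1.0 + sigma2 * I_p)                    # frailty-adjusted rates

# The adjustment factor is close to one at young ages (small integrated
# intensity) and smallest at the oldest ages; as the baseline declines over
# the horizon, the old-age factor rises, altering the improvement pattern.
```

This reproduces the qualitative pattern in Figure 8: forecasts with and without frailty nearly coincide below age 70, with the frailty effect concentrated at the oldest ages.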
The left panel of Figure 7 shows the estimated index for the estimation period 1980–2010 and frailty variance . The estimated index looks close to linear and a linear forecast seems reasonable. The right panel of Figure 7 shows the corresponding life expectancy forecast for US males aged 60 as the dashed line starting in 2010. The forecast is almost linear, while the ordinary Lee-Carter forecast (solid line) curves downwards over time. For comparison the plot also shows Lee-Carter forecasts with and without frailty based on the estimation periods 1950–1980, 1960–1990 and 1970–2000.^8 To show the effect of the estimation period rather than the level of frailty the same value of is used for all forecasts. The pattern is the same, with almost linear frailty forecasts and lower, downward-curving Lee-Carter forecasts. Note that all historic forecasts, both with and without frailty, lie below the actual life expectancy evolution.
The effect of frailty on the forecasted age-specific death rates is illustrated in Figure 8. For ages below 70 the Lee-Carter forecasts with and without frailty are almost identical. For higher ages the frailty effect gradually increases and gives rise to the curving forecasts noted earlier. In particular, the frailty forecast predicts substantial improvements in mortality for the very old over the next 50 years, while the ordinary Lee-Carter forecast predicts essentially no improvements above age 90. The plot also illustrates that the forecasted age-profile is somewhat distorted. This is a general feature of semi-parametric methods, like the Lee-Carter model, due to their lack of structure.
History shows that mortality rates generally decline over time, but it also shows changing rates of improvement. Clearly, there are substantial period effects due to medical breakthroughs, changes in nutrition and working conditions, etc., and frailty can at best be a contributing factor to the observed increasing rates of improvement. Indeed, the example shows that adding frailty to historic forecasts is not enough to explain the actual evolution. However, the addition of frailty leads to projected old-age mortality rates which at least partly accommodate future higher rates of improvement. The resulting almost linear life expectancy projections resemble the actual experience better than the downward-curving Lee-Carter forecasts.
5 Generalized stochastic frailty models
The theory presented in Section 3 allows for estimation and forecasting of multiplicative frailty models. This is a useful class of models, but it can also be relevant to consider a larger and more flexible class of additive models with both frailty and non-frailty components. With additive models we can distinguish between senescent mortality, influenced by frailty and selection, and "background" mortality due to e.g. accidents, with no selection effects. The class introduced below contains e.g. logistic-type models with constant terms, which are useful when modeling mortality over a wide age span.
In the following we define the class of generalized stochastic frailty models and show how to estimate and forecast them. Additive models complicate the theory somewhat and we will have to resort to an EM-algorithm for estimation. The exposition is rather brief and focuses on the areas where the theory differs from that presented in Section 3. Many of the general remarks still apply, but they will not be repeated here.
The models we consider are based on the underlying assumption that the intensity for an individual of age at time with frailty is of the form
(67) 
where is the baseline intensity influenced by individual frailty and is the background mortality common to all individuals independent of their frailty. The cohort intensity is then given by
(68) 
where denotes the (conditional) mean frailty of the cohort of age at time , i.e. the cohort born at time .
Assume that all cohorts have the same frailty distribution at birth and denote its Laplace transform by . By a straightforward generalization of the results in Section 2 we have and
(69) 
where and
(70)  
(71) 
Note that, in contrast to earlier, is not just the integrated cohort intensity. The frailty-independent part of mortality, , needs to be subtracted to obtain the usual connection between and .
5.1 Model structure
Assume we have models for baseline and background mortality of the form
(72)  
(73) 
where and together with describe the period and age effects of frailty dependent mortality, and and together with describe the period and age effects of frailty independent mortality. All quantities can be multidimensional. Further generalizations are possible, but the chosen structure is sufficient to illustrate the idea.
For ease of presentation we assume that and are to be estimated from data, but they may in fact be fixed. As a simple example we could use a Gompertz law for baseline mortality, , and a constant for background mortality, . With no frailty this is the Gompertz-Makeham model, and when frailty is added to baseline mortality we get the logistic-type model used in Section 4.1.
A generalized stochastic frailty model is a model where death counts are independent with
(74)  
(75) 
where and are given by (72)–(73) and denotes the conditional mean frailty of the cohort of age at time . The frailty distribution at birth is the same for all cohorts and it is assumed to belong to a family indexed by . The Laplace transform of the frailty distribution with index is denoted , and this is assumed available in explicit form. Further, as a matter of convention we assume that mean frailty is one at birth and we define .
5.2 Pseudo-likelihood function
Estimation will be based on a pseudo-likelihood function in which the problematic term is replaced by an estimate. This opens the door to a general and (relatively) simple estimation procedure which can be implemented whenever the baseline and background models can be estimated separately.