Log In Sign Up

Joint Models for Cause-of-Death Mortality in Multiple Populations

We investigate jointly modeling Age-specific rates of various causes of death in a multinational setting. We apply Multi-Output Gaussian Processes (MOGP), a spatial machine learning method, to smooth and extrapolate multiple cause-of-death mortality rates across several countries and both genders. To maintain flexibility and scalability, we investigate MOGPs with Kronecker-structured kernels and latent factors. In particular, we develop a custom multi-level MOGP that leverages the gridded structure of mortality tables to efficiently capture heterogeneity and dependence across different factor inputs. Results are illustrated with datasets from the Human Cause-of-Death Database (HCD). We discuss a case study involving cancer variations in three European nations, and a US-based study that considers eight top-level causes and includes comparison to all-cause analysis. Our models provide insights into the commonality of cause-specific mortality trends and demonstrate the opportunities for respective data fusion.


page 16

page 19


Analyzing Cause-Specific Mortality Trends using Compositional Functional Data Analysis

We study the dynamics of cause–specific mortality rates among countries ...

Multi-Output Gaussian Processes for Multi-Population Longevity Modeling

We investigate joint modeling of longevity trends using the spatial stat...

Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks

This paper introduces a neural network approach for fitting the Lee-Cart...

Mortality data reliability in an internal model

In this paper, we discuss the impact of some mortality data anomalies on...

Bayesian Modelling of Lexis Mortality Data

In this work we present a spatial approach to model and investigate mort...

1 Background and motivation

In-depth modeling of the evolution of human mortality necessitates analysis of the prevalent causes of death. This is doubly so for making mortality forecasts into the future across different age groups, populations and genders. In this article we develop a methodology for probabilistic forecasting of cause-specific mortality in a multi-population (primarily interpreted as a multi-national) context. Thus, we simultaneously fit multiple cause-specific longevity surfaces via a spatio-temporal model that accounts for the complex dependencies across causes and countries and across the Age-Year dimensions.

While there have been many works on modeling mortality across several populations (Dong et al. 2020, Enchev et al. 2017, Guibert et al. 2019, Hyndman et al. 2013, Kleinow 2015, Li and Lu 2017, Tsai and Zhang 2019), as well as an active literature on cause-of-death mortality, there are very few that do both simultaneously. As we detail below, there are many natural reasons for building such a joint model, and this gap is arguably driven by the underlying “Big Data” methodological challenge. Indeed, with dozens of mortality datasets that are indexed by countries, causes-of-death, genders, etc., developing a scalable approach is daunting. We demonstrate that this issue may be overcome by adapting machine learning approaches, specifically techniques from multi-task learning (Bonilla et al. 2008, Caruana 1997, Letham and Bakshy 2019, Williams et al. 2009). To this end, we employ multi-output Gaussian Processes (MOGP) combined with linear coregionalization. GPs are a kernel-based data-driven regression framework that translates mortality modeling into smoothing and extrapolating an input-output response surface based on noisy observations. It yields a full uncertainty quantification for mortality rates and mortality improvement factors. Coregionalization is a dimension reduction technique that enables efficiently handling many correlated outputs.

This work is a continuation of our series of articles Ludkovski et al. (2018), Huynh et al. (2020) and Huynh and Ludkovski (2021) that discussed the application of GPs to model all-cause mortality in the single-population and multi-population contexts, respectively. Unlike all-cause mortality in different geographic regions, which tends to exhibit strong correlation and long-term coherence, different causes have less commonality, and thus require a more flexible structure for the respective cross-dependence. Moreover, joint analysis of 10+ mortality surfaces brings computational scalability challenges of potentially hundreds of model parameters to calibrate. Thus, in order to carry out cause-specific mortality analysis, we make methodological innovations along two directions. First, we compare two distinct versions of the MOGP that implement dimension reduction by fusing outputs through linear combinations of latent functions: the Semiparametric Latent Factor Model (SLFM) and the Intrinsic Coregionalization Model (ICM). ICM assumes a fixed spatial kernel in Age-Year, while SLFM does not. This distinction corresponds to different assumptions about the structural commonality in the modeled mortality surfaces. We conduct sensitivity analysis to assess these two choices for the tasks of in-sample fitting and of forecasting cause-specific mortality. Second, we implement a multi-level MOGP-ICM model that separates latent factors across the different types (countries, causes, genders, etc.) of categorical inputs describing the populations. The separability assumption in the joint covariance kernel yields a product-type cross-population correlation structure, providing insights about the relationships between the mortality improvement trends.

Our models are driven by the scalability issue which has been a critical obstacle to analyze in bulk the large-volume mortality datasets that have become available recently. Thus, to cope with many mortality surfaces, we leverage the twin pillars of multi-output models (Teh et al. 2005, Letham and Bakshy 2019, Williams et al. 2009) and the structured Kronecker covariance that mitigates the typical cubic computational complexity of GPs (Flaxman et al. 2015, Gilboa et al. 2015, Saatçi 2011, Zhe et al. 2019).

Decomposing all-cause mortality rates leads to reduced signal-to-noise ratio since less common causes intrinsically have limited death counts. Consequently, cause-specific analysis must contend with much noisier data. One motivation for the MOGP approach is to explicitly enable data fusion across populations, sharing information to improve model fitting. We demonstrate significant de-noising of mortality experience that successfully captures cause-specific trends, including for causes with low data credibility. We also document the benefit of joint models to reduce model risk, i.e. improved inference of model hyper-parameters. As another feature, our framework can handle non-rectangular datasets, e.g. countries with different period coverages. In one of our case studies we exploit this to borrow the most recent data from other countries to update predicted domestic mortality rates.

Literature Review

One of the few works approaching cause-specific mortality modeling within a multi-population context is the recent study in Lyu et al. (2021). The authors introduced a nested model in the spirit of Li and Lee (2005) to jointly model major causes from three European countries by capturing the cross-cause and cross-country dependencies through common factors.

Since a given death is associated with a single cause-of-death, direct dependence among causes is not observable. As such, many researchers choose to model and forecast each cause in isolation. A variety of forecasting methods for individual causes are employed: univariate time-series, such as ARIMA methods in Caselli (1996), Knudsen and McNown (1993), McNown and Rogers (1992), dynamic parametrization in Tabeau et al. (1999), least squares methods and variations of the Lee-Carter model in Caselli et al. (2019)

. To capture dependence between causes, a common approach is via copulas within the framework of dependent competing risks. The main effort is to characterize the joint distribution of survival times in terms of un-observed cause-specific mortality rates, see

Dimitrova et al. (2013), Lo and Wilke (2010) and Li and Lu (2019). Alternatively, Alai et al. (2018)

utilized multinomial logistic regression. Outside the competing risks framework,

Arnold (-Gaille) and Sherris (2013)

proposed a multivariate Vector Error Correction time-series model to examine the existing cointegration relationships. Another approach is to link multiple causes through a list of clinical factors, then applying a stochastic model to forecast these factors

(Foreman et al. 2018).

Several studies relied on compositional data analysis to achieve coherence in the sense that cause-specific forecasts sum to the all-cause forecast. This entails modeling the by-cause distribution of deaths, directly incorporating dependence between causes, see Bergeron-Boucher et al. (2017), Kjaergaard et al. (2019). We refer to Wilmoth (1995), Caselli et al. (2019), Tabeau et al. (1999) for discussions on the usefulness and limitations in using cause-specific models to project all-cause mortality.

The remainder of this paper is organized as follows. Section 2 introduces cause-of-death mortality datasets from the HCD. Section 3 describes the MOGP-ICM framework and its extensions within multinational context. Section 4 focuses on how MOGPs can maximize predictive gains over single-population models and provide insights about the projected trends of aggregate mortality. Section 5 compares the results from Multi- and Single-level ICM. Finally, Section 6 concludes with main findings and directions for further analysis.

2 The Human Cause-of-death Database

The Human Cause-of-death Database (HCD) (HCD 2021) provides detailed cause-specific mortality data for more than a dozen developed countries. The HCD offers three levels for the classification of causes. The short list (with 16 broad categories) and the intermediate list (103 categories) are the same across countries, while the full list is country-specific. The data for each country is organized by calendar years, age groups, gender, and causes. Datasets for different countries do not line up, both due to historical availability and different timelines for updates, see Figure 21 in Appendix A.

As part of their extensive and well-documented post-processing the HCD conducted a series of bridge-coding studies to reduce disruptions in mortality trends due to changes in the International Classification of Disease (ICD). The current 10th Revision of the ICD is far more detailed with the addition of 8,000 categories compared to the 9th Revision (Anderson et al. 2001). As cause-of-death records switched to the 10th Revision, death counts were shifted among some categories. To minimize such jumps, the HCD reconstructed death assignments between the old and the new ICD Revisions; this reconciliation is already part of the HCD datasets we used.

In our first case study we analyze sub-categories of Cancer, available from the 103-category intermediate list. Being the leading cause of death worldwide (WHO 2021), it is useful to understand the trends in cancer mortality for different age groups and explore the dependence between its common variations. Moreover, cancer types generally do not feature any substantial trend discontinuity due to ICD revisions (Anderson et al. 2001), allowing a better assessment of model performance in terms of in-sample smoothing and out-of-sample forecasting.

Figure 1: Log-mortality rates of selected Cancers by Country & Age groups among Males. Note the different -axes in each panel. Source: HCD.

For our testbed we select 5 variations of cancer: lung and bronchus (LUN), colon and rectum (COL), pancreas (PAN), stomach (STM), and all other cancers (RMN) from 3 countries in the HCD: Czech Republic (CZE), Germany (DEU), and Poland (POL). The chosen Cancer causes-of-death are common in both Male and Female populations, enabling us to jointly model mortality rates across Cause, Country, and Gender factors. For example, we exclude breast cancer from the study as the respective male counts are very limited. Age in the HCD is formatted as discrete five-year age groups; we treat it as a continuous covariate encoded via the respective group average (52 for Age group 50–54, 57 for group 55–59, etc). In Figure 1, we visualize the raw log-mortality rates across different cancer types in Male populations from Czech Rep., Germany, and Poland. Lung and Colorectal are the leading causes of cancer deaths, while Pancreatic and Stomach are less prevalent and exhibit more volatility. Figure 22 in the Appendix visualizes the respective patterns in Age; we observe a strong convexity in Age, especially for LUN.

Our second case study uses a subset of the HCD data for United States, based on the CDC’s National Center for Health Statistics, where we decompose all-cause mortality into its major categories. This analysis is inspired by a similar study conducted by the SOA (Boumezoued et al. 2019) that employed a multivariate Lee-Carter model. The SOA study looked at 11 major causes, all Age groups between 0 and 95+ in 1999–2016 with data combined from the HCD and the Global Burden of Disease project. We restrict analysis to Ages 40–69 and years 1999–2018, and moreover reduce to the eight most common causes for the examined age groups: Heart (HEA), Stroke (STK), Diabetes (DIA), Cancers not induced by smoking (CAN), (lung) Cancers induced by smoking (CANL), Respiratory (RES), Drug abuse (DRU), and all other remaining causes (RMN). The cause mapping for this case study closely follows the SOA guidance.

2.1 Stacking Sub-Populations

For joint modelling purposes, datasets are stacked together. Generically we have a total of populations, indexed by the subscript . When needed, to indicate country, cause, and gender factors, we re-index by . Throughout, the two main independent variables are Age and Year, , and the observed mortality rate in the -th population is denoted by:


where is the number of deaths in the th population for the th observation with the corresponding Age & Year inputs. The denominator refers to the corresponding exposed-to-risk counts. Note that different causes in the same country-gender combination with have the same , but different ’s. Our GP models work with log-mortality rates so that our data is

The overall dataset is then represented as , , .

3 Methodology

3.1 Gaussian Process Regression for Mortality Tables

Consider the non-parametric additive regression model for the populations to be jointly analyzed:


where represents an individual entry in the mortality table (indexed by Age, Year), is the observed output in the th population, and , is the underlying latent log-mortality surface, obscured with the observation noise .

We first summarize the spatial structure that concerns the dependence of mortality rates as a function of Age and Year. Momentarily focusing on a single-output Gaussian Process (SOGP) regression, we put a GP prior on the latent function , meaning that any finite vector at

inputs follows the multivariate Gaussian distribution

where is the mean vector of size and is the -by- covariance matrix. All properties of a GP are thus completely described by its mean and covariance functions.

The functions characterize our prior beliefs about the response surface . The covariance kernel of the GP defines a similarity between pairs of data points. It characterizes the smoothing process by determining the influence of observations on the distribution of the output. Data points that are close are expected to behave more similarly than data points that are farther away. In terms of spatial dependence on (Age, Year) we concentrate on a common family of covariance functions known as the Matérn class, equipped with automatic relevance determination. Specifically, the Matérn-5/2 kernel defines the covariance between two mortality table entries as follows:


This kernel is parametrized by the Age lengthscale and the Year lengthscale .

The mean function describes the relevant trends in log-mortality rates. We fit a parametric prior mean to capture the long-term longevity features: where ’s are given basis functions and the ’s are unknown coefficients. For example, to reflect a quadratic trend in Age (Figure 22 in Appendix A) and linear trend in Year dimension we use:


We are interested in the posterior distribution for at specified inputs given the dataset , , in other words the likelihood of the true response surface being given what we have observed. Using where is the Kronecker delta, we have the Gaussian observation likelihood where the error terms are assumed to be independent and Gaussian-distributed, . Incorporating the evidence from observations, as reflected in the likelihood function and the prior, we obtain the Gaussian posterior

The Universal Kriging equations below provide not only the posterior mean

and posterior variance

but also the estimated coefficients

. Let , , and where is the covariance matrix . The posterior mean of along with the predicted posterior mean and respective variance for any input are:


where is the vector of covariances between inputs in the training set and desired test input . Note that the predictive distribution of observation at is similarly obtained as .

Below we use as the model prediction for the respective (log)-mortality rate of the th population in cell , and

as the corresponding posterior uncertainty which is used to obtain predictive quantiles around the former prediction.

3.2 Semi-parametric Latent Factor Model

The vector-valued latent response variable over the Age-Year input space is defined as

, where the functions are the log-mortality surfaces for the corresponding th population. Similar to single-population GP above, we place a GP prior over the latent function such that:

where is the mean vector function whose elements are the mean functions of each output, and is the fused covariance matrix. This implies that we separate the “spatial” dependence, encoded in from the inter-task dependence encoded in . While has a natural Euclidean metric that is used in (3), the population indices are generally of factor-type. Hence, to describe the dependence across outputs we need hyperparameters which becomes inefficient and unstable beyond 3-4 populations. Thus, with an eye towards reducing the number of hyperparameters, additional structure is needed for .

In the Semiparametric Latent Factor Model (SLFM) dating back to Teh et al. (2005), each output is assumed to be a linear combination of latent functions:


where ’s are independent realizations from GP priors with distinct covariances , and ’s are the factor loadings (), considered part of the kernel hyperparameter space . Thus, the semiparametric name of the model comes from the combination of a nonparametric component (several GPs) and a parametric linear mixing of the functions .

The role of is to achieve dimension reduction for the correlation structure across the ’s. Let be the vector representing the collection of coefficients associated with the th latent function across the outputs. Then the covariance of the vector-valued function is:


where symbolizes the Kronecker product, and each has rank one.

3.3 Intrinsic Coregionalization Model

Similar to SLFM, ICM assumes each output function is generated from a common pool of latent functions, cf. (8). However, the latent all share the same GP prior with the covariance kernel . Then, the covariance for is:


where has rank . In other words, the cross-output correlation is of rank , while the spatial covariance of each output is the same. The th element in the diagonal of the cross-covariance matrix (or in SLFM) represents the process variance of . Since , the individual entries are and the diagonals are , . We can similarly infer the correlation matrix between population and (); for ICM it is


In both SLFM and ICM, the fused covariance kernel belongs to the class of separable kernels (Alvarez et al. 2012) and decouples using the Kronecker product into: (1) the coregionalization matrices that measure the interaction between different outputs and (2) the spatial covariance over Age-Year dimensions , cf. Eqs (9) & (10). In ICM, all populations share the same spatial covariance kernel . Such assumption of a common spatial covariance over Age-Year inputs links to the concept of commonality in the mortality structure (but not levels) across populations. Compared to ICM, SLFM offers more flexibility at the cost of adding more hyperparameters: the total number of kernel hyperparameters in the fused covariance matrix of SLFM is vis-a-vis hyperparameters in the ICM.

Selecting rank . As is not one of the hyperparameters to be optimized, ad hoc ways are needed to pick it. We use the Bayesian Information Criterion (BIC) to select rank that produces the most parsimonious model, see Williams et al. (2009) and Huynh and Ludkovski (2021). As discussed in Bonilla et al. (2008), taking in ICM corresponds to finding a rank- approximation (based on an incomplete Cholesky decomposition) to the full-rank . A similar interpretation holds for SLFM and the respective ’s. While attractive for dimension reduction and computational speed-up, low may not be adequate to describe the overall dependence structure and hence clashes with the original goal of capturing the variability present in the fused mortality dataset. In particular, we observe that BIC tends to select which may be too small for . Based on our case studies, we recommend for maximizing predictive performance.

3.4 Multi-Level ICM for Scalable GPs

In the situation when we have multi-dimensional factor inputs (e.g. Cause and Gender together), one approach is to combine all factor inputs into a single covariate with distinct outputs prior to applying ICM or SLFM. When grows large, ICM becomes less feasible due to its time complexity Bonilla et al. (2008). In this section, we develop the structured Kronecker product kernel (multi-level ICM in Liu et al. (2019), Zhe et al. (2019)) to mitigate this scalability issue in GP. The structured covariance kernel exploits the fact that mortality tables are gridded along each factor dimension.

We express the total number of outputs as the product across types of categorical inputs, , where is the number of levels within the th categorical input. We then decompose the cross-population covariance as the Kronecker product:


where refers to the cross-covariance matrix between sub-populations within the th categorical input, taken to have rank . Directly marginalizing the cross-covariance matrices yields a convenient interpretation of the correlation between sub-populations within a factor input, and moreover allows for separate estimation of each cross-covariance sub-matrix , .

The multi-level ICM setup implies that each output is the weighted combination of independent latent functions, all with the spatial covariance kernel :


The improvement in scalability of the multi-level ICM can be analyzed via the ranks of the cross-covariance sub-matrices. Thanks to the Kronecker product’s property, and using Cholesky decomposition, Eq. (12) can be rewritten as:


where , , and each vector , () represents the collection of scalar coefficients associated with the th latent function across sub-populations in the th categorical input. Thus, the number of hyperparameters required to estimate the cross-covariance is , which can be much lower than for single-level ICM when is large. Note that when , the multi-level ICM utilises more latent functions to generate the model outputs, compensating for the imposed structure in in (14). In terms of overall complexity, multi-level ICM requires time compared to for single-level ICM.

Remark: In the typical situation, ’s are small and so it is feasible to consider full-rank multi-level ICM, i.e. . Otherwise, (14) allows to exploit simultaneously the Kronecker product structure, as well as the low-rank approximation. See Table 2 for results on the impact of values in multi-level ICM.

3.5 MOGP Hyperparameters

To implement a GP model requires specifying its hyperparameters. Note that actual inference reduces to linear-algebraic formulas in (6)-(7), and the modeling task is to learn the spatial covariance, namely the mean and kernel functions.

Mean function: We make the prior to be population-specific in order to maximize model flexibility in describing the mortality trend of each population. Thus, we have coefficients , cf. (4).

Observation Likelihood: We assume a constant observation noise within each population . This accounts for heterogeneous characteristics when observations from multiple populations are combined, in particular is smaller for larger populations and for more prevalent causes (Huynh et al. 2020). The ’s are estimated via Maximum Likelihood along with all other hyperparameters. More advanced GP models that either employ Poisson likelihood or infer input-dependent non-parametric are possible but require additional coding and are beyond the scope of this work.

Estimating Hyperparameters: For (multi-level) ICM the set of hyper-parameters is ; for SLFM it is . We use the R package kergp (Deville et al. 2019) to carry out the respective Maximum Likelihood Estimation via Kronecker decompositions. Alternatively, to account for model risk and offer more robust results, one could employ a fully Bayesian hyperparameter inference. This could be done with the Stan software (Carpenter et al. 2017) but is beyond the scope of this work.

3.6 Performance Metrics

Given a test set of observed ’s, we evaluate the effectiveness of different models using two metrics. First, we employ the mean absolute percentage error (APE) to examine the discrepancy between the observed and predicted outputs:


where is the observed value at test input in the th population, and is the predicted log-mortality rate. Note that APE is scale-independent, enabling us to compare model performance across populations with different exposures.

We also use the Continuous Ranked Probability Score (CRPS) metric to assess the quality of the probabilistic forecasts produced by a MOGP. Indeed, one of the major benefits of GP-based mortality models is a full distribution for future observations which allows a more detailed uncertainty quantification beyond just looking at the predictive mean . CRPS assesses the closeness of the realized outcome relative to its predictive distribution which in the GP contest is Gaussian and leads to


where are the standard Gaussian density and cdf. Observe how CRPS penalizes both bias () and excessive predictive variance.

We average both APE and CRPS across a test set of Age-Year-Population inputs. The resulting mean APE is interpreted as a normalized relative predictive error, and mean CRPS as the squared difference between the forecasted and the empirical cumulative distribution functions. Models with lower mean APE/CRPS are judged to have a better fit.

Mortality Improvement Factors. A common way to interpret a mortality surface is via the (annual) mortality improvement factors which measure longevity changes year-over-year. The raw annual percentage mortality improvement is . The smoothed improvement factor is obtained by substituting in the GP posterior means ’s:


4 Modeling Multiple Causes of Death

To understand the behavior of Age-specific mortality across different causes of death, we begin by generating MOGP models for Cancer variants. Using the HCD database we fit both ICM and SLFM and assess their performance in 3 European countries. We test the resulting predictive performance by computing APE and CRPS for one-year-ahead mortality forecasts in three separate test sets, using SOGP as the baseline. All models, including SOGP models, are trained on the same Ages from 50–84 (7 age groups) and three overlapping periods: 1998–2013 for the 2014 prediction, 1999–2014 for 2015 prediction, and lastly 2000–2015 for 2016 prediction. We report APE and CRPS for all-Cancer log-mortality rates since some of the cancer variations, such as Stomach and Pancreatic have relatively few (and therefore more noisy) recorded deaths. The differences in APE and CRPS between MOGP and SOGP models are expressed as the 3-year median percentage improvement over SOGP models. Positive improvement means joint models have smaller mean APE/ CRPS values. To compute all-Cancer CRPS, we leverage the closed-form expression of the MOGP multivariate predictive distribution in (6)-(7) to simulate the forecast distribution of all-Cancer log-mortality for each Age group in the data. To do so, we first draw () stochastic samples of the joint across all the Cancer types. After exponentiating and summing, we then obtain corresponding samples of (unlogged) all-Cancer mortality rates.

Cause ()
# Kernel
Czech Rep. Germany Poland
ICM 12 19.16 30.18 3.86 15.05 12.85 17.72
17 17.69 19.08 6.53 15.16 13.46 18.64
22 22.31 21.43 6.30 14.58 17.02 11.01
SLFM 14 18.04 29.49 5.09 13.51 10.82 16.05
21 22.87 23.72 2.78 13.55 4.11 10.65
28 16.97 18.24 8.75 16.73 5.13 3.81
Table 1: Comparison between MOGP ICM/SLFM with different ranks , reported as the relative improvement in APE and CRPS of MOGP vis-a-vis SOGP. The reported values are medians of one-year-out aggregated all-Cancer forecasts for age groups 50–84 based on 3 training periods: 1998–2013 (predict 2014), 1999–2014 (predict 2015), and 2000–2015 (predict 2016). Both ICM and SLFM are fitted on 5 Cancer types, Male populations in each selected country.

Table 1 shows the 3-year median improvement in MAPE and CRPS for Multi-cause ICM and SLFM over SOGP models. Overall, joint models produce more accurate mean forecasts (positive improvement in APE) with higher credibility (positive improvement in CRPS). We observe the opportunity to borrow information across different cancers to better estimate the kernel hyperparameters. As a result, joint models can describe important trends for individual cancers, leading to the reduction in disparity between the predicted values and the observed all-Cancer mortality in the test sets.

Figure 2: The predicted YoY improvement rates derived from multi-cause GP models by Country and Age groups among Male observations. In each country, MOGP-ICM () and SLFM () are fitted on Ages 50–84 (7 groups), Years 1998–2016 over 5 Cancer variations: Stomach, Colorectal, Pancreatic, Lung and Remaining types.

4.1 Commonalities in Cause-Specific Mortality Surfaces

The main difference between ICM and SLFM is the underlying assumption about the latent factors . ICM assumes that all factors have the same lengthscales , and is therefore appropriate for modeling homogenous mortality surfaces. SLFM is more general and fits distinct ; it is expected to perform better when the different surfaces exhibit heterogeneity (for example different degree of correlation across Age). We examine this commonality assumption in Figure 2 by displaying side-by-side the mortality improvement factors of the various cancers derived from multi-cause ICM and SLFM. The BIC selection criterion yields for both ICM and SLFM. In this case study, the results from SLFM are almost identical to ICM predictions. Therefore, the assumption of sharing the spatial kernel over Age-Year inputs across the considered cancer types is plausible. This conclusion is reinforced by Table 3 in Appendix A which shows that the inferred lengthscales for SLFM are very similar for across all three countries. In other words, both of the latent factors learned by SLFM have similar Age-Year spatial dependence, and so there is little loss of fidelity in a priori forcing them to be equal, as is done in ICM. Indeed, the lengthscales in SLFM are close to the ICM ones.

We can also inspect Figure 2 for insights about the relative mortality improvement trends of different cancers. Stomach cancer has the largest improvement rates for most age groups in all three countries. Decline in stomach cancer incidence tends to be associated with economic improvements resulting in healthier diet, better food preservation, clean water supply, etc. We also observe the increasing improvement trend of lung cancer among age groups below 60 in Czech Rep. and Poland, reflecting lower smoking rates in birth cohorts after WWII. Czech Males experienced large increase in the improvement rates in most cancers. In Germany, the improvement rates have been rising among age groups below 60 but slowing down among older age groups. In Polish Males, except stomach cancer, the improvement trends increased until early 2010s and then significantly declined, displaying the impact of an aging population and an increase in lifestyle exposure to risk factors for cancers (Wojtyś et al. 2014). The incidence of Stomach cancer has been flat over time, consistent with the finding in Arash et al. (2020).

(a) Multi-cause ICM in Czech R.
(b) Multi-cause ICM in Germany
(c) Multi-cause ICM in Poland
(d) Multi-cause SLFM in Czech R.
(e) Multi-cause SLFM in Germany
(f) Multi-cause SLFM in Poland
Figure 9: Correlation matrices derived from multi-cause GP models; rank is chosen for both ICM and SLFM. In each country, the MOGP model is fitted on Males aged 50–84 (7 groups) during Years 1998–2016, over 5 Cancer variations: Stomach, Colorectal, Pancreatic, Lung and Remaining types.

Figure 9 visualizes the inferred cross-cause correlation matrices . Both ICM and SLFM document a global positive association among mortality rates from these cancers in each country. It is consistent with strong resemblance in the mortality improvement trends between cancers in Figure 2. Since cancers within a given population share common risk factors, innovations in early detection and advancements in treatment for one cancer are likely to have positive effects on other cancer variations. Negative correlation reflects opposite trends, e.g. Stomach and Pancreas cancers in Poland; this may be due to a competing risks context.

For a different take on cause-of-death commonality, we applied the multi-cause GP ICM model to jointly model top causes in the HCD US dataset, separately for each gender. Using BIC as the criterion, models with rank (for 8 populations) yield the largest BICs for both Males and Females. This indicates a higher degree of heterogeneity in these larger cause groupings, compared to across five cancer causes in the previous study. Thus, the models employ more latent functions to adequately capture the total variability in the joint datasets. The inferred cross-cause correlation matrices are displayed in Figure 27 in Appendix B. Overall, our results are in agreement with the SOA study (Boumezoued et al. 2019), for example confirming that there are moderate positive associations between most causes. The strongest correlations are found between Heart disease and Stroke, and between Heart and Drug overdose. As might be expected, there is little correlation between Remaining Causes (RMN) and most other categories. Some of the correlations vary between genders, possibly due to strong observation noise in less common causes.

A complementary way to examine commonalities across causes is to inspect the inferred factor loadings . Populations that have similar loadings will be highly correlated. Figure 23 in Appendix A displays the factor loadings in a Country-Cause SLFM with latent factors

. We observe that primary clustering is by Causes rather than by Countries. Some outliers, such as STM in Czech R., can also be seen and suggest idiosyncratic behavior of the respective mortality surface.

4.2 Aggregating by-Cause models

An important motivation for our work was to use by-cause analysis to make more precise conclusions about all-cause mortality. For example, the mortality trends of individual Cancers give insights into the respective all-Cancer mortality trends. Figure 10 shows the results from a multi-level Country-Cause GP ICM. It visualizes the aggregated predictive distribution of all-Cancer log-mortality observations, , for Male populations by Country and Age group, using shading to denote predictive quantiles. Note that since the model did not use all-Cancer mortality during training, the fact that the predictive in-sample bands closely match the historical movement of all-Cancer mortality data is a validation of a successful by-cause analysis.

The effort in fighting cancer has transpired in all three countries, but the improvement is not uniform. Despite Czech Republic and Poland being socio-economically similar, Czech Rep. has a faster improvement pace than Poland. Although Germany continues to have the lowest log-mortality rates across all Age groups, the Czech Rep. has drastically closed this gap in the last decade, see especially the left panel of Figure 10 (Age group 55-59). The main driver is the rapid improvements in all common Cancers in Czech Rep., for example the mortality improvement factor for LUN being recently more than double compared to Germany, cf. Figure 2.

Figure 10: Predictive distribution of all-Cancer log-mortality rates for different age groups in 3 countries via multi-level Country-Cause ICM. The joint model is fitted on Males, Age groups between 50–84, Years 1998–2016 across 3 Countries () and 5 Cancers (). Shading indicates the 60%, 80%, 95% and 99% predictive quantile bands.

4.3 Trends in US Top-Level Causes

Figure 11 presents the in-sample posterior distribution of log-mortality rates during 1999–2018 along with the projected trends up to 2025 for two Age groups, in both Male and Female US populations. The left panels display the predictive trends for the top 6 causes in each age-group. We observe improvements in most causes and age groups; the largest improvement being in CANL. In contrast, Drug abuse deaths are rising rapidly among young age groups (e.g: Age 40–44), see the 2 upper-left panels of Figure 11. In the older age groups, mortality from Heart disease experienced large declines in early 2000s, but essentially flattened out after 2015. Boumezoued et al. (2019) emphasized the need for break-point detection to improve forecasts of such causes whose trends change over time. In the MOGP framework, the forecasts are driven by the most recent data and are thus automatically adjusted if trends shift. So for example, we do not need to do any break-point analysis to achieve the slow pace of future HEA improvement shown in Figure 11.





Figure 11: Posterior distribution of true log-mortality rates for US Males and Females for the 40-44 and 65-69 Age groups. In each row: left panel shows the posterior quantiles for top 6 individual causes (via multi-cause GP); middle panel shows aggregate log-mortality trends via by-cause model (multi-cause GP); right panel shows aggregate log-mortality trends via all-cause model (single-output GP). All models are fitted on Ages 40–69 and Years 1999–2018. The vertical lines indicate the boundary between in-sample (1999–2018) and out-of-sample forecast (2019–2025). Shading indicates the 60%, 80%, 95% and 99% predictive quantile bands. We further overlay the 50%-quantile (predictive median) of by-cause and all-cause models for the out-of-sample period for convenient comparison.

The middle and right columns in Figure 11 compare the aggregate all-cause projections for the US population. We witness the pessimism of the aggregate projected trends based on the by-cause models compared to an all-cause SOGP (right column), especially in the younger ages. This discrepancy, driven by the growing importance of causes with increasing trends, such as Drug overdose, highlights the additional insights from by-cause modeling. The pessimism of by-cause analysis was first mentioned in Wilmoth (1995) and re-iterated in Boumezoued et al. (2019). For older age groups the underlying dynamics among common causes are more stable and all-cause and by-cause forecasts are broadly similar.

Note that compared to Figure 10, the GP posterior uncertainty bands widen dramatically as we go from in-sample (up to Year 2018) to extrapolating for Years 2019–2025. This reflects the data-driven nature of GP forecasts which intrinsically leads to low uncertainty for in-sample smoothing and widening uncertainty as predictions are made further into the future. This phenomenon gets amplified as we add up the by-cause forecasts to obtain all-cause predictions, witness the wider band in the middle panels of Figure 10 relative to the right panel based on a SOGP.

As discussed in Huynh and Ludkovski (2021), MOGPs are well-suited to generate expert-based projections. This is a useful feature to have given that by default projections are driven by the historical trends that might not continue in the future. For instance, the increasing trend of Drug abuse is largely fueled by the opioid epidemic. Assuming this crisis is addressed in the future, the projected DRU mortality should be adjusted downward. In MOGP this can be achieved by modifying the Year trend in . Figure  24 in the Appendix displays an illustration where the trend of Drug abuse is reduced by one third (through lowering the Year effect by one third) of the original pace for both the Male and Female populations. The resulting adjusted forecast for aggregate mortality gets closer to that from the all-cause model, reducing the level of pessimism we have observed earlier among the US young population. Another potential adjustment could be for smoking-induced cancer, where the MOGP models extrapolate the historical trend of rapid longevity gains. However, the SOA report (Boumezoued et al. 2019) suggests that this pace might not take place in the intermediate term, lowering aggregate mortality gains.

5 Cause-of-Death Joint Modeling in a Multinational Context

We proceed to consider simultaneously 30 populations in the Cancer variations case study, arranged by the three factor inputs of Cause ), Country () and Gender ().

5.1 Multi-level vs Single-level Correlation Structure

(a) Multi-level ICM (, , ) with 40 hyperparameters
(b) Single-level ICM () with 62 hyperparameters
Figure 14: Cross-correlation matrices of MOGP models that incorporate Country, Cause, and Gender as categorical inputs. Thirty total populations.

Figure 14 displays the inferred correlation structure across the above 30 populations. On the left we fit a multi-level ICM () and on the right a single-level ICM (). Both models are fitted on Age groups 50–84, Years 1998–2016 for 3 Countries, 5 Cancers, and 2 Genders. Observe that the multi-level model has hyperparameters compared to in the single-level model. The right panel does not display any recognisable structure in the inferred correlations; because the model is not aware of the different factor dimensions, after dimension reduction the marginal associations between sub-populations within the original factor inputs are no longer accessible. In contrast, the multi-level model enforces a block structure in the correlation matrix , see Figure 14 (a). Recall that the correlation sub-matrices for each factor input are estimated separately and the Kronecker product structure implies that we can read off the correlation among any combination of factors. Figure 18 displays the derived sub cross-correlation matrices from the above three-level Country-Cause-Gender GP model. One can then multiply these factor-based correlations to get the total correlation in Figure 14(a). For example, the correlation of Cancer mortality between Czech-Lung-Male and Polish-Pancreas-Male is . This is in fact also the correlation between Czech-Lung-Female and Polish-Pancreas-Female () mortality. The limited number of deaths in some causes explains differences in the correlation matrices between single- and multi-level ICM. For example, the correlations between Polish-Stomach-Female and other cancer types are mostly negatively correlated in ICM but (mildly) positively correlated in the multi-level ICM.

(a) Cross-country correlation ()
(b) Cross-cause correlation ()
(c) Cross-gender correlation ()
Figure 18: Cross-correlation matrices derived from a three-level Country-Cause-Gender multi-level ICM model (), fitted on Age groups 50–84 and Years 1998–2016.

5.2 Model Selection

Next, we compare the predictive performance of multi-level vs single-level MOGP-ICM. Following the same setup as Table 1, Table 2 shows the 3-year median improvement in APE and CRPS for different joint models relative to SOGP models in a multinational context. Joint models produce more accurate mean forecasts with higher credibility but the predictive gains are not uniform across countries. The largest improvements from multi-level ICM are in Czech Republic; Czech raw data has lower credibility due to its smaller exposures, thus there is more opportunity for data fusion. The results further validate our approach of modelling cause-specific mortality across populations: models that incorporate information from foreign countries (e.g. Country-Cause GP) have larger predictive improvements compared to Table 1. In Poland due to the structural break encountered in mid-2010s (see Fig. 10), the performance of aggregated Country-Cause MOGP is consistently worse than those of all-cause SOGP; this issue is rectified for Country-Cause-Gender MOGPs.

CountryCause ()
# Kernel
Czech Rep. Germany Poland
ICM 32 35.54 30.74 12.02 21.06 40.44 59.47
47 45.51 50.07 8.50 20.35 37.09 38.48
62 31.64 36.56 14.08 35.66 13.32 9.56
SLFM 34 31.51 24.57 4.44 15.87 47.66 45.98
51 33.78 41.59 0.88 8.62 37.25 24.54
68 44.79 43.29 8.93 20.49 45.63 37.66
Multi-level ICM 18 26.42 24.24 4.55 6.84 48.46 61.53
28 25.74 32.26 5.04 15.75 35.94 44.24
36 42.03 36.80 3.07 11.12 44.45 53.69
CountryCauseGender ()
# Kernel
Czech Rep. Germany Poland
ICM 62 9.99 11.75 5.40 13.40 12.21 6.55
92 0.52 3.07 10.78 20.35 15.31 4.09
Multi-level ICM 32 14.85 24.91 68.45 15.22 25.11 7.75
40 21.34 27.22 40.45 10.22 5.12 8.04
Table 2: Comparison between MOGP models with different ranks in terms of APE and CRPS metrics. The reported values are median relative improvements of MOGP vs SOGP of one-year-out aggregated all-Cancer forecasts for age groups 50–84 based on 3 training periods: 1998–2013 (predict 2014), 1999–2014 (predict 2015), and 2000–2015 (predict 2016). Top half: MOGP models fitted on 5 cancer types and 3 countries. Bottom half: MOGP models fitted on 5 cancer types, 3 countries and 2 genders.





Figure 19: Predictive log-mortality distributions from single-level ICM () and multi-level ICM () models with 3 factor inputs: Country, Cause, and Gender (30 populations). All models are fitted on Age groups 50–84 and Years 1998–2013, and applied for 3-year-out forecasts up to 2016. The shadings indicate 95% predictive bands; the vertical lines mark the edge of training data. Note different -axes in each panel.

In contrast to Section 4 where the performance of ICM and SLFM was almost identical, with multiple input factors ICM usually outperforms SLFM. Given the strong commonality in mortality trends across Cancer types, a shared Age-Year covariance kernel appears preferable for information fusion. The comparison between single- and multi-level ICM depends upon the number of populations to model. In Country-Cause-Gender setting with populations, the multi-level ICM () yields better mean APE and CRPS than single-level ICM, and moreover uses fewer hyperparameters. For Country-Cause setting, the performance is comparable.

Table 2 also shows the impact of the latent ranks and ’s. For Country-Cause, tends to yield the best results in single-level models; but for Country-Cause-Gender, ICM with performs consistently worse than , presumably due to unstable estimates of the more than 90 underlying hyperparameters. For multi-level ICM, we generally find that full-rank works best, although low-rank setups also yield good predictive performance, indicating the opportunity to shrink even further the number of kernel hyperparameters.

Figure 19 shows the predicted log-mortality rates for individual Cancers and Age group 55-59 via full-rank multi-level ICM and single-level ICM with . Both models are fitted on Age groups 50–84 and Years 1998–2016 before we perform out-of-sample forecasts for the next 3 years (2014–2016). The single-level ICM produces over-smoothed forecasts for several cancers, especially cancers with large observation noise like Stomach and Pancreas; this problem is mitigated by the shorter length-scale in Year in multi-level ICM ( versus in single-level ICM).

Coherence in cause-specific trends: Figure 19 demonstrates that Males and Females do not always share similar progress in mortality reduction. While the trends in Stomach, Colorectal, and Pancreatic cancers are consistent for both genders, for Lung cancer the Male-Female gap is diminishing rapidly, especially in Germany and Poland. This is driven by a decrease in cigarette consumption among men, while women are more likely to develop Lung cancers that are not associated with smoking. Thus, the concept of forecast coherence (namely extrapolating a stable Male-Female spread, as observed historically) is not always well suited for cause-of-death analysis.

5.3 Borrowing the Latest Datasets from Other Populations

The period coverage in HCD varies by country as datasets for countries are uploaded asynchronously. This implies that some countries have more up-to-date datasets than others, see Figure 21. This offers opportunities to fuse data from other countries to update domestic forecasts. For Age-Period-Cohort models, such as Li and Lee (2005), this is generally challenging as the model fitting relies on having a rectangular dataset. Our MOGP framework can straightforwardly handle such ‘notched’ datasets to take full advantage of additional observations in different countries.

As an illustration, we choose Czech Males as a reference population and examine one-year-out prediction quality for several models. We take Czech observations for 1998–2015 and borrow the more up-to-date dataset from Germany (1998–2016) to implement the notched setup. Table 4 in the Appendix describes the setup of the different models we consider in this experiment.

Figure 20:

Comparison of the prediction accuracy for 2016 all-Cancer log-mortality of Czech Males for indicated Age groups between different models with ‘notched’ setup. Top row: Predictive standard deviation

; bottom row: discrepancy between the predictive mean and the observed value .

Figure 20 visualizes relative performance by comparing two components: (a) predictive standard deviation of the underlying 2016 (top panels), and (b) prediction errors relative to the realized 2016 all-Cancer log-mortality rates of Czech Males (bottom). Our benchmark is Czech Males SOGP fitted on all-Cancer log-mortality rates from 1998–2016. We see that a Country-Cause MOGP yields more credible predictions (lower ) than the benchmark model that performs in-sampled smoothing of the 2016 Czech all-Cancer experience. The forecasts from this joint model are nearly as good as the benchmark as both models produce the smallest prediction errors. In fact, the prediction quality of Country-Cause MOGP is as competitive as the Multi-cause MOGP that simultaneously models the log-mortality rates of all 5 cancer types in Czech Males with 2016 observations available. Thus, borrowing the latest information from Germany improves the prediction quality of the recent all-Cancer mortality rates of Czech Males. The results are as good as when we have domestic data available in Czech Republic.


At the moment, the HCD offers cause-of-death data for less than 15 countries. Many countries do not have recent data available yet (e.g. only up to 2014), leading to limited options in terms of the selection and the number of countries we can incorporate with Czech Males in this experiment. Based on our analysis in

Huynh and Ludkovski (2021), choosing countries that are highly correlated with Czech Males is essential to maximize the prediction quality.

6 Conclusion

In this article, we develop multi-output GP models for cause-specific mortality modeling within a multi-population context. With the MOGP mechanism, we are able to capture the cross-cause dependencies that allow joint models to gain more predictive power over single-output models that treat each population independently. Among MOGP variants, SLFM offers more flexibility and is recommended for modeling heterogenous causes, such as top-level ICD categories. Multi-level ICM is demonstrated to work well for interpretable modeling across multiple factor inputs. Our case studies show the applicability of MOGPs to understand cause-specific and aggregate mortality trends, both within a country and across nations, whereby our framework is convenient for information fusion and credibility boosting.

The current work focuses on exploiting the structured Kronecker covariance to efficiently learn the joint covariance kernel. This is sufficient for handling a moderately large number of sub-populations (up to 30 in our case studies); additional techniques would be needed to handle larger datasets, for example across more causes-of-death or across all the countries in HCD. There is an active research area looking at alternative methods (local kernel interpolation, inducing points, block structures, etc., see e.g.,

Flaxman et al. (2015)) for massive scalable GP well-suited for gridded mortality datasets. Another methodological extension would be to consider a Linear Coregionalization Model (LCM) for mortality modeling. LCM generalizes ICM and SLFM and allows multiple latent functions from GP priors with different covariance kernels. A third direction would be to investigate other kernel families, such as composite kernels or kernels that can incorporate structural changes. The latter would be useful to model (sub-)causes that exhibit strong disruptions over time in their mortality trends.

Appendix A Additional Plots for the Case Study on Cancer Sub-Types

Year ranges available in the HCD by country

Figure 21: Countries in the HCD and their historical data coverage.

Age patterns of log-mortality rates of different cancers

Figure 22: Raw log mortality as a function of Age for five Cancer variations in Male populations. We show 19 faint curves for each year in 1998–2016 range used for training, as well as the respective average historical log-mortality rate in bold.

Kernel Hyperparameter Learning

Searching for optimal hyperparameters in GP can be challenging if the marginal likelihood features multiple local maxima or flattens around its global maximum (Rasmussen and Williams 2005). When the optimizer fails to find the global maximum, unsuitable length-scales in Age and Year sometimes result, leading to a poor fit of the data. Table 3 reports the inferred length-scale in Age () and Year () of the SOGP models fitted on the 5 Cancer types for the Male populations in the three considered countries. Many SOGP models have being too short (less than 5 or so), resulting in oscillatory fitted ’s. Such models lack the ability to distinguish between true signal and the inherent randomness in the data. Similarly, when the estimated lengthscales are too large, the fitted GP surfaces are over-smoothed. Joint models tend to better learn the hyper-parameters by enabling data fusion across multiple populations and utilizing more observations. In Table 3 we show that when we fit multi-cause MOGP (both ICM and SLFM) on all 5 cancer types, the resulting lengthscales are all well calibrated.

Czech Rep. SOGP on each cancer type Multi-cause ICM Multi-cause SLFM
Stomach Colorectal Pancreas Lung Others () ()
159.92 16.32 6.21 17.69 12.52 16.90 21.26 19.67
38.49 9.03 7.82 13.87 3.69 10.12 13.05 11.42

Germany SOGP on each cancer type Multi-cause ICM Multi-cause SLFM
Stomach Colorectal Pancreas Lung Others () ()
8.65 3.72 8.99 0.00 2.79 8.59 11.75 9.46
7.35 4.90 7.69 4.82 4.44 10.26 9.92 6.77

Poland SOGP on each cancer type Multi-cause ICM Multi-cause SLFM
Stomach Colorectal Pancreas Lung Others () ()
23.96 16.28 5.72 15.25 22.80 16.98 16.20 21.89
253.34 3.69 6.19 9.04 6.01 12.83 11.19 12.11

Table 3: GP lengthscales in Age and Year for different models. All models are fitted on Age groups 50–84 and Years 1998–2016.

Training Designs in Notched Datasets

GP models Outcome variable Country Year Pred. Type Abbrev.
SOGP All-cancer log-mortality Czech R. 1998–2016 In-sample SOGP (’98-’16)
1998–2015 Out-of-sample SOGP (’98-’15)
Multi-cause GP
By-cancer log-mortality
(5 variations)
Czech R. 1998–2016 In-sample Multi-cause (’98-’16)
1998–2015 Out-of-sample Multi-cause (’98-’15)
Country-Cause GP
By-cancer log-mortality
(5 variations)
Czech R. 1998-2015 Out-of-sample Country-Cause
Germany 1998-2016
Table 4: Descriptions for models being applied to forecast 2016 all-Cancer log-mortality rates of Czech Males. All the models are fitted on Ages 50–84 (7 age groups) and Year coverage listed below.

Illustrating Latent Factor Loadings in SLFM

Figure 23: Factor loadings in the Country-Cause SLFM with . The model is fitted on Ages 50–84, Years 1990–2016, over 3 Countries and 5 Cancer types. For each of the 15 populations, we plot as a point in 3-space.

Appendix B Additional Plots for the US Top-Level-Cause Study

Adjusting Drug Overdose Trend

Figure 24: Comparison of original and adjusted predicted log-mortality for Age group 40–44, US Males and Females. Left: same model after reducing Drug Abuse yearly trend by one-third. Right: corresponding aggregated all-cause trends. Vertical lines indicate the edge of training data (1999–2018).

Cross-correlation Matrices for the US All-Cause Analysis

(a) US Males
(b) US Females
Figure 27: Cross-cause correlation matrices derived from multi-cause single-level () ICM MOGP, fitted on Ages 40–69 (6 age groups) and Years 1999–2018, separately for US Males and US Females.


  • Alai et al. [2018] Daniel H. Alai, Séverine Arnold (-Gaille), Madhavi Bajekal, and Andrés M. Villegas. Mind the gap: a study of cause-specific mortality by socioeconomic circumstances. North American Actuarial Journal, 22(2):161–181, 2018.
  • Alvarez et al. [2012] Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: a review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.
  • Anderson et al. [2001] Robert N Anderson, Arialdi M Miniño, Donna L Hoyert, and Harry M Rosenberg. Comparability of cause of death between ICD-9 and ICD-10: preliminary estimates. National Vital Statistics Report, 49(2):1–32, May 2001.
  • Arash et al. [2020] Etemadi Arash, Shakeri Ramin, Safiri Saeid, and Sadaf G Sepanlou. The global, regional, and national burden of stomach cancer in 195 countries, 1990–2017: a systematic analysis for the global burden of disease study 2017. The Lancet Gastroenterology and Hepatology, 5(1):42–54, 2020.
  • Arnold (-Gaille) and Sherris [2013] Séverine Arnold (-Gaille) and Michael Sherris. Forecasting mortality trends allowing for cause-of-death mortality dependence. North American Actuarial Journal, 17(4):273–282, 2013.
  • Bergeron-Boucher et al. [2017] Marie-Pier Bergeron-Boucher, Vladimir Canudas-Romo, James E. Oeppen, and James Vaupel. Coherent forecasts of mortality with compositional data analysis. Demographic Research, 37(17):527–566, 2017.
  • Bonilla et al. [2008] Edwin V Bonilla, Kian M. Chai, and Christopher Williams. Multi-task Gaussian Process prediction. In Advances in Neural Information Processing Systems 20, pages 153–160. Curran Associates, Inc., 2008.
  • Boumezoued et al. [2019] Alexandre Boumezoued, Jean-Baptiste Coulomb, Al Klein, Damien Louvet, and Eve Titon. Modeling and forecasting cause-of-death mortality. Technical report, Society of Actuaries, 2019.
  • Carpenter et al. [2017] Bob Carpenter, Andrew Gelman, Matthew D.Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: a probabilistic programming language. Journal of Statistical Software, 76(1), 2017.
  • Caruana [1997] Rich Caruana. Multi-task learning. Machine Learning, 28(1):41–75, Jul 1997.
  • Caselli [1996] Graziella Caselli. Future longevity among elderly populations. In Health and mortality among elderly populations, pages 235––265. Oxford University Press, 1996.
  • Caselli et al. [2019] Graziella Caselli, Jacques Vallin, and Marco Marsili. How useful are the causes of death when extrapolating mortality trends. an update. In Tommy Bengtsson and Nico Keilman, editors, Old and New Perspectives on Mortality Forecasting, pages 237–259. Springer International Publishing, 2019.
  • Deville et al. [2019] Yves Deville, David Ginsbourger, and Olivier Roustant. Contributors: Nicolas Durrande. kergp: Gaussian Process Laboratory, 2019. URL R package version 0.5.0.
  • Dimitrova et al. [2013] Dimitrina S. Dimitrova, Steven Haberman, and Vladimir K. Kaishev. Dependent competing risks: cause elimination and its impact on survival. Insurance: Mathematics and Economics, 53(2):464–477, 2013.
  • Dong et al. [2020] Yumo Dong, Fei Huang, Honglin Yu, and Steven Haberman.

    Multi-population mortality forecasting using tensor decomposition.

    Scandinavian Actuarial Journal, 2020(8):754–775, 2020.
  • Enchev et al. [2017] Vasil Enchev, Torsten Kleinow, and Andrew J. G. Cairns. Multi-population mortality models: fitting, forecasting and comparisons. Scandinavian Actuarial Journal, 2017(4):319–342, 2017.
  • Flaxman et al. [2015] Seth Flaxman, Andrew Wilson, Daniel Neill, Hannes Nickisch, and Alex Smola. Fast Kronecker inference in Gaussian processes with Non-Gaussian likelihoods. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 607–616, Lille, France, 07–09 Jul 2015. PMLR.
  • Foreman et al. [2018] Kyle Foreman, N. Marquez, A. Dolgert, K. Fukutaki, N. Fullman, M. McGaughey, Martin A Pletcher, A. Smith, Kendrick Tang, Chun-Wei Yuan, J. Brown, Joseph Friedman, J. He, Kyle Heuton, Mollie Holmberg, Disha J Patel, P. Reidy, Austin Carter, Kelly M Cercy, Abigail Chapin, D. Douwes-Schultz, Tahvi D Frank, F. Goettsch, P. Liu, Vishnu Nandakumar, M. Reitsma, Vincent Reuter, Nafis Sadat, Reed J. D. Sorensen, V. Srinivasan, R. Updike, Hunter York, Alan D. Lopez, R. Lozano, Stephen S. Lim, A. Mokdad, S. Vollset, and C. Murray. Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016–40 for 195 countries and territories. Lancet (London, England), 392:2052 – 2090, 2018.
  • Gilboa et al. [2015] Elad Gilboa, Yunus Saatçi, and John P. Cunningham. Scaling multidimensional inference for structured Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):424–436, 2015.
  • Guibert et al. [2019] Quentin Guibert, Olivier Lopez, and Pierrick Piette. Forecasting mortality rate improvements with a high-dimensional VAR. Insurance: Mathematics and Economics, 88:255–272, 2019.
  • HCD [2021] HCD. The Human Cause-of-Death Database. French Institute for Demographic Studies (France) and Max Planck Institute for Demographic Research (Germany), 2021. URL
  • Huynh and Ludkovski [2021] Nhan Huynh and Mike Ludkovski. Multi-output Gaussian processes for multi-population longevity modelling. Annals of Actuarial Science, 15(2):318–345, 2021.
  • Huynh et al. [2020] Nhan Huynh, Mike Ludkovski, and Howard Zail. Multi-population longevity models: a spatial random field approach. Proceedings of the Society of Actuaries 2020 Living to 100 Symposium, 2020.
  • Hyndman et al. [2013] Rob J Hyndman, Heather Booth, and Farah Yasmeen. Coherent mortality forecasting: the product-ratio method with functional time series models. Demography, 50(1):261–283, 2013.
  • Kjaergaard et al. [2019] Søren Kjaergaard, Yunus Emre Ergemen, Malene Kallestrup-Lamb, Jim Oeppen, and Rune Lindahl-Jacobsen. Forecasting causes of death by using compositional data analysis: the case of cancer deaths. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(5):1351–1370, 2019.
  • Kleinow [2015] Torsten Kleinow. A common age effect model for the mortality of multiple populations. Insurance: Mathematics and Economics, 63:147–152, 2015.
  • Knudsen and McNown [1993] Christin Knudsen and Robert McNown. Changing causes of death and the sex differential in the USA: Recent trends and projections. Population Research and Policy Review, 12(1):27–41, 1993.
  • Letham and Bakshy [2019] Benjamin Letham and Eytan Bakshy. Bayesian optimization for policy search via online-offline experimentation. Journal of Machine Learning Research, 20(145):1–30, 2019.
  • Li and Lu [2017] Hong Li and Yang Lu. Coherent forecasting of mortality rates: A sparse vector-autoregression approach. ASTIN Bulletin: The Journal of the IAA, 47(2):563–600, 2017.
  • Li and Lu [2019] Hong Li and Yang Lu. Modeling cause-of-death mortality using hierarchical Archimedean copula. Scandinavian Actuarial Journal, 2019(3):247–272, 2019.
  • Li and Lee [2005] Nan Li and Ronald Lee. Coherent mortality forecasts for a group of populations: An extension of the Lee-Carter method. Demography, 42(3):575–594, 2005.
  • Liu et al. [2019] Haitao Liu, Yew-Soon Ong, Xiaobo Shen, and Jianfei Cai. When Gaussian process meets big data: A review of scalable GPs, 2019. Arxiv 1807.01065.
  • Lo and Wilke [2010] Simon M. S. Lo and Ralf A. Wilke. A copula model for dependent competing risks. Journal of the Royal Statistical Society. Series C (Applied Statistics), 59(2):359–376, 2010.
  • Ludkovski et al. [2018] Mike Ludkovski, Jimmy Risk, and Howard Zail. Gaussian process models for mortality rates and improvement factors. ASTIN Bulletin: The Journal of the IAA, 48(3):1307–1347, 2018.
  • Lyu et al. [2021] Pintao Lyu, Anja De Waegenaere, and Bertrand Melenberg. A multi-population approach to forecasting all-cause mortality using cause-of-death mortality data. North American Actuarial Journal, 25(1):S421–S456, 2021.
  • McNown and Rogers [1992] Robert McNown and Andrei Rogers. Forecasting cause-specific mortality using time series methods. International Journal of Forecasting, 8(3):413–432, 1992.
  • Rasmussen and Williams [2005] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
  • Saatçi [2011] Yunus Saatçi. Scalable Inference for Structured Gaussian Process Models. PhD thesis, University of Cambridge, 2011.
  • Tabeau et al. [1999] Ewa Tabeau, Peter Ekamper, Corina Huisman, and Alinda Bosch. Improving overall mortality forecasts by analysing cause-of-death, period and cohort effects in trends. European Journal of Population / Revue Européenne de Démographie, 15(2):153–183, 1999.
  • Teh et al. [2005] Y. Teh, M. Seeger, and Michael I. Jordan. Semiparametric latent factor models. In

    International Workshop on Artificial Intelligence and Statistics (AISTATS)

    , pages 333–340. PMLR, 2005.
  • Tsai and Zhang [2019] Cary Chi-Liang Tsai and Ying Zhang. A multi-dimensional Bühlmann credibility approach to modeling multi-population mortality rates. Scandinavian Actuarial Journal, 2019(5):406–431, 2019.
  • WHO [2021] WHO. Cancer, 2021. URL
  • Williams et al. [2009] Christopher Williams, Stefan Klanke, Sethu Vijayakumar, and Kian Chai. Multi-task Gaussian process learning of robot inverse dynamics. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2009.
  • Wilmoth [1995] John R. Wilmoth. Are mortality projections always more pessimistic when disaggregated by cause of death? Mathematical Population Studies, 5(4):293–319, 1995.
  • Wojtyś et al. [2014] Piotr Wojtyś, Andrzej Antczak, and Dariusz Godlewski. Predictions of cancer mortality in Poland in 2020. Central European Journal of Medicine, 9:667–675, 2014.
  • Zhe et al. [2019] Shandian Zhe, Wei Xing, and Robert M. Kirby. Scalable high-order Gaussian process regression. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 2611–2620. PMLR, 16–18 Apr 2019.