1 Net survival and excess mortality hazard model
Survival analysis after the diagnosis of cancer is an active research area in cancer epidemiology and a primary interest of many countries. The three typical frameworks adopted to model cancer survival data are: (i) the overall survival setting (where overall or allcause mortality is studied), (ii) the causespecific setting (where the cause of death is known), and (iii) the relative survival setting. The overall survival setting is not the optimal choice for cancer epidemiology when the main interest is on comparing two or more populations (e.g. two countries, two periods in the same country, two groups with different deprivation levels within the same country and at the same period, and etcetera), since it is affected by other causes of mortality, which may differ for the populations of interest. In practice, the cause of death is typically unavailable or, in the absence of a standardised protocol, not reliable. Thus, the causespecific setting may not be a reasonable choice either. The relative survival setting (PoharPerme et al., 2016) represents a useful alternative, where the mortality hazard associated to other causes is approximated by the general population hazard , which is typically obtained from life tables based on the available sociodemographic characteristics (e.g.
sex, region and deprivation level) in addition to age and year. The general population hazard is also referred to as the “expected hazard” and the “background mortality rate”. While defined in a hypothetical world, where patients could only die from the cancer under study, net survival (the main object of interest in the relative survival setting) represents a useful way of reporting and comparing the probability of survival of cancer patients since this quantity is not affected by differences in expected mortality (due to other causes) between populations. The basic idea behind net survival consists of decomposing the hazard function associated to an individual,
, as the sum of the hazard associated to the disease of interest (e.g. a specific cancer), , and the hazard associated to other causes in the population of interest, . The hazard associated to other causes is approximated with the general population hazard , assuming that the contribution in the general population hazard of a specific cancer type is small compared to all other causes of death. This is:(1) 
where and
are vectors of covariates, and
typically corresponds to a subset of covariates of . The variables “” and “” represent the age at diagnosis and the year of diagnosis, respectively, thus and represent the age and the year at time after diagnosis. This model is also known as excess hazard model (Esteve et al., 1990). The net survival is defined as the survival function associated to the excess hazard function . Estimation of the net survival function has been largely studied from parametric, semiparametric, and nonparametric perspectives (see Remontet et al., 2007, Pohar Perme et al., 2009, Perme et al., 2012, and Rubio et al., 2018).In practice, a limitation of quantities derived in the relative survival setting (Perme et al., 2012), such as the excess hazard, is that patients need to be matched to groups of the general population sharing the available characteristics in order to obtain the background mortality rates from the life tables. The number of available characteristics varies for different countries, and there exist certain characteristics that are not available at the population level that may affect the background mortality rates such as deprivation, ethnicity, drug use (tobacco, alcohol, and etcetera), comorbidities, among others. For instance, it has been shown that deprivation levels are associated with life expectancy in some populations (Woods et al., 2005). Thus, if life tables are obtained without a proper stratification of deprivation levels, this will imply that two individuals with different deprivation levels (e.g. most affluent vs. most deprived), but sharing other sociodemographic characteristics, will be assigned the same background mortality. We will refer to the case when the life table is not sufficiently stratified as a “mismatch in the life table”. This mismatch may also apply to other characteristics (e.g. smoking status), thus potentially impacting the comparison ability of the net survival measure (Pavlič and PoharPerme, 2018). Moreover, Dickman et al. (1998) and Grafféo et al. (2012) showed that not including relevant information for matching the mortality rates of patients induces a bias in the estimation of the parameters in net survival models (even for other variables than those included in the life tables). Concerns about the implications of this kind of mismatches in the life tables have recently been discussed in Pavlič and PoharPerme (2018) and Bower et al. (2018). Thus, it is desirable to produce models that can capture possible mismatching information in the background mortality rates associated to each patient, and that allow the assessment of the impact of this mismatch.
In this work, we study an existing model for correcting the background mortality using a single correction parameter (Cheuvart and Ryan, 1991), and extend it to a general hazard structure for excess hazard regression models (Rubio et al., 2018). We also propose a correction model using a random effect (instead of a single parameter). For each of these models, we study the properties of maximum likelihood estimation of the corresponding parameters. We assess the performance of this inferential procedure in an extensive simulation study. We apply and compare these methods using a lung cancer data example. We conclude with some practical advice and summarise the conditions and limitations of these methods, as well as potential directions for further research.
2 Corrections of the background mortality in excess hazard regression
In this section, we present two parametric corrections that can account for mismatches in the life table. The first one corresponds to the single parameter correction proposed by Cheuvart and Ryan (1991). The second one corresponds to our proposal, which assumes that the correction to the background mortality is random and can be modelled with a parametric distribution. Next, we present a brief summary of these methods, as well as some of their properties and limitations.
2.1 Single parameter correction: Cheuvart and Ryan’s approach
In the context of clinical trials, Cheuvart and Ryan (1991) proposed an extension of model (1) in which they allowed for a constant correction on the population hazard: the overall hazard after years of followup for a patient who entered the trial at age in year and with covariates was assumed to be
(2) 
where is an unknown parameter differentiating the competing mortality of eligible patients from that of the general population. Cheuvart and Ryan (1991) employ the proportional excess hazard model , where is the baseline excess hazard with parameter , is a vector of regression parameters, including the effects of treatment and other prognostic factors. In the context of clinical trials, the correction parameter can be reasonably assumed to be the same for all individuals since the population entering the trials is selected based on their characteristics (usually patients without comorbid conditions, and etcetera).
Model (2) can be rewritten in terms of the cumulative hazard and the survival functions as follows
(3)  
(4) 
where and are the cumulative hazard functions obtained from and , respectively.
In principle, model (2) can also be used to account for mismatched life tables, as the correction is made on the population hazard. The basic assumption behind this model is that the true competing mortality is proportional to the mortality of the population obtained from the life tables, and the correction is the same for all individuals. Expression (3) also indicates that the information used for estimating the additional correction parameter comes from the differences in the cumulative hazard , which are not used in the estimation of the classical model (1).
2.2 Frailty correction for the population hazard
One limitation of model (2) is that the correction , made on the population hazard, is assumed to be constant across all the individuals. This assumption may not be realistic in population studies, where the diversity of unavailable sociodemographic characteristics used to obtain the life tables may induce a nonconstant mismatch. Thus, instead of assuming that is constant in model (2), we assume that
is a positive continuous random variable. This implies that the correction factor
is allowed to vary across the different individuals. More specifically, consider the conditional hazard model(5)  
where
is an arbitrary absolutely continuous cumulative distribution function with support on
. The conditional overall survival function is given byThen, after integrating out the frailty with respect to the distribution (see Appendix), the individual marginal overall survival function can be written as
(6) 
where denotes the Laplace transform of evaluated at time . Next, we consider a specific choice for the distribution
: a Gamma distribution. This choice allows for obtaining a closedform expression of the marginal survival function, in addition to its appealing flexibility and interpretability of parameters.
Gamma Frailty
Consider the conditional hazard model (5) and suppose that , where denotes a Gamma distribution with mean parameter , scale parameter
, and probability density function
. Then, it follows that
The marginal individual survival function is given by
(7) where and represent the cumulative hazards associated to and , respectively.

The marginal individual hazard function is given by
(8)
Expression (8) provides a nice interpretation of our approach since the observed hazard can be seen as a model with a correction function , which is identifiable and provides a functional form that involves the mean correction, the scale or spread of the correction, and the differences on the population cumulative hazards. Another property of the correction function is that , which indicates that the correction model (2) is a limit case. This property, however, does not imply that the correction parameter in (2) represents the mean correction when , as shown in our simulations. We can also observe that if two patients have the same sociodemographic characteristics (or virtually the same) but different survival times, then the corresponding value of the weight will differ since the corresponding differences in the cumulative hazards will be different. This, intuitively, suggests that the more extreme the survival time of a patient is (either too small or too large), compared to other patients with the same sociodemographic characteristics, the more likely this patient has been assigned the incorrect mortality rate from the life table. This approach implicitly assumes that the excess hazard model is correctly specified and includes all important prognosis factors (see Discussion). It also indicates that the information about the parameters of the frailty distribution is provided by the variability in the observed survival times of similar individuals regarding their sociodemographic characteristics and tumour prognosis factors. This suggests the need for a certain amount of observations for groups of patients with similar characteristics.
2.3 Hazard structure and baseline hazard
For models (1), (2), and (8), we adopt the general excess hazard model (GH) proposed in Rubio et al. (2018):
(9)  
where is the baseline hazard. This hazard structure contains, as particular cases, the proportional hazards (PH) model when , the accelerated hazards (AH) model when , the accelerated failure time (AFT) model when , as well as combinations of these for . Thus, this structure covers the most popular hazard structures used in the literature, allowing also to capture timedependent effects through (i.e. effects which are not assumed to be constant over the whole followup period, such as the effects estimated in PH models). For an extensive discussion on the properties of this GH structure, we refer the reader to Rubio et al. (2018).
The baseline hazard in (9) will be modelled using the Exponentiated Weibull (EW) distribution. The Exponentiated Weibull distribution contains three positive parameters (shape, scale, and power), and the corresponding hazard function can capture some basic shapes: increasing, decreasing, unimodal (upthendown), bathtub (downthenup) and constant. The Exponentiated Weibull density and cumulative distribution functions with shape, scale, and power parameters are given, respectively, by:
Thus, the combination of the GH structure with the choice of the EW baseline hazards can capture a variety of hazard structures, timedependent effects (through the parameters in ), and baseline hazard shapes, while allowing for a parsimonious implementation of all the proposed models (Rubio et al., 2018).
3 Inference
3.1 The classical model
Let , , be the sample of times to event from a population of cancer patients, with covariates , and vital status indicators (death, censored).
For the classical model (1), we will rely on the maximum likelihood estimation method, which has been shown to have good inferential properties with the hazard structure detailed in (9) (Rubio et al., 2018). With this model, the likelihood is defined as:
M1. Classical model (1):
where represent the parameters of the excess hazard model. In our case, . Notice that here for model (1), we omit the expected survival from the likelihood, as this quantity does not depend on parameters.
3.2 The models with correction of the background mortality
For the models of interest, (2) and (8), we will estimate the parameters using the maximum likelihood method. The corresponding likelihood functions are presented below for the singleparameter correction model (2) and the frailty correction model (8)
These likelihood functions can be maximised using standard optimisation routines from the R software (e.g. ‘nlminb’ or ‘optim’). In order to select between models M1–M3, we use the Akaike Information Criterion (AIC), as these models are estimated using the maximum likelihood method.
3.3 Choice of initial points for the optimisation process
For the optimisation process, we consider a twostep algorithm. In the first step, we initialise the search at the initial values: , where , , , , and then move from these initial values using one cycle of a coordinate descend algorithm (CDA), coupled with the R command ‘nlminb’ on each step of the CDA (see Wright, 2015 for more details on the CDA). The CDA is an algorithm which successively maximises an objective function along coordinate directions. In our case, we first obtain a new value of the first parameter, , after maximising the objective function with respect to , while setting the other parameter values at their initial values. Then, we maximise again the objective function, obtained by setting this time the initial values to to get . We repeat the same process for all other parameters to finally get a vector . In the second step, we utilise the values obtained with the CDA as new initial values in a general purpose optimisation algorithm (e.g. ‘nlminb’ or ‘optim’ from the R software), in order to obtain the MLE .
For model M2, we use the initial values , with , while for model M3, we employ the initial values , with , and . In practice, we recommend running the optimisation process at several initial points in order to ensure that the global maximum of the likelihood is reached.
4 Simulation Studies
In this section, we present a simulation study where we assess the impact of mismatches in the population hazard on the estimation of the excess hazard, as well as the performance of the proposed correction models (M2M3). The true values of the parameters are chosen in order to produce scenarios that resemble cancer population studies concerning an aggressive type of cancer (relatively low 5year net survival, see Appendix Table 1), such as lung cancer, similar to the simulations presented in Rubio et al. (2018). An additional extensive simulation scenario resembling a less aggressive type of cancer (such as colon cancer, see Appendix Table 2) is presented in the Appendix (Simulation design II).
4.1 Data Generation
Briefly, in the Simulation Design I, we simulated data sets of size , assuming the additive hazard decomposition given in (1
). The variable “age” was simulated as a continuous variable using a mixture of uniform distributions with
probability on , probability on , and probability onyears old. The binary variables “sex” and “W” were both simulated from a binomial distribution with probability
(the binary variable “W” could be viewed as “treatment”, or “comorbidity”, or “stage” (early and late)). In all scenarios, we simulated the “othercauses” time to event using the UK life tables based on “age” and “sex” (assuming that all patients were diagnosed on the same year). The time to event from the excess hazard (cancer death time) was generated using the inverse transform method, assuming effects of the 3 variables “age”, “sex” and “W” and an Exponentiated Weibull distribution. We assumed either (i) only administrative censoring at years, which induced approximatelycensoring in all cases, or (ii) an additional independent random censoring (dropout) using an exponential distribution with rate parameter
, inducing approximately censoring in these cases. Given the GH structure (9) adopted for the simulation, all variables affect the time scale (i.e. timedependent effects) as well as the hazard scale (i.e. changing the level of the hazard). We refer the reader to Rubio et al. (2018) for a more detailed discussion on the interpretation of the GH structure.We consider 4 scenarios that represent mismatches in the life tables: (i) No mismatch, where the data are generated from model (1); (ii) Moderate mismatch, where the data are generated from the hazard model (5) with ; (iii) Severe mismatch, where the data are generated from the hazard model (5) with ; and (iv) Wide mismatch, where the data are generated from the hazard model (5) with (see Figure 1). In all of these scenarios we fit models M1–M3. We also consider selecting between models M1–M3 using AIC, in order to identify the model favoured by the data. The estimates of the parameters of the excess hazard model selected using AIC are reported, and this model is referred to as model M4. For model M4, we report the estimated correction parameter associated to the selected model. This is, if AIC selects model M1, if AIC selects model M2, and
if AIC selects model M3. We report coverage of the asymptotic normal confidence intervals in all the scenarios.
4.2 Simulation Results
For illustration, Table 1 and Figure 2 show the results for the scenario with , 30% of censoring and a Wide mismatch (iv). The results of the remaining scenarios are shown in the Appendix, as well as the results with sample size of and the simulation design II with a higher net survival. As expected, we observe that when there is Severe and Wide mismatch (iii)–(iv), the model without correction (M1) will lead to biased parameters estimates, as well as poor coverage (Table 1, Figure 2, Appendix Tables 5, 8, 9 and Appendix Figures 4, 7, 8). This is reflected in the MLEs of the model parameters as well as on the fitted excess hazards. The bias of the parameter estimates and the poor coverages of M1 are more pronounced in the simulation design II with high net survival (Appendix Tables 12, 13, 16, 17 and Appendix Figures 11, 12, 15, 16). In scenarios (i) and (ii) with No or Moderate mismatch, the fitted correction models M2 and M3 are centred around the true generating model, although they exhibit (as expected) a slightly larger variability compared to model M1 (Appendix Tables 3, 4, 6, 7, 10, 11, 14, 15, and Appendix Figures 2, 3, 5, 6, 9, 10, 13, 14). In scenario (iii) with Severe mismatch, the fitted correction models M2 and M3 properly correct the mismatch as these models are centred around the true generating model, with the cost of higher variability. The parameters estimated with model M1 are biased with a very low coverage (Appendix Tables 5, 8, 12, 16 and Appendix Figures 4, 7, 11, 15). In scenario (iv) with Wide mismatch, the fitted models M1 and M2 are biased and far from the true generating model. On the other hand, model M3 can capture this mismatch and reduce the bias properly (Table 1, Figure 2, Appendix Tables 9, 13, 17 and Appendix Figures 8, 12, 16). The models selected with AIC (M4) are also centred around the true generating model in all scenarios. In cases with no mismatch or moderate mismatch, we observe that the bias on the estimates is small, the coverage is close to , and all models tend to be relatively close to the true generating model. In those cases, the AIC tends to favour model M1. Table 18 in the Appendix shows the proportion of selected models using AIC. Overall, we can see that M3 is favoured in scenario (iv) with Wide mismatch. In the scenario (iii) with Severe mismatch, the proportion of selected models depends on the contribution of the excess hazard compared to the population hazard. In general, selecting the models using AIC or another model selection tool is advised in order to identify the need for correcting the population hazard. The simulation study results obtained with per sample were very similar to the ones obtained with .
Design I:
Model  Parameter  MMLE  mMLE  ESD  Mean Std Error  RMSE  Coverage 

M1  (1.75)  1.212  1.214  0.228  0.218  0.584  0.464 
(0.6)  0.592  0.593  0.046  0.045  0.047  0.934  
(2.5)  2.365  2.315  0.328  0.308  0.355  0.887  
(0.1)  0.119  0.118  0.012  0.012  0.023  0.700  
(0.1)  0.409  0.405  0.214  0.220  0.376  0.711  
(0.1)  0.034  0.030  0.233  0.221  0.268  0.902  
(0.05)  0.065  0.065  0.002  0.002  0.015  0.000  
(0.2)  0.296  0.296  0.047  0.045  0.107  0.449  
(0.25)  0.161  0.159  0.047  0.046  0.101  0.484  
M2  (1.75)  1.284  1.308  0.232  0.243  0.520  0.741 
(0.6)  0.625  0.635  0.060  0.065  0.065  0.845  
(2.5)  2.199  2.106  0.412  0.402  0.510  0.689  
(0.1)  0.119  0.118  0.012  0.013  0.023  0.715  
(0.1)  0.411  0.406  0.215  0.223  0.378  0.717  
(0.1)  0.011  0.001  0.225  0.225  0.251  0.928  
(0.05)  0.066  0.067  0.003  0.003  0.017  0.004  
(0.2)  0.308  0.309  0.046  0.047  0.117  0.352  
(0.25)  0.155  0.152  0.044  0.045  0.105  0.412  
(6.5)  0.453  0.001  0.652  0.549  6.082  0.846  
M3  (1.75)  1.406  1.323  0.745  0.721  0.820  0.978 
(0.6)  0.543  0.546  0.127  0.122  0.139  0.966  
(2.5)  4.082  2.992  7.019  3.070  7.192  0.970  
(0.1)  0.099  0.099  0.028  0.024  0.028  0.913  
(0.1)  0.121  0.134  0.340  0.339  0.340  0.959  
(0.1)  0.089  0.103  0.354  0.326  0.354  0.950  
(0.05)  0.048  0.049  0.009  0.009  0.010  0.948  
(0.2)  0.187  0.198  0.089  0.089  0.090  0.954  
(0.25)  0.262  0.251  0.087  0.085  0.087  0.949  
(10)  13.060  9.072  12.247  10.379  12.618  0.847  
(6.5)  7.010  7.134  1.839  1.727  1.908  0.853  
M4  (1.75)  1.367  1.274  0.733  0.686  0.826  0.920 
(0.6)  0.540  0.546  0.122  0.117  0.136  0.968  
(2.5)  3.992  2.978  6.479  2.694  6.645  0.960  
(0.1)  0.100  0.100  0.028  0.023  0.028  0.895  
(0.1)  0.139  0.151  0.341  0.328  0.343  0.938  
(0.1)  0.082  0.092  0.350  0.317  0.350  0.945  
(0.05)  0.049  0.049  0.010  0.008  0.010  0.859  
(0.2)  0.194  0.203  0.092  0.086  0.093  0.904  
(0.25)  0.257  0.248  0.089  0.082  0.089  0.907  
(6.5)  6.709  7.137  2.470  –  2.477  – 
. Mean of the MLEs (MMLE), median of the MLEs (mMLE), empirical standard deviation (ESD), mean (estimated) standard error, rootmeansquare error (RMSE), and coverage proportions (Coverage).
Design I:
5 Application: Lung Cancer data
We now analyse a dataset obtained from populationbased national cancer registry of NonSmall Cell Lung Cancer (NSCLC) patients diagnosed in 2012 in England. For deriving information on stage at diagnosis and presence of comorbidities at the time of diagnosis, we linked these data to administrative data (Hospital Episode Statistics HES and the Lung Cancer Audit data LUCADA) and then applied specific algorithms (BenitezMajano et al., 2016; Maringe et al., 2017). We used a 6year period up to 6 months before diagnosis to retrieve information on comorbidity. We checked for the presence of cardiovascular comorbidity (at least one of: Myocardial infarction, Congestive heart failure, Peripheral vascular disease, and Cerebrovascular disease) and Chronic Obstructive Pulmonary Disease (COPD). We measured deprivation using the Income Domain from the 2010 England Indices of Multiple Deprivation, defined at the Lower Super Output Area level (mean population ). The Income Domain measures the proportion of the population in an area experiencing deprivation related to low income, and ranges from to in our data (https://www.gov.uk/government/statistics/englishindicesofdeprivation2010). Followup was assessed on the 31st of December 2015, at which time patients alive were censored (so the maximum followup was 4 years). We restricted our analysis to men with no missing data. We observed patients with complete cases among which died before the 31st of December 2015, and 17 patients were lost to followup (censored before the 31st of December 2015). The median followup among patients censored was years, mainly because of administrative censoring. The , and quantiles of the patients’ age at diagnosis was , , while the mean was . Among the patients, were diagnosed at Stage I, at Stage II, at Stage III, and at Stage IV. Finally,
patients were classified with a cardiovascular comorbidity and
with a chronic obstructive pulmonary disease.We applied models M1–M3 to estimate the excess mortality hazard using deprivationspecific life tables (detailed by sex, age, year and Government Office Region in addition to the deprivation quintile). Consequently, we are implicitly assuming that the variables age, sex, deprivation, tumour stage, and comorbidity accurately explain the excess hazard in NSCLC patients. The regression parameter estimates for the excess hazard models M1–M3, as well as the correction parameters (for models M2–M3), are reported in Table 2. For illustrating the results, the excess mortality hazard and the corresponding Net Survival for two predefined subgroups of patients are depicted in Figure 3.
In Table 2, between the three models (M1, M2 and M3), the AIC favours model M3 (i.e. the frailty correction model). Differences on estimates (regression coefficients) between M1 and M3 are substantial. For example the protective effect of Stage I cancer (compared to being diagnosed with a Stage IV cancer) is even higher when accounting for mismatched life tables. This interpretation follows by noticing that the two parameters associated with Stage I are negative (see Rubio et al., 2018 for details on those hazardstructure models and their interpretation). The impact of the presence of a comorbidity is higher in M3 compared to M1 and M2. Thus, correcting the population life table for unobserved predicting variables of background mortality seems to be quite relevant in this example. An unobserved variable which certainly affects the population mortality hazard here is smoking status. We observe that the frailty distribution, used for correcting the population mortality in M3, cumulates of the probability mass below , and above . That is, the value represents the quantile of the fitted Gamma frailty distribution with scale parameter and mean . These values are in fact related to the proportion of smokers (roughly , which would, in principle, require a correction higher than 1) for England lung cancer patients (Ellis et al., 2014), which provides an intuitive interpretation of the frailty distribution parameters. This interpretation, of course, has to be taken only at an intuitive level since the correction induced with the frailty model M3 is not interpretable in terms of a single missing characteristic, but it represents a combination of missing characteristics such as drug use (most likely tobacco in this case), the presence of comorbidities among other lifestyle related diseases and its impact on the general population mortality, and etcetera.
In order to evaluate the effect of not having deprivationspecific life tables, we have also fitted excess hazard models using life tables without the deprivation variable (i.e. national life tables). The results obtained with the deprivationspecific life tables and those obtained with the national life tables (excluding deprivation) are very similar (see section 5, Table 19, in the Appendix), which is due to the high lethality of lung cancer (inducing negligible differences of population mortality between deprivation groups). Figure 1 in the Appendix shows the net survival curves obtained for the whole population with models M1–M3 as well as the nonparametric PoharPerme estimator (Perme et al., 2012). We observe that model M1 and the PoharPerme estimator are virtually the same, which indicates that M1 can properly capture timedependent effects (Rubio et al., 2018), an assumption made for the correction models. Model M2 produces a net survival curve which is consistently above that obtained with M1. The net survival curve obtained with M3 is above all others, which is explained by the fact that most of the probability mass ( ) of the frailty correction is above . We compared estimates of the excess mortality hazard and the corresponding net survival for two subgroups of patients (Figure 3). For stageIV patients, the differences between each model is almost not visible (upper panels), while the difference between models M1–M3 could be more clearly seen in stage II patients subgroup (lower panels).
M1  M2  M3  

–  –  9.83 (3.03)  
–  2.7 (0.21)  6.54 (0.91)  
0.05 (0.01)  0.03 (0.01)  0.03 (0.01)  
0.38 (0.01)  0.35 (0.01)  0.34 (0.01)  
4.64 (0.34)  5.64 (0.48)  5.92 (0.58)  
Aget  0.29 (0.04)  0.29 (0.04)  0.16 (0.05) 
Dept  0.11 (0.04)  0.12 (0.04)  0.09 (0.04) 
Stage 1t  2.66 (0.25)  2.17 (0.32)  5.4 (1.4) 
Stage 2t  2.2 (0.2)  2 (0.22)  2.69 (0.35) 
Stage 3t  1.66 (0.11)  1.57 (0.11)  1.75 (0.13) 
CVt  0.31 (0.11)  0.31 (0.11)  0.42 (0.11) 
COPDt  0.13 (0.11)  0.08 (0.12)  0.37 (0.14) 
Age  0.27 (0.01)  0.23 (0.02)  0.16 (0.02) 
Dep  0.06 (0.01)  0.06 (0.01)  0.04 (0.01) 
Stage 1  2.84 (0.06)  3.13 (0.1)  3.53 (0.36) 
Stage 2  2.16 (0.06)  2.32 (0.07)  2.65 (0.1) 
Stage 3  1.23 (0.03)  1.27 (0.04)  1.36 (0.04) 
CV  0.24 (0.04)  0.26 (0.04)  0.3 (0.04) 
COPD  0.19 (0.04)  0.17 (0.04)  0.25 (0.05) 
AIC  20304.69  20241.27  20213.41 
6 Discussion
6.1 Summary of findings
Using the general hazard structure in Rubio et al. (2018), we have proposed excess hazard regression models that can account for mismatches in the life tables induced by the unavailability of information on relevant population characteristics. The correction models based on a frailty distribution account for nonspecific mismatches in the life table, in the sense that the correction is not associated to the lack of known specific variables for constructing the life tables, but to the effect of potentially several unavailable characteristics, which is allowed to be different for each patient. This is the main difference with Cheuvart’s model (2), which we used here for comparison purpose even though this model was mainly developed in the context of a randomised clinical trial when a selection bias of the patient population is expected. Thus, Cheuvart’s model assumes the same constant correction parameter for all patients, which is the main difference with our proposed frailty correction models used in populationbased cancer registry data. We have shown that the proposed frailty correction models are able to properly identify and correct these mismatches in several simulation scenarios, provided that the sample size is large enough ( or more). Not accounting for mismatched life tables in the relative survival setting may lead to inappropriate net survival comparisons between populations. The need for relatively large samples in order to identify mismatches in the life tables is unsurprising, as there may be several reasons why general population life tables are not fit for our cancer patient population (drug use, lifestyle related diseases, deprivation, ethnicity, and etcetera), and only a large enough sample would guarantee that the data contains enough information to adjust for the unavailability of those variables. Intuitively, the information about these correction parameters is provided by the sample of differences of the population cumulative hazards . The implicit assumptions behind the proposed correction model M3 (8) are:

The set of covariates includes the relevant cancerspecific variables. Thus, all missing information (if any) is produced from a mismatch of the population mortality rate.

The model is properly specified. This is, the fitted excess hazard model is flexible enough to approximate the excess hazard.
Assumption (i) reflects the fact that the model was constructed to only capture mismatches in the life tables, since the correction parameter only affects the population mortality hazard. For instance, in our lung cancer data example, we assume that the excess hazard is accurately explained by age, sex, deprivation, tumour stage, and comorbidity. However, a potential risk factor that is not included there is the smoking status, which is not available at the population level in England. This risk factor might affect both the background mortality hazard (because of diseases or complications associated with smoking) and the cancerrelated mortality hazard, in the case that patients continue smoking after the diagnosis of cancer. Indeed, smoking is a driver of lung cancer incidence, but its impact on cancerrelated mortality hazard may not be that clear because comorbidity conditions already accounts (at least partially) to smokingrelated complications. Thus, the main interest of including smoking status in the predictor variables of the cancerrelated mortality hazard would be for patients who continue smoking after the diagnosis of cancer, and it would certainly be interesting to explore the effect of including this variable in the model once it becomes available. Assumption (ii) is important since model misspecification can also affect the correction made on the population hazard. If a covariate which appears in
and is wrongly modelled in (e.g.not accounting for timedependent effects), this may also affect the correction. In principle, this is not an onerous condition since one would usually aim at properly modelling the excess hazard, which is typically the main function of interest. Moreover, recent developments in the use of splines and parametric models
(Royston and Parmar, 2002; Giorgi et al., 2003; Nelson et al., 2007; Remontet et al., 2007; Charvat et al., 2016; Remontet et al., 2018; Rubio et al., 2018) allow for a tractable inclusion of nonlinear and timedependent effects in excess hazard models. We also assume that there is enough heterogeneity about the unobserved variables of interest. For instance, if we want to assess the impact of mismatched life tables in terms of deprivation, we assume that the sample contains large enough numbers of individuals with different deprivation levels. The amount of data required to accurately estimate the parameters of the correction models has been explored through a simulation study. Certainly, we would not recommend trying to correct for mismatches in the life tables in samples containing substantially fewer than observations, or with high censoring rates (e.g. higher than ). Overall, we have found that model M3 is a good option for accounting for mismatches in the life tables, provided a large enough sample. Its use however is not automatic, and should be analysed on a case by case basis. Comparing the results between corrected (M3) and uncorrected (M1) models, as well as the nonparametric PoharPerme estimator, is advisable in practice, in addition to the use of expert knowledge from clinicians or epidemiologists in order to understand and explain the source of the mismatches.6.2 Other models in the literature: shared and correlated frailty models
Zahl (1997) describes two extensions of longterm excess hazards models, where the main goal is to account for an increased risk of dying of other diseases in patients with certain cancers. The first extension consists of a shared frailty model in which a random effect (frailty) is multiplied by the population hazard and the excess hazard as follows:
(10) 
where , and is a distribution with positive support, typically chosen to be a Gamma distribution with unknown shape and scale parameters. Perhaps unsurprisingly, the induced model is nonidentifiable unless the random effect has unit mean. Given that the Gamma distribution is asymmetric, the assumption of unit mean implies that , which may not be a reasonable assumption in some scenarios since this implies that there is a higher probability of requiring a shrinking correction to the population hazard () than an increasing one (). For a general framework of frailty hazard models we refer the reader to Aalen et al. (2008).
Zahl (1997) proposed a correlated frailty model, by using frailties on both the population hazard and the excess hazard:
(11) 
where , and is a bivariate distribution with support on the positive quadrant. Intuitively, it is difficult (if at all feasible) to obtain information about the factors affecting the population hazard and the excess hazard (which is typically a flexible parametric model), and the dependencies between them, simultaneously. In fact, Zahl (1997) found that the maximum likelihood estimators of the parameters of model (11) do not exist, suggestive of identifiability issues of this model. Zahl (1997) proposed a number of restrictions of the parameter space (to a compact set) in order to alleviate these estimation issues. However, even after those restrictions, the MLE was on the boundary of the restricted parameter space, which suggests remaining lack of identifiability.
6.3 Further research
From the results of our simulation study, we have observed a larger variability in the estimators of additional parameters corresponding to the correction of the background mortality hazard. In order to reduce this variability, penalised maximum likelihood estimation methods could be used to shrink the correction parameter (i.e. for M2 or for M3) towards the value . This will be explored in future research.
From the simulation study, the coverage proportions of the additional parameter correcting the life table were lower than the nominal value. Using a robust estimator of the variance for this additional parameter may be an option to reach a better coverage, as may be calculating profile likelihood intervals.
Another extension of model (2) consists of modelling the correction parameters and in terms of a set of covariates, say . A related approach has recently been studied in Touraine et al. (2019). Possible limitations include the inferential challenges in estimating (the dimension of
) when the sample size is not large enough. In addition, the assumption of proportional population hazards is often too restrictive in the cancer survival field. In practice, one natural question is whether the Gamma frailty distribution is flexible enough to model the random correction. Using maximum likelihood estimation implies that the estimators of the parameters of the frailty distribution will converge to the values that minimise the distance (in fact, the KullbackLeibler divergence) to the true generating model. Section 6, Appendix Tables 2021 and Appendix Figures 1718, shows a simulated example where the random correction is simulated from a lognormal distribution (instead of Gamma). This example indicates that model M3 has a good performance even if the random corrections are not generated from a Gamma distribution, but as long as the Gamma distribution can approximate the shape of the true generating distribution. A possible extension consists of using a more flexible frailty distribution with a tractable Laplace transform, in order to obtain tractable expressions for the hazard and cumulative hazard functions. An attractive option is the power variance function (PVF) family of distributions
(Aalen et al., 2008), which contains three parameters instead of two. This, of course, complicates the estimation process.Software
Software in the form of R code, together with a sample input data set and complete documentation is available under request, and an R Markdown document entitled “Simulation design I: Excess hazard models for insufficiently stratified life tables” is available on the website http://www.rpubs.com/FJRubio/FGH and the GitHub repository https://github.com/FJRubio67/ExcessHazardModels.
Acknowledgements
The authors thank two referees, an Associate Editor, and the Editor for helpful comments.
We obtained the ethical and statutory approvals required for this research (PIAG 105(c)/2007; ECC 105(a)/2010); ethical approval updated 6 April 2017 (REC 13/LO/0610). We attest that we have obtained appropriate permissions and paid any required fees for use of copyright protected materials.
References
 Aalen et al. [2008] O. Aalen, O. Borgan, and H. Gjessing. Survival and event history analysis: a process point of view. SpringerVerlag, New York, 2008.
 BenitezMajano et al. [2016] S. BenitezMajano, H. Fowler, C. Maringe, C. Di Girolamo, and B. Rachet. Deriving stage at diagnosis from multiple populationbased sources: colorectal and lung cancer in England. British Journal of Cancer, 115:391, 2016.
 Bower et al. [2018] H. Bower, T.M.L. Andersson, M.J. Crowther, P.W. Dickman, M. Lambe, and P.C. Lambert. Adjusting expected mortality rates using information from a control population: An example using socioeconomic status. American Journal of Epidemiology, 187:828–836, 2018.
 Charvat et al. [2016] H. Charvat, L. Remontet, N. Bossard, L. Roche, O. Dejardin, B. Rachet, G. Launoy, and A. Belot. A multilevel excess hazard model to estimate net survival on hierarchical data allowing for nonlinear and nonproportional effects of covariates. Statistics in Medicine, 35:3066–3084, 2016.
 Cheuvart and Ryan [1991] B. Cheuvart and L. Ryan. Adjusting for agerelated competing mortality in longterm cancer clinical trials. Statistics in Medicine, 10:65–77, 1991.
 Dickman et al. [1998] P.W. Dickman, A. Auvinen, E.T. Voutilainen, and T. Hakulinen. Measuring social class differences in cancer patient survival: is it necessary to control for social class differences in general population mortality? a Finnish populationbased study. Journal of Epidemiology and Community Health, 52:727–734, 1998.
 Ellis et al. [2014] L. Ellis, M.P. Coleman, and B. Rachet. The impact of life tables adjusted for smoking on the socioeconomic difference in net survival for laryngeal and lung cancer. British Journal of Cancer, 111:195, 2014.
 Esteve et al. [1990] J. Esteve, E. Benhamou, M. Croasdale, and L. Raymond. Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine, 9:529–538, 1990.
 Giorgi et al. [2003] R. Giorgi, M. Abrahamowicz, C. Quantin, P. Bolard, J. Estève, J. Gouvernet, and J. Faivre. A relative survival regression model using Bspline functions to model nonproportional hazards. Statistics in Medicine, 22:2767–2784, 2003.
 Grafféo et al. [2012] N. Grafféo, V. Jooste, and R. Giorgi. The impact of additional lifetable variables on excess mortality estimates. Statistics in Medicine, 31:4219–4230, 2012.
 Maringe et al. [2017] C. Maringe, H. Fowler, B. Rachet, and M.A. LuqueFernandez. Reproducibility, reliability and validity of populationbased administrative health data for the assessment of cancer nonrelated comorbidities. PloS One, 12:e0172814, 2017.
 Nelson et al. [2007] C.P. Nelson, P.C. Lambert, I.B. Squire, and D.R. Jones. Flexible parametric models for relative survival with application in coronary heart disease. Statistics in Medicine, 26:5486–5498, 2007.
 Pavlič and PoharPerme [2018] K. Pavlič and M. PoharPerme. Using pseudoobservations for estimation in relative survival. Biostatistics, in press:na–na, 2018.
 Perme et al. [2012] M.P. Perme, J. Stare, and J. Estève. On estimation in relative survival. Biometrics, 68:113–120, 2012.
 Pohar Perme et al. [2009] M. Pohar Perme, R. Henderson, and J. Stare. An approach to estimation in relative survival regression. Biostatistics, 10:136–146, 2009.
 PoharPerme et al. [2016] M. PoharPerme, J. Estève, and B. Rachet. Analysing populationbased cancer survival–settling the controversies. BMC Cancer, 16:933, 2016.
 Remontet et al. [2007] L. Remontet, N. Bossard, A. Belot, and J. Esteve. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Statistics in Medicine, 26:2214–2228, 2007.
 Remontet et al. [2018] L. Remontet, Z. Uhry, N. Bossard, J. Iwaz, A. Belot, C. Danieli, H. Charvat, and L. Roche. Flexible and structured survival model for a simultaneous estimation of nonlinear and nonproportional effects and complex interactions between continuous variables: Performance of this multidimensional penalized spline approach in net survival trend analysis. Statistical Methods in Medical Research, in press:na–na, 2018.

Royston and Parmar [2002]
P. Royston and M.K.B. Parmar.
Flexible parametric proportionalhazards and proportionalodds models for censored survival data, with application to prognostic modelling and estimation of treatment effects.
Statistics in Medicine, 21:2175–2197, 2002.  Rubio et al. [2018] F.J. Rubio, L. Remontet, N.P. Jewell, and A. Belot. On a general structure for hazardbased regression models: an application to populationbased cancer research. Statistical Methods in Medical Research, in press:na–na, 2018.
 Touraine et al. [2019] C. Touraine, N. Grafféo, R. Giorgi, and the CENSUR working survival group. More accurate cancerrelated excess mortality through correcting background mortality for extra variables. Statistical Methods in Medical Research, in press:na–na, 2019.
 Woods et al. [2005] L.M. Woods, B. Rachet, M. Riga, N. Stone, A. Shah, and M.P. Coleman. Geographical variation in life expectancy at birth in england and wales is largely explained by deprivation. Journal of Epidemiology and Community Health, 59:115–120, 2005.
 Wright [2015] S.J. Wright. Coordinate descent algorithms. Mathematical Programming, 151:3–34, 2015.
 Zahl [1997] P.H. Zahl. Frailty modelling for the excess hazard. Statistics in Medicine, 16:1573–1585, 1997.