 # On an extension of the promotion time cure model

We consider the problem of estimating the distribution of time-to-event data that are subject to censoring and for which the event of interest might never occur, i.e., some subjects are cured. To model this kind of data in the presence of covariates, one of the leading semiparametric models is the promotion time cure model yakovlev1996, which adapts the Cox model to the presence of cured subjects. Estimating the conditional distribution results in a complicated constrained optimization problem, and inference is difficult as no closed-formula for the variance is available. We propose a new model, inspired by the Cox model, that leads to a simple estimation procedure and that presents a closed formula for the variance. We derive some asymptotic properties of the estimators and we show the practical behaviour of our procedure by means of simulations. We also apply our model and estimation method to a breast cancer data set.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Survival analysis has been the subject of many statistical studies in the past decades (see e.g. Klein and Moeschberger (1997); Therneau and Grambsch (2000)) and is commonly used in clinical trials (see e.g. Collett (2015)), where the traditional main goal is to explain the death of patients having a certain disease. When analysing the effect of some covariates on a survival time , a common approach in the literature is based on semiparametric estimation. The seminal paper by Cox (1972) introduces the so-called semiparametric proportional hazards model, often referred to as the Cox model, which is given by the following set of conditional survival functions defined on :

 P1={(t,x)↦S(t|x)=exp(−exp(γTx)Λ(t)) : γ∈Rd, Λ∈G},

where is the space of absolutely continuous cumulative hazard functions defined on . In this standard semiparametric model the elements are characterized by the Euclidean parameter

, called the regression vector, and the infinite dimensional parameter

, called the cumulative hazard. These parameters are estimated by maximizing the profile likelihood for (Cox, 1972) and by computing the Breslow estimator for (Breslow, 1972). Both estimators are nonparametric maximum likelihood estimators (NPMLE), as defined in Murphy (1994). Known asymptotic results include the asymptotic normality (Andersen and Gill, 1982), the semiparametric efficiency of the regression parameters (Ritov and Wellner, 1988) as well as the cumulative hazard (Bickel et al., 1993; Kosorok, 2008), and the validity of general bootstrap schemes (Wellner and Zhan, 1996).

In many data sets, especially the ones arising from clinical trials, a certain proportion of the individuals will never experience the event of interest. These individuals are referred to as the cured subjects. As the survival function does not tend to as in that case (but rather tends to the proportion of cured subjects), specific models need to be considered to account for the improperness of the distribution of . The promotion time cure model is an extension of the Cox model specially designed to handle the presence of cured subjects in the data. It is defined as the set of conditional survival functions defined on , given by

 P2={(t,x)↦S(t|x)=exp(−η(β1+βT2x)F(t)) : β=(β1,β2)∈Rd+1, F∈F},

where

denotes the space of absolutely continuous cumulative distribution functions on

and is a given function. This model was introduced by Yakovlev and Tsodikov (1996) and seems appropriate to treat cure data as, for every , , so that each subject has a positive chance of being cured. In model , the parameter vector has an intercept whereas in model does not. This is because and hence an intercept in model would not be identified, whereas in model the function is replaced by , which tends to 1 as . Estimation of has been studied by Tsodikov (1998a, b, 2001); Chen et al. (1999); Ibrahim et al. (2001); Tsodikov et al. (2003); Zeng et al. (2006); Portier et al. (2017), among many others. Certain parallels might be drawn between the statistical properties related to the estimators of the classical Cox model and the ones related to the promotion time cure model. The classical estimators of and are the NPMLE’s (Zeng et al., 2006). In Zeng et al. (2006), the authors show that the resulting NPMLE is asymptotically normal and moreover that the estimated vector of regression parameters is semiparametrically efficient. In Portier et al. (2017) it is shown that the whole model is estimated efficiently and the validity of a general weighted bootstrap is proved.

There is still an important difference between the NPMLE’s associated to models and . The NPMLE of the Cox model has a much simpler expression than the NPMLE of the promotion time cure model. Within model , the estimated regression parameter maximizes a known (explicit) objective function and the estimated cumulative hazard is expressed through a closed formula (Andersen and Gill, 1982). Within Model , the estimated regression parameter is also the maximizer of a certain objective function, but this time the objective function is implicitly defined (Portier et al., 2017). Moreover, the same is true for the estimated cumulative hazard in , which is only known up to some quantity implicitly defined. The previous features involve important complications that intervene at two different stages. First, estimators from are more difficult to describe, theoretically, than estimators from

. This eventually deteriorates the accuracy of the confidence intervals or of the testing procedures. Second, the computation of the estimators in

has some numerical difficulties, e.g., long computation time, problems with local minima, etc. Given this, the question is to know, whether or not, it is legitimate to rely on a complicated estimation procedure for ? In other words, does the presence of cured subjects in the data prevents us from having an estimation procedure as simple as in the Cox model?

The aim of this paper is to provide a new model dedicated to cure data analysis and for which the NPMLE overpasses the previous difficulties associated with .

The undesirable complications when estimating come from the particular nature of the parameter space . This space is formed by cumulative distribution functions that satisfy the constraint . Such a constraint is taken into account with the help of a Lagrange procedure involving an additional parameter being implicitly defined, the Lagrange multiplier. It turns out that this constraint can be alleviated by including an additional parameter in the model, replacing by , with . We define the set of conditional survival functions , given by

 P3={(t,x)↦S(t|x)=exp(−g(γ,x)θF(t)) : (γ,θ)∈Rq×R>0, F∈F},

where is a given function and . Note that in the present form, handles biological models as developed in Chen et al. (1999) to analyse time to relapse of cancer through the distribution of the carcinogenic cells. It includes also a cure version of the Cox model when . In this case, it coincides with for which . Otherwise and are different. In , the role of is interpreted as a simple multiplicative effect on the cumulative distribution, whereas the effect of in must be analysed depending on the shape of the function .

The main contributions of the paper are listed below.

1. As the NPMLE of is much simpler to evaluate than the one associated to , the proposed methodology provides a significant improvement in terms of computational ease. In particular, we show that the NPMLE’s associated with and coincide when and . Hence our approach provides a new way to compute the NPMLE of when (most commonly used) which is simpler than the existing procedure Zeng et al. (2006); Portier et al. (2017).

2. We derive the asymptotics of the NPMLE associated with

. As in the case of the Cox model, we have closed-formulas for the variance of the limiting Gaussian distributions. This allows us to develop some tests and to build confidence intervals on some quantities of interest as for instance the proportion of cure given the value of a covariate

. The finite sample size accuracy of the confidence intervals is investigated with the help of simulations.

3. Moreover, as the function needs to be chosen by the analyst, we consider a likelihood-based methodology to select an appropriate function among a family of proposals. Such an approach is also followed by Huang and Liu (2006), who investigate spline estimation of the function in the case of the classical Cox model.

In section 2 we present the framework of the paper and derive the NPMLE of model . We also consider the links with the NPMLE of . In section 3, the asymptotic behaviour of the NPMLE of is studied. In sections 4 and 5, we provide simulations and a real data analysis to give some insights in the finite sample performance of our approach. The proofs are collected in the Appendix.

## 2 The data, the model, the estimator

### 2.1 Framework

We focus on the standard right censoring context : the lifetime

of interest is right censored by some random variable

so that we only observe , and the vector of covariates . This means that we know whether the variable of interest has been observed or censored. The covariates are in contrast always observed, and we further denote by their support. We suppose conditional independence between and , given . In practice, as is the case for instance in clinical trials, might be bounded. This prevents us from observing any cured subjects, defined by . A way around this problem is to assume the existence of a threshold such that

 {T>τ}⇒{T=+∞}.

Therefore whenever will be observed to be greater than , the individual will be known to be cured. We use model for modelling the distribution of given . Hence we further assume that the conditional survival function of given is given by , for some , , and an absolutely continuous cumulative distribution function. Let

denote the probability measure associated to

. Supposing in addition that a.s., we obtain that a.s., meaning that every individual can be cured. These assumptions are stated in Section 3 in (H1).

A central object in our study is the counting process , , as it possesses some useful martingale properties as developed in Fleming and Harrington (1991) and Andersen et al. (1993). Define the random process , , with . It equals whenever the individual is still at risk. The presence of cure implies that has positive probability. The compensator of with respect to the -field generated by is the process . That is, defined by

 {M(0)=0dM(y)=dN(y)−g(γ0,X)R(y)θ0dF0(y),y∈R≥0

is a martingale with respect to (Fleming and Harrington, 1991, Theorem 1.3.1). In particular, we have the formula (Fleming and Harrington, 1991, Theorem 1.5.1)

 E[δh(Y,X)]=∫E[h(u,X)g(γ0,X)R(u)]θ0dF0(u) (1)

for any bounded measurable function . Finally, the following identity shall be useful : for any bounded measurable functions and , we have (Fleming and Harrington, 1991, Theorem 2.4.2)

 E[∫h(u)dM(u)∫~h(u)dM(u)]=∫h(u)~h(u)E[g(γ0,X)R(u)]dΛ0(u). (2)

### 2.2 Nonparametric maximum likelihood

Let denote a sequence of independent and identically distributed random variables with law , as described in the previous subsection. The underlying probability measure is denoted by . The estimator we consider shall be based on the observed variables : , , , . Let , , , and define the martingale , for .

Under the current data generating process, assuming that is absolutely continuous, and assuming non-informative censoring (Sasieni, 1992), the likelihood of an observation in model is given by

 Lik(y,δ,x)={g(γ,x)θf(y)}δ exp[−g(γ,x)θ{ΔF(y)+(1−Δ)}], (3)

where stands for the derivative of . Model can be re-written as the set of all survival functions of the form where and belongs to , the space of absolutely continuous cumulative hazards such that . Note that there is a one-to-one relationship between the two sets of parameters and , i.e., and . As a consequence the likelihood in (3) can be expressed in terms of or equivalently, in terms of . Switching from one parametrization to another is straightforward. For the sake of simplicity, we derive the NPMLE with respect to in the next few lines. By following Murphy (1994), the NPMLE is defined as

 (ˆγ,ˆΛ) =argmaxγ∈Rq,Λ n∑i=1[δilog(g(γ,Xi)Λ{Yi})−g(γ,Xi){ΔiΛ(Yi)+(1−Δi)Λ(+∞)}], (4)

the maximum is taken over lying in the space of cumulative hazard functions possibly discrete, and is the size of the jump of at . As is common practice for computing the NPMLE in semiparametric models, the above NPMLE might be profiled over the nuisance parameter (Murphy and Van der Vaart, 2000; Kosorok, 2008). Maximizing along submodels , , with a bounded real function, the value of which maximizes (4), for each , is a solution of

 n−1n∑i=1δih(Yi)−∫ˆQγ(u)h(u)dΛ(u)=0,

with . The solution of the previous equation is given by

 ˆΛγ(y)=n−1n∑i=1δi1{Yi≤y}ˆQγ(Yi),y∈R≥0.

This is then plugged into (4) to get that

 ⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩ˆγ∈argmaxγ∈Rq n∏i=1{g(γ,Xi)/ˆQγ(Yi)}δiˆΛ(y)=n−1n∑i=1ˆQˆγ(Yi)−1δi1{Yi≤y},y∈R≥0. (5)

Back to the parameters of Model , the NPMLE is given by

 ⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩ˆθ=n−1n∑i=1ˆQˆγ(Yi)−1δiˆF(y)=(ˆθn)−1n∑i=1ˆQˆγ(Yi)−1δi1{Yi≤y},y∈R≥0. (6)

At fixed sample size , the quantities involved in the previous equations are well defined as soon as, for instance, there exists such that and the maximum in (5) can be taken over a known compact set on which the function is continuous, for every .

Note also that the estimation of the parameters depends only on the observed variables such that , and such that , . It results that moving the threshold over , with , has no effect on the NPMLE. In practice the threshold could then be fixed at .

An important point in many situations is to evaluate the proportion of cured subjects in the population under study for a given covariate vector , i.e., . The estimator of , within our framework, naturally follows from the plug-in rule :

 ˆp(x)=exp(−g(ˆγ,x)ˆθ). (7)

### 2.3 Link with other estimators

#### 2.3.1 Cox and Breslow estimator

Model is aimed to handle the presence of cured subjects in the data whereas the traditional Cox model, , is not. However, when , (5) becomes very close to the well-known formulas of the classical Cox and Breslow estimator of and , respectively. As a consequence, the derivation of the asymptotics for is somewhat similar as in the case of the Cox and Breslow estimator, provided for instance in Andersen and Gill (1982). An interesting difference with the Cox and Breslow estimator comes from the fact that

 mini=1,…,nˆQˆγ(Yi)≥n−1n∑i=1g(γ,Xi)(1−Δi).

From the framework described in the previous section, we deduce that and , for every . Consequently, the decreasing function is bounded from below. In Lemma C.2, see the Appendix, this property is shown to hold for , uniformly in , with probability going to . This raises a significant difference with respect to classical Cox estimators in which the quantity corresponding to would go to at infinity. This in turn implies that the weak convergence of the rescaled will still hold over . This is in contrast with the case of the Cox model for which such a convergence holds on bounded intervals. We refer to Andersen and Gill (1982) for a discussion on the study of the Breslow estimator over .

#### 2.3.2 Promotion time cure estimator

The NPMLE for is given by (Portier et al., 2017),

 (8)

where and for every , is the smallest number verifying . Because the function is implicitly defined, it is more difficult to compute the NPMLE of through (8), than the one of through (5) and (6). In particular, solving (8) requires to run an optimization procedure over for which, at each iteration, we shall evaluate , by an additional procedure. When , it is actually useless to solve (8), since it gives the same results as (5) and (6). This is the statement of the following proposition.

###### Proposition 1.

Suppose that and for every . If there exists such that , then and .

## 3 Asymptotics

The asymptotic analysis of the NPMLE associated to model

is inspired from the approach developped for the Cox model in Andersen and Gill (1982). We may first derive the asymptotic behaviour of the -estimator , and then rely on functional Delta-method type arguments, to describe . The monographs of Van der Vaart and Wellner (1996) and Kosorok (2008) will be of good help at each of these steps to rely on suitable empirical process techniques. The preliminary study of and (given in sections 3.1, 3.2 and 3.3) will provide the basis to describe the behaviour of defined in (7).

As it is common for -estimators, the asymptotic study of starts with the establishment of its consistency. In contrast, for , we will rely on the explicit formula (5) to directly show the weak convergence of .

### 3.1 Consistency of ˆγ

The estimator is defined as a maximizer in (5). To obtain the consistency of , we classically show that (i) the maximum of the limiting function is well identified and that (ii) the convergence to this limiting function is uniform; see Newey and McFadden (1994, Theorem 2.1) or Van der Vaart (1998, Theorem 5.7). To obtain the identifiability, we need the following assumptions :

1. The variables and are independent given . Moreover, a.s., a.s., and .

2. For any , implies that .

The following hypotheses (H3) and (H4) help to control the complexity of the underlying class of functions as well as to guarantee the continuity of the function to maximize. Let denote the -norm.

1. The true value belongs to the interior of a compact set .

2. There exist functions and such that for every and every , we have and , and are finite. There exists a function such that for every and every ,

 |g(γ,x)−g(~γ,x)|≤|γ−~γ|1c1(x), (9)

with .

###### Proposition 2.

Under (H1)–(H4), we have that .

We now discuss assumption (H2) by considering some examples.

###### Example 1 (Cox with cure).

When , (H2) is equivalent to the statement that has full rank.

Without specifying , identifiability might not hold. Indeed, consider the case where , then of course different pairs could lead to the same function . A possibility when facing such difficulties is to restrict to the unit sphere in . Then identifiability might be recovered. We refer to this model as a directional model.

###### Example 2 (directional model).

Suppose that with , . One can typically think of functions of the form , for some . Such models allow for a geometric interpretation in the same vein as the single-index models (Cui et al., 2011). The information available from the covariates to predict

is contained in the linear transformation

, where stands for the orthogonal projector on . For more details about identifiability of single-index models, we refer to Theorem 1 in Lin and Kulasekera (2007) as well as Theorem 1 in Portier and Delyon (2013) where is required to possess a density.

If does not hold, then identifiability could fail unless more specific forms are considered for . An example where identifiability is still satisfied is given below.

###### Example 3 (Modified Cox).

An interesting choice is when , where , for . In the following lines, we obtain (H2) under the assumption that has a continuous density and is included in the support of . Suppose that is constant for almost every . Suppose that and are linearly independent. Then, take such that . Let and

be a probability density function. For any

we have, using approximation theory, that as , where . Hence for any , is constant which is impossible. Supposing that and are linearly dependent, we directly obtain that .

### 3.2 Asymptotic normality of ˆγ

We now introduce some notations that will be useful to express the asymptotic normality results. For every , , , , . We define

 I0=∫E[{d0(X)−h0(u)}{d0(X)−h0(u)}Tg(γ0,X)R(u)]dΛ0(u), (10)

where and . We require the following assumptions to obtain an asymptotic decomposition for .

1. The matrix has full rank.

2. For every , is differentiable and there exists a function such that for every and every ,

 |∇γg(γ,x)−∇γg(~γ,x)|1≤|γ−~γ|1c2(x), (11)

with . Moreover there exists a function such that, for every , where , , , and are finite.

###### Proposition 3.

Under (H1)–(H6), we have that

 n1/2(ˆγ−γ0)=n−1/2I−10n∑i=1∫(d0(Xi)−h0(u))dMi(u)+oP(1), (12)

and in particular, using Lemma C.4, see the Appendix, combined with (2), it holds that .

### 3.3 Weak convergence of ˆΛ

Based on the decomposition obtained for , we can now obtain a uniform representation of the process . This is the statement of the next Proposition.

###### Proposition 4.

Under (H1)–(H6), we have that

 supy∈R≥0∣∣ ∣∣n1/2(ˆΛ(y)−Λ0(y))−{n−1/2n∑i=1∫y0dMi(u)Q0(u)−∫y0h0(u)TdΛ0(u)(n1/2(ˆγ−γ0))}∣∣ ∣∣=oP(1). (13)

In particular, using Lemma C.4, the two terms involved in the decomposition are asymptotically independent and converges weakly to a tight centered Gaussian process in with covariance process given by

 (y,y′)↦∫min(y,y′)0dΛ0(u)Q0(u)+(∫y0h0(u)TdΛ0(u))I−10(∫y′0h0(u)dΛ0(u)).

The two previous propositions, Proposition 3 and 4, form the basis of the next analysis, which ultimately describes the estimator of the cure proportion . The following results will be obtained as (almost direct) consequences of Propositions 3 and 4 and so are referred to as corollaries.

### 3.4 Asymptotic normality of ˆθ

Since and , the weak convergence of is deduced from the weak convergence of as the finite dimensional laws converge in distribution. The expression for the asymptotic variance is deduced from the one given in Proposition 4.

###### Corollary 5.

Under (H1)–(H6), converges in distribution to a centered Gaussian distribution with variance

 vθ=∫dΛ0(u)Q0(u)+(∫h0(u)TdΛ0(u))I−10(∫h0(u)dΛ0(u)). (14)

As , invoking some Delta-method arguments, the weak convergence of the process can be established. This however is not needed in the following.

### 3.5 Cure rate estimation

Recall that the cure proportion associated to is given by and that the estimator is . A Taylor development gives that

 n1/2(ˆp(x)−p0(x))=−p0(x)g(γ0,x){n1/2(ˆθ−θ0)+θ0d0(x)Tn1/2(ˆγ−γ0)}+oP(1).

Injecting (12) and (13) in the previous display leads to the following statement.

###### Corollary 6.

Under (H1)–(H6), for a given , we have that

 n1/2(ˆp(x)−p0(x)) =−p0(x)g(γ0,x)n−1/2n∑i=1{∫dMi(u)Q0(u)+u0(x)TI−10∫(d0(Xi)−h0(u))dMi(u)}+oP(1),

where . Consequently, converges in distribution to a centered Gaussian distribution with variance

 vp(x)=p0(x)2g(γ0,x)2(∫dΛ0(u)Q0(u)+u0(x)TI−10u0(x)).

Note that a similar result can be obtained concerning the estimator of the survival function but we prefer to omit this for the sake of brevity.

## 4 Simulation study

We performed some extensive Monte Carlo simulations in order to assess the performnce of our suggested estimators. The simulations were performed under a variety of conditions on the censoring rate, sample size and cure rate. The data were generated according to the following model:

 S(t|x1,x2)=exp[−exp{Γ(γ01x1+γ02x2)}θ0F0(t)]. (15)

In the above model, we chose the link function to be either the identity, the cubic or the sine function. For clarity, in the first part of this simulation study we will focus on the case of the identity function. With few exceptions, all our comments and findings also apply to the case where and . In all our simulations, , , , is the cumulative distribution function of a uniform variable on ,

is a uniformly distributed random variable on

, is a normal random variable with mean , and and are independent. The censoring variable is exponential with parameter and is independent of . By varying the latter we mainly control the censoring rate, while by varying we control the cure rate.

Suppose we have a sample , from the distribution described above, with . We obtain , the estimator of , by maximizing the partial likelihood function given by (5), using the Newton-Raphson algorithm. We get , the estimator of , by applying (6). The cure probability estimator is then obtained by

 ˆp(x1,x2)=exp[−exp{Γ(ˆγ1x1+ˆγ2x2)}ˆθ].

Using the plug-in principle together with (10), we obtain an estimator for the asymptotic variance-covariance matrix of which is given by , where

 ˆI=n−1n∑i=1δi{(dˆγ(Xi)−ˆhˆγ(Yi))(dˆγ(Xi)−ˆhˆγ(Yi))T},

with, for every , . Similarly, using (14), we obtain an estimator of the asymptotic variance of given by , where

 ˆvθ=n−1n∑i=1δiˆQˆγ(Yi)2+⎛⎝n−1n∑i=1δiˆhˆγ(Yi)ˆQˆγ(Yi)⎞⎠TˆI−1⎛⎝n−1n∑i=1δiˆhˆγ(Yi)ˆQˆγ(Yi)⎞⎠.

And using the expression for the variance of given in Corollary 6, we obtain an estimator of the asymptotic variance of given by , where

 ˆvp=ˆp(x)2g(ˆγ,x)2⎛⎝n−1n∑i=1δiˆQˆγ(Yi)2+ˆu(x)TˆI−1ˆu(x)⎞⎠,

with and .

We perform replications for four sample sizes (, , and ), three levels of censoring (, and ) and three levels of cure (, and ). For every scenario and every replication, we calculate the estimators , , and together with their estimated asymptotic variance () and the corresponding asymptotic confidence intervals based on the asymptotic normality. Based on the replications, we also calculate the empirical bias, the empirical variance (), the empirical mean squared error () of every estimator together with the empirical coverage probability () for the confidence intervals. In the case of the cure probability we did the calculations for

and every quantile of

corresponding to the probability levels We summarize the results by taking the average of the resulting empirical ’s, empirical ’s and empirical ’s. Due to space limitations, we provide below only some selected but representative scenarios.