One-step Targeted Maximum Likelihood for Time-to-event Outcomes

by   Weixin Cai, et al.

Current targeted maximum likelihood estimation methods used to analyze time-to-event data estimate the survival probability at each time point separately, which can result in estimates that are not monotone. In this paper, we present an extension of the targeted maximum likelihood estimator (TMLE) to observational time-to-event data: the one-step TMLE for the treatment-rule specific survival curve. We construct a one-dimensional universal least favorable submodel that targets the entire survival curve and thereby requires minimal extra data fitting to achieve its goal of solving the efficient influence curve equation. Through a simulation study, we show that this method improves on previously proposed methods in both robustness and efficiency, while at the same time respecting the monotone decreasing nature of the survival curve.








1 Introduction

Using machine learning to model time-to-event data extends the application of survival analysis to a wider class of problems beyond the classical medical survival outcome, where factors of the experiment are well controlled. Modern observational time-to-event data analysis flourishes in interdisciplinary fields such as radiomics (Leger et al., 2017), device reliability modeling in engineering, and customer lifetime value prediction in online marketing (Wang et al., 2017). A common characteristic of these applications is that there are many high-dimensional, interdependent confounders in the observational data, which demands the development of flexible, machine learning based methods that can also model time-to-event outcomes. Common methodologies include survival trees (Gordon and Olshen, 1985; Hothorn et al., 2004; LeBlanc and Crowley, 1993), survival support vector machines (Khan and Zubek, 2008), and deep-learning Cox proportional hazards models (Faraggi and Simon, 1995). For these existing methods, either there is no statistical efficiency theory, or confidence intervals have to be constructed using computationally expensive bootstrap methods.

The targeted maximum likelihood estimator (TMLE) is an appealing approach because it performs fully nonparametric, machine-learning-based estimation while at the same time yielding valid statistical inference. Moore and van der Laan (2009) first introduced an iterative TMLE for estimating the treatment-specific survival curve in randomized clinical trials. They showed that the TMLE improves upon common methods for analyzing time-to-event data in robustness, efficiency, and interpretability of parameter estimates. Stitelman and van der Laan (2010) extended the estimator using the collaborative TMLE (C-TMLE) method to estimate the survival curve in observational studies, establishing an algorithm for fitting the treatment and censoring mechanisms that is targeted towards their utility, namely bias reduction w.r.t. the target estimand of interest. The C-TMLE methods are more robust than standard TMLE in challenging observational settings in which there are many confounders, some of which might act as approximate instrumental variables, only affecting the treatment or censoring while not being predictive of survival (Stitelman and van der Laan, 2010). The C-TMLE methodology can also produce stable estimates of borderline identifiable parameters, for instance when a certain set of baseline covariates is almost completely predictive of a particular treatment within the sample.

Sometimes we might want to study the entire survival curve instead of the survival probability at a single time point. As such, estimators that target the entire survival curve may provide further insight into the mechanism of the treatment. However, existing methodologies are not sufficient when targeting the entire survival curve, an infinite-dimensional parameter. Suppose the entire survival curve is characterized by survival probabilities at K distinct time points. The task then boils down to estimating a K-dimensional target parameter. In the TMLE context, targeting a high-dimensional parameter results in a least favorable submodel with the same dimension K as the target parameter. As a result, simultaneously targeting in all K directions is not stable and loses the meaning of targeting in the TMLE procedure. A common practice when dealing with high-dimensional target parameters is to separately target each component (Moore and van der Laan, 2009; Stitelman and van der Laan, 2010). However, such an estimator for the survival curve ignores the global monotone structure of the curve, potentially generating a non-monotone survival curve (especially for smaller sample sizes or when the survival curve is borderline identifiable).

van der Laan and Gruber (2016) introduced the one-step TMLE, for which the targeting takes only one step and thereby requires minimal extra data fitting to achieve its goal of solving the efficient influence curve equation. They developed a one-dimensional universal least favorable submodel such that the one-step TMLE, only minimizing the empirical risk over a univariate parameter, simultaneously solves the multivariate efficient influence curve equation of the target parameter. This allows us to construct a one-step TMLE, based on a one-dimensional parametric submodel through the initial estimator, that solves any desired multivariate set of estimating equations. The one-step TMLE has already demonstrated good performance on univariate target parameters. For instance, the one-step TMLE of the "average treatment effect among the treated" parameter is more robust and stable than the iterative TMLE (van der Laan and Gruber, 2016).

In this paper, we construct a one-step TMLE for the treatment-rule specific survival curve whose score at the empirical distribution equals the Euclidean norm of the empirical mean of the vector efficient influence curve, so that solving this single score equation solves all the component equations simultaneously. The one-step TMLE uses a stable one-dimensional targeting step, while at the same time respecting the global monotone shape of the survival curve. In addition to preserving all the asymptotic efficiency of previous TMLE methods, the one-dimensional universal least favorable submodel admits an interpretation as a "shortest" path towards its high-dimensional goal of solving the empirical mean of the vector efficient influence curve equation. We also construct a one-step TMLE estimator for survival at a specific endpoint, which is the one-step TMLE counterpart of the iterative TMLE of a univariate survival probability.

This article is organized as follows. We first outline the data, model and parameter(s) of interest in Section 2. We then provide a brief review of iterative targeted maximum likelihood estimation for right-censored survival outcomes in Section 3.2. A new application of one-step targeted maximum likelihood estimation to survival probability is presented in Section 3.3. In Section 3.4 we extend the one-step TMLE to target the entire treatment-rule specific survival curve, establish its statistical properties and provide inference. In Section 4 we present two simulation studies to demonstrate the efficiency gains of using one-step TMLE over iterative TMLE estimator for survival curves. The performance of the estimators is compared in an application to a real-life dataset in Section 5. Finally, we conclude with a discussion in Section 6.

2 Data structure & notations

We consider a classic survival analysis data structure with a discrete survival time that is subject to right censoring. We assume that the study consists of n subjects monitored at K equally spaced time points. We focus on the time T it takes for an event to occur, which can take values in {1, ..., K}. The right-censoring time C is the first time point at which the subject is no longer enrolled. We assume T and C are discrete, but by choosing the time scale fine enough the methods also apply to continuous survival data. At baseline, each subject is assigned a treatment A in {0, 1}, and pre-treatment baseline covariates W are also collected.

The full data structure on a subject is X = (W, A, T) with some full-data distribution, and we observe a right-censored version of the full data. That is, our observed data set consists of n i.i.d. copies of O = (W, A, T~, Δ) ~ P_0, where T~ = min(T, C) is the last time point at which the subject is monitored, Δ = I(T <= C) is the indicator that the subject is not censored, and P_0 denotes the probability distribution of O. This formulation of the data structure is termed the short form of survival data because each row in the data set corresponds to one observed subject (Stitelman and van der Laan, 2010). The same observed data can be equivalently expressed as a right-censored version of the failure and censoring counting processes. For each subject, we define two counting processes, one for the failure event, N(t) = I(T~ <= t, Δ = 1), and another for the censoring event, A_c(t) = I(T~ <= t, Δ = 0). N(t) and A_c(t) are indicators of whether the failure or censoring has occurred by time t, and both become degenerate once one of these processes has jumped to one. We assume the time ordering W, A, N(1), A_c(1), ..., N(K), A_c(K), and the observed data is n i.i.d. copies of O = (W, A, N(·), A_c(·)) ~ P_0, where again P_0 denotes the probability distribution of O.

We denote by P_0 the true probability distribution of O and assume that P_0 falls in a statistical model M that only enforces assumptions on the conditional distribution of treatment A, given W, and on the conditional hazard of censoring, given (A, W). We use the notation Pf = E_P f(O) for the expectation of f(O) w.r.t. P. We use P_n to denote the empirical measure, so that sample averages of f(O) can be written as P_n f = (1/n) Σ_i f(O_i). We adopt the common notation for the conditional hazards and conditional survival probabilities for both failure and censoring processes. The conditional hazard of failure is defined by

λ(t | A, W) = P(dN(t) = 1 | N(t-1) = 0, A_c(t-1) = 0, A, W),

and the conditional hazard of censoring λ_c(t | A, W) is defined analogously, where dN(t) and dA_c(t) are indicators of an observed failure and an observed censoring event at time t, respectively. Correspondingly, the conditional survival function is defined as S(t | A, W) = P(T > t | A, W), through the product-integral relation between a survival function and a hazard:

S(t | A, W) = ∏_{s <= t} (1 - λ(s | A, W)).   (1)


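In discrete time, the product-integral relation (1) is simply a cumulative product over the hazard. A minimal Python sketch (illustrative only; the function name is ours, not from any published implementation):

```python
import numpy as np

def survival_from_hazard(hazard):
    """Discrete-time product-integral: S(t | A, W) = prod_{s <= t} (1 - lambda(s | A, W)).

    `hazard` is a length-K sequence of conditional hazards lambda(1), ..., lambda(K);
    returns the survival probabilities S(1), ..., S(K)."""
    return np.cumprod(1.0 - np.asarray(hazard, dtype=float))
```

Because each factor lies in [0, 1], the output is automatically monotone decreasing, which is exactly the structural property the one-step TMLE for the whole curve preserves.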
Our parameter of interest of the data distribution P_0 is the treatment-rule specific survival curve, where d denotes a treatment allocation rule that deterministically maps baseline covariates W to either 0 or 1. The parameter is a whole function over time that takes the following form at a given t:

Ψ(P)(t) = E_W S(t | A = d(W), W),

where the outer expectation is over the probability distribution of W, and S(t | A = d(W), W) is the conditional treatment-rule specific survival function of T at t, given W, which is identified by the conditional hazard through (1). Notice that Ψ(P) only depends on P through Q = (Q_W, λ), so we will also denote it with Ψ(Q).

We choose the loss function of the conditional hazard to be the negative log-likelihood loss L(p) = -log p(O). The density of O under P factorizes as

p(O) = q_W(W) g(A | W) ∏_{t <= T~} λ(t | A, W)^{dN(t)} (1 - λ(t | A, W))^{1 - dN(t)} λ_c(t | A, W)^{dA_c(t)} (1 - λ_c(t | A, W))^{1 - dA_c(t)},   (2)

where q_W is the density of the probability distribution of W w.r.t. some dominating measure, and g(A | W) is the conditional probability of treatment A, given W. We choose the statistical model to be nonparametric on the density of W and the conditional hazard of failure, and only potentially make model assumptions on the treatment mechanism g and the censoring mechanism λ_c.

If we break up Ψ(P) at different time points t0, the univariate target parameters Ψ_{t0}(P) are pathwise differentiable with canonical gradient (van der Laan and Rubin, 2007; Moore and van der Laan, 2009)

D*_{t0}(P)(O) = Σ_{t <= t0} h_{t0}(t, A, W) [dN(t) - I(T~ >= t) λ(t | A, W)] + S(t0 | A = d(W), W) - Ψ_{t0}(P),   (3)

with the time-dependent "clever covariate"

h_{t0}(t, A, W) = - I(A = d(W)) / [g(d(W) | W) S_c(t- | A, W)] · S(t0 | A, W) / S(t | A, W),   (4)

where S_c(t- | A, W) denotes the conditional survival function of censoring up to (but not including) time t.



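To make the form of the clever covariate (4) concrete, here is a hedged Python sketch evaluating it for one subject (argument names are ours, chosen for illustration):

```python
import numpy as np

def clever_covariate(t0, surv_T, surv_C_minus, g_dW, follows_rule):
    """Time-dependent clever covariate h_{t0}(t, A, W) of (4), for t = 1..t0.

    surv_T       : S(t | A, W) for t = 1..t0 (failure survival at the current fit)
    surv_C_minus : S_c(t- | A, W) for t = 1..t0 (censoring survival, left limit)
    g_dW         : g(d(W) | W), probability of the rule-consistent treatment
    follows_rule : indicator I(A = d(W)); the covariate vanishes off the rule."""
    surv_T = np.asarray(surv_T, dtype=float)
    surv_C_minus = np.asarray(surv_C_minus, dtype=float)
    return -float(follows_rule) * surv_T[t0 - 1] / (g_dW * surv_C_minus * surv_T)
```

When `g_dW` or `surv_C_minus` approaches zero the covariate blows up, which is exactly the positivity instability discussed in Section 3.2; moving the denominator into regression weights tames it.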

3 Methodology

3.1 Targeted Maximum Likelihood Estimator (TMLE) in general

The TMLE is an asymptotically efficient substitution estimator obtained by constructing a so-called least favorable parametric submodel (LFM) through an initial estimator of the data distribution. The TMLE has score, at zero fluctuation of the initial estimator, that spans the efficient influence curve at the initial estimator (van der Laan and Rose, 2011). The TMLE algorithm iteratively minimizes the corresponding empirical loss function (e.g., the negative empirical log-likelihood) until no more updates occur, at which point the updated initial estimator has converged and solves the so-called efficient influence curve equation

P_n D*(P_n*) = 0.   (5)
Thus, the resulting substitution estimator inherits the desirable properties of the estimating-function-based methodology, namely local efficiency and double robustness (van der Laan and Robins, 2003). The TMLE algorithm consists of two steps: 1) initial fit, and 2) targeting. In this paper, we will use the same machine-learning-based estimator for the initial fit of the data distribution and present multiple targeting methods that affect the quality of the final TMLE estimator.

We use a fully data-adaptive estimator of all components of the probability distribution P_0. Recall the factorization of the likelihood in (2), which is parameterized by q_W, g, λ, and λ_c. We estimate q_W with the empirical distribution of W, and we recommend ensemble learning based on cross-validation (i.e., SuperLearner (van der Laan et al., 2007)) to estimate the conditional hazard functions for failure and censoring, λ and λ_c, as well as the treatment mechanism g. Candidate estimators of the conditional hazards are obtained by expressing the data as longitudinal data on the counting processes (long form) and using pooled machine-learning and parametric logistic regression models, all based on the negative log-likelihood loss (2). Either the estimator with the optimal cross-validated log-likelihood is chosen (discrete super learner), or the best weighted combination of the candidate estimators is chosen (super learner) by optimizing the cross-validated log-likelihood over all weighted combinations. For implementation details we refer to Section 8 of Stitelman and van der Laan (2010). A software implementation in the R language can be found in Benkeser and Hejazi (2017).
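The discrete super learner selection step described above can be sketched in a few lines of Python (a simplified stand-in for the SuperLearner machinery; the candidate interface and names are ours):

```python
import numpy as np

def cv_log_likelihood(fit_predict, X, y, n_folds=5, seed=0):
    """Cross-validated Bernoulli log-likelihood of one candidate hazard estimator.

    fit_predict(X_train, y_train, X_valid) must return predicted P(dN = 1) on X_valid."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for k in range(n_folds):
        valid = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        p = np.clip(fit_predict(X[train], y[train], X[valid]), 1e-6, 1 - 1e-6)
        scores.append(np.mean(y[valid] * np.log(p) + (1 - y[valid]) * np.log(1 - p)))
    return float(np.mean(scores))

def discrete_super_learner(candidates, X, y):
    """Pick the candidate name with the best cross-validated log-likelihood."""
    return max(candidates, key=lambda name: cv_log_likelihood(candidates[name], X, y))
```

The full super learner additionally optimizes over convex combinations of the candidates' predictions rather than choosing a single winner.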

Since we use the empirical distribution of W to estimate q_W, which is the nonparametric maximum likelihood estimate and is therefore not updated (Moore and van der Laan, 2009), the TMLE only needs to target the estimator of the conditional hazard of failure, given (A, W). The following sections discuss different targeting submodels for updating this initial estimator of the conditional hazard of the failure process. The current literature presents an iterative TMLE (Moore and van der Laan, 2009; Stitelman and van der Laan, 2010). In this article, we develop two one-step TMLEs, one for the survival probability at a specific endpoint, and one for the entire survival curve.

3.2 Iterative TMLE for survival at a fixed end point

Note that the collection of efficient influence curves D*_{t0} for t0 from 1 to K forms a K-dimensional vector as in (3). One can either construct a single update P_n* that solves the vector equation

P_n D*(P_n*) = 0,   (6)

or construct a separate TMLE P*_{n,t0} for each t0 solving

P_n D*_{t0}(P*_{n,t0}) = 0.   (7)

Each subtask corresponds to independently targeting the treatment-rule specific survival probability at a specific endpoint t0. The iterative TMLE solves each of the subtasks with a corresponding local least favorable submodel (LLFM); the iterative TMLE estimator for the survival curve simply connects the survival probabilities at the different endpoints into a function of time. The iterative TMLE for time t0 relies on the LLFM for the target parameter Ψ_{t0} to fluctuate the initial estimator λ_n. The local least favorable parametric submodel {λ_{n,ε} : ε} goes through λ_n at ε = 0 and has score D*_{t0} at ε = 0. We choose a logistic regression fluctuation submodel:

logit λ_{n,ε}(t | A, W) = logit λ_n(t | A, W) + ε h_{t0}(t, A, W),   (8)

where λ_n is the factor of the density only depending on the initial fit, ε is the fluctuation parameter of the model, and the estimated time-dependent clever covariate h_{t0} is the same as in (3)-(4). The targeting step updates this initial fit by finding the ε in the updated hazard that maximizes the likelihood (2) of the observed data. The update can be done in practice by fitting a univariate logistic regression on the time-dependent covariate h_{t0}, with logit λ_n as an offset whose coefficient is fixed at one and with the intercept set to zero; thus the whole regression is not refitted, rather only ε is estimated. We note from the formulation of the clever covariate (4) that the model can become unstable when the positivity assumption is close to being violated, namely when g(d(W) | W) or S_c(t- | A, W) is close to zero for some strata of W. This can be stabilized by moving the denominator of (4), g(d(W) | W) S_c(t- | A, W), into the weights of the logistic regression fit, thereby using a weighted log-likelihood loss.

These steps for evaluating ε correspond to a single iteration of the targeted maximum likelihood algorithm. In the second iteration, the updated hazard plays the role of the initial fit, and the clever covariate is re-evaluated based on this update. In the third iteration the next update is fit, and the procedure is iterated until the fitted ε is essentially zero. The final hazard fit at the last iteration of the algorithm is denoted by λ_n*, with the corresponding survival fit S_n*. Finally, the TMLE of the probability of surviving past time t0 for subjects under the treatment rule d is computed by

Ψ_n*(t0) = (1/n) Σ_i S_n*(t0 | A = d(W_i), W_i).

Here we suppress the dependence of the final TMLE on t0, and we note that the update only involves λ_n, while the other factors of the likelihood are equal to their initial estimator values. By the fact that the score of the submodel w.r.t. the log-likelihood loss (or weighted log-likelihood loss) equals D*_{t0} at ε = 0, it follows that the TMLE implied by λ_n* solves the efficient influence curve estimating equation (7).
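The targeting step of the iterative TMLE, an offset logistic regression in the single parameter ε, can be sketched with a small Newton solver (a simplified, unweighted stand-in for the actual fit; names are ours):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def targeting_step(y, logit_init, h, tol=1e-10, max_iter=100):
    """Fit epsilon in logit lambda_eps = logit lambda + eps * h by maximum likelihood.

    y          : long-form event indicators dN(t), one row per subject-time at risk
    logit_init : offset, logit of the initial hazard fit at those rows
    h          : clever covariate evaluated at those rows
    Returns (eps, updated hazard probabilities). Newton-Raphson on the score
    sum h * (y - p), with observed information sum h^2 * p * (1 - p)."""
    y, logit_init, h = (np.asarray(a, dtype=float) for a in (y, logit_init, h))
    eps = 0.0
    for _ in range(max_iter):
        p = expit(logit_init + eps * h)
        step = np.sum(h * (y - p)) / max(np.sum(h ** 2 * p * (1 - p)), 1e-12)
        eps += step
        if abs(step) < tol:
            break
    return eps, expit(logit_init + eps * h)
```

At the fitted ε the empirical score Σ h (dN - λ_ε) is zero, which is precisely the statement that the update solves (7); in the weighted variant described above, the denominator of (4) moves into per-row weights instead of the covariate.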

3.3 One-step TMLE for survival at a fixed end point

We define the universal least favorable submodel (ULFM) as in van der Laan and Gruber (2016): {p_ε : ε} is a parametric submodel dominated by p such that p_{ε=0} = p and, for each ε,

d/dε log p_ε = D*_{t0}(p_ε).

We will still use the negative log-likelihood loss and the SuperLearner initial fit, but the submodel is now the universal, instead of local, least favorable submodel. Since the TMLE based on the universal least favorable submodel only requires one step, the estimator will not change with any more iteration once the one-step TMLE is computed. The ULFM succeeds in achieving its goal of solving the vector equation (6) with minimal data fitting, as explained in van der Laan and Gruber (2016). We consider establishing the ULFM based on the LLFM for the iterative TMLE in Section 3.2. For ε >= 0 and a small dε > 0, we define an increment along the LLFM (8) as

logit λ_ε^{dε} = logit λ_ε + dε h_{t0,ε},

where h_{t0,ε} denotes the clever covariate (4) re-evaluated at the current fit λ_ε. The ULFM can therefore be defined by the following recursive definition:

λ_{ε+dε} = λ_ε^{dε}.   (9)

Similarly, we have a recursive relation for ε <= 0, but since these formulas are just symmetric versions of the ε >= 0 case, we focus on ε >= 0. This expresses the next λ_{ε+dε} in terms of the previously calculated λ_x for x <= ε, thereby fully defining this ULFM. The recursive definition (9) corresponds with the following integral representation of this ULFM when we take dε → 0:

logit λ_ε = logit λ_0 + ∫_0^ε h_{t0,x} dx.   (10)


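To convey the mechanics of following a universal least favorable submodel, here is a deliberately tiny Python illustration on the simplest possible example: the mean of a binary outcome, whose efficient influence curve is D*(p)(O) = O - p(1). The paper's ULFM acts on the conditional hazard instead, but the stepping logic of recursion (9) is the same (the toy model and all names are ours):

```python
import numpy as np

def ulfm_tmle_toy(obs, p1_init, d_eps=1e-3, tol=1e-3, max_steps=100_000):
    """Follow a universal least favorable submodel in small increments d_eps,
    re-evaluating the EIC direction at every step, until the empirical mean
    of the EIC, P_n D*(p_eps), is essentially zero.

    Toy model: O in {0, 1} with p(1) = p1; target psi(p) = p1."""
    p1 = float(p1_init)
    for _ in range(max_steps):
        pn_eic = float(np.mean(obs) - p1)        # P_n D*(p_eps)
        if abs(pn_eic) < tol:
            break
        sign = 1.0 if pn_eic > 0 else -1.0
        # move log p(o) by d_eps * D*(p_eps)(o) for o in {0, 1}, then renormalize
        logp = np.log([1.0 - p1, p1]) + sign * d_eps * (np.array([0.0, 1.0]) - p1)
        p = np.exp(logp)
        p1 = float(p[1] / p.sum())
    return p1
```

A single scan over ε solves the score equation; no iteration of full regression refits is needed, which is the "minimal extra data fitting" point made above.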
3.4 One-step TMLE for the whole survival curve

In this subsection, we construct a ULFM for the whole survival curve. If we were to define a multi-dimensional LLFM, for example the analogue of the one-dimensional LLFM in Section 3.3 but with a K-dimensional fluctuation parameter ε and a K-dimensional clever covariate, the iterative TMLE might behave poorly by having to fit a very high-dimensional ε-parameter. Instead, we construct a ULFM by using, for example, the same high-dimensional least favorable submodel, but only maximizing over ε with Euclidean norm smaller than a small dε, representing an infinitesimal move. Iteratively walking along these local fits of the high-dimensional submodel defines a univariate submodel in which the accumulated size of the small moves is the parameter, which we again denote with ε. This new (data-dependent) ULFM has the property that the score of the empirical log-likelihood at ε equals the Euclidean norm of the empirical mean of the vector efficient influence curve, so that the one-step TMLE solves all K equations simultaneously. The data dependence of the ULFM is minimal, only through the empirical means of the components of D*, which are uniformly well estimated, so that this ULFM is much less prone to an overly data-adaptive targeting step, only having to maximize over the single parameter ε.

Let L²(μ) be the Hilbert space of real-valued functions on {1, ..., K} endowed with inner product ⟨f1, f2⟩_μ = Σ_t f1(t) f2(t) μ(t) for some user-supplied positive and finite measure μ. The norm on this Hilbert space is thus given by ‖f‖_μ = ⟨f, f⟩_μ^{1/2}. The efficient influence curve for the vector parameter (Ψ_1, ..., Ψ_K) is defined by D* = (D*_1, ..., D*_K), where each D*_{t0} takes the definition of (3). Let λ be the conditional hazard of T, given (A, W), and let λ_n be an initial estimator of this conditional hazard using the same machine-learning methods as in Section 3.1, which then implies a corresponding density estimator p_n. We are also given an estimator g_n of the treatment mechanism, as well as an estimator of the censoring survival S_c.

The universal canonical one-dimensional submodel (van der Laan and Gruber, 2016) applied to p_n is defined by the following recursive relation: for ε >= 0,

p_{ε+dε} = p_ε (1 + dε ⟨P_n D*(p_ε), D*(p_ε)⟩_μ / ‖P_n D*(p_ε)‖_μ).   (11)

To obtain some more insight into this expression, we note that the inner product is given by

⟨P_n D*(p_ε), D*(p_ε)(O)⟩_μ = Σ_t {P_n D*_t(p_ε)} D*_t(p_ε)(O) μ(t),

and similarly we have such a summation representation of the norm in the denominator. Theorem 4 in van der Laan and Gruber (2016), or explicit verification, shows that for all ε, p_ε is a density and

d/dε P_n log p_ε = ‖P_n D*(p_ε)‖_μ.

Thus, as we move ε away from zero, the log-likelihood increases, and one searches for the first ε_n so that this derivative is smaller than a small prespecified cutoff (for example, one on the scale of 1/√n). Let p_n* = p_{ε_n}, and let S_n* be its corresponding conditional survival function. Then our one-step TMLE of the d-specific survival function is given by

Ψ_n*(t) = (1/n) Σ_i S_n*(t | A = d(W_i), W_i).

Since p_n* is an actual density, S_n* is a genuine survival function, so the resulting curve estimate is monotone decreasing. It has also been shown in van der Laan and Gruber (2016) that the one-step TMLE targeting the whole curve preserves all the asymptotic properties of the iterative TMLE method.
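The key quantity in the recursion above, the normalized direction ⟨P_n D*, D*(O)⟩ / ‖P_n D*‖, is easy to compute once the EIC components have been evaluated on the sample. A hedged Python sketch (array layout and names are ours):

```python
import numpy as np

def canonical_direction(eic_matrix, mu=None):
    """Per-subject score of the universal canonical one-dimensional submodel.

    eic_matrix : (n, K) array with entries D*_k(O_i) at the current fit
    mu         : optional length-K positive weights defining the inner product
    Returns (scores, norm): scores_i = <P_n D*, D*(O_i)>_mu / ||P_n D*||_mu,
    and norm = ||P_n D*||_mu, which is also d/d eps of the empirical
    log-likelihood along the submodel, so stepping stops once it is small."""
    n, K = eic_matrix.shape
    mu = np.full(K, 1.0 / K) if mu is None else np.asarray(mu, dtype=float)
    pn_eic = eic_matrix.mean(axis=0)                    # P_n D*, length K
    norm = float(np.sqrt(np.sum(mu * pn_eic ** 2)))     # ||P_n D*||_mu
    scores = eic_matrix @ (mu * pn_eic) / max(norm, 1e-12)
    return scores, norm
```

Averaging the per-subject scores returns exactly the norm, mirroring the identity d/dε P_n log p_ε = ‖P_n D*(p_ε)‖ that drives the one-step TMLE.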

3.5 Inference

3.5.1 Point confidence interval

The statistical inference for the iterative and the one-step TMLE at a single time point follows the same procedure. The TMLE estimators, both iterative and one-step, solve the efficient influence curve equation (5):

P_n D*(P_n*) = 0,

where D* is the efficient influence curve presented above in Equation (3). Thus, if all nuisance components are estimated consistently and under regularity conditions, the TMLE is asymptotically linear with influence curve D*(P_0) (van der Laan and Robins, 2003). Based on this result, when an estimate solves the efficient influence curve equation, inference may be based on the empirical variance of the estimated efficient influence curve. Thus, the asymptotic variance of √n(Ψ_n*(t0) - Ψ_0(t0)) is estimated by

σ_n² = (1/n) Σ_i D*_{t0}(P_n*)(O_i)².

Now a valid (1 - α) confidence interval is constructed under the normal approximation in the following way:

Ψ_n*(t0) ± z_{1-α/2} σ_n / √n,

where z_{1-α/2} is the (1 - α/2)-quantile of the standard normal distribution.
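The Wald-type interval above translates directly into code; a small sketch using the Python standard library's normal quantile plus NumPy (function name is ours):

```python
import numpy as np
from statistics import NormalDist

def tmle_confidence_interval(psi_hat, eic_values, alpha=0.05):
    """(1 - alpha) Wald CI from the empirical variance of the estimated EIC.

    eic_values: length-n array of D*_{t0}(P_n*)(O_i); because the TMLE solves
    the EIC equation, these values have (near-)zero mean."""
    n = len(eic_values)
    se = float(np.std(np.asarray(eic_values, dtype=float))) / np.sqrt(n)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return psi_hat - z * se, psi_hat + z * se
```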

It has been shown in van der Laan (2012), Moore and van der Laan (2009) and Hubbard et al. (2000) that the TMLE has the double robustness property. In the case of estimating the treatment-rule specific survival curve, the TMLE is asymptotically linear with influence curve equal to the efficient influence curve if the failure hazard λ, the treatment mechanism g, and the censoring mechanism λ_c are all estimated consistently. If either λ or the pair (g, λ_c) is consistently estimated, the TMLE is still consistent, and it remains asymptotically linear if these are estimated with parametric models. Specifically, if (g, λ_c) are estimated consistently with parametric models or more general models that have a well-behaved MLE (such as the Cox proportional hazards model), but λ is possibly inconsistent, then the TMLE is still asymptotically linear and the above confidence interval is conservative (Sec. 2.5 of Hubbard et al. (2000), and Robins and Rotnitzky (1992)). Either way, we recommend this confidence interval in practice, while using highly adaptive estimators of all the nuisance parameters.

If our parameter of interest is some function of the treatment-rule specific survival estimates, we can apply the δ-method to obtain the estimate of its influence curve. For example, the estimated influence curve for the additive difference in survival at t0 under the two treatment rules,

Ψ_{d=1}(t0) - Ψ_{d=0}(t0),   (12)

is given by D*_{1,t0} - D*_{0,t0}, where D*_{1,t0} and D*_{0,t0} take the definition in (3) under the rules d = 1 and d = 0, respectively. We can again compute confidence intervals and test statistics for these parameters using the estimated influence curve to estimate the asymptotic variance.

3.5.2 Simultaneous confidence interval

Similar to the construction of the point-wise confidence interval, simultaneous confidence bands for the survival curve estimates can be constructed based on asymptotic linearity of the TMLE uniformly in all time points considered. Inference for the survival probabilities at the K time points, a vector parameter, is also based on the empirical covariance of the efficient influence curve at the limit of the TMLE. The asymptotic variance-covariance of √n(Ψ_n* - Ψ_0) may be consistently estimated by the empirical covariance matrix of the estimated efficient influence curve:

Σ_n = (1/n) Σ_i D*(P_n*)(O_i) D*(P_n*)(O_i)ᵀ.

By the multivariate central limit theorem, we have

√n(Ψ_n* - Ψ_0) ⇒ N(0, Σ).

As a result, an approximate simultaneous confidence band is constructed such that for each k-th component of Ψ_n*, the region is given by

Ψ_n*(t_k) ± q_{1-α} σ_{n,k} / √n,

where σ_{n,k}² is the k-th diagonal entry of the empirical covariance matrix, thus the empirical variance of D*_{t_k}, and q_{1-α} is an estimate of the (1 - α)-quantile of the supremum of the standardized limit process. Here we use that the latter random variable behaves as max_k |Z_k|, where Z follows a K-dimensional Gaussian distribution N(0, ρ) and ρ is the correlation matrix of the vector influence curve D*. We simulate Monte-Carlo samples of Z and calculate q_{1-α} using the empirical (1 - α)-quantile of max_k |Z_k| over the random samples. Due to actual weak convergence of the standardized TMLE as a random function in a function space endowed with the supremum norm, these simultaneous confidence bands are valid even as we take a finer and finer grid of time points as n increases.
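The Monte-Carlo step for the simultaneous band is a short simulation from the estimated correlation structure; a sketch in Python (names are ours):

```python
import numpy as np

def simultaneous_band_quantile(eic_matrix, alpha=0.05, n_sim=20_000, seed=0):
    """Estimate q_{1-alpha}, the (1 - alpha)-quantile of max_k |Z_k| with
    Z ~ N(0, rho), where rho is the correlation matrix of the vector EIC.

    eic_matrix: (n, K) array of D*_k(P_n*)(O_i)."""
    rho = np.corrcoef(eic_matrix, rowvar=False)
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(rho.shape[0]), rho, size=n_sim)
    return float(np.quantile(np.abs(Z).max(axis=1), 1.0 - alpha))
```

With perfectly correlated components the quantile collapses to the pointwise 1.96; with independent components it grows like the quantile of the maximum of K independent normals, so the band is never narrower than the pointwise interval.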

4 Simulation

To illustrate some of our proposed methods and explore finite-sample performance, we simulate a continuous baseline covariate W, a binary exposure A, and a survival outcome T with censoring time C, from a model in which A, T, and C are confounded by W.

To analyze the simulated data, we estimate the entire survival curve under the treatment rule in which all subjects are treated (A = 1). Here we implement the proposed one-step TMLE for the entire survival curve, but also give results for the one-step version for a fixed endpoint in Section 3.3 and the iterative version in Section 3.2 (with R code for all methods given in the "MOSS" package (Cai and van der Laan, 2018)). All estimators depend on estimates of the failure hazard, the censoring hazard, and the treatment mechanism, which we construct using an identical SuperLearner that contains the correctly specified parametric models. In practice, we suggest using more comprehensive learner libraries to minimize the risk of model misspecification. We run 200 Monte-Carlo simulations of randomized experiments. The Monte-Carlo replications allow us to estimate the mean squared errors (MSEs) of the different estimators. We use the MSEs to further calculate the relative efficiencies (REs) of all estimators against the iterative TMLE.

Results for estimating the treatment-specific survival curve are given in Figure 1.


Figure 1: (a) One realization of the simulation. (b) Relative efficiency of the methods against the iterative TMLE, as a function of time, for sample size 50, and (c) for sample size 1000. Relative efficiency values larger than 3 are capped at 3 to keep the plot range compact for ease of visual comparison.

The simulation results reflect what is expected based on theory. Figure 1(a) shows one realization of the data-generating distribution. We can see from the plot that the iterative TMLE is not monotone around time = 16, while the one-step TMLE keeps a monotone decreasing trend. We plot in Figure 1(b) the relative efficiencies at each time point from 0 to 10. Due to the confounding by the baseline covariate W, the Kaplan-Meier estimator is negatively biased. The iterative and one-step TMLE are both unbiased, but the iterative TMLE has much larger variance. As a result, the MSE of the one-step TMLE for the whole curve can be reduced to half that of the iterative TMLE. We assess the asymptotic performance of the three estimators in Figure 1(c), where we choose the sample size to be 1000. By plotting the MSEs as a function of time, we see that the iterative and one-step TMLE are asymptotically equivalent, while Kaplan-Meier remains biased.

5 Data analysis

We apply our proposed methods to European Union legislation timing data studying the effect of a voting procedure on one survival outcome: passing of the legislation. The data consist of 3001 pieces of legislation (observations) in the EU from 1968 to 1999. At baseline, features about each piece of legislation and its political environment are recorded, and the binary treatment is whether a qualified majority vote (QMV) is applied to the legislation (EUR-Lex, 2016). The primary goal of the study was to learn whether the treatment (QMV) affects the time for each piece of legislation to pass. When legislation takes too long to pass, politicians modify it and start over with new legislation with a new set of baseline covariates, and the old legislation is censored. Golub (2007) gives full details of the study population and design.

In our analysis, we adjust for all baseline features, with QMV as the treatment, and we use the cross-validation-based SuperLearner to combine generalized linear models, generalized additive models, and random forests. Note that various factors confound the failure and censoring events, so nonparametric covariate adjustment should be made in the initial fits of the survival hazard, the censoring hazard, and the propensity score. We discretized the continuous time into windows of 30 days in order to keep computation tractable. We estimate the treatment-specific survival curves using (i) the one-step TMLE for the whole survival curve from Section 3.4, (ii) the one-step TMLE targeting each time point as in Section 3.3, and (iii) the iterative TMLE for each time point as in Section 3.2; finally, we estimate the difference of survival curves as in Equation (12) and compute simultaneous confidence bands for the difference curve as in Section 3.5.2.

Figure 2: Treatment-rule specific survival curves for interventions QMV = 1 (red) and QMV = 0 (blue), estimated using Kaplan-Meier, the iterative TMLE, the one-step TMLE for single time points, and the one-step TMLE targeting the entire curve.

Figure 2 illustrates the results. The survival curves under the treatment arm (red) are uniformly lower than under the control arm (blue), no matter which estimator we use. This indicates that the treatment (QMV) reduces the time to pass legislation, which we defined as the event of interest. Table 1 summarizes the distribution of the inverse of the denominator of the clever covariate (4), under the counterfactuals A = 1 and A = 0. There are no extremely large values, suggesting that the positivity assumption holds in this finite sample. The iterative TMLE curve is not monotone, as indicated by the jagged blue curve between the 15th and 25th 30-day periods in Figure 2. In contrast, the one-step TMLE estimators for the entire survival curve and for single time points have a monotonically decreasing shape. We note that the one-step TMLE targeting the entire survival curve will always guarantee monotonicity, across different experiments and data settings.

Min. 1st Qu. Median Mean 3rd Qu. Max.
A=1 1.14 1.99 3.71 4.82 6.04 31.30
A=0 1.31 2.34 4.86 10.51 10.58 186.70
Table 1: Empirical distribution of the clever covariate weights, the inverse of the denominator of (4), under the two counterfactuals A = 1 and A = 0.

It is natural also to target the difference of the two survival curves (12), which is the causal effect of QMV on survival. Figure 3 depicts the estimated difference curve, which supports the same conclusion that QMV has a positive effect on accelerating legislation.

Figure 3: One-step TMLE estimator for the difference of survival curves
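The simultaneous confidence band around a difference curve like Figure 3's can be formed by rescaling the pointwise influence-curve standard errors by the 95% quantile of the supremum of the approximating Gaussian process, as in Section 3.5.2. The sketch below uses simulated influence-curve values and a placeholder point estimate; all inputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 300, 20
t_grid = np.arange(T)

# Hypothetical per-subject influence-curve values for the difference curve:
# IC[i, t] is subject i's estimated influence-curve contribution at time t.
IC = rng.normal(scale=0.5, size=(n, T)) * np.linspace(1.0, 2.0, T)

# Placeholder point estimate of S_1(t) - S_0(t) (would come from the TMLE):
diff_curve = 0.3 * np.exp(-0.05 * t_grid) - 0.3 * np.exp(-0.10 * t_grid)

# Pointwise standard errors from the influence curve
se = IC.std(axis=0, ddof=1) / np.sqrt(n)

# Simultaneous band: scale by the 95% quantile of the supremum of the
# standardized limiting Gaussian process, approximated by simulation
# from the estimated correlation matrix of the influence curve.
Sigma = np.corrcoef(IC, rowvar=False)
draws = rng.multivariate_normal(np.zeros(T), Sigma, size=2000)
z_sup = np.quantile(np.abs(draws).max(axis=1), 0.95)

lower = diff_curve - z_sup * se
upper = diff_curve + z_sup * se
```

The supremum quantile exceeds the pointwise 1.96, so the simultaneous band is wider than pointwise intervals at every time, covering the whole curve at the nominal level.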

6 Discussion

In this paper, we provided two one-step TMLEs for estimating the treatment-rule specific survival curve: one that targets the survival curve at each time point separately, and another that targets the entire survival curve at once. The one-step estimators have implications for the survival analysis literature by allowing one to construct a TMLE for the infinite-dimensional survival curve in a single step. The new methods preserve the asymptotic linearity and efficiency of the iterative TMLE, which adjusts for baseline covariates and accounts for informative censoring through inverse weighting. Additionally, the one-step estimator targeting the entire survival curve respects the monotonically decreasing shape of the estimand. Moreover, the new TMLE for the entire curve yields a fully compatible TMLE for any function of the whole survival curve, such as the median, a quantile, or a truncated mean, so there is no need to compute a new TMLE for each specific feature of the survival curve or of the difference of survival curves. All of these advantages come without any parametric modeling assumptions, and the estimator is robust to misspecification of the hazard fit.

Our simulation confirms the theory in the existing literature: in situations where targeting is difficult and prone to error, a one-step TMLE that fluctuates along a universal least favorable submodel can improve on the iterative TMLE in both robustness and efficiency. Under large sample sizes, the iterative and one-step TMLEs are comparable. We show that in practical finite-sample situations for survival analysis, using a universal least favorable submodel to target a multi-dimensional or even infinite-dimensional parameter is likely to result in a more efficient and stable estimator. It remains unclear how our methods compare with applying isotonic regression to the curve defined by the one-step TMLEs that each target a single survival probability across all time points; this is another valid method to consider when the whole survival curve is the goal of the analysis.
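The isotonic-regression alternative mentioned above can be sketched directly: the pool-adjacent-violators algorithm (PAVA) projects a non-monotone pointwise curve onto the set of decreasing curves, after which features such as the median survival time can be read off. This is an illustrative implementation, not the paper's method:

```python
import numpy as np

def pava_increasing(y):
    """Pool-adjacent-violators: L2 projection onto nondecreasing sequences."""
    vals, wts, cnts = [], [], []
    for yi in np.asarray(y, dtype=float):
        vals.append(yi); wts.append(1.0); cnts.append(1)
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            c = cnts[-2] + cnts[-1]
            vals.pop(); wts.pop(); cnts.pop()
            vals[-1], wts[-1], cnts[-1] = v, w, c
    return np.concatenate([np.full(c, v) for v, c in zip(vals, cnts)])

def monotonize_survival(surv):
    """Decreasing isotonic regression of a survival curve (negate, PAVA, negate)."""
    return -pava_increasing(-np.asarray(surv, dtype=float))

def median_survival_time(times, surv):
    """First time at which a monotone survival curve drops to 0.5 or below."""
    below = np.nonzero(surv <= 0.5)[0]
    return times[below[0]] if below.size else np.inf

times = np.arange(1, 7)
raw = np.array([0.95, 0.90, 0.92, 0.55, 0.48, 0.50])  # non-monotone pointwise fit
mono = monotonize_survival(raw)
```

Unlike this post-hoc projection, the one-step TMLE for the entire curve is monotone by construction, so summaries like the median are compatible with a single targeted fit.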


  • Benkeser and Hejazi (2017) David C Benkeser and Nima S Hejazi. survtmle: Targeted Minimum Loss-Based Estimation for Survival Analysis in R, 2017.
  • Cai and van der Laan (2018) Wilson Cai and Mark J. van der Laan. MOSS: One-step TMLE for survival analysis, 2018. R package version 0.1.0.
  • EUR-Lex (2016) EUR-Lex. Qualified majority, 2016. Accessed: 2016-12-02.
  • Faraggi and Simon (1995) David Faraggi and Richard Simon. A neural network model for survival data. Statistics in Medicine, 14(1):73–82, 1995.
  • Golub (2007) Jonathan Golub. Survival analysis and European Union decision-making. European Union Politics, 8(2):155–179, 2007.
  • Gordon and Olshen (1985) Louis Gordon and Richard A Olshen. Tree-structured survival analysis. Cancer Treatment Reports, 69(10):1065–1069, 1985.
  • Hothorn et al. (2004) Torsten Hothorn, Berthold Lausen, Axel Benner, and Martin Radespiel-Tröger. Bagging survival trees. Statistics in Medicine, 23(1):77–91, 2004.
  • Hubbard et al. (2000) Alan E Hubbard, Mark J Van Der Laan, and James M Robins. Nonparametric locally efficient estimation of the treatment specific survival distribution with right censored data and covariates in observational studies. IMA Volumes in Mathematics and Its Applications, 116:135–178, 2000.
  • Khan and Zubek (2008) Faisal M Khan and Valentina Bayer Zubek. Support vector regression for censored data (SVRC): a novel tool for survival analysis. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM '08), pages 863–868. IEEE, 2008.
  • LeBlanc and Crowley (1993) Michael LeBlanc and John Crowley. Survival trees by goodness of split. Journal of the American Statistical Association, 88(422):457–467, 1993.
  • Leger et al. (2017) Stefan Leger, Alex Zwanenburg, Karoline Pilz, Fabian Lohaus, Annett Linge, Klaus Zöphel, Jörg Kotzerke, Andreas Schreiber, Inge Tinhofer, Volker Budach, et al. A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling. Scientific reports, 7(1):13206, 2017.
  • Moore and van der Laan (2009) Kelly Moore and Mark van der Laan. Application of time-to-event methods in the assessment of safety in clinical trials. Design and Analysis of Clinical Trials with Time-to-Event Endpoints. Taylor & Francis, pages 455–482, 2009.
  • Robins and Rotnitzky (1992) James M Robins and Andrea Rotnitzky. Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS epidemiology, pages 297–331. Springer, 1992.
  • Stitelman and van der Laan (2010) Ori Stitelman and Mark van der Laan. Collaborative targeted maximum likelihood for time to event data. The International Journal of Biostatistics, 2010.
  • van der Laan and Gruber (2016) Mark van der Laan and Susan Gruber. One-step targeted minimum loss-based estimation based on universal least favorable one-dimensional submodels. The International Journal of Biostatistics, 2016.
  • van der Laan (2012) Mark J van der Laan. Statistical inference when using data adaptive estimators of nuisance parameters. Technical Report 302, Division of Biostatistics, University of California, Berkeley, submitted to IJB, 2012.
  • Van der Laan and Robins (2003) Mark J Van der Laan and James M Robins. Unified methods for censored longitudinal data and causality. Springer Science & Business Media, 2003.
  • Van der Laan and Rose (2011) Mark J Van der Laan and Sherri Rose. Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media, 2011.
  • van der Laan and Rubin (2007) Mark J van der Laan and Daniel Rubin. A note on targeted maximum likelihood and right censored data. Technical Report 226, Division of Biostatistics, University of California, Berkeley, 2007.
  • Van der Laan et al. (2007) Mark J Van der Laan, Eric C Polley, and Alan E Hubbard. Super learner. Statistical applications in genetics and molecular biology, 6(1), 2007.
  • Wang et al. (2017) Ping Wang, Yan Li, and Chandan K Reddy. Machine learning for survival analysis: A survey. arXiv preprint arXiv:1708.04649, 2017.


6.1 Proof that our proposed submodel (10) is a universal least favorable submodel

Clearly, (10) is a submodel: for each value of the fluctuation parameter it yields a hazard, and it contains the initial hazard estimator at the zero fluctuation. Recall the loss function (2) evaluated at the fluctuated hazard:

Using the defining property of (10), we have

Plugging this into the loss function, we find that the score at any value of the fluctuation parameter is given by

explicitly proving that (10) is indeed a universal least favorable submodel for the target parameter.
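For reference, the property being verified is the defining condition of a universal least favorable submodel (van der Laan and Gruber, 2016), stated here in generic notation rather than the paper's exact symbols: along the entire path, not just at zero, the score of the log-likelihood loss equals the efficient influence curve at the current fluctuated fit.

```latex
% Defining property of a universal least favorable submodel \{p_{n,\epsilon}\}:
% for every value of the fluctuation parameter \epsilon,
\frac{d}{d\epsilon} \log p_{n,\epsilon} \;=\; D^{*}(p_{n,\epsilon}),
% so that an \epsilon_n solving the empirical efficient influence curve
% equation P_n D^{*}(p_{n,\epsilon_n}) = 0 is reached without iteration.
```

Because the identity holds at every point on the path, a single fit of the fluctuation parameter suffices to solve the efficient influence curve equation, which is what makes the estimator "one-step."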