1 Introduction
In survival analysis, one can often use nonparametric approaches to flexibly estimate the survival function from lifetime data, such as the Kaplan–Meier estimator kaplan1958nonparametric , or to estimate the intensity of a point process for event arrivals, such as the isotonic Hawkes process wang2016isotonic and the neural Hawkes process mei2017neural , which can be applied to the analysis of recurring events. When exploring the relationship between the covariates and time to event, existing survival analysis methods often parameterize the hazard function with a weighted linear combination of covariates. One of the most popular is the Cox proportional hazards model cox1992regression , which is semiparametric in that it assumes a nonparametric baseline hazard rate to capture the time effect. These methods are often applied to population-level studies that try to unveil the relationship between risk factors and the hazard function, such as to what degree a unit increase in a covariate is multiplicative to the hazard rate. However, this interpretability is often obtained by sacrificing model flexibility, because the proportional-hazards assumption is violated when covariate effects are nonmonotonic. For example, both very high and very low ambient temperatures were related to high mortality rates in Valencia, Spain, 1991–1993 ballester1997mortality , and a significantly increased mortality rate is associated with both underweight and obesity flegal2007cause . To accommodate nonlinear covariate effects such as nonmonotonicity, existing (semi)parametric models often expand the design matrix with transformed data, such as the basis functions of smoothing splines
li2005boosting ; lu2008boosting and other transformations guided by subjective knowledge. Instead of using hand-designed data transformations, several recent studies in machine learning model complex covariate dependence with flexible functions, such as deep exponential families
ranganath2016deep , deep neural networks katzman2018deepsurv ; zhu2016deep ; chapfuwa2018adversarial , and Gaussian processes fernandez2016gaussian . With enhanced flexibility, these recent approaches are often good at assessing individual risks, such as predicting a patient's hazard function or survival time. However, except for the Gaussian process, whose results are not too difficult to interpret for low-dimensional covariates, they often have difficulty explaining how survival is impacted by which covariates, limiting their use in survival analysis, where interpretability plays a critical role. Some approaches discretize the real-valued survival time and model survival on discrete time points or intervals luck2017deep ; li2016multi ; yu2011learning ; miscouridoudeep . They transform the time-to-event modeling problem into regression, classification, or ranking ones, at the expense of losing the continuity information implied by the survival time and potentially having inconsistent categories between training and testing.

In survival analysis, it is very common to have competing risks, a scenario in which the occurrence of an event under one risk precludes events under any other risk. For example, if the event of interest is death, then all possible causes of death are competing risks to each other, since a subject that died of one cause would never die of any other cause. Apart from modeling the time to event, in the presence of competing risks it is also important to model the event type, that is, under which risk the event is likely to occur first. Though one can censor subjects with an occurrence of the event under a competing risk other than the risk of special interest, so that every survival model that can handle censoring is able to model competing risks, it is problematic to violate the principle of noninformative censoring kalbfleisch2011statistical ; austin2016introduction .
The analysis of competing risks should be carefully designed, and people often model two types of hazard functions: cause-specific putter2007tutorial ; lau2009competing and subdistribution fine1999proportional ; putter2007tutorial ; lau2009competing hazard functions. The former applies to studying the etiology of diseases, while the latter is favorable when developing prediction models and risk-scoring systems austin2016introduction .
In the analysis of competing risks, there is also a tradeoff between interpretability and flexibility. The aforementioned cause-specific and subdistribution hazard functions use a Cox model with competing risks wolbers2014concordance ; austin2016introduction and a Fine–Gray subdistribution model fine1999proportional , respectively, which are both proportional hazards models. Both models are semiparametric, assuming that the hazard rate is proportional to the exponential of the inner product of the covariate and regression-coefficient vectors, along with a nonparametric baseline hazard function. However, the existence of nonmonotonic covariate effects can easily break the proportional-hazards assumption inherited from their corresponding single-risk models. This barrier has been surmounted by nonparametric approaches, such as random survival forests
ishwaran2014random , Gaussian processes with a single layer barrett2013gaussian or two alaa2009deep , and classification-based neural networks that discretize the survival time lee2018deephit . These models are designed for competing risks, using the covariates as input and the survival times (or their monotonic transformations) or probabilities as output. Though achieving good model fit, these nonparametric approaches are mainly used for studies at an individual level, such as predicting the survival time, and are unable to tell how the covariates affect the survival or cumulative incidence functions fine1999proportional ; crowder2001classical . Moreover, it might be questionable for Alaa and van der Schaar alaa2009deep to assume a normal distribution on survival times, which are positive almost surely and asymmetric in general.
To this end, we construct the Lomax delegate racing (LDR) survival model, a gamma-process-based nonparametric Bayesian hierarchical model for survival analysis with competing risks. The LDR survival model utilizes a race of exponential random variables to model both the time to event and the event type and subtype, and uses the summation of a potentially countably infinite number of covariate-dependent gamma random variables as the exponential distribution rate parameters. It is amenable not only to censored data, but also to missing event types or event times. Code for reproducible research is available at
https://github.com/zhangquanut/Lomaxdelegateracingforsurvivalanalysiswithcompetingrisks

2 Exponential racing and survival analysis
Let $t \sim \mbox{Exp}(\lambda)$ represent an exponential distribution, with probability density function (PDF) $p(t \mid \lambda) = \lambda e^{-\lambda t}$, $t \in \mathbb{R}_+$, where $\mathbb{R}_+$ represents the nonnegative side of the real line, and $\lambda$ is the rate parameter such that $\mathbb{E}[t] = 1/\lambda$ and $\mathrm{var}[t] = 1/\lambda^2$. Shown below is a well-known property that characterizes a race among independent exponential random variables Ross:2006:IPM:1197141 ; caron2012bayesian .

Property 1 (Exponential racing).
If $t_j \sim \mbox{Exp}(\lambda_j)$, where $j = 1, \ldots, J$, are independent of each other, then the minimum $t := \min\{t_1, \ldots, t_J\}$ and the argument of the minimum $y := \operatorname{argmin}_j t_j$ are independent, satisfying
$$t \sim \mbox{Exp}\Big(\textstyle\sum_{j=1}^J \lambda_j\Big), \qquad P(y = j) = \frac{\lambda_j}{\sum_{j'=1}^J \lambda_{j'}}. \qquad (1)$$
Suppose there is a race among teams $j = 1, \ldots, J$, whose completion times follow $t_j \sim \mbox{Exp}(\lambda_j)$, with the winner being the team with the minimum completion time. Property 1 shows that the winner's completion time still follows an exponential distribution and is independent of which team wins the race. In the context of survival analysis, if we consider a competing risk as a team and the latent survival time under this risk as the completion time of the team, then $t = \min\{t_1, \ldots, t_J\}$ will be the observed time to event (or failure time) and $y = \operatorname{argmin}_j t_j$ the event type (or cause of failure). Exponential racing not only describes a natural mechanism of competing risks, but also provides an attractive modeling framework amenable to Bayesian inference, as conditioning on the $\lambda_j$'s, the joint distribution of the event type $y$ and time to event $t$ becomes fully factorized as
$$P(y, t \mid \lambda_1, \ldots, \lambda_J) = \lambda_y \exp\Big(-t \textstyle\sum_{j=1}^J \lambda_j\Big). \qquad (2)$$
In survival analysis, it is rarely the case that both $y$ and $t$ are observed for all observations, and one often needs to deal with missing data and right or left censoring. We write $t \sim \mbox{Exp}_{\Omega}(\lambda)$ as a truncated exponential random variable defined by PDF $p(t \mid \lambda, \Omega) = \lambda e^{-\lambda t} / \int_{\Omega} \lambda e^{-\lambda s}\, ds$, where $t \in \Omega$ and $\Omega$ is an open interval on $\mathbb{R}_+$ representing censoring. Concretely, $\Omega$ can be $(t_0, \infty)$, indicating right censoring with censoring time $t_0$; $\Omega$ can be $(0, t_0)$, indicating left censoring with censoring time $t_0$; or $\Omega$ can be a more general case $(t_1, t_2)$, $0 \le t_1 < t_2 \le \infty$.
If we do not observe $y$ or $t$, or there exists censoring, we have the following two scenarios, for both of which it is necessary to introduce appropriate auxiliary variables to achieve fully factorized likelihoods: 1) If we only observe $t$ (or $y$), then we can draw $y$ (or $t$) as shown in (1) as an auxiliary variable, leading to the fully factorized likelihood as in (2); 2) If we do not observe $t$ but know $t \in \Omega$, then we draw $t \sim \mbox{Exp}_{\Omega}(\sum_j \lambda_j)$, resulting in the likelihood
$$P(t, t \in \Omega \mid \lambda_1, \ldots, \lambda_J) = \Big(\textstyle\sum_{j=1}^J \lambda_j\Big) \exp\Big(-t \textstyle\sum_{j=1}^J \lambda_j\Big). \qquad (3)$$
Together with $y$, which can be drawn by (1) if it is missing, the likelihood becomes the same as in (2). The procedure of sampling $t$ and/or $y$, generating fully factorized likelihoods under different censoring conditions, plays a crucial role as a data augmentation scheme that will be used for Bayesian inference of the proposed LDR survival model.
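The augmentation draw for a censored time can be sketched with inverse-CDF sampling from the truncated exponential; the rate and censoring time below are arbitrary illustrative values:

```python
import math
import random

def sample_trunc_exp(lam, lo, hi, rng):
    """Inverse-CDF draw from Exp(lam) truncated to the interval (lo, hi);
    hi = math.inf gives right censoring, lo = 0 gives left censoring."""
    u = rng.random()
    s_lo = math.exp(-lam * lo)                       # survival at lo
    s_hi = 0.0 if math.isinf(hi) else math.exp(-lam * hi)
    return -math.log(s_lo - u * (s_lo - s_hi)) / lam

rng = random.Random(1)
lam, t0 = 0.5, 2.0
# Augment a right-censored observation: the latent time must exceed t0
draws = [sample_trunc_exp(lam, t0, math.inf, rng) for _ in range(100_000)]
print(min(draws))               # never below t0 = 2.0
print(sum(draws) / len(draws))  # memorylessness: mean ≈ t0 + 1/lam = 4.0
```

For right censoring this reduces to the memoryless shift `t0 + Exp(lam)`, which the empirical mean confirms.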
In survival analysis with competing risks, one is often interested in modeling the dependence of the event type $y$ and failure time $t$ on covariates $x$. Under the exponential racing framework, one may simply let $\lambda_j = e^{x^T \beta_j}$, where $\beta_j$ is the regression-coefficient vector for the $j$th competing risk or event type. However, the hazard rate for the $j$th competing risk, expressed as $\lambda_j = e^{x^T \beta_j}$, is restricted to be log-linear in the covariates $x$. This clear restriction motivates us to generalize exponential racing to Lomax racing, which can have a time-varying hazard rate for each competing risk, and further to Lomax delegate racing, which can use the convolution of a potentially countably infinite number of covariate-dependent gamma distributions to model each $\lambda_j$.

3 Lomax and Lomax delegate racings
In this section, we generalize exponential racing to Lomax racing, which relates survival analysis with competing risks to a race of conditionally independent Lomax-distributed random variables. We further generalize Lomax racing to Lomax delegate racing, which races the winners of conditionally independent Lomax racings. Below we first briefly review the Lomax distribution.
Let $r \sim \mbox{Gamma}(a, 1/b)$ represent a gamma distribution with $\mathbb{E}[r] = a/b$ and $\mathrm{var}[r] = a/b^2$. Mixing the rate parameter of an exponential distribution with $\mbox{Gamma}(a, 1/b)$ leads to a Lomax distribution lomax1954business , with shape $a$, scale $b$, and PDF
$$p(t \mid a, b) = \frac{a b^a}{(t + b)^{a+1}}, \quad t \in \mathbb{R}_+.$$
When $a > 1$, we have $\mathbb{E}[t] = b/(a-1)$, and when $a > 2$, we have $\mathrm{var}[t] = ab^2/[(a-1)^2(a-2)]$. The Lomax distribution is a heavy-tailed distribution. Its hazard rate and survival function can be expressed as $h(t) = a/(t+b)$ and $S(t) = (1 + t/b)^{-a}$, respectively.
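The gamma–exponential mixture construction of the Lomax distribution can be checked numerically; the shape and scale below are arbitrary illustrative values:

```python
import random

def sample_lomax(a, b, rng):
    """t ~ Lomax(shape a, scale b): draw a rate lam ~ Gamma(a, scale 1/b),
    then t ~ Exp(lam); marginally t is Lomax distributed."""
    lam = rng.gammavariate(a, 1.0 / b)
    return rng.expovariate(lam)

rng = random.Random(2)
a, b = 3.0, 2.0
n = 200_000
draws = [sample_lomax(a, b, rng) for _ in range(n)]

t0 = 1.0
emp_surv = sum(t > t0 for t in draws) / n
print(emp_surv)        # ≈ (1 + t0 / b) ** (-a), the Lomax survival function
print(sum(draws) / n)  # ≈ b / (a - 1) = 1.0, the Lomax mean for a > 1
```

The empirical survival probability and mean match the closed-form Lomax expressions, confirming the mixture representation.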
3.1 Covariate-dependent Lomax racing
We generalize covariate-dependent exponential racing by letting
$$t_j \sim \mbox{Exp}(\lambda_j), \quad \lambda_j \sim \mbox{Gamma}(r_j, e^{x^T \beta_j}).$$
Marginalizing out $\lambda_j$ leads to $t_j \sim \mbox{Lomax}(r_j, e^{-x^T \beta_j})$. The Lomax distribution was initially introduced to study business failures lomax1954business and has since been widely used to model the time to event in survival analysis myhre1982screen ; howlader2002bayesian ; cramer2011progressively ; hemmati2017adaptive . Previous research on this distribution al2001statistical ; childs2001order ; giles2013bias , however, has mainly focused on point estimation of parameters, without modeling covariate dependence or performing Bayesian inference. We define Lomax racing as follows.
Definition 1.
Lomax racing models the time to event $t$ and event type $y$ given covariates $x$ as
$$y = \operatorname{argmin}_j t_j, \quad t = \min\{t_1, \ldots, t_J\}, \quad t_j \sim \mbox{Lomax}(r_j, e^{-x^T \beta_j}). \qquad (4)$$
To explain the notation, suppose a patient has both diabetes ($j = 1$) and cancer ($j = 2$); then $t_1$ will be the patient's latent survival time under diabetes and $t_2$ under cancer. The patient's observed survival time is $t = \min\{t_1, t_2\}$. Note that Lomax racing can also be considered as an exponential racing model with multiplicative random effects, since $t_j$ in (4) can also be generated as
$$t_j \sim \mbox{Exp}(\lambda_j), \quad \lambda_j \sim \mbox{Gamma}(r_j, e^{x^T \beta_j}).$$
There are two clear benefits of Lomax racing over exponential racing. The first is that given $r_j$ and $\beta_j$, the hazard rate for the $j$th competing risk, expressed as $r_j/(t + e^{-x^T \beta_j})$, is no longer a constant as $e^{x^T \beta_j}$ is. The second is that closed-form Gibbs sampling update equations can be derived, as will be described in detail in Section 4 and the Appendix.
For competing risk $j$, we can also express $t_j$ as
$$t_j = e^{-x^T \beta_j} \tilde{t}_j, \quad \tilde{t}_j \sim \mbox{Lomax}(r_j, 1).$$
Thus Lomax racing regression uses an accelerated failure time model kalbfleisch2011statistical for each of its competing risks. More specifically, with $\tilde{t}_j \sim \mbox{Lomax}(r_j, 1)$ and $\epsilon_j = \ln \tilde{t}_j$, we have
$$\ln t_j = -x^T \beta_j + \epsilon_j, \qquad (5)$$
and hence $e^{-x^T \beta_j}$ can be considered as the accelerating factor for competing risk $j$. Considering all risks, we can express the survival function and hazard function as
$$S(t \mid x) = \prod_{j=1}^J \big(1 + t e^{x^T \beta_j}\big)^{-r_j}, \qquad h(t \mid x) = \sum_{j=1}^J \frac{r_j}{t + e^{-x^T \beta_j}}. \qquad (6)$$
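As a sketch, the per-risk Lomax survival form $S_j(t \mid x) = (1 + t/b_j)^{-r_j}$ with $b_j = e^{-x^T \beta_j}$, and the corresponding hazard $r_j/(t + b_j)$, can be coded directly; the coefficients below are hypothetical, and the snippet checks that the total hazard equals the negative log-derivative of the survival function:

```python
import math

def survival(t, x, risks):
    """S(t | x) = prod_j (1 + t / b_j)^(-r_j), b_j = exp(-x . beta_j);
    `risks` is a list of (r_j, beta_j) pairs with hypothetical values."""
    s = 1.0
    for r, beta in risks:
        b = math.exp(-sum(xi * bi for xi, bi in zip(x, beta)))
        s *= (1.0 + t / b) ** (-r)
    return s

def hazard(t, x, risks):
    """h(t | x) = sum_j r_j / (t + exp(-x . beta_j))."""
    return sum(
        r / (t + math.exp(-sum(xi * bi for xi, bi in zip(x, beta))))
        for r, beta in risks
    )

x = [0.5, -1.0]
risks = [(2.0, [0.3, 0.7]), (1.5, [-0.2, 0.4])]
t, eps = 0.7, 1e-6
# the hazard is the negative derivative of log-survival
num = (math.log(survival(t - eps, x, risks))
       - math.log(survival(t + eps, x, risks))) / (2 * eps)
print(num, hazard(t, x, risks))  # the two values agree
```

The central-difference check confirms the survival and hazard expressions are consistent with each other.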
The nosology of competing risks is often subject to human knowledge, diagnostic techniques, and patient population. Diseases with the same phenotype, categorized into one competing risk, might have distinct etiologies and different impacts on survival, and thus require different therapies. For example, for a patient with both diabetes and cancer, it can be unknown whether the patient has Type 1 or Type 2 diabetes, where Type 1 is ascribed to insufficient production of insulin from the pancreas whereas Type 2 arises from the cells' failure to respond properly to insulin varma2014prevalence . In this regard, it is often necessary for a model to identify subrisks within a prespecified competing risk, which may not only improve the fit of survival times, but also help diagnose new disease subtypes. We develop Lomax delegate racing, assuming that a risk consists of several subrisks, under each of which the latent failure time is accelerated by the exponential of a weighted linear combination of covariates.
3.2 Lomax delegate racing
Based on the idea of Lomax racing that an individual's observed failure time is the minimum of latent failure times under competing risks, we further propose Lomax delegate racing (LDR), assuming a latent failure time under a competing risk is the minimum of the failure times under a number of subrisks appertaining to this competing risk. In particular, let us first denote $G \sim \Gamma\mbox{P}(G_0, 1/c_0)$ as a gamma process defined on the product space $\mathbb{R}_+ \times \Omega$, where $\mathbb{R}_+ = \{x : x > 0\}$, $G_0$ is a finite and continuous base measure over a complete separable metric space $\Omega$, and $1/c_0$ is a positive scale parameter, such that $G(A) \sim \mbox{Gamma}(G_0(A), 1/c_0)$ for each Borel set $A \subset \Omega$. A draw from the gamma process consists of countably infinite nonnegatively weighted atoms, expressed as $G = \sum_{k=1}^\infty r_k \delta_{\beta_k}$. Now we formally define the LDR survival model as follows.
Definition 2 (Lomax delegate racing).
Given a random draw of a gamma process $G_j \sim \Gamma\mbox{P}(G_0, 1/c_0)$, expressed as $G_j = \sum_{k=1}^\infty r_{jk} \delta_{\beta_{jk}}$, for each $j \in \{1, \ldots, J\}$, Lomax delegate racing models the time to event $t$ and event type $y$ given covariates $x$ as
$$y = \operatorname{argmin}_j t_j, \quad t = \min\{t_1, \ldots, t_J\}, \quad t_j = \min\{t_{j1}, t_{j2}, \ldots\}, \quad t_{jk} \sim \mbox{Lomax}(r_{jk}, e^{-x^T \beta_{jk}}). \qquad (7)$$
In contrast to specifying a fixed number of subrisks, the gamma process not only admits a race among a potentially infinite number of subrisks, but also parsimoniously shrinks toward zero the weights of negligible subrisks zhou2016augmentable ; SoftplusReg_2016 , so that the nonmonotonic covariate effects on the failure time under a competing risk can be interpreted as the minimum, which is a nonlinear operation, of failure times under subrisks whose accelerating factors are log-linear in $x$. As shown in the following corollary, LDR can also be considered as a generalization of exponential racing, where the exponential rate parameter of each competing risk is a weighted summation of a countably infinite number of gamma random variables with covariate-dependent weights.
Corollary 1.
The Lomax delegate racing survival model can also be expressed as
$$y = \operatorname{argmin}_j t_j, \quad t = \min\{t_1, \ldots, t_J\}, \quad t_j \sim \mbox{Exp}(\lambda_j), \quad \lambda_j = \sum_{k=1}^\infty \lambda_{jk}, \quad \lambda_{jk} \sim \mbox{Gamma}(r_{jk}, e^{x^T \beta_{jk}}). \qquad (8)$$
We provide in the Appendix the marginal distribution of $t$ in LDR for situations where predicting the failure time is of interest. The survival and hazard functions of LDR, which generalize those of Lomax racing in (6), can be expressed as
$$S(t \mid x) = \prod_{j=1}^J \prod_{k=1}^\infty \big(1 + t e^{x^T \beta_{jk}}\big)^{-r_{jk}}, \qquad h(t \mid x) = \sum_{j=1}^J \sum_{k=1}^\infty \frac{r_{jk}}{t + e^{-x^T \beta_{jk}}}. \qquad (9)$$
The LDR survival model can be considered as a two-phase racing, where in the first phase, for each prespecified competing risk there is a race among countably infinite subrisks, and in the second phase, risk-specific failure times race with each other to eventually determine both the observed failure time $t$ and event type $y$. Moreover, Corollary 1, representing LDR as a single-phase exponential racing, more explicitly explains nonmonotonic covariate effects on $t$ by writing the exponential rate parameter of $t_j$ as the aggregation of $e^{x^T \beta_{jk}}$ weighted by gamma random variables whose shape parameters are the atom weights $r_{jk}$ of the gamma process $G_j$.
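A generative sketch of the single-phase view in Corollary 1, with a small hand-picked set of subrisk atoms (hypothetical values, not fitted ones), truncated to finitely many atoms per risk:

```python
import math
import random

def ldr_draw(x, atoms, rng):
    """One LDR draw: for risk j, the rate lam_j is a sum over subrisk atoms
    of Gamma(r_jk, scale exp(x . beta_jk)) variables; the risks then run an
    exponential race. `atoms[j]` is a list of (r_jk, beta_jk) pairs."""
    lams = [
        sum(
            rng.gammavariate(r, math.exp(sum(xi * bi for xi, bi in zip(x, beta))))
            for r, beta in subrisks
        )
        for subrisks in atoms
    ]
    times = [rng.expovariate(lam) for lam in lams]
    t = min(times)
    return times.index(t), t   # event type y and time to event t

rng = random.Random(4)
x = [1.0]
# risk 0 has two subrisk atoms, risk 1 has one; risk 0's rates dominate
atoms = [[(2.0, [1.5]), (1.0, [0.5])], [(0.2, [-2.0])]]
wins = [0, 0]
for _ in range(5_000):
    y, t = ldr_draw(x, atoms, rng)
    wins[y] += 1
print(wins)   # risk 0 wins the vast majority of races
```

Because the atoms of risk 0 induce much larger rates at this covariate value, it wins almost every race, illustrating how the gamma-weighted aggregation controls the event type.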
4 Bayesian inference
LDR utilizes a gamma process ferguson73 to support countably infinite regression-coefficient vectors for each prespecified risk. The gamma process has an inherent shrinkage mechanism in that the number of atoms whose weights are larger than a positive constant $\epsilon$ is finite almost surely and follows a Poisson distribution with mean $\gamma_0 \int_\epsilon^\infty r^{-1} e^{-c_0 r}\, dr$, where $\gamma_0 = G_0(\Omega)$. For the convenience of implementation, as in Zhou et al. NBP2012 , we truncate the total number of atoms of a gamma process to be $K$ by choosing a finite and discrete base measure as $G_0 = \sum_{k=1}^K \frac{\gamma_0}{K} \delta_{\beta_k}$. Let us denote $x_i$ and $y_i$ as the covariates and the event type, respectively, for individual $i$. We express the full hierarchical model of the (truncated) gamma process LDR survival model as
$$y_i = \operatorname{argmin}_j t_{ij}, \quad t_{ij} \sim \mbox{Exp}(\lambda_{ij\cdot}), \quad \lambda_{ijk} \sim \mbox{Gamma}(r_{jk}, e^{x_i^T \beta_{jk}}), \quad r_{jk} \sim \mbox{Gamma}(\gamma_0/K, 1/c_0), \qquad (10)$$
where we further let $\lambda_{ij\cdot} = \sum_{k=1}^K \lambda_{ijk}$. The joint probability given $\{\lambda_{ijk}\}_{i,j,k}$ is
$$P(\{y_i, t_i\}_i \mid \{\lambda_{ijk}\}_{i,j,k}) = \prod_i \Big(\textstyle\sum_{k=1}^K \lambda_{i y_i k}\Big) \exp\Big(-t_i \textstyle\sum_{j=1}^J \sum_{k=1}^K \lambda_{ijk}\Big),$$
which is amenable to posterior simulation for $\lambda_{ijk}$. Let us denote $\mbox{NB}(m; r, p) = \frac{\Gamma(m + r)}{m!\, \Gamma(r)}\, p^m (1 - p)^r$ as the likelihood of the negative binomial distribution and $\sigma(z) = 1/(1 + e^{-z})$ as the sigmoid function. Further marginalizing out the $\lambda_{ijk}$'s leads to a fully factorized joint likelihood as in (11), which is amenable to posterior simulation using the data-augmentation-based inference technique for negative binomial regression LGNB_ICML2012 ; polson2013bayesian . The augmentation schemes for $t_i$ and/or $y_i$ discussed in Section 2 are used to achieve (11) in the presence of censoring or as a remedy for missing data. We describe in detail both Gibbs sampling and maximum a posteriori (MAP) inference in the Appendix.
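The shrinkage mechanism of the truncated gamma process can be illustrated by simulating the atom weights $r_k \sim \mbox{Gamma}(\gamma_0/K, 1/c_0)$; the hyperparameters below are illustrative choices, not values from the paper:

```python
import random

rng = random.Random(3)
gamma0, c0, K = 2.0, 1.0, 100   # illustrative hyperparameters
n_draws = 2000
totals, active = [], []
for _ in range(n_draws):
    # truncated gamma-process atom weights: r_k ~ Gamma(gamma0 / K, 1 / c0)
    r = [rng.gammavariate(gamma0 / K, 1.0 / c0) for _ in range(K)]
    totals.append(sum(r))
    active.append(sum(w > 0.01 for w in r))
print(sum(totals) / n_draws)    # total mass concentrates near gamma0 / c0 = 2
print(sum(active) / n_draws)    # only a few of the K = 100 atoms are non-negligible
```

Although $K = 100$ atoms are instantiated, on average only a handful carry non-negligible weight, which is the parsimony that lets LDR prune unneeded subrisks.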
5 Experimental results
Table 1: Generating processes of the latent survival times $t_j$ and observed time to event $t$ for synthetic data 1 and synthetic data 2.
In this section, we validate the proposed LDR model by a variety of experiments using both synthetic and real data. Some data description, implementation of benchmark approaches, and experiment settings are deferred to the Appendix for brevity. In all experiments we exclude from the testing data the observations that have unknown failure times or event types. We compare the proposed LDR survival model, the cause-specific Cox proportional hazards model (Cox) wolbers2014concordance ; austin2016introduction , the Fine–Gray proportional subdistribution hazards model (FG) fine1999proportional and its boosting algorithm (BST), which is more stable for high-dimensional covariates binder2009boosting , and random survival forests (RF) ishwaran2014random , which are all designed for survival analysis with competing risks. We show that LDR performs uniformly well regardless of whether the covariate effects are monotonic or not. Moreover, LDR is able to infer the missing cause of death and/or survival time of an observation, both of which in general cannot be handled by these benchmark methods. The model fits of LDR by Bayesian inference via Gibbs sampling and MAP inference via stochastic gradient descent (SGD) are comparable. We report the results of Gibbs sampling, as it provides an explicit criterion to prune unneeded model capacity (Steps 1 and 8 of Appendix B), avoiding the need for model selection and parameter tuning. For large-scale data, performing MAP inference via SGD is recommended if Gibbs sampling takes too long to run a sufficiently large number of iterations. We quantify model performance by the cause-specific concordance index (C-index) wolbers2014concordance , where the C-index of risk $j$ at time $\tau$ in this paper is computed as
$$C_j(\tau) = P\big(M_j(\tau, x_a) > M_j(\tau, x_b) \,\big|\, y_a = j,\ t_a \le \tau,\ \{t_b > t_a\} \cup \{y_b \ne j\}\big),$$
where $(a, b)$ ranges over pairs of distinct subjects and $M_j(\tau, x)$ is a prognostic score at time $\tau$ depending on $x$, such that a higher value reflects a higher risk of cause $j$. Intuitively, for cause $j$, if patient $a$ died of this cause ($y_a = j$, $t_a \le \tau$), and patient $b$ either died of another cause ($y_b \ne j$) or died of this cause but lived longer than patient $a$ ($y_b = j$, $t_b > t_a$), then it is likely that $M_j(\tau, x_a)$ for patient $a$ is higher than $M_j(\tau, x_b)$ for patient $b$, and the ranking of risks for this pair of patients is concordant. The C-index measures such concordance, and a higher value indicates better model performance. Wolbers et al. wolbers2014concordance write the C-index as a weighted average of time-dependent AUCs that are related to sensitivity, specificity, and ROC curves for competing risks saha2010time . So a C-index around 0.5 implies a model failure. A good choice of the prognostic score is the cumulative incidence function (CIF) fine1999proportional ; kalbfleisch2011statistical ; crowder2001classical . Distinct from a survival function that measures the probability of surviving beyond some time, the CIF estimates the probability that an event occurs by a specific time in the presence of competing risks. For LDR, given $\{r_{jk}\}$ and $\{\beta_{jk}\}$,
$$\mbox{CIF}_j(t \mid x) = \mathbb{E}\bigg[\frac{\lambda_j}{\sum_{j'} \lambda_{j'}}\Big(1 - e^{-t \sum_{j'} \lambda_{j'}}\Big)\bigg],$$
where the expectation is taken over $\{\lambda_j\}_{j=1}^J$, where $\lambda_j = \sum_{k=1}^K \lambda_{jk}$, $\lambda_{jk} \sim \mbox{Gamma}(r_{jk}, e^{x^T \beta_{jk}})$. The expectation can be evaluated by Monte Carlo estimation if we have a point estimate or a collection of post-burn-in MCMC samples of $\{r_{jk}\}$ and $\{\beta_{jk}\}$.
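A simplified version of the cause-specific C-index described above, ignoring the inverse-probability-of-censoring weighting used by Wolbers et al., can be sketched as:

```python
def cause_specific_c_index(times, events, scores, cause, tau):
    """Simplified cause-specific C-index at horizon tau: subject a, who
    failed from `cause` by tau, is compared with every b that failed later
    or failed from another cause; the pair is concordant when a's
    prognostic score is higher. No censoring weights are applied."""
    concordant = comparable = 0
    n = len(times)
    for a in range(n):
        if events[a] != cause or times[a] > tau:
            continue
        for b in range(n):
            if b == a:
                continue
            if times[b] > times[a] or events[b] != cause:
                comparable += 1
                concordant += scores[a] > scores[b]
    return concordant / comparable if comparable else float("nan")

# toy data: higher score = higher risk of cause 1; perfectly concordant
times = [1.0, 2.0, 3.0]
events = [1, 1, 2]
scores = [0.9, 0.5, 0.1]
print(cause_specific_c_index(times, events, scores, cause=1, tau=5.0))  # 1.0
```

Reversing the scores gives a C-index of 0, while random scores hover around 0.5, matching the interpretation of 0.5 as model failure.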
5.1 Synthetic data analysis
We first simulate two datasets following Table 1 to illustrate the unique nonlinear modeling capability of LDR. In Table 1, $t_j$ denotes the latent survival time under risk $j$, and $t$ is the observed time to event. The event type is $y = \operatorname{argmin}_j t_j$ if the minimum latent survival time does not exceed the censoring time, and the observation is right censored otherwise, with a different fixed censoring time for each dataset. We simulate 1,000 random observations, and use 800 for training and the remaining 200 for testing. We randomly take 20 training/testing partitions, on each of which we evaluate the testing cause-specific C-index at a grid of evaluation times for each dataset. The sample means and standard deviations of the estimated cause-specific C-indices of risk 1, and the estimated $r_{jk}$'s of both risks by LDR (from one random training/testing partition but without loss of generality) for data 1 are displayed in panels (a) and (b) of Figure 1, respectively. Analogous plots for data 2 are shown in panels (c) and (d). The testing C-indices of risk 2 are analogous to those of risk 1 for both datasets, and are thus shown in Figure 5 in the Appendix for brevity.
For data 1, where the survival times under both risks depend on the covariates monotonically, LDR has comparable performance with Cox, FG, and BST, and all four models slightly outperform RF in terms of the mean values of C-indices. The underperformance of RF in the case of monotonic covariate effects has also been observed in its original paper ishwaran2014random . For data 2, where the survival time and covariates are not monotonically related, LDR and RF significantly outperform the other three approaches at every time evaluated, all of which fail on this dataset as their C-indices are around 0.5 for both risks. Panels (b) and (d) of Figure 1 show the inferred $r_{jk}$'s on data 1 and 2, respectively. More specifically, both risks consist of only one subrisk for data 1. By contrast, two subrisks for each of the two respective risks can approximate the complex data generating process of data 2.
5.2 Real data analysis
We analyze a microarray gene-expression profile rosenwald2002use to assess our model performance on real data. The dataset contains a total of 240 patients with diffuse large B-cell lymphoma (DLBCL). Multiple unsuccessful treatments to increase the survival rate suggest that there exist several subtypes of DLBCL that differ in responsiveness to chemotherapy. In the DLBCL dataset, Rosenwald et al. rosenwald2002use identify three gene-expression subgroups, including activated B-cell-like (ABC), germinal-center B-cell-like (GCB), and type 3 (T3) DLBCL, which may be related to three different diseases resulting from distinct mechanisms of malignant transformation. They also suspect that T3 may be associated with more than one such mechanism. In our analysis, we treat the three subgroups and their potential malignant transformation mechanisms as competing risks from which the patients suffer. As the total number of patients is small, which is often the case in survival data, we consider 434 genes that have no missing values across all the patients. Seven of the 434 genes have been reported to be related to clinical phenotypes, and four of the seven to have nonmonotonic effects on the risk of death li2005boosting . Since some gene expressions may be highly correlated, we follow the same selection procedure of Li and Luan li2005boosting to include as covariates the seven genes, together with another 33 genes having the highest Cox partial score statistics, so that neither the Cox proportional hazards model nor the FG subdistribution model for competing risks collapses for computational singularity. We use 200 observations for training and the remaining 40 for testing. We take 20 random training/testing partitions and report in Figure 2 boxplots of the testing C-indices evaluated at a sequence of years, by the same five approaches used in the analysis of the synthetic datasets.
The boxplots of BST and LDR are roughly comparable for ABC, but the median of LDR is slightly higher than that of BST until year 2, and slightly lower thereafter. For GCB and T3, LDR results in higher median C-indices than all the other benchmarks at every time evaluated, indicating that LDR provides a substantial improvement in predicting lymphoma CIFs. It is interesting to note that RF has low performance on both ABC and GCB, but outperforms Cox, FG, and BST and is comparable to LDR on T3. This implies that the gene expressions may have monotonic effects on survival under ABC or GCB, but not under T3, which can be validated by the fact that LDR learns one subrisk each for ABC and GCB, and two subrisks for T3. To better show the improvements of LDR over existing approaches, we calculate the difference of C-indices between LDR and each of the other four benchmarks within each training/testing partition, and report the sample means and standard deviations across partitions in Table 11 in the Appendix. On average, the improvements of LDR over Cox, FG, and BST are larger for T3 than for ABC or GCB, whereas LDR outperforms RF by a larger margin for ABC and GCB than for T3. This shows another advantage of LDR: it fits consistently well regardless of whether the covariate effects are monotonic or not.
We further analyze a publicly accessible dataset from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute seer . The SEER dataset we use contains two risks: one is breast cancer and the other is “other causes,” which we denote as BC and OC, respectively. It also contains some incomplete observations, each with an unknown cause of death but an observed uncensored time to death, that can be handled by LDR. The individual covariates include the patients' personal information, such as age, gender, race, and diagnostic and therapy information. More details are deferred to the Appendix.
We first eliminate all observations with unknown causes of death, so that we can compare LDR, Cox, FG, BST, and RF. We take 20 random training/testing partitions of the dataset, in each of which the observations are split into a training set and a testing set. In Figure 3, panels (a) and (b) show the boxplots of C-indices for BC and OC, respectively, obtained from the 20 testing sets by the five models at a grid of months. For BC, the C-indices by LDR are comparable to those by the other four approaches until month 150 and slightly higher afterwards. For OC, the C-indices by LDR are slightly lower than those by Cox, FG, and BST, but become similar after month 100. Also note that RF underperforms the other four approaches from month 100 onward for BC and from month 50 onward for OC. The comparable C-indices from LDR, Cox, FG, and BST imply monotonic impacts of covariates on survival times under both risks. In fact, for either risk we learn a subrisk that dominates the others in terms of weights. Furthermore, we analyze the SEER dataset by LDR using the same training/testing partitions, but additionally including the observations having missing causes of death in the 20 training sets, and show the testing C-indices in panels (c) and (d) of Figure 3. The testing C-indices are very similar to those in (a) and (b). More importantly, LDR provides probabilistic inference on missing times to event or missing causes of death during the model training procedure.
In Appendix E we further provide the Brier scores gerds2008performance ; steyerberg2010assessing of each risk in all datasets over time. The Brier score quantifies the deviation of predicted CIFs from the actual outcomes, and a smaller value implies better model performance van2011dynamic . Tables 2–10 in Appendix E show the Brier scores of the compared models on the four datasets, indicating that the out-of-sample prediction performance is basically consistent with that quantified by C-indices. Specifically, for synthetic data 1, SEER, and both ABC and GCB of DLBCL, where C-indices imply linear covariate effects, the Brier scores are comparable for Cox, FG, BST, and LDR, and slightly smaller than those of RF. For synthetic data 2 and T3 of DLBCL, where C-indices imply nonlinear covariate effects, the Brier scores by LDR and RF are smaller than those by Cox, FG, and BST. Moreover, the Brier scores by LDR are slightly larger than those of RF for synthetic data 2 but smaller for T3 of DLBCL.
To show the interpretability of LDR, we visualize representative individuals, each of whom suffers from an inferred subrisk. Specifically, for each inferred subrisk $k$ under risk $j$, we find the representative by evaluating a weighted average of all uncensored observations as $\bar{x}_{jk} = \sum_i w_{ijk} x_i / \sum_i w_{ijk}$, where $w_{ijk}$ is the estimated probability that the event of subrisk $k$ under risk $j$ occurs first for individual $i$, computed from the estimated values of $r_{jk}$ and $\beta_{jk}$. The weight extracts the component of $x_i$ that is likely to make the event of subrisk $k$ under risk $j$ occur first. Then we implement an Isomap algorithm tenenbaum2000global and visualize in Figure 4 the representatives along with uncensored observations in both DLBCL and SEER. Details are provided in the Appendix.
In Figure 4, small symbols denote uncensored observations and large ones the representatives. Panels (a) and (b) show the representatives suffering from subrisks in the DLBCL and SEER datasets, respectively. In panel (a), we use green for ABC, pink for GCB, and black for T3. The only representative suffering from ABC (GCB) is surrounded by small green (pink) symbols, indicating that it signifies a typical gene expression profile that may result in the corresponding malignant transformation. There are two representatives suffering from the two subrisks of T3, denoted by a large triangle and a large diamond, respectively. They approximately lie in the centers of the respective clusters of small triangles and diamonds, which denote patients suffering from the corresponding subrisks of T3 with a high estimated probability. The two subrisks of T3 and their representatives verify the heterogeneity of gene expressions under this risk, and strengthen the belief that T3 consists of more than one type of DLBCL rosenwald2002use . For the SEER data, we randomly select 100 of the 2088 uncensored observations with known event types for visualization. In panel (b), we use green for BC and pink for OC. LDR learns only one subrisk for each of these two risks, and places for each risk a representative approximately at the center of the cluster of patients who died of that risk.
6 Conclusion
We propose Lomax delegate racing (LDR) for survival analysis with competing risks. LDR models the survival times under competing risks as a two-phase race of subrisks, which not only intuitively explains the mechanism of surviving under competing risks, but also helps model nonmonotonic covariate effects. We use the gamma process to support a potentially countably infinite number of subrisks for each risk, and rely on its inherent shrinkage mechanism to remove unneeded model capacity, making LDR capable of detecting unknown event subtypes without prespecifying their number. LDR admits a hierarchical representation that facilitates the derivation of Gibbs sampling under data augmentation, which can be adapted for various practical situations such as missing event times or types. A more scalable (stochastic) gradient descent based maximum a posteriori inference algorithm is also developed for big data applications. Experimental results show that with strong interpretability and outstanding performance, the proposed LDR survival model is an attractive alternative to existing ones for various tasks in survival analysis with competing risks.
Acknowledgments
The authors acknowledge the support of Award #1812699 from the U.S. National Science Foundation.
References
 (1) E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” Journal of the American statistical association, vol. 53, no. 282, pp. 457–481, 1958.
 (2) Y. Wang, B. Xie, N. Du, and L. Song, “Isotonic Hawkes processes,” in International conference on machine learning, pp. 2226–2234, 2016.
 (3) H. Mei and J. M. Eisner, “The neural Hawkes process: A neurally self-modulating multivariate point process,” in NIPS, pp. 6754–6764, 2017.
 (4) D. R. Cox, “Regression models and life-tables,” in Breakthroughs in statistics, pp. 527–541, Springer, 1992.
 (5) F. Ballester, D. Corella, S. Pérez-Hoyos, M. Sáez, and A. Hervás, “Mortality as a function of temperature. A study in Valencia, Spain, 1991-1993,” International journal of epidemiology, vol. 26, no. 3, pp. 551–561, 1997.
 (6) K. M. Flegal, B. I. Graubard, D. F. Williamson, and M. H. Gail, “Cause-specific excess deaths associated with underweight, overweight, and obesity,” JAMA, vol. 298, no. 17, pp. 2028–2037, 2007.
 (7) H. Li and Y. Luan, “Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data,” Bioinformatics, vol. 21, no. 10, pp. 2403–2409, 2005.
 (8) W. Lu and L. Li, “Boosting method for nonlinear transformation models with censored survival data,” Biostatistics, vol. 9, no. 4, pp. 658–667, 2008.
 (9) R. Ranganath, A. Perotte, N. Elhadad, and D. Blei, “Deep survival analysis,” in Machine Learning for Healthcare Conference, pp. 101–114, 2016.
 (10) J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger, “DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network,” BMC medical research methodology, vol. 18, no. 1, p. 24, 2018.
 (11) X. Zhu, J. Yao, and J. Huang, “Deep convolutional neural network for survival analysis with pathological images,” in Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on, pp. 544–547, IEEE, 2016.
 (12) P. Chapfuwa, C. Tao, C. Li, C. Page, B. Goldstein, L. Carin, and R. Henao, “Adversarial time-to-event modeling,” in ICML, 2018.
 (13) T. Fernández, N. Rivera, and Y. W. Teh, “Gaussian processes for survival analysis,” in NIPS, pp. 5021–5029, 2016.
 (14) M. Luck, T. Sylvain, H. Cardinal, A. Lodi, and Y. Bengio, “Deep learning for patient-specific kidney graft survival analysis,” arXiv preprint arXiv:1705.10245, 2017.
 (15) Y. Li, J. Wang, J. Ye, and C. K. Reddy, “A multi-task learning formulation for survival analysis,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1715–1724, ACM, 2016.
 (16) C.-N. Yu, R. Greiner, H.-C. Lin, and V. Baracos, “Learning patient-specific cancer survival distributions as a sequence of dependent regressors,” in NIPS, pp. 1845–1853, 2011.
 (17) X. Miscouridou, A. Perotte, N. Elhadad, and R. Ranganath, “Deep survival analysis: Nonparametrics and missingness,” in Machine Learning for Healthcare Conference, 2018.
 (18) J. D. Kalbfleisch and R. L. Prentice, The statistical analysis of failure time data, vol. 360. John Wiley & Sons, 2011.
 (19) P. C. Austin, D. S. Lee, and J. P. Fine, “Introduction to the analysis of survival data in the presence of competing risks,” Circulation, vol. 133, no. 6, pp. 601–609, 2016.
 (20) H. Putter, M. Fiocco, and R. B. Geskus, “Tutorial in biostatistics: Competing risks and multi-state models,” Statistics in medicine, vol. 26, no. 11, pp. 2389–2430, 2007.
 (21) B. Lau, S. R. Cole, and S. J. Gange, “Competing risk regression models for epidemiologic data,” American journal of epidemiology, vol. 170, no. 2, pp. 244–256, 2009.
 (22) J. P. Fine and R. J. Gray, “A proportional hazards model for the subdistribution of a competing risk,” Journal of the American statistical association, vol. 94, no. 446, pp. 496–509, 1999.
 (23) M. Wolbers, P. Blanche, M. T. Koller, J. C. Witteman, and T. A. Gerds, “Concordance for prognostic models with competing risks,” Biostatistics, vol. 15, no. 3, pp. 526–539, 2014.
 (24) H. Ishwaran, T. A. Gerds, U. B. Kogalur, R. D. Moore, S. J. Gange, and B. M. Lau, “Random survival forests for competing risks,” Biostatistics, vol. 15, no. 4, pp. 757–773, 2014.
 (25) J. E. Barrett and A. C. Coolen, “Gaussian process regression for survival data with competing risks,” arXiv preprint arXiv:1312.1591, 2013.
 (26) A. M. Alaa and M. v. d. Schaar, “Deep multitask gaussian processes for survival analysis with competing risks,” in NIPS, 2017.
 (27) C. Lee, W. R. Zame, J. Yoon, and M. van der Schaar, “DeepHit: A deep learning approach to survival analysis with competing risks,” in AAAI, 2018.
 (28) M. J. Crowder, Classical competing risks. CRC Press, 2001.
 (29) F. Caron and Y. W. Teh, “Bayesian nonparametric models for ranked data,” in NIPS, pp. 1520–1528, 2012.
 (30) K. Lomax, “Business failures: Another example of the analysis of failure data,” Journal of the American Statistical Association, vol. 49, no. 268, pp. 847–852, 1954.
 (31) J. Myhre and S. Saunders, “Screen testing and conditional probability of survival,” Lecture Notes-Monograph Series, pp. 166–178, 1982.
 (32) H. A. Howlader and A. M. Hossain, “Bayesian survival estimation of Pareto distribution of the second kind based on failure-censored data,” Computational statistics & data analysis, vol. 38, no. 3, pp. 301–314, 2002.
 (33) E. Cramer and A. B. Schmiedt, “Progressively Type-II censored competing risks data from Lomax distributions,” Computational Statistics & Data Analysis, vol. 55, no. 3, pp. 1285–1303, 2011.
 (34) F. Hemmati and E. Khorram, “On adaptive progressively Type-II censored competing risks data,” Communications in Statistics-Simulation and Computation, pp. 1–23, 2017.
 (35) S. Al-Awadhi and M. Ghitany, “Statistical properties of Poisson-Lomax distribution and its application to repeated accidents data,” Journal of Applied Statistical Science, vol. 10, no. 4, pp. 365–372, 2001.
 (36) A. Childs, N. Balakrishnan, and M. Moshref, “Order statistics from non-identical right-truncated Lomax random variables with applications,” Statistical Papers, vol. 42, no. 2, pp. 187–206, 2001.
 (37) D. E. Giles, H. Feng, and R. T. Godwin, “On the bias of the maximum likelihood estimator for the two-parameter Lomax distribution,” Communications in Statistics-Theory and Methods, vol. 42, no. 11, pp. 1934–1950, 2013.
 (38) R. Varma, N. M. Bressler, Q. V. Doan, M. Gleeson, M. Danese, J. K. Bower, E. Selvin, C. Dolan, J. Fine, S. Colman, et al., “Prevalence of and risk factors for diabetic macular edema in the United States,” JAMA ophthalmology, vol. 132, no. 11, pp. 1334–1340, 2014.
 (39) M. Zhou, Y. Cong, and B. Chen, “Augmentable gamma belief networks,” Journal of Machine Learning Research, vol. 17, no. 163, pp. 1–44, 2016.
 (40) M. Zhou, “Softplus regressions and convex polytopes,” arXiv:1608.06383, 2016.
 (41) T. S. Ferguson, “A Bayesian analysis of some nonparametric problems,” Ann. Statist., vol. 1, no. 2, pp. 209–230, 1973.
 (42) M. Zhou and L. Carin, “Negative binomial process count and mixture modeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 2, pp. 307–320, 2015.
 (43) M. Zhou, L. Li, D. Dunson, and L. Carin, “Lognormal and gamma mixed negative binomial regression,” in ICML, pp. 1343–1350, 2012.
 (44) N. G. Polson, J. G. Scott, and J. Windle, “Bayesian inference for logistic models using Pólya–Gamma latent variables,” J. Amer. Statist. Assoc., vol. 108, no. 504, pp. 1339–1349, 2013.
 (45) H. Binder, A. Allignol, M. Schumacher, and J. Beyersmann, “Boosting for highdimensional timetoevent data with competing risks,” Bioinformatics, vol. 25, no. 7, pp. 890–896, 2009.
 (46) P. Saha and P. Heagerty, “Time-dependent predictive accuracy in the presence of competing risks,” Biometrics, vol. 66, no. 4, pp. 999–1011, 2010.
 (47) A. Rosenwald, G. Wright, W. C. Chan, J. M. Connors, E. Campo, R. I. Fisher, R. D. Gascoyne, H. K. Muller-Hermelink, E. B. Smeland, J. M. Giltnane, et al., “The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma,” New England Journal of Medicine, vol. 346, no. 25, pp. 1937–1947, 2002.
 (48) National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, Surveillance, Epidemiology, and End Results (SEER) Program Research Data (1973-2014), released April 2017, based on the November 2016 submission.
 (49) T. A. Gerds, T. Cai, and M. Schumacher, “The performance of risk prediction models,” Biometrical Journal: Journal of Mathematical Methods in Biosciences, vol. 50, no. 4, pp. 457–479, 2008.
 (50) E. W. Steyerberg, A. J. Vickers, N. R. Cook, T. Gerds, M. Gonen, N. Obuchowski, M. J. Pencina, and M. W. Kattan, “Assessing the performance of prediction models: A framework for some traditional and novel measures,” Epidemiology (Cambridge, Mass.), vol. 21, no. 1, p. 128, 2010.
 (51) H. Van Houwelingen and H. Putter, Dynamic prediction in clinical survival analysis. CRC Press, 2011.
 (52) J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
 (53) P. G. Moschopoulos, “The distribution of the sum of independent gamma random variables,” Annals of the Institute of Statistical Mathematics, vol. 37, no. 1, pp. 541–544, 1985.
 (54) T. A. Gerds, pec: Prediction Error Curves for Risk Prediction Models in Survival Analysis, 2017. R package version 2.5.4.
 (55) T. A. Gerds and T. H. Scheike, riskRegression: Risk Regression Models for Survival Analysis with Competing Risks, 2015. R package version 1.1.7.
 (56) B. Gray, cmprsk: Subdistribution Analysis of Competing Risks, 2014. R package version 2.27.
 (57) H. Binder, CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks, 2013. R package version 1.4.
 (58) H. Ishwaran and U. Kogalur, Random Forests for Survival, Regression, and Classification (RFSRC), 2018. R package version 2.6.0.
 (59) T. H. Cormen, Introduction to algorithms. MIT press, 2009.
 (60) I. Borg and P. J. Groenen, Modern multidimensional scaling: Theory and applications. Springer Science & Business Media, 2005.
Appendix A Marginal distribution of failure time in LDR
Theorem 1.
If with and , the PDF of given and is
and the cumulative distribution function (CDF) is
(12) 
where , , , , for , and .
It is difficult to work directly with the PDF or CDF of in the form of a series, but we can approximate (12) by a finite truncation. Concretely, as , we find an so large that is close to (say, no less than ), and use as an approximation. Consequently, sampling is feasible in general by inverting the approximated CDF. We have tried prediction by finite truncation on synthetic data and found that is mostly between 10 and 30, which is computationally acceptable.
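The finite-truncation idea can be made concrete with the gamma-convolution series of Moschopoulos moschopoulos1985distribution that underlies the proof below. The sketch truncates the series after N terms; the specific shapes, scales, function name, and the closed-form sanity check are our own illustrative choices, not part of the model.

```python
import math
import numpy as np

def gamma_conv_pdf(y, alphas, betas, N=30):
    """Truncated Moschopoulos (1985) series for the PDF of a sum of
    independent Gamma(alpha_i, beta_i) variables (beta_i are scales).

    The infinite series is cut off after N terms; truncation levels
    between 10 and 30 are typically enough in practice.
    """
    alphas = np.asarray(alphas, float)
    betas = np.asarray(betas, float)
    b1 = betas.min()                                   # smallest scale
    C = np.prod((b1 / betas) ** alphas)                # leading constant
    rho = alphas.sum()
    # delta_k recursion: delta_0 = 1,
    # delta_k = (1/k) * sum_{i=1}^{k} [sum_j alpha_j (1 - b1/beta_j)^i] * delta_{k-i}
    s = np.array([(alphas * (1.0 - b1 / betas) ** i).sum() for i in range(N + 1)])
    delta = np.zeros(N + 1)
    delta[0] = 1.0
    for k in range(1, N + 1):
        delta[k] = sum(s[i] * delta[k - i] for i in range(1, k + 1)) / k
    # each series term is delta_k times a Gamma(rho + k, scale b1) density
    total = 0.0
    for k in range(N + 1):
        a = rho + k
        total += delta[k] * math.exp(
            (a - 1.0) * math.log(y) - y / b1 - math.lgamma(a) - a * math.log(b1)
        )
    return C * total

# sanity check: with equal scales, the sum is exactly Gamma(sum(alphas), scale),
# so the truncated series must reduce to that single gamma density
approx = gamma_conv_pdf(3.0, [2.0, 3.0], [1.5, 1.5])
exact = math.exp(4.0 * math.log(3.0) - 3.0 / 1.5 - math.lgamma(5.0) - 5.0 * math.log(1.5))
```

Once the truncated CDF is tabulated on a grid the same way, samples can be drawn by numerical inversion, which is how the finite truncation supports prediction.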
Proof.
We first study the distribution of gamma convolution. Specifically, if with , then the PDF of can be written in a form of series moschopoulos1985distribution as
where , , , , and . Moschopoulos moschopoulos1985distribution proved that and where . With , we want to show the PDF of ,
(13)  
for which it suffices to prove the equality in (13). Note that
which shows the uniform convergence of . Hence the integration and the countable summation are interchangeable, and consequently (13) holds. Next we want to show the CDF of ,