Our work was motivated by the longitudinal epidemiologic Honolulu-Asia Aging Study (HAAS). The HAAS cohort is comprised of the surviving participants from the Honolulu Heart Program (HHP), a prospective, community-based cohort study of heart disease and stroke established in 1965 with about 8,000 men of Japanese ancestry living on the island of Oahu, who were born between 1900-1919. HAAS was established in 1991 and was brought to closure in 2012 with the goal of determining the prevalence, incidence, and risk factors for Alzheimer’s disease (AD) and brain aging. Demographic data, vital status and diet data were collected every 2-3 years during the HHP period, and neuropsychologic assessment were performed every 2-3 years during the HAAS. Our goal is to assess the causal effect of mid-life alcohol exposure captured during HHP on late life outcomes collected in HAAS. In particular, a subject may develop cognitive impairment, then die, or die without cognitive impairment. These are referred to as semi-competing risks where there are non-terminal events (cognitive impairment) and terminal events (death).
As outcomes we are interested in time to non-terminal event and time to terminal event, as well as
time to the terminal event following the non-terminal event.
The above semi-competing risks setting is the same as the three-states illness-death model
depicted in Figure 1,Xu et al. 
which was first introduced by Fix and Neyman .
We assume that a subject starts in the
“healthy” state (state 0), then transition into the cognitive impairment (state 1) or death state (state 2), which are also referred to as the intermediate or non-terminal, and the terminal state, respectively.
The corresponding transition events are then the non-terminal event and the terminal event, respectively.
Xu et al. 
discussed extensively the illness-death model for semi-competing risks data,
and also incorporated a shared frailty term in the illness-death model that encompasses previous works such as the copula model of Fine et al. . The illness-death model with shared frailty has been extended to different situations including in the presence of left truncation,Lee et al.  or for a nested case-control study.Jazić et al. 
Lee et al.  extended this model to the Bayesian paradigm.
Alvares et al.  developed an R package to analyze semi-competing risks data under the illness-death model using parametric models and the Bayesian method, but not for the semiparametric Cox model formulation.
For observational data, marginal structural models (MSM) have been established as a valuable tool for identifying causal effects, which can be consistently estimated using the inverse-probability-of-treatment weighting (IPTW).
developed an R package to analyze semi-competing risks data under the illness-death model using parametric models and the Bayesian method, but not for the semiparametric Cox model formulation. For observational data, marginal structural models (MSM) have been established as a valuable tool for identifying causal effects, which can be consistently estimated using the inverse-probability-of-treatment weighting (IPTW).Robins et al. , Hernán et al.  In this paper we consider a class of marginal structural illness–death models, with and without a shared frailty term. For the former an EM type iterative algorithm is needed in order to estimate the parameters. The structural models give rise to interpretable causal quantities such as different types of risk contrasts in the multi-state setting.Meira-Machado and Sestelo  The remainder of this article is organized as follows. In the next section we introduce the structural models and assumptions. In Section 3 we discuss inference under the usual Markov illness-death structural model and Section 4 the general Markov illness-death structural model, where a weighted EM algorithm is developed and studied. In Section 5 we carry out extensive simulation studies to assess the performance under the two models including when either one of the model is valid while the other is not. We apply the approaches to the HAAS data set described above in Section 6 and conclude with more discussion in the last section.
2 Three-State Illness-Death model
2.1 Definitions and assumptions
For our setup, assume a well-defined time zero, and let random variables
For our setup, assume a well-defined time zero, and let random variablesand denote time to the non-terminal and the terminal event since time zero, respectively. If a subject does not experience the non-terminal event before the terminal event, we define .Xu et al. , Fine et al.  Denote the joint density of and as in the upper wedge , and the density of along the line as for . Note that for semi-competing risks data, we do not observe any data in the lower wedge ; see Figure 2. We also denote the bivariate survival function of and in the upper wedge as . The multi-state model quantifies event rates and event risks based on the history of events, and is completely specified by the three transition intensities below, also referred to as transition rates in the literature. Let and be the transition rates from the initial healthy state to the non-terminal, and the terminal state, respectively, and the transition rate from the non-terminal state to the terminal state. That is,
Note that (1) and (2) are in fact the cause-specific hazards in the usual competing risks setting, for time to the non-terminal event and time to the terminal event without non-terminal event, respectively. In general, can depend on both and . In the following we consider the commonly used Markov assumption: , i.e. the transition rate from non-terminal to terminal state does not depend on what value takes. While the transition rates in (1) - (3) completely specifies the three-state illness-death model, for interpretation purposes various risk type quantities can be of interest in practice. Cumulative incidence function (CIF) are commonly used for competing risks,Kalbfleisch and Prentice  that is, for the non-terminal event, denoted by below, and for the terminal event without the non-terminal event, denoted by below. In addition, we may also consider a third CIF, denoted by , for the terminal event following the non-terminal event.Meira-Machado and Sestelo  We have
In the presence of right censoring, such as lost to follow-up or administrative censoring, let be the time to right censoring since time zero.
Denote , , and the event indicators , , where is the indicator function. Let be a binary treatment assignment, possibly not randomized.
Following Neyman  and Rubin  framework of potential outcomes, we denote as potential time to the non-terminal event, terminal event and censoring under treatment . And , , and are similarly defined.
Let Z be a p-dimensional vector of covariates.
Exchangeability implies that within levels of the variable , the potential event times and the treatment assignment are independent. It is also called (conditional) ignobility, and that there are no unmeasured confounders. The positivity assumption requires that the probability of receiving either treatment () or control () is positive for any given value of . The consistency assumption here links the potential outcomes with the observed outcomes. For more discussion on these assumptions, please see Hernán and Robins . We also assume: (IV) Non-informative censoring: .
are similarly defined. Let Z be a p-dimensional vector of covariates. Denote, often referred to as the propensity score. The causal relationship of the variables defined above can be depicted in a graphical display called a chain graph as in Figure 3,Tchetgen Tchetgen et al.  where the undirected line indicates correlation. A chain graph without undirected edges is known as a causal directed acyclic graphs (DAG). We assume the following, which are commonly used in order to identify the causal estimands to be specified later: (I) Stable unit treatment value assumption (SUTVA): there is only one version of the treatment and that there is no interference between subjects. (II) Exchangeability: . (III) Positivity: . (IV) Consistency: If , then , , .
2.2 The structural models
Let , and be the transition rates corresponding to the counterfactual states under the three-state model, . Andersen et al.  discussed about modeling each transition intensity by a Cox type proportional intensities regression model. Following the same idea, we can postulate the semi-parametric Cox models for these transition rates, which are also hazard functions.Xu et al. , Andersen et al.  In particular, we consider the following usual Markov illness-death structural model:Xu et al. 
The joint distribution of we consider the log-normal distribution for the frailty, and we have
The joint distribution ofand under model (7) - (9) will be given as a special case below. The usual Markov illness-death model can be extended by incorporating a frailty term, to the general Markov illness-death structural model. The frailty term induces further correlation between and , beyond what is already contained in the joint distribution of and above. It also models unobserved heterogeneity among individuals.Lancaster and Nickell , Nielsen et al.  Following Vaida and Xu 
we consider the log-normal distribution for the frailty, and we have
where . Obviously model (7) - (9) is a special case of (10) - (12) by setting . Recall the joint density and the bivariate survival function previously defined in the upper wedge , and the density function along the line . In the Supplementary Materials we show that these quantities can be derived as functions of the transition rates (1) - (3). With the models specified in (10) - (12) we then have the following quantities that will be used later:
where for , and with .
In this subsection we assume that the treatment is randomized so that we can write down the relevant probabilities for the four scenarios below. We will then use inverse probability weighting (IPW) to create a pseudo-randomized sample. Denote the observed data for subject , and the likelihood conditional on the random effect . We have the following four different scenarios: (i) Non-terminal event then censored prior to terminal event: ,
(ii) Non-terminal event and then terminal event: ,
(iii) Terminal event without non-terminal event: ,
(iv) Censored before any event: ,
Combining the above four scenarios, we have
3 The Usual Markov Structural Model
In the absence of randomization, denote
as the IP weight for subject .
In practice, is unknown and can be estimated from the data by either specifying a
parametric model such as the logistic regression,
is unknown and can be estimated from the data by either specifying a parametric model such as the logistic regression,Robins et al.  or use nonparametric methods such as boosted trees.McCaffrey et al.  For the usual Markov illness-death model, with in (16), we have the weighted log-likelihood
It can be seen that the parameters for the three transition rates , ,
are variationally independent in the above likelihood and therefore can be estimated separately. Note that the semiparametric approach under the Cox type models discretizes the baselines hazards into point masses at the observed event times and estimates the cumulative as step functions.
It can be verified that maximizing (17) is equivalent to maximizing the following three weighted Cox regression model likelihoods:
1) treating the non-terminal event as the event of interest, and terminal event without non-terminal or originally censored as ‘censored’;
2) treating the terminal event without non-terminal as the event of interest, and non-terminal event or originally censored as ‘censored’;
3) treating the terminal event following the non-terminal as the event of interest, left truncated at the time of the non-terminal event (so only those who had the non-terminal event are included), and originally censored as ‘censored’.
Then the standard software (e.g. coxph() in R package ‘survival’) can be used to obtain the estimates .
In order to obtain the variance of the estimates, if we assume the estimated weights in (
. In order to obtain the variance of the estimates, if we assume the estimated weights in (17) as known, then the robust sandwich variance estimator in standard software such as coxph() can be used to obtain the estimated variance for . In the Supplementary Materials we provide the formulas for estimating the covariances between . In addition, we may also use the bootstrap variance estimator which accounts for the uncertainty in estimating the weights. For causal interpretation, we may define the risk contrasts as the difference or the ratio between the CIF’s under the structural models with and . In particular,
We estimate the contrasts by plugging in the parameter estimates, and obtain their 95% confidence intervals (CI) using bootstrap.
We note that for simple competing risk data under the marginal structural Cox model, such risk contrasts are available in the R package ‘cmprskcoxmsm’.
. We estimate the contrasts by plugging in the parameter estimates, and obtain their 95% confidence intervals (CI) using bootstrap. We note that for simple competing risk data under the marginal structural Cox model, such risk contrasts are available in the R package ‘cmprskcoxmsm’.Zhang and Xu 
4 The General Markov Structural Model
where is the normal density function. Then the estimate can be obtained by maximizing (21). We introduce below an EM type algorithm in order to maximize (21). Denote the expectation of the weighted log-likelihood of the augmented data , , conditional on the observed data and the current parameter value :
Then , where
where is shorthand for a function of . Analogous to the EM algorithm, we iterate between the E-steps and the M-steps described below until convergence.
The conditional expectations in (24) - (27) are all in form of , where in (24) - (26) and in (27). These two expectations are not in closed form; however, we can approximate these integrals by numerical methods, specifically by (adaptive) Gaussian quadrature.Gander and Gautschi , Rice  Details of computation are shown in the Supplement Materials.
The M-step conveniently separates the update of and for from that of the variance component . For - , similar to Section 3, (24) - (26) are equivalent to the weighted log-likelihood functions in a Cox regression with additional known offsets . In order to maximize , we set
In the lemma below, we establish the following property of the above weighted EM algorithm, which is similar to that of the EM algorithm.
Suppose is the weighted observed data likelihood. At step of the algorithm denote the current value, and the value that maximizes . Then:
The proof of the lemma is given in the Supplement Materials. Following Wu  or Theorem 4.12 in Lehmann and Casella , since is continuous in both and , then all limit points of the weighted EM sequence are stationary points of , and converges monotonically to for some stationary point . In addition, for existence of such limit point(s) Vaida  proposed a condition for the usual unweighted EM algorithm: as long as the maximizer in the M-step is unique. We can show that this result extends immediately to our weighted EM algorithm. And finally, our M-step satisfies this condition, i.e. the maximizer in the M-step is unique. As initial values we use for and , , the estimates from weighted Cox regression without the offsets, i.e. from the usual Markov model of the previous section; and . The stop criteria we use in this paper are convergence in the log-likelihood as well as in parameters of interest: , , and .
The variance of the parameter estimates following a typical EM algorithm can be estimated by the inverse of a (discrete) observed information matrix calculated using Louis’ formula, including for the nonparametric maximum likelihood estimator (NPMLE) under, for example, the semiparametric proportional hazards mixed models.Vaida and Xu  For observational data, however, inference using the weighted NPMLE under semiparametric models requires the derivation of efficient influence functions,Breslow and Wellner  and is generally non-trivial under the normal frailty construct.Murphy and Van der Vaart , Maples et al.  In the following we use bootstrap to obtain the variance estimator for .
Similar to what we proposed under the usual Markov model, we also can define the risk contrasts under the general Markov model. Since the general Markov models are conditional on the random effect , we have the following conditional risk:
where . As discussed earlier the frailty term, or equivalently, the random effect represents the unobserved heterogeneity among the individuals. As such, the above conditional risk represents individual risk, and the risk contrasts the individual risk contrasts. We therefore have the individual risk difference (IRD) and the individual risk ratio (IRR). Under the random effects model, for , the predicted random effect is .Vaida and Xu  We then obtain the predicted IRD and the predicted IRR. For inference on these individual risk contrasts, Bayesian bootstrapKosorok  may be used which, unlike the usual resampling with replacement, preserves each individual in the original data set. Details of the Bayesian bootstrap are provided in the Supplementary Materials. Note that because is random, the common terminology in the literature is ‘predicted’ instead of ‘estimated’, and ‘prediction interval (PI)’ instead of CI.
We carry out extensive Monte Carlo simulation studies in order to assess the performance of the estimation procedure described above. We use the idea from Havercroft and Didelez  to simulate data under the marginal structural model (10) - (12). We also adapt the method from Jiang and Haneuse , originally designed for simulating semi-competing risk data with gamma frailty. Very briefly the following steps are used to to generate the data; more details are provided in the Supplementary Materials.
Generate and ;
Generate confounder , with , where , and ;
Generate Bernoulli(), where , with ;
Let and . Then with probability given in the Supplementary Materials,
and with probability ,
Generate Censoring time , which leads to an average censoring rate around 20%.
We set , .
Weights are calculated by fitting the logistic regression with as covariates.
We run 500 simulations for each case.
Table 1 and 2 report, for sample size n=250 and n=500, respectively, the estimate, the empirical standard deviation (SD), the mean of estimated standard errors (SE), and the coverage probability (CP) of the nominal 95% confidence intervals.
Under the usual Markov model, we estimate the asymptotical variance of
report, for sample size n=250 and n=500, respectively, the estimate, the empirical standard deviation (SD), the mean of estimated standard errors (SE), and the coverage probability (CP) of the nominal 95% confidence intervals. Under the usual Markov model, we estimate the asymptotical variance of, using both the model-based formulas, which ignores the uncertainty in the estimation of the weights, and bootstrap. When , we see that the estimation under the usual Markov model is nearly unbiased, in particular for the larger sample size , and the coverage of the confidence intervals (CI) based on the normal approximation is very close to the nominal level. We note that the margin of error using 500 simulation runs to estimate the coverage of 95% CI’s is 0.019, so that the range of coverage probability (CP) should be mostly within 93.1% to 96.9%. We also see that when , the estimation under the general Markov mode performed well for and , . However, the mean of the estimated standard error of is much higher than the empirical standard deviation, and the CI overcovers. We note that this is the boundary cases considered in Xu et al. , where the asymptotical distribution is no longer normal. When , we see that our estimator under the general Markov model is quite accurate for even the smaller sample size , the SEs are close to the sample SD and the coverage probabilities are good. The estimates under the usual Markov model is obviously biased with poor coverage of the CI’s when . Finally, we note that the variances of the estimators are generally larger under the general Markov, as more parameter is estimated.
6 Application to HAAS study
For this analysis, we are interested in the effect of mid-life alcohol exposure on cognitive impairment as well as death, which are semi-competing risks. In the HHP-HAAS study, alcohol consumption was assessed by self-report and translated into units of drinks per month. Estimates of the total ethanol intake from reported drinking patterns were calculated as ounces per month for beer, liquor, wine, and sake using algorithms based on average unit sizes and usual alcohol percentages. The alcohol consumption was then dichotomized into light drinking (30.1 oz/month) vs heavy drinking (30.1 oz/month). The “mid-life” alcohol exposure was collected during the HHP study between 1965-73. The Heavy Drinking group consisted of individuals who had heavy drinking at one point during mid-life, and the Light Drinking those who never had heavy drinking during mid-life. Cognitive impairment was based on scores from the Cognitive Assessment and Screening Instrument (CASI), where a score below 74 was considered a moderate impairment (MI). The confounders were decided by literature review and clinical experiences, as well as availability of the data. Literatures show that vital data such as blood pressure and heart rate are associated with drinking habits, as well as the cognitive health. Meanwhile, demographic data such as age, years of education, are also related to cognitive impairment and drinking habits. The Apolipoprotein E is the first identified genetic susceptibility factor for sporadic AD. Towards understanding determinants of cognitive impairment and factors associated with drinking habits, the final set of baseline confounders are baseline CASI score, systolic blood pressure, heart rate, Apolipoprotein E genotype positive, years of education and baseline age. We only include participants with normal cognitive function (CASI 74) at baseline, and after excluding missing values for exposure and confounders, we have 1881 participants in total. Since HAAS is a long-term epidemiology study, lost to follow-up occurs at every exam visit. On the other hand, death certificates were obtained for many participants, even after lost to follow-up. For this reason, we needed to properly define the death for the semi-competing risks data. If the death date is after the participant’s recorded last visit date from the study, we consider this participant lost to follow-up. More details of data pre-processing can be found in Zhang . Propensity scores (PS) were calculated using R package twang (Toolkit for Weighting and Analysis of Nonequivalent Groups), which estimates the PS using boosted regression as the predicted probability of being heavy versus light drinking, conditional on the measured baseline confounders. Before applying the IPW approach to the multi-state model, we obtained stabilized weights and trimmed them within (0.1, 10). In Supplementary Materials we show the PS histograms in the heavy and light drinking groups as a check of the positivity assumption, where the PS distributions are seen to be bounded away from zero and one. We also plot the standardized mean difference (SMD) to check the balance of each confounder before and after weighting, where the SMD’s of all the confounders are within the interval [-0.1, 0.1] after weighting. We apply our proposed methods to the HAAS data. We first fit the usual Markov structural model and the results are in the top half of Table 4. We see that the transition rates to moderate impairment or death without moderate impairment are significantly higher in the heavy drinking group compared to the light drinking group. But we don’t see a significant difference in the transition rates to death after moderate impairment. We then fit the general Markov structural model and the results are in the bottom half of Table 4. The convergence plot of the parameters and the likelihood during the weighted EM algorithm are provided in the Supplement Materials, where we stopped at 168 EM steps for the final results. Compared to the results under the usual Markov model, the magnitude of all three estimated effects are further away from the null, and all three transition rates are significantly higher in the heavy drinking group than the light drinking group. The phenomenon of more significant and away-from-the-null regression effects after accounting for the frailty is known in the literature under the Cox model.[Chastang et al., 1988] Finally, we estimate the causal risk contrasts under the structural models. For illustration purposes we fix years in and ; that is, the cumulative incidence rate of death following MI by 8 years. We show the estimated risk curves in Figure 4 first row under the usual Markov model, and the risk contrasts in Table 5 for heavy versus light drinking. It is seen that the risk contrasts for the two competing events, MI and death without MI, are significantly different from the null at 5 and 10 years, but not so at 15 and 20 years. The risk contrasts for death following MI by 8 years are not significantly different from the null at 10, 15 or 20 years under the usual Markov model. We also show the predicted conditional risk curves at different values () in Figure 4, rows 2-6. In Figure 5 we plot the IRD and IRR at 10 years with 95% PI’s of 100 participants from every percentile of the predicted values. We note the different significance results for IRD and IRR: the IRD tends to be significantly different from the null for values closer to zero, while the IRR tends to be significantly different from the null for negative values. This appears to be generally the case for all three outcomes: MI, death without MI, and death following MI by 8 years. More discussion will follow in the next section.
In this paper we applied the three-state illness-death model to observational data using the potential outcomes framework. Inverse probability of treatment weighting is used to fit these structural models.
Under the Cox model formulation, typical software used to fit the Cox regression model can be used to fit the usual Markov model in the absence of frailty. With the frailty term under the general Markov model, a weighted EM algorithm is developed and its convergence property studied.
The simulation studies showed the good performance of our proposed methods.
For applications in practice, we have defined cumulative risk based causal contrasts and illustrated their use.
Under the general Markov model with frailty, these give rise to individual risk contrasts IRD and IRR. This is consistent with the random effects modeling formulation, where individual trajectories, for example, from longitudinal data can be estimated and predicted. We have extended this feature to the causal inference setting, when the individual heterogeneity is modeled using random effects.
It might also be of some interest to compare the IRD and IRR to the RD and RR under the usual Markov model without frailty, and note some similarity between the first and the fourth row of Figure 4, where the random effect is set to its mean value of zero. We note that these two sets of contrasts are not the same, especially since the Cox model is not collapsible; and the interpretations are different for these two sets of contrasts.
Semi-competing risks data have recently been considered under the mediation setup with the non-terminal event as a mediator.
Huang , Xu et al.  Our multi-state structural models instead consider the total effect of the exposure on all three outcomes: non-terminal event, and terminal event with and without non-terminal event.
For future work, since the IPW estimator is biased if the propensity score model is misspecified, an augmented IPW (AIPW) estimator with doubly robust properties can protect against such model misspecification.
It would also allow us to apply machine learning or nonparametric methods to the propensity score model.
Our multi-state structural models instead consider the total effect of the exposure on all three outcomes: non-terminal event, and terminal event with and without non-terminal event. For future work, since the IPW estimator is biased if the propensity score model is misspecified, an augmented IPW (AIPW) estimator with doubly robust properties can protect against such model misspecification. It would also allow us to apply machine learning or nonparametric methods to the propensity score model.Rava  and Tchetgen and Robins  have already developed the AIPW estimator for the marginal structural Cox model, and it is nature to extend their work for the models in this paper. This is currently under investigation. Another future direction is to develop sensitivity analysis approaches for various assumptions including unmeasured confounding as well as modeling assumptions that are used. The R codes developed in this work have been implemented in the R package semicmprskcoxmsm that is publicly available on CRAN.
This research was partially supported by NIH/NIA grant R03 AG062432. We thank Dr. Andrew Ying for discussion regarding Lemma 1.
- Xu et al.  Jinfeng Xu, John D Kalbfleisch, and Beechoo Tai. Statistical analysis of illness–death processes and semicompeting risks data. Biometrics, 66(3):716–725, 2010.
- Fix and Neyman  Evelyn Fix and Jerzy Neyman. A simple stochastic model of recovery, relapse, death and loss of patients. Human Biology, 23(3):205–241, 1951.
- Fine et al.  Jason P Fine, Hongyu Jiang, and Rick Chappell. On semi-competing risks data. Biometrika, 88(4):907–919, 2001.
- Lee et al.  Catherine Lee, Paola Gilsanz, and Sebastien Haneuse. Fitting a shared frailty illness-death model to left-truncated semi-competing risks data to examine the impact of education level on incident dementia. BMC Medical Research Methodology, 21(1):1–13, 2021.
- Jazić et al.  Ina Jazić, Stephanie Lee, and Sebastien Haneuse. Estimation and inference for semi-competing risks based on data from a nested case-control study. Statistical methods in medical research, 29(11):3326–3339, 2020.
- Lee et al.  Kyu Ha Lee, Sebastien Haneuse, Deborah Schrag, and Francesca Dominici. Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(2):253–273, 2015.
- Alvares et al.  Danilo Alvares, Sebastien Haneuse, Catherine Lee, and Kyu Ha Lee. Semicomprisks: an R package for the analysis of independent and cluster-correlated semi-competing risks data. The R journal, 11(1):376, 2019.
- Robins et al.  James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5):550–560, 2000.
- Hernán et al.  Miguel A Hernán, Babette Brumback, and James M Robins. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96(454):440–448, 2001.
- Meira-Machado and Sestelo  Luís Meira-Machado and Marta Sestelo. Estimation in the progressive illness-death model: A nonexhaustive review. Biometrical Journal, 61(2):245–263, 2019.
- Kalbfleisch and Prentice  John D Kalbfleisch and Ross L Prentice. The statistical analysis of failure time data. John Wiley & Sons, 2011.
Jerzy S Neyman.
On the application of probability theory to agricultural experiments. essay on principles. section 9.(translated and edited by dm dabrowska and tp speed, statistical science (1990), 5, 465-480).Annals of Agricultural Sciences, 10:1–51, 1923.
- Rubin  Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
- Tchetgen Tchetgen et al.  Eric J Tchetgen Tchetgen, Isabel R Fulcher, and Ilya Shpitser. Auto-G-computation of causal effects on a network. Journal of the American Statistical Association, 116(534):833–844, 2021.
- Hernán and Robins  Miguel A Hernán and James M Robins. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2021.
- Andersen et al.  Per Kragh Andersen, Lars Sommer Hansen, and Niels Keiding. Non-and semi-parametric estimation of transition probabilities from censored observation of a non-homogeneous markov process. Scandinavian Journal of Statistics, pages 153–167, 1991.
- Lancaster and Nickell  Tony Lancaster and Stephen Nickell. The analysis of re-employment probabilities for the unemployed. Journal of the Royal Statistical Society: Series A (General), 143(2):141–152, 1980.
- Nielsen et al.  Gert G Nielsen, Richard D Gill, Per Kragh Andersen, and Thorkild IA Sørensen. A counting process approach to maximum likelihood estimation in frailty models. Scandinavian journal of Statistics, pages 25–43, 1992.
- Vaida and Xu  Florin Vaida and Ronghui Xu. Proportional hazards model with random effects. Statistics in medicine, 19(24):3309–3324, 2000.
- McCaffrey et al.  Daniel F McCaffrey, Greg Ridgeway, and Andrew R Morral. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological methods, 9(4):403, 2004.
- Zhang and Xu  Yiran Zhang and Ronghui Xu. cmprskcoxmsm: Use IPW to Estimate Treatment Effect under Competing Risks, 2021. R package version 0.2.1.
- Gander and Gautschi  Walter Gander and Walter Gautschi. Adaptive quadrature—revisited. BIT Numerical Mathematics, 40(1):84–101, 2000.
- Rice  John R Rice. A metalgorithm for adaptive quadrature. Journal of the ACM (JACM), 22(1):61–82, 1975.
- Wu  CF Jeff Wu. On the convergence properties of the EM algorithm. The Annals of statistics, pages 95–103, 1983.
- Lehmann and Casella  Erich L Lehmann and George Casella. Theory of point estimation. Springer Science & Business Media, 2006.
- Vaida  Florin Vaida. Parameter convergence for EM and MM algorithms. Statistica Sinica, pages 831–840, 2005.
- Breslow and Wellner  Norman E Breslow and Jon A Wellner. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to cox regression. Scandinavian Journal of Statistics, 34(1):86–102, 2007.
- Murphy and Van der Vaart  Susan A Murphy and Aad W Van der Vaart. On profile likelihood. Journal of the American Statistical Association, 95(450):449–465, 2000.
- Maples et al.  Jerry J Maples, Susan A Murphy, and William G Axinn. Two-level proportional hazards models. Biometrics, 58(4):754–763, 2002.
- Kosorok  Michael R Kosorok. Introduction to empirical processes and semiparametric inference. Springer, 2008.
- Havercroft and Didelez  WG Havercroft and Vanessa Didelez. Simulating from marginal structural models with time-dependent confounding. Statistics in medicine, 31(30):4190–4206, 2012.
- Jiang and Haneuse  Fei Jiang and Sebastien Haneuse. Simulation of semicompeting risk survival data and estimation based on multistate frailty model. Harvard University Biostatistics Working Paper Series, 2015.
- Xu et al.  Ronghui Xu, Florin Vaida, and David P Harrington. Using profile likelihood for semiparametric model selection with application to proportional hazards mixed models. Statistica Sinica, 19(2):819, 2009.
- Zhang  Y Zhang. Causal inference for competing risks and semi-competing risks data. Ph.D. Thesis, University of California, San Diego, 2022.
- Chastang et al.  Claude Chastang, David Byar, and Steven Piantadosi. A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival models. Statistics in Medicine, 7(12):1243–1255, 1988.
- Huang  Yen-Tsung Huang. Causal mediation of semicompeting risks. Biometrics, 77(4):1143–1154, 2021.
- Xu et al.  Yanxun Xu, Daniel Scharfstein, Peter Müller, and Michael Daniels. A Bayesian nonparametric approach for evaluating the causal effect of treatment in randomized trials with semi-competing risks. Biostatistics, 23(1):34–49, 2022.
- Rava  Denise Rava. Survival Analysis and Causal Inference: from Marginal Structural Cox to Additive Hazards Model and beyond. Ph.D. Thesis, University of California, San Diego, 2021.
- Tchetgen and Robins  Eric J Tchetgen Tchetgen and James Robins. On parametrization, robustness and sensitivity analysis in a marginal structural cox proportional hazards model for point exposure. Statistics & Probability Letters, 82(5):907–915, 2012.
8 Supplementary materials
8.1 Derivation of , and
We also have: