1 Introduction
Our work was motivated by the longitudinal epidemiologic HonoluluAsia Aging Study (HAAS). The HAAS cohort is comprised of the surviving participants from the Honolulu Heart Program (HHP), a prospective, communitybased cohort study of heart disease and stroke established in 1965 with about 8,000 men of Japanese ancestry living on the island of Oahu, who were born between 19001919. HAAS was established in 1991 and was brought to closure in 2012 with the goal of determining the prevalence, incidence, and risk factors for Alzheimer’s disease (AD) and brain aging. Demographic data, vital status and diet data were collected every 23 years during the HHP period, and neuropsychologic assessment were performed every 23 years during the HAAS. Our goal is to assess the causal effect of midlife alcohol exposure captured during HHP on late life outcomes collected in HAAS. In particular, a subject may develop cognitive impairment, then die, or die without cognitive impairment. These are referred to as semicompeting risks where there are nonterminal events (cognitive impairment) and terminal events (death). As outcomes we are interested in time to nonterminal event and time to terminal event, as well as time to the terminal event following the nonterminal event. The above semicompeting risks setting is the same as the threestates illnessdeath model depicted in Figure 1,Xu et al. [2010] which was first introduced by Fix and Neyman [1951]. We assume that a subject starts in the “healthy” state (state 0), then transition into the cognitive impairment (state 1) or death state (state 2), which are also referred to as the intermediate or nonterminal, and the terminal state, respectively. The corresponding transition events are then the nonterminal event and the terminal event, respectively. Xu et al. [2010] discussed extensively the illnessdeath model for semicompeting risks data, and also incorporated a shared frailty term in the illnessdeath model that encompasses previous works such as the copula model of Fine et al. [2001]. The illnessdeath model with shared frailty has been extended to different situations including in the presence of left truncation,Lee et al. [2021] or for a nested casecontrol study.Jazić et al. [2020] Lee et al. [2015] extended this model to the Bayesian paradigm. Alvares et al. [2019]
developed an R package to analyze semicompeting risks data under the illnessdeath model using parametric models and the Bayesian method, but not for the semiparametric Cox model formulation. For observational data, marginal structural models (MSM) have been established as a valuable tool for identifying causal effects, which can be consistently estimated using the inverseprobabilityoftreatment weighting (IPTW).
Robins et al. [2000], Hernán et al. [2001] In this paper we consider a class of marginal structural illness–death models, with and without a shared frailty term. For the former an EM type iterative algorithm is needed in order to estimate the parameters. The structural models give rise to interpretable causal quantities such as different types of risk contrasts in the multistate setting.MeiraMachado and Sestelo [2019] The remainder of this article is organized as follows. In the next section we introduce the structural models and assumptions. In Section 3 we discuss inference under the usual Markov illnessdeath structural model and Section 4 the general Markov illnessdeath structural model, where a weighted EM algorithm is developed and studied. In Section 5 we carry out extensive simulation studies to assess the performance under the two models including when either one of the model is valid while the other is not. We apply the approaches to the HAAS data set described above in Section 6 and conclude with more discussion in the last section.2 ThreeState IllnessDeath model
2.1 Definitions and assumptions
For our setup, assume a welldefined time zero, and let random variables
and denote time to the nonterminal and the terminal event since time zero, respectively. If a subject does not experience the nonterminal event before the terminal event, we define .Xu et al. [2010], Fine et al. [2001] Denote the joint density of and as in the upper wedge , and the density of along the line as for . Note that for semicompeting risks data, we do not observe any data in the lower wedge ; see Figure 2. We also denote the bivariate survival function of and in the upper wedge as . The multistate model quantifies event rates and event risks based on the history of events, and is completely specified by the three transition intensities below, also referred to as transition rates in the literature. Let and be the transition rates from the initial healthy state to the nonterminal, and the terminal state, respectively, and the transition rate from the nonterminal state to the terminal state. That is,(1)  
(2)  
(3) 
Note that (1) and (2) are in fact the causespecific hazards in the usual competing risks setting, for time to the nonterminal event and time to the terminal event without nonterminal event, respectively. In general, can depend on both and . In the following we consider the commonly used Markov assumption: , i.e. the transition rate from nonterminal to terminal state does not depend on what value takes. While the transition rates in (1)  (3) completely specifies the threestate illnessdeath model, for interpretation purposes various risk type quantities can be of interest in practice. Cumulative incidence function (CIF) are commonly used for competing risks,Kalbfleisch and Prentice [2011] that is, for the nonterminal event, denoted by below, and for the terminal event without the nonterminal event, denoted by below. In addition, we may also consider a third CIF, denoted by , for the terminal event following the nonterminal event.MeiraMachado and Sestelo [2019] We have
(4)  
(5)  
(6) 
where . In the presence of right censoring, such as lost to followup or administrative censoring, let be the time to right censoring since time zero. Denote , , and the event indicators , , where is the indicator function. Let be a binary treatment assignment, possibly not randomized. Following Neyman [1923] and Rubin [2005] framework of potential outcomes, we denote as potential time to the nonterminal event, terminal event and censoring under treatment . And , , and
are similarly defined. Let Z be a pdimensional vector of covariates. Denote
, often referred to as the propensity score. The causal relationship of the variables defined above can be depicted in a graphical display called a chain graph as in Figure 3,Tchetgen Tchetgen et al. [2021] where the undirected line indicates correlation. A chain graph without undirected edges is known as a causal directed acyclic graphs (DAG). We assume the following, which are commonly used in order to identify the causal estimands to be specified later: (I) Stable unit treatment value assumption (SUTVA): there is only one version of the treatment and that there is no interference between subjects. (II) Exchangeability: . (III) Positivity: . (IV) Consistency: If , then , , .Exchangeability implies that within levels of the variable , the potential event times and the treatment assignment are independent. It is also called (conditional) ignobility, and that there are no unmeasured confounders. The positivity assumption requires that the probability of receiving either treatment () or control () is positive for any given value of . The consistency assumption here links the potential outcomes with the observed outcomes. For more discussion on these assumptions, please see Hernán and Robins [2021]. We also assume: (IV) Noninformative censoring: .
2.2 The structural models
Let , and be the transition rates corresponding to the counterfactual states under the threestate model, . Andersen et al. [1991] discussed about modeling each transition intensity by a Cox type proportional intensities regression model. Following the same idea, we can postulate the semiparametric Cox models for these transition rates, which are also hazard functions.Xu et al. [2010], Andersen et al. [1991] In particular, we consider the following usual Markov illnessdeath structural model:Xu et al. [2010]
(7)  
(8)  
(9) 
The joint distribution of
and under model (7)  (9) will be given as a special case below. The usual Markov illnessdeath model can be extended by incorporating a frailty term, to the general Markov illnessdeath structural model. The frailty term induces further correlation between and , beyond what is already contained in the joint distribution of and above. It also models unobserved heterogeneity among individuals.Lancaster and Nickell [1980], Nielsen et al. [1992] Following Vaida and Xu [2000]we consider the lognormal distribution for the frailty, and we have
(10)  
(11)  
(12) 
where . Obviously model (7)  (9) is a special case of (10)  (12) by setting . Recall the joint density and the bivariate survival function previously defined in the upper wedge , and the density function along the line . In the Supplementary Materials we show that these quantities can be derived as functions of the transition rates (1)  (3). With the models specified in (10)  (12) we then have the following quantities that will be used later:
(13)  
(14)  
(15) 
where for , and with .
2.3 Likelihood
In this subsection we assume that the treatment is randomized so that we can write down the relevant probabilities for the four scenarios below. We will then use inverse probability weighting (IPW) to create a pseudorandomized sample. Denote the observed data for subject , and the likelihood conditional on the random effect . We have the following four different scenarios: (i) Nonterminal event then censored prior to terminal event: ,
(ii) Nonterminal event and then terminal event: ,
(iii) Terminal event without nonterminal event: ,
(iv) Censored before any event: ,
Combining the above four scenarios, we have
(16) 
3 The Usual Markov Structural Model
In the absence of randomization, denote as the IP weight for subject . In practice,
is unknown and can be estimated from the data by either specifying a parametric model such as the logistic regression,
Robins et al. [2000] or use nonparametric methods such as boosted trees.McCaffrey et al. [2004] For the usual Markov illnessdeath model, with in (16), we have the weighted loglikelihood(17) 
It can be seen that the parameters for the three transition rates , , are variationally independent in the above likelihood and therefore can be estimated separately. Note that the semiparametric approach under the Cox type models discretizes the baselines hazards into point masses at the observed event times and estimates the cumulative as step functions. It can be verified that maximizing (17) is equivalent to maximizing the following three weighted Cox regression model likelihoods: 1) treating the nonterminal event as the event of interest, and terminal event without nonterminal or originally censored as ‘censored’; 2) treating the terminal event without nonterminal as the event of interest, and nonterminal event or originally censored as ‘censored’; 3) treating the terminal event following the nonterminal as the event of interest, left truncated at the time of the nonterminal event (so only those who had the nonterminal event are included), and originally censored as ‘censored’. Then the standard software (e.g. coxph() in R package ‘survival’) can be used to obtain the estimates
. In order to obtain the variance of the estimates, if we assume the estimated weights in (
17) as known, then the robust sandwich variance estimator in standard software such as coxph() can be used to obtain the estimated variance for . In the Supplementary Materials we provide the formulas for estimating the covariances between . In addition, we may also use the bootstrap variance estimator which accounts for the uncertainty in estimating the weights. For causal interpretation, we may define the risk contrasts as the difference or the ratio between the CIF’s under the structural models with and . In particular,(18)  
(19)  
(20) 
where
. We estimate the contrasts by plugging in the parameter estimates, and obtain their 95% confidence intervals (CI) using bootstrap. We note that for simple competing risk data under the marginal structural Cox model, such risk contrasts are available in the R package ‘cmprskcoxmsm’.
Zhang and Xu [2021]4 The General Markov Structural Model
Under the general Markov illnessdeath model (10)  (12) where , let . Denote . The weighted observed data likelihood is:
(21) 
where is the normal density function. Then the estimate can be obtained by maximizing (21). We introduce below an EM type algorithm in order to maximize (21). Denote the expectation of the weighted loglikelihood of the augmented data , , conditional on the observed data and the current parameter value :
(22) 
where
(23) 
Then , where
(24)  
(25)  
(26)  
(27) 
where is shorthand for a function of . Analogous to the EM algorithm, we iterate between the Esteps and the Msteps described below until convergence.
Estep
The conditional expectations in (24)  (27) are all in form of , where in (24)  (26) and in (27). These two expectations are not in closed form; however, we can approximate these integrals by numerical methods, specifically by (adaptive) Gaussian quadrature.Gander and Gautschi [2000], Rice [1975] Details of computation are shown in the Supplement Materials.
Mstep
The Mstep conveniently separates the update of and for from that of the variance component . For  , similar to Section 3, (24)  (26) are equivalent to the weighted loglikelihood functions in a Cox regression with additional known offsets . In order to maximize , we set
leading to
(28) 
In the lemma below, we establish the following property of the above weighted EM algorithm, which is similar to that of the EM algorithm.
Lemma 1.
Suppose is the weighted observed data likelihood. At step of the algorithm denote the current value, and the value that maximizes . Then:
(29) 
The proof of the lemma is given in the Supplement Materials. Following Wu [1983] or Theorem 4.12 in Lehmann and Casella [2006], since is continuous in both and , then all limit points of the weighted EM sequence are stationary points of , and converges monotonically to for some stationary point . In addition, for existence of such limit point(s) Vaida [2005] proposed a condition for the usual unweighted EM algorithm: as long as the maximizer in the Mstep is unique. We can show that this result extends immediately to our weighted EM algorithm. And finally, our Mstep satisfies this condition, i.e. the maximizer in the Mstep is unique. As initial values we use for and , , the estimates from weighted Cox regression without the offsets, i.e. from the usual Markov model of the previous section; and . The stop criteria we use in this paper are convergence in the loglikelihood as well as in parameters of interest: , , and .
Variance estimate
The variance of the parameter estimates following a typical EM algorithm can be estimated by the inverse of a (discrete) observed information matrix calculated using Louis’ formula, including for the nonparametric maximum likelihood estimator (NPMLE) under, for example, the semiparametric proportional hazards mixed models.Vaida and Xu [2000] For observational data, however, inference using the weighted NPMLE under semiparametric models requires the derivation of efficient influence functions,Breslow and Wellner [2007] and is generally nontrivial under the normal frailty construct.Murphy and Van der Vaart [2000], Maples et al. [2002] In the following we use bootstrap to obtain the variance estimator for .
Risk contrasts
Similar to what we proposed under the usual Markov model, we also can define the risk contrasts under the general Markov model. Since the general Markov models are conditional on the random effect , we have the following conditional risk:
(30)  
(31)  
(32) 
where . As discussed earlier the frailty term, or equivalently, the random effect represents the unobserved heterogeneity among the individuals. As such, the above conditional risk represents individual risk, and the risk contrasts the individual risk contrasts. We therefore have the individual risk difference (IRD) and the individual risk ratio (IRR). Under the random effects model, for , the predicted random effect is .Vaida and Xu [2000] We then obtain the predicted IRD and the predicted IRR. For inference on these individual risk contrasts, Bayesian bootstrapKosorok [2008] may be used which, unlike the usual resampling with replacement, preserves each individual in the original data set. Details of the Bayesian bootstrap are provided in the Supplementary Materials. Note that because is random, the common terminology in the literature is ‘predicted’ instead of ‘estimated’, and ‘prediction interval (PI)’ instead of CI.
5 Simulation
We carry out extensive Monte Carlo simulation studies in order to assess the performance of the estimation procedure described above. We use the idea from Havercroft and Didelez [2012] to simulate data under the marginal structural model (10)  (12). We also adapt the method from Jiang and Haneuse [2015], originally designed for simulating semicompeting risk data with gamma frailty. Very briefly the following steps are used to to generate the data; more details are provided in the Supplementary Materials.

Generate and ;

Generate confounder , with , where , and ;

Generate Bernoulli(), where , with ;

Let and . Then with probability given in the Supplementary Materials,
and with probability ,

Generate Censoring time , which leads to an average censoring rate around 20%.
We set , . Weights are calculated by fitting the logistic regression with as covariates. We run 500 simulations for each case. Table 1 and 2
report, for sample size n=250 and n=500, respectively, the estimate, the empirical standard deviation (SD), the mean of estimated standard errors (SE), and the coverage probability (CP) of the nominal 95% confidence intervals. Under the usual Markov model, we estimate the asymptotical variance of
, using both the modelbased formulas, which ignores the uncertainty in the estimation of the weights, and bootstrap. When , we see that the estimation under the usual Markov model is nearly unbiased, in particular for the larger sample size , and the coverage of the confidence intervals (CI) based on the normal approximation is very close to the nominal level. We note that the margin of error using 500 simulation runs to estimate the coverage of 95% CI’s is 0.019, so that the range of coverage probability (CP) should be mostly within 93.1% to 96.9%. We also see that when , the estimation under the general Markov mode performed well for and , . However, the mean of the estimated standard error of is much higher than the empirical standard deviation, and the CI overcovers. We note that this is the boundary cases considered in Xu et al. [2009], where the asymptotical distribution is no longer normal. When , we see that our estimator under the general Markov model is quite accurate for even the smaller sample size , the SEs are close to the sample SD and the coverage probabilities are good. The estimates under the usual Markov model is obviously biased with poor coverage of the CI’s when . Finally, we note that the variances of the estimators are generally larger under the general Markov, as more parameter is estimated.6 Application to HAAS study
For this analysis, we are interested in the effect of midlife alcohol exposure on cognitive impairment as well as death, which are semicompeting risks. In the HHPHAAS study, alcohol consumption was assessed by selfreport and translated into units of drinks per month. Estimates of the total ethanol intake from reported drinking patterns were calculated as ounces per month for beer, liquor, wine, and sake using algorithms based on average unit sizes and usual alcohol percentages. The alcohol consumption was then dichotomized into light drinking (30.1 oz/month) vs heavy drinking (30.1 oz/month). The “midlife” alcohol exposure was collected during the HHP study between 196573. The Heavy Drinking group consisted of individuals who had heavy drinking at one point during midlife, and the Light Drinking those who never had heavy drinking during midlife. Cognitive impairment was based on scores from the Cognitive Assessment and Screening Instrument (CASI), where a score below 74 was considered a moderate impairment (MI). The confounders were decided by literature review and clinical experiences, as well as availability of the data. Literatures show that vital data such as blood pressure and heart rate are associated with drinking habits, as well as the cognitive health. Meanwhile, demographic data such as age, years of education, are also related to cognitive impairment and drinking habits. The Apolipoprotein E is the first identified genetic susceptibility factor for sporadic AD. Towards understanding determinants of cognitive impairment and factors associated with drinking habits, the final set of baseline confounders are baseline CASI score, systolic blood pressure, heart rate, Apolipoprotein E genotype positive, years of education and baseline age. We only include participants with normal cognitive function (CASI 74) at baseline, and after excluding missing values for exposure and confounders, we have 1881 participants in total. Since HAAS is a longterm epidemiology study, lost to followup occurs at every exam visit. On the other hand, death certificates were obtained for many participants, even after lost to followup. For this reason, we needed to properly define the death for the semicompeting risks data. If the death date is after the participant’s recorded last visit date from the study, we consider this participant lost to followup. More details of data preprocessing can be found in Zhang [2022]. Propensity scores (PS) were calculated using R package twang (Toolkit for Weighting and Analysis of Nonequivalent Groups), which estimates the PS using boosted regression as the predicted probability of being heavy versus light drinking, conditional on the measured baseline confounders. Before applying the IPW approach to the multistate model, we obtained stabilized weights and trimmed them within (0.1, 10). In Supplementary Materials we show the PS histograms in the heavy and light drinking groups as a check of the positivity assumption, where the PS distributions are seen to be bounded away from zero and one. We also plot the standardized mean difference (SMD) to check the balance of each confounder before and after weighting, where the SMD’s of all the confounders are within the interval [0.1, 0.1] after weighting. We apply our proposed methods to the HAAS data. We first fit the usual Markov structural model and the results are in the top half of Table 4. We see that the transition rates to moderate impairment or death without moderate impairment are significantly higher in the heavy drinking group compared to the light drinking group. But we don’t see a significant difference in the transition rates to death after moderate impairment. We then fit the general Markov structural model and the results are in the bottom half of Table 4. The convergence plot of the parameters and the likelihood during the weighted EM algorithm are provided in the Supplement Materials, where we stopped at 168 EM steps for the final results. Compared to the results under the usual Markov model, the magnitude of all three estimated effects are further away from the null, and all three transition rates are significantly higher in the heavy drinking group than the light drinking group. The phenomenon of more significant and awayfromthenull regression effects after accounting for the frailty is known in the literature under the Cox model.[Chastang et al., 1988] Finally, we estimate the causal risk contrasts under the structural models. For illustration purposes we fix years in and ; that is, the cumulative incidence rate of death following MI by 8 years. We show the estimated risk curves in Figure 4 first row under the usual Markov model, and the risk contrasts in Table 5 for heavy versus light drinking. It is seen that the risk contrasts for the two competing events, MI and death without MI, are significantly different from the null at 5 and 10 years, but not so at 15 and 20 years. The risk contrasts for death following MI by 8 years are not significantly different from the null at 10, 15 or 20 years under the usual Markov model. We also show the predicted conditional risk curves at different values () in Figure 4, rows 26. In Figure 5 we plot the IRD and IRR at 10 years with 95% PI’s of 100 participants from every percentile of the predicted values. We note the different significance results for IRD and IRR: the IRD tends to be significantly different from the null for values closer to zero, while the IRR tends to be significantly different from the null for negative values. This appears to be generally the case for all three outcomes: MI, death without MI, and death following MI by 8 years. More discussion will follow in the next section.
7 Discussion
In this paper we applied the threestate illnessdeath model to observational data using the potential outcomes framework. Inverse probability of treatment weighting is used to fit these structural models. Under the Cox model formulation, typical software used to fit the Cox regression model can be used to fit the usual Markov model in the absence of frailty. With the frailty term under the general Markov model, a weighted EM algorithm is developed and its convergence property studied. The simulation studies showed the good performance of our proposed methods. For applications in practice, we have defined cumulative risk based causal contrasts and illustrated their use. Under the general Markov model with frailty, these give rise to individual risk contrasts IRD and IRR. This is consistent with the random effects modeling formulation, where individual trajectories, for example, from longitudinal data can be estimated and predicted. We have extended this feature to the causal inference setting, when the individual heterogeneity is modeled using random effects. It might also be of some interest to compare the IRD and IRR to the RD and RR under the usual Markov model without frailty, and note some similarity between the first and the fourth row of Figure 4, where the random effect is set to its mean value of zero. We note that these two sets of contrasts are not the same, especially since the Cox model is not collapsible; and the interpretations are different for these two sets of contrasts. Semicompeting risks data have recently been considered under the mediation setup with the nonterminal event as a mediator. Huang [2021], Xu et al. [2022]
Our multistate structural models instead consider the total effect of the exposure on all three outcomes: nonterminal event, and terminal event with and without nonterminal event. For future work, since the IPW estimator is biased if the propensity score model is misspecified, an augmented IPW (AIPW) estimator with doubly robust properties can protect against such model misspecification. It would also allow us to apply machine learning or nonparametric methods to the propensity score model.
Rava [2021] and Tchetgen and Robins [2012] have already developed the AIPW estimator for the marginal structural Cox model, and it is nature to extend their work for the models in this paper. This is currently under investigation. Another future direction is to develop sensitivity analysis approaches for various assumptions including unmeasured confounding as well as modeling assumptions that are used. The R codes developed in this work have been implemented in the R package semicmprskcoxmsm that is publicly available on CRAN.Acknowledgement
This research was partially supported by NIH/NIA grant R03 AG062432. We thank Dr. Andrew Ying for discussion regarding Lemma 1.
References
 Xu et al. [2010] Jinfeng Xu, John D Kalbfleisch, and Beechoo Tai. Statistical analysis of illness–death processes and semicompeting risks data. Biometrics, 66(3):716–725, 2010.
 Fix and Neyman [1951] Evelyn Fix and Jerzy Neyman. A simple stochastic model of recovery, relapse, death and loss of patients. Human Biology, 23(3):205–241, 1951.
 Fine et al. [2001] Jason P Fine, Hongyu Jiang, and Rick Chappell. On semicompeting risks data. Biometrika, 88(4):907–919, 2001.
 Lee et al. [2021] Catherine Lee, Paola Gilsanz, and Sebastien Haneuse. Fitting a shared frailty illnessdeath model to lefttruncated semicompeting risks data to examine the impact of education level on incident dementia. BMC Medical Research Methodology, 21(1):1–13, 2021.
 Jazić et al. [2020] Ina Jazić, Stephanie Lee, and Sebastien Haneuse. Estimation and inference for semicompeting risks based on data from a nested casecontrol study. Statistical methods in medical research, 29(11):3326–3339, 2020.
 Lee et al. [2015] Kyu Ha Lee, Sebastien Haneuse, Deborah Schrag, and Francesca Dominici. Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(2):253–273, 2015.
 Alvares et al. [2019] Danilo Alvares, Sebastien Haneuse, Catherine Lee, and Kyu Ha Lee. Semicomprisks: an R package for the analysis of independent and clustercorrelated semicompeting risks data. The R journal, 11(1):376, 2019.
 Robins et al. [2000] James M Robins, Miguel Angel Hernan, and Babette Brumback. Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5):550–560, 2000.
 Hernán et al. [2001] Miguel A Hernán, Babette Brumback, and James M Robins. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96(454):440–448, 2001.
 MeiraMachado and Sestelo [2019] Luís MeiraMachado and Marta Sestelo. Estimation in the progressive illnessdeath model: A nonexhaustive review. Biometrical Journal, 61(2):245–263, 2019.
 Kalbfleisch and Prentice [2011] John D Kalbfleisch and Ross L Prentice. The statistical analysis of failure time data. John Wiley & Sons, 2011.

Neyman [1923]
Jerzy S Neyman.
On the application of probability theory to agricultural experiments. essay on principles. section 9.(translated and edited by dm dabrowska and tp speed, statistical science (1990), 5, 465480).
Annals of Agricultural Sciences, 10:1–51, 1923.  Rubin [2005] Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
 Tchetgen Tchetgen et al. [2021] Eric J Tchetgen Tchetgen, Isabel R Fulcher, and Ilya Shpitser. AutoGcomputation of causal effects on a network. Journal of the American Statistical Association, 116(534):833–844, 2021.
 Hernán and Robins [2021] Miguel A Hernán and James M Robins. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2021.
 Andersen et al. [1991] Per Kragh Andersen, Lars Sommer Hansen, and Niels Keiding. Nonand semiparametric estimation of transition probabilities from censored observation of a nonhomogeneous markov process. Scandinavian Journal of Statistics, pages 153–167, 1991.
 Lancaster and Nickell [1980] Tony Lancaster and Stephen Nickell. The analysis of reemployment probabilities for the unemployed. Journal of the Royal Statistical Society: Series A (General), 143(2):141–152, 1980.
 Nielsen et al. [1992] Gert G Nielsen, Richard D Gill, Per Kragh Andersen, and Thorkild IA Sørensen. A counting process approach to maximum likelihood estimation in frailty models. Scandinavian journal of Statistics, pages 25–43, 1992.
 Vaida and Xu [2000] Florin Vaida and Ronghui Xu. Proportional hazards model with random effects. Statistics in medicine, 19(24):3309–3324, 2000.
 McCaffrey et al. [2004] Daniel F McCaffrey, Greg Ridgeway, and Andrew R Morral. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological methods, 9(4):403, 2004.
 Zhang and Xu [2021] Yiran Zhang and Ronghui Xu. cmprskcoxmsm: Use IPW to Estimate Treatment Effect under Competing Risks, 2021. R package version 0.2.1.
 Gander and Gautschi [2000] Walter Gander and Walter Gautschi. Adaptive quadrature—revisited. BIT Numerical Mathematics, 40(1):84–101, 2000.
 Rice [1975] John R Rice. A metalgorithm for adaptive quadrature. Journal of the ACM (JACM), 22(1):61–82, 1975.
 Wu [1983] CF Jeff Wu. On the convergence properties of the EM algorithm. The Annals of statistics, pages 95–103, 1983.
 Lehmann and Casella [2006] Erich L Lehmann and George Casella. Theory of point estimation. Springer Science & Business Media, 2006.
 Vaida [2005] Florin Vaida. Parameter convergence for EM and MM algorithms. Statistica Sinica, pages 831–840, 2005.
 Breslow and Wellner [2007] Norman E Breslow and Jon A Wellner. Weighted likelihood for semiparametric models and twophase stratified samples, with application to cox regression. Scandinavian Journal of Statistics, 34(1):86–102, 2007.
 Murphy and Van der Vaart [2000] Susan A Murphy and Aad W Van der Vaart. On profile likelihood. Journal of the American Statistical Association, 95(450):449–465, 2000.
 Maples et al. [2002] Jerry J Maples, Susan A Murphy, and William G Axinn. Twolevel proportional hazards models. Biometrics, 58(4):754–763, 2002.
 Kosorok [2008] Michael R Kosorok. Introduction to empirical processes and semiparametric inference. Springer, 2008.
 Havercroft and Didelez [2012] WG Havercroft and Vanessa Didelez. Simulating from marginal structural models with timedependent confounding. Statistics in medicine, 31(30):4190–4206, 2012.
 Jiang and Haneuse [2015] Fei Jiang and Sebastien Haneuse. Simulation of semicompeting risk survival data and estimation based on multistate frailty model. Harvard University Biostatistics Working Paper Series, 2015.
 Xu et al. [2009] Ronghui Xu, Florin Vaida, and David P Harrington. Using profile likelihood for semiparametric model selection with application to proportional hazards mixed models. Statistica Sinica, 19(2):819, 2009.
 Zhang [2022] Y Zhang. Causal inference for competing risks and semicompeting risks data. Ph.D. Thesis, University of California, San Diego, 2022.
 Chastang et al. [1988] Claude Chastang, David Byar, and Steven Piantadosi. A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival models. Statistics in Medicine, 7(12):1243–1255, 1988.
 Huang [2021] YenTsung Huang. Causal mediation of semicompeting risks. Biometrics, 77(4):1143–1154, 2021.
 Xu et al. [2022] Yanxun Xu, Daniel Scharfstein, Peter Müller, and Michael Daniels. A Bayesian nonparametric approach for evaluating the causal effect of treatment in randomized trials with semicompeting risks. Biostatistics, 23(1):34–49, 2022.
 Rava [2021] Denise Rava. Survival Analysis and Causal Inference: from Marginal Structural Cox to Additive Hazards Model and beyond. Ph.D. Thesis, University of California, San Diego, 2021.
 Tchetgen and Robins [2012] Eric J Tchetgen Tchetgen and James Robins. On parametrization, robustness and sensitivity analysis in a marginal structural cox proportional hazards model for point exposure. Statistics & Probability Letters, 82(5):907–915, 2012.
8 Supplementary materials
8.1 Derivation of , and
We also have: