Exchangeable, Markov multi-state survival process

10/24/2018
by Walter Dempsey, et al.
Harvard University

We consider exchangeable Markov multi-state survival processes -- temporal processes taking values over a state-space S with at least one absorbing failure state ∈ S -- that satisfy natural invariance properties of exchangeability and consistency under subsampling. The set of processes contains many well-known examples from health and epidemiology -- survival, illness-death, competing risk, and comorbidity processes; an extension leads to recurrent event processes. We characterize exchangeable Markov multi-state survival processes in both discrete and continuous time. Statistical considerations impose natural constraints on the space of models appropriate for applied work. In particular, we describe constraints arising from the notion of composable systems. We end with an application of the developed models to irregularly sampled and potentially censored multi-state survival data, developing a Markov chain Monte Carlo algorithm for posterior computation.


1 Introduction

In many clinical survival studies, a recruited patient’s health status is monitored on either a regular or intermittent schedule until either (1) an event of interest (e.g., failure) occurs or (2) the study window ends. Covariates are recorded at the time of recruitment, and treatment protocols per patient are presumed known at baseline. In the simplest survival study, health status at time t is a binary variable: dead or alive. In clinical trials with health monitoring, the health status is a more detailed description of the state of health of the individual, containing relevant patient information, e.g., pulse rate, cholesterol level, cognitive score or CD4 cell count (Diggle et al., 2008; Farewell and Henderson, 2010; Kurland et al., 2009).

In this paper, we consider the setting in which the health process takes values in some pre-specified “state-space”. For example, in the illness-death model, we summarize the current state of the participant as taking one of three possible values: Healthy, Ill, or Dead. Such a process can be thought of as a coarse view of the state of health for a patient over time. The continuing importance of multi-state processes in applications cannot be overstated (Jepsen et al., 2015; van den Hout, 2016).

When no baseline covariates are measured beyond the initial state, the model for the set of patient state-space processes should satisfy natural constraints. First, the model should be agnostic to patient labeling. Second, the model should be agnostic to sample size considerations. These natural constraints (mathematically defined in section 2) lead to the concept of exchangeable Markov multi-state survival processes. The purpose of this paper is to characterize the set of multi-state survival processes and show how this theory of exchangeable stochastic processes fits naturally into the applied framework of event-history analysis. Both the parametric continuous-time Markov process with independent participants and the nonparametric counting process are contained as limiting cases. Next, we discuss the notion of “composable systems” and its effect on model specification. A Markov chain Monte Carlo (MCMC) algorithm is then derived for posterior computations given irregularly sampled multi-state survival data. We end with an application to a cardiac allograft vasculopathy (CAV) multi-state survival study.

1.1 Related work

Odd Aalen was one of the first to recognize the importance of incorporating the “theory of stochastic processes” into an “applied framework of event history analysis” (Aalen et al., 2008, p. 457). Martingales and counting processes form the basis of this nonparametric approach. The fundamental concept of the product-integral unifies discrete and continuous-time processes. Nonparametric methods, however, do not adequately handle intermittent observations. For example, Aalen et al. (2015) consider dynamic path analysis for a liver cirrhosis dataset. In this study, the prothrombin index, a composite blood coagulation index related to liver function, is measured initially at three-month intervals and subsequently at roughly twelve-month intervals. To deal with the intermittency of observation times, Aalen et al. (2015) use the “last-observation carried forward” (LOCF) assumption. However, such an assumption is unsatisfactory for highly variable health processes, and can lead to biased estimates (Little et al., 2010). Jepsen et al. (2015) discuss the importance of multi-state and competing risks models for complex models of cirrhosis progression. Here again, the nonparametric approach assumes observation times correspond to transition times of the multi-state process.

One alternative is to consider parametric models such as continuous-time Markov processes. Prior work (Saeedi and Bouchard-Côté, 2011; Hajiaghayi et al., 2014; Rao and Teh, 2013) has focused on estimation of parametric continuous-time Markov processes under intermittent observations. Most parametric models, however, make strong assumptions about the underlying state-space process; in particular, most models assume independence among patients. One implication is that observing sharp changes in health among prior patient trajectories at a particular time since recruitment will not impact the likelihood of a similar sharp change in a future patient at the same timepoint. The proposed approach in this paper balances between the nonparametric and parametric approaches.

2 Multi-state survival models

In this section we formally define the multi-state survival process and the notions of exchangeability, Kolmogorov-consistency, and the Markov property. We combine these in section 3 to provide characterization theorems for these processes in discrete and continuous-time.

2.1 Multi-state survival process

Formally, the multi-state survival process is a function from the label set into the state space S. For now, we assume the cardinality of S is finite. If the response is in discrete-time, then the process is defined on the non-negative integers; if the response is in continuous-time, then the process is defined on the non-negative reals. Each label is a pair (i, t), and the value is an element of S corresponding to the state of patient i at time t.

The distinguishing characteristic of survival processes is flatlining (Dempsey and McCullagh, 2018); that is, there exists an absorbing state such that, once a unit enters it, the unit remains there at all subsequent times. Thus, the survival time for unit i is a deterministic function of the multi-state survival process: the first time at which the unit enters the absorbing state. We assume every unit is in a transient state at recruitment, so the survival time is strictly positive. Multiple absorbing states representing different terminal events may occur, as in competing risk processes.

Without loss of generality, we assume the absorbing states are labeled last. For example, if the state-space is {Healthy, Ill, Dead}, we recode this to {1, 2, 3}; in this case, state 3 is the flatlining state. At each time t, the population-level process records the current state of every unit. We write the restriction of the state-space process to the first n units as the n-restricted state-space process.

2.2 Transition graphs for multi-state survival process

The transition graph represents the set of potential transitions between elements of the state-space. The transition graph is a directed graph G = (V, E). The vertex set V is the set of all potential states; the directed edge set E contains the edge (s, s′) whenever, at a jump time, the Markov process can jump from state s to state s′. In the illness-death model, for example, a patient can jump from Alive to Ill but not back; therefore (Alive, Ill) is in the edge set but not (Ill, Alive). Example 2 in supplementary section C provides additional details. In the bi-directional illness-death model, both edges are present in the transition graph. In continuous-time, jumps can only occur between distinct states, so self-loops are excluded from the edge set. Any absorbing state has no outgoing edges. The associated set of transition matrices consists of those matrices that place off-diagonal mass only on entries corresponding to edges of G and whose rows for absorbing states are degenerate at the absorbing state itself.
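To make the definition concrete, the following sketch checks whether a candidate transition matrix is compatible with a given transition graph. The illness-death edge set and the specific matrices are illustrative assumptions for this sketch, not quantities taken from the paper.

```python
import numpy as np

# States for the (uni-directional) illness-death model; 0-based indices.
STATES = ["Healthy", "Ill", "Dead"]
# Directed edges of the transition graph G = (V, E):
# (Healthy -> Ill), (Healthy -> Dead), (Ill -> Dead); no recovery edge.
EDGES = {(0, 1), (0, 2), (1, 2)}
ABSORBING = {2}  # "Dead" is the flatlining state.

def is_valid_transition_matrix(M, edges, absorbing, tol=1e-9):
    """Check that M lies in the set of transition matrices compatible with
    the transition graph: rows sum to one, off-diagonal mass appears only
    on permitted edges, and absorbing states only self-transition."""
    M = np.asarray(M, dtype=float)
    k = M.shape[0]
    if not np.allclose(M.sum(axis=1), 1.0, atol=tol):
        return False
    for i in range(k):
        for j in range(k):
            if i != j and M[i, j] > tol and (i, j) not in edges:
                return False
    return all(abs(M[a, a] - 1.0) < tol for a in absorbing)

ok = is_valid_transition_matrix(
    [[0.8, 0.15, 0.05],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]], EDGES, ABSORBING)      # respects the edge set
bad = is_valid_transition_matrix(
    [[0.8, 0.15, 0.05],
     [0.2, 0.7, 0.1],   # (Ill -> Healthy) is not in the edge set
     [0.0, 0.0, 1.0]], EDGES, ABSORBING)
```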

2.3 Consistency under subsampling

We first note that sample size is often an arbitrary choice based on power considerations and/or patient recruitment constraints. Statistical models should be agnostic to such considerations. That is, observing m units versus n > m units and then restricting to the first m units should be equivalent, i.e., the model should exhibit consistency under subsampling.

Consider the multi-state survival process for the first n individuals. Define the restriction operator to be the restriction of the process to the first m ≤ n individuals. Then the process is consistent under subsampling if the restricted process is a version of the m-unit process for all m ≤ n. Under the consistency assumption, the process satisfies lack of interference: the σ-field generated by the first m trajectories is unaffected by the trajectories of subsequent units. Lack of interference is essential, ensuring the m-restricted multi-state survival process is unaffected by the multi-state survival process for subsequent components. Consistency under subsampling ensures the statistical models are embedded in suitable structures that permit extrapolation. Without it, one is forced into the awkward situation of only being interested in the currently collected dataset. Consistency ensures statistical inference is a special case of “reasoning by induction” (De Finetti, 1972).

2.4 Exchangeability

We next note that the patient labels are also arbitrary. Therefore, any suitable multi-state survival process must be agnostic to patient relabeling. We define a multi-state survival process to be exchangeable if, for any finite permutation of the unit labels, the relabeled process is a version of the original process.

2.5 Time-homogeneous Markov process

We say the process is a time-homogeneous Markov process if, for every t, the conditional distribution of the future given the multi-state survival process history up to time t depends only on the current state. This Markovian assumption is a simplifying assumption which leads to mathematically tractable conclusions. In this paper, we restrict our attention to time-homogeneous processes; therefore, we simply say the process is Markovian.

3 Markov, exchangeable multi-state survival processes

We define a multi-state survival process that is Markovian, exchangeable, and consistent under subsampling as a Markov, exchangeable multi-state survival process. Below, we characterize these processes in both discrete and continuous time. The behavior is markedly different in each setting: continuous-time Markov processes exhibit much more complex behavior, allowing both single-unit changes and simultaneous changes by a positive fraction of units, which shows why the choice of time-scale matters in applied settings.

3.1 Discrete-time multi-state survival models

In discrete-time, the Markov, exchangeable multi-state survival process is governed by a series of random transition matrices, each drawn independently from a probability measure on the set of transition matrices compatible with the transition graph. That is, the initial states are drawn from an exchangeable distribution on the state-space. Then, at each time step, a single random transition matrix is drawn from the measure, and every unit transitions independently according to the corresponding row of that shared matrix. By construction, any process built by this procedure is an exchangeable, Markov multi-state survival process in discrete time. Theorem 1 states that this procedure describes all such processes. The proof is left to supplementary section A.
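The construction above can be sketched in a few lines. The Dirichlet-based measure below is a toy illustrative choice (not a measure used in the paper), and the three-state space follows the illness-death example; the key feature is that all units share the same random matrix at each step, which is what induces dependence across exchangeable units.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_transition_matrix(rng):
    """Draw a random transition matrix for the illness-death graph.
    The measure here (independent Dirichlet rows respecting the edge
    set) is a hypothetical choice, purely for illustration."""
    P = np.zeros((3, 3))
    P[0] = rng.dirichlet([8.0, 1.0, 1.0])   # Healthy -> {H, I, D}
    P[1, 1:] = rng.dirichlet([8.0, 2.0])    # Ill -> {I, D}; no recovery
    P[2, 2] = 1.0                           # Dead is absorbing
    return P

def simulate(n_units, n_steps, rng):
    """Discrete-time construction: at each step draw ONE matrix from the
    measure, then move every unit independently according to its row of
    that shared matrix."""
    states = np.zeros(n_units, dtype=int)   # all recruited in state 0
    path = [states.copy()]
    for _ in range(n_steps):
        P = draw_transition_matrix(rng)
        states = np.array([rng.choice(3, p=P[s]) for s in states])
        path.append(states.copy())
    return np.array(path)

path = simulate(n_units=5, n_steps=20, rng=rng)
```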

Theorem 1 (Discrete-time characterization).

Let the process be a Markov, exchangeable multi-state survival process in discrete time. Then there exists a probability measure on transition matrices such that the process is a version of one constructed by the above procedure.

3.2 Continuous-time multi-state survival models

In continuous-time, the Markov, exchangeable multi-state survival process is governed by a measure on transition matrices together with a set of non-negative constants associated with the edge set. Unlike discrete-time, however, the jumps occur at random times.

Consider the n-restricted state-space process. The holding time in the current configuration is exponentially distributed with a rate parameter depending on that configuration (see Section 5.3). At a jump time, one of two potential events can occur: (a) a single unit experiences a transition to a new state, or (b) a subset of units (potentially a singleton) experiences a simultaneous transition. Given a random transition matrix drawn from the governing measure, these simultaneous transitions are independent and identically distributed according to the corresponding rows of that matrix. The transition matrix is obtained from a measure on transition matrices and a set of constants associated with the edge set; unlike the discrete-time setting, the measure need not be integrable.

Theorem 2 (Continuous-time characterization).

Let the process be a Markov, exchangeable multi-state survival process in continuous time, and let I denote the identity matrix. Then there exists a measure on transition matrices satisfying the integrability condition

(1)

and a set of non-negative constants associated with the edge set such that the process is a version of one constructed by the above procedure.

Theorem 2 generalizes Proposition 4.3 in Dempsey and McCullagh (2017) from the simple survival process setting. The multi-state survival process contains many more well-known examples from health and epidemiology – survival, illness-death, competing risk, and comorbidity processes. We highlight these examples in supplementary section C. Within the discussion of examples, we extend Theorems 1 and 2 to the setting of recurrent events (see Corollaries 2 and 3). Here, Theorem 2 is used to characterize the continuous-time Markov chain in terms of (1) exponential holding rates and (2) the transition matrix at jump times. We start by defining the characteristic index.

Definition 1 (Configuration vector).

For a population state, define the configuration vector as the vector whose i-th entry counts the number of units currently occupying state i. For example, with three states and five units occupying states (1, 1, 2, 3, 1), the configuration vector is (3, 1, 1).
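Definition 1 amounts to a count of units per state, which is one line of code; the example below mirrors the example in the definition, with states shifted to 0-based indices for programming convenience.

```python
import numpy as np

def configuration_vector(states, n_states):
    """Compute the configuration vector of Definition 1: the i-th entry
    counts how many units currently occupy state i (0-based here)."""
    return np.bincount(states, minlength=n_states)

# Three healthy units, one ill, one dead (states 0, 1, 2):
c = configuration_vector(np.array([0, 0, 1, 2, 0]), 3)
# c is the configuration vector [3, 1, 1]
```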

Definition 2 (Characteristic index).

The characteristic index is defined as the normalizing constant of the integral representation obtained from Theorem 2:

where the integrand depends on the components of the configuration vector, and the sum is set to zero when the corresponding component vanishes. Condition (1) implies the characteristic index is finite for any configuration.

For simple survival processes, the characteristic index defined here simplifies to the characteristic index as defined in Dempsey and McCullagh (2017). At a jump-time, the probability of a transition from the current population state to a new one is proportional to the non-normalized transition function, with an indicator distinguishing whether a single unit or a subset of units experiences the transition.

4 Discretization and rounding

It has been argued that “there may be no scientific reason to prefer a true continuous time model over a fine discretization” (Breto et al., 2009, p. 325). We tend to disagree with such a viewpoint; a basic and very important issue in multi-state survival analysis is the distinction between inherently discrete data (coming from intrinsically time-discrete phenomena) and grouped data (coming from rounding of intrinsically continuous data). Theorems 1 and 2 reinforce this distinction as we see distinct characterizations of discrete and continuous-time processes. One example of the former in survival analysis is time to get pregnant, which should be measured in menstrual cycles. The latter represents the majority of multi-state survival data. For this reason, we focus the remainder of this paper on the continuous-time case.

5 Description of continuous-time process

5.1 Holding times

Let t be a jump time at which the population state vector transitions into a new configuration. To each such configuration, we associate an independent exponentially distributed holding time. By choosing the rate functions in an appropriate way, the Markov multi-state survival process can be made both consistent under subsampling and exchangeable under permutation of particles.

Corollary 1.

A set of rate functions is consistent if it is proportional to the characteristic index.

Corollary 1 follows from Theorem 2, and shows how the exponential holding rate relates to the characteristic index; in particular, the two differ by a proportionality constant which depends on the choice of time-scale.

5.2 Density function

Since the evolution of the process is Markovian, it is a straightforward exercise to give an expression for the probability density function for any specific temporal trajectory. The probability that the first transition occurs in a given infinitesimal interval, with a transition from the initial population state to a new one, is the product of the exponential holding density and the non-normalized transition probability. Continuing in this way, it can be seen that the joint density for a particular temporal trajectory consisting of a sequence of transitions at the corresponding transition times is

(2)

The number of transitions is a random variable whose distribution is determined by (2), and hence by the governing measure. Note that, with probability one, every unit eventually reaches an absorbing state; the final transition is the one at which all units have failed. By definition, the integral in equation (2) is therefore finite with probability one.

The finite-dimensional joint distributions are continuous in the sense that they have no fixed atoms, although for two or more units they are not continuous with respect to Lebesgue measure. The one-dimensional marginal process is a Markov multi-state survival process with holding rates proportional to the characteristic index, assuming a valid starting state. For example, in a survival process the only valid starting state is “Alive”.

Although the argument leading to (2) did not explicitly consider censoring, the density function has been expressed in integral form so that censoring is accommodated correctly. The pattern of censoring affects the evolution of the configuration, and thus affects the integral, but the product involves only transitions and transition times. So long as the censoring mechanism is exchangeability preserving (Dempsey and McCullagh, 2017), inference based on the joint density given by equation (2) is possible. Both simple type I censoring and independent censoring mechanisms preserve exchangeability.

5.3 Sequential description

Kolmogorov consistency permits ease of computation for the trajectory of a new unit given the trajectories of the first n units. The conditional distribution is best described via a set of paired conditional hazard measures. For each possible transition, the pair consists of a continuous component together with an atomic measure with positive mass only at the observed transition times of the first n units.

For a time that is not a transition time of the first n units, consider the new unit transitioning from one state to another. The continuous component then has a hazard and cumulative hazard determined by the characteristic index. Note that the hazard is piecewise constant, so the integral is trivial to compute, but censoring implies that it is not necessarily constant between successive failures.

Now let t be an observed transition time and consider the atomic measure associated with switching from one state to another. At each such point, the conditional hazard has an atom with finite mass or, on the probability scale, a point probability of transitioning exactly at that time. The above calculations define the conditional holding time of the new unit after it enters a state at a given time, conditional on the first n trajectories: the probability that the unit stays in its current state for at least a given duration is the product of the continuous survival term and the atom survival terms over the observed transition times within the time-window.

This conditional probability is used to construct the Markov chain Monte Carlo sampling procedure in Section 7. For any exchangeable, Markov multi-state survival process that reaches an absorbing state with probability one, the continuous components of the conditional hazards have infinite total mass, so the time until reaching an absorbing state is finite with probability one. This implies, for instance, that in the Aalen comorbidities study (see Example 3 in Section C) the process will terminate in death with probability one.

Although the above conditional hazards look complex, it is not difficult to generate the transition times sequentially for processes whose characteristic index admits a simple closed form. Right censoring is automatically accommodated by the integral in the continuous component, so the observed trajectories may be incomplete. Below we introduce the self-similar harmonic process – a multi-state survival process which admits such simple expressions. We use this process as a building block for more complex models in subsequent sections.
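As a concrete sketch of this sequential generation, the function below samples a holding time from a hazard with a piecewise-constant continuous component plus atoms at observed transition times, by inverting the cumulative hazard against a unit-rate exponential threshold. The breakpoints, rates, and atom masses are hypothetical inputs, not quantities computed from any particular characteristic index.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_holding_time(breakpoints, rates, atom_probs, rng):
    """Sample how long a new unit stays in its current state under the
    sequential description of Section 5.3.  rates[i] is the constant
    hazard on the i-th interval between breakpoints (which stand in for
    the observed transition times of the first n units), and atom_probs[i]
    is the probability of jumping exactly at that interval's right
    endpoint.  All numeric inputs are hypothetical summaries."""
    e = rng.exponential(1.0)          # unit-rate exponential threshold
    cum = 0.0
    for i, lam in enumerate(rates):
        a, b = breakpoints[i], breakpoints[i + 1]
        inc = lam * (b - a)
        if cum + inc >= e:            # continuous component fires in (a, b]
            return a + (e - cum) / lam
        cum += inc
        if rng.random() < atom_probs[i]:
            return b                  # jump exactly at an observed time
    return float("inf")               # still in the state past the window

# With zero continuous hazard and an atom of mass one at time 5,
# the unit jumps at exactly the first observed transition time:
t = sample_holding_time([0.0, 5.0, 10.0], [0.0, 0.0], [1.0, 0.0], rng)
```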

5.4 Self-similar harmonic process

Theorem 2 shows that tied failures are an intrinsic aspect of Markov multi-state survival processes. As stated in Section 4, grouped data usually result from rounding of intrinsically continuous data. For these processes to be useful in biomedical applications, it is essential that the model not be sensitive to rounding. Sensitivity to rounding is addressed by restricting attention to processes whose conditional distributions are weakly continuous, i.e., a small perturbation of the transition times gives rise to a small perturbation of the conditional distribution.

Dempsey and McCullagh (2017) originally studied this question in the context of exchangeable, Markov survival processes; in particular, they show that the harmonic process is the only Markov survival process with weakly-continuous conditional distributions. Here, we extend the harmonic process to a multi-state survival process by associating with each edge of the transition graph an independent harmonic process with edge-specific parameters. Letting the unique observed transition times along an edge play the role of failure times, the continuous component of the hazard is given by a sum over those transition times, with the last such event determining the final term. The discrete component is a product over transition times

(3)

For small parameter values, the combined discrete components above are essentially the same as the right-continuous version of the Aalen-Johansen estimator.

We call this process the self-similar harmonic process with transition graph G. The associated measure on transition matrices concentrates on matrices with a single non-zero off-diagonal entry, corresponding to one edge of the transition graph, with mass governed by that edge's harmonic-process parameters.

While the self-similar harmonic process has strong appeal for use in applied work, we show below that it is not a universally optimal choice. The strong assumption embedded in the self-similar harmonic process is that the transition processes are independent across edges. This implies that at each transition time only transitions along a single edge are possible. We argue below that while this may make sense in specific instances, additional care is needed in writing down appropriate models for multi-state survival processes in general.

6 Composable multi-state survival models

We now discuss constraints on multi-state survival models based on decompositions of the state-space. We use the bidirectional illness-death process as an illustrative example. Recall this process has three states, {Healthy, Ill, Dead}; see Example 2 for further details. The state “Dead” is unique and distinct from the states “Healthy” and “Ill”. Indeed, the latter two states require the individual to be categorized more broadly as alive, and are thus refinements of this more general state (i.e., “Alive”). Suppose the labels “Healthy” and “Ill” were uninformative with respect to failure transitions. Then the refinement is immaterial, and the transition rules should collapse to the transition rule for an exchangeable, Markov survival process.

The above discussion leads to two natural constraints: (1) the state “Dead” (State 3) is unique and distinct from the other states, and (2) the states “Healthy” (State 1) and “Ill” (State 2) should be considered partially exchangeable (De Finetti, 1972). To satisfy this, we require particular constraints on the measure on transition matrices. First, the measure must only place positive mass on one of two sets of transition matrices: (a) those with off-diagonal positive mass only in entries (1,3) and/or (2,3), or (b) those with off-diagonal positive mass only in entries (1,2) and/or (2,1). The first set are transition matrices representing transitions from “Healthy” or “Ill” to “Dead”. The second are transitions between “Healthy” and “Ill” or vice versa. The partition {{1, 2}, {3}} splits the state-space into two disjoint sets. Within each block the states are partially exchangeable. We say that the bidirectional illness-death process is thus relatively partially exchangeable with respect to this partition.

Let the configuration record the number of individuals in states “Healthy” and “Ill” directly preceding a transition of type (a). Then the probability that given numbers of individuals in each of the two states transition to the state “Dead” is proportional to an integral against the measure restricted to type (a) transition matrices. That is, the restricted measure puts positive mass on transition matrices for which “Dead” is an absorbing state and the only positive off-diagonal entries are the two death-transition probabilities. In this paper, we focus on the following choice of the restricted measure:

A similar formula exists for transitions of type (b). The choice corresponds to a proportional model on the logarithmic scale. It strongly links the two death-transition probabilities, with baseline measure equivalent to that for a harmonic process. In supplementary section D, we provide more details on the connection to the proportional conditional hazards model. We now consider extending this approach to the construction of the measure for any multi-state survival process.

6.1 Composable multi-state survival process

We now generalize the above by introducing processes that are composable with respect to a partition of the state-space.

Definition 3.

A multi-state survival process is composable with respect to a partition of the state-space if the elements within each block of the partition are partially exchangeable with respect to the transition graph.

In other words, a composable process is any process that is relatively partially exchangeable with respect to the partition. For the bi-directional illness-death process (Example 2), the partition is {{Healthy, Ill}, {Dead}}. For comorbidities (Example 4), the partition groups the risk processes; two risk processes in the same block may represent the same underlying process. For competing risks (Example 4), the partition separates the absorbing states, and the single transient state, “Alive”, is distinct. Every process is composable via the degenerate partition in which each block is a singleton. The partition accounts for absorbing states by requiring that each absorbing state's block is compatible with its absorbing behavior.

If a process is composable with respect to a partition, then it is composable with respect to any refinement of that partition. To avoid confusion, from here on, when we say a process is composable with respect to a given partition, we assume there does not exist a coarsening of that partition with respect to which the process is also composable. Definition 3 is similar in spirit to that of Schweder (2007); both aim to formalize the notion that state changes in the process are due to changes in different components.

6.2 Choice of measure  for a -composable process

Here, we construct an appropriate measure for a composable, Markov, exchangeable multi-state survival process. The measure takes positive mass only on transitions from states within one block to states within a single destination block of the partition. For each block, let a representative state be chosen. Then the restricted measure on transitions from states in one block to states in another is defined by

(4)
(5)
(6)

where the parameters are non-negative and the representative state's parameter is set to one. Lines (4) and (5) build the general measure from a baseline harmonic measure and the assumption of proportionality on the logarithmic scale, where the proportionality constant depends on the origin state. Since the representative state's parameter is fixed at one, the remaining parameters measure risk relative to the chosen representative state. Line (6) addresses the fact that a single state can transition to multiple states in the destination block. Here, we choose an atomic measure whose weights sum to one; these weights lead to a multinomial distribution across transitions. We take this approach because it leads to simple Gibbs updates of the underlying parameters. In the example presented in Section 9, Mild CAV status can transition to either No CAV or Severe CAV status. This is captured by presupposing a binomial distribution over these transitions.

Note that for the degenerate partition into singletons, the above construction yields the self-similar harmonic process. For the partition {{Healthy, Ill}, {Dead}}, the above construction yields the bi-directional illness-death process from Section 6.

7 Parameter estimation

In practice, the patient’s health status is typically measured at recruitment and regularly or intermittently thereafter while the patient is under observation. A complete observation on one patient consists of the appointment schedule, the multi-state process measurements at those appointments, a failure/censoring time, and a censoring indicator. For censored records, the censoring time is usually, but not necessarily, equal to the date of the most recent appointment or the end of study.

Here, we assume non-informative observation times. In particular, given previous appointment times and observation values, the next appointment time satisfies

(7)

In other words, the conditional distribution of the next appointment interval may depend on the observed history but not on the subsequent health trajectory. This assumption, combined with variational independence, implies the component of the likelihood associated with the appointment schedule can be ignored for maximum likelihood estimation of parameters associated with the multi-state survival process. Under (7), we propose a Markov chain Monte Carlo algorithm with Metropolis-Hastings updates for posterior computation.

7.1 Uniformization

Uniformization (Jensen, 1953; Hobolth and Stone, 2009) is a well-known technique for generating sample paths of a given Markov state-space process. Take a time-homogeneous, continuous-time Markov process with given exponential holding rates and jump transition matrix. A sample path can be generated via Gillespie’s algorithm: (1) given the current state, generate an exponentially distributed holding time at that state's rate; (2) after the holding time, randomly transition to a new state according to the jump transition matrix. Alternatively, a sample path can be generated by uniformization with a dominating rate: (1) generate a sequence of ‘potential’ transition times from a homogeneous Poisson process with that intensity; (2) run a discrete-time Markov chain with a suitably modified transition matrix on those times to generate a sequence of states; (3) remove the times at which the chain does not transition to a new state. Both algorithms generate sample paths from the same Markov state-space process; however, uniformization is highly adaptable to Markov chain Monte Carlo techniques. See Rao and Teh (2013) for an excellent discussion of uniformization-based MCMC for Markov processes.
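The uniformization construction can be sketched as follows. The generator matrix for an illness-death model and the choice of dominating rate (twice the largest exit rate) are illustrative assumptions for this sketch; any dominating rate at least as large as every exit rate would do.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical generator matrix Q for the illness-death model
# (rows sum to zero; the last state, Dead, is absorbing):
Q = np.array([[-0.3,  0.2, 0.1],
              [ 0.0, -0.4, 0.4],
              [ 0.0,  0.0, 0.0]])

def uniformization_path(Q, x0, T, rng, Omega=None):
    """Generate a sample path on [0, T] by uniformization: thin a
    homogeneous Poisson(Omega) process of potential transition times
    with the discrete kernel B = I + Q / Omega, then drop the times
    at which the chain self-transitions."""
    k = Q.shape[0]
    if Omega is None:
        Omega = 2.0 * max(-Q[i, i] for i in range(k))  # dominating rate
    B = np.eye(k) + Q / Omega
    n = rng.poisson(Omega * T)
    grid = np.sort(rng.uniform(0.0, T, size=n))        # potential times
    times, states = [0.0], [x0]
    x = x0
    for t in grid:
        y = rng.choice(k, p=B[x])
        if y != x:                    # keep only real transitions
            times.append(float(t))
            states.append(int(y))
            x = y
    return times, states

times, states = uniformization_path(Q, x0=0, T=50.0, rng=rng)
```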

7.2 The MCMC Algorithm

Direct application of existing Gibbs samplers based on uniformization to our setting is problematic due to the combinatorial growth of the state-space and the time-inhomogeneity of the conditional Markov process. Luckily, the approach can be adjusted using the sequential description from Section 5.3 as guidance. In this section, we derive an MCMC algorithm for posterior computation given irregularly sampled multi-state survival data.

7.2.1 Prior specification and MCMC updates

We start by specifying priors on the parameters , and . We write  to denote the complete set of parameters, and , , et cetera to denote each subset. Recall that for identifiability  for . For all other pairs , we take the prior to be a mean-zero log-normal distribution . Weakly informative default priors (Gelman et al., 2008) are an alternative; here this corresponds to a log-Cauchy prior with center 0 and scale parameter . However, we saw minimal differences between the two choices in our simulation studies. The complete-data likelihood is non-conjugate in these parameters, so we perform Metropolis-Hastings updates conditional on both the complete trajectory and all other parameters.
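For a positive parameter with a mean-zero log-normal prior, the Metropolis-Hastings update can be implemented as a symmetric random walk on the log scale. The sketch below is generic, assuming a user-supplied complete-data log-likelihood `loglik`; it is not the paper's specific likelihood, and `step` and `prior_sd` are hypothetical tuning constants.

```python
import numpy as np

def mh_update_lognormal(theta, loglik, rng, step=0.3, prior_sd=1.0):
    """One random-walk Metropolis-Hastings update for a positive parameter
    with a mean-zero log-normal prior, proposing on the log scale."""
    log_theta = np.log(theta)
    log_prop = log_theta + step * rng.normal()
    # In log-coordinates the prior is Normal(0, prior_sd^2) and the random
    # walk proposal is symmetric, so the acceptance ratio needs no
    # proposal-density correction.
    def log_post(lt):
        return loglik(np.exp(lt)) - 0.5 * (lt / prior_sd) ** 2
    log_accept = log_post(log_prop) - log_post(log_theta)
    if np.log(rng.uniform()) < log_accept:
        return np.exp(log_prop)
    return theta
```

With a flat likelihood the update samples the prior, which gives a quick sanity check of the implementation.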

For , we follow Dempsey and McCullagh (2017) and take  as a fixed tuning parameter. Next, define . Scaling by  allows direct comparison of  across various choices of the tuning parameter. We choose a Gamma prior, , which is conjugate to the complete-data likelihood. The posterior distribution conditional on the complete trajectory and other parameters is

(8)

where  is the number of transitions between blocks  and . Finally, for  consider transitions to the partition . Index the states in  such that a transition from  is possible by . The prior for  is a Dirichlet distribution with parameters ; the posterior is then conjugate, with

(9)

where  counts the number of transitions from state  to state  in .
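Both posterior updates take the standard conjugate form. The sketch below is a generic illustration, assuming Gamma(a, b) and Dirichlet(alpha) priors with sufficient statistics (transition counts and total exposure) computed from the complete trajectory; the names `a`, `b`, and `alpha` are hypothetical hyperparameters, not the paper's notation.

```python
import numpy as np

def gamma_posterior_draw(a, b, n_trans, exposure, rng):
    """Draw a rate from its Gamma posterior: a Gamma(a, b) prior is
    conjugate when the complete-data likelihood contributes n_trans
    events over the given total exposure time."""
    return rng.gamma(shape=a + n_trans, scale=1.0 / (b + exposure))

def dirichlet_posterior_draw(alpha, counts, rng):
    """Draw transition probabilities from the Dirichlet posterior:
    Dirichlet(alpha) prior plus observed transition counts per
    destination state."""
    return rng.dirichlet(np.asarray(alpha, dtype=float)
                         + np.asarray(counts, dtype=float))
```

Because both draws depend on the latent trajectory only through counts and exposure, they are cheap relative to the trajectory update.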

7.2.2 Conditional sampling patient trajectories

We now adapt uniformization to construct a Gibbs update of the patient trajectory given all other trajectories , the parameters , and the patient's previous trajectory . For patient , we have observation . The appointment schedule  is an ordered sequence  where  if and only if . By the Markov property, for fixed parameters we can generate the patient trajectory on each interval  separately. Let  be the unique transition times of all other patients within the interval , and denote this set of transition times . At each time , define  for some constant . By definition, this is a piecewise-constant function that changes only at unique transition times or censoring times. Next, sample a Poisson process  with piecewise-constant rate , where  is the continuous component of the conditional distribution. Finally, let  denote the transition times of the previous trajectory , and let  denote the union of these times. We can then apply the forward-filtering, backward-sampling algorithm with transition matrix  at times  and transition matrix  at times . This combines standard uniformization-based Gibbs sampling with the added complexity of the atomic component of the conditional distributions for exchangeable, Markov multi-state survival processes.
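The forward-filtering, backward-sampling step can be sketched for a generic discrete-time chain. The sketch below assumes a single homogeneous transition matrix `B` and hard 0/1 evidence vectors encoding which states are compatible with the observations at each grid time; in the actual algorithm the transition matrix varies across the grid, alternating between the two matrices described above.

```python
import numpy as np

def ffbs(B, init, evidence, rng):
    """Forward-filtering, backward-sampling for a discrete-time chain.

    B: transition matrix; init: initial distribution;
    evidence: per-step likelihood vectors (e.g., 0/1 indicators of the
    states compatible with the observation at that grid time).
    Returns one state sequence from the conditional distribution.
    """
    T, n = len(evidence), len(init)
    alpha = np.zeros((T, n))
    alpha[0] = init * evidence[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                  # forward filtering
        alpha[t] = (alpha[t - 1] @ B) * evidence[t]
        alpha[t] /= alpha[t].sum()
    states = np.empty(T, dtype=int)        # backward sampling
    states[-1] = rng.choice(n, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * B[:, states[t + 1]]
        states[t] = rng.choice(n, p=w / w.sum())
    return states
```

The evidence vectors pin the chain to the observed states at appointment times while leaving it free in between, which is exactly the conditional structure needed for intermittently observed trajectories.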

7.2.3 MCMC procedure

In each iteration, we proceed sequentially through the patients, sampling a latent multi-state path for patient  given all other latent processes  as discussed in Section 7.2.2. Then, conditional on the multi-state process , we perform Metropolis-Hastings updates for  as the complete-data likelihoods are easy to compute via equation (2). We end each iteration by using equations (8) and (9) to sample from the posteriors for  and  respectively. One issue with this procedure is that sequential sampling of the latent processes is computationally expensive and can significantly slow down the MCMC procedure. To address this, we also propose an approximate MCMC algorithm in which the latent processes are updated only every few iterations. Performance is not appreciably altered, while run time drops substantially. In the illustrative example below, the MCMC procedure with latent-process updates run only every few iterations has similar performance to the standard MCMC procedure with a significant decrease in overall runtime.
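The overall loop can be organized as follows. This is a structural sketch only: `update_path` and `update_params` stand in for the trajectory and parameter updates described above, and `thin` is a hypothetical parameter controlling how often the expensive latent-path update runs (`thin=1` recovers the standard sampler).

```python
import numpy as np

def run_mcmc(n_iter, n_patients, update_path, update_params, thin=1, seed=0):
    """Skeleton of the (approximate) MCMC procedure: latent patient paths
    are resampled only every `thin` iterations, while parameter updates
    run every iteration given the current latent paths."""
    rng = np.random.default_rng(seed)
    paths = [None] * n_patients          # latent multi-state trajectories
    params = {}
    draws = []
    for it in range(n_iter):
        if it % thin == 0:               # expensive uniformization/FFBS step
            for i in range(n_patients):
                paths[i] = update_path(i, paths, params, rng)
        params = update_params(paths, params, rng)   # MH + conjugate draws
        draws.append(params)
    return draws
```

Skipping the path update on most iterations leaves the parameter chain targeting an approximation to the posterior, which is the trade-off noted above.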

8 MCMC procedure: a simulation example

We illustrate the sampling procedure on a bi-directional illness-death model (Example 2). Parameters were set as follows: marginally, a healthy participant transitions to the ill and dead states after  and  years respectively (on average), while an ill participant transitions to the healthy and dead states after  and  years respectively (on average). Both healthy and ill participants took on average  years to transition to failure. We generated a sample of  individuals,  initially healthy and  initially ill.

First, assume all transitions are observed, and perform maximum likelihood estimation. Next, assume the state of each individual is observed annually, with the transition time to failure observed exactly. Traceplots in Figure 1(a) suggest convergence of the MCMC procedure after the first 100 iterations. The MCMC sampler yields posteriors for the parameters; Table 1 contains these estimates. We see good performance for , , and . The posterior for  reflects the observation schedule; indeed, increasing the frequency of observation significantly improves these posteriors. In particular, under complete observation, the posterior distributions approximately match the asymptotically normal confidence intervals.

Parameter | True value | ML estimate | ML lower CI | ML upper CI | Posterior mean | 5% quantile | 95% quantile
          | 0.50       | 0.53        | 0.43        | 0.64        | 0.15           | 0.09        | 0.23
          | 0.20       | 0.19        | 0.13        | 0.25        | 0.20           | 0.15        | 0.25
          | 0.70       | 0.75        | 0.64        | 0.86        | 0.85           | 0.67        | 1.07
          | 1.71       | 1.71        | 1.28        | 2.13        | 1.23           | 1.62        | 2.09
Table 1: Parameter estimation; maximum likelihood estimates with confidence intervals, and posterior means with 5% and 95% quantiles.

Removing the first 100 iterations as burn-in, posterior distributions are presented in Figure 1(b). Black curves are from the standard MCMC procedure; red curves are from the approximate MCMC procedure with latent-process updates every few iterations. The distributions are approximately equal in all cases, with the largest discrepancies for . This supports the aforementioned difficulty in estimating  due to intermittent observation: the observation schedule is infrequent relative to the underlying stochastic dynamics. Complete observation (i.e., more frequent observation) significantly improves estimation of .

(a) Traceplots for Gibbs sampler
(b) Approximate posterior density
Figure 1: MCMC traceplots and densities for simulation example

Beyond posterior distributions of the parameters, one is typically interested in posterior distributions of the survival functions. Note there are two distinct sources of variation: (1) intermittent observation and (2) parameter uncertainty. The MCMC sampling procedure accounts for both, allowing a survival function to be constructed at each iteration of the MCMC sampler using that iteration's latent process and parameters. Figure 2 presents the pointwise median, 5%, and 95% quantiles of survival at every time since recruitment when the individual is healthy and ill at baseline respectively. The red curves are the true survival functions given healthy/ill status at baseline. The posteriors for the survival functions are almost exactly equal, suggesting intermittent observation did not significantly impair our ability to predict survival of future patients.

Figure 2: Survival functions given baseline state; median (black), 5% and 95% quantiles (dotted black), and true survival function (red)
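Pointwise posterior bands of this form are obtained by stacking the per-iteration survival curves and taking quantiles at each grid time. The sketch below is generic, assuming survival curves evaluated on a common time grid; the exponential curves in the example are purely illustrative, not the model's survival function.

```python
import numpy as np

def survival_bands(curves, probs=(0.05, 0.5, 0.95)):
    """Pointwise posterior summaries of survival curves.

    curves: array of shape (n_draws, n_times), one survival curve per
    MCMC iteration, evaluated on a common time grid.
    Returns an array of shape (len(probs), n_times)."""
    return np.quantile(np.asarray(curves), probs, axis=0)

# Example: exponential survival curves with rates drawn from a
# hypothetical posterior over the failure rate.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 10.0, 50)
rates = rng.gamma(shape=4.0, scale=0.25, size=200)
curves = np.exp(-np.outer(rates, grid))
lo, med, hi = survival_bands(curves)
```

Because the curves come from full MCMC iterations, the bands propagate both parameter uncertainty and the uncertainty from the imputed latent trajectories.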

9 A worked example: Cardiac allograft vasculopathy (CAV) data

To illustrate our methodology, we use data from angiographic examinations of 622 heart transplant recipients at Papworth Hospital in the United Kingdom. These data were obtained from the R package msm (http://cran.r-project.org/web/packages/msm), maintained by Christopher Jackson. Cardiac allograft vasculopathy (CAV) is a deterioration of the arterial walls. Four states were defined for heart transplant recipients: no CAV, mild/moderate CAV, severe CAV, and dead. The transition graph is given in Figure 3. Yearly examinations occurred for up to 18 years following the transplant; the mean follow-up time, however, is  years. Survival times are observed and/or censored, but the CAV state was observed only at appointment times prior to death or censoring. For censored records, the censoring time is assumed to be the final appointment time. Only  patients were observed in state  (mild CAV) at any point during follow-up; of these,  were subsequently observed in state . Only  patients were observed in state  (severe CAV) at any point during follow-up; of these,  were subsequently observed in state . There was no overlap between these two patient subsets.

We set , and assume the underlying process is a -composable, exchangeable, Markov multi-state survival process. The parameters are , , and . For identifiability, we set  and  equal to one. As transitions from state  to  occur but should not occur too often, the prior on  is set to a Beta distribution with parameters  and  respectively. For parameters  and , the Gamma prior has hyperparameters  and . The parameters  have independent, standard log-normal priors. We use the approximate MCMC sampler to perform inference.

Figure 3: Cardiac allograft vasculopathy (CAV) transition diagram.

Traceplots in Figure 4(a) suggest convergence after the first 100 iterations. The MCMC sampler yields posteriors for the parameters. The posterior mean of  is  (i.e., the marginal time until transition from state  to state  is  years). The parameters  have posterior means , translating into marginal holding times of  and  years respectively. The posterior mean of  is , suggesting that a patient is somewhat more likely to experience progression of CAV status than regression. The posterior mean of  is  (i.e., marginally the holding time in state  until a transition to state  is  years). This suggests that in state , disease progression is slightly more likely than failure. The parameters  have posterior means  respectively, translating marginally into holding times of  and  years. For state , the posterior distributions overlap, suggesting that failure transitions from states  and  may occur at similar rates.

(a) Traceplots for MCMC algorithm
(b) Approximate posterior densities
Figure 4: MCMC traceplots and densities for CAV study

We next consider the posterior distributions of the survival functions. Figure 5(a) plots the median survival at each time over all iterations of the MCMC sampler, together with the Kaplan-Meier estimator of the survival function. Note that at baseline all patients are in state ; therefore, the Kaplan-Meier curve should be compared with the median survival curve given that the new patient is in state . The posterior survival curve is significantly lower. This reflects expected disease progression since baseline; that is, a new patient in state  at baseline who is alive at time  is likely to have seen a progression in disease state, leading to an increase in the current risk of failure. Under the multi-state survival process model, the expected survival time from baseline given a new patient in state , , and  is , , and  respectively. Under the Kaplan-Meier estimator, the expected survival time from baseline is . We include the 5% and 95% quantiles of the survival function at each time when the patient is in state  at baseline. Figure 5(b) plots the median survival over all iterations of the MCMC sampler given that the patient is alive at time . Under the multi-state survival process model, the expected survival time from time