## 1 Introduction

In many clinical survival studies, a recruited patient’s health status is monitored on either a regular or an intermittent schedule until either (1) an event of interest occurs (e.g., failure) or (2) the study window ends. Covariates are recorded at the time of recruitment, and each patient’s treatment protocol is presumed known at baseline. In the simplest survival study, health status at a given time is a binary variable: dead or alive. In clinical trials with health monitoring, the health status is a more detailed description of the state of health of the individual, containing relevant patient information, e.g., pulse rate, cholesterol level, cognitive score, or CD4 cell count (Diggle et al., 2008; Farewell and Henderson, 2010; Kurland et al., 2009). In this paper, we consider the setting in which the health process takes values in some pre-specified “state-space”. For example, in the illness-death model, we summarize the current state of the participant as taking one of three possible values. Such a process can be thought of as a coarse view of the state of health of a patient over time. The continuing importance of multi-state processes in applications cannot be overstated (Jepsen et al., 2015; van den Hout, 2016).

When no baseline covariates are measured beyond the initial state, the model for the set of patient state-space processes should satisfy natural constraints. First, the model should be agnostic to patient labeling. Second, the model should be agnostic to sample size considerations.
These natural constraints (mathematically defined in
section 2) lead to the concept of
*exchangeable Markov multi-state survival processes*.
The purpose of this paper is to characterize the
set of multi-state survival processes and show how this theory
of exchangeable stochastic processes fits
naturally into the applied framework of event-history analysis.
Both the parametric continuous-time Markov process with
independent participants and the nonparametric counting
process are contained as limiting cases.
Next, we discuss the notion of “composable systems” and its effect on
model specification. A Markov Chain Monte Carlo (MCMC) algorithm is
then derived for posterior computations given irregularly sampled
multi-state survival data. We end with an application to a cardiac
allograft vasculopathy (CAV) multi-state survival study.

### 1.1 Related work

Odd Aalen was one of the first to recognize the importance of incorporating the “theory of stochastic processes” into an “applied framework of event history analysis” (Aalen et al., 2008, p. 457). Martingales and counting processes form the basis of this nonparametric approach. The fundamental concept of the product-integral unifies discrete and continuous-time processes. Nonparametric methods, however, do not adequately handle intermittent observations. For example, Aalen et al. (2015) consider dynamic path analysis for a liver cirrhosis dataset. In this study, the prothrombin index, a composite blood coagulation index related to liver function, is measured initially at three-month intervals and subsequently at roughly twelve-month intervals. To deal with the intermittency of observation times, Aalen et al. (2015) use the “last observation carried forward” (LOCF) assumption. However, such an assumption is unsatisfactory for highly variable health processes and can lead to biased estimates (Little et al., 2010). Jepsen et al. (2015) discuss the importance of multi-state and competing risks models for complex models of cirrhosis progression. Here again, the nonparametric approach assumes observation times correspond to transition times of the multi-state process.

One alternative is to consider parametric models such as continuous-time Markov processes. Prior work (Saeedi and Bouchard-Côté, 2011; Hajiaghayi et al., 2014; Rao and Teh, 2013) has focused on estimation of parametric continuous-time Markov processes under intermittent observations. Most parametric models, however, make strong assumptions about the underlying state-space process; in particular, most models assume independence among patients. One implication is that observing sharp changes in health among prior patient trajectories at a particular time since recruitment will not impact the likelihood of a similar sharp change in a future patient at the same timepoint. The proposed approach in this paper balances between the nonparametric and parametric approaches.

## 2 Multi-state survival models

In this section we formally define the multi-state survival process and the notions of exchangeability, Kolmogorov-consistency, and the Markov property. We combine these in section 3 to provide characterization theorems for these processes in discrete and continuous-time.

### 2.1 Multi-state survival process

Formally, the multi-state survival process is a function from the label set into the state space, whose cardinality we assume for now to be finite. If the response is in discrete-time, the process is defined on a discrete time grid; if the response is in continuous-time, it is defined on the positive half-line. Each label is a pair consisting of a patient index and a time, and the value of the process at that label is the state of that patient at that time.

The distinguishing characteristic of survival processes is *flatlining* (Dempsey and McCullagh, 2018); that is, there exists an absorbing state such that once the process enters it, the process remains there at all later times. Thus, the survival time for a unit, namely the first hitting time of the absorbing state, is a deterministic function of the multi-state survival process.

We assume no unit is in the absorbing state at recruitment. Multiple absorbing states representing different terminal events may occur, as for competing risk processes.

Without loss of generality, we relabel the states so that the absorbing state carries a fixed label. For example, if the state-space is {Healthy, Ill, Dead}, we recode the states numerically so that “Dead” is the flatlining state.
At each time , the population-level
process is given by .
We write to denote a generic
element of .
We write
to denote the restriction
of the state space process
to .
We call the *-restricted
state-space process* for . We write
to denote a generic element of .
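The bookkeeping in the definitions above can be made concrete in a short sketch. Here a patient’s path is a list of states over a discrete grid; the absorbing label (0 below), the function names, and the dictionary representation are illustrative choices, not the paper’s notation.

```python
# Sketch of a discrete multi-state trajectory store with flatlining
# checks.  State 0 plays the absorbing ("flatlining") role in this
# sketch; all names are illustrative.
from typing import Dict, List

ABSORBING = 0  # illustrative label for the absorbing state


def is_flatlining(path: List[int]) -> bool:
    """Once a path hits the absorbing state it must stay there."""
    seen = False
    for state in path:
        if seen and state != ABSORBING:
            return False
        if state == ABSORBING:
            seen = True
    return True


def survival_time(path: List[int]) -> int:
    """First hitting time of the absorbing state (len(path) if never hit)."""
    for t, state in enumerate(path):
        if state == ABSORBING:
            return t
    return len(path)


def restrict(process: Dict[int, List[int]], n: int) -> Dict[int, List[int]]:
    """[n]-restriction: keep only the first n units."""
    return {i: process[i] for i in sorted(process) if i < n}
```

The restriction operator here is the discrete analogue of the `[n]`-restricted state-space process used throughout the paper.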

### 2.2 Transition graphs for multi-state survival process

The *transition graph* represents the set of potential transitions among the states of the multi-state survival process.
The transition graph is a directed graph . The vertex
set is all potential states; the directed edge set
contains all edges such that at jump times the Markov process
can jump from to . In the illness-death model, for example, a
patient can jump from Alive to Ill but not back;
therefore (Alive, Ill) is in the edge set but not (Ill,
Alive). Example 2 in supplementary
section C provides additional details. In the
bi-directional illness-death
model, both edges are present in the transition graph. In
continuous-time, jumps can only occur between distinct states, in
which case the number of possible edges is .
In this case, for all . Any absorbing
state satisfies for all . We write to denote the set of by transition
matrices satisfying ,
for all , and for all .
In the continuous-time setting, define .

### 2.3 Consistency under subsampling

We first note that sample size is often an arbitrary choice based on
power considerations and/or patient recruitment
constraints. Statistical models should be agnostic to such
considerations. That is, observing units versus units and
then restricting to the first units should be equivalent, i.e.,
the model should exhibit *consistency under subsampling*.

Consider the multi-state survival process for . Define the
restriction operator to be the restriction of to
the first individuals. Then the process is *consistent under
subsampling* if is a version of for
all . Under the consistency assumption, the
process satisfies *lack of interference*;
mathematically,

where is the -field generated by the variables for and . Lack of interference is essential, ensuring the -restricted multi-state survival process is unaffected by the multi-state survival process for subsequent components. Consistency under subsampling ensures the statistical models are embedded in suitable structures that permit extrapolation. Without it, one is forced into the awkward situation of only being interested in the currently collected dataset. Consistency ensures that statistical inference is a special case of “reasoning by induction” (De Finetti, 1972).

### 2.4 Exchangeability

We next note that the patient labels, , are also arbitrary. Therefore, any suitable multi-state survival process must be agnostic to patient relabeling. We define a multi-state survival process to be [partially] exchangeable if for any permutation , the relabeled process is a version of .

### 2.5 Time-homogeneous Markov process

We say is a time-homogeneous Markov process if, for every , the conditional distribution of given the multi-state survival process history up to time , , only depends on and . The Markov assumption is a simplification that leads to mathematically tractable conclusions. In this paper, we restrict our attention to time-homogeneous processes; therefore, we simply say is Markovian.

## 3 Markov, exchangeable multi-state survival processes

We define a multi-state survival process that is Markovian, exchangeable, and consistent under subsampling as a Markov, exchangeable multi-state survival process. Below, we characterize these processes in both discrete and continuous time. The behavior is markedly different in each setting, with continuous-time Markov processes exhibiting much more complex behavior – allowing both single-unit changes and simultaneous changes for a positive fraction of units – showing why the choice of time-scale matters in applied settings.

### 3.1 Discrete-time multi-state survival models

In discrete-time, the Markov, exchangeable multi-state survival process is governed by a sequence of random transition matrices, each drawn independently from a probability measure on the set of transition matrices. That is, the initial state vector is drawn from an exchangeable distribution. Then, at each time step, the transition distribution of each unit is given by the corresponding entry of a random transition matrix drawn from the measure. By construction, a process generated by this procedure is an exchangeable, Markov multi-state survival process in discrete time. Theorem 1 states that this procedure describes all such processes. The proof is left to supplementary section A.
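As a hedged illustration of this construction, the sketch below draws a fresh random transition matrix at every step (here with independent Dirichlet rows, an arbitrary stand-in for the measure) and moves all units independently according to that shared matrix. Sharing one random matrix across units is what makes the process exchangeable; dropping units does not change the law of the rest, which is the consistency property.

```python
# Illustrative discrete-time construction: shared random transition
# matrix per step, iid unit moves given the matrix.  The Dirichlet
# rows are an assumed choice of measure, not the paper's.
import numpy as np

ABSORBING = 0  # illustrative label for the flatlining state


def draw_transition_matrix(rng, k, conc=1.0):
    """One draw from an (illustrative) measure on k-by-k transition matrices."""
    P = np.vstack([rng.dirichlet(conc * np.ones(k)) for _ in range(k)])
    P[ABSORBING] = np.eye(k)[ABSORBING]  # flatlining: absorbing state stays put
    return P


def step(rng, states, P):
    """All units move independently given the shared random matrix P."""
    return np.array([rng.choice(len(P), p=P[s]) for s in states])


def simulate(rng, init, k, T):
    path = [np.asarray(init)]
    for _ in range(T):
        path.append(step(rng, path[-1], draw_transition_matrix(rng, k)))
    return np.stack(path)  # (T+1) x n array of unit states
```

Conditional on the sequence of matrices the units are independent, but marginally their trajectories are dependent through the shared draws.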

###### Theorem 1 (Discrete-time characterization).

Let be a Markov, exchangeable multi-state survival process. Then there exists a probability measure on such that is a version of .

### 3.2 Continuous-time multi-state survival models

In continuous-time, the Markov, exchangeable multi-state survival process is governed by a measure on transition matrices and a set of constants associated with the edge set. Unlike discrete-time, however, the jumps occur at random times.

Consider the restricted state-space process. The holding time in the current state is exponentially distributed with a rate parameter depending on the current *configuration* (see Section 5.3). At a jump time, one of two potential events can occur: (a) a *single unit* experiences a transition from its current state to another state, or (b) a *subset of units* (potentially a singleton) experiences a simultaneous transition. Given the transition matrix, these simultaneous transitions are independent and identically distributed according to the rows of that matrix. The transition matrix is obtained from a measure on transition matrices and a set of constants; unlike the discrete-time setting, the measure need not be integrable.

###### Theorem 2 (Continuous-time characterization).

Let be a Markov, exchangeable multi-state survival process; and be the identity matrix. Then there exists a probability measure on satisfying

(1)

and constants such that is a version of .

Theorem 2 generalizes Proposition 4.3 in Dempsey and McCullagh (2017) from the simple survival process setting. The multi-state survival process contains many more well-known examples from health and epidemiology – survival, illness-death, competing risk, and comorbidity processes. We highlight these examples in supplementary section C. Within the discussion of examples, we extend Theorems 1 and 2 to the setting of recurrent events (see Corollaries 2 and 3). Here, Theorem 2 is used to characterize the continuous-time Markov chain in terms of (1) exponential holding rates and (2) the transition matrix at jump times. We start by defining the characteristic index – a set of functions.

###### Definition 1 (Configuration vector).

For a given state vector, define the *configuration vector* as the vector summarizing the number of units in each state. For example, in a three-state model with ten units of which five are Healthy, three are Ill, and two are Dead, the configuration vector is (5, 3, 2).

###### Definition 2 (Characteristic index).

For simple survival processes, the characteristic index defined here simplifies to the characteristic index as defined in Dempsey and McCullagh (2017). At a jump-time, the probability of a given transition is expressed in terms of an indicator function, the single unit to experience a transition, and the non-normalized transition function.

## 4 Discretization and rounding

It has been argued that “there may be no scientific reason to prefer a true continuous time model over a fine discretization” (Breto et al., 2009, p. 325). We tend to disagree with such a viewpoint; a basic and very important issue in multi-state survival analysis is the distinction between inherently discrete data (coming from intrinsically time-discrete phenomena) and grouped data (coming from rounding of intrinsically continuous data). Theorems 1 and 2 reinforce this distinction as we see distinct characterizations of discrete and continuous-time processes. One example of the former in survival analysis is time to get pregnant, which should be measured in menstrual cycles. The latter represents the majority of multi-state survival data. For this reason, we focus the remainder of this paper on the continuous-time case.

## 5 Description of continuous-time process

### 5.1 Holding times

Let be a jump time at which the state vector transitions into state . To each such state , we associate an independent exponentially distributed holding time. By choosing the rate functions in an appropriate way, the Markov multi-state survival process can be made both consistent under subsampling, and exchangeable under permutation of particles.

###### Corollary 1.

A set of rate functions is consistent if it is proportional to the characteristic index.

### 5.2 Density function

Since the evolution of the process is Markovian, it is a straightforward exercise to give an expression for the probability density function of any specific temporal trajectory. The probability that the first transition occurs in a given interval, with a given change of state, is expressed in terms of the non-normalized transition probabilities. Continuing in this way, it can be seen that the joint density for a particular temporal trajectory consisting of transitions at given transition times is

(2)

The number of transitions is a random variable whose distribution is determined by (2), and hence by the governing measure. Each unit reaches the absorbing state in finite time with probability one; therefore, the final transition is one after which all units have failed. Consequently, the integral in equation (2) is finite with probability one.

The joint distribution is continuous in the sense that it has no fixed atoms. For two or more units, however, it is not continuous with respect to Lebesgue measure, owing to the possibility of tied transition times. The one-dimensional marginal process is a Markov multi-state survival process, assuming a valid starting state. For example, in a survival process the only valid starting state is “Alive”.

Although the argument leading to (2) did not explicitly
consider censoring, the density function has been expressed in integral
form so that censoring is accommodated correctly.
The pattern of censoring affects the evolution of ,
and thus affects the integral,
but the product involves only transitions and transition times.
So long as the censoring mechanism is *exchangeability
preserving* (Dempsey and McCullagh, 2017), inference based on the joint density
given by equation (2) is possible. Both simple type I
censoring and independent censoring mechanisms preserve exchangeability.

### 5.3 Sequential description

Kolmogorov consistency permits ease of computation for the trajectory of a new unit given trajectories for the first units . The conditional distribution is best described via a set of paired conditional hazard measures, . For , the pair consists of a continuous component in addition to an atomic measure with positive mass only at the observed transition times of .

For a time , not a transition time of , consider the new unit transitioning from state to . Then the continuous component has hazard and cumulative hazard

Note that is piecewise constant, so the integral is trivial to compute, but censoring implies that it is not necessarily constant between successive failures.
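Because the hazard changes only at observed transition or censoring times, the cumulative hazard over an interval reduces to a finite sum of rate-times-duration terms. A minimal sketch, with illustrative breakpoint and rate inputs:

```python
# Sketch: integrating a piecewise-constant conditional hazard.
# `breakpoints` are the sorted times at which the hazard changes;
# rates[j] is in force between breakpoints[j-1] and breakpoints[j],
# with rates[0] before the first breakpoint and rates[-1] after the
# last.  Inputs are illustrative.
import bisect


def cumulative_hazard(s, t, breakpoints, rates):
    """Integral of the piecewise-constant hazard over (s, t]."""
    assert len(rates) == len(breakpoints) + 1 and s <= t
    grid = [s] + [b for b in breakpoints if s < b < t] + [t]
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        j = bisect.bisect_right(breakpoints, left)  # rate in force on (left, right)
        total += rates[j] * (right - left)
    return total
```

This is the computation behind the integral in the continuous component: censoring changes the rate in force between successive events, but the sum structure is unchanged.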

Now let be an observed transition time (i.e., ) and consider the atomic measure associated with switching from state to . At each such point, the conditional hazard has an atom with finite mass

or, on the probability scale,

The above calculations define the conditional holding time of the new
unit after it enters state at time (i.e., and
) conditional on .
For , let denote the observed transition
times of within the time-window .
Then, the probability that the unit stays in state for
*at least* time points is

This conditional probability is used to construct the Markov chain Monte Carlo sampling procedure in Section 7. For any exchangeable, Markov multi-state survival process whose absorbing state is reached with probability one, the continuous components of the conditional hazards have infinite total mass, so the time until reaching an absorbing state is finite with probability one. This implies, for instance, that in the Aalen comorbidities study (see Example 3 in Section C) the process will terminate in death with probability one.

Although the above conditional hazards look complex, it is not
difficult to generate the transition times sequentially for processes
whose characteristic index admits a simple expression for the above
expressions. Right censoring is automatically accommodated by the
integral in the continuous component, so the observed trajectory
may be incomplete. Below we introduce
the *self-similar harmonic process* – a multi-state survival process
which admits such simple expressions. We use this process as a
building block for more complex models in further sections.

### 5.4 Self-similar harmonic process

Theorem 2 proves tied failures
are an intrinsic aspect of Markov multi-state survival processes.
As stated in Section 4, grouped data
usually are the result of rounding of intrinsically continuous data.
For these processes to be useful in biomedical applications, it is
essential that the model should not be sensitive to rounding.
Sensitivity to rounding is addressed by restricting attention to
processes whose conditional distributions are *weakly
continuous*, i.e., a small perturbation of the transition times
gives rise to a small perturbation of the conditional distribution.

Dempsey and McCullagh (2017) originally studied
this question in the context of exchangeable, Markov survival
processes. In particular, it is shown that the *harmonic process*
is the only Markov survival process with weakly-continuous conditional
distributions.
Here, we extend the harmonic process to a multi-state survival process
by associating with each edge an independent harmonic
process with parameters .
For let denote
the unique observed transition times from to
for and let ; then the continuous component of the hazard
is given by:

where the sum runs over transition times , and is the last such event. The discrete component is a product over transition times

(3)

For small , the combined discrete components above are essentially the same as the right-continuous version of the Aalen-Johansen estimator.

We call this process the self-similar harmonic process with transition graph . The associated measure on is

where is the single non-zero, off-diagonal entry, is the indicator function, and are the associated parameters.

While the self-similar harmonic process has strong appeal for use in applied work, we show below that it is not a universally optimal choice. The strong assumption embedded in the self-similar harmonic process is that the transition processes are independent across edges . This implies that at each transition time only transitions along a single edge are possible. We argue below that while this may make sense in specific instances, additional care is needed in writing down appropriate models for multi-state survival processes in general.

## 6 Composable multi-state survival models

We now discuss constraints on multi-state survival models based on decompositions of the state-space. We use the bidirectional illness-death process as an illustrative example. Recall this process has three states, {Healthy, Ill, Dead}; see Example 2 for further details. The state “Dead” is unique and distinct from the states “Healthy” and “Ill”. Indeed, the latter two states require the individual to be categorized more broadly as alive, and are thus refinements of this more general state (i.e., “Alive”). Suppose the labels “Healthy” and “Ill” were uninformative with respect to failure transitions. Then the refinement is immaterial, and the transition rules should collapse to the transition rule for an exchangeable, Markov survival process.

The above discussion leads to two natural constraints:
(1) the state “Dead” (State 3) is unique and distinct from the other
states, and (2) the states “Healthy” (State 1) and “Ill” (State
2) should be considered *partially
exchangeable* (De Finetti, 1972). To satisfy this,
we require particular constraints on the measure on transition matrices. First, the measure must only take positive mass
on one of two sets of transition matrices: (a) with off-diagonal
positive mass only in entries and/or , or (b) with
off-diagonal positive mass only in entries and/or . The
first set are transition matrices representing transitions from
“Healthy” or “Ill” to “Dead”. The second are transitions between
“Healthy” and “Ill” or vice versa.
The partition splits the
space into two disjoint sets. Within each block the states are
partially exchangeable. We say that the bidirectional illness-death
process is thus relatively partially exchangeable with respect to the
partition .

Let denote the number of individuals in states “Healthy” and “Ill” directly preceding a transition of type (a). Then the probability that and individuals respectively transition to the state “Dead” is proportional to , where is the measure restricted to type (a) transition matrices. That is, puts positive mass on transition matrices such that so for , and (i.e., “Dead” is an absorbing state). In this paper, we focus on the following choice of the restricted measure:

A similar formula exists for transitions of type (b). The choice corresponds to a proportional model on the logarithmic scale. It strongly links and with baseline measure equivalent to that for a harmonic process. In supplementary section D, we provide more details on the connection to the proportional conditional hazards model. We now consider extending this approach to construction of the measure for any multi-state survival process.

### 6.1 Composable multi-state survival process

We now generalize the above by introducing -composable processes.

###### Definition 3.

A multi-state survival process is *-composable* if
there exists a partition of the
state-space such that elements within block are
*partially exchangeable* with respect to transition graph .

In other words, a -composable process is any process that is relatively partially exchangeable with respect to . For the bi-directional illness-death process (ex. 2), . For comorbidities (ex. 4), partitions the risk processes. Two risk processes in the same partition may represent the same underlying process. For the competing risks (ex. 4), where partition the absorbing states and the single state, “Alive”, is distinct which implies . Every process is composable via the degenerate partition where each is a singleton and . The partition accounts for absorbing states, , by satisfying for , where is the component of to which state belongs.

If is -composable and is a refinement of the partition , then is also -composable. To avoid confusion, from here on, when we say is -composable, we assume there does not exist a refinement such that is also -composable. Definition 3 is similar in spirit to that of Schweder (2007); both aim to formalize the notion that state changes in the process are due to changes in different components.

### 6.2 Choice of measure for a -composable process

Here, we construct an appropriate measure for a
-composable, Markov, exchangeable multi-state survival process.
The measure will take positive mass only on transitions from states
within to states within for a single choice of indexing components of the partition .
For each component , let denote a
*representative state*.
Then, for , define the restricted measure on
transitions from states in to states in , , by

(4)

(5)

(6)

where , ,
, and
such that .
Here, the assumption is for , and .
Lines (4) and (5) build the general
measure from a baseline harmonic measure and the assumption of
proportionality on the logarithmic scale for , where the
proportionality constant depends on .
Note is set to 1 by design, and so the
parameters measure risk relative to the chosen *representative
state*.
Line (6) addresses the fact that a single state can transition to multiple states in .
Here, we choose an atomic measure located at , where the
parameters satisfy . These parameters lead to a multinomial
distribution across transitions. We take this approach because it
leads to simple Gibbs updates of the underlying parameters.
In the example presented in section 9, Mild CAV status can transition to either No CAV or Severe CAV status. This is captured by presupposing a binomial distribution over these transitions.

Note, for , the above construction yields the self-similar harmonic process. For , the above construction yields the bi-directional illness-death process from Section 6.

## 7 Parameter estimation

In practice, the patient’s health status is typically measured at recruitment, and regularly or intermittently thereafter while the patient is under observation. A complete observation on one patient consists of the appointment schedule, the multi-state process measurements, a failure/censoring time, and a censoring indicator. For censored records, the censoring time is usually, but not necessarily, equal to the date of the most recent appointment or the end of the study.

Here, we assume *non-informative observation times*. In
particular, given previous appointment times and observation values , the next
appointment time satisfies

(7)

In other words, the conditional distribution of the random inter-appointment interval may depend on the observed history but not on the subsequent health trajectory. This assumption, combined with variational independence, implies the component of the likelihood associated with the appointment schedule can be ignored for maximum likelihood estimation of parameters associated with the multi-state survival process. Under (7), we propose a Metropolis-Hastings algorithm for posterior computation.

### 7.1 Uniformization

Uniformization (Jensen, 1953; Hobolth and Stone, 2009) is a well-known technique for generating sample paths of a given Markov state-space process. Take a time-homogeneous, continuous-time Markov process with a given rate matrix. A sample path can be generated via Gillespie’s algorithm: (1) given the current state, generate an exponentially distributed holding time with the rate associated with that state; (2) once the holding time elapses, randomly transition to a new state with the corresponding jump probabilities. Alternatively, a sample path can be generated by uniformization with a dominating rate: (1) generate a sequence of “potential” transition times from a homogeneous Poisson process with that rate; (2) run a discrete-time Markov chain with a suitably rescaled transition matrix at these times to generate a sequence of states; (3) construct the sample path by removing the times at which the chain does not transition to a new state. Both algorithms generate sample paths from the same Markov state-space process; however, uniformization is highly adaptable to Markov chain Monte Carlo techniques. See Rao and Teh (2013) for an excellent discussion of uniformization-based MCMC for Markov processes.
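Both samplers can be sketched for a fixed rate matrix `Q` (rows summing to zero, nonnegative off-diagonals); the three-state illness-death `Q` in the usage below is illustrative. `uniformization` thins a Poisson stream of candidate times with dominating rate `Omega`, keeping only the real transitions.

```python
# Sketch: Gillespie vs uniformization for a fixed rate matrix Q.
# Both return (jump_times, states) on [0, horizon].  Assumed inputs.
import numpy as np


def gillespie(rng, Q, x0, horizon):
    t, x, times, states = 0.0, x0, [0.0], [x0]
    while True:
        rate = -Q[x, x]
        if rate == 0.0:                   # absorbing state: stop
            break
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            break
        p = Q[x].clip(min=0.0) / rate     # jump distribution over other states
        x = int(rng.choice(len(Q), p=p))
        times.append(t)
        states.append(x)
    return times, states


def uniformization(rng, Q, x0, horizon, Omega=None):
    k = len(Q)
    Omega = Omega or 2.0 * max(-Q[i, i] for i in range(k))
    B = np.eye(k) + Q / Omega             # subordinated discrete chain
    t, x, times, states = 0.0, x0, [0.0], [x0]
    while True:
        t += rng.exponential(1.0 / Omega)  # candidate times: Poisson(Omega)
        if t > horizon:
            break
        y = int(rng.choice(k, p=B[x]))
        if y != x:                         # keep only real transitions
            times.append(t)
            states.append(y)
            x = y
    return times, states
```

Both paths have the same law; uniformization’s fixed-rate candidate grid is what makes it convenient inside a Gibbs sampler.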

### 7.2 The MCMC Algorithm

Direct application of existing Gibbs samplers based on uniformization is problematic in our setting due to combinatorial growth of the state-space and the time-inhomogeneity of the conditional Markov process. Fortunately, the approach can be adjusted using the sequential description from Section 5.3 as guidance. In this section, we derive an MCMC algorithm for posterior computation given irregularly sampled multi-state survival data.

#### 7.2.1 Prior specification and MCMC updates

We start by specifying priors on the parameters and write subsets of the complete parameter set with corresponding subscripts. Recall that, for identifiability reasons, one parameter per component is fixed to one. For all other pairs, we take the prior to be a mean-zero log-normal distribution. Weakly informative default priors are an alternative (Gelman et al., 2008); this corresponds to a log-Cauchy prior with center 0 and a chosen scale parameter. However, we saw minimal differences in simulation studies for our current setting. The complete-data likelihood is non-conjugate, so we perform Metropolis-Hastings updates conditional on both the complete trajectory and all other parameters.

For , we follow Dempsey and McCullagh (2017) and take as a fixed tuning parameter. Next, define . Scaling by allows direct comparison of across various choices of the tuning parameter. We choose a Gamma prior, , which is conjugate to the complete-data likelihood. The posterior distribution conditional on the complete trajectory and other parameters is

(8)

where the relevant count is the number of transitions between blocks. Finally, consider transitions into a given component of the partition, and index the states in that component to which a transition is possible. The prior for the associated transition probabilities is a Dirichlet distribution; the posterior is then conjugate and

(9)

where counts the number of transitions from state to state in .
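The two conjugate updates can be written directly. The hyperparameter names (`a0`, `b0`, `alpha0`) are illustrative stand-ins for those appearing in (8) and (9).

```python
# Sketch of the conjugate Gibbs updates: a Gamma prior on a rate
# parameter updated with a transition count and total exposure, and a
# Dirichlet prior over destination states updated with transition
# counts.  Hyperparameter names are illustrative.
import numpy as np


def gamma_posterior_draw(rng, a0, b0, n_transitions, exposure):
    """One draw from the Gamma(a0 + n, b0 + exposure) posterior."""
    return rng.gamma(a0 + n_transitions, 1.0 / (b0 + exposure))


def dirichlet_posterior_draw(rng, alpha0, counts):
    """One draw from the Dirichlet(alpha0 + counts) posterior."""
    return rng.dirichlet(np.asarray(alpha0, dtype=float) + np.asarray(counts, dtype=float))
```

Both updates use only sufficient statistics of the complete trajectory (counts and exposure), which is what makes them cheap within each MCMC iteration.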

#### 7.2.2 Conditional sampling patient trajectories

We now adapt uniformization to construct a Gibbs sampler for a single patient trajectory given all other trajectories, the parameters, and the previous trajectory for this patient. The appointment schedule is an ordered sequence of observation times. By the Markov property, we can focus on generating the patient trajectory, for fixed parameters, separately on each inter-appointment interval. Consider the unique transition times of all other patients within such an interval. At each time, define a dominating rate equal to a constant multiple of the current holding rate; by construction, this is a piecewise-constant function that changes only at unique transition or censoring times. Next, sample a Poisson process with this piecewise-constant rate, which dominates the continuous component of the conditional distribution. Finally, take the union of these sampled times with the transition times from the previous trajectory. We can then apply the forward-filtering backward-sampling algorithm with the appropriate transition matrices at the sampled candidate times and at the observed transition times. This combines the standard Gibbs sampler based on uniformization with the added complexity of the atomic component associated with the conditional distributions for exchangeable, Markov multi-state survival processes.
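The forward-filtering backward-sampling step on the merged time grid can be sketched as follows; the grid, the per-interval transition matrices, and the likelihood encoding are assumed inputs built as described above.

```python
# Sketch of forward-filtering backward-sampling (FFBS) on a merged
# grid: given per-interval transition matrices Ps[t] and observation
# likelihood vectors likes[t] (rows of ones where the state is
# unobserved), sample a state sequence from its conditional law.
import numpy as np


def ffbs(rng, init, Ps, likes):
    """init: length-k initial distribution; Ps: list of T k-by-k
    transition matrices; likes: (T+1)-by-k observation likelihoods."""
    T, k = len(Ps), len(init)
    a = np.asarray(init, dtype=float) * likes[0]
    alphas = [a / a.sum()]
    for t in range(T):                      # forward filtering
        a = (alphas[-1] @ Ps[t]) * likes[t + 1]
        alphas.append(a / a.sum())
    path = [int(rng.choice(k, p=alphas[-1]))]
    for t in range(T - 1, -1, -1):          # backward sampling
        w = alphas[t] * Ps[t][:, path[-1]]
        path.append(int(rng.choice(k, p=w / w.sum())))
    return path[::-1]
```

In the sampler described above, the matrices at candidate (uniformization) times carry the dominated continuous component, while those at observed transition times of other patients carry the atomic component.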

#### 7.2.3 MCMC procedure

In each iteration, we proceed sequentially through the patients, sampling a latent multi-state path for each patient given all other latent processes, as discussed in Section 7.2.2. Then, conditional on the latent multi-state processes, we perform Metropolis-Hastings updates for the remaining parameters, as the complete-data likelihoods are easy to compute via equation (2). We end each iteration by using equations (8) and (9) to sample from the posteriors of the rate and transition-probability parameters respectively. One issue with this procedure is that sequential sampling of the latent processes is computationally expensive and can significantly slow down the MCMC procedure. To address this, we also propose an approximate MCMC algorithm in which the latent processes are updated only every few iterations. Performance is not significantly altered, while run time drops substantially. In the illustrative example below, the MCMC procedure with latent-process updates run only intermittently has performance similar to the standard MCMC procedure, with a significant decrease in overall runtime.
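The overall loop, including the approximate variant that refreshes the latent processes only intermittently, might look as follows; `sample_path` and `update_params` stand in for the steps of Sections 7.2.2 and 7.2.1 and are assumptions of this sketch:

```python
def approximate_mcmc(n_iters, n_patients, update_every,
                     sample_path, update_params):
    """Skeleton of the (approximate) MCMC scheme: the expensive sequential
    latent-path updates run only every `update_every` iterations, while the
    parameter updates run every iteration."""
    paths = [None] * n_patients
    params = {}
    draws = []
    for it in range(n_iters):
        if it % update_every == 0:
            # sequential Gibbs pass over patients (Section 7.2.2)
            for i in range(n_patients):
                paths[i] = sample_path(i, paths, params)
        # conjugate / Metropolis-Hastings parameter updates (Section 7.2.1)
        params = update_params(paths, params)
        draws.append(dict(params))
    return draws
```

Setting `update_every = 1` recovers the standard sampler, so the approximation is controlled by a single knob.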

## 8 MCMC procedure: a simulation example

We illustrate the sampling procedure on a bi-directional illness-death model (example 2). We set the parameters as follows: marginally, a healthy participant transitions to the ill and dead states after a specified number of years on average, and an ill participant transitions back to healthy or on to dead at corresponding average rates; both healthy and ill participants take several years on average to transition to failure. A sample of individuals, some initially healthy and the rest initially ill, was generated.
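The complete-data trajectories for such a simulation can be generated with a standard Gillespie-style sampler; the generator matrix below is illustrative only, since the paper's numeric rates did not survive extraction:

```python
import numpy as np

rng = np.random.default_rng(2)

# generator for a bi-directional illness-death model;
# states: 0 = healthy, 1 = ill, 2 = dead (absorbing)
Q = np.array([[-1.2,  1.0,  0.2],
              [ 0.5, -0.9,  0.4],
              [ 0.0,  0.0,  0.0]])

def simulate_path(Q, start, t_max):
    """Gillespie-style simulation of one CTMC trajectory up to t_max."""
    t, s, path = 0.0, start, [(0.0, start)]
    while t < t_max:
        rate = -Q[s, s]
        if rate == 0.0:                # absorbing state (dead)
            break
        t += rng.exponential(1.0 / rate)
        if t >= t_max:
            break
        probs = np.clip(Q[s], 0.0, None)
        probs[s] = 0.0
        s = int(rng.choice(len(probs), p=probs / probs.sum()))
        path.append((t, s))
    return path
```

Intermittent observation is then mimicked by retaining only the state at annual appointment times, plus the exact failure time.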

First, assume all transitions are fully observed, and perform maximum likelihood estimation. Next, assume the state of each individual is observed annually, with the transition time to failure observed exactly. Traceplots in Figure 0(a) suggest convergence of the MCMC procedure after the first 100 iterations. The MCMC sampler yields posteriors for the parameters; Table 1 contains these estimates. We see good performance for most parameters. The posterior for the remaining rate parameter reflects the observation schedule; indeed, increasing the frequency of observation significantly improves this posterior. In particular, under complete observation, the posterior distributions approximately match the asymptotically normal confidence intervals.

| Parameter | True Value | MLE Estimate | MLE Lower CI | MLE Upper CI | Posterior Mean | Posterior 5% Quantile | Posterior 95% Quantile |
|---|---|---|---|---|---|---|---|
|  | 0.50 | 0.53 | 0.43 | 0.64 | 0.15 | 0.09 | 0.23 |
|  | 0.20 | 0.19 | 0.13 | 0.25 | 0.20 | 0.15 | 0.25 |
|  | 0.70 | 0.75 | 0.64 | 0.86 | 0.85 | 0.67 | 1.07 |
|  | 1.71 | 1.71 | 1.28 | 2.13 | 1.23 | 1.62 | 2.09 |

Removing the first 100 iterations as burn-in, posterior distributions are presented in Figure 0(b). Black curves correspond to the standard MCMC sampler; red curves to the approximate MCMC sampler with latent-process updates every few iterations. The distributions are approximately equal in all cases, with the largest discrepancies for the rate parameter discussed above. This supports the aforementioned difficulty of estimating that parameter under intermittent observation: the observation schedule is infrequent relative to the underlying stochastic dynamics. Complete (i.e., more frequent) observation significantly improves its estimation.

Beyond posterior distributions for the parameters, one is typically interested in posterior distributions of the survival functions. Note that there are two distinct sources of variation: (1) intermittent observation and (2) parameter uncertainty. The MCMC sampling procedure accounts for both, allowing a survival function to be constructed at each iteration of the MCMC sampler using that iteration's latent process and parameters. Figure 2 presents the pointwise median, 5%, and 95% quantiles of survival at every time since recruitment, for an individual who is healthy and ill at baseline respectively. The red curves are the true survival functions given healthy/ill status at baseline. The posteriors for the survival functions are almost exactly equal to the truth, suggesting intermittent observation did not significantly impact our ability to predict survival of future patients.
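Given survival curves evaluated on a common time grid at each retained MCMC iteration, the pointwise bands are simple columnwise quantiles; a minimal sketch, with names of our choosing:

```python
import numpy as np

def survival_band(survival_draws, probs=(0.05, 0.5, 0.95)):
    """Pointwise posterior summaries of survival curves.

    survival_draws: array of shape (n_mcmc, n_times), where row m holds
    S(t) on the common time grid for retained MCMC iteration m."""
    return {p: np.quantile(survival_draws, p, axis=0) for p in probs}
```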

## 9 A worked example: Cardiac allograft vasculopathy (CAV) data

To illustrate our methodology, we use data from angiographic examinations of 622 heart transplant recipients at Papworth Hospital in the United Kingdom. These data were downloaded from the R package http://cran.r-project.org/web/packages/msm maintained by Christopher Jackson. Cardiac allograft vasculopathy (CAV) is a deterioration of the arterial walls. Four states were defined for heart transplant recipients: no CAV, mild/moderate CAV, severe CAV, and dead. The transition graph is given in Figure 3. Yearly examinations occurred for up to 18 years following the transplant; mean follow-up time, however, is shorter. Survival times are observed and/or censored, but the CAV state was observed only at appointment times prior to death/censoring. For censored records, the censoring time is taken to be the final appointment time. Only a subset of the patients were observed in the mild/moderate CAV state at any point during their follow-up, and of these, some were subsequently observed in the severe CAV state. Likewise, only a subset were observed in the severe CAV state at any point during their follow-up, and of these, some were subsequently observed in a different state. There was no overlap between these two patient subsets.

We set the state space accordingly and assume the underlying process is a composable, exchangeable, Markov multi-state survival process. For identifiability, two of the parameters are set equal to one. As transitions from the severe state to the mild/moderate state occur but should not occur too often, the prior on the corresponding parameter is set to a Beta distribution. The rate parameters are given Gamma priors, and the remaining parameters have independent, standard log-normal priors. We use the approximate MCMC sampler to perform inference. Traceplots in Figure 3(a) suggest convergence after the first 100 iterations. The MCMC sampler yields posteriors for the parameters, whose means translate into marginal times until transition and marginal holding times measured in years. The posterior mean for the progression/regression parameter suggests that a patient is somewhat more likely to experience progression of CAV status than regression, and that, in the severe CAV state, disease progression is slightly more likely than failure. For the failure transitions, the posterior distributions overlap, suggesting that failures from the mild/moderate and severe states may occur at similar rates.

We next consider the posterior distributions of the survival functions. Figure 4(a) plots the median survival at each time over all iterations of the MCMC sampler, as well as the Kaplan-Meier survival function estimator. Note that at baseline all patients are in the no-CAV state; therefore, the Kaplan-Meier curve should be compared with the median survival curve given that the new patient is in that state. We see that the posterior survival curve is significantly lower. This reflects expected disease progression since baseline; that is, a new patient in the no-CAV state at baseline who is alive at a later time is likely to have seen a progression in disease state, leading to an increase in current risk of failure. Under the multi-state survival process model, the expected survival time from baseline is reported for each possible baseline state, alongside the expected survival time under the Kaplan-Meier estimator. We also include the 5% and 95% quantiles of the survival function at each time for a patient in the no-CAV state at baseline. Figure 4(b) plots the median survival over all iterations of the MCMC sampler given that the patient is alive at a given time. Under the multi-state survival process model, the expected survival time from time