Presentation of a novel method to identify the location and number of change-points in a piecewise exponential model.
Method implemented within a freely available R package which produces outputs relevant for health economists conducting survival analysis.
Particularly useful for identifying the final change-point after which the observed hazards are approximately constant.
Addresses some of the key concerns raised in NICE TSD 21 regarding piecewise exponential models.
In survival analysis, one is interested in the lifetime of an individual and its hazard rate function. The hazard rate quantifies the instantaneous failure rate of a subject who has not failed at a given time point. Because survival probabilities are directly related to the integral of the hazard function, changes in this function over time are of interest in a variety of situations. Parametric survival models allow the hazard function to vary over time (e.g. the monotonic change associated with the Weibull distribution); however, there are examples in which there appears to be a sudden change in the hazard. In these situations, piecewise exponential models, which allow the hazard to change at distinct time points but constrain it to be constant within each interval, can provide a better fit to the observed data and allow for straightforward interpretation of the hazard function.
Davies et al. (2013) discuss some of the issues regarding extrapolating treatment benefits for new technologies based on limited clinical trial data while Kearns et al. (2019) highlight that standard parametric models may not be flexible enough to accurately fit the observed data. One approach to extrapolating long term survival is noted by Bagust and Beale (2014) who suggest identifying whether there is evidence of long-term linear trends in each arm in the latter part of the data, when transient effects have dissipated. They suggested assessing the cumulative hazard plots, whereby a linear trend indicates a constant hazard, however, because visual inspection is subjective a statistical analysis is more appropriate.
Following Matthews and Farewell (1982), the problem of estimating a single change-point using Maximum Likelihood (ML) estimation, when the hazards are unknown, was extensively studied. Yao (1986) showed that the likelihood is maximised at event times and that a change-point obtained by ML estimation is consistent i.e. it approaches the true change-point value as the sample size increases. Goodman et al. (2011) used a Wald type statistic with an alpha spending function to preserve Type 1 error when considering multiple change-point models while Han et al. (2014) used a likelihood ratio test with backward elimination. Comprehensive reviews of the technical aspects of change-point analysis of hazard functions are provided by Anis (2009) and Müller and Wang (1994).
A key issue for the (frequentist) approaches described above is that the uncertainty in the change-point(s) and hazards is not assessed, which is a key feature of interest in decision making. While Loader (1991) proposed a likelihood ratio method to find confidence regions for the change-point, this has yet to be extended to multiple change-points. Additionally, frequentist procedures rely on null distributions which often require technical assumptions and conditions that are difficult to verify in practice and may not hold for small-to-moderate sample sizes. Even with larger sample sizes, likelihood ratio tests will favour more complex models even when a simpler model fits the data adequately (Raftery, 1986).
In contrast to frequentist methods, Bayesian approaches do not require asymptotics, instead using a set of prior beliefs which are updated using information from an observed sample. Bayesian approaches can readily characterize the uncertainty associated with the hazards and the location of change-points. Arjas and Gasbarra (1994) introduced a Bayesian approach for multiple change-point estimation using Gibbs sampling. Kim et al. (2020) use a stochastic approximation Monte Carlo algorithm to identify which particular number and location of change-points gives the highest log-posterior values. They allow the sampler to move between different change-point models as part of the estimation procedure, however, they do not present the relative probabilities of models with different numbers of change-points. Chapple et al. (2020) estimate piecewise exponential (and piecewise log-linear) models using reversible jump MCMC methods (Green, 1995).
In this paper we introduce a novel method for the estimation of piecewise exponential models with multiple change-points. We apply a reversible jump algorithm to a collapsed change-point model (Wyse and Friel, 2010). Collapsing the model leads to highly efficient MCMC performance and accurate estimates of the number and location of change-points. Estimation of the hazard rates of individual change-point segments, or of the overall behaviour of the hazard function, is straightforward using a post-hoc routine.
The rest of this article is organized as follows. In Section 2 we discuss the exponential and piecewise exponential models, the latter of which allows for changes in the hazard function. We also describe the approach to estimate the number of changes in the hazard function of a piecewise exponential model. Section 3 presents a simulation study to determine the sample sizes and changes in hazards required for the adequate estimation of the change-point location and frequency. In Section 4 we apply the method to two real data sets that record time to death of Glioblastoma patients and time to death of heart transplant patients. The former data set has only recently become publicly available and, to our knowledge, has not previously been analysed using change-point models. A discussion of the methods concludes the paper in Section 5. The proposed method is implemented in an R package which can be installed via GitHub (link in Supporting Information).
2 Constant Hazard and Change-point Survival Models
In this section, exponential and piecewise exponential models are discussed. The rationale for transforming time-to-event data into times between events is also explained. We then present the model which estimates the number and location of the change-points.
2.1 Exponential model
The simplest possible survival distribution is obtained by assuming a constant risk over time, so the hazard is $h(t) = \lambda$ for all $t \ge 0$. The corresponding survival function is
$$S(t) = \exp(-\lambda t)$$
and is known as the exponential distribution. The density function is then $f(t) = \lambda \exp(-\lambda t)$. Consider a sample of $n$ observations of survival times $t_1, \dots, t_n$, being time ordered, some of which may be censored. The likelihood function may be written as
$$L(\lambda) = \prod_{i=1}^{n} h(t_i)^{\delta_i} S(t_i) = \prod_{i=1}^{n} \lambda^{\delta_i} \exp(-\lambda t_i),$$
with $\delta_i = 1$ if the $i$th subject failed and 0 if censored.
Taking natural logarithms, and noting that $\log S(t_i)$ is equal to the negative cumulative hazard function $-H(t_i)$, we obtain the log-likelihood function
$$\ell(\lambda) = \sum_{i=1}^{n} \left\{ \delta_i \log h(t_i) - H(t_i) \right\}.$$
The cumulative hazard is $H(t) = \lambda t$. Letting $d = \sum_{i=1}^{n} \delta_i$ denote the total number of observed deaths, and $T = \sum_{i=1}^{n} t_i$ denote the total observation (or exposure) time, we can rewrite the log-likelihood as a function of these totals to obtain
$$\ell(\lambda) = d \log \lambda - \lambda T \quad \text{and} \quad \hat{\lambda} = d/T.$$
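As a concrete illustration of these totals (a minimal sketch in Python rather than the paper's R implementation; the data and all names are invented for the example), the sufficient statistics $d$ and $T$ and the resulting ML estimate can be computed directly from (time, status) pairs:

```python
import math

def exp_loglik(lam, times, status):
    """Exponential log-likelihood ell(lam) = d*log(lam) - lam*T,
    where d is the number of observed deaths and T the total
    observation (exposure) time."""
    d = sum(status)   # status 1 = observed death, 0 = censored
    T = sum(times)    # total exposure time
    return d * math.log(lam) - lam * T

# Three deaths and one observation censored at 2.0 years.
times = [0.5, 1.2, 2.0, 2.0]
status = [1, 1, 1, 0]
lam_hat = sum(status) / sum(times)   # ML estimate d / T = 3 / 5.7
print(round(lam_hat, 3))             # 0.526
```

The censored subject contributes exposure time but no event, which is why $\hat{\lambda}$ uses the full $T$ in the denominator.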
This distribution plays a central role in survival analysis, although it is commonly too simple to be useful in applications in its own right. An extension of the exponential model which allows the hazard to change across intervals, called the piecewise exponential model, is therefore discussed in the subsequent section.
2.2 Piecewise Exponential Model
A change-point occurs at observation $\tau$ if $t_1, \dots, t_\tau$ are generated differently to $t_{\tau+1}, \dots, t_n$. In a piecewise constant model with one change-point, this requires that the segments $t \le \tau$ and $t > \tau$ have a constant hazard within the segment, but independent hazards between segments. It is assumed that the change-points occur at a particular event time (and not a censoring time). Multiple change-points at specific event times can be denoted as a vector $\tau = (\tau_1, \dots, \tau_k)$, with these change-points splitting the data into $k+1$ segments. Writing $\tau_0 = 0$ and $\tau_{k+1} = \infty$, the likelihood of the piecewise exponential model can be formulated as follows
$$L = \prod_{i=1}^{n} \prod_{j=1}^{k+1} \lambda_j^{\delta_i I_{ij}} \exp\left\{ -\lambda_j \left( \min(t_i, \tau_j) - \tau_{j-1} \right)_{+} \right\},$$
with $\delta_i = 1$ if the $i$th subject was observed to fail and 0 otherwise, where $I_{ij} = 1$ if $\tau_{j-1} < t_i \le \tau_j$ and 0 otherwise, and where $(x)_+ = \max(x, 0)$.
By omitting the potential for covariates and restricting ourselves to discrete change-points, it should be noted that there is no loss of information in recasting the time ordered data as times between individual event times. We let $N$ be the combined number of event times and right censored survival times. For notational ease, we assume here that only one individual dies at each time, so that there are no ties in the data; however, the model implementation allows for tied events. Denote the ordered distinct event times by $t_{(1)} < t_{(2)} < \dots < t_{(m)}$, so that $t_{(j)}$ is the $j$th ordered event time. The set of individuals who are at risk at time $t_{(j)}$ will be denoted by $\mathcal{R}(t_{(j)})$ (the risk-set), so that $\mathcal{R}(t_{(j)})$ is the set of individuals who are event-free and uncensored at a time just prior to $t_{(j)}$. We define $w_j$ as the total (sample) time between events $t_{(j-1)}$ and $t_{(j)}$, with $t_{(0)} = 0$, as
$$w_j = |\mathcal{R}(t_{(j)})| \left( t_{(j)} - t_{(j-1)} \right) + \sum_{i :\, t_{(j-1)} < c_i < t_{(j)}} \left( c_i - t_{(j-1)} \right),$$
where $c_i$ denotes a censoring time. This is composed of the difference between event times $t_{(j)} - t_{(j-1)}$, multiplied by the size of the risk set at the event time, plus the difference between any censored observations $c_i$ and the previous event time $t_{(j-1)}$, provided they occurred within the interval.
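The computation of the exposures $w_j$ described above can be sketched as follows (an illustrative Python fragment; function and variable names are ours, not the package's, and censoring after the last event is ignored for brevity):

```python
def interval_exposures(event_times, censor_times):
    """Total sample time w_j accrued between consecutive event times:
    (number at risk at t_(j)) * (t_(j) - t_(j-1)), plus the partial
    exposure c_i - t_(j-1) of subjects censored inside the interval."""
    w, prev = [], 0.0
    for t in sorted(event_times):
        # at risk just before t: anyone whose event or censoring time is >= t
        at_risk = sum(e >= t for e in event_times) + sum(c >= t for c in censor_times)
        # partial exposure of subjects censored strictly inside (prev, t)
        partial = sum(c - prev for c in censor_times if prev < c < t)
        w.append(at_risk * (t - prev) + partial)
        prev = t
    return w

# Events at times 1 and 3 with one subject censored at time 2:
print(interval_exposures([1.0, 3.0], [2.0]))   # [3.0, 3.0]
```

In the example, three subjects are at risk over the first interval (exposure 3 × 1), while the second interval accrues 1 × 2 from the remaining subject plus 1 from the subject censored at time 2, so the total exposure 6 matches the sum of all observed times.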
We can re-express the likelihood of the piecewise exponential model in terms of $w = (w_1, \dots, w_m)$. Let $d = (d_1, \dots, d_m)$ be a vector representing the number of events which have occurred at each of the elements of $w$; with no ties, $d_j = 1$ at each event time. The likelihood contribution of interval $j$, which falls in change-point segment $s(j)$, is $\lambda_{s(j)}^{d_j} \exp(-\lambda_{s(j)} w_j)$. Censored observations are also allowed, providing exposure time within intervals without an event. The likelihood is then
$$L = \prod_{j=1}^{m} \lambda_{s(j)}^{d_j} \exp\left( -\lambda_{s(j)} w_j \right).$$
2.3 Estimation using Collapsing Change-point Approach
Markov chain samplers that jump between models with different numbers of change-points allow us to estimate posterior probabilities for candidate models while also estimating the location of change-points within each model. Introducing priors $p(k)$, $p(\tau \mid k)$ and $p(\lambda \mid \tau, k)$ for the change-point number, change-point locations and hazards, respectively, means that we can treat the number of change-points as a random quantity to be inferred. The model posterior then becomes
$$p(k, \tau, \lambda \mid y) \propto L(y \mid k, \tau, \lambda) \, p(\lambda \mid \tau, k) \, p(\tau \mid k) \, p(k).$$
Following the approach outlined by Wyse and Friel (2010), if we regard the hazards as nuisance parameters and integrate them out, the posterior density of the change-point number and their respective locations is proportional to
$$p(k, \tau \mid y) \propto p(k) \, p(\tau \mid k) \prod_{j=1}^{k+1} m(y_j),$$
where $m(y_j)$ denotes the marginal likelihood of the $j$th data segment. Adopting a common, independent Gamma$(\alpha, \beta)$ prior for each hazard $\lambda_j$ makes this quantity straightforward to calculate; see the Appendix for full details.
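By Gamma–exponential conjugacy, integrating the hazard out of a segment with $D$ events and total exposure $W$ gives the closed form $m = \beta^\alpha \, \Gamma(\alpha + D) / \{\Gamma(\alpha) (\beta + W)^{\alpha + D}\}$. A log-scale sketch in Python (function names are ours, not the package's):

```python
from math import lgamma, log

def log_marginal(D, W, alpha=1.0, beta=1.0):
    """log m(y_j) = log[ beta^alpha * Gamma(alpha + D)
                         / (Gamma(alpha) * (beta + W)^(alpha + D)) ]
    for a segment with D events and exposure W, after integrating
    out lambda ~ Gamma(alpha, beta)."""
    return (alpha * log(beta) + lgamma(alpha + D)
            - lgamma(alpha) - (alpha + D) * log(beta + W))

def log_split_ratio(D1, W1, D2, W2, alpha=1.0, beta=1.0):
    """Log marginal-likelihood ratio for splitting one segment in
    two, the quantity needed by a birth move of the sampler."""
    return (log_marginal(D1, W1, alpha, beta)
            + log_marginal(D2, W2, alpha, beta)
            - log_marginal(D1 + D2, W1 + W2, alpha, beta))
```

Working on the log scale avoids overflow in $\Gamma(\alpha + D)$ for segments with many events.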
Because the marginal likelihood of each data segment is available in closed form, a switch from $k$ to $k+1$ change-points, or vice-versa, does not require the design of a bijective function between support subspaces. Changes to the change-point number are proposed and accepted with Metropolis-Hastings probability $\min(1, A)$ where
$$A = \frac{p(k') \, p(\tau' \mid k') \prod_j m(y'_j)}{p(k) \, p(\tau \mid k) \prod_j m(y_j)} \times \frac{q(k \mid k')}{q(k' \mid k)},$$
with primes denoting the proposed model.
The ratio of the marginal likelihoods is straightforward to compute and can be expressed as
$$\frac{m(y_j^{(1)}) \, m(y_j^{(2)})}{m(y_j)},$$
where the location of the additional change-point is denoted by $\tau^*$. When adding a change-point in the proposal step, one of the event times where there could be a change-point is randomly selected. If this point occurs in segment $y_j$, segments $y_j^{(1)}$ and $y_j^{(2)}$ are obtained, from which we calculate the marginal likelihoods and prior densities in Equation 4. When deleting a change-point, one of the $k$ change-points is randomly selected, and $y_j$ becomes the new data segment where $y_j^{(1)}$ and $y_j^{(2)}$ were the segments before deletion.
The probability of proposing the addition of a change-point for a model with $k$ change-points is $b_k$, and $d_k$ is the probability of proposing the removal of a change-point for a model with $k$ change-points. Clearly $b_k + d_k = 1$, with $b_{k_{\max}} = 0$ and $d_0 = 0$ for $k_{\max}$, the largest change-point number under consideration, and $b_k = d_k = 1/2$ for the other change-point numbers. The proposal one-step transition probabilities for the number of change-points are $q(k+1 \mid k) = b_k$ and $q(k-1 \mid k) = d_k$.
Following the change-point number proposal step, a single change-point location is also sampled at each iteration. One of the $k$ change-points, $\tau_j$, is randomly selected, and its location sampled from the candidate event times between $\tau_{j-1}$ and $\tau_{j+1}$ with probability
$$p(\tau_j = t \mid \cdot) \propto p(\tau \mid k) \, m(y_j) \, m(y_{j+1}),$$
where the two segments adjacent to $\tau_j$ are those formed by the candidate location $t$.
Regarding priors, we assign a Poisson prior for the number of change-points $k$; in the examples that follow, we set its rate to 1. The prior for the change-point locations is as in Fearnhead (2006), with $N$ referring to the number of events. For the prior for each hazard, we adopt a common Gamma$(\alpha, \beta)$ prior, with the hyperparameter values chosen to match the timescale of the data (years, days or months).
Although we integrate out the hazard parameters from the model during this estimation scheme, it is possible to estimate the hazards for a given change-point model, using the already sampled change-point locations, by simulating draws from the conditional distribution of $\lambda_j$. In effect, this introduces an extra sampling step, in which the hazards are “uncollapsed” and sampled at each iteration before once again being collapsed prior to sampling the change-point number and locations, albeit in a post-hoc fashion. The conditional distribution of $\lambda_j$ is a Gamma distribution with shape $\alpha + D_j$ and rate $\beta + W_j$, where $D_j$ and $W_j$ denote the total number of events and total exposure time in segment $j$.
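This uncollapsing step amounts to independent Gamma draws, one per segment. A minimal Python sketch (the actual implementation is the paper's R package; note that `random.gammavariate` takes a scale, i.e. 1/rate, as its second argument):

```python
import random

def sample_hazards(D_seg, W_seg, alpha=1.0, beta=1.0, rng=random):
    """Draw each segment hazard lambda_j from its Gamma conditional:
    shape alpha + D_j (events in segment j), rate beta + W_j
    (exposure in segment j)."""
    return [rng.gammavariate(alpha + D, 1.0 / (beta + W))
            for D, W in zip(D_seg, W_seg)]
```

Repeating this draw across the stored posterior samples of change-point locations yields posterior draws of the whole hazard function, which is how the hazard plots in Section 4 can be produced.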
3 Simulation Study
We conducted a simulation study to investigate the accuracy with which the model estimated the hazards, identified the locations of the change-points, and selected the correct number of change-points. We simulated data from models with zero, one and two change-points. For each model we varied the sample size and the characteristics of the hazard function. Data from each scenario were simulated 500 times.
For the no change-point model we considered different amounts of censoring, from 0% to 50%. For one change-point models we simulated datasets with increasing and decreasing hazards, varying the difference in the hazard between intervals along with the sample size, with a change-point at 0.5, while for two change-point models we also considered bathtub and inverted bathtub hazards, with change-points at 0.5 and 1.
We assumed a study follow up of 2 years, and observations with a survival time greater than this were censored. For some simulation studies we assessed the impact of censoring within the study. The censoring percentages refer to the expected proportion of events within the study follow up which are censored. If a censoring percentage of $x\%$ was required, censoring and event times were generated for $2x\%$ (i.e. double the required percentage) of the dataset, with the censored times following the same piecewise distribution as the event times. This ensured that the censoring of the events occurred with approximately equal probability throughout the study follow up.
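The paper samples piecewise exponential event times with the rpwexp function of the R package hesim; as an independent illustration, the same inversion sampling with administrative censoring at follow-up can be sketched in Python (all names here are ours):

```python
import math
import random

def rpwexp_draw(rng, hazards, breaks):
    """One piecewise exponential draw by inverting the cumulative
    hazard; hazards[i] applies on [breaks[i], breaks[i+1]), with the
    final segment extending to infinity (breaks[0] must be 0)."""
    target = -math.log(rng.random())   # required cumulative hazard
    cum = 0.0
    for i, lam in enumerate(hazards):
        start = breaks[i]
        end = breaks[i + 1] if i + 1 < len(breaks) else math.inf
        seg = lam * (end - start)      # hazard accrued over this segment
        if cum + seg >= target:
            return start + (target - cum) / lam
        cum += seg
    return math.inf

def simulate(rng, n, hazards, breaks, followup=2.0):
    """Event times with administrative censoring at study follow-up."""
    times, status = [], []
    for _ in range(n):
        t = rpwexp_draw(rng, hazards, breaks)
        times.append(min(t, followup))
        status.append(1 if t <= followup else 0)
    return times, status
```

A bathtub scenario, for instance, would use `hazards=[0.75, 0.2, 0.75]` with `breaks=[0, 0.5, 1]`, matching the two change-point settings above.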
The technique presented by Castelloe and Zimmerman (2002) was used to determine the appropriate chain length. The Potential Scale Reduction Factor (PSRF) remained below 1.02 after around 100 iterations of the model, suggesting adequate mixing beyond this point. To ensure convergence, the model was run for 20,750 iterations, with the first 750 discarded, for each of the simulation studies detailed below.
3.1 Simulation Study Results
We tested the proposed method’s ability to detect the absence of a change-point. We calculated the number of times our model chose the (correct) null model with no change-points for 500 simulated data sets of a particular sample size, hazard and degree of censoring (0% or 50%). Table 1 shows that the collapsing model selects the null model approximately 95% of the time, irrespective of the sample size, hazard or censoring.
| True hazard | Probability Correct (%) | Censoring (%) |
Results for one and two change-point models are reported in Table 2. These report the frequency with which the correct number of change-points was identified, and the average value of $\hat{\tau}$, the posterior mean of the change-point(s) (associated standard error in parentheses), for the change-point model when the correct change-point model was selected. Also reported are the simulated hazards for each interval. For clarity of exposition we omit the expected posterior mean of the hazards and its standard error, noting that the accuracy of hazard estimation is determined by the accuracy of the change-point locations. Piecewise exponential event times were sampled using the rpwexp function from the R package hesim by Incerti and Jansen (2020).
For the one and two change-point simulation studies, large sample sizes and/or large changes in hazards resulted in the correct model being selected with high probability. When changes in hazards are relatively large, the correct model is selected with high probability at all sample sizes, while for smaller changes moderate to large samples are required. Similarly, $\hat{\tau}$ is closer to the true values of the change-point(s) and has a smaller standard error when there are large differences between the hazards and/or large sample sizes.
| Model | True hazards | $\hat{\tau}$, n = 300 | $\hat{\tau}$, n = 500 | $\hat{\tau}$, n = 1000 |
|---|---|---|---|---|
| Increasing Small | 0.5, 0.75 | 0.60 (0.15) | 0.56 (0.12) | 0.52 (0.07) |
| Increasing Large | 0.25, 0.75 | 0.52 (0.04) | 0.51 (0.02) | 0.50 (0.01) |
| Decreasing Small | 0.75, 0.5 | 0.52 (0.14) | 0.52 (0.11) | 0.51 (0.08) |
| Decreasing Large | 0.75, 0.25 | 0.49 (0.05) | 0.49 (0.03) | 0.50 (0.01) |
| Increasing (1st change-point) | 0.25, 0.5, 0.75 | 0.56 (0.10) | 0.52 (0.06) | 0.51 (0.03) |
| Increasing (2nd change-point) | | 1.19 (0.14) | 1.11 (0.13) | 1.03 (0.08) |
| Decreasing (1st change-point) | 0.75, 0.5, 0.25 | 0.34 (0.10) | 0.42 (0.09) | 0.47 (0.06) |
| Decreasing (2nd change-point) | | 0.96 (0.10) | 1.01 (0.09) | 1.00 (0.05) |
| Bathtub (1st change-point) | 0.75, 0.2, 0.75 | 0.48 (0.03) | 0.49 (0.02) | 0.50 (0.01) |
| Bathtub (2nd change-point) | | 1.02 (0.04) | 1.01 (0.02) | 1.00 (0.01) |
| Inverted Bathtub (1st change-point) | 0.2, 0.75, 0.2 | 0.51 (0.03) | 0.50 (0.01) | 0.50 (0.01) |
| Inverted Bathtub (2nd change-point) | | 0.99 (0.03) | 0.99 (0.03) | 1.00 (0.01) |
4 Applications
We applied our methods to two real data sets. In our first application, we investigate how the method can be used to explore the behaviour of the hazard. In our second example, we assess the performance of the change-point model in comparison with several popular survival models in the context of survival extrapolation.
4.1 Glioblastoma data: Identifying trends in the hazard function
One potential application of hazard change-point analysis is the visualization of the hazard function itself. For each simulation, we “uncollapse” the hazards conditional on the change-points, and plot both the posterior draws and quantiles of the hazard function. We compare the hazard function estimated from the Gibbs sampler with approaches documented by Hagar and Dukic (2015), who review a variety of packages used to estimate the hazard in time to event data for the statistical software R.
We consider data relating to survival times for Glioblastoma, a central nervous system cancer in which prognosis remains poor. The data is available in the R package RTCGA developed by Kosinski and Biecek (2020) and contains a sample of 595 patients, of which 446 experience death. The median survival is 1 year, with only a small percentage of patients surviving after 5 years.
Figure 1 below provides an estimate of the hazard of death for the Glioblastoma data using three approaches. The first approach, coloured in black (twodash line), divides the time interval into bins of equal width (in this case one year intervals), and then estimates the hazard in each bin as the number of events in that bin divided by the number of patients at risk in that interval, giving the hazard for that interval (see R package muhaz by Hess and Gentleman (2019)). The second approach uses B-splines from a generalized linear model perspective to estimate a smoothed hazard function along with confidence regions (see R package bshazard by Rebora, Salim and Reilly (2018)), coloured in blue (longdash line) with confidence regions in grey. The third approach plots the posterior median of the hazard function from our model. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
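For intuition, a crude binned estimator of this kind can be sketched in a few lines of Python; note that this variant divides by the person-time accrued in each bin rather than by a head count of patients at risk, and all names are illustrative rather than the muhaz implementation:

```python
def binned_hazard(times, status, width=1.0, max_t=5.0):
    """Per-bin hazard estimate: events occurring in the bin divided
    by the total person-time at risk accrued within the bin."""
    edges = [i * width for i in range(int(round(max_t / width)) + 1)]
    rates = []
    for lo, hi in zip(edges, edges[1:]):
        events = sum(1 for t, s in zip(times, status) if s == 1 and lo <= t < hi)
        # each subject contributes time from lo until exit or the bin's end
        exposure = sum(max(0.0, min(t, hi) - lo) for t in times)
        rates.append(events / exposure if exposure > 0 else float("nan"))
    return rates
```

Such an estimator is a step function by construction, which is why the piecewise exponential model, with data-driven rather than equal-width intervals, is a natural refinement.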
From Figure 1 it appears that the hazard peaks at 1-2 years and then falls thereafter. The posterior distributions of the change-points are concentrated at times 0.85 and 2.25 and their 95% credible intervals do not overlap. The posterior distribution of the hazard in each interval also does not overlap with that of the adjacent interval, suggesting a clear change in the hazards between each interval. Using the posterior medians of the parameters we can surmise that there are three distinct intervals: in the first interval, up to approximately 0.85 years, there is a moderately large hazard; from approximately 0.85 to 2.25 years the hazard peaks; and it falls thereafter. This finding is consistent with the other methods of estimating the hazard, which, however, do not explicitly identify a region of peaked hazard. The finding that the hazard is peaked is consistent with Wang et al. (2015); however, the long term drop in hazards is more pronounced for patients in this dataset.
4.2 Predicting survival by extrapolating constant hazards
Miller and Halpern (1982) presented survival times for 184 patients who received heart transplants. Visual inspection of the cumulative hazard plot suggests that after 1 year the hazards are approximately constant (i.e. linear). Assuming this to be correct, we artificially censored the data at 2 years and fit the piecewise exponential model and other commonly used survival distributions to this data (using the JAGS program by Plummer (2003)). For each model we assessed the statistical fit to the partially observed data and the difference in predicted survival to the fully observed data.
Statistical fit was assessed through two measures, the Pseudo-Marginal Likelihood (PML) and the Widely Applicable Information Criterion (WAIC), with details on their respective computation available in Gelfand and Dey (1994) and Watanabe (2010).
Similar to the previous section, we “uncollapse” the hazards at each simulation and calculate the survival function, with the mean posterior survival being the average of these survival probabilities at each timepoint. For the piecewise exponential model we found that the 2 change-point model had the highest posterior probability. The posterior mean of the first change-point is 0.18 years, at which the hazard falls from a posterior mean of 1.56 to 0.42. The posterior mean of the second change-point was 0.81 years, after which the posterior mean of the hazard was 0.16. Figure 2 highlights that both the piecewise exponential and Weibull models provide good estimates of the long term survival (with the generalized gamma reducing to the Weibull distribution); however, the Weibull distribution does not fit as well to the early part of the trial. Table 3 presents the PML and WAIC (evaluated using the R package loo by Vehtari et al. (2020)) for each of the models fit to the partially observed data. Also presented is the Area Under the Curve (AUC) based on the extrapolated survival for each model, along with the cumulative sum of the absolute difference (all in years) between the fully observed Kaplan Meier (KM) curve and each model (Absolute Difference). Consistent with the hypothesis that the long term hazards were approximately constant, the piecewise exponential approach is the best fit to the true data, both in terms of statistical fit and deviation in terms of AUC.
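The survival function implied by a piecewise constant hazard is $S(t) = \exp\{-H(t)\}$, with $H(t)$ accumulating each segment's hazard times the time spent in it. A small Python sketch evaluating this at the posterior means quoted above (illustrative only; the analysis itself averaged survival curves over posterior draws rather than plugging in point estimates):

```python
import math

def pw_exp_survival(t, hazards, breaks):
    """S(t) = exp(-H(t)) for a piecewise constant hazard, where
    hazards[i] applies on [breaks[i], breaks[i+1])."""
    H = 0.0
    for i, lam in enumerate(hazards):
        start = breaks[i]
        end = breaks[i + 1] if i + 1 < len(breaks) else math.inf
        if t <= start:
            break
        H += lam * (min(t, end) - start)   # hazard accrued in this segment
    return math.exp(-H)

# Posterior means reported above for the heart-transplant data
# (change-points at 0.18 and 0.81 years; hazards 1.56, 0.42, 0.16):
hazards = [1.56, 0.42, 0.16]
breaks = [0.0, 0.18, 0.81]
print(round(pw_exp_survival(1.0, hazards, breaks), 3))   # 0.562
```

Beyond the final change-point the curve decays at the constant rate 0.16, which is exactly the extrapolation behaviour assessed against the fully observed Kaplan Meier curve.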
* -2log(PML) was calculated to place it on the same scale as WAIC. Lower values indicate better fit.
5 Discussion
We have presented a Bayesian approach to determining the number and location of change-points in a hazard function, including the special case where no change-point exists. By employing a Bayesian approach, uncertainty around the number of change-points is automatically computed and is described with a probabilistic interpretation, allowing us to assess the relative evidence of alternative change-point models.
This approach takes advantage of the fact that the marginal likelihood for a piecewise exponential model without covariates can be expressed analytically. By restricting the change-points to event times, we reduce the complexity of the parameter space, resulting in a simpler and computationally efficient algorithm. While the approach of Chapple et al. (2020) is clearly more general, in that it allows covariates and continuous change-points, we found that in some examples the change-points were highly correlated, owing to the relative infrequency of moves between model dimensions, and that for smaller changes in the hazard, change-points were not detected. Based on extensive testing on real world datasets and simulated examples, our approach demonstrated good mixing and rapid convergence throughout, attributes which are often difficult to achieve in problems in which the dimension of the parameter space can vary. This suggests that it is advantageous to adopt the collapsed approach introduced in this paper when covariates are not present.
As with any Bayesian analysis, the inferences are somewhat informed by the choice of prior. We consider a discrete prior on the change-point locations, which has the advantage of ensuring that the change-points are not too close together or too close to the final events, where there is typically a sparsity of data. Because model selection is based on the evaluation of the marginal likelihood, this calculation can be sensitive to the hyperparameters used, and we provide an approach to specify a hyperprior for the hyperparameter, which we describe in Web Appendix 1. The Poisson prior on the change-point number is reasonably robust to alternative specifications; however, in individual examples where posterior model probabilities are similar, it will naturally have some effect. We believe that a rate equal to 1 for this Poisson prior is an appropriately parsimonious a priori choice.
Our simulation study demonstrates the ability of the algorithm to detect change-points when sample size and/or change in hazards is large along with the consistency of the estimators. As demonstrated by the no change-point simulation study, the model has a low probability of detecting the presence of change-points when they do not exist.
In this paper we have presented two real world applications of the models. Regarding the Glioblastoma data, our approach segments the hazard function into distinct intervals, which may allow greater interpretability of the trends in the hazard function even when the piecewise exponential model is not being considered for survival extrapolation. In situations where constant long term hazards are plausible, we believe that a piecewise exponential model should be considered. Although we artificially censored the data for the purpose of our example, it is reasonable to hypothesize that these heart transplant patients may be subject to different hazards as time progresses. Patients are likely subject to high hazards of death during or immediately after a complex surgical procedure such as a heart transplant. Over the initial number of months, patients are likely to be at an elevated risk of transplant rejection, and many events may occur over this period. If patients do not reject their transplanted organ, over the long term they are subject to a lower hazard associated with all-cause mortality. With that point in mind, piecewise exponential models (and any parametric model) should always be adjusted to ensure the extrapolated hazards do not fall below general population mortality.
Finally, we note the relatively recent Technical Support Document (TSD) regarding flexible survival models (Rutherford et al., 2020). Regarding the use of piecewise exponential models in health technology assessment, they state that “the cut-points for the various intervals may be arbitrary and may importantly influence the results of an analysis” and that “splitting the data into sections according to time means that sample sizes are reduced in later segments of the curve”. We believe our approach addresses both of these limitations: firstly, the location (and number) of change-points is informed by the data; secondly, the prior we use for the change-points reduces the probability that change-points close together or close to the final event will be selected. Rutherford et al. (2020) also highlight situations in which the Kaplan Meier survival function is used to represent the initial section of the survival function and an exponential function is adjoined at a predetermined point of the Kaplan Meier curve. In this situation, our approach could also be used to determine the final change-point, and its associated uncertainty, from which the constant hazard is extrapolated.
6 Declaration of Conflicting Interests
Philip Cooney is an employee of Novartis Pharmaceutical Company and they have sponsored his PhD fees, however, Novartis has no input on the direction, output or content of the PhD. All statements within this document are the authors own opinions and do not represent the views of Novartis Pharmaceutical Company. Arthur White has no conflicts of interest to declare.
- A. E. Gelfand (1994) A. E. Gelfand, D. K. D. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society. Series B (Methodological) 56, 501–514.
- Anis (2009) Anis, M. Z. (2009). Inference on a Sharp Jump in Hazard Rate: A Review. Stochastics and Quality Control 24, 213–229.
Arjas, E. and Gasbarra, D. (1994).
Nonparametric bayesian inference from right censored survival data, using the gibbs sampler.Statistica Sinica 4, 505–524.
- Bagust and Beale (2014) Bagust, A. and Beale, S. (2014). Survival analysis and extrapolation modeling of time-to-event clinical trial data for economic evaluation: An alternative approach. Medical Decision Making 34, 343–351. PMID: 23901052.
- Castelloe and Zimmerman (2002) Castelloe, J. M. and Zimmerman, D. L. (2002). Convergence Assessment for Reversible Jump MCMC Samplers. Technical report, University of Iowa.
- Chapple et al. (2020) Chapple, A., Peak, T., and Hemal, A. (2020). A novel bayesian continuous piecewise linear log-hazard model, with estimation and inference via reversible jump markov chain monte carlo: Na. Statistics in Medicine 39,.
- Davies et al. (2013) Davies, C., Briggs, A., Lorgelly, P., Garellick, G., and Malchau, H. (2013). The “hazards” of extrapolating survival curves. Medical Decision Making 33, 369–380. PMID: 23457025.
- Fearnhead (2006) Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing 16, 203–213.
- Goodman et al. (2011) Goodman, M. S., Li, Y., and Tiwari, R. C. (2011). Detecting multiple change points in piecewise constant hazard functions. Journal of applied statistics 38, 2523–2532.
- Green (1995) Green, P. J. (1995). Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika 82, 711–732.
- Hagar and Dukic (2015) Hagar, Y. and Dukic, V. (2015). Comparison of hazard rate estimation in R.
- Han et al. (2014) Han, G., Schell, M. J., and Kim, J. (2014). Improved survival modeling in cancer research using a reduced piecewise exponential approach. Statistics in medicine 33, 59–73.
- Hess and Gentleman (2019) Hess, K. and Gentleman, R. (2019). muhaz: Hazard Function Estimation in Survival Analysis. R package version 18.104.22.168 (accessed Feb 01, 2021).
- Incerti and Jansen (2020) Incerti, D. and Jansen, J. P. (2020). hesim: Health-Economic Simulation Modeling and Decision Analysis. R package version 0.3.1 (accessed Feb 01, 2021).
- Kearns et al. (2019) Kearns, B., Stevenson, M. D., Triantafyllopoulos, K., and Manca, A. (2019). Generalized linear models for flexible parametric modeling of the hazard function. Medical Decision Making 39, 867–878. PMID: 31556792.
- Kim et al. (2020) Kim, J., Cheon, S., and Jin, Z. (2020). Bayesian multiple change-points estimation for hazard with censored survival data from exponential distributions. Journal of the Korean Statistical Society 49, 15–31.
- Kosinski and Biecek (2020) Kosinski, M. and Biecek, P. (2020). RTCGA: The Cancer Genome Atlas Data Integration. R package version 1.18.0 (accessed Feb 01, 2021).
- Loader (1991) Loader, C. R. (1991). Inference for a Hazard Rate Change Point. Biometrika 78, 749–757.
- Matthews and Farewell (1982) Matthews, D. E. and Farewell, V. T. (1982). On Testing for a Constant Hazard against a Change-Point Alternative. Biometrics 38, 463–468.
- Miller and Halpern (1982) Miller, R. and Halpern, J. (1982). Regression with Censored Data. Biometrika 69, 521–531.
- Müller and Wang (1994) Müller, H. G. and Wang, J.-L. (1994). Change-Point Models for Hazard Functions. Lecture Notes-Monograph Series 23, 224–241.
- Rebora et al. (2018) Rebora, P., Salim, A., and Reilly, M. (2018). bshazard: Nonparametric Smoothing of the Hazard Function. R package version 1.1 (accessed Feb 01, 2021).
- Plummer (2003) Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling.
- Raftery (1986) Raftery, A. (1986). Choosing models for cross-classifications. American Sociological Review 51, 145.
- Rutherford et al. (2020) Rutherford, M. J., Lambert, P. C., Sweeting, M. J., Pennington, B., Crowther, M. J., Abrams, K. R., and Latimer, N. R. (2020). Flexible methods for survival analysis: NICE DSU Technical Support Document 21.
- Vehtari et al. (2020) Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P.-C., Paananen, T., and Gelman, A. (2020). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. R package version 2.4.1.
- Wang et al. (2015) Wang, M., Dignam, J. J., Won, M., Curran, W., Mehta, M., and Gilbert, M. R. (2015). Variation over time and interdependence between disease progression and death among patients with glioblastoma on RTOG 0525. Neuro-Oncology 17, 999–1006.
- Watanabe (2010) Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research 11, 3571–3594.
- Wyse and Friel (2010) Wyse, J. and Friel, N. (2010). Simulation-based Bayesian analysis for multiple changepoints.
- Yao (1986) Yao, Y.-C. (1986). Maximum likelihood estimation in hazard rate models with a change-point. Communications in Statistics - Theory and Methods 15, 2455–2466.
Web Appendix A, referenced in Section 5, is available with this paper at the Medical Decision Making website. An implementation of the method in an R package, as well as a worked example using a simulated dataset, can be found at https://github.com/Philip-Cooney/PiecewiseChangepoint.
Incorporating uncertainty in hyperparameters
The marginal likelihood can be sensitive to the hyperparameters $\alpha$ and $\beta$, and this sensitivity can therefore influence the posterior distribution of the change-points. To account for this uncertainty and improve the robustness of the results, we can introduce a hyperprior on $\beta$. In an extra sampling step, the hazards can be "uncollapsed" and sampled at each iteration. We place a Gamma hyperprior on $\beta$, with

$$\beta \sim \text{Gamma}(\xi, \delta).$$
Simplifying Equation 2, we note that the posterior density of the change-point number and locations is proportional to the product of the likelihood, the prior on the hazards, and the hyperprior on $\beta$.
The conditional distribution of $\beta$ given the hazards $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_{k+1})$ is

$$p(\beta \mid \boldsymbol{\lambda}) \propto \left[\prod_{j=1}^{k+1} \beta^{\alpha} e^{-\beta \lambda_j}\right] \beta^{\xi - 1} e^{-\delta \beta} = \beta^{\alpha(k+1) + \xi - 1} \exp\left\{-\beta\left(\delta + \sum_{j=1}^{k+1} \lambda_j\right)\right\}.$$

This is the kernel of a Gamma distribution with shape $\xi + \alpha(k+1)$ and rate $\delta + \sum_{j=1}^{k+1} \lambda_j$, and $\beta$ is updated once each iteration. The hazards are sampled from a $\text{Gamma}(\alpha + d_j, \beta + t_j)$ distribution with the current value of $\beta$ before the sampling of a new $\beta$.
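This extra sampling step can be illustrated as follows. The sketch below is in Python rather than the package's R implementation, and the segment event counts `d`, exposures `t`, and hyperparameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative segment summaries for k = 2 change-points (3 segments):
d = np.array([40, 25, 10])           # events per segment
t = np.array([120.0, 200.0, 150.0])  # exposure time per segment
alpha = 1.0                          # shape of the Gamma prior on each hazard
xi, delta = 1.0, 1.0                 # hyperprior: beta ~ Gamma(xi, delta)

beta = 1.0
for _ in range(1000):
    # "Uncollapse": draw each hazard from its Gamma(alpha + d_j, beta + t_j)
    # posterior, using the current value of beta (numpy uses scale = 1/rate)
    lam = rng.gamma(shape=alpha + d, scale=1.0 / (beta + t))
    # Update beta from its Gamma(xi + alpha*(k+1), delta + sum(lam)) conditional
    beta = rng.gamma(shape=xi + alpha * len(d), scale=1.0 / (delta + lam.sum()))
```

Each iteration alternates between the hazards and $\beta$, so uncertainty in $\beta$ propagates into the posterior for the change-points.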
One important practical point relates to the choice of the hyperprior on $\beta$ when the data has a particular timescale. If the data is in years, we should set $\beta$ to have an expected value of 1 (assuming that in all cases $\alpha$ is set to 1) with variance 1, which is achieved by setting $\xi = \delta = 1$ and follows immediately from the mean $\xi/\delta$ and variance $\xi/\delta^2$ of the Gamma distribution. If the data is in days and we wish to retain the same prior, we require $\beta$ to have an expected value and variance of 365, which is achieved by setting $\xi = 365$ and $\delta = 1$.
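These choices can be confirmed numerically from the Gamma mean (shape/rate) and variance (shape/rate²), here using scipy with the shape and rate values given above:

```python
from scipy.stats import gamma

# Years: beta ~ Gamma(shape=xi=1, rate=delta=1)  -> mean 1, variance 1
years = gamma(a=1.0, scale=1.0 / 1.0)
print(years.mean(), years.var())   # 1.0 1.0

# Days: beta ~ Gamma(shape=xi=365, rate=delta=1) -> mean 365, variance 365
days = gamma(a=365.0, scale=1.0 / 1.0)
print(days.mean(), days.var())     # 365.0 365.0
```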
A.1 Marginal likelihood for exponential survival times
The marginal likelihood is sometimes known as the probability of the data and appears as the denominator in Bayes' formula:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta.$$
In our model we consider the data conditional on the hyperparameters $\alpha$ and $\beta$, which we denote together as $\phi$. Therefore the expression becomes

$$p(y \mid \phi) = \int p(y \mid \lambda)\, p(\lambda \mid \phi)\, d\lambda.$$
The easiest way to evaluate this integral is indirectly through Bayes' formula, which in this setting is:

$$p(\lambda \mid y, \phi) = \frac{p(y \mid \lambda)\, p(\lambda \mid \phi)}{p(y \mid \phi)}.$$
The conjugate prior for an exponential likelihood is the Gamma distribution. Therefore, given hyperparameters $\alpha$ and $\beta$, the posterior is a $\text{Gamma}(\alpha + d_j, \beta + t_j)$, where $d_j$ is the number of events and $t_j$ is the exposure time within that interval. Letting $\alpha' = \alpha + d_j$ and $\beta' = \beta + t_j$ and rearranging Bayes' formula, it immediately follows that the marginal likelihood is the ratio of the prior normalizing factor divided by the posterior normalizing factor;

$$p(y_j \mid \alpha, \beta) = \frac{\beta^{\alpha}/\Gamma(\alpha)}{(\beta')^{\alpha'}/\Gamma(\alpha')} = \frac{\beta^{\alpha}\, \Gamma(\alpha + d_j)}{\Gamma(\alpha)\, (\beta + t_j)^{\alpha + d_j}}.$$
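The closed-form result can be checked by integrating the exponential likelihood against the Gamma prior numerically. This is a standalone sketch; the event count and exposure time for the single segment are illustrative:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

alpha, beta = 1.0, 1.0   # Gamma prior hyperparameters
d, t = 7, 2.5            # illustrative events and exposure in one segment

# Closed form: beta^alpha * Gamma(alpha + d) / (Gamma(alpha) * (beta + t)^(alpha + d))
closed = np.exp(alpha * np.log(beta) + gammaln(alpha + d)
                - gammaln(alpha) - (alpha + d) * np.log(beta + t))

# Numerical: integrate lambda^d * exp(-lambda * t) against the Gamma(alpha, beta) prior
def prior(lam):
    return beta**alpha * lam**(alpha - 1) * np.exp(-beta * lam) / np.exp(gammaln(alpha))

numeric, _ = quad(lambda lam: lam**d * np.exp(-lam * t) * prior(lam), 0, np.inf)

print(closed, numeric)   # both approximately 0.2238
```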
Given $k$ change-points, we have $k+1$ segments of data, and the joint marginal likelihood is the product over these segments:

$$p(y \mid \alpha, \beta) = \prod_{j=1}^{k+1} \frac{\beta^{\alpha}\, \Gamma(\alpha + d_j)}{\Gamma(\alpha)\, (\beta + t_j)^{\alpha + d_j}}.$$
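In practice this product is computed on the log scale for numerical stability. A sketch of such a computation, with the function name `log_marginal` and the per-segment event counts and exposures being illustrative assumptions:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(d, t, alpha=1.0, beta=1.0):
    """Joint log marginal likelihood summed over the k+1 segments."""
    d, t = np.asarray(d, float), np.asarray(t, float)
    return np.sum(alpha * np.log(beta) + gammaln(alpha + d)
                  - gammaln(alpha) - (alpha + d) * np.log(beta + t))

# Illustrative data: k = 1 change-point, so two segments
print(log_marginal(d=[30, 12], t=[80.0, 140.0]))
```

Because the segments are conditionally independent given the change-points, the joint log marginal likelihood is simply the sum of the per-segment contributions.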