Variational Inference of Joint Models using Multivariate Gaussian Convolution Processes

by   Xubo Yue, et al.

We present a non-parametric prognostic framework for individualized event prediction based on joint modeling of both longitudinal and time-to-event data. Our approach exploits a multivariate Gaussian convolution process (MGCP) to model the evolution of longitudinal signals and a Cox model to map time-to-event data with longitudinal data modeled through the MGCP. Taking advantage of the unique structure imposed by convolved processes, we provide a variational inference framework to simultaneously estimate parameters in the joint MGCP-Cox model. This significantly reduces computational complexity and safeguards against model overfitting. Experiments on synthetic and real world data show that the proposed framework outperforms state-of-the art approaches built on two-stage inference and strong parametric assumptions.



There are no comments yet.


page 1

page 2

page 3

page 4


Scalable Joint Models for Reliable Uncertainty-Aware Event Prediction

Missing data and noisy observations pose significant challenges for reli...

Multi-output Gaussian Process Modulated Poisson Processes for Event Prediction

Prediction of events such as part replacement and failure events plays a...

Functional Principal Component Analysis for Extrapolating Multi-stream Longitudinal Data

The advance of modern sensor technologies enables collection of multi-st...

Flexible Triggering Kernels for Hawkes Process Modeling

Recently proposed encoder-decoder structures for modeling Hawkes process...

Sparse Gaussian Process Modulated Hawkes Process

The Hawkes process has been widely applied to modeling self-exciting eve...

Predictive Modeling of Multivariate Longitudinal Insurance Claims Using Pair Copula Construction

The bundling feature of a nonlife insurance contract often leads to mult...

Predicting the evolution of stationary graph signals

An emerging way of tackling the dimensionality issues arising in the mod...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In recent years, the multivariate Gaussian process (MGP) has drawn significant attention as an efficient non-parametric approach to predict longitudinal signal trajectories (Dürichen et al., 2015; Moreno-Muñoz et al., 2018; Kontar et al., 2018b). The MGP draws its roots from multitask learning where transfer of knowledge is achieved through a shared representation between training and testing signals. One neat approach that achieves this knowledge transfer, employs convolution processes to construct the MGP. Specifically, each signal is expressed as a convolution of latent functions drawn from a Gaussian process (GP). Commonalities amongst training and testing signals are then captured by sharing these latent functions across the outputs (Titsias & Lawrence, 2010; Álvarez et al., 2010; Álvarez & Lawrence, 2011). Consequently, the multiple signals can be expressed as a single output from a common multivariate Gaussian convolution process (MGCP). Indeed, many recent studies have demonstrated the MGCP ability to account for non-trivial commonalities in the data and provide accurate predictive results (Zhao & Sun, 2016; Guarnizo & Álvarez, 2018; Cheng, 2018).

Figure 1: Joint modeling of longitudinal and time-to-event data.

In this article we exploit the MGCP to explore the following question: can we use both time-to-event data (also known as survival data) along with longitudinal signals to obtain a reliable event prediction? This is illustrated in Figure 1

. As shown in the figure, our goal is to utilize both survival data and longitudinal signals from training units to predict the survival probability of a partially observed testing unit. Naturally, the aforementioned question is often encountered in a wide range of applications, including: disease prognosis in clinical trials, event prediction using vital health signals from monitored patients at risk, remaining useful life estimation of operational units/machines and failure prognosis in connected manufacturing systems (e.g., nuclear power plants)

(Tsiatis et al., 1995; Bycott & Taylor, 1998; Gasmi et al., 2003; Pham et al., 2012; Gao et al., 2015; Soleimani et al., 2018).

In order to link survival and longitudinal data, state-of-the-art methods have focused on joint models. The seminal work of Rizopoulos (Rizopoulos, 2011, 2012) laid a foundation for joint models where a linear mixed effects model is used to model longitudinal signals. The coefficients of the mixed model are then used in a Cox model to compute the probability of event occurrence conditioned on the observed longitudinal signals. This idea provided the bases for many extensions and applications in the literature (Crowther et al., 2012; Wu et al., 2012; Zhu et al., 2012; Crowther et al., 2013; Proust-Lima et al., 2014; He et al., 2015; Rizopoulos et al., 2017; Mauff et al., 2018). It is important to note here that joint methods are in general built using a two-stage inference procedure. In two-stage inference, features from the longitudinal data are first learned, these estimated features are then inserted into a survival model to predict event probabilities. Indeed, many papers have shown that this two-stage procedure can produce competitive predictive results (Wulfsohn & Tsiatis, 1997; Yu et al., 2004; Zhou et al., 2014; Mauff et al., 2018). Nevertheless, the foregoing works are based on strong parametric assumptions where signals are assumed to follow a specific parametric form and all the signals (training and testing) exhibit that same functional form. In other words, signals behave according to a similar trend but at different rates (i.e., different parameter values). However, parametric methods are restrictive in many applications and if the specified form is far from the truth, predictive results will be misleading. Furthermore, the assumption that all signals possess the same functional form may not hold in real-life applications. For instance, units operated under different environmental conditions may exhibit different signal evolution rates and trends (Yan et al., 2016; Kontar et al., 2018a)

. Some recent efforts aimed to relax strong parametric assumptions using splines, continuous time Markov chains and the GP. Unfortunately, these methods still assume homogeneity across the population and focus on merely imputing the longitudinal data rather than predicting signal evolution within a time interval of interest

(Dempsey et al., 2017; Soleimani et al., 2018). We here note that there has been some recent attempts at rebuilding the Cox model using a GP (Fernández et al., 2016; Kim & Pavlovic, 2018). However these approaches are only based on survival data and do not handle joint modeling, which is the focus of this article.

To address the aforementioned challenges, we propose a flexible joint modeling approach denoted as MGCP-Cox. Our approach exploits the MGCP to model the evolution of longitudinal signals and a Cox model to map time-to-event data with longitudinal data modeled through MGCP. Event occurrence probability is then derived within any future interval as shown in Figure 1. We also propose a variational inference framework using pseudo-inputs (Snelson & Ghahramani, 2006; Damianou & Lawrence, 2013) to simultaneously estimate parameters in the joint MGCP-Cox model. This facilitates scalability to large data settings and safeguards against model overfitting. Finally, the advantageous features of the proposed method are demonstrated through numerical studies and a case study with real-world data in the application to finding the remaining useful lifetime of NASA Aero-propulsion engines.

The rest of the paper is organized as follows. In section 2 we briefly review survival analysis. In section 3, we present our joint modeling framework and the variational inference algorithm. Numerical experiments using synthetic data and real-world data are provided in section 4. Finally, section 5 concludes the paper with a brief discussion. A detailed code and the used real-world data are available in the supplementary materials.

2 Background: Survival Analysis

In this section, we will briefly review survival analysis which will be used for event prediction in the joint model. Survival analysis is a branch of statistics for analyzing time-to-event data and predicting the probability of occurrence of an event. For each individual unit , the associated data is , where is the event time (the unit failed at time or was censored at time ), is an event indicator ( indicates the unit has failed/censored), are the noisy observed longitudinal data (e.g., vital signals collected from patients) corresponding to the underlying latent values , and

is a set of time-invariant features (e.g., patient’s gender). Typically, the continuous random variable

is characterized by a survival function which represents the probability of survival up to time . Another important term is the hazard function and can be thought of as the instantaneous rate of occurrence of an event at time . It is easy to show that . The term is called cumulative hazard function and is denoted by . The basic scheme of survival analysis is to find suitable models to explain relationships between the hazard function and collected data . These models are defined as survival models.

Many survival models have been developed to analyze time-to-event data. They typically model the hazard function as a function of some time-varying and fixed features. One class of prevailed survival models is called the Cox model (Cox, 1972), which has the form , where is a baseline hazard function shared by all individuals, and is typically modeled by the Weibull or a piecewise constant function,

is a vector of coefficients for the fixed covariates (features),

is the feature estimated by a longitudinal model (e.g., linear mixed model, Gaussian Process), and

is a scaling parameter for the time-varying covariates. Parameters in the Cox model are typically estimated by maximizing the log-likelihood function


For a comprehensive review of survival models, see (Kalbfleisch & Prentice, 2011). Given an estimate of parameters from the Cox model, we can then obtain the event (failure) probability within a future time interval given the fact that the testing unit survives non-shorter than the current time instance . This probability, denoted , is estimated as follows:


where and are features for a testing unit . Note that in Figure 1 we show the survival curve which is defined as , where .

3 Joint Modeling and Variational Inference

3.1 The multivariate Gaussian convolution process (MGCP)

Assume data have been collected from units and let denote the set of all units. For unit , its associated data is . The observed longitudinal signal is denoted by , where represents the number of observations and denotes the inputs. We decompose the longitudinal signal as , where is a mean zero GP and denotes additive noise with zero mean and variance.

To obtain an accurate predictive result, we need to capture the intrinsic relatedness among signals. Particularly, we resort to the convolution process to model the latent function . We consider independent latent functions and different smoothing kernels . The latent functions are assumed independent GPs with covariance . We set to be scaled Gaussian kernels and to be squared exponential covariance functions (Álvarez & Lawrence, 2009).

Figure 2: Illustration of a convolution Process where two latent functions are shared across four outputs.

The GP is then constructed by convolving the shared latent functions with the smoothing kernel as shown in (4). This is the underlying principle of the MGCP, where the latent functions are shared across different outputs through the corresponding kernels

. Since convolutions are linear operators on a function and since the latent function, a GP, is shared across multiple outputs then all outputs can be expressed as a jointly distributed GP, an MGCP. As shown in Figure

2, a key feature is that information is shared through different parameters encoded in the kernels . Outputs then can possess both shared and unique features. Thus, accounting for heterogeneity in the longitudinal data.


Based on equation (4), the covariance function between and , and the covariance function between and , can be calculated as


where and .

Now denote the underlying latent values as , where . The density function of can be obtained as , where sized is the covariance function. The likelihood of involves inverting the large matrix . This operation has computational complexity and storage requirement . To alleviate computational burden, we introduce pseudo-inputs from the latent functions denoted as where .

Since the latent functions are GPs, then any sample

follows a multivariate Gaussian distribution. Conditioned on

, we next sample from the conditional prior . In equation (4), can be approximated well by the expectation as long as the latent functions are smooth (Álvarez & Lawrence, 2011). Denote by

. The probability distribution of

can be expressed as , where is a block-diagonal matrix such that each block is associated with the covariance of in (3). By multivariate Gaussian identities, the probability distribution of conditional on is


where . Therefore, can be approximated by , which is given as


By equation (7), can then be approximated by .

3.2 Joint Model and Variational Inference

Now following our convolution construction in (4), the hazard function at time is given as


This key equation links the MGCP to the Cox model. We begin with presenting the log-likelihood of the joint model given observed data . The marginal log-likelihood function is


We would like to provide a good approximation of by introducing an evidence lower bound (ELBO) . This bound is calculated by finding the Kullback-Leibler (KL) divergence between the variational density and the true posterior density . Specifically,


The variaitonal density is assumed to be factorized as


Maximizing the ELBO with respect to

and hyperparameters from the MGCP-Cox model can achieve purposes of variational inference and model selection simultaneously

(Kim & Pavlovic, 2018). By equation (10),


Furthermore, we can decompose , where , and . Based on equation (12), the MGCP propagates uncertainties through the latent processes to the Cox model.

It is desirable to find a closed form of the ELBO in equation (12). Since and are both Gaussian, we can obtain


where is a trace operator. Therefore, the ELBO can be simplified as


We compute the optimal upper bound of by reversing Jensen’s inequality. This gives an optimal distribution and


where . can be thought of as a penalization term that regularizes the estimation of the parameters. Note that the first two terms in equation (15) can be computed in (Snelson & Ghahramani, 2006).

We will present a solution to solve the last integration in equation (15) in the following section.

3.3 Variational Inference on Cox Model

Parameters in the Cox model can be attained by maximizing the following log-likelihood function:


In equation (15), we obtain the optimal to maximize the ELBO. In this section, we will use to approximate . Specifically, the optimal has the form


It is easy to show that

has the normal distribution with parameter

. Specifically,



The last integration in equation (15) can be simplified to


The first term in equation (19) can be calculated analytically. For each unit ,


where In the last step, we applied the Fubini’s theorem to interchange integrals. The second term in equation (19

) can be estimated by the moment generating function (MGF) and the numerical integration. For each unit



where , and . We can assume to be an exponential function , where , are parameters to be learned and when because units are not subject to risk before the first failure event. Note that the baseline hazard is non-decreasing with time and thus we add one constraint . To obtain a good baseline hazard prediction given the estimated , we can calculate the cumulative hazard at time point as , where , and is the -th smallest element in . Then we fit a regularized smooth spline to (Ruppert, 2002). The predicted baseline hazard at can be estimated by (Rosenberg, 1995).

The is maximized with respect to the parameters , , , , , , where , by the gradient-based method. Specifically, We can obtain the optimal parameters by maximizing subject to .

3.4 Event Prediction

Without loss of generality, we focus on predicting the event occurrence probability for unit . Suppose observations from the testing unit have been collected up to time . The survival model computes the event probabilities conditioned on the predicted longitudinal features . Given estimated parameters, and following (2), we are interested in calculating


Based on equation (22), the accurate extrapolation within is essential. In the MGCP, the predictive distribution for any new input point is given by


where , and . We have used as a notation to indicate when the covariance matrix is evaluated at the . Consequently, the predicted signal at the time point for unit is .

4 Experiments

We conduct case studies to demonstrate the performance of our proposed methodology, denoted as MGCP-Cox. Both synthetic and real-world data are used. We also provide an illustrative example in Figure 3 to demonstrate the benefits of the MGCP-Cox model.

Figure 3: Results of extrapolations and survival probabilities prediction for two testing units. Figures in the first row show results of extrapolations based on different observation percentiles. Figures in the second row show true and predicted survival curves.

4.1 Data setting

For the synthetic data we assume that the underlying true path for unit has the form , where , and with and where Without loss of generality, we assume that the time unit is month and that signals were obtained regularly at each month up to their failure or censoring time. An example of the signals is shown in the top row of Figure 3. For each unit we specify a time-invariant feature

generated by a Bernoulli distribution with

. In the Cox model, we use the Weibull baseline hazard rate function with and . We generate the failure time

for each unit by rejection sampling using its probability density function

. We set and . Also, we randomly select of the units to be right censored. The number of units generated is and the experiment is repeated for times. Detailed code for data generation is provided in the supplementary materials.

For the real-world case study we use the C-MAPSS dataset provided by the National Aeronautics and Space Administration (NASA). The dataset contains failure time data of aircraft turbofan engines and degradation signals from informative sensors mounted on these engines. Note that in our analysis we standardize all sensor data. We refer readers to Saxena & Goebel (2008) and Liu et al. (2013) for more details about the data. The C-MAPSS data publically available at:

4.2 Baselines and Evaluations

We focus on predicting the event probability within a future time interval . We consider months in this simulation study. Prediction performance at varying time points for the partially observed unit is then reported. The time instant is defined as the -observation percentile, where is the failure time of unit . The values of are specified as , . Figure 3

shows some examples of units observed up to different percentiles of their failure time. Further, in our simulation studies, we benchmark our method with three other reference methods for comparison: (1) Logistic Regression (LR) classifier: in the LR, event data is transformed into binary labels

denoting whether units failed or not within the time interval . The time-fixed covariate and the last observed signal measurement at

are used as the model predictors (2) Support Vector Machine (SVM) classifier: the SVM here is used as a flexible alternative to the LR. We use the radial basis kernel and determine parameters using

-fold cross-validation on the training data (3) Parametric Joint Model (LMM-Joint): we implement a state-of-the-art joint modeling algorithm using the linear mixed-effect model. The LMM-joint uses a general polynomial function whose corresponding degree is determined through an Akaike information criteria to model the signal path. Note that this framework estimates parameters from the mixed-effect model and the Cox model separately (Rizopoulos, 2011; Zhou et al., 2014; Mauff et al., 2018). Regarding our MGCP-Cox model we set the number of pseudo-inputs to and the number of latent functions to . This setting is a commonly used setting for the MGCP (Álvarez & Lawrence, 2011; Zhao & Sun, 2016). The performance of each method is then assessed by the Receiver Operating Characteristic (ROC) curve, which is a common diagnostic tool for binary classifier. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR). Predictive accuracy is then assessed through the area under the curve (AUC). The results from the synthetic data are shown in Figure 4. Due to poor performance of both the LR and SVM on , we also checked whether they can produce comparable results to the MGCP-Cox when . We denote those models as LR-200 and SVM-200.

For the real data, the true survival probabilities are not available since we do not have information about the underlying parameters used to generate the data. Therefore, to evaluate model performance, we calculate the mean remaining lifetime of the testing unit, which is defined as . This integration can be obtained by the Gauss-Legendre quadrature. The performance is assessed by the absolute error where is the true remaining lifetime of the testing unit. We then report the distribution of the errors across all units using the boxplot in Figure 5. Similar to the synthetic data we use 30% and 50% percentiles to assess prediction accuracy. We also note than we cannot obtain estimates from the SVM and LR as they transform event prediction into a time series classification problem. Therefore, only results from LMM-joint and MGCP-Cox are reported in Figure 5.

Figure 4: ROC curves from simulation studies under different percentile of observation .

4.3 Results

The illustrative example in Figure 3 demonstrates the behavior of our method. As shown in the figure, our joint model framework can provide accurate predictions of both longitudinal signals and event probabilities. The unique smoothing kernel for each individual allows flexibility in the prediction, since it enables each training signal to have its own characteristics. This substantiates the strength of the MGCP. Equipped with the shared latent processes, the model can infer the similarities among all units, and predict signal trajectory by borrowing strength from training units.

The results in Figures 4 and 5 indicate that our MGCP-Cox model clearly outperforms the benchmarked models. Based on the figure we can obtain some important insights. First, as expected, prediction errors significantly decrease as the lifetime percentiles increase. Thus, the prediction accuracy from the MGCP-Cox model will become more accurate as

increases and more data are collected from an online monitored unit. Second, the prediction accuracy slightly decreases as we predict over a longer horizon (i.e. prediction is better for the near future). This is intuitively understandable as accuracy might decrease when predicting over a large region where not many training data might be observed. Third, the results show that the MGCP-Cox clearly outperforms LMM-joint. This result highlights the danger of parametric modeling and demonstrates the ability of our non-parametric approach to avoid model misspecifications. Fourth, even when the LR and SVM had a much larger number of units, the MGCP-Cox was still superior. This observation, also true to the LMM-Joint, highlights the strength of joint models. Lastly, one striking feature, shown in Figures

3, 4 and 5, is that even with a small number of observations ( observation percentile) from the testing unit we were still able to get accurate predictive results. This crucial in many applications as its allows early prediction of an event occurrence such as a disease or machine failure.

Figure 5: Remaining life prediction accuracy from NASA data.

5 Conclusion

We have presented a flexible and efficient non-parametric joint modeling framework for longitudinal and time-to-event data. A variational inference framework using pseudo-inputs is established to jointly estimate parameters from the MGCP-Cox model. Empirical studies highlight the advantageous features of our model to predict signal trajectories and provide reliable event prediction.


  • Álvarez & Lawrence (2009) Álvarez, M. and Lawrence, N. D. Sparse convolved gaussian processes for multi-output regression. In Advances in neural information processing systems, pp. 57–64, 2009.
  • Álvarez & Lawrence (2011) Álvarez, M. and Lawrence, N. D. Computationally efficient convolved multiple output gaussian processes. Journal of Machine Learning Research, 12(May):1459–1500, 2011.
  • Álvarez et al. (2010) Álvarez, M., Luengo, D., Titsias, M., and Lawrence, N. D. Efficient multioutput gaussian processes through variational inducing kernels. In

    Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics

    , pp. 25–32, 2010.
  • Bycott & Taylor (1998) Bycott, P. and Taylor, J. A comparison of smoothing techniques for cd4 data measured with error in a time-dependent cox proportional hazards model. Statistics in medicine, 17(18):2061–2077, 1998.
  • Cheng (2018) Cheng, C. Multi-scale gaussian process experts for dynamic evolution prediction of complex systems. Expert Systems with Applications, 99:25–31, 2018.
  • Cox (1972) Cox, D. R. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972.
  • Crowther et al. (2012) Crowther, M. J., Abrams, K. R., and Lambert, P. C. Flexible parametric joint modelling of longitudinal and survival data. Statistics in Medicine, 31(30):4456–4471, 2012.
  • Crowther et al. (2013) Crowther, M. J., Abrams, K. R., and Lambert, P. C. Joint modeling of longitudinal and survival data. The Stata Journal, 13(1):165–184, 2013.
  • Damianou & Lawrence (2013) Damianou, A. and Lawrence, N. D. Deep gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215, 2013.
  • Dempsey et al. (2017) Dempsey, W. H., Moreno, A., Scott, C. K., Dennis, M. L., Gustafson, D. H., Murphy, S. A., and Rehg, J. M. isurvive: An interpretable, event-time prediction model for mhealth. In International Conference on Machine Learning, pp. 970–979, 2017.
  • Dürichen et al. (2015) Dürichen, R., Pimentel, M. A., Clifton, L., Schweikard, A., and Clifton, D. A. Multitask gaussian processes for multivariate physiological time-series analysis. IEEE Transactions on Biomedical Engineering, 62(1):314–322, 2015.
  • Fernández et al. (2016) Fernández, T., Rivera, N., and Teh, Y. W. Gaussian processes for survival analysis. In Advances in Neural Information Processing Systems, pp. 5021–5029, 2016.
  • Gao et al. (2015) Gao, R., Wang, L., Teti, R., Dornfeld, D., Kumara, S., Mori, M., and Helu, M. Cloud-enabled prognosis for manufacturing. CIRP annals, 64(2):749–772, 2015.
  • Gasmi et al. (2003) Gasmi, S., Love, C. E., and Kahle, W. A general repair, proportional-hazards, framework to model complex repairable systems. IEEE Transactions on Reliability, 52(1):26–32, 2003.
  • Guarnizo & Álvarez (2018) Guarnizo, C. and Álvarez, M. A. Fast kernel approximations for latent force models and convolved multiple-output gaussian processes. arXiv preprint arXiv:1805.07460, 2018.
  • He et al. (2015) He, Z., Tu, W., Wang, S., Fu, H., and Yu, Z. Simultaneous variable selection for joint models of longitudinal and survival outcomes. Biometrics, 71(1):178–187, 2015.
  • Kalbfleisch & Prentice (2011) Kalbfleisch, J. D. and Prentice, R. L. The statistical analysis of failure time data, volume 360. John Wiley & Sons, 2011.
  • Kim & Pavlovic (2018) Kim, M. and Pavlovic, V. Variational inference for gaussian process models for survival analysis. Uncertainty in Artificial Intelligence, 2018.
  • Kontar et al. (2018a) Kontar, R., Zhou, S., Sankavaram, C., Du, X., and Zhang, Y. Nonparametric-condition-based remaining useful life prediction incorporating external factors. IEEE Transactions on Reliability, 67(1):41–52, 2018a.
  • Kontar et al. (2018b) Kontar, R., Zhou, S., Sankavaram, C., Du, X., and Zhang, Y. Nonparametric modeling and prognosis of condition monitoring signals using multivariate gaussian convolution processes. Technometrics, 60(4):484–496, 2018b.
  • Liu et al. (2013) Liu, K., Gebraeel, N. Z., and Shi, J. A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis. IEEE Transactions on Automation Science and Engineering, 10(3):652–664, 2013.
  • Mauff et al. (2018) Mauff, K., Steyerberg, E., Kardys, I., Boersma, E., and Rizopoulos, D. Joint models with multiple longitudinal outcomes and a time-to-event outcome. arXiv preprint arXiv:1808.07719, 2018.
  • Moreno-Muñoz et al. (2018) Moreno-Muñoz, P., Artés-Rodríguez, A., and Álvarez, M. A. Heterogeneous multi-output gaussian process prediction. arXiv preprint arXiv:1805.07633, 2018.
  • Pham et al. (2012) Pham, H. T., Yang, B., and Nguyen, T. T. Machine performance degradation assessment and remaining useful life prediction using proportional hazard model and support vector machine. Mechanical Systems and Signal Processing, 32:320–330, 2012.
  • Proust-Lima et al. (2014) Proust-Lima, C., Séne, M., Taylor, J. M., and Jacqmin-Gadda, H. Joint latent class models for longitudinal and time-to-event data: A review. Statistical methods in medical research, 23(1):74–90, 2014.
  • Rizopoulos (2011) Rizopoulos, D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics, 67(3):819–829, 2011.
  • Rizopoulos (2012) Rizopoulos, D. Joint models for longitudinal and time-to-event data: With applications in R. Chapman and Hall/CRC, 2012.
  • Rizopoulos et al. (2017) Rizopoulos, D., Molenberghs, G., and Lesaffre, E. M. Dynamic predictions with time-dependent covariates in survival analysis using joint modeling and landmarking. Biometrical Journal, 59(6):1261–1276, 2017.
  • Rosenberg (1995) Rosenberg, P. S. Hazard function estimation using b-splines. Biometrics, pp. 874–887, 1995.
  • Ruppert (2002) Ruppert, D. Selecting the number of knots for penalized splines. Journal of computational and graphical statistics, 11(4):735–757, 2002.
  • Saxena & Goebel (2008) Saxena, A. and Goebel, K. Turbofan engine degradation simulation data set, 2008. data retrieved from NASA Ames Prognostics Data Repository,
  • Snelson & Ghahramani (2006) Snelson, E. and Ghahramani, Z. Sparse gaussian processes using pseudo-inputs. In Advances in neural information processing systems, pp. 1257–1264, 2006.
  • Soleimani et al. (2018) Soleimani, H., Hensman, J., and Saria, S. Scalable joint models for reliable uncertainty-aware event prediction. IEEE transactions on pattern analysis and machine intelligence, 40(8):1948–1963, 2018.
  • Titsias & Lawrence (2010) Titsias, M. and Lawrence, N. D. Bayesian gaussian process latent variable model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 844–851, 2010.
  • Tsiatis et al. (1995) Tsiatis, A. A., Degruttola, V., and Wulfsohn, M. S. Modeling the relationship of survival to longitudinal data measured with error. applications to survival and cd4 counts in patients with aids. Journal of the American Statistical Association, 90(429):27–37, 1995.
  • Wu et al. (2012) Wu, L., Liu, W., Yi, G. Y., and Huang, Y. Analysis of longitudinal and survival data: joint modeling, inference methods, and issues. Journal of Probability and Statistics, 2012, 2012.
  • Wulfsohn & Tsiatis (1997) Wulfsohn, M. S. and Tsiatis, A. A. A joint model for survival and longitudinal data measured with error. Biometrics, pp. 330–339, 1997.
  • Yan et al. (2016) Yan, H., Liu, K., Zhang, X., and Shi, J. Multiple sensor data fusion for degradation modeling and prognostics under multiple operational conditions. IEEE Transactions on Reliability, 65(3):1416–1426, 2016.
  • Yu et al. (2004) Yu, M., Law, N. J., Taylor, J. M., and Sandler, H. M. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica, pp. 835–862, 2004.
  • Zhao & Sun (2016) Zhao, J. and Sun, S. Variational dependent multi-output gaussian process dynamical systems. The Journal of Machine Learning Research, 17(1):4134–4169, 2016.
  • Zhou et al. (2014) Zhou, Q., Son, J., Zhou, S., Mao, X., and Salman, M. Remaining useful life prediction of individual units subject to hard failure. IIE Transactions, 46(10):1017–1030, 2014.
  • Zhu et al. (2012) Zhu, H., Ibrahim, J. G., Chi, Y., and Tang, N. Bayesian influence measures for joint models for longitudinal and survival data. Biometrics, 68(3):954–964, 2012.