1 Introduction
In recent years, the multivariate Gaussian process (MGP) has drawn significant attention as an efficient nonparametric approach to predicting longitudinal signal trajectories (Dürichen et al., 2015; Moreno-Muñoz et al., 2018; Kontar et al., 2018b). The MGP draws its roots from multitask learning, where transfer of knowledge is achieved through a shared representation between training and testing signals. One neat approach that achieves this knowledge transfer employs convolution processes to construct the MGP. Specifically, each signal is expressed as a convolution of latent functions drawn from a Gaussian process (GP). Commonalities amongst training and testing signals are then captured by sharing these latent functions across the outputs (Titsias & Lawrence, 2010; Álvarez et al., 2010; Álvarez & Lawrence, 2011). Consequently, the multiple signals can be expressed as a single output from a common multivariate Gaussian convolution process (MGCP). Indeed, many recent studies have demonstrated the MGCP's ability to account for nontrivial commonalities in the data and provide accurate predictive results (Zhao & Sun, 2016; Guarnizo & Álvarez, 2018; Cheng, 2018).
In this article we exploit the MGCP to explore the following question: can we use both time-to-event data (also known as survival data) and longitudinal signals to obtain a reliable event prediction? This is illustrated in Figure 1. As shown in the figure, our goal is to utilize both survival data and longitudinal signals from training units to predict the survival probability of a partially observed testing unit. Naturally, the aforementioned question is often encountered in a wide range of applications, including disease prognosis in clinical trials, event prediction using vital health signals from monitored patients at risk, remaining useful life estimation of operational units/machines, and failure prognosis in connected manufacturing systems (e.g., nuclear power plants)
(Tsiatis et al., 1995; Bycott & Taylor, 1998; Gasmi et al., 2003; Pham et al., 2012; Gao et al., 2015; Soleimani et al., 2018). In order to link survival and longitudinal data, state-of-the-art methods have focused on joint models. The seminal work of Rizopoulos (Rizopoulos, 2011, 2012) laid a foundation for joint models where a linear mixed-effects model is used to model longitudinal signals. The coefficients of the mixed model are then used in a Cox model to compute the probability of event occurrence conditioned on the observed longitudinal signals. This idea provided the basis for many extensions and applications in the literature (Crowther et al., 2012; Wu et al., 2012; Zhu et al., 2012; Crowther et al., 2013; Proust-Lima et al., 2014; He et al., 2015; Rizopoulos et al., 2017; Mauff et al., 2018). It is important to note here that joint methods are in general built using a two-stage inference procedure: features from the longitudinal data are first learned, and these estimated features are then inserted into a survival model to predict event probabilities. Indeed, many papers have shown that this two-stage procedure can produce competitive predictive results (Wulfsohn & Tsiatis, 1997; Yu et al., 2004; Zhou et al., 2014; Mauff et al., 2018). Nevertheless, the foregoing works are based on strong parametric assumptions where signals are assumed to follow a specific parametric form and all signals (training and testing) exhibit that same functional form. In other words, signals behave according to a similar trend but at different rates (i.e., different parameter values). However, parametric methods are restrictive in many applications, and if the specified form is far from the truth, predictive results will be misleading. Furthermore, the assumption that all signals possess the same functional form may not hold in real-life applications.
For instance, units operated under different environmental conditions may exhibit different signal evolution rates and trends (Yan et al., 2016; Kontar et al., 2018a)
Some recent efforts aimed to relax strong parametric assumptions using splines, continuous-time Markov chains and the GP. Unfortunately, these methods still assume homogeneity across the population and focus on merely imputing the longitudinal data rather than predicting signal evolution within a time interval of interest (Dempsey et al., 2017; Soleimani et al., 2018). We also note that there have been some recent attempts at rebuilding the Cox model using a GP (Fernández et al., 2016; Kim & Pavlovic, 2018). However, these approaches are based only on survival data and do not handle joint modeling, which is the focus of this article. To address the aforementioned challenges, we propose a flexible joint modeling approach denoted as MGCP-Cox. Our approach exploits the MGCP to model the evolution of longitudinal signals and a Cox model to map time-to-event data to the longitudinal data modeled through the MGCP. The event occurrence probability is then derived within any future interval as shown in Figure 1. We also propose a variational inference framework using pseudo-inputs (Snelson & Ghahramani, 2006; Damianou & Lawrence, 2013) to simultaneously estimate parameters in the joint MGCP-Cox model. This facilitates scalability to large data settings and safeguards against model overfitting. Finally, the advantageous features of the proposed method are demonstrated through numerical studies and a case study with real-world data on estimating the remaining useful lifetime of NASA aeropropulsion engines.
The rest of the paper is organized as follows. In section 2 we briefly review survival analysis. In section 3, we present our joint modeling framework and the variational inference algorithm. Numerical experiments using synthetic data and real-world data are provided in section 4. Finally, section 5 concludes the paper with a brief discussion. Detailed code and the real-world data used are available in the supplementary materials.
2 Background: Survival Analysis
In this section, we will briefly review survival analysis, which will be used for event prediction in the joint model. Survival analysis is a branch of statistics for analyzing time-to-event data and predicting the probability of occurrence of an event. For each individual unit , the associated data is , where is the event time (the unit failed at time or was censored at time ), is an event indicator ( indicates the unit has failed/censored), are the noisy observed longitudinal data (e.g., vital signals collected from patients) corresponding to the underlying latent values , and
is a set of timeinvariant features (e.g., patient’s gender). Typically, the continuous random variable
is characterized by a survival function , which represents the probability of survival up to time . Another important term is the hazard function , which can be thought of as the instantaneous rate of occurrence of an event at time . It is easy to show that . The term is called the cumulative hazard function and is denoted by . The basic scheme of survival analysis is to find suitable models that explain relationships between the hazard function and the collected data ; these are defined as survival models. Many survival models have been developed to analyze time-to-event data. They typically model the hazard function as a function of some time-varying and fixed features. One prevalent class of survival models is the Cox model (Cox, 1972), which has the form , where is a baseline hazard function shared by all individuals, typically modeled by the Weibull or a piecewise constant function,
is a vector of coefficients for the fixed covariates (features),
is the feature estimated by a longitudinal model (e.g., linear mixed model, Gaussian Process), and
is a scaling parameter for the time-varying covariates. Parameters in the Cox model are typically estimated by maximizing the log-likelihood function
(1)
For a comprehensive review of survival models, see (Kalbfleisch & Prentice, 2011). Given an estimate of the parameters from the Cox model, we can then obtain the event (failure) probability within a future time interval , given that the testing unit survives at least until the current time instant . This probability, denoted , is estimated as follows:
(2) 
where and are features for a testing unit . Note that in Figure 1 we show the survival curve which is defined as , where .
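To make the mechanics of this conditional probability concrete, the sketch below computes the event probability within a future interval from a hazard function of the Cox form. All parameter values, the Weibull baseline, and the longitudinal feature mu(t) = 0.1t are illustrative stand-ins, not quantities from the model:

```python
import numpy as np

# Hedged sketch: Weibull baseline hazard and a hypothetical longitudinal
# feature; all parameter values here are illustrative, not fitted.
def baseline_hazard(t, a=10.0, b=2.0):
    # Weibull hazard h0(t) = (b/a) * (t/a)**(b-1)
    return (b / a) * (t / a) ** (b - 1)

def hazard(t, gamma_w=0.5, alpha=0.3):
    # Cox form: h(t) = h0(t) * exp(gamma' w + alpha * mu(t)), with mu(t) = 0.1 t
    return baseline_hazard(t) * np.exp(gamma_w + alpha * 0.1 * t)

def survival(t, n=2000):
    # S(t) = exp(-H(t)), with the cumulative hazard H(t) computed
    # by trapezoidal integration of the hazard over [0, t]
    u = np.linspace(0.0, t, n)
    h = hazard(u)
    H = np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(u))
    return np.exp(-H)

def event_prob(t, dt):
    # P(t < T <= t + dt | T > t) = 1 - S(t + dt) / S(t)
    return 1.0 - survival(t + dt) / survival(t)
```

Note that the conditional event probability grows with the width of the prediction interval, as the tests below check.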
3 Joint Modeling and Variational Inference
3.1 The multivariate Gaussian convolution process (MGCP)
Assume data have been collected from units and let denote the set of all units. For unit , its associated data is . The observed longitudinal signal is denoted by , where represents the number of observations and denotes the inputs. We decompose the longitudinal signal as , where is a zero-mean GP and denotes additive noise with zero mean and variance .
To obtain accurate predictive results, we need to capture the intrinsic relatedness among signals. In particular, we resort to the convolution process to model the latent function . We consider independent latent functions and different smoothing kernels . The latent functions are assumed to be independent GPs with covariance . We set to be scaled Gaussian kernels and to be squared exponential covariance functions (Álvarez & Lawrence, 2009).
(3) 
The GP is then constructed by convolving the shared latent functions with the smoothing kernel as shown in (4). This is the underlying principle of the MGCP, where the latent functions are shared across different outputs through the corresponding kernels
. Since convolution is a linear operator on a function, and since the latent functions, being GPs, are shared across multiple outputs, all outputs can be expressed as a jointly distributed GP, an MGCP. As shown in Figure 2, a key feature is that information is shared through different parameters encoded in the kernels . Outputs can then possess both shared and unique features, thus accounting for heterogeneity in the longitudinal data.
(4)
Based on equation (4), the covariance function between and , and the covariance function between and , can be calculated as
(5) 
where and .
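This closed-form cross-covariance follows from the fact that convolving Gaussians yields a Gaussian whose variance is the sum of the component variances. The sketch below, with illustrative kernel variances, checks the closed form against a brute-force discretization of the double convolution integral; the normalized-Gaussian parameterization is an assumption made for the illustration:

```python
import numpy as np

def gaussian(d, v):
    # Normalized Gaussian density N(d | 0, v)
    return np.exp(-0.5 * d**2 / v) / np.sqrt(2 * np.pi * v)

def cross_cov(x, xp, v_i, v_j, v_u, s_i=1.0, s_j=1.0):
    # Closed form: smoothing kernels with variances v_i, v_j convolved
    # against a latent GP with Gaussian covariance of variance v_u give
    # another Gaussian in (x - xp) with the variances summed.
    return s_i * s_j * gaussian(x - xp, v_i + v_j + v_u)

def cross_cov_numeric(x, xp, v_i, v_j, v_u, s_i=1.0, s_j=1.0,
                      lo=-30.0, hi=30.0, n=1500):
    # Brute-force discretization of the double convolution integral
    # cov = \int\int k_i(x - z) k_u(z - z') k_j(xp - z') dz dz'
    z = np.linspace(lo, hi, n)
    dz = z[1] - z[0]
    Ki = s_i * gaussian(x - z, v_i)
    Kj = s_j * gaussian(xp - z, v_j)
    Ku = gaussian(z[:, None] - z[None, :], v_u)
    return Ki @ Ku @ Kj * dz * dz
```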
Now denote the underlying latent values as , where . The density function of can be obtained as , where the sized is the covariance matrix. The likelihood of involves inverting the large matrix . This operation has computational complexity and storage requirement . To alleviate this computational burden, we introduce pseudo-inputs from the latent functions, denoted as where .
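The pseudo-input idea can be sketched as a Nyström-style low-rank approximation: the large N × N covariance is replaced by one built from M ≪ N pseudo-inputs, so the expensive inversion involves only an M × M matrix. A minimal illustration with a squared exponential kernel and a hypothetical length-scale:

```python
import numpy as np

def se_kernel(X, Z, ls=1.0):
    # Squared exponential kernel on 1-D inputs; ls is an illustrative length-scale
    d = X[:, None] - Z[None, :]
    return np.exp(-0.5 * d**2 / ls**2)

def nystrom_approx(X, Z, ls=1.0, jitter=1e-8):
    # Approximate the N x N matrix Kff by Qff = Kfu Kuu^{-1} Kuf, built
    # from M << N pseudo-inputs Z; only the M x M matrix Kuu is inverted,
    # giving O(N M^2) cost instead of O(N^3).
    Kfu = se_kernel(X, Z, ls)
    Kuu = se_kernel(Z, Z, ls) + jitter * np.eye(len(Z))
    return Kfu @ np.linalg.solve(Kuu, Kfu.T)
```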
Since the latent functions are GPs, then any sample
follows a multivariate Gaussian distribution. Conditioned on
, we next sample from the conditional prior . In equation (4), can be approximated well by the expectation as long as the latent functions are smooth (Álvarez & Lawrence, 2011). Denote by. The probability distribution of
can be expressed as , where is a block-diagonal matrix such that each block is associated with the covariance of in (3). By multivariate Gaussian identities, the probability distribution of conditional on is
(6)
where . Therefore, can be approximated by , which is given as
(7) 
By equation (7), can then be approximated by .
3.2 Joint Model and Variational Inference
Now following our convolution construction in (4), the hazard function at time is given as
(8) 
This key equation links the MGCP to the Cox model. We begin by presenting the log-likelihood of the joint model given the observed data . The marginal log-likelihood function is
(9) 
We would like to provide a good approximation of by introducing an evidence lower bound (ELBO) . This bound is calculated by finding the Kullback-Leibler (KL) divergence between the variational density and the true posterior density . Specifically,
(10) 
The variational density is assumed to be factorized as
(11) 
Maximizing the ELBO with respect to
and hyperparameters from the MGCP-Cox model achieves the purposes of variational inference and model selection simultaneously
(Kim & Pavlovic, 2018). By equation (10),(12) 
Furthermore, we can decompose , where , and . Based on equation (12), the MGCP propagates uncertainties through the latent processes to the Cox model.
It is desirable to find a closed form of the ELBO in equation (12). Since and are both Gaussian, we can obtain
(13) 
where is a trace operator. Therefore, the ELBO can be simplified as
(14) 
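Since both densities in the KL term are Gaussian, that term admits the standard closed form below. This is a generic sketch of the computation between two multivariate Gaussians, not the paper's specific ELBO code; tr(·) is the trace operator mentioned above:

```python
import numpy as np

def kl_mvn(m0, S0, m1, S1):
    # Closed-form KL divergence KL( N(m0, S0) || N(m1, S1) ):
    # 0.5 * [ tr(S1^{-1} S0) + (m1-m0)' S1^{-1} (m1-m0) - k + ln(|S1|/|S0|) ]
    k = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```

As a sanity check, the KL divergence of a density against itself is zero and is otherwise non-negative.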
We compute the optimal upper bound of by reversing Jensen’s inequality. This gives an optimal distribution and
(15) 
where . can be thought of as a penalization term that regularizes the estimation of the parameters. Note that the first two terms in equation (15) can be computed in (Snelson & Ghahramani, 2006).
We present a solution to the last integral in equation (15) in the following section.
3.3 Variational Inference on Cox Model
Parameters in the Cox model can be obtained by maximizing the following log-likelihood function:
(16) 
In equation (15), we obtained the optimal that maximizes the ELBO. In this section, we use to approximate . Specifically, the optimal has the form
(17) 
It is easy to show that has a normal distribution with parameters . Specifically,
(18)
where
The last integration in equation (15) can be simplified to
(19) 
The first term in equation (19) can be calculated analytically. For each unit ,
(20) 
where . In the last step, we applied Fubini's theorem to interchange the integrals. The second term in equation (19) can be estimated using the moment generating function (MGF) and numerical integration. For each unit ,
(21)
where , and . We assume to be an exponential function , where , are parameters to be learned, and when because units are not subject to risk before the first failure event. Note that the baseline hazard is non-decreasing with time, and thus we add the constraint . To obtain a good baseline hazard prediction given the estimated , we calculate the cumulative hazard at time point as , where , and is the th smallest element in . We then fit a regularized smoothing spline to (Ruppert, 2002). The predicted baseline hazard at can be estimated by (Rosenberg, 1995).
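As a hedged sketch of this parameterization, an exponential baseline hazard that is zero before the first event time, together with its analytic cumulative hazard, can be written as follows; the parameter values and the first event time are illustrative, not learned:

```python
import numpy as np

def baseline_hazard(t, c=0.01, r=0.2, t_min=1.0):
    # Exponential baseline hazard h0(t) = c * exp(r * t) for t >= t_min
    # (the first event time), and 0 before it, since units are not at
    # risk before the first failure event.
    t = np.asarray(t, dtype=float)
    return np.where(t < t_min, 0.0, c * np.exp(r * t))

def cumulative_hazard(t, c=0.01, r=0.2, t_min=1.0):
    # H0(t) = integral_{t_min}^{t} c exp(r u) du = (c/r)(e^{rt} - e^{r t_min})
    t = np.asarray(t, dtype=float)
    H = (c / r) * (np.exp(r * t) - np.exp(r * t_min))
    return np.where(t < t_min, 0.0, H)
```

With r >= 0 the hazard is non-decreasing on its support, matching the monotonicity constraint discussed above.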
The is maximized with respect to the parameters , , , , , , where , using a gradient-based method. Specifically, we obtain the optimal parameters by maximizing subject to .
3.4 Event Prediction
Without loss of generality, we focus on predicting the event occurrence probability for unit . Suppose observations from the testing unit have been collected up to time . The survival model computes the event probabilities conditioned on the predicted longitudinal features . Given estimated parameters, and following (2), we are interested in calculating
(22) 
Based on equation (22), accurate extrapolation within is essential. In the MGCP, the predictive distribution for any new input point is given by
(23) 
where , and . We use the notation to indicate that the covariance matrix is evaluated at . Consequently, the predicted signal at time point for unit is .
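The predictive distribution above has the generic form of GP posterior prediction. The sketch below shows that generic computation for a single output with a squared exponential kernel and illustrative hyperparameters; the MGCP case replaces this kernel with the convolved cross-covariances derived earlier:

```python
import numpy as np

def se_kernel(X, Z, ls=1.0):
    # Squared exponential kernel on 1-D inputs; ls is an illustrative length-scale
    d = X[:, None] - Z[None, :]
    return np.exp(-0.5 * d**2 / ls**2)

def gp_predict(X, y, Xs, ls=1.0, noise=1e-2):
    # Generic GP posterior: mean K*f (Kff + s2 I)^{-1} y and
    # covariance K** - K*f (Kff + s2 I)^{-1} Kf*
    Kff = se_kernel(X, X, ls) + noise * np.eye(len(X))
    Ksf = se_kernel(Xs, X, ls)
    Kss = se_kernel(Xs, Xs, ls)
    mean = Ksf @ np.linalg.solve(Kff, y)
    cov = Kss - Ksf @ np.linalg.solve(Kff, Ksf.T)
    return mean, cov
```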
4 Experiments
We conduct case studies to demonstrate the performance of our proposed methodology, denoted as MGCP-Cox. Both synthetic and real-world data are used. We also provide an illustrative example in Figure 3 to demonstrate the benefits of the MGCP-Cox model.
4.1 Data setting
For the synthetic data we assume that the underlying true path for unit has the form , where , and with and where . Without loss of generality, we assume that the time unit is months and that signals were obtained regularly each month up to their failure or censoring time. An example of the signals is shown in the top row of Figure 3. For each unit we specify a time-invariant feature
generated by a Bernoulli distribution with
. In the Cox model, we use the Weibull baseline hazard rate function with and . We generate the failure time for each unit by rejection sampling using its probability density function
. We set and . Also, we randomly select of the units to be right censored. The number of units generated is and the experiment is repeated times. Detailed code for data generation is provided in the supplementary materials. For the real-world case study we use the CMAPSS dataset provided by the National Aeronautics and Space Administration (NASA). The dataset contains failure-time data of aircraft turbofan engines and degradation signals from informative sensors mounted on these engines. Note that in our analysis we standardize all sensor data. We refer readers to Saxena & Goebel (2008) and Liu et al. (2013) for more details about the data. The CMAPSS data is publicly available at: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognosticdatarepository/.
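The rejection-sampling step for generating failure times can be sketched as follows, using a Weibull baseline with illustrative (not the paper's) shape and scale values. The proposal is uniform on a bounded support, and a draw is accepted with probability proportional to the target density:

```python
import numpy as np

rng = np.random.default_rng(0)

def weibull_hazard(t, a=10.0, b=2.0):
    # Weibull hazard with illustrative scale a and shape b
    return (b / a) * (t / a) ** (b - 1)

def density(t, a=10.0, b=2.0):
    # Event-time density f(t) = h(t) * S(t), with Weibull S(t) = exp(-(t/a)^b)
    return weibull_hazard(t, a, b) * np.exp(-(t / a) ** b)

def sample_event_time(t_max=40.0, f_max=0.1):
    # Standard rejection sampler: propose uniformly on [0, t_max] and
    # accept with probability density(t) / f_max (f_max bounds the density).
    while True:
        t = rng.uniform(0.0, t_max)
        if rng.uniform(0.0, f_max) < density(t):
            return t
```

For these parameter values the density peaks below 0.1 and has negligible mass beyond t = 40, so the envelope is valid.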
4.2 Baselines and Evaluations
We focus on predicting the event probability within a future time interval . We consider months in this simulation study. Prediction performance at varying time points for the partially observed unit is then reported. The time instant is defined as the observation percentile, where is the failure time of unit . The values of are specified as , . Figure 3
shows some examples of units observed up to different percentiles of their failure time. Further, in our simulation studies, we benchmark our method against three reference methods: (1) Logistic Regression (LR) classifier: in the LR, event data are transformed into binary labels denoting whether units failed within the time interval . The time-fixed covariate and the last observed signal measurement at are used as the model predictors. (2) Support Vector Machine (SVM) classifier: the SVM is used as a flexible alternative to the LR. We use the radial basis kernel and determine parameters using -fold cross-validation on the training data. (3) Parametric Joint Model (LMM-Joint): we implement a state-of-the-art joint modeling algorithm using the linear mixed-effects model. The LMM-Joint uses a general polynomial function, whose degree is determined through the Akaike information criterion, to model the signal path. Note that this framework estimates parameters from the mixed-effects model and the Cox model separately (Rizopoulos, 2011; Zhou et al., 2014; Mauff et al., 2018). For our MGCP-Cox model we set the number of pseudo-inputs to and the number of latent functions to ; this is a commonly used setting for the MGCP (Álvarez & Lawrence, 2011; Zhao & Sun, 2016). The performance of each method is then assessed by the Receiver Operating Characteristic (ROC) curve, a common diagnostic tool for binary classifiers. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR). Predictive accuracy is then assessed through the area under the curve (AUC). The results from the synthetic data are shown in Figure 4. Due to the poor performance of both the LR and SVM with , we also checked whether they can produce results comparable to the MGCP-Cox when . We denote those models as LR200 and SVM200. For the real data, the true survival probabilities are not available since we do not have information about the underlying parameters used to generate the data. Therefore, to evaluate model performance, we calculate the mean remaining lifetime of the testing unit, defined as . This integral can be obtained by Gauss-Legendre quadrature. Performance is assessed by the absolute error , where is the true remaining lifetime of the testing unit. We then report the distribution of the errors across all units using the boxplot in Figure 5. Similar to the synthetic data, we use the 30% and 50% percentiles to assess prediction accuracy.
We also note that we cannot obtain estimates from the SVM and LR, as they transform event prediction into a time-series classification problem. Therefore, only results from LMM-Joint and MGCP-Cox are reported in Figure 5.
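The mean remaining lifetime computation described above can be approximated with Gauss-Legendre quadrature as sketched below. The Weibull survival function with illustrative parameters stands in for the model's estimated conditional survival curve, and the integration horizon is truncated at an assumed upper bound:

```python
import numpy as np

def survival(t, a=10.0, b=2.0):
    # Illustrative Weibull survival function S(t) = exp(-(t/a)^b)
    return np.exp(-(t / a) ** b)

def mean_remaining_life(t_star, t_max=100.0, n=64):
    # MRL(t*) = [ integral_{t*}^{inf} S(t) dt ] / S(t*), truncated at t_max,
    # evaluated with n-point Gauss-Legendre quadrature.
    nodes, weights = np.polynomial.legendre.leggauss(n)
    half = 0.5 * (t_max - t_star)     # map nodes from [-1, 1]
    mid = 0.5 * (t_max + t_star)      # to [t_star, t_max]
    integral = half * np.sum(weights * survival(mid + half * nodes))
    return integral / survival(t_star)
```

At t* = 0 this recovers the unconditional mean lifetime, and for an increasing-hazard Weibull the mean remaining life decreases as the unit ages.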
4.3 Results
The illustrative example in Figure 3 demonstrates the behavior of our method. As shown in the figure, our joint model framework can provide accurate predictions of both longitudinal signals and event probabilities. The unique smoothing kernel for each individual allows flexibility in the prediction, since it enables each training signal to have its own characteristics. This substantiates the strength of the MGCP. Equipped with the shared latent processes, the model can infer the similarities among all units, and predict signal trajectory by borrowing strength from training units.
The results in Figures 4 and 5 indicate that our MGCP-Cox model clearly outperforms the benchmarked models. Based on the figures we can obtain some important insights. First, as expected, prediction errors significantly decrease as the lifetime percentiles increase. Thus, predictions from the MGCP-Cox model become more accurate as increases and more data are collected from an online monitored unit. Second, the prediction accuracy slightly decreases as we predict over a longer horizon (i.e., prediction is better for the near future). This is intuitively understandable, as accuracy might decrease when predicting over a large region where few training data are observed. Third, the results show that the MGCP-Cox clearly outperforms LMM-Joint. This result highlights the danger of parametric modeling and demonstrates the ability of our nonparametric approach to avoid model misspecification. Fourth, even when the LR and SVM had a much larger number of units, the MGCP-Cox was still superior. This observation, also true for the LMM-Joint, highlights the strength of joint models. Lastly, one striking feature, shown in Figures 3, 4 and 5, is that even with a small number of observations ( observation percentile) from the testing unit we were still able to obtain accurate predictive results. This is crucial in many applications as it allows early prediction of an event occurrence such as a disease or machine failure.
5 Conclusion
We have presented a flexible and efficient nonparametric joint modeling framework for longitudinal and time-to-event data. A variational inference framework using pseudo-inputs is established to jointly estimate parameters of the MGCP-Cox model. Empirical studies highlight the advantageous features of our model in predicting signal trajectories and providing reliable event predictions.
References
 Álvarez & Lawrence (2009) Álvarez, M. and Lawrence, N. D. Sparse convolved gaussian processes for multioutput regression. In Advances in neural information processing systems, pp. 57–64, 2009.
 Álvarez & Lawrence (2011) Álvarez, M. and Lawrence, N. D. Computationally efficient convolved multiple output gaussian processes. Journal of Machine Learning Research, 12(May):1459–1500, 2011.

 Álvarez et al. (2010) Álvarez, M., Luengo, D., Titsias, M., and Lawrence, N. D. Efficient multi-output gaussian processes through variational inducing kernels. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 25–32, 2010.
 Bycott & Taylor (1998) Bycott, P. and Taylor, J. A comparison of smoothing techniques for cd4 data measured with error in a time-dependent cox proportional hazards model. Statistics in medicine, 17(18):2061–2077, 1998.
 Cheng (2018) Cheng, C. Multi-scale gaussian process experts for dynamic evolution prediction of complex systems. Expert Systems with Applications, 99:25–31, 2018.
 Cox (1972) Cox, D. R. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972.
 Crowther et al. (2012) Crowther, M. J., Abrams, K. R., and Lambert, P. C. Flexible parametric joint modelling of longitudinal and survival data. Statistics in Medicine, 31(30):4456–4471, 2012.
 Crowther et al. (2013) Crowther, M. J., Abrams, K. R., and Lambert, P. C. Joint modeling of longitudinal and survival data. The Stata Journal, 13(1):165–184, 2013.
 Damianou & Lawrence (2013) Damianou, A. and Lawrence, N. D. Deep gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215, 2013.
 Dempsey et al. (2017) Dempsey, W. H., Moreno, A., Scott, C. K., Dennis, M. L., Gustafson, D. H., Murphy, S. A., and Rehg, J. M. isurvive: An interpretable, event-time prediction model for mhealth. In International Conference on Machine Learning, pp. 970–979, 2017.
 Dürichen et al. (2015) Dürichen, R., Pimentel, M. A., Clifton, L., Schweikard, A., and Clifton, D. A. Multitask gaussian processes for multivariate physiological time-series analysis. IEEE Transactions on Biomedical Engineering, 62(1):314–322, 2015.
 Fernández et al. (2016) Fernández, T., Rivera, N., and Teh, Y. W. Gaussian processes for survival analysis. In Advances in Neural Information Processing Systems, pp. 5021–5029, 2016.
 Gao et al. (2015) Gao, R., Wang, L., Teti, R., Dornfeld, D., Kumara, S., Mori, M., and Helu, M. Cloud-enabled prognosis for manufacturing. CIRP annals, 64(2):749–772, 2015.
 Gasmi et al. (2003) Gasmi, S., Love, C. E., and Kahle, W. A general repair, proportional-hazards, framework to model complex repairable systems. IEEE Transactions on Reliability, 52(1):26–32, 2003.
 Guarnizo & Álvarez (2018) Guarnizo, C. and Álvarez, M. A. Fast kernel approximations for latent force models and convolved multiple-output gaussian processes. arXiv preprint arXiv:1805.07460, 2018.
 He et al. (2015) He, Z., Tu, W., Wang, S., Fu, H., and Yu, Z. Simultaneous variable selection for joint models of longitudinal and survival outcomes. Biometrics, 71(1):178–187, 2015.
 Kalbfleisch & Prentice (2011) Kalbfleisch, J. D. and Prentice, R. L. The statistical analysis of failure time data, volume 360. John Wiley & Sons, 2011.
 Kim & Pavlovic (2018) Kim, M. and Pavlovic, V. Variational inference for gaussian process models for survival analysis. Uncertainty in Artificial Intelligence, 2018.
 Kontar et al. (2018a) Kontar, R., Zhou, S., Sankavaram, C., Du, X., and Zhang, Y. Nonparametric condition-based remaining useful life prediction incorporating external factors. IEEE Transactions on Reliability, 67(1):41–52, 2018a.
 Kontar et al. (2018b) Kontar, R., Zhou, S., Sankavaram, C., Du, X., and Zhang, Y. Nonparametric modeling and prognosis of condition monitoring signals using multivariate gaussian convolution processes. Technometrics, 60(4):484–496, 2018b.
 Liu et al. (2013) Liu, K., Gebraeel, N. Z., and Shi, J. A data-level fusion model for developing composite health indices for degradation modeling and prognostic analysis. IEEE Transactions on Automation Science and Engineering, 10(3):652–664, 2013.
 Mauff et al. (2018) Mauff, K., Steyerberg, E., Kardys, I., Boersma, E., and Rizopoulos, D. Joint models with multiple longitudinal outcomes and a time-to-event outcome. arXiv preprint arXiv:1808.07719, 2018.
 Moreno-Muñoz et al. (2018) Moreno-Muñoz, P., Artés-Rodríguez, A., and Álvarez, M. A. Heterogeneous multi-output gaussian process prediction. arXiv preprint arXiv:1805.07633, 2018.
 Pham et al. (2012) Pham, H. T., Yang, B., and Nguyen, T. T. Machine performance degradation assessment and remaining useful life prediction using proportional hazard model and support vector machine. Mechanical Systems and Signal Processing, 32:320–330, 2012.
 Proust-Lima et al. (2014) Proust-Lima, C., Séne, M., Taylor, J. M., and Jacqmin-Gadda, H. Joint latent class models for longitudinal and time-to-event data: A review. Statistical methods in medical research, 23(1):74–90, 2014.
 Rizopoulos (2011) Rizopoulos, D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics, 67(3):819–829, 2011.
 Rizopoulos (2012) Rizopoulos, D. Joint models for longitudinal and time-to-event data: With applications in R. Chapman and Hall/CRC, 2012.
 Rizopoulos et al. (2017) Rizopoulos, D., Molenberghs, G., and Lesaffre, E. M. Dynamic predictions with time-dependent covariates in survival analysis using joint modeling and landmarking. Biometrical Journal, 59(6):1261–1276, 2017.
 Rosenberg (1995) Rosenberg, P. S. Hazard function estimation using b-splines. Biometrics, pp. 874–887, 1995.
 Ruppert (2002) Ruppert, D. Selecting the number of knots for penalized splines. Journal of computational and graphical statistics, 11(4):735–757, 2002.
 Saxena & Goebel (2008) Saxena, A. and Goebel, K. Turbofan engine degradation simulation data set, 2008. data retrieved from NASA Ames Prognostics Data Repository, https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognosticdatarepository/.
 Snelson & Ghahramani (2006) Snelson, E. and Ghahramani, Z. Sparse gaussian processes using pseudo-inputs. In Advances in neural information processing systems, pp. 1257–1264, 2006.
 Soleimani et al. (2018) Soleimani, H., Hensman, J., and Saria, S. Scalable joint models for reliable uncertainty-aware event prediction. IEEE transactions on pattern analysis and machine intelligence, 40(8):1948–1963, 2018.
 Titsias & Lawrence (2010) Titsias, M. and Lawrence, N. D. Bayesian gaussian process latent variable model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 844–851, 2010.
 Tsiatis et al. (1995) Tsiatis, A. A., Degruttola, V., and Wulfsohn, M. S. Modeling the relationship of survival to longitudinal data measured with error. applications to survival and cd4 counts in patients with aids. Journal of the American Statistical Association, 90(429):27–37, 1995.
 Wu et al. (2012) Wu, L., Liu, W., Yi, G. Y., and Huang, Y. Analysis of longitudinal and survival data: joint modeling, inference methods, and issues. Journal of Probability and Statistics, 2012, 2012.
 Wulfsohn & Tsiatis (1997) Wulfsohn, M. S. and Tsiatis, A. A. A joint model for survival and longitudinal data measured with error. Biometrics, pp. 330–339, 1997.
 Yan et al. (2016) Yan, H., Liu, K., Zhang, X., and Shi, J. Multiple sensor data fusion for degradation modeling and prognostics under multiple operational conditions. IEEE Transactions on Reliability, 65(3):1416–1426, 2016.
 Yu et al. (2004) Yu, M., Law, N. J., Taylor, J. M., and Sandler, H. M. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica, pp. 835–862, 2004.
 Zhao & Sun (2016) Zhao, J. and Sun, S. Variational dependent multi-output gaussian process dynamical systems. The Journal of Machine Learning Research, 17(1):4134–4169, 2016.
 Zhou et al. (2014) Zhou, Q., Son, J., Zhou, S., Mao, X., and Salman, M. Remaining useful life prediction of individual units subject to hard failure. IIE Transactions, 46(10):1017–1030, 2014.
 Zhu et al. (2012) Zhu, H., Ibrahim, J. G., Chi, Y., and Tang, N. Bayesian influence measures for joint models for longitudinal and survival data. Biometrics, 68(3):954–964, 2012.