Cystic fibrosis (CF) is a lethal genetic disorder that primarily affects the lungs. The clinical course of CF is marked by progressive loss of lung function and typically results in respiratory failure. Forced expiratory volume in 1 second (hereafter, ) is the most important clinical indicator in monitoring lung function decline in patients with CF. Patients during follow-up might experience acute respiratory events referred to as pulmonary exacerbations. It is, therefore, of clinical interest to characterize the association between the longitudinal outcome and time-to first exacerbation. The motivation for our research comes from the US CF Foundation Patient Registry that consists of patients that were monitored from 2003 until 2015. In particular, we examined data from 1016 patients aged 6 years and older who were enrolled with a median number follow-up equal to 7 years and a range of 1-21 years. The average age at baseline is 14.6 years (with a range of 6-21).
Several authors have studied the evolution of lung function over time, as summarized in a recent review (Szczesniak et al., 2017a), however, to our knowledge little work has been done regarding the association of the lung function such as with time to event outcomes. In particular, joint modeling of longitudinal and survival outcomes in CF was introduced several years ago (Schluchter et al., 2002), but has not been further used in CF epidemiology due to the computational burden of this approach. Furthermore, it is well recognized that different unobserved sub-groups of the biomarker exhibit different longitudinal profiles (Szczesniak et al., 2017b). Patients can be categorized in several sub-groups (latent classes) with different trajectories. It is, therefore, of high clinical interest to measure the strength of association between with the risk of first exacerbation accounting for the latent trajectories.
The joint model of longitudinal and survival data constitutes a popular framework to analyze longitudinal and survival outcomes jointly (Tsiatis and Davidian, 2004; Hickey et al., 2016). In particular, two paradigms within this framework are the shared parameter joint models and the joint latent class models. The former paradigm links the longitudinal and the survival process via the random effects (Faucett and Thomas, 1996; Wulfsohn and Tsiatis, 1997; Brown and Ibrahim, 2003; Rizopoulos and Ghosh, 2011; Rizopoulos, 2012; Andrinopoulou et al., 2014), which does not allow for latent classes. The latter paradigm (Lin et al., 2002; Proust-Lima et al., 2014; Rouanet et al., 2016), which associates the two processes through latent classes, explicitly postulates the existence of sub-populations but does not directly quantify the strength of the association.
The aim of the paper is twofold. Firstly, to model the relationship between
and time-to first exacerbation. For this purpose, we propose a Bayesian shared parameter joint model that integrates latent classes inherent in this heterogeneous population. This model will assess the strength of association between the two outcomes while allowing for latent classes. Secondly, to address a problem that arises in latent class models, which is the selection of the optimal number of classes. Several approaches have been proposed in the literature both in frequentist and Bayesian frameworks, including among others the use of information criterion, Bayes factors and reversible jump MCMC. These approaches are computationally intensive and can require the fit of several models with different numbers of classes, which can be time-consuming. To overcome this problem, we will implement the method ofNasserinejad et al. (2017) to our joint modeling. This method is a pragmatic extension of Rousseau and Mengersen (2011) criterion that showed that if we overfit a mixture model by assuming more latent classes than present in the data, the superfluous latent classes will asymptotically become empty if the Dirichlet prior on the class proportions is sufficiently uninformative. Nasserinejad et al. (2017)
performed an extensive simulation study to further investigate this approach and used it as a criterion also in longitudinal studies for obtaining the optimal number of classes by simply excluding latent classes that are negligible in proportion.
2 Joint Model Estimation
2.1 Longitudinal submodel
To account for the fact that the population is heterogeneous and consists of possible unobserved sub-groups, we postulate a latent class mixed-effects model (Verbeke and Lesaffre, 1996; Proust and Jacqmin-Gadda, 2005; Proust-Lima et al., 2013). We let
denote the longitudinal response vector for theth patient () obtained at different time points , . In particular, we have
where presents the latent class indicator, denotes the design vector for the fixed effects regression coefficients and the design vector for the random effects . Moreover,
. For the corresponding random effects, we assume a multivariate normal distribution, namely
where denotes the normal distribution andof belonging to latent class . Using a multinomial distribution we obtain the class of each individual as,
According to the specification of the latent class mixed-effects submodel (1), both fixed and random effects are class-specific, whereas the measurement error is not.
2.2 Survival submodel
We let denote the true failure time for the -th individual, and the censoring time. Moreover, denotes the observed failure time and is the event indicator where zero corresponds to censoring. We postulate a joint model for the relationship between the survival and the longitudinal outcome. Specifically, we have
where is a vector of baseline covariates with a corresponding vector of regression coefficients and is the baseline hazard. Specifically, the B-splines baseline hazard function is assumed , where denotes the -th basis function of a B-spline with knots and the vector of spline coefficients. The knots are placed at equally spaced percentiles of the observed event times. Furthermore, denotes the association parameter for the th class. According to the specification of the survival submodel (2) the baseline covariates, the baseline hazard and the association parameter are class-specific parameters. The proposed model goes beyond the standard joint model and joint latent class model where a single or no association parameter is assumed and provides a class-specific association. This is a more realistic assumption for the motivating data set since it is clinically expected that the risk of the first exacerbation will be higher when the rate of decline is faster. Accounting for these latent classes will lead to improved estimates of association arising from the joint model.
3 Bayesian Estimation
We employ a Bayesian approach where inference is based on the posterior distribution of parameters in the model. We use Markov chain Monte Carlo (MCMC) methods to estimate the parameters of the proposed model. The likelihood of the model is derived under the assumption that the longitudinal and survival processes are independent given the random effects. Moreover, the longitudinal responses of each subject are assumed independent given the random effects(Rizopoulos, 2012). The likelihood contribution for the -th patient is written as
where with and .
The likelihood contribution of the longitudinal outcome takes the form
The likelihood contribution of the survival model is given by
The posterior distribution is written as
and denotes the prior distributions.
A commonly used prior in mixture models for the class probability is a Dirichlet distribution. In particular,
Small values of correspond to a less informative prior and a flat prior distribution is obtained when each is equal to . The selection of is an important task and will be discussed in Section 3.1. Standard priors can be assumed for the rest of the parameters. In particular, for the coefficients of the longitudinal fixed effects, the survival covariates and the baseline hazard, normal priors can be taken. For the variance-covariance matrix of the random effects we can assume an inverse Wishart prior, while for the precision parameter of the longitudinal outcome we can assume an gamma prior.
3.1 Selection of Number of Classes
An important task in latent class models is to identify the optimal number of classes. Several approaches have been previously proposed for choosing the optimal number of classes in both frequentist and Bayesian settings. Common examples are the Bayesian information criterion (BIC) (Schwarz et al., 1978), deviation information criterion (DIC) (Celeux et al., 2006) and other Bayesian approaches such as Bayes factor and reversible jump MCMC algorithm (Green, 1995). A drawback of the aforementioned approaches is that they are computationally intensive and some require the fit of models assuming different numbers of classes, which might be time-consuming for complex models such as the joint models of longitudinal and survival outcomes.
An interesting alternative was proposed by Rousseau and Mengersen (2011), where they proved that in overfitted mixture models (with more latent classes than present in the data), the superfluous latent classes will asymptomatically become empty if the Dirichlet prior on the class proportion is sufficiently uninformative. Recently, Nasserinejad et al. (2017) used this approach and proposed a latent class selection procedure for longitudinal models. An overfitted mixture model converged to the true mixture by assigning a small portion of individuals to empty classes, if the parameters of the Dirichlet prior are smaller than , where is the number of class-specific parameters. Furthermore, uninformative priors for the rest of the parameters are required. The steps are described as follows:
First, a latent class model with a large enough number of latent classes is fitted.
Then, the number of non-empty classes at each iteration is calculated as:
where is the total number of classes, represents the iteration, is the number of patients in class at iteration , is the total number of patients and is a predifined value.
After obtaining the non-empty classes per iteration, the posterior mode of the non-empty classes is calculated.
Finally, the model with the optimal number of classes which are the non-empty classes is refitted.
Advantages of this approach are that it is easy to implement even in such complex models and it is not influenced by the label switching problem since we observe the non-empty classes at each iteration. The only time that we need to correct for label switching is when we fit the final model with the optimal number of classes. Furthermore, this approach requires us to fit the model only two times, (namely one with the high number of classes and one with the optimal number of classes) instead of assuming all the possible number of classes, therefore decreasing computational burden. It has been shown through extensive simulations in the longitudinal setting that this method performs better than alternative model selection criteria such as BIC and DIC (Nasserinejad et al., 2017).
4 Analysis of the CF data
In this section we present the analysis of the motivating data set introduced in Section 1. Our primary focus is to investigate the association between and time-to first exacerbation by taking into account that we have sub-groups with different evolution over time for . The first step is to obtain the optimal number of classes that can explain the heterogeneity of the population. From the literature, it is known that two or three classes are observed for the evolution of outcome (Szczesniak et al., 2017b)
. Therefore, for the selection process, we fitted a joint model assuming six classes. For the longitudinal outcome, we assumed a linear mixed-effects submodel including natural cubic splines for time (modeled as age, in years) with two internal knots at 13.76 and 17.62 years (corresponding to 33.3% and 66.67% of the observed follow-up times) in both the fixed and random effects parts. The DIC criterion was used to investigate the need of non-linear evolution over time. Furthermore, we corrected for some baseline characteristics. These variables, togehter with descriptive statistics, are presented in Table1.
|Number of F508del alleles (genotype):|
|SESlow (Using state/federal or having no|
|insurance is a marker of low socioeconomic status):|
|MRSA (Methicillin-resistant Staphylococcus aureus):|
|MSSA (Methicillin-sensitive Staphylococcus aureus):|
|Pa (Pseudomonas aeruginosa):|
|CFRD (CF-related diabetes):|
|CFRD with or without fasting hyperglycemia||12|
|PancEnzymes (Taking a pancreatic enzyme supplement,|
|marks pancreatic insufficiency):|
Mean (standard deviation)
|(Total of number of visits within the prior year)||5 (3.4)|
Specifically, the model takes the form,
To investigate the association between and time-to first exacerbation, we postulated the proposed joint latent class model:
For the baseline hazard we assumed a quadratic B-splines basis with 8 equi-distance internal knots ranging from zero until 19.25 years.
Following the recommendation in Nasserinejad et al. (2017), for the Dirichlet distribution in the prior of the class probability we assumed smaller than , where is the number of parameters. To ensure that we have the same scale for the coefficients of the covariates in order to easier select uninformative priors, we standardized the outcome and the continuous variables (age and numVisityr). Relatively uninformative prior were selected for the parameters in the model. These priors are as follows:
denotes the gamma distribution anddenotes the inverse Wishart distribution with being the scale matrix and
the degrees of freedom which is set as the total number of the random effects. For the variance of the association parameter no large variance was required to ensure that we have a uninformative prior since, with the standard joint model we obtained an association parameter smaller than 0.1. The selection of the variances of these priors was investigated with simulations. We ran the MCMC using a single chain with 100,000 iterations, 50,000 burn-in and 10 thinning. The results indicated the presence of three classes, assuming a class that contains of 8% to 15% of the patients is considered to be empty (8%15%).
We reran the model assuming three classes and the normal scale of the continuous covariates and outcome. We ran the MCMCs with a single chain for 250,000 iterations, with a burn-in of 150,000 and thinning of 10 and fixed the label switching problem. Convergence was monitored by trace plots. In Figure 1 we illustrate the evolution of the longitudinal outcome in each class assuming patients who are F508del homozygotes, non-Hispanic, White, without low SES, no infections with MRSA, MSSA or aspergillus, no pancreatic enzyme use, no CFRD and had five visits within the prior year (which is the mean value of all observations). In particular, the upper plots represent female patients while the lower plots represent male patients. We obtain a faster progression in class two for female patients, while a faster progression in class two and three for male patients. In Figure 2
we illustrate the evolution of the longitudinal outcome in each class assuming patients who are F508del heterozygotes, Hispanic, White, have low SES, infections with MRSA, MSSA, aspergillus, use pancreatic enzymes, have impaired CFRD and had five visits within the prior year. Again, the upper plots represent female patients while the lower plots represent male patients. In this case, we obtain a small increase between age 10 and 15. However, the confidence interval is wider in that period, indicating that we do not have enough information for this combination of characteristics. We obtain a faster progression in class two for both female and male patients. Furthermore, the mean and the credible interval of the MCMC samples of the association parameters per class are presented in Figure3. We obtain a weak association between and time-to first exacerbation for the first and third class, while a stronger negative association for class two.
We performed a series of simulations to investigate the proposed class selection method on the joint modeling framework.
We assumed , and patients with maximum number of repeated measurements equal to ten. To simulate the continuous longitudinal outcome, we used the following linear mixed-effects model per data set. In particular,
. For simplicity, we adopted a linear effect of time for both the fixed and the random part, and corrected for a binary variable (). Time
was simulated from a uniform distribution between zero and. For the survival part, we assumed the following model:
The baseline risk was simulated from a Weibull distribution . For the simulation of the censoring times, an exponential censoring distribution was chosen so that the censoring rate was between 40% and 60%. Age was simulated from a normal distribution with mean 45 and standard deviation 15.7.
Under this setting we simulated three different data sets that have different parameters for the fixed effects in the longitudinal submodel, the baseline covariates and baseline hazard in the survival submodel, the variance-covariance matrix of the random effects and the association parameter (more details are presented in Table 2). Figure 4 illustrates the evolution of the longitudinal outcome per group from the simulation parameters for each one of the three data sets.
|Data set 1|
|(Intercept) = 8.03||0.69||0.87||1.8||10||(Intercept) = -4.85||0.38|
|Male = -5.86||0.02||Age = -0.02|
|Time = -0.16|
|Data set 2|
|(Intercept) = -8.03||0.69||0.02||1.4||10||(Intercept) = -4.85||0.08|
|Male = 12.20||0.91||Age = 0.09|
|Time = 0.46|
|Data set 3|
|(Intercept) = 0.03||0.69||0.28||1.8||10||(Intercept) = 2.85||0.58|
|Male = -1.96||0.31||Age = -0.12|
|Time = -0.01|
In order to investigate the proposed class selection approach, we applied the model in three different Scenarios. For Scenario I we combined all three data sets assuming individuals in each of them, for Scenario II we combined the first two data sets with and finally, for Scenario III we used only the first data set with . We fitted the proposed joint model assuming six classes where all the parameters were class-specific except for the measurement error from the mixed-effects model. These include the fixed effects from the longitudinal submodel (3 parameters), the baseline covariates from the survival model (2 parameters), the baseline hazard (5 parameters) from the survival submodel, the variance-covariance matrix of the random effects (3 parameters) and the association parameter (1 parameter). To simplify the simulations, for the baseline hazard we assumed quadratic B-splines basis with 5 equally distance internal knots. We assumed which is smaller than the total number of the class-specific parameters divided by two. The same priors were used as in the application and we ran the MCMC using a single chain with 50,000 iterations, 25,000 burn-in and 10 thinning.
We performed 150 simulations per Scenario. We compared our proposed method with the joint latent class model using function Jointlcmm from the lcmm package in R developed by Proust-Lima et al. (2015), where we used the BIC as criterion. In particular, we assumed that the covariates from the fixed effects in the mixed-effects model, the variance-covariance of the random effects, the baseline covariate and baseline hazard in the survival model are class-specific. We, furthermore, assumed a cubic M-splines baseline risk function.
The results from the different Scenarios assuming different cut off percentage indicating when a class is defined as empty are illustrated in Table 3. In particular, we present the percentage of true number of classes and the mode of the number of classes.
For Scenario I, we obtain the highest percentage when assuming to be between 12-15%. In particular, we obtain around 55% of the time the correct number of classes and a mode equal to the correct number of classes (three). On the other hand, the BIC in the joint latent class model selects only 20% of the time the correct number of classes. Furthermore, this method seems to underestimate the true number of classes (mode equal to one).
For Scenario II, we obtain the highest percentage when is between 8-15%. In particular, we obtain around 50% of the time the correct number of classes and a mode equal to the correct number of classes (two). On the other hand, the BIC in the joint latent class model selects only 12% of the time the correct number of classes. Similar to Scenario I, this method seems to underestimate the true number of classes (mode equal to one).
Finally, for Scenario III, the BIC in the joint latent class model seems to perform better than the proposed approach where it always selects the correct number of classes (one). This is not surprising, since the BIC always underestimated the true number of classes in the previous Scenarios. Using the proposed approach and assuming that the is equal to 15%, we obtain 43% of the time the correct number of class and a mode equal to two. In this Scenario some convergence problems were detected. When recalculating the % and mode including only the simulations that were converged we obtain similar percentages for the correct number of class and a mode equal to 1 when is 15%.
|(%)||true # of classes (%)||mode of # of classes|
In this paper we proposed a shared parameter joint model incorporating latent classes. Applying it to CF data, this models accounted for patient heterogeneity inherent in the progression of . Compared to previously proposed joint latent class models (Proust-Lima et al., 2014) we obtained the strength of the association between and time-to first exacerbation per group of patients. Finally, we focused on the selection of the optimal number of classes and used an overfitted mixture model (high number of classes) to obtain the non-empty classes.
A limitation of this approach is that it requires an intensive computational effort. In particular, for the class selection, where a model with a high number of classes is required, the number of parameters increases drastically. This, in combination with the high number of observations in the CF application increases the computational time that is required. Considering the difficulty of this model it is almost impossible to obtain the optimal number of classes with other Bayesian criterion. Implementing the proposed criterion is straightforward; however, due to the complexity of the model it is computationally expensive to fit a model with a larger number of classes, e.g., 10. This could also explain the fact the a higher percentage for the predefined number was required in order to obtain the non-empty number of classes. It was shown in the simulation analysis that the BIC always underestimated the true number of parameters and it, therefore, performed better when the true number of classes was one. Even though, in that Scenario the proposed method did not work perfectly, it seems to be better than other criteria and easier to perform.
Possible extensions would be to include more covariates also in the survival submodel in order to take into account extra information regarding the patients. Furthermore, using the proposed model for obtaining future measurement and time-to first exacerbation probabilities, could lead to more efficient treatment prioritization and clinical management for patients with CF.
- Andrinopoulou et al. (2014) E.-R. Andrinopoulou, D. Rizopoulos, J. J. Takkenberg, and E. Lesaffre. Joint modeling of two longitudinal outcomes and competing risk data. Statistics in medicine, 33(18):3167–3178, 2014.
- Brown and Ibrahim (2003) E. R. Brown and J. G. Ibrahim. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics, 59(3):686–693, 2003.
- Celeux et al. (2006) G. Celeux, F. Forbes, C. P. Robert, D. M. Titterington, et al. Deviance information criteria for missing data models. Bayesian analysis, 1(4):651–673, 2006.
- Faucett and Thomas (1996) C. L. Faucett and D. C. Thomas. Simultaneously modelling censored survival data and repeatedly measured covariates: a gibbs sampling approach. Statistics in medicine, 15(15):1663–1685, 1996.
- Green (1995) P. J. Green. Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika, 82(4):711–732, 1995.
- Hickey et al. (2016) G. L. Hickey, P. Philipson, A. Jorgensen, and R. Kolamunnage-Dona. Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC medical research methodology, 16(1):117, 2016.
- Lin et al. (2002) H. Lin, B. W. Turnbull, C. E. McCulloch, and E. H. Slate. Latent class models for joint analysis of longitudinal biomarker and event process data: application to longitudinal prostate-specific antigen readings and prostate cancer. Journal of the American Statistical Association, 97(457):53–65, 2002.
- Nasserinejad et al. (2017) K. Nasserinejad, J. van Rosmalen, W. de Kort, and E. Lesaffre. Comparison of criteria for choosing the number of classes in bayesian finite mixture models. PloS one, 12(1):e0168838, 2017.
Proust and Jacqmin-Gadda (2005)
C. Proust and H. Jacqmin-Gadda.
Estimation of linear mixed models with a mixture of distribution for the random effects.Computer methods and programs in biomedicine, 78(2):165–173, 2005.
- Proust-Lima et al. (2013) C. Proust-Lima, H. Amieva, and H. Jacqmin-Gadda. Analysis of multivariate mixed longitudinal data: a flexible latent process approach. British Journal of Mathematical and Statistical Psychology, 66(3):470–487, 2013.
- Proust-Lima et al. (2014) C. Proust-Lima, M. Séne, J. M. Taylor, and H. Jacqmin-Gadda. Joint latent class models for longitudinal and time-to-event data: A review. Statistical methods in medical research, 23(1):74–90, 2014.
- Proust-Lima et al. (2015) C. Proust-Lima, V. Philipps, and B. Liquet. Estimation of extended mixed models using latent classes and latent processes: the r package lcmm. arXiv preprint arXiv:1503.00890, 2015.
- Rizopoulos (2012) D. Rizopoulos. Joint models for longitudinal and time-to-event data: With applications in R. Chapman and Hall/CRC Biostatistics Series, Boca Raton, 2012.
- Rizopoulos and Ghosh (2011) D. Rizopoulos and P. Ghosh. A bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Statistics in medicine, 30(12):1366–1380, 2011.
- Rouanet et al. (2016) A. Rouanet, P. Joly, J.-F. Dartigues, C. Proust-Lima, and H. Jacqmin-Gadda. Joint latent class model for longitudinal data and interval-censored semi-competing events: Application to dementia. Biometrics, 72(4):1123–1135, 2016.
- Rousseau and Mengersen (2011) J. Rousseau and K. Mengersen. Asymptotic behaviour of the posterior distribution in overfitted mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(5):689–710, 2011.
- Schluchter et al. (2002) M. D. Schluchter, M. W. Konstan, and P. B. Davis. Jointly modelling the relationship between survival and pulmonary function in cystic fibrosis patients. Statistics in medicine, 21(9):1271–1287, 2002.
- Schwarz et al. (1978) G. Schwarz et al. Estimating the dimension of a model. The annals of statistics, 6(2):461–464, 1978.
- Szczesniak et al. (2017a) R. Szczesniak, S. L. Heltshe, S. Stanojevic, and N. Mayer-Hamblett. Use of fev1 in cystic fibrosis epidemiologic studies and clinical trials: A statistical perspective for the clinical researcher. Journal of Cystic Fibrosis, 16(3):318–326, 2017a.
- Szczesniak et al. (2017b) R. D. Szczesniak, D. Li, W. Su, C. Brokamp, J. Pestian, M. Seid, and J. P. Clancy. Phenotypes of rapid cystic fibrosis lung disease progression during adolescence and young adulthood. American journal of respiratory and critical care medicine, 196(4):471–478, 2017b.
- Tsiatis and Davidian (2004) A. A. Tsiatis and M. Davidian. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14:809–834, 2004.
- Verbeke and Lesaffre (1996) G. Verbeke and E. Lesaffre. A linear mixed-effects model with heterogeneity in the random-effects population. Journal of the American Statistical Association, 91(433):217–221, 1996.
- Wulfsohn and Tsiatis (1997) M. S. Wulfsohn and A. A. Tsiatis. A joint model for survival and longitudinal data measured with error. Biometrics, pages 330–339, 1997.