Statistical approaches using longitudinal biomarkers for disease early detection: A comparison of methodologies

08/21/2019
by   Yongli Han, et al.
National Institutes of Health

The early detection of clinical outcomes such as cancer may be predicted from longitudinal biomarker measurements. Tracking longitudinal biomarkers as a way to identify early disease onset may help to reduce mortality from diseases like ovarian cancer that are more treatable if detected early. Two general frameworks for disease risk prediction, the shared random effects model (SREM) and the pattern mixture model (PMM), could be used to assess longitudinal biomarkers for disease early detection. In this paper, we studied the predictive performances of SREM and PMM for disease early detection through an application to ovarian cancer, where early detection using the risk of ovarian cancer algorithm (ROCA) has been evaluated. Comparisons of the three methods (SREM, PMM, and ROCA) were performed via analyses of the ovarian cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and extensive simulation studies. The time-dependent receiver operating characteristic (ROC) curve and its area (AUC) were used to evaluate prediction accuracy. The out-of-sample predictive performance was calculated using leave-one-out cross-validation (LOOCV), aiming to minimize the problem of model overfitting. A careful analysis of the use of the biomarker cancer antigen 125 (CA-125) for ovarian cancer early detection showed improved performance of PMM as compared with SREM and ROCA. More generally, simulation studies showed that PMM outperforms ROCA unless biomarkers are measured at very frequent screening intervals.


1 Introduction

Epidemiologic studies usually incorporate longitudinal biomarkers into the prediction of clinical outcomes. Early disease detection could also benefit from this approach, since subject-specific biomarker trajectories often contain additional information on critical time points and pathology (Drescher et al., 2013; Menon et al., 2015; Han and Liu, 2019). Tracking longitudinal biomarkers in a population may result in earlier disease detection and may help to reduce mortality from diseases that are more treatable if detected early (McIntosh and Urban, 2003). One example is ovarian cancer, which is the fifth leading cause of cancer-related deaths among U.S. women and one of the most lethal gynecological cancers (Skates et al., 2003; Zhang et al., 2004; Clarke-Pearson, 2009; Henderson et al., 2018; Russell et al., 2017). Ovarian cancer usually has no symptoms at its early stage and develops undetected until it has spread within the pelvis and abdomen (Matulonis et al., 2016). U.S. ovarian cancer survival statistics show that the 5-year survival rate for women with late-stage ovarian cancer is only 29.2% (23% in the U.K.), in contrast to 92.4% (about 90% in the U.K.) for those diagnosed at an early stage (Howlader et al., 2019; Russell et al., 2017). Although early-stage ovarian cancer can be treated with a higher success rate (Clarke-Pearson, 2009), the majority of ovarian cancer cases are diagnosed at a late stage, when curative treatment is rarely available, making methods for detecting early-stage ovarian cancer desirable (Skates et al., 2017). However, large randomized trials have so far not shown a survival benefit for current ovarian cancer early detection approaches (Pinsky et al., 2013; Wentzensen, 2016).

When longitudinal biomarker information is used to predict a subsequent binary outcome, good modeling of the biomarker trajectories is often the key to obtaining accurate outcome predictions. However, for ovarian cancer early detection, little research has investigated how different strategies for modeling the biomarker trajectories affect prediction accuracy in the longitudinal setting. Moreover, comparing the different techniques is complicated, in part because the comparison may depend on the screening frequency of the biomarker.

The risk of ovarian cancer algorithm (ROCA) has been proposed for ovarian cancer early detection using repeatedly measured values of the serum biomarker cancer antigen 125 (CA-125) (Skates et al., 2001). Specifically, ROCA separately models the longitudinal CA-125 trajectories for the cases and the controls. For the controls, a constant mean model of CA-125 is assumed, with a random intercept term that accounts for subject heterogeneity. For the cases, the CA-125 trajectory is assumed to be piecewise linear with a latent subject-specific changepoint. The probability of early detection is then constructed using Bayes' theorem.

Recently, two general frameworks for disease risk prediction by modeling longitudinal biomarker behavior have been developed. The shared random effects model (SREM) (Albert, 2012) predicts a binary outcome from longitudinal biomarkers by assuming a shared random effects structure that links the binary outcome and the longitudinal process, while the pattern mixture model (PMM) (Liu and Albert, 2014) fits the biomarker distributions conditional on the binary outcome and then constructs the outcome prediction using Bayes' theorem. SREM and PMM were originally proposed for disease risk prediction using serial biomarker values and the associated observation times (Han and Liu, 2019; Liu and Albert, 2014), but they can be applied more generally to predict disease early detection in the longitudinal setting.

In this paper, we examined and evaluated the utility of SREM and PMM for disease early detection through an application to ovarian cancer, in which SREM and PMM were compared with ROCA. The comparisons were performed via simulation studies and an empirical analysis of the ovarian cancer data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Specifically, we first extended ROCA to identify the potential effects of the baseline age and the screening time on the marker trajectories and then proposed a maximum-likelihood approach for parameter estimation. We also proposed specific formulations of SREM and PMM for predicting the early detection of ovarian cancer. The modeling forms under SREM, PMM, and ROCA for the longitudinal CA-125 trajectories in the PLCO Cancer Screening Trial were compared, and their effects on the prediction accuracy of ovarian cancer early detection were further assessed. The predictive performances of the different models were evaluated by the time-dependent receiver operating characteristic (ROC) curve and its area (AUC) (Heagerty et al., 2000), so that right-censored cancer diagnosis times can be incorporated into the AUC calculation. In contrast, previous studies simply used an ordinary ROC curve to examine the accuracy of ROCA (Russell et al., 2017), treating the ovarian cancer outcome as binary and hence ignoring the diagnosis time information. Furthermore, to estimate out-of-sample prediction accuracy, we applied the leave-one-out cross-validation (LOOCV) technique to minimize the problem of model overfitting (Russell et al., 2017; Berchuck et al., 2005). In addition, to check the effects of biomarker measurement frequency on the prediction accuracy of SREM, PMM, and ROCA for ovarian cancer early detection, we considered three screening frequencies (annual, biannual, and quarterly) in the simulation studies. Our research addresses how the choice among SREM, PMM, and ROCA, as well as the screening frequency, affects the prediction accuracy of ovarian cancer early detection.

The rest of this paper is organized as follows. Section 2 briefly reviews the general settings of the SREM and PMM frameworks. Section 3 first introduces the problem of predicting ovarian cancer early detection and reviews ROCA. It then points out potential issues of ROCA and proposes several extensions. Finally, it specifies the formulations of SREM and PMM tailored for predicting ovarian cancer early detection. We compare the prediction performances of PMM, SREM, and ROCA in Section 4 through an application to the PLCO ovarian cancer trial and perform additional simulation studies in Section 5. Section 6 concludes with a discussion.

2 Review of Methods

In this section, we review the SREM and PMM frameworks. Let $Y_{ij}$ denote the biomarker value for the $i$th individual at time $t_{ij}$, where $i = 1, \ldots, n$, $n$ is the sample size, $j = 1, \ldots, n_i$, and $n_i$ is the number of longitudinal biomarker measurements for the $i$th participant. Write $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{in_i})^\top$. Without loss of generality, assume the first $n_1$ subjects are cases and the rest are controls. Let $D_i$ be the binary outcome, such that $D_i = 1$ indicates a case and $D_i = 0$ denotes a control.

2.1 Shared Random Effects Model (SREM)

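SREM jointly models the longitudinal biomarker trajectories and the binary disease outcomes, which are assumed to share the same set of random effects (Albert, 2012). For both the case and the control trajectories, a linear mixed model was proposed

$$\mathbf{Y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\epsilon}_i, \tag{1}$$

where $\mathbf{X}_i$ and $\mathbf{Z}_i$ are design matrices of the fixed and random covariates, $\boldsymbol{\beta}$ are the fixed effects, $\mathbf{b}_i$ stand for the random effects, and $\boldsymbol{\epsilon}_i$ are the random measurement errors. The relation between $D_i$ and $\mathbf{b}_i$ is given as

$$g\{P(D_i = 1 \mid \mathbf{b}_i)\} = \alpha_0 + \boldsymbol{\alpha}^\top h(\mathbf{b}_i), \tag{2}$$

where $g(\cdot)$ is a link function and $h(\mathbf{b}_i)$ is a function of the random effects. In the simulation studies and the real data analyses, we set $h(\mathbf{b}_i) = \mathbf{b}_i$, so that each random effect can affect the disease outcome differently. The strength of association between the longitudinal process and the binary outcome is quantified by $\boldsymbol{\alpha}$, while $\alpha_0$ is the intercept.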

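Because SREM specifies the distributions of $\mathbf{Y}_i \mid \mathbf{b}_i$ and $D_i \mid \mathbf{b}_i$, with $\mathbf{Y}_i$ and $D_i$ conditionally independent given $\mathbf{b}_i$, the joint distribution of $\mathbf{Y}_i$ and $D_i$ can be derived by integrating over the random effects $\mathbf{b}_i$. The diagnosis probability can thus be calculated from the joint distribution of $\mathbf{Y}_i$ and $D_i$ as

$$P(D_i = 1 \mid \mathbf{Y}_i) = \frac{\int P(D_i = 1 \mid \mathbf{b}_i)\, f(\mathbf{Y}_i \mid \mathbf{b}_i)\, f(\mathbf{b}_i)\, d\mathbf{b}_i}{\int f(\mathbf{Y}_i \mid \mathbf{b}_i)\, f(\mathbf{b}_i)\, d\mathbf{b}_i}, \tag{3}$$

where $f(\mathbf{b}_i)$ is the probability density function (PDF) of $\mathbf{b}_i$.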

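If a probit link function is used and $h(\mathbf{b}_i) = \mathbf{b}_i$, Albert (2012) showed that SREM estimation can be substantially simplified by decomposing the joint likelihood function of $\mathbf{Y}_i$ and $D_i$ as

$$L = \prod_{i=1}^{n} f(\mathbf{Y}_i, D_i) = \underbrace{\prod_{i=1}^{n} f(\mathbf{Y}_i)}_{L_1} \times \underbrace{\prod_{i=1}^{n} P(D_i \mid \mathbf{Y}_i)}_{L_2}, \tag{4}$$

where $f(\mathbf{Y}_i)$ is from the longitudinal process in (1). With normally distributed random effects and errors, $P(D_i = 1 \mid \mathbf{Y}_i)$ has an explicit expression

$$P(D_i = 1 \mid \mathbf{Y}_i) = \Phi\left(\frac{\alpha_0 + \boldsymbol{\alpha}^\top E(\mathbf{b}_i \mid \mathbf{Y}_i)}{\sqrt{1 + \boldsymbol{\alpha}^\top \mathrm{Var}(\mathbf{b}_i \mid \mathbf{Y}_i)\,\boldsymbol{\alpha}}}\right), \tag{5}$$

in which $\Phi(\cdot)$ is the standard normal cumulative distribution function. The parameters in (1) are estimated by maximizing $L_1$, while $\alpha_0$ and $\boldsymbol{\alpha}$ in (2) are estimated by maximizing $L_2$, given the estimates from $L_1$. The probability $P(D_i = 1 \mid \mathbf{Y}_i)$ is then obtained by replacing the parameters in (5) with their maximum likelihood estimates (MLEs).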
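The closed form in (5) can be made concrete with a short Python sketch that computes the diagnosis probability for one subject from already-estimated parameters. It assumes, for illustration only, random effects $\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G})$, errors $\boldsymbol{\epsilon}_i \sim N(\mathbf{0}, \sigma^2\mathbf{I})$, a probit link, and $h(\mathbf{b}_i) = \mathbf{b}_i$. The names `srem_probability` and `std_normal_cdf`, and all numbers in the example, are hypothetical and do not come from the authors' code.

```python
import math

import numpy as np


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def srem_probability(y, X, Z, beta, G, sigma2, alpha0, alpha) -> float:
    """P(D_i = 1 | Y_i) under a probit SREM with h(b_i) = b_i (a sketch).

    Assumed model: Y_i = X_i beta + Z_i b_i + eps_i, b_i ~ N(0, G),
    eps_i ~ N(0, sigma2 * I), and P(D_i = 1 | b_i) = Phi(alpha0 + alpha' b_i).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    beta = np.asarray(beta, dtype=float)
    G = np.asarray(G, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    # Marginal covariance of Y_i once the random effects are integrated out.
    V = Z @ G @ Z.T + sigma2 * np.eye(len(y))
    # K = G Z' V^{-1}, computed with a linear solve instead of an explicit inverse.
    K = np.linalg.solve(V, Z @ G).T
    # Conditional moments of b_i given Y_i.
    mean_b = K @ (y - X @ beta)
    var_b = G - K @ Z @ G
    # Averaging Phi(alpha0 + alpha' b) over b ~ N(mean_b, var_b) gives a probit form.
    numerator = alpha0 + float(alpha @ mean_b)
    denominator = math.sqrt(1.0 + float(alpha @ var_b @ alpha))
    return std_normal_cdf(numerator / denominator)


# Example: random-intercept model, three annual log CA-125 values (made-up numbers).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Z = [[1.0], [1.0], [1.0]]
p = srem_probability([2.5, 2.7, 3.1], X, Z, beta=[2.3, 0.0], G=[[0.2]],
                     sigma2=0.07, alpha0=-2.0, alpha=[1.5])
print(round(p, 4))
```

The conditional moments are computed with a linear solve rather than by inverting $\mathbf{V}$. When $\boldsymbol{\alpha} = \mathbf{0}$, the expression reduces to $\Phi(\alpha_0)$, as (5) implies.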

2.2 Pattern Mixture Model (PMM)

PMM directly assumes a linear mixed model for the biomarker trajectories conditional on the binary outcome, i.e., for $\mathbf{Y}_i \mid D_i$. In other words, PMM formulates the longitudinal behavior of diseased and non-diseased subjects separately. With normal random effects and error terms, $\mathbf{Y}_i \mid D_i$ follows a multivariate normal distribution

$$\mathbf{Y}_i \mid D_i = d \sim \mathrm{MVN}(\boldsymbol{\mu}_{id}, \boldsymbol{\Sigma}_{id}), \quad d = 0, 1, \tag{6}$$

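where the mean vectors $\boldsymbol{\mu}_{id}$ and the covariance matrices $\boldsymbol{\Sigma}_{id}$ are both functions of the covariates. The linear mixed models can be easily estimated with standard statistical software packages. The probability $P(D_i = 1 \mid \mathbf{Y}_i)$ can then be obtained using Bayes' rule

$$P(D_i = 1 \mid \mathbf{Y}_i) = \frac{f(\mathbf{Y}_i \mid D_i = 1)\, P(D_i = 1)}{f(\mathbf{Y}_i \mid D_i = 1)\, P(D_i = 1) + f(\mathbf{Y}_i \mid D_i = 0)\, P(D_i = 0)}, \tag{7}$$

where $P(D_i = 1)$ is the prior probability, which is often known or can be estimated from empirical data. The likelihood ratio $f(\mathbf{Y}_i \mid D_i = 1)/f(\mathbf{Y}_i \mid D_i = 0)$ under PMM has been shown to be the optimal combination of the longitudinal biomarkers, provided that $f(\mathbf{Y}_i \mid D_i)$ can be accurately estimated (Liu and Albert, 2014).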
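A short Python sketch can illustrate how Bayes' rule in (7) combines the two class-conditional multivariate normal densities in (6). It assumes that the fitted case and control means and covariances for one subject are already available. The helper names (`mvn_logpdf`, `pmm_probability`, `compound_symmetric`) and all numbers are hypothetical. The computation is done on the log scale to avoid underflow when trajectories are long.

```python
import numpy as np


def mvn_logpdf(y, mean, cov) -> float:
    """Log density of a multivariate normal distribution."""
    y = np.asarray(y, dtype=float)
    diff = y - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = float(diff @ np.linalg.solve(cov, diff))
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)


def pmm_probability(y, mean_case, cov_case, mean_control, cov_control,
                    prior_case) -> float:
    """Bayes' rule (7) with MVN class-conditional densities as in (6)."""
    log_case = mvn_logpdf(y, mean_case, cov_case) + np.log(prior_case)
    log_control = mvn_logpdf(y, mean_control, cov_control) + np.log1p(-prior_case)
    # P = 1 / (1 + exp(log_control - log_case)), evaluated on the log scale.
    return float(1.0 / (1.0 + np.exp(log_control - log_case)))


def compound_symmetric(n, var_b0, var_eps):
    """Random-intercept covariance var_b0 * J + var_eps * I (illustrative)."""
    return var_b0 * np.ones((n, n)) + var_eps * np.eye(n)


# Illustrative case/control means for three screens (made-up numbers).
cov = compound_symmetric(3, 0.2, 0.07)
p = pmm_probability([2.4, 2.9, 3.6],
                    mean_case=[2.4, 2.8, 3.4], cov_case=cov,
                    mean_control=[2.3, 2.3, 2.3], cov_control=cov,
                    prior_case=0.01)
```

When the two class-conditional densities coincide, the posterior equals the prior. A rising trajectory close to the case mean pushes the probability well above the prior.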

PMM and SREM derive the disease risk prediction as an implicit function of the biomarkers and their observation time, and can be applied more generally, for instance, to disease early detection. It has been shown that PMM is a close approximation to SREM but the converse is not true: if SREM is the true data generation model, both PMM and SREM would have similar performances; but if PMM is the truth, SREM generally results in sub-optimal risk prediction (Liu and Albert, 2014; Han and Liu, 2019).

3 Methods for ovarian cancer early detection

Early disease detection may help to prevent death from diseases like cancer that are more curable at an early stage. This is especially true for ovarian cancer, for which curative treatment is rarely available when it is detected at a late stage. Good modeling of the biomarker trajectories is usually the key to accurate detection prediction. To understand the distinctive features of ovarian cancer biomarker trajectories, we considered samples from the PLCO Cancer Screening Trial, in which the biomarker cancer antigen 125 (CA-125) was studied for screening. Figure 1 shows the trajectories of log-transformed CA-125 for 50 cases and 50 controls randomly selected from the PLCO Cancer Screening Trial. The cases are women from the intervention arm of the trial who developed ovarian cancer during screening. The controls are women who did not, also drawn from the intervention arm so that CA-125 levels were available.

Figure 1: CA-125 trajectories of 50 cases and 50 controls that were randomly selected from the PLCO cancer screening trial

The CA-125 trajectories of cases may either remain flat throughout or stay flat at first and then jump up at some time point during screening. In contrast, the control trajectories do not jump up and remain almost flat. These distinct patterns in the case and control trajectories require advanced strategies for modeling CA-125 behavior.

In this section, we first review a well-studied approach that has been proposed for ovarian cancer early detection, namely the risk of ovarian cancer algorithm (ROCA). We then extend ROCA to identify the potential effects of the baseline age and the screening time on the biomarker trajectories and develop a maximum-likelihood approach for ROCA parameter estimation. We further propose a PMM and an SREM specifically for predicting ovarian cancer early detection.

Under the setting of ovarian cancer early detection, $Y_{ij}$ denotes the natural-log-transformed CA-125 measurement for the $i$th woman at time $t_{ij}$ (years since trial randomization). The time $T_i$ is the cancer diagnosis time for a case subject and the censored follow-up time for a control subject.

3.1 Risk of Ovarian Cancer Algorithm (ROCA)

As a method specifically developed for predicting ovarian cancer early detection, ROCA separately models the longitudinal CA-125 trajectories for cases and controls (Skates et al., 2001). For the mean case CA-125 trajectory, ROCA assumes a piecewise linear model with a latent subject-specific changepoint $\tau_i$, conditional on the diagnosis time $T_i$. The mean for cases with elevated CA-125 after $\tau_i$ is

$$\mu_{ij} = \theta_i + \gamma_i (t_{ij} - \tau_i)_+ .$$

The slope of increase after $\tau_i$ is denoted as $\gamma_i$, $\theta_i$ is a subject-specific random intercept, and $(t_{ij} - \tau_i)_+ = t_{ij} - \tau_i$ when $t_{ij} > \tau_i$ and 0 otherwise. Skates et al. (2001) assumed that the changepoint $\tau_i$ follows a known truncated normal distribution conditional on $T_i$. For cases without elevated CA-125, the mean is

$$\mu_{ij} = \theta_i .$$

For controls, a constant mean model

$$\mu_{ij} = \theta_i$$

is assumed because of the flat CA-125 behavior.

ROCA parameters were estimated within a Bayesian framework. For ovarian cancer detection prediction, ROCA also calculates the detection probability using Bayes' rule in (7), but in a slightly different way from PMM. The difference is discussed in Section 3.2.

3.2 Extended ROCA

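The piecewise linear model for cases under ROCA can be rewritten as

$$Y_{ij} = \theta_0 + \theta_{1i} + (\gamma_0 + \gamma_{1i})(t_{ij} - \tau_i)_+ + \epsilon_{ij}, \tag{8}$$

where $\theta_i$ is decomposed into a constant intercept $\theta_0$ and a random one $\theta_{1i}$, and $\gamma_i$ is decomposed into a constant rate $\gamma_0$ and a random slope $\gamma_{1i}$. The random effects $\theta_{1i}$ and $\gamma_{1i}$ are normally distributed with mean zero, and $\epsilon_{ij}$ is the random measurement error. Under this model, the CA-125 trajectory is flat before the changepoint $\tau_i$ and then increases with a slope of $\gamma_0 + \gamma_{1i}$. For case subjects without an observed changepoint during screening, the term $(t_{ij} - \tau_i)_+$ is 0, leading to a flat trajectory $Y_{ij} = \theta_0 + \theta_{1i} + \epsilon_{ij}$. Similarly, the ROCA control model can be rewritten as a random intercept model

$$Y_{ij} = \beta_0 + b_{0i} + \epsilon_{ij}. \tag{9}$$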

For ease of presentation, denote the case model (8) as Case Model 1 (CS1) and the control model (9) as Control Model 1 (CN1).
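The mean structures of CS1 and CN1 can be written as two short Python functions. The sketch evaluates the subject-specific mean (random effects included, measurement error excluded) at a set of screening times. The function names and all numbers are hypothetical.

```python
import numpy as np


def roca_case_mean(t, tau, theta0, theta1, gamma0, gamma1):
    """Subject-specific mean of log CA-125 for a case under (8), without error.

    The trajectory is flat at theta0 + theta1 before the changepoint tau and
    rises with slope gamma0 + gamma1 afterwards.
    """
    t = np.asarray(t, dtype=float)
    excess = np.maximum(t - tau, 0.0)  # (t - tau)_+ : zero until the changepoint
    return theta0 + theta1 + (gamma0 + gamma1) * excess


def roca_control_mean(t, beta0, b0):
    """Subject-specific mean of log CA-125 for a control under (9): flat."""
    return np.full(np.shape(t), beta0 + b0, dtype=float)


screens = [0.0, 1.0, 2.0, 3.0]
case = roca_case_mean(screens, tau=1.5, theta0=2.3, theta1=0.1, gamma0=1.0, gamma1=0.2)
control = roca_control_mean(screens, beta0=2.3, b0=0.1)
```

With these illustrative values, the case mean is flat at 2.4 until $\tau_i = 1.5$ and then rises with slope 1.2. The control mean stays at 2.4. A changepoint later than the last screen gives a case trajectory that is indistinguishable in mean from a control's.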

Several issues of ROCA require attention. (i) The changepoint is assumed to follow a known truncated normal distribution, which may not be reasonable for all study populations. (ii) CN1 ignores the potential effects of the baseline age and the screening time on the CA-125 trajectories, potentially biasing inferences. (iii) The detection probability calculation may lose accuracy. Theoretically, when predicting cancer detection for a new subject $k$, ROCA calculates the probability $P(D_k = 1 \mid \mathbf{Y}_k)$ using the formula in (7). However, ROCA obtains only an approximation of $f(\mathbf{Y}_k \mid D_k = 1)$, say $\tilde{f}(\mathbf{Y}_k \mid D_k = 1)$, because it models $f(\mathbf{Y}_k \mid D_k = 1, T_k)$ rather than $f(\mathbf{Y}_k \mid D_k = 1)$. Calculating $f(\mathbf{Y}_k \mid D_k = 1)$ requires ROCA to marginalize over the diagnosis time $T_k$, which is frequently unknown for the new individual $k$. To obtain the approximation, ROCA essentially implements the marginalization by "borrowing" information about the diagnosis time from participants already known to be cases, via the formula

$$\tilde{f}(\mathbf{Y}_k \mid D_k = 1) = \frac{1}{m} \sum_{l=1}^{m} f(\mathbf{Y}_k \mid D_k = 1, T_k = t_{k n_k} + \delta_l),$$

where $m$ is the number of known cases and $\delta_l = T_l - t_{l n_l}$ is the gap time. Here $T_l$ is the cancer diagnosis time of the $l$th known case, $t_{l n_l}$ is that case's last screening time, and $t_{k n_k}$ is the last screening time of the new individual $k$. This marginalization eventually leads to

$$\tilde{P}(D_k = 1 \mid \mathbf{Y}_k) = \frac{\tilde{f}(\mathbf{Y}_k \mid D_k = 1)\, P(D_k = 1)}{\tilde{f}(\mathbf{Y}_k \mid D_k = 1)\, P(D_k = 1) + f(\mathbf{Y}_k \mid D_k = 0)\, P(D_k = 0)},$$

an approximation of $P(D_k = 1 \mid \mathbf{Y}_k)$ that may result in a loss of prediction accuracy, as we later observed in the simulation studies, especially when the number of known cases is relatively small.
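The borrowing step described above can be sketched in Python as an equal-weight average over the known cases' gap times, followed by Bayes' rule. The sketch assumes that a hypothetical callable `case_density_given_T` already returns $f(\mathbf{Y}_k \mid D_k = 1, T_k = T)$ for the new subject (for example, by integrating (8) over the changepoint). Both function names are illustrative and do not come from the ROCA software.

```python
from typing import Callable, Sequence


def borrowed_case_density(
    case_density_given_T: Callable[[float], float],
    last_screen_new: float,
    known_diag_times: Sequence[float],
    known_last_screens: Sequence[float],
) -> float:
    """Approximate f(Y_k | D_k = 1) by averaging over borrowed gap times.

    Each known case l contributes its gap delta_l = T_l - (last screen of l),
    and the unknown T_k is replaced by last_screen_new + delta_l.
    """
    gaps = [T - s for T, s in zip(known_diag_times, known_last_screens)]
    if not gaps:
        raise ValueError("at least one known case is required")
    return sum(case_density_given_T(last_screen_new + g) for g in gaps) / len(gaps)


def roca_probability(f_case_approx: float, f_control: float, prior_case: float) -> float:
    """Bayes' rule (7) with the approximate case density plugged in."""
    num = f_case_approx * prior_case
    return num / (num + f_control * (1.0 - prior_case))


# Toy density that depends on the plugged-in diagnosis time (hypothetical).
f_tilde = borrowed_case_density(lambda T: 0.1 * T, 2.0, [3.0, 4.0], [2.5, 3.0])
```

In the toy call, the two known gaps are 0.5 and 1.0 years, so the new subject's diagnosis time is plugged in as 2.5 and 3.0, and the two densities are averaged. With only a few known cases, this average can be a coarse stand-in for the true marginal density.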

To address the above mentioned issues, we propose to extend the original ROCA in the following ways:

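  • Instead of being prespecified, the parameters of the changepoint distribution are estimated; that is, the mean $\mu_\tau$ and standard deviation $\sigma_\tau$ of the truncated normal changepoint distribution are treated as unknown. Denote the case model (8) with this unspecified changepoint distribution as Case Model 2 (CS2).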

  • The control CA-125 trajectory is characterized by a linear mixed model that adjusts for the screening time

    $$Y_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0i} + b_{1i} t_{ij} + \epsilon_{ij}. \tag{10}$$

    Denote this model as Control Model 2 (CN2).

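  • As an alternative to CN2, the control trajectory is described by a linear mixed model adjusting for both the screening time and the baseline age $\mathrm{age}_i$

    $$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{age}_i + b_{0i} + b_{1i} t_{ij} + \epsilon_{ij}. \tag{11}$$

    Denote this model as Control Model 3 (CN3).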

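Unlike the Bayesian strategy in Skates et al. (2001), we propose a maximum-likelihood approach for parameter estimation. The likelihood function can be obtained by integrating (8) over the changepoint $\tau_i$:

$$L(\Theta) = \prod_{i=1}^{n_1} \int f(\mathbf{Y}_i \mid \tau_i; \Theta)\, f(\tau_i)\, d\tau_i,$$

where $f(\mathbf{Y}_i \mid \tau_i; \Theta)$ is the multivariate normal density implied by (8) after integrating out the random effects. For CS1, $\Theta$ consists of $\theta_0$, $\gamma_0$, and the variance parameters of $\theta_{1i}$, $\gamma_{1i}$, and $\epsilon_{ij}$, and $f(\tau_i)$ is the PDF of $\tau_i$ specified by the known truncated normal distribution. For CS2, $\Theta$ additionally includes $\mu_\tau$ and $\sigma_\tau$, and $f(\tau_i)$ is the truncated normal PDF with these unknown parameters. The integral with respect to the latent changepoint can be approximated numerically using Gauss-Hermite quadrature. In contrast, all three control models can be easily estimated with standard software packages, such as the R package nlme. R code for the parameter estimation is given in the Supplementary Material.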
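The one-dimensional integral over the changepoint can be approximated with NumPy's `numpy.polynomial.hermite.hermgauss` nodes and weights. The Python sketch below computes one case's log-likelihood contribution under (8). It rests on several simplifying assumptions that are not stated in the paper:

- the random intercept and random slope are independent normals;
- the gap $T_i - \tau_i$ is normal with mean `mu_gap` and standard deviation `sd_gap`;
- the truncation of that distribution is ignored.

The function name and parameterization are hypothetical. The R code in the Supplementary Material is the authors' implementation.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def case_loglik_gh(y, t, T, theta0, gamma0, sd_theta, sd_gamma, sd_eps,
                   mu_gap, sd_gap, n_nodes=20):
    """Log-likelihood of one case under (8), integrating over the changepoint.

    Sketch assumptions: independent normal random intercept and slope, gap
    T - tau ~ N(mu_gap, sd_gap^2) with truncation ignored, and Gauss-Hermite
    quadrature for the integral over that normal gap.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(t)
    nodes, weights = hermgauss(n_nodes)
    log_terms = []
    for x_k, w_k in zip(nodes, weights):
        # Change of variables: gap = mu_gap + sqrt(2) * sd_gap * x_k.
        tau = T - (mu_gap + np.sqrt(2.0) * sd_gap * x_k)
        excess = np.maximum(t - tau, 0.0)
        mean = theta0 + gamma0 * excess
        # Covariance of Y_i given tau after integrating out the random effects.
        cov = (sd_theta ** 2 * np.ones((n, n))
               + sd_gamma ** 2 * np.outer(excess, excess)
               + sd_eps ** 2 * np.eye(n))
        diff = y - mean
        _, logdet = np.linalg.slogdet(cov)
        quad = diff @ np.linalg.solve(cov, diff)
        log_dens = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
        log_terms.append(np.log(w_k) + log_dens)
    log_terms = np.array(log_terms)
    top = log_terms.max()
    # log[(1 / sqrt(pi)) * sum_k w_k f(Y | tau_k)], computed via log-sum-exp.
    return float(top + np.log(np.exp(log_terms - top).sum()) - 0.5 * np.log(np.pi))
```

Summing such contributions over cases and maximizing with a general-purpose optimizer would give maximum-likelihood estimates. Handling the truncation exactly would require a different quadrature rule or a correction for the excluded mass.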

Combinations of case and control models lead to different versions of ROCA, denoted by ROCA-CS1-CN1 (the original ROCA), ROCA-CS1-CN2, ROCA-CS1-CN3, ROCA-CS2-CN1, ROCA-CS2-CN2, and ROCA-CS2-CN3, respectively.

3.3 PMM for Ovarian Cancer

In this section, we propose a PMM whose case model does not rely on the latent changepoint structure. More specifically, we formulate the mean case trajectory using a linear mixed model with natural cubic splines to account for the effects of the screening time and the baseline age. The case model under PMM is given as

$$Y_{ij} = \beta_0 + \mathbf{B}(\mathrm{age}_i)^\top\boldsymbol{\beta}_1 + \mathbf{B}(t_{ij})^\top\boldsymbol{\beta}_2 + b_{0i} + b_{1i} t_{ij} + \epsilon_{ij}, \tag{12}$$

where $\mathbf{B}(x)$ is the B-spline basis of order 3 for the natural cubic spline with knots determined by the distribution of $x$; $\beta_0$, $\boldsymbol{\beta}_1$, and $\boldsymbol{\beta}_2$ are the fixed effects; $b_{0i}$ and $b_{1i}$ are the random effects; and $\epsilon_{ij}$ is the random measurement error. For $\mathrm{age}_i$, the boundary knots were the minimum and maximum baseline ages of all cases, and the two internal knots were set at the first and third quartiles of the cases' baseline ages. The knots for the screening time were determined in the same way. For the mean control trajectory, we used control models CN1, CN2, and CN3. PMM then predicts the ovarian cancer diagnosis using the formula in (7). Denote the three versions of PMM as PMM-CN1, PMM-CN2, and PMM-CN3.
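The knot rule above, and a natural cubic spline basis built on those knots, can be sketched in Python. The sketch uses the truncated-power form of the natural cubic spline basis rather than the B-spline form named in the text. The two bases span the same function space, but their columns and coefficients differ from those of R's `splines::ns`. The function names and the ages are hypothetical.

```python
import numpy as np


def quartile_knots(case_values):
    """Boundary knots at min/max and internal knots at the 1st/3rd quartiles."""
    v = np.asarray(case_values, dtype=float)
    q1, q3 = np.quantile(v, [0.25, 0.75])
    return np.array([v.min(), q1, q3, v.max()])


def natural_cubic_basis(x, knots):
    """Natural cubic spline basis in truncated-power form (no intercept column).

    With K knots this returns K - 1 columns: x itself plus K - 2 terms that are
    cubic between knots and linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    xi = np.sort(np.asarray(knots, dtype=float))
    K = len(xi)

    def d(k):
        return ((np.maximum(x - xi[k], 0.0) ** 3 - np.maximum(x - xi[K - 1], 0.0) ** 3)
                / (xi[K - 1] - xi[k]))

    columns = [x] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(columns)


ages = np.arange(55, 75)          # hypothetical baseline ages of cases
knots = quartile_knots(ages)      # boundary 55 and 74, internal 59.75 and 69.25
basis = natural_cubic_basis([60.0, 65.0, 70.0], knots)
```

With two internal knots, the basis has three columns besides the intercept, matching three spline degrees of freedom per covariate. Beyond the boundary knots, every column is linear, which is the defining constraint of a natural spline.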

Because ROCA separately models the case and the control trajectories, it can be regarded as a special case of the PMM framework. However, there are two key differences between PMM and ROCA:

  • The first difference is that instead of using a latent changepoint structure in the case model, PMM assumes a linear mixed model with natural cubic splines and additionally adjusts for the baseline age.

  • The second difference lies in the calculation of the cancer diagnosis probability. Because PMM directly models $f(\mathbf{Y}_i \mid D_i)$, the diagnosis probability can be calculated without marginalizing over the diagnosis time, avoiding that source of accuracy loss.

Moreover, compared with ROCA, parameter estimation under PMM can be easily implemented using standard statistical software packages, without Gauss-Hermite quadrature.

3.4 SREM for Ovarian Cancer

Under SREM, a single linear mixed model with natural cubic splines for the screening time and the baseline age, of the same form as (12), was used for both the case and the control trajectories. The knots of the B-spline bases for the baseline age and the screening time were determined in the same way as under PMM. For ovarian cancer diagnosis prediction, we linked the binary outcome $D_i$ to the longitudinal process using a probit link function $g(\cdot) = \Phi^{-1}(\cdot)$ with $h(\mathbf{b}_i) = \mathbf{b}_i$. Under this setting, the joint likelihood of $\mathbf{Y}_i$ and $D_i$ can be directly obtained from (4). The diagnosis probability was calculated using (5) with the MLEs of the longitudinal model parameters, $\alpha_0$, and $\boldsymbol{\alpha}$, which were estimated by the two-stage approach described in Section 2.1.

4 Analysis of the PLCO Ovarian Cancer Data

In this section, ROCA, PMM and SREM were applied to the ovarian cancer example from the PLCO Cancer Screening Trial.

4.1 The PLCO Ovarian Cancer Data

The ovarian cancer dataset from the PLCO Cancer Screening Trial contains 78215 women aged 55 to 74 years at baseline, enrolled at 10 screening centers across the U.S. from 1993 to 2001. Among them, 39104 women were in the intervention arm (receiving up to 6 annual CA-125 screenings and 4 annual transvaginal ultrasound examinations) and 39111 in the control arm (under usual medical care). After the first 6 years of active screening, participants in both arms were followed for an additional 7 years (Skates et al., 2003; Buys et al., 2011). Only women in the intervention arm were included in our analyses, since participants in the control arm did not receive CA-125 screening. Women who met any of the following criteria were excluded: (i) ovarian cancer diagnosis before trial randomization; (ii) bilateral oophorectomy; (iii) no CA-125 screening; or (iv) ovarian cancer diagnosis more than three years after the last screening test. CA-125 screenings performed more than three years before ovarian cancer diagnosis often yield flat trajectories that are almost identical to those of controls. For such screenings, it is hard to tell whether any positive finding would be indicative of ovarian cancer (Pinsky et al., 2013). Therefore, cases diagnosed more than three years after the last screening test were excluded from our analysis. For the intervention-arm participants chosen as controls, follow-up time was similarly truncated to 3 years. In addition, we removed CA-125 screening results obtained after the cancer diagnosis. Our analytic sample eventually included 30402 women, comprising 133 ovarian cancer cases and 30269 controls. Owing to the PLCO trial design, the median number of longitudinal CA-125 measurements was 4 for cases and 6 for controls. The ovarian cancer cases were older at baseline, had fewer CA-125 screenings, and were more likely to have a family history of ovarian or breast cancer. Additional descriptive statistics of the PLCO ovarian cancer samples are tabulated in Table S1 in the Supplementary Material. Pinsky et al. (2013) applied ROCA to the PLCO trial to examine whether ROCA could yield a significant mortality benefit of screening in the intervention arm compared with the control arm. Our analyses used only the intervention arm to compare the predictive accuracy of ROCA with that of PMM and SREM in terms of the time-dependent AUC.

4.2 Model Implementation

Different specifications of the ROCA case and control models were compared using the likelihood ratio test, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Each individual's risk score under each model was calculated using LOOCV to minimize model overfitting. The longitudinal CA-125 levels of each participant were deleted from the dataset in turn, all models were estimated from the remaining samples, and the risk score of the excluded individual was then calculated. Iterating this procedure across all individuals yielded out-of-sample risk predictions. The diagnosis prediction accuracy of the models was compared using the time-dependent AUC at cutoff times from 0.5 to 3 years (in 0.5-year steps) after the last CA-125 screening. The time-dependent AUC at a cutoff time $t$ is the probability that a randomly selected "case" has a larger predicted risk than a randomly selected "control". Here a "case" is a woman whose cancer diagnosis was before time $t$, and a "control" is one whose diagnosis or censoring time was after time $t$. The 95% confidence interval (CI) of each time-dependent AUC was calculated from 200 bootstrap replicates. Because of the LOOCV procedure, the nonparametric bootstrap is computationally prohibitive. Therefore, instead of drawing bootstrap samples with replacement, we drew the bootstrapped parameter estimates from the fitted asymptotic multivariate normal distributions (R functions are provided in the Supplementary Material).
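The three evaluation steps just described (LOOCV risk scores, the time-dependent AUC, and a parametric bootstrap CI) can be sketched in Python. Each piece is a simplification:

- The AUC is the plain empirical cumulative/dynamic version. It compares subjects diagnosed by the cutoff with subjects whose diagnosis or censoring time exceeds it, and it does not apply the censoring adjustment of Heagerty et al. (2000).
- The `fit` and `predict` callables passed to the LOOCV loop are hypothetical placeholders for any of the models above.
- The bootstrap draws parameter vectors from $N(\hat{\Theta}, \widehat{\mathrm{Cov}})$ and takes percentile limits of a user-supplied statistic.

```python
from typing import Callable

import numpy as np


def cumulative_dynamic_auc(risk, time, is_case, cutoff: float) -> float:
    """Empirical time-dependent AUC at `cutoff` (no censoring weights).

    "Cases" are subjects diagnosed by `cutoff`; "controls" are subjects whose
    diagnosis or censoring time exceeds `cutoff`. Ties count one half.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cases = risk[is_case & (time <= cutoff)]
    controls = risk[time > cutoff]
    if cases.size == 0 or controls.size == 0:
        raise ValueError("need at least one case and one control at this cutoff")
    diff = cases[:, None] - controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def loo_risk_scores(n: int, fit: Callable, predict: Callable) -> np.ndarray:
    """Refit without subject i, then score subject i, for every i."""
    scores = []
    for i in range(n):
        model = fit([j for j in range(n) if j != i])
        scores.append(predict(model, i))
    return np.asarray(scores, dtype=float)


def parametric_bootstrap_ci(theta_hat, cov_hat, statistic: Callable,
                            n_boot: int = 200, level: float = 0.95, seed: int = 0):
    """Percentile CI of statistic(theta) with theta ~ N(theta_hat, cov_hat)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.asarray(theta_hat, dtype=float),
                                    np.asarray(cov_hat, dtype=float), size=n_boot)
    stats = np.array([statistic(draw) for draw in draws])
    lower, upper = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lower), float(upper)
```

In the setting of this paper, `statistic` would recompute all risk scores from a drawn parameter vector and return the AUC at one cutoff. Only parameters are resampled, so the costly LOOCV refits are not repeated 200 times.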

4.3 Results

The likelihood ratio test comparing CS1 and CS2 showed that CS2 fit the PLCO cases better than CS1. The negative log-likelihood values for CS1 and CS2 were 387.03 and 330.34, respectively, a significant difference ($p$-value < 0.0001); AIC and BIC are given in Table S2 in the Supplementary Material. The fitted parameters of CS1 and CS2 are reported in Table 1. In CS2, the estimated mean $\mu_\tau$ and standard deviation $\sigma_\tau$ of the changepoint distribution were 1.054 years (95% confidence interval (CI) = 0.978 to 1.130) and 0.314 year (95% CI = 0.234 to 0.394), respectively. These estimates differ substantially from the prespecified mean of 2 years and standard deviation of 0.75 year in CS1. The fitted results of PMM and SREM are given in Table S3 in the Supplementary Material.
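The size of the CS1 versus CS2 comparison can be checked from the two negative log-likelihoods reported above. The Python sketch assumes that the test has 2 degrees of freedom, because CS2 frees the two changepoint parameters (the two extra rows of Table 1) that CS1 fixes. The function name is hypothetical.

```python
import math


def lrt_two_df(negloglik_reduced: float, negloglik_full: float):
    """Likelihood ratio statistic and p-value for nested models differing by 2 parameters."""
    stat = 2.0 * (negloglik_reduced - negloglik_full)
    # For a chi-square distribution with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


stat, p_value = lrt_two_df(387.03, 330.34)  # CS1 vs CS2 values reported above
print(f"LR statistic = {stat:.2f}, p = {p_value:.1e}")
```

The statistic is $2 \times (387.03 - 330.34) = 113.38$. Its $\chi^2_2$ tail probability is far below 0.0001, consistent with the reported significance.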

Parameter CS1 CS2
Estimate (95% CI) Estimate (95% CI)
2.304 (2.202, 2.406) 2.381 (2.281, 2.481)
1.352 (1.080, 1.624) 2.365 (1.922, 2.808)
0.488 (0.404, 0.572) 0.497 (0.423, 0.571)
1.191 (0.991, 1.391) 1.423 (1.102, 1.744)
0.265 (0.240, 0.290) 0.271 (0.246, 0.296)
- 1.054 (0.978, 1.130)
- 0.314 (0.234, 0.394)
Table 1: Parameter estimates for CS1 and CS2 based on 133 cases from the PLCO Cancer Screening Trial: estimate and the 95% confidence interval (CI) were reported.

For controls, the likelihood ratio test revealed that CN2 and CN3 fit significantly better than CN1 (both $p$-values < 0.0001), whereas there was no significant difference between CN2 and CN3 ($p$-value: 0.077); see Table S2 in the Supplementary Material for details. The fitted parameters for all three control models are reported in Table 2, which shows that the baseline age and the screening time had small but significant effects on the CA-125 trajectories. Specifically, CN3 indicated that the geometric mean level of CA-125 increased by 1.92% (95% CI = 1.82% to 2.02%) per year of follow-up and by 0.2% (95% CI = 0.1% to 0.3%) per 1-year increase in baseline age.
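The percentages follow from the log scale of the outcome. Because $Y_{ij}$ is the natural log of CA-125, a coefficient $\beta$ multiplies the geometric mean of CA-125 by $e^{\beta}$, i.e., changes it by $100(e^{\beta} - 1)\%$. Applying this to the CN3 estimates and interval limits in Table 2 reproduces the quoted figures (a worked check, not an additional result):

$$
100\,(e^{0.019} - 1) \approx 1.92\%, \qquad 100\,(e^{0.018} - 1) \approx 1.82\%, \qquad 100\,(e^{0.020} - 1) \approx 2.02\% \quad \text{(per year of follow-up)},
$$

$$
100\,(e^{0.002} - 1) \approx 0.20\% \quad \text{(per 1-year increase in baseline age)}.
$$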

Parameter CN1 CN2 CN3
Estimate (95% CI) Estimate (95% CI) Estimate (95% CI)
2.337 (2.332, 2.342) 2.296 (2.290, 2.301) 2.158 (2.097, 2.219)
- 0.019 (0.018, 0.020) 0.019 (0.018, 0.020)
- - 0.002 (0.001, 0.003)
0.445 (0.437, 0.454) 0.450 (0.441, 0.459) 0.451 (0.442, 0.460)
- 0.039 (0.009, 0.069) 0.039 (0.008, 0.069)
-
0.229 (0.224, 0.233) 0.215 (0.211, 0.220) 0.215 (0.211, 0.220)
Table 2: Parameter estimates for CN1, CN2, and CN3 based on 30269 controls from the PLCO Cancer Screening Trial: estimate and the 95% confidence interval (CI) were reported.

The comparisons of the discrimination abilities of ROCA, PMM, and SREM are shown in Table 3. PMM had the highest time-dependent AUCs at all six cutoff times, 1.8 to 3.4 percentage points higher than ROCA and 1.6 to 4.8 percentage points higher than SREM. SREM had the lowest AUCs at every cutoff time except year 0.5, where it slightly exceeded the ROCA versions. Comparisons based on the bootstrap replicates further showed that PMM had significantly larger AUCs than ROCA and SREM, whereas there was no significant difference between SREM and ROCA (details are in the Supplementary Material). Among the versions of ROCA, more complex case and control models only slightly improved the time-dependent AUCs at nearly all examined cutoff times, despite having much better goodness of fit than the original ROCA. Figure 2 compares all methods that share the same case model or the same control model. Figures 2(a)-2(c) illustrate that a better-fitting control model slightly increased the diagnosis prediction accuracy of ROCA and PMM at almost all cutoff times. For example, at year 2, the AUC of ROCA-CS1-CN1 was 0.809 (95% CI (0.797, 0.818)), compared with 0.814 (0.802, 0.827) for ROCA-CS1-CN3, which uses a more complex control model. Figures 2(d)-2(f) show that a more complex case model barely improved the prediction accuracy of ROCA at any cutoff time. For instance, at year 2, the AUC of ROCA-CS2-CN1 was 0.811 (0.798, 0.817), almost identical to that of ROCA-CS1-CN1. Figures 2(d)-2(f) also display the advantage of PMM over ROCA and SREM. The AUCs of SREM were very close to those of ROCA at the beginning of the follow-up period but declined thereafter. In addition, we compared the time-dependent ROC curves of the best ROCA (ROCA-CS2-CN3), the best PMM (PMM-CN3), and SREM across all six cutoff times.
As shown in Figure 3, the ROC curve comparison supported the conclusion that PMM was better than ROCA and SREM. There was no clear advantage of ROCA over SREM, although the AUCs of ROCA were generally larger than those of SREM.

Time-dependent AUC (95% bootstrap confidence interval)

| Method | Year 0.5 | Year 1.0 | Year 1.5 | Year 2.0 | Year 2.5 | Year 3.0 |
|---|---|---|---|---|---|---|
| ROCA-CS1-CN1 | 0.927 (0.916, 0.933) | 0.866 (0.855, 0.873) | 0.841 (0.827, 0.850) | 0.809 (0.797, 0.818) | 0.786 (0.770, 0.796) | 0.767 (0.750, 0.773) |
| ROCA-CS1-CN2 | 0.928 (0.917, 0.933) | 0.868 (0.857, 0.875) | 0.845 (0.830, 0.852) | 0.813 (0.801, 0.826) | 0.789 (0.774, 0.798) | 0.772 (0.755, 0.777) |
| ROCA-CS1-CN3 | 0.927 (0.916, 0.934) | 0.868 (0.857, 0.876) | 0.845 (0.829, 0.852) | 0.814 (0.802, 0.827) | 0.790 (0.774, 0.800) | 0.773 (0.755, 0.778) |
| ROCA-CS2-CN1 | 0.928 (0.920, 0.935) | 0.865 (0.850, 0.869) | 0.840 (0.824, 0.846) | 0.811 (0.798, 0.817) | 0.785 (0.770, 0.796) | 0.768 (0.750, 0.773) |
| ROCA-CS2-CN2 | 0.928 (0.919, 0.934) | 0.866 (0.851, 0.871) | 0.842 (0.827, 0.850) | 0.814 (0.805, 0.824) | 0.789 (0.773, 0.800) | 0.772 (0.757, 0.779) |
| ROCA-CS2-CN3 | 0.927 (0.918, 0.934) | 0.867 (0.851, 0.872) | 0.843 (0.827, 0.849) | 0.814 (0.805, 0.824) | 0.790 (0.773, 0.800) | 0.772 (0.757, 0.780) |
| PMM-CN1 | 0.946 (0.937, 0.953) | 0.892 (0.887, 0.900) | 0.863 (0.857, 0.871) | 0.837 (0.831, 0.844) | 0.816 (0.807, 0.824) | 0.797 (0.789, 0.808) |
| PMM-CN2 | 0.946 (0.937, 0.954) | 0.894 (0.886, 0.902) | 0.865 (0.857, 0.872) | 0.842 (0.832, 0.851) | 0.819 (0.810, 0.828) | 0.801 (0.791, 0.809) |
| PMM-CN3 | 0.946 (0.937, 0.954) | 0.894 (0.886, 0.902) | 0.865 (0.858, 0.872) | 0.842 (0.832, 0.851) | 0.819 (0.810, 0.828) | 0.801 (0.791, 0.809) |
| SREM | 0.930 (0.920, 0.938) | 0.853 (0.843, 0.862) | 0.836 (0.827, 0.844) | 0.794 (0.787, 0.801) | 0.774 (0.765, 0.782) | 0.760 (0.751, 0.768) |
Table 3: Time-dependent AUCs of ROCA, PMM, and SREM from the analysis of the ovarian cancer data from the PLCO Cancer Screening Trial, with 95% bootstrap confidence intervals in parentheses.

Figure 2: Time-dependent AUC comparisons for ROCA, PMM, and SREM: panels (a)-(c) compare the methods under the same case model setting, and panels (d)-(f) compare them under the same control model setting.

Figure 3: Time-dependent ROC curve comparisons for ROCA, PMM, and SREM across all six cutoff times. Only the best ROCA (ROCA-CS2-CN3), the best PMM (PMM-CN3), and SREM are shown.
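The bootstrap comparison of AUCs reported above can be illustrated with a minimal sketch of a paired percentile bootstrap for the difference in AUC between two risk scores computed on the same subjects. It is not the procedure described in the Supplementary Material: for brevity it assumes binary case/control labels at a single cutoff time with no censoring, and it uses the Mann-Whitney form of the AUC. The function names, the 2000 replicates, and the seed are illustrative choices.

```python
import numpy as np


def auc_mann_whitney(labels, scores):
    """Empirical AUC: P(case score > control score), with ties counted as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def paired_bootstrap_auc_diff(labels, score_a, score_b, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap CI for AUC(a) - AUC(b).

    Subjects are resampled jointly, so both scores are always evaluated on
    the same bootstrap sample (a paired comparison).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    n = labels.size
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)
        # Skip degenerate resamples that contain only cases or only controls.
        if labels[idx].all() or not labels[idx].any():
            continue
        diffs.append(auc_mann_whitney(labels[idx], score_a[idx])
                     - auc_mann_whitney(labels[idx], score_b[idx]))
    lower, upper = np.percentile(diffs, [2.5, 97.5])
    point = auc_mann_whitney(labels, score_a) - auc_mann_whitney(labels, score_b)
    return point, (float(lower), float(upper))
```

Under this sketch, a 95% interval for the difference that excludes zero would be read as a significant difference at the 5% level. Resampling subjects jointly keeps the comparison paired, which matters because both scores are computed from the same women.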

5 Simulation Studies

5.1 Simulation Settings

The predictive performances of ROCA, PMM, and SREM were further compared in simulation studies. To apply ROCA, both longitudinal and survival information must be simulated for cases and controls. However, as discussed in Sections 3.3 and 3.4, PMM and SREM predict ovarian cancer diagnosis directly from the CA-125 levels. Neither PMM nor SREM specifies an explicit dependence between the longitudinal process and the diagnosis time, so a diagnosis time simulated from PMM or SREM would carry no information about the longitudinal observations and hence cannot be used to fit ROCA. By contrast, ROCA explicitly links the longitudinal process and the diagnosis time.

Two simulation scenarios were considered: Scenario 1 used ROCA-CS2-CN3 as the true model, whereas Scenario 2 used PMM-CN3 as the true model. We did not consider a scenario with SREM as the true data-generating model, given the finding in Section 4.3 that PMM was superior to SREM. In each simulation, a training dataset of controls and cases was generated from the true model, and its controls and cases were used to fit the corresponding control and case models, respectively. The fitted models were then applied to a separately generated testing dataset containing the same numbers of controls and cases as the training dataset, and the time-dependent AUCs from 0.5 to 3 years after the last CA-125 observation were calculated in the testing dataset. In addition, three screening frequencies were considered (annual, biannual, i.e., twice yearly, and quarterly) to examine how the frequency of CA-125 screening affects the performance of these models.
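As a hedged illustration of the time-dependent AUC evaluated at each cutoff, here is a minimal sketch of the cumulative/dynamic AUC(t) = P(M_i > M_j | T_i ≤ t, T_j > t), where M is a subject's risk score and T is the time from the last CA-125 observation to diagnosis or censoring. It uses an inverse-probability-of-censoring-weighted (IPCW) estimator with a Kaplan-Meier estimate of the censoring distribution. This is a swapped-in estimator, not necessarily the Heagerty et al. (2000) approach used in the paper, and the function names are hypothetical.

```python
import numpy as np


def km_censoring_left_limit(time, event):
    """Kaplan-Meier estimate of the censoring survival G(s) = P(C > s).

    Censored observations (event == 0) are treated as the 'events'.
    Returns a function giving G(s-), the product over censoring times < s.
    Simplification: everyone with time >= u is counted as at risk at u.
    """
    time = np.asarray(time, dtype=float)
    censored = np.asarray(event, dtype=int) == 0
    cens_times = np.unique(time[censored])
    surv = np.ones(cens_times.size)
    g = 1.0
    for k, u in enumerate(cens_times):
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & censored)
        g *= 1.0 - d / at_risk
        surv[k] = g

    def g_left(s):
        # Number of censoring times strictly below s.
        idx = np.searchsorted(cens_times, s, side="left")
        return 1.0 if idx == 0 else float(surv[idx - 1])

    return g_left


def cumulative_dynamic_auc(time, event, marker, t):
    """IPCW estimate of AUC(t) = P(M_i > M_j | T_i <= t, T_j > t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    marker = np.asarray(marker, dtype=float)
    g_left = km_censoring_left_limit(time, event)
    cases = (time <= t) & (event == 1)   # diagnosed by t
    controls = time > t                  # still event-free after t
    if not cases.any() or not controls.any():
        raise ValueError("need at least one case and one control at time t")
    # Cases get weight 1/G(T_i-); control weights 1/G(t) are constant and cancel.
    w = np.array([1.0 / g_left(s) for s in time[cases]])
    diff = marker[cases][:, None] - marker[controls][None, :]
    concordant = (diff > 0) + 0.5 * (diff == 0)
    return float(np.sum(w[:, None] * concordant) / (w.sum() * controls.sum()))
```

Subjects censored before t are neither cases nor controls; the weights on the cases compensate for them, which is how censored diagnosis times enter the calculation in this sketch.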

To closely mimic the real PLCO ovarian cancer data, particularly the gap time between the diagnosis time and the last screening test of each subject, we used the following data-generating procedure. Let $T_i$ be the unobserved time of ovarian cancer diagnosis and $C_i$ the censoring time for subject $i$. The observed survival data are $(X_i, \Delta_i)$, where $X_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$, with $I(\cdot)$ the indicator function. Let $n_i$ be the cluster size and $Y_{i1}, \ldots, Y_{in_i}$ the log-transformed CA-125 values observed at times $t_{i1}, \ldots, t_{in_i}$, respectively. Let $G_i = X_i - t_{in_i}$ be the gap time between the last observation and the end of follow-up. The survival and longitudinal data are generated as follows:

  1. Step 1: simulate the survival data. The diagnosis time $T_i$ is simulated from an exponential distribution with rate $\lambda$, and the censoring time $C_i$ from a mixture of two lognormal distributions, $C_i \sim p\,\mathrm{LN}(\mu_1, \sigma_1^2) + (1 - p)\,\mathrm{LN}(\mu_2, \sigma_2^2)$. All parameter values of these distributions were obtained by fitting the real PLCO ovarian cancer data. The censoring time is truncated at 8.9 years, the maximum follow-up used in this analysis. In the PLCO ovarian cancer data, the Kaplan-Meier survival curves were very close to the fitted parametric distributions for $T_i$ and $C_i$ (see Figure S6 in the Supplementary Material for details).

  2. Step 2: simulate the gap time $G_i$. The gap time is bounded in $[L_i, U_i]$, where $L_i = \max(0, X_i - 6)$ and $U_i = X_i$, which guarantees that all observation times lie between year 0 and year 6. For each subject $i$, we randomly sample one participant from the PLCO ovarian cancer data whose gap time lies in $[L_i, U_i]$ and use that gap time as $G_i$. The age of the sampled participant is also used as the baseline age of the $i$th subject.

  3. Step 3: simulate the cluster size $n_i$ and the screening times $t_{ij}$. Set $n_i = \lfloor X_i - G_i \rfloor + 1$, where $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. Under the annual screening setting, the screening times are $t_{ij} = j - 1$ for $j = 1, \ldots, n_i$, so that the biomarker is screened annually from year 0 to year $n_i - 1$. The cluster size and screening times under biannual and quarterly screening are set similarly: under biannual screening, $n_i = \lfloor 2(X_i - G_i) \rfloor + 1$ and $t_{ij} = (j - 1)/2$, whereas under quarterly screening, $n_i = \lfloor 4(X_i - G_i) \rfloor + 1$ and $t_{ij} = (j - 1)/4$.

  4. Step 4: simulate the log-transformed CA-125 values from ROCA-CS2-CN3 or PMM-CN3, given the simulated screening times, diagnosis time, event status, and baseline age.
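Steps 1-3 can be sketched for a single subject as follows. This is a minimal sketch under stated assumptions: all distribution parameters are placeholders rather than the values fitted to the PLCO data, the pooled gap times and ages are synthetic stand-ins for the PLCO participants, "truncated at 8.9 years" is read as capping the censoring time, and the schedule assumes floor(k * (x - gap)) + 1 visits at times (j - 1)/k for k = 1, 2, or 4 screenings per year. Step 4 (generating CA-125 values from the fitted ROCA or PMM model) is not reproduced. The function name simulate_subject is hypothetical.

```python
import numpy as np


def simulate_subject(rng, gap_pool, age_pool, per_year=1, rate=0.02,
                     mix_p=0.4, mu=(1.6, 2.1), sigma=(0.3, 0.1),
                     max_follow=8.9, max_obs_time=6.0):
    """Steps 1-3 for one subject; all parameter values are placeholders."""
    while True:
        # Step 1: T ~ Exp(rate); C ~ two-component lognormal mixture,
        # capped at the maximum follow-up (our reading of "truncated").
        t_diag = rng.exponential(scale=1.0 / rate)
        comp = 0 if rng.random() < mix_p else 1
        c = min(rng.lognormal(mean=mu[comp], sigma=sigma[comp]), max_follow)
        x, delta = min(t_diag, c), int(t_diag <= c)

        # Step 2: draw a pooled (gap, age) pair whose gap lies in
        # [max(0, x - max_obs_time), x], so observation times stay in [0, 6].
        lower, upper = max(0.0, x - max_obs_time), x
        fits = np.flatnonzero((gap_pool >= lower) & (gap_pool <= upper))
        if fits.size == 0:
            continue  # assumption: redraw the subject if no pooled gap fits
        k = rng.choice(fits)

        # Step 3: screen every 1/per_year years starting at year 0,
        # with floor(per_year * (x - gap)) + 1 visits.
        gap = gap_pool[k]
        n = int(np.floor(per_year * (x - gap))) + 1
        times = np.arange(n) / per_year
        return {"x": x, "delta": delta, "gap": gap,
                "age": age_pool[k], "times": times}


rng = np.random.default_rng(42)
gap_pool = rng.uniform(0.0, 3.0, size=1000)    # hypothetical stand-in for PLCO gap times
age_pool = rng.uniform(55.0, 74.0, size=1000)  # hypothetical stand-in for baseline ages
subjects = [simulate_subject(rng, gap_pool, age_pool, per_year=4) for _ in range(5)]
```

Because the last visit is at floor(k * (x - gap))/k, which never exceeds x - gap, the lower bound on the gap time is what keeps every visit within the first six years.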

Under Scenario 1, the training and testing datasets each contained 1000 controls and 500 cases, whereas under Scenario 2 they each contained 30402 subjects (controls and cases combined). Details about the simulation settings and an R function are provided in the Supplementary Material.

Simulation results under Scenario 1 are reported in Table 4. Under annual screening (as in the PLCO ovarian cancer data), the differences among the ROCA versions were small, but in general ROCA-CS2-CN3 had the best prediction performance, followed very closely by ROCA-CS2-CN2. ROCA outperformed PMM and SREM at all cutoff times, and SREM performed worst. As the number of CA-125 screenings increased, the predictive advantage of ROCA over PMM and SREM became more pronounced. This is because more data points near the CA-125 changepoint became available, allowing the latent changepoint structure of ROCA to be estimated more precisely. By contrast, the predictive performance of SREM deteriorated as the number of CA-125 measurements increased, as the difference between case and control trajectories became more evident. The discriminative accuracy of PMM changed little.

When PMM-CN3 was the true model, Table 5 shows that PMM outperformed ROCA and SREM. For both ROCA and PMM, more complex case or control models improved the time-dependent AUCs only slightly. SREM again performed worst. As the number of CA-125 screenings increased, the time-dependent AUCs of all methods improved.

6 Discussion

In this paper, we focused on predicting disease early detection using longitudinal biomarker measurements. Two general disease risk prediction frameworks, the shared random effects model (SREM) (Albert, 2012) and the pattern mixture model (PMM) (Liu and Albert, 2014), were considered. We showed that SREM and PMM can be applied to disease early detection in a general setting, although they were originally developed for the rather different problem of disease risk prediction. We examined the utility of SREM and PMM for disease early detection through an application to the early detection of ovarian cancer in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. Their predictive performance was compared with that of the risk of ovarian cancer algorithm (ROCA), which was proposed specifically for ovarian cancer early detection (Skates et al., 2001). We provided specific formulations of SREM and PMM for predicting the early detection of ovarian cancer. We also extended ROCA by estimating the latent changepoint structure and by accounting for the effects of the screening time and the baseline age on the biomarker trajectory. The predictive performance of the three methods was assessed using the time-dependent AUC (Heagerty et al., 2000), which incorporates censored cancer diagnosis times into the AUC calculation. We additionally used simulations to study the effect of three biomarker screening frequencies (annual, biannual, and quarterly) on prediction accuracy.

In the PLCO ovarian cancer data analysis, we found that PMM significantly outperformed ROCA and SREM. Although ROCA had slightly larger AUC values than SREM, the difference was not significant. A feature of the PLCO trial design may affect the AUC values we presented: in the intervention arm, CA-125 was used to manage women, i.e., women with elevated CA-125 were referred for diagnostic evaluation. The AUCs based on CA-125 may therefore be overestimated in this setting. However, this design feature would not affect the comparison among the approaches. The comparison is notable because ROCA is more biologically sensible for modeling the “jump-up” pattern in the case marker trajectories shown in Figure 1. One explanation lies in the way ROCA implements prediction. ROCA models the marker profiles using unobserved changepoints conditional on the cancer diagnosis time, which is unknown when predicting cancer onset for a new individual. To calculate the detection probability, ROCA therefore needs the joint distribution of the longitudinal marker profiles with the diagnosis time marginalized out. This marginalization may reduce prediction accuracy, especially when the number of cases is relatively small. In addition, the estimation of the latent changepoint structure may suffer from sparse measurements around the changepoint under the annual CA-125 screening design of the PLCO trial. Our simulation studies showed that the performance of ROCA could be substantially improved with more frequent screening data. By contrast, SREM and PMM directly estimate the CA-125 distribution independently of the cancer diagnosis time and use natural splines to model the nonlinear marker trajectories, avoiding the difficulty of estimating the latent changepoint.
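The marginalization step described above can be illustrated with a deliberately simplified toy model, in which the risk of being a case is obtained by Bayes' rule after integrating the unknown changepoint out over a grid. This is not ROCA's actual specification: it has no subject-specific random effects, no model for the diagnosis time, and made-up parameter values. The function name toy_changepoint_risk and all defaults are hypothetical.

```python
import numpy as np


def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2), vectorised."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((y - mean) / sd) ** 2


def toy_changepoint_risk(times, y, prior_case=0.01, mu=2.5, sd=0.3,
                         slope=1.0, tau_grid=None):
    """Posterior P(case | y) with the unknown changepoint tau integrated out.

    Control model: y_j ~ N(mu, sd^2).
    Case model:    y_j ~ N(mu + slope * max(t_j - tau, 0), sd^2),
                   tau uniform on a grid (hypothetical simplification).
    """
    times = np.asarray(times, dtype=float)
    y = np.asarray(y, dtype=float)
    if tau_grid is None:
        tau_grid = np.linspace(times.min() - 1.0, times.max(), 50)
    log_lik_control = log_normal_pdf(y, mu, sd).sum()
    # Row k holds the case-model mean trajectory given tau_grid[k].
    means = mu + slope * np.clip(times[None, :] - tau_grid[:, None], 0.0, None)
    log_lik_tau = log_normal_pdf(y[None, :], means, sd).sum(axis=1)
    # Marginalise tau: log of the average likelihood (log-mean-exp for stability).
    m = log_lik_tau.max()
    log_lik_case = m + np.log(np.mean(np.exp(log_lik_tau - m)))
    # Bayes' rule on the log-odds scale.
    log_odds = (np.log(prior_case) - np.log1p(-prior_case)
                + log_lik_case - log_lik_control)
    return float(1.0 / (1.0 + np.exp(-log_odds)))


times = [0, 1, 2, 3, 4]
flat = [2.5, 2.5, 2.5, 2.5, 2.5]
rising = [2.5, 2.5, 2.5, 3.5, 4.5]
print(toy_changepoint_risk(times, flat), toy_changepoint_risk(times, rising))
```

In this toy model, a trajectory that stays flat pulls the posterior below the prior, whereas a clear rise after some changepoint pushes it towards 1; the averaging over the changepoint grid is the analogue of the marginalization discussed above.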
The changepoint pattern also occurs in other cancers, for example prostate cancer (Barry, 2001), where the level of the biomarker prostate-specific antigen rises before a prostate cancer case is diagnosed. This suggests that SREM and PMM are generally applicable to disease early detection.

We proposed extensions to the case and control models of the original ROCA. These extensions resulted in better model fit but only slightly improved the predictive performance of ROCA, both in the PLCO data analysis and in the simulation studies. There are several possible explanations for this result. First, as a rank-based measure, the AUC is difficult to improve unless the rankings of the calculated risks change substantially. Second, the latent changepoint structure is hard to estimate precisely with few observations around the changepoint, so extending the case model with a flexible changepoint distribution may not substantially change the risk calculation. Third, because the screening time and the baseline age have only small effects on the longitudinal CA-125 trajectory, incorporating them into the control model may not strongly affect the risk calculation either.
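The first explanation can be checked numerically. The sketch below uses simulated, hypothetical risk scores (not the PLCO data): a strictly increasing transformation of the scores leaves the empirical AUC unchanged, and a small perturbation that reorders only near-tied case-control pairs barely moves it. The function name empirical_auc is illustrative.

```python
import numpy as np


def empirical_auc(labels, scores):
    """Mann-Whitney AUC: depends on the scores only through their ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


rng = np.random.default_rng(2019)
labels = np.r_[np.ones(50, dtype=bool), np.zeros(500, dtype=bool)]
risk = rng.normal(loc=labels.astype(float), scale=1.0)  # hypothetical risk scores

# A strictly increasing transformation changes every risk value but no ranks,
# so the AUC is unchanged.
risk_rescaled = np.exp(3.0 * risk)

# A small perturbation reorders only near-tied case-control pairs,
# so the AUC moves very little.
risk_perturbed = risk + rng.normal(scale=0.01, size=risk.size)

for name, s in [("original", risk), ("rescaled", risk_rescaled),
                ("perturbed", risk_perturbed)]:
    print(name, round(empirical_auc(labels, s), 4))
```

A better-fitting model that mostly rescales the risk, or shifts it by small amounts, can therefore leave the AUC nearly where it was.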

The performance of SREM was not as good as that of PMM or ROCA, possibly because SREM models the CA-125 trajectories of cases and controls simultaneously. This simultaneous modeling may not be sensible, especially when the case trajectories differ markedly from the control trajectories, as in the ROCA simulation results in Table 4 under quarterly screening. Furthermore, SREM may need to link the shared random effects and the outcome in a more complicated way than a simple linear relation, which calls for future methodological development.

In our study, the comparison of the discriminative performance of SREM, PMM, and ROCA in predicting the early detection of ovarian cancer was based on a single biomarker, CA-125, which was measured annually in the PLCO Cancer Screening Trial. Several biomarkers have recently been reported for the early detection of ovarian cancer, and studies suggest that incorporating them may improve prediction accuracy (Zhang et al., 2004; Russell et al., 2017; Visintin et al., 2008). For example, Russell et al. (2017) proposed a risk prediction method for ovarian cancer that adds three biomarkers to CA-125 and demonstrated that it has better discriminative performance than ROCA. Because ROCA models CA-125 only, it cannot handle multiple biomarkers. By contrast, as general frameworks for disease risk prediction in longitudinal studies, PMM and SREM can be readily extended to studies with multiple biomarkers (Liu and Albert, 2014; Zhang et al., 2012), offering a possible way to predict the early detection of ovarian cancer using the recently reported markers. However, using multiple longitudinal biomarkers can be computationally challenging and requires further research.

ROCA, PMM, and SREM were all constructed for a binary outcome (cancer vs. non-cancer) and do not fully utilize the cancer diagnosis time. Therefore, the risk calculation cannot provide an absolute risk estimate, i.e., the $t$-year cancer-free survival probability since the last CA-125 screening. Our future investigations will focus on extending these methods to handle survival outcomes.

In conclusion, our study shows that SREM and PMM can be applied to disease early detection in the general setting of longitudinal studies, although they were originally developed for disease risk prediction. In the analysis of the ovarian cancer data from the PLCO Cancer Screening Trial, PMM significantly outperformed ROCA and SREM in predicting the early detection of ovarian cancer under an annual screening setting. The proposed extensions to the case and control models of the original ROCA can significantly improve model fit but not necessarily prediction accuracy. The early detection accuracy of ROCA could be improved with more frequent CA-125 screening, as the latent changepoint structure would then be estimated more accurately.

Acknowledgments

This study was supported by the Intramural Research Program of the National Cancer Institute, the National Institutes of Health (NIH), United States. This work utilized the computational resources of the NIH High-Performance Computing Biowulf cluster (http://hpc.nih.gov).

References

  • P. S. Albert (2012) A linear mixed model for predicting a binary event from longitudinal data under random effects misspecification. Statistics in Medicine 31 (2), pp. 143–154.
  • M. J. Barry (2001) Prostate-specific–antigen testing for early diagnosis of prostate cancer. New England Journal of Medicine 344 (18), pp. 1373–1377.
  • A. Berchuck, E. S. Iversen, J. M. Lancaster, J. Pittman, J. Luo, P. Lee, S. Murphy, H. K. Dressman, P. G. Febbo, M. West, et al. (2005) Patterns of gene expression that characterize long-term survival in advanced stage serous ovarian cancers. Clinical Cancer Research 11 (10), pp. 3686–3696.
  • S. S. Buys, E. Partridge, A. Black, C. C. Johnson, L. Lamerato, C. Isaacs, D. J. Reding, R. T. Greenlee, L. A. Yokochi, et al. (2011) Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening randomized controlled trial. Journal of the American Medical Association 305 (22), pp. 2295–2303.
  • D. L. Clarke-Pearson (2009) Screening for ovarian cancer. New England Journal of Medicine 361 (2), pp. 170–177.
  • C. W. Drescher, C. Shah, J. Thorpe, K. O’Briant, G. L. Anderson, C. D. Berg, N. Urban, and M. W. McIntosh (2013) Longitudinal screening algorithm that incorporates change over time in CA125 levels identifies ovarian cancer earlier than a single-threshold rule. Journal of Clinical Oncology 31 (3), pp. 387.
  • Y. Han and D. Liu (2019) Accounting for random observation time in risk prediction with longitudinal markers: an imputation approach. Statistical Methods in Medical Research. PMID: 30854937, https://doi.org/10.1177/0962280219833089.
  • P. J. Heagerty, T. Lumley, and M. S. Pepe (2000) Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56 (2), pp. 337–344.
  • J. T. Henderson, E. M. Webber, and G. F. Sawaya (2018) Screening for ovarian cancer: updated evidence report and systematic review for the US Preventive Services Task Force. Journal of the American Medical Association 319 (6), pp. 595–606.
  • N. Howlader, A. Noone, M. Krapcho, D. Miller, A. Brest, M. Yu, J. Ruhl, Z. Tatalovich, A. Mariotto, et al. (2019) SEER cancer statistics review, 1975-2016. National Cancer Institute, Bethesda, MD.
  • D. Liu and P. S. Albert (2014) Combination of longitudinal biomarkers in predicting binary events. Biostatistics 15 (4), pp. 706–718.
  • U. A. Matulonis, A. K. Sood, L. Fallowfield, B. E. Howitt, J. Sehouli, and B. Y. Karlan (2016) Ovarian cancer. Nature Reviews Disease Primers 2, pp. 16061.
  • M. W. McIntosh and N. Urban (2003) A parametric empirical Bayes method for cancer screening using longitudinal observations of a biomarker. Biostatistics 4 (1), pp. 27–40.
  • U. Menon, A. Ryan, J. Kalsi, A. Gentry-Maharaj, A. Dawnay, M. Habib, S. Apostolidou, N. Singh, E. Benjamin, M. Burnell, et al. (2015) Risk algorithm using serial biomarker measurements doubles the number of screen-detected cancers compared with a single-threshold rule in the United Kingdom Collaborative Trial of Ovarian Cancer Screening. Journal of Clinical Oncology 33 (18), pp. 2062.
  • P. F. Pinsky, C. Zhu, S. J. Skates, A. Black, E. Partridge, S. S. Buys, and C. D. Berg (2013) Potential effect of the risk of ovarian cancer algorithm (ROCA) on the mortality outcome of the Prostate, Lung, Colorectal and Ovarian (PLCO) trial. International Journal of Cancer 132 (9), pp. 2127–2133.
  • M. R. Russell, A. D’Amato, C. Graham, E. J. Crosbie, A. Gentry-Maharaj, A. Ryan, J. K. Kalsi, E. Fourkala, C. Dive, M. Walker, et al. (2017) Novel risk models for early detection and screening of ovarian cancer. Oncotarget 8 (1), pp. 785.
  • S. J. Skates, M. H. Greene, S. S. Buys, P. L. Mai, P. Brown, M. Piedmonte, G. Rodriguez, J. O. Schorge, M. Sherman, M. B. Daly, et al. (2017) Early detection of ovarian cancer using the risk of ovarian cancer algorithm with frequent CA125 testing in women at increased familial risk–combined results from two screening trials. Clinical Cancer Research 23 (14), pp. 3628–3637.
  • S. J. Skates, U. Menon, N. MacDonald, A. N. Rosenthal, D. H. Oram, R. C. Knapp, and I. J. Jacobs (2003) Calculation of the risk of ovarian cancer from serial CA-125 values for preclinical detection in postmenopausal women. Journal of Clinical Oncology 21 (10 Suppl), pp. 206s–210s.
  • S. J. Skates, D. K. Pauler, and I. J. Jacobs (2001) Screening based on the risk of cancer calculation from Bayesian hierarchical changepoint and mixture models of longitudinal markers. Journal of the American Statistical Association 96 (454), pp. 429–439.
  • I. Visintin, Z. Feng, G. Longton, D. C. Ward, A. B. Alvero, Y. Lai, J. Tenthorey, A. Leiser, R. Flores-Saaib, H. Yu, et al. (2008) Diagnostic markers for early detection of ovarian cancer. Clinical Cancer Research 14 (4), pp. 1065–1072.
  • N. Wentzensen (2016) Large ovarian cancer screening trial shows modest mortality reduction, but does not justify population-based ovarian cancer screening. BMJ Evidence-Based Medicine 21 (4), pp. 159.
  • J. Zhang, S. Kim, J. Grewal, and P. S. Albert (2012) Predicting large fetuses at birth: do multiple ultrasound examinations and longitudinal statistical modelling improve prediction? Paediatric and Perinatal Epidemiology 26 (3), pp. 199–207.
  • Z. Zhang, R. C. Bast, Y. Yu, J. Li, L. J. Sokoll, A. J. Rai, J. M. Rosenzweig, B. Cameron, Y. Y. Wang, X. Meng, et al. (2004) Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Research 64 (16), pp. 5882–5890.