1 Introduction
An illustration of how the proposed method (red curves) tackles the existing biomarker curve fitting problems. Left: A flexible function is used to better fit the asymmetric shape of the data points. Middle: A constrained function is utilized to estimate the exact dynamic of the biomarker considering its known lower and upper asymptotes. Right: A robust estimator is applied to fit a curve to data contaminated with outliers.
Alzheimer’s disease (AD) is the most common type of dementia and leads to progressive neurodegeneration, affecting memory and behavior according to regional damage to brain cells (Ewers et al., 2011). The hippocampus, which is the center of learning and memory, is often one of the first regions of the brain to be damaged. It has also been shown that cerebrospinal fluid (CSF) biomarkers can become abnormal in the presymptomatic phase of the disease, preceding positron emission tomography (PET) and magnetic resonance imaging (MRI) biomarkers followed by clinical markers (Jack Jr et al., 2010; Biagioni and Galvin, 2011). Currently, the cause of AD is not clear, and there is no cure or effective treatment to stop its progression, but an early diagnosis of the disease, especially in the presymptomatic stages, can provide time to treat symptoms and plan for the future. However, early diagnosis of AD is challenging mainly because elderly subjects under study can suffer from different agerelated pathologies and normal aging besides AD. Therefore, methods to stage and identify atrisk individuals are critical to dementia research.
Disease progression modeling use longitudinal studies to develop datadriven quantitative models that describe the evolution of the disease over time. The approach can therefore provide a complete perspective of the disease by computationally exploring the available data to help with better understanding of the disease for diagnostic, staging, monitorization, and prognostic purposes. Parametric disease progression modeling methods can be divided into two categories, continuous fitting for modeling the dynamic of biomarkers and discrete ordering for abnormality detection of biomarkers, both relying on unsupervised learning, e.g., by using maximumlikelihood estimation. The goal of the continuous approaches is to model biomarker values as a function of disease progression
(Jedynak et al., 2012; Oxtoby and Alexander, 2017). The discrete approaches, on the other hand, focus on temporally ordering of biomarkers becoming abnormal during the disease stages by discretizing the disease progression trajectory using generative, eventbased models subjected to a number of assumptions (Fonteijn et al., 2012; Venkatraghavan et al., 2019).Continuous, datadriven Alzheimer’s disease progression modeling has been inspired by the hypothetical models assuming sigmoidal evolution of AD biomarkers (Jack Jr et al., 2010, 2013). Accordingly, a variety of approaches have been applied to fit a continuous function to the longitudinal dynamic of each biomarker using statistical models such as differential equations and mixedeffects models (Oxtoby et al., 2014; Yau et al., 2015; Guerrero et al., 2016), in which one needs to align the trajectory of individuals based on some time measure, e.g., timetoconversion or yearstoonset. These methods are simple and require less data, but parametric assumptions on the biomarker trajectories limit the flexibility of the fits.
Recently, nonparametric disease progression modeling methods has been introduced to model biomarkers jointly while taking temporal dependencies among measurements into account (Lorenzi et al., 2017; Ghazi et al., 2019). These methods are flexible, make less assumption for modeling, and do not require alignment of the trajectories of the individuals. On the other hand, they are less interpretable, need a lot of multidimensional data for successful training, and cannot simply be applied to predict the test data when it includes missing or irregular visits, or when it has fewer biomarkers than what is used to train the network.
proposed a parametric algorithm, which requires less timeseries data to train and incorporates information from multiple biomarkers for modeling progression of AD over a common disease timescale. The method linearly transforms the chronological age of individual to a disease progression score (DPS) for timewise alignment of withincohort measurements, assuming that the visit intervals in the data under study are short relative to the disease duration. Alternating least squares is used to fit a sigmoid function to the longitudinal dynamic of each biomarker. The model results in a reasonable approximation for normal or cognitively impaired subjects; however, it is not robust to outliers that can be found in a more contaminated data and does not generalize well for AD patients due to lack of data covering severe AD.
In this paper, a robust extension of Jedynak et al. (2012, 2015) is proposed that fits a constrained generalized logistic function to the longitudinal dynamic of each biomarker using Mestimation to address potential curve fitting problems in the biomarker modeling. Figure 1
illustrates 1) how the use of a generalized logistic function improves the curve flexibility, 2) how the use of a constrained function moves asymptotes and ensures the full biomarker range fitting, and 3) how the use of Mestimation reduces the influence of outliers. The estimated parameters are quantified using bootstrapping via Monte Carlo resampling, and the inflection points are used to temporally order the biomarkers in the disease course. In addition, kernel density estimation with normal bases is applied to the estimated DPSs for clinical status prediction using a Bayesian classifier. Different logistic functions, including a new generalized type proposed in this study called modified Stannard, and loss functions are applied to AD progression modeling of the Alzheimer’s Disease Neuroimaging Initiative (ADNI)
(Petersen et al., 2010) and the National Alzheimer’s Coordinating Center (NACC) (Beekly et al., 2007) data using volumetric MRI biomarkers, PET measurements, and neuropsychological tests. The obtained results indicate that the proposed generalized logistic function (modified Stannard) fitted using the modified Huber loss achieves the best modeling performance over different bootstraps, and it consistently outperforms the basic algorithm of Jedynak et al. (2012, 2015) in all the considered experiments.The main contribution of this study is fourfold. First, the introduction of different generalized logistic functions including the new type proposed in this study called modified Stannard allows for more flexibility, e.g., asymmetry, improving the biomarker trajectory modeling capabilities of the method compared to using a standard sigmoid function. Second, the use of Mestimation suppresses the effect of outliers on the fit. Third, the acrosscohort generalizability of the proposed model is evaluated by applying the model trained using ADNI data to the test data from the NACC cohort. Finally, an endtoend approach is introduced that performs biomarker trajectory modeling (unsupervised learning), biomarker’s significant change detection (event ordering), and clinical status prediction (supervised learning). This is a holistic way to implement a system suitable for both research and clinical applications to better study and monitor AD.
2 The proposed method
The main objective of this work is to minimize the error of parametric disease progression modeling while making the estimates stable and robust to outliers. This is achieved by fitting a constrained generalized logistic function using a maximum likelihood type estimator (Mestimator).
2.1 Modeling dynamics of biomarkers
Similar to Jedynak et al. (2012, 2015), two sets of parameters are estimated in the model: observed biomarkerspecific parameters, which are assigned for fitting the biomarker curves, and hidden subjectspecific (agerelated) disease progression parameters that are defined for aligning the trajectory of subjects. Assume that is the th biomarker’s value at the th visit of the th subject and is an Sshaped logistic function of DPS with parameters . Each biomarker measurement is defined as
where
is the standard deviation of the
th biomarker with parameters, is additive white Gaussian noise (random effect) with i.i.d. assumption, and is the DPS for the th visit of the th subject and is obtained aswhere is the chronological age of subject in visit , and and are the rate and onset of disease progression of subject , respectively. Finally, the objective function for robust nonlinear regression is defined as
where is a maximum likelihood type function.
For fitting the longitudinal trajectories of biomarkers, four logistic functions are considered (Table 1). All functions have the same range and can produce the same inflection points at , to be used for biomarker ordering. We candidate utilization of a flexible logistic function obtained by modifying Stannard function (Stannard et al., 1985) so as to create an asymmetric growth curve with an inflection point at . In the defined functions, and denote the growth rate and symmetry parameter of the curves, respectively. The reason for restricting to the positive real numbers is to make parameters of the estimation identifiable.
Logistic function  

Verhulst  at  0  
Gompertz  at  0  
Richards  at  0  
Modified Stannard  at  0 
It can also be deducted from Table 1 that the sigmoid function first introduced by Verhulst (Verhulst, 1845) is a special (symmetric) case of both Richards’ function (Richards, 1959) and the proposed function when . Moreover, Gompertz’s function (Gompertz, 1825) is a simplified form of Richards’ function when approaches zero, i.e., . Finally, the upper and lower asymptotes of the curves can be adjusted by two additional parameters (Zwietering et al., 1990) as
The range parameters, and , are set to fixed values corresponding to the predefined range of biomarkers, which is almost always the case for neuropsychological tests. For the other biomarkers whose exact range are not defined, e.g., volumetric MRI and PET measurements, the parameters are constrained to lower and upper bounds derived from the available range of the data, i.e., .
2.2 Model fitting
Alternating approach, as an efficient optimization technique, is applied to solve the problem where the algorithm iteratively estimates using fixed values of and and vice versa until the parameters converge. This way, there is no need to optimize the objective function for the standard deviation of biomarkers , and they can be obtained when the biomarkerspecific parameters are estimated. The proposed algorithm can be summarized as follows

Initialization: initialize and update them by iteratively applying linear least squares regression to fit the dynamics of biomarkers while linearly mapping ages to DPSs in an alternating scheme.

Optimization: iterate until convergence.
Biomarker fitting: estimate the biomarkerspecific parameters using values of all subjects and visits.
(1) Age mapping: estimate the subjectspecific parameters using values of all biomarkers and visits.
(2)
where , , and correspond to the number of subjects available for the th biomarker, number of measurements among all subjects and visits available for the th biomarker, and number of measurements among all biomarkers and visits available for the th subject, respectively. Moreover, denotes the number of biomarkerspecific parameters for the th biomarker, is called the generalized Mscale estimator (Van Aelst et al., 2013) for the th biomarker, and
represents the degrees of freedom of the
th fit. Therefore, the algorithm can be applied in case any subject has at least two distinct points considering all biomarkers and visits, and if any biomarker contains more than points considering all subjects and visits.The utilized maximum likelihood type functions for robust regression (Holland and Welsch, 1977; Pennacchi, 2008) are described in Table 2. These estimators attempt to diminish influence of the outliers while fitting curves. In general, Mestimators use a tuning parameter called to scale the functions as in order to yield
asymptotic efficiency with respect to the standard normal distribution. The corresponding tuning constants for the utilized functions are also reported in Table
2.Loss function  

L2  1  
L1L2  1  
Logistic  1.205  
Modified Huber  1.2107  
CauchyLorentz  2.3849 
Finally, the obtained DPSs are standardized with respect to the scores of the available cognitively normal visits of subjects in order to calibrate all biomarker trajectories in different experiments. This process removes mean of the normal visits’ distribution of DPSs and scales the range to give a better intuition of timing of disease progression in the course of AD. In this case, it would be necessary to properly update the parameters as
where and are the mean and standard deviation of the DPSs in the available cognitively normal visits of subjects, respectively.
2.3 Biomarker value prediction
Biomarker values can be predicted using the fitted model parameters. Age mapping part of the proposed algorithm is applied to estimate the subjectspecific parameters using Equation (2) based on values of those biomarkers of the test subject that have available estimated biomarkespecific parameters in the fitted model. Next, biomarker values are predicted as using the estimated test DPSs where is the logistic function applied to model fitting.
2.4 Clinical status prediction
In order to predict the clinical status of test subjects per visit, kernel density estimation (KDE) (Parzen, 1962) is used to fit likelihoods of cognitively normal, cognitively impaired, and AD groups in a nonparametric way. Assume that is a set of i.i.d. DPSs sampled from an unknown distribution with density function , where denotes the th class label. KDE is expressed as
where is a smooth (kernel) function with a smoothing bandwidth . Here, the Gaussian kernel is used as the smoothing function.
The clinical status is predicted based on the DPSs with a Bayesian classifier that uses the KDEbased fitted likelihoods as
where
is the datadriven prior probability for the
th class, the term in the denominator specifies the evidence, andis the posterior probability for predicting the test DPS that belongs to the class
.3 Experimental setup
3.1 Data
Data used in this work is obtained from the ADNI and the NACC databases.
clinical status  age, year  education, year  MMSE  

(meanSD)  (meanSD)  (meanSD)  
CN  77.814.95  15.553.03  29.111.09  
MCI  75.077.67  15.722.89  27.462.13  
AD  76.467.50  15.662.90  21.704.38  
ADNI 
Unknown  74.987.72  15.812.81  26.572.61 
CN  81.867.72  13.923.58  27.932.10  
MCI  80.728.54  13.933.63  25.432.73  
AD  81.108.15  13.923.63  19.565.08  
NACC 
Unknown  79.408.38  14.363.45 

Note that missing clinical status is indicated as Unknown, and MMSE is not available for NACC subjects at visits with missing clinical status.
# visits per clinical status  # visits per subject  visit interval, year  # measurements per subject  

CN  MCI  AD  Unknown  (meanSD)  (meanSD)  (meanSD)  
ADNI  405  3,847  2,057  1,431  6.522.45  0.640.37  46.8620.79 
NACC  178  203  317  199  7.732.95  0.980.60  19.587.40 
3.1.1 Adni
The ADNI was launched in 2003 as a publicprivate partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. We use The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge dataset (Marinescu et al., 2018) that includes the three ADNI phases ADNI 1, ADNI GO, and ADNI 2. This dataset contains measurements from brain MRI, PET, CSF, neuropsychological tests, and demographics and genetics information.
The labels cognitively normal (CN), significant memory concern (SMC), and normal (NL) are merged under CN; mild cognitive impairment (MCI), early MCI (EMCI), and late MCI (LMCI) under MCI; and AD and dementia under AD. In addition, subjects converting from one clinical status to another, e.g., MCItoAD, are assigned the latter label (AD in this example). The utilized ADNI data includes T1weighted brain MRI volumes of ventricles, hippocampus, whole brain, fusiform, and entorhinal cortex, fludeoxyglucosePET (FDGPET), as well as the neuropsychological tests clinical dementia rating sum of boxes (CDRSB), Alzheimer’s disease assessment scale 13 items (ADAS13), minimental state examination (MMSE), functional activities questionnaire (FAQ), Montreal cognitive assessment (MOCA), and Rey auditory verbal learning test of immediate recall (RAVLTI).
3.1.2 Nacc
The NACC, established by the National Institute on Aging of the National Institutes of Health in 1999, has been developing a large database of standardized clinical and neuropathological data from both exploratory and explanatory Alzheimer’s disease research (Beekly et al., 2007). The data has been collected from different Alzheimer’s disease centers across the United States and among others contains measurements from different modalities such as MRI, PET, and neuropsychological tests.
Labels with numerical cognitive status of one (normal cognition) and two (impairednotMCI) are merged under CN, and cognitive status of three (MCI) and four (Dementia) are set to MCI and AD, respectively. It should be noted that we only keep subjects with primary etiologic diagnosis of normal, AD, or missing. This is to exclude subjects diagnosed with other types of dementia, nonneurodegenerative disease, or a nonneurological condition. The used NACC data includes T1weighted brain MRI volumes of hippocampus and whole brain, and the neuropsychological tests CDRSB, MMSE, and total FAQ.
3.1.3 Data filtering
In each of the ADNI and NACC subsets, withinclass outliers across all subjects and visits are filtered for each biomarker using predefined range of the biomarker by assuming them as missing values. Moreover, erroneous and preceding dates or clinical diagnoses are corrected or removed per subject considering the neighboring visits. Finally, measurements or clinical diagnoses with missing date information per visit are set to missing values. We also omit normal subjects staying normal to reduce the effect of normal aging on AD progression modeling.
In order to be able to apply the proposed algorithm, the utilized data needs to meet two filtering criteria. According to the first criteria, subjects with less than two distinct points considering all biomarkers and visits are eliminated. To fulfill the second criteria, any biomarker with less than points considering all subjects and visits is omitted.
The utilized 12 ADNI biomarkers are collected from 1,187 subjects (692 males and 495 females) in 7,740 visits between 2005 and 2017, and the five NACC biomarkers are acquired from 116 subjects (43 males and 73 females) in 897 visits between 2002 and 2018. Table 3 and Table 4 summarize statistics of the demographics and measurements in the two datasets after data filtering. Note that both datasets include missing values and missing clinical status, the latter indicated as Unknown.
3.1.4 Data preprocessing
In the ADNI dataset, the volumetric measurements were obtained using two different versions of the public FreeSurfer software package (Fischl et al., 2002), and in the NACC dataset, they were calculated using IDeA Lab following ADNI protocols. Therefore, the MRI measurements need to be standardized across FreeSurfer version (Gronenschild et al., 2012) and across software package. In addition, biological difference in brain size hinders direct utilization of MRI measurements for disease progression estimation. Intracranial volume (ICV) is a commonly used measure for normalization to correct for head size. To overcome both aforementioned problems of difference in software (version) and head size, we employ the residual approach (O’Brien et al., 2011)
based on the analysis of covariance, which takes data from control groups and linearly regresses MRI volumes on their corresponding ICV as a covariate of interest. The corrected measurements can thus be calculated as the estimated residuals
of the volumes using the regression parameters obtained from the control data aswhere is the th MRI volume for subject at visit calculated (observed) using software or software version , is the corresponding intracranial volume, and and are the intercept and slope of the regression estimated from the CN group.
3.1.5 Data partitioning and bootstrapping
To evaluate the algorithms, each of the ADNI and NACC datasets is partitioned into two nonoverlapping sets for training and testing. Based on the first and last available diagnoses of subjects, i.e., CNCN, CNMCI, …, ADAD, we divide each of these types of pairs into two groups including few and large number of visits using the median number of visits as threshold and randomly select of the subjects from each of these groups for testing.
Additionally, bootstrapping is used in the experiments for quantification of the estimation, and in each bootstrap, a subset of the training subjects is randomly sampled with replacement based on the first and last available pair of diagnoses while ensuring that subjects with few and large number of visits are included in the training set. The unused subjects are assigned for validation, which constitute of the subjects where
is the base of the natural logarithm. This also means that the estimated variance using the bootstrapped model will account for approximately
of the total variance.To facilitate future research in AD progression modeling and comparison with the current study, all the data splits, including each bootstrap split, are also provided on arXiv as ancillary files.
3.2 Evaluation metrics
The mean of the median absolute error (MMAE) across all biomarkers is used to assess the modeling performance. It is a measure robust to outliers based on the absolute differences between actual and estimated values over all normalized biomarker values as follows
where is the number of biomarkers, and are the groundtruth and estimated values of the th biomarker for the th test subject at the th visit, and and indicate the lower and upper bounds of the th biomarker. The modeling performance of two different methods is statistically compared using the paired, twosided Wilcoxon signed rank test (Wilcoxon, 1945) applied to the MMAEs obtained from different bootstraps.
Additionally, multiclass area under the receiver operating characteristic (ROC) curve (MAUC) (Hand and Till, 2001) is used to measure the diagnostic performance in a multiclass test set and is calculated using the posterior probabilities as
where is the number of distinct classes, denotes the number of available observations belonging to the th class, and is the sum of the ranks of posteriors after sorting all concatenated posteriors in an ascending order, where and
are vectors of DPSs belonging to the true classes
and , respectively. Likewise, is the sum of the ranks of posteriors after sorting all concatenated posteriors in an ascending order.[1pt][lr] Logistic  

functionLoss function  L2  L1L2  Logistic  Modified Huber  CauchyLorentz 
Verhulst  60.900.92  59.851.03  59.711.00  59.640.98  60.201.03 
Gompertz  60.880.93  60.070.98  59.880.95  59.830.87  60.610.95 
Richards  60.590.91  59.760.91  59.610.85  59.550.84  60.160.95 
Modified Stannard  60.610.81  59.700.89  59.700.95  59.490.75  60.070.90 

The best result is shown in bold face and its corresponding configuration is selected for the remaining experiments.
3.3 Initialization of the algorithm and optimization
Since the fitting algorithm is iteratively performed using an alternating approach starting from values optimized in the previous step, initialization is an important step for efficiently reaching the optimum. We initially set and to and , respectively. Moreover, we set the intercept of the (substituted) linear functions used for biomarker fitting to and their constrained slope () to either or depending on the distribution of biomarker values across age. A positive slope is considered when the median of the th biomarker’s values for young subjects is less than that for old subjects and vice versa. Next, we update estimates of the subject/biomarkerspecific parameters for iterations using linear least squares fitting for both biomarker fitting and age mapping.
After all parameters are linearly estimated, the parameters of the logistic functions are initially estimated as and , where and are the minimum and maximum of the th biomarker’s values, respectively, provided that the slope is positive, and vice versa if the slope is negative. In addition, each inflection point is estimated as the median of the obtained DPSs. Finally, we repeat the alternating procedure using the logistic functions and the trustregion algorithm (Coleman and Li, 1996) considering robust estimators for at most iterations.
3.4 Stopping criteria
To avoid overfitting, the optimization process is stopped according to the following stopping criteria (Prechelt, 1998)
where and are the average validations loss and average training loss over all biomarkers at the th iteration, respectively, and they are obtained through biomarker fitting using Equation (1). The left hand side of the inequality is called the generalization loss, and the right hand side of the inequality is the training progress measured by a training strip of length and threshold of , set to and , respectively. This strip provides a suppressed stopping taking iterations into account, which allows the generalization loss to possibly be repaired when it begins fluctuating and the training error still decreases quickly.
4 Results and discussion
4.1 Biomarker modeling
First, the proposed method is applied to model the dynamics of the ADNI biomarkers. Table 5 illustrates the modeling performance (MMAE) for ADNI validation subsets obtained from 100 bootstraps using different logistic and loss functions. As can be seen, the combination of the modified Stannard function for biomarker fitting and the modified Huber loss achieves the best modeling performance of all the alternatives with both the lowest mean MMAE and the smallest standard deviation. This configuration will be used in all the remaining experiments.
To further investigate the stability and robustness of the model with the chosen configuration of logistic and loss functions, we visualize the fitted trajectories for each of the 100 bootstrap runs together with their average per biomarker in Figure 2.
Within cohort  Across cohort  

Method  ADNI  NACC  ADNI to NACC 
Jedynak et al. (2015)  61.390.64  73.669.67  69.141.81 
The proposed  59.010.30  71.515.23  66.201.10 
4.2 Temporal ordering of biomarkers
To give an indication of the timing and the dynamics of the different biomarkers relative to each other, Figure 3 shows the average curves scaled to using the estimated upper and lower asymptotes per biomarker and superimposed in the same figure. The distribution of the inflection points of the biomarkers, quantified through bootstrapping, can be used to see how biomarkers proceed in the course of AD with respect to each other. The inflection point can be considered a turning point after which a dramatic change in the biomarker is expected to occur. Figure 4 displays the temporal ordering of the ADNI biomarkers based on the estimated inflection points. As can be seen, RAVLTI precedes all other biomarkers. This interesting finding is in line with the results of Jedynak et al. (2012, 2015) and is consistent with several clinical studies concluding that some neuropsychological tests including RAVLT are significant predictors that can predict neurodegenerative changes up to 10 years before clinical diagnosis (Tierney et al., 2005; Landau et al., 2010; Zandifar et al., 2019). Moreover, bearing Figure 3 in mind, volumetric MRI biomarkers start becoming abnormal early in the disease course. However, some of these biomarkers such as ventricles and whole brain are, as also seen in Figure 4, noisy measurements to model progression of AD in this dataset. It is important to note that the inflection points are utilized to order the biomarkers in the disease course. These points do not imply when the biomarkers start becoming abnormal and hence, are not measures for early abnormality detection.
4.3 DPS distribution versus biomarker timing
Figure 5 shows the variance of the estimated inflection points per biomarker alongside the estimated classconditional likelihoods of the obtained DPSs from 100 bootstraps. As can be seen, there is a high overlap between the DPS distributions of CN and MCI while the CN and AD groups can be discriminated easily. Moreover, the estimated inflection points per biomarker are almost in line with those of the hypothetical model by Jack Jr et al. (2010) that illustrates when biomarkers are dynamic versus disease stages. Especially, inflection points of the MRI biomarkers (brain structure) are mainly located in the MCI stage while those of the neuropsychological tests (memory) lie on the AD stage.
4.4 Predicting biomarker values
The biomarkerspecific parameters estimated using the training set in each bootstrap are applied to map the test individuals’ ages to DPSs using Equation (2). The obtained DPSs are subsequently fed to the perbootstrap estimated biomarker functions for the proposed model and the analogous stateoftheart model by Jedynak et al. (2015) that fits the basic sigmoid function using unconstrained, L2norm loss function to evaluate their predictive power. Table 6 shows the ADNI test MMAEs of the aforementioned methods obtained using the estimates from 100 bootstraps. The proposed model significantly outperforms the analogous stateoftheart model with an average MMAE of vs. ().
4.5 Predicting clinical status
To evaluate the diagnostic predictive performance, the obtained training DPSs per bootstrap are used to estimate classconditional likelihood functions using KDE and fed to a threeclass Bayesian classifier with prior probabilities proportional to the number of training observations in each class. The classifiers, one for each bootstrap, are applied to the test DPSs estimated as described in Section 4.4 to compute the posterior probabilities of the clinical labels. The proposed model achieves an MAUC of in predicting the clinical status of the test ADNI subjects per visit, which reveals the effect of modeling on classification performance.
The obtained posterior probabilities from the different classifiers from each bootstrap can be combined using ensemble learning techniques to potentially improve prediction performance and robustness (Kuncheva, 2014); for example, by taking the average of the withinclass posteriors over an ensemble of models from different bootstraps, corresponding to bagging, before calculating the MAUC. By fusing the posteriors of the proposed method in such a way, the MAUC increases to .
The MAUC of obtained using the proposed method outperforms the stateoftheart, crosssectional MRIbased classification results of the challenge on ComputerAided Diagnosis of Dementia (CADDementia) (Bron et al., 2015) where the best MAUC was . However, it should be noted that the CADDementia algorithms were applied to classify data from independent cohorts using different types of crosssectional MRI biomarkers besides volumetry.
4.6 Generalizability across cohorts
As the final set of experiments, the generalizability of the proposed model to an independent cohort is assessed using the NACC data. First, the model using the optimal configuration (modified Stannard as logistic function and modified Huber loss as Mestimator) according to the ADNI experiments is applied to model AD within NACC. Figure 6 depicts the modeled NACC biomarkers for 100 bootstraps. Second, the previously ADNItrained model with the optimal configuration is utilized to predict the NACC test measurements using the ADNIbased estimates of the corresponding set of biomarkers, i.e., CDRSB, MMSE, FAQ, hippocampus, and whole brain. Table 6 compares the modeling performance of the ADNItrained and the NACCtrained models applied to the NACC test set. As it can be noticed from the obtained results, the previously selected configuration for ADNI data is also a good choice when applied to NACC data. Moreover, both modeling performance of the proposed method and the stateoftheart model of Jedynak et al. (2015) significantly improve () by applying the ADNItrained model to the NACC test set by average MMAEs of and , respectively. That is, not only does the ADNItrained model generalize across cohorts, it also improves the performance compared to using the NACCtrained model. One reason for the improvement could be the relatively larger size of the ADNI data and its additional biomarkers providing a better foundation for robust modeling of biomarkers. The results also show that the proposed model significantly outperforms () the stateoftheart model of Jedynak et al. (2015) in all cases.
Additionally, we apply the ADNI and NACC trained models to the estimated test NACC DPSs to predict clinical status per subject per visit. The proposed model achieves MAUCs of and in predicting clinical status of the test NACC subjects per visit when the trained NACC and trained ADNI models are applied, respectively. This also reveals that the diagnostic performance improves by applying the ADNItrained model to the NACC test set.
5 Conclusions
In this paper, a robust parametric model of Alzheimer’s disease progression was proposed based on alternating Mestimation using the modified Huber loss that linearly transformed individuals’ chronological ages to disease progression scores and fitted a novel generalized logistic function to the longitudinal dynamic of each biomarker. The estimated parameters were then used to temporally order the biomarkers in the disease course and to predict the test biomarker values as well as the clinical status per subject visit. The obtained results showed the superiority of the proposed method over the analogous stateoftheart model proposed by Jedynak et al. (2015) in terms of prediction performance and generalizability.
The proposed approach can be applied to different timeseries data including missing points and labels, or to biomarkers with other characteristics than the monotonic behavior that one typically encounters in MRIbased neurodegenerative disease progression modeling as long as suitable functions are used for the biomarker modeling. Moreover, as an alternative to using Mestimators, resistant estimators such as the least trimmed sum of squares and least median of squares (Rousseeuw and Leroy, 1987) with higher breakdown points can be used instead in the algorithm to fit biomarkers. However, this will result in an additional parameter to be optimized for the coverage (range) needed for trimming the residuals.
Disclosures
M. Nielsen is shareholder in Biomediq A/S and Cerebriu A/S. A. Pai is shareholder in Cerebriu A/S. The remaining authors report no disclosures.
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie SkłodowskaCurie grant agreement No 721820. This work uses the TADPOLE data sets (https://tadpole.grandchallenge.org) constructed by the EuroPOND consortium (http://europond.eu) funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 666992.
ADNI acknowledgments–Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH1220012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; BristolMyers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. HoffmannLa Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
NACC acknowledgments–The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIAfunded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P50 AG005134 (PI Bradley Hyman, MD, PhD), P50 AG016574 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Thomas Wisniewski, MD), P30 AG013854 (PI M. Marsel Mesulam, MD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P50 AG005131 (PI James Brewer, MD, PhD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG053760 (PI Henry Paulson, MD, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P30 AG049638 (PI Suzanne Craft, PhD), P50 AG005136 (PI Thomas Grabowski, MD), P50 AG033514 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), P50 AG047270 (PI Stephen Strittmatter, MD, PhD).
References
 Beekly et al. (2007) Beekly, D.L., Ramos, E.M., Lee, W.W., Deitrich, W.D., Jacka, M.E., Wu, J., Hubbard, J.L., Koepsell, T.D., Morris, J.C., Kukull, W.A., et al., 2007. The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set. Alzheimer Dis. Assoc. Disord. 21, 249–258.
 Biagioni and Galvin (2011) Biagioni, M.C., Galvin, J.E., 2011. Using biomarkers to improve detection of Alzheimer’s disease. Neurodegener. Dis. Manag. 1, 127–139.
 Bron et al. (2015) Bron, E.E., Smits, M., Van Der Flier, W.M., Vrenken, H., Barkhof, F., Scheltens, P., Papma, J.M., Steketee, R.M., Orellana, C.M., Meijboom, R., et al., 2015. Standardized evaluation of algorithms for computeraided diagnosis of dementia based on structural MRI: the CADDementia challenge. NeuroImage 111, 562–579.
 Coleman and Li (1996) Coleman, T.F., Li, Y., 1996. An interior trust region approach for nonlinear minimization subject to bounds. SIAM J. Optim. 6, 418–445.
 Ewers et al. (2011) Ewers, M., Sperling, R.A., Klunk, W.E., Weiner, M.W., Hampel, H., 2011. Neuroimaging markers for the prediction and early diagnosis of Alzheimer’s disease dementia. Trends Neurosci. 34, 430–442.
 Fischl et al. (2002) Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., Van Der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M., 2002. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355.
 Fonteijn et al. (2012) Fonteijn, H.M., Modat, M., Clarkson, M.J., Barnes, J., Lehmann, M., Hobbs, N.Z., Scahill, R.I., Tabrizi, S.J., Ourselin, S., Fox, N.C., et al., 2012. An eventbased model for disease progression and its application in familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60, 1880–1889.

Ghazi et al. (2019)
Ghazi, M.M., Nielsen, M.,
Pai, A., Cardoso, M.J.,
Modat, M., Ourselin, S.,
Sørensen, L., 2019.
Training recurrent neural networks robust to incomplete data: application to Alzheimer’s disease progression modeling.
Med. Image Anal. 53, 39–46.  Gompertz (1825) Gompertz, B., 1825. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos. Trans. R. Soc. London , 513–583.
 Gronenschild et al. (2012) Gronenschild, E.H., Habets, P., Jacobs, H.I., Mengelers, R., Rozendaal, N., Van Os, J., Marcelis, M., 2012. The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements. PloS One 7, e38234.
 Guerrero et al. (2016) Guerrero, R., SchmidtRichberg, A., Ledig, C., Tong, T., Wolz, R., Rueckert, D., 2016. Instantiated mixed effects modeling of Alzheimer’s disease markers. NeuroImage 142, 113–125.
 Hand and Till (2001) Hand, D.J., Till, R.J., 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186.
 Holland and Welsch (1977) Holland, P.W., Welsch, R.E., 1977. Robust regression using iteratively reweighted leastsquares. Commun. Stat. Theory Methods 6, 813–827.
 Jack Jr et al. (2013) Jack Jr, C.R., Knopman, D.S., Jagust, W.J., Petersen, R.C., Weiner, M.W., Aisen, P.S., Shaw, L.M., Vemuri, P., Wiste, H.J., Weigand, S.D., et al., 2013. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 12, 207–216.
 Jack Jr et al. (2010) Jack Jr, C.R., Knopman, D.S., Jagust, W.J., Shaw, L.M., Aisen, P.S., Weiner, M.W., Petersen, R.C., Trojanowski, J.Q., 2010. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. Lancet Neurol. 9, 119–128.
 Jedynak et al. (2012) Jedynak, B.M., Lang, A., Liu, B., Katz, E., Zhang, Y., Wyman, B.T., Raunig, D., Jedynak, C.P., Caffo, B., Prince, J.L., 2012. A computational neurodegenerative disease progression score: method and results with the Alzheimer’s Disease Neuroimaging Initiative cohort. NeuroImage 63, 1478–1486.
 Jedynak et al. (2015) Jedynak, B.M., Liu, B., Lang, A., Gel, Y., Prince, J.L., 2015. A computational method for computing an Alzheimer’s disease progression score; experiments and validation with the ADNI data set. Neurobiol. Aging 36, S178–S184.
 Kuncheva (2014) Kuncheva, L.I., 2014. Combining pattern classifiers: methods and algorithms. John Wiley & Sons.
 Landau et al. (2010) Landau, S., Harvey, D., Madison, C., Reiman, E., Foster, N., Aisen, P., Petersen, R.C., Shaw, L., Trojanowski, J., Jack, C., et al., 2010. Comparing predictors of conversion and decline in mild cognitive impairment. Neurol. 75, 230–238.
 Lorenzi et al. (2017) Lorenzi, M., Filippone, M., Frisoni, G.B., Alexander, D.C., Ourselin, S., 2017. Probabilistic disease progression modeling to characterize diagnostic uncertainty: application to staging and prediction in Alzheimer’s disease. NeuroImage 190, 56–68.
 Marinescu et al. (2018) Marinescu, R.V., Oxtoby, N.P., Young, A.L., Bron, E.E., Toga, A.W., Weiner, M.W., Barkhof, F., Fox, N.C., Klein, S., Alexander, D.C., 2018. TADPOLE challenge: prediction of longitudinal evolution in Alzheimer’s disease. CoRR abs/1805.03909.
 O’Brien et al. (2011) O’Brien, L.M., Ziegler, D.A., Deutsch, C.K., Frazier, J.A., Herbert, M.R., Locascio, J.J., 2011. Statistical adjustments for brain size in volumetric neuroimaging studies: some practical implications in methods. Psychiatry Res.: Neuroimaging 193, 113–122.
 Oxtoby and Alexander (2017) Oxtoby, N.P., Alexander, D.C., 2017. Imaging plus X: multimodal models of neurodegenerative disease. Curr. Opin. Neurol. 30, 371.
 Oxtoby et al. (2014) Oxtoby, N.P., Young, A.L., Fox, N.C., Daga, P., Cash, D.M., Ourselin, S., Schott, J.M., Alexander, D.C., 2014. Learning imaging biomarker trajectories from noisy Alzheimer’s disease data using a Bayesian multilevel model, in: Bayesian and Graphical Models for Biomedical Imaging, pp. 85–94.

Parzen (1962)
Parzen, E., 1962.
On estimation of a probability density function and mode.
Ann. Math. Stat. 33, 1065–1076.  Pennacchi (2008) Pennacchi, P., 2008. Robust estimate of excitations in mechanical systems using Mestimators–theoretical background and numerical applications. J. Sound Vib. 310, 923–946.
 Petersen et al. (2010) Petersen, R.C., Aisen, P., Beckett, L., Donohue, M., Gamst, A., Harvey, D., Jack, C., Jagust, W., Shaw, L., Toga, A., Trojanowski, J., Weiner, M., 2010. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurol. 74, 201–209.
 Prechelt (1998) Prechelt, L., 1998. Early stopping–but when?, in: Neural Networks: Tricks of the Trade, pp. 55–69.
 Richards (1959) Richards, F., 1959. A flexible growth function for empirical use. J. Exp. Bot. 10, 290–301.

Rousseeuw and Leroy (1987)
Rousseeuw, P.J., Leroy, A.M.,
1987.
Robust regression and outlier detection. volume 1.
Wiley Online Library.  Stannard et al. (1985) Stannard, C., Williams, A., Gibbs, P., 1985. Temperature/growth relationships for psychrotrophic foodspoilage bacteria. Food Microbiol. 2, 115–122.
 Tierney et al. (2005) Tierney, M.C., Yao, C., Kiss, A., McDowell, I., 2005. Neuropsychological tests accurately predict incident Alzheimer disease after 5 and 10 years. Neurol. 64, 1853–1859.
 Van Aelst et al. (2013) Van Aelst, S., Willems, G., Zamar, R.H., 2013. Robust and efficient estimation of the residual scale in linear regression. J. Multivar. Anal. 116, 278–296.
 Venkatraghavan et al. (2019) Venkatraghavan, V., Bron, E.E., Niessen, W.J., Klein, S., 2019. Disease progression timeline estimation for Alzheimer’s disease using discriminative event based modeling. NeuroImage 186, 518–532.
 Verhulst (1845) Verhulst, P., 1845. La loi d’accroissement de la population. Nouv. Mém. de l’Academie Royale des Sci. et BellesLettres de Bruxelles 18, 1–41.
 Wilcoxon (1945) Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biom. Bull. 1, 80–83.
 Yau et al. (2015) Yau, W.Y.W., Tudorascu, D.L., McDade, E.M., Ikonomovic, S., James, J.A., Minhas, D., Mowrey, W., Sheu, L.K., Snitz, B.E., Weissfeld, L., et al., 2015. Longitudinal assessment of neuroimaging and clinical markers in autosomal dominant Alzheimer’s disease: a prospective cohort study. Lancet Neurol. 14, 804–813.
 Zandifar et al. (2019) Zandifar, A., Fonov, V., Ducharme, S., Belleville, S., Collins, D.L., 2019. MRI and cognitive scores complement each other to accurately predict Alzheimer’s dementia 2 to 7 years before clinical onset. bioRxiv , 567867.
 Zwietering et al. (1990) Zwietering, M., Jongenburger, I., Rombouts, F., Van’t Riet, K., 1990. Modeling of the bacterial growth curve. Appl. Environ. Microbiol. 56, 1875–1881.
Comments
There are no comments yet.