Robust parametric modeling of Alzheimer's disease progression

by   Mostafa Mehdipour Ghazi, et al.

Quantitative characterization of disease progression using longitudinal data can provide long-term predictions for the pathological stages of individuals. This work studies robust modeling of Alzheimer's disease progression using parametric methods. The proposed method linearly maps individual's chronological age to a disease progression score (DPS) and robustly fits a constrained generalized logistic function to the longitudinal dynamic of a biomarker as a function of the DPS using M-estimation. Robustness of the estimates is quantified using bootstrapping via Monte Carlo resampling, and the inflection points are used to temporally order the modeled biomarkers in the disease course. Moreover, kernel density estimation is applied to the obtained DPSs for clinical status prediction using a Bayesian classifier. Different M-estimators and logistic functions, including a new generalized type proposed in this study called modified Stannard, are evaluated on the ADNI database for robust modeling of volumetric MRI and PET biomarkers, as well as neuropsychological tests. The results show that the modified Stannard function fitted using the modified Huber loss achieves the best modeling performance with a mean of median absolute errors (MMAE) of 0.059 across all biomarkers and bootstraps. In addition, applied to the ADNI test set, this model achieves a multi-class area under the ROC curve (MAUC) of 0.87 in clinical status prediction, and it significantly outperforms an analogous state-of-the-art method with a biomarker modeling MMAE of 0.059 vs. 0.061 (p < 0.001). Finally, the experiments show that the proposed model, trained using abundant ADNI data, generalizes well to data from the independent NACC database, where both modeling and diagnostic performance are significantly improved (p < 0.001) compared with using a model trained using relatively sparse NACC data.



page 1

page 2

page 3

page 4


Training recurrent neural networks robust to incomplete data: application to Alzheimer's disease progression modeling

Disease progression modeling (DPM) using longitudinal data is a challeng...

DPVis: Visual Exploration of Disease Progression Pathways

Clinical researchers use disease progression modeling algorithms to pred...

DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications

The ability to predict the future trajectory of a patient is a key step ...

Robust training of recurrent neural networks to handle missing data for disease progression modeling

Disease progression modeling (DPM) using longitudinal data is a challeng...

Representing Alzheimer's Disease Progression via Deep Prototype Tree

For decades, a variety of predictive approaches have been proposed and e...

Copula-based semiparametric transformation model for bivariate data under general interval censoring

This research is motivated by discovering and underpinning genetic cause...

A Discriminative Event Based Model for Alzheimer's Disease Progression Modeling

The event-based model (EBM) for data-driven disease progression modeling...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Figure 1:

An illustration of how the proposed method (red curves) tackles the existing biomarker curve fitting problems. Left: A flexible function is used to better fit the asymmetric shape of the data points. Middle: A constrained function is utilized to estimate the exact dynamic of the biomarker considering its known lower and upper asymptotes. Right: A robust estimator is applied to fit a curve to data contaminated with outliers.

Alzheimer’s disease (AD) is the most common type of dementia and leads to progressive neurodegeneration, affecting memory and behavior according to regional damage to brain cells (Ewers et al., 2011). The hippocampus, which is the center of learning and memory, is often one of the first regions of the brain to be damaged. It has also been shown that cerebrospinal fluid (CSF) biomarkers can become abnormal in the presymptomatic phase of the disease, preceding positron emission tomography (PET) and magnetic resonance imaging (MRI) biomarkers followed by clinical markers (Jack Jr et al., 2010; Biagioni and Galvin, 2011). Currently, the cause of AD is not clear, and there is no cure or effective treatment to stop its progression, but an early diagnosis of the disease, especially in the pre-symptomatic stages, can provide time to treat symptoms and plan for the future. However, early diagnosis of AD is challenging mainly because elderly subjects under study can suffer from different age-related pathologies and normal aging besides AD. Therefore, methods to stage and identify at-risk individuals are critical to dementia research.

Disease progression modeling use longitudinal studies to develop data-driven quantitative models that describe the evolution of the disease over time. The approach can therefore provide a complete perspective of the disease by computationally exploring the available data to help with better understanding of the disease for diagnostic, staging, monitorization, and prognostic purposes. Parametric disease progression modeling methods can be divided into two categories, continuous fitting for modeling the dynamic of biomarkers and discrete ordering for abnormality detection of biomarkers, both relying on unsupervised learning, e.g., by using maximum-likelihood estimation. The goal of the continuous approaches is to model biomarker values as a function of disease progression

(Jedynak et al., 2012; Oxtoby and Alexander, 2017). The discrete approaches, on the other hand, focus on temporally ordering of biomarkers becoming abnormal during the disease stages by discretizing the disease progression trajectory using generative, event-based models subjected to a number of assumptions (Fonteijn et al., 2012; Venkatraghavan et al., 2019).

Continuous, data-driven Alzheimer’s disease progression modeling has been inspired by the hypothetical models assuming sigmoidal evolution of AD biomarkers (Jack Jr et al., 2010, 2013). Accordingly, a variety of approaches have been applied to fit a continuous function to the longitudinal dynamic of each biomarker using statistical models such as differential equations and mixed-effects models (Oxtoby et al., 2014; Yau et al., 2015; Guerrero et al., 2016), in which one needs to align the trajectory of individuals based on some time measure, e.g., time-to-conversion or years-to-onset. These methods are simple and require less data, but parametric assumptions on the biomarker trajectories limit the flexibility of the fits.

Recently, nonparametric disease progression modeling methods has been introduced to model biomarkers jointly while taking temporal dependencies among measurements into account (Lorenzi et al., 2017; Ghazi et al., 2019). These methods are flexible, make less assumption for modeling, and do not require alignment of the trajectories of the individuals. On the other hand, they are less interpretable, need a lot of multidimensional data for successful training, and cannot simply be applied to predict the test data when it includes missing or irregular visits, or when it has fewer biomarkers than what is used to train the network.

Jedynak et al. (2012, 2015)

proposed a parametric algorithm, which requires less time-series data to train and incorporates information from multiple biomarkers for modeling progression of AD over a common disease timescale. The method linearly transforms the chronological age of individual to a disease progression score (DPS) for time-wise alignment of within-cohort measurements, assuming that the visit intervals in the data under study are short relative to the disease duration. Alternating least squares is used to fit a sigmoid function to the longitudinal dynamic of each biomarker. The model results in a reasonable approximation for normal or cognitively impaired subjects; however, it is not robust to outliers that can be found in a more contaminated data and does not generalize well for AD patients due to lack of data covering severe AD.

In this paper, a robust extension of Jedynak et al. (2012, 2015) is proposed that fits a constrained generalized logistic function to the longitudinal dynamic of each biomarker using M-estimation to address potential curve fitting problems in the biomarker modeling. Figure 1

illustrates 1) how the use of a generalized logistic function improves the curve flexibility, 2) how the use of a constrained function moves asymptotes and ensures the full biomarker range fitting, and 3) how the use of M-estimation reduces the influence of outliers. The estimated parameters are quantified using bootstrapping via Monte Carlo resampling, and the inflection points are used to temporally order the biomarkers in the disease course. In addition, kernel density estimation with normal bases is applied to the estimated DPSs for clinical status prediction using a Bayesian classifier. Different logistic functions, including a new generalized type proposed in this study called modified Stannard, and loss functions are applied to AD progression modeling of the Alzheimer’s Disease Neuroimaging Initiative (ADNI)

(Petersen et al., 2010) and the National Alzheimer’s Coordinating Center (NACC) (Beekly et al., 2007) data using volumetric MRI biomarkers, PET measurements, and neuropsychological tests. The obtained results indicate that the proposed generalized logistic function (modified Stannard) fitted using the modified Huber loss achieves the best modeling performance over different bootstraps, and it consistently outperforms the basic algorithm of Jedynak et al. (2012, 2015) in all the considered experiments.

The main contribution of this study is four-fold. First, the introduction of different generalized logistic functions including the new type proposed in this study called modified Stannard allows for more flexibility, e.g., asymmetry, improving the biomarker trajectory modeling capabilities of the method compared to using a standard sigmoid function. Second, the use of M-estimation suppresses the effect of outliers on the fit. Third, the across-cohort generalizability of the proposed model is evaluated by applying the model trained using ADNI data to the test data from the NACC cohort. Finally, an end-to-end approach is introduced that performs biomarker trajectory modeling (unsupervised learning), biomarker’s significant change detection (event ordering), and clinical status prediction (supervised learning). This is a holistic way to implement a system suitable for both research and clinical applications to better study and monitor AD.

2 The proposed method

The main objective of this work is to minimize the error of parametric disease progression modeling while making the estimates stable and robust to outliers. This is achieved by fitting a constrained generalized logistic function using a maximum likelihood type estimator (M-estimator).

2.1 Modeling dynamics of biomarkers

Similar to Jedynak et al. (2012, 2015), two sets of parameters are estimated in the model: observed biomarker-specific parameters, which are assigned for fitting the biomarker curves, and hidden subject-specific (age-related) disease progression parameters that are defined for aligning the trajectory of subjects. Assume that is the -th biomarker’s value at the -th visit of the -th subject and is an S-shaped logistic function of DPS with parameters . Each biomarker measurement is defined as


is the standard deviation of the

-th biomarker with parameters, is additive white Gaussian noise (random effect) with i.i.d. assumption, and is the DPS for the -th visit of the -th subject and is obtained as

where is the chronological age of subject in visit , and and are the rate and onset of disease progression of subject , respectively. Finally, the objective function for robust nonlinear regression is defined as

where is a maximum likelihood type function.

For fitting the longitudinal trajectories of biomarkers, four logistic functions are considered (Table 1). All functions have the same range and can produce the same inflection points at , to be used for biomarker ordering. We candidate utilization of a flexible logistic function obtained by modifying Stannard function (Stannard et al., 1985) so as to create an asymmetric growth curve with an inflection point at . In the defined functions, and denote the growth rate and symmetry parameter of the curves, respectively. The reason for restricting to the positive real numbers is to make parameters of the estimation identifiable.

Logistic function
Verhulst at 0
Gompertz at 0
Richards at 0
Modified Stannard at 0
Table 1: Details of the utilized logistic functions for AD progression modeling. Note that the range of each function can be controlled by two additional parameters.

It can also be deducted from Table 1 that the sigmoid function first introduced by Verhulst (Verhulst, 1845) is a special (symmetric) case of both Richards’ function (Richards, 1959) and the proposed function when . Moreover, Gompertz’s function (Gompertz, 1825) is a simplified form of Richards’ function when approaches zero, i.e., . Finally, the upper and lower asymptotes of the curves can be adjusted by two additional parameters (Zwietering et al., 1990) as

The range parameters, and , are set to fixed values corresponding to the pre-defined range of biomarkers, which is almost always the case for neuropsychological tests. For the other biomarkers whose exact range are not defined, e.g., volumetric MRI and PET measurements, the parameters are constrained to lower and upper bounds derived from the available range of the data, i.e., .

2.2 Model fitting

Alternating approach, as an efficient optimization technique, is applied to solve the problem where the algorithm iteratively estimates using fixed values of and and vice versa until the parameters converge. This way, there is no need to optimize the objective function for the standard deviation of biomarkers , and they can be obtained when the biomarker-specific parameters are estimated. The proposed algorithm can be summarized as follows

  1. Initialization: initialize and update them by iteratively applying linear least squares regression to fit the dynamics of biomarkers while linearly mapping ages to DPSs in an alternating scheme.

  2. Optimization: iterate until convergence.

    Biomarker fitting: estimate the biomarker-specific parameters using values of all subjects and visits.


    Age mapping: estimate the subject-specific parameters using values of all biomarkers and visits.


where , , and correspond to the number of subjects available for the -th biomarker, number of measurements among all subjects and visits available for the -th biomarker, and number of measurements among all biomarkers and visits available for the -th subject, respectively. Moreover, denotes the number of biomarker-specific parameters for the -th biomarker, is called the generalized M-scale estimator (Van Aelst et al., 2013) for the -th biomarker, and

represents the degrees of freedom of the

-th fit. Therefore, the algorithm can be applied in case any subject has at least two distinct points considering all biomarkers and visits, and if any biomarker contains more than points considering all subjects and visits.

The utilized maximum likelihood type functions for robust regression (Holland and Welsch, 1977; Pennacchi, 2008) are described in Table 2. These estimators attempt to diminish influence of the outliers while fitting curves. In general, M-estimators use a tuning parameter called to scale the functions as in order to yield

asymptotic efficiency with respect to the standard normal distribution. The corresponding tuning constants for the utilized functions are also reported in Table


Loss function
L2 1
L1-L2 1
Logistic 1.205
Modified Huber 1.2107
Cauchy-Lorentz 2.3849
Table 2: The utilized -type M-estimators and their corresponding scale factors for robust regression.

Finally, the obtained DPSs are standardized with respect to the scores of the available cognitively normal visits of subjects in order to calibrate all biomarker trajectories in different experiments. This process removes mean of the normal visits’ distribution of DPSs and scales the range to give a better intuition of timing of disease progression in the course of AD. In this case, it would be necessary to properly update the parameters as

where and are the mean and standard deviation of the DPSs in the available cognitively normal visits of subjects, respectively.

2.3 Biomarker value prediction

Biomarker values can be predicted using the fitted model parameters. Age mapping part of the proposed algorithm is applied to estimate the subject-specific parameters using Equation (2) based on values of those biomarkers of the test subject that have available estimated biomarke-specific parameters in the fitted model. Next, biomarker values are predicted as using the estimated test DPSs where is the logistic function applied to model fitting.

2.4 Clinical status prediction

In order to predict the clinical status of test subjects per visit, kernel density estimation (KDE) (Parzen, 1962) is used to fit likelihoods of cognitively normal, cognitively impaired, and AD groups in a nonparametric way. Assume that is a set of i.i.d. DPSs sampled from an unknown distribution with density function , where denotes the -th class label. KDE is expressed as

where is a smooth (kernel) function with a smoothing bandwidth . Here, the Gaussian kernel is used as the smoothing function.

The clinical status is predicted based on the DPSs with a Bayesian classifier that uses the KDE-based fitted likelihoods as


is the data-driven prior probability for the

-th class, the term in the denominator specifies the evidence, and

is the posterior probability for predicting the test DPS that belongs to the class


3 Experimental setup

3.1 Data

Data used in this work is obtained from the ADNI and the NACC databases.

clinical status age, year education, year MMSE
(meanSD) (meanSD) (meanSD)
CN 77.814.95 15.553.03 29.111.09
MCI 75.077.67 15.722.89 27.462.13
AD 76.467.50 15.662.90 21.704.38


Unknown 74.987.72 15.812.81 26.572.61
CN 81.867.72 13.923.58 27.932.10
MCI 80.728.54 13.933.63 25.432.73
AD 81.108.15 13.923.63 19.565.08


Unknown 79.408.38 14.363.45
  • Note that missing clinical status is indicated as Unknown, and MMSE is not available for NACC subjects at visits with missing clinical status.

Table 3: Demographics of the utilized data across visits.
# visits per clinical status # visits per subject visit interval, year # measurements per subject
CN MCI AD Unknown (meanSD) (meanSD) (meanSD)
ADNI 405 3,847 2,057 1,431 6.522.45 0.640.37 46.8620.79
NACC 178 203 317 199 7.732.95 0.980.60 19.587.40
Table 4: Statistics of the visits per dataset.

3.1.1 Adni

The ADNI was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease. We use The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge dataset (Marinescu et al., 2018) that includes the three ADNI phases ADNI 1, ADNI GO, and ADNI 2. This dataset contains measurements from brain MRI, PET, CSF, neuropsychological tests, and demographics and genetics information.

The labels cognitively normal (CN), significant memory concern (SMC), and normal (NL) are merged under CN; mild cognitive impairment (MCI), early MCI (EMCI), and late MCI (LMCI) under MCI; and AD and dementia under AD. In addition, subjects converting from one clinical status to another, e.g., MCI-to-AD, are assigned the latter label (AD in this example). The utilized ADNI data includes T1-weighted brain MRI volumes of ventricles, hippocampus, whole brain, fusiform, and entorhinal cortex, fludeoxyglucose-PET (FDG-PET), as well as the neuropsychological tests clinical dementia rating sum of boxes (CDR-SB), Alzheimer’s disease assessment scale 13 items (ADAS-13), mini-mental state examination (MMSE), functional activities questionnaire (FAQ), Montreal cognitive assessment (MOCA), and Rey auditory verbal learning test of immediate recall (RAVLTI).

3.1.2 Nacc

The NACC, established by the National Institute on Aging of the National Institutes of Health in 1999, has been developing a large database of standardized clinical and neuropathological data from both exploratory and explanatory Alzheimer’s disease research (Beekly et al., 2007). The data has been collected from different Alzheimer’s disease centers across the United States and among others contains measurements from different modalities such as MRI, PET, and neuropsychological tests.

Labels with numerical cognitive status of one (normal cognition) and two (impaired-not-MCI) are merged under CN, and cognitive status of three (MCI) and four (Dementia) are set to MCI and AD, respectively. It should be noted that we only keep subjects with primary etiologic diagnosis of normal, AD, or missing. This is to exclude subjects diagnosed with other types of dementia, non-neurodegenerative disease, or a non-neurological condition. The used NACC data includes T1-weighted brain MRI volumes of hippocampus and whole brain, and the neuropsychological tests CDR-SB, MMSE, and total FAQ.

3.1.3 Data filtering

In each of the ADNI and NACC subsets, within-class outliers across all subjects and visits are filtered for each biomarker using pre-defined range of the biomarker by assuming them as missing values. Moreover, erroneous and preceding dates or clinical diagnoses are corrected or removed per subject considering the neighboring visits. Finally, measurements or clinical diagnoses with missing date information per visit are set to missing values. We also omit normal subjects staying normal to reduce the effect of normal aging on AD progression modeling.

In order to be able to apply the proposed algorithm, the utilized data needs to meet two filtering criteria. According to the first criteria, subjects with less than two distinct points considering all biomarkers and visits are eliminated. To fulfill the second criteria, any biomarker with less than points considering all subjects and visits is omitted.

The utilized 12 ADNI biomarkers are collected from 1,187 subjects (692 males and 495 females) in 7,740 visits between 2005 and 2017, and the five NACC biomarkers are acquired from 116 subjects (43 males and 73 females) in 897 visits between 2002 and 2018. Table 3 and Table 4 summarize statistics of the demographics and measurements in the two datasets after data filtering. Note that both datasets include missing values and missing clinical status, the latter indicated as Unknown.

3.1.4 Data pre-processing

In the ADNI dataset, the volumetric measurements were obtained using two different versions of the public FreeSurfer software package (Fischl et al., 2002), and in the NACC dataset, they were calculated using IDeA Lab following ADNI protocols. Therefore, the MRI measurements need to be standardized across FreeSurfer version (Gronenschild et al., 2012) and across software package. In addition, biological difference in brain size hinders direct utilization of MRI measurements for disease progression estimation. Intracranial volume (ICV) is a commonly used measure for normalization to correct for head size. To overcome both aforementioned problems of difference in software (version) and head size, we employ the residual approach (O’Brien et al., 2011)

based on the analysis of covariance, which takes data from control groups and linearly regresses MRI volumes on their corresponding ICV as a covariate of interest. The corrected measurements can thus be calculated as the estimated residuals

of the volumes using the regression parameters obtained from the control data as

where is the -th MRI volume for subject at visit calculated (observed) using software or software version , is the corresponding intracranial volume, and and are the intercept and slope of the regression estimated from the CN group.

3.1.5 Data partitioning and bootstrapping

To evaluate the algorithms, each of the ADNI and NACC datasets is partitioned into two non-overlapping sets for training and testing. Based on the first and last available diagnoses of subjects, i.e., CN-CN, CN-MCI, …, AD-AD, we divide each of these types of pairs into two groups including few and large number of visits using the median number of visits as threshold and randomly select of the subjects from each of these groups for testing.

Additionally, bootstrapping is used in the experiments for quantification of the estimation, and in each bootstrap, a subset of the training subjects is randomly sampled with replacement based on the first and last available pair of diagnoses while ensuring that subjects with few and large number of visits are included in the training set. The unused subjects are assigned for validation, which constitute of the subjects where

is the base of the natural logarithm. This also means that the estimated variance using the bootstrapped model will account for approximately

of the total variance.

To facilitate future research in AD progression modeling and comparison with the current study, all the data splits, including each bootstrap split, are also provided on arXiv as ancillary files.

3.2 Evaluation metrics

The mean of the median absolute error (MMAE) across all biomarkers is used to assess the modeling performance. It is a measure robust to outliers based on the absolute differences between actual and estimated values over all normalized biomarker values as follows

where is the number of biomarkers, and are the ground-truth and estimated values of the -th biomarker for the -th test subject at the -th visit, and and indicate the lower and upper bounds of the -th biomarker. The modeling performance of two different methods is statistically compared using the paired, two-sided Wilcoxon signed rank test (Wilcoxon, 1945) applied to the MMAEs obtained from different bootstraps.

Additionally, multi-class area under the receiver operating characteristic (ROC) curve (MAUC) (Hand and Till, 2001) is used to measure the diagnostic performance in a multi-class test set and is calculated using the posterior probabilities as

where is the number of distinct classes, denotes the number of available observations belonging to the -th class, and is the sum of the ranks of posteriors after sorting all concatenated posteriors in an ascending order, where and

are vectors of DPSs belonging to the true classes

and , respectively. Likewise, is the sum of the ranks of posteriors after sorting all concatenated posteriors in an ascending order.

[1pt][lr] Logistic
functionLoss function L2 L1-L2 Logistic Modified Huber Cauchy-Lorentz
Verhulst 60.900.92 59.851.03 59.711.00 59.640.98 60.201.03
Gompertz 60.880.93 60.070.98 59.880.95 59.830.87 60.610.95
Richards 60.590.91 59.760.91 59.610.85 59.550.84 60.160.95
Modified Stannard 60.610.81 59.700.89 59.700.95 59.490.75 60.070.90
  • The best result is shown in bold face and its corresponding configuration is selected for the remaining experiments.

Table 5: Modeling performance as MMAE (meanSD) for 100 bootstrapped ADNI validation subsets using different logistic and loss functions.

3.3 Initialization of the algorithm and optimization

Since the fitting algorithm is iteratively performed using an alternating approach starting from values optimized in the previous step, initialization is an important step for efficiently reaching the optimum. We initially set and to and , respectively. Moreover, we set the intercept of the (substituted) linear functions used for biomarker fitting to and their constrained slope () to either or depending on the distribution of biomarker values across age. A positive slope is considered when the median of the -th biomarker’s values for young subjects is less than that for old subjects and vice versa. Next, we update estimates of the subject/biomarker-specific parameters for iterations using linear least squares fitting for both biomarker fitting and age mapping.

After all parameters are linearly estimated, the parameters of the logistic functions are initially estimated as and , where and are the minimum and maximum of the -th biomarker’s values, respectively, provided that the slope is positive, and vice versa if the slope is negative. In addition, each inflection point is estimated as the median of the obtained DPSs. Finally, we repeat the alternating procedure using the logistic functions and the trust-region algorithm (Coleman and Li, 1996) considering robust estimators for at most iterations.

3.4 Stopping criteria

To avoid overfitting, the optimization process is stopped according to the following stopping criteria (Prechelt, 1998)

where and are the average validations loss and average training loss over all biomarkers at the -th iteration, respectively, and they are obtained through biomarker fitting using Equation (1). The left hand side of the inequality is called the generalization loss, and the right hand side of the inequality is the training progress measured by a training strip of length and threshold of , set to and , respectively. This strip provides a suppressed stopping taking iterations into account, which allows the generalization loss to possibly be repaired when it begins fluctuating and the training error still decreases quickly.

4 Results and discussion

4.1 Biomarker modeling

First, the proposed method is applied to model the dynamics of the ADNI biomarkers. Table 5 illustrates the modeling performance (MMAE) for ADNI validation subsets obtained from 100 bootstraps using different logistic and loss functions. As can be seen, the combination of the modified Stannard function for biomarker fitting and the modified Huber loss achieves the best modeling performance of all the alternatives with both the lowest mean MMAE and the smallest standard deviation. This configuration will be used in all the remaining experiments.

To further investigate the stability and robustness of the model with the chosen configuration of logistic and loss functions, we visualize the fitted trajectories for each of the 100 bootstrap runs together with their average per biomarker in Figure 2.

Figure 2: Estimated curves per bootstrap for the ADNI biomarkers using the modified Stannard function and the modified Huber loss. The average of the 100 bootstrapped curves per biomarker is shown as the black curve.

(a) The whole trajectory of all biomarkers.

(b) A zoom on the DPS axis showing the most dynamic area.
Figure 3: Average of the normalized curves of the ADNI biomarkers across 100 bootstraps.
Figure 4: Temporal ordering of the ADNI biomarkers in the disease course obtained using inflection points and quantified through 100 bootstraps.
Within cohort Across cohort
Jedynak et al. (2015) 61.390.64 73.669.67 69.141.81
The proposed 59.010.30 71.515.23 66.201.10
Table 6: Test modeling performance of different methods as MMAE (meanSD) for ADNI and NACC test biomarkers. All average MMAEs are significantly different ().

4.2 Temporal ordering of biomarkers

To give an indication of the timing and the dynamics of the different biomarkers relative to each other, Figure 3 shows the average curves scaled to using the estimated upper and lower asymptotes per biomarker and superimposed in the same figure. The distribution of the inflection points of the biomarkers, quantified through bootstrapping, can be used to see how biomarkers proceed in the course of AD with respect to each other. The inflection point can be considered a turning point after which a dramatic change in the biomarker is expected to occur. Figure 4 displays the temporal ordering of the ADNI biomarkers based on the estimated inflection points. As can be seen, RAVLTI precedes all other biomarkers. This interesting finding is in line with the results of Jedynak et al. (2012, 2015) and is consistent with several clinical studies concluding that some neuropsychological tests including RAVLT are significant predictors that can predict neurodegenerative changes up to 10 years before clinical diagnosis (Tierney et al., 2005; Landau et al., 2010; Zandifar et al., 2019). Moreover, bearing Figure 3 in mind, volumetric MRI biomarkers start becoming abnormal early in the disease course. However, some of these biomarkers such as ventricles and whole brain are, as also seen in Figure 4, noisy measurements to model progression of AD in this dataset. It is important to note that the inflection points are utilized to order the biomarkers in the disease course. These points do not imply when the biomarkers start becoming abnormal and hence, are not measures for early abnormality detection.

4.3 DPS distribution versus biomarker timing

Figure 5 shows the variance of the estimated inflection points per biomarker alongside the estimated class-conditional likelihoods of the obtained DPSs from 100 bootstraps. As can be seen, there is a high overlap between the DPS distributions of CN and MCI while the CN and AD groups can be discriminated easily. Moreover, the estimated inflection points per biomarker are almost in line with those of the hypothetical model by Jack Jr et al. (2010) that illustrates when biomarkers are dynamic versus disease stages. Especially, inflection points of the MRI biomarkers (brain structure) are mainly located in the MCI stage while those of the neuropsychological tests (memory) lie on the AD stage.

4.4 Predicting biomarker values

The biomarker-specific parameters estimated using the training set in each bootstrap are applied to map the test individuals’ ages to DPSs using Equation (2). The obtained DPSs are subsequently fed to the per-bootstrap estimated biomarker functions for the proposed model and the analogous state-of-the-art model by Jedynak et al. (2015) that fits the basic sigmoid function using unconstrained, L2-norm loss function to evaluate their predictive power. Table 6 shows the ADNI test MMAEs of the aforementioned methods obtained using the estimates from 100 bootstraps. The proposed model significantly outperforms the analogous state-of-the-art model with an average MMAE of vs. ().

Figure 5: Estimated class-conditional likelihoods using the DPSs obtained from 100 ADNI-trained bootstraps. The box plots indicate the 25th to 75th percentiles of the estimated inflection points per biomarker, centrally marked with the median, and they are extended to the most extreme non-outlier inflection points using dashed lines.

Figure 6: Estimated curves per bootstrap for the NACC biomarkers using the modified Stannard function and the modified Huber loss. The average of the 100 bootstrapped curves per biomarker is shown as the black curve. The last subfigure shows average of the normalized curves of the NACC biomarkers across 100 bootstraps.

4.5 Predicting clinical status

To evaluate the diagnostic predictive performance, the obtained training DPSs per bootstrap are used to estimate class-conditional likelihood functions using KDE and fed to a three-class Bayesian classifier with prior probabilities proportional to the number of training observations in each class. The classifiers, one for each bootstrap, are applied to the test DPSs estimated as described in Section 4.4 to compute the posterior probabilities of the clinical labels. The proposed model achieves an MAUC of in predicting the clinical status of the test ADNI subjects per visit, which reveals the effect of modeling on classification performance.

The obtained posterior probabilities from the different classifiers from each bootstrap can be combined using ensemble learning techniques to potentially improve prediction performance and robustness (Kuncheva, 2014); for example, by taking the average of the within-class posteriors over an ensemble of models from different bootstraps, corresponding to bagging, before calculating the MAUC. By fusing the posteriors of the proposed method in such a way, the MAUC increases to .

The MAUC of obtained using the proposed method outperforms the state-of-the-art, cross-sectional MRI-based classification results of the challenge on Computer-Aided Diagnosis of Dementia (CADDementia) (Bron et al., 2015) where the best MAUC was . However, it should be noted that the CADDementia algorithms were applied to classify data from independent cohorts using different types of cross-sectional MRI biomarkers besides volumetry.

4.6 Generalizability across cohorts

As the final set of experiments, the generalizability of the proposed model to an independent cohort is assessed using the NACC data. First, the model using the optimal configuration (modified Stannard as logistic function and modified Huber loss as M-estimator) according to the ADNI experiments is applied to model AD within NACC. Figure 6 depicts the modeled NACC biomarkers for 100 bootstraps. Second, the previously ADNI-trained model with the optimal configuration is utilized to predict the NACC test measurements using the ADNI-based estimates of the corresponding set of biomarkers, i.e., CDR-SB, MMSE, FAQ, hippocampus, and whole brain. Table 6 compares the modeling performance of the ADNI-trained and the NACC-trained models applied to the NACC test set. As it can be noticed from the obtained results, the previously selected configuration for ADNI data is also a good choice when applied to NACC data. Moreover, both modeling performance of the proposed method and the state-of-the-art model of Jedynak et al. (2015) significantly improve () by applying the ADNI-trained model to the NACC test set by average MMAEs of and , respectively. That is, not only does the ADNI-trained model generalize across cohorts, it also improves the performance compared to using the NACC-trained model. One reason for the improvement could be the relatively larger size of the ADNI data and its additional biomarkers providing a better foundation for robust modeling of biomarkers. The results also show that the proposed model significantly outperforms () the state-of-the-art model of Jedynak et al. (2015) in all cases.

Additionally, we apply the ADNI and NACC trained models to the estimated test NACC DPSs to predict clinical status per subject per visit. The proposed model achieves MAUCs of and in predicting clinical status of the test NACC subjects per visit when the trained NACC and trained ADNI models are applied, respectively. This also reveals that the diagnostic performance improves by applying the ADNI-trained model to the NACC test set.

5 Conclusions

In this paper, a robust parametric model of Alzheimer’s disease progression was proposed based on alternating M-estimation using the modified Huber loss that linearly transformed individuals’ chronological ages to disease progression scores and fitted a novel generalized logistic function to the longitudinal dynamic of each biomarker. The estimated parameters were then used to temporally order the biomarkers in the disease course and to predict the test biomarker values as well as the clinical status per subject visit. The obtained results showed the superiority of the proposed method over the analogous state-of-the-art model proposed by Jedynak et al. (2015) in terms of prediction performance and generalizability.

The proposed approach can be applied to different time-series data including missing points and labels, or to biomarkers with other characteristics than the monotonic behavior that one typically encounters in MRI-based neurodegenerative disease progression modeling as long as suitable functions are used for the biomarker modeling. Moreover, as an alternative to using M-estimators, resistant estimators such as the least trimmed sum of squares and least median of squares (Rousseeuw and Leroy, 1987) with higher breakdown points can be used instead in the algorithm to fit biomarkers. However, this will result in an additional parameter to be optimized for the coverage (range) needed for trimming the residuals.


M. Nielsen is shareholder in Biomediq A/S and Cerebriu A/S. A. Pai is shareholder in Cerebriu A/S. The remaining authors report no disclosures.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721820. This work uses the TADPOLE data sets ( constructed by the EuroPOND consortium ( funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 666992.

ADNI acknowledgments–Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health ( The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

NACC acknowledgments–The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIAfunded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P50 AG005134 (PI Bradley Hyman, MD, PhD), P50 AG016574 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Thomas Wisniewski, MD), P30 AG013854 (PI M. Marsel Mesulam, MD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P50 AG005131 (PI James Brewer, MD, PhD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG053760 (PI Henry Paulson, MD, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P30 AG049638 (PI Suzanne Craft, PhD), P50 AG005136 (PI Thomas Grabowski, MD), P50 AG033514 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), P50 AG047270 (PI Stephen Strittmatter, MD, PhD).


  • Beekly et al. (2007) Beekly, D.L., Ramos, E.M., Lee, W.W., Deitrich, W.D., Jacka, M.E., Wu, J., Hubbard, J.L., Koepsell, T.D., Morris, J.C., Kukull, W.A., et al., 2007. The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set. Alzheimer Dis. Assoc. Disord. 21, 249–258.
  • Biagioni and Galvin (2011) Biagioni, M.C., Galvin, J.E., 2011. Using biomarkers to improve detection of Alzheimer’s disease. Neurodegener. Dis. Manag. 1, 127–139.
  • Bron et al. (2015) Bron, E.E., Smits, M., Van Der Flier, W.M., Vrenken, H., Barkhof, F., Scheltens, P., Papma, J.M., Steketee, R.M., Orellana, C.M., Meijboom, R., et al., 2015. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: the CADDementia challenge. NeuroImage 111, 562–579.
  • Coleman and Li (1996) Coleman, T.F., Li, Y., 1996. An interior trust region approach for nonlinear minimization subject to bounds. SIAM J. Optim. 6, 418–445.
  • Ewers et al. (2011) Ewers, M., Sperling, R.A., Klunk, W.E., Weiner, M.W., Hampel, H., 2011. Neuroimaging markers for the prediction and early diagnosis of Alzheimer’s disease dementia. Trends Neurosci. 34, 430–442.
  • Fischl et al. (2002) Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., Van Der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M., 2002. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355.
  • Fonteijn et al. (2012) Fonteijn, H.M., Modat, M., Clarkson, M.J., Barnes, J., Lehmann, M., Hobbs, N.Z., Scahill, R.I., Tabrizi, S.J., Ourselin, S., Fox, N.C., et al., 2012. An event-based model for disease progression and its application in familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60, 1880–1889.
  • Ghazi et al. (2019) Ghazi, M.M., Nielsen, M., Pai, A., Cardoso, M.J., Modat, M., Ourselin, S., Sørensen, L., 2019.

    Training recurrent neural networks robust to incomplete data: application to Alzheimer’s disease progression modeling.

    Med. Image Anal. 53, 39–46.
  • Gompertz (1825) Gompertz, B., 1825. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos. Trans. R. Soc. London , 513–583.
  • Gronenschild et al. (2012) Gronenschild, E.H., Habets, P., Jacobs, H.I., Mengelers, R., Rozendaal, N., Van Os, J., Marcelis, M., 2012. The effects of FreeSurfer version, workstation type, and Macintosh operating system version on anatomical volume and cortical thickness measurements. PloS One 7, e38234.
  • Guerrero et al. (2016) Guerrero, R., Schmidt-Richberg, A., Ledig, C., Tong, T., Wolz, R., Rueckert, D., 2016. Instantiated mixed effects modeling of Alzheimer’s disease markers. NeuroImage 142, 113–125.
  • Hand and Till (2001) Hand, D.J., Till, R.J., 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171–186.
  • Holland and Welsch (1977) Holland, P.W., Welsch, R.E., 1977. Robust regression using iteratively reweighted least-squares. Commun. Stat. Theory Methods 6, 813–827.
  • Jack Jr et al. (2013) Jack Jr, C.R., Knopman, D.S., Jagust, W.J., Petersen, R.C., Weiner, M.W., Aisen, P.S., Shaw, L.M., Vemuri, P., Wiste, H.J., Weigand, S.D., et al., 2013. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 12, 207–216.
  • Jack Jr et al. (2010) Jack Jr, C.R., Knopman, D.S., Jagust, W.J., Shaw, L.M., Aisen, P.S., Weiner, M.W., Petersen, R.C., Trojanowski, J.Q., 2010. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. Lancet Neurol. 9, 119–128.
  • Jedynak et al. (2012) Jedynak, B.M., Lang, A., Liu, B., Katz, E., Zhang, Y., Wyman, B.T., Raunig, D., Jedynak, C.P., Caffo, B., Prince, J.L., 2012. A computational neurodegenerative disease progression score: method and results with the Alzheimer’s Disease Neuroimaging Initiative cohort. NeuroImage 63, 1478–1486.
  • Jedynak et al. (2015) Jedynak, B.M., Liu, B., Lang, A., Gel, Y., Prince, J.L., 2015. A computational method for computing an Alzheimer’s disease progression score; experiments and validation with the ADNI data set. Neurobiol. Aging 36, S178–S184.
  • Kuncheva (2014) Kuncheva, L.I., 2014. Combining pattern classifiers: methods and algorithms. John Wiley & Sons.
  • Landau et al. (2010) Landau, S., Harvey, D., Madison, C., Reiman, E., Foster, N., Aisen, P., Petersen, R.C., Shaw, L., Trojanowski, J., Jack, C., et al., 2010. Comparing predictors of conversion and decline in mild cognitive impairment. Neurol. 75, 230–238.
  • Lorenzi et al. (2017) Lorenzi, M., Filippone, M., Frisoni, G.B., Alexander, D.C., Ourselin, S., 2017. Probabilistic disease progression modeling to characterize diagnostic uncertainty: application to staging and prediction in Alzheimer’s disease. NeuroImage 190, 56–68.
  • Marinescu et al. (2018) Marinescu, R.V., Oxtoby, N.P., Young, A.L., Bron, E.E., Toga, A.W., Weiner, M.W., Barkhof, F., Fox, N.C., Klein, S., Alexander, D.C., 2018. TADPOLE challenge: prediction of longitudinal evolution in Alzheimer’s disease. CoRR abs/1805.03909.
  • O’Brien et al. (2011) O’Brien, L.M., Ziegler, D.A., Deutsch, C.K., Frazier, J.A., Herbert, M.R., Locascio, J.J., 2011. Statistical adjustments for brain size in volumetric neuroimaging studies: some practical implications in methods. Psychiatry Res.: Neuroimaging 193, 113–122.
  • Oxtoby and Alexander (2017) Oxtoby, N.P., Alexander, D.C., 2017. Imaging plus X: multimodal models of neurodegenerative disease. Curr. Opin. Neurol. 30, 371.
  • Oxtoby et al. (2014) Oxtoby, N.P., Young, A.L., Fox, N.C., Daga, P., Cash, D.M., Ourselin, S., Schott, J.M., Alexander, D.C., 2014. Learning imaging biomarker trajectories from noisy Alzheimer’s disease data using a Bayesian multilevel model, in: Bayesian and Graphical Models for Biomedical Imaging, pp. 85–94.
  • Parzen (1962) Parzen, E., 1962.

    On estimation of a probability density function and mode.

    Ann. Math. Stat. 33, 1065–1076.
  • Pennacchi (2008) Pennacchi, P., 2008. Robust estimate of excitations in mechanical systems using M-estimators–theoretical background and numerical applications. J. Sound Vib. 310, 923–946.
  • Petersen et al. (2010) Petersen, R.C., Aisen, P., Beckett, L., Donohue, M., Gamst, A., Harvey, D., Jack, C., Jagust, W., Shaw, L., Toga, A., Trojanowski, J., Weiner, M., 2010. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurol. 74, 201–209.
  • Prechelt (1998) Prechelt, L., 1998. Early stopping–but when?, in: Neural Networks: Tricks of the Trade, pp. 55–69.
  • Richards (1959) Richards, F., 1959. A flexible growth function for empirical use. J. Exp. Bot. 10, 290–301.
  • Rousseeuw and Leroy (1987) Rousseeuw, P.J., Leroy, A.M., 1987.

    Robust regression and outlier detection. volume 1.

    Wiley Online Library.
  • Stannard et al. (1985) Stannard, C., Williams, A., Gibbs, P., 1985. Temperature/growth relationships for psychrotrophic food-spoilage bacteria. Food Microbiol. 2, 115–122.
  • Tierney et al. (2005) Tierney, M.C., Yao, C., Kiss, A., McDowell, I., 2005. Neuropsychological tests accurately predict incident Alzheimer disease after 5 and 10 years. Neurol. 64, 1853–1859.
  • Van Aelst et al. (2013) Van Aelst, S., Willems, G., Zamar, R.H., 2013. Robust and efficient estimation of the residual scale in linear regression. J. Multivar. Anal. 116, 278–296.
  • Venkatraghavan et al. (2019) Venkatraghavan, V., Bron, E.E., Niessen, W.J., Klein, S., 2019. Disease progression timeline estimation for Alzheimer’s disease using discriminative event based modeling. NeuroImage 186, 518–532.
  • Verhulst (1845) Verhulst, P., 1845. La loi d’accroissement de la population. Nouv. Mém. de l’Academie Royale des Sci. et Belles-Lettres de Bruxelles 18, 1–41.
  • Wilcoxon (1945) Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biom. Bull. 1, 80–83.
  • Yau et al. (2015) Yau, W.Y.W., Tudorascu, D.L., McDade, E.M., Ikonomovic, S., James, J.A., Minhas, D., Mowrey, W., Sheu, L.K., Snitz, B.E., Weissfeld, L., et al., 2015. Longitudinal assessment of neuroimaging and clinical markers in autosomal dominant Alzheimer’s disease: a prospective cohort study. Lancet Neurol. 14, 804–813.
  • Zandifar et al. (2019) Zandifar, A., Fonov, V., Ducharme, S., Belleville, S., Collins, D.L., 2019. MRI and cognitive scores complement each other to accurately predict Alzheimer’s dementia 2 to 7 years before clinical onset. bioRxiv , 567867.
  • Zwietering et al. (1990) Zwietering, M., Jongenburger, I., Rombouts, F., Van’t Riet, K., 1990. Modeling of the bacterial growth curve. Appl. Environ. Microbiol. 56, 1875–1881.