Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by the presence of AD pathology (ADP), such as aberrant deposition of amyloid beta (Aβ) proteins and the appearance of neurofibrillary tangles of tau proteins. The initial symptom of AD is cognitive impairment, notably in the memory domain, that gradually involves other domains, leading to a clinical diagnosis of dementia of Alzheimer’s type (DAT). Patients with DAT progress to severe stages of dementia, requiring complete assistance for daily activities. DAT is the most common form of dementia, affecting 1 in 9 people over the age of 65 years [AlzAssoc2015FactsFigures] and as many as 1 in 3 people over the age of 85 [Hebert2013]. As of 2015, an estimated 46.8 million people were living with dementia, a number projected to reach 131.5 million by 2050 [Prince2016], placing a very sizeable burden on healthcare systems and caregivers worldwide. This impending public health crisis due to rising DAT cases has prompted drug-development efforts to find treatments for AD that can reduce the severity of ADP or remove it altogether [Cummings2014, Godyn2016]. However, the success of such treatments ultimately depends on the ability to diagnose DAT as early as possible, before irreversible brain damage occurs. Therefore, in recent years there has been a considerable push towards developing robust biomarkers useful for diagnosing DAT in clinical practice [Weiner2017].
Fluorodeoxyglucose positron emission tomography (FDG-PET) is a minimally invasive neuroimaging technique for quantifying glucose metabolism in the brain, which indirectly measures the underlying neuronal activity [Mosconi2010FdgPetReview]. As metabolic disruptions are hypothesized to precede the appearance of cognitive symptoms in AD [Jack2013], FDG-PET imaging presents itself as an attractive tool for investigating the metabolic changes triggered by ADP across the entire DAT spectrum, ranging from the presymptomatic phase to the mild cognitive impairment (MCI) stage followed by dementia. Our aim in this work is to develop an automatic method that can aid in the interpretation of the 3D topographic metabolism patterns encoded in FDG-PET images for the purpose of DAT diagnosis. To this end, we devised a supervised machine learning framework that takes as input an FDG-PET image of a subject and outputs a continuous value between 0 and 1, termed the FDG-PET DAT score (FPDS), which indicates the probability that the subject’s metabolism profile belongs to the DAT trajectory, i.e., how likely the subject is to be clinically diagnosed with DAT.
One of the main contributions of our work is the introduction of a novel approach for stratifying the imaging data used in the development and validation of the proposed FPDS methodology. Most commonly, imaging biomarker studies employ a 3-group stratification, where the clinical diagnostic labels of normal control (NC), MCI and DAT assigned at the time of image acquisition are directly used for grouping the imaging data [Rathore2017]. In contrast, here we present a stratification scheme that groups images based not only on their associated clinical diagnosis but also on past and future clinical diagnoses. Compared to a stratification that depends only on the diagnosis at a single timepoint, our stratification more faithfully represents the different diagnostic trajectories observed in a real-world clinical setting. For instance, based on our stratification, we can distinguish NC images that stay NC (stable NC, sNC) from those that convert to MCI (unstable NC, uNC) and from those that convert to DAT (progressive NC, pNC). A similar delineation is also induced among the MCI and DAT images using our stratification scheme. Another important contribution of this paper is the design of a novel multi-scale ensemble classification model for the proposed FPDS computation. The ensemble model consists of several individual classifiers trained on features extracted from the FDG-PET image at multiple scales. The probability predictions from each of these individual classifiers regarding the association of the given FDG-PET image with a DAT trajectory are fused together to obtain a more robust final FPDS prediction. A further noteworthy contribution of our work is the comprehensive statistical evaluation approach used to validate the FPDS predictions. First, the training model fit was evaluated, and then a pseudo-independent test sample consisting of follow-up images corresponding to the baseline training data was used to obtain a more accurate estimate of the ensemble model’s generalization error. Finally, the predictive performance of the FPDS biomarker was evaluated on a large, completely independent validation set of images taken from different stages of the DAT spectrum, demonstrating a strong generalization potential of the reported results. To the best of our knowledge, ours is the largest FDG-PET based imaging biomarker study reported to date.
2.1 Study participants
Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early Alzheimer’s disease. To date, ADNI has involved 1887 subjects, each assessed over one or more visits. The clinical diagnosis received by these subjects at each visit can be broadly categorized as one of NC, MCI and DAT. A detailed description of the ADNI recruitment procedure, image acquisition protocols and diagnostic criteria can be found at www.adni-info.org.
2.2 Novel database stratification
We devised a novel stratification scheme to distinguish within the NC, MCI and DAT groups based on the past and future clinical diagnoses received by the individual. Each of these three groups was further divided into subgroups based on the diagnoses received during follow-up. The subgroups are named according to the convention ‘prefixGroup’, where ‘Group’ is the clinical diagnosis obtained during the imaging visit, and ‘prefix’ signifies the past or the future clinical diagnoses of the same individual. Images associated with a clinical diagnosis of NC and a consistent diagnosis of NC during the entire ADNI study period are termed the stable NC (sNC) group. Images associated with a clinical diagnosis of NC from individuals who convert to MCI in future visits are termed unstable NC (uNC). Images associated with a clinical diagnosis of NC from individuals who convert to DAT in future visits are termed progressive NC (pNC). Similarly, images associated with MCI are subgrouped as stable MCI (sMCI) and progressive MCI (pMCI), based on a persistent MCI diagnosis and conversion to a DAT diagnosis, respectively, in subsequent follow-up. Images with a clinical diagnosis of DAT from individuals who joined ADNI at the DAT stage, i.e., who converted to a clinical diagnosis of DAT prior to ADNI recruitment and remained DAT for the future ADNI visits, are termed stable DAT (sDAT). Images with a clinical diagnosis of DAT and a recent past ADNI clinical diagnosis of either NC or MCI, i.e., from individuals who converted to DAT within the ADNI visits, are termed early DAT (eDAT). Note that a past or future clinical diagnosis visit may or may not include neuroimaging, but the past or future clinical diagnosis enables an enriched staging of each image given the evolution of clinical diagnosis.
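The naming rules above can be sketched as a small labelling function. This is only an illustrative sketch, not the code used in the study: the function name `stratify_image`, the list-of-strings input format, and the choice to return `None` for trajectories the text does not describe (for example a reversion from MCI to NC) are all assumptions.

```python
from typing import List, Optional


def stratify_image(diagnoses: List[str], visit_index: int) -> Optional[str]:
    """Label the image acquired at `visit_index` (hypothetical helper).

    `diagnoses` holds one individual's clinical diagnoses ("NC", "MCI",
    "DAT") in chronological order; visits without imaging still count.
    Returns None for trajectories the scheme does not describe.
    """
    current = diagnoses[visit_index]
    past = diagnoses[:visit_index]
    future = diagnoses[visit_index + 1:]

    if current == "NC":
        if all(d == "NC" for d in diagnoses):
            return "sNC"   # NC throughout the whole study period
        if "DAT" in future:
            return "pNC"   # later converts to DAT
        if "MCI" in future:
            return "uNC"   # later converts to MCI only
        return None
    if current == "MCI":
        if "DAT" in future:
            return "pMCI"  # later converts to DAT
        if all(d == "MCI" for d in future):
            return "sMCI"  # MCI persists in all later visits
        return None
    if current == "DAT":
        if any(d in ("NC", "MCI") for d in past):
            return "eDAT"  # converted to DAT during the study
        if all(d == "DAT" for d in future):
            return "sDAT"  # joined at the DAT stage and stayed DAT
        return None
    return None
```

For an individual diagnosed NC, then MCI, then DAT, the three visits would be labelled pNC, pMCI and eDAT respectively, matching the idea that each image is staged by the full diagnostic history rather than by a single timepoint.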
A key advantage of the proposed stratification is that it yields subgroups, namely pNC, pMCI, eDAT and sDAT, that represent various stages of the DAT trajectory. The pNC subgroup is the earliest, the sDAT subgroup is the most advanced, and the pMCI and eDAT subgroups lie between these extremes along the DAT spectrum. These are denoted as the DAT+ class of images, indicating their trajectory towards DAT. The subjects in the sNC, uNC and sMCI subgroups do not have a follow-up clinical diagnosis of DAT during the ADNI window. Although these subjects could progress to a clinical diagnosis of DAT after ADNI, for the purposes of the analysis in this paper these subgroups are considered not to be on the DAT+ trajectory and are hence denoted as DAT-.
2.3 MRI processing
Pre-processing of the 3D structural MPRAGE T1-weighted MRI images from ADNI included standard intensity normalization to remove image geometry distortions arising from gradient non-linearity, B1 calibrations to correct for image intensity non-uniformities and N3 histogram peak sharpening (http://adni.loni.usc.edu/methods/mri-analysis/mri-pre-processing). The pre-processed images were segmented into the gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) tissue regions [Dale1999FreesurferCorticalReconstruction] using the Freesurfer software package (https://surfer.nmr.mgh.harvard.edu). A rigorous quality control procedure was employed to manually identify and correct any errors in the automated tissue segmentations following Freesurfer’s troubleshooting guidelines. Subsequently, the GM tissue region was parcellated into 85 different anatomical ROIs using Freesurfer’s cortical [desikan2006automated] and subcortical [Fischl2002Freesurfer] labeling pipelines.
2.4 FDG-PET processing
The ADNI FDG-PET images used in this study were pre-processed using a series of steps to mitigate inter-scanner variability and obtain FDG-PET data with a uniform spatial resolution and intensity range for further analysis (http://adni.loni.usc.edu/methods/PET-analysis/pre-processing). Briefly, the original raw FDG-PET frames were co-registered and averaged to obtain a single FDG-PET image, which was then mapped from its native space to a standard image grid with mm voxels. After standardizing the spatial resolution and orientation, the intensity range of the FDG-PET image was normalized such that the average intensity of all the foreground voxels in the image was exactly equal to one. The intensity-normalized images were then filtered using scanner-specific filter functions to obtain FDG-PET data at a uniform smoothing level of isotropic mm full width at half maximum (FWHM) Gaussian kernel.
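The intensity normalization step can be illustrated with a minimal sketch. It assumes the image is given as a flat list of voxel intensities and that foreground voxels are those above a threshold; both the function name `normalize_foreground_mean` and the thresholding rule are assumptions made for illustration, since the text does not state how the foreground is defined.

```python
from typing import List


def normalize_foreground_mean(voxels: List[float], threshold: float = 0.0) -> List[float]:
    """Scale intensities so the mean over foreground voxels equals one.

    Foreground is taken to be every voxel above `threshold`
    (an assumption; the actual foreground mask may differ).
    """
    foreground = [v for v in voxels if v > threshold]
    if not foreground:
        raise ValueError("image has no foreground voxels")
    foreground_mean = sum(foreground) / len(foreground)
    return [v / foreground_mean for v in voxels]
```

After this scaling, intensities from different scanners share a common range, which is what allows the later patch-wise ratios to be compared across subjects.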
2.5 Multi-scale patch-wise FDG-PET SUVR features
In order to better localize the average regional glucose metabolism signal, each of the GM ROIs obtained using Freesurfer was further subdivided into smaller volumetric sub-regions or patches. Our previously proposed adaptive surface patch generation method [Raamana2015ThickNetFusion_NBA], which is based on k-means clustering, was applied to the 3D image domain to obtain a patch-wise parcellation of the GM ROIs. Instead of subdividing each GM ROI into a fixed number of patches, the number of patches per ROI was adaptively determined using the patch size parameter (), denoting the number of voxels in each patch. This achieves a patch density (patches in ROI/voxels in ROI) that is uniform () throughout the image domain, which is desirable, as it leads to a compact yet rich description of the entire GM tissue region. The scale-space theory framework [witkin1984scale] argues for storing the signal at multiple scales in the absence of a priori knowledge regarding the appropriate scale at which to analyze the signal. Motivated by this scale-space idea, we generated different levels of patch-wise parcellations, = to obtain a fine-to-coarse multi-scale representation of the GM region for capturing the regional glucose metabolism signals at different scales. We note that the patch-wise parcellations were initially generated on the standard MNI ICBM 152 non-linear average T1 template [grabner2006symmetric] (http://nist.mni.mcgill.ca/?p=858) and were then propagated to each of the target MRI images in our dataset using the large deformation diffeomorphic metric mapping (LDDMM) non-rigid registration [Beg2005lddmm]. This template-based parcellation approach ensures a one-to-one correspondence between the target image patches, which is required for the construction of a valid multi-scale FDG-PET feature space in the next step.
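The adaptive choice of patch count can be sketched as follows. This is a plain Lloyd-style k-means on voxel coordinates, written only to illustrate how a target patch size fixes the number of patches per ROI; it is not the adaptive patch generation method of [Raamana2015ThickNetFusion_NBA]. The function names, the deterministic seeding, and the rounding rule for the patch count are assumptions, and plain k-means does not guarantee exactly equal patch sizes.

```python
from typing import List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def parcellate_roi(voxels: List[Voxel], patch_size: int, n_iter: int = 20) -> List[int]:
    """Assign each voxel of a non-empty ROI to a patch (hypothetical helper).

    The number of patches is derived from the requested patch size, so
    patches per voxel stays roughly constant across ROIs of any size.
    """
    k = max(1, round(len(voxels) / patch_size))
    # Deterministic seeding: evenly spaced picks from the sorted voxel list.
    ordered = sorted(voxels)
    step = len(ordered) / k
    centers = [tuple(float(c) for c in ordered[int(i * step)]) for i in range(k)]
    labels = [0] * len(voxels)
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        labels = [min(range(k), key=lambda j: squared_distance(v, centers[j]))
                  for v in voxels]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(voxels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels
```

Running the same function with several values of `patch_size` would give the fine-to-coarse family of parcellations that the multi-scale representation relies on.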
The FDG-PET images were co-registered with their respective MRI images using the inter-modal linear registration facility [jenkinson2002improved] available as part of the FSL-FLIRT program (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT). The quality of the co-registration was visually checked, and the detected failures were corrected by re-running FSL-FLIRT with a narrower rotation angle search range parameter to avoid getting trapped in local minima. The estimated degrees of freedom (DOF) mapping was used to transfer the patch-wise parcellations from the MRI domain onto the FDG-PET domain. The mean FDG-PET image intensity value in each of the mapped patches was used to calculate the patch-wise standardized uptake value ratios (SUVRs), defined as the mean intensity in a given patch divided by the mean intensity in the brainstem, which was chosen as the reference ROI. This resulted in a total of (including the original Freesurfer parcellation) patch-wise FDG-PET SUVR feature vectors that encoded the multi-scale regional glucose metabolism information derived from a given target FDG-PET image.
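The SUVR definition above amounts to a ratio of means, which the following minimal sketch makes explicit. The function name `patch_suvrs` and the flat-list input layout (one intensity and one patch label per voxel, plus the intensities of the reference region) are assumptions made for illustration.

```python
from typing import Dict, List


def patch_suvrs(intensities: List[float],
                patch_labels: List[int],
                reference_intensities: List[float]) -> Dict[int, float]:
    """Mean intensity per patch divided by the mean of the reference ROI."""
    reference_mean = sum(reference_intensities) / len(reference_intensities)
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for value, label in zip(intensities, patch_labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    # One SUVR per patch, in ascending patch-label order.
    return {label: (sums[label] / counts[label]) / reference_mean
            for label in sorted(sums)}
```

Calling this once per parcellation level, with the brainstem voxels as the reference region, would give one SUVR feature vector per scale.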
2.6 FDG-PET DAT score computation via supervised ensemble learning
A supervised classification framework following the well-established ensemble learning paradigm was used to calculate the proposed FDG-PET DAT score from the multi-scale patch-wise SUVR feature vectors. The main idea behind ensemble-based supervised classification is to combine several individually trained classifiers to obtain a single, more robust classification model [dietterich2000ensemble]. Accordingly, in the proposed framework, classifiers were trained separately on each of the individual multi-scale feature vector spaces to construct a classifier ensemble. Then, the predictions from the individual classifiers in the ensemble were fused, yielding the ensemble model’s estimate of the probability that the input multi-scale feature vectors belong to the DAT+ trajectory. This probabilistic prediction output by the ensemble classification framework was taken to be the proposed FDG-PET DAT score.
The training samples corresponding to the DAT- and DAT+ classes needed for building the ensemble classification model were given by the baseline sNC (=) and sDAT (=) images respectively (Table 1, footnotes 5 and 6). The proposed multi-scale patch-wise FDG-PET SUVR feature vectors were extracted from all the training samples. To prevent over-fitting of the ensemble model to the chosen training sample set, the subagging approach [buhlmann2003bagging] was employed to randomly generate subsets of training samples. The random sampling was performed using a sampling ratio of in a stratified manner to avoid class imbalance, ensuring an equal number of samples from both the DAT- and DAT+ classes, i.e., samples in each of the training subsets. An ensemble of probabilistic kernel [Damoulas2008ProteinFoldRecog] classifiers was individually trained on each of the feature spaces using the different training subsets. The classifier training was preceded by a t-statistic based feature selection step to identify the most discriminative features within the feature vector and also to address the “curse of dimensionality” issue [Raamana2015ThickNetFusion_NBA]. Each of the trained probabilistic kernel classifiers outputs a continuous scalar , that denotes the probability of an input feature vector belonging to the DAT+ class ( being the DAT- class membership probability). The FDG-PET DAT score is then simply defined as the mean of the DAT+ class probability predictions obtained from each of the classifiers.
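Two of the training ingredients described above, stratified subagging and t-statistic based feature ranking, can be sketched with the standard library alone. This is an illustration under assumptions, not the study's implementation: the function names are hypothetical, the per-class subset size is taken as the sampling ratio times the smaller class size, and Welch's unequal-variance form of the t-statistic is used because the text does not say which variant was applied.

```python
import random
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def stratified_subsamples(neg_ids: Sequence[int], pos_ids: Sequence[int],
                          n_subsets: int, ratio: float,
                          seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Draw class-balanced training subsets without replacement (subagging)."""
    rng = random.Random(seed)
    # Assumed rule: same count per class, set by the smaller class.
    n_per_class = int(ratio * min(len(neg_ids), len(pos_ids)))
    return [(rng.sample(list(neg_ids), n_per_class),
             rng.sample(list(pos_ids), n_per_class))
            for _ in range(n_subsets)]


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic; assumes at least one group has non-zero variance."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_features(neg_rows: List[List[float]], pos_rows: List[List[float]],
                 n_keep: int) -> List[int]:
    """Indices of the n_keep features with the largest absolute t-statistic."""
    n_features = len(neg_rows[0])
    scored = [(abs(welch_t([r[j] for r in pos_rows], [r[j] for r in neg_rows])), j)
              for j in range(n_features)]
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:n_keep])
```

In such a setup, feature ranking would be repeated inside each training subset so that every classifier in the ensemble keeps its own list of selected features.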
In summary, given an unseen “test” sample containing an FDG-PET/MRI image pair, we first extract the multi-scale patch-wise SUVR feature vectors from the images, and then reduce the dimensionality of each of these feature vectors by retaining only the most discriminative features that were identified during the training phase. The pruned feature vectors are fed to the previously trained classifier ensemble to obtain probability predictions regarding the DAT+ class membership, which are then averaged to obtain the FDG-PET DAT score corresponding to the given test sample.
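The test-time procedure summarized above reduces to pruning, predicting and averaging. The sketch below shows only that fusion logic; the probabilistic kernel classifier itself is not reproduced, and any callable that returns a DAT+ probability stands in for it. The function name `fdg_pet_dat_score` and the `(scale, selected_indices, classifier)` layout of the ensemble are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical ensemble member: which scale it reads, which feature
# indices it kept during training, and a callable returning P(DAT+).
Member = Tuple[str, Sequence[int], Callable[[List[float]], float]]


def fdg_pet_dat_score(feature_vectors: Dict[str, List[float]],
                      ensemble: List[Member]) -> float:
    """Average the DAT+ probabilities predicted by all ensemble members."""
    probabilities = []
    for scale, selected, classifier in ensemble:
        # Keep only the features this member selected during training.
        pruned = [feature_vectors[scale][j] for j in selected]
        probabilities.append(classifier(pruned))
    return sum(probabilities) / len(probabilities)
```

Because the score is a mean of probabilities, it stays between 0 and 1 and a single unreliable classifier has limited influence on the final value.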
Our study dataset consisted of FDG-PET images (with corresponding structural MRI images), belonging to ADNI subjects, who have undergone imaging and clinical evaluations at one or more longitudinal time points. The images were stratified into one of the study groups based on the clinical diagnosis received at the time of image acquisition and the clinical diagnosis received previously and/or during subsequent follow-up time points (Table 1).
In the proposed stratification scheme, we distinguish among the images that have a clinical diagnosis of NC (sNC, uNC, pNC) at the imaging visit. Within this NC group, there are images from individuals who stay NC, i.e., stable NC (sNC, = images), convert to MCI, i.e., unstable NC (uNC, = images), or convert to DAT, i.e., progressive NC (pNC, = images). Hence, even though all are NC, the images are treated as distinct subgroups of the NC group given the divergent future evolution of their clinical diagnosis. In a similar fashion, we distinguish among the images with a clinical diagnosis of MCI as consisting of those from individuals who continue to stay MCI throughout ADNI, i.e., stable MCI (sMCI, = images), or convert to DAT at a future visit, i.e., progressive MCI (pMCI, = images). Finally, we distinguish among those images that have an associated clinical diagnosis of DAT. Those with a previous clinical diagnosis of NC or MCI, i.e., from individuals who joined ADNI as either NC or MCI and converted to DAT during ADNI, are denoted as the early DAT group (eDAT, = images) given their recent conversion. Those from individuals who joined ADNI with a clinical diagnosis of DAT, whose conversion was therefore prior to their ADNI recruitment, and who remained DAT throughout the ADNI window are designated as stable DAT (sDAT, = images). There are 110 individuals with FDG-PET images at both the pMCI and the eDAT stages, i.e., these individuals underwent conversion from MCI to DAT during the ADNI window and this conversion was sampled with neuroimaging.
3.1 Demographic, clinical & biomarker values across groups
The stratified image sets were compared for group-level differences in their associated age, mini mental state exam (MMSE) score and CSF t-tau/Aβ measure (ratio of total tau to beta amyloid ) values. Pairwise significance testing of the group mean value differences was performed between all the groups, using the t-test in the case of normally distributed data and the Wilcoxon rank-sum test for the non-parametric data distribution case. The p-values obtained from each of the pairwise significance tests are reported in Table 2. The statistical significance threshold was set at <. The mean age was observed to be statistically similar across all the groups except for the uNC and pNC groups, which exhibited significantly higher ages. The mean MMSE scores were significantly higher among the sNC, uNC and pNC groups when compared to either the sMCI and pMCI groups or the eDAT and sDAT groups. The DAT- (sNC, uNC, sMCI) groups had significantly lower mean CSF t-tau/Aβ measures when compared to the DAT+ (pNC, pMCI, eDAT, sDAT) groups, apart from the two cases where pNC showed statistically similar CSF t-tau/Aβ measures compared to uNC and sMCI respectively.
Note to Table 2: The t-test or the Wilcoxon rank-sum test was used depending on whether the data followed a normal distribution or not. The cases where the group mean values were significantly (<) different are highlighted in bold, and the cases where the data followed a normal distribution are underlined.
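For the non-normally distributed comparisons, the Wilcoxon rank-sum test can be sketched with the standard library. This version uses the large-sample normal approximation with average ranks for ties and no tie correction; it is an illustrative sketch and not necessarily the variant or software used for Table 2. The function name `ranksum_test` is an assumption.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence, Tuple


def ranksum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive the average of their ranks.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; their average is (i + 1 + j) / 2.
        average_rank = (i + 1 + j) / 2
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    std_dev = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std_dev
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

With very small groups the normal approximation is rough, so an exact test would be preferable in that case.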
3.2 Automatic salient ROI selection for FPDS computation
| ROI name | Frequency, left (%) | Frequency, right (%) | ROI name | Frequency, left (%) | Frequency, right (%) |
|---|---|---|---|---|---|
| isthmuscingulate | 100.00 | 99.65 | fusiform | 24.29 | 0.53 |
| precuneus | 100.00 | 83.88 | medialorbitofrontal | 12.76 | 10.29 |
| inferiortemporal | 99.82 | 83.35 | superiorfrontal | 14.29 | 5.94 |
| posteriorcingulate | 96.12 | 85.06 | superiortemporal | 11.94 | 5.24 |
| middletemporal | 99.35 | 80.71 | lateralorbitofrontal | 12.18 | 2.24 |
| inferiorparietal | 99.18 | 64.94 | superiorparietal | 11.41 | 3.00 |
| supramarginal | 67.41 | 26.06 | parsopercularis | 9.88 | 1.06 |
| entorhinal | 57.94 | 32.53 | temporalpole | 9.35 | 0.18 |
| hippocampus | 47.82 | 32.00 | rostralanteriorcingulate | 5.18 | 0.00 |
| bankssts | 27.76 | 15.82 | frontalpole | 0.82 | 0.82 |
| rostralmiddlefrontal | 24.94 | 17.18 | caudate | 0.71 | 0.00 |
| amygdala | 22.18 | 17.29 | parstriangularis | 0.35 | 0.00 |
| parahippocampal | 28.00 | 10.06 | parsorbitalis | 0.18 | 0.00 |
| caudalmiddlefrontal | 22.76 | 13.18 | | | |
The feature selection phase of the ensemble classification model training identified several ROIs that contained strong discriminatory FDG uptake information useful for separating the DAT- and DAT+ classes. Specifically, each of the individual classifiers in the ensemble model automatically selected a set of most discriminative ROIs from which the multi-scale patch-wise FDG-PET SUVR features were taken and used to compute the FPDS. The selection frequencies of the ROIs chosen by the classifier ensemble are listed in Table 3. The selection frequency of an ROI is defined as the percentage of the classifiers in the ensemble that chose the particular ROI. Interestingly, ROIs from the left hemisphere exhibited much higher selection frequencies compared to the corresponding right hemisphere ROIs. Further, the cortical ROIs had far greater selection frequencies than the subcortical ROIs. In particular, the isthmus and posterior parts of the cingulate gyrus, the precuneus, and the inferior and middle temporal gyri had very high (>90%) total (left and right averaged) selection frequencies.
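The selection frequency defined above can be computed with a simple count over the ensemble. The sketch below is illustrative only; the function name `selection_frequencies` and the input layout (one list of selected ROI names per classifier) are assumptions.

```python
from collections import Counter
from typing import Dict, List


def selection_frequencies(selected_rois: List[List[str]]) -> Dict[str, float]:
    """Percentage of classifiers in the ensemble that selected each ROI."""
    counts: Counter = Counter()
    for rois in selected_rois:
        # Count an ROI once per classifier, even if several of its
        # patches were selected by that classifier.
        counts.update(set(rois))
    n_classifiers = len(selected_rois)
    return {roi: 100.0 * n / n_classifiers for roi, n in counts.items()}
```

The left and right averaged frequency quoted in the text would then be the mean of the two hemisphere values for the same ROI, for example (100.00 + 83.88) / 2 = 91.94 for the precuneus.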