3D CNN-based classification using sMRI and MD-DTI images for Alzheimer disease studies

Computer-aided early diagnosis of Alzheimer's Disease (AD) and its prodromal form, Mild Cognitive Impairment (MCI), has been the subject of extensive research in recent years. Some recent studies have shown promising results in AD and MCI detection using structural and functional Magnetic Resonance Imaging (sMRI, fMRI), Positron Emission Tomography (PET) and Diffusion Tensor Imaging (DTI) modalities. Furthermore, the fusion of imaging modalities in a supervised machine learning framework has proved a promising direction of research. In this paper we first review major trends in automatic classification methods, both feature-extraction-based methods and deep learning approaches, in medical image analysis applied to the field of Alzheimer's Disease diagnostics. Then we propose our own algorithm for Alzheimer's Disease diagnostics based on a convolutional neural network and fusion of the sMRI and DTI modalities on the hippocampal ROI, using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). Comparison with a single-modality approach shows promising results. We also propose our own method of data augmentation for balancing classes of different sizes, and analyze the impact of the ROI size on the classification results.


1 Introduction

Alzheimer’s Disease (AD) is the most common type of dementia. It is characterized by a degeneration of brain cells which results in changes of brain structures that are noticeable in images from different imaging modalities, e.g. sMRI, DTI and PET. With the development of machine learning approaches, research on computer-aided diagnostics (CAD) has intensified [volumetric:VBM; c2.2_3D_CNN_USA; c4_sparse_encoder_Korea; c8_medical_UK; c5_demnet_Philippines].

Images of different modalities such as structural and functional magnetic resonance imaging (sMRI, fMRI), positron emission tomography (PET) and diffusion tensor imaging (DTI) scans can be used for early detection of the disease.

The majority of earlier works focused on volumetric approaches that compare anatomical brain structures assuming a one-to-one correspondence between subjects. The widespread voxel-based morphometry (VBM) [volumetric:VBM] is an automatic volumetric method for studying differences in local concentrations of white and gray matter and for comparing the brain structures of tested subjects with reference normal control (NC) brains. Tensor-based morphometry (TBM) [volumetric:TBM] was proposed to identify local structural changes from the gradients of the deformation fields obtained when matching a tested brain to a reference healthy NC brain. Object-based morphometry (OBM) [volumetric:OBM] was introduced for shape analysis of anatomical structures.

In general, automatic classification of brain images of different modalities can be applied to the whole brain [c2.2_3D_CNN_USA; c4_sparse_encoder_Korea; c8_medical_UK; c5_demnet_Philippines], or performed using domain knowledge on specific regions of interest (ROIs). Structural changes in some structures, e.g. the hippocampal ROI, are strongly correlated with the disease [pierrick]. The changes in such regions are considered as AD biomarkers.

Advances in computer vision and content-based image retrieval research enabled the so-called feature-based methods to penetrate into classification approaches for AD detection [f1_Jenny_MKL; f3_Jenny_pcc; f5_Jenny]. The reason for this is the inter-subject variability, which is difficult to handle in VBM. On the contrary, the quantity of local features which can be extracted from brain scans, together with the captured particularities of the image signal, allows an efficient classification with a lower computational workload [f5_Jenny]. The obtained feature vectors are classified using machine learning algorithms.

Lately, with the development of neural networks, the feature-based approach has become less popular and is gradually being replaced with convolutional neural networks of different architectures.

In the present paper we give a substantial overview of recent trends in the classification of different brain imaging modalities for the computer-aided diagnostics of Alzheimer's disease and its prodromal stage, mild cognitive impairment (MCI), and propose our own algorithm for this purpose. The algorithm follows the recent trend in supervised machine learning, namely Deep Convolutional Neural Networks (CNNs). We propose an adapted CNN architecture for the classification of 3D volumes of hippocampal ROIs and explore the fusion of two modalities, sMRI and DTI, available for the same cohort of patients. In our work we use a subset of the ADNI database (http://adni.loni.usc.edu).

The paper is organized as follows. In Section 2 we overview the recent trends in the classification of brain images for AD detection. The main feature-based approaches are presented in Section 2.1. In Section 2.2 we compare different approaches based on neural networks. Particular attention is paid in each case to the fusion of modalities. All reviewed approaches are compared in Table 1. In Section 3 we present the proposed method of classification with 3D CNNs. In Section 4 the results of the method are presented. Section 5 contains the discussion and conclusion of our work and outlines research perspectives.

2 Review of the existing classification methods in the problem of AD detection

As an alternative to heavy volumetric methods, feature-based approaches were applied to the problem of AD detection using domain knowledge both on the ROI biomarkers and on the nature of the signal in the sMRI and DTI modalities, which is blurry and cannot be sufficiently well described by conventional differential descriptors such as SIFT [SIFT] and SURF [SURF].

2.1 Feature-based classification

Feature-based classification can be performed on images of different modalities. Here we compare and discuss the usage of sMRI, DTI and sMRI fusion with other modalities.

2.1.1 sMRI

In previous joint work [f3_Jenny_pcc], Ahmed et al. computed local features on sMRI scans in the hippocampus and posterior cingulate cortex (PCC) structures of the brain. The originality of the work consisted in the usage of Gauss-Laguerre Circular Harmonic Functions (GL-CHFs) instead of the traditional SIFT [SIFT] and SURF [SURF] descriptors. CHFs perform image decomposition on an orthonormal functional basis, which allows capturing local directions of the image signal and intermediate frequencies. It is similar to a Fourier decomposition, but more appropriate for the smooth contrasts of the MRI modality. For each projection of each ROI a signature vector was calculated using a bag-of-visual-words model (BoVW) with a low-dimensional dictionary of 300 clusters, leading to a total signature length of 1800 per image. Principal component analysis was then applied to reduce the signature length to 278. The signatures were classified using an SVM with RBF kernel and 10-fold cross-validation, reaching accuracies of 0.838, 0.695 and 0.621 for the AD/NC, NC/MCI and AD/MCI binary classification problems respectively on a subset of the ADNI database.
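
To make this pipeline concrete, below is a minimal sketch of the BoVW encoding, PCA reduction and SVM stage described above. The GL-CHF descriptor extraction itself is not reproduced; the `descriptors_per_image` input and all parameter values other than the 300-word dictionary, the 278 retained components and the 10-fold cross-validation are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def bovw_signatures(descriptors_per_image, n_words=300):
    """Encode per-image sets of local descriptors (arrays of shape (n_i, d))
    as normalized visual-word histograms over a 300-word dictionary."""
    codebook = KMeans(n_clusters=n_words, n_init=4, random_state=0)
    codebook.fit(np.vstack(descriptors_per_image))
    signatures = []
    for desc in descriptors_per_image:
        hist, _ = np.histogram(codebook.predict(desc), bins=np.arange(n_words + 1))
        signatures.append(hist / max(hist.sum(), 1))
    return np.array(signatures)

def classify(signatures, labels, n_components=278):
    """PCA reduction to 278 components, then an RBF-SVM with 10-fold CV."""
    reduced = PCA(n_components=n_components).fit_transform(signatures)
    clf = SVC(kernel="rbf", gamma="scale")
    return cross_val_score(clf, reduced, labels, cv=10).mean()
```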

2.1.2 DTI

This modality is probably the most recent to be used for AD classification tasks. Both Mean Diffusivity (MD) and Fractional Anisotropy (FA) maps are being explored for this purpose. In [g1_graph_ensemble_2017] the authors acquired DTI images of 15 AD patients, 15 MCI patients and 15 healthy volunteers (NC). After the preprocessing steps the FA map, which is an indicator of brain connectivity, was calculated. The authors considered 41 Brodmann areas, calculated the connectivity matrices for these areas and generated a connectivity graph with the corresponding 41 nodes. Two nodes corresponding to Brodmann areas are linked by an edge if there is at least one fiber connecting them. The graph is then described with a vector of features calculated for each node and characterizing the connectivity of the node's neighborhood. In total each patient is characterized by 451 features. The vectors were reduced to sizes of 430 and 110 using an ANOVA-based feature selection approach. All vectors were classified with an ensemble of classifiers (logistic regression, random forest, Gaussian naive Bayes, 1-nearest neighbor, SVM) using 5-fold cross-validation. The authors achieved accuracies of 0.8, 0.833 and 0.7 for AD/NC, AD/MCI and MCI/NC respectively on their custom database.

Another methodology is described in [g3_svm_DTI_Korea]. The authors use the fractional anisotropy (FA) and mode of anisotropy (MO) values of DTI scans of 50 patients from the LONI Image Data Archive (https://ida.loni.usc.edu). After non-linear registration to the standard FA map, the authors calculate the skeleton of the mean FA image as well as of MO and perform a second registration step. After that a Relief feature selection algorithm is run on all voxels of the image, and the relevant voxels are used to train an SVM classifier with RBF kernel under 10-fold cross-validation. The declared accuracies are 0.986 and 0.977 for AD/MCI and AD/NC classification respectively.

2.1.3 Data fusion

In [g2_sift_China] the authors use a fusion of sMRI and PET images together with canonical correlation analysis (CCA). After preprocessing and aligning the images of the two modalities, they find projection matrices that maximize the correlation between the projected features. Let $X, Y \in \mathbb{R}^{d \times n}$ be the $d$-dimensional sMRI and PET features of $n$ samples and $\Sigma_{XY}$ their covariance matrix; the projection matrices $W_X$ and $W_Y$ are found as

$$(W_X, W_Y) = \operatorname*{arg\,max}_{W_X,\, W_Y} \operatorname{corr}\bigl(W_X^{\top} X,\; W_Y^{\top} Y\bigr),$$

and $X^{*} = W_X^{\top} X$ and $Y^{*} = W_Y^{\top} Y$ are the resulting projections. The authors construct a united data representation for each patient by combining $X^{*}$ and $Y^{*}$ and calculate SIFT descriptors on it. These descriptors are used to form a BoVW model, and the classification is performed using an SVM. The achieved accuracies are 0.969 and 0.866 for AD/NC and MCI/NC classification respectively on a subset of the ADNI database.

Ahmed et al. in [f5_Jenny] demonstrated the efficiency of using the amount of cerebrospinal fluid (CSF) in the hippocampal area, computed by an adaptive Otsu thresholding method, as an additional feature for AD diagnostics. In [f1_Jenny_MKL] they further improved the result of [f3_Jenny_pcc] by combining visual features derived from sMRI and DTI MD maps in a multiple kernel learning (MKL) scheme. As in [f3_Jenny_pcc], they selected hippocampus ROIs on the axial, sagittal and coronal projections and described them using Gauss-Laguerre Circular Harmonic Functions (GL-CHFs). These features are clustered into 250 and 150 clusters for the sMRI and MD-DTI modalities respectively and encoded using the BoVW model. Thus three sets of features are obtained: a BoVW histogram for sMRI, a BoVW histogram for MD-DTI and the CSF features. The obtained vectors are classified using an SVM-based MKL approach. The achieved accuracies are 0.902, 0.794 and 0.766 for AD/NC, MCI/NC and AD/MCI classification respectively on a subset of the ADNI database.

2.2 Classification with neural networks

Deep neural networks (DNNs) and specifically convolutional neural networks (CNNs) have become popular due to their good generalization capacity and the availability of the GPU hardware needed for parameter optimization. Their main drawbacks for AD classification are the small amount of available training data and the low resolution of the input when ROIs are considered. This problem can be mitigated in several ways: i) by using shallow networks with a relatively small number of neurons, ii) by applying transfer learning from an existing trained network, or iii) by pretraining some of the layers of the network.

Shallow networks abandon the central idea of deep learning, namely recognizing structures at different scales, and reduce the generalization ability of the network, so this methodology has been used less often recently, although it has shown decent results [aderghal2017classification]. In this case the classification performance can be enhanced by selecting several ROIs in each image and applying a voting rule. In particular, in [ex_2] the authors used 7 ROIs in each sMRI image.

One way to enlarge the dataset is to use domain-dependent data augmentation. In the case of medical images this often comes down to mirror flipping, small-magnitude translations and weak Gaussian blurring [aderghal2017classification].

Another way is to use more input data, e.g. to consider several ROIs instead of one. Liu et al. in [ex_6] first identify 50 discriminative anatomical landmarks from MR images in a data-driven manner and then extract multiple image patches around these detected landmarks. After that they use a deep multi-task multi-channel convolutional neural network for disease classification. The authors addressed the problem of classifying patients into NC, stable MCI (sMCI), progressive MCI (pMCI) and AD. Using 1396 MRI images from the ADNI database, they achieved 0.518 accuracy in four-class (NC/sMCI/pMCI/AD) classification.

A simpler idea was proposed by Cheng et al. in [ex_7]: they used a number of 4-layer 3D convolutional neural networks together with late fusion. On a subset of the ADNI database of 428 sMRI images the authors achieved an accuracy of 0.872 for AD/NC classification.

2.2.1 Autoencoders

The idea of pretraining some of the layers of a network is easily implemented with autoencoders (AEs) or, in image processing tasks, more often with convolutional autoencoders (CAEs). An autoencoder consists of an input layer, a hidden layer and an output layer, where the input and output layers have the same number of units (Fig. 1). Given an input vector $x \in \mathbb{R}^{n}$, the autoencoder maps it to a hidden representation $h \in \mathbb{R}^{m}$:

$$h = f(Wx + b),$$

where $W \in \mathbb{R}^{m \times n}$ are the weights, $b \in \mathbb{R}^{m}$ are the biases, $n$ is the number of input units, $m$ is the number of hidden units and $f$ is a non-linear encoder function, e.g. a sigmoid. After that the hidden representation is mapped back to a reconstruction $\hat{x} \in \mathbb{R}^{n}$:

$$\hat{x} = g(W'h + b'),$$

where $W' \in \mathbb{R}^{n \times m}$, $b' \in \mathbb{R}^{n}$ and $g$ is the identity function. The weights and biases are found by gradient methods minimizing the cost function

$$J(W, b) = \frac{1}{2N} \sum_{i=1}^{N} \bigl\lVert \hat{x}^{(i)} - x^{(i)} \bigr\rVert^{2},$$

where $N$ is the number of inputs.

Figure 1: Architecture of an autoencoder.

An overcomplete hidden layer is used to make the autoencoder extract features.
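
As an illustration of the formulas above, here is a minimal dense autoencoder with a sigmoid encoder, an identity decoder and a mean squared reconstruction cost; the layer sizes and training data are placeholders, not the configuration of any of the reviewed papers.

```python
import numpy as np
import tensorflow as tf

n, m = 784, 1024   # input and (overcomplete) hidden units; illustrative sizes

# encoder h = f(Wx + b) with sigmoid f; decoder x_hat = g(W'h + b') with
# identity g; trained by minimizing the mean squared reconstruction cost J
inputs = tf.keras.Input(shape=(n,))
hidden = tf.keras.layers.Dense(m, activation="sigmoid")(inputs)
outputs = tf.keras.layers.Dense(n, activation=None)(hidden)
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="sgd", loss="mse")

x = np.random.rand(256, n).astype("float32")   # stand-in training data
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)
```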

Introducing spatial constraints via convolutions naturally extends the autoencoder model to the convolutional autoencoder (CAE) and the 3D convolutional autoencoder (3D-CAE).

In [c4_sparse_encoder_Korea] the authors added a sparsity constraint to prevent the hidden layers of the autoencoder from learning the identity function. They use 3D convolutions on both the sMRI and PET modalities and train the autoencoder on random image patches. Max-pooling, fully-connected and softmax layers were applied after the autoencoder, and the data of the sMRI and PET modalities are mixed at the FC layer. On a subset of the ADNI database, the use of autoencoders allowed the authors to increase the classification accuracy by 4-6%, up to 0.91 for AD/NC classification.

Nearly the same approach with a sparse 3D autoencoder was used in [c8_medical_UK] to classify sMRI images into 3 categories (AD/MCI/NC). The proposed network architecture is shown in Fig. 2. A larger dataset selected from the ADNI database and a more careful network parameter configuration allowed the authors to reach accuracies of 0.954, 0.868 and 0.921 for AD/NC, AD/MCI and NC/MCI determination respectively.

Figure 2: Typical CNN architecture with CAE pretraining.

The authors of [c2.2_3D_CNN_USA] extended the idea of applying autoencoders. They proposed using three stacked 3D convolutional autoencoders instead of only one. Two fully-connected layers before the softmax were used for a progressive dimension reduction. The usage of stacked 3D CAEs allowed the authors to achieve some of the best accuracy levels on 2265 images from the ADNI database: 0.993, 1 and 0.942 for AD/NC, AD/MCI and MCI/NC classification using sMRI images only.

2.2.2 Transfer Learning

Transfer learning is the transfer of knowledge from one learned task to a new task in machine learning. In the context of neural networks, it means transferring the learned features of a pretrained network to a new problem. Glozman and Liba in [c1_report_Standford] used the widely known AlexNet [alexnet], pretrained on the ImageNet benchmark, and fine-tuned the last 3 fully-connected layers (Fig. 3). The main problem of transfer learning is the necessity to transform the available data so that it corresponds to the network input. In [c1_report_Standford] the authors created several 3-channel 2D images from the 3D input of sMRI and PET images by choosing central and nearby slices from the axial, coronal and sagittal projections. They then interpolated the slices to a size compatible with the AlexNet input. Naturally, one network was used per projection. To augment the source data only mirror flipping was applied. This transfer learning based approach allowed the authors to reach 0.665 and 0.488 accuracy on the 2-way (AD/NC) and 3-way (AD/MCI/NC) classifications respectively on a subset of the ADNI database.

In [ex_3] the authors, apart from using the transfer learning technique, proposed a convolutional neural network involving a Tucker tensor decomposition for the classification of MCI subjects. The achieved accuracy on a subset of the ADNI database containing 629 subjects is 0.906.

Figure 3: AlexNet architecture. Includes 5 convolutional layers and 3 fully-connected layers.

2.2.3 2D convolutional neural networks

In [c7.1_DeepAd_Canada; c7.3_DeeapAd_Canada; c7.2_DeeapAd_Canada] the authors compared the classification of structural and functional MRI images using one of the lightest deep architectures, LeNet-5. They transformed the source 3D and (in the case of fMRI) 4D data into batches of 2D images. LeNet-5 consists of two convolutional and two fully-connected layers. The reached accuracy for 2-class classification (AD/NC) was 0.988 for sMRI and 0.999 for fMRI images.

Billones et al. proposed in [c5_demnet_Philippines] to use a modified 16-layer VGG network [VGG] to classify sMRI images. The key feature of this paper was the use of a 2D convolutional network to classify each slice of the source data separately. The authors selected 20 central slices for each image, and the final score was calculated as the output of the last softmax layer of the network. The per-slice accuracy over all images was also studied: 17 slices were found representative, while 3 slices (the first and the two last slices in the image sequence) demonstrated a lower accuracy. All in all, the authors reached a very good accuracy level: 0.983, 0.939 and 0.917 for AD/NC, AD/MCI and MCI/NC classification using 900 sMRI images from a subset of the ADNI database.

In [Aderghal], Aderghal et al. used the 3 central slices in each projection of a hippocampal ROI. The network architecture comprised three 2D convolutional networks (one per projection) joined at the last fully-connected layer. The accuracies of 0.914, 0.695 and 0.656 for AD/NC, AD/MCI and MCI/NC classification respectively on a subset of the ADNI database were nevertheless obtained not with siamese networks but with a majority voting mechanism.

Ortiz-Suárez et al. in [r_1] explored the brain regions most contributing to Alzheimer’s disease by applying 2D convolutional neural networks to 2D sMRI brain images (coronal, sagittal and axial cuts). Using a dataset of 85 subjects, the authors built a shallow 2D convolutional neural network, created brain models for each filter of the first CNN layer and identified the filters with the greatest discriminating power, thus selecting the most contributing brain regions. The authors demonstrated the largest differentiation between patients in the frontal pole region, which is known to host intellectual deficits related to the disease.

2.2.4 Other networks

A new approach was proposed in [c9_DPN_China]. Shi et al. used a deep polynomial network to analyze sMRI and PET images. It differs from classical CNNs by the non-linearity of its operations: the building block of the architecture, shown in Fig. 4, combines a layer of nodes that calculate a weighted sum of their inputs with nodes that compute products of their inputs. These blocks were combined into a deep network, and the input layers were fed with the average intensities of the 93 ROIs selected on the sMRI and PET brain images.

Figure 4: An example of a DPN module.
| Algorithm | Methodology | Modalities | Content | Data (size) | AD/NC | AD/MCI | MCI/NC |
|---|---|---|---|---|---|---|---|
| Magnin et al. [volumetric:OBM] | Volumetric | sMRI | Full brain | custom (38) | 0.945 | - | - |
| Ahmed et al. [f3_Jenny_pcc] | Feature-based | sMRI | 2 ROIs | ADNI (509) | 0.838 | 0.695 | 0.621 |
| Ebadi et al. [g1_graph_ensemble_2017] | Feature-based | DTI | Full brain | custom (34) | 0.8 | 0.833 | 0.7 |
| Lee et al. [g3_svm_DTI_Korea] | Feature-based | DTI | Full brain | LONI (141) | 0.977 | 0.977 | - |
| Lei et al. [g2_sift_China] | Feature-based | sMRI + PET | Full brain | ADNI (398) | 0.969 | - | 0.866 |
| Ahmed et al. [f5_Jenny] | Feature-based | sMRI + DTI | 1 ROI | ADNI (203) | 0.902 | 0.766 | 0.794 |
| Vu et al. [c4_sparse_encoder_Korea] | NN-based | sMRI + PET | Full brain | ADNI (203) | 0.91 | - | - |
| Payan and Montana [c8_medical_UK] | NN-based | sMRI | Full brain | ADNI (2265) | 0.993 | 1 | 0.942 |
| Glozman and Liba [c1_report_Standford] | NN-based | sMRI + PET | Full brain | ADNI (1370) | 0.665 | - | - |
| Sarraf et al. [c7.1_DeepAd_Canada] | NN-based | sMRI, fMRI | Full brain | ADNI (302) | 0.988, 0.999 | - | - |
| Billones et al. [c5_demnet_Philippines] | NN-based | sMRI | Full brain | ADNI (900) | 0.983 | 0.939 | 0.917 |
| Aderghal et al. [Aderghal] | NN-based | sMRI | 1 ROI | ADNI (815) | 0.914 | 0.695 | 0.656 |
| Shi et al. [c9_DPN_China] | NN-based | sMRI + PET | Full brain | ADNI (202) | 0.971 | - | 0.872 |
| Korolev et al. [c3_Skolkovo] | NN-based | sMRI | Full brain | ADNI (231) | 0.79-0.8 | - | - |
| Suk et al. [c6_deep_ensemble_sparse_regr_Korea] | NN-based | sMRI | 93 ROIs | ADNI (805) | 0.903 | - | 0.742 |
| Luo et al. [ex_2] | NN-based | sMRI | 7 ROIs | ADNI (81) | 0.83 | - | - |
| Wang et al. [ex_3] | NN-based | sMRI | Full brain | ADNI (629) | - | - | 0.906 |
| Li et al. [ex_5] | NN-based | sMRI | 1 ROI | ADNI (1776) | 0.965 | 0.67 | 0.622 |
| Cheng et al. [ex_7] | NN-based | sMRI | 27 ROIs | ADNI (1428) | 0.872 | - | - |
| Li et al. [ex_9] | NN-based | sMRI | Full brain | ADNI (832) | 0.91 | 0.877 | 0.855 |

Table 1: Comparison of different state-of-the-art classification methods. The last three columns give accuracy for the AD/NC, AD/MCI and MCI/NC tasks.

This architecture allowed the authors to reach a very good accuracy level: 0.971 and 0.872 for AD/NC and MCI/NC classification respectively. The algorithm also demonstrated a good accuracy (0.789) for MCI-C/MCI-NC determination, where MCI-C stands for MCI patients who later converted to AD and MCI-NC for MCI patients who did not convert.

In [c3_Skolkovo] the authors compared residual (ResNet) and plain 3D convolutional neural networks for sMRI image classification. They examined binary classification tasks among the four classes AD, LMCI, EMCI and NC, where LMCI and EMCI stand for the late and early MCI stages respectively. Both networks demonstrated nearly the same performance, the best figures being obtained for AD/NC classification with 0.79-0.8 accuracy, using 231 sMRI images from a subset of the ADNI database.

Residual convolutional networks having shown good performance in computer vision tasks, Li et al. in [ex_5] also proposed a deep network with residual blocks to perform ordinal ranking. They compared their model to classical multi-category classification techniques. Data from the single hippocampal ROI of 1776 sMRI images of the ADNI database were used. The final accuracy of the proposed method is 0.965, 0.67 and 0.622 for AD/NC, AD/MCI and MCI/NC classification respectively.

A so-called spectral convolutional neural network was proposed in [ex_9]. It combines classical convolutions with the ability to learn topological brain features. Li et al. represented a subject’s brain as a graph with a set of ROIs as nodes and edges computed from brain grey matter using Pearson correlation. With a subset of the ADNI database containing sMRI images of 832 subjects, the authors achieved classification accuracies of 0.91, 0.877 and 0.855 for AD/NC, AD/MCI and MCI/NC classification.

In [c6_deep_ensemble_sparse_regr_Korea] Suk et al. combine two different methods: sparse regression and convolutional neural networks. The authors obtained different sparse representations of the 93 ROIs of the sMRI data by varying the sparsity control parameter, which produced different sets of selected features. Each representation is a vector, so the result of generating multiple representations can be treated as a matrix. This matrix is then fed to a convolutional neural network with 2 convolutional and 2 fully-connected layers. This approach led to classification accuracies of 0.903 and 0.742 for AD/NC and MCI/NC classification.

3 Proposed method of classification

| | AD | MCI | NC |
|---|---|---|---|
| Subjects | 48 | 108 | 58 |
| Samples for train | 36 | 96 | 46 |
| Samples for test | 12 | 12 | 12 |
| Samples for train after augmentation | 960 | 960 | 960 |
| Samples for test after augmentation | 120 | 120 | 120 |

Table 2: Number of patients and data samples for each class before and after augmentation.

3.1 Data selection

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org. We selected 214 subjects: 48 AD patients, 108 MCI and 58 NC (Table 2). For each patient there is a T1-weighted sMRI image as well as a DTI image. Table 3 presents a summary of the demographic characteristics of the selected subjects, including age, gender and the Mini-Mental State Examination (MMSE) score of cognitive functions. In our case the number of images in the dataset is limited by the availability of DTI data. We focus on the hippocampal ROI and the surrounding region of the brain scans.

A preprocessing procedure is applied to all used DTI brain images. It includes correction of eddy currents and head motion, skull stripping with the Brain Extraction Tool (BET) [BET] and fitting of diffusion tensors to the data with the DTIfit module of the FSL software library [FSL]. The fitting step generates the MD and FA maps. In the current work we focus only on the MD maps of the DTI images. To use a normalized anatomical atlas for ROI selection, the MD images are affinely co-registered to the corresponding sMRI scans. After this co-registration both image modalities are spatially normalized onto the Montreal Neurological Institute (MNI) brain template [MNI]. Thus, after the preprocessing step, for each patient there is a pair of aligned sMRI and MD-DTI images of the same voxel resolution.

| Diagnosis | Subjects | Age | Gender (F/M) | MMSE |
|---|---|---|---|---|
| AD | 48 | [55.72 - 91.53], 75.65 ± 8.63 | 20 / 28 | 23.0 ± 2.42 |
| MCI | 108 | [55.32 - 91.88], 73.46 ± 7.47 | 42 / 66 | 27.39 ± 1.99 |
| NC | 58 | [60.40 - 89.59], 73.41 ± 5.90 | 30 / 28 | 28.88 ± 1.18 |

Table 3: Demographic description of the ADNI group. Values are denoted as intervals and as mean ± std.

For the further analysis, on each image we select two ROIs (the left and right lobes of the hippocampus) as the most discriminative parts of the human brain for Alzheimer's disease analysis [pierrick]. The ROI selection is performed using the AAL atlas [AAL]; the resolution of both base hippocampal ROIs is 28x28x28 voxels. To compare the influence of the amount of image data passed to the network, we also consider extended ROIs of 38x38x38, 42x42x42 and 48x48x48 voxels. Herewith, the centers of the extended ROIs coincide with the centers of the base ROIs, so the extended ROIs contain all voxels of the base ROI as well as some voxels corresponding to gray matter in the surrounding region.
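
A possible sketch of this ROI extraction step is given below. The hippocampus center coordinates and the volume shape are hypothetical placeholders; in practice they come from the AAL atlas after MNI normalization.

```python
import numpy as np

def extract_roi(volume, center, size):
    """Crop a cubic ROI with the given edge length centered at `center`
    from a spatially normalized 3D volume. Extended ROIs share the same
    center, so they contain all voxels of the base ROI."""
    half = size // 2
    slc = tuple(slice(c - half, c - half + size) for c in center)
    return volume[slc]

# hypothetical atlas-derived hippocampus centers in normalized voxel coordinates
LEFT_CENTER, RIGHT_CENTER = (65, 58, 36), (28, 58, 36)
volume = np.zeros((121, 145, 121))                 # placeholder normalized scan
roi_base = extract_roi(volume, LEFT_CENTER, 28)    # base 28^3 ROI
roi_ext = extract_roi(volume, LEFT_CENTER, 48)     # extended 48^3 ROI, same center
```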

3.2 Data augmentation

We divide the image database into train and test sets. For the test set we select 12 patients from each class, leaving 36, 96 and 46 patients in the train set for the AD, MCI and NC classes respectively (Table 2).

A common problem of training a neural network on a limited dataset is overfitting. To enlarge the amount of data and prevent overfitting we perform data augmentation, separately for the train and test sets. As in many other medical problems, the dataset is imbalanced: the number of patients with MCI is almost 3 times larger than the number of patients with AD. To eliminate the effect of different class sizes on the network training process we propose to perform a special balancing procedure during data augmentation. An improvement of classification results when a data balancing procedure is used was demonstrated in [r_1]. The main distinctive feature of the proposed method is that class balancing is performed by means of data augmentation.

The augmentation process with the balancing procedure is controlled by a parameter $k$ that sets the level of augmentation (i.e. the amount of new images generated from the source ones). Let the largest class contain $N$ instances. Then all classes are augmented to $kN$ elements, so for each class containing $n$ elements, $kN - n$ new elements have to be generated. All new images are generated from the source images by a random shift of up to 2 voxels along each of the three dimensions and a random Gaussian blur with a small, bounded $\sigma$.

In this work we have chosen $k = 10$, resulting in 960 images for each class in the training set. The situation with the test set is more complicated. On the one hand, the test set should not be augmented, in order to keep the test data as realistic as possible. On the other hand, the test set is very small, which leads to a strong discretization of the estimation metrics. We therefore consider three different test sets: the original set without augmentation (test 0), a set augmented with random shifts only (test 1) and a set augmented with both random shifts and blur (test 2). The test 0, test 1 and test 2 sets contain 12, 120 and 120 samples per class respectively. Thus we can compare these test sets and draw some conclusions about the impact of test set augmentation on the classification results.
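
A sketch of the described balanced augmentation is given below, using SciPy for the shifts and blurring. The upper bound on the blur $\sigma$ is an assumption, as the exact value is not restated here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def augment_to(volumes, target_count, max_shift=2, max_sigma=1.0, seed=0):
    """Grow a class of source 3D volumes to `target_count` samples with
    random shifts of up to 2 voxels per axis and random Gaussian blur.
    max_sigma is an assumed bound on the blur strength."""
    rng = np.random.default_rng(seed)
    out = list(volumes)
    while len(out) < target_count:
        src = volumes[int(rng.integers(len(volumes)))]
        offsets = rng.integers(-max_shift, max_shift + 1, size=3)
        aug = shift(src, offsets, order=0, mode="nearest")
        aug = gaussian_filter(aug, sigma=float(rng.uniform(0.0, max_sigma)))
        out.append(aug)
    return out[:target_count]

# balancing: every class is augmented to k * N samples, N being the size
# of the largest class; here k = 10 and N = 96 for the training set, e.g.
# balanced = {name: augment_to(vols, 10 * 96) for name, vols in classes.items()}
```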

3.3 Network architecture

| Configuration name | Number of conv. layers | Convolution kernel size per layer | Number of conv. filters per layer | Number of FC layers | Units per FC layer |
|---|---|---|---|---|---|
| C1 | 4 | (5, 4, 3, 3) | (16, 32, 64, 128) | 2 | (16, 8) |
| C2 | 5 | (5, 4, 3, 3, 3) | (16, 32, 64, 128, 128) | 2 | (16, 8) |
| C3 | 5 | (7, 6, 5, 4, 3) | (16, 32, 64, 128, 256) | 2 | (32, 8) |
| C4 | 6 | (7, 6, 5, 4, 3, 3) | (16, 32, 64, 128, 256, 256) | 1 | (16) |

Table 4: Compared architectures of the used neural networks.

In this work we use a number of 3D convolutional neural networks with slightly different configurations and compare them. The base building block of the used networks consists of 4 consecutive operations: 3D convolution, batch normalization [batch_norm_2015], a rectified linear unit and 3D pooling, as illustrated in Figure 5.

Figure 5: Main convolutional block of the proposed network architecture.
Figure 6: Proposed convolutional neural network architecture.

For each ROI in the image and for each modality we use a separate pipeline of the described blocks. Each pipeline ends with a flatten operation, after which the outputs of the pipelines are concatenated and passed to a fully-connected layer. This fully-connected layer is followed by a dropout layer and a softmax layer, which produces the network output (Figure 6). The described network is thus a siamese network that performs late fusion of the data from the input ROIs.
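
The sketch below illustrates a C1-like variant of this architecture for four 28^3 inputs (sMRI left/right and MD-DTI left/right) in Keras. Whether the convolutional weights are shared across pipelines is not restated here, so the sketch uses independent pipelines; the dropout rate is an assumed value.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel):
    # base building block: 3D convolution -> batch normalization -> ReLU -> 3D pooling
    x = layers.Conv3D(filters, kernel, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.MaxPooling3D(pool_size=2)(x)

def pipeline(inp, kernels=(5, 4, 3, 3), filters=(16, 32, 64, 128)):
    # one per-ROI, per-modality pipeline (C1-like kernels and filter counts)
    x = inp
    for k, f in zip(kernels, filters):
        x = conv_block(x, f, k)
    return layers.Flatten()(x)

# four 28^3 inputs: sMRI left/right and MD-DTI left/right hippocampal ROIs
inputs = [tf.keras.Input(shape=(28, 28, 28, 1)) for _ in range(4)]
merged = layers.Concatenate()([pipeline(i) for i in inputs])
x = layers.Dense(16, activation="relu")(merged)     # FC layers of configuration C1
x = layers.Dense(8, activation="relu")(x)
x = layers.Dropout(0.5)(x)                          # dropout rate is an assumed value
outputs = layers.Dense(2, activation="softmax")(x)  # binary task, e.g. AD vs NC
model = tf.keras.Model(inputs, outputs)
```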

The usage of batch normalization allows us to speed up the network training process and, according to [batch_norm_2015], eliminates the necessity of pretraining techniques (e.g. autoencoders). Batch normalization partially plays the role of regularization and also allows each layer of the network to learn somewhat more independently of the other layers.

The network training is performed by minimizing the Euclidean loss function $L$ with Nesterov momentum optimization:

$$v_{t+1} = \mu v_t - \eta_t \nabla L(w_t + \mu v_t), \qquad w_{t+1} = w_t + v_{t+1},$$

where $v_t$ is the velocity and $w_t$ the optimized vector of network weights at iteration $t$, $\mu$ is the momentum value and $\eta_t$ is the learning rate. The learning rate is updated exponentially:

$$\eta_t = \eta_0 \, r^{\lfloor t / s \rfloor},$$

where $r$ is the decay rate and $s$ is the decay step; as the division in the exponent is an integer division, the learning rate updates once every $s$ iterations. The momentum value, the decay rate, the decay step and the initial learning rate $\eta_0$ were chosen empirically, in accordance with the preliminary experiments that yielded the best accuracies.
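
In TensorFlow terms, the described optimizer and schedule can be sketched as follows; the concrete values of the initial learning rate, decay rate, decay step and momentum are placeholders, since the tuned values are not restated here.

```python
import tensorflow as tf

# staircase exponential schedule: eta_t = eta_0 * r ** (t // s)
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # eta_0: placeholder value
    decay_steps=100,              # s: placeholder value
    decay_rate=0.9,               # r: placeholder value
    staircase=True)
optimizer = tf.keras.optimizers.SGD(
    learning_rate=schedule, momentum=0.9, nesterov=True)  # mu = 0.9 assumed
```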

In this work we compare several convolutional neural networks differing in parameters such as the number of convolutional layers and the number of convolutional filters (Table 4). We intentionally consider architectures of different depths, as we use ROIs of different sizes. Specifically, we use the shallower C1 and C2 architectures for ROIs of size 28 and 38, and the deeper C3 and C4 architectures for ROIs of size 42 and 48.

To prevent overfitting we use a method similar to 10-fold cross-validation. During training we randomly select 90% of the training data to learn the network weights, while the remaining 10% is used for validation. After a fixed number of iterations this train-validation separation is repeated. This approach leads to effective usage of the available training data.

For a more accurate comparison of the networks we use the same number of batches passed to each of the inputs. If the whole batch does not fit into memory, we separate it into several mini-groups and correct the weights of the network only after accumulating the gradients of all mini-groups. Each network is trained for 1000 iterations.
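
A sketch of one such accumulated weight update is shown below, assuming a Keras model and a loss function; the mini-group splitting itself is left to the caller.

```python
import tensorflow as tf

def accumulated_step(model, loss_fn, optimizer, mini_groups):
    """One weight update from a batch that does not fit into memory:
    the batch is split into mini-groups, gradients are accumulated
    over all of them, and the weights are corrected once."""
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for x, y in mini_groups:                  # mini_groups: list of (inputs, labels)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True)) / len(mini_groups)
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + (g if g is not None else 0.0)
                 for a, g in zip(accum, grads)]
    optimizer.apply_gradients(zip(accum, model.trainable_variables))
```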

4 Results

| Used data | ROI size | Config. | ACC, test 0 / 1 / 2 | SEN, test 0 / 1 / 2 | SPC, test 0 / 1 / 2 |
|---|---|---|---|---|---|
| sMRI_L + sMRI_R | 28 | C1 | 0.854±0.141 / 0.827±0.048 / 0.839±0.047 | 0.883±0.123 / 0.880±0.041 / 0.883±0.041 | 0.900±0.110 / 0.885±0.040 / 0.901±0.038 |
| sMRI_L + sMRI_R | 28 | C2 | 0.825±0.152 / 0.813±0.049 / 0.813±0.049 | 0.867±0.135 / 0.848±0.046 / 0.850±0.045 | 0.908±0.104 / 0.879±0.041 / 0.869±0.043 |
| sMRI_L + sMRI_R | 38 | C1 | 0.808±0.158 / 0.790±0.052 / 0.801±0.051 | 0.775±0.167 / 0.799±0.051 / 0.786±0.052 | 0.883±0.123 / 0.831±0.047 / 0.863±0.044 |
| sMRI_L + sMRI_R | 38 | C2 | 0.767±0.169 / 0.769±0.053 / 0.776±0.053 | 0.825±0.152 / 0.809±0.050 / 0.804±0.050 | 0.817±0.155 / 0.816±0.049 / 0.828±0.048 |
| sMRI_L + sMRI_R | 42 | C3 | 0.800±0.160 / 0.777±0.053 / 0.790±0.052 | 0.792±0.163 / 0.813±0.049 / 0.790±0.052 | 0.883±0.123 / 0.894±0.039 / 0.882±0.041 |
| sMRI_L + sMRI_R | 42 | C4 | 0.771±0.168 / 0.725±0.057 / 0.757±0.054 | 0.725±0.179 / 0.711±0.057 / 0.724±0.057 | 0.892±0.116 / 0.821±0.049 / 0.827±0.048 |
| sMRI_L + sMRI_R | 48 | C3 | 0.788±0.164 / 0.779±0.053 / 0.779±0.053 | 0.767±0.169 / 0.742±0.055 / 0.738±0.056 | 0.883±0.123 / 0.864±0.043 / 0.854±0.045 |
| sMRI_L + sMRI_R | 48 | C4 | 0.842±0.146 / 0.829±0.048 / 0.836±0.047 | 0.925±0.090 / 0.888±0.040 / 0.898±0.038 | 0.858±0.140 / 0.852±0.045 / 0.838±0.047 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C1 | 0.875±0.129 / 0.874±0.042 / 0.854±0.045 | 0.892±0.116 / 0.901±0.038 / 0.874±0.042 | 0.892±0.116 / 0.901±0.038 / 0.887±0.040 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C2 | 0.892±0.116 / 0.904±0.037 / 0.885±0.040 | 0.933±0.083 / 0.940±0.030 / 0.904±0.037 | 0.917±0.097 / 0.914±0.035 / 0.912±0.036 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C1 | 0.883±0.123 / 0.904±0.037 / 0.891±0.040 | 0.925±0.090 / 0.923±0.034 / 0.918±0.035 | 0.883±0.123 / 0.910±0.036 / 0.910±0.036 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C2 | 0.842±0.146 / 0.854±0.045 / 0.842±0.046 | 0.925±0.090 / 0.873±0.042 / 0.892±0.039 | 0.875±0.129 / 0.883±0.041 / 0.872±0.042 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C3 | 0.900±0.110 / 0.908±0.037 / 0.895±0.039 | 0.933±0.083 / 0.907±0.037 / 0.910±0.036 | 0.917±0.097 / 0.924±0.034 / 0.913±0.036 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C4 | 0.908±0.104 / 0.871±0.042 / 0.866±0.043 | 0.850±0.143 / 0.823±0.048 / 0.806±0.050 | 0.992±0.022 / 0.973±0.021 / 0.956±0.026 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C3 | 0.904±0.107 / 0.903±0.038 / 0.895±0.039 | 0.867±0.135 / 0.883±0.041 / 0.871±0.042 | 0.942±0.076 / 0.923±0.034 / 0.924±0.034 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C4 | 0.967±0.055 / 0.938±0.030 / 0.947±0.030 | 0.958±0.060 / 0.943±0.030 / 0.936±0.035 | 0.975±0.060 / 0.939±0.030 / 0.964±0.035 |

Table 5: AD-NC classification results. Each cell gives the top-mean value of the metric with its 95% confidence interval (value ± CI) on the test 0, test 1 and test 2 sets.
| Used data | ROI size | Config. | ACC, test 0 / 1 / 2 | SEN, test 0 / 1 / 2 | SPC, test 0 / 1 / 2 |
|---|---|---|---|---|---|
| sMRI_L + sMRI_R | 28 | C1 | 0.754±0.172 / 0.760±0.054 / 0.759±0.054 | 0.617±0.195 / 0.638±0.061 / 0.639±0.061 | 0.950±0.069 / 0.962±0.024 / 0.953±0.027 |
| sMRI_L + sMRI_R | 28 | C2 | 0.746±0.174 / 0.736±0.056 / 0.754±0.054 | 0.600±0.196 / 0.633±0.061 / 0.609±0.062 | 0.967±0.053 / 0.959±0.025 / 0.956±0.026 |
| sMRI_L + sMRI_R | 38 | C1 | 0.642±0.192 / 0.677±0.059 / 0.663±0.060 | 0.525±0.200 / 0.565±0.063 / 0.526±0.063 | 1.000±0.000 / 0.982±0.017 / 0.958±0.025 |
| sMRI_L + sMRI_R | 38 | C2 | 0.679±0.187 / 0.731±0.056 / 0.690±0.059 | 0.658±0.190 / 0.708±0.058 / 0.711±0.057 | 0.992±0.022 / 0.999±0.002 / 0.991±0.011 |
| sMRI_L + sMRI_R | 42 | C3 | 0.688±0.185 / 0.651±0.060 / 0.678±0.059 | 0.592±0.197 / 0.553±0.063 / 0.610±0.062 | 1.000±0.000 / 0.983±0.017 / 0.978±0.019 |
| sMRI_L + sMRI_R | 42 | C4 | 0.683±0.186 / 0.708±0.058 / 0.717±0.057 | 0.433±0.198 / 0.478±0.063 / 0.476±0.063 | 0.983±0.034 / 0.983±0.016 / 0.978±0.018 |
| sMRI_L + sMRI_R | 48 | C3 | 0.717±0.180 / 0.694±0.058 / 0.693±0.058 | 0.692±0.185 / 0.597±0.062 / 0.644±0.061 | 0.992±0.022 / 0.992±0.010 / 0.973±0.020 |
| sMRI_L + sMRI_R | 48 | C4 | 0.717±0.180 / 0.709±0.058 / 0.704±0.058 | 0.575±0.198 / 0.530±0.063 / 0.520±0.063 | 1.000±0.000 / 0.997±0.005 / 0.996±0.006 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C1 | 0.725±0.179 / 0.742±0.055 / 0.736±0.056 | 0.533±0.200 / 0.597±0.062 / 0.569±0.063 | 0.958±0.061 / 0.954±0.027 / 0.951±0.027 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C2 | 0.783±0.165 / 0.735±0.056 / 0.788±0.052 | 0.867±0.135 / 0.832±0.047 / 0.865±0.043 | 0.917±0.097 / 0.863±0.044 / 0.886±0.040 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C1 | 0.779±0.166 / 0.767±0.054 / 0.777±0.053 | 0.858±0.140 / 0.863±0.044 / 0.813±0.049 | 0.908±0.104 / 0.913±0.036 / 0.909±0.036 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C2 | 0.800±0.213 / 0.751±0.077 / 0.791±0.073 | 0.933±0.104 / 0.927±0.047 / 0.902±0.053 | 0.883±0.149 / 0.893±0.055 / 0.875±0.059 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C3 | 0.738±0.176 / 0.734±0.056 / 0.763±0.054 | 0.692±0.185 / 0.708±0.058 / 0.723±0.057 | 0.892±0.116 / 0.876±0.042 / 0.903±0.037 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C4 | 0.721±0.180 / 0.743±0.055 / 0.744±0.055 | 0.650±0.191 / 0.699±0.058 / 0.674±0.059 | 0.900±0.110 / 0.909±0.036 / 0.914±0.035 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C3 | 0.788±0.164 / 0.787±0.052 / 0.787±0.052 | 0.792±0.163 / 0.771±0.053 / 0.778±0.053 | 0.992±0.022 / 0.985±0.015 / 0.986±0.015 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C4 | 0.738±0.176 / 0.734±0.056 / 0.739±0.056 | 0.625±0.194 / 0.619±0.061 / 0.623±0.061 | 0.983±0.034 / 0.978±0.019 / 0.977±0.019 |

Table 6: AD-MCI classification results. Each cell gives the top-mean value of the metric with its 95% confidence interval (value ± CI) on the test 0, test 1 and test 2 sets.
| Used data | ROI size | Config. | ACC, test 0 / 1 / 2 | SEN, test 0 / 1 / 2 | SPC, test 0 / 1 / 2 |
|---|---|---|---|---|---|
| sMRI_L + sMRI_R | 28 | C1 | 0.588±0.197 / 0.558±0.063 / 0.542±0.063 | 0.733±0.177 / 0.710±0.057 / 0.699±0.058 | 0.750±0.173 / 0.725±0.057 / 0.718±0.057 |
| sMRI_L + sMRI_R | 28 | C2 | 0.571±0.198 / 0.564±0.063 / 0.578±0.063 | 0.875±0.129 / 0.832±0.047 / 0.810±0.050 | 0.558±0.199 / 0.528±0.063 / 0.605±0.062 |
| sMRI_L + sMRI_R | 38 | C1 | 0.608±0.195 / 0.630±0.061 / 0.603±0.062 | 0.667±0.189 / 0.643±0.061 / 0.667±0.060 | 0.775±0.167 / 0.759±0.054 / 0.797±0.051 |
| sMRI_L + sMRI_R | 38 | C2 | 0.571±0.198 / 0.593±0.062 / 0.593±0.062 | 0.667±0.189 / 0.674±0.059 / 0.638±0.061 | 0.575±0.198 / 0.583±0.062 / 0.600±0.062 |
| sMRI_L + sMRI_R | 42 | C3 | 0.642±0.271 / 0.614±0.087 / 0.602±0.088 | 0.592±0.278 / 0.608±0.087 / 0.550±0.089 | 0.733±0.250 / 0.673±0.084 / 0.735±0.079 |
| sMRI_L + sMRI_R | 42 | C4 | 0.625±0.194 / 0.631±0.061 / 0.618±0.062 | 0.783±0.165 / 0.769±0.053 / 0.748±0.055 | 0.558±0.199 / 0.525±0.063 / 0.553±0.063 |
| sMRI_L + sMRI_R | 48 | C3 | 0.658±0.268 / 0.657±0.085 / 0.631±0.086 | 0.800±0.213 / 0.778±0.074 / 0.713±0.081 | 0.817±0.201 / 0.788±0.073 / 0.788±0.073 |
| sMRI_L + sMRI_R | 48 | C4 | 0.542±0.199 / 0.502±0.063 / 0.518±0.063 | 0.683±0.186 / 0.696±0.058 / 0.671±0.060 | 0.600±0.196 / 0.560±0.063 / 0.618±0.062 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C1 | 0.538±0.200 / 0.532±0.063 / 0.517±0.063 | 0.767±0.169 / 0.788±0.052 / 0.693±0.058 | 0.500±0.200 / 0.518±0.063 / 0.542±0.063 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 28 | C2 | 0.538±0.200 / 0.563±0.063 / 0.548±0.063 | 0.800±0.160 / 0.777±0.053 / 0.713±0.057 | 0.500±0.200 / 0.472±0.063 / 0.504±0.063 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C1 | 0.542±0.199 / 0.547±0.063 / 0.556±0.063 | 0.758±0.171 / 0.738±0.056 / 0.706±0.058 | 0.417±0.197 / 0.403±0.062 / 0.501±0.063 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 38 | C2 | 0.625±0.194 / 0.613±0.062 / 0.612±0.062 | 0.858±0.140 / 0.834±0.047 / 0.792±0.051 | 0.458±0.199 / 0.441±0.063 / 0.520±0.063 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C3 | 0.621±0.194 / 0.628±0.061 / 0.608±0.062 | 0.708±0.182 / 0.759±0.054 / 0.690±0.059 | 0.600±0.196 / 0.570±0.063 / 0.624±0.061 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 42 | C4 | 0.513±0.200 / 0.537±0.063 / 0.531±0.063 | 0.692±0.185 / 0.728±0.056 / 0.673±0.060 | 0.408±0.197 / 0.424±0.063 / 0.453±0.063 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C3 | 0.588±0.197 / 0.585±0.062 / 0.573±0.063 | 0.708±0.182 / 0.697±0.058 / 0.638±0.061 | 0.567±0.198 / 0.520±0.063 / 0.604±0.062 |
| sMRI_L + sMRI_R + DTI_L + DTI_R | 48 | C4 | 0.554±0.199 / 0.543±0.063 / 0.552±0.063 | 0.725±0.179 / 0.719±0.057 / 0.707±0.058 | 0.467±0.200 / 0.450±0.063 / 0.471±0.063 |

Table 7: MCI-NC classification results. Each cell gives the top-mean value of the metric with its 95% confidence interval (value ± CI) on the test 0, test 1 and test 2 sets.

In this work we analyze how the used data and the network configuration affect the efficiency of Alzheimer's disease detection. We run a number of experiments with different configurations, varying the used image modalities (sMRI, DTI), the ROI sizes, the number of convolutional layers and the number of convolutions in each layer, and compare them. As in previous joint works [Aderghal], we train and evaluate 3 binary classifiers: AD-NC, AD-MCI and MCI-NC. The obtained results are shown in Tables 5, 6 and 7.

Figure 7: An example of an accuracy plot over 1000 training iterations for the train, validation, test 0, test 1 and test 2 sets.

To evaluate and score each experiment we use accuracy as the reference metric. As the database used in this paper is not large, the typical accuracy curve on the test set over the iterations is not smooth (Fig. 7). To eliminate this problem and characterize each experiment concisely, we propose to calculate the mean accuracy over every interval of $w$ sequential iterations and take the maximum; we call this value the top-mean accuracy. The interval length $w$ is fixed across all experiments. For a more precise description of the accuracy curve we also calculate the accuracy variance on the interval corresponding to the top-mean value.
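
A direct NumPy implementation of the top-mean statistic might look as follows; the window length of 50 iterations is an assumed value for illustration.

```python
import numpy as np

def top_mean(curve, window=50):
    """Top-mean of a per-iteration metric curve: the maximum over all
    windows of `window` sequential iterations of the windowed mean,
    together with the variance on the same window."""
    curve = np.asarray(curve, dtype=float)
    means = np.convolve(curve, np.ones(window) / window, mode="valid")
    best = int(np.argmax(means))
    return means[best], curve[best:best + window].var()
```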

Along with accuracy (ACC) we also report the values of sensitivity (SEN) and specificity (SPC). All described metrics are calculated using the top-mean approach. Note the absence of the commonly used balanced accuracy (BAC) metric: we use an already balanced test set, as the number of patients used for testing in each class is the same (Table 2), so all accuracy values reported in this work are equal to the balanced accuracy values.

To perform an interval estimation of the classification metrics we report 95% confidence intervals using the Wilson score interval [wilson_1; wilson_2]:

$$\mathrm{CI} = \frac{1}{1 + \frac{z^2}{n}} \left( p + \frac{z^2}{2n} \pm z \sqrt{\frac{p\,(1 - p)}{n} + \frac{z^2}{4 n^2}} \right),$$

where $z$ is a constant corresponding to the confidence range ($z = 1.96$ for the 95% range), $n$ is the number of samples in the set and $p$ is the value of the metric for which the confidence interval is calculated. Note that the width of the confidence interval depends on the number of samples: the more samples in the set, the shorter the confidence interval. Thereby in our case we get shorter confidence intervals for the test 1 and test 2 sets than for the test 0 set because of augmentation.
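
The interval can be computed as in the following sketch; with a metric value of 0.85 it reproduces the order of magnitude of the interval widths in the tables (about ±0.14 for the 24-sample test 0 set and about ±0.045 for the 240-sample augmented sets).

```python
import math

def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a metric value p measured on n samples;
    z = 1.96 corresponds to the 95% confidence range."""
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return center - half, center + half

print(wilson_interval(0.85, 24))   # test 0: 12 subjects per class -> 24 samples
print(wilson_interval(0.85, 240))  # test 1/2: 120 samples per class -> 240 samples
```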

As discussed earlier, for each ROI and each modality we run a base pipeline of convolutions and then perform late fusion. To compare the sMRI and DTI modalities and analyze their applicability to Alzheimer's disease detection, we consider neural networks with the following pipeline inputs (the corresponding abbreviations used below are given at the beginning of each line):

  1. DTI_L+DTI_R: left hippocampus on MD-DTI and right hippocampus on MD-DTI images

  2. sMRI_L+sMRI_R: left hippocampus on sMRI and right hippocampus on sMRI images

  3. sMRI_L+sMRI_R+DTI_L+DTI_R: left hippocampus on sMRI, right hippocampus on sMRI, left hippocampus on MD-DTI and right hippocampus on MD-DTI images

  4. sMRI_LR+DTI_LR: left-right hippocampus on sMRI and left-right hippocampus on MD-DTI images

Here the left-right hippocampus corresponds to the dataset obtained by uniting the left hippocampus ROI and the mirror-flipped right hippocampus ROI. This is justified from a medical point of view, as the left and right lobes of the hippocampus are symmetrical structures. An example of the used network is shown in Figure 6.
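
A sketch of the construction of such a left-right dataset is given below; axis 0 is assumed to be the left-right (sagittal) axis of the ROI volume.

```python
import numpy as np

def left_right_pair(left_roi, right_roi):
    """Unite the left hippocampus ROI with the mirror-flipped right one,
    exploiting the approximate left/right symmetry of the hippocampus.
    Axis 0 is assumed to be the left-right (sagittal) axis."""
    return np.stack([left_roi, np.flip(right_roi, axis=0)])
```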

During the experiments it was found that the AD-NC and AD-MCI binary classifiers achieve the best classification scores with the third type of input (left and right hippocampus on sMRI plus left and right hippocampus on MD-DTI). Slightly inferior results are obtained with data fusion in the most difficult MCI-NC case. This can be explained by the fact that adding the less informative DTI modality under conditions of a small amount of data makes the weights of the fully-connected layers of the network noisier and thereby worsens the final result.

The results obtained with the first type of input, using the single DTI modality (left and right hippocampus on MD-DTI), were the worst for all three classifiers, so we do not include them in Tables 5, 6 and 7. The results obtained with the fourth type of input (left-right hippocampus on sMRI and left-right hippocampus on MD-DTI) were in almost all cases better than those using sMRI data only (second type of input), but worse than fusing the data for each ROI separately (third type of input). Presumably this happens because, although the left and right hippocampal structures are symmetrical, they are not identical, so treating them separately gives a greater effect than merging them. For this reason we do not include this type of input in the result tables either.

The results also demonstrate that the size of the ROI matters. Using bigger ROIs (42 and 48 voxels per side) in combination with a deeper network architecture leads to better classification results. For example, we achieved a classification accuracy of 0.967 using the 48-voxel ROI and the 6-layer C4 network with data fusion. The combination of the 42-voxel ROI with the 5-layer C3 architecture also demonstrated a good level of performance in all three classification cases.

We should also discuss the impact of test set augmentation on the classification results. As can be seen from Fig. 7 and Tables 5, 6 and 7, using a more augmented test set leads to a smoother accuracy curve, but also slightly worsens the classification results in all cases.

All in all, we achieved classification accuracies of 0.967, 0.8 and 0.658 for the AD-NC, AD-MCI and MCI-NC classification problems respectively.

The proposed method was implemented using the TensorFlow framework [tensorflow]. The experiments were performed on two configurations: an Intel Core i7-6700HQ CPU with an Nvidia GeForce GTX 960M GPU, and an Intel Core i7-7700HQ CPU with an Nvidia GeForce GTX 1070 GPU.

5 Discussion and Conclusion

As can be seen from Table 1, the relatively new feature-based and neural network-based methods demonstrate a very good level of performance compared to the classical volumetric methods performed manually by medical experts.

It should be mentioned that a direct comparison of the reviewed algorithms for Alzheimer’s disease diagnostics is impossible. The reported results were obtained using images from several databases and in different quantities (see Table 1). Moreover, different classification problems were addressed: although most papers focus on binary classification among the AD/MCI/NC classes, some of them consider only 2-class AD/NC classification [c4_sparse_encoder_Korea; c7.1_DeepAd_Canada; c7.2_DeeapAd_Canada; c7.3_DeeapAd_Canada] or even 4-class AD/eMCI/lMCI/NC classification [c3_Skolkovo]. Also [c5_demnet_Philippines; c6_deep_ensemble_sparse_regr_Korea] deserve special attention, as the authors address the in-demand problem of predicting Alzheimer's converters. Nevertheless, we can state that our results confirm the general trend: with deep neural networks on 3D volumes and fusion of different modalities, we achieve accuracy scores higher than 0.9. This makes us think that CNN-based classification can indeed be used in real-world CAD systems for large cohort screening. In this paper we focused on only one biomarker ROI, the hippocampal ROI; nevertheless, according to previous research [f3_Jenny_pcc], it would be interesting to add other ROIs known to deteriorate due to AD.

6 Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research and Development, LLC.; Johnson and Johnson Pharmaceutical Research and Development LLC.; Lumosity; Lundbeck; Merck and Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuroimaging at the University of Southern California.

This research was supported by the Ostrogradsky scholarship grant 2017 established by the French Embassy in Russia and by the TOUBKAL French-Morocco research grant Alclass. We thank Dr. Pierrick Coupé from LaBRI UMR 5800, University of Bordeaux/CNRS/Bordeaux INP, who provided insight and expertise that greatly assisted the research.

References