Tensorizing GAN with High-Order Pooling for Alzheimer's Disease Assessment

by Wen Yu, et al.

It is of great significance to apply deep learning for the early diagnosis of Alzheimer's Disease (AD). In this work, a novel tensorizing GAN with high-order pooling is proposed to assess Mild Cognitive Impairment (MCI) and AD. By tensorizing a three-player cooperative game based framework, the proposed model can benefit from the structural information of the brain. By incorporating the high-order pooling scheme into the classifier, the proposed model can make full use of the second-order statistics of holistic Magnetic Resonance Imaging (MRI) images. To the best of our knowledge, the proposed Tensor-train, High-pooling and Semi-supervised learning based GAN (THS-GAN) is the first work to deal with classification on MRI images for AD diagnosis. Extensive experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that the proposed THS-GAN achieves superior performance compared with existing methods, and show that both tensor-train and high-order pooling can enhance classification performance. The visualization of generated samples also shows that the proposed model can generate plausible samples for semi-supervised learning.






I Introduction

Alzheimer’s Disease (AD) is an irreversible and chronic neurodegenerative disease with progressive impairment of memory and other mental functions. It is estimated to be the third leading cause of death, after heart disease and cancer [1]. According to the World Alzheimer Report [2], the total estimated prevalence of AD was around 50 million worldwide in 2018, and the number will increase to 152 million by 2050. AD is caused by abnormal deposits of protein in the brain that destroy cells in the regions that control memory and mental functions. To date, AD is incurable but preventable. Early diagnosis of AD is crucial for timely therapy to slow the progression of the disease. Currently, the clinical diagnosis of AD heavily depends on clinical history [3]. The diagnosis procedure is time-consuming and requires extensive clinical training and experience from neurologists. Therefore, accurate AD assessment at its earliest stage by utilizing deep learning is highly desirable.

The T1-MRI image is an important biomarker for AD diagnosis in routine clinical practice. Early work on AD diagnosis using MRI images primarily focused on traditional machine learning techniques, which heavily relied on specific assumptions about brain structural abnormalities, such as regional cortical thickness, hippocampal volume, and gray matter volume. The performance of these manual feature extraction methods is limited since they require advanced clinical domain knowledge and complicated preprocessing steps. Therefore, they tend to be time-consuming and subjective. Besides, the brain is a huge network with complicated connections. The disease-related structural changes are subtle and scattered throughout the entire brain in different tissues. These kinds of patterns are difficult to learn since not all morphological abnormalities related to AD can be captured accurately, and the extracted Regions Of Interest (ROI) or voxel features are processed independently. Hence these features are unable to express the internal brain connections sufficiently.

Recent advances in deep learning [6] [7] have gained explosive popularity in computer vision and various medical applications. Instead of manually extracting features according to domain-specific knowledge, deep learning can discover the discriminant representations of images by incorporating the feature extraction into the task learning process. However, most existing methods can only utilize the labeled data in a supervised manner. Annotation of MRI images is laborious and costly, since it requires clinical confirmation with great effort by experts. As a result, only small amounts of labeled MRI images are available for AD assessment, and the unlabeled MRI images cannot be used directly.

The Generative Adversarial Network (GAN) has attracted much attention as it is capable of generating data without explicitly modeling the probability density function. The discriminator can incorporate unlabeled data into the training process by utilizing the adversarial loss. Furthermore, GAN has been proven to be feasible in data augmentation, image-to-image translation, and Semi-Supervised Learning (SSL). To make full use of both labeled and unlabeled MRI images, the Semi-Supervised GAN (SS-GAN) [9, 10, 11, 12] can be adopted. In this paper, our primary goal is to leverage GAN to characterize the high-order distribution of MRI images for semi-supervised classification. In particular, we found that the recently introduced triple-GAN could alleviate the instability and incompatibility problems of the SS-GAN [12]. Triple-GAN designs a three-player cooperative game instead of the conventional two-player competitive game by introducing an auxiliary classifier network alongside the generator and discriminator. Inspired by this, our model exploits the three-player cooperative game for modeling MRI images to assess MCI and AD.

Based on these observations, in this paper, we propose a novel Tensorizing GAN with High-order pooling to assess MCI and AD. More specifically, in order to stabilize the training of GAN and speed up the convergence, the proposed model utilizes the compatible learning objectives of the three-player cooperative game. Our proposed model is called THS-GAN, i.e., Tensor-train decomposition, Higher-order pooling, and Semi-supervised learning are employed in the proposed GAN model. Instead of vectorizing each layer as in a conventional GAN, the tensor-train decomposition is applied to all layers in the classifier and discriminator, including fully-connected layers and convolutional layers. Thus the number of parameters can be reduced significantly. Besides, in such a tensor-train format, our model can benefit from the structural information of the brain. Moreover, compared with first-order pooling, the high-order pooling module can extract more significant features by making full use of the second-order statistics of the holistic MRI image. Thus our model also exploits the Global Second-order Pooling (GSP) block as a high-order pooling module in the classifier. In particular, the GSP block can capture the long-range dependencies of features at distant positions by computing all pairwise channel correlations of the 4D feature-maps extracted by 3D-DenseNet. Thus both GSP and 3D-DenseNet are integrated into the classifier to enhance salient feature channels and suppress less useful feature channels. As a result, useful features related to anatomical abnormalities are extracted in a self-attention manner to improve classification performance. The contributions of this paper are summarized as follows:

  1. By tensorizing the three-player cooperative game based framework, the proposed model can benefit from the structural information of the brain.

  2. The proposed THS-GAN leverages the high-order pooling to make full use of the second-order statistics of the holistic MRI images. The long-range dependencies between slices of different directions can be captured effectively. Thus more significant features can be extracted automatically in a self-attention manner to boost the predictive performance.

  3. The THS-GAN model is designed to assess MCI and AD in a semi-supervised manner to take advantage of both labeled and unlabeled MRI images.

The rest of this paper is organized as follows. We review the related work in Section II. In Section III, we present the proposed THS-GAN in detail. In Section IV, THS-GAN is tested with various configurations and experimental results are presented to demonstrate its advantage. Finally, concluding remarks and future work are discussed in Section V.

Fig. 1: An illustration of THS-GAN (best viewed in color). Real and Fake are the adversarial losses. Cross-entropy losses are used for supervised learning on real data and on generated data. Unbiased regularizations ensure the consistency between $p_g$, $p_c$, and $p$, which are the distributions defined by the generator, classifier, and true data respectively.

II Related Work

Current AD diagnosis models can be categorized into two types: traditional machine learning-based approaches and deep learning-based approaches.

The traditional machine learning techniques can be divided further into three categories: the voxel-based approach, the ROI-based approach, and the patch-based approach. Although the voxel-based approach [13] is intuitive and straightforward in terms of interpretation, the process of classification is computationally expensive since the voxel-wise features are of extremely high dimensionality, and the classification performance deteriorates due to the “curse of dimensionality” [14]. For the ROI-based approach [4], the ROIs are segmented by prior hypothesis, but the abnormal regions related to AD may not fit the predefined ROIs ideally in practice, and the features extracted from ROIs are very coarse in the sense that they cannot sufficiently represent all subtle changes involved in brain diseases. As a result, the representation power of ROI features is limited. The patch-based approach dissects brain areas into small 3D patches, extracts features from each selected patch individually, and then combines the features hierarchically at the classifier level [15]. However, the features extracted by these methods neglect the correlated variations of the whole brain structure affected by AD in other regions. Besides, the extraction of these handcrafted features heavily depends on how well the images are registered and segmented, which often requires expert domain knowledge.

In the application domain of AD diagnosis, previous deep learning studies focused on two directions: (1) CNNs are utilized for supervised classification [16], primarily by using large-scale annotated data sets. (2) Unsupervised GANs are exploited for data synthesis or image-to-image translation [17] [18]. In the first approach, Islam et al. [19] presented a method based on 2D-DenseNet. The MRI images are sliced in three directions (axial, coronal, and sagittal). Then three parallel 2D-DenseNets are evaluated on the MRI slices separately. Finally, the results are fused for AD diagnosis. However, converting a 3D image into a series of 2D slices causes CNNs to disregard the spatial information of 3D space, and different slicing methods lead to loss of features. Thus many studies focus on 3D-CNNs instead of 2D ones to alleviate this issue. For instance, Payan et al. [16] utilized 3D convolutions [20] combined with a sparse auto-encoder, which yielded better performance than 2D convolutions on slices. In the second approach, Pan et al. [17] imputed the missing PET images by learning bi-directional mappings between MRI and PET via 3D-cGAN. Then, based on the complete MRI and PET images (after imputation), they developed a landmark-based multi-modal multi-instance learning method (LM3IL) for AD diagnosis. Karim et al. [18] proposed the Cycle-MedGAN framework, based on the traditional Cycle-GAN with new non-adversarial losses, for PET to CT translation. Wang et al. [7] proposed a 3D auto-context-based locality adaptive multi-modality generative adversarial network model (LA-GANs) to synthesize a high-quality FDG PET image from a low-dose one together with MRI images that provide anatomical information.

In this paper, our approach differs from previous GAN applications on AD diagnosis, which focus on image synthesis and image-to-image translation. Our aim is to enhance GAN for AD classification in a semi-supervised manner with fewer annotated T1-MRI images. We remark that research on adapting GAN to T1-MRI images is still under development.

III The Proposed THS-GAN Method

III-A Overview

Fig. 1 summarizes the architecture of the proposed THS-GAN. After data preprocessing (see Section IV-A), the normalized T1-MRI images are fed into THS-GAN. Since the input T1-MRI images are high-order with complicated brain structure, we modify the triple-GAN with the following four significant improvements. (1) Instead of 2D transposed convolution, 3D transposed convolution is utilized in the generator to generate T1-MRI images. (2) 3D-DenseNet [21][22] is adopted in both the classifier and discriminator to extract subtle features related to AD within the limited receptive field at a local level. (3) All layers in classifier and discriminator are compressed by Tensor-Train decomposition. (4) The high-order pooling module GSP block is incorporated into the classifier to make full use of the correlation within feature-maps along the channel axis to capture more discriminative features at the global level to represent the holistic brain. The details of the proposed method will be presented in Section III-B.

III-B The Architecture

The proposed THS-GAN is designed for semi-supervised classification. The input data $x$ is partially labeled and $y$ represents the corresponding label. $p(x)$ denotes the empirical distribution of the input data and $p(y)$ is assumed to be the distribution of labels on the partially annotated data. The goal is to predict the label $y$ for both labeled and unlabeled data as well as to generate new samples conditioned on $y$. As the label is incomplete, our density model should characterize the uncertainty of both $x$ and $y$, thus the joint distribution $p(x,y)$ of image-label pairs can be calculated in two ways: $p(x,y) = p(x)\,p(y|x)$ and $p(x,y) = p(y)\,p(x|y)$. The conditional distributions $p(x|y)$ and $p(y|x)$ are learnt by the class-conditional generator and the auxiliary classifier respectively. Thus the proposed THS-GAN consists of three networks: (1) a class-conditional generator that approximately characterizes the conditional distribution $p_g(x|y) \approx p(x|y)$; (2) a classifier that approximately characterizes the conditional distribution in the opposite direction, $p_c(y|x) \approx p(y|x)$; and (3) a discriminator that distinguishes whether the image-label pair $(x,y)$ comes from the real data distribution $p(x,y)$.

More specifically, in the three-player game illustrated in Fig. 1, a sample $x$ is drawn from $p(x)$, and the classifier $C$ predicts the label $y$ given $x$ following the conditional distribution $p_c(y|x)$. Hence, the pseudo image-label pair is drawn from the joint distribution $p_c(x,y) = p(x)\,p_c(y|x)$. Similarly, a pseudo image-label pair is produced by the generator $G$ given $y \sim p(y)$ by utilizing $p_g(x|y)$, hence forming the joint distribution $p_g(x,y) = p(y)\,p_g(x|y)$. In the generator, $x$ is produced from the label $y$ and the latent variables $z$: $x = G(y,z)$, $z \sim p_z(z)$, where $p_z(z)$ is a simple distribution (e.g., uniform or standard normal). Then the pseudo image-label pairs from $p_c(x,y)$ and $p_g(x,y)$ are fed into the discriminator $D$ for identification. The discriminator identifies image-label pairs from the real data distribution as positive samples, and it is trained to maximize the probability of assigning the correct label to both real samples and fake samples from the generator $G$ and the classifier $C$. To reach an equilibrium in which the joint distributions defined by the classifier and generator both converge to the real data distribution, a compatible adversarial objective is defined as below:
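Equation (1) can be sketched as follows. This reconstruction assumes that the objective takes the standard Triple-GAN form [12]; the symbols $U$, $p_c$, $p_g$, and $\alpha$ are notational assumptions chosen to match the description of the three expectation terms given in the text.

```latex
\min_{C,G}\,\max_{D}\; U(C,G,D)
  = \mathbb{E}_{(x,y)\sim p(x,y)}\big[\log D(x,y)\big]
  + \alpha\,\mathbb{E}_{(x,y)\sim p_c(x,y)}\big[\log\big(1-D(x,y)\big)\big]
  + (1-\alpha)\,\mathbb{E}_{(x,y)\sim p_g(x,y)}\big[\log\big(1-D(G(y,z),y)\big)\big]
  \tag{1}
```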


where $C$, $G$, and $D$ are individual networks. $C$ and $D$ are represented by TT-layers (Tensor-Train layers). $\mathbb{E}_{(x,y)\sim p(x,y)}$ denotes the expectation over the real labeled data. $\mathbb{E}_{(x,y)\sim p_c(x,y)}$ is the expectation over the real unlabeled data labeled by the classifier, and $\mathbb{E}_{(x,y)\sim p_g(x,y)}$ is the expectation over the fake data produced by the generator. $D(x,y)$ represents the probability that the image-label pair came from the real labeled data. Meanwhile, $1-D(x,y)$ and $1-D(G(y,z),y)$ represent the probability that the image-label pair came from fake data produced by the classifier and generator respectively. $\alpha$ is a constant that controls the relative importance of generation and classification, and we use the fixed value of 0.5. The game defined in Equation (1) achieves its equilibrium if and only if $p(x,y) = (1-\alpha)\,p_g(x,y) + \alpha\,p_c(x,y)$. The equilibrium indicates that if one of the classifier and generator tends to the real data distribution, the other will also move towards the data distribution, which addresses the competing problem of the conventional semi-supervised GAN. Note that the conventional semi-supervised GAN only contains two players: generator and discriminator. The discriminator takes on the incompatible roles of identifying fake samples and predicting real labels simultaneously, and the generator estimates the data without considering the labels. By utilizing the three-player cooperative game, both the classifier and generator will converge to the real data distribution if the model has been trained to the optimum. In this manner, the class-conditional generator can disentangle different modalities and generate T1-MRI images to cover all classes (AD, MCI, and NC). On the other hand, the discriminator is trained with dissimilar samples from various classes (AD, MCI, and NC) to provide gradients for the generator. Hence, the mode collapse problem is alleviated.

As mentioned above, layers are tensorized as TT-layers, and we treat the elements of the TT-cores as the parameters of each layer. The TT-layers of the classifier and discriminator are thus each parameterized by the elements of their own TT-cores. The classifier is updated by descending along the stochastic gradient of its loss with respect to all the elements of its TT-cores. The classifier loss function is composed of two parts: the supervised loss and the unsupervised loss.


The supervised loss function is defined by the cross-entropy loss of real image-label samples and generated image-label samples in a supervised learning setting.

The cross-entropy loss of the real labeled data distribution for the classifier is defined as $R_L = \mathbb{E}_{(x,y)\sim p(x,y)}[-\log p_c(y|x)]$, which is equivalent to modeling the KL-divergence between $p_c(x,y)$ and $p(x,y)$. As the generated data can also be used to boost classification performance, the cross-entropy loss of synthesized data is defined as $R_P = \mathbb{E}_{(x,y)\sim p_g(x,y)}[-\log p_c(y|x)]$, which optimizes the classifier on the samples produced by the generator in a supervised manner. Minimizing $R_P$ with respect to the classifier is equivalent to minimizing $D_{KL}(p_g(x,y)\,\|\,p_c(x,y))$. Note that directly minimizing this divergence is infeasible since the unknown likelihood ratio $p_g(x,y)/p_c(x,y)$ cannot be computed directly. The weight hyperparameter in this loss is fixed as 0.05.

The unsupervised loss is, in fact, the adversarial loss of the standard GAN minimax game.


In other words, the unsupervised loss is computed to distinguish real and fake image-label samples. The supervised loss computes the cross-entropy for real classes. In this work, these classes are AD, MCI, and NC.
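A minimal NumPy sketch of how the two parts of the classifier loss could be combined is given below. It assumes a Triple-GAN style formulation: the supervised part is the cross-entropy on real labeled data plus a weighted cross-entropy on generated data, and the unsupervised part is the adversarial term on image-label pairs labeled by the classifier. The function names, the argument layout, and the exact way the weights `alpha` and `alpha_p` enter the sum are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

EPS = 1e-12  # numerical guard for log(0)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under predicted class probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + EPS)))

def classifier_loss(p_real, y_real, p_gen, y_gen, d_on_classifier_pairs,
                    alpha=0.5, alpha_p=0.05):
    """Supervised + unsupervised classifier loss (assumed Triple-GAN style weighting).

    p_real, p_gen: classifier probabilities, shape (batch, n_classes)
    y_real, y_gen: integer labels (true labels / conditioning labels of generated images)
    d_on_classifier_pairs: discriminator outputs D(x, C(x)) in (0, 1) for unlabeled x
    """
    r_l = cross_entropy(p_real, y_real)   # cross-entropy on real labeled data
    r_p = cross_entropy(p_gen, y_gen)     # cross-entropy on generated data
    supervised = r_l + alpha_p * r_p
    # Adversarial term: the classifier wants D to accept its image-label pairs.
    unsupervised = alpha * float(np.mean(np.log(1.0 - d_on_classifier_pairs + EPS)))
    return supervised + unsupervised
```

In this sketch the three classes AD, MCI, and NC would correspond to the integer labels 0, 1, and 2 in some fixed order.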

The generator loss extends the generator's adversarial term with a reconstruction loss, defined as the L1 distance between generated images and real images. The associated weight is fixed as 0.01. The discriminator is updated by descending along the stochastic gradient of its loss with respect to all the elements of its TT-cores.


Intuitively, a sound generator can produce meaningful labeled data beyond training set as auxiliary information for the classifier, which will improve the predictive performance, and vice versa, a sound classifier will boost the performance of the generator. As a result, both the classifier and generator can improve mutually. Moreover, the discriminator can utilize the label information of the unlabeled data through the classifier and then assist the generator to generate correct image-label pairs. Therefore, THS-GAN is more likely to reach Nash equilibrium.

Two components of triple-GAN (classifier and discriminator) are converted to the Tensor-Train format (TT-format) [23, 24, 25]. We refer to 1-D data as a vector, denoted as $\mathbf{v}$. A 2-D array is a matrix, denoted as $\mathbf{W}$, and a higher dimensional array is a tensor, denoted as $\mathcal{A}$. To refer to one specific element of a tensor, we use $\mathcal{A}(\mathbf{j}) = \mathcal{A}(j_1, \dots, j_d)$, where $d$ is the dimensionality of the tensor and $\mathbf{j} = (j_1, \dots, j_d)$ is the index vector. Our proposed THS-GAN ingests each T1-MRI image as a 3D tensor, where the dimensions correspond to height, width, and slice respectively. A $d$-dimensional tensor $\mathcal{A}$ can be represented in the TT-format [26] [25] as:
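The TT-format referred to as Equation (9) is commonly written as the product below. The symbols $\mathcal{A}$, $G_k$, $j_k$, and $r_k$ are the usual ones from the tensor-train literature and are assumed here, since the original notation is not shown.

```latex
\mathcal{A}(j_1, j_2, \dots, j_d) = G_1[j_1]\, G_2[j_2] \cdots G_d[j_d],
\qquad G_k[j_k] \in \mathbb{R}^{r_{k-1} \times r_k}, \quad r_0 = r_d = 1
\tag{9}
```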


where $G_k[j_k]$ is an $r_{k-1} \times r_k$ matrix, which is one slice of the 3-dimensional array $G_k$. The elements of the collection $\{r_k\}_{k=0}^{d}$ are called TT-ranks. $r_0 = r_d = 1$ is the boundary condition that keeps the matrix product (9) of size $1 \times 1$.

The collections of matrices $\{G_k\}_{k=1}^{d}$ are called TT-cores [24]. The TT-format requires $\sum_{k=1}^{d} n_k r_{k-1} r_k$ parameters to represent a tensor which has $\prod_{k=1}^{d} n_k$ elements, where $n_k$ is the size of the $k$-th dimension. The TT-ranks control the trade-off between the number of parameters and the accuracy of the representation. The smaller the TT-ranks, the more memory-efficient the TT-format is. But if the TT-ranks are set too small, the accuracy might deteriorate due to information loss caused by over-compression. Such a representation is memory-efficient for storing high-order data. Meanwhile, the significant structural information of the data can be preserved. These properties make it suitable for representing T1-MRI images. In the following, we introduce tensor-train decomposition for fully-connected layers and convolutional layers respectively.
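The trade-off between TT-ranks and parameter count can be illustrated with a small NumPy sketch. It uses the generic TT-SVD procedure (sequential truncated SVDs), which is a standard way to obtain TT-cores and is an assumption here, not necessarily the procedure used to train THS-GAN. The helper names and the toy volume are hypothetical.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Split a d-way array into TT-cores of shape (r_{k-1}, n_k, r_k) by sequential truncated SVDs."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))  # truncate to the requested TT-rank
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        # Carry the remaining factor forward and expose the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_element(cores, index):
    """Evaluate one tensor entry as the matrix product G_1[j_1] ... G_d[j_d]."""
    out = np.ones((1, 1))
    for core, j in zip(cores, index):
        out = out @ core[:, j, :]
    return out[0, 0]

def tt_num_params(cores):
    """Total number of stored values, i.e. the sum of n_k * r_{k-1} * r_k."""
    return sum(core.size for core in cores)

rng = np.random.default_rng(0)
# A rank-1 toy "volume" (outer product of three vectors), so TT-ranks of 1 are exact.
volume = np.einsum("i,j,k->ijk", rng.normal(size=8), rng.normal(size=9), rng.normal(size=10))
cores = tt_svd(volume, max_rank=1)
print(tt_num_params(cores), "TT parameters vs", volume.size, "dense entries")
```

For real MRI volumes the ranks needed for an accurate representation would be larger than 1, and lowering `max_rank` below them trades accuracy for memory, as described above.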

Fig. 2: The classifier framework is composed of 3D-DenseNet and GSP block. Note that only one GSP block is inserted at one of three optional positions in red. There is no GSP block in discriminator.

III-B1 Fully-Connected Layers Tensor-train Decomposition

The fully-connected layer is applied to an input $N$-dimensional vector $\mathbf{x}$:

$$\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b} \qquad (10)$$

where the weight matrix $\mathbf{W} \in \mathbb{R}^{M \times N}$ and the bias vector $\mathbf{b} \in \mathbb{R}^{M}$ define the linear transformation. A TT-fully-connected-layer transforms a $d$-dimensional tensor $\mathcal{X}$ (which is constructed from the corresponding vector $\mathbf{x}$) to the $d$-dimensional tensor $\mathcal{Y}$ (which corresponds to the output vector $\mathbf{y}$) by factorizing the weight matrix $\mathbf{W}$ into the TT-format with the TT-cores $\{G_k\}_{k=1}^{d}$. Thus the linear transformation (Equation (10)) of a fully-connected layer can be represented in the TT-layer:


where each factor is a slice of a TT-core, as illustrated in the red part of Fig. 1. Since the fully-connected layer is a special case of a convolutional layer with kernel size $1 \times 1$, such a TT-format can also be applied to convolutional layers in a similar manner.
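The TT-layer computation can be sketched in NumPy as follows. The sketch assumes the usual TT-matrix convention, in which the $k$-th core has shape $(r_{k-1}, m_k, n_k, r_k)$ with $M = \prod_k m_k$ and $N = \prod_k n_k$, and each output entry is $\mathcal{Y}(i_1,\dots,i_d) = \sum_{j_1,\dots,j_d} G_1[i_1,j_1]\cdots G_d[i_d,j_d]\,\mathcal{X}(j_1,\dots,j_d) + \mathcal{B}(i_1,\dots,i_d)$. The function names and core shapes are illustrative assumptions.

```python
import numpy as np

def tt_linear(cores, x, bias):
    """Apply a TT-format weight matrix to x without ever forming the full matrix."""
    in_modes = [core.shape[2] for core in cores]
    state = x.reshape([1] + in_modes)  # (r_0, n_1, ..., n_d)
    for core in cores:
        # Contract the shared rank index and the current input mode:
        # core (r_{k-1}, m_k, n_k, r_k) with state (r_{k-1}, n_k, rest...) -> (m_k, r_k, rest...)
        state = np.tensordot(core, state, axes=([0, 2], [0, 1]))
        # Move the new output mode to the end so the next rank and input mode lead.
        state = np.moveaxis(state, 0, -1)
    return state.reshape(-1) + bias    # state is (1, m_1, ..., m_d)

def tt_to_matrix(cores):
    """Rebuild the dense M x N weight matrix (only for checking small examples)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full[0, ..., 0]             # (m_1, n_1, ..., m_d, n_d)
    d = len(cores)
    full = full.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    m = int(np.prod([core.shape[1] for core in cores]))
    n = int(np.prod([core.shape[2] for core in cores]))
    return full.reshape(m, n)

rng = np.random.default_rng(0)
# Hypothetical cores for a 12 x 24 weight matrix with TT-ranks (1, 2, 2, 1).
fc_cores = [rng.normal(size=s) for s in [(1, 2, 3, 2), (2, 3, 4, 2), (2, 2, 2, 1)]]
x = rng.normal(size=24)
b = rng.normal(size=12)
y = tt_linear(fc_cores, x, b)
print(sum(c.size for c in fc_cores), "TT parameters vs", 12 * 24, "dense weights")
```

Contracting one core at a time keeps the intermediate tensors small, which is why the layer can be trained with ordinary back-propagation on the core elements.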

III-B2 Convolutional Layers Tensor-train Decomposition

3D convolution is an extension of 2D convolution with one more spatial dimension, namely the slice dimension of the T1-MRI volume. The traditional 3D convolutional layer transforms the 4-dimensional input tensor into the output tensor by convolving it with the kernel. When the stride is 1 and there is no zero padding, each spatial dimension of the output equals the corresponding input dimension minus the kernel size plus one. The Tensor-Train decomposition is then applied to the convolutional kernel.


The red part of Fig. 1 also illustrates the decomposition in Equation (13), through which the 3D convolutional layer is converted to a TT-layer. By replacing the 4D convolutional kernel with approximations using lower rank matrices, redundancy in convolutional layers can be removed implicitly. It is worth noting that although applying tensor-train decomposition to neural networks can achieve a large compression factor, finding optimal TT-ranks remains difficult [23] [27]. The TT-layer is compatible with the existing training algorithms for neural networks because all the derivatives required by the back-propagation algorithm can be computed using the properties of the TT-format.

The network utilized in both the classifier and discriminator is DenseNet [21]. We expand it to 3D-DenseNet by adding a spatial dimension to all convolutional and pooling layers in DenseNet for the 3D T1-MRI volume. Feature-maps learned by all preceding layers are concatenated along the last dimension for the subsequent layers. Through such dense connectivity, feature-maps are reused and the vanishing-gradient problem is alleviated. Meanwhile, 3D-DenseNet can efficiently extract the local morphological features related to AD lesions from the whole volumes. The details of 3D-DenseNet can be found in [21][22]. In this paper, the depth is set as 30, the growth rate is set as 12, the number of Dense-BC blocks is set as 3, and the reduction is set as 0.5.

Furthermore, the high-order pooling module GSP block can make full use of the second-order statistics of the holistic MRI images. The long-range dependencies between slices of different directions can be effectively captured for extracting more significant features in a self-attention manner. Thus the GSP block is added after a Dense-BC block in the classifier, as illustrated in Fig. 2, aiming to learn more discriminative representations by re-calibrating the 4D channel-wise feature-maps. Only one GSP block (in red) is used, and it can be positioned at: (1) GSP block 1, (2) GSP block 2, or (3) GSP block 3.

Fig. 3: High-order pooling module GSP block. Given an input 4D feature map, convolution is performed to reduce dimension. Then the covariance matrix is computed followed by convolution and non-linear activation, finally weight vector is produced to recalibrate the feature map along the channel dimension. The high-order pooling can capture the dependency of features at distant positions by computing all pairwise channel correlations. As a result, significant features will be enhanced. As each channel corresponds to a particular feature, each feature map of all channels is considered as a feature set that can map back to individual voxels of input MRI image. The discriminative features related to AD are shown in the benchmark[28].

Inspired by [29], the GSP block is extended to a 4D tensor, as illustrated in Fig. 3. Given a 4D feature map outputted by a previous Dense-BC block, we first perform GSP to model pairwise channel correlations of the holistic feature map. Then the resulting covariance matrix is processed by convolutions and non-linear activations, which is finally used for scaling the 4D feature map along the channel dimension.

More specifically, the GSP block consists of two modules: a squeeze module and an excitation module. The squeeze module aims to model the second-order statistics along the channel dimension of the input feature map for capturing channel dependency. Consider a 4D feature map of size $h \times w \times l \times c$ as an input, where $h$ is the spatial height of the feature-map, $w$ is the width, $l$ is the depth, and $c$ is the number of channels. It can be seen as $c$ cubes where each cube is of size $h \times w \times l$. First, convolution is utilized to reduce the number of channels from $c$ to $c'$ ($c' < c$) to decrease the computational cost of the following operations. For the tensor of reduced dimensionality, the pairwise channel correlations are computed to form one covariance matrix. The resulting covariance matrix has a clear physical meaning: its $i$-th row indicates the statistical dependency of channel $i$ with all channels. As the quadratic operations involved change the order of the data, row-wise normalization is performed on the covariance matrix with respect to the structural information of the brain. To simplify the block design and to find an appropriate trade-off between computational complexity and classification accuracy, the size of the covariance matrix is determined in a self-adaptive manner.

The excitation module aims to scale the channels for feature re-calibration. In the excitation module, before channel scaling, we perform two consecutive operations of convolution and non-linear activation on the covariance matrix. To maintain the structural information, the covariance matrix is processed with row-wise convolution, which is followed by a Leaky Rectified Linear Unit (LReLU). Then we perform the second convolution and apply the sigmoid function as a non-linear activation to compute the weight vector $[w_1, w_2, \dots, w_c]$. The final output of the GSP block is obtained by multiplying the respective channels [Channel 1, Channel 2, …, Channel $c$] by the weights $[w_1, w_2, \dots, w_c]$. Individual channels are thus emphasized or suppressed in a soft manner according to their weights. Thus the discriminative features related to AD lesions are enhanced, and redundant features are suppressed. As shown in Fig. 3, the feature map output by the GSP block is close to the benchmark with fewer redundant features, and all significant features are discovered. On the other hand, the feature map without high-order pooling includes more redundant features compared with the benchmark.
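The squeeze and excitation steps can be sketched in NumPy as below. This is a simplified illustration under several assumptions: the channel-reducing convolution is modeled as a per-voxel linear map, the row-wise convolution as one learned filter per covariance row, and the second convolution as a dense map back to the original channel count. The row-wise normalization step is omitted, and all weights are random placeholders with hypothetical names.

```python
import numpy as np

def leaky_relu(v, slope=0.2):
    return np.where(v > 0, v, slope * v)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gsp_block(feat, w_reduce, w_row, w_expand):
    """Global second-order pooling sketch for a feature map of shape (h, w, l, c)."""
    c = feat.shape[-1]
    flat = feat.reshape(-1, c)                       # one row per voxel
    reduced = flat @ w_reduce                        # channel reduction c -> c'
    centered = reduced - reduced.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / reduced.shape[0]   # (c', c') channel covariance
    # Squeeze: row i of cov summarizes how channel i co-varies with all channels.
    squeezed = leaky_relu(np.sum(cov * w_row, axis=1))   # one filter per row -> (c',)
    # Excitation: map back to c weights in (0, 1) and rescale every channel.
    weights = sigmoid(squeezed @ w_expand)               # (c,)
    return feat * weights, weights

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 5, 6, 8))          # toy 4D feature map with c = 8
out, channel_weights = gsp_block(
    feature_map,
    w_reduce=rng.normal(size=(8, 4)),                # c = 8 reduced to c' = 4
    w_row=rng.normal(size=(4, 4)),
    w_expand=rng.normal(size=(4, 8)),
)
```

Because the weights depend on the covariance of the whole volume, every voxel contributes to the scaling of every channel, which is the sense in which the block captures long-range dependencies.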

Furthermore, the network structure of each component in THS-GAN is further optimized from the following perspectives. (1) For the generator, the condition variable can either be concatenated with the random noise in the first layer or be added in the subsequent layers as additional channels. In our study, we adopt the latter. (2) As suggested by Radford et al. [30], we also add Batch Normalization (BN) to both the discriminator and the generator in the THS-GAN model to prevent the generator from collapsing all the samples to a single point. However, adding BN to all layers causes model instabilities. Hence we also avoid using BN in the generator output layer and the discriminator input layer, as they suggest. The tanh function is used in the generator output layer. (3) The first-order pooling (average pooling) is still utilized since the GSP block cannot reduce the dimensions of the feature-map, which would result in a large number of parameters. Thus the first-order pooling is combined with the GSP block to abstract the discriminative representations, so that the proposed THS-GAN model can take advantage of both first-order and second-order statistics for AD diagnosis.

IV Experiments and Results

IV-A Dataset and Preprocessing

A total of 833 T1-weighted MRI images are downloaded from the ADNI database (http://adni.loni.usc.edu/) in the Neuroimaging Informatics Technology Initiative (NIfTI) format. These images have already been corrected for spatial distortion caused by gradient nonlinearity and B1 field inhomogeneity. The standard image pre-processing procedure is performed on the selected T1-MRI images for each subject, including grad-warping, skull-stripping, cerebellum removal, and intensity correction. We perform skull stripping using the Brain Extraction Tool (FSL-BET), followed by a manual correction to ensure that both skull and dura have been removed completely. Then, we remove the cerebellum by warping a labeled template to each skull-stripped image. Finally, all brain images are aligned to the standardized MNI152 template using FSL FLIRT [31]. Each aligned image comprises 109 2D slices.

Among 833 T1-MRI images, there are 221 AD subjects, 297 MCI subjects, and 315 Normal Controls (NC) subjects respectively. To evaluate the effectiveness of our model, we set up three groups of experiments: (1) AD vs. NC, (2) MCI vs. NC, and (3) AD vs. MCI classification. It is worth noting that the second classification is significant to distinguish MCI from NC for early diagnosis so that timely therapeutic interventions can be carried out to slow down the progression of MCI to AD.

Each T1-MRI image is normalized into the range [-1,1], and the whole volume of voxels is fed into the proposed THS-GAN model directly as a tensor, without compression or downsizing, to avoid information loss. No data augmentation is used. For evaluation, 10% of the total data is selected as a validation dataset and another 10% as a test dataset. The remaining 80% is used as the training dataset for our THS-GAN model. The validation dataset is used to tune hyperparameters and to select the best model among several trained models.
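The preprocessing described above can be sketched as follows. The text does not state how the normalization into [-1,1] is computed, so min-max scaling is an assumption here, as are the helper names `normalize_to_unit_range` and `split_indices`, the random seed, and the use of rounding to size the subsets.

```python
import random


def normalize_to_unit_range(voxels):
    """Min-max scale a flat list of intensities into [-1, 1] (assumed scheme)."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        return [0.0 for _ in voxels]  # constant image: map everything to 0
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in voxels]


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


scaled = normalize_to_unit_range([0.0, 50.0, 100.0])

# AD vs. NC uses 221 + 315 = 536 subjects according to the counts above.
train, val, test = split_indices(221 + 315)
print(len(train), len(val), len(test))
```

Scaling into [-1,1] matches the range of the tanh output layer of the generator, so real and generated volumes share the same intensity range.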

IV-B Experimental Setup

The proposed THS-GAN model is trained on the ADNI dataset from scratch in an end-to-end manner. We implement our method in TensorFlow. Training takes around 10 hours on an NVIDIA GeForce GTX 1080 GPU, including validation on the validation set at each epoch. The initial learning rate is 0.01 and will decrease to

at 75 epochs and at 110 epochs. A weight decay of

is applied for all the weights, and we use stochastic gradient descent with Nesterov momentum

[32] of coefficient 0.9. The validation accuracy is evaluated once per training epoch. We set the batch size of both labeled and unlabeled data to 7, and the number of epochs to 150. The loss is not applied until the number of epochs reaches a threshold at which the generator can generate meaningful data. We search the threshold in {60,120} based on the validation performance, and is fixed as 0.05.
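As a reminder of what stochastic gradient descent with Nesterov momentum does, here is a minimal sketch of the update rule in the formulation of [32], applied to a toy one-dimensional objective. The initial learning rate 0.01 and the momentum coefficient 0.9 are taken from the text; the objective, the step count, and the function name `nesterov_sgd` are assumptions, and weight decay and the learning-rate schedule are omitted.

```python
def nesterov_sgd(grad_fn, theta, lr=0.01, momentum=0.9, steps=500):
    """Minimize a scalar function given its gradient, using Nesterov momentum."""
    velocity = 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at theta itself.
        lookahead = theta + momentum * velocity
        velocity = momentum * velocity - lr * grad_fn(lookahead)
        theta = theta + velocity
    return theta


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_final = nesterov_sgd(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(theta_final)  # close to the minimizer 3.0
```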

IV-C Evaluation Metrics

Five metrics are used for quantitative evaluation and comparison: accuracy, precision, recall, f1-score, and AUC. The Area Under the ROC Curve (AUC) is a single value frequently used to measure classifier performance. In other words, AUC is an indicator of the probability that a classifier will correctly classify instances. Note that an AUC value of 0.5 indicates a random classifier (guessing). We denote TP, TN, FP, and FN as true positive, true negative, false positive, and false negative respectively. The evaluation metrics are defined as follows:

\begin{align}
\mathrm{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{precision} &= \frac{TP}{TP + FP}, \\
\mathrm{recall} &= \frac{TP}{TP + FN}, \\
\mathrm{f1\text{-}score} &= \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\end{align}
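As a numerical illustration of the metrics based on TP, TN, FP, and FN, the sketch below computes accuracy, precision, recall, and f1-score from confusion-matrix counts. The counts are hypothetical: they are chosen here so that the outputs agree with the AD row reported for GSP block 2 with C_rank 20 and D_rank 12 in Table I, and they are not taken from the paper. Returning 0 when a denominator is zero is also an assumption.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and f1-score from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no positive predictions or samples).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts with AD as the positive class (49 test samples).
m = binary_metrics(tp=23, tn=24, fp=1, fn=1)
percent = {name: round(100 * value, 2) for name, value in m.items()}
print(percent)
```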




IV-D The effect of TT-core number

As mentioned in Section III-B, the TT-core number and the TT-rank are two parameters that have a great impact on classification results. This section provides a comparative evaluation of the proposed THS-GAN with respect to a range of TT-core numbers. The GSP block is fixed at the position of “GSP block 3”. The TT-ranks of the classifier and the discriminator are fixed at 14 and 6 respectively. Fig. 4 shows that as the TT-core number increases from 3 to 6, the classification accuracy decreases for AD/NC classification. For AD/MCI and MCI/NC classification, accuracy shows no clear trend as the core number increases from 3 to 6; however, as with AD/NC classification, the best accuracy is achieved at the smallest core number. This observation is consistent with [23]. Thus we set the TT-core number to 3 in the rest of the experiments.

Fig. 4: Comparison of different TT-core numbers.
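To illustrate how the TT-core number and the TT-rank control model size, the sketch below counts the parameters of a fully connected layer stored in the TT-matrix format of [23], where core k has shape r_{k-1} × m_k × n_k × r_k and the boundary ranks are 1. The layer size (512 × 512), its factorization into modes, and the function name `tt_matrix_params` are hypothetical choices made for this example and do not correspond to a specific layer of THS-GAN.

```python
from math import prod


def tt_matrix_params(in_modes, out_modes, rank):
    """Number of parameters of a TT-matrix with a uniform internal TT-rank."""
    d = len(in_modes)  # number of TT-cores
    ranks = [1] + [rank] * (d - 1) + [1]
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d)
    )


# Hypothetical 512 x 512 layer factorized into 3 cores (8 * 8 * 8 = 512).
in_modes, out_modes = (8, 8, 8), (8, 8, 8)
dense_params = prod(in_modes) * prod(out_modes)
tt_rank_14 = tt_matrix_params(in_modes, out_modes, rank=14)
tt_rank_20 = tt_matrix_params(in_modes, out_modes, rank=20)
print(dense_params, tt_rank_14, tt_rank_20)
```

The count grows roughly quadratically in the TT-rank, which is consistent with the steadily increasing parameter counts reported for larger ranks in the tables of Section IV-E.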

IV-E The effect of TT-rank and GSP block position

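| GSP block position | C_rank | D_rank | #Parameters | AUC (%) | Accuracy (%) | Class | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|---|---|---|---|
| GSP block 1 | 14 | 6 | 118,210 | 50.00 | 56.00 | AD | 56.00 | 100 | 71.79 |
| | | | | | | NC | 0 | 0 | 0 |
| | 15 | 7 | 139,611 | 50.00 | 44.00 | AD | 44.00 | 100 | 61.11 |
| | | | | | | NC | 0 | 0 | 0 |
| | 16 | 8 | 163,516 | 50.00 | 60.00 | AD | 60.00 | 100 | 75.00 |
| | | | | | | NC | 0 | 0 | 0 |
| | 17 | 9 | 189,925 | 84.00 | 84.00 | AD | 84.00 | 84.00 | 84.00 |
| | | | | | | NC | 84.00 | 84.00 | 84.00 |
| | 18 | 10 | 218,838 | 63.33 | 77.55 | AD | 100 | 26.67 | 42.11 |
| | | | | | | NC | 75.56 | 100 | 86.08 |
| | 19 | 11 | 250,255 | 69.64 | 62.22 | AD | 100 | 39.29 | 56.41 |
| | | | | | | NC | 50.00 | 100 | 66.67 |
| | 20 | 12 | 284,176 | 91.99 | 92.00 | AD | 92.31 | 92.31 | 92.31 |
| | | | | | | NC | 91.67 | 91.67 | 91.67 |
| GSP block 2 | 14 | 6 | 120,034 | 83.33 | 79.59 | AD | 100 | 66.67 | 80.00 |
| | | | | | | NC | 65.52 | 100 | 79.17 |
| | 15 | 7 | 141,435 | 50.00 | 48.98 | AD | 0 | 0 | 0 |
| | | | | | | NC | 48.98 | 100 | 65.75 |
| | 16 | 8 | 165,340 | 93.18 | 93.88 | AD | 100 | 86.36 | 92.68 |
| | | | | | | NC | 90.00 | 100 | 94.74 |
| | 17 | 9 | 191,749 | 44.71 | 55.10 | AD | 60.47 | 83.87 | 70.27 |
| | | | | | | NC | 16.67 | 5.56 | 8.34 |
| | 18 | 10 | 220,662 | 83.36 | 83.67 | AD | 85.71 | 78.26 | 81.82 |
| | | | | | | NC | 82.14 | 88.46 | 85.18 |
| | 19 | 11 | 252,079 | 83.88 | 83.67 | AD | 88.89 | 82.76 | 85.72 |
| | | | | | | NC | 77.27 | 85.00 | 80.95 |
| | 20 | 12 | 286,000 | 95.92 | 95.92 | AD | 95.83 | 95.83 | 95.83 |
| | | | | | | NC | 96.00 | 96.00 | 96.00 |
| GSP block 3 | 14 | 6 | 121,048 | 91.81 | 91.84 | AD | 91.30 | 91.30 | 91.30 |
| | | | | | | NC | 92.31 | 92.31 | 92.31 |
| | 15 | 7 | 142,449 | 74.48 | 73.47 | AD | 83.33 | 68.97 | 75.47 |
| | | | | | | NC | 64.00 | 80.00 | 71.11 |
| | 16 | 8 | 166,354 | 84.32 | 81.63 | AD | 95.83 | 74.19 | 83.63 |
| | | | | | | NC | 68.00 | 94.44 | 79.07 |
| | 17 | 9 | 192,763 | 80.77 | 79.59 | AD | 69.70 | 100 | 82.14 |
| | | | | | | NC | 100 | 61.54 | 76.19 |
| | 18 | 10 | 221,676 | 73.82 | 73.47 | AD | 79.17 | 70.37 | 74.51 |
| | | | | | | NC | 68.00 | 77.27 | 72.34 |
| | 19 | 11 | 253,093 | 69.40 | 69.39 | AD | 72.00 | 69.23 | 70.59 |
| | | | | | | NC | 66.67 | 69.57 | 68.09 |
| | 20 | 12 | 287,014 | 92.00 | 91.84 | AD | 85.71 | 100 | 92.31 |
| | | | | | | NC | 100 | 84.00 | 91.30 |
| SS-GAN [12] | - | - | 251,637 | 80.02 | 80.39 | AD | 82.76 | 82.76 | 82.76 |
| | | | | | | NC | 77.27 | 77.27 | 77.27 |
| triple-GAN [33] | - | - | 506,386 | 86.83 | 87.76 | AD | 90.32 | 90.32 | 90.32 |
| | | | | | | NC | 83.33 | 83.33 | 83.33 |

TABLE I: Comparison of THS-GAN using different GSP block positions and TT-ranks for AD/NC classification.
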
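| GSP block position | C_rank | D_rank | #Parameters | AUC (%) | Accuracy (%) | Class | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|---|---|---|---|
| GSP block 1 | 14 | 6 | 118,210 | 70.13 | 74.07 | MCI | 69.77 | 96.77 | 81.08 |
| | | | | | | NC | 90.91 | 43.48 | 58.83 |
| | 15 | 7 | 139,611 | 84.89 | 85.19 | MCI | 81.25 | 92.86 | 86.67 |
| | | | | | | NC | 90.91 | 76.92 | 83.33 |
| | 16 | 8 | 163,516 | 64.94 | 59.26 | MCI | 48.72 | 90.48 | 63.34 |
| | | | | | | NC | 86.67 | 39.39 | 54.16 |
| | 17 | 9 | 189,925 | 79.94 | 81.48 | MCI | 84.21 | 69.57 | 76.19 |
| | | | | | | NC | 80.00 | 90.32 | 84.85 |
| | 18 | 10 | 218,838 | 85.71 | 85.19 | MCI | 100 | 71.43 | 83.33 |
| | | | | | | NC | 76.47 | 100 | 86.67 |
| | 19 | 11 | 250,255 | 71.56 | 70.37 | MCI | 62.16 | 92.00 | 74.19 |
| | | | | | | NC | 88.24 | 51.72 | 65.22 |
| | 20 | 12 | 284,176 | 66.48 | 61.11 | MCI | 51.22 | 95.45 | 66.67 |
| | | | | | | NC | 92.31 | 37.50 | 53.33 |
| GSP block 2 | 14 | 6 | 120,034 | 65.74 | 62.50 | MCI | 54.55 | 96.00 | 69.57 |
| | | | | | | NC | 91.67 | 35.48 | 51.16 |
| | 15 | 7 | 141,435 | 88.32 | 87.50 | MCI | 96.15 | 80.65 | 87.72 |
| | | | | | | NC | 80.00 | 96.00 | 87.27 |
| | 16 | 8 | 165,340 | 52.23 | 53.57 | MCI | 57.14 | 14.81 | 23.52 |
| | | | | | | NC | 53.06 | 89.66 | 66.67 |
| | 17 | 9 | 191,749 | 70.18 | 69.64 | MCI | 63.89 | 85.19 | 73.02 |
| | | | | | | NC | 80.00 | 55.17 | 65.30 |
| | 18 | 10 | 220,662 | 63.87 | 64.29 | MCI | 67.74 | 67.74 | 67.74 |
| | | | | | | NC | 60.00 | 60.00 | 60.00 |
| | 19 | 11 | 252,079 | 76.87 | 76.79 | MCI | 83.87 | 76.47 | 80.00 |
| | | | | | | NC | 68.00 | 77.27 | 72.34 |
| | 20 | 12 | 286,000 | 81.25 | 82.14 | MCI | 81.82 | 75.00 | 78.26 |
| | | | | | | NC | 82.35 | 87.50 | 84.85 |
| GSP block 3 | 14 | 6 | 121,048 | 70.75 | 71.43 | MCI | 66.67 | 89.66 | 76.47 |
| | | | | | | NC | 82.35 | 51.85 | 63.63 |
| | 15 | 7 | 142,449 | 72.22 | 73.21 | MCI | 65.91 | 100 | 79.45 |
| | | | | | | NC | 100 | 44.44 | 61.53 |
| | 16 | 8 | 166,354 | 67.69 | 67.86 | MCI | 70.00 | 70.00 | 70.00 |
| | | | | | | NC | 65.38 | 65.38 | 65.38 |
| | 17 | 9 | 192,763 | 77.60 | 76.79 | MCI | 68.97 | 83.33 | 75.47 |
| | | | | | | NC | 85.19 | 71.88 | 77.97 |
| | 18 | 10 | 221,676 | 80.14 | 80.36 | MCI | 78.13 | 86.21 | 81.97 |
| | | | | | | NC | 83.33 | 74.07 | 78.43 |
| | 19 | 11 | 253,093 | 88.72 | 89.29 | MCI | 85.29 | 96.67 | 90.62 |
| | | | | | | NC | 95.45 | 80.77 | 87.5 |
| | 20 | 12 | 287,014 | 69.74 | 71.43 | MCI | 66.67 | 93.33 | 77.78 |
| | | | | | | NC | 85.71 | 46.15 | 60.00 |
| SS-GAN [12] | - | - | 251,637 | 71.15 | 69.64 | MCI | 61.54 | 92.31 | 73.85 |
| | | | | | | NC | 88.24 | 50.00 | 63.83 |
| triple-GAN [33] | - | - | 506,386 | 73.44 | 71.43 | MCI | 61.76 | 87.50 | 72.41 |
| | | | | | | NC | 86.36 | 59.38 | 70.37 |

TABLE II: Comparison of THS-GAN using different GSP block positions and TT-ranks for MCI/NC classification.
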
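| GSP block position | C_rank | D_rank | #Parameters | AUC (%) | Accuracy (%) | Class | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|---|---|---|---|
| GSP block 1 | 14 | 6 | 118,210 | 69.37 | 68.89 | AD | 62.50 | 90.91 | 74.07 |
| | | | | | | MCI | 84.62 | 47.83 | 61.12 |
| | 15 | 7 | 139,611 | 50.00 | 46.94 | AD | 0 | 0 | 0 |
| | | | | | | MCI | 46.94 | 100 | 63.89 |
| | 16 | 8 | 163,516 | 59.08 | 59.18 | AD | 59.09 | 54.17 | 56.52 |
| | | | | | | MCI | 59.26 | 64.00 | 61.54 |
| | 17 | 9 | 189,925 | 63.83 | 64.44 | AD | 80.00 | 36.36 | 50.00 |
| | | | | | | MCI | 60.00 | 91.30 | 72.41 |
| | 18 | 10 | 218,838 | 59.45 | 61.22 | AD | 70.00 | 60.43 | 64.86 |
| | | | | | | MCI | 58.97 | 88.46 | 70.77 |
| | 19 | 11 | 250,255 | 55.18 | 55.10 | AD | 52.00 | 56.52 | 54.17 |
| | | | | | | MCI | 58.33 | 53.85 | 56.00 |
| | 20 | 12 | 284,176 | 69.00 | 71.11 | AD | 76.92 | 50.00 | 60.61 |
| | | | | | | MCI | 68.75 | 88.00 | 77.19 |
| GSP block 2 | 14 | 6 | 120,034 | 63.68 | 67.35 | AD | 60.00 | 47.37 | 52.94 |
| | | | | | | MCI | 70.59 | 80.00 | 75.00 |
| | 15 | 7 | 141,435 | 48.71 | 53.06 | AD | 38.46 | 25.00 | 30.30 |
| | | | | | | MCI | 58.33 | 72.41 | 64.61 |
| | 16 | 8 | 165,340 | 70.65 | 69.39 | AD | 86.67 | 50.00 | 63.42 |
| | | | | | | MCI | 61.76 | 91.30 | 73.68 |
| | 17 | 9 | 191,749 | 85.35 | 85.71 | AD | 85.71 | 88.89 | 87.27 |
| | | | | | | MCI | 85.71 | 81.82 | 83.72 |
| | 18 | 10 | 220,662 | 61.45 | 61.22 | AD | 56.00 | 63.64 | 59.58 |
| | | | | | | MCI | 66.67 | 59.26 | 62.75 |
| | 19 | 11 | 252,079 | 57.74 | 63.27 | AD | 80.00 | 19.05 | 30.77 |
| | | | | | | MCI | 61.36 | 96.43 | 75.00 |
| | 20 | 12 | 286,000 | 45.26 | 48.98 | AD | 33.33 | 25.00 | 28.57 |
| | | | | | | MCI | 55.88 | 65.52 | 60.32 |
| GSP block 3 | 14 | 6 | 121,048 | 63.44 | 71.43 | AD | 70.73 | 93.55 | 80.56 |
| | | | | | | MCI | 75.00 | 33.33 | 46.15 |
| | 15 | 7 | 142,449 | 74.00 | 73.47 | AD | 80.95 | 65.38 | 72.34 |
| | | | | | | MCI | 67.86 | 82.61 | 74.51 |
| | 16 | 8 | 166,354 | 72.07 | 71.43 | AD | 80.00 | 61.54 | 69.57 |
| | | | | | | MCI | 65.52 | 82.61 | 73.08 |
| | 17 | 9 | 192,763 | 59.47 | 55.10 | AD | 75.00 | 40.00 | 52.17 |
| | | | | | | MCI | 45.45 | 78.95 | 57.69 |
| | 18 | 10 | 221,676 | 49.56 | 57.14 | AD | 37.50 | 15.79 | 22.22 |
| | | | | | | MCI | 60.98 | 83.33 | 70.42 |
| | 19 | 11 | 253,093 | 69.05 | 71.43 | AD | 71.00 | 86.00 | 77.78 |
| | | | | | | MCI | 73.00 | 52.00 | 60.74 |
| | 20 | 12 | 287,014 | 70.37 | 67.35 | AD | 100 | 41.00 | 58.16 |
| | | | | | | MCI | 58.00 | 100 | 73.42 |
| SS-GAN [12] | - | - | 251,637 | 50.00 | 48.98 | AD | 48.98 | 100 | 65.75 |
| | | | | | | MCI | 0 | 0 | 0 |
| triple-GAN [33] | - | - | 506,386 | 72.14 | 73.47 | AD | 76.47 | 59.09 | 66.67 |
| | | | | | | MCI | 71.88 | 85.19 | 77.97 |

TABLE III: Comparison of THS-GAN using different GSP block positions and TT-ranks for AD/MCI classification.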

To investigate the effect of TT-rank and GSP block position on classification performance, this section provides a comparative evaluation of the proposed THS-GAN with respect to a range of TT-rank values and different GSP block positions for each evaluation group. The TT-core number is fixed at 3. As far as we know, there have been no published studies that adopt tensor-train decomposition in GAN for semi-supervised classification, so the most suitable TT-rank remains to be explored. We therefore conducted a variety of preliminary experiments and chose the TT-ranks empirically according to the performance on our validation sets. More specifically, we consider the effect of TT-ranks on classification performance when C_rank ∈ {14, 15, ..., 20} and D_rank ∈ {6, 7, ..., 12}. Note that C_rank and D_rank represent the TT-rank of the classifier and the discriminator respectively. SS-GAN [12] and triple-GAN [33] are used as two baseline models for comparison. With respect to SS-GAN, the discriminator has 3 output units corresponding to [CLASS-1, CLASS-2, FAKE]. CLASS-1 and CLASS-2 each correspond to one of the classes AD, MCI, and NC according to the evaluation group. In this case, the discriminator can also act as the classifier. For a fair comparison, the two baselines have the same structure and hyperparameter settings as our model, but without tensor-train decomposition and the high-order GSP block module.
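The three-output discriminator of the SS-GAN baseline can be read as a softmax over two real classes plus one extra unit for generated samples. The sketch below shows one way to interpret such an output vector; the logit values, the ordering of the units (real classes first, fake last), and the function names are assumptions for illustration and are not taken from [12] or from the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, class_names=("AD", "NC")):
    """Split a (K+1)-way output into a class decision and a fake probability."""
    probs = softmax(logits)
    p_fake = probs[-1]  # assumed: last unit models generated samples
    real = probs[:-1]
    best = max(range(len(real)), key=real.__getitem__)
    return class_names[best], p_fake


# Hypothetical logits for one real AD scan in the AD vs. NC group.
label, p_fake = interpret([2.0, 0.5, -1.0])
print(label, round(p_fake, 4))
```

Because a single network has to serve both the real/fake decision and the class decision, this design is what the later discussion refers to as the discriminator having two incompatible convergence points.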

From Table I, it can be observed that, in the context of AD/NC classification, the best AUC is achieved using C_rank = 20 and D_rank = 12, regardless of whether the GSP block is at the position of GSP block 1, GSP block 2 or GSP block 3. The best AUC of 95.92% is obtained when GSP block 2 is inserted after Dense-BC block 2. On the other hand, in the context of MCI/NC classification, Table II shows that, unlike in AD/NC classification, the optimal TT-rank varies with the position of the GSP block. With respect to GSP block 1, a good AUC of 85.71% is obtained when C_rank = 18 and D_rank = 10. Similarly, for GSP block 2, a good AUC of 88.32% is obtained when C_rank = 15 and D_rank = 7, and for GSP block 3, a good AUC of 88.72% is obtained when C_rank = 19 and D_rank = 11. The best AUC of 88.72% is thus obtained when GSP block 3 is utilized. In the context of AD/MCI classification, Table III indicates the same trend: the optimal TT-rank differs when the GSP block is positioned at different locations. With respect to GSP block 1, a good AUC of 69.37% is obtained when C_rank = 14 and D_rank = 6. For GSP block 2, a good AUC of 85.35% is obtained when C_rank = 17 and D_rank = 9, and for GSP block 3, a good AUC of 74.00% is obtained when C_rank = 15 and D_rank = 7. The best AUC of 85.35% is obtained when GSP block 2 is utilized.

From Table I to Table III, the following overall observations can be made.

(1) THS-GAN with optimal hyperparameter settings achieves the best classification performance in terms of AUC and accuracy compared with triple-GAN and SS-GAN. The triple-GAN performs better than SS-GAN, which confirms that the triple-GAN can alleviate the competing problem of SS-GAN, namely that the discriminator has two incompatible convergence points.

(2) Compared with the triple-GAN, THS-GAN obtains AUC gains of 9.09 percentage points (95.92% vs. 86.83%) for AD/NC classification, 15.28 percentage points (88.72% vs. 73.44%) for MCI/NC classification, and 13.21 percentage points (85.35% vs. 72.14%) for AD/MCI classification, improving the performance by a large margin. This indicates that the performance of the proposed model is significantly improved by introducing tensor-train decomposition and high-order pooling. Furthermore, THS-GAN uses far fewer parameters than the triple-GAN, which uses 506,386 parameters. The compression rates are 506,386/286,000 = 1.77 for AD/NC classification, 506,386/253,093 = 2 for MCI/NC classification, and 506,386/191,749 = 2.64 for AD/MCI classification.

(3) According to our results, the best classification results are obtained by utilizing either GSP block 2 or GSP block 3, but not GSP block 1. This observation indicates that exploiting the second-order statistics in the later layers can improve the predictive power significantly. The conjectured reason is that the features extracted in the earlier layers are simple and common, whereas more representative features are abstracted in the later layers. By inserting the high-order pooling module (GSP block) in the later layers, more discriminative features can be enhanced and redundant features suppressed, which improves the predictive performance. Although inserting the GSP block at the later layers increases the number of parameters, GSP block 2 offers the best trade-off between accuracy and number of parameters: this arrangement leads to the best accuracy with the optimal TT-ranks.

(4) TT-rank has a significant effect on testing accuracy, and the optimal value of TT-rank depends on network architecture and data. It is difficult to specify an optimal value for TT-rank in advance. Again, this observation is consistent with [23], which notes that finding the optimal TT-rank remains a challenge. According to the experimental results, the optimal value of TT-rank lies in the range for classifier and for discriminator. It is not time-consuming to find it in practical applications. Under optimal TT-ranks, THS-GAN achieves better performance than triple-GAN while using fewer parameters, which indicates that TT-decomposition can utilize parameters more efficiently and is less likely to converge to local minima. Note that the optimal hyperparameter settings for each evaluation group are utilized in the rest of the experiments.
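The AUC gains and compression rates quoted in observation (2) can be reproduced directly from the best THS-GAN rows and the triple-GAN rows of Tables I to III. The short check below uses only numbers reported in those tables; the dictionary layout and variable names are introduced here for illustration.

```python
TRIPLE_GAN_PARAMS = 506_386

# Best THS-GAN configuration per task versus triple-GAN (Tables I to III).
results = {
    "AD/NC": {"ths_auc": 95.92, "triple_auc": 86.83, "ths_params": 286_000},
    "MCI/NC": {"ths_auc": 88.72, "triple_auc": 73.44, "ths_params": 253_093},
    "AD/MCI": {"ths_auc": 85.35, "triple_auc": 72.14, "ths_params": 191_749},
}

summary = {}
for task, r in results.items():
    auc_gain = round(r["ths_auc"] - r["triple_auc"], 2)  # percentage points
    compression = round(TRIPLE_GAN_PARAMS / r["ths_params"], 2)
    summary[task] = (auc_gain, compression)
    print(f"{task}: +{auc_gain} AUC points, {compression}x fewer parameters")
```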

IV-F The effect of the amount of labeled data

Fig. 5: Comparison of different number of labeled data for AD/MCI classification.
Fig. 6: Comparison of different number of labeled data for MCI/NC classification.
Fig. 7: Comparison of different number of labeled data for AD/NC classification.

In this subsection, we investigate the effect of using different numbers of labeled data for semi-supervised classification. For our proposed THS-GAN, the architecture and hyperparameters are fixed to the optimal settings found in Section IV-E. The 3D-DenseNet architecture is the same as the classifier of THS-GAN but without tensor-train decomposition and GSP block. Similarly, the structure of SS-GAN is the same as THS-GAN but without tensor-train decomposition and GSP block. It can be seen from Fig. 5 that, as the number of labeled data increases, our THS-GAN outperforms SS-GAN by a large margin and performs better than 3D-DenseNet when fewer labeled data are available for AD/MCI classification. Fig. 6 shows that, as the number of labeled data increases, our THS-GAN always outperforms both 3D-DenseNet and SS-GAN for MCI/NC classification. The same trend can be found in Fig. 7. We can also observe that THS-GAN requires fewer labeled samples to achieve comparable results. In Fig. 5, when the number of labeled data is as small as 300, our THS-GAN still achieves better performance than SS-GAN and 3D-DenseNet trained with more labeled data (330, 360, 390 and 420) in the context of AD/MCI classification. Similar trends can also be found for MCI/NC and AD/NC in Fig. 6 and Fig. 7 respectively. This improvement comes from the real MRI images without labels and the synthetic MRI images produced by the generator.

IV-G The effect of number of parameters

Fig. 8: Comparison of different number of parameters for AD/MCI classification.

In this subsection, we investigate the properties of THS-GAN and compare it with the uncompressed triple-GAN in the context of AD/MCI classification. In order to compare the performance over the same range of parameters, various TT-ranks are utilized for THS-GAN. The result in Fig. 8 illustrates that THS-GAN can obtain the best AUC with optimal TT-ranks when the number of parameters is compressed in the range [, ] (in red dashed circle). Furthermore, THS-GAN can achieve comparable AUC when TT-ranks are set to large numbers, and the number of parameters is in the range of [, ] or [, ]. Overall, THS-GAN can achieve much better performance when the TT-rank is not set too large and the number of parameters is compressed between and , so that the predictive performance is boosted.

IV-H The convergence comparison

Fig. 9: Convergence curves for MCI/NC classification
Fig. 10: Convergence curves for AD/MCI classification

Fig. 9 and Fig. 10 show the convergence curves of our THS-GAN and SS-GAN for the evaluation groups MCI/NC and AD/MCI respectively. Our model converges faster than the conventional SS-GAN. In the case of AD/MCI classification, SS-GAN cannot converge, since the differences between AD and MCI are so subtle that the T1-MRI images of AD, MCI, and fake samples are hard for the discriminator to distinguish. These results are also consistent with Table II and Table III in Section IV-E. More specifically, for MCI/NC classification, the AUC of THS-GAN (88.72%) is much higher than that of SS-GAN (71.15%), since THS-GAN converges faster than SS-GAN. For AD/MCI classification, the AUC of SS-GAN is only 50%, and SS-GAN cannot converge during the training process. On the other hand, our THS-GAN converges faster and achieves an AUC of 85.35%.

IV-I The visualization of generated images

Fig. 11: Comparison of brain MRI slices generated by SS-GAN, GAN and THS-GAN (coronal, sagittal and axial views at 30, 60 and 90 epochs).

In this subsection, we visualize the center-cut slices of the generated 3D T1-MRI images from random latent vectors during the training process as shown in Fig. 11. In the beginning, generated samples are blurry, and the detailed features of the brain disappear. In the latter stage, compared with SS-GAN and GAN, the generated samples from our proposed THS-GAN can reflect more detailed attributes of the brain (e.g., sulci, gyri).
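Extracting center-cut slices from a 3D volume amounts to fixing the middle index along one axis at a time. The sketch below assumes the volume is stored as nested lists indexed `volume[x][y][z]`, with x running left to right, y posterior to anterior, and z inferior to superior; this axis convention, the toy volume, and the function name `center_slices` are assumptions and may differ from the layout used to produce Fig. 11.

```python
def center_slices(volume):
    """Return the sagittal, coronal and axial center slices of volume[x][y][z]."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    cx, cy, cz = nx // 2, ny // 2, nz // 2
    sagittal = volume[cx]  # fixed x: ny x nz
    coronal = [[volume[x][cy][z] for z in range(nz)] for x in range(nx)]  # fixed y
    axial = [[volume[x][y][cz] for y in range(ny)] for x in range(nx)]    # fixed z
    return sagittal, coronal, axial


# Toy 3 x 5 x 7 volume whose voxel value encodes its own coordinates.
volume = [
    [[100 * x + 10 * y + z for z in range(7)] for y in range(5)]
    for x in range(3)
]
sagittal, coronal, axial = center_slices(volume)
print(len(sagittal), len(sagittal[0]))  # rows and columns of the sagittal slice
```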

IV-J The comparison with existing methods

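| Model | MCI vs NC ACC (%) | MCI vs NC Recall (%) | MCI vs NC AUC (%) | AD vs MCI ACC (%) | AD vs MCI Recall (%) | AD vs MCI AUC (%) | AD vs NC ACC (%) | AD vs NC Recall (%) | AD vs NC AUC (%) |
|---|---|---|---|---|---|---|---|---|---|
| Plocharski et al. [34] | 84.40 | 82.30 | 84.00 | 81.50 | 81.70 | 83.00 | 92.30 | 91.30 | 98.00 |
| Peng et al. [35] | 71.60 | 83.90 | - | 65.40 | 41.20 | - | 88.40 | 84.10 | - |
| Xu et al. [36] | 70.89 | 61.39 | 79.02 | - | - | - | 90.40 | 92.36 | 95.36 |
| Neffati et al. [37] | - | - | - | - | - | - | 91.11 | 85.00 | - |
| Li et al. [38] | 75.00 | 81.90 | 75.80 | - | - | - | 89.10 | 84.60 | 91.00 |
| Cui et al. [39] | - | - | - | - | - | - | 91.33 | 86.87 | 93.22 |
| Ren et al. [40] | 88.50 | 82.16 | 82.00 | 85.32 | 78.79 | 80.00 | 93.75 | 94.23 | 93.00 |
| Liu et al. [20] | 77.84 | 76.81 | 82.72 | - | - | - | 84.97 | 82.65 | 90.63 |
| Cheng et al. [41] | - | - | - | - | - | - | 87.15 | 86.36 | 92.26 |
| THS-GAN | 89.29 | 96.67 | 88.72 | 85.71 | 88.89 | 85.35 | 95.92 | 95.83 | 95.92 |

TABLE IV: Comparison with existing methods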

Several machine learning methods have been tried for the discrimination of subjects using structural MRI images. Table IV presents the reported performance of some related studies. Although a direct comparison of these studies is difficult, as each study uses different datasets and preprocessing protocols, the table gives an indicative comparison for the classification of T1-MRI images. Table IV shows that our proposed method achieves the highest accuracy and recall in all three tasks and the highest AUC for MCI vs NC and AD vs MCI; the only exception is the AD vs NC AUC, where Plocharski et al. [34] report 98.00% against our 95.92%. Compared with Plocharski et al. [34], our proposed method requires fewer image-preprocessing steps for feature extraction: no segmentation and rigid registration are required. Li et al. [38] constructed DenseNets on the decomposed image patches of the internal and external hippocampus to learn the intensity and shape features. A recurrent neural network (RNN) is then cascaded to combine the features from the left and right hippocampus, and the high-level features are abstracted for disease classification. Cheng et al. [41] proposed a method based on a combination of multiple 3D-CNNs to classify AD and NC subjects. It is built on different local image patches to transform the local brain image into more compact high-level features. Our method outperforms those methods as well, which demonstrates the benefit of the tensor-train decomposition and the high-order pooling module leveraged in our THS-GAN. The superior classification performance of our method indicates its potential for assessing MCI and AD.

V Conclusions

In this paper, we developed a novel THS-GAN for assessing MCI and AD. The three-player cooperative game based framework is tensorized so that the proposed model can benefit from the structural information of the brain. By introducing high-order pooling in our model, more significant features can be extracted by making full use of the second-order statistics of the holistic MRI images. Thus the capability of our model is enhanced. To the best of our knowledge, the proposed THS-GAN model is the first work to consider tensor-train decomposition in GAN and leverage GAN for semi-supervised classification on MRI images for AD diagnosis. The experimental results demonstrate that the proposed THS-GAN model can obtain promising results. The visualization of the generated images during the training process also shows that our model can generate plausible MRI images. In future work, we will investigate the generated MRI images for data augmentation.


This work is supported by the National Natural Science Foundation of China under Grant No. 61872351 and the Shenzhen Key Basic Research Project under Grant No. JCYJ20180507182506416.


  • [1] B. D. James, S. E. Leurgans, L. E. Hebert, P. A. Scherr, K. Yaffe, and D. A. Bennett, “Contribution of alzheimer disease to mortality in the united states,” Neurology, vol. 82, no. 12, pp. 1045–1050, 2014. [Online]. Available: https://n.neurology.org/content/82/12/1045
  • [2] C. Patterson, “World alzheimer report 2018 the state of the art of dementia research: new frontiers,” Alzheimer’s Disease International (ADI): London, UK, 2018.
  • [3] J. Zhang, Y. Gao, Y. Gao, B. C. Munsell, and D. Shen, “Detecting anatomical landmarks for fast alzheimer’s disease diagnosis,” IEEE Transactions on Medical Imaging, vol. 35, no. 12, pp. 2524–2533, Dec 2016.
  • [4] S. Liu, S. Liu, W. Cai, H. Che, S. Pujol, R. Kikinis, D. Feng, M. J. Fulham, and ADNI, “Multimodal neuroimaging feature learning for multiclass diagnosis of alzheimer’s disease,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 4, pp. 1132–1140, April 2015.
  • [5] B. Lei, P. Yang, T. Wang, S. Chen, and D. Ni, “Relational-regularized discriminative sparse learning for alzheimer’s disease diagnosis,” IEEE Transactions on Cybernetics, vol. 47, no. 4, pp. 1102–1113, April 2017.
  • [6] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
  • [7] Y. Wang, L. Zhou, B. Yu, L. Wang, C. Zu, D. S. Lalush, W. Lin, X. Wu, J. Zhou, and D. Shen, “3d auto-context-based locality adaptive multi-modality gans for pet synthesis,” IEEE Transactions on Medical Imaging, vol. 38, no. 6, pp. 1328–1339, June 2019.
  • [8] X. Yi, E. Walia, and P. Babyn, “Generative Adversarial Network in Medical Imaging: A Review,” arXiv e-prints, Sep. 2018.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds.   Curran Associates, Inc., 2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
  • [10] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds.   Curran Associates, Inc., 2016, pp. 2234–2242. [Online]. Available: http://papers.nips.cc/paper/6125-improved-techniques-for-training-gans.pdf
  • [11] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv e-prints, Nov. 2015.
  • [12] A. Odena, “Semi-Supervised Learning with Generative Adversarial Networks,” arXiv e-prints, 2016.
  • [13] J. Baron, G. Chételat, B. Desgranges, G. Perchey, B. Landeau, V. de la Sayette, and F. Eustache, “In vivo mapping of gray matter loss with voxel-based morphometry in mild alzheimer’s disease,” NeuroImage, vol. 14, no. 2, pp. 298–309, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1053811901908481
  • [14] Y. Fan, D. Shen, R. C. Gur, R. E. Gur, and C. Davatzikos, “Compare: Classification of morphological patterns using adaptive regional elements,” IEEE Transactions on Medical Imaging, vol. 26, no. 1, pp. 93–105, Jan 2007.
  • [15] M. Liu, D. Zhang, D. Shen, and the Alzheimer’s Disease Neuroimaging Initiative, “Hierarchical fusion of features and classifier decisions for alzheimer’s disease diagnosis,” Human Brain Mapping, vol. 35, no. 4, pp. 1305–1319, 2014. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.22254
  • [16] A. Payan and G. Montana, “Predicting alzheimer’s disease: a neuroimaging study with 3d convolutional neural networks,” CoRR, vol. abs/1502.02506, 2015. [Online]. Available: http://arxiv.org/abs/1502.02506
  • [17] Y. Pan, M. Liu, C. Lian, T. Zhou, Y. Xia, and D. Shen, “Synthesizing missing pet from mri with cycle-consistent generative adversarial networks for alzheimer’s disease diagnosis,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger, Eds.   Cham: Springer International Publishing, 2018, pp. 455–463.
  • [18] K. Armanious, C. Jiang, S. Abdulatif, T. Küstner, S. Gatidis, and B. Yang, “Unsupervised medical image translation using cycle-medgan,” CoRR, vol. abs/1903.03374, 2019. [Online]. Available: http://arxiv.org/abs/1903.03374
  • [19] J. Islam and Y. Zhang, “Brain mri analysis for alzheimer’s disease diagnosis using an ensemble system of deep convolutional neural networks,” Brain Informatics, vol. 5, no. 2, p. 2, May 2018. [Online]. Available: https://doi.org/10.1186/s40708-018-0080-3
  • [20] M. Liu, D. Cheng, K. Wang, Y. Wang, and the Alzheimer’s Disease Neuroimaging Initiative, “Multi-modality cascaded convolutional neural networks for alzheimer’s disease diagnosis,” Neuroinformatics, vol. 16, no. 3, pp. 295–308, Oct 2018. [Online]. Available: https://doi.org/10.1007/s12021-018-9370-4
  • [21] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [22] D. Gu, “3d densely connected convolutional network for the recognition of human shopping actions,” in Université d’Ottawa/University of Ottawa, 2017.
  • [23] A. Novikov, D. Podoprikhin, A. Osokin, and D. Vetrov, “Tensorizing neural networks,” in Advances in Neural Information Processing Systems 28 (NIPS), 2015.
  • [24] T. Garipov, D. Podoprikhin, A. Novikov, and D. Vetrov, “Ultimate tensorization: compressing convolutional and FC layers alike,” arXiv preprint arXiv:1611.03214, 2016.
  • [25] I. Oseledets, “Tensor-train decomposition,” SIAM Journal on Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011. [Online]. Available: https://doi.org/10.1137/090752286
  • [26] H. Huang and H. Yu, “Ltnn: A layerwise tensorized compression of multilayer neural network,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2018.
  • [27] X. Cao and Q. Zhao, “Tensorizing generative adversarial nets,” CoRR, vol. abs/1710.10772, 2017. [Online]. Available: http://arxiv.org/abs/1710.10772
  • [28] J. M. Rondina, L. R. K. Ferreira, F. L. de Souza Duran, R. Kubo, C. R. Ono, C. C. Leite, J. Smid, R. Nitrini, C. A. Buchpiguel, and G. F. Busatto, “Selecting the most relevant brain regions to discriminate alzheimer’s disease patients from healthy controls using multiple kernel learning: A comparison across functional and structural imaging modalities and atlases,” in NeuroImage: Clinical, 2018.
  • [29] Z. Gao, J. Xie, Q. Wang, and P. Li, “Global second-order pooling convolutional networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [30] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv e-prints, Nov. 2015.
  • [31] M. Jenkinson and S. Smith, “A global optimisation method for robust affine registration of brain images,” Medical Image Analysis, vol. 5, no. 2, pp. 143 – 156, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1361841501000366
  • [32] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning - Volume 28, ser. ICML’13.   JMLR.org, 2013, pp. III–1139–III–1147. [Online]. Available: http://dl.acm.org/citation.cfm?id=3042817.3043064
  • [33] C. Li, T. Xu, J. Zhu, and B. Zhang, “Triple generative adversarial nets,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds.   Curran Associates, Inc., 2017, pp. 4088–4098. [Online]. Available: http://papers.nips.cc/paper/6997-triple-generative-adversarial-nets.pdf
  • [34] M. Plocharski and L. R. Østergaard, “Sulcal and cortical features for classification of alzheimer’s disease and mild cognitive impairment,” in Image Analysis, M. Felsberg, P.-E. Forssén, I.-M. Sintorn, and J. Unger, Eds.   Cham: Springer International Publishing, 2019, pp. 427–438.
  • [35] J. Peng, X. Zhu, Y. Wang, L. An, and D. Shen, “Structured sparsity regularized multiple kernel learning for alzheimer’s disease diagnosis,” Pattern Recognition, vol. 88, pp. 370–382, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320318304151
  • [36] L. Xu, Z. Yao, J. Li, C. Lv, H. Zhang, and B. Hu, “Sparse feature learning with label information for alzheimer’s disease classification based on magnetic resonance imaging,” IEEE Access, vol. PP, pp. 1–1, 01 2019.
  • [37] S. Neffati, K. Ben Abdellafou, I. Jaffel, O. Taouali, and K. Bouzrara, “An improved machine learning technique based on downsized kpca for alzheimer’s disease classification,” International Journal of Imaging Systems and Technology, vol. 29, no. 2, pp. 121–131, 2019. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ima.22304
  • [38] F. Li and M. Liu, “A hybrid convolutional and recurrent neural network for hippocampus analysis in alzheimer’s disease,” Journal of Neuroscience Methods, vol. 323, pp. 108 – 118, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0165027019301463
  • [39] R. Cui and M. Liu, “Rnn-based longitudinal analysis for diagnosis of alzheimer’s disease,” Computerized Medical Imaging and Graphics, vol. 73, pp. 1–10, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0895611118303987
  • [40] F. Ren, C. Yang, Q. Qiu, N. Zeng, C. Cai, C. Hou, and Q. Zou, “Exploiting discriminative regions of brain slices based on 2d cnns for alzheimer’s disease classification,” IEEE Access, pp. 1–1, 2019.
  • [41] D. Cheng, M. Liu, J. Fu, and Y. Wang, “Classification of MR brain images by combination of multi-CNNs for AD diagnosis,” in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, ser. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 10420, Jul. 2017, p. 1042042.