Large-Scale Unsupervised Deep Representation Learning for Brain Structure

05/02/2018 ∙ by Ayush Jaiswal, et al. ∙ University of Southern California

Machine Learning (ML) is increasingly being used for computer aided diagnosis of brain related disorders based on structural magnetic resonance imaging (MRI) data. Most of such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight and careful quality control to ensure the absence of errors in the computational process. Recent advances in Deep Representation Learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learn generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure, which can be used to reconstruct brain images with very low error and exhibit performance comparable to FreeSurfer features on various classification tasks.




1 Introduction

Structural brain magnetic resonance imaging (sBMRI) helps medical practitioners make effective diagnoses of disorders by allowing the visualization of characteristics of their patients’ brains and the detection of abnormalities. The curation of such data, where the diagnosis has already been made, provides opportunities to use Machine Learning (ML) methods to learn models from these examples and assist medical practitioners in making future diagnoses more efficiently.

Developing ML models for classifying a subject’s brain as normal or as having a certain disorder requires an appropriate feature representation of the brain. Software such as FreeSurfer has traditionally been used to extract summary statistics from macro-scale brain regions, which are then used to train classification models [7, 12, 13]. These features typically consist of the surface area, volume and thickness of various regions of the brain. Although these specialized features are computed automatically by software, manual verification and careful quality control are required to ascertain that the process is error-free and to correct errors when they occur. Further, such software imposes a very strong prior on the brain, a fixed view of anatomy that is often incorrect, especially in abnormal cases where brain structure is deformed. Moreover, such a feature representation comprises a fixed set of targeted region properties, which fails to capture other potentially important information contained in the images.

We propose an unsupervised deep representation learning approach, based on convolutional autoencoders (CAEs) [8], to learn low-dimensional representations of brain structure from sBMRI scans. We present three CAE models that we employ in our work: CAE-staged, CAE-joint and CAE-3D. CAE-staged and CAE-joint treat 3D brain images as a series of 2D frames along the Z-axis. CAE-staged comprises two separate CAEs: one learns frame-level representations, and the other learns a combined representation of the brain structure from the frame-level encodings. We merge the two autoencoders of CAE-staged to create the CAE-joint network for end-to-end representation learning. CAE-3D operates directly on the 3D brain images, using 3D operations to learn latent embeddings. As with most deep learning models, large amounts of data are required to train these networks due to their massive number of parameters. However, most publicly available sBMRI datasets are relatively small, ranging from tens to a few thousand images. Hence, we combine data from nine different sources to create a common dataset for unsupervised representation learning.
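To make the staged pipeline concrete, the following is a shape-level sketch in Python of how CAE-staged processes one volume. Fixed random projections stand in for the two trained convolutional encoders; the scan dimensions and the frame-level embedding width `D_FRAME` are illustrative assumptions, while the 50-dimensional brain embedding matches the embedding-layer size analyzed in the saliency experiments.

```python
import numpy as np

# Shape-level sketch of the CAE-staged pipeline. Fixed random projections
# stand in for the two trained convolutional encoders; the scan size and
# the frame embedding width D_FRAME are illustrative assumptions.
rng = np.random.default_rng(0)

Z, H, W = 32, 64, 64            # assumed (downsampled) scan dimensions
D_FRAME, D_BRAIN = 128, 50      # D_BRAIN = 50 as in the saliency analysis

brain = rng.random((Z, H, W))   # one 3D sBMRI volume: Z frames of H x W

# Stand-ins for the learned encoders (linear maps instead of conv stacks).
proj_frame = rng.standard_normal((D_FRAME, H * W)) / np.sqrt(H * W)
proj_brain = rng.standard_normal((D_BRAIN, Z * D_FRAME)) / np.sqrt(Z * D_FRAME)

# Stage I (CAE-staged-I): encode every 2D frame independently.
frame_codes = np.stack([proj_frame @ f.ravel() for f in brain])  # (Z, D_FRAME)

# Stage II (CAE-staged-II): encode the stacked frame codes into one embedding.
brain_code = proj_brain @ frame_codes.ravel()                    # (D_BRAIN,)

print(frame_codes.shape, brain_code.shape)
```

CAE-joint corresponds to composing these two stages into a single differentiable network, so that both maps are trained end-to-end.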

The diversity induced by pooling data from multiple sources also makes the learned features more informative and robust. Features learned using our approach take considerably less time to construct and perform comparably to those calculated using FreeSurfer (FS) on binary classification tasks: identifying subjects as healthy, having Alzheimer’s disease (AD), having mild cognitive impairment (MCI), or having Autism Spectrum Disorder (ASD).

In the following sections, we present the data and preprocessing methods, conceptual understanding of CAEs, our CAE architectures, and their qualitative and quantitative analyses. To the best of our knowledge, our work is the first large-scale (spanning nine different datasets) unsupervised deep representation learning effort for structural brain images, paving the way for a plethora of possible future work, which is further motivated by our results.

2 Data

We train our deep CAE models on sBMRI scans, which are static 3D images that capture structural details of the brain, from nine different sources:

  • Alzheimer’s Disease Neuroimaging Initiative (ADNI) - for the study of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).

  • Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL) - for the study of AD [2].

  • Open Access Series of Imaging Studies (OASIS) - for the study of AD. The data is made available by the Washington University Alzheimer’s Disease Research Center, Dr. Randy Buckner at the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).

  • Autism Brain Imaging Data Exchange (ABIDE-I) - for the study of Autism Spectrum Disorder (ASD).

  • Brainomics/Localizer - for the study of inter-subject variability along different modalities of brain imaging [10, 9].

  • Human Connectome Project (HCP) - for the study of various functions [3].

  • International Consortium for Brain Mapping (ICBM) - for the development of a probabilistic reference system for the human brain.

  • Northwestern University Schizophrenia Data and Software Tool (NUSDAST) - for the study of Schizophrenia.

  • Parkinson’s Progression Markers Initiative (PPMI) - for the study of the progression of Parkinson’s disease.

Data Preprocessing. The combined dataset consists of images. A typical sBMRI scan is a image containing the subject’s head. We use the skull-stripping functionality of FreeSurfer to extract only the brain from each image. We crop all the resulting images to the central bounding box based on our observation that this only removes empty space around the brains. We rotate each image by , , and degrees about each of the three axes, and translate both the original and the rotated images by and voxels along each of the three axes. This gives us a total of images, further enhancing the data size and making our models invariant to translation and rotation. We use of the original images and their augmented versions ( in total) to train our models. We downsample all the images to dimensions to reduce the memory requirement for training.
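The augmentation step can be sketched as below. Since the exact angles and voxel offsets are elided above, the values used here (±5 degrees, ±2 voxels) are illustrative assumptions; `scipy.ndimage` supplies the rotation and translation operations.

```python
import numpy as np
from scipy.ndimage import rotate, shift

# Sketch of the rotation/translation augmentation described above.
# The paper's exact angles and voxel offsets are not recoverable here,
# so the values below are illustrative assumptions.
rng = np.random.default_rng(0)
brain = rng.random((16, 16, 16))   # small stand-in for a skull-stripped scan

augmented = [brain]
for axes in [(0, 1), (0, 2), (1, 2)]:          # rotate about each pair of axes
    for angle in (-5, 5):                      # assumed angles, in degrees
        augmented.append(rotate(brain, angle, axes=axes, reshape=False))

translated = []
for img in augmented:                          # translate originals + rotations
    for axis in range(3):
        for offset in (-2, 2):                 # assumed voxel offsets
            delta = [0, 0, 0]
            delta[axis] = offset
            translated.append(shift(img, delta))

augmented.extend(translated)
print(len(augmented))   # 7 base volumes, each translated 6 ways -> 49 total
```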

3 Deep Representation Learning Models

3.1 Convolutional Autoencoders

An autoencoder (AE) [6] is a neural network composed of an encoder, which transforms data into its latent representation, and a decoder, which reconstructs the data from its encoding. The model is trained to minimize the reconstruction loss. Convolutional Autoencoders (CAEs) [8] effectively capture frequently occurring local features through parameter sharing across the input, by employing convolution layers in the encoder and deconvolution layers in the decoder. We use max-pooling layers in CAE encoders to gradually downsample data and unpooling layers in CAE decoders for progressive upsampling. We use the rectified linear activation function, ReLU(x) = max(0, x), at all layers of our networks, and batch normalization to speed up training.
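As a minimal illustration of the autoencoder objective (not the paper's architecture), the sketch below uses untrained linear maps in place of the convolutional encoder and decoder, applies the rectified linear activation max(0, x), and computes the reconstruction mean squared error that training would minimize.

```python
import numpy as np

# Minimal sketch of the autoencoder objective: encode, decode, and measure
# reconstruction error. Untrained linear maps stand in for the convolution
# and deconvolution layers of a real CAE.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)    # rectified linear activation, max(0, x)

x = rng.random(64)               # a flattened stand-in "image"
W_enc = rng.standard_normal((16, 64)) * 0.1
W_dec = rng.standard_normal((64, 16)) * 0.1

z = relu(W_enc @ x)              # encoder: data -> latent representation
x_hat = W_dec @ z                # decoder: latent representation -> reconstruction

mse = np.mean((x - x_hat) ** 2)  # the loss minimized during training
print(z.shape, float(mse) >= 0.0)
```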

Figure 1: Architecture of CAE-staged and CAE-joint models. The left half of the figure is CAE-staged-I and the right half is CAE-staged-II, as marked. Merging the layers of the two networks, we get CAE-joint as the entire graph in the figure.
Figure 2: Architecture of the CAE-3D model. All the operations are done in 3D; hence, intermediate data for every image are 4D tensors.

3.2 CAE Architectures

CAE-staged. This model consists of two separate CAEs: CAE-staged-I and CAE-staged-II. Figure 1 shows their complete architectures as the left and right halves of the graph. We first train CAE-staged-I to reconstruct any given frame so that it learns frame-level representations. We then stack the frame-level representations of all the frames to create a 2D representation of each brain. We train CAE-staged-II to learn a 50-dimensional latent representation of entire brains that can encode and reconstruct these intermediate encodings.

CAE-joint. We create the CAE-joint model by merging CAE-staged-I and CAE-staged-II into a single network, as shown in Figure 1. We reshape the frame-level encodings into 2D brain representations in the encoder, and the reconstructed representations back into frame-level encodings in the decoder. We use the learned weights from CAE-staged-I and CAE-staged-II as initial weights while training CAE-joint. Like CAE-staged, CAE-joint produces 50-dimensional encodings of entire brains.

CAE-3D. This model is a single CAE that treats each brain as a 3D image, using the 3D versions of the convolution, deconvolution, pooling and unpooling layers. Figure 2 shows its complete architecture. We train it to directly learn a 50-dimensional representation that can encode and reconstruct whole brain images.

(a) Original brain image
(b) CAE-staged-I reconstruction
(c) CAE-joint reconstruction
(d) CAE-3D reconstruction
Figure 3: 2D frames in original and reconstructed brain images.

4 Experimental Evaluation

4.1 Qualitative Analysis

The original brain images have voxel values between and . The voxel-wise reconstruction mean squared errors of CAE-staged-I, CAE-joint and CAE-3D on validation data are , and , respectively. We visualize reconstructed brain images by plotting a few 2D frames from the 3D tensors. Figure 3 shows frames from the original and reconstructed tensors generated by the CAE-staged-I (the first half of CAE-staged), CAE-joint and CAE-3D models. For all three models, the learned representations can be used to generate high-quality reconstructions of brain images.

To understand how our models extract features from 3D brain images, we infer and visualize the saliency maps of the nodes in the 50-dimensional brain representation layers of our models using the publicly available keras-vis library. The saliency map of a node is calculated by taking the absolute value of the partial derivative of the node’s value with respect to the input features; hence, it indicates the magnitude of the effect of a voxel’s perturbation on the node’s activation. Figures 4, 5 and 6 show the saliency maps of a few nodes in the embedding layers of CAE-staged, CAE-joint and CAE-3D, respectively. Each row in these figures corresponds to five frames of the saliency map of a single embedding node. The results show that all three models capture features from highly complex 3D substructures of brain images. The maps also reveal that each embedding node has different localized attention regions, which it accounts for in the information it captures. The saliency maps of the three models look dissimilar, indicating that they process brain images differently. The attention regions of CAE-3D look relatively more localized than those of CAE-staged and CAE-joint, indicating a stronger focus on local details.
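The saliency computation can be sketched as follows, assuming a stand-in linear-plus-ReLU encoder in place of the trained models and a central finite-difference approximation in place of keras-vis's automatic differentiation.

```python
import numpy as np

# Sketch of the saliency computation described above: the absolute value of
# the partial derivative of one embedding node with respect to each input
# voxel. A fixed linear+ReLU encoder stands in for the trained network.
rng = np.random.default_rng(0)
x = rng.random(25)                         # flattened stand-in "image"
W = rng.standard_normal((50, 25)) * 0.1    # 50 embedding nodes, as in the paper

def node_value(x, node=0):
    return float(np.maximum(0.0, W @ x)[node])

def saliency(x, node=0, eps=1e-4):
    # central finite differences approximate |d node / d x_i|
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (node_value(x + e, node) - node_value(x - e, node)) / (2 * eps)
    return np.abs(grad)

s = saliency(x, node=0)
print(s.shape)   # one saliency value per input voxel
```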

Figure 4: Saliency map of two nodes in the embedding layer - CAE-staged
Figure 5: Saliency map of two nodes in the embedding layer - CAE-joint
Figure 6: Saliency map of two nodes in the embedding layer - CAE-3D

4.2 Quantitative Analysis

We compare the learned CAE embeddings with the features extracted using FS on four classification tasks. Thus, we evaluate four feature representations: (1) FreeSurfer (FS), (2) CAE-staged (CAES), (3) CAE-joint (CAEJ) and (4) CAE-3D. The classification tasks are derived from two datasets: ADNI and ABIDE-I. The ADNI dataset has healthy subjects (H-ADNI), subjects with AD and subjects with MCI. The ABIDE-I dataset has healthy subjects (H-ABIDE) and subjects with ASD. The classification tasks we evaluate on are: (1) H-ADNI vs. AD, (2) H-ADNI vs. MCI, (3) AD vs. MCI, and (4) H-ABIDE vs. ASD. We train a Logistic Regression classifier and a Random Forest classifier on each feature representation for each classification task, and use the Area Under the Receiver Operating Characteristic curve (AUROC) [4] as the evaluation metric. We split the data randomly into training and test sets. We summarize the results on test data in Table 1. The embeddings learned by our models perform comparably to the traditionally used FS features on all the aforementioned classification tasks, with CAE-3D performing slightly better than CAE-joint.
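This evaluation protocol can be sketched with scikit-learn as below, using synthetic 50-dimensional features and labels in place of the real embeddings and diagnoses; the split ratio is an assumption, since the paper's exact split is elided above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Sketch of the evaluation protocol: train Logistic Regression and Random
# Forest on a feature representation and score with AUROC on a held-out
# split. Synthetic features stand in for the CAE embeddings / FS features.
rng = np.random.default_rng(0)
X = rng.random((200, 50))                                   # 50-dim embeddings
y = (X[:, 0] + 0.3 * rng.random(200) > 0.65).astype(int)    # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for clf in (LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=100, random_state=0)):
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]   # probability of positive class
    print(type(clf).__name__, round(roc_auc_score(y_te, scores), 3))
```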

Timing Analysis. Feature construction using FreeSurfer takes hours for each image. In contrast, CAE-staged and CAE-joint take , and CAE-3D takes , on average, to generate the latent embedding of each brain image.

                    Logistic Regression            Random Forest
                 FS    CAES  CAEJ  CAE-3D      FS    CAES  CAEJ  CAE-3D
H-ADNI / AD     0.81  0.67  0.82  0.81        0.86  0.64  0.80  0.83
AD / MCI        0.71  0.67  0.76  0.72        0.77  0.70  0.73  0.73
H-ADNI / MCI    0.77  0.75  0.76  0.76        0.81  0.71  0.77  0.80
H-ABIDE / ASD   0.60  0.57  0.57  0.60        0.65  0.64  0.63  0.66
Table 1: AUROC with Logistic Regression and Random Forest for each feature representation (FS, CAES, CAEJ, CAE-3D) on each classification task.

5 Related Work

Güçlü and van Gerven [5] showed that unsupervised feature learning from functional brain MRI data improves human brain activity prediction in response to natural images. Brosch et al. [1] proposed a method for manifold learning of brain images in the ADNI dataset using Deep Belief Networks (DBNs) composed of convolutional Restricted Boltzmann Machines. However, their method is not end-to-end trainable and, hence, not scalable. Plis et al. [11] presented the use of DBNs in a Constraint Satisfaction Problem framework to learn latent embeddings of gray matter images extracted from sBMRI scans on two datasets separately. Our work differs from theirs in that we propose a large-scale deep unsupervised approach to learn latent representations of brain structure from sBMRI scans, spanning nine different datasets, without the need to first extract gray matter images.

6 Conclusion

We presented an approach for large-scale deep unsupervised representation learning for sBMRI data incorporating models based on CAEs: CAE-staged, CAE-joint and CAE-3D. Features learned using our approach can reconstruct brain images with very low error and show performance comparable to FS features on classification tasks. Feature encoding using our method takes considerably less time compared to FS features, while employing no specialized domain knowledge. Our models do not have a fixed view of brain anatomy and can be made increasingly general with diversified training data. The proposed models are adaptive, i.e., the learned features can be improved by training with more data.

Acknowledgments. The collection and sharing of data used in this work was funded by NIH Grant U01 AG024904, Department of Defense award number W81XWH-12-2-0012, P50 AG05681, P01 AG03991, R01 AG021910, P20 MH071616, U24 RR021382, NIMH K23MH087770, the McDonnell Center for Systems Neuroscience at Washington University, 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, National Institute of Biomedical Imaging and BioEngineering, NIMH 1R01 MH084803, the Michael J. Fox Foundation for Parkinson’s Research and funding partners, the Leon Levy Foundation and primary support for the work by Michael P. Milham (MPM) and the INDI team was provided by gifts from Joseph P. Healy and the Stavros Niarchos Foundation to the Child Mind Institute, as well as by an NIMH award to MPM (R03MH096321). The authors also thank Daniel Moyer, Neda Jahanshad and Marc Harrison of USC Image Genetics Center for assistance with retrieving part of the data used in this project. Computation for the work described in this paper was supported by USC’s Center for High-Performance Computing.


  • [1] Brosch, T., Tam, R., Alzheimer’s Disease Neuroimaging Initiative: Manifold learning of brain MRIs by deep learning. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 633–640. Springer (2013)
  • [2] Ellis, K.A., Bush, A.I., Darby, D., Fazio, D.D., Foster, J., Hudson, P., Lautenschlager, N.T., Lenzo, N., Martins, R.N., Maruff, P., Masters, C., Milner, A., Pike, K., Rowe, C., Savage, G., Szoeke, C., Taddei, K., Villemagne, V., Woodward, M., Ames, D.: The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. International Psychogeriatrics 21(4), 672–687 (2009)
  • [3] Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T.E.J., Bucholz, R., Chang, A., Chen, L., Corbetta, M., Curtiss, S., Della Penna, S., Feinberg, D., Glasser, M., Harel, N., Heath, A., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S., Oostenveld, R., Petersen, S., Prior, F., Schlaggar, B., Smith, S., Snyder, A., Xu, J., Yacoub, E., WU-Minn HCP Consortium: The Human Connectome Project: a data acquisition perspective. NeuroImage 62(4), 2222–2231 (2012)
  • [4] Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27(8), 861–874 (2006)
  • [5] Güçlü, U., van Gerven, M.A.J.: Unsupervised feature learning improves prediction of human brain activity in response to natural images. PLOS Computational Biology 10(8), 1–12 (2014)
  • [6] Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  • [7] Lebedev, A., Westman, E., Westen, G.V., Kramberger, M., Lundervold, A., Aarsland, D., Soininen, H., Kłoszewska, I., Mecocci, P., Tsolaki, M., Vellas, B., Lovestone, S., Simmons, A.: Random forest ensembles for detection and prediction of Alzheimer’s disease with a good between-cohort robustness. NeuroImage: Clinical 6, 115–125 (2014)
  • [8] Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings of the 21st International Conference on Artificial Neural Networks - Volume Part I. pp. 52–59. ICANN’11, Springer-Verlag, Berlin, Heidelberg (2011)
  • [9] Orfanos, D.P., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Bihan, D.L., Frouin, V.: The Brainomics/Localizer database. NeuroImage 144, Part B, 309–314 (2017)
  • [10] Pinel, P., Fauchereau, F., Moreno, A., Barbot, A., Lathrop, M., Zelenika, D., Le Bihan, D., Poline, J.B., Bourgeron, T., Dehaene, S.: Genetic variants of FOXP2 and the KIAA0319/TTRAP/THEM2 locus are associated with altered brain activation in distinct language-related regions. Journal of Neuroscience 32(3), 817–825 (2012)
  • [11] Plis, S.M., Hjelm, D.R., Salakhutdinov, R., Allen, E.A., Bockholt, H.J., Long, J.D., Johnson, H.J., Paulsen, J.S., Turner, J.A., Calhoun, V.D.: Deep learning for neuroimaging: a validation study. Frontiers in Neuroscience 8, 229 (2014)
  • [12] Retico, A., Tosetti, M., Muratori, F., Calderoni, S.: Neuroimaging-based methods for autism identification: a possible translational application? Functional Neurology 29(4), 231–239 (2014)
  • [13] Sabuncu, M.R., Konukoglu, E.: Clinical prediction from structural brain MRI scans: a large-scale empirical study. Neuroinformatics 13(1), 31–46 (2015)