Ensemble of 3D CNN regressors with data fusion for fluid intelligence prediction

by Marina Pominova et al.

In this work, we aim at predicting children's fluid intelligence scores based on structural T1-weighted MR images from the largest long-term study of brain development and child health. The target variable was pre-residualized on data collection site, sociodemographic variables, and brain volume, thus being made independent of potentially informative factors that are not directly related to brain functioning. We investigate both feature extraction and deep learning approaches, as well as different deep CNN architectures and their ensembles. We propose an advanced ensemble of VoxCNN architectures, which yields an MSE of 92.838 on the blind test set.






1 Introduction

Understanding cognitive development in children may potentially improve their health outcomes through adolescence. Thus, determining the neural mechanisms underlying general intelligence is a critical task. Fluid intelligence is one of the two discrete factors of general intelligence.

Fluid intelligence is the capacity to think logically and solve problems in novel situations, independent of acquired knowledge. It involves the ability to identify patterns and relationships that underpin novel problems and to extrapolate these findings using logic [Car93].

There is research devoted to fluid intelligence prediction based on different brain imaging techniques and extracted features [ZLL18, PLN16]. However, the authors could not identify robust biomarkers or methods to predict fluid intelligence scores.

Deep learning approaches, and convolutional neural networks in particular, have shown high potential in image classification, recognition, and processing, and thus could be considered useful for predicting fluid intelligence scores from MRI data (3D brain images).

The advantage of deep learning methods is the ability to automatically derive complex and informative features from the raw data during the training process. This allows training a neural network directly on high-dimensional 3D brain imaging data, skipping the feature extraction step.

By design, neural architectures for deep learning are built in a modular way, with basic building blocks, such as composite convolutional layers, typically reused across many models and applications. This enables the standardization of deep learning architectures, with much research devoted to the exploration of pre-built layers and pre-trained activations (for transfer learning, image retrieval, etc.). However, the choice of an appropriate architecture for specific clinical applications, such as cognitive potential prediction or pathology classification, remains an open problem and requires further investigation.

In the present study we carry out an extensive experimental evaluation of deep voxelwise neural network architectures for fluid intelligence scores prediction based on MRI data with multimodal input structure.

The article has the following structure. In Section 2 we overview deep network architectures used for MRI data processing. In Section 3 we present the training dataset and our deep network architecture. We describe obtained results in Section 4, provide discussions in Section 5 and draw conclusions in Section 6.

2 Related work

There are a number of successful applications of convolutional neural networks (CNNs) with different architectures for the segmentation of MRI data. Many of these solutions are based on adapting existing approaches for analyzing 2D images to the processing of three-dimensional data.

For example, for segmentation of the brain, an architecture similar to ResNet [HZRS16] was proposed, which expands the possibilities of deep residual learning for processing volumetric MRI data using 3D filters in convolutional layers. The model, called VoxResNet [CDY18], consists of volumetric residual blocks (VoxRes blocks), containing convolutional layers as well as several deconvolutional layers. The authors demonstrated the potential of ResNet-like volumetric architectures, achieving better results than many modern methods of MRI image segmentation [MNA16]. Convolutional neural networks also showed good classification results in problems associated with neuropsychiatric diseases such as Alzheimer’s disease.

A recently proposed classification model with a VGG-like architecture, called VoxCNN, was used for neurodegenerative disease classification [HAGEB16]. Its results were more accurate than, or comparable to, earlier approaches that use previously extracted lower-dimensional morphometric brain characteristics [SAA18, SAK18, ISA18].

This indicates that convolutional networks can be applied directly to raw neuroimaging data without loss of model performance or over-fitting, which allows skipping the feature extraction step.

However, to the best of our knowledge, there has not been much work on the use of convolutional networks for predicting fluid intelligence from MRI.

3 Materials and Methods

3.1 Data set

The training data set is provided by the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge 2019, https://sibis.sri.com/abcd-np-challenge/). The data consist of T1-weighted MRI images for four thousand individuals (aged 9-10 years) and corresponding sociodemographic variables [HHM18]. The participants' fluid intelligence scores (4154 subjects: 3739 for training and 415 for validation) are also provided.

3.2 Target processing

The fluid intelligence scores were pre-residualized on data collection site, sociodemographic variables, and brain volume. For that, a linear regression model was fitted with fluid intelligence as the dependent variable and brain volume, data collection site, age at baseline, sex at birth, race/ethnicity, highest parental education, parental income, and parental marital status as independent variables.


The obtained residuals are used as targets to be predicted by a regression model.
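As a minimal sketch (using synthetic data; the actual ABCD confound encoding is not reproduced here), the pre-residualization step above can be expressed as an ordinary least-squares fit whose residuals become the regression targets:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100

# Hypothetical confound matrix: brain volume plus encoded site/demographic variables.
confounds = rng.normal(size=(n, 5))
fluid_iq = confounds @ rng.normal(size=5) + rng.normal(size=n)

# Regress the target on the confounds and keep only the residuals.
lin = LinearRegression().fit(confounds, fluid_iq)
residuals = fluid_iq - lin.predict(confounds)

# By construction, the residuals are (numerically) orthogonal to each confound.
print(residuals.shape)  # (100,)
```

The residuals are what the CNN is then asked to predict, so the network cannot trivially exploit site or demographic information.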

3.3 MRI data processing

The imaging dataset consists of skull-stripped images affinely aligned to the SRI24 atlas [RZSP10] and segmented into regions of interest according to the atlas, together with the corresponding volume scores of each ROI [PKB17]. The T1-weighted MRI was processed according to the Minimal Processing Pipeline of the ABCD study [HHM18].

The cross-sectional component of the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) pipeline [BBT15] was applied to the T1 images. The steps included noise removal and field inhomogeneity correction confined to the brain mask, defined by non-rigidly aligning the SRI24 atlas to the T1w MRI via Advanced Normalization Tools (ANTS) [ATS09].

The brain mask was refined by majority voting across maps extracted by FSL BET [Smi02], AFNI 3dSkullStrip [Cox96], FreeSurfer mri_gcut [SZCZ10], and the Robust Brain Extraction (ROBEX) method [ILTT11], which were applied to combinations of bias- and non-bias-corrected T1w images. Using the refined mask, image inhomogeneity correction was repeated and the skull-stripped T1w image was segmented into brain tissue (gray matter, white matter, and cerebrospinal fluid) via Atropos [ATW11]. Gray matter tissue was further parcelled according to the SRI24 atlas, which was non-rigidly registered to the T1w image via ANTS.
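The voxelwise majority vote used to combine the candidate brain masks can be sketched as follows (a simplified illustration; in the actual pipeline the votes come from the BET, 3dSkullStrip, FreeSurfer, and ROBEX outputs):

```python
import numpy as np

def majority_vote(masks):
    """Combine binary brain masks by voxelwise majority vote.

    A voxel is kept in the refined mask only if more than half of the
    candidate masks include it.
    """
    stacked = np.stack(masks, axis=0)
    return (stacked.sum(axis=0) > len(masks) / 2).astype(np.uint8)

# Toy example with three 2x2 "masks" standing in for full 3D volumes.
masks = [
    np.array([[1, 1], [0, 0]]),
    np.array([[1, 0], [0, 1]]),
    np.array([[1, 1], [0, 0]]),
]
print(majority_vote(masks))  # [[1 1] [0 0]]
```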

3.4 Specifications of the investigated models

We use an ensemble of deep neural networks with the VoxCNN architecture [KSBD17, PAS18] to solve the regression problem. This architecture has already demonstrated successful applications to brain image analysis tasks. To provide better convergence and stronger regularization, we enhanced this architecture.

VoxCNN networks are similar to the VGG architecture [SZ14], a popular architecture for 2D image classification; VoxCNN applies 3D convolutions to deal with three-dimensional MRI brain scans.

The proposed network consists of four blocks, each containing two convolutional layers with 3D convolutions followed by batch normalization and a ReLU activation function. The number of filters in the convolutional layers starts at 16 in the first block and doubles with each subsequent block. The filters of the very first layer are applied with a stride to reduce the dimension of the original image. Our experiments have shown that this step does not reduce network performance but helps to speed up convergence and to meet GPU memory limitations. The blocks are separated by max-pooling layers. We also apply 3D dropout after each pooling layer to promote independence between feature maps and reduce over-fitting.


Next, the feature maps extracted by the convolutional layers are fed into a fully connected layer with 1024 hidden units, batch normalization, ReLU activation, and dropout regularization, and then into the final layer with a single unit and no non-linearity.
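A minimal PyTorch sketch of the architecture described above, under stated assumptions: the exact stride value, the dropout rates, and the pooling that precedes the fully connected layer are not specified in the text, so the choices below (stride 2, dropout 0.2/0.5, adaptive average pooling) are illustrative only:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, first_stride=1):
    # Two 3D convolutions, each followed by batch norm and ReLU.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=first_stride, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class VoxCNN(nn.Module):
    def __init__(self, in_channels=2):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 16, first_stride=2),  # stride value assumed
            nn.MaxPool3d(2), nn.Dropout3d(0.2),           # dropout rate assumed
            conv_block(16, 32),
            nn.MaxPool3d(2), nn.Dropout3d(0.2),
            conv_block(32, 64),
            nn.MaxPool3d(2), nn.Dropout3d(0.2),
            conv_block(64, 128),
            nn.AdaptiveAvgPool3d(1),  # assumed pooling before the FC head
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, 1),  # single regression output, no non-linearity
        )

    def forward(self, x):
        return self.head(self.features(x))
```

The two input channels correspond to the two stacked modalities described in Section 3.4.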

It was previously shown that an auxiliary tower backpropagating the classification loss earlier in the network serves as an additional regularization mechanism [SLJ15, SVI16].

Therefore, an auxiliary output was added to the network to provide better training of the deeper layers. For this purpose, feature maps from an intermediate layer are fed into a separate fully connected layer to produce another target prediction, which is then added to the main network output with an adjusted weight. In our case, the output of the third block of convolutional layers was used to compute the auxiliary prediction, which is averaged with the main output with weights 0.4 and 0.6, respectively.
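The auxiliary-output scheme can be sketched as follows; the pooling inside the auxiliary head is an assumption, while the 0.4/0.6 weighting follows the text:

```python
import torch
import torch.nn as nn

class AuxHead(nn.Module):
    """Maps intermediate feature maps to a scalar auxiliary prediction.

    The global average pooling here is an assumption; the paper only
    states that intermediate feature maps are fed to a separate FC layer.
    """
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(channels, 1)

    def forward(self, feats):
        return self.fc(self.pool(feats).flatten(1))

def combine(main_out, aux_out, w_main=0.6, w_aux=0.4):
    # Weighted average of the main and auxiliary predictions (weights from the text).
    return w_main * main_out + w_aux * aux_out
```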

We estimate the quality of the models by the Mean Squared Error (MSE) between the predicted scores and the pre-residualized fluid intelligence scores. The models were trained by optimizing the MSE loss with the Adam optimizer. The learning rate was set to 3e-5, the batch size to 10, and each network was trained until the loss on the validation set started to increase.
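A schematic training loop with these settings might look as follows (a sketch: the stopping rule halts on the first increase of validation loss, which is one simple reading of the criterion above):

```python
import torch

def train_until_val_increases(model, train_loader, val_loader, max_epochs=100):
    # Adam with the learning rate from the paper; early stopping on validation MSE.
    opt = torch.optim.Adam(model.parameters(), lr=3e-5)
    loss_fn = torch.nn.MSELoss()
    best_val = float("inf")
    for _ in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(1), y)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x).squeeze(1), y).item() for x, y in val_loader)
        if val > best_val:  # stop when validation loss starts to increase
            break
        best_val = val
    return model
```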

To train the model, we use multi-modal input data: brain scan data (T1-weighted imagery after preprocessing) and gray-matter segmented brain masks. For each subject, the two three-dimensional images were stacked as channels of a single image. We fed the resulting two-channel 3D image into the VoxCNN network as input.
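Stacking the two modalities into a single two-channel input can be sketched as follows (array sizes are illustrative):

```python
import numpy as np

# Hypothetical volumes: preprocessed T1 image and gray-matter segmentation mask.
t1 = np.random.rand(64, 64, 64).astype(np.float32)
gm_mask = (np.random.rand(64, 64, 64) > 0.5).astype(np.float32)

# Stack along a new leading axis -> one 3D image with two channels,
# matching the (channels, depth, height, width) layout expected by Conv3d.
two_channel = np.stack([t1, gm_mask], axis=0)
print(two_channel.shape)  # (2, 64, 64, 64)
```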

We use cross-validation to increase model performance: we divide the training sample into two separate parts and train two neural networks with the same architecture on each part independently. For the validation subjects, an ensemble of these two models, defined as a weighted average of their predictions, is then applied. The averaging weights are determined from the validation performance of each model (the network with the lower validation MSE receives the larger weight). The number of layers, the stride, and the positions of the ReLU blocks were adjusted accordingly.
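The weighted averaging of the two models' predictions might be sketched as follows; the inverse-MSE weighting rule here is an assumption, since the text only states that the better model receives the larger weight:

```python
import numpy as np

def ensemble_predictions(pred_a, pred_b, mse_a, mse_b):
    """Weighted average of two models' predictions.

    The model with the lower validation MSE receives the larger weight.
    The inverse-MSE rule is an illustrative choice, not the paper's
    stated formula.
    """
    w_a, w_b = 1.0 / mse_a, 1.0 / mse_b
    return (w_a * pred_a + w_b * pred_b) / (w_a + w_b)

# Model A (lower MSE) pulls the ensemble output toward its own prediction.
out = ensemble_predictions(np.array([1.0]), np.array([3.0]), mse_a=1.0, mse_b=3.0)
print(out)  # [1.5]
```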

The training set consists of n = 3739 samples, the validation set of n = 415 samples, and the test set of n = 4515 samples.

The models were implemented in PyTorch and trained on a single GPU [CPC16].

(a) Straight-forward
(b) Architecture with auxiliary output
Figure 1: VoxCNN model architectures used for fluid target prediction.

4 Experimental results

Table 1 presents the deep neural network architectures used and the corresponding results for fluid intelligence prediction. The predictive capacity of brain morphometric characteristics is considered as the baseline.

# Model architecture MSE
1 Brain morphometry 71.293
2 VoxCNN on brain T1 imagery 71.777
3 VoxCNN on 3D segmented brain mask 72.094
4 Ensemble: VoxCNNs on T1 and segmented mask 71.314
5 Ensemble: VoxCNNs on T1, segmented mask with morphology features 70.635
Table 1: Model architectures and results on the Validation set.
# Model architecture MSE
1 Ensemble: VoxCNNs on T1 and segmented mask 92.8378
2 Ensemble: VoxCNNs on T1, segmented mask with morphology features 94.0808
Table 2: Model architectures and results for the fluid intelligence prediction on the Test set.

The most accurate prediction (in terms of MSE on the validation set) was obtained as a weighted average of the two predictions by VoxCNN neural networks trained on different parts of the training sample:

  1. VoxCNN network, trained on both brain T1 images and segmented images,

  2. VoxCNN network (with auxiliary head for better convergence), trained on brain T1 images, segmented images and additional socio-demographic data. We used segmented brain masks and full brain imagery after pre-processing.

As a result, the first and the second network architectures achieved comparable MSE scores on the Validation set. After averaging their predictions with adjusted weights, the final ensemble reached a validation MSE of 70.635 (Table 1).

On the Test set, the ensemble models yielded MSE scores of 92.838 and 94.081, respectively (Table 2).

5 Discussion

All constructed regression models achieved an MSE of approximately 71 on the Validation set. These results are comparable to the baseline result calculated from morphological characteristics.

This incremental improvement and the rather high errors across all models may stem from both the study design and data inconsistency: structural T1-weighted images alone may not be sufficient to predict fluid intelligence scores, while brain functional data such as fMRI might have more predictive power for cognitive assessment.

The top performing model was a combination (a weighted average of predictions) of two VoxCNN neural networks trained on different parts of the training sample, highlighting the potential strength of model ensembles; its MSE scores on the Validation and Test sets are given in Tables 1 and 2.

6 Conclusion

In our work, ensembles of VoxCNN networks were applied for the first time to a 3D brain imagery regression task. Based on the results, this architecture can be considered a consistent predictive tool for large datasets with heavy, multi-modal inputs.

Due to the rich structure of the considered dataset, there is room for further improvement. Future work on model hyperparameter optimization is needed to achieve better network convergence. Advanced approaches to the initialization of neural network parameters [BE16] and to the construction of ensembles [BP13] can be applied, and sparse 3D convolutions could decrease memory requirements [NKB18].

Transfer learning and domain adaptation techniques could potentially show better performance [GMK17, LZC17, GWB16]. We can also utilize multi-fidelity approaches when solving the regression problem with multi-modal data [BZ15, ZB17a, ZB17b]. The conformal prediction framework [KBB18, BV14, BN16] is a ready-to-use tool to assess prediction uncertainty.

The considered problem was formulated in the scope of the Project “Machine Learning and Pattern Recognition for the development of diagnostic and clinical prognostic prediction tools in psychiatry, borderline mental disorders, and neurology” (a part of the Skoltech Biomedical Initiative program).

Acknowledgements

The work was supported by the Russian Science Foundation under Grant 19-41-04109.


  • [ATS09] Brian B Avants, Nick Tustison, and Gang Song. Advanced normalization tools (ants). Insight j, 2:1–35, 2009.
  • [ATW11] Brian B Avants, Nicholas J Tustison, Jue Wu, Philip A Cook, and James C Gee. An open source multivariate framework for n-tissue segmentation with evaluation on public data. Neuroinformatics, 9(4):381–400, 2011.
  • [BBT15] Sandra A Brown, Ty Brumback, Kristin Tomlinson, Kevin Cummins, Wesley K Thompson, Bonnie J Nagel, Michael D De Bellis, Stephen R Hooper, Duncan B Clark, Tammy Chung, et al. The national consortium on alcohol and neurodevelopment in adolescence (ncanda): a multisite study of adolescent development and substance use. Journal of studies on alcohol and drugs, 76(6):895–908, 2015.
  • [BE16] E. Burnaev and P. Erofeev. The influence of parameter initialization on the training time and accuracy of a nonlinear regression model. Journal of Communications Technology and Electronics, 61(6):646–660, Jun 2016.
  • [BN16] E. Burnaev and I. Nazarov. Conformalized kernel ridge regression. In 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 45–52, 2016.
  • [BP13] E. V. Burnaev and P. V. Prikhod’ko. On a method for constructing ensembles of regression models. Automation and Remote Control, 74(10):1630–1644, Oct 2013.
  • [BV14] E. Burnaev and V. Vovk. Efficiency of conformalized ridge regression. In Maria Florina Balcan, Vitaly Feldman, and Csaba Szepesvari, editors, Proceedings of The 27th Conference on Learning Theory, volume 35 of Proceedings of Machine Learning Research, pages 605–622, Barcelona, Spain, 13–15 Jun 2014. PMLR.
  • [BZ15] E. Burnaev and A. Zaytsev. Surrogate modeling of multifidelity data for large samples. Journal of Communications Technology and Electronics, 60(12):1348–1355, 2015.
  • [Car93] John B. Carroll. Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press, 1993.
  • [CDY18] Hao Chen, Qi Dou, Lequan Yu, Jing Qin, and Pheng-Ann Heng. Voxresnet: Deep voxelwise residual networks for brain segmentation from 3d mr images. NeuroImage, 170:446–455, 2018.
  • [Cox96] Robert W Cox. Afni: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical research, 29(3):162–173, 1996.
  • [CPC16] Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678, 2016.
  • [ESH19] Konstantin Eckle and Johannes Schmidt-Hieber. A comparison of deep networks with relu activation function and linear spline-type methods. Neural Networks, 110:232–242, 2019.
  • [GMK17] Mohsen Ghafoorian, Alireza Mehrtash, Tina Kapur, Nico Karssemeijer, Elena Marchiori, Mehran Pesteie, Charles RG Guttmann, Frank-Erik de Leeuw, Clare M Tempany, Bram van Ginneken, et al. Transfer learning for domain adaptation in mri: Application in brain lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 516–524. Springer, 2017.
  • [GWB16] Michael Goetz, Christian Weber, Franciszek Binczyk, Joanna Polanska, Rafal Tarnawski, Barbara Bobek-Billewicz, Ullrich Koethe, Jens Kleesiek, Bram Stieltjes, and Klaus H Maier-Hein. Dalsa: domain adaptation for supervised learning from sparsely annotated mr images. IEEE transactions on medical imaging, 35(1):184–196, 2016.
  • [HAGEB16] Ehsan Hosseini-Asl, Georgy Gimel’farb, and Ayman El-Baz. Alzheimer’s disease diagnostics by a deeply supervised adaptable 3d convolutional network. arXiv preprint arXiv:1607.00556, 2016.
  • [HHM18] Donald J Hagler, Sean N Hatton, Carolina Makowski, M Daniela Cornejo, Damien A Fair, Anthony Steven Dick, Matthew T Sutherland, BJ Casey, Deanna M Barch, Michael P Harms, et al. Image processing and analysis methods for the adolescent brain cognitive development study. bioRxiv, page 457739, 2018.
  • [HZRS16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [ILTT11] Juan Eugenio Iglesias, Cheng-Yi Liu, Paul M Thompson, and Zhuowen Tu. Robust brain extraction across datasets and comparison with publicly available methods. IEEE transactions on medical imaging, 30(9):1617–1634, 2011.
  • [ISA18] S. Ivanov, M. Sharaev, A. Artemov, E. Kondratyeva, A. Cichocki, S. Sushchinskaya, E. Burnaev, and A. Bernstein. Learning connectivity patterns via graph kernels for fmri-based depression diagnostics. In Proc. of IEEE International Conference on Data Mining Workshops (ICDMW), pages 308–314, 2018.
  • [KBB18] A. Kuleshov, A. Bernstein, and E. Burnaev. Conformal prediction in manifold learning. In Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Evgueni Smirnov, and Ralf Peeters, editors, Proceedings of the Seventh Workshop on Conformal and Probabilistic Prediction and Applications, volume 91 of Proceedings of Machine Learning Research, pages 234–253. PMLR, 11–13 Jun 2018.
  • [KSBD17] Sergey Korolev, Amir Safiullin, Mikhail Belyaev, and Yulia Dodonova. Residual and plain convolutional neural networks for 3d brain mri classification. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 835–838. IEEE, 2017.
  • [LZC17] Hao Lu, Lei Zhang, Zhiguo Cao, Wei Wei, Ke Xian, Chunhua Shen, and Anton van den Hengel. When unsupervised domain adaptation meets tensor representations. In Proceedings of the IEEE International Conference on Computer Vision, pages 599–608, 2017.
  • [MNA16] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
  • [NKB18] A. Notchenko, Ye. Kapushev, and E. Burnaev. Large-scale shape retrieval with sparse 3d convolutional neural networks. In Wil M.P. van der Aalst, D. Ignatov, M. Khachay, and et al., editors, Analysis of Images, Social Networks and Texts, pages 245–254, Cham, 2018. Springer International Publishing.
  • [PAS18] Marina Pominova, Alexey Artemov, Maksim Sharaev, Ekaterina Kondrateva, Alexander Bernstein, and Evgeny Burnaev. Voxelwise 3d convolutional and recurrent neural networks for epilepsy and depression diagnostics from structural and functional mri data. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pages 299–307. IEEE, 2018.
  • [PKB17] Adolf Pfefferbaum, Dongjin Kwon, Ty Brumback, Wesley K Thompson, Kevin Cummins, Susan F Tapert, Sandra A Brown, Ian M Colrain, Fiona C Baker, Devin Prouty, et al. Altered brain developmental trajectories in adolescents after initiating drinking. American journal of psychiatry, 175(4):370–380, 2017.
  • [PLN16] Erick J Paul, Ryan J Larsen, Aki Nikolaidis, Nathan Ward, Charles H Hillman, Neal J Cohen, Arthur F Kramer, and Aron K Barbey. Dissociable brain biomarkers of fluid intelligence. NeuroImage, 137:201–211, 2016.
  • [RZSP10] Torsten Rohlfing, Natalie M Zahr, Edith V Sullivan, and Adolf Pfefferbaum. The sri24 multichannel atlas of normal adult human brain structure. Human brain mapping, 31(5):798–819, 2010.
  • [SAA18] M. Sharaev, A. Andreev, A. Artemov, E. Burnaev, E. Kondratyeva, S. Sushchinskaya, I. Samotaeva, V. Gaskin, and A. Bernstein. Pattern recognition pipeline for neuroimaging data. In Luca Pancioni, Friedhelm Schwenker, and Edmondo Trentin, editors, Artificial Neural Networks in Pattern Recognition, pages 306–319, Cham, 2018. Springer International Publishing.
  • [SAK18] M. Sharaev, A. Artemov, E. Kondratyeva, S. Sushchinskaya, E. Burnaev, A. Bernstein, R. Akzhigitov, and A. Andreev. Mri-based diagnostics of depression concomitant with epilepsy: in search of the potential biomarkers. In Proceedings of IEEE 5th International Conference on Data Science and Advanced Analytics, pages 555–564, 2018.
  • [SLJ15] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • [Smi02] Stephen M Smith. Fast robust automated brain extraction. Human brain mapping, 17(3):143–155, 2002.
  • [SVI16] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [SZ14] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [SZCZ10] Suresh A Sadananthan, Weili Zheng, Michael WL Chee, and Vitali Zagorodnov. Skull stripping using graph cuts. NeuroImage, 49(1):225–239, 2010.
  • [TGJ15] Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. Efficient object localization using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 648–656, 2015.
  • [ZB17a] A. Zaytsev and E. Burnaev. Large scale variable fidelity surrogate modeling. Annals of Mathematics and Artificial Intelligence, 81(1):167–186, Oct 2017.
  • [ZB17b] A. Zaytsev and E. Burnaev. Minimax approach to variable fidelity data interpolation. In Aarti Singh and Jerry Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 652–661, Fort Lauderdale, FL, USA, 20–22 Apr 2017. PMLR.
  • [ZLL18] Meifang Zhu, Bing Liu, and Jin Li. Prediction of general fluid intelligence using cortical measurements and underlying genetic mechanisms. In IOP Conference Series: Materials Science and Engineering, volume 381, page 012186. IOP Publishing, 2018.