Fluid intelligence (Gf) refers to the ability to reason and to solve new problems independently of previously acquired knowledge. Gf is critical for a wide variety of cognitive tasks, and it is considered one of the most important factors in learning. Moreover, Gf is closely related to professional and educational success, especially in complex and demanding environments [7]. The ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge 2019) provides 8556 subjects, age 9-10 years, with T1-weighted MR images and fluid intelligence scores; the scores are withheld for the testing subjects. The motivation of the ABCD-NP-Challenge 2019 is to discover the relationship between the brain and behavioral measures by leveraging modern machine learning methods.
A few recent studies use structural MR images to predict fluid intelligence. Paul et al. [13] demonstrated that brain volume is correlated with quantitative reasoning and working memory. Wang et al. [19] proposed a novel framework for estimating a subject's intelligence quotient score with sparse learning based on neuroimaging features. In this work, we use the T1-weighted MR images of adolescents to predict their fluid intelligence with a StackNet. While whole-brain volumes have been examined in relation to aspects of intelligence, to our knowledge no previous work has examined the predictive ability of whole-brain parcellation distributions for fluid intelligence. The main contributions of our work are two-fold: (1) predicting pre-residualized fluid intelligence from parcellation volume distributions, and (2) showing the significance of the volume of each region for the overall prediction.
2 Materials and Methods
2.1 Dataset

The Adolescent Brain Cognitive Development Neurocognitive Prediction Challenge (ABCD-NP-Challenge 2019) [4, 6, 8, 15, 18] provides data for 3739 training subjects, 415 validation subjects, and 4402 testing subjects (age 9-10 years). A T1-weighted MR image is given for each subject, but fluid intelligence scores are provided only for the training and validation subjects. The MR-T1 images are distributed after skull-stripping and registration to the SRI 24 atlas [16]. In addition to the MR-T1 images, the distributions of gray matter, white matter, and cerebrospinal fluid in different regions of interest of the SRI 24 atlas are provided for all subjects. The fluid intelligence scores are pre-residualized on data collection site, sociodemographic variables, and brain volume; the provided scores should therefore represent differences in Gf not due to these known factors.
2.2 StackNet Design
StackNet [10] is a computational, scalable, and analytical framework that resembles a feed-forward neural network. It uses Wolpert's stacked generalization [20] in multiple levels to improve the accuracy of a classifier or reduce the error of a regressor. In contrast to the backward propagation used by feed-forward neural networks during the training phase, StackNet is built iteratively one layer at a time (using stacked generalization), with each layer using the final target as its target.
There are two different modes of StackNet: (i) each layer directly uses the predictions from only the previous layer, and (ii) each layer uses the predictions from all previous layers, including the input layer; this is called restacking mode. StackNet is usually better than the best single model contained in its first layer. However, its ability to perform well still relies on a mix of strong and diverse single models in order to get the best out of this meta-modeling methodology.
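The restacking idea can be sketched with scikit-learn's `StackingRegressor`, whose `passthrough=True` option feeds the raw input features to the meta-learner alongside the base-model predictions. The models and data below are illustrative placeholders, not the paper's configuration:

```python
# Minimal sketch of stacked generalization in "restacking" mode.
# passthrough=True means the final estimator sees the input features
# in addition to the base models' cross-validated predictions.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge(),
    passthrough=True,  # restacking: meta-learner also sees raw features
    cv=5,
)
stack.fit(X, y)
print(stack.predict(X[:3]).shape)  # (3,)
```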
We adapt the StackNet architecture to our problem based on the following ideas: (i) including more models that have similar prediction performance, (ii) having a linear model in each layer, (iii) placing models with better performance in a higher layer, and (iv) increasing the diversity in each layer. The resulting StackNet, shown in Fig. 1, consists of three layers and 11 models. These models include one Bayesian ridge regressor [9], four random forest regressors [1], three extra-trees regressors [5], one gradient boosting regressor [3], one kernel ridge regressor [12], and one ridge regressor. The first layer has one linear regressor and five ensemble-based regressors, the second layer contains one linear regressor and two ensemble-based regressors, and the third layer has only one linear regressor. Each layer uses the predictions from all previous layers, including the input layer.
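The layer structure above can be approximated in scikit-learn by nesting `StackingRegressor` objects: the outer object holds the first-layer models, and its `final_estimator` is itself a stack holding the second- and third-layer models. All hyperparameters below are placeholders, not the tuned values from the paper:

```python
# Illustrative approximation of the three-layer StackNet:
# layer 1: one linear + five ensemble regressors,
# layer 2: one linear + two ensemble regressors,
# layer 3: one linear regressor.
# passthrough=True lets each layer see earlier predictions plus the input.
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              ExtraTreesRegressor, GradientBoostingRegressor)
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import BayesianRidge, Ridge

layer2 = StackingRegressor(
    estimators=[("ridge", Ridge()),
                ("rf", RandomForestRegressor(n_estimators=10, random_state=0)),
                ("et", ExtraTreesRegressor(n_estimators=10, random_state=0))],
    final_estimator=Ridge(),  # layer 3: single linear model
    passthrough=True)

stacknet = StackingRegressor(
    estimators=[("br", BayesianRidge()),
                ("rf1", RandomForestRegressor(n_estimators=10, random_state=1)),
                ("rf2", RandomForestRegressor(n_estimators=20, random_state=2)),
                ("et1", ExtraTreesRegressor(n_estimators=10, random_state=3)),
                ("gb", GradientBoostingRegressor(random_state=4)),
                ("kr", KernelRidge())],
    final_estimator=layer2,
    passthrough=True)
```

Calling `stacknet.fit(X, y)` then trains all layers with internal cross-validated predictions, mirroring the layer-wise construction described above.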
2.3 Predicting Gf using Structural MR Images and StackNet
Fig. 2 shows the framework for predicting fluid intelligence scores from MR-T1 images with a StackNet. The framework is implemented with the scikit-learn [2, 14] Python library. In the training phase, features are extracted from the MR-T1 images of the training and validation subjects. We then apply normalization and feature selection to the extracted features. Finally, these pre-processed features are used to train the StackNet in Fig. 1. In the testing phase, features are extracted from the MR-T1 images of the testing subjects, and the same feature pre-processing factors are applied to these extracted features. Thereafter, the pre-processed features are used with the trained StackNet to predict the fluid intelligence of the testing subjects. Details of each step are described below.
2.3.1 Feature Extraction:
The ABCD-NP-Challenge 2019 data includes a pre-computed 122-dimensional feature vector that characterizes the volumes of the brain tissues, i.e., gray matter, white matter, and cerebrospinal fluid, parcellated into SRI 24 [16] regions. The feature vector extracted for each subject is defined as $\mathbf{x}_i = (x_{i,1}, \dots, x_{i,122})$, where $i$ is the index of the subject and $j$ in $x_{i,j}$ is the index of the feature dimension.
We apply a standard score normalization to each feature dimension, $\hat{x}_{i,j} = (x_{i,j} - \mu_j) / \sigma_j$, where $i$ is the index of the subject, $j$ is the index of the feature dimension, $\hat{x}_{i,j}$ and $x_{i,j}$ are the normalized and raw feature values of subject $i$, respectively, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature dimension $j$, respectively.
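The standard-score normalization above can be sketched in a few lines of numpy; it is equivalent to scikit-learn's `StandardScaler`:

```python
# Z-score normalization: each column (feature dimension) is shifted to
# zero mean and scaled to unit standard deviation across subjects.
import numpy as np

def zscore(X):
    """Normalize each column of X to zero mean and unit variance."""
    mu = X.mean(axis=0)       # per-dimension mean
    sigma = X.std(axis=0)     # per-dimension standard deviation
    return (X - mu) / sigma

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xn = zscore(X)
print(Xn.mean(axis=0))  # ~[0. 0.]
```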
2.3.2 Feature Selection:
Feature selection consists of three steps: (i) reducing the noise of the data and generating an accurate representation of it through principal component analysis (PCA) with the maximum-likelihood estimator [11], (ii) removing the feature dimensions with low variance between subjects, and (iii) selecting the 24 feature dimensions with the highest correlations to the ground-truth Gf scores through univariate linear regression tests. Thereafter, the number of feature dimensions shrinks from 120 to 24.
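These three steps map directly onto scikit-learn components: `PCA(n_components="mle")` for Minka's maximum-likelihood choice of dimensionality, `VarianceThreshold` for step (ii), and `SelectKBest` with `f_regression` (univariate linear regression tests) for step (iii). The synthetic data and the variance threshold value are assumptions for illustration:

```python
# Sketch of the three-step feature selection pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_regression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 40))                    # latent structure
W = rng.normal(size=(40, 122))
X = Z @ W + 0.1 * rng.normal(size=(300, 122))     # 122 raw feature dims
y = Z[:, 0] + 0.5 * rng.normal(size=300)          # stand-in for Gf scores

selector = Pipeline([
    ("pca", PCA(n_components="mle")),             # (i) MLE choice of dims
    ("var", VarianceThreshold(threshold=0.0)),    # (ii) threshold assumed
    ("kbest", SelectKBest(f_regression, k=24)),   # (iii) keep top 24 dims
])
X24 = selector.fit_transform(X, y)
print(X24.shape)  # (300, 24)
```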
2.3.3 Training a StackNet:
Because the means of the pre-residualized fluid intelligence scores for the training and validation datasets are quite different, we combine these two datasets for hyperparameter optimization and for training the StackNet.
2.3.4 Predicting Fluid Intelligence:
In the testing phase, we first apply the same pre-processing factors used in the training phase to the extracted features of testing subjects. We then use the trained StackNet with these pre-processed features to predict the fluid intelligence scores of testing subjects.
2.3.5 Evaluation Metric:
The mean squared error (MSE) is used to calculate the error between the predicted Gf scores and the corresponding ground-truth Gf scores.
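For concreteness, the metric can be computed with scikit-learn's `mean_squared_error` (the scores here are made up):

```python
# MSE between predicted and ground-truth Gf scores.
from sklearn.metrics import mean_squared_error

y_true = [85.0, 92.0, 78.0]
y_pred = [83.0, 95.0, 80.0]
print(mean_squared_error(y_true, y_pred))  # (4 + 9 + 4) / 3 ≈ 5.67
```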
2.4 Computing Feature Importance
We would like to discover the correlation between the Gf score and the brain tissue volume in each region. Thus, we compute the importance of each feature dimension, where higher importance represents higher correlation. However, after feature selection, the original data space of dimension 122 is projected and reduced to a new space of dimension 24. In this new space, we first compute the importance of each feature dimension and then backward propagate it to the original data space of dimension 122. The details are explained as follows.
After dimensionality reduction, we obtain the individual correlations between the remaining 24 feature dimensions and the ground-truth Gf scores. These correlations are first converted to F values and then normalized w.r.t. the sum of the F values, i.e., $\hat{F}_k = F_k / \sum_{k=1}^{24} F_k$, where $\hat{F}_k$ is the normalized F value of feature dimension $k$. Let $\mathbf{v}_k$ and $\lambda_k$ be the eigenvector and eigenvalue from PCA corresponding to feature dimension $k$, respectively; the dimension of $\mathbf{v}_k$ is 122. We also normalize the eigenvalues w.r.t. their sum, i.e., $\hat{\lambda}_k = \lambda_k / \sum_{k=1}^{24} \lambda_k$. The normalization of the eigenvalues is required to ensure that they have the same scale as the F values. Thereafter, we use $\hat{F}_k$, $\hat{\lambda}_k$, and $\mathbf{v}_k$ to build the feature importance matrix $M$, whose $k$-th row is $\hat{F}_k \hat{\lambda}_k \mathbf{v}_k^\top$. In the end, we sum up the absolute values of the elements of $M$ over its 24 rows, $I_j = \sum_{k=1}^{24} |M_{k,j}|$, to obtain the feature importance vector $\mathbf{I}$ in the original data space, which we normalize w.r.t. its total importance, $\hat{I}_j = I_j / \sum_{j=1}^{122} I_j$, and rescale. Now, $\hat{\mathbf{I}}$ is the normalized feature importance vector in the original data space of dimension 122, and each value of this vector represents the importance of a brain tissue volume in a region for the task of predicting the Gf scores. Higher importance represents higher correlation to the Gf score.
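Under one plausible reading of this procedure, the back-projection can be sketched in numpy. The F values, eigenvectors, and eigenvalues below are random placeholders; in the paper they come from the univariate regression tests and the fitted PCA:

```python
# Back-project feature importance from the 24 selected dimensions to the
# original 122-dimensional space, weighting each eigenvector by its
# normalized F value and normalized eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
F = rng.uniform(1.0, 10.0, size=24)     # F values of the 24 kept dims (placeholder)
V = rng.normal(size=(24, 122))          # corresponding PCA eigenvectors (placeholder)
lam = rng.uniform(0.5, 5.0, size=24)    # corresponding PCA eigenvalues (placeholder)

F_hat = F / F.sum()                     # normalized F values
lam_hat = lam / lam.sum()               # normalized eigenvalues
M = (F_hat * lam_hat)[:, None] * V      # 24 x 122 feature importance matrix
I = np.abs(M).sum(axis=0)               # sum |entries| over the 24 rows -> (122,)
I_hat = I / I.sum()                     # normalize to total importance 1
print(I_hat.shape, round(I_hat.sum(), 6))  # (122,) 1.0
```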
3 Results and Discussion
We examine the Gf prediction performance of the individual models and the StackNet on the combined dataset with 10-fold cross-validation; the quantitative results are shown in Table 1. The baseline is calculated by assigning the mean fluid intelligence of the combined dataset to every subject. From Table 1, the performance of each model is better than this baseline, and the performance of the StackNet is better than that of every individual model it contains because it takes advantage of stacked generalization.
The proposed StackNet in Fig. 1 differs from the StackNet used to report the MSE on the validation leaderboard. That StackNet has two layers and 8 models, and it achieves an MSE of 84.04 and 70.56 (rank 7 out of 17 teams) on the training and validation sets, respectively. However, we noticed that the statistics of the training and validation sets are quite different, so we decided to combine the two datasets and work on the combined dataset using 10-fold cross-validation. In addition, we ensured that the mean and standard deviation of each fold are similar to those of the combined dataset. The source code is available on GitHub at https://github.com/pykao/ABCD-MICCAI2019.
Second, we compute the importance of each dimension of the extracted feature by leveraging the F scores from feature selection and the eigenvectors and eigenvalues from PCA, as described in Section 2.4. Each dimension of the extracted feature corresponds to the volume of a certain type of brain tissue in a certain region. Table 2 and Table 3 show the top 10 most and least important feature dimensions for the task of predicting Gf, respectively; higher importance represents higher correlation to the Gf scores.
In conclusion, we demonstrate that the proposed StackNet, using the distributions of different brain tissues across brain parcellation regions, has the potential to predict fluid intelligence in adolescents.
Table 2. Top 10 most important feature dimensions for predicting Gf.

| Feature dimension | Importance |
| --- | --- |
| Pons white matter volume | 1.18 |
| Right insula gray matter volume | 1.13 |
| Right inferior temporal gyrus gray matter volume | 1.11 |
| Corpus callosum white matter volume | 1.08 |
| Cerebellum hemisphere white matter right volume | 1.07 |
| Cerebellum hemisphere white matter left volume | 1.06 |
| Left inferior temporal gyrus gray matter volume | 1.06 |
| Left insula gray matter volume | 1.06 |
| Left superior frontal gyrus, orbital part gray matter volume | 1.05 |
| Left opercular part of inferior frontal gyrus gray matter volume | 1.05 |
Table 3. Top 10 least important feature dimensions for predicting Gf.

| Feature dimension | Importance |
| --- | --- |
| Right hippocampus gray matter volume | 0.53 |
| Right amygdala gray matter volume | 0.54 |
| Left hippocampus gray matter volume | 0.56 |
| Right caudate nucleus gray matter volume | 0.58 |
| Right lobule IX of cerebellar hemisphere volume | 0.60 |
| Right lobule X of cerebellar hemisphere (flocculus) volume | 0.60 |
| Left lobule X of cerebellar hemisphere (flocculus) volume | 0.61 |
| Right superior parietal lobule gray matter volume | 0.61 |
| Left middle temporal pole gray matter volume | 0.63 |
| Left lobule IX of cerebellar hemisphere volume | 0.63 |
This research was partially supported by a National Institutes of Health (NIH) award # 5R01NS103774-02.
-  Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
-  Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning. pp. 108–122 (2013)
-  Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Annals of statistics pp. 1189–1232 (2001)
-  Garavan, H., et al.: Recruiting the ABCD sample: Design considerations and procedures. Developmental cognitive neuroscience 32, 16–22 (2018)
-  Geurts, P., et al.: Extremely randomized trees. Machine learning 63(1), 3–42 (2006)
-  Hagler, D.J., et al.: Image processing and analysis methods for the adolescent brain cognitive development study. bioRxiv (2018). https://doi.org/10.1101/457739
-  Jaeggi, S.M., et al.: Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences 105(19), 6829–6833 (2008)
-  Luciana, M., et al.: Adolescent neurocognitive development and impacts of substance use: Overview of the adolescent brain cognitive development (ABCD) baseline neurocognition battery. Developmental cognitive neuroscience 32, 67–79 (2018)
-  MacKay, D.J.: Bayesian interpolation. Neural computation 4(3), 415–447 (1992)
-  Michailidis, M.: StackNet, meta-modelling framework. https://github.com/kaz-Anova/StackNet (2017)
-  Minka, T.P.: Automatic choice of dimensionality for PCA. In: Advances in neural information processing systems. pp. 598–604 (2001)
-  Murphy, K.P.: Machine learning: a probabilistic perspective (2012)
-  Paul, E.J., et al.: Dissociable brain biomarkers of fluid intelligence. NeuroImage 137, 201–211 (2016)
-  Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
-  Pfefferbaum, A., et al.: Altered brain developmental trajectories in adolescents after initiating drinking. American journal of psychiatry 175(4), 370–380 (2017)
-  Rohlfing, T., et al.: The SRI24 multichannel atlas of normal adult human brain structure. Human brain mapping 31(5), 798–819 (2010)
-  Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61(3), 611–622 (1999)
-  Volkow, N.D., et al.: The conception of the ABCD study: From substance use to a broad NIH collaboration. Developmental cognitive neuroscience 32, 4–7 (2018)
-  Wang, L., et al.: MRI-based intelligence quotient (IQ) estimation with sparse learning. PLoS ONE 10(3), e0117295 (2015)
-  Wolpert, D.H.: Stacked generalization. Neural networks 5(2), 241–259 (1992)