The human brain undergoes rapid development in the prenatal period. Previous studies show significant links between abnormal brain development in utero and adverse postnatal outcomes [1, 4]. As a result, understanding fetal neurodevelopment, particularly timescales and patterns for maturation of functional systems, is of great clinical importance.
Recent methodological advances in fetal functional Magnetic Resonance Imaging (fMRI) have attracted much attention [7, 4]. One promising advance is the application of CNN-based methods to fetal brain MRI segmentation [11]. For fetal neurodevelopmental studies, existing works focus on analysis of functional connectivity in the fetal brain, such as the discovery of functional networks and the identification of highly connected hubs [5, 18]. Those studies are based on functional connectivity measures (i.e., adjacency matrices of Fisher z-scores) extracted from spatio-temporal 4D resting-state fMRI. However, from the perspective of computational methodology, those approaches rely heavily on the data processing steps used to extract z-scores and use only the temporal information in the fMRI images, thereby missing the spatial information in the 2D or 3D images.
There is precedent for direct learning from MRI images in children and adults. For example, CNNs have been used to classify neurological disorders such as Autism Spectrum Disorder (ASD) [10, 21] and Alzheimer's disease [6, 8], and to characterize brain functional networks [20]. Further, a small number of studies have built CNN models to predict age from fMRI data [9]. Probably owing to the very limited availability of fetal fMRI data, CNN models have not yet been applied to understanding fetal neurodevelopment.
To bridge this methodological gap, we propose a novel application of deep CNNs, in conjunction with CNN interpretation techniques, to the analysis of fetal brain fMRI data, leveraging the ability of CNNs to learn high-level feature representations. To the best of our knowledge, this work represents the first application of CNN models as a data-driven approach to understanding fetal developmental processes. Specifically, as age and brain development are closely linked, we use the age group (younger vs. older) as the surrogate target and train a supervised CNN directly on the residual fMRI images to capture associations between age and the blood-oxygen-level-dependent (BOLD) signal. The trained model achieves better predictive performance than traditional machine learning algorithms, demonstrating its effectiveness in learning directly from fetal fMRI images. More importantly, because a trained CNN can be interpreted, we are able to highlight the brain regions that most strongly influence model predictions. The regions identified as important are clinically plausible and match existing studies. Furthermore, the trained CNN also points to novel, clinically sensible regions that are potentially informative in characterizing brain development in fetuses, demonstrating its promise as a data-driven approach to enhance our understanding of brain development and to facilitate fetal developmental studies.
2.1 Data Acquisition and Preprocessing
Subjects in our analysis were a subset of a cohort from a longitudinal study of fetal brain development. The cohort consists of 148 pregnant women who underwent MRI at a gestational age (GA) between 24 and 40 weeks. We manually screened subjects for data quality (high motion, artifacts, etc.) and inclusion criteria (gestational age, born without nervous system abnormality, etc.), leaving 75 qualified subjects: 30 subjects with GA between 26 and 29 weeks (younger group), and 45 with GA between 34 and 37 weeks (older group).
MRI data were acquired on a Siemens Verio 70-cm open-bore 3T MR system. Resting-state fMRI data were acquired using a gradient echo planar imaging sequence: TR/TE 2000/30 ms, flip angle , 360 frames, axial 4 mm slice thickness, voxel size 3.4 × 3.4 × 4 mm, repeated twice. Between 12 and 24 minutes of fetal resting-state fMRI data were collected per subject.
Extraction of fetal brain fMRI data requires multiple steps of preprocessing on the raw resting-state fMRI images. In brief, periods of fetal quiescence were manually identified using the FSL image viewer, wherein each segment had to consist of at least 20 seconds (10 frames) of low motion (less than 2 mm translation and/or 3 degrees rotational movement) (A). After motion censoring, fetal brain masks were created separately for each low-motion epoch from a single reference image using Brainsuite (B). After masking, each temporal segment was reoriented manually, realigned to the mean BOLD volume, resampled to 2 mm isotropic voxels, and normalized to a 32-week fetal brain template [15] using SPM8 [13] (C). All normalized images from each segment were then concatenated into one run, realigned to the mean BOLD volume, and smoothed with a 4 mm FWHM Gaussian kernel (D-E). Further preprocessing was performed in the CONN toolbox (v14n) [19], including linear detrending, nuisance regression using aCompCor of five principal components extracted from a 32-week fetal atlas white matter and CSF mask, regression of six head motion parameters, and band-pass filtering at 0.008 to 0.09 Hz (F). Fig. 1 shows the preprocessing pipeline from raw MRI to the residual fMRI used in our model.
2.2 Modeling Age Effect on Neurodevelopment with 3D CNN
For each subject, the data are a time series of 3D fMRI volumes. To exploit the spatio-temporal information in fMRI, we take a sliding window approach. Specifically, starting from the first 3D frame of the 4D fMRI series, we take the mean 3D fMRI image within a fixed-size window over the time axis; as the window slides along the time axis with a fixed stride, one mean 3D image is generated at each window position, which significantly increases the sample size of the training data. In initial experiments, the selected window size and stride achieved good performance, whereas increasing the window size led to over-smoothing and degraded model performance. As mentioned in Section 1, the younger (GA between 26 and 29 weeks) and older (GA between 34 and 37 weeks) age groups are used as the surrogate target. Each 3D fMRI image generated by the sliding window is labeled with the subject's age group.
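The sliding-window averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_window_means`, the flattened list representation of a volume, and the toy window and stride values are all assumptions made for the example.

```python
from typing import List

Volume = List[float]  # a flattened 3D volume; real volumes would be 3D arrays


def sliding_window_means(frames: List[Volume], window: int, stride: int) -> List[Volume]:
    """Average consecutive frames inside a window that slides along the time axis."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    means = []
    # The last valid window starts at len(frames) - window.
    for start in range(0, len(frames) - window + 1, stride):
        chunk = frames[start:start + window]
        # Voxel-wise mean over the frames in the current window.
        means.append([sum(voxel) / window for voxel in zip(*chunk)])
    return means


# Toy series: 10 frames with 2 voxels each (hypothetical values).
frames = [[float(t), float(2 * t)] for t in range(10)]
samples = sliding_window_means(frames, window=4, stride=2)
# Every generated sample inherits the age group of its subject.
labels = ["older"] * len(samples)
```

For a series of `T` frames, this produces `floor((T - window) / stride) + 1` averaged images, which is how a single subject contributes many training samples.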
To learn fetal neurodevelopmental processes by classifying 3D fMRI images, we build a 3D CNN to capture the association between residual fMRI and age group. The proposed CNN, along with its architectural parameters (i.e., kernel size, stride, number of channels, etc.), is shown in Fig. 2. It has two convolutional layers with ReLU non-linear activation and one fully connected layer, where each convolutional layer is followed by a max pooling layer. Model predictions are made with a sigmoid activation on the output layer after the fully connected layer. The objective function is hence the binary cross entropy plus a regularization term on the network weights, where the true label of each image indicates the older or younger group, the predicted probability is the one modeled by the CNN, and a tuning parameter controls the strength of the regularization.
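For reference, an objective of the kind described above can be written out as follows. The notation is an assumption, since the original symbols are not available here: $N$ is the number of training images, $y_i \in \{0, 1\}$ is the binary age-group label of image $x_i$ (which group is coded as 1 is not specified here), $p_i$ is the probability produced by the CNN for $x_i$, $W$ denotes the network weights, $\lambda$ is the tuning parameter, and $\Omega(W)$ is a generic placeholder for the norm penalty used.

```latex
\mathcal{L}(W) = -\frac{1}{N} \sum_{i=1}^{N} \Big[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\Big] + \lambda \, \Omega(W)
```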
2.3 Model Interpretation
As our primary goal is to understand the age effect on fetal brain development, model interpretation is critical to our CNN approach. Once the CNN is trained, we perform sensitivity analysis (SA) [17, 14] on held-out testing images to identify important regions of interest (ROIs). The sensitivity score for each voxel is the squared gradient of the CNN output (i.e., the probability of being younger or older) with respect to that voxel, which measures how sensitive the prediction is to a change in the voxel value. Calculating the sensitivity scores requires only one pass of backpropagation with respect to the input image. For fetal fMRI images, a region (i.e., multiple voxels in a neighborhood) of the brain with larger sensitivity scores is more influential, and hence more important in predicting the age group.
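The sensitivity score can be illustrated on a model small enough to differentiate by hand. The sketch below is not the paper's CNN: it assumes a stand-in logistic model `p = sigmoid(w·x + b)`, for which the gradient of the output with respect to each input has a closed form. The function names, weights and input values are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: List[float], weights: List[float], bias: float) -> float:
    """Predicted probability of the stand-in logistic model."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def sensitivity_scores(x: List[float], weights: List[float], bias: float) -> List[float]:
    """Squared gradient of the predicted probability with respect to each input voxel."""
    p = predict(x, weights, bias)
    # For p = sigmoid(w.x + b), the chain rule gives dp/dx_i = p * (1 - p) * w_i.
    return [(p * (1.0 - p) * w) ** 2 for w in weights]


# Hypothetical 4-voxel image and weights; the third voxel carries the largest weight.
x = [0.2, -0.1, 0.5, 0.0]
weights = [0.1, -0.3, 2.0, 0.0]
scores = sensitivity_scores(x, weights, bias=0.0)
most_influential = max(range(len(scores)), key=scores.__getitem__)
```

With a deep network the closed-form gradient is replaced by one backward pass through the model, but the score is built the same way: square the gradient of the output with respect to each input voxel, then look for neighborhoods with large values.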
2.4 Training Setup
The CNN is trained with the classic stochastic gradient descent (SGD) algorithm with momentum set to 0.8. The learning rate is initially set to 0.1 and is multiplied by a factor of 0.2 every 7 epochs. Regularization on the network weights is used to prevent overfitting. The mini-batch size is set to 128. We apply early stopping on the validation data as an additional form of regularization. In practice, training usually stops at around 15 epochs. Model parameters are initialized from a uniform distribution whose range is determined by the reciprocal of the number of weights in each layer. We implement our CNN in PyTorch.
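The learning-rate schedule stated above (start at 0.1, multiply by 0.2 every 7 epochs) amounts to a step decay. The sketch below assumes zero-based epoch counting, so the drops occur at epochs 7 and 14; the function name `step_decay_lr` is illustrative.

```python
def step_decay_lr(epoch: int, base_lr: float = 0.1, factor: float = 0.2, step: int = 7) -> float:
    """Learning rate at a given zero-based epoch under step decay."""
    return base_lr * factor ** (epoch // step)


# Learning rate for each of the roughly 15 epochs after which training usually stops.
schedule = [step_decay_lr(epoch) for epoch in range(15)]
```

Under this assumption the rate is 0.1 for epochs 0 to 6, 0.02 for epochs 7 to 13, and 0.004 at epoch 14. In PyTorch, the same behavior is available through `torch.optim.lr_scheduler.StepLR` with `step_size=7` and `gamma=0.2`.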
To prevent the model from seeing images of the same subject in both training and testing (since each subject generates multiple 3D fMRI images), we split the dataset at the subject level. In the experiments, data are divided with stratification into training, validation and testing sets in proportions of 80%/10%/10%. All images are normalized by subtracting the mean image and then dividing by the maximal absolute intensity value, both calculated from the training data. The splitting procedure results in about 9300 images for training and about 1200 each for validation and testing. We repeat this procedure 10 times in the experiments. To evaluate model performance, we use the F1 score, calculated as 2 × precision × recall / (precision + recall), and the area under the ROC curve (AUC) as evaluation metrics, and we report the average F1 and AUC over the 10 runs.
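A subject-level, stratified split of the kind described above can be sketched as follows. This is one possible implementation under stated assumptions: the function name `split_subjects`, the rounding rule for fractional group sizes, the fixed seed and the subject IDs are all hypothetical; only the group sizes (30 younger, 45 older) and the 80%/10%/10% proportions come from the text.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_subjects(
    groups: Dict[str, str],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split subject IDs into train/val/test, stratified by age group."""
    by_group = defaultdict(list)
    for subject, group in sorted(groups.items()):
        by_group[group].append(subject)
    rng = random.Random(seed)
    split: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for group in sorted(by_group):
        subjects = by_group[group]
        rng.shuffle(subjects)
        n_train = round(fractions[0] * len(subjects))
        n_val = round(fractions[1] * len(subjects))
        split["train"] += subjects[:n_train]
        split["val"] += subjects[n_train:n_train + n_val]
        # Whatever remains in this age group goes to the test set.
        split["test"] += subjects[n_train + n_val:]
    return split


# Group sizes follow the text: 30 younger and 45 older subjects (IDs are hypothetical).
groups = {f"y{i:02d}": "younger" for i in range(30)}
groups.update({f"o{i:02d}": "older" for i in range(45)})
split = split_subjects(groups)
```

Because the split is made over subject IDs rather than over images, all sliding-window images of a subject end up in the same partition, and the normalization statistics can then be computed from the training partition alone.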
3 Experimental Results
3.1 Predictive Performance
For performance comparison, we also test several baseline classification algorithms, including random forest (RF), gradient boosting machine (GBM) and logistic regression (LR) with L2 regularization. These alternatives were tested on two types of data: (1) fetal brain functional connectivity matrices extracted by correlating resting-state fMRI time series data across 100 brain regions (Fisher z-score), and (2) fetal BOLD fMRI images. For fMRI, due to the high dimensionality (dim = 39984 after removing zero columns from the flattened 3D fMRI vectors) and the strong correlations among neighboring voxels, we use principal component analysis (PCA) for dimension reduction before testing the baselines. We select the optimal number of PCA components as well as the optimal parameters of the classification models using the validation data. In our experiments, 100 components achieve good performance, and including more components does not necessarily lead to a performance gain.
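The first baseline input, a matrix of Fisher z-scores, is obtained by correlating regional time series and applying the Fisher transform `z = atanh(r)`. The sketch below shows that computation on a toy example; the function names and the three short time series are hypothetical, and the zero diagonal is a convention chosen here rather than something stated in the text.

```python
import math
from typing import List


def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation between two equally long time series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fisher_z_matrix(series: List[List[float]]) -> List[List[float]]:
    """Fisher z-transformed correlation for every pair of regions (diagonal set to 0)."""
    n = len(series)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson_r(series[i], series[j])
            # Fisher transform: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)).
            z[i][j] = z[j][i] = math.atanh(r)
    return z


# Three hypothetical regional time series of five frames each.
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
z = fisher_z_matrix(series)
```

With 100 regions this yields a symmetric 100 × 100 matrix per subject. Note that `math.atanh` is undefined for perfectly correlated series (`r` equal to 1 or -1), so real pipelines need to handle that edge case.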
The predictive performance (F1 score and AUC) on testing data at the subject level is shown in Table 1. Note that subject-level predictions are made with soft voting: the probability for each subject is averaged across its corresponding 3D images. The probability threshold for classification is set to 0.5 when calculating the F1 score. We see from Table 1 that the CNN trained directly on fMRI images achieves the best and most robust performance (F1 0.84 with standard deviation 0.05, AUC 0.91 with standard deviation 0.06), owing to the CNN's ability to capture the spatial information in fMRI images. The improvements demonstrate that the CNN can effectively capture discriminative information that could reveal associations between age and brain development.
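Soft voting and the F1 computation used for the subject-level results can be sketched as follows. The function names, the tie rule (a mean probability exactly equal to the threshold counts as positive) and the toy probabilities are assumptions made for the example.

```python
from typing import Dict, List


def soft_vote(image_probs: Dict[str, List[float]], threshold: float = 0.5) -> Dict[str, int]:
    """Average image-level probabilities per subject, then apply the threshold."""
    return {
        subject: int(sum(probs) / len(probs) >= threshold)
        for subject, probs in image_probs.items()
    }


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical image-level probabilities for four test subjects.
image_probs = {
    "s1": [0.9, 0.7, 0.8],
    "s2": [0.4, 0.7, 0.2],
    "s3": [0.6, 0.3, 0.2],
    "s4": [0.55, 0.65, 0.6],
}
true_labels = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}
predictions = soft_vote(image_probs)
subjects = sorted(image_probs)
f1 = f1_score([true_labels[s] for s in subjects], [predictions[s] for s in subjects])
```

Subject `s2` shows why averaging matters: one of its images scores above 0.5, but the mean over its images does not, so the subject-level prediction is negative.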
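Table 1: F1 score and AUC on testing data at the subject level, reported as mean (standard deviation) over 10 runs.

| Model | Fisher z-score: F1 | Fisher z-score: AUC | fMRI image: F1 | fMRI image: AUC |
|---|---|---|---|---|
| CNN | - | - | 0.84 (0.05) | 0.91 (0.06) |
| RF | 0.76 (0.03) | 0.69 (0.18) | 0.79 (0.06) | 0.77 (0.15) |
| GBM | 0.73 (0.09) | 0.69 (0.17) | 0.81 (0.06) | 0.77 (0.18) |
| LR | 0.77 (0.06) | 0.75 (0.18) | 0.77 (0.01) | 0.67 (0.14) |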
3.2 Inference About Fetal Neurodevelopment from CNN Classification
It was possible to discriminate older from younger fetuses on the basis of spontaneous baseline resting-state BOLD measurements. BOLD reflects changing regional blood concentrations of oxy- and deoxy-hemoglobin, which are influenced at least in part by regional metabolic activity [3]. The observation that fetal age can be differentiated on the basis of these fluctuating signals suggests that patterns in baseline BOLD during the resting state reflect aspects of brain maturation.
Sensitivity analysis was used to identify regions within each age group where a change in voxel values most altered the strength of the prediction. Important regions (identified by large sensitivity scores, see Section 2.3) for both the younger and older fetus groups are shown in Fig. 3. We found that sensitive regions (1) have a high degree of spatial overlap across both age groups, (2) are largely bilaterally distributed, and (3) encompass brain regions that have been identified as having high baseline metabolic activity in positron emission tomography (PET) studies of human newborns and infants, including the subcortex, thalamus, and medial temporal lobe [2].
Specifically, across groups we observe high sensitivity in bilateral occipital cortex, bilateral ventrolateral prefrontal cortex (vlPFC), subcortex, posterior cingulate cortex (PCC), medial temporal lobe (MTL), thalamus, and cerebellum. In addition, baseline BOLD in the anterior cingulate cortex (ACC), hypothalamus, and insula is important for group classification in older fetuses.
In this paper, we propose a novel application of CNNs for interpreting fetal brain age effects directly from fMRI images. The predictive performance of the CNN demonstrates that it can capture associations between age and variation in the BOLD signal. To better understand the relevance of our predictive CNN to fetal development, we used sensitivity analysis to isolate the regions critical for CNN performance, and found that the most sensitive regions are those with high metabolic activity in early human brain development. The experimental results reveal compelling associations and demonstrate the promise of CNNs applied to spontaneous BOLD activity as a data-driven approach to understanding fetal developmental processes.
With those identified important ROIs, in our future studies, we plan to perform seed-based analyses to examine how functional connectivities and networks of those ROIs are associated with whole-brain age effects.
[1] Benkarim, O.M., et al.: Toward the automatic quantification of in utero brain development in 3D structural MRI: a review. Human Brain Mapping 38(5), 2772–2787 (2017)
[2] Chugani, H.T.: Imaging brain metabolism in the newborn. Journal of Child Neurology 33(13), 851–860 (2018)
[3] Fukunaga, M., Horovitz, S.G., De Zwart, J.A., Van Gelderen, P., Balkin, T.J., Braun, A.R., Duyn, J.H.: Metabolic origin of BOLD signal fluctuations in the absence of stimuli. Journal of Cerebral Blood Flow & Metabolism 28(7), 1377–1387 (2008)
[4] van den Heuvel, M.I., Thomason, M.E.: Functional connectivity of the human brain in utero. Trends in Cognitive Sciences 20(12), 931–939 (2016)
[5] van den Heuvel, M.I., Turk, E., Manning, J.H., Hect, J., Hernandez-Andrade, E., Hassan, S.S., Romero, R., van den Heuvel, M.P., Thomason, M.E.: Hubs in the human fetal brain network. Developmental Cognitive Neuroscience 30, 108–115 (2018)
[6] Hosseini-Asl, E., Gimel'farb, G., El-Baz, A.: Alzheimer's disease diagnostics by a deeply supervised adaptable 3D convolutional network. arXiv preprint arXiv:1607.00556 (2016)
[7] Konkel, L.: The brain before birth: Using fMRI to explore the secrets of fetal neurodevelopment (2018)
[8] Korolev, S., et al.: Residual and plain convolutional neural networks for 3D brain MRI classification. In: ISBI. pp. 835–838. IEEE (2017)
[9] Li, H., Satterthwaite, T.D., Fan, Y.: Brain age prediction based on resting-state functional connectivity patterns using convolutional neural networks. In: ISBI. pp. 101–104. IEEE (2018)
[10] Li, X., Dvornek, N.C., Zhuang, J., Ventola, P., Duncan, J.S.: Brain biomarker interpretation in ASD using deep learning and fMRI. In: MICCAI. pp. 206–214. Springer (2018)
[11] Makropoulos, A., Counsell, S.J., Rueckert, D.: A review on automatic fetal and neonatal brain MRI segmentation. NeuroImage 170, 231–248 (2018)
[12] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
[13] Penny, W.D., Friston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols, T.E.: Statistical parametric mapping: the analysis of functional brain images. Elsevier (2011)
[14] Rieke, J., et al.: Visualizing convolutional networks for MRI-based diagnosis of Alzheimer's disease. In: Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 24–31. Springer (2018)
[15] Serag, A., et al.: Construction of a consistent high-definition spatio-temporal atlas of the developing brain using adaptive kernel regression. NeuroImage 59(3), 2255–2265 (2012)
[16] Shattuck, D.W., Leahy, R.M.: BrainSuite: an automated cortical surface identification tool. Medical Image Analysis 6(2), 129–142 (2002)
[17] Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
[18] Thomason, M.E., et al.: Age-related increases in long-range connectivity in fetal functional neural connectivity networks in utero. Developmental Cognitive Neuroscience 11, 96–104 (2015)
[19] Whitfield-Gabrieli, S., Nieto-Castanon, A.: Conn: a functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity 2(3), 125–141 (2012)
[20] Zhao, Y., et al.: Automatic recognition of fMRI-derived functional networks using 3-D convolutional neural networks. IEEE Transactions on Biomedical Engineering 65(9), 1975–1984 (2018)
[21] Zhao, Y., Ge, F., Zhang, S., Liu, T.: 3D deep convolutional neural network revealed the value of brain network overlap in differentiating autism spectrum disorder from healthy controls. In: MICCAI. pp. 172–180. Springer (2018)