There is growing scientific interest in understanding the functional and structural organization of the human brain from large-scale multimodal brain imaging data. In medical imaging analysis, a popular approach to this task is to explore brain regional connections (i.e., brain networks) measured from the brain imaging signals. The topological patterns of brain networks are closely related to the functional organization of the brain, and the breakdown of connections between the relevant brain regions is closely associated with the progress of neurodegenerative diseases [12, 22] or with normal brain development. However, the patterns of focal damage in brain networks differ across modalities, which makes it difficult to mine multimodal network changes.
Deep learning methods have been successfully applied to extract biological information from neuroimaging data [24, 29]. Most prior brain network analyses represent the graph structure as a grid-like image to enable convolutional computation [21, 7, 34]. More recently, deep graph convolutional networks (GCNs) have been introduced to brain network research [1, 14, 16]. These studies perform a localized convolutional operation at either graph nodes or edges. They can be categorized into graph spectral convolution [1, 16] and graph spatial convolution. The former approach is suitable for node-centric problems defined on fixed-size neighborhood graphs. For graph-centric problems, the spectral method requires a group-wise graph structure before the spectral graph convolution can be approximated, so its performance depends to a large extent on the predefined network basis. The existing spatial framework, however, is designed for a single modality and lacks a well-defined k-hop convolutional operator on each node. This makes multimodal brain network fusion intractable in the node domain and makes it difficult to draw brain saliency maps.
In this paper, we propose a novel GCN model for multimodal brain network analysis. Two naturally coherent brain network modalities, i.e., functional and structural brain networks, are considered. The structural network acts as the anatomical skeleton that constrains brain functional activities and, in return, consistent functional activities reshape the structural network in the long term. Hence, we argue that a high-level dependency, namely network communication [2], exists across them. It is deciphered by a deep encoding-decoding graph network in our model. Meanwhile, the obtained node features help the representation learning of the brain network structure in a supervised manner. Our contributions are fourfold. (1) This is the first paper to use deep graph learning to model brain functions evolving from their structural basis. (2) We propose an end-to-end automatic brain network representation framework based on the intrinsic graph topology. (3) We model the cross-modality relationship through a deep graph encoding-decoding process based on the proposed multi-stage graph convolutional kernel. (4) We draw graph saliency maps subject to the supervised tasks, enabling the detection of phenotypic and disease-related biomarkers.
Multimodal Brain Network Data. A brain network uses a graph structure to describe the interconnections between brain regions. It is a weighted graph consisting of a node set that indicates the brain regions, an edge set, and the corresponding edge weights. For a given subject, we have a pair of networks: a functional brain network and a structural brain network. These two networks share the same set of nodes, i.e., they use an identical definition of brain regions, but differ in network topology and edge weights. An edge weight in the functional network is the correlation of the fMRI signals between two nodes, while an edge weight in the structural network is the probability of fiber tractography between them.
2.1 Multi-Stage Graph Convolution Kernel
A brain structural network can be interpreted as a freeway network on which biological information, such as brain functional signals, flows from node to node. In the brain network, a node is affected by its neighboring nodes, and their influence is negatively correlated with the shortest network distance. To encode these node-to-node patterns, we adopt the spatial graph convolution kernel, which produces node embedding features with respect to the local graph topology. It defines a way to aggregate node features within a neighborhood of a given size, e.g., 1-hop connections.
Given a target node and its neighbourhood graph topology, the graph convolution kernel first collects the node features of its immediate neighbours (Eq. 1) and then updates the node feature (Eq. 2). The update applies a non-linear activation and a fully-connected (FC) layer with a learnable weight matrix. Previous research shows that a k-hop convolution kernel can be divided into k 1-hop convolutions [15]. Therefore, we stack several 1-hop convolutions to increase the size of the effective receptive field on graphs.
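The two-step kernel described above (collect neighbour features, then update through an FC layer) can be illustrated with a minimal sketch. It assumes a weighted-sum aggregation over immediate neighbours and a ReLU activation; the function names, the toy network, and the identity FC weight are hypothetical and are not taken from the original implementation.

```python
def aggregate_neighbors(features, weights, i):
    """Weighted sum of the features of node i's immediate (1-hop) neighbours."""
    dim = len(features[i])
    agg = [0.0] * dim
    for j, w in enumerate(weights[i]):
        if j != i and w != 0.0:              # j is connected to i
            for d in range(dim):
                agg[d] += w * features[j][d]
    return agg


def update_node(agg, fc_weight):
    """FC layer followed by ReLU (the choice of activation is an assumption)."""
    return [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in fc_weight]


def one_hop_layer(features, weights, fc_weight):
    """Apply the 1-hop kernel to every node of the graph."""
    return [update_node(aggregate_neighbors(features, weights, i), fc_weight)
            for i in range(len(features))]


# Toy structural network with 3 regions; only 0-1 and 1-2 are connected.
weights = [[0.0, 0.5, 0.0],
           [0.5, 0.0, 1.0],
           [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical FC weight

h1 = one_hop_layer(features, weights, identity)   # 1-hop receptive field
h2 = one_hop_layer(h1, weights, identity)         # stacking reaches 2 hops
```

After the second layer, the feature of node 0 depends on node 2 even though the two are not directly connected, which is the receptive-field growth that stacking is meant to provide.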
A potential problem with Eq. 1 is the poor generalization of its local aggregation, i.e., the aggregation weight is fixed to the predefined edge weight. Though these predefined values reflect the biological profile of the brain, they might not be optimal for brain network encoding, especially for the cross-modality learning pursued in our research. For example, brain regions that are interconnected with large weights in the structural network are not guaranteed to be strongly connected in the functional network as well. Besides, compared with structural networks, functional networks are more dynamic and their edge connections fluctuate more. Therefore, a dynamic adjustment of the aggregation weights during graph learning is preferable. To this end, we adopt the idea of the graph attention network (GAT) [32]: for each pair of node features, a dynamic edge weight is learned by a single-layer feedforward neural network. More specifically, we first increase the expressive power of the node features with a shared linear transformation whose weight is a learned parameter. Then, a single-layer feedforward neural network, applied to the concatenation of the two transformed node features and parameterized by its own weight, derives the edge weight (Eq. 3). To ensure that Eq. 3 generalizes across different nodes, a softmax layer is appended to normalize the weights over the neighbourhood (Eq. 4).
Compared with the predefined edge weight, the learned attention weight depends on the node order and is thus asymmetric on an edge. Besides, it is free of the local network topology. In addition to the graph attention based aggregation (Fig. 1, A), we also propose a binary symmetric aggregation defined with a threshold function (Fig. 1, B). The function thresholds an edge by a given threshold value: the aggregation weight is 1 if the edge weight exceeds the threshold and 0 otherwise. The threshold value is set empirically in this study. This process follows the assumption that two brain regions are highly interactive in the functional brain network as long as they are structurally connected. To integrate all of the aggregation mechanisms, we design a multi-stage graph convolution kernel (MGCK), and Eq. 1 is updated accordingly (Eq. 5), with two learnable parameters balancing the different aggregation mechanisms. Eq. 5 contains four different aggregation weights: the predefined network connections with and without attention weights, the attention aggregation alone, and the threshold connections. Finally, we introduce multi-head learning [31] to stabilize the aggregation in MGCK. Several independent multi-stage aggregations are conducted, and the aggregated features are concatenated before being fed to an FC layer; Eq. 2 is updated accordingly (Eq. 6).
Previous research indicates that graph convolution networks perform poorly with a deep architecture because of the high complexity of back-propagation in the deep layers. To address this problem, the residual block in GCN [17] was proposed, inspired by the success of ResNet [10] on image data. We add a residual connection after MGCK (Eq. 7). It uses an FC layer with its own learnable parameters, together with a parameter designed to match the dimensions.
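The two non-predefined aggregation weights discussed in this section, the attention weight and the threshold weight, can be sketched as follows. This is an illustration under stated assumptions rather than the kernel used in the paper: the LeakyReLU non-linearity follows the original GAT formulation, and the names `shared_w`, `attn_vec`, and `tau`, as well as all numeric values, are hypothetical. How the four aggregation weights are mixed in MGCK is not reproduced here.

```python
import math


def linear(weight, vec):
    """Shared linear transformation applied to one node feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def leaky_relu(x, slope=0.2):
    # Assumption: LeakyReLU as in the original GAT formulation.
    return x if x > 0 else slope * x


def attention_weights(features, neighbors, i, shared_w, attn_vec):
    """Softmax-normalised attention weights of node i over its neighbours."""
    z_i = linear(shared_w, features[i])
    scores = []
    for j in neighbors[i]:
        z_j = linear(shared_w, features[j])
        concat = z_i + z_j                   # concatenation of the two vectors
        scores.append(leaky_relu(sum(a * c for a, c in zip(attn_vec, concat))))
    peak = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors[i], exps)}


def threshold_weights(weights, tau):
    """Binary symmetric aggregation: 1 if the edge weight exceeds tau, else 0."""
    return [[1.0 if w > tau else 0.0 for w in row] for row in weights]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
shared_w = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical learned weight
attn_vec = [0.5, -0.5, 1.0, 0.0]             # hypothetical learned vector

alpha_0 = attention_weights(features, neighbors, 0, shared_w, attn_vec)
alpha_1 = attention_weights(features, neighbors, 1, shared_w, attn_vec)
binary = threshold_weights([[0.0, 0.3], [0.3, 0.0]], tau=0.1)
```

In this toy setting the weight that node 0 assigns to node 1 differs from the weight that node 1 assigns to node 0, which illustrates the asymmetry noted above, while the thresholded weights stay symmetric.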
2.2 Deep Multimodal Brain Networks (DMBN)
We show the pipeline of DMBN in Fig. 2. It generates multimodal graph node representations for different learning tasks. There are two parts in DMBN. The first part performs cross-modality learning via an encoding-decoding network. Here, we construct the brain functional network from the brain structural network. The brain functional network contains both positive and negative connections, and these two types of functional connectivity have distinct relationships with the brain structural network [11, 26]. Hence, we separate their encoding into two independent encoding networks. For each graph encoder, we use several MGCK layers to aggregate node features from diverse ranges of the neighborhood in the structural network. The generated node features are then fed into the decoding networks to reconstruct the positive and negative connections, respectively. Specifically, for each undirected edge, the reconstructed link is defined in Eq. 8 from the feature vectors of its two nodes in the network embedding space and a learnable layer weight. Eq. 8 maps the deep node embeddings to a connection matrix in which each element ranges from 0 to 1, consistent with the functional connections.
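As an illustration of a decoder that maps node embeddings to a connection matrix with entries between 0 and 1, the sketch below applies a sigmoid to the inner product of linearly projected embeddings. The exact form of Eq. 8 is not reproduced here, so this particular decoder, the name `layer_w`, and the toy values are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function with values in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_links(embeddings, layer_w):
    """Reconstruct a symmetric connection matrix from node embeddings.

    Assumed form: sigmoid of the inner product of projected embeddings.
    """
    proj = [[sum(w * x for w, x in zip(row, h)) for row in layer_w]
            for h in embeddings]
    n = len(proj)
    return [[sigmoid(sum(a * b for a, b in zip(proj[i], proj[j])))
             for j in range(n)] for i in range(n)]


embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]   # toy node embeddings
layer_w = [[1.0, 0.0], [0.0, 1.0]]                   # hypothetical layer weight
recon = decode_links(embeddings, layer_w)
```

Because the inner product is symmetric in its two arguments, the reconstructed matrix is symmetric, which matches the undirected edges of the target network.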
The second part of our model is supervised learning. The node embedding features from the positive and negative encoding networks are concatenated node-wise and processed by an MLP. Since our tasks are graph-level learning tasks, a global pooling is applied before the last FC layer to remove the effect of the node order. Along with the supervised learning tasks, it is important to understand which brain regions are most closely associated with the tasks. Inspired by the classic activation maps [1], we carry out a graph localization strategy that learns contribution scores for the graph nodes. As shown in Fig. 2, suppose the final node feature matrix holds a fixed number of channels for every node; a global mean pooling then generates a channel-wise vector that is treated as the network feature. Each channel therefore has a corresponding weight learned by the last FC layer. To obtain the node-wise importance score, we map these weights back to the nodes by an inner product between the node features and the channel weights. In the end, we rank the top-scoring nodes of each subject and conduct a group vote to obtain the group-wise saliency map.
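The localization strategy described above can be sketched in a few lines: node scores are inner products of node features with the channel weights, the top-scoring nodes of each subject are kept, and votes are counted across subjects. The function names, the value of `k`, and the toy features are hypothetical.

```python
from collections import Counter


def node_scores(node_features, channel_weights):
    """Inner product between each node's features and the channel weights."""
    return [sum(f * w for f, w in zip(row, channel_weights))
            for row in node_features]


def graph_logit(node_features, channel_weights):
    """Global mean pooling followed by a bias-free FC layer (one output)."""
    n = len(node_features)
    pooled = [sum(col) / n for col in zip(*node_features)]
    return sum(p * w for p, w in zip(pooled, channel_weights))


def top_k_nodes(scores, k):
    """Indices of the k highest-scoring nodes."""
    order = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return order[:k]


def group_saliency(subject_features, channel_weights, k):
    """Count how often each node is among the top-k nodes of a subject."""
    votes = Counter()
    for feats in subject_features:
        votes.update(top_k_nodes(node_scores(feats, channel_weights), k))
    return votes


channel_weights = [1.0, -1.0]                         # hypothetical FC weights
subject_a = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
subject_b = [[0.1, 0.0], [3.0, 1.0], [1.0, 0.0]]
votes = group_saliency([subject_a, subject_b], channel_weights, k=2)
```

Since mean pooling and the FC layer are both linear, the graph-level output equals the mean of the node scores, which is why the scores can be read as per-node contributions.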
There are 3 loss terms in DMBN controlling the brain network reconstruction and supervised learning tasks (Eq. 9). The reconstruction loss consists of the global and local decoding losses to preserve different levels of graph topology.
1) Global Decoding Loss.
This term evaluates the average performance of edge reconstruction in the target network (Eq. 10). It includes an additional penalty on the edge reconstruction, which we set so that stronger connections in the brain functional network receive higher weights. The decoded network connections come from the positive and negative encoding flows, respectively.
2) Local Decoding Loss.
The cross-modality reconstruction of brain networks is challenging; hence we do not expect a full recovery of all edges but rather the reconstruction of the local graph structure on important connections, e.g., edges with strong connections in both the structural and functional networks. We adopt the first-order proximity [33] to capture the local structure. The loss function (Eq. 11) involves the number of neighbouring nodes of each node in the brain structural network and a threshold function that favors strong generalization. Eq. 11 generalizes Laplacian Eigenmaps [3] and drives nodes with similar embedding features together.
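A first-order proximity loss in the spirit of Laplacian Eigenmaps penalises the squared distance between the embeddings of connected nodes. The sketch below is one possible reading of the description above, not the exact Eq. 11: it treats structural edges whose weight exceeds a hypothetical threshold `tau` as neighbours and averages the squared distances over each node's neighbours.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_decoding_loss(embeddings, struct_w, tau):
    """First-order proximity loss over thresholded structural neighbours.

    Assumption: an edge counts as a neighbour when its weight exceeds tau,
    and each node's term is divided by its number of neighbours.
    """
    total = 0.0
    for i, h_i in enumerate(embeddings):
        nbrs = [j for j, w in enumerate(struct_w[i]) if j != i and w > tau]
        if not nbrs:
            continue                          # isolated node contributes nothing
        dist = sum(squared_distance(h_i, embeddings[j]) for j in nbrs)
        total += dist / len(nbrs)
    return total


struct_w = [[0.0, 0.8, 0.05],
            [0.8, 0.0, 0.5],
            [0.05, 0.5, 0.0]]                 # toy structural edge weights
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
loss = local_decoding_loss(embeddings, struct_w, tau=0.1)
same = local_decoding_loss([[1.0, 1.0]] * 3, struct_w, tau=0.1)
```

The loss vanishes when connected nodes share the same embedding and grows as neighbouring embeddings move apart, which is the behaviour that pulls similar nodes together.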
3) Supervised Loss.
The loss function for prediction (Eq. 12) is computed over all subjects, with the prediction function learned by the MLP network.
3.1 Gender Prediction
The data are from the WU-Minn HCP 1200 Subjects Data Release [30]. We include 746 healthy subjects (339 males, 407 females), each of whom has high-quality resting-state fMRI and dMRI data. The functional network is processed with the CONN toolbox [35], and the structural connectivity is measured with the FSL toolbox [13]. Here we predict gender from the multimodal brain network topology. Previous research has shown a strong relationship between gender and brain connectivity patterns [25].
Five baseline methods are compared. Three of them, tBNE [6], a multiple-kernel SVM [8], and a multimodal CCA + joint ICA model [28], are traditional machine learning algorithms, while the other two, i.e., BrainNetCNN [14] and Brain-Cheby [16], use deep models. In addition, five variants of DMBN are tested in the experiments as an ablation study. We apply 5-fold cross-validation to all methods. In our model setting, the positive connection encoder has 5 cascaded MGCK layers and the negative connection encoder has 4 MGCK layers. In each encoder, every MGCK layer has a fixed feature dimension and uses 4-head learning. We report the results with three evaluation metrics: accuracy, precision, and F1 score. Besides, we use a grid search to choose two hyperparameters. Based on empirical knowledge, we set the search range of the first to [10, 1, 0.1, 0.01] and of the second to [5, 1, 0.5, 0.1]. Details of the search, including the best-performing setting, can be found in Supplementary Fig. 1.
As shown in Tab. 1 (HCP), our model achieves the highest accuracy in gender prediction among all the methods and significantly outperforms the others, with increases of at least 8% in accuracy and 10% in F1 score. Generally, deep models are superior to the traditional node embedding method (tBNE). We notice that when we remove the cross-modality learning, i.e., in the variants denoted by w/o Recon, the performance drops significantly. Though these variants are still comparable to the other baselines, their training process is unstable and shows a high variance. The cross-modality learning makes the node-level learning effective and consequently affects the subsequent graph-level learning. In addition, the 10 most important brain regions for gender prediction are shown in Supplementary Fig. 2. These regions are spread over cortical areas, including the frontal and orbital gyrus, the precentral gyrus, and the insular gyrus, as well as subcortical areas such as the basal ganglia. All of these regions play vital roles in regulating cognitive functioning and in motor and emotion control, which are likely to contribute to gender differences [23, 25].
We explore the influence of each element of our model (Tab. 1). We first remove the decoding network, which turns our model into a single-modality learner (w/o Recon). Under this configuration, our model is still comparable to the baselines. However, the decreased performance suggests that cross-modality learning is indispensable for an informative network representation. Based on this setting, we further evaluate the role of each aggregation mechanism in MGCK. We remove the threshold aggregation weight (w/o TAGGRecon) and the graph attention aggregation (w/o AAGGRecon), respectively. Both removals cause a significant decrease in performance. In addition to the single-modality learning, we also validate the importance of the different reconstruction losses in multimodal learning. Removing the local (DMBN w/o Local) or the global (DMBN w/o Global) loss lowers the prediction accuracy by around 3%. Meanwhile, the global reconstruction loss receives a larger weight than the local reconstruction loss. Since the global loss considers all of the edges in the functional network, it contains richer information than the local loss, which focuses on the direct edges in the structural network. However, the two losses are complementary to each other.
To validate the efficacy of cross-modality learning, we turn off the prediction tasks, i.e., we keep only the reconstruction losses during training. The results are shown in Fig. 3. We present the predicted functional networks of a randomly selected sample and the group average over the whole testing data. From the sparse structural networks, the corresponding functional connections are correctly predicted and the major patterns of the local network connections are captured. To further assess the accuracy, we conduct a statistical analysis on the edges. Both direct and indirect edges in the target functional network are highly correlated with the predicted edges (Spearman correlation), and the correlation for the direct edges is slightly higher than that for the indirect edges. We also examine the robustness of our model to different sparsity levels of the brain structural networks; the results are shown in Supplementary Fig. 4.
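The edge-level analysis above compares target and predicted edge weights by rank correlation. A minimal, dependency-free sketch of a Spearman correlation over the upper-triangle edges is given below; the helper names and the toy matrices are hypothetical, and in practice a statistics library would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        rank = (pos + end) / 2.0 + 1.0        # average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix):
    """Edge values above the diagonal of a symmetric connection matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


target = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.5], [0.1, 0.5, 0.0]]
predicted = [[0.0, 0.7, 0.2], [0.7, 0.0, 0.6], [0.2, 0.6, 0.0]]
rho = spearman(upper_triangle(target), upper_triangle(predicted))
```

To separate direct from indirect edges, the same function can be applied to the subsets of upper-triangle entries that are, respectively, present or absent in the structural network.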
3.2 Disease Classification
In addition to gender prediction in healthy subjects, we test our model on disease classification. In this experiment, we include 323 subjects from the Parkinson's Progression Markers Initiative (PPMI) [18], of whom 224 are patients with Parkinson's disease (PD). We follow the experimental setting of the gender prediction task, and the hyperparameters are set according to the grid search.
We consider the state-of-the-art baseline methods for comparison. The results are shown in Tab. 1 (PPMI). Our model achieves the best prediction performance among all models (improving the accuracy by 5% over BrainNetCNN and by 9% over Brain-Cheby and the other baselines). Moreover, the results show that adding the cross-modality reconstruction does improve the performance. We locate the 10 key regions associated with PD classification via the saliency map (see Supplementary Fig. 3). Most of the salient regions are located in subcortical structures, such as the bilateral hippocampus and the basal ganglia. These structures are conventionally regarded as biomarkers of PD in medical imaging analysis [19, 5].
We propose a novel multimodal brain network fusion framework based on a deep graph model. The cross-modality network embedding is generated by an encoding-decoding network and is also supervised by the prediction tasks. Eventually, the learned node features contribute to the brain saliency map for detecting disease-related biomarkers. In the future, we plan to extend our model to other learning tasks such as brain cortical parcellation and cognitive activity prediction.
This work was supported in part by NIH (RF1AG051710 and R01EB025032). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
[1] (2018) Graph saliency maps through spectral convolutional networks: application to sex classification with brain connectivity. arXiv preprint arXiv:1806.01764.
[2] (2018) Communication dynamics in complex brain networks. Nature Reviews Neuroscience.
[3] (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 (6), pp. 1373–1396.
[4] (2012) The economy of brain network organization. Nature Reviews Neuroscience 13 (5), pp. 336.
[5] (2003) Parkinson's disease is associated with hippocampal atrophy. Movement Disorders 18 (7), pp. 784–790.
[6] T-BNE: tensor-based brain network embedding. In Proceedings of SIAM International Conference on Data Mining (SDM).
[7] (2015) Fully connected cascade artificial neural network architecture for attention deficit hyperactivity disorder classification from functional magnetic resonance imaging data. IEEE Transactions on Cybernetics.
[8] (2015) Multimodal analysis of functional and structural disconnection in Alzheimer's disease using multiple kernel SVM. Human Brain Mapping 36 (6), pp. 2118–2131.
[9] (2017) Inductive representation learning on large graphs. In NIPS.
[10] (2016) Deep residual learning for image recognition. pp. 770–778.
[11] (2009) Predicting human resting-state functional connectivity from structural connectivity. Proceedings of the National Academy of Sciences 106 (6), pp. 2035–2040.
[12] (2015) Functional brain network changes associated with clinical and biochemical measures of the severity of hepatic encephalopathy. NeuroImage.
[13] (2012) FSL. NeuroImage.
[14] BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage.
[15] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
[16] (2018) Metric learning with spectral graph convolutions on brain connectivity networks. NeuroImage.
[17] (2019) DeepGCNs: can GCNs go as deep as CNNs?. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9267–9276.
[18] (2011) The Parkinson Progression Marker Initiative (PPMI). Progress in Neurobiology 95 (4), pp. 629–635.
[19] (2000) Pathophysiology of the basal ganglia in Parkinson's disease. Trends in Neurosciences 23, pp. S8–S19.
[20] (2019) System-level matching of structural and functional connectomes in the human brain. NeuroImage 199, pp. 93–104.
[21] (2018) Reading the (functional) writing on the (structural) wall: multimodal fusion of brain structure and function via a deep neural network based translation approach reveals novel impairments in schizophrenia. NeuroImage.
[22] (2011) Brain network connectivity in individuals with schizophrenia and their siblings. Biological Psychiatry.
[23] (2012) Normal sexual dimorphism in the human basal ganglia. Human Brain Mapping.
[24] (2015) U-net: convolutional networks for biomedical image segmentation. In MICCAI.
[25] (2014) A meta-analysis of sex differences in human brain structure. Neuroscience & Biobehavioral Reviews.
[26] (2011) Negative edges and soft thresholding in complex network analysis of resting state functional connectivity data. NeuroImage 55 (3), pp. 1132–1146.
[27] (2016) The relation between structural and functional connectivity patterns in complex brain networks. International Journal of Psychophysiology.
[28] (2011) Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model. NeuroImage 57 (3), pp. 839–855.
[29] (2013) Deep learning-based feature representation for AD/MCI classification. In MICCAI.
[30] (2013) The WU-Minn Human Connectome Project: an overview. NeuroImage 80, pp. 62–79.
[31] (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
[32] (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
[33] (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234.
[34] (2017) Structural deep brain network mining. In ACM SIGKDD.
[35] (2012) Conn: a functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity 2 (3), pp. 125–141.
[36] (2018) Multimodal fusion of brain networks with longitudinal couplings. In MICCAI.