I Introduction
Functional neuroimaging techniques provide information about the function of the brain. Among the techniques, functional magnetic resonance imaging (fMRI) is a noninvasive method for measuring temporal dynamics of blood oxygen level dependency (BOLD) signal as an indirect indicator of the neuronal activity
[1].One of the standard method in fMRI analysis is analyzing the functional connectivity of brains [2, 3, 4]. Functional connectivity is defined as the level of coactivation between a pair of brain regions. Evidences support that the functional connectivity between certain regions compose specific networks during rest [5, 6]. Accordingly, network neuroscientist are interested in studying these brain networks and their relevance to the cognitive function and disorders of the brain [7].
In this regards, graphs provide an efficient way to mathematically model nonregular interactions between data in terms of nodes and edges [8, 9, 10]. The network of the brain can be modeled as a graph consisting of ROIs as the nodes and their functional connectivity as the edges. In classical graph theoretic approaches, various graph metrics including local/global efficiency, average path length, and smallworldedness, are computed to analyze the brain networks [11]. These metrics could be further used for group comparison to reveal the different network properties, providing insights to the physiological charactersitics and the disorders of the brain [12, 13].
Recently, there have been remarkable progresses and growing interests in Graph Neural Networks (GNNs), which comprise graph operations performed by deep neural networks (see the extensive survey in [14]
). The GNNs are suitable for solving tasks such as node classification, edge prediction, graph classification, etc. Usual GNNs typically integrate the features at each layer to embed each node features into a predefined next layer feature vector. The integration process is implemented by choosing appropriate functions for aggregating features of the neighborhood nodes. As one layer in the GNN aggregates its 1hop neighbors, each node feature is embedded with features within its
hop neighbors of the graph after aggregating layers. The feature of the whole graph is then extracted by applying a readout function to the embedded node features.Considering the development of GNNs, it is not surprising that there are keen interests in applying GNNs to fMRI data analysis. For example, some works have applied the GNN to classify one’s phenotypic status based on the graph structure of the brain functional networks [15, 16, 17, 18, 19]. Some other works employed the GNN to classify the subjects, not only based on the imaging data, but also including the nonimage phenotypic data [20, 21, 22]. Despite the early contribution of these works in applying the GNNs for fMRI analysis, there exists a common limitation in that they often fail to provide proper mapping of the ROIs for neuroscientific interpretation. There are attempts to address the issue of neuroscientific interpretability by visualizing the important features of the brain with the saliency mapping methods [23, 24, 18].
By extending these ideas, here we revisit that the Graph Isomorphism Network (GIN) structure [25], which is known as a spatialbased convolutional GNN, to understand the difference from the spectraldomain approaches such as the graph convolutional network (GCN) [26]. Our analysis shows that while GIN is similar to GCN in learning the spectral filters from graphs, GIN can be considered as a dual representation of the convolutional neural network (CNN) with twotab convolution filter in the graph space where the adjacent matrix is defined as a generalized shift operation. This leads to a closedform representation of the GIN in terms of basislike expansions as in [27]. With this novel generalization, we can visualize the important brain regions that are related to a certain phenotypic difference. Experimental results on sex classification confirm our method can provide stable classification results and provides interpretability of the classification results in terms of saliency maps, which correspond well with the previous functional neuroimaging results that studied the sex differences on the restingstate fMRI (rsfMRI).
Ii Related Works
Iia Mathematical Preliminaries
We denote a graph with a set of vertices with and edges , where an edge connects vertices and if they are adjacent or neighbors. The set of neighborhoods of a vertex is denoted by . For weighted graphs, the edge has a real value. If is an unweighted graph, then is a sparse matrix with elements of either 0 or 1.
When analyzing the fMRI data, the functional connectivity between two regions of the brain is often computed from the Pearson correlation coefficient between the fMRI time series. Specifically, the Pearson correlation coefficient between the fMRI time series at the vertex and the fMRI time series at the vertex is given by
where is the cross covariance between and , and
denotes the standard deviation of
. Unweighted graph edge can be derived from the functional connectivity by thresholding the correlation coefficients by a certain threshold.For a simple unweighted graph with vertex set , the adjacency matrix is a square matrix such that its element is one when there is an edge from vertex to vertex , and zero when there is no edge. For the given adjacent matrix , the graph Laplacian and its normalized version are then defined by
(1) 
where is the degree matrix with the diagonal element
(2) 
and zeros elsewhere.
Graph Laplacian is useful for signal processing on a graph [28, 29, 30]. More specifically, the graph convolution for realvalued functions on the set of the graph’s vertices, is often defined by
(3) 
where the superscript denotes the adjoint operation, is the matrix composed of singular vectors of the normalized graph Laplacian, i.e.
(4) 
where
denotes the diagonal matrices with the singular values, which is often referred to as the graph spectrum.
IiB Graph Neural Networks
The goal of GNNs for the graph classification task is to learn a nonlinear mapping from a graph to a feature vector:
(5) 
where is a feature vector of the whole graph that helps predicting the labels of the graph. Recent perspective distinguish the GNNs into two groups based on the neighborhood aggregating schemes [14]. First group is the spectralbased convolutional GNNs (spectral GNN). This group of GNNs are inspired by the spectral decomposition of the graphs, and aim to approximate the spectral filters in each aggregating layers [31, 26]. The other group of GNNs are the spatialbased convolutional GNNs (spatial GNN). They do not explicitly aim to learn spectral features of graph, but rather implement the neighborhood aggregation based on the nodes’ spatial relations. Some wellknown examples of the spatialbased convolutional GNNs are the Message Passing Neural Network (MPNN) [32] and the GIN [25]. In this section, we provide a brief review of the these approaches to understand their relationships.
IiB1 Spectralbased Convolutional GNNs
Spectralbased convolutional GNNs are based on the graph convolution relationship (3), in which is replaced by the parameterixed graph spectrum :
More specifically, the graph convolutional layer of the spectral GNN is then implemented as follows:
(6) 
where is a elementbyelement nonlinearity, is the graph signal at the channel and is a diagonal matrix that parameterized the graph spectrum with learnable parameters.
Note that spectral GNN using (6) is a direct way of learning the convolution filter in (3) in terms of graph spectrum. As graph spectrum correspond to the frequency components that provide rich information about the topological variations of the signals on graph [33], spectral GNNs may be useful for analyzing the brain functional networks. Unfortunately, due to the eigendecomposition of the Laplacian matrix, spectral GNNs have several limitations [14]. First, any perturbation to a graph results in a change of eigenbasis so that the learned filters are dependent on the domain. Therefore, for different group of graphs, which may have different eigenbasis, it is difficult to apply spectral GNN. Moreover, eigendecomposition requires high computational complexity, which is difficult to use brain connectivity analysis.
To address these issues, GCN was proposed as the firstorder approximation of the spectral GNN [26]. Specifically, the authors of [26] showed that the first orderapproximation of the Chebyshev expansion of the spectral convolution operation can be implemented as the spatial domain convolution:
(7) 
where is the adjacent matrix assuming the recurring loop, is a degree matrix of , and
(8) 
denotes the channel signals at the th layer. This implies that GCN implements the node feature with its neighborhoods by mapping through a layerspecific learnable weight matrix and nonlinearity .
IiB2 Spatialbased Convolutional GNNs
Unlike the spectral GNN, spatialbased methods define graph convolutions based on a node’s spatial relations. More specifically, this opertion is generally composed of the , and functions:
where denotes the th layer feature vector at the th node. In other words, the function collects features of the neighborhood nodes to extract aggregated feature vector for the layer , and function then combines the previous node feature with aggregated node features to output the node feature of the current th layer . After this spatial operation, the mapping (5) is defined by
Due to the lack of the eigendecomposition, the spatialbased GNN is computationally more efficient. Moreover, the and share the similar idea of information propagation/message passing on graphs [14].
In particular, GIN was proposed by [25] as a special case of spatial GNN suitable for graph classification tasks. The network implements the aggregate and readout functions as the sum of the node features:
(9) 
where is a learnable parameter, and
is a multilayer perceptron with nonlinearity. For graphlevel readout, the embedded node features of every layers are summed up and then concatenated to obtain the final graph feature
as in [25, 34],(10)  
(11) 
The authors argue that the proposed network architecture can learn injective mapping of the function , which makes the model to be possibly as powerful as the WeisfeilerLehman (WL) test [35, 36, 25].
Iii Main Contribution
In this section, we show that the GIN is a dual representnation of CNN on the graph space where the adjacent matrix is defined as a generalized shift operation. From this finding, we further propose a method for applying the GIN to the rsfMRI data for graph classification.
Iiia GIN as a generalized CNN on the graph space
Note that the GIN processing (9) can be decomposed as
(12) 
where
(13)  
(14) 
where and is the adjacent matrix and denotes the th column of a matrix . This operation is performed for .
One of the most important observations is that the feature matrix is closely related to the signal matrix in (8). More specifically, we have the following dual relationship:
(15) 
Then, using the observation that is selfadjoint, the matrix representation of (12) can be converted to a dual representation:
(16) 
where denotes the fully connected network weight from the MLP. Eq. (16) shows that aside from the iteration dependent , the main difference of GIN from GCN is the presence of the instead of the normalized adjacency matrix . This implies that GIN is considered as an extension of the GCN as a first order approximation of the spectral GNN using the unnormalized graph Laplacian.
However, another important contribution of this paper is that the difference is not this minor variation, instead it implies the fundamental differences between the two approaches. More specifically, by exploring the role of in (16), Theorem 1 shows that (16) is a dual representation of the two tab convolutional neural network without pooling layer on the graph spaces, where the adjacent matrix is defined as a shift operation.
Theorem 1.
Proof.
To understand this claim, we first revisit the classical CNN for the 1D signal. A building block for the CNN is the following multichannel convolution [27]:
(17) 
where is the number of channels at the th layer, denotes the th channel signal at the th layer, and is the convolution filter that convolves with th input channel signal to produce th channel output. Finally, denotes the matrix that represent the pooling operation.
Suppose that the convolution filter has two tabs. Without loss of generality, the filter can be represented by
for some constant . Then, the convolution operation can be simplified as
where is the shift matrix defined by
(18) 
if we assumes the periodic boundary condition. Accordingly, for the cases of a CNN with no pooling layers, i.e. , (17) with the twotab filter can be represented in the following matrix form:
(19) 
where
By inspection of the dual representation of GIN in (16) and the CNN operation (19), we can see that the only difference of (16) is the adjacency matrix instead of the shift matrix in (19). Therefore, we can conclude that the GIN is a dual representation of CNN with two tab filter in the graph space where adjacent matrix is defined as a shift operation. ∎
Note that the identification of the adjacency matrix as a generalized shift operation is not our own invention, but rather it is a classical observation in graph signal processing literature [28, 29, 30]. Accordingly, Theorem 1 confirms that the insight from the classical signal processing plays an important role in understanding the GNN. Based on this understanding, we can now provide a dual space insight of the GIN operations in (10) and (11). More specifically, (10) can be understand as sumpooling operation, since we have
(20) 
where the pooling matrix is given by
(21) 
Then, (11) is indeed the multichannel concatenation layer from the pooled feature at each layer as shown in Fig. 1. Therefore, the GIN operations can be understood as a dual representation of CNN classifier on the graph signal space where the shift operation is defined by the adjacency matrix. In fact, CNN and GIN differs in their definition of the shift operation as shown in Fig. 1 and Fig. 2.
IiiB Saliency Map of GIN
One of the important advantages of understanding GIN as a dual representation of CNN with generalized shift operation is its mathematical interpretation. First, similar to our prior work [27] that interprets a CNN as a combinatorial convolutional framelet representation, we can show that the output of a GIN for binary classification can be described by the following closeform representation:
(22) 
where denotes the input vector defined by
(23) 
and the basislike functions and are defined in the Appendix.
From (22), we can easily see that the feature vector is nothing but the expansion coefficients, i.e.,
(24) 
and the classifier design is to find a decision boundary in the space of the expansion coefficients. Furthermore, (22) leads to a novel mathematical approach to visualize important brain regions related to the class features. Specifically, the Jacobian matrix of the expression in (22) is given by
(25) 
where denotes the th element of . The derivative in (IIIB) can be computed by backpropagating through the GIN [37, 38].
The expression in (IIIB) reveals the dependency of the Jacobian with respect to the input vector formulation. Specifically, if we embed the input node features into the onehot vectors that encode the semantic information of each ROIs of the brain, then the input becomes a diagonal matrix and when . As the each element of Jacobian implies to the sensitivity with respect to the specific coordinate direction, the Jacobian should have only nonzero values along this nonzero coefficients. Accordingly, if we define the saliency map by collecting the sensitivity with respect to the nonzero sensitivity:
(26) 
then the th elements of the saliency map implies the sensitivity with respect to the th ROI index. Accordingly, the resulting saliency map can quantify the sensitivity with respect to the node geometry, which provide a neuroscientific information about the relative importance of the each ROIs related to the class features.
One may wonder that our saliency map is similar to the gradientbased saliency map [37, 38], which is defined as the derivative of the class output with respect to the input, i.e. basically the Jacobian in (IIIB). However, the dimension of the Jacobian in (IIIB) is for the case of GNN with onehot vector encoding, whereas our saliency map in (26) has dimension of the node size (i.e. ) by restricting Jacobian with respect to the active coordinate directions. Therefore, our new definition of the saliency map can be readily interpreted by mapping to the graph.
Iv Methods
Based on the aforementioned understanding of the GIN, we proceed to apply the GIN to the rsfMRI data for graph classification of the sex and provide neuroscientific interpretation. The Fig. 1 provides schematic illustration of the proposed analysis pipeline.
Iva Data Description and Preprocessing
The rsfMRI data was obtained from the Human Connectome Project (HCP) dataset S1200 release [39]. The data was acquired for two runs of two restingstate session each for 15 minutes, with eyes open fixating on a crosshair (TR=720ms, TE=33.1ms, flip angle=, FOV=mm, slice thickness=2.0mm). Of the total 4 runs, we used the first run of the dataset. Preprocessing of the fMRI volume timeseries included gradient distortion correction, motion correction, and field map preprocessing, followed by registration to T1 weighted image. The registered EPI image was then normalized to the standard MNI152 space. Finally, FIXICA based denoising was applied to reduce nonneural source of noise in the data [40, 41]. Details of the HCP preprocessing pipeline is referred to [42].
From the preprocessed HCP dataset, rsfMRI scans of 1094 subjects were obtained from the project. To further minimize the unwanted effect of head motion on model training, we discarded the subject scans with framewise displacement (FD) over 0.3mm at any time of the scan. The FD was computed with fsl_motion_outliers function of the FSL [43]. There were 152 discarded scans from filtering out with the FD, and 942 scans were left. The 942 scans consisted of data from 531 female subjects and 411 male subjects. We paired each scan with the sex of the corresponding subject as an inputlabel for training the neural network.
IvB Graph Construction from Preprocessed Data
The ROIs are defined from the cortical volume parcellation by [44]. We used the 400 parcellations as in [45, 46]. Semantic region labels (e.g., Posterior cingulate cortex) and functional network labels (e.g., Default mode) corresponding to every parcels are provided with the dataset [44]. Vertices are defined as onehot vectors encoding the semantic region labels of the whole 400 ROIs. It can be said that no actual signal from the fMRI BOLD activity is represented in the vertex of the constructed graph.
To define the edges, functional connectivity matrix was constructed as follows. First, mean timeseries of cortical parcels were obtained by averaging the preprocessed fMRI data voxels within each ROIs. Functional connectivity is defined as the correlation coefficient of the pearson’s correlation between the timeseries of the two voxels. Thus, the connectivity matrix is constructed by computing the pearson’s correlation cofficient between every other ROIs. Derivation of the mean timeseries and the connectivity matrix was performed with the MATLAB toolbox GRETNA [47]. To derive an undirected, unweighted graph from the connectivity matrix, we threshold the connectivity matrix with sparsity by selecting the top percentile elements of the connectivity matrix as connected, and others unconnected. We experiment the sparsity values of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40%.
IvC Training details
We implement the GIN (Eq. (9)) for our main experiment. The concatenated graph features from all layers in (11) is mapped to the classifier output for predicting the onehot vector encoded groundtruth label of the graph , where and is a set of all possible class labels. Note that we omit the graph feature from the 0th layer when concatenating since it is the same onehot embedding of each predefined ROIs which have no difference between the subjects. The GIN is then trained to minimize the crossentropy loss :
(27) 
where the expectation is taken over the training data. For the gender classification in this paper, the classifier is binary, so we use .
Deep Graph Infomax (DGI) was introduced in [48] as an unsupervised method for the representation learning of the graph. The DGI learns the node representation by maximizing the mutual information between the node feature vectors and the corresponding graph feature . A discriminator that takes a pair of a node feature vector and a graph feature as input is trained to discriminate whether the two embeddings are from the same graph:
(28) 
Here, is a corrupted node feature vector, which is usually obtained by randomly selecting a node feature vector from another sample in the minibatch [48]. The DGI was first proposed as an unsupervised representation learning method, but [19] has made use of the DGI as a regularizer for the graph classification task.
Following the work by [19]
, we added the DGI loss as a regularizer with the expectation that maximizing the mutual information between the node features and the graph features can help extract better representation of the graph. Thus, the final loss function is defined as:
(29) 
where is the cross entropy loss in (27) and is defined in (28), respectively. In this paper, we coin the term Infomax regularization indicating the regularizer . After a number of preliminary experiments, the regularization weight was empirically set to 0.1. We performed 10fold crossvalidation of the 942 graphs following [49].
The GIN was implemented 5 layers deep with 64 hidden units in each layer. We apply 1dimensional batch normalization after each layers. Adam optimizer was used for 150 epochs of training with the learning rate of 0.01. Dropout was applied at the final linear layer with droprate of 0.5 during the training phase.
IvD Comparative Study
To investigate the optimality of the proposed method, we performed comparative study with existing methods. First, we compared the classification performance without applying the Infomax regularization. Specifically, we set in (29) for the training. This is to validate the importance of the regularization term. Second, we compared the classification performance when the input features were not encoded in onehot vectors. Specifically, we embedded the input features of each region as its centroid coordinate as in previous works [15, 16, 18, 19], and train the GIN with same model parameters. The centroid coordinates are defined as a threedimensional vector with each vector element representing the location of the axis R, A, and S.
V Experimental Results
Condition  Accuracy  Precision  Recall 

Sparsity 5%  
Sparsity 10%  
Sparsity 15%  0.836  
Sparsity 20%  0.804  0.840  
Sparsity 25%  
Sparsity 30%  
Sparsity 35%  
Sparsity 40%  
No regularizer  
Centroid input  
Va Classification results
The classification accuracy, precision, and recall are reported in Table
I. Highest accuracy of 0.804 was achieved by the 20% sparsity of the edges as connected. Precision was also highest when 20% of the edges were present, with the value of 0.840. Highest value of recall was 0.836 with the sparsity of 15%. Since the classification performance was best with the 20% sparsity, we report further results with the same 20% sparsity.To further validate the proposed ideas, we demonstrate the results of comparative studies. First, we compared the classification performance without applying the Infomax regularization. We set the edge sparsity to 20% and the regularization coefficient of (29) to zero. Training without Infomax regularization resulted in the accuracy , the precision , and the recall (Table I). The accuracy and the precision was lower, and the recall was comparable without the Infomax regularization. Performance gain from the Infomax regularization is suggested to be related with better learned representations in the latent space. Silhouette score of the latent space of the network was 0.0636 when trained without regularization, whereas the score was 0.0989 when trained with regularization. Second, we compared the classification performance when the input features were not encoded in onehot vectors. The results suggest that the classification performance is significantly affected by changing the input features. Silhouette score was 0.0210 for the latent space of the network with centroid coordinate input features.
We visualize the latent space of the networks trained for the comparative experiments with tSNE embedding in Fig. 4. It can be seen from the visualization that the latent space is not linearly separable before training the neural network, but shows a trend of linear separability by training the neural network, and further by adding the Infomax regularization. Training the neural network without onehot vector encoding of the input features results in an even more entangled latent space when visualized with the tSNE embedding.
VB Saliency mapping
The proposed saliency mapping was applied for visualizing the brain regions that are related to each class of sexes. We computed the saliency map using (26) for each test subject. The map was averaged across all subjects, and then was normalized by dividing the maximum absolute value of the whole vertices. This resulted in a normalized grouplevel saliency map with maximum absolute value of 1.0. We defined the grouplevel salient regions by thresholding the ROIs with the value over 0.7. Plotted image and the full list of ROIs of the salienct regions are reported in the Fig. 5, and the Table II, respectively.
Female  
Region  Network  R  A  S  Value  
L.  Lateral PFC  CCN  28  56  12  1.0 
L.  Somatomotor area  SMN  8  42  70  0.964 
L.  PCC  DMN  8  52  10  0.948 
R.  Temporal lobe  DMN  48  16  20  0.902 
R.  Somatomotor area  SMN  6  22  72  0.849 
R.  Parietal lobe  DMN  56  46  32  0.849 
R.  Lateral PFC  CCN  44  18  44  0.789 
L.  Temporal lobe  DMN  60  36  18  0.765 
L.  Temporal lobe  DMN  56  54  30  0.751 
R.  Medial PFC  DMN  6  42  10  0.733 
L.  PFC  DMN  8  42  52  0.724 
L.  PFC  DMN  6  44  6  0.723 
R.  Visual cortex  VN  22  60  6  0.713 
L.  Visual cortex  VN  18  64  6  0.709 
Male  
L.  Somatomotor area  SMN  8  42  70  1.0 
L.  Lateral PFC  CCN  28  56  12  0.873 
L.  PCC  DMN  8  52  10  0.838 
R.  Somatomotor area  SMN  6  22  72  0.802 
R.  Lateral PFC  CCN  44  18  44  0.771 
R.  Temporal lobe  DMN  48  16  20  0.769 
R.  Parietal lobe  DMN  56  46  32  0.740 
L.  Visual cortex  VN  18  64  6  0.731 
R.  Somatomotor area  SMN  4  80  24  0.731 
R.  Precuneus  DAN  8  72  52  0.726 
R.  Visual cortex  VN  22  60  6  0.705 
* Bold typeset indicates regions not overlapping with the reciprocal class 
The brain regions shown to be salient to the female class were the lateral prefrontal cortex (PFC) on both sides, the right medial PFC, the left PFCs, the bilateral somatomotor area, the left posterior cingulate cortex (PCC), the bilateral temporal lobe, the right parietal lobe, and the bilateral visual cortex. The functional networks that these brain regions comprise are the default mode network (DMN), the cognitive control network (CCN), the somatomotor network (SMN), and the visual network (VN). No brain regions from the dorsal attention network (DAN), the saliency/ventral attention network, and the limbic network were shown to be related to the classification of the female class. Among the functional networks, salient regions from the DMN constituted the majority with the ratio of 57.1% (Fig. (a)a). The other three networks, the CCN, the SMN, and the VN, constituted with the same ratio of 14.3%. Between the two hemispheres, salient regions were slightly more dominant in the left hemisphere (57.1%) than the right hemisphere (42.9%).
For the male class, salient regions were the lateral PFC on both sides, the bilateral somatomotor area, the left PCC, the right precuneus, the right temporal and parietal lobes, and the bilateral visual cortex. One region that comprise the DAN was salient in the male class, along with the regions from the DMN, the CCN, the SMN, and the VN. Dominance of the DMN was not as prominent as in the female class with the ratio of 27.3% (Fig. (b)b). The SMN had a comparable ratio to the DMN (27.3%). The CCN, the VN, and the DAN constituted with the ratio of 18.2%, 18.2%, and 9.1%, respectively. In contrast to the female class, the right hemisphere had the dominance with the ratio of 63.6% in the male class.
It can be noted from the results that there exists a significant overlap with an opposite trend of salient regions for classifying female subjects and male subjects. This trend is actually not very surprising, since well trained neural network can learn to output a linearly separable logits from the input.
Vi Discussion
In this study, we proposed a framework for analyzing the fMRI data with the GIN. The framework suggests on first constructing the graph from the semantic region labels and the functional connectivity between them. We train a GIN for classifying the subject phenotype based on the whole graph properties. After training, we can classify the subject with the trained GIN, or visualize the regions related to the classification by backpropagating through the trained GIN. An important theoretical basis that we found which underlie in this proposed method is that the GIN is not just a blackbox operation that aggregates the graph structure with the MLP, but is actually a dual representation of a CNN on the graph space where the adjacency matrix is used as a generalized shift operator.
Classification of sex based on the rsfMRI data resulted in the accuracy, precision, and recall of 0.804, 0.840, and 0.818, respectively (sparsity 20%). The performance of the classifier is comparable to recent studies for classifying sex based on the rsfMRI data [16, 51, 46]. Metric learning with siamese spectral GCN has achieved classification accuracy over 80% with the UK Biobank dataset [16]. Classification employing the partial least squares and bootstrap resampling has achieved the accuracy of 86.6% [51] with the HCP dataset. However, other processing steps were included for maximizing the classification accuracy, such as regressing out demographic confounds (age, blood pressure, etc.), and combining the data from multiple runs. Considering the same run as our experiment (first run of the HCP data) without regressing out the confounds, the accuracy is reported to be 79.2%, which is similar to our results [51]. The study that had the most similar processing of the fMRI data to ours was [46]. They have used only the first run of the HCP dataset, and extracted the fMRI mean timeseries from the ROIs including the 400 parcellation of [44]
. In that study, withinsample validation accuracy was reported to be 74.81% by applying a nonlinear support vector machine with radial basis function kernel
[46]. We agree with [46] that exploiting the data from multiple runs of one subject as in [51] is not very practical in that most rsfMRI experiments are finished with just one run. Also, we did not regress out any demographic covariates from the data, which can possibly affect the result of our main interest, the saliency mapbased fMRI analysis. Being apart from our goal, we did not go through extensive experiments to achieve better performance on the classification task. However, improving the classification performance of the GIN based methods can still be a very interesting topic in the future.After fully training the GIN for the sex classification task, we could map the salient regions related to the classification by the saliency mapping method. From the saliency mapping result, we could find the regions from the DMN to be taking the most prominant role in classifying the sex of the subject. Importance of the DMN in the sex classification based on rsfMRI data has been consistently reported [51, 46]. In the study by Zhang and colleagues [51], there were seven features involving the DMN of the top twenty important regions (35.0%) for sex classification. Considering the ratio of the DMN for the nine regions that were salient for both classes, our result show a similar (33.3%) ratio with the previous study [51]. This result is known to be related to the difference of the DMN functional connectivity between the two sexes during the restingstate [52]. Considering the difference of the DMN between the two sexes, it has been found consistently, and also from the metaanalysis, that the female individuals show stronger functional connectivity of the DMN compared to the males [53, 54, 55, 52, 51]. Interestingly, our study also replicated the different involvement of the within the DMN between the two sexes. In our results, dominant trend of the DMN was clearer for the female class. There were eight salient regions from the DMN among the fourteen regions, outnumbering salient regions from the other three networks (Table II). This higher dominance of the DMN in the female class (57.1%) than in the male class (27.3%) is suggested to be a replicative finding of the previous studies.
Not only the DMN is related to the sex difference in restingstate functional networks. Previous studies note that the male subjects exhibit a stronger functional connectivity within the sensorimotor networks when compared to female subjects [55, 51]. This trend was also true in our experiments. The salient regions found in the male class had a comparable dominance of the SMN to the DMN (27.3%), whereas this was not the case in the female class. it was found that the male class had a stronger dominance in the SMN when compared to the female class.
Along with the DMN and the SMN, the lateral PFC of the CCN was found to be involved in the classification of the two sexes. The CCN, also known as the frontoparietal network, is the network that takes part in the cognitive control function. There are evidences that there exists sex related difference of the CCN, especially in the lateral PFCs [56, 57]. To sum up, the relative importance of the functional networks in the both sexes well replicate the previous findings with different approaches, which suggests the validity of our proposed method.
Apart from the functional networks, hemisphere related sex differences are also previously reported [12, 57]. The studies indicate that female subjects show higher functional connectivity in the left hemisphere, and male subjects in the right hemisphere [12]. This difference in hemisphere dominance has shown the same trend in our experiment. In the female class, the the salient regions in the left hemisphere outnumbered the salient regions in the right hemisphere (left 57.1% vs. right 42.9%). In contrast, the male class resulted in the right hemisphere lateralized saliency mapping result (left 36.4% vs. right 63.6%). The hemisphere related sex differences also replicate the previous results, further supporting the validity of our method.
The result from the comparative study showed that embedding the input features into the threedimensional vector representing the centroid coordiates significantly degrades the classification performance of the GIN. We analyzed the mathematical reason using the explicit representation of the sensitivity map. The idea is further supported by evaluating the latent space of the GIN with the input features as the centroid coordinates. Visualization with tSNE showed highly entangled latent space, and the silhouette score was lower than the baseline. It can be said that embedding the input features into the onehot vector is essential, not only in that it enables the saliency mapping for neuroscientific interpretation, but also in that it maximizes the performance of the GIN. The other comparative study which removed the Infomax regularization resulted in lower accuracy and linearly separable latent space. Thus, we suggest to add the Infomax regularization for better results.
There are some limitations and caveats that needs to be discussed. First, the demographics that can affect the analysis have not been considered or controlled thoroughly. It is well known that the restingstate network can be affected by the age, handedness, fluid intelligence, and other subject charactersitics. The results are expected to have stronger explainability by taking the demographics of the subjects into account in the analysis. Second, the cutoff value for determining the salient region was heuristically set. The method would have even more validity if the salient regions were determined in a more datadriven way, as in the classical methods perform statistical testing to determine the significance of each voxels. We have not gone through extensive study on the topic of determining the significant regions from the saliency map, but is worth further studies and discussion.
Still, we insist that analyzing the fMRI data based on the GIN has shown its theoretical and experimental validity in this study. We believe that the GIN based analysis method offers a potential advancement in the area, by opening a way to exploit the capability of the GIN to learn highly nonlinear mappings. Some interesting topics related to this work can be considered. Thoeretically, exploring the operations beyond the twotab convolution filter by GIN can potentially provide better performance than the existing GIN. Neuroscientifically, extension of the method to clinical data interpretation or to the multiclass graph classification problem can be interesting topics in the future. With enough data assured, the proposed method is expected to help reveal new findings from the functional networks of the brain.
Vii Conclusion
We propose a framework for understanding the mathematical structure of GIN and analyzing the rsfMRI data. Our framework resulted in a comparable classification performance to the previous studies of sex classification with rsfMRI data. We theoretically deduced that the employed GIN model is indeed a dual representation of the classical CNN for the graph space where the adjacent matrix is defined as a shift operation. Based on this theoretical understanding, we extended the method to delineate the important brain regions for classification with saliency mapping. The saliency map derived from the trained GIN had consistent findings with previous functional neuroimage studies, validating the proposed method.
Acknowledgment
The authors would like to thank Sangmin Lee for his useful comments.
The derivation is a direct extension of the results in [27] for the GIN, but the detailed derivation is different due to the generalized definition of shift operation, so we include it here.
Consider the CNN representation of GIN in Fig. 1, where the convolution operation is defined using a generalized shift operation with the adjacency matrix. At the th layer, and denote the dimension of the signal, and the number of channels, respectively. Now, using the equality
where denotes the columnwise stacking of the matrix , (16) can be converted to a vector form:
(30) 
where the th layer encoder matrix is defined as
(31) 
and
is the diagonal matrix with 1 and 0 values depending on ReLU activation pattern
[27]. After the sumpooling operation, the corresponding feature vector from the th layer is then defined by(32) 
Now, the decoder layer before the logit operation is the average pooling and fully connected layer as shown in Fig. 1. This can be represented by
(33) 
where denotes the input vector, and the encoder expansion function is given by
and the decoder expansion function is
where is the nonlinearitydependent weight parameter that is dependent upon the input vector.
This expression has very important mathematical interpretation. One of the important requirement to become a neural network is that for the the same trained filter sets , the CNN representation should be adjusted to different data set. For the case of standard CNN, the adaptivity is originated from that the ReLU activation pattern changes depending on the input vector , and the resulting basislike representation varies accordingly. For the case of graph structure data, the adaptivity can be originated from two different ways. First, similar to the standard CNN, the input vector can be designed to reflect the graph structure. For example, threedimensional vector representing the centroid coordinates can be used as the input data as in [15, 16, 18, 19]. However, for the case of graph data, there is another way to make the CNN representation dependent on the data. More specifically, in our encoder matrix for GIN in (31), the generalized shift matrix is dependent on the specific data. Therefore, even with the same trained filter sets , the CNN representation varies accordingly. This interesting way of providing adaptivity is one of the unique features in the GNNs, which deserves further investigation in the future.
References
 [1] S. A. Huettel, A. W. Song, G. McCarthy et al., Functional magnetic resonance imaging. Sinauer Associates Sunderland, MA, 2004, vol. 1.
 [2] C. Honey, O. Sporns, L. Cammoun, X. Gigandet, J.P. Thiran, R. Meuli, and P. Hagmann, “Predicting human restingstate functional connectivity from structural connectivity,” Proceedings of the National Academy of Sciences, vol. 106, no. 6, pp. 2035–2040, 2009.
 [3] M. D. Greicius, K. Supekar, V. Menon, and R. F. Dougherty, “Restingstate functional connectivity reflects structural connectivity in the default mode network,” Cerebral cortex, vol. 19, no. 1, pp. 72–78, 2009.
 [4] B. Biswal, F. Zerrin Yetkin, V. M. Haughton, and J. S. Hyde, “Functional connectivity in the motor cortex of resting human brain using echoplanar MRI,” Magnetic resonance in medicine, vol. 34, no. 4, pp. 537–541, 1995.
 [5] M. D. Greicius, B. Krasnow, A. L. Reiss, and V. Menon, “Functional connectivity in the resting brain: a network analysis of the default mode hypothesis,” Proceedings of the National Academy of Sciences, vol. 100, no. 1, pp. 253–258, 2003.
 [6] M. P. Van Den Heuvel and H. E. H. Pol, “Exploring the brain network: a review on restingstate fmri functional connectivity,” European neuropsychopharmacology, vol. 20, no. 8, pp. 519–534, 2010.
 [7] D. S. Bassett and O. Sporns, “Network neuroscience,” Nature neuroscience, vol. 20, no. 3, p. 353, 2017.
 [8] D. S. Bassett and E. T. Bullmore, “Human brain networks in health and disease,” Current opinion in neurology, vol. 22, no. 4, p. 340, 2009.
 [9] Y. He and A. Evans, “Graph theoretical modeling of brain connectivity,” Current opinion in neurology, vol. 23, no. 4, pp. 341–350, 2010.
 [10] O. Sporns, “Graph theory methods: applications in brain networks,” Dialogues in clinical neuroscience, vol. 20, no. 2, p. 111, 2018.
 [11] J. Wang, X. Zuo, and Y. He, “Graphbased network analysis of restingstate functional mri,” Frontiers in systems neuroscience, vol. 4, p. 16, 2010.
 [12] L. Tian, J. Wang, C. Yan, and Y. He, “Hemisphereand genderrelated differences in smallworld brain networks: a restingstate functional mri study,” Neuroimage, vol. 54, no. 1, pp. 191–202, 2011.
 [13] S. Micheloyannis, E. Pachou, C. J. Stam, M. Breakspear, P. Bitsios, M. Vourkas, S. Erimaki, and M. Zervakis, “Smallworld networks and disturbed functional connectivity in schizophrenia,” Schizophrenia research, vol. 87, no. 13, pp. 60–66, 2006.
 [14] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
 [15] S. I. Ktena, S. Parisot, E. Ferrante, M. Rajchl, M. Lee, B. Glocker, and D. Rueckert, “Distance metric learning using graph convolutional networks: Application to functional brain networks,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2017, pp. 469–477.
 [16] ——, “Metric learning with spectral graph convolutions on brain connectivity networks,” NeuroImage, vol. 169, pp. 431–442, 2018.
 [17] G. Ma, N. K. Ahmed, T. Willke, D. Sengupta, M. W. Cole, N. TurkBrowne, and P. S. Yu, “Similarity learning with higherorder proximity for brain network analysis,” arXiv preprint arXiv:1811.02662, 2018.
 [18] X. Li, N. C. Dvornek, Y. Zhou, J. Zhuang, P. Ventola, and J. S. Duncan, “Graph neural network for interpreting taskfmri biomarkers,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2019, pp. 485–493.
 [19] X. Li, N. C. Dvornek, J. Zhuang, P. Ventola, and J. Duncana, “Graph embedding using infomax for asd classification and brain functional difference detection,” arXiv preprint arXiv:1908.04769, 2019.
 [20] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, R. G. Moreno, B. Glocker, and D. Rueckert, “Spectral graph convolutions for populationbased disease prediction,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2017, pp. 177–185.
 [21] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, R. Guerrero, B. Glocker, and D. Rueckert, “Disease prediction using graph convolutional networks: Application to autism spectrum disorder and alzheimer’s disease,” Medical image analysis, vol. 48, pp. 117–130, 2018.
 [22] T. He, R. Kong, A. J. Holmes, M. Nguyen, M. R. Sabuncu, S. B. Eickhoff, D. Bzdok, J. Feng, and B. T. Yeo, “Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics,” NeuroImage, p. 116276, 2019.
 [23] S. Arslan, S. I. Ktena, B. Glocker, and D. Rueckert, “Graph saliency maps through spectral convolutional networks: Application to sex classification with brain connectivity,” in Graphs in Biomedical Image Analysis and Integrating Medical Imaging and NonImaging Modalities. Springer, 2018, pp. 3–13.

[24]
B. A. Duffy, M. Liu, T. Flynn, A. Toga, A. J. Barkovich, D. Xu, and H. Kim,
“Regression activation mapping on the cortical surface using graph
convolutional networks,” in
International Conference on Medical Imaging with Deep Learning – Extended Abstract Track
, London, United Kingdom, 08–10 Jul 2019. [Online]. Available: https://openreview.net/forum?id=rJlhd1S0FE  [25] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” arXiv preprint arXiv:1810.00826, 2018.
 [26] T. N. Kipf and M. Welling, “Semisupervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.

[27]
J. C. Ye and W. K. Sung, “Understanding geometry of encoderdecoder CNNs,”
in
International Conference on Machine Learning
, 2019, pp. 7064–7073. 
[28]
D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending highdimensional data analysis to networks and other irregular domains,”
IEEE signal processing magazine, vol. 30, no. 3, pp. 83–98, 2013.  [29] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
 [30] W. Huang, T. A. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, “A graph signal processing perspective on functional brain imaging,” Proceedings of the IEEE, vol. 106, no. 5, pp. 868–885, 2018.
 [31] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” arXiv preprint arXiv:1312.6203, 2013.
 [32] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine LearningVolume 70. JMLR. org, 2017, pp. 1263–1272.
 [33] W. Huang, L. Goldsberry, N. F. Wymbs, S. T. Grafton, D. S. Bassett, and A. Ribeiro, “Graph frequency analysis of brain signals,” IEEE journal of selected topics in signal processing, vol. 10, no. 7, pp. 1189–1203, 2016.
 [34] K. Xu, C. Li, Y. Tian, T. Sonobe, K.i. Kawarabayashi, and S. Jegelka, “Representation learning on graphs with jumping knowledge networks,” arXiv preprint arXiv:1806.03536, 2018.
 [35] B. Weisfeiler and A. A. Lehman, “A reduction of a graph to a canonical form and an algebra arising during this reduction,” NauchnoTechnicheskaya Informatsia, vol. 2, no. 9, pp. 12–16, 1968.
 [36] N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “WeisfeilerLehman graph kernels,” Journal of Machine Learning Research, vol. 12, no. Sep, pp. 2539–2561, 2011.
 [37] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv preprint arXiv:1312.6034, 2013.

[38]
P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, and H. Hoffmann,
“Explainability methods for graph convolutional neural networks,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2019, pp. 10 772–10 781.  [39] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.M. H. Consortium et al., “The wuminn human connectome project: an overview,” Neuroimage, vol. 80, pp. 62–79, 2013.

[40]
G. SalimiKhorshidi, G. Douaud, C. F. Beckmann, M. F. Glasser, L. Griffanti, and S. M. Smith, “Automatic denoising of functional mri data: combining independent component analysis and hierarchical fusion of classifiers,”
Neuroimage, vol. 90, pp. 449–468, 2014.  [41] L. Griffanti, G. SalimiKhorshidi, C. F. Beckmann, E. J. Auerbach, G. Douaud, C. E. Sexton, E. Zsoldos, K. P. Ebmeier, N. Filippini, C. E. Mackay et al., “Icabased artefact removal and accelerated fmri acquisition for improved resting state network imaging,” Neuroimage, vol. 95, pp. 232–247, 2014.
 [42] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni et al., “The minimal preprocessing pipelines for the human connectome project,” Neuroimage, vol. 80, pp. 105–124, 2013.
 [43] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, and S. M. Smith, “Fsl,” Neuroimage, vol. 62, no. 2, pp. 782–790, 2012.
 [44] A. Schaefer, R. Kong, E. M. Gordon, T. O. Laumann, X.N. Zuo, A. J. Holmes, S. B. Eickhoff, and B. T. Yeo, “Localglobal parcellation of the human cerebral cortex from intrinsic functional connectivity mri,” Cerebral Cortex, vol. 28, no. 9, pp. 3095–3114, 2017.
 [45] R. Kashyap, R. Kong, S. Bhattacharjee, J. Li, J. Zhou, and B. T. Yeo, “Individualspecific fmrisubspaces improve functional connectivity prediction of behavior,” NeuroImage, vol. 189, pp. 804–812, 2019.
 [46] S. Weis, K. R. Patil, F. Hoffstaedter, A. Nostro, B. T. T. Yeo, and S. B. Eickhoff, “Sex Classification by Resting State Brain Connectivity,” Cerebral Cortex, 06 2019, bhz129. [Online]. Available: https://doi.org/10.1093/cercor/bhz129
 [47] J. Wang, X. Wang, M. Xia, X. Liao, A. Evans, and Y. He, “Gretna: a graph theoretical network analysis toolbox for imaging connectomics,” Frontiers in human neuroscience, vol. 9, p. 386, 2015.
 [48] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep graph infomax,” arXiv preprint arXiv:1809.10341, 2018.
 [49] G. Varoquaux, P. R. Raamana, D. A. Engemann, A. HoyosIdrobo, Y. Schwartz, and B. Thirion, “Assessing and tuning brain decoders: crossvalidation, caveats, and guidelines,” NeuroImage, vol. 145, pp. 166–179, 2017.

[50]
B. Thomas Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari,
M. Hollinshead, J. L. Roffman, J. W. Smoller, L. Zöllei, J. R. Polimeni
et al.
, “The organization of the human cerebral cortex estimated by intrinsic functional connectivity,”
Journal of neurophysiology, vol. 106, no. 3, pp. 1125–1165, 2011.  [51] C. Zhang, C. C. Dougherty, S. A. Baum, T. White, and A. M. Michael, “Functional connectivity predicts gender: evidence for gender differences in resting brain connectivity,” Human brain mapping, vol. 39, no. 4, pp. 1765–1776, 2018.
 [52] L. E. Mak, L. Minuzzi, G. MacQueen, G. Hall, S. H. Kennedy, and R. Milev, “The default mode network in healthy individuals: a systematic review and metaanalysis,” Brain connectivity, vol. 7, no. 1, pp. 25–33, 2017.
 [53] R. L. Bluhm, E. A. Osuch, R. A. Lanius, K. Boksman, R. W. Neufeld, J. Théberge, and P. Williamson, “Default mode network connectivity: effects of age, sex, and analytic approach,” Neuroreport, vol. 19, no. 8, pp. 887–891, 2008.
 [54] B. B. Biswal, M. Mennes, X.N. Zuo, S. Gohel, C. Kelly, S. M. Smith, C. F. Beckmann, J. S. Adelstein, R. L. Buckner, S. Colcombe et al., “Toward discovery science of human brain function,” Proceedings of the National Academy of Sciences, vol. 107, no. 10, pp. 4734–4739, 2010.
 [55] E. A. Allen, E. B. Erhardt, E. Damaraju, W. Gruner, J. M. Segall, R. F. Silva, M. Havlicek, S. Rachakonda, J. Fries, R. Kalyanam et al., “A baseline for the multivariate comparison of restingstate networks,” Frontiers in systems neuroscience, vol. 5, p. 2, 2011.
 [56] M. Filippi, P. Valsasina, P. Misci, A. Falini, G. Comi, and M. A. Rocca, “The organization of intrinsic brain activity differs between genders: A restingstate fmri study in a large cohort of young healthy subjects,” Human brain mapping, vol. 34, no. 6, pp. 1330–1343, 2013.
 [57] H. Hjelmervik, M. Hausmann, B. Osnes, R. Westerhausen, and K. Specht, “Resting states are resting traits–an fmri study of sex differences and menstrual cycle effects in resting state cognitive control networks,” PloS one, vol. 9, no. 7, p. e103492, 2014.