I Introduction
Aanalysis of brain functional connectivity (FC) networks inferred from functional magnetic resonance imaging (fMRI) data has become an important method to probe largescale functional organization of the human brain in health and disease [2]. Considerable evidence from restingstate fMRI (rsfMRI) studies has shown altered or aberrant brain functional connectome in various neuropsychiatric and neurodegenerative disorders [49], e.g., schizophrenia [45], autism spectrum disorder (ASD) [29], Alzheimer’s disease (AD) [53], suggesting potential use of networkbased biomarkers for clinical diagnostics [14]. Functional abnormalities are detected not only in the strengths of individual connections but also topological structure of restingstate FC networks [2]. The brain function in major depressive disorder (MDD) — the most prevalent psychiatric disorder with pervasive depressed mode, cognitive inability and suicidal tendency, has been a subject of intensive studies recently. It is increasingly understood as a networkbased disorder with consistent alternations in FC patterns [28]. Disrupted restingstate FC from fMRI has been found in MDD core networks, such as the default mode network (DMN) related to selfreferential processing and emotion regulation, central executive network (CEN) for attention and working memory, and other subcortical circuitries [4]. Increased connectivity within DMN [13] and decreased connectivity between DMN and CEN have been observed in MDD patients compared to healthy controls (HCs) [28]. Graph theoretical analyses of rsfMRI also revealed altered network topological properties in MDD, e.g., enhanced global efficiency [55] and high local efficiency and modularity [51].
Machine learning techniques have been increasingly used in turning altered brain FC into biomarkers for fast and automated classification of brain disorders [48]
. Vast majority of studies use traditional machine learning algorithms for classification, such as support vector machine (SVM), logistic regression and linear discriminant analysis (For review see
[6, 10, 9]). Compared to other disorders, functional connectomebased classification of MDD is relatively unexplored. Several recent studies [52, 8, 3, 12, 57]have employed SVMs combined with some adhoc feature selection methods to differentiate MDD from HCs using rsfMRI FC, and obtained reasonable classification accuracies on leaveonesubjectout crossvalidation.
Deep learning methods have received significant interest in fMRIbased classification of brain disorders [37]
. In recent applications to connectomebased classification, it has shown great potential providing substantial gain in performance over traditional classifiers. Deep neural networks (DNNs) can automatically learn a hierarchy of representations directly from the connectome data, without relying on preliminary feature handcrafting and selection. Fullyconnected DNNs have been used as autoencoders (AE) to map highdimensional input vectors of FC metrics to latent compact representations for rsfMRI classification of ASD
[17, 38] and schizophrenia [21]. Inspired by remarkable success in image and object classification, deep convolutional neural networks (CNNs) have also been used to learn spatial maps of brain functional networks. A CNN architecture (BrainNetCNN) with speciallydesigned convolutional filters for modeling connectome data was introduced by [20] for predicting neurodevelopment in infants. Various variants of connectome CNNs were subsequently proposed for FC classification. These include onedimensional (1D) spatial convolutional filters on rsfMRI FC data for mild cognitive impairment (MCI) identification [27], 2DCNNs for FC matrices for ASD classification [42], 3D CNNs to combine static and dynamic FC for early MCI detection [19], and multidomain connectome CNN to integrate different brain network measures [36]. The abovementioned deep learning models generally neglect the topological information of the brain networks which may lead to suboptimal performance in brain disorder identification. The flattening of input FC maps in fullyconnected DNNs destroys the spatial structure, while the use of fixed 1D or 2D regular grid convolution operators in CNNs also fails to capture the graphstructured connectome data. Brain networks typically exhibit irregular structure with nodes being unordered and connected to a different number of neighbors, which renders convolution operations for regular grid inappropriate for modeling graphs.Extending deep learning approaches to data in nonEuclidean domain, including graphs, is a rapidly growing field [50]. One popular graphbased neural network (GNN) architecture, the graph convolutional networks (GCNs), generalizes operations in CNNs to learn local and global structural patterns in irregular graphs. A spectralbased GCN has been proposed to perform convolutions in the graph spatial domain as multiplications in the graph spectral domain [7, 23]. Applications of spectral GCNs to brain disorder detection from brain functional networks are introduced only recently and in its very early stage, e.g., for predicting ASD and conversion from MCI to AD [33, 32, 26, 18]
. These studies used a population graph as input to GCN, where nodes represent subjects with associated restingstate FC feature vectors, while phenotype information is encoded as graph edge weights. However, this approach inherently relies on nonimaging data to construct graphs and requires prior knowledge of relevant phenotype information for specific disorders. Moreover, it is semisupervised learning using all subjects (both training and testing sets) as inputs and thus lacks generalization on unseen subjects. A recent benchmarking study
[15] also showed that populationbased spectral GCN is less effective than the BrainNetCNN in restingstate FCbased behavioral prediction.In this paper, we propose a novel framework based on deep GNN for graph embedding on brain functional networks for classifying neuropsychiatric disorders associated with functional dysconnectivity. Precisely, we develop a graph autoencoder (GAE) architecture that leverages GCN to encode the nonEuclidean information about brain connectome into lowdimensional latent representations (or network embeddings), on which a decoder is trained to reconstruct the graph structure. The learned embeddings allow dimensionality reduction of largesized brain network data, and preserves the network topological structure and node content information used for subsequent connectomebased classification. The extracted patterns by the multiple graph convolutional layers in GCNs can include highlevel representations of nodes’ local graph neighborhood. Besides the unsupervised embedding learning using GAE, we also consider supervised learning where the model makes use of disorder class labels to optimize the embeddings. Finally, a readout layer is added to summarize the node representations of each graph into a graph representation, which is then used as feature inputs to a fullyconnected DNN (FCNN) for network classification. We apply the proposed GAEFCNN to rsfMRI data for classification of MDD and HCs using wholebrain FC networks. The GAEDNN is trained on highdimensional functional networks constructed from rsfMRI using LedoitWolf (LDW) covariance estimator
[25]. We also explore different types of node features: fMRI time series, associated FC edges and local graph measures. The main contributions of this work are summarized as follows:
We propose, for the first time, a graph deep learning framework for brain FCbased identification of MDD.

The proposed GAEFCNN framework offers a novel approach to directly leverage on the alterations in network structure for brain disorder classification via the learned network embeddings. The GCNbased GAE architecture provides a purely unsupervised way to learn embeddings that encode the irregular topological structure of brain networks, which are inadequately modeled by the connectome CNNs and the vectorized FC features in population graphs. The GAE combined with a deep DNN facilitates graphlevel classification to predict class labels for the entire brain graph, rather than node/subjectlevel classification based on population graphs.

We demonstrate that our approach outperforms both the BrainNetCNN and populationbased GCN by a large margin in identifying MDD based on restingstate functional brain networks from fMRI.

We show that higherorder network reconstructed from nodes embeddings learned by the proposed GCNbased GAE can reveal differences in network organization between MDD and HCs related to emotion processing.
Ii rsfMRI Dataset for MDD
Ii1 Subjects & Data Acquisition
We used a restingstate fMRI MDD dataset collected at the Duke University Medical Center, USA, studied previously in [1, 47]. The current analyses consist of 43 subjects, including 23 nondepressed (HC) and 20 depressed (MDD) participants aged between 20 and 50 years old. Depressed participants had met the Diagnostic and Statistical Manual of Mental Disorder (DSMIV) criteria of MDD, as assessed by the MiniInternational Neuropsychiatric Interview (MINI, version 5.0) [41] and interview with a study psychiatrist. Participants were scanned on a Siemens 3.0T Trio Tim scanner, with an 8channel head coil. Echoplanar BOLD functional resting scans were acquired with transverse orientation (TR/TE = 2000/27 ms, voxel size = 4.0 4.0 4.0 mm, 32 axial slices). A time series of 150 volumes were collected for each scan.
Ii2 Preprocessing
Standard preprocessing steps were applied to the fMRI data using Conn toolbox (version 15.g) in SPM 12, including motion correction, slice timing correction, coregistration of functional and anatomical images, normalization to the standard MNI (Montreal Neurological Institute) template, and Gaussian spatial smoothing with FWHM (full width at half maximum) = 6 mm. The fMRI data were bandpass filtered between 0.010.07 Hz. The automated anatomical labeling (AAL) atlas was used to obtain an anatomical parcellation of the wholebrain into 116 regions of interest (ROIs), and ROIwise fMRI time series were extracted by averaging over voxels.
Iii Methods
Fig. 1 shows an overview of the proposed GAEFCNN framework for identifying brain disorders using fMRIbased functional brain networks, which consists of three stages: (1) Network construction. Highdimensional FC networks are constructed from fMRI data using LDW shrinkage covariance estimator, and associated node attributes/features are extracted. (2) Network embedding via a GCNbased GAE. The GAE learns network embeddings by using an encoder of stacked GCNs to map the input graph structure and node content of FC networks into latent representation (or embeddings), and using an innerproduct decoder to enforce embeddings to preserve graph topological information. (3) Network classification. The learned network embeddings are then used as inputs to a fullyconnected DNN to discriminate between MDD patients and HCs. We develop an unsupervised (Fig. 1(a)) and a supervised (Fig. 1(b)) framework for learning graph embeddings in brain networks.
Iiia Connectivity Network Construction
We consider an undirected graph of brain functional network for each subject, represented by where is a set of nodes (voxels or ROIs) and denotes the connectivity edge between nodes and . The topological structure of the graph can be represented by an adjacency matrix , where if nodes and are connected, otherwise . We denote by the node feature matrix for , with representing the content feature vector associated with each node .
IiiA1 Network Connectivity
In constructing FC networks, we compute the FC matrix based on the temporal correlations of fMRI time series between pairs of ROIs. Let , be the fMRI time series of length measured from the ROIs. For largesized fMRIderived networks in which the number of nodes is larger or comparable to the number of scans , traditional sample correlation matrix is no longer a reliable and accurate estimator of FC. This is due to large number of correlation coefficients (i.e., ) to be estimated relative to the sample size. This condition applies to the MDD fMRI data considered here ( and ROIs). To estimate functional connectomes efficiently, we use the LedoitWolf (LDW) regularized shrinkage estimator [25, 5] which can yield wellconditioned FC estimates in highdimensional settings when the ratio of is large. The LDW covariance estimator is defined by with , where is a shrinkage parameter, is a identity matrix and is the sample covariance matrix. The correlation matrix is then computed as where
We can generate the adjacency matrix by thresholding the correlation matrix . We used the proportional thresholding [44] which sets a proportion of strongest connections (with the highest absolute correlation values) of the derived FC matrix for each individual network to 1, and other connections to zero. By applying a proportional threshold value of , the number of retained links/edges in a graph is . This approach will result in a fixed density of edges in graphs across all subjects, and thus enabling meaningful comparison of network topology between different groups and conditions. It can also generate more stable network metrics compared to the absolute thresholding [11]. It has been shown that the setting of threshold has a significant impact on the overall performance of the network classification model [18]. Besides, when decreases, networks become sparser and may lead to the zerodegree nodes (isolated nodes totally disconnected from the rest of the graph). By evaluating over a range of thresholds, was chosen to generate graphs without zerodegree nodes for all subjects and give the optimal classification performance.
IiiA2 Node Features
We consider three types of node features for . (1) raw rsfMRI time series associated with each node which can capture spontaneous fluctuations in the BOLD signal in individual brain regions. (2) FC weights of edges connected to each node, i.e., each row of the LDWestimated correlation matrix. (3) Graphtheoretic measures to characterize graph topological attributes at local (nodal) level. A list of 18 different nodal graph measures [40]
was extracted for each individual node, including degree, eigenvector centrality, modularity, PageRank centrality, nodal eccentricity, community Louvain, module degree zscore, participation coefficient, routing efficiency, clustering coefficient, diversity coefficient, gateway coefficient (node strength), gateway coefficient (betweenness centrality), local assortativity, participation coefficient, node strength, node betweenness, and global efficiency.
IiiB Graph Convolutional Autoencoder
We propose a new approach that builds on the graph autoencoder (GAE) [24, 31] to learn graph embeddings on brain networks in a purely unsupervised framework. Given the brain network , the autoencoder maps the nodes to lowdimensional vectors (or embeddings), using an encoder where with the dimension of embedding, and then reconstruct the graph structure from the embeddings using a decoder. The learned latent representations should reflect the topological structure of the graph and the node content information . It contains all the information necessary for downstream graph classification tasks for brain disorders. We consider two variants of GAE: (1) Generic GAE which aims to reconstruct the original input graph adjacency matrix, (2) Variational GAE (VGAE) [24], a variational extension of GAE to learn the distribution of embeddings, which could prevent potential model overfitting.
IiiB1 Graph Convolutional Encoder Model
To encode both graph structure and node content into in a unified way, we employ a variant of graph convolutional network (GCN) [23] as the graph encoder of GAE. The GCN is a firstorder approximation of graph convolutions in the spectral domain. We consider a multilayer GCN which learns a layerwise transformation by a spectral graph convolutional function as follows
(1) 
where is the latent feature matrix after convolution at th layer of GCN, is a layerspecific trainable weight matrix. Here, is the input node feature matrix. The propagation for each layer of the GCN can be calculated as
(2) 
where is normalized adjacency matrix with added selfconnections to ensure numerical stability, is a node degree matrix with diagonals , and
denotes the activation function. Model (
2) generates embeddings for a node by aggregating feature information from its local neighborhood at each layer.We construct the graph convolutional encoder based on twolayered GCN as in [31]
(3)  
(4) 
which produces latent representation with the following forward propagation
(5) 
where , and are ReLU(·) and linear activation functions in first and second layers, respectively.
In the VGAE, variational graph encoder is defined by an inference model parameterized by a twolayer GCN [24]
(6)  
(7) 
Here, the embeddings
are generated according to a normal distribution with mean
and variance
. is the matrix of mean vectors defined by the GCN encoder output in (5), and is defined similarly as using another encoder.IiiB2 Decoder Model
The decoder of GAE aims to decode graph structural information from the embeddings by reconstructing the graph adjacency matrix. The GAE decoder model predicts the presence of a link between two nodes based on the inner product between latent vectors of
(8)  
where
is the logistic sigmoid function. The graph adjacency matrix can be reconstructed as
.IiiB3 Optimization
The GAE is trained by minimizing the reconstruction error of the graph by
(9) 
Since the groundtruth adjacency matrix is sparse, and the minimization is constrained to the nonzero elements of (i.e., ). For the VGAE, we maximize the variational lower bound w.r.t the parameters
(10) 
where
is the KullbackLeibler divergence function that measures the distance between two distributions. We use a Gaussian prior
.IiiC GAEFCNN for Network Classification
We design a GAEFCNN framework for brain connectome classification by combining the GAE with a fullyconnected DNN (FCNN). A readout layer is added to summarize latent node representations learned by the GAE for each graph into graphlevel representations, which are then fed into an FCNN to classify individual networks into MDD and HC.
IiiC1 Graph Embeddings Vectorization (Readout)
We apply a readout operation on the network node representations to generate higher graphlevel representations. In the readout layer, a vector representation of the graph can be learned by aggregating all individual node embeddings in the graph via some statistical summary measures
(11) 
where is the index of the last graph convolutional layer. The graph embedding can then be used to make predictions about the entire graph. The mean/max/sumbased embeddings can be used individually or concatenated into a single vector to capture different graphlevel information. In addition, to retain embedding information for all nodes, we also compute the graph embedding as by flattening of .
IiiC2 FCNN Classifier
The graph vector embeddings
are then used as inputs to a deep FCNN for networklevel classification. The FCNN classifier consists of multiples fullyconnected/dense layers, plus a final softmax classification layer to output the predictive probabilities of class labels for each network. The dense layer approximates a nonlinear mapping function to further capture relational information in the graph embeddings to discriminate between MDD and HC. The weight parameters of the FCNN are trained by minimizing crossentropy loss function using stochastic gradient descent methods and backpropagation of error. Dropout is also applied to prevent overfitting
[43].IiiC3 Supervised & Unsupervised Embedding Learning
We consider two classification schemes using the network embeddings learned in supervised and unsupervised ways. The proposed encoderdecoder framework (Fig. 1(a)) to extract network embeddings described thus far is by default unsupervised, i.e., the GAE is trained to reconstruct the original graph structure. We further develop a supervised framework, as shown in Fig. 1(b), which utilizes the taskspecific classification labels in order to learn the network embeddings. The innerproduce decoder in the supervised model is replaced with an FCNN to decode the embeddings from the output of GCN encoder to class labels. The parameters of the GCN encoder can be trained based on crossentropy loss between the predicted and true class labels using the backpropagation algorithm. By incorporating taskspecific supervision, the encoder model is optimized to generate embeddings that may be more discriminative of the MDD and HC classes. This model provides an endtoend framework for the brain network classification.
Iv Experiments
In this section, we present experimental evaluation of the proposed GAEDNN models for connectome classification on the rsfMRI MDD dataset described in Section II.
Classifier  Adjacency Matrix  Node Feature  Readout  Acc  Sen  Pre  F1 

Unsupervised GAEFCNN  Pearson (0.25)  RawfMRI  [mean,max,sum]  35.00 2.04  40.00 12.25  33.00 4.76  35.60 7.47 
Graphmeasures  flatten  57.50 13.82  30.00 29.15  50.00 44.72  33.33 28.60  
PearsonFC  [mean,max,sum]  58.06 9.15  70.00 18.71  56.67 9.33  60.31 6.44  
LDW (0.4)  RawfMRI  [mean,max,sum]  60.56 4.36  45.00 33.17  48.10 24.85  44.07 25.38  
Graphmeasures  [mean,max,sum]  69.72 9.06  55.00 18.71  80.00 18.71  61.33 14.04  
LDWFC  flatten  72.50 10.77  60.00 20.00  80.00 18.71  65.14 17.20  
Unsupervised VGAEFCNN  Pearson (0.25)  RawfMRI  flatten  57.50 22.08  60.00 30.00  52.67 21.87  55.49 25.18 
Graphmeasures  flatten  58.33 20.24  55.00 33.17  50.17 28.68  50.63 28.68  
PearsonFC  flatten  64.72 8.94  65.00 12.25  62.33 8.27  63.10 8.65  
LDW (0.4)  RawfMRI  [mean,max,sum]  65.28 6.33  55.00 18.71  63.67 8.33  57.86 13.96  
Graphmeasures  flatten  58.33 14.91  60.00 25.50  55.33 12.58  55.43 17.08  
LDWFC  [mean,max,sum]  55.83 8.07  60.00 20.00  52.00 7.48  54.22 13.23  
Supervised GCNFCNN  Pearson (0.25)  RawfMRI  flatten  57.78 20.14  65.00 25.50  57.22 19.61  58.74 18.32 
Graphmeasures  [mean,max,sum]  62.78 16.63  50.00 31.62  51.67 34.32  49.86 31.49  
PearsonFC  flatten  56.11 10.30  60.00 33.91  48.89 31.70  49.64 24.94  
LDW (0.4)  RawfMRI  flatten  53.61 15.31  25.00 15.81  51.67 40.96  32.05 21.68  
Graphmeasures  [mean,max,sum]  48.61 9.86  45.00 33.17  38.89 22.22  39.45 22.75  
LDWFC  flatten  62.50 9.54  60.00 25.50  61.67 10.00  57.86 13.96 
. Results are averages (standard deviations) of performance measures over 5fold crossvalidation.
Classifier  Readout  Acc  Sen  Pre  F1 

Unsupervised GAEFCNN  flatten  72.50 10.77  60.00 20.00  80.00 18.71  65.14 17.20 
mean  32.22 10.66  40.00 33.91  24.56 14.81  29.75 20.40  
max  46.39 11.64  55.00 24.49  45.56 12.37  47.45 11.92  
sum  39.44 16.70  35.00 12.25  37.33 18.03  35.87 14.66  
mean,max,sum  43.89 12.47  45.00 18.71  40.00 12.25  42.00 14.35  
Unsupervised VGAEFCNN  flatten  55.56 15.32  65.00 25.50  51.67 15.28  56.43 17.72 
mean  50.56 17.07  45.00 24.49  39.67 24.37  41.89 24.19  
max  55.83 17.00  60.00 33.91  52.46 33.69  51.55 26.63  
sum  51.39 9.86  50.00 22.36  55.00 24.49  47.00 13.27  
mean,max,sum  55.83 8.07  60.00 20.00  52.00 7.48  54.22 13.23  
Supervised GCNFCNN  flatten  62.50 9.54  60.00 25.50  61.67 10.00  57.86 13.96 
mean  57.50 13.82  65.00 33.91  45.57 25.21  52.58 27.09  
max  50.83 7.01  25.00 31.62  20.00 24.49  22.00 27.13  
sum  56.39 13.54  50.00 35.36  40.33 24.64  44.33 27.78  
mean,max,sum  44.44 6.09  50.00 27.39  40.00 8.16  42.76 14.38 
Iva Experimental Setup
IvA1 Data Partitioning
We applied a nestedstratified 5fold crossvalidation (CV) data partitioning strategy/scheme [35] to evaluate the performance of different models in classifying MDD and HC. Specifically, a twolevel 5fold CV was used comprising an outerloop for testing and an innerloop for model hyperparameter optimization. For each iteration in the outerloop, a test set was assigned, and the rest of the data were split into five trainvalidation partitions to tune the model hyperparameters. This process was repeated for all outerloop 5fold partitions. The best performing model (on the validation set) of the five candidate models was then selected to evaluate the performance on the unseen test sets. Four binary classification metrics were used to evaluate the classification performance, i.e., classification accuracy (), sensitivity (), precision (
), and Fscore (
).Classifier  Acc  Sen  Pre  F1  

Competing  SVMRBF  50.83 7.01  15.00 20.00  16.67 21.08  15.71 20.40 
BrainNetCNN  51.11 4.16  45.00 36.74  28.57 23.47  34.91 28.57  
Populationbased GCN  55.56 11.65  45.00 18.71  58.57 20.90  47.58 12.84  
Proposed  Supervised GCNFCNN  62.50 9.54  60.00 25.50  61.67 10.00  57.86 13.96 
Unsupervised GAEFCNN  72.50 10.77  60.00 20.00  80.00 18.71  65.14 17.20  
Unsupervised VGAEFCNN  65.28 6.33  55.00 18.71  63.67 8.33  57.86 13.96 
IvA2 Model Architecture and Training
We implement the proposed GAEFCNN based on Pytorch
[34] using the GraphConv module from DGL library [46] for GCN. For the unsupervised model, the architecture and hyperparameters of GAE and FCNN were determined separately. We computed the reconstruction error of graph over a range of hyperparameters for the GAE, and a twolayered GCN with respective embedding dimensions of 64 and 16 was identified as the optimal architecture for both GAE/VGAE with the minimum reconstruction error. Further increase in the number of GCN layers gave no further improvement. Using the extracted network adjacency matrices and node feature matrices (dimension depends on type of features used) as inputs, the GAEs were trained using Adam optimizer [22] to minimize graph reconstruction loss, with learning rate of , reducefactor of , 200 training epochs and a batch size of 8. Fig. 2 illustrates a training curve of the GAE model with decreasing reconstruction error over epochs. The trained GAE decoder was then used to generate node embedding matrices as inputs to the FCNN. Bayesian optimization [16] with expected improvement acquisition function was used to optimize the hyperparameters of FCNN, which suggested an architecture of 3 dense layers (with respective 256, 256, 128 hidden nodes), learning rate of , reduce factor of and a batch size of 4. The FCNN was also trained on the extracted graph embeddings using Adam algorithm.For the supervised model, the hyperparameters of the GCN and FCNN were optimized simultaneously using the Bayesian optimizer. The selected hyperparameters are: 1 convolutional layer with dimension of 94 for GCN, 2 dense layers (with 128, 264 hidden nodes) for FCNN with learning rate of , reduce factor of and a batch size of 6. A dropout ratio of 0.2 was also chosen for the dense layers. The model was trained on the fMRI network data with target class labels, using the Adam algorithm to minimize crossentropy loss.
IvA3 Methods for Comparison
We benchmark the classification performance of the proposed methods with a traditional SVM classifier and two stateoftheart connectomespecific DNN models: BrainNetCNN and population graphbased GCN. These competing models were evaluated with the same 5fold CV as the proposed methods.

SVMRBF
: We trained SVM with radial basis function (RBF) on the vectorized LDWcorrelation coefficients. The hyperparameters of SVM were optimized based on a gridsearch on the validation set.

BrainNetCNN: The BrainNetCNN [20] is a specially designed deep CNN model which can preserve spatial information in brain connectivity data. Here, the LDW correlation matrices were used directly as inputs to the BrainNetCNN to predict the class labels of MDD and HC as output. It consists of three types of layers: edgetoedge (E2E) layers, edgetonode (E2N) layers, and nodetograph (N2G) layers. The E2E layer applies a crossshaped convolution filter to each element of the FC input matrix, and combines the edge weights of neighbor nodes to output an matrix. The E2N layer is equivalent to the 1DCNN filter designed for dimensionality reduction. The N2G layer is a dense layer taking the E2N output to produce a single scalar. Finally, the output of N2G is fed to classification layer for prediction.

Populationbased GCN: This method exploits the GCN to model a population graph, where each node represents a subject and edges encode similarity between subjects [23]. It performs node/subject levelclassification in a semisupervised manner to predict brain disorders. Similar to [23], we used the vectorized upper triangular part of LDW correlation matrices of all subjects as inputs to the populationbased GCN. We set the default model hyperparameters with Chebyshev polynomial basis filters for spectral convolutions as in [23]. The model was trained using 500 epochs with early stopping patience of 10 epochs.
IvB Results
IvB1 Comparison of Network Construction Strategies
Table I shows the classification performance (average and standard deviation over 5 folds) of the unsupervised GAE/VGAEFCNN and supervised GCNFCNN classifiers. To investigate the impact of choices of network construction strategies on classification, we also evaluated two FC metrics to construct the graph adjacency matrix : Pearson’s correlation matrix and LDW shrinkage correlation matrix; three types of input node features for : raw rsfMRI time series, FC weights (LDW correlation coefficients) and nodal graphtheoretic measures. The selected readout schemes are also given, and details will be discussed in the next section. As expected, using input graph data based on the LDW correlations shows superior performance over the traditional Pearson’s correlations in classifying MDD and HC for all classification models, as the LDW shrinkage method can provide more reliable estimate of the highdimensional network structure. For node features, the use of LDWFC generally provided better classification than the raw fMRI time series and local graph measures. This indicates more discriminative information in the connection weights compared to the lowlevel BOLD fluctuations, and learning of higherlevel meta representations from local graph features also fails to offer additional advantages for classification.
We can see that the unsupervised GAE/VGAEFCNNs performed better than the supervised GCNFCNN model, with GAEFCNN achieving the highest classification accuracy when using LDWFC for both the graph construction and node features. This suggests that embeddings learned in an unsupervised manner to preserve faithfully the brain network topology can be more predictive of MDD and HC than that optimized to discriminate the class labels directly. Among the unsupervised models, however use of the probabilistic encoding framework in VGAE does not improve classification performance, probably limited by the strong assumption of an i.i.d. Gaussian prior on latent embeddings, and the approximated model parameter inference of the variational method. Future work will investigate bettersuited prior distribution in the VGAE for brain network data.
IvB2 Comparison of Readout Strategies
Table II shows the classification results for different readout strategies. We compared different readout/transformation methods to obtain graphlevel representation as inputs to FCNN classifier, i.e., flattening of and mean/max/sum aggregation of node embeddings . It can be seen that the flattening method by concatenating learned embeddings of all nodes as input yields better classification performance for different classifiers generally, compared to the aggregation method which may induce loss of information about individual nodes.
IvB3 Comparison with StateoftheArt Methods
Table III shows the performance comparison of different connectomebased classification methods. The proposed methods clearly outperformed the competing models by a large margin, with the unsupervised GAEFCNN performing the best. In consistency with recent studies, our results suggest the advantages of DNN methods over traditional SVM classifier with significant improvement in FC classification. The populationbased GCN performs slightly better than the BrainNetCNN. The populationbased GCN, while leveraging on pairwise associations between subjects in a population graph for node/subjectlevel classification, does not classify brain networks directly as in our proposed models. The use of gridwise convolutions in the BrainNetCNN, despite its capability to capture spatial information of neighboring nodes, fails to account for irregular structure of brain networks. The superior performance of GAEFCNN models compared to the two DNNs implies that incorporating network topological structure as captured by the network embeddings for classification can provide discriminative information for identifying MDD, a disorder associated with disrupted brain networks. The proposed framework achieved the best accuracy of when using the unsupervised GAEFCNN on a challenging task of classifying MDD brain networks, based on 5fold CV on a small dataset, in contrast to the leaveonesubjectout classification in previous studies.
IvB4 Connectivity Maps Learned by GAE
In Fig. 3, we plot the averaged feature maps of nodelevel embeddings learned by the GCNGAE from LDWbased networks for the MDD and HC subjects. Noticeable difference in the learned embedding pattern can be seen between the two groups, with stronger activation for some ROIs in MDD compared to HC. This demonstrates the ability of the proposed model to extract latent representations of brain network structure that can clearly distinguish between MDD and controls, which explains the enhanced performance in the downstream classification task compared to other methods.
We further constructed higherorder FC by correlating the GAElearned embeddings between pairs of nodes. Fig. 4 shows the difference in connectivity pattern between the MDD and HC groups as quantified by the LDWestimated raw FC and the embeddingbased higherorder FC. A grouplevel ttest was used to contrast the FC between the two groups, and connections with significant difference <0.05) are shown in Fig. 4(right). The embeddingbased FC matrices (Fig. 4(left & middle)) exhibit an apparent block structure revealing the modular organization, an important property of brain networks. Compared to raw FC, it is evident that the embeddingbased FC detected more pronounced and systematic difference in connectivity, particularly between specific communities or modules of ROIs. To examine whether these differences are biologically meaningful and related to MDD as a networkbased disorder, we plot the topological maps in Fig. 5 to visualize the increase and decrease in FC between ROIs in MDD relative to HC. The embedding FC identified a spread reduction in intrinsic connectivity of the amygdala with a variety of ROIs involved in emotional processing and regulation in MDD subjects (including caudate, temporal regions, occipital cortex, and cerebellum), as reported in previous rsfMRI studies [39]. In agreement with previous findings [28], we also found significant increase in FC in the default mode network (DMN). The detected altered rsFC between cerebellum with the DMN and affective network has also been associated with major depression [30, 52]
V Conclusion
We developed a deep GNN framework for embedding learning in brain functional networks to identify connectomespecific biosignatures for classifying brain disorders such as MDD. The proposed GAEFCNN provides a novel approach to incorporating the nonEuclidean information about graph structure into the classification of brain networks. It combines a GCNbased GAE that can learn latent embeddings effectively to encode topological information and node content, and a deep FCNN that leverages on the learned embeddings to reveal disrupted neural connectivity patterns in MDD relative to HC for classification purpose. On a challenging task of classifying MDD and HC using a small amount of rsfMRI data, the proposed method substantially outperforms several stateoftheart brain connectome classifiers, achieving the best accuracy of with the unsupervised GAEFCNN model. Furthermore, higherorder networks constructed from the node embeddings generated from the proposed GAE detects altered FC patterns in MDD related to emotional processing, which are not captured by the original FC measures. Our framework is generally applicable to other functional neuroimaging data, e.g., EEGderived networks, and other neuropsychiatric disorders besides MDD associated with neural network dysconnectivity, showing potential as diagnostic tool in clinical settings. This study focuses on analysis of static brain networks to show GAE as a promising approach to brain connectome classification. Recent rsfMRI studies have suggested disruption in the dynamic FC and graph properties in MDD [56, 54]. Future work could extend the proposed framework to classify dynamic FC patterns by learning timevarying latent representations to embed the dynamic network structure.
References
 [1] (2019) Brain network functional connectivity and cognitive performance in major depressive disorder. J. Psychiatr. Res. 110, pp. 51–56. Cited by: §II1.
 [2] (2009) Human brain networks in health and disease. Curr. Opin. Neurol. 22 (4), pp. 340. Cited by: §I.
 [3] (2017) Multivariate pattern analysis strategies in detection of remitted major depressive disorder using resting state functional connectivity. Neuroimage Clin. 16, pp. 390–398. Cited by: §I.
 [4] (2017) Resting state brain network function in major depression–depression symptomatology, antidepressant treatment effects, future research. J. Psychiatr. Res. 92, pp. 147–159. Cited by: §I.
 [5] (2015) Partial covariance based functional connectivity computation using ledoit–wolf covariance regularization. NeuroImage 121, pp. 29–38. Cited by: §IIIA1.
 [6] (2016) Machine learning on human connectome data from MRI. arXiv preprint arXiv:1611.08699. Cited by: §I.
 [7] (2014) Spectral networks and locally connected networks on graphs. pp. 1–14. Cited by: §I.
 [8] (2014) Aberrant functional connectivity for diagnosis of major depressive disorder: A discriminant analysis. Psychiatry Clin. Neurosci. 68 (2), pp. 110–119. Cited by: §I.
 [9] (2019) Benchmarking functional connectomebased predictive models for restingstate fMRI. NeuroImage 192, pp. 115–134. Cited by: §I.
 [10] (2018) Classification and prediction of brain disorders using functional connectivity: Promising but challenging. Front. Neurosci. 12, pp. 525. Cited by: §I.
 [11] (2015) The (in) stability of functional brain network measures across thresholds. NeuroImage 118, pp. 651–661. Cited by: §IIIA1.
 [12] (2018) Multivariate classification of major depressive disorder using the effective connectivity and functional connectivity. Front. Neurosci. 12, pp. 38. Cited by: §I.
 [13] (2007) Restingstate functional connectivity in major depression: abnormally increased contributions from subgenual cingulate cortex and thalamus. Biol. Psychiatry 62 (5), pp. 429–437. Cited by: §I.
 [14] (2020) Human brain connectivity: Clinical applications for clinical neurophysiology. Clin. Neurophysiol. 131, pp. 1621–51. Cited by: §I.
 [15] (2020) Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics. NeuroImage 206, pp. 116276. Cited by: §I.
 [16] Scikitoptimize/scikitoptimize: v0.5.2 External Links: Document, Link Cited by: §IVA2.
 [17] (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin. 17, pp. 16–23. Cited by: §I.
 [18] (2020) HiGCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction. Comput. Biol. Med. 127, pp. 104096. Cited by: §I, §IIIA1.
 [19] (2019) Deep learning of static and dynamic brain functional networks for early MCI detection. IEEE Trans. Med. Imaging 39 (2), pp. 478–487. Cited by: §I.
 [20] (2017) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage 146, pp. 1038–1049. Cited by: §I, item 2.
 [21] (2016) Deep neural network with weight sparsity control and pretraining extracts hierarchical features and enhances classification performance: evidence from wholebrain restingstate functional connectivity patterns of schizophrenia. Neuroimage 124, pp. 127–146. Cited by: §I.
 [22] (2015) Adam: A method for stochastic optimization. In 3rd Int. Conf. Learn. Repr., ICLR 2015, San Diego, CA, USA, May 79, External Links: Link Cited by: §IVA2.
 [23] (2016) Semisupervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §I, §IIIB1, item 3.
 [24] (2016) Variational graph autoencoders. NIPS Workshop on Bayesian Deep Learning. Cited by: §IIIB1, §IIIB.
 [25] (2004) A wellconditioned estimator for largedimensional covariance matrices. J. Multivar. Anal. 88 (2), pp. 365–411. Cited by: §I, §IIIA1.
 [26] (2019) Graph neural network for interpreting taskfMRI biomarkers. In 22nd Int. Conf. Med. Image Comput. Assist. Interv.,MICCAI, Shenzhen, China, Oct 1317, pp. 485–493. Cited by: §I.
 [27] (2017) Resting state fMRI functional connectivitybased classification using a convolutional neural network architecture. Front. Neuroinf. 11 (61). Cited by: §I.
 [28] (2015) Restingstate functional connectivity in major depressive disorder: A review. Neurosci. Biobehav. Rev. 56, pp. 330–344. Cited by: §I, §IVB4.
 [29] (2011) Underconnected, but how? A survey of functional connectivity MRI studies in autism spectrum disorders. Cereb. Cortex 21 (10), pp. 2233–2243. Cited by: §I.
 [30] (2010) Distinct and overlapping functional zones in the cerebellum defined by resting state functional connectivity. Cerebral Cortex 20 (4), pp. 953–965. Cited by: §IVB4.
 [31] (2018) Adversarially regularized graph autoencoder for graph embedding. pp. 2609–2615. Cited by: §IIIB1, §IIIB.
 [32] (2018) Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer’s disease. Med. Image Anal. 48, pp. 117–130. Cited by: §I.
 [33] (2017) Spectral graph convolutions for populationbased disease prediction. In 20th Int. Conf. Med. Image Comput. Assist. Interv.,MICCAI, Quebec, Canada, Sep 1014, pp. 177–185. Cited by: §I.
 [34] (2019) PyTorch: an imperative style, highperformance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlchéBuc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. External Links: Link Cited by: §IVA2.
 [35] (2009) Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 45 (1), pp. S199–S209. Cited by: §IVA1.
 [36] (2019) A multidomain connectome convolutional neural network for identifying schizophrenia from eeg connectivity patterns. IEEE J. Biomed. Health. Inf. 24 (5), pp. 1333–1343. Cited by: §I.
 [37] (2014) Deep learning for neuroimaging: a validation study. Front. Neurosci. 8, pp. 229. Cited by: §I.
 [38] (2020) Improving the detection of autism spectrum disorder by combining structural and functional MRI information. Neuroimage Clin. 25, pp. 102181. Cited by: §I.
 [39] (2014) Reduced intrinsic connectivity of amygdala in adults with major depressive disorder. Front. Psychiatry 5, pp. 17. Cited by: §IVB4.
 [40] (2010) Complex network measures of brain connectivity: uses and interpretations. NeuroImage 52 (3), pp. 1059–1069. Cited by: §IIIA2.
 [41] (1998) The miniinternational neuropsychiatric interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSMIV and ICD10.. J. Clin. Psychiatry. Cited by: §II1.
 [42] (2020) Automated detection of autism spectrum disorder using a convolutional neural network. Front. Neurosci. 13, pp. 1325. Cited by: §I.
 [43] (2014) Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), pp. 1929–1958. Cited by: §IIIC2.
 [44] (2017) Proportional thresholding in restingstate fMRI functional connectivity networks and consequences for patientcontrol connectome studies: issues and recommendations. NeuroImage 152, pp. 437–449. Cited by: §IIIA1.
 [45] (2012) Whole brain resting state functional connectivity abnormalities in schizophrenia. Schizophr. Res. 139 (13), pp. 7–12. Cited by: §I.
 [46] (2019) Deep graph library: A graphcentric, highlyperformant package for graph neural networks. arXiv preprint arXiv:1909.01315. Cited by: §IVA2.
 [47] (2020) A bayesian approach to examining default mode network functional connectivity and cognitive performance in major depressive disorder. Psychiatry Res. Neuroimaging 301, pp. 111102. Cited by: §II1.
 [48] (2017) Building better biomarkers: Brain models in translational neuroimaging. Nat. Neurosci. 20 (3), pp. 365. Cited by: §I.
 [49] (2015) Restingstate functional connectivity in psychiatric disorders. JAMA Psychiatry 72 (8), pp. 743–744. Cited by: §I.
 [50] (2021) A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 22 (1), pp. 4–24. Cited by: §I.
 [51] (2015) Changes of functional brain networks in major depressive disorder: a graph theoretical analysis of restingstate fMRI. PloS one 10 (9), pp. e0133775. Cited by: §I.
 [52] (2012) Identifying major depression using wholebrain functional connectivity: A multivariate pattern analysis. Brain 135 (5), pp. 1498–1507. Cited by: §I, §IVB4.
 [53] (2010) Resting brain connectivity: changes during the progress of Alzheimer disease. Radiology 256 (2), pp. 598–606. Cited by: §I.
 [54] (2020) The concurrent disturbance of dynamic functional and structural brain connectome in major depressive disorder: the prefrontoinsular pathway. J. Affect. Disord. 274, pp. 1084–1090. Cited by: §V.
 [55] (2011) Disrupted brain connectivity networks in drugnaive, firstepisode major depressive disorder. Biol. Psychiatry 70 (4), pp. 334–342. Cited by: §I.
 [56] (2018) Aberrant dynamic functional network connectivity and graph properties in major depressive disorder. Front. Psychiatry 9, pp. 339. Cited by: §V.
 [57] (2020) Crossnetwork interaction for diagnosis of major depressive disorder based on resting state functional connectivity. Brain Imag. Behav., pp. 1–11. Cited by: §I.