Aanalysis of brain functional connectivity (FC) networks inferred from functional magnetic resonance imaging (fMRI) data has become an important method to probe large-scale functional organization of the human brain in health and disease . Considerable evidence from resting-state fMRI (rs-fMRI) studies has shown altered or aberrant brain functional connectome in various neuropsychiatric and neurodegenerative disorders , e.g., schizophrenia , autism spectrum disorder (ASD) , Alzheimer’s disease (AD) , suggesting potential use of network-based biomarkers for clinical diagnostics . Functional abnormalities are detected not only in the strengths of individual connections but also topological structure of resting-state FC networks . The brain function in major depressive disorder (MDD) — the most prevalent psychiatric disorder with pervasive depressed mode, cognitive inability and suicidal tendency, has been a subject of intensive studies recently. It is increasingly understood as a network-based disorder with consistent alternations in FC patterns . Disrupted resting-state FC from fMRI has been found in MDD core networks, such as the default mode network (DMN) related to self-referential processing and emotion regulation, central executive network (CEN) for attention and working memory, and other subcortical circuitries . Increased connectivity within DMN  and decreased connectivity between DMN and CEN have been observed in MDD patients compared to healthy controls (HCs) . Graph theoretical analyses of rs-fMRI also revealed altered network topological properties in MDD, e.g., enhanced global efficiency  and high local efficiency and modularity .
have employed SVMs combined with some ad-hoc feature selection methods to differentiate MDD from HCs using rs-fMRI FC, and obtained reasonable classification accuracies on leave-one-subject-out cross-validation.
Deep learning methods have received significant interest in fMRI-based classification of brain disorders 
. In recent applications to connectome-based classification, it has shown great potential providing substantial gain in performance over traditional classifiers. Deep neural networks (DNNs) can automatically learn a hierarchy of representations directly from the connectome data, without relying on preliminary feature hand-crafting and selection. Fully-connected DNNs have been used as autoencoders (AE) to map high-dimensional input vectors of FC metrics to latent compact representations for rs-fMRI classification of ASD[17, 38] and schizophrenia . Inspired by remarkable success in image and object classification, deep convolutional neural networks (CNNs) have also been used to learn spatial maps of brain functional networks. A CNN architecture (BrainNetCNN) with specially-designed convolutional filters for modeling connectome data was introduced by  for predicting neurodevelopment in infants. Various variants of connectome CNNs were subsequently proposed for FC classification. These include one-dimensional (1D) spatial convolutional filters on rs-fMRI FC data for mild cognitive impairment (MCI) identification , 2D-CNNs for FC matrices for ASD classification , 3D CNNs to combine static and dynamic FC for early MCI detection , and multi-domain connectome CNN to integrate different brain network measures . The above-mentioned deep learning models generally neglect the topological information of the brain networks which may lead to sub-optimal performance in brain disorder identification. The flattening of input FC maps in fully-connected DNNs destroys the spatial structure, while the use of fixed 1D or 2D regular grid convolution operators in CNNs also fails to capture the graph-structured connectome data. Brain networks typically exhibit irregular structure with nodes being unordered and connected to a different number of neighbors, which renders convolution operations for regular grid inappropriate for modeling graphs.
Extending deep learning approaches to data in non-Euclidean domain, including graphs, is a rapidly growing field . One popular graph-based neural network (GNN) architecture, the graph convolutional networks (GCNs), generalizes operations in CNNs to learn local and global structural patterns in irregular graphs. A spectral-based GCN has been proposed to perform convolutions in the graph spatial domain as multiplications in the graph spectral domain [7, 23]. Applications of spectral GCNs to brain disorder detection from brain functional networks are introduced only recently and in its very early stage, e.g., for predicting ASD and conversion from MCI to AD [33, 32, 26, 18]
. These studies used a population graph as input to GCN, where nodes represent subjects with associated resting-state FC feature vectors, while phenotype information is encoded as graph edge weights. However, this approach inherently relies on non-imaging data to construct graphs and requires prior knowledge of relevant phenotype information for specific disorders. Moreover, it is semi-supervised learning using all subjects (both training and testing sets) as inputs and thus lacks generalization on unseen subjects. A recent benchmarking study also showed that population-based spectral GCN is less effective than the BrainNetCNN in resting-state FC-based behavioral prediction.
In this paper, we propose a novel framework based on deep GNN for graph embedding on brain functional networks for classifying neuropsychiatric disorders associated with functional dysconnectivity. Precisely, we develop a graph autoencoder (GAE) architecture that leverages GCN to encode the non-Euclidean information about brain connectome into low-dimensional latent representations (or network embeddings), on which a decoder is trained to reconstruct the graph structure. The learned embeddings allow dimensionality reduction of large-sized brain network data, and preserves the network topological structure and node content information used for subsequent connectome-based classification. The extracted patterns by the multiple graph convolutional layers in GCNs can include high-level representations of nodes’ local graph neighborhood. Besides the unsupervised embedding learning using GAE, we also consider supervised learning where the model makes use of disorder class labels to optimize the embeddings. Finally, a readout layer is added to summarize the node representations of each graph into a graph representation, which is then used as feature inputs to a fully-connected DNN (FCNN) for network classification. We apply the proposed GAE-FCNN to rs-fMRI data for classification of MDD and HCs using whole-brain FC networks. The GAE-DNN is trained on high-dimensional functional networks constructed from rs-fMRI using Ledoit-Wolf (LDW) covariance estimator. We also explore different types of node features: fMRI time series, associated FC edges and local graph measures. The main contributions of this work are summarized as follows:
We propose, for the first time, a graph deep learning framework for brain FC-based identification of MDD.
The proposed GAE-FCNN framework offers a novel approach to directly leverage on the alterations in network structure for brain disorder classification via the learned network embeddings. The GCN-based GAE architecture provides a purely unsupervised way to learn embeddings that encode the irregular topological structure of brain networks, which are inadequately modeled by the connectome CNNs and the vectorized FC features in population graphs. The GAE combined with a deep DNN facilitates graph-level classification to predict class labels for the entire brain graph, rather than node/subject-level classification based on population graphs.
We demonstrate that our approach outperforms both the BrainNetCNN and population-based GCN by a large margin in identifying MDD based on resting-state functional brain networks from fMRI.
We show that higher-order network reconstructed from nodes embeddings learned by the proposed GCN-based GAE can reveal differences in network organization between MDD and HCs related to emotion processing.
Ii rs-fMRI Dataset for MDD
Ii-1 Subjects & Data Acquisition
We used a resting-state fMRI MDD dataset collected at the Duke University Medical Center, USA, studied previously in [1, 47]. The current analyses consist of 43 subjects, including 23 non-depressed (HC) and 20 depressed (MDD) participants aged between 20 and 50 years old. Depressed participants had met the Diagnostic and Statistical Manual of Mental Disorder (DSM-IV) criteria of MDD, as assessed by the Mini-International Neuropsychiatric Interview (MINI, version 5.0)  and interview with a study psychiatrist. Participants were scanned on a Siemens 3.0T Trio Tim scanner, with an 8-channel head coil. Echoplanar BOLD functional resting scans were acquired with transverse orientation (TR/TE = 2000/27 ms, voxel size = 4.0 4.0 4.0 mm, 32 axial slices). A time series of 150 volumes were collected for each scan.
Standard preprocessing steps were applied to the fMRI data using Conn toolbox (version 15.g) in SPM 12, including motion correction, slice timing correction, co-registration of functional and anatomical images, normalization to the standard MNI (Montreal Neurological Institute) template, and Gaussian spatial smoothing with FWHM (full width at half maximum) = 6 mm. The fMRI data were band-pass filtered between 0.01-0.07 Hz. The automated anatomical labeling (AAL) atlas was used to obtain an anatomical parcellation of the whole-brain into 116 regions of interest (ROIs), and ROI-wise fMRI time series were extracted by averaging over voxels.
Fig. 1 shows an overview of the proposed GAE-FCNN framework for identifying brain disorders using fMRI-based functional brain networks, which consists of three stages: (1) Network construction. High-dimensional FC networks are constructed from fMRI data using LDW shrinkage covariance estimator, and associated node attributes/features are extracted. (2) Network embedding via a GCN-based GAE. The GAE learns network embeddings by using an encoder of stacked GCNs to map the input graph structure and node content of FC networks into latent representation (or embeddings), and using an inner-product decoder to enforce embeddings to preserve graph topological information. (3) Network classification. The learned network embeddings are then used as inputs to a fully-connected DNN to discriminate between MDD patients and HCs. We develop an unsupervised (Fig. 1(a)) and a supervised (Fig. 1(b)) framework for learning graph embeddings in brain networks.
Iii-a Connectivity Network Construction
We consider an undirected graph of brain functional network for each subject, represented by where is a set of nodes (voxels or ROIs) and denotes the connectivity edge between nodes and . The topological structure of the graph can be represented by an adjacency matrix , where if nodes and are connected, otherwise . We denote by the node feature matrix for , with representing the content feature vector associated with each node .
Iii-A1 Network Connectivity
In constructing FC networks, we compute the FC matrix based on the temporal correlations of fMRI time series between pairs of ROIs. Let , be the fMRI time series of length measured from the ROIs. For large-sized fMRI-derived networks in which the number of nodes is larger or comparable to the number of scans , traditional sample correlation matrix is no longer a reliable and accurate estimator of FC. This is due to large number of correlation coefficients (i.e., ) to be estimated relative to the sample size. This condition applies to the MDD fMRI data considered here ( and ROIs). To estimate functional connectomes efficiently, we use the Ledoit-Wolf (LDW) regularized shrinkage estimator [25, 5] which can yield well-conditioned FC estimates in high-dimensional settings when the ratio of is large. The LDW covariance estimator is defined by with , where is a shrinkage parameter, is a identity matrix and is the sample covariance matrix. The correlation matrix is then computed as where
We can generate the adjacency matrix by thresholding the correlation matrix . We used the proportional thresholding  which sets a proportion of strongest connections (with the highest absolute correlation values) of the derived FC matrix for each individual network to 1, and other connections to zero. By applying a proportional threshold value of , the number of retained links/edges in a graph is . This approach will result in a fixed density of edges in graphs across all subjects, and thus enabling meaningful comparison of network topology between different groups and conditions. It can also generate more stable network metrics compared to the absolute thresholding . It has been shown that the setting of threshold has a significant impact on the overall performance of the network classification model . Besides, when decreases, networks become sparser and may lead to the zero-degree nodes (isolated nodes totally disconnected from the rest of the graph). By evaluating over a range of thresholds, was chosen to generate graphs without zero-degree nodes for all subjects and give the optimal classification performance.
Iii-A2 Node Features
We consider three types of node features for . (1) raw rs-fMRI time series associated with each node which can capture spontaneous fluctuations in the BOLD signal in individual brain regions. (2) FC weights of edges connected to each node, i.e., each row of the LDW-estimated correlation matrix. (3) Graph-theoretic measures to characterize graph topological attributes at local (nodal) level. A list of 18 different nodal graph measures 
was extracted for each individual node, including degree, eigenvector centrality, modularity, PageRank centrality, nodal eccentricity, community Louvain, module degree z-score, participation coefficient, routing efficiency, clustering coefficient, diversity coefficient, gateway coefficient (node strength), gateway coefficient (betweenness centrality), local assortativity, participation coefficient, node strength, node betweenness, and global efficiency.
Iii-B Graph Convolutional Autoencoder
We propose a new approach that builds on the graph autoencoder (GAE) [24, 31] to learn graph embeddings on brain networks in a purely unsupervised framework. Given the brain network , the autoencoder maps the nodes to low-dimensional vectors (or embeddings), using an encoder where with the dimension of embedding, and then reconstruct the graph structure from the embeddings using a decoder. The learned latent representations should reflect the topological structure of the graph and the node content information . It contains all the information necessary for downstream graph classification tasks for brain disorders. We consider two variants of GAE: (1) Generic GAE which aims to reconstruct the original input graph adjacency matrix, (2) Variational GAE (VGAE) , a variational extension of GAE to learn the distribution of embeddings, which could prevent potential model overfitting.
Iii-B1 Graph Convolutional Encoder Model
To encode both graph structure and node content into in a unified way, we employ a variant of graph convolutional network (GCN)  as the graph encoder of GAE. The GCN is a first-order approximation of graph convolutions in the spectral domain. We consider a multi-layer GCN which learns a layer-wise transformation by a spectral graph convolutional function as follows
where is the latent feature matrix after convolution at -th layer of GCN, is a layer-specific trainable weight matrix. Here, is the input node feature matrix. The propagation for each layer of the GCN can be calculated as
where is normalized adjacency matrix with added self-connections to ensure numerical stability, is a node degree matrix with diagonals , and
denotes the activation function. Model (2) generates embeddings for a node by aggregating feature information from its local neighborhood at each layer.
We construct the graph convolutional encoder based on two-layered GCN as in 
which produces latent representation with the following forward propagation
where , and are ReLU(·) and linear activation functions in first and second layers, respectively.
In the VGAE, variational graph encoder is defined by an inference model parameterized by a two-layer GCN 
Here, the embeddings
are generated according to a normal distribution with mean
and variance. is the matrix of mean vectors defined by the GCN encoder output in (5), and is defined similarly as using another encoder.
Iii-B2 Decoder Model
The decoder of GAE aims to decode graph structural information from the embeddings by reconstructing the graph adjacency matrix. The GAE decoder model predicts the presence of a link between two nodes based on the inner product between latent vectors of
is the logistic sigmoid function. The graph adjacency matrix can be reconstructed as.
The GAE is trained by minimizing the reconstruction error of the graph by
Since the ground-truth adjacency matrix is sparse, and the minimization is constrained to the non-zero elements of (i.e., ). For the VGAE, we maximize the variational lower bound w.r.t the parameters
is the Kullback-Leibler divergence function that measures the distance between two distributions. We use a Gaussian prior.
Iii-C GAE-FCNN for Network Classification
We design a GAE-FCNN framework for brain connectome classification by combining the GAE with a fully-connected DNN (FCNN). A readout layer is added to summarize latent node representations learned by the GAE for each graph into graph-level representations, which are then fed into an FCNN to classify individual networks into MDD and HC.
Iii-C1 Graph Embeddings Vectorization (Readout)
We apply a readout operation on the network node representations to generate higher graph-level representations. In the readout layer, a vector representation of the graph can be learned by aggregating all individual node embeddings in the graph via some statistical summary measures
where is the index of the last graph convolutional layer. The graph embedding can then be used to make predictions about the entire graph. The mean/max/sum-based embeddings can be used individually or concatenated into a single vector to capture different graph-level information. In addition, to retain embedding information for all nodes, we also compute the graph embedding as by flattening of .
Iii-C2 FCNN Classifier
The graph vector embeddings
are then used as inputs to a deep FCNN for network-level classification. The FCNN classifier consists of multiples fully-connected/dense layers, plus a final softmax classification layer to output the predictive probabilities of class labels for each network. The dense layer approximates a non-linear mapping function to further capture relational information in the graph embeddings to discriminate between MDD and HC. The weight parameters of the FCNN are trained by minimizing cross-entropy loss function using stochastic gradient descent methods and backpropagation of error. Dropout is also applied to prevent overfitting.
Iii-C3 Supervised & Unsupervised Embedding Learning
We consider two classification schemes using the network embeddings learned in supervised and unsupervised ways. The proposed encoder-decoder framework (Fig. 1(a)) to extract network embeddings described thus far is by default unsupervised, i.e., the GAE is trained to reconstruct the original graph structure. We further develop a supervised framework, as shown in Fig. 1(b), which utilizes the task-specific classification labels in order to learn the network embeddings. The inner-produce decoder in the supervised model is replaced with an FCNN to decode the embeddings from the output of GCN encoder to class labels. The parameters of the GCN encoder can be trained based on cross-entropy loss between the predicted and true class labels using the backpropagation algorithm. By incorporating task-specific supervision, the encoder model is optimized to generate embeddings that may be more discriminative of the MDD and HC classes. This model provides an end-to-end framework for the brain network classification.
In this section, we present experimental evaluation of the proposed GAE-DNN models for connectome classification on the rs-fMRI MDD dataset described in Section II.
|Classifier||Adjacency Matrix||Node Feature||Readout||Acc||Sen||Pre||F1|
|Unsupervised GAE-FCNN||Pearson (0.25)||Raw-fMRI||[mean,max,sum]||35.00 2.04||40.00 12.25||33.00 4.76||35.60 7.47|
|Graph-measures||flatten||57.50 13.82||30.00 29.15||50.00 44.72||33.33 28.60|
|Pearson-FC||[mean,max,sum]||58.06 9.15||70.00 18.71||56.67 9.33||60.31 6.44|
|LDW (0.4)||Raw-fMRI||[mean,max,sum]||60.56 4.36||45.00 33.17||48.10 24.85||44.07 25.38|
|Graph-measures||[mean,max,sum]||69.72 9.06||55.00 18.71||80.00 18.71||61.33 14.04|
|LDW-FC||flatten||72.50 10.77||60.00 20.00||80.00 18.71||65.14 17.20|
|Unsupervised VGAE-FCNN||Pearson (0.25)||Raw-fMRI||flatten||57.50 22.08||60.00 30.00||52.67 21.87||55.49 25.18|
|Graph-measures||flatten||58.33 20.24||55.00 33.17||50.17 28.68||50.63 28.68|
|Pearson-FC||flatten||64.72 8.94||65.00 12.25||62.33 8.27||63.10 8.65|
|LDW (0.4)||Raw-fMRI||[mean,max,sum]||65.28 6.33||55.00 18.71||63.67 8.33||57.86 13.96|
|Graph-measures||flatten||58.33 14.91||60.00 25.50||55.33 12.58||55.43 17.08|
|LDW-FC||[mean,max,sum]||55.83 8.07||60.00 20.00||52.00 7.48||54.22 13.23|
|Supervised GCN-FCNN||Pearson (0.25)||Raw-fMRI||flatten||57.78 20.14||65.00 25.50||57.22 19.61||58.74 18.32|
|Graph-measures||[mean,max,sum]||62.78 16.63||50.00 31.62||51.67 34.32||49.86 31.49|
|Pearson-FC||flatten||56.11 10.30||60.00 33.91||48.89 31.70||49.64 24.94|
|LDW (0.4)||Raw-fMRI||flatten||53.61 15.31||25.00 15.81||51.67 40.96||32.05 21.68|
|Graph-measures||[mean,max,sum]||48.61 9.86||45.00 33.17||38.89 22.22||39.45 22.75|
|LDW-FC||flatten||62.50 9.54||60.00 25.50||61.67 10.00||57.86 13.96|
. Results are averages (standard deviations) of performance measures over 5-fold cross-validation.
|Unsupervised GAE-FCNN||flatten||72.50 10.77||60.00 20.00||80.00 18.71||65.14 17.20|
|mean||32.22 10.66||40.00 33.91||24.56 14.81||29.75 20.40|
|max||46.39 11.64||55.00 24.49||45.56 12.37||47.45 11.92|
|sum||39.44 16.70||35.00 12.25||37.33 18.03||35.87 14.66|
|mean,max,sum||43.89 12.47||45.00 18.71||40.00 12.25||42.00 14.35|
|Unsupervised VGAE-FCNN||flatten||55.56 15.32||65.00 25.50||51.67 15.28||56.43 17.72|
|mean||50.56 17.07||45.00 24.49||39.67 24.37||41.89 24.19|
|max||55.83 17.00||60.00 33.91||52.46 33.69||51.55 26.63|
|sum||51.39 9.86||50.00 22.36||55.00 24.49||47.00 13.27|
|mean,max,sum||55.83 8.07||60.00 20.00||52.00 7.48||54.22 13.23|
|Supervised GCN-FCNN||flatten||62.50 9.54||60.00 25.50||61.67 10.00||57.86 13.96|
|mean||57.50 13.82||65.00 33.91||45.57 25.21||52.58 27.09|
|max||50.83 7.01||25.00 31.62||20.00 24.49||22.00 27.13|
|sum||56.39 13.54||50.00 35.36||40.33 24.64||44.33 27.78|
|mean,max,sum||44.44 6.09||50.00 27.39||40.00 8.16||42.76 14.38|
Iv-a Experimental Setup
Iv-A1 Data Partitioning
We applied a nested-stratified 5-fold cross-validation (CV) data partitioning strategy/scheme  to evaluate the performance of different models in classifying MDD and HC. Specifically, a two-level 5-fold CV was used comprising an outer-loop for testing and an inner-loop for model hyper-parameter optimization. For each iteration in the outer-loop, a test set was assigned, and the rest of the data were split into five train-validation partitions to tune the model hyper-parameters. This process was repeated for all outer-loop 5-fold partitions. The best performing model (on the validation set) of the five candidate models was then selected to evaluate the performance on the unseen test sets. Four binary classification metrics were used to evaluate the classification performance, i.e., classification accuracy (), sensitivity (), precision (
), and F-score ().
|Competing||SVM-RBF||50.83 7.01||15.00 20.00||16.67 21.08||15.71 20.40|
|BrainNetCNN||51.11 4.16||45.00 36.74||28.57 23.47||34.91 28.57|
|Population-based GCN||55.56 11.65||45.00 18.71||58.57 20.90||47.58 12.84|
|Proposed||Supervised GCN-FCNN||62.50 9.54||60.00 25.50||61.67 10.00||57.86 13.96|
|Unsupervised GAE-FCNN||72.50 10.77||60.00 20.00||80.00 18.71||65.14 17.20|
|Unsupervised VGAE-FCNN||65.28 6.33||55.00 18.71||63.67 8.33||57.86 13.96|
Iv-A2 Model Architecture and Training
We implement the proposed GAE-FCNN based on Pytorch using the GraphConv module from DGL library  for GCN. For the unsupervised model, the architecture and hyper-parameters of GAE and FCNN were determined separately. We computed the reconstruction error of graph over a range of hyper-parameters for the GAE, and a two-layered GCN with respective embedding dimensions of 64 and 16 was identified as the optimal architecture for both GAE/VGAE with the minimum reconstruction error. Further increase in the number of GCN layers gave no further improvement. Using the extracted network adjacency matrices and node feature matrices (dimension depends on type of features used) as inputs, the GAEs were trained using Adam optimizer  to minimize graph reconstruction loss, with learning rate of , reduce-factor of , 200 training epochs and a batch size of 8. Fig. 2 illustrates a training curve of the GAE model with decreasing reconstruction error over epochs. The trained GAE decoder was then used to generate node embedding matrices as inputs to the FCNN. Bayesian optimization  with expected improvement acquisition function was used to optimize the hyper-parameters of FCNN, which suggested an architecture of 3 dense layers (with respective 256, 256, 128 hidden nodes), learning rate of , reduce factor of and a batch size of 4. The FCNN was also trained on the extracted graph embeddings using Adam algorithm.
For the supervised model, the hyper-parameters of the GCN and FCNN were optimized simultaneously using the Bayesian optimizer. The selected hyper-parameters are: 1 convolutional layer with dimension of 94 for GCN, 2 dense layers (with 128, 264 hidden nodes) for FCNN with learning rate of , reduce factor of and a batch size of 6. A dropout ratio of 0.2 was also chosen for the dense layers. The model was trained on the fMRI network data with target class labels, using the Adam algorithm to minimize cross-entropy loss.
Iv-A3 Methods for Comparison
We benchmark the classification performance of the proposed methods with a traditional SVM classifier and two state-of-the-art connectome-specific DNN models: BrainNetCNN and population graph-based GCN. These competing models were evaluated with the same 5-fold CV as the proposed methods.
: We trained SVM with radial basis function (RBF) on the vectorized LDW-correlation coefficients. The hyper-parameters of SVM were optimized based on a grid-search on the validation set.
BrainNetCNN: The BrainNetCNN  is a specially designed deep CNN model which can preserve spatial information in brain connectivity data. Here, the LDW correlation matrices were used directly as inputs to the BrainNetCNN to predict the class labels of MDD and HC as output. It consists of three types of layers: edge-to-edge (E2E) layers, edge-to-node (E2N) layers, and node-to-graph (N2G) layers. The E2E layer applies a cross-shaped convolution filter to each element of the FC input matrix, and combines the edge weights of neighbor nodes to output an matrix. The E2N layer is equivalent to the 1D-CNN filter designed for dimensionality reduction. The N2G layer is a dense layer taking the E2N output to produce a single scalar. Finally, the output of N2G is fed to classification layer for prediction.
Population-based GCN: This method exploits the GCN to model a population graph, where each node represents a subject and edges encode similarity between subjects . It performs node/subject level-classification in a semi-supervised manner to predict brain disorders. Similar to , we used the vectorized upper triangular part of LDW correlation matrices of all subjects as inputs to the population-based GCN. We set the default model hyper-parameters with Chebyshev polynomial basis filters for spectral convolutions as in . The model was trained using 500 epochs with early stopping patience of 10 epochs.
Iv-B1 Comparison of Network Construction Strategies
Table I shows the classification performance (average and standard deviation over 5 folds) of the unsupervised GAE/VGAE-FCNN and supervised GCN-FCNN classifiers. To investigate the impact of choices of network construction strategies on classification, we also evaluated two FC metrics to construct the graph adjacency matrix : Pearson’s correlation matrix and LDW shrinkage correlation matrix; three types of input node features for : raw rs-fMRI time series, FC weights (LDW correlation coefficients) and nodal graph-theoretic measures. The selected readout schemes are also given, and details will be discussed in the next section. As expected, using input graph data based on the LDW correlations shows superior performance over the traditional Pearson’s correlations in classifying MDD and HC for all classification models, as the LDW shrinkage method can provide more reliable estimate of the high-dimensional network structure. For node features, the use of LDW-FC generally provided better classification than the raw fMRI time series and local graph measures. This indicates more discriminative information in the connection weights compared to the low-level BOLD fluctuations, and learning of higher-level meta representations from local graph features also fails to offer additional advantages for classification.
We can see that the unsupervised GAE/VGAE-FCNNs performed better than the supervised GCN-FCNN model, with GAE-FCNN achieving the highest classification accuracy when using LDW-FC for both the graph construction and node features. This suggests that embeddings learned in an unsupervised manner to preserve faithfully the brain network topology can be more predictive of MDD and HC than that optimized to discriminate the class labels directly. Among the unsupervised models, however use of the probabilistic encoding framework in VGAE does not improve classification performance, probably limited by the strong assumption of an i.i.d. Gaussian prior on latent embeddings, and the approximated model parameter inference of the variational method. Future work will investigate better-suited prior distribution in the VGAE for brain network data.
Iv-B2 Comparison of Readout Strategies
Table II shows the classification results for different readout strategies. We compared different readout/transformation methods to obtain graph-level representation as inputs to FCNN classifier, i.e., flattening of and mean/max/sum aggregation of node embeddings . It can be seen that the flattening method by concatenating learned embeddings of all nodes as input yields better classification performance for different classifiers generally, compared to the aggregation method which may induce loss of information about individual nodes.
Iv-B3 Comparison with State-of-the-Art Methods
Table III shows the performance comparison of different connectome-based classification methods. The proposed methods clearly outperformed the competing models by a large margin, with the unsupervised GAE-FCNN performing the best. In consistency with recent studies, our results suggest the advantages of DNN methods over traditional SVM classifier with significant improvement in FC classification. The population-based GCN performs slightly better than the BrainNetCNN. The population-based GCN, while leveraging on pairwise associations between subjects in a population graph for node/subject-level classification, does not classify brain networks directly as in our proposed models. The use of grid-wise convolutions in the BrainNetCNN, despite its capability to capture spatial information of neighboring nodes, fails to account for irregular structure of brain networks. The superior performance of GAE-FCNN models compared to the two DNNs implies that incorporating network topological structure as captured by the network embeddings for classification can provide discriminative information for identifying MDD, a disorder associated with disrupted brain networks. The proposed framework achieved the best accuracy of when using the unsupervised GAE-FCNN on a challenging task of classifying MDD brain networks, based on 5-fold CV on a small dataset, in contrast to the leave-one-subject-out classification in previous studies.
Iv-B4 Connectivity Maps Learned by GAE
In Fig. 3, we plot the averaged feature maps of node-level embeddings learned by the GCN-GAE from LDW-based networks for the MDD and HC subjects. Noticeable difference in the learned embedding pattern can be seen between the two groups, with stronger activation for some ROIs in MDD compared to HC. This demonstrates the ability of the proposed model to extract latent representations of brain network structure that can clearly distinguish between MDD and controls, which explains the enhanced performance in the downstream classification task compared to other methods.
We further constructed higher-order FC by correlating the GAE-learned embeddings between pairs of nodes. Fig. 4 shows the difference in connectivity pattern between the MDD and HC groups as quantified by the LDW-estimated raw FC and the embedding-based higher-order FC. A group-level t-test was used to contrast the FC between the two groups, and connections with significant difference <0.05) are shown in Fig. 4(right). The embedding-based FC matrices (Fig. 4(left & middle)) exhibit an apparent block structure revealing the modular organization, an important property of brain networks. Compared to raw FC, it is evident that the embedding-based FC detected more pronounced and systematic difference in connectivity, particularly between specific communities or modules of ROIs. To examine whether these differences are biologically meaningful and related to MDD as a network-based disorder, we plot the topological maps in Fig. 5 to visualize the increase and decrease in FC between ROIs in MDD relative to HC. The embedding FC identified a spread reduction in intrinsic connectivity of the amygdala with a variety of ROIs involved in emotional processing and regulation in MDD subjects (including caudate, temporal regions, occipital cortex, and cerebellum), as reported in previous rs-fMRI studies . In agreement with previous findings , we also found significant increase in FC in the default mode network (DMN). The detected altered rs-FC between cerebellum with the DMN and affective network has also been associated with major depression [30, 52]
We developed a deep GNN framework for embedding learning in brain functional networks to identify connectome-specific bio-signatures for classifying brain disorders such as MDD. The proposed GAE-FCNN provides a novel approach to incorporating the non-Euclidean information about graph structure into the classification of brain networks. It combines a GCN-based GAE that can learn latent embeddings effectively to encode topological information and node content, and a deep FCNN that leverages on the learned embeddings to reveal disrupted neural connectivity patterns in MDD relative to HC for classification purpose. On a challenging task of classifying MDD and HC using a small amount of rs-fMRI data, the proposed method substantially outperforms several state-of-the-art brain connectome classifiers, achieving the best accuracy of with the unsupervised GAE-FCNN model. Furthermore, higher-order networks constructed from the node embeddings generated from the proposed GAE detects altered FC patterns in MDD related to emotional processing, which are not captured by the original FC measures. Our framework is generally applicable to other functional neuroimaging data, e.g., EEG-derived networks, and other neuropsychiatric disorders besides MDD associated with neural network dysconnectivity, showing potential as diagnostic tool in clinical settings. This study focuses on analysis of static brain networks to show GAE as a promising approach to brain connectome classification. Recent rs-fMRI studies have suggested disruption in the dynamic FC and graph properties in MDD [56, 54]. Future work could extend the proposed framework to classify dynamic FC patterns by learning time-varying latent representations to embed the dynamic network structure.
-  (2019) Brain network functional connectivity and cognitive performance in major depressive disorder. J. Psychiatr. Res. 110, pp. 51–56. Cited by: §II-1.
-  (2009) Human brain networks in health and disease. Curr. Opin. Neurol. 22 (4), pp. 340. Cited by: §I.
-  (2017) Multivariate pattern analysis strategies in detection of remitted major depressive disorder using resting state functional connectivity. Neuroimage Clin. 16, pp. 390–398. Cited by: §I.
-  (2017) Resting state brain network function in major depression–depression symptomatology, antidepressant treatment effects, future research. J. Psychiatr. Res. 92, pp. 147–159. Cited by: §I.
-  (2015) Partial covariance based functional connectivity computation using ledoit–wolf covariance regularization. NeuroImage 121, pp. 29–38. Cited by: §III-A1.
-  (2016) Machine learning on human connectome data from MRI. arXiv preprint arXiv:1611.08699. Cited by: §I.
-  (2014) Spectral networks and locally connected networks on graphs. pp. 1–14. Cited by: §I.
-  (2014) Aberrant functional connectivity for diagnosis of major depressive disorder: A discriminant analysis. Psychiatry Clin. Neurosci. 68 (2), pp. 110–119. Cited by: §I.
-  (2019) Benchmarking functional connectome-based predictive models for resting-state fMRI. NeuroImage 192, pp. 115–134. Cited by: §I.
-  (2018) Classification and prediction of brain disorders using functional connectivity: Promising but challenging. Front. Neurosci. 12, pp. 525. Cited by: §I.
-  (2015) The (in) stability of functional brain network measures across thresholds. NeuroImage 118, pp. 651–661. Cited by: §III-A1.
-  (2018) Multivariate classification of major depressive disorder using the effective connectivity and functional connectivity. Front. Neurosci. 12, pp. 38. Cited by: §I.
-  (2007) Resting-state functional connectivity in major depression: abnormally increased contributions from subgenual cingulate cortex and thalamus. Biol. Psychiatry 62 (5), pp. 429–437. Cited by: §I.
-  (2020) Human brain connectivity: Clinical applications for clinical neurophysiology. Clin. Neurophysiol. 131, pp. 1621–51. Cited by: §I.
-  (2020) Deep neural networks and kernel regression achieve comparable accuracies for functional connectivity prediction of behavior and demographics. NeuroImage 206, pp. 116276. Cited by: §I.
-  Scikit-optimize/scikit-optimize: v0.5.2 External Links: Cited by: §IV-A2.
-  (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin. 17, pp. 16–23. Cited by: §I.
-  (2020) Hi-GCN: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction. Comput. Biol. Med. 127, pp. 104096. Cited by: §I, §III-A1.
-  (2019) Deep learning of static and dynamic brain functional networks for early MCI detection. IEEE Trans. Med. Imaging 39 (2), pp. 478–487. Cited by: §I.
-  (2017) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage 146, pp. 1038–1049. Cited by: §I, item 2.
-  (2016) Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: evidence from whole-brain resting-state functional connectivity patterns of schizophrenia. Neuroimage 124, pp. 127–146. Cited by: §I.
-  (2015) Adam: A method for stochastic optimization. In 3rd Int. Conf. Learn. Repr., ICLR 2015, San Diego, CA, USA, May 7-9, External Links: Cited by: §IV-A2.
-  (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §I, §III-B1, item 3.
-  (2016) Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning. Cited by: §III-B1, §III-B.
-  (2004) A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88 (2), pp. 365–411. Cited by: §I, §III-A1.
-  (2019) Graph neural network for interpreting task-fMRI biomarkers. In 22nd Int. Conf. Med. Image Comput. Assist. Interv.,MICCAI, Shenzhen, China, Oct 13-17, pp. 485–493. Cited by: §I.
-  (2017) Resting state fMRI functional connectivity-based classification using a convolutional neural network architecture. Front. Neuroinf. 11 (61). Cited by: §I.
-  (2015) Resting-state functional connectivity in major depressive disorder: A review. Neurosci. Biobehav. Rev. 56, pp. 330–344. Cited by: §I, §IV-B4.
-  (2011) Underconnected, but how? A survey of functional connectivity MRI studies in autism spectrum disorders. Cereb. Cortex 21 (10), pp. 2233–2243. Cited by: §I.
-  (2010) Distinct and overlapping functional zones in the cerebellum defined by resting state functional connectivity. Cerebral Cortex 20 (4), pp. 953–965. Cited by: §IV-B4.
-  (2018) Adversarially regularized graph autoencoder for graph embedding. pp. 2609–2615. Cited by: §III-B1, §III-B.
-  (2018) Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer’s disease. Med. Image Anal. 48, pp. 117–130. Cited by: §I.
-  (2017) Spectral graph convolutions for population-based disease prediction. In 20th Int. Conf. Med. Image Comput. Assist. Interv.,MICCAI, Quebec, Canada, Sep 10-14, pp. 177–185. Cited by: §I.
-  (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. External Links: Cited by: §IV-A2.
-  (2009) Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 45 (1), pp. S199–S209. Cited by: §IV-A1.
-  (2019) A multi-domain connectome convolutional neural network for identifying schizophrenia from eeg connectivity patterns. IEEE J. Biomed. Health. Inf. 24 (5), pp. 1333–1343. Cited by: §I.
-  (2014) Deep learning for neuroimaging: a validation study. Front. Neurosci. 8, pp. 229. Cited by: §I.
-  (2020) Improving the detection of autism spectrum disorder by combining structural and functional MRI information. Neuroimage Clin. 25, pp. 102181. Cited by: §I.
-  (2014) Reduced intrinsic connectivity of amygdala in adults with major depressive disorder. Front. Psychiatry 5, pp. 17. Cited by: §IV-B4.
-  (2010) Complex network measures of brain connectivity: uses and interpretations. NeuroImage 52 (3), pp. 1059–1069. Cited by: §III-A2.
-  (1998) The mini-international neuropsychiatric interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10.. J. Clin. Psychiatry. Cited by: §II-1.
-  (2020) Automated detection of autism spectrum disorder using a convolutional neural network. Front. Neurosci. 13, pp. 1325. Cited by: §I.
-  (2014) Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), pp. 1929–1958. Cited by: §III-C2.
-  (2017) Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: issues and recommendations. NeuroImage 152, pp. 437–449. Cited by: §III-A1.
-  (2012) Whole brain resting state functional connectivity abnormalities in schizophrenia. Schizophr. Res. 139 (1-3), pp. 7–12. Cited by: §I.
-  (2019) Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315. Cited by: §IV-A2.
-  (2020) A bayesian approach to examining default mode network functional connectivity and cognitive performance in major depressive disorder. Psychiatry Res. Neuroimaging 301, pp. 111102. Cited by: §II-1.
-  (2017) Building better biomarkers: Brain models in translational neuroimaging. Nat. Neurosci. 20 (3), pp. 365. Cited by: §I.
-  (2015) Resting-state functional connectivity in psychiatric disorders. JAMA Psychiatry 72 (8), pp. 743–744. Cited by: §I.
-  (2021) A comprehensive survey on graph neural networks. IEEE Trans. Neural Networks Learn. Syst. 22 (1), pp. 4–24. Cited by: §I.
-  (2015) Changes of functional brain networks in major depressive disorder: a graph theoretical analysis of resting-state fMRI. PloS one 10 (9), pp. e0133775. Cited by: §I.
-  (2012) Identifying major depression using whole-brain functional connectivity: A multivariate pattern analysis. Brain 135 (5), pp. 1498–1507. Cited by: §I, §IV-B4.
-  (2010) Resting brain connectivity: changes during the progress of Alzheimer disease. Radiology 256 (2), pp. 598–606. Cited by: §I.
-  (2020) The concurrent disturbance of dynamic functional and structural brain connectome in major depressive disorder: the prefronto-insular pathway. J. Affect. Disord. 274, pp. 1084–1090. Cited by: §V.
-  (2011) Disrupted brain connectivity networks in drug-naive, first-episode major depressive disorder. Biol. Psychiatry 70 (4), pp. 334–342. Cited by: §I.
-  (2018) Aberrant dynamic functional network connectivity and graph properties in major depressive disorder. Front. Psychiatry 9, pp. 339. Cited by: §V.
-  (2020) Cross-network interaction for diagnosis of major depressive disorder based on resting state functional connectivity. Brain Imag. Behav., pp. 1–11. Cited by: §I.