Structure-Preserving Graph Kernel for Brain Network Classification

by Zhaomin Kong, et al.

This paper presents a novel graph-based kernel learning approach for connectome analysis. Specifically, we demonstrate how to leverage the naturally available structure within the graph representation to encode prior knowledge in the kernel. We first propose a matrix factorization to directly extract structural features from natural symmetric graph representations of connectome data. We then use them to derive a structure-preserving graph kernel to be fed into the support vector machine. The proposed approach has the advantage of being clinically interpretable. Quantitative evaluations on challenging HIV disease classification (DTI- and fMRI-derived connectome data) and emotion recognition (EEG-derived connectome data) tasks demonstrate the superior performance of our proposed methods against the state-of-the-art. Results show that relevant EEG-connectome information is primarily encoded in the alpha band during the emotion regulation task.



1 Introduction

Brain network analysis, enriched by advances in neuroimaging technologies such as electroencephalography (EEG) and diffusion tensor imaging (DTI), has been an appealing research topic in neuroscience in recent years [7]. The study originates from modeling the human brain connectome as a graph – a mathematical construct mapping the connectivity of anatomically distinct brain regions (i.e., nodes) and inter-regional pathways (i.e., edges). Through graph-based analysis, the information encoded by the connectome can promote critical understanding of how the brain manages cognition, what signals the connections convey, and how these signals affect brain regions [28]. It has shown great potential in disease diagnosis, clinical outcome prediction, therapeutic adjustment, and the collection of biological features [19, 8, 23]. With the development of machine learning algorithms for graph-structured data, it is appealing to apply such approaches to brain network analysis.

In the literature, a variety of machine learning methods have been explored for brain network selection and classification, for example, the support vector machine (SVM) [26], graph kernels [25], independent component analysis [6], frequent graph-based pattern mining (gSpan) [5], and tensor decomposition [13, 11]. Deep learning methods such as the convolutional neural network (CNN) [20] and the graph convolutional network [27], which are successful on many tasks, have been exploited as well. Although these methods have achieved much in various respects, some issues still exist. The human connectome has complex, non-linear characteristics that may not be well captured by linear models. Meanwhile, deep learning methods suffer from enormous parameter sizes, which makes training difficult and leaves them vulnerable to overfitting. Besides, many methods make poor use of, or even fail to preserve, the graph structure. It is therefore desirable to develop a concise method for brain network analysis.

In this paper, we propose a novel graph-based kernel learning approach for brain network predictive analysis, and apply it to challenging EEG-connectome emotion regulation task. The proposed framework is illustrated in Fig. 1. The contributions of this work are threefold:

Figure 1: The framework of graph-based kernel learning.
  • We derived a Structure-preserving Symmetric Graph Kernel (SSGK) in tensor product space for brain network classification. A new matrix factorization scheme was introduced to incorporate the graph structure as well as the symmetric constraint and sparse layouts.

  • Extensive experiments on multiclass EEG-based emotion regulation task with respect to different frequency bands demonstrate the superior performance of SSGK, compared with the state-of-the-art traditional and deep learning methods. Results also show that relevant EEG signals are primarily encoded in alpha and theta bands during the emotion regulation task, which is consistent with previous studies.

  • SSGK is a general graph-kernel framework for efficiently measuring the similarity of structured data. It has great potential to be applied to a wide range of applications, in conjunction with various kernel-based methods and kernel functions.

2 Preliminaries

In this section, we first introduce some notations and basic operations that will be used throughout this paper. Then we review some aspects of the kernel learning problem.

Notations and Basic Operations. Following [12], we denote vectors by lowercase boldface letters, e.g., $\mathbf{u}$, and matrices by uppercase boldface letters, e.g., $\mathbf{U}$. An index is denoted by a lowercase letter, spanning the range from 1 to the uppercase letter of the index, e.g., $r = 1, \dots, R$. The elements of a matrix $\mathbf{A} \in \mathbb{R}^{I \times J}$ are denoted by $a_{ij}$. We will often use calligraphic letters ($\mathcal{X}$, $\mathcal{Y}$, $\mathcal{F}$, $\mathcal{H}$) to denote general spaces. Specifically, the inner product of two matrices $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{I \times J}$ is defined as $\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i=1}^{I}\sum_{j=1}^{J} a_{ij} b_{ij}$. The Frobenius norm of a matrix is defined as $\|\mathbf{A}\|_F = \sqrt{\langle \mathbf{A}, \mathbf{A} \rangle}$. The $\ell_1$ norm of a vector is defined as the sum of the absolute values of its elements. A rank-one matrix equals the outer product of two vectors, $\mathbf{A} = \mathbf{u} \circ \mathbf{v}$, i.e., $a_{ij} = u_i v_j$. Note that for rank-one matrices it holds that

$$\langle \mathbf{u}_1 \circ \mathbf{v}_1, \mathbf{u}_2 \circ \mathbf{v}_2 \rangle = \langle \mathbf{u}_1, \mathbf{u}_2 \rangle \, \langle \mathbf{v}_1, \mathbf{v}_2 \rangle. \tag{1}$$

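The rank-one inner product identity above can be checked numerically; a minimal NumPy sketch on random vectors (illustrative only, not connectome data):

```python
import numpy as np

rng = np.random.default_rng(0)
u1, v1, u2, v2 = rng.standard_normal((4, 5))  # four random 5-dimensional vectors

# Rank-one matrices as outer products
A = np.outer(u1, v1)
B = np.outer(u2, v2)

# <u1 o v1, u2 o v2> = <u1, u2> <v1, v2>
lhs = np.sum(A * B)                    # matrix inner product
rhs = np.dot(u1, u2) * np.dot(v1, v2)  # product of vector inner products
assert np.isclose(lhs, rhs)
```

This identity is what later lets the graph kernel be evaluated factor-by-factor instead of in the full matrix feature space.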
Kernel Learning. In a typical prediction task, we are given a collection of training examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathcal{X}$ is an input sample and $y_i \in \mathcal{Y}$ is the class label of $\mathbf{x}_i$; the goal is to find a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ that accurately predicts the label of an unseen example in $\mathcal{X}$. Support Vector Machines (SVMs) are among the most popular kernel-based learning algorithms. They are effective on data separable by linear boundaries, and kernel functions are used to extend the classifier to non-linear boundaries [16]. The kernel function encapsulates the hypothesis language, i.e., how to perform data transformation and knowledge encoding. In general, it maps data from the original input feature space to a higher-dimensional feature space (a Hilbert space), and the kernel function corresponds to the inner product in this higher-dimensional feature space. The computational attractiveness of kernel methods comes from the fact that quite often a closed form of these ‘feature space inner products’ exists [9]: instead of mapping the data explicitly, the kernel can be calculated directly. According to Mercer’s theorem [18], we can verify whether a kernel function is valid by the following theorem [3].

Theorem 1

A function $k$ defined on $\mathcal{X} \times \mathcal{X}$ is a positive definite kernel of $\mathcal{X} \times \mathcal{X}$ if and only if there exists a feature mapping function $\phi: \mathcal{X} \rightarrow \mathcal{H}$ such that

$$k(\mathbf{x}, \mathbf{x}') = \langle \phi(\mathbf{x}), \phi(\mathbf{x}') \rangle \tag{2}$$

for any $\mathbf{x}, \mathbf{x}' \in \mathcal{X}$.

In particular, an important property of positive definite kernels is that they are closed under sum, multiplication by a positive scalar, and product [4].
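These closure properties can be observed numerically: the Gram matrix of a sum, positive scaling, or elementwise product of two positive definite kernels remains positive semidefinite. A small NumPy check with Gaussian RBF and linear kernels on random points (an illustrative sanity check, not part of the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))             # 10 random points in R^3

# Gaussian RBF Gram matrix
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-d2 / 2.0)
K_lin = X @ X.T                               # linear-kernel Gram matrix

def is_psd(K, tol=1e-8):
    """A symmetric matrix is PSD iff its eigenvalues are all (numerically) non-negative."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

assert is_psd(K_rbf) and is_psd(K_lin)
assert is_psd(K_rbf + K_lin)       # closed under sum
assert is_psd(3.0 * K_rbf)         # closed under positive scaling
assert is_psd(K_rbf * K_lin)       # closed under (Hadamard) product
```

The product case is the Schur product theorem, and it is the property that makes the squared base-kernel terms appearing later in SSGK valid kernels.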

3 Methods

Brain networks are biologically expected to be both sparse and highly localized in space. These characteristics put specific topological constraints on the machine learning models we can use effectively. We propose a new matrix factorization scheme that incorporates the graph structure as well as the symmetric constraint and sparse layout, which allows one to interpret a brain network as a bilinear tensor product approximation. We then use this approximation to define a structure-preserving symmetric graph kernel (SSGK) for the SVM classifier. The key steps of our method are detailed below.

Feature Extraction.

Graphs provide a natural representation for connectome data, but there is no guarantee that such a representation will be good for kernel learning, since learning will only succeed if the regularities underlying the data can be discerned by the kernel. From the characteristics of connectome objects, we know that the essential information in the connectome is embedded in the structure of the graph. Thus, one important aspect of kernel learning for such complex objects is to represent them by sets of key structural features that are easier to manipulate. Previous work found matrix factorization particularly effective for extracting this structure: it takes the correlations in the graph matrix into account and represents the matrix directly as a sum of rank-one matrices (bilinear bases), yielding a more compact representation of connectome data. Motivated by these observations, we use matrix factorization for feature extraction. In particular, given a symmetric graph matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$, we solve the following optimization problem:

$$\min_{\mathbf{u}_1, \dots, \mathbf{u}_R} \; \Big\| \mathbf{W} - \sum_{r=1}^{R} \mathbf{u}_r \circ \mathbf{u}_r \Big\|_F^2 + \lambda \sum_{r=1}^{R} \|\mathbf{u}_r\|_1, \tag{3}$$

where $R$ is the rank of the matrix, defined as the smallest number of rank-one matrices in an exact matrix factorization, $\|\cdot\|_F$ is the Frobenius norm of the matrix, and $\|\cdot\|_1$ is the $\ell_1$ norm for sparse solutions (known as lasso regularization). Equation (3) can be solved with the Tensorlab toolbox [17] in MATLAB.
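The experiments use the Tensorlab toolbox in MATLAB; purely as an illustration of the idea, here is a proximal-gradient sketch of a sparse symmetric factorization in NumPy (the solver, learning rate, and iteration count are our own illustrative choices, not the paper's settings):

```python
import numpy as np

def sparse_symmetric_factorization(W, rank, lam=0.01, lr=0.005, n_iter=3000, seed=0):
    """Approximate a symmetric W by sum_r u_r u_r^T with an l1 penalty on each u_r.

    Minimizes ||W - U U^T||_F^2 + lam * ||U||_1 by proximal gradient descent,
    where the columns of U are the rank-one factors u_r.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    U = 0.1 * rng.standard_normal((n, rank))
    for _ in range(n_iter):
        R = U @ U.T - W                       # residual of the reconstruction
        U = U - lr * (4.0 * R @ U)            # gradient step on the smooth term
        U = np.sign(U) * np.maximum(np.abs(U) - lr * lam, 0.0)  # l1 proximal step
    return U

# Toy check: recover a planted sparse rank-2 symmetric structure
u = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
v = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
W = np.outer(u, u) + np.outer(v, v)
U = sparse_symmetric_factorization(W, rank=2)
err = np.linalg.norm(U @ U.T - W)             # small reconstruction error
```

The soft-thresholding step is what produces the sparse, localized factors that the brain-network prior calls for.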

Graph Structure Mapping. Note that although matrix factorization factorizes the graph matrix, we can still preserve the graph structure and recover the original matrix from the factorized results. We now show how the above feature extraction results can be exploited to induce a structure-preserving graph kernel. Suppose we are given the matrix factorizations of two graphs $\mathbf{W}$ and $\mathbf{W}'$ by $\{\mathbf{u}_r\}_{r=1}^{R}$ and $\{\mathbf{v}_r\}_{r=1}^{R}$, respectively. We assume the graph observations are mapped into the Hilbert space by

$$\mathbf{W} = \sum_{r=1}^{R} \mathbf{u}_r \circ \mathbf{u}_r \;\longmapsto\; \sum_{r=1}^{R} \phi(\mathbf{u}_r) \circ \phi(\mathbf{u}_r). \tag{4}$$

Importantly, the mapping result is still a symmetric matrix, but its dimension is high, even infinite, depending on the feature mapping function $\phi$.

Based on the definition of the kernel function, we know that the feature space is a high-dimensional space generated from the original space and equipped with the same operations. Thus, we can factorize graph data directly in the feature space in the same way as in the original space. This is formally equivalent to performing the following mapping:

$$\Phi(\mathbf{W}) = \sum_{r=1}^{R} \phi(\mathbf{u}_r) \circ \phi(\mathbf{u}_r). \tag{5}$$

In this sense, it corresponds to mapping graphs into high-dimensional graphs that retain the original structure. More precisely, it can be regarded as mapping the original graph matrix into the matrix feature space and then conducting the matrix factorization in that feature space, as illustrated in Fig. 2.

Figure 2: Schematic diagram of the feature extraction and graph structure mapping.

After mapping the matrix factorization into the outer product feature space, the kernel can be defined directly as the inner product in that feature space. Thus, based on equation (1), we can derive our SSGK model:

$$k(\mathbf{W}, \mathbf{W}') = \Big\langle \sum_{r=1}^{R} \phi(\mathbf{u}_r) \circ \phi(\mathbf{u}_r), \; \sum_{r'=1}^{R} \phi(\mathbf{v}_{r'}) \circ \phi(\mathbf{v}_{r'}) \Big\rangle = \sum_{r=1}^{R} \sum_{r'=1}^{R} \langle \phi(\mathbf{u}_r), \phi(\mathbf{v}_{r'}) \rangle^2. \tag{6}$$

Based on Theorem 1, it is easy to see that this kernel is valid, as it is expressed as an inner product of the two matrices $\Phi(\mathbf{W})$ and $\Phi(\mathbf{W}')$. From the derivation, we know that such a kernel takes the graph structure into account. In general, SSGK is an extension of conventional kernels in vector space to matrix space, and any vector kernel can be used in this framework for EEG-connectome analysis in conjunction with kernel machines. Our positive result can be viewed as saying that designing a good graph kernel function is much like designing a good graph structure in the feature space.
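Concretely, the SSGK value between two factorized graphs reduces to a sum of squared base-kernel evaluations between their factors. A NumPy sketch using a Gaussian RBF base kernel (the base kernel and width are illustrative choices; any valid vector kernel could be substituted):

```python
import numpy as np

def ssgk(U, V, sigma=1.0):
    """Structure-preserving symmetric graph kernel between two factorized graphs.

    U, V: (n, R) arrays whose columns are the rank-one factors of each graph.
    The graph kernel is the sum over all factor pairs of the squared values of a
    Gaussian RBF base kernel between factor vectors.
    """
    # (R, R) matrix of squared distances between columns of U and columns of V
    d2 = ((U[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
    k_base = np.exp(-d2 / (2.0 * sigma ** 2))  # base RBF kernel values
    return float((k_base ** 2).sum())          # sum of squared base kernels

# Symmetry check on random factor sets
rng = np.random.default_rng(0)
U, V = rng.standard_normal((2, 6, 3))          # two graphs: 6 nodes, rank 3
assert np.isclose(ssgk(U, V), ssgk(V, U))
```

The resulting Gram matrix over a dataset of factorized graphs can then be fed to any kernel machine, e.g., an SVM with a precomputed kernel.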

4 Experiments and Discussions

Figure 3: Average EEG-connectome during neutral, maintain and reappraise in the five different frequency bands.

Data. Data were collected from 22 healthy participants at *** and from 11 healthy participants at @@@ (*** and @@@ are anonymized for blind review). Each participant underwent an Emotion Regulation Task (ERT). During the ERT session, participants were instructed to look at pictures displayed on the screen. Emotionally neutral pictures (e.g., landscapes, everyday objects) and negative pictures (e.g., car crashes, natural disasters) appeared on the screen for seven seconds in random order. One second after a picture appeared, a corresponding auditory guide instructed the participant either to neutral: view the neutral picture; to maintain: view the negative picture as they normally would; or to reappraise: view the negative picture while attempting to reduce their emotional response by re-interpreting the meaning of the picture. All subjects were recorded using the Biosemi system equipped with an elastic cap with 34 scalp channels. The acquired connectivity matrices have 130 time points and 50 frequencies ranging from 1 Hz to 50 Hz in increments of 1 Hz.

Tasks. We study multi-class EEG-connectome emotion regulation tasks and analyze the effect of different frequency bands of EEG signals. In emotion regulation, studies have shown that relevant EEG information is primarily encoded in the low frequency bands [2, 22]. Thus, we analyze the EEG-connectome data in five frequency bands: Delta (1–3 Hz), Theta (4–7 Hz), Alpha (8–12 Hz), and Beta (13–30 Hz) for relative power, as well as the total power of the EEG (1–30 Hz) [14]. The average EEG-connectome during neutral, maintain and reappraise in the five frequency bands is shown in Figure 3, where the x- and y-axes represent the vertex id and the color of each cell represents the strength of the connectivity between the corresponding pair of vertices. We can see that the connectivity in the alpha band is generally stronger than in the other frequency bands, with the theta band second.
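Extracting a per-band connectome from data of this shape amounts to averaging over the corresponding frequency slice (and over time, for the 2D case). A sketch assuming a (node, node, time, frequency) array layout, where frequency index f corresponds to (f+1) Hz; the layout and variable names are our assumptions, not taken from the paper's code:

```python
import numpy as np

# Hypothetical connectivity tensor: 34 x 34 nodes, 130 time points, 50 frequencies (1-50 Hz)
conn = np.random.default_rng(0).random((34, 34, 130, 50))

# Frequency bands used in the analysis (in Hz, inclusive)
bands = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 12), "beta": (13, 30), "all": (1, 30)}

def band_connectome(conn, lo, hi):
    """Average over time and over the [lo, hi] Hz slice -> 2D connectome."""
    return conn[..., lo - 1:hi].mean(axis=(2, 3))

connectomes = {name: band_connectome(conn, lo, hi) for name, (lo, hi) in bands.items()}
assert connectomes["alpha"].shape == (34, 34)
```

Keeping the frequency axis instead of averaging it would yield the higher-order inputs used by the 3D/4D baselines below.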

Algorithms. We evaluate the algorithms in Table 1 on the five tasks above, each representing a different strategy: edge-based feature extraction (Edge), where edge values are used directly as features by flattening the connectivity matrices of the EEG-connectome into vectors; the local clustering coefficient (CC) [15], which measures a network’s local segregation; the characteristic path length (CPL) [21], which quantifies global information integration; graph-based substructure pattern mining (gSpan) [24], a discriminative subgraph selection approach; the dual structure-preserving kernel (DuSK) [11], which takes multidimensional tensors as input, where we use second- (i.e., averaged over time and frequency), third- (i.e., averaged over time) and fourth-order (i.e., all data, with the fourth dimension being the number of frequency levels in the band) versions of this scheme, denoted DuSK-2D, DuSK-3D and DuSK-4D, respectively; the convolutional neural network (CNN), with 2D convolutions for averaged 2D brain network data and 3D convolutions for averaged 3D brain network data [10]; the graph convolutional network (GCN) for averaged 2D brain network data [27], where the average of all brain networks is used as the graph structure (i.e., adjacency matrix) for information propagation; and the proposed method together with its variant without the sparse constraint.

Experimental Settings. We use the subjects collected from *** as the training set and those from @@@ as the testing set in all experiments. Following [11], we use the SVM with the Gaussian RBF kernel as the base classifier for all methods, and classification accuracy as the evaluation metric.

Algorithm Settings. All methods select the optimal trade-off parameter of the SVM and the kernel width parameter by grid search over a candidate value set. Other parameters for gSpan and DuSK are set following [24] and [11], respectively. For SSGK and its variant without the sparse constraint, the rank and the regularization parameter are selected automatically by grid search.

Results. Detailed results are listed in Table 1. The proposed SSGK-based methods outperform all compared methods by 10%–20% on almost all five frequency bands. The superiority of the proposed methods demonstrates the effectiveness of utilizing the structure within the graph representation during encoding. More specifically, among the five frequency bands, SSGK produces the best performance on the alpha band and the second best on the theta band, which is consistent with previous findings [22, 1] and can also be observed in our visualization in Figure 3. Furthermore, comparing SSGK with its variant without sparse regularization shows that the sparse version consistently performs better; this advantage of sparsity characterization indicates the importance of modeling the redundant information across the observed frequency bands.

Category        Method                 Delta   Theta   Alpha   Beta    All
Traditional     Edge                   42.42   54.55   51.52   51.52   45.45
                CC                     54.55   54.55   42.42   51.52   42.42
                CPL                    48.48   42.42   45.45   48.48   39.39
                gSpan                  39.39   51.52   39.39   54.55   48.48
                DuSK-2D                51.52   63.64   51.51   51.52   54.55
                DuSK-3D                57.58   57.58   57.58   54.55   48.48
                DuSK-4D                54.55   54.55   51.52   54.55   57.58
Deep Learning   CNN-2D                 51.11   43.71   43.07   42.54   41.48
                CNN-3D                 46.67   45.93   41.48   57.04   44.44
                GCN                    41.31   48.08   41.01   40.61   37.37
Ours            SSGK (w/o sparsity)    57.58   66.67   63.64   54.55   57.58
                SSGK                   63.64   69.70   72.73   60.61   57.58
Table 1: The classification accuracy in percentage (%) of the competing methods and the two proposed methods for the five frequency-band tasks. The best results for each task are highlighted in bold font.

5 Conclusion

In this paper, a graph-based kernel learning approach called the Structure-preserving Symmetric Graph Kernel (SSGK) is proposed for EEG-derived connectome classification. The proposed method follows two consecutive steps: first, a sparsity-inducing symmetric matrix factorization is applied to extract structural features from the natural symmetric graph representations of EEG-connectome data; then, the extracted structural features are used directly to define the SSGK function, which is fed into a support vector machine for classification. The proposed method is clinically interpretable and is able to encode prior knowledge in the kernel through the structural information in the graph representation. Experimental results on a challenging emotion recognition task demonstrate the effectiveness of the proposed method at encoding relevant EEG-connectome information.




This work was supported by the National Institutes of Health [AA123456 to C.S., BB765432 to M.H.]; and the Alcohol Education Research Council.


  • [1] S. A. M. Aris, S. Lias, and M. N. Taib (2010) The relationship of alpha waves and theta waves in eeg during relaxation and iq test. In 2010 2nd International Congress on Engineering Education, pp. 69–72. Cited by: §4.
  • [2] M. Balconi, E. Grippa, and M. E. Vanutelli (2015) What hemodynamic (fnirs), electrophysiological (eeg) and autonomic integrated measures can tell us about emotional processing. Brain and cognition 95, pp. 67–76. Cited by: §4.
  • [3] A. Berlinet and C. Thomas-Agnan (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media. Cited by: §2.
  • [4] N. Cristianini and J. Shawe-Taylor (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge university press. Cited by: §2.
  • [5] J. Du, L. Wang, B. Jie, and D. Zhang (2016) Network-based classification of adhd patients using discriminative subnetwork selection and graph kernel pca. Computerized Medical Imaging and Graphics 52, pp. 82–88. Cited by: §1.
  • [6] Y. Fan, Y. Liu, H. Wu, Y. Hao, H. Liu, Z. Liu, and T. Jiang (2011) Discriminant analysis of functional connectivity patterns on grassmann manifold. Neuroimage 56 (4), pp. 2058–2067. Cited by: §1.
  • [7] A. Fornito, A. Zalesky, and E. Bullmore (2016) Fundamentals of brain network analysis. Academic Press. Cited by: §1.
  • [8] H. Gao, C. Cai, J. Yan, L. Yan, J. G. Cortes, Y. Wang, F. Nie, J. West, A. Saykin, L. Shen, et al. (2015) Identifying connectome module patterns via new balanced multi-graph normalized cut. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 169–176. Cited by: §1.
  • [9] T. Gärtner (2003) A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter 5 (1), pp. 49–58. Cited by: §2.
  • [10] A. Gupta, M. Ayhan, and A. Maida (2013) Natural image bases to represent neuroimaging data.. In ICML, pp. 987–994. Cited by: §4.
  • [11] L. He, X. Kong, P. S. Yu, X. Yang, A. B. Ragin, and Z. Hao (2014) Dusk: a dual structure-preserving kernel for supervised tensor learning with applications to neuroimages. In SDM, pp. 127–135. Cited by: §1, §4, §4, §4.
  • [12] T. G. Kolda and B. W. Bader (2009) Tensor decompositions and applications. SIAM review 51 (3), pp. 455–500. Cited by: §2.
  • [13] N. Leonardi and D. Van De Ville (2013) Identifying network correlates of brain states using tensor decompositions of whole-brain dynamic functional connectivity. In 2013 International Workshop on Pattern Recognition in Neuroimaging, pp. 74–77. Cited by: §1.
  • [14] A. Lutz, H. A. Slagter, N. B. Rawlings, A. D. Francis, L. L. Greischar, and R. J. Davidson (2009) Mental training enhances attentional stability: neural and behavioral evidence. The Journal of Neuroscience 29 (42), pp. 13418–13427. Cited by: §4.
  • [15] M. Rubinov and O. Sporns (2010) Complex network measures of brain connectivity: uses and interpretations. Neuroimage 52 (3), pp. 1059–1069. Cited by: §4.
  • [16] I. SenGupta, A. Kumar, and R. Kumar (2018) A study on handling non linear separation of classes using kernel based supervised noise clustering approach. International Journal of Computer Applications 181 (5), pp. 29–35. Cited by: §2.
  • [17] L. Sorber, M. V. Barel, and L. D. Lathauwer (January 2014) Tensorlab v2.0. Available online. Cited by: §3.
  • [18] V. Vapnik (2013) The nature of statistical learning theory. Springer Science & Business Media. Cited by: §2.
  • [19] D. Wang, Y. Wang, F. Nie, J. Yan, W. Cai, A. J. Saykin, L. Shen, and H. Huang (2014) Human connectome module pattern detection using a new multi-graph minmax cut model. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 313–320. Cited by: §1.
  • [20] S. Wang, L. He, B. Cao, C. Lu, P. S. Yu, and A. B. Ragin (2017) Structural deep brain network mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 475–484. Cited by: §1.
  • [21] D. J. Watts and S. H. Strogatz (1998) Collective dynamics of ‘small-world’ networks. Nature 393 (6684), pp. 440–442. Cited by: §4.
  • [22] M. Xing, R. Tadayonnejad, A. MacNamara, O. Ajilore, K. L. Phan, H. Klumpp, and A. Leow (2016) EEG based functional connectivity reflects cognitive load during emotion regulation. In ISBI, Cited by: §4, §4.
  • [23] N. Yahata, J. Morimoto, R. Hashimoto, G. Lisi, K. Shibata, Y. Kawakubo, H. Kuwabara, M. Kuroda, T. Yamada, F. Megumi, et al. (2016) A small number of abnormal brain connections predicts adult autism spectrum disorder. Nature communications 7 (1), pp. 1–12. Cited by: §1.
  • [24] X. Yan and J. Han (2002) Gspan: graph-based substructure pattern mining. In ICDM, pp. 721–724. Cited by: §4, §4.
  • [25] J. Yang, Q. Zhu, R. Zhang, J. Huang, and D. Zhang (2020) Unified brain network with functional and structural data. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 114–123. Cited by: §1.
  • [26] L. Zeng, H. Shen, L. Liu, L. Wang, B. Li, P. Fang, Z. Zhou, Y. Li, and D. Hu (2012) Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis. Brain 135 (5), pp. 1498–1507. Cited by: §1.
  • [27] X. Zhang, L. He, K. Chen, Y. Luo, J. Zhou, and F. Wang (2018) Multi-view graph convolutional network and its applications on neuroimage analysis for parkinson’s disease. In AMIA Annual Symposium Proceedings, Vol. 2018, pp. 1147. Cited by: §1, §4.
  • [28] Y. Zhang and H. Huang (2019) New graph-blind convolutional network for brain connectome data analysis. In International Conference on Information Processing in Medical Imaging, pp. 669–681. Cited by: §1.