1 Introduction
Brain network analysis, enriched by advances in neuroimaging technologies such as electroencephalography (EEG) and diffusion tensor imaging (DTI), has become an appealing research topic in neuroscience in recent years [7]. The study originates from modeling the human brain connectome as a graph: a mathematical construct mapping the connectivity of anatomically distinct brain regions (i.e., nodes) and interregional pathways (i.e., edges). Through graph-based analysis, the information encoded by the connectome can promote critical understanding of how the brain manages cognition, what signals the connections convey and how these signals affect brain regions [28]. It has shown great potential in disease diagnosis, clinical outcome prediction, therapeutic adjustment and the identification of biological features [19, 8, 23]. With the development of machine learning algorithms for graph-structured data, it has become natural to apply such approaches to brain network analysis.
In the literature, a variety of machine learning methods have been explored for brain network selection and classification, for example, support vector machines (SVMs) [26], graph kernels [25], independent component analysis [6], frequent graph-based pattern mining (gSpan) [5] and tensor decomposition [13, 11]. Deep learning methods such as the convolutional neural network (CNN) [20] and the graph convolutional network [27], which are successful on many tasks, have been exploited as well. Although great achievements have been made with these methods, some issues still exist. The human connectome has complex and nonlinear characteristics, which may not be well captured by linear models. Meanwhile, deep learning methods suffer from enormous parameter sizes, which makes them both difficult to train and vulnerable to overfitting. Besides, many methods do not make good use of, or even fail to preserve, the graph structure. Thus, it is desirable to develop a concise method for brain network analysis.

In this paper, we propose a novel graph-based kernel learning approach for brain network predictive analysis and apply it to a challenging EEG-connectome emotion regulation task. The proposed framework is illustrated in Fig. 1. The contributions of this work are threefold:

We derive a Structure-preserving Symmetric Graph Kernel (SSGK) in the tensor product space for brain network classification. A new matrix factorization scheme is introduced to incorporate the graph structure as well as the symmetric constraint and sparse layouts.

Extensive experiments on a multi-class EEG-based emotion regulation task across different frequency bands demonstrate the superior performance of SSGK compared with state-of-the-art traditional and deep learning methods. The results also show that relevant EEG signals are primarily encoded in the alpha and theta bands during the emotion regulation task, which is consistent with previous studies.

SSGK is a general graph-kernel framework for efficiently measuring the similarity of structured data. It has great potential to be applied to a wide range of applications, in conjunction with various kernel-based methods and kernel functions.
2 Preliminaries
In this section, we first introduce some notations and basic operations that will be used throughout this paper. Then we review some aspects of the kernel learning problem.
Notations and Basic Operations. Following [12], we denote vectors by lowercase boldface letters, e.g., $\mathbf{a}$; and matrices by uppercase boldface letters, e.g., $\mathbf{A}$. An index is denoted by a lowercase letter and spans the range from 1 to the uppercase letter of the index, e.g., $i = 1, \dots, I$. We denote a matrix as $\mathbf{A} \in \mathbb{R}^{I \times J}$ and its elements by $a_{ij}$. We will often use calligraphic letters ($\mathcal{X}$, $\mathcal{Y}$, $\mathcal{H}$, $\mathcal{F}$) to denote general spaces. Specifically, the inner product of two matrices is defined as $\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i,j} a_{ij} b_{ij}$. The Frobenius norm of a matrix is defined as $\|\mathbf{A}\|_F = \sqrt{\langle \mathbf{A}, \mathbf{A} \rangle}$. The $\ell_1$ norm of a vector is defined as the sum of the absolute values of its elements. A rank-one matrix equals the outer product of two vectors: $\mathbf{A} = \mathbf{a}\mathbf{b}^\top$. Note that for rank-one matrices it holds that
$\langle \mathbf{a}\mathbf{b}^\top, \mathbf{c}\mathbf{d}^\top \rangle = \langle \mathbf{a}, \mathbf{c} \rangle \, \langle \mathbf{b}, \mathbf{d} \rangle.$ (1)
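As a quick sanity check of this rank-one identity, the following sketch verifies it numerically on random vectors (the variable names are illustrative only):

```python
import numpy as np

# Numerical check of the rank-one identity <ab^T, cd^T> = <a, c><b, d>
# on random vectors; the identity holds for any a, b, c, d.
rng = np.random.default_rng(0)
a, b, c, d = rng.standard_normal((4, 5))

lhs = np.sum(np.outer(a, b) * np.outer(c, d))  # matrix inner product
rhs = np.dot(a, c) * np.dot(b, d)              # product of vector inner products
```

The two quantities agree up to floating-point error, which is exactly the property the kernel derivation below relies on.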
Kernel Learning. In a typical prediction task, given a collection of training examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathcal{X}$ is an input sample and $y_i$ is the class label of $\mathbf{x}_i$, the goal is to find a function that accurately predicts the label of an unseen example in $\mathcal{X}$. Support Vector Machines (SVMs) are among the most popular kernel-based learning algorithms; they are effective on data separable by linear boundaries, and kernel functions are used to extend them to nonlinear decision boundaries [16]. The kernel function encapsulates the hypothesis language, i.e., how to perform data transformation and knowledge encoding. In general, it maps data from the original input feature space to a higher-dimensional feature space (known as a Hilbert space), and a kernel function corresponds to the inner product in this higher-dimensional space. The computational attractiveness of kernel methods comes from the fact that quite often a closed form of the 'feature space inner products' exists [9]. Instead of mapping the data explicitly, the kernel can be calculated directly. According to Mercer's theorem [18], we can verify whether a kernel function is valid by the following theorem [3].

Theorem 1
A function $K$ defined on $\mathcal{X} \times \mathcal{X}$ is a positive definite kernel of $\mathcal{X}$ if and only if there exists a feature mapping function $\phi: \mathcal{X} \to \mathcal{H}$ such that

$K(\mathbf{x}, \mathbf{y}) = \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle$ (2)

for any $\mathbf{x}, \mathbf{y} \in \mathcal{X}$.
In particular, an important property of positive definite kernels is that they are closed under sum, multiplication by a positive scalar, and product [4].
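As an illustration of Theorem 1, one can check Mercer's condition empirically: the Gram matrix of a valid kernel on any finite sample must be positive semi-definite. A minimal sketch with a Gaussian RBF kernel (the sample and bandwidth are arbitrary choices, not from the paper):

```python
import numpy as np

# Empirical PSD check of a Gaussian RBF Gram matrix on a random sample.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
gamma = 0.5

# Pairwise squared distances, then K[i, j] = exp(-gamma * ||x_i - x_j||^2).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq_dists)

eigvals = np.linalg.eigvalsh(K)   # all eigenvalues should be (numerically) >= 0
```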
3 Methods
Brain networks are biologically expected to be both sparse and highly localized in space. These characteristics place specific topological constraints on the machine learning models we can use effectively. We propose a new matrix factorization scheme that incorporates the graph structure as well as the symmetric constraint and sparse layouts, which allows one to interpret a brain network as a bilinear tensor product approximation. We then use this approximation to define a structure-preserving symmetric graph kernel (SSGK) for the SVM classifier. We present the key steps of our method in detail below.
Feature Extraction.
A graph provides a natural representation for connectome data, but there is no guarantee that such a representation will be good for kernel learning, since learning will only succeed if the regularities that underlie the data can be discerned by the kernel. From the characteristics of connectome data, we know that the essential information in the connectome is embedded in the structure of the graph. Thus, one important aspect of kernel learning for such complex objects is to represent them by sets of key structural features that are easier to manipulate. In previous work, matrix factorization was found to be particularly effective for extracting this structure: it takes the correlations in the graph matrix into account and represents them directly as a sum of rank-one matrices (bilinear bases), yielding a more compact representation of connectome data. Motivated by these observations, we use matrix factorization for feature extraction. In particular, given a graph matrix
$\mathbf{A} \in \mathbb{R}^{n \times n}$, we solve the following optimization problem:

$\min_{\{\mathbf{x}_r\}_{r=1}^{R}} \left\| \mathbf{A} - \sum_{r=1}^{R} \mathbf{x}_r \mathbf{x}_r^\top \right\|_F^2 + \beta \sum_{r=1}^{R} \| \mathbf{x}_r \|_1$ (3)

where $R$ is the rank of the matrix, defined as the smallest number of rank-one matrices in an exact matrix factorization, $\| \cdot \|_F$ is the Frobenius norm of the matrix, and $\| \cdot \|_1$ is the $\ell_1$ norm for sparse solutions (known as lasso regularization). Equation (3) can be solved by the Tensorlab toolbox [17] in Matlab.
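Since Tensorlab is a Matlab toolbox, the following Python sketch illustrates one possible way to solve a problem of this form with plain proximal gradient descent (ISTA-style soft-thresholding). It is a simplified stand-in for the actual solver, and all names and hyperparameters are illustrative:

```python
import numpy as np

def symmetric_sparse_factorize(A, R, beta=0.01, lr=0.002, n_iter=5000, seed=0):
    """Approximate a symmetric matrix A by sum_r x_r x_r^T with an l1 penalty
    on the factors. Plain proximal gradient descent; a hypothetical stand-in
    for the Tensorlab-based solver used in the paper."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = 0.1 * rng.standard_normal((n, R))              # columns are factors x_r
    for _ in range(n_iter):
        resid = X @ X.T - A                            # reconstruction residual
        grad = 4.0 * resid @ X                         # gradient of ||A - XX^T||_F^2
        X = X - lr * grad                              # gradient step
        X = np.sign(X) * np.maximum(np.abs(X) - lr * beta, 0.0)  # l1 prox step
    return X

# Toy symmetric "graph" matrix that admits an exact rank-2 factorization.
rng = np.random.default_rng(42)
V = rng.standard_normal((6, 2))
A = V @ V.T
X = symmetric_sparse_factorize(A, R=2)
rel_err = np.linalg.norm(A - X @ X.T) / np.linalg.norm(A)
```

The factors returned here play the role of the structural features fed to the kernel in the next step.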
Graph Structure Mapping. Note that although matrix factorization decomposes the graph matrix, we can still preserve the graph structure and recover the original matrix from the factorized results. We now show how the above feature extraction results can be exploited to induce a structure-preserving graph kernel. Suppose we are given the matrix factorizations of two graphs $\mathbf{A}$ and $\mathbf{B}$ as $\mathbf{A} = \sum_{r=1}^{R} \mathbf{x}_r \mathbf{x}_r^\top$ and $\mathbf{B} = \sum_{r=1}^{R} \mathbf{y}_r \mathbf{y}_r^\top$, respectively. We assume the graph observations are mapped into the Hilbert space by

$\Phi(\mathbf{A}) = \sum_{r=1}^{R} \phi(\mathbf{x}_r) \, \phi(\mathbf{x}_r)^\top.$ (4)
Importantly, the mapping result $\Phi(\mathbf{A})$ is still a symmetric matrix, but its dimension is high, and can even be infinite, depending on the feature mapping function $\phi$.
Based on the definition of the kernel function, we know that the feature space is a highdimensional space generated from the original space, equipped with the same operations. Thus, we can factorize graph data directly in the feature space in the same way as in the original space. This is formally equivalent to performing the following mapping:
$\mathbf{A} = \sum_{r=1}^{R} \mathbf{x}_r \mathbf{x}_r^\top \;\longmapsto\; \Phi(\mathbf{A}) = \sum_{r=1}^{R} \phi(\mathbf{x}_r) \, \phi(\mathbf{x}_r)^\top.$ (5)
In this sense, it corresponds to mapping graphs into high-dimensional graphs that retain the original structure. More precisely, it can be regarded as mapping the original graph matrix to the matrix feature space and then conducting the matrix factorization in that feature space, as illustrated in Fig. 2.
After mapping the matrix factorization into the outer product feature space, the kernel can be defined directly with the inner product in that feature space. Thus, based on equation (1), we can derive our SSGK model:
$K(\mathbf{A}, \mathbf{B}) = \langle \Phi(\mathbf{A}), \Phi(\mathbf{B}) \rangle = \sum_{i=1}^{R} \sum_{j=1}^{R} \langle \phi(\mathbf{x}_i), \phi(\mathbf{y}_j) \rangle^2 = \sum_{i=1}^{R} \sum_{j=1}^{R} k(\mathbf{x}_i, \mathbf{y}_j)^2.$ (6)
Based on Theorem 1, it is easy to see that this kernel is valid, as it is expressed as an inner product of the two matrices $\Phi(\mathbf{A})$ and $\Phi(\mathbf{B})$. From the derivation, we know that such a kernel takes the flexibility of the graph structure into account. In general, SSGK extends conventional kernels from the vector space to the matrix space, and each vector kernel can be used in this framework for EEG-connectome analysis in conjunction with kernel machines. Our positive result can be viewed as saying that designing a good graph kernel function is much like designing a good graph structure in the feature space.
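Given the factor matrices produced by the feature extraction step, the kernel value can be computed directly from pairwise base-kernel evaluations between factors. A hedged sketch, assuming the factorized form above with an RBF base kernel (not the authors' implementation; all names are illustrative):

```python
import numpy as np

def rbf(x, y, gamma=0.1):
    # Base vector kernel k(x, y); any valid Mercer kernel could be used here.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def ssgk(X, Y, gamma=0.1):
    """SSGK between two graphs given their factor matrices (columns x_r, y_r):
    K(A, B) = sum_i sum_j k(x_i, y_j)^2, via the rank-one inner product
    identity applied to the symmetric factors phi(x)phi(x)^T."""
    return sum(rbf(X[:, i], Y[:, j], gamma) ** 2
               for i in range(X.shape[1]) for j in range(Y.shape[1]))

# Illustrative factor matrices for two hypothetical 6-node graphs of rank 2.
rng = np.random.default_rng(0)
Xf = rng.standard_normal((6, 2))
Yf = rng.standard_normal((6, 2))
k_ab = ssgk(Xf, Yf)
```

By construction the kernel is symmetric in its two arguments and strictly positive for RBF base kernels.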
4 Experiments and Discussions
Data. Data were collected from 22 healthy participants at *** and from 11 healthy participants at @@@ (institution names are anonymized for blind review). Each participant underwent an Emotion Regulation Task (ERT). During the ERT session, participants were instructed to look at pictures displayed on a screen. Emotionally neutral pictures (e.g., landscapes, everyday objects) and negative pictures (e.g., car crashes, natural disasters) appeared on the screen for seven seconds in random order. One second after a picture appeared, a corresponding auditory guide instructed the participant to neutral: view the neutral picture; to maintain: view the negative picture as they normally would; or to reappraise: view the negative picture while attempting to reduce their emotional response by reinterpreting the meaning of the picture. All subjects were recorded using the Biosemi system equipped with an elastic cap with 34 scalp channels. The acquired connectivity matrices cover 130 time points and 50 frequencies ranging from 1 Hz to 50 Hz in increments of 1 Hz.
Tasks. We study multi-class EEG-connectome emotion regulation tasks and analyze the effect of different frequency bands of the EEG signals. In emotion regulation, studies have shown that relevant EEG information is primarily encoded in the low frequency bands [2, 22]. Thus, we analyze the EEG-connectome data in five frequency bands: Delta (1–3 Hz), Theta (4–7 Hz), Alpha (8–12 Hz) and Beta (13–30 Hz) for relative power, as well as the total power of the EEG (1–30 Hz) [14]. The average EEG-connectome during neutral, maintain and reappraise in the five frequency bands is shown in Figure 3, where the $x$- and $y$-axes represent the vertex ids and the color of cell $(i, j)$ represents the strength of the connectivity between vertices $i$ and $j$. We can see that the connectivity in the alpha band is generally stronger than in the other frequency bands, with the theta band second.
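For concreteness, band-averaging a connectivity tensor of the shape described above could be sketched as follows (random data stands in for the real EEG-connectome; the index conventions are assumptions):

```python
import numpy as np

# Connectivity tensor of shape (channels, channels, time, frequency),
# following the data description: 34 channels, 130 time points, 50 bins at 1 Hz.
conn = np.random.default_rng(0).random((34, 34, 130, 50))

bands = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 12),
         "beta": (13, 30), "all": (1, 30)}

def band_average(tensor, lo, hi):
    # Assumes the bin for f Hz sits at index f - 1 (bins start at 1 Hz);
    # average over the frequency slice and over time -> one 34 x 34 matrix.
    return tensor[..., lo - 1:hi].mean(axis=(-1, -2))

band_conn = {name: band_average(conn, lo, hi) for name, (lo, hi) in bands.items()}
```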
Algorithms. We evaluate eight algorithms in Table 1 on the five tasks above, each representing a different strategy: edge-based feature extraction (Edge), where edge values are directly used as features by flattening the connectivity matrices of the EEG-connectome into vectors; the local clustering coefficient (CC) [15], which measures a network's local segregation; the characteristic path length (CPL) [21], which quantifies global information integration; graph-based substructure pattern mining (gSpan) [24], a discriminative subgraph selection approach; the dual structure-preserving kernel (DuSK) [11], which takes multidimensional tensors as input. We use second-order (i.e., averaged over time and frequency), third-order (i.e., averaged over time) and fourth-order (i.e., all data with dimension $34 \times 34 \times 130 \times f$, where $f$ is the number of frequency levels in the band) versions of this scheme, denoted DuSK-2D, DuSK-3D and DuSK-4D, respectively; the convolutional neural network (CNN) with 2D convolutions for averaged 2D brain network data and 3D convolutions for averaged 3D brain network data [10]; the graph convolutional network (GCN) for averaged 2D brain network data [27], where the average of all brain networks is used as the graph structure (i.e., adjacency matrix) for information propagation; and the proposed method and its variant without the sparse constraint (SSGK and SSGK w/o sparsity).

Experimental Settings. We use the subjects collected from *** as the training set and those from @@@ as the testing set in all experiments. Following [11]
, we use the SVM with the Gaussian RBF kernel as the base classifier for all methods. We use the classification accuracy as the evaluation metric.
Algorithm Settings. All methods select the optimal trade-off parameter of the SVM and the kernel width parameter from a common candidate set. Other parameters for gSpan and DuSK are set following [24] and [11], respectively. For our SSGK methods, the rank and sparsity-regularization parameters were automatically selected from candidate value sets by grid search.
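The overall protocol of a precomputed-kernel SVM with grid search can be sketched as follows; the toy data, the plain RBF standing in for SSGK, and the candidate grids are all illustrative, since the paper's exact value sets are not reproduced here:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import ParameterGrid

# Toy stand-in data: two Gaussian classes. In the actual pipeline the Gram
# matrices would hold SSGK values between EEG-connectome graphs.
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(1.5, 1.0, (20, 10))])
y_tr = np.array([0] * 20 + [1] * 20)
X_te = np.vstack([rng.normal(0.0, 1.0, (10, 10)), rng.normal(1.5, 1.0, (10, 10))])
y_te = np.array([0] * 10 + [1] * 10)

def rbf_gram(A, B, gamma):
    # Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2).
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

best_acc, best_params = 0.0, None
for p in ParameterGrid({"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}):
    clf = SVC(C=p["C"], kernel="precomputed")
    clf.fit(rbf_gram(X_tr, X_tr, p["gamma"]), y_tr)          # train Gram matrix
    acc = clf.score(rbf_gram(X_te, X_tr, p["gamma"]), y_te)  # test-vs-train Gram
    if acc > best_acc:
        best_acc, best_params = acc, p
```

Note that `SVC(kernel="precomputed")` expects the test Gram matrix rows to index test samples and columns to index training samples.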
Results. Detailed results are listed in Table 1. It can be seen that the proposed SSGK-based methods outperform all compared methods by 10%–20% on almost all five frequency bands. The superiority of the proposed methods demonstrates the effectiveness of utilizing the structure within the graph representation during encoding. More specifically, among the five frequency bands, SSGK produces the best performance on the Alpha band and the second best on the Theta band, which is consistent with previous findings [22, 1] and can also be observed in our visualization in Figure 3. Furthermore, comparing SSGK with its variant without sparse regularization shows that the sparse-regularized approach consistently performs better; this advantage indicates the importance of accounting for redundant information in the observed frequency bands.
Table 1. Classification accuracy (%) on the five frequency bands.

Category        Method                Delta   Theta   Alpha   Beta    All
Traditional     Edge                  42.42   54.55   51.52   51.52   45.45
                CC                    54.55   54.55   42.42   51.52   42.42
                CPL                   48.48   42.42   45.45   48.48   39.39
                gSpan                 39.39   51.52   39.39   54.55   48.48
                DuSK-2D               51.52   63.64   51.51   51.52   54.55
                DuSK-3D               57.58   57.58   57.58   54.55   48.48
                DuSK-4D               54.55   54.55   51.52   54.55   57.58
Deep Learning   CNN-2D                51.11   43.71   43.07   42.54   41.48
                CNN-3D                46.67   45.93   41.48   57.04   44.44
                GCN                   41.31   48.08   41.01   40.61   37.37
Ours            SSGK (w/o sparsity)   57.58   66.67   63.64   54.55   57.58
                SSGK                  63.64   69.70   72.73   60.61   57.58
5 Conclusion
In this paper, a graph-based kernel learning approach called the Structure-preserving Symmetric Graph Kernel (SSGK) is proposed for the EEG-derived connectome classification task. The proposed method follows two consecutive steps: first, a sparsity-inducing symmetric matrix factorization strategy is applied to extract structural features from the naturally symmetric graph representations of EEG-connectome data; then the extracted structural features are used directly to define the SSGK function, which is fed into a support vector machine for classification. The proposed method is clinically interpretable and is able to encode prior knowledge in the kernel through the structural information in the graph representation. Experimental results on a challenging emotion regulation task demonstrate the effectiveness of the proposed method for encoding relevant EEG-connectome information.
Funding
This work was supported by the National Institutes of Health [AA123456 to C.S., BB765432 to M.H.]; and the Alcohol Education Research Council.
References
 [1] (2010) The relationship of alpha waves and theta waves in EEG during relaxation and IQ test. In 2010 2nd International Congress on Engineering Education, pp. 69–72. Cited by: §4.
 [2] (2015) What hemodynamic (fNIRS), electrophysiological (EEG) and autonomic integrated measures can tell us about emotional processing. Brain and Cognition 95, pp. 67–76. Cited by: §4.

 [3] (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media. Cited by: §2.
 [4] (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press. Cited by: §2.
 [5] (2016) Network-based classification of ADHD patients using discriminative subnetwork selection and graph kernel PCA. Computerized Medical Imaging and Graphics 52, pp. 82–88. Cited by: §1.
 [6] (2011) Discriminant analysis of functional connectivity patterns on grassmann manifold. Neuroimage 56 (4), pp. 2058–2067. Cited by: §1.
 [7] (2016) Fundamentals of brain network analysis. Academic Press. Cited by: §1.
 [8] (2015) Identifying connectome module patterns via new balanced multi-graph normalized cut. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 169–176. Cited by: §1.
 [9] (2003) A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter 5 (1), pp. 49–58. Cited by: §2.
 [10] (2013) Natural image bases to represent neuroimaging data.. In ICML, pp. 987–994. Cited by: §4.
 [11] (2014) DuSK: a dual structure-preserving kernel for supervised tensor learning with applications to neuroimages. In SDM, pp. 127–135. Cited by: §1, §4, §4, §4.
 [12] (2009) Tensor decompositions and applications. SIAM review 51 (3), pp. 455–500. Cited by: §2.

 [13] (2013) Identifying network correlates of brain states using tensor decompositions of whole-brain dynamic functional connectivity. In 2013 International Workshop on Pattern Recognition in Neuroimaging, pp. 74–77. Cited by: §1.
 [14] (2009) Mental training enhances attentional stability: neural and behavioral evidence. The Journal of Neuroscience 29 (42), pp. 13418–13427. Cited by: §4.
 [15] (2010) Complex network measures of brain connectivity: uses and interpretations. Neuroimage 52 (3), pp. 1059–1069. Cited by: §4.
 [16] (2018) A study on handling non linear separation of classes using kernel based supervised noise clustering approach. International Journal of Computer Applications 181 (5), pp. 29–35. Cited by: §2.
 [17] (January 2014) Tensorlab v2.0. Available online. Note: http://www.tensorlab.net/ Cited by: §3.

 [18] (2013) The nature of statistical learning theory. Springer Science & Business Media. Cited by: §2.
 [19] (2014) Human connectome module pattern detection using a new multi-graph min-max cut model. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 313–320. Cited by: §1.
 [20] (2017) Structural deep brain network mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 475–484. Cited by: §1.
 [21] (1998) Collective dynamics of 'small-world' networks. Nature 393 (6684), pp. 440–442. Cited by: §4.
 [22] (2016) EEG-based functional connectivity reflects cognitive load during emotion regulation. In ISBI. Cited by: §4, §4.
 [23] (2016) A small number of abnormal brain connections predicts adult autism spectrum disorder. Nature communications 7 (1), pp. 1–12. Cited by: §1.
 [24] (2002) gSpan: graph-based substructure pattern mining. In ICDM, pp. 721–724. Cited by: §4, §4.
 [25] (2020) Unified brain network with functional and structural data. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pp. 114–123. Cited by: §1.
 [26] (2012) Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis. Brain 135 (5), pp. 1498–1507. Cited by: §1.
 [27] (2018) Multi-view graph convolutional network and its applications on neuroimage analysis for Parkinson's disease. In AMIA Annual Symposium Proceedings, Vol. 2018, pp. 1147. Cited by: §1, §4.
 [28] (2019) New graph-blind convolutional network for brain connectome data analysis. In International Conference on Information Processing in Medical Imaging, pp. 669–681. Cited by: §1.