1 Introduction
Analyzing brain activity through brain imaging such as electroencephalography (EEG) is important for understanding the mental state or thoughts of a person. It is essential in a variety of applications including brain-computer interfaces, emotion recognition, and mental disease diagnosis. The brain consists of multiple functional regions, and the activation patterns over the regions provide valuable information regarding the mental state. Therefore, studying the inter-regional relationships appearing in the patterns, called functional connectivity (simply referred to as connectivity in this paper), has been shown to be effective for the analysis of brain signals [1, 2]. Since the brain regions do not lie in a Euclidean space, graphs are the most natural and suitable data structure to represent the connectivity. Previous studies showed that the graph analysis approach can be successfully applied to understanding brain signals [3, 4, 5].
However, how to measure the level of connectivity, how to define an appropriate graph structure, and how to define appropriate features for signals from different brain regions are all still open problems. Usually, they are determined manually based on a priori knowledge. For instance, correlation or causality metrics between the signals of different regions can be used as connectivity measures [6]; then, a graph can be constructed by connecting pairs of brain regions showing large connectivity values [2]; and finally, the power or entropy of the signal can be used as a feature of each region (i.e., a vertex of the graph) [7, 8]. However, tackling these issues separately and manually is unlikely to be optimal. In fact, this challenge applies not only to brain signal data but also to other data involving graph structures, such as social networks and chemical compounds [9, 10, 11].
In this paper, we address these issues via direct learning from data. We propose a new deep learning model that extracts both a graph structure and a feature vector at each vertex of the graph from raw time-series EEG data, and performs classification using the extracted graph and features. The whole model is trained in an end-to-end manner to maximize the classification performance, which simultaneously optimizes the graph extraction, feature extraction, and classification parts. The graph and features adapt to each data instance, which helps overcome the notorious non-task-related variability of EEG signals. In particular, the extracted graphs are weighted directed multilayer graphs, which can convey richer information regarding the brain activation pattern than the undirected single-layer graphs usually used in previous work.
In addition, we propose three different graph sampling methods, which control the conditions of the extracted graphs such as weighted vs. unweighted, and multigraph vs. simple graph. Furthermore, we propose a way to evaluate the quality of the extracted graph structure in terms of consistency, in order to alleviate the limitation that no ground truth structure is available to measure the correctness of the obtained graph structure.
2 Related Work
Deep learning for brain signals
There are two major approaches to brain signal analysis and classification using traditional deep learning models. The signal-based approach exploits popular convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to which the time-series brain signal data (either directly or after conversion to image-like representations) are fed without considering inter-regional connectivity [12, 13, 14]. In the connectivity-based approach, on the other hand, the connectivity information representing pairwise correlation or causality is obtained and represented as images, which are modeled by CNNs [15, 16].
Graph neural networks (GNNs) [17, 18, 19, 20] are a type of neural network for graph data, which can be used to directly model connectivity-based graph representations of brain signals, as shown in [21, 22, 23]. However, they have a limitation in that appropriate graph structures still need to be designed manually.
Graph structure inference
Research on finding descriptive graph structures from given non-graph data is still in its infancy [24, 9]. Neural relational inference [25] is a variational autoencoder-based model that extracts structural information of dynamic relational systems, such as physical interaction systems, to predict future states of dynamic objects from observed past states. However, this method additionally requires a priori knowledge about the graph structure (i.e., graph density), whereas our method relies only on the data to extract an appropriate graph structure. In [10], a graph convolutional network-based model is proposed, which jointly learns the graph structure and features using bilevel programming for node classification problems. This method is developed only for transductive learning, while our model is for inductive learning. Furthermore, to the best of our knowledge, our work is the first attempt at data-driven graph structure inference for EEG data.
3 Proposed Method
3.1 Problem statement and notations
Our goal is to build a neural network model performing classification of the given EEG data by using the underlying connectivity structure that is also discovered by the model.
The given data is represented by , where is a set of time domain signals collected from sensors (i.e., EEG electrodes), whose length is , and is the corresponding class label.
The graph structure to be estimated is assumed to be a weighted directed multilayer graph without self-loops:
. represents the vertices, and and indicate the existence of edges and the edge weights between vertex pairs in the th graph layer, respectively. We assume . is a hyperparameter to control the number of graph layers.
3.2 Proposed model
The overall structure of our model is illustrated in Figure 1. It can be divided into four parts: graph membership extraction, graph sampling, feature extraction, and classification.
In the graph membership extraction part, the given set of raw signals is fed into the network to infer the membership of each edge to each graph layer. The inferred memberships are used for graph sampling to construct a multilayer graph structure. In the feature extraction part, the raw signals are converted to features using convolutional and max-pooling operations. Finally, the obtained graph structure and features are combined in the GNN to obtain the final classification result.
3.2.1 Graph membership extraction
From the given time-series data, we compute the latent membership representing the certainty of the existence of a directed edge from vertex to vertex for each graph layer. Node-to-edge and edge-to-node operations [25] and fully connected networks are used:
(1) 
(2) 
where denotes the concatenation operation and to are fully connected networks having exponential linear units [26] and batch normalization [27].
3.2.2 Graph sampling
From the graph layer membership information, the graph structure is obtained via probabilistic or deterministic sampling. We consider the following three methods for sampling.
Stochastic sampling (STO)
The stochastic sampling method probabilistically assigns the potential edge from vertex to vertex to one of the graph layers. Since the sampled graph weight is discrete, the Gumbel-softmax reparametrization technique [28, 29] is used to provide continuous relaxation and enable computation of gradients:
(3) 
where is a random vector whose components are i.i.d. and follow the standard Gumbel distribution. is the softmax temperature controlling sampling smoothness, which is set as in this paper. Then, the unweighted edge from to is obtained by
(4) 
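As a rough illustration of the stochastic sampling step, the sketch below draws a relaxed one-hot layer assignment for a single ordered vertex pair with the Gumbel-softmax trick and then discretizes it. The function names, the example membership values, and the temperature of 0.5 are assumptions made for illustration and are not the settings of this paper; the memberships are also assumed to act as unnormalized log-probabilities.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample over graph layers for one ordered vertex pair."""
    # Standard Gumbel noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
    noise = []
    for _ in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        noise.append(-math.log(-math.log(u)))
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    # Numerically stable softmax over the perturbed, temperature-scaled values.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(soft_sample):
    """Hard (unweighted) assignment: the edge goes to the arg-max layer."""
    k = max(range(len(soft_sample)), key=soft_sample.__getitem__)
    return [1 if i == k else 0 for i in range(len(soft_sample))]


rng = random.Random(0)
membership = [2.0, 0.5, -1.0]  # hypothetical memberships for three graph layers
soft = gumbel_softmax(membership, temperature=0.5, rng=rng)
hard = to_one_hot(soft)
```

Here `soft` is a point on the probability simplex and `hard` places the edge in exactly one layer. A lower temperature pushes `soft` closer to a one-hot vector, at the cost of higher-variance gradients when the same relaxation is used inside a differentiable framework.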
Deterministic thresholding (DET)
The estimated graph has multiple layers, and we expect that different graph layers model different types of connectivity information. While the stochastic sampling method restricts each edge to belong to only one graph layer, the deterministic thresholding method relaxes this restriction and allows a pair of vertices to have edges in multiple layers via thresholding:
(5) 
where is a threshold, which is set as in our work. The same continuous relaxation technique used in the stochastic sampling method is used to make discrete variables differentiable during training.
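The difference from stochastic sampling can be sketched as follows. The code assumes that the memberships of one ordered vertex pair are first normalized with a softmax over the graph layers and that an edge is kept in every layer whose normalized value exceeds the threshold; both the normalization and the threshold of 0.3 are illustrative assumptions rather than the exact form of (5).

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def threshold_edges(membership, threshold):
    """Keep an edge in every layer whose normalized membership exceeds the threshold."""
    probs = softmax(membership)
    return [1 if p > threshold else 0 for p in probs]


# Hypothetical memberships of one ordered vertex pair for three graph layers.
edges = threshold_edges([1.2, 1.0, -2.0], threshold=0.3)

# With a high threshold and flat memberships, the pair has no edge in any layer.
no_edges = threshold_edges([0.0, 0.0, 0.0], threshold=0.5)
```

In the first call the pair receives an edge in two layers at once, which stochastic sampling cannot produce; in the second call the pair stays unconnected in every layer.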
Continuous sampling (CON)
While the previous two methods construct unweighted graphs, the continuous sampling method allows edge weights to be continuous values so that different degrees of connectivity in different graph layers are maintained. For this, having continuous values between 0 and 1, which is obtained from the Gumbel-softmax operation in (3), is directly used as the edge weight from to in the th graph layer. Therefore, this method produces the most general form of graph structures among the three methods, i.e., weighted directed multilayer graphs.
Skip layer
A graph constructed by the stochastic or continuous sampling method forces every ordered pair of vertices to have an edge in at least one graph layer. However, some pairs of vertices may have no direct relationship. To allow this, one of the graph layers can be assigned as a skip layer. The skip layer is discarded when the graph is passed to the GNN, so that the edges belonging to this layer are omitted from the graph used for classification.
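A minimal sketch of how a skip layer could be handled is given below. It assumes that a graph is stored as a list of adjacency matrices (one per layer) and that the first layer serves as the skip layer; the function name and the toy three-vertex graph are hypothetical.

```python
def drop_skip_layer(adjacency_layers, use_skip):
    """Return the layers passed to the GNN; layer 0 is assumed to be the skip layer."""
    return adjacency_layers[1:] if use_skip else adjacency_layers


# Toy graph with three vertices: every ordered pair (i, j), i != j, is assigned
# to exactly one of the two layers, as in stochastic sampling.
layers = [
    [[0, 1, 0], [0, 0, 0], [1, 0, 0]],  # skip layer: pairs 0->1 and 2->0
    [[0, 0, 1], [1, 0, 1], [0, 1, 0]],  # regular layer: all remaining pairs
]
kept = drop_skip_layer(layers, use_skip=True)
```

After the skip layer is dropped, the pairs 0->1 and 2->0 have no edge in the graph that reaches the GNN, which is how an absent relationship is expressed.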
3.2.3 Feature extraction
Apart from the process of graph extraction, signal features are extracted from the original signals by 1D convolutional and max-pooling operations. In order to capture the dynamic information of the signals, we adopt a 1D version of dilated inception modules [30, 31] including dilated convolutional layers [32] with various dilation rates. The dilated convolutional layers with low dilation rates capture features appearing among neighboring samples, which correspond to fast-changing, high-frequency information, while those with high dilation rates consider more slowly varying features over larger temporal windows. This can be seen as analogous to the popular EEG signal analysis approach where the signal is divided into multiple frequency bands for separate analysis [33, 34]. As a result, we obtain the features , where is the reduced signal length and is the feature dimension.
3.2.4 Graph neural network
The GNN performs classification using the signal features and the constructed graph to obtain the predicted class label . First, the node-to-edge operation is applied to the features, and the result is combined with the graph structure via the message passing operation; then, the aggregation and edge-to-node operations are performed:
(6) 
where is the time index, and if a skip layer is used (assuming that the first graph layer is assigned as the skip layer) and otherwise.
is modeled by a fully connected network having rectified linear units, whose input and output are and dimensional, respectively, i.e., . Finally, this result is concatenated with the signal features via a skip connection and fed into a fully connected network after vectorization, i.e.,
(7) 
where .
4 Experiments
4.1 Setup
We use the DEAP dataset [35], which is one of the largest databases regarding human affective mental states. It contains 32-channel EEG recordings collected from 32 subjects while they watched 40 affective video stimuli, together with the corresponding emotional ratings (valence and arousal) given by the subjects. We consider a video identification task where the model classifies the given set of EEG signals into one of the 40 video stimuli. Details of the data processing can be found in Appendix A.1.
Our model (code: https://github.com/TBA) is implemented in PyTorch [36] and is trained using the Adam optimizer [37]. We repeat the experiment five times with different random seeds, and the average performance is reported. Details of the model structure and training parameters are given in Appendix A.2.
4.2 Result
Table 1 summarizes the classification accuracy of the proposed model (using the deterministic thresholding method for graph sampling and three graph layers without a skip layer) and existing methods. We consider traditional classifiers including k-nearest neighbor (kNN) and random forest, and the ChebNet-based method [21], where the graph structure is determined by physical distances between the electrodes and the signal entropy is used as features. In addition, we also test our model without graph structure extraction (noted as “GNN only” in the table), i.e., only the lower part in Figure 1, where an unweighted complete directed graph is used in the GNN part. The proposed method yields substantially higher accuracy than the other methods. It outperforms the ChebNet-based method by about 26 percentage points (91.23% vs. 65.27%), indicating that data-driven extraction of graphs and features is effective. When the graph extraction is omitted in our method (“GNN only”), the accuracy drops to 44.70%, which also indicates that the graph structure is crucial for modeling the EEG data. Further comparison with some other deep learning methods is shown in Appendix B, which also supports the effectiveness of our method.
Table 1: Classification accuracy of the proposed model and existing methods.
Method  kNN  Random forest  ChebNet  GNN only  Proposed
Accuracy  48.50%  51.34%  65.27%  44.70%  91.23%
In Table 2, we show the accuracy of our model for various combinations of the graph sampling method, the number of graph layers, and the existence of the skip layer. The result shows that the number of graph layers is the most important parameter. Single-layer graphs, considered in most existing studies, are not sufficient, and modeling different types of inter-region interaction with separate graph layers is highly beneficial. Among the three graph sampling methods, the deterministic thresholding method shows the best performance except in the 1+skip configuration, and the continuous sampling method also shows good performance when the number of graph layers is large. It seems that the restriction of choosing only one edge among the graph layers in the stochastic sampling method limits the performance. The existence of the skip layer has only a minor effect on the performance, especially when the number of graph layers is large.
Table 2: Classification accuracy for different graph sampling methods and numbers of graph layers.
#Layers  1+skip  2  2+skip  3  3+skip
STO  69.28%  73.98%  76.03%  86.86%  86.65%
DET  55.91%  86.61%  83.14%  91.23%  91.04%
CON  58.31%  76.29%  77.84%  90.08%  89.43%
Figure 2: t-SNE visualizations for two subjects. (a) Raw EEG signals of subject 1. (b) Extracted graphs of subject 1. (c) Raw EEG signals of subject 2. (d) Extracted graphs of subject 2.
The extracted graph structures are analyzed using the t-SNE technique [38]. Figure 2 compares the t-SNE visualization of the original EEG signals and the adjacency matrices of all graph layers in the extracted graphs for two subjects, where different colors indicate different classes. In Figures 2(a) and 2(c), the raw signals of different classes are rather mixed, so it is not easy to distinguish them. On the other hand, the graphs of the same class are closely grouped in Figures 2(b) and 2(d), which facilitates classification.
4.3 Graph consistency analysis
Since no ground truth for the graph structure is available, it is not possible to evaluate the correctness of the extracted graphs. As a way to evaluate the quality of the extracted graphs, we consider their consistency over repeated experiments. That is, the dissimilarity between the graph structures obtained from different repetitions is measured, which is applied to all pairwise combinations of repetitions. A low level of dissimilarity, i.e., a high level of consistency, among the extracted graphs indicates that they are reliable and meaningful.
The overall procedure to compute the dissimilarity of two graph structures is summarized in Algorithm 1. Basically, a distance function is applied to the adjacency matrices of the two graphs, for which we adopt the sum of absolute differences. Note that we do not have the issue of isomorphism because the vertices are clearly identified as distinguished EEG electrodes, which allows computation of differences. The distances for all pairs of repetitions are averaged, divided by the total number of possible edges, and subtracted from 1 to get the final consistency score.
One issue in measuring the dissimilarity between two graphs is the permutation ambiguity of graph layers. We do not impose any restriction on the order of the graph layers during training; thus, permutation of the order of the graph layers needs to be allowed in the dissimilarity computation. We simply perform an exhaustive search to find the best matching permutation for the two graphs.
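The consistency computation described above can be sketched as follows. The sketch assumes that each graph is a list of adjacency matrices (one per layer, given as nested lists) and that the number of possible edges is the number of layers times the number of ordered vertex pairs without self-loops; the function names and this normalization are assumptions and may differ from Algorithm 1.

```python
from itertools import combinations, permutations


def layer_distance(a, b):
    """Sum of absolute differences between two adjacency matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def graph_dissimilarity(g1, g2):
    """Smallest total distance over all orderings of the layers of g2."""
    return min(
        sum(layer_distance(l1, l2) for l1, l2 in zip(g1, perm))
        for perm in permutations(g2)
    )


def consistency(graphs):
    """1 - (mean pairwise dissimilarity / number of possible edges)."""
    num_layers = len(graphs[0])
    num_vertices = len(graphs[0][0])
    # Assumption: directed edges without self-loops in every layer.
    possible_edges = num_layers * num_vertices * (num_vertices - 1)
    pairs = list(combinations(graphs, 2))
    mean_dist = sum(graph_dissimilarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_dist / possible_edges


# Toy example: three repetitions, two layers, three vertices.
A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
B = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
B_changed = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # B with one edge removed
runs = [[A, B], [B, A], [A, B_changed]]
score = consistency(runs)
```

The first two repetitions contain the same layers in a different order, so their dissimilarity is zero after matching. The exhaustive search visits every layer ordering, which is affordable only because the number of graph layers is small.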
Table 3 shows the result of the consistency analysis in percentage. Except for the 1+skip case, high consistency levels (about 75 to 90%) are observed across different conditions. This suggests that the graph extraction process works reliably, and we can expect that the extracted graphs contain meaningful representations of the data.
Table 3: Graph consistency for different graph sampling methods and numbers of graph layers.
#Layers  1+skip  2  2+skip  3  3+skip
STO  43.91%  89.32%  89.67%  83.41%  79.10%
DET  61.23%  88.01%  85.31%  77.79%  77.66%
CON  55.87%  84.01%  76.58%  81.64%  82.11%
4.4 Graph structure analysis
For the best case of our method (i.e., three graph layers, no skip layer, and deterministic thresholding), we examine the obtained graph structures in order to understand the learned representations from the perspective of emotional cognitive responses. Since the graph structure is different for each data instance, we obtain a representative graph structure by aggregating multiple graph structures. Here, we consider a graph containing the most frequently appearing (top 10%) edges as the representative structure. Figure 3(a) shows the “default graph”, which contains the edges activated most frequently regardless of the task (i.e., across different video stimuli) and is obtained as the representative graph for all test data. Figures 3(b) and 3(c) show the representative graph structures for the video stimuli with the most positive and most negative valence ratings, respectively. Note that the size of a vertex indicates its in-degree; the out-degree is not explicitly shown because it is similar across all vertices (see Appendix C.2).
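One way to aggregate per-instance graphs into a representative structure is sketched below for a single graph layer. It assumes that each instance is given as a list of directed edges (source, target) and that "top 10%" refers to the most frequent fraction of the distinct edges observed; the function names, the toy data, and this reading of the selection rule are assumptions. The in-degree used for the vertex size is computed as well.

```python
import math
from collections import Counter


def representative_edges(edge_lists, top_fraction=0.1):
    """Edges that appear most often across instances (top fraction of distinct edges)."""
    counts = Counter(edge for edges in edge_lists for edge in set(edges))
    keep = max(1, math.ceil(top_fraction * len(counts)))
    return [edge for edge, _ in counts.most_common(keep)]


def in_degrees(edges, num_vertices):
    """Number of incoming edges per vertex."""
    degrees = [0] * num_vertices
    for _, target in edges:
        degrees[target] += 1
    return degrees


# Toy example: edges of one graph layer for three data instances.
instances = [
    [(0, 1), (1, 2)],
    [(0, 1), (2, 0)],
    [(0, 1), (1, 2)],
]
top = representative_edges(instances, top_fraction=0.1)
top_half = representative_edges(instances, top_fraction=0.5)
degrees = in_degrees(top_half, num_vertices=3)
```

Restricting `instances` to the segments of one video stimulus would give a stimulus-specific structure, while passing all test data would correspond to the default graph.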
In the first layer (red) of Figure 3(a), strong activations toward the left temporal lobe are observed. The temporal lobe is related to the processing of complex visual stimuli such as scenes [39]; the amygdala, which plays an important role in emotional processing, is also located in the medial temporal lobe [40]. Therefore, this first layer plausibly represents the mental state under exposure to emotional visual stimuli. The functional connectivity related to visual content and emotional processing in the first layer is also observed in Figures 3(b) and 3(c). The first layer of Figure 3(c) contains a large number of edges entering the frontal and occipital lobes, which are related to emotional processing [41, 42] and sensory processing of visual stimuli [43, 44], respectively. In the case of the first layer of Figure 3(b), the connectivity is largely related to the content of the video stimulus; the video contains rhythmical dances, which probably contribute to the incoming connections to the fronto-central area that is known to be involved in motor-related perception [45].
The second layers (green) of Figures 3(b) and 3(c) show patterns that are clearly distinguished from each other. The right part of the frontal region receives a larger number of incoming connections than the left part in Figure 3(b), while the opposite holds in Figure 3(c). In the literature, it is consistently reported that the asymmetry of the left and right frontal lobes is significantly associated with the valence of emotion [46, 47]. That is, one side of the frontal brain is more activated than the other side during valence-related emotional processing, and the more activated side changes depending on the polarity of the emotion, which accords with the observed patterns in the second layers of Figures 3(b) and 3(c). In the second layer of Figure 3(a), the incoming edges are rather spread over the whole brain, which can be considered a result of aggregating the patterns appearing in the positive and negative valence video stimuli. Therefore, we can infer that the second layers have learned the valence-related characteristics of the brain signal.
It seems that the third layers (blue) in Figure 3, showing relatively sparse connections, mainly supplement the other layers. In all cases, the frontal region of the brain receives a large number of connections. The fronto-central lobe also receives a number of incoming edges in Figure 3(b), probably for the same reason as in the first layer (i.e., motor-related perception).





5 Conclusion
We have proposed an end-to-end deep network that can extract an appropriate directed multilayer graph structure from the given raw EEG signals for classification without any a priori information about the desirable structure. The experimental results showed that this data-driven approach for learning the graph structure significantly improves the classification performance in comparison to the other deep neural network and GNN approaches based on manually defined features and graphs. It was also shown that the extracted graph structures are consistent across repeated experiments and agree with the known brain activation patterns of cognitive responses to emotional visual stimuli.
References
 [1] Barry Horwitz. The elusive concept of brain connectivity. NeuroImage, 19(2):466–470, 2003.
 [2] Edward T. Bullmore and Danielle S. Bassett. Brain graphs: graphical models of the human brain connectome. Annual Review of Clinical Psychology, 7:113–140, 2011.
 [3] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059–1069, 2010.
 [4] Maria Giulia Preti, Thomas A.W. Bolton, and Dimitri Van De Ville. The dynamic functional connectome: state-of-the-art and perspectives. NeuroImage, 160:41–54, 2017.
 [5] Richard F. Betzel and Danielle S. Bassett. Multi-scale brain networks. NeuroImage, 160:73–83, 2017.
 [6] Mingzhou Ding, Yonghong Chen, and Steven L. Bressler. Granger causality: basic theory and application to neuroscience. In Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, chapter 17, pages 437–460. Wiley Online Library, 2006.
 [7] Jaime F. Delgado Saa and Miguel Sotaquirá Gutierrez. EEG signal classification using power spectral features and linear discriminant analysis: a brain computer interface application. In Proceedings of the 8th Latin American and Caribbean Conference for Engineering and Technology, pages 1–7, 2010.
 [8] Malihe Sabeti, Serajeddin Katebi, and Reza Boostani. Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artificial Intelligence in Medicine, 47(3):263–274, 2009.
 [9] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, pages 1–40, 2018.
 [10] Luca Franceschi, Mathias Niepert, Massimiliano Pontil, and Xiao He. Learning discrete structures for graph neural networks. In Proceedings of the 36th International Conference on Machine Learning, pages 1–13, 2019.
 [11] Nicola De Cao and Thomas Kipf. MolGAN: an implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, pages 1–11, 2018.
 [12] Robin Tibor Schirrmeister, Jost Tobias Springenberg, Lukas Dominique Josef Fiederer, Martin Glasstetter, Katharina Eggensperger, Michael Tangermann, Frank Hutter, Wolfram Burgard, and Tonio Ball. Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11):5391–5420, 2017.
 [13] Dalin Zhang, Lina Yao, Xiang Zhang, Sen Wang, Weitong Chen, Robert Boots, and Boualem Benatallah. Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pages 1703–1710, 2018.
 [14] Pouya Bashivan, Irina Rish, Mohammed Yeasin, and Noel Codella. Learning representations from EEG with deep recurrent-convolutional neural networks. In Proceedings of the 4th International Conference on Learning Representations, pages 1–15, 2016.
 [15] Chun-Ren Phang, Chee-Ming Ting, Fuad Noman, and Hernando Ombao. Classification of EEG-based brain connectivity networks in schizophrenia using a multi-domain connectome convolutional neural network. arXiv preprint arXiv:1903.08858, pages 1–15, 2019.
 [16] Seong-Eun Moon, Soobeom Jang, and Jong-Seok Lee. Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2556–2560, 2018.
 [17] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, pages 1–22, 2019.
 [18] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, pages 1–14, 2017.
 [19] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 3844–3852, 2016.
 [20] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272, 2017.
 [21] Soobeom Jang, Seong-Eun Moon, and Jong-Seok Lee. EEG-based video identification using graph signal modeling and graph convolutional neural network. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3066–3070, 2018.
 [22] Tengfei Song, Wenming Zheng, Peng Song, and Zhen Cui. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing (Early Access), pages 1–10, 2018.
 [23] Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, and Daniel Rueckert. Distance metric learning using graph convolutional networks: application to functional brain networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 469–477, 2017.
 [24] Lishan Qiao, Limei Zhang, Songcan Chen, and Dinggang Shen. Data-driven graph construction and graph learning: a review. Neurocomputing, 312:336–351, 2018.
 [25] Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, and Richard Zemel. Neural relational inference for interacting systems. In Proceedings of the 35th International Conference on Machine Learning, pages 2688–2697, 2018.
 [26] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). In Proceedings of the 3rd International Conference on Learning Representations, pages 1–14, 2015.
 [27] Sergey Ioffe and Christian Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, pages 448–456, 2015.
 [28] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-softmax. In Proceedings of the 5th International Conference on Learning Representations, pages 1–12, 2017.
 [29] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: a continuous relaxation of discrete random variables. In Proceedings of the 5th International Conference on Learning Representations, pages 1–20, 2017.
 [30] Wuzhen Shi, Feng Jiang, and Debin Zhao. Single image super-resolution with dilated convolution based multi-scale information learning inception module. In Proceedings of the IEEE International Conference on Image Processing, pages 977–981, 2017.
 [31] Sheng Yang, Guosheng Lin, Qiuping Jiang, and Weisi Lin. A dilated inception network for visual saliency prediction. arXiv preprint arXiv:1904.03571, 2019.
 [32] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. In Proceedings of the 4th International Conference on Learning Representations, pages 1–13, 2016.
 [33] Thalía Harmony, Thalía Fernández, Juan Silva, Jorge Bernal, Lourdes Díaz-Comas, Alfonso Reyes, Erzsébet Marosi, Mario Rodríguez, and Miguel Rodríguez. EEG delta activity: an indicator of attention to internal processing during performance of mental tasks. International Journal of Psychophysiology, 24(1–2):161–171, 1996.
 [34] Robert J. Barry, Adam R. Clarke, Stuart J. Johnstone, and Christopher R. Brown. EEG differences in children between eyes-closed and eyes-open resting conditions. Clinical Neurophysiology, 120(10):1806–1811, 2009.
 [35] Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. DEAP: a database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1):18–31, 2011.
 [36] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In Proceedings of the NIPS 2017 Autodiff Workshop: The Future of Gradient-based Machine Learning Software and Techniques, pages 1–4, 2017.
 [37] Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, pages 1–15, 2015.
 [38] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
 [39] Soraia M. Alarcao and Manuel J. Fonseca. Emotions recognition using EEG signals: a survey. IEEE Transactions on Affective Computing (Early Access), pages 1–20, 2017.
 [40] Luiz Pessoa. Emotion and cognition and the amygdala: from “what is it?” to “what’s to be done?”. Neuropsychologia, 48(12):3416–3429, 2010.
 [41] Louis A. Schmidt and Laurel J. Trainor. Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition & Emotion, 15(4):487–500, 2001.
 [42] Zeynab Mohammadi, Javad Frounchi, and Mahmood Amiri. Wavelet-based emotion recognition system using EEG signal. Neural Computing and Applications, 28(8):1985–1990, 2017.
 [43] Frederik S. Kamps, Joshua B. Julian, Jonas Kubilius, Nancy Kanwisher, and Daniel D. Dilks. The occipital place area represents the local elements of scenes. NeuroImage, 132:417–424, 2016.
 [44] K. Grill-Spector and R. Malach. The human visual cortex. Annual Review of Neuroscience, 27:649–677, 2004.
 [45] Olaf Hauk and Friedemann Pulvermüller. Neurophysiological distinction of action words in the fronto-central cortex. Human Brain Mapping, 21(3):191–201, 2004.
 [46] Eddie Harmon-Jones, Philip A. Gable, and Carly K. Peterson. The role of asymmetric frontal cortical activity in emotion-related phenomena: a review and update. Biological Psychology, 84(3):451–462, 2010.
 [47] Samantha J. Reznik and John J. B. Allen. Frontal asymmetry as a mediator and moderator of emotion: an updated review. Psychophysiology, 55(1):1–32, 2018.
Appendix A Implementation Details
a.1 Data processing
Each one-minute-long EEG signal in the DEAP dataset is divided into three-second-long segments () with a two-second-long overlap. This results in a total of 74,240 (32 subjects × 40 videos × 58 segments) sets of 32-channel EEG signals. They are randomly split into training, validation, and test sets, which hold 80%, 10%, and 10% of the entire dataset, respectively.
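The segment count above can be checked with a short sketch. It is not the authors' preprocessing code: the 128 Hz sampling rate, the function names `segment_bounds` and `split_indices`, and the fixed random seed are assumptions made for illustration. A three-second window with a two-second overlap corresponds to a one-second stride.

```python
import random

SAMPLING_RATE_HZ = 128  # assumption: sampling rate of the preprocessed recordings
SIGNAL_SEC, WINDOW_SEC, STRIDE_SEC = 60, 3, 1  # 3 s window, 2 s overlap -> 1 s stride


def segment_bounds(signal_sec=SIGNAL_SEC, window_sec=WINDOW_SEC,
                   stride_sec=STRIDE_SEC, rate=SAMPLING_RATE_HZ):
    """Return (start, end) sample indices of every full window in one signal."""
    window, stride, total = window_sec * rate, stride_sec * rate, signal_sec * rate
    return [(s, s + window) for s in range(0, total - window + 1, stride)]


def split_indices(n_items, percents=(80, 10, 10), seed=0):
    """Shuffle item indices and cut them into train/validation/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for repeatability
    n_train = n_items * percents[0] // 100
    n_val = n_items * percents[1] // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


bounds = segment_bounds()
n_total = 32 * 40 * len(bounds)  # subjects x videos x segments per video
train_idx, val_idx, test_idx = split_indices(n_total)
```

Under these assumptions, each one-minute signal yields 58 windows, and the three index lists partition all 74,240 segments in an 80/10/10 ratio.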
a.2 Model details
Graph membership extraction
, , and are two-layer fully-connected networks, which have 256 hidden neurons and 256 output neurons. Each fully-connected layer has exponential linear units [26], and batch normalization [27] is used in the output layer. consists of three fully-connected layers, where the first two layers have 256 hidden neurons with exponential linear units and the last layer has output neurons.
Feature extraction
Each dilated inception module consists of four 1D convolutional layers having a kernel size of 3, which have dilation rates of 1, 2, 4, and 8, respectively. Each layer has eight output channels, so . The max-pooling size is set as . Three dilated inception modules are connected in a row so that the signal features have a reduced length of .
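To illustrate what the four dilation rates do, the following framework-free sketch applies a single dilated 1D kernel and computes the span of input samples it covers. It is an illustration only, not the model implementation: the zero "same" padding and the helper names `dilated_conv1d` and `receptive_field` are assumptions, and the channel dimension, pooling, and learned weights are omitted.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Single-channel 1D cross-correlation with zero 'same' padding (assumed)."""
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(signal))
    ]


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


DILATIONS = (1, 2, 4, 8)
fields = [receptive_field(3, d) for d in DILATIONS]

# A kernel of ones applied to a unit impulse shows the tap spacing directly:
# the non-zero outputs lie exactly `dilation` samples apart.
impulse = [0.0] * 20
impulse[10] = 1.0
response = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=8)
nonzero_positions = [t for t, v in enumerate(response) if v != 0.0]
```

With a kernel size of 3, the four branches span 3, 5, 9, and 17 input samples, so one module observes the signal at several temporal scales at the same cost per branch.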
Graph neural network
, are two-layer fully-connected networks with 256 hidden neurons and 256 output neurons with rectified linear units. is a two-layer fully-connected network having 256 hidden neurons with rectified linear units and 40 softmax output neurons.
Training
Our model is trained with the Adam optimizer [37] with a learning rate of 0.0001 for 30 epochs to minimize the cross-entropy loss. The batch size is 32. The training procedure takes about 10-12 hours using a single NVIDIA K80 GPU. The test accuracy is measured using the network showing the best validation accuracy during the training process.
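The model-selection rule in the last sentence can be sketched as follows. This is a generic outline, not the authors' training script: `train_one_epoch` and `evaluate` are hypothetical callables standing in for one pass of optimization and for validation-accuracy measurement, and the model state is treated as an arbitrary copyable object.

```python
import copy


def train_with_model_selection(model_state, train_one_epoch, evaluate, num_epochs=30):
    """Run training and keep the state with the highest validation accuracy.

    `train_one_epoch(state)` and `evaluate(state)` are hypothetical callables:
    the first returns the updated state, the second a validation accuracy.
    """
    best_accuracy, best_state, best_epoch = float("-inf"), None, None
    for epoch in range(1, num_epochs + 1):
        model_state = train_one_epoch(model_state)
        accuracy = evaluate(model_state)
        if accuracy > best_accuracy:
            best_accuracy, best_epoch = accuracy, epoch
            best_state = copy.deepcopy(model_state)  # snapshot, not a reference
    return best_state, best_accuracy, best_epoch
```

The test accuracy would then be measured once, on the returned `best_state`, so that the test set plays no part in choosing the epoch.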
Appendix B Further Performance Comparison
The performance of our method, which is shown in Section 4.2, is further compared with two existing CNN-based methods [14, 16].
The method in [14] first divides the given time-series signal at each electrode into 10 different frequency bands. Then, the power spectral density (PSD) is extracted from each divided signal as the signal feature. The PSD values of all electrodes for each frequency band are arranged as a 32×32 2D image according to the physical locations of the electrodes. Thus, we obtain a 32×32×10 matrix, which is fed to a CNN.
In the method in [16], the phase locking value (PLV) features representing connectivity are extracted and used as the input to a CNN. The PLV values for all pairs of electrodes are arranged as a 32×32 2D matrix. Aggregating these matrices for all bands results in a 32×32×10 matrix as the input to a CNN.
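For reference, the PLV between two signals is commonly defined as the magnitude of the time-averaged phase-difference phasor, $|\frac{1}{T}\sum_t e^{i(\phi_a(t)-\phi_b(t))}|$. The sketch below computes it from precomputed instantaneous phases and assembles the pairwise matrix. It is not the implementation of [16]: the function names are illustrative, and the phase extraction step (for example, via a Hilbert transform of each band-limited signal) is assumed to have been done beforehand.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV = |mean over time of exp(i * (phase_a - phase_b))|, in [0, 1]."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and equally long")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


def plv_matrix(phases_per_electrode):
    """Symmetric electrodes x electrodes matrix of pairwise PLVs."""
    n = len(phases_per_electrode)
    matrix = [[1.0] * n for _ in range(n)]  # each signal is locked to itself
    for i in range(n):
        for j in range(i + 1, n):
            value = phase_locking_value(phases_per_electrode[i],
                                        phases_per_electrode[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy phases: a steadily advancing phase, a copy with a constant lag, and a flat phase.
ramp = [2 * math.pi * k / 100 for k in range(100)]
lagged = [p + 0.5 for p in ramp]
flat = [0.0] * 100
toy_matrix = plv_matrix([ramp, lagged, flat])
```

A constant phase lag gives a PLV close to 1, whereas a phase difference that sweeps uniformly around the circle gives a PLV close to 0. With 32 electrodes, `plv_matrix` returns the 32×32 matrix for one frequency band.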
CNNs having four convolutional layers (conv), two max-pooling layers (maxpool), and a fully-connected layer (fc) are used, i.e., conv(32)-conv(64)-maxpool-conv(128)-conv(256)-maxpool-fc, where the numbers in the parentheses indicate the numbers of convolutional filters. We follow the signal segmentation originally used in [16] for these methods, because training the CNNs with the dataset used in our experiment was not successful. Thus, the accuracy of these two methods is not directly comparable to that shown in Section 4.2, but it can serve as a rough comparison.
The obtained accuracies of the two methods are 31.03% and 72.09%, respectively. This shows that feeding connectivity features to a CNN is beneficial, yielding a result roughly comparable to that of ChebNet. However, it is still far worse than the result of the proposed method, which supports the effectiveness of the data-driven approach for connectivity extraction.
Appendix C Supplementary Graph Visualization
c.1 EEG electrodes in graph visualization
Figure C.1 shows the names and positions of the 32 EEG electrodes for the graph representation used in this paper.
[Figure C.1: Names and positions of the 32 EEG electrodes.]
c.2 Visualization of out-degrees




