Analyzing brain activity through brain imaging techniques such as electroencephalography (EEG) is important for understanding the mental state or thoughts of a person. It is essential in a variety of applications including brain-computer interfaces, emotion recognition, and mental disease diagnosis. The brain consists of multiple functional regions, and the activation patterns over the regions provide valuable information regarding the mental state. Therefore, studying the inter-regional relationships appearing in the patterns, which are called functional connectivity (simply noted as connectivity in this paper), has been shown to be effective for analysis of the brain signal horwitz2003elusive ; bullmore2011brain . Since the brain regions do not lie in a Euclidean space, graphs are the most natural and suitable data structure to represent the connectivity. Previous studies showed that the graph analysis approach can be successfully applied to understanding brain signals rubinov2010complex ; preti2017dynamic ; betzel2017multi .
However, how to measure the level of connectivity, how to define an appropriate graph structure, and how to define appropriate features for signals from different brain regions are all still open problems. Usually, they are determined manually based on a priori knowledge. For instance, correlation or causality metrics between the signals of different regions can be used as connectivity measures ding2006granger ; then, a graph can be constructed by connecting pairs of brain regions showing large connectivity values bullmore2011brain ; finally, the power or entropy of the signal can be used as a feature of each region (i.e., a vertex of the graph) saa2010eeg ; sabeti2009entropy . However, tackling these issues separately and manually is unlikely to be optimal. In fact, this challenge applies not only to brain signal data, but also to other data involving graph structures such as social networks and chemicals battaglia2018relational ; franceschi2019learning ; de2018molgan .
In this paper, we resolve these issues via direct learning from data. We propose a new deep learning model, which can extract both a graph structure and a feature vector at each vertex of the graph from raw time-series EEG data, and perform classification using the extracted graph and features. The whole model is trained in an end-to-end manner to maximize the classification performance, which simultaneously optimizes the graph extraction, feature extraction, and classification parts. The graph and features adapt to each data instance, which can help overcome the notorious non-task-related variability of EEG signals. In particular, the extracted graphs are weighted directed multi-layer graphs, which can convey richer information regarding the brain activation pattern than the undirected single-layer graphs typically used in previous work.
In addition, we propose three different graph sampling methods, which control the conditions of the extracted graphs such as weighted vs. unweighted, and multigraph vs. simple graph. Furthermore, we propose a way to evaluate the quality of the extracted graph structure in terms of consistency, in order to alleviate the limitation that no ground truth structure is available to measure the correctness of the obtained graph structure.
2 Related Work
Deep learning for brain signals
There are two major approaches to brain signal analysis and classification using traditional deep learning models. The signal-based approach exploits popular convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to which the time-series brain signal data (either directly or after conversion to image-like representations) are input, without considering inter-regional connectivity schirrmeister2017deep ; zhang2018cascade ; bashivan2015learning . In the connectivity-based approach, on the other hand, the connectivity information representing pairwise correlation or causality is obtained and represented as images, which are modeled by CNNs phang2019classification ; moon2018convolutional .
Graph neural networks (GNNs) wu2019comprehensive ; kipf2016semi ; defferrard2016convolutional ; gilmer2017neural are a type of neural networks for graph data, which can be used to directly model connectivity-based graph representations of brain signals as shown in jang2018eeg ; song2018eeg ; ktena2017distance . However, they have a limitation in that appropriate graph structures still need to be designed manually.
Graph structure inference
Neural relational inference (NRI) kipf2018neural is a variational autoencoder-based model that extracts structural information of dynamic relational systems, such as physical interaction systems, in order to predict future states of dynamic objects from observed past states. However, this method additionally requires a priori knowledge about the graph structure (i.e., graph density), whereas our method relies only on the data to extract an appropriate graph structure. In franceschi2019learning , a graph convolutional network-based model is proposed, which jointly learns the graph structure and features using bilevel programming for node classification problems. This method is developed only for transductive learning, while our model is for inductive learning. Furthermore, to the best of our knowledge, our work is the first attempt at data-driven graph structure inference for EEG data.
3 Proposed Method
3.1 Problem statement and notations
Our goal is to build a neural network model performing classification of the given EEG data by using the underlying connectivity structure that is also discovered by the model.
The given data is represented by , where is a set of time domain signals collected from sensors (i.e., EEG electrodes), whose length is , and is the corresponding class label.
The graph structure to be estimated is assumed to be a weighted directed multi-layer graph without self-loops:. represents the vertices, and and indicate the existence of edges and the edge weights between vertex pairs in the th graph layer, respectively. We assume .
is a hyperparameter to control the number of graph layers.
3.2 Proposed model
The overall structure of our model is illustrated in Figure 1. It can be divided into four parts: graph membership extraction, graph sampling, feature extraction, and classification.
In the graph membership extraction part, the given set of raw signals is fed into the network to infer the membership of each edge to each graph layer. The inferred memberships are used for graph sampling to construct a multi-layer graph structure. In the feature extraction part, the raw signals are converted to features using convolutional and max-pooling operations. Finally, the obtained graph structure and features are combined in the GNN to obtain the final classification result.
3.2.1 Graph membership extraction
From the given time-series data, we compute the latent membership representing the certainty of the existence of a directed edge from vertex to vertex for each graph layer. Node-to-edge and edge-to-node operations kipf2018neural and fully-connected networks are used:
where denotes the concatenation operation and to are fully-connected networks having exponential linear units clevert2015fast and batch normalization ioffe2015batch .
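The node-to-edge and edge-to-node operations can be illustrated with a minimal sketch. It assumes that node-to-edge concatenates the features of the source and target vertices for every ordered pair without self-loops, and that edge-to-node sums the features of the edges arriving at each vertex, in the style of kipf2018neural. The learned fully-connected networks are omitted, and the function names are hypothetical.

```python
def node_to_edge(node_feats):
    """Concatenate source and target features for every ordered pair (i, j), i != j."""
    n = len(node_feats)
    return {
        (i, j): node_feats[i] + node_feats[j]  # list concatenation
        for i in range(n)
        for j in range(n)
        if i != j  # no self-loops
    }


def edge_to_node(edge_feats, num_nodes):
    """Sum the features of all edges arriving at each target vertex j."""
    dim = len(next(iter(edge_feats.values())))
    out = [[0.0] * dim for _ in range(num_nodes)]
    for (_, j), feat in edge_feats.items():
        for d, value in enumerate(feat):
            out[j][d] += value
    return out


# Toy input: three vertices with one-dimensional features.
nodes = [[1.0], [2.0], [3.0]]
edges = node_to_edge(nodes)
aggregated = edge_to_node(edges, len(nodes))
```

In the full model, each of these steps would be followed by a learned transformation; the sketch only shows how information moves between vertices and ordered vertex pairs.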
3.2.2 Graph sampling
From the graph layer membership information, the graph structure is obtained via probabilistic or deterministic sampling. We consider the following three methods for sampling.
Stochastic sampling (STO)
The stochastic sampling method probabilistically assigns the potential edge from vertex to vertex to one of the graph layers. Since the sampled graph weight is discrete, the Gumbel-softmax reparametrization technique jang2016categorical ; maddison2016concrete is used to provide continuous relaxation and enable computation of gradients:
where is a random vector whose components are i.i.d. and follow the standard Gumbel distribution. is the softmax temperature controlling sampling smoothness, which is set as in this paper. Then, the unweighted edge from to is obtained by
Deterministic thresholding (DET)
The estimated graph has multiple layers, and we expect that different graph layers model different types of connectivity information. While the stochastic sampling method restricts each edge to belong to only one graph layer, the deterministic thresholding method relaxes this restriction and allows a pair of vertices to have edges in multiple layers via thresholding:
where is a threshold, which is set as in our work. The same continuous relaxation technique used in the stochastic sampling method is used to make discrete variables differentiable during training.
Continuous sampling (CON)
While the previous two methods construct unweighted graphs, the continuous sampling method allows edge weights to be continuous values so that different degrees of connectivity in different graph layers are maintained. For this, having continuous values between 0 and 1, which is obtained from the Gumbel-softmax operation in (3), is directly used as the edge weight from to in the th graph layer. Therefore, this method produces the most general form of graph structures among the three methods, i.e., weighted directed multi-layer graphs.
A graph constructed by the stochastic or continuous sampling method forces every ordered pair of vertices to have an edge in at least one graph layer. However, some pairs of vertices may have no direct relationship. To allow this, one of the graph layers can be designated as a skip layer. The skip layer is discarded when the graph is passed to the GNN, so the edges belonging to this layer are omitted from the graph used for classification.
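The three sampling methods and the skip layer can be sketched for a single ordered pair of vertices as follows. This is an illustration under stated assumptions rather than the exact formulation: the membership values, the temperature, and the threshold below are hypothetical; the deterministic variant is assumed to threshold the softmax of the memberships without noise; and the straight-through trick needed for gradient computation is not shown.

```python
import math
import random


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    u = max(rng.random(), 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def gumbel_softmax(memberships, tau, rng):
    """Relaxed one-hot sample over graph layers for one ordered vertex pair."""
    return softmax([(z + sample_gumbel(rng)) / tau for z in memberships])


def sample_sto(memberships, tau, rng):
    """Stochastic sampling: the edge is assigned to exactly one layer."""
    y = gumbel_softmax(memberships, tau, rng)
    winner = max(range(len(y)), key=y.__getitem__)
    return [1.0 if k == winner else 0.0 for k in range(len(y))]


def sample_det(memberships, threshold):
    """Deterministic thresholding (assumed form): the edge may appear in several layers."""
    return [1.0 if p > threshold else 0.0 for p in softmax(memberships)]


def sample_con(memberships, tau, rng):
    """Continuous sampling: relaxed values are used directly as edge weights."""
    return gumbel_softmax(memberships, tau, rng)


def drop_skip_layer(layer_values):
    """Discard the first layer, assumed here to be the skip layer."""
    return layer_values[1:]


rng = random.Random(0)
memberships = [2.0, 1.5, -1.0]  # hypothetical memberships for three layers
sto = sample_sto(memberships, tau=0.5, rng=rng)
det = sample_det(memberships, threshold=0.3)
con = sample_con(memberships, tau=0.5, rng=rng)
```

With these hypothetical memberships, the deterministic variant places the edge in the first two layers at once, which the stochastic variant cannot do.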
3.2.3 Feature extraction
Apart from the process of graph extraction, signal features are extracted from the original signals by 1-D convolutional and max-pooling operations. In order to capture the dynamic information of the signals, we adopt a 1-D version of dilated inception modules shi2017single ; yang2019dilated including dilated convolutional layers Yu2015multi with various dilation rates. The dilated convolutional layers with low dilation rates capture features appearing among neighboring samples, which correspond to fast-changing high-frequency information, while those with high dilation rates consider more slowly-varying features over larger temporal windows. This can be seen as analogous to the popular EEG signal analysis approach where the signal is divided into multiple frequency bands for separate analysis harmony1996eeg ; barry2009eeg . As a result, we obtain the features , where is the reduced signal length and is the feature dimension.
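The effect of the dilation rate on the temporal window can be seen in a small sketch of a 1-D dilated convolution. The kernel size of 3 and the dilation rates of 1, 2, 4, and 8 follow Appendix A.2; the toy signal and the fixed second-difference kernel are illustrative assumptions standing in for learned filters.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output sample."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation whose taps are `dilation` samples apart."""
    span = receptive_field(len(kernel), dilation)
    return [
        sum(w * signal[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(signal) - span + 1)
    ]


# Toy signal with a period of four samples.
signal = [float(t % 4) for t in range(32)]
kernel = [1.0, -2.0, 1.0]  # illustrative second-difference filter, not a learned one

# One branch per dilation rate, as in a dilated inception module.
branches = {d: dilated_conv1d(signal, kernel, d) for d in (1, 2, 4, 8)}
fields = {d: receptive_field(len(kernel), d) for d in (1, 2, 4, 8)}
```

The branch with dilation 4 samples the toy signal exactly one period apart, so its second difference vanishes, whereas the branch with dilation 1 still responds to the sample-to-sample changes.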
3.2.4 Graph neural network
The GNN performs classification using the signal features and the constructed graph to obtain the predicted class label . First, the node-to-edge operation is applied to the features, and the result is combined with the graph structure via the message passing operation; then, the aggregation and edge-to-node operations are performed:
where is the time index, and if a skip layer is used (assuming that the first graph layer is assigned as the skip layer) and otherwise.
is modeled by a fully-connected network having rectified linear units, whose input and output are- and -dimensional, respectively, i.e., . Finally, this result is concatenated with the signal features via a skip connection, which is fed into a fully-connected network after vectorization, i.e.,
We use the DEAP dataset koelstra2011deap , which is one of the largest databases regarding human affective mental states. It contains 32-channel EEG recordings collected from 32 subjects while they watched 40 affective video stimuli, together with the corresponding emotional ratings (valence and arousal) given by the subjects. We consider a video identification task where the model classifies the given set of EEG signals into one of the 40 video stimuli. Details of the data processing can be found in Appendix A.1.
The proposed model is implemented in PyTorch paszke2017automatic , and is trained using the Adam optimizer kingma2014adam . We repeat the experiment five times with different random seeds, and the average performance is reported. Details of the model structure and training parameters are given in Appendix A.2.
Table 1 summarizes the classification accuracy of the proposed model (using the deterministic thresholding method for graph sampling and three graph layers without skip layer) and existing methods. We consider traditional classifiers including k-nearest neighbor (k-NN) and random forest, and the ChebNet-based method jang2018eeg where the graph structure is determined by physical distances between the electrodes and the signal entropy is used as features. In addition, we also test our model without graph structure extraction (noted as “GNN only” in the table), i.e., only the lower part in Figure 1, where an unweighted complete directed graph is used in the GNN part. It is clear that the proposed method yields significantly better performance than the other methods. Our method outperforms the ChebNet-based method, indicating that data-driven extraction of graphs and features is effective. When the graph extraction is omitted in our method (“GNN only”), the performance deteriorates significantly, which also indicates that the graph structure is crucial for modeling the EEG data. Further comparison with some other deep learning methods is shown in Appendix B, which also supports the effectiveness of our method.
|k-NN||Random forest||ChebNet||GNN only||Proposed|
In Table 2, we show the accuracy of our model for various combinations of the graph sampling method, the number of graph layers (), and the presence of the skip layer. The results show that the number of graph layers is the most important parameter. Single-layer graphs, considered in most existing studies, are not sufficient, and modeling different types of inter-region interaction with separate graph layers is highly beneficial. Among the three graph sampling methods, the deterministic thresholding method shows the best performance except for the case of , and the continuous sampling method also shows good performance when the number of graph layers is large. It seems that the restriction of choosing only one edge among the graph layers in the stochastic sampling method limits the performance. The presence of the skip layer has only a minor effect on the performance, especially when the number of graph layers is large.
[Figure 2: t-SNE visualizations for two subjects. (a) Subject 1, raw EEG signals; (b) subject 1, extracted graphs; (c) subject 2, raw EEG signals; (d) subject 2, extracted graphs.]
The extracted graph structures are analyzed using the t-SNE technique maaten2008visualizing . Figure 2 compares the t-SNE visualization of the original EEG signals and the adjacency matrices of all graph layers in the extracted graphs for two subjects, where different colors indicate different classes. In Figures 2(a) and 2(c), the raw signals of different classes are rather mixed, so it is not easy to distinguish them. On the other hand, the graphs of the same class are closely grouped in Figures 2(b) and 2(d), which greatly contributes to classification.
4.3 Graph consistency analysis
Since no ground truth for the graph structure is available, it is not possible to evaluate the correctness of the extracted graphs. As a way to evaluate the quality of the extracted graphs, we consider their consistency over repeated experiments. That is, the dissimilarity between the graph structures obtained from different repetitions is measured, which is applied to all pairwise combinations of repetitions. A low level of dissimilarity, i.e., a high level of consistency, among the extracted graphs indicates that they are reliable and meaningful.
The overall procedure to compute the dissimilarity of two graph structures is summarized in Algorithm 1. Basically, a distance function is applied to the adjacency matrices of the two graphs, for which we adopt the sum of absolute differences. Note that we do not have the issue of isomorphism because the vertices are clearly identified as distinguished EEG electrodes, which allows computation of differences. The distances for all pairs of repetitions are averaged, divided by the total number of possible edges, and subtracted from 1 to get the final consistency score.
One issue in measuring the dissimilarity between two graphs is the permutation ambiguity of graph layers. We do not impose any restriction on the order of the graph layers during training, thus permutation of the order of the graph layers needs to be allowed in the dissimilarity computation. We simply perform an exhaustive search to find the best matching permutation for the two graphs.
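The consistency computation described above can be sketched as follows. This is a minimal illustration rather than a reproduction of Algorithm 1: each graph is assumed to be a list of per-layer adjacency matrices, the number of possible edges is assumed to be counted over all layers and all ordered vertex pairs without self-loops, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def layer_distance(a, b):
    """Sum of absolute differences between two adjacency matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def graph_dissimilarity(g1, g2):
    """Smallest total distance over all orderings of the layers of g2."""
    return min(
        sum(layer_distance(layer, g2[p]) for layer, p in zip(g1, perm))
        for perm in permutations(range(len(g2)))
    )


def consistency(graphs):
    """1 - (mean pairwise dissimilarity / number of possible edges)."""
    num_layers = len(graphs[0])
    n = len(graphs[0][0])
    possible_edges = num_layers * n * (n - 1)  # assumption: counted over all layers
    pairs = list(combinations(graphs, 2))
    mean_distance = sum(graph_dissimilarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_distance / possible_edges


# Toy example: three repetitions of a two-layer graph on three vertices.
x = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
x_extra = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # x with one additional edge

run_a = [x, y]
run_b = [y, x]        # same graph with the layer order swapped
run_c = [x_extra, y]  # differs from run_a by a single edge

score = consistency([run_a, run_b, run_c])
```

The exhaustive search visits every ordering of the layers, which is inexpensive for the small numbers of graph layers considered here. In the toy example, the swapped layer order of the second repetition does not lower the score, and only the extra edge in the third repetition does.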
Table 3 shows the consistency scores as percentages. Except for the case with , high consistency levels (about 75 to 90%) are observed across different conditions. This shows that the graph extraction process works reliably, and we can expect that the extracted graphs contain meaningful representations of the data.
4.4 Graph structure analysis
For the best case of our method (i.e., three graph layers, no skip layer, and deterministic thresholding), we examine the obtained graph structures in order to understand the learned representations from the perspective of emotional cognitive responses. Since the graph structure is different for each data instance, we obtain a representative graph structure by aggregating multiple graph structures. Here, we consider a graph containing the most frequently appearing (top-10%) edges as the representative structure. Figure 3(a) shows the “default graph”, which contains the edges activated most frequently regardless of the task (i.e., across different video stimuli) and is obtained as the representative graph for all test data. Figures 3(b) and 3(c) correspond to the representative graph structures for the video stimuli corresponding to the most positive and most negative valence ratings, respectively. Note that the size of a vertex indicates its in-degree; the out-degree is not explicitly shown because it is similar across all vertices (see Appendix C.2).
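The aggregation into a representative graph can be sketched as below. The sketch assumes that each extracted graph is given as a set of (layer, source, target) tuples and that the top fraction is taken over the distinct edges observed in the aggregated graphs; the text does not specify the reference set for the top-10% selection, so this choice, like the function name, is an assumption.

```python
import math
from collections import Counter


def representative_edges(graphs, top_fraction=0.1):
    """Return the most frequently appearing edges across `graphs`.

    Each graph is a set of (layer, source, target) tuples.
    """
    counts = Counter(edge for graph in graphs for edge in graph)
    keep = max(1, math.ceil(top_fraction * len(counts)))
    return {edge for edge, _ in counts.most_common(keep)}


# Toy example with three extracted graphs on three vertices and two layers.
g1 = {(0, 0, 1), (0, 1, 2), (1, 2, 0)}
g2 = {(0, 0, 1), (1, 2, 0), (1, 0, 2)}
g3 = {(0, 0, 1), (0, 2, 1)}

top_one = representative_edges([g1, g2, g3], top_fraction=0.2)
top_two = representative_edges([g1, g2, g3], top_fraction=0.4)
```

Passing all test instances gives a task-independent graph in the sense of the default graph, while passing only the instances of one video stimulus gives a stimulus-specific representative graph.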
In the first layer (red) of Figure 3(a), strong activations toward the left temporal lobe are observed. The temporal lobe is related to the processing of complex visual stimuli such as scenes alarcao17emotions ; the amygdala, which plays an important role in emotional processing, is also located in the medial temporal lobe pessoa10emotion . Therefore, we can say that this first layer represents the mental state under exposure to emotional visual stimuli. The functional connectivity related to visual content and emotional processing in the first layer is also observed in Figures 3(b) and 3(c). The first layer of Figure 3(c) contains a large number of edges entering the frontal and occipital lobes, which are related to emotional processing schmidt01frontal ; mohammadi17wavelet and sensory processing of visual stimuli kamps16occipital ; grill-spector04human , respectively. In the case of the first layer of Figure 3(b), the connectivity is largely related to the content of the video stimulus; the video contains rhythmical dances, which probably contribute to the incoming connections to the fronto-central area that is known to be involved in motor-related perception hauk04neuro .
The second layers (green) of Figures 3(b) and 3(c) show patterns that are clearly distinct from each other. The right part of the frontal region receives a larger number of incoming connections than the left part in Figure 3(b), while the opposite holds in Figure 3(c). In the literature, it is consistently reported that the asymmetry of the left and right frontal lobes is significantly associated with the valence of emotion harmon-jones10role ; reznik18frontal . That is, one side of the frontal brain is more activated than the other side during valence-related emotional processing, and the more activated side changes depending on the polarity of the emotion, which accords with the observed patterns in the second layers of Figures 3(b) and 3(c). In the second layer of Figure 3(a), the incoming edges are rather spread over the whole brain, which can be considered as a result of aggregating the patterns appearing in the positive and negative valence video stimuli. Therefore, we can deduce that the second layers have learned the valence-related characteristics of the brain signal.
It seems that the third layers (blue) in Figure 3, showing relatively sparse connections, mainly supplement the other layers. In all cases, the frontal region of the brain receives a large number of connections. The fronto-central lobe also receives a number of incoming edges in Figure 3(b), which is probably for the same reason as in the case of the first layer (i.e., motor-related).
We have proposed an end-to-end deep network that can extract an appropriate directed multi-layer graph structure from the given raw EEG signals for classification without any a priori information about the desirable structure. The experimental results showed that this data-driven approach for learning the graph structure significantly improves the classification performance in comparison to the other deep neural network and GNN approaches based on manually defined features and graphs. It was also shown that the extracted graph structures are reliable, and consistent with the known brain activation patterns of cognitive responses to emotional visual stimuli.
-  Barry Horwitz. The elusive concept of brain connectivity. NeuroImage, 19(2):466–470, 2003.
-  Edward T. Bullmore and Danielle S. Bassett. Brain graphs: graphical models of the human brain connectome. Annual Review of Clinical Psychology, 7:113–140, 2011.
-  Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059–1069, 2010.
-  Maria Giulia Preti, Thomas A.W. Bolton, and Dimitri Van De Ville. The dynamic functional connectome: state-of-the-art and perspectives. NeuroImage, 160:41–54, 2017.
-  Richard F. Betzel and Danielle S. Bassett. Multi-scale brain networks. NeuroImage, 160:73–83, 2017.
-  Mingzhou Ding, Yonghong Chen, and Steven L. Bressler. Granger causality: basic theory and application to neuroscience. In Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, chapter 17, pages 437–460. Wiley Online Library, 2006.
-  Jaime F. Delgado Saa and Miguel Sotaquirá Gutierrez. EEG signal classification using power spectral features and linear discriminant analysis: a brain computer interface application. In Proceedings of the 8th Latin American and Caribbean Conference for Engineering and Technology, pages 1–7, 2010.
-  Malihe Sabeti, Serajeddin Katebi, and Reza Boostani. Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artificial Intelligence in Medicine, 47(3):263–274, 2009.
-  Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Dann Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, pages 1–40, 2018.
-  Luca Franceschi, Mathias Niepert, Massimiliano Pontil, and Xiao He. Learning discrete structures for graph neural networks. In Proceedings of the 36th International Conference on Machine Learning, pages 1–13, 2019.
-  Nicola De Cao and Thomas Kipf. MolGAN: an implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, pages 1–11, 2018.
-  Robin Tibor Schirrmeister, Jost Tobias Springenberg, Lukas Dominique Josef Fiederer, Martin Glasstetter, Katharina Eggensperger, Michael Tangermann, Frank Hutter, Wolfram Burgard, and Tonio Ball. Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11):5391–5420, 2017.
-  Dalin Zhang, Lina Yao, Xiang Zhang, Sen Wang, Weitong Chen, Robert Boots, and Boualem Benatallah. Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pages 1703–1710, 2018.
-  Pouya Bashivan, Irina Rish, Mohammed Yeasin, and Noel Codella. Learning representations from EEG with deep recurrent-convolutional neural networks. In Proceedings of the 4th International Conference on Learning Representations, pages 1–15, 2016.
-  Chun-Ren Phang, Chee-Ming Ting, Fuad Noman, and Hernando Ombao. Classification of EEG-based brain connectivity networks in schizophrenia using a multi-domain connectome convolutional neural network. arXiv preprint arXiv:1903.08858, pages 1–15, 2019.
-  Seong-Eun Moon, Soobeom Jang, and Jong-Seok Lee. Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2556–2560, 2018.
-  Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, pages 1–22, 2019.
-  Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, pages 1–14, 2017.
-  Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 3844–3852, 2016.
-  Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272, 2017.
-  Soobeom Jang, Seong-Eun Moon, and Jong-Seok Lee. EEG-based video identification using graph signal modeling and graph convolutional neural network. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3066–3070, 2018.
-  Tengfei Song, Wenming Zheng, Peng Song, and Zhen Cui. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing (Early Access), pages 1–10, 2018.
-  Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, and Daniel Rueckert. Distance metric learning using graph convolutional networks: application to functional brain networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 469–477, 2017.
-  Lishan Qiao, Limei Zhang, Songcan Chen, and Dinggang Shen. Data-driven graph construction and graph learning: a review. Neurocomputing, 312:336–351, 2018.
-  Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, and Richard Zemel. Neural relational inference for interacting systems. In Proceedings of the 35th International Conference on Machine Learning, pages 2688–2697, 2018.
-  Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). In Proceedings of the 3rd International Conference on Learning Representations, pages 1–14, 2015.
-  Sergey Ioffe and Christian Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, pages 448–456, 2015.
-  Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-softmax. In Proceedings of the 5th International Conference on Learning Representations, pages 1–12, 2017.
-  Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: a continuous relaxation of discrete random variables. In Proceedings of the 5th International Conference on Learning Representations, pages 1–20, 2017.
-  Wuzhen Shi, Feng Jiang, and Debin Zhao. Single image super-resolution with dilated convolution based multi-scale information learning inception module. In Proceedings of the IEEE International Conference on Image Processing, pages 977–981, 2017.
-  Sheng Yang, Guosheng Lin, Qiuping Jiang, and Weisi Lin. A dilated inception network for visual saliency prediction. arXiv preprint arXiv:1904.03571, 2019.
-  Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. In Proceedings of the 4th International Conference on Learning Representations, pages 1–13, 2016.
-  Thalía Harmony, Thalía Fernández, Juan Silva, Jorge Bernal, Lourdes Díaz-Comas, Alfonso Reyes, Erzsébet Marosi, Mario Rodríguez, and Miguel Rodríguez. EEG delta activity: an indicator of attention to internal processing during performance of mental tasks. International Journal of Psychophysiology, 24(1-2):161–171, 1996.
-  Robert J. Barry, Adam R Clarke, Stuart J. Johnstone, and Christopher R. Brown. EEG differences in children between eyes-closed and eyes-open resting conditions. Clinical Neurophysiology, 120(10):1806–1811, 2009.
-  Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. DEAP: a database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1):18–31, 2011.
-  Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In Proceedings of the NIPS 2017 Autodiff Workshop: The Future of Gradient-based Machine Learning Software and Techniques, pages 1–4, 2017.
-  Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, pages 1–15, 2015.
-  Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
-  Soraia M. Alarcao and Manuel J. Fonseca. Emotions recognition using EEG signals: a survey. IEEE Transactions on Affective Computing (Early Access), pages 1–20, 2017.
-  Luiz Pessoa. Emotion and cognition and the amygdala: from “what is it?” to “what’s to be done?”. Neuropsychologia, 48(12):3416–3429, 2010.
-  Louis A. Schmidt and Laurel J. Trainor. Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition & Emotion, 15(4):487–500, 2001.
-  Zeynab Mohammadi, Javad Frounchi, and Mahmood Amiri. Wavelet-based emotion recognition system using EEG signal. Neural Computing and Applications, 28(8):1985–1990, 2017.
-  Frederik S. Kamps, Joshua B. Julian, Jonas Kubilius, Nancy Kanwisher, and Daniel D. Dilks. The occipital place area represents the local elements of scenes. NeuroImage, 132:417–424, 2016.
-  K. Grill-Spector and R. Malach. The human visual cortex. Annual Review of Neuroscience, 27:649–677, 2004.
-  Olaf Hauk and Friedemann Pulvermüller. Neurophysiological distinction of action words in the fronto-central cortex. Human Brain Mapping, 21(3):191–201, 2004.
-  Eddie Harmon-Jones, Philip A. Gable, and Carly K. Peterson. The role of asymmetric frontal cortical activity in emotion-related phenomena: a review and update. Biological Psychology, 84(3):451–462, 2010.
-  Samantha J. Reznik and John J. B. Allen. Frontal asymmetry as a mediator and moderator of emotion: an updated review. Psychophysiology, 55(1):1–32, 2018.
Appendix A Implementation Details
A.1 Data processing
Each one-minute-long EEG signal in the DEAP dataset is divided into three-second-long segments () with a two-second-long overlap. This results in a total of 74,240 (32 subjects × 40 videos × 58 segments) sets of 32-channel EEG signals. They are randomly split into the training, validation, and test datasets, which hold 80%, 10%, and 10% of the entire dataset, respectively.
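The segment count follows from a sliding window with a one-second stride (a three-second window minus a two-second overlap). The short sketch below reproduces the arithmetic; the helper name is hypothetical, and the split sizes assume that the stated percentages are applied exactly.

```python
def count_segments(duration_s, window_s, stride_s):
    """Number of full windows that fit in a signal of the given duration."""
    return (duration_s - window_s) // stride_s + 1


window_s = 3
overlap_s = 2
segments_per_signal = count_segments(60, window_s, window_s - overlap_s)  # 58

num_subjects, num_videos = 32, 40
total = num_subjects * num_videos * segments_per_signal  # 74240

# 80% / 10% / 10% split, computed with integer arithmetic.
split = {
    "train": total * 8 // 10,
    "validation": total // 10,
    "test": total // 10,
}
```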
A.2 Model details
Graph membership extraction
, , and
are two-layer fully-connected networks, which have 256 hidden neurons and 256 output neurons. Each fully-connected layer has exponential linear units, and batch normalization  is used in the output layer. consists of three fully-connected layers, where the first two layers have 256 hidden neurons with exponential linear units and the last layer has output neurons.
Each dilated inception module consists of four 1-D convolutional layers having a kernel size of 3, which have dilation rates of 1, 2, 4, and 8, respectively. Each layer has eight output channels, so . The max-pooling size is set as . Three dilated inception modules are connected in a row so that the signal features have a reduced length of .
Graph neural network
, are two-layer fully-connected networks with 256 hidden neurons and 256 output neurons with rectified linear units. is a two-layer fully-connected network having 256 hidden neurons with rectified linear units and 40 softmax output neurons.
Our model is trained with the Adam optimizer kingma2014adam with a learning rate of 0.0001 for 30 epochs to minimize the cross-entropy loss. The batch size is 32. The training procedure takes about 10 to 12 hours using a single NVIDIA K80 GPU. The test accuracy is measured using the network showing the best validation accuracy during the training process.
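For a rough sense of the training length, the number of optimizer updates implied by these settings can be estimated as follows. The estimate assumes that the training set holds exactly 80% of the 74,240 instances described in Appendix A.1 and that every batch is full; the dictionary keys are hypothetical names, not options of any particular library.

```python
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "epochs": 30,
    "batch_size": 32,
    "loss": "cross-entropy",
}

total_instances = 74240
train_instances = total_instances * 8 // 10  # assumed exact 80% split

updates_per_epoch = train_instances // config["batch_size"]
total_updates = updates_per_epoch * config["epochs"]
```

Under these assumptions the training set divides evenly into batches, giving 1,856 updates per epoch and 55,680 updates in total.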
Appendix B Further Performance Comparison
The method in  first divides the given time-series signal at each electrode into 10 different frequency bands. Then, the power spectral density (PSD) is extracted from each divided signal as the signal feature. The PSD values of all electrodes for each frequency band are arranged as a 32×32 2-D image according to the physical locations of the electrodes. Thus, we obtain a 32×32×10 matrix, which is input to a CNN.
In the method in , the phase locking value (PLV) features representing connectivity are extracted and used as the input to a CNN. The PLV values for all pairs of electrodes are arranged as a 32×32 2-D matrix. Aggregating these matrices for all bands results in a 32×32×10 matrix as the input to a CNN.
CNNs having four convolutional layers (conv), two max-pooling layers (maxpool), and a fully-connected layer (fc) are used, i.e., conv(32)-conv(64)-maxpool-conv(128)-conv(256)-maxpool-fc, where the numbers in the parentheses indicate the numbers of convolutional filters. We follow the way of signal segmentation originally used in  for these methods, because training the CNNs with the dataset used in our experiment was not successful. Thus, the accuracy for these two methods may not be directly compared to that shown in Section 4.2, but can be considered for a rough comparison.
The obtained accuracy of the two methods is 31.03% and 72.09%, respectively. This shows that modeling connectivity features with CNN is beneficial, whose result is roughly comparable to that of ChebNet. However, it is still far worse than the result of the proposed method, proving the effectiveness of the data-driven approach for connectivity extraction.
Appendix C Supplementary Graph Visualization
C.1 EEG electrodes in graph visualization
Figure C.1 shows the names and positions of the 32 EEG electrodes for the graph representation used in this paper.
[Figure C.1: Names and positions of the 32 EEG electrodes.]
C.2 Visualization of out-degrees