1 Introduction
Emotion recognition is an important subarea of affective computing, which focuses on recognizing human emotions from a variety of modalities, such as audiovisual expressions, body language, and physiological signals. Compared to other modalities, physiological signals, such as electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), and galvanic skin response (GSR), have the advantage of being difficult to hide or disguise. In recent years, due to the rapid development of non-invasive, easy-to-use, and inexpensive EEG recording devices, EEG-based emotion recognition has received increasing attention in both research [4] and applications [2].
Emotion models can be broadly categorized into discrete models and dimensional models. The former categorizes emotions into discrete entities, e.g., anger, disgust, fear, happiness, sadness, and surprise in Ekman's theory [18]. The latter describes emotions using their underlying dimensions, e.g., valence, arousal, and dominance [43], which measure emotions from unpleasant to pleasant, passive to active, and submissive to dominant, respectively.
EEG signals measure voltage fluctuations of the cerebral cortex and have been shown to reveal important information about human emotional states [52]. For example, greater relative left frontal EEG activity has been observed when experiencing positive emotions [52]. The voltage fluctuations over different brain regions are measured by electrodes attached to the scalp. Each electrode collects EEG signals in one channel. The collected EEG signals are often analyzed in specific frequency bands for each channel, namely delta (1–4 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (>30 Hz).
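For readers implementing band-wise analysis, the conventional band edges above can be expressed directly in code. The sketch below is illustrative only: the finite upper edge of 50 Hz for gamma, the function name `band_power`, and the simple FFT-periodogram estimator are our choices, not the paper's.

```python
import numpy as np

# Conventional EEG frequency bands in Hz (upper gamma edge of 50 Hz is our
# assumption; the text only specifies gamma as >30 Hz).
BANDS = {
    "delta": (1, 4),
    "theta": (4, 7),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 50),
}

def band_power(signal, fs, band):
    """Mean power of `signal` within the (lo, hi] Hz band via a periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    lo, hi = band
    mask = (freqs > lo) & (freqs <= hi)
    return spectrum[mask].mean()
```

For example, a pure 10 Hz sinusoid should concentrate its power in the alpha band rather than the beta band.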
Many existing EEG-based emotion recognition methods follow the supervised machine learning approach: features are extracted from the preprocessed EEG signals in each channel over a time window, and then a classifier is trained on the extracted features to recognize emotions. Wang et al. [63] compared power spectral density (PSD) features, wavelet features, and nonlinear dynamical features with a Support Vector Machine (SVM) classifier. Zheng and Lu [72] investigated critical frequency bands and channels using PSD, differential entropy (DE) [53], and PSD asymmetry features, and obtained robust accuracy using deep belief networks (DBN). However, most existing EEG-based emotion recognition approaches do not address the following three challenges: 1) the topological structure of EEG signals is not effectively exploited to learn more discriminative EEG representations; 2) EEG signals vary significantly across subjects, which hinders the generalizability of trained classifiers; and 3) participants may not always generate the intended emotions when watching emotion-eliciting stimuli. Consequently, the emotion labels in the collected EEG data are noisy and may be inconsistent with the actually elicited emotions.
There have been several attempts to address the first challenge. Zhang et al. [68] and Zhang et al. [69] incorporated spatial relations in EEG signals using convolutional neural networks (CNN) and recurrent neural networks (RNN), respectively. However, their approaches require a 2D representation of the EEG channels on the scalp, which may cause information loss during flattening because the channels are actually arranged in 3D space. In addition, using CNNs and RNNs to capture inter-channel relations makes it difficult to learn long-range dependencies [45]. Graph neural networks (GNN) have been applied in [56] to capture inter-channel relations using an adjacency matrix. However, similar to CNNs and RNNs, their approach only considers relations between the nearest channels and may thus lose valuable information between distant channels, such as the PSD asymmetry between channels on the left and right hemispheres in the frontal region, which has been shown to be informative for valence prediction [52]. A recent work applies RNNs to learn EEG representations in the two hemispheres separately and then adopts the asymmetric differences between them to recognize emotions [37]. However, this approach uses only the bi-hemispherical discrepancies and ignores other useful features, such as the neuronal activities recorded from each channel.

In recent years, several studies [73, 11] investigated the transferability of EEG-based emotion recognition models across subjects. Lan et al. [32] compared several domain adaptation techniques, such as maximum independence domain adaptation (MIDA), transfer component analysis (TCA), and subspace alignment (SA), and found that subject-independent classification accuracy can be improved by around 10%. Li et al. [38] applied domain adversarial learning to lower the influence of individual subjects on the EEG data and also obtained improved performance. However, these approaches do not exploit any graph structure and only lead to small performance improvements (see Section 7.1).
To the best of our knowledge, no attempt has been made to address the problem of noisy labels in EEG-based emotion recognition.
In this paper, we propose a regularized graph neural network (RGNN) aiming to address all three aforementioned challenges. Graph analysis of the human brain has been studied extensively in the neuroscience literature [19, 21]. However, constructing an accurate connectome is still an open question and subject to different scales [21]. Inspired by [9, 56], we consider each channel in the EEG signals as a node in our graph. Our RGNN model extends the simple graph convolution network (SGC) [64] and leverages the topological structure of EEG signals: following the economy of brain network organization [9], we propose a biologically supported sparse adjacency matrix to capture both local and global inter-channel relations. Local inter-channel relations connect nearby groups of neurons and may reveal anatomical connectivity at the macroscale [15, 21]. Global inter-channel relations connect distant groups of neurons between the left and right hemispheres and may reveal emotion-related functional connectivity [52, 38].
In addition, we propose a node-wise domain adversarial training (NodeDAT) method to regularize our graph model for better generalization in subject-independent classification scenarios. Different from the domain adversarial training adopted in [22, 38], our NodeDAT provides finer-grained regularization by minimizing the domain discrepancy between the source and target features of each channel/node. Moreover, we propose an emotion-aware distribution learning (EmotionDL) method to address the problem of noisy labels in the datasets. Prior studies have shown that noisy labels can adversely impact classification accuracy [76]. Instead of learning a single-label classification, our EmotionDL learns a distribution over labels for the training data and thus acts as a regularizer that improves the robustness of our model against noisy labels. Finally, we conduct extensive experiments to validate the effectiveness of our proposed model and to investigate emotion-related informative neuronal activities.
In summary, the main contributions of this paper are as follows:

We propose a regularized graph neural network (RGNN) model to recognize emotions based on EEG signals. Our model is biologically supported and captures both local and global inter-channel relations.

We propose two regularizers: node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL), which aim to improve the robustness of our model against cross-subject variations and noisy labels, respectively.

We conduct extensive experiments in both subject-dependent and subject-independent classification settings on two public EEG datasets, namely SEED [72] and SEED-IV [71]. Experimental results demonstrate the effectiveness of our proposed model and regularizers. In addition, our RGNN achieves superior performance over the state-of-the-art baselines in most experimental settings.

We investigate the neuronal activities, and the results reveal that the pre-frontal, parietal, and occipital regions may be the most informative regions for emotion recognition. In addition, global inter-channel relations between the left and right hemispheres are important, and local inter-channel relations between (FP1, AF3), (F6, F8), and (FP2, AF4) may also provide useful information.
2 Related Work
In this section, we review related work in the fields of EEG-based emotion recognition, graph neural networks, unsupervised domain adaptation, and learning with noisy labels.
2.1 EEG-Based Emotion Recognition
EEG feature extractors and classifiers are the two fundamental components in the machine learning approach to EEG-based emotion recognition. EEG features can be broadly divided into single-channel and multi-channel features [27]. The majority of existing features are single-channel features, such as statistical features [59, 61], fractal dimension (FD) [41], PSD [40], differential entropy (DE) [53], and wavelet features [3]. A few features are computed on multiple channels to capture inter-channel relations, e.g., the asymmetry features of PSD [72] and functional connectivity [65, 35], where common indices such as correlation, coherence, and phase synchronization are used to estimate the brain functional connectivity between channels. However, leveraging functional connectivity requires labor-intensive manual connectivity analysis for each subject and may not be ideal for real-time applications.
EEG classifiers can be broadly divided into topology-invariant and topology-aware classifiers. The majority of existing classifiers are topology-invariant, such as SVM, k-Nearest Neighbors (KNN), DBNs [74], and RNNs [28], which do not take the topological structure of EEG features into account when learning EEG representations. In contrast, topology-aware classifiers such as CNNs [5, 36, 68, 34] and GNNs [56] consider the inter-channel topological relations and learn an EEG representation for each channel by aggregating features from nearby channels using convolutional operations, either in the Euclidean space or in a non-Euclidean space. However, as discussed in Section 1, existing CNNs and GNNs have difficulty learning dependencies between distant channels, which may carry important emotion-related information. Recently, Zhang et al. [69] and Li et al. [37] proposed to use RNNs to learn spatial topological relations between channels by scanning the electrodes in both vertical and horizontal directions. However, their approaches do not fully exploit the topological structure of the EEG channels; for example, two topologically close channels may be far away from each other in the scanning sequence.

2.2 Graph Neural Networks
Graph neural networks (GNN) are a class of neural networks that deal with data in graph domains, e.g., molecular structures, social networks, and knowledge graphs [66]. One early work on GNNs [51] aimed to learn a converged static state embedding for each node in the graph using a transition function applied to its neighborhood. Later, inspired by the convolutional operation of CNNs in Euclidean domains, Bruna et al. [8] combined spectral graph theory [12] with neural networks and defined convolutional operations in graph domains using spectral filters computed from the normalized graph Laplacian. Following this line of research, Defferrard et al. [16] proposed fast localized convolutions by using a recursive formulation of the $K$-th order Chebyshev polynomials to approximate the filters. The resulting representation for each node is an aggregation of its $K$-th order neighborhood. Kipf and Welling [30] further limited the order to $K = 1$ and proposed the standard graph convolutional network (GCN) with a faster localized graph convolutional operation. The convolutional layers in GCN can be stacked $K$ times to effectively convolve the $K$-th order neighborhood of a node. Recently, Wu et al. [64] simplified GCN by removing the nonlinearities between the convolutional layers and proposed the simple graph convolution network (SGC), which effectively behaves like a linear feature transformation followed by a logistic regression. SGC performs orders of magnitude faster than GCN with comparable classification accuracy. In this paper, we extend SGC to model EEG signals and propose a biologically supported adjacency matrix and two regularizers for robust EEG-based emotion recognition.
2.3 Unsupervised Domain Adaptation
Unsupervised domain adaptation aims to mitigate the domain shift in knowledge transfer from a supervised source domain to an unsupervised target domain. The most common approaches are instance reweighting, domain-invariant feature learning, domain mapping, and normalization statistics. Instance reweighting methods [26] aim to infer the resampling weights directly by matching feature distributions across the source and target domains in a nonparametric manner. Domain-invariant feature learning methods align features from both the source and target domains to a common feature space. The alignment can be achieved by minimizing divergence [25], maximizing reconstruction [24], or adversarial training [22]. The domain mapping technique is typically applied in computer vision, where pixel-level image-to-image translation from one domain to another improves domain adaptation performance [6]. Normalization statistics methods are based on the assumption that batch normalization statistics encode domain knowledge. Cariucci et al. [10] performed domain adaptation by modulating the batch normalization statistics from the source to the target domain. Our proposed NodeDAT regularizer extends domain adversarial training [22] to graph neural networks and achieves finer-grained regularization by minimizing the discrepancies between features in the source and target domains for each channel/node individually.

2.4 Learning with Noisy Labels
Commonly adopted approaches to learning with noisy labels are based on the noise transition matrix and robust loss functions. The noise transition matrix specifies the probabilities of transition from each ground-truth label to each noisy label and is often applied to modify the cross-entropy loss. The matrix can be precomputed as a prior [46] or estimated from noisy data [58]. A few studies tackle noisy labels by using noise-tolerant robust loss functions, such as the unhinged loss [62] and the ramp loss [7]. Several other approaches include bootstrapping, which leverages predicted labels to generate training targets [48], and alternately updating the network parameters and labels during training [60]. Our proposed EmotionDL regularizer is inspired by [23], which applies distribution learning to learn ambiguous labels in the computer vision domain.

3 Preliminaries
In this section, we introduce the preliminaries of the simple graph convolution network (SGC) [64] and its spectral analysis, which is the basis of our RGNN model.
3.1 Simple Graph Convolution Network (SGC)
Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes a set of nodes and $\mathcal{E}$ denotes a set of edges between nodes in $\mathcal{V}$, data on $\mathcal{G}$ can be represented by a feature matrix $X \in \mathbb{R}^{n \times d}$, where $n$ denotes the number of nodes and $d$ denotes the input feature dimension. The edge set $\mathcal{E}$ can be represented by a weighted adjacency matrix $A \in \mathbb{R}^{n \times n}$ with self-loops, i.e., $A_{ij} = A_{ji}$, $A_{ii} \neq 0$. In general, GNNs learn a feature transformation function for $X$ and produce output $Z \in \mathbb{R}^{n \times d'}$, where $d'$ denotes the output feature dimension.
Between adjacent layers in GNNs, the feature transformation can be written as

$$H^{(l+1)} = f(H^{(l)}, A), \qquad (1)$$

where $l \in \{0, \ldots, K-1\}$, $K$ denotes the number of layers, $H^{(0)} = X$, $H^{(K)} = Z$, and $f$ denotes the function we want to learn. A simple definition of $f$ would be

$$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)}), \qquad (2)$$

where $\sigma$ denotes a nonlinear function and $W^{(l)}$ denotes a weight matrix at layer $l$. For each node $i$, function $f$ simply sums up all node features in its neighborhood, including itself, followed by a nonlinear transformation. However, one major limitation of $f$ in (2) is that repeatedly applying it along multiple layers may lead to $H^{(l)}$ with overly large values due to the summation. Kipf and Welling [30] alleviated this limitation by proposing the graph convolution network (GCN) as follows:

$$f(H^{(l)}, A) = \sigma(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} W^{(l)}), \qquad (3)$$

where $D$ denotes the diagonal degree matrix of $A$, i.e., $D_{ii} = \sum_j A_{ij}$. The normalized adjacency matrix $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ prevents $H^{(l)}$ from growing overly large. If we ignore $\sigma$ and $W^{(l)}$ temporarily and expand (3), the hidden state of node $i$, $h_i^{(l+1)}$, can be computed via

$$h_i^{(l+1)} = \sum_{j=1}^{n} \frac{A_{ij}}{\sqrt{D_{ii} D_{jj}}}\, h_j^{(l)}. \qquad (4)$$

Note that each neighboring $h_j^{(l)}$ is now normalized by the degrees of both node $i$ and node $j$. Therefore, for each node, the feature transformation function in GCN is essentially a nonlinear transformation of the weighted sum of the node features of itself and its neighborhood. Successively applying $K$ graph convolutional layers aggregates node features within a neighborhood of size $K$.
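The GCN propagation rule of (3) can be sketched in a few lines of NumPy. The helper name `gcn_layer` and the choice of `tanh` as the default nonlinearity are ours; `A` is assumed to already contain self-loops and have positive degrees.

```python
import numpy as np

def gcn_layer(H, A, W, act=np.tanh):
    """One GCN layer: act(D^{-1/2} A D^{-1/2} H W).

    H: (n, d) node features; A: (n, n) adjacency with self-loops; W: (d, d') weights.
    """
    d = A.sum(axis=1)                        # diagonal of the degree matrix D
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt      # symmetric normalization
    return act(A_hat @ H @ W)
```

With the nonlinearity replaced by the identity, the output of each node is exactly the degree-normalized weighted sum of (4).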
To further accelerate training while maintaining comparable performance, Wu et al. [64] proposed SGC by removing the nonlinear function $\sigma$ in (3) and reparameterizing all linear transformations across all layers into a single linear transformation, as follows:

$$Z = \hat{A}^{K} X W, \qquad (5)$$

where $\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ and $W = W^{(0)} W^{(1)} \cdots W^{(K-1)}$. Essentially, SGC computes a topology-aware linear feature transformation $\hat{A}^{K} X$, followed by one final linear transformation with $W$.
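As a sketch of (5), the SGC forward pass reduces to repeated neighborhood smoothing followed by a single linear map; the function name and argument layout are ours. Degrees are computed from $|A|$ so the sketch also accepts adjacency matrices with negative entries, matching the note in Section 4.2.

```python
import numpy as np

def sgc_forward(X, A, W, K=2):
    """SGC (eq. 5): Z = A_hat^K X W, with no nonlinearity between hops."""
    d = np.abs(A).sum(axis=1)                # degrees from |A| (A may be signed)
    A_hat = A / np.sqrt(np.outer(d, d))      # D^{-1/2} A D^{-1/2}
    Z = X
    for _ in range(K):                       # K smoothing hops = K-hop aggregation
        Z = A_hat @ Z
    return Z @ W                             # single reparameterized linear map
```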
3.2 Spectral Graph Convolution
We analyze GCN from the perspective of spectral graph theory [12]. Graph Fourier analysis relies on the graph Laplacian $L = D - A$ or the normalized graph Laplacian $L = I_n - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$. Since $L$ is a symmetric positive semidefinite matrix, it can be decomposed as $L = U \Lambda U^{\top}$, where $U \in \mathbb{R}^{n \times n}$ is the orthonormal eigenvector matrix of $L$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of the corresponding eigenvalues. Given graph data $x \in \mathbb{R}^{n}$, the graph Fourier transform of $x$ is $\hat{x} = U^{\top} x$, and the inverse Fourier transform of $\hat{x}$ is $x = U \hat{x}$. Hence, the graph convolution between $x$ and a filter $g \in \mathbb{R}^{n}$ is computed as follows:

$$g * x = U\big((U^{\top} g) \odot (U^{\top} x)\big) = U \hat{G} U^{\top} x, \qquad (6)$$

where $\odot$ denotes element-wise multiplication, and $\hat{G} = \mathrm{diag}(\hat{g}_1, \ldots, \hat{g}_n)$ denotes a diagonal matrix with spectral filter coefficients.
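Equation (6) can be checked numerically with a small sketch (the helper name is ours, and the normalized Laplacian with positive node degrees is assumed): an all-pass filter, i.e., all spectral coefficients equal to one, must return the input unchanged since $U U^{\top} = I_n$.

```python
import numpy as np

def spectral_conv(x, A, ghat):
    """Spectral graph convolution (eq. 6): U diag(ghat) U^T x.

    Uses the normalized Laplacian L = I - D^{-1/2} A D^{-1/2};
    assumes every node has positive degree.
    """
    n = len(A)
    d = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(d, d))
    lam, U = np.linalg.eigh(L)               # L is symmetric PSD, so eigh applies
    return U @ (ghat * (U.T @ x))            # filter in the spectral domain
```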
To reduce the learning complexity of the filter from $\mathcal{O}(n)$ to that of a conventional CNN, i.e., $\mathcal{O}(1)$, (6) can be approximated using $K$-th order polynomials as follows:

$$g * x \approx \sum_{k=0}^{K} \theta_k L^{k} x, \qquad (7)$$

where $\theta_k$ denotes the polynomial coefficients. To further reduce the computational cost, Defferrard et al. [16] proposed to use Chebyshev polynomials to approximate the filtering operation as follows:

$$g * x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x, \qquad (8)$$

where $\theta_k$ denotes learnable parameters, $\tilde{L} = \frac{2}{\lambda_{\max}} L - I_n$ denotes the scaled normalized Laplacian with its eigenvalues lying within $[-1, 1]$, and $T_k$ denotes the Chebyshev polynomials, recursively defined as $T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$.
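The Chebyshev recursion in (8) is easy to implement for matrices; `cheb_polynomials` is a hypothetical helper returning $T_0(\tilde{L}), \ldots, T_K(\tilde{L})$.

```python
import numpy as np

def cheb_polynomials(L_tilde, K):
    """Chebyshev matrix polynomials T_0..T_K via T_k = 2 L T_{k-1} - T_{k-2}."""
    n = L_tilde.shape[0]
    T = [np.eye(n), L_tilde.copy()]          # T_0 = I, T_1 = L_tilde
    for _ in range(2, K + 1):
        T.append(2 * L_tilde @ T[-1] - T[-2])
    return T[:K + 1]
```

On a 1x1 matrix this reduces to the scalar recursion, so for example $T_2(0.5) = 2 \cdot 0.5^2 - 1 = -0.5$.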
The GCN proposed in [30] made a few approximations to simplify the filtering operation in (8): 1) set $K = 1$; 2) set $\lambda_{\max} \approx 2$; and 3) set $\theta = \theta_0 = -\theta_1$. The resulting GCN arrives at (3). Essentially, the graph convolutional operations defined in (3) and (5) behave like a low-pass filter by smoothing the features of each node on the graph using the node features in its neighborhood.
4 Regularized Graph Neural Network
In this section, we present our regularized graph neural network (RGNN), specifically the biologically supported adjacency matrix and the two regularizers, i.e., node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL).
4.1 Adjacency Matrix in RGNN
The adjacency matrix $A \in \mathbb{R}^{n \times n}$ in RGNN represents the topological structure of the EEG channels, where $n$ denotes the number of channels in the EEG signals, i.e., nodes on the graph. Each entry $A_{ij}$ indicates the weight of the connection between channels $i$ and $j$. Note that $A$ contains self-loops. To reduce overfitting, we model $A$ as a symmetric matrix by using only $\frac{n(n+1)}{2}$ parameters instead of $n^2$. Salvador et al. [49] observed that the strength of connection between brain regions decays as an inverse-square or gravity-law function of physical distance. Hence, we initialize the local inter-channel relations in our adjacency matrix as follows:

$$A_{ij} = \min\!\left(1, \frac{\delta}{d_{ij}^{2}}\right), \qquad (9)$$

where $i \neq j$, $d_{ij}$ denotes the physical distance between channels $i$ and $j$, computed from the data sheet of the recording device, and $\delta$ denotes a sparsity hyperparameter controlling the decay rate of the connection between channels.
Bullmore and Sporns [9] proposed that brain organization is shaped by an economic trade-off between minimizing wiring costs and network running costs. Minimizing wiring costs encourages local inter-channel connections, as modelled in (9). However, minimizing network running costs encourages certain global inter-channel connections that yield high efficiency of information transfer across the network as a whole. To this end, we add several global connections to our adjacency matrix. The global connections are subject to the specific EEG channel placement adopted in the experiments. Fig. 1 depicts the global connections in both SEED [72] and SEED-IV [71]. The selection of global channels is supported by prior studies showing that the asymmetry in neuronal activities between the left and right hemispheres is informative for valence and arousal prediction [17, 52, 70]. To leverage this differential asymmetry information, we initialize the global inter-channel relations in $A$ as follows:

$$A_{ij} = A_{ji} = -1, \quad \forall (i, j) \in \mathcal{P}, \qquad (10)$$

where $\mathcal{P}$ denotes the indices of the empirically selected symmetric channel pairs that balance wiring cost and global efficiency [9]: (FP1, FP2), (AF3, AF4), (F5, F6), (FC5, FC6), (C5, C6), (CP5, CP6), (P5, P6), (PO5, PO6), and (O1, O2). Note that the adjacency matrix obtained in (10) aims to represent a brain network that combines both local anatomical connectivity and emotion-related global functional connectivity.
The last step in constructing the adjacency matrix is finding an optimal value of $\delta$ to regularize the weights of the connections between local channels. Achard and Bullmore [1] observed that sparse fMRI networks comprising around 20% of all possible connections typically maximize the efficiency of the network topology. Thus, we choose $\delta$ such that around 20% of the entries in $A$ are larger in absolute value than an empirically chosen threshold, below which we consider the connections between channels negligible.
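The construction above can be sketched as follows. The inverse-square decay matches (9), and the selected symmetric pairs are set to $-1$ as in (10); `init_adjacency` and its argument layout are hypothetical names, and clipping local weights at 1 is part of our reading of the initialization.

```python
import numpy as np

def init_adjacency(dist, global_pairs, delta):
    """Initialize the RGNN adjacency matrix (sketch of eqs. 9-10).

    dist: (n, n) symmetric inter-electrode distance matrix, zero diagonal.
    global_pairs: index pairs of symmetric left/right channels.
    delta: sparsity hyperparameter controlling the local decay rate.
    """
    n = dist.shape[0]
    A = np.ones((n, n))                              # self-loops on the diagonal
    off = dist > 0
    A[off] = np.minimum(1.0, delta / dist[off] ** 2) # local inverse-square decay
    for i, j in global_pairs:                        # differential asymmetry links
        A[i, j] = A[j, i] = -1.0
    return A
```

Choosing `delta` then amounts to tuning it until roughly 20% of the entries of `A` exceed the chosen negligibility threshold in absolute value.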
4.2 Dynamics of RGNN
Our RGNN model extends the SGC model [64]; the architecture of RGNN is illustrated in Fig. 2. We are given EEG features $X \in \mathbb{R}^{m \times n \times d}$ and labels $y \in \{0, 1, \ldots, C-1\}^{m}$, where $m$ denotes the number of training samples, $n$ denotes the number of nodes or channels, $d$ denotes the input feature dimension, $y_i$ denotes the label index of sample $i$, and $C$ denotes the number of classes. Our model aims to minimize the following cross-entropy loss:

$$\min_{\Theta}\; -\sum_{i=1}^{m} \log p(y_i \mid X_i) + \alpha \lVert A \rVert_1, \qquad (11)$$

where $\Theta$ denotes the model parameters we want to optimize, and $\alpha$ denotes the L1 sparsity regularization strength on our adjacency matrix $A$.
By passing each feature matrix $X_i$ into our RGNN, the output probability of class $c$ can be computed as

$$p(c \mid X_i) = \mathrm{softmax}\big(\mathrm{sumpool}(\hat{A}^{K} X_i W)\, W_o\big)_c, \qquad (12)$$

where $\hat{A}$, $K$, and $W$ follow the definitions in (5), $W_o \in \mathbb{R}^{d' \times C}$ denotes the output weight matrix, and $\mathrm{sumpool}$ denotes sum pooling across all nodes on the graph. We choose sum pooling because it has demonstrated more expressive power than mean pooling and max pooling [67]. Note that we use the absolute values of $A$ to compute the degree matrix $D$ because $A$ has negative elements, e.g., the global connections.

4.2.1 Node-Wise Domain Adversarial Training
EEG signals vary significantly across subjects, which hinders the generalizability of trained classifiers. To improve subject-independent classification performance, we extend domain adversarial training [22] by proposing a node-wise domain adversarial training (NodeDAT) to reduce the discrepancies between the source and target domains, i.e., the training and testing sets, respectively. Specifically, a domain classifier is trained to classify each node representation as belonging to either the source or the target domain. Compared to [22], which only regularizes the pooled representation in the last layer, our NodeDAT provides finer-grained regularization because it explicitly regularizes each node representation before pooling (see Section 7.1). During optimization, our model aims to confuse the domain classifier by learning domain-invariant representations for each node.
Specifically, given source/training data $X^{S}$ (in this subsection, we denote $X$ by $X^{S}$ for clarity) and unlabelled target/testing data $X^{T}$, where in practice $X^{T}$ can be either oversampled or downsampled to have the same number of samples as $X^{S}$ [22], the domain classifier aims to minimize the sum of the following two binary cross-entropy losses:

$$\mathcal{L}_{D} = -\sum_{i=1}^{m} \sum_{k=1}^{n} \Big( \log p(S \mid h_{ik}^{S}) + \log p(T \mid h_{ik}^{T}) \Big), \qquad (13)$$

where $S$ and $T$ denote the source and target domains, respectively. Intuitively, the domain classifier aims to classify source data as 0 and target data as 1. The domain probabilities for node $k$ are computed as

$$p(\cdot \mid h_{ik}) = \mathrm{softmax}(h_{ik} W_{D}), \qquad (14)$$

where $h_{ik}$ denotes the $k$-th node representation of sample $i$, and $W_{D}$ denotes the model parameters of the domain classifier. Essentially, our NodeDAT encourages learning domain-invariant node representations by trying to confuse the domain classifier.
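A minimal NumPy sketch of the node-wise domain loss described by (13) and (14); the names `node_domain_loss` and `W_d`, and the batched `(samples, nodes, features)` layout, are our assumptions, and the gradient reversal step is omitted here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def node_domain_loss(H_src, H_tgt, W_d):
    """Sum of the two binary cross-entropy terms over every node of every sample.

    H_src, H_tgt: (m, n, d') node representations; W_d: (d', 2) classifier weights.
    Source nodes are labelled 0, target nodes 1.
    """
    p_src = softmax(H_src @ W_d)             # (m, n, 2)
    p_tgt = softmax(H_tgt @ W_d)
    return -(np.log(p_src[..., 0]).sum() + np.log(p_tgt[..., 1]).sum())
```

With untrained (zero) weights, every node gets probability 0.5 for each domain, so the loss is the total node count times $\log 2$ per domain term.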
Note that our domain classifier implements a gradient reversal layer (GRL) [22] to reverse the gradients of the domain classifier during backpropagation. The gradients are further scaled by a GRL scaling factor $\beta$, which gradually increases from 0 to 1 as training progresses. The gradually increasing $\beta$ makes our domain classifier less sensitive to noisy inputs at the early stages of the training process. Specifically, as suggested in [22], we let $\beta = \frac{2}{1 + \exp(-10 p)} - 1$, where $p \in [0, 1]$ denotes the training progress.

4.2.2 Emotion-Aware Distribution Learning
Participants may not always generate the intended emotions when watching emotion-eliciting stimuli. To address the problem of noisy emotion labels in the datasets, we propose an emotion-aware distribution learning method (EmotionDL) that learns a distribution over classes instead of a single class for each training sample. Specifically, we convert each training label $y_i$ into a prior probability distribution $\bar{y}_i$ over all classes, where $\bar{y}_{ic}$ denotes the probability of class $c$ in $\bar{y}_i$. The conversion is dataset-dependent. In SEED, there are three classes: negative, neutral, and positive, with corresponding class indices 0, 1, and 2, respectively. We convert $y_i$ as follows:

$$\bar{y}_i = \begin{cases} (1 - \epsilon,\; \epsilon,\; 0), & y_i = 0, \\ (\epsilon/2,\; 1 - \epsilon,\; \epsilon/2), & y_i = 1, \\ (0,\; \epsilon,\; 1 - \epsilon), & y_i = 2, \end{cases} \qquad (15)$$

where $\epsilon$ denotes a hyperparameter controlling the noise level in the training labels. This conversion mechanism is based on our assumption that participants are unlikely to generate opposite emotions when watching emotion-eliciting stimuli. Therefore, each converted class distribution centers on the original class and has nonzero and zero probabilities at its nearest and opposite classes, respectively.
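The SEED conversion can be sketched as follows. The exact probability masses below are our reading of the description above (mass $\epsilon$ is moved to the nearest class or split between the two neighbouring classes, and the opposite emotion gets zero probability); the helper name is hypothetical.

```python
def seed_label_distribution(y, eps):
    """Convert a SEED label (0=negative, 1=neutral, 2=positive) to a class
    distribution centered on y, with zero mass on the opposite emotion.
    Probability masses are our assumption, not necessarily the paper's values.
    """
    table = {
        0: [1 - eps, eps, 0.0],            # negative: some neutral, no positive
        1: [eps / 2, 1 - eps, eps / 2],    # neutral: split between both neighbours
        2: [0.0, eps, 1 - eps],            # positive: some neutral, no negative
    }
    return table[y]
```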
In SEED-IV, there are four classes: neutral, sad, fear, and happy, with corresponding class indices 0, 1, 2, and 3, respectively. We convert $y_i$ analogously:

(16)

The intuition behind this conversion is based on the distances between the four emotions on the valence-arousal plane. Specifically, in the self-reported ratings [71], the neutral, sad, fear, and happy movie ratings cluster in the zero-valence zero-arousal, negative-valence negative-arousal, negative-valence positive-arousal, and positive-valence positive-arousal regions, respectively. Thus, we assume that participants are likely to generate emotions that have similar ratings in either the valence or the arousal dimension, e.g., both angry and happy have high arousal, but unlikely to generate emotions that are far apart in both dimensions, e.g., sad and happy differ in both valence and arousal.
After obtaining the converted class distributions $\bar{y}_i$, our model can be optimized by minimizing the following Kullback-Leibler (KL) divergence [31] instead of (11):

$$\min_{\Theta}\; \sum_{i=1}^{m} \mathrm{KL}\big(\bar{y}_i \,\Vert\, p(\cdot \mid X_i)\big) + \alpha \lVert A \rVert_1, \qquad (17)$$

where $p(\cdot \mid X_i)$ denotes the output probability distribution computed via (12). Note that our EmotionDL differs from label smoothing, which simply adds uniform noise to the other classes.
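A sketch of the per-sample KL objective in (17), omitting the L1 term on the adjacency matrix; the function name is ours, and zero target entries contribute nothing by the usual $0 \log 0 = 0$ convention.

```python
import numpy as np

def kl_loss(target, pred, eps=1e-12):
    """Mean KL(target || pred) over samples; each row is a class distribution."""
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mask = target > 0                        # 0 * log(0 / q) = 0 by convention
    terms = target[mask] * np.log(target[mask] / (pred[mask] + eps))
    return float(terms.sum() / len(target))
```

Unlike cross-entropy against a one-hot label, this loss is minimized when the predicted distribution matches the full converted distribution, not just its argmax.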
4.2.3 Optimization of RGNN
Combining both NodeDAT and EmotionDL, the overall loss function of RGNN is computed as follows:

$$\min_{\Theta}\; \sum_{i=1}^{m} \mathrm{KL}\big(\bar{y}_i \,\Vert\, p(\cdot \mid X_i)\big) + \mathcal{L}_{D} + \alpha \lVert A \rVert_1. \qquad (18)$$
The detailed algorithm for training RGNN is presented in Algorithm 1.
5 Experimental Settings
In this section, we present the datasets, classification settings and model settings in our experiments.
5.1 Datasets
We use both the SEED and SEED-IV datasets in our experiments. The SEED dataset [72] comprises EEG data of 15 subjects (7 males) recorded in 62 channels using the ESI NeuroScan System (https://compumedicsneuroscan.com/). The EEG data were collected while participants watched emotion-eliciting movies of three types of emotions, namely negative, neutral, and positive. Each movie lasts around 4 minutes. Three sessions of data were collected, and each session comprises 15 trials/movies for each subject. To make a fair comparison with existing studies, we directly use the precomputed differential entropy (DE) features smoothed by linear dynamic systems (LDS) [54, 72] in SEED. DE extends the idea of Shannon entropy and measures the complexity of a continuous random variable. For a fixed-length EEG segment, DE features are computed as the logarithm of the energy spectrum in a certain frequency band [53]. In SEED, DE features are precomputed over five frequency bands (delta, theta, alpha, beta, and gamma) for each second of EEG signals (without overlapping) in each channel.

The SEED-IV dataset [71] comprises EEG data of 15 subjects (7 males) recorded in 62 channels (SEED-IV also contains eye movement data, which we do not use in our experiments). The recording device is the same as the one used in SEED. The EEG data were collected while participants watched emotion-eliciting movies of four types of emotions, namely neutral, sad, fear, and happy. Each movie lasts around 2 minutes. Three sessions of data were collected, and each session comprises 24 trials/movies for each subject. Similar to SEED, we adopt the precomputed DE features from SEED-IV.
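Under the common assumption that a band-filtered EEG segment is approximately Gaussian, its differential entropy has the closed form $\frac{1}{2}\log(2\pi e \sigma^2)$, which is why the DE features reduce to a logarithm of the band energy for fixed-length segments [53]. A sketch (the helper name and the Gaussianity assumption are ours):

```python
import numpy as np

def differential_entropy(segment):
    """DE of a band-filtered EEG segment, assuming it is approximately Gaussian:
    DE = 0.5 * log(2 * pi * e * sigma^2)."""
    var = np.var(segment)
    return 0.5 * np.log(2 * np.pi * np.e * var)
```

For a unit-variance segment this evaluates to $\frac{1}{2}\log(2\pi e) \approx 1.42$ nats.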
5.2 Classification Settings
We conduct both subject-dependent and subject-independent classification on both SEED and SEED-IV to evaluate our model.
5.2.1 Subject-Dependent Classification
For SEED, we follow the experimental settings in [72, 56, 38] to evaluate our RGNN model using subject-dependent classification, i.e., we evaluate our model for each individual subject. Specifically, for each subject, we train our model using the first 9 trials as the training set and the remaining 6 trials as the testing set. We evaluate the model performance using the accuracy averaged across all subjects over two sessions of EEG data in SEED [72]. For SEED-IV, we follow the experimental settings in [71, 37] to evaluate our RGNN model using subject-dependent classification. Specifically, for each subject, the first 16 trials are used for training and the remaining 8 trials, containing all emotions (two trials per emotion), are used for testing. We evaluate our model using data from all three sessions [71].
5.2.2 Subject-Independent Classification
For SEED, we follow the experimental settings in [73, 56, 38] to evaluate our RGNN model using subject-independent classification. Specifically, we adopt leave-one-subject-out cross-validation, i.e., during each fold, we train our model on 14 subjects and test on the remaining subject. We evaluate the model performance using the accuracy averaged across all test subjects over one session of EEG data in SEED [73]. For SEED-IV, we follow the experimental settings in [37] to evaluate our RGNN model using subject-independent classification. We evaluate our model using data from all three sessions.
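The leave-one-subject-out protocol above can be sketched as a simple index generator (a simplification that ignores sessions and trials; the function name is ours):

```python
def loso_splits(n_subjects):
    """Leave-one-subject-out folds: each fold trains on all but one subject
    and tests on the held-out subject."""
    for test_subj in range(n_subjects):
        train = [s for s in range(n_subjects) if s != test_subj]
        yield train, test_subj
```

With 15 subjects this yields 15 folds, each training on 14 subjects.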
| Model | delta (SEED) | theta (SEED) | alpha (SEED) | beta (SEED) | gamma (SEED) | all bands (SEED) | all bands (SEED-IV) |
|---|---|---|---|---|---|---|---|
| SVM | 60.50/14.14 | 60.95/10.20 | 66.64/14.41 | 80.76/11.56 | 79.56/11.38 | 83.99/09.92 | 56.61/20.05 |
| GSCCA [75] | 63.92/11.16 | 64.64/10.33 | 70.10/14.76 | 76.93/11.00 | 77.98/10.72 | 82.96/09.95 | 69.08/16.66 |
| DBN [72] | 64.32/12.45 | 60.77/10.42 | 64.01/15.97 | 78.92/12.48 | 79.19/14.58 | 86.08/08.34 | 66.77/07.38 |
| STRNN [69] | 80.90/12.27 | 83.35/09.15 | 82.69/12.99 | 83.41/10.16 | 69.61/15.65 | 89.50/07.63 | - |
| DGCNN [56] | 74.25/11.42 | 71.52/05.99 | 74.43/12.16 | 83.65/10.17 | 85.73/10.64 | 90.40/08.49 | 69.88/16.29 |
| A-LSTM [55] | - | - | - | - | - | - | 69.50/15.65 |
| BiDANN [38] | 76.97/10.95 | 75.56/07.88 | 81.03/11.74 | 89.65/09.59 | 88.64/09.46 | 92.38/07.04 | 70.29/12.63 |
| EmotionMeter [71] | - | - | - | - | - | - | 70.58/17.01 |
| BiHDM [37] (SOTA) | - | - | - | - | - | 93.12/06.06 | 74.35/14.09 |
| RGNN (our model) | 76.17/07.91 | 72.26/07.25 | 75.33/08.85 | 84.25/12.54 | 89.23/08.90 | 94.24/05.95 | 79.37/10.54 |

Table I: Subject-dependent classification accuracy (mean/standard deviation) on SEED and SEED-IV.
| Model | delta (SEED) | theta (SEED) | alpha (SEED) | beta (SEED) | gamma (SEED) | all bands (SEED) | all bands (SEED-IV) |
|---|---|---|---|---|---|---|---|
| SVM | 43.06/08.27 | 40.07/06.50 | 43.97/10.89 | 48.63/10.29 | 51.59/11.83 | 56.73/16.29 | 37.99/12.52 |
| TCA [44] | 44.10/08.22 | 41.26/09.21 | 42.93/14.33 | 43.93/10.06 | 48.43/09.73 | 63.64/14.88 | 56.56/13.77 |
| SA [20] | 53.23/07.47 | 50.60/08.31 | 55.06/10.60 | 56.72/10.78 | 64.47/14.96 | 69.00/10.89 | 64.44/09.46 |
| TSVM [13] | - | - | - | - | - | 72.53/14.00 | - |
| TPT [50] | - | - | - | - | - | 76.31/15.89 | - |
| DGCNN [56] | 49.79/10.94 | 46.36/12.06 | 48.29/12.28 | 56.15/14.01 | 54.87/17.53 | 79.95/09.02 | 52.82/09.23 |
| A-LSTM [55] | - | - | - | - | - | - | 55.03/09.28 |
| DAN [33] | - | - | - | - | - | 83.81/08.56 | 58.87/08.13 |
| BiDANN-S [38] | 63.01/07.49 | 63.22/07.52 | 63.50/09.50 | 73.59/09.12 | 73.72/08.67 | 84.14/06.87 | 65.59/10.39 |
| BiHDM [37] (SOTA) | - | - | - | - | - | 85.40/07.53 | 69.03/08.66 |
| RGNN (our model) | 64.88/06.87 | 60.69/05.79 | 60.84/07.57 | 74.96/08.94 | 77.50/08.10 | 85.30/06.72 | 73.84/08.02 |

Table II: Subject-independent classification accuracy (mean/standard deviation) on SEED and SEED-IV.
5.3 Model Settings in RGNN
For our RGNN in all experiments, we empirically set the number of graph convolutional layers, the dropout rate [57] at the output fully-connected layer, and the batch size. We use Adam optimization [29] with its default hyperparameter values. We tune only the output feature dimension, the label noise level, the learning rate, the L1 regularization factor, and the L2 regularization factor for each experiment. Note that we adopt NodeDAT only in the subject-independent classification experiments. We compare our model with several baselines, whose results are cited from the published literature [56, 38, 69, 37].
6 Performance Evaluations
In this section, we present model evaluation results in both subject-dependent and subject-independent classification settings on both datasets. We also investigate the critical frequency bands and the confusion matrices of our model.
6.1 Subject-Dependent Classification
Table I presents the subject-dependent classification accuracy (mean/standard deviation) of our RGNN model and all baselines on both SEED and SEED-IV using the precomputed DE features. The performance on SEED using DE features in the individual delta, theta, alpha, beta, and gamma bands is reported as well. It is encouraging to see that our model achieves superior performance on both datasets compared to all baselines, including the state-of-the-art BiHDM, when DE features from all frequency bands are used. Notably, our model improves the accuracy of the state-of-the-art model on SEED-IV by around 5%. In particular, our model outperforms DGCNN, another GNN-based model that leverages the topological structure of EEG signals. Besides the two proposed regularizers (see Table III), the main performance improvement can be attributed to two factors: 1) our adjacency matrix incorporates the global inter-channel asymmetry relations between the left and right hemispheres; and 2) our model is less prone to overfitting because it extends SGC, which is much simpler than the ChebNet [16] used in DGCNN.
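The SGC-style propagation mentioned above can be sketched as follows. This is a generic illustration of simplified graph convolution (Wu et al. [64]) over a normalized adjacency, not the paper's exact formulation; the adjacency values and the 3-channel graph are made up:

```python
import numpy as np

def sgc_propagate(A, X, K=2):
    """Simplified graph convolution: apply the symmetrically normalized
    adjacency K times with no nonlinearity in between, so the only
    learnable weights sit in a final linear layer (omitted here).
    A: (n, n) adjacency with self-loops; X: (n, d) node features."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt
    H = X
    for _ in range(K):
        H = A_norm @ H
    return H

# Toy 3-channel graph with self-loops.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
X = np.eye(3)
H = sgc_propagate(A, X, K=2)
```

Because there is no nonlinearity between the K propagation steps, the whole operator collapses to a single matrix power, which is what keeps the model simpler than ChebNet.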
6.2 Subject-Independent Classification
Similar to Table I, Table II presents the subject-independent classification results. When using features from all frequency bands, our model performs marginally worse than BiHDM on SEED but much better than BiHDM on SEED-IV (nearly 5% improvement). In addition, our model achieves the lowest standard deviation in accuracy among all baselines on both datasets, demonstrating its robustness.
Comparing the results in Tables I and II, we find that the accuracy obtained in the subject-independent setting is consistently lower than that in the subject-dependent setting, by around 5% to 30% for every model. This finding is unsurprising, because the variability of EEG signals across subjects makes subject-independent classification more challenging. The interesting part is that the performance gap between the two settings shrinks from around 27% on SEED and 19% on SEED-IV for SVM to around 9% on SEED and 6% on SEED-IV for our model. One possible reason for the diminishing gap is that recent deep learning models in the subject-independent setting are becoming better at leveraging larger amounts of data and learning more subject-invariant EEG representations. This observation suggests that transfer learning may be a necessary tool for emotion recognition in cross-subject settings. With increasing amounts of data available from different subjects and a proper transfer learning method, it would not be surprising if subject-independent classification accuracy surpasses subject-dependent classification accuracy in the future.
6.3 Performance Comparison of Frequency Bands
We further compare the performance of our model and all baselines using features from different frequency bands, as reported in Tables I and II. In the subject-dependent experiments on SEED, STRNN achieves the highest accuracy in the delta, theta, and alpha bands, BiDANN performs best in the beta band, and our model performs best in the gamma band. In the subject-independent experiments on SEED, BiDANN-S achieves the highest accuracy in the theta and alpha bands, and our model performs best in the delta, beta, and gamma bands.
We next investigate the critical frequency bands for emotion recognition. For both subject-dependent and subject-independent settings on SEED, we compare the performance of each model across the frequency bands. In general, most models, including ours, achieve better performance in the beta and gamma bands than in the delta, theta, and alpha bands, with the exception of STRNN, which performs worst in the gamma band. This observation is consistent with the literature [47, 72]. One subtle difference between our model and the others is that our model consistently performs better in the gamma band than in the beta band, whereas other models perform comparably in both, indicating that the gamma band may be the most discriminative band for our model.
6.4 Confusion Matrix
We present the confusion matrices of our model in Fig. 3. For both subject-dependent and subject-independent settings on SEED, our model recognizes positive and neutral emotions better than negative emotion. When training data from other subjects are combined (compare Fig. 3 (a) and (b)), our model becomes much worse at detecting negative emotion, indicating that participants are likely to generate distinct EEG patterns when experiencing negative emotion. A similar phenomenon is observed for the sad emotion in SEED-IV (see Fig. 3 (c) and (d)). For SEED-IV, our model performs significantly better on the sad emotion than on all other emotions in both classification settings. We note that fear is the only emotion recognized more accurately in subject-independent classification than in subject-dependent classification. This finding suggests that participants watching horror movies may generate similar EEG patterns.
7 Model Analysis on RGNN
In this section, we conduct an ablation study and a sensitivity analysis for our model.
7.1 Ablation Study
TABLE III: Ablation study results (mean/standard deviation) for subject-independent classification on SEED and SEED-IV.
Model  SEED  SEED-IV
RGNN (full model)  85.30/06.72  73.84/08.02
– global connections  82.42/08.24  71.13/08.78
– symmetric adjacency matrix  83.69/07.92  72.02/08.66
– NodeDAT  81.92/09.35  71.65/09.43
– NodeDAT + DAT  83.51/08.11  72.40/08.54
– EmotionDL  82.27/08.81  70.76/09.22
We conduct an ablation study to investigate the contribution of each key component of our model. Table III reports the results obtained in the subject-independent setting on both datasets. The two major designs in our adjacency matrix, i.e., the global connections and the symmetric structure, are both helpful in recognizing emotions. The global connections model the asymmetry between neuronal activities in the left and right hemispheres, which has been shown to reveal certain emotions [17, 52, 70]. The symmetric adjacency matrix design is mainly motivated by the goal of reducing the number of model parameters and preventing overfitting, especially in subject-dependent classification where less training data is available.
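One common way to enforce the symmetry constraint while roughly halving the number of free parameters is to parameterize the adjacency from a free weight matrix averaged with its transpose; this is a generic sketch of that device, not necessarily the paper's exact parameterization:

```python
import numpy as np

def symmetric_adjacency(W_free):
    """Build a symmetric adjacency matrix from an unconstrained weight
    matrix by averaging it with its transpose.  Only the upper triangle
    of W_free effectively matters, which acts as a parameter-count
    regularizer."""
    return 0.5 * (W_free + W_free.T)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # 4 nodes here for illustration
A = symmetric_adjacency(W)
```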
Our NodeDAT regularizer has a noticeable positive impact on the performance of our model, demonstrating that domain adaptation is significantly helpful in cross-subject classification. To further investigate the impact of our node-level domain classifier, we experimented with replacing NodeDAT with a generic domain classifier (DAT) [22] that operates after the pooling operation, i.e., the (– NodeDAT + DAT) row in Table III. The clear performance gap between (– NodeDAT + DAT) and our full RGNN model indicates that NodeDAT regularizes the model better by learning subject-invariant representations at the node level rather than at the graph level. In addition, when NodeDAT is removed, the performance of our model has a greater variance, demonstrating the importance of NodeDAT in improving the robustness of our model against cross-subject variations.
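The node-level domain classifier can be sketched as a single logistic unit applied to every channel's representation, so each node contributes its own source-vs-target prediction. This is an illustrative forward pass only; in actual adversarial training the gradient of this loss would be reversed before reaching the feature extractor (gradient reversal, Ganin et al. [22]), which is not modeled here, and all shapes and parameters are hypothetical:

```python
import numpy as np

def node_domain_loss(H, w, b, is_target):
    """Node-level domain classification loss: one logistic unit shared
    across all n nodes (channels), averaged binary cross-entropy.
    H: (n, d) node representations; w: (d,) weights; b: scalar bias."""
    logits = H @ w + b                       # (n,) one logit per node
    p = 1.0 / (1.0 + np.exp(-logits))        # P(node comes from target domain)
    y = np.full(len(p), float(is_target))    # same domain label for all nodes
    eps = 1e-12
    bce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return float(bce.mean())

rng = np.random.default_rng(0)
H = rng.normal(size=(62, 8))   # hypothetical: 62 channels, 8-dim features
w, b = rng.normal(size=8), 0.0
loss = node_domain_loss(H, w, b, is_target=True)
```

A graph-level DAT would instead pool H into a single vector before the classifier, discarding the per-node signal that NodeDAT exploits.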
Our EmotionDL regularizer improves the accuracy of our model by around 3% on both datasets. This performance gain validates our assumption that participants do not always experience the intended emotions when watching emotion-eliciting stimuli. In addition, EmotionDL can easily be adopted by other deep learning models.
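The label-distribution idea can be illustrated as follows: a hard label is softened into a distribution that concedes some probability mass (the noise level) to the other classes, and the model is trained with a KL-divergence objective against that distribution. The uniform noise allocation below is a simplifying assumption for illustration; the paper's exact noise distribution may differ:

```python
import numpy as np

def noisy_label_distribution(label, n_classes, noise):
    """Soften a hard label: keep (1 - noise) mass on the given class and
    spread `noise` uniformly over the remaining classes (assumption)."""
    dist = np.full(n_classes, noise / (n_classes - 1))
    dist[label] = 1.0 - noise
    return dist

def kl_loss(target_dist, pred_logits):
    """KL(target || softmax(pred_logits)) as the training objective."""
    z = pred_logits - pred_logits.max()      # numerically stable softmax
    pred = np.exp(z) / np.exp(z).sum()
    eps = 1e-12
    return float(np.sum(target_dist * (np.log(target_dist + eps) - np.log(pred + eps))))

t = noisy_label_distribution(label=2, n_classes=4, noise=0.2)
loss = kl_loss(t, np.array([0.0, 0.0, 5.0, 0.0]))
```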
7.2 Sensitivity Analysis
We analyze the performance of our model across varying values of the L1 sparsity coefficient (see (11)) and the noise coefficient in EmotionDL (see (15) and (16)), as illustrated in Fig. 4. For subject-dependent classification, increasing the sparsity coefficient from 0 to 0.1 generally improves the model performance. However, for subject-independent classification, increasing it beyond a certain threshold, i.e., 0.01 in Fig. 4(a), decreases the performance. One possible explanation for this difference in behavior is that there is much less training data in subject-dependent classification, which requires stronger regularization to reduce overfitting, whereas in subject-independent classification, where the amount of training data is less of a concern, stronger regularization may introduce bias and hinder learning.
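The sparsity term being swept here is just an L1 penalty on the learned adjacency entries added to the training loss; a minimal sketch, with toy values:

```python
import numpy as np

def l1_sparsity_penalty(A, coeff):
    """L1 penalty on the adjacency entries, encouraging a sparse learned
    graph; `coeff` plays the role of the sparsity coefficient swept in
    the sensitivity analysis."""
    return float(coeff * np.abs(A).sum())

# Toy adjacency: total absolute weight is 1.0.
A = np.array([[0.0, 0.5], [-0.5, 0.0]])
penalty = l1_sparsity_penalty(A, coeff=0.01)
```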
As illustrated in Fig. 4(b), our model behaves consistently across the experimental settings as the noise coefficient varies: the performance first increases and then decreases. In particular, our model usually performs best when the noise coefficient is set to 0.2, demonstrating the existence of label noise and the necessity of addressing it on both datasets. Introducing excessive noise in EmotionDL causes a performance drop, which is expected because excessive noise weakens the true learning signals.
8 Neuronal Activity Analysis for Emotion Recognition
In this section we analyze and identify important neuronal activities for emotion recognition.
8.1 Activation Maps of Channels
Fig. 5 shows the heatmap of the diagonal elements of our learned adjacency matrix. Conceptually, as shown in (4), the diagonal values represent the contribution of each channel to the final EEG representation. It is clear from Fig. 5 that there are strong activations in the pre-frontal, parietal, and occipital regions, indicating that these regions may be strongly related to emotion processing in the brain. Our finding is consistent with existing studies, which observed that asymmetrical frontal and parietal EEG activity may reflect changes in both valence and arousal [52, 40]. Synchronization between the frontal and occipital regions has also been reported to be related to positive and fear emotions [14, 42]. The symmetric pattern in the channel activation map indicates again that the asymmetry in EEG activity between the left and right hemispheres is critical for emotion recognition.
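Reading the channel-importance map off the learned matrix amounts to extracting its diagonal and associating each entry with its channel; a small sketch with a made-up 3-channel matrix and an illustrative subset of 10-20 system channel names:

```python
import numpy as np

# Channel-importance sketch: the diagonal of the learned adjacency is
# taken as each channel's self-contribution to the EEG representation.
A = np.array([[0.8, 0.2, 0.1],
              [0.2, 0.3, 0.4],
              [0.1, 0.4, 0.9]])
channels = ["FP1", "CZ", "OZ"]            # hypothetical subset of 62 channels
importance = dict(zip(channels, np.diag(A)))
top_channel = max(importance, key=importance.get)
```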
8.2 Inter-channel Relations
Fig. 6 shows the top 10 inter-channel connections with the largest edge weights in our adjacency matrix. Note that all global connections remain among the strongest connections after the matrix is learned, demonstrating again that global inter-channel relations are essential for emotion recognition. There are both similarities and differences between the two plots in Fig. 6, indicating that our initialization strategy in (9) captures local inter-channel relations to a certain degree. One notable difference is that a few strong connections disappear in Fig. 6(a), e.g., (POZ, PO3), (PO6, PO8), and (P3, P5), suggesting that these connections may not be critical for emotion recognition. In addition, it is clear from Fig. 6(b) that the connection between the channel pair (FP1, AF3) is the strongest, followed by (F6, F8), (FP2, AF4), and (PO8, CB2), indicating that local inter-channel relations in the frontal region may be important for emotion recognition.
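Extracting the strongest connections from a symmetric adjacency reduces to ranking its off-diagonal upper-triangle entries; a sketch with a toy 3-channel matrix:

```python
import numpy as np

def top_k_edges(A, k):
    """Return the k undirected channel pairs with the largest edge
    weights (off-diagonal entries of a symmetric adjacency), as
    (i, j, weight) tuples sorted by descending weight."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)              # upper triangle, no diagonal
    weights = A[iu]
    order = np.argsort(weights)[::-1][:k]
    return [(int(iu[0][i]), int(iu[1][i]), float(weights[i])) for i in order]

A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
edges = top_k_edges(A, k=2)
```

Mapping the returned index pairs back to channel names (e.g., FP1, AF3) gives the plotted connections.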
9 Conclusion
In this paper, we propose a regularized graph neural network for emotion recognition based on EEG signals. Our model is biologically supported and captures both local and global inter-channel relations. In addition, we propose two regularizers, namely NodeDAT and EmotionDL, to improve the robustness of our model against cross-subject EEG variations and noisy labels. We evaluate our model in both subject-dependent and subject-independent classification settings on two public datasets, SEED and SEED-IV. Our model outperforms several competitive baselines, such as SVM, DBN, DGCNN, BiDANN, and the state-of-the-art BiHDM, in most classification settings. Notably, our model achieves accuracies of 79.37% and 73.84% in subject-dependent and subject-independent classification on SEED-IV, respectively, outperforming the current state-of-the-art model by around 5%. Our model analysis demonstrates that the proposed biologically supported adjacency matrix and the two regularizers contribute consistent and significant gains to the performance of our model. Investigation of the neuronal activities reveals that the pre-frontal, parietal, and occipital regions may be the most informative regions for emotion recognition. In addition, global inter-channel relations between the left and right hemispheres are important, and local inter-channel relations between (FP1, AF3), (F6, F8), and (FP2, AF4) may also provide useful information.
In the future, we plan to investigate how to apply our model to EEG signals with fewer channels. A simpler version of our model may be necessary to avoid overfitting on such datasets. In addition, how best to incorporate global connections into these smaller graphs is worth exploring.
References
 [1] (2007) Efficiency and cost of economical brain functional networks. PLoS Computational Biology 3 (2), pp. e17. Cited by: §4.1.
 [2] (2015) Computer-aided diagnosis of depression using EEG signals. European Neurology 73 (5-6), pp. 329–336. Cited by: §1.
 [3] (2002) Comparison of wavelet transform and FFT methods in the analysis of EEG signals. Journal of Medical Systems 26 (3), pp. 241–247. Cited by: §2.1.
 [4] (2017) Emotions recognition using EEG signals: a survey. IEEE Transactions on Affective Computing. Cited by: §1.
 [5] (2015) Learning representations from EEG with deep recurrent-convolutional neural networks. arXiv preprint arXiv:1511.06448. Cited by: §2.1.
 [6] (2017) One-sided unsupervised domain mapping. In Advances in Neural Information Processing Systems, pp. 752–762. Cited by: §2.3.
 [7] (2011) Support vector machines with the ramp loss and the hard margin loss. Operations Research 59 (2), pp. 467–479. Cited by: §2.4.
 [8] (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §2.2.
 [9] (2012) The economy of brain network organization. Nature Reviews Neuroscience 13 (5), pp. 336. Cited by: §1, §4.1.
 [10] (2017) Autodial: automatic domain alignment layers. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5077–5085. Cited by: §2.3.
 [11] (2017) A fast, efficient domain adaptation technique for cross-domain electroencephalography (EEG)-based emotion recognition. Sensors 17 (5), pp. 1014. Cited by: §1.
 [12] (1997) Spectral graph theory. American Mathematical Society. Cited by: §2.2, §3.2.
 [13] (2006) Large scale transductive SVMs. Journal of Machine Learning Research 7, pp. 1687–1712. Cited by: TABLE II.
 [14] (2006) EEG phase synchronization during emotional response to positive and negative film stimuli. Neuroscience Letters 406 (3), pp. 159–164. Cited by: §8.1.
 [15] (2013) Imaging human connectomes at the macroscale. Nature methods 10 (6), pp. 524. Cited by: §1.
 [16] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. Cited by: §2.2, §3.2, §6.1.
 [17] (1976) Differing emotional response from right and left hemispheres. Nature 261 (5562), pp. 690. Cited by: §4.1, §7.1.
 [18] (1997) Universal facial expressions of emotion. Segerstrale U, P. Molnar P, eds. Nonverbal communication: Where nature meets culture, pp. 27–46. Cited by: §1.
 [19] (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5 (1), pp. 17–60. Cited by: §1.
 [20] (2013) Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2960–2967. Cited by: TABLE II.
 [21] (2013) Graph analysis of the human connectome: promise, progress, and pitfalls. Neuroimage 80, pp. 426–444. Cited by: §1.
 [22] (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §1, §2.3, §4.2.1, §4.2.1, §4.2.1, §7.1.
 [23] (2017) Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing 26 (6), pp. 2825–2838. Cited by: §2.4.
 [24] (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In European Conference on Computer Vision, pp. 597–613. Cited by: §2.3.
 [25] (2012) A kernel two-sample test. Journal of Machine Learning Research 13 (Mar), pp. 723–773. Cited by: §2.3.
 [26] (2007) Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems, pp. 601–608. Cited by: §2.3.
 [27] (2014) Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing 5 (3), pp. 327–339. Cited by: §2.1.
 [28] (2018) Deep physiological affect network for the recognition of human emotions. IEEE Transactions on Affective Computing. Cited by: §2.1.
 [29] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.3.
 [30] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), Cited by: §2.2, §3.1, §3.2.
 [31] (1951) On information and sufficiency. The Annals of Mathematical Statistics 22 (1), pp. 79–86. Cited by: §4.2.2.
 [32] (2018) Domain adaptation techniques for EEGbased emotion recognition: a comparative study on two public datasets. IEEE Transactions on Cognitive and Developmental Systems 11 (1), pp. 85–94. Cited by: §1.
 [33] (2018) Cross-subject emotion recognition using deep adaptation networks. In Proceedings of the International Conference on Neural Information Processing, pp. 403–413. Cited by: TABLE II.
 [34] (2018) Hierarchical convolutional neural networks for EEG-based emotion recognition. Cognitive Computation, pp. 1–13. Cited by: §2.1.
 [35] (2019) EEG based emotion recognition by combining functional connectivity network and local activations. IEEE Transactions on Biomedical Engineering. Cited by: §2.1.
 [36] (2016) Emotion recognition from multi-channel EEG data through convolutional recurrent neural network. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 352–359. Cited by: §2.1.
 [37] (2019) A novel bi-hemispheric discrepancy model for EEG emotion recognition. arXiv preprint arXiv:1906.01704. Cited by: §1, §2.1, §5.2.1, §5.2.2, §5.3, TABLE I, TABLE II.
 [38] (2018) A bi-hemisphere domain adversarial neural network model for EEG emotion recognition. IEEE Transactions on Affective Computing. Cited by: §1, §1, §1, §5.2.1, §5.2.2, §5.3, TABLE I, TABLE II.
 [39] (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §2.2.
 [40] (2010) EEG-based emotion recognition in music listening. IEEE Transactions on Biomedical Engineering 57 (7), pp. 1798–1806. Cited by: §2.1, §8.1.
 [41] (2013) Real-time fractal-based valence level recognition from EEG. In Transactions on Computational Science XVIII, pp. 101–120. Cited by: §2.1.
 [42] (2016) Timing of emotion representation in right and left occipital region: evidence from combined tmsEEG. Brain and Cognition 106, pp. 13–22. Cited by: §8.1.
 [43] (1996) Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Current Psychology 14 (4), pp. 261–292. Cited by: §1.
 [44] (2010) Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks 22 (2), pp. 199–210. Cited by: TABLE II.
 [45] (2013) On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318. Cited by: §1.

 [46] (2017) Making deep neural networks robust to label noise: a loss correction approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1944–1952. Cited by: §2.4.
 [47] (1985) EEG alpha activity reflects attentional demands, and beta activity reflects emotional and cognitive processes. Science 228 (4700), pp. 750–752. Cited by: §6.3.
 [48] (2014) Training deep neural networks on noisy labels with bootstrapping. arXiv preprint arXiv:1412.6596. Cited by: §2.4.
 [49] (2005) Neurophysiological architecture of functional magnetic resonance images of human brain. Cerebral cortex 15 (9), pp. 1332–1342. Cited by: §4.1.
 [50] (2014) We are not all equal: personalizing models for facial expression analysis with transductive parameter transfer. In Proceedings of the 22nd ACM International Conference on Multimedia, pp. 357–366. Cited by: TABLE II.
 [51] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §2.2.
 [52] (2001) Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition & Emotion 15 (4), pp. 487–500. Cited by: §1, §1, §1, §4.1, §7.1, §8.1.
 [53] (2013) Differential entropy feature for EEG-based vigilance estimation. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6627–6630. Cited by: §1, §2.1, §5.1.
 [54] (2010) Offline and online vigilance estimation based on linear dynamical system and manifold learning. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pp. 6587–6590. Cited by: §5.1.
 [55] (2019) MPED: a multimodal physiological emotion database for discrete emotion recognition. IEEE Access 7, pp. 12177–12191. Cited by: TABLE I, TABLE II.
 [56] (2018) EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing. Cited by: §1, §1, §2.1, §5.2.1, §5.2.2, §5.3, TABLE I, TABLE II.
 [57] (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §5.3.
 [58] (2014) Training convolutional networks with noisy labels. arXiv preprint arXiv:1406.2080. Cited by: §2.4.
 [59] (2003) Remarks on emotion recognition from multimodal biopotential signals. In SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No. 03CH37483), Vol. 2, pp. 1654–1659. Cited by: §2.1.
 [60] (2018) Joint optimization framework for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5552–5560. Cited by: §2.4.
 [61] (2017) EEG-based emotion recognition via fast and robust feature smoothing. In International Conference on Brain Informatics, pp. 83–92. Cited by: §2.1.
 [62] (2015) Learning with symmetric label noise: the importance of being unhinged. In Advances in Neural Information Processing Systems, pp. 10–18. Cited by: §2.4.
 [63] (2014) Emotional state classification from EEG data using machine learning approach. Neurocomputing 129, pp. 94–106. Cited by: §1.
 [64] (2019) Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, pp. 6861–6871. Cited by: §1, §2.2, §2.2, §3.1, §3, §4.2.
 [65] (2019) Identifying functional brain connectivity patterns for EEG-based emotion recognition. In 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 235–238. Cited by: §2.1.
 [66] (2019) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596. Cited by: §2.2.
 [67] (2019) How powerful are graph neural networks?. In International Conference on Learning Representations (ICLR), Cited by: §4.2.

 [68] (2018) Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1, §2.1.
 [69] (2018) Spatial-temporal recurrent neural network for emotion recognition. IEEE Transactions on Cybernetics (99), pp. 1–9. Cited by: §1, §2.1, §5.3, TABLE I.
 [70] (2018) Frontal EEG asymmetry and middle line power difference in discrete emotions. Frontiers in Behavioral Neuroscience 12. Cited by: §4.1, §7.1.
 [71] (2018) Emotionmeter: a multimodal framework for recognizing human emotions. IEEE Transactions on Cybernetics (99), pp. 1–13. Cited by: item 3, §4.1, §4.2.2, §5.1, §5.2.1, TABLE I.
 [72] (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development 7 (3), pp. 162–175. Cited by: item 3, §1, §2.1, §4.1, §5.1, §5.2.1, TABLE I, §6.3.
 [73] (2016) Personalizing EEG-based affective models with transfer learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 2732–2738. Cited by: §1, §5.2.2.
 [74] (2014) EEG-based emotion classification using deep belief networks. In 2014 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. Cited by: §2.1.
 [75] (2016) Multi-channel EEG-based emotion recognition via group sparse canonical correlation analysis. IEEE Transactions on Cognitive and Developmental Systems 9 (3), pp. 281–290. Cited by: TABLE I.
 [76] (2004) Class noise vs. attribute noise: a quantitative study. Artificial intelligence review 22 (3), pp. 177–210. Cited by: §1.