Emotion plays a crucial role in many aspects of our daily lives, such as social communication and decision-making. According to the Gartner hype cycle in 2019, emotion artificial intelligence (AI) is one of the 21 emerging technologies that will significantly impact our society over the next 5 to 10 years. Emotion AI, also known as artificial emotional intelligence or affective computing, aims to enable machines to recognize, understand, and process emotions. Compared with the rich studies on the motor brain-computer interface (BCI), the recently emerging affective BCI (aBCI) faces distinct challenges since the brain functional connectivity networks involving emotions are not well investigated. The aBCI technology aims to advance human-computer interaction systems with the assistance of various devices that detect affective states from neurophysiological signals. Therefore, the major challenge facing emotion AI and aBCI in the primary stage lies in emotion recognition.
In recent years, extensive endeavors have been devoted to emotion recognition. Among a variety of emotion recognition approaches, the modalities used to detect affective states primarily comprise two categories: the external behavioral signals, including facial expression, speech, body language, etc., and the internal physiological signals, including electroencephalography (EEG), electrocardiography (ECG), respiration, galvanic skin response, etc. These two categories have their own prominent properties. The external behavioral signals are more convenient to collect, while the physiological signals are believed to be more objective and reliable in conveying emotions. As a result, multimodal emotion recognition has become the major trend, since it may leverage the complementary representation properties of different modalities. Nevertheless, most existing studies have focused on the fusion of visual and audio signals, while few studies have combined behavioral signals with physiological signals.
Among the physiological modalities, EEG has exhibited outstanding performance in emotion recognition and is promising in elucidating the basic neural mechanisms underlying emotion. Moreover, the fusion of EEG and eye tracking data has been shown to be effective in multimodal emotion recognition, with increasing interest among research communities. In this paper, we adopt the EEG signals, along with eye movement data or peripheral physiological signals, to classify different emotions.
Most existing studies on EEG-based emotion recognition have relied on single-channel analysis [15, 16, 17, 18], where EEG features are independently extracted within each EEG channel in different brain regions. In contrast, studies on cognitive science and neuroimaging have demonstrated that emotion is a complex behavioral and physiological reaction that involves circuits in multiple cerebral regions. In addition, studies in neuroscience and neuropsychiatry have revealed that patients with psychophysiological diseases involving cognitive deficits, such as autism, schizophrenia, and major depressive disorder, present decreased brain functional connectivity in both functional magnetic resonance imaging (fMRI) and EEG. Furthermore, fMRI-based neuroimaging studies have indicated that brain functional connectivity may serve as a fingerprint for profiling individuals, and may also reflect the ability of individuals to sustain attention. These results have provided evidence for the connection between cognition and brain functional connectivity. However, few studies have explored the emotion-associated brain functional connectivity patterns. Emotion recognition from the perspective of the brain functional connectivity network remains to be further investigated and may eventually lead to an understanding of the neurological mechanisms by which emotions are processed in the brain.
In this paper, we aim to investigate the emotion-relevant brain functional connectivity patterns and evaluate the performance of the EEG connectivity feature for multimodal emotion recognition with respect to three public datasets: SEED , SEED-V , and DEAP . Fig. 1 depicts our proposed multimodal emotion recognition framework. The main contributions of our work lie in the following aspects:
1) We propose a novel emotion-relevant critical subnetwork selection algorithm and investigate three EEG connectivity features: strength, clustering coefficient, and eigenvector centrality.
2) We demonstrate the outstanding performance of the EEG connectivity feature and its complementary representation properties with eye movement data in multimodal emotion recognition.
3) We reveal the emotion-associated brain functional connectivity patterns and the potential of applying the brain networks based on fewer EEG electrodes to aBCI systems in real scenarios.
The remainder of this paper is organized as follows. Section II introduces the related literature regarding multimodal emotion recognition and brain functional connectivity analysis. Section III describes the emotion experimental design. Section IV presents the proposed multimodal emotion recognition framework based on the brain functional connectivity. Section V analyzes and discusses the experimental results. Finally, a brief conclusion is presented in Section VI.
2 Related Work
2.1 Emotion Recognition
Various modalities have been exploited to detect affective states in the past few decades. With the advent of computer vision and speech recognition, research on emotion recognition using facial expression and speech has gained prevalence. Hasani and Mahoor proposed an enhanced neural network architecture that consists of a 3D version of the Inception-ResNet network followed by a long short-term memory (LSTM) unit for emotion recognition from facial expressions in videos. They employed four databases in classifying different emotions, including anger, fear, disgust, sadness, neutrality, contempt, happiness, and surprise. Trigeorgis et al. presented an end-to-end speech emotion recognition framework created by combining the convolutional neural network (CNN) and LSTM models. Schirmer and Adolphs studied different modalities with respect to emotion perception, including facial expression, voice, and touch. The authors suggested that the mechanisms of these modalities have their own specializations and that together, they could lead to holistic emotion judgment.
Apart from the external behavioral modalities, the internal physiological signals have also attracted the attention of numerous researchers due to their objectivity and reliability. Zhang et al. performed valence-arousal emotion recognition based on the heart rate variability derived from the ECG signals. Atkinson and Campos improved EEG-based emotion recognition by combining mutual information-based feature selection methods with kernel classifiers. Liu et al. constructed a real-time movie-induced emotion recognition system to continuously detect the discrete emotional states in the valence-arousal dimension. Zheng et al. systematically evaluated the performances of different feature extraction, feature smoothing, feature selection and classification models for EEG-based emotion recognition. Their results indicated that stable neural patterns do exist within and across sessions. Among these modalities, EEG has proven promising for emotion recognition and capable of revealing the neurological mechanisms behind emotion processing.
In EEG-based emotion recognition, numerous EEG features have been exploited to enhance the performance of aBCI systems. The conventional EEG features could be categorized into temporal domain, frequency domain, and time-frequency domain. In the temporal domain, the most commonly used EEG features mainly include the fractal dimension and higher order crossings . Due to the nonstationary essence of the EEG signals and the fact that raw EEG signals are usually contaminated with artifacts and noises, the frequency domain features such as power spectral density (PSD) , higher order spectra , and differential entropy (DE)  and the time-frequency domain features such as wavelet features  and Hilbert-Huang spectra   have demonstrated outstanding performance in the EEG-based emotion recognition systems. However, these conventional EEG feature extraction methods are based on single-channel analysis, which neglects the EEG-based functional connectivity networks in association with different emotions.
2.2 Brain Functional Connectivity
Brain connectivity has long been studied in the fields of neuroscience and neuroimaging to explore the essential nature of the cerebrum. According to the attributes of connections, brain connectivity could be classified into three modes: structural connectivity, functional connectivity, and effective connectivity. These modes separately correspond to the biophysical connections between neurons or neural elements, the statistical relations between anatomically unconnected cerebral regions, and the directional causal effects from one neural element to another.
Recently, increasing evidence has indicated that a link does exist between brain functional connectivity and multiple psychophysiological diseases with cognitive deficiency. Murias et al. found that robust patterns of EEG connectivity are apparent in autism spectrum disorders in the resting state. Yin et al. concluded that the EEG-based functional connectivity in schizophrenia patients tends to be slower and less efficient. Ho et al. indicated that adolescent depression typically relates to the inflexibly elevated default mode network connections based on fMRI. Whitton et al. suggested that elevations in high frequency EEG-based functional connectivity may represent a neural pattern for the recurrent illness course of major depressive disorder. However, few studies have investigated the links between emotions and brain functional connectivity networks or conducted emotion recognition from the perspective of brain networks. Whether specific connectivity patterns truly exist for different affective states remains to be investigated.
In the past years, only a few preliminary research efforts on EEG-based emotion recognition have attempted to employ the multichannel EEG analysis approaches. Dasdemir et al.  directly used the connectivity metric of phase locking value as the EEG feature in distinguishing the positive and negative emotions. Lee and Hsieh  tested three different connectivity metrics, correlation, coherence, and phase synchronization index, in classifying the positive, neutral, and negative emotions. Li et al.  also studied these three emotions by combining the functional connectivity with local action features. Moon et al.  utilized CNN to model the connectivity matrices constructed by three different connectivity metrics: correlation, phase locking value, and phase lag index. However, these studies either ignored the topology of the brain functional connectivity networks or failed to analyze the emotion-related functional connectivity signatures. In our previous study on EEG-based emotion recognition , we identified the brain functional connectivity patterns of the three emotions (sad, happy and neutral) and extracted the topological features from the brain networks to recognize these emotions. In this paper, we extend this preliminary work to the three-class (sad, happy, and neutral), five-class (disgust, fear, sad, happy, and neutral), and valence-arousal dimension multimodal emotion recognition tasks.
2.3 Eye Movement Data
Studies in neuroscience and biological psychology have indicated the relation between emotion and eye movement data, especially pupil diameter and dilation response. Widmann et al.  indicated that emotional arousal by novel sounds is reflected in the pupil dilation response and the P3 event-related potentials. Oliva and Anikin  suggested that the pupil dilation response reveals the perception of emotion valence and confidence in the decision-making process. Moreover, Black et al.  showed that the eye tracking and EEG data in autism spectrum disorders are atypical during the processes of attention to and cognition of facial emotions.
In addition, eye movement data could be obtained through eye tracking glasses which are wearable, portable and noninvasive. Therefore, eye movement data, as a behavioral reaction to emotions, have been widely utilized to assist with EEG-based emotion recognition in aBCI systems. López-Gil et al.  improved EEG-based emotion recognition by combining eye tracking and synchronized biometrics to detect the valence-arousal basic emotions and a complex emotion of empathy. Zheng et al.  evaluated the complementary characteristics of EEG and eye movement data in classifying positive, neutral and negative emotions by fusing the DE and pupil diameter features. Lu et al.  extended this preliminary work and systematically examined sixteen different eye movement features. Furthermore, their work has been extended to the five emotions by Li et al.  and Zhao et al. , and the discrimination ability and stability over time of EEG and eye tracking data were also revealed. However, these research approaches were all based on single-channel analysis for the EEG signals: whether there exist complementary representation properties of EEG connectivity features and eye movement data remains to be further analyzed.
2.4 Multimodal Frameworks
As a complex psychological state, emotion is reflected in both physical behaviors and physiological activities  . The collection of external behavioral data is more convenient than that of internal physiological signals, since the procedure could be accomplished without involving any invasive devices. Despite the inconvenience of data collection, the physiological signals are believed to be more objective and reliable because the participants cannot forge their internal activities.
With different modalities exhibiting distinct properties, modern emotion recognition approaches tend to combine multiple modalities to enhance the performance of aBCI systems. Perez-Gaspar et al. extended the evolutionary computation of artificial neural networks and hidden Markov models in classifying four emotions (angry, sad, happy, and neutral) by combining speech with facial expressions. Tzirakis et al. also fused the auditory and visual modalities using an end-to-end valence-arousal emotion recognition model. They applied the CNN and ResNet models to extract features from speech and visual signals, respectively, which were then concatenated and fed into the LSTM model so that the whole pipeline could be trained end to end. Ranganathan et al. conducted a 23-class discrete emotion recognition task based on four different deep belief networks by combining a variety of modalities, including face, gesture, voice and physiological signals. Huang et al. studied the fusion of EEG and facial expression data using two decision-level fusion strategies, the sum and product rules, in detecting the four basic emotions (fear, sad, happy, and neutral).
In recent years, many researchers have suggested that the combination of EEG and eye tracking data is a promising approach for recognizing emotions in aBCI systems. López-Gil et al. combined EEG with eye tracking and biometric signals in a synchronized manner to classify emotions using multiple machine learning methods. Liu et al. applied the bimodal deep autoencoder (BDAE) neural network in detecting the three basic emotions (positive, neutral, and negative) from EEG and eye movement data. Tang et al. conducted the same task using bimodal deep denoising autoencoder and bimodal-LSTM models. Zheng et al. presented EmotionMeter for detecting the four emotions (fear, sad, happy, and neutral). Qiu et al. adopted the deep canonical correlation analysis (DCCA) model as a multimodal deep neural network for classifying the three-class, four-class, and valence-arousal emotions. Their results suggested that DCCA outperforms BDAE and bimodal-LSTM models in multimodal emotion recognition. In this paper, we apply the DCCA model to address the multimodal emotion recognition task.
3 Emotion Experiment Design
The emotion experiments were designed to simultaneously record the EEG and eye movement signals of the five prototypical emotions (disgust, fear, sad, happy, and neutral). Many existing works have indicated the efficiency and reliability of movie clips in eliciting the subjects’ emotions due to the blending of audio and visual information  . Therefore, the movie clips were selected as the type of stimuli to better induce the subjects’ affective states.
During the preliminary experiment, a stimuli pool containing emotional movie clips corresponding with the five emotions was prepared and then assessed by 20 volunteers using rating scores ranging from 0 to 5. The higher scores represented the more successful elicitation of the subjects’ emotions. Eventually, 9 movie clips for each of the five emotions were selected from the stimuli pool, all of which received a mean score of 3 or higher. The durations of these clips range from 2 to 4 minutes.
Sixteen subjects (6 males and 10 females) with normal hearing and self-reported normal or corrected-to-normal vision were recruited for our emotion experiments. All subjects were selected using the Eysenck Personality Questionnaire (EPQ), which could measure the personality of an individual in three independent dimensions: Extroversion/Introversion, Neuroticism/Stability, and Psychoticism/Socialization . Those with extroverted characteristics and stable mood are more readily induced to experience the intended emotions throughout the experiment in comparison with those of other personalities. Hence, subjects that are more appropriate for the emotion experiments were selected according to the EPQ feedback.
The emotion experiments were conducted in a laboratory environment. During the emotion experiment, each subject was asked to view the emotional movie clips and to relax as much as possible so that the intended emotions could be induced. Meanwhile, their EEG and eye movement signals were simultaneously collected by the 62-channel wet-electrode cap and the SMI eye tracking glasses, respectively. The EEG data were recorded with the ESI NeuroScan System at a sampling rate of 1000 Hz. The layout of the 62-channel EEG cap is based on the higher-resolution international 10-20 system. Fig. 2 presents these wearable devices and the layout of the 62-channel EEG cap.
In this paper, there were 15 trials in total in each experiment, where each of the five emotions corresponds to 3 movie clips. Moreover, each subject was required to perform three sessions of the experiment on different days with an interval longer than three days. To better elicit the subjects’ emotions, there was no repetition of movie clips within or across the three sessions. Thus, the aforementioned 9 movie clips for each emotion were randomly divided into three groups, which were then used to construct the three sessions. As studies in cognitive science have indicated that emotion varies in a fluent and smooth manner, the playing order of the movie clips in one experiment was carefully designed according to the following criteria: 1) avoiding sudden changes in emotion, such as clips of disgust followed by clips of happiness; 2) utilizing movie clips of neutral emotion as a cushion between two opposite emotions.
Fig. 3 illustrates the protocol of the designed emotion experiment. In each trial of the experiment, the movie clip was preceded by a 15-second brief introduction about the content and the emotion to be elicited, and was followed by 15 or 30 seconds of self-assessment and relaxation so that the subjects' emotions could subside. Particularly, the resting time was 30 seconds after disgust or fear emotions, and 15 seconds after the other three emotions. In general, the duration of one experiment was approximately 55 minutes.
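As a rough consistency check on the stated duration, assume an average clip length of 3 minutes (the midpoint of the 2 to 4 minute range; this average is an assumption, not a reported value). One experiment contains 15 trials, of which 6 are disgust or fear trials with 30 seconds of rest and 9 are other trials with 15 seconds of rest:

\[
15 \times 15\,\mathrm{s} + 6 \times 30\,\mathrm{s} + 9 \times 15\,\mathrm{s} + 15 \times 180\,\mathrm{s} = 3240\,\mathrm{s} = 54\,\mathrm{min},
\]

which agrees with the reported duration of approximately 55 minutes.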
This multimodal emotion dataset is named SEED-V, which is a subset of the public emotion EEG dataset SEED (http://bcmi.sjtu.edu.cn/~seed/). The SEED dataset contains 62-channel EEG signals and eye movement data corresponding to three emotions (sad, happy, and neutral) from 9 subjects, with each subject performing the experiments three times. Thus, there are 27 experiments in total and each experiment contains 15 trials, with 5 movie clips for each of the three emotions.
3.4 Ethics Statement
The emotion experiments have been approved by the Scientific & Technical Ethics Committee of the Bio-X Institute at Shanghai Jiao Tong University. All of the subjects participating in our experiments were informed of the experimental procedures and signed the informed consent document before the emotion experiments.
The raw EEG signals collected during the emotion experiments are usually of high resolution and contaminated by surrounding artifacts, which hampers both the processing and the analysis of the emotion-relevant brain neural activities. To remove the irrelevant artifacts, the raw EEG data were preprocessed with Curry 7 for baseline correction, and a bandpass filter between 1 and 50 Hz was applied. Then, we downsampled the EEG signals to 200 Hz to expedite the processing procedures. For further exploration of the frequency-specific brain functional connectivity patterns, the EEG data were filtered with five bandpass filters corresponding to the five frequency bands ($\delta$: 1-4 Hz, $\theta$: 4-8 Hz, $\alpha$: 8-14 Hz, $\beta$: 14-31 Hz, and $\gamma$: 31-50 Hz).
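The filtering and downsampling steps above can be sketched as follows. This is a minimal illustration, not the Curry 7 pipeline used in the experiments: the Butterworth filter, its order, the mean-subtraction baseline correction, and the function names `bandpass` and `preprocess` are all assumptions made for the example, and the band names are the conventional ones, used here only as dictionary keys.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

# Band edges in Hz, as given in the text; names are the conventional ones.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def bandpass(x, low, high, fs, order=4):
    """Zero-phase Butterworth bandpass along the last (time) axis.

    The filter family and order are assumptions for this sketch.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def preprocess(raw, fs_in=1000, fs_out=200):
    """raw: (channels, samples) array sampled at fs_in.

    Returns a dict mapping band name -> (channels, samples') at fs_out.
    """
    # Crude baseline correction (assumed): remove each channel's mean.
    x = raw - raw.mean(axis=-1, keepdims=True)
    # Broadband 1-50 Hz filter, then downsample 1000 Hz -> 200 Hz.
    x = bandpass(x, 1, 50, fs_in)
    x = resample_poly(x, up=1, down=fs_in // fs_out, axis=-1)
    # One band-limited copy of the signal per frequency band.
    return {name: bandpass(x, lo, hi, fs_out)
            for name, (lo, hi) in BANDS.items()}
```

Calling `preprocess` on a `(62, T)` array would yield five band-limited arrays of shape `(62, T / 5)`, one per frequency band.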
For the eye movement signals, the artifacts were eliminated using signals recorded from the EOG and FPZ channels. It has been shown that pupil diameter is subject to the ambient luminance in addition to the emotion stimuli materials. Fortunately, according to our observation and analysis, the pupil diameter exhibits consistency in response to the same emotional stimuli material across different subjects. Thus, principal component analysis was adopted to eliminate the luminance reflex of the pupil and to preserve the emotion-relevant components.
4.2 Brain Functional Connectivity Network
The EEG-based brain functional connectivity networks consist of vertices and edges, which could be represented by the EEG electrodes and the associations between pairs of EEG signals from two different channels, respectively .
4.2.1 Vertex Selection
Although the wet-electrode EEG cap was adopted in our emotion experiments within the laboratory environment due to its reliability, the dry-electrode device with fewer EEG channels offers great convenience and portability in developing aBCI systems under actual scenario conditions. Thus, the question of whether the brain functional connectivity networks comprised of fewer EEG channels could exhibit considerable performance in emotion recognition remains unexplored.
Previous studies have demonstrated that the DSI-24 wearable sensing EEG headset is quite portable and appropriate for real scenarios . There are in total 18 channels in this device, and the layout of these electrodes is based on the normal international 10-20 system. Fortunately, these 18 electrodes can be perfectly mapped into the same locations in the layout of the higher-resolution international 10-20 system. Fig. 4 displays the locations of the 18-channel dry-electrode reflected in the layout of the 62-channel wet-electrode device.
In this paper, since the raw EEG signals were acquired with the 62-channel wet-electrode device, we constructed the brain connectivity network with 62 vertices in total. Furthermore, we selected these 18 electrodes and compared the performances of EEG connectivity features extracted from the brain networks constructed with two categories of vertices: 18-channel and 62-channel.
4.2.2 Edge Measurement
To measure the associations between pairs of EEG signals recorded from different channels, we compared two connectivity metrics in this paper: Pearson’s correlation coefficient and spectral coherence. The former measures the linear relation between two EEG signals $X$ and $Y$, and is defined as:
\[
\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y},
\]
where $\mathrm{cov}(X, Y)$ denotes the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the respective standard deviations.
Distinguished from the correlation that measures the connectivity between two EEG channels in the temporal domain, coherence measures the connectivity between two signals $X$ and $Y$ at frequency $f$ in the frequency domain, which could be written as:
\[
C_{XY}(f) = \frac{|P_{XY}(f)|^2}{P_{XX}(f)\, P_{YY}(f)},
\]
where $P_{XY}(f)$ is the cross power spectral density between $X$ and $Y$, and $P_{XX}(f)$ and $P_{YY}(f)$ are the respective power spectral densities.
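The two connectivity metrics can be illustrated with a short sketch. The function names are hypothetical, and two choices are assumptions not stated in the text: the Welch segment length `nperseg`, and the reduction of coherence to a single edge weight by averaging over the frequency bins inside a band.

```python
import numpy as np
from scipy.signal import coherence

def pearson(x, y):
    """Pearson's correlation coefficient: cov(x, y) / (std(x) * std(y))."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def band_coherence(x, y, fs, band, nperseg=256):
    """Magnitude-squared coherence averaged over the bins inside `band` (Hz).

    Averaging over the band is an assumption of this sketch.
    """
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)  # Welch estimate
    mask = (f >= band[0]) & (f <= band[1])
    return float(cxy[mask].mean())
```

Correlation takes values in [-1, 1] and keeps the sign of the association, whereas coherence takes values in [0, 1], so the two metrics yield connectivity matrices with different value ranges.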
4.2.3 Network Construction
In our previous work, we have revealed the complementary characteristics of the state-of-the-art DE feature and eye movement data in emotion recognition, where features were extracted with a 4-second nonoverlapping time window    . Considering further fusion of our proposed EEG connectivity feature with eye movement data and comparison with the DE feature, we constructed the brain functional connectivity network using the same time window. The detailed procedure of brain network construction is depicted in Fig. 5.
First, the preprocessed EEG signals were segmented with a 4-second nonoverlapping time window. As a result, each sample is represented by a 4-second EEG segment of 62 channels. Since the five bandpass filters were employed during the preprocessing, each sample actually contains five 4-second EEG segments corresponding to the five frequency bands.
Second, the associations between pairs of EEG channels were computed using the connectivity metric (correlation or coherence) in each frequency band for each sample. Therefore, each brain network corresponds to a symmetric connectivity matrix, where elements denote the association weights between pairs of EEG channels. Since the value of self-correlation is always equal to 1, elements on the main diagonal of connectivity matrices are usually set to zero and will not be used in later analysis .
Finally, five brain connectivity networks were acquired for each sample, corresponding to the five frequency bands. For the brain networks constructed with 18 channels, there is no need to repeat the above procedures. We could directly select the rows and columns corresponding to the 18 electrodes from the 62-channel connectivity matrix to construct the 18-channel connectivity matrix.
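The construction procedure for one frequency band can be sketched as below, using correlation as the connectivity metric. The function names and the channel indices in the usage note are hypothetical; the actual positions of the 18 electrodes within the 62-channel layout are not reproduced here.

```python
import numpy as np

def connectivity_matrices(band_eeg, fs=200, win_sec=4):
    """Build one correlation matrix per nonoverlapping window.

    band_eeg: (channels, samples) EEG for a single frequency band.
    Returns an array of shape (windows, channels, channels).
    """
    step = fs * win_sec
    n_win = band_eeg.shape[1] // step       # trailing partial window is dropped
    mats = []
    for w in range(n_win):
        seg = band_eeg[:, w * step:(w + 1) * step]
        m = np.corrcoef(seg)                # Pearson correlation between channels
        np.fill_diagonal(m, 0.0)            # self-correlation is always 1
        mats.append(m)
    return np.stack(mats)

def select_channels(mats, idx):
    """Extract the sub-matrices for a channel subset (e.g. 18 of 62)."""
    idx = np.asarray(idx)
    return mats[:, idx[:, None], idx[None, :]]
```

For example, `select_channels(mats, idx18)` with a list `idx18` of 18 channel indices turns a `(windows, 62, 62)` array into a `(windows, 18, 18)` array without recomputing any correlation.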
4.3 Emotion-Relevant Critical Subnetwork Selection
Although the raw EEG signals were preprocessed to remove the noises, there still remain certain minor artifacts that may not be eliminated during the preprocessing phase. Unfortunately, these artifacts may further lead to the weak associations in the brain networks, which eventually results in obscureness in profiling the brain network topology. By convention, this problem is resolved by directly discarding these weak associations according to an absolute or a proportional threshold after sorting the association weights  . However, this method fails to take the targeted task into consideration, thus offering no guarantee that the preserved stronger associations are truly task relevant. Therefore, we have proposed an emotion-relevant critical subnetwork selection approach to address this issue.
The goal is to explore the universal emotion-relevant brain functional connectivity patterns among different subjects. Therefore, we utilized samples in training sets of all subjects to select the emotion-relevant critical subnetworks. Nevertheless, it should be mentioned that the affective models trained in this paper are subject-dependent. For further analysis of the brain connectivity patterns in different frequency bands, we selected a total of five critical subnetworks corresponding to the five frequency bands. Assuming that $\mathcal{C}$ is the set of emotion labels, there are thus $|\mathcal{C}|$ categories of emotions. The emotion-relevant critical subnetwork selection approach for one specific frequency band is summarized into the following three phases:
Averaging phase: all brain networks in training sets are averaged over all samples and all subjects for each emotion: thus, $|\mathcal{C}|$ averaged brain networks corresponding to the $|\mathcal{C}|$ emotions could be obtained.
Thresholding phase: for each averaged brain network, the same proportional threshold $\tau$ is applied to solely preserve the strongest associations. Thus, we attain the critical edges for each emotion.
Merging phase: along with the original vertices, the critical edges in all averaged brain networks are merged together to construct the emotion-relevant critical subnetwork.
Here, the proportional threshold $\tau$ represents the proportion of the preserved connections relative to all connections in the brain network. Particularly, the threshold value was tuned as a hyperparameter of the affective models over the range [0.0, 1.0] with a step size of 0.01.
This procedure is also described in Algorithm 1. Suppose that the connectivity matrices along with corresponding labels in the training sets for one frequency band are defined as:
\[
\mathcal{D} = \{(A_m, y_m)\}_{m=1}^{M}, \quad A_m \in \mathbb{R}^{N \times N}, \quad y_m \in \mathcal{C},
\]
where $M$ and $N$ represent the number of samples and the number of vertices, respectively.
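The three phases can be sketched for one frequency band as follows. This is an illustrative reading of the procedure rather than a reproduction of Algorithm 1: the function names are hypothetical, and ranking edges by the absolute value of the averaged weight is an assumption, since the text only refers to the "strongest" associations.

```python
import numpy as np

def critical_subnetwork(mats, labels, threshold):
    """Return a boolean (N, N) mask of emotion-relevant critical edges.

    mats: (M, N, N) symmetric connectivity matrices from the training sets.
    labels: (M,) NumPy array with one emotion label per matrix.
    threshold: proportion of edges to keep per emotion, in [0, 1].
    """
    n = mats.shape[1]
    iu = np.triu_indices(n, k=1)                 # each undirected edge once
    n_keep = int(round(threshold * len(iu[0])))
    mask = np.zeros((n, n), dtype=bool)
    for c in np.unique(labels):
        avg = mats[labels == c].mean(axis=0)     # averaging phase
        w = np.abs(avg[iu])                      # assumed: strength = |weight|
        top = np.argsort(w)[::-1][:n_keep]       # thresholding phase
        mask[iu[0][top], iu[1][top]] = True      # merging phase (union of edges)
    return mask | mask.T                         # make the mask symmetric

def apply_mask(mats, mask):
    """Set every edge outside the critical subnetwork to zero."""
    return np.where(mask, mats, 0.0)
```

Because the per-emotion edge sets are merged by union, the final subnetwork may retain a larger proportion of edges than the threshold itself whenever the emotions differ in their strongest connections.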
4.4 Feature Extraction
4.4.1 EEG Functional Connectivity Network Features
In this paper, we extracted EEG features from the perspective of brain functional connectivity networks. The essence of the emotion-relevant critical subnetwork primarily consists in the network topology. According to the five critical subnetworks in the five corresponding frequency bands, we could derive the critical connectivity matrices for each sample in the entire dataset. Precisely, if one edge belongs to the critical subnetwork, the corresponding association weight in the matrix will remain unmodified; otherwise, it will be set to zero, thus simulating the process of discarding this edge from the brain network. The critical connectivity matrices were subsequently fed into the Brain Connectivity Toolbox  to extract the three topological features: strength, clustering coefficient, and eigenvector centrality.
Assume that the selected emotion-relevant brain functional connectivity network of one sample is regarded as an undirected graph $G = (V, E)$, where $V$ and $E$ represent the sets of vertices and critical edges, respectively. There are in total $N$ vertices in the brain network. Suppose that the corresponding symmetric connectivity matrix of $G$ is $A = (a_{ij})$, where $i, j = 1, \dots, N$, $a_{ij}$ denotes the association between two vertices $i$ and $j$, and $a_{ij} \in [-1, 1]$. Based on this notation, the three EEG functional connectivity network features are defined as below.
The strength feature is a basic measurement of the network topology, which could be written as:
\[
\mathbf{s} = [s_1^{+}, \dots, s_N^{+}, s_1^{-}, \dots, s_N^{-}],
\]
where $s_i^{+}$ and $s_i^{-}$ represent the sum of the positive and negative associations connected to vertex $i$, respectively, and are computed as:
\[
s_i^{\pm} = \sum_{j \in V} a_{ij}^{\pm}.
\]
The clustering coefficient feature is a measurement of the brain functional segregation that primarily quantifies the clusters within the brain network, which is defined as:
\[
\mathbf{c} = [c_1^{+}, \dots, c_N^{+}, c_1^{-}, \dots, c_N^{-}],
\]
where $c_i^{+}$ and $c_i^{-}$ represent the clustering coefficients for the positive and negative associations of vertex $i$, respectively. The clustering coefficient is equivalent to the fraction of triangles around a vertex and is calculated as:
\[
c_i^{\pm} = \frac{2\, t_i^{\pm}}{k_i (k_i - 1)},
\]
where $k_i$ denotes the total number of neighbors of vertex $i$, and $t_i^{+}$ and $t_i^{-}$ are the positive and negative weighted geometric means of triangles around $i$, respectively. The triangles around a vertex are represented as:
\[
t_i^{\pm} = \frac{1}{2} \sum_{j, h \in N_i} \left( a_{ij}^{\pm}\, a_{ih}^{\pm}\, a_{jh}^{\pm} \right)^{1/3},
\]
where $N_i$ is the neighborhood of vertex $i$.
The eigenvector centrality feature evaluates the significance of an individual vertex in interacting with other vertices, facilitating integration, and thus serving a crucial role in network resilience. The eigenvector centrality score of vertex $i$ could be defined as:
\[
x_i = \frac{1}{\lambda} \sum_{j \in N_i} a_{ij}\, x_j,
\]
where $N_i$ is the neighborhood of $i$ and $\lambda$ is a constant. This equation could be easily transformed to the eigenvector equation using the vector notations: $A\mathbf{x} = \lambda \mathbf{x}$. In general, the dimensions of the strength, clustering coefficient, and eigenvector centrality features in each frequency band are $2N$, $2N$, and $N$, respectively.
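The three topological features can be approximated in a few lines of NumPy. This is a minimal sketch and not the Brain Connectivity Toolbox implementation: it assumes a symmetric matrix with a zero diagonal, treats the negative part as magnitudes, uses the weighted geometric-mean triangle count for clustering, and takes the absolute value of the leading eigenvector for centrality. All function names are hypothetical.

```python
import numpy as np

def clustering(w):
    """Weighted clustering coefficient of a nonnegative symmetric matrix."""
    k = (w > 0).sum(axis=1)                  # number of neighbours per vertex
    cube = np.cbrt(w)
    tri = np.diag(cube @ cube @ cube) / 2.0  # geometric-mean triangle weights
    denom = k * (k - 1)
    return np.divide(2.0 * tri, denom, out=np.zeros(len(w)), where=denom > 0)

def eigenvector_centrality(a):
    """Absolute leading eigenvector of a symmetric matrix (length N)."""
    vals, vecs = np.linalg.eigh(a)           # symmetric input, real eigenpairs
    return np.abs(vecs[:, np.argmax(vals)])

def connectivity_features(a):
    """Return (strength 2N, clustering 2N, eigenvector centrality N)."""
    # Split into positive weights and magnitudes of negative weights.
    pos, neg = np.clip(a, 0, None), np.clip(-a, 0, None)
    s = np.concatenate([pos.sum(axis=1), neg.sum(axis=1)])
    c = np.concatenate([clustering(pos), clustering(neg)])
    return s, c, eigenvector_centrality(a)
```

For a 62-channel network this gives 124 + 124 + 62 values per frequency band, matching the feature dimensions stated above.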
4.4.2 Eye Movement Features
First, eye movement parameters were calculated using the BeGaze analysis software (https://gazeintelligence.com/smi-software-download) of the SMI eye tracking glasses, including pupil diameter, fixation duration, blink duration, saccade, and event statistics. Subsequently, the statistics of these eye movement parameters were derived, thus obtaining the 33-dimensional eye movement feature. The detailed description of the extracted eye movement feature could be found in our previous work.
As aforementioned, the variation of emotion is fluent and smooth, which should be reflected in the attributes of the extracted features. In addition, the extracted EEG features are typically of high dimensionality and may contain unrelated and redundant information, which adds unnecessary computation and time costs. Hence, a feature smoothing method, linear dynamical system (LDS) , and a feature selection algorithm, minimal redundancy maximal relevance (mRMR) , were applied to address these issues before feeding the features into the classifier.
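To make the idea of feature smoothing concrete, the sketch below applies a scalar random-walk Kalman filter to one feature dimension over time. This is only a minimal instance of a linear dynamical system: it runs a forward pass without a backward smoothing pass, and the function name and the variances `process_var` and `obs_var` are hypothetical values chosen for illustration, not the settings used in this work.

```python
def lds_smooth(series, process_var=1e-2, obs_var=1.0):
    """Forward Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.
    process_var and obs_var are hypothetical noise variances."""
    if not series:
        return []
    mean, var = series[0], obs_var       # initialise from the first observation
    smoothed = [mean]
    for y in series[1:]:
        var_pred = var + process_var     # predict step
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)  # update step
        var = (1.0 - gain) * var_pred
        smoothed.append(mean)
    return smoothed

# A rapidly fluctuating feature sequence becomes much smoother.
raw = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
smooth = lds_smooth(raw)
```

A smaller ratio of `process_var` to `obs_var` gives stronger smoothing, at the cost of a slower response to genuine changes in the feature.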
4.5.1 Deep Canonical Correlation Analysis Model
Fig. 6 presents the architecture of the DCCA model, which comprises three parts: the stacked nonlinear layers (L2 and L3), CCA calculation, and feature fusion layer. The DCCA model could learn shared representations of high correlation from multimodal data .
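Assume that the transformed features for two modalities $X_1$ and $X_2$ are separately denoted by $H_1 = f_1(X_1; W_1)$ and $H_2 = f_2(X_2; W_2)$, where $f_1$ and $f_2$ are the respective nonlinear transformations, and $W_1$ and $W_2$ are the corresponding parameters. Thus, the optimization function is written as:
$$(W_1^{*}, W_2^{*}) = \arg\max_{W_1, W_2} \operatorname{corr}\left(f_1(X_1; W_1), f_2(X_2; W_2)\right).$$
Suppose that the centered data matrices are $\bar{H}_1$ and $\bar{H}_2$, computed over $m$ samples, and $r_1$ and $r_2$ are the respective regularization parameters; hence, the correlation of the transformed features could be calculated as:
$$\operatorname{corr}(H_1, H_2) = \|T\|_{\mathrm{tr}} = \operatorname{tr}\left((T^{\top} T)^{1/2}\right), \quad T = \hat{\Sigma}_{11}^{-1/2} \hat{\Sigma}_{12} \hat{\Sigma}_{22}^{-1/2},$$
where $\hat{\Sigma}_{12} = \frac{1}{m-1} \bar{H}_1 \bar{H}_2^{\top}$, $\hat{\Sigma}_{11} = \frac{1}{m-1} \bar{H}_1 \bar{H}_1^{\top} + r_1 I$, and $\hat{\Sigma}_{22} = \frac{1}{m-1} \bar{H}_2 \bar{H}_2^{\top} + r_2 I$.
In particular, the gradient of $\operatorname{corr}(H_1, H_2)$ could be computed using singular value decomposition. The parameter updating is accomplished by using the negative value of correlation as the loss function. Thus, minimizing loss is equivalent to maximizing correlation. The feature fusion layer is defined as the weighted average of the two transformed features. Finally, the fused multimodal feature is fed into the support vector machine (SVM) to train the affective model.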
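The relation between the correlation objective, the loss, and the fusion layer can be sketched for the simplest case in which each modality is transformed into a single output dimension, so that the regularized correlation reduces to a scalar. This is an illustrative sketch under that assumption, not the DCCA implementation used here; the function names, the regularization values, and the fusion weight `alpha` are hypothetical.

```python
import math

def regularized_correlation(h1, h2, r1=1e-4, r2=1e-4):
    """Correlation of two 1-D transformed features with variance regularization."""
    m = len(h1)
    mu1, mu2 = sum(h1) / m, sum(h2) / m
    c1 = [v - mu1 for v in h1]           # centered features
    c2 = [v - mu2 for v in h2]
    s11 = sum(v * v for v in c1) / (m - 1) + r1
    s22 = sum(v * v for v in c2) / (m - 1) + r2
    s12 = sum(a * b for a, b in zip(c1, c2)) / (m - 1)
    return abs(s12) / math.sqrt(s11 * s22)

def dcca_loss(h1, h2):
    """Negative correlation: minimizing it maximizes the correlation."""
    return -regularized_correlation(h1, h2)

def fuse(h1, h2, alpha=0.5):
    """Weighted average of the two transformed features (alpha is hypothetical)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(h1, h2)]

aligned = dcca_loss([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])        # close to -1
unrelated = dcca_loss([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])  # close to 0
fused = fuse([1.0, 2.0], [3.0, 4.0])
```

In the multi-dimensional case the scalar ratio is replaced by the trace norm of the whitened cross-covariance matrix, which requires matrix square roots and is therefore omitted from this sketch.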
In this paper, the cross validation and grid search methods were adopted to tune the hyperparameters. Supposing that the numbers of nodes in L1, L2, and L3 layers of DCCA are , , and , respectively, these three hyperparameters are searched in the space where and . The learning rate is tuned from to .
4.5.2 Experiment Setups
In this paper, we evaluate the proposed approaches on three public datasets: SEED , SEED-V , and DEAP . For the SEED dataset, the three-class (sad, happy, and neutral) emotion classification task is conducted. The training and test sets are the first 9 trials and the last 6 trials, respectively, which is the same as in    . For the SEED-V dataset, the five-class (disgust, fear, sad, happy, and neutral) emotion classification task is performed with a three-fold cross validation strategy, which follows the same setups as in  .
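The two trial-based evaluation protocols can be sketched as follows. The sketch assumes that trials are kept in their recording order, that each SEED session contains 15 trials (9 for training plus 6 for testing), and that the folds are contiguous blocks of trials; the fold layout and the function names are assumptions for illustration only.

```python
def seed_split(trials):
    """First 9 trials for training, last 6 for testing (assumes 15 ordered trials)."""
    return trials[:9], trials[9:]

def k_fold_indices(n_items, k):
    """Contiguous k-fold split; returns a list of (train, test) index lists."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

train_trials, test_trials = seed_split(list(range(15)))
three_folds = k_fold_indices(15, 3)  # 15 trials per session is an assumption
```

Splitting by trial rather than by individual sample keeps all windows from one trial on the same side of the split.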
The DEAP dataset contains 32-channel EEG signals and 8-channel peripheral physiological signals from 32 subjects in the valence-arousal dimension. Each subject watched 40 one-minute music videos. The EEG signals were preprocessed with a bandpass filter between 4 and 45 Hz. For the DEAP dataset, we build the brain networks using solely 32 channels in the four frequency bands (without the $\delta$ band) with a 2-second nonoverlapping time window. The peripheral physiological feature is 48-dimensional. In addition, two binary (arousal-level, valence-level) classification tasks were conducted with a ten-fold cross validation strategy. The setups for the DEAP dataset are in accordance with   .
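The segmentation into 2-second nonoverlapping windows can be sketched as below. The 128 Hz sampling rate is an assumption made for this example and is not stated in this section, and the function name is hypothetical.

```python
def nonoverlapping_windows(n_samples, sampling_rate_hz, window_seconds):
    """Return (start, end) sample indices of consecutive nonoverlapping windows."""
    size = int(sampling_rate_hz * window_seconds)
    return [(s, s + size) for s in range(0, n_samples - size + 1, size)]

# One 60-second video at an assumed 128 Hz sampling rate.
windows = nonoverlapping_windows(60 * 128, 128, 2)
samples_per_subject = len(windows) * 40  # 40 videos per subject
```

Under these assumptions each one-minute video yields 30 windows, so one brain network is built per window and each subject contributes 1200 samples.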
5 Experimental Results and Discussion
5.1 Discrimination Ability
To demonstrate the discrimination ability of the EEG functional connectivity network features in emotion recognition, we conduct EEG-based emotion recognition for the three datasets.
5.1.1 Experimental Results on the SEED-V Dataset
For the SEED-V dataset, we constructed the EEG-based brain functional connectivity networks using two different categories of vertices and two different edge measurements, then extracted three EEG functional connectivity network features from the brain networks.
Table I presents the five-class emotion recognition performance of these features. We could observe that the strength feature exhibits outstanding performance regardless of the number of vertices and connectivity metric. This may be because the strength feature could intuitively reflect the emotion-associated connectivity across all brain regions. In general, the strength and eigenvector centrality features exhibit higher accuracy with correlation as the connectivity metric, whereas the clustering coefficient feature exhibits better performance with coherence.
The features extracted from the 18-channel brain networks perform comparably to those extracted from the 62-channel networks, which indicates that EEG functional connectivity network features extracted from brain networks constructed with fewer channels are promising for practical emotion recognition applications in aBCI systems.
In our previous work , we have demonstrated that the EEG functional connectivity network features considerably outperform the PSD feature and that they are superior to directly using the connectivity metrics as features. In this paper, the best classification accuracy of % achieved by the strength feature exceeds the value of % attained by the single-channel-based state-of-the-art DE feature in the work of  on the same dataset.
To further analyze the capability of the best feature, the strength, in recognizing each of the five emotions, the confusion matrices are displayed in Fig. 7. It could be observed that the strength feature is superior in detecting the emotion of happiness, followed by the emotions of neutrality and fear with correlation as connectivity metric. In contrast, from the perspective of coherence as connectivity metric, the strength feature exhibits the best performance in recognizing the fear emotion and exhibits similar performances with respect to the emotions of sadness, neutrality, and happiness.
Generally, the strength feature with correlation as the connectivity metric achieves better performance than with coherence in classifying all emotions except sadness. Overall, the EEG feature exhibits only fair performance in recognizing the emotion of disgust, which is easily confused with the sad and neutral emotions; this result is in accordance with previous findings . In addition, the strength feature with the 18-channel approach performs comparably to that of the 62-channel approach. In particular, the 18-channel approach achieves the same classification accuracy of 84% as the 62-channel approach in recognizing the emotion of happiness.
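The per-emotion accuracies discussed above are the diagonal entries of a row-normalized confusion matrix. The following sketch shows how such a matrix could be computed; the label order and the toy predictions are made up for illustration and are not results from this work.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with true classes as rows and predicted classes as columns."""
    index = {lab: i for i, lab in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts

def row_normalize(counts):
    """Convert counts to per-class rates; an empty row stays all zeros."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

labels = ["disgust", "fear", "sad", "happy", "neutral"]
y_true = ["happy", "happy", "sad", "disgust", "disgust"]   # toy data
y_pred = ["happy", "happy", "neutral", "sad", "disgust"]   # toy data
rates = row_normalize(confusion_matrix(y_true, y_pred, labels))
```

Off-diagonal entries in a row show which emotions a given class is mistaken for, which is how confusions such as disgust with sadness become visible.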
5.1.2 Experimental Results on SEED and DEAP Datasets
On SEED and DEAP datasets, we utilize the best EEG functional connectivity network feature, the strength, to further verify its discrimination ability in classifying emotions.
On the SEED dataset, the three-class emotion recognition accuracy achieved by the strength feature is , which is higher than the value of  attained by the DE feature. On the DEAP dataset, the performances of the strength feature for two binary classification tasks (arousal-level and valence-level) are and , respectively. These results considerably outperform those of and  achieved by the PSD feature, as well as those of and  attained using the capsule network.
These results demonstrate the discrimination ability of the EEG functional connectivity network features in classifying three-class emotions (sad, happy, and neutral), five-class emotions (disgust, fear, sad, happy, and neutral), and valence-arousal dimension. Additionally, the strength feature outperforms the most commonly used PSD feature and the state-of-the-art DE feature.
5.2 Complementary Representation Properties
In this section, the DCCA model is adopted to combine the EEG signals with other modalities for multimodal emotion recognition with respect to the three datasets. Here, the best EEG functional connectivity network feature, strength with correlation as connectivity metric, is utilized for the evaluation.
5.2.1 Experimental Results on the SEED-V Dataset
The combination of EEG and eye movement data for the SEED-V dataset was implemented using two different fusion strategies: feature-level fusion (FLF) and DCCA. The FLF is a direct concatenation of the two modalities’ features. The experimental results are displayed in Table II. The best classification performance values (%) based on the EEG connectivity feature, eye movement data, FLF, and DCCA approaches are 74.05±7.09, 65.21±7.60, 78.03±6.07, and 84.51±5.11, respectively. These results indicate that the combination of the EEG connectivity feature and eye movement data could enhance the performance of five-class emotion recognition. Moreover, the DCCA model may find the shared space to be more related to emotion. In addition, the fusion based on the 18-channel EEG connectivity feature and eye movement data also achieves considerable classification performance.
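Feature-level fusion as a direct concatenation can be sketched in a few lines. The eye movement feature length of 33 follows the text, whereas the EEG feature length used here is only a placeholder; the function name is hypothetical.

```python
def feature_level_fusion(eeg_feature, eye_feature):
    """Concatenate the EEG and eye movement feature vectors of one sample."""
    return list(eeg_feature) + list(eye_feature)

eeg = [0.1] * 8    # placeholder length, not the real EEG feature dimension
eye = [0.2] * 33   # 33-dimensional eye movement feature
fused = feature_level_fusion(eeg, eye)
```

Because the concatenated vector simply stacks the two modalities, any difference in scale between them is passed on to the classifier, which is one motivation for learning a shared representation instead.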
Table III presents the performance of our proposed EEG feature compared with the single-channel-based state-of-the-art DE feature   for the multimodal emotion recognition task with respect to the SEED-V dataset. These results reveal that our proposed EEG connectivity feature outperforms the DE feature in combination with eye movement data to classify the five emotions, whether using the 62-channel or 18-channel-based functional connectivity networks.
|Work|Method|Mean (%)|Std (%)|
|---|---|---|---|
|Zhao et al.|FLF|73.65|8.90|
|Liu et al.|Max|73.17|9.27|
|Our method (62-channel)|FLF|78.03|6.07|
|Our method (18-channel)|FLF|78.02|7.30|
To investigate the capabilities of EEG connectivity feature and eye movement data in detecting each specific emotion, the confusion matrices are displayed in Fig. 8. Here, the EEG feature is the strength feature with 62 channels and the correlation connectivity metric.
It could be observed that both EEG and eye movement data exhibit potential in classifying the emotions of fear, happiness, and neutrality. In particular, the EEG connectivity feature dominates the recognition of the happiness emotion, while eye movement data excel at detecting the fear emotion. The confusion graph of these two modalities is also presented in Fig. 9.
In comparison with the single modality affective model, the last two confusion matrices in Fig. 8 indicate that the multimodal fusion strategies could indeed improve the classification performance for all of the five emotions. These results demonstrate the complementary representation properties of the EEG connectivity feature and eye movement data in classifying the five emotions.
5.2.2 Experimental Results on SEED and DEAP Datasets
The classification performances of our work and several existing works with respect to the SEED dataset are displayed in Table IV. The existing works are based on the DE feature and eye movements. We could observe that the best performance of is achieved by our work, which combines the strength feature with eye movement data to detect three emotions (happiness, neutrality, and sadness). These results further verify that the combination of EEG and eye movements could enhance the classification performance.
|Work|Method|Mean (%)|Std (%)|
|---|---|---|---|
|Lu et al.|FLF|83.70|-|
|Song et al.|DGCNN|90.40|8.49|
|Liu et al.|BDAE|91.01|8.91|
|Tang et al.|Bimodal-LSTM|93.97|7.03|
|Liu et al.|DCCA|94.58|6.16|
Table V presents the classification performances of our work and several existing works with respect to the DEAP dataset. The existing works are based on the combination of peripheral physiological features with the PSD   or DE    features. The highest classification accuracy of the two binary classification tasks, for the arousal-level and for the valence-level, are both obtained by our work. These results reveal that the strength feature is also superior to the PSD and DE features in fusion with peripheral physiological signals.
|Work|Method|Arousal mean (%)|Arousal std (%)|Valence mean (%)|Valence std (%)|
|---|---|---|---|---|---|
|Xing et al.|SAE-LSTM|74.38|-|81.10|-|
|Liu et al.|BDAE|80.50|3.39|85.20|4.47|
|Tang et al.|Bimodal-LSTM|83.23|2.61|83.82|5.01|
|Yin et al.|MESAE|84.18|-|83.04|-|
|Liu et al.|DCCA|84.33|2.25|85.62|3.48|
5.3 Critical Frequency Bands
In this section, we evaluate the critical frequency band of the EEG functional connectivity network feature on the SEED-V dataset.
Fig. 10 presents the classification performance of different frequency bands using the strength feature with correlation as the connectivity metric. The result demonstrates that the and frequency bands are superior in classifying the five emotions in comparison with other bands, which is in accordance with the results attained by the DE feature  . Additionally, the frequency bands with the 18-channel approach achieve comparable performance with that of the 62-channel approach, which implies the possibility of applying 18 electrodes to detect emotions in real scenario applications.
5.4 Brain Functional Connectivity Patterns
In this section, we investigate the brain functional connectivity patterns based on the SEED-V dataset. The emotion-relevant critical subnetworks are selected through three phases: averaging, thresholding, and merging. The numbers of subnetworks attained after each phase are 25, 25, and 5, respectively. Specifically, 25 corresponds to the five emotions in the five frequency bands, while 5 refers to the five frequency bands, since the critical connections of the five emotions in each frequency band are merged together during the third phase.
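The three phases can be illustrated for a single frequency band with the sketch below. The exact thresholding rule is not restated in this section, so the sketch assumes that thresholding keeps a fixed number of the strongest averaged connections by absolute value; this rule, the toy matrices, and all function names are assumptions for illustration.

```python
def average_matrices(mats):
    """Averaging phase: element-wise mean of one emotion's connectivity matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]

def top_edges(avg, keep):
    """Thresholding phase (assumed rule): keep the `keep` strongest edges."""
    n = len(avg)
    edges = [(abs(avg[i][j]), i, j) for i in range(n) for j in range(i + 1, n)]
    edges.sort(reverse=True)
    return {(i, j) for _, i, j in edges[:keep]}

def merge(edge_sets):
    """Merging phase: union of the critical edges of all emotions."""
    merged = set()
    for edges in edge_sets:
        merged |= edges
    return merged

# Toy samples for two emotions in one band (values are illustrative only).
happy = [[[0, 0.9, 0.1], [0.9, 0, 0.2], [0.1, 0.2, 0]],
         [[0, 0.7, 0.3], [0.7, 0, 0.2], [0.3, 0.2, 0]]]
sad = [[[0, 0.1, -0.6], [0.1, 0, 0.2], [-0.6, 0.2, 0]],
       [[0, 0.3, -0.8], [0.3, 0, 0.2], [-0.8, 0.2, 0]]]
happy_edges = top_edges(average_matrices(happy), keep=1)
sad_edges = top_edges(average_matrices(sad), keep=1)
merged = merge([happy_edges, sad_edges])
```

Applied to five emotions in each of five frequency bands, averaging and thresholding give 25 edge sets, and merging across emotions within each band leaves 5.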
In this paper, the 25 subnetworks attained after the thresholding phase are adopted to analyze the frequency-specific brain functional connectivity patterns in association with the five emotions. To analyze both the positive and negative connections, we visualize the brain functional connectivity networks based on the connectivity metric of correlation.
Considering that the subnetworks are selected using samples from training sets of all participants and that a three-fold cross validation strategy is utilized, the emotion-relevant critical subnetworks are calculated three times. The results demonstrate that stable connectivity patterns are exhibited across these three calculations.
Fig. 11 presents the 25 critical subnetworks associated with the five emotions in the five frequency bands averaged over three folds. For better analysis of the distinct connectivity patterns for each emotion in each frequency band, we display all of the critical connections except for the intersections among the five emotions.
It could be observed that the positive correlation connectivity is much higher in the frontal lobes in the band for the negative affective states, including the emotions of disgust, fear, and sadness. In particular, for the disgust emotion, stronger positive connectivity is exhibited in the bands within both the left and right brain regions, and stronger negative connectivity is observed between the left and right brain regions. However, the fear emotion in the band is dominated by the stronger negative connectivity, and there are much weaker positive connections compared with those of the disgust emotion. In addition, much more positive connectivity in the band is exhibited for the fear emotion.
The fact that the functional connectivity patterns are quite similar for the sad and neutral emotions could account for the confusion between the sad and neutral emotions, which is consistent with previous findings  . Nevertheless, the connectivity in the band tends to be positive within the frontal areas and negative in the left brain regions for the sadness emotion, while negative in larger brain areas for the neutral emotion.
In terms of the happiness emotion, the entire cerebral areas are much more active in the band with both positive and negative correlation connectivity. Moreover, in the band, negative connectivity is revealed between the frontal and parietal lobes, with positive connectivity between the frontal and temporal lobes. In the band, the functional connectivity patterns for the emotions of happiness, fear, and disgust are more similar, which may originate from the fact that amygdala voxels contribute to these three emotions . Overall, these results are in accordance with findings in the literature based on fMRI that the brain regions contributing to the emotion classification are located predominantly in the frontal and parietal lobes .
In this paper, we have proposed a novel emotion-relevant critical subnetwork selection algorithm and evaluated three EEG connectivity features (strength, clustering coefficient, and eigenvector centrality) on three public datasets: SEED, SEED-V, and DEAP. The experimental results have revealed that emotion-associated brain functional connectivity patterns do exist. The strength feature is the best EEG connectivity feature and outperforms the state-of-the-art DE feature based on single-channel analysis. Furthermore, we have performed multimodal emotion recognition using the DCCA model based on the EEG connectivity feature. The classification accuracies are on the SEED dataset, on the SEED-V dataset, and and on the DEAP dataset. These results have demonstrated the complementary representation properties between the EEG connectivity feature and eye movement data. Additionally, the results have indicated that the brain functional connectivity networks based on the 18-channel approach are promising for multimodal emotion recognition applications in aBCI systems in real-world scenarios.
-  K. Panetta, “5 trends appear on the Gartner hype cycle for emerging technologies, 2019,” https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging-technologies-2019.
-  S. Poria, E. Cambria, R. Bajpai, and A. Hussain, “A review of affective computing: From unimodal analysis to multimodal fusion,” Information Fusion, vol. 37, pp. 98–125, 2017.
-  C. Mühl, B. Allison, A. Nijholt, and G. Chanel, “A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges,” Brain-Computer Interfaces, vol. 1, no. 2, pp. 66–84, 2014.
-  M. M. Shanechi, “Brain-machine interfaces from motor to mood,” Nature Neuroscience, vol. 22, no. 10, pp. 1554–1564, 2019.
-  T. Thanapattheerakul, K. Mao, J. Amoranto, and J. H. Chan, “Emotion in a century: A review of emotion recognition,” in Proceedings of the 10th International Conference on Advances in Information Technology. ACM, 2018, pp. 1–8.
-  B. Ko, “A brief review of facial emotion recognition based on visual information,” Sensors, vol. 18, no. 2, p. 401, 2018.
-  B. W. Schuller, “Speech emotion recognition: Two decades in a nutshell, benchmarks, and ongoing trends,” Communications of the ACM, vol. 61, no. 5, pp. 90–99, 2018.
-  L. Shu, J. Xie, M. Yang, Z. Li, Z. Li, D. Liao, X. Xu, and X. Yang, “A review of emotion recognition using physiological signals,” Sensors, vol. 18, no. 7, p. 2074, 2018.
-  S. M. Alarcao and M. J. Fonseca, “Emotions recognition using EEG signals: A survey,” IEEE Transactions on Affective Computing, 2017, DOI: 10.1109/TAFFC.2017.2714671.
-  X. Cheng, Y. Wang, S. Dai, P. Zhao, and Q. Liu, “Heart sound signals can be used for emotion recognition,” Scientific Reports, vol. 9, no. 1, p. 6486, 2019.
-  S. Poria, I. Chaturvedi, E. Cambria, and A. Hussain, “Convolutional MKL based multimodal emotion recognition and sentiment analysis,” in 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016, pp. 439–448.
-  Z. Zhang, F. Ringeval, B. Dong, E. Coutinho, E. Marchi, and B. Schüller, “Enhanced semi-supervised learning for multimodal emotion recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 5185–5189.
-  F. Povolny, P. Matejka, M. Hradis, A. Popková, L. Otrusina, P. Smrz, I. Wood, C. Robin, and L. Lamel, “Multimodal emotion recognition for AVEC 2016 challenge,” in Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. ACM, 2016, pp. 75–82.
-  M. Soleymani, M. Pantic, and T. Pun, “Multimodal emotion recognition in response to videos,” IEEE Transactions on Affective Computing, vol. 3, no. 2, pp. 211–223, 2011.
-  W.-L. Zheng, J.-Y. Zhu, and B.-L. Lu, “Identifying stable patterns over time for emotion recognition from EEG,” IEEE Transactions on Affective Computing, vol. 10, no. 3, pp. 417–429, 2019.
-  B. García-Martínez, A. Martinez-Rodrigo, R. Alcaraz, and A. Fernández-Caballero, “A review on nonlinear methods using electroencephalographic recordings for emotion recognition,” IEEE Transactions on Affective Computing, 2019, DOI: 10.1109/TAFFC.2018.2890636.
-  Y. Lu, W.-L. Zheng, B. Li, and B.-L. Lu, “Combining eye movements and EEG to enhance emotion recognition,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015, pp. 1170–1176.
-  J.-M. López-Gil, J. Virgili-Gomá, R. Gil, T. Guilera, I. Batalla, J. Soler-González, and R. García, “Method for improving EEG based emotion recognition by combining it with synchronized biometric and eye tracking technologies in a non-invasive and low cost way,” Frontiers in Computational Neuroscience, vol. 10, p. 85, 2016.
-  I. B. Mauss and M. D. Robinson, “Measures of emotion: A review,” Cognition and Emotion, vol. 23, no. 2, pp. 209–237, 2009.
-  Y. Zhan, R. C. Paolicelli, F. Sforazzini, L. Weinhard, G. Bolasco, F. Pagani, A. L. Vyssotski, A. Bifone, A. Gozzi, D. Ragozzino et al., “Deficient neuron-microglia signaling results in impaired functional brain connectivity and social behavior,” Nature Neuroscience, vol. 17, no. 3, p. 400, 2014.
-  E. S. Finn, X. Shen, D. Scheinost, M. D. Rosenberg, J. Huang, M. M. Chun, X. Papademetris, and R. T. Constable, “Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity,” Nature Neuroscience, vol. 18, no. 11, p. 1664, 2015.
-  S. Smith, “Linking cognition to brain connectivity,” Nature Neuroscience, vol. 19, no. 1, p. 7, 2016.
-  T.-H. Li, W. Liu, W.-L. Zheng, and B.-L. Lu, “Classification of five emotions from EEG and eye movement signals: Discrimination ability and stability over time,” in 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, 2019, pp. 607–610.
-  S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “DEAP: A database for emotion analysis; using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2011.
-  B. Hasani and M. H. Mahoor, “Facial expression recognition using enhanced deep 3D convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 30–40.
-  G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, “Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network,” in 2016 IEEE international Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 5200–5204.
-  A. Schirmer and R. Adolphs, “Emotion perception from face, voice, and touch: comparisons and convergence,” Trends in Cognitive Sciences, vol. 21, no. 3, pp. 216–228, 2017.
-  Q. Zhang, X. Chen, Q. Zhan, T. Yang, and S. Xia, “Respiration-based emotion recognition with deep learning,” Computers in Industry, vol. 92, pp. 84–90, 2017.
-  M. Nardelli, G. Valenza, A. Greco, A. Lanata, and E. P. Scilingo, “Recognizing emotions induced by affective sounds through heart rate variability,” IEEE Transactions on Affective Computing, vol. 6, no. 4, pp. 385–394, 2015.
-  J. Atkinson and D. Campos, “Improving BCI-based emotion recognition by combining EEG feature selection and kernel classifiers,” Expert Systems with Applications, vol. 47, pp. 35–41, 2016.
-  Y.-J. Liu, M. Yu, G. Zhao, J. Song, Y. Ge, and Y. Shi, “Real-time movie-induced discrete emotion recognition from EEG signals,” IEEE Transactions on Affective Computing, vol. 9, no. 4, pp. 550–562, 2017.
-  X.-W. Wang, D. Nie, and B.-L. Lu, “Emotional state classification from EEG data using machine learning approach,” Neurocomputing, vol. 129, pp. 94–106, 2014.
-  R. Jenke, A. Peer, and M. Buss, “Feature extraction and selection for emotion recognition from EEG,” IEEE Transactions on Affective Computing, vol. 5, no. 3, pp. 327–339, 2014.
-  W.-L. Zheng and B.-L. Lu, “Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks,” IEEE Transactions on Autonomous Mental Development, vol. 7, no. 3, pp. 162–175, 2015.
-  R.-N. Duan, J.-Y. Zhu, and B.-L. Lu, “Differential entropy feature for EEG-based emotion classification,” in 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, 2013, pp. 81–84.
-  P. Ackermann, C. Kohlschein, J. Á. Bitsch, K. Wehrle, and S. Jeschke, “EEG-based automatic emotion recognition: Feature extraction, selection and classification methods,” in 2016 IEEE 18th International Conference on E-health Networking, Applications and Services (Healthcom). IEEE, 2016, pp. 1–6.
-  M. Rubinov and O. Sporns, “Complex network measures of brain connectivity: uses and interpretations,” Neuroimage, vol. 52, no. 3, pp. 1059–1069, 2010.
-  M. Murias, S. J. Webb, J. Greenson, and G. Dawson, “Resting state cortical connectivity reflected in EEG coherence in individuals with autism,” Biological Psychiatry, vol. 62, no. 3, pp. 270–273, 2007.
-  Z. Yin, J. Li, Y. Zhang, A. Ren, K. M. Von Meneen, and L. Huang, “Functional brain network analysis of schizophrenic patients with positive and negative syndrome based on mutual information of EEG time series,” Biomedical Signal Processing and Control, vol. 31, pp. 331–338, 2017.
-  T. C. Ho, C. G. Connolly, E. H. Blom, K. Z. LeWinn, I. A. Strigo, M. P. Paulus, G. Frank, J. E. Max, J. Wu, M. Chan et al., “Emotion-dependent functional connectivity of the default mode network in adolescent depression,” Biological Psychiatry, vol. 78, no. 9, pp. 635–646, 2015.
-  A. E. Whitton, S. Deccy, M. L. Ironside, P. Kumar, M. Beltzer, and D. A. Pizzagalli, “EEG source functional connectivity reveals abnormal high-frequency communication among large-scale functional networks in depression,” Biological Psychiatry. Cognitive Neuroscience and Neuroimaging, vol. 3, no. 1, p. 50, 2018.
-  Y. Dasdemir, E. Yildirim, and S. Yildirim, “Analysis of functional brain connections for positive–negative emotions using phase locking value,” Cognitive Neurodynamics, vol. 11, no. 6, pp. 487–500, 2017.
-  Y.-Y. Lee and S. Hsieh, “Classifying different emotional states by means of EEG-based functional connectivity patterns,” PloS One, vol. 9, no. 4, p. e95415, 2014.
-  P. Li, H. Liu, Y. Si, C. Li, F. Li, X. Zhu, X. Huang, Y. Zeng, D. Yao, Y. Zhang, and P. Xu, “EEG based emotion recognition by combining functional connectivity network and local activations,” IEEE Transactions on Biomedical Engineering, 2019, DOI: 10.1109/TBME.2019.2897651.
-  S.-E. Moon, S. Jang, and J.-S. Lee, “Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 2556–2560.
-  X. Wu, W.-L. Zheng, and B.-L. Lu, “Identifying functional brain connectivity patterns for EEG-based emotion recognition,” in 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, 2019, pp. 235–238.
-  A. Widmann, E. Schröger, and N. Wetzel, “Emotion lies in the eye of the listener: Emotional arousal to novel sounds is reflected in the sympathetic contribution to the pupil dilation response and the p3,” Biological Psychology, vol. 133, pp. 10–17, 2018.
-  M. Oliva and A. Anikin, “Pupil dilation reflects the time course of emotion recognition in human vocalizations,” Scientific Reports, vol. 8, no. 1, p. 4871, 2018.
-  M. H. Black, N. T. Chen, K. K. Iyer, O. V. Lipp, S. Bölte, M. Falkmer, T. Tan, and S. Girdler, “Mechanisms of facial emotion recognition in autism spectrum disorders: Insights from eye tracking and electroencephalography,” Neuroscience & Biobehavioral Reviews, vol. 80, pp. 488–515, 2017.
-  W.-L. Zheng, B.-N. Dong, and B.-L. Lu, “Multimodal emotion recognition using EEG and eye tracking data,” in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2014, pp. 5040–5043.
-  L.-M. Zhao, R. Li, W.-L. Zheng, and B.-L. Lu, “Classification of five emotions from EEG and eye movement signals: complementary representation properties,” in 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, 2019, pp. 611–614.
-  R. Adolphs and D. J. Anderson, The Neuroscience of Emotion: A New Synthesis. Princeton University Press, 2018.
-  L.-A. Perez-Gaspar, S.-O. Caballero-Morales, and F. Trujillo-Romero, “Multimodal emotion recognition with evolutionary computation for human-robot interaction,” Expert Systems with Applications, vol. 66, pp. 42–61, 2016.
-  P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, “End-to-end multimodal emotion recognition using deep neural networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1301–1309, 2017.
-  H. Ranganathan, S. Chakraborty, and S. Panchanathan, “Multimodal emotion recognition using deep learning architectures,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2016, pp. 1–9.
-  Y. Huang, J. Yang, P. Liao, and J. Pan, “Fusion of facial expressions and EEG for multimodal emotion recognition,” Computational Intelligence and Neuroscience, vol. 2017, 2017, DOI: 10.1155/2017/2107451.
-  W. Liu, W.-L. Zheng, and B.-L. Lu, “Emotion recognition using multimodal deep learning,” in International Conference on Neural Information Processing. Springer, 2016, pp. 521–529.
-  H. Tang, W. Liu, W.-L. Zheng, and B.-L. Lu, “Multimodal emotion recognition using deep neural networks,” in International Conference on Neural Information Processing. Springer, 2017, pp. 811–819.
-  W.-L. Zheng, W. Liu, Y. Lu, B.-L. Lu, and A. Cichocki, “Emotionmeter: A multimodal framework for recognizing human emotions,” IEEE Transactions on Cybernetics, vol. 49, no. 3, pp. 1110–1122, 2018.
-  J.-L. Qiu, W. Liu, and B.-L. Lu, “Multi-view emotion recognition using deep canonical correlation analysis,” in International Conference on Neural Information Processing. Springer, 2018, pp. 221–231.
-  S. B. Eysenck, H. J. Eysenck, and P. Barrett, “A revised version of the psychoticism scale,” Personality and Individual Differences, vol. 6, no. 1, pp. 21–29, 1985.
-  M. M. Bradley, L. Miccoli, M. A. Escrig, and P. J. Lang, “The pupil as a measure of emotional arousal and autonomic activation,” Psychophysiology, vol. 45, no. 4, pp. 602–607, 2008.
-  E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, p. 186, 2009.
-  J.-J. Tong, Y. Luo, B.-Q. Ma, W.-L. Zheng, B.-L. Lu, X.-Q. Song, and S.-W. Ma, “Sleep quality estimation with adversarial domain adaptation: From laboratory to real scenario,” in 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018, pp. 1–8.
-  W. Liu, J.-L. Qiu, W.-L. Zheng, and B.-L. Lu, “Multimodal emotion recognition using deep canonical correlation analysis,” ArXiv Preprint ArXiv:1908.05349, 2019.
-  R.-N. Duan, X.-W. Wang, and B.-L. Lu, “EEG-based emotion recognition in listening music by using support vector machine and linear dynamic system,” in International Conference on Neural Information Processing. Springer, 2012, pp. 468–475.
-  H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 8, pp. 1226–1238, 2005.
-  G. Andrew, R. Arora, J. Bilmes, and K. Livescu, “Deep canonical correlation analysis,” in International Conference on Machine Learning, 2013, pp. 1247–1255.
-  H. Chao, L. Dong, Y. Liu, and B. Lu, “Emotion recognition from multiband EEG signals using CapsNet,” Sensors, vol. 19, no. 9, p. 2212, 2019.
-  T. Song, W. Zheng, S. Peng, and C. Zhen, “EEG emotion recognition using dynamical graph convolutional neural networks,” IEEE Transactions on Affective Computing, 2018, DOI: 10.1109/TAFFC.2018.2817622.
-  X. Xing, Z. Li, T. Xu, L. Shu, B. Hue, and X. Xu, “SAE plus LSTM: A new framework for emotion recognition from multi-channel EEG,” Frontiers in Neurorobotics, vol. 13, 2019.
-  Z. Yin, M. Zhao, Y. Wang, J. Yang, and J. Zhang, “Recognition of emotions using multimodal physiological signals and an ensemble deep learning model,” Computer Methods and Programs in Biomedicine, vol. 140, pp. 93–110, 2017.
-  H. Saarimäki, A. Gotsopoulos, I. P. Jääskeläinen, J. Lampinen, P. Vuilleumier, R. Hari, M. Sams, and L. Nummenmaa, “Discrete neural signatures of basic emotions,” Cerebral Cortex, vol. 26, no. 6, pp. 2563–2573, 2015.