Multi-Scale Neural Network for EEG Representation Learning in BCI

03/02/2020 ∙ by Wonjun Ko, et al. ∙ Korea University

Recent advances in deep learning have had a methodological and practical impact on brain-computer interface research. Among the various deep network architectures, convolutional neural networks have been well suited for spatio-spectral-temporal electroencephalogram signal representation learning. Most of the existing CNN-based methods described in the literature extract features at a sequential level of abstraction with repetitive nonlinear operations and involve densely connected layers for classification. However, studies in neurophysiology have revealed that EEG signals carry information in different ranges of frequency components. To better reflect these multi-frequency properties in EEGs, we propose a novel deep multi-scale neural network that discovers feature representations in multiple frequency/time ranges and extracts relationships among electrodes, i.e., spatial representations, for subject intention/condition identification. Furthermore, by completely representing EEG signals with spatio-spectral-temporal information, the proposed method can be utilized for diverse paradigms in both active and passive BCIs, contrary to existing methods that are primarily focused on single-paradigm BCIs. To demonstrate the validity of our proposed method, we conducted experiments on various paradigms of active/passive BCI datasets. Our experimental results demonstrated that the proposed method achieved performance improvements when judged against comparable state-of-the-art methods. Additionally, we analyzed the proposed method using different techniques, such as PSD curves and relevance score inspection to validate the multi-scale EEG signal information capturing ability, activation pattern maps for investigating the learned spatial filters, and t-SNE plotting for visualizing represented features. Finally, we also demonstrated our method's application to real-world problems.




I Introduction

A brain–computer interface (BCI) [4] is an emerging technology that establishes a communication pathway between a user and an external device (e.g., a computer) through the acquisition and analysis of brain signals, which are translated into commands the device can understand. Owing to their practicality, electroencephalogram (EEG)-based non-invasive BCIs are widely used [27, 4, 16]. Aricò et al. [2] categorized user-centered BCIs into two types: active/reactive and passive BCIs. In this paper, we focus not only on active BCIs but also on passive BCIs. Two types of brain signals, evoked and spontaneous EEG, are primarily considered for active/reactive BCIs [33]. Evoked BCIs exploit unintentional electrical potentials elicited by external or internal stimuli; examples include steady-state visually evoked potentials (SSVEP) [18, 23] and event-related potentials [18]. Spontaneous BCIs, in contrast, use internal cognitive processes such as event-related desynchronization and synchronization (ERD/ERS) in sensorimotor rhythms, e.g., motor imagery (MI) [18, 5] induced by imagining movements without physical movement. Well-known examples of passive BCIs include classifying sleep stages from sleep/drowsy EEG signals, identifying mental fatigue to alert a driver to a dangerous situation, and detecting seizure onset from epileptic EEG patterns to warn a patient of an impending seizure.

Generally, machine learning-based BCIs consist of five main processing stages [16]: (i) an EEG signal acquisition phase based on each paradigm, (ii) signal preprocessing (e.g., channel selection and band-pass filtering), (iii) feature representation learning, (iv) classifier learning, and finally (v) a feedback stage. Most machine learning-based BCI methods follow these stages; however, they require specific modifications to classify a user's intention/condition for each different paradigm [16]. In other words, machine learning-based methods need prior knowledge of the different EEG paradigms [4, 18, 23, 16, 29]. Therefore, conventional machine learning-based BCIs have discovered EEG representations through highly specialized approaches, e.g., a common spatial pattern (CSP) [4] or its variants [30, 1] for MI signals and a canonical correlation analysis (CCA) [23] for decoding SSVEP signals.

While hand-crafted feature representation learning has played a pivotal role in conventional machine learning frameworks [4, 23, 28], deep learning-based representation learning has achieved remarkable results in the BCI community [27, 16, 33]. These deep learning-based methods integrate the feature extraction step with the classifier learning step such that both are jointly optimized, thereby improving performance. Among the various deep learning methods, convolutional neural networks (CNNs) have the advantage [14, 16, 6] of maintaining the structural and configurational information in the original data. In this respect, developing novel CNN architectures for EEG signal representation has taken center stage in BCI studies [27, 31, 25, 14, 32, 15, 3, 7, 9].

However, some challenges still remain. First, existing CNN-based methods [27, 14, 25, 31, 7, 9] are mostly composed of stacked convolutional layers; in other words, they extract features sequentially. Ignoring multiple ranges of spectral-temporal features can cause a critical problem, because EEG signal features are found in diverse ranges across subjects [12], paradigms [16], and types [2]. For example, Fig. 1 depicts the MI EEG power spectral density (PSD) curves of two different subjects. Clearly, the two plots have different distributions even though the PSDs are estimated from the same task. Therefore, it is important to capture multi-scale spectral information in EEGs for general use in BCI, i.e., for a generic method applicable to various types of BCIs.

In addition, such stacked CNN-based methods [14, 27, 9, 7] have numerous trainable parameters and thus require large amounts of training samples, whereas BCIs generally acquire only a limited number of EEG trials [12]. Therefore, generalizing conventional stacked CNN-based methods in BCI is quite difficult, because deep networks are data-hungry and rarely generalize well when data are scarce.

Finally, interpreting a learned stacked CNN from a neurophysiologically appropriate standpoint [11] is quite complicated, because the CNN identifies complex patterns of the data in a latent space, making a direct explanation difficult [11].

Fig. 1:

Power spectral density (PSD) curves of two different subjects’ MI EEG samples. The solid red line denotes the mean PSD and the shaded region exhibits the standard deviation for all trials. Clearly, these two different subjects show quite different PSD patterns for the same paradigm (motor imagery).

In this study, we propose a novel deep learning-based BCI method to mitigate the previously discussed difficulties. The main contributions of our study are as follows:

  • First, we propose a novel CNN architecture that is applicable independently of the input paradigm or type of EEG and can represent multi-scale spatio-spectral-temporal features.

  • Second, the proposed method achieved competitive performance on five different datasets covering four different paradigms (two for active BCIs and two for passive BCIs). The proposed method outperformed or matched state-of-the-art linear and deep learning methods that were individually designed for each specific paradigm.

  • Last, we analyze the proposed network using a variety of techniques.

The rest of this paper is organized as follows: Section II reviews previous research on EEG representation learning via linear model-based or deep learning-based methods. In Section III, we propose a novel and compact deep CNN that classifies multi-paradigm EEG by representing multi-scale spatio-spectral-temporal features. Section IV presents the experimental settings and results, comparing the proposed method with comparable baselines. In Section V, we analyze the proposed method from several points of view. Finally, Section VI summarizes the study and suggests future research directions.

II Related Work

Learning a class-discriminative feature representation of EEG is still challenging in both theory and practice. Numerous prior studies have attempted to extract features from EEGs. In this section, we briefly discuss linear methods and deep learning models used for EEG signal representation.

II-A Linear Models

Over the past decades, CSP [4] and its variants [1, 30] have played an essential role in decoding MI. Blankertz et al. [4] and Ang et al. [1] independently used spatial filtering-based methods for classifying MI. Ang et al. [1] band-pass filtered the EEG data before applying CSP, thereby attempting to decode EEG signals in a spatio-spectral manner; they named the proposed method filter bank CSP (FBCSP). Furthermore, Suk and Lee [30] decoded MI by jointly optimizing multiple spectral filters in a Bayesian framework.

CCA is commonly utilized for detecting SSVEP [23] owing to its ability to be deployed without a calibration stage. The standard CCA method [23] uses sinusoidal reference signals and estimates the canonical correlation between these references and the input EEG signals to identify the evoked frequency in SSVEP EEGs.

In addition, entropy calculation-based approaches have frequently been used to characterize sleep stages. Sanders et al. [26] classified sleep stages using spectral-temporal features of EEGs learned from short-time Fourier transformation. Furthermore, Zheng and Lu [34] focused on identifying a driver's mental fatigue during driving. They [34] applied filter banks to EEG signals to extract spectral information and then transformed the filtered EEG signals into the spectral space, i.e., estimated the PSD of the filtered signals. By doing so, Zheng and Lu [34] effectively assessed the regression score of the driver's mental states, which were labeled using the PERCLOS index, a measure of neurophysiological fatigue.

Earlier, Shoeb and Guttag [28] applied a machine learning approach to extract and classify the spatio-spectral-temporal features of epileptic seizure EEG signals. Specifically, they [28] used filter banks in a channel-wise manner to capture spatio-spectral information. Then, by encoding the temporal evolution of the extracted spatio-spectral feature vectors, they effectively constructed spatio-spectral-temporal features of epileptic seizure EEG signals and classified seizure and non-seizure features using a support vector machine (SVM). Recently, spectral features derived from a principal component analysis (PCA) [17] exhibited superior performance for seizure onset detection. In particular, Lee et al. [17] band-pass filtered the raw signals and calculated the PSD; they [17] then applied PCA to extract the spectral features of the EEG signals.

These practical linear model-based BCI methods [4, 1, 30, 23, 17, 26, 34] have demonstrated credible performance. However, they require prior neurophysiological knowledge [16], because their feature extraction stages are specifically designed for each EEG paradigm. Conversely, our method does not need to be specialized for different paradigms.

II-B Deep and Hierarchical Models

Recently, deep learning methods, especially CNNs, have achieved promising results in EEG signal decoding research. For instance, Schirrmeister et al. [27] introduced Shallow ConvNet, Deep ConvNet, Hybrid ConvNet, and Residual ConvNet and evaluated how well these CNNs decoded MI. Ko et al. [14] proposed the deep recurrent spatio-temporal neural network (RSTNN), a novel CNN architecture for MI classification inspired by the recurrent convolutional neural network [20].

While the standard CCA [23] has obtained state-of-the-art performance in SSVEP BCI, Kwak et al. [15] developed a CNN for SSVEP feature representation learning. These authors simply combined spatial and temporal convolutions to enable the system to learn data patterns in the latent space, thereby correctly generalizing EEG signal features. Meanwhile, Waytowich et al. [32] applied EEGNet [16] to the SSVEP paradigm and achieved higher performance than the standard CCA [23].

Supratak et al. [31] developed a deep neural network for sleep stage detection. More precisely, they combined a CNN for representation learning with a recurrent neural network for sequential residual learning [31]. Furthermore, they trained the deep learning model in two separate steps, optimizing it by individual pre-training and fine-tuning. In the meantime, Gao et al. [9] proposed an EEG-based spatio-temporal convolutional neural network (ESTCNN) for driver fatigue evaluation. The ESTCNN [9] convolves the band-pass filtered EEG to represent temporal dependencies and flattens the extracted features for spatial feature fusion; lastly, densely connected layers identify the user's condition [9].

To detect seizure types, Asif et al. [3] proposed multi-spectral deep feature learning using a deep CNN, SeizureNet. These authors [3] transformed the EEG signals into the spectral space using saliency-encoded spectrogram generation and fed the extracted spectral features to a deep neural network. In the meantime, Emami et al. [7] independently proposed another CNN-based approach for detecting seizure onset. They [7] band-pass filtered and segmented the input EEG patterns, and then used a deep CNN for classification.

Recently, Lawhern et al. [16, 32] proposed a novel CNN called EEGNet. Unlike other linear or deep learning-based methods, EEGNet classifies various EEG paradigms using a single architecture, i.e., without being specifically tuned for different paradigms. Further, Lawhern et al. [16] introduced a separable convolution [6] and used it as a parameter reduction method.

The deep and hierarchical models decoded EEG signals well without any custom feature extraction stage for their respective paradigms [27, 14, 15, 3, 7, 31, 9] or even for various paradigms [16, 32]. However, these deep CNNs extracted EEG features at a sequential level of abstraction using stacked convolutional layers, without exploiting multi-scale spectral representations. Conversely, the proposed method exploits multi-scale spatio-spectral-temporal features irrespective of the input EEG paradigm.

Fig. 2: Architectural framework of our multi-scale neural network (MSNN). In the proposed network, an input EEG is first temporally convolved to expand the number of feature maps, where ρ, F_1, and lReLU denote the sampling frequency, the number of output feature maps of the first layer, and the leaky rectified linear unit activation function, respectively. Then, a set of temporal separable convolutions extracts spectral-temporal features (k_l and F_l respectively denote the kernel size and the number of output feature maps of the l-th temporal separable convolution). At the same time, a set of spatial convolutions represents spatial features, where C denotes the number of acquired EEG channels. Then, the multi-scale features are concatenated and fed into the global average pooling layer [21]. Finally, the dense layer classifies the class of the input EEG by exploiting the multi-scale features, where n denotes the number of output nodes.

III Methods

In this section, we propose a deep multi-scale neural network (MSNN) that can represent EEG features from different paradigms by exploiting spatio-spectral-temporal information at multiple scales.

III-A Multi-Scale Neural Network

As mentioned previously, FBCSP [1] is one of the most successful models for exploiting multi-scale EEG features, especially for MI. Thus, many successful MI EEG decoding algorithms [27, 30], and even classification algorithms for other paradigms [16], are inspired by the FBCSP [1] model. In this study, the proposed multi-scale neural network (MSNN) also learns multi-scale feature representations. However, the network automatically learns discriminative multiple spectral filters from the data, rather than manually defining multi-frequency bounds as in FBCSP [1]. Our proposed method consists of three types of blocks: (1) a spectral-temporal feature representation block, (2) a spatial feature representation block, and (3) a classification block, as depicted in Fig. 2.

First, in the spectral-temporal feature representation block, stacked convolutional layers extract the spectral-temporal features of the EEG data, as in existing EEG classification methods. However, the proposed model exploits the intermediate activations to gather multi-scale spectral information. Then, the spatial feature representation block discovers spatial patterns in the extracted multi-scale features. Finally, these multi-scale spatio-spectral-temporal features are concatenated, pooled, and fed into the densely connected layer for classification.

III-B Spectral-Temporal Feature Representation Block

Given an input EEG trial, we reshape it into a tensor X ∈ R^(1 × C × T), where C and T denote the number of electrode channels and timepoints, respectively.

In the MSNN, the input EEG data are temporally convolved in a channel-wise manner by a temporal convolutional layer to expand the number of feature maps. Thus, the activated features have the form F_1 × C × T, where the kernel length is set relative to the sampling frequency ρ, and F_1 is the feature map dimension of the first temporal convolution layer. The main benefits of using a separable convolution [6, 16] are a significant reduction in the number of tunable weights in the model and, more importantly, an efficient and explicit decoupling of the relationship between the temporal and feature map dimensions of the input features. This is accomplished by learning kernels independently for each feature map. Thus, as in the BCI literature, the separable convolution [6] enables the system to learn temporal kernels individually for each feature map (using a depthwise convolution [6]) and then optimally re-combine the feature maps (using a pointwise convolution [6]).
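To make the depthwise/pointwise decomposition concrete, the following is a minimal numpy sketch of a separable temporal convolution. This is for exposition only, not the implementation used in our experiments; the function name, argument layout, and 'same' padding scheme are illustrative assumptions.

```python
import numpy as np

def depthwise_separable_temporal_conv(x, depth_kernels, point_weights):
    """Separable temporal convolution sketch (illustrative only).

    x             : (F_in, C, T) features (feature maps, channels, timepoints)
    depth_kernels : (F_in, k) one temporal kernel per input feature map (depthwise)
    point_weights : (F_out, F_in) 1x1 mixing weights (pointwise)
    """
    f_in, c, t = x.shape
    k = depth_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))   # 'same' padding in time
    depth = np.empty((f_in, c, t))
    for f in range(f_in):                          # depthwise: one kernel per map
        for ch in range(c):
            depth[f, ch] = np.convolve(xp[f, ch], depth_kernels[f],
                                       mode="valid")[:t]
    # pointwise: re-combine feature maps with a 1x1 convolution
    return np.einsum("of,fct->oct", point_weights, depth)
```

The depthwise step learns temporal kernels individually for each feature map, and the pointwise step optimally re-combines the maps, which is exactly the decoupling described above.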

In this block, by setting the kernel size of the l-th temporal separable convolution to k_l, the l-th temporal separable convolutional layer represents EEG signal features in a range of k_l/ρ̂ sec, hence ρ̂/k_l Hz, where ρ̂ is the frequency property extracted at the first temporal convolutional layer. Therefore, the spectral-temporal feature representation layers can cover different timepoint or frequency ranges by using various kernel sizes for the input EEG data.
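For intuition, the kernel-size-to-scale correspondence can be expressed as a one-line computation (an illustrative helper; `fs` stands in for the relevant frequency property, and the example values below are hypothetical):

```python
def kernel_scale(k, fs):
    """Temporal span (sec) and corresponding lowest frequency (Hz)
    covered by a temporal kernel of length k samples at rate fs."""
    span_sec = k / fs
    return span_sec, 1.0 / span_sec
```

For example, a kernel of 128 samples at 512 Hz spans 0.25 sec and thus reaches frequencies down to about 4 Hz; halving the kernel halves the span and doubles the lowest reachable frequency.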

Additionally, each layer with a different kernel size extracts features in a different frequency and timepoint range. In other words, a spectral-temporal convolution layer with a larger kernel represents longer-term temporal features, i.e., a lower range of spectral features, and vice versa. The MSNN then exploits the intermediate activations from each layer, thus learning multi-scale feature representations.

In addition, a separable convolution [6] factorizes a full convolution into depthwise and pointwise operations; thus, the number of parameters is small compared to a conventional convolution. For instance, while an l-th separable temporal convolution has only k_l · F_{l-1} + F_{l-1} · F_l parameters, a conventional convolution with the same kernel size has k_l · F_{l-1} · F_l parameters, where F_{l-1} denotes the feature map dimension of the (l-1)-th layer.
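The parameter comparison amounts to simple arithmetic, sketched below (bias terms ignored; the exact counting convention for the depthwise step is an assumption for illustration):

```python
def conv_params(k, f_in, f_out, separable):
    """Parameter count for a temporal convolution with kernel length k,
    f_in input feature maps, and f_out output feature maps (no biases)."""
    if separable:
        # depthwise (k weights per input map) + pointwise (f_in x f_out mixing)
        return k * f_in + f_in * f_out
    return k * f_in * f_out  # conventional convolution
```

For instance, with a kernel length of 65 and 16 input/output feature maps, the separable version needs 1,296 weights versus 16,640 for the conventional convolution.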

Furthermore, as described above, the MSNN uses its intermediate activations to exploit multi-scale representations. In other words, the proposed network obtains L spectral-temporal features f_1, …, f_L as follows:

f_l = (s_l ∘ s_{l-1} ∘ ⋯ ∘ s_1 ∘ c)(X),

where s_l, c, and ∘ respectively denote the l-th separable convolution, the first temporal convolution, and the composition of arbitrary functions g and h, i.e., (g ∘ h)(x) = g(h(x)). Thus, by extracting the features f_1, …, f_L, the MSNN effectively represents spectral-temporal features from a multi-scale viewpoint, thereby automatically enhancing generalization. In addition, as all inputs are padded to preserve length before each separable temporal convolution, the output features have the same dimensions for the channels and timepoints, differing only in the feature map dimension. Thus, the l-th spectral-temporal feature has the form F_l × C × T.
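The key architectural difference from a plain stacked CNN, namely keeping every intermediate activation rather than only the last one, can be sketched generically as follows (the stages here are arbitrary placeholder callables, not our actual layers):

```python
def multiscale_features(x, first_conv, separable_convs):
    """Collect every intermediate activation of a stacked pipeline.

    first_conv      : callable for the first temporal convolution c
    separable_convs : list of callables s_1, ..., s_L applied in order
    Returns the list [f_1, ..., f_L] where f_l = (s_l o ... o s_1 o c)(x).
    """
    h = first_conv(x)
    features = []
    for s in separable_convs:
        h = s(h)              # advance one stage deeper
        features.append(h)    # keep the activation at this scale
    return features
```

A usage example with toy stages: `multiscale_features(3, lambda x: x + 1, [lambda h: h * 2, lambda h: h * 2])` returns both intermediate results rather than only the final one.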

III-C Spatial Feature Representation Block

In the spatial feature representation block, a common spatial convolution is used for feature extraction. In this block, the kernel size is constrained to equal the number of EEG channels; hence, a convolution with a kernel of size C × 1 is used. By setting the kernel size to be the same as the number of electrode channels, similar to many existing deep learning-based BCI methods [27, 14, 16], the proposed MSNN extracts spatial information from the original EEG acquisition channel distributions of the multi-scale spectral-temporal features. The MSNN can thus obtain neurophysiologically plausible information from the input data distribution.

Furthermore, the spatial feature representation can be applied without restriction; thus, in the proposed method, we add this block after every extracted spectral-temporal feature f_l:

z_l = v_l(f_l),

where v_l denotes the l-th spatial convolution and z_l is the spatio-spectral-temporal feature estimated by v_l and f_l. We use valid padding for every spatial convolution; thus, the l-th spatio-spectral-temporal feature has the form F_l × 1 × T. By setting the number of spatial convolutions to be identical to the number of spectral-temporal convolutions, unlike much previous research using deep learning for BCI [27, 14, 16, 3, 31], we extract spatial features from every range of spectral-temporal features. In other words, unlike many previous stacked CNNs, the proposed architecture uses every intermediate activated feature set to exploit spatial information, thereby creating the capability to extract various ranges of EEG features at multiple scales.
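A numpy sketch of a spatial convolution whose kernel spans all channels is given below. For simplicity it assumes one spatial filter per feature map, which is an illustrative simplification rather than the exact filter count of our network:

```python
import numpy as np

def spatial_conv(f, w):
    """Spatial convolution sketch: a kernel spanning all C channels.

    f : (F, C, T) spectral-temporal features
    w : (F, C) one spatial weight vector per feature map (illustrative)
    With valid padding, the channel axis collapses: output is (F, 1, T).
    """
    return np.einsum("fct,fc->ft", f, w)[:, None, :]
```

Because the kernel covers every electrode at once, each output value is a weighted combination of all channels at a single timepoint, i.e., a learned spatial filter over the original channel montage.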

III-D Classification Block

For classifier learning, because we have L spatio-spectral-temporal features z_1, …, z_L of different sizes (or the same size when F_1 = ⋯ = F_L), the classifier in the proposed method first concatenates the features along the feature map dimension. Thus, the concatenated feature is represented as:

z = z_1 ⊕ z_2 ⊕ ⋯ ⊕ z_L,

where ⊕ denotes the concatenation operation.

For the classifier network, let the number of output classification nodes be n and assume a single linear mapping layer. Then, because z has the form (Σ_l F_l) × 1 × T, we would need to train n · T · Σ_l F_l parameters (disregarding the bias term for a convenient calculation), which would still require a large number of training samples. Therefore, after representing the multi-scale spatio-spectral-temporal features of the input EEG data, the proposed MSNN applies one extra operation to reduce the trainable weights. Unlike existing deep learning-based BCI methods [16, 27, 14, 15, 3, 31], global average pooling (GAP), which is widely used in the computer vision field [21], is performed.

The GAP layer [21], a type of pooling layer, averages the nodes of each feature map, thus eliminating the need for any window size or stride. By applying GAP [21], our proposed MSNN efficiently extracts significant features. From the BCI perspective, the GAP layer [21] can be understood as a method that emphasizes an important frequency range and its surrounding area in each feature map dimension. Thus, for the extracted multi-scale features in the MSNN, the GAP layer [21] stresses the crucial spectral-temporal parts, yielding concise information for the final decision making.

Additionally, the GAP layer [21] significantly reduces the number of classifier parameters in the proposed MSNN. Specifically, after the GAP layer, the extracted feature z is reduced to the form (Σ_l F_l) × 1 × 1, whereas the feature without GAP has the form (Σ_l F_l) × 1 × T. Therefore, we drastically reduce the number of trainable parameters in the classifier from n · T · Σ_l F_l to n · Σ_l F_l.
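Putting the classification block together, the following is a minimal numpy sketch of concatenation, GAP, and the linear classifier. The shapes and the softmax output are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def classify(feature_list, W, b):
    """Classification-block sketch (illustrative).

    feature_list : list of (F_l, 1, T) spatio-spectral-temporal features
    W            : (n, sum_l F_l) classifier weight matrix
    b            : (n,) classifier bias
    """
    z = np.concatenate(feature_list, axis=0)  # (sum_l F_l, 1, T)
    g = z.mean(axis=(1, 2))                   # GAP: one value per feature map
    logits = W @ g + b                        # single linear mapping layer
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()
```

Note how GAP turns each (1, T) feature map into a single scalar, so the classifier sees only Σ_l F_l inputs instead of T · Σ_l F_l, matching the parameter reduction described above.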

Then, the MSNN prediction ŷ for the input EEG data X is as follows:

ŷ = softmax(W · GAP(z) + b),

where W and b respectively denote the weight matrix and bias of the classifier.

Finally, the cross-entropy loss L used for network training is calculated from the prediction and the label:

L = (1/B) Σ_{b=1}^{B} ℓ(ŷ_b, y_b),

where B and ℓ respectively denote the mini-batch size and the cross-entropy loss function, and ŷ_b and y_b denote the prediction and ground-truth label for the b-th training sample in the mini-batch.¹

¹All codes used in our experiments are available at ‘’.
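The mini-batch cross-entropy above corresponds to the following sketch (predictions assumed to be softmax probabilities; integer class labels assumed for brevity):

```python
import numpy as np

def cross_entropy_loss(preds, labels):
    """Mini-batch cross-entropy (illustrative).

    preds  : (B, n) softmax outputs
    labels : (B,) integer ground-truth class indices
    """
    B = preds.shape[0]
    eps = 1e-12  # numerical safety for log(0)
    return -np.log(preds[np.arange(B), labels] + eps).mean()
```

For a perfectly confident correct prediction the loss approaches zero, while a uniform two-class prediction yields log 2 ≈ 0.693.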

IV Experiments

In this section, we describe the datasets used for performance evaluation, our experimental settings, and baseline settings. Furthermore, we present the performance of our method and competing methods.

IV-A Datasets and Preprocessing

In this study, we used five different publicly available datasets to validate the proposed method on four different EEG data paradigms.

IV-A1 Motor Imagery

First, we used two large MI EEG datasets, GIST-MI [5] (available at ) and KU-MI [18] (available at ); experimental results on the KU-MI dataset [18] are reported in Supplementary B. The GIST-MI [5] dataset consists of two different MI tasks, left-hand and right-hand MI, acquired from 52 subjects. All EEG signals were recorded from 64 Ag/AgCl electrode channels according to the standard 10-20 system and sampled at 512 Hz. Each class contained 100 or 120 trials, and each trial was a 3 sec long MI task. Because this dataset is not separated into training and test samples, we conducted a five-fold cross-validation for a fair evaluation. For the MI datasets, we preprocessed the signals by applying large Laplacian filtering (when the target channel did not have four nearest neighbors, we used the available channels and their average value to filter the target channel), baseline correction by subtracting the mean value of the fixation signal from each MI trial, and band-pass filtering between 4 and 40 Hz. Then, we removed the first and last 0.5 sec from each trial and finally applied Gaussian normalization. We applied the same mean and standard deviation values used for normalizing the training samples to the test samples. The multi-channel EEG signals were only shifted and scaled by their respective channel-wise mean and standard deviation values; thus, the inter-channel relations inherent in the data were preserved.
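The train-then-apply Gaussian normalization described above can be sketched as follows (a simplified illustration: statistics are computed per channel on training trials only and reused unchanged for test trials):

```python
import numpy as np

def fit_channel_norm(train):
    """Channel-wise normalization statistics from training trials.

    train : (N, C, T) array of N EEG trials.
    Returns per-channel mean and std, shaped (1, C, 1) for broadcasting.
    """
    mu = train.mean(axis=(0, 2), keepdims=True)
    sd = train.std(axis=(0, 2), keepdims=True)
    return mu, sd

def apply_channel_norm(x, mu, sd):
    # Pure shift-and-scale per channel: inter-channel relations are preserved.
    return (x - mu) / (sd + 1e-12)
```

Applying the training statistics to the test set, rather than re-estimating them, avoids leaking test-set information into the normalization.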

IV-A2 Steady-State Visually Evoked Potentials

We also used the KU-SSVEP dataset [18] for the SSVEP decoding experiments in this study. This dataset [18] was acquired from 54 subjects and recorded from 62 Ag/AgCl electrode channels using the 10-20 system. The KU-SSVEP dataset [18] contains four EEG classes from target stimuli at 5.45, 6.67, 8.57, and 12 Hz, and each class has 25 EEG trials of training and testing samples per session. We preprocessed the SSVEP signals by applying band-pass filtering between 4 and 15 Hz and selected eight channels in the occipital region ('PO3, POz, PO4, PO9, O1, Oz, O2, and PO10'), because this region is widely used for SSVEP classification [32].

IV-A3 Drowsiness

With respect to passive BCI [2], we considered two different paradigms: seizure EEG signals [29] and vigilance EEG signals [34]. Owing to its theoretical and practical benefits, we conducted experiments identifying drivers' mental fatigue. We used the publicly available SEED-VIG EEG dataset [34] (available at: ) for the drowsy driving task. This dataset [34] consists of 23 experiments, i.e., trials, and each trial was recorded for approximately 2 hours of simulated driving. The EEG signals were acquired from 17 electrode channels according to the 10-20 system and sampled at 200 Hz [34]. For this dataset, we band-pass filtered the EEG signals between 0.5 and 40 Hz, and each epoch was 8 sec in length. Because the dataset was originally labeled with PERCLOS levels [34], we categorized the label vectors into three classes, awake, tired, and drowsy, using two threshold values (0.35 and 0.7) [34]. Then, for each of the 23 experiments, a five-fold cross-validation was used for performance estimation.
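The PERCLOS thresholding into three classes can be sketched directly; note that the treatment of values exactly at a threshold is an assumption here:

```python
def perclos_to_class(p, low=0.35, high=0.7):
    """Map a continuous PERCLOS value to {awake, tired, drowsy} using the
    two thresholds from the text (boundary handling is an assumption)."""
    if p < low:
        return "awake"
    if p < high:
        return "tired"
    return "drowsy"
```

For example, a PERCLOS value of 0.5 falls between the two thresholds and is labeled "tired".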

IV-A4 Seizure

Finally, we conducted seizure onset detection experiments with the widely used and publicly available CHB-MIT dataset [29] (available at: ). The CHB-MIT dataset [29] contains EEG data from 24 subjects, sampled at 256 Hz and acquired from 23 electrode channels (24 or 26 in a few cases) according to the 10-20 system. In this work, we selected the EEG trials with the same 23-channel montage and removed the trials acquired with different montages. Following [28], we used a leave-one-record-out cross-validation. More precisely, we trained the proposed method using all non-seizure records and all seizure records but one, and tested the model on the remaining seizure record [28]. We then repeated this process for the number of seizure records in the dataset, so that each seizure record was tested. For training, epochs were 10 sec in length. During the validation and testing sessions, 10 sec long EEG segments were input into the proposed network with a one-sample (1/256 sec) stride. Then, we observed whether the probability values for each EEG signal timepoint indicated an ictal or a normal state.
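The overlapping test-time windowing amounts to an index computation, sketched below (window and stride are in samples; a 10 sec window at 256 Hz corresponds to 2,560 samples with a one-sample stride):

```python
def sliding_windows(n_timepoints, win, stride):
    """Start indices of overlapping test windows over a record of
    n_timepoints samples, each window `win` samples long."""
    return list(range(0, n_timepoints - win + 1, stride))
```

Each start index yields one window, and the network produces one ictal/normal probability per window, giving a near-continuous detection curve over the record.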

For all datasets, the training samples were randomly selected and split again into training and validation samples for model selection. Specifically, we divided the training samples at a 9:1 ratio for each subject and used them for training and model selection respectively.

Method | GIST-MI [5] Acc. (Mean ± Std.) | KU-SSVEP [18] Acc. (Mean ± Std.) | SEED-VIG [34] (Mean ± Std.) | SEED-VIG [34] False Positive (Drowsy) | CHB-MIT [29] False detections, Mean (Mean latency)
CSP + LDA [4] | .66 ± .14 | - | - | - | -
FBCSP + LDA [1] | .68 ± .15 | - | - | - | -
CCA [23] | - | .94 ± .09 | - | - | -
PSD + SVM [34] | - | - | 31.20 ± 15.47 | 6.74 | -
Shoeb and Guttag [28] | - | - | - | - | 5.35 (5.11)
Shallow ConvNet [27] | .63 ± .11 | .52 ± .20 | 34.89 ± 19.13 | 6.51 | 19.21 (8.48)
Deep ConvNet [27] | .61 ± .07 | .96 ± .08 | 41.31 ± 21.04 | 8.65 | 8.74 (7.52)
RSTNN [14] | .69 ± .12 | .65 ± .20 | 39.84 ± 22.56 | 8.08 | 24.35 (9.31)
ESTCNN [9] | .67 ± .10 | .79 ± .17 | 41.10 ± 21.31 | 8.71 | 6.41 (7.01)
EEGNet [16, 32] | .64 ± .07 | .93 ± .10 | 46.63 ± 22.10 | 11.26 | 5.40 (6.23)
MSNN (Proposed) | .81 ± .12 | .93 ± .08 | 31.10 ± 17.29 | 5.38 | 5.35 (4.98)
TABLE I: Performance evaluations. The Method column lists all classification/detection methods, including the baselines and the proposed method, evaluated on the GIST-MI [5], KU-SSVEP [18], SEED-VIG [34], and CHB-MIT [29] EEG datasets. Each cell reports the average performance and standard deviation over all subjects (or trials for the SEED-VIG [34]). For classification performance on the SSVEP dataset, different kernel sizes were used for EEGNet [32] and the proposed method.

IV-B Experimental Settings

In our work, we compared our method with paradigm-specific linear model-based and deep learning-based methods for each EEG paradigm.

IV-B1 Linear Models - Motor Imagery

First, we built a CSP with linear discriminant analysis (CSP + LDA) [4] and an FBCSP with LDA (FBCSP + LDA) [1] for MI decoding. We used four filters and regularized covariance for the CSP [4] and FBCSP [1]. Additionally, we used nine non-overlapping filter banks in the 4–40 Hz range, i.e., 4–8, 8–12, …, 36–40 Hz, and finally selected 10 features using the mutual information-based feature selection method of FBCSP [1].
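The nine filter-bank bands can be generated programmatically, as in this small illustrative helper:

```python
def filter_banks(lo=4, hi=40, width=4):
    """Non-overlapping band edges (Hz) covering [lo, hi] in steps of `width`:
    (4, 8), (8, 12), ..., (36, 40) with the defaults above."""
    return [(f, f + width) for f in range(lo, hi, width)]
```

With the default arguments this yields exactly the nine 4 Hz-wide bands listed in the text.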


IV-B2 Linear Models - Steady-State Visually Evoked Potentials

We also built a standard CCA [23] for SSVEP classification. We set reference signals for each stimulus, including second harmonics. Furthermore, the standard CCA [23] does not require training samples for optimization; thus, we simply evaluated it over each entire session of the KU-SSVEP dataset [18] for the CCA performance estimation.

IV-B3 Linear Models - Drowsiness

For the drowsy state detection experiment, we estimated the PSD of the filter-banked input EEG data in a channel-wise manner to extract spatio-spectral features and classified the extracted features using an SVM with a radial basis function (RBF) kernel whose kernel parameter was set according to the input feature dimension [34].
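A rough sketch of this pipeline, assuming scipy's Welch PSD and scikit-learn's RBF SVM (whose `gamma="scale"` setting ties the kernel width to the input feature dimension); the sampling rate, trial shapes, and toy data are invented for illustration.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

def psd_features(trials, fs=200):
    """Channel-wise Welch log-PSD features for an array of trials
    (n_trials, n_channels, n_times), flattened per trial."""
    _, p = welch(trials, fs=fs, nperseg=fs, axis=-1)
    return np.log(p).reshape(len(trials), -1)   # (n_trials, n_features)

rng = np.random.default_rng(0)
t = np.arange(400) / 200
# toy data: class 1 has extra low-frequency (3 Hz) power on every channel
x0 = rng.standard_normal((30, 4, 400))
x1 = rng.standard_normal((30, 4, 400)) + 2 * np.sin(2 * np.pi * 3 * t)
X = psd_features(np.concatenate([x0, x1]))
y = np.array([0] * 30 + [1] * 30)
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
acc = clf.score(X, y)     # training accuracy on this separable toy problem
```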

IV-B4 Linear Models - Seizure

In addition, we also reimplemented Shoeb and Guttag [28]'s method for the seizure onset detection experiment. We applied the PSD to the EEG data in a channel-wise manner. Then, the 3-sec time-window time-evolution method [28] was used for capturing temporal information. Finally, the represented spatio-spectral-temporal features were fed into an SVM with an RBF kernel.
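Our reading of the time-evolution idea is that features from consecutive windows are stacked so the classifier sees how the spectrum evolves over a 3-window context; the window count and feature size below are hypothetical.

```python
import numpy as np

def time_evolution(features, w=3):
    """Stack features from w consecutive windows (e.g., three 1-sec PSD
    vectors) into one vector per position, capturing temporal evolution."""
    n = len(features) - w + 1
    return np.stack([features[i:i + w].ravel() for i in range(n)])

per_window = np.arange(5 * 2, dtype=float).reshape(5, 2)  # 5 windows, 2 feats
evolved = time_evolution(per_window, w=3)
print(evolved.shape)   # (3, 6)
```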

IV-B5 Deep Neural Networks - Motor Imagery

We also implemented deep learning-based BCI models for MI (see 'Appendix A: Architectural Details of Deep Models for BCIs' for detailed architectures and learning schedules). Most existing deep learning models [27, 14, 7, 9] have focused on a paradigm-specific BCI task; however, we conducted experiments over all types of datasets for each deep learning model to demonstrate the validity of the proposed method. We built a Shallow ConvNet and a Deep ConvNet as proposed by Schirrmeister et al. [27]. The Shallow ConvNet [27] consists of two convolutions, temporal and spatial, followed by a squaring nonlinear activation, an average pooling, and a logarithmic activation. The Deep ConvNet [27] has five convolutional layers: a temporal and a spatial convolution followed by three additional temporal convolutions. The RSTNN [14] was also used in these experiments; this network [14] consists of three recurrent convolutional layers, each of which comprises three recurrent temporal convolutions [20] and a spatial convolution.

IV-B6 Deep Neural Networks - Steady-State Visually Evoked Potentials

For the SSVEP decoding experiment, we exploited another version of EEGNet tailored for SSVEP EEG [32]. We used the different kernel sizes proposed by Waytowich et al. for this EEGNet [32]. The SSVEP classification performance estimated by this version [32] is reported accordingly in TABLE I.

IV-B7 Deep Neural Networks - Drowsiness

The ESTCNN [9], which was proposed for mental fatigue classification, has three core blocks. Each block consists of three temporal convolutions followed by a max pooling layer, with the exception of the last block, which uses an average pooling layer instead.

IV-B8 Deep Neural Networks - Multi-paradigm

Finally, we also implemented the EEGNet [16] in our study. As previously mentioned, we used different kernel sizes for two different EEGNets, [16] and [32]. Nevertheless, the basic architecture of the network was the same for various EEG paradigms, having a temporal convolution, depthwise spatial convolution [6], and separable temporal convolution [6].

IV-B9 Proposed Multi-Scale Neural Network

While training our proposed network, depicted in Fig. 2, we used a mini-batch size of 16, an exponentially decreasing learning rate (initial value: 0.03, decreasing ratio per epoch: 0.001), and an Adam optimizer. For the first temporal convolution, we used a conventional temporal convolution. Furthermore, we used three spectral-temporal feature representation convolutions with different kernel sizes. Then, for the spatial feature representation block, we used three spatial convolutions, because the number of spatial convolutional layers must match the number of spectral-temporal separable convolutional layers. The proposed method used different kernel sizes for the SSVEP dataset, similar to EEGNet [32], because SSVEP EEG responses are driven by the target stimulus frequencies [18, 23]; for the KU-SSVEP dataset [18], we changed only the kernel sizes of the spectral-temporal feature representation block and used the same settings for everything else. The SSVEP classification performance estimated with these settings is reported accordingly in TABLE I. Additionally, batch normalization was performed after every convolution. Finally, for the classification block, all activated features from the spatio-spectral-temporal block were concatenated and fed into the GAP [21] layer. Then, after flattening, the multi-scale features were linearly mapped by a dense layer. In the proposed network, a leaky rectified linear unit (ReLU) activation function, an L1-L2 regularizer, and a Xavier initializer [10] were used for all tunable parameters, except for the final decision layer, which is activated by a softmax function instead of a leaky ReLU. We selected the model that demonstrated the best performance on the validation, i.e., model selection, samples, as mentioned previously.
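The classification block described above (concatenation of multi-scale feature maps, GAP [21], dense layer, softmax) can be sketched in plain numpy; the filter counts, temporal lengths, and four-class output here are invented shapes, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical feature maps from the three spectral-temporal scales,
# each of shape (n_filters, n_times); the scales yield different lengths
f1, f2, f3 = (rng.standard_normal((8, t)) for t in (100, 90, 80))

# global average pooling over time collapses each map to one value per
# filter, so maps of different temporal lengths can be concatenated
gap = np.concatenate([f.mean(axis=-1) for f in (f1, f2, f3)])   # (24,)

W = rng.standard_normal((4, gap.size))   # dense layer for 4 classes
b = np.zeros(4)
logits = W @ gap + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax over class logits
print(gap.shape, probs.sum())
```

GAP is what makes the multi-scale concatenation work here: each branch contributes a fixed-size vector regardless of its temporal resolution.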

IV-C Experimental Results

IV-C1 Motor Imagery

All experimental results are summarized in TABLE I. Our proposed network clearly outperformed the other baselines for MI EEG signal decoding. Importantly, the proposed network achieved higher accuracy than methods designed specifically for MI classification: CSP [4], FBCSP [1], Shallow ConvNet [27], Deep ConvNet [27], and RSTNN [14]. Given this clear improvement in accuracy, we expect that our proposed method brings MI-based BCIs one step closer to commercialization.

IV-C2 Steady-State Visually Evoked Potentials

Our proposed MSNN achieved a slightly lower performance than CCA [23], Deep ConvNet [27], and EEGNet [32] in the SSVEP classification. However, the performance differences between our MSNN and these three baselines were reasonably small, and the proposed method still achieved a competitive accuracy.

IV-C3 Drowsiness

The proposed MSNN made the smallest number of decision errors in the passive BCI [2] task, in which the method detects a driver's mental fatigue, i.e., drowsiness, from EEG signals. Our proposed method misclassified, on average, 31.10 of 177 test trials. Furthermore, accurately detecting the drowsy state is one of the most important MSNN capabilities for practical use; our proposed model made only 5.38 mistakes out of 35 drowsy trials on average, thus exhibiting the highest precision score.

IV-C4 Seizure

Finally, the MSNN incorrectly identified 5.35 seizures among 178 total test seizure samples. Furthermore, our proposed network was the fastest at detecting seizures, i.e., it exhibited the shortest latency (approximately 4.98 sec on average) among the compared methods. In other words, our proposed method demonstrated the best performance even with the shortest latency. Additionally, the proposed model correctly identified approximately 92% of the seizures within 4.98 sec. We do not present standard deviation values for this seizure detection experiment because each test trial contained a different number of seizures.

V Analyses and Discussions

In this section, we analyzed our proposed network. We determined the feature response by estimating PSD values and relevance scores [22] to show the benefits of multi-scale learning. We also visualized the learned weights and represented features of the proposed method using two methodologies: activation pattern maps [11] and t-SNE plots. Additionally, we examined the practical use of the proposed method, especially in the drowsiness and seizure detection experiments.

Fig. 3: PSD curves (left) and relevance scores [22] (right) for subject 48 (top) and subject 52 (bottom) from the GIST-MI dataset [5]. For the PSD curves, the solid red line and the shaded region exhibit the mean and standard deviation of PSD values of all trials, respectively. We observed that our proposed MSNN concentrates features from the lower frequency range for subject 48 and a wide range for subject 52.

V-A Multi-Scale EEG Feature Extraction

To demonstrate the multi-scale information capture ability of our proposed method, we estimated and plotted PSD values and relevance scores [22] for MI EEG samples. Specifically, we estimated PSD values for subjects 48 and 52 of the GIST-MI dataset [5] using EEG samples from channels over the motor cortex. Additionally, we calculated relevance scores for those subjects by layer-wise relevance propagation [22]. In our results, all classification methods generalized comparably well for subject 48 (baselines: 80%; proposed: 85%), whereas only the proposed method achieved superior performance for subject 52 (baselines: 65%; proposed: 80%). As Fig. 3 shows, subject 48's EEG samples are highly activated in a low-frequency range, while subject 52's samples do not show any clear trend in that range but rather over a wider range. Our proposed network exhibited a high relevance score in the low-frequency range for subject 48, who showed a clear trend there. Furthermore, the relevance scores for subject 52 were roughly uniform over the wider range, where subject 52's PSD demonstrated a less clearly defined trend.

From this observation, we conclude that our proposed MSNN can capture important features over a multi-scale range, not only in a single frequency band of interest. In other words, while other existing methods gather spatio-spectral-temporal information at a sequential level of abstraction, the proposed network exploits multi-scale features, thereby improving its learning ability (randomly selected additional results are reported in Supplementary C).

(a) Topologically visualized activation pattern maps [11] of comparable baselines and the three spatial convolutions in the proposed network. All visualized patterns are estimated from the first subject's first-fold EEG signals in the GIST-MI dataset [5] and normalized to a range between 0 and 1. [a.u.] denotes an arbitrary unit.
(b) Visualization of t-SNE transformed represented features for test SSVEP EEG samples. The first three figures denote extracted features by the first, second, and final spatial convolutional layers of the proposed method. The final figure exhibits the GAP [21]-ed feature, which is used for final decision making.
(c) Normalized averaged confusion matrices estimated by comparable baselines and the proposed method using the SEED-VIG dataset [34].
(d) Changes in probability estimated by comparable baselines and the proposed method. These plots show the probability that the input EEG is ictal. The two dot-dashed lines (magenta) denote the seizure onset and ending, respectively, as labeled by Shoeb [29].
Fig. 4: Investigation of learned weights (Fig. 3(a)) and represented features (Fig. 3(b)), and inspection of the practical usage of the proposed network (Fig. 3(c) and 3(d)).

V-B Activation Patterns

Earlier, Haufe et al. [11] proposed activation patterns, which are based on forward-backward modeling in signal processing. The activation pattern method [11] provides a way to interpret weight matrices in multivariate neuroimaging, as presented in the signal processing literature.
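The forward-model conversion of Haufe et al. [11] can be sketched in numpy as A = cov(X) W cov(S)^(-1), which maps backward-model filters W into interpretable activation patterns A; the data below are synthetic and the shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 6))  # observations
W = rng.standard_normal((6, 2))    # backward-model spatial filters
S = X @ W                          # extracted latent sources

# Haufe et al. [11]: A = cov(X) @ W @ inv(cov(S)) converts the backward
# model (filters) into a forward model (one topographic pattern per filter)
A = np.cov(X.T) @ W @ np.linalg.inv(np.cov(S.T))
print(A.shape)   # (6, 2)
```

Each column of A can then be plotted topographically, which is how the patterns in Fig. 3(a) are produced.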

The proposed method decodes the input EEG signal into the corresponding label, i.e., it infers a user's intention or condition from an observed EEG pattern; it is therefore a backward-process computational model. Hence, for a concrete and meaningful understanding of the learned layers, it is essential to reverse this backward-process model into a forward process. In this work, we estimated and visualized the activation patterns of the learned weights, shown in Fig. 3(a). We extracted the spatial convolutions of the Shallow ConvNet [27], Deep ConvNet [27], RSTNN [14], EEGNet [16], and the proposed model, then estimated their activation patterns and visualized them topologically. We did not estimate activation patterns for the ESTCNN [9] because it has no spatial feature representation layer. The visualized patterns were estimated from the first subject's first-fold data in the GIST-MI dataset [5] and normalized to the [0, 1] range before visualization.

In this investigation, we observed right-lateralized brain activation/deactivation patterns when a user imagined left-hand movement, and the corresponding left-hemisphere patterns for right-hand imagery. Furthermore, the proposed model shows relatively clearer patterns than the other models; thus, we conclude that our method thoroughly represents the spatial features of the input EEG signals.

V-C Discriminative Power of EEG Representations

To validate the representation ability of the proposed network, we plotted t-SNE-transformed learned features, shown in Fig. 3(b). Specifically, we visualized features extracted from test SSVEP EEG samples by the first, second, and third spatio-spectral-temporal feature representation layers (the first three figures in Fig. 3(b)). Then, we also depicted the final learned feature. The intermediate features were temporally pooled, just for visualization, in the same way as the final feature. We used the first subject's first-session data in the KU-SSVEP dataset [18], with a learning rate of 200 and a perplexity of 10 for the t-SNE computation and visualization.
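The t-SNE step with the stated settings (perplexity 10, learning rate 200) can be sketched as follows, assuming scikit-learn's `TSNE`; the stand-in features for 40 test trials from four SSVEP classes are synthetic.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-in for GAP-ed features of 40 test trials from 4 SSVEP classes
X = np.concatenate([rng.normal(loc=4 * c, size=(10, 24)) for c in range(4)])

emb = TSNE(n_components=2, perplexity=10, learning_rate=200,
           init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (40, 2): one 2-D point per trial for plotting
```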

From these visualized features, we observed that the final feature is more class-discriminative than the intermediate features. Additionally, we observed a trend whereby features learned by deeper layers are more disentangled than those learned by shallower layers.

V-D Mental Fatigue Classification

For the application analysis of drowsiness detection, we visualized confusion matrices estimated from the experimental results on the SEED-VIG dataset [34] in Fig. 3(c). Because the labels that identify the mental state were decided using PERCLOS levels [34], labels at the boundary between two classes may not be accurate. In this respect, we can conclude that the proposed method is useful for drowsiness detection because its false detections occur mostly at the boundaries between classes, e.g., 'awake' vs. 'tired' or 'tired' vs. 'drowsy'. In addition, for practical application, it is essential to detect the drowsy state accurately to avoid dangerous situations such as a car accident. The proposed method achieved the highest and most promising result for detecting drowsiness among all compared methods, i.e., the highest precision score for identifying the drowsy state. Therefore, we expect that our proposed method can be applied in real-world situations.

V-E Early Seizure Detection

Early detection [17] of seizures is one of the most important potential practical applications of this work. Hence, we also validated the benefits of the proposed method in early seizure detection. Specifically, in the training phase, the MSNN was trained using normal and ictal EEG samples with binary labels (e.g., 0: normal and 1: seizure), as in a conventional training framework. In the testing phase, we fed in EEG samples using a sliding window with a 1/256-sec stride. Then, we observed the change in the output probability values to determine whether the input was normal or ictal.
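The sliding-window test procedure can be sketched as follows; the window length, the 256 Hz sampling rate, the channel count, and the toy stand-in "model" are our assumptions for illustration, not the trained MSNN.

```python
import numpy as np

def sliding_probs(record, model, fs=256, win_sec=1.0, stride_sec=1.0 / 256):
    """Slide a window over a continuous recording (channels x time) and
    collect the model's seizure probability at every window position."""
    win = int(win_sec * fs)
    stride = max(1, int(stride_sec * fs))          # 1/256 sec -> 1 sample
    starts = range(0, record.shape[-1] - win + 1, stride)
    return np.array([model(record[:, s:s + win]) for s in starts])

# toy "model": mean absolute amplitude squashed into (0, 1)
model = lambda x: 1.0 / (1.0 + np.exp(-(np.abs(x).mean() - 1.0)))

rng = np.random.default_rng(0)
rec = rng.standard_normal((18, 256 * 5))   # 18 channels, 5 s at 256 Hz
rec[:, 256 * 3:] *= 5                      # amplitude burst mimicking a seizure
p = sliding_probs(rec, model)
print(p.shape)                             # (1025,)
```

Plotting `p` against time yields curves like those in Fig. 3(d): the probability should rise sharply once the window enters the ictal segment.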

Additionally, we visualized these changes in Fig. 3(d) (We used the first subject’s third EEG trial in the CHB-MIT dataset [29] for the visualization). In Fig. 3(d), magenta-colored dot-dashed lines denote the seizure onset and offset. Colored solid lines denote the probability change of various methods. In this visualization, we observed that the proposed method is more stable for detecting seizures. Specifically, the proposed method detects the seizure EEG signal as a seizure state with a strong probability (almost 1), whereas the other methods have low confidence values (Shoeb and Guttag [28]’s method and ESTCNN [9]) or even make incorrect decisions regarding the seizure state (Shallow ConvNet [27], Deep ConvNet [27], RSTNN [14], and EEGNet [16]).

VI Conclusion

In this work, we proposed a novel and compact deep multi-scale neural network that learns multi-scale EEG signal features. In our experiments, we validated our novel architecture's effectiveness over diverse EEG paradigms: MI, SSVEP, seizure, and drowsy EEG signals. Furthermore, we inspected relevance scores to demonstrate the benefits of the multi-scale feature extraction ability, investigated activation pattern maps to understand what types of neurophysiological phenomena were learned by our CNN model, and visualized the t-SNE of learned features to examine the ability of our method to differentiate feature classes. Finally, we also demonstrated that the proposed method can be used for precise drowsiness detection and early seizure detection. In all these respects, we concluded that the proposed deep multi-scale neural network offers significant potential for interpreting EEG signals. Additionally, because the proposed network generalizes clearly to various EEG paradigms, it is expected to combine well with neural architecture search methods [24], thereby making deep learning-based BCIs adaptable to different paradigms.

From a practical standpoint, many limitations remain with regard to the inter-subject variation [12] in performance. In the present work, we experimented in a subject-dependent manner. In general use, it is important for a BCI system to be useful for any subject operating in a subject-independent way. Thus, in the future, we will focus on developing a subject-neutral multi-paradigm BCI system using adversarial learning [8, 13] or other learning strategies [19].


This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning).


  • [1] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan (2008) Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface. In IEEE International Joint Conference on Neural Networks, pp. 2390–2397. Cited by: §I, §II-A, §II-A, §III-A, §IV-B1, §IV-C1, TABLE I.
  • [2] P. Aricò, G. Borghini, G. Di Flumeri, N. Sciaraffa, and F. Babiloni (2018) Passive BCI Beyond the Lab: Current Trends and Future Directions. Physiological Measurement 39 (8), pp. 08TR02. Cited by: §I, §I, §IV-A3, §IV-C3.
  • [3] U. Asif, S. Roy, J. Tang, and S. Harrer (2019) SeizureNet: A Deep Convolutional Neural Network for Accurate Seizure Type Classification and Seizure Detection. arXiv preprint arXiv:1903.03232. Cited by: §I, §II-B, §II-B, §III-C, §III-D.
  • [4] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. Muller (2008) Optimizing Spatial Filters for Robust EEG Single-trial Analysis. IEEE Signal Processing Magazine 25 (1), pp. 41–56. Cited by: §I, §I, §I, §II-A, §II-A, §IV-B1, §IV-C1, TABLE I.
  • [5] H. Cho, M. Ahn, S. Ahn, M. Kwon, and S. C. Jun (2017) EEG Datasets for Motor Imagery Brain–Computer Interface. GigaScience 6 (7), pp. gix034. Cited by: §I, §IV-A1, TABLE I, Fig. 3, 3(a), §V-A, §V-B.
  • [6] F. Chollet (2017) Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258. Cited by: §I, §II-B, §III-B, §III-B, §IV-B8.
  • [7] A. Emami, N. Kunii, T. Matsuo, T. Shinozaki, K. Kawai, and H. Takahashi (2019) Seizure Detection by Convolutional Neural Network-based Analysis of Scalp Electroencephalography Plot Images. NeuroImage: Clinical 22, pp. 101684. Cited by: §I, §I, §I, §II-B, §II-B, §IV-B5.
  • [8] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-Adversarial Training of Neural Networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §VI.
  • [9] Z. Gao, X. Wang, Y. Yang, C. Mu, Q. Cai, W. Dang, and S. Zuo (2019) EEG-Based Spatio-Temporal Convolutional Neural Network for Driver Fatigue Evaluation. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §I, §I, §I, §II-B, §II-B, §IV-B5, §IV-B7, TABLE I, §V-B, §V-E.
  • [10] X. Glorot and Y. Bengio (2010) Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. Cited by: §IV-B9.
  • [11] S. Haufe, F. Meinecke, K. Görgen, S. Dähne, J. Haynes, B. Blankertz, and F. Bießmann (2014) On the Interpretation of Weight Vectors of Linear Models in Multivariate Neuroimaging. NeuroImage 87, pp. 96–110. Cited by: §I, 3(a), §V-B, §V.
  • [12] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup (2016) Transfer Learning in Brain-Computer Interfaces. IEEE Computational Intelligence Magazine 11 (1), pp. 20–31. Cited by: §I, §I, §VI.
  • [13] E. Jeon, W. Ko, and H. Suk (2019) Domain Adaptation with Source Selection for Motor-Imagery based BCI. In 2019 7th International Winter Conference on Brain-Computer Interface (BCI), pp. 1–4. Cited by: §VI.
  • [14] W. Ko, J. Yoon, E. Kang, E. Jun, J. Choi, and H. Suk (2018) Deep Recurrent Spatio-Temporal Neural Network for Motor Imagery based BCI. In 2018 6th International Conference on Brain-Computer Interface (BCI), pp. 1–3. Cited by: §I, §I, §I, §II-B, §II-B, §III-C, §III-C, §III-D, §IV-B5, §IV-C1, TABLE I, §V-B, §V-E.
  • [15] N. Kwak, K. Müller, and S. Lee (2017) A Convolutional Neural Network for Steady State Visual Evoked Potential Classification Under Ambulatory Environment. PLoS one 12 (2), pp. e0172578. Cited by: §I, §II-B, §II-B, §III-D.
  • [16] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance (2018) EEGNet: A Compact Convolutional Neural Network for EEG-based Brain–Computer Interfaces. Journal of Neural Engineering 15 (5), pp. 056013. Cited by: §I, §I, §I, §I, §II-A, §II-B, §II-B, §II-B, §III-A, §III-B, §III-C, §III-C, §III-D, §IV-B8, TABLE I, §V-B, §V-E.
  • [17] J. Lee, J. Park, S. Yang, H. Kim, Y. S. Choi, H. J. Kim, H. W. Lee, and B. Lee (2017) Early Seizure Detection by Applying Frequency-based Algorithm Derived from the Principal Component Analysis. Frontiers in Neuroinformatics 11, pp. 52. Cited by: §II-A, §II-A, §V-E.
  • [18] M. Lee, O. Kwon, Y. Kim, H. Kim, Y. Lee, J. Williamson, S. Fazli, and S. Lee (2019) EEG Dataset and OpenBMI Toolbox for Three BCI Paradigms: An Investigation into BCI Illiteracy. GigaScience 8 (5), pp. giz002. Cited by: §I, §I, §IV-A1, §IV-A2, §IV-B2, §IV-B9, TABLE I, §V-C, footnote 4.
  • [19] Z. Li and D. Hoiem (2017) Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (12), pp. 2935–2947. Cited by: §VI.
  • [20] M. Liang and X. Hu (2015) Recurrent Convolutional Neural Network for Object Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367–3375. Cited by: §II-B, §IV-B5.
  • [21] M. Lin, Q. Chen, and S. Yan (2013) Network in Network. arXiv preprint arXiv:1312.4400. Cited by: Fig. 2, §III-D, §III-D, §III-D, §IV-B9, 3(b).
  • [22] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K. Müller (2017) Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition. Pattern Recognition 65, pp. 211–222. Cited by: Fig. 3, §V-A, §V.
  • [23] M. Nakanishi, Y. Wang, Y. Wang, and T. Jung (2015) A Comparison Study of Canonical Correlation Analysis based Methods for Detecting Steady-State Visual Evoked Potentials. PLoS One 10 (10), pp. e0140703. Cited by: §I, §I, §I, §II-A, §II-A, §II-B, §IV-B2, §IV-B9, §IV-C2, TABLE I.
  • [24] E. Rapaport, O. Shriki, and R. Puzis (2019) EEGNAS: Neural Architecture Search for Electroencephalography Data Analysis and Decoding. In International Workshop on Human Brain and Artificial Intelligence, pp. 3–20. Cited by: §VI.
  • [25] S. Sakhavi, C. Guan, and S. Yan (2018) Learning Temporal Information for Brain-Computer Interface Using Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §I, §I.
  • [26] T. H. Sanders, M. McCurry, and M. A. Clements (2014) Sleep Stage Classification with Cross Frequency Coupling. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4579–4582. Cited by: §II-A, §II-A.
  • [27] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball (2017) Deep Learning with Convolutional Neural Networks for EEG Decoding and Visualization. Human Brain Mapping 38 (11), pp. 5391–5420. Cited by: §I, §I, §I, §I, §II-B, §II-B, §III-A, §III-C, §III-C, §III-D, §IV-B5, §IV-C1, §IV-C2, TABLE I, §V-B, §V-E.
  • [28] A. H. Shoeb and J. V. Guttag (2010) Application of Machine Learning to Epileptic Seizure Detection. In Proceedings of the 27th International Conference on Machine Learning, pp. 975–982. Cited by: §I, §II-A, §IV-A4, §IV-B4, TABLE I, §V-E.
  • [29] A. H. Shoeb (2009) Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment. Ph.D. Thesis, Massachusetts Institute of Technology. Cited by: §I, §IV-A3, §IV-A4, TABLE I, 3(d), §V-E.
  • [30] H. Suk and S. Lee (2012) A Novel Bayesian Framework for Discriminative Feature Extraction in Brain-Computer Interfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2), pp. 286–299. Cited by: §I, §II-A, §II-A, §III-A.
  • [31] A. Supratak, H. Dong, C. Wu, and Y. Guo (2017) DeepSleepNet: A Model for Automatic Sleep Stage Scoring based on Raw Single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (11), pp. 1998–2008. Cited by: §I, §I, §II-B, §II-B, §III-C, §III-D.
  • [32] N. Waytowich, V. J. Lawhern, J. O. Garcia, J. Cummings, J. Faller, P. Sajda, and J. M. Vettel (2018) Compact Convolutional Neural Networks for Classification of Asynchronous Steady-State Visual Evoked Potentials. Journal of Neural Engineering 15 (6), pp. 066031. Cited by: §I, §II-B, §II-B, §II-B, §IV-A2, §IV-B6, §IV-B8, §IV-B9, §IV-C2, TABLE I.
  • [33] X. Zhang, L. Yao, X. Wang, J. Monaghan, and D. Mcalpine (2019) A Survey on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers. arXiv preprint arXiv:1905.04149. Cited by: §I, §I.
  • [34] W. Zheng and B. Lu (2017) A Multimodal Approach to Estimating Vigilance using EEG and Forehead EOG. Journal of Neural Engineering 14 (2), pp. 026017. Cited by: §II-A, §II-A, §IV-A3, §IV-B3, TABLE I, 3(c), §V-D.