Transfer Learning for EEG-Based Brain-Computer Interfaces: A Review of Progresses Since 2016

04/13/2020 ∙ by Dongrui Wu, et al. ∙ Huazhong University of Science & Technology ∙ Shanghai Jiao Tong University

A brain-computer interface (BCI) enables a user to communicate directly with a computer using brain signals. Electroencephalogram (EEG) is the most frequently used input signal in BCIs. However, EEG signals are weak, easily contaminated by interferences and noise, non-stationary for the same subject, and varying among different subjects. So, it is difficult to build a generic pattern recognition model in an EEG-based BCI system that is optimal for different subjects, in different sessions, for different devices and tasks. Usually a calibration session is needed to collect some subject-specific data for a new subject, which is time-consuming and user-unfriendly. Transfer learning (TL), which can utilize data or knowledge from similar or relevant subjects/sessions/devices/tasks to facilitate the learning for a new subject/session/device/task, is frequently used to alleviate this calibration requirement. This paper reviews journal publications on TL approaches in EEG-based BCIs in the last few years, i.e., since 2016. Six paradigms and applications – motor imagery (MI), event related potentials (ERP), steady-state visual evoked potentials (SSVEP), affective BCIs (aBCI), regression problems, and adversarial attacks – are considered. For each paradigm/application, we group the TL approaches into cross-subject/session, cross-device, and cross-task settings and review them separately. Observations and conclusions are made at the end of the paper, which may point to future research directions.


I Introduction

A brain-computer interface (BCI) enables a user to communicate with a computer (or robot, wheelchair, etc) using his/her brain signals directly [1, 2]. The term was first coined by Vidal in 1973 [3], though it had been studied before that [4, 5]. BCIs were initially used mainly for disabled people, e.g., those with neuromuscular impairments [6]. Later research has extended their application scope to able-bodied users [7], in gaming [8], emotion recognition [9], mental fatigue evaluation [10], vigilance estimation [11, 12], etc.

There are generally three types of BCIs [13]:

  1. Non-invasive BCIs, which use non-invasive brain signals, e.g., electroencephalogram (EEG), magnetoencephalogram (MEG), functional near-infrared spectroscopy (fNIRS), etc., measured outside of the brain to communicate with it.

  2. Invasive BCIs, which require surgery to implant sensor arrays or electrodes within the grey matter under the scalp for measuring and decoding the brain signal (usually spikes and local field potentials).

  3. Partially-invasive (semi-invasive) BCIs, in which the sensors are surgically implanted inside the skull but outside the brain rather than within the grey matter.

This paper focuses on non-invasive BCIs, particularly EEG-based BCIs, which are the most popular type of BCIs due to their low risk (no surgery needed), low cost, and convenience.

The flowchart of a closed-loop EEG-based BCI system is shown in Fig. 1. It consists of the following components:

Fig. 1: Flowchart of a closed-loop EEG-based BCI system.
  1. Signal acquisition [14], which uses an EEG device to collect EEG signals from the scalp. EEG devices in the early days used wired connections and gels to increase the conductivity. Nowadays, wireless connections and dry electrodes are becoming more and more popular.

  2. Signal processing [15], which usually includes temporal filtering and spatial filtering. The former uses a bandpass filter to reduce interferences and noise such as muscle artifacts, eye blinks, and DC drift. The latter combines different EEG channels to increase the signal-to-noise ratio. Popular spatial filters include common spatial patterns (CSP) [16], independent component analysis (ICA) [17], blind source separation [18], xDAWN [19], etc.

  3. Feature extraction, where time domain, frequency domain [20], time-frequency domain, Riemannian space [21] and/or functional brain connectivity [22] features could be used.

  4. Pattern recognition. Depending on the application, a classifier or a regression model is used.

  5. Controller, which outputs a command to control an external device, e.g., a wheelchair or a drone, or to alter the behavior of an environment, e.g., the difficulty level of a video game. The controller may not be needed in certain applications, e.g., BCI spellers.

When deep learning is used, feature extraction and pattern recognition can be integrated into a single neural network, and both components are optimized simultaneously and automatically.
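To make these components concrete, the sketch below strings together a bandpass temporal filter, CSP spatial filtering, and log-variance feature extraction for a two-class MI problem. It is a minimal illustration in Python with NumPy/SciPy; all function names, array shapes, and default parameters are illustrative choices, not taken from any particular BCI toolbox.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh

def bandpass(trials, fs, low=8.0, high=30.0, order=4):
    """Temporal filtering: bandpass (e.g., 8-30 Hz mu/beta bands for MI).
    trials: (n_trials, n_channels, n_samples)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, trials, axis=-1)

def csp_filters(trials, labels, n_pairs=3):
    """Spatial filtering: CSP as a generalized eigenvalue problem on the
    two class-average covariance matrices."""
    covs = []
    for c in (0, 1):
        X = trials[labels == c]
        covs.append(np.mean([x @ x.T / np.trace(x @ x.T) for x in X], axis=0))
    w, V = eigh(covs[0], covs[0] + covs[1])              # ascending eigenvalues
    return np.hstack([V[:, :n_pairs], V[:, -n_pairs:]])  # most discriminative filters

def log_var_features(trials, W):
    """Feature extraction: log-variance of the spatially filtered signals."""
    filtered = np.einsum("fc,nct->nft", W.T, trials)
    return np.log(np.var(filtered, axis=-1))
```

The resulting features would then be fed to the pattern recognition stage, e.g., a linear classifier.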

EEG signals are very weak, easily contaminated by interferences and noise, non-stationary for the same subject, and varying among different subjects. So, it is difficult, if not impossible, to build a generic machine learning model in an EEG-based BCI system that is optimal for different subjects, in different sessions, for different devices and different tasks. Usually a calibration session is needed to collect some subject-specific data for a new subject, which is time-consuming and user-unfriendly. So, reducing this subject-specific calibration is critical to the market success of EEG-based BCIs.

Different machine learning techniques, e.g., transfer learning (TL) [23]

and active learning

[24], have been used for this purpose. Among them, TL is particularly promising, because it can utilize data or knowledge from similar or relevant subjects/sessions/devices/tasks to facilitate the learning for a new subject/session/device/task. Moreover, it can also be integrated with active learning for even better performance [25, 26]. This paper focuses on TL in EEG-based BCIs.

There are three classic classification paradigms in EEG-based BCIs, which will be considered in this paper:

  1. Motor imagery (MI) [27], which modifies the neuronal activity in the primary sensorimotor areas, similarly to what a real executed movement does. As different MIs affect different regions of the brain, e.g., the left (right) hemisphere for right (left) hand MIs, and the center for feet MIs, a BCI can decode an MI from the EEG signals and map it to a specific command.

  2. Event-related potentials (ERP) [28, 29], which are any stereotyped EEG responses to a visual, audio, or tactile stimulus. The most frequently used ERP is P300 [30], which occurs about 300 ms after a rare stimulus.

  3. Steady-state visual evoked potentials (SSVEP) [31], where the EEG oscillates at the same frequency as (or a multiple of) the flickering frequency of the visual stimulus, usually between 3.5 and 75 Hz [32]. This paradigm is frequently used in BCI spellers [33], as it can achieve a very high information transfer rate.

EEG-based affective BCIs (aBCIs) [34, 35, 36, 37], which detect affective states (moods, emotions) from EEG and use them in BCIs, have become an emerging research area. There are also some interesting regression problems in EEG-based BCIs, e.g., driver drowsiness estimation [38, 39, 40], user reaction time estimation [41], etc. Additionally, recent research [42, 43] has shown that machine learning models in BCIs are vulnerable to adversarial attacks, in which deliberately designed small perturbations, possibly imperceptible to human eyes, are added to benign EEG trials to fool the model and cause dramatic performance degradation. This paper also considers aBCI, regression problems, and adversarial attacks of EEG-based BCIs.
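For illustration, the fast gradient sign method (FGSM) is one standard way to craft such perturbations; the PyTorch sketch below is a generic example of the idea, not necessarily the attack used in [42, 43].

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01):
    """Craft adversarial EEG trials: add a small signed-gradient perturbation
    to benign inputs x (batch x channels x samples) with true labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()  # imperceptibly small step
```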

Though TL has been applied in all above EEG-based BCI paradigms and applications, to our knowledge, there does not exist a comprehensive and up-to-date review on it. Wang et al. [44] performed a short review in a conference paper in 2015. Jayaram et al. [45] gave a brief review in 2016, considering only cross-subject and cross-session transfers. Lotte et al. [46] gave a comprehensive review of classification algorithms for EEG-based BCIs between 2007 and 2017. Again, they only considered cross-subject and cross-session transfers. Azab et al. [47] performed a review of four categories of TL approaches in BCIs in 2018: 1) instance-based TL; 2) feature-representation TL; 3) classifier-based TL; and, 4) relational-based TL.

However, all aforementioned reviews considered only cross-subject and cross-session TL of the three classic paradigms of EEG-based BCIs (MI, ERP and SSVEP), but did not mention the more challenging cross-device and cross-task transfers, nor aBCI, regression problems and adversarial attacks.

To fill these gaps and not to overlap too much with previous reviews, this paper reviews journal publications of TL approaches in EEG-based BCIs in the last few years, i.e., since 2016. Six paradigms and applications are considered: MI, ERP, SSVEP, aBCI, regression problems, and adversarial attacks. For each paradigm/application, we group the TL approaches into cross-subject/session (because these two are essentially the same), cross-device, and cross-task settings and review them separately, unless no TL approaches have been proposed for a category. Some TL approaches may cover more than one category, e.g., both cross-subject and cross-device transfers. In this case, we introduce them in the more challenging category, e.g., cross-device TL. When there are multiple TL approaches in a category, we generally introduce them according to the years they were proposed, unless there are intrinsic connections among several approaches.

The remainder of this paper is organized as follows: Section II briefly introduces some basic concepts of TL. Sections III-VIII review TL approaches in MI, ERP, SSVEP, aBCI, regression problems and adversarial attacks, respectively. Section IX makes observations and conclusions, which may point to some future research directions.

II TL Concepts and Scenarios

This section introduces the basic definitions of TL, the related concepts, e.g., domain adaptation and covariate shift, and different TL scenarios in EEG-based BCIs.

In machine learning, a feature vector is usually denoted by a bold symbol $\mathbf{x}$. To emphasize that each EEG trial is a 2D matrix, this paper denotes the feature matrix by $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of channels and $T$ the number of time domain samples. Of course, $X$ can also be converted to a feature vector $\mathbf{x}$.

II-A TL Concepts

Definition 1

A domain $\mathcal{D}$ [23, 48] consists of a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, i.e., $\mathcal{D} = \{\mathcal{X}, P(X)\}$, where $X \in \mathcal{X}$.

A source domain $\mathcal{D}_s$ and a target domain $\mathcal{D}_t$ being different means that they may have different feature spaces, i.e., $\mathcal{X}_s \neq \mathcal{X}_t$, and/or different marginal probability distributions, i.e., $P_s(X) \neq P_t(X)$.

Definition 2

Given a domain $\mathcal{D}$, a task $\mathcal{T}$ [23, 48] consists of a label space $\mathcal{Y}$ and a prediction function $f(\cdot)$, i.e., $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$.

Let $y \in \mathcal{Y}$. Then, $f(\mathbf{x}) = P(y|\mathbf{x})$ is the conditional probability distribution. Two tasks $\mathcal{T}_s$ and $\mathcal{T}_t$ being different means they may have different label spaces, i.e., $\mathcal{Y}_s \neq \mathcal{Y}_t$, and/or different conditional probability distributions, i.e., $P_s(y|\mathbf{x}) \neq P_t(y|\mathbf{x})$.

Definition 3

Given a source domain $\mathcal{D}_s = \{(\mathbf{x}_i^s, y_i^s)\}_{i=1}^{n_s}$, and a target domain $\mathcal{D}_t$ with $n_l$ labeled samples $\{(\mathbf{x}_j^t, y_j^t)\}_{j=1}^{n_l}$ and $n_u$ unlabeled samples $\{\mathbf{x}_j^t\}_{j=n_l+1}^{n_l+n_u}$, transfer learning aims to learn a target prediction function $f: \mathbf{x}_t \mapsto y_t$ with low expected error on $\mathcal{D}_t$, under the general assumptions that $\mathcal{X}_s \neq \mathcal{X}_t$, $\mathcal{Y}_s \neq \mathcal{Y}_t$, $P_s(X) \neq P_t(X)$, and/or $P_s(y|\mathbf{x}) \neq P_t(y|\mathbf{x})$.

In inductive TL, the target domain has some labeled samples, i.e., $n_l > 0$. For most inductive TL scenarios in BCIs, the source domain samples are labeled, but they could also be unlabeled. When the source domain samples are labeled, inductive TL is similar to multi-task learning [49]. The difference is that multi-task learning tries to learn a model for every domain simultaneously, whereas inductive TL focuses only on the target domain. In transductive TL, the source domain samples are all labeled, but the target domain samples are all unlabeled, i.e., $n_l = 0$. In unsupervised TL, no samples in either domain are labeled.

Domain adaptation is a special case of TL, or more specifically, transductive TL:

Definition 4

Given a source domain $\mathcal{D}_s$ and a target domain $\mathcal{D}_t$, domain adaptation aims to learn a target prediction function $f: \mathbf{x}_t \mapsto y_t$ with low expected error on $\mathcal{D}_t$, under the assumptions that $\mathcal{X}_s = \mathcal{X}_t$ and $\mathcal{Y}_s = \mathcal{Y}_t$, but $P_s(X) \neq P_t(X)$ and/or $P_s(y|\mathbf{x}) \neq P_t(y|\mathbf{x})$.

Covariate shift is a special and simpler case of domain adaptation:

Definition 5

Given a source domain $\mathcal{D}_s$ and a target domain $\mathcal{D}_t$, covariate shift happens when $\mathcal{X}_s = \mathcal{X}_t$, $\mathcal{Y}_s = \mathcal{Y}_t$, $P_s(y|\mathbf{x}) = P_t(y|\mathbf{x})$, but $P_s(X) \neq P_t(X)$.
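A classic correction for covariate shift (a general technique, not tied to any paper reviewed here) is importance weighting: source samples are reweighted by the density ratio $p_t(\mathbf{x})/p_s(\mathbf{x})$, which can be estimated with a probabilistic source-vs-target discriminator. A minimal scikit-learn sketch, with all names illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_src, X_tgt):
    """Estimate p_t(x)/p_s(x) with a source-vs-target logistic discriminator."""
    X = np.vstack([X_src, X_tgt])
    d = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]  # 0 = source, 1 = target
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_src)[:, 1]
    return p / (1 - p) * (len(X_src) / len(X_tgt))        # density-ratio estimate

# Usage: fit the task classifier on source data with these sample weights, e.g.,
# clf.fit(X_src, y_src, sample_weight=importance_weights(X_src, X_tgt)).
```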

II-B TL Scenarios

According to the nature of the source and target domains, there can be different TL scenarios in EEG-based BCIs:

  1. Cross-subject transfer, i.e., data from other subjects (source domains) are used to facilitate the calibration for a new subject (target domain). Usually the task and EEG device are the same for different subjects.

  2. Cross-session transfer, i.e., data from previous sessions (source domains) are used to facilitate the calibration of a new session (target domain). For example, data from previous days are used in the current calibration. Usually the subject, task and EEG device are the same in different sessions.

  3. Cross-device transfer, i.e., data from one EEG device (source domain) are used to facilitate the calibration of a new device (target domain). Usually the task and subject are the same for different EEG devices.

  4. Cross-task transfer, i.e., labeled data from other similar or relevant tasks (source domains) are used to facilitate the calibration for a new task (target domain). For example, data from left- and right-hand MIs are used in the calibration of feet and tongue MIs. Usually the subject and EEG device are the same for different tasks.

Since cross-subject transfer and cross-session transfer are essentially the same, this paper combines them into one cross-subject/session category. Generally, cross-device transfer and cross-task transfer are more challenging than cross-subject/session transfer, and hence they were less studied in the literature.

The above simple TL scenarios could also be mixed to form more complex TL scenarios, e.g., cross-subject and cross-device transfer [26], cross-subject and cross-task transfer [50], etc.

III TL in MI-Based BCIs

This section reviews recent progress in TL for MI-based BCIs. Many of the studies used the BCI Competition datasets (http://www.bbci.de/competition/).

Assume there are $S$ source domains, and the $s$-th source domain has $n_s$ EEG trials. The $i$-th trial of the $s$-th source domain is denoted as $X_i^s \in \mathbb{R}^{C \times T}$, where $C$ is the number of channels, and $T$ is the number of features in each channel. The corresponding covariance matrix is $C_i^s = X_i^s (X_i^s)^\top$, which is symmetric and positive definite (SPD) and lies on a Riemannian manifold. The label for $X_i^s$ is $y_i^s \in \{-1, +1\}$ for binary classification. The $j$-th EEG trial in the target domain is denoted as $X_j^t$, and its covariance matrix as $C_j^t$. These notations are used in the remainder of the paper.

III-A Cross-Subject/Session Transfer

Dai et al. [51] proposed transfer kernel common spatial patterns (TKCSP), which integrates kernel common spatial patterns (KCSP) [52] and transfer kernel learning (TKL) [53], for EEG trial spatial filtering in cross-subject MI classification. It first computes a domain-invariant kernel by TKL, and then uses it in KCSP, which further finds the components with the largest energy difference between two classes. Note that here TL was used in EEG signal processing (spatial filtering) instead of classification.

Jayaram et al. [45] proposed a multi-task learning framework for cross-subject/session transfers, which does not need any labeled data in the target domain. The linear decision rule is $\hat{y} = \mathbf{w}^\top X \mathbf{a}$, where $\mathbf{w}$ consists of the weights of the channels, and $\mathbf{a}$ the weights of the features. $\{\mathbf{w}_s\}$ and $\{\mathbf{a}_s\}$ are obtained by minimizing

$$\sum_{s=1}^{S} \left[ \sum_{i=1}^{n_s} \big( y_i^s - \mathbf{w}_s^\top X_i^s \mathbf{a}_s \big)^2 + \lambda\, \Omega(\mathbf{w}_s; \boldsymbol{\mu}_w, \Sigma_w) + \lambda\, \Omega(\mathbf{a}_s; \boldsymbol{\mu}_a, \Sigma_a) \right] \quad (1)$$

where $\mathbf{w}_s$ consists of the weights of the channels for the $s$-th source subject, $\mathbf{a}_s$ consists of the weights of the features, $\lambda$ is a hyper-parameter, and $\Omega(\cdot; \boldsymbol{\mu}, \Sigma)$ is the negative log prior probability of its argument given the Gaussian distribution parameterized by $(\boldsymbol{\mu}, \Sigma)$. $\boldsymbol{\mu}_w$ and $\Sigma_w$ ($\boldsymbol{\mu}_a$ and $\Sigma_a$) are the mean vector and covariance matrix of $\{\mathbf{w}_s\}$ ($\{\mathbf{a}_s\}$), respectively.

Briefly speaking, the first term in (1) requires $\mathbf{w}_s$ and $\mathbf{a}_s$ to work well for the corresponding source subject, the second term ensures the divergence of $\mathbf{w}_s$ from the shared $\boldsymbol{\mu}_w$ is small, i.e., all source subjects should have similar $\mathbf{w}_s$, and the third term ensures the divergence of $\mathbf{a}_s$ from the shared $\boldsymbol{\mu}_a$ is small. $\boldsymbol{\mu}_w$ and $\boldsymbol{\mu}_a$ can be viewed as the subject-invariant characteristics of stimulus prediction, and hence be used directly for a new subject. Jayaram et al. demonstrated that the proposed approach worked well on cross-subject transfers in MI, and also on cross-session transfers in one patient with amyotrophic lateral sclerosis (ALS).

Azab et al. [54] proposed weighted TL for cross-subject transfer in MI classification, as an improvement of the above approach. They assumed that each source subject has plenty of labeled samples, whereas the target subject has only a few. They first trained a logistic regression classifier for each source subject, using a cross-entropy loss function with an L2 regularization term. Then, the logistic regression classifier for the target subject was trained so that the cross-entropy loss on the few labeled target domain samples is minimized, and its parameters are close to those of the source subjects. The mean vector and covariance matrix of the classifier parameters in the source domains were computed in a similar way to [45], except that each source domain was weighted by the Kullback-Leibler divergence between it and the target domain.

Hossain et al. [55] proposed an ensemble learning approach for cross-subject transfer in multi-class MI classification. Four base classifiers were used, all constructed using TL and active learning: 1) multi-class direct transfer with active learning (mDTAL), a multi-class extension of the active TL approach proposed in [56]; 2) multi-class aligned instance transfer with active learning, which is similar to mDTAL except that only the source domain samples correctly classified by the corresponding classifier are transferred; 3) most informative and aligned instances transfer with active learning, which transfers only source domain samples that are correctly classified by its classifiers and close to the decision boundary (i.e., most informative); and, 4) most informative instances transfer with active learning, which transfers only source domain samples that are close to the decision boundary. The four base learners were finally aggregated by stacking ensemble learning to achieve more robust performance.

Since the covariance matrices of EEG trials are SPD and lie on a Riemannian manifold instead of in a Euclidean space, Riemannian approaches [21] have become popular in EEG-based BCIs. Different TL approaches have also been proposed recently.

Zanini et al. [57] proposed a Riemannian alignment (RA) approach to center the EEG covariance matrices in the $k$-th domain with respect to a reference covariance matrix specific to that domain. More specifically, RA first computes the covariance matrices of some resting trials in the $k$-th domain, in which the subject is not performing any task, and then calculates their Riemannian mean $\bar{R}_k$. $\bar{R}_k$ is next used as the reference matrix to reduce the inter-subject/session variation:

$$\tilde{C}_{k,i} = \bar{R}_k^{-1/2}\, C_{k,i}\, \bar{R}_k^{-1/2} \quad (2)$$

where $\tilde{C}_{k,i}$ is the aligned covariance matrix for the $i$-th trial $C_{k,i}$. Equation (2) makes the reference states of different subjects/sessions centered at the identity matrix. In MI, the resting state is the time window in which the subject is not performing any task, e.g., the transition window between two MIs. In ERP, the non-target stimuli are used as the resting state, which means that some labeled trials in the target domain must be known. Zanini et al. also proposed improvements to the minimum distance to Riemannian mean (MDRM) [58] classifier, and demonstrated the effectiveness of RA and the improved MDRM in both MI and ERP classification.

Yair et al. [59] proposed a domain adaptation approach using the analytic expression of parallel transport on the cone manifold of SPD matrices. The goal is to find a common tangent space such that the mappings of different domains are aligned. It first computes the Riemannian mean $\bar{C}_k$ of the $k$-th domain, and then the Riemannian mean $\bar{C}$ of all $\bar{C}_k$. Then, each $\bar{C}_k$ is moved to $\bar{C}$ by parallel transport $\Gamma_{\bar{C}_k \to \bar{C}}$, and $C_{k,i}$, the $i$-th covariance matrix in the $k$-th domain, is projected to

$$\tilde{C}_{k,i} = E_k\, C_{k,i}\, E_k^\top, \qquad E_k = \big(\bar{C}\, \bar{C}_k^{-1}\big)^{1/2} \quad (3)$$

After the projection, the covariance matrices in different domains are mapped to the same tangent space, so a classifier built in a source domain can be directly applied to the target domain. Equation (3) is essentially identical to RA in (2), except that (3) works in the tangent space, whereas (2) works in the Riemannian space. Yair et al. demonstrated the effectiveness of parallel transport in cross-subject MI classification, sleep stage classification, and mental arithmetic identification.

RA may have some limitations: 1) RA aligns the covariance matrices in the Riemannian space, and hence requires a Riemannian space classifier, whereas there are very few such classifiers; 2) RA uses the Riemannian mean of the covariance matrices, which is time-consuming to compute, especially when the number of EEG channels is large; 3) RA for ERP classification needs some labeled trials in the target domain, so it cannot be used when such information is not available. To solve these problems, He and Wu [60] proposed a Euclidean alignment (EA) approach to align EEG trials from different subjects in the Euclidean space and make them more consistent. Mathematically, for the $k$-th domain, EA computes the reference matrix $\bar{R}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} C_{k,i}$, i.e., $\bar{R}_k$ is the arithmetic mean of all covariance matrices in the $k$-th domain (it can also be the Riemannian mean, which is more computationally intensive), and then performs the alignment by $\tilde{X}_{k,i} = \bar{R}_k^{-1/2} X_{k,i}$. After EA, the mean covariance matrix of every domain becomes the identity matrix. Both Euclidean space and Riemannian space feature extraction and classification approaches can then be applied to $\tilde{X}_{k,i}$. EA can be viewed as a generalization of Yair et al.'s parallel transport approach, because the computation of $\bar{R}_k$ in EA is more flexible, and both Euclidean and Riemannian space classifiers can be used after the alignment. He and Wu demonstrated that EA outperformed RA in both MI and ERP classification, in both offline and simulated online applications.
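A minimal NumPy/SciPy sketch of EA as described above (the arithmetic-mean variant; array shapes and names are illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """Align one domain's EEG trials (n_trials x channels x samples) so that
    the domain's mean covariance matrix becomes the identity."""
    covs = np.array([x @ x.T for x in trials])
    R = covs.mean(axis=0)                         # arithmetic mean reference matrix
    R_inv_sqrt = fractional_matrix_power(R, -0.5)
    return np.array([R_inv_sqrt @ x for x in trials])
```

Applying this function independently to each subject/session makes the domains directly comparable before feature extraction.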

Rodrigues et al. [61] proposed Riemannian procrustes analysis (RPA) to accommodate covariate shift in EEG-based BCIs. It is semi-supervised, and requires at least one labeled sample from each class of the target domain. RPA first matches the statistical distributions of the covariance matrices of the EEG trials from different domains, using simple geometrical transformations of translation, scaling, and rotation, applied in sequence. Then, the labeled and transformed data from both domains are concatenated to train a classifier, which is next applied to the transformed and unlabeled target domain samples. Mathematically, it transforms each target domain covariance matrix $C_j^t$ to

$$\tilde{C}_j^t = \bar{M}_s^{1/2}\, U \Big( \bar{M}_t^{-1/2}\, C_j^t\, \bar{M}_t^{-1/2} \Big)^{s} U^\top \bar{M}_s^{1/2} \quad (4)$$

where:

  • $\bar{M}_t$ is the geometric mean of the labeled samples in the target domain; $\bar{M}_t^{-1/2} C_j^t \bar{M}_t^{-1/2}$ centers the target domain covariance matrices at the identity matrix.

  • the exponent $s = \sqrt{d_s/d_t}$ stretches the target domain covariance matrices so that they have the same dispersion as the source domain, in which $d_s$ and $d_t$ are the dispersions around the geometric means of the source domain and the target domain, respectively.

  • $U$ is an orthogonal rotation matrix to be optimized, which minimizes the distance between the class means of the source domain and the translated and stretched target domain.

  • $\bar{M}_s$ is the geometric mean of the labeled samples in the source domain, which ensures the geometric mean of $\tilde{C}_j^t$ is the same as that in the source domain.

Clearly, RPA is a generalization of RA. Rodrigues et al. [61] showed that RPA achieved promising results in cross-subject MI, ERP and SSVEP classification.
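The translation and stretching steps of RPA can be sketched as follows; the rotation step (optimizing $U$ over the orthogonal group) is omitted, and the dispersion-matching exponent follows the reasoning above, so treat this as an approximation rather than the exact implementation of [61]. The means and source dispersion are assumed to be computed by the caller.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def riemannian_distance(A, B):
    """delta(A, B) = ||logm(A^{-1/2} B A^{-1/2})||_F."""
    A_isqrt = fractional_matrix_power(A, -0.5)
    return np.linalg.norm(logm(A_isqrt @ B @ A_isqrt), "fro")

def rpa_translate_scale(covs_t, mean_t, mean_s, disp_s):
    """Recenter the target covariances at the identity, stretch their dispersion
    to match the source dispersion disp_s, then recenter at the source mean.
    The rotation matrix U of RPA is omitted in this sketch."""
    M_isqrt = fractional_matrix_power(mean_t, -0.5)
    centered = [M_isqrt @ C @ M_isqrt for C in covs_t]
    disp_t = np.mean([riemannian_distance(np.eye(len(C)), C) ** 2
                      for C in centered])
    s = np.sqrt(disp_s / disp_t)                       # stretching exponent
    stretched = [fractional_matrix_power(C, s) for C in centered]
    M_sqrt = fractional_matrix_power(mean_s, 0.5)
    return [M_sqrt @ C @ M_sqrt for C in stretched]
```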

Recently, Zhang and Wu [62] proposed a manifold embedded knowledge transfer (MEKT) approach, which first aligns the covariance matrices of the EEG trials in the Riemannian manifold, extracts features in the tangent space, and then performs domain adaptation by minimizing the joint probability distribution shift between the source and the target domains, while preserving their geometric structures. More specifically, it consists of the following three steps:

  1. Covariance matrix centroid alignment: align the centroids of the covariance matrices in each domain, i.e., $\tilde{C}_i^s = \bar{C}_s^{-1/2} C_i^s \bar{C}_s^{-1/2}$ and $\tilde{C}_j^t = \bar{C}_t^{-1/2} C_j^t \bar{C}_t^{-1/2}$, where $\bar{C}_s$ ($\bar{C}_t$) can be the Riemannian mean, the Euclidean mean, or the Log-Euclidean mean of all $C_i^s$ ($C_j^t$). This is essentially a generalization of RA [57]. The marginal probability distributions of different domains are brought together after centroid alignment.

  2. Tangent space feature extraction: map and assemble all $\tilde{C}_i^s$ ($\tilde{C}_j^t$) into a tangent space super matrix $Z_s \in \mathbb{R}^{d \times n_s}$ ($Z_t \in \mathbb{R}^{d \times n_t}$), where $d$ is the dimensionality of the tangent space features.

  3. Mapping matrices identification: find projection matrices $A \in \mathbb{R}^{d \times p}$ and $B \in \mathbb{R}^{d \times p}$, where $p$ is the dimensionality of a shared subspace, such that $A^\top Z_s$ and $B^\top Z_t$ are close.

After MEKT, a classifier can be trained on $A^\top Z_s$ and applied to $B^\top Z_t$ to estimate the target labels.

MEKT can cope with one or more source domains, and be computed efficiently. Zhang and Wu [62] also used domain transferability estimation (DTE) to identify the most beneficial source domains, in case there are too many of them. Experiments in cross-subject MI and ERP classification demonstrated that MEKT outperformed several state-of-the-art TL approaches, and DTE can reduce more than half of the computational cost when the number of source subjects is large, with little sacrifice of classification accuracy.

Singh et al. [63] proposed a TL approach for estimating the sample covariance matrices, which are used by the MDRM classifier, from a very small number of target domain samples. It first estimates the sample covariance matrix for each class, by a weighted average of the sample covariance matrix of the corresponding class from the target domain and that in the source domain. The mixed sample covariance matrix is the sum of the per-class sample covariance matrices. Spatial filters are then computed from the mixed and per-class sample covariance matrices. Next, the covariance matrices of the spatially filtered EEG trials are further filtered by Fisher geodesic discriminant analysis [64] and used as features in the MDRM [58] classifier.

Deep learning, which has been very successful in image processing, speech recognition, video analysis and natural language processing, has also started to find applications in EEG-based BCIs.

Schirrmeister et al. [65] proposed two convolutional neural networks (CNNs) for EEG decoding. The Deep ConvNet consists of four convolutional blocks and a classification block. The first convolutional block is specially designed to handle EEG inputs, and the other three are standard ones. The Shallow ConvNet is a shallow version of the Deep ConvNet, inspired by filter bank common spatial patterns (FBCSP) [66]. Its first block is similar to the first convolutional block of the Deep ConvNet, but with a larger kernel, a different activation function, and a different pooling approach. Schirrmeister et al. [65] showed that both ConvNets outperformed FBCSP in cross-subject MI classification.

Lawhern et al. [67] proposed EEGNet, a compact CNN architecture for EEG classification. It can be applied across different BCI paradigms, be trained with very limited data, and generate neurophysiologically interpretable features. EEGNet starts with a temporal convolution to learn frequency filters (analogous to different bandpass filters, e.g., delta, alpha, theta, etc), then uses a depthwise convolution, connected to each feature map individually, to learn frequency-specific spatial filters (analogous to FBCSP [66]). Next, a depthwise convolution, which learns a temporal summary for each feature map individually, and a pointwise convolution, which learns how to optimally mix the feature maps together, are used. Finally, a softmax layer is used for classification. EEGNet achieved robust results in both within-subject and cross-subject classification of MIs and ERPs.

Kwon et al. [68] proposed a CNN for subject-independent MI classification. EEG signals from all training subjects were concatenated and filtered by 30 spectral-spatial filters, e.g., 8-30Hz, 11-20Hz, etc. A few most discriminative bands were selected based on mutual information. The spectral-spatial maps generated from these selected bands were then each input to a CNN. The feature vectors generated from all CNNs were concatenated and a fully-connected layer was used for classification.

III-B Cross-Device Transfer

Xu et al. [69] studied the performance of deep learning in cross-dataset transfer. Eight publicly available MI datasets were considered. Though different datasets used different EEG devices, channels and MI tasks, they only selected three common channels (C3, CZ, C4) and the left-hand and right-hand MIs. They applied an online pre-alignment strategy to each EEG trial of each subject, by computing the Riemannian mean recursively online and using it as the reference matrix in EA. They showed that online pre-alignment significantly increased the performance of deep learning models in cross-dataset transfers.

III-C Cross-Task Transfer

Both RA and EA assume that the source domains have the same feature space and label space as the target domain, which may not hold in many real-world applications, i.e., they may not be used in cross-task transfers. Recently, He and Wu [50] also proposed a label alignment (LA) approach, which can handle the situation that the source domains have different label spaces from the target domain. For MI-based BCIs, this means the source subjects and the target subject can perform completely different MI tasks (e.g., the source subject may perform left-hand and right-hand MIs, whereas the target subject may perform feet and tongue MIs), but the source subjects’ data can still be used to facilitate the calibration for a target subject.

When the source and target domain devices are different, LA first selects the source EEG channels as those closest to the target EEG channels. Then, it computes the mean covariance matrix of each source domain class, and estimates the mean covariance matrix of each target domain class. Next, it re-centers each source domain class at the corresponding estimated class mean of the target domain. Both Euclidean space and Riemannian space feature extraction and classification approaches can next be applied to the aligned trials. LA only needs as few as one labeled sample from each class of the target subject, can be used as a preprocessing step before different feature extraction and classification algorithms, and can be integrated with other TL approaches to achieve even better performance. He and Wu [50] demonstrated the effectiveness of LA in simultaneous cross-subject, cross-device and cross-task transfer in MI classification.

To our knowledge, this is the only cross-task TL work in EEG-based BCIs, and also the most complicated TL scenario (simultaneous cross-subject, cross-device and cross-task transfer) considered in the literature so far.

IV TL in ERP-Based BCIs

This section reviews recent TL approaches in ERP-based BCIs. Many approaches introduced in the previous section, e.g., RA, EA, RPA and EEGNet, can also be used here. To avoid duplication, we only include here approaches that have not been introduced in the previous section. Because there were no publications on cross-task transfers in ERP-based BCIs, we do not have a “Cross-Task Transfer” subsection.

IV-A Cross-Subject/Session Transfer

Waytowich et al. [70] proposed unsupervised spectral transfer using information geometry (STIG) for subject-independent ERP-based BCIs. STIG uses spectral meta learner [71] to combine predictions from an ensemble of MDRM classifiers on data from individual source subjects. Experiments on single-trial ERP classification demonstrated that STIG significantly outperformed some calibration-free approaches and traditional within-subject calibration approaches when limited data is available, in both offline and online ERP classifications.

Wu [72] proposed weighted adaptation regularization (wAR) for cross-subject transfers in ERP-based BCIs, in both online and offline settings. Mathematically, wAR learns the following classifier directly:

$$f = \arg\min_f \sum_{i=1}^{n_s} w_i^s\, \ell\big(f(\mathbf{x}_i^s), y_i^s\big) + w_t \sum_{j=1}^{n_l} w_j^t\, \ell\big(f(\mathbf{x}_j^t), y_j^t\big) + \sigma \|f\|_K^2 + \lambda_P\, D_{f,K}(P_s, P_t) + \lambda_Q\, D_{f,K}(Q_s, Q_t) \quad (5)$$

where $\ell$ is a loss function, $w_t$ is the overall weight of target domain samples, $K$ is a kernel function, and $\sigma$, $\lambda_P$ and $\lambda_Q$ are non-negative regularization parameters. $w_i^s$ and $w_j^t$ are the weights for the $i$-th source domain sample and the $j$-th target domain sample, respectively, which balance the numbers of positive and negative samples in the corresponding domain.

Briefly speaking, the first and second terms in (5) minimize the loss on fitting the labeled samples in the source domain and the target domain, respectively. The third term minimizes the structural risk of the classifier. The fourth term minimizes the distance between the marginal probability distributions $P_s(\mathbf{x})$ and $P_t(\mathbf{x})$. The last term minimizes the distance between the conditional probability distributions $Q_s(y|\mathbf{x})$ and $Q_t(y|\mathbf{x})$. Experiments on single-trial visual evoked potential classification demonstrated that both online and offline wAR algorithms were effective. Wu also proposed a source domain selection approach, which selects the most beneficial source subjects for transfer. It can reduce the computational cost of wAR by about 50%, without sacrificing classification performance.
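The distribution-distance terms in (5) are typically instantiated with an empirical estimate such as the maximum mean discrepancy (MMD); the NumPy sketch below shows a generic biased MMD estimator with an RBF kernel, as an illustration rather than the exact estimator of [72].

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X_src, X_tgt, gamma=1.0):
    """Biased estimate of the squared MMD between two feature distributions."""
    return (rbf_kernel(X_src, X_src, gamma).mean()
            + rbf_kernel(X_tgt, X_tgt, gamma).mean()
            - 2 * rbf_kernel(X_src, X_tgt, gamma).mean())
```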

Qi et al. [73] performed cross-subject transfer on a P300 speller to reduce the calibration time. A small number of ERP epochs from the target subject were used as references to compute the Riemannian distance to each source ERP sample in an existing data pool. The most similar samples were selected to train a classifier, which was then applied to the target subject.

Jin et al. [74] used a generic model set to reduce the calibration time in P300-based BCIs. Filtered EEG data from 116 participants were assembled into a data matrix, principal component analysis (PCA) was used to reduce the dimensionality of the time domain features, and then the 116 participants were clustered into 10 groups by $k$-means clustering. A weighted linear discriminant analysis (WLDA) classifier was then trained for each cluster. These 10 classifiers formed the generic model set. For a new subject, a few calibration samples were acquired, and an online linear discriminant analysis (OLDA) model was trained. The OLDA model was matched to the closest WLDA model, which was then selected as the model for the new subject.
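The generic-model-set idea can be sketched with standard scikit-learn building blocks; plain LDA stands in for the paper's WLDA/OLDA variants here, so this is an approximation of the workflow, not a reproduction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

def build_generic_models(X_all, y_all, subj_ids, n_clusters=10, n_components=30):
    """Cluster subjects' PCA-reduced data and train one classifier per cluster."""
    pca = PCA(n_components=n_components).fit(X_all)
    Z = pca.transform(X_all)
    # represent each subject by his/her mean feature vector, then cluster subjects
    subjects = np.unique(subj_ids)
    centroids = np.array([Z[subj_ids == s].mean(axis=0) for s in subjects])
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(centroids)
    models = []
    for c in range(n_clusters):
        members = subjects[km.labels_ == c]
        mask = np.isin(subj_ids, members)
        models.append(LDA().fit(Z[mask], y_all[mask]))
    return pca, models

def pick_model(pca, models, X_calib, y_calib):
    """Select the generic model that best fits a new subject's calibration data."""
    Z = pca.transform(X_calib)
    return max(models, key=lambda m: m.score(Z, y_calib))
```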

Deep learning has also been used in ERP classification. Inspired by the generative adversarial network (GAN) [75], Ming et al. [76] proposed a subject adaptation network (SAN) to mitigate individual differences in EEGs. Based on the characteristics of the application, they designed an artificial low-dimensional distribution and forced the transformed EEG features to approximate it. For example, for two-class visual evoked potential classification, the artificial distribution is bi-modal, and the area of each mode is proportional to the number of samples in the corresponding class. Experiments on cross-subject visual evoked potential classification demonstrated that SAN outperformed the support vector machine (SVM) and EEGNet.

IV-B Cross-Device Transfer

Wu et al. [26] proposed active weighted adaptation regularization (AwAR) for cross-device transfer. It integrates wAR (introduced in Section IV-A), which uses labeled data from the previous device and handles class imbalance, and active learning [24], which selects the most informative samples from the new device to label. Only the common channels were used in wAR, but all channels of the new device could be used in active learning for better performance. Experiments on single-trial visual evoked potential classification using three different EEG devices with different numbers of electrodes showed that AwAR can significantly reduce the calibration data requirement for a new device in offline calibration.

To our knowledge, this is the only study on cross-device transfer in ERP-based BCIs.

V TL in SSVEP-Based BCIs

This section reviews recent TL approaches in SSVEP-based BCIs. Because there were no publications on cross-task transfers in SSVEP-based BCIs, we do not have a “Cross-Task Transfer” subsection. Overall, much fewer TL studies on SSVEPs have been performed, compared with MIs and ERPs.

V-A Cross-Subject/Session Transfer

Waytowich et al. [77] proposed Compact-CNN, which is essentially EEGNet [67] introduced in Section III-A, for 12-class SSVEP classification without the need for any user-specific calibration. It outperformed state-of-the-art hand-crafted approaches using canonical correlation analysis (CCA) and Combined-CCA.

Rodrigues et al. [61] proposed RPA, which can also be used in cross-subject transfer of SSVEP-based BCIs. Since it has been introduced in Section III-A, it is not repeated here.

V-B Cross-Device Transfer

Nakanishi et al. [78] proposed a cross-device TL algorithm for reducing the calibration effort in an SSVEP-based BCI speller. It first computes a set of spatial filters by channel averaging, CCA, or task-related component analysis, and then concatenates them to form a filter matrix $W$. The average trial of Class $n$ of the source domain is computed and filtered by $W$ to obtain $Y_n$. Let $X$ be a single trial to be classified in the target domain. Its spatial filter matrix $\hat{V}_n$ is then computed by

$$\hat{V}_n = \arg\min_{V} \big\| V^\top X - Y_n \big\|_F^2 \quad (6)$$

i.e., $\hat{V}_n = \big(X X^\top\big)^{-1} X Y_n^\top$. Then, Pearson's correlation coefficient between $\hat{V}_n^\top X$ and $Y_n$ is computed as $r_{n,1}$, and the canonical correlation coefficient between $X$ and the computer-generated SSVEP model of Class $n$ is computed as $r_{n,2}$. The two feature values are combined as

$$\rho_n = \sum_{k=1}^{2} \mathrm{sign}(r_{n,k})\, r_{n,k}^2 \quad (7)$$

and the target class is identified as $\hat{n} = \arg\max_n \rho_n$.
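For context, the $r_{n,2}$-type features come from standard CCA-based SSVEP scoring against computer-generated sine/cosine reference models; a minimal illustrative sketch (not the cross-device method itself):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_references(freq, fs, n_samples, n_harmonics=3):
    """Sine/cosine reference signals at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    refs = [f(2 * np.pi * (h + 1) * freq * t)
            for h in range(n_harmonics) for f in (np.sin, np.cos)]
    return np.array(refs).T                  # (n_samples, 2 * n_harmonics)

def cca_score(trial, freq, fs):
    """Largest canonical correlation between a trial (channels x samples)
    and the reference model of one candidate stimulus frequency."""
    Y = ssvep_references(freq, fs, trial.shape[1])
    u, v = CCA(n_components=1).fit(trial.T, Y).transform(trial.T, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

# Classification: evaluate cca_score for every candidate stimulus frequency
# and output the frequency (class) with the largest correlation.
```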

To our knowledge, this is the only study on cross-device transfer in SSVEP-based BCIs.

VI TL in aBCI

Recently, there has been fast growing research interest in aBCIs [34, 35, 36, 37]. Emotions can be represented by discrete categories [79], e.g., happy, sad, angry, etc., and also by continuous values in the 2D space of arousal and valence [80], or in the 3D space of arousal, valence, and dominance [81]. So, there can be both classification and regression problems in aBCIs. However, the current literature has focused exclusively on classification problems.

Most studies used the publicly available DEAP [82] and SEED [9] datasets. DEAP consists of 32-channel EEGs recorded by a Biosemi ActiveTwo device from 32 subjects while they were watching minute-long music videos, whereas SEED consists of 62-channel EEGs recorded by an ESI NeuroScan device from 15 subjects while they were watching 4-minute movie clips. Using SEED, Zheng et al. [83] investigated whether there are stable EEG patterns for emotion recognition over time. Using differential entropy features, they found that stable patterns did exhibit consistency across sessions and subjects. Thus, it is possible to perform TL in aBCIs.

This section reviews the latest progress on TL in aBCIs. Because there were no publications on cross-task transfers in aBCIs, we do not have a “Cross-Task Transfer” subsection.

VI-A Cross-Subject/Session Transfer

Chai et al. [84] proposed adaptive subspace feature matching (ASFM) for cross-subject and cross-session transfer in offline and simulated online EEG-based emotion classification. Differential entropy features were again used. ASFM first performs PCA on the source domain and the target domain separately. Let $P_s$ ($P_t$) be the leading principal components of the source (target) domain, which form the corresponding subspace. Then, ASFM transforms the source domain subspace to $P_s P_s^\top P_t$ and projects the source data into it. The target data are projected directly onto $P_t$. In this way, the marginal distribution discrepancy between the two domains is reduced. Next, an iterative pseudo-label refinement strategy is used to train a logistic regression classifier using the labeled source domain samples and pseudo-labeled target domain samples, which can be directly applied to the unlabeled target domain samples.
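The subspace transformation in ASFM resembles classical subspace alignment; the NumPy sketch below is written under that assumption and omits the iterative pseudo-label refinement loop.

```python
import numpy as np
from sklearn.decomposition import PCA

def aligned_features(X_src, X_tgt, n_components=20):
    """Project source data through the aligned subspace Ps Ps^T Pt,
    and target data through its own subspace Pt."""
    Ps = PCA(n_components).fit(X_src).components_.T  # (d, k) source subspace
    Pt = PCA(n_components).fit(X_tgt).components_.T  # (d, k) target subspace
    Z_src = X_src @ Ps @ (Ps.T @ Pt)  # source projected into the aligned subspace
    Z_tgt = X_tgt @ Pt
    return Z_src, Z_tgt
```

A classifier trained on `Z_src` with the source labels can then be applied to `Z_tgt`.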

Lin and Jung [85] proposed a conditional TL (cTL) framework to facilitate positive cross-subject transfers in aBCI. Five differential laterality features (topoplots), corresponding to five different frequency bands, from each EEG channel are extracted. cTL first computes the classification accuracy by using the target subject’s data only, and performs transfer only if that accuracy is below the chance level. Then, it uses ReliefF [86] to select a few most emotion-relevant features in the target domain, calculates their correlations with the corresponding features in each source domain to select a few most similar (correlated) source subjects. Next, the target domain data and the selected source domain data are concatenated to train a classifier.

Lin et al. [87] proposed a robust PCA (RPCA) [88] based signal filtering strategy and validated its performance in cross-day binary emotion classification. RPCA decomposes an input matrix into the sum of a low-rank matrix and a sparse matrix. The former accounts for the relatively regular profiles of the input signals, whereas the latter accounts for its deviant events. Lin et al. showed that the RPCA-decomposed sparse signals filtered off the background EEG activity that contributed more to the inter-day variability, and predominantly captured the EEG oscillations of emotional responses, which behaved relatively consistently across days.

Li et al. [89] extracted nine types of time-frequency domain features (peak-peak mean, mean square value, variance, Hjorth activity, Hjorth mobility, Hjorth complexity, maximum power spectral frequency, maximum power spectral density, power sum) and nine types of nonlinear dynamical system features (approximate entropy, C0 complexity, correlation dimension, Kolmogorov entropy, Lyapunov exponent, permutation entropy, singular entropy, Shannon entropy, spectral entropy) from EEG measurements. Through automatic and manual feature selection, they verified the effectiveness and performance upper bounds of those features in cross-subject emotion classification on DEAP and SEED. They found that L1-norm penalty-based feature selection achieved robust performance on both datasets, and that Hjorth mobility in the beta rhythm achieved the highest mean classification accuracy.

Liu et al. [90] performed cross-day EEG-based emotion classification. Each of the 17 subjects watched 6-9 emotional movie clips in five different days over one month. Spectral powers in delta, theta, alpha, beta, low and high gamma bands were computed for each of the 60 channels as initial features, and then recursive feature elimination was used for feature selection. In cross-day classification, data from a subset of the five days were used by an SVM to classify data from the remaining days. They showed that EEG variability could impair the emotion classification performance dramatically, and using more days’ data in training could significantly improve the generalization performance.

Yang et al. [91] studied cross-subject emotion classification on DEAP and SEED. Ten linear and nonlinear features (Hjorth activity, Hjorth mobility, Hjorth complexity, standard deviation, PSD-alpha, PSD-beta, PSD-gamma, PSD-theta, sample entropy, and wavelet entropy) were extracted from each channel and concatenated. Then, sequential backward feature selection and a significance test were used for feature selection, and an RBF SVM was used as the classifier.

Li et al. [92] considered cross-subject EEG emotion classification in both supervised (the target subject has some labeled samples) and semi-supervised (the target subject has some labeled samples and also unlabeled samples) scenarios. We only briefly introduce their best-performing supervised approach here. Assume there are multiple source domains. They first performed source selection: train a classifier in each source domain and compute its classification accuracy on the labeled samples in the target domain. These accuracies were then sorted to select the top few source subjects. A style transfer mapping was learned between the target and each selected source. For each selected source subject, they performed SVM classification on his/her data, removed the support vectors (because they are close to the decision boundary and hence uncertain), performed $k$-means clustering on the remaining samples to obtain the prototypes, and mapped each labeled target domain feature vector $\mathbf{x}_i$ to the nearest prototype in the same class by solving

$$\min_{A, \mathbf{b}} \sum_{i=1}^{n} \big\| A \mathbf{x}_i + \mathbf{b} - \mathbf{t}_i \big\|_2^2 + \beta \|A - I\|_F^2 + \gamma \|\mathbf{b}\|_2^2 \quad (8)$$

where $\mathbf{t}_i$ is the nearest prototype in the same class of the source domain, and $\beta$ and $\gamma$ are hyper-parameters. A new unlabeled sample in the target domain is first mapped to each selected source domain and then classified by a classifier trained in the corresponding source domain. The classification results from all source domains are then weighted averaged, where the weights are determined by the accuracies of the source domain classifiers.
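Problem (8) is a regularized least-squares problem with a closed-form solution; a NumPy sketch (variable names are illustrative, and $\beta$, $\gamma$ are the hyper-parameters in (8)):

```python
import numpy as np

def style_transfer_mapping(X, T, beta=1.0, gamma=1.0):
    """Solve min_{A,b} sum_i ||A x_i + b - t_i||^2 + beta ||A - I||_F^2
    + gamma ||b||^2 in closed form.
    X: (n, d) labeled target samples; T: (n, d) matched source prototypes."""
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))]).T         # augmented data, (d+1, n)
    D = np.diag([beta] * d + [gamma])              # separate penalties on A and b
    W0 = np.hstack([np.eye(d), np.zeros((d, 1))])  # prior solution: A = I, b = 0
    W = (T.T @ Xa.T + W0 @ D) @ np.linalg.inv(Xa @ Xa.T + D)
    return W[:, :d], W[:, d]                       # A, b

# A target-domain sample x is then mapped into the source domain as A @ x + b.
```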

Deep learning has also gained popularity in aBCI.

Chai et al. [93] proposed a subspace alignment auto-encoder (SAAE) for cross-subject and cross-session transfer in EEG-based emotion classification. First, differential entropy features from both domains were transformed into a domain-invariant subspace using a stacked auto-encoder. Then, kernel PCA, graph regularization and maximum mean discrepancy were used to reduce the feature distribution discrepancy between the two domains. After that, a classifier trained in the source domain can be directly applied to the target domain.

Yin and Zhang [94] proposed an adaptive stacked denoising auto-encoder (SDAE) for cross-session binary classification of mental workload levels from EEG. The weights of the shallow hidden neurons of the SDAE were adaptively updated during the testing phase, using augmented testing samples and their pseudo-labels.

Zheng et al. [95] presented a multi-modal emotion recognition framework called EmotionMeter, which combines brain waves and eye movements to classify four emotions: fear, sadness, happiness, and neutrality. They adopted a bimodal deep auto-encoder to extract the shared representations of EEG and eye movements. Their experimental results demonstrated that modality fusion combining EEG and eye movements with multi-modal deep learning can significantly enhance the emotion recognition accuracy (85.11%), compared with a single modality (eye movements: 67.82%; EEG: 70.33%). Moreover, they investigated the complementary characteristics of EEG and eye movements for emotion recognition and evaluated the stability of EmotionMeter across sessions. They found that EEG and eye movements have important complementary characteristics, e.g., EEG has an advantage in classifying happiness (80% versus 67% for eye movements), whereas eye movements outperform EEG in recognizing fear (67% versus 65%). It is difficult to recognize fear using only EEG, and happiness using only eye movements. Sadness has the lowest classification accuracy for both modalities.

Fahimi et al. [96] performed cross-subject attention classification. They first trained a CNN by combining EEG data from all source subjects, and then fine-tuned it by using some calibration data from the target subject. The inputs were raw EEG, band-pass filtered EEG, and decomposed EEG (delta, theta, alpha, beta and gamma bands were used).

Li et al. [97] proposed R2G-STNN, which consists of spatial and temporal neural networks with a regional-to-global hierarchical feature learning process, to learn discriminative spatial-temporal EEG features for subject-independent emotion classification. To learn the spatial features, a bidirectional long short-term memory (LSTM) network was used to capture the intrinsic spatial relationships of EEG electrodes within and between different brain regions. A region-attention layer was also introduced to learn the weights of different brain regions. A domain discriminator working cooperatively with the classifier was used to reduce the domain shift between training and testing.

Li et al. [98] further proposed an improved bi-hemisphere domain adversarial neural network (BiDANN-S) for subject-independent emotion classification. Inspired by the neuroscience finding that the left and right hemispheres of the human brain respond asymmetrically to emotions, BiDANN-S uses a global and two local domain discriminators working adversarially with a classifier to learn discriminative emotional features for each hemisphere. To improve the generalization performance and to facilitate subject-independent EEG emotion classification, it also tries to reduce the possible domain differences in each hemisphere between the source and target domains, and to ensure that the extracted EEG features are robust to subject variations.

Li et al. [99] proposed a neural network model for cross-subject/session EEG emotion recognition, which does not need label information in the target domain. The neural network was optimized by minimizing the classification error in the source domain, while making the source and target domains similar in their latent representations. Adversarial training was used to adapt the marginal distributions in the early layers, and association reinforcement was performed to adapt the conditional distributions in the last few layers. In this way, it achieved joint distribution adaptation [100].

Song et al. [101] proposed dynamical graph convolutional neural networks (DGCNN) for subject-dependent and subject-independent emotion classification. Each EEG channel was represented as a node in DGCNN, and the differential entropy features from five frequency bands were used as inputs. After graph filtering, a $1 \times 1$ convolution layer learned the discriminative features among the five frequency bands. A ReLU activation function was adopted to ensure that the outputs of the graph filtering layer are non-negative. The outputs of the activation function were sent to a multi-layer fully connected network for classification.

Appriou et al. [102] compared several modern machine learning algorithms on subject-specific and subject-independent cognitive and emotional state classification, including Riemannian approaches and a CNN. They found that the CNN performed best in both subject-specific and subject-independent workload classification. A filter-bank tangent space classifier (FBTSC) was also proposed. It first filters EEG into several different frequency bands. For each band, it computes the covariance matrices of the EEG trials, projects them onto the tangent space at their mean, and then applies a Euclidean space classifier. FBTSC achieved the best performance in subject-specific emotion (valence and arousal) classification.

VI-B Cross-Device Transfer

Lan et al. [103] considered cross-dataset transfers between DEAP and SEED, which have different numbers of subjects, and were recorded using different EEG devices with different numbers of electrodes. They used only three trials (one positive, one neutral and one negative) from 14 selected subjects in DEAP, and only the 32 common channels between the two datasets. Five differential entropy features in five different frequency bands (delta, theta, alpha, beta and gamma) were extracted from each channel and concatenated as features. Experiments showed that domain adaptation, particularly transfer component analysis [104] and maximum independence domain adaptation [105], can effectively improve the classification accuracies over the baseline.

Lin [106] proposed RPCA-embedded TL to generate a personalized cross-day emotion classifier with less labeled data, while obviating intra- and inter-individual differences. The source dataset consisted of 12 subjects using a 14-channel Emotiv EPOC device, and the target dataset consisted of 26 different subjects using a 30-channel Neuroscan Quik-Cap. Twelve of the Quik-Cap channels were first selected to align with 12 of the 14 EPOC channels. The Quik-Cap EEG signals were also down-sampled and filtered to match those of EPOC. Five frequency band (delta, theta, alpha, beta, gamma) features from each of the six left-right channel pairs (e.g., AF3-AF4, F7-F8), four fronto-posterior pairs (e.g., AF3-O1, F7-P7) and the 12 selected channels were extracted, resulting in a 120D feature vector for each trial. As in [87], the sparse matrix from RPCA of the feature matrix was used as the final features. The Riemannian distance between the trials of each source subject and the target subject was computed as a dis-similarity measure to select the most similar source subjects, whose trials were combined with the trials from the target subject to train an SVM classifier.

Zheng et al. [107] considered an interesting cross-device (or cross-modality) and cross-subject TL scenario: using the target subject's eye tracking data to enhance the performance of cross-subject EEG-based affect classification. It is a three-step procedure. First, multiple individual emotion classifiers are trained for the source subjects. Second, a regression function is learned to model the relationship between the data distribution and the classifier parameters. Third, a target classifier is constructed using the target feature distribution and the distribution-to-classifier mapping. This heterogeneous TL approach achieved comparable performance with homogeneous EEG-based models and scanpath-based models. To our knowledge, this is the first study that transfers between two different types of signals.

Deep learning has also been used in cross-device transfers of aBCI. EEG trials are usually transformed to some sort of images before input to the deep learning model. In this way, EEG signals from different devices can be made consistent.

Siddharth et al. [108] performed multi-modality (e.g., EEG, ECG, face, etc) cross-dataset emotion classification, e.g., training on DEAP and testing on MAHNOB-HCI [109]. We only briefly introduce their EEG-based deep learning approach here, which worked for datasets with different numbers and placements of electrodes, different sampling rates, etc. EEG power spectral densities (PSDs) in the theta, alpha and beta bands were used to plot three topographies for each trial. Then, each topography was considered as one component of a color image and weighted by alpha blending to form the color image. In this way, one color image representing the topographic PSD was obtained for each trial, and the images obtained from different EEG devices could be directly combined or compared. A pre-trained VGG-16 network was used to extract 4,096 features from each image, whose dimensionality was later reduced to 30 by PCA. An extreme learning machine was used for the final classification.

Cimtay and Ekmekcioglu [110] used a pre-trained state-of-the-art CNN model, InceptionResnetV2, for cross-subject and cross-dataset transfers. Since InceptionResnetV2 requires the input size to be at least $75 \times 75 \times 3$, whereas the number of EEG channels is usually smaller than 75, they increased the number of channels by adding noisy copies of them (Gaussian random noise was used) to reach 75. This process was repeated three times so that each trial became a $75 \times T \times 3$ matrix ($T$ being the number of time domain samples), which was then used as the input to InceptionResnetV2. They also added a global average pooling layer and five dense layers after InceptionResnetV2 for classification.

VII TL in BCI Regression Problems

There are many important BCI regression problems, e.g., driver drowsiness estimation [38, 39, 40], vigilance estimation [12, 11, 111], user reaction time estimation [41], etc., which were not adequately addressed in previous reviews. This section fills this gap. Because there were no publications on cross-device and cross-task transfers in BCI regression problems, we do not have subsections on them.

VII-A Cross-Subject/Session Transfer

Wu et al. [40] proposed an online weighted adaptation regularization for regression (OwARR) algorithm to reduce the amount of subject-specific calibration data in EEG-based driver drowsiness estimation, and also a source domain selection approach to save about half of its computational cost. OwARR minimizes the following loss function, similar to wAR [26]:

$$f = \arg\min_f \sum_{i=1}^{n_s} \big( f(\mathbf{x}_i^s) - y_i^s \big)^2 + w_t \sum_{j=1}^{n_l} \big( f(\mathbf{x}_j^t) - y_j^t \big)^2 + \lambda_1 \big[ D_f(P_s, P_t) + D_f(Q_s, Q_t) \big] - \lambda_2\, \hat{r}^2\big(\mathbf{y}, f(\mathbf{x})\big) \quad (9)$$

where $\lambda_1$ and $\lambda_2$ are non-negative regularization parameters, and $w_t$ is the overall weight for target domain samples. $\hat{r}^2(\mathbf{y}, f(\mathbf{x}))$ approximates the sample Pearson correlation coefficient between $\mathbf{y}$ and $f(\mathbf{x})$. Fuzzy sets were used to define fuzzy classes, so that $D_f(Q_s, Q_t)$ can be efficiently computed. Briefly speaking, the first two terms in (9) minimize the sum of squared errors in the source domain and the target domain, respectively. The third term minimizes the distance between the marginal and conditional probability distributions of the two domains. The last term maximizes the approximated sample Pearson correlation coefficient between $\mathbf{y}$ and $f(\mathbf{x})$, so that the regression output cannot be a constant. Wu et al. showed that OwARR and OwARR with source domain selection achieved significantly smaller estimation errors than several other approaches in cross-subject transfer.

Jiang et al. [39] further extended OwARR to multi-view learning, where the first view comprised the theta band powers from all channels, and the second view converted the first view into dBs and removed some bad channels. A Takagi-Sugeno-Kang (TSK) fuzzy system was used as the regression model, optimized by minimizing (9) for both views simultaneously, with an additional term enforcing consistency between the two views (the estimate from one view should be close to that from the other). They demonstrated that the proposed approach outperformed a domain adaptation with model fusion approach [112] in cross-subject transfers.
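The consistency term can be sketched as below (a toy illustration reusing `owarr_loss` from the previous sketch; the weight `gamma` and the tuple layout are assumptions):

```python
import numpy as np

def multiview_loss(f1, f2, view1, view2, gamma=1.0, **kw):
    """view1/view2: (Xs, ys, Xt, yt) tuples holding the same trials in two
    feature representations; f1/f2: the per-view regression models."""
    Xs1, ys, Xt1, yt = view1
    Xs2, _, Xt2, _ = view2
    consistency = (np.sum((f1(Xs1) - f2(Xs2)) ** 2)     # views should agree
                   + np.sum((f1(Xt1) - f2(Xt2)) ** 2))
    return (owarr_loss(f1, Xs1, ys, Xt1, yt, **kw)
            + owarr_loss(f2, Xs2, ys, Xt2, yt, **kw)
            + gamma * consistency)
```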

Wei et al. [113] also performed cross-subject driver drowsiness estimation. Their procedure consisted of three steps. 1) Ranking: for each source subject, six distance measures (Euclidean distance, correlation distance, Chebyshev distance, cosine distance, Kullback-Leibler divergence, and a transferability-based distance) were computed between his/her own alert-baseline (the first 10 trials) power distribution and every other source subject's distribution, together with the cross-subject model performance (XP), i.e., how well each other source subject's model transfers to the current subject. A support vector regression (SVR) model was then trained to predict XP from the distance measures, so that, given a target subject with a few calibration trials, the XPs of the source subjects can be predicted and ranked. 2) Fusion: a weighted average was used to combine the source models, where the weights were determined from a modified logistic function optimized on the source subjects. 3) Re-calibration: an offset, estimated as the median prediction over the initial 10 calibration trials (i.e., the alert baseline) of the target subject, was subtracted from the weighted average. They showed that this approach can reduce the calibration time by 90% in driver drowsiness index estimation.
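Schematically, the three steps might look as follows (a sketch under stated assumptions: only four of the six distance measures are shown, and the modified logistic weighting is reduced to a plain logistic):

```python
import numpy as np
from scipy.spatial.distance import chebyshev, cosine, euclidean
from scipy.stats import entropy
from sklearn.svm import SVR

def distances(p, q):
    """Distance measures between two alert-baseline power distributions."""
    return [euclidean(p, q), chebyshev(p, q), cosine(p, q),
            entropy(p + 1e-12, q + 1e-12)]            # KL divergence

def fit_xp_predictor(baselines, xp):
    """Step 1 (ranking): learn to predict the cross-subject performance (XP)
    of source model j on subject i from their baseline distances."""
    pairs = [(i, j) for i in range(len(baselines))
             for j in range(len(baselines)) if i != j]
    X = [distances(baselines[i], baselines[j]) for i, j in pairs]
    y = [xp[i][j] for i, j in pairs]
    return SVR().fit(X, y)

def fused_prediction(models, predicted_xp, x):
    """Step 2 (fusion): logistic weights from predicted XP, then average."""
    w = 1.0 / (1.0 + np.exp(-np.asarray(predicted_xp)))
    w /= w.sum()
    return sum(wi * m(x) for wi, m in zip(w, models))

def recalibrate(pred, baseline_preds):
    """Step 3 (re-calibration): subtract the median over the alert baseline."""
    return pred - np.median(baseline_preds)
```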

Chen et al. [114] integrated feature selection and an adaptation regularization based TL (ARTL) [48] classifier for cross-subject driver status classification. The most novel part is the feature selection, which extends the traditional ReliefF [86] and minimum redundancy maximum relevance (mRMR) [115] algorithms to class separation and domain fusion (CSDF)-ReliefF and CSDF-mRMR. These consider both the class separability and the domain similarity, i.e., the selected feature subset should simultaneously maximize the distinction among different classes and minimize the difference among different domains. The feature ranks from the different feature selection algorithms were then fused to identify the best feature set, which was used in ARTL for classification.
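A conceptual sketch of the CSDF idea applied to mRMR is given below (the exact score combination and weights in [114] may differ; here, mutual information with the domain label serves as the domain-difference penalty):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def csdf_mrmr(X, y, domains, k=10, beta=1.0):
    """Greedily pick k features maximizing class relevance while penalizing
    redundancy with already-selected features and dependence on the domain."""
    relevance = mutual_info_classif(X, y)          # class separability
    domain_dep = mutual_info_classif(X, domains)   # domain difference (penalized)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(f):
            red = (np.mean([abs(np.corrcoef(X[:, f], X[:, s])[0, 1])
                            for s in selected]) if selected else 0.0)
            return relevance[f] - red - beta * domain_dep[f]
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```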

Deep learning has also been used in BCI regression problems.

Ming et al. [116] proposed a stacked differentiable neural computer and demonstrated its effectiveness in cross-subject EEG-based mental load estimation and reaction time estimation. The original Long Short-Term Memory network controller in differentiable neural computers was replaced by a recurrent convolutional network controller, and the memory accessing structures were also adjusted for processing EEG topographic data.

Cui et al. [38] proposed a subject-independent TL approach, feature weighted episodic training (FWET), to completely eliminate the calibration requirement in cross-subject transfer in EEG-based driver drowsiness estimation. It integrates feature weighting, to learn the importance of different features, and episodic training, for domain generalization. Episodic training considers the conditional distribution $P(y|\mathbf{x})$ directly, and trains a regression network that aligns $P(y|\mathbf{x})$ in all source domains, which usually generalizes well to the unseen target domain. It first establishes a subject-specific feature transformation model $f_i$ and a subject-specific regression model $r_i$ for each source Subject $i$ to learn the domain-specific information, then trains a feature transformation model $f$ that makes the transformed features from Subject $i$ still perform well when applied to the regressor trained on Subject $j$ ($j\neq i$). The overall loss function of episodic training, when Subject $i$'s data $\{(\mathbf{x}_k^i, y_k^i)\}_{k=1}^{n_i}$ are fed into Subject $j$'s regressor, is:

$$\ell_{i,j}=\frac{1}{n_i}\sum_{k=1}^{n_i}\Big(y_k^i-\bar{r}_j\big(f(\mathbf{x}_k^i)\big)\Big)^2 \qquad (10)$$

where $\bar{r}_j$ means that $r_j$ is not updated during back-propagation. Once the optimal $f$ and $r$ are obtained, the prediction for a new trial $\mathbf{x}$ is $\hat{y}=r(f(\mathbf{x}))$.
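In PyTorch, the episodic loss (10) can be sketched as below (the network sizes, the number of subjects, and the omission of the feature-weighting layer are placeholder assumptions; the subject-specific regressors are assumed to be pre-trained):

```python
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(30, 30), nn.ReLU())   # shared transformation f
regressors = [nn.Linear(30, 1) for _ in range(5)]    # pre-trained r_j per subject
for r in regressors:
    for p in r.parameters():
        p.requires_grad_(False)                       # the bar over r_j in (10)

def episodic_loss(x_i, y_i, j):
    """Feed Subject i's batch through f, then through Subject j's frozen
    regressor; gradients update only f."""
    y_hat = regressors[j](feat(x_i)).squeeze(-1)
    return torch.mean((y_hat - y_i) ** 2)
```

Freezing $r_j$ while back-propagating into $f$ is what forces the transformed features, rather than the regressor, to carry the cross-subject alignment.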

VIII TL in Adversarial Attacks of EEG-Based BCIs

Adversarial attacks of EEG-based BCIs represent one of the latest developments in BCIs. They were first studied by Zhang and Wu [42], who found that adversarial perturbations, i.e., deliberately designed small perturbations that are difficult for human eyes to notice, can be added to normal EEG trials to fool the machine learning model and cause dramatic performance degradation. Both traditional machine learning models and deep learning models, and both classifiers and regression models in EEG-based BCIs, can be attacked.

Adversarial attacks can target different components of a machine learning model, e.g., the training data, model parameters, test data, and test output, as shown in Fig. 2. So far, only adversarial examples (benign examples contaminated by adversarial perturbations) targeting the test inputs have been investigated in EEG-based BCIs, so this section considers only adversarial examples.

Fig. 2: Attack strategies to different components of a machine learning model.

A more detailed illustration of the adversarial example attack scheme is shown in Fig. 3. A jamming module is injected between signal processing and machine learning to generate adversarial examples.

Fig. 3: Adversarial example generation scheme [42].

Table I shows the three attack types in EEG-based BCIs. White-box attacks know all information about the victim model, including its architecture and parameters, and hence are the easiest to perform. Gray-box attacks know some but not all information about the victim model, e.g., its training data but not its architecture or parameters. Black-box attacks know nothing about the victim model; they can only supply inputs to it and observe its outputs, and hence are the most challenging to perform.

Victim model information    White-Box   Gray-Box   Black-Box
Know its architecture           ✓
Know its parameters             ✓
Know its training data          ✓           ✓
Can observe its response        ✓           ✓           ✓
TABLE I: Summary of the three attack types in EEG-based BCIs [42].

VIII-A Cross-Model Attacks

Different from the cross-subject/session/device/task TL scenarios considered in the previous five sections, adversarial attacks in BCIs have so far mainly considered cross-model attacks, where adversarial examples generated from one machine learning model are used to attack another model. (Existing publications [42, 43] also considered cross-subject attacks, but the meaning of cross-subject in adversarial attacks differs from the cross-subject TL setting in previous sections: there, cross-subject means that the same machine learning model is used by all subjects, and the adversarial example generation scheme is designed on some subjects and then applied to another; it assumes that the victim model works well for all subjects.) Cross-model attacks are necessary in gray-box and black-box attacks, because the victim model is unknown, and the attacker needs to construct his/her own model (called a substitute model) to approximate the victim model.

Interestingly, cross-model attacks can be performed without explicitly considering TL. They are usually achieved by exploiting the transferability of adversarial examples [117], i.e., adversarial examples generated by one machine learning model can also, with high probability, fool another model, even when the two models are different. The fundamental reason behind this property is still unclear, but that does not prevent people from making use of it.

For example, Zhang and Wu [42] proposed unsupervised fast gradient sign methods, which can effectively perform white-box, gray-box and black-box attacks on deep learning classifiers. Two BCI paradigms, i.e., MI and ERP, and three popular deep learning models, i.e., EEGNet, Deep ConvNet and Shallow ConvNet, were considered.
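As a flavor of the technique, a standard (supervised, untargeted) fast gradient sign step is sketched below; note that this is the generic FGSM, while the variant in [42] is unsupervised, i.e., it does not require the true labels:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """x: (batch, 1, channels, samples) EEG trials; y: true labels.
    Returns adversarial examples one eps-step along the loss gradient sign."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # move each sample in the direction that increases the classification loss
    return (x_adv + eps * x_adv.grad.sign()).detach()
```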

IX Conclusions

This paper has reviewed recently proposed TL approaches in EEG-based BCIs, according to six different paradigms and applications: MI, ERP, SSVEP, aBCI, regression problems, and adversarial attacks. TL algorithms were grouped into cross-subject/session, cross-device and cross-task settings and introduced separately. Connections among similar approaches were also pointed out.

The following observations and conclusions can be made, which may point to some future research directions:

  1. Among the three classic BCI paradigms (MI, ERP and SSVEP), SSVEP seems to have received the least attention: very few TL approaches have been proposed for it recently. One reason may be that MI and ERP classification pipelines are very similar, so many TL approaches developed for MI can be applied to ERP directly or with little modification, e.g., RA, EA, RPA and EEGNet, whereas SSVEP is a quite different paradigm.

  2. Two new applications of EEG-based BCIs, i.e., aBCI and regression problems, have been attracting more and more research interest. Interestingly, both are passive BCIs [118]. Although both classification and regression problems can be formulated in aBCI, existing research has focused almost exclusively on classification.

  3. Adversarial attacks, one of the latest developments in EEG-based BCIs, can be performed across different machine learning models by utilizing the transferability of adversarial examples. However, explicitly considering TL between different domains may further improve the attack performance. For example, in black-box attacks, TL could make use of publicly available datasets to reduce the number of queries to the victim model, or, in other words, to better approximate the victim model given the same number of queries.

  4. Most TL studies focused on cross-subject/session transfers. Cross-device transfers have started to attract attention, but cross-task transfers remain largely unexplored. To our knowledge, there has been only one such study [50] since 2016. Effective cross-device and cross-task transfers would make EEG-based BCIs much more practical.

  5. Among the various TL approaches, Riemannian geometry and deep learning are emerging and gaining momentum; a cluster of approaches has been proposed around each.

  6. Although most research on TL in BCIs focused on the classifier or the regression model, i.e., at the pattern recognition stage, TL in BCIs can also be performed in trial alignment, e.g., RA, EA, LA and RPA, in signal filtering, e.g., transfer kernel common spatial patterns [51], and in feature extraction/selection, e.g., CSDF-ReliefF and CSDF-mRMR [114]. Additionally, these TL-based individual components can also be assembled into a complete machine learning pipeline to achieve even better performance. For example, EA and LA data alignments have been combined with TL classifiers [60, 50], and CSDF-ReliefF and CSDF-mRMR feature selection approaches have also been integrated with TL classifiers [114].

  7. TL can also be integrated with other machine learning approaches, e.g., active learning [24], for improved performance [25, 26].

References

  • [1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-computer interfaces for communication and control,” Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
  • [2] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, “Brain-computer interface technologies in the coming decades,” Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
  • [3] J. J. Vidal, “Toward direct brain-computer communication,” Annual Review of Biophysics and Bioengineering, vol. 2, no. 1, pp. 157–180, 1973.
  • [4] E. E. Fetz, “Operant conditioning of cortical unit activity,” Science, vol. 163, no. 3870, pp. 955–958, 1969.
  • [5] J. M. R. Delgado, Physical control of the mind: Toward a psychocivilized society.   New York City: World Bank Publications, 1969, vol. 41.
  • [6] G. Pfurtscheller, G. R. Müller-Putz, R. Scherer, and C. Neuper, “Rehabilitation with brain-computer interface systems,” Computer, vol. 41, no. 10, pp. 58–65, 2008.
  • [7] J. van Erp, F. Lotte, and M. Tangermann, “Brain-computer interfaces: Beyond medical applications,” Computer, vol. 45, no. 4, pp. 26–34, 2012.
  • [8] D. Marshall, D. Coyle, S. Wilson, and M. Callaghan, “Games, gameplay, and BCI: The state of the art,” IEEE Trans. on Computational Intelligence and AI in Games, vol. 5, no. 2, pp. 82–99, 2013.
  • [9] W.-L. Zheng and B.-L. Lu, “Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks,” IEEE Trans. on Autonomous Mental Development, vol. 7, no. 3, pp. 162–175, 2015.
  • [10] T. G. Monteiro, C. Skourup, and H. Zhang, “Using EEG for mental fatigue assessment: A comprehensive look into the current state of the art,” IEEE Trans. on Human-Machine Systems, vol. 49, no. 6, pp. 599–610, 2019.
  • [11] W.-L. Zheng and B.-L. Lu, “A multimodal approach to estimating vigilance using EEG and forehead EOG,” Journal of Neural Engineering, vol. 14, no. 2, p. 026017, 2017.
  • [12] L.-C. Shi and B.-L. Lu, “EEG-based vigilance estimation using extreme learning machines,” Neurocomputing, vol. 102, pp. 135–143, 2013.
  • [13] R. P. Rao, Brain-Computer Interfacing: An Introduction.   New York, NY: Cambridge University Press, 2013.
  • [14] L.-D. Liao, C.-T. Lin, K. McDowell, A. Wickenden, K. Gramann, T.-P. Jung, L.-W. Ko, and J.-Y. Chang, “Biosensor technologies for augmented brain-computer interfaces in the next decades,” Proc. of the IEEE, vol. 100, no. 2, pp. 1553–1566, 2012.
  • [15] S. Makeig, C. Kothe, T. Mullen, N. Bigdely-Shamlo, Z. Zhang, and K. Kreutz-Delgado, “Evolving signal processing for brain-computer interfaces,” Proc. of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567–1584, 2012.
  • [16] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
  • [17] S. Makeig, A. J. Bell, T.-P. Jung, and T. J. Sejnowski, “Independent component analysis of electroencephalographic data,” in Proc. Advances in Neural Information Processing Systems, Denver, CO, Dec. 1996, pp. 145–151.
  • [18] T.-P. Jung, S. Makeig, C. Humphries, T.-W. Lee, M. J. Mckeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, no. 2, pp. 163–178, 2000.
  • [19] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, “xDAWN algorithm to enhance evoked potentials: application to brain-computer interface,” IEEE Trans. on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
  • [20] X.-W. Wang, D. Nie, and B.-L. Lu, “Emotional state classification from EEG data using machine learning approach,” Neurocomputing, vol. 129, pp. 94–106, 2014.
  • [21] F. Yger, M. Berar, and F. Lotte, “Riemannian approaches in brain-computer interfaces: a review,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 10, pp. 1753–1762, 2017.
  • [22] X. Wu, W.-L. Zheng, and B.-L. Lu, “Identifying functional brain connectivity patterns for EEG-based emotion recognition,” in Proc. 9th IEEE/EMBS Int’l Conf. on Neural Engineering, San Francisco, CA, Mar. 2019, pp. 235–238.
  • [23] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
  • [24] B. Settles, “Active learning literature survey,” University of Wisconsin–Madison, Computer Sciences Technical Report 1648, 2009.
  • [25] D. Wu, B. J. Lance, and T. D. Parsons, “Collaborative filtering for brain-computer interaction using transfer learning and active class selection,” PLoS ONE, 2013.
  • [26] D. Wu, V. J. Lawhern, W. D. Hairston, and B. J. Lance, “Switching EEG headsets made easy: Reducing offline calibration effort using active weighted adaptation regularization,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 24, no. 11, pp. 1125–1137, 2016.
  • [27] G. Pfurtscheller and C. Neuper, “Motor imagery and direct brain-computer communication,” Proc. of the IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
  • [28] T. C. Handy, Ed., Event-related potentials: A methods handbook.   Boston, MA: The MIT Press, 2005.
  • [29] S. Lees, N. Dayan, H. Cecotti, P. McCullagh, L. Maguire, F. Lotte, and D. Coyle, “A review of rapid serial visual presentation-based brain–computer interfaces,” Journal of Neural Engineering, vol. 15, no. 2, p. 021001, 2018.
  • [30] S. Sutton, M. Braren, J. Zubin, and E. John, “Evoked-potential correlates of stimulus uncertainty,” Science, vol. 150, no. 3700, pp. 1187–1188, 1965.
  • [31] O. Friman, I. Volosyak, and A. Graser, “Multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces,” IEEE Trans. on Biomedical Engineering, vol. 54, no. 4, pp. 742–750, 2007.
  • [32] F. Beverina, G. Palmas, S. Silvoni, F. Piccione, S. Giove et al., “User adaptive BCIs: SSVEP and P300 based interfaces.” PsychNology Journal, vol. 1, no. 4, pp. 331–354, 2003.
  • [33] X. Chen, Y. Wang, M. Nakanishi, X. Gao, T.-P. Jung, and S. Gao, “High-speed spelling with a noninvasive brain-computer interface,” Proc. National Academy of Sciences, vol. 112, no. 44, pp. E6058–E6067, 2015.
  • [34] C. Mühl, B. Allison, A. Nijholt, and G. Chanel, “A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges,” Brain-Computer Interfaces, vol. 1, no. 2, pp. 66–84, 2014.
  • [35] A. Al-Nafjan, M. Hosny, Y. Al-Ohali, and A. Al-Wabil, “Review and classification of emotion recognition based on EEG brain-computer interface system research: a systematic review,” Applied Sciences, vol. 7, no. 12, p. 1239, 2017.
  • [36] Y.-W. Shen and Y.-P. Lin, “Challenge for affective brain-computer interfaces: Non-stationary spatio-spectral EEG oscillations of emotional responses,” Frontiers in Human Neuroscience, vol. 13, 2019.
  • [37] S. M. Alarcao and M. J. Fonseca, “Emotions recognition using EEG signals: A survey,” IEEE Trans. on Affective Computing, vol. 10, no. 3, pp. 374–393, 2019.
  • [38] Y. Cui, Y. Xu, and D. Wu, “EEG-based driver drowsiness estimation using feature weighted episodic training,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 27, no. 11, pp. 2263–2273, 2019.
  • [39] Y. Jiang, Y. Zhang, C. Lin, D. Wu, and C.-T. Lin, “EEG-based driver drowsiness estimation using an online multi-view and transfer TSK fuzzy system,” IEEE Trans. on Intelligent Transportation Systems, 2020, in press.
  • [40] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR),” IEEE Trans. on Fuzzy Systems, vol. 25, no. 6, pp. 1522–1535, 2017.
  • [41] D. Wu, V. J. Lawhern, B. J. Lance, S. Gordon, T.-P. Jung, and C.-T. Lin, “EEG-based user reaction time estimation using Riemannian geometry features,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 11, pp. 2157–2168, 2017.
  • [42] X. Zhang and D. Wu, “On the vulnerability of CNN classifiers in EEG-based BCIs,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 27, no. 5, pp. 814–825, 2019.
  • [43] L. Meng, C.-T. Lin, T.-P. Jung, and D. Wu, “White-box target attack for EEG-based BCI regression problems,” in Proc. Int’l Conf. on Neural Information Processing, Sydney, Australia, Dec. 2019.
  • [44] P. Wang, J. Lu, B. Zhang, and Z. Tang, “A review on transfer learning for brain-computer interface classification,” in Proc. 5th Int’l Conf. on Information Science and Technology, Changsha, China, Apr. 2015, pp. 315–322.
  • [45] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup, “Transfer learning in brain-computer interfaces,” IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 20–31, 2016.
  • [46] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, “A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update,” Journal of Neural Engineering, vol. 15, no. 3, p. 031005, 2018.
  • [47] A. M. Azab, J. Toth, L. S. Mihaylova, and M. Arvaneh, “A review on transfer learning approaches in brain–computer interface,” in Signal Processing and Machine Learning for Brain-Machine Interfaces, T. Tanaka and M. Arvaneh, Eds.   The Institution of Engineering and Technology, 2018, pp. 81–98.
  • [48] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, “Adaptation regularization: A general framework for transfer learning,” IEEE Trans. on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
  • [49] Y. Zhang and Q. Yang, “An overview of multi-task learning,” National Science Review, vol. 5, no. 1, pp. 30–43, 2018.
  • [50] H. He and D. Wu, “Different set domain adaptation for brain-computer interfaces: A label alignment approach,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, 2020, in press. [Online]. Available: https://arxiv.org/abs/1912.01166
  • [51] M. Dai, D. Zheng, S. Liu, and P. Zhang, “Transfer kernel common spatial patterns for motor imagery brain-computer interface classification,” Computational and Mathematical Methods in Medicine, vol. 2018, 2018.
  • [52] H. Albalawi and X. Song, “A study of kernel CSP-based motor imagery brain computer interface classification,” in Proc. IEEE Signal Processing in Medicine and Biology Symposium, New York City, NY, Dec. 2012, pp. 1–4.
  • [53] M. Long, J. Wang, J. Sun, and P. S. Yu, “Domain invariant transfer kernel learning,” IEEE Trans. on Knowledge and Data Engineering, vol. 27, no. 6, pp. 1519–1532, 2015.
  • [54] A. M. Azab, L. Mihaylova, K. K. Ang, and M. Arvaneh, “Weighted transfer learning for improving motor imagery-based brain–computer interface,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 27, no. 7, pp. 1352–1359, 2019.
  • [55] I. Hossain, A. Khosravi, I. Hettiarachchi, and S. Nahavandi, “Multiclass informative instance transfer learning framework for motor imagery-based brain-computer interface,” Computational Intelligence and Neuroscience, vol. 2018, 2018.
  • [56] D. Wu, B. J. Lance, and V. J. Lawhern, “Transfer learning and active transfer learning for reducing calibration data in single-trial classification of visually-evoked potentials,” in Proc. IEEE Int’l Conf. on Systems, Man, and Cybernetics, San Diego, CA, October 2014.
  • [57] P. Zanini, M. Congedo, C. Jutten, S. Said, and Y. Berthoumieu, “Transfer learning: a Riemannian geometry framework with applications to brain-computer interfaces,” IEEE Trans. on Biomedical Engineering, vol. 65, no. 5, pp. 1107–1116, 2018.
  • [58] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Multiclass brain-computer interface classification by Riemannian geometry,” IEEE Trans. on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
  • [59] O. Yair, M. Ben-Chen, and R. Talmon, “Parallel transport on the cone manifold of SPD matrices for domain adaptation,” IEEE Trans. on Signal Processing, vol. 67, no. 7, pp. 1797–1811, 2019.
  • [60] H. He and D. Wu, “Transfer learning for brain-computer interfaces: A Euclidean space data alignment approach,” IEEE Trans. on Biomedical Engineering, vol. 67, no. 2, pp. 399–410, 2020.
  • [61] P. L. C. Rodrigues, C. Jutten, and M. Congedo, “Riemannian procrustes analysis: Transfer learning for brain–computer interfaces,” IEEE Trans. on Biomedical Engineering, vol. 66, no. 8, pp. 2390–2401, 2019.
  • [62] W. Zhang and D. Wu, “Manifold embedded knowledge transfer for brain-computer interfaces,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, 2020, in press.
  • [63] A. Singh, S. Lal, and H. W. Guesgen, “Small sample motor imagery classification using regularized Riemannian features,” IEEE Access, vol. 7, pp. 46 858–46 869, 2019.
  • [64] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Riemannian geometry applied to BCI classification,” in Proc. Int’l Conf. on Latent Variable Analysis and Signal Separation, St. Malo, France, Sep. 2010, pp. 629–636.
  • [65] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
  • [66] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter bank common spatial pattern (FBCSP) in brain-computer interface,” in Proc. IEEE World Congress on Computational Intelligence, Hong Kong, June 2008, pp. 2390–2397.
  • [67] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces,” Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
  • [68] O. Kwon, M. Lee, C. Guan, and S. Lee, “Subject-independent brain-computer interfaces based on deep convolutional neural networks,” IEEE Trans. on Neural Networks and Learning Systems, 2020, in press.
  • [69] L. Xu, M. Xu, Y. Ke, X. An, S. Liu, and D. Ming, “Cross-dataset variability problem in EEG decoding with deep learning,” Frontiers in Human Neuroscience, vol. 14, p. 103, 2020.
  • [70] N. R. Waytowich, V. J. Lawhern, A. W. Bohannon, K. R. Ball, and B. J. Lance, “Spectral transfer learning using Information Geometry for a user-independent brain-computer interface,” Frontiers in Neuroscience, vol. 10, p. 430, 2016.
  • [71] F. Parisi, F. Strino, B. Nadler, and Y. Kluger, “Ranking and combining multiple predictors without labeled data,” Proc. National Academy of Sciences, vol. 111, no. 4, pp. 1253–1258, 2014.
  • [72] D. Wu, “Online and offline domain adaptation for reducing BCI calibration effort,” IEEE Trans. on Human-Machine Systems, vol. 47, no. 4, pp. 550–563, 2017.
  • [73] H. Qi, Y. Xue, L. Xu, Y. Cao, and X. Jiao, “A speedy calibration method using Riemannian geometry measurement and other-subject samples on a P300 speller,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 26, no. 3, pp. 602–608, 2018.
  • [74] J. Jin, S. Li, I. Daly, Y. Miao, C. Liu, X. Wang, and A. Cichocki, “The study of generic model set for reducing calibration time in P300-based brain–computer interface,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 28, no. 1, pp. 3–12, 2020.
  • [75] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Advances in Neural Information Processing Systems, Montreal, Canada, Dec. 2014, pp. 2672–2680.
  • [76] Y. Ming, D. Pelusi, W. Ding, Y.-K. Wang, M. Prasad, D. Wu, and C.-T. Lin, “Subject adaptation network for EEG data analysis,” Applied Soft Computing, vol. 84, p. 105689, 2019.
  • [77] N. Waytowich, V. J. Lawhern, J. O. Garcia, J. Cummings, J. Faller, P. Sajda, and J. M. Vettel, “Compact convolutional neural networks for classification of asynchronous steady-state visual evoked potentials,” Journal of Neural Engineering, vol. 15, no. 6, p. 066031, 2018.
  • [78] M. Nakanishi, Y.-K. Wang, C.-S. Wei, K. Chiang, and T.-P. Jung, “Facilitating calibration in high-speed BCI spellers via leveraging cross-device shared latent responses,” IEEE Trans. on Biomedical Engineering, vol. 67, no. 4, pp. 1105–1113, 2020.
  • [79] P. Ekman and W. Friesen, “Constants across cultures in the face and emotion,” Journal of Personality and Social Psychology, vol. 17, pp. 124–129, 1971.
  • [80] J. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980.
  • [81] A. Mehrabian, Basic Dimensions for a General Psychological Theory: Implications for Personality, Social, Environmental, and Developmental Studies.   Oelgeschlager, Gunn & Hain, 1980.
  • [82] S. Koelstra, C. Muhl, M. Soleymani, J. S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “DEAP: A database for emotion analysis using physiological signals,” IEEE Trans. on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
  • [83] W.-L. Zheng, J. Zhu, and B.-L. Lu, “Identifying stable patterns over time for emotion recognition from EEG,” IEEE Trans. on Affective Computing, vol. 10, no. 3, pp. 417–429, 2019.
  • [84] X. Chai, Q. Wang, Y. Zhao, Y. Li, D. Liu, X. Liu, and O. Bai, “A fast, efficient domain adaptation technique for cross-domain electroencephalography (EEG)-based emotion recognition,” Sensors, vol. 17, no. 5, p. 1014, 2017.
  • [85] Y.-P. Lin and T.-P. Jung, “Improving EEG-based emotion classification using conditional transfer learning,” Frontiers in Human Neuroscience, vol. 11, p. 334, 2017.
  • [86] I. Kononenko, “Estimating attributes: Analysis and extensions of RELIEF,” in Proc. European Conf. on Machine Learning, Catania, Italy, Apr. 1994, pp. 171–182.
  • [87] Y.-P. Lin, P.-K. Jao, and Y.-H. Yang, “Improving cross-day EEG-based emotion classification using robust principal component analysis,” Frontiers in Computational Neuroscience, vol. 11, p. 64, 2017.
  • [88] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, 2011.
  • [89] X. Li, D. Song, P. Zhang, Y. Zhang, Y. Hou, and B. Hu, “Exploring EEG features in cross-subject emotion recognition,” Frontiers in Neuroscience, vol. 12, p. 162, 2018.
  • [90] S. Liu, L. Chen, D. Guo, X. Liu, Y. Sheng, Y. Ke, M. Xu, X. An, J. Yang, and D. Ming, “Incorporation of multiple-days information to improve the generalization of EEG-based emotion recognition over time,” Frontiers in Human Neuroscience, vol. 12, p. 267, 2018.
  • [91] F. Yang, X. Zhao, W. Jiang, P. Gao, and G. Liu, “Multi-method fusion of cross-subject emotion recognition based on high-dimensional EEG features,” Frontiers in Computational Neuroscience, vol. 13, 2019.
  • [92] J. Li, S. Qiu, Y. Shen, C. Liu, and H. He, “Multisource transfer learning for cross-subject EEG emotion recognition,” IEEE Trans. on Cybernetics, 2020, in press.
  • [93] X. Chai, Q. Wang, Y. Zhao, X. Liu, O. Bai, and Y. Li, “Unsupervised domain adaptation techniques based on auto-encoder for non-stationary EEG-based emotion recognition,” Computers in Biology and Medicine, vol. 79, pp. 205–214, 2016.
  • [94] Z. Yin and J. Zhang, “Cross-session classification of mental workload levels using EEG and an adaptive deep learning model,” Biomedical Signal Processing and Control, vol. 33, pp. 30–47, 2017.
  • [95] W.-L. Zheng, W. Liu, Y. Lu, B.-L. Lu, and A. Cichocki, “EmotionMeter: A multimodal framework for recognizing human emotions,” IEEE Trans. on Cybernetics, vol. 49, no. 3, pp. 1110–1122, 2019.
  • [96] F. Fahimi, Z. Zhang, W. B. Goh, T.-S. Lee, K. K. Ang, and C. Guan, “Inter-subject transfer learning with an end-to-end deep convolutional neural network for EEG-based BCI,” Journal of Neural Engineering, vol. 16, no. 2, p. 026007, 2019.
  • [97] Y. Li, W. Zheng, L. Wang, Y. Zong, and Z. Cui, “From regional to global brain: A novel hierarchical spatial-temporal neural network model for EEG emotion recognition,” IEEE Trans. on Affective Computing, 2020, in press.
  • [98] Y. Li, W. Zheng, Y. Zong, Z. Cui, T. Zhang, and X. Zhou, “A bi-hemisphere domain adversarial neural network model for EEG emotion recognition,” IEEE Trans. on Affective Computing, 2020, in press.
  • [99] J. Li, S. Qiu, C. Du, Y. Wang, and H. He, “Domain adaptation for EEG emotion recognition based on latent representation similarity,” IEEE Trans. on Cognitive and Developmental Systems, 2020, in press.
  • [100] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in Proc. IEEE Int’l Conf. on Computer Vision, Sydney, Australia, Dec. 2013, pp. 2200–2207.
  • [101] T. Song, W. Zheng, P. Song, and Z. Cui, “EEG emotion recognition using dynamical graph convolutional neural networks,” IEEE Trans. on Affective Computing, 2020, in press.
  • [102] A. Appriou, A. Cichocki, and F. Lotte, “Modern machine learning algorithms to classify cognitive and affective states from electroencephalography signals,” IEEE Systems, Man and Cybernetics Magazine, 2020.
  • [103] Z. Lan, O. Sourina, L. Wang, R. Scherer, and G. R. Muller-Putz, “Domain adaptation techniques for EEG-based emotion recognition: A comparative study on two public datasets,” IEEE Trans. on Cognitive and Developmental Systems, vol. 11, no. 1, pp. 85–94, 2019.
  • [104] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Trans. on Neural Networks, vol. 22, no. 2, pp. 199–210, 2010.
  • [105] K. Yan, L. Kou, and D. Zhang, “Learning domain-invariant subspace using domain features and independence maximization,” IEEE Trans. on Cybernetics, vol. 48, no. 1, pp. 288–299, 2018.
  • [106] Y. Lin, “Constructing a personalized cross-day EEG-based emotion-classification model using transfer learning,” IEEE Journal of Biomedical and Health Informatics, 2020, in press.
  • [107] W.-L. Zheng, Z.-F. Shi, and B.-L. Lu, “Building cross-subject EEG-based affective models using heterogeneous transfer learning,” Chinese Journal of Computers, vol. 43, no. 2, pp. 177–189, 2020, in Chinese.
  • [108] S. Siddharth, T. Jung, and T. J. Sejnowski, “Utilizing deep learning towards multi-modal bio-sensing and vision-based affective computing,” IEEE Trans. on Affective Computing, 2020, in press.
  • [109] M. Soleymani, J. Lichtenauer, T. Pun, and M. Pantic, “A multimodal database for affect recognition and implicit tagging,” IEEE Trans. on Affective Computing, vol. 3, no. 1, pp. 42–55, 2012.
  • [110] Y. Cimtay and E. Ekmekcioglu, “Investigating the use of pretrained convolutional neural network on cross-subject and cross-dataset EEG emotion recognition,” Sensors, vol. 20, no. 7, p. 2034, 2020.
  • [111] W.-L. Zheng, K. Gao, G. Li, W. Liu, C. Liu, J. Liu, G. Wang, and B.-L. Lu, “Vigilance estimation using a wearable EOG device in real driving environment,” IEEE Trans. on Intelligent Transportation Systems, vol. 21, no. 1, pp. 170–184, 2020.
  • [112] D. Wu, C.-H. Chuang, and C.-T. Lin, “Online driver’s drowsiness estimation using domain adaptation with model fusion,” in Proc. Int’l Conf. on Affective Computing and Intelligent Interaction, Xi’an, China, September 2015, pp. 904–910.
  • [113] C.-S. Wei, Y.-P. Lin, Y.-T. Wang, C.-T. Lin, and T.-P. Jung, “A subject-transfer framework for obviating inter-and intra-subject variability in EEG-based drowsiness detection,” NeuroImage, vol. 174, pp. 407–419, 2018.
  • [114] L.-L. Chen, A. Zhang, and X.-G. Lou, “Cross-subject driver status detection from physiological signals based on hybrid feature selection and transfer learning,” Expert Systems with Applications, vol. 137, pp. 266–280, 2019.
  • [115] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  • [116] Y. Ming, D. Pelusi, C.-N. Fang, M. Prasad, Y.-K. Wang, D. Wu, and C.-T. Lin, “EEG data analysis with stacked differentiable neural computers,” Neural Computing and Applications, 2018.
  • [117] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proc. Int’l Conf. on Learning Representations, Banff, Canada, Apr. 2014.
  • [118] P. Aricò, G. Borghini, G. Di Flumeri, N. Sciaraffa, and F. Babiloni, “Passive BCI beyond the lab: current trends and future directions,” Physiological Measurement, vol. 39, no. 8, p. 08TR02, 2018.