1 Introduction
Brain-computer interfaces (BCIs) [19, 10] provide a direct communication pathway for a user to interact with a computer or external device using brain signals, which include the electroencephalogram (EEG), magnetoencephalogram (MEG), functional magnetic resonance imaging (fMRI), functional near-infrared spectroscopy (fNIRS), electrocorticography (ECoG), and so on. EEG-based BCIs have attracted great attention because they carry little risk (no need for surgery), are convenient to use, and offer high temporal resolution. They have been used for robotics, spellers, games, and medical applications [14, 6].
However, there are still many challenges for widespread real-world applications of EEG-based BCIs [13, 10]. One of them is the EEG signal quality. EEG signals are easily contaminated by various artifacts and noise, including muscle movements, eye blinks, heartbeats, environmental electromagnetic fields, etc. Common approaches to clean EEG signals include time-domain filtering and spatial filtering. Common spatial pattern (CSP) filtering [21, 3, 17, 25] is one of the most popular and effective spatial filters for increasing the signal-to-noise ratio of EEG.
CSP performs supervised filtering, which requires some subject-specific calibration data to design. This is time-consuming and not user-friendly. A promising approach for shortening or even completely eliminating this calibration session is transfer learning (TL) [15], which has already been extensively used to handle individual differences and non-stationarity in EEG-based BCIs [8, 18, 20, drwuTFS2016, 23, 24, 22]. TL leverages relevant data or knowledge from other subjects or tasks to reduce the calibration effort for a new subject or task. Traditionally, EEG signal processing (e.g., CSP filtering) and classification (e.g., TL) are performed sequentially and independently. However, recent research has shown that TL can be used to directly enhance CSP for better filtering performance [9, 4, 12].
This paper focuses on TL-enhanced CSPs. Its main contributions are:

We group existing TL-enhanced CSPs into two categories and give a comprehensive review of them. To our knowledge, this is the first review in this direction.

We propose a novel TL-enhanced CSP approach, and demonstrate its performance against existing approaches on EEG-based motor imagery classification.
The rest of this paper is organized as follows: Section 2 introduces CSP and TL, and gives an overview of existing approaches for incorporating TL into CSP. Section 3 proposes a new instance-based TL approach to enhance CSP. Section 4 compares the performance of all these approaches. Finally, Section 5 draws conclusions and points out several future research directions.
2 Existing TL-Enhanced CSP Filters
This section briefly introduces CSP and TL, and reviews three existing approaches for integrating them.
2.1 Common Spatial Pattern (CSP)
Let $X \in \mathbb{R}^{C \times T}$ be an EEG epoch, where $C$ is the number of channels and $T$ the number of time samples. For simplicity, only binary classification is considered in this paper. CSP separates a multivariate signal into additive subcomponents which have maximum differences in variance between the two classes. Specifically, CSP finds a filter matrix $W_0^*$ to maximize the variance for one class while minimizing it for the other:

$$W_0^* = \arg\max_{W} \frac{\mathrm{tr}(W^T \bar{C}_0 W)}{\mathrm{tr}(W^T \bar{C}_1 W)} \quad (1)$$

where $W \in \mathbb{R}^{C \times F}$ is the filter matrix consisting of $F$ filters, $\mathrm{tr}(\cdot)$ is the trace of a matrix, and $\bar{C}_0$ and $\bar{C}_1$ are the mean covariance matrices of epochs in Classes 0 and 1, respectively. The solution $W_0^*$ is the concatenation of the $F$ eigenvectors associated with the $F$ largest eigenvalues of the matrix $\bar{C}_1^{-1}\bar{C}_0$.
In practice, we often construct a CSP filter matrix $W^* = [W_0^*, W_1^*] \in \mathbb{R}^{C \times 2F}$, where

$$W_1^* = \arg\max_{W} \frac{\mathrm{tr}(W^T \bar{C}_1 W)}{\mathrm{tr}(W^T \bar{C}_0 W)} \quad (2)$$

i.e., $W_1^*$ maximizes the variance for Class 1 while minimizing it for Class 0. Similar to $W_0^*$, $W_1^*$ is the concatenation of the $F$ eigenvectors associated with the $F$ largest eigenvalues of the matrix $\bar{C}_0^{-1}\bar{C}_1$. Since $\bar{C}_0^{-1}\bar{C}_1$ and $\bar{C}_1^{-1}\bar{C}_0$ have the same eigenvectors, and the eigenvalues of $\bar{C}_0^{-1}\bar{C}_1$ are the inverses of the eigenvalues of $\bar{C}_1^{-1}\bar{C}_0$, $W_1^*$ actually consists of the $F$ eigenvectors associated with the $F$ smallest eigenvalues of the matrix $\bar{C}_1^{-1}\bar{C}_0$. So, only one eigendecomposition of the matrix $\bar{C}_1^{-1}\bar{C}_0$ (or $\bar{C}_0^{-1}\bar{C}_1$) is needed in computing $W^*$.
Once $W^*$ is obtained, CSP projects an EEG epoch $X$ to $X'$ by:

$$X' = (W^*)^T X \quad (3)$$

Usually $2F < C$, so CSP can increase the signal-to-noise ratio and reduce the dimensionality simultaneously.
After CSP filtering, the logarithmic variance feature vector $\mathbf{f} \in \mathbb{R}^{2F}$ is then calculated as [4]:

$$\mathbf{f} = \log\left(\frac{\mathrm{diag}(X' X'^T)}{\mathrm{tr}(X' X'^T)}\right) \quad (4)$$

where $\mathrm{diag}(\cdot)$ returns the diagonal elements of a matrix. $\mathbf{f}$ can be used as the input to a classifier, e.g., linear discriminant analysis (LDA).
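The CSP computation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in our experiments; solving the generalized eigenvalue problem $\bar{C}_0 w = \mu(\bar{C}_0 + \bar{C}_1)w$ yields the same eigenvectors, in the same order, as eigendecomposing $\bar{C}_1^{-1}\bar{C}_0$, and is numerically more stable:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(C0, C1, F=3):
    """Compute the 2F CSP filters of (1)-(2) from the mean class
    covariance matrices C0 and C1.

    Solves the generalized eigenvalue problem C0 w = mu (C0 + C1) w;
    the eigenvectors with the largest/smallest eigenvalues maximize the
    variance ratio for Class 0/Class 1, respectively.
    """
    # eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigh(C0, C0 + C1)
    # W0*: top-F eigenvectors (largest eigenvalues); W1*: bottom-F
    W = np.concatenate([eigvecs[:, -F:], eigvecs[:, :F]], axis=1)
    return W  # shape (C, 2F)

def log_variance_features(X, W):
    """Project an epoch X (channels x samples) by (3) and compute the
    normalized log-variance feature vector of (4)."""
    Xp = W.T @ X                # filtered epoch
    var = np.diag(Xp @ Xp.T)    # variance of each filtered channel
    return np.log(var / var.sum())
```

An LDA classifier can then be trained on the feature vectors of the calibration epochs.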
2.2 Transfer Learning (TL)
TL has been extensively used in BCIs to reduce their calibration effort [8, 18, 20, drwuTFS2016, 23]. Some basic concepts of TL are introduced in this subsection.
A domain $\mathcal{D}$ [15, 11] in TL consists of a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, i.e., $\mathcal{D} = \{\mathcal{X}, P(X)\}$, where $X \in \mathcal{X}$. Two domains $\mathcal{D}_s$ and $\mathcal{D}_t$ are different if $\mathcal{X}_s \neq \mathcal{X}_t$, and/or $P_s(X) \neq P_t(X)$.

A task $\mathcal{T}$ [15, 11] in TL consists of a label space $\mathcal{Y}$ and a conditional probability distribution $P(y|X)$, i.e., $\mathcal{T} = \{\mathcal{Y}, P(y|X)\}$. Two tasks $\mathcal{T}_s$ and $\mathcal{T}_t$ are different if $\mathcal{Y}_s \neq \mathcal{Y}_t$, or $P_s(y|X) \neq P_t(y|X)$.

Given a source domain $\mathcal{D}_s$ with $n$ labeled samples, and a target domain $\mathcal{D}_t$ with $m_l$ labeled samples and $m_u$ unlabeled samples, TL learns a target prediction function $f: X \mapsto y$ with low expected error on $\mathcal{D}_t$, under the assumptions $\mathcal{X}_s \neq \mathcal{X}_t$, $\mathcal{Y}_s \neq \mathcal{Y}_t$, $P_s(X) \neq P_t(X)$, and/or $P_s(y|X) \neq P_t(y|X)$.
For example, in the EEG-based motor imagery classification studied in this paper, a source domain consists of EEG epochs from an existing subject, and the target domain consists of EEG epochs from a new subject. When there are $Z$ source domains $\{\mathcal{D}_{s,z}\}_{z=1}^{Z}$, we can perform TL for each of them separately and then aggregate the classifiers, or treat the combination of all source domains as a single source domain.
2.3 Incorporating TL into CSP: Covariance Matrix-Based Approaches
Since covariance matrices are used in CSP, whereas the target domain does not have enough labeled samples to reliably estimate them, a direction to incorporate TL into CSP is to utilize the source domain covariance matrices to enhance the estimation of the target domain ones.
Kang et al. [9] proposed a subject-to-subject transfer approach, which emphasizes the covariance matrices of source subjects who are more similar to the target subject. They computed the dissimilarity between the target subject and each source subject by the Kullback-Leibler (KL) divergence between their data distributions, and then used the inverses of these dissimilarities as weights to combine the source domain covariance matrices.
Let $P_z$ be the EEG data distribution in the $z$th source domain $\mathcal{D}_{s,z}$, which is assumed to be $C$-dimensional Gaussian with zero mean and covariance matrix $\bar{C}_z$, i.e., $P_z \sim N(0, \bar{C}_z)$. Let $P_t$ be the data distribution in the target domain $\mathcal{D}_t$, which is $C$-dimensional Gaussian with zero mean and covariance matrix $\bar{C}_t$, i.e., $P_t \sim N(0, \bar{C}_t)$. The KL divergence between $P_z$ and $P_t$ is computed as [9]:

$$KL(P_z \| P_t) = \frac{1}{2}\left[\mathrm{tr}(\bar{C}_t^{-1}\bar{C}_z) - C + \ln\frac{\det(\bar{C}_t)}{\det(\bar{C}_z)}\right] \quad (5)$$

where $\det(\cdot)$ is the matrix determinant.
Then, the TL-enhanced covariance matrix for the target subject is computed as:

$$\tilde{C} = (1-\lambda)\bar{C}_t + \lambda \sum_{z=1}^{Z} w_z \bar{C}_z \quad (6)$$

where $\lambda \in [0, 1]$ is an adjustable parameter to balance the information from the target subject and the source subjects, and

$$w_z = \frac{1}{U} \cdot \frac{1}{KL(P_z \| P_t)} \quad (7)$$

in which $U = \sum_{z=1}^{Z} \frac{1}{KL(P_z \| P_t)}$ is a normalization factor.
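As an illustration, the KL-divergence weighting above can be sketched as follows. This is a minimal version, assuming the covariance matrices have already been estimated; the function and variable names are ours:

```python
import numpy as np

def kl_zero_mean_gaussians(Cz, Ct):
    """KL divergence KL(P_z || P_t) between two zero-mean Gaussians
    with covariance matrices Cz and Ct, as in (5)."""
    C = Cz.shape[0]
    Ct_inv = np.linalg.inv(Ct)
    return 0.5 * (np.trace(Ct_inv @ Cz) - C
                  + np.log(np.linalg.det(Ct) / np.linalg.det(Cz)))

def composite_covariance(Ct, source_covs, lam=0.5):
    """Combine the target covariance Ct with the source covariances,
    weighting each source by its inverse KL divergence to the target
    (Eqs. (6)-(7)); lam is the adjustable parameter lambda."""
    w = np.array([1.0 / kl_zero_mean_gaussians(Cz, Ct) for Cz in source_covs])
    w /= w.sum()                      # normalization factor U
    C_src = sum(wi * Cz for wi, Cz in zip(w, source_covs))
    return (1 - lam) * Ct + lam * C_src
```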
Lotte and Guan [12] proposed a similar approach for incorporating TL into CSP, based on the covariance matrices:

$$\tilde{C} = (1-\lambda)\bar{C}_t + \frac{\lambda}{M}\sum_{i \in S_T} \bar{C}_i \quad (8)$$

where $\Omega$ is the set of subjects whose data have been recorded previously, $S_T \subseteq \Omega$ is a subset of subjects from $\Omega$, $M$ is the number of subjects in $S_T$, and $\lambda$ is defined by

$$\lambda = \frac{\max(acc_s - acc_0,\, 0)}{\max(acc_s - acc_0,\, 0) + \max(acc_t - acc_0,\, 0)} \quad (9)$$

in which $acc_t$ is the leave-one-out validation accuracy on the target domain labeled samples when the classifier is trained by using only the target domain labeled samples, $acc_s$ is the accuracy on the target domain labeled samples when the classifier is trained by using only the labeled samples from the selected source subjects in $S_T$, and $acc_0$ is the classification accuracy at the chance level (e.g., 50% for binary classification). The algorithm for determining $S_T$ can be found in [12].
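Given $\lambda$ and the selected subject set $S_T$, the covariance combination itself is straightforward. A minimal sketch, assuming the shrinkage form described above (target covariance shrunk toward the average of the selected source covariances):

```python
import numpy as np

def regularized_covariance(Ct, selected_covs, lam):
    """Shrink the target covariance Ct toward the average covariance
    of the M selected source subjects (the set S_T), as in (8)."""
    M = len(selected_covs)
    return (1 - lam) * Ct + lam * sum(selected_covs) / M
```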
2.4 Incorporating TL into CSP: A Model-Based Approach
Instead of learning a single set of CSP filters by aggregating information from the target subject and all (or a subset of) the source subjects, as introduced in the previous subsection, Dalhoumi et al. [4] proposed to design a set of CSP filters for each source subject, train a classifier for each source subject on the extracted features, and then aggregate all these source classifiers to obtain the target classifier.
Let $W_z^*$ and $f_z$ be the CSP filter matrix and classifier trained for the $z$th source subject, respectively, and $\{(X_i, y_i)\}_{i=1}^{m_l}$ be the labeled target domain data. We first filter each $X_i$ by $W_z^*$, extract the corresponding feature vector $\mathbf{f}_{i,z}$ using (4), and then feed $\mathbf{f}_{i,z}$ into model $f_z$ to obtain its classification $f_z(\mathbf{f}_{i,z})$. The final classifier is:

$$\hat{y}(X) = \sum_{z=1}^{Z} v_z f_z(\mathbf{f}_z(X)) \quad (10)$$

where the weights $\{v_z\}_{z=1}^{Z}$ are determined by solving the following constrained minimization problem:

$$\min_{\{v_z\}} \sum_{i=1}^{m_l} \ell\left(\sum_{z=1}^{Z} v_z f_z(\mathbf{f}_{i,z}),\ y_i\right) \quad \text{s.t.} \quad v_z \geq 0,\ \sum_{z=1}^{Z} v_z = 1 \quad (11)$$

where $\ell(\cdot,\cdot)$ is the loss between the prediction $\hat{y}_i$ and the true label $y_i$.
Dalhoumi et al. [4] also constructed another CSP filter matrix and the corresponding classifier using the target domain data only, and compared its leave-one-out validation performance with that of (10) to determine which one should be used as the preferred classifier. Because the goal of this paper is to compare different TL-enhanced CSP approaches, we always use (10).
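The weight optimization in (11) can be sketched as follows, assuming a squared loss for $\ell$ and source classifiers that output the probability of Class 1 (both are illustrative choices of ours, not necessarily those of [4]):

```python
import numpy as np
from scipy.optimize import minimize

def ensemble_weights(P, y):
    """Find simplex-constrained weights v for (11), where P[i, z] is
    the output of the z-th source classifier on the i-th target epoch
    and y[i] is the true label, using a squared loss."""
    Z = P.shape[1]
    loss = lambda v: np.sum((P @ v - y) ** 2)
    cons = ({'type': 'eq', 'fun': lambda v: v.sum() - 1},)
    res = minimize(loss, np.full(Z, 1.0 / Z),
                   bounds=[(0, 1)] * Z, constraints=cons)
    return res.x

def ensemble_predict(P_new, v, threshold=0.5):
    """Eq. (10): weighted combination of the source classifier outputs."""
    return (P_new @ v >= threshold).astype(int)
```

Here a classifier whose outputs agree with the target labels receives a large weight, while uninformative classifiers are driven toward zero.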
3 Incorporating TL into CSP: Instance-Based Approaches
This section introduces our proposed approach for incorporating TL into CSP. It is an instance-based approach, meaning that the source domain labeled samples are combined with the target domain labeled samples in a certain way to design the CSP.
The simplest instance-based approach is to directly combine the labeled samples from the target domain and all source domains. However, this is usually not optimal because it completely ignores individual differences: some source domain samples may be more similar to the target domain samples than others, so they should be given more consideration.
So, a better approach is to reweight the source domain samples according to their similarity to the target domain samples, and then use them in designing the CSP. The main problem is how to optimally reweight the source samples. We adopt the approach proposed by Huang et al. [7], a generic method for correcting sample selection bias that has not previously been applied to CSP and BCIs. It assigns different weights to the source domain samples to minimize the maximum mean discrepancy [2] between the source and target domains after mapping onto a reproducing kernel Hilbert space. More specifically, it solves the following constrained minimization problem:
$$\min_{\boldsymbol{\beta}} \left\| \frac{1}{n}\sum_{i=1}^{n} \beta_i \Phi(x_i^s) - \frac{1}{m}\sum_{j=1}^{m} \Phi(x_j^t) \right\|_{\mathcal{H}}^2 \quad \text{s.t.} \quad \beta_i \in [0, B],\ \left|\frac{1}{n}\sum_{i=1}^{n}\beta_i - 1\right| \leq \epsilon \quad (12)$$

where $x_i^s$ is the $i$th source domain sample, $x_j^t$ is the $j$th target domain sample, $\Phi$ is a feature mapping onto a reproducing kernel Hilbert space $\mathcal{H}$, $\boldsymbol{\beta} = [\beta_1, ..., \beta_n]^T$ is the weight vector for the source domain samples, $n$ is the number of source domain samples, $m$ is the number of target domain samples, and $B$ and $\epsilon$ are adjustable parameters.
The source domain samples are then reweighted by $\{\beta_i\}_{i=1}^{n}$ and combined with the target domain labeled samples to design the CSP filter matrix.
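A minimal sketch of this reweighting, assuming an RBF kernel (the kernel choice and the bandwidth gamma are our assumptions, not prescribed by (12)):

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmm_weights(Xs, Xt, B=10.0, eps=0.1, gamma=1.0):
    """Kernel mean matching [7]: find weights beta for the source
    samples Xs (n x d) minimizing the MMD to the target samples
    Xt (m x d) in an RBF reproducing kernel Hilbert space, as in (12)."""
    n, m = len(Xs), len(Xt)
    Kss = rbf_kernel(Xs, Xs, gamma)
    kappa = rbf_kernel(Xs, Xt, gamma).sum(axis=1)  # sum over target samples
    # squared MMD, up to a beta-independent constant
    obj = lambda b: b @ Kss @ b / n**2 - 2.0 * (b @ kappa) / (n * m)
    cons = ({'type': 'ineq', 'fun': lambda b: b.mean() - (1 - eps)},
            {'type': 'ineq', 'fun': lambda b: (1 + eps) - b.mean()})
    res = minimize(obj, np.ones(n), bounds=[(0, B)] * n, constraints=cons)
    return res.x
```

Source samples lying close to the target distribution receive large weights, while outlying samples are driven toward zero.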
4 Experiment and Results
This section presents a comparative study of the above TL-enhanced CSP algorithms.
4.1 Dataset and Preprocessing
We used Dataset 2a from BCI Competition IV (http://www.bbci.de/competition/iv/), which consists of EEG data from 9 subjects. Every subject was instructed to perform four different motor imagery tasks, namely the imagination of movement of the left hand, the right hand, both feet, and the tongue. A training session and a test session were recorded on different days for each subject, and each session comprises 288 epochs (72 for each of the four classes). The signals were recorded using 22 EEG channels and 3 EOG channels, sampled at 250 Hz, and band-pass filtered between 0.5 Hz and 100 Hz.
Only the 22 EEG channels were used in our study. We further processed them using the Matlab EEGLAB toolbox [5]. The signals were first downsampled to 125 Hz. Next, a band-pass filter of 8-30 Hz was applied, as movement imagination is known to suppress the idle rhythms in this frequency band contralaterally [16]. As we consider binary classification in this paper, only EEG signals corresponding to the left and right hand motor imageries were used. More specifically, EEG epochs between 1.5 and 3.5 seconds after the appearance of the left or right hand motor imagery cues were used.
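For illustration, these preprocessing steps can be approximated with SciPy instead of EEGLAB. This is a rough sketch; the array names and shapes are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

FS_RAW, FS = 250, 125          # original and downsampled sampling rates

def preprocess(raw, cue_samples):
    """Sketch of the preprocessing pipeline. Assumed inputs: raw is a
    channels x samples array at 250 Hz; cue_samples holds cue onsets
    as raw-sample indices. Returns epochs of 1.5-3.5 s after each cue,
    band-pass filtered to 8-30 Hz and downsampled to 125 Hz."""
    x = decimate(raw, FS_RAW // FS, axis=1)        # downsample to 125 Hz
    b, a = butter(4, [8 / (FS / 2), 30 / (FS / 2)], btype='bandpass')
    x = filtfilt(b, a, x, axis=1)                  # zero-phase band-pass
    t0, t1 = int(1.5 * FS), int(3.5 * FS)
    # cue indices are halved after downsampling by a factor of 2
    return np.stack([x[:, c // 2 + t0 : c // 2 + t1] for c in cue_samples])
```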
4.2 Algorithms
We compared the performance of the following seven CSP algorithms:

Baseline 1 (BL1), which uses only the small amount of target domain labeled samples to design the CSP filters and the LDA classifier, and applies them to the target domain unlabeled samples. That is, BL1 does not use any source domain samples.

Baseline 2 (BL2), which combines all source domain samples to design the CSP filters and the LDA classifier, and applies them to the target domain unlabeled samples. That is, BL2 does not use any target domain labeled samples.

Baseline 3 (BL3), which directly combines all source domain samples and target domain labeled samples, designs the CSP filters and the LDA classifier, and applies them to target domain unlabeled samples.

The covariance matrix-based approach of Kang et al. [9], introduced in Section 2.3.

The covariance matrix-based approach of Lotte and Guan [12], introduced in Section 2.3.

Modelbased approach (MA), which is the approach introduced in Section 2.4.

Instance-based approach (IA), which is our proposed algorithm: it first solves the constrained optimization problem in (12) for the weights of the source domain samples, then combines the target domain labeled samples and the weighted source domain samples to train the CSP filters and the LDA classifier, and then applies them to the target domain unlabeled samples.
There were 9 subjects in our dataset. Each time we picked one as the target subject, and the remaining 8 as the source subjects. For the target subject, we randomly reserved 40 epochs (20 epochs per class) as the training data pool, and used the remaining 104 epochs as the test data. We started with zero target domain training data, trained the CSP filters using the above 7 algorithms, and evaluated their performances on the test dataset. We then sequentially added 2 labeled epochs (1 labeled epoch per class) from the reserved training data pool to the target domain training dataset until all 40 epochs were added, each time training the 7 algorithms and evaluating their performances on the test dataset. We repeated this process 30 times to obtain statistically meaningful results.
4.3 Results
The performances of the 7 algorithms are shown in Fig. 1, where the first 9 subfigures show the performances on the individual subjects. Observe that some subjects, e.g., Subjects 2 and 5, were more difficult to deal with than others, and no approach always outperformed the others; however, when $m_l$, the number of target domain labeled epochs, was small, our proposed algorithm (IA) achieved the best performance for 5 out of the 9 subjects.
The last subfigure of Fig. 1 shows the average performance across the 9 subjects. Observe that:

When $m_l$ was small, all other methods outperformed BL1. Particularly, when $m_l = 0$, BL1 could not build a model because it uses only subject-specific calibration data, but all other algorithms could, because they can use data from the source subjects. This suggests that all TL-enhanced CSP algorithms are advantageous when the target domain has very limited labeled epochs.

BL2 outperformed BL1 and BL3 when $m_l$ was small, but as $m_l$ increased, all other algorithms outperformed BL2. This suggests that there are large individual differences among the subjects, so incorporating target domain samples is necessary and beneficial.

Generally, all TL-enhanced CSP algorithms outperformed the three baselines, suggesting the effectiveness of TL. Particularly, our proposed algorithm (IA) achieved the best performance when $m_l$ was small. This is favorable, as we always want to achieve the best calibration performance with the smallest number of subject-specific calibration samples.
5 Conclusions
CSP is a popular spatial filtering approach to increase the signal-to-noise ratio of EEG signals. However, it is a supervised approach, which needs some subject-specific calibration data to design. This is time-consuming and not user-friendly. A promising approach for shortening or even completely eliminating this calibration session is TL, which leverages relevant data or knowledge from other subjects or tasks. This paper reviewed three existing approaches for incorporating TL into CSP, and also proposed a new TL-enhanced CSP approach. Experiments on motor imagery classification demonstrated the effectiveness of these approaches. Particularly, our proposed approach achieved the best performance when the number of target domain calibration epochs was small.
The following directions will be considered in our future research:

Use the Riemannian mean instead of the Euclidean mean in estimating the mean class covariance matrices in CSP [1]. As the covariance matrix of each epoch is symmetric positive semi-definite, the covariance matrices lie on a Riemannian manifold instead of in a Euclidean space. So, the Riemannian means may be more reasonable than the Euclidean means in CSP.

Extend the TL-enhanced CSPs from classification to regression, using a fuzzy set based approach similar to the one proposed in [21].
References
[1] Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Common spatial pattern revisited by Riemannian geometry. In: IEEE Int’l Workshop on Multimedia Signal Processing. pp. 472–476. London, UK (October 2010)
[2] Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7, 2399–2434 (2006)
[3] Blankertz, B., Tomioka, R., Lemm, S., Kawanabe, M., Muller, K.R.: Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine 25(1), 41–56 (2008)
[4] Dalhoumi, S., Dray, G., Montmain, J.: Knowledge transfer for reducing calibration time in brain-computer interfacing. In: Proc. 26th IEEE Int’l Conf. on Tools with Artificial Intelligence. Limassol, Cyprus (November 2014)
[5] Delorme, A., Makeig, S.: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134, 9–21 (2004)
[6] van Erp, J., Lotte, F., Tangermann, M.: Brain-computer interfaces: Beyond medical applications. Computer 45(4), 26–34 (2012)
[7] Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Scholkopf, B.: Correcting sample selection bias by unlabeled data. In: Proc. Int’l Conf. on Neural Information Processing Systems. pp. 601–608. Vancouver, Canada (December 2006)
[8] Jayaram, V., Alamgir, M., Altun, Y., Scholkopf, B., Grosse-Wentrup, M.: Transfer learning in brain-computer interfaces. IEEE Computational Intelligence Magazine 11(1), 20–31 (2016)
[9] Kang, H., Nam, Y., Choi, S.: Composite common spatial pattern for subject-to-subject transfer. IEEE Signal Processing Letters 16(8), 683–686 (2009)
[10] Lance, B.J., Kerick, S.E., Ries, A.J., Oie, K.S., McDowell, K.: Brain-computer interface technologies in the coming decades. Proc. of the IEEE 100(3), 1585–1599 (2012)
[11] Long, M., Wang, J., Ding, G., Pan, S.J., Yu, P.S.: Adaptation regularization: A general framework for transfer learning. IEEE Trans. on Knowledge and Data Engineering 26(5), 1076–1089 (2014)
[12] Lotte, F., Guan, C.: Learning from other subjects helps reducing brain-computer interface calibration time. In: Proc. IEEE Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP). Dallas, TX (March 2010)
[13] Makeig, S., Kothe, C., Mullen, T., Bigdely-Shamlo, N., Zhang, Z., Kreutz-Delgado, K.: Evolving signal processing for brain-computer interfaces. Proc. of the IEEE 100(Special Centennial Issue), 1567–1584 (2012)
[14] Nicolas-Alonso, L.F., Gomez-Gil, J.: Brain computer interfaces, a review. Sensors 12(2), 1211–1279 (2012)
[15] Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. on Knowledge and Data Engineering 22(10), 1345–1359 (2010)
[16] Pfurtscheller, G., Brunner, C., Schlogl, A., da Silva, F.L.: Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks. NeuroImage 31(1), 153–159 (2006)
[17] Ramoser, H., Muller-Gerking, J., Pfurtscheller, G.: Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. on Rehabilitation Engineering 8(4), 441–446 (2000)
[18] Waytowich, N.R., Lawhern, V.J., Bohannon, A.W., Ball, K.R., Lance, B.J.: Spectral transfer learning using information geometry for a user-independent brain-computer interface. Frontiers in Neuroscience 10, 430 (2016)
[19] Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-computer interfaces for communication and control. Clinical Neurophysiology 113(6), 767–791 (2002)
[20] Wu, D.: Online and offline domain adaptation for reducing BCI calibration effort. IEEE Trans. on Human-Machine Systems 47(4), 550–563 (2017)
[21] Wu, D., King, J.T., Chuang, C.H., Lin, C.T., Jung, T.P.: Spatial filtering for EEG-based regression problems in brain-computer interface (BCI). IEEE Trans. on Fuzzy Systems 26(2), 771–781 (2018)
[22] Wu, D., Lance, B.J., Lawhern, V.J.: Transfer learning and active transfer learning for reducing calibration data in single-trial classification of visually-evoked potentials. In: Proc. IEEE Int’l Conf. on Systems, Man, and Cybernetics. San Diego, CA (October 2014)
[23] Wu, D., Lawhern, V.J., Hairston, W.D., Lance, B.J.: Switching EEG headsets made easy: Reducing offline calibration effort using active weighted adaptation regularization. IEEE Trans. on Neural Systems and Rehabilitation Engineering 24(11), 1125–1137 (2016)
[24] Wu, D., Lawhern, V.J., Lance, B.J.: Reducing offline BCI calibration effort using weighted adaptation regularization with source domain selection. In: Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics. Hong Kong (October 2015)
[25] Wu, D., Lawhern, V.J., Lance, B.J., Gordon, S., Jung, T.P., Lin, C.T.: EEG-based user reaction time estimation using Riemannian geometry features. IEEE Trans. on Neural Systems and Rehabilitation Engineering 25(11), 2157–2168 (2017)