I. Introduction
Transfer learning describes approaches that discover and exploit shared structure in data that is invariant across data sets. In the context of brain-computer interfaces (BCIs), where the aim is to provide a direct neural communication and control channel for individuals, e.g., with severe neuromuscular disorders, transfer learning is of significant interest given its potential to reduce BCI calibration times by exploiting neural data recorded from other subjects. Since data collection times under adequate concentration and consciousness are limited for patients, this becomes essential for a potential patient end-user of the BCI system. Several pieces of work in this domain aim to find neural features (representations) that are invariant across subjects or sessions to calibrate BCIs [1, 2, 3], or to learn a structure for the set of decision rules and how they differ across subjects and sessions [4, 5].
Going beyond neural interfaces, significant progress was recently achieved in domain transfer learning via adversarially censored invariant representations within the growing field of deep learning for computer vision and image processing [6, 7, 8, 9, 10, 11, 12, 13]. These methods rely on learning generative models of the data that allow synthesis of data samples from latent representations, which can be achieved with variational autoencoders (VAEs) [14] for unsupervised feature learning, or with generative adversarial networks (GANs) [15], where supervision is alleviated by penalizing inaccurate samples through an adversarial game. Such models are trained with adversarial censoring to learn representations that aim to be independent of some nuisance variable (e.g., a variable representing factors of variation across data sets). In light of this recent work, we introduce adversarial representation learning as a novel approach for transfer learning in BCIs.

Various aspects of deep convolutional neural networks (CNNs) from computer vision have already been introduced to extract features for task-specific decoding in electroencephalogram (EEG) based BCIs [16, 17], as well as in recent attempts to learn deep generative models for EEG [18, 19, 20]. In the present study, we extend these lines of work and propose a transfer learning approach for BCIs based on adversarial training for subject-invariant representation learning. In particular, the proposed approach [9, 13] learns subject-invariant representations by simultaneously training a conditional VAE and an adversarial network that enforces invariance of the learned data representations with respect to subject identity. This adversarial training procedure, with VAEs based on CNN architectures, yields representations that serve as features disentangled from subject-specific nuisance variations, enabling decoding for unseen BCI subjects. Our results demonstrate the advantage of this approach with a proof-of-concept based on analyses of EEG data recorded from 103 subjects during a motor imagery BCI experiment.
II. Methods
II-A. Notation
Let D_i = {(X_t, y_t)}_{t=1}^{n_i} denote the data set for subject i ∈ {1, …, N}, consisting of n_i trials, where X_t ∈ R^{C×T} is the raw EEG data at trial t recorded from C channels for T discretized time samples, and y_t is the corresponding class label from a set of class labels. In a subject-to-subject transfer learning problem, the aim is to learn a parametric encoder that generalizes across subjects and extracts latent representations z from the data X that are useful in discriminating different tasks or brain states indicated by their corresponding class labels y. Accordingly, let s_i denote the one-hot encoded subject identifier vector for subject i (i.e., an N-dimensional vector with a value of 1 at the i'th index and zero in other indices), which represents the nuisance variable in our adversarial representation learning frameworks and will be enforced to be independent of z.

II-B. Conditional Variational Autoencoder (cVAE)
VAEs [14] learn a generative model as a pair of encoder and decoder networks. The encoder learns a latent representation z from the data X, while the decoder aims to reconstruct X from the learned representation z. In this variational framework the encoder is stochastic: the decoder uses a learned posterior q(z|X), whose parameters are given by the encoder network, and is provided with samples z drawn from this posterior distribution as input.
In the conditional VAE (cVAE) framework [21], the decoder is conditioned on a nuisance variable s as an additional input besides z, and the encoder is expected to learn representations z that are invariant to s, since s is already given as input to the decoder. The loss function to be minimized in this cVAE framework, the negative of the evidence lower bound (ELBO), is given by:
\mathcal{L}_{\mathrm{cVAE}} = \mathbb{E}_{q_\phi(z|X)}\left[-\log p_\theta(X \mid z, s)\right] + D_{\mathrm{KL}}\left(q_\phi(z \mid X) \,\|\, p(z)\right)    (1)
where the first term is the reconstruction loss of the decoder, and the second term is the variational posterior loss of the encoder (the KL divergence from the prior p(z)). This framework implicitly encourages invariance of z with respect to s. In practice, however, this invariance is known to be imperfectly achieved, which paves the way for adversarial training methods in representation learning [13].
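The two terms of Eq. (1) can be sketched numerically. The following is an illustrative NumPy computation, not the authors' Chainer implementation: all arrays are toy stand-ins, the reconstruction term is taken as mean squared error (as the paper specifies for its deterministic decoder in Section II-C), and the KL term uses the closed form for a diagonal-Gaussian posterior against a standard-normal prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: a batch of 4 flattened "EEG trials" with an 8-dim latent space.
X = rng.standard_normal((4, 32))                 # data batch
X_hat = X + 0.1 * rng.standard_normal(X.shape)   # stand-in decoder reconstruction
mu = rng.standard_normal((4, 8))                 # encoder posterior means
logvar = rng.standard_normal((4, 8))             # encoder posterior log-variances

# Reconstruction term: mean squared error of the decoder output.
recon_loss = np.mean((X - X_hat) ** 2)

# KL divergence between the diagonal-Gaussian posterior q(z|X) and the
# standard-normal prior p(z), in closed form, averaged over the batch.
kl_loss = np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

cvae_loss = recon_loss + kl_loss  # Eq. (1) with an MSE reconstruction term
```

Both terms are non-negative by construction, so the total loss is bounded below by zero; training trades off reconstruction fidelity against keeping the posterior close to the prior.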
II-C. Adversarial Conditional VAE (AcVAE)
In the proposed adversarial cVAE (AcVAE) framework [9, 13], a conditional VAE and an adversary enforcing invariance with respect to s (i.e., the subject identifiers) are trained simultaneously. Specifically, alongside a cVAE that takes EEG time-series data X as input to the encoder and estimates X̂ at the decoder, an adversary is trained that takes the learned representations z as input and estimates s.

We extend Eq. (1) to obtain the AcVAE loss function. For the deterministic decoder, the reconstruction loss is the mean squared error of the estimated time-series EEG data. Furthermore, the softmax cross-entropy loss of the adversary network is added with a negative sign to the AcVAE loss function, which is then given by:
\mathcal{L}_{\mathrm{AcVAE}} = \mathcal{L}_{\mathrm{cVAE}} - \lambda\,\mathcal{L}_{\mathrm{adv}}    (2)
where λ is a weight parameter adjusting the impact of adversarial censoring on the learned representations, and L_adv is the softmax cross-entropy loss of the adversary predicting s from z. Alternating once per batch with the AcVAE parameter updates, the adversary itself is trained to minimize its softmax cross-entropy loss L_adv.
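The adversarial game of Eq. (2) can be sketched as follows. This is an illustrative NumPy computation, not the paper's Chainer code: the adversary is reduced to a toy linear layer, and the values of λ and the cVAE loss are hypothetical stand-ins. The key point is the minus sign, so that minimizing the AcVAE objective pushes the encoder to confuse the adversary, while the adversary is separately updated to minimize its own loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

n_subjects = 5
z = rng.standard_normal((8, 16))                   # latent codes for a batch
subject_ids = rng.integers(0, n_subjects, size=8)  # true subject identities

# Toy adversary: a single linear layer predicting the subject from z.
W = 0.1 * rng.standard_normal((16, n_subjects))
adv_logits = z @ W
adv_loss = softmax_cross_entropy(adv_logits, subject_ids)  # L_adv

lam = 0.1         # adversarial censoring weight lambda (illustrative value)
cvae_loss = 1.0   # stand-in for the Eq. (1) loss

# Eq. (2): the adversary's loss enters the AcVAE objective with a minus sign.
acvae_loss = cvae_loss - lam * adv_loss

# In training, the adversary's own parameters (here W) would be updated once
# per batch, alternating with the AcVAE update, to *minimize* adv_loss.
```

In practice both updates run on the same batch, with gradients from the AcVAE objective flowing into the encoder and gradients from L_adv flowing only into the adversary.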
II-D. Model Architecture and Classifier Training
In our implementations, the encoder and decoder have convolutional architectures embedding temporal and spatial filtering, motivated by the results achieved with EEGNet [16], Deep ConvNet and Shallow ConvNet [17]. The convolutional cVAE architecture is broadly illustrated in Fig. 1 and detailed in Table I. The two fully-connected layers at the output of the encoder generate the parameter vectors μ and σ, which are then used to sample the latent vector z. The nuisance variable vector s is concatenated to the sampled z to form the decoder input. We used one-dimensional convolution kernels along the time axis (temporal filtering) and along the channel axis (spatial filtering), with a fixed latent vector dimensionality. Adjacent to the cVAE, the adversary is realized as a multilayer perceptron (MLP) with a single hidden layer and ReLU nonlinearity after the first layer; the adversarial censoring weight parameter λ was fixed in all experiments.

Following adversarial representation learning on a set of training data samples, the encoder is kept static, and a classifier connected to the output of the encoder is trained using the same training data samples. Specifically, all training data samples are passed again through the static, previously optimized encoder; from the parameters at the encoder output, a latent vector z is sampled and used as input to the classifier. The classifier is also realized as an MLP with a single hidden layer and ReLU nonlinearity after the first layer, and is trained to minimize its softmax cross-entropy loss. The adversary network has an output dimensionality equal to the number of training subjects, and the classifier an output dimensionality equal to the number of classes. The adversary and classifier hidden layers have the same number of nodes.

TABLE I: Convolutional cVAE architecture.

Layer      | Operation
---------- | ----------------------------------------------------------------
Encoder 1  | Temporal Conv1D (40 kernels); BatchNorm + ReLU + Dropout (0.25)
Encoder 2  | Spatial Conv1D (40 kernels); BatchNorm + ReLU + Dropout (0.25)
Encoder 3  | Reshape (flatten); 2 fully-connected layers (μ, σ)
Latent     | Sample z with the estimated parameters
Decoder 1  | Fully-connected layer; ReLU + Reshape
Decoder 2  | Spatial Deconv1D (40 kernels); BatchNorm + ReLU + Dropout (0.25)
Decoder 3  | Temporal Deconv1D (40 kernels); BatchNorm + ReLU + Dropout (0.25)
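The latent sampling step between the encoder and decoder described above can be sketched as follows. This is a minimal NumPy illustration (not the authors' Chainer code) of the standard VAE reparameterization trick and the concatenation of the one-hot subject vector s to z; all shapes are hypothetical toy values except the 90 training subjects.

```python
import numpy as np

rng = np.random.default_rng(2)

batch, latent_dim, n_subjects = 4, 16, 90

# Stand-ins for the encoder outputs: posterior mean and log-variance per trial.
mu = rng.standard_normal((batch, latent_dim))
logvar = rng.standard_normal((batch, latent_dim))

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# which keeps the sampling step differentiable w.r.t. mu and logvar.
eps = rng.standard_normal((batch, latent_dim))
z = mu + np.exp(0.5 * logvar) * eps

# One-hot subject identifiers s, concatenated to z as the decoder input.
subject_ids = rng.integers(0, n_subjects, size=batch)
s = np.eye(n_subjects)[subject_ids]
decoder_input = np.concatenate([z, s], axis=1)
```

The same sampled z (without the appended s) is what the static encoder feeds to the downstream MLP classifier after adversarial training.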
II-E. Dataset and Implementation
We used the publicly available PhysioNet EEG Motor Movement/Imagery Dataset [22], collected using the BCI2000 instrumentation system [23]. The dataset consists of over 1500 one- and two-minute EEG recordings obtained from 109 subjects. Throughout the experiments, subjects were seated in front of a computer screen and instructed to perform cue-based motor execution/imagery tasks while 64-channel EEG was recorded at a sampling rate of 160 Hz. These tasks included executing movements of the right or left hand, or opening and closing both fists or both feet, as well as imagining these movements. Each trial lasted four seconds, with inter-trial resting periods of the same length. At the beginning of the experiments, eyes-open and eyes-closed resting-state EEG were also recorded. Each subject participated in a single session.
From this data set, six subjects' data were discarded due to irregular timestamp alignments, resulting in a total of 103 subjects. We used the trials corresponding to right- and left-hand motor imagery to evaluate our approach on a conventional BCI paradigm [24]. This resulted in a total of 45 four-second trials per subject, with binary class labels corresponding to right- or left-hand imagery. We randomly selected 13 subjects to hold out for across-subjects transfer learning experiments. Using the remaining 90 subjects' data, the networks were trained over a training set of 3240 trials, while validation was performed on the remaining 810 trials, including data from all 90 subjects. We implemented all analyses with the Chainer deep learning framework [25].
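The subject and trial partitioning described above can be sketched as follows. This is an illustrative NumPy version (not the authors' code): 13 of the 103 subjects are held out entirely, and the remaining 90 subjects' 45 trials each (4050 total) are split 80/20 into the reported 3240 training and 810 validation trials.

```python
import numpy as np

rng = np.random.default_rng(3)

n_subjects, trials_per_subject = 103, 45
subjects = np.arange(n_subjects)

# Hold out 13 randomly chosen subjects entirely for transfer evaluation.
holdout = rng.choice(subjects, size=13, replace=False)
train_subjects = np.setdiff1d(subjects, holdout)

# The remaining 90 subjects contribute 45 trials each; split the pooled
# 4050 trials 80/20 into training and validation sets.
all_trials = [(s, t) for s in train_subjects for t in range(trials_per_subject)]
perm = rng.permutation(len(all_trials))
train_idx, val_idx = perm[:3240], perm[3240:]
```

Because the split is over pooled trials rather than over subjects, the validation set contains trials from all 90 training subjects, matching the description above.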
Networks were trained with 100 trials per batch for 750 epochs (approximately 25,000 iterations), and parameter updates were performed once per batch with Adam [26].

III. Experimental Results
III-A. EEG Pre-Processing and Model Evaluation
All subjects' data were epoched into the time interval where the neural changes induced by motor imagery are emphasized [24]. Specifically, from the four-second duration, the interval from 1 to 3 seconds after the imagery cue onset was extracted for the experiments, resulting in a time-series length of T = 320 samples at 160 Hz. Raw EEG data were normalized to have zero mean. Note that this preprocessing statistic (i.e., the data mean) is computed only on the training data and then applied to the validation and transfer subjects' data.
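The epoching and normalization steps above can be sketched as follows. This is a toy NumPy illustration (random arrays in place of real EEG; trial counts are hypothetical): the 1-to-3 s window at 160 Hz yields 320 samples, and the mean is estimated on training data only and then reused for validation data.

```python
import numpy as np

rng = np.random.default_rng(4)

fs = 160                        # sampling rate (Hz)
n_channels, trial_secs = 64, 4
n_train, n_val = 10, 3          # toy trial counts for illustration

train = rng.standard_normal((n_train, n_channels, trial_secs * fs))
val = rng.standard_normal((n_val, n_channels, trial_secs * fs))

# Epoch to the 1-to-3 s window after the imagery cue: 2 s at 160 Hz = 320 samples.
start, stop = 1 * fs, 3 * fs
train_epoched = train[:, :, start:stop]
val_epoched = val[:, :, start:stop]

# Zero-mean normalization: the mean is computed on the training data only and
# then applied unchanged to validation (and transfer subjects') data.
train_mean = train_epoched.mean()
train_norm = train_epoched - train_mean
val_norm = val_epoched - train_mean
```

Reusing the training-set statistic avoids leaking information from validation or transfer subjects into the preprocessing pipeline.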
We evaluate adversarial representation learning with the following frameworks: (1) AcVAE, (2) cVAE, (3) adversarially censored VAE without conditioning (AVAE), and (4) a basic convolutional encoder (CNN). Implementation of (1) corresponds to Sections II-C and II-D. The approach in (2) is expected to reveal the practical deficiencies of relying on decoder conditioning alone for representation invariance; in that case, we still train an adversary in parallel but do not feed the adversarial loss into the overall training objective, i.e., we use Eq. (1). Method (3) is expected to reveal the trade-off between enforcing invariance with an adversary while still preserving enough information in z to allow sufficient decoder learning (cf. a similar approach in [6]); this corresponds to using the same objective as AcVAE but not providing s at the decoder input. Finally, (4) is a baseline that uses the same CNN encoder architecture in combination with an MLP classifier, but trained end-to-end from scratch (via the softmax cross-entropy classification loss) rather than first training the encoder within a VAE.
IiiB AcrossSubjects Transfer Learning
To assess representation invariance, the accuracies of the adversary network over the 90 training subjects after training are presented in Table II. In this context, a higher accuracy indicates that more subject-specific information remains in the learned representations z, allowing the adversary to decode s more easily. A lower adversary accuracy therefore indicates better invariant representation learning, as observed through the least leakage with AcVAE.
Distributions of transfer learning classification accuracies for the 13 held-out subjects are shown in Fig. 2. Using no subject-specific training or fine-tuning data, we observe the highest accuracies with AcVAE. Consistent with the results in Table II, accuracies decrease for cVAE and AVAE relative to AcVAE. The baseline CNN tends to memorize the training data without any attempt at subject invariance, resulting in a high variation of accuracies across the 13 subjects, as intuitively expected.
TABLE II: Adversary decoding accuracies on training and validation data.

        Training Data          Validation Data
AcVAE |  cVAE  |  AVAE      AcVAE |  cVAE  |  AVAE
IV. Discussion
In this work, we introduced adversarial invariant representation learning as a novel approach to transfer learning in BCIs. We showed that learning subject-invariant representations via adversarial censoring can be a significantly useful tool for subject-transfer learning, and demonstrated an empirical proof-of-concept with EEG data recorded from 103 subjects during a motor imagery BCI experiment.
Here, we focused mainly on the invariance of the learned representations and on the across-subjects transfer learning capability of the models. The proposed approach can, however, be further extended in the context of semi-supervised transfer learning in BCIs: using a short calibration period for fine-tuning and semi-supervised transfer, learning session-invariant representations to reduce user-oriented BCI system calibration times, or learning disentangled representations that exploit adversarial censoring to obtain partly subject-invariant and partly subject-variant representations. We highlight that these frameworks should be of significant interest in the field of neural interfaces.
References
 [1] M. Krauledat, M. Tangermann, B. Blankertz, and K.-R. Müller, “Towards zero training for brain-computer interfacing,” PLoS ONE, vol. 3, no. 8, p. e2967, 2008.
 [2] H. Kang, Y. Nam, and S. Choi, “Composite common spatial pattern for subject-to-subject transfer,” IEEE Signal Processing Letters, vol. 16, no. 8, pp. 683–686, 2009.
 [3] W. Samek, F. C. Meinecke, and K.-R. Müller, “Transferring subspaces between subjects in brain–computer interfacing,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 8, pp. 2289–2298, 2013.
 [4] M. Alamgir, M. Grosse-Wentrup, and Y. Altun, “Multitask learning for brain-computer interfaces,” in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010, pp. 17–24.
 [5] V. Jayaram, M. Alamgir, Y. Altun, B. Schölkopf, and M. Grosse-Wentrup, “Transfer learning in brain-computer interfaces,” IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 20–31, 2016.
 [6] H. Edwards and A. Storkey, “Censoring representations with an adversary,” arXiv preprint arXiv:1511.05897, 2015.
 [7] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015.
 [8] M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun, “Disentangling factors of variation in deep representation using adversarial training,” in Advances in Neural Information Processing Systems, 2016, pp. 5040–5048.
 [9] G. Lample, N. Zeghidour, N. Usunier, A. Bordes, L. Denoyer et al., “Fader networks: Manipulating images by sliding attributes,” in Advances in Neural Information Processing Systems, 2017.
 [10] A. Creswell, A. A. Bharath, and B. Sengupta, “Conditional autoencoders with adversarial information factorization,” arXiv preprint arXiv:1711.05175, 2017.
 [11] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Computer Vision and Pattern Recognition, vol. 1, no. 2, 2017, p. 4.
 [12] J. Shen, Y. Qu, W. Zhang, and Y. Yu, “Adversarial representation learning for domain adaptation,” arXiv preprint arXiv:1707.01217, 2017.
 [13] Y. Wang, T. KoikeAkino, and D. Erdogmus, “Invariant representations from adversarially censored autoencoders,” arXiv preprint arXiv:1805.08097, 2018.
 [14] D. P. Kingma and M. Welling, “Autoencoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
 [15] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014.
 [16] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: A compact convolutional network for EEG-based brain-computer interfaces,” arXiv preprint arXiv:1611.08024, 2016.
 [17] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
 [18] P. Bashivan, I. Rish, M. Yeasin, and N. Codella, “Learning representations from EEG with deep recurrentconvolutional neural networks,” arXiv preprint arXiv:1511.06448, 2015.
 [19] Y. Luo and B.L. Lu, “EEG data augmentation for emotion recognition using a conditional Wasserstein GAN,” in International Conference of the IEEE Engineering in Medicine and Biology Society, 2018.
 [20] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, “EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals,” arXiv preprint arXiv:1806.01875, 2018.
 [21] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Advances in Neural Information Processing Systems, 2015, pp. 3483–3491.
 [22] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
 [23] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw, “BCI2000: a generalpurpose braincomputer interface (BCI) system,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 6, pp. 1034–1043, 2004.
 [24] G. Pfurtscheller and C. Neuper, “Motor imagery and direct braincomputer communication,” Proceedings of the IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
 [25] S. Tokui, K. Oono, S. Hido, and J. Clayton, “Chainer: a next-generation open source framework for deep learning,” in Proceedings of Workshop on Machine Learning Systems in the 29th Annual Conference on Neural Information Processing Systems, 2015.
 [26] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.