Transfer Learning in Brain-Computer Interfaces with Adversarial Variational Autoencoders

12/17/2018 ∙ by Ozan Özdenizci, et al.

We introduce adversarial neural networks for representation learning as a novel approach to transfer learning in brain-computer interfaces (BCIs). The proposed approach aims to learn subject-invariant representations by simultaneously training a conditional variational autoencoder (cVAE) and an adversarial network. We use shallow convolutional architectures to realize the cVAE, and the learned encoder is transferred to extract subject-invariant features from unseen BCI users' data for decoding. We demonstrate a proof-of-concept of our approach based on analyses of electroencephalographic (EEG) data recorded during a motor imagery BCI experiment.




I Introduction

Transfer learning commonly refers to discovering and exploiting shared structure in the data that is invariant across data sets. Brain-computer interfaces (BCIs) aim to provide a direct neural communication and control channel, e.g., for individuals with severe neuromuscular disorders. In this context, transfer learning is of significant interest because exploiting neural data recorded from other subjects can reduce BCI system calibration times. This is essential for potential patient end-users, since data can be collected from patients only for limited periods during which they maintain adequate concentration and consciousness. Several studies in this domain aim to find neural features (representations) that are invariant across subjects or sessions to calibrate BCIs [1, 2, 3], or to learn a structure for the set of decision rules and how they differ across subjects and sessions [4, 5].

Beyond neural interfaces, significant recent progress in domain transfer learning has been achieved with adversarially censored invariant representations, within the growing field of deep learning for computer vision and image processing [6, 7, 8, 9, 10, 11, 12, 13]. These methods rely on learning generative models of the data that allow synthesis of data samples from latent representations. Such models can be realized with variational autoencoders (VAEs) [14] for unsupervised feature learning, or with generative adversarial networks (GANs) [15], where the need for explicit supervision is relaxed by penalizing unrealistic samples through an adversarial game. These models are then trained with adversarial censoring to learn representations that are independent of certain nuisance variables (e.g., a variable representing the factors of variation across data sets). In light of this recent work, we introduce adversarial representation learning as a novel approach to transfer learning in BCIs.

Deep convolutional neural networks (CNNs) from computer vision have already been adapted to extract features for task-specific decoding in electroencephalogram (EEG) based BCIs [16, 17], and recent work has also learned deep generative models of EEG [18, 19, 20]. In the present study, we extend these lines of work and propose a transfer learning approach for BCIs that exploits adversarial training for subject-invariant representation learning. Specifically, following [9, 13], the proposed approach learns subject-invariant representations by simultaneously training a conditional VAE and an adversarial network that enforces invariance of the learned data representations with respect to subject identity. This adversarial training procedure, with VAEs built on CNN architectures, yields data representations that serve as features disentangled from subject-specific nuisance variations, which enables decoding for unseen BCI subjects. We demonstrate the advantage of this approach with a proof-of-concept based on analyses of EEG data recorded from 103 subjects during a motor imagery BCI experiment.

II Methods

II-A Notation

Let $\mathcal{D}_s = \{(X_i, y_i)\}_{i=1}^{n_s}$ denote the data set for subject $s$ consisting of $n_s$ trials, where $X_i \in \mathbb{R}^{C \times T}$ is the raw EEG data at trial $i$ recorded from $C$ channels for $T$ discretized time samples, and $y_i \in \{1, \dots, L\}$ is the corresponding class label from a set of $L$ class labels. In a subject-to-subject transfer learning problem, the aim is to learn a parametric encoder with parameters $\theta_f$ that generalizes across subjects and extracts latent representations $h = f(X; \theta_f)$ from the data $X$ that are useful in discriminating the different tasks or brain states indicated by their class labels $y$. Accordingly, let $\mathbf{s} \in \{0, 1\}^S$ denote the one-hot encoded subject identifier vector for subject $s$ (i.e., an $S$-dimensional vector with a value of 1 at the $s$'th index and zero in all other indices). In our adversarial representation learning frameworks, $\mathbf{s}$ represents the nuisance variable that $h$ will be enforced to be independent of.
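A minimal Python sketch of the one-hot subject identifier described above. It assumes 0-based subject indices (the text counts from 1), and the function name `one_hot_subject` is illustrative, not taken from the original implementation.

```python
from typing import List


def one_hot_subject(subject_index: int, num_subjects: int) -> List[int]:
    """Return the num_subjects-dimensional one-hot identifier for a subject.

    Assumes 0-based indexing: subject_index must lie in [0, num_subjects).
    """
    if not 0 <= subject_index < num_subjects:
        raise ValueError("subject_index must lie in [0, num_subjects)")
    vec = [0] * num_subjects
    vec[subject_index] = 1
    return vec


# Subject 2 out of 5 training subjects.
print(one_hot_subject(2, 5))  # [0, 0, 1, 0, 0]
```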

Fig. 1: Adversarial cVAE (A-cVAE) architecture with a stochastic encoder and a deterministic decoder with conditioning on $\mathbf{s}$. A-cVAE is trained to minimize the loss function in Eq. (2), while the adversary is separately trained to minimize its softmax cross-entropy loss. Parameter updates alternated between the A-cVAE and the adversary once per batch.

II-B Conditional Variational Autoencoder (cVAE)

VAEs [14] learn a generative model as a pair of encoder and decoder networks. The encoder learns a latent representation $h$ from the data $X$, while the decoder aims to reconstruct the data as $\hat{X}$ from the learned representation $h$. In this variational framework the encoder is stochastic: it outputs the parameters of a learned posterior $q_{\theta_f}(h \mid X)$, and the decoder is provided with samples $h \sim q_{\theta_f}(h \mid X)$ from this posterior as input.
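The sampling step can be sketched as follows with the reparameterization trick. This assumes, as is common in VAE implementations but not stated in the text, that the encoder outputs a log-variance vector rather than the standard deviation directly; `sample_latent` is a hypothetical plain-Python helper, not a Chainer function.

```python
import math
import random
from typing import List, Optional


def sample_latent(mu: List[float], log_var: List[float],
                  rng: Optional[random.Random] = None) -> List[float]:
    """Draw h ~ N(mu, diag(exp(log_var))) with the reparameterization trick.

    Writing h = mu + sigma * eps with eps ~ N(0, I) moves the randomness into
    eps, so in an autodiff framework gradients can flow into mu and log_var.
    """
    rng = rng if rng is not None else random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


# One stochastic draw for a 3-dimensional latent space (seeded for repeatability).
h = sample_latent([0.0, 1.0, -1.0], [0.0, 0.0, -4.0], random.Random(0))
```

With a very small variance the draw collapses to the mean, which is a quick sanity check on the parameterization.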

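In the conditional VAE (cVAE) framework [21], the decoder is conditioned on a nuisance variable $\mathbf{s}$ as an additional input besides $h$, and the encoder is expected to learn representations that are invariant of $\mathbf{s}$, since $\mathbf{s}$ is already given as input to the decoder. The loss function to be minimized in this cVAE framework is the negative of the evidence lower bound (ELBO):

$$\mathcal{L}_{\text{cVAE}}(\theta_f, \theta_g) = -\mathbb{E}_{q_{\theta_f}(h \mid X)}\big[\log p_{\theta_g}(X \mid h, \mathbf{s})\big] + \mathrm{KL}\big(q_{\theta_f}(h \mid X) \,\|\, p(h)\big), \qquad (1)$$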
where the first term is the reconstruction loss of the decoder, and the second term is the variational posterior loss of the encoder, i.e., the Kullback-Leibler divergence between the posterior $q_{\theta_f}(h \mid X)$ and the prior $p(h)$. This framework implicitly enforces invariance of $h$ with respect to $\mathbf{s}$. However, this invariance is known not to be fully achieved in practice, which motivates adversarial training methods in representation learning [13].
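A single-sample estimate of the loss in Eq. (1) can be sketched under common assumptions that the text does not spell out: a standard normal prior $p(h)$, a diagonal Gaussian posterior, and a Gaussian decoder likelihood with fixed variance. For the latter, the negative log-likelihood reduces to a squared error up to a scale factor (which changes the relative weight of the KL term) and an additive constant. Inputs are flattened lists of samples, and the function names are illustrative.

```python
import math
from typing import List


def gaussian_kl(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def mse(x: List[float], x_hat: List[float]) -> float:
    """Mean squared reconstruction error over all flattened EEG samples."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cvae_loss(x: List[float], x_hat: List[float],
              mu: List[float], log_var: List[float]) -> float:
    """Single-sample estimate of the cVAE loss: reconstruction + KL.

    x_hat is assumed to come from the decoder fed with a sampled h
    concatenated with the one-hot subject vector.
    """
    return mse(x, x_hat) + gaussian_kl(mu, log_var)


print(cvae_loss([1.0, 2.0], [1.0, 1.5], [0.0, 0.0], [0.0, 0.0]))  # 0.125
```

When the posterior equals the prior (zero mean, unit variance), the KL term vanishes and only the reconstruction error remains, as in the example.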

II-C Adversarial Conditional VAE (A-cVAE)

In the proposed adversarial cVAE (A-cVAE) framework [9, 13], a conditional VAE and an adversary that enforces invariance with respect to $\mathbf{s}$ (i.e., subject identifiers) are trained simultaneously. Specifically, alongside a cVAE that takes EEG time-series data $X$ as input to the encoder and estimates $\hat{X}$ at the decoder, an adversary is trained that takes the learned representations $h$ as input and estimates $\hat{\mathbf{s}}$.
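The adversary's task is ordinary multi-class classification of subject identity from $h$. A numerically stable softmax cross-entropy, written in plain Python for illustration (the helper names are hypothetical), looks like this:

```python
import math
from typing import List


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Return -log softmax(logits)[target], using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def predicted_subject(logits: List[float]) -> int:
    """Index of the subject the adversary considers most likely."""
    return max(range(len(logits)), key=logits.__getitem__)


# Adversary logits over 3 subjects for one representation h; true subject is 0.
logits = [2.0, 0.0, 0.0]
print(predicted_subject(logits))  # 0
```

Uniform logits give a loss of log(S), the value an adversary that has learned nothing about subject identity would reach.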

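We extend Eq. (1) to obtain the A-cVAE loss function. For the deterministic decoder, the reconstruction loss is the mean squared error between the recorded and the reconstructed time-series EEG data $\hat{X}$. Furthermore, the softmax cross-entropy loss of the adversary network, $\mathcal{L}_{\text{adv}}(\theta_w) = -\mathbb{E}\big[\log q_{\theta_w}(\mathbf{s} \mid h)\big]$, is subtracted from the A-cVAE loss, which is then given by:

$$\mathcal{L}_{\text{A-cVAE}}(\theta_f, \theta_g) = \mathbb{E}\Big[\tfrac{1}{CT}\,\|X - \hat{X}\|_F^2\Big] + \mathrm{KL}\big(q_{\theta_f}(h \mid X) \,\|\, p(h)\big) - \lambda\, \mathcal{L}_{\text{adv}}(\theta_w), \qquad (2)$$

where $\lambda$ is a weight parameter that adjusts the impact of adversarial censoring on the learned representations. Alternating with the A-cVAE parameter updates once per batch, the adversary is separately trained to minimize its softmax cross-entropy loss $\mathcal{L}_{\text{adv}}$.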
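The sign structure of Eq. (2) and the two alternating objectives can be sketched in plain Python as below (hypothetical names; a diagonal Gaussian posterior and standard normal prior are assumed for the KL term). In a full implementation, each batch would take one optimizer step on the encoder and decoder using `a_cvae_loss` with the adversary held fixed, and one step on the adversary using `adversary_loss` with the encoder held fixed; the order of the two steps within a batch is an assumption.

```python
import math
from typing import List


def gaussian_kl(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(x: List[float], x_hat: List[float]) -> float:
    """Mean squared reconstruction error over flattened samples."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Stable -log softmax(logits)[target]."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]


def a_cvae_loss(x, x_hat, mu, log_var, adv_logits, subject, lam):
    """Eq. (2): MSE + KL - lam * (adversary cross-entropy).

    Minimized w.r.t. encoder/decoder parameters only; a confused adversary
    (large cross-entropy) lowers this loss, pushing h toward subject invariance.
    """
    return (mse(x, x_hat) + gaussian_kl(mu, log_var)
            - lam * softmax_cross_entropy(adv_logits, subject))


def adversary_loss(adv_logits, subject):
    """Minimized w.r.t. adversary parameters only, alternating once per batch."""
    return softmax_cross_entropy(adv_logits, subject)
```

The two losses pull in opposite directions on the same cross-entropy term, which is what makes the procedure a minimax game between encoder and adversary.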

II-D Model Architecture and Classifier Training

In our implementations, the encoder and decoder have convolutional architectures with temporal and spatial filtering, motivated by the results achieved with EEGNet [16], Deep ConvNet and Shallow ConvNet [17]. The convolutional cVAE architecture is illustrated in Fig. 1 and detailed in Table I. The two fully connected layers at the output of the encoder generate two $d$-dimensional parameter vectors $\mu$ and $\sigma$, which are then used to sample $h$. The nuisance variable vector $\mathbf{s}$ is then concatenated to the sampled $h$ to form the input of the decoder. We used temporal convolution kernels of size , spatial convolution kernels of size , and a latent vector dimensionality of . Alongside the cVAE, the adversary is realized as a single hidden layer multilayer perceptron (MLP) with ReLU nonlinearity after the first layer, and we used a fixed adversarial censoring weight $\lambda$ in all experiments.
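The concatenation step determines the size of the decoder's first fully connected layer: $d + S$ inputs with conditioning. A sketch with hypothetical names follows; the `conditional` flag models the unconditioned A-VAE variant evaluated later, whose decoder sees only $h$.

```python
from typing import List, Sequence


def decoder_input(h: Sequence[float], subject_onehot: Sequence[int],
                  conditional: bool = True) -> List[float]:
    """Build the decoder input from a sampled latent h.

    With conditioning (cVAE, A-cVAE) the one-hot subject vector is appended,
    giving d + S inputs; without it (A-VAE) the decoder sees only the d values of h.
    """
    out = [float(v) for v in h]
    if conditional:
        out.extend(float(v) for v in subject_onehot)
    return out


z = decoder_input([0.1, -0.3, 0.7, 0.0], [0, 1, 0])
print(len(z))  # 7
```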

After adversarial representation learning on a set of training data samples, the encoder is frozen, and a classifier connected to the output of the encoder is trained on the same training samples. Specifically, all training samples were again passed through the previously optimized, frozen encoder; using the parameters obtained at the encoder output, a latent vector $h$ was sampled and used as input to the classifier. The classifier was also realized as a single hidden layer MLP with ReLU nonlinearity after the first layer, and was trained to minimize its softmax cross-entropy loss $\mathcal{L}_{\text{cls}}$. The adversary network had an output dimensionality of $S$, and the classifier had an output dimensionality of $L$. The hidden layers of the adversary and the classifier had the same number of nodes.
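The adversary and the classifier share this single-hidden-layer structure and differ only in output size. With the encoder frozen, only the MLP weights would be updated during classifier training. A plain-Python forward pass with hypothetical names and toy weights:

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Affine layer: y_j = sum_i x_i * w[i][j] + b_j (w is inputs x outputs)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def relu(v: Sequence[float]) -> List[float]:
    """Elementwise max(0, v)."""
    return [max(0.0, a) for a in v]


def mlp_logits(h, w1, b1, w2, b2) -> List[float]:
    """Single-hidden-layer MLP with ReLU after the first layer.

    Returns unnormalized logits; the softmax is applied inside the
    cross-entropy loss. Output size is S for the adversary, L for the classifier.
    """
    return dense(relu(dense(h, w1, b1)), w2, b2)


# Toy weights: 2-d latent, 2 hidden units, 2 output classes.
logits = mlp_logits([1.0, -1.0],
                    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                    [[1.0, -1.0], [0.0, 0.0]], [0.0, 0.0])
print(logits)  # [1.0, -1.0]
```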

Layer        Input Dim.   Operation
Encoder 1    40           Temporal Conv1D; BatchNorm + ReLU + Dropout (0.25)
Encoder 2    40           Spatial Conv1D; BatchNorm + ReLU + Dropout (0.25)
Encoder 3                 Reshape (Flatten); 2 Fully-Connected Layers
Latent                    Sample with estimated parameters
Decoder 1                 Fully-Connected Layer; ReLU + Reshape
Decoder 2    40           Spatial Deconv1D; BatchNorm + ReLU + Dropout (0.25)
Decoder 3    40           Temporal Deconv1D; BatchNorm + ReLU + Dropout (0.25)
TABLE I: A-cVAE Encoder and Decoder Architectures

II-E Dataset and Implementation

We used the publicly available PhysioNet EEG Motor Movement/Imagery Dataset [22], which was collected using the BCI2000 instrumentation system [23]. The dataset consists of over 1500 one- and two-minute EEG recordings obtained from 109 subjects. During the experiments, subjects were seated in front of a computer screen and instructed to perform cue-based motor execution/imagery tasks while 64-channel EEG was recorded at a sampling rate of 160 Hz. These tasks included moving the right or left hand, or opening and closing both fists or both feet, as well as only imagining these movements. Each trial lasted four seconds, with inter-trial resting periods of the same length. At the beginning of the experiments, eyes-open and eyes-closed resting-state EEG were also recorded. Each subject participated in a single experimental session.

From this data set, six subjects' data were discarded due to irregular timestamp alignments, resulting in a total of 103 subjects. We used the trials corresponding to right and left hand motor imagery to evaluate our approach on a conventional BCI paradigm [24]. This resulted in 45 four-second trials per subject, with binary class labels corresponding to right or left hand imagery. We randomly selected 13 subjects to hold out for across-subject transfer learning experiments. Using the remaining 90 subjects' data (4050 trials), the networks were trained on a training set of 3240 trials, while validation was performed on the remaining 810 trials, which include data from all 90 subjects. We implemented all analyses with the Chainer deep learning framework [25]. Networks were trained with 100 trials per batch for 750 epochs (≈25,000 iterations), and parameter updates were performed once per batch with Adam [26].
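The subject, trial, and iteration counts above are mutually consistent, as this short Python check shows. Whether the final partial batch of each epoch was kept is not stated; the sketch assumes it was (24,750 updates), while dropping it would give 32 × 750 = 24,000. Either way the total is close to the stated figure of roughly 25,000.

```python
import math

N_SUBJECTS = 103          # after discarding 6 of the 109 recorded subjects
N_HELD_OUT = 13           # transfer (test) subjects
TRIALS_PER_SUBJECT = 45   # left/right hand motor imagery trials
N_TRAIN_TRIALS = 3240
BATCH_SIZE = 100
EPOCHS = 750

n_train_subjects = N_SUBJECTS - N_HELD_OUT                  # 90
n_trials = n_train_subjects * TRIALS_PER_SUBJECT            # 4050
n_val_trials = n_trials - N_TRAIN_TRIALS                    # 810
train_fraction = N_TRAIN_TRIALS / n_trials                  # 0.8
# Assumption: the final partial batch of each epoch is kept.
batches_per_epoch = math.ceil(N_TRAIN_TRIALS / BATCH_SIZE)  # 33
iterations = batches_per_epoch * EPOCHS                     # 24750

print(n_trials, n_val_trials, iterations)  # 4050 810 24750
```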

III Experimental Results

III-A EEG Pre-Processing and Model Evaluation

All subjects’ data were epoched into the time-interval where the neural changes induced by motor imagery are emphasized [24]. Specifically, from the four second duration, the 1-to-3 seconds interval after the imagery cue onset were extracted to be used in experiments, resulting in a time-series length of . Raw EEG data were normalized to have zero mean. Note that this pre-processing statistics (i.e., data mean) is only computed on the training data, and then applied to validation and transfer subjects’ data.

We evaluate adversarial representation learning with the following frameworks: (1) A-cVAE, (2) cVAE, (3) adversarially censored VAE without conditioning (A-VAE), and (4) a basic convolutional encoder (CNN). Framework (1) is implemented as described in Sections II-C and II-D. Framework (2) is expected to reveal the practical shortcomings of relying only on decoder conditioning for representation invariance. In that case, we still train an adversary in parallel but do not feed the adversarial loss into the overall training objective, i.e., we use Eq. (1). Framework (3) is expected to reveal the tradeoff between enforcing invariance with an adversary and preserving enough information in $h$ for the decoder to reconstruct the data (cf. a similar approach in [6]). It uses the same objective as A-cVAE, but without providing $\mathbf{s}$ as input to the decoder. Finally, (4) is a baseline that uses the same CNN encoder architecture with an MLP classifier, but is trained end-to-end from scratch (via the softmax cross-entropy classification loss) rather than first training the encoder within a VAE.
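The four frameworks differ along three switches, summarized here as a small Python configuration. The field names and the `vae_objective` helper are illustrative rather than taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    vae_stage: bool          # encoder first trained inside a (c)VAE
    conditional: bool        # decoder receives the one-hot subject vector
    adversarial_term: bool   # -lam * adversary loss included in the VAE objective


VARIANTS = {
    "A-cVAE": Variant("A-cVAE", True, True, True),
    "cVAE": Variant("cVAE", True, True, False),   # adversary trained but only monitored
    "A-VAE": Variant("A-VAE", True, False, True),
    "CNN": Variant("CNN", False, False, False),   # encoder + MLP trained end to end
}


def vae_objective(v: Variant, recon: float, kl: float,
                  adv_ce: float, lam: float) -> float:
    """Representation-learning loss for a variant that has a VAE stage."""
    if not v.vae_stage:
        raise ValueError(f"{v.name} is trained end to end with a classification loss")
    penalty = lam * adv_ce if v.adversarial_term else 0.0
    return recon + kl - penalty


print(vae_objective(VARIANTS["A-cVAE"], 1.0, 0.5, 2.0, 0.25))  # 1.0
print(vae_objective(VARIANTS["cVAE"], 1.0, 0.5, 2.0, 0.25))    # 1.5
```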

III-B Across-Subjects Transfer Learning

To assess representation invariance, Table II presents the accuracies of the adversary network in identifying the 90 training subjects after training. A higher accuracy indicates that more subject-specific information remains in the learned representations $h$, which allows the adversary to decode $\mathbf{s}$ more accurately (for reference, chance level for 90 subjects is 1/90 ≈ 1.1%). Therefore, a lower adversary accuracy indicates better invariant representation learning, and A-cVAE shows the least leakage.
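Adversary accuracy is read against the chance level of uniform guessing among the training subjects. A hypothetical helper for both, taking the argmax over the adversary's logits:

```python
from typing import List, Sequence


def argmax(v: Sequence[float]) -> int:
    """Index of the largest entry."""
    return max(range(len(v)), key=v.__getitem__)


def adversary_accuracy(logits: List[Sequence[float]], subjects: List[int]) -> float:
    """Fraction of representations whose subject the adversary identifies correctly."""
    hits = sum(argmax(row) == s for row, s in zip(logits, subjects))
    return hits / len(subjects)


def chance_level(num_subjects: int) -> float:
    """Expected accuracy of uniform guessing among num_subjects identities."""
    return 1.0 / num_subjects


print(round(100 * chance_level(90), 2))  # 1.11
```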

Distributions of transfer learning classification accuracies for the 13 held-out subjects are shown in Fig. 2. Using no subject-specific training or fine-tuning data, we observe accuracies up to with A-cVAE. Consistent with the results in Table II, cVAE and A-VAE yield lower accuracies than A-cVAE. The baseline CNN, which makes no attempt at subject invariance, tends to memorize the training data, resulting in a high variation of accuracies across the 13 held-out subjects, as expected.

Training Data Validation Data
TABLE II: Adversary Accuracies After Model Training
Fig. 2: Transfer learning classification accuracies for the 13 held-out subjects with features learned by: (1) A-cVAE, (2) cVAE, (3) A-VAE (i.e., no conditioning on $\mathbf{s}$), (4) CNN as a baseline. The central line marks the median across the 13 subjects. The lower and upper bounds of the box represent the first and third quartiles, respectively. Dashed lines extend to the extreme samples. Mean accuracies are: (1) 63.8%, (2) 61.2%, (3) 56.9%, (4) 59.8%.

IV Discussion

In this work, we introduced adversarial invariant representation learning as a novel approach to transfer learning in BCIs. We showed that learning subject-invariant representations by adversarial censoring can be a useful tool for subject-to-subject transfer learning, and we demonstrated an empirical proof-of-concept with EEG data recorded from 103 subjects during a motor imagery BCI experiment.

Here, we mainly focused on the invariance of the learned representations and on the across-subject transfer learning capability of the models. However, the proposed approach can be further extended to semi-supervised transfer learning in BCIs: for example, by using a short calibration session for fine-tuning and semi-supervised transfer, by learning session-invariant representations to reduce user-specific BCI calibration times, or by learning disentangled representations that exploit adversarial censoring to be partly subject-invariant and partly subject-variant. We expect these frameworks to be of significant interest in the field of neural interfaces.