Electroencephalography (EEG) is the most prominent data acquisition approach in Brain-Computer Interfaces (BCI), owing to its non-invasive nature, relative ease of use and high temporal resolution [1, 2]. Traditionally, the electrodes used for EEG are placed on the scalp with conductive gel (wet-EEG) in order to lower the impedance between the electrodes and the skin. The impedance value measures how well the electrode conducts to the skin: the lower the impedance, the better the electrode-skin contact and hence the overall EEG signal quality [4, 5].
The major drawback of wet-EEG is the gel application required by the Ag/AgCl electrodes, which results in substantial preparation time, scalp discomfort and additional time to remove the gel after the experimental protocol. Furthermore, the gel dries over time, limiting the duration of experimental data acquisition. Moreover, classical wet-EEG requires specific experimental conditions such as a Faraday cage (a physical shield made of conductive material), which reduces signal noise from external electromagnetic interference. This limits the application of BCI using wet-EEG to strict experimental operating conditions. By contrast, a dry-EEG headset offers an alternative approach that alleviates these limitations in terms of skin preparation, stable connectivity and comfort during experimentation, in addition to ready adaptability to different head sizes [6, 7]. However, its major drawback is the relatively higher impedance compared to wet-EEG, which makes it difficult to reduce EEG signal noise and unwanted artefacts. This results in a substantially more challenging signal decoding and classification task.
In this study, we use the commercially available Quick-20 dry-EEG headset from Cognionics Inc. (San Diego, USA) with 20 dry-EEG sensors (10-20 sensor layout compliant). The system is employed without the need for skin preparation and it is both portable and wireless [8, 6, 7, 9]. The headset comes with individual local active shields that eliminate the need for strict experimental conditions [10, 6]. In our experiments, we collect dry-EEG signals with the Steady-State Visual Evoked Potential (SSVEP) as the neuro-physiological response. SSVEP has the feature of frequency tagging, which enables the measurement of neural activity in response to a flickering stimulus on which the subject is fixated, even if the subject is not paying full conscious attention to the stimulus. It is considered to be the most suitable type of stimulus for effective high-throughput BCI, as SSVEP can provide neural signals with a high Information Transfer Rate (ITR) with minimal subject training.
In this study, we investigate the use of a deep neural network, specifically a Convolutional Neural Network (CNN), to perform the classification of SSVEP frequencies in dry-EEG data. CNNs are a subset of neural networks, which learn to differentiate between classes in data by extracting unique features across multiple layers of convolutional transformation. In the convolution layer, the input is convolved with kernels (filters) to obtain feature maps. This process removes the requirement for hand-crafted feature extraction as well as common signal pre-processing steps, as raw data samples can be used as a direct input to the model [14, 15]. This property provides a critical advantage, as salient EEG signals or features may be excluded or missed when using traditional pre-processing based approaches.
We evaluate the performance of our proposed CNN architecture at classifying dry-EEG SSVEP signals on a four-class stimulus problem collected from a single subject, and highlight its superior performance when compared to baseline classifiers including the Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Minimum Distance to Mean (MDM) and a Recurrent Neural Network (RNN). Furthermore, we use the same CNN architecture to examine multiple subjects, exploring both within-subject and across-subject performance. Finally, to test the ability of the CNN to generalise to unseen subjects, we explore the performance when testing upon a subject for which no sample data is present within the training dataset.
In summary, the major contributions of this study are:
- An end-to-end deep learning CNN architecture to perform the classification of raw dry-EEG SSVEP data without the need for manual pre-processing or feature extraction (the first study to do so with the accuracy achieved: 96%).
- A demonstrable model that achieves generalisation across subjects during training, in contrast to earlier EEG BCI work in the field (accuracy: 78%).
- An approach with the ability to generalise to entirely unseen subjects with no additional training, raising the potential for subject-independent BCI applications.
II Related Work
In one earlier study, a 32-channel dry-EEG system was used on subjects who fixated on 11 and 12 Hz SSVEP stimuli during walking trials. The performance and the quality of the cortical signals from wet-EEG and dry-EEG during locomotion were compared. In these experiments, wet-EEG outperformed dry-EEG by 4% to 10% in accuracy for standing and walking at different speeds, respectively.
Foot motor imagery has also been studied as a trigger for a lower limb exoskeleton, using the same 20-channel dry-EEG headset we use here. The aim of that work is a quick-setup system for asynchronous motor imagery BCI, as enabled by the dry-EEG headset.
In a further study, the authors control an exoskeleton via a visual stimulus generator with LEDs flickering at five different frequencies, corresponding to five different behaviours, in static and ambulatory experiments. They used eight wet-EEG electrodes to measure the SSVEP signals. Canonical Correlation Analysis (CCA), Multivariate Synchronization Index (MSI) and CCA with k-Nearest Neighbours (CCA-KNN) were compared against three proposed Neural Network (NN) methods: CNN-1 (3 layer network), CNN-2 (4 layer network) and a fully-connected NN. The data is pre-processed for all approaches, with the CNN-1 method providing the best accuracy in both the static and ambulatory settings.
Another study considered data recorded with traditional wet-EEG and five flickering stimulus frequencies. These authors proposed a CNN and an RNN with Long Short-Term Memory (LSTM) as deep learning methods, and compared them against traditional classifiers such as k-Nearest Neighbour (k-NN), Multi-layer Perceptron (MLP), decision trees and SVM. Among all the classifiers, the CNN outperformed the other approaches with a mean accuracy of 69.03%, and among the traditional classifiers, SVM provided the best overall accuracy.
EEGNet, a CNN model for wet-EEG data across paradigms, has also been introduced. That work includes four datasets for four different paradigms (P300 Event-Related Potential, Error-Related Negativity, Movement-Related Cortical Potential, and Sensorimotor Rhythm). The datasets come from different sources and have different sizes. The authors pre-process the data before training, and compare different approaches, including both shallow and deep CNNs, for within-subject and across-subject classification on all four paradigms. The results are inconclusive: each approach performs differently depending on the paradigm.
In contrast to these earlier works, we explicitly consider an end-to-end approach, without the need for EEG signal pre-processing, to tackle single subject, multiple subject and unseen subject SSVEP-based dry-EEG signal classification challenges.
In this section, we explore the creation of a machine learning model, specifically a deep CNN, to perform accurate classification of dry-EEG data. We include several baseline approaches against which to compare its classification accuracy. We also detail the methodology adopted for the experimental SSVEP data collection.
III-A Experimental Setup
In this work, we utilise SSVEP as the neuro-physiological response, measured via dry-EEG. The subjects sit in front of an LCD monitor with a 60 Hz refresh rate whilst wearing the dry-EEG headset. We record data for four SSVEP stimulus frequencies (10, 12, 15 and 30 Hz), using PsychoPy for stimulus presentation. The stimuli corresponding to the different flicker frequencies were presented on the primary computer. In order to assist with real-time processing further along the analysis pipeline, the cortical signals were streamed via the data acquisition software to a secondary computer and sent back to the primary computer. The communication between the different hardware components is shown in Figure 1.
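Each of the four stimulus frequencies divides the 60 Hz refresh rate evenly, so one flicker cycle spans a whole number of screen refreshes. The short sketch below only illustrates this arithmetic; the helper name `frames_per_cycle` is hypothetical and is not part of PsychoPy or of the presentation code used in the experiments.

```python
REFRESH_HZ = 60                 # monitor refresh rate stated in the text
STIMULI_HZ = [10, 12, 15, 30]   # SSVEP flicker frequencies stated in the text


def frames_per_cycle(refresh_hz, stimulus_hz):
    """Return how many screen refreshes make up one full flicker cycle."""
    frames, remainder = divmod(refresh_hz, stimulus_hz)
    if remainder:
        raise ValueError(f"{stimulus_hz} Hz does not divide {refresh_hz} Hz evenly")
    return frames


for freq in STIMULI_HZ:
    print(f"{freq} Hz -> {frames_per_cycle(REFRESH_HZ, freq)} frames per cycle")
```

A frequency that does not divide the refresh rate (for example 11 Hz on a 60 Hz display) raises an error in this sketch, since it cannot be rendered with a fixed number of frames per cycle.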
The dry-EEG headset provides 19 channels and A2, reference and ground as shown in Figure 1 (highlighted in blue). The 20-channel (Cognionics Inc.) sensor montage has been coregistered with the MNI Colin27 brain (Montreal Neurological Institute Colin 27 atlas). Average sensor locations were obtained by averaging 3-D digitized (ELPOS, Zebris Medical GmbH) electrode locations from ten individuals. Electrode labels are assigned based on the nearest neighbour mapping to the standard 10/5 montage. Nas, LPA, and RPA denote nasion and left/right preauricular fiducials.
During the experiments, we collect data over the parietal and occipital cortex (P7, P3, Pz, P4, P8, O1 and O2), the frontal centre (Fz) and the A2 reference at a 500 Hz sampling rate across four subjects. The data for subject one (S01) consists of 100 trials for each of the four SSVEP classes investigated. For the additional three subjects, we record only 20 trials per class. In each trial, the stimulus flickers on the LCD screen for three seconds. The data acquisition software used to monitor and record the signals provides real-time measurement of the impedances for the entire duration of the experiment, thus ensuring good quality EEG signals are recorded.
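The recording parameters above fix the size of the raw data. The following sketch works through that arithmetic; it assumes that all nine listed sensors (including Fz and A2) are stored for every trial and that the three additional subjects record 20 trials per class, and the array layout (trials, channels, samples) is an illustrative choice rather than the format used in the study.

```python
SAMPLING_HZ = 500      # sampling rate stated in the text
TRIAL_SECONDS = 3      # stimulus duration per trial
N_CLASSES = 4          # 10, 12, 15 and 30 Hz
CHANNELS = ["P7", "P3", "Pz", "P4", "P8", "O1", "O2", "Fz", "A2"]

# Trials recorded per class for each subject (S01 has the larger dataset).
TRIALS_PER_CLASS = {"S01": 100, "S02": 20, "S03": 20, "S04": 20}

samples_per_trial = SAMPLING_HZ * TRIAL_SECONDS

# Assumed array layout: (trials, channels, samples).
shapes = {
    subject: (n * N_CLASSES, len(CHANNELS), samples_per_trial)
    for subject, n in TRIALS_PER_CLASS.items()
}

for subject, shape in shapes.items():
    print(subject, shape)
```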
Nevertheless, the primary challenge associated with the classification of dry-EEG signals is the higher noise ratio as compared to the traditional wet-EEG system, owing to the relatively higher impedance values. This noise can be seen in Figure 2 which shows the seven distinct dry-EEG data channels across the four SSVEP frequencies we are investigating.
III-B Convolutional Neural Network Model Design
Signal processing is one of the primary components in the field of BCI, acting as the translation from the raw EEG cortical signals to a specific desired decision or application. Traditionally, this requires manual pre-processing and feature extraction stages to transform the data into a format suitable for down-stream prediction tasks. By contrast, in this work we explore the use of a deep convolutional neural network to perform this translation process in an end-to-end fashion (implemented using the PyTorch library, http://pytorch.org/). We explore whether a CNN can accurately classify SSVEP target class frequencies from raw dry-EEG data, without the manual pre-processing or feature extraction found in contemporary work. CNNs have demonstrated state-of-the-art results in many image processing tasks on two-dimensional image data. However, there is growing evidence that CNNs can also be used to process time-series data by passing a filter over the time dimension, often outperforming recurrent models designed specifically for such temporal data tasks. As EEG data is time-series data, we make use of a 1D CNN model to classify the dry-EEG data.
The structure of the CNN used in this work is displayed in Figure 3, in which our SSVEP Convolutional Unit (SCU) comprises a triplet of operations: a 1D convolutional layer, batch normalization and max pooling. These SCUs form the common computational building blocks of the CNN architectures used for dry-EEG signal decoding in this study. Our CNN architecture has a large initial filter to capture the frequencies we are interested in classifying in the dry-EEG data. We also make use of batch normalization to help counterbalance the noisy EEG data. Once the data has been transformed via the convolutional filters, the actual classification of the EEG signal is performed via a softmax function (highlighted in black in Figure 3) in the final layer. The softmax function takes as input the feature vector $\mathbf{x}$, generated by the CNN, and computes the conditional probability of producing the label $y$ as:
$$P(y \mid \mathbf{x}) = \frac{\exp(x_y)}{\sum_{y' \in \mathcal{Y}} \exp(x_{y'})}$$
where $x_y$ is the entry of $\mathbf{x}$ associated with label $y$ and $\mathcal{Y}$ is the set of all labels in the dataset.
The loss function the model minimises during training is the categorical cross-entropy (CCE), which measures the distance between the output distribution of the model, $P(y \mid \mathbf{x})$, and the true label distribution as:
$$\mathcal{L}_{CCE} = -\frac{1}{N} \sum_{i=1}^{N} \log P(y_i \mid \mathbf{x}_i)$$
where $(\mathbf{x}_i, y_i)$ is the $i$-th training sample with its true label and $N$ is the total number of training samples.
The model is trained using the ADAM gradient descent algorithm for 100 epochs with a mini-batch size of 32. We also utilise L2 weight decay to help prevent over-fitting by penalising the network for having large weights, meaning that the final objective our model optimises is:
$$\mathcal{L} = \mathcal{L}_{CCE} + \lambda \lVert W \rVert_2^2$$
where $W$ are the weights of the network and $\lambda$ is a user-controllable scaling parameter, whose value for this work is listed with the other hyperparameters in Section IV.
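To make the training objective concrete, the sketch below evaluates a softmax, the categorical cross-entropy and an L2 weight penalty on a tiny hand-made example in plain Python. The scores, labels and weights are invented for illustration, and the penalty is written as the scaling parameter times the sum of squared weights, which is an assumption about the exact form used; it is not the training code of the study.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cce_loss(batch_scores, labels):
    """Mean negative log-probability assigned to the true label."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


def l2_penalty(weights, scale):
    """Scaled sum of squared weights (assumed form of the weight decay term)."""
    return scale * sum(w * w for w in weights)


# Illustrative values only: two samples, four classes, three weights.
scores = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
labels = [0, 3]
weights = [0.5, -0.25, 1.0]
scale = 1e-3

objective = cce_loss(scores, labels) + l2_penalty(weights, scale)
print(f"objective = {objective:.4f}")
```

For the second sample all four scores are equal, so the softmax is uniform and its loss equals log(4), the value expected from a classifier operating at chance on four classes.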
To validate the effectiveness of our proposed approach, we compare it with traditional classifiers and other deep learning models. The traditional classifiers require pre-processing and feature extraction prior to the classification stage. As such, the raw signals are processed via the following steps: downsampling to 250 Hz, re-referencing to the frontal centre sensor signal (Fz), notch filtering at 50 Hz to remove line noise and bandpass filtering between 9 and 100 Hz. Pre-processing thus removes unwanted signals such as power-line noise and focuses on the signals within the desired range. These filtered signals are then utilised as the input for the feature extraction stage. Based on a recent comparative review, we select the Riemannian approach for feature extraction, which utilises covariance matrices and tangent space mapping to estimate a feature vector for each trial.
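The four pre-processing steps used for the baselines can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: the function name `preprocess`, the position of Fz in the channel array, the notch quality factor, the Butterworth filter order and the use of zero-phase filtering are all choices made here for the example.

```python
import numpy as np
from scipy import signal


def preprocess(trial, fs_in=500, fs_out=250, ref_index=7,
               notch_hz=50.0, band_hz=(9.0, 100.0)):
    """Pre-process one trial of shape (n_channels, n_samples).

    ref_index is the assumed row of the Fz channel in the array.
    """
    # 1) Downsample 500 Hz -> 250 Hz (polyphase resampling with anti-alias filter).
    x = signal.resample_poly(trial, up=1, down=fs_in // fs_out, axis=1)

    # 2) Re-reference every channel to Fz by subtracting the Fz signal.
    x = x - x[ref_index]

    # 3) Notch filter at 50 Hz to suppress power-line noise (Q=30 is an assumption).
    b, a = signal.iirnotch(w0=notch_hz, Q=30.0, fs=fs_out)
    x = signal.filtfilt(b, a, x, axis=1)

    # 4) Band-pass filter 9-100 Hz (4th-order Butterworth is an assumption).
    sos = signal.butter(4, band_hz, btype="bandpass", fs=fs_out, output="sos")
    return signal.sosfiltfilt(sos, x, axis=1)


# Usage on synthetic data: 9 channels, 3 s at 500 Hz.
rng = np.random.default_rng(0)
raw_trial = rng.standard_normal((9, 1500))
clean_trial = preprocess(raw_trial)
print(clean_trial.shape)
```

After re-referencing, the Fz row is identically zero and carries no further information, so in practice it could be dropped before feature extraction.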
Based on previously reported results, SVM is the best-performing traditional classifier for EEG data. We therefore use SVM, with both a Gaussian and a linear kernel, as one of our baseline classifiers. For further comparison, we also include Linear Discriminant Analysis (LDA) and Minimum Distance to Mean (MDM), both frequent choices for EEG analyses. To compare with other leading neural network approaches, we also include several Recurrent Neural Network (RNN) models, namely a vanilla RNN, Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), which have also been assessed for EEG classification in previous studies.
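The covariance and tangent-space features that feed the traditional baselines can be sketched in NumPy as follows. For $n$ channels, each trial yields a symmetric $n \times n$ covariance matrix, which is mapped to a vector of length $n(n+1)/2$. This sketch makes a simplifying assumption that should be read as such: the reference point is the arithmetic mean of the covariance matrices, whereas Riemannian pipelines typically use the Riemannian (geometric) mean. The function names are hypothetical.

```python
import numpy as np


def _apply_to_eigenvalues(mat, fn):
    """Apply fn to the eigenvalues of a symmetric matrix and rebuild it."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def tangent_space_features(trials):
    """Map trials (n_trials, n_channels, n_samples) to tangent-space vectors."""
    covs = np.stack([np.cov(t) for t in trials])    # spatial covariance per trial
    ref = covs.mean(axis=0)                         # reference point (simplified)
    ref_inv_sqrt = _apply_to_eigenvalues(ref, lambda v: 1.0 / np.sqrt(v))

    n = covs.shape[1]
    rows, cols = np.triu_indices(n)
    # Off-diagonal terms are weighted by sqrt(2) so the vector norm matches
    # the Frobenius norm of the symmetric matrix.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))

    features = []
    for cov in covs:
        whitened = ref_inv_sqrt @ cov @ ref_inv_sqrt
        log_map = _apply_to_eigenvalues(whitened, np.log)
        features.append(weights * log_map[rows, cols])
    return np.asarray(features)


# Usage on synthetic data: 12 trials, 9 channels, 750 samples.
rng = np.random.default_rng(0)
features = tangent_space_features(rng.standard_normal((12, 9, 750)))
print(features.shape)
```

The resulting feature matrix can then be passed to any vector-based classifier such as an SVM or LDA; MDM instead operates directly on the covariance matrices.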
IV Results and Discussion
In this section, we present a detailed experimental evaluation demonstrating the ability of our approach to accurately classify SSVEP signals in dry-EEG data. All the presented results are produced using 10-fold cross validation, with models which were initially optimised using hyperparameters chosen via a grid-search over a validation set. The key hyperparameters which are common across all the networks are L2 weight decay scaling 0.001, dropout level 0.5 and 0.001 for CNN and other deep approaches respectively. (Training times on an Nvidia GeForce GTX 1060 GPU were 600 minutes for the vanilla RNN, 524 minutes for the GRU, 619 minutes for the LSTM and 4 minutes for the CNN.) For the CNN, the hyperparameters utilised are: convolution kernel size 1x10, kernel stride 4, maxpool kernel size 2, and ReLU as the activation function. The dataset and experimental setup are detailed previously in Section III-A.
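A minimal PyTorch sketch of a network built from SCU blocks with the stated hyperparameters (kernel size 10, stride 4, max pooling of 2, ReLU, dropout 0.5, weight decay 0.001, mini-batch size 32) is given below. Several details are assumptions made for this illustration and are not taken from the study: the number of blocks and their channel widths, the position of ReLU inside the block, the placement of dropout, the absence of padding, the nine input channels and the class names `SCU` and `SSVEPNet`.

```python
import torch
import torch.nn as nn


def scu_out_len(n, kernel=10, stride=4, pool=2):
    """Sequence length after one unpadded Conv1d followed by MaxPool1d."""
    return ((n - kernel) // stride + 1) // pool


class SCU(nn.Module):
    """SSVEP Convolutional Unit: Conv1d -> BatchNorm1d -> ReLU -> MaxPool1d."""

    def __init__(self, in_ch, out_ch, kernel=10, stride=4, pool=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=kernel, stride=stride),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=pool),
        )

    def forward(self, x):
        return self.block(x)


class SSVEPNet(nn.Module):
    """Stack of SCU blocks followed by a linear classification layer."""

    def __init__(self, n_channels=9, n_samples=1500, n_classes=4,
                 widths=(16, 32), dropout=0.5):
        super().__init__()
        blocks, in_ch, length = [], n_channels, n_samples
        for out_ch in widths:              # widths are hypothetical values
            blocks.append(SCU(in_ch, out_ch))
            in_ch, length = out_ch, scu_out_len(length)
        self.features = nn.Sequential(*blocks)
        self.n_flat = in_ch * length
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(dropout), nn.Linear(self.n_flat, n_classes)
        )

    def forward(self, x):
        # Returns raw class scores; the softmax is applied inside the loss.
        return self.classifier(self.features(x))


# One illustrative training step on random data.
model = SSVEPNet()
criterion = nn.CrossEntropyLoss()          # log-softmax + negative log-likelihood
optimiser = torch.optim.Adam(model.parameters(), weight_decay=1e-3)

batch = torch.randn(32, 9, 1500)           # (mini-batch, channels, samples)
targets = torch.randint(0, 4, (32,))
loss = criterion(model(batch), targets)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```

Without padding, each block shortens the sequence by roughly a factor of eight, so a 1500-sample trial supports only a few stacked blocks in this sketch; deeper variants would need padding or smaller strides. Note also that the `weight_decay` argument of `torch.optim.Adam` adds the decay term to the gradient, which corresponds to an L2 penalty up to a constant factor.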
IV-A Single Subject Classification
The results for the classification of the data of a single subject (S01) are presented in Table I. The table compares the accuracy of our proposed CNN approach against all the baselines discussed in Section III-C. The results for the traditional approaches are presented with and without pre-processing. Without pre-processing, we perform only feature extraction before classification with the traditional baseline approaches. Overall, the results show that, even without any pre-processing of the data, our CNN approach demonstrates superior performance over the baselines. The confusion matrix obtained from the classification of S01 using the CNN is presented in Figure 3(a), which shows very strong accuracy across all classes.
Table I: Mean accuracy with standard deviation over 10-fold cross validation for subject S01.
IV-B Multiple Subject Classification
The second set of results, presented in Table II, demonstrates the classification performance across three subjects (S01, S02, S03), where a new classification model is trained for each subject. Due to the known impracticalities of collecting large amounts of data per subject, here we reduce the number of SSVEP presentation sessions (trials) per subject for each class to only 20. We also consider only the highest performing traditional and deep classification approaches from the previous result of Section IV-A (CNN and SVM; Table I). The results highlight that, even with a reduced quantity of data available, the CNN approach still significantly outperforms the SVM across all subjects. This result highlights the applicability of the proposed CNN approach for BCI applications, where data quantity is often relatively limited.
IV-C Classification Across Subjects
To assess the ability of a single CNN model to classify a dataset comprising data from all of the subjects S01, S02 and S03, we classify the signals from the three subjects together instead of training an individual model per subject. Training a single model on EEG data from multiple subjects is known to be challenging, potentially due to biological differences between subjects and the variability of the EEG recording process. However, the results presented in Table III show that the CNN is able to significantly outperform the SVM based approaches when performing classification across subjects. This can be further seen in Figures 3(b), 3(c) and 3(d), which show better performance across classes for the CNN.
IV-D Generalisation Capability to an Unseen Subject
A strongly desirable quality for any model performing the classification of EEG is unseen subject generalisation, whereby the model is able to correctly classify data from a subject whose data was absent from model training. To test this on our CNN model, we introduce the data of the unseen subject S04. We then attempt to classify these data using a model which was trained only on the data of the other three subjects, S01, S02 and S03. Using the same CNN architecture for this task as depicted in Figure 3, we achieve an accuracy of only 0.59 on S04 without any additional training. We also attempt to classify the new test subject using SVM; however, the SVM displays only random classification performance (0.25 accuracy, i.e. 1/4 for 4 classes).
To overcome this performance issue, we explore a deeper architectural variant, as deeper networks have been shown to learn more complex features, which may help to capture the correlation between subjects. Figure 5 illustrates the deeper architecture, in which we repeat our SCU blocks (each dashed pink box represents an SCU block) up to an empirically chosen maximum of five. This deeper architecture, introduced to classify subject S04 data, demonstrates a substantially better classification accuracy of 0.69, perhaps suggesting that a deeper model is required to perform the unseen subject generalisation task. The confusion matrix for this result is presented in Figure 3(e). This figure demonstrates that the CNN has varying performance across the different classes, with the 30 Hz signal being the best performing for this extended CNN model.
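The unseen-subject protocol described in this section amounts to a leave-one-subject-out split: all trials of the held-out subject are kept for testing and never appear in training. The sketch below illustrates the split in plain Python; the function name is hypothetical, trials are stood in for by placeholder tuples, and the count of 80 trials per subject (20 per class) is an illustrative assumption.

```python
def unseen_subject_split(trials_by_subject, held_out):
    """Split trials so that the held-out subject appears only in the test set."""
    train = [
        trial
        for subject, trials in trials_by_subject.items()
        if subject != held_out
        for trial in trials
    ]
    test = list(trials_by_subject[held_out])
    return train, test


# Placeholder trials: (subject, trial index) tuples instead of EEG arrays.
trials_by_subject = {
    subject: [(subject, i) for i in range(80)]
    for subject in ["S01", "S02", "S03", "S04"]
}

train_set, test_set = unseen_subject_split(trials_by_subject, held_out="S04")
chance_level = 1 / 4   # four equally likely classes

print(len(train_set), len(test_set), chance_level)
```

Against the chance level of 0.25, the reported accuracies of 0.59 and 0.69 on S04 are well above random performance, whereas the SVM result is not.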
In this paper we introduce deep convolutional neural network architectures, constructed around a common computational building block, for the classification of raw dry-EEG SSVEP data, the first such study to do so. We evaluate the performance of our model on SSVEP data recorded from four subjects using the noise-prone dry-EEG methodology. Compared with current state-of-the-art methods, our approach requires no pre-processing of the data, demonstrates higher overall classification accuracy across subjects and generalises significantly better to entirely unseen test subjects. These key results suggest that CNN based approaches should become the new benchmark method for SSVEP dry-EEG classification.
Future work would involve larger datasets to further study the classification and generalisation performance across subjects. The combination of the CNN and RNN models may also offer a way to increase overall performance.
The authors would like to thank the Ministry of Higher Education Malaysia and Technical University of Malaysia Malacca (UTeM) as the sponsors of the first author.
- [1] R. P. Rao, Brain-Computer Interfacing: an Introduction. New York, NY, USA: Cambridge University Press, 2013.
- [2] V. P. Oikonomou, G. Liaros, K. Georgiadis, E. Chatzilari, K. Adam, S. Nikolopoulos, and I. Kompatsiaris, “Comparative Evaluation of State-of-the-art Algorithms for SSVEP-based BCIs,” arXiv preprint arXiv:1602.00904, 2016.
- [3] J. Minguillon, M. A. Lopez-Gordo, and F. Pelayo, “Trends in EEG-BCI for Daily-life: Requirements for Artifact Removal,” Biomedical Signal Processing and Control, vol. 31, pp. 407–418, 2017.
- [4] M. A. Lopez-Gordo, D. Sanchez-Morillo, and F. Pelayo Valle, “Dry EEG electrodes,” Sensors, vol. 14, no. 7, pp. 12847–12870, 2014.
- [5] G. Edlinger and C. Guger, “Can Dry EEG Sensors Improve the Usability of SMR, P300 and SSVEP Based BCIs?” in Towards Practical Brain-Computer Interfaces. Springer, 2012, pp. 281–300.
- [6] T. R. Mullen, C. A. Kothe, Y. M. Chi, A. Ojeda, T. Kerth, S. Makeig, T.-P. Jung, and G. Cauwenberghs, “Real-time Neuroimaging and Cognitive Monitoring using Wearable Dry EEG,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 11, pp. 2553–2567, 2015.
- [7] Y.-P. Lin, Y. Wang, C.-S. Wei, and T.-P. Jung, “Assessing the Quality of Steady-State Visual-Evoked Potentials for Moving Humans using a Mobile Electroencephalogram Headset,” Frontiers in Human Neuroscience, vol. 8, no. March, pp. 1–10, 2014.
- [8] T. Mullen, C. Kothe, Y. M. Chi, A. Ojeda, T. Kerth, S. Makeig, G. Cauwenberghs, and T.-P. Jung, “Real-time Modeling and 3D Visualization of Source Dynamics and Connectivity using Wearable EEG,” in International Conference of Engineering in Medicine and Biology Society. IEEE, 2013, pp. 2184–2187.
- [9] G. Lisi, M. Hamaya, T. Noda, and J. Morimoto, “Dry-wireless EEG and Asynchronous Adaptive Feature Extraction Towards a Plug-and-play Co-adaptive Brain Robot Interface,” in IEEE International Conference on Robotics and Automation. IEEE, 2016, pp. 959–966.
- [10] D. E. Callan, G. Durantin, and C. Terzibas, “Classification of Single-trial Auditory Events using Dry-wireless EEG during Real and Motion Simulated Flight,” Frontiers in Systems Neuroscience, vol. 9, no. February, pp. 1–12, 2015.
- [11] A. M. Norcia, L. G. Appelbaum, J. M. Ales, B. R. Cottereau, and B. Rossion, “The Steady-state Visual Evoked Potential in Vision Research: A Review,” Journal of Vision, vol. 15, no. 6, pp. 4–4, 2015.
- [12] N. S. Kwak, K. R. Müller, and S. W. Lee, “A Convolutional Neural Network for Steady State Visual Evoked Potential Classification under Ambulatory Environment,” PLoS ONE, vol. 12, no. 2, pp. 1–20, 2017.
- [13] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
- [14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
- [15] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with Convolutional Neural Networks for EEG Decoding and Visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
- [16] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: A compact Convolutional Network for EEG-based Brain-Computer Interfaces,” arXiv preprint arXiv:1611.08024, 2016.
- [17] J. Thomas, T. Maszczyk, N. Sinha, T. Kluge, and J. Dauwels, “Deep learning-based classification for brain-computer interfaces,” in IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 2017, pp. 234–239.
- [18] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, Physiotoolkit, and Physionet,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
- [19] J. W. Peirce, “PsychoPy—psychophysics Software in Python,” Journal of Neuroscience Methods, vol. 162, no. 1-2, pp. 8–13, 2007.
- [20] G. Gargiulo, R. A. Calvo, P. Bifulco, M. Cesarelli, C. Jin, A. Mohamed, and A. van Schaik, “A New EEG Recording System for Passive Dry Electrodes,” Clinical Neurophysiology, vol. 121, no. 5, pp. 686–693, 2010.
- [21] S. Bai, J. Z. Kolter, and V. Koltun, “An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling,” arXiv preprint arXiv:1803.01271, 2018.
- [22] D. P. Kingma and J. Ba, “Adam: A method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.
- [23] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, “A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update,” Journal of Neural Engineering, 2018.
- [24] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Classification of Covariance Matrices using a Riemannian-based Kernel for BCI Applications,” Neurocomputing, vol. 112, pp. 172–178, 2013.
- [25] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A Critical Review of Recurrent Neural Networks for Sequence Learning,” arXiv preprint arXiv:1506.00019, 2015.
- [26] O. Dehzangi and M. Farooq, “Portable Brain-Computer Interface for the Intensive Care Unit Patient Communication Using Subject-Dependent SSVEP Identification,” BioMed Research International, vol. 2018, 2018.