Brain-computer interfaces (BCIs) have been studied to help motor-disabled patients recover or replace motor function, and even to let healthy users extend their motor capabilities by controlling external devices [8, 12, 2]. Among non-invasive BCI paradigms, electroencephalography (EEG) is commonly used because the signals can be collected without brain surgery and have high temporal resolution. EEG signals have been applied to various BCI paradigms such as event-related potential (ERP), movement-related cortical potential (MRCP), and motor imagery (MI). EEG-based BCI paradigms have been developed for interaction between users and external devices [5, 10, 20, 13]. Of these paradigms, MI-based BCI decodes the EEG signals recorded while the user imagines movements. While the user performs an MI task, event-related desynchronization/synchronization (ERD/ERS) patterns appear as spectral features over the supplementary motor area and pre-motor cortex.
Decoding user intention from EEG data is one of the most challenging issues in BCI, mainly because EEG signals are intricate, non-stationary, and of low signal quality. For MI, it is especially difficult to obtain high-quality data, since it is unknown what exactly the user imagined. Accordingly, recent MI-based BCI studies have sought to improve decoding accuracy with various feature extraction and classification methods based on advanced machine learning and deep learning. For example, the filter bank common spatial pattern (FBCSP) algorithm, combined with linear discriminant analysis (LDA) on spectral power modulations, has been widely adopted for MI classification [9, 18]. Inspired by FBCSP, deep and shallow convolutional neural networks (CNNs) were developed to find the causal contributions of features in different frequency bands. A compact CNN with depthwise convolution was trained to summarize individual feature maps over time to classify EEG data. These studies focused primarily on classifying a few non-intuitive tasks, such as those in the BCI Competition IV data (left hand, right hand, foot, and tongue). However, intuitive MI is a more practical BCI paradigm because it enables direct interaction between users and devices without artificial command matching. To the best of our knowledge, existing approaches have not yet achieved satisfactory classification performance on intuitive MI. Therefore, in this paper, we focus on classifying intuitive MI data that contain various types of single-arm movements.
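The FBCSP approach mentioned above repeats common spatial pattern (CSP) filtering in several frequency bands. A minimal NumPy sketch of two-class CSP with log-variance features follows; it assumes trials that are already band-pass filtered, omits both the filter bank and FBCSP's feature-selection step, and the function names are hypothetical rather than taken from any of the cited implementations.

```python
import numpy as np

def csp_filters(X1, X2, n_pairs=1):
    """Binary CSP spatial filters (hypothetical helper).

    X1, X2: arrays of shape (trials, channels, samples) for the two classes,
    assumed already band-pass filtered. Returns a (2 * n_pairs, channels) matrix:
    the first n_pairs rows maximise class-1 variance, the rest class-2 variance.
    """
    def mean_cov(X):
        # Trace-normalised spatial covariance, averaged over trials.
        return np.mean([x @ x.T / np.trace(x @ x.T) for x in X], axis=0)

    C1, C2 = mean_cov(X1), mean_cov(X2)
    d, U = np.linalg.eigh(C1 + C2)
    P = np.diag(d ** -0.5) @ U.T              # whitens the composite covariance
    lam, B = np.linalg.eigh(P @ C1 @ P.T)     # eigenvalues ascending, in [0, 1]
    W = B.T @ P                               # each row is one spatial filter
    n = len(lam)
    order = np.r_[np.arange(n - 1, n - 1 - n_pairs, -1), np.arange(n_pairs)]
    return W[order]

def log_var_features(W, X):
    """Normalised log-variance of spatially filtered trials (the usual CSP feature)."""
    Z = np.einsum("fc,tcs->tfs", W, X)        # (trials, filters, samples)
    var = Z.var(axis=-1)
    return np.log(var / var.sum(axis=1, keepdims=True))
```

In an FBCSP-style pipeline, these features would be computed per frequency band, concatenated, and passed to an LDA classifier; for a regularized variant, scikit-learn's `LinearDiscriminantAnalysis` accepts `solver="lsqr"` with `shrinkage="auto"`.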
Our main contributions are threefold: 1) We collected EEG data on single-arm movement imagery: arm reaching in 3D space, hand grasping, and wrist twisting. 2) We proposed an end-to-end role-assigned CNN (ERA-CNN) for classifying various MI tasks, which adopts the principle of a hierarchical CNN architecture to extract discriminative features from different regions of a single arm, such as the arm, hand, and wrist. 3) The proposed ERA-CNN achieved a substantial improvement in MI classification performance, showing that a hierarchical design is effective for classifying ambiguous multi-class MI data.
2.1 Data description
We collected an intuitive MI dataset from nine healthy subjects aged 22 to 30 (6 males and 3 females, all right-handed). EEG signals were recorded with an EEG amplifier (BrainAmp, Brain Products GmbH, Germany) at a sampling rate of 1000 Hz with a 60 Hz notch filter. Additionally, a 1-60 Hz band-pass filter was applied to all channels. BrainVision software was used for data recording with 64 Ag/AgCl electrodes placed according to the international 10-20 system. The FPz and FCz channels served as ground and reference, respectively. From these 64 channels, we selected the 24 channels over the motor cortex that are most relevant to the MI task (F3, F1, Fz, F2, F4, FC3, FC1, FC2, FC4, C3, C1, Cz, C2, C4, CP3, CP1, CPz, CP2, CP4, P3, P1, Pz, P2, and P4). Electrode-scalp impedances were kept below 15 kΩ. During the experiment, subjects were asked to imagine specific muscle movements following the paradigm in Fig. 1 and performed 50 trials per task. A total of nine single-arm classes were defined: arm reaching (left, right, forward, backward, upward, and downward), grasping, twisting, and the resting state. The six arm-reaching classes were further divided into horizontal reaching and vertical reaching. The data were resampled to 250 Hz before classification. The data were validated using FBCSP with regularized linear discriminant analysis (RLDA) for each class. The protocols and environments were reviewed and approved by the Institutional Review Board at Korea University [1040548-KU-IRB-17-172-A-2].
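The channel selection and resampling steps above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' pipeline: the helper names are hypothetical, the channel order of the recording is passed in because the 64-channel montage order is not given, and resampling is done by plain decimation (every fourth sample), which assumes the 1-60 Hz band-pass has already removed content near the new Nyquist frequency of 125 Hz.

```python
import numpy as np

# The 24 motor-related channels listed in the text.
MOTOR_CHANNELS = ["F3", "F1", "Fz", "F2", "F4",
                  "FC3", "FC1", "FC2", "FC4",
                  "C3", "C1", "Cz", "C2", "C4",
                  "CP3", "CP1", "CPz", "CP2", "CP4",
                  "P3", "P1", "Pz", "P2", "P4"]

def select_channels(eeg, ch_names, keep=MOTOR_CHANNELS):
    """Return rows of eeg (channels x samples) for the channels in `keep`, in that order."""
    index = {name: i for i, name in enumerate(ch_names)}
    missing = [c for c in keep if c not in index]
    if missing:
        raise ValueError(f"channels not found in recording: {missing}")
    return eeg[[index[c] for c in keep], :]

def downsample(eeg, fs_in=1000, fs_out=250):
    """Keep every (fs_in // fs_out)-th sample along the time axis.

    Assumes the signal is already band-limited well below fs_out / 2
    (here by the 1-60 Hz band-pass), so plain decimation does not alias.
    """
    if fs_in % fs_out:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    return eeg[:, :: fs_in // fs_out]
```

A library resampler with its own anti-aliasing filter would be the safer choice when the input is not already band-limited.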
ERA-CNN is an end-to-end convolutional neural network designed to extract frequency features through hierarchical convolution layers. A hierarchical CNN generally consists of a shared layer and several sub-networks that separately obtain higher-level features. The following sections describe the design choices and training strategy of ERA-CNN; an overview of the architecture is shown in Fig. 2.
2.2.1 Shared layer for raw EEG signals
The shared layer consists of two convolution blocks that assign each trial to a category. The first convolution block is composed of a temporal convolution layer and a spatial filter layer that reduces the channel dimension to a single channel. The temporal kernel size is set to a quarter of the input's sampling rate, i.e., a 0.25 s window that spans one full cycle of a 4 Hz oscillation, so the filters mainly capture activity above about 4 Hz; this is intended to remove ocular artifacts. In the second block, a convolution layer followed by a softmax function categorizes the trial as arm-reaching MI, hand-related MI (grasping and twisting), or the resting state. If the shared layer does not predict the resting state, the sub-networks take the features of the second convolution layer of the shared layer (shared features) as input to perform the detailed classification within each category.
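The coarse-to-fine routing described above can be summarized in a few lines. The sketch below is an assumption-laden illustration: the function name is hypothetical, the sub-networks are stand-in callables, and the order of the shared-layer outputs (arm, hand, rest) is assumed rather than stated in the text.

```python
SHARED_CLASSES = ("arm", "hand", "rest")  # assumed order of the shared-layer softmax

def hierarchical_predict(shared_probs, arm_subnet, hand_subnet, shared_features):
    """Coarse-to-fine prediction: shared layer first, then the matching sub-network.

    shared_probs: three probabilities in SHARED_CLASSES order.
    arm_subnet / hand_subnet: callables mapping shared features to a fine label.
    Returns (category, fine_label); fine_label is None for the resting state.
    """
    best = max(range(len(SHARED_CLASSES)), key=lambda i: shared_probs[i])
    category = SHARED_CLASSES[best]
    if category == "rest":
        return category, None
    subnet = arm_subnet if category == "arm" else hand_subnet
    return category, subnet(shared_features)
```

For example, `hierarchical_predict([0.7, 0.2, 0.1], arm, hand, feats)` would return `("arm", arm(feats))`, while a trial whose highest probability is on the resting state never reaches a sub-network.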
Two sub-networks, each specializing in a different type of MI task, are used to improve classification accuracy. The sub-network for hand-related MI classification is composed of three convolution-pooling blocks, whereas the sub-network for arm-reaching MI classification consists of four convolution-pooling blocks with a smaller kernel size. The extra block was added because the arm-reaching sub-network has to distinguish more classes from the same amount of shared features. A softmax function produces the final classification of each sub-network.
Unlike other hierarchical CNNs, both sub-networks receive the shared features during training regardless of the shared layer's output. In this way, one sub-network learns the correct classification while the other sub-network simultaneously learns the incorrect cases. Training on both kinds of cases lets each sub-network specialize in its classification role. Every convolution block of ERA-CNN applies average pooling to reduce dimensionality and smooth the EEG features. The exponential linear unit (ELU) is used as the activation function, which can help avoid severe distortion of the EEG data. The detailed design choices and filter sizes are listed in Table 1.
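The two element-wise operations named above, ELU and average pooling with window (1, 3) and stride (1, 3), can be written in NumPy as a small reference sketch. The function names are hypothetical; a deep learning framework would normally supply both as built-in layers.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() from overflowing on the (unused) positive branch.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def avg_pool_time(x, size=3, stride=3):
    """Average pooling along the last (time) axis, like AvgPool (1, 3), stride (1, 3).

    No padding: trailing samples that do not fill a whole window are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_out = (x.shape[-1] - size) // stride + 1
    if n_out < 1:
        raise ValueError("input shorter than one pooling window")
    windows = [x[..., i * stride : i * stride + size].mean(axis=-1) for i in range(n_out)]
    return np.stack(windows, axis=-1)
```

Because the stride equals the window size, each output sample averages three consecutive time points, which both shortens the sequence by a factor of three and acts as a simple low-pass smoother.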
|Parameter|Shared layer|Sub-network (Arm)|Sub-network (Hand)|
|---|---|---|---|
|Input|Raw EEG, (1, 1, 24, 751)|Shared features, (1, 36, 1, 216)|Shared features, (1, 36, 1, 216)|
|Hidden layers|Conv2D: 36|Conv2D: 36, 72, 144, 288|Conv2D: 36, 72, 144, 288|
|Average pooling|(1, 3), stride (1, 3)|(1, 3), stride (1, 3)|(1, 3), stride (1, 3)|
|Last layer|Softmax|Softmax|Softmax|
|Loss|Cross entropy|Cross entropy|Cross entropy|
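The time dimensions in Table 1 follow from simple length arithmetic for unpadded ("valid") convolutions and non-overlapping pooling. The sketch below assumes a temporal kernel of 62 samples (a quarter of 250 Hz, rounded down) and traces only the first block of the shared layer; the spatial convolution collapses the 24 channels without changing the time length, and how the second block reaches the 216 time points of the shared features depends on kernel sizes that the text does not state.

```python
def conv_len(n, kernel, stride=1):
    """Output length of an unpadded ('valid') 1-D convolution."""
    return (n - kernel) // stride + 1

def pool_len(n, size=3, stride=3):
    """Output length of pooling without padding; incomplete windows are dropped."""
    return (n - size) // stride + 1

FS = 250           # sampling rate after resampling (Hz)
N_SAMPLES = 751    # time points per trial (Table 1 input)
KERNEL = FS // 4   # assumed temporal kernel: a quarter of the sampling rate -> 62

after_conv = conv_len(N_SAMPLES, KERNEL)   # 751 - 62 + 1 = 690
after_pool = pool_len(after_conv)          # (690 - 3) // 3 + 1 = 230
lowest_full_cycle_hz = FS / KERNEL         # about 4.03 Hz: one full cycle fits in the kernel
```

The last line restates the "above 4 Hz" rationale for the kernel size numerically: a 62-sample window at 250 Hz covers 0.248 s, just enough for one period of a roughly 4 Hz oscillation.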
2.2.3 Loss functions
The ERA-CNN loss function consists of three terms. The output of the shared layer is a probability for each category, and ERA-CNN selects a sub-network based on these prediction probabilities. To account for the uncertainty of this selection (i.e., the contribution of the shared layer to each sub-network), the loss function is defined as follows:

$$\mathcal{L} = \mathcal{L}_{S} + p_{A}\,\mathcal{L}_{A} + p_{H}\,\mathcal{L}_{H},$$

where $p_{A}$ is the probability of selecting the sub-network for arm-reaching MI classification and $p_{H}$ is the probability of selecting the sub-network for hand-related MI classification. $\mathcal{L}_{S}$, $\mathcal{L}_{A}$, and $\mathcal{L}_{H}$ are the losses of the shared layer, the sub-network for arm-reaching MI classification, and the sub-network for hand-related MI classification, respectively, so the total loss is a weighted sum of the three terms. Each term is a cross-entropy loss:

$$\mathcal{L}_{S} = -\sum_{i=1}^{3} y^{S}_{i}\log\hat{y}^{S}_{i},\qquad \mathcal{L}_{A} = -\sum_{i=1}^{M} y^{A}_{i}\log\hat{y}^{A}_{i},\qquad \mathcal{L}_{H} = -\sum_{i=1}^{2} y^{H}_{i}\log\hat{y}^{H}_{i},$$

where $y^{S}$ is the label of the shared layer, and $y^{A}$ and $y^{H}$ are the labels of the arm-reaching and hand-related MI, respectively. $\hat{y}^{S}$ is the classification output of the shared layer, and $\hat{y}^{A}$ and $\hat{y}^{H}$ are the outputs of the sub-networks for arm-reaching and hand-related MI classification. $M$ is the number of arm-reaching classes.
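The weighted loss above can be checked numerically with a small pure-Python sketch. It assumes one-hot targets, assumes the shared-layer outputs are ordered (arm, hand, rest) so that the first two entries serve as $p_{A}$ and $p_{H}$, and uses hypothetical function names; whether gradients flow through $p_{A}$ and $p_{H}$ during training is not specified in the text.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """-sum_i y_i * log(p_i); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))

def era_cnn_loss(y_s, p_s, y_a, p_a, y_h, p_h):
    """Total loss L = L_S + p_A * L_A + p_H * L_H.

    p_s: shared-layer softmax in the assumed order (arm, hand, rest), so
    p_s[0] and p_s[1] act as the selection probabilities p_A and p_H.
    """
    l_s = cross_entropy(y_s, p_s)   # shared layer, 3 categories
    l_a = cross_entropy(y_a, p_a)   # arm-reaching sub-network, M classes
    l_h = cross_entropy(y_h, p_h)   # hand-related sub-network, 2 classes
    return l_s + p_s[0] * l_a + p_s[1] * l_h
```

With $\hat{y}^{S} = (0.5, 0.25, 0.25)$ and uniform two-class sub-network outputs, every term equals $\ln 2$, so the total is $(1 + 0.5 + 0.25)\ln 2 = 1.75\ln 2$; a sub-network that the shared layer considers unlikely contributes proportionally less.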
3 Results and Discussion
Fig. 3 shows the power spectral density (PSD) of the EEG data for each MI task. We analyzed the frequency characteristics of the EEG data using the PSD and found high power in the mu band (8-12 Hz), which suggests that extracting frequency or band-power features from the EEG data is advantageous. For evaluation, the data were organized into 3-class, 5-class, and 7-class datasets. The 3-class dataset contains the categorized arm-reaching MI, hand-related MI, and resting-state classes. The 5-class dataset comprises arm reaching (left and right), grasping, twisting, and the resting state. The 7-class datasets are split into vertical reaching (VER) and horizontal reaching (HOR) classification: upward and downward (VER) or forward and backward (HOR) are added to the arm-reaching classes of the 5-class dataset. We trained with a mini-batch size of 32 for 200 epochs. For comparison with existing MI classification approaches, all experiments were conducted in the same test environment.
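The dataset compositions above, and the value of $M$ used in the loss, can be written down explicitly. The label strings and helper names below are hypothetical, and the assignment of upward/downward to VER and forward/backward to HOR is a reading of the text (each 7-class set adds two directions to the five classes).

```python
ARM_TASKS = ("left", "right", "forward", "backward", "upward", "downward")
HAND_TASKS = ("grasping", "twisting")

DATASETS = {
    "3-class": ("arm", "hand", "rest"),
    "5-class": ("left", "right", "grasping", "twisting", "rest"),
    "7-class (HOR)": ("left", "right", "forward", "backward", "grasping", "twisting", "rest"),
    "7-class (VER)": ("left", "right", "upward", "downward", "grasping", "twisting", "rest"),
}

def category_of(task):
    """Map a fine-grained MI task to its shared-layer category."""
    if task in ARM_TASKS:
        return "arm"
    if task in HAND_TASKS:
        return "hand"
    if task == "rest":
        return "rest"
    raise ValueError(f"unknown task: {task}")

def num_arm_classes(dataset):
    """M in the loss: number of arm-reaching classes in a fine-grained dataset."""
    return sum(category_of(t) == "arm" for t in DATASETS[dataset])
```

Under this mapping, the arm-reaching sub-network separates M = 2 classes in the 5-class setting and M = 4 classes in either 7-class setting, while the hand-related sub-network always separates two.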
Table 2 shows the classification accuracies of ERA-CNN for each dataset. The data were split into training and test sets for evaluation. For the 3-class classification, only the shared layer was used, since the dataset consists of the three categorized classes. As shown in Table 2, the accuracies on the two 7-class datasets differ only slightly (0.63 for HOR and 0.66 for VER). The highest VER accuracy is 0.80 (sub8), and the highest HOR accuracy is 0.68 (sub1 and sub8). The chance level of the 7-class classification is around 0.14. Most subjects who performed well in the 3-class classification also tended to perform better in the other classifications.
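The chance level quoted above follows from assuming balanced classes (50 trials per task, as described in Section 2.1), so a classifier that guesses uniformly at random is correct with probability $1/K$ for $K$ classes:

```latex
\[
p_{\text{chance}} = \frac{1}{K}, \qquad
\frac{1}{5} = 0.20 \;(\text{5-class}), \qquad
\frac{1}{7} \approx 0.143 \;(\text{7-class})
\]
```

On a finite test set the accuracy a random classifier can reach by luck is somewhat higher than $1/K$, so this value is a lower reference point rather than a significance threshold.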
Table 3 compares the classification accuracies and standard deviations of ERA-CNN with those of existing methods. ERA-CNN outperformed all comparison methods in classification accuracy. Unlike the other methods, which use a single structure, ERA-CNN divides the classes according to their roles and therefore solves a series of classification problems with fewer classes each, which can explain its advantage over single models that classify all classes at once. ShallowConvNet, whose architecture is similar to the shared layer, achieves the second-highest accuracy (0.78) in the 3-class classification. In the 7-class classifications, ShallowConvNet also performs better (0.42 and 0.48) than the other baselines because, like ERA-CNN, it extracts frequency band power features. In the 5-class classification, however, the three CNN baselines (DeepConvNet, ShallowConvNet, and EEGNet) achieve similar accuracies (0.54-0.56), while FBCSP with RLDA is lower (0.44). The overall improvement of ERA-CNN over the existing methods was significant according to a paired t-test (p < 0.05). Because the dataset is relatively small compared with those in other domains, performance could be further improved using data augmentation or sliding-window methods.
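A paired t-test compares two methods on the same subjects by testing whether the mean per-subject difference is zero. The standard-library sketch below computes the t statistic and compares it with the two-sided critical value for nine subjects (df = 8, α = 0.05); the function name is hypothetical, the per-subject accuracies of Table 2 are not reproduced in this section, and `scipy.stats.ttest_rel` would be the usual route when an exact p-value is needed.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic for per-subject scores a[i] vs b[i] (df = n - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    # Mean difference divided by its standard error.
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Two-sided critical value of Student's t for alpha = 0.05 and df = 8,
# i.e. nine subjects as in this study.
T_CRIT_DF8 = 2.306
```

A method would be judged significantly better at the 5% level when `paired_t(method_scores, baseline_scores) > T_CRIT_DF8` for nine subjects; the test uses only the per-subject differences, so between-subject variability in overall accuracy does not inflate the error term.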
|Subjects|3-class|5-class|7-class (HOR)|7-class (VER)|
|---|---|---|---|---|

|Methods|3-class|5-class|7-class (HOR)|7-class (VER)|
|---|---|---|---|---|
|FBCSP+RLDA|0.44 (0.08)|0.44 (0.05)|0.27 (0.04)|0.26 (0.04)|
|DeepConvNet|0.70 (0.09)|0.56 (0.09)|0.37 (0.09)|0.39 (0.07)|
|ShallowConvNet|0.78 (0.10)|0.54 (0.11)|0.42 (0.08)|0.48 (0.08)|
|EEGNet|0.69 (0.13)|0.55 (0.08)|0.36 (0.06)|0.41 (0.08)|
|ERA-CNN|0.88 (0.05)|0.82 (0.03)|0.63 (0.04)|0.66 (0.07)|
4 Conclusion and Future Work
In this paper, we proposed the ERA-CNN architecture, which considers discriminative features for each upper-limb region in MI classification. We demonstrated that ERA-CNN achieved the highest classification accuracies (0.88, 0.82, 0.63, and 0.66 for the 3-class, 5-class, 7-class HOR, and 7-class VER tasks) compared with the best existing method for each task (0.78, 0.56, 0.42, and 0.48). This improvement in performance opens up the possibility of continuous decoding of various types of upper-limb movements. The proposed model could thus be applied to intuitively control external devices, such as a robotic arm, with high accuracy, which can ultimately help improve the autonomy of people with movement disabilities.
The authors thank B.-H. Kwon and J.-H. Cho for their help with the dataset construction and S. K. Prabhakar, P. Bertens, and J. Kalafatovich for their discussions of the data analysis.
- (2006) High Gamma Power is Phase-Locked to Theta Oscillations in Human Neocortex. Science 313 (5793), pp. 1626–1628.
- (2018) BMI Control of a Third Arm for Multitasking. Sci. Robot. 3 (20), pp. eaat1228.
- (2006) ERD/ERS Patterns Reflecting Sensorimotor Activation and Deactivation. Prog. Brain Res. 159, pp. 211–222.
- (2015) Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv preprint arXiv:1511.07289.
- (2015) Effect of Higher Frequency on the Classification of Steady-State Visual Evoked Potentials. J. Neural Eng. 13 (1), pp. 016014.
- (2001) Motor Imagery and Direct Brain-Computer Communication. Proc. IEEE 89 (7), pp. 1123–1134.
- (2011) Detection of Movement Intention from Single-Trial Movement-Related Cortical Potentials. J. Neural Eng. 8 (6), pp. 066009.
- (2016) Noninvasive Electroencephalogram Based Control of a Robotic Arm for Reach and Grasp Tasks. Sci. Rep. 6, pp. 38565.
- (2018) Classification of Hand Motions within EEG Signals for Non-Invasive BCI-Based Robot Hand Control. In Conf. Proc. IEEE Int. Conf. Syst. Man. Cybern., pp. 515–518.
- (2020) Decoding Movement-Related Cortical Potentials based on Subject-Dependent and Section-Wise Spectral Filtering. IEEE Trans. Neural Syst. Rehabil. Eng.
- (2008) Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface. In Proc. Int. Jt. Conf. Neural Netw., pp. 2390–2397.
- (2016) Commanding a Brain-Controlled Wheelchair Using Steady-State Somatosensory Evoked Potentials. IEEE Trans. Neural Syst. Rehabil. Eng. 26 (3), pp. 654–665.
- (2017) Enhancing Detection of SSVEPs for a High-Speed Brain Speller Using Task-Related Component Analysis. IEEE Trans. Biomed. Eng. 65 (1), pp. 104–112.
- (2019) EEG-Based Brain-Computer Interfaces Using Motor-Imagery: Techniques and Challenges. Sensors 19 (6), pp. 1423.
- (2018) A Comprehensive Review of EEG-Based Brain-Computer Interface Paradigms. J. Neural Eng. 16 (1), pp. 011001.
- (2017) Deep Learning with Convolutional Neural Networks for EEG Decoding and Visualization. Hum. Brain Mapp. 38 (11), pp. 5391–5420.
- (2013) Dynamically Weighted Ensemble Classification for Non-Stationary EEG Processing. J. Neural Eng. 10 (3), pp. 036007.
- (2013) Non-Homogeneous Spatial Filter Optimization for ElectroEncephaloGram (EEG)-based Motor Imagery Classification. Neurocomputing 108, pp. 58–68.
- (2018) EEGNet: A Compact Convolutional Neural Network for EEG-Based Brain-Computer Interfaces. J. Neural Eng. 15 (5), pp. 056013.
- (2013) A Hybrid BCI System Combining P300 and SSVEP and Its Application to Wheelchair Control. IEEE Trans. Biomed. Eng. 60 (11), pp. 3156–3166.
- (2015) HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition. In Proc. IEEE Int. Conf. Comput. Vis., pp. 2740–2748.
- (2018) Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. In Adv. Neural Inf. Process. Syst. (NIPS), pp. 8778–8788.