Brain-computer interfaces (BCIs) allow direct communication between a human brain and an external device without the involvement of peripheral nerves or muscular movements [54, 1]. Owing to this characteristic, BCI systems have been widely used as assistive technologies or rehabilitation systems for patients with severe motor disabilities such as spinal cord injury, stroke, or amyotrophic lateral sclerosis. Recent BCI advances have also extended their purpose to healthy people, for example to augment one's physical capabilities or to provide neural entertainment [11, 26]. To this end, many researchers have adopted electroencephalogram (EEG) signals, which offer higher portability and safety than other recording methods [34, 14, 24, 50]. To control external devices, EEG-based BCI systems use various experimental paradigms such as event-related potential, movement imagination (MI) [24, 50, 31, 12], and steady-state visual evoked potential (SSVEP) [23, 28]. In particular, the MI-based BCI (MI-BCI) system has attracted considerable interest because it provides direct communication between a user and a device without any external stimulus. Owing to this advantage, MI-BCI systems have been employed to control neuroprostheses (e.g., robotic arms and exoskeletons) using intuitive commands without any artificial interaction [48, 46, 15].
In this study, we focus on maintaining sufficient decoding performance using only a small amount of EEG data. The MI-BCI system requires a considerable amount of time to record enough EEG data for robust classifier training. Because EEG characteristics differ across individuals, MI-BCI systems have had to model the brain dynamics of each user. In addition, long offline calibration sessions tend to leave subjects inattentive during the subsequent real-time experiments [17, 6, 49].
A variety of deep learning approaches [6, 25, 44, 52] that achieve high MI-BCI performance have drawn attention as a major advance; however, training these models still requires a large amount of EEG data, and progress in this area is hampered by the lack of large, uniform datasets. To solve this problem, we propose a gradual relation network (GRN) that requires only a small amount of EEG data. We adopted a few-shot learning approach, which has contributed significantly to the artificial intelligence (AI) field, particularly computer vision. Few-shot learning trains neural networks with only a few training samples while maintaining a sufficient level of performance [51, 47, 19].
To evaluate the proposed GRN, we acquired various upper-extremity MI data covering upper-arm, forearm, and hand motions from each subject. In addition, we applied the GRN to robotic arm control to test intuitive MI decoding with a small amount of EEG training data. Furthermore, by selecting one representative class for each body part, we allowed users to freely perform various MIs suited to diverse circumstances. We confirmed the feasibility of an online brain-controlled robotic arm drinking system that uses intuitive MI commands with a calibration time shortened by the GRN.
Hence, the novelty of this study can be summarized as follows: i) this is the first attempt to adopt a few-shot learning approach in a real-time MI-BCI system; ii) we designed a shared robotic arm control system that supports the user's various intuitive commands rather than prescribing specific MI commands; and iii) the proposed GRN model outperforms conventional methods regardless of the size of the training dataset.
II Related Works
A few recent BCI studies have used deep learning techniques to recognize user intentions from EEG signals. For example, Li et al. proposed a novel hybrid network based on a restricted Boltzmann machine (RBM) for event-related potential detection. Lawhern et al. introduced a compact convolutional neural network (CNN) model for BCIs (EEGNet) and applied it to three typical BCI paradigms: P300 potential, movement-related cortical potentials (MRCP), and error-related negativity. The decoding performance of EEGNet was comparable to those of conventional methods. Schirrmeister et al. demonstrated robust multi-class MI classification using a shallow and a deep model, ShallowConvNet and DeepConvNet, respectively. Other studies have presented robust EEG decoding using advanced deep learning architectures. Jiao et al. proposed a deep CNN for mental load classification, and Zhang et al. demonstrated cross-task mental workload assessment using a recurrent 3D-CNN. Lu et al. proposed a novel RBM scheme for MI classification, and Zhang et al. proposed a deep learning approach with data augmentation for MI classification with small amounts of EEG data.
Previous deep learning studies on BCIs have identified insufficient training data as one of the critical challenges [6, 20, 27]. Collecting enough EEG data from each user to train such models is practically impossible. Therefore, we focused on applying few-shot learning approaches to BCIs. We hypothesized that successfully adopting few-shot learning can shorten the calibration time for real-time scenarios and deliver robust decoding performance using only a few data samples. Furthermore, by using a metric-based few-shot learning approach, we expect the trained model to map various MIs to the correct sub-part of the upper extremity.
In other research fields, few-shot approaches have achieved successful classification. Sung et al. presented a general and flexible framework for few-shot learning, in which a classifier must generalize to new classes not seen during training, given only a small number of examples of each new class. Their framework learns an embedding and a deep non-linear distance metric for comparing query items with sample items. Snell et al. proposed prototypical networks for few-shot classification, which learn a metric space in which classification is performed by computing distances to a prototype representation of each class. In addition, Vinyals et al. designed matching networks (MN) for one-shot image classification and reported accuracies of 98.1% and 98.9%. Koch et al. proposed deep Siamese neural networks for one-shot image recognition; evaluated on the Omniglot dataset, their method achieved performance comparable to that of humans.
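To make the metric-based idea concrete, the following is a minimal sketch of nearest-prototype classification in the style of Snell et al.: each class prototype is the mean of its support embeddings, and a query is assigned to the class with the nearest prototype under squared Euclidean distance. The embeddings are hypothetical toy vectors rather than outputs of a trained encoder, and the function names are illustrative only.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def prototype(embeddings: Sequence[Vector]) -> Vector:
    """Element-wise mean of the support embeddings of one class."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Sequence[float], support: Dict[str, List[Vector]]) -> str:
    """Return the label whose prototype lies closest to the query embedding."""
    prototypes = {label: prototype(embs) for label, embs in support.items()}
    return min(prototypes, key=lambda label: squared_euclidean(query, prototypes[label]))


# Hypothetical 2-D embeddings, two support examples ("2-shot") per class.
support = {
    "upper-arm": [[1.0, 0.0], [0.8, 0.2]],
    "forearm": [[0.0, 1.0], [0.2, 0.9]],
    "hand": [[-1.0, -1.0], [-0.9, -1.1]],
}
print(classify([0.9, 0.1], support))  # upper-arm
```

Relation networks (Sung et al.) keep the same episode structure but replace the fixed distance with a learned comparison module, which is the role the relation module plays in the GRN.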
Building on these studies, we designed a GRN model for robust MI decoding using only a small amount of EEG data. Furthermore, we verified that our model can be used in real-time BCI scenarios by applying it to robotic arm control.
III Data Acquisition and Experimental Setup
Twenty-five healthy subjects (15 males and 10 females, aged 20-28 years, all right-handed) were recruited for the experiment. All participants were novices to BCI systems and were fully informed about the experimental protocols before the experiment began. Because an individual's brain signals can change with physiological and psychological state over time, all volunteers were asked to attend three experimental sessions at one- or two-week intervals. Each participant then provided written informed consent in accordance with the Declaration of Helsinki. The experimental protocols and environment were reviewed and approved by the Institutional Review Board at Korea University (KUIRB-2020-0013-01).
III-B Experimental Setup
Before the experiment began, the subject was seated comfortably on a chair and asked to position the seat 60 cm from the LCD monitor (refresh rate: 60 Hz; resolution: 1,920 × 1,080), as illustrated in Fig. 1(a). An EEG cap with 60 channels (ActiCap, Brain Products GmbH, Germany) was placed on the scalp to acquire brain signals. The 60 EEG channels followed the international 10-20 system and were located at Fp1-2, AF5-8, AFz, F1-8, Fz, FT7-8, FC1-6, T7-8, C1-6, Cz, TP7-8, CP1-6, CPz, P1-8, Pz, PO3-4, PO7-8, POz, O1-2, Oz, and Iz. The ground and reference electrodes were located at FPz and FCz, respectively. Electrode impedance was kept below 15 kΩ during the experiments.
III-C Experiment Protocols
The MI experiment protocols were designed to decode various user intentions related to three sub-parts of a single upper extremity: the upper arm, forearm, and hand. Accordingly, each data acquisition session was divided into three parts corresponding to these sub-parts. The order of the three sub-parts was rotated across sessions so that each sub-part was recorded under comparable user conditions, as follows:
III-C1 Session 1
upper-arm → forearm → hand
III-C2 Session 2
forearm → hand → upper-arm
III-C3 Session 3
hand → upper-arm → forearm
Each participant was given flexible breaks between sub-parts. We collected eleven MI classes during the experiments: six for the upper arm, two for the forearm, and three for the hand. To accommodate various intuitive MI commands, we chose one representative class for each sub-part to train the proposed GRN model. The untrained candidate classes and trials were used to verify whether the model could assign them to the correct sub-parts. The upper arm is typically used to select or move an object in three-dimensional space by reaching; therefore, six different arm-reaching MI classes in three-dimensional space were chosen as candidates, and forward reaching MI was selected as the representative upper-arm class among them. The forearm typically performs twisting motions; therefore, we collected left- and right-twist MI commands as candidates and used the left twist as the representative forearm class. The human hand is primarily used for grasping objects; therefore, three grasping styles (lateral, cylindrical, and spherical) were acquired, and the cylindrical grasp, used for grasping a cup, was chosen as the representative class.

The role of the human upper extremity cannot be reduced to a few specific commands. Therefore, to preserve the user's free will when controlling the robotic arm, we divided the upper extremity into three sub-parts and allowed the user to perform any intuitive command. By using a metric-based approach, the GRN could successfully detect the user's intention for each sub-part, allowing users to perform various MIs in the online session without any constraint on the class.
The experimental paradigm consisted of three seconds of instruction, four seconds of MI, and four seconds of rest with a fixation cross, as illustrated in Fig. 1(b). Each of the eleven classes contained fifty trials. Thus, 550 trials were acquired per subject in each session, and 1,650 trials per subject over the entire experiment.
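The arithmetic below illustrates why offline calibration is burdensome. It is a small sketch that reproduces the trial counts stated above and estimates pure recording time per session, assuming one trial lasts exactly 3 + 4 + 4 = 11 s and ignoring the flexible breaks. For comparison, it also estimates the reduced online calibration (10 trials for each of the 3 representative classes), assuming those trials use the same 11-s timing.

```python
# Constants taken from the protocol described in the text.
N_CLASSES = 6 + 2 + 3        # upper-arm + forearm + hand MI classes
TRIALS_PER_CLASS = 50
N_SESSIONS = 3
TRIAL_SECONDS = 3 + 4 + 4    # instruction + MI + rest (fixation cross)

trials_per_session = N_CLASSES * TRIALS_PER_CLASS
trials_per_subject = trials_per_session * N_SESSIONS
# Pure recording time of one offline session, ignoring the flexible breaks.
session_minutes = trials_per_session * TRIAL_SECONDS / 60

# Assumption: the shortened online calibration uses the same 11-s trial timing.
calibration_minutes = 3 * 10 * TRIAL_SECONDS / 60

print(trials_per_session, trials_per_subject, round(session_minutes, 1), calibration_minutes)
# 550 1650 100.8 5.5
```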
IV Gradual Relation Network
IV-A Data Description
EEG signals were acquired from the 60 electrodes at a sampling rate of 2,500 Hz, notch filtered at 60 Hz to remove power-line artifacts, and band-pass filtered between 0.5 and 40 Hz, a range that contains most MI-related rhythms. For each individual and session, this yielded an array of epochs × channels × time samples of size 550 × 60 × 7,500. All signals were down-sampled to 250 Hz and reduced to the following 25 channels located over the sensorimotor cortex: F1-4, FC1-4, C1-4, CP1-4, P1-4, Fz, FCz, Cz, CPz, and Pz [36, 34, 15]. To capture subtle spatial differences, we rearranged the one-dimensional channel list into a two-dimensional grid of channel locations [48, 24]. Therefore, during training, the input to our GRN was fixed at 5 × 5 × 750.
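The following sketch shows one way to obtain the 5 × 5 × 750 input described above. The exact grid arrangement is our assumption (rows F, FC, C, CP, P from front to back; columns 3, 1, z, 2, 4 from left to right), and `to_grid` is a hypothetical helper working on plain Python lists. Down-sampling is done by keeping every 10th sample, which is only reasonable because the data were already low-pass filtered at 40 Hz, far below the new 125 Hz Nyquist frequency.

```python
from typing import Dict, List, Sequence

# Assumed 5 x 5 scalp layout of the 25 sensorimotor channels:
# rows run front to back, columns run left to right.
GRID: List[List[str]] = [
    [f"{row}3", f"{row}1", f"{row}z", f"{row}2", f"{row}4"]
    for row in ("F", "FC", "C", "CP", "P")
]
DECIMATION = 2500 // 250  # 2,500 Hz -> 250 Hz, i.e. keep every 10th sample


def to_grid(trial: Dict[str, Sequence[float]], step: int = DECIMATION) -> List[List[List[float]]]:
    """Arrange one trial as a 5 x 5 x T nested list, keeping every `step`-th sample.

    Plain sample dropping is acceptable here only because the signals were
    already band-pass filtered below 40 Hz, well under the new 125 Hz Nyquist limit.
    """
    return [[list(trial[name][::step]) for name in row] for row in GRID]


# A dummy 3-s epoch (7,500 samples at 2,500 Hz) for every grid channel.
trial = {name: [0.0] * 7500 for row in GRID for name in row}
x = to_grid(trial)
print(len(x), len(x[0]), len(x[0][0]))  # 5 5 750
```

A production pipeline would more typically use a dedicated resampling routine with its own anti-aliasing filter instead of plain slicing.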
Acquiring MI data takes a relatively long time to record enough data for robust classifier training [4, 30, 24]. This time-consuming offline acquisition process considerably exhausts the user, which can deteriorate online performance. To avoid this problem, we adopted few-shot learning and considered the 1- and 5-shot settings used in conventional few-shot approaches [51, 53, 19, 47]. We also considered a 25-shot setting (50% of the trials) to evaluate performance under a conventional train/test split.
In few-shot settings, $K$-shot denotes $K$ labeled examples for each class $c$. The training dataset labeled with class $c$ therefore consists of $K$ examples, $\mathcal{S}_c = \{(x_i, c)\}_{i=1}^{K}$. The training strategy of the proposed GRN follows Algorithm 1. The mean squared error (MSE) loss was used as the loss function, and all experiments used Adam with an initial learning rate . In addition, all models were trained end-to-end from scratch without any additional dataset.
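The text specifies only that an MSE loss is used. The sketch below assumes, following the relation network of Sung et al., that the target relation score is 1 for the query's true class and 0 for every other class; `relation_mse` is a hypothetical helper that operates on plain lists rather than framework tensors.

```python
from typing import Sequence


def relation_mse(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """MSE between relation scores and one-hot targets.

    scores[i][c]: relation score (sigmoid output in [0, 1]) between query i and
    the prototype of class c; labels[i]: index of the true class of query i.
    Assumed target: 1.0 for the matching class, 0.0 for every other class.
    """
    total, count = 0.0, 0
    for row, label in zip(scores, labels):
        for c, score in enumerate(row):
            target = 1.0 if c == label else 0.0
            total += (score - target) ** 2
            count += 1
    return total / count


# Two queries against three class prototypes (upper-arm, forearm, hand).
loss = relation_mse([[0.9, 0.2, 0.1], [0.3, 0.6, 0.1]], labels=[0, 1])
print(round(loss, 4))  # 0.0533
```

Treating the relation score as a regression target, rather than using cross-entropy, matches the sigmoid output of the relation module, which produces an independent similarity score per class.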
IV-B Overall Framework of GRN
The GRN consisted of an encoder and a relation module that retrieved the relation score between an encoded feature and the prototypical feature of each class, as in the test phase of Algorithm 1. The encoder was trained on the training dataset, and its primary purposes were to preserve frequency information and to reduce dimensionality. The relation module of the GRN gathered the correlated spectral and temporal information and compared it with the prototypical feature. By comparing the encoded features gradually, group by group, the model could accumulate correlated frequency and temporal information more precisely. The overall framework of the GRN, which retrieves the relation score between the input EEG signals and the prototypical feature, is depicted in the overall framework figure.
The GRN encoder contained three convolutional blocks with two temporal filters and two spatial filters, as illustrated in Fig. 2. A channel-wise CNN was used to extract temporal and spectral information, as reported previously. The receptive field of the channel-wise CNN, a kernel of size (1, 1, 65), was chosen to extract frequency information at 4 Hz and above. The embedding module of the GRN aims to preserve spectral and temporal information so that it can be properly combined in the relation module. To this end, we constructed nine groups of four channels for the relation module; the number of groups is a hyper-parameter. Then, a depthwise CNN of size (5, 5, 1) extracted two frequency-specific spatial filters per channel, yielding 72 channels in the second and third layers. The spatial filtering thus returned nine groups of eight channels, containing two spatial filters for each grouped frequency or temporal component. Finally, the encoder contained an additional convolutional layer of size (1, 1, 65) with a (1, 1, 10) stride to reduce the size of the embedded feature. Each convolutional block was followed by batch normalization and an exponential linear unit (ELU) non-linearity. The encoder output, 72 channels with 63 feature maps each, was reshaped into nine groups of eight channels.
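The shape arithmetic of the encoder can be checked with a few lines of code. This sketch assumes "valid" convolutions (no padding), a 750-sample time axis, 36 channels after the first block (nine groups of four), and a depth multiplier of 2 in the depthwise spatial block; under these assumptions it reproduces the 72 channels and 63 feature maps stated above. It only computes sizes and does not implement the network.

```python
def conv_out(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of a convolution or pooling along one axis with no padding."""
    return (length - kernel) // stride + 1


FS, T, KERNEL = 250, 750, 65   # Hz, samples per trial, temporal kernel length

# Lowest frequency whose full period fits into the 65-sample kernel.
lowest_hz = FS / KERNEL                    # about 3.85 Hz, i.e. roughly 4 Hz

# Block 1: channel-wise temporal conv (1, 1, 65) -> 9 groups x 4 = 36 channels.
t1, ch1 = conv_out(T, KERNEL), 9 * 4       # 686 samples
# Block 2: depthwise spatial conv (5, 5, 1), two filters per channel.
side, ch2 = conv_out(5, 5), ch1 * 2        # 5 x 5 grid -> 1 x 1, 72 channels
# Block 3: temporal conv (1, 1, 65) with stride 10 for dimension reduction.
t3 = conv_out(t1, KERNEL, stride=10)       # 63 feature maps

groups, ch_per_group = 9, ch2 // 9         # reshaped into 9 groups of 8 channels
print(round(lowest_hz, 2), t1, side, ch2, t3, groups, ch_per_group)
# 3.85 686 1 72 63 9 8
```

With "same" padding the final length would instead be 69, so the reported 63 feature maps suggest unpadded convolutions.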
While training the embedding module of the GRN, the prototypical feature of each class was calculated from the labeled examples as the average encoded feature of that class, as detailed in Algorithm 1.
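A minimal sketch of the prototype computation, assuming each encoded feature is a nested list (for the GRN, 9 groups × 8 channels × 63 feature maps) and that the prototype is the element-wise mean of the encoded features of the K labeled examples of each class. The helper names are hypothetical and the toy features below are 1 × 2 for readability.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Union

Nested = Union[float, List["Nested"]]


def elementwise_mean(items: Sequence[Nested]) -> Nested:
    """Element-wise mean of equally shaped nested lists (e.g. 9 x 8 x 63 features)."""
    if isinstance(items[0], (list, tuple)):
        return [elementwise_mean(parts) for parts in zip(*items)]
    return sum(items) / len(items)


def class_prototypes(features: Sequence[Nested], labels: Sequence[str]) -> Dict[str, Nested]:
    """Average the encoded features of the labeled examples of each class."""
    by_class: Dict[str, List[Nested]] = defaultdict(list)
    for feature, label in zip(features, labels):
        by_class[label].append(feature)
    return {label: elementwise_mean(group) for label, group in by_class.items()}


features = [[[1.0, 2.0]], [[3.0, 4.0]], [[10.0, 10.0]]]   # toy 1 x 2 "feature maps"
labels = ["hand", "hand", "forearm"]
print(class_prototypes(features, labels))
# {'hand': [[2.0, 3.0]], 'forearm': [[10.0, 10.0]]}
```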
IV-D Relation Module
Each encoder output consisted of nine groups, each containing eight channels with 63 feature maps. The relation module focused on grouping the encoded features and retrieving the relation scores between them and the prototypical feature of each class. The encoded and prototypical features were combined group by group and compared gradually in the first layer of the relation module. Through this gradual comparison, the relation module could compare various spectral and temporal feature groups (depicted in Fig. 7).
Specifically, one encoded feature from the unknown EEG signals and the prototypical feature of each class were concatenated group by group, alternately, as illustrated in Fig. 3. The corresponding groups of the two encoded features were combined by a convolutional layer: the first convolutional layer of the relation module processed each concatenated pair of groups with a (1, 10) receptive field and returned 32 channels per pair to extract more information between the two groups. After this group-wise combination, average pooling was performed over a (1, 2) window with stride 2. The second layer of the relation module combined all nine compared groups using a convolutional layer of size (1, 10) and returned 288 filters with 18 feature maps. Global average pooling (GAP) was used instead of flattening into a 5,184-dimensional feature vector (288 × 18) to prevent overfitting of the fully connected (FC) layers. The resulting 288-dimensional feature vector was passed through two FC layers and a sigmoid function to obtain the relation score between the two encoded features.
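The sketch below illustrates the alternating group-wise concatenation and checks the relation-module sizes quoted above. It assumes unpadded convolutions, under which the reported 18 feature maps and the 5,184-dimensional flattened size follow. `interleave_groups` is a hypothetical helper that only shows the ordering of groups, not the tensor operation.

```python
from typing import List, Sequence, TypeVar

G = TypeVar("G")


def interleave_groups(query_groups: Sequence[G], proto_groups: Sequence[G]) -> List[G]:
    """Alternate the groups of the two features: q1, p1, q2, p2, ..., q9, p9."""
    out: List[G] = []
    for q, p in zip(query_groups, proto_groups):
        out.extend([q, p])
    return out


def conv_out(length: int, kernel: int, stride: int = 1) -> int:
    """Output length along one axis with no padding."""
    return (length - kernel) // stride + 1


GROUPS, CH_PER_GROUP, LENGTH = 9, 8, 63      # encoder output: 9 groups x 8 ch x 63

pair_channels = 2 * CH_PER_GROUP             # query group + prototype group = 16
l1 = conv_out(LENGTH, 10)                    # grouped conv (1, 10): 54
ch1 = GROUPS * 32                            # 32 channels per pair: 288
pooled = conv_out(l1, 2, stride=2)           # average pooling (1, 2), stride 2: 27
l2 = conv_out(pooled, 10)                    # conv (1, 10) over all groups: 18
flattened = ch1 * l2                         # 5,184 if flattened; GAP keeps 288

order = interleave_groups([f"q{g}" for g in range(1, 10)], [f"p{g}" for g in range(1, 10)])
print(order[:4], pair_channels, l1, pooled, l2, flattened)
# ['q1', 'p1', 'q2', 'p2'] 16 54 27 18 5184
```

GAP reduces the 288 × 18 map to one value per channel, so the FC layers see 288 inputs instead of 5,184, which is the overfitting argument made in the text.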
To assign unknown EEG signals to a sub-part of the upper extremity, the relation scores $r_c$ of all sub-parts were passed through a softmax, $P(c \mid x) = \exp(r_c) / \sum_{j} \exp(r_j)$, to obtain a probability distribution, and the sub-part with the maximum probability was taken as the prediction for the unknown EEG signals, as in the test phase of Algorithm 1.
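A minimal sketch of this prediction step, assuming the relation scores of the three sub-parts are available as a dictionary (the numbers below are made up). Subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math
from typing import Dict, Tuple


def predict_subpart(relation_scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Softmax over per-sub-part relation scores; return the argmax and the distribution."""
    peak = max(relation_scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in relation_scores.items()}
    total = sum(exps.values())
    probs = {k: v / total for k, v in exps.items()}
    return max(probs, key=probs.get), probs


label, probs = predict_subpart({"upper-arm": 0.31, "forearm": 0.12, "hand": 0.78})
print(label)  # hand
```

Because the softmax is monotonic, the predicted sub-part is the same as the argmax of the raw relation scores; the normalized distribution is mainly useful when probabilities must be compared, for example across sliding windows.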
IV-E Performance Evaluation
IV-E1 Offline Experiment
The performance of the GRN was evaluated on two datasets, the representative classes and the candidate classes, using the same model trained on the representative classes. Additionally, because offline data acquisition in a BCI system ultimately serves online robotic arm control, we also performed an online drinking evaluation with only five training trials per class. Accordingly, we performed the following three evaluations:
Performance evaluation on the representative classes with 1-shot, 5-shot, and 25-shot settings.
Performance evaluation on the intuitive candidate classes that may occur during the online experiment.
System performance of robotic arm manipulation to drink water during the online experiment.
To control the robotic arm more intuitively and verify the robustness of the model, the overall dataset contained three representative classes and eight candidate classes. The eight candidate classes consisted of MIs that the user might perform while controlling the robotic arm. Because performance can vary with the composition of the training set, we trained the model on ten different combinations of training data and report the average and standard deviation in Table I. The average performance over the ten datasets was compared with those of the conventional methods, as shown in the first row of Fig. 4.
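The repeated-split protocol can be sketched as follows. `evaluate` is a hypothetical placeholder for "train the GRN on the sampled K-shot split and test on the remaining trials"; here it is replaced by a dummy score that depends on which trials were drawn, so the printed numbers carry no meaning beyond showing the mechanics. The assumption that the remaining trials form the test set is ours.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Split = Dict[str, List[int]]


def sample_k_shot(trial_ids: Dict[str, Sequence[int]], k: int, rng: random.Random) -> Split:
    """Draw k distinct training trials per class; the rest would form the test set."""
    return {label: rng.sample(list(ids), k) for label, ids in trial_ids.items()}


def repeated_evaluation(
    trial_ids: Dict[str, Sequence[int]],
    k: int,
    evaluate: Callable[[Split], float],
    n_repeats: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy over n_repeats random K-shot splits."""
    rng = random.Random(seed)
    accuracies = [evaluate(sample_k_shot(trial_ids, k, rng)) for _ in range(n_repeats)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# 50 trials for each of the three representative classes.
trials = {label: range(50) for label in ("upper-arm", "forearm", "hand")}


def dummy_evaluate(split: Split) -> float:
    """Stand-in score in [0, 1]; a real run would train and test the model here."""
    return sum(sum(ids) for ids in split.values()) / (3 * 5 * 49)


mean_acc, std_acc = repeated_evaluation(trials, k=5, evaluate=dummy_evaluate)
print(round(mean_acc, 3), round(std_acc, 3))
```

Fixing the seed makes the ten splits reproducible, so different methods can be compared on identical training subsets.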
IV-E2 Online Experiment
The purpose of the time-consuming data acquisition process in a BCI system is to control or communicate with external devices. To avoid exhausting the user during offline data acquisition, we propose a robotic arm control system with a GRN trained on a small amount of EEG data. We therefore performed an online drinking task to demonstrate the feasibility of such a system with a small amount of training data.
Five participants (Sub 2, Sub 9, Sub 13, Sub 22, and Sub 24) whose accuracy exceeded 60% were selected for the online robotic arm control. Sub 12 did not attend the online session for personal reasons. After 10 offline trials per class were acquired, the five participants performed the online drinking task in the experimental environment illustrated in Fig. 5.
The paradigm of the online robotic arm control was similar to that of the offline data acquisition, but with a 5-s MI signal acquisition stage; the instruction and rest stages were replaced by a robotic arm manipulation stage. The 5-s MI acquisition stage was divided into five sliding windows, each 3 s long. A detailed description of the drinking task and its results is given in the Experimental Results section.
V Experimental Results
V-A Performance Evaluation of the Representative Classes
Table I presents the decoding accuracies of the proposed method on the representative classes, averaged across all subjects. In session 1, the twenty-five subjects achieved 40.99%, 51.22%, and 75.55% in the 1-, 5-, and 25-shot settings, respectively; in session 2, 53.52%, 59.89%, and 84.24%; and in session 3, 41.49%, 55.69%, and 82.76%. The overall accuracies in the 1-, 5-, and 25-shot settings were 42.57%, 55.60%, and 80.85%, respectively (the corresponding standard deviations are listed in Table I). The standard deviation in Table I represents the variation in performance across different training sets of an individual. Because EEG signals are non-stationary, the performance of models trained in the 1- and 5-shot settings was strongly affected by the composition of the training set and exhibited comparatively large variances [41, 5, 10].
| | Sub 2 | Sub 9 | Sub 13 | Sub 22 | Sub 24 | Avg. |
| Control time (s) | 56.36 | 62.06 | 74.72 | 64.59 | 55.73 | 62.69 |
The first row of Fig. 4 shows scatter plots comparing performance for individual subjects and sessions. The x-axis represents the classification performance of the conventional methods, and the y-axis represents the performance of the proposed GRN. The blue, orange, and grey points represent the 1-, 5-, and 25-shot settings, respectively. The proposed method outperformed FBCSP by 7.29%, 5.25%, and 9.27% in the 1-, 5-, and 25-shot settings, respectively. Compared with the conventional deep learning methods, it was 0.74% lower in the 1-shot setting but 8% and 21.9% higher in the 5- and 25-shot settings. Thus, the GRN outperformed both conventional machine learning and deep learning methods for all training-set sizes, except against the deep learning approaches in the 1-shot setting.
Table II compares the average performance of all subjects over all sessions; the two standard deviations reported are the standard deviation across models trained on different training sets and the standard deviation across all 25 subjects. The GRN with a single group outperformed the conventional deep learning methods in the 1-, 5-, and 25-shot settings by 0.6%, 3.75%, and 2.4%, respectively. However, the single-group GRN did not outperform FBCSP in the 25-shot setting: it surpassed FBCSP only in the 1- and 5-shot settings, by 8.63% and 1.03%, respectively, and was 10.23% lower than FBCSP in the 25-shot setting.
V-B Performance Evaluation of the Candidate Classes
Because collecting intuitive commands in every possible environment is difficult, the candidate classes were used to verify whether the model could assign untrained intuitive classes to the correct sub-part of the upper extremity. The relatively high performance on the candidate classes allowed us to design an intuitive online robotic arm control system without additional training.
Table II and the second row of Fig. 4 compare the performance on the candidate classes. Among the ten models trained on the representative classes, the model with the highest accuracy was used to obtain the candidate-class results. The proposed method outperformed FBCSP by 24.93%, 5.94%, and 3.9% in the 1-, 5-, and 25-shot settings, respectively. Compared with the conventional deep learning approaches, it was 0.62% lower in the 1-shot setting but 5.6% and 20.53% higher in the 5- and 25-shot settings. Although the deep learning approaches, which contain more parameters, outperformed FBCSP with small amounts of training data, they could not outperform it in the 25-shot setting. The proposed GRN, by contrast, outperformed both deep learning and FBCSP for every training-set size except the 1-shot setting.
On both the representative and candidate classes, the GRN outperformed the conventional methods in all training settings except the 1-shot setting. The standard deviation across subjects was larger for the candidate classes than for the representative classes; however, the performance degradation on the untrained candidate classes was not significant. Therefore, we designed the online experiment so that users could perform intuitive MIs rather than only the representative commands acquired during data acquisition.
V-C Online Robotic Arm Manipulation for Drinking Task
The purpose of data acquisition in the BCI domain is to control or communicate with external devices; however, the long calibration time of BCI systems can considerably distort the user's brain signals, which in turn may deteriorate performance. Therefore, we acquired only 10 trials for each representative class: five for training and five for validation. This shortened calibration reduced the physical burden on the user. Furthermore, because the previous section verified that candidate classes can be effectively assigned to the correct sub-parts, we instructed the user to perform various intuitive MIs to control the robotic arm, not only the classes acquired during data acquisition.
For safety reasons, only the five subjects (Sub 2, Sub 9, Sub 13, Sub 22, and Sub 24) whose accuracy exceeded 60% in the three offline sessions attended the online robotic arm control session. Each user was instructed to perform ten water-drinking tasks using the robotic arm, as illustrated in Fig. 5. A beep sound signaled the start of online signal acquisition. The 5-s period starting from the beep was divided into 3-s sliding windows with a 500-ms stride. For each 3-s sliding window, the GRN returned a relation score for each class, and the highest prediction value obtained with Equation (1) became the command for the robotic arm. To assist the drinking task, each decoded sub-part intention was mapped to one step of drinking: an upper-arm command moved the robotic arm next to the object, a hand command made the robotic arm grasp the adjacent object, and a forearm command tilted the robotic hand (Fig. 5).
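The window segmentation described above can be sketched as follows, assuming a 250 Hz sampling rate after down-sampling. With a 5-s period, 3-s windows, and a 500-ms stride, exactly five windows result, consistent with the online paradigm. How the per-window outputs are merged into a single command is not fully specified in the text, so `decide_command` implements one hypothetical reading: take the sub-part with the single highest probability over all windows.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_windows(total_s: float = 5.0, win_s: float = 3.0,
                    stride_s: float = 0.5, fs: int = 250) -> List[Tuple[int, int]]:
    """(start, end) sample indices of the windows cut from the MI period."""
    total, win, stride = round(total_s * fs), round(win_s * fs), round(stride_s * fs)
    return [(start, start + win) for start in range(0, total - win + 1, stride)]


def decide_command(window_probs: Sequence[Dict[str, float]]) -> str:
    """Hypothetical rule: the sub-part with the single highest probability over all windows."""
    _, best_label = max((p, label) for probs in window_probs for label, p in probs.items())
    return best_label


print(sliding_windows())
# [(0, 750), (125, 875), (250, 1000), (375, 1125), (500, 1250)]
print(decide_command([
    {"upper-arm": 0.5, "forearm": 0.2, "hand": 0.3},
    {"upper-arm": 0.3, "forearm": 0.1, "hand": 0.6},
]))  # hand
```

Alternatives such as averaging the probabilities over windows or majority voting would trade responsiveness for robustness; the text does not say which was used.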
During the online experiment, a drinking task was completed with three MI commands in sequence: upper-arm MI to move the robotic arm next to the object, hand MI to grasp it, and forearm MI to twist the robotic arm so that the user could drink. Theoretically, the shortest control time for one drinking task was 19 s.
The exact location of the object was obtained with the YOLO object detection model from a Kinect V2 RGB-D sensor, as in our previous study. In case of unexpected decoding results, the user could restore the previous state by blinking twice, and a horizontal head nod served as a veto that reset the robotic arm to its initial state; each veto was counted as a failure of the system.
Table III presents the success rate of the online drinking experiments with the MI-based robotic arm, the number of control commands, and the average control time of each drinking task. Subject 24, who showed the highest performance over the three sessions, drank successfully nine times, with an average of 55.73 s per drinking task. Across the five subjects, the average success rate was 78.00%, and each task required 9.9 commands on average.
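As a quick consistency check on the online results, the snippet below recomputes the average control time from the per-subject values in Table III and converts the 78.00% success rate into a task count, assuming each of the five subjects performed exactly ten drinking tasks.

```python
# Average control time per drinking task (s), per subject, from Table III.
control_time_s = {"Sub 2": 56.36, "Sub 9": 62.06, "Sub 13": 74.72, "Sub 22": 64.59, "Sub 24": 55.73}

mean_time = sum(control_time_s.values()) / len(control_time_s)
# Assumption: 5 subjects x 10 drinking tasks each.
total_tasks = len(control_time_s) * 10
successes = round(0.78 * total_tasks)

print(round(mean_time, 2), total_tasks, successes)  # 62.69 50 39
```

The recomputed mean matches the 62.69 s reported in the table's Avg. column, and a 78.00% success rate corresponds to 39 of 50 tasks under the stated assumption.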
VI-A Few-shot Learning Approaches on BCI
The time-consuming data acquisition process of BCI systems hinders both the construction of large datasets and the performance of online tasks. Therefore, many studies have attempted to overcome this limitation by pooling small amounts of data from many users instead of collecting a large amount of data from each individual [24, 39, 13]. However, studies of this well-known subject-independent approach have been limited to coarse classes, such as left- and right-hand MI, because of their low performance. Because the subject-independent approach aims to extract features common to many people, its performance is lower than that of the subject-dependent approach [24, 8]. Hence, we adopted few-shot learning approaches from the vision domain [51, 53, 47], expecting them to extract discriminative subject-dependent features from a few EEG trials while preserving performance.
Because deep learning approaches in the BCI field, such as EEGNet and DeepConvNet, contain more trainable parameters than machine learning approaches such as FBCSP, they are harder to train with little data. Consequently, the machine learning approach outperformed the conventional deep learning approaches in the 5- and 25-shot settings. In contrast, the proposed GRN outperformed both deep learning and machine learning approaches for nearly all training-set sizes (the only exception being the deep learning approaches in the 1-shot setting) by constructing correlated feature groups and comparing them gradually. Furthermore, the metric-based few-shot approach allows the model to assign the user's intention to the closest related class.
We proposed the first few-shot learning approach for a BCI-based robotic arm system. In the vision domain, large labeled datasets allow stable training of deep learning models. In contrast, constructing a large labeled dataset for an individual BCI user to perform intuitive online control is nearly impossible, because an individual's brain signals change with physiological and psychological state over time. Therefore, rather than extracting features common to all people or performing a time-consuming data acquisition process, we propose few-shot learning as a direction for further BCI research.
VI-B Intuitive Commands for BCI-based Robotic Arm Control
By definition, "intuitive" means based on feelings rather than facts or proof; therefore, the intuitive MI commands of a BCI system can vary with the circumstances the user faces, and it is impossible to collect all intuitive MIs across diverse circumstances. Instead, we used three representative classes corresponding to the sub-parts of the upper extremity and selected eight additional candidate classes that might occur during online robotic arm control.
The ultimate goal of an intuitive robotic arm is to be controlled without any sense of displacement. The low signal-to-noise ratio (SNR) of EEG signals and the many degrees of freedom (DoF) of the upper extremity hinder this goal. Therefore, in this study, we focused on the sub-parts of the upper extremity to support the user's intuitive MIs in varied real-time environments rather than restricting the user's free will to specific classes. Intuitive robotic arm control requires asynchronous control. However, the performance of the proposed GRN with five training trials was not high enough for safe asynchronous robotic arm control. Hence, we performed cue-based robotic arm control with five training trials as a first step toward asynchronous robotic arm control with small amounts of EEG data.
During the online robotic arm control, we instructed users to drink water using not only the three representative classes acquired during data acquisition but also various other intuitive MI commands. All five subjects drank water successfully using the robotic arm, with an average success rate of 78.00%. However, restricting each command to the movement of a single sub-part limited the user's DoF. To fully restore upper-limb function, BCIs need new approaches that can decode complex simultaneous movements of multiple sub-parts, such as grasping while reaching.
VI-C Comparison of Grouped Information
The proposed GRN method constructs nine groups containing four temporal or spectral patterns and gradually compares two groups from each encoder. By visualizing the encoded features and their spectral information, we sought to understand how the model learns from small amounts of EEG data.
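The idea that grouped features can compensate for one another can be illustrated with a toy sketch. It is not the GRN: the learned relation module is replaced here by a fixed cosine similarity, computed separately for each of nine feature groups and then averaged across groups. The array shapes, the cosine score, the averaging rule, and the names `group_scores` and `classify` are hypothetical choices made only for illustration.

```python
import numpy as np

N_GROUPS = 9  # nine feature groups, as in the text


def group_scores(query, prototypes):
    """Cosine similarity between a query and every class prototype, per group.

    query:      (n_groups, d) embeddings of one query trial (hypothetical layout)
    prototypes: (n_classes, n_groups, d) class-mean support embeddings
    returns:    (n_groups, n_classes) similarity matrix
    """
    q = query / np.linalg.norm(query, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    return np.einsum("gd,cgd->gc", q, p)


def classify(query, prototypes):
    """Average the per-group scores so one distorted group cannot decide alone."""
    return int(np.argmax(group_scores(query, prototypes).mean(axis=0)))


# Usage: 3 classes, 9 groups, 16-dim group embeddings (synthetic data)
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, N_GROUPS, 16))
query = prototypes[1].copy()
query[4] = prototypes[0][4]  # corrupt one group so it resembles class 0
print(classify(query, prototypes))
```

In the usage lines, group 4 of a class-1 query is overwritten with class 0's embedding, so that group alone points to the wrong class; averaging over the remaining eight groups should still favor class 1, which is the kind of complementary interaction the grouped design aims for.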
Fig. 6 presents the feature space of the encoded features. The encoded features were reduced to a two-dimensional feature space using t-distributed stochastic neighbor embedding (t-SNE). Both representative and candidate classes were encoded and projected into this feature space. Panels (a), (b), and (c) show the outputs of the encoder trained with 1, 5, and 25 training trials of Sub 22, respectively. All three models encoded the three sub-parts discriminatively for both the representative and the untrained candidate classes. Panels (d)-(l) show the encoded features of the nine groups of a single encoder. The proposed GRN method uses various feature groups for complementary interaction. As can be inferred from Fig. 6(d)-(l), some groups may confuse the classes owing to non-stationary EEG data, but the relation module of the GRN can compensate for lost or distorted information using the other spectral or temporal groups.
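A projection like the one in Fig. 6 can be produced with scikit-learn's `TSNE`. The sketch below assumes scikit-learn is installed and that the encoded features are available as an `(n_trials, n_dims)` array; the function name `project_features`, the default perplexity, and the synthetic clusters are illustrative assumptions, not the settings used for Fig. 6.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_features(features, perplexity=30.0, seed=0):
    """Embed encoded features of shape (n_trials, n_dims) into 2-D with t-SNE."""
    features = np.asarray(features, dtype=np.float64)
    # scikit-learn requires perplexity < n_samples
    perplexity = min(perplexity, features.shape[0] - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)


# Usage with synthetic features: three well-separated clusters of 20 trials each
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 32))
feats = np.concatenate([c + rng.normal(size=(20, 32)) for c in centers])
embedding = project_features(feats)
print(embedding.shape)  # (60, 2)
```

The resulting 2-D points can then be scattered and colored by class or sub-part. Because t-SNE preserves local neighborhoods rather than global distances, distances between clusters in such plots should be read qualitatively.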
Fig. 7 shows the spectral information after the first layer of the encoder. Because all convolutional layers consist of depthwise CNNs and the groups are combined only gradually in the relation module, the spectral or temporal information of each group is preserved. However, apart from a few patterns such as event-related desynchronization (ERD) and event-related synchronization (ERS) [31, 37], the temporal characteristics of MI are not well established; we therefore focused on visualizing the spectral information using the fast Fourier transform (FFT). The peak of the FFT output of each feature component is color-coded in Fig. 7; approximately 68.59% of the feature components were related to the beta rhythm ([12-30] Hz). Moreover, the models for all subjects except Sub 12 extracted meaningful information from the beta rhythm. The model for Sub 12 extracted information from the delta band ([0-4] Hz), which may contain movement-related cortical potential (MRCP) features.
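The peak-frequency analysis can be sketched with NumPy's real FFT. This is a minimal sketch, assuming each feature component is a 1-D time series with a known sampling rate; only the delta ([0-4] Hz) and beta ([12-30] Hz) edges come from the text, while the theta, alpha, and gamma edges, the half-open band intervals, the mean removal, and all function names are assumptions.

```python
import numpy as np

# Delta and beta edges follow the text; the other edges are assumptions.
BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, np.inf),
}


def peak_frequencies(features, fs):
    """Frequency (Hz) of the largest FFT magnitude for each feature component.

    features: (n_components, n_samples) real-valued signals
    fs:       sampling rate in Hz
    """
    x = features - features.mean(axis=-1, keepdims=True)  # drop DC offset (assumption)
    spectrum = np.abs(np.fft.rfft(x, axis=-1))
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    return freqs[np.argmax(spectrum, axis=-1)]


def band_of(freq):
    """Name of the half-open band [lo, hi) that contains freq."""
    for name, (lo, hi) in BANDS.items():
        if lo <= freq < hi:
            return name
    return "unknown"


def beta_fraction(features, fs):
    """Fraction of components whose spectral peak lies in the beta band."""
    peaks = peak_frequencies(features, fs)
    return float(np.mean([band_of(f) == "beta" for f in peaks]))


# Usage: four synthetic components sampled at 250 Hz for 2 s
fs = 250.0
t = np.arange(500) / fs
feats = np.stack([np.sin(2 * np.pi * f * t) for f in (20.0, 25.0, 2.0, 10.0)])
print(peak_frequencies(feats, fs))  # peaks at 20, 25, 2, and 10 Hz
print(beta_fraction(feats, fs))     # 0.5
```

A percentage such as the 68.59% reported above would correspond to `beta_fraction` computed over all feature components, under whatever band definitions and preprocessing the original analysis used.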
VI-D Study Limitations and Future Work
Our study has several limitations that call for future investigation. First, the decoding performance of the GRN in the 5-shot setting was not sufficient for stable robotic arm control. Other few-shot learning approaches exist, such as recurrent neural network (RNN) memory-based approaches [33, 43] and optimization-based (fine-tuning) approaches [9, 38]; these may be more appropriate for the BCI domain than the metric-based GRN. Second, this study is the first attempt to verify the feasibility of intuitive online robotic arm control with a small amount of EEG data; therefore, we performed cue-based rather than asynchronous online robotic arm control. Continuous control is expected to facilitate the extension of BCIs toward the realistic control of physical devices in home and clinical settings [7, 28].
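In a 5-shot setting, each class is represented by five labeled training trials. The sketch below samples one such N-way K-shot episode from a vector of trial labels, the basic unit of episodic few-shot training. The function name `sample_episode`, the 3-way default (chosen to match the three representative classes), and the number of query trials are assumptions for illustration, not the GRN training procedure.

```python
import numpy as np


def sample_episode(labels, n_way=3, k_shot=5, n_query=1, rng=None):
    """Draw support and query indices for one N-way K-shot episode.

    labels: 1-D array of class labels, one per EEG trial
    returns (support_idx, query_idx); support holds k_shot trials per class
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < k_shot + n_query:
            raise ValueError(f"class {c} has fewer than {k_shot + n_query} trials")
        support.extend(idx[:k_shot])                # K labeled trials per class
        query.extend(idx[k_shot:k_shot + n_query])  # held-out trials to classify
    return np.array(support), np.array(query)


# Usage: 3 classes with 10 trials each, one 3-way 5-shot episode
labels = np.repeat([0, 1, 2], 10)
support_idx, query_idx = sample_episode(labels, rng=np.random.default_rng(0))
print(len(support_idx), len(query_idx))  # 15 3
```

Support and query indices are drawn from disjoint slices of each class's shuffled trials, so a query trial is never also used as a labeled example within the same episode.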
In this study, we proposed the GRN, an intuitive MI decoding method for controlling an online robotic arm with a small amount of EEG data. The grouped spectral and temporal features allow the model to compensate for the low SNR of EEG signals. We verified that the GRN outperformed conventional methods in decoding the proper sub-parts for both the representative and the untrained candidate classes.
In conclusion, we constructed an intuitive robotic arm control system that uses EEG signals and requires only a few training trials. Our study demonstrates superior performance and a promising few-shot learning approach for both offline and online BCI systems. This study could pave the way for intuitive upper-extremity MI-based online robotic arm control with a small amount of EEG data.
- [1] (2019) A comprehensive review of EEG-based brain-computer interface paradigms. J. Neural Eng. 16 (1), pp. 011001.
- [2] (2008) Filter bank common spatial pattern (FBCSP) in brain-computer interface. In IEEE Int. Joint Conf. on Neural Netw., pp. 2390–2397.
- [3] (2017) EEG-based strategies to detect motor imagery for control and rehabilitation. IEEE Trans. Neural Syst. Rehabil. Eng. 25 (4), pp. 392–401.
- [4] (2007) The non-invasive Berlin brain–computer interface: Fast acquisition of effective performance in untrained subjects. Neuroimage 37 (2), pp. 539–550.
- [5] (2019) Cycle-by-cycle analysis of neural oscillations. J. Neurophysiol. 122 (2), pp. 849–861.
- [6] (2019) Deep learning for electroencephalogram (EEG) classification tasks: a review. J. Neural Eng. 16 (3), pp. 031001.
- [7] (2019) Noninvasive neuroimaging enhances continuous neural tracking for robotic device control. Sci. Robot. 4 (31), pp. eaaw6844.
- [8] Inter-subject transfer learning with an end-to-end deep convolutional neural network for EEG-based BCI. J. Neural Eng. 16 (2), pp. 026007.
- [9] (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. Int. Conf. Mach. Learn. (ICML), pp. 1126–1135.
- [10] Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations. Neuroimage 70, pp. 410–422.
- [11] (2019) An improved five class MI based BCI scheme for drone control using filter bank CSP. In Int. Winter Conf. on Brain-Computer Interface (BCI), pp. 1–6.
- [12] (2017) A new self-regulated neuro-fuzzy framework for classification of EEG signals in motor imagery BCI. IEEE Trans. Fuzzy Syst. 26 (3), pp. 1485–1497.
- [13] (2016) Transfer learning in brain-computer interfaces. IEEE Comput. Intell. Mag. 11 (1), pp. 20–31.
- [14] (2020) Decoding movement-related cortical potentials based on subject-dependent and section-wise spectral filtering. IEEE Trans. Neural Syst. Rehabil. Eng. 28 (3), pp. 687–698.
- [15] (2020) Brain-controlled robotic arm system based on multi-directional CNN-BiLSTM network using EEG signals. IEEE Trans. Neural Syst. Rehabil. Eng.
- [16] (2019) Classification of drowsiness levels based on a deep spatio-temporal convolutional bidirectional LSTM network using electroencephalography signals. Brain Sci. 9 (12), pp. 348.
- [17] (2018) Sparse group representation model for motor imagery EEG classification. IEEE J. Biomed. Health Inform. 23 (2), pp. 631–641.
- [18] (2018) Deep convolutional neural networks for mental load classification based on EEG data. Pattern Recognit. 76, pp. 582–595.
- [19] (2020) Few-shot learning with geometric constraints. IEEE Trans. Neural Netw. Learn. Syst.
- [20] (2018) A large electroencephalographic motor imagery dataset for electroencephalographic brain computer interfaces. Sci. Data 5, pp. 180211.
- [21] (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [22] (2015) Siamese neural networks for one-shot image recognition. In Proc. Int. Conf. Mach. Learn. (ICML), Vol. 2.
- [23] (2019) Error correction regression framework for enhancing the decoding accuracies of ear-EEG brain-computer interfaces. IEEE Trans. Cybern.
- [24] (2019) Subject-independent brain-computer interfaces based on deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst., pp. 1–14.
- [25] (2018) EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 15 (5), pp. 056013.
- [26] (2018) A high performance spelling system based on EEG-EOG signals with visual feedback. IEEE Trans. Neural Syst. Rehabil. Eng. 26 (7), pp. 1443–1459.
- [27] (2019) EEG dataset and OpenBMI toolbox for three BCI paradigms: an investigation into BCI illiteracy. GigaScience 8 (5), pp. giz002.
- [28] (2015) Towards independence: a BCI telepresence robot for people with severe motor disabilities. Proc. IEEE 103 (6), pp. 969–982.
- [29] (2018) A hybrid network for ERP detection and analysis based on restricted Boltzmann machine. IEEE Trans. Neural Syst. Rehabil. Eng. 26 (3), pp. 563–572.
- [30] (2007) A review of classification algorithms for EEG-based brain–computer interfaces. J. Neural Eng. 4 (2), pp. R1–R13.
- [31] (2006) Motor imagery. J. Physiol. Paris 99 (4-6), pp. 386–395.
- [32] (2017) A deep learning scheme for motor imagery classification based on restricted Boltzmann machines. IEEE Trans. Neural Syst. Rehabil. Eng. 25 (6), pp. 566–576.
- [33] (2017) Meta networks. In Proc. Int. Conf. Mach. Learn. (ICML), pp. 2554–2563.
- [34] (2012) Brain computer interfaces, a review. Sensors 12 (2), pp. 1211–1279.
- [35] (2018) BMI control of a third arm for multitasking. Sci. Robot. 3 (20), pp. eaat1228.
- [36] (1997) EEG-based discrimination between imagination of right and left hand movement. Electroencephalogr. Clin. Neurophysiol. 103 (6), pp. 642–651.
- [37] (2001) Motor imagery and direct brain-computer communication. Proc. IEEE 89 (7), pp. 1123–1134.
- [38] (2017) Optimization as a model for few-shot learning. In Proc. Int. Conf. Learn. Represent. (ICLR).
- [39] (2015) A subject-independent pattern-based brain-computer interface. Front. Behav. Neurosci. 9, pp. 269.
- [40] (2016) You only look once: Unified, real-time object detection. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 779–788.
- [41] (2019) Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng.
- [42] (2018) Learning temporal information for brain-computer interface using convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 29 (11), pp. 5619–5629.
- [43] (2016) Meta-learning with memory-augmented neural networks. In Proc. Int. Conf. Mach. Learn. (ICML), pp. 1842–1850.
- [44] (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 38 (11), pp. 5391–5420.
- [45] (2015) An autonomous robotic assistant for drinking. In Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 6482–6487.
- [46] (2017) Decoding natural reach-and-grasp actions from human EEG. J. Neural Eng. 15 (1), pp. 016005.
- [47] (2017) Prototypical networks for few-shot learning. In Proc. Adv. Neural Inf. Process. Syst. (NIPS), pp. 4077–4087.
- [48] (2019) Assistive robotic arm control based on brain-machine interface with vision guidance using convolution neural network. In IEEE Int. Conf. Syst., Man, and Cybern. (SMC), pp. 2785–2790.
- [49] (2019) Reduce calibration time in motor imagery using spatially regularized symmetric positives-definite matrices based classification. Sensors 19 (2), pp. 379.
- [50] (2011) Subject and class specific frequency bands selection for multiclass motor imagery classification. Int. J. Imaging Syst. Technol. 21 (2), pp. 123–130.
- [51] (2018) Learning to compare: Relation network for few-shot learning. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR).
- [52] (2019) Validating deep neural networks for online decoding of motor imagery movements from EEG signals. Sensors 19 (1), pp. 210.
- [53] (2016) Matching networks for one shot learning. In Proc. Adv. Neural Inf. Process. Syst. (NIPS), pp. 3630–3638.
- [54] (2002) Brain–computer interfaces for communication and control. Clin. Neurophysiol. 113 (6), pp. 767–791.
- [55] (2019) Shared control of a robotic arm using non-invasive brain-computer interface and computer vision guidance. Robot. Auton. Syst. 115, pp. 121–129.
- [56] (2019) Learning spatial-spectral-temporal EEG features with recurrent 3D convolutional neural networks for cross-task mental workload assessment. IEEE Trans. Neural Syst. Rehabil. Eng. 27 (1), pp. 31–42.
- [57] (2019) A novel deep learning approach with data augmentation to classify motor imagery signals. IEEE Access 7, pp. 15945–15954.