Towards a Fast Steady-State Visual Evoked Potentials (SSVEP) Brain-Computer Interface (BCI)

02/04/2020 ∙ by Aung Aung Phyo Wai, et al. ∙ Nanyang Technological University

Steady-state visual evoked potentials (SSVEP) brain-computer interface (BCI) provides reliable responses leading to high accuracy and information throughput. But achieving high accuracy typically requires a relatively long time window of one second or more. Various methods were proposed to improve sub-second response accuracy through subject-specific training and calibration. Substantial performance improvements were achieved, but at the cost of tedious calibration and subject-specific training that cause user discomfort. So, we propose a training-free method that combines spatial filtering and temporal alignment (CSTA) to recognize SSVEP responses in sub-second response time. CSTA exploits linear correlation and non-linear similarity between steady-state responses and stimulus templates with complementary fusion to achieve desirable performance improvements. We evaluated the performance of CSTA in terms of accuracy and Information Transfer Rate (ITR) in comparison with both training-based and training-free methods using two SSVEP data-sets. We observed that CSTA achieves the maximum mean accuracy of 97.43±2.26 four-class and forty-class SSVEP data-sets respectively in sub-second response time in offline analysis. CSTA yields significantly higher mean performance (p<0.001) than the training-free method on both data-sets. Compared with training-based methods, CSTA shows 29.33±19.65 statistically significant differences in time window less than 0.5 s. In longer time windows, CSTA exhibits either better or comparable performance, though not statistically significantly better than training-based methods. We show that the proposed method brings the advantages of subject-independent SSVEP classification without requiring training while enabling high target recognition performance in sub-second response time.




1 Introduction

Brain-computer interface (BCI) is a promising technology that enables users to interact directly with physical or virtual target interfaces through brain signals, without conventional neural pathways [brunner2015bnci]. Electroencephalography (EEG) is widely adopted due to its portability, high temporal resolution, low cost and high usability. BCI systems are implemented in different paradigms, such as motor imagery [handiru2018eeg, He2015], Steady-State Visual Evoked Potentials (SSVEP) using periodic frequency stimuli [kimura2013p, zhang2019retinotopic], P300 event-related potentials [JingJ2011, gu2019online], etc., depending on specific application needs. Among these BCI modalities, SSVEP provides high performance throughput, requires little or no training and shows high resilience to interference [brunner2015bnci]. SSVEP, in response to repetitive stimulus presentation, can easily be characterised by robust steady-state responses (SSR) with less dependence on time and phase synchronization requirements [dreyer2017]. SSVEP-based BCI systems have therefore become popular in visual speller and control applications because of their reliable responses and high information transfer rate (ITR) [chen2015PNAS].

To capture reliable SSR, SSVEP stimulus presentation generally requires a continuous time window of a few seconds over multiple trials [norica2015review]. This limits real-time use of SSVEP, as long exposure to repetitive flickering stimuli causes user discomfort, leading to performance deterioration and impaired usability [dreyer2017]. In recent years, several methods were proposed to improve the accuracy while reducing the time window. These methods leverage correlated features from multi-channel EEG to improve performance in short response times, such as multivariate synchronization index (MSI) [MSI2014, ZhangRMSI2016], task-related component analysis (TRCA) [Nakanishi2018] and correlated component analysis [zhangcorca20182, ZHANGYS2019]. Besides the visual speller, SSVEP-BCI could potentially be used to objectively assess visual field abnormality and its integrity over time in glaucoma patients [hebert2014, nakanishi2017d]. Objective SSVEP responses show better glaucoma detection than the conventional psychophysics method, which relies on subjective user responses [vaegan2008]. The presence or absence of SSR in response to multi-frequency modulated stimuli can be regarded as an objective measure for discriminating glaucomatous eyes from healthy ones. A recent study using a portable SSVEP BCI solution with multiple glaucoma patients showed high diagnostic accuracy and low test-retest variability [nakanishi2017d]. To be effective, these clinical assessments require a short test duration without a tedious calibration stage, and should robustly detect visual functional defects in all subjects. In all SSVEP applications, a shorter test duration brings a better user experience as well as better test performance, because long testing times for visual field assessment are deemed unreliable and suffer from poor user adherence [dreyer2017].

The high accuracies in SSVEP decoding results usually rely on long data segment lengths [MSI2014, Lin2007]. For quick responsiveness and online BCI applications, it is important to achieve reliable SSR recognition in a short time window, preferably in sub-second response time (within a 1 s interval). Although the standard canonical correlation analysis (CCA) method is unsupervised and requires no calibration or training data, some of its enhanced variants were implemented in a supervised way that requires training or calibration data [chen2015PNAS, chen2015filter, nakanishi2015, yuan2015enhancing]. Still, both supervised and unsupervised variants of CCA perform poorly in short time windows compared with longer ones [zhangcorca20182, nakanishi2015]. The large gap in accuracy among different time windows might be due to the poor linear relationship in short segments [labecki2016]. Alternatively, it might be due to the non-linear origin of SSVEP responses, which results in SSR with a highly non-linear relationship [labecki2016]. What is more, collecting calibration data is time-consuming and labor-intensive, and representative calibration data are sometimes difficult to obtain. The major issue is that, for most SSVEP methods, performance in sub-second response time is markedly and unsatisfactorily lower than performance with response times longer than one second [zerafa2018train]. Moreover, existing studies evaluated only a single data-set, potentially optimizing the methods with data-specific improvements instead of generalizable performance across different data-sets. It is highly desirable to have a single method that works well on different data-sets with consistent performance across subjects. Nevertheless, various methods were proposed to improve sub-second response accuracy through subject-specific training to learn representative features per stimulus [Nakanishi2018, zhangcorca20182]. Those existing methods show superior performance improvements but still a large discrepancy in mean accuracy between time windows shorter than 0.5 s and time windows of 1 s or longer. We strongly believe that there is room for performance improvement in sub-second response time over existing training-free as well as training-based methods [zerafa2018train].

So, we propose an unsupervised method without subject-dependent calibration, termed combined spatial filtering and temporal alignment (CSTA), which leverages features from linear correlation and multi-phased non-linear mapping between EEG signals and stimulus reference signals. In speech and human activity recognition tasks, Dynamic Time Warping (DTW) has been applied successfully by aligning temporal similarities between two signals to achieve desirable classification or detection performance [Muller2007]. The DTW method leverages temporal similarity measures to optimize a non-linear similarity mapping through distance metrics between measured signals and templates [seto2015]. Preliminary results with a four-class SSVEP data-set showed performance improvements in sub-second response time by combining CCA and DTW [wai2019improving]. We evaluate the proposed method in comparison with two training-based and one training-free state-of-the-art (SOTA) methods using two data-sets [chen2015filter, nakanishi2015, Nakanishi2018]. The offline evaluation results show significant performance improvements with CSTA in short time windows from 0.2 to 1.0 s in comparison with the other methods used in the evaluation [Nakanishi2018, nakanishi2015]. These promising results suggest that a training-free method can achieve similar or better performance compared with training-based methods, moving towards practical, fast SSVEP-based BCI online applications.

In this paper, we propose a subject-independent training-free method, CSTA, and evaluate its performance against the SOTA methods using two SSVEP data-sets. In Section 2, we explain the principles of the proposed method in detail. We then explain the evaluation scenario and the comparison with selected methods using two SSVEP data-sets in Section 3. We discuss the analysis results using different evaluation criteria in Section 4. Finally, we conclude the paper with limitations, possible improvements and future work in Section 5.

2 Combined Spatial Filtering and Temporal Alignment Method

To improve recognition accuracy and reduce the user’s discomfort or fatigue, an SSVEP detection method must recognize the correct steady-state responses (SSR) in a short response time without subject-specific calibration. Our method, CSTA, exploits features from spatial filtering and temporal mapping with phase shifts to recognize the target frequency [wai2019improving]. Spatial filtering with CCA optimizes the ‘spatial weights’ to find the maximum linear correlation between multi-channel EEG signals and reference signals. In contrast, DTW optimizes a non-linear ‘temporal alignment’ matching to find the minimum mapping distance between EEG signals and stimulus templates (similar to the reference signals used in CCA).

2.1 CCA Spatial Filtering

CCA is a multivariate analysis method for finding the linear relationship between two multidimensional variables. It optimizes the canonical weight vectors that maximize the linear relationship between the input variables [Hotelling1936]. For SSVEP recognition, only two multivariate inputs are required: the multi-channel EEG signals, $X \in \mathbb{R}^{N_c \times N_s}$, and the reference signals, $Y \in \mathbb{R}^{2N_h \times N_s}$, where $N_c$ is the number of channels, $N_s$ is the number of data samples and $N_h$ is the number of harmonics. CCA optimizes two weight vectors $w_x$ and $w_y$ such that the correlation between the resulting linear combinations, the canonical variates $x = X^T w_x$ and $y = Y^T w_y$, is maximized as:

$$\rho(x, y) = \max_{w_x, w_y} \frac{E[w_x^T X Y^T w_y]}{\sqrt{E[w_x^T X X^T w_x]\, E[w_y^T Y Y^T w_y]}} \qquad (1)$$

Maximizing the correlation between input and reference signals in Eq. (1) can be formulated and solved as a generalized eigenvalue problem.
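As a brief sketch of that formulation (standard CCA algebra, not something specific to this paper), let $X$ be the EEG data and $Y$ the reference signals with weight vectors $w_x$ and $w_y$, and write the sample covariance matrices as $C_{xx} = XX^T$, $C_{yy} = YY^T$, $C_{xy} = XY^T$ and $C_{yx} = C_{xy}^T$. The $C$ symbols are introduced here for illustration only. Setting the gradient of the constrained correlation objective to zero gives:

```latex
\begin{aligned}
C_{xy}\, C_{yy}^{-1}\, C_{yx}\, w_x &= \rho^{2}\, C_{xx}\, w_x, \\
w_y &\propto C_{yy}^{-1}\, C_{yx}\, w_x .
\end{aligned}
```

Under these assumptions the largest generalized eigenvalue is $\rho^2$, its eigenvector is the EEG spatial filter $w_x$, and $w_y$ follows from the second line.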

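In the baseline CCA-based frequency detection method, the ideal sine-cosine reference signals, i.e., $Y_f$ ($Y_f \in \mathbb{R}^{2N_h \times N_s}$), are created with a combination of sine-cosine signals at the stimulus fundamental frequency $f$ and its harmonics as follows [Lin2007]:

$$Y_f = \begin{bmatrix} \sin(2\pi f n) \\ \cos(2\pi f n) \\ \vdots \\ \sin(2\pi N_h f n) \\ \cos(2\pi N_h f n) \end{bmatrix}, \quad n = \frac{1}{f_s}, \frac{2}{f_s}, \ldots, \frac{N_s}{f_s} \qquad (2)$$

where $f_s$ is the sampling rate. Then, the maximum correlation coefficient $\rho_i$ can be computed between the input EEG sample $X$ and each reference $Y_{f_i}$, $i = 1, \ldots, N_f$, where $N_f$ is the number of stimulus frequencies. Finally, the SSVEP target frequency can be identified as the frequency of the reference with the maximum correlation among all stimulus frequencies in Eq. (3):

$$f_{target} = \arg\max_{f_i} \rho_i, \quad i = 1, \ldots, N_f \qquad (3)$$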
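The reference construction and the maximum-correlation decision described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the correlation values passed to `pick_target` are made-up placeholders, and the CCA step that would produce real correlations is omitted.

```python
import math

def sine_cosine_reference(freq_hz, n_harmonics, n_samples, fs_hz):
    """Build 2 * n_harmonics rows (sin then cos for each harmonic)."""
    rows = []
    for h in range(1, n_harmonics + 1):
        # Sample times are 1/fs, 2/fs, ..., n_samples/fs.
        times = [n / fs_hz for n in range(1, n_samples + 1)]
        rows.append([math.sin(2 * math.pi * h * freq_hz * t) for t in times])
        rows.append([math.cos(2 * math.pi * h * freq_hz * t) for t in times])
    return rows

def pick_target(correlations, freqs):
    """Return the stimulus frequency whose reference correlates best."""
    best = max(range(len(freqs)), key=lambda i: correlations[i])
    return freqs[best]

# 0.5 s of a 12 Hz reference at 250 Hz with five harmonics.
ref = sine_cosine_reference(12.0, 5, 125, 250.0)

# Placeholder correlation values, one per candidate frequency.
target = pick_target([0.31, 0.74, 0.28], [6.67, 8.57, 12.0])
```

In a full pipeline the correlations would come from running CCA between the EEG segment and each reference matrix.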


2.2 DTW Temporal Alignment

DTW is a time-series alignment method that finds the non-linear temporal alignment mapping between multi-dimensional input data and templates by minimizing the distance metric between them. DTW provides efficient time-series similarity measures regardless of shifting and distortion in time for alignment matching between the templates and target signals [seto2015]. The comparison between two time-series uses a local distance measure that indicates similarity if the distance is small (low cost) and dissimilarity if the distance is large (high cost). For each pair of inputs $X = (x_1, \ldots, x_N)$ and $Y = (y_1, \ldots, y_M)$, the distance matrix of local costs $c(x_n, y_m)$ is computed to find the alignment between $X$ and $Y$. The distance between $X$ and $Y$ can be computed by minimizing the accumulated cost over warping paths $p$ as

$$d_{DTW}(X, Y) = \min_{p} \sum_{l=1}^{L} c(x_{n_l}, y_{m_l}) \qquad (4)$$

The warping path is a sequence $p = (p_1, \ldots, p_L)$ with $p_l = (n_l, m_l)$ that satisfies the following three constraints, which allow the optimal solution to be found in polynomial time.

  • Boundary: $p_1 = (1, 1)$ and $p_L = (N, M)$

  • Monotonicity: $n_1 \le n_2 \le \cdots \le n_L$ and $m_1 \le m_2 \le \cdots \le m_L$

  • Continuity: $p_{l+1} - p_l \in \{(1, 0), (0, 1), (1, 1)\}$

The optimal warping path, which contains the index pairs that align the samples, is selected automatically by the DTW algorithm as the path with minimum distance among all possible warping paths. The time complexity of DTW can be reduced through multi-scaling, warping path constraints, etc. to approach linear computational cost. For the SSVEP recognition task, DTW’s stimulus templates can easily be constructed from the SSR characteristics, namely narrow-band responses at the fundamental and harmonics of the stimulus frequencies. Unlike conventional time-series matching problems, SSVEP classification only requires a short data length of sub-second duration, so the total number of samples is at most 250 where $f_s = 250$ Hz.
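The textbook dynamic-programming form of DTW for two one-dimensional sequences can be sketched as follows. It is a minimal illustration under stated assumptions: the absolute difference is used as the local cost, no windowing or multi-scale speed-up is applied, and the function name is hypothetical rather than taken from the paper.

```python
def dtw_distance(x, y):
    """Accumulated cost of the optimal warping path between x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] is the cheapest cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0  # boundary condition: the path starts at the first samples
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # local distance (an assumption)
            # Monotonic, continuous steps: down, right, or diagonal.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]  # boundary condition: the path ends at the last samples

# A time-shifted copy aligns at zero cost, unlike a sample-by-sample comparison.
shifted = dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])
```

This version costs O(NM) time and memory, which stays small for the sub-second segments discussed here (at most 250 samples per sequence).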

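For $N_f$ stimulus frequencies, each target can be represented by its DTW distance $d_i$ between the input EEG $X$ and the $i$-th reference template $Y_{f_i}$, and the target frequency, $f_{target}$, is the frequency with minimum distance among the reference templates:

$$f_{target} = \arg\min_{f_i} d_i, \quad i = 1, \ldots, N_f \qquad (5)$$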

Figure 1: An example of SSVEP target detection with the 4-class OpenBMI data-set by temporal alignment matching between input EEG and reference templates. For simplicity, only a single EEG channel with the first-harmonic template is illustrated. The black diagonal line shows the distance measure of similarity between input and template signals, from which a single-valued distance metric, $d$, can be derived.

Figure 1 illustrates computing the distance $d_i$, where $i = 1, \ldots, 4$, for each stimulus frequency by comparing input EEG signals with each template frequency, $f_i$. As expected, the distance to the template at the target frequency is smaller than the distances to the other templates, resulting in correct frequency detection.

2.3 Combined Spatial-Filtering and Temporal Alignment

The proposed method, CSTA, consists of four steps in correctly classifying SSVEP responses among multiple stimuli as shown in Fig. 2, where the EEG and reference signals at the $i$-th stimulus frequency are denoted as $X_i$ and $Y_i$, respectively. As a first step, basic CCA is used to find the maximum linear relationship between $X_i$ and $Y_i$. The maximum correlation coefficients, $\rho_i$, of the respective stimulus frequencies are considered for the spatial-filtering-based decision as in Eq. (3). Secondly, the canonical weight vector, $w_x$, is used to transform the input EEG signals into canonically correlated EEG signals, i.e., $\hat{x}_i = X_i^T w_x$, as the input to the DTW temporal alignment mapping step.

Figure 2: Diagram of the proposed method, CSTA in SSVEP Classification.

By concatenating $\rho_i$ from each $i$-th stimulus frequency, the correlation vector $\boldsymbol{\rho}$ is obtained from the linear spatial filtering operation. Thirdly, the CCA reference signal $Y_i$ is expanded into multiple reference templates by shifting phases specified as follows:


where $\phi$ is the phase used in the stimulus with joint frequency and phase modulation coding [wangyj2016], and zero if the stimulus uses only frequency modulation coding [lee2019eeg]. The phase-modified CCA reference signals allow different phases to be included as template signals in the DTW temporal alignment matching operation as follows:


Then, the temporal similarity between the canonically correlated EEG signal and each template is determined by minimizing the warping path length using DTW for each frequency and phased template according to Eq. (5). For comparison among distance metrics, the non-normalized distances computed from temporal alignment with multiple references are normalized, resulting in $\bar{d}_{i,j}$ for the $i$-th stimulus frequency and $j$-th phase shift. For each $i$-th stimulus frequency, the minimum distance $\hat{d}_i$ is determined from the multi-phased normalized warped distances over the $k$ phase shifts:

$$\hat{d}_i = \min_{j = 1, \ldots, k} \bar{d}_{i,j} \qquad (8)$$


Similarly, the distance vector $\mathbf{d}$ is formed by concatenating the normalized minimum distances of all stimulus frequencies. The final target classification uses the intermediate decision vectors from CCA, $\boldsymbol{\rho}$, and DTW, $\mathbf{d}$, to determine the detected target, $f_{target}$. We used complementary fusion to combine the decision outcomes from maximization in CCA spatial filtering and minimization in DTW temporal alignment as shown in Eq. (9). Further, correct class detection is governed by two decision rules: (1) the ground-truth frequency matches the detected label(s) from either or both of the linear correlation, $\boldsymbol{\rho}$, and non-linear alignment, $\mathbf{d}$, operations; (2) there is at least twenty-five percent frequency detection agreement between both operations. The fusion rule can easily be changed from this simple fusion logic to weighted voting between $\boldsymbol{\rho}$ and $\mathbf{d}$, depending on the performance criteria and data-specific requirements.
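The temporal-alignment side of the decision can be illustrated with a small sketch that takes a table of DTW distances, one row per stimulus frequency and one column per phase-shifted template, normalizes it, keeps the minimum per frequency, and picks the frequency with the smallest value. Several things here are assumptions and not taken from the paper: the normalization (division by the largest distance in the table), the function names, and the example distances, which are made-up placeholders. The fusion with the CCA decision is not reproduced.

```python
def min_distance_per_frequency(distances):
    """distances[i][j]: DTW distance for frequency i and phase template j."""
    largest = max(max(row) for row in distances)
    if largest == 0:
        return [0.0 for _ in distances]
    # Assumed normalization: scale all distances by the largest one.
    normalized = [[d / largest for d in row] for row in distances]
    # Keep the best-matching phase template for each frequency.
    return [min(row) for row in normalized]

def detect_by_alignment(distances, freqs):
    """Return the frequency with the smallest normalized minimum distance."""
    d_vec = min_distance_per_frequency(distances)
    best = min(range(len(freqs)), key=lambda i: d_vec[i])
    return freqs[best], d_vec

# Placeholder distances: 3 candidate frequencies x 3 phase templates.
example = [[8.0, 6.0, 9.0], [4.0, 2.0, 5.0], [10.0, 7.0, 8.0]]
target, d_vec = detect_by_alignment(example, [6.67, 8.57, 12.0])
```

Taking the minimum over phase templates lets the best-aligned phase represent each frequency, which is the role the multi-phased templates play in the description above.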


3 Experiment Design and Evaluation Analysis

To evaluate the performance of the proposed training-free method, we select two SSVEP data-sets and compare against existing SOTA methods: one training-free method, FBCCA [chen2015filter], and two training-based methods, TRCA [Nakanishi2018] and IT-CCA [nakanishi2015]. We used similar data and parameter settings with the same data analysis pipeline in the offline evaluation. We briefly describe the two SSVEP data-sets and the three SOTA methods in the sub-sections below.

3.1 SSVEP Data-sets

The first SSVEP data-set, OpenBMI, was collected by Korea University [lee2019eeg]. The second, HS-SSVEP, is an open-access benchmark data-set from Tsinghua University [wangyj2016]. These data-sets have distinct stimulus characteristics and experiment conditions. The OpenBMI data-set, with 4-class SSVEP, has 10 times lower spatial resolution than the HS-SSVEP data-set, with 40-class SSVEP. The stimuli in OpenBMI are generated using the number of frames in a period according to the monitor’s refresh rate, such as 8.57 Hz (7 frames per period at a 60 Hz refresh rate), whereas those of HS-SSVEP are generated using a frequency approximation approach [Nakanishi2014].
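The frame-based stimulus generation mentioned above is simple arithmetic, shown here as a short sketch. The function name is hypothetical; the 60 Hz refresh rate and the 7-frame period are the values given in the text.

```python
def frame_based_frequency(refresh_rate_hz, frames_per_period):
    """Flicker frequency when one stimulus period spans a whole number of frames."""
    return refresh_rate_hz / frames_per_period

# 7 frames per period on a 60 Hz monitor.
freq = frame_based_frequency(60.0, 7)
print(round(freq, 2))
```

Only frequencies of the form refresh rate divided by an integer can be produced this way, which is why data-sets with finer frequency spacing rely on an approximation approach instead.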

The OpenBMI data-set is representative of a control task in which a four-target SSVEP interface is coded with four distinct frequencies: 5.45 Hz, 6.67 Hz, 8.57 Hz and 12 Hz [lee2019eeg]. The EEG data were recorded from 62 channels at a 1000 Hz sampling rate. Fifty-four subjects participated in the experiment with a total of four sessions. Each session has 100 trials corresponding to four stimuli that were presented in random order, so each stimulus has 25 trials per session. In each trial, the stimulus duration was 4 s, followed by a 6 s rest interval. To match the second data-set in the SSVEP analysis, all data were down-sampled to 250 Hz.

The HS-SSVEP data-set is representative of a visual spelling task [wangyj2016]. Its interface consists of forty targets, each coded with unique frequency and phase information. The stimuli are coded with frequencies of 8-15.8 Hz at intervals of 0.2 Hz in combination with four distinct phases of $0$, $0.5\pi$, $\pi$ and $1.5\pi$ radians [wangyj2016]. The EEG data from thirty-five healthy subjects were recorded from 64 channels at a 1000 Hz sampling rate. For each subject, six blocks of data were recorded in a single session. In each block, forty trials, one per stimulus frequency, were conducted in random order, so each stimulus frequency has six trials. The duration of each trial is 6 s, of which the first and last 0.5 s were used for the visual cue and rest interval respectively. The provided EEG data for each trial are downsampled to 250 Hz.
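As a quick check of the frequency coding described above, the forty stimulus frequencies can be enumerated in a few lines. The function name and the rounding to one decimal are illustrative choices; the phase assignment per target is not reproduced.

```python
def stimulus_frequencies(start_hz=8.0, step_hz=0.2, n_targets=40):
    """Evenly spaced stimulus frequencies, rounded to one decimal place."""
    return [round(start_hz + i * step_hz, 1) for i in range(n_targets)]

freqs = stimulus_frequencies()
print(len(freqs), freqs[0], freqs[-1])
```

Forty steps of 0.2 Hz starting at 8 Hz end at 15.8 Hz, matching the stated range.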

In our analysis, the time window or response time is set from 0.2 to 1 s in increments of 0.1 s. We only consider data from nine channels over the occipito-parietal area that are available in both data-sets, namely PO8, Pz, PO7, PO4, POz, PO3, O2, Oz and O1. All data epochs are preprocessed with a zero-phase IIR band-pass filter, from 4 Hz to 75 Hz for OpenBMI and from 7 Hz to 90 Hz for HS-SSVEP. This evaluation allows us to assess the performance of different methods under distinct SSVEP characteristics such as the number of targets, stimulus coding, frequency range, etc. The number of phases, $k$, in CSTA is set empirically at 8 and 17 for the OpenBMI and HS-SSVEP data-sets respectively.
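To make the segment sizes concrete, the evaluated time windows can be converted to sample counts at the 250 Hz rate used for both data-sets. This is a small sketch with a hypothetical function name; integer arithmetic in tenths of a second avoids floating-point rounding in the sample counts.

```python
def window_lengths_in_samples(fs_hz, first_tenths=2, last_tenths=10):
    """Pairs of (window in seconds, samples) for windows in 0.1 s steps."""
    return [(t / 10, t * fs_hz // 10) for t in range(first_tenths, last_tenths + 1)]

windows = window_lengths_in_samples(250)
for seconds, samples in windows:
    print(seconds, samples)
```

The shortest window therefore contains only 50 samples per channel, which gives a sense of how little data each decision is based on.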

3.2 Training-free Method

Filter-Bank CCA (FBCCA) is a calibration-free method, as no training is required for each subject. In this method, multi-channel EEG signals are preprocessed with pre-defined band-pass filters to extract sub-band components [chen2015filter]. By applying pre-constructed equal bandwidth filter banks, the EEG signals are decomposed into $N$ sub-bands, resulting in sub-band components $X_{SB_n}$, $n = 1, \ldots, N$. Standard CCA is applied to each sub-band against the ideal sine-cosine reference signals to compute the correlation vector $[\rho_i^{(1)}, \ldots, \rho_i^{(N)}]$ for the $i$-th stimulus frequency. The feature for each stimulus frequency is the weighted sum of the squared correlation coefficients from all $N$ sub-band components, and the target frequency is the one with the largest feature:

$$\tilde{\rho}_i = \sum_{n=1}^{N} w(n) \left(\rho_i^{(n)}\right)^2 \qquad (10)$$

where $w(n) = n^{-a} + b$, $n \in [1, N]$. In our analysis, we used sub-bands that each start at a unique frequency but end at about five or six times the upper bound of the stimulus frequencies of the target data-set. The constants $a$ and $b$ of the weight function $w(n)$ are identified empirically through a parameter grid search for each data-set using the formula defined in [chen2015filter]. For the OpenBMI data-set, the sub-bands share the same ending frequency of 72 Hz; for the HS-SSVEP data-set, the sub-bands share the same ending frequency of 88 Hz [chen2015filter]. The $(a, b)$ coefficient pairs are set at [1.25, 0.25] and [3.0, 1.0] for the HS-SSVEP and OpenBMI data-sets respectively after the parameter grid search [chen2015filter, kumar2019filter]. The number of sub-bands is set at 5 for both data-sets [zhangcorca20182].
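The sub-band weighting can be sketched as below, assuming the weight form $w(n) = n^{-a} + b$ from the filter-bank CCA formulation in [chen2015filter]. The function names are hypothetical, and the per-sub-band correlations would come from CCA in a real pipeline; here they are passed in directly.

```python
def fbcca_weights(n_subbands, a, b):
    """Weights w(n) = n**(-a) + b for sub-bands n = 1..n_subbands."""
    return [n ** (-a) + b for n in range(1, n_subbands + 1)]

def fbcca_feature(subband_corrs, a, b):
    """Weighted sum of squared sub-band correlations for one stimulus frequency."""
    weights = fbcca_weights(len(subband_corrs), a, b)
    return sum(w * r * r for w, r in zip(weights, subband_corrs))

# Weights for five sub-bands with the (a, b) pair reported for HS-SSVEP.
weights = fbcca_weights(5, 1.25, 0.25)
```

Because the weights decay with the sub-band index, the lower sub-bands, which contain the fundamental, dominate the feature, while higher sub-bands contribute mostly harmonic evidence.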

3.3 Training-based Methods

Individual Template-CCA (IT-CCA) is a training-based method, as calibration data are required to create subject-specific references [nakanishi2015]. Subject-specific templates are created by averaging EEG signals across multiple trials from each subject [Bin2011]. This method achieved higher performance than various improved CCA methods on a 12-class SSVEP data-set [nakanishi2015]. Instead of the single maximum correlation value in Eq. (1), each stimulus frequency is represented by the correlation coefficients between linear combinations of the test EEG samples $X$, the subject-independent reference signals $Y_i$ and the individual subject-specific templates ($\bar{X}_i$) using different spatial filters. For the $i$-th frequency, the correlation coefficient $\rho_i$ is derived as the summation over the elements $r_{i,j}$ of the correlation vector $r_i$ as follows [nakanishi2015]:

$$\rho_i = \sum_{j} \operatorname{sign}(r_{i,j})\, r_{i,j}^2 \qquad (11)$$

where sign() is used to retain discriminative information, positive or negative correlation, from the correlation vector $r_i$. The target frequency of the input EEG is then identified by selecting the frequency with the maximum correlation value, similar to Eq. (3).
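The sign-preserving combination can be illustrated with a one-line helper. This is a sketch under the assumption that the combined score is the sum of signed squared correlations, as in the ensemble rule commonly used with individual-template CCA; the function name and the input values are hypothetical.

```python
def signed_square_sum(correlations):
    """Sum of sign(r) * r**2, written as r * |r| for each correlation r."""
    return sum(r * abs(r) for r in correlations)

# A negative correlation lowers the score instead of adding to it.
score = signed_square_sum([0.6, 0.8, -0.2])
```

Squaring alone would treat a correlation of -0.2 as supporting evidence; keeping the sign lets it count against the candidate frequency.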

Task-Related Component Analysis (TRCA) is another subject-specific training-based method that exploits the reliable frequency response (SSR) characteristics of SSVEP with respect to the stimulus frequency. Nakanishi et al. proposed TRCA spatial filtering, which extracts task-related components as a linear, weighted sum of multiple time courses that maximizes the covariance among trials [Nakanishi2018]. TRCA essentially learns spatial filters to extract task-related components by maximizing the reproducibility of SSR across multiple trials [TANAKA2013308]. TRCA thus optimizes the weight vector under the assumption that the temporal profiles of task-related components exhibit maximal temporal similarity among trials. With spatial filters, $w_i$, and the template signals, $\bar{X}_i$, where $i = 1, \ldots, N_f$ indexes the $N_f$ stimulus frequencies, we can quantify the linear correlation between each template signal and a test sample $X$ using one-dimensional correlation analysis. Finally, the frequency of the test sample is determined from the resultant correlation coefficients as:

$$f_{target} = \arg\max_{f_i} \rho\left(X^T w_i, \bar{X}_i^T w_i\right), \quad i = 1, \ldots, N_f \qquad (12)$$

We used leave-one-trial-out cross-validation on our data-sets for both training-based methods [Nakanishi2018]. Out of $N_t$ trials, $N_t - 1$ trials are used to train the individual templates, $\bar{X}_i$, i.e. 24 trials and 5 trials for the OpenBMI and HS-SSVEP data-sets respectively, and the remaining trial is used for testing; this is repeated over all trials to compute the mean accuracy for performance comparison. The FB extension shows performance improvements over other baseline methods including CCA in an evaluation study with the HS-SSVEP data-set [kumar2019filter]. FB as a preprocessing step improved the performance of both calibration-free and calibration-based methods in previous studies [Nakanishi2018, zhangcorca20182, kumar2019filter]. So we also apply FB preprocessing to the SOTA methods in order to compare them with the CSTA method under a similar FB extension.
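The leave-one-trial-out scheme can be sketched as a small generator that yields the training and test trial indices for each fold. The function name is hypothetical; the trial counts (25 per stimulus for OpenBMI, 6 for HS-SSVEP) are those given in the data-set descriptions.

```python
def leave_one_trial_out(n_trials):
    """Yield (training trial indices, held-out test index) for every fold."""
    for test_idx in range(n_trials):
        train_idx = [t for t in range(n_trials) if t != test_idx]
        yield train_idx, test_idx

# 25 trials per stimulus: every fold trains on 24 and tests on 1.
splits = list(leave_one_trial_out(25))
```

Averaging the per-fold accuracy over all folds gives the mean accuracy used for comparison with the training-free methods.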

4 Results and Discussion

We compared the performance of CSTA and three SOTA methods, i.e., FBCCA, IT-CCA and TRCA, using accuracy and ITR metrics [Yin2014BME, Nakanishi2018]. The number of harmonics was set to five in all methods to include the fundamental and multiple harmonic components of SSVEP responses [ZHANGYS2019]. For ITR computation, we include a cue interval of 2 s and 0.5 s in addition to the time window for the OpenBMI and HS-SSVEP data-sets respectively [chen2015PNAS]. TRCA is commonly used as a benchmark method for newly proposed training-based methods on the HS-SSVEP data-set [Nakanishi2018, wangyj2016]. TRCA, IT-CCA and CSTA with the FB preprocessing step applied are named FB-TRCA, FB-ITCCA and FB-CSTA respectively. The FB extension allows us to evaluate how it improves performance and behaves differently with those methods on the two data-sets.
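For reference, the ITR commonly reported in SSVEP studies can be computed as sketched below. This assumes the widely used Wolpaw-style formula with the selection time taken as the time window plus the cue interval; the function name is hypothetical, and returning zero at or below chance level is a convention chosen here, not something stated in the paper.

```python
import math

def itr_bits_per_min(n_targets, accuracy, window_s, cue_s):
    """ITR in bits per minute for one selection of length window_s + cue_s."""
    if accuracy <= 1.0 / n_targets:
        return 0.0  # assumption: at or below chance is reported as zero
    bits = math.log2(n_targets)
    if accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / (window_s + cue_s)

# Upper bound for 4 targets with a 0.7 s window and a 2 s cue interval.
ceiling_openbmi = itr_bits_per_min(4, 1.0, 0.7, 2.0)
print(round(ceiling_openbmi, 2))
```

Under these assumptions, perfect 4-class accuracy at a 0.7 s window yields roughly 44.4 bits per minute, which shows how strongly a 2 s cue interval caps the achievable ITR.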

For statistical analysis, we used one-way repeated-measures ANOVA with Greenhouse-Geisser correction, after testing the sphericity assumption using Mauchly’s test, for performance comparison among multiple methods [chen2015filter]. All post-hoc pairwise comparisons between method pairs were computed with Bonferroni correction. Two alpha levels, 0.001 and 0.05, are used for statistical significance testing. In addition, the two-sided non-parametric Wilcoxon rank-sum test was used to test for significant differences between each method with and without the FB extension.

Figure 3: Performance comparison of CSTA with two training-based methods (TRCA, IT-CCA) for the OpenBMI data-set: (a) accuracy, (b) ITR, using one-way repeated-measures ANOVA after sphericity correction. At a specific time window, ** indicates a significant difference among methods at $p < 0.001$ and * indicates a significant difference at $p < 0.05$; otherwise there is no statistically significant difference among methods. Error bars indicate standard error.

From the one-way repeated-measures ANOVA on the OpenBMI data-set, the results show highly statistically significant differences ($p < 0.001$) in mean accuracy and ITR among methods from the 0.2 s to 0.4 s time windows. Significant differences ($p < 0.05$) in mean accuracy and ITR were found from 0.5 s to 1.0 s, except for accuracy at 0.8 s, where the difference was not statistically significant. Figure 3 shows that CSTA achieves either better or similar accuracy and ITR in comparison with the two training-based methods in time windows from 0.2 s to 1.0 s. Interestingly, the CSTA results show less variability in mean accuracy across the 0.2 s to 1 s time windows. Averaging accuracy per subject across time windows highlights the higher overall accuracy of CSTA compared with TRCA and IT-CCA. The mean accuracy difference between CSTA and the other methods is small at 1 s, where each method achieves its maximum accuracy, and larger at the 0.2 s time window. This highlights that the performance improvements of CSTA are most visible at short time windows. Similar to accuracy, highly significant differences in mean ITR among methods ($p < 0.001$) were found from the 0.2 s to 0.4 s time windows, and significant differences ($p < 0.05$) from 0.5 s to 1 s. The maximum mean ITR of 36.67 bpm is achieved at the 0.7 s time window with the CSTA method. TRCA achieved its maximum mean ITR of 36.59 bpm at 0.6 s, whereas that of IT-CCA is 32.48 bpm at 0.9 s. The low mean ITR results on the OpenBMI data-set are mainly due to the inclusion of the long cue duration of 2 s used in stimulus presentation [lee2019eeg].

Data-set Metric CSTA TRCA IT-CCA
Table 1: Comparison of mean standard error over time windows for the three methods on the two data-sets.

Table 1 presents the mean standard errors of accuracy and ITR across all time windows for the three methods on both data-sets, as shown in Figure 3 and Figure 4. CSTA has lower standard errors than TRCA and IT-CCA, that is, less variability in mean performance across time windows on both data-sets. A possible reason for the consistent performance of CSTA across time windows on the OpenBMI data-set is its low stimulus spatial resolution on screen (only 4 classes), with frequency-only coded stimuli and a large frequency difference between adjacent stimuli. This neighboring frequency difference is more than ten times higher than that of the HS-SSVEP data-set, 0.2 Hz. With distinct and well-separated stimulus frequencies, the combination of linear correlation with non-linear temporal alignment matching recognizes target frequencies more accurately. Although the phases of steady-state responses are stable, the actual phase may deviate from that of the stimulus because of the visual latency of the pathway from retina to brain [Pan2011]. As both CSTA and IT-CCA include linear correlation measures in frequency recognition, the extension with multi-phased templates and temporal alignment in CSTA contributed to the superior performance outcomes above. According to the post-hoc pairwise mean accuracy comparison in Table 2, CSTA performs significantly better ($p < 0.001$) than IT-CCA in all time windows. Similarly, TRCA has significantly better accuracy than IT-CCA ($p < 0.001$) in all time windows. But CSTA performs significantly better than TRCA only from 0.2 s to 0.4 s, with no significant difference between them from 0.5 s to 1 s, as shown in Table 2. The pairwise comparisons for ITR in Table 3 give similar significance results to those for accuracy, except at the 0.4 s time window ($p = 0.059$).

Figure 4: Mean performance comparison of CSTA with two training-based methods for the HS-SSVEP data-set: (a) accuracy, (b) ITR, using one-way repeated-measures ANOVA with sphericity correction. At a specific time window, ** indicates a significant difference among methods at $p < 0.001$ and * indicates a significant difference at $p < 0.05$; otherwise there is no statistically significant difference. Error bars indicate standard error.
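
| Methods | Data-set | 0.2 s | 0.3 s | 0.4 s | 0.5 s | 0.6 s | 0.7 s | 0.8 s | 0.9 s | 1 s |
|---|---|---|---|---|---|---|---|---|---|---|
| CSTA vs TRCA | OpenBMI | ** | ** | * | 0.599 | 1.0 | 0.994 | 1.0 | 0.689 | 0.186 |
| CSTA vs TRCA | HS-SSVEP | ** | ** | ** | ** | * | 0.87 | 1.0 | 1.0 | 1.0 |
| CSTA vs ITCCA | OpenBMI | ** | ** | ** | ** | ** | ** | ** | ** | ** |
| CSTA vs ITCCA | HS-SSVEP | ** | ** | ** | ** | ** | 0.116 | 0.493 | 1.0 | 1.0 |
| TRCA vs ITCCA | OpenBMI | ** | ** | ** | ** | ** | ** | ** | ** | ** |
| TRCA vs ITCCA | HS-SSVEP | * | * | 0.55 | 1.0 | 0.211 | 0.295 | 0.366 | * | 0.099 |

Table 2: Post-hoc pairwise comparison of accuracy among method pairs without FB. At each time window, ** indicates a significant difference at $p < 0.001$ and * indicates a significant difference at $p < 0.05$; otherwise the p-value is given for a difference that is not statistically significant.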

| Methods | Data-set | 0.2 s | 0.3 s | 0.4 s | 0.5 s | 0.6 s | 0.7 s | 0.8 s | 0.9 s | 1 s |
|---|---|---|---|---|---|---|---|---|---|---|
| CSTA vs TRCA | OpenBMI | ** | ** | 0.059 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.193 |
| CSTA vs TRCA | HS-SSVEP | ** | ** | ** | * | 0.098 | 1.0 | 1.0 | 0.827 | 0.63 |
| CSTA vs ITCCA | OpenBMI | ** | ** | ** | ** | ** | ** | ** | ** | ** |
| CSTA vs ITCCA | HS-SSVEP | ** | ** | ** | ** | ** | 0.252 | 1.0 | 1.0 | 1.0 |
| TRCA vs ITCCA | OpenBMI | ** | ** | ** | ** | ** | ** | ** | ** | ** |
| TRCA vs ITCCA | HS-SSVEP | * | * | 0.276 | 0.662 | 0.295 | 0.137 | 0.107 | * | * |

Table 3: Post-hoc pairwise comparison of ITR among method pairs without FB. At each time window, ** indicates a significant difference at $p < 0.001$ and * indicates a significant difference at $p < 0.05$; otherwise the p-value is given for a difference that is not statistically significant.

Applying the one-way repeated-measures ANOVA to the HS-SSVEP data-set gives different statistical significance results for mean accuracy among methods in short time windows, as shown in Fig. 4. Only a highly significant difference at 0.2 s ($p < 0.001$) and significant differences ($p < 0.05$) from 0.3 s to 0.4 s were found among methods. Similar significance results can be seen for mean ITR in Figure 4. Unlike the OpenBMI data-set, there is no significant difference among methods from 0.5 to 1.0 s in mean accuracy or ITR. The post-hoc pairwise comparisons of accuracy and ITR between methods on the HS-SSVEP data-set also differ from those on the OpenBMI data-set, as shown in Table 2 and Table 3. Between CSTA and TRCA, significant differences are found only from 0.2 s to 0.6 s in mean accuracy and only from 0.2 s to 0.5 s in ITR. Between CSTA and IT-CCA, the differences are highly significant ($p < 0.001$) in both accuracy and ITR from 0.2 s to 0.6 s. These statistical results show that the significance of performance differences among methods is more variable on the HS-SSVEP data-set than on OpenBMI. But CSTA consistently performs either better than or comparably with both training-based methods in short time windows. Similar to the OpenBMI data-set, the mean performance of CSTA has less variability than that of the other methods across all time windows, as shown in Table 1. This highlights the contribution of temporal alignment matching with multi-phased templates and of complementary fusion between the correlation and alignment decision vectors. None of the three-method comparisons above included FB as a pre-processing step.

In general, FB extensions of different SSVEP detection methods improve accuracy on the HS-SSVEP data-set for time windows from 0.25 s to 5 s [kumar2019filter]. The most notable improvement with the FB extension, however, was observed in time windows longer than 1 s [kumar2019filter]. Although the FB extension does not require a user-specific calibration step, it still requires a grid search to find the optimal weight coefficients in Eq. (10) for each method and each data-set [kumar2019filter]. This need to optimize the weight vector for each data-set is one limitation of the FB extension.
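The grid search mentioned above can be sketched as follows. This is a minimal illustration that assumes Eq. (10) has the sub-band weighting form of the original FBCCA formulation [chen2015filter], w(n) = n^(-a) + b, with the fused score being the weighted sum of squared sub-band correlations. All function names, the grids and the data layout are hypothetical.

```python
def fb_weights(n_bands, a, b):
    """Sub-band weights w(n) = n**(-a) + b for n = 1..n_bands (assumed form)."""
    return [n ** (-a) + b for n in range(1, n_bands + 1)]


def fuse(subband_corrs, weights):
    """Weighted sum of squared correlations per stimulus.

    subband_corrs[n][k] is the correlation of sub-band n with stimulus k.
    """
    n_classes = len(subband_corrs[0])
    return [
        sum(w * band[k] ** 2 for w, band in zip(weights, subband_corrs))
        for k in range(n_classes)
    ]


def predict(subband_corrs, weights):
    """Index of the stimulus with the largest fused score."""
    scores = fuse(subband_corrs, weights)
    return max(range(len(scores)), key=scores.__getitem__)


def grid_search(trials, labels, n_bands, a_grid, b_grid):
    """Return (a, b, accuracy) of the first grid point with the best accuracy."""
    best = (None, None, -1.0)
    for a in a_grid:
        for b in b_grid:
            weights = fb_weights(n_bands, a, b)
            hits = sum(predict(t, weights) == y for t, y in zip(trials, labels))
            acc = hits / len(labels)
            if acc > best[2]:
                best = (a, b, acc)
    return best
```

Because the search scores each grid point against labelled trials, the resulting weights are tied to the data-set they were tuned on, which is the limitation noted above.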

Figure 5: Mean accuracy comparison among CSTA and the training-free CCA and FB-CCA methods at different time windows, using one-way repeated-measures ANOVA with sphericity correction. The left and right graphs show results from the OpenBMI and HS-SSVEP data-sets, respectively. Here, ** indicates a significant difference among methods at at a specific time window. Error bars indicate standard error.

The FBCCA method improves accuracy over baseline CCA in time windows longer than 0.5 s in both data-sets. But its accuracy is close to the baseline accuracy of and in the 0.2 s to 0.3 s time windows in both data-sets, as shown in Fig. 5. A one-way repeated-measures ANOVA shows a highly significant difference ( ) among these methods in the time windows from 0.2 s to 1.0 s. This shows that exploiting phase information with non-linear temporal alignment achieves high performance in sub-second response time. Because the FB operation relies on the presence of multiple harmonics of the SSR for frequency recognition, it needs a longer time window to improve performance, as can be seen in Fig. 5.

Figure 6: Accuracy comparison among three methods with FB extensions in OpenBMI (left graph) and HS-SSVEP (right graph). At each time window, ** indicates a significant difference at , * indicates a significant difference at , and otherwise ’ ’ marks no statistically significant difference. Error bars indicate standard error.

The one-way repeated-measures ANOVA shows fewer significant performance differences among methods with the FB extension (Fig. 6) than without it (Figs. 3 and 4) in both data-sets. In the OpenBMI data-set, significant differences among the FB-extended methods are found only in time windows shorter than 0.7 s, i.e. 0.2 s-0.3 s () and 0.4 s-0.6 s () respectively, as shown in Fig. 6. In the HS-SSVEP data-set, a significant difference ( ) among methods is seen only at 0.2 s. Table 4 shows the pairwise accuracy comparison of the FB-extended methods on both data-sets from 0.2 s to 1.0 s. Overall, accuracy improves for all methods, to different degrees, in both data-sets. In the OpenBMI data-set, the mean accuracy improvements are slight: %, % and % for the CSTA, TRCA and IT-CCA methods, respectively. In the HS-SSVEP data-set, the mean accuracy improvements are %, % and % for the same three methods, respectively. The proposed CSTA method gains the least accuracy from the FB extension in both data-sets. The FB improvements are clearer in the HS-SSVEP data-set than in the OpenBMI data-set, especially for the TRCA and IT-CCA methods. A possible reason for the higher improvement is the equal inter-stimulus frequency difference of Hz in the HS-SSVEP data-set, compared with the unequal inter-stimulus frequency differences in the OpenBMI data-set. Pairwise comparison of the CSTA, TRCA and IT-CCA methods with and without FB extensions at different time windows supports these observations, as shown in Table 5.

Methods         Data-set   0.2 s  0.3 s  0.4 s  0.5 s  0.6 s  0.7 s  0.8 s  0.9 s  1 s
CSTA vs TRCA    OpenBMI    **     **     *      0.774  0.513  0.745  0.284  0.28   0.114
CSTA vs TRCA    HS-SSVEP   **     0.714  1.0    1.0    1.0    1.0    1.0    1.0    1.0
CSTA vs IT-CCA  OpenBMI    **     **     **     **     **     *      **     *      *
CSTA vs IT-CCA  HS-SSVEP   **     *      0.359  0.971  1.0    1.0    1.0    1.0    1.0
TRCA vs IT-CCA  OpenBMI    **     **     **     **     **     **     **     *      *
TRCA vs IT-CCA  HS-SSVEP   **     **     *      0.123  1.0    1.0    1.0    1.0    1.0
Table 4: Post hoc pairwise comparison of accuracy among methods with filter-bank extension. At each time window, ** indicates a significant difference at , * indicates a significant difference at , and otherwise the p-value is shown when the difference is not statistically significant.
Methods  Data-set   0.2 s  0.3 s  0.4 s  0.5 s  0.6 s  0.7 s  0.8 s  0.9 s  1 s
CSTA     OpenBMI    0.177  0.463  0.107  0.493  0.104  0.491  0.109  0.407  *
CSTA     HS-SSVEP   **     *      0.077  0.139  0.085  0.061  *      0.066  *
TRCA     OpenBMI    **     0.89   0.561  0.612  0.873  0.929  0.961  0.851  0.813
TRCA     HS-SSVEP   *      *      *      *      *      *      0.051  0.066  0.073
IT-CCA   OpenBMI    *      0.505  0.252  0.333  0.386  0.217  0.233  0.352  0.33
IT-CCA   HS-SSVEP   0.14   *      *      *      *      *      *      *      *
Table 5: Two-sided non-parametric Wilcoxon rank-sum tests comparing each of the three methods with and without FB pre-processing. At each time window, ** indicates a significant difference at , * indicates a significant difference at , and otherwise the p-value is shown when the difference is not statistically significant.
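The test named in this caption can be sketched as follows. This is a minimal illustration of a two-sided Wilcoxon rank-sum test using the normal approximation, with average ranks for tied values and no further tie correction. The function names are hypothetical, and for small samples an exact test from a statistics library may be preferable.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for samples x and y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0            # its expectation under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided tail probability
    return z, p
```

Here `x` and `y` would hold the per-subject accuracies of one method with and without FB pre-processing at a given time window.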

We also use R-squared coefficients to compare how well the features of different methods discriminate between target and non-target stimuli [Nakanishi2018]. A high R-squared value means that the method's features represent targets and non-targets more distinctly and therefore discriminate them well. The results in Figure 7 show that, at each frequency and in both data-sets, CSTA has better feature discrimination and less feature variability than the other methods.
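One common way to compute such a coefficient is sketched below. It assumes that the R-squared value is the squared correlation between the feature values and a binary target/non-target label (the squared point-biserial correlation), a form often used in the BCI literature; the exact formula used in this study is not restated here, and the function name `r_squared` is hypothetical.

```python
def r_squared(target_vals, nontarget_vals):
    """Squared correlation between feature values and class membership.

    Returns a value in [0, 1]; 0.0 is returned when either the features or
    the labels have no variance.
    """
    values = list(target_vals) + list(nontarget_vals)
    labels = [1.0] * len(target_vals) + [0.0] * len(nontarget_vals)
    n = len(values)
    mean_v = sum(values) / n
    mean_l = sum(labels) / n
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_v == 0.0 or var_l == 0.0:
        return 0.0
    return cov * cov / (var_v * var_l)
```

Values near 1 indicate that target and non-target feature values barely overlap, while values near 0 indicate that class membership explains almost none of the feature variance.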

Figure 7: Features comparison among methods with mean R-squared values for SSVEPs at 12 Hz and 8.4 Hz for OpenBMI and HS-SSVEP data-set respectively. Time window is set at 0.3 s. Error bar indicates standard errors.

This can be further examined by visualizing the multi-phased non-linear temporal alignment features used in the CSTA method at different time windows. We apply t-Distributed Stochastic Neighbor Embedding (t-SNE) to project the multi-dimensional features into a 2-dimensional feature representation [vanDerMaaten2008]. Figure 8 shows that the features identified by the CSTA method have similar separability among frequencies at the 0.3 s and 0.8 s time windows, with slightly better inter-frequency separation at 0.8 s. We believe this good inter-class separability enables better performance in short time windows compared with SOTA methods on OpenBMI and HS-SSVEP, as shown in Fig. 3 and Fig. 4, respectively.

Figure 8: Clusters-based visualization of features, after dimensional reduction using t-SNE to two dimensions for simple illustration, derived from CSTA method for 4-class openBMI data-set at (a) 0.3 s (b) 0.8 s time window.

The performance of TRCA can be further improved using ensemble and hierarchical extensions, though such extensions are not used in our comparison [Nakanishi2018, ZHANGYS2019]. Another training-based approach constructs spatiotemporal beamformers from calibration trials, optimizing the weights for each stimulus by minimizing the beamforming variance [Wittevrongel2016]. However, the performance of these training-based methods is evaluated with subject-specific leave-one-trial-out cross-validation. There is still no empirical evaluation of these training-based methods in a subject-independent, leave-one-subject-out scenario of the kind used for training-free methods. Such an evaluation would make their overall performance comparable to that of subject-independent SSVEP recognition [chen2015filter, wai2019improving]. Moreover, most training-based methods, such as TRCA and IT-CCA, use spatial filters that optimize linear weights of the SSVEP responses, assuming that data characteristics are consistent across trials, sessions and days for each subject. Given the non-linear sources underlying SSVEP responses and the non-linear nature of scalp EEG, linear spatial filtering with optimal weights might misclassify some SSVEP responses [labecki2016]. By combining linear spatial filtering with non-linear temporal alignment, the CSTA method can offer high detection accuracy and consistent performance in short time windows.
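The idea of non-linear temporal alignment against multi-phased templates can be illustrated with a small sketch. It is not the CSTA implementation: classic dynamic time warping (DTW) is used here as a stand-in for the alignment step, the input is assumed to be a single, already spatially filtered and amplitude-normalized signal, and the fusion with the correlation decision vector is omitted. All names, frequencies and the sampling rate are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance with absolute-difference local cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Best of insertion, match and deletion from the previous cells.
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)]


def sine_template(freq, phase, fs, n_samples):
    """Sinusoidal stimulus template at `freq` Hz with the given phase."""
    return [math.sin(2.0 * math.pi * freq * t / fs + phase)
            for t in range(n_samples)]


def classify(signal, freqs, fs, n_phases):
    """Return (frequency, distance) of the closest multi-phased template."""
    best_freq, best_dist = None, float("inf")
    for f in freqs:
        for p in range(n_phases):
            phase = 2.0 * math.pi * p / n_phases  # evenly spaced phases
            template = sine_template(f, phase, fs, len(signal))
            d = dtw_distance(signal, template)
            if d < best_dist:
                best_freq, best_dist = f, d
    return best_freq, best_dist
```

In this sketch `n_phases` plays the role of the number of phases in the multi-phased templates. Unconstrained DTW can warp one frequency towards another, so a practical variant would likely restrict the warping path.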

The only limitation of the CSTA method is that the number of phases used in the multi-phased templates for temporal alignment must be specified empirically. This raises the questions of how to optimize this parameter systematically and how different values of it affect SSVEP classification performance. Because conventional data-driven approaches already achieve relatively high accuracy, deep learning approaches are less commonly used for SSVEP than for other BCI modalities such as motor imagery [Craik_2019]. Recently, deep-learning-based SSVEP recognition using 12 custom stimuli showed significant performance improvements over CCA methods in a 1 s time window [waytowich_2018]. We intend to further evaluate the performance of the CSTA method against SOTA training-based methods, including deep learning, using subject-independent cross-validation in short time windows. Furthermore, we will continue to improve CSTA performance through subject-specific templates, as in IT-CCA [nakanishi2015], and improved time-series alignment mapping methods [zhou_2009]. Notably, an extension of the TRCA method that exploits the characteristics of multiple stimuli shows further performance improvements over baseline and ensemble TRCA [wong_2020]. Nevertheless, our work shows that a subject-independent, training-free method can still perform better than or similarly to training-based methods for SSVEP frequency recognition in short response time.

5 Conclusion

To achieve a fast SSVEP-based BCI, frequency recognition methods must provide high accuracy with low performance variability in sub-second response time. It is also highly desirable that such methods require no subject-specific training, which reduces calibration time and ensures high usability. Existing methods improve accuracy either by exploiting subject-specific calibration in short response time (training-based methods) or by relying on long time windows (training-free methods). In the current study, we propose the CSTA method, which leverages linear correlation and non-linear temporal alignment mapping with decision fusion to improve accuracy in short response time. The offline experimental evaluation shows significant performance improvements with CSTA compared with both training-based and training-free methods, especially in response times of less than 0.5 s. We validated these performance improvements on two SSVEP data-sets with different stimulus characteristics. We hope that the proposed CSTA method brings the realization of a fast SSVEP-based BCI closer by meeting end users' need for a training-free approach to reliable frequency recognition in sub-second response time.


This research was supported by Alibaba Group Holding Limited, DAMO Academy, Health-AI division. The program is a collaboration between Alibaba and Nanyang Technological University, Singapore. This work was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2017-0-00451; Development of BCI based Brain and Cognitive Computing Technology for Recognizing Users Intentions using Deep Learning).