, and so on. In such cerebral disease diagnosis, however, invalid/noisy EEGs significantly reduce diagnostic accuracy, because they blur the distinction between target features. An invalid/noisy EEG is evoked by non-target brain activity, and its contour shape is dissimilar to that of target EEGs evoked by the specific brain activity of interest. Invalid EEGs come mainly from (1) environmental noise, which is usually ignored during analysis, and (2) non-target bioelectrical potentials. More invalid EEGs are mixed into the raw EEGs of patients suffering from epilepsy, Alzheimer’s disease, stroke, amyotrophic lateral sclerosis (ALS), etc., because of uncontrolled neural activity in their brains. To improve diagnostic accuracy, independent component analysis (ICA), principal component analysis (PCA), common spatial pattern (CSP), blind source separation (BSS), and wavelet transform (WT) mainly address artifact removal and improve accuracy to some extent, but they ignore the impact of invalid/noisy EEGs on the follow-up analyses such as EEG artifact removal, denoising, feature extraction, and classification. In other words, EEG selection is the earliest step of EEG analysis, especially for EEG classification, as Figure 1 illustrates. Furthermore, EEG selection aims to reduce the invalid/noisy EEGs evoked by non-target brain activities.
EEG selection is a form of source control that reduces the degradation caused by invalid EEGs. As far as we know, no previous research has focused on EEG selection; existing work skips this step and proceeds directly to artifact removal, feature extraction, and classification, see Figure 1. In this work we study EEG selection based on the maximum weight clique, providing more target EEGs for EEG classification. To the best of our knowledge, this is the first attempt to select EEGs with a maximum weight clique for subsequent classification.
This paper presents a novel EEG selection method. It maps EEG selection to searching for maximum weight cliques in a similarity-weighted EEG graph, in such a way that EEGs in the same clique are more similar to each other than to those in different cliques. The method simultaneously considers the weights of vertices and edges in the weighted EEG graph. Meanwhile, it focuses on the correlation between pairwise EEGs in the same class and the scatter among different classes. Our contributions can be summarized as follows:
We present a novel method, mwcEEGs, for EEG selection. It maps EEG selection to searching for maximum weight cliques in a similarity-weighted EEG graph, simultaneously considering the edge weights and the vertex weights.
We demonstrate the superiority of mwcEEGs, with several popular and recent classifiers, over state-of-the-art time series selection approaches through detailed experiments using standard classification validity criteria on real-world EEG datasets.
The structure of this paper is as follows. In Section 2, we provide background on the similarity measure and the maximum weight clique problem used in this paper. In Section 3, we describe the proposed method, including the EEG selection algorithm mwcEEGs and its detailed description. In Section 4, we outline the datasets, criteria, and baselines used for comparison; the results and discussion are also presented in that section. Finally, we conclude the work in Section 5.
This section introduces the background of the two main parts of the proposed method: the Fréchet distance as a similarity measure and the maximum weight clique problem.
II-A Fréchet Distance
The Fréchet distance (FD), Hausdorff distance (HD), and dynamic time warping (DTW) are the most widely used similarity measures. FD takes into account the location and ordering of the points along the curves, which makes it a better similarity measure for EEG than HD and DTW. Since HD regards an EEG as an arbitrary point set, it ignores the order of points along the EEG, so two EEGs can have a small HD but a large FD. DTW measures the distance between curves by warping the sequences in the time dimension, which disturbs the timing of points and degrades the synchronism of two curves. It sometimes generates unintuitive alignments and yields inferior results, since the DTW similarity measure is not, in general, positive definite. Hence, DTW does not reflect the exact similarity of two EEGs because of its time warping. Therefore, we apply the Fréchet distance as the similarity measure in our work.
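Mathematically, for two EEGs with continuous mappings $A:[0,1]\to\mathbb{R}^{d}$ and $B:[0,1]\to\mathbb{R}^{d}$, the Fréchet distance between $A$ and $B$ is defined as
$$F(A,B)=\inf_{\alpha,\beta}\;\max_{t\in[0,1]}\left\|A(\alpha(t))-B(\beta(t))\right\| \qquad (1)$$
where $\|\cdot\|$ is the underlying norm, and $\alpha,\beta:[0,1]\to[0,1]$ are continuous, non-decreasing, surjective reparameterizations.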
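As an illustration of how such a distance can be computed on sampled signals, the following is a minimal sketch of the discrete Fréchet distance (the standard dynamic-programming coupling recurrence), which approximates the continuous definition on sampled curves. It is not the improved measure used later in the paper; the function name `discrete_frechet` and the representation of each EEG sample as a `(time, amplitude)` point are assumptions made for this example.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two sampled curves.

    p and q are non-empty sequences of points (tuples of equal dimension),
    e.g. (time, amplitude) pairs. Euclidean distance is assumed as the norm.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both curves must contain at least one point")

    # ca[i][j] = smallest achievable "leash length" for prefixes p[:i+1], q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # advance on p, on q, or on both, keeping point order intact
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel toy "signals" one amplitude unit apart
a = [(0, 0.0), (1, 0.0), (2, 0.0)]
b = [(0, 1.0), (1, 1.0), (2, 1.0)]
distance = discrete_frechet(a, b)
```

The table costs O(nm) time and memory for curves of n and m samples, which is the usual trade-off for respecting point order.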
II-B Maximum Weight Clique
The maximum weight clique problem (MWCP) is to find, in a weighted graph, a complete subgraph (one in which any two vertices are connected by an edge) with the maximum total weight of vertices or edges. Mathematically, given a weighted undirected graph , where and respectively denote vertex and edge of the graph; and are respectively the weights of them. is the weight of . Define and , the aim of MWCP is to search a clique with maximum weight from , see (2).
In particular, when , otherwise , the MWCP reduces to the maximum clique problem (MCP), which aims to find a complete subgraph of maximum cardinality. In that case, the MWCP is to maximize .
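To make the problem statement concrete, here is a small brute-force sketch that enumerates every clique of a tiny weighted graph and keeps the heaviest one. The objective used here, the sum of vertex weights plus the sum of edge weights inside the clique, is an assumption for illustration; the exact form of (2) is not reproduced. The function name and the toy graph are hypothetical.

```python
from itertools import combinations


def max_weight_clique(vertex_w, edge_w):
    """Exhaustively search the clique with the largest total weight.

    vertex_w: dict mapping vertex -> weight
    edge_w:   dict mapping frozenset({u, v}) -> weight (absent key = no edge)
    Assumed objective: sum of vertex weights + sum of edge weights.
    Exponential time, so only suitable for very small graphs.
    """
    vertices = sorted(vertex_w)
    best, best_weight = (), float("-inf")
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            pairs = [frozenset(pair) for pair in combinations(subset, 2)]
            if not all(pair in edge_w for pair in pairs):
                continue  # some pair is not connected, so this is not a clique
            weight = sum(vertex_w[v] for v in subset)
            weight += sum(edge_w[pair] for pair in pairs)
            if weight > best_weight:
                best, best_weight = subset, weight
    return set(best), best_weight


# Toy graph: triangle a-b-c with light edges, plus one heavy edge c-d
vertex_w = {"a": 1, "b": 1, "c": 1, "d": 1}
edge_w = {
    frozenset({"a", "b"}): 1,
    frozenset({"b", "c"}): 1,
    frozenset({"a", "c"}): 1,
    frozenset({"c", "d"}): 5,
}
clique, weight = max_weight_clique(vertex_w, edge_w)
```

In this toy graph the heaviest clique is the pair {c, d} rather than the larger triangle, which illustrates how the MWCP differs from the maximum-cardinality MCP.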
III The Proposed Method
The proposed method selects EEGs through searching maximum weight cliques in an improved Fréchet distance weighted EEG graph simultaneously considering the weights of vertices and edges.
III-A Weights of Edges
In this work, EEGs are regarded as pairwise connected vertices in an undirected weighted complete graph. The Fréchet distance-based similarities of EEGs are the weights on the edge, also called edge weight, that determines which edge is cut when partitioning complete subgraphs. However, the conventional Fréchet distance (CFD) ignores the temporal structure and it is sensitive to global trends . To improve the CFD, local tendency is brought to evaluate the trend of EEGs. Mathematically, for two EEGs and , , is the set of EEGs, the local trend of , is evaluated by (3).
where . denotes the length of segment EEG. As indicates, a larger probably ignores more local tendencies than those of shorter segments whose length . Hence, commonly or down-sampling into ; estimates the local tendency observed simultaneously on EEGs. This index indicates the synchronization of two EEGs in temporal structure.
and is the normalized value of ; is the weight for normalized global Fréchet distance while for local tendency.
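One common way to quantify whether two series rise and fall together is the first-order temporal correlation coefficient associated with the improved Fréchet distance literature. The sketch below computes it on two equal-length signals. Whether Eq. (3) has exactly this form, and how it is applied per segment, is an assumption here; the function name `temporal_correlation` is hypothetical.

```python
import math


def temporal_correlation(x, y):
    """First-order temporal correlation of two equal-length series.

    Returns a value in [-1, 1]: close to 1 when the series increase and
    decrease together, close to -1 when they move in opposite directions,
    and 0.0 if either series is constant (no trend to compare).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")

    # successive differences capture the local tendency at each time step
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]

    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0.0:
        return 0.0
    return numerator / denominator


same_trend = temporal_correlation([0, 1, 2, 3], [0, 2, 4, 6])
opposite_trend = temporal_correlation([0, 1, 2, 3], [3, 2, 1, 0])
```

Applying the function to short windows rather than whole trials would give a per-segment view of local tendency, in the spirit of the segment length discussed above.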
All the similarities of pairwise connected EEGs calculated by IFD with local and global measurements construct the edge weight of the undirected weighted complete graph. Let denote a EEG matrix and the diagonal normalized similarity symmetric matrix , the edge weights , is formed as:
where ; , , and , of denote .
III-B Weights of Vertices
Vertex weight of indicates the importance of vertex to the potential maximum similarity-weighted clique, i.e., it measures the importance of the vertex to the potential clique and also determines which vertices can be partitioned together into a same clique. For a set of EEGs in , their importance to the potential clique can be measured by the similarity partially ordered matrix , also called the vertex weights, such that , where is computed by (7).
Specifically, , . also represents the similarity rank of objective EEG to the rest of ones . When is partitioned into clique , with high rank similarity based on vertex weights to are correspondingly highly likely grouped into . In detail, the vertex weight matrix is formed as
Simultaneously considering the edge weights and vertex weights , the pairwise high-weight EEGs with the same label that are grouped into the same clique with respect to similarity threshold can be represented by (9), with which the graph partition (vertex selection) groups the most similar vertices with the same label into the same clique with a minimum weight loss from edge cuts. In other words, based on (9), the proposed method repeatedly selects the vertex with the highest value of such that in class as the one to add to the clique, forming a new clique with larger weight, rather than selecting one at random. This ensures that vertices with high importance are grouped together into the same clique.
III-C The mwcEEGs
In this paper, EEG selection is mapped to a repeated search for cliques with maximum weight based on and similarity threshold , which is named mwcEEGs. In detail, mwcEEGs selects the most similar EEGs with the same label into the same clique and separates discriminative ones into different cliques with respect to , and in this selection the invalid/noisy EEGs are removed. That is to say, mwcEEGs not only selects EEGs that are most similar within a clique and discriminative between cliques, but also reduces the influence of noisy EEGs on the classifier.
Given a labeled weighted EEG graph with label matrix , where and , and positive integers such that , the mwcEEGs aims to select a family of disjoint labeled cliques with maximum weight: based on with respect to the similarity threshold . Given any edge , , and the weight of edge in such that , the weight function simultaneously considering the weights of edges and vertices is modified as (10) defines.
Where denotes the number of vertices of clique . increases along with a new vertex such that joins into and then the modified weighted is correspondingly updated. Moreover, the similarity thresholds are crucial to EEG selection. Simply, similar vertices whose edge weights larger than corresponding are likely grouped into the same clique . Namely, influences the EEG selection results. Furthermore, for searching cliques, it needs thresholds, since thresholds achieve cliques while remaining vertices are naturally regarded as a clique, then cliques are finally achieved.
The mwcEEGs with the modified weight function (10) can be written equivalently as
where , denotes the EEG class index.
Set , then
Proposition 1 indicates that searching for the maximum similarity-weighted cliques based on the modified weight function amounts to maximizing the total weight of vertices and edges satisfying the similarity thresholds. With the modified weight function simultaneously considering edge weights and vertex weights, the pseudo-code of mwcEEGs for EEG selection is shown in Algorithm 1. The mwcEEGs first sets the similarity thresholds, the edge weight matrix with (4,5,6), and the vertex weight matrix with (7,8) to initialize the labeled EEG graph , see line 1. Then it calculates the value of and selects the vertex with the maximum value into the clique instead of selecting one at random, see lines 4-5. Subsequently, mwcEEGs compares the total weight of the new clique with that of the old one to determine whether the new vertex joins the clique, as lines 6-13 indicate. Additionally, in lines 9-13, the matrix , and are updated after the vertex is added. When the clique whose vertex set is has been found, the and are modified correspondingly in lines 9-13 to calculate the weight matrix of the remaining vertices and their edges with the same label , and then the vertex with holding the largest value in achieved by is most likely chosen as the next vertex of the clique, to form a new clique with larger total weight.
As Algorithm 1 and Proposition 1 demonstrate, the algorithm selects labeled EEGs such that and then trains the classifier model with such selected labeled EEGs. In other words, this process with respect to not only chooses the most distinguished labeled EEGs with high similarity to train the classifier model, but also reduces the influence of invalid/noisy EEGs.
In the algorithm, a vertex labeled joins the clique if it simultaneously satisfies 2 conditions: (1) , ; (2) . Actually, once is set, adding into only requires satisfying (1).
A vertex labeled joins the clique to construct a new larger-weight clique if and only if .
Since , obviously , and . For a vertex labeled such that , according to Proposition 1: , then
Namely, joining increases the weight, therefore and construct a new clique with a larger weight than that of .∎
Based on Theorem 1, once is set, searching labeled cliques such that can be transformed to -search labeled maximum similarity-weighted cliques, namely -repeating mwcEEGs with , .
The mwcEEGs for labeled cliques selection can be equivalently transformed to -time repeating mwcEEGs with .
Recall Proposition 1. The mwcEEGs, can be written as , then
According to Proposition 2 and Theorem 1, the mwcEEGs is transformed into -searching labeled maximum cliques, as Algorithm 2 shows, in which any algorithm for the maximum clique problem (MCP) can be applied to search for the cliques with maximum weight in the given graph. Importantly, in every iteration of the maximum clique search, the vertex weights of are ranked in descending order and the weight matrix is calculated in line 8, so that the algorithm can choose the vertex with the highest weight for the potential clique . This procedure contributes to high-quality selection and a fast maximum clique search. A higher indicates that the vertex has higher similarity with all the other vertices. That is, vertices with higher are likely to be grouped into the same clique. Meanwhile, selecting the vertex with the largest value in as the new vertex of the potential clique also reduces the time consumption compared with conventional methods that select vertices at random.
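The vertex-ranking idea described above can be sketched as a simple greedy procedure: within one class, visit vertices in descending order of vertex weight and add a vertex only if its similarity to every current clique member exceeds the threshold. This is a simplified illustration under stated assumptions, not Algorithm 1 or Algorithm 2 as published; the joining condition, the function name `greedy_labeled_clique`, and the toy data are all hypothetical.

```python
def greedy_labeled_clique(sim, vertex_weight, labels, target_label, delta):
    """Greedily grow one clique among vertices carrying target_label.

    sim:           symmetric matrix (list of lists) of pairwise similarities
    vertex_weight: list of per-vertex importance scores
    labels:        list of class labels, one per vertex
    delta:         similarity threshold; an edge "survives" if sim > delta
    Assumption: a vertex may join only if it is connected (sim > delta)
    to every vertex already in the clique.
    """
    candidates = [v for v in range(len(labels)) if labels[v] == target_label]
    # highest-weight vertices are tried first instead of a random order
    candidates.sort(key=lambda v: vertex_weight[v], reverse=True)

    clique = []
    for v in candidates:
        if all(sim[v][u] > delta for u in clique):
            clique.append(v)
    return clique


sim = [
    [1.0, 0.9, 0.3, 0.8],
    [0.9, 1.0, 0.8, 0.7],
    [0.3, 0.8, 1.0, 0.6],
    [0.8, 0.7, 0.6, 1.0],
]
labels = [0, 0, 0, 1]
vertex_weight = [0.9, 0.8, 0.7, 0.6]
selected = greedy_labeled_clique(sim, vertex_weight, labels, target_label=0, delta=0.5)
```

With the threshold at 0.5, vertex 2 is left out because its similarity to vertex 0 is only 0.3; lowering the threshold admits it, which mirrors how the threshold controls how many EEGs are selected.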
The EEG data we experiment with are slow cortical potentials (SCPs; the data set is publicly available as an online archive at http://www.bbci.de/competition/ii/) provided by the Institute of Medical Psychology and Behavioral Neurobiology at the University of Tübingen. In detail, Dataset Ia, with 135 EEG trials labeled ’0’ (Traindata_0, Ia) and 133 labeled ’1’ (Traindata_1, Ia), is taken from a healthy subject (HS). Dataset Ib, with 100 EEG trials labeled ’0’ (Traindata_0, Ib) and 100 labeled ’1’ (Traindata_1, Ib), is taken from a subject with amyotrophic lateral sclerosis (ALS). Three experimental cases are set up in Table I. In this paper, we apply the hold-out strategy to evaluate the methods. Each data set is divided into training data and testing data in the proportion 2:1. Furthermore, the hold-out strategy is applied 3 times to produce 3 groups of training and testing data for the methods.
| EEG Cases | Datasets | Training : Testing | # of Classes |
|---|---|---|---|
| EEG Case 1 | Traindata_0, Ia + Traindata_1, Ia | 180 : 88 (268) | 2 |
| EEG Case 2 | Traindata_0, Ib + Traindata_1, Ib | 134 : 66 (200) | 2 |
| EEG Case 3 | | | |
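The repeated hold-out protocol described above can be sketched as follows. This is a minimal illustration assuming a plain random shuffle with a fixed seed; the exact rounding and any per-class stratification that produce the sizes in Table I are not stated, so the counts here may differ from the table by a trial or two. The function name and parameters are hypothetical.

```python
import random


def repeated_holdout(trials, repeats=3, train_fraction=2 / 3, seed=0):
    """Produce `repeats` random train/test splits in roughly a 2:1 ratio.

    trials: any sequence of items (e.g. trial indices)
    Returns a list of (train, test) pairs; each pair partitions `trials`.
    """
    rng = random.Random(seed)  # fixed seed keeps the splits reproducible
    splits = []
    for _ in range(repeats):
        shuffled = list(trials)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        splits.append((shuffled[:cut], shuffled[cut:]))
    return splits


# 200 trial indices, as in a data set with 100 trials per label
splits = repeated_holdout(range(200))
```

Results for each method would then be averaged over the three splits.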
IV-B Evaluation Methodology
Rand index (RI) estimates the quality of classification with respect to the true classes of the data. It measures the percentage of correct decisions made by the method. In detail, $RI = \frac{TP + TN}{TP + FP + TN + FN}$, where TP, FP, TN, and FN respectively denote the numbers of true positives, false positives, true negatives, and false negatives.
Fleiss’ kappa ($\kappa$) is a statistical measure for assessing the coherence of decision ratings among classes. Mathematically, $\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}$, where $\bar{P} - \bar{P}_e$ denotes the degree of agreement actually achieved over chance and $1 - \bar{P}_e$ denotes the degree of agreement attainable above chance. Meanwhile, $N$, $n$, and $k$ denote the number of subjects, the number of ratings per subject, and the number of classes into which assignments are made.
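As a worked example, assuming the usual form RI = (TP + TN) / (TP + FP + TN + FN), the index can be computed directly from the four confusion counts. The function name and the sample counts are hypothetical.

```python
def rand_index(tp, fp, tn, fn):
    """Fraction of correct decisions among all decisions."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one decision is required")
    return (tp + tn) / total


# 3 true positives, 1 false positive, 4 true negatives, 2 false negatives
ri = rand_index(tp=3, fp=1, tn=4, fn=2)
```

Here 7 of the 10 decisions are correct, so the index is 0.7.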
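A minimal sketch of the standard Fleiss' kappa computation follows, using the textbook form κ = (P̄ − P̄ₑ) / (1 − P̄ₑ) on a subjects-by-classes count table. The function name and the input layout (one row per subject, one column per class, every row summing to the same number of ratings) are assumptions for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = ratings of subject i in class j.

    Every row must sum to the same number of ratings (at least 2).
    """
    if not counts:
        raise ValueError("at least one subject is required")
    n_subjects = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2 or any(sum(row) != n_ratings for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_classes = len(counts[0])

    # observed agreement per subject, then averaged over subjects
    p_i = [
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # chance agreement from the overall share of each class
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_ratings)
        for j in range(n_classes)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one class")
    return (p_bar - p_e) / (1.0 - p_e)


full_agreement = fleiss_kappa([[2, 0], [0, 2]])
full_disagreement = fleiss_kappa([[1, 1], [1, 1]])
```

The two toy tables mark the extremes: complete agreement between the two ratings of each subject, and a systematic split on every subject.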
We compared mwcEEGs with the state-of-the-art EEG time series selection methods, as follows.
lwEEGs: Local weighted EEG time series selection computes the time series centroid of each class and selects the nearest [21, 22] time series with the same label to the corresponding centroid as the training time series.
gwEEGs: Global weighted EEG time series selection computes the centroid of all labeled time series and selects the closest ones from all classes to the centroid time series as the training ones for classifiers.
lrtEEGs: Local recursion testing EEG time series selection recursively selects the nearest time series to every testing one with the same label and chooses the nearest ones from the selected time series in each class as the training data for classifiers, which focuses on the local correlation between the time series and the testing ones.
grtEEGs: Global recursion testing EEG time series selection recursively selects the nearest time series to each testing time series without considering the class labels and then chooses the most similar ones as the training time series for classifiers. It considers the global similarity between all the time series and the entire set of testing time series.
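To illustrate the centroid-based idea behind the lwEEGs baseline, the sketch below computes a pointwise-mean centroid per class and keeps the k series closest to it. The Euclidean distance, the pointwise mean, the parameter `k`, and the function name are assumptions for this example; the baselines' actual distance measure and selection size are not specified here.

```python
import math


def select_nearest_to_centroid(series_by_label, k):
    """Keep, for each label, the k series nearest to that label's centroid.

    series_by_label: dict mapping label -> list of equal-length series
    The centroid is the pointwise mean; distance is Euclidean (assumed).
    """
    selected = {}
    for label, series in series_by_label.items():
        length = len(series[0])
        centroid = [
            sum(s[t] for s in series) / len(series) for t in range(length)
        ]
        # sorted() is stable, so ties keep their original order
        ranked = sorted(series, key=lambda s: math.dist(s, centroid))
        selected[label] = ranked[:k]
    return selected


data = {
    0: [[0, 0], [1, 1], [10, 10]],  # the last series is an outlier
    1: [[5, 5], [5, 6]],
}
training = select_nearest_to_centroid(data, k=2)
```

In the toy data the outlying series of class 0 is dropped, which is the intended effect of selecting around a class centroid. A global variant in the style of gwEEGs would pool all classes before computing a single centroid.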
Meanwhile, in order to evaluate all the methods to select EEGs, we apply most popular and newest classifiers to classify EEGs with EEG selection methods. The applied classifiers are introduced below, which mainly include most widely applied SVM, shapelet-based, ensemble-based, and structure-based classifiers.
SVM: We apply LIBSVM  in this section as one of the baselines to classify EEG data. With LIBSVM, the kernel width and of SVM are respectively tuned as and .
st-TSC: Shapelet transform-based time series classifier extracts discriminative subsequences that best distinguish time series in different classes and uses an optimization formulation to search for fixed-length time series subsequences that best predict the target variable by calculating the distances from a series to each shapelet.
RPCD: Recurrence patterns compression distance time series classifier uses recurrence plots as the representation domain for time series classification, applying the Campana-Keogh distance to estimate similarity.
COTE: An ensemble-based classifier that classifies time series by applying a heterogeneous ensemble to transformed representations. A flat collective of transform-based ensembles (COTE) fuses various classifiers into a single one, which includes whole time series classifiers, shapelet classifiers, and spectral classifiers.
IV-D Parameter Setting
For mwcEEGs, the similarity threshold determines the selection of EEGs and affects the classification performance. With a smaller threshold, mwcEEGs selects less discriminative but more general EEGs, whereas with a larger threshold it selects more discriminative but less general EEGs. The classification results of the classifiers are influenced by the selected EEGs; that is, the final classification results depend on the threshold. The more distinguished EEGs selected with a large threshold degrade the classification performance, because the selected discriminative EEGs have low generality and cannot represent the general EEG data. As a consequence, mwcEEGs with a moderate or optimal threshold selects a better set of EEGs that balances discrimination and generality, and it tends to achieve better classification performance. Here we select the optimal threshold based on the discrete probability distribution of similarities among EEGs. Mathematically, the discrete probability distribution is defined as
where denotes the number of vertices (EEGs) in the similarity-weighted graph , denotes the number of similarities located in .
Similarity threshold in the work for EEG selection with 3 datasets is respectively set based on the discrete probability distribution shown in (a) – (e) of Figure 2. Panels (a) – (e) respectively show that most EEGs from the same dataset are similar to each other, and the EEG similarity is displayed in (f).
To illustrate the influence of on mwcEEGs, we set for 3 EEG Cases. The number of selected EEGs for lwEEGs, gwEEGs, lrtEEGs, and grtEEGs is set to the same value as for mwcEEGs. For the classifiers SVM, st-TSC, RPCD, COTE, and SAX-SEQL, we use the optimal parameters given in the corresponding references. As introduced before, the data set in each of the 3 cases is divided into 3 groups of training and testing data in the proportion 2:1 based on the hold-out strategy. All the methods are run on the 3 groups of EEG data for each case and the results are averaged.
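A discrete distribution of pairwise similarities can be estimated with a simple histogram, as sketched below. The normalization by the number of unordered vertex pairs, the equal-width bins on [0, 1], and the function name are assumptions for this example; the exact definition in (12) is not reproduced here.

```python
def similarity_distribution(sim, bins=10):
    """Share of pairwise similarities falling in each equal-width bin of [0, 1].

    sim: symmetric matrix (list of lists) with normalized similarities.
    Only the upper triangle is counted, so each pair contributes once.
    """
    n = len(sim)
    if n < 2:
        raise ValueError("at least two vertices are required")
    counts = [0] * bins
    for i in range(n):
        for j in range(i + 1, n):
            # a similarity of exactly 1.0 belongs to the last bin
            index = min(int(sim[i][j] * bins), bins - 1)
            counts[index] += 1
    pairs = n * (n - 1) // 2
    return [c / pairs for c in counts]


sim = [
    [1.0, 0.95, 0.15],
    [0.95, 1.0, 0.92],
    [0.15, 0.92, 1.0],
]
distribution = similarity_distribution(sim)
```

A threshold could then be read off the histogram, for example at the lower edge of the bin range that holds most of the probability mass.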
IV-E Experimental Results and Discussion
In this paper, we proposed the maximum weight clique inspired method mwcEEGs to select EEGs. To establish the efficacy of our method, we compared mwcEEGs with state-of-the-art time series selection methods on several popular and recent time series classifiers for EEG classification. The experimental results for the 3 cases (3 datasets) are shown in Figures 3 – 5 respectively. The results show that, with EEGs selected by mwcEEGs, the classification performance improves compared with that obtained without EEG selection. Moreover, a small or moderate similarity threshold yields better classification performance than a larger one. In other words, a small or moderate threshold calculated with (12) achieves high-quality EEG classification. The reason is that the larger the threshold is, the more discriminative the EEGs selected by mwcEEGs are. That is to say, the discriminative EEGs selected with a larger threshold lose more of the general features of EEGs and cannot represent the general EEGs, so the classification results are probably lower than those achieved with a smaller threshold, or even lower than those obtained without selection. As a consequence, mwcEEGs with a small or moderate threshold yields the best classification results with respect to RI, F-score, and kappa, which indicates that mwcEEGs is superior to the state-of-the-art time series selection methods for EEG classification on several promising classifiers.
This paper explores brain EEG selection. Raw EEGs without selection contain many invalid/noisy data, which degrade the corresponding learning performance. Since EEG is weak, complicated, fluctuating, and has a low signal-to-noise ratio, conventional time series selection methods are not applicable to EEG selection. To address this issue, a novel approach (called mwcEEGs) based on the maximum weight clique is proposed to select valid EEGs. The main idea of mwcEEGs is to map EEG selection to searching for a family of cliques with maximum weights, simultaneously combining edge weights and vertex weights in an improved Fréchet distance-weighted EEG graph, while reducing the influence of invalid/noisy EEGs according to similarity thresholds. The experimental comparisons with state-of-the-art time series selection methods, based on different evaluation criteria on real-world EEG data, demonstrate the effectiveness of mwcEEGs for EEG selection.
This work was partially supported by National Natural Science Foundation of China (Nos. U1433116 and 61702355), Fundamental Research Funds for the Central Universities (Grant No. NP2017208), and Funding of Jiangsu Innovation Program for Graduate Education (Grant No. KYZZ16_0171).
-  E. Barzegaran, B. van Damme, R. Meuli, and M. G. Knyazeva, “Perception-related eeg is more sensitive to alzheimer’s disease effects than resting eeg,” Neurobiology of aging, vol. 43, pp. 129–139, 2016.
-  K. Samiee, P. Kovacs, and M. Gabbouj, “Epileptic seizure classification of eeg time-series using rational discrete short-time fourier transform,” IEEE transactions on Biomedical Engineering, vol. 62, no. 2, pp. 541–552, 2015.
-  B. Hordacre, N. C. Rogasch, and M. R. Goldsworthy, “Commentary: Utility of eeg measures of brain function in patients with acute stroke,” Frontiers in human neuroscience, vol. 10, p. 621, 2016.
-  Y. Jonmohamadi, G. Poudel, C. Innes, and R. Jones, “Source-space ica for eeg source separation, localization, and time-course reconstruction,” NeuroImage, vol. 101, pp. 720–737, 2014.
-  R. Jiang, H. Fei, and J. Huan, “A family of joint sparse pca algorithms for anomaly localization in network data streams,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2421–2433, 2013.
-  W. Wu, Z. Chen, X. Gao, Y. Li, E. N. Brown, and S. Gao, “Probabilistic common spatial patterns for multichannel eeg analysis,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 3, pp. 639–653, 2015.
-  R. R. Vázquez, H. Velez-Perez, R. Ranta, V. L. Dorr, D. Maquin, and L. Maillard, “Blind source separation, wavelet denoising and discriminant analysis for eeg artefacts and noise cancelling,” Biomedical Signal Processing and Control, vol. 7, no. 4, pp. 389–400, 2012.
-  Y. Liu, W. Zhou, Q. Yuan, and S. Chen, “Automatic seizure detection using wavelet transform and svm in long-term intracranial eeg,” IEEE transactions on neural systems and rehabilitation engineering, vol. 20, no. 6, pp. 749–755, 2012.
-  P. K. Agarwal, R. B. Avraham, H. Kaplan, and M. Sharir, “Computing the discrete fréchet distance in subquadratic time,” in Proceedings of the Twenty-fourth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’13. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2013, pp. 156–167. [Online]. Available: http://dl.acm.org/citation.cfm?id=2627817.2627829
-  E. J. Keogh and M. J. Pazzani, “Derivative dynamic time warping,” in Proceedings of the 2001 SIAM International Conference on Data Mining. SIAM, 2001, pp. 1–11.
-  F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognition, vol. 44, no. 3, pp. 678–693, 2011.
-  A. Chouakria-Douzal and P. N. Nagabhushan, “Improved fréchet distance for time series,” in Data Science and Classification. Springer, 2006, pp. 13–20.
-  Q. Wu and J.-K. Hao, “A review on algorithms for maximum clique problems,” European Journal of Operational Research, vol. 242, no. 3, pp. 693–709, 2015.
-  J.-H. Kim, “Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap,” Computational statistics & data analysis, vol. 53, no. 11, pp. 3735–3745, 2009.
-  J. Wu, Z. Cai, and X. Zhu, “Self-adaptive probability estimation for naive bayes classification,” in The 2013 International Joint Conference on Neural Networks (IJCNN), Aug 2013, pp. 1–8.
-  J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, “Multiple structure-view learning for graph classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2018.
-  J. Wu, X. Zhu, C. Zhang, and Z. Cai, “Multi-instance multi-graph dual embedding learning,” in 2013 IEEE 13th International Conference on Data Mining, 2013, pp. 827–836.
-  W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical association, vol. 66, no. 336, pp. 846–850, 1971.
-  D. C. Blair, “Information retrieval, cj van rijsbergen. london: Butterworths; 1979: 208 pp,” 1979.
-  J. L. Fleiss, “Measuring nominal scale agreement among many raters.” Psychological bulletin, vol. 76, no. 5, p. 378, 1971.
-  J. Wu, Z. Cai, and Z. Gao, “Dynamic k-nearest-neighbor with distance and attribute weighted for classification,” in 2010 International Conference on Electronics and Information Engineering, vol. 1, 2010, pp. V1–356–V1–360.
-  J. Wu, Z.-h. Cai, and S. Ao, “Hybrid dynamic k-nearest-neighbour and distance and attribute weighted method for classification,” Int. J. Comput. Appl. Technol., vol. 43, no. 4, pp. 378–384, Jun. 2012.
-  L. Jiang, “Learning instance weighted naive bayes from labeled and unlabeled data,” Journal of intelligent information systems, vol. 38, no. 1, pp. 257–268, 2012.
-  C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM transactions on intelligent systems and technology (TIST), vol. 2, no. 3, p. 27, 2011.
-  J. Lines, L. M. Davis, J. Hills, and A. Bagnall, “A shapelet transform for time series classification,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012, pp. 289–297.
-  D. F. Silva, V. M. De Souza, and G. E. Batista, “Time series classification using compression distance of recurrence plots,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on. IEEE, 2013, pp. 687–696.
-  A. Bagnall, J. Lines, J. Hills, and A. Bostrom, “Time-series classification with cote: the collective of transformation-based ensembles,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 9, pp. 2522–2535, 2015.
-  T. Le Nguyen, S. Gsponer, and G. Ifrim, “Time series classification by sequence learning in all-subsequence space,” in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017, pp. 947–958.
-  J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing sax: a novel symbolic representation of time series,” Data Mining and knowledge discovery, vol. 15, no. 2, pp. 107–144, 2007.