Riemannian classification of EEG signals with missing values

by Alexandre Hippert-Ferrer, et al.

This paper proposes two strategies to handle missing data for the classification of electroencephalograms using covariance matrices. The first approach estimates the covariance from data imputed with the k-nearest neighbors algorithm; the second relies on the observed data only, by leveraging the observed-data likelihood within an expectation-maximization algorithm. Both approaches are combined with the minimum distance to Riemannian mean classifier and applied to a classification task of event-related potentials, a widely used brain-computer interface paradigm. As the results show, the proposed strategies perform better than classification based on observed data alone, and maintain a high accuracy even when the missing data ratio increases.




1 Introduction

Electroencephalography (EEG) is a non-invasive neuroimaging modality pioneered by Hans Berger [6], which consists in recording the electrical activity of the brain by placing electrodes on the scalp. Its low cost, simplicity and high temporal resolution (it captures brain activity dynamics well) have made EEG popular and allowed for its use in various applications. EEG is indeed widely used in brain research, for instance to study sleep or epilepsy. It is also central to brain-computer interfaces (BCI), where the subject interacts with a computer through brain signals. These can for example be employed to control an exoskeleton [14] or to assist mechanical ventilation [7].

This paper focuses on BCI and, more specifically, on one of the various possible paradigms: event-related potentials (ERP), which are specific time-locked responses of the brain to some stimuli, such as flashes [23]. Given the brain signals, the goal is to determine whether the subject was exposed to the stimulus, i.e., to find out whether the specific time-locked response is in the EEG recording. Thus, BCI boils down to a classification task on the brain signals at hand. Various methods have been specifically designed to achieve this; see [15] for a recent review. In this work, we consider one of the most efficient approaches, called minimum distance to Riemannian mean (MDRM) [4], which exploits covariance matrices of the data along with their natural Riemannian geometry.

BCI classification methods have shown good performance on a number of classical datasets; see e.g. [12]. However, these datasets have been recorded in laboratories, which are well-controlled environments, and the data have been curated by hand to ensure the quality of the recordings. In real-life scenarios, obtaining such idealistic data seems unrealistic. One can expect the data at hand to contain faulty measurements, due for instance to impedance changes or electrode disconnections [3], or artifacts coming from movements of the subject, blinks, etc. These can lead to missing data in EEG recordings, as one may decide to discard the corrupted segments from the data. When it comes to classification tasks, missing data impede the exploitation of complementary information in non-missing channels. Considering this, it is of primary importance to develop an appropriate procedure to classify incomplete EEG signals.

To achieve this, several strategies can be employed. A first possibility is to exploit a missing data imputation technique in order to retrieve a complete EEG recording. This has for instance been considered for EEG in [18] with an algorithm relying on spherical spline interpolation. Imputation can also be performed through the $k$-nearest neighbors (KNN) algorithm [22]. Another possibility is to work directly in the covariance space. When a channel is missing, an imputation of covariance matrices with uncorrelated white noise has been proposed in [19]. It is also possible to estimate the covariance matrix of incomplete data using an expectation-maximization (EM) algorithm [9]. More recently, a neural network approach was designed to interpolate missing electrodes in EEG [20].
In this paper, our contribution is to adapt the state-of-the-art classification pipeline of EEG ERP data [5] to the case of incomplete data. To do so, we consider two different strategies: the missing data imputation approach using KNN and the covariance estimation based on the EM algorithm. The proposed implementations are validated and compared on a classical real EEG dataset [23] with simulated missing data patterns.

2 Covariance estimation with missing data

A common hypothesis for EEG signals is to assume a multivariate normal distribution [8]. Let $X = [x_1, \dots, x_T] \in \mathbb{R}^{C \times T}$ be a recorded EEG trial containing $T$ $C$-dimensional real vectors, where $C$ is the number of electrodes and $T$ the number of time samples. Each $x_t$ follows a zero-mean Gaussian distribution with a covariance matrix $\Sigma \in \mathbb{R}^{C \times C}$, that is $x_t \sim \mathcal{N}(0, \Sigma)$. When the data is complete, the maximum likelihood estimator (MLE) is the sample covariance matrix (SCM):

$$\hat{\Sigma}_{\text{SCM}} = \frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top. \quad (1)$$
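As a quick numerical illustration of (1), the SCM of a complete trial can be computed in a few lines of NumPy (sizes and variable names are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C, T = 16, 128                    # electrodes, time samples (illustrative)
X = rng.standard_normal((C, T))   # one zero-mean EEG trial, columns x_t

# SCM: (1/T) * sum_t x_t x_t^T, written as a single matrix product
scm = X @ X.T / T
```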
Let us now assume that the data is incomplete, i.e., some recorded elements of $X$ are missing, as in Figure 1. There are multiple sources of missingness within an EEG trial: electrode popping, artifacts created by eye blinking or jaw movements, and more generally post-processing steps that discard poor-quality data. It is then likely that the missingness pattern is a general pattern with multiple blocks missing in time and space.

Figure 1: Illustration of three rectangular data sets with different missing data patterns (black = observed, white = missing): monotone, general and random patterns.

2.1 Data imputation

The estimation of a classification feature such as the SCM requires the data to be complete. In the case of incomplete data, one natural strategy is to rely on imputation techniques before performing any classification task. In this work, we focus on $k$-nearest neighbors (KNN) interpolation [22], which is an obvious candidate for EEG due to its ability to capture spatial channel correlation. If the $i$-th electrode of $x_t$ is missing, KNN finds its $k$ closest neighbors in the training set according to a Euclidean distance compatible with missing data, called the heterogeneous Euclidean-overlap metric (HEOM) [10]. If $\mathcal{N}_k$ represents the set of $k$ selected neighbors, then the imputed value is

$$\hat{x}_{i,t} = \frac{\sum_{j \in \mathcal{N}_k} w_j \, x_{i,j}}{\sum_{j \in \mathcal{N}_k} w_j}, \quad (2)$$

where $w_j$ denotes the corresponding weight of the $j$-th nearest neighbor and $d^2$ is the squared HEOM. The choice of $k$ can be determined by cross-validation on observed data only, which can be challenging with a high missing data ratio. If $\hat{X}$ is the totally imputed dataset, the covariance can then be estimated using (1) with $X$ replaced by $\hat{X}$.
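A minimal sketch of this strategy with scikit-learn's `KNNImputer` (note: this imputer uses a NaN-aware Euclidean distance rather than the exact HEOM of [10], so it is only a close stand-in; sizes are illustrative):

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)
C, T = 16, 128
X = rng.standard_normal((C, T))
X[3, 40:60] = np.nan              # one electrode missing over a time block

# Treat each time sample as a C-dimensional observation (rows = samples),
# so neighbors are found among time samples and channels are imputed.
imputer = KNNImputer(n_neighbors=5, weights="distance")
X_hat = imputer.fit_transform(X.T).T

# Covariance of the imputed trial, as in (1)
scm = X_hat @ X_hat.T / T
```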

2.2 An EM algorithm for covariance matrix estimation

A second strategy to deal with missing values is to directly estimate the covariance feature without imputation. In this case, one can rely on maximum likelihood estimation using the EM algorithm developed in [11]. Here, the unknown parameter to estimate is simply $\Sigma$. In the following EM algorithm, each $x_t$ is transformed into $\tilde{x}_t = P_t x_t = [(x_t^{(o)})^\top, (x_t^{(m)})^\top]^\top$, where $P_t$ is a permutation matrix, and $x_t^{(o)}$ and $x_t^{(m)}$ denote the observed and missing elements of $x_t$, respectively. The covariance matrix of $\tilde{x}_t$ is

$$\tilde{\Sigma}_t = P_t \Sigma P_t^\top = \begin{bmatrix} \Sigma_t^{(oo)} & \Sigma_t^{(om)} \\ \Sigma_t^{(mo)} & \Sigma_t^{(mm)} \end{bmatrix}, \quad (3)$$

where $\Sigma_t^{(oo)}$, $\Sigma_t^{(mm)}$ and $\Sigma_t^{(om)} = (\Sigma_t^{(mo)})^\top$ are the block covariance matrices of $x_t^{(o)}$, of $x_t^{(m)}$, and of $x_t^{(o)}$ and $x_t^{(m)}$. Replacing $x_t$ by its permuted version $\tilde{x}_t$, the complete log-likelihood for the Gaussian distribution is expressed as:

$$\mathcal{L}(\Sigma) = -\frac{1}{2} \sum_{t=1}^{T} \left( \log |\tilde{\Sigma}_t| + \tilde{x}_t^\top \tilde{\Sigma}_t^{-1} \tilde{x}_t \right). \quad (4)$$
The EM algorithm, which is given in Algorithm 1, can now be formulated.

Initialization. At step $k = 0$ of the algorithm, initialize the covariance matrix with the SCM computed from the fully observed samples (i.e., with no missing values), denoted $\Sigma^{(0)}$.

E-step. Compute the expectation of the complete log-likelihood, conditioned on the observed data and the covariance estimated at the $k$-th iteration:

$$Q(\Sigma \mid \Sigma^{(k)}) = \mathbb{E}\!\left[ \mathcal{L}(\Sigma) \,\middle|\, \{x_t^{(o)}\}, \Sigma^{(k)} \right]. \quad (5)$$

The only expectation to compute is $\mathbb{E}[\tilde{x}_t \tilde{x}_t^\top \mid x_t^{(o)}, \Sigma^{(k)}]$, which is the expectation of the sufficient statistic (denoted $S_t^{(k)}$ hereafter). A classical result on conditional Gaussian distributions [1] gives the following expression:

$$S_t^{(k)} = \begin{bmatrix} x_t^{(o)} (x_t^{(o)})^\top & x_t^{(o)} \mu_t^\top \\ \mu_t (x_t^{(o)})^\top & \Sigma_t^{(mm)} - \Sigma_t^{(mo)} (\Sigma_t^{(oo)})^{-1} \Sigma_t^{(om)} + \mu_t \mu_t^\top \end{bmatrix}, \quad (6)$$

with conditional mean $\mu_t = \Sigma_t^{(mo)} (\Sigma_t^{(oo)})^{-1} x_t^{(o)}$, all blocks being taken from $\Sigma^{(k)}$.

M-step. Solve the following maximization problem:

$$\Sigma^{(k+1)} = \underset{\Sigma \in \mathcal{S}_{++}}{\arg\max} \; Q(\Sigma \mid \Sigma^{(k)}), \quad (7)$$

where $\mathcal{S}_{++}$ is the set of symmetric positive definite (SPD) matrices. Solving (7) requires solving $\partial Q / \partial \Sigma = 0$. Fortunately, in this case the following closed-form expression is obtained with simple matrix calculus:

$$\Sigma^{(k+1)} = \frac{1}{T} \sum_{t=1}^{T} P_t^\top S_t^{(k)} P_t. \quad (8)$$

1: Input: incomplete trial $X$, permutation matrices $\{P_t\}_{t=1}^{T}$
2: Initialize $\Sigma^{(0)}$ with the SCM of the fully observed samples; $k \leftarrow 0$
3: repeat
4:     Partition $\Sigma^{(k)}$ into the blocks of (3) for each $t$
5:     Compute $\mu_t$ and $S_t^{(k)}$ with (6)
6:     Compute $\Sigma^{(k+1)}$ with (8)
7:     $k \leftarrow k + 1$
8: until $\Sigma^{(k)}$ converges
Algorithm 1 Estimation of $\Sigma$ with the EM algorithm
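The E- and M-steps above can be sketched in NumPy as follows. This is a simplified reading of the algorithm, assuming a zero-mean model, NaN-coded missing entries, and direct boolean indexing in place of explicit permutation matrices; the function and variable names are illustrative:

```python
import numpy as np

def em_covariance(X, n_iter=50, tol=1e-6):
    """EM estimate of the covariance of zero-mean Gaussian data X (C x T)
    whose missing entries are marked with NaN."""
    C, T = X.shape
    obs = ~np.isnan(X)
    # Initialization: SCM of the fully observed time samples
    full = obs.all(axis=0)
    Xf = X[:, full]
    Sigma = Xf @ Xf.T / full.sum()
    for _ in range(n_iter):
        S = np.zeros((C, C))
        for t in range(T):
            o, m = obs[:, t], ~obs[:, t]
            x_o = X[o, t]
            St = np.zeros((C, C))
            if m.any():
                # E-step: conditional mean and covariance of the missing part
                Soo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
                mu = Sigma[np.ix_(m, o)] @ Soo_inv @ x_o
                cond_cov = (Sigma[np.ix_(m, m)]
                            - Sigma[np.ix_(m, o)] @ Soo_inv @ Sigma[np.ix_(o, m)])
                St[np.ix_(o, o)] = np.outer(x_o, x_o)
                St[np.ix_(o, m)] = np.outer(x_o, mu)
                St[np.ix_(m, o)] = St[np.ix_(o, m)].T
                St[np.ix_(m, m)] = cond_cov + np.outer(mu, mu)
            else:
                St = np.outer(x_o, x_o)
            S += St
        Sigma_new = S / T          # M-step: closed-form update
        if np.linalg.norm(Sigma_new - Sigma) < tol:
            Sigma = Sigma_new
            break
        Sigma = Sigma_new
    return Sigma

# Illustrative run: one channel missing over a time block
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 200))
X[0, :30] = np.nan
Sigma_hat = em_covariance(X)
```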

3 Riemannian classification of EEG ERP

This section illustrates, based on [5], how it is possible to classify EEG trials using their covariance matrices.

3.1 Feature extraction

The main contribution of [5] consists in a smart way of building the covariance feature for ERP data, embedding both the spatial and temporal structural information. For that, each trial is augmented using a reference signal $r \in \mathbb{R}^{C \times T}$, i.e. the signal waveform, which is the average of several single-trial responses of the target class:

$$r = \frac{1}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} X_i, \quad (9)$$

where $\mathcal{I}$ is the set of indexes of the trials belonging to the target class. For each trial $X_i$, a super trial $\tilde{X}_i$ is built by concatenating $r$ and $X_i$ such that $\tilde{X}_i = [r^\top, X_i^\top]^\top \in \mathbb{R}^{2C \times T}$. Then, the classical approach consists in estimating the SCM using (1) with $X$ replaced by $\tilde{X}_i$, resulting in $2C \times 2C$ covariance matrices as features to classify.

In the case of incomplete trials, a new reference $\hat{r}$ must be estimated using (9) from imputed trials, e.g. using KNN. For the imputation and EM strategies described in Sections 2.1 and 2.2, the super trials respectively become

$$\tilde{X}_i = [\hat{r}^\top, \hat{X}_i^\top]^\top \quad \text{and} \quad \tilde{X}_i = [\hat{r}^\top, \check{X}_i^\top]^\top, \quad (10)$$

where $\check{X}_i$ is the transformation of $X_i$ using a set of permutation matrices as described in Section 2.2. For the imputation strategy, the covariance is estimated using (1) with $X$ replaced by $\tilde{X}_i$. For the EM strategy, the covariance is estimated using Algorithm 1 with $X$ replaced by $\tilde{X}_i$.
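The super-trial construction can be sketched as follows (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
C, T = 16, 128
target_trials = rng.standard_normal((40, C, T))   # trials of the target class

# Reference signal: average of the target-class trials, as in (9)
r = target_trials.mean(axis=0)

# Super trial: stack the reference on top of a trial -> shape (2C, T)
x = rng.standard_normal((C, T))
x_super = np.vstack([r, x])

# Augmented covariance feature: (1) applied to the super trial
scm_super = x_super @ x_super.T / T
```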

3.2 Minimum distance to Riemannian mean

Concerning the classification of EEG signals through the covariance matrix feature (and its augmented counterpart), the minimum distance to mean (MDM) algorithm [4] has proven to be an efficient and reliable classifier, as reported in previous experiments. Associated with a Riemannian framework, the methodology has shown increased classification accuracy [5].

The idea behind this approach consists in leveraging a non-Euclidean distance based on the geometry of the space of SPD matrices (and thus of covariance matrices). In particular, theoretical studies have shown the benefits of using the so-called affine-invariant distance [17]. Given two covariance matrices $\Sigma_1, \Sigma_2 \in \mathcal{S}_{++}$, its expression is given by:

$$\delta(\Sigma_1, \Sigma_2) = \left\| \log\!\left( \Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2} \right) \right\|_F, \quad (11)$$

where $\|\cdot\|_F$ is the Frobenius norm and $\log(\cdot)$ is the matrix logarithm. Then, given a classification problem with $K$ classes and training data, the training phase consists in computing the mean $\bar{\Sigma}_k$ of each class according to the affine-invariant distance by using the iterative algorithm of [13]. A predicted label is then obtained on a new sample $\Sigma$ by assigning it to the closest mean:

$$\hat{y} = \underset{k}{\arg\min} \; \delta(\Sigma, \bar{\Sigma}_k). \quad (12)$$
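A minimal sketch of the affine-invariant distance and the MDM decision rule. In practice the class means would come from the iterative Riemannian-mean algorithm of [13] (e.g., as implemented in pyriemann); here they are assumed to be precomputed, and the function names are illustrative:

```python
import numpy as np

def inv_sqrtm(S):
    """Inverse square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def airm_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F,
    computed from the eigenvalues of the whitened matrix."""
    w = np.linalg.eigvalsh(inv_sqrtm(A) @ B @ inv_sqrtm(A))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def mdm_predict(S, class_means):
    """MDM rule: assign S to the class whose mean is closest."""
    return int(np.argmin([airm_distance(S, M) for M in class_means]))
```

For instance, in dimension 2, $\delta(I, 4I) = \sqrt{2}\,\log 4$, and a sample near $4I$ is assigned to that class.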
4 Numerical experiments

4.1 Data description and pre-processing

The proposed procedures are assessed through a detection task of P300 ERPs triggered by target stimuli consisting of flashes [23]. The data is accessed through the BNCI 2014-009 P300 dataset readily available from the MOABB platform [12]. This dataset consists of 3 recording sessions of EEG potentials on 10 subjects using 16 Ag/AgCl electrodes¹ covering the left, right and central scalp. Before assessing performance, signals are filtered between 1 and 20 Hz, downsampled to 128 Hz and then segmented into 1728 overlapping epochs, each lasting 103 ms.

¹ Fz, FCz, Cz, CPz, Pz, Oz, F3, F4, C3, C4, CP3, CP4, P3, P4, PO7, PO8. Please refer to [2] for an extensive description of the dataset.

To approach a realistic scenario of incomplete EEG signals, missing data are simulated in space and time, e.g., multiple electrodes popping over various time intervals according to a general pattern (see Figure 1). For that purpose, spatio-temporal blocks of data are set as missing in randomly chosen EEG trials (from 6% to 49% of the total number of trials): in each incomplete trial, 10 out of 16 electrodes are set as missing during a time block (from 5 ms to 45 ms).
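This simulation protocol can be sketched as follows (the trial count, incompleteness ratio and block length in samples are illustrative stand-ins for the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, C, T = 100, 16, 128
trials = rng.standard_normal((n_trials, C, T))

ratio = 0.39        # fraction of incomplete trials (illustrative)
block_len = 10      # length of the missing time block, in samples
n_miss_ch = 10      # electrodes set as missing per incomplete trial

# Choose distinct incomplete trials, then blank a spatio-temporal block
incomplete = rng.choice(n_trials, size=int(ratio * n_trials), replace=False)
for i in incomplete:
    chans = rng.choice(C, size=n_miss_ch, replace=False)
    start = int(rng.integers(0, T - block_len))
    trials[i, chans, start:start + block_len] = np.nan
```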

4.2 Results

Figure 2: Classification mean accuracy (%) as a function of the classification pipeline for each subject of the experiment, with 39% incomplete epochs and missing time blocks of 35 ms.

To compare the accuracy of the binary classification of EEG ERP, the following strategies are considered: Gaussian: the MDRM method using the SCM (1) of the complete dataset, i.e. without missing values; Gaussian observed: the MDRM method using the SCM estimated from the fully observed samples only; KNN + Gaussian: a KNN imputer [22] from the scikit-learn package [16] is applied to the incomplete data with $k$ neighbors, then the MDRM method is used as in the "Gaussian" strategy; Gaussian EM: a hybrid strategy in which the covariance matrix is estimated through Algorithm 1 and fed into the MDRM classifier.

The performance is evaluated using the classical training-test paradigm with stratified $k$-folding ($k = 5$ folds). Algorithms are calibrated on the data recorded during the training phase, and applied to the data recorded during the test phase. Note that the data are augmented into super trials, as in (9) for complete data or (10) for incomplete data, during both training and testing.
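The evaluation loop can be sketched with scikit-learn's stratified splitter. The classifier inside the loop is a trivial majority-class stand-in for the actual MDRM pipeline, and all sizes are illustrative:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(5)
n_trials = 60
features = rng.standard_normal((n_trials, 8))   # stand-in for covariance features
labels = np.array([0, 1] * (n_trials // 2))     # balanced target / non-target

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in skf.split(features, labels):
    # Calibrate on the training folds (class means, imputer, ...), then
    # score on the held-out fold; a majority-class stand-in is used here.
    majority = int(np.bincount(labels[train_idx]).argmax())
    accuracies.append(float(np.mean(labels[test_idx] == majority)))
mean_accuracy = float(np.mean(accuracies))
```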

Accuracy results are presented in Figures 2 and 3. Figure 2 shows the variability of the classification accuracy across subjects for a fixed ratio of incomplete epochs (39%) and time block size (35 ms), while Figure 3 shows the mean accuracy over all subjects for various ratios of incomplete epochs and time block sizes. As expected, the best performance is achieved on complete data. Compared to KNN and EM, the "Gaussian observed" approach gives similar accuracies for low ratios of incomplete epochs, but its accuracy drastically decreases as the missingness ratio increases, illustrating the interest of the proposed strategies. Finally, KNN imputation shows slightly higher accuracies than EM-based classification. KNN seems a particularly well-adapted strategy due to its ability to capture spatial and temporal correlations, which are strong in EEG ERP due to the time-locked nature of the signal.

Figure 3: Classification mean accuracy (%) as a function of the number of incomplete epochs for the four different strategies.

5 Conclusion

In this paper, it has been shown that the MDRM classifier gives higher accuracy when exploiting the latent information carried by missing data than when estimating the SCM from the observed data only. Two strategies have been explored for this purpose: imputation of missing values and covariance estimation with an EM algorithm. Our results indicate that KNN imputation seems slightly better adapted than the EM algorithm for these data. Perspectives include exploring the use of the joint distribution of the complete data and the missing data mechanism, e.g., by modeling it with a logistic model [21].


  • [1] T.W. Anderson (1965) An Introduction to Multivariate Statistical Analysis. Wiley, New York.
  • [2] P. Aricò, F. Aloise, F. Schettini, S. Salinari, D. Mattia, and F. Cincotti (2014) Influence of P300 latency jitter on event related potential-based brain–computer interface performance. Journal of Neural Engineering 11 (3), pp. 035008.
  • [3] N. Bahador, K. Erikson, J. Laurila, J. Koskenkari, T. Ala-Kokko, and J. Kortelainen (2020) A correlation-driven mapping for deep learning application in detecting artifacts within the EEG. Journal of Neural Engineering 17 (5), pp. 056018.
  • [4] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten (2011) Multiclass brain–computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering 59 (4), pp. 920–928.
  • [5] A. Barachant and M. Congedo (2014) A plug&play P300 BCI using information geometry. arXiv preprint arXiv:1409.0107.
  • [6] H. Berger (1929) Über das Elektrenkephalogramm des Menschen. Archiv für Psychiatrie und Nervenkrankheiten 87 (1), pp. 527–570.
  • [7] Chevallier, Bao, Hammami, Marlats, Mayaud, Annane, Lofaso, and Azabou (2018) Brain-machine interface for mechanical ventilation using respiratory-related evoked potential. In International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece.
  • [8] M. Congedo, C. Gouy-Pailler, and C. Jutten (2008) On the blind source separation of human electroencephalogram by approximate joint diagonalization of second order statistics. Clinical Neurophysiology 119 (12), pp. 2677–2686.
  • [9] A. P. Dempster, N. M. Laird, and D. B. Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological) 39 (1), pp. 1–38.
  • [10] P. J. García-Laencina, J. Sancho-Gómez, A. R. Figueiras-Vidal, and M. Verleysen (2009) K nearest neighbours with mutual information for simultaneous classification and missing data imputation. Neurocomputing 72 (7), pp. 1483–1493.
  • [11] A. Hippert-Ferrer, M. N. E. Korso, A. Breloy, and G. Ginolhac (2021) Robust low-rank covariance matrix estimation with a general pattern of missing values. arXiv preprint arXiv:2107.10505.
  • [12] V. Jayaram and A. Barachant (2018) MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering 15 (6), pp. 066011.
  • [13] B. Jeuris, R. Vandebril, and B. Vandereycken (2012) A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electronic Transactions on Numerical Analysis 39, pp. 379–402.
  • [14] Kalunga, Chevallier, Barthélemy, Djouani, Monacelli, and Hamam (2016) Online SSVEP-based BCI using Riemannian geometry. Neurocomputing 191, pp. 55–68.
  • [15] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger (2018) A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of Neural Engineering 15 (3).
  • [16] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
  • [17] X. Pennec (2020) Manifold-valued image processing with SPD matrices. In Riemannian Geometric Statistics in Medical Image Analysis, X. Pennec, S. Sommer, and T. Fletcher (Eds.), pp. 75–134.
  • [18] F. Perrin, J. Pernier, O. Bertrand, and J.F. Echallier (1989) Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology 72 (2), pp. 184–187.
  • [19] P. L. C. Rodrigues, M. Congedo, and C. Jutten (2019) A data imputation method for matrices in the symmetric positive definite manifold. In XXVIIème colloque GRETSI (GRETSI 2019).
  • [20] S. Saba-Sadiya, T. Alhanai, T. Liu, and M. M. Ghassemi (2020) EEG channel interpolation using deep encoder-decoder networks. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2432–2439.
  • [21] A. Sportisse, C. Boyer, and J. Josse (2020) Imputation and low-rank estimation with missing not at random data. Statistics and Computing 30 (6), pp. 1629–1643.
  • [22] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17 (6), pp. 520–525.
  • [23] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan (2002) Brain–computer interfaces for communication and control. Clinical Neurophysiology 113 (6), pp. 767–791.