1 Introduction
Electroencephalography (EEG) is a noninvasive neuroimaging modality pioneered by Hans Berger [6], which consists in recording the electrical activity of the brain by placing electrodes on the scalp. Its low cost, simplicity and high temporal resolution (it captures well the brain activity dynamics) have made the popularity of EEG and allowed for its use in various applications. EEG is indeed widely used in brain research for instance to study sleep or epilepsy. It is also central for braincomputer interfaces (BCI), where the subject interacts with a computer through brain signals. These can for example be employed to control exoskeleton [14] or help mechanical ventilation [7].
This paper focuses onto BCI and more specifically onto one of the various possible paradigms: event related potentials (ERP), which are specific timelocked responses of the brain to some stimuli, such as flashes [23]. Given the brain signals, the goal is to determine if the subject was exposed to the stimulus, i.e., find out if the specific timelocked response is in the EEG recording. Thus, BCI boils down to a classification task of brain signals at hand. Various methods have specifically been designed to achieve this; see [15] for a recent review. In this work, we consider one of the most efficient approach, called minimum distance to Riemannian mean (MDRM) [4], which exploits covariance matrices of the data along with their natural Riemannian geometry.
BCI classification methods have shown good performance on a number of classical datasets; see e.g., [12]. However, these datasets have been recorded in laboratories, which are well controlled environments, and data have been curated by hand to ensure the quality of the recordings. In real life scenarios, obtaining such idealistic data seems unrealistic. One can expect the data at hand to contain faulty measurements due for instance to impedence change or electrode disconnections [3], or artifacts coming from movements of the subject, blinks, etc. These can lead to missing data in EEG recordings as one may decide to discard it from the data. When it comes to classification tasks, missing data impedes the exploitation of complementary information in nonmissing channels. Considering this, it is of primary importance to develop an appropriate procedure to classify incomplete EEG signals.
To achieve this, several strategies can be employed. A first possibility is to exploit some missing data imputation technique in order to retrieve a complete EEG recording. It has for instance been considered for EEG in [18]
with an algorithm relying on spherical spline interpolation. Imputation can also be performed through the
nearest neighbors (KNN) algorithm
[22]. Another possibility is to directly work in the covariance space. When a channel is missing, an imputation of covariance matrices with uncorrelated white noise has been proposed in
[19]. It is also possible to estimate the covariance matrix of incomplete data using an expectationmaximization (EM) algorithm [9]. More recently, a neural network approach was designed to interpolate missing electrodes in EEG
[20].In this paper, our contribution is to adapt the stateoftheart classification pipeline of EEG ERP data [5] to the case of incomplete data. To do so, we consider two different strategies: the missing data imputation approach using KNN and the covariance estimation based on the EM algorithm. The proposed implementations are validated and compared on a classical real EEG dataset [23] with simulated missing data patterns.
2 Covariance estimation with missing data
A common hypothesis for EEG signals is to assume a multivariate normal distribution
[8]. Let be a recorded EEG trial containingdimensional real vectors, where
is the number of electrodes and the time samples.follows a zeromean Gaussian distribution with a covariance matrix
, that is . When the data is complete, the Maximum Likehood Estimator (MLE) is the Sample Covariance Matrix (SCM):(1) 
Let assume now that the data is incomplete, i.e., some recorded elements in are missing as in Figure 1. There are multiple sources of missingness within an EEG trial: electrode poping, artifacts created by eye blinking or jaw movements, and more generally postprocessing steps that discard poor quality data. It is then likely that the missingness pattern is a general pattern with multiple blocks missing in time and space.
2.1 Data imputation
The estimation of a classification feature, as the SCM, requires the data to be complete. In the case of incomplete data, one natural strategy is to rely on imputation techniques before performing any classification task. In this work, we focus on nearest neighbors (KNN) interpolation [22], which is an obvious candidate for EEG due to its ability to capture spatial channel correlation. If the electrode of is missing, KNN finds its closest neighbors in the training set according to an Euclidean distance compatible with missing data, called the heterogeneous Euclideanoverlap metric (HEOM) [10]. If represents the set of selected neighbors, then the imputed value is
(2) 
where denotes the corresponding weight of the th nearest neighbor and is the squared HEOM. The choice of can be determined by crossvalidation on observed data only, which can be challenging with high missing data ratio. If is the total imputed dataset, the covariance can be then estimated using (1) with replaced by .
2.2 An EM algorithm for covariance matrix estimation
A second strategy to deal with missing values is to directly estimate the covariance feature without imputation. In this case, one can rely on maximum likelihood estimation using the EM algorithm developed in [11]. Here, the unknown parameter to estimate is simply . In the following EM algorithm, is transformed into such that , where is a permutation matrix, and denote the observed and missing elements of , respectively. The covariance matrix of is
(3) 
where , , are the block CM of , of and , and of . Replacing by its permuted version , the complete loglikelihood for the Gaussian distribution is expressed as:
(4) 
The EM algorithm, which is given in Algorithm 1, can now be formulated.
Initialization. At step of the algorithm, initialize the covariance matrix with the SCM computed from fully observed (i.e., no missing values), denoted .
Estep. Compute the expectation of the missing data conditioned by the observed data and the estimated covariance at the th iteration:
(5) 
with . The only expectation to compute is , which is the expectation of the sufficient statistics (denoted hereafter). A classical result in conditional distributions [1] gives the following expression:
(6) 
with conditional mean .
Mstep. Resolve the following maximization problem:
(7) 
where is the set of symmetric positive definite (SPD) matrices. Solving (7) requires to solve . Fortunately in this case, the following closedform expression can be found with simple derivation calculus:
(8) 
3 Riemannian classification of EEG ERP
This section illustrates, based on [5], how it is possible to classify EEG trials using their covariance matrices.
3.1 Feature extraction
The main contribution of [5] consists in a smart way of building the covariance feature for ERP data embedding both the spatial and temporal structural information. For that, each trial is augmented using a reference signal , i.e. the signal waveform, which is the average of several single trial responses of the target class:
(9) 
where is the ensemble of indexes of the trial belonging to the target class. For each trial , a super trial is built by concatenating and such that . Then, the classical approach consists in estimating the SCM using (1) with replaced by , resulting in covariance matrices as features to classify.
In the case of incomplete trials, a new must be estimated using (9) from imputed trials , e.g. using KNN. For the imputing and EM strategies described in sections 2.1 and 2.2, the supertrials become respectively:
(10) 
where is the transformation of using a set of permutation matrices as described in section 2.2. For the imputation strategy, the covariance is estimated using (1) with replaced by . For the EM strategy, the covariance is estimated using Algorithm 1 with replaced by .
3.2 Minimum distance to Riemannian mean
Concerning classification of EEG signals through the covariance matrix feature (and its augmented counterpart), the Minimum Distance to Mean (MDM) algorithm [4] has proven to be an efficient and reliable classifier as reported in previous experiments. Associated with a Riemannian framework, the methodology has shown increased performance in terms of accuracy of classifcation [5].
The idea behind this approach consists in leveraging a nonEuclidean distance based on the geometry of the space of SPD (and thus covariance matrices). In peculiar, theoretical studies have shown benefits to use the socalled affine invariant distance[17]. Given two covariances matrices , its expression is given by:
(11) 
where is the Frobenius norm and is the logarithm of matrices. Then, given a classification problem of classes, with training data , the training phase consists in computing the mean of each class according to the affineinvariant distance by using the iterative algorithm of [13]. A predicted label is then obtained on a new sample by assigning it to closest mean:
(12) 
4 Numerical experiments
4.1 Data description and preprocessing
The proposed procedures are assessed through a detection task of a P300 ERPs triggered from target stimuli consisting of flashes [23]. The data is accessed through the BNCI 2014009 P300 dataset readily available from the moabb plateform [12]. This dataset consists in 3 recording sessions of EEG potentials on 10 subjects using 16 Ag/AgCl electrodes^{1}^{1}1Fz, FCz, Cz, CPz, Pz, Oz, F3, F4, C3, C4, CP3, CP4, P3, P4, PO7, PO8. Please refer to [2] for an extensive description of the dataset. covering left, right and central scalp. Before assessing performance, signals are filtered between 1 and 20Hz, downsampled to 128Hz and then segmented into
1728 overlapping epochs each lasting
103 ms.To approach a realistic scenario of incomplete EEG signals, missing data are simulated in space and time, e.g., multiple electrodes popping over various time intervals according to a general pattern (see Figure 1). For that purpose, spatiotemporal blocks of data are set as missing in randomly chosen EEG trials (from 6% to 49% of the total number of trials): in each incomplete trial, 10 out of 16 electrodes are set as missing during a block of time (from 5ms to 45ms blocks).
4.2 Results
To compare the accuracy of the binary classification of EEG ERP, the following strategies are considered for comparison: Gaussian: the MDRM method using the SCM (1) of the complete dataset, i.e. without missing values; Gaussian observed: The MDRM method using the SCM estimated only from fully observed . KNN + Gaussian: a KNN imputer [22] implemented from the scikitlearn package [16] is applied to incomplete data with a number of neighbors . Then, the MDRM method is used as in the “Gaussian” strategy. Gaussian EM: a hybrid strategy in which the covariance matrix is estimated through Algorithm 1 and fed into the MDRM classifier.
The performance is evaluated using the classical trainingtest paradigm with stratified folding (folds=5). Algorithms are calibrated on the data recorded during the training phase, and applied on the data recorded during the test phase. Note that data is augmented with (complete data) or (incomplete data) during trainingtest.
Accuracy results are presented in Figure 2 and Figure 3. Figure 2 shows the variability of the classification accuracy for each subject for a fixed number of incomplete epochs (39%) and time block size (35ms) while Figure 3 shows the mean accuracy over all subjects for various ratios of incomplete epochs and time block sizes. As expected, best performance is achieved on complete data. As compared to KNN and EM, the “Gaussian observed” approach gives similar accuracies for low incomplete epochs ratio (%) but its accuracy drastically decreases when the missingness ratio increases, illustrating the interest of the proposed strategies. Finally, KNN imputation shows slightly higher accuracies as compared to EMbased classification. KNN seems a particularly adapted strategy due to its ability to capture spatial and temporal correlations, which are strong in EEG ERP due to the timelocked nature of the signal.
5 Conclusion
In this paper, it has been shown that the MDRM classifier gives higher accuracy by taking the latent information given by missing data compared to the estimation of the SCM from observed data. Two strategies have been explored for this purpose: imputation of missing values and covariance estimation with an EM algorithm. Our results indicate that the KNN imputation seems slightly more adapted than the EM algorithm for these data. Perspectives include the exploration of the use of the joint distribution of the complete data and the missing data mechanism,
e.g., by modeling it with a logistic model [21].References
 [1] (1965) An introduction to multivariate statistical analysis. Wiley, New York. Cited by: §2.2.
 [2] (2014) Influence of P300 latency jitter on event related potentialbased brain–computer interface performance. Journal of Neural Engineering 11 (3), pp. 035008. External Links: Document Cited by: footnote 1.

[3]
(202010)
A correlationdriven mapping for deep learning application in detecting artifacts within the EEG
. Journal of Neural Engineering 17 (5), pp. 056018. External Links: Document Cited by: §1.  [4] (2011) Multiclass brain–computer interface classification by Riemannian geometry. IEEE Transactions on Biomedical Engineering 59 (4), pp. 920–928. Cited by: §1, §3.2.
 [5] (2014) A plug&play P300 BCI using information geometry. arXiv preprint arXiv:1409.0107. Cited by: §1, §3.1, §3.2, §3.
 [6] (1929) Über das elektrenkephalogramm des menschen. Archiv für psychiatrie und nervenkrankheiten 87 (1), pp. 527–570. Cited by: §1.
 [7] (2018) Brainmachine interface for mechanical ventilation using respiratoryrelated evoked potential. In International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece. Cited by: §1.
 [8] (2008) On the blind source separation of human electroencephalogram by approximate joint diagonalization of second order statistics. Clinical Neurophysiology 119 (12), pp. 2677–2686. Cited by: §2.
 [9] (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society. Series B (Methodological) 39 (1), pp. 1–38. Cited by: §1.
 [10] (2009) K nearest neighbours with mutual information for simultaneous classification and missing data imputation. Neurocomputing 72 (7), pp. 1483–1493. External Links: Document Cited by: §2.1.
 [11] (2021) Robust lowrank covariance matrix estimation with a general pattern of missing values. arXiv preprint arXiv:2107.10505. Cited by: §2.2.
 [12] (2018) MOABB: trustworthy algorithm benchmarking for BCIs. Journal of Neural Engineering 15 (6), pp. 066011. External Links: Document Cited by: §1, §4.1.

[13]
(2012)
A survey and comparison of contemporary algorithms for computing the matrix geometric mean
. Electronic Transactions on Numerical Analysis 39 (ARTICLE), pp. 379–402. Cited by: §3.2.  [14] (2016) Online SSVEPbased BCI using Riemannian geometry. Neurocomputing 191, pp. 55–68. Cited by: §1.
 [15] (2018) A review of classification algorithms for EEGbased brain–computer interfaces: a 10 year update. Journal of neural engineering 15 (3). Cited by: §1.

[16]
(2011)
Scikitlearn: machine learning in python
. the Journal of machine Learning research 12, pp. 2825–2830. Cited by: §4.2.  [17] (2020) 3  manifoldvalued image processing with spd matrices. In Riemannian Geometric Statistics in Medical Image Analysis, X. Pennec, S. Sommer, and T. Fletcher (Eds.), pp. 75–134. External Links: ISBN 9780128147252, Document, Link Cited by: §3.2.
 [18] (1989) Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology 72 (2), pp. 184–187. External Links: ISSN 00134694, Document Cited by: §1.
 [19] (2019) A data imputation method for matrices in the symmetric positive definite manifold. In XXVIIème colloque GRETSI (GRETSI 2019), Cited by: §1.
 [20] (2020) EEG channel interpolation using deep encoderdecoder networks. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2432–2439. Cited by: §1.
 [21] (2020) Imputation and lowrank estimation with missing not at random data. Statistics and Computing 30 (6), pp. 1629–1643. Cited by: §5.
 [22] (200106) Missing value estimation methods for DNA microarrays. Bioinformatics 17 (6), pp. 520–525. External Links: Document Cited by: §1, §2.1, §4.2.
 [23] (2002) Brain–computer interfaces for communication and control. Clinical Neurophysiology 113 (6), pp. 767–791. External Links: Document Cited by: §1, §1, §4.1.
Comments
There are no comments yet.