
Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease

by   Wolfgang Fruehwirt, et al.

The diagnosis of Alzheimer's disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy.



1 Introduction

Riemannian manifolds of symmetric positive-definite (SPD) matrices have been applied successfully for years in domains such as computer vision [30], radar signal processing [5], and diffusion tensor imaging, and their introduction to brain signal analysis represents a powerful alternative to more traditional information extraction protocols. Only recently has it been shown that Riemannian Brain-Computer Interface (BCI) methods outperform state-of-the-art Euclidean spatial filtering and machine learning techniques [9]. Five recent international BCI competitions, including last year's Microsoft Cortana brain decoding challenge, have been won using Riemannian geometry [9, 3].
Several reasons for this success have been proposed in the literature. First, in the form of covariance matrices, SPD matrices are understood to be excellent representations of the raw electrophysiological brain signal, while reducing its unwanted variations [20]. They have therefore become fundamental elements in methods such as common spatial pattern and canonical correlation analysis. Second, SPD matrices are traditionally treated within Euclidean frameworks, ignoring their intrinsic non-Euclidean structure. Neglecting this fundamental characteristic may lead to deficient results [1].

These points not only have been found advantageous in BCI design but also make a strong case for the use of a Riemannian SPD matrix manifold in the assessment of neuronal degeneration as can be found in Alzheimer’s disease (AD).

AD is the most common form of dementia and ultimately fatal. The combination of its severity and looming global epidemic scale, caused by the ageing of our society, makes AD a major public health concern [2]. Due to its degenerative nature, early accurate diagnosis and effective clinical monitoring are crucial. However, in routine clinical practice, AD assessment is most commonly based on subjective clinical interpretations at an already progressed stage of the disease. So far, no cost-effective, widely used biomarkers have been established to facilitate the objectivization of diagnosis and disease progression assessment. To promote the screening and monitoring of as many individuals as possible, such markers should not depend on costly equipment such as MRI or PET scanners. Therefore, we focus on inexpensive apparatuses, namely electroencephalography (EEG) devices. Their non-invasiveness and low noise level add to their suitability for large-scale use in irritable patients such as those found within the spectrum of AD. Additionally, research suggests that quantitative electroencephalography (QEEG) reflects neurodegenerative processes in AD (for reviews, see [32, 12, 10]).
Therefore, we aim to develop a Riemannian framework for QEEG markers of neuronal degeneration in AD and empirically investigate its usefulness. To be able to combine the merits of Riemannian geometry with the advantages of sophisticated Euclidean regularization and variable selection techniques like the elastic net (see 2.3), we map SPD matrices into the tangent space (see 2.2). Because it preserves the distance relationships between elements, this projection, followed by a vectorization, allows SPD matrices to be treated as Euclidean entities.

All existing Riemannian brain signal analysis methods use covariance as measure of dependence, thereby implicitly assuming a multivariate Gaussian distribution of data and linear associations between the activities of brain regions. However, both properties might not be fulfilled [29, 33]. Hence, we examine the usefulness of rank correlation for constructing SPD matrices, which captures non-linear relationships in data that is often far from normally distributed. For model training and testing, we use one of the largest AD EEG data sets ever collected in a prospective manner, including MRI biomarkers of brain atrophy. As frequency-specific QEEG information has been proven useful in the AD domain [33], we furthermore analyze the effectiveness of a special type of spatiofrequential SPD matrix. Finally, to evaluate the real added value of tangent space mapping, we compare results achieved by this method with those achieved by using regular Euclidean procedures.

2 Materials and Methods

2.1 Experimental data

AD patients were prospectively recruited at four tertiary memory clinics (Medical Universities of Graz, Innsbruck, Vienna, and the General Hospital Linz, PRODEM cohort study by the Austrian Alzheimer Society [27], supported by the Austrian Research Promotion Agency FFG, project no. 827462). We exclusively analyzed participants (N = 110) who had a structural MRI scan within 60 days of the baseline EEG measurement, a maximal MMSE score [14] of 28, and a CDR [19] from 0.5 to 1.
Acquisition of structural T1-weighted images was accomplished on 1.5 and 3 Tesla MR scanners (Siemens). We used FreeSurfer volumetric analyses [13] to build two MRI biomarkers of brain atrophy, i.e., cerebral volume and hippocampal volume divided by the total intracranial volume (ratios referred to as BrainVol and HippVol).
Continuous EEG (alpha trace EEG recorder, 10-20 electrode placement) was analyzed for an eyes-closed resting condition (EC, 180 sec; prediction of BrainVol) and the encoding period of a paired-associate word list task (WLT, adapted version of [25], 140 sec; prediction of HippVol). Research has repeatedly shown (i) the importance of hippocampal activity during the WLT [7, 8], and (ii) the sensitivity of paired-associative memory to early AD-related changes [15, 24].
For details on the entire PRODEM experimental protocol and preprocessing pipeline, see [33, 17, 16].

2.2 Feature generation

To optimize the number of time points available, we determined the maximal signal length with guaranteed quasi-stationary properties using an augmented Dickey-Fuller test [11]. The de-artifacted signal was partitioned into 4-sec segments accordingly. SPD matrices were estimated using sample covariance (SCM, see (1); abbreviated COV below) and Kendall rank correlation (KEN, see [22]). In the following, all procedures are described using sample covariance as the example.
EEG segments were considered as $N \times T$ matrices $X_i$, $N$ being the number of electrodes and $T$ the number of time samples. For each segment, a spatial covariance matrix was estimated:

$$C_i^{S} = \frac{1}{T-1} X_i X_i^{\top} \tag{1}$$


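To take distinct frequential aspects into account, we created a special type of matrix by band-pass filtering each $N \times T$ segment $X_i$ multiple times ($\delta$ = 2 to 4 Hz, $\theta$ = 4 to 8 Hz, $\alpha$ = 8 to 13 Hz, $\beta$ = 13 to 15 Hz) and vertically concatenating the resulting signals to the $4N \times T$ matrix $\tilde{X}_i = [X_i^{\delta}; X_i^{\theta}; X_i^{\alpha}; X_i^{\beta}]$. Then, the sample covariance matrix was estimated:

$$C_i^{SF} = \frac{1}{T-1} \tilde{X}_i \tilde{X}_i^{\top} \tag{2}$$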
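The construction of the spatial and spatiofrequential matrices can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Butterworth filter, its order, the sampling rate `fs`, and the function names are assumptions introduced here, and the random input merely stands in for a de-artifacted 4-sec EEG segment.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges in Hz as listed in the text.
BANDS_HZ = {"delta": (2.0, 4.0), "theta": (4.0, 8.0),
            "alpha": (8.0, 13.0), "beta": (13.0, 15.0)}


def spatial_covariance(segment):
    """Sample covariance of an (electrodes x time samples) segment.

    Rows are mean-centered first; for zero-mean band-passed EEG this
    coincides with X X^T / (T - 1).
    """
    centered = segment - segment.mean(axis=1, keepdims=True)
    return centered @ centered.T / (segment.shape[1] - 1)


def spatiofrequential_covariance(segment, fs, bands=BANDS_HZ, order=4):
    """Covariance of the vertically stacked band-pass filtered copies."""
    filtered = []
    for low, high in bands.values():
        # Assumption: zero-phase Butterworth band-pass (filter type not given in the text).
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered.append(sosfiltfilt(sos, segment, axis=1))
    stacked = np.vstack(filtered)  # shape (4 * N, T)
    return spatial_covariance(stacked)


rng = np.random.default_rng(0)
fs = 256  # hypothetical sampling rate in Hz
segment = rng.standard_normal((19, 4 * fs))  # 19 electrodes, 4 seconds
c_s = spatial_covariance(segment)
c_sf = spatiofrequential_covariance(segment, fs)
print(c_s.shape, c_sf.shape)
```

With 19 electrodes and four bands, the stacked signal has 76 rows, so the spatiofrequential matrix is 76 by 76. Narrow bands on short segments can make a single-segment estimate ill-conditioned, which is one practical reason to average many segments per patient.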


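The common Euclidean distance ($d_E$) between two matrices $C_1$ and $C_2$ and the corresponding mean ($\bar{C}_E$) of $I$ matrices $C_1, \dots, C_I$ can be defined as

$$d_E(C_1, C_2) = \| C_1 - C_2 \|_F \tag{3}$$

$$\bar{C}_E = \frac{1}{I} \sum_{i=1}^{I} C_i \tag{4}$$

where $\| \cdot \|_F$ denotes the Frobenius norm. However, the Euclidean space suffers from several disadvantages. For instance, the averaging of SPD matrices may lead to a swelling effect (the determinant of the Euclidean mean can be strictly larger than the original determinants [1]). To avoid such geometry-induced artifacts, a more natural metric for SPD matrices, the Log-Euclidean distance $d_{LE}$, with the corresponding mean $\bar{C}_{LE}$, can be used (e.g., [34]):

$$d_{LE}(C_1, C_2) = \| \log(C_1) - \log(C_2) \|_F \tag{5}$$

$$\bar{C}_{LE} = \exp\left( \frac{1}{I} \sum_{i=1}^{I} \log(C_i) \right) \tag{6}$$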
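The swelling effect can be made concrete with a small numerical sketch. The helper names below are hypothetical, and the two diagonal matrices are chosen only because the result can be checked by hand: both have determinant 4, the Euclidean mean has determinant 6.25, and the Log-Euclidean mean keeps determinant 4.

```python
import numpy as np


def spd_log(c):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(c)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def spd_exp(s):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(s)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


def euclidean_mean(mats):
    """Arithmetic (Euclidean) mean of a list of matrices."""
    return np.mean(mats, axis=0)


def log_euclidean_mean(mats):
    """Average in the matrix-log domain, then map back with the exponential."""
    return spd_exp(np.mean([spd_log(c) for c in mats], axis=0))


def log_euclidean_distance(c1, c2):
    """Frobenius norm of the difference of matrix logarithms."""
    return np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro")


mats = [np.diag([4.0, 1.0]), np.diag([1.0, 4.0])]  # both have determinant 4
det_euc = np.linalg.det(euclidean_mean(mats))      # mean is diag(2.5, 2.5)
det_le = np.linalg.det(log_euclidean_mean(mats))   # mean is diag(2, 2)
dist_le = log_euclidean_distance(mats[0], mats[1])
print(det_euc, det_le, dist_le)
```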


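Further, SPD matrices can be treated in their native Riemannian space using the geodesic distance $d_R$ and the Riemannian geometric mean, often referred to as Karcher mean [21], $\bar{C}_R$, which minimizes the sum of squared $d_R$:

$$d_R(C_1, C_2) = \left\| \log\left( C_1^{-1/2} C_2 C_1^{-1/2} \right) \right\|_F = \left[ \sum_{n} \log^2 \lambda_n \right]^{1/2} \tag{7}$$

$$\bar{C}_R = \arg\min_{C} \sum_{i=1}^{I} d_R^2(C, C_i) \tag{8}$$

where $\lambda_n$ are the eigenvalues of $C_1^{-1/2} C_2 C_1^{-1/2}$. As $\bar{C}_R$ has no closed-form solution for $I > 2$ matrices, we optimized it using the relaxed Richardson iteration [6]. For a review on the advantages of Riemannian geometry in brain signal processing and detailed formal definitions, see [9, 30].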
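A minimal sketch of the geodesic distance and an iterative Karcher mean follows. Note that it uses a plain fixed-point (unit-step gradient) iteration rather than the relaxed Richardson iteration of [6], which is a deliberate simplification; the function names, tolerance, and example matrices are assumptions made for illustration.

```python
import numpy as np


def spd_apply(c, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(c)
    return (eigvecs * fn(eigvals)) @ eigvecs.T


def whiten(c, c_ref):
    """Compute c_ref^(-1/2) c c_ref^(-1/2), symmetrized against round-off."""
    isqrt = spd_apply(c_ref, lambda w: w ** -0.5)
    m = isqrt @ c @ isqrt
    return (m + m.T) / 2.0


def riemannian_distance(c1, c2):
    """Geodesic distance: root of the summed squared log-eigenvalues."""
    eigvals = np.linalg.eigvalsh(whiten(c2, c1))
    return np.sqrt(np.sum(np.log(eigvals) ** 2))


def karcher_mean(mats, tol=1e-12, max_iter=200):
    """Fixed-point iteration for the Riemannian mean (not relaxed Richardson)."""
    mean = np.mean(mats, axis=0)  # start from the Euclidean mean
    for _ in range(max_iter):
        sqrt_m = spd_apply(mean, np.sqrt)
        # Average of the data in the tangent space at the current estimate.
        tangent = np.mean([spd_apply(whiten(c, mean), np.log) for c in mats], axis=0)
        mean = sqrt_m @ spd_apply(tangent, np.exp) @ sqrt_m
        mean = (mean + mean.T) / 2.0
        if np.linalg.norm(tangent) < tol:
            break
    return mean


a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, -0.3], [-0.3, 3.0]])
w = np.array([[1.0, 2.0], [0.0, 1.0]])  # arbitrary invertible transform
mean_r = karcher_mean([a, b])
print(riemannian_distance(a, b), riemannian_distance(mean_r, a))
```

For two matrices the mean is the midpoint of the geodesic, so it is equidistant from both inputs, and the distance is unchanged when both matrices are transformed by the same invertible `w`. For poorly conditioned inputs a unit step may converge slowly or not at all, which is the situation damped schemes such as the one in [6] address.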
Sets of spatial matrices $C_i^{S}$ and spatiofrequential matrices $C_i^{SF}$ were separately averaged on patient level using the aforementioned mean calculation methods (4, 6, 8) for both EEG paradigms (EC and WLT), and both measures of dependence (COV and KEN), resulting in subject-specific matrices $C_p$ ($p$ being a patient) for all variants. For Riemannian ($\mathrm{TAN}_{R}$) and Log-Euclidean ($\mathrm{TAN}_{LE}$) tangent space-based features, $C_p$ of the corresponding mean type was mapped into the tangent space

$$S_p = \log\left( C_{\mathrm{ref}}^{-1/2} \, C_p \, C_{\mathrm{ref}}^{-1/2} \right) \tag{9}$$

where the group-level reference point $C_{\mathrm{ref}}$ was computed alternatively using (8) for $\mathrm{TAN}_{R}$ or (6) for $\mathrm{TAN}_{LE}$. For a formal definition of the Riemannian tangent space, see [4, 30]. For the Euclidean control condition (EUC), the averaging on both subject and group level was done using (4).
Applying upper(.) as an operator vectorizing the upper triangular part of an SPD matrix, the feature vectors of the spatial matrices, given $N$ = 19 resulting in $N(N+1)/2$ = 190 dimensions, and of the spatiofrequential matrices, given $N$ = 19 (i.e., $4N$ = 76 rows) resulting in 2926 dimensions, were created.
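The mapping and vectorization steps can be sketched as below, under stated assumptions: the reference point is taken to be a Log-Euclidean mean of randomly generated SPD matrices standing in for subject-level matrices, and `upper` keeps the raw upper-triangular entries (some toolboxes additionally scale off-diagonal entries by the square root of 2, which the text does not specify). All names are hypothetical.

```python
import numpy as np


def spd_apply(c, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(c)
    return (eigvecs * fn(eigvals)) @ eigvecs.T


def tangent_map(c, c_ref):
    """Matrix log of c after whitening by the reference point c_ref."""
    isqrt = spd_apply(c_ref, lambda w: w ** -0.5)
    whitened = isqrt @ c @ isqrt
    whitened = (whitened + whitened.T) / 2.0  # remove round-off asymmetry
    return spd_apply(whitened, np.log)


def upper(s):
    """Vectorize the upper triangle (diagonal included) of a square matrix."""
    return s[np.triu_indices(s.shape[0])]


def random_spd(n, rng):
    """Well-conditioned random SPD matrix, used only as placeholder data."""
    a = rng.standard_normal((n, 4 * n))
    return a @ a.T / (4 * n)


rng = np.random.default_rng(0)
subjects = [random_spd(19, rng) for _ in range(5)]  # stand-ins for C_p
# Assumption: Log-Euclidean group mean as the reference point.
c_ref = spd_apply(np.mean([spd_apply(c, np.log) for c in subjects], axis=0), np.exp)
features = np.array([upper(tangent_map(c, c_ref)) for c in subjects])
n_spatial = upper(np.eye(19)).size          # 19 * 20 / 2
n_spatiofrequential = upper(np.eye(76)).size  # 76 * 77 / 2
print(features.shape, n_spatial, n_spatiofrequential)
```

The reference point itself maps to the zero matrix, so the tangent-space features describe each subject's deviation from the group mean.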

2.3 Elastic net regression and repeated nested cross-validation

The elastic net [35] was used as a regularization and variable selection technique to estimate a sparse regression model based on the feature vectors. It imposes a combination of the $\ell_1$ (lasso, [28]) and $\ell_2$ (ridge, [18]) penalties on regression coefficients. While enjoying a similar sparsity of representation as the lasso, the elastic net encourages a grouping effect, where strongly correlated predictors (as presumably present in our data set due to spatially adjacent electrode placement) tend to be in or out of the model together [35].
We used a two-level nested cross-validation to determine generalization performance. The inner loop was included to choose a value for the regularization parameter $\lambda$ with minimal expected generalization error [31]. The value of $\lambda$ that resulted in the lowest mean squared error (MSE) in the inner loop was used to fit models in the outer loop. The parameter $\alpha$, representing the weight of lasso ($\alpha = 1$) versus ridge ($\alpha = 0$) optimization, was set at 0.5. Age and gender were introduced in BrainVol models, whereas the magnetic field strength (varying values of Tesla between centers might influence the analysis of smaller structures) was additionally introduced in the HippVol models. All variables were normalized before model fitting. To reduce the variability of prediction outcome resulting from random training–test set splitting, we repeated the entire nested cross-validation procedure 100 times. This allowed us to average out variability and report the range of results of multiple permutations [26].
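A repeated nested cross-validation of this kind might look as follows with scikit-learn. This is a sketch under assumptions, not the authors' implementation: the fold counts, the penalty grid, and the synthetic data are placeholders, and only three repetitions are run here to keep the example fast. In scikit-learn's `ElasticNet`, `alpha` is the regularization strength and `l1_ratio` is the lasso-versus-ridge weight (set to 0.5 as in the text).

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_rmse(x, y, strengths, n_outer=5, n_inner=5, seed=0):
    """RMSE over outer test folds; penalty strength chosen in the inner loop."""
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    squared_errors = []
    for train_idx, test_idx in outer.split(x):
        inner = KFold(n_splits=n_inner, shuffle=True, random_state=seed)
        # Scaling is fitted on training data only, inside the pipeline.
        model = make_pipeline(StandardScaler(),
                              ElasticNet(l1_ratio=0.5, max_iter=10000))
        search = GridSearchCV(model, {"elasticnet__alpha": strengths},
                              scoring="neg_mean_squared_error", cv=inner)
        search.fit(x[train_idx], y[train_idx])  # refits the best model on all training data
        predictions = search.predict(x[test_idx])
        squared_errors.append((y[test_idx] - predictions) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(squared_errors))))


rng = np.random.default_rng(0)
x = rng.standard_normal((110, 190))  # placeholder: 110 subjects, 190 features
coefficients = np.zeros(190)
coefficients[:5] = 1.0               # only five informative features
y = x @ coefficients + 0.1 * rng.standard_normal(110)
strengths = np.logspace(-3, 0, 8)    # hypothetical penalty grid

# Repeat with different random splits and report mean and range.
rmses = [nested_cv_rmse(x, y, strengths, seed=rep) for rep in range(3)]
print(np.mean(rmses), np.min(rmses), np.max(rmses))
```

Keeping the penalty search inside each outer training fold is what prevents the selection step from leaking test information into the reported error.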

3 Results and Discussion

Results are depicted in Table 1. For both prediction problems (BrainVol, HippVol), the best models were of spatiofrequential nature (indicated in bold in the table), highlighting the importance of frequency-specific information for QEEG AD markers. When comparing spatiofrequential models, the best tangent space mapping models significantly outperformed the Euclidean reference models (BrainVol, $p$ = 0.003; HippVol, $p$ = 0.030).
Differences between model performances were assessed by statistically comparing squared errors of test set predictions and averaging $p$-values across repeated cross-validation. Further, we calculated test statistics for evaluating the stand-alone performance of the best models (BrainVol, $p$ = 0.003; HippVol, $p$ = 0.011).
For the prediction of BrainVol (measured during EC resting state), COV yielded lower root-mean-square errors (RMSE) than KEN. Information on the magnitude of the signal at certain sites, which is present in the diagonal elements of COV but not KEN matrices, seems to be essential. For hippocampus-mediated memory encoding during the WLT, by contrast, the interaction between brain regions (neuronal networks), as measured by off-diagonal matrix elements, seems to be of predominant importance, explaining the superior results for KEN. Further, the EEG signal during an active eyes-open task is presumably less normally distributed (even after sophisticated pre-processing) than during a resting EC period, additionally explaining deviating results for COV and KEN.
Interestingly, $\mathrm{TAN}_{LE}$ (tangent space mapping with a Log-Euclidean reference point) achieved better results than $\mathrm{TAN}_{R}$ (Riemannian reference point). Barachant [3] also used, inter alia, tangent space mapping with a Log-Euclidean reference point (the Log-Euclidean mean) for winning Microsoft's 'mind reading' challenge. Should future studies support the superiority, or at least equality, of $\mathrm{TAN}_{LE}$, computational cost could be dramatically decreased due to the algorithmic simplicity of Log-Euclidean as compared to Riemannian mean calculation.
To the best of our knowledge, this is the first article reporting a Riemannian approach for building QEEG markers of neuronal degeneration.

| Dep | Approach | Design | BrainVol RMSE | BrainVol Min | BrainVol Max | HippVol RMSE | HippVol Min | HippVol Max |
|---|---|---|---|---|---|---|---|---|
| COV | EUC | SF | 1.70E-03 | 1.57E-03 | 2.53E-03 | 2.09E-07 | 1.63E-07 | 4.26E-07 |
| COV | $\mathrm{TAN}_{LE}$ | SF | **1.23E-03** | 1.10E-03 | 1.37E-03 | 1.86E-07 | 1.71E-07 | 2.04E-07 |
| COV | $\mathrm{TAN}_{R}$ | SF | 1.43E-03 | 1.27E-03 | 1.63E-03 | 1.80E-07 | 1.67E-07 | 2.04E-07 |
| KEN | EUC | SF | 1.75E-03 | 1.61E-03 | 2.11E-03 | 1.78E-07 | 1.65E-07 | 2.05E-07 |
| KEN | $\mathrm{TAN}_{LE}$ | SF | 1.56E-03 | 1.43E-03 | 1.77E-03 | **1.44E-07** | 1.30E-07 | 1.44E-07 |
| KEN | $\mathrm{TAN}_{R}$ | SF | 1.58E-03 | 1.41E-03 | 1.91E-03 | 1.47E-07 | 1.31E-07 | 1.65E-07 |
| COV | EUC | S | 2.02E-03 | 1.57E-03 | 3.80E-03 | 1.69E-07 | 1.55E-07 | 2.17E-07 |
| COV | $\mathrm{TAN}_{LE}$ | S | 1.34E-03 | 1.25E-03 | 1.50E-03 | 1.77E-07 | 1.62E-07 | 2.01E-07 |
| COV | $\mathrm{TAN}_{R}$ | S | 1.35E-03 | 1.24E-03 | 1.46E-03 | 1.75E-07 | 1.56E-07 | 2.10E-07 |
| KEN | EUC | S | 1.73E-03 | 1.59E-03 | 1.90E-03 | 1.68E-07 | 1.51E-07 | 1.95E-07 |
| KEN | $\mathrm{TAN}_{LE}$ | S | 1.56E-03 | 1.43E-03 | 1.79E-03 | 1.79E-07 | 1.63E-07 | 2.18E-07 |
| KEN | $\mathrm{TAN}_{R}$ | S | 1.55E-03 | 1.44E-03 | 1.73E-03 | 1.81E-07 | 1.63E-07 | 2.56E-07 |

Table 1: Mean, minimum and maximum root-mean-square error (RMSE) of 100 nested cross-validation repetitions for predicting the normalized whole-brain volume (BrainVol) and normalized hippocampus volume (HippVol) for various combinations of measures of dependence (Dep; covariance, COV; Kendall rank correlation, KEN), geometric approaches (Approach; Euclidean, EUC; tangent space mapping with Log-Euclidean mean, $\mathrm{TAN}_{LE}$, and Riemannian mean, $\mathrm{TAN}_{R}$), and spatial (S) or spatiofrequential (SF) matrix designs (Design). The lowest mean RMSE per prediction problem is shown in bold.


  • Arsigny et al.  [2007] Arsigny, V., Fillard, P., Pennec, X., & Ayache, N. 2007. Geometric Means in a Novel Vector Space Structure on Symmetric Positive‐Definite Matrices. SIAM Journal on Matrix Analysis and Applications, 29(1), 328–347.
  • Association [2014] Association, Alzheimer’s. 2014. 2014 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association, 10(2), e47–e92.
  • Barachant [30.10.2017] Barachant, A. 30.10.2017. Decoding Brain Signals 2016 – Microsoft Cortana Challenge. Retrieved from Microsoft Cortana Website.
  • Barachant et al.  [2012] Barachant, A., Bonnet, S., Congedo, M., & Jutten, C. 2012. Multiclass Brain Computer Interface Classification by Riemannian Geometry. Biomedical Engineering, IEEE Transactions on, 59(4), 920–928.
  • Barbaresco [2008] Barbaresco, F. 2008. Innovative tools for radar signal processing Based on Cartan’s geometry of SPD matrices & Information Geometry. Pages 1–6 of: 2008 IEEE Radar Conference.
  • Bini & Iannazzo [2013] Bini, D. A., & Iannazzo, B. 2013. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4), 1700–1710.
  • Cameron et al.  [2001] Cameron, K. A., Yashar, S., Wilson, C. L., & Fried, I. 2001. Human Hippocampal Neurons Predict How Well Word Pairs Will Be Remembered.
  • Clark et al.  [2017] Clark, I. A., Kim, M., & Maguire, E. A. 2017. Confronting the elephant in the room - verbal paired associates and the hippocampus. bioRxiv.
  • Congedo et al.  [2017] Congedo, M., Barachant, A., & Bhatia, R. 2017. Riemannian geometry for EEG-based brain-computer interfaces; a primer and a review. Brain-Computer Interfaces, 4(3), 155–174.
  • Dauwels et al.  [2011] Dauwels, J., Srinivasan, K., Ramasubba Reddy, M., Musha, T., Vialatte, F.-B., Latchoumane, C., Jeong, J., & Cichocki, A. 2011. Slowing and loss of complexity in Alzheimer’s EEG: two sides of the same coin? International journal of Alzheimer’s disease, 2011.
  • Dickey & Fuller [1979] Dickey, D. A., & Fuller, W. A. 1979. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association, 74(366a), 427–431.
  • Drago et al.  [2011] Drago, V., Babiloni, C., Bartres-Faz, D., Caroli, A., Bosch, B., Hensch, T., Didic, M., Klafki, H. W., Pievani, M., Jovicich, J., Venturi, L., Spitzer, P., Vecchio, F., Schoenknecht, P., Wiltfang, J., Redolfi, A., Forloni, G., Blin, O., Irving, E., Davis, C., Hardemark, H. G., & Frisoni, G. B. 2011. Disease tracking markers for Alzheimer’s disease at the prodromal (MCI) stage. J Alzheimers Dis, 26 Suppl 3, 159–99.
  • Fischl [2012] Fischl, B. 2012. FreeSurfer. Neuroimage, 62(2), 774–781.
  • Folstein et al.  [1975] Folstein, M. F., Folstein, S. E., & McHugh, P. R. 1975. Mini-mental state. Journal of Psychiatric Research, 12(3), 189–198.
  • Fowler et al.  [2002] Fowler, K. S., Saling, M. M., Conway, E. L., Semple, J. M., & Louis, W. J. 2002. Paired associate performance in the early detection of DAT. Journal of the International Neuropsychological Society, 8(1), 58–71.
  • Fruehwirt et al.  [2017] Fruehwirt, W., Zhang, P., Gerstgrasser, M., Grossegger, D., Schmidt, R., Benke, T., Dal-Bianco, P., Ransmayr, G., Weydemann, L., Garn, H., Waser, M., Osborne, M., & Dorffner, G. 2017. Bayesian Gaussian Process Classification from Event-Related Brain Potentials in Alzheimer’s Disease. Cham: Springer International Publishing. Pages 65–75.
  • Garn et al.  [2014] Garn, H., Waser, M., Deistler, M., Schmidt, R., Dal-Bianco, P., Ransmayr, G., Zeitlhofer, J., Schmidt, H., Seiler, S., Sanin, G., Caravias, G., Santer, P., Grossegger, D., Fruehwirt, W., & Benke, T. 2014. Quantitative EEG in Alzheimer’s disease: cognitive state, resting state and association with disease severity. International Journal of Psychophysiology, 93(3), 390–7.
  • Hoerl & Kennard [1970] Hoerl, A. E., & Kennard, R. W. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67.
  • Hughes et al.  [1982] Hughes, C. P., Berg, L., Danziger, W. L., Coben, L. A., & Martin, R L. 1982. A new clinical scale for the staging of dementia. The British Journal of Psychiatry, 140(6), 566–572.
  • Kalunga et al.  [2016] Kalunga, E. K., Chevallier, S., Barthélemy, Q., Djouani, K., Monacelli, E., & Hamam, Y. 2016. Online SSVEP-based BCI using Riemannian geometry. Neurocomputing, 191, 55–68.
  • Karcher [1977] Karcher, H. 1977. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5), 509–541.
  • Kendall [1938] Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30(1/2), 81–93.
  • Pennec et al.  [2006] Pennec, X., Fillard, P., & Ayache, N. 2006. A Riemannian Framework for Tensor Computing. International Journal of Computer Vision, 66(1), 41–66.
  • Pike et al.  [2008] Pike, K. E., Rowe, C. C., Moss, S. A., & Savage, G. 2008. Memory profiling with paired associate learning in Alzheimer’s disease, mild cognitive impairment, and healthy aging. Neuropsychology, 22(6), 718–728.
  • Plihal & Born [1997] Plihal, W., & Born, J. 1997. Effects of early and late nocturnal sleep on declarative and procedural memory. Journal of cognitive neuroscience, 9(4), 534–547.
  • Schouten et al.  [2016] Schouten, T. M., Koini, M., de Vos, F., Seiler, S., van der Grond, J., Lechner, A., Hafkemeijer, A., Möller, C., Schmidt, R., de Rooij, M., & Rombouts, S. A. R. B. 2016. Combining anatomical, diffusion, and resting state functional magnetic resonance imaging for individual classification of mild and moderate Alzheimer’s disease.
  • Seiler et al.  [2012] Seiler, S., Schmidt, H., Lechner, A., Benke, T., Sanin, G., Ransmayr, G., Lehner, R., Dal-Bianco, P., Santer, P., Linortner, P., Eggers, C., Haider, B., Uranues, M., Marksteiner, J., Leblhuber, F., Kapeller, P., Bancher, C., Schmidt, R., & Group, P. S. 2012. Driving Cessation and Dementia: Results of the Prospective Registry on Dementia in Austria (PRODEM). PLoS ONE, 7(12), e52710.
  • Tibshirani [1996] Tibshirani, R. 1996. Regression shrinkage and selection via the lasso.
  • Tong & Thakor [2009] Tong, S., & Thakor, N. V. 2009. Quantitative EEG analysis methods and clinical applications. Artech House.
  • Tuzel et al.  [2007] Tuzel, O., Porikli, F., & Meer, P. 2007. Human Detection via Classification on Riemannian Manifolds. Pages 1–8 of: 2007 IEEE Conference on Computer Vision and Pattern Recognition.
  • Varma & Simon [2006] Varma, S., & Simon, R. 2006. Bias in error estimation when using cross-validation for model selection.
  • Vecchio et al.  [2013] Vecchio, F., Babiloni, C., Lizio, R., Fallani Fde, V., Blinowska, K., Verrienti, G., Frisoni, G., & Rossini, P. M. 2013. Resting state cortical EEG rhythms in Alzheimer’s disease: toward EEG markers for clinical applications: a review. Suppl Clin Neurophysiol, 62, 223–36.
  • Waser et al.  [2016] Waser, M., Garn, H., Schmidt, R., Benke, T., Dal-Bianco, P., Ransmayr, G., Schmidt, H., Seiler, S., Sanin, G., Mayer, F., Caravias, G., Grossegger, D., Fruhwirt, W., & Deistler, M. 2016. Quantifying synchrony patterns in the EEG of Alzheimer’s patients with linear and non-linear connectivity markers. Journal of Neural Engineering, 123(3), 297–316.
  • Yger et al.  [2015] Yger, F., Lotte, F., & Sugiyama, M. 2015. Averaging covariance matrices for eeg signal classification based on the csp: an empirical study. Pages 2721–2725 of: 23rd European Signal Processing Conference (EUSIPCO). IEEE.
  • Zou & Hastie [2005] Zou, H., & Hastie, T. 2005. Regularization and variable selection via the elastic net.