1 Introduction
Magnetoencephalography and electroencephalography (M/EEG) measure brain activity with millisecond precision from outside the head [20]. Both methods are noninvasive and expose rhythmic signals induced by coordinated neuronal firing with characteristic periodicity between minutes and milliseconds [9]. These socalled brainrhythms can reveal cognitive processes as well as health status and are quantified in terms of the spatial distribution of the power spectrum over the sensor array that samples the electromagnetic fields around the head [3].
Statistical learning from M/EEG commonly relies on covariance matrices estimated from bandpass filtered signals to capture the characteristic scale of the neuronal events of interest [7, 19]
. However, covariance matrices do not live in an Euclidean space but a Riemannian manifold. Fortunately, Riemannian geometry offers a principled mathematical approach to use standard linear learning algorithms such as logistic or ridge regression that work with Euclidean geometry. This is achieved by projecting the covariance matrices into a vector space equipped with an Euclidean metric, the tangent space. The projection is defined by the Riemannian metric, for example the geometric affineinvariant metric
[5] or the Wasserstein metric [6]. As a result, the prediction error can be substantially reduced when learning from covariance matrices using Riemannian methods [39, 12].In practice, M/EEG data is often provided in a rank deficient form by platform operators but also curators of public datasets [27, 2]. Its contamination with highamplitude environmental electromagnetic artefacts often render aggressive offlineprocessing mandatory to yield intelligible signals. Commonly used tools for artefactsuppression project the signal linearly into a lower dimensional subspace that is hoped to predominantly contain brain signals [34, 36, 29]. But this necessarily leads to inherently rankdeficient covariance matrices for which no affineinvariant distance is defined. One remedy may consist in using anatomically informed source localization techniques that can typically deal with rank deficiencies [14] and can be combined with sourcelevel estimators of neuronal interactions [26]. However, such approaches require domainspecific expert knowledge, imply processing steps that are hard to automate (e.g. anatomical coregistration) and yields pipelines in which excessive amounts of preprocessing are not under control of the predictive model.
In this work, we focus on regression with rankreduced covariance matrices. We propose two Riemannian methods for this problem. A first approach uses a Wasserstein metric that can handle rankreduced matrices, yet is not affineinvariant. In a second approach, matrices are projected into a common subspace in which affineinvariance can be provided. We show that both metrics can achieve perfect outofsample predictions in a synthetic generative model. Based on the SPoC method [13], we then present a supervised and computationally efficient approach to learn subspace projections informed by the target variable. Finally, we apply these models to the problem of inferring age from brain data [28, 26] on 595 MEG recordings from the Cambridge Center of Aging (CamCAN, http://camcan.org) covering an age range from 18 to 88 years [35]. We compare the datadriven Riemannian approaches to simpler methods that extract power estimates from the diagonal of the sensorlevel covariance as well as the cortically constrained minimum norm estimates (MNE) which we use to project the covariance into a subspace defined by anatomical prior knowledge.
Notations
We denote scalars with regular lowercase font, vectors with bold lowercase font and matrices with bold uppercase fonts.
is the identity matrix of size
. represents vector or matrix transposition. The Frobenius norm of a matrix will be denoted by with the trace operator. is the rank of a matrix. The norm of a vector is denoted by . We denote by the space of square realvalued matrices, the subspace of symmetric matrices, the subspace of symmetric positive definite matrices, the subspace of symmetric semidefinite positive (SPD) matrices, the subspace of SPD matrices of fixed rank R. All matrices are full rank, invertible (with) and diagonalizable with real strictly positive eigenvalues:
withan orthogonal matrix of eigenvectors of
() and the diagonal matrix of its eigenvalues . For a matrix , is its diagonal. We also define the exponential and logarithm of a matrix: , and .denotes the normal (Gaussian) distribution of mean
and variance
. Finally, represents the expectation andthe variance of any random variable
w.r.t. their subscript when needed.Background and M/EEG generative model
MEG or EEG data measured on channels are multivariate signals . For each subject , the data are a matrix where is the number of time samples. For the sake of simplicity, we assume that is the same for each subject, although it is not required by the following method. The linear instantaneous mixing model is a valid generative model for M/EEG data due to the linearity of Maxwell’s equations [20]. Assuming the signal originates from locations in the brain, at any time , the measured signal vector of subject is a linear combination of the source patterns , :
(1) 
where the patterns form the time and subjectindependent source mixing matrix , is the source vector formed by the timedependent sources amplitude, is a contamination due to noise. Note that the mixing matrix and sources are not known.
Following numerous learning models on M/EEG [7, 13, 19], we consider a regression setting where the target is a function of the power of the sources, denoted . Here we consider the linear model:
(2) 
where and is increasing. A first approach consists in estimating the sources before fitting such a linear model, for example using the Minimum Norm Estimator (MNE) approach [21]. This boils down to solving the socalled M/EEG inverse problem which requires costly MRI acquisitions and tedious processing [3]. A second approach is to work directly with the signals . To do so, models that enjoy some invariance property are desirable: these models are blind to the mixing and working with the signals is similar to working directly with the sources . Riemannian geometry is a natural setting where such invariance properties are found [15]. Besides, under Gaussian assumptions, model (1) is fully described by second order statistics. This amounts to working with covariance matrices, , for which Riemannian geometry is well developed. One specificity of M/EEG data is however that signals used for learning have been rankreduced, leading to rankdeficient covariance matrices, , for which specific matrix manifolds need to be considered.
2 Theoretical background to model invariances on manifold
2.1 Riemannian matrix manifolds
Endowing a continuous set of square matrices with a metric, that defines a local Euclidean structure, gives a Riemannian manifold with a solid theoretical framework. Let , a dimensional Riemannian manifold. For any matrix , as , belongs to a vector space of dimension called the tangent space at .
The Riemannian metric defines an inner product for each tangent space , and as a consequence a norm in the tangent space . Integrating this metric between two points gives a geodesic distance . It allows to define means on the manifold:
(3) 
The manifold exponential at , denoted , is a smooth mapping from to that preserves local properties. In particular, . Its inverse is the manifold logarithm from to , with for . Finally, since is Euclidean, there is a linear invertible mapping such that for all , . This allows to define the vectorization operator at , , defined by . Fig. 1 illustrates these concepts.
The vectorization explicitly captures the local Euclidean properties of the Riemannian manifold:
(4) 
Hence, if a set of matrices is located in a small portion of the manifold, denoting , it holds:
(5) 
For additional details on matrix manifolds, see [1], chap. 3.
Regression on matrix manifolds
The vectorization operator is key for machine learning applications: it projects points in
on , and the distance on is approximated by the distance on . Therefore, those vectors can be used as input for any standard regression technique, which often assumes a Euclidean structure of the data. More specifically, throughout the article, we consider the following regression pipeline. Given a training set of samples and target continuous variables , we first compute the mean of the samples . This mean is taken as the reference to compute the vectorization. After computing as, a linear regression technique (e.g. ridge regression) with parameters
can be employed assuming that .2.2 Distances and invariances on positive matrices manifolds
We will now introduce two important distances: the geometric distance on the manifold (also known as affineinvariant distance), and the Wasserstein distance on the manifold .
The geometric distance
Seeking properties of covariance matrices that are invariant by linear transformation of the signal, leads to endow the positive definite manifold
with the geometric distance [15]:(6) 
where are the real eigenvalues of . The affine invariance property writes:
(7) 
This distance gives a Riemannianmanifold structure to with the inner product [15]. The corresponding manifold logarithm at is and the vectorization operator of w.r.t. : , where is the vectorized uppertriangular part of , with unit weights on the diagonal and weights on the offdiagonal, and .
The Wasserstein distance
Unlike , it is hard to endow the manifold with a distance that yields tractable or cheaptocompute logarithms [37]. This manifold is classically viewed as , where is the set matrices of rank [25]. This view allows to write as a quotient manifold , where is the orthogonal group of size . This means that each matrix is identified with the set .
It has recently been proposed [30] to use the standard Frobenius metric on the total space . This metric in the total space is equivalent to the Wasserstein distance [6] on :
(8) 
This provides cheaptocompute logarithms:
(9) 
where
is a singular value decomposition and
. The vectorization operator is then given by , where the of a matrix is the vector containing all its coefficients.This framework offers closed form projections in the tangent space for the Wasserstein distance, which can be used to perform regression. Importantly, since , we can also use this distance on the positive definite matrices. This distance possesses the orthogonal invariance property:
(10) 
This property is weaker than the affine invariance of the geometric distance (7). A natural question is whether such an affine invariant distance also exists on this manifold. Unfortunately, it is shown in [8] that the answer is negative for (proof in appendix 6.3).
3 Manifoldregression models for M/EEG
3.1 Generative model and consistency of linear regression in the tangent space of
Here, we consider a more specific generative model than (1) by assuming a specific structure on the noise. We assume that the additive noise with and . This amounts to assuming that the noise is of rank and that the noise spans the same subspace for all subjects. Denoting and , this generative model can be compactly rewritten as .
We assume that the sources are decorrelated and independent from : with the powers, i.e. the variance over time, of the th source of subject , we suppose and . The covariances are then given by:
(11) 
where is a block diagonal matrix, whose upper block is .
In the following, we show that different functions from (2) yield a linear relationship between the ’s and the vectorization of the ’s for different Riemannian metrics.
Proposition 1 (Euclidean vectorization).
Assume . Then, the relationship between and is linear.
Proof.
Indeed, if , the relationship between and the is linear. Rewritting eq. (11) as , and since the are on the diagonal of the upper block of , the relationship between the and the coefficients of is also linear. This means that there is a linear relationship between the coefficients of and the variable of interest . In other words, is a linear combination of the vectorization of w.r.t. the standard Euclidean distance. ∎
Proposition 2 (Geometric vectorization).
Assume . Denote
the geometric mean of the dataset, and
the vectorization of w.r.t. the geometric distance. Then, the relationship between and is linear.The proof is given in appendix 6.1. It relies crucially on the affine invariance property that means that using Riemannian embeddings of the ’s, is equivalent to working directly with the ’s.
Proposition 3 (Wasserstein vectorization).
Assume . Assume that is orthogonal. Denote the Wasserstein mean of the dataset, and the vectorization of w.r.t. the Wasserstein distance. Then, the relationship between and is linear.
The proof is given in appendix 6.2. The restriction to the case where is orthogonal stems from the orthogonal invariance of the Wasserstein distance.
These propositions show that the relationship between the samples and the variable is linear in the tangent space, motivating the use of linear regression methods (see simulation study in Sec. 4). The argumentation of this section relies on the assumption that the covariance matrices are full rank. However, this is rarely the case in practice.
3.2 Learning projections on
In order to use the geometric distance on the , we have to project them on to make them full rank. In the following, we consider a linear operator of rank which is common to all samples (i.e. subjects). For consistency with the M/EEG literature we will refer to rows of as spatial filters. The covariance matrices of ‘spatially filtered’ signals are obtained as:
. With probability one,
, hence . Since the ’s do not span the same image, applying destroys some information. We consider two approaches to estimate .Unsupervised spatial filtering
A first strategy is to project the data into a subspace that captures most of its variance. This is achieved by Principal Component Analysis (PCA) applied to the averaged covariance matrice computed across subjects:
, where contains the eigenvectors corresponding to the top eigenvalues of the average covariance matrix . This step is blind to the values of and is therefore unsupervised. Note that under the assumption that the time series across subjects are independent, the average covariance is the covariance of the data over the full population.Supervised spatial filtering
We use a supervised spatial filtering algorithm [13] originally developed for intrasubject Brain Computer Interfaces applications, and adapt it to our crossperson prediction problem. The filters are chosen to maximize the covariance between the power of the filtered signals and . Denoting by the weighted average covariance matrix, the first filter is given by:
In practice, all the other filters in are obtained by solving a generalized eigenvalue decomposition problem (see the proof in Appendix 6.4).
4 Experiments
4.1 Simulations
We start by illustrating Prop. 2. Independent identically distributed covariance matrices and variables are generated following the above generative model. The matrix is taken as with
a random matrix, and
a scalar controlling the distance from to identity ( yields ). We use the function for to link the source powers (i.e. the variance) to the ’s. Model reads , with a small additive random perturbation.We compare three methods of vectorization: the geometric distance, the Wasserstein distance and the nonRiemannian method “logdiag” extracting the of the diagonals of as features. Note that the diagonal of contains the powers of each sensor for subject . A linear regression model is used following the procedure presented in Sec. 2. We take , and . We measure the score of each method as the average mean absolute error (MAE) obtained with 10fold crossvalidation. Fig. 2 displays the scores of each method when the parameters controlling the noise level and controlling the distance from to are changed. The same experiment with yields comparable results, yet with Wasserstein distance performing best and achieving perfect outofsample prediction when and is orthogonal.
4.2 MEG data
Predicting biological age from MEG on the Cambridge center of ageing dataset
In the following, we apply our methods to infer age from brain signals. Age is a dominant driver of crossperson variance in neuroscience data and a serious confounder [33]. As a consequence of the globally increased average lifespan, ageing has become a central topic in public health that has stimulated neuropsychiatric research at large scales. The link between age and brain function is therefore of utmost practical interest in neuroscientific research.
To predict age from brain signals, here we use the currently largest publicly available MEG dataset provided by the CamCAN [32]. We only considered the signals from magnetometer sensors () as it turns out that once SSS is applied (detailed in Appendix 6.6), magnetometers and gradiometers are linear combination of approximately 70 signals (), which become redundant in practice [16]. We considered taskfree recordings during which participants were asked to sit still with eyes closed in the absence of systematic stimulation. We then drew time samples from subjects. To capture agerelated changes in cortical brain rhythms [4, 38, 10], we filtered the data into frequency bands: low frequencies , , , , , , , and (Hz unit). These frequencies are compatible with conventional definitions used in the Human Connectome Project [27]. We verify that the covariance matrices all lie on a small portion of the manifold, justifying projection in a common tangent space. Then we applied the covariance pipeline independently in each frequency band and concatenated the ensuing features.
Datadriven covariance projection for age prediction
Three types of approaches are here compared: Riemannian methods (Wasserstein or geometric), methods extracting logdiagonal of matrices (with or without supervised spatial filtering, see Sec. 3.2) and a biophysicsinformed method based on the MNE source imaging technique [21]. The MNE method essentially consists in a standard Tikhonov regularized inverse solution and is therefore linear (See Appendix 6.5 for details). Here it serves as goldstandard informed by the individual anatomy of each subject. It requires a T1weighted MRI and the precise measure of the head in the MEG device coordinate system [3] and the coordinate alignment is hard to automate. We configured MNE with candidate dipoles. To obtain spatial smoothing and reduce dimensionality, we averaged the MNE solution using a cortical parcellation encompassing 448 regions of interest from [26, 18]. We then used ridge regression and tuned its regularization parameter by generalized crossvalidation [17] on a logarithmic grid of values in on each training fold of a 10fold crossvalidation loop. All numerical experiments were run using the ScikitLearn software [31], the MNE software for processing M/EEG data [18] and the PyRiemann package [11]. The proposed method, including all data preprocessing, applied on the 500GB of raw MEG data from the CamCAN dataset, runs in approximately 12 hours on a regular desktop computer with at least 16GB of RAM. The preprocessing for the computation of the covariances is embarrassingly parallel and can therefore be significantly accelerated by using multiple CPUs. The actual predictive modeling can be performed in less than a minute on standard laptop.
Riemannian projections are the leading datadriven methods
Fig. 3 displays the scores for each method. The biophysically motivated MNE projection yielded the best performance (7.4y MAE), closely followed by the purely datadriven Riemannian methods (8.1y MAE). The chance level was 16y MAE. Interestingly, the Riemannian methods give similar results, and outperformed the nonRiemannian methods. When Riemannian geometry was not applied, the projection strategy turned out to be decisive. Here, the supervised method performed best: it reduced the dimension of the problem while preserving the agerelated variance.
Importantly, the supervised spatial filters and MNE both support model inspection, which is not the case for the two Riemannian methods. Fig. 4 depicts the marginal patterns [23] from the supervised filters and the sourcelevel ridge model, respectively. The sensorlevel results suggest predictive dipolar patterns in the theta to beta range roughly compatible with generators in visual, auditory and motor cortices. Note that differences in headposition can make the sources appear deeper than they are (distance between the red positive and the blue negative poles). Similarly, the MNEbased model suggests localized predictive differences between frequency bands highlighting auditory, visual and premotor cortices. While the MNE model supports more exhaustive inspection, the supervised patterns are still physiologically informative. For example, one can notice that the pattern is more anterior in the band than the band, potentially revealing sources in the motor cortex.
5 Discussion
In this contribution, we proposed a mathematically principled approach for regression on rankreduced covariance matrices from M/EEG data. We applied this framework to the problem of inferring age from neuroimaging data, for which we made use of the currently largest publicly available MEG dataset. To the best of our knowledge, this is the first study to apply a covariancebased approach to regression problem in which the target is defined across persons and not within persons (as in braincomputer interfaces). Moreover, this study reports the first benchmark of age prediction from MEG resting state data on the CamCAN. Our results demonstrate that Riemannian datadriven methods do not fall far behind the goldstandard methods with biophysical priors, which depend on manual data processing. Finally, we report models that are explainable as they allow to uncover brainregion and frequencyband specific effects. These results suggest a tradeoff between performance and explainability. Our study suggests that the Riemannian methods have the potential to support automated largescale analysis of M/EEG data in the absence of MRI scans. Taken together, this potentially opens new avenues for biomarker development.
References
 [1] PA Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
 [2] Anahit Babayan, Miray Erbey, Deniz Kumral, Janis D. Reinelt, Andrea M. F. Reiter, Josefin Röbbig, H. Lina Schaare, Marie Uhlig, Alfred Anwander, PierreLouis Bazin, Annette Horstmann, Leonie Lampe, Vadim V. Nikulin, Hadas OkonSinger, Sven Preusser, André Pampel, Christiane S. Rohr, Julia Sacher, Angelika ThöneOtto, Sabrina Trapp, Till Nierhaus, Denise Altmann, Katrin Arelin, Maria Blöchl, Edith Bongartz, Patric Breig, Elena Cesnaite, Sufang Chen, Roberto Cozatl, Saskia Czerwonatis, Gabriele Dambrauskaite, Maria Dreyer, Jessica Enders, Melina Engelhardt, Marie Michele Fischer, Norman Forschack, Johannes Golchert, Laura Golz, C. Alexandrina Guran, Susanna Hedrich, Nicole Hentschel, Daria I. Hoffmann, Julia M. Huntenburg, Rebecca Jost, Anna Kosatschek, Stella Kunzendorf, Hannah Lammers, Mark E. Lauckner, Keyvan Mahjoory, Ahmad S. Kanaan, Natacha Mendes, Ramona Menger, Enzo Morino, Karina Näthe, Jennifer Neubauer, Handan Noyan, Sabine Oligschläger, Patricia PanczyszynTrzewik, Dorothee Poehlchen, Nadine Putzke, Sabrina Roski, MarieCatherine Schaller, Anja Schieferbein, Benito Schlaak, Robert Schmidt, Krzysztof J. Gorgolewski, Hanna Maria Schmidt, Anne Schrimpf, Sylvia Stasch, Maria Voss, Annett Wiedemann, Daniel S. Margulies, Michael Gaebler, and Arno Villringer. A mindbrainbody dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Scientific Data, 6:180308 EP –, 02 2019.
 [3] Sylvain Baillet. Magnetoencephalography for brain electrophysiology and imaging. Nature Neuroscience, 20:327 EP –, 02 2017.
 [4] Luc Berthouze, Leon M James, and Simon F Farmer. Human eeg shows longrange temporal correlations of oscillation amplitude in theta, alpha and beta bands across a wide age range. Clinical Neurophysiology, 121(8):1187–1197, 2010.
 [5] Rajendra Bhatia. Positive Definite Matrices. Princeton University Press, 2007.
 [6] Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. On the bures–wasserstein distance between positive definite matrices. Expositiones Mathematicae, 2018.
 [7] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. Muller. Optimizing spatial filters for robust eeg singletrial analysis. IEEE Signal Processing Magazine, 25(1):41–56, 2008.
 [8] Silvere Bonnabel and Rodolphe Sepulchre. Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM Journal on Matrix Analysis and Applications, 31(3):1055–1070, 2009.
 [9] György Buzsáki and Rodolfo Llinás. Space and time in the brain. Science, 358(6362):482–485, 2017.
 [10] C Richard Clark, Melinda D Veltmeyer, Rebecca J Hamilton, Elena Simms, Robert Paul, Daniel Hermens, and Evian Gordon. Spontaneous alpha peak frequency predicts working memory performance across the age span. International Journal of Psychophysiology, 53(1):1–9, 2004.
 [11] M. Congedo, A. Barachant, and A. Andreev. A new generation of braincomputer interface based on Riemannian geometry. arXiv eprints, October 2013.
 [12] Marco Congedo, Alexandre Barachant, and Rajendra Bhatia. Riemannian geometry for EEGbased braincomputer interfaces; a primer and a review. BrainComputer Interfaces, 4(3):155–174, 2017.
 [13] Sven Dähne, Frank C Meinecke, Stefan Haufe, Johannes Höhne, Michael Tangermann, KlausRobert Müller, and Vadim V Nikulin. Spoc: a novel framework for relating the amplitude of neuronal oscillations to behaviorally relevant parameters. NeuroImage, 86:111–122, 2014.
 [14] Denis A Engemann and Alexandre Gramfort. Automated model selection in covariance estimation and spatial whitening of meg and eeg signals. NeuroImage, 108:328–342, 2015.
 [15] Wolfgang Förstner and Boudewijn Moonen. A metric for covariance matrices. In GeodesyThe Challenge of the 3rd Millennium, pages 299–309. Springer, 2003.
 [16] Pilar Garcés, David LópezSanz, Fernando Maestú, and Ernesto Pereda. Choice of magnetometers and gradiometers after signal space separation. Sensors, 17(12):2926, 2017.
 [17] Gene H. Golub, Michael Heath, and Grace Wahba. Generalized crossvalidation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.
 [18] Alexandre Gramfort, Martin Luessi, Eric Larson, Denis A. Engemann, Daniel Strohmeier, Christian Brodbeck, Lauri Parkkonen, and Matti S. Hämäläinen. MNE software for processing MEG and EEG data. NeuroImage, 86:446–460, Feb. 2014.

[19]
M. GrosseWentrup* and M. Buss.
Multiclass common spatial patterns and information theoretic feature extraction.
IEEE Transactions on Biomedical Engineering, 55(8):1991–2000, Aug 2008.  [20] Matti Hämäläinen, Riitta Hari, Risto J Ilmoniemi, Jukka Knuutila, and Olli V Lounasmaa. Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of modern Physics, 65(2):413, 1993.
 [21] MS Hämäläinen and RJ Ilmoniemi. Interpreting magnetic fields of the brain: minimum norm estimates. Technical Report TKKFA559, Helsinki University of Technology, 1984.
 [22] Riitta Hari and Aina Puce. MEGEEG Primer. Oxford University Press, 2017.
 [23] Stefan Haufe, Frank Meinecke, Kai Görgen, Sven Dähne, JohnDylan Haynes, Benjamin Blankertz, and Felix Bießmann. On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87:96 – 110, 2014.
 [24] Mainak Jas, Denis A Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage, 159:417–429, 2017.
 [25] Michel Journée, Francis Bach, PA Absil, and Rodolphe Sepulchre. Lowrank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.
 [26] Sheraz Khan, Javeria A Hashmi, Fahimeh Mamashli, Konstantinos Michmizos, Manfred G Kitzbichler, Hari Bharadwaj, Yousra Bekhti, Santosh Ganesan, KeriLee A Garel, Susan WhitfieldGabrieli, et al. Maturation trajectories of cortical restingstate networks depend on the mediating frequency band. NeuroImage, 174:57–68, 2018.
 [27] Linda J LarsonPrior, Robert Oostenveld, Stefania Della Penna, G Michalareas, F Prior, Abbas BabajaniFeremi, JM Schoffelen, Laura Marzetti, Francesco de Pasquale, F Di Pompeo, et al. Adding dynamics to the Human Connectome Project with MEG. Neuroimage, 80:190–201, 2013.
 [28] Franziskus Liem, Gaël Varoquaux, Jana Kynast, Frauke Beyer, Shahrzad Kharabian Masouleh, Julia M. Huntenburg, Leonie Lampe, Mehdi Rahim, Alexandre Abraham, R. Cameron Craddock, Steffi RiedelHeller, Tobias Luck, Markus Loeffler, Matthias L. Schroeter, Anja Veronica Witte, Arno Villringer, and Daniel S. Margulies. Predicting brainage from multimodal imaging data captures cognitive impairment. NeuroImage, 148:179 – 188, 2017.
 [29] Scott Makeig, Anthony J. Bell, TzyyPing Jung, and Terrence J. Sejnowski. Independent component analysis of electroencephalographic data. In Proceedings of the 8th International Conference on Neural Information Processing Systems, NIPS’95, pages 145–151, Cambridge, MA, USA, 1995. MIT Press.
 [30] Estelle Massart and PierreAntoine Absil. Quotient geometry with simple geodesics for the manifold of fixedrank positivesemidefinite matrices. Technical report, UCLouvain, 2018. preprint on webpage at http://sites.uclouvain.be/absil/2018.06.
 [31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikitlearn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
 [32] Meredith A Shafto, Lorraine K Tyler, Marie Dixon, Jason R Taylor, James B Rowe, Rhodri Cusack, Andrew J Calder, William D MarslenWilson, John Duncan, Tim Dalgleish, et al. The Cambridge Centre for Ageing and Neuroscience (CamCAN) study protocol: a crosssectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC neurology, 14(1):204, 2014.
 [33] Stephen M Smith and Thomas E Nichols. Statistical challenges in “big data” human neuroimaging. Neuron, 97(2):263–268, 2018.
 [34] Samu Taulu and Matti Kajola. Presentation of electromagnetic multichannel data: the signal space separation method. Journal of Applied Physics, 97(12):124905, 2005.
 [35] Jason R Taylor, Nitin Williams, Rhodri Cusack, Tibor Auer, Meredith A Shafto, Marie Dixon, Lorraine K Tyler, Richard N Henson, et al. The Cambridge Centre for Ageing and Neuroscience (CamCAN) data repository: structural and functional MRI, MEG, and cognitive data from a crosssectional adult lifespan sample. Neuroimage, 144:262–269, 2017.
 [36] Mikko A Uusitalo and Risto J Ilmoniemi. Signalspace projection method for separating MEG or EEG into components. Medical and Biological Engineering and Computing, 35(2):135–140, 1997.
 [37] Bart Vandereycken, PA Absil, and Stefan Vandewalle. Embedded geometry of the set of symmetric positive semidefinite matrices of fixed rank. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, pages 389–392. IEEE, 2009.
 [38] Bradley Voytek, Mark A Kramer, John Case, Kyle Q Lepage, Zechari R Tempesta, Robert T Knight, and Adam Gazzaley. Agerelated changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38):13257–13265, 2015.
 [39] F. Yger, M. Berar, and F. Lotte. Riemannian approaches in braincomputer interfaces: A review. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10):1753–1762, Oct 2017.
6 Appendix
6.1 Proof of proposition 2
First, we note that by invariance, , where has the same block diagonal structure as the ’s, and for . Denote . By simple verification, we obtain , i.e. is orthogonal.
Furthermore, we have:
It follows that for all ,
Note that shares the same structure as the ’s, and that . for .
Therefore, the relationship between and the is linear.
Finally, since , the relationship between the ’s and the is linear, and the result holds.
6.2 Proof of proposition 3
First, we note that so it can be decomposed as with .
By orthogonal invariance, , where so has the same block diagonal structure as the ’s, and for . is also decomposed as with .
Further, with and coming from the SVD of which has the same structure as the ’s. Therefore has also the same structure with the identity matrix as its upper block.
Finally we have so it is linear in for .
6.3 Proof that there is no continuous affine invariant distance on if
We show the result for and ; the demonstration can straightforwardly be extended to the other cases. The proof, from [8], is by contradiction.
Assume that is a continuous invariant distance on . Consider and , both in . For
, consider the invertible matrix
.We have: , and .
Hence, as goes to , we have
Using affine invariance, we have:
Letting and using continuity of yields , which is absurd since .
6.4 Supervised Spatial Filtering
We assume that the signal is bandpass filtered in one of frequency band of interest, so that for each subject the band power of signal is approximated by the variance over time of the signal. We denote the expectation and the variance over time or subject by a corresponding subscript.
The source extracted by a spatial filter for subject is . Its power reads:
and its expectation across subjects is given by:
where is the average covariance matrix across subjects. Note that here, refers to the covariance of the and not its estimate as in Sec. 3.2.
We aim to maximize the covariance between the target and the power of the sources, . This quantity is affected by the scaling of its arguments. To address this, the target variable is normalized:
Following [13], to also scale we constrain its expectation to be 1:
The quantity one aims to maximize reads:
where .
Taking into account the normalization constraint we obtain:
(12) 
The Lagrangian of (12) reads . Setting its gradient w.r.t. to yields a generalized eigenvalue problem:
(13) 
Note that (12) can be also written as a generalized Rayleigh quotient:
Equation (13) has a unique closedform solution called the generalized eigenvectors of . The second derivative gives:
(14) 
Equation (14) leads to an interpretation of as the covariance between and , which should be maximal. As a consequence, is built from the generalized eigenvectors of Eq.(13), sorted by decreasing eigenvalues.
6.5 MNEbased spatial filtering
Let us denote the instantaneous mixing matrix that relates the sources in the brain to the MEG/EEG measurements. This forward operator matrix is obtained by solving numerically Maxwell’s equations after specifying a geometrical model of the head, typically obtained using an anatomical MRI image [22]. Here corresponds to the number of candidate sources in the brain. The MNE approach [21] offers a way to solve the inverse problem. MNE can be seen as Tikhonov regularized estimation, also similar to a ridge regression in statistics. Using such problem formulation the sources are obtained from the measurements with a linear operator which is given by:
The rows of this linear operator can be seen also as spatial filters that are mapped to specific locations in the brain. These are the filters used in Fig. 3, using the implementation from [18].
6.6 Preprocessing
Typical brain’s magnetic fields detected by MEG are in the order of 100 femtotesla ( T) which is ~ times the strength of the earth’s steady magnetic field. That is why MEG recordings are carried out inside special magnetically shielded rooms (MSR) that eliminate or at least dampen external ambient magnetic disturbances.
To pick up such tiny magnetic fields sensitive sensors have to be used [22]. Their extreme sensitivity is challenged by many electromagnetic nuisance sources (any moving metal objects like cars or elevators) or electrically powered instruments generating magnetic induction that is orders of magnitude stronger than the brain’s. Their influence can be reduced by combining magnetometers coils (that directly record the magnetic field) with gradiometers coils (that record the gradient of the magnetic field in certain directions). Those gradiometers, arranged either in a radial or tangential (planar) way, record the gradient of the magnetic field towards 2 perpendicular directions hence inherently greatly emphasize brain signals with respect to environmental noise.
Even though the magnetic shielded room and gradiometer coils can help to reduce the effects of external interference signals the problem mainly remains and further reduction is needed. Also additional artifact signals can be caused by movement of the subject during recording if the subject has small magnetic particles on his body or head. The Signal Space Separation (SSS) method can help mitigate those problems [34].
Signal Space Separation (SSS)
The Signal Space Separation (SSS) method [34], also called Maxwell Filtering, is a biophysical spatial filtering method that aim to produce signals cleaned from external interference signals and from movement distortions and artifacts.
A MEG device records the neuromagnetic field distribution by sampling the field simultaneously at P distinct locations around the subject’s head. At each moment of time the measurement is a vector
is the total number of recording channels.In theory, any direction of this vector in the signal space represents a valid measurement of a magnetic field, however the knowledge of the location of possible sources of magnetic field, the geometry of the sensor array and electromagnetic theory (Maxwell’s equations and the quasistatic approximation) considerably constrain the relevant signal space and allow us to differentiate between signal space directions consistent with a brain’s field and those that are not.
To be more precise, it has been shown that the recorded magnetic field is a gradient of a harmonic scalar potential. A harmonic potential is a solution of the Laplacian differential equation , where is represented by its spherical coordinates . It has been shown that any harmonic function in a threedimensional space can be represented as a series expansion of spherical harmonic functions :
(15) 
We can separate this expansion into two sets of functions: those proportional to inverse powers of and those proportional to powers of . From a given array of sensors and a coordinate system with its origin somewhere inside of the helmet, we can compute the signal vectors corresponding to each of the terms in 15.
Following notations of [34], let be the signal vector corresponding to term and the signal vector corresponding to . A set of such signal vectors forms a basis in the dimensional signal space, and hence, the signal vector is given as
(16) 
This basis is not orthogonal, but linearly independent so any measured signal vector has a unique representation in this basis:
(17) 
where the subbases and contain the basis vectors and , and vectors and contain the corresponding and values.
It can be shown that the spherical harmonic functions contain increasingly higher spatial frequencies when going to higher index values (l,m) so that the signals from real magnetic sources are mostly contained in the low end of the spectrum. By discarding the high end of the spectrum we thus reduce the noise. Then we can do signal space separation. It can be shown that the basis vectors corresponding to the terms in the second sum in expansion (15) represent the perturbating sources external to the helmet. We can than separate the components of field arising from sources inside and outside of the helmet. By discarding them we are left with the part of the signal coming from inside of the helmet only. The signal vector is then decomposed into 2 components