Modeling Symmetric Positive Definite Matrices with An Application to Functional Brain Connectivity

In neuroscience, functional brain connectivity describes the connectivity between brain regions that share functional properties. Neuroscientists often characterize it by a time series of covariance matrices between functional measurements of distributed neuron areas. An effective statistical model for functional connectivity and its changes over time is critical for better understanding the mechanisms of brain and various neurological diseases. To this end, we propose a matrix-log mean model with an additive heterogeneous noise for modeling random symmetric positive definite matrices that lie in a Riemannian manifold. The heterogeneity of error terms is introduced specifically to capture the curved nature of the manifold. We then propose to use the local scan statistics to detect change patterns in the functional connectivity. Theoretically, we show that our procedure can recover all change points consistently. Simulation studies and an application to the Human Connectome Project lend further support to the proposed methodology.

Authors

• 17 publications
• 14 publications
• 31 publications
04/26/2019

Discovering Common Change-Point Patterns in Functional Connectivity Across Subjects

This paper studies change-points in human brain functional connectivity ...
04/10/2019

Analyzing Dynamical Brain Functional Connectivity As Trajectories on Space of Covariance Matrices

Human brain functional connectivity (FC) is often measured as the simila...
09/13/2020

Functional random effects modeling of brain shape and connectivity

We present a statistical framework that jointly models brain shape and f...
01/26/2017

Riemannian-geometry-based modeling and clustering of network-wide non-stationary time series: The brain-network case

This paper advocates Riemannian multi-manifold modeling in the context o...
11/02/2018

Bayesian Hierarchical Modeling on Covariance Valued Data

Analysis of structural and functional connectivity (FC) of human brains ...
01/28/2020

WISDoM: a framework for the Analysis of Wishart distributed matrices

WISDoM (Wishart Distributed Matrices) is a new framework for the charact...
08/08/2020

k-means on a log-Cholesky Manifold, with Unsupervised Classification of Radar Products

We state theoretical properties for k-means clustering of Symmetric Posi...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Understanding the functional brain connectivity is critical for understanding the fundamental mechanisms of brain, how it works, and various neurological diseases. It has attracted great interest recently. For instance, the Human Connectome Project investigates the structural and functional connectivity in order to diagnose cognitive abilities of individual subjects. Functional connectivity can be defined as temporary statistical dependence between spatially remote neurophysiological events (Friston, 2011), and has been observed to be dynamic in nature, even in the resting state (Hutchison et al., 2013). In practice, neuroscientist often characterize the dynamic functional connectivity by a series of symmetric positive definite (SPD) covariance matrices between functional measurements of neuronal activities across different regions in human brain. Establishing appropriate dynamic models is critical for understanding fundamental mechanisms of brain networks and has attracted much attention in neuroscience recently (Xu and Lindquist, 2015; Hutchison et al., 2013).

However, little has been done in the statistics community for investigating dynamic changes of functional connectivity over time. The non-Euclidean structure of covariances has introduced significant challenges to the development of proper statistical models and their analysis. Indeed, all SPD matrices form a nonlinear Riemannian manifold, which is referred to as the SPD manifold. Motivated by the SPD manifold structure under the Log-Euclidean metric (Arsigny et al., 2007), we use the matrix logarithm to embed the SPD matrices into a Hilbert space – an Euclidean space up to a symmetric structure, to be concrete. We then model the transformed random SPD matrix using a mean model with an additive heterogeneous noise. The heterogeneous error depends on the tangent space of the mean SPD matrix and thus takes the curved structure of the SPD manifold into account. Our work refines the previous work by Chiu et al. (1996), whose model does not respect the original manifold structure.

Built on this statistical model, we then propose to use a form of local scan statistics to detect multiple change patterns that are present in the functional brain connectivity over time. To the best of our knowledge, ours is the first work on the study of change point detection for SPD manifold-valued data. Although the proposed method is primarily motivated by discovering change patterns in fMRI, it has the potential to be applied to many other applications, such as diffusion tensor imaging

(Dryden et al., 2009) and longitudinal data analysis (Daniels and Pourahmadi, 2002).

1.1 Related Literature

Change point detection with at most a single change point has been widely studied in the literature. When the distributions of the data are assumed to be known, score- or likelihood-based procedures can be applied (James et al., 1987). Bayesian and nonparametric approaches have also been proposed, see Carlstein et al. (1994) for a review. More recently, Chen and Zhang (2015)

proposed a graph based approach for nonparametric change point detection. When there are multiple change points, the problem becomes much more complicated. Some popular approaches include the exhaustive search with Schwarz criterion

(Yao, 1988), the circularly binary segmentation (Olshen et al., 2004) and the fused lasso method (Tibshirani and Wang, 2007). In genomics, these techniques have been exploited to study DNA copy number variations, see Olshen et al. (2004); Zhang and Siegmund (2007); Tibshirani and Wang (2007) among others. However, none of the above methods deals with Riemannian data.

There have been a few works on Riemannian data analysis in the statistics literature. For example, Schwartzman (2006)

proposed several test statistics for comparing the means of two populations of symmetric positive definite matrices.

Zhu et al. (2009) developed a semiparametric regression model for symmetric definite positive matrices with Euclidean covariates. Later, Yuan et al. (2012) studied the local polynomial regression in the same setting. Steinke and Hein (2009) consider nonparametric regression between general Riemannian manifolds. Petersen and Müller (2017) developed a novel Fréchet regression approach for complex random objects with euclidean covariates. We believe our work will be a valuable addition to the literature.

2 Geometric Interpretation

We briefly introduce , the Riemannian manifold consisting of all symmetric positive definite matrices, while we refer readers to the appendix for more details. A Riemannian manifold is a smooth manifold endowed with an inner product on the tangent space at each point , such that varies with smoothly. We consider the Log-Euclidean metric for the symmetric positive definite matrix manifold due to its computational tractability (Arsigny et al., 2007). Other metrics include the naive Frobenius metric which does not account for the curved nature of symmetric positive definite matrices and the affine invariant metric which is more difficult to compute (Terras, 2012).

For a manifold , we use to denote the tangent space at the base point . It can be shown that the tangent space at the identify matrix is the space of symmetric matrices, denoted by . For the Log-Euclidean metric, the inner product between on

at the identity matrix

is defined as . To define the inner product at a general point, we utilize the concept of differential maps. For a smooth transformation between two manifolds, its differential at , denoted by

, is a linear map sending a tangent vector

to a tangent vector . See Figure 1 for a graphical illustration. When both and are Euclidean submanifolds, the differential is the usual notion of differential of the function at , given by a Jacobian matrix. With this formalism, we consider the smooth map , where is the matrix logarithm, the inverse map of the matrix exponential. The matrix exponential of a matrix is defined as . The Riemannian metric at a general point is then defined as , where is a linear operator (Arsigny et al., 2007). The Riemannian exponential map under this metric is given by , where is the matrix exponential. Riemannian exponential maps are closely related to the intrinsic properties of a manifold, such as the geodesics and the Gauss curvature (Lee, 1997).

3 Methodology

3.1 A Heterogeneous Matrix-log Mean Model

Suppose we have collected a sequence of matrix-valued observations . We propose the following matrix-log mean model for investigating the mean changes of the data sequence

 logYi=logμi+log′μiεi, (1)

where is the mean matrix, and is a mean-zero error term in . Here is a linear operator acting on . The noise term has mean zero, but the corresponding covariance depends on . Hence model (1) has a heterogeneous noise component.

Interestingly, the heterogeneity of the noise terms makes use of the Riemannian manifold structure introduced in Section 2. Without using the geometric structure, one could simply apply the matrix logarithm first and then model the random SPD matrices ’s as

 logYi=logμi+ξi,

where ’s are identically distributed random elements. This naive model, first introduced by Chiu et al. (1996)

, misses the curved structure in the SPD manifold, and thus is less efficient for estimation and inference. Different from theirs, we introduce the location-dependent transformations

in model (1) to respect the original manifold structure, because it turns this model into a geodesic/intrinsic mean model. To appreciate this, we take matrix exponential on both sides and find that

 Yi=exp(logμi+log′μiεi)=Expμiεi.

It can shown that is the minimizer to the following optimization program

 μi=\argminS∈Sym+⋆(m)\EEg2(S,Yi),

where is the geodesic distance between and in Therefore, model (1) serves as an exact counterpart of the Euclidean mean model , where minimizes over .

We emphasize that the idea of using a matrix logarithm to model SPD matrices was first explored by Leonard and Hsu (1992) and Chiu et al. (1996)

. However, their approach does not take the manifold structure into account. From the modeling perspective, our key contribution is that we establish a parametric model for SPD matrices that respects the original manifold structure.

Model (1) provides a natural way to investigate change-point detection problems for SPD manifold-valued data. For this purpose, we further assume that there exist and such that if and otherwise. Elements in are called change points. Our goal is to detect based on the data sequence .

3.2 Computational Details

Computationally, it is more convenient to work with a basis of the space which is a dimensional Hilbert space under the Frobenius inner product. The Frobenius inner product between and is defined as . Let be an orthonormal basis of under this inner product. Then, for any , we can write with , and identify it with its coefficient vector , denoted by .

In this paper, we adopt the basis constructed in the following. Let be the matrix of zeros except the and entries, which are set to if , and otherwise. Since , we consider basis matrices ’s with . It can be checked that and if or , where denotes the Frobenius norm. Let , and then form an orthonormal basis for . We use this basis in our computation. Note that the results presented in the paper are identical for all bases.

To compute the matrix logarithm for

, we first find a unitary matrix

such that for a diagonal matrix . The matrices and

can be computed by eigendecomposition or singular value decomposition (SVD). Then

with .

To compute the matrix representation of the linear differential operator for a given symmetric positive definite matrix with respect to the basis , we first note that (Arsigny et al., 2007). Therefore, once we have the matrix representation of the linear operator with respect to the basis , then will be the matrix representation of , noting that the non-singularity of everywhere implies the invertibility of . If is the coefficient vector (viewed as a column vector) of with respect to a chosen basis, then it is seen that the is given by the matrix concatenated by column vectors . Therefore, the problem boils down to the computation of

 exp′logμϕj=∞∑k=11k!k−1∑ℓ=0(logμ)k−ℓ−1⋅ϕj⋅(logμ)ℓ.

Numerically, the above series is truncated at a sufficiently large . Note that when , we have specially .

3.3 A Local Scan Procedure

Roughly speaking, an ideal statistic for detecting change patterns, or change points, at a position should directly relate to the possibility that is a change point. The statistic at the position we proposed is a locally weighted average of the transformed ’s near :

 G(x,h)=n∑i=1wi(h)−−−→logYi,

where if , if , and otherwise. We remind the readers that denotes the coefficient vector of the matrix with respect to a basis . The defined above is constructed based on data points within a local window of size around the point . The intuition is that, if there is no change point within the window , then has mean zero and is close to zero. Otherwise, if is large, then is likely to be a change point. In particular, points that locally maximize have a high chance of being a change point. We say that is a local maximizer if for all . The set of local maximizers is denoted by . Suppose , where is the number of elements in . For a given threshold , we then estimate by , and is estimated by the cardinality of .

However, the above procedure for estimating depends on the unknown parameter . In practice, we propose a data-driven alternative based on the -fold cross validation to select the number of change points. Suppose that are change points, which divided all time points into segments. Within each segment, time points are randomly split into partitions. The sample mean of a segment is estimated by using data from any partitions within that segment, and the validation error is evaluated on the rest one partition. The cross validation error of the segment is defined to be the sum of validation errors from the partitions, while the total cross-validation error is the sum of cross-validation errors across all segments. Formally, the total cross-validation error is defined as

 CV(k)=k+1∑q=1K∑p=1∑i∈Pq,p(ˆlogμq,p−−−−→logYi)T(ˆlogμq,p−−−−→logYi),

where is the th partition of the th segment, and . Here denotes the time points in the th segment but not in the partition and the cardinality of the set . The integer that minimizes is chosen as an estimate of . We then estimate the locations of change points using the proposed scan statistics.

4 Asymptotic Theory

A random vector is called a subgaussian vector with parameter if , and for all ,

 E[exp(aT(ξ−ν))]≤exp(∥a∥2σ2/2).

We say that a random element in is subgaussian if its coefficient vector with respect to the orthonormal basis is subgaussian. One can easily check that this definition is independent of the choice of the orthonormal basis of . Below we shall assume is subgaussian with parameter . This might depend on and thus the linear operator and its matrix representation . For example, one might conceive of i.i.d. subgaussian random elements and applying to , where different transformations result in different distributions of

. Although subgaussianity is well preserved by linear transformations, the subgaussian parameter might differ after transformation. For example, one can show that, if

is subgaussian with a parameter , then is subgaussian, but with a parameter . In this case, might quantify the magnitude, measured by , of the transformation .

To derive the sure coverage property of the proposed procedure, we define , , and , where we conventionally denote and . We need the following assumption. The quantities , , and satisfy that . Here, characterizes the variability of ’s over all time points. The quantity characterizes the strength of the weakest signal of change points, while indicates the separability of change points. Intuitively, when and are small, no method would succeed in recovering all change points. Recall that denotes the dimension of the space . It is seen that detection of change points becomes harder for higher dimensional matrices, i.e., a larger , since stronger signal (larger ) or better separation of change points (larger ) is required to make the inequality in the above assumption hold.

Now we establish the sure coverage property of the proposed procedure, that is, the union of the intervals selected by our procedure recovers all change points with probability going to

. An nonasymptotic probability bound is also derived, with explicit dependence on the sample size . We use to denote that for all . We are ready to state the main theorem of this paper, whose proof is deferred to the appendix. Suppose that Assumption 4 holds. If and , then

 Pr(^J =J,J⊂ˆJ±h)→1, as n→∞.

We emphasize here that the dimension does not need to be assumed to be fixed and could potentially diverge to infinity as long as Assumption 4 holds.

5 Simulation Studies

In this section, we examine the empirical performance of our method. We generate data according to model (1). In the first example, we consider different combinations of such that , , , respectively. When , we set , and . When , we set for , where , , , , , and . For the symmetric random noise, we first sample the coefficient vector from distribution , then combine it with the basis to generate the noise . Our second example is concerned with a larger by setting that , , , .

Choosing the optimal bandwidth is usually a difficult task for change point problems, see, for example, Niu and Zhang (2012) for a detailed discussion. Intuitively, when there is only one change point in the interval , the larger is, the more powerful the scan statistic is. But when the bandwidth gets too large, the interval might contain multiple change points. Therefore we need to choose bandwidth carefully. In our simulations, we found that the performance of the procedure is relatively robust to the choice of the bandwidth as long as the bandwidth is not too large, and works relatively well in our case. We use the proposed cross validation technique to select the number of change points. We run 100 repetitions of Monte Carlo studies. For each run, we calculate the estimated number of change points and the locations of the change points. We report the frequencies of the three cases: , and , the mean of the number of change points detected, and the sure coverage probability for each of the change point. We also compare two methods that are frequently used in practice. The first one vectorizes the response without considering any manifold structure, which results in a -dimensional vector. We denote this method as “Vector”. The other one also adopts the vectorization idea, but additionally takes the symmetric information into account, yielding a -dimensional vector. We use “Symmetric” to denote this method. The results are summarized in Tables 1 and 2.

As indicated by the results, our proposed method performs better than the comparison methods in terms of the percentage of correctively recovering the number of change points, in all cases. Additionally, when , all the methods presented here can detect the second change point very well. But for the first one, our method achieves a higher sure coverage probability. When , our method achieves a higher sure coverage probability for all the change points. These results suggest the importance of considering the geometric structure of the Riemannian data, at least, in change point detection problems.

6 An Application to the Human Connectome Project

We apply the proposed methodology to the social cognition task related fMRI data from Human Connectome Project Dataset, which includes behavioral and 3T MR imaging data from 970 healthy adult participants collected from 2012 to spring 2015. We focus on the 850 subjects out of the 970 which have the social cognition task related fMRI data. Participants were presented with short video clips (20 seconds) of objects (squares, circles, triangles) that either interacted in some way, or moved randomly on the screen (Castelli et al., 2000; Wheatley et al., 2007). There were 5 video blocks (2 Mental and 3 Random in one run, 3 Mental and 2 Random in the other run) in the task run.

We use the “Desikan-Killiany” atlas (Desikan et al., 2006) to divide the brain into 68 regions of interest. Figure 2(a) shows the “Desikan-Killiany” parcellation of the cortical surface in the left and right hemisphere. We pick eight possible regions that are related to the social task, that are the left and right part of superior temporal, inferior parietal, temporal pole and precuneus (Green et al., 2015). These eight regions of interest are highlighted in yellow in Figure 2(b). For each subject, the fMRI data are recorded on 274 evenly spaced time points, one per 0.72 seconds. We use a moving local window of size 100 to calculate the cross covariance between these eight regions, which results in 175 cross covariance matrices with dimensions . We then apply the proposed method to detect change points in this sequence of cross covariance matrices with bandwidth set to be .

We apply the method to all the subjects, and report the locations of change points detected for each subject. In Table 3, we have summarized the count and percentage of the number of change points detected for all the subjects. The mean number of change points detected among all the subjects is . This result matches the physiology well since there are 5 video blocks in the task design, with changes at the time points , , and respectively. To further validate the proposed methodology, we pick up all those subjects with four change points, and calculate the mean locations respectively. The means are , , , , which are fairly close to the task block changes. A more interesting observation is that the lags (4.6s, 2.0s, 1.7s and -1.5s) are becoming shorter and shorter: the last time point even precedes the designed time point. This, to our understanding, witnesses the powerfulness of human brains for learning the change patterns.

7 Discussion

In this paper we propose an additive matrix-log mean model with a heterogeneous noise for modeling random symmetric positive definite matrices that lie in a Riemannian manifold. The heterogeneous noise part takes account the manifold structure of the original symmetric positive definite matrices. Built upon this model, we then propose a scan statistic to perform multiple change point detection. Theoretical studies and numerical examples lend further support to our proposed methodology.

Our proposed methodology replies on the assumption that the collected samples ’s are independent. Independence is an ideal assumption that may be violated in some settings. However, this assumption allows us to conduct theoretical analysis, which also produce results that could be useful when the assumption is violated. We will pursue the change-point detection problems under dependence for Riemannian data in future work.

References

• Arsigny et al. (2007) Arsigny, V., Fillard, P., Pennec, X. and Ayache, N. (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal of Matrix Analysis and Applications 29 328–347.
• Carlstein et al. (1994) Carlstein, E. G., Müller, H.-G. and Siegmund, D. (1994). Change-point problems. Institute of Mathematical Statistics.
• Castelli et al. (2000) Castelli, F., Happé, F., Frith, U. and Frith, C. (2000). Movement and mind: a functional imaging study of perception and interpretation of complex intentional movement patterns. NeuroImage 12 314–325.
• Chen and Zhang (2015) Chen, H. and Zhang, N. (2015). Graph-based change-point detection. The Annals of Statistics 43 139–176.
• Chiu et al. (1996) Chiu, T. Y., Leonard, T. and Tsui, K.-W. (1996). The matrix-logarithmic covariance model. Journal of the American Statistical Association 91 198–210.
• Daniels and Pourahmadi (2002) Daniels, M. J. and Pourahmadi, M. (2002). Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika 89 553–566.
• Desikan et al. (2006) Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P. and Hyman, B. T. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31 968–980.
• Dryden et al. (2009) Dryden, I. L., Koloydenko, A. and Zhou, D. (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. The Annals of Applied Statistics 3 1102–1123.
• Fletcher (2013) Fletcher, P. T. (2013). Geodesic regression and the theory of least squares on Riemannian manifolds.

International Journal of Computer Vision

105 171–185.
• Friston (2011) Friston, K. J. (2011). Functional and effective connectivity: a review. Brain Connectivity 1 13–36.
• Green et al. (2015) Green, M. F., Horan, W. P. and Lee, J. (2015). Social cognition in schizophrenia. Nature Reviews. Neuroscience 16 620–631.
• Hsu et al. (2012) Hsu, D., Kakade, S. and Zhang, T. (2012). A tail inequality for quadratic forms of subgaussian random vectors. Electronic Communications in Probability 17 1–6.
• Hutchison et al. (2013) Hutchison, R. M., Womelsdorf, T., Allen, E. A., Bandettini, P. A., Calhoun, V. D., Corbetta, M., Della Penna, S., Duyn, J. H., Glover, G. H. and Gonzalez-Castillo, J. (2013). Dynamic functional connectivity: promise, issues, and interpretations. NeuroImage 80 360–378.
• James et al. (1987) James, B., James, K. L. and Siegmund, D. (1987). Tests for a change-point. Biometrika 74 71–83.
• Lee (1997) Lee, J. M. (1997). Riemannian Manifolds: An Introduction to Curvature. Springer-Verlag, New York.
• Leonard and Hsu (1992) Leonard, T. and Hsu, J. S. (1992). Bayesian inference for a covariance matrix. The Annals of Statistics 1669–1696.
• Niu and Zhang (2012) Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations. The Annals of Applied Statistics 6 1306–1326.
• Olshen et al. (2004) Olshen, A. B., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based dna copy number data. Biostatistics 5 557–572.
• Petersen and Müller (2017) Petersen, A. and Müller, H.-G. (2017). Fréchet regression for random objects with Euclidean predictors. The Annals of Statistics to appear.
• Schwartzman (2006) Schwartzman, A. (2006). Random ellipsoids and false discovery rates: Statistics for diffusion tensor imaging data. Ph.D. thesis, Stanford University.
• Steinke and Hein (2009) Steinke, F. and Hein, M. (2009). Non-parametric regression between manifolds. In Advances in Neural Information Processing Systems.
• Terras (2012) Terras, A. (2012). Harmonic analysis on symmetric spaces and applications II. Springer Science & Business Media.
• Tibshirani and Wang (2007) Tibshirani, R. and Wang, P. (2007). Spatial smoothing and hot spot detection for cgh data using the fused lasso. Biostatistics 9 18–29.
• Wheatley et al. (2007) Wheatley, T., Milleville, S. C. and Martin, A. (2007). Understanding animate agents: distinct roles for the social network and mirror system. Psychological Science 18 469–474.
• Xu and Lindquist (2015) Xu, Y. and Lindquist, M. A. (2015). Dynamic connectivity detection: an algorithm for determining functional connectivity change points in fmri data. Frontiers in Neuroscience 9.
• Yao (1988) Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statistics & Probability Letters 6 181–189.
• Yuan et al. (2012) Yuan, Y., Zhu, H., Lin, W. and Marron, J. S. (2012). Local polynomial regression for symmetric positive definite matrices. Journal of the Royal Statistical Society: Series B 74 697–719.
• Zhang and Siegmund (2007) Zhang, N. R. and Siegmund, D. O. (2007). A modified bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63 22–32.
• Zhu et al. (2009) Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C. and Lin, W. (2009). Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging. Journal of the American Statistical Association 104 1203–1212.

Appendix S.1 Preliminary

In this section, we further discuss the smooth and Riemannian manifold. For a comprehensive treatment on these subjects, readers are referred to the introductory book by Lee (1997).

A smooth manifold is a differentiable manifold with all transition maps being -differentiable. Associated with each point on the manifold , there exists a linear space called the tangent space at the base point

. Each element in the tangent space is called a tangent vector. For a manifold that is a submanifold of a Euclidean space, the tangent space at a point can be geometrically visualized as the hyperplane tangent to that point, while tangent vectors are visualized as Euclidean vectors tangent to the manifold at that point; see Figure

S.3 for an illustration. It is emphasized that tangent vectors at different base points are different, despite that the vectors might point to the same direction. Thus, a tangent vector always implicitly comes with a base point. For Euclidean submanifolds, a tangent vector at a point can also be algebraically interpreted as a directional derivative at , such that for all with denoting the collection of real-valued smooth functions defined on the manifold . Observe that is a derivation at , which satisfies the Leibniz rule,

 Dv(fg)=g(x)(Dvf)+f(x)(Dvg),

for any and , where denotes the pointwise product of functions. This allows one to generalize the concept of tangent vector as directional derivative to non-Euclidean manifolds, by defining tangent vectors at as derivations at , and tangent space at as the space of derivations at . A convenient way to perceive the derivation represented by a tangent vector is to treat as a linear functional that maps into .

For a smooth transformation that maps points on a manifold to points on the manifold , its differential at , denoted by , is a linear map sending a tangent vector to a tangent vector , such that the derivation corresponding to the tangent vector at the point is depicted by

 Dφ′x(v):C∞(M)→R,s.t.∀f∈C∞(M):Dφ′x(v)f=Dv(f∘φ),

where denotes the composition of functions. When both and are Euclidean submanifolds, the differential is the usual notion of differential of the function at , given by a Jacobian matrix. Specially, when is an interval of the real line, the tangent space at each is the whole real line . In this case, is often called a (parameterized) smooth curve on , and is denoted by which is the derivative of the curve at time . Here, to properly decode the notation , recall that is a map sending a tangent vector of a manifold to a tangent vector of another manifold, and for the special manifold , the real number can be viewed as a tangent vector at . Geometrically and intuitively, is a vector tangent to the curve at time , as illustrated in Figure S.3.

A Riemannian manifold is a smooth manifold endowed with an inner product on the tangent space at each point , such that varies with smoothly. The collection of such inner products is often called the Riemannian metric tensor, or simply Riemannian metric. One can show that, the metric tensor induces a distance function that turns the manifold into a metric space. A geodesic is a smooth curve on the manifold such that for any sufficiently small segment, the segment is the unique smooth curve with the minimal length among all smooth curves connecting the two endpoints of the segment. Every smooth curve on the manifold can be parameterized by a smooth map from an interval in to the manifold. For any , there exists a unique geodesic such that and . Then the exponential map at , denoted by , is defined by . For example, one can verify that for the unit circle , for and , the defining geodesic for the Riemannian exponential map is , as and . Thus, . A graphical illustration of the Exp map is given in Figure S.3.

For the Log-Euclidean metric, at the identity matrix , it is defined as for , the Frobenius inner product on . In order to define metric at other points, the following group structure is considered. Define , where and are the matrix exponential and logarithm respectively. The operation turns into a group. Now we define the left-translation operator for . As shown in Arsigny et al. (2007), is a smooth map from to itself. Thus, its differential at is a linear map that sends tangent vectors at to tangent vectors at . For instance, the linear operator maps tangent vectors at to tangent vectors at the identity. Given this property, we can “translate” the metric at the identity matrix to all points by the left-translation operator . More specifically, the Log-Euclidean metric at any is defined by for all , where the last identity is due to (Arsigny et al., 2007). The Riemannian exponential map under this metric is given by .

Appendix S.2 Proofs

Let , and , where denotes the collection of flat points, i.e., if and only if for all . Then holds under the event .

Proof of Lemma s.2.

The proof can be found in Lemma 3 of Niu and Zhang (2012). ∎

Proof of Theorem 4.

We first note the following facts about subgaussian random vectors that will be used in the sequel.

• If are independent and subgaussian with parameters , respectively, then is a subgaussian random vector with parameters and .

• If is a subgaussian random vector with a parameter and is a matrix, then is a subgaussian random vector with parameters and .

A point is called a -flat point (or simply flat point if is clear from the context) if for all . For a flat point , has mean zero and also is a subgaussian random vector with parameters and . Here, we recall that is subgaussian with the parameter . For a change-point , similarly, is subgaussian with parameter .

Let and . For a flat point , we first observe that

 Pr{∥G(x,h)∥22>an}≤e−tn=1nlogn

according to Theorem 1 of Hsu et al. (2012). According to Assumption 1 and the choice of and , we have and thus

 Pr{∥G(x,h)∥22>ρ}≤Pr{∥G(x,h)∥22>an}≤1nlogn.

Similarly, from Assumption 1 we deduce that and . Thus, for a change point ,

 Pr{∥G(τ,h)∥2<√ρ} ≤Pr{∥G(τ,h)∥2<∥δτ∥2−√an} ≤Pr{|∥G(τ,h)∥2−∥δτ∥2|>√an} ≤Pr{∥G(τ,h)−δτ∥22>an} ≤1nlogn,

or equivalently,

 Pr{∥G(τ,h)∥22<ρ}≤1nlogn.

Next, we bound the probabilities of events defined in Lemma S.2.

 Pr{En(h,ρ)} ≥1−Pr[{An(h,ρ)}c]−Pr[{Bn(h,ρ)}c]. (S.2)

Now, note that

 Pr[{An(h,ρ)}c] =Pr{∃x∈F:∥G(x,h)∥22>ρ}≤∑x∈FPr{∥G(x,h)∥22>ρ} ≤∑x∈F1nlogn≤n(1nlogn)=1logn. (S.3)

Similarly,

 Pr[{Bn(h,ρ)}c] =Pr{∃τ∈J:∥G(τ,h)∥22<ρ}≤∑τ∈JPr{∥G(τ,h)∥22<ρ} ≤∑τ∈J1nlogn≤nnlogn=1logn. (S.4)

Combining (S.2), (S.3) and (S.4), we conclude that

 Pr{En(h,ρ)}≥1−2logn→1. (S.5)

Finally, the theorem follows from Lemma S.2. ∎