# Total Variation Regularized Fréchet Regression for Metric-Space Valued Data

Non-Euclidean data that are indexed with a scalar predictor such as time are increasingly encountered in data applications, while statistical methodology and theory for such random objects are not well developed yet. To address the need for new methodology in this area, we develop a total variation regularization technique for nonparametric Fréchet regression, which refers to a regression setting where a response residing in a generic metric space is paired with a scalar predictor and the target is a conditional Fréchet mean. Specifically, we seek to approximate an unknown metric-space valued function by an estimator that minimizes the Fréchet version of least squares and at the same time has small total variation, appropriately defined for metric-space valued objects. We show that the resulting estimator is representable by a piece-wise constant function and establish the minimax convergence rate of the proposed estimator for metric data objects that reside in Hadamard spaces. We illustrate the numerical performance of the proposed method for both simulated and real data, including the metric spaces of symmetric positive-definite matrices with the affine-invariant distance and of probability distributions on the real line with the Wasserstein distance.

## Authors

• 17 publications
• 24 publications
02/10/2022

### Random Forests Weighted Local Fréchet Regression with Theoretical Guarantee

Statistical analysis is increasingly confronted with complex data from g...
10/26/2020

### Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance

Models of discrete-valued outcomes are easily misspecified if the data e...
07/15/2021

### Nonparametric Statistical Inference via Metric Distribution Function in Metric Spaces

Distribution function is essential in statistical inference, and connect...
01/26/2021

### Inferring serial correlation with dynamic backgrounds

Sequential data with serial correlation and an unknown, unstructured, an...
01/14/2020

### Graph-Fused Multivariate Regression via Total Variation Regularization

In this paper, we propose the Graph-Fused Multivariate Regression (GFMR)...
09/18/2020

### The Stein Effect for Frechet Means

The Frechet mean is a useful description of location for a probability d...
01/13/2021

### Concurrent Object Regression

Modern-day problems in statistics often face the challenge of exploring ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Regression analysis is a foundational technique in statistics aiming to model the relationship between response variables and covariates or predictor variables. Conventional regression models are designed for Euclidean responses and predictors

and include parametric models such as linear or polynomial regression and generalized linear models as well as various nonparametric approaches, such as kernel and spline smoothing. All of these models target the conditional expectation

.

In response to the emergence of new types of data, the basic Euclidean regression models have been extended to the case of non-Euclidean data, where a relatively well-studied scenario concerns manifold-valued responses. For instance, Chang (1989); Fisher (1995) studied regression models for spherical and circular data, while Shi et al. (2009); Steinke, Hein and Schölkopf (2010); Davis et al. (2010); Fletcher (2013); Cornea et al. (2017) investigated such models for the case of more general Riemannian manifolds. Also classical local regression techniques, such as Nadaraya-Watson smoothing and local polynomial smoothing, have been generalized to cover responses that lie on manifolds (Pelletier, 2006; Yuan et al., 2012; Hinkle, Fletcher and Joshi, 2014). In this paper, we extend the scope of these previous approaches and study the regression problem for response variables that are situated on a generic metric space. Due to the absence of rich geometric and algebraic structure, such as the existence of tangent bundles, this problem poses new challenges that go beyond the regression problem for the Euclidean or manifold case.

While regression with metric-space valued responses covers a wide range of random objects and therefore is of intrinsic interest, the literature on this topic so far is quite limited. Existing work includes Faraway (2014), who considered regression for non-Euclidean data by a Euclidean embedding using distance matrices, similar to multidimensional scaling, as well as intrinsic approaches by Hein (2009), who studied Nadaraya-Watson kernel regression for general metric spaces, and by Petersen and Müller (2019)

, who introduced linear and local linear regression for metric-space valued response variables and approached the regression problem within the framework of conditional Fréchet means.

Here we propose a novel regularization approach for nonparametric regression with metric-space valued response variables and a scalar predictor variable. We utilize a total variation based penalty of the fitted function, where we introduce an appropriate modification of the definition of total variation that covers metric-space valued functions in Section 3.1. Specifically, the inclusion of a total variation penalty term in the estimating equation for Fréchet regression leads to a penalized M-estimation approach for metric-space valued data. We refer to the proposed method as total variation regularized Fréchet regression or simply regularized Fréchet regression. While regularized Fréchet regression can be developed for any length space, we focus here primarily on the family of Hadamard spaces; see Section 3.1. This family includes the Euclidean space and forms a rich class of metric spaces that have important practical applications; see Sections 5 and 6 for more details.

Total variation regularization was introduced by Rudin, Osher and Fatemi (1992) for image recovery/denoising. There is a vast literature on this regularization technique from the perspective of image denoising and signal processing; see Chambolle et al. (2010) for a brief introduction and review. From a statistical perspective and for Euclidean data, this method was studied by Mammen and van de Geer (1997) from the viewpoint of locally adaptive regression splines and by Tibshirani et al. (2005), connecting it to the lasso. Recent developments along this line include trend filtering (Kim et al., 2009; Tibshirani, 2014) and total variation regularized regression when predictors are on a tree or graph (Wang et al., 2016; Ortelli and van de Geer, 2018). Extensions to manifold-valued data were investigated by Lellmann et al. (2013)

, although without asymptotic analysis. Total variation penalties were also shown to confer advantages for regression models in brain imaging

(Wang, Zhu and ADNI, 2017). We generalize these approaches to the case of data in a Hadamard space and provide a detailed asymptotic analysis for the proposed regularized Fréchet regression. While the extension of total variation regularization from Euclidean spaces to smooth manifolds is relatively straightforward, as one can take advantage of local diffeomorphisms between manifolds and Euclidean spaces, the generalization to Hadamard spaces, and especially the theoretical analysis, is considerably more challenging.

In Section 2 we show that the total variation regularized Fréchet estimator leads to a metric-space valued step function. The class of step functions is not only sufficiently powerful to approximate any function of finite total variation, but also advantageous in modeling functions that are discontinuous since it incorporates jumps of the function estimates, in contrast to classical smoothing methods that usually assume a smooth underlying regression function. The jump points of the step function are adaptively selected through just one regularization parameter, the penalty term for the total variation penalty. Incorporating jumps or discontinuities is of interest in many applications (Kolar and Xing, 2012; Zhu, Fan and Kong, 2014). Regularized Fréchet regression makes it possible to go beyond Euclidean spaces and to fit metric-space valued functions with jumps, as demonstrated in Section 6.

The structure of the paper is as follows. Total variation regularized Fréchet regression is introduced in Section 2, and asymptotic results are presented in Section 3. Numerical studies for synthetic data are provided in Section 4. In Sections 5 and 6 we illustrate the application of the proposed method to analyze data on the evolution of human mortality profiles using the Wasserstein distance on the space of probability distributions, and to study the dynamics of brain connectivity using task-related fMRI signals and the affine-invariant distance on the space of symmetric positive-definite matrices, respectively.

## 2 Regularized Fréchet Regression with Total Variation

Let be a Hadamard space (for the definition, see Section 3.1) and a random element in , where denotes the distance function on . When is a Euclidean space, which is a special Hadamard space, the expectation or mean of is an important concept to characterize the average location of . For a non-Euclidean metric space, we replace the mean with the Fréchet mean, which is an element of that minimizes the Fréchet function ; in the Euclidean case it coincides with the usual mean. In a general metric space with a given probability measure, the Fréchet mean might not exist, and even when it exists it might not be unique. However, for Hadamard spaces, the Fréchet mean exists and is unique when for some (Bhattacharya and Patrangenaru, 2003; Sturm, 2003; Afsari, 2011; Patrangenaru and Ellingson, 2015).

We consider a sequence of curves on parameterized by an interval . Without loss of generality, we assume throughout. Given independent observations at designated equidistant time points we denote their Fréchet means at the support points by

 μn(tn,i)=argminy∈MEd2(y,Y(tn,i)). (2.1)

We note that the assumption of equal spacing that we adopt here for simplicity is not essential, and the results can be easily extended to the non-equal spaced case.

In the sequel, we suppress indices at , and unless the dependence on needs to be emphasized. Our goal is to obtain a mean curve estimate from the given data pairs

by minimizing the loss function

 Lλ(μ)=1nn∑i=1d2(μ(ti),Yi)+λTV(μ),

where is the total variation of the curve , measured by its length as defined by eq. (3.1) below, and is a regularization parameter. The curve estimate is then

 ^μ∈argminTV(f)<∞Lλ(f), (2.2)

and the following result shows that it has a simple structure.

###### Proposition 1.

For any that minimizes , there is a step function such that for all and .

###### Proof.

It is clear that , where and . Define

 ^μ(t)={~μ(ti),t∈[0,1) and t∈[ti,ti+1),~μ(tn),t=1.

Then for . Also, from the definition, is constant over . Therefore, one finds . ∎

The above proposition shows that one can always choose a step function to minimize the loss function . In the following, we may therefore assume that is a step function. In practice, the regularization parameter can be chosen via cross-validation. For the computational implementation of , we adopt the iterative proximal point algorithm of Weinmann, Demaret and Storath (2014), who also showed that it is convergent for Hadamard spaces; further details are in appendix A.

## 3 Theory

### 3.1 Concepts and Tools from Metric Geometry

To state our main results, which will be presented in Section 3.3, we need to make use of various tools from metric geometry that are briefly reviewed here; additional details can be found in Appendix B, while a more comprehensive treatment is presented in Burago, Burago and Ivanov (2001).

#### Length and Geodesics

For a generic metric space and a closed interval , given a curve parameterized by on , i.e., , and a set consisting of points in , we use the quantity to define the length of , denoted by , which is given by

 |γ|=supP∈PRd(γ,P); (3.1)

here is the collection of subsets of whose cardinality is finite. The metric space is called a length space if , where the infimum ranges over all curves connecting and , i.e., and . A geodesic on is a curve such that for . The metric space is called a unique geodesic space if any pair of points can be connected by a unique geodesic , i.e., , and . The geodesic connecting and is denoted by .

Analogous to triangles in Euclidean spaces, a (geodesic) triangle with vertices on , denoted by , consists of three geodesic segments that connect to , to and to , respectively. A comparison triangle of in the Euclidean space is a triangle on formed by vertices such that , , and , where denotes the Euclidean distance. A geodesic triangle is said to satisfy the inequality if for all , ; see Figure 1 for a graphical illustration. A Hadamard space is a complete space. Note that a space is a unique geodesic space. Also, every Euclidean space is a Hadamard space.

#### Angles

The comparison angle between and at is defined by

 ¯∠pqr=arccosd2(p,q)+d2(p,r)−d2(q,r)2d(p,q)d(p,r). (3.2)

This is utilized to introduce the concept of (Alexandrov) angle between two geodesics and emanating from , which is denoted by or and defined by

 ∠pγη=limsups,t→0¯∠p(γ(s),η(t)).

Note that does not depend on the length of or . For three distinct points on , we define the angle .

### 3.2 Assumptions

For -valued functions and with domain , we will make use of the distances

 dn(f,g)={n−1n∑i=1d2(f(ti),g(ti))}1/2 (3.3)

to quantify the deviations of estimates from the targets. Denote by a class of -valued curves defined on such that if . Equipped with any of the distance functions , is a pseudometric space. Let be a collection of functions with , such that there exists a ball of radius with for all and ; we write .

For a subset of , the minimal number of balls of radius in to cover is denoted by . The covering number depends on , which in turn depends on the metric as per (3.3). Next we consider a family of Hadamard spaces, indexed by the constant , such that for all , and . We require the following regularity assumptions for the space and the Fréchet mean function (2.2).

(H1)

For some universal constants and , and .

We furthermore require the random objects to be uniformly sub-Gaussian, in the following sense, noting that the following assumption is not needed whenever the diameter of the space is bounded.

(H2)

There exist constants and such that

 E[exp{βdn(μn(tn,i),Yn,i)}]≤ζ<∞,

i.e., the random variables

are uniformly sub-Gaussian for all and .

To study the asymptotic properties of the estimate given in (2.2), we consider the distances , where is as in (2.1). The following auxiliary result will be useful.

###### Lemma 1.

Suppose (H1) and (H2) hold and that there exist universal constants and a functional with the following properties:

1. for all ,

2. for all , and ,

3. for all ,

4. the function is Lipschitz continuous with a Lipschitz constant no larger than for all and .

Then, with the choice , it holds that uniformly over the family and the classes indexed by , in the sense that

 limD→∞limsupn→∞supF∈FPF{dn(^μ,μn)>Dn−1/3}=0,

where denotes the probability measure induced by , and is the collection of probability distributions on that satisfy conditions (H1) and (H2).

To gain intuition about this result, first consider the Euclidean space, which is a special Hadamard space. Then can be taken as , where denotes the angle at in the triangle formed by the points , and the quantity in (2) of Lemma 1 is the projection of onto the line determined by and ; see Figure 2. The inequality in (3) of Lemma 1 concerns the convexity of the distance function with respect to the functional . With , the inequality becomes an equality and corresponds to the law of cosines. The last condition imposes Lipschitz continuity on and can be easily checked for . Further discussion of this lemma in the context of a general Hadamard space follows in the next subsection.

### 3.3 Main Results

In the following, we omit indices at , and to the extent possible. When is a Hadamard manifold, the Riemannian logarithmic map at , denoted by , is well defined for each point . By Toponogov’s comparison theorem, one has . In particular,

 d2(q,Yi)≥ ∥Logμ(ti)q−Logμ(ti)Yi∥2 ≥ ∥Logμ(ti)Yi∥2−2⟨Logμ(ti)q,Logμ(ti)Yi⟩+∥Logμ(ti)q∥2 = d2(μ(ti),Yi)−2d(μ(ti),q)d(μ(ti),Yi)cos∠(Logμ(ti)q,Logμ(ti)Yi) +d2(μ(ti),q).

Upon setting , the above inequality shows that the conditions (1) and (3) of Lemma 1 are satisfied with . In addition, according to Theorem 2.1 of Bhattacharya and Patrangenaru (2003), and hence for all in the tangent space at . This implies that for all . Therefore, condition (2) of Lemma 1 is also satisfied. Applying Lemma 3 and Corollary II.1A.7 of Bridson and Häfliger (1999) that asserts the equality of the (Alexandrov) angle and the angle

between the two tangent vectors

and , one can establish condition (4) of Lemma 1, and the conclusion of the lemma holds.

For a general Hadamard space that is not a Riemannian manifold, the above argument based on tangent spaces and Riemannian logarithmic maps no longer applies. A key observation that makes it possible to go beyond manifolds is the following characterization of Fréchet means of random objects in a Hadamard space.

###### Proposition 2.

Let be a random element with the Fréchet mean on a Hadamard space . Then for all .

For a general Hadamard space, one has

 d2(q,r)≥d2(p,q)+d2(p,r)−2d(p,q)d(p,r)cos∠p(¯¯¯¯¯pq,¯¯¯¯¯pr)

for all , where we recall that is the (Alexandrov) angle between the geodesics and . In particular,

 d2(q,Yi)≥ d2(μ(ti),Yi)+d2(μ(ti),q) −2d(μ(ti),q)d(μ(ti),Yi)cos∠μ(ti)(¯¯¯¯¯¯¯¯¯¯¯¯¯¯μ(ti)q,¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯μ(ti)Yi).

This suggests the choice

 L(μ(ti),Yi,q)=cos∠μ(ti)(¯¯¯¯¯¯¯¯¯¯¯¯¯¯μ(ti)q,¯μ(ti)Yi)≡∠μ(ti)(Yi,q),

where can be interpreted as the projection of the geodesic/direction onto the geodesic determined by and ; see Figure 3. Then Proposition 2 implies that for all . Therefore, conditions (1)–(3) of Lemma 1 are satisfied with . Verification of the Lipschitz condition (4) is nontrivial for a general Hadamard space. Using various properties of Hadamard spaces, we show in Lemma 2 that condition (4) in Lemma 1 is still satisfied with for a general Hadamard space. The following theorem summarizes our discussion of the convergence of the total variation regularized estimator . Its proof follows from the above considerations and is therefore omitted.

###### Theorem 3.1.

If conditions (H1) and (H2) are satisfied, then with , one has

 limD→∞limsupn→∞supF∈FPF{dn(^μ,μn)>Dn−1/3}=0,

where denotes the probability measure induced by , and is the collection of probability distributions on that satisfy conditions (H1) and (H2).

When is the one-dimensional Euclidean space , Donoho and Johnstone (1998) showed that the minimax rate is for the class of uniformly bounded variation; see also Sadhanala, Wang and Tibshirani (2016). Since contains the one-dimensional Euclidean space for the same class of functions, the rate in the above theorem is also the minimax rate for the family ; our result is thus a generalization of the minimax result of Donoho and Johnstone (1998) to Hadamard spaces.

In the following, we discuss two important examples that will also be further investigated in simulations and data applications.

###### Example 1 (Symmetric Positive-Definite Matrices).

Symmetric positive-definite (SPD) matrices as random objects arise in many applications that include computer vision

(Rathi, Tannenbaum and Michailovich, 2007), medical imaging (Dryden, Koloydenko and Zhou, 2009) and neuroscience (Friston, 2011)

. For example, diffusion tensor imaging, which is commonly used to obtain brain connectivity maps based on MRI, produces

SPD matrices that characterize the local diffusion (Zhou et al., 2016). The space of SPD matrices, denoted by , is a convex subset of the linear space of symmetric matrices. The Euclidean distance function that is based on the Frobenius norm suffers from the so-called swelling effect: The determinant of the average SPD matrix is larger than any of the individual determinants (Arsigny et al., 2007). Rectifying this issue motivates the use of more sophisticated distance functions, such as the Log-Euclidean distance (Arsigny et al., 2007) or the affine-invariant distance (Moakher, 2005, 2006; Pennec, Fillard and Ayache, 2006), where is the matrix logarithm of . The space equipped with either of these distance functions is a Hadamard space with curvature bounded from below. For the function class of -valued functions with bounded variation, using Proposition 4 in Appendix E, one finds that conditions (H1) holds for and . Therefore, the rate in Theorem 3.1 applies to this case.

###### Example 2 (Wasserstein space W2(R)).

Let be the space of probability distributions on the real line , equipped with the Wasserstein distance , where and

are the (left continuous) quantile functions corresponding to distribution functions

and . According to Proposition 4.1 of Kloeckner (2010), is a CAT(0) space. As inherits the completeness of , is also a Hadamard space. We illustrate the utility of for data analysis in a study of mortality profiles in Section 5. For the function class of Lipschitz continuous -valued functions defined on , using Proposition 3 in the Appendix and the fact that according to Theorem 2.7.5 of Van der Vaart and Wellner (1996), we can establish condition (H1), and since the space is bounded, the rate in Theorem 3.1 applies.

## 4 Simulation Studies

We consider two metric spaces in our simulation studies, namely, the SPD matrix space endowed with the affine-invariant distance in Example 1 with , and the Wasserstein space in Example 2. For each of these metric spaces, two settings are examined. In the first setting, the underlying mean functions are locally constant, while in the second setting, they smoothly vary with . Further details are given in Table 1. The first setting represents a favorable scenario for total variation regularized Fréchet regression, since the estimator is also locally constant, while the second setting is more challenging.

For each setting, we investigated two sample sizes, and for the design points . For the SPD matrix space, data were generated as with , where is as in Table 1, is a symmetric matrix and is its vector representation, i.e., the -dimensional vector obtained by stacking elements in the lower triangular part of , and denotes the identity matrix.

For the Wasserstein space, we adopted the method in Petersen and Müller (2019) to generate observations , as follows. Let and for , where again the distributions are as listed in Table 1 for the Wasserstein case. We then first sample and , with shape parameter and rate parameter . Note that and . Then is obtained by transporting the distribution by a transport map that is uniformly sampled from the collection of maps for . Note that

is not a Gaussian distribution due to the transportation. Nevertheless, one can show that the Fréchet mean of

is exactly .

The regularization parameter is chosen by five-fold cross-validation. The results are based on 100 Monte Carlo runs. The estimation quality of is quantified by the root integrated squared error (RISE)

 RISE(^μ)={∫Td2M(^μ(t),μ(t))dt}1/2.

The results in Table 2 indicate that as sample size grows, the estimation error decreases in both the favorable setting and the challenging setting. Moreover, we observe that the decay rate of the empirical RISE in the table, defined as the ratio of the RISE with and the RISE with , is approximately 0.62. This seems to agree rather well with our theory in Section 3 that suggests a rate of .

## 5 Application to Mortality

We apply the proposed method to analyze the evolution of the distributions of age-at-death using mortality data from the Human Mortality Database (www.mortality.org). The database contains yearly mortality for 37 countries, grouped by age from 0 to 110+. Specifically, the data provide a lifetable with a discretization by year, which can be easily converted into a histogram of age-at-death, one for each country and calendar year. Starting from these fine-grained histograms, a simple smoothing step then leads to the density function of age-at-death for a given country and calendar year. We focus on the adult (age 18 or more) mortality densities of Russia and the calendar years from 1959 to 2014. The time-indexed densities of age-at-death are shown in the form of a heat map in Figure 4(a) for males and Figure 5(a) for females, respectively. The patterns of mortality for males and females are seen to differ substantially.

Applying the proposed total variation regularized Fréchet regression for distributions as random objects with the Wasserstein distance to these data, we employ a fine grid on the interval and use five-fold cross validation to select the regularization parameter . The selected values are and for males and females, respectively. The resulting estimates are shown in Figure 4(b) and Figure 5(b), respectively.

The results suggest that the proposed total variation regularized Fréchet estimator adapts to the smoothness of the target function. For example, the female mortality dynamics seems relatively smooth, and the estimator is also rather smooth. In contrast, male age-at-death distributions exhibit sharp shifts; the proposed estimator reflects this well and preserves the discontinuities in the mortality dynamics. This demonstrates desirable flexibility of total variation penalized Fréchet regression, as it appropriately reflects relatively smooth trajectories, while at the same time preserving edges/boundaries when present. This flexibility has been documented previously for the Euclidean case (Strong and Chan, 2003), and is shown here to extend to the much more complex case of metric-space valued data.

Specifically, a major shift in mortality distributions occurred around 1992 and is well represented in the estimates for both males and females, with a much larger shift for males. The direction of the shift was towards increased mortality for both males and females, as the age-at-death distributions moved left, implying increased mortality at younger ages. A weaker shift that occurred in 2008 is also captured by the estimator for both males and females, and again is more expressed for males. This latter shift was towards decreased mortality.

These findings pinpoint a period from 1992-2008, during which the turmoil following the collapse of the Soviet Union 1988-1991 appears to have had devastating impacts on mortality. The strong shift in 1992 is relatively easy to explain with social ills such as increased alcoholism and joblessness that followed the collapse of the Soviet Union and it affected males more than females. The timing coincides with the early phase of a long economic decline in Russia that lasted for 11 years, 1989-1999. The somewhat weaker second shift around 2008 towards improved mortality is harder to explain. It might be a by-product of a long-term upwards trend in the Russian economy after 1999, which led to substantial declines in the poverty rate, and might signal a demographic end to the turmoil caused by preceding massive political and societal changes.

## 6 Application to Functional Connectivity

We applied the proposed total variation regularization method for metric random objects also to data on functional connectivity in the human brain from the Human Connectome Project that were collected between 2012 and 2015. Out of 970 subjects in the study, for subjects social cognition task related fMRI data are available. In this study, each participant was sequentially presented with five short video clips while in a brain scanner, which recorded a fMRI signal. Each clip showed squares, circles and triangles that either interacted in a certain way or moved randomly. The fMRI signals were recorded at 274 time points spaced 0.72 seconds apart. The starting time for the five video clips is approximately at time points 11, 64, 117, 169 and 222, respectively, while the ending time is approximately at time points 39, 92, 144, 197 and 250, respectively, so there are 10 time points where the nature of the visual input is changing for the study participants. A natural question is then whether changes in brain connectivity, as quantified by fMRI signals, are associated with the above time points that indicate changes in visual input.

We divided the brain into 68 regions of interest based on the “Desikan-Killiany” atlas (Desikan et al., 2006) and picked eight possible regions that are related to social skills, i.e., the left and right part of superior temporal, inferior parietal, temporal pole and precuneus (Green, Horan and Lee, 2015). The dynamics of functional connectivity for each subject is represented by the changing nature of the cross-covariance between these eight regions, computed by a moving local window that includes time points. Specifically, denoting by the vector of the BOLD (blood-oxygen-level dependent) fMRI signals of the th subject at the th time point, the connectivity at is computed by

 Σij=1Pj+P−1∑k=j−P(Vik−¯Vij)(Vik−¯Vij)Twith¯Vij=1Pj+P−1∑k=j−PVik.

We set and found that the results are not sensitive to the choice of as long as it is within the reasonable range . This led to a sequence of 243 time-indexed covariance (symmetric positive definite, SPD) matrices for each subject.

As we focus on the population dynamics of functional connectivity, we then computed for each time point the empirical mean SPD by

 ~Σj=argminΣ∈Sym+⋆(8)1850850∑i=1d2(Σ,Σij),

where is the affine-invariant distance (Moakher, 2005) on . This sequence then constitutes the observed time-indexed random objects and is depicted in Figure 6(a). For illustration purposes, in Figure 6 each SPD matrix is vectorized into an (taking symmetry into account) dimensional vector represented by a row in the heat map, indicating the relative values of the vector elements.

This mean SPD random object sequence is quite noisy and does not clearly indicate whether the mean brain connectivity changes in accordance with the transition points of the visual input as described above. Thus, to gain insight whether the pattern of brain connectivity follows the pattern of visual inputs, it is necessary to denoise these data. Assuming constant brain connectivity while the visual input is constant (video on or off), this motivates the fitting of locally constant functions with a few knots for SPD random objects and thus the application of the proposed total variation penalized Fréchet regression. This is due to the fact that the proposed estimator can be viewed as a locally constant function in time with adaptive knot placement, mapping time into metric space, in our case the space of SPD matrices.

It then seems reasonable to assume that the underlying dynamics of the brain connectivity for subjects in this study, represented by , is piece-wise constant with 11 steps where the function is constant, or equivalently, locations where a discontinuity occurs when the video input moves from on to off or off to on, where such a transition occurs 10 times during the scan. We refer to these locations as jump points, and they can be formally identified by , where . While is estimated by , the locations in can be estimated by

 K(^μ)={^τn,1<⋯<^τn,J},{^τn,1,…,^τn,J}⊂{tn,1,…,tn,n}. (6.1)

In Appendix D, we show that these jump points can be estimated with a rate of .

When applying total variation regularized Fréchet regression, one has to select the regularization parameter . Generally, we recommend to use cross-validation. However, in the particular application at hand, since we may assume that is known, one can simply choose the smallest value of that yields jumps of . As is used to compute the empirical sequence , the sequence does not contain sufficient information about the start time point of the first video clip, which is . Therefore, we target and choose the smallest value of that yields .

Practically, we performed the proposed total variation regularization for the SPD case on the sequence for different choices of the regularization parameter on a fine grid within the interval . Panels (b) – (i) in Figure 6 display the resulting estimates. For each panel, the minimal value of the regularization parameter was chosen so that the number of jump points ranged from 9 (smaller ) to 2 (larger ), respectively. From Figure 6(b), where one has 9 jump points of , we find that the detected jump points closely matches the times when the videos clips started and ended, with the exception of time points 11 and 250, which is due to insufficient data between these first and last events and the respective boundaries, and the event at time point 197, which is split into two jump points, at time points 181 and 202. As increases, the number of jump points of the estimates decreases.

In panel (i) of Figure 6, there are only two jump points left, at time points 42 and 64. This suggests that changes in the fMRI signal caused by early events are more pervasive than those at later events, which is also in line with the fact that the video transition at time point 197 gave rise to two estimated jump points, located slightly before and after. These findings might be due to a stronger brain reaction to the stimulus when the video clip is presented early on in the recording sequence, with subsequent attenuation.

This example demonstrates that changing the penalty can be used as a tool to determine a hierarchy of jump points with the more pronounced jump points persisting even when large penalties are applied. Remarkably, the location of the estimated jump points is hardly affected by the size of the penalty in this example. This finding is in line with the result in Theorem D.1, which indicates that if there is a jump point in the underlying true function, it will be consistently estimated with a fast rate of convergence.

## 7 Concluding remarks

The modeling of functions that take values in Hadamard spaces is the focus of this paper, and such functions are encountered in many applications, as our examples demonstrate. Nevertheless, there are also metric-space valued data that are relevant in some applications but do not reside in a Hadamard space, such as time-indexed compositional data or flight paths modeled on spheres (Dai and Müller, 2018). Such data can be viewed as residing in Alexandrov spaces, which are metric spaces with a lower positive bound on curvature. It is important to note that the proposed total variation regularized Fréchet regression also is theoretically supported for the case of Alexandrov spaces. While a comprehensive treatment of total variation penalized Fréchet regression on Alexandrov spaces is beyond the scope of this paper, we show in Appendix C that the principle of total variation regularization can be adopted for such spaces, with theoretical guarantees, thus broadening the applicability of the proposed methodology further; however, there is currently no guarantee for the convergence of the algorithm described in Appendix A for the case of Alexandrov spaces, and this issue is left for future research.

In Section 6 we briefly discussed the potential of the proposed method to model jump points in functions that take values in Hadamard spaces. This approach is particularly useful if one deals with discontinuities for which the number of such discontinuities in the data is known, as is the case in the example concerning brain connectivity. Connections to the much wider topic of change-points and their estimation and inference are beyond the scope of this paper.

## Appendix A Computational Details

To compute the total variation regularized estimator defined in (2.2), we adopt a simplified version of the cyclic proximal point algorithm proposed by Weinmann, Demaret and Storath (2014). To find the step function estimator according to Proposition 1, noting that , it is sufficient to compute for . This is achieved by minimizing the function

 ~Lλ(p1,…,pn)=12n∑i=1d2(pi,Yi)+nλ2n−1∑j=1d(pj,pj+1)

over the product space . For and , the family of proximal mappings of is defined by

 proxαGp=argminq∈Mn(αG(q)+n2d2n(p,q)),

where is a parameter and . It is easy to check that the th component of is with , where denotes the point sitting on the geodesic segment connecting and that satisfies .

For the proximal mappings of the function , given by

 proxαHjp=argminq∈Mn(αHj(q)+n2d2n(p,q)),

one finds that if , then the th component of is equal to . It is shown in Weinmann, Demaret and Storath (2014) that the th component of is given by , while the th component is , where and that the algorithm converges to the minimizer of for Hadamard spaces.

The computational details are summarized in Algorithm 1, where the symbol denotes the assignment or update operator, evaluating the expression on the right hand side and then assigning the value to the variable on the left hand side. The algorithm is shown to be convergent for Hadamard spaces (Weinmann, Demaret and Storath, 2014).

## Appendix B Curvature, Directions and Derivatives

#### Curvature

To discuss curvature of a given metric space, a standard approach is to compare geodesic triangles on the metric space to those on the following reference spaces :

• When , with the standard Euclidean metric;

• When , is the hyperbolic space with the hyperbolic distance function