 # A Riemannian Corollary of Helly's Theorem

We introduce a notion of halfspace for Hadamard manifolds that is natural in the context of convex optimization. For this notion of halfspace, we generalize a classic result of Grünbaum, which itself is a corollary of Helly's theorem. Namely, given a probability distribution on the manifold, there is a point for which all halfspaces based at this point have at least $\frac{1}{n+1}$ of the mass. As an application, the gradient oracle complexity of convex optimization is polynomial in the parameters defining the problem.


## 1 Overview

### 1.1 Introduction

The extrema of functions are of fundamental importance in mathematics and its applications. Much of numerical optimization studies this topic. Most of the theory focuses on convex functions, as it has proven hard to find other classes that are both useful and tractable. The motivation for this paper comes from the desire to expand the boundaries of this class of tractable functions.

Rigorous study of convergence rates for first-order methods for convex functions on Hadamard manifolds, that is, gradient descent methods on complete, simply connected manifolds of nonpositive sectional curvature, was initiated by Zhang and Sra. Such manifolds are diffeomorphic to $\mathbb{R}^n$ and exhibit natural convex functions, so in a sense they give new classes of functions for which optimization is tractable.

Still, as far as the author is aware, all known algorithms for general convex optimization on Riemannian manifolds have iteration complexity depending polynomially on $1/\epsilon$. To achieve better convergence rates, further conditions are added, such as strong convexity, dominated gradients, or, more recently, robust second-order assumptions. One major unresolved question for interesting Hadamard manifolds is whether convexity enables algorithms whose time complexity depends polynomially on $\log(1/\epsilon)$.

For Euclidean optimization, cutting plane methods are the standard, general approach to obtain complexity polynomial in $\log(1/\epsilon)$. They are based on reducing the feasible set by halfspace cuts indicated by the gradient of the convex function. An important feature of this approach is that Euclidean convex sets have what are commonly termed centerpoints; roughly speaking, all hyperplanes through a centerpoint are approximately balanced. Ellipsoid methods explicitly maintain a centrally symmetric set, so hyperplanes through its center are exactly balanced. More generally, Grünbaum’s result shows that by computing the gradient at a centerpoint, we may reduce the volume of the feasible set by a factor of $1-\frac{1}{n+1}$.

Here, we replicate the result of Grünbaum in the more general setting of Hadamard manifolds, in the hope that others find the result encouraging, useful, or intrinsically interesting. The main result, proved in Section 2, is the following.

###### Theorem 1.

Suppose a subset $S$ of a Hadamard manifold $M$ is convex and compact, and $p$ is a probability distribution on $S$ that is absolutely continuous with respect to the Riemannian volume measure. Then there exists a $\frac{1}{n+1}$-centerpoint for the measure $p$.

This leads to a bound on the number of gradient calls needed to optimize a function.

###### Theorem 2.

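Suppose a subset $S$ of a Hadamard manifold $M$ is convex and compact, and $f:S\to\mathbb{R}$ is a convex $L$-Lipschitz function. Additionally assume the optimum $x^*$ is in the $\frac{\epsilon}{L}$-interior of $S$, in that $b_{x^*}(\epsilon/L)\subseteq S$. Then it is possible to find a point $x\in S$ such that $f(x)-f(x^*)\le\epsilon$ using $O\!\left(n\log\frac{m(S)\,n^nL^n}{\epsilon^n}\right)$ gradient calls.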

### 1.2 Definitions and Notation

In this section we create the definitions needed to frame the problem and results. Only basic notions of Riemannian geometry are needed in this paper; these are surveyed in Appendix A, with the present section mainly providing non-standard or less common definitions.

For the remainder of this paper, we are generally interested in triples $(M,g,p)$. $M$ is always an $n$-dimensional Hadamard manifold with Riemannian metric $g$, whose inner product at $x\in M$ we denote by $\langle\cdot,\cdot\rangle_x$. We additionally work with a probability measure $p$ defined on $M$; usually it is the normalized Riemannian volume measure restricted to a subset. The metric produces geodesics, which are locally distance-minimizing paths and are the analog of straight lines in Euclidean space. The exponential map $\exp_x$ at $x$ follows geodesics starting at $x$. As explained in the Riemannian geometry overview (Appendix A), $\exp_x$ is a diffeomorphism from $T_xM$ to $M$ when $M$ is a Hadamard manifold.

###### Definition 3.

A function $f:S\to\mathbb{R}$ is convex on its convex domain $S\subseteq M$ if its restrictions to geodesics are convex.

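As explained in Appendix A, for differentiable functions, this is equivalent to requiring that for all $x,y\in S$,

$$f(y)\ge f(x)+t\,\langle\nabla f(x),\hat v\rangle_x,$$

where $y=\exp_x(t\hat v)$ for the unit-length tangent vector $\hat v\in T_xM$ and $t=d(x,y)$. We have adopted the notation $\nabla f(x)$ for the Riemannian gradient of $f$ at $x$.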

Our primary object of interest is a halfspace.

###### Definition 4.

An open halfspace based at $x\in M$ is formed by applying $\exp_x$ to a halfspace of $T_xM$:

$$H_x(\hat v)=\{\exp_x(v)\mid v\in T_xM,\ \langle v,\hat v\rangle_x<0\}.$$
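To make Definition 4 concrete, here is a small sketch in the hyperboloid model of the hyperbolic plane $\mathbb{H}^2$, a constant-curvature Hadamard manifold chosen only for illustration. The helper names (`mink`, `lift`, `log_map`, `in_halfspace`) are hypothetical; membership of $y$ in $H_x(\hat v)$ is tested by computing $\exp_x^{-1}(y)$ and checking the sign of its inner product with $\hat v$. In constant curvature these halfspaces happen to be convex (they are bounded by a geodesic through $x$), so this example does not exhibit the general non-convexity.

```python
import math


def mink(a, b):
    """Minkowski form <a, b>_L = -a0*b0 + a1*b1 + a2*b2 (ambient form of the hyperboloid)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lift(u1, u2):
    """Lift (u1, u2) in R^2 to the sheet <p, p>_L = -1, p0 > 0, which models H^2."""
    return (math.sqrt(1.0 + u1 * u1 + u2 * u2), u1, u2)


def log_map(x, y):
    """Riemannian logarithm exp_x^{-1}(y), a tangent vector at x (so <x, v>_L = 0)."""
    a = mink(x, y)                                   # equals -cosh d(x, y)
    dist = math.acosh(max(1.0, -a))
    w = tuple(yi + a * xi for xi, yi in zip(x, y))   # remove the component of y along x
    norm = math.sqrt(max(mink(w, w), 0.0))
    if norm == 0.0:                                  # y == x
        return (0.0, 0.0, 0.0)
    return tuple(dist * wi / norm for wi in w)


def in_halfspace(x, v_hat, y):
    """True iff y lies in the open halfspace H_x(v_hat) = {exp_x(v) : <v, v_hat>_x < 0}."""
    return mink(log_map(x, y), v_hat) < 0.0


x = lift(0.0, 0.0)          # base point (1, 0, 0)
v_hat = (0.0, 1.0, 0.0)     # unit tangent vector at x
print(in_halfspace(x, v_hat, lift(-1.0, 0.3)))   # True
print(in_halfspace(x, v_hat, lift(1.0, 0.3)))    # False
```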

Although such halfspaces are not convex sets in the general setting of Hadamard manifolds, they arise naturally as “cutting planes” for convex functions. This notion of cutting is justified by the following,

###### Lemma 5.

Consider a convex function $f:S\to\mathbb{R}$, where $S$ is a convex subset of a Hadamard manifold $M$. Then for any $x\in S$, the minimum of $f$ over $S$, if it exists, is attained either at $x$ or within $S\cap H_x(\nabla f(x))$.

###### Proof.

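If $y\in S\setminus H_x(\nabla f(x))$, the corresponding $v=\exp_x^{-1}(y)$ satisfies

$$\langle v,\nabla f(x)\rangle_x\ge 0,$$

and by Lemma 15 we have

$$f(y)\ge f(x)+\langle\nabla f(x),v\rangle_x\ge f(x).$$

Hence no point of $S$ outside $H_x(\nabla f(x))$ has a smaller value than $x$.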
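Lemma 5 can be sanity-checked numerically in the flat case $M=\mathbb{R}^2$, where $\exp_x(v)=x+v$ and $H_x(\hat v)=\{y:\langle y-x,\hat v\rangle<0\}$. The sketch below uses a hypothetical smooth convex test function and a grid of sample points; it counts points that improve on $x$ yet lie outside $H_x(\nabla f(x))$, which convexity says should never happen.

```python
import math


def f(y):
    """A hypothetical smooth, strictly convex test function on R^2."""
    return (y[0] - 1.0) ** 2 + 2.0 * (y[1] + 0.5) ** 2 + math.exp(y[0] + y[1])


def grad_f(y):
    e = math.exp(y[0] + y[1])
    return (2.0 * (y[0] - 1.0) + e, 4.0 * (y[1] + 0.5) + e)


def cut_violations(x, grid_half_width=20, step=0.1):
    """Count grid points y with f(y) < f(x) that are NOT in the open halfspace H_x(grad f(x))."""
    g = grad_f(x)
    fx = f(x)
    bad = 0
    for i in range(-grid_half_width, grid_half_width + 1):
        for j in range(-grid_half_width, grid_half_width + 1):
            y = (i * step, j * step)
            inner = (y[0] - x[0]) * g[0] + (y[1] - x[1]) * g[1]
            if f(y) < fx and inner >= 0.0:
                bad += 1
    return bad


print(cut_violations((0.35, 0.15)))   # 0
```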

Cutting plane methods need to find a point for which no halfspace based at that point has too much volume. This can be captured through the notion of a centerpoint.

###### Definition 6.

A $\beta$-centerpoint of the probability measure $p$ on a manifold $M$ is a point $c\in M$ such that

$$p(H_c(\hat v))\le 1-\beta$$

for all nonzero $\hat v\in T_cM$.
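For a finite sample, the quantity in Definition 6 can be estimated in the flat case $\mathbb{R}^2$ (where $H_c(\hat v)=\{z:\langle z-c,\hat v\rangle<0\}$) by scanning finitely many directions. This is a rough sketch under that assumption: the function name is hypothetical, and because only `num_dirs` directions are checked, the returned $\beta$ can slightly overestimate the true value.

```python
import math


def centerpoint_beta(c, points, num_dirs=360):
    """Estimate the largest beta with p(H_c(v)) <= 1 - beta for the empirical measure on `points`.

    The supremum over all unit directions v is approximated by a maximum over
    num_dirs evenly spaced directions, so beta may be overestimated.
    """
    worst = 0.0
    for k in range(num_dirs):
        theta = 2.0 * math.pi * k / num_dirs
        v = (math.cos(theta), math.sin(theta))
        inside = sum(1 for z in points
                     if (z[0] - c[0]) * v[0] + (z[1] - c[1]) * v[1] < 0.0)
        worst = max(worst, inside / len(points))
    return 1.0 - worst


pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(centerpoint_beta((0.0, 0.0), pts))   # 0.5, so (0, 0) is a 1/3-centerpoint
print(centerpoint_beta((2.0, 0.0), pts))   # 0.0, the point lies outside the sample
```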

Finally, a few notations that recur.

• $b_x(r)$ is the open ball of Riemannian radius $r$ based at $x$

• $m$ denotes the Riemannian volume

### 1.3 Overview and Conclusion

The remainder of this paper is organized as follows:

• Section 2 analyzes the existence of centerpoints on Hadamard manifolds.

• Section 3 presents the brief application of the above to upper bound gradient oracle complexity.

• Appendix A recalls the relevant notions of Riemannian geometry, providing references.

To be clear, the problem of constructing an efficient optimization procedure is far from resolved. However, the interesting consequence of Helly’s theorem does carry over to the more general setting of Hadamard manifolds, showing that in some sense there is no information-theoretic obstacle to developing a Hadamard manifold analog of cutting plane methods.

We hope our main result is of interest and encourages others to study centerpoints in the manifold setting. Targeting optimization procedures, we believe focusing on the spaces would be of greatest interest, both for theory and applications. Speaking informally, computing a centerpoint from a discrete point set would be a notable advancement. It would also be useful to be able to sample from the Riemannian volume restricted to a convex subset.

## 2 Existence of Centerpoints

Helly’s theorem is not classically part of cutting plane methods in $\mathbb{R}^n$, because the centroid is already an approximately optimal point at which to call the separation oracle. However, all proofs of this property of the centroid that we are aware of critically use the Brunn-Minkowski theorem, whose manifold analogs do not seem suitable for this application. Helly’s theorem, by contrast, carries over to situations in which the distance function is convex: Ledyaev, Treiman, and Zhu, as well as Ivanov, have proofs that amount to the following.

###### Theorem 7.

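Let $M$ be an $n$-dimensional Riemannian manifold of nonpositive sectional curvature. Suppose that there exists a convex compact set $S\subseteq M$ with a family $\{C_\alpha\}_{\alpha\in A}$ of closed convex subsets of $S$. Then if $C_{\alpha_1}\cap\cdots\cap C_{\alpha_{n+1}}\neq\emptyset$ for any $n+1$ sets of the family, it follows that $\bigcap_{\alpha\in A}C_\alpha\neq\emptyset$.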

Ivanov’s paper actually proves this result for more general geodesic spaces.

Because the halfspaces of Definition 4 are convex only in special situations, this generalization of Helly’s theorem has limited direct applicability. The remainder of this section proves a result that could be considered a Riemannian variant of a well-known corollary of Helly’s theorem, whose classical proof relies upon the convexity of halfspaces. The result can be found in Grünbaum’s paper, which we summarize as follows.

###### Lemma 8.

A $\frac{1}{n+1}$-centerpoint exists for any compactly supported probability measure on $\mathbb{R}^n$, endowed with the usual Borel $\sigma$-algebra.

To generalize this result, we rely on a few simple regularity properties of sets of Euclidean centerpoints, which we now collect. In the following lemma, the halfspaces are Euclidean halfspaces, $H_y(\hat v)=\{z\in\mathbb{R}^n\mid\langle z-y,\hat v\rangle<0\}$, and $d_H$ is the Hausdorff distance.

###### Lemma 9.

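Let $\{p_x\}_{x\in X}$ be a family of compactly supported probability measures on $\mathbb{R}^n$ that are absolutely continuous with respect to Lebesgue measure. Assume the index set $X$ is a compact metric space, and the measures vary continuously in $x$ with respect to total variation distance. Define the Euclidean centrality function

$$G(x,y):=\sup_{\hat v\in S^{n-1}}p_x(H_y(\hat v)).$$

Then $G$ is continuous, and $G(x,\cdot)$ is a quasi-convex function for each fixed $x$. Also define the set of Euclidean $\frac{1}{n+1}$-centerpoints of $p_x$,

$$U_x:=\{y\in\mathbb{R}^n\mid G(x,y)\le 1-\tfrac{1}{n+1}\}.$$

Then $d_H(U_x,U_{x'})\to 0$ as $x'\to x$.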

###### Proof.

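Fix $x$ and a threshold $c$. For each $\hat v$, the set $\{y: p_x(H_y(\hat v))\le c\}$ is a halfspace (possibly empty or all of $\mathbb{R}^n$). Indeed, $p_x(H_y(\hat v))$ depends on $y$ only through $\langle y,\hat v\rangle$ and is nondecreasing in it, so the previous set is precisely the set of points on one side of a hyperplane with normal $\hat v$. Therefore the intersection over all $\hat v$, which is $\{y: G(x,y)\le c\}$, is a convex set. This shows that the sublevel sets of $G(x,\cdot)$ are convex, which is the definition of quasi-convexity.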

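To prove continuity, we may assume the domain of $G$ is a compact set $X\times K$. It is easy to see that $g(x,y,\hat v):=p_x(H_y(\hat v))$ is a continuous function on its domain $X\times K\times S^{n-1}$, hence uniformly continuous. Therefore, given $\epsilon>0$, choose $\delta>0$ so that when $d(x,x')<\delta$ and $|y-y'|<\delta$, then

$$|g(x,y,\hat v)-g(x',y',\hat v)|<\epsilon\quad\text{for all }\hat v\in S^{n-1}.$$

By compactness of $S^{n-1}$, the supremum defining $G(x,y)$ is attained at some $\hat v_{x,y}$, and therefore

$$G(x,y)-G(x',y')=g(x,y,\hat v_{x,y})-g(x',y',\hat v_{x',y'})>g(x,y,\hat v_{x,y})-\big(g(x,y,\hat v_{x',y'})+\epsilon\big)\ge-\epsilon.$$

Switching roles gives the reverse inequality, $G(x',y')-G(x,y)>-\epsilon$, which proves continuity.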

Recall that the Hausdorff distance between two sets is the larger of the two maximal distances from a point of one set to the other set. For the final claim, the alternative is that there exist $x_i\to x$ and points $y_i\in U_{x_i}$ that are bounded away from $U_x$. Compactness implies an accumulation point $y$ of the $y_i$. Continuity of $G$ requires $G(x,y)\le 1-\frac{1}{n+1}$, because each $G(x_i,y_i)\le 1-\frac{1}{n+1}$; thus $y\in U_x$. This contradicts the premise that the $y_i$ are bounded away from $U_x$.
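In dimension $n=1$ the objects of Lemma 9 are easy to compute, which may help build intuition. Under the illustrative assumption that $p_x$ is the uniform distribution on $[x,x+1]$ (a hypothetical family that varies continuously in total variation), the two halfspaces at $y$ are $\{z<y\}$ and $\{z>y\}$, so $G(x,y)=\max(F_x(y),1-F_x(y))$ with $F_x$ the CDF, and $U_x$ is the set of medians. The sketch also checks quasi-convexity of $G(x,\cdot)$ on a grid; all function names are illustrative.

```python
def cdf(x, y):
    """CDF at y of the hypothetical measure p_x = Uniform[x, x + 1]."""
    return min(1.0, max(0.0, y - x))


def G(x, y):
    """1-D centrality function: the larger of p_x({z < y}) and p_x({z > y})."""
    F = cdf(x, y)
    return max(F, 1.0 - F)


def in_U(x, y, n=1):
    """Is y a Euclidean 1/(n+1)-centerpoint of p_x? (For n = 1 this means a median.)"""
    return G(x, y) <= 1.0 - 1.0 / (n + 1)


def is_quasiconvex_on_grid(values):
    """Nonincreasing up to the (first) minimum, then nondecreasing."""
    i = values.index(min(values))
    return (all(values[j] >= values[j + 1] for j in range(i)) and
            all(values[j] <= values[j + 1] for j in range(i, len(values) - 1)))


print(in_U(0.0, 0.5), in_U(0.0, 0.6), in_U(2.0, 2.5))   # True False True
print(is_quasiconvex_on_grid([G(0.0, k / 10) for k in range(-10, 21)]))   # True
```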

We are now ready for our main result.

###### Proposition 10.

Let $p$ be a probability measure on a convex and compact subset $S$ of a Hadamard manifold $M$. Further assume $p$ is absolutely continuous with respect to the Riemannian volume measure. Then there exists a $\frac{1}{n+1}$-centerpoint for $p$ contained in $S$.

Before going into the proof details, here is a conceptual overview of the proof. We will define a continuous function $f$ from $S$ to itself, and an application of Brouwer’s theorem will show that $f$ has a fixed point. We design $f$ so that the fixed point is a $\frac{1}{n+1}$-centerpoint. To do this, we adopt normal coordinates at $x\in S$ and pull back the measure $p$ from $M$ to $T_xM\cong\mathbb{R}^n$ (i.e. the measure of $A\subseteq T_xM$ is $p(\exp_x(A))$). In these coordinates, the previous lemmas provide a Euclidean-convex set $U_x$ of Euclidean centerpoints for the pulled-back measure. We select the closest of these centerpoints to the origin and denote this point by $u_x$. $f(x)$ is then defined by projecting the point with coordinates $u_x$ onto $S$. As stated precisely in the appendix, it is the Hadamard assumption that makes the distance function strictly convex, which makes this projection possible.

The technical part of the proof mostly involves showing continuity of $f$, as it is not hard to show that fixed points are $\frac{1}{n+1}$-centerpoints. Lemma 9 provides the needed tools. The main obstacle is to show that $u_x$ varies continuously. To establish this, we note that the pulled-back measures $p_x$ vary continuously with respect to total variation. Then the lemma shows that the Euclidean centerpoint sets $U_x$, $U_{x'}$ are close in Hausdorff distance, provided $x,x'$ are close. Combining this with convexity of the centerpoint sets, we are able to make $|u_x-u_{x'}|$ small.

We now provide the details.

###### Proof.

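Fix an orthonormal frame $\{e_i\}$ on $S$, so as to determine normal coordinate charts at each $x\in S$, defined by $\psi_x(y)=\exp_x\big(\sum_i y^ie_i(x)\big)$ for $y\in\mathbb{R}^n$. Because $p$ is absolutely continuous with respect to the Riemannian volume measure, it pulls back under these coordinate charts to absolutely continuous measures on $\mathbb{R}^n$, which we denote by $p_x$.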

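Identifying $S$ with a subset of $\mathbb{R}^n$ through the chart $\psi_x$, Lemmas 8 and 9 show that there is a nonempty closed convex set $U_x$ of Euclidean $\frac{1}{n+1}$-centerpoints for $p_x$. There is a unique point $u_x\in U_x$ that is closest to the origin. However, it is not necessarily the case that $\psi_x(u_x)$ is inside $S$, because $\psi_x^{-1}(S)$ need not be Euclidean-convex. To work around this, project $\psi_x(u_x)$ onto $S$. That is,

$$f(x):=\pi(\psi_x(u_x)):=\arg\min_{s\in S}d(s,\psi_x(u_x)).$$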

That the projection $\pi$ is well-defined and continuous is recorded in Appendix A (Corollary 5.6 of the reference cited there). Therefore $f$ is well-defined. Moreover, the exponential map provides a homeomorphism between a compact convex set with non-empty interior and the closed $n$-ball; a proof of this is in the subsequent Lemma 11. So by Brouwer’s fixed point theorem for continuous functions on a closed topological ball, it is sufficient to verify the following properties:

• If $f(x)=x$, then $x$ is a $\frac{1}{n+1}$-centerpoint

• $f$ is continuous

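We first show that fixed points are centerpoints. One of the key properties of normal coordinates at $x$ is that $\psi_x(0)=x$ and geodesics through $x$ appear as lines through the origin. As a consequence, $\psi_x^{-1}(H_x(\hat v))$ is the Euclidean halfspace $H_0(\hat v)$, where $\hat v$ is identified with its coordinate vector. Therefore if $x$ is a $\frac{1}{n+1}$-centerpoint for $p$, then $0\in U_x$, so $u_x=0$ and $f(x)=x$. Conversely, suppose $x$ is not a $\frac{1}{n+1}$-centerpoint but is also fixed. Then $0\notin U_x$, so $u_x\neq 0$. Moreover, $S\cap H_x(-u_x)$ is nonempty, because $u_x$ lies in the closed convex hull of the support of $p_x$, so some point of $\psi_x^{-1}(S)$ has positive inner product with $u_x$. Taking $s$ in the latter intersection, the geodesic between $x$ and $s$ is contained in $S$. Triangle inequalities (or, as an alternative on Hadamard manifolds, convexity of the distance function) show that, initially, moving from $x$ to $s$ along the geodesic decreases the distance to $\psi_x(u_x)$. This means it is not the case that $f(x)=\pi(\psi_x(u_x))=x$.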

Next we consider the continuity claim. Once we show that $x\mapsto u_x$ is continuous, $f$ is continuous as well, because $(x,u)\mapsto\psi_x(u)$ and the projection $\pi$ are continuous. As a first step, we remark that the pulled-back probability densities vary continuously with respect to $x$, because they are defined by smoothly varying functions (the frame and the exponential map). Since $S$ is compact, for every $\epsilon>0$ there is $\delta>0$ so that $d(x,x')<\delta$ implies $\|p_x-p_{x'}\|_{TV}<\epsilon$. This establishes continuity of the family of measures $\{p_x\}_{x\in S}$ with respect to total variation distance. We can now make use of the regularity properties provided by Lemma 9.

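From the lemma’s last part, by requiring $d(x,x')<\delta$ for a suitable $\delta>0$, one can ensure $d_H(U_x,U_{x'})<\epsilon/2$. Let $h_x\in U_x$ be the point closest to $u_{x'}$; this ensures $|h_x-u_{x'}|<\epsilon/2$. Similarly, some point of $U_{x'}$ lies within $\epsilon/2$ of $u_x$, so $|u_{x'}|<|u_x|+\epsilon/2$. Therefore $|h_x|\le|u_{x'}|+|h_x-u_{x'}|<|u_x|+\epsilon$. Critically, the Euclidean distance to the origin is strongly convex, and $u_x$ minimizes it on the Euclidean convex set $U_x$, which also includes $h_x$. Therefore, qualitatively, since $|h_x|$ and $|u_x|$ are close, we know that $|u_x-h_x|$ is small. Making this quantitative through the Euclidean law of cosines (the angle at $u_x$ in the triangle with vertices $0$, $u_x$, $h_x$ is at least $\pi/2$, i.e. $\langle u_x,h_x-u_x\rangle\ge 0$),

$$|u_x-h_x|^2\le|h_x|^2-|u_x|^2=(|h_x|-|u_x|)(|h_x|+|u_x|)<\epsilon R,$$

where a sufficiently large $R$ can be taken to be twice the diameter of $S$: the Euclidean centerpoints of $p_x$ lie in the closed convex hull of $\psi_x^{-1}(S)$, which is contained in the ball of radius $\operatorname{diam}(S)$ about the origin. We conclude

$$|u_x-u_{x'}|\le|u_x-h_x|+|h_x-u_{x'}|<\sqrt{\epsilon R}+\epsilon,$$

which can be made arbitrarily small, so $x\mapsto u_x$ is continuous.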
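The key estimate $|u_x-h_x|^2\le|h_x|^2-|u_x|^2$ holds whenever $u_x$ is the minimum-norm point of a Euclidean convex set containing $h_x$, because then $\langle u_x,h_x-u_x\rangle\ge 0$ and the difference of the two sides equals $2\langle u_x,h_x-u_x\rangle$. The sketch below checks this numerically for a segment in $\mathbb{R}^2$, used as a hypothetical stand-in for $U_x$; the helper names are illustrative only.

```python
def min_norm_point(a, b):
    """Closest point to the origin on the segment [a, b] in R^2 (a stand-in for U_x)."""
    d = (b[0] - a[0], b[1] - a[1])
    dd = d[0] ** 2 + d[1] ** 2
    t = 0.0 if dd == 0.0 else -(a[0] * d[0] + a[1] * d[1]) / dd
    t = min(1.0, max(0.0, t))            # clamp the projection parameter to the segment
    return (a[0] + t * d[0], a[1] + t * d[1])


def sq(p):
    """Squared Euclidean norm in R^2."""
    return p[0] ** 2 + p[1] ** 2


def cosine_law_gap(a, b, s):
    """|h|^2 - |u|^2 - |u - h|^2 for h = a + s (b - a); this equals 2 <u, h - u> >= 0."""
    u = min_norm_point(a, b)
    h = (a[0] + s * (b[0] - a[0]), a[1] + s * (b[1] - a[1]))
    return sq(h) - sq(u) - sq((u[0] - h[0], u[1] - h[1]))


a, b = (2.0, 1.0), (3.0, 3.0)
print(all(cosine_law_gap(a, b, k / 20) >= -1e-12 for k in range(21)))   # True
```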

###### Lemma 11.

If $S\subseteq M$ is compact, convex, and has non-empty interior, then there is a homeomorphism between $S$ and the closed $n$-ball $\bar B^n$.

###### Proof.

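We will first define the map, and then prove it is a homeomorphism. Fix a point $x_0$ in the interior of $S$, and identify $T_{x_0}M$ with $\mathbb{R}^n$ using an orthonormal basis. The function $t$, defined for nonzero $v\in T_{x_0}M$, is given by

$$t(v)=\sup\{t\ge 0:\exp_{x_0}(tv)\in S\}.$$

The homeomorphism is then $f:\bar B^n\to S$, given by

$$f(v)=\exp_{x_0}\big(t(v)\,|v|\,v\big)\quad(v\neq 0),\qquad f(0)=x_0.$$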

By compactness and convexity of $S$, and since $x_0$ is an interior point, $t$ is well-defined and, on unit vectors, bounded from above and below by positive constants. Therefore $f$ is well defined and bijective. Because the domain and codomain are compact and Hausdorff, respectively, it remains to show that $f$ is continuous. This follows once we show that $t$ is continuous, which we now establish.

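For a given sequence of unit vectors $v_i\to v$, we must show that $t(v_i)\to t(v)$. Consider the supporting halfspace to $S$ based at the boundary point $\exp_{x_0}(t(v)v)$. If $\limsup_i t(v_i)>t(v)$, then along a subsequence the points $\exp_{x_0}(t(v_i)v_i)$, which belong to the closed set $S$, would eventually lie outside this supporting halfspace and therefore outside $S$, contradicting the definition of $t$.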

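For contradiction, assume that $\liminf_i t(v_i)<t(v)$. Then there is a subsequence $v_{i_k}$ for which $t(v_{i_k})$ approaches some $t'<t(v)$. However, $x_0$ is an interior point, and therefore some open ball $b_{x_0}(r)$ of radius $r>0$ is contained in $S$. The hull formed by geodesics between points of $b_{x_0}(r)$ and $\exp_{x_0}(t(v)v)$ is contained in $S$, is full dimensional, and contains a neighborhood of each point $\exp_{x_0}(sv)$ with $0\le s<t(v)$; its only boundary point on this geodesic ray is $\exp_{x_0}(t(v)v)$. Choosing $t''$ with $t'<t''<t(v)$, the hull therefore also contains $\exp_{x_0}(t''v_{i_k})$ for all large $k$. This contradicts $t(v_{i_k})<t''$, which means these points are not contained in $S$. We conclude that $t(v_i)\to t(v)$, completing the proof.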
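In the flat case $M=\mathbb{R}^n$ (so $\exp_{x_0}(w)=x_0+w$), the radial function $t$ and the map $f$ of Lemma 11 can be computed from a membership oracle for $S$ by bisection, since convexity makes $\{t\ge 0: x_0+tv\in S\}$ an interval $[0,t(v)]$. This is a sketch under that flat-space assumption; the oracle, the names `radial_t` and `ball_to_S`, and the tolerances are all hypothetical.

```python
import math


def radial_t(member, x0, v, max_doublings=60, iters=100):
    """Bisection estimate of t(v) = sup{t >= 0 : x0 + t v in S} for compact convex S.

    `member` is a membership oracle for S, and x0 must be an interior point of S.
    """
    def point(t):
        return tuple(a + t * b for a, b in zip(x0, v))

    lo, hi = 0.0, 1.0
    for _ in range(max_doublings):       # grow hi until x0 + hi v leaves S
        if not member(point(hi)):
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("S appears unbounded along v")
    for _ in range(iters):               # invariant: point(lo) in S, point(hi) not in S
        mid = 0.5 * (lo + hi)
        if member(point(mid)):
            lo = mid
        else:
            hi = mid
    return lo


def ball_to_S(member, x0, v):
    """The map f(v) = exp_{x0}(t(v) |v| v) of Lemma 11 (flat case), with f(0) = x0."""
    norm = math.hypot(*v)
    if norm == 0.0:
        return tuple(x0)
    s = radial_t(member, x0, v) * norm
    return tuple(a + s * b for a, b in zip(x0, v))


def unit_disk(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0


x0 = (0.5, 0.0)
print(radial_t(unit_disk, x0, (1.0, 0.0)))    # approximately 0.5
print(ball_to_S(unit_disk, x0, (0.5, 0.0)))   # approximately (0.75, 0.0)
```

Half of the unit radius is mapped halfway between $x_0$ and the boundary along that ray, as the formula for $f$ prescribes.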

## 3 Upper Bound on Needed Gradient Calls

The proof of Theorem 2 is now a rather straightforward consequence.

###### Lemma 12.

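Suppose $f:S\to\mathbb{R}$ is convex on the convex set $S\subseteq M$ and $L$-Lipschitz, and $M$ is Hadamard. Additionally assume the optimum $x^*$ is in the $\frac{\epsilon}{L}$-interior of $S$, i.e. $b_{x^*}(\epsilon/L)\subseteq S$. Now suppose a sequence of cutting planes $H_{c_i}(\hat v_i)$ with $\hat v_i=\nabla f(c_i)$, $i=1,\dots,k$, is used, leaving

$$m\Big(S':=S\cap\bigcap_i H_{c_i}(\hat v_i)\Big)<\frac{(\epsilon/L)^n}{n^n}.$$

Then one of the $c_i$ satisfies $f(c_i)-f(x^*)\le\epsilon$.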

We now prove the lemma, which completes the application.

###### Proof.

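The main fact to be established is that $m(b_{x^*}(\epsilon/L))>m(S')$. By comparison methods, this geodesic ball has at least as much volume as a ball of the same radius in Euclidean space; a reference justifying this is included in Appendix A (Theorem 14). Using a rough estimate for the volume of a Euclidean ball, we deduce

$$m\big(b_{x^*}(\tfrac{\epsilon}{L})\big)>\frac{(\epsilon/L)^n}{n^n}>m(S').$$

It follows that $S'$ does not contain $b_{x^*}(\epsilon/L)\subseteq S$, so some $x'\in b_{x^*}(\epsilon/L)$ lies outside some halfspace $H_{c_i}(\hat v_i)$. From Lemma 5 (more precisely, from its proof), $f(c_i)\le f(x')$. The Lipschitz bound on $f$ then gives

$$f(c_i)-f(x^*)\le f(x')-f(x^*)\le L\,d(x',x^*)\le\epsilon.$$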
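The rough estimate used above is that the Euclidean ball of radius $r$ in $\mathbb{R}^n$, whose volume is $\omega_nr^n$ with $\omega_n=\pi^{n/2}/\Gamma(n/2+1)$, has volume exceeding $(r/n)^n$. A quick numerical check, offered as a sketch with illustrative function names:

```python
import math


def euclidean_ball_volume(n, r):
    """Volume of the radius-r ball in R^n: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n


def rough_lower_bound(n, r):
    """The crude bound (r / n)^n used in Lemma 12."""
    return (r / n) ** n


ok = all(euclidean_ball_volume(n, r) > rough_lower_bound(n, r)
         for n in range(1, 31) for r in (0.01, 1.0, 5.0))
print(ok)   # True
```

The bound is very loose (already $\omega_2=\pi$ versus $2^{-2}$), which is harmless here since it only enters the iteration count logarithmically.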

###### Proof for Theorem 2.

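Lemma 12 shows that one of the origins $c_i$ of the cuts is within $\epsilon$ of optimal as soon as the remaining set has volume less than $(\epsilon/L)^n/n^n$.

Proposition 10, applied to the normalized Riemannian volume restricted to the remaining set $S'$ (a measure on the convex set $S$), shows that we may choose each cut center to be a $\frac{1}{n+1}$-centerpoint for the remaining set, so that the volume is reduced by a factor of at least $1-\frac{1}{n+1}$ with each cut. This means the number of iterations needed is $O\!\left(n\log\frac{m(S)\,n^nL^n}{\epsilon^n}\right)$. ∎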
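The iteration count can be made explicit. Each cut at a $\frac{1}{n+1}$-centerpoint keeps at most a fraction $\frac{n}{n+1}$ of the volume, so it suffices to take the smallest $N$ with $m(S)\left(\frac{n}{n+1}\right)^N<\frac{(\epsilon/L)^n}{n^n}$. The sketch below computes this $N$ in log space to avoid overflow; the function and argument names are illustrative, and `vol_S` stands for $m(S)$. Since $\log(1+\frac1n)\ge\frac{1}{n+1}$, the result is at most $(n+1)\log\big(m(S)(nL/\epsilon)^n\big)+1$, matching the bound in Theorem 2.

```python
import math


def cuts_needed(n, vol_S, L, eps):
    """Smallest N with vol_S * (n / (n + 1))**N < (eps / L)**n / n**n, computed in log space."""
    log_ratio = math.log(vol_S) + n * math.log(n * L / eps)   # log(vol_S * (n L / eps)^n)
    if log_ratio < 0.0:
        return 0                                              # already below the threshold
    return math.floor(log_ratio / math.log1p(1.0 / n)) + 1


print(cuts_needed(2, 1.0, 1.0, 1.0))   # 4
```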

## 4 Acknowledgments

Many people have listened patiently and offered advice which has been helpful for this work. In particular, many thanks to Alex Appleton, Richard Bamler, Andrew Hanlon, Suvrit Sra, and Nikhil Srivastava.

## Appendix A Riemannian Overview

We will be working in the setting of Riemannian geometry, but will not use much machinery. We provide an informal overview. The definitions we introduce here are standard and formalized in introductory texts, one such being Lee’s *Riemannian Manifolds: An Introduction to Curvature*.

An $n$-dimensional (smooth) manifold $M$ can be understood as a space that is locally diffeomorphic to $\mathbb{R}^n$, so we identify these subsets of $M$ with coordinates $(x^1,\dots,x^n)$. This allows us to define smooth curves $\gamma:I\to M$ by requiring their coordinate representation to be smooth. We may define velocities $\dot\gamma(t)$ by associating them with $(\dot x^1,\dots,\dot x^n)$, leading to the notion of the tangent spaces $T_xM$ and their union, the tangent bundle $TM$.

Riemannian manifolds additionally specify a metric $g$ for measuring the size of these velocities, by defining an inner product $\langle\cdot,\cdot\rangle_x$ on each tangent space $T_xM$. This immediately enables the definition of curve length, as $\int_I|\dot\gamma(t)|\,dt$. It also gives a method of measuring volume: if $g_{ij}$ is the bilinear form for the metric in a local coordinate choice, then $\sqrt{\det(g_{ij})}\,dx^1\cdots dx^n$ can be used to find Riemannian volumes. For smooth $f$, although $\nabla f$ usually means the covariant derivative of $f$, which coincides with the differential (or pushforward) $df$, we use it to mean the gradient of $f$. The gradient is defined by duality using the metric: it satisfies $\langle\nabla f(x),v\rangle_x=df_x(v)$ for all $v\in T_xM$.

It also turns out to be helpful to compute directional derivatives of vector fields (or accelerations along curves). Requiring a few natural conditions leads to a unique Riemannian connection $\nabla$ determined by the metric, known as the Levi-Civita connection. In local coordinates, with coordinate frame $e_i=\partial_i$ for $TM$,

$$\nabla_{e_i}e_j=\Gamma^k_{ij}e_k,$$

where $\Gamma^k_{ij}$ are the Christoffel symbols (summation over repeated indices is implied). When the acceleration of a curve $\gamma$ vanishes, i.e. $\nabla_{\dot\gamma}\dot\gamma=0$, we say that the curve is a geodesic. This is a second-order nonlinear ODE system for $x(t)=(x^1(t),\dots,x^n(t))$,

$$\ddot x^k(t)+\dot x^i(t)\,\dot x^j(t)\,\Gamma^k_{ij}(x(t))=0.$$

A unique solution exists locally provided we specify the initial position and velocity.
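As an illustration of the geodesic equation, consider the hyperbolic upper half-plane $\{(x,y):y>0\}$ with metric $(dx^2+dy^2)/y^2$, a standard example chosen here only for concreteness. Its nonzero Christoffel symbols are $\Gamma^x_{xy}=\Gamma^x_{yx}=-1/y$, $\Gamma^y_{xx}=1/y$ and $\Gamma^y_{yy}=-1/y$, so the system reads $\ddot x=2\dot x\dot y/y$, $\ddot y=(\dot y^2-\dot x^2)/y$. The sketch integrates it with a classical fourth-order Runge-Kutta step (our choice of integrator; the names are illustrative). The vertical geodesic from $(0,1)$ with unit speed should satisfy $y(t)=e^t$.

```python
import math


def geodesic_rhs(state):
    """Right-hand side of the geodesic ODE for the metric (dx^2 + dy^2) / y^2."""
    x, y, vx, vy = state
    ax = 2.0 * vx * vy / y               # = -2 Gamma^x_{xy} vx vy
    ay = (vy * vy - vx * vx) / y         # = -(Gamma^y_{xx} vx^2 + Gamma^y_{yy} vy^2)
    return (vx, vy, ax, ay)


def integrate_geodesic(state, t_end=1.0, steps=1000):
    """Classical RK4 integration of the geodesic ODE from t = 0 to t = t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = geodesic_rhs(state)
        k2 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = geodesic_rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


x, y, vx, vy = integrate_geodesic((0.0, 1.0, 0.0, 1.0))
print(abs(y - math.exp(1.0)) < 1e-8)   # True: vertical geodesics are y = e^t
```

Starting instead with horizontal velocity $(1,0)$ traces the unit semicircle $x^2+y^2=1$, the other familiar family of geodesics in this model.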

###### Definition 13.

We say that $S\subseteq M$ is convex if any two points of $S$ are joined by a unique geodesic contained within $S$, which is also distance minimizing.

Because of the geodesic equation, a geodesic $\gamma_v$ is determined by its initial position $x$ and velocity $v\in T_xM$. The exponential map can then be defined as $\exp_x(v)=\gamma_v(1)$, its domain being the set of $v$ for which the ODE solution exists up to unit time. We will further restrict to Hadamard manifolds, defined below; the exponential map is globally defined on these manifolds.

One critical consequence of the metric is that Riemannian manifolds are not, in general, locally isometric to $\mathbb{R}^n$ with the Euclidean metric. The Riemann curvature endomorphism is introduced to provide a local characterization and measure the deviation from $\mathbb{R}^n$. It takes as inputs 3 vector fields and outputs a vector field,

$$R(X,Y)Z=\nabla_X\nabla_YZ-\nabla_Y\nabla_XZ-\nabla_{[X,Y]}Z,$$

where $[X,Y]$ is the Lie bracket of vector fields. Although the curvature endomorphism has intuitive geometric meaning, it is often more helpful to derive certain quantities from it. The sectional curvature assigns a scalar value to the 2-plane spanned by orthonormal $v,w\in T_pM$,

$$K(v,w)=\langle R(v,w)w,v\rangle_p.$$

More concretely, this quantity is the Gaussian curvature (for a surface in $\mathbb{R}^3$, the product of the two principal curvatures) at $p$ of the surface swept out by geodesics tangent to the 2-plane. Lower and upper bounds on sectional curvature enable generalizations of Euclidean tools such as ball volume and triangle trigonometry estimates. These methods introduce notions like Jacobi fields and the shape operator to quantitatively characterize the effect of curvature. One important result along these lines, which we use, is the Bishop-Gromov volume comparison theorem. Although usually stated as a volume upper bound assuming just a lower bound on curvature, it is understood that the proof also provides a lower volume bound under an upper curvature bound. A quite direct path to the result can be found in Freire’s lecture notes on volume estimates.

###### Theorem 14.

(Bishop-Gromov.) Let $M$ be complete, and let $r$ be small enough that $b_p(r)$ does not contain any cut points of $p$. Then, provided the sectional curvatures are bounded as $\kappa_-\le K\le\kappa_+$,

$$H(r)\le m(b_p(r))\le S(r),$$

where $H(r)$ and $S(r)$ are the volumes of balls of radius $r$ in the model spaces of constant curvature $\kappa_+$ and $\kappa_-$, respectively.
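In dimension two the model-space ball volumes in Theorem 14 have closed forms: a geodesic disk of radius $r$ in constant curvature $k$ has area $2\pi\int_0^r\mathrm{sn}_k(s)\,ds$, which equals $2\pi(\cosh(\sqrt{-k}\,r)-1)/(-k)$ for $k<0$, $\pi r^2$ for $k=0$, and $2\pi(1-\cos(\sqrt{k}\,r))/k$ for $k>0$. The sketch below (illustrative names) checks the monotonicity behind the comparison: more negative curvature gives larger balls, so on a Hadamard manifold, where one may take $\kappa_+=0$, balls are at least as large as Euclidean ones, which is what Lemma 12 uses.

```python
import math


def model_disk_area(k, r):
    """Area of a geodesic disk of radius r in the complete surface of constant curvature k.

    For k > 0 this assumes r < pi / sqrt(k), so that the disk contains no cut points.
    """
    if k < 0:
        a = math.sqrt(-k)
        return 2.0 * math.pi * (math.cosh(a * r) - 1.0) / (a * a)
    if k == 0:
        return math.pi * r * r
    a = math.sqrt(k)
    return 2.0 * math.pi * (1.0 - math.cos(a * r)) / (a * a)


radii = (0.1, 1.0, 2.0)
print(all(model_disk_area(-1.0, r) >= model_disk_area(0.0, r) >= model_disk_area(1.0, r)
          for r in radii))   # True
```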

Hadamard manifolds are complete, simply connected manifolds of nonpositive sectional curvature. They have been extensively studied in the mathematical literature. We collect several commonly used facts that we made use of or that provide intuition. For Hadamard manifolds,

• The exponential maps $\exp_x$ are diffeomorphisms from $T_xM$ to $M$ (Cartan-Hadamard theorem)

• The square of the distance to a point, $d(x,\cdot)^2$, is strictly convex

• Geodesics between points are unique and distance minimizing

• Projection onto closed, convex sets is well defined and continuous

All of these can be found in . In particular, their Corollary 5.6 proves the last.
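The second fact can be observed numerically. In the hyperboloid model of $\mathbb{H}^2$ (an illustrative choice), the unit-speed geodesic through $p$ with unit tangent $u$ is $\gamma(t)=\cosh(t)\,p+\sinh(t)\,u$, and $d(q,z)=\operatorname{arccosh}(-\langle q,z\rangle_L)$. The sketch samples $t\mapsto d(q,\gamma(t))^2$ and checks that its second differences are positive; since the Hessian of $d(q,\cdot)^2$ on a Hadamard manifold is at least twice the metric, they should in fact be at least $2h^2$ for step $h$. All names are hypothetical.

```python
import math


def mink(a, b):
    """Minkowski form on R^3 used by the hyperboloid model of H^2."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hyp_dist(q, z):
    """Hyperbolic distance between points of the sheet <p, p>_L = -1, p0 > 0."""
    return math.acosh(max(1.0, -mink(q, z)))


def geodesic_point(p, u, t):
    """gamma(t) = cosh(t) p + sinh(t) u for a unit tangent vector u at p."""
    return tuple(math.cosh(t) * pc + math.sinh(t) * uc for pc, uc in zip(p, u))


def sq_dist_second_differences(q, p, u, t_min=-2.0, t_max=2.0, h=0.05):
    """Second differences of t -> d(q, gamma(t))^2 on a uniform grid of step h."""
    n = int(round((t_max - t_min) / h))
    vals = [hyp_dist(q, geodesic_point(p, u, t_min + i * h)) ** 2 for i in range(n + 1)]
    return [vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, n)]


p = (1.0, 0.0, 0.0)                                     # base point of the geodesic
u = (0.0, 1.0, 0.0)                                     # unit tangent vector at p
q = (math.sqrt(1.0 + 0.3 ** 2 + 1.2 ** 2), 0.3, 1.2)    # a point off the geodesic
print(min(sq_dist_second_differences(q, p, u)) > 0.0)  # True
```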

One additional fact, mentioned in Section 1.2, is the generalization of the idea that convexity along lines implies supporting planes to the graph of the function. We provide a short justification for this simple fact, in case it helps to work through the definitions.

###### Lemma 15.

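Suppose $f$ is differentiable and convex along geodesics on a convex domain $S$. Then for all $x,y\in S$,

$$f(y)\ge f(x)+\langle\nabla f(x),\exp_x^{-1}(y)\rangle_x.$$

###### Proof.

Let $v=\exp_x^{-1}(y)$. That $f$ is convex on geodesics means $\phi(t):=f(\exp_x(tv))$ is convex in $t\in[0,1]$, so

$$f(y)=\phi(1)\ge\phi(0)+\phi'(0)=f(x)+\frac{d}{dt}\Big|_{t=0}f(\exp_x(tv)).$$

But using the chain rule and that $d(\exp_x)_0$ is the identity map of $T_xM$ (see Lee’s textbook),

$$\frac{d}{dt}\Big|_{t=0}f(\exp_x(tv))=df_x\big(d(\exp_x)_0(v)\big)=df_x(v)=\langle\nabla f(x),v\rangle_x.$$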

## References

•  W. Ballmann. Lectures on Spaces of Nonpositive Curvature. Springer, Berlin, Germany, 1995.
•  A. Freire. Volume estimates, 2012.
•  B. Grünbaum. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific J. Math., 10(4):1257–1261, 1960.
•  Sergei Ivanov. On Helly’s theorem in geodesic spaces. Electronic Research Announcements, 21:109, 2014.
•  Y. Ledyaev, J. Treiman, and J. Zhu. Helly’s intersection theorem on manifolds of nonpositive curvature. Journal of Convex Analysis, 13(3-4):785–798, 2006.
•  J. Lee. Riemannian Manifolds: An Introduction to Curvature. Springer, Berlin, Germany, 1997.
•  Hongyi Zhang, Sashank J. Reddi, and Suvrit Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. In Advances in Neural Information Processing Systems 29, pages 4592–4600. Curran Associates, Inc., 2016.
•  Hongyi Zhang and Suvrit Sra. First-order methods for geodesically convex optimization. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 1617–1638, 2016.
•  Z. Allen-Zhu, A. Garg, Y. Li, R. Oliveira, and A. Wigderson. Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC), June 2018.