1 Concentration of the barycentre
Let be a probability distribution on a complete Riemannian manifold . A (Riemannian) barycentre of is any global minimiser of the function
The following statement is due to Karcher, and was improved upon by Afsari  : if is supported inside a geodesic ball , where and ( the convexity radius of ), then is strongly convex on , and has a unique barycentre .
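As a rough numerical illustration of the definition above, the following sketch evaluates an empirical version of the barycentre objective on the unit sphere. It assumes the objective is half the mean squared geodesic distance to the sample points (the usual Karcher convention, which may differ from the paper's normalisation). The function names `sphere_distance`, `barycentre_objective` and `from_angles` are hypothetical helpers, not part of the original text.

```python
import numpy as np

def sphere_distance(x, z):
    """Geodesic (great-circle) distance between unit vectors x and z."""
    return np.arccos(np.clip(np.dot(x, z), -1.0, 1.0))

def barycentre_objective(x, samples):
    """Empirical objective (assumed form): half the mean squared distance to the samples."""
    return 0.5 * np.mean([sphere_distance(x, z) ** 2 for z in samples])

def from_angles(colat, azim):
    """Unit vector with given colatitude and azimuth (radians)."""
    return np.array([np.sin(colat) * np.cos(azim),
                     np.sin(colat) * np.sin(azim),
                     np.cos(colat)])

# Four points placed symmetrically around the north pole, at colatitude 0.3 rad.
samples = [from_angles(0.3, k * np.pi / 2) for k in range(4)]
north = np.array([0.0, 0.0, 1.0])

# The objective is smaller at the centre of symmetry than at a sample point.
print(barycentre_objective(north, samples))
print(barycentre_objective(samples[0], samples))
```

Here the samples lie in a small geodesic ball, so this corresponds to the supported (rather than merely concentrated) setting described above.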
On the other hand, the present work considers a setting where is not supported inside , but merely concentrated on this ball. Precisely, assume is equal to the Gibbs distribution
where is a normalising constant, is a function with unique global minimum at , and is the Riemannian volume of . Then, let denote the function in (1), and let denote any barycentre of .
In this new setting, it is not clear whether is differentiable or not. Therefore, statements about convexity of and uniqueness of are postponed to the following Section 2. For now, it is possible to state the following Proposition 1. In this proposition, denotes Riemannian distance, and denotes the Kantorovich (-Wasserstein) distance . Moreover, is any open interval which contains the spectrum of the Hessian , considered as a linear mapping of the tangent space .
Proposition 1 : assume is an -dimensional compact Riemannian manifold with non-negative sectional curvature. Denote the Dirac distribution at . The following hold,
(i) for any ,
(ii) for (which can be computed explicitly)
where in terms of the Beta function.
Proposition 1 is motivated by the idea of using as an approximation of . Intuitively, this requires choosing so small that is sufficiently close to . Just how small a may be required is indicated by the inequality in (4). This inequality is optimal and explicit, in the following sense.
It is optimal because the dependence on in its right-hand side cannot be improved. Indeed, by the multi-dimensional Laplace approximation (see , for example), the left-hand side is equivalent to (in the limit ). While this constant is not tractable, the constants appearing in Inequality (4) depend explicitly on the manifold and the function . In fact, this inequality does not follow from the multi-dimensional Laplace approximation, but rather from volume comparison theorems of Riemannian geometry .
In spite of these nice properties, Inequality (4) does not escape the curse of dimensionality. Indeed, for fixed, its right-hand side increases exponentially with the dimension (note that decreases like ). On the other hand, although also depends on , it is typically much less affected by dimensionality, and decreases slower than as increases.
2 Convexity and uniqueness
Assume now that is a simply-connected, compact Riemannian symmetric space. In this case, for any , the function turns out to be throughout . This results from the following lemma.
Lemma 1 : let be a simply-connected compact Riemannian symmetric space. Let be a geodesic defined on a compact interval . Denote the union of all cut loci for . Then, the topological dimension of is strictly less than . In particular, is a set with volume equal to zero.
Remark : the assumption that is simply-connected cannot be removed, as the conclusion does not hold if is a real projective space.
The proof of Lemma 1 uses the structure of Riemannian symmetric spaces, as well as some results from topological dimension theory  (Chapter VII). The notion of topological dimension arises because it is possible is not a manifold. The lemma immediately implies, for all ,
Then, since the domain of integration avoids the cut loci of all the , it becomes possible to differentiate under the integral. This is used in obtaining the following (the assumptions are the same as in Lemma 1).
for , let and , where is the function . The following integrals converge for any
and both depend continuously on . Moreover,
so that is throughout .
where for positive . The reader may wish to note the fact that decreases to as decreases to .
Proposition 2 : let be a simply-connected compact Riemannian symmetric space. Let be the maximum sectional curvature of , and its convexity radius. If (see (ii) of Proposition 1), then the following hold for any .
(i) for all in the geodesic ball ,
where and is a constant given by the structure of the symmetric space .
(ii) there exists (which can be computed explicitly), such that implies is strongly convex on , and has a unique global minimum . In particular, this means is the unique barycentre of .
3 Finding and
Propositions 1 and 2 claim that and can be computed explicitly. This means that, with some knowledge of the Riemannian manifold and the function , and can be found by solving scalar equations. The current section gives the definitions of and .
In the notation of Proposition 1, let be small enough, so that,
whenever , and consider the quantity
where is defined as in (6). Note that decreases to as decreases to , for fixed and . Now, it is possible to define as
Here, for , and , where is the surface area of a unit sphere .
4 Black-box optimisation
Consider the problem of searching for the unique global minimum of . In black-box optimisation, it is only possible to evaluate for given , and the cost of this evaluation precludes numerical approximation of derivatives. Then, the problem is to find using successive evaluations of (hopefully, as few of these evaluations as possible).
Here, a new algorithm for solving this problem is described. The idea of this algorithm is to find using successive evaluations of , in the hope that will provide a good approximation of . While the quality of this approximation is controlled by Inequalities (3) and (4) of Proposition 1, in some cases of interest, is exactly equal to , for correctly chosen , as in the following Proposition 3.
To state this proposition, let denote geodesic symmetry about (see ). This is the transformation of , which leaves fixed, and reverses the direction of geodesics passing through .
Proposition 3 : assume that is invariant by geodesic symmetry about , in the sense that . If (see (ii) of Proposition 2), then is the unique barycentre of .
Proposition 3 follows rather directly from Proposition 2. Precisely, by (ii) of Proposition 2, the condition implies is strongly convex on , and . Thus, is the unique stationary point of in . But, using the fact that is invariant by geodesic symmetry about , it is possible to prove that is a stationary point of , and this implies .
The two following examples verify the conditions of Proposition 3.
Example 1 : assume is a complex Grassmann manifold. In particular, is a simply-connected, compact Riemannian symmetric space. Identify with the set of Hermitian projectors such that , where denotes the trace. Then, define for , where is a Hermitian positive-definite matrix with distinct eigenvalues. Now, the unique global minimum of occurs at , the projector onto the principal -subspace of . Also, the geodesic symmetry is given by , where denotes reflection through the image space of . It is elementary to verify that is invariant by this geodesic symmetry.
Example 2 : let be a simply-connected, compact Riemannian symmetric space, and a function on with unique global minimum at . Assume moreover that is invariant by geodesic symmetry about . For each , there exists an isometry of , such that . Then, has unique global minimum at , and is invariant by geodesic symmetry about .
Example 1 describes the standard problem of finding the principal subspace of the covariance matrix . In Example 2, the function is a known template, which undergoes an unknown transformation , leading to the observed pattern . This is a typical situation in pattern recognition problems.
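The claims of Example 1 can be checked numerically on a small instance. The sketch below is only illustrative and rests on two assumptions that are not confirmed by the text: the cost function is taken to be minus the trace of the product of the projector with the positive-definite matrix, and the geodesic symmetry is taken to be conjugation by the reflection through the image of the minimising projector. The names `principal_projector`, `U` and `geodesic_symmetry` are hypothetical.

```python
import numpy as np

def principal_projector(delta, p):
    """Hermitian projector onto the span of the p leading eigenvectors of delta."""
    _, vecs = np.linalg.eigh(delta)      # eigenvalues returned in ascending order
    v = vecs[:, -p:]                     # eigenvectors of the p largest eigenvalues
    return v @ v.conj().T

def U(x, delta):
    """Assumed cost function: minus the trace of x @ delta."""
    return -np.trace(x @ delta).real

def geodesic_symmetry(x, x_star):
    """Assumed symmetry: conjugation by the reflection through the image of x_star."""
    r = 2.0 * x_star - np.eye(x_star.shape[0])
    return r @ x @ r

# Small random instance: 5 x 5 Hermitian positive-definite matrix, rank-2 projectors.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
delta = a @ a.conj().T + np.eye(5)
x_star = principal_projector(delta, 2)

# A random rank-2 Hermitian projector, built from an orthonormal basis.
q, _ = np.linalg.qr(rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2)))
x = q @ q.conj().T

print(U(x_star, delta), U(x, delta))                   # cost at x_star is the smaller one
print(U(geodesic_symmetry(x, x_star), delta))          # agrees with U(x, delta)
```

Under these assumptions, invariance holds because the reflection commutes with the matrix and squares to the identity.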
Of course, from a mathematical point of view, Example 2 is not really an example, since it describes the completely general setting where the conditions of Proposition 3 are verified. In this setting, consider the following algorithm.
Description of the algorithm :
– input : % to find such , see Section 3
% symmetric Markov kernel
% initial guess for
– iterate : for
(3) reject with probability % then,
(4) % see definition (10) below
– until : does not change sensibly
– output : % approximation of
The above algorithm recursively computes the Riemannian barycentre of the samples generated by a symmetric Metropolis-Hastings algorithm (see ). Here, the Metropolis-Hastings algorithm is implemented in lines (1)--(3). On the other hand, line (4) takes care of the Riemannian barycentre. Precisely, if is a length-minimising geodesic connecting to , let
This geodesic need not be unique.
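The structure of the algorithm can be sketched in code for the special case of the unit sphere in three dimensions. This is a minimal sketch under stated assumptions, not the authors' implementation: the proposal is a normalised isotropic Gaussian perturbation (a symmetric kernel swapped in for simplicity), the recursive barycentre step moves a fraction 1/n along the geodesic towards the newest sample, and the names `geodesic_step`, `mh_barycentre`, `step` and `seed` are hypothetical. The potential used in the demonstration, minus the third coordinate, is also a hypothetical choice, invariant by geodesic symmetry about the north pole.

```python
import numpy as np

def geodesic_step(x, z, t):
    """Point at parameter t on a minimising geodesic of the unit sphere from x (t=0) to z (t=1)."""
    theta = np.arccos(np.clip(np.dot(x, z), -1.0, 1.0))
    s = np.sin(theta)
    if s < 1e-12:                        # x and z coincide (or are antipodal): stay at x
        return x
    y = (np.sin((1.0 - t) * theta) * x + np.sin(t * theta) * z) / s
    return y / np.linalg.norm(y)         # guard against rounding drift off the sphere

def mh_barycentre(U, T, z0, n_iter, step=0.5, seed=0):
    """Symmetric Metropolis-Hastings chain with a recursive barycentre estimate."""
    rng = np.random.default_rng(seed)
    z = z0 / np.linalg.norm(z0)
    x_hat = z.copy()
    for n in range(1, n_iter + 1):
        # (1) propose from a symmetric kernel (assumed: normalised Gaussian perturbation)
        prop = z + step * rng.standard_normal(3)
        prop /= np.linalg.norm(prop)
        # (2) acceptance probability for the Gibbs target exp(-U / T)
        accept = np.exp(min(0.0, (U(z) - U(prop)) / T))
        # (3) accept the proposal, or else keep the current state
        if rng.random() < accept:
            z = prop
        # (4) recursive barycentre: move 1/n of the way from x_hat towards z
        x_hat = geodesic_step(x_hat, z, 1.0 / n)
    return x_hat

# Hypothetical potential with unique global minimum at the north pole.
U_demo = lambda x: -x[2]
print(mh_barycentre(U_demo, T=0.2, z0=np.array([1.0, 0.0, 0.0]), n_iter=5000))
```

The stopping rule "until the estimate does not change sensibly" is replaced here by a fixed number of iterations, which keeps the sketch short but is a simplification.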
The point of using the Metropolis-Hastings algorithm is that the generated eventually sample from the Gibbs distribution . The convergence of the distribution of to takes place exponentially fast. Indeed, it may be inferred from  (see Theorem 8, Page 36)
where is the total variation norm, and verifies
so the rate of convergence is degraded when is small.
Accordingly, the intuitive justification of the above algorithm is the following. Since the eventually sample from the Gibbs distribution , and the desired global minimum of is equal to the barycentre of (by Proposition 3), then the barycentre of the is expected to converge to .
It should be emphasised that, in the present state of the literature, there is no rigorous result which confirms this convergence . It is therefore an open problem, to be confronted in future work.
For a basic computer experiment, consider and let
where is the Legendre polynomial of degree . The unique global minimiser of is , and the conditions of Proposition 3 are verified, since is invariant by reflection in the axis, which is geodesic symmetry about .
Figure 1 shows the dependence of on , displaying multiple local minima and maxima. Figure 2 shows the algorithm overcoming these local minima and maxima, and converging to the global minimum , within iterations. The experiment was conducted with , and the Markov kernel obtained from the von Mises-Fisher distribution (see ). The initial guess is not shown in Figure 2.
In comparison, a standard simulated annealing method offered less robust performance, which varied considerably with the choice of annealing schedule.
This section is devoted to the proofs of the results stated in previous sections.
As of now, assume that . There is no loss of generality in making this assumption.
5.1 Proof of Proposition 1
Proof of (i) : denote . By the definition of
Moreover, let be the function
For any , it is elementary that is Lipschitz continuous, with respect to , with Lipschitz constant . Then, from the Kantorovich-Rubinshtein formula ,
a uniform bound in . It now follows that
However, from (13b), it is clear that
To complete the proof, replace this into (13d) and (13e). Then, assuming the condition in (3) is verified,
This means that any global minimum of must belong to the open ball . In other words, . This completes the proof of (3).
Proof of (ii) : let where is the injectivity radius of at , and is an upper bound on the sectional curvature of . Assume, in addition, is small enough so
whenever . Further, consider the truncated distribution
where denotes the indicator function, and stands for the open ball . Of course, by the triangle inequality,
The proof relies on the following estimates, which use the notation of Section 3.
First estimate : if , then
Second estimate : if , then
These two estimates are proved below. Assume now they hold true, and . In particular, since , the definition of implies
Recall the definition of , and express and in terms of the Gamma function . The last inequality becomes
This is the same as
By the definition of , it now follows the right-hand side of (14d) is less than half the right-hand side of (14e).
In this case, (4) follows from the triangle inequality (14c).
Proof of first estimate : consider the coupling of and , provided by the probability distribution on ,
where denotes the complement of . Recall the definition of the Kantorovich distance (see ). Replacing (15a) into this definition, it follows that
Then, from the definition (2) of ,
Now, (14d) follows directly from (15b) and (15c), if the following lower bound on can be proved,
To prove this lower bound, note that
Using this last inequality and (14a), it is possible to write
Writing this last integral in Riemannian spherical coordinates,
where is the volume density in the Riemannian spherical coordinates, and , and where is the area element of . From the volume comparison theorem in  (see Page 129),
where the second inequality follows since is concave for . Now, it follows from (15e) and (15f),
where is the surface area of . Thus, the required lower bound (15d) follows by noting that
where for , and that
Indeed, taken together, these give
Finally, (15d) can be obtained by noting the second term in square brackets is negligible compared to the first, as decreases to , and by expressing and in terms of the Gamma function .
Proof of second estimate : the Kantorovich distance between and the Dirac distribution is equal to the expectation of the distance to , with respect to . Precisely,
According to (2) and (14b), this is
Using (2) to express the probability , this becomes
A lower bound on the denominator can be found from (15e) and subsequent inequalities, which were used to prove (15d). Precisely, these inequalities provide
whenever . For the numerator in (16a), it will be shown that, for any ,
Then, (14e) follows by dividing (16c) by (16b), and replacing in (16a), after noting that . Thus, it only remains to prove (16c). Using (14a), it is seen that
By expressing this last integral in Riemannian spherical coordinates, as in (15f),
From the volume comparison theorem in  (see Page 130), . Therefore, (16d) becomes
The right-hand side is half the , and replacing in (16d), gives (16c).
6 Proof of Lemma 1
for some and , the Lie algebra of , where denotes the Lie group exponential mapping, and the dot denotes the action of on . For each , the cut locus of is given by
This is due to a more general result : let be a Riemannian manifold and be an isometry of . Then, for all . This is because if and only if is conjugate to along some geodesic, or there exist two different geodesics connecting to . Both of these properties are preserved by the isometry .
In order to describe the set , denote the isotropy group of in , and the Lie algebra of . Let be an orthogonal decomposition, with respect to the Killing form of , and let be a maximal Abelian subspace of . Define ( the centraliser of in ), and consider the mapping
Let be the set of positive restricted roots associated to the pair , (each is a linear form ). Then, let be the set of such that for all , and the boundary of . Then