## 1. Introduction

Directional statistics, that is, statistics on spheres, has a long history, and there is a great deal of literature on it (see the books by [80], [61], [32]). Much of that literature was inspired by a seminal paper by [33], which proved beyond any reasonable doubt that the earth's magnetic poles had shifted over geological times. Indeed, the two sets of data that he analyzed, one from the Quaternary period and the other from recent times (1947-48), showed a near reversal of the directions of the magnetic poles. In addition to providing the first scientific demonstration of a phenomenon conjectured by some paleontologists, such studies of magnetic poles in fossilized remanent magnetism had an enormous impact on tectonics, essentially validating the theory of continental drift ([47], [32]). There are other important applications of directional statistics as well, such as the design of windmills based on wind directions. Fisher's example is presented in Section 9, in comparison with the nonparametric method highlighted in this article.

The advancement of imaging technology and the increase in computing power have opened up a whole new vista of applications. Medical imaging, for example, is now an essential component of medical practice. Not only have MRIs (magnetic resonance imaging) become routine for diagnosing a plethora of diseases, there are also more advanced techniques, such as DTI (diffusion tensor imaging), which measures diffusion coefficients of water molecules in tiny voxels along nerve fibers in the cortex of the brain in order to understand or monitor diseases such as Parkinson's and Alzheimer's [38, 57, 66]. Beyond medicine, there are numerous applications to morphometrics [19], graphics, robotics, and machine vision [2, 60, 79]. Images are geometric objects, and their precise mathematical descriptions and identifications in different fields of application are facilitated by the use of differential geometry. [53] and [19] were two pioneers in the geometric description and statistical analysis of images represented by landmarks on two- or three-dimensional objects. The spaces of such images, or shapes, are differential manifolds, or *stratified spaces* obtained by gluing together manifolds of different dimensions. In the following sections these spaces are described in detail. Much of the earlier statistical analysis on differential manifolds was parametric in nature, where a distribution on a manifold is assumed to belong to a finite-dimensional parametric family; that is, it is assumed to have a density (with respect to a standard distribution, e.g., the volume measure) which is specified except for the value of a finite-dimensional parameter lying in an open subset of a Euclidean space. The statistician's task is then to estimate the parameter (or test for its belonging to a particular subset of the parameter space), using observed data. There are standard methodologies for estimation (say, the maximum likelihood estimator, MLE) or testing (such as the likelihood ratio test) that one may try to use. Of course, it still requires a great deal of effort to analytically compute these statistical indices and their (approximate) distributions on specific manifolds. A reasonably comprehensive account of these for the shape spaces of Kendall, or similar manifolds, may be found in [28].

The focus of the present article is a model-independent, or nonparametric, methodology for inference on general manifolds. As motivation, consider the problem of discriminating between two distributions on a Euclidean space based on independent samples from them. In parametric inference one would use a density (with respect to a sigma-finite measure) which is specified except for a finite-dimensional parameter, as described above. One may use one of a number of standard asymptotically efficient procedures to test whether the two distributions have different parameter values (see, e.g., [42], [37]). If the statistician is not confident about this parametric model, or any other, one popular method is to test for differences between the means of the two distributions by using the two sample means. When the sample sizes are reasonably large, the difference between the sample means is asymptotically normal with mean given by the difference between the population means. If the observations are from a normal distribution with the mean as the unknown parameter, then this test is optimal in an appropriate sense ([15], pp. 296-300, [59], pp. 93-94). But under other parametric models the test is not, in general, optimal, and it may even be inconsistent; that is, there may be pairs of distinct distributions whose means are the same. However, when the coordinates of the distributions are such that differences between the two distributions may reasonably be expected to manifest themselves in shifts of the mean vector, this widely used nonparametric test is quite effective, especially since with large sample sizes the asymptotic distribution is normal.

Turning now to distributions on non-Euclidean metric spaces, one has an analogue of the mean given by the minimizer, if unique, of the average (with respect to the distribution) of the squared distance from a point. This is the so-called *Fréchet mean* introduced by [34], although physicists had probably used the notion earlier in specific physical contexts for the distribution of the mass of a body, calling it the center of mass. It is, of course, in general a non-trivial matter to find broad conditions for the *uniqueness of the Fréchet minimizer* and, in the case of uniqueness, to derive the *(asymptotic) distribution of the sample Fréchet mean*. These allow one to obtain proper confidence regions for the Fréchet mean and critical regions for tests detecting differences in means of distributions [16, 17, 18]. The theory of Fréchet means is presented in Section 2 (uniqueness and consistency) and in Section 4 (asymptotic distributions). The main results in Sections 2 and 4 are presented with complete proofs. Section 4 plays a central role for inference in the present context, and it contains some improvements of earlier results.

It has been shown in data examples that the nonparametric procedures based on Fréchet means often greatly outperform their parametric counterparts (see [10]). Misspecification of the model is a serious issue in parametric inference, especially for distributions on rather complex non-Euclidean spaces.

In this article two types of images and their analyses are distinguished. The greater emphasis is on landmark-based shapes introduced by [53] and [19]. Here one considers a *k-ad*, a set of properly chosen points, not all the same, on a two- or three-dimensional image, such as an MRI scan of a section of the brain for purposes of diagnosing a disease, or a scan of some organ of a species for purposes of morphometrics. In order to properly compare images taken from different distances and angles, using perhaps different machines, the *shape of a k-ad* is defined modulo translation, scaling and rotation. The resulting shapes comprise *Kendall's shape spaces*. In addition, one may consider *affine shapes*, which are invariant under all affine transformations and appropriate in scene recognition; similarly, *projective shapes*, invariant under projective transformations, are often used for robotic vision. The precise mathematical (geometric) descriptions of these kinds of images are presented in Section 3. Sections 5 and 6 provide the asymptotic theory of tests and confidence regions on manifolds, based on the asymptotic distribution theory developed in Section 4.

Section 8 considers briefly the second type of images, namely the actual geometric shape of a compact two-dimensional surface or a three-dimensional body. Here the shape space is infinite-dimensional and may be viewed as a *Hilbert manifold* [29]. For purposes of diagnostics such as described above, this is probably not to be preferred in comparison with the finite-dimensional landmark-based shapes considered by Kendall, because of the curse of dimensionality. The Hilbert manifolds are better suited for purposes of machine vision. However, for that task a more effective methodology seems to be one which builds on the exciting inquiry of [50]: Can one hear the shape of a drum? It turns out that for two-dimensional compact Riemannian manifolds such as compact surfaces, the spectrum of the Laplace-Beltrami operator identifies the manifold in most cases, although there are exceptions. In three and higher dimensions, on the other hand, isospectral manifolds are not so rare [64, 39, 83]. Still, computer scientists and other researchers in machine vision have successfully implemented algorithms to identify two- and three-dimensional images by the spectra of their Laplacians, sometimes augmented by their eigenfunctions [23, 40, 48, 75, 72]. A mathematical breakthrough was achieved by [49], who proved that compact manifolds are indeed determined by this augmentation.

Section 7 is devoted to another very important statistical problem: *nonparametric classification via density estimation, and nonparametric regression* on manifolds. In particular, we emphasize Ferguson's nonparametric Bayes theory of using *Dirichlet process priors* for this endeavor [30, 31].

Section 9 provides a number of applications of the theory of Fréchet means, including Fisher's example mentioned above, but focusing on two-sample problems on landmark-based shape spaces such as those introduced by Kendall [53, 54].

The appendix, Section 10, provides ready access to some notions in Riemannian geometry used in the text.

## 2. Existence of the Fréchet Mean on Non-Euclidean Spaces.

Let $(S, \rho)$ be a metric space and $Q$ a probability measure on it. The *Fréchet function* of $Q$
is defined as

(1) | $F(p) = \int_S \rho^2(p, x)\, Q(dx), \qquad p \in S.$

If $F$ is finite at some point, then it is finite on all of $S$. The set of minimizers of $F$ is called the Fréchet mean set. If the minimizer is unique, i.e., this set is a singleton, then it is called the *Fréchet mean* of $Q$, and one says that the *Fréchet mean* of $Q$ exists. We will often use the topological condition

(2) | All closed bounded subsets of S are compact. |

When $S$ is a Riemannian manifold and the distance is the geodesic distance, then (2) is equivalent to the completeness of $S$, by the Hopf-Rinow theorem ([26], pp. 146-149).

Let $X_1, \dots, X_n$ be a random sample from the underlying distribution $Q$, i.e., i.i.d. random variables with common distribution $Q$, defined on a probability space $(\Omega, \mathcal{F}, P)$. Denote by $F_n$ the Fréchet function of the empirical distribution $Q_n = n^{-1}\sum_{j=1}^n \delta_{X_j}$, where $\delta_x$ is the point mass at $x$.
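On the circle, for instance, the empirical Fréchet function can be minimized numerically by a simple grid search over candidate points; a minimal sketch in Python (the function names, the angular representation of the data, and the grid resolution are choices of this illustration, not of the text):

```python
import math

def circle_dist(a, b):
    """Geodesic (arc-length) distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def empirical_frechet_mean(angles, grid_size=3600):
    """Minimize the empirical Frechet function
    F_n(p) = (1/n) * sum_i d(p, x_i)^2 over a grid of candidate angles p."""
    best_p, best_f = None, float("inf")
    for k in range(grid_size):
        p = 2 * math.pi * k / grid_size
        f = sum(circle_dist(p, a) ** 2 for a in angles) / len(angles)
        if f < best_f:
            best_p, best_f = p, f
    return best_p, best_f
```

For a sample concentrated near angle 0, say the three angles $2\pi - 0.1$, $0$, $0.1$, the minimizer is the angle 0, as one expects by symmetry.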

###### Theorem 2.1 ([17]).

Assume (2) and that the Fréchet function $F$ of the underlying distribution $Q$ is finite. Then (a) the Fréchet mean set $C$ of $Q$ is nonempty and compact, and (b) for each $\varepsilon > 0$ there exist a random positive integer $N$ and a $P$-null set $\Gamma$ such that, outside $\Gamma$,

(3) | $C_n \subset C^{\varepsilon} := \{ p \in S : \inf_{q \in C} \rho(p, q) < \varepsilon \} \quad \text{for all } n \ge N,$

where $C_n$ denotes the Fréchet mean set of the empirical distribution and $\rho$ the distance on $S$. (c) In particular, if the Fréchet mean of $Q$, say $\mu_F$, exists, then every measurable selection from $C_n$ converges almost surely to $\mu_F$. In this case the selection is called the sample Fréchet mean.

###### Proof.

First assume is compact. Then (a) is obvious. To prove (b), it is enough to show that almost surely as . To see this let . If , then (3) holds with (for every ). Assume is not , and write . There exists , such that Also, there exists , , such that . Since a.s., there exists such that such that , and , so that , proving (3). In order to show that a.s. first note that, irrespective of , where . Given any , if Let be such that the balls with radius and center cover . Then The same is true with replaced by

. By the strong law of large numbers (SLLN), there exists

such that (, outside a -null set. It follows that, outside a -null set, , providedConsider now the non-compact case, but assuming (2). Let . This infimum is attained in . To see this, let () be such that as . Since , one has

(4) |

Letting and , one obtains . Hence the sequence is bounded, and its closure is compact,. Therefore, there exists such that . Thus is nonempty and closed. If is any point in then taking and in (4), one has . That is . Thus part (a) is proved. To prove part (b), one has, using for and a fixed point for in in (4), the inequality . Fix a . Consider the compact set . Then for , one has , for all sufficiently large except for lying in a -null set, in view of the SLLN. Hence for . Applying the result in the compact case (with ), one arrives at (b). Part (c) is an immediate consequence of part (b).

∎

###### Remark 2.2.

Theorem 2.1 extends to more general Fréchet functions, including those based on powers $\rho^{\alpha}$ of the distance, $\alpha \ge 1$.

###### Remark 2.3.

Relation (3) does not imply that the Fréchet mean sets of the empirical and of the underlying distribution are asymptotically close in the Hausdorff distance. Indeed, in many examples the empirical Fréchet mean set may be a singleton while that of the underlying distribution is not. See, e.g., [17], Remark 2.6, where it is shown that, whatever be the absolutely continuous distribution on the circle, the empirical Fréchet mean set is almost surely a singleton; in particular, this is the case for the uniform distribution, whose Fréchet mean set is the whole circle. In view of this, and for the asymptotic distribution theory considered later, it is important to find broad conditions on the underlying distribution for the *existence of the Fréchet mean* (as the unique minimizer of the Fréchet function).

Let $M$ be a *differentiable manifold* of dimension $d$: a topological space, metrizable as a separable metric space, such that (i) every point has an open neighborhood $U$ with a homeomorphism $\psi$ of $U$ onto an open subset of $\mathbb{R}^d$, and (ii) (compatibility condition) if two such neighborhoods $U_1$, $U_2$ overlap, then the map $\psi_2 \circ \psi_1^{-1}$, defined on $\psi_1(U_1 \cap U_2)$, is smooth. A common example is the sphere $S^d$: one may take the open sets $U_1$ and $U_2$ obtained by deleting the north pole $(0,0,\dots,0,1)$ and the south pole $(0,0,\dots,0,-1)$, respectively, with $\psi_1$ and $\psi_2$ the corresponding stereographic projections onto $\mathbb{R}^d$. Or, one may take the open hemispheres of $S^d$ with poles whose coordinates are all zeros except for $+1$ or $-1$ at one coordinate, each mapped diffeomorphically onto the open unit disc in $\mathbb{R}^d$ by dropping that coordinate. There are infinitely many distances which metrize the topology of $M$. The two most common are (1) the Euclidean distance under an embedding, and (2) the geodesic distance when $M$ is endowed with a metric tensor. For the first, recall that a smooth map $j$ is an embedding of $M$ into a Euclidean space $\mathbb{R}^N$ if (a) $j$ is one-to-one and is a homeomorphism onto its image $j(M)$, given the relative topology, and (b) the differential of $j$ at each point is a one-to-one map of the tangent space there into the tangent space of $\mathbb{R}^N$. The Euclidean distance on $j(M)$, transferred to $M$ via $j$, is called the *extrinsic distance* on $M$. The embedding is said to be closed if $j(M)$ is closed. For $S^d$ one may, for example, take $j$ to be the inclusion map into $\mathbb{R}^{d+1}$, and the extrinsic distance is then the chord distance.

###### Theorem 2.4 ([18] (Extrinsic Fréchet Mean on a Manifold)).

Let $M$ be a differentiable manifold and $Q$ a probability measure on it. If $j$ is a closed embedding of $M$ into a Euclidean space and the Fréchet function of $Q$ is finite with respect to the induced Euclidean (extrinsic) distance on $M$, then the (extrinsic) Fréchet mean exists as the unique minimizer of the Fréchet function if and only if there is a unique point of $j(M)$ closest to the Euclidean mean $\tilde{\mu}$ of the (push-forward) distribution $Q \circ j^{-1}$; the extrinsic mean is then the preimage under $j$ of that closest point.

###### Proof.

For a point $x \in M$, write $\tilde{Q} = Q \circ j^{-1}$ for the push-forward distribution, $\tilde{\mu}$ for its Euclidean mean, and $|\cdot|^2$ for the usual squared Euclidean norm. Then

(5) | $F(x) = \int |j(x) - y|^2\, \tilde{Q}(dy) = \int |y - \tilde{\mu}|^2\, \tilde{Q}(dy) + |j(x) - \tilde{\mu}|^2.$

This is minimized with respect to $x \in M$ by setting $j(x)$ to be the point of $j(M)$ closest to $\tilde{\mu}$, if there is only one such point; the minimizer is not unique otherwise. ∎

###### Example 2.5 (Extrinsic Mean on the Sphere).

Let the embedding be the inclusion map $j$ of the unit sphere into the ambient Euclidean space. The Euclidean mean $\tilde{\mu}$ of the induced distribution then lies in the closed unit ball, and in its interior unless the distribution is degenerate at a point. If the distribution is nondegenerate, the closest point to $\tilde{\mu}$ on the sphere is $\tilde{\mu}/|\tilde{\mu}|$, unless $\tilde{\mu} = 0$ (i.e., unless $\tilde{\mu}$ lies at the center of the unit ball). Thus, if $\tilde{\mu} \neq 0$, the extrinsic mean is $\tilde{\mu}/|\tilde{\mu}|$; if $\tilde{\mu} = 0$, every point of the sphere is equally close to $\tilde{\mu}$ and the extrinsic mean does not exist. If the distribution is degenerate at a point, then that point is the extrinsic mean. Taking the distribution to be the empirical, the sample extrinsic mean is $\bar{X}/|\bar{X}|$, provided the sample mean $\bar{X}$ is not the origin.
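A minimal numerical sketch of this example, computing the sample extrinsic mean by radially projecting the Euclidean sample mean back to the sphere (the function and variable names are this sketch's own):

```python
import math

def extrinsic_mean(points):
    """Sample extrinsic mean on the unit sphere: the Euclidean sample mean,
    radially projected back onto the sphere (Theorem 2.4 / Example 2.5).
    Raises ValueError when the Euclidean mean is the origin, in which case
    the extrinsic mean does not exist."""
    n = len(points)
    dim = len(points[0])
    mu = [sum(p[i] for p in points) / n for i in range(dim)]
    norm = math.sqrt(sum(c * c for c in mu))
    if norm == 0.0:
        raise ValueError("Euclidean mean at the origin: extrinsic mean not unique")
    return [c / norm for c in mu]
```

For two points on the equator at right angles, the extrinsic mean is the point midway along the shorter arc between them; for an antipodal pair the Euclidean mean is the origin and no extrinsic mean exists.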

Theorem 2.4 allows one, in many cases of interest in image analysis, to find analytic characterizations for the existence of the extrinsic mean (i.e., for the uniqueness of the minimizer of the Fréchet function) and computable formulas for it. This will be discussed in Section 3.

Unfortunately, on a Riemannian manifold with a metric tensor there is no good analogue of Theorem 2.4 for the *intrinsic mean* of a distribution, namely the minimizer of the Fréchet function under the geodesic distance. The pioneering work of [51], followed by generalizations and strengthenings, most notably by [55], [58] and [1], holds under support restrictions on the distribution which are untenable for general statistical inference. The recent results of [1] are the sharpest among these; we state them below (for the Fréchet function (1)) without proof. For the terminology used in the statement we refer to the Appendix on Riemannian geometry. Recall that the support of a probability measure on a metric space is the smallest closed set of probability one.

###### Theorem 2.6 ([1] (Intrinsic Mean on a Riemannian Manifold)).

On a complete Riemannian manifold $M$, there exists an intrinsic Fréchet mean of a distribution $Q$, as the unique minimizer of the Fréchet function (1) with the geodesic distance, provided the support of $Q$ is contained in a geodesic ball of radius less than $\tfrac{1}{2}\min\{\mathrm{inj}(M),\ \pi/\sqrt{C}\}$ (with $\pi/\sqrt{C}$ read as $\infty$ when $C = 0$). Here $\mathrm{inj}(M)$ is the injectivity radius of $M$, and $C$ is the supremum of the sectional curvatures of $M$ if positive, or zero otherwise.

###### Remark 2.7.

If the Riemannian manifold is complete, simply connected and has non-positive curvature, and the Fréchet function of the distribution is finite, then the intrinsic mean exists (as the unique minimizer of the Fréchet function). An important generalization of this is to the so-called *metric spaces of non-positive curvature*, or NPC spaces, which include many interesting metric spaces that are not manifolds. Such spaces were introduced by [3] and further developed by [70] and [41]. See [77] for a fine exposition.

###### Example 2.8.

Let $M = S^2$, the unit sphere, with the geodesic distance. It has constant sectional curvature 1, and its injectivity radius is $\pi$. Thus if a distribution has support contained in an open hemisphere, then its Fréchet mean under the geodesic distance exists. To see that one cannot relax this support condition in general, consider the uniform distribution on the equator. Then the minimum expected squared distance is attained at both the North and the South pole (say, $(0,0,1)$ and $(0,0,-1)$), so that the Fréchet mean set has two points.
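The two-point mean set in this example can be checked numerically by comparing the value of the Fréchet function at a pole with its value at a point of the equator, discretizing the uniform distribution (the helper names are ours; the geodesic distance on the sphere is the arccosine of the dot product):

```python
import math

def frechet_value(p, sample):
    """Average squared geodesic distance on the unit sphere from p to the
    sample points; the geodesic distance is arccos of the dot product."""
    total = 0.0
    for q in sample:
        dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
        total += math.acos(dot) ** 2
    return total / len(sample)

# Discretize the uniform distribution on the equator.
equator = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360), 0.0)
           for k in range(360)]

at_pole = frechet_value((0.0, 0.0, 1.0), equator)     # every distance is pi/2
on_equator = frechet_value((1.0, 0.0, 0.0), equator)  # average of theta^2, larger
```

The value at either pole is $(\pi/2)^2 \approx 2.47$, while at any equator point it is close to $\pi^2/3 \approx 3.29$, consistent with the poles being the minimizers.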

###### Remark 2.9.

For purposes of statistical inference the support condition in Theorem 2.6 is restrictive; but, as Example 2.8 shows, one cannot dispense with it without further conditions on the nature of the distribution. In statistical practice a reasonable assumption is that the distribution is absolutely continuous. On the circle, under the assumption that the distribution has a continuous density (with respect to the arc length measure, i.e., the Lebesgue measure on $[0, 2\pi)$), a necessary and sufficient condition, which applies broadly, was obtained in [12] and may be found in [10], pp. 31-33, 73-75.

## 3. Geometry of Kendall’s Shape Spaces.

### 3.1. Kendall’s Similarity Shape Space

The similarity shape of a k-ad, i.e., of a set of k labelled points in a Euclidean space, not all the same, is its orbit under the group generated by translations, scaling and rotations. The effect of translation is removed by subtracting the mean of the k points from each of them; the centered k-ad lies in the hyperplane of configurations whose coordinate-wise means vanish. To get rid of scale, one divides the centered k-ad by its norm. This translated and scaled k-ad is called the *preshape* of the k-ad; it lies on the unit sphere of the hyperplane. An alternative representation of the preshape, which we use, is obtained by multiplying the centered and scaled configuration by the *Helmert matrix* H, whose columns form an orthonormal basis of the subspace of vectors orthogonal to $(1, 1, \dots, 1)$. A standard choice of H has its $j$-th column with the first $j$ entries equal to $\{j(j+1)\}^{-1/2}$, the $(j+1)$-th entry equal to $-j\{j(j+1)\}^{-1/2}$, and the remaining entries zero. The Helmertized preshape is then a matrix of norm one.

The *shape* of the k-ad is then identified with the orbit of its preshape $u$ under all rotations:

(6) | $\sigma(x) = \{ A u : A \in SO(m) \},$

where $SO(m)$, the *special orthogonal group*, is the group of $m \times m$ orthogonal matrices with determinant $+1$, acting on the columns of the preshape. The set of all shapes is Kendall's *similarity shape* space.
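The Helmert matrix used in the preshape construction above can be written down explicitly; in the sketch below (with the normalization $\{j(j+1)\}^{-1/2}$, the standard convention) its columns are produced directly, and their orthonormality and orthogonality to $(1, \dots, 1)$ can be checked:

```python
import math

def helmert_columns(k):
    """Columns of the k x (k-1) Helmert (sub)matrix H: an orthonormal basis
    of the subspace of R^k orthogonal to (1, 1, ..., 1).  The j-th column has
    its first j entries equal to 1/sqrt(j(j+1)), its (j+1)-th entry equal to
    -j/sqrt(j(j+1)), and zeros below."""
    cols = []
    for j in range(1, k):
        a = 1.0 / math.sqrt(j * (j + 1))
        cols.append([a] * j + [-j * a] + [0.0] * (k - j - 1))
    return cols

def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))
```

Multiplying a centered configuration by H removes the translation coordinate, leaving the $(k-1)$-dimensional representation used in the text.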

If $m = 2$ and $k \ge 3$, the action of the rotation group on the preshape sphere is *free*, i.e., no rotation other than the identity has a fixed point, and each orbit has dimension one, namely the dimension of $SO(2)$. Since each rotation is an isometry of the preshape sphere endowed with the geodesic distance, it follows that the quotient shape space is a Riemannian manifold. For $m \ge 3$, however, the action of $SO(m)$ on the preshape sphere is not free. For example, for $m = 3$, each collinear k-ad is invariant under all rotations around the line of the k-ad. The shape space is then a disjoint union of two Riemannian manifolds, not complete: one comprising the orbits of collinear k-ads, the other comprising the orbits of non-collinear k-ads. It is thus a stratified space with two strata. More generally, for higher $m$ the shape space is a *stratified space* with more strata. See [52], Chapter 6, for a complete description of the intrinsic geometry of Kendall's shape spaces. Also see [46] for intrinsic analysis of more general stratified spaces of the form $M/G$, where $M$ is a Riemannian manifold and $G$ is a Lie group of isometries acting on $M$.

3.1(a). Intrinsic geometry of the planar shape space. For the case $m = 2$, it is convenient to regard a k-ad as a k-tuple of points in the complex plane $\mathbb{C}$, and to write its Helmertized preshape as a unit-norm vector $z = (z_1, \dots, z_{k-1}) \in \mathbb{C}^{k-1}$. The shape of the k-ad is then identified with the orbit

(7) | $\sigma(z) = \{ e^{i\theta} z : 0 \le \theta < 2\pi \}.$

One may, equivalently, consider the shape as the orbit $\{\lambda z : \lambda \in \mathbb{C},\ \lambda \neq 0\}$. That is, after Helmertization, the shape of the k-ad is identified with a complex line passing through the origin in $\mathbb{C}^{k-1}$. The shape space is then identified with the *complex projective space* $\mathbb{C}P^{k-2}$, of (real) dimension $2k-4$. We will, however, use the representation (7) of shapes as orbits under the one-dimensional compact group of rotations $\{e^{i\theta} : 0 \le \theta < 2\pi\}$, which is isomorphic to $SO(2)$ and acts by isometries on the preshape sphere. Recall that the metric tensor on the preshape sphere is that inherited from the inclusion into $\mathbb{C}^{k-1}$: the inner product of tangent vectors $v, w$ at $z$, expressed as complex vectors with $\mathrm{Re}\,\langle v, z \rangle = \mathrm{Re}\,\langle w, z \rangle = 0$ (where $\langle v, w \rangle = \sum_j v_j \bar{w}_j$), is $\mathrm{Re}\,\langle v, w \rangle$. The projection map sends a preshape $z$ to its orbit $\sigma(z)$. The *vertical subspace* at $z$ is obtained by differentiating the curve $\theta \mapsto e^{i\theta} z$ at $\theta = 0$, yielding the vector $iz$; that is, the vertical subspace is $\mathbb{R}\,(iz)$, and the *horizontal subspace* at $z$ is its orthogonal complement in the tangent space. The *geodesics* of the shape space and its exponential map are specified by the isometry between the horizontal subspaces and the tangent spaces of the shape space (see the Appendix, Section A). Thus, identifying tangent vectors of the shape space with horizontal vectors, one obtains

(8) |

Denoting by $d$ and $d_S$ the geodesic distances on the shape space and on the preshape sphere, respectively, and recalling that the quotient map is a Riemannian submersion (see Example 10.1 and [52], p. 114), one has

(9) | $d(\sigma(z), \sigma(w)) = \min_{\theta} d_S(e^{i\theta} z, w) = \arccos\,|\langle z, w \rangle| \in [0, \pi/2].$

It follows that the geodesics of the shape space are periodic with period $\pi$, and that its injectivity radius is $\pi/2$. The inverse exponential map is given as follows (use (A.1) with shapes represented by aligned preshapes):

(10) |

where the rotation angle is so chosen as to align $w$ with $z$, i.e., to minimize the distance within the orbit of $w$; this is possible for $\sigma(w)$ not in the cut locus of $\sigma(z)$. Hence one has

(11) |

This horizontal vector represents the inverse exponential map of $\sigma(w)$ at $\sigma(z)$.

The sectional curvature of the planar shape space at a section generated by two orthonormal vector fields can be expressed in terms of their horizontal lifts (see [26]).

3.1(b). Extrinsic geometry of the planar shape space induced by an equivariant embedding. As mentioned in Section 2, no broad sufficient condition is known for the existence of the intrinsic mean (i.e., for the uniqueness of the minimizer of the corresponding Fréchet function). The extrinsic mean, on the other hand, is unique for most distributions, and is generally computable analytically. However, for an extrinsic analysis to be effective, one should choose a good embedding, one which retains as many geometric features of the shape manifold as possible. Let $G$ be a Lie group acting on a differentiable manifold $M$, and denote by $GL(N, \mathbb{R})$ the *general linear group* of nonsingular linear transformations of an $N$-dimensional Euclidean space onto itself. An embedding $j$ of $M$ into $\mathbb{R}^N$ is said to be *$G$-equivariant* if there exists a group homomorphism $\phi$ of $G$ into $GL(N, \mathbb{R})$ such that $j(g \cdot p) = \phi(g)\, j(p)$ for all $g \in G$, $p \in M$. Often, when there is a natural Riemannian structure on $M$, $G$ is a group of isometries of $M$. Consider the so-called *Veronese-Whitney embedding* of the planar shape space into the (real) vector space of all $(k-1) \times (k-1)$ Hermitian matrices, defined by

(12) | $j(\sigma(z)) = z z^*,$

where the unit-norm Helmertized preshape $z$ is regarded as a column vector in $\mathbb{C}^{k-1}$ and $z^*$ denotes its conjugate transpose, so that $z z^*$ is the Hermitian matrix of orthogonal projection onto the complex line spanned by $z$.

The Euclidean inner product on the space of Hermitian matrices, considered as a real vector space, is given by $\langle A, B \rangle = \mathrm{Re}\,\mathrm{Tr}(A B^*)$. Let the *special unitary group* of all unitary matrices with determinant one act on $\mathbb{C}^{k-1}$ in the usual way. Then the embedding (12) is equivariant under this action, with a unitary matrix $\Gamma$ acting on preshapes by $z \mapsto \Gamma z$ and on Hermitian matrices by $A \mapsto \Gamma A \Gamma^*$. Note that this group acts by isometries of the shape space.

To compute the extrinsic mean of a distribution on the planar shape space, let $\tilde{Q}$ be the probability induced on the space of Hermitian matrices by the map in (12), and let $\tilde{\mu}$ denote its Euclidean mean. By Theorem 2.4, the image of the extrinsic mean set is given by the orthogonal projection of $\tilde{\mu}$ onto the image of the embedding.

###### Proposition 3.1.

[17] The image under the embedding (12) of the extrinsic mean set of a distribution on the planar shape space comprises all elements of the form $v v^*$, where $v$ is a normalized (column) eigenvector corresponding to the largest eigenvalue of the Euclidean mean $\tilde{\mu}$ of the induced distribution. In particular, the extrinsic mean exists if and only if the largest eigenvalue of $\tilde{\mu}$ is simple.

###### Proof.

Let be a unitary matrix such that where are the ordered eigenvalues of . Then the columns of form a complete orthonormal set of eigenvectors of . By relabelling the landmarks, if necessary, we may assume that the th column of is an eigenvector with eigenvalue . Write as the square of the Euclidean norm of . Then for elements of , denoting , one has

which is minimized by taking the image point to be $v v^*$, with $v$ any normalized eigenvector of the mean matrix corresponding to its largest eigenvalue. ∎

A *size-and-shape similarity shape* is defined for Helmertized k-ads as the orbit under rotations alone, without scaling. An equivariant embedding of the resulting *size-and-shape similarity shape* space is obtained analogously.

### 3.2. Reflection Similarity Shape Space.

For , let be the subset of the centered preshape sphere whose points span , i.e., which, as matrices, are of full rank. We define the *reflection similarity shape* of the k-ad as

(13) |

where $O(m)$ is the group of all $m \times m$ *orthogonal matrices*. The set of all such orbits is the *reflection similarity shape space*. Since the set of full-rank preshapes is an open subset of the preshape sphere, it is a Riemannian manifold, and $O(m)$ is a compact Lie group of isometries acting on it. Hence there is a unique Riemannian structure on the reflection similarity shape space such that the projection map is a Riemannian submersion.

We next consider a useful embedding of the reflection similarity shape space into the vector space of all real symmetric matrices (see [6], [5], [27], and [8]). Define

(14) | $u \mapsto u^{T} u,$

with $u$ an $m \times (k-1)$ matrix of norm one; note that the right side is a function of the orbit of $u$ alone. Here the elements of the preshape sphere are Helmertized. To see that this is an embedding, we first show that the map is one-to-one on the shape space. For this, note that if $u^T u = v^T v$, then the Euclidean distance matrices of the columns of $u$ and of $v$ are equal. Since $u$ and $v$ are centered, by geometry this implies that $v = A u$ for some $A \in O(m)$, i.e., $u$ and $v$ have the same reflection similarity shape. We omit the proof that the differential is also one-to-one. It follows that the map is an equivariant embedding.

###### Proposition 3.2 ([8]).

(a) The projection of into is given by

(15) |

where are the ordered eigenvalues of , are corresponding orthonormal (column) eigenvectors and . (b) The projection set is a singleton and has a unique extrinsic mean iff . Then where , .

For full-rank Helmertized k-ads, a *size-and-reflection shape* is given by the orbit under the group $O(m)$ alone, without scaling. The space of all such shapes is the size-and-reflection shape space. An equivariant embedding of
