1. Computing averages in nonlinear geometries
The line of research presented in this chapter was first suggested by a 2001 presentation by D. Donoho on multiscale representations of discrete data . A subsequent Ph.D. thesis and accompanying publication appeared a few years later . Multiscale representations are intimately connected with refinement procedures (prediction operators). These are in themselves an interesting topic with applications, e.g. in computer graphics. Iterative refinement a.k.a. subdivision in turn is based on the notion of average. Consequently this chapter is structured into the following parts: Firstly a discussion of averages, in particular averages in metric spaces and in manifolds. Secondly, subdivision rules and the limits generated by them. Thirdly, multiresolution representations.
We start with the affine average w.r.t. weights of data points
contained in a vector space. It is defined by
In this chapter we stick to finite averages, but we allow negative coefficients. For data whose geometry is not that of a vector space, but that of a surface contained in some Euclidean space, or that of a group, or that of a Riemannian manifold, this affine average often does not make sense. In any case it is not natural. Examples of such data are, for instance, unit vectors, positions of a rigid body in space, or the 3 by 3 symmetric positive definite matrices which occur in diffusion-tensor MRI. In the following paragraphs we show how to extend the notation of affine average to nonlinear situations in a systematic way. We start by pointing out equivalent characterizations of the affine average:
The Fréchet mean
Each of (2)–(4) has been used to generalize the notion of weighted average to nonlinear geometries. Some of these generalizations are conceptually straightforward. For example, Equation (4) has an analogue in any metric space , namely the weighted Fréchet mean defined by
It is a classical result that in case of nonnegative weights, the Fréchet mean exists and is unique, if is a Hadamard metric space. This property means is complete, midpoints exist uniquely, and triangles are slim, cf. .111More precisely, for all there is a unique midpoint defined by , and for any and points which have the same pairwise distances as , the inequality holds.
The Fréchet mean in Riemannian manifolds. In a surface resp. Riemannian manifold , the Fréchet mean locally exists uniquely. A main reference here is the paper  by H. Karcher. He considered the more general situation that
is a probability measure on, where the mean is defined by
In this chapter we stick to the elementary case of finite averages with possibly negative weights. The Fréchet mean exists uniquely if the manifold is Hadamard – this property is usually called “Cartan-Hadamard” and is characterized by completeness, simple connectedness, and nonpositive sectional curvature. For unique existence of , we do not even have to require that weights are nonnegative [37, Th. 6].
The Fréchet mean in the non-unique case. If the Cartan-Hadamard property is not fulfilled, the Fréchet mean does not have to exist at all, e.g. if the manifold is not complete (cutting a hole in exactly where the mean should be makes it nonexistent). If the manifold is complete, the mean exists, but possibly is not unique.
If is complete with nonpositive sectional curvature, but is not simply connected, there are situations where a unique Fréchet mean of given data points can still be defined, e.g. if the data are connected by a path with . This will be the case e.g. if data represent a time series. Existence or maybe even canonical existence of such a path depends on the particular application. We then consider the simply connected covering , find a lifting of , compute the Fréchet mean , and project it back to . This average does not only depend on the data points and the weights, but also on the homotopy class of . In fact instead of a path, any mapping can be used for such purposes as long as its domain is simply connected .
Finally, if is complete but has positive sectional curvatures, a unique Fréchet mean is only defined locally. The size of neighbourhoods where uniqueness happens has been discussed by [15, 16, 34]. This work plays a role in investigating convergence of subdivision rules in Riemannian manifolds, see Section 2.4.
The exponential mapping
From the different expressions for the affine average, (2) and (3) seem to be specific to linear spaces, because they involve the and operations. However, it turns out that there is a big class of nonlinear geometries where natural analogues and of these operations exist, namely the exponential mapping and its inverse. We discuss this construction in surfaces resp. Riemannian manifolds, in groups, and in symmetric spaces.
The exponential mapping in Riemannian geometry. In a Riemannian manifold , for any and tangent vector , the point is the endpoint of the geodesic curve which starts in , has initial tangent vector , and whose length equals . We let
One property of the exponential mapping is the fact that curves of the form are shortest paths with initial tangent vector . The mapping is a diffeomorphism locally around . Its differential equals the identity.
Properties of the Riemannian exponential mapping. For complete Riemannian manifolds, is always well defined. Also exists by the Hopf-Rinow theorem, but it does not have to be unique. Uniqueness happens if does not exceed the injectivity radius of . In Cartan-Hadamard manifolds, injectivity radii are infinite and the exponential mapping does not decrease distances, i.e., . The injectivity radius can be small for topological reasons (e.g. a cylinder of small radius which is intrinsically flat, can have arbitrarily small injectivity radius), but even in the simply connected case, one cannot expect to exceed , if is a positive upper bound for sectional curvatures.
Further, the operation and the Riemannian distance are related by
The exponential mapping in groups. In Lie groups, which we describe only in the case of a matrix group , a canonical exponential mapping is defined: With the notation for the tangent space in the identity element, we let
The curve is the unique one-parameter subgroup of whose tangent vector at is the vector . Again, is locally a diffeomorphism whose differential is the identity mapping.
An inverse log of exp is defined locally around . Transferring the definition of to the entire group by left translation, the defining relation yields
Addition is always globally well defined, but the difference might not exist. For example, in , the mapping is not onto. The difference exists always, but not uniquely, in compact groups. See e.g. .
The exponential mapping in symmetric spaces. Symmetric spaces have the form , where is a Lie subgroup of . There are several definitions which are not entirely equivalent. We use the one that the tangent spaces , obey the condition that is the eigenspace of an involutive Lie algebra automorphism of .222i.e., obeys the law , where in the matrix group case, the Lie bracket operation is given by . The tangent space of in the point is naturally identified with the eigenspace of the involution, and is transported to all points of by left translation. The exponential mapping in is projected onto in the canonical way and yields the exponential mapping in the symmetric space.
We do not go into more details but refer to the comprehensive classic  instead. Many examples of well-known manifolds fall into this category, e.g. the sphere , hyperbolic space , and the Grassmannians. We give an important example:
The Riemannian symmetric space of positive-definite matrices. The space of positive definite matrices is made a metric space by letting
Here means the Frobenius norm, and
means the eigenvalues of a matrix.
The metric (7) is actually that of a Riemannian manifold. , as an open subset of the set of symmetric matrices, in each point has a tangent space canonically isomorphic to as a linear space. The Riemannian metric in this space is defined by .
is also a symmetric space: We know that any can be uniquely written as a product , with and . Thus , with , , and the canonical projection .
The respective tangent spaces of are given by and
, which is the set of skew-symmetricmatrices. The involution in obeys , and is its eigenspace. We have thus recognized as a symmetric space. It turns out that , where exp is the matrix exponential function.
The previous paragraphs define two different structures on , namely that of a Riemannian manifold, and that of a symmetric space. They are compatible in the sense that the , operations derived from either structure coincide. For more information we refer to [40, 24, 58]. Subdivision in particular is treated by . ∎
Averages defined in terms of the exponential mapping
If and are defined as discussed in the previous paragraphs, it is possible to define a weighted affine average implicitly by requiring that
Any Fréchet mean in a Riemannian manifold is also an average in this sense, which follows directly from (5) together with (6). Locally, is well defined and unique. As to the size of neighbourhoods where this happens, in the Riemannian case the proof given by [15, 16] for certain neighbourhoods enjoying unique existence of shows that the very same neighbourhoods also enjoy unique existence of .
Affine averages with respect to a base point. From the different expressions originally given for the affine average, is one we have not yet defined a manifold analogue for. With and at our disposal, this can be done by
We call this the log/exp average with respect to the base point . It has the disadvantage of a dependence on the base point, but for the applications we have in mind, there frequently is a natural choice of base point. Its advantages lie in the easier analysis compared to the Fréchet mean. One should also appreciate that the Fréchet mean is a log/exp mean w.r.t. to a basepoint, if that basepoint is the Fréchet mean itself:
The possibility to define averages w.r.t. basepoints rests on the possibility of defining , which has been discussed above.
2.1. Defining stationary subdivision
Subdivision is a refinement process acting on input data lying in some set , which in the simplest case are indexed over the integers and are interpreted as samples of a function . A subdivision rule refines the input data, producing a sequence which is thought of denser samples of either itself, or of a function approximating .
One mostly considers binary rules, whose application “doubles” the number of data points. The dilation factor of the rule, generally denoted by the letter , then equals . We require that the subdivision rule is invariant w.r.t. index shift, which by means of the left shift operator can be formalized as
We require that each point depends only on finitely many data points . Together with shift invariance this means that there is such that influences only .
Subdivision rules are to be iterated: We create finer and finer data
which we hope approach a continuous limit (the proper definition of which is given below).
Subdivision was invented by G. de Rham , who considered the process of iteratively cutting corners from a convex polygon contained in , and asked for the limit shape. If cutting corners is done by replacing each edge by the shorter edge with vertices , , this amounts to a subdivision rule. In de Rham’s example, only two data points contribute to any individual .
Primal and dual subdivision rules. The corner-cutting rules mentioned above are invariant w.r.t. reordering indices according to . With inversion defined by we can write this invariance as . An even simpler kind of symmetry is enjoyed by subdivision rules with obey . The latter are called primal rules, the former dual ones. The reason why we emphasize these properties is that they give guidance for finding manifold analogues of linear subdivision rules.
Subdivision of multivariate data. It is not difficult to generalize the concept of subdivision to multivariate data indexed over the grid . A subdivision rule must fulfill , for all shifts w.r.t. a vector .
Data with combinatorial singularities have to be treated separately, cf. Sec. 3.4. Here basically only the bivariate case is studied, but this has been done extensively, mostly because of applications in Computer Graphics .
Linear subdivision rules and their nonlinear analogues
A linear subdivision rule acting on data has the form
If the sum of coefficients contributing to equals , the application of the rule amounts to computing a weighted average:
Subdivision rules not expressible in this way might occur as auxiliary tools in proofs, but are not meant to be applied to data which are points of an affine space. This is because if , then the linear combination is not translation-invariant, and the rule depends on the choice of origin of the coordinate system.
Besides, the iterated application of rules not expressible as weighted averages either leads to divergent data , or alternatively, to data approaching zero. For this reason, one exclusively considers linear rules of the form (11). A common definition of convergent subdivision rule discounts the case of zero limits and recognizes translation invariance as a necessary condition for convergence, cf. .
For a thorough treatment of linear subdivision rules, conveniently done via acting as a linear operator in and using the appropriate tools of approximation theory, see, e.g. .
In the following we discuss some nonlinear, geometric, versions of subdivision rules. We use the various nonlinear versions of averages introduced above, starting with the Fréchet mean in metric spaces.
Subdivision using the Féchet mean. A natural analogue of (11) is found by replacing the affine average by the Fréchet mean. This procedure is particularly suited for Hadamard metric spaces and also in complete Riemannian manifolds.
Log/exp subdivision. In a manifold equipped with an exponential mapping, an analogue of (11) is defined by
where is a base point computed in a meaningful manner from the input data, e.g. . In case of combinatorial symmetries of the subdivision rule, it makes sense to make the choice of conform to these symmetries.
Subdivision using projections. If is a surface embedded in a vector space and is a projection onto , we might use the subdivision rule
If the intrinsic symmetries of extend to symmetries of ambient space, then this projection analogue of a linear subdivision rule is even intrinsic – see Example 2.1.
Subdivision in the motion group. The groups and are -dimensional surfaces in the linear space . A projection onto
is furnished by singular value decomposition, or in an alternate way of expressing it, by the polar decomposition of Example1:
This projection is -equivariant in the sense that for , we have both and . The same invariance applies to application of a linear subdivision rule acting in . So for any given data in , and a linear subdivision rule , the subdivision rule produces data in in a geometrically meaningful way, as long as we do not exceed the bounds of . Since is a rather big neighbourhood of , this is in practice no restriction. Figure 1 shows an example. ∎
2.2. Convergence of subdivision processes
Definition of convergence. When discrete data are interpreted as samples of a function, then refined data , etc. are interpreted as the result of sampling which is times, times etc. as dense as the original. We therefore define a convergent refinement rule as follows.
Discrete data at the -th iteration of refinement determine a function , whose values are the given data points: For any -adic point , we have , provided is an integer. For all such , the sequence is eventually defined and we let . We say is convergent for input data , if the limit function exists for all and is continuous. It can be uniquely extended to a continuous function .
Another way of defining the limit is possible if data
lie in a vector space. We linearly interpolate them by functionswith . Then the limit of functions agrees with the limit of Def. 2.2 (which is pointwise, but in fact convergence is usually uniform on compact sets.)
Let be a complete metric space, and let the subdivision rule operate with dilation on data . We measure the density of the data by
where we use the 1-norm on the indices. is contractive, resp. displacement-safe, if
If these two conditions are met, any input data with bounded density have a limit , which is Hölder continuous with exponent .
Contractivity implies . For any -adic rational point , the sequence is defined for all . It is Cauchy, since
Thus the limit function is defined for all -adic points.
Consider now two -adic points . Choose such that . For all , approximate resp. by -adic points , such that none of , , exceeds . One can choose . The sequence is eventually constant with limit , and similarly the sequence is eventually constant with limit . Using the symbol for “similar terms involving instead of
”, we estimate
Using the contractivity and displacement-safe condition, we further get
The index was chosen such that , so in particular . We conclude that
Thus is continuous with Hölder exponent on the -adic rationals, and so is the extension of to all of . ∎
The scope of this lemma can be much expanded by some obvious modifications.
Input data with unbounded density . Since points only depend on finitely many ’s, there is such that only influences with . By iteration, influences with , and so on. It follows that influences the value of the limit function only for . We can therefore easily analyze the restriction of the limit function to some box by re-defining all data points away from that box in a manner which makes finite.
Partially defined input data. If data are defined not in all of but only in a subset, the limit function is defined for a certain subset of . Finding this subset goes along the same lines as the previous paragraph – we omit the details.
Convergence for special input data. In order to check convergence for particular input data , it is sufficient that the contractivity and displacement-safe conditions of Lemma 2.2 hold for all data constructed by iterative refinement from . A typical instance of this case is that contractivity can be shown only if does not exceed a certain threshold . It follows that neither does , and Lemma 2.2 applies to all with .
Powers of subdivision rules. A subdivision rule might enjoy convergence like a contractive rule without being contractive itself. This phenomenon is analogous to a linear operator having norm but spectral radius , in which case some . In that case we consider some power as a new subdivision rule with dilation factor . If is contractive with factor , Lemma 2.2 still applies, and limits enjoy Hölder smoothness with exponent .
Convergence of linear subdivision rules. Consider a univariate subdivision rule defined by finitely many nonzero coefficients via (11). acts as a linear operator on sequences . The norm induces an operator norm which obeys . It is an exercise to check . Equality is attained for suitable input data with values in .
With we express the density of the data as . Contractivity means that for some .
Analysis of this contractivity condition uses a trick based on the generating functions and . Equation (11) translates to the relation between generating functions, and we also have . The trick consists in introducing the derived subdivision rule with coefficients which obeys . The corresponding relation between generating functions reads
This division is possible in the ring of Laurent polynomials, because for all , . The contractivity condition now reads , i.e., the contractivity factor of the subdivision rule is bounded from above by . The “displacement-safe” condition of Lemma 2.2 is fulfilled also, which we leave as an exercise (averages of points are not far from the ’s).
Convergence of geodesic corner-cutting rules. Two points of a complete Riemannian manifold are joined by a shortest geodesic path , , . The difference vector and thus the path are generically unique, but do not have to be, if the distance between and exceeds both injectivity radii . The point has , . It is a Fréchet mean of points w.r.t. weights .
With these preparations, we consider two elementary operations on sequences, namely averaging and corner cutting :
The distance of from is bounded by the length of the broken geodesic path which connects the first point with and continues on to the second; its length is bounded by . Similarly, the distance of successive points of the sequence , for is estimated by . It follows immediately that a concatenation of operations of this kind is a subdivision rule where Lemma 2.2 applies, if at least one with is involved. Any such concatenation therefore is a convergent subdivision rule in any complete Riemannian manifold. A classical example are the rules , which insert midpoints,
and then compute rounds of averages. E.g.,
The rule (Chaikin’s rule, see ) is one of de Rham’s corner cutting rules. In the linear case, has coefficients , apart from an index shift. Its limit curves are the B-spline curves whose control points are the initial data .
The corner-cutting rules discussed above are well defined and convergent in any Hadamard metric space – those spaces have geodesics in much the same way as Riemannian manifolds. Subdivision rules based on geodesic averaging (not necessarily restricted to values ) have been treated by [51, 54, 21, 20]. We should also mention that adding a round to a subdivision increases smoothness of limit curves, which was recently confirmed in the manifold case . ∎
Convergence of interpolatory rules. A subdivision rule with dilation factor is called interpolatory if , i.e., the old data points are kept and new data points are inserted in between. In the linear case, a very well studied subdivision rule of this kind is the four-point rule proposed by Dyn, Gregory and Levin . We let and
In the special case , the point is found by evaluating the cubic Lagrange polynomial interpolating , which accounts for the high approximation order of . There is in fact a whole series of interpolatory rules based on the idea of evaluating Lagrange interpolation polynomials (the Dubuc-Deslauriers subdivision schemes, see ).
is a binary “dual” subdivision rule with combinatorial symmetry about edges. Thus it makes sense to define a Riemannian version of by means of averages w.r.t. geodesic midpoints of as base points, cf. Equ. (9). Using , we let
The distance of successive points and is bounded by half the geodesic distance of plus the length of the vector added to the midpoint in the previous formula. This yields the inequality . Lemma 2.2 thus shows convergence, if .
We cannot easily extend this “manifold” four-point rule to more general metric spaces. The reason is that we used the linear structure of the tangent space. A general discussion of univariate interpolatory rules is found in . ∎
2.3. Probabilistic interpretation of subdivision in metric spaces
acting on data . Consider a stochastic process defined as the random walk on with transition probabilities
Then the expected value of , conditioned on is given by
by definition of the expected value. Now the expectation of an
-valued random variablehas a characterization via distances: is that constant which is closest to in the sense of . A similar characterization works for the conditional expectation which is the random variable closest to in the sense. These facts inspired a theory of random variables with values in Hadamard metric spaces developed by K.-T. Sturm [46, 47]. The minimizers mentioned above can be shown to still exist if is replaced by .
Since the way we compute subdivision by Fréchet means is compatible with the distance-based formula for expected values, Equation (13) holds true also in the case that both the expectation and the subdivision rule are interpreted in the Hadamard space sense. On that basis, O. Ebner could show a remarkable statement on convergence of subdivision rules:
[23, Th. 1] Consider a binary subdivision rule with nonnegative coefficients . It produces continuous limits for any data in any Hadamard space if and only if it produces a continuous limit function when acting on real-valued data.
Sketch of proof.
With the random walk defined above, (13) directly implies
Unlike for -valued random variables, there is no tower property for iterated conditioning, so in general . That expression has a different interpretation: is analogous to the linear rule of (12), which is nothing but the restriction of the general rule to data in Euclidean spaces. Its -th power is a linear rule of the form , and we have
This follows either directly (computing the coefficients of the -th iterate corresponds to computing transition probabilites for the -iterate of the random walk), or by an appeal to the tower property in (14).
Sturm  showed a Jensen’s inequality for continuous convex functions ,
To continue, we need some information on the coefficients . For that, we use the limit function generated by applying (or rather, ), to the delta sequence. By construction (see Lemma 2.2), as . These ingredients allow us to show existence of with contractive. ∎
As a corollary we get, for instance, that subdivision with nonnegative coefficients works in in the same way as in linear spaces, as far as convergence is concerned. Since