# Hypoelliptic Diffusion Maps I: Tangent Bundles

We introduce the concept of Hypoelliptic Diffusion Maps (HDM), a framework generalizing Diffusion Maps in the context of manifold learning and dimensionality reduction. Standard non-linear dimensionality reduction methods (e.g., LLE, ISOMAP, Laplacian Eigenmaps, Diffusion Maps) focus on mining massive data sets using weighted affinity graphs; Orientable Diffusion Maps and Vector Diffusion Maps enrich these graphs by also attaching some local geometry to each node. HDM likewise considers a scenario where each node possesses additional structure, which is now itself of interest to investigate. In effect, HDM augments the original data set with attached structures, and provides tools for studying and organizing the augmented ensemble. The goal is to obtain information on individual structures attached to the nodes and on the relationship between structures attached to nearby nodes, so as to study the underlying manifold from which the nodes are sampled. In this paper, we analyze HDM on tangent bundles, revealing its intimate connection with sub-Riemannian geometry and a family of hypoelliptic differential operators. In a later paper, we shall consider more general fibre bundles.


## 1. Introduction

Acquiring complex, massive, and often high-dimensional data sets has become common practice in many fields of the natural and social sciences; while inspiring and stimulating, these data sets can be challenging to analyze or understand efficiently. To gain insight despite the volume and dimension of the data, methods from a wide range of scientific fields have been brought into the picture, rooted in statistical inference, machine learning, and signal processing, to mention just a few.

Among the exploding research interests and directions in data science, the relation between the graph Laplacian [23] and the manifold Laplacian [76] has emerged as a useful guiding principle. Specifically, the field of non-linear dimensionality reduction has witnessed the emergence of a variety of Laplacian-based techniques, such as Locally Linear Embedding (LLE) [77], ISOMAP [93], Hessian Eigenmaps [32], Local Tangent Space Alignment (LTSA) [100], Diffusion Maps [27], Orientable Diffusion Maps (ODM) [83], Vector Diffusion Maps (VDM) [81], and Schrödinger Eigenmaps [95]. The general practice of these methods is to treat each object in the data set (these objects could be images, texts, shapes, etc.) as an abstract node or vertex, and to form a similarity graph by connecting each pair of similar nodes with an edge, weighted by their similarity score. Built with varying flexibility, these methods provide valuable tools for organizing complex networks and data sets by “learning” the global geometry from the local connectivity of weighted graphs.

The Diffusion Map (DM) framework [27, 56, 25, 26, 28, 83, 81] proposes a probabilistic interpretation for graph-Laplacian-based dimensionality reduction algorithms. Under the assumption that the discrete graph is appropriately sampled from a smooth manifold, it assigns transition probabilities from a vertex to each of its neighbors (vertices connected to it) according to the edge weights, thus defining a graph random walk the continuous limit of which is a diffusion process [97, 35] over the underlying manifold. The eigenvalues and eigenvectors of the graph Laplacian, which converge to those of the manifold Laplacian under appropriate assumptions [7, 8], then reveal intrinsic information about the smooth manifold. More precisely, [9] proves that these eigenvectors embed the manifold into an infinite dimensional space, in such a way that the diffusion distance [27] (rather than the geodesic distance) is preserved. Appropriate truncation of these sequences leads to an embedding of the smooth manifold into a finite dimensional Euclidean space, with small metric distortion.
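As a concrete illustration of the DM construction described above, the following sketch builds the graph random walk from a Gaussian kernel and embeds the data with the leading non-trivial eigenvectors; the kernel bandwidth `eps` and diffusion time `t` are illustrative parameters, not prescriptions from the text.

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    """Minimal diffusion-map embedding of the rows of X.

    Gaussian kernel -> degree normalization -> spectral embedding,
    with eigenvectors scaled by eigenvalue powers (diffusion time t).
    """
    # Pairwise squared distances and Gaussian affinities
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)
    d = W.sum(axis=1)
    # Symmetric conjugate of the row-stochastic matrix P = D^{-1} W;
    # it shares P's eigenvalues and has orthogonal eigenvectors.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]          # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    # Recover right eigenvectors of P; column 0 is the trivial constant one
    psi = vecs / np.sqrt(d)[:, None]
    return (vals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
```

For data sampled from a closed curve, the first two coordinates recover the circular parametrization, which is the discrete analogue of embedding by Laplacian eigenfunctions.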

Under the manifold assumption, [83, 81] recently observed that estimates of random walks and diffusion processes on structures associated with the original manifold (as opposed to estimates of diffusion on the manifold itself) are able to handle a wider range of tasks, or obtain improved precision or robustness for tasks considered earlier. For instance, [83] constructed a random walk on the orientation bundle [17, §I.7] associated with the manifold, and translated the detection of orientability into an eigenvector problem, the solution of which reveals the existence of a global section on the orientation bundle; [81] introduced a random walk on the tangent bundle associated with the manifold, and proposed an algorithm that embeds the manifold into a Euclidean space using eigen-vector-fields instead of eigenvectors (thus the name Vector Diffusion Maps (VDM)). In [98] the VDM approach is used, analogously to [9], to embed the manifold into a finite dimensional Euclidean space. Although the VDM embedding does not reduce the dimensionality as much as standard diffusion embedding methods, it benefits from improved robustness to noise, as illustrated by the analysis of some notoriously noisy data sets [49, 50].

Both [83] and [81] incorporate additional structures into the graph Laplacian framework: in [81] this is an extra orthogonal transformation (estimated from local tangent planes) attached to each weighted edge in the graph; in [83] the edge weights are overwritten with signs determined by this orthogonal transformation. These methods are successful because they incorporate more local geometry in the path to dimensionality reduction, by estimating tangent planes. In fact, the advantage of utilizing local geometric information from the tangent bundle had been noticed earlier: Fig. 1 shows a simple example, borrowed from [56, §2.6.1], where the original data set (shown in Fig. 1(a)) is a Descartes folium with self-intersection at the origin, parametrized by

$$x(\theta)=\frac{3\tan\theta}{1+\tan^{3}\theta},\qquad y(\theta)=\frac{3\tan^{2}\theta}{1+\tan^{3}\theta},\qquad \theta\in\left[-\frac{\pi}{2},\frac{\pi}{2}\right].$$

This curve is the projection onto a plane of a helix in $\mathbb{R}^3$. A standard isotropic random walker on the planar curve would get lost at the intersection, even when sober, as shown in Fig. 1(b), where the embedding completely mixes blue and red tails beyond the crossing point. In contrast, incorporating tangent information into the local similarity scores yields a much clearer embedding (see Fig. 1(c)), which blows up (in the sense of complex algebraic geometry [42, p. 182]) the self-intersecting curve at its singularity and unravels its hidden geometry. Specifically, the similarity measure used in the modified diffusion map between any pair of points $(x(\theta_1),y(\theta_1))$ and $(x(\theta_2),y(\theta_2))$ on the curve is

$$d\big((x(\theta_1),y(\theta_1)),(x(\theta_2),y(\theta_2))\big)^2 =\big\|(x(\theta_1),y(\theta_1))-(x(\theta_2),y(\theta_2))\big\|_2^2 +\epsilon\,\big\|T(\theta_1)-T(\theta_2)\big\|_2^2,$$

where $T(\theta)$ denotes the unit tangent direction of the curve at parameter $\theta$, and $\epsilon>0$ is a parameter that balances the two contributions to the dissimilarity score in consideration. (Two distinct tangent vectors exist at the self-intersection, but they each belong to a distinct point in the parametrization.)
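The folium example can be reproduced numerically; the sketch below samples the curve and forms a tangent-augmented dissimilarity, with unit tangents estimated by finite differences (the estimator and the weight `eps` are our illustrative choices, not those of [56]).

```python
import numpy as np

def folium(theta):
    """Descartes folium: x = 3 tanθ/(1+tan³θ), y = 3 tan²θ/(1+tan³θ)."""
    t = np.tan(theta)
    den = 1.0 + t ** 3
    return np.c_[3 * t / den, 3 * t ** 2 / den]

def tangent_dissimilarity(theta, eps=1.0):
    """Pairwise d² = ‖p_i − p_j‖² + eps · ‖T_i − T_j‖², where T is a
    unit tangent estimated by finite differences (an assumption; any
    tangent estimator would do)."""
    P = folium(theta)
    dP = np.gradient(P, theta, axis=0)              # finite-difference tangents
    T = dP / np.linalg.norm(dP, axis=1, keepdims=True)
    pos = np.sum((P[:, None] - P[None, :]) ** 2, axis=-1)
    tan = np.sum((T[:, None] - T[None, :]) ** 2, axis=-1)
    return pos + eps * tan
```

Near the self-intersection, the positional term alone is small for points on the two different branches, while the tangent term keeps them apart, which is exactly what disentangles the crossing in Fig. 1(c).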

It is possible to use the methodology of ODM and VDM to tackle similar problems in much broader contexts, where the local geometric information can be of a different type than information about tangent planes. Indeed, for many data sets, a single data point has abundant structural details; typically graph-Laplacian-based methods begin by “abstracting away” these details, encoding only pairwise similarities. In some circumstances, the hidden details (pixels in an image, vertices/faces on a triangular mesh, key words and transition sentences in a text, etc.) may themselves be of interest. For example, in the geometry processing problem of analyzing large collections of 3D shapes, it is desirable to enable user exploration of shape variations across the collection. In this case, abstracting each single shape as a graph node completely ignores the spatial configuration of an individual shape. On the other hand, even when sticking to pairwise similarity scores significantly simplifies the data manipulation, the best way to score similarity is not always clear. In practice, the similarity measure is often dictated by practical heuristics, which may be misguided for incompletely understood data. In addition, there are situations for which it can be proved that no finite-dimensional representation will do justice to the data. (For instance, in topological data analysis of shapes and surfaces, the only known sufficient statistic (other than the data set itself) is the set of all persistence diagrams taken from all directions [94].)

In this paper, we propose the Hypoelliptic Diffusion Map (HDM), a new graph-Laplacian-based framework for analyzing complex data sets. This method focuses on data sets in which pairwise similarity between data points is not sufficiently informative, but each single data point carries sophisticated individual structure. In practice, this type of data set often arises when the data acquired is too noisy, has huge degrees of freedom, or contains un-ordered features (as opposed to sequential data). An example that has all these characteristics is a data set in which each data point is a two-dimensional surface in $\mathbb{R}^3$, represented either by a triangular mesh or by a collection of persistence diagrams. In many cases, computing pairwise similarity within such data sets requires minimizing some functional over the space of admissible pairwise correspondences, and the similarity score between two surfaces is achieved by a certain optimal correspondence map between the surfaces. It is conceivable that the optimal correspondence contains substantial information, missing from the condensed similarity score. The HDM framework is our first attempt at mining this hidden information from correspondences.

Like ODM and VDM, HDM generalizes the DM framework, but it takes an essentially different path. We are most interested in the scenario in which the individual structures themselves are also manifolds. In order to take them into consideration, we first augment the manifold underlying DM, denoted as $M$, with extra dimensions. To each point $x$ on $M$, this augmentation attaches the individual manifold at $x$, denoted as $F_x$; we assume that around each $x$ there exists an open neighborhood $U$ such that on $U$ the augmented structure “looks like” a product of $U$ with a “universal template” manifold $F$. Intuitively, $F$ plays the role of a “parametrization” for all the $F_x$. Of course, the existence of such a universal template makes sense only if the $F_x$ are compatible in some appropriate sense (each $F_x$ should at least be diffeomorphic to $F$; we shall add more restrictions below); however, such compatibility is not uncommon for many data sets of interest, as we shall see in Section 2. This picture of parametrizing a family of manifolds with an underlying manifold is reminiscent of the modern differential geometric concept of a fibre bundle, which played an important role in the development of geometry, topology, and mathematical physics in the past century. Therefore, we shall refer to this geometric object as the underlying fibre bundle of the data set. Adopting the terminology from differential geometry, we call $M$ the base manifold, the universal template manifold $F$ the fibre, and each $F_x$ a fibre at $x$.

The probabilistic interpretation of HDM is a random walk on the fibre bundle. In one step, the transition occurs either between points on adjacent but distinct fibres, or within the same fibre. Since the fibre bundle is itself a manifold (referred to as the total manifold, denoted as $E$), so far this looks no different from a direct application of DM, only on an augmented geometric object. However, HDM also incorporates the pairwise correspondences of data points in the fibre bundle formulation, by requiring transitions between distinct fibres to satisfy certain directional constraints imposed by the correspondences. The resulting random walk is no longer a direct analogue of its standard counterpart on the total manifold, but rather a “lift” of a random walk on the base manifold $M$. Under mild assumptions, its continuous limit is a diffusion process on the total manifold $E$, infinitesimally generated by a hypoelliptic differential operator [46] (thus the name HDM). We can then embed the whole fibre bundle into a Euclidean space using the eigenfunctions of this hypoelliptic differential operator; discretely this corresponds to solving for the eigenvectors of our new graph Laplacian, referred to as a hypoelliptic Laplacian of the graph. It turns out that, by varying a couple of parameters in its construction, the family of graph hypoelliptic Laplacians contains the discrete analogues of several important and informative partial differential operators on the fibre bundle, relating the geometry of the base manifold with that of the total manifold. Our numerical experiments revealed interesting phenomena when embedding the fibre bundle using eigenvectors of these new graph Laplacians.

Though the HDM framework applies to general fibre bundles, the focus of this paper is the study of tangent and unit tangent bundles of Riemannian manifolds; in a sequel paper we shall study more general fibre bundles. Note that even though the fibre bundles considered in this paper are the same as for VDM, HDM for tangent bundles nevertheless differs from VDM; we shall come back to this below.

This paper is organized as follows: Section 2 sets up notations and terminology, and discusses the meaning of the fibre bundle assumption; Section 3 describes the formulation of HDM in detail; Section 4 characterizes the hypoelliptic graph Laplacians on tangent and unit tangent bundles, and studies their pointwise convergence from finite samples; some numerical experiments are shown in Section 5; finally we conclude with a brief discussion and propose potentially interesting directions for future work. In Appendix A we include preliminaries on the geometry of tangent bundles and (as their subbundles) unit tangent bundles.

## 2. Motivating The Fibre Bundle Assumption

For high-dimensional data generated by some implicit process with relatively few degrees of freedom, it is often reasonable to assume that the data lie approximately on a manifold of much lower dimension than the ambient space. In the literature on semi-supervised learning, this is often referred to as the manifold assumption [6, 101]. The goal of semi-supervised learning is to build a classifier based on a partially labeled training set; learning the underlying manifold structure of high-dimensional data is often the first step in this practice, not only because it reduces the dimensionality, but also because it simplifies the data and exposes the structure.

Our fibre bundle assumption is a generalization of the manifold assumption. In differential geometry, a fibre bundle is itself a manifold, structured as a family of related manifolds parametrized by another underlying manifold. Following [88], a fibre bundle consists of the following data (strictly speaking, the definition given here is that of a coordinate bundle [88, §2.3]; fibre bundles are equivalence classes of coordinate bundles. The distinction is less crucial here, since in the HDM framework we describe the structure of a fibre bundle using coordinates, much as manifold learning uses the notion of a manifold):

1. the total manifold $E$;

2. the base manifold $M$;

3. the bundle projection $\pi$, a surjective smooth map from $E$ onto $M$;

4. the fibre manifold $F$, satisfying

1. for any $x\in M$, $\pi^{-1}(x)$ is diffeomorphic to $F$;

2. for any $x\in M$, there exists an open neighborhood $U$ of $x$ in $M$ and a diffeomorphism $\phi_U$ from $\pi^{-1}(U)$ to $U\times F$;

5. the structure group $G$, a topological transformation group that acts effectively on $F$ ($G$ acts effectively on $F$ if $g\cdot f=f$ for all $f\in F$ implies $g=e$, the identity element of $G$), satisfying

1. for any $x$ and two open neighborhoods $U$ and $V$ of $x$ that both satisfy (4b), the diffeomorphism on $F$, defined by “freezing the first component as $x$”, obtained from $\phi_U$ and $\phi_V$ as

$$g^{UV}_x:=\left[\phi_V\circ\phi_U^{-1}\right](x,\cdot):F\to F,$$

is an element of $G$, and this correspondence

$$x\mapsto g^{UV}_x$$

is continuous with respect to the topology on $G$;

2. for any $x$ and three open neighborhoods $U$, $V$, $W$ that all satisfy (4b),

$$g^{UU}_x=\text{the identity element }e\text{ of }G$$

and

$$g^{UV}_x\circ g^{VW}_x=g^{UW}_x.$$

The diffeomorphisms $\phi_U$ in (4b) are also known as local trivializations. For each $x$ on the base manifold $M$, it is conventional to denote the fibre $\pi^{-1}(x)$ over $x$ as $F_x$. The fibre bundle assumption can now be stated as follows:

###### Assumption 2.1 (The Fibre Bundle Assumption).

The data lie approximately on a fibre bundle, in the sense that each data object is a subset of a fibre over some point on a base manifold.

Note that in the special case where the fibre manifold is a single point, the fibre bundle is diffeomorphic to its base manifold, and our fibre bundle assumption reduces to the manifold assumption.

The definition of a fibre bundle is technical, especially for the part involving the structure group . The key point is that a fibre bundle is locally a product manifold, and these local pieces are carefully patched together so that the product structures remain consistent when they intersect. Product manifolds are thus fibre bundles by definition, but the concept of a fibre bundle becomes interesting only when the global geometry gets twisted and exposes non-trivial topology. The Möbius band, the Klein bottle, and the Hopf fibration are standard illustrations of this; see e.g.  [88, §1].

At first glance, the fibre bundle assumption imposes strong restrictions on the data set structure. However, when understanding the structure of individual data points is as interesting as understanding the structure of the data set in the large, the framework based on the manifold assumption becomes insufficient. For instance, in geometric morphometrics [99], the data sets of interest are collections of shapes, i.e., two-dimensional smooth surfaces in $\mathbb{R}^3$, and the central problem is to infer species and other biological information from shape variations. Under the assumption that these variations are governed by relatively few degrees of freedom, it is possible to learn manifold coordinates for each shape in the collection (e.g., by applying the diffusion map to the shape collection based on some pairwise shape distance, e.g., [74, 60, 38, 62, 57, 58, 1]). Yet it is difficult to infer shape variation from such coordinates, since the geometry of each individual shape is “abstracted away”, collapsing each shape to a single point. To add interpretability to the manifold learning framework in this circumstance, it is natural to learn different coordinates for distinct points on the same shape, while keeping similar the coordinates of points belonging to different shapes that are developmentally or functionally equivalent. This geometric intuition is embodied by the fibre bundle assumption. From this point of view, the fibre bundle assumption is but an extra level of indirection (borrowing a term from Andrew Koenig’s “fundamental theorem of software engineering”) for the manifold assumption.

The example of shape analysis in geometric morphometrics is particularly interesting, because it contains another source of ideas that naturally models the data set as a fibre bundle: the global registration problem. Geometric morphometricians typically select equal numbers of homologous landmarks on each shape in a globally consistent manner, then reduce the analysis to an investigation of the shape space [51, 52] of these landmark points. Along these lines, tools like Generalized Procrustes Analysis (GPA) [40, 34, 53, 41] have been developed in statistical shape analysis [33], and software products [69, 71] made available. (Recent progress in this area [67, 86, 66, 3, 20] relates semidefinite programming with the little Grothendieck problem.) A common basis for these GPA-based methods is that the homology of landmark points depends on human input. Manually placing landmarks on each shape among a large collection is a tedious task, and the skill to perform it correctly typically requires years of professional training. Recently, automated methods have been proposed in this field, based on efficient and robust pairwise surface comparison algorithms [18, 57, 58, 1, 72, 73]. However, biological morphologists typically do not compare surfaces merely pairwise: in practice, an experienced morphologist uses a large database of anatomical structures to improve the consistency and accuracy of visual interpretation of biological features. This consistency cannot be trivially achieved by any geometric algorithm that uses only pairwise comparison information, even when each pairwise comparison is of remarkably high quality. This is shown in Fig. 2, where a small set of landmarks is propagated from a Microcebus molar to a corresponding Lepilemur molar, along three different paths. Though all surfaces A through E are fairly similar to each other (and hence the algorithm in [1] guarantees high-quality pairwise correspondences), direct propagation of landmarks along one path gives a result different from those obtained along the other two. Using the whole collection leads to a more accurate correspondence than an isolated pairwise comparison would.

In the fibre bundle framework, the inherent inconsistency of pairwise-comparison-based global registration can be modeled using the concept of the holonomy of connections. In the sense of Ehresmann [36], a connection is a choice of splitting of the short exact sequence

(2.1)  $0\to VE\to TE\to \pi^{*}TM\to 0.$

In this short exact sequence, $TE$ is the tangent bundle of the total manifold $E$; $VE$ is the vertical tangent bundle of $E$, a subbundle of $TE$ spanned by vectors that are tangent not only to $E$ at some point $u$, but also to the fibre over $\pi(u)$; $\pi^{*}TM$ is the pullback bundle of $TM$ to $E$. The practical meaning of this definition is as follows: since each fibre carries a manifold structure of its own, the notion of vectors that are “tangent to the fibre” is well-defined; these vectors constitute $VE$. The short exact sequence (2.1) tells us that the quotient bundle of $TE$ by $VE$ is isomorphic to $\pi^{*}TM$, but there is no canonical way to choose a “horizontal tangent bundle” $HE$ such that

$$HE\oplus VE=TE.$$

The definition of an Ehresmann connection is just the choice of such a subbundle $HE$. More concretely, a connection specifies for each point $u\in E$ a subspace $H_uE$ of $T_uE$, such that $H_uE$ together with all vertical tangent vectors at $u$ spans the entire tangent space at $u$. Of course, the choice of subspaces should depend smoothly on $u$. We shall call vectors in $HE$ horizontal, while keeping in mind that this concept builds upon the connection.

As long as a connection is given on a fibre bundle, tangent vectors on the base manifold can always be canonically lifted to $E$. That is, for any $u\in E$ and any tangent vector $v\in T_{\pi(u)}M$, there exists in $H_uE$ a unique tangent vector $v^L$ projecting to $v$. In fact, this follows immediately from the fact that $HE$ is isomorphic to $\pi^{*}TM$, as implied in the short exact sequence (2.1). Moreover, a smooth vector field on $M$ can be uniquely lifted to $E$, resulting in a vector field on $E$ that is horizontal everywhere. This eventually enables us to lift any smooth curve $\gamma$ on the base manifold to a horizontal curve $\tilde\gamma$ on $E$, defined by the ODE

$$\frac{d\tilde\gamma}{dt}\bigg|_{u(t)}=\left(\frac{d\gamma}{dt}\bigg|_{\pi(u(t))}\right)^{L}.$$

Note that the horizontal curve $\tilde\gamma$ is uniquely determined once its starting point on $E$ is specified. Therefore, given a smooth curve $\gamma:[0,1]\to M$ that connects $\gamma(0)$ to $\gamma(1)$ on $M$, there exists a smooth map from $F_{\gamma(0)}$ to $F_{\gamma(1)}$ (at least when $\gamma(0)$ and $\gamma(1)$ are sufficiently close), defined as

$$F_{\gamma(0)}\ni s\mapsto \tilde\gamma_s(1)\in F_{\gamma(1)},$$

where $\tilde\gamma_s$ denotes the horizontal lift of $\gamma$ with starting point $s$. The maps so constructed between neighboring fibres, obviously depending on the choice of path $\gamma$, are called the parallel transport along $\gamma$. Like the concept of horizontal tangent vectors, parallel transport depends on the choice of the connection. We shall denote the parallel transport from fibre $F_x$ to fibre $F_y$ along $\gamma$ as $P^{\gamma}_{x,y}$. When $\gamma$ is the unique geodesic on $M$ that connects $x$ to $y$, we drop the super-index and simply write $P_{x,y}$. We shall see later that the probabilistic interpretation of HDM (and even VDM) implicitly depends on lifting from the base manifold a path that is continuous but not necessarily smooth. Though this cannot be trivially achieved by the ODE-based approach, stochastic differential geometry has already prepared the appropriate tools for tackling this technicality (see e.g. [90, §5.1.2]).

We now return to modeling the inherent inconsistency in geometric morphometrics. Similarly to the diffusion map framework, where small distances are considered to approximate geodesic distances on the manifold, we assume, when the pairwise distance between two surfaces is relatively small among all pairwise distances within the collection, that the shape distance is approximately equal to the geodesic distance between the corresponding points on the base manifold. Moreover, under the fibre bundle assumption, we consider the pairwise correspondence map from one fibre to another to approximate the parallel transport along the geodesic connecting the two base points. By routing through different intermediates, one obtains different maps between the same pair of fibres, which is conceptually equivalent to parallel-transporting along different piecewise geodesics. Due to the dependence on the underlying path, parallel transport typically does not define globally consistent maps. The inconsistency shown in Fig. 2, caused by propagation along three different paths, fits into this geometric picture.

The inconsistency of parallel transport, closely related to the curvature of the corresponding connection [2], is characterized by the notion of holonomy [91, 19, 11]. If the parallel transport between any pair of fibres is independent of the choice of path $\gamma$, then the connection is said to be flat, or to have trivial holonomy; otherwise the connection is non-flat and the holonomy is non-trivial. Fig. 3 illustrates the non-trivial holonomy of the Levi-Civita connection on the unit sphere in $\mathbb{R}^3$: if we parallel transport a tangent vector first along the equator and then along a meridian, the result is generally different from the one obtained by directly parallel transporting it along the meridian that connects the initial and final points.
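This holonomy can be checked numerically, using the fact that parallel transport along a great-circle geodesic of the unit sphere is the rotation carrying the start point to the end point; the specific points and vector below are our illustrative choices, not taken from Fig. 3.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues rotation matrix about a (not necessarily unit) axis."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def transport(v, p, q):
    """Parallel transport of tangent vector v from p to q along the
    great-circle geodesic on the unit sphere (p, q not antipodal)."""
    angle = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return rotation(np.cross(p, q), angle) @ v

A = np.array([1.0, 0.0, 0.0])        # point on the equator
B = np.array([0.0, 1.0, 0.0])        # quarter turn along the equator
N = np.array([0.0, 0.0, 1.0])        # north pole
v = np.array([0.0, 1.0, 0.0])        # tangent vector at A

via_B = transport(transport(v, A, B), B, N)   # A -> B -> N
direct = transport(v, A, N)                   # A -> N
# The two results differ by the holonomy angle, which equals the area
# of the enclosed spherical triangle (π/2 for this octant).
angle = np.arccos(np.clip(via_B @ direct, -1.0, 1.0))
```

Both results are tangent vectors at the north pole, yet they disagree by a quarter turn: a quantitative version of the picture in Fig. 3.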

The fibre bundle of interest in Fig. 3 is an example of a tangent bundle. Generally, the tangent bundle $TM$ of a $d$-dimensional Riemannian manifold $M$ is a fibre bundle with base manifold $M$, fibre $\mathbb{R}^d$, and structure group $O(d)$ (the $d$-dimensional orthogonal group); the fibre over each $x\in M$ is $T_xM$, the tangent space of $M$ at $x$. On this bundle there uniquely exists a canonical connection, the Levi-Civita connection, which is simultaneously torsion-free and compatible with the Riemannian metric on $M$. The unit tangent bundle $UTM$ is a subbundle of $TM$, with the same base manifold and structure group, but with a different type of fibre $S^{d-1}$, the unit $(d-1)$-dimensional sphere in $\mathbb{R}^d$; the fibre over each $x\in M$ consists of all tangent vectors of $M$ at $x$ with unit length. The Levi-Civita connection carries over to a canonical connection on $UTM$. We focus on analyzing HDM on these two types of fibre bundles in this paper.

Note that the tangent bundle is also of fundamental importance for VDM. However, as we shall see in Section 3, HDM aims at a goal different from VDM’s, even on tangent bundles: VDM acts on vector fields on $M$, or equivalently operates on sections of the tangent bundle $TM$; HDM focuses on functions on the total manifold $TM$, and thus operates on sections of the trivial line bundle over $TM$. While VDM embeds the base manifold into a Euclidean space of lower dimension, HDM is more interested in how each fibre of $TM$ corresponds to its neighboring fibres. In short, VDM and HDM extend DM in two different directions.

The use of diffusion maps to solve the global registration problem was proposed earlier in the geometry processing community [80, 54], as was the concept of a “template” for a collection of shapes [96, 87, 68, 48, 47, 22]. These approaches were quite successful, albeit based mostly on heuristics; the fibre bundle framework provides geometric interpretations and insights for many of them. For instance, cycle-consistency-based approaches [68, 47] focus on improving the consistency of composed correspondence maps along cycles, which is implicitly an attempt to recover from condition (5b) the fibre bundle structure that underlies the shape collection; from this point of view, these methods sample only one point from each coordinate patch on the base manifold, and likely suffer from an inaccurate recovery due to the low sampling rate. [54] uses the diffusion map as a visualization tool, based on a dissimilarity score computed from local and global shape alignments. This is similar to the random walk HDM constructs on a fibre bundle; the geometric meaning of the fuzzy correspondence score in [54] is vague from a manifold learning point of view, but then, it was not the main focus of [54] to analyze the new graph Laplacian on the discretized fibre bundle.

From the fibre bundle point of view, the goal of many global registration problems is to learn the fibre bundle structure that underlies the collection of objects. By analogy with the term manifold learning, we call this type of learning problem fibre learning. A flat connection, or its induced parallel transport, is the key to resolving the problem. However, we remark that the existence of a flat connection on an arbitrary fibre bundle is not guaranteed: the geometry and topology of the fibre bundle may present an obstruction. For a discussion on tangent bundles, see [61, 39].

## 3. Hypoelliptic Diffusion Maps: The Formulation

### 3.1. Basic Setup

The data set considered in the HDM framework is a triplet $(X,\rho,G)$ consisting of a total data set, a similarity measure, and an affinity graph, where

1. The total data set $X$ is formed by the union

$$X=\bigcup_{j=1}^{n}X_j,$$

where each subset $X_j$ is referred to as the $j$-th fibre of $X$, containing $\kappa_j$ points

$$X_j=\{x_{j,1},x_{j,2},\cdots,x_{j,\kappa_j}\}.$$

We call the collection of fibres the base data set

$$B=\{X_1,X_2,\cdots,X_n\},$$

and let $\pi$ be the canonical projection from $X$ to $B$:

$$\pi:X\longrightarrow B,\qquad x_{j,k}\longmapsto X_j,\quad 1\le j\le n,\ 1\le k\le\kappa_j.$$

We shall denote the total number of points in $X$ as

$$\kappa=\kappa_1+\kappa_2+\cdots+\kappa_n.$$
2. The similarity measure $\rho$ is a real-valued function on $X\times X$, such that for all $\xi,\eta\in X$

$$\rho(\xi,\eta)\ge 0,\qquad \rho(\xi,\xi)=0,\qquad \rho(\xi,\eta)=\rho(\eta,\xi).$$

On the product set $X_i\times X_j$, we denote

$$\rho_{ij}(s,t)=\rho(x_{i,s},x_{j,t});$$

then $\rho_{ij}$ is a $\kappa_i\times\kappa_j$ matrix on $X_i\times X_j$, to which we will refer as the similarity matrix between $X_i$ and $X_j$.

3. The affinity graph $G$ has $\kappa$ vertices, each corresponding to a point of $X$; without loss of generality, we shall assume $G$ is connected. (In our applications, each point is typically connected to several points on neighboring fibres.) If there is an edge between $\xi$ and $\eta$ in $G$, then $\xi$ is a neighbor of $\eta$ and $\eta$ is a neighbor of $\xi$. Moreover, we also call $X_i$ a neighbor of $X_j$ (and similarly $X_j$ a neighbor of $X_i$) if there is an edge in $G$ linking one point in $X_i$ with one point in $X_j$; this terminology implicitly defines a second graph, whose vertices are in one-to-one correspondence with the fibres of $X$ and which encodes the neighborhood relations between pairs of fibres. This graph will be called the base affinity graph.
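A small sketch of this setup: assembling the $\kappa\times\kappa$ block affinity matrix from a list of fibres, with a Gaussian kernel standing in for the problem-specific similarity $\rho$ (the kernel and bandwidth `eps` are our illustrative choices, not the paper's).

```python
import numpy as np

def block_affinity(fibres, eps=1.0):
    """Assemble the κ×κ affinity matrix for X = X_1 ∪ … ∪ X_n.

    Each fibres[j] is a (κ_j, d) array of points; the (i, j) block of
    the returned matrix is the κ_i×κ_j similarity matrix between
    fibres X_i and X_j.  `offsets` marks the block boundaries.
    """
    X = np.vstack(fibres)                        # all κ points stacked
    sizes = [len(F) for F in fibres]
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)                        # symmetric by construction
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    return W, offsets
```

The block `W[offsets[i]:offsets[i+1], offsets[j]:offsets[j+1]]` then plays the role of the similarity matrix between $X_i$ and $X_j$.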

### 3.2. Graph Hypoelliptic Laplacians

Let $W$ be the weighted adjacency matrix of the graph $\mathcal{G}$, i.e., $W$ is a $\kappa \times \kappa$ block matrix in which the $(i,j)$-th block is

 (3.1) $W_{ij} = \rho_{ij}.$

The entry $W_{ij}(s,t)$ in $W$ is thus the edge weight between $x_{i,s}$ and $x_{j,t}$. Note that $W$ is a symmetric matrix, since $\rho$ is symmetric. Let $D$ be the $\kappa \times \kappa$ diagonal matrix

 (3.2) $D := \mathrm{diag}\Big\{ \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} W_{1j}(1,t), \cdots, \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} W_{nj}(\kappa_n,t) \Big\};$

then the graph hypoelliptic Laplacian for the triplet $(X, \rho, \mathcal{G})$ is defined as the graph Laplacian of the graph $\mathcal{G}$ with edge weights given by $W$, that is,

 (3.3) $L^{H} := D - W.$

Since $\mathcal{G}$ is connected, the diagonal elements of $D$ are all non-zero, and we can define the random-walk and normalized versions of $L^{H}$:

 (3.4) $L^{H}_{\mathrm{rw}} := D^{-1} L^{H} = I - D^{-1} W,$
 (3.5) $L^{H}_{*} := D^{-1/2} L^{H} D^{-1/2} = I - D^{-1/2} W D^{-1/2}.$
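To make the constructions (3.1)–(3.5) concrete, here is a minimal NumPy sketch; a random symmetric, entry-wise non-negative matrix stands in for a block matrix assembled from real similarity data, and the fibre sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
kappas = [3, 4, 2]                  # illustrative fibre sizes kappa_j
kappa = sum(kappas)

# A random symmetric, entry-wise non-negative matrix stands in for the
# block matrix W of (3.1) assembled from the similarity blocks rho_ij.
A = rng.random((kappa, kappa))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)            # rho(xi, xi) = 0

# (3.2)-(3.5): degree matrix and the three Laplacian variants.
d = W.sum(axis=1)                                     # diagonal of D
L_H = np.diag(d) - W                                  # (3.3)
L_rw = np.eye(kappa) - W / d[:, None]                 # (3.4)
L_sym = np.eye(kappa) - W / np.sqrt(np.outer(d, d))   # (3.5)

assert np.allclose(L_H.sum(axis=1), 0.0)   # rows of D - W sum to zero
assert np.allclose(L_sym, L_sym.T)         # (3.5) is symmetric
```

For large sparse affinity graphs one would of course use sparse matrices rather than dense arrays, but the algebra is identical.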

Following [27], we can also repeat the constructions above on a renormalized graph of $\mathcal{G}$. More precisely, let $Q_\alpha$ be the $\kappa \times \kappa$ diagonal matrix

 (3.6) $Q_\alpha := \mathrm{diag}\Big\{ \Big( \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} W_{1j}(1,t) \Big)^{\alpha}, \cdots, \Big( \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} W_{nj}(\kappa_n,t) \Big)^{\alpha} \Big\},$

where $\alpha$ is some constant between $0$ and $1$, and set

 (3.7) $W_\alpha := Q_\alpha^{-1} W Q_\alpha^{-1}.$

The graph hypoelliptic Laplacians can then be constructed for $W_\alpha$ instead of $W$, by first forming the diagonal matrix

 (3.8) $D_\alpha := \mathrm{diag}\Big\{ \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} (W_\alpha)_{1j}(1,t), \cdots, \sum_{j=1}^{n} \sum_{t=1}^{\kappa_j} (W_\alpha)_{nj}(\kappa_n,t) \Big\},$

and then setting

 (3.9) $L^{H}_{\alpha} := D_\alpha - W_\alpha,$
 (3.10) $L^{H}_{\alpha,\mathrm{rw}} := D_\alpha^{-1} L^{H}_{\alpha} = I - D_\alpha^{-1} W_\alpha,$
 (3.11) $L^{H}_{\alpha,*} := D_\alpha^{-1/2} L^{H}_{\alpha} D_\alpha^{-1/2} = I - D_\alpha^{-1/2} W_\alpha D_\alpha^{-1/2}.$
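The $\alpha$-renormalization (3.6)–(3.11) is a small additional step once $W$ is available; the following sketch (random weights and $\alpha = 0.5$ are illustrative choices) implements it directly:

```python
import numpy as np

rng = np.random.default_rng(2)
kappa, alpha = 8, 0.5               # alpha in [0,1]; 0.5 is one example

A = rng.random((kappa, kappa))
W = (A + A.T) / 2                   # symmetric non-negative weights
np.fill_diagonal(W, 0.0)

# (3.6)-(3.8): renormalize the weights by degrees raised to the power alpha.
q = W.sum(axis=1) ** alpha          # diagonal of Q_alpha
W_alpha = W / np.outer(q, q)        # (3.7): Q_alpha^{-1} W Q_alpha^{-1}
d_alpha = W_alpha.sum(axis=1)       # diagonal of D_alpha, (3.8)

# (3.9)-(3.11): the alpha-renormalized graph hypoelliptic Laplacians.
L_a = np.diag(d_alpha) - W_alpha
L_a_rw = np.eye(kappa) - W_alpha / d_alpha[:, None]
L_a_sym = np.eye(kappa) - W_alpha / np.sqrt(np.outer(d_alpha, d_alpha))

# The symmetric variant is PSD with smallest eigenvalue 0 (connected graph).
evals = np.linalg.eigvalsh(L_a_sym)
assert np.allclose(L_a_sym, L_a_sym.T) and abs(evals[0]) < 1e-8
```

Note that $\alpha = 0$ recovers the unrenormalized construction, since $Q_0 = I$.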

### 3.3. Spectral Distances and Embeddings

In order to define spectral distances, we shall use eigen-decompositions. This is the reason to consider the symmetric matrices $L^{H}_{\alpha,*}$; their eigen-decompositions lead to a natural representation for $L^{H}_{\alpha,\mathrm{rw}}$, since $L^{H}_{\alpha,*}$ and $L^{H}_{\alpha,\mathrm{rw}}$ are diagonal-similar:

 $L^{H}_{\alpha,*} = D_\alpha^{1/2}\, L^{H}_{\alpha,\mathrm{rw}}\, D_\alpha^{-1/2}.$

Let $v$ be an eigenvector of $L^{H}_{\alpha,*}$ corresponding to eigenvalue $\lambda$; $v$ defines a function on the vertices of $\mathcal{G}$, or equivalently on the data set $X$. By the construction of $L^{H}_{\alpha,*}$, $v$ can be written as the concatenation of $n$ segments of lengths $\kappa_1, \cdots, \kappa_n$,

 $v = (v[1]^{\top}, \cdots, v[n]^{\top})^{\top},$

where $v[j] \in \mathbb{R}^{\kappa_j}$ defines a function on fibre $X_j$. Now let $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{\kappa-1}$ be the eigenvalues of $L^{H}_{\alpha,*}$ in ascending order, and denote the eigenvector corresponding to eigenvalue $\lambda_l$ as $v_l$. By our connectivity assumption for $\mathcal{G}$, we know from spectral graph theory [23] that $\lambda_0 = 0$, that $\lambda_1 > 0$, and that $v_0$ is proportional to $D_\alpha^{1/2}\mathbf{1}$, so that $D_\alpha^{-1/2} v_0$ is a constant vector; we have thus

 $0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{\kappa-1}.$

By the spectral decomposition of $L^{H}_{\alpha,*}$,

 (3.12) $L^{H}_{\alpha,*} = \sum_{l=0}^{\kappa-1} \lambda_l\, v_l v_l^{\top},$

and for any fixed diffusion time $t > 0$,

 (3.13) $(L^{H}_{\alpha,*})^{t} = \sum_{l=0}^{\kappa-1} \lambda_l^{t}\, v_l v_l^{\top},$

with the $(i,j)$-block taking the form

 (3.14) $\big( (L^{H}_{\alpha,*})^{t} \big)_{ij} = \sum_{l=0}^{\kappa-1} \lambda_l^{t}\, v_l[i]\, v_l[j]^{\top}.$

In general, this block is not square (it has size $\kappa_i \times \kappa_j$). Its squared Frobenius norm can be computed as

 (3.15) $\big\| \big( (L^{H}_{\alpha,*})^{t} \big)_{ij} \big\|_F^{2} = \mathrm{Tr}\Big[ \big( (L^{H}_{\alpha,*})^{t} \big)_{ij}\, \big( (L^{H}_{\alpha,*})^{t} \big)_{ij}^{\top} \Big] = \mathrm{Tr}\Big[ \sum_{l,m=0}^{\kappa-1} \lambda_l^{t} \lambda_m^{t}\, v_l[i]\, v_l[j]^{\top} v_m[j]\, v_m[i]^{\top} \Big] = \mathrm{Tr}\Big[ \sum_{l,m=0}^{\kappa-1} \lambda_l^{t} \lambda_m^{t}\, v_m[i]^{\top} v_l[i]\; v_l[j]^{\top} v_m[j] \Big] = \sum_{l,m=0}^{\kappa-1} \lambda_l^{t} \lambda_m^{t}\, v_m[i]^{\top} v_l[i]\; v_l[j]^{\top} v_m[j].$

Let us define the hypoelliptic base diffusion map

 (3.16) $V_t : B \longrightarrow \mathbb{R}^{\kappa^2}, \qquad X_j \longmapsto \big( \lambda_l^{t/2} \lambda_m^{t/2}\, v_l[j]^{\top} v_m[j] \big)_{0 \le l,m \le \kappa-1};$

then (denoting the standard Euclidean inner product in $\mathbb{R}^{\kappa^2}$ as $\langle \cdot, \cdot \rangle$)

 (3.17) $\big\| \big( (L^{H}_{\alpha,*})^{t} \big)_{ij} \big\|_F^{2} = \big\langle V_t(X_i), V_t(X_j) \big\rangle,$

with which we can define the hypoelliptic base diffusion distance on $B$ as

 (3.18) $d_{\mathrm{HBDM},t}(X_i, X_j) = \big\| V_t(X_i) - V_t(X_j) \big\| = \big\{ \langle V_t(X_i), V_t(X_i) \rangle + \langle V_t(X_j), V_t(X_j) \rangle - 2 \langle V_t(X_i), V_t(X_j) \rangle \big\}^{\frac{1}{2}}.$
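A small numerical sketch can confirm the identity (3.17) relating the base diffusion map to the Frobenius norms of the blocks of $(L^{H}_{\alpha,*})^t$. For brevity we take $\alpha = 0$, for which $L^{H}_{0,*}$ is the usual symmetrically normalized Laplacian; the graph is a random stand-in and the fibre sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
kappas = [3, 3, 3]                  # illustrative equal fibre sizes
kappa, t = sum(kappas), 2.0         # t is the diffusion time

A = rng.random((kappa, kappa))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L_sym = np.eye(kappa) - W / np.sqrt(np.outer(d, d))   # L^H_{0,*}

lam, V = np.linalg.eigh(L_sym)      # ascending eigenvalues, columns v_l
lam = np.clip(lam, 0.0, None)       # clip tiny negative round-off

segments = np.split(V, np.cumsum(kappas)[:-1], axis=0)  # rows v_l[j]

# (3.16): V_t(X_j) collects lambda_l^{t/2} lambda_m^{t/2} v_l[j]^T v_m[j].
def base_map(j):
    G = segments[j].T @ segments[j]   # Gram matrix of entries v_l[j]^T v_m[j]
    return (np.outer(lam ** (t / 2), lam ** (t / 2)) * G).ravel()

# (3.18): hypoelliptic base diffusion distance.
def d_hbdm(i, j):
    return np.linalg.norm(base_map(i) - base_map(j))

# Check (3.17): <V_t(X_i), V_t(X_j)> equals the squared Frobenius norm of
# the (i, j) block of (L^H_{0,*})^t.
Lt = (V * lam ** t) @ V.T           # spectral power, as in (3.13)
block = Lt[0:3, 3:6]                # the (1, 2) block
assert np.isclose(base_map(0) @ base_map(1), np.linalg.norm(block) ** 2)
```

The $\kappa^2$-dimensional embedding is of course redundant in practice; one typically truncates to the leading spectral coordinates.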

The hypoelliptic base diffusion map $V_t$ embeds the base data set $B$ into a Euclidean space using $\mathcal{G}_B$, the base affinity graph with edges weighted by entry-wise non-negative matrices. In this sense, it is closely related to the vector diffusion maps [81]: if

 $\kappa_1 = \kappa_2 = \cdots = \kappa_n = d$

and (relaxing the constraint $\rho \ge 0$)

 $(V_i, V_j) \in E_B \iff \rho_{ij} = w_{ij} O_{ij}, \quad \text{where } w_{ij} \ge 0 \text{ and } O_{ij} \text{ is } d \times d \text{ orthogonal},$

then the weighted adjacency matrix $W$ (as defined in (3.1)) coincides with the adjacency matrix defined in [81, §3]. In this case, the graph hypoelliptic Laplacian of $(X, \rho, \mathcal{G})$ reduces to the graph connection Laplacian for $\mathcal{G}_B$. Note that in HDM we assume the non-negativity of the similarity measure $\rho$, which is generally not the case for vector diffusion maps. (The non-negativity of the eigenvalues of $L^{H}_{\alpha,*}$ allows us to consider arbitrary powers $t$; in VDM, this is circumvented by considering only even powers.) From a different point of view, by the Riesz Representation Theorem, smooth vector fields on a manifold can be identified with fibre-wise linear functions on the tangent bundle; thus VDM can be viewed as HDM restricted to the space of fibre-wise linear functions.

In addition to embedding the base data set $B$, HDM is also capable of embedding the total data set $X$ into Euclidean spaces. Define for each diffusion time $t > 0$ the hypoelliptic diffusion map

 (3.19) $H_t : X \longrightarrow \mathbb{R}^{\kappa-1}, \qquad x_{j,s} \longmapsto \big( \lambda_1^{t} v_1[j](s),\ \lambda_2^{t} v_2[j](s),\ \cdots,\ \lambda_{\kappa-1}^{t} v_{\kappa-1}[j](s) \big),$

where $v_l[j](s)$ is the $s$-th entry of the $j$-th segment of the $l$-th eigenvector, with $1 \le s \le \kappa_j$. We could also have written

 $v_l[j](s) = v_l(s_j + s), \quad \text{where } s_1 = 0 \text{ and } s_j = \sum_{p=1}^{j-1} \kappa_p \text{ for } j \ge 2.$

Following a similar argument as in [27], we can define the hypoelliptic diffusion distance on $X$ as

 (3.20) $d_{\mathrm{HDM},t}(\xi, \eta) = \big\| H_t(\xi) - H_t(\eta) \big\|, \qquad \xi, \eta \in X.$

As a result, $H_t$ embeds the total data set $X$ into a Euclidean space in such a manner that the hypoelliptic diffusion distance on $X$ is preserved. Moreover, this embedding automatically suggests a global registration for all fibres, according to the similarity measure $\rho$. For simplicity of notation, let us write

 $H_t^j := H_t \!\upharpoonright_{X_j}$

for the restriction of $H_t$ to fibre $X_j$, and call it the $j$-th component of $H_t$. Up to scaling, the components of $H_t$ bring the fibres of $X$ to a common "template", such that points $x_{i,s}$ and $x_{j,t}$ with a high similarity measure tend to be close to each other in the embedded Euclidean space. Pairwise correspondences between fibres can then be reconstructed from the hypoelliptic diffusion map. Indeed, assuming each $X_j$ is sampled from some manifold $F_j$, and a template fibre can be estimated from

 $H_t^1(X_1), \cdots, H_t^n(X_n),$

then one can often extend each $H_t^j$ (by interpolation) from a discrete correspondence to a continuous bijective map from $F_j$ to the template, and build correspondence maps between an arbitrary pair of fibres by composing the interpolated continuous maps. A similar construction was implicit in [54]. Sometimes, it is more useful to consider a normalized version of the hypoelliptic diffusion map that takes values on the standard unit sphere in $\mathbb{R}^{\kappa-1}$:

 (3.21) $\widetilde{H}_t : X \longrightarrow S^{\kappa-2} \subset \mathbb{R}^{\kappa-1}, \qquad x_{j,s} \longmapsto \frac{H_t^j(x_{j,s})}{\big\| H_t^j(x_{j,s}) \big\|}.$

We shall see an example that applies $\widetilde{H}_t$ in Section 5.
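The maps (3.19) and (3.21) are straightforward to realize once a spectral decomposition is available. The sketch below (a random toy graph, $\alpha = 0$, illustrative fibre sizes, and 0-based indices for convenience) indexes eigenvector entries via the offsets $s_j$:

```python
import numpy as np

rng = np.random.default_rng(4)
kappas = [3, 4, 2]                  # illustrative fibre sizes
kappa, t = sum(kappas), 1.5         # t is the diffusion time

A = rng.random((kappa, kappa))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L_sym = np.eye(kappa) - W / np.sqrt(np.outer(d, d))   # alpha = 0 variant
lam, V = np.linalg.eigh(L_sym)
lam = np.clip(lam, 0.0, None)       # clip tiny negative round-off

offsets = np.concatenate(([0], np.cumsum(kappas)))    # the shifts s_j

# (3.19): H_t sends x_{j,s} to (lambda_l^t v_l[j](s))_{l=1..kappa-1},
# dropping the trivial l = 0 coordinate; j and s are 0-based here.
def H_t(j, s):
    row = offsets[j] + s            # v_l[j](s) = v_l(s_j + s)
    return (lam[1:] ** t) * V[row, 1:]

# (3.21): normalized version, taking values on the unit sphere.
def H_t_norm(j, s):
    h = H_t(j, s)
    return h / np.linalg.norm(h)

assert H_t(0, 0).shape == (kappa - 1,)
assert np.isclose(np.linalg.norm(H_t_norm(2, 1)), 1.0)
```

Rows of the embedded fibres can then be compared directly, which is what drives the registration interpretation above.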

## 4. HDM on Tangent and Unit Tangent Bundles

The HDM framework is very flexible: if each fibre consists of a single point, the graph hypoelliptic Laplacian reduces to the graph Laplacian that underlies diffusion maps; if all the fibres have the same number of points and all similarity matrices $\rho_{ij}$ (defined in Section 3.1) are orthogonal (up to a multiplicative constant), the graph hypoelliptic Laplacian reduces to the graph connection Laplacian that underlies vector diffusion maps. The goal of this section is to relate HDM to some other partial differential operators of geometric importance on tangent and unit tangent bundles of (compact, closed) Riemannian manifolds. In a follow-up paper, we will extend the geometric setting to more general fibre bundles.

This section builds upon the fibre bundle assumption. Adopting the notation of Section 3.1, we assume that $X$ is sampled from a fibre bundle $E$ over a base manifold $M$, and each fibre $X_j$ is sampled from the fibre of $E$ over some point $x_j \in M$. Moreover, we shall assume that $E$ is the tangent bundle or unit tangent bundle of $M$, i.e., $E = TM$ or $E = UTM$. For the convenience of the reader, some basic properties of the geometry of these fibre bundles are reviewed in Appendix A.

### 4.1. HDM on Tangent Bundles

Let $K$ be a smooth kernel function supported on the unit square $[0,1] \times [0,1]$. In all that follows, we shall assume that $M$ is a compact manifold without boundary, which, according to standard custom, we shall simply call a closed manifold. Let the closed manifold $M$ be equipped with a Riemannian metric tensor $g$, which induces on $M$ a geodesic distance function $d_M$; $g$ defines an inner product on each tangent space of $M$, denoted as

 (4.1) $\langle u, v \rangle_x = g_{jk}(x)\, u^j v^k, \qquad u, v \in T_xM,$

where, and for the remainder of this section unless otherwise specified, we have adopted the Einstein summation convention. The vector norm on $T_xM$ with respect to this inner product shall be denoted as

 (4.2) $\| u \|_x = \big( g_{jk}(x)\, u^j u^k \big)^{\frac{1}{2}}, \qquad u \in T_xM.$

We denote by

 (4.3) $P_{y,x} : T_xM \to T_yM$

the parallel transport from $x$ to $y$ with respect to the Levi-Civita connection on $M$, along a geodesic segment that connects $x$ to $y$. It is well known that a tangent vector can be parallel-transported along any smooth curve on $M$; since $M$ is compact, its injectivity radius $\mathrm{inj}(M)$ is positive, and thus any $x \in M$ lies within a geodesic normal neighborhood in which any point can be connected to $x$ through a unique geodesic of length smaller than $\mathrm{inj}(M)$. Therefore, $P_{y,x}$ is well-defined, at least for $x, y$ with $d_M(x,y) < \mathrm{inj}(M)$. Furthermore, for such $x, y$, $P_{y,x}$ is an orientation-preserving isometry between the domain and target tangent planes [30, Exercise 2.1].

For bandwidth parameters $\epsilon > 0$, $\delta > 0$, define for all $(x,v), (y,w) \in TM$

 (4.4) $K_{\epsilon,\delta}(x,v;\, y,w) := K\bigg( \frac{d_M^2(x,y)}{\epsilon},\ \frac{\big\| P_{y,x} v - w \big\|_y^2}{\delta} \bigg).$

Note that the requirement that $K$ be supported on the unit square implies that $K_{\epsilon,\delta}(x,v;y,w) \neq 0$ only if $d_M(x,y) \le \sqrt{\epsilon}$. It follows that $P_{y,x}$, and further $K_{\epsilon,\delta}$, are well-defined when $\sqrt{\epsilon} < \mathrm{inj}(M)$; we shall restrict ourselves to such sufficiently small $\epsilon$. $K_{\epsilon,\delta}$ is symmetric because parallel transport is an isometry between the tangent planes:

 $K_{\epsilon,\delta}(y,w;\, x,v) = K\bigg( \frac{d_M^2(y,x)}{\epsilon}, \frac{\big\| P_{x,y} w - v \big\|_x^2}{\delta} \bigg) = K\bigg( \frac{d_M^2(x,y)}{\epsilon}, \frac{\big\| P_{y,x}(P_{x,y} w - v) \big\|_y^2}{\delta} \bigg) = K\bigg( \frac{d_M^2(x,y)}{\epsilon}, \frac{\big\| w - P_{y,x} v \big\|_y^2}{\delta} \bigg) = K_{\epsilon,\delta}(x,v;\, y,w).$
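To make (4.4) and its symmetry tangible, consider the simplest closed manifold $M = S^1$: the tangent space at $x = (\cos\theta, \sin\theta)$ is spanned by the unit vector $e(\theta) = (-\sin\theta, \cos\theta)$, geodesic distance is arc length, and parallel transport along the circle carries $a\,e(\theta_x)$ to $a\,e(\theta_y)$. The Python sketch below checks the symmetry numerically; the particular kernel and bandwidth values are illustrative choices, not taken from the paper:

```python
import numpy as np

def K(u1, u2):
    # A stand-in kernel supported on the unit square [0,1] x [0,1]
    # (merely C^1 at the boundary; the paper assumes a smooth K).
    if 0.0 <= u1 <= 1.0 and 0.0 <= u2 <= 1.0:
        return (1.0 - u1) ** 2 * (1.0 - u2) ** 2
    return 0.0

def d_M(th_x, th_y):
    # Geodesic (arc-length) distance on the unit circle.
    return abs((th_x - th_y + np.pi) % (2 * np.pi) - np.pi)

def K_eps_delta(th_x, a, th_y, b, eps=0.5, delta=0.2):
    # (4.4) with v = a*e(th_x) and w = b*e(th_y); on S^1, parallel
    # transport carries a*e(th_x) to a*e(th_y), so the vertical
    # term ||P_{y,x} v - w||_y reduces to |a - b|.
    return K(d_M(th_x, th_y) ** 2 / eps, (a - b) ** 2 / delta)

# Symmetry of the kernel: K(x,v; y,w) = K(y,w; x,v).
assert np.isclose(K_eps_delta(0.1, 0.3, 0.6, 0.1),
                  K_eps_delta(0.6, 0.1, 0.1, 0.3))
assert K_eps_delta(0.1, 0.3, 0.6, 0.1) > 0.0
```

On a general manifold, the same computation would require numerically approximating geodesics and parallel transport, but the structure of the kernel is unchanged.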

This symmetry is of particular importance for the definition of symmetric diffusion semigroups [89].

Let $p$ be the probability density function on $TM$ according to which we shall sample. Assume $p$ is bounded from both above and below (away from $0$):

 (4.5) $0 < p_{\min} \le p(x,v) \le p_{\max} < \infty \quad \text{for all } (x,v) \in TM.$

Define

 (4.6) $p_{\epsilon,\delta}(x,v) := \int_{TM} K_{\epsilon,\delta}(x,v;\, y,w)\, p(y,w)\, d\mu(y,w),$

where $d\mu$ is the standard volume form on $TM$. As in Appendix A, $d\mu(x,v)$ is a product of $dV_x(v)$ (the standard translation- and rotation-invariant Borel measure on $T_xM$) and $d\mathrm{vol}(x)$ (the standard Riemannian volume element on $M$). If we fix $x \in M$ and integrate along $T_xM$, then

 (4.7) $\bar{p}(x) := \int_{T_xM} p(x,v)\, dV_x(v)$

is a density function on $M$, since

 $\int_M \bar{p}(x)\, d\mathrm{vol}(x) = \int_M \int_{T_xM} p(x,v)\, dV_x(v)\, d\mathrm{vol}(x) = \int_{TM} p(x,v)\, d\mu(x,v) = 1.$

We call $\bar{p}$ the projection of $p$ on $M$. Furthermore, dividing $p$ by $\bar{p}$ yields a conditional probability density function on each $T_xM$,

 (4.8) $p(v \mid x) = \frac{p(x,v)}{\bar{p}(x)},$

since

 $\int_{T_xM} p(v \mid x)\, dV_x(v) = \frac{\int_{T_xM} p(x,v)\, dV_x(v)}{\bar{p}(x)} = \frac{\bar{p}(x)}{\bar{p}(x)} = 1.$

For any $f \in C^\infty(TM)$, we can define its average along fibres, with respect to the conditional probability density functions, as the following function on $M$:

 (4.9) $\bar{f}(x) = \int_{T_xM} f(x,v)\, p(v \mid x)\, dV_x(v), \qquad \forall x \in M.$

Finally, for any $\alpha \in [0,1]$, define the $\alpha$-normalized kernel

 (4.10) $K^{(\alpha)}_{\epsilon,\delta}(x,v;\, y,w) := \frac{K_{\epsilon,\delta}(x,v;\, y,w)}{p_{\epsilon,\delta}^{\alpha}(x,v)\, p_{\epsilon,\delta}^{\alpha}(y,w)}.$

We are now ready to define a family of hypoelliptic diffusion operators on $C^\infty(TM)$ as

 (4.11) $H^{\alpha}_{\epsilon,\delta} f(x,v) := \frac{\displaystyle\int_{TM} K^{(\alpha)}_{\epsilon,\delta}(x,v;\, y,w)\, f(y,w)\, p(y,w)\, d\mu(y,w)}{\displaystyle\int_{TM} K^{(\alpha)}_{\epsilon,\delta}(x,v;\, y,w)\, p(y,w)\, d\mu(y,w)}$

for any $f \in C^\infty(TM)$.

We are interested in the asymptotic behavior of $H^{\alpha}_{\epsilon,\delta}$ in the limit $\epsilon \to 0$, $\delta \to 0$. It turns out that this depends on the relative rate at which $\epsilon$ and $\delta$ approach $0$. For simplicity of notation, let us write $\gamma = \delta/\epsilon$.

###### Theorem 4.1 (HDM on Tangent Bundles).

Let $M$ be a closed Riemannian manifold, and let $H^{\alpha}_{\epsilon,\delta}$ be defined as in (4.11). If $\delta = O(\epsilon)$ as $\epsilon \to 0$, or equivalently if $\gamma = \delta/\epsilon$ is asymptotically bounded, then for any $f \in C^\infty(TM)$, as $\epsilon \to 0$ (and thus $\delta \to 0$),

 (4.12) $H^{\alpha}_{\epsilon,\delta} f(x,v) = f(x,v) + \epsilon\, \frac{m_{21}}{2 m_0} \bigg[ \frac{\Delta_H [f p^{1-\alpha}](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta_H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \bigg] + \delta\, \frac{m_{22}}{2 m_0} \bigg[ \frac{\Delta_V [f p^{1-\alpha}](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta_V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \bigg] + O(\epsilon^2),$

where $\Delta_H$ and $\Delta_V$ denote the horizontal and vertical Laplacians on $TM$ (see Appendix A), and $m_0$, $m_{21}$, $m_{22}$ are positive constants depending only on the kernel $K$.

Theorem 4.1 is the tangent bundle version of Theorem 4.5 (which applies to unit tangent bundles), and the proofs of these two theorems are essentially identical. We include a proof of Theorem 4.5 in Appendix B, from which a proof of Theorem 4.1 is easily adapted.

###### Proposition 4.2.

Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 4.1. If $\delta = \gamma\epsilon$, then for any $f \in C^\infty(TM)$ and sufficiently small $\epsilon > 0$,

 (4.13) $\lim_{\gamma \to \infty} H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = \bar{f}(x) + \epsilon\, \frac{m'_2}{2 m'_0} \Bigg[ \frac{\Delta_M [\bar{f}\, \bar{p}^{1-\alpha}](x)}{\bar{p}^{1-\alpha}(x)} - \bar{f}(x)\, \frac{\Delta_M \bar{p}^{1-\alpha}(x)}{\bar{p}^{1-\alpha}(x)} \Bigg] + O(\epsilon^2),$

where $\Delta_M$ is the Laplace–Beltrami operator on $M$, $\bar{p}$ is the projected density function on $M$, $\bar{f}$ is the average of $f$ along fibres, and $m'_0$, $m'_2$ are constants that only depend on the kernel function $K$.

###### Corollary 4.3.

Let $M$ and $H^{\alpha}_{\epsilon,\delta}$ be as in Theorem 4.1. If $\delta = \gamma\epsilon$ as $\epsilon \to 0$, $\delta \to 0$, then for any $f \in C^\infty(TM)$, in general

 $\lim_{\gamma \to \infty} \lim_{\delta \to 0} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon} \neq \lim_{\delta \to 0} \lim_{\gamma \to \infty} \frac{H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) - f(x,v)}{\epsilon},$

and thus an asymptotic expansion of $H^{\alpha}_{\epsilon,\delta} f$ for small $\epsilon, \delta$ is not well-defined. In fact, for each fixed $\gamma > 0$, as in (4.12),

 (4.14) $H^{\alpha}_{\epsilon,\gamma\epsilon} f(x,v) = f(x,v) + \epsilon\, \frac{m_{21}}{2 m_0} \bigg[ \frac{\Delta_H [f p^{1-\alpha}](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta_H p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \bigg] + \gamma\epsilon\, \frac{m_{22}}{2 m_0} \bigg[ \frac{\Delta_V [f p^{1-\alpha}](x,v)}{p^{1-\alpha}(x,v)} - f(x,v)\, \frac{\Delta_V p^{1-\alpha}(x,v)}{p^{1-\alpha}(x,v)} \bigg]$