1 Introduction
Data in Euclidean space often lie on (or near) a lower-dimensional submanifold $M$. For example, images with many pixels are high-dimensional, but image libraries are often locally parameterized by many fewer dimensions isomap . In chemistry, the conformation space of a molecule may be a manifold or a union of manifolds cyclo . In topological data analysis (TDA), one considers the following question: given a finite sample $X$ of points (a point cloud) that lies on or near $M$, what can one infer about the topology (i.e., “global structure”) of $M$? TDA has been used to study the global structure of data sets in a variety of fields (see, e.g., cortical ; topaz ; materials ). Researchers have also made significant progress towards using the geometric properties of the manifold for dimensionality reduction and data visualization isomap ; laplacian_eigmap ; hessian_eigenmap ; LLE .
We focus on inferring the homology of $M$. Homology is a quantitative way of characterizing the topology of $M$. For example, the rank of the $0$-dimensional homology $H_0(M)$ is the number of connected components, and the rank of $H_k(M)$ for $k \geq 1$ is the number of $k$-dimensional holes in $M$. If $M$ is compact and orientable, then the dimension of $M$ is equal to the largest $k$ such that $H_k(M)$ is nontrivial. For example, if $M$ is the torus $T^2$, then there is one connected component, there are two $1$-dimensional holes, and there is one $2$-dimensional hole. Although homology does not uniquely identify a manifold, it provides useful information about a manifold’s global structure, and the homology of a manifold can be used to distinguish it from other manifolds that have different homology.
Methods from persistent homology (PH) can be used to infer the homology of $M$ from a point cloud $X$ that is sampled from $M$. To approximate the manifold, we construct a filtered complex, a combinatorial description of a topological space (see Definition 1). One of the classical approaches to building a filtered complex is the Čech complex $\check{C}(X)$. At each point $x_i \in X$, one places a ball $B(x_i, r)$ of radius $r$, where $r$ is the filtration level. A simplex with vertices $x_{i_0}, \ldots, x_{i_k}$ is added to $\check{C}_r(X)$ if the intersection $\bigcap_{j} B(x_{i_j}, r)$ is nonempty. The Nerve Theorem guarantees that $\check{C}_r(X)$ is homotopy-equivalent to $\bigcup_i B(x_i, r)$. The PH of $\check{C}(X)$ records how the homology of $\check{C}_r(X)$ changes as $r$ increases. As $r$ grows, new homology classes (which represent $k$-dimensional holes) are “born” and old homology classes “die”.
Conventional wisdom holds that the homology classes with the longest lifetimes are true topological features of and that the homology classes with the shortest lifetimes are noise. However, one can easily observe that this is not always true, even in simple examples such as Figure 1, in which the point cloud is sampled from the disjoint union of two circles of different sizes. The smaller circle represents a homology class that has a much shorter lifetime than the homology class for the larger circle, but both homology classes are true topological features. We visualize this in Figure 1, in which the balls of the Čech complex fill in the smaller circle much earlier than they fill in the larger circle.
Following the conventional wisdom, the homology class for the smaller circle might be recorded spuriously as noise. Problems with the conventional wisdom have been noted in many other papers, such as roadmap ; pers_images ; bendich2016 ; stolz2017 ; Bubenik2020 ; feng2021 .
In general, standard distance-based filtered complexes (such as the Čech complex) depend largely on “topological feature sizes,” by which we mean the following concept, introduced in feature_size . The medial axis of a submanifold $M$ in $\mathbb{R}^m$ is the closure of
$$\{x \in \mathbb{R}^m : \text{there exist } p \neq q \text{ in } M \text{ such that } \|x - p\| = \|x - q\| = \inf_{y \in M} \|x - y\|\}.$$
The local feature size at $p \in M$, denoted by $\mathrm{LFS}(p)$, is the distance from $p$ to the medial axis. The condition number of $M$ is equal to $1/\tau$, where $\tau = \inf_{p \in M} \mathrm{LFS}(p)$. For example, if $M$ is an $n$-sphere in $\mathbb{R}^{n+1}$, then the medial axis is the center of the sphere and the local feature size at any point on the sphere is the radius of the sphere. Niyogi et al. showed that the Čech complex at filtration level $r$ is homotopy-equivalent to $M$ when $r$ is sufficiently small relative to $\tau$ and $X$ is sufficiently dense in $M$ weinberger . However, whenever $\tau$ is small, the Čech complex may only be homotopy-equivalent to $M$ for a very small range of filtration values $r$, even as the number of points sampled from the manifold approaches infinity.
Standard distance-based filtered complexes may perform especially poorly when $M$ contains features of different sizes, even if the smallest features have “high resolution” in the point cloud (i.e., the density of points is inversely proportional to the local feature size). For example, consider again the point cloud in Figure 1(a), sampled from the disjoint union of two circles $C_1$ and $C_2$ of different radii. (We choose one of the two circles with fixed probabilities and then sample uniformly at random from the chosen circle.) The product of the probability density function and the local feature size is a constant function; in that sense, the two circles have equally high resolution. However, the corresponding homology classes do not have equally high persistence in the PH of standard filtered complexes.
The dependence on topological feature size arises because persistent homology is not a topological invariant. The topology of a manifold is invariant under homeomorphism, but standard distance-based filtered complexes (such as the Čech complex) are not invariant under homeomorphism. More precisely, suppose $\phi : M \to M'$ is a homeomorphism of manifolds and $X$ is a point cloud in $M$. The manifolds $M$ and $M'$ are homeomorphic, but $\check{C}(X)$ and $\check{C}(\phi(X))$ are not necessarily isomorphic (see Definition 9). Indeed, the bottleneck distance between the persistence diagrams (see Section 2.2) for $\check{C}(X)$ and $\check{C}(\phi(X))$ can be arbitrarily large. (Footnote 1: For example, consider the scaling homeomorphism defined by $\phi(x) = cx$ for some $c > 0$. For any point cloud $X$ with more than one point, the bottleneck distance between the persistence diagrams for $\check{C}(X)$ and $\check{C}(\phi(X))$ approaches infinity as $c \to \infty$.) Therefore, the standard Čech complex depends on geometric properties such as size. Standard distance-based filtered complexes are closer to geometric tools than topological tools.
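The scaling example can be made explicit with a short derivation. This is only a sketch under stated assumptions: $X \subset \mathbb{R}^m$ has more than one point, $cX$ denotes the scaled point cloud, $d$ is a finite positive death time of a $0$-dimensional class of $\check{C}(X)$, and $d_{\max}$ is the largest finite death time in the persistence diagram of $\check{C}(X)$ (these symbols are introduced here for illustration).

```latex
% Scaling all points by c > 0 scales all pairwise distances by c, so
\[
  \bigcap_{j} B(x_{i_j}, r) \neq \emptyset
  \iff
  \bigcap_{j} B(c\,x_{i_j}, c\,r) \neq \emptyset ,
  \qquad\text{hence}\qquad
  \check{C}_{cr}(cX) = \check{C}_{r}(X).
\]
% Every diagram point (b, d) for X therefore becomes (cb, cd) for cX.
% In any bijection between the two diagrams, the point (0, cd) is matched
% either to a diagonal point, at L-infinity cost at least cd/2, or to a
% finite off-diagonal point (b', d') with d' <= d_max, at cost at least
% cd - d_max (matching it to a point at infinity has infinite cost). So
\[
  d_B \;\geq\; \min\!\left( \tfrac{c\,d}{2},\; c\,d - d_{\max} \right)
  \;\longrightarrow\; \infty
  \quad\text{as } c \to \infty .
\]
```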
1.1 Contributions
We work in a probabilistic setting. We suppose that $(M, g)$ is an $n$-dimensional Riemannian manifold and that the point cloud $X$ consists of $N$ points sampled from a smooth probability density function $f : M \to (0, \infty)$. It is important that $f$ is nonzero everywhere because we cannot observe regions of the manifold where $f$ equals zero. The Riemannian metric $g$ is necessary because it turns the manifold into a metric space and induces a volume form $dV$. We define the probability measure to be $P[A] = \int_A f \, dV$, where $A \subseteq M$ is a Borel set Rman_stats . We note that all manifolds can be endowed with a Riemannian metric (see Section 2.3), so the requirement of a Riemannian metric is not a restriction on the types of manifolds we can study.
We construct a family of “density-scaled filtered complexes” by modifying the metric such that we effectively shrink the distances between points in sparse regions of the manifold and enlarge the distances between points in dense regions of the manifold. To do this, we define a conformally equivalent metric $\tilde{g}$ that depends on a scaling factor, which we define in Section 3.1. Our scaling factor plays an important role in the convergence property that we prove in Section 4 and discuss below. The metric $\tilde{g}$ is defined such that the points in $X$ are uniformly distributed with respect to the volume form $d\tilde{V}$ in $(M, \tilde{g})$ and such that the balls grow at a slower rate when $N$ is larger. We can then apply any existing distance-based filtered complex (such as the Čech complex) in the density-scaled Riemannian manifold $(M, \tilde{g})$.
We show that our density-scaled filtered complexes have two important properties that other filtered complexes do not have:

Convergence: As $N \to \infty$, the interval of filtration values for which the density-scaled Čech complex is homotopy-equivalent to the manifold grows arbitrarily large in probability, no matter the condition number of $M$ or any other geometric property of $M$. (We make this statement precise in Theorem 4.1.) This means that in the PH of the density-scaled Čech complex, one can interpret the homology classes with the smallest birth times and longest lifetimes as the most important features.

Conformal invariance: We show that our density-scaled filtered complexes are invariant under conformal transformations (Theorem 5.1). This means that in contrast to standard complexes, our density-scaled complexes are closer to topological tools and do not depend as much on local feature sizes.
These properties improve our ability to infer the homology of from a point cloud and make it easier to compare the PH of point clouds sampled from different manifolds of possibly different scales.
We implement a filtered complex that approximates the density-scaled Vietoris–Rips complex. We do this by estimating the density $f$ via kernel-density estimation and estimating Riemannian distances in a similar way as the widely used Isomap algorithm isomap . The implementation requires knowledge of the intrinsic dimension $n$ of the manifold, which can be estimated using methods such as local principal component analysis local_PCA ; intrinsic_dim , the conical dimension estimator conical_dimestimate , the ball expansion rate ball_dimestimate , or the doubling dimension doubling_dimestimate . We prove that our implementation is stable (Theorem 7.4): under suitable conditions that are almost surely satisfied, small perturbations of the input point cloud result only in small changes to the persistence diagram of the implemented complex. Consequently, it is still reasonable to use our implementation even when the point cloud does not lie exactly on the manifold or when there is a small amount of noise in the data. The implementation is designed to handle outliers in the data; in Section 6.2 we discuss how this is done, and in Section 8.3 we test the empirical performance of our implementation on a point cloud with outliers. As applications, we use our implementation to count the number of clusters in a point cloud whose clusters have different densities (Section 8.4) and the number of equilibrium points in the Lorenz dynamical system from a time-delay embedding (Section 8.5).
1.2 Related Work
Perhaps the most common TDA-based approach to nonuniform data is the $k$-nearest neighbor (KNN) filtration (see Appendix 9.1.1). This is related to the density-scaled filtrations by the fact that the distance from a point $x$ to its $k$th nearest neighbor is governed by the density at $x$ as $N \to \infty$, for a choice of $k$ that depends on $N$. (See knn_density for a precise statement.) However, the KNN filtration encounters problems when there are regions of the manifold that are close in Euclidean distance but far in Riemannian distance, especially if those regions differ in density. We discuss one example in Section 8.4; several other examples of KNN failures are given in continuous_knn . In continuous_knn , Berry and Sauer constructed a modification of the $k$-nearest neighbors graph (the continuous $k$-nearest neighbors graph) whose unnormalized graph Laplacian converges to the Laplace–Beltrami operator of a slightly different density-scaled Riemannian manifold. (Their density-scaled metric is a different rescaling of the original metric $g$.) The authors of continuous_knn proved that the connected components of their graph were consistent with the components of the manifold. They left as a conjecture the hypothesis that their graph was topologically consistent (i.e., that the $k$-dimensional homology of the clique complex of their graph converges to the $k$-dimensional homology of the manifold for $k \geq 1$).
A qualitatively different family of density-scaled metrics was considered in fermat_tda . The density-scaled metric in fermat_tda depends on a parameter, and the Riemannian distance that it induces is called the Fermat distance fermat . The Fermat distance effectively enlarges the distances between points in sparse regions of the manifold and shrinks the distances between points in dense regions of the manifold; by contrast, the density-scaled metric in the present paper does the opposite.
The density-scaled complexes in the present paper are also reminiscent of weighted complexes weightedPH . (See Appendix 9.1.2 for a review of weighted complexes.) In a weighted Čech complex, the radius of a ball is a function of the filtration parameter and the point at which the ball is centered. (Footnote 2: The radius function need not depend on density; more typically, the weight is determined by some intrinsic property of the point. For example, in weightedPH , a point cloud that represented the positions of image pixels had weights that were given by pixel intensity.) Weighted Vietoris–Rips complexes are defined analogously. One can define a “density-weighted” radius function
(1) 
from which one can define a density-weighted Čech complex and a density-weighted Vietoris–Rips complex. The main advantage of our density-scaled complexes over the density-weighted complexes is that our complexes are more robust with respect to noise. Specifically, if $x$ is an outlier in a low-density region, then the ball centered at $x$ grows quickly in radius and may engulf balls in high-density regions. This problem can occur even if all the data points lie exactly on the manifold $M$. If $A \subseteq M$ is a high-density region, then balls centered at points $x \in A$ grow quickly in radius and may engulf points in low-density regions of $M$. In Sections 8.3 and 8.4, we calculate examples and discuss these problems in more detail.
Other density-based filtrations, such as the distance-to-measure (DTM) sublevel filtration dtm_tda and the density sublevel filtration kde_sublevel , are primarily designed for the purpose of noise filtering. Such methods assume that the regions of highest density are the true features of the manifold. For example, consider the point cloud of Figure 1(a) again. In these other density-based filtrations, it is the smaller circle whose corresponding homology class has a much longer lifetime in the persistent homology. In our density-scaled filtration, the two circles have equal lifetimes in the persistent homology, which reflects the fact that they have equally high “resolution” in the point cloud.
1.3 Organization
The rest of the paper is organized as follows. In Section 2, we review background from TDA and Riemannian geometry. In Section 3, we introduce our family of density-scaled filtered complexes, including definitions for a density-scaled Čech complex (DČ) and a density-scaled Vietoris–Rips complex (DVR). We discuss convergence properties in Section 4 and invariance properties in Section 5. In Section 6, we discuss our algorithm for the implementation of a filtered complex that approximates the density-scaled Vietoris–Rips complex. We prove the stability of our density-scaled complexes (including a stability theorem for our implementation) in Section 7. In Section 8, we compute examples and compare to other filtered complexes. Finally, in Section 9, we conclude and discuss some avenues for future research. The code used in this paper is available at https://bitbucket.org/ahickok/dvr/src/main/.
2 Background
2.1 Filtered Complexes
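A comprehensive introduction to filtered complexes and TDA can be found in edel_book ; eat . Here we review the standard methods for building a filtered complex. Throughout this section, let $(M, d)$ denote a metric space and let $X = \{x_i\}$ denote a point cloud in $M$. For any index set $J$, let $\sigma_J$ denote the simplex with vertices $x_j$ for all $j \in J$.
Definition 1
A filtered complex is a collection $\{K_r\}_{r \in \mathbb{R}}$ of simplicial complexes such that $K_r \subseteq K_s$ for all $r \leq s$. We refer to $r$ as the filtration level.
Definition 2
The Čech complex $\check{C}(X)$ is the filtered complex such that the set of simplices in $\check{C}(X)$ at filtration level $r$ is
$$\check{C}_r(X) = \Big\{ \sigma_J : \bigcap_{j \in J} B(x_j, r) \neq \emptyset \Big\}.$$
Equivalently, $\check{C}_r(X)$ is the nerve of $\{B(x_i, r)\}_i$, where $B(x_i, r)$ is the ball of radius $r$ centered at $x_i$.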
The Nerve Theorem provides theoretical guarantees for the Čech complex Borsuk .
Theorem 2.1 (Nerve Theorem)
If $\bigcap_{j \in J} B(x_j, r)$ is either contractible or empty for all index sets $J$, then $\check{C}_r(X)$ is homotopy-equivalent to $\bigcup_i B(x_i, r)$.
In Euclidean space, all balls are convex (hence their intersections are contractible), and thus the Čech complex at filtration level $r$ is homotopy-equivalent to $\bigcup_i B(x_i, r)$. In an arbitrary metric space, however, balls are not always convex. In a Riemannian manifold, $B(x, r)$ is contractible only when $r$ is sufficiently small.
Computing the Čech complex is computationally intensive. In practice, researchers often compute the Vietoris–Rips complex instead, which requires only pairwise distances between the points.
Definition 3
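The Vietoris–Rips complex $\mathrm{VR}(X)$ is the filtered complex such that the set of simplices in $\mathrm{VR}(X)$ at filtration level $r$ is
$$\mathrm{VR}_r(X) = \{ \sigma_J : d(x_i, x_j) \leq 2r \text{ for all } i, j \in J \},$$
where $\sigma_J$ denotes the simplex with vertices $\{x_j\}_{j \in J}$.
The Vietoris–Rips complex and the Čech complex share the same 1-skeleton. When the metric space is Euclidean space, the Vietoris–Rips complex and the Čech complex are related by the Vietoris–Rips lemma edel_book , which says that
$$\check{C}_r(X) \subseteq \mathrm{VR}_r(X) \subseteq \check{C}_{\sqrt{2}\,r}(X)$$
for all filtration values $r$. In addition to the Čech and Vietoris–Rips complexes, there are many other methods for constructing a filtered complex from a point cloud. We review other relevant filtered complexes in Appendix 9.1.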
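Because the Vietoris–Rips complex needs only pairwise distances, it is easy to sketch in code. The following is a minimal illustration, not the paper's implementation; it assumes the convention that a simplex enters at level $r$ once all pairwise distances between its vertices are at most $2r$, and the function name `vietoris_rips` is hypothetical.

```python
from itertools import combinations
import math


def vietoris_rips(points, max_dim=2):
    """Return (simplex, filtration value) pairs for a Vietoris-Rips filtration.

    Assumed convention: a simplex appears at level r once every pairwise
    distance between its vertices is at most 2r, so its filtration value
    is half of its diameter. Vertices appear at level 0.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    simplices = []
    for size in range(1, max_dim + 2):  # size = number of vertices
        for simplex in combinations(range(n), size):
            diameter = max(
                (dist[i][j] for i, j in combinations(simplex, 2)),
                default=0.0,
            )
            simplices.append((simplex, diameter / 2))
    # Sort by filtration value, then by dimension, so faces precede cofaces.
    simplices.sort(key=lambda s: (s[1], len(s[0]), s[0]))
    return simplices


# Four corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
filtration = dict(vietoris_rips(square, max_dim=2))
```

In this example the four sides of the square enter at level $1/2$, while the diagonals and all triangles enter at level $\sqrt{2}/2$. The brute-force enumeration is exponential in `max_dim` and is meant only for small point clouds.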
2.2 Persistence Modules
In this section, we define persistence modules, persistent homology, and persistence diagrams. We assume the reader is familiar with homology. (A good introduction to homology and algebraic topology is hatcher .) References for the rest of this subsection can be found in fundthm ; pers_modules .
A persistence module over $\mathbb{R}$ is a collection of vector spaces $\{V_r\}_{r \in \mathbb{R}}$ with linear maps $\phi_r^s : V_r \to V_s$ for $r \leq s$ that satisfy the composition law $\phi_s^t \circ \phi_r^s = \phi_r^t$ for all $r \leq s \leq t$. If $K$ is a filtered complex, the persistent homology of $K$ over a field $F$ is the persistence module $\{H_*(K_r; F)\}_{r \in \mathbb{R}}$, which we denote by $PH(K; F)$. For all $r \leq s$, the inclusion $K_r \hookrightarrow K_s$ induces a linear map $H_*(K_r; F) \to H_*(K_s; F)$. We sometimes drop the field from our notation when a fixed field is chosen. (All calculations in Section 8 are done with the default coefficient field used by the GUDHI software package.) As $r$ increases, new homology classes are born and old homology classes die.
The Fundamental Theorem of Persistent Homology, stated below, shows that we can decompose the persistence module in a way that yields a nice set of generators. If $K_r$ has a finite number of simplices for all $r$ (this condition holds for the Čech complex and the Vietoris–Rips complex), then there is a sequence $r_0 < r_1 < \cdots < r_m$ such that $K_r = K_{r_i}$ for all $r \in [r_i, r_{i+1})$. The direct sum $\bigoplus_i H_*(K_{r_i}; F)$ has the structure of a graded module over the graded ring $F[x]$. The action of $x$ on a homogeneous element $v \in H_*(K_{r_i}; F)$ is $x \cdot v = \phi_{r_i}^{r_{i+1}}(v)$.
Theorem 2.2 (Fundamental Theorem of Persistent Homology fundthm )
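The graded module $\bigoplus_i H_*(K_{r_i}; F)$ is isomorphic to
$$\Big( \bigoplus_{i} \Sigma^{a_i} F[x] \Big) \oplus \Big( \bigoplus_{j} \Sigma^{b_j} F[x]/(x^{c_j}) \Big) \qquad (2)$$
for some integers $\{a_i\}$, $\{b_j\}$, $\{c_j\}$, where $\Sigma^{a}$ denotes an $a$-shift upward in grading for any integer $a$.
A $\Sigma^{a_i} F[x]$ summand corresponds to a homology class that is born at filtration level $r_{a_i}$ and never dies. A $\Sigma^{b_j} F[x]/(x^{c_j})$ summand corresponds to a homology class that is born at filtration level $r_{b_j}$ and dies at filtration level $r_{b_j + c_j}$. The information in a persistence module can be summarized by a persistence diagram, which is a multiset of points in the extended plane $\overline{\mathbb{R}}^2$. Given a decomposition in the form of Equation 2, the persistence diagram includes the points $(r_{a_i}, \infty)$ for all $i$, the points $(r_{b_j}, r_{b_j + c_j})$ for all $j$, and all points on the diagonal. The points on the diagonal are included for technical reasons; one can think of them as homology classes that die instantaneously. We denote the persistence diagram of a persistence module $V$ by $\mathrm{dgm}(V)$. The bottleneck distance between two diagrams $D_1$, $D_2$ is defined to be
$$d_B(D_1, D_2) = \inf_{\gamma} \sup_{p \in D_1} \| p - \gamma(p) \|_\infty,$$
where the infimum is taken over all bijections $\gamma : D_1 \to D_2$.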
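For very small diagrams, the bottleneck distance can be computed by brute force, which makes the role of the diagonal points concrete. The sketch below is illustrative only: it assumes that both diagrams contain only finite off-diagonal points, it uses the $L^\infty$ norm, and the function names are hypothetical. Practical software uses far more efficient matching algorithms.

```python
from itertools import permutations


def linf(p, q):
    """L-infinity distance between two points in the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def diagonal_projection(p):
    """Closest point on the diagonal to p in the L-infinity norm."""
    mid = (p[0] + p[1]) / 2
    return (mid, mid)


def bottleneck_bruteforce(diagram1, diagram2):
    """Bottleneck distance between two small diagrams of finite points.

    Each diagram is augmented with the diagonal projections of the other
    diagram's points, so that a bijection always exists. Matching two
    diagonal points costs nothing.
    """
    a = [(p, False) for p in diagram1] + [(diagonal_projection(q), True) for q in diagram2]
    b = [(q, False) for q in diagram2] + [(diagonal_projection(p), True) for p in diagram1]

    def cost(u, v):
        if u[1] and v[1]:
            return 0.0
        return linf(u[0], v[0])

    best = float("inf")
    for perm in permutations(range(len(b))):  # factorial time: tiny inputs only
        worst = max((cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best
```

For example, the diagrams $\{(0, 4)\}$ and $\{(0, 5)\}$ are at bottleneck distance $1$ (match the two points directly), whereas $\{(0, 1)\}$ and $\{(10, 12)\}$ are also at distance $1$ because it is cheaper to match each point to the diagonal.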
2.3 Riemannian Geometry
We briefly review the necessary background from Riemannian geometry. For further reading, we recommend a textbook such as petersen . A Riemannian manifold $(M, g)$ is a smooth manifold $M$ with a Riemannian metric $g$ that defines a smoothly varying inner product on each tangent space $T_xM$. More precisely, $g$ is a 2-tensor field on $M$; to each $x \in M$, the Riemannian metric assigns a bilinear map $g_x : T_xM \times T_xM \to \mathbb{R}$ on the tangent space $T_xM$. A Riemannian metric allows one to define the length of a vector $v \in T_xM$ to be $\|v\| = \sqrt{g_x(v, v)}$. The length of a continuously differentiable path $\gamma : [a, b] \to M$ is defined to be $L(\gamma) = \int_a^b \|\gamma'(t)\| \, dt$.
A Riemannian manifold is a metric space. The distance between two points $x$, $y$ in the same connected component of $M$ is
$$d(x, y) = \inf \{ L(\gamma) : \gamma \text{ is a continuously differentiable path from } x \text{ to } y \}.$$
If $M$ is complete, then the infimum is achieved by a geodesic, a curve that locally minimizes length. If $x$ and $y$ are in different connected components, then their distance is infinite.
To see that all manifolds can be given a Riemannian metric, recall that all manifolds can be embedded into Euclidean space. Let $\iota : M \to \mathbb{R}^m$ be an embedding. The canonical Euclidean metric $g_E$ pulls back to a Riemannian metric $\iota^* g_E$ on $M$. We call $\iota^* g_E$ the Euclidean-induced Riemannian metric. On each tangent space $T_xM$, the metric is the restriction of $g_E$ to $T_xM$. A Riemannian metric induces a volume form $dV$, the unique $n$-form on $M$ that equals $1$ on all positively oriented orthonormal bases. In local coordinates $(x^1, \ldots, x^n)$, the expression for the volume form is
$$dV = \sqrt{|\det g|} \; dx^1 \wedge \cdots \wedge dx^n.$$
With a volume form and a smooth probability density function $f$, one can define a probability measure on the manifold. A good reference for probability and statistics on Riemannian manifolds is Rman_stats . The volume form induces a Riemannian measure $\mu$ on $M$. The measure of a Borel set $A$ is $\mu(A) = \int_A dV$, and the volume of $M$ is $\mathrm{vol}(M) = \mu(M)$. The probability measure $P$ is defined to be
$$P[A] = \int_A f \, dV$$
for Borel sets $A \subseteq M$.
Two Riemannian metrics $g$, $\tilde{g}$ on $M$ are conformally equivalent if there is a positive function $\lambda : M \to (0, \infty)$ such that
$$\tilde{g} = \lambda g.$$
A conformal transformation is a diffeomorphism $F : (M, g) \to (M', g')$ such that $g'$ pulls back to $\lambda g$ (i.e., $F^* g' = \lambda g$) for some positive function $\lambda$. Conformal transformations preserve angles; one can think of a conformal transformation as a transformation that “locally scales” the manifold. For example, if $M$ is a submanifold of $\mathbb{R}^m$ and has the Euclidean-induced Riemannian metric, then any global scaling is a conformal transformation.
A special type of conformal transformation is an isometry. An isometry of Riemannian manifolds is a diffeomorphism $F : (M, g) \to (M', g')$ such that $g'$ pulls back to $g$ (i.e., $F^* g' = g$). An isometry of Riemannian manifolds is an isometry of metric spaces in the usual sense (i.e., $d'(F(x), F(y)) = d(x, y)$).
3 Our Family of DensityScaled Filtered Complexes
3.1 Our DensityScaled Riemannian Manifold
Let $(M, g)$ be an $n$-dimensional Riemannian manifold from which we sample $N$ points according to a smooth probability density function $f$. We begin by defining a conformally equivalent Riemannian metric $\tilde{g}$ such that the points are uniformly distributed in $(M, \tilde{g})$.
Definition 4
In this paper, we set
which satisfies Equation 4. However, the convergence properties of Section 4 hold for any choice of scaling factor that satisfies the conditions of Equation 4, and the invariance and stability results in Sections 5 and 7 hold for any choice of strictly positive function.
The uniform probability measure on is for all Borel sets , where is the volume form on and is the volume of . Using local coordinates, we see that satisfies
Therefore because
This means that sampling points from with probability density function is equivalent to sampling points uniformly at random from .
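The uniformity claim can be checked with a short computation. The following is a sketch under an assumption: we write the conformal metric as $\tilde{g} = c^2 f^{2/n} g$, where the constant $c > 0$ is a hypothetical stand-in for the paper's scaling factor (whose exact form is given in Equation 3).

```latex
% In local coordinates, \tilde{g}_{ij} = c^2 f^{2/n} g_{ij}, and g is n x n, so
\[
  \det \tilde{g} = c^{2n} f^{2} \det g
  \quad\Longrightarrow\quad
  d\tilde{V} = \sqrt{|\det \tilde{g}|}\; dx^1 \wedge \cdots \wedge dx^n
             = c^{n} f \, dV .
\]
% Because f is a probability density with respect to dV,
\[
  \mathrm{vol}(M, \tilde{g}) = \int_M d\tilde{V} = c^{n} \int_M f \, dV = c^{n},
\]
% and the uniform probability measure on (M, \tilde{g}) is therefore
\[
  \tilde{P}[A]
  = \frac{1}{\mathrm{vol}(M, \tilde{g})} \int_A d\tilde{V}
  = \int_A f \, dV
  = P[A]
  \qquad \text{for all Borel sets } A \subseteq M .
\]
```

The constant $c$ cancels in the last step, which is why the uniformity property holds regardless of how the scaling factor depends on the number of points.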
3.2 Our Definition of a DensityScaled Filtered Complex
Definition 5
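Let $(M, g)$ be a Riemannian manifold, and let $X$ be a point cloud that consists of $N$ points sampled from a smooth probability density function $f$. The density-scaled Čech complex is the filtered complex
$$D\check{C}(X) := \check{C}(X, (M, \tilde{d})),$$
where $\tilde{d}$ is the Riemannian distance function in $(M, \tilde{g})$ and $\tilde{g}$ is defined as in Equation 3. Equivalently, the set of simplices in $D\check{C}(X)$ at filtration level $r$ is the set of simplices with vertices $\{x_j\}_{j \in J}$ such that
$$\bigcap_{j \in J} \tilde{B}(x_j, r) \neq \emptyset,$$
where $\tilde{B}(x, r)$ is the ball of radius $r$ centered at $x$ with respect to $\tilde{d}$.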
Definition 6
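Let $(M, g)$ be a Riemannian manifold, and let $X$ be a point cloud that consists of $N$ points sampled from a smooth probability density function $f$. The density-scaled Vietoris–Rips complex is the filtered complex
$$DVR(X) := \mathrm{VR}(X, (M, \tilde{d})),$$
where $\tilde{d}$ is the Riemannian distance function in $(M, \tilde{g})$ and $\tilde{g}$ is defined as in Equation 3. Equivalently, the set of simplices in $DVR(X)$ at filtration level $r$ is the set of simplices with vertices $\{x_j\}_{j \in J}$ such that $\tilde{d}(x_i, x_j) \leq 2r$ for all $i, j \in J$.
More generally, one can define a density-scaled version of any distance-based filtered complex by applying the filtered complex to the point cloud $X$ in the metric space $(M, \tilde{d})$, where $\tilde{d}$ is the Riemannian distance function in the density-scaled manifold $(M, \tilde{g})$.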
Definition 7
Let $(M, g)$ be a Riemannian manifold, and let $X$ be a point cloud that consists of $N$ points sampled from a smooth probability density function $f$. If $K(X, (M, d))$ is a distance-based filtered complex, where $(M, d)$ denotes a metric space, then the density-scaled filtered complex is
$$DK(X) := K(X, (M, \tilde{d})),$$
where $\tilde{d}$ is the Riemannian distance function in $(M, \tilde{g})$ and $\tilde{g}$ is defined as in Equation 3.
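In practice, the density $f$ and the distance function $\tilde{d}$ are unknown and must be estimated from the point cloud. The sketch below shows one rough way to approximate density-scaled distances, in the spirit of the kernel-density and Isomap-style estimates mentioned in Section 1.1. It is not the paper's algorithm: the function names, the Gaussian kernel, the averaging of the endpoint scale factors along each edge, and the omission of any $N$-dependent scaling factor are all assumptions made for illustration.

```python
import heapq
import math


def kde_at_samples(points, intrinsic_dim, bandwidth):
    """Gaussian kernel density estimate evaluated at each sample point."""
    norm = len(points) * (math.sqrt(2.0 * math.pi) * bandwidth) ** intrinsic_dim
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2)) for q in points) / norm
        for p in points
    ]


def density_scaled_distances(points, intrinsic_dim, k, bandwidth):
    """Approximate density-scaled geodesic distances between all pairs of points."""
    n_pts = len(points)
    density = kde_at_samples(points, intrinsic_dim, bandwidth)
    # Conformal rescaling of lengths by f^(1/n): distances grow where the
    # density is high. Any N-dependent constant factor is omitted (assumption).
    scale = [d ** (1.0 / intrinsic_dim) for d in density]

    # Symmetrized k-nearest-neighbor graph with density-scaled edge lengths.
    graph = [dict() for _ in range(n_pts)]
    for i in range(n_pts):
        others = sorted(
            (j for j in range(n_pts) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            weight = math.dist(points[i], points[j]) * (scale[i] + scale[j]) / 2.0
            graph[i][j] = weight
            graph[j][i] = weight

    # Dijkstra's algorithm from every source gives graph shortest-path distances.
    dist = [[math.inf] * n_pts for _ in range(n_pts)]
    for source in range(n_pts):
        dist[source][source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[source][u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < dist[source][v]:
                    dist[source][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist


# Twelve evenly spaced points on the unit circle (intrinsic dimension 1).
circle = [(math.cos(2 * math.pi * i / 12), math.sin(2 * math.pi * i / 12)) for i in range(12)]
D = density_scaled_distances(circle, intrinsic_dim=1, k=2, bandwidth=0.5)
```

The resulting matrix `D` could then be passed to any routine that builds a Vietoris–Rips complex from pairwise distances. Pairs of points in different components of the neighbor graph are left at infinite distance.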
4 Convergence Properties of the DensityScaled Čech Complex
In Theorem 4.1 below, we show that the density-scaled Čech complex is homotopy-equivalent to $M$ for an interval of filtration values that grows arbitrarily large in probability as $N \to \infty$. We begin by reviewing the relevant concepts. The convexity radius of a Riemannian manifold is
where and where is the Riemannian distance function in . If , the ball is geodesically convex (hence contractible). Furthermore, the intersection of geodesically convex balls is geodesically convex (hence contractible or empty). Let denote the densityscaled Riemannian metric when there are points, and let denote the convexity radius of . The coverage radius of a point cloud in a Riemannian manifold is
Let denote the coverage radius of a point cloud in .
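To build intuition for the coverage radius, consider the simplest possible case: points on the unit circle with arc-length distance and no density scaling. There, the smallest radius at which geodesic balls centered at the points cover the circle is half of the largest gap between consecutive points. The sketch below is only an illustration under these assumptions, and the function name is hypothetical.

```python
import math
import random


def coverage_radius_on_circle(angles):
    """Coverage radius, in arc length, of points on the unit circle.

    The points are given by their angles in radians. The balls of radius r
    centered at the points cover the circle exactly when r is at least half
    of the largest gap between consecutive points.
    """
    a = sorted(t % (2 * math.pi) for t in angles)
    gaps = [right - left for left, right in zip(a, a[1:])]
    gaps.append(a[0] + 2 * math.pi - a[-1])  # wrap-around gap
    return max(gaps) / 2


# The coverage radius of a uniform random sample shrinks as the sample grows.
rng = random.Random(0)
for n in (10, 100, 1000):
    sample = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    print(n, coverage_radius_on_circle(sample))
```

Four equally spaced points have coverage radius $\pi/4$, and a single point has coverage radius $\pi$.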
Theorem 4.1
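Let $(M, g)$ be a Riemannian manifold, and let $X$ be a point cloud that consists of $N$ points sampled from a smooth probability density function $f$. If the filtration level $r$ lies between the coverage radius of $X$ in $(M, \tilde{g}_N)$ and the convexity radius of $(M, \tilde{g}_N)$, then the density-scaled Čech complex at filtration level $r$ is homotopy-equivalent to $M$. If $M$ is compact, then the convexity radius of $(M, \tilde{g}_N)$ tends to infinity as $N \to \infty$. If $M$ is compact and connected, then the coverage radius of $X$ in $(M, \tilde{g}_N)$ tends to $0$ in probability as $N \to \infty$.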
Proof
Lemma 1
If is compact, then as .
Proof
The convexity radius of a compact manifold is positive (see, e.g., Proposition 20 in convexity_radius ). Therefore, , so because .
Now we turn to the coverage radius. The behavior of the coverage radius is controlled by the filling factor. On an $n$-dimensional Riemannian manifold from which $N$ balls of radius $r$ are chosen uniformly at random, the filling factor is
(5) 
where is the volume of a Euclidean unit ball. For small , the filling factor approximates the number of points inside a ball of radius . Let be the number of balls of radius required to cover , assuming the balls are chosen uniformly at random. Let be the volume of a Euclidean ball of radius . Define
Theorem 4.2 (Theorem 1.1 in flatto_newman )
Let be a compact, connected Riemannian manifold with unit volume. There are constants and , which do not depend on , such that if , then
Corollary 1
Let be a compact, connected Riemannian manifold. Suppose is a point cloud that consists of points sampled uniformly at random from . Suppose is a sequence such that and .

If is such that , then as .

If is such that , then as .
Proof
Case 1:
In this case, the structure of our proof is similar to that of Corollary B.2 in vanishing_homology . First, we observe that the radius of the balls can be expressed by
If is sufficiently large and , then . Let .
Case 2:
Let be the Riemannian manifold that is normalized to have unit volume. Let denote the filling factor for and let be the coverage radius for the point cloud in . In , the Riemannian distance function is . Therefore, for any , we have
When the radius of the balls in is , the filling factor in is
Applying Case 1 to completes the proof.
This shows that on a compact, connected Riemannian manifold from which points are sampled uniformly at random, there is a threshold filling factor
(8) 
above which the balls are likely to cover and below which the balls are unlikely to cover . There is a corresponding threshold radius . The threshold radius on is
(9) 
By Equation 4, we have that as .
Lemma 2
Let be a compact, connected Riemannian manifold, and let be a smooth probability density function from which points are sampled. Then in probability as . Moreover, in probability.
Proof
Let be a sequence such that and . Define the sequence of filling factors
and define to be
which is the radius that corresponds to a filling factor of on . Note that , where is defined as in Equation 9. Because , it must be true that .
5 Conformal Invariance
Let $(M, g)$ and $(M', g')$ be Riemannian manifolds, and let $F : M \to M'$ be a diffeomorphism. If $f'$ is a smooth probability density function on $M'$, then we can pull $f'$ back to a probability density function on $M$ as follows.
Definition 8 (Pullback of a Probability Density Function)
The pullback of $f'$ under $F$ is the function $F^* f'$ such that $(F^* f') \, dV = F^*(f' \, dV')$. The probability density function $F^* f'$ exists because the space of $n$-forms on an $n$-dimensional manifold is spanned by $dV$.
The pullback of a probability density function is defined such that sampling a point cloud $X$ from $(M, F^* f')$ is equivalent to sampling a point cloud $Y$ from $(M', f')$ and setting $X = F^{-1}(Y)$.
Proposition 1
Suppose $Y$ is sampled from $(M', f')$ and let $X = F^{-1}(Y)$. Suppose $X'$ is sampled from $(M, F^* f')$, where $F^* f'$ is the pullback of $f'$ defined by Definition 8. Then $X$ and $X'$ are identically distributed.
Proof
If is a Borel set, then
Proposition 1 justifies a comparison of a density-scaled complex on $M$ to the corresponding density-scaled complex on $M'$. Below, we define what we mean by an isomorphism of two filtered complexes and what we mean by invariance of a filtered complex.
Definition 9 (Isomorphism of Filtered Complexes)
Let and be filtered complexes, and let , be the sets of vertices and simplices, respectively, of . Let be the set of all vertices of . We say that and are isomorphic if there is a bijective map such that induces bijections and for all .
Definition 10 (Invariance)
Let $(M, g)$ and $(M', g')$ be Riemannian manifolds, and let $F : M \to M'$ be a diffeomorphism. A density-scaled complex $DK$ is invariant under $F$ if the density-scaled complex of $X$ in $(M, g)$ with density $F^* f'$ is isomorphic to the density-scaled complex of $F(X)$ in $(M', g')$ with density $f'$ for all smooth probability density functions $f'$ on $M'$ and point clouds $X$ sampled from $M$, where $F^* f'$ is the pullback of $f'$ defined by Definition 8.
We restrict ourselves to a suitable class of distance-based filtered complexes that are invariant under global isometry. This class includes the Čech complex, the Vietoris–Rips complex, and many other standard distance-based filtered complexes.
Definition 11 (Invariance Under Global Isometry)
Let $(M, d)$ and $(M', d')$ be metric spaces. A distance-based filtered complex $K$ is invariant under global isometry if $K(X, (M, d))$ is isomorphic to $K(F(X), (M', d'))$ for all global isometries $F : M \to M'$ and all point clouds $X$ in $M$.
Theorem 5.1 below shows in particular that the density-scaled Čech complex DČ and the density-scaled Vietoris–Rips complex DVR are invariant under all conformal transformations. As a corollary, they are invariant under global scaling (Corollary 2). Additionally, they are invariant under diffeomorphisms of $1$-dimensional manifolds (Corollary 3).
Theorem 5.1
Suppose that $K$ is a distance-based filtered complex that is invariant under global isometry, and let $DK$ be the density-scaled filtered complex. Then