Persistent homology [carlsson-survey, brunrer-book, elz-topological] is a technique to analyze data sets using topological invariants. The idea is to build a multi-scale representation of the data set and to track its homological changes across the scales.
A standard construction for the important case of point clouds in Euclidean space is the Vietoris-Rips complex (or just Rips complex): for a scale parameter , it is the collection of all subsets of points with diameter at most . When increases from to , the Rips complexes form a filtration, an increasing sequence of nested simplicial complexes whose homological changes can be computed and represented in terms of a barcode.
The computational drawback of Rips complexes is their sheer size: the -skeleton of a Rips complex (that is, only subsets of size are considered) for points consists of simplices because every -subset joins the complex for a sufficiently large scale parameter. This size bound makes barcode computations for large point clouds infeasible even for low-dimensional homological features (an exception is point clouds in and , for which alpha complexes [brunrer-book] are an efficient alternative). This poses the question of what we can say about the barcode of the Rips filtration without explicitly constructing all of its simplices.
We address this question using approximation techniques. Barcodes form a metric space: two barcodes are close if the same homological features occur on roughly the same range of scales (see Section 2 for the precise definition). The first approximation scheme by Sheehy [sheehy-rips] constructs a -approximation of the -skeleton of the Rips filtration using only simplices for arbitrary finite metric spaces, where is the doubling dimension of the metric. Further approximation techniques for Rips complexes [desu-gic] and the closely related Čech complexes [bs-approximating, sheehy-cech, kbs-cech] have been derived subsequently, all with comparable size bounds. More recently, we constructed an approximation scheme for Rips complexes in Euclidean space that yields a worse approximation factor of , but uses only simplices [ckr-polynomial], where is the ambient dimension of the point set.
We present a -approximation for the Rips filtration of points in in the -norm , whose -skeleton has size . This translates to a -approximation of the Rips filtration in the Euclidean metric and hence improves the asymptotic approximation quality of our previous approach [ckr-polynomial] with the same size bound.
On a high level, our approach follows a straightforward approximation scheme: given a scaled and appropriately shifted integer grid on , we identify those grid points that are close to the input points and build an approximation complex using these grid points. The challenge lies in how to connect these grid points to a simplicial complex such that close-by grid points are connected, while avoiding too many connections to keep the size small. Our approach first selects a set of active faces in the cubical complex defined over the grid, and defines the approximation complex using the barycentric subdivision of this cubical complex.
We also describe an output-sensitive algorithm to compute our approximation. By randomizing the aforementioned shifts of the grids, we obtain a worst-case running time of , where is the spread of the point set (that is, the ratio of the diameter to the closest distance of two points) and is the size of the approximation.
Additionally, this paper makes the following technical contributions:
We follow the standard approach of defining a sequence of approximation complexes and establishing an interleaving between the Rips filtration and the approximation. We realize our interleaving using chain maps connecting a Rips complex at scale to an approximation complex at scale , and vice versa, with being the approximation factor. Previous approaches [ckr-polynomial, desu-gic, sheehy-rips] used simplicial maps for the interleaving, which induce an elementary form of chain maps and are therefore more restrictive.
The explicit construction of such maps can be a non-trivial task. The novelty of our approach is that we avoid this construction by the usage of acyclic carriers [munkres]. In short, carriers are maps that assign subcomplexes to subcomplexes under some mild extra conditions. While they are more flexible, they still certify the existence of suitable chain maps, as we exemplify in Section 4. We believe that this technique is of general interest for the construction of approximations of cell complexes.
We exploit a simple trick that we call scale balancing to improve the quality of approximation schemes. In short, if the aforementioned interleaving maps from and to the Rips filtration do not increase the scale parameter by the same amount, one can simply multiply the scale parameter of the approximation by a constant. Concretely, given maps
interleaving the Rips complex and the approximation complex , we can define and obtain maps
which improves the interleaving from to . While it has been observed that the same trick can be used for improving the worst-case distance between Rips and Čech filtrations (Ulrich Bauer, private communication), our work seems to be the first to make use of it in the context of approximations of filtrations.
Our technique can be combined with dimension reduction techniques in the same way as in [ckr-polynomial] (see Theorems 19, 21, and 22 therein), with improved logarithmic factors. We omit the technical details in this paper. Also, we point out that the complexity bounds for size and computation time are for the entire approximation scheme and not for a single scale as in [ckr-polynomial]. However, techniques similar to those presented in Section LABEL:section:computational can be used to improve the results of [ckr-polynomial] to hold for the entire approximation as well (an extended version of [ckr-polynomial] containing these improvements is currently under submission).
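The scale balancing trick described above can be illustrated by a short generic calculation. The symbols below are assumptions made for this sketch only: we write $\mathcal{R}_\alpha$ for the Rips complex and $\mathcal{X}_\alpha$ for the approximation complex at scale $\alpha$, we let $a, b \geq 1$ be placeholder factors by which the two interleaving maps increase the scale, and we ignore the fact that the approximation is only defined on a discrete set of scales.

```latex
% Assumed generic form of the unbalanced maps (a, b are placeholders):
\phi \colon \mathcal{R}_{\alpha} \to \mathcal{X}_{a\alpha},
\qquad
\psi \colon \mathcal{X}_{\alpha} \to \mathcal{R}_{b\alpha}.

% Rescale the approximation by the constant s := \sqrt{a/b}:
\mathcal{X}'_{\alpha} := \mathcal{X}_{s\alpha}.

% Both maps now increase the scale by the same factor \sqrt{ab}:
\mathcal{R}_{\alpha} \to \mathcal{X}_{a\alpha}
   = \mathcal{X}'_{(a/s)\alpha}
   = \mathcal{X}'_{\sqrt{ab}\,\alpha},
\qquad
\mathcal{X}'_{\alpha} = \mathcal{X}_{s\alpha} \to \mathcal{R}_{bs\alpha}
   = \mathcal{R}_{\sqrt{ab}\,\alpha}.
```

Under these assumptions, the larger of the two factors, $\max(a,b)$, is replaced by the geometric mean $\sqrt{ab}$, which is strictly smaller whenever $a \neq b$.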
We start the presentation by discussing the relevant topological concepts in Section 2. Then, we present a few results about grid lattices in Section 3. Building on these ideas, the approximation scheme is presented in Section 4. Computational aspects of the approximation scheme are discussed in Section LABEL:section:computational. We conclude in Section LABEL:section:conclusion.
We review the topological concepts needed in our argument; see [bss-metrics, ch-proximity, brunrer-book, munkres] for more details.
A simplicial complex on a finite set of elements is a collection of subsets called simplices such that each subset is also in . The dimension of a simplex is , in which case is called a -simplex. A simplex is a subsimplex of if . We remark that a subsimplex is commonly called a ‘face’ of a simplex, but we reserve the word ‘face’ for a different structure. For the same reason, we do not introduce the common notation of ‘vertices’ and ‘edges’ of simplicial complexes, but rather refer to - and -simplices throughout. The -skeleton of consists of all simplices of whose dimension is at most . For instance, the -skeleton of is a graph defined by its -simplices and -simplices.
Given a point set and a real number , the (Vietoris-)Rips complex on at scale consists of all simplices such that , where denotes the diameter. In this work, we write for the Rips complex at scale with the Euclidean metric, and when using the metric of the -norm. In either way, a Rips complex is an example of a flag complex, which means that whenever a set has the property that every -simplex is in the complex, then the -simplex is also in the complex.
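The definition of the Rips complex can be made concrete with a brute-force sketch. It is an illustration only and not the construction of this paper; the function names `linf_dist` and `rips_skeleton` are hypothetical, and the diameter condition is taken to be "at most the scale", as in the introduction. The sketch uses the flag property: a subset is a simplex as soon as all of its pairs are.

```python
from itertools import combinations

def linf_dist(p, q):
    """Distance between two points in the L-infinity norm."""
    return max(abs(a - b) for a, b in zip(p, q))

def rips_skeleton(points, scale, k, dist=linf_dist):
    """Return all simplices of dimension <= k (as tuples of point indices)
    whose diameter under `dist` is at most `scale`."""
    n = len(points)
    simplices = []
    for size in range(1, k + 2):              # a j-simplex has j + 1 points
        for subset in combinations(range(n), size):
            # Flag property: it suffices to check all pairs of the subset.
            if all(dist(points[i], points[j]) <= scale
                   for i, j in combinations(subset, 2)):
                simplices.append(subset)
    return simplices
```

The loop enumerates every subset of size at most k + 1, which is exactly the size blow-up discussed in the introduction.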
A simplicial complex is a subcomplex of if . For instance, is a subcomplex of for . Let be a simplicial complex. Let be a map which assigns to each vertex of , a vertex of . A map is called a simplicial map induced by , if for every simplex in , the set is a simplex of . For a subcomplex of , the inclusion map is an example of a simplicial map. A simplicial map is completely determined by its action on the -simplices of .
A chain complex with is a collection of abelian groups and homomorphisms such that . A simplicial complex gives rise to a chain complex by fixing a base field , defining as the set of formal linear combinations of -simplices in over , and as the linear operator that assigns to each simplex the (oriented) sum of its sub-simplices of codimension one (to avoid thinking about orientations, it is often assumed that is the field with two elements).
A chain map between chain complexes and is a collection of group homomorphisms such that . For example, a simplicial map between simplicial complexes induces a chain map between the corresponding chain complexes. This construction is functorial, meaning that for the identity function on a simplicial complex , is the identity function on , and for composable simplicial maps , we have that .
Homology and carriers
The -th homology group of a chain complex is defined as . The -th homology group of a simplicial complex , , is the -th homology group of its induced chain complex. In either case, is a -vector space because we have chosen our base ring as a field. Intuitively, when the chain complex is generated from a simplicial complex, the dimension of the -th homology group counts the number of -dimensional holes in the complex (except for , where it counts the number of connected components). We write for the direct sum of all for .
A chain map induces a linear map between the homology groups. Again, this construction is functorial, meaning that it maps identity maps to identity maps, and it is compatible with compositions.
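To illustrate the definitions of chain complexes and homology, the following sketch computes the dimensions of the homology groups of a small simplicial complex over the field with two elements, using the standard rank formula (dimension of the kernel of one boundary map minus the rank of the next). The helper names `rank_mod2` and `betti_numbers` are hypothetical, and the input is assumed to be closed under taking subsimplices and free of duplicates.

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over the two-element field of a matrix whose rows are given
    as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """Dimensions of the homology groups (over F_2) of a simplicial complex
    given as an iterable of vertex tuples."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {}  # ranks[k]: rank of the boundary map from k- to (k-1)-chains
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim.get(k - 1, []))}
        rows = []
        for s in by_dim.get(k, []):
            mask = 0
            for sub in combinations(s, k):    # subsimplices of codimension one
                mask |= 1 << index[sub]
            rows.append(mask)
        ranks[k] = rank_mod2(rows)
    return [len(by_dim.get(k, [])) - ranks.get(k, 0) - ranks.get(k + 1, 0)
            for k in range(top + 1)]
```

For the boundary of a triangle (three 0-simplices and three 1-simplices), the result reflects one connected component and one 1-dimensional hole; adding the 2-simplex fills the hole.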
We call a simplicial complex acyclic, if is connected and all homology groups with are trivial. For simplicial complexes and , an acyclic carrier is a map that assigns to each simplex in , a non-empty subcomplex such that is acyclic, and whenever is a subsimplex of , then . We say that a chain is carried by a subcomplex , if takes value except for -simplices in . A chain map is carried by , if for each simplex , is carried by . We state the acyclic carrier theorem [munkres]:
Let be an acyclic carrier.
There exists a chain map such that is carried by .
If two chain maps are both carried by , then .
Filtrations and towers
Let be a set of real values which we refer to as scales. A filtration is a collection of simplicial complexes such that for all . For instance, is a filtration which we call the Rips filtration. A (simplicial) tower is a sequence of simplicial complexes with being a discrete set (for instance ), together with simplicial maps between complexes at consecutive scales. For instance, the Rips filtration can be turned into a tower by restricting to a discrete range of scales, and using the inclusion maps as . The approximation constructed in this paper will be another example of a tower.
We say that a simplex is included in the tower at scale , if is not the image of , where is the scale preceding in the tower. The size of a tower is the number of simplices included over all scales. If a tower arises from a filtration, its size is simply the size of the largest complex in the filtration (or infinite, if no such complex exists). However, this is not true in general for simplicial towers, since simplices can collapse in the tower and the size of the complex at a given scale may not take into account the collapsed simplices which were included at earlier scales in the tower.
Barcodes and Interleavings
A collection of vector spaces connected with linear maps is called a persistence module, if is the identity on and for all for the index set .
We generate persistence modules using the previous concepts. Given a simplicial tower , we generate a sequence of chain complexes . By functoriality, the simplicial maps of the tower give rise to chain maps between these chain complexes. Using functoriality of homology, we obtain a sequence of vector spaces with linear maps , forming a persistence module. The same construction can be applied to filtrations.
Persistence modules admit a decomposition into a collection of intervals of the form (with ), called the barcode, subject to certain tameness conditions. The barcode of a persistence module characterizes the module uniquely up to isomorphism. If the persistence module is generated by a simplicial complex, an interval in the barcode corresponds to a homological feature (a “hole”) that comes into existence at complex and persists until it disappears at .
Two persistence modules and with linear maps and are said to be weakly (multiplicatively) -interleaved with , if there exist linear maps and , called interleaving maps, such that the diagram
commutes for all , that is, and (we have skipped the subscripts of the maps for readability). In such a case, the barcodes of the two modules are -approximations of each other in the sense of [ch-proximity]. We say that two towers are -approximations of each other, if their persistence modules are -approximations of each other.
Under more stringent interleaving conditions, the approximation ratio can be improved. Given a totally ordered index set , two persistence modules and with linear maps and are said to be strongly (multiplicatively) -interleaved with , if there exist linear maps and , such that the diagrams
commute for all . The barcodes of the two modules are -approximations of each other in the sense of [ch-proximity].
3 Grids and cubes
Let with be a discrete set of scales. For a scale , we inductively define a grid on scale which is a scaled and translated (shifted) version of the integer lattice: for , is simply , the scaled integer grid. For , we choose an arbitrary and define
where the signs of the components of the last vector are chosen uniformly at random (and the choice is independent for each ). For , we define
We motivate the shifting next. For a finite point set and , the Voronoi region is the (closed) set of points in that have as one of their closest points in . If , it is easy to see that the Voronoi region of any grid point is a cube of side length centered at . The shifting of the grids ensures that each lies in the Voronoi region of a unique . By an elementary calculation, we can show a stronger statement, which we use frequently; for shorter notation, we write instead of .
Let such that . Then, .
Without loss of generality, we can assume that and is the origin, using an appropriate translation and scaling. Also, we assume for simplicity that ; the proof is analogous for any other translation vector. In that case, it is clear that . Since , the Voronoi region of is the set . Since is a translated version of , the Voronoi region of is the cube , which covers . ∎
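The observation that Voronoi regions of a grid are axis-aligned cubes means that the closest grid point can be found by rounding each coordinate independently. The following is a minimal sketch under stated assumptions: the grid is given by a hypothetical side length `side` and translation vector `shift`, which stand in for the scaled and shifted grids defined above without reproducing their exact constants.

```python
import math

def nearest_grid_point(p, side, shift):
    """Closest point to `p` in the grid {shift + side * z : z an integer
    vector}. Each coordinate is rounded independently; ties are broken
    towards the larger coordinate."""
    return tuple(s + side * math.floor((x - s) / side + 0.5)
                 for x, s in zip(p, shift))

def in_voronoi_cube(p, g, side):
    """Check that `p` lies in the closed cube of side length `side`
    centered at the grid point `g`."""
    return max(abs(x - y) for x, y in zip(p, g)) <= side / 2
```

For instance, with `side = 1.0` and `shift = (0.5, 0.5)`, the point `(0.6, 1.4)` is assigned to the grid point `(0.5, 1.5)`, and it lies in the cube of side length 1 around that grid point.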
The integer grid naturally defines a cubical complex, where each element is an axis-aligned, -dimensional cube with . To define it, let denote the set of all integer translates of faces of the unit cube , considered as a convex polytope in . We call the elements of faces. Each face has a dimension ; the -faces, or vertices are exactly the points in . Moreover, the facets of a -face are the -faces contained in . We call a pair of facets of opposite if they are disjoint. Obviously, these concepts carry over to scaled and translated versions of , so we can define as the cubical complex defined by .
We define a map as follows: for vertices, we assign to the (unique) vertex such that (cf. Lemma 2). For a -face of with vertices in , we set to be the convex hull of ; the next lemma shows that this is indeed a well-defined map.
are the vertices of a face of . Moreover, if are any two opposite facets of , then there exists a pair of opposite facets of such that and .
First claim: We prove the first claim by induction on the dimension of faces of . Base case: for vertices, the claim is trivial using Lemma 2. Induction case: let the claim hold true for all -faces of . We show that the claim holds true for all -faces of .
Let be a -face of . Let and be opposite facets of , along the -th coordinate. Let the vertices of be and be taken in the same order, that is, and differ in only the -th coordinate for all . By definition, all vertices of share the -th coordinate, and we denote this coordinate by . Then, the -th coordinate of all vertices of equals . By induction hypothesis, and are two faces of . We show that the vertices of are vertices of a face of .
The map acts on each coordinate direction independently. Therefore, and have the same coordinates, except possibly the -th coordinate. This further implies that is a translate of along the -th coordinate.
There are two cases: if and share the -th coordinate, then and therefore , so the claim follows. On the other hand, if and do not share the -th coordinate: ’s -th coordinate is , while for it is . From the structure of , we see that and differ by . It follows that and are two faces of which differ in only one coordinate by . So they are opposite facets of a codimension-1 face of . Using induction, the claim follows.
Second claim: Without loss of generality, assume that is the direction in which is a translate of . Let denote the maximal face of such that . Clearly, , since that would imply , which is a contradiction.
Suppose has dimension less than . Let be the face of , obtained by translating along . As in the first claim, it is easy to see that , from the structure of . This means that there is a facet of containing and such that . Let be the opposite facet of in and let be the direction which separates from . Then, otherwise does not hold. Let be the face of , obtained by translating along . Then, from the structure of , holds. The facet of containing and also maps to under . This is a contradiction to our assumption that is the highest dimensional face of such that . See Figure 1 for a simple illustration.
Therefore, the only possibility is that is a facet of such that . Let be the opposite facet of . From the structure of , it is easy to see that . The claim follows. ∎
A flag in is a set of faces of such that . The barycentric subdivision of is the (infinite) simplicial complex whose simplices are the flags of ; in particular, the -simplices of are the faces of . An equivalent geometric description of can be obtained by defining the -simplices as the barycenters of the faces in , and introducing a -simplex between barycenters if the corresponding faces form a flag. It is easy to see that is a flag complex. Given a face in , we write for the subcomplex of consisting of all flags that are formed only by faces contained in .
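The flags of a single cube can be enumerated explicitly, which may help to visualize the barycentric subdivision. The sketch below restricts attention to the faces of one unit cube (the subdivision of the whole grid is infinite) and encodes a face as a tuple over 0, 1 and a marker `"*"` for a coordinate that ranges over the whole unit interval. This encoding and all function names are assumptions made for the illustration.

```python
from itertools import product

STAR = "*"   # marks a coordinate in which the face extends over [0, 1]

def cube_faces(d):
    """All non-empty faces of the unit cube in dimension d."""
    return list(product((0, 1, STAR), repeat=d))

def dim(face):
    """Dimension of a face: the number of free coordinates."""
    return sum(1 for c in face if c == STAR)

def is_subface(f, g):
    """True if the face f is contained in the face g."""
    return all(b == STAR or a == b for a, b in zip(f, g))

def flags(faces):
    """All non-empty chains of strictly nested faces, each chain listed
    in order of increasing dimension."""
    faces = sorted(faces, key=dim)
    result = []

    def extend(chain, start):
        for i in range(start, len(faces)):
            f = faces[i]
            # Containment is transitive, so comparing with the last face
            # of the chain is enough.
            if not chain or (dim(f) > dim(chain[-1])
                             and is_subface(chain[-1], f)):
                new_chain = chain + [f]
                result.append(tuple(new_chain))
                extend(new_chain, i + 1)

    extend([], 0)
    return result
```

For the unit square, this yields 9 flags of length one (the barycenters), 16 of length two and 8 of length three, that is, the familiar subdivision of a square into 8 triangles.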
4 Approximation scheme
We define our approximation complex at scale as a finite subcomplex of . To simplify the subsequent analysis, we define the approximation in a slightly generalized form.
For a fixed , let denote a non-empty subset of . We say that a face is spanned by if is non-empty and not contained in any facet of . Trivially, the vertices of spanned by are precisely the points in . We point out that the set of spanned faces is not closed under taking sub-faces; for instance, if consists of two antipodal points of a -cube, the only faces spanned by are the -cube and the two vertices.
The barycentric span of is the subcomplex of defined by all flags such that all are spanned by . This is indeed a subcomplex of because it is closed under taking subsets. Moreover, for a face , we define the -local barycentric span of as the set of all flags in the barycentric span such that for all . This is a subcomplex both of and of the barycentric span of and is a flag complex.
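The notion of a spanned face can be checked mechanically for a single cube. In the sketch below, faces of the unit cube are again encoded as tuples over 0, 1 and `"*"` (an assumed encoding, as are the function names). A facet of a face fixes one of its free coordinates to 0 or 1, so the selected vertices inside a face avoid every facet exactly when each free coordinate takes both values among them.

```python
from itertools import product

STAR = "*"   # marks a coordinate in which the face extends over [0, 1]

def cube_faces(d):
    """All non-empty faces of the unit cube in dimension d."""
    return list(product((0, 1, STAR), repeat=d))

def vertices_in(face, vertex_set):
    """The vertices of `vertex_set` that lie in `face`."""
    return [v for v in vertex_set
            if all(c == STAR or c == x for c, x in zip(face, v))]

def is_spanned(face, vertex_set):
    """True if the vertices of `vertex_set` inside `face` are non-empty
    and not contained in any facet of `face`."""
    inside = vertices_in(face, vertex_set)
    if not inside:
        return False
    return all({v[i] for v in inside} == {0, 1}
               for i, c in enumerate(face) if c == STAR)

def spanned_faces(d, vertex_set):
    """All faces of the unit d-cube spanned by `vertex_set`."""
    return [f for f in cube_faces(d) if is_spanned(f, vertex_set)]
```

For two antipodal vertices of the square, `spanned_faces(2, [(0, 0), (1, 1)])` returns the two vertices and the square itself but none of the edges, in line with the example given above. The barycentric span is then formed by the flags that use only these faces.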
For each face , the -local barycentric span of is either empty or acyclic.
We assume that the -local barycentric span of is not empty. Hence, contains a unique active face of maximal dimension that is spanned by . A simplex in a simplicial complex is called maximal if no other simplex in contains it. It is known that if a simplicial complex contains a -simplex that lies in every maximal simplex, then is acyclic (in this case, is called star-shaped). In our situation, the -simplex belongs to every maximal simplex in the -local barycentric span, because every simplex not containing is a flag that can be extended by adding to its end. ∎
Furthermore, if , it is easy to see that faces spanned by are also spanned by . Consequently, the barycentric span of is a subcomplex of the barycentric span of .
We denote by a finite set of points. For each point , we let denote the grid point in that is closest to (we assume for simplicity that this closest point is unique). We define the active vertices of , , as , that is, the set of grid points that are closest to some point in . The next statement is a direct application of the triangle inequality; let denote the diameter in the -norm.
Let be such that . Then, the set is contained in a face of . Equivalently, for a simplex on , the set of active vertices is contained in a face of .
We prove the claim by contradiction. Assume that the set of active vertices is not contained in a face of . Then, there exist such that , are not in a common face of . By the definition of the grid , the grid points , therefore have -distance at least . Moreover, has -distance less than from , and the same is true for and . By the triangle inequality, the -distance of and is more than , a contradiction. ∎
Vice versa, we define a map by mapping an active vertex to its closest point in (again, assuming for simplicity that the assignment is unique). The map is a section of , that is, is the identity on .
For all , .
We now define our approximation tower: for scale , we define as the barycentric span of the active vertices . See Figure 2 for an illustration. To simplify notations, we call the faces of spanned by active faces, and simplices of active flags.
To complete the construction, we need to define simplicial maps . We show that such maps are induced by .
Let be an active face of . Then, is an active face of .
From Lemma 3, is a face of . If is a vertex, it is active, because contains at least one active vertex , and in this case. If is not a vertex, we assume for a contradiction that it is not active. Then, it contains a facet that contains all active vertices in . Let denote the opposite facet. By Lemma 3, contains opposite facets , such that and . Since is active, both and contain active vertices, in particular, contains an active vertex . But then, the active vertex must lie in , contradicting the fact that contains all active vertices of . ∎
Recall that a simplex is a flag of active faces in . We set as the flag , which consists of active faces in by Lemma 7, and hence is a simplex in . It follows that is a simplicial map. This finishes our construction of the simplicial tower with simplicial maps .
To relate our tower with the -Rips filtration, we start by defining two acyclic carriers. We write to simplify notations.
: let be any flag of . Let be the set of active vertices of . We set . With a simple triangle inequality, we see that is a simplex in , hence it is acyclic.
Using the Acyclic Carrier Theorem (Theorem 1), there exist chain maps and , which are carried by and , respectively. Aggregating the chain maps, we have the following diagram: