# Graph Reconstruction by Discrete Morse Theory

Recovering hidden graph-like structures from potentially noisy data is a fundamental task in modern data analysis. Recently, a persistence-guided discrete Morse-based framework to extract a geometric graph from low-dimensional data has become popular. However, to date, there is very limited theoretical understanding of this framework in terms of graph reconstruction. This paper makes a first step towards closing this gap. Specifically, first, leveraging existing theoretical understanding of persistence-guided discrete Morse cancellation, we provide a simplified version of the existing discrete Morse-based graph reconstruction algorithm. We then introduce a simple and natural noise model and show that the aforementioned framework can correctly reconstruct a graph under this noise model, in the sense that it has the same loop structure as the hidden ground-truth graph, and is also geometrically close. We also provide some experimental results for our simplified graph-reconstruction algorithm.

## Authors

• 18 publications
• 2 publications
• 33 publications
• ### Threshold-Based Graph Reconstruction Using Discrete Morse Theory

Discrete Morse theory has recently been applied in metric graph reconstr...
11/28/2019 ∙ by Brittany Terese Fasy, et al. ∙ 0

• ### Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological Methods

Automatic Extraction of road network from satellite images is a goal tha...
09/15/2019 ∙ by Tamal K. Dey, et al. ∙ 21

• ### Persistence-perfect discrete gradient vector fields and multi-parameter persistence

The main objective of this paper is to introduce and study a notion of p...
04/10/2019 ∙ by Claudia Landi, et al. ∙ 0

• ### Persistent Homology of Morse Decompositions in Combinatorial Dynamics

We investigate combinatorial dynamical systems on simplicial complexes c...
01/19/2018 ∙ by Tamal K. Dey, et al. ∙ 0

• ### 3D Microstructure Segmentation by Topological Persistence

Tomography is a widely used tool for analyzing microstructures in three ...
04/27/2021 ∙ by Anand V. Patel, et al. ∙ 0

• ### Cache Persistence Analysis: Finally Exact

Cache persistence analysis is an important part of worst-case execution ...
09/10/2019 ∙ by Gregory Stock, et al. ∙ 0

• ### Fast Regularity-Constrained Plane Reconstruction

Man-made environments typically comprise planar structures that exhibit ...
05/20/2019 ∙ by Yangbin Lin, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Recovering hidden structures from potentially noisy data is a fundamental task in modern data analysis. A particular type of structure often of interest is the geometric graph-like structure. For example, given a collection of GPS trajectories, recovering the hidden road network can be modeled as reconstructing a geometric graph embedded in the plane. Given the simulated density field of dark matters in universe, finding the hidden filamentary structures is essentially a problem of geometric graph reconstruction.

Different approaches have been developed for reconstructing a curve or a metric graph from input data. For example, in computer graphics, much work have been done in extracting 1D skeleton of geometric models using the medial axis or Reeb graphs [10, 29, 20, 16, 22, 7]

. In computer vision and machine learning, a series of work has been developed based on the concept of

principal curves, originally proposed by Hastie and Steutzle [18]. Extensions to graphs include the work in [19] for 2D images and in [25] for high dimensional point data.

In general, there is little theoretical guarantees for most approaches developed in practice to extract hidden graphs. One exception is some recent work in computational topology: Aanijaneya et al. [3] proposed the first algorithm to approximate a metric graph from an input metric space with guarantees. The authors of [8, 16] used Reeb-like structures to approximate a hidden (metric) graph with some theoretical guarantees. These work however only handles (Gromov-)Hausdorff-type of noise. When input points are embedded in an ambient space, they requires the input points to lie within a small tubular neighborhood of the hidden graph. Empirically, these methods do not seem to be effective when the input contains ambient noise allowing some faraway points from the hidden graph.

Recently, a discrete Morse-based framework for recovering hidden structures was proposed and studied [9, 17, 26]. This line of work computes and simplifies a discrete analog of (un)stable manifolds of a Morse function by using the (Forman’s) discrete Morse theory coupled with persistent homology for 2D or 3D volumetric data. One of the main issues in such simplification is the inherent obstructions that may occur for cancelling critical pairs. The authors of [26] suggest sidestepping this and consider a combinatorial representation of critical pairs for further processing. The authors in [9] identify a restricted set of pairs called “cancellable close pairs” which are guaranteed to admit cancellation. This framework has been applied to, for example, extracting filament structures from simulated dark matter density fields [27] and reconstructing road networks from GPS traces [28].

This persistence-guided discrete Morse-based framework has shown to be very effective in recovering a hidden geometric graph from (non-Hausdorff type) noise and non-homogeneous data. The method draws upon the global topological structure hidden in the input scalar field and thus is particularly effective at identifying junction nodes which has been a challenge for previous approaches that rely mostly on local information. However, to date, theoretical understanding of such a framework remains limited. Simplification of a discrete Morse gradient vector field using persistence has been studied before. For example, the work of

[9] clarifies the connection between persistence-pairing and the simplification of discrete Morse chain complex (which is closely related, but different from the cancellation in the discrete gradient vector field) for 2D and 3D domains. Bauer et al. [6] obtain several results on persistence guided discrete Morse simplification for combinatorial surfaces. The simplification of vertex-edge persistence pairing used in [6] has also been observed in [4] independently for simplifying Morse functions on surfaces. Leveraging these existing developments, we aim to provide a theoretical understanding of a persistence-guided discrete Morse based approach to reconstruct a hidden geometric graph.

#### Main contributions and organization of paper.

In Section 3, we start with one version of the existing persistence-guided discrete Morse-based graph reconstruction algorithm (as employed in [27, 28, 11]). We show that this algorithm can be significantly simplified while still yielding the same output. To establish the theoretical guarantee of the reconstruction algorithm, we introduce a simple yet natural noise model in Section 4. Intuitively, this noise model assumes that we are given an input density field where densities are significantly higher within a small neighborhood around a hidden graph than outside it. Under this noise model, we show that the reconstructed graph has the same loop structure as the hidden graph, and is also geometrically close to it; the technical details are in Sections 5 and 6 for the general case and the 2-dimensional case (with additional guarantees), respectively.

While our noise model is simple, our theoretical guarantees are first of a kind developed for a discrete Morse-based approach applied to graph reconstruction. In fact, prior to this, it was not clear whether a discrete Morse based approach can recover a graph even if there is no noise, that is, the density function has positive values only on the hidden graph. For our specific noise model, it may be possible to develop thresholding strategies perhaps with theoretical guarantees. However, previous work (e.g, [27, 28]) have shown that discrete Morse approach succeeds in many cases handling non-homogeneous data where thresholding fails. We have implemented the proposed simplified algorithm and tested it on several data sets, which generally gives a speed-up of at least a factor of 2 over a state-of-the-art approach. We present more discussions and experimental results in the appendix.

## 2 Preliminaries

### 2.1 Morse theory

For simplicity, we consider only a smooth function . See [13, 21] for more general discussions.

For a point , the gradient vector of at a point is , which represents the steepest descending direction of at , with its magnitude being the rate of change. An integral line of is a path such that the tangent vector at each point of this path equals , which is intuitively a flow line following the steepest descending direction at any point. A point is critical if its gradient vector vanishes, i.e, . A maximal integral line necessarily “starts” and “ends” at critical points of ; that is, with , and with . See Figure 0(a) where we show the graph of a function , and there is an integral line from to the minimum .

For a critical point , the union of and all the points from integral lines flowing into is referred to as the stable manifold of . Similarly, for a critical point , the union of and all the points on integral lines starting from is called the unstable manifold of . The stable manifold of a minimum intuitively corresponds the basin/valley around in the terrain of . The 1-stable manifolds of index () saddles consist of pieces of curves connecting ()-saddles to maxima – These curves intuitively capture “mountain ridges” of the terrain (graph of the function ); see Figure 0(a) for an example. Symmetrically, the unstable manifold of a maximum corresponds to the mountain around . The 1-unstable manifolds consist of a collection of curves connecting -saddles to minima, corresponding intuitively to the “valley ridges”.

In this paper, we focus on a graph-reconstruction framework using Morse-theory (as in e.g, [17, 9, 27, 28]). Intuitively, the 1-stable manifolds of saddles (mountain ridges) of the density field are used to capture the hidden graphs. To implement such an idea in practice, the discrete Morse theory is used for robustness and simplicity contributed by its combinatorial nature; and a simplification scheme guided by the persistence pairings is employed to remove noise. Below, we introduce some necessary background notions in these topics.

### 2.2 Discrete Morse theory

First we briefly describe some notions from discrete Morse theory (originally introduced by Forman [15]) in the simplicial setting.

A -simplex is the convex hull of affinely independent points; is called the dimension of . A face of is a simplex spanned by a proper subset of vertices of ; is a facet of the -simplex , denoted by , if its dimension is .

Suppose we are given a simplicial complex which is simply a collection of simplices and all their faces so that if two simplices intersect, they do so in a common face. A discrete (gradient) vector is a pair of simplices such that . A Morse pairing in is a collection of discrete vectors where each simplex appears in at most one pair; simplices that are not in any pair are called critical.

Given a Morse pairing , a V-path is a sequence where for every , and each is a facet of for each . If , the V-path is trivial. This V-path is cyclic if and ; otherwise, it is acyclic in which case we call this V-path a gradient path. We say that a gradient path is a vertex-edge gradient path if , implying that . Similarly, it is a edge-triangle gradient path if . A Morse pairing becomes a discrete gradient vector field (or equivalently a gradient Morse pairing) if there is no cyclic V-path induced by .

Intuitively, given a discrete gradient vector field , a gradient path is the analog of an integral line in the smooth setting. But different from the smooth setting, a maximal gradient path may not start or end at critical simplices. However, those that do (i.e, when and are critical simplices) are analogous to maximal integral line in the smooth setting which “start” and “end” at critical points, and for convenience one can think of critical -simplices in the discrete Morse setting as index- critical points in the smooth setting. For example, for a function on , critical 0-, 1- and 2-simplices in the discrete Morse setting correspond to minima, saddles and maxima in the smooth setting, respectively.

For a critical edge , we define its stable manifold to be the union of edge-triangle gradient paths that ends at . Its unstable manifold is defined to be the union of vertex-edge gradient paths that begins with . While earlier we use “mountain ridges” (1-stable manifolds) to motivate the graph reconstruction framework, algorithmically (especially for the Morse cancellations below), vertex-edge gradient paths are simpler to handle. Hence in our algorithm below, we in fact consider the function (instead of the density field itself) and the algorithm outputs (a subset of) the 1-unstable manifolds (vertex-edge paths in the discrete setting) as the recovered hidden graph.

#### Morse cancellation / simplification.

One can simplify a discrete gradient vector field (i.e, reducing the number of critical simplices) by the following Morse cancellation operation: A pair of critical simplices with is cancellable, if there is a unique gradient path starting at the -simplex and ends at the -simplex . The Morse cancellation operation on then modifies the vector field by removing all gradient vectors , for , while adding new gradient vectors , for . Intuitively, the gradient path is inverted. Note that and are no longer critical after the cancellation as they now participate in discrete gradient vectors. If there is no gradient path, or more than one gradient path between this pair of critical simplices , then this pair is not cancellable – the uniqueness condition is to ensure that no cyclic V-paths are formed after the cancellation operation. See Figure 1 (b) – (d) for examples.

### 2.3 Persistence pairing

The Morse cancellation can be applied to any sequence of critical simplices pairs as long as they are cancellable at the time of cancellation. There is no canonical cancellation sequence. To cancel features corresponding to “noise” w.r.t. an input piecewise-linear function , a popular strategy is to guide the Morse cancellation by the persistent homology induced by the lower-star filtration [17, 27], which we introduce now.

#### Filtrations and lower-star filtration.

Given a simplicial complex , let be an ordered sequence of all simplices in so that for any simplex , all of its faces appear before it in . Then induces a (simplex-wise) filtration : where is the subcomplex formed by the prefix of . Passing to homology groups, we have a persistence module , which has a unique decomposition into the direct sum of a set of indecomposable summands that can be represented by the set of persistence-pairing induced by : Each persistence pair indicates that a new -th homological class, , is created at and destroyed at ; is thus called a positive simplex as it creates, and a negative simplex. Assuming that there is a simplex-wise function such that if , then the persistence of the pair is defined as . Some simplices ’s may be unpaired, meaning that homological features created at are never destroyed. We augment by adding for every unpaired simplex to it, and set .

The persistent homology can be defined for any filtration of . In our setting, there is an input function defined at the vertices of whose linear extension leads to a piecewise-linear (PL) function still denoted by . To reflect topological features of , we use the lower-star filtration of induced by : Specifically, for any vertex , its lower-star is the set of simplicies containing where has the highest value among their vertices. Now sort vertices of in non-decreasing order of their -values: . An ordered sequence induces a lower-star filtration of w.r.t. if can be partitioned to consecutive pieces , , , such that the -th piece equals .

Now let be the resulting set of persistence pairs induced by the lower-star filtration . Extend the function to a simplex-wise function where (i.e, is the highest f-value of any of its vertices). For each pair , we measure its persistence by . Every simplex in contributes to a persistence pair in . However, assuming the value of is distinct on all vertices, then those persistence pairs with zero-persistence are “trivial” in the sense they correspond to the local pairing of two simplices from the lower-star of the same vertex. A persistence pair with positive persistence corresponds to a pair of (homological) critical points for the PL-function [13] induced by the function on , with and .

## 3 Reconstruction algorithm

#### Problem setup.

Suppose we have a domain (which will be a cube in in this paper) and a density function (that “concentrates” around a hidden geometric graph ). In the discrete setting, our input will be a triangulation of and a density function given as a PL-function . Our goal is to compute a graph approximating the hidden graph . In Algorithm 1, we first present a known discrete Morse-based graph (1-skeleton) reconstruction framework, which is based on the approaches in [17, 9, 27, 28].

Intuitively, we wish to use “mountain ridges” of the density field to approximate the hidden graph, which are computed as the 1-unstable manifolds of , the negation of the density function. Specifically, after initializing the discrete gradient vector field to be a trivial one, a persistence-guided Morse cancellation step is performed in Procedure PerSimpVF() to compute a new discrete gradient vector field so as to capture only important (high persistent) features of . In particular, Morse-cancellation is performed for each pair of critical simplices from (if possible) in increasing order of persistence values (for pairs with equal persistence, we use the nested order as in [6]). Finally, the union of the 1-unstable manifolds of all remaining high-persistence critical edges is taken as the output graph , as outlined in Procedure CollectOutputG().

Since we only need 1-unstable manifolds, is assumed to be a -complex. It is pointed out in [11] that in fact, instead of performing Morse-cancellation for all critical pairs involving edges (i.e, vertex-edge pairs and edge-triangle pairs), one only needs to cancel vertex-edge pairs – This is because only vertex-edge gradient vectors will contribute to the 1-unstable manifolds, and also new vertex-edge vectors can only be generated while canceling other vertex-edge pairs. Hence in PerSimpVF(), we can consider only vertex-edge pairs in order. Furthermore, it is not necessary to check whether the cancellation is valid or not – it will always be valid as long as the pairs are processed in increasing orders of persistence [6]111We remark that though [6] states that the cancellation is not valid in higher dimension or non-manifold 2-complexes, all cancellations in PerSimpVF() are for vertex-edge pairs in a spanning tree which can be viewed as a 1-complex, and thus are always valid..

However, we can further simplify the algorithm as follows: First, we replace procedure PerSimpVF() by procedure PerSimpTree() as shown in Algorithm 2, which is much simpler both conceptually and implementation speaking. Note that there is no explicit cancellation operation any more.

The -dimensional simplicial complex output by procedure PerSimpTree() may have multiple connected components – In fact, it is known that each is a tree and is a forest (see results from [4, 6] as summarized in Lemma 3.2 below). For each component , we define its sink, denoted by , as the vertex with the lowest function value. Lemma 3.2 also states that the sink of would have been the only critical simplex among all simplices in , if we had performed the -simplification as specified in procedure PerSimpVF(). Next, we replace procedure CollectOutputG() by procedure Treebased-OutputG() shown in Algorithm 2. We use MorseReconSimp() to denote our simplified version of Algorithm 1 (with PerSimpVF() replaced by PerSimpTree(), and CollectOutputG() replaced by Treebased-OutputG(). In summary, algorithm MorseReconSimp() works by first computing all persistence pairs as before. It then collects all vertex-edge persistence pairs with . These edges along with the set of all vertices form a spanning forest . Then, for every edge with , it outputs the 1-unstable manifold of , which is simply the union of and the unique paths from and to the sink (root) of the tree containing them respectively. Its time complexity is stated below; note for the previous algorithm MorseRecon(), the cancellation step can take time.

###### Theorem 3.1.

The time complexity of our Algorithm PerSimpVF() is , where is the time to compute persistence pairings for , and is the total number of vertices and edges in .

We remark that the term is contributed by the step collecting all -unstable manifolds, which takes linear time if one avoids revisiting edges while tracing the paths.

#### Justification of the modified algorithm MorseReconSimp().

Let denote the resulting discrete gradient field after canceling all vertex-edge persistence pairs in with persistence at most ; that is, is the output of the procedure PerSimpVF() (although we only compute the relevant part of the discrete gradient vector field). Using observations from [4, 6], we show that the output of procedure PerSimpTree() includes all information of . Furthermore, procedure Treebased-OutputG() computes the correct 1-unstable manifolds for all critical edges with persistence larger than . Indeed, observe that edges in Morse pairings from (for any ) form a spanning forest of edges in . Results of [6] imply that the output constructed by our modified procedure corresponds exactly to this spanning forest:

###### Lemma 3.2.

The following statements hold for the output of procedure PerSimpTree() w.r.t any :

• is a spanning forest consisting of potentially multiple trees .

• For each tree , its sink is the only critical simplex in . The collection of s corresponds exactly to those vertices whose persistence is bigger than .

• Any edge with remains critical in (and cannot be contained in ).

Note that, (ii) above implies that for each , any discrete gradient path of in terminates at its sink . See the right figure for an illustration. Hence for any vertex , the path computed in procedure Treebased-OutputG() is the unique discrete gradient path starting at . This immediately leads to the following result:

###### Corollary 3.3.

For each critical edge with , as computed in procedure Treebased-OutputG() is the 1-unstable manifold of in . Hence the output of our simplified algorithm MorseReconSimp() equals that of the original algorithm MorseRecon().

## 4 Noise model

We first describe the noise model in the continuous setting where the domain is . We then explain the setup in the discrete setting when the input is a triangulation of .

Given a connected “true graph” , consider a -neighborhood , meaning that (i) , and (ii) for any , (i.e, is sandwiched between and its -offset). Given , we use to denote the closure of its complement . See the right figure, showing (red graph) with its -neighborhood (orange).

###### Definition 4.1.

A density function is a -approximation of a connected graph if the following holds:

• There is a -neighborhood of such that deformation retracts to .

• for ; and otherwise. Furthermore, .

Intuitively, this noise model requires that the density concentrates around the true graph in the sense that the density is significantly higher inside than outside it; and the density fluctuation inside or outside is small compared to the density value in (condition C-2). Condition C-1 says that the neighborhood has the same topology of the hidden graph. Such a density field could for example be generated as follows: Imagine that there is an ideal density field where for and otherwise. There is a noisy perturbation whose size is always bounded by for any . The observed density field is an -approximation of .

In the discrete setting when we have a triangulation of , we define a -neighborhood to be a subcomplex of , i.e, , such that (i) is contained in the underlying space of and (ii) for any vertex , . The outside-region is simply the smallest subcomplex of that contains all simplices from (i.e, all simplices not in and their faces). A PL-function -approximation of can be extended to this setting by requiring the underlying space of deformation retracts to as in (C-1), and having those density conditions in (C-2) only at vertices of .

We remark that the noise model is still limited – In particular, it does not allow significant non-uniform density distribution. However, this is the first time that theoretical guarantees are provided for a discrete Morse based reconstruction framework, despite that such a framework has been used for different applications before. We also give experiments and discussions in Appendix B that the algorithm works beyond this noise model empirical, where thresholding type approaches do not work.

## 5 Theoretical guarantee

In this section, we prove results that are applicable to any dimension. Recall that is the discrete gradient field after the -Morse cancellation process, where we perform Morse-cancellation for all vertex-edge persistence pairs from . (While our algorithm does not maintain explicitly, we use it for theoretical analysis.) At this point, all positive edges (i.e, those paired with triangles or unpaired in ) remain critical in . Some negative edges (i.e, those paired with vertices in ) are also critical in – these are exactly the negative edges with persistence bigger than . Treebased-OutputG() only takes the 1-unstable manifolds of those critical edges (positive or negative) with persistence bigger than ; so those positive edges whose persistence is (if there is any) are ignored.

From now on, we use “under our noise model” to refer to (1) the input is a ()-approximated density field w.r.t. , and (2) . Let be the output of algorithm MorseReconSimp(). The proof of the following result is in Appendix A.1.

###### Proposition 5.1.

Under our noise model, we have:

• There is a single critical vertex left after PerSimpVF() which is in .

• Every critical edge considered by Treebased-OutputG() forms a persistence pair with a triangle.

• Every critical edge considered by Treebased-OutputG() is in .

###### Theorem 5.2.

Under our noise model, the output graph satisfies .

###### Proof.

Recall that the output graph consists of the union of 1-unstable manifolds of all the edges with persistence larger than – By Propositions 5.1 (ii) and (iii), they are all positive (paired with triangles), and contained inside .

Take any and consider . Without loss of generality, consider the gradient path starting from : By Lemma 3.2 and Proposition 5.1, must be a critical vertex (a sink) and is necessarily the global minimum , which is also contained inside . We now argue that the entire path (i.e, all simplices in it) is contained inside . In fact, we argue a stronger statement: First, we say that a gradient vector is crossing if and (i.e, ) – Since is an endpoint of , this means that the other endpoint of must lie in .

###### Claim 5.3.

During the -Morse cancellation, no crossing gradient vector is ever produced.

###### Proof.

Suppose the lemma is not true: Then let be the first crossing gradient vector ever produced during the -Morse cancellation process. Since we start with a trivial discrete gradient vector field, the creation of can only be caused by reversing of some gradient path connecting two critical simplices and while we are performing Morse-cancellation for the persistence pair . Obviously, . On the other hand, due to our -noise model and the choice of , it must be that either both or both – as otherwise, the persistence of this pair will be larger than .

Now consider this gradient path connecting and in the current discrete gradient vector field . Since the pair becomes a gradient vector after the inversion of this path, it must be that currently is a gradient vector where . Furthermore, since the path begins and ends with simplices either both in or both outside it, the path must contain a gradient vector going in the opposite direction crossing inside/outside, that is, and . In other words, it must contain a crossing gradient vector. This however contradicts to our assumption that would be the first crossing gradient vector. Hence the assumption is wrong and no crossing gradient vector can ever be created. ∎

As there is no crossing gradient vector during and after -Morse cancellation, it follows that , which is one piece of the 1-unstable manifold of the critical edge , has to be contained inside . The same argument works for the other piece of -unstable manifold of (starting from the other endpoint of ). Since this is for any , the theorem holds. ∎

The previous theorem shows that is close to in geometry. Next we will show that they are also close in topology.

###### Proposition 5.4.

Under our noise model, is homotopy equivalent to .

###### Proof.

We show that has the same first Betti number as that of which implies the claim as any two graphs in with the same first Betti number are homotopy equivalent.

The underlying space of -neighborhood of deformation retracts to by definition. Observe that, by our noise model, is a sublevel set in the filtration that determines the persistence pairs. This sublevel set being homotopy equivalent to must contain exactly positive edges where is the first Betti number of . Each of these positive edges pairs with a triangle in . Therefore, for each of the positive edges in . By our earlier results, these are exactly the edges that will be considered by procedure Treebased-OutputG(). Our algorithm constructs by adding these positive edges to the spanning tree each of which adds a new cycle. Thus, has first Betti number . ∎

We have already proved that is contained in . This fact along with Proposition 5.4 can be used to argue that any deformation retraction taking (underlying space) to also takes to a subset where and have the same first Betti number. In what follows, we use to denote also its underlying space.

###### Theorem 5.5.

Let be any deformation retraction. Then, the restriction is a homotopy from the embedding to where is the minimal subset so that and have the same first Betti number.

###### Proof.

The fact that is continuous for any is obvious from the continuity of . Only thing that needs to be shown is that . Suppose not. Then, is a proper subset of which has a first Betti number less than that of .

We observe that the cycle in created by a positive edge along with the paths to the root of the spanning tree is also non-trivial in because this is a cycle created by adding the edge during persistence filtration and the edge is not killed in .Therefore, a cycle basis for is also a homology basis for . Since the map is a homotopy equivalence, it induces an isomorphism in the respective homology groups; in particular, a homology basis in is mapped to a homology basis in . Therefore, the image must have a basis of cardinality if has first Betti number . But, cannot have a cycle basis of cardinality if it is a proper subset of reaching a contradiction. ∎

## 6 Additional guarantee for 2D

For , we now show that actually deformation retracts to , which is stronger than saying and are homotopy equivalent. We are unable to prove this result for dimensions higher than 2, as our current proof needs that the edge-triangle persistence pairs can always be canceled (even though our algorithm does not depend on edge-triangle cancellations at all). It would be interesting, as a future work, to see whether a different approach can be developed to avoid this obstruction for the special case under our noise model. The main result of this section is as follows.

###### Theorem 6.1.

Under our noise model, deformation retracts to and .

This main result follows from Proposition 6.2 and Theorem 6.3 below. To prove them, we will show that there exists a partition of the set of triangles in for which Theorem 6.3 holds. (This theorem is our main tool in establishing the deformation retract.) We first state the results below before giving their proofs. Let where is the boundary operator operating on the -chain . We also abuse the notations and to denote the geometric space that is the point-wise union of simplices in the respective chains. Let be a triangle in whose choice will be explained later. In the following, let be the maximal set of edges in whose deletions do not eliminate a cycle (assume that a vertex is deleted only if all of its edges are deleted). Observe that necessarily consists of negative edges forming “hairs” attached to the loops of and hence to because of the following proposition.

###### Proposition 6.2.

Under our noise model, .

###### Theorem 6.3.

Under our noise model, there exists a partition of triangles in such that, there is a deformation retraction of to that comprises of two deformation retractions, one from to and another one from to which is .

Now we describe the construction of a partition of the triangles in to prove Proposition 6.2 and Theorem 6.3. For technicality we assume that is augmented to a triangulation of a sphere by putting a vertex at infinity and joining it to the boundary of with edges and triangles all of whom have function value . Let be the collection of persistence pairs of the form either or generated from the lower-star filtration as described before. Since is 2-dimensional, each pair is either a vertex-edge pair or an edge-triangle pair. We order persistence pairs in by their persistence, where ties are broken via the nested order in the filtration , and obtain:

 P(K) ={(σ1,τ1),…,(σn,τn),(α1,∞),…,(αs,∞)}. (1)

Starting with a trivial discrete gradient vector field where all simplices in are critical, the algorithm PerSimpVF() performs Morse cancellations for the first persistence pairs in order where but , Let denote the gradient vector field after canceling . Recall that in the implementation of the algorithm we do not need to perform Morse cancellation for any edge-triangle pairs. However in this section, for the theoretical analysis, we will cancel edge-triangle pairs as well. Recall that a positive edge is one that creates a 1-cycle, namely, it is either paired with a triangle or unpaired; while a negative edge is one that destroys a 0-cycle (i.e, paired with a vertex).

Consider the ordered sequence of edge-triangle persistence pairs, , which is a subsequence of the one in (1). Consider the sequence of triangles in ordered by the above sequence. Recall the standard persistence algorithm [14]. It implicitly associates a -chain with a triangle when searching for the edge it is about to pair with. This -chain is non-empty if is a destructor, and is empty otherwise. Let denote this -chain associated with for . Initially, the algorithm asserts . At any stage, if is not empty, the persistence algorithm identifies the edge in the boundary that has been inserted into the filtration most recently. If has not been paired with anyone, the algorithm creates the persistence pair . Otherwise, if has already been paired with a triangle, say , then is updated with and the search continues. Given an index , we define a modified set of chains inductively as follows. For , . Assume that has been already defined. To define , similar to the persistence algorithm, check if the edge is on the boundary . If so, define and otherwise. The following result is proved in Appendix A.3.

###### Proposition 6.4.

For , is in . Furthermore, is the most recent edge in according to the filtration order .

Procedure PerSimpVF() also implicitly maintains a -chain with each triangle . Initially, as in the case of . Then, inductively assume that is the -chain implicitly associated with when a persistence pair is about to be considered by PerSimpVF() and the boundary contains . By reversing a gradient path between and , it implicitly updates the -chain as . We observe that is identical with . Proposition 6.5 below establishes this fact along with some other inductive properties useful to prove Theorem 6.3. The proof can be found in Appendix A.4.

###### Proposition 6.5.

Let be the edge-triangle persistence pair PerSimpVF is about to consider and let be the -chains defined as above . Then, the following statements hold:

1. [label=()]

2. For each triangle , , in the persistence order, the -chain satisfies the following conditions:(a.i) , (a.ii) interpreting as a set of triangles, one has that the sets , , partition the set of all triangles in .

3. There is a gradient path from to all edges of the triangles in , and (b.i) the path is unique if the edge is in the boundary for every ; (b.ii) if there is more than one gradient path from to an edge , then must be a negative edge.

We are now ready to setup the regions s needed for Theorem 6.3 and Proposition 6.2. Suppose the first edge-triangle pairs have persistence less than or equal to , the parameter supplied to PerSimpVF(). Then, we set as in Proposition 6.5 for . The proof for Proposition 6.2 is in Appendix A.2.

Finally, similar to the vertex-edge gradient vectors, we say that a gradient vector is crossing if and . The following claim can be proved similarly as Claim 5.3.

###### Claim 6.6.

During the -Morse cancellation of edge-triangle pairs, no crossing gradient vector is ever produced.

Proof of Theorem 6.3. Set . Let be the spanning tree formed by all negative edges and their vertices. Let be the set of edges in that has more than one gradient path from to them; by Proposition 6.5 (b.ii). First, we want to establish a deformation retraction from to . To do this, for , we will define inductively where deformation retracts to and . Let . For , consider a positive edge in where (a) is not in and (b) there is a unique gradient path in from to that passes through triangles all of which are in . If such an edge exists, then is necessarily incident to a single triangle, say , in . We collapse the pair , which is necessarily an edge-triangle gradient vector pair because is positive. We take to be . If no such exists, then either (A) there is no positive edge in any more; or (B) for each positive edge , (B-1) there is a unique gradient path from to but this path passes through some triangle in ; or (B-2) there are two gradient paths from to .

If there is no positive edge in any more, then , as otherwise, there will be at least some triangle from with at least one boundary edge of it being positive. The induction then terminates; we set and reach our goal.

We now show that case (B-1) is not possible. Suppose it happens, that is, is an edge not in for which the unique gradient path from passes through triangles in . Let be the first edge in this path that is in . Then, if , it qualifies for the conditions (a) and (b) required for reaching a contradiction. So, assume . But, in that case, we have a gradient path that goes into and then comes out to reach . There has to be a gradient pair in this path where the edge is in and the triangle is not in . This contradicts Claim 6.6. Thus, case (B-1) is not possible. Now consider (B-2): must be negative by Proposition 6.5 (b.ii). So, it is not possible either.

To summarize, the induction terminates in case (A), at which time we would have that . Furthermore, this process also establishes a deformation retraction from to realized by successive collapses of edge-triangle pairs. Furthermore, by construction, each collapsed pair must be from , hence contains all simplices in . Combined with that , we have that , where is a subset of the spanning tree . The edges in being part of a spanning tree cannot form a cycle and thus can be retracted along the tree to , which gives rise to a deformation retraction from to and then to , establishing the first part of Theorem 6.3.

We now show that deformation retracts to . Let be the edges in with more than one gradient path from to them. These edges are negative by Proposition 6.5 (b.ii). Replacing with and edges in playing the role of edges in in the above induction, we can obtain that deformation retracts to . Observe that now instead of Claim 2, we use the fact that no edge-triangle gradient path crosses that consists of only negative and critical edges. To this end, we also observe that where is the spanning tree formed by all negative edges, as we only collapse edge-triangle pairs that are gradient pairs (hence the participating edges are always positive). This implies that . Again, edges in (being part of a spanning tree) can be retracted along the spanning tree till one reaches or edges in . Performing this for each , we thus obtain a deformation retraction from to and further to . This finishes the proof of Theorem 6.3.

In Appendix B, we also provide some experiments demonstrating the efficiency of the simplified algorithm, as well as discussion on thresholding strategies.

#### Acknowledgments

We thank Suyi Wang for generously helping with the software. The Enzo dataset used in our experiments is obtained from [1]. The Bone dataset is obtained from [2]. This work is supported by National Science Foundation under grants CCF-1526513, CCF-1618247, and CCF-1740761, and by National Institute of Health under grant R01EB022899.

## References

• [1] Enzo code is developed by the Laboratory for Computational Astrophysics at the University of California in San Diego (http://lca.ucsd.edu).
• [2] CIBC CT dataset archive.
• [3] M. Aanjaneya, F. Chazal, D. Chen, M. Glisse, L. Guibas, and D. Morozov. Metric graph reconstruction from noisy data. International Journal of Computational Geometry & Applications, 22(04):305–325, 2012.
• [4] D. Attali, M. Glisse, S. Hornus, F. Lazarus, and D. Morozov. Persistence-sensitive simplification of functions on surfaces in linear time. Presented at TOPOINVIS, 9:23–24, 2009.
• [5] U. Bauer, M. Kerber, J. Reininghaus, and H. Wagner. Phat–persistent homology algorithms toolbox. Journal of Symbolic Computation, 78:76–90, 2017.
• [6] U. Bauer, C. Lange, and M. Wardetzky. Optimal topological simplification of discrete functions on surfaces. Discr. Comput. Geom., 47(2):347–377, 2012.
• [7] S. Biasotti, D. Giorgi, M. Spagnuolo, and B. Falcidieno. Reeb graphs for shape analysis and applications. Theoretical Computer Science, 392(1-3):5–22, 2008.
• [8] F. Chazal, R. Huang, and J. Sun. Gromov–hausdorff approximation of filamentary structures using reeb-type graphs. Discr. Comput. Geom., 53(3):621–649, 2015.
• [9] O. Delgado-Friedrichs, V. Robins, and A. Sheppard. Skeletonization and partitioning of digital images using discrete morse theory. IEEE Trans. Pattern Anal. Machine Intelligence, 37(3):654–666, March 2015.
• [10] T. Dey and J. Sun. Defining and computing curve-skeletons with medial geodesic function. In Sympos. Geom. Proc., volume 6, pages 143–152, 2006.
• [11] T. Dey, J. Wang, and Y. Wang. Improved road network reconstruction using discrete morse theory. In Proc. 25th ACM SIGSPATIAL. ACM, 2017.
• [12] T. Dey, J. Wang, and Y. Wang. Graph reconstruction by discrete morse theory. arXiv preprint arXiv:1803.05093, 2018.
• [13] H. Edelsbrunner and J. Harer. Computational Topology - an Introduction. American Mathematical Soc., 2010.
• [14] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discr. Comput. Geom., 28:511–533, 2002.
• [15] R. Forman. Morse theory for cell complexes. Advances in mathematics, 134(1):90–145, 1998.
• [16] X. Ge, I. I Safa, M. Belkin, and Y. Wang. Data skeletonization via reeb graphs. In Advances in Neural Info. Proc. Sys., pages 837–845, 2011.
• [17] A. Gyulassy, M. Duchaineau, V. Natarajan, V. Pascucci, E. Bringa, A. Higginbotham, and B. Hamann. Topologically clean distance fields. IEEE Trans. Visualization Computer Graphics, 13(6):1432–1439, Nov 2007.
• [18] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502–516, 1989.
• [19] B. Kégl and A. Krzyzak. Piecewise linear skeletonization using principal curves. IEEE Trans. Pattern Anal. Machine Intelligence, 24(1):59–74, 2002.
• [20] L. Liu, E. W Chambers, D. Letscher, and T. Ju. Extended grassfire transform on medial axes of 2d shapes. Computer-Aided Design, 43(11):1496–1505, 2011.
• [21] J. Milnor. Morse Theory. Princeton Univ. Press, New Jersey, 1963.
• [22] M. Natali, S. Biasotti, G. Patanè, and B. Falcidieno. Graph-based representations of point clouds. Graphical Models, 73(5):151–164, 2011.
• [23] M. L. Norman, G. L. Bryan, R. Harkness, J. Bordner, D. Reynolds, B. O’Shea, and R. Wagner. Simulating Cosmological Evolution with Enzo. ArXiv e-prints, May 2007. arXiv:0705.1556.
• [24] B. W. O’Shea, G. Bryan, J. Bordner, M. L. Norman, T. Abel, R. Harkness, and A. Kritsuk. Introducing Enzo, an AMR Cosmology Application. ArXiv Astrophysics e-prints, March 2004. arXiv:astro-ph/0403044.
• [25] U. Ozertem and D. Erdogmus. Locally defined principal curves and surfaces. Journal of Machine learning research, 12(Apr):1249–1286, 2011.
• [26] V. Robins, P. J. Wood, and A. P. Sheppard. Theory and algorithms for constructing discrete morse complexes from grayscale digital images. IEEE Trans. Pattern Anal. Machine Intelligence, 33(8):1646–1658, Aug 2011.
• [27] T. Sousbie. The persistent cosmic web and its filamentary structure - I. Theory and implementation. 414:350–383, June 2011. arXiv:1009.4015.
• [28] S. Wang, Y. Wang, and Y. Li. Efficient map reconstruction and augmentation via topological methods. In Proc. 23rd ACM SIGSPATIAL, page 25. ACM, 2015.
• [29] Y. Yan, K. Sykes, E. Chambers, D. Letscher, and T. Ju. Erosion thickness on medial axes of 3d shapes. ACM Trans. on Graphics, 35(4):38, 2016.

## Appendix A Missing proofs

### a.1 Proof of Proposition 5.1

#### Proof of claim (i):

We first show that there can be no critical vertex of from the interior of outside-region . Indeed, suppose there is a critical vertex , then it forms a persistence pair either with or with a critical edge in . The former cannot happen as the only vertex unpaired by the persistence algorithm is the global minimum of the input function , which necessarily lies in the neighborhood . So suppose is the persistent pair containing . It then follows that as it must come after in the lower-star filtration induced by . Hence is necessarily smaller than under our noise model. In other words, the persistence pairing should have already been canceled during the -Morse simplification process. As a result, there cannot be any critical vertex left in .

Next, we argue that there is exactly one critical vertex in the neighborhood in the final discrete gradient vector field . First, note that the global minimum of the PL-function must remain critical, as will be paired with by the persistent homology induced by the lower-star filtration, and the persistence pairing remains after the -Morse cancellation by Lemma 3.2 (ii). We now prove that there cannot be any other critical vertex in .

Assume on the contrary that there is another critical vertex . This means we have a persistence pairing with . It then follows that by our assumption on the noise model for . Recall that the low-star filtration is induced by adding the lower-star of s in order where are sorted in increasing order of -values. Specifically, contains the following sequence: where . Assume that such that (i.e, ). Then is the subcomplex in the filtration when the edge is first included. It follows from the persistence algorithm [14] that, there are two connected components