Computing Minimal Persistent Cycles: Polynomial and Hard Cases

07/10/2019 ∙ by Tamal K. Dey, et al. ∙ The Ohio State University

Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in the purely topological persistence diagrams (also termed barcodes). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with Z_2 coefficients. In addition to proving that it is NP-hard to compute minimal persistent d-cycles (d>1) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term weak pseudomanifolds. For finite intervals from the d-th persistence diagram of a weak (d+1)-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak (d+1)-pseudomanifold to be embedded in R^{d+1} so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data.

1 Introduction

Persistent homology [15], which captures essential topological features of data, has proven to be a useful stable descriptor since Edelsbrunner et al. [16] first proposed the algorithm for its computation. The understanding of topological persistence was later expanded by several works [5, 9, 11, 29] in terms of both theory and computation. To make use of persistent homology, one typically computes a persistence diagram (also called barcode) which is a set of intervals with birth and death points. Besides just utilizing the set of intervals, some applications [13, 28] need persistence diagrams augmented with representative cycles for the intervals for gaining more insight into the data. These representative cycles, termed as persistent cycles [13], have been studied by Wu et al. [28], Obayashi [23], and Dey et al. [13] recently from the view-point of optimality.

Although the original persistence algorithm of Edelsbrunner et al. [16] implicitly computes persistent cycles, it does not necessarily provide minimal ones. In an earlier work [13], we showed that it is NP-hard to compute minimal persistent 1-cycles (cycles for 1-dimensional homology groups) when the given interval is finite. Interestingly, the same for infinite intervals turned out to be computable in polynomial time [13]. This naturally leads to the following questions: Are there other interesting cases beyond dimension 1 for which minimal persistent cycles can be computed in polynomial time? Also, what are the cases that are NP-hard? In this paper, we settle the complexity question for computing minimal persistent cycles with Z_2 coefficients in general dimensions. We first show that when d ≥ 2, computing minimal persistent d-cycles for both finite and infinite intervals is NP-hard in general. We then identify a special but important class of simplicial complexes, which we term weak (d+1)-pseudomanifolds, whose minimal persistent d-cycles can be computed in polynomial time. A weak (d+1)-pseudomanifold (the name is adapted from the commonly accepted name pseudomanifold; see Definition 9) is a generalization of a (d+1)-manifold and is defined as follows:

Definition 1.

A simplicial complex K is a weak (d+1)-pseudomanifold if each d-simplex is a face of no more than two (d+1)-simplices in K.
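Definition 1 can be checked directly by counting cofaces. The following is a minimal sketch, assuming the complex is given by its (d+1)-simplices as vertex tuples; the function name `is_weak_pseudomanifold` and the input encoding are illustrative assumptions, not part of the paper.

```python
from collections import Counter
from itertools import combinations

def is_weak_pseudomanifold(top_simplices, d):
    """Check that every d-simplex is a face of at most two (d+1)-simplices.

    top_simplices: iterable of (d+1)-simplices, each given by its d+2 vertices.
    A d-simplex with no (d+1)-coface satisfies the condition trivially, so
    only the d-faces of the given (d+1)-simplices need to be counted.
    """
    tops = {tuple(sorted(s)) for s in top_simplices}  # drop duplicates
    coface_count = Counter()
    for simplex in tops:
        if len(simplex) != d + 2:
            raise ValueError("expected simplices with d + 2 vertices")
        for face in combinations(simplex, d + 1):
            coface_count[face] += 1
    return all(count <= 2 for count in coface_count.values())

# d = 1: two triangles glued along the edge (1, 2).
two_triangles = [(0, 1, 2), (1, 2, 3)]
# d = 1: three triangles sharing the edge (1, 2), which violates the condition.
book = [(0, 1, 2), (1, 2, 3), (1, 2, 4)]
```

Here `two_triangles` passes the check while `book` fails it, because the edge (1, 2) is a face of three triangles.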

Specifically, we find that if the given complex is a weak (d+1)-pseudomanifold, the problem of computing minimal persistent d-cycles for finite intervals can be cast into a minimal cut problem (see Section 3) due to the fact that persistent cycles of this kind are null-homologous in the complex. However, when d ≥ 2 and the intervals are infinite, the same computation becomes NP-hard (see Section 5). Nonetheless, for infinite intervals, if we assume that the weak (d+1)-pseudomanifold is embedded in R^{d+1}, the minimal persistent cycle problem reduces to a minimal cut problem (see Section 4) and hence belongs to P. Note that a simplicial complex embedded in R^{d+1} is automatically a weak (d+1)-pseudomanifold. Also note that while there is an algorithm [8] in the non-persistence setting which computes minimal d-cycles by minimal cuts, the non-persistence algorithm assumes the (d+1)-complex to be embedded in R^{d+1}. Our algorithm for finite intervals, by contrast, does not need the embedding assumption.

In order to make our statements about the hardness results precise, we let PCYC-FIN_d denote the problem of computing minimal persistent d-cycles for finite intervals when the given simplicial complex is arbitrary, and let PCYC-INF_d denote the same problem for infinite intervals (see the definitions of Problems 1 and 2). We also let WPCYC-FIN_d denote a subproblem of PCYC-FIN_d and let WPCYC-INF_d, WEPCYC-INF_d denote two subproblems of PCYC-INF_d, with the subproblems requiring additional constraints on the given simplicial complex. (For two problems P_1 and P_2, P_2 is a subproblem of P_1 if any instance of P_2 is an instance of P_1 and P_2 asks for computing the same solutions as P_1.) Table 1 lists the hardness results for all problems of interest, where the column “Restriction on K” specifies the additional constraints that the subproblems require on the given simplicial complex K. Note that WPCYC-INF_d being NP-hard trivially implies that PCYC-INF_d is NP-hard.

Problem         Restriction on K                             d      Hardness
PCYC-FIN_d      none                                         ≥ 1    NP-hard
WPCYC-FIN_d     K a weak (d+1)-pseudomanifold                ≥ 1    Polynomial
PCYC-INF_d      none                                         = 1    Polynomial
WPCYC-INF_d     K a weak (d+1)-pseudomanifold                ≥ 2    NP-hard
WEPCYC-INF_d    K a weak (d+1)-pseudomanifold in R^{d+1}     ≥ 2    Polynomial
Table 1: Hardness results for minimal persistent cycle problems with bold results denoting new findings in this paper.

Main contributions.

We summarize our contributions as follows:

  • We prove the NP-hardness of PCYC-FIN_d and WPCYC-INF_d for all d ≥ 2.

  • We present two polynomial time algorithms for WPCYC-FIN_d and WEPCYC-INF_d when d ≥ 2, based on the duality of minimal persistent cycles and minimal cuts. Other than the minimal cut computation, steps in both algorithms run in linear or almost linear time.

1.1 Related works

In the context of computing optimal cycles, most works have been done in the non-persistence setting. These works compute minimal cycles for homology groups of a given simplicial complex. Only very few works address the problem while taking into account the persistence. We review some of the relevant works below.

Minimal cycles for homology groups.

In terms of computing minimal cycles for homology groups, two problems are of most interest: the localization problem and the minimal basis problem. The localization problem asks for computing a minimal cycle in a homology class and the minimal basis problem asks for computing a set of generating cycles for a homology group whose sum of weights is minimal. With Z_2 coefficients, these two problems are in general hard. Specifically, Chambers et al. [4] proved that the localization problem over dimension one is NP-hard when the given simplicial complex is a 2-manifold. Chen and Freedman [8] proved that the localization problem is NP-hard to approximate with fixed ratio over arbitrary dimension. They also showed that the minimal basis problem is NP-hard to approximate with fixed ratio over dimension greater than one. For one-dimensional homology, Dey et al. [14] proposed a polynomial time algorithm for the minimal basis problem. Several other works [7, 12, 18] address variants of the two problems while considering special input classes, alternative cycle measures, or coefficients for homology other than Z_2.

In this work, we use graph cuts and their duality extensively. The duality of cuts on a planar graph and separating cycles on the dual graph has long been utilized to efficiently compute maximal flows and minimal cuts on planar graphs, a topic for which Chambers et al. [4] provide a comprehensive review. In their paper [4], Chambers et al. discovered the duality between minimal cuts of a surface-embedded graph and minimal homologous cycles in a dual complex, and then devised algorithms for both problems assuming the genus of the surface to be fixed. Chen and Freedman [8] proposed an algorithm which computes a minimal non-bounding d-cycle given a (d+1)-complex embedded in R^{d+1}, utilizing a natural duality of d-cycles in the complex and cuts in the dual graph. The minimal non-bounding cycle algorithm can be further extended to solve the localization problem and the minimal basis problem over dimension d given a (d+1)-complex embedded in R^{d+1}.

Persistent cycle.

As pointed out earlier, our main focus is the optimality of representative cycles in the persistence framework. Some early works [17, 19] address the representative cycle problem for persistence by computing minimal cycles at the birth points of intervals without considering what actually dies at the death points. Wu et al. [28] proposed an algorithm computing minimal persistent 1-cycles for finite intervals using an annotation technique and heuristic search. However, the time complexity of the algorithm is exponential in the worst case. Obayashi [23] casts the minimal persistent cycle problem for finite intervals into an integer program, but the rounded result of the relaxed linear program is not guaranteed to be optimal. Dey et al. [13] formalized the definition of persistent cycles for both finite and infinite intervals. They also proved the NP-hardness of computing minimal persistent 1-cycles for finite intervals and proposed a polynomial time algorithm for computing non-optimal ones which are still good in practice.

2 Preliminaries

In this section we present some concepts necessary for presenting the results in this paper.

Simplicial complex and filtration.

A simplicial complex K is a collection of simplices which are abstractly defined as subsets of a ground set called the vertex set of K. If a simplex σ is in K, then all its subsets, called its faces, are also in K. The simplex σ is also referred to as a q-simplex if the cardinality of the vertex set of σ is q+1. A q-face of σ is a q-simplex that is a face of σ and a q-coface of σ is a q-simplex having σ as a face. A simplicial set Σ is a set of simplices. The closure of a simplicial set Σ is the simplicial complex consisting of all the faces of the simplices in Σ. A simplicial complex is finite if it contains finitely many simplices. In this paper, we only consider finite simplicial complexes.
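As a small illustration of the closure operation, the sketch below represents an abstract simplex as a sorted tuple of vertices; this encoding and the name `closure` are assumptions made only for the example.

```python
from itertools import combinations

def closure(simplicial_set):
    """Return all non-empty faces of the given simplices as sorted vertex tuples."""
    complex_ = set()
    for simplex in simplicial_set:
        verts = tuple(sorted(simplex))
        for size in range(1, len(verts) + 1):
            # Every non-empty vertex subset of a simplex is one of its faces.
            complex_.update(combinations(verts, size))
    return complex_

# Closure of two triangles sharing the edge (1, 2):
# 4 vertices, 5 edges, and 2 triangles.
K = closure([(0, 1, 2), (1, 2, 3)])
```

A q-simplex corresponds here to a tuple of length q+1, matching the cardinality convention above.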

If each vertex of a simplicial complex K is a point in a Euclidean space, then each simplex of K can be interpreted as the convex hull of its vertices. The simplicial complex K is said to be embedded in the Euclidean space if the interiors of all its simplices are disjoint. The underlying space of K, denoted by |K|, is the point-wise union of all the simplices of K.

A filtration of a simplicial complex K is a filtered sequence of subcomplexes of K, F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K, such that K_i and K_{i−1} differ by one simplex denoted by σ_i^F. We let i be the index of σ_i^F in F and denote it as ind(σ_i^F) = i. A subcomplex in the filtered sequence of F is also referred to as a partial complex.

Homology.

In this paper, two coefficient rings, Z_2 and Z, are used for simplicial homology. When not explicitly stated, the coefficients are assumed to be in Z_2. For a simplicial complex K, C_q(K) denotes the q-th chain group, Z_q(K) denotes the q-th cycle group, B_q(K) denotes the q-th boundary group, and H_q(K) denotes the q-th homology group. The boundary operator for simplicial chains is denoted by ∂. With Z_2 coefficients, a q-cycle is a set of q-simplices such that every (q−1)-face of these simplices adjoins an even number of the q-simplices. We recommend the book by Hatcher [21] for more details on homology groups and algebraic topology in general.
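With Z_2 coefficients a chain is just a set of simplices, and the boundary is a symmetric difference of face sets. The sketch below illustrates this for q ≥ 1; the names `boundary_mod2` and `is_cycle_mod2` and the tuple encoding of simplices are assumptions for the example.

```python
from itertools import combinations

def boundary_mod2(chain):
    """Z_2 boundary of a q-chain (q >= 1) given as a set of vertex tuples.

    A (q-1)-face survives exactly when it adjoins an odd number of the
    q-simplices in the chain, which is what toggling membership computes.
    """
    result = set()
    for simplex in chain:
        verts = tuple(sorted(simplex))
        for face in combinations(verts, len(verts) - 1):
            result ^= {face}  # toggle the face in or out of the result
    return result

def is_cycle_mod2(chain):
    """A chain is a cycle iff every face adjoins an even number of its simplices."""
    return not boundary_mod2(chain)

triangle_edges = {(0, 1), (1, 2), (0, 2)}  # a 1-cycle
path_edges = {(0, 1), (1, 2)}              # not a cycle
```

For instance, `boundary_mod2(path_edges)` consists of the two end vertices of the path, while the three edges of a triangle have empty boundary.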

Definition 2 (q-weighted).

A simplicial complex K is q-weighted if each q-simplex σ of K has a non-negative finite weight w(σ). The weight of a q-chain A of K is then defined as w(A) = Σ_{σ∈A} w(σ).

Definition 3 (q-connected).

Let K be a simplicial complex. For q ≥ 1, two q-simplices σ and σ' of K are q-connected in K if there is a sequence of q-simplices of K, (σ_0, …, σ_l), such that σ_0 = σ, σ_l = σ', and for all 0 ≤ i < l, σ_i and σ_{i+1} share a (q−1)-face. The property of q-connectedness defines an equivalence relation on the q-simplices of K. Each set in the partition induced by the equivalence relation constitutes a q-connected component of K. We say K is q-connected if any two q-simplices of K are q-connected in K.

Remark 1.

See Figure 2(a) for an example of 1-connected components and 2-connected components.
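The q-connected components of Definition 3 can be computed with a union-find structure that merges q-simplices sharing a (q−1)-face. This is a minimal sketch under the assumption that the q-simplices are given as vertex tuples; `q_connected_components` is a hypothetical helper name.

```python
from itertools import combinations

def q_connected_components(q_simplices):
    """Partition q-simplices into q-connected components (shared (q-1)-faces)."""
    simplices = [tuple(sorted(s)) for s in q_simplices]
    parent = list(range(len(simplices)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (q-1)-face -> index of the first q-simplex containing it
    for i, simplex in enumerate(simplices):
        for face in combinations(simplex, len(simplex) - 1):
            if face in first_seen:
                parent[find(i)] = find(first_seen[face])
            else:
                first_seen[face] = i

    groups = {}
    for i, simplex in enumerate(simplices):
        groups.setdefault(find(i), set()).add(simplex)
    return list(groups.values())

# Two triangles share an edge; the third touches them only at vertex 3.
triangles = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
components = q_connected_components(triangles)
```

Sharing only a vertex is not enough for 2-connectedness, so the example yields two components.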

Definition 4 (q-connected cycle).

A q-cycle ζ (with Z_2 coefficients) is q-connected if the complex derived by taking the closure of the simplicial set ζ is q-connected.

Persistent homology.

We provide a brief description of persistent homology. We recommend the book by Edelsbrunner and Harer [15] for a detailed explanation of this topic and the book by Chazal et al. [6] for its underlying mathematical structure, the persistence module. Note that persistent homology in this paper is always assumed to be with Z_2 coefficients. The persistence algorithm starts with a filtration F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K of a simplicial complex K, and for each simplex σ_i^F, inspects whether ∂(σ_i^F) is a boundary in K_{i−1}. If ∂(σ_i^F) is a boundary in K_{i−1}, σ_i^F is called positive; otherwise, it is called negative. The q-chains (or q-cycles) in K_i that are not in K_{i−1} are said to be born in K_i or created by σ_i^F. A positive q-simplex creates some q-cycles and a negative q-simplex makes some (q−1)-cycles become boundaries. What is central to the persistence algorithm is a notion called pairing: A positive simplex is initially unpaired when introduced; when a negative q-simplex σ_j^F comes, the algorithm finds a (q−1)-cycle created by an unpaired positive (q−1)-simplex σ_i^F which is homologous to ∂(σ_j^F) and pairs σ_i^F with σ_j^F. Alongside the pairing, a finite interval [i, j) is added to the (q−1)-th persistence diagram, which is denoted by D_{q−1}(F). After all simplices are processed, some positive simplices may still be unpaired. For each of these unpaired simplices σ_i^F, an infinite interval [i, +∞) is added to D_q(F), where q is the dimension of σ_i^F.
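The pairing can be illustrated with the standard Z_2 boundary-matrix reduction, which is one common way to realize the persistence algorithm and is not necessarily the exact variant of [16]. In this sketch the function name `persistence_intervals` and the input encoding (a list of vertex tuples, with every face listed before its cofaces) are assumptions.

```python
from itertools import combinations

def persistence_intervals(filtration):
    """Pair simplices by Z_2 column reduction.

    filtration[i - 1] is the simplex with index i. Returns (finite, infinite),
    where finite holds triples (dimension, birth, death) for intervals
    [birth, death) and infinite holds pairs (dimension, birth).
    """
    simplices = [tuple(sorted(s)) for s in filtration]
    index = {s: i for i, s in enumerate(simplices, start=1)}
    reduced = {}  # pivot index -> reduced boundary column (set of indices)
    finite, paired = [], set()
    for j, simplex in enumerate(simplices, start=1):
        column = set()
        if len(simplex) > 1:
            column = {index[f] for f in combinations(simplex, len(simplex) - 1)}
        # Add earlier reduced columns until the pivot is new or the column is zero.
        while column and max(column) in reduced:
            column ^= reduced[max(column)]
        if column:  # negative simplex: it kills the class born at the pivot
            i = max(column)
            reduced[i] = column
            finite.append((len(simplex) - 2, i, j))
            paired.update((i, j))
    infinite = [(len(s) - 1, i) for i, s in enumerate(simplices, start=1)
                if i not in paired]
    return finite, infinite

# A triangle built vertex by vertex, then edge by edge, then filled in.
filtration = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
finite, infinite = persistence_intervals(filtration)
```

In this example the edge with index 6 is positive and is paired with the triangle with index 7, giving the interval [6, 7) in the 1st persistence diagram.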

We can now formally define the persistent cycle problems:

Problem 1 (PCYC-FIN_d).

Given a finite d-weighted simplicial complex K, a filtration F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K, and a finite interval [β, δ) ∈ D_d(F), this problem asks for computing a d-cycle with the minimal weight which is born in K_β and becomes a boundary in K_δ.

Problem 2 (PCYC-INF_d).

Given a finite d-weighted simplicial complex K, a filtration F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K, and an infinite interval [β, +∞) ∈ D_d(F), this problem asks for computing a d-cycle with the minimal weight which is born in K_β.

Remark 2.

The definitions of the above two problems are derived directly from the definition of persistent d-cycles [13].

Undirected flow network.

An undirected flow network (G, s_1, s_2) consists of an undirected graph G with vertex set V(G) and edge set E(G), a capacity function c: E(G) → [0, +∞], and two non-empty disjoint subsets s_1 and s_2 of V(G). Vertices in s_1 are referred to as sources and vertices in s_2 are referred to as sinks. A cut (S, T) of (G, s_1, s_2) consists of two disjoint subsets S and T of V(G) such that S ∪ T = V(G), s_1 ⊆ S, and s_2 ⊆ T. We define the set of edges across the cut as

ξ(S, T) = { e ∈ E(G) | e connects a vertex in S and a vertex in T }.

The capacity of a cut (S, T) is defined as c(S, T) = Σ_{e∈ξ(S,T)} c(e). A minimal cut of (G, s_1, s_2) is a cut with the minimal capacity. Note that we allow parallel edges in G (see Figure 1(a)) to ease the presentation. These parallel edges can be merged into one edge during computation.
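To make the definitions of the edges across a cut and of the cut capacity concrete, the sketch below evaluates them on a tiny graph and finds a minimal cut by exhaustive search. The exhaustive search is exponential and only meant for illustration; the function names and the edge-list encoding `(u, v, capacity)` are assumptions.

```python
from itertools import combinations

def edges_across(edges, S):
    """Edges with exactly one end vertex in S; edges is a list of (u, v, capacity)."""
    return [(u, v, c) for u, v, c in edges if (u in S) != (v in S)]

def cut_capacity(edges, S):
    """Sum of capacities of the edges across the cut (S, complement of S)."""
    return sum(c for _, _, c in edges_across(edges, S))

def brute_force_min_cut(vertices, edges, sources, sinks):
    """Try every S with sources inside S and sinks outside S; return (capacity, S)."""
    free = [v for v in vertices if v not in sources and v not in sinks]
    best = None
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            S = set(sources) | set(extra)
            capacity = cut_capacity(edges, S)
            if best is None or capacity < best[0]:
                best = (capacity, S)
    return best

vertices = ["a", "b", "c", "d"]
# Parallel edges between "a" and "b" are allowed and simply add up.
edges = [("a", "b", 2), ("a", "b", 1), ("a", "c", 1),
         ("b", "d", 1), ("c", "d", 4), ("b", "c", 1)]
best_capacity, best_S = brute_force_min_cut(vertices, edges, {"a"}, {"d"})
```

In practice a max-flow algorithm replaces the exhaustive search, but the returned pair has the same meaning: the capacity of a minimal cut and its source side S.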

3 Minimal persistent d-cycles of finite intervals for weak (d+1)-pseudomanifolds

Figure 1: An example of the constructions in our algorithm showing the duality between persistent cycles and cuts having finite capacity for d = 1. (a) The input weak 2-pseudomanifold K with its dual flow network drawn in blue, where the central hollow vertex denotes the dummy vertex, the red vertex denotes the source, and all the black vertices (including the dummy one) denote the sinks. All “dangled” graph edges dual to the outer boundary 1-simplices actually connect to the dummy vertex and these connections are not drawn. (b) The partial complex K_β in the input filtration F, where the bold green 1-simplex denotes σ_β^F which creates the green 1-cycle. (c) The partial complex K_δ in F, where the 2-simplex σ_δ^F creates the pink 2-chain killing the green 1-cycle. (d) The green persistent 1-cycle of the interval [β, δ) is dual to a cut (S, T) having finite capacity, where S contains all the vertices inside the pink 2-chain and T contains all the other vertices. The red graph edges denote those edges across (S, T) and their dual 1-chain is the green persistent 1-cycle.

In this section, we present an algorithm which computes minimal persistent d-cycles for finite intervals given a filtration of a weak (d+1)-pseudomanifold when d ≥ 1. The general process is as follows: Suppose the input weak (d+1)-pseudomanifold is K, which is associated with a filtration F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K, and the task is to compute the minimal persistent cycle of a finite interval [β, δ) ∈ D_d(F). We first construct an undirected dual graph G for K where vertices of G are dual to (d+1)-simplices of K and edges of G are dual to d-simplices of K. One dummy vertex, termed the infinite vertex, which does not correspond to any (d+1)-simplex, is added to G for graph edges dual to those boundary d-simplices. We then build an undirected flow network on top of G where the source is the vertex dual to σ_δ^F and the sink is the infinite vertex along with the set of vertices dual to those (d+1)-simplices which are added to F after σ_δ^F. If a d-simplex is σ_β^F or is added to F before σ_β^F, we let the capacity of its dual graph edge be its weight; otherwise, we let the capacity of its dual graph edge be +∞. Finally, we calculate a minimal cut of this flow network and return the d-chain dual to the edges across the minimal cut as a minimal persistent cycle of the interval.

The intuition of the above algorithm is best explained by the example in Figure 1, where d = 1. The key to the algorithm is the duality between persistent cycles of the input interval and cuts of the dual flow network having finite capacity. To see this duality, first consider a persistent d-cycle ζ of the input interval [β, δ). There exists a (d+1)-chain A in K_δ created by σ_δ^F whose boundary equals ζ, so that ζ is killed. We can let S be the set of graph vertices dual to the simplices in A and let T be the set of the remaining graph vertices; then (S, T) is a cut. Furthermore, (S, T) must have finite capacity as the edges across it are exactly dual to the d-simplices in ζ and the d-simplices in ζ have indices in F less than or equal to β. On the other hand, let (S, T) be a cut with finite capacity; then the (d+1)-chain whose simplices are dual to the vertices in S is created by σ_δ^F. Taking the boundary of this (d+1)-chain, we get a d-cycle ζ. Because the d-simplices of ζ are exactly dual to the edges across (S, T) and each edge across (S, T) has finite capacity, ζ must reside in K_β. We only need to ensure that ζ contains σ_β^F in order to show that ζ is a persistent cycle of [β, δ). In Section 3.2, we argue that ζ actually contains σ_β^F, so ζ is indeed a persistent cycle. Note that while the above explanation introduces the general idea, the rigorous statement and proof of the duality are given by Propositions 2 and 3.

In the dual graph, an edge is created for each d-simplex. If a d-simplex has two (d+1)-cofaces, we simply let its dual graph edge connect the two vertices dual to its two (d+1)-cofaces; otherwise, its dual graph edge has to connect to the infinite vertex on one end. A problem with this construction is that some weak (d+1)-pseudomanifolds may have d-simplices that are faces of no (d+1)-simplices, and these d-simplices may create self loops around the infinite vertex. To avoid self loops, we simply ignore these d-simplices by constructing the dual graph only from the (d+1)-connected component of K containing σ_δ^F. The reason why we can ignore these d-simplices is that they cannot be on the boundary of a (d+1)-chain and hence cannot be on a persistent cycle of minimal weight. Note that taking the (d+1)-connected component may also reduce the size of the dual graph.

We list the pseudo-code in Algorithm 1 and it works as follows: Lines 2 and 3 set up a complex K̃ that the algorithm mainly works on, where K̃ is taken as the closure of the (d+1)-connected component of K containing σ_δ^F. Line 4 constructs the dual graph G from K̃ and lines 5-13 build the flow network on top of G. Note that we denote the infinite vertex by φ. Line 14 computes a minimal cut for the flow network and line 15 returns the d-chain dual to the edges across the minimal cut. In the pseudo-codes of this paper, to make the presentation of algorithms and some proofs easier, we treat a mathematical function as a computer program object. For example, the function θ returned by DualGraphFin in Algorithm 1 denotes the correspondence between the simplices of K̃ and their dual vertices or edges (see Section 3.1 for details). In practice, these constructs can be easily implemented in any programming language.

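Input:
    K: finite d-weighted weak (d+1)-pseudomanifold
    d: integer
    F: filtration K_0 ⊂ K_1 ⊂ … ⊂ K_n of K
    [β, δ): finite interval of D_d(F)

Output:
    minimal persistent d-cycle of [β, δ)

1: procedure MinPersCycFin(K, d, F, [β, δ))
2:     L^{d+1} ← (d+1)-connected component of K containing σ_δ^F    ▷ set up K̃
3:     K̃ ← closure of the simplicial set L^{d+1}
4:     (G, θ) ← DualGraphFin(K̃, d)    ▷ construct dual graph
5:     for each e ∈ E(G) do    ▷ assign capacity to G
6:         if ind(θ^{-1}(e)) ≤ β then
7:             c(e) ← w(θ^{-1}(e))
8:         else
9:             c(e) ← +∞
10:     s_1 ← {θ(σ_δ^F)}    ▷ set the source
11:     s_2 ← {v ∈ V(G) | v ≠ φ, ind(θ^{-1}(v)) > δ}    ▷ set the sink
12:     if φ ∈ V(G) then
13:         s_2 ← s_2 ∪ {φ}
14:     (S*, T*) ← min-cut of (G, s_1, s_2)
15:     return θ^{-1}(ξ(S*, T*))
Algorithm 1 Computing minimal persistent d-cycles of finite intervals for weak (d+1)-pseudomanifolds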
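The following Python sketch mirrors the steps of Algorithm 1 on a small example. It is an illustration under stated assumptions rather than the authors' implementation: simplices are sorted vertex tuples, `filtration[i - 1]` is the simplex with index i, the minimal cut is computed with a plain Edmonds-Karp max-flow instead of Orlin's algorithm, and all function names are hypothetical.

```python
import math
from collections import defaultdict, deque
from itertools import combinations

INF = math.inf
PHI = "phi"  # stands for the infinite (dummy) vertex

def min_cut_source_side(edges, sources, sinks):
    """Edmonds-Karp max-flow; returns the source side S of a minimal cut."""
    cap = defaultdict(lambda: defaultdict(float))
    for u, v, c in edges:  # undirected edge: capacity in both directions
        cap[u][v] += c
        cap[v][u] += c
    for s in sources:
        cap["_src"][s] = INF
    for t in sinks:
        cap[t]["_snk"] = INF
    while True:
        pred, queue = {"_src": None}, deque(["_src"])
        while queue and "_snk" not in pred:  # BFS in the residual graph
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in pred:
                    pred[v] = u
                    queue.append(v)
        if "_snk" not in pred:  # no augmenting path: reachable vertices form S
            return set(pred) - {"_src"}
        path, v = [], "_snk"
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        push = min(cap[u][v] for u, v in path)
        if push == INF:
            raise ValueError("every cut has infinite capacity")
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

def min_pers_cyc_fin(filtration, weights, d, beta, delta):
    """Return a minimal persistent d-cycle of [beta, delta) as a set of d-simplices."""
    simplices = [tuple(sorted(s)) for s in filtration]
    ind = {s: i for i, s in enumerate(simplices, start=1)}
    cofaces = defaultdict(list)  # d-simplex -> its (d+1)-cofaces
    for s in simplices:
        if len(s) == d + 2:
            for face in combinations(s, d + 1):
                cofaces[face].append(s)
    # Lines 2-3: (d+1)-connected component containing the simplex with index delta.
    start = simplices[delta - 1]
    component, stack = {start}, [start]
    while stack:
        for face in combinations(stack.pop(), d + 1):
            for other in cofaces[face]:
                if other not in component:
                    component.add(other)
                    stack.append(other)
    # Lines 4-9: dual edges and capacities.
    edges, dual = [], {}
    for face, tops in cofaces.items():
        if tops[0] not in component:
            continue
        u, v = tops[0], (tops[1] if len(tops) == 2 else PHI)
        edges.append((u, v, weights[face] if ind[face] <= beta else INF))
        dual[face] = (u, v)
    # Lines 10-13: source and sinks.
    sinks = {t for t in component if ind[t] > delta}
    if any(v == PHI for _, v, _ in edges):
        sinks.add(PHI)
    # Lines 14-15: minimal cut and the d-chain dual to the edges across it.
    S = min_cut_source_side(edges, {start}, sinks)
    return {face for face, (u, v) in dual.items() if (u in S) != (v in S)}

# A square split by the diagonal (0, 2); [8, 11) is an interval of the 1st diagram.
filtration = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3), (0, 2),
              (0, 1, 2), (0, 2, 3)]
weights = {s: 1 for s in filtration if len(s) == 2}
cycle = min_pers_cyc_fin(filtration, weights, d=1, beta=8, delta=11)
```

In the example the diagonal has index 9 > β, so its dual edge gets capacity +∞ and the minimal cut is forced to separate both triangles from the infinite vertex; the returned cycle is the outer square.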

Complexity.

The time complexity of Algorithm 1 depends on the encoding scheme of the input and the data structure used for representing a simplicial complex. For encodings of the input, we assume K and F to be represented by a sequence of all the simplices of K ordered by their indices in F, where each simplex is denoted by its set of vertices. We also assume a simple yet reasonable simplicial complex data structure as follows: In each dimension, simplices are mapped to integral identifiers ranging from 0 to the number of simplices in that dimension minus 1; each q-simplex has an array (or linked list) storing all the ids of its (q+1)-cofaces; a hash map for each dimension is maintained for the query of the integral id of each simplex in that dimension based on the spanning vertices of the simplex. We further assume d to be constant. By the above assumptions, let n be the size (number of bits) of the encoded input; then there are no more than O(n) elementary operations in lines 2 and 3, so the time complexity of lines 2 and 3 is O(n). It is not hard to verify that the flow network construction also takes O(n) time, so the time complexity of Algorithm 1 is determined by the minimal cut algorithm. Using the max-flow algorithm by Orlin [24], the time complexity of Algorithm 1 becomes O(n^2).

In the rest of this section, we first describe the subroutine DualGraphFin, then close the section by proving the correctness of the algorithm.

3.1 Dual graph construction

In this subsection, we describe the DualGraphFin subroutine of Algorithm 1, which returns a dual graph G and a θ denoting two bijections which we will explain later. Given the input (K̃, d), DualGraphFin constructs an undirected connected graph G as follows:

  • Let each vertex v of V(G) correspond to each (d+1)-simplex σ^{d+1} of K̃. If there is any d-simplex of K̃ which has less than two (d+1)-cofaces in K̃, we add an infinite vertex φ to V(G). Simultaneously, we define a bijection

    θ: {(d+1)-simplices of K̃} → V(G) ∖ {φ}

    by letting θ(σ^{d+1}) = v. Note that in the above range notation of θ, {φ} may not be a subset of V(G).

  • Let each edge e of E(G) correspond to each d-simplex σ^d of K̃. Note that σ^d has at least one (d+1)-coface in K̃. If σ^d has two (d+1)-cofaces σ_0^{d+1} and σ_1^{d+1} in K̃, then let e connect θ(σ_0^{d+1}) and θ(σ_1^{d+1}); if σ^d has one (d+1)-coface σ_0^{d+1} in K̃, then let e connect θ(σ_0^{d+1}) and φ. We define another bijection

    θ: {d-simplices of K̃} → E(G)

    using the same notation as the bijection for V(G), by letting θ(σ^d) = e.

Note that we can take the image of a subset of the domain under a function. Therefore, if (S, T) is a cut for a flow network built on G, then θ^{-1}(ξ(S, T)) denotes the set of d-simplices dual to the edges across the cut. Also note that since simplicial chains with Z_2 coefficients can be interpreted as sets, θ^{-1}(ξ(S, T)) is also a d-chain.

3.2 Algorithm correctness

In this subsection, we prove the correctness of Algorithm 1. Some of the symbols we use refer to Algorithm 1.

Proposition 1.

In Algorithm 1, s_2 is not an empty set.

Proof.

For contradiction, suppose s_2 is an empty set; then φ ∉ V(G) and σ_δ^F is the (d+1)-simplex of K̃ with the greatest index in F. Because φ ∉ V(G), any d-simplex of K̃ must be a face of two (d+1)-simplices of K̃, so the set of (d+1)-simplices of K̃ forms a (d+1)-cycle created by σ_δ^F. Then σ_δ^F must be a positive simplex in F, which is a contradiction. ∎

The following two propositions specify the duality mentioned at the beginning of this section:

Proposition 2.

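For any cut (S, T) of (G, s_1, s_2) with finite capacity, the d-chain ζ = θ^{-1}(ξ(S, T)) is a persistent d-cycle of [β, δ) and w(ζ) = c(S, T).

Proof.

Let A = θ^{-1}(S). We first want to prove ζ = ∂(A), so that ζ is a cycle. Let σ^d be any d-simplex of ζ; then θ(σ^d) connects a vertex u ∈ S and a vertex v ∈ T. If v = φ, then σ^d cannot be a face of another (d+1)-simplex in K̃ other than θ^{-1}(u), so σ^d is a face of exactly one (d+1)-simplex of A. If v ≠ φ, then it is also true that σ^d is a face of exactly one (d+1)-simplex of A, so σ^d ∈ ∂(A). On the other hand, let σ^d be any d-simplex of ∂(A); then σ^d is a face of exactly one (d+1)-simplex σ_0^{d+1} of A. If σ^d is a face of another (d+1)-simplex σ_1^{d+1} in K̃, then σ_0^{d+1} ∈ A and σ_1^{d+1} ∉ A, so θ(σ_0^{d+1}) ∈ S, θ(σ_1^{d+1}) ∈ T, and θ(σ^d) connects θ(σ_0^{d+1}) and θ(σ_1^{d+1}) in G. If σ^d is a face of exactly one (d+1)-simplex in K̃, θ(σ^d) must connect θ(σ_0^{d+1}) and φ in G. So we have θ(σ^d) ∈ ξ(S, T), i.e., σ^d ∈ ζ.

We then show that ζ is created by σ_β^F. Suppose ζ is created by a d-simplex σ^d ≠ σ_β^F. Because c(S, T) is finite, we have that ind(σ^d) < β. We can let ζ' be a persistent cycle of [β, δ) and ζ' = ∂(A') where A' is a (d+1)-chain of K_δ. Then we have ζ + ζ' = ∂(A + A'). Since A and A' are both created by σ_δ^F, A + A' is created by a (d+1)-simplex with an index less than δ in F. So ζ + ζ' is a d-cycle created by σ_β^F which becomes a boundary before σ_δ^F is added. This means that σ_β^F is already paired when σ_δ^F is added, contradicting the fact that σ_β^F is paired with σ_δ^F. Similarly, we can prove that ζ is not a boundary until σ_δ^F is added, so ζ is a persistent cycle of [β, δ). Since (S, T) has finite capacity, we must have

c(S, T) = Σ_{e∈ξ(S,T)} w(θ^{-1}(e)) = w(ζ). ∎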

Proposition 3.

For any persistent d-cycle ζ of [β, δ), there exists a cut (S, T) of (G, s_1, s_2) such that c(S, T) ≤ w(ζ).

Proof.

Let be a -chain in such that . Note that is created by and is the set of -simplices which are face of exactly one -simplex of . Let and , we claim that . To prove this, first let be any -simplex of , then is face of exactly one -simplex of . Because , it is also true that , so . Then is face of exactly one -simplex of , so . On the other hand, let be any -simplex of , then is face of exactly one -simplex of . Note that and we then want to prove that is face of exactly one -simplex of . Suppose is face of another -simplex of , then because . So we have , contradicting the fact that is face of exactly one -simplex of . Then we have . Since , we have , which means that .

Let and , then it is true that is a cut of because is created by . We claim that . The proof of the equality is similar to the one in the proof of Proposition 2. It follows that . We then have that

because each -simplex of has an index less than or equal to in .

Finally, because is a subchain of , we must have . ∎

Combining the above facts, we can conclude:

Theorem 4.

Algorithm 1 computes a minimal persistent d-cycle for the given interval [β, δ).

Proof.

First, the flow network (G, s_1, s_2) constructed by Algorithm 1 must be valid by Proposition 1. Next, because the interval [β, δ) must have a persistent cycle, by Proposition 3, the flow network (G, s_1, s_2) has a cut with finite capacity. This means that c(S*, T*) is finite. By Proposition 2, the chain ζ* = θ^{-1}(ξ(S*, T*)) is a persistent cycle of [β, δ). Suppose ζ* is not a minimal persistent cycle of [β, δ) and instead let ζ' be a minimal persistent cycle of [β, δ). Then there exists a cut (S', T') such that c(S', T') ≤ w(ζ') < w(ζ*) = c(S*, T*) by Propositions 2 and 3, contradicting the fact that (S*, T*) is a minimal cut. ∎

4 Minimal persistent d-cycles of infinite intervals for weak (d+1)-pseudomanifolds embedded in R^{d+1}

We already mentioned that computing minimal persistent d-cycles (d ≥ 2) for infinite intervals is NP-hard even if we restrict to weak (d+1)-pseudomanifolds (see Section 5.3 for a proof). However, when the complex is embedded in R^{d+1}, the problem becomes polynomially tractable. In this section, we present an algorithm for this problem given a weak (d+1)-pseudomanifold embedded in R^{d+1}, when d ≥ 2 (as mentioned earlier, when d = 1, this problem is polynomially tractable for arbitrary complexes [13]). The algorithm uses a duality similar to the one described in Section 3. However, a direct use of the approach in Section 3 does not work. For example, in Figure 2(a), 1-simplices that do not have any 2-cofaces cannot reside in any 2-connected component of the given complex. Hence, no cut in the flow network may correspond to a persistent cycle of the infinite interval created by such a 1-simplex. Furthermore, unlike the finite interval case, we do not have a negative simplex whose dual can act as a source in the flow network.

Figure 2: (a) A weak 2-pseudomanifold K̃ embedded in R^2 with three voids. Its dual graph is drawn in blue. The complex has one 1-connected component and four 2-connected components with the 2-simplices in different 2-connected components colored differently. (b) An example illustrating the pairing of boundary d-simplices in the neighborhood of a (d−1)-simplex for d = 1. The four boundary 1-simplices produce six oriented boundary 1-simplices and the paired oriented 1-simplices are colored the same.

Let (K, F, [β, +∞)) be an input to the problem where K is a weak (d+1)-pseudomanifold embedded in R^{d+1}, F: ∅ = K_0 ⊂ K_1 ⊂ … ⊂ K_n = K is a filtration of K, and [β, +∞) is an infinite interval of D_d(F). By the definition of the problem, the task boils down to computing a minimal d-cycle containing σ_β^F in K_β. Note that K_β is also a weak (d+1)-pseudomanifold embedded in R^{d+1}.

Generically, assume K̃ is an arbitrary weak (d+1)-pseudomanifold embedded in R^{d+1} and we want to compute a minimal d-cycle containing a d-simplex σ̃ for d ≥ 2. By the embedding assumption, the connected components of R^{d+1} ∖ |K̃| are well defined and we call them the voids of K̃. The complex K̃ has a natural (undirected) dual graph structure as exemplified by Figure 2(a) for d = 1, where the graph vertices are dual to the (d+1)-simplices as well as the voids and the graph edges are dual to the d-simplices. The duality between cycles and cuts is as follows: Since the ambient space R^{d+1} is contractible (homotopy equivalent to a point), every d-cycle in K̃ is the boundary of a (d+1)-dimensional region obtained by the point-wise union of certain (d+1)-simplices and/or voids. We can derive a cut of the dual graph by putting all vertices contained in the (d+1)-dimensional region into one vertex set and putting the rest into the other vertex set (the cut here is defined on a graph without sources and sinks, so it is simply a partition of the graph's vertex set into two sets). On the other hand, for every cut of the graph, we can take the point-wise union of all the (d+1)-simplices and voids dual to the graph vertices in one set of the cut and derive a (d+1)-dimensional region. The boundary of the derived (d+1)-dimensional region is then a d-cycle in K̃. We observe that by making the source and sink dual to the two (d+1)-simplices or voids that σ̃ adjoins, we can build a flow network where a minimal cut produces a minimal d-cycle in K̃ containing σ̃.

The efficiency of the above algorithm is in part determined by the efficiency of the dual graph construction. This step requires identifying the voids that the boundary d-simplices are incident on. A straightforward approach would be to first group the boundary d-simplices into d-cycles by local geometry, and then build the nesting structure of these d-cycles to correctly reconstruct the boundaries of the voids. This approach has a quadratic worst-case complexity. To make the void boundary reconstruction faster, we assume that the simplicial complex being worked on is d-connected so that building the nesting structure is not needed. Our reconstruction then runs in almost linear time. To satisfy the d-connectedness assumption, we begin our algorithm by taking K̃ as a d-connected subcomplex of K_β containing σ_β^F and continue only with this K̃. The computed output is still correct because the minimal cycle in K̃ is again a minimal cycle in K_β as shown in Section 4.2.

We list the pseudo-code in Algorithm 2 and it works as follows: Lines 2-5 set up the complex K̃ that the algorithm works on. Line 2 prunes K_β to produce a complex K'. Given (K_β, d), the Prune subroutine iteratively deletes a d-simplex σ^d of K_β such that there is a (d−1)-face of σ^d having σ^d as the only d-coface (i.e., σ^d is a dangled d-simplex), until no such d-simplex can be found. It is not hard to verify that Prune only deletes d-simplices not residing in any d-cycles, so a minimal d-cycle containing σ_β^F is never deleted. We perform the pruning because it can reduce the graph size for the minimal cut computation, which is the more time-consuming step. In lines 3-5, we take the d-connected component L^d of K' containing σ_β^F and add a set of (d+1)-simplices L^{d+1} to the closure of L^d to form K̃. The set L^{d+1} contains all (d+1)-simplices of K' whose d-faces reside in L^d. The reason for adding the set L^{d+1} is to reduce the number of voids for the complex and in turn reduce the running time of the subsequent void boundary reconstruction. For example, in Figure 3(b), we could treat the entire complex as K', all 1-simplices as L^d, and all 2-simplices as L^{d+1}. If we do not add L^{d+1} to the closure of L^d, there will be seven more voids corresponding to the boundaries of the seven 2-simplices. Line 6 reconstructs the void boundaries for K̃. Each returned C_i denotes a set of d-simplices forming the boundary of a void. As indicated in Section 4.1, the d-simplices in a void boundary are oriented. Line 7 constructs the dual graph G based on the reconstructed void boundaries. Similar to Algorithm 1, the function θ returned by DualGraphInf denotes the bijection from the d-simplices of K̃ to E(G). Lines 8-12 build the flow network on top of G. The capacity of each edge is equal to the weight of its dual d-simplex and the source and sink are selected as previously described. Line 13 computes a minimal cut for the flow network and line 14 returns the d-chain dual to the edges across the minimal cut.
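The Prune step described above can be sketched with a worklist of (d−1)-simplices that have exactly one remaining d-coface. This is a minimal illustration, assuming the d-simplices are passed as vertex tuples; the function name `prune` and this interface are assumptions, and (d+1)-simplices are ignored because a dangled d-simplex cannot have a (d+1)-coface.

```python
from collections import defaultdict
from itertools import combinations

def prune(d_simplices):
    """Iteratively delete dangled d-simplices and return the surviving ones.

    A d-simplex is dangled if one of its (d-1)-faces has it as the only
    remaining d-coface; such a simplex cannot lie on any d-cycle.
    """
    alive = {tuple(sorted(s)) for s in d_simplices}
    cofaces = defaultdict(set)  # (d-1)-simplex -> remaining d-cofaces
    for simplex in alive:
        for face in combinations(simplex, len(simplex) - 1):
            cofaces[face].add(simplex)
    worklist = [face for face, cs in cofaces.items() if len(cs) == 1]
    while worklist:
        face = worklist.pop()
        if len(cofaces[face]) != 1:
            continue  # stale entry: its last coface was already deleted
        (simplex,) = cofaces[face]
        alive.discard(simplex)
        for other_face in combinations(simplex, len(simplex) - 1):
            cofaces[other_face].discard(simplex)
            if len(cofaces[other_face]) == 1:
                worklist.append(other_face)  # deletion exposed a new dangled simplex
    return alive

# d = 1: a triangle with a two-edge tail attached at vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
pruned = prune(edges)
```

Each d-simplex is deleted at most once and each face is re-examined only when one of its cofaces disappears, so the sketch runs in time linear in the number of face incidences for constant d.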

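Input:
    a finite d-weighted weak (d+1)-pseudomanifold embedded in R^{d+1}
    the integer d
    a filtration of the complex
    an infinite interval of the d-th persistence diagram of the filtration
Output:
    a minimal persistent d-cycle of the interval

1: procedure MinPersCycInf(complex, d, filtration, interval)
2:     pruned complex ← Prune applied to the complex at the birth index of the interval    ▷ set up the working complex
3:     component ← d-connected component of the pruned complex containing the creating d-simplex
4:     added set ← (d+1)-simplices of the complex at the birth index whose d-faces reside in the component
5:     working complex ← closure of the simplicial set formed by the component and the added set
6:     void boundaries ← VoidBoundary applied to the working complex    ▷ construct dual graph
7:     (dual graph, bijection) ← DualGraphInf applied to the working complex and the void boundaries
8:     for each edge of the dual graph do    ▷ assign capacity to the dual graph
9:         capacity of the edge ← weight of its dual d-simplex
10:     (two vertices) ← end vertices of the edge dual to the creating d-simplex in the dual graph
11:     source ← one of the two vertices    ▷ set the source
12:     sink ← the other vertex    ▷ set the sink
13:     minimal cut ← min-cut of the flow network
14:     return the d-chain dual to the edges across the minimal cut
Algorithm 2 Computing minimal persistent d-cycles of infinite intervals for weak (d+1)-pseudomanifolds embedded in R^{d+1}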
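The last steps of Algorithm 2 (capacities, minimal cut, and the chain dual to the cut edges) can be illustrated on a small dual graph. The sketch below is an assumption-laden stand-in: it uses the Edmonds-Karp augmenting-path method rather than Orlin's algorithm cited in the text, and the function name `min_cut_simplices`, the dictionary encoding of the dual graph, and the simplex labels are all hypothetical.

```python
from collections import defaultdict, deque


def min_cut_simplices(dual_edges, source, sink):
    """Return (cut weight, labels of simplices whose dual edges cross a minimal cut).

    dual_edges maps a simplex label to (u, v, weight), where u and v are the
    dual-graph vertices joined by the edge dual to that simplex.
    """
    # An undirected edge of capacity w is modeled as two opposite arcs of capacity w.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, w in dual_edges.values():
        residual[u][v] += w
        residual[v][u] += w

    def reachable_from_source():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_from_source()
        if sink not in parents:
            break  # no augmenting path is left, so the flow is maximal
        # Find the bottleneck capacity along the shortest augmenting path.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push the bottleneck amount of flow along that path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

    # Vertices still reachable from the source form the source side of the cut.
    source_side = set(parents)
    cut = {label for label, (u, v, _) in dual_edges.items()
           if (u in source_side) != (v in source_side)}
    return sum(dual_edges[label][2] for label in cut), cut
```

In this encoding, the returned label set plays the role of the d-chain dual to the edges across the minimal cut.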

Complexity.

We make the same assumptions as in the complexity analysis for Algorithm 1. Since the void boundary reconstruction needs to sort the d-cofaces of certain (d-1)-simplices, its worst-case running time is that of sorting, which is almost linear. All operations other than the minimal cut computation then also run in almost linear time. Therefore, similar to Algorithm 1, the overall complexity of Algorithm 2 is dominated by the minimal cut computation, for which we use Orlin’s max-flow algorithm [24].

In the rest of this section, we first describe the subroutine VoidBoundary invoked by Algorithm 2 and then prove the correctness of the algorithm.

4.1 Void boundary reconstruction

Figure 3: Examples showing how the void boundaries are reconstructed for d = 1. (a) Oriented boundary 1-simplices are grouped into six 1-cycles, and these six 1-cycles are further grouped into four void boundaries, with each void boundary identically colored. (b) With the complex being 1-connected, the four grouped 1-cycles are exactly the boundaries of the four voids.

As previously stated, the objective of the reconstruction is to identify which voids a boundary d-simplex of the complex that Algorithm 2 works on is incident on. The task becomes complicated because a void may have disconnected boundaries and a d-simplex may bound more than one void. This is exemplified in Figure 3(a). To address this issue, we orient the boundary d-simplices and determine the orientations consistently from the voids they bound. This is possible because an orientation of a d-simplex in R^{d+1} associates exactly one of its two sides to the d-simplex. To describe the boundary reconstruction procedure, we define a boundary d-simplex of the working complex as a d-simplex with fewer than two (d+1)-cofaces in the working complex. We also give the set of boundary d-simplices of the working complex its own notation. To reconstruct the boundaries, we first inspect the neighborhood of each (d-1)-simplex that is a face of a boundary d-simplex and pair the oriented boundary d-simplices in the neighborhood which locally bound the same void. Figure (b) gives an example of the pairing of oriented boundary d-simplices for d = 1. In Figure (b), there are three local voids, each colored differently. The oriented 1-simplices with the same color bound the same void and are paired.

After pairing the oriented boundary d-simplices, we group them by putting paired ones into the same group. Each group then forms a d-cycle (with Z coefficients). This is exemplified by Figure 3 for d = 1. Note that in general, the above grouping does not fully reconstruct the void boundaries. This can be seen from Figure 3(a), where the complex has four voids but the grouping produces six 1-cycles. In order to fully reconstruct the boundaries, one has to retrieve the nesting structure of these d-cycles, which may take quadratic time in the worst case. However, as we work on a complex that is d-connected, we cannot have voids with disconnected boundaries. Therefore, the grouping of oriented d-simplices can fully recover the void boundaries. Figure 3(b) gives an example of this for d = 1, where we add two 1-simplices to make the complex 1-connected. The four 1-cycles produced by the grouping are exactly the boundaries of the four voids.
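The grouping step can be sketched with a union-find structure: paired oriented simplices are merged, and each resulting class is one group. This is a hedged illustration only. The pairs are assumed to come from the local pairing step described in the text, and the function name `group_paired` and the encoding of oriented 1-simplices as ordered vertex tuples are hypothetical.

```python
def group_paired(oriented_simplices, pairs):
    """Group oriented simplices so that paired ones land in the same group."""
    parent = {s: s for s in oriented_simplices}

    def find(s):
        # Follow parent links to the representative, halving the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for s in oriented_simplices:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


# Usage for d = 1: a square and a triangle, each paired around its own vertices.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(4, 5), (5, 6), (6, 4)]
pairs = [(square[i], square[(i + 1) % 4]) for i in range(4)]
pairs += [(triangle[i], triangle[(i + 1) % 3]) for i in range(3)]
groups = group_paired(square + triangle, pairs)
```

Under the d-connectedness assumption stated above, each such group would correspond to one void boundary; without it, several groups may belong to the same void.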

In the rest of this subsection, we formalize the above ideas for reconstructing void boundaries and prove their correctness. Throughout this subsection, the working complex and d are as defined in Algorithm 2. We first recall the definition of oriented simplices:

Definition 5 (Oriented simplex [22]).

An oriented q-simplex is a q-simplex with an ordering of its vertices. For each q-simplex with q ≥ 1, there are exactly two equivalence classes of vertex orderings, resulting in two oriented q-simplices of the same underlying simplex. We refer to them as the oppositely oriented q-simplices.

Remark 3.

Any simplex is by default unoriented. We use distinct notations for an unoriented q-simplex spanned by a set of vertices and for an oriented q-simplex, where the latter specifies the ordering of the spanning vertices.
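The two equivalence classes of vertex orderings can be made concrete with permutation parity: under the standard convention, assumed here, two orderings of the same vertices define the same oriented simplex exactly when they differ by an even permutation. The function names `permutation_sign` and `same_orientation` are hypothetical helpers for this sketch.

```python
def permutation_sign(ordering, reference):
    """Return +1 if `ordering` is an even permutation of `reference`, else -1."""
    position = {v: i for i, v in enumerate(reference)}
    perm = [position[v] for v in ordering]
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk one cycle of the permutation; an even-length cycle flips the sign.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign


def same_orientation(ordering_a, ordering_b):
    """Check whether two vertex orderings define the same oriented simplex."""
    return permutation_sign(ordering_a, ordering_b) == 1
```

For a 2-simplex, a cyclic rotation of the vertices keeps the orientation, while swapping two vertices yields the oppositely oriented simplex.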

We then introduce the definition of the natural orientation of a (d+1)-simplex in R^{d+1}. We use its induced orientation to canonically orient the boundary simplices.

Definition 6 (Natural orientation [22]).

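Let a (d+1)-simplex in R^{d+1} be given. An oriented simplex of it is naturally oriented if the vectors from its first vertex to the remaining vertices, taken in order, have a positive determinant. For each d-face of the (d+1)-simplex, the natural orientation induces an orientation of the face, which we term the induced orientation.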
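As a small worked illustration, the sketch below tests natural orientation through the sign of a determinant and derives induced orientations of faces. Both conventions are assumptions taken from the standard textbook treatment: the determinant of the edge vectors from the first vertex must be positive, and the face omitting the i-th vertex receives the sign (-1)^i. The function names are hypothetical.

```python
def det(matrix):
    """Determinant by cofactor expansion (adequate for the tiny matrices used here)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def is_naturally_oriented(vertices):
    """vertices: ordered points of a (d+1)-simplex in R^(d+1), as coordinate tuples."""
    v0 = vertices[0]
    rows = [[a - b for a, b in zip(v, v0)] for v in vertices[1:]]
    return det(rows) > 0


def induced_face(vertices, i):
    """Ordered vertices of the face omitting vertex i, with the induced orientation."""
    face = list(vertices[:i]) + list(vertices[i + 1:])
    if i % 2 == 1:
        # A sign of -1 is realized by swapping the first two vertices of the face.
        face[0], face[1] = face[1], face[0]
    return tuple(face)


triangle = ((0, 0), (1, 0), (0, 1))
faces = [induced_face(triangle, i) for i in range(3)]
```

For this counterclockwise triangle, the three induced edges run head to tail around the triangle, which matches the intuition that the induced orientation traverses the boundary consistently.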

We now formally define the boundary of a void as follows:

Definition 7 (Boundary of void).

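Let a simplicial complex be embedded in R^{d+1}. We define each connected component of the complement of the complex in R^{d+1} to be a void. An oriented d-simplex of the complex is said to bound a void of the complex if the following conditions are satisfied:

  • The simplex is contained in the closure of the void.

  • Take an interior point of the simplex and a point of the void such that the line segment joining them stays in the void except at its endpoint on the simplex and is orthogonal to the hyperplane spanned by the simplex. Furthermore, consider the naturally oriented simplex of the (d+1)-simplex spanned by the point of the void and the vertices of the d-simplex. Then, the oriented d-simplex has the induced orientation from this naturally oriented simplex.

The boundary of a void is then defined as the set of oriented d-simplices of the complex bounding the void.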

Remark 4.

We can also interpret the boundary of a void as a sum of oriented d-simplices; the boundary then defines a d-cycle (with Z coefficients).

We now describe the pairing algorithm for the oriented boundary d-simplices of the working complex. Consider a (d-1)-simplex which is a face of a d-simplex in