Computing Bottleneck Distance for 2-D Interval Decomposable Modules

03/07/2018 ∙ by Tamal K. Dey, et al. ∙ The Ohio State University

Computation of the interleaving distance between persistence modules is a central task in topological data analysis. For 1-D persistence modules, thanks to the isometry theorem, this can be done by computing the bottleneck distance with known efficient algorithms. The question is open for most n-D persistence modules, n>1, because of the well recognized complications of the indecomposables. Here, we consider a reasonably complicated class called 2-D interval decomposable modules whose indecomposables may have a description of non-constant complexity. We present a polynomial time algorithm to compute the bottleneck distance for these modules from indecomposables, which bounds the interleaving distance from above, and give another algorithm to compute a new distance called dimension distance that bounds it from below.


1 Introduction

Persistence modules have become an important object of study in topological data analysis in that they serve as an intermediate between the raw input data and the output summarization with persistence diagrams. The classical persistence theory [18] for $\mathbb{R}$-valued functions produces one-dimensional (1-D) persistence modules, which are sequences of vector spaces (homology groups with field coefficients) with linear maps over $\mathbb{R}$ seen as a poset. It is known [16, 26] that such a sequence can be decomposed uniquely into a set of intervals called bars, which are also represented as points in $\mathbb{R}^2$ called the persistence diagram [15]. The space of these diagrams can be equipped with a metric $d_B$ called the bottleneck distance. Cohen-Steiner et al. [15] showed that $d_B$ is bounded from above by the input function perturbation measured in the infinity norm. Chazal et al. [12] generalized the result by showing that the bottleneck distance is bounded from above by a distance $d_I$ called the interleaving distance between two persistence modules; see also [6, 8, 17] for further generalizations. Lesnick [21] (see also [2, 13]) established the isometry theorem, which showed that indeed $d_I = d_B$. Consequently, $d_I$ for 1-D persistence modules can be computed exactly by efficient algorithms known for computing $d_B$; see e.g. [18, 19]. The status, however, is not so well settled for multidimensional ($n$-D) persistence modules [9] arising from $\mathbb{R}^n$-valued functions.

Extending the concept from 1-D modules, Lesnick defined the interleaving distance for multidimensional ($n$-D) persistence modules, and proved its stability and universality [21]. The definition of the bottleneck distance, however, is not readily extensible, mainly because the counterparts of bars for finitely presented $n$-D modules, called indecomposables, are far more complicated, though they are guaranteed to be essentially unique by the Krull-Schmidt theorem [1]. Nonetheless, one can define $d_B$ as the supremum of the pairwise interleaving distances between indecomposables, which in some sense generalizes the concept in 1-D due to the isometry theorem. Then, straightforwardly, $d_I \le d_B$ as observed in [7], but the converse is not necessarily true. For some special cases, results in the converse direction have started to appear. Botnan and Lesnick [7] proved that, in 2-D, $d_B \le \frac{5}{2} d_I$ for what they called block decomposable modules. Bjerkevik [4] improved this result to $d_B \le d_I$. Furthermore, he extended it by proving that $d_B \le (2n-1)\, d_I$ for rectangle decomposable $n$-D modules and $d_B \le (n-1)\, d_I$ for free $n$-D modules. He gave an example for exactness of this bound when $n = 2$.

Unlike 1-D modules, the question of estimating $d_I$ for $n$-D modules through efficient algorithms is largely open [5]. The multi-dimensional matching distance introduced in [10] provides a lower bound to the interleaving distance [20] and can be approximated within any error threshold by algorithms proposed in [3, 11]. But it cannot provide an upper bound like $d_B$. For free, block, rectangle, and triangular decomposable modules, one can compute $d_B$ by computing pairwise interleaving distances between indecomposables in constant time because they have a description of constant complexity. Due to the results mentioned earlier, $d_I$ can be estimated within a constant or dimension-dependent factor by computing $d_B$ for these modules. It is not obvious how to do the same for the larger class of interval decomposable modules mentioned in the literature [4, 7], where indecomposables may not have constant complexity. These are modules whose indecomposables are bounded by "stair-cases". Our main contribution is a polynomial time algorithm that, given indecomposables, computes $d_B$ exactly for 2-D interval decomposable modules. The algorithm draws upon various geometric and algebraic analyses of the interval decomposable modules that may be of independent interest. It is known that no lower bound in terms of $d_B$ for $d_I$ may exist for these modules [7]. To this end, we complement our result by proposing a distance called dimension distance that is efficiently computable and bounds $d_I$ from below.

2 Persistence modules

Our goal is to compute the bottleneck distance between two 2-D interval decomposable modules. The bottleneck distance, originally defined for 1-D persistence modules [15] (also see [2]), and later extended to multi-dimensional persistence modules [7] is known to bound the interleaving distance between two persistence modules from above.

Let $k$ be a field, $\mathbf{Vec}$ be the category of vector spaces over $k$, and $\mathbf{vec}$ be the subcategory of finite dimensional vector spaces. In what follows, for simplicity, we assume $k = \mathbb{Z}/2\mathbb{Z}$.

Definition 1 (Persistence module).

Let $P$ be a poset category. A $P$-indexed persistence module is a functor $M : P \to \mathbf{Vec}$. If $M$ takes values in $\mathbf{vec}$, we say $M$ is pointwise finite dimensional (p.f.d.). The $P$-indexed persistence modules themselves form another category where the natural transformations between functors constitute the morphisms.

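Here we consider the poset category to be $\mathbb{R}^n$ with the standard partial order and all modules to be p.f.d. We call $\mathbb{R}^n$-indexed persistence modules $n$-dimensional persistence modules, $n$-D modules in short. The category of $n$-D modules is denoted as $\mathbb{R}^n\text{-mod}$. For an $n$-D module $M \in \mathbb{R}^n\text{-mod}$, we use the notation $M_x := M(x)$ and $\rho^M_{x \to y} := M(x \le y)$.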

Definition 2 (Shift).

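For any $\delta \in \mathbb{R}$, we denote $\vec{\delta} = (\delta, \ldots, \delta) = \delta \cdot \sum_{i=1}^{n} e_i$, where $\{e_i\}_{i=1}^{n}$ is the standard basis of $\mathbb{R}^n$. We define a shift functor $(\cdot)_{\to\delta} : \mathbb{R}^n\text{-mod} \to \mathbb{R}^n\text{-mod}$ where $M_{\to\delta} := (\cdot)_{\to\delta}(M)$ is given by $M_{\to\delta}(x) = M(x + \vec{\delta})$ and $M_{\to\delta}(x \le y) = M(x + \vec{\delta} \le y + \vec{\delta})$. In words, $M_{\to\delta}$ is the module $M$ shifted diagonally by $\vec{\delta}$.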
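As a small illustration of the shift in Definition 2, the sketch below works in 2-D only and represents a module just by an indicator of where it is nonzero. The names `shift` and `unit_square` are hypothetical and introduced here only for illustration; the structure maps of the module are not modeled.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Indicator = Callable[[Point], bool]  # True where the module is nonzero


def shift(indicator: Indicator, delta: float) -> Indicator:
    """Indicator of the shifted module: the shifted module at x equals M at x + (delta, delta)."""
    def shifted(x: Point) -> bool:
        return indicator((x[0] + delta, x[1] + delta))
    return shifted


def unit_square(x: Point) -> bool:
    """Hypothetical example support: the half-open square [0, 1) x [0, 1)."""
    return 0.0 <= x[0] < 1.0 and 0.0 <= x[1] < 1.0


# Shifting by delta = 0.5 moves the support down-left by (0.5, 0.5).
shifted_square = shift(unit_square, 0.5)
```

Note that the support of the shifted module is the original support translated by $-\vec{\delta}$, which is why a point such as $(-0.5, -0.5)$ lies in the shifted support of this example.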

The following definition of interleaving, taken from [24], adapts the original definition designed for 1-D modules in [13] to $n$-D modules.

Definition 3 (Interleaving).

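For two persistence modules $M$ and $N$, and $\delta \ge 0$, a $\delta$-interleaving between $M$ and $N$ consists of two families of linear maps $\{\phi_x : M_x \to N_{x+\vec{\delta}}\}_{x \in \mathbb{R}^n}$ and $\{\psi_x : N_x \to M_{x+\vec{\delta}}\}_{x \in \mathbb{R}^n}$ satisfying the following two conditions (see Appendix A for commutative diagrams):

  • $\forall x \in \mathbb{R}^n$: $\rho^M_{x \to x+2\vec{\delta}} = \psi_{x+\vec{\delta}} \circ \phi_x$ and $\rho^N_{x \to x+2\vec{\delta}} = \phi_{x+\vec{\delta}} \circ \psi_x$

  • $\forall x \le y \in \mathbb{R}^n$: $\phi_y \circ \rho^M_{x \to y} = \rho^N_{x+\vec{\delta} \to y+\vec{\delta}} \circ \phi_x$ and symmetrically $\psi_y \circ \rho^N_{x \to y} = \rho^M_{x+\vec{\delta} \to y+\vec{\delta}} \circ \psi_x$

If such a $\delta$-interleaving exists, we say $M$ and $N$ are $\delta$-interleaved. We call the first condition triangular commutativity and the second condition square commutativity.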

Definition 4 (Interleaving distance).

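Define the interleaving distance between modules $M$ and $N$ as $d_I(M, N) = \inf \{\delta \ge 0 : M \text{ and } N \text{ are } \delta\text{-interleaved}\}$. We say $M$ and $N$ are $\infty$-interleaved if they are not $\delta$-interleaved for any $\delta \in \mathbb{R}^{+}$, and assign $d_I(M, N) = \infty$.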

Definition 5 (Matching).

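A matching $\mu : A \nrightarrow B$ between two multisets $A$ and $B$ is a partial bijection, that is, a bijection $\mu : A' \to B'$ for some $A' \subseteq A$ and $B' \subseteq B$. We say $\operatorname{im} \mu = B'$ and $\operatorname{coim} \mu = A'$.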
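A matching in the sense of Definition 5 can be stored as a dictionary from matched elements of one multiset to matched elements of the other. The sketch below is only an illustration: the helper name `matching_parts` is hypothetical, and multiset elements are assumed to be distinguished by unique labels such as indices.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def matching_parts(
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[Dict[Hashable, Hashable], Set[Hashable], Set[Hashable]]:
    """Validate a partial bijection given as (a, b) pairs.

    Returns the map itself, its coimage (matched elements of A),
    and its image (matched elements of B).
    """
    mu: Dict[Hashable, Hashable] = {}
    image: Set[Hashable] = set()
    for a, b in pairs:
        # A partial bijection may use each element on either side at most once.
        if a in mu or b in image:
            raise ValueError("not a partial bijection")
        mu[a] = b
        image.add(b)
    return mu, set(mu.keys()), image
```

Elements of either multiset that appear in no pair are the unmatched ones, which is what conditions (i) and (ii) of the bottleneck distance definition refer to.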

For the next definition [7], we call a module $M$ $\delta$-trivial if $\rho^M_{x \to x+\vec{\delta}} = 0$ for all $x \in \mathbb{R}^n$.

Definition 6 (Bottleneck distance).

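Let $M \cong \bigoplus_{i=1}^{m} M_i$ and $N \cong \bigoplus_{j=1}^{n} N_j$ be two persistence modules, where $M_i$ and $N_j$ are indecomposable submodules of $M$ and $N$ respectively. Let $I = \{1, \ldots, m\}$ and $J = \{1, \ldots, n\}$. We say $M$ and $N$ are $\delta$-matched for $\delta \ge 0$ if there exists a matching $\mu : I \nrightarrow J$ so that (i) $i \in I \setminus \operatorname{coim} \mu \implies M_i$ is $2\delta$-trivial, (ii) $j \in J \setminus \operatorname{im} \mu \implies N_j$ is $2\delta$-trivial, and (iii) $i \in \operatorname{coim} \mu \implies M_i$ and $N_{\mu(i)}$ are $\delta$-interleaved.

The bottleneck distance is defined as

$$d_B(M, N) = \inf \{\delta \mid M \text{ and } N \text{ are } \delta\text{-matched}\}.$$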

The following fact observed in [7] is straightforward from the definition.

Fact 7.

$d_I \le d_B$.

2.1 Interval decomposable modules

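Persistence modules whose indecomposables are interval modules (Definition 9) are called interval decomposable modules, see for example [7]. To account for the boundaries of free modules, we enrich the poset $\mathbb{R}^n$ by adding points at $\pm\infty$ and consider the poset $\bar{\mathbb{R}}^n = \bar{\mathbb{R}} \times \cdots \times \bar{\mathbb{R}}$ where $\bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ with the usual additional rule $a \pm \infty = \pm\infty$.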

Definition 8.

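An interval is a subset $\emptyset \ne I \subseteq \bar{\mathbb{R}}^n$ that satisfies the following:

  1. If $p, q \in I$ and $p \le r \le q$, then $r \in I$;

  2. If $p, q \in I$, then there exists a sequence $(p = p_0, p_1, \ldots, p_m = q)$ of points of $I$ for some $m \in \mathbb{N}$ such that $p_i$ and $p_{i+1}$ are comparable for every $i$. We call the sequence $(p = p_0, p_1, \ldots, p_m = q)$ a path from $p$ to $q$ (in $I$).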
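The two conditions of Definition 8 can be checked by brute force on a finite example. The sketch below is an illustration only and makes a simplifying assumption that is not in the text: the ambient poset is the integer grid $\mathbb{Z}^2$ rather than $\bar{\mathbb{R}}^n$, and a candidate interval is a finite set of grid cells. The function names are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Cell = Tuple[int, int]


def leq(p: Cell, q: Cell) -> bool:
    """Standard partial order on the grid: p <= q coordinate-wise."""
    return p[0] <= q[0] and p[1] <= q[1]


def is_convex(cells: Set[Cell]) -> bool:
    """Condition 1: p, q in I and p <= r <= q imply r in I."""
    for p in cells:
        for q in cells:
            if leq(p, q):
                box = product(range(p[0], q[0] + 1), range(p[1], q[1] + 1))
                if any(r not in cells for r in box):
                    return False
    return True


def is_connected(cells: Set[Cell]) -> bool:
    """Condition 2: any two points are joined by a path of comparable points."""
    start = next(iter(cells))
    seen = {start}
    stack = [start]
    while stack:
        p = stack.pop()
        for q in cells:
            if q not in seen and (leq(p, q) or leq(q, p)):
                seen.add(q)
                stack.append(q)
    return seen == cells


def is_interval(cells: Set[Cell]) -> bool:
    """A nonempty set of grid cells satisfying both conditions."""
    return bool(cells) and is_convex(cells) and is_connected(cells)
```

For example, the L-shaped set `{(0, 0), (1, 0), (0, 1)}` passes both checks, `{(0, 0), (1, 1)}` fails the first condition, and `{(0, 1), (1, 0)}` fails the second.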

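In what follows, we fix the dimension $n = 2$. Let $\bar{I}$ denote the closure of an interval $I$ in the standard topology of $\bar{\mathbb{R}}^2$. The lower and upper boundaries of $I$ are defined as

$$L(I) = \{x = (x_1, x_2) \in \bar{I} \mid \forall y = (y_1, y_2) \text{ with } y_1 < x_1 \text{ and } y_2 < x_2,\ y \notin I\},$$

$$U(I) = \{x = (x_1, x_2) \in \bar{I} \mid \forall y = (y_1, y_2) \text{ with } y_1 > x_1 \text{ and } y_2 > x_2,\ y \notin I\}.$$

See the figure below. Let $B(I) = L(I) \cup U(I)$.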

We say an interval $I$ is discretely presented if its boundary consists of a finite set of horizontal and vertical line segments called edges, with end points called vertices, which satisfy the following conditions: (i) every vertex is incident to either a single edge or to a horizontal and a vertical edge, (ii) no vertex appears in the interior of an edge. We denote the sets of edges and vertices by $E(I)$ and $V(I)$ respectively.

According to this definition, $\bar{\mathbb{R}}^2$ is an interval whose boundary consists of all the points with at least one coordinate $\pm\infty$. Its vertex set consists of the four corners of the infinitely large square $\bar{\mathbb{R}}^2$, with coordinates $(\pm\infty, \pm\infty)$.

Definition 9 (Interval module).

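A 2-D interval persistence module, or interval module in short, is a persistence module $M$ that satisfies the following condition: for some interval $I_M \subseteq \bar{\mathbb{R}}^2$, called the interval of $M$,

$$M_x = \begin{cases} k & \text{if } x \in I_M \\ 0 & \text{otherwise} \end{cases} \qquad \rho^M_{x \to y} = \begin{cases} \mathbb{1} & \text{if } x, y \in I_M \\ 0 & \text{otherwise.} \end{cases}$$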

It is known that an interval module is indecomposable [21].

Definition 10 (Interval decomposable module).

A 2-D interval decomposable module is a persistence module that can be decomposed into interval modules. We say a 2-D interval decomposable module is finitely presented if it can be decomposed into finitely many interval modules whose intervals are discretely presented.

3 Algorithm to compute $d_I(M, N)$

Given the intervals of the indecomposables (interval modules) as input, an approach based on bipartite-graph matching is well known for computing the bottleneck distance between two -D persistence modules and  [18]. This approach constructs a bi-partite graph out of the intervals of and and their pairwise interleaving distances including the distances to zero modules. If these distance computations take time in total, the algorithm for computing takes time if and together have indecomposables altogether. Given indecomposables (say computed by Meat-Axe [22]), this approach is readily extensible to the -D modules if one can compute the interleaving distance between any pair of indecomposables including the zero modules. To this end, we present an algorithm to compute the interleaving distance between two interval modules and with and vertices respectively on their intervals in time. This gives a total time of where is the number of vertices over all input intervals.
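The matching-based approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [18]. It assumes the pairwise interleaving distances `pair_dist[i][j]` and the distances to the zero module `zero_m[i]` and `zero_n[j]` are already known, it treats every infimum as attained, and it uses a simple augmenting-path matching rather than a faster algorithm. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def _has_perfect_matching(adj: List[List[int]], size: int) -> bool:
    """Augmenting-path bipartite matching on `size` left and `size` right nodes."""
    match_right = [-1] * size

    def try_augment(u: int, seen: List[bool]) -> bool:
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return all(try_augment(u, [False] * size) for u in range(size))


def bottleneck_distance(
    pair_dist: Sequence[Sequence[float]],
    zero_m: Sequence[float],
    zero_n: Sequence[float],
) -> float:
    """Smallest candidate delta for which the two families are delta-matched."""
    m, n = len(zero_m), len(zero_n)
    size = m + n
    if size == 0:
        return 0.0

    def feasible(delta: float) -> bool:
        # Left nodes: M_0..M_{m-1}, then one dummy per N_j.
        # Right nodes: N_0..N_{n-1}, then one dummy per M_i.
        adj: List[List[int]] = [[] for _ in range(size)]
        for i in range(m):
            adj[i] = [j for j in range(n) if pair_dist[i][j] <= delta]
            if zero_m[i] <= delta:
                adj[i].append(n + i)  # leave M_i unmatched
        for j in range(n):
            u = m + j
            if zero_n[j] <= delta:
                adj[u].append(j)  # leave N_j unmatched
            adj[u].extend(range(n, n + m))  # dummy-to-dummy edges are free
        return _has_perfect_matching(adj, size)

    candidates = sorted(
        {d for row in pair_dist for d in row} | set(zero_m) | set(zero_n)
    )
    if not feasible(candidates[-1]):
        return float("inf")
    lo, hi = 0, len(candidates) - 1
    while lo < hi:  # binary search for the smallest feasible candidate
        mid = (lo + hi) // 2
        if feasible(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

The dummy nodes encode the option of leaving an indecomposable unmatched, so a perfect matching in the auxiliary graph corresponds to a matching satisfying conditions (i) to (iii) of Definition 6 at the tested value.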

Now we focus on computing the interleaving distance between two given intervals. Given two intervals $I_M$ and $I_N$ with vertices, this algorithm searches for a value $\delta$ so that there exist two families of linear maps from $M$ to $N_{\to\delta}$ and from $N$ to $M_{\to\delta}$ respectively which satisfy both triangular and square commutativity. This search is done with a binary probing. For a chosen $\delta$ from a candidate set of values, the algorithm determines the direction of the search by checking two conditions called trivializability and validity on the intersections of the modules involved.

Definition 11 (Intersection module).

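For two interval modules $M$ and $N$ with intervals $I_M$ and $I_N$ respectively, let $I_Q = I_M \cap I_N$, which is a disjoint union of intervals, $I_Q = \bigsqcup_i I_{Q_i}$. The intersection module $Q$ of $M$ and $N$ is $Q = \bigoplus_i Q_i$, where $Q_i$ is the interval module with interval $I_{Q_i}$. That is,

$$Q_x = \begin{cases} k & \text{if } x \in I_M \cap I_N \\ 0 & \text{otherwise} \end{cases} \qquad \text{and for } x \le y, \quad \rho^Q_{x \to y} = \begin{cases} \mathbb{1} & \text{if } x, y \in I_{Q_i} \text{ for some } i \\ 0 & \text{otherwise.} \end{cases}$$

From the definition we can see that the support of $Q$, $\operatorname{supp}(Q)$, is $I_M \cap I_N$. We call each $Q_i$ an intersection component of $M$ and $N$. Write $I := I_{Q_i}$ and consider $\phi : M \to N$ to be any morphism in the following proposition, which says that $\phi$ is constant on $I$.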

Proposition 12.

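$\phi|_I \equiv a \cdot \mathbb{1}$ for some $a \in k$.

Proof.

For any $x, y \in I$, consider a path $(x = p_0, p_1, \ldots, p_m = y)$ in $I$ from $x$ to $y$ and the commutative diagrams above for $p_i \le p_{i+1}$ (left) and $p_i \ge p_{i+1}$ (right) respectively. Observe that $\phi_{p_i} = \phi_{p_{i+1}}$ in both cases due to the commutativity. Inducting on $i$, we get that $\phi_x = \phi_y$. ∎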

Definition 13 (Valid intersection).

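An intersection component $Q_i$ is $(M, N)$-valid if for each $x \in I_{Q_i}$ the following two conditions hold (see figure below):

$$\text{(i) } y \le x \text{ and } y \in I_M \implies y \in I_N, \qquad \text{(ii) } z \ge x \text{ and } z \in I_N \implies z \in I_M.$$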

Proposition 14.

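Let $\{Q_i\}$ be a set of intersection components of $M$ and $N$ with intervals $\{I_i\}$. Let $\{\phi_x\} : M \to N$ be the family of linear maps defined as $\phi_x = \mathbb{1}$ for all $x \in \bigcup_i I_i$ and $\phi_x = 0$ otherwise. Then $\phi$ is a morphism if and only if every $Q_i$ is $(M, N)$-valid.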

See the proof in Appendix A.

We focus on the interval modules with discretely presented intervals (figure on right). They belong to the finitely presented persistence modules as defined in [23]. For an interval module $M$, let $\bar{M}$ be the interval module defined on the closure $\bar{I}_M$. To avoid complication in this exposition, we assume that the upper and lower boundaries of every interval module meet exactly at two points. We also assume that every interval module has a closed interval, which is justified by the following proposition (proof in Appendix A).

Proposition 15.

$d_I(M, N) = d_I(\bar{M}, \bar{N})$.

From the definition of boundaries of intervals, the following proposition is immediate.

Proposition 16.

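Given an interval $I$ and any point $x \in I$, we have $x \in L(I) \iff \forall \epsilon > 0,\ x - \vec{\epsilon} \notin I$. Similarly, we have $x \in U(I) \iff \forall \epsilon > 0,\ x + \vec{\epsilon} \notin I$.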

Definition 17 (Diagonal projection and distance).

Let be an interval and . For , let denote the line called diagonal with slope that passes through . We define (see Figure 1)

In case , define , called the projection point of on , to be the point where . For , is defined to be the edge in containing . Define and accordingly. For , we set if and only if . Then, if and otherwise.

Figure 1: , , (left); and are defined on the left edge of (middle); is - and -trivializable (right)

Notice that upper and lower boundaries of an interval are also intervals by definition. With this understanding, following properties of are obvious from the above definition.

Fact 18.
  1. For any ,

  2. Let or and let be two points such that both exist. If and are on some same horizontal, vertical, or diagonal line, then .

Set , , , and . Following proposition is proved in Appendix A.

Proposition 19.

For an intersection component of and with interval , the following conditions are equivalent:

  2. is -valid.

  3. and .

  4. and .

Definition 20 (Trivializable intersection).

Let be a connected component of the intersection of two modules and . For each point , define

For , we say a point is -trivializable if . We say an intersection component is -trivializable if each point in is -trivializable (Figure 1).

The following proposition discretizes the search for trivializability (proof in Appendix A).

Proposition 21.

An intersection component is -trivializable if and only if every vertex of is -trivializable.

Recall that for two modules to be -interleaved, we need two families of linear maps satisfying both triangular commutativity and square commutativity. For a given , Theorem 23 below provides criteria which ensure that such linear maps exist. In our algorithm, we make sure that these criteria are verified.

Given an interval module and the diagonal line for any , there is a -dimensional persistence module which is the functor restricted on the poset as a subcategory of . We call it a 1-dimensional slice of along . Define

Proposition 22 follows from the observation that .

Proposition 22.

For two interval modules and , there exist two families of linear maps and such that for each , the 1-dimensional slices and are -interleaved by the linear maps and .
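Proposition 22 concerns 1-D slices, and for a single pair of 1-D interval modules the interleaving distance has a well-known closed form: either match the two bars endpoint to endpoint, or treat both as trivial. The sketch below illustrates that formula under assumptions that are not in the text: each slice is a single bar with finite endpoints `(a, b)`, `b >= a`, measured so that a diagonal shift by $\vec{\delta}$ moves the endpoints by $\delta$, and an empty slice is represented by `None`. The name `interleaving_1d` is hypothetical.

```python
from typing import Optional, Tuple

Bar = Optional[Tuple[float, float]]  # (birth, death), or None for the zero module


def interleaving_1d(bar_m: Bar, bar_n: Bar) -> float:
    """Interleaving distance between two 1-D interval modules with finite bars."""
    if bar_m is None and bar_n is None:
        return 0.0
    if bar_m is None or bar_n is None:
        # Distance to the zero module is half the length of the nonzero bar.
        a, b = bar_m if bar_m is not None else bar_n
        return (b - a) / 2.0
    (a, b), (c, d) = bar_m, bar_n
    match_cost = max(abs(a - c), abs(b - d))  # interleave by nonzero maps
    trivial_cost = max(b - a, d - c) / 2.0    # interleave by zero maps
    return min(match_cost, trivial_cost)
```

For instance, the bars `(0, 4)` and `(1, 5)` are at distance 1, while the short, far-apart bars `(0, 1)` and `(10, 11)` are at distance 0.5 because both become trivial under a shift of total length 1.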

Theorem 23.

Given , two interval modules and are -interleaved if and only if each intersection component of and is either -valid or -trivializable, and each intersection component of and is either -valid or -trivializable.

Proof.

($\Rightarrow$) direction: Suppose $M$ and $N$ are $\delta$-interleaved. By definition, we have two families of linear maps and which satisfy both triangular and square commutativities. Let the morphisms between the two persistence modules constituted by these two families of linear maps be and respectively. For each intersection component of and with interval , consider the restriction . By Proposition 12, is constant, that is, or . If , by Proposition 14, is -valid. If , by the triangular commutativity of , we have that for each point . That means . By Fact 18(i), . Similarly, , which is the same as saying . By Fact 18(i), . So , we have . This means is -trivializable. A similar statement holds for intersection components of and .

($\Leftarrow$) direction: We construct two families of linear maps as follows: On the interval of each intersection component of and , set if is -valid and otherwise. Set for all not in the interval of any intersection component. Similarly, construct . Note that, by Proposition 14, is a morphism between and , and is a morphism between and . Hence, they satisfy the square commutativity. We show that they also satisfy the triangular commutativity. We claim that , and a similar statement holds for . From the condition that and by Proposition 22, we know that there exist two families of linear maps satisfying triangular commutativity everywhere, especially on the pair of -dimensional persistence modules and . From triangular commutativity we know that since otherwise one cannot construct a -interleaving between and . Now for each with , we have by Fact 18, and by our claim. This implies that is a point in an interval of an intersection component of which is not -trivializable. Hence, it is -valid by the assumption. So, by our construction of on valid intersection components, . Symmetrically, we have that is a point in an interval of an intersection component of and which is not -trivializable since . So by our construction of on valid intersection components, . Then, we have for every nonzero linear map . The statement also holds for any nonzero linear map . Therefore, the triangular commutativity holds. ∎

Note that the above proof provides a construction of the interleaving maps for a specific if it exists. Furthermore, the interleaving distance is the infimum of all satisfying the two conditions in the theorem, which means is the infimum of all satisfying condition 2 in Theorem 23. Based on this observation, we propose a search algorithm for computing the interleaving distance for interval modules and .

Definition 24 (Candidate set).

For two interval modules and , and for each point in , let

Algorithm Interleaving (output: , input: and with vertices in total)

  1. Compute the candidate set and let be the half of the smallest difference between any two numbers in . /* time */

  2. Compute ; Let . /* time */

  3. Output after a binary search in by following steps /* ) probes */

    • let

    • Compute intersections and . /* time */

    • For each intersection component, check if it is valid or trivializable according to Theorem 23. /* time */
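The search structure of the algorithm can be sketched as below. This is an outline under assumptions rather than the authors' procedure: the candidate set is taken as a plain list of numbers, the per-probe test (intersection computation plus the validity and trivializability checks) is abstracted into a hypothetical callback `is_feasible`, and that test is assumed to be monotone, meaning that once it succeeds for some value it succeeds for all larger ones. The sketch probes candidate values directly.

```python
from typing import Callable, Iterable, Optional


def half_min_gap(candidates: Iterable[float]) -> Optional[float]:
    """Half of the smallest difference between two distinct candidate values."""
    values = sorted(set(candidates))
    if len(values) < 2:
        return None
    return min(b - a for a, b in zip(values, values[1:])) / 2.0


def smallest_feasible(
    candidates: Iterable[float],
    is_feasible: Callable[[float], bool],
) -> Optional[float]:
    """Binary search for the smallest candidate accepted by a monotone test."""
    values = sorted(set(candidates))
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(values[mid]):
            hi = mid          # the answer is at mid or to its left
        else:
            lo = mid + 1      # the answer is strictly to the right
    return values[lo] if lo < len(values) else None
```

With a candidate list of size $s$, the loop calls the test about $\log_2 s$ times, so the cost of each probe dominates the running time.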

In the above algorithm, the following generic task of computing diagonal span is performed for several steps. Let and be any two chains of vertical and horizontal edges that are both - and -monotone. Assume that and have at most vertices. Then, for a set of points in , one can compute the intersection of with for every in total time. The idea is to first compute by a binary search a point in so that intersects if at all. Then, for other points in , traverse from in both directions while searching for the intersections of the diagonal line with in lock steps.
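The basic geometric step of the diagonal span task, intersecting the slope-1 line through a point with a chain of axis-parallel edges, can be sketched as follows. This version scans the whole chain for each query point, so it is simpler and slower than the binary search and lock-step traversal described above. The chain is assumed to be given as a list of consecutive vertices, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def diagonal_hits(p: Point, chain: Sequence[Point]) -> List[float]:
    """Offsets t such that p + (t, t) lies on the chain, in increasing order."""
    hits = set()
    for (ux, uy), (vx, vy) in zip(chain, chain[1:]):
        if uy == vy:
            # Horizontal edge: the diagonal reaches height uy at offset t.
            t = uy - p[1]
            x = p[0] + t
            if min(ux, vx) <= x <= max(ux, vx):
                hits.add(t)
        elif ux == vx:
            # Vertical edge: the diagonal reaches abscissa ux at offset t.
            t = ux - p[0]
            y = p[1] + t
            if min(uy, vy) <= y <= max(uy, vy):
                hits.add(t)
        else:
            raise ValueError("chain edges must be horizontal or vertical")
    return sorted(hits)
```

A hit at a shared vertex of two edges is reported once, because the offsets are collected in a set before sorting.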

Now we analyze the complexity of the algorithm Interleaving. The candidate set, by definition, has only values which can be computed in time by the diagonal span procedure. Proposition 25 shows that is in and can be determined by computing the one dimensional interleaving distances for diagonal lines passing through vertices of and . This can be done in time by the diagonal span procedure. Once we determine , we search for in the truncated set to satisfy the first condition of Theorem 23. Intersections between two polygons and bounded by - and -monotone chains can be computed in time by a simple traversal of the boundaries. The validity and trivializability of each intersection component can be determined in time linear in the number of its vertices due to Proposition 19 and Proposition 21 respectively. Since the total number of intersection points is , the validity check takes time in total. The check for trivializability also takes time if one uses the diagonal span procedure.

Proposition 25 below says that is determined by a vertex in or and . Its proof appears in Appendix A.

Proposition 25.

(i) , (ii) .

The correctness of the algorithm Interleaving already follows from Theorem 23 as long as the candidate set contains the distance . The following concept of stable intersections helps us to establish this result.

Definition 26 (Stable intersection).

Let be an intersection component of and . We say is stable if every intersection point is non-degenerate, that is, is in the interior of two edges and , and at .

From Proposition 43 and Corollary 44 in Appendix A, we have the following claim.

Proposition 27.

if and only if each intersection component of , and is stable.

The main property of a stable intersection component of and is that if we shift one of the interval modules, say , to continuously for some small value , the interval of the intersection component of and changes continuously. The next proposition follows directly from the stability of intersection components.

Proposition 28.

For a stable intersection component of and , there exists a positive real so that the following holds:

For each , there exists a unique intersection component of and so that it is still stable and . Furthermore, there is a bijection so that , and are on the same horizontal, vertical, or diagonal line, and . We call the set a stable neighborhood of .

Corollary 29.

For a stable intersection component , we have:

(i) is -valid iff each in the stable neighborhood is -valid.

(ii) If is -trivializable, then is -trivializable.

Proof.

(i): Let be any intersection component in a stable neighborhood of . We know that if is ()-valid, then and . By Proposition 28, and . So is -valid. Other direction of the implication can be proved by switching the roles of and in the above argument.

(ii): From Proposition 28, we have that , there exists a point so that and are on some horizontal, vertical, or diagonal line (), and . Then, by Fact 18(ii), one observes

Therefore, is -trivializable. ∎

Theorem 30.

.

Proof.

Suppose that . Let be the largest value in S satisfying . Note that if and only if . Then, by our assumption that .

By definition of interleaving distance, we have , there is a -interleaving between and , and , there is no -interleaving between and . By Proposition 25(ii), one can see that . So, to get a contradiction, we just need to show that there exists , , satisfying the condition 2 in Theorem 23.

Let be any intersection component of or . Without loss of generality, assume is an intersection component of and . By Proposition 27, is stable. We claim that there exists some such that is an intersection component of and in a stable neighborhood of , and is either -valid or -trivializable.

Let be small enough so that is a stable intersection component of and in a stable neighborhood of . By Theorem 23, is either -valid or -trivializable. If is -valid, then by Corollary 29(i), any intersection component in a stable neighborhood of is valid, which means there exists that is -valid for some . Now assume is not -valid. Then, , is -trivializable. By Proposition 21 and Corollary 29(ii), we have , , . Taking , we get , . We claim that, actually, , . If the claim were not true, some point would exist so that . There are two cases. If , then obviously contradicting . The other case is that is the intersection point of two perpendicular edges and since is a stable intersection component. But, then and are always on two parallel edges where is either or . By Proposition 42(ii), we have