Stable Persistent Homology Features of Dynamic Metric Spaces

12/03/2018 ∙ by Woojin Kim, et al. ∙ The Ohio State University 0

Characterizing the dynamics of time-evolving data within the framework of topological data analysis (TDA) has been attracting increasing attention. Popular instances of time-evolving data include flocking/swarming behaviors in animals and social networks in the human sphere. A natural mathematical model for such collective behaviors is a dynamic point cloud, or more generally a dynamic metric space (DMS). In this paper we extend the Rips filtration stability result for (static) metric spaces to the setting of DMSs. We do this by devising a certain three-parameter "spatiotemporal" filtration of a DMS. Applying the homology functor to this filtration gives rise to a multidimensional persistence module derived from the DMS. We show that this multidimensional module enjoys stability under a suitable generalization of the Gromov-Hausdorff distance which permits metrizing the collection of all DMSs. On the other hand, it is recognized that, in general, comparing two multidimensional persistence modules leads to intractable computational problems. For the purpose of practical comparison of DMSs, we focus on either the rank invariant or the dimension function of the multidimensional persistence module that is derived from a DMS. We specifically propose to utilize a certain metric d for comparing these invariants: in our work this d is either (1) a certain generalization of the erosion distance by Patel, or (2) a specialized version of the well-known interleaving distance. We also study the computational complexity associated with both choices of d.


1 Introduction

Stability and tractability of TDA for studying metric spaces.

Finite point clouds or finite metric spaces are amongst the most common data representations considered in topological data analysis (TDA) [Carl09, edelsbrunner2008persistent, ghrist2008barcodes]. In particular, the stability of the Single Linkage Hierarchical Clustering (SLHC) method [clustum] or the stability of the persistent homology of filtered Rips complexes built on metric spaces [dghrips, chazal2014persistence] motivates adopting these constructions when studying metric spaces arising in applications.

Whereas there have been extensive applications of TDA to static metric data (thanks to the aforementioned theoretical underpinnings), there has not been much study of dynamic metric data from the TDA perspective. Our motivation for considering dynamic metric data stems from the study and characterization of flocking/swarming behavior of animals [benkert2008reporting, gudmundsson2006computing, gudmundsson2007efficient, huang2008modeling, li2010swarm, parrish1997animal, sumpter-collective, vieira2009line], convoys [jeung2008discovery], moving clusters [kalnis2005discovering], or mobile groups [hwang2005mining, wang2008efficient]. In this paper, by extending ideas from [clustum, dghrips, chazal2014persistence, kim2017stable, kim2018CCCG], we aim at establishing a TDA framework for the study of dynamic metric spaces (DMSs) which comes together with stability theorems. We begin by describing relevant work and comparing it with ours.

Lack of an adequate metric for DMSs.

In [munch2013applications], Munch considers vineyards (a certain notion of time-varying persistence diagrams introduced by Cohen-Steiner et al. [CEM06]) as signatures for dynamic point clouds. Munch, in particular, shows that vineyards are stable (under a certain notion of distance arising from the integration over time of the bottleneck distance between the instantaneous persistence diagrams [cohen2007stability]) under perturbations of the input dynamic point cloud [munch2013applications, Theorem 17]. However, we will observe below that, for the purpose of comparing two DMSs (which we regard as models of flocking behaviors), the metrics that directly arise as the integration of the Hausdorff or Gromov-Hausdorff distance can sometimes fail to be discriminative enough (see Example 4.3 and Remark 2.8).

In [topaz], Halverson, Topaz and Ziegelmeier study aggregation models for biological systems by adopting ideas from TDA. They show that topological analysis of aggregation reveals dynamical events which are not captured by classical analysis methods. Specifically, in order to extract insights about the global behavior of dynamic point clouds obtained by simulating aggregation models, they employ the so-called CROCKER plot (Contour Realization Of Computed k-dimensional hole Evolution in the Rips complex). This plot represents the evolution of Betti numbers of Rips complexes over the plane of time and scale parameters. In [topaz_new], Topaz, Ulmer and Ziegelmeier discretize CROCKER plots as matrices and use the Frobenius norm for comparing any two such matrices. In [topaz, topaz_new], the authors do not provide stability results for CROCKER plots derived from biological aggregation models.

Figure 1: Fix . The two figures above stand for two dynamic point clouds and in the real line each consisting of points and , respectively. Each of and contains (1) two static points located at and respectively ( and ), and (2) one dynamic point with the time-dependent coordinate either or , ( and ). Observe that in the unique dynamic point meets both of and periodically. On the contrary, in , the unique dynamic point meets only periodically.

Motivation for introducing a new metric for DMSs.

Consider the two dynamic point clouds illustrated in Figure 1. Let us regard them as instances of DMSs with the time-dependent metrics obtained by restricting the Euclidean metric on the real line at each time.

Observe that at each time the two resulting metric spaces are isometric, and hence the Gromov-Hausdorff distance [burago, Ch.7] between them is zero. This in turn implies that the integral of this distance over time is also zero, so the two dynamic point clouds are not distinguished from each other by the integrated Gromov-Hausdorff distance. (In [munch2013applications], in order to compare two dynamic point clouds, Munch considered the integrated Hausdorff distance over time. Since that metric takes account of the relative position of two dynamic point clouds inside an ambient metric space, we do not consider utilizing it for the purpose of comparing intrinsic behaviors of two dynamic metric data. Also, Munch considered the integrated bottleneck distance by computing the Rips filtrations of dynamic point clouds at each time. However, by [dghrips, Theorem 3.1], this metric is upper-bounded by (twice) the integrated Gromov-Hausdorff distance, which in this case vanishes. Therefore, it does not discriminate the two dynamic point clouds given in Figure 1.)

However, regarded as models of collective behaviors of animals, vehicles or people, the two dynamic point clouds are clearly distinct from each other. This motivates us to seek an adequate metric that measures the difference between the dynamics underlying any two given DMSs. In particular, this metric should not be a mere sum of instantaneous differences of the given DMSs over time.

In this paper, we adopt , called the -slack interleaving distance with (Definition 4.8, originally introduced in [kim2017stable]), as a measure of the behavioral difference between DMSs. In Section 2, we specifically show that the metric returns a positive value for the pair of DMSs and in Figure 1, demonstrating its sensitivity.

About stability and tractability of .

Even though the metric is able to differentiate subtly different DMSs (Theorem 4.9), computing it is not tractable in general (Remark 4.11). This hinders us from utilizing it in practice. Therefore, as a pragmatic approach, we adopt the comparison of invariants of DMSs, rather than directly comparing the DMSs themselves. To this end,

  (a) the invariants must be stable under perturbations of the input DMS, and

  (b) the metric for comparing two invariants extracted from two DMSs must be efficiently computable.

Contributions.

In this work, we achieve both items (a) and (b) above, described as follows.

With regard to (a), we first extract invariants from a given DMS, where these invariants are in the form of 3-dimensional persistence modules of sets or vector spaces. These are obtained from a blend of ideas related to the Rips filtration [cohen2007stability, dghrips, comptopo-herbert], the single linkage hierarchical clustering (SLHC) method [clustum], and the interlevel set persistence/categorified Reeb graphs [bendich2013homology, botnan2018algebraic, carlsson2009zigzag, de2016categorified].

We are able to prove the stability of these invariants by adapting ideas from [clustum, dghrips, chazal2014persistence]. We specifically emphasize that our stability results are a generalization of the well known stability theorems for the SLHC method [clustum] and the Rips filtration of a metric space [dghrips, chazal2014persistence]: Indeed, we show that by restricting ourselves to the class of constant DMSs, our results reduce to the standard stability theorems for static metric spaces in [clustum, dghrips, chazal2014persistence].

Next, in regard to item (b) above, we address the issue of computability of the metric between invariants of DMSs. In [bjerkevik2018computing, bjerkevik2017computational], Bjerkevik and Botnan show that computing the interleaving distance [lesnick] between multidimensional persistence modules can in general be NP-hard. Also, since we are not guaranteed to have interval decomposability [botnan2018algebraic, carlsson2009theory] of the 3-dimensional modules considered in this paper, we are not in a position to utilize the bottleneck distance and relevant algorithms developed by Dey and Xin [dey2018computing] in place of the interleaving distance.

This motivates us to further simplify our invariant associated to a DMS, which is in the form of a 3-dimensional persistence module. We focus on both the dimension function and the rank function. The dimension function of a persistence module has been studied in various contexts and under various names such as Betti curve, feature counting function, etc. [babichev2018robust, dey2018computing, giusti2016two, giusti2015clique, kahle2013limit, scolamiero2017multidimensional]. The rank function of a persistence module has also been extensively considered [carlsson2009theory, cerri2013betti, landi2018rank, patel2018generalized, puuska2017erosion]. We observe that both of these functions (1) can themselves be computed in polynomial time, (2) can be compared to each other via the interleaving distance for integer-valued functions (see Section 3) and (3) are stable under perturbations of the input DMS. We also propose a simple algorithm for computing this interleaving distance in poly-time (Section A.3). Therefore, we can bound the distance between two DMSs from below in poly-time by computing the interleaving distance between either their dimension functions or their rank functions.

We in particular emphasize that our method for computing provides a poly-time algorithm for bounding from below the interleaving distance between -dimensional persistence modules of vector spaces without any restriction on or on the structure of (even if is not derived from a DMS).

Other related work.

Aiming at analyzing/summarizing trajectory data such as the movement of animals, vehicles, and people, Buchin et al. introduce the notion of trajectory grouping structure [buchin2013trajectory]. This is a summarization, in the form of a labeled Reeb graph, of a set of points having piecewise linear trajectories with time-stamped vertices in Euclidean space. This work was subsequently enriched in [kostitsyna2015trajectory, van2016grouping, van2015central, van2016refined].

In [kim2018CCCG, kim2017stable], the thread of ideas in [buchin2013trajectory] is blended with ideas in zigzag persistence theory [zigzag]. Specifically, particular cases of the trajectory grouping structure in [buchin2013trajectory] are named formigrams. By clarifying the zigzag persistence structure of formigrams, formigrams are further summarized into barcodes. Regarding the barcode as a signature of a set of trajectory data, the authors of [kim2018CCCG, kim2017stable] utilize these barcodes for classifying a family of synthetic flocking behaviors [zane].

The central results in [kim2018CCCG, kim2017stable] show that barcodes or formigrams obtained from trajectory data are stable under perturbations of the input data [kim2018CCCG, Theorem 5], [kim2017stable, Theorem 9.21]. This work is a sequel to [kim2018CCCG, kim2017stable]. Namely, by considering Rips-like filtrations parametrized both by time intervals and spatial scale, we obtain novel stability results in every homological dimension.

Other work utilizing TDA-like ideas in the analysis of dynamic data includes: a study of time-varying merge trees or time-varying Reeb graphs [edelsbrunner2008time, oesterling2015computing]. Also, ideas of persistent homology are utilized in the study of time-varying graphs [hajij2018visual], discretely sampled dynamical systems [bauer2017persistence, edelsbrunner2015persistent] or in the study of combinatorial dynamical systems [dey2018persistent].

Acknowledgements.

FM thanks Justin Curry and Amit Patel for beneficial discussions. This work was partially supported by NSF grants IIS-1422400, CCF-1526513, DMS-1723003, and CCF-1740761.

2 Overview of our main results.

In this section we summarize the main results of this paper without technical details.

Throughout this paper, we fix a certain field and only consider vector spaces over whenever they arise. Any simplicial homology has coefficients in . By and , we denote the set of non-negative integers with and the set of non-negative reals with , respectively. Also, let be the collection of all finite closed intervals of . See Figure 2.

Figure 2: The collection can be graphically represented as the upper-half plane : Any closed interval of is identified with the point on . Observe that if , then the point is located at upper-left of the point in the plane.

2.1 Stability theorems for Persistent homology invariants of DMSs

Spatiotemporal Rips filtration of a DMS.

A DMS stands for a pair of finite set with -parametrized metric : for each , a certain (pseudo-)metric is obtained. See Definition B.1 for details.

Definition 2.1 (Time-interlevel analysis of a DMS).

Suppose that a DMS is given. Define the function as

Observe that if are both in , then . We construct the 3-parameter simplicial filtration , called the spatiotemporal Rips filtration of , described in Figure 3. By applying -th homology to this filtration, we obtain a 3-dimensional persistence module . (Notice that this is a blend of ideas related to the Rips filtration [cohen2007stability, dghrips, comptopo-herbert] and the interlevel set persistence/categorified Reeb graphs [bendich2013homology, botnan2018algebraic, carlsson2009zigzag, de2016categorified].)

Figure 3: To each , we associate the Rips complex on the metric space* . Provided another interval and scale with and , we obtain the inclusion . This construction gives rise to a 3-dimensional simplicial filtration indexed by . * In fact, does not necessarily satisfy the triangle inequality. However, this does not prevent us from defining the Rips complex on the semi-metric space .
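The construction in Definition 2.1 and Figure 3 can be illustrated with a minimal sketch. It assumes that the function attached to an interval is the minimum, over times in that interval, of the instantaneous distances. This assumption is consistent with the monotonicity noted above and with the possible failure of the triangle inequality, but the formula itself is not reproduced in this text. The point cloud `cloud`, the helper names and the time sampling are all hypothetical, and only the 1-skeleton of the Rips complex is built.

```python
import math
from itertools import combinations

def interval_min_distance(trajectories, times):
    """Pairwise distances minimized over the sampled times.

    trajectories: list of functions t -> position on the real line.
    times: finite sample standing in for a closed time interval.
    """
    n = len(trajectories)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        closest = min(abs(trajectories[i](t) - trajectories[j](t)) for t in times)
        d[i][j] = d[j][i] = closest
    return d

def rips_edges(d, delta):
    """Edges (1-simplices) of the Rips complex at scale delta."""
    return {(i, j) for i, j in combinations(range(len(d)), 2) if d[i][j] <= delta}

def sample(a, b, steps=200):
    """Evenly spaced sample of [a, b], endpoints included."""
    return [a + (b - a) * (k / steps) for k in range(steps + 1)]

# Hypothetical cloud: static points at -1 and 1, one point moving as cos(t).
cloud = [lambda t: -1.0, lambda t: 1.0, lambda t: math.cos(t)]

times = sample(0.0, math.pi)                        # samples of [0, pi]
small = interval_min_distance(cloud, times[:101])   # sub-interval [0, pi/2]
large = interval_min_distance(cloud, times)         # full interval [0, pi]

# A larger interval can only decrease the minimized distances, so the
# Rips complex of the sub-interval is included in that of the interval.
edges_small = rips_edges(small, 0.5)
edges_large = rips_edges(large, 0.5)
```

Over the full interval the moving point touches both static points, so its minimized distance to each of them is 0 while the static points stay at distance 2 from each other. This is the kind of triangle-inequality failure mentioned in the caption of Figure 3.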

The rank invariant of a DMS.

We denote the rank invariant [carlsson2009theory] of this multidimensional persistence module by and call it the -th rank invariant of (Definition 5.5). More precisely, given a pair , with and , we define to be the rank of the linear map

We have the following stability theorems for the map taking a DMS to its rank invariant function.

Theorem 2.2 (Stability of the rank invariant of DMSs).

Let and be any two DMSs. For any , let and be the -th rank invariant of and , respectively. Then, we have:

(1)

Above, is an interleaving type distance between rank invariants — See Section 3 for its definition.

Relationship between and the CROCKER plot [topaz].

We relate the rank invariant of a DMS to the CROCKER plot of [topaz]:

Definition 2.3 (The CROCKER plots of a DMS [topaz]).

Let be a DMS. For , the -th CROCKER plot of is a map sending to the dimension of the vector space .
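For homological degree 0, the CROCKER plot of Definition 2.3 can be sketched as a matrix of connected-component counts indexed by scale and time. The following is only an illustration: the function names, the two clouds and the sampling grids are hypothetical, and the points are assumed to move on the real line.

```python
import math

def betti0(points, delta):
    """Number of connected components of the Rips complex at scale delta."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) <= delta:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

def crocker_plot_h0(trajectories, times, scales):
    """Rows are indexed by scale, columns by time; entries are Betti-0."""
    return [[betti0([x(t) for x in trajectories], delta) for t in times]
            for delta in scales]

# Hypothetical clouds: two static points and one moving point each.
cloud_x = [lambda t: -1.0, lambda t: 1.0, lambda t: math.cos(t)]
cloud_y = [lambda t: -1.0, lambda t: 1.0, lambda t: abs(math.cos(t))]

times = [0.1 * k for k in range(64)]
scales = [0.25, 0.75, 1.5, 2.5]
plot_x = crocker_plot_h0(cloud_x, times, scales)
plot_y = crocker_plot_h0(cloud_y, times, scales)
```

The two hypothetical clouds are isometric at every instant, so `plot_x` and `plot_y` coincide even though the moving points behave differently over time.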

Let be any DMS. Note that for any time and scale , the value of associated to the repeated pair is identical to the dimension of the vector space , i.e. . This implies that is an enriched version of the -th CROCKER plot of and thus Theorem 2.2 can be interpreted somehow as establishing the stability of the CROCKER plots of a DMS.

Improvement for .

By restricting ourselves to clustering information (i.e. 0-th homology) of DMSs, we obtain a stronger lower bound for the distance between DMSs.

Definition 2.4 (The Betti-0 function of a DMS).

Let be a DMS. We define the Betti-0 function of by sending each to the dimension of .

Example 2.5.

Consider the two DMSs given by the dynamic point clouds in Figure 1. Their Betti-0 functions are illustrated in Figure 4.
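A hedged numerical sketch of this example follows. The coordinates used in Figure 1 are not reproduced in this text, so the two clouds below are hypothetical stand-ins with the same qualitative behavior: in the first, the moving point meets both static points periodically, and in the second it meets only one of them. The sketch also assumes that the Betti-0 function counts connected components of the Rips complex built on the pairwise distances minimized over the time interval.

```python
import math

def betti0_function(trajectories, times, delta):
    """Betti-0 at (interval, scale): the interval is given by sampled times."""
    n = len(trajectories)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Minimum over the interval of the instantaneous distance.
            closest = min(abs(trajectories[i](t) - trajectories[j](t))
                          for t in times)
            if closest <= delta:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Hypothetical stand-ins for the two clouds of Figure 1.
cloud_x = [lambda t: -1.0, lambda t: 1.0, lambda t: math.cos(t)]
cloud_y = [lambda t: -1.0, lambda t: 1.0, lambda t: abs(math.cos(t))]

window = [math.pi * (k / 200) for k in range(201)]  # samples of [0, pi]
beta_x = betti0_function(cloud_x, window, 0.5)
beta_y = betti0_function(cloud_y, window, 0.5)

# On a degenerate interval [t, t] the two values agree.
instant_x = betti0_function(cloud_x, [2.0], 0.5)
instant_y = betti0_function(cloud_y, [2.0], 0.5)
```

On the interval [0, pi] at scale 0.5, the first cloud yields a single component because the moving point touches both static points within the window, whereas the second yields two components. At a single instant the two values agree, so only the interval-based invariant tells the clouds apart.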

It is not difficult to check that if in and in , then . This monotonicity allows us to compare two Betti- functions of two different DMSs via . In particular, we have:

Theorem 2.6 (Stability of the Betti-0 function).

Let and be any two DMSs. Then,

(2)
Remark 2.7 (Comparison between the Betti-0 function and the 0-th CROCKER plot).

We remark that the 0-th CROCKER plots of the two DMSs in Example 2.5 are obtained by restricting their respective Betti-0 functions to the front diagonal vertical plane, which is colored brown in the middle picture of Figure 4. In particular, since the two underlying metric spaces are isometric at each time (see Definition 4.2 (2)), the two CROCKER plots are identical. This implies that, in comparison with the 0-th CROCKER plot, the Betti-0 function is a more sensitive invariant of a DMS.

Remark 2.8 (Sensitivity of the LHS in (2)).

Consider the DMSs and given as in Example 2.5. The value is at least (See Section 5.1 for details). This in turn implies that the metric is sensitive enough to discriminate (the Betti- functions of) and .

Figure 4: (The Betti- functions of the DMSs in Figure 1) The middle figure represents the domain (Figure 3) of and . (A) and (B) illustrate the value of and respectively on the horizontal half-planes (bottom) and (top). In particular, if , for all . The same properties hold for .

Also, for the inequalities in (1) and (2), we have

Proposition 2.9.

For any two DMSs and ,

(3)

This proposition implies that, in order to obtain a lower bound for the distance between two DMSs, computing the distance between the Betti-0 functions of the DMSs is better than computing the distance between their 0-th rank invariants. Indeed, the inequality in (3) can be strict (see Example 5.2). See Section A.2 for the proof of Proposition 2.9.

2.2 Relationship with standard stability theorems

Given a (static) finite metric space , define the DMS by declaring that for all , as a function . We refer to such as a constant DMS and simply write . In Remarks 2.10 and 2.11 below, we see that when restricting ourselves to the class of constant DMSs, Theorems 2.2 and 2.6 boil down to the well-known stability theorems for (static) metric spaces.

Let be a finite metric space. For each , we consider the function defined as555Notice that by a slight abuse of notation we are using the symbol to denote the rank function for both DMSs and static metric spaces.

(4)

where is the Rips complex of at the scale (Definition E.6).

Remark 2.10.

Consider any two constant DMSs and . Then, for any , inequality (1) reduces to:

(5)

This means that the LHS and the RHS of inequality (1) are respectively identical to the LHS and the RHS of inequality (5). See Remark 4.10.

We remark that, in comparison with the bottleneck distance between the -th persistence diagrams of the Rips filtrations of , the LHS of inequality (5) is a coarser lower bound for (twice) the Gromov-Hausdorff distance (Theorem E.8, Remark E.11).

Let be a finite metric space. For each , consider the graph on the vertex set , where if and only if . We define the Betti- function by sending each to the number of connected components of the graph .

Remark 2.11 (Stability of the Betti- function).

Consider any two constant DMSs and . Then, the inequality in (2) reduces to:

(6)

See Remark 4.10.

In particular, we also show that the LHS of the inequality in Remark 2.11 is always at least as fine as the bottleneck distance between the -th persistence diagrams of the Rips filtrations of . See Example 5.2 and Theorem E.13.

2.3 Computational complexity of

We clarify the computational complexity of the metric which appears in Theorems 2.2 and 2.6, and Remarks 2.10 and 2.11. In fact, each in those statements is a shorthand for , , and respectively in order. Here, the subscript in stands for the dimension of the indexing poset of integer-valued functions between which compare.

Theorem 2.12.

The expected cost of computing is at least . Furthermore, there is an algorithm that is based on ordinary binary search that matches this expected cost.

See Section A.3 for the precise statement of this theorem. In Section A.4 we clarify a connection between and the erosion distance by Patel [patel2018generalized]. Also in the same section, we compare with the dimension distance [dey2018computing, Section 4], and with the matching distance [cerri2013betti, cerri2011new, landi2018rank].

In Sections 3-5 we provide more precise descriptions of the aforementioned notions and theorems, together with examples.

3 Interleaving distance between integer-valued functions

In this section we consider the interleaving distance between monotonic integer-valued functions by regarding them as functors. In Section A.1, the complete definition of the interleaving distance will be provided. In Section A.3 we will discuss computational aspects of the interleaving distance between integer-valued functions.

Posets and their opposite.

Given any poset , we regard as the category: Objects are the elements of . Also, for any , there exists the unique morphism if and only if . Since there exists at most one morphism between any two elements of , the category is called thin and, any closed diagram in must commute. We sometimes consider the opposite category of , which will be denoted by . In the category , for , there exists the unique morphism if and only if .

Example 3.1 ().

Recall the collection of all finite closed intervals of (Section 2). We regard as poset, where the order is the inclusion . Hence, can be seen as the category of finite closed real intervals whose morphisms are inclusions.

Product of posets.

Given any two posets and , we assume by default that their product is equipped with the partial order defined as if and only if in and in .

Remark 3.2.

In the poset , we have if and only if and . We will regard as a subposet of the product poset via the identification . Indeed,

Poset-valued maps.

Let and be any two posets. Suppose that is any (monotonically) increasing map, i.e. for any in , . Then, by regarding as categories, can be regarded as a functor. On the other hand, suppose that is any (monotonically) decreasing map, i.e. for any in , . Then, can also be called a functor.

The interleaving distance between integer-valued functions.

Fix . Let be the poset, where in if and only if for each . For any , let . Consider any non-increasing integer-valued function . Note that can be regarded as a functor from the poset category to the other poset category . Since is a thin category, given another functor , the interleaving distance (Definition A.2) between and can be written as

We drop the subscript from when confusion is unlikely.
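The displayed formula is not reproduced in this text. As a hedged, one-dimensional and discretized sketch, assume that two non-increasing functions f and g are eps-interleaved when f(a + eps) <= g(a) and g(a + eps) <= f(a) for every a, and that the distance is the smallest such eps. Functions are stored as lists on the grid 0, 1, ..., n-1 and extended by their last value, shifts are integers, and the names are hypothetical. This is not the algorithm of Section A.3, only an illustration of why binary search applies: feasibility is monotone in eps.

```python
import math

def is_interleaved(f, g, eps):
    """Check f(a + eps) <= g(a) and g(a + eps) <= f(a) on the whole grid."""
    n = len(f)

    def value(h, a):
        # Extend h past the grid by its last (smallest) value.
        return h[min(a, n - 1)]

    return all(value(f, a + eps) <= g[a] and value(g, a + eps) <= f[a]
               for a in range(n))

def interleaving_distance(f, g):
    """Smallest integer shift making f and g interleaved, or infinity."""
    if len(f) != len(g):
        raise ValueError("f and g must be sampled on the same grid")
    n = len(f)
    if not is_interleaved(f, g, n):
        return math.inf  # the limiting values differ, so no shift works
    lo, hi = 0, n  # invariant: the shift hi is feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if is_interleaved(f, g, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

example = interleaving_distance([3, 2, 1, 1], [3, 3, 2, 1])
```

Because f and g are non-increasing, any shift larger than a feasible one is again feasible, which is what makes the binary search valid. In the example a shift of 1 suffices, while a shift of 0 would require the two functions to be equal.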

4 The distance between DMSs

DMSs.

A DMS stands for a pair of non-empty finite set with -parametrized metric : for each , a certain (pseudo-)metric is obtained. See Definition B.1 for details.

Example 4.1 ([kim2017stable]).

Examples of DMSs include:

  1. (Constant DMSs) Given a finite metric space , define the DMS by declaring that for all , as a function . We refer to such as a constant DMS and simply write .

  2. (Dynamic point clouds) A family of examples is given by points moving continuously inside an ambient metric space where particles are allowed to coalesce. If the trajectories are , then let and define the DMS as follows: for and , let We call a dynamic point cloud in and simply write or .

Weak and strong isomorphism between DMSs.

We introduce two different notions of isomorphism between DMSs.

Definition 4.2 (Isomorphism between DMSs).

Let be any two DMSs.

  1. and are strongly isomorphic if there exists a bijection such that is an isometry between and for all .

  2. and are weakly isomorphic if for each , is isometric to .

Any two strongly isomorphic DMSs are weakly isomorphic, but the converse is not true:

Example 4.3 (Weakly isomorphic DMSs).

The dynamic point clouds and described in Figure 1 are weakly isomorphic, but not strongly isomorphic: Indeed, there is no bijection between and which serves as an isometry for all .
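The distinction between the two notions can be checked by brute force on small examples. The sketch below is an illustration under stated assumptions: the clouds are hypothetical stand-ins for those of Figure 1, both clouds have the same number of points, and the conditions of Definition 4.2 are tested only at finitely many sampled times and up to a numerical tolerance `tol`.

```python
import math
from itertools import permutations

def distance_matrix(points):
    return [[abs(p - q) for q in points] for p in points]

def is_isometry(dx, dy, perm, tol=1e-9):
    """Does the bijection i -> perm[i] preserve all pairwise distances?"""
    n = len(dx)
    return all(abs(dx[i][j] - dy[perm[i]][perm[j]]) <= tol
               for i in range(n) for j in range(n))

def classify(cloud_x, cloud_y, times):
    """Return (weakly isomorphic, strongly isomorphic) on the sampled times."""
    common = set(permutations(range(len(cloud_x))))
    weak = True
    for t in times:
        dx = distance_matrix([x(t) for x in cloud_x])
        dy = distance_matrix([y(t) for y in cloud_y])
        found = {p for p in permutations(range(len(dx)))
                 if is_isometry(dx, dy, p)}
        weak = weak and bool(found)   # some isometry exists at time t
        common &= found               # bijections that work at every time so far
    return weak, weak and bool(common)

# Hypothetical stand-ins for the two clouds of Figure 1.
cloud_x = [lambda t: -1.0, lambda t: 1.0, lambda t: math.cos(t)]
cloud_y = [lambda t: -1.0, lambda t: 1.0, lambda t: abs(math.cos(t))]

times = [0.1 * k for k in range(64)]
weak, strong = classify(cloud_x, cloud_y, times)
```

For these clouds an isometry exists at each sampled time, but the bijection has to swap the two static points whenever the cosine is negative, so no single bijection works at all times.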

The distance between DMSs.

We review the extended metric for DMSs, which was introduced in [kim2017stable, Definition 9.13] under the name of -slack interleaving distance, for each . Throughout this paper, we fix for ease of notation. This choice is not significant because different choices of yield bilipschitz equivalent metrics for DMSs [kim2017stable, Proposition 11.29].

Definition 4.4.

Let . Given any map , by we denote the map defined as for all

In order to compare any two DMSs, we will utilize the notion of tripod:

Definition 4.5 (Tripod).

Let and be any two non-empty sets. For another set , any pair of surjective maps is called a tripod between and .

Given any map , let be any set and let be any map. Then, we define as

Definition 4.6 (Comparison of functions via tripods).

Consider any two maps and . Given a tripod between and , by

we mean for all .

For any , let . Recall Definition 2.1.

Definition 4.7 (Distortion of a tripod).

Let and be any two DMSs. Let be a tripod between and such that

(7)

We call any such an -tripod between and . Define the distortion of to be the infimum of for which is an -tripod.

In Definition 4.7, if is a -tripod, then is also a -tripod for any .

Definition 4.8 (The distance between DMSs).

Given any two DMSs and , we define

where the minimum ranges over all tripods between and .

We remark that is a hybrid between the Gromov-Hausdorff distance (Definition E.1) and the interleaving distance [bubenik2014categorification, CCG09] for Reeb graphs [de2016categorified].

A DMS is said to be bounded if there exists such that for all and all . For example, both DMSs given in Figure 1 are bounded.

Theorem 4.9 ([kim2017stable, Theorem 9.14]).

is an extended metric between DMSs modulo strong isomorphism (Definition 4.2 1). In particular, is a metric between bounded DMSs modulo strong isomorphism.

Remark 4.10 ( generalizes the Gromov-Hausdorff distance [kim2017stable, Remark 11.28]).

Given any two constant DMSs and , the metric recovers the Gromov-Hausdorff distance between and . Indeed, for any tripod between and , condition (7) reduces to

Therefore,

Remark 4.11.

From Remark 4.10, we conclude that the computation of is in general not tractable: on the class of constant DMSs the metric reduces to the Gromov-Hausdorff distance, whose computation is an NP-hard problem [agarwal2015computing, schmiedl2017computational].
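To make the cost concrete in the constant case, here is a brute-force sketch of the Gromov-Hausdorff distance between two tiny finite metric spaces. It assumes the standard normalization, namely half of the minimal distortion over all correspondences. It also uses the fact that a tripod between two sets determines a correspondence (the set of pairs of images), and that each correspondence yields a tripod via its two projections. The function name is hypothetical, and the enumeration is exponential in the product of the two cardinalities, so it is only usable for very small inputs.

```python
from itertools import product

def gromov_hausdorff(dx, dy):
    """Half the minimal distortion over all correspondences (brute force)."""
    n, m = len(dx), len(dy)
    pairs = [(i, j) for i in range(n) for j in range(m)]
    best = float("inf")
    for mask in range(1, 2 ** len(pairs)):
        relation = [p for bit, p in enumerate(pairs) if (mask >> bit) & 1]
        # A correspondence must project onto both spaces.
        if {i for i, _ in relation} != set(range(n)):
            continue
        if {j for _, j in relation} != set(range(m)):
            continue
        distortion = max(abs(dx[i][k] - dy[j][l])
                         for (i, j), (k, l) in product(relation, repeat=2))
        best = min(best, distortion)
    return best / 2

two_points_near = [[0.0, 1.0], [1.0, 0.0]]
two_points_far = [[0.0, 3.0], [3.0, 0.0]]
one_point = [[0.0]]

d_near_far = gromov_hausdorff(two_points_near, two_points_far)
d_near_single = gromov_hausdorff(two_points_near, one_point)
```

Already for two three-point spaces the loop visits 2^9 - 1 candidate relations, which gives a feel for why a direct search does not scale.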

5 Persistent homology features of a DMS

We extend ideas from persistent homology/single linkage hierarchical clustering method for metric spaces (Section E) to the setting of dynamic metric spaces (DMSs).

5.1 Betti-0 function for DMSs and its stability

Recall Definition 2.4. Given any DMS, we already observed in Section 2 that if in , then . This implies that is a functor . We extend to the poset in Remark 3.2 to ensure that we are in a position to utilize the metric :

Definition 5.1 ((Extended) Betti-0 function of a DMS).

Let be a DMS. We define the (extended) Betti-0 function of as

It is not difficult to check that is indeed a functor . Hence, we can compare any two Betti-0 functions of DMSs via the interleaving distance (see Remark A.3). In particular, we have Theorem 2.6. We prove Theorem 2.6 in Section C.2. Also, we remark that when is a constant DMS (Example 4.1 1), is constant with respect to the first two factors.

Details about Remark 2.8.

For , if and , we will denote by , regarding it as the element of . We show that is at least . Observe that

Hence, the (geometric realization of) Rips complexes and are illustrated in Figure 5.

Figure 5:

By counting the number of connected components of these complexes, we have