Stability and tractability of TDA for studying metric spaces.
Finite point clouds or finite metric spaces are amongst the most common data representations considered in topological data analysis (TDA) [Carl09, edelsbrunner2008persistent, ghrist2008barcodes]. In particular, the stability of the Single Linkage Hierarchical Clustering (SLHC) method [clustum] and of the persistent homology of filtered Rips complexes built on metric spaces [dghrips, chazal2014persistence] motivates adopting these constructions when studying metric spaces arising in applications.
Whereas there have been extensive applications of TDA to static metric data (thanks to the aforementioned theoretical underpinnings), there has been little study of dynamic metric data from the TDA perspective. Our motivation for considering dynamic metric data stems from the study and characterization of flocking/swarming behavior of animals [benkert2008reporting, gudmundsson2006computing, gudmundsson2007efficient, huang2008modeling, li2010swarm, parrish1997animal, sumpter-collective, vieira2009line], convoys [jeung2008discovery], moving clusters [kalnis2005discovering], or mobile groups [hwang2005mining, wang2008efficient]. In this paper, by extending ideas from [clustum, dghrips, chazal2014persistence, kim2017stable, kim2018CCCG], we aim to establish a TDA framework for the study of dynamic metric spaces (DMSs) that comes with stability theorems. We begin by describing relevant work and comparing it with ours.
Lack of an adequate metric for DMSs.
In [munch2013applications], Munch considers vineyards, a certain notion of time-varying persistence diagrams introduced by Cohen-Steiner et al. [CEM06], as signatures for dynamic point clouds. Munch, in particular, shows that vineyards are stable [cohen2007stability] under perturbations of the input dynamic point cloud [munch2013applications, Theorem 17], where stability is measured by a certain notion of distance arising from the integration over time of the bottleneck distance between the instantaneous persistence diagrams. However, we will observe below that, for the purpose of comparing two DMSs (which we regard as models of flocking behaviors), the metrics that directly arise as the integration of the Hausdorff or Gromov-Hausdorff distance can sometimes fail to be discriminative enough (see Example 4.3 and Remark 2.8).
In [topaz], Halverson, Topaz and Ziegelmeier study aggregation models for biological systems by adopting ideas from TDA. They show that topological analysis of aggregation reveals dynamical events which are not captured by classical analysis methods. Specifically, in order to extract insights about the global behavior of dynamic point clouds obtained by simulating aggregation models, they employ the so-called CROCKER plot (CROCKER stands for Contour Realization Of Computed $k$-dimensional hole Evolution in the Rips complex). This plot represents the evolution of Betti numbers of Rips complexes over the plane of time and scale parameters. In [topaz_new], Topaz, Ulmer and Ziegelmeier discretize CROCKER plots as matrices and use the Frobenius norm to compare any two such matrices. In [topaz, topaz_new], the authors do not provide stability results for CROCKER plots derived from biological aggregation models.
Motivation for introducing a new metric for DMSs.
Consider the two dynamic point clouds and illustrated as in Figure 1. Let us regard them as instances of DMS with the time-dependent metrics obtained by restricting the Euclidean metric on at each time .
Observe that for each time , the metric spaces and are isometric and hence the Gromov-Hausdorff distance [burago, Ch.7] is zero. This in turn implies that the integral is also zero, implying that and are not distinguished from each other by the integrated Gromov-Hausdorff distance. (In [munch2013applications], in order to compare two dynamic point clouds, Munch considered the integrated Hausdorff distance over time. Since the metric takes into account the relative position of two dynamic point clouds inside an ambient metric space, we do not consider utilizing for the purpose of comparing intrinsic behaviors of two dynamic metric data. Also, Munch considered the integrated bottleneck distance by computing the Rips filtrations of dynamic point clouds at each time. However, by [dghrips, Theorem 3.1], the metric is upper-bounded by (twice) the integrated Gromov-Hausdorff distance, which in this case vanishes. Therefore, does not discriminate the two dynamic point clouds given as in Figure 1.)
However, regarding and as models of collective behaviors of animals, vehicles or people, and are clearly distinct from each other. This motivates us to seek an adequate metric that measures the difference between the dynamics underlying any two given DMSs. In particular, this metric should not be a mere sum of instantaneous differences of the given DMSs over time.
In this paper, we adopt , called the -slack interleaving distance with (Definition 4.8, originally introduced in [kim2017stable]), as a measure of the behavioral difference between DMSs. In Section 2, we specifically show that the metric returns a positive value for the pair of DMSs and in Figure 1, demonstrating its sensitivity.
About stability and tractability of .
Even though the metric is able to differentiate subtly different DMSs (Theorem 4.9), computing is not tractable in general (Remark 4.11). This hinders us from utilizing in practice. Therefore, as a pragmatic approach, we adopt the comparison of invariants of DMSs, rather than directly comparing DMSs. To this end,
(a) the invariants must be stable under perturbations of the input DMS, and
(b) the metric for comparing two invariants extracted from two DMSs must be efficiently computable.
In this work, we achieve both items (a) and (b) above, described as follows.
With regard to (a), we first extract invariants from a given DMS, where these invariants are in the form of 3-dimensional persistence modules of sets or vector spaces. These are obtained from a blend of ideas related to the Rips filtration [cohen2007stability, dghrips, comptopo-herbert], the single linkage hierarchical clustering (SLHC) method [clustum], and the interlevel set persistence/categorified Reeb graphs [bendich2013homology, botnan2018algebraic, carlsson2009zigzag, de2016categorified].
We prove the stability of these invariants by adapting ideas from [clustum, dghrips, chazal2014persistence]. We emphasize that our stability results generalize the well-known stability theorems for the SLHC method [clustum] and the Rips filtration of a metric space [dghrips, chazal2014persistence]: indeed, we show that by restricting ourselves to the class of constant DMSs, our results reduce to the standard stability theorems for static metric spaces in [clustum, dghrips, chazal2014persistence].
Next, in regard to item (b) above, we address the issue of computability of the metric between invariants of DMSs. In [bjerkevik2018computing, bjerkevik2017computational], Bjerkevik and Botnan show that computing the interleaving distance [lesnick] between multidimensional persistence modules can in general be NP-hard. Also, since we are not guaranteed to have interval decomposability [botnan2018algebraic, carlsson2009theory] of the -dimensional modules considered in this paper, we are not in a position to utilize the bottleneck distance and relevant algorithms developed by Dey and Xin [dey2018computing] instead of .
This motivates us to further simplify our invariant associated to a DMS , which is in the form of -dimensional persistence module. We focus on both the dimension function and the rank function. The dimension function of a persistence module has been studied in various contexts and under various names such as Betti curve, feature counting function, etc. [babichev2018robust, dey2018computing, giusti2016two, giusti2015clique, kahle2013limit, scolamiero2017multidimensional]. The rank function of has also been extensively considered [carlsson2009theory, cerri2013betti, landi2018rank, patel2018generalized, puuska2017erosion]. We observe that both of these functions (1) can themselves be computed in polynomial time, (2) can be compared to each other via the interleaving distance for integer-valued functions (see Section 3) and (3) are stable to perturbations of under . We also propose a simple algorithm for computing in poly-time (Section A.3). Therefore, we can bound the distance in poly-time by computing and either or .
We in particular emphasize that our method for computing provides a poly-time algorithm for bounding from below the interleaving distance between -dimensional persistence modules of vector spaces without any restriction on or on the structure of (even if is not derived from a DMS).
Other related work.
Aiming at analyzing/summarizing trajectory data such as the movement of animals, vehicles, and people, Buchin et al. introduce the notion of trajectory grouping structure [buchin2013trajectory]. This is a summarization, in the form of a labeled Reeb graph, of a set of points having piecewise linear trajectories with time-stamped vertices in Euclidean space . This work was subsequently enriched in [kostitsyna2015trajectory, van2016grouping, van2015central, van2016refined].
In [kim2018CCCG, kim2017stable], the thread of ideas in [buchin2013trajectory] is blended with ideas in zigzag persistence theory [zigzag]. Specifically, particular cases of trajectory grouping structure in [buchin2013trajectory], are named formigrams. By clarifying the zigzag persistence structure of formigrams, formigrams are further summarized into barcodes. Regarding the barcode as a signature of a set of trajectory data, the authors of [kim2018CCCG, kim2017stable] utilize these barcodes for carrying out the classification task of a family of synthetic flocking behaviors [zane].
The central results in [kim2018CCCG, kim2017stable] show that barcodes or formigrams obtained from trajectory data are stable to perturbations of the input data [kim2018CCCG, Theorem 5], [kim2017stable, Theorem 9.21]. This work is a sequel to [kim2018CCCG, kim2017stable]. Namely, by considering Rips-like filtrations parametrized both by time intervals and spatial scale, we obtain novel stability results in every homological dimension.
Other work utilizing TDA-like ideas in the analysis of dynamic data includes: a study of time-varying merge trees or time-varying Reeb graphs [edelsbrunner2008time, oesterling2015computing]. Also, ideas of persistent homology are utilized in the study of time-varying graphs [hajij2018visual], discretely sampled dynamical systems [bauer2017persistence, edelsbrunner2015persistent] or in the study of combinatorial dynamical systems [dey2018persistent].
FM thanks Justin Curry and Amit Patel for beneficial discussions. This work was partially supported by NSF grants IIS-1422400, CCF-1526513, DMS-1723003, and CCF-1740761.
2 Overview of our main results.
In this section we summarize the main results of this paper without technical details.
Throughout this paper, we fix a certain field and only consider vector spaces over whenever they arise. Any simplicial homology has coefficients in . By and , we denote the set of non-negative integers with and the set of non-negative reals with , respectively. Also, let be the collection of all finite closed intervals of . See Figure 2.
2.1 Stability theorems for Persistent homology invariants of DMSs
Spatiotemporal Rips filtration of a DMS.
A DMS stands for a pair of finite set with -parametrized metric : for each , a certain (pseudo-)metric is obtained. See Definition B.1 for details.
Definition 2.1 (Time-interlevel analysis of a DMS).
Suppose that a DMS is given. Define the function as
Observe that if are both in , then . We construct the -parameter simplicial filtration , called the spatiotemporal Rips filtration of , described in Figure 3. By applying -th homology to this filtration, we obtain -dimensional persistence module . (Notice that this is a blend of ideas related to the Rips filtration [cohen2007stability, dghrips, comptopo-herbert] and the interlevel set persistence/categorified Reeb graphs [bendich2013homology, botnan2018algebraic, carlsson2009zigzag, de2016categorified].)
The rank invariant of a DMS.
We denote the rank invariant [carlsson2009theory] of this multidimensional persistence module by and call it the -th rank invariant of (Definition 5.5). More precisely, given a pair , with and , we define to be the rank of the linear map
We have the following stability theorems for the map taking a DMS to its rank invariant function.
Theorem 2.2 (Stability of the rank invariant of DMSs).
Let and be any two DMSs. For any , let and be the -th rank invariant of and , respectively. Then, we have:
Above, is an interleaving type distance between rank invariants — See Section 3 for its definition.
Relationship between and the CROCKER plot [topaz].
We relate the rank invariant of a DMS to the CROCKER plot of [topaz]:
Definition 2.3 (The CROCKER plots of a DMS [topaz]).
Let be a DMS. For , the -th CROCKER plot of is a map sending to the dimension of the vector space .
Let be any DMS. Note that for any time and scale , the value of associated to the repeated pair is identical to the dimension of the vector space , i.e. . This implies that is an enriched version of the -th CROCKER plot of , and thus Theorem 2.2 can be interpreted as establishing the stability of the CROCKER plots of a DMS.
Improvement for .
By restricting ourselves to clustering information (i.e. -th homology) of DMSs, we obtain a stronger lower bound for the metric .
Definition 2.4 (The Betti-0 function of a DMS).
Let be a DMS. We define the Betti-0 function of by sending each to the dimension of .
It is not difficult to check that if in and in , then . This monotonicity allows us to compare two Betti- functions of two different DMSs via . In particular, we have:
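The monotonicity just stated can be checked numerically on a DMS sampled at finitely many times. The following is a minimal sketch, not the construction of this paper. It assumes (since the formula of Definition 2.1 is not reproduced above) that the function attached to a time interval is the minimum, over the times in that interval, of the instantaneous distances, and that two points are joined at a given scale when this minimum is at most that scale. The names `interval_min_metric` and `betti0`, the list-of-matrices encoding, and the sample data are all hypothetical.

```python
def interval_min_metric(dms, i, j):
    """Entrywise minimum of the distance matrices dms[i], ..., dms[j].

    ASSUMPTION: this stands in for the function of Definition 2.1 on the
    sampled time interval [i, j]; it need not satisfy the triangle inequality.
    """
    n = len(dms[0])
    return [[min(dms[t][a][b] for t in range(i, j + 1)) for b in range(n)]
            for a in range(n)]


def betti0(dms, i, j, delta):
    """Number of connected components of the graph that joins a and b
    whenever the interval-minimum distance is at most delta (union-find)."""
    d = interval_min_metric(dms, i, j)
    n = len(d)
    parent = list(range(n))

    def find(a):
        # Find the representative of a, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(n):
        for b in range(a + 1, n):
            if d[a][b] <= delta:
                parent[find(a)] = find(b)
    return len({find(a) for a in range(n)})


# Three points observed at two time samples (hypothetical data).
dms = [
    [[0, 1, 4], [1, 0, 4], [4, 4, 0]],  # time index 0: points 0 and 1 are close
    [[0, 4, 4], [4, 0, 1], [4, 1, 0]],  # time index 1: points 1 and 2 are close
]
print(betti0(dms, 0, 0, 1))  # degenerate interval [0, 0]: prints 2
print(betti0(dms, 0, 1, 1))  # larger interval [0, 1]: prints 1
```

Enlarging the interval or the scale can only merge components in this sketch, which matches the monotonicity above. On a degenerate interval the value is simply the instantaneous component count at that time.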
Theorem 2.6 (Stability of the Betti-0 function).
Let and be any two DMSs. Then,
Remark 2.7 (Comparison between the Betti- function and the -th CROCKER plot).
We remark that the -th CROCKER plots of are obtained by respectively restricting and to the front diagonal vertical plane , which is colored brown in the middle picture of Figure 4. In particular, since the two metric spaces and are isometric at each time (see Definition 4.2, item 2), the two CROCKER plots and are identical. This implies that, in comparison with the -th CROCKER plot, the Betti- function is a more sensitive invariant of a DMS.
Remark 2.8 (Sensitivity of the LHS in (2)).
For any two DMSs and ,
This proposition implies that, in order to obtain a lower bound for between two DMSs, computing the distance between the Betti- functions of the DMSs is better than computing the distance between their -th rank invariants. Indeed, the inequality in (3) can be strict (see Example 5.2). See Section A.2 for the proof of Proposition 2.9.
2.2 Relationship with standard stability theorems
Given a (static) finite metric space , define the DMS by declaring that for all , as a function . We refer to such as a constant DMS and simply write . In Remarks 2.10 and 2.11 below, we see that when restricting ourselves to the class of constant DMSs, Theorems 2.2 and 2.6 boil down to the well-known stability theorems for (static) metric spaces.
Let be a finite metric space. For each , we consider the function (by a slight abuse of notation, we use the symbol to denote the rank function for both DMSs and static metric spaces) defined as
where is the Rips complex of at the scale (Definition E.6).
We remark that, in comparison with the bottleneck distance between the -th persistence diagrams of the Rips filtrations of , the LHS of inequality (5) is a coarser lower bound for (twice) the Gromov-Hausdorff distance (Theorem E.8, Remark E.11).
Let be a finite metric space. For each , consider the graph on the vertex set , where if and only if . We define the Betti- function by sending each to the number of connected components of the graph .
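For a static finite metric space, the Betti-0 function described above amounts to counting connected components of a threshold graph. The sketch below assumes that two points are adjacent when their distance is at most the scale parameter (the exact condition is not reproduced above). The function name `betti0_static` and the sample matrix are hypothetical.

```python
from collections import deque


def betti0_static(dist, delta):
    """Count connected components of the graph on {0, ..., n-1} in which
    a and b are adjacent when dist[a][b] <= delta (ASSUMED edge rule)."""
    n = len(dist)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        # Each unseen vertex starts a new component; explore it by BFS.
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in range(n):
                if not seen[b] and dist[a][b] <= delta:
                    seen[b] = True
                    queue.append(b)
    return components


# Hypothetical three-point metric space.
dist = [[0, 1, 5], [1, 0, 5], [5, 5, 0]]
curve = [betti0_static(dist, delta) for delta in (0, 1, 5)]
print(curve)  # prints [3, 2, 1]
```

The resulting values are non-increasing in the scale, which is the property that lets such functions be compared through the interleaving distance of Section 3.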
Remark 2.11 (Stability of the Betti- function).
2.3 Computational complexity of
We clarify the computational complexity of the metric which appears in Theorems 2.2 and 2.6, and Remarks 2.10 and 2.11. In fact, each in those statements is a shorthand for , , and respectively in order. Here, the subscript in stands for the dimension of the indexing poset of integer-valued functions between which compare.
The expected cost of computing is at least . Furthermore, there is an algorithm that is based on ordinary binary search that matches this expected cost.
See Section A.3 for the precise statement of this theorem. In Section A.4 we clarify a connection between and the erosion distance by Patel [patel2018generalized]. Also in the same section, we compare with the dimension distance [dey2018computing, Section 4], and with the matching distance [cerri2013betti, cerri2011new, landi2018rank].
3 Interleaving distance between integer-valued functions
In this section we consider the interleaving distance between monotonic integer-valued functions by regarding them as functors. In Section A.1, the complete definition of the interleaving distance will be provided. In Section A.3 we will discuss computational aspects of the interleaving distance between integer-valued functions.
Posets and their opposite.
Given any poset , we regard as a category whose objects are the elements of . Also, for any , there exists a unique morphism if and only if . Since there exists at most one morphism between any two elements of , the category is called thin, and any closed diagram in must commute. We sometimes consider the opposite category of , which will be denoted by . In the category , for , there exists a unique morphism if and only if .
Example 3.1 ().
Recall the collection of all finite closed intervals of (Section 2). We regard as a poset, where the order is the inclusion . Hence, can be seen as the category of finite closed real intervals whose morphisms are inclusions.
Product of posets.
Given any two posets and , we assume by default that their product is equipped with the partial order defined as if and only if in and in .
In the poset , we have if and only if and . We will regard as a subposet of the product poset via the identification . Indeed,
Let and be any two posets. Suppose that is any (monotonically) increasing map, i.e. for any in , . Then, by regarding as categories, can be regarded as a functor. On the other hand, suppose that is any (monotonically) decreasing map, i.e. for any in , . Then, can also be called a functor.
The interleaving distance between integer-valued functions.
Fix . Let be the poset, where in if and only if for each . For any , let . Consider any non-increasing integer-valued function . Note that can be regarded as a functor from the poset category to the other poset category . Since is a thin category, given another functor , the interleaving distance (Definition A.2) between and can be written as
We drop the subscript from when confusion is unlikely.
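To make the comparison of non-increasing integer-valued functions concrete, here is a minimal one-parameter sketch on a uniform grid. It assumes that the interleaving distance between two such functions f and g is the smallest shift ε for which f(a + ε) ≤ g(a) and g(a + ε) ≤ f(a) hold for all a; this reading is an assumption, as the displayed formula is not reproduced above. It further assumes both functions are constant beyond the last grid point. The result is expressed in grid steps, so it is a discretized value only. The function names are hypothetical, and the binary search is in the spirit of the one mentioned in Section 2.3 rather than a transcription of the algorithm in Section A.3.

```python
import math


def is_interleaved(f, g, k):
    """Check f[i + k] <= g[i] and g[i + k] <= f[i] for every grid index i.

    Indices past the end are clipped, i.e. the functions are ASSUMED to be
    constant beyond the last grid point.
    """
    n = len(f)
    for i in range(n):
        j = min(i + k, n - 1)
        if f[j] > g[i] or g[j] > f[i]:
            return False
    return True


def interleaving_distance_1d(f, g):
    """Smallest shift k (in grid steps) for which f and g are interleaved.

    For non-increasing f and g, feasibility at k implies feasibility at
    k + 1, so binary search over k is valid. Returns math.inf when no
    shift on the grid works.
    """
    if len(f) != len(g):
        raise ValueError("f and g must be sampled on the same grid")
    n = len(f)
    if not is_interleaved(f, g, n - 1):
        return math.inf
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_interleaved(f, g, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Two hypothetical non-increasing integer-valued functions on five grid points.
f = [3, 3, 2, 1, 1]
g = [3, 2, 1, 1, 1]
print(interleaving_distance_1d(f, g))  # prints 1
```

Each feasibility check costs time linear in the grid size, and the search performs logarithmically many checks. In several parameters the same idea applies with shifts along the diagonal direction, at the cost of a larger grid.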
4 The distance between DMSs
A DMS stands for a pair of non-empty finite set with -parametrized metric : for each , a certain (pseudo-)metric is obtained. See Definition B.1 for details.
Example 4.1 ([kim2017stable]).
Examples of DMSs include:
(Constant DMSs) Given a finite metric space , define the DMS by declaring that for all , as a function . We refer to such as a constant DMS and simply write .
(Dynamic point clouds) A family of examples is given by points moving continuously inside an ambient metric space where particles are allowed to coalesce. If the trajectories are , then let and define the DMS as follows: for and , let We call a dynamic point cloud in and simply write or .
Weak and strong isomorphism between DMSs.
We introduce two different notions of isomorphism between DMSs.
Definition 4.2 (Isomorphism between DMSs).
Let be any two DMSs.
and are strongly isomorphic if there exists a bijection such that is an isometry between and for all .
and are weakly isomorphic if for each , is isometric to .
Any two strongly isomorphic DMSs are weakly isomorphic, but the converse is not true:
Example 4.3 (Weakly isomorphic DMSs).
The dynamic point clouds and described in Figure 1 are weakly isomorphic, but not strongly isomorphic: indeed, there is no bijection between and which serves as an isometry for all .
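The gap between the two notions in Definition 4.2 can be illustrated by brute force on very small examples. The sketch below works with DMSs sampled at finitely many common times and encoded as lists of distance matrices, and it tries every bijection, so it is only feasible for a handful of points. The function names and the two three-point DMSs are hypothetical; they are not the dynamic point clouds of Figure 1.

```python
from itertools import permutations


def is_isometry(dx, dy, perm, tol=1e-9):
    """True if the bijection a -> perm[a] preserves all pairwise distances."""
    n = len(dx)
    return all(abs(dx[a][b] - dy[perm[a]][perm[b]]) <= tol
               for a in range(n) for b in range(n))


def weakly_isomorphic(X, Y):
    """At each sampled time, SOME bijection is an isometry (it may vary)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must be sampled at the same times")
    for dx, dy in zip(X, Y):
        if len(dx) != len(dy):
            return False
        if not any(is_isometry(dx, dy, p)
                   for p in permutations(range(len(dx)))):
            return False
    return True


def strongly_isomorphic(X, Y):
    """ONE bijection is an isometry at every sampled time."""
    if len(X) != len(Y):
        raise ValueError("X and Y must be sampled at the same times")
    n = len(X[0])
    if any(len(dy) != n for dy in Y):
        return False
    return any(all(is_isometry(dx, dy, p) for dx, dy in zip(X, Y))
               for p in permutations(range(n)))


# Hypothetical data: the close pair is {0, 1} in T and {0, 2} in T_swapped.
T = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
T_swapped = [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
X = [T, T]          # the close pair never changes
Y = [T, T_swapped]  # the close pair changes between the two times

print(weakly_isomorphic(X, Y))    # prints True
print(strongly_isomorphic(X, Y))  # prints False
```

At each time the two spaces are isometric triangles, but no single relabeling works at both times, since the point that is far from the other two stays the same in X and changes in Y.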
The distance between DMSs.
We review the extended metric for DMSs, which was introduced in [kim2017stable, Definition 9.13] under the name of -slack interleaving distance, for each . Throughout this paper, we fix for ease of notation. This choice is not significant because different choices of yield bilipschitz equivalent metrics for DMSs [kim2017stable, Proposition 11.29].
Let . Given any map , by we denote the map defined as for all
In order to compare any two DMSs, we will utilize the notion of tripod:
Definition 4.5 (Tripod).
Let and be any two non-empty sets. For another set , any pair of surjective maps is called a tripod between and .
Given any map , let be any set and let be any map. Then, we define as
Definition 4.6 (Comparison of functions via tripods).
Consider any two maps and . Given a tripod between and , by
we mean for all .
For any , let . Recall Definition 2.1.
Definition 4.7 (Distortion of a tripod).
Let and be any two DMSs. Let be a tripod between and such that
We call any such an -tripod between and . Define the distortion of to be the infimum of for which is an -tripod.
In Definition 4.7, if is a -tripod, then is also a -tripod for any .
Definition 4.8 (The distance between DMSs).
Given any two DMSs and , we define
where the minimum ranges over all tripods between and .
We remark that is a hybrid between the Gromov-Hausdorff distance (Definition E.1) and the interleaving distance [bubenik2014categorification, CCG09] for Reeb graphs [de2016categorified].
Any DMS is said to be bounded if there exists such that for all and all For example, both DMSs given in Figure 1 are bounded.
Theorem 4.9 ([kim2017stable, Theorem 9.14]).
Remark 4.10 ( generalizes the Gromov-Hausdorff distance [kim2017stable, Remark 11.28]).
Given any two constant DMSs and , the metric recovers the Gromov-Hausdorff distance between and . Indeed, for any tripod between and , condition (7) reduces to
From Remark 4.10, we conclude that the computation of is in general not tractable: on the class of constant DMSs the metric reduces to the Gromov-Hausdorff distance, whose computation is an NP-hard problem [agarwal2015computing, schmiedl2017computational].
5 Persistent homology features of a DMS
We extend ideas from persistent homology/single linkage hierarchical clustering method for metric spaces (Section E) to the setting of dynamic metric spaces (DMSs).
5.1 Betti- function for DMSs and its stability
Recall Definition 2.4. Given any DMS, we already observed in Section 2 that if in , then . This implies that is a functor . We extend to the poset in Remark 3.2 to ensure that we are in a position to utilize the metric :
Definition 5.1 ((Extended) Betti-0 function of a DMS).
Let be a DMS. We define the (extended) Betti-0 function of as
It is not difficult to check that is indeed a functor . Hence, we can compare any two Betti-0 functions of DMSs via the interleaving distance (see Remark A.3). In particular, we have Theorem 2.6. We prove Theorem 2.6 in Section C.2. Also, we remark that when is a constant DMS (Example 4.1 1), is constant with respect to the first two factors.