In this manuscript we address an important question in topological data analysis (TDA), namely, the study of the weak convergence of persistent Betti numbers
where , for , , and where is either a binomial or a Poisson process with intensity function on the unit cube and . is a smooth intensity function which is not necessarily constant.
So far, there exist results on the pointwise asymptotic normality for Betti numbers in the case of a homogeneous Poisson process or a binomial process, see Yogeshwaran et al. (2017). In the case of a homogeneous Poisson process this result was extended to persistent Betti numbers by Hiraoka et al. (2018). We will use this latter result to derive the pointwise asymptotic normality for the general case. The recent contributions Owada (2018) and Owada and Thomas (2018) also study the limiting behavior of Betti numbers.
TDA is a comparatively young field that has emerged from several contributions in algebraic topology and computational geometry. Milestone contributions which helped to popularize TDA in its early days are Edelsbrunner et al. (2000), Zomorodian and Carlsson (2005) and Carlsson (2009). TDA consists of various techniques which aim at understanding the topology of a
-dimensional manifold based on an approximating point cloud. In practice, we can think of a probability distribution, whose topological properties are of interest, and a sample from this distribution. The various methods of TDA have been successfully implemented in applied sciences such as biology (Yao et al. (2009)), materials science (Lee et al. (2017)) or chemistry (Nakamura et al. (2015)). From the mathematical statistician’s point of view, the application of TDA to time series deserves particular interest, see, e.g., the pioneering works of Seversky et al. (2016), Umeda (2017) and the contributions of Gidea et al. to financial time series (Gidea and Katz (2018); Gidea (2017); Gidea et al. (2018)).
The present contribution falls into the area of persistent homology, which is one of the major tools in TDA. We can only give a short introduction to this topic here; a more detailed introduction, which offers insights into the basic concepts, ideas and applications of persistent homology, can be found in Chazal and Michel (2017), Oudot (2015) and Wasserman (2018).
The basic ingredient for the study of persistent Betti numbers is a realization of a point cloud in (a sample of a point process) and simplicial complexes built from this point cloud according to a rule which describes the neighborhood relation between points. The two most frequent simplicial complex models are the Rips-Vietoris and the Čech complex. When considered as geometric structures, simplicial complexes are characterized by the number of their -dimensional holes, most notably connected components, loops and cavities (0, 1 and 2 dimensional features). These holes are precisely defined with a tool from algebraic topology, the so-called homology. The th homology of a simplicial complex is determined by a quotient space. Its dimension is the so-called th Betti number. Intuitively, the th Betti number counts the number of -dimensional holes in the simplicial complex.
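The description of Betti numbers as dimensions of quotient spaces can be made concrete on a small example. The following is a minimal sketch, assuming coefficients in the two-element field and a simplicial complex given as a list of vertex tuples that is closed under taking faces; the function names `rank_gf2` and `betti_numbers` are illustrative and do not come from the manuscript. It uses the rank formula: the dimension of the cycle group minus the dimension of the boundary group.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are encoded as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # eliminate the leading bit and continue
    return len(basis)


def betti_numbers(simplices, max_dim):
    """Betti numbers 0..max_dim of a face-closed simplicial complex over GF(2)."""
    by_dim = {
        q: sorted({tuple(sorted(s)) for s in simplices if len(s) == q + 1})
        for q in range(max_dim + 2)
    }
    index = {q: {s: i for i, s in enumerate(by_dim[q])} for q in by_dim}

    def boundary_rank(q):
        # Rank of the boundary map from q-chains to (q-1)-chains.
        if q == 0 or not by_dim[q]:
            return 0
        rows = []
        for s in by_dim[q]:
            mask = 0
            for face in combinations(s, q):  # the (q-1)-dimensional faces of s
                mask |= 1 << index[q - 1][face]
            rows.append(mask)
        return rank_gf2(rows)

    # dim(cycles) - dim(boundaries) = (#q-simplices - rank d_q) - rank d_{q+1}
    return [
        len(by_dim[q]) - boundary_rank(q) - boundary_rank(q + 1)
        for q in range(max_dim + 1)
    ]


# A hollow triangle has one connected component and one loop.
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(betti_numbers(hollow, 1))                # [1, 1]
# Filling the triangle removes the loop.
print(betti_numbers(hollow + [(0, 1, 2)], 1))  # [1, 0]
```

The choice of the two-element field keeps the linear algebra to bit operations; other base fields would require a different rank routine.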
For a given simplicial complex model, we can construct an increasing sequence of simplicial complexes which is indexed by one parameter that can be understood as time, a so-called filtration. Given a filtration on a finite time interval, we can consider the evolution of the th homology groups, i.e., the dynamical behavior of the Betti numbers. As the underlying simple point process (e.g., a Poisson process on the Euclidean space) is random, these Betti numbers are also random, so that we consider a stochastic process.
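The randomness enters only through the point cloud. As a rough sketch of the two sampling schemes mentioned in the introduction, the code below draws a binomial sample (a fixed number of i.i.d. points) and a Poisson sample (a Poisson number of i.i.d. points) on the unit cube from a non-constant density. The rejection sampler, the bound `kappa_max` and all function names are assumptions made for illustration; the manuscript does not prescribe a simulation method.

```python
import random


def sample_density(kappa, kappa_max, d, rng):
    """One point from the density kappa on [0,1]^d by rejection sampling.

    Assumes kappa(x) <= kappa_max on the cube (hypothetical bound supplied by the user).
    """
    while True:
        x = tuple(rng.random() for _ in range(d))
        if rng.random() * kappa_max <= kappa(x):
            return x


def poisson_count(mean, rng):
    """Poisson(mean) variate: number of unit-rate exponential arrivals up to time `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def binomial_process(n, kappa, kappa_max, d, rng):
    """n i.i.d. points with density kappa on the unit cube."""
    return [sample_density(kappa, kappa_max, d, rng) for _ in range(n)]


def poisson_process(n, kappa, kappa_max, d, rng):
    """Poisson process with intensity n * kappa: Poisson(n) many i.i.d. points."""
    return binomial_process(poisson_count(n, rng), kappa, kappa_max, d, rng)


rng = random.Random(0)
kappa = lambda x: 2.0 * x[0]  # a non-constant density on the unit square
cloud = binomial_process(50, kappa, 2.0, 2, rng)
print(len(cloud))  # 50
```

The Poisson sample relies on the standard fact that a Poisson process with intensity n times a probability density has a Poisson(n) number of points which, given their number, are i.i.d. with that density.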
From the applied point of view, the mere knowledge of the evolution of the Betti numbers is often not enough, especially when considering objects obtained from persistence diagrams, such as persistence landscapes. In this context the more general concept of persistent Betti numbers is the appropriate tool.
The remainder of this manuscript is organized as follows. In Section 2 we introduce the framework and describe our results. Section 3 offers a short overview of related results which are important and needed in our present study. Section 4 contains the main results of this manuscript, which are derived in detail in Section 5 and in Appendix A.
2 Notation and description of the results
Given a finite subset of the Euclidean space the Čech filtration and the Rips-Vietoris filtration are defined as
here and is the diameter of a measurable set. Throughout this article, we consider either the Čech or the Rips-Vietoris filtration and write for the underlying filtration. Furthermore, if we refer to a generic simplicial complex, we write for simplicity.
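For the Rips-Vietoris filtration the construction is easy to sketch. The code below assumes the convention that a simplex belongs to the complex at time r as soon as its diameter (the largest pairwise distance of its vertices) is at most r; the manuscript's exact parametrization may differ by a constant factor. The function name `rips_complex` and the cap `max_dim` are illustrative. The Čech complex requires an additional smallest-enclosing-ball test and is not shown.

```python
from itertools import combinations
from math import dist


def rips_complex(points, r, max_dim=2):
    """Simplices (as tuples of point indices) of dimension <= max_dim with diameter <= r."""
    n = len(points)
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices form a (k-1)-simplex
        for idx in combinations(range(n), k):
            if all(dist(points[i], points[j]) <= r for i, j in combinations(idx, 2)):
                simplices.append(idx)
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(rips_complex(square, 1.0)))  # 8: four vertices and the four sides
print(len(rips_complex(square, 1.5)))  # 14: four vertices, six edges, four triangles
```

Increasing r only adds simplices, which is the monotonicity that makes the family a filtration.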
Consider a filtration and a time . Write for the homology of the simplicial complex w.r.t. the base field . Write for the th cycle group of the simplicial complex and for the th boundary group. Let . Then the -persistent Betti number (see Edelsbrunner et al. (2000)) of a simplicial complex is defined by
The persistent Betti number is closely related to the persistence diagram of the underlying point cloud , see Hiraoka et al. (2018) for further details. Visually, counts the number of generators of the persistence diagram that are born before time and are still alive at time , see Figure 1. The Betti number is then defined as , ; this means .
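The counting interpretation can be illustrated with the standard column reduction of the boundary matrix over the two-element field. This is a minimal sketch under stated assumptions: the filtration is a list of (time, simplex) pairs in which every face appears no later than the simplex itself, a class is counted as alive at time s if its death time is strictly larger than s, and the names `persistence_pairs` and `persistent_betti` are illustrative rather than taken from the manuscript.

```python
from itertools import combinations
from math import inf


def persistence_pairs(filtration):
    """Return (dimension, birth, death) triples; death is inf for classes that never die."""
    order = sorted(filtration, key=lambda ts: (ts[0], len(ts[1])))  # faces come first
    position = {s: i for i, (_, s) in enumerate(order)}
    reduced = {}           # lowest row index -> reduced column with that lowest index
    positive = set()       # simplices that create a homology class
    paired_births = set()  # positive simplices whose class is killed later
    triples = []
    for j, (time, s) in enumerate(order):
        col = {position[f] for f in combinations(s, len(s) - 1)} if len(s) > 1 else set()
        while col and max(col) in reduced:
            col ^= reduced[max(col)]  # column addition over GF(2)
        if col:
            i = max(col)
            reduced[i] = col
            paired_births.add(i)
            triples.append((len(order[i][1]) - 1, order[i][0], time))
        else:
            positive.add(j)
    for i in sorted(positive - paired_births):
        triples.append((len(order[i][1]) - 1, order[i][0], inf))
    return triples


def persistent_betti(triples, q, r, s):
    """Number of q-dimensional classes born at time <= r and still alive at time s."""
    return sum(1 for dim, birth, death in triples if dim == q and birth <= r and death > s)


triangle = [
    (0, (0,)), (0, (1,)), (0, (2,)),
    (1, (0, 1)), (2, (1, 2)), (3, (0, 2)),
    (5, (0, 1, 2)),
]
triples = persistence_pairs(triangle)
print(persistent_betti(triples, 1, 3, 4))  # 1: the loop born at time 3 is alive at time 4
print(persistent_betti(triples, 1, 3, 5))  # 0: the triangle enters at time 5 and fills the loop
```

Taking r equal to s recovers the ordinary Betti number at time r in this convention.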
The persistent Betti numbers are translation invariant, i.e., for each . For that reason the add-one cost function is an important tool in our analysis. Moreover, if , we write for the -simplices in with at least one edge in and we write for the entire set of -simplices in . Set for .
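To make the add-one cost concrete, the following sketch evaluates it for the simplest topological functional, the number of connected components of the geometric graph at scale r (the 0th Betti number of the Rips-Vietoris complex under the diameter convention). This functional is only a stand-in for the persistent Betti numbers studied here, and the helper names `count_components` and `add_one_cost` are hypothetical.

```python
from math import dist


def count_components(points, r):
    """Number of connected components when points at distance <= r are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def add_one_cost(functional, points, new_point):
    """Change of the functional caused by adding one point to the point set."""
    return functional(points + [new_point]) - functional(points)


f = lambda pts: count_components(pts, 1.0)
cloud = [(0.0, 0.0), (2.0, 0.0)]
print(add_one_cost(f, cloud, (1.0, 0.0)))  # -1: the new point merges two components

# Translation invariance: shifting the cloud and the new point leaves the cost unchanged.
shifted = [(x + 3.0, y - 7.0) for x, y in cloud]
print(add_one_cost(f, shifted, (4.0, -7.0)))  # -1
```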
A density function is blocked if , where and the are subcubes forming a partition of as follows: Each edge of the cube is partitioned into intervals of the same length and the products of these intervals form the cubes with volume for . The density function satisfies on the following smoothness criterion:
E.g., a continuously differentiable intensity function is admissible. In the case of the Rips-Vietoris or Čech filtration a continuous (continuously differentiable) intensity implies that also the persistence diagram of the underlying point process admits a continuous (continuously differentiable) density (w.r.t. the Lebesgue measure) by the work of Chazal and Divol (2018).
In our analysis, stabilizing properties of the persistent Betti numbers are crucial. The ideas of the stabilization of functionals defined on spatial point processes have their origins in Lee (1997, 1999) and are extended in Penrose and Yukich (2001); Penrose et al. (2003). The stabilization of a functional defined on subsets of a point process roughly means that a local change in the point process (e.g., adding or subtracting finitely many points) affects the value of the functional only locally. This latter phenomenon can be described with different notions. We consider two radii of stabilization for the persistent Betti function . Their functionality is related to the weak and the strong stabilizing property in Penrose and Yukich (2001), see also the discussion below, where we address the properties of these radii in detail.
Radius of weak stabilization: Consider a point process on without accumulation points and let be a finite subset of centered around a point , i.e., for some . (E.g., is centered around and the set is centered around for each .) Write for short and for .
Let and fix a subset for some . Define the radius of weak stabilization of by
Radius of strong stabilization: Let be an arbitrary but fixed filtration parameter. Let be an upper bound on the diameter of simplices in the filtration at time ; clearly, depends on the filtration. For the Rips-Vietoris filtration, equals by definition. For the Čech filtration, is a sharp bound. We choose sufficiently large such that no further relevant -simplices are born after the radius due to the additional points from .
Write , , for the -simplices in contained in the ball that are created until filtration time due to the addition of the point set to the point process ; remember that is centered around a . Also, w.l.o.g., the simplices are already ordered according to their filtration time; if several simplices have the same filtration time, then we order them at random.
We call the number , which limits the knowledge of a point process to the ball , the information horizon, i.e., we only consider the process and the corresponding simplicial complex restricted to .
For , define
where is the natural image of ; note that not necessarily . Then, for a fixed filtration parameter , and for each , define a radius of strong stabilization by
where . This definition means the following: Consider an information horizon , i.e., we observe all points from the point process . If we include a -simplex in the simplicial complex, we already have the information that either is contained in a -cycle in or that it remains negative in for all . The latter means that, up to the information horizon , each cycle candidate in containing has already terminated.
Thus, is a stopping time w.r.t. the natural filtration with origin of , i.e., for all
Moreover, it is true that for each pair , see also Lemma A.3.
For the classical notion of the stabilization of a functional defined on finite subsets of , Penrose and Yukich (2001) consider the add-one cost function for finite point sets . The functional is strongly stabilizing (on the homogeneous Poisson process with intensity , ) if there exist finite random variables and such that for all finite .
Let denote a sequence of bounded Borel subsets of (“windows”) and let be the collection as in Penrose and Yukich (2001). The functional is weakly stabilizing on (for the homogeneous Poisson process with intensity , ) if there is a finite random variable such that as for any sequence from the collection .
Our definition of the radius of weak stabilization implies that for all
The radius of weak stabilization is finite if has no accumulation points and if is finite by Lemma A.2. So the weak stabilizing property for persistent Betti numbers holds in a somewhat more general setting and is not limited to homogeneous Poisson processes. A similar result was also obtained by Hiraoka et al. (2018) for the add-one cost function of persistent Betti numbers.
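The idea behind weak stabilization can be illustrated numerically: one restricts the point cloud to balls of growing radius around the added point and records the add-one cost until it no longer changes. The sketch below does this for the number of connected components at scale r, which is only a stand-in for the persistent Betti numbers; the function names and the horseshoe-shaped configuration are assumptions made for the example.

```python
from math import dist


def count_components(points, r):
    """Number of connected components when points at distance <= r are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def restricted_add_one_costs(points, center, r, radii):
    """Add-one cost of `center` when only points within distance a of it are observed."""
    costs = []
    for a in radii:
        local = [p for p in points if dist(p, center) <= a]
        costs.append(count_components(local + [center], r) - count_components(local, r))
    return costs


# A horseshoe: the two arms next to the origin are joined only by a distant detour.
horseshoe = [(-1.0, 0.0), (-1.0, 1.0), (-1.0, 2.0), (0.0, 2.0),
             (1.0, 2.0), (1.0, 1.0), (1.0, 0.0)]
print(restricted_add_one_costs(horseshoe, (0.0, 0.0), 1.0, [1.0, 1.5, 2.0, 2.5, 10.0]))
# [-1, -1, -1, 0, 0]
```

In this toy configuration the cost changes once the horizon is large enough to reveal the detour and stays constant afterwards; the smallest horizon beyond which no further change occurs plays the role of a radius of weak stabilization.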
In this manuscript, we show in Theorem 4.2 that the radius of strong stabilization is finite for each and if equals a homogeneous Poisson process modulo a finite set of points. In particular, this implies the strong stabilizing property of the persistent Betti number in the sense of Penrose and Yukich. For instance, consider a homogeneous Poisson process and as the additional point. Set . Then adding a finite set of points does not change the add-one cost function of the persistent Betti number, viz.,
Moreover, our results are not limited to this static case, where we only consider one Poisson process and the persistent Betti number for one pair : We show in Theorem 4.3 that Borel probability measures induced by the radius of strong (and of weak) stabilization are tight over a variety of parameter ranges. These results allow us to overcome a major problem concerning the asymptotic normality of : So far, possible choices for the parameters were restricted to small intervals in many sampling schemes, even in the trivial case when building the filtration from an i.i.d. sampling scheme with intensity on , see Yogeshwaran et al. (2017), Owada and Thomas (2018) and Trinh (2018). We show in Theorem 4.5 that for i.i.d. observations with marginal density and , ,
for a certain covariance matrix described in detail in this theorem. This means that the multivariate asymptotic normality holds without a restriction on the parameter range of . In Theorem 4.4 we show a similar result for the corresponding Poisson sampling scheme.
We conclude this section with the introduction of further notation used throughout this article. For , we write (resp. ) if precedes (resp. succeeds) in the lexicographic ordering on and write (resp. ) if either (resp. ) or . If , write (resp. ) for the limit of from the left (resp. the right) at if this limit exists. We let denote convergence in distribution of a sequence of random variables.
3 Related results
Below we quote results which are closely related to our study. The techniques employed to obtain these results are tools from geometric probability, which studies geometric quantities deduced from simple point processes in the Euclidean space. A classical result of Steele (1988) proves the convergence of the total length of the minimum spanning tree built from an i.i.d. sample of points in the unit cube. There are several generalizations of this work, for notable contributions see McGivney and Yukich (1999), Penrose et al. (2003), Yukich (2000) and the monograph of Penrose (2003).
An equally important contribution of a different type is Penrose and Yukich (2001), which studies the asymptotic normality of functionals built on the Poisson and binomial process. We will rely heavily on the ideas given therein to obtain limit expressions for the covariance function of the finite-dimensional distributions of the persistent Betti numbers. For completeness, we mention that the study of Gaussian limits is not limited to the total mass functional (as, e.g., in Penrose and Yukich (2001) and Penrose et al. (2003)) but can also be extended to random point measures obtained from the points of a marked point process, see, e.g., Baryshnikov and Yukich (2005), Penrose (2007) and Blaszczyszyn et al. (2016).
Recently, the limiting expression for the expectation of persistent Betti numbers was obtained.
Proposition 3.1 (Divol and Polonik (2018), Lemma 9).
Let . Let be either a Poisson or an -binomial process with intensity . Let be independent of with density . Then
where is the limit of for a homogeneous binomial process on .
In another recent article, Goel et al. (2018) give convergence results for Betti numbers, addressing both almost sure convergence and convergence in the mean.
So far, normality results for (persistent) Betti numbers exist only in a pointwise sense and are rather direct consequences of Theorems 2.1 and 3.1 given in Penrose and Yukich (2001). We quote them here in a form which brings them more in line with our framework. For this we need the notion of the interval of co-existence defined by the critical radius for percolation of the occupied and the critical radius of percolation of the vacant component of a Poisson process with unit intensity on . We refer to Yogeshwaran et al. (2017) for the exact definition.
Proposition 3.2 (Pointwise normality of (persistent) Betti numbers).
(i) [Hiraoka et al. (2018) Theorem 5.2] Let be a homogeneous point process with unit intensity on and let . Let . Then there is a such that
(ii) [Yogeshwaran et al. (2017) Theorem 4.7] Let be the Čech filtration and let such that . Let be a binomial process of length and intensity on for each . Then there is a such that
First, we remark that the above statements in their original version are also valid for more general domains which are not necessarily rectangular domains such as . Furthermore, we remark that Hiraoka et al. (2018) prove their theorem for a general class of filtrations which contains, among others, the Čech and the Rips-Vietoris filtration. Moreover, Theorem 4.7 of Yogeshwaran et al. (2017) also contains a version of (ii) for Betti numbers of the homogeneous Poisson process; this, however, is contained in the result (i). Finally, we remark that Yogeshwaran et al. (2017) point out that the condition is likely to be superfluous. As already mentioned, we show that, in fact, the condition can be removed.
4 Main results
First, we present the two stabilization results for persistent Betti numbers. Let and be two sets which satisfy
Condition 4.1.
is a simple point cloud on without accumulation points and is a finite subset of centered around a such that . Moreover, .
The first key result is that the stopping time from (2.3) is finite for a certain class of point processes. To this end, we first have to study objects that prevent being infinite.
Clearly as is finite, is finite if the point process does not percolate. If it does, is infinite if and only if there is a simplex which is negative until any finite information horizon but we cannot exclude the possibility that it might become positive ultimately. This means there is a tube-like object where are -simplices, such that the boundary of the restriction of to consists of two disjoint -cycles. More precisely, set
Then , for each , where are two disjoint -cycles such that is constant for all and is located near the boundary, i.e., in . The existence of such a tube in the point cloud is equivalent to being infinite, see also Figure 2 for the special case of a 1-dimensional tube.
For a homogeneous Poisson process (modulo a finite point process), we show in the next theorem that such tubes cannot occur. The proof works with arguments from continuum percolation theory, where such arguments are used to show the uniqueness of the percolation component of a homogeneous Poisson process, e.g., Aizenman et al. (1987a, b) and Burton and Keane (1989) as well as the monograph of Meester and Roy (1996).
Theorem 4.2 (Strong stabilization).
For a Poisson process with constant intensity and two finite (disjoint) sets with at most points, centered around a point the radius is finite for each and for each .
This strong stabilizing property enables us to obtain further uniform stabilization results which then yield the asymptotic normality of the persistent Betti numbers from (1.1). The next theorem is divided into three parts. In the first part, we consider the uniform stabilization over a variety of homogeneous Poisson processes. These stabilizing properties enable us to derive the results given in the second and the third part, where we consider the stabilizing properties in our binomial and Poissonian sampling scheme. These latter results are then used for the derivation of the multivariate asymptotic normality.
Theorem 4.3 (Uniform stabilization).
Assume Condition 4.1. Let be the class of sets with at most points in , . Then
Stabilization for the homogeneous Poisson case: For , let be a homogeneous Poisson process on with intensity . Let . Then, the laws of
are tight for each .
Stabilization in the Poisson sampling scheme: Let be a continuous probability density on . For each , let be independent Poisson processes with intensities . Then, for each , for each and for each , there is an such that uniformly in and
where . In particular, for each , , there is an such that uniformly in and
Stabilization in the binomial sampling scheme: Let be a binomial process on obtained from an i.i.d. sequence with common density . Let be an independent random variable with density . Write for the point process for , where the function satisfies and as . Then the family is tight for every .
Furthermore, let for any . Then all these results remain valid if , , is replaced by for parameters in .
The stabilization results above allow us to conclude the convergence of the finite-dimensional distributions to a normal distribution. Let be independent and homogeneous Poisson processes with unit intensity on . Set . By the stabilizing property of the persistent Betti numbers, there are random variables and such that
see Lemma 5.6. Set . Then the following asymptotic normality result holds in the Poisson sampling scheme.
Theorem 4.4.
Let be a Poisson process with intensity on . Let and for . Then
where the covariance matrix is given by
Moreover, let be a homogeneous Poisson process with intensity on for each . Define for
Theorem 4.5.
Let be a binomial process with intensity . Let and for . Then
where the covariance matrix is given by
5 Technical results
This section consists of three parts. First, we give the proofs of Theorems 4.2 and 4.3. In the second part, we prove the asymptotic normality of the finite-dimensional distributions of the persistent Betti numbers obtained from Poisson processes. In the third part, we repeat these considerations for an underlying binomial process.
The next result is crucial for the upcoming proofs: the so-called geometric lemma enables us to obtain upper bounds on moments. The result for Betti numbers is well-known (to topologists); we quote here a generalized version due to Hiraoka et al. (2018) (Lemma 2.11).
Lemma 5.1 (Geometric lemma).
Let be two finite point sets of . Then
5.1 The proofs of the stabilization results
Proof of Theorem 4.2.
It is clear that it is sufficient to consider the case where . Also, w.l.o.g. the set is empty. Given the Poisson process , we write for the random geometric graph with vertex set and undirected edges connecting all those pairs from with .
So only the situation where percolates remains, as otherwise the finiteness of each radius is immediate. We study the tube-like objects obtained from the -simplices of , i.e., where are -simplices. For each , define as the restriction of to , which means that we only include those in the sum which are entirely contained in . As explained in the discussion after Condition 4.1, the relevant for us are tubes (infinite sums) such that , for each , where are two disjoint -cycles such that is constant for all and is located near the boundary, i.e., in .
Naturally, this tube satisfies that the diameter of is restricted to an interval : If the diameter of is below the threshold , then this open end connects without the additional points , so that is in fact 0 (note that any -cycle consists of at least -many points). Also, if the diameter of is above , then all additional points from are not enough to close this end of the tube. So only tubes with these restrictions on are relevant. Also, and this is crucial, due to the boundary restriction, the diameter within each cycle has to be above .
This leads us to the following important observation: The number of disjoint tubes that can pass through the surface is at most and the constant depends on but not on . Indeed, there is a saturation threshold where adding another tube with an end of type inside the ball only results in -cycles.
Let be the -dimensional cube with edge length and center . Consider now the events
for and . Let and let the cube be partitioned in subcubes of the type , . Then due to the shift invariance (which is valid in the present Poisson situation)
Thus, for any ; otherwise, as the expectation on the left-hand side is also bounded above by a , we obtain a contradiction if is large enough. Consequently, such tubes cannot exist. ∎
Proof of Theorem 4.3.
We write for the upper bound on the diameter of a simplex with filtration time at most , . We prove the statements for the radius of strong stabilization . Using the relation
from Lemma A.3, for each , allows us then to conclude the results for the radius of weak stabilization also. We proceed for each separately; clearly this is no restriction.
In the following, if , , and are fixed, we just write for if the Poisson process has intensity .
In the rest of the proof, we always assume w.l.o.g. that the Poisson process with an intensity function on is given by