Nonparametric Estimation of Probability Density Functions of Random Persistence Diagrams

We introduce a nonparametric way to estimate the global probability density function for a random persistence diagram. Precisely, we construct a kernel density function centered at a given persistence diagram with a given bandwidth. Our approach encapsulates the number of topological features and considers the appearance or disappearance of features near the diagonal in a stable fashion. In particular, the structure of our kernel individually tracks long persistence features, while considering features near the diagonal as a collective unit. The choice to describe short persistence features as a group reduces computation time while simultaneously retaining accuracy. Indeed, we prove that the associated kernel density estimate converges to the true distribution as the number of persistence diagrams increases and the bandwidth shrinks accordingly. We also establish the convergence of the mean absolute deviation estimate, defined according to the bottleneck metric. Lastly, examples of kernel density estimation are presented for typical underlying datasets.


1 Introduction

Topological data analysis (TDA) encapsulates a range of data analysis methods which investigate the topological structure of a dataset (Edelsbrunner and Harer, 2010). One such method, persistent homology, describes the geometric structure of a given dataset and summarizes this information as a persistence diagram. TDA, and in particular persistence diagrams, have been employed in several studies with topics ranging from classification and clustering (Venkataraman et al., 2016; Adcock et al., 2016; Pereira and de Mello, 2015; Marchese and Maroulas, 2017) to the analysis of dynamical systems (Perea and Harer, 2015; Sgouralis et al., 2017; Guillemard and Iske, 2011; Seversky et al., 2016) and complex systems such as sensor networks (De Silva and Ghrist, 2007; Xia et al., 2015; Bendich et al., 2016). In this work, we establish the probability density function (pdf) for a random persistence diagram.

Persistence diagrams offer a topological summary for a collection of $d$-dimensional data which focuses on the global geometric structure of the data. A persistence diagram is a multiset of homological features $(b, d, k)$, each representing a $k$-dimensional hole which appears at scale $b$ and is filled at scale $d$. In general, the dataset may arise from any metric space, though we restrict attention to data in $\mathbb{R}^d$. For example, if the data form a time series trajectory, the associated persistence diagram describes multistability through a corresponding number of persistent 0-dimensional features or periodicity through a single persistent 1-dimensional feature. In a typical persistence diagram, few features exhibit long persistence (range of scales $d - b$), and such features describe important topological characteristics of the underlying dataset. Moreover, persistent features are stable under perturbation of the underlying dataset (Cohen-Steiner et al., 2010).

Persistence diagrams have recently seen intense active research, including significant successful effort toward facilitating previously challenging computations with them; these efforts impact the evaluation of Wasserstein distance (Kerber et al., 2016) and the creation of persistence diagrams with packages such as Dionysus (Fasy et al., 2015) and Ripser (Bauer, 2015), which take advantage of certain properties of simplicial complexes (Chen and Kerber, 2011). Recently, various approaches have defined specific summary statistics such as center and variance (Bobrowski et al., 2014; Mileyko et al., 2011; Turner et al., 2014; Marchese and Maroulas, 2017), birth and death estimates (Emmett et al., 2014), and confidence sets (Fasy et al., 2014). Here we introduce a nonparametric method to construct density functions for a distribution of persistence diagrams. The development of these densities offers a consistent framework to understand the above summary statistic results through a single viewpoint.

We naturally think of a (random) persistence diagram as a random element which depends upon the stochastic procedure used to generate the underlying dataset that it summarizes. Given that geometric complexes are the typical paradigms for application of persistent homology to data analysis (see for example the partial list De Silva and Ghrist, 2007; Emmett et al., 2014; Guillemard and Iske, 2011; Marchese and Maroulas, 2016; Perea and Harer, 2015; Seversky et al., 2016; Xia et al., 2015; Venkataraman et al., 2016; Edelsbrunner, 2013; Emrani et al., 2014), we consider persistence diagrams which arise from a dataset and its associated Čech filtration. Thus, sample datasets yield sample persistence diagrams without direct access to the distribution of persistence diagrams. In this sense, a distribution of persistence diagrams is defined by transforming the distribution of underlying data under the process used to create a persistence diagram, as discussed in (Mileyko et al., 2011). Persistence diagrams are created through a complex and nonlinear process which relies on the global arrangement of datapoints (see Section 2); thus, the structure of a persistence diagram distribution remains unclear even for underlying data with a well-understood distribution. Indeed, known results for the persistent homology of noise alone, such as (Adler et al., 2014), primarily concern the asymptotics of feature cardinality at coarse scale. With little previous knowledge, we study these distributions through nonparametric means. Kernel density estimation is a well-known nonparametric technique for random vectors in $\mathbb{R}^n$ (Scott, 2015); however, persistence diagrams lack a vector space structure, and thus these techniques cannot be applied directly here.

There has been extensive work to devise various maps from persistence diagrams into Hilbert spaces, especially reproducing kernel Hilbert spaces (RKHS). For example, (Chepushtanova et al., 2015) discretizes persistence diagrams via bins, yielding vectors in a high-dimensional Euclidean space. The works (Reininghaus et al., 2014) and (Kusano et al., 2016) define kernels between persistence diagrams in an RKHS. By mapping into a Hilbert space, these studies allow the application of machine learning methods such as principal component analysis, random forests, support vector machines, and more. The universality of such a kernel is investigated in (Kwitt et al., 2015); this property induces a metric on distributions of persistence diagrams (by comparing means in the RKHS), as (Kwitt et al., 2015) demonstrates with a two-sample hypothesis test. In a similar vein, (Adler et al., 2017) utilizes Gibbs distributions in order to replicate similar persistence diagrams, e.g., for use in MCMC-type sampling.

All of the previous approaches kernelize persistence diagrams in order to map them into a Hilbert space for typical statistical learning techniques. Along related lines, the studies (Bobrowski et al., 2014) and (Fasy et al., 2014) work with kernel density estimation on the underlying data to estimate a target diagram as the number of underlying datapoints goes to infinity. In both cases, the target diagram is directly associated to the probability density function (pdf) of the underlying data (via the superlevel sets of the pdf). The first work constructs an estimator for the target diagram, while the second defines a confidence set. In either case, kernel density estimation is used to approximate the pdf of the underlying datapoints, assuming the data are independent and identically distributed (i.i.d.). In contrast, our work considers a different kind of kernel density which directly estimates probability densities for a random persistence diagram from a sample of persistence diagrams. This kernel density estimate converges to the true probability density as the number of persistence diagrams goes to infinity.

Instead of a transformed collection or a center diagram, the output of our method is an estimate of a probability density function (pdf) of a random persistence diagram. Access to a pdf facilitates definition and application of many statistical techniques, including hypothesis testing, utilization of Bayesian priors, or likelihood methods. The proposed kernel density is centered at a persistence diagram and describes each feature as having either short or long persistence; by treating each long-persistence point individually and short persistence points collectively, the kernel density strikes a careful balance between accuracy and computation time. Our method also enables expedient sampling of new persistence diagrams from the kernel density estimate. In contrast to previous methodologies, our kernel density estimate has the potential to describe high probability features in a random persistence diagram, even if these features have brief persistence. Such features are typically indicative of the geometric structure, e.g., curvature, of the dataset rather than its topology.

The homological features in a persistence diagram come without an ordering, and their cardinality is variable, being bounded but not determined by the cardinality of the underlying dataset. Thus, any notion of density must (i) be invariant to the ordering of features and (ii) account for variability in their cardinality. Indeed, the approach used to analyze a collection of persistence diagrams in (Bendich et al., 2016) is a good step toward understanding a random persistence diagram, but it requires a choice of order and considers only a fixed number of features, and is therefore unsuitable for creating probability densities. In this work, we offer a kernel density with the desirable properties (i) and (ii), which also calls attention to the persistence of each feature. A typical persistence diagram has many features with brief persistence and few with moderate or longer persistence; consequently, our kernel density groups features with short persistence together in order to combat the curse of dimensionality. Indeed, the kernel density still considers features of short persistence, but simplifies their treatment in order to facilitate computation. The kernel density is defined on a pertinent space of finite random sets which is equipped to describe pdfs for random persistence diagrams generated from associated data with bounded cardinality of topological features. In this sense, our kernel density provides estimation of the distribution of persistence diagrams, which in turn describes the geometry of the random underlying dataset. The requirement of bounded feature cardinality is trivially satisfied for datasets with bounded cardinality, which is reasonable in both application and theory. Indeed, the creation of a persistence diagram from an infinite collection of data is often nonsensical (e.g., for anything with unbounded noise), and a scaling limit should be considered instead.

We establish the kernel density estimation problem through the lens of finite set statistics and we consequently begin with relevant backgrounds in topological data analysis in Section 2 and finite set statistics in Section 3. For further details about these two subjects, the reader may refer respectively to (Edelsbrunner and Harer, 2010) and (Matheron, 1975). Our results are presented in Section 4. In Subsection 4.1, we construct the kernel density associated to a center persistence diagram and kernel bandwidth parameter. This consists of decomposing the center persistence diagram into lower and upper halves, finding pdfs associated to each half, and lastly determining the pdf for their union. After the kernel density is defined and an explicit pdf is delivered in Thm. 1, its convergence is presented in Theorem 2. Next, Subsection 4.2 presents in detail a specific example of the kernel density. Additionally, an example of persistence diagram kernel density estimation and its convergence are demonstrated for persistence diagrams associated to underlying data with annular distribution. In Subsection 4.3, we define the mean absolute deviation (MAD) as a measure of dispersion, and present the convergence of its kernel density estimator (Thm. 3). Finally, we end with conclusions and discussion in Section 5. Further examples of KDE convergence and the proofs of the main theorems, Thm. 2 and Thm. 3, are given in the supplementary materials.

2 Topological Data Analysis Background

The topological background discussed here builds toward the definition of persistence diagrams, the pertinent objects in this work. We begin by briefly discussing simplicial complexes and homology, an algebraic descriptor for coarse shape in topological spaces. In turn, persistent homology, and its summary, persistence diagrams, are techniques for bringing the power and convenience of homology to describe subspace filtrations of topological spaces. We first consider topological spaces of discernible dimension, called manifolds.

Definition 2.1.

A topological space is called a $k$-dimensional manifold if every point has a neighborhood which is homeomorphic to an open neighborhood in $k$-dimensional Euclidean space.

We generalize the fixed-dimension notion of a manifold in order to define simplicial homology for simplicial complexes. We then discuss the Čech construction which is used to associate simplicial complexes to datasets.

Definition 2.2.

A $k$-simplex is a collection of $k+1$ affinely independent vertices $v_0,\dots,v_k$ along with all convex combinations of these vertices:

$$ (v_0,\dots,v_k) = \left\{ \sum_{i=0}^{k} \alpha_i v_i : \sum_{i=0}^{k} \alpha_i = 1 \text{ and } \alpha_i \geq 0\ \forall i \right\}. \tag{2.1} $$

Topologically, a $k$-simplex is treated as a $k$-dimensional manifold (with boundary). An oriented simplex is typically described by a list of its vertices, such as $(v_0,\dots,v_k)$. The faces of a simplex consist of all the simplices built from a subset of its vertex set; for example, the edge $(v_0, v_1)$ and the vertex $(v_0)$ are both faces of the triangle $(v_0, v_1, v_2)$.
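The face relation just described is purely combinatorial, so it can be enumerated directly. The following sketch (with integer vertex labels standing in for the $v_i$, an illustrative assumption) lists every face of a simplex as a nonempty subset of its vertex set:

```python
from itertools import combinations

def faces(simplex):
    """All faces of a simplex given as a tuple of vertex labels,
    including the vertices, the higher faces, and the simplex itself."""
    return [tuple(c) for k in range(1, len(simplex) + 1)
            for c in combinations(simplex, k)]

# The triangle (0, 1, 2) has 3 vertices, 3 edges, and itself: 7 faces.
for f in faces((0, 1, 2)):
    print(f)
```

Enumerating faces this way is what makes condition (i) of the simplicial complex definition below easy to verify mechanically.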

Definition 2.3.

A simplicial complex $K$ is a collection of simplices wherein
(i) if $\sigma \in K$, then all faces of $\sigma$ are also in $K$, and
(ii) the intersection of any pair of simplices in $K$ is another simplex in $K$.
We denote the collection of $k$-simplices within $K$ by $K[k]$.

A simplicial complex is realized by the union of all its simplices; an example is shown in Fig. 1. Conditions (i) and (ii) in Defn. 2.3 establish a unique topology on the realization of a simplicial complex which restricts to the subspace topology on each open simplex. For finite simplicial complexes realized in $\mathbb{R}^d$, this topology is also consistent with the Euclidean subspace topology.

Here we define the homology groups for a simplicial complex through purely combinatorial means, which allows for automated computation.

Definition 2.4.

The chain group (over $\mathbb{Z}$) on a simplicial complex $K$ in dimension $k$ is denoted by $C_k(K)$ and is defined as formal sums of $k$-simplices in $K$:

$$ C_k(K) = \left\{ \sum_{\sigma \in K[k]} n_\sigma \sigma : n_\sigma \in \mathbb{Z} \right\}. \tag{2.2} $$
Definition 2.5.

The $k$-th boundary map $\partial_k : C_k(K) \to C_{k-1}(K)$ is a homomorphism defined on each simplex as an alternating sum over the faces of one dimension less:

$$ \partial_k(v_0,\dots,v_k) = \sum_{n=0}^{k} (-1)^n (v_0,\dots,v_{n-1},v_{n+1},\dots,v_k). \tag{2.3} $$
Remark 2.1.

Chain groups give an algebraic way to describe subsets of simplices as a formal sum. Toward this viewpoint, the chain group is often defined over $\mathbb{Z}/2\mathbb{Z}$ instead of $\mathbb{Z}$. In this case, the boundary maps can be understood classically; e.g., the boundary of a triangle yields (the sum of) its three edges and the boundary of an edge yields (the sum of) its endpoints. When viewed over $\mathbb{Z}$, the presence of sign specifies simplex orientation.

Putting chain groups of every dimension together along with the boundary maps successively defined between them, we obtain a chain complex:

$$ \{0\} \longleftarrow C_0(K) \xleftarrow{\ \partial_1\ } C_1(K) \xleftarrow{\ \partial_2\ } C_2(K) \xleftarrow{\ \partial_3\ } \cdots \tag{2.4} $$

The composition of subsequent boundary maps yields the trivial map (Edelsbrunner and Harer, 2010); this property is typically rephrased as $\partial_k \circ \partial_{k+1} = 0$, which enables the definition of the following quotient groups.

Definition 2.6.

The homology group of dimension $k$ is given by

$$ H_k(K) = \ker \partial_k / \operatorname{im} \partial_{k+1} = \{ [c] : c \in \ker \partial_k \}, \tag{2.5} $$

where $[c]$ denotes the coset equivalence class of $c \in \ker \partial_k$.

The generators of the homology group correspond to topological features of the complex $K$; for example, generators of the $0$-th homology group correspond to connected components, generators of the $1$-st homology group correspond to holes in $K$, etc. The interpretation of these features is exemplified by taking the topological boundary of a $k$-ball (that is, a $(k-1)$-sphere); for example, the boundary of an interval is two (disconnected) points while the boundary of a disc is a loop.
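Because the definitions above are combinatorial, the ranks of the homology groups (Betti numbers) reduce to linear algebra on the boundary matrices. The following sketch, a minimal illustration over $\mathbb{Z}/2\mathbb{Z}$ rather than any optimized algorithm, computes $\beta_k = \dim \ker \partial_k - \operatorname{rank} \partial_{k+1}$ for the hollow triangle and also checks that $\partial_1 \partial_2 = 0$ once the triangle is filled in:

```python
from itertools import combinations

def boundary_matrix(k_simplices, km1_simplices):
    """Mod-2 boundary matrix: rows index (k-1)-simplices, columns index k-simplices."""
    idx = {s: i for i, s in enumerate(km1_simplices)}
    M = [[0] * len(k_simplices) for _ in km1_simplices]
    for j, s in enumerate(k_simplices):
        for f in combinations(s, len(s) - 1):
            M[idx[f]][j] = 1
    return M

def rank_gf2(M):
    """Rank of a 0/1 matrix via Gaussian elimination over Z/2Z."""
    if not M or not M[0]:
        return 0
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][c]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][c]:
                M[r] = [a ^ b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

# Hollow triangle (a circle): three vertices, three edges, no 2-simplex.
V = [(0,), (1,), (2,)]
E = [(0, 1), (0, 2), (1, 2)]
d1 = boundary_matrix(E, V)
b0 = len(V) - rank_gf2(d1)   # one connected component
b1 = len(E) - rank_gf2(d1)   # one loop (rank of d2 is 0 since there are no triangles)
print(b0, b1)                # 1 1

# Filling in the triangle kills the loop, and d1 * d2 = 0 mod 2 as claimed.
d2 = boundary_matrix([(0, 1, 2)], E)
assert all(sum(d1[i][j] * d2[j][0] for j in range(3)) % 2 == 0 for i in range(3))
print((len(E) - rank_gf2(d1)) - rank_gf2(d2))   # 0
```

This mirrors the interpretation in the text: the hollow triangle carries one component and one hole, and the filled triangle has no hole.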

We wish to extend the notion of homology to a discrete set of data $\{x_1,\dots,x_n\}$ within a metric space. Treating the set itself as a simplicial complex, its homology yields only the cardinality of the data points. So, we utilize the metric to obtain more information. Here we denote by $B(x, r)$ a metric ball centered at $x$ of radius $r$. Fix a radius $r$ and consider the collection of neighborhoods $\{B(x_i, r)\}_{i=1}^{n}$ along with its union $\mathcal{X}_r = \bigcup_{i=1}^{n} B(x_i, r)$. The filtration of sets $\{\mathcal{X}_r\}_{r \geq 0}$ naturally yields information about the arrangement of the dataset at various scales. To make homology computations more tractable, we instead consider the associated nerve complexes.

Definition 2.7.

The nerve of a collection of open sets $\{U_i\}_{i=1}^{n}$ is the simplicial complex where a $k$-simplex $(v_{i_0},\dots,v_{i_k})$ is included if and only if $\bigcap_{j=0}^{k} U_{i_j} \neq \emptyset$. The nerve of the neighborhoods $\{B(x_i, r)\}_{i=1}^{n}$ is called the Čech complex on the data at radius $r$.

Examples of the Čech complex for the same data at different radii are depicted in Fig. 2, where they are superimposed with the associated neighborhood space. Any nerve complex trivially satisfies the requirements for a simplicial complex (Edelsbrunner and Harer, 2010). Moreover, the nerve theorem states that the nerve and union of a collection of convex sets have similar topology (they are homotopy equivalent) (Hatcher, 2002); specifically, the Čech complex and neighborhood space have identical homology for any given radius.
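For planar data, the nerve condition in Defn. 2.7 can be checked directly: an edge enters the Čech complex when two balls of radius $r$ overlap, and a triangle enters when the minimal enclosing ball of its three vertices has radius at most $r$. The sketch below is a hand-rolled illustration restricted to $\mathbb{R}^2$ and simplices up to dimension 2 (packages such as Dionysus handle the general construction):

```python
from itertools import combinations
from math import dist

def meb_radius(p, q, s):
    """Radius of the minimal enclosing ball of three planar points."""
    # If some side's midpoint ball already covers the third point, that side's
    # half-length is the radius (obtuse or degenerate triangle) ...
    for (a, b), c in [((p, q), s), ((p, s), q), ((q, s), p)]:
        center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        r = dist(a, b) / 2.0
        if dist(center, c) <= r + 1e-12:
            return r
    # ... otherwise the triangle is acute: circumradius R = abc / (4 * Area).
    a, b, c = dist(q, s), dist(p, s), dist(p, q)
    area = abs((q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1])) / 2.0
    return a * b * c / (4.0 * area)

def cech_complex(points, r):
    """Vertices, edges, and triangles of the Čech complex at radius r."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= 2.0 * r]
    eset = set(edges)
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= eset
                 and meb_radius(points[i], points[j], points[k]) <= r]
    return vertices + edges + triangles

points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.9)]
print(len(cech_complex(points, 0.45)))  # only the 3 vertices at this radius
print(len(cech_complex(points, 0.6)))   # 3 vertices + 3 edges + 1 triangle
```

Increasing $r$ grows the complex, which is exactly the filtration behavior the persistent homology discussion below relies on.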

A priori, it is unclear which choice of scale (radius) best describes the data, and oftentimes different scales reveal different information. Thus, to investigate the topology of our data, we consider the appearance and disappearance of homological features at growing scale. This multiscale viewpoint, called persistent homology, was introduced in (Edelsbrunner et al., 2002) and yields a topological summary of the data called a persistence diagram. This is possible because we have a growing filtration of complexes, so each complex is included in the next (see Fig. 2). These inclusion maps induce inclusion maps at the chain group level and in turn induce maps (though not typically inclusions) at the level of homology groups. These induced maps are referred to here as the persistence maps, and take features to features (i.e., generators to generators) or to zero (Cohen-Steiner et al., 2007). Thus, each feature is tracked by how far the persistence maps preserve it. In turn, tracking features boils down to a very specific algorithm for obtaining the birth and death radii of each homological feature (e.g., see (Edelsbrunner and Harer, 2010)). Features which persist over a large range of scales are typically considered more important, and their presence is stable under small perturbations of the underlying data (Cohen-Steiner et al., 2010).

Persistent homology yields a multiset of homological features, each born at a scale $b$ and lasting until its death scale $d$, with degree of homology $k$; in short, it yields a persistence diagram $D = \{(b_i, d_i, k_i)\}$. We interpret the birth-death values as coordinate points with the degree of homology as a label. For clarity and simplicity, we ignore any features with death value $\infty$, since these features are generally a characteristic of the ambient space. In particular, one $0$-dimensional homological feature with $d = \infty$ is expected from any Čech filtration.

Specifically, for data in $\mathbb{R}^d$, we consider each feature as an element of

$$ W_{0:d-1} = W \times \{0,\dots,d-1\}, \tag{2.6} $$

where $W = \{(b, d') : 0 \leq b \leq d' < \infty\}$ is the infinite wedge. As a topological space, the $d$-fold multiwedge $W_{0:d-1}$ is treated as $d$ disconnected copies of $W$, where $W$ has the Euclidean metric and topology.

It is desirable to define a metric between persistence diagrams with which to measure topological similarity. In TDA, Hausdorff distance is typically used to compare underlying datasets, while the bottleneck distance (Defn. 2.8) is used to compare their associated persistence diagrams (Fasy et al., 2014; Munch, 2017).

Definition 2.8.

The bottleneck distance between two persistence diagrams $D_1$ and $D_2$ is given by

$$ W_\infty(D_1, D_2) = \inf_{\varphi} \sup_{x \in D_1} \| x - \varphi(x) \|_\infty, \tag{2.7} $$

where $\varphi$ ranges over all possible bijections between $D_1$ and $D_2$ which match in degree of homology. The diagonal is included in both persistence diagrams with infinite multiplicity so that any feature may be matched to the diagonal.

Remark 2.2.

Due to the unstable presence of features near the diagonal, typical metrics on persistence diagrams such as the bottleneck distance treat the diagonal as part of every persistence diagram (Mileyko et al., 2011) in order to achieve stability with respect to Hausdorff perturbations of the underlying dataset (Cohen-Steiner et al., 2007). Morally, one considers the diagonal as representing vacuous features which are born and die simultaneously. For convenient computation, the definition of bottleneck distance can be applied to each degree of homology separately.
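For small diagrams, the infimum in Eq. (2.7) can be evaluated by brute force, treating the diagonal copies of Remark 2.2 explicitly. The sketch below is purely illustrative (it enumerates all matchings, so its cost is factorial in the diagram size, and it assumes a single degree of homology per call):

```python
from itertools import permutations

def bottleneck(D1, D2):
    """Bottleneck distance between two small persistence diagrams,
    each a list of (birth, death) pairs of one fixed homology degree."""
    # Augment each diagram with one diagonal slot (None) per point of the
    # other, so every feature may match either a feature or the diagonal.
    A = list(D1) + [None] * len(D2)
    B = list(D2) + [None] * len(D1)

    def cost(p, q):
        if p is None and q is None:
            return 0.0                   # diagonal matched to diagonal is free
        if p is None:
            p, q = q, p
        if q is None:
            return (p[1] - p[0]) / 2.0   # l-infinity distance to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(max(cost(a, B[i]) for a, i in zip(A, perm))
               for perm in permutations(range(len(B))))

# A feature near the diagonal is cheap to absorb into the diagonal:
print(bottleneck([(0.0, 10.0)], [(0.0, 10.0), (4.0, 5.0)]))  # 0.5
```

The example shows the stability mechanism described in Remark 2.2: the short-lived feature $(4, 5)$ contributes only its half-persistence, not a large matching cost.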

3 Random Persistence Diagrams

In this section we establish background to make the notion of probability density for a random persistence diagram explicit and well-defined. A persistence diagram changes its feature cardinality under small perturbations of the underlying dataset, and its features have no intrinsic order. Consequently, we cannot treat persistence diagrams as elements of a vector space. Instead, we consider a random persistence diagram as a random multiset of features in the multiwedge $W_{0:d-1}$ defined in Eq. (2.6). For underlying datasets sampled from $\mathbb{R}^d$ with bounded cardinality, the affiliated Čech persistence diagrams also have bounded feature cardinality and degree of homology. Thus, we assume that the cardinality of a random persistence diagram is bounded above by some value $M$. We view the resulting space of diagrams through a list of functions $h_N$ which each map the appropriate power of $W_{0:d-1}$ onto its corresponding cardinality component. This viewpoint facilitates the definition of probability densities.

Definition 3.1.

For each $N \leq M$, consider the space of $N$-tuples of topological features, $W_{0:d-1}^N$, and the associated map $h_N$ defined by

$$ h_N(\xi_1,\dots,\xi_N) = \{\xi_1,\dots,\xi_N\}. \tag{3.1} $$

The map $h_N$ creates equivalence classes on $W_{0:d-1}^N$ according to the action of the permutation group $\Pi_N$; specifically, $h_N(\xi_{\pi(1)},\dots,\xi_{\pi(N)}) = h_N(\xi_1,\dots,\xi_N)$ for each $\pi \in \Pi_N$. These equivalence classes yield the space

$$ W_{0:d-1}^N / \Pi_N = \left\{ [\xi]_{h_N} : \xi \in W_{0:d-1}^N \right\}, \tag{3.2} $$

equipped with the quotient topology. The topology on the union of these quotients is defined so that each $h_N$ lifts to a homeomorphism between $W_{0:d-1}^N / \Pi_N$ and the corresponding cardinality component.

With a topology in hand, one can define probability measures on the associated Borel $\sigma$-algebra. Thus, we define a random persistence diagram to be a random element distributed according to some probability measure on this space for a fixed maximal cardinality $M$. We denote associated probabilities by $P$ and expected values by $E$. Since $W \subset \mathbb{R}^2$, we work toward defining probability densities on the collection of Euclidean spaces $W_{0:d-1}^N$ for $N \leq M$.
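The quotient in Defn. 3.1 has a concrete computational reading: a persistence diagram is an unordered multiset, so any canonical reordering of a feature tuple picks out one representative of the class $[\xi]_{h_N}$. A small sketch, with features written as hypothetical (birth, death, degree) triples:

```python
def canonical(features):
    """A canonical representative of the equivalence class [xi]_{h_N}:
    sorting forgets the input order while preserving multiplicity."""
    return tuple(sorted(features))

xi = [(0.1, 0.9, 0), (0.2, 0.4, 1), (0.1, 0.9, 0)]   # a multiset with a repeat
eta = [(0.2, 0.4, 1), (0.1, 0.9, 0), (0.1, 0.9, 0)]  # a permutation of xi
assert canonical(xi) == canonical(eta)               # invariance under Pi_N
print(canonical(xi))
```

Any function of `canonical(features)` is automatically invariant under $\Pi_N$, which is precisely requirement (i) discussed in the introduction.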

Definition 3.2 ((Matheron, 1975)).

For a given random persistence diagram $D$ and any Borel subset $A$ of $W_{0:d-1}$, the belief function $\beta_D$ is defined as

$$ \beta_D(A) = P[D \subset A]. \tag{3.3} $$

Since $A$ is a Borel subset of $W_{0:d-1}$, the collection of diagrams contained in $A$ is the quotient of the powers $A^N$ under the maps $h_N$; moreover, each $A^N$ is clearly Borel in the Euclidean topology of $W_{0:d-1}^N$. Therefore, since each $h_N$ induces a homeomorphism (see Defn. 3.1), the event $\{D \subset A\}$ is a Borel subset of the quotient space. The belief function of a random persistence diagram is similar to the joint cumulative distribution function for a random vector, in particular by yielding a probability density function through Radon-Nikodým-type derivatives.

Definition 3.3.

(Matheron, 1975) Fix a set function $\phi$ defined on Borel subsets of $W_{0:d-1}$ into $[0, \infty)$. For an element $\xi$ or a multiset $Z = \{\xi_1,\dots,\xi_N\}$ with $|Z| = N$, the set derivative (evaluated at $\emptyset$) is respectively given by

$$ \frac{\delta\phi}{\delta\xi}(\emptyset) = \lim_{n\to\infty} \frac{\phi(B(\xi, 1/n))}{\lambda(B(\xi, 1/n))}, \qquad \frac{\delta\phi}{\delta Z}(\emptyset) = \frac{\delta^N \phi}{\delta\xi_1 \cdots \delta\xi_N}(\emptyset) = \left[ \frac{\delta}{\delta\xi_1} \cdots \frac{\delta}{\delta\xi_N} \phi \right](\emptyset), \tag{3.4} $$

where $B(\xi, 1/n)$ are Euclidean balls and $\lambda$ indicates Lebesgue measure.

Remark 3.1.

Defn. 3.3 for set derivatives at the empty set closely mirrors the Radon-Nikodým derivative with respect to Lebesgue measure. The definition of a set derivative evaluated on a nonempty set is more involved and may be found in (Matheron, 1975). Here we are primarily concerned with evaluation at $\emptyset$, since this suffices for the definition of a probability density function. Also, note that set derivatives satisfy the product rule.

Remark 3.2.

Restricting to a particular cardinality $N$, consider $\phi \circ h_N$, a function on Euclidean space which is invariant under the action of $\Pi_N$. This viewpoint elucidates the relationship between set derivatives and Radon-Nikodým derivatives with respect to Lebesgue measure. It also shows that the iterated derivative given in Eq. (3.4) is independent of order and thus is well-defined for a multiset $Z$.

As with typical derivatives, there is a complementary set integration operation for set derivatives. Set derivatives (at ) are essentially Radon-Nikodým derivatives with order tied to cardinality, and so the corresponding set integral acts like Lebesgue integration summed over each cardinality.

Definition 3.4.

Consider a Borel subset $A$ of $W_{0:d-1}$ and a Borel subset $O$ of the quotient space. For a set function $f$, its set integrals over $A$ and $O$ are respectively defined according to the following sums of Lebesgue integrals:

$$ \int_A f(Z)\, \delta Z = \sum_{N=0}^{M} \frac{1}{N!} \int_{A^N} f(h_N(\xi_1,\dots,\xi_N))\, d\xi_1 \cdots d\xi_N, \tag{3.5a} $$
$$ \int_O f(Z)\, \delta Z = \sum_{N=0}^{M} \frac{1}{N!} \int_{h_N^{-1}(O)} f(h_N(\xi_1,\dots,\xi_N))\, d\xi_1 \cdots d\xi_N, \tag{3.5b} $$

where $Z = h_N(\xi_1,\dots,\xi_N)$ is a persistence diagram.

Dividing by $N!$ in Eqs. (3.5a) and (3.5b) accounts for integrating over ordered tuples instead of the unordered quotient. It has been shown that set derivatives and set integrals are inverse operations (Matheron, 1975); specifically, the set derivative of a belief function yields a probability density for a random diagram $D$ such that

$$ \beta_D(A) = \int_A \frac{\delta\beta_D}{\delta Z}(\emptyset)\, \delta Z. \tag{3.6} $$

Indeed, $A^N = h_N^{-1}(h_N(A^N))$, so that Eq. (3.5a) also holds as an integral over $h_N(A^N)$ in the sense of Eq. (3.5b).

Definition 3.5.

For a random persistence diagram $D$, a global probability density function (global pdf) $f_D$ must satisfy

$$ \sum_{\pi \in \Pi_N} f_D(\xi_{\pi(1)},\dots,\xi_{\pi(N)}) = \frac{\delta^N \beta_D}{\delta\xi_1 \cdots \delta\xi_N}(\emptyset), \tag{3.7} $$

and is described by its layered restrictions for each cardinality $N \leq M$.

Remark 3.3.

It is necessary to make a distinction between local and global densities because the global pdf is not defined on a single Euclidean space, and is instead expressed as a collection of densities over a range of dimensions. Specifically, while each local density (for input cardinality $N$) is defined on $W_{0:d-1}^N$, the global pdf is defined on the union of these spaces and restricts to a local density on each input dimension. Each local density decomposes into the product of the conditional density and the cardinality probability (this follows from Prop. 3.1). Thus, each local density does not integrate to one, but instead to the associated cardinality probability $P[|D| = N]$. Also, the global pdf is not a set function and does not require division by $N!$.

Remark 3.4.

While the global pdf and its local constituents need not be symmetric with respect to permutations of their arguments, there is a unique choice of global pdf (up to sets of Lebesgue measure 0) which satisfies Eq. (3.7) and is symmetric under the action of $\Pi_N$. In this case, we safely abuse notation by writing $f_D(Z)$ for $f_D(\xi_1,\dots,\xi_N)$ and allow context to determine whether the argument denotes a set or a vector.

The following proposition is critical for determining the global pdf for (i) the union of independent singleton diagrams (i.e., diagrams with at most one feature), (ii) a randomly chosen cardinality $N$ followed by $N$ i.i.d. draws from a fixed distribution, and (iii) a random persistence diagram kernel density function. The proof of this proposition follows arguments similar to those in (Mahler, 1995) (Theorem 17, pp. 155–156).

Proposition 3.1.

Let $D$ be a random persistence diagram with cardinality bounded by $M$ and let $\beta_D$ be the belief function for $D$. Then $\beta_D$ expands as

$$ \beta_D(S) = a_0 + \sum_{m=1}^{M} a_m q_m(S), $$

where $a_m = P[|D| = m]$ and $q_m(S) = P[D \subset S \mid |D| = m]$.

Remark 3.5.

The decomposition in Prop. 3.1 is often applied as a first step toward finding the local density constituents of the global pdf. In particular, for a singleton diagram, $\beta_D(\emptyset) = a_0 = 1 - a_1$ and $\frac{\delta\beta_D}{\delta\xi}(\emptyset) = a_1 p_1(\xi)$, where $p_1$ is the conditional density given $|D| = 1$.

Lastly, we encounter a computationally convenient summary for a random persistence diagram called the probability hypothesis density (PHD). The integral of the PHD over a region $T$ in $W_{0:d-1}$ gives the expected number of features in $T$; moreover, any other function on $W_{0:d-1}$ with this property is a.e. equal to the PHD (Goodman et al., 2013).

Definition 3.6.

(Matheron, 1975) The probability hypothesis density (PHD) $F_D$ for a random persistence diagram $D$ is defined as the set function expressed through the set integral

$$ F_D(a) = \int_{\{Z : \{a\} \subset Z\}} \frac{\delta\beta_D}{\delta Z}(\emptyset)\, \delta Z. \tag{3.8} $$

In particular, $E[|D \cap T|] = \int_T F_D(a)\, da$ for any region $T$.
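The defining property of the PHD, that its integral over a region matches the expected feature count there, can be checked numerically in a toy model. Assume (purely for illustration) a random diagram that is a union of two independent singletons on the real line, each present with probability $q_j$ and located as a unit-variance Gaussian at $\mu_j$; its PHD is then $F_D(x) = \sum_j q_j\, \varphi(x - \mu_j)$, and Monte Carlo counting should agree with the integral:

```python
import random
from math import erf, sqrt

random.seed(0)
q, mu = [0.6, 0.8], [-1.0, 1.0]   # presence probabilities and Gaussian centers
T = (0.0, 2.0)                    # region in which features are counted

def norm_cdf(x, m):
    """Cumulative distribution function of N(m, 1)."""
    return 0.5 * (1.0 + erf((x - m) / sqrt(2.0)))

# Integral of the PHD over T: sum_j q_j * P(N(mu_j, 1) in T).
exact = sum(qj * (norm_cdf(T[1], mj) - norm_cdf(T[0], mj))
            for qj, mj in zip(q, mu))

# Monte Carlo estimate of E[#features of D falling in T].
trials = 100_000
total = sum(1
            for _ in range(trials)
            for qj, mj in zip(q, mu)
            if random.random() < qj and T[0] <= random.gauss(mj, 1.0) <= T[1])
mc = total / trials

print(round(exact, 3), round(mc, 3))  # the two agree up to Monte Carlo error
```

The same first-moment bookkeeping underlies the singleton-union construction of the kernel density in Section 4.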

4 Kernel Density Estimation

4.1 Construction

To estimate distributions of persistence diagrams, our goal is the creation of a kernel density function about a center persistence diagram with a kernel bandwidth parameter, used for defining constituent Gaussians according to Definitions 4.1 and 4.2. Prop. 3.1 leads to the following lemma, which is crucial for determining the kernel density. We refer to a random persistence diagram $D$ with $|D| \leq 1$ as a singleton diagram, and such singletons are indexed by superscripts.

Lemma 4.1.

Consider a multiset of $M$ independent singleton random persistence diagrams $\{D^{(j)}\}_{j=1}^{M}$ and let $D = \bigcup_{j=1}^{M} D^{(j)}$. If each singleton is described by the value $q^{(j)} = P[D^{(j)} \neq \emptyset]$ and the conditional pdf $p^{(j)}$ given $D^{(j)} \neq \emptyset$, then the global pdf for $D$ is given by

$$ f_D(\xi_1,\dots,\xi_N) = \sum_{\gamma \in I(N,M)} Q(\gamma) \prod_{k=1}^{N} p^{(\gamma(k))}(\xi_k), \tag{4.1} $$

for each $N \leq M$, where

$$ Q(\gamma) = Q^*(\gamma) \prod_{k=1}^{N} q^{(\gamma(k))}, \tag{4.2} $$

$I(N,M)$ consists of all increasing injections $\gamma : \{1,\dots,N\} \to \{1,\dots,M\}$, and

$$ Q^*(\gamma) = \frac{\prod_{j=1}^{M} \left(1 - q^{(j)}\right)}{\prod_{k=1}^{N} \left(1 - q^{(\gamma(k))}\right)}. \tag{4.3} $$
Proof.

Since the singleton events are independent, the belief function for $D$ decomposes into the product $\beta_D = \prod_{j=1}^{M} \beta_{D^{(j)}}$. Next, we employ the product rule for the set derivative (see Defn. 3.3) to obtain the global pdf for $D$ in terms of the singleton belief functions and their first derivatives. Higher derivatives of each $\beta_{D^{(j)}}$ are zero since the $D^{(j)}$ are singletons (see Remark 3.5). Thus, the product rule yields first derivatives on all (ordered) subsets of the singleton belief functions:

$$ \frac{\delta^N \beta_D}{\delta\xi_1 \cdots \delta\xi_N}(\emptyset) = \sum_{1 \leq j_1 \neq \cdots \neq j_N \leq M} \frac{\beta_{D^{(1)}}(\emptyset) \cdots \beta_{D^{(M)}}(\emptyset)}{\beta_{D^{(j_1)}}(\emptyset) \cdots \beta_{D^{(j_N)}}(\emptyset)} \left[ \frac{\delta\beta_{D^{(j_1)}}}{\delta\xi_1}(\emptyset) \cdots \frac{\delta\beta_{D^{(j_N)}}}{\delta\xi_N}(\emptyset) \right]. $$

By Prop. 3.1, we have that $\beta_{D^{(j)}}(\emptyset) = 1 - q^{(j)}$ and $\frac{\delta\beta_{D^{(j)}}}{\delta\xi}(\emptyset) = q^{(j)} p^{(j)}(\xi)$, and so

$$ \frac{\delta^N \beta_D}{\delta\xi_1 \cdots \delta\xi_N}(\emptyset) = \sum_{1 \leq j_1 \neq \cdots \neq j_N \leq M} \left[ \frac{\prod_{j=1}^{M} \left(1 - q^{(j)}\right)}{\prod_{k=1}^{N} \left(1 - q^{(j_k)}\right)} \prod_{k=1}^{N} q^{(j_k)} \right] \prod_{k=1}^{N} p^{(j_k)}(\xi_k), $$

which nearly resembles Eq. (4.1). To bridge the gap, we describe the choice of indices $(j_1,\dots,j_N)$ by an injective function from $\{1,\dots,N\}$ into $\{1,\dots,M\}$. In turn, each such injective function is uniquely determined by the composition of an increasing injection $\gamma \in I(N,M)$, which decides the range of the function, and a permutation $\pi \in \Pi_N$ on the domain, which accounts for the order of the range. The value of $Q$ is independent of this order, and thus is determined by $\gamma$ as in Eq. (4.2). We reorder the product in order to shift these permutations onto the input variables, obtaining

 δ^N β_D/(δξ_1 ⋯ δξ_N)(∅) = ∑_{π∈Π_N} ∑_{γ∈I(N,M)} Q(γ) ∏_{k=1}^{N} p^{(γ(k))}(ξ_{π(k)}). (4.4)

Finally, the global pdf in Eq. (4.1) follows directly from applying Eq. (3.7) to Eq. (4.4). ∎

Remark 4.1.

The global pdf in Eq. (4.1), and in particular the sum over γ ∈ I(N,M), accounts for each possible combination of singleton presence. Moreover, summing over permutations as in Eq. (4.4) and dividing by N! yields a symmetric pdf with terms for every possible assignment between singletons and inputs. The weights Q(γ) indicate the probability of each assignment occurring: Q(γ) is the product of the appropriate probability for each singleton to be either present, q^{(j)}, or absent, 1 − q^{(j)}, for each j.
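The presence/absence bookkeeping behind Q(γ) can be sketched directly. A minimal sketch in plain Python, with hypothetical presence probabilities; increasing injections are enumerated as subsets listed in increasing order:

```python
from itertools import combinations

def Q(gamma, q):
    """Weight Q(gamma) from Eqs. (4.2)-(4.3): the probability that exactly
    the singletons indexed by gamma are present and all others are absent."""
    present = set(gamma)
    w = 1.0
    for j in range(len(q)):
        w *= q[j] if j in present else (1.0 - q[j])
    return w

# hypothetical presence probabilities q^(j) for M = 3 singletons
q = [0.6, 0.8, 0.3]
M = len(q)
# An increasing injection gamma: {1,...,N} -> {1,...,M} is exactly a size-N
# subset of {1,...,M} listed in increasing order.
total = sum(Q(gamma, q)
            for N in range(M + 1)
            for gamma in combinations(range(M), N))
print(total)  # the weights over all presence patterns sum to 1
```

Summing Q(γ) over all cardinalities N and all γ ∈ I(N,M) covers every presence/absence pattern exactly once, which is why the weights total 1.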

Example 1.

Consider two 1-dimensional singleton diagrams, D^{(1)} and D^{(2)}, with probabilities of being nonempty q^{(1)} = 0.6 and q^{(2)} = 0.8, respectively. The corresponding local densities when nonempty are given by p^{(1)} ∼ N(−1, 1) and p^{(2)} ∼ N(1, 1). Lemma 4.1 yields the global pdf for D = D^{(1)} ∪ D^{(2)} through a set of local densities such that f_0 = (1 − q^{(1)})(1 − q^{(2)}) = 0.08, together with f_1 and f_2. We sum over permutations and divide by N! (N is the input cardinality) to obtain a symmetric global pdf.

 f_1(x) = (1 − q^{(2)}) q^{(1)} p^{(1)}(x) + (1 − q^{(1)}) q^{(2)} p^{(2)}(x) = (0.12/√(2π)) e^{−(x+1)²/2} + (0.32/√(2π)) e^{−(x−1)²/2}, (4.5a)

 f_2(x, y) = (q^{(1)} q^{(2)}/2) [p^{(1)}(x) p^{(2)}(y) + p^{(1)}(y) p^{(2)}(x)] = (0.24/(2π)) (e^{−((x−1)²+(y+1)²)/2} + e^{−((x+1)²+(y−1)²)/2}). (4.5b)

Accounting for each cardinality and following Eq. (4.5a) and Eq. (4.5b), the total probability adds up to

 f_0 + ∫_ℝ f_1(x) dx + ∫_{ℝ²} f_2(x, y) dx dy = (0.08) + (0.12 + 0.32) + (0.24 + 0.24) = 1,

as desired. The local densities in Eq. (4.5a) and Eq. (4.5b) are plotted in Fig. 3. Though f_1 is the sum of two Gaussians, in Fig. 3 (Left) we see that the Gaussian centered at 1 dominates, while the Gaussian centered at −1 is only indicated by a heavy left tail. This behavior occurs because its weight, 0.12, is small compared with the weight 0.32 of the dominant Gaussian.
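The cardinality bookkeeping above can be checked numerically; a minimal sketch in plain Python using Riemann sums, with the presence probabilities q^(1) = 0.6 and q^(2) = 0.8 implied by the coefficients in Eq. (4.5):

```python
import math

q1, q2 = 0.6, 0.8  # presence probabilities implied by Eq. (4.5)

def g(x, mu):
    """1-d Gaussian density N(mu, 1)."""
    return math.exp(-(x - mu)**2 / 2) / math.sqrt(2 * math.pi)

def f1(x):  # Eq. (4.5a)
    return (1 - q2) * q1 * g(x, -1) + (1 - q1) * q2 * g(x, 1)

def f2(x, y):  # Eq. (4.5b)
    return q1 * q2 / 2 * (g(x, -1) * g(y, 1) + g(y, -1) * g(x, 1))

f0 = (1 - q1) * (1 - q2)  # 0.08: both singletons empty

# Riemann sums over [-8, 8] recover the mass of each cardinality layer
h, R = 0.05, 8.0
xs = [-R + h * i for i in range(int(2 * R / h))]
m1 = sum(f1(x) for x in xs) * h                     # close to 0.44
m2 = sum(f2(x, y) for x in xs for y in xs) * h * h  # close to 0.48
print(f0 + m1 + m2)  # close to 1
```

The three masses 0.08, 0.44, and 0.48 are the probabilities of observing cardinality 0, 1, and 2, respectively.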

Now we turn toward defining the kernel density. We first define a random persistence diagram as a union of simpler constituents, and then determine its global pdf by combination in a fashion similar to Lemma 4.1. Indeed, we define the desired kernel density as the global pdf for this composite random diagram. To start, we fix a degree of homology and consider a center diagram D (see Eq. (2.6)). Since the degree is fixed, we treat D within the wedge W = {(b, d) : 0 ≤ b < d}.

Long persistence points in a persistence diagram represent prominent topological features which are stable under perturbation of the underlying data, and so it is important to track each of them independently. In contrast, we take the point of view that the small persistence features near the diagonal form a single collective geometric signature, as opposed to individually important topological signatures. Toward this end, features with short persistence are grouped together and interpreted through i.i.d. draws near the diagonal. Since features cluster near the diagonal in a typical persistence diagram (see, e.g., Fig. 13 (Right) in the supplementary materials), treating short persistence features collectively simplifies our kernel density and thus speeds up its evaluation. It is imperative that these short persistence features are not ignored, because they still capture crucial geometric information for applications such as classification (Marchese and Maroulas, 2016, 2017; De Silva and Ghrist, 2007; Xia et al., 2015; Donato et al., 2016; Atienza et al., 2016). Thus, we split D into upper and lower portions according to a bandwidth σ as

 D_u = {(b_i, d_i, k) ∈ D : d_i − b_i ≥ σ} and D_ℓ = {(b_i, d_i, k) ∈ D : d_i − b_i < σ}. (4.6)

Now define random diagrams 𝒟_u centered at D_u and 𝒟_ℓ centered at D_ℓ such that 𝒟 = 𝒟_u ∪ 𝒟_ℓ. Ultimately, the global pdf of 𝒟 centered at D is our kernel density.
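The split in Eq. (4.6) is straightforward to sketch in code; the center diagram below is hypothetical:

```python
def split_diagram(D, sigma):
    """Split a persistence diagram into upper/lower parts per Eq. (4.6):
    features with persistence d - b >= sigma go to D_u, the rest to D_l."""
    Du = [(b, d) for (b, d) in D if d - b >= sigma]
    Dl = [(b, d) for (b, d) in D if d - b < sigma]
    return Du, Dl

# hypothetical center diagram: (birth, death) pairs in a fixed degree
D = [(0.1, 1.4), (0.2, 0.25), (0.5, 0.6), (0.3, 2.0)]
Du, Dl = split_diagram(D, sigma=0.2)
print(Du)  # [(0.1, 1.4), (0.3, 2.0)]
print(Dl)  # [(0.2, 0.25), (0.5, 0.6)]
```

Note that the split depends on the bandwidth σ, so the upper cardinality N_u = |D_u| is itself a function of σ.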

Definition 4.1.

Each feature (b_j, d_j) ∈ D_u yields an independent random singleton diagram 𝒟^{(j)} defined by its chance q^{(j)} to be nonempty (via Eq. (4.8)) along with its potential position sampled according to a modified Gaussian distribution, denoted by N_W((b_j, d_j), σ²I). The global pdf for 𝒟_u is then determined by Lemma 4.1, where each p^{(j)} is given by the pdf associated with N_W((b_j, d_j), σ²I), which is given by

 p^{(j)}(ξ) = φ_j(ξ) 1_W(ξ) / ∫_W φ_j(u, v) du dv, (4.7)

where φ_j is the pdf of the (unmodified) normal N((b_j, d_j), σ²I), and 1_W is the indicator function for the wedge W.

The global pdf for each 𝒟^{(j)} is readily obtained by a pair of restrictions. First, we restrict the usual Gaussian distribution to the halfspace {v > u}. Features sampled below the diagonal are considered to disappear from the diagram, and thus we define the chance to be nonempty by

 q^{(j)} = P(𝒟^{(j)} ≠ ∅) = ∫_{{v>u}} φ_j(u, v) du dv. (4.8)

Afterward, the Gaussian restricted to {v > u} is further restricted to W and renormalized to obtain a probability measure as in Eq. (4.7). This double restriction to both {v > u} and W is necessary for proper restriction of the Gaussian pdf and definition of q^{(j)}. Indeed, restriction to W alone causes points with small birth time to have an artificially high chance to disappear, while restriction to {v > u} alone yields nonsensical features with negative radius (with b < 0). In kernel density estimation, the effects of this distinction become negligible as the bandwidth goes to zero. In practice, this distinction is important for features with small birth time relative to the bandwidth.
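Because the Gaussian in Eq. (4.8) is isotropic, the halfspace integral reduces to a one-dimensional normal CDF; a minimal sketch (the function name is ours):

```python
import math

def q_presence(b, d, sigma):
    """Presence probability q^(j) of Eq. (4.8): the mass of the isotropic
    Gaussian N((b, d), sigma^2 I) in the halfspace {v > u}.  Since
    v - u ~ N(d - b, 2 sigma^2), the 2-d integral reduces to a normal CDF
    evaluated at (d - b) / (sqrt(2) sigma)."""
    return 0.5 * (1.0 + math.erf((d - b) / (2.0 * sigma)))

print(q_presence(0.5, 0.5, 0.1))  # a feature on the diagonal: 0.5
print(q_presence(0.1, 1.4, 0.1))  # long persistence: essentially 1
```

This matches the narrative: features in D_u (persistence at least σ) have q^{(j)} close to 1, while features at the diagonal have q^{(j)} = 1/2.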

Remark 4.2.

In the Čech construction of a persistence diagram, a feature lies on the line {b = 0} if and only if it has degree of homology 0. Consequently, for a feature (0, d_j) of degree 0, we instead take

 p^{(j)}(d) = φ_j(d) 1_{ℝ₊}(d) / ∫_{ℝ₊} φ_j(u) du and q^{(j)} = ∫_{ℝ₊} φ_j(u) du,

where φ_j is the 1-dimensional Gaussian N(d_j, σ²) centered at d_j.
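A sketch of this degree-0 variant, under the same one-dimensional CDF reduction as Eq. (4.8) (the function name is ours):

```python
import math

def q0_and_peak(dj, sigma):
    """Degree-0 variant of Remark 4.2: phi_j is the 1-d Gaussian
    N(d_j, sigma^2) restricted to the positive half-line.
    q^(j) is its mass on R_+, computed as a normal CDF."""
    qj = 0.5 * (1.0 + math.erf(dj / (math.sqrt(2) * sigma)))
    # p^(j)(d) = phi_j(d) / qj on R_+; its value at the center d = d_j:
    peak = 1.0 / (math.sqrt(2 * math.pi) * sigma) / qj
    return qj, peak

qj, peak = q0_and_peak(0.5, 0.1)
print(qj)  # essentially 1 when d_j is large relative to sigma
```

For death times near 0 the renormalization by q^{(j)} matters, exactly as in the two-dimensional case for small birth times.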

Whereas the large persistence features in D_u have small chance to fall below the diagonal and disappear, the existence of the small persistence features in D_ℓ is volatile: these features disappear and appear fluidly under small changes in the underlying data. The distribution of 𝒟_ℓ is described by a cardinality probability mass function (pmf) ν and lower density p_ℓ.

Definition 4.2.

The lower random diagram 𝒟_ℓ is defined by choosing a cardinality N according to a pmf ν followed by N i.i.d. draws according to a fixed density p_ℓ. First, take N_ℓ = |D_ℓ| and define ν with mean N_ℓ and so that ν(N) = 0 for N > cN_ℓ, for some constant c independent of σ. The subsequent density p_ℓ is given by projecting the lower features of the center diagram onto the diagonal, then creating a restricted Gaussian kernel density estimate for these features; specifically,

 p_ℓ(b, d) = (1/N_ℓ) ∑_{(b_i,d_i)∈D_ℓ} (1/(πσ²)) e^{−((b − (b_i+d_i)/2)² + (d − (b_i+d_i)/2)²)/(2σ²)}. (4.9)

Projecting the lower features of the center diagram onto the diagonal simplifies later analysis and evaluation of p_ℓ; without projecting, a unique normalization factor, similar to q^{(j)} in Defn. 4.1, would be required for each Gaussian summand in Eq. (4.9). By Prop. 3.1 and Eq. (3.7), global pdfs of random persistence diagrams are described by a random vector pdf for each cardinality layer, resulting in the following global pdf for 𝒟_ℓ:

 f_{𝒟_ℓ}(ξ_1, …, ξ_N) = ν(N) ∏_{j=1}^{N} p_ℓ(ξ_j). (4.10)
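A minimal sketch of the lower density in Eq. (4.9), with hypothetical lower features:

```python
import math

def p_lower(b, d, D_l, sigma):
    """Lower density p_l of Eq. (4.9): each lower feature (b_i, d_i) is
    projected onto the diagonal at m = (b_i + d_i)/2, and a Gaussian with
    variance sigma^2 is centered at (m, m).  The 1/(pi sigma^2) factor is
    twice the full-plane normalization, matching the restriction of the
    kernel to the halfplane above the diagonal.  Assumes D_l is nonempty."""
    N_l = len(D_l)
    total = 0.0
    for bi, di in D_l:
        m = (bi + di) / 2.0
        total += math.exp(-((b - m)**2 + (d - m)**2) / (2.0 * sigma**2)) \
                 / (math.pi * sigma**2)
    return total / N_l

# the density peaks on the diagonal at the projected centers
print(round(p_lower(0.25, 0.25, [(0.2, 0.3)], 0.1), 2))  # 31.83
```

Because every summand is centered on the diagonal, a single common normalization suffices, which is exactly the simplification the projection buys.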

Combining the expressions for 𝒟_u and 𝒟_ℓ, we arrive at the following theorem.

Theorem 1.

Fix a center persistence diagram D and bandwidth σ > 0. Split D into D_u and D_ℓ according to Eq. (4.6). Define 𝒟_ℓ with global pdf from Eq. (4.10), and 𝒟_u with global pdf from Eq. (4.1). Treating the random persistence diagrams 𝒟_u and 𝒟_ℓ as independent, the kernel density centered at D with bandwidth σ is given by

 K_σ(Z, D) = ∑_{j=0}^{N_u} ν(N − j) ∑_{γ∈I(j,N_u)} Q(γ) ∏_{k=1}^{j} p^{(γ(k))}(ξ_k) ∏_{k=j+1}^{N} p_ℓ(ξ_k), (4.11)

where Z = (ξ_1, …, ξ_N) is the input, ξ_k for k = 1, …, N are its features, and N_u = |D_u| depends on both D and σ. Here Q(γ) is given by Eq. (4.2), each p^{(γ(k))} refers to the modified Gaussian pdf as shown in Eq. (4.7) for its matching feature in D_u, and p_ℓ is given by Eq. (4.9).
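Under simplifying assumptions, Eq. (4.11) can be sketched numerically. All names below are ours; p_upper uses only the halfspace restriction of Eqs. (4.7)–(4.8) (omitting the further wedge renormalization, which matters only near the axes), and a Poisson pmf stands in for ν (whose support Defn. 4.2 actually truncates):

```python
import math
from itertools import combinations

def gauss2(x, y, b, d, sigma):
    """Unrestricted isotropic Gaussian density phi_j centered at (b, d)."""
    return math.exp(-((x - b)**2 + (y - d)**2) / (2 * sigma**2)) \
           / (2 * math.pi * sigma**2)

def q_j(b, d, sigma):
    """Halfspace mass of Eq. (4.8), via a 1-d normal CDF."""
    return 0.5 * (1 + math.erf((d - b) / (2 * sigma)))

def p_upper(xi, b, d, sigma):
    """Gaussian restricted to {v > u} and renormalized (the further
    restriction to the wedge in Eq. (4.7) is omitted here)."""
    x, y = xi
    return gauss2(x, y, b, d, sigma) / q_j(b, d, sigma) if y > x else 0.0

def p_low(xi, Dl, sigma):
    """Lower density of Eq. (4.9); assumes Dl is nonempty."""
    x, y = xi
    s = sum(math.exp(-((x - (bi + di) / 2)**2 + (y - (bi + di) / 2)**2)
                     / (2 * sigma**2)) / (math.pi * sigma**2)
            for bi, di in Dl)
    return s / len(Dl)

def Q_weight(gamma, qs):
    """Eqs. (4.2)-(4.3): each upper feature is present (q) or absent (1-q)."""
    return math.prod(qs[j] if j in gamma else 1 - qs[j]
                     for j in range(len(qs)))

def kernel(Z, Du, Dl, sigma, nu):
    """K_sigma(Z, D) per Eq. (4.11): j of the inputs are matched to upper
    features through an increasing injection gamma; the remaining N - j
    inputs are drawn i.i.d. from the lower density."""
    N, Nu = len(Z), len(Du)
    qs = [q_j(b, d, sigma) for b, d in Du]
    total = 0.0
    for j in range(min(N, Nu) + 1):
        for gamma in combinations(range(Nu), j):
            term = nu(N - j) * Q_weight(set(gamma), qs)
            for k, g in enumerate(gamma):
                term *= p_upper(Z[k], *Du[g], sigma)
            for k in range(j, N):
                term *= p_low(Z[k], Dl, sigma)
            total += term
    return total

# hypothetical center diagram, bandwidth, and stand-in pmf for nu
Du, Dl, sigma = [(0.1, 1.2)], [(0.2, 0.3)], 0.1
nu = lambda n: math.exp(-len(Dl)) * len(Dl)**n / math.factorial(n)
Z = [(0.12, 1.15), (0.24, 0.3)]
print(kernel(Z, Du, Dl, sigma, nu) > 0)  # True
```

The nested sum mirrors the structure of Eq. (4.11): the outer index j chooses how many inputs are explained by upper features, and each increasing injection γ chooses which upper features those are.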

Proof.

Since 𝒟_u and 𝒟_ℓ are independent random persistence diagrams, the belief function decomposes into β_𝒟 = β_{𝒟_u} β_{𝒟_ℓ}. Moreover, since derivatives above order N_u vanish for β_{𝒟_u} (see Remark 3.5), the product rule and binomial-type counting yield

 δ^N β_𝒟/(δξ_1 ⋯ δξ_N)(∅) = ∑_{j=0}^{N_u} ∑_{1≤i_1≠⋯≠i_j≤N} δ^j β_{𝒟_u}/(δξ_{i_1} ⋯ δξ_{i_j})(∅) · δ^{N−j} β_{𝒟_ℓ}/(δξ_1 ⋯ δξ̂_{i_1} ⋯ δξ̂_{i_j} ⋯ δξ_N)(∅)
 = ∑_{π∈Π_N} ∑_{j=0}^{N_u} (1/(j!(N−j)!)) δ^j β_{𝒟_u}/(δξ_{π(1)} ⋯ δξ_{π(j)})(∅) · δ^{N−j} β_{𝒟_ℓ}/(δξ_{π(j+1)} ⋯ δξ_{π(N)})(∅) (4.12)

where ξ̂ indicates that the given index is skipped in the set derivative (having been allocated to the other factor). Similar to the proof of Lemma 4.1, the choice of indices i_1, …, i_j is replaced with a permutation π ∈ Π_N; however, the ordering within each derivative is unrelated to the choice of indices, leading to j!-fold and (N−j)!-fold redundancy within each term.

Taking Eq. (4.10) together with Eq. (3.7) yields

 δ^{N−j} β_{𝒟_ℓ}/(δξ_{π(j+1)} ⋯ δξ_{π(N)})(∅) = (N − j)! ν(N − j) ∏_{k=j+1}^{N} p_ℓ(ξ_{π(k)}).

Also, Eq. (4.1) and Eq. (3.7) yield

 δ^j β_{𝒟_u}/(δξ_{π(1)} ⋯ δξ_{π(j)})(∅) = ∑_{π*∈Π_j} ∑_{γ∈I(j,N_u)} Q(γ) ∏_{k=1}^{j} p^{(γ(k))}(ξ_{π(π*(k))}).

We substitute these relations into the final expression of Eq. (4.12). The first of these substitutions is straightforward, while the second has j!-fold redundant permutations overtop the existing permutations in Π_N. These substitutions yield that