Topological data analysis (TDA) encompasses a broad set of techniques that explore topological structure in datasets [9, 12, 7, 33]. One of these techniques, persistent homology, associates shapes to data and summarizes salient features with persistence diagrams. Data analysis methods involving persistence diagrams have been developed and applied in various settings [31, 2, 16, 32, 28, 34, 26, 27, 3].
Persistence diagrams are multisets of points that represent homological features along with their appearance and disappearance scales . Features of a persistence diagram that exhibit long persistence describe global topological properties in the underlying dataset, while those with shorter persistence encode information about local geometry. Hence, persistence diagrams can be considered multiscale summaries of data’s shape. While there are several methods present in literature to compute persistence diagrams, we adopt geometric complexes that are typically used for applications of persistent homology to data analysis; see [9, 16, 31, 28] and references therein.
Researchers desire to utilize persistence diagrams for inference and classification problems. Several achieve this directly with persistence diagrams [19, 16, 4, 11, 20, 24, 6], while others elect to first map them into a Hilbert space [5, 23, 1, 30, 10]
. The latter approach enables one to adopt traditional machine learning tools such as principal component analysis, random forests, support vector machines, and more general kernel-based learning schemes. Despite progress toward statistical inference, to the best of our knowledge, an analog of the Bayesian paradigm that relies on computing posterior distributions of persistence diagrams from priors and likelihoods is still absent in literature. Herein, we develop a Bayesian framework on the space of persistence diagrams.
The homological features in persistence diagrams have no intrinsic order implying they are random sets as opposed to random vectors. This viewpoint is embraced in 
to construct a kernel density estimator for persistence diagrams. Adapting this tool, one may empirically obtain prior distributions for persistence diagrams; however, computing posteriors entirely through the random set analog of Bayes’ rule is intractable. This state of affairs is due to the combinatorial explosion involved with evaluating likelihood functions for random sets. To address this, we model random persistence diagrams as Poisson point processes. The defining feature of these point processes is that they are solely characterized by a single parameter known as the intensity. This alleviates the computational burden associated with deriving the posterior intensity from Bayes’ rule alone.
. Given a collection of observed persistence diagrams, we consider the underlying stochastic phenomena generating persistence diagrams to be Poisson point processes with prior uncertainty captured in presupposed intensities. In applications, one may select an informative prior by choosing an intensity based on expert opinion, or alternatively choose an uninformative prior intensity with little information. The analog of likelihood functions in our model represent the level of belief that observed diagrams are representative of the entire population. With the prior intensity and a likelihood in hand, we compute the posterior intensity using tools from the theory of point processes. To aid exposition, we explicitly illustrate parallels between our Bayesian framework for persistence diagrams and the standard Bayesian framework for random variables.
Another key contribution of this paper is the derivation of a closed form of the posterior intensity, which relies on conjugate families of Gaussian mixtures. An advantage of this Gaussian mixture representation is that it allows us to perform Bayesian inference in an efficient and reliable manner. Indeed, this model can be viewed as an analog of the ubiquitous example in standard Bayesian inference where a Gaussian prior and likelihood yield a Gaussian posterior. We present a detailed example of our closed form implementation to demonstrate computational tractability and showcase its applicability by using it to build a Bayes factor classification algorithm; we test the latter in a classification problem for materials science data.
This paper is organized as follows. Section 2 provides a brief overview of persistence diagrams and general point processes. Our methods are presented in Section 3. In particular, Subsection 3.1 establishes the Bayesian framework for persistence diagrams, while Subsection 3.2
contains the derivation of a closed form for a posterior distribution based on a Gaussian mixture model. A classification algorithm with Bayes factors is discussed in Section4. To assess the capability of our algorithm, we investigate its performance on materials data in Subsection 4.1. Finally, we end with discussions and conclusions in Section 5.
We begin by discussing preliminary definitions essential for building our model. In Subsection 2.1, we briefly review simplicial complexes and provide a formal definition for persistence diagrams (PDs). Pertinent definitions and theorems from point processes (PPs) are discussed in Subsection 2.2 .
2.1 Persistence Diagrams
We start by discussing simplices and simplicial complexes, intermediary structures for constructing PDs.
A -dimensional collection of data is said to be geometrically independent if for any set with , the equation implies that for all
A simplex, is a collection of geometrically independent elements along with their convex hull: . We say that the vertices span the dimensional simplex, . The faces of a simplex , are the simplices spanned by subsets of .
A simplicial complex is a collection of simplices satisfying two conditions: (i) if , then all faces of are also in , and (ii) the intersection of two simplices in is either empty or contained in .
Given a point cloud , our goal is to construct a sequence of simplicial complexes that reasonably approximates the underlying shape of the data. We accomplish this by using the Vietoris-Rips filtration.
Let be a point cloud in and . The Vietoris-Rips complex of is defined to be the simplicial complex satisfying if and only if . Given a nondecreasing sequence with , we denote its Vietoris-Rips filtration by .
A persistence diagram is a multiset of points in , where and each element represents a homological feature of dimension that appears at scale during a Vietoris-Rips filtration and disappears at scale . Intuitively speaking, the feature is a dimensional hole lasting for duration . Namely, features with correspond to connected components, to loops, and to voids. An example of a PD is shown in Figure 4.
2.2 Poisson Point Processes
This section contains basic definitions and fundamental theorems from PPs, primarily Poisson PPs. Detailed treatments of Poisson PPs can be found in  and references therein. For the remainder of this section, we take and to be a Polish space and its Borel -algebra, respectively.
A finite point process is a pair where and
is a symmetric probability measure on, where is understood to be the trivial -algebra.
The sequence defines a cardinality distribution and the measures give spatial distributions of vectors for fixed . Definition 2.5 naturally prescribes a method for sampling a finite PP: (i) determine the number of points by drawing from then, (ii) spatially distribute according to a draw from . As PPs model random collections of elements in whose order is irrelevant, any sensible construction relying on random vectors should assign equal weight to all permutations of . This is ensured by the symmetry requirement in Definition 2.5. We abuse notation and write for samples from as well as their set representations. It proves useful to describe finite PPs by a set of measures that synthesize and to simultaneously package cardinality and spatial distributions.
Let be a finite PP. The Janossy measures are defined as the set of measures satisfying
Given a collection of disjoint rectangles , the value is the probability of observing exactly one element in each of and none in the complement of their union. For applications, we are primarily interested in Janossy measures that admit densities with respect to a reference measure on . We are now ready to describe the class of finite PPs that model PDs.
Let be a finite measure on and . The finite point process is Poisson if, for all and measurable rectangles , and We call an intensity measure.
Equivalently, a Poisson PP is a finite PP with Janossy measures The intensity measure in Definition 2.7 admits a density, , with respect to some reference measure on . Notice that for all , . Elementary calculations then show . Thus, we interpret the intensity measure of a region , as the expected number of elements in that land in
. The intensity measure serves as an analog to the first order moment for a random variable.
The next two definitions involve a joint PP wherein points from one space parameterize distributions for the points living in another. Consequently, we introduce another Polish space along with its Borel -algebra to serve as the mark space in a marked Poisson PP. These model scenarios in which points drawn from a Poisson PP provide a data likelihood model for Bayesian inference with PPs.
Suppose is a function satisfying: 1) for all , is a probability measure on , and 2) for all , is a measurable function on . Then, is a stochastic kernel from to .
A marked Poisson point process is a finite point process on such that: (i) is a Poisson PP on , and (ii) for all , measurable rectangles , , where is the set of all permutations of and is a stochastic kernel.
Given a set of observed marks it can be shown  that the Janossy densities for the PP induced by on given M are
where is the stochastic kernel for evaluated in for a fixed value of .
The following theorems allow us to construct new Poisson PPs from existing ones. Their proofs can be found in .
Theorem 2.1 (The Superposition Theorem).
Let be a collection of independent Poisson PPs each having intensity measure . Then their superposition given by is a Poisson PP with intensity measure .
Theorem 2.2 (The Mapping Theorem).
Let be a Poisson PP on with -finite intensity measure and let be a -algebra. Suppose is a measurable function. Write for the induced measure on given by for all . If has no atoms, then is a Poisson PP on with intensity measure .
Theorem 2.3 (The Marking Theorem).
The marked Poisson PP in Definition 2.9 has the intensity measure given by , where is the intensity measure for the Poisson PP that induces on , and is a stochastic kernel.
The final tool we need is the probability generating functional as it enables us to recover intensity measures using a notion of differentiation. The probability generating functional can be interpreted as the PP analog of the probability generating function.
Let be a finite PP on a Polish space . Denote by the set of all functions with . The probability generating functional of denoted is given by
Let be the probability generating functional given in Equation (2). The functional derivative of in the direction of evaluated at , when it exists, is given by .
It can be shown that the functional derivative satisfies the familiar product rule . As is proved in , the intensity measure of the Poisson PP in Definition 2.7 can be obtained by differentiating , i.e., , where is the indicator function for any . Generally speaking, one obtains the intensity measure for a general point process through , but the preceding identity suffices for our purposes since we only consider point processes for which Equation (2) is defined for all bounded .
The intensity function for the PP whose Janossy densities are listed in Equation (1) is .
This directly follows from writing the probability generating functional for the PP in question using its Janossy densities then applying . By Definition 2.10, linearity of the integral, and Fubini’s theorem, we have where is the probability generating functional for the PP with Janossy densities and for . One arrives at the desired result by applying the product rule for functional derivatives and the intensity retrieval property of probability generating functionals. ∎
3 Bayesian Inference
In this section, we construct a framework for Bayesian inference with PDs by modeling them as Poisson PPs. First, we derive a closed form for the posterior intensity given a PD drawn from a finite PP, and then we present a family of conjugate priors followed by an example.
Given a persistence diagram , the map given by defines tilted representation of as ; see Figure 4. In the sequel, we assume all PDs are given in their tilted representations and, unless otherwise noted, abuse notation by writing and for and , respectively. We also fix the homological dimension of features in a PD by defining .
According to Bayes’ theorem, posterior density is proportional to the product of a likelihood function and a prior. To adopt Bayesian framework to PDs, we need to define two models. In particular, our Bayesian framework views a random PD as a Poisson PP equipped with a prior intensity while observed PDsare considered to be marks from a marked Poisson PP. This enables modification of the prior intensity by incorporating observed PDs, yielding a posterior intensity based on data. Some parallels between our Bayesian framework and that for random variables (RVs) are illustrated in Table 1.
|Bayesian Framework for RVs||Bayesian Framework for Random PDs|
|Prior||Modeled by a prior density||Modeled by a Poisson PP with prior intensity|
|Likelihood||Depends on observed data||Stochastic kernel that depends on observed PDs|
|Posterior||Compute the posterior density||A Poisson PP with posterior intensity|
Let be a finite PP and consider the following:
For , and are independent.
For fixed, and some , and are independent Poisson PPs having intensity functions and , respectively.
For fixed, where
is a marked Poisson PP with a stochastic kernel density .
and are independent finite Poisson PPs where has intensity function .
Hereafter we abuse notation by writing for . The modeling assumption (M1) allows us to develop results in isolation for each homological dimension then combine them using independence. In (M2), the random persistence diagram modeled as a Poisson PP is decomposed into two PPs, , to account for scenarios in which a point from has no mark in observed persistence diagrams. Indeed, depending upon the noise level in data, it is possible that features in are not represented in observations. To this end, the intensities of and are proportional to the intensity weighted by and respectively, where is the probability of a point to spawn a mark, and the total prior intensity for is given by their sum. By the same token, (M3) considers observed persistence diagram and decomposes it into two independent PDs, and . is linked to via a marked point process with likelihood defined in Equation (1), whereas the component includes any point that arises from noise or unanticipated geometry. See Figure 7 for a graphical representation of these ideas.
Theorem 3.1 (Bayesian Theorem for Persistence Diagrams).
Let be a persistence diagram modeled by a Poisson PP as in (M2). Decompose into and where and have prior intensities and , respectively. Consider independent samples from the point process that characterizes the persistence diagram of (M3) and denote where for all . Moreover, is the likelihood associated for the stochastic kernel between and , and is the intensity of as defined in (M3). Then, the posterior intensity of given is
The proof can be found in the appendix.
3.2 A Conjugate Family of Prior Intensities: Gaussian Mixtures
This section focuses on constructing a family of conjugate prior intensities, i.e., a collection of priors that yield posterior intensities of the same form when used in Equation (3). Exploiting Theorem 3.1 with Gaussian mixture prior intensities, we obtain Gaussian mixture posterior intensities. As PDs are stochastic point processes on the space , not , we consider a restricted Gaussian density restricted to . Namely, for a Gaussian density on , , with mean and covariance matrix , we restrict the Gaussian density on as
where is the indicator function of the wedge .
Consider a random persistence diagram as in (M2) and a collection of observed PDs that are independent samples from Poisson PP characterizing the PD in (M3). We denote . Below we specialize (M2) and (M3) so that applying Theorem 3.1 to a mixed Gaussian prior intensity yields a mixed Gaussian posterior:
, where and are independent Poisson PPs with intensities and , respectively, with
where is the number of mixture components.
the marked Poisson PP has density given by
and are independent finite Poisson PPs and has intensity function given below.
where is the number of mixture components.
For matrices , with and positive definite ,and a vector ,
, where and .
Proof of Proposition 3.1.
Using Lemma 3.2, we first derive by observing that, in our model, and . By typical matrix operations we obtain, , and . Hence the numerator and denominator of the second term in Equation (3), , and respectively, yield
where the bracketed expression is the definition of . ∎
Here, we present a detailed example of computing the posterior intensity according to Equation (8) for a range of parametric choices. We consider circular point clouds often associated with periodicity in signals  and focus on estimating homological features with as they correspond to 1-dimensional holes, which describe the prominent topological feature of a circle. Precisely our goals are to: (i) illustrate posterior intensities and draw analogies to standard Bayesian inference; (ii) determine the relative contributions of the prior and observed data to the posterior; and (iii) perform sensitivity analysis.
|Weakly informative Prior|
|Unimodal Uninformative Prior|
|Bimodal Uninformative Prior|
We start by considering a Poisson PP with prior intensity that has the Gaussian mixture form given in (M2). We take into account four types of prior intensities: (i) informative, (ii) weakly informative, (iii) unimodal uninformative, and (iv) bimodal uninformative; see Figures 21–(l)l (a), (d), (g), (j), respectively. We use one Gaussian component in each of the first three priors as the underlying shape has single dimensional feature and two for the last one to include a case where we have no information about the cardinality of the underlying true diagram. The parameters of the Gaussian mixture density in Equation (5) used to compute these prior intensities are listed in Table 2. To present the intensity maps uniformly throughout this example while preserving their shapes, we divide the intensities by their corresponding maxima. This ensures all intensities are on a scale from to , and we call it the scaled intensity. The observed PDs are generated from point clouds sampled uniformly from the unit circle and then perturbed by varying levels of Gaussian noise; see Figure 8 wherein we present three point clouds sampled with Gaussian noise having variances , , and , respectively. Consequently, these point clouds provide persistence diagrams for , which are considered as independent samples from Poisson point process , exhibiting distinctive characteristics such as only one prominent feature with high persistence and no spurious features (Case-I), one prominent feature with high persistence and very few spurious features (Case-II), and one prominent feature with medium persistence and more spurious features (Case-III).
Parameters for (M3) in Equation (6) and (7). We set the weight and mean of the Gaussian component, and respectively for all of the cases. The first row corresponds to parameters in the functions characterizing that are used in computing the posterior depicted in the first column of Figure 7. The second row corresponds to analogous parameters that are used in computing the posterior depicted in the second columns of Figures 21–7. Similarly, the third row corresponds to parameters in the functions characterizing used for computing the posterior presented in the third columns of Figures 21–7.
For each observed PD, persistence features are presented as green dots overlaid on their corresponding posterior intensity plots. For Cases-I-III, we set the probability of the event that a feature in appears in to , i.e., any feature in is certainly observed through a mark in , and later in Case-IV, we decrease to while keeping all other parameters the same for the sake of comparison. The choice of anticipates that any feature has equal probability to appear or disappear in the observation and in turn provides further intuition about the contribution of prior intensities to the estimated posteriors. We observe that in all cases, the posterior estimates the dimensional hole; however, with different uncertainty each time. For example, for the cases where the data are trustworthy, expressed by a likelihood with tight variance, or in the case of an informative prior, the posterior accurately estimates the dimensional hole. In contrast, when the data suffer from high uncertainty and the prior is uninformative, then the posterior offers a general idea that the true underlying shape is a circle, but the exact estimation of the 1-dimensional hole is not accurate. We examine the cases below.
Case-I: We consider informative, weakly informative, unimodal uninformative and bimodal uninformative prior intensities as presented in Figure 21 (a), (d), (g) and (j) respectively to compute corresponding posterior intensities. The prior intensities parameters are listed in Table 2. The observed PD is obtained from the point cloud in Figure 8 (left). The parameters associated to the observed PD are listed in Table 3. For the observed PD arising from data with very low noise, we observe that the posterior computed from any of the priors predicts the existence of a one dimensional hole accurately. Firstly, with a low variability in observed persistence diagram , the posterior intensities estimate the hole with high certainty (Figure 21 (b), (e), (h) and (k) respectively). Next, to determine the effect of observed data on the posterior, we increase the variance of the observed PD component , which consists of features in observed PDs that are associated to the underlying prior. Here, we observe that the posterior intensities still estimate the hole accurately due to the trustworthy data; this is evident in Figure 21 (c), (f), (i) and (l). In Figure 21, the posteriors in (b), (e), (h), and (k) have lower variance around the 1-dimensional feature in comparison to (c), (f), (i), and (l) respectively.
, we observe increased intensity skewed towards the spurious point in (f). Furthermore in (i) and (l), we observe bimodality in the posterior intensity.
Case-II: Here, we consider all four priors as in Case-I (see Figure 34 (a), (d), (g) and (j)). The point cloud in Figure 8 (center) is more perturbed around the unit circle than that of Case-I (Gaussian noise with variance ). Due to this, the associated PD exhibits spurious features. The parameters used for this case are listed in Table 3. We compute the posterior intensities for each type of prior. First, to illustrate the posterior intensity and check the capability of detecting the dimensional feature, we use moderate noise for the observed PD . The results are presented in Figure 34 (b), (e), (h), and (k); overall, the posteriors estimate the prominent feature with different variances in their respective posteriors. Next, to illustrate the effect of observed data on the posterior, we increase the variance of . According to our Bayesian model, the persistence diagram component contains features that are not associated with , so increasing yields that every observed point is linked to , and therefore one may expect to observe increased intensity skewed towards the spurious points that arise from noise. Indeed, posterior intensities with weakly informative, unimodal uninformative, and bimodal uninformative priors exhibit the skewness toward the spurious point in Figure 34 (f), (i) and (l) respectively, but this is not the case when an informative prior is used. In (f), we observe increased intensity skewing towards the spurious points, and in (i) and (l) the intensity appears to be bimodal with two modes – one at the prominent and other at the spurious point. For the bimodal uninformative prior since one mode is located close to the spurious point in the observed PD, we observe higher intensity for that mode in the posterior (Figure 34 (l)) with another mode estimating the prominent feature.
Case-III: We again consider the four types of priors here. The observed PD constructed from the point cloud in Figure 8 (right). The point cloud has Gaussian noise with variance and due to the high noise level in sampling relative to the unit circle, the associated PD exhibits one prominent feature and several spurious features. We repeat the parameter choices as in Case-I for the variances of observed PD. For the choice of and , the posteriors computed from all of the four priors are able to detect the difference between the one prominent and other spurious points. We increase the variance of to determine the effect of observed PD on the posterior and we observe that only the posterior intensity from informative prior has evidence of the hole (Figure (l)l(c)). For the weakly informative and uninformative priors, while the posteriors in (f), (i) and (l) may not detect the hole clearly, in (f) we observe a mode with higher variance and in (i) and (l), a tail towards the high persistence point implying presence of a hole. It should be noted that with the informative prior the posterior intensity identifies the hole closer to the mode of the prior as we increase the variance in .
Case-IV: Lastly, in this case we concentrate on the effect of . The rest of the parameters used for this case remain the same and are listed in Table 3. We decrease to to model the scenario that a feature in has equal probability to appear or vanish in observed . The columns of Figure 7 correspond to the parameters of the observed persistence diagram used in computing the posteriors depicted in the third column of Figure 21, third column of Figure 34, and second column of Figure (l)l respectively. By comparing them with their respective cases, we notice a change in the intensity level in all of these due to the first term of the posterior intensity on the right hand side of Equation (8). Comparing with the respective figures in Case-I, we observe that the posterior intensities are estimating the hole with higher variability for the weakly informative and unimodal uninformative priors. For bimodal prior, we observe bimodality in the posterior. Next for Case-II, the existence of a hole is evident for informative and weakly informative priors with higher uncertainty when compared to their previous cases. The unimodal and bimodal uninformative priors lead to bimodal and trimodal posteriors, respectively. We observe that the posterior resembles the prior intensity more closely when we compare them to respective figures in Case-III. One can especially see this with the informative, weakly informative and bimodal uninformative priors, which have significantly increased intensities at the location of the modes of prior.
The Bayesian framework introduced in this paper allows us to explicitly compute the posterior intensity of a PD given data and prior knowledge. This lays the foundation for supervised statistical learning methods in classification. In this section, we build a Bayes factor classification algorithm based on notions discussed in Section 3 and then apply it on materials data, in particular, on measurements for spatial configurations of atoms.
We commence our classification scheme with a persistence diagram belonging to an unknown class. We assume that is sampled from a Poisson point process in with the prior intensity having the form in (M2). Consequently, its probability density has the form
where , with probability as in (M2). Next suppose we have two training sets and from two classes of random diagrams and , respectively. The likelihood densities of respective classes take the form of Equation (6). We then follow Equation (8) to obtain the posterior intensities of given the training sets and
from the prior intensities and likelihood densities. In particular, the corresponding posterior probability density ofgiven the training set is
and the posterior probability density given is given by an analogous expression. The Bayes factor defined by
provides the decision criterion for assigning to either or . More specifically, for a threshold , implies that belongs to and implies otherwise. We summarize this scheme in Algorithm 1.
4.1 Real Data
A crucial first step in understanding properties of a material is determining its crystal structure. For highly disordered metallic alloys, such as High Entropy Alloys (HEAs), Atomic Probe Tomography (APT) gives a snapshot of the local atomic environment. APT has two main drawbacks: experimental noise and missing data. Approximately 67% of the atoms in a sample are not registered in a typical experiment, and those atoms that are captured have their spatial coordinates corrupted by experimental noise. The goal is using Algorithm 1
to classify the true crystal lattice of a noisy and sparse materials dataset, where the unit cells are either Body centered cubic (BCC) or Face centered cubic (FCC), see Figure11. The BCC structure has a single atom in the center of the cube, while the FCC has a void in its center but has atoms on the center of the cubes’ faces.
A related approach to classifying crystal structures of defective materials using deep learning was proposed by
. The authors employ a convolutional neural network for classifying the crystal structure by looking at a diffraction image. Furthermore, the authors suggest their method could be used to determine the crystal structure of APT data or other defective materials data. However, the synthetic data considered in is not a realistic representation of experimental APT data, where about 67% of the data is missing [18, 21, 25] and is corrupted by more observational noise  than considered by .
After storing both types of spatial configurations as point clouds, we compute their Rips filtrations (see Section 2), collecting resultant 1-dimensional homological features into PDs; see Figure 14. To perform classification with Algorithm 1, we started by specifying priors for each class, and . Two scenarios were considered, namely using separate informative priors and the same uninformative prior for both BCC and FCC classes. The parameters for each type of prior intensity are shown in Table 4. In particular, we chose to include two Gaussian components in the weakly informative priors because each class generally contained two types of persistence features. As can be seen in Figure 14, BCC had late birth features with either high or low persistence, while FCC had similar features with earlier births on a smaller scale. For all cases, we set and . We chose a relatively high weight for because the nature of the data implied that extremely low persistence holes were rare events arising from noise. The data set had 42 diagrams from each class. To perform 10-fold cross validation, we partitioned PDs from both classes into training and test sets. During each fold, we took the training sets from each class, and , and input them into Algorithm 1 as and , respectively. Next, we computed the Bayes factor to each diagram in the test sets. After this, we used the Bayes factors to construct receiver operating characteristic (ROC) curves and computed the resulting areas under the ROC curves (AUCs) . Finally, we used the AUCs from the 10-fold cross validation to build a bootstrapped distribution by resampling 2000 times. Information about these bootstrapped distributions is summarized in Table 5, that shows our scoring method almost perfectly distinguishes between the BCC and FCC classes using the Bayesian framework of Section 3.