sciences, computer vision (Li et al., 2014), or to classify NBA players (Lum et al., 2013). The goal of TDA is to exploit and account for the complex topology (connectivity, loops, holes, etc.) seen in modern data. The tools developed in TDA are built upon persistent homology theory (Edelsbrunner et al., 2000; Zomorodian & Carlsson, 2005; Edelsbrunner & Harer, 2010), whose main output is a descriptor called a persistence diagram (PD), which encodes in a compact form—roughly speaking, a point cloud in the upper triangle of the square—the topology of a given space or object at all scales.
Statistics on PDs. Persistence diagrams have appealing properties: in particular, they have been shown to be stable with respect to perturbations of the input data (Cohen-Steiner et al., 2007; Chazal et al., 2009, 2014). This stability is measured either in the so-called bottleneck metric or in the $p$-th diagram distance, both of which are distances computed through optimal partial matchings. While theoretically motivated and intuitive, these metrics are by definition very costly to compute. Furthermore, they are not Hilbertian, preventing a faithful application of a large class of standard machine learning tools ($k$-means, PCA, SVM) on PDs.
To circumvent the non-Hilbertian nature of the space of PDs, one can of course map diagrams onto simple feature vectors. Such features can be either finite-dimensional (Carrière et al., 2015; Adams et al., 2017) or infinite-dimensional, through kernel functions (Reininghaus et al., 2015; Bubenik, 2015; Carrière et al., 2017). A known drawback of kernel approaches on a rich geometric space such as that formed by PDs is that once PDs are mapped to feature vectors, any ensuing analysis remains in the space of such features (the “inverse image” problem inherent to kernelization). They are therefore not helpful to carry out simple tasks in the space of PDs, such as averaging PDs, namely computing the Fréchet mean of a family of PDs. Such problems call for algorithms that are able to optimize directly in the space of PDs, and were first addressed by Mileyko et al. (2011) and Turner (2013). Turner et al. (2014) provide an algorithm that converges to a local minimum of the Fréchet function by successive iterations of the Hungarian algorithm. However, the Hungarian algorithm does not scale well with the size of diagrams, and non-convexity means the iterations may converge to bad local minima.
Contributions. We reformulate the computation of diagram metrics as an optimal transport (OT) problem, opening several perspectives, among them the ability to benefit from entropic regularization (Cuturi, 2013). We provide a new numerical scheme to bound OT metrics, and therefore diagram metrics, with additive guarantees. Unlike previous approximations of diagram metrics, ours can be parallelized and implemented efficiently on GPUs. These approximations are also differentiable, leading to a scalable method to compute barycenters of persistence diagrams. In exchange for a discretized approximation of PDs, we recover a convex problem, unlike previous formulations of the barycenter problem for PDs. We demonstrate the scalability of these two advances (accurate approximation of the diagram metric at scale and barycenter computation) by providing the first tractable implementation of the $k$-means algorithm in the space of PDs.
Notations for matrix and vector manipulations. When applied to matrices or vectors, operators such as $\exp$, $\log$, and division are always meant element-wise. $u \odot v$ denotes element-wise multiplication (Hadamard product), while $Ku$ denotes the matrix-vector product of $K$ and $u$.
2 Background on OT and TDA
Optimal transport is now widely seen as a central tool to compare probability measures (Villani, 2003, 2008; Santambrogio, 2015). Given a space $\mathcal{X}$ endowed with a cost function $c : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$, we consider two discrete measures $\mu = \sum_{i=1}^{n} a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^{m} b_j \delta_{y_j}$ on $\mathcal{X}$, namely measures that can be written as positive combinations of Diracs, with weight vectors $a \in \mathbb{R}_+^n$, $b \in \mathbb{R}_+^m$ satisfying $\sum_i a_i = \sum_j b_j$, and all $x_i, y_j$ in $\mathcal{X}$. The cost matrix $C := (c(x_i, y_j))_{ij}$ and the transportation polytope $\Pi(a, b) := \{ P \in \mathbb{R}_+^{n \times m} : P \mathbf{1}_m = a,\ P^T \mathbf{1}_n = b \}$ define an optimal transport problem whose optimum $L_C(a, b)$ can be computed using either of two linear programs, dual to each other,

$$L_C(a, b) := \min_{P \in \Pi(a, b)} \langle P, C \rangle = \max_{(\alpha, \beta) \in \Psi_C} \langle \alpha, a \rangle + \langle \beta, b \rangle, \qquad (1)$$

where $\langle \cdot, \cdot \rangle$ is the Frobenius dot product and $\Psi_C$ is the set of pairs of vectors $(\alpha, \beta) \in \mathbb{R}^n \times \mathbb{R}^m$ such that their tensor sum $\alpha \oplus \beta := (\alpha_i + \beta_j)_{ij}$ is smaller than $C$, namely $\alpha_i + \beta_j \le C_{ij}$ for all $i, j$. Note that when $n = m$ and all weights in $a$ and $b$ are uniform and equal, the problem above reduces to the computation of an optimal matching, that is, a permutation $\sigma$ (with a resulting optimal plan $P$ proportional to $(\mathbf{1}_{\sigma(i) = j})_{ij}$). That problem has clear connections with diagram distances, as shown in §3.
Entropic Regularization. Solving the optimal transport problem (1) is intractable for large data. Cuturi (2013) proposes to consider a regularized formulation of that problem using entropy, namely:

$$L_C^\gamma(a, b) := \min_{P \in \Pi(a, b)} \langle P, C \rangle - \gamma h(P), \qquad (2)$$

where $\gamma > 0$ and $h(P) := - \sum_{ij} P_{ij} (\log P_{ij} - 1)$. Because the negentropy $-h$ is 1-strongly convex, that problem has a unique solution $P_\gamma$ which takes the form, using first-order conditions,

$$P_\gamma = \mathrm{diag}(u)\, K\, \mathrm{diag}(v), \qquad (4)$$

where $K := e^{-C / \gamma}$ (term-wise exponentiation), and $(u, v) \in \mathbb{R}_+^n \times \mathbb{R}_+^m$ is a fixed point of the Sinkhorn map (term-wise divisions):

$$S : (u, v) \mapsto \left( \frac{a}{K v}, \frac{b}{K^T u} \right). \qquad (5)$$

Note that this fixed point is the limit of any sequence $(u_{t+1}, v_{t+1}) := S(u_t, v_t)$, yielding a straightforward algorithm to estimate $P_\gamma$. Cuturi (2013) considers the transport cost of the optimal regularized plan, $\langle P_\gamma, C \rangle = \langle u, (K \odot C) v \rangle$, to define a Sinkhorn divergence between $\mu$ and $\nu$ (here $\odot$ is the term-wise multiplication). One has $\langle P_\gamma, C \rangle \to L_C(a, b)$ as $\gamma \to 0$, and more precisely $P_\gamma$ converges to the optimal transport plan solution of (1) with maximal entropy. That approximation can be readily applied to any problem that involves terms $L_C$, notably barycenters (Cuturi & Doucet, 2014; Solomon et al., 2015; Benamou et al., 2015).
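As a concrete illustration, the Sinkhorn iterations above can be sketched in a few lines of NumPy; the function name and the fixed iteration count are our choices, not the paper's:

```python
import numpy as np

def sinkhorn(a, b, C, gamma, n_iters=500):
    """Entropic OT: iterate the Sinkhorn map S:(u,v) -> (a/(Kv), b/(K^T u))
    and return the regularized plan P = diag(u) K diag(v) and its cost <P, C>."""
    K = np.exp(-C / gamma)             # term-wise exponentiation
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())     # plan and Sinkhorn divergence
```

In practice the fixed iteration count would be replaced by a stopping rule on the error to the marginals, as discussed in §3, and small values of gamma trade accuracy against numerical stability.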
Eulerian setting. When the set $\mathcal{X}$ is finite, $\mu$ and $\nu$ are entirely characterized by their probability weights and are often called histograms, in a Eulerian setting. When $\mathcal{X}$ is not discrete, as when considering the plane $\mathbb{R}^2$, we therefore have a choice between representing measures as sums of Diracs, encoding their information through locations, or as discretized histograms on a planar grid of arbitrary granularity. Because the latter setting is more effective for entropic regularization (Solomon et al., 2015), it is the approach we favor in our computations.
Persistent homology and Persistence Diagrams. Given a topological space $X$ and a real-valued function $f : X \to \mathbb{R}$, persistent homology provides—under mild assumptions on $f$, taken for granted in the remainder of this article—a topological signature of $f$ built on its sublevel sets $(f^{-1}((-\infty, t]))_{t \in \mathbb{R}}$, called a persistence diagram (PD) and denoted $\mathrm{Dgm}(f)$. In practice, it is of the form $\mathrm{Dgm}(f) = \sum_{i=1}^{n} \delta_{x_i}$, namely a point measure with finite support included in the open half-plane $\Omega := \{ (t_1, t_2) \in \mathbb{R}^2 : t_2 > t_1 \}$. Each point $(t_1, t_2)$ in $\mathrm{Dgm}(f)$ can be understood as a topological feature (connected component, loop, hole…) which appears at scale $t_1$ and disappears at scale $t_2$ in the sublevel sets of $f$. Comparing the persistence diagrams of two functions measures their difference from a topological perspective: presence of some topological features, difference in appearance scales, etc. The space of PDs is naturally endowed with a partial matching metric $d_p$, defined as ($p \ge 1$):

$$d_p(D_1, D_2) := \left( \min_{\zeta \in \Gamma(D_1, D_2)} \sum_{(x, y) \in \zeta} \| x - y \|^p + \sum_{s \in D_1 \cup D_2 \text{ unmatched}} \| s - \pi_\Delta(s) \|^p \right)^{\frac{1}{p}}, \qquad (6)$$

where $\Gamma(D_1, D_2)$ is the set of all partial matchings between points in $D_1$ and points in $D_2$, and $\pi_\Delta(s)$ denotes the orthogonal projection of an (unmatched) point $s$ to the diagonal $\Delta := \{ (t, t) : t \in \mathbb{R} \}$. The mathematics of OT and diagram distances share a key idea, that of matching, but differ on an important aspect: diagram metrics can cope, using the diagonal as a sink, with measures that have a varying total number of points. We bridge this gap by leveraging an unbalanced formulation for OT.
3 Fast estimation of diagram metrics using Optimal Transport
In the following, we start by explicitly formulating (6) as an optimal transport problem. Entropic smoothing provides us with a way to approximate (6) with controllable error. To benefit fully from that regularization (parallel matrix execution, convolution, GPU—as showcased in (Solomon et al., 2015)), implementation requires specific attention, as described in Propositions 2, 3 and 4.
PD metrics as Optimal Transport. The main differences between (6) and (1) are that PDs do not generally have the same mass, i.e. number of points (counted with multiplicity), and that the diagonal plays a special role, by allowing any point in a given diagram to be matched with its orthogonal projection onto the diagonal. Guittet’s formulation for partial transport (2002) can account for both differences, by creating a “sink” bin corresponding to that diagonal and allowing for different total masses. The idea of representing the diagonal as a single point already appears in the bipartite graph problem of Edelsbrunner & Harer (2010) (Ch. VIII). The important aspect of the following proposition is the clarification of the partial matching problem (6) as a standard OT problem (1).
Let $\Omega \cup \{\partial\}$ denote the half-plane $\Omega$ extended with a unique virtual point $\partial$ encoding the diagonal. We introduce the linear operator $R$ which, to a finite non-negative measure $\mu$ supported on $\Omega$, associates a Dirac on $\partial$ with mass equal to the total mass of $\mu$, namely $R\mu := |\mu| \delta_\partial$.

Proposition 1. Let $D_1 = \sum_{i=1}^{n_1} \delta_{x_i}$ and $D_2 = \sum_{j=1}^{n_2} \delta_{y_j}$ be two persistence diagrams with respectively $n_1$ points $x_1, \dots, x_{n_1}$ and $n_2$ points $y_1, \dots, y_{n_2}$. Let $p \ge 1$. Then:

$$d_p(D_1, D_2)^p = L_C(D_1 + R D_2,\ D_2 + R D_1), \qquad (7)$$

where $C$ is the cost matrix with block structure

$$C := \begin{pmatrix} \hat{C} & u \\ v^T & 0 \end{pmatrix}, \qquad (8)$$

where $\hat{C}_{ij} := \| x_i - y_j \|^p$, $u_i := \| x_i - \pi_\Delta(x_i) \|^p$ and $v_j := \| y_j - \pi_\Delta(y_j) \|^p$, for $i \le n_1$, $j \le n_2$.
The proof relies on the fact that, when transporting point measures with the same mass (number of points counted with multiplicity), the optimal transport problem is equivalent to an optimal matching problem (see §2). Details are given in the supplementary material.
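The construction of Proposition 1 can be checked numerically on small diagrams: the sketch below (our code, assuming NumPy/SciPy and a Euclidean ground norm) builds the augmented square matrix from the proof (pairwise costs, diagonal projection costs, and a zero lower-right block) and solves the resulting assignment problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diagram_distance(D1, D2, p=2.0):
    """d_p(D1, D2) via an optimal matching that also allows every point
    to be matched with its projection on the diagonal."""
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    n1, n2 = len(D1), len(D2)
    # pairwise block: ||x_i - y_j||^p
    Chat = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=2) ** p
    # cost of sending a point (t1, t2) to the diagonal: (|t2 - t1|/sqrt(2))^p
    u = (np.abs(D1[:, 1] - D1[:, 0]) / np.sqrt(2)) ** p
    v = (np.abs(D2[:, 1] - D2[:, 0]) / np.sqrt(2)) ** p
    # square (n1+n2) x (n1+n2) matrix: repeated diagonal rows/columns,
    # zero lower-right block (diagonal matched to diagonal is free)
    M = np.zeros((n1 + n2, n1 + n2))
    M[:n1, :n2] = Chat
    M[:n1, n2:] = u[:, None]
    M[n1:, :n2] = v[None, :]
    rows, cols = linear_sum_assignment(M)
    return M[rows, cols].sum() ** (1.0 / p)
```

This exact combinatorial solver is what the entropic scheme below is designed to replace at scale; it remains useful as a ground truth on small inputs.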
Entropic approximation of diagram distances. Following the correspondence established in Proposition 1, entropic regularization can be used to approximate the diagram distance $d_p(D_1, D_2)$. Given two persistence diagrams $D_1, D_2$ with respective masses $n_1$ and $n_2$, let $n := n_1 + n_2$, and let $P_t$ denote the output after $t$ iterations of the Sinkhorn map (5). Adapting the bounds provided by Altschuler et al. (2017), we can bound the error of approximating $d_p(D_1, D_2)^p$ by the transport cost of $P_t$:
where $\mathrm{dist}(P, \Pi(a, b)) := \| P \mathbf{1} - a \|_1 + \| P^T \mathbf{1} - b \|_1$ (that is, the error on marginals).
Dvurechensky et al. (2018) prove that iterating the Sinkhorn map (5) gives a plan whose error on marginals falls below a target threshold after a controlled number of iterations. Given (9), a natural choice is thus to pick that threshold as a function of the desired precision $\varepsilon$, which leads to an explicit total number of iterations in the Sinkhorn loop. These results can be used to pre-tune the parameters $\gamma$ and $t$ to control the approximation error due to smoothing. However, these are worst-case bounds, controlled by max-norms, and are often too pessimistic in practice. To overcome this, we propose on-the-fly error control, using approximate solutions to the smoothed primal (2) and dual (3) optimal transport problems, which provide upper and lower bounds on the optimal transport cost.
Upper and Lower Bounds. The Sinkhorn algorithm, after at least one iteration ($t \ge 1$), produces feasible dual variables $(\alpha, \beta)$ (see below (1) for a definition). Their objective value, as measured by $\langle \alpha, a \rangle + \langle \beta, b \rangle$, performs poorly as a lower bound of the true optimal transport cost (see Fig. 3 and §5 below) in most of our experiments. To improve on it, we compute the so-called $C$-transform of $\alpha$ (Santambrogio, 2015, §1.6), defined as:

$$\forall j, \quad \bar{\beta}_j := \min_i \, (C_{ij} - \alpha_i).$$

Applying in turn a $C$-transform to $\bar{\beta}$, we recover two vectors $(\bar{\alpha}, \bar{\beta})$. One can show that for any feasible $(\alpha, \beta)$ (Peyré & Cuturi, 2018, Prop 3.1),

$$\langle \alpha, a \rangle + \langle \beta, b \rangle \;\le\; \langle \bar{\alpha}, a \rangle + \langle \bar{\beta}, b \rangle \;\le\; L_C(a, b).$$
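In the discrete setting the $C$-transform is just a column-wise minimum, as the short NumPy sketch below illustrates (names ours); it ignores the faster grid-structured algorithms discussed next:

```python
import numpy as np

def c_transform(C, alpha):
    """Tightest dual partner of alpha: beta_j = min_i (C_ij - alpha_i),
    so that (alpha, beta) is feasible for the dual OT problem."""
    return (C - alpha[:, None]).min(axis=0)
```

By construction the transformed pair is always feasible, and transforming again can only increase the dual objective, which is what makes it a better lower bound.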
When the top-left block of $C$ is the squared Euclidean metric, this computation can be cast as that of computing the Moreau envelope of $\alpha$. In the Eulerian setting we consider, where $\mathcal{X}$ is a finite regular grid, we can use either the linear-time Legendre transform or the Parabolic Envelope algorithm (Lucet, 2010, §2.2.1, §2.2.2) to compute the $C$-transform in linear time with respect to the grid resolution.
Unlike the dual iterates, the primal iterate $P_t$ does not belong to the transport polytope after a finite number of iterations. We use the rounding_to_feasible algorithm provided by Altschuler et al. (2017) to compute efficiently a feasible approximation $R(P_t)$ of $P_t$ that does belong to the polytope. Putting these two elements together, we obtain

$$\langle \bar{\alpha}, a \rangle + \langle \bar{\beta}, b \rangle \;\le\; L_C(a, b) \;\le\; \langle R(P_t), C \rangle. \qquad (10)$$

Therefore, after iterating the Sinkhorn map (5) $t$ times, if the gap between the two bounds in (10) is below a criterion $\varepsilon$, then either bound is a fortiori an $\varepsilon$-approximation of $L_C(a, b)$. One can also obtain a relative error control: if the gap is below $\varepsilon$ times the lower bound, the upper bound is a $(1 + \varepsilon)$-approximation of $L_C(a, b)$. Note that the lower bound might be negative but can always be replaced by $0$, since $C$ has non-negative entries (and therefore $L_C(a, b) \ge 0$), while the upper bound is always non-negative.
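The rounding step admits a compact NumPy sketch (our code, following the scheme of Altschuler et al. (2017), and assuming a plan with positive entries and matching total masses):

```python
import numpy as np

def round_to_feasible(P, a, b):
    """Turn an approximately feasible plan P into one with exact marginals
    (a, b): scale down overweight rows and columns, then spread the
    remaining missing mass as a rank-one correction."""
    P = P * np.minimum(a / P.sum(axis=1), 1.0)[:, None]
    P = P * np.minimum(b / P.sum(axis=0), 1.0)[None, :]
    err_a = a - P.sum(axis=1)
    err_b = b - P.sum(axis=0)
    s = err_a.sum()
    if s <= 1e-16:                      # already feasible
        return P
    return P + np.outer(err_a, err_b) / s
```

The output stays non-negative and its transport cost provides the upper bound in (10).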
Discretization. For simplicity, we assume in the remainder that our diagrams have their support included in the unit square. From a numerical perspective, encoding persistence diagrams as histograms on the square offers numerous advantages. Given a uniform grid of size $d \times d$ on the square, we associate to a given diagram $D$ a matrix-shaped histogram $\mathbf{a} \in \mathbb{R}^{d \times d}$ such that $\mathbf{a}_{ij}$ is the number of points in $D$ belonging to the cell located at position $(i, j)$ in the grid (we transition to bold-faced small letters to insist on the fact that these histograms must be stored as square matrices). To account for the total mass, we add an extra dimension encoding mass on the virtual diagonal point, and we extend the diagonal operator of Proposition 1 to histograms, associating to a histogram its total mass placed on that extra coordinate. One can show that the approximation error resulting from this discretization decreases with the grid resolution $d$ (an explicit bound is given in the supplementary material).
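Building such a histogram is a direct binning pass; a minimal sketch (our code, assuming diagrams rescaled to the unit square):

```python
import numpy as np

def diagram_to_histogram(D, d):
    """Bin the points of a diagram with support in the unit square onto a
    d x d uniform grid; entry (i, j) counts the points in that cell."""
    H = np.zeros((d, d))
    for t1, t2 in D:
        i = min(int(t1 * d), d - 1)   # clamp points on the upper boundary
        j = min(int(t2 * d), d - 1)
        H[i, j] += 1.0
    return H
```

The extra diagonal-mass coordinate is then filled in from the total mass of the other diagram in the pair, following Proposition 1.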
Convolutions. In the Eulerian setting, where diagrams are matrix-shaped histograms of size $d \times d$, the cost matrix has size $d^2 \times d^2$. Since we use large values of $d$ to keep the discretization error low, instantiating this matrix is usually intractable. However, Solomon et al. (2015) showed that for regular grids endowed with a separable cost, each Sinkhorn iteration (as well as other key operations, such as evaluating the Sinkhorn divergence) can be performed using Gaussian convolutions, which amounts to performing matrix multiplications of size $d \times d$, without having to manipulate $d^2 \times d^2$ matrices. Our framework is slightly different due to the extra mass dimension, but we show that equivalent computational properties hold. This observation is crucial from a numerical perspective. Our ultimate goal being to efficiently evaluate (11), (12) and (14), we provide implementation details.
Let be a pair where is a matrix-shaped histogram and is a real number accounting for the mass located on the virtual point . We denote by the column vector obtained when reshaping . The cost matrix and corresponding kernel are given by
where , . and as defined above will never be instantiated, because we can rely instead on defined as and .
Proposition 2 (Iteration of Sinkhorn map).
The application of to can be performed as:
where $\langle \cdot, \cdot \rangle$ denotes the Frobenius dot-product in $\mathbb{R}^{d \times d}$.
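The separability argument can be checked directly: for a squared Euclidean cost on a regular grid, the $d^2 \times d^2$ kernel factorizes as the Kronecker product of a one-dimensional kernel with itself, so applying it costs two $d \times d$ products. A NumPy sketch (names ours):

```python
import numpy as np

def gaussian_kernel_1d(d, gamma):
    """1-D kernel k[i, i'] = exp(-(x_i - x_i')^2 / gamma) on a uniform grid."""
    x = np.linspace(0.0, 1.0, d)
    return np.exp(-((x[:, None] - x[None, :]) ** 2) / gamma)

def apply_kernel(U, k):
    """K u for K = exp(-C/gamma) with C the separable squared-Euclidean cost
    on the d x d grid: since K = kron(k, k) and k is symmetric, the product
    K u reshapes to k @ U @ k."""
    return k @ U @ k
```

This reduces the cost of one kernel application from $O(d^4)$ to $O(d^3)$ and never instantiates the full kernel.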
We now introduce and ( denotes term-wise multiplication).
Proposition 3 (Computation of ).
Let . The transport cost of can be computed as:
where the first term can be computed as:
Finally, consider two histograms , let be the rounded matrix of (see the supplementary material or (Altschuler et al., 2017)). Let denote the first and second marginal of respectively. We introduce (using term-wise min and divisions):
along with and the marginal errors:
Proposition 4 (Computation of upper bound ).
The transport cost induced by can be computed as:
Note that the first term can be computed using (12).
Parallelization and GPU. Using a Eulerian representation is particularly beneficial when applying Sinkhorn’s algorithm, as shown by Cuturi (2013). Indeed, the Sinkhorn map (5) only involves matrix-vector operations. When dealing with a large number of histograms, concatenating these histograms and running Sinkhorn’s iterations in parallel as matrix-matrix products results in significant speedups, and can exploit GPGPU hardware to compare a large number of pairs simultaneously. This makes our approach especially well-suited to large sets of persistence diagrams.
We can now estimate distances between persistence diagrams with Alg. 1 in parallel by performing only $d \times d$-sized matrix multiplications, leading to a computational scaling in $d^3$, where $d$ is the grid resolution parameter. Note that a standard stopping criterion in the Sinkhorn iteration process is to check the error to the marginals, as motivated by (9).
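The batching idea is a small change to the plain Sinkhorn loop; a NumPy sketch (our code; on GPU the same pattern applies with the corresponding array library):

```python
import numpy as np

def batched_sinkhorn(A, B, K, n_iters=300):
    """Run Sinkhorn on many histogram pairs at once: A and B stack one
    histogram per column, so each iteration is a single matrix-matrix
    product instead of a loop of matrix-vector products."""
    U, V = np.ones_like(A), np.ones_like(B)
    for _ in range(n_iters):
        U = A / (K @ V)
        V = B / (K.T @ U)
    return U, V
```

Each column of (U, V) yields the regularized plan for the corresponding pair, exactly as in the single-pair case.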
4 Smoothed barycenters for persistence diagrams
OT formulation for barycenters. We show in this section that the benefits of entropic regularization also apply to the computation of barycenters of PDs. As the space of PDs is not Hilbertian but only a metric space, the natural definition of barycenters is as Fréchet means for the metric, first introduced (for PDs) in (Mileyko et al., 2011).
Given a set of persistence diagrams $D_1, \dots, D_N$, a barycenter of $D_1, \dots, D_N$ is any solution of the following minimization problem:

$$\min_{\mu \in \mathcal{M}} \; E(\mu) := \sum_{i=1}^{N} d_2(\mu, D_i)^2,$$

where $d_2$ is induced by the cost defined in (8) with $p = 2$ (but our approach adapts easily to any finite $p$), and $\mathcal{M}$ denotes the set of non-negative finite measures supported on $\Omega$. $E(\mu)$ is called the energy of $\mu$.
Consider the restriction of this functional to the space of persistence diagrams (finite point measures). Turner et al. (2014) proved the existence of its minimizers and proposed an algorithm that converges to a local minimum of the functional, using the Hungarian algorithm as a subroutine. Their algorithm will be referred to as the B-Munkres algorithm. The non-convexity of the restricted functional can be a real limitation in practice, since it can have arbitrarily bad local minima (see Lemma 1 in the supplementary material). Note that minimizing over all measures instead of over diagrams does not give strictly better minimizers (see Proposition 6 in the supplementary material). We therefore apply entropic smoothing to this problem. This relaxation offers differentiability and circumvents both non-convexity and numerical scalability issues.
Entropic smoothing for PD barycenters. In addition to numerical efficiency, an advantage of smoothed optimal transport is that the resulting divergence is differentiable. In the Eulerian setting, its gradient with respect to its first argument is given by centering the vector of dual potentials $\gamma \log(u)$, where $(u, v)$ is a fixed point of the Sinkhorn map (5), see (Cuturi & Doucet, 2014). This result can be adapted to our framework, namely:

Let $D_1, \dots, D_N$ be PDs, and $\mathbf{d}_1, \dots, \mathbf{d}_N$ the corresponding histograms on a grid. The gradient of the smoothed barycenter functional at a candidate histogram is obtained by averaging the centered dual potentials of the transport problems sending that candidate onto each $\mathbf{d}_i$, where each potential is obtained from a fixed point of the Sinkhorn map, and the adjoint of the diagonal-mass operator accounts for the extra bin.
As in (Cuturi & Doucet, 2014), this result follows from the envelope theorem, with the added subtlety that the candidate barycenter appears in both arguments of the transport cost. This formula can be exploited to compute barycenters via gradient descent, yielding Algorithm 2. Following (Cuturi & Doucet, 2014, §4.2), we use a multiplicative update. This is a particular case of mirror descent (Beck & Teboulle, 2003) and is equivalent to a (Bregman) projected gradient descent on the positive orthant, retaining positive coefficients throughout iterations.
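To illustrate the multiplicative scheme, here is a much-simplified NumPy sketch on plain strictly positive histograms, ignoring the diagonal bin and with an arbitrary step size and iteration count (all names and parameter choices are ours):

```python
import numpy as np

def sinkhorn_potential(a, b, K, n_iters=200):
    """Return log(u) at the Sinkhorn fixed point transporting a onto b;
    up to the factor gamma, this is the dual potential driving the gradient."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.log(u)

def smoothed_barycenter(hists, K, n_steps=50, lr=0.5):
    """Minimize the average Sinkhorn divergence to the input histograms by a
    multiplicative (mirror-descent) update, staying in the positive orthant."""
    z = np.mean(hists, axis=0)                 # Euclidean initialization
    mass = z.sum()
    for _ in range(n_steps):
        g = np.mean([sinkhorn_potential(z, h, K) for h in hists], axis=0)
        g -= g.mean()                          # center the potential
        z = z * np.exp(-lr * g)                # multiplicative update
        z *= mass / z.sum()                    # keep the total mass fixed
    return z
```

The mass renormalization is our simplification; the paper's Algorithm 2 instead handles varying masses through the diagonal bin.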
As can be seen in Fig. 4, the barycentric persistence diagrams are smeared. If one wishes to recover spikier diagrams, quantization and/or entropic sharpening (Solomon et al., 2015, §6.1) can be applied, as can smaller values of $\gamma$, at the price of computational speed or numerical stability. We will consider these extensions in future work.
A comparison with linear representations. When doing statistical analysis with PDs, a standard approach is to transform a diagram into a finite-dimensional vector—in a stable way—and then perform statistical analysis and learning with a Euclidean structure. This approach does not preserve the Wasserstein-like geometry of the diagram space and thus loses the algebraic interpretability of PDs. Fig. 1 gives a qualitative illustration of the difference between Wasserstein barycenters (Fréchet means) of PDs and Euclidean barycenters (linear means) of persistence images (Adams et al., 2017), a commonly used vectorization for PDs (Makarenko et al., 2016; Zeppelzauer et al., 2016; Obayashi et al., 2018).
All experiments are run with $p = 2$, but would work with any finite $p$. This choice is consistent with the work of Turner et al. (2014) for barycenter estimation.
A large scale approximation. Iterating the Sinkhorn map (5) yields a transport cost whose value converges to the true transport cost as $\gamma \to 0$ and as the number of iterations grows (Cuturi, 2013). We quantify this convergence experimentally in Fig. 3, using the upper and lower bounds provided in (10) for decreasing values of $\gamma$. We consider a set of randomly generated pairs of diagrams, discretized on a grid, and run Alg. 1 over a decreasing range of $\gamma$, along with the corresponding upper and lower bounds described in (10). For each pair of diagrams, we center our estimates by removing the true distance, so that the target cost is $0$ across all pairs. We plot the median and extreme percentiles for both bounds. Using the $C$-transform provides a much better lower bound in our experiments. It is however inefficient in practice: despite a theoretical complexity linear in the grid size, the sequential structure of the algorithms described in (Lucet, 2010) makes them unsuited for GPGPU to our knowledge.
We then compare the scalability of Alg. 1, with respect to the number of points in the diagrams, with that of Kerber et al. (2017), who provide a state-of-the-art algorithm with publicly available code—referred to as Hera—to estimate distances between diagrams. For both algorithms, we compute the average time to estimate a distance between two random diagrams as the number of points grows. To compare their scalability, we plot in Fig. 5 the ratio of the running times of both algorithms.
Fast barycenters and $k$-means on large PD sets. We compare our Alg. 2 (referred to as Sinkhorn) to the combinatorial algorithm of Turner et al. (2014) (referred to as B-Munkres). We use the script munkres.py provided on the website of K. Turner for their implementation. We record in Fig. 6 the running times of both algorithms on sets of diagrams of increasing size, on an Intel Xeon 2.3 GHz (CPU) and a P100 (GPU, Sinkhorn only). When running Alg. 2, the gradient descent is performed until the relative decrease of the energy falls below a fixed threshold. Our experiment shows that Alg. 2 drastically outperforms B-Munkres as the number of points increases; we interrupt B-Munkres once computational time becomes prohibitive.
Aside from computational efficiency, we highlight the benefits of operating with a convex formulation in Fig. 7. Due to non-convexity, the B-Munkres algorithm is only guaranteed to converge to a local minimum, and its output depends on initialization. We illustrate on a toy set of diagrams how our algorithm avoids bad local minima thanks to the Eulerian approach we take.
We now merge Alg. 1 and Alg. 2 to perform unsupervised clustering via $k$-means on PDs. We work with the 3D-shape database provided by Sumner & Popović (2004) and generate diagrams in the same way as in (Carrière et al., 2015). The database contains the classes camel, cat, elephant, horse, head and face. In practice, this unsupervised clustering algorithm detects two main clusters: faces and heads on one hand, and camels and horses on the other, which are systematically grouped together. Fig. 8 illustrates the convergence of our algorithm and the computed centroids for the aforementioned clusters.
In this work, we took advantage of a link between PD metrics and optimal transport to leverage and adapt entropic regularization to persistence diagrams. Our approach relies on matrix manipulations rather than combinatorial computations, enabling parallelization and efficient use of GPUs. We provide bounds to control approximation errors. We use these differentiable approximations to estimate barycenters of PDs significantly faster than existing algorithms, and showcase their application by clustering thousands of diagrams built from real data. We believe this first step will open the way to new statistical tools for TDA and to ambitious data analysis applications of persistence diagrams.
We thank the anonymous reviewers for the fruitful discussion. TL was supported by the AMX, École polytechnique. MC acknowledges the support of a Chaire d’Excellence de l’Idex Paris-Saclay.
- Adams et al. (2017) Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. Persistence images: a stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35, 2017.
- Agueh & Carlier (2011) Agueh, M. and Carlier, G. Barycenters in the wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011.
- Altschuler et al. (2017) Altschuler, J., Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via sinkhorn iteration. In Advances in Neural Information Processing Systems, pp. 1961–1971, 2017.
- Anderes et al. (2016) Anderes, E., Borgwardt, S., and Miller, J. Discrete wasserstein barycenters: optimal transport for discrete data. Mathematical Methods of Operations Research, 84(2):389–409, 2016.
- Beck & Teboulle (2003) Beck, A. and Teboulle, M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
- Benamou et al. (2015) Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyré, G. Iterative bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
- Bubenik (2015) Bubenik, P. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1):77–102, 2015.
- Carlier et al. (2015) Carlier, G., Oberman, A., and Oudet, E. Numerical methods for matching for teams and wasserstein barycenters. ESAIM: Mathematical Modelling and Numerical Analysis, 49(6):1621–1642, 2015.
- Carrière et al. (2015) Carrière, M., Oudot, S. Y., and Ovsjanikov, M. Stable topological signatures for points on 3d shapes. In Computer Graphics Forum, volume 34, pp. 1–12. Wiley Online Library, 2015.
- Carrière et al. (2017) Carrière, M., Cuturi, M., and Oudot, S. Sliced wasserstein kernel for persistence diagrams. In 34th International Conference on Machine Learning, 2017.
- Chazal et al. (2009) Chazal, F., Cohen-Steiner, D., Glisse, M., Guibas, L. J., and Oudot, S. Y. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pp. 237–246. ACM, 2009.
- Chazal et al. (2014) Chazal, F., De Silva, V., and Oudot, S. Persistence stability for geometric complexes. Geometriae Dedicata, 173(1):193–214, 2014.
- Cohen-Steiner et al. (2007) Cohen-Steiner, D., Edelsbrunner, H., and Harer, J. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.
- Cuturi (2013) Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pp. 2292–2300, 2013.
- Cuturi & Doucet (2014) Cuturi, M. and Doucet, A. Fast computation of wasserstein barycenters. In International Conference on Machine Learning, pp. 685–693, 2014.
- Dvurechensky et al. (2018) Dvurechensky, P., Gasnikov, A., and Kroshnin, A. Computational optimal transport: Complexity by accelerated gradient descent is better than by sinkhorn’s algorithm. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 1367–1376, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/dvurechensky18a.html.
- Edelsbrunner & Harer (2010) Edelsbrunner, H. and Harer, J. Computational topology: an introduction. American Mathematical Soc., 2010.
- Edelsbrunner et al. (2000) Edelsbrunner, H., Letscher, D., and Zomorodian, A. Topological persistence and simplification. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pp. 454–463. IEEE, 2000.
- Fréchet (1948) Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l’institut Henri Poincaré, volume 10, pp. 215–310. Presses universitaires de France, 1948.
- Guittet (2002) Guittet, K. Extended Kantorovich norms: a tool for optimization. PhD thesis, INRIA, 2002.
- Hiraoka et al. (2016) Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E. G., Matsue, K., and Nishiura, Y. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences, 113(26):7035–7040, 2016.
- Kerber et al. (2017) Kerber, M., Morozov, D., and Nigmetov, A. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA), 22(1):1–4, 2017.
- Li et al. (2014) Li, C., Ovsjanikov, M., and Chazal, F. Persistence-based structural recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1995–2002, 2014.
- Lucet (2010) Lucet, Y. What shape is your conjugate? a survey of computational convex analysis and its applications. SIAM review, 52(3):505–542, 2010.
- Lum et al. (2013) Lum, P., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., Carlsson, J., and Carlsson, G. Extracting insights from the shape of complex data using topology. Scientific reports, 3:1236, 2013.
- Makarenko et al. (2016) Makarenko, N., Kalimoldayev, M., Pak, I., and Yessenaliyeva, A. Texture recognition by the methods of topological data analysis. Open Engineering, 6(1), 2016.
- Mileyko et al. (2011) Mileyko, Y., Mukherjee, S., and Harer, J. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 2011.
- Moreau (1965) Moreau, J.-J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France, 93(2):273–299, 1965.
- Nicolau et al. (2011) Nicolau, M., Levine, A. J., and Carlsson, G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, 108(17):7265–7270, 2011.
- Obayashi et al. (2018) Obayashi, I., Hiraoka, Y., and Kimura, M. Persistence diagrams with linear machine learning models. Journal of Applied and Computational Topology, 1(3-4):421–449, 2018.
- Peyré & Cuturi (2018) Peyré, G. and Cuturi, M. Computational Optimal Transport, 2018. URL http://arxiv.org/abs/1803.00567.
- Reininghaus et al. (2015) Reininghaus, J., Huber, S., Bauer, U., and Kwitt, R. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4741–4748, 2015.
- Santambrogio (2015) Santambrogio, F. Optimal transport for applied mathematicians. Birkäuser, NY, 2015.
- Schrijver (1998) Schrijver, A. Theory of linear and integer programming. John Wiley & Sons, 1998.
- Solomon et al. (2015) Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34(4):66, 2015.
- Sumner & Popović (2004) Sumner, R. W. and Popović, J. Deformation transfer for triangle meshes. In ACM Transactions on Graphics (TOG), volume 23, pp. 399–405. ACM, 2004.
- Turner (2013) Turner, K. Means and medians of sets of persistence diagrams. arXiv preprint arXiv:1307.8300, 2013.
- Turner et al. (2014) Turner, K., Mileyko, Y., Mukherjee, S., and Harer, J. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):44–70, 2014.
- Villani (2003) Villani, C. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
- Villani (2008) Villani, C. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
- Zeppelzauer et al. (2016) Zeppelzauer, M., Zieliński, B., Juda, M., and Seidl, M. Topological descriptors for 3d surface analysis. In International Workshop on Computational Topology in Image Context, pp. 77–87. Springer, 2016.
- Zomorodian & Carlsson (2005) Zomorodian, A. and Carlsson, G. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005.
7 Supplementary material
7.1 Omitted proofs from Section 3
Diagram metrics as optimal transport:
We recall that we consider two persistence diagrams $D_1$ and $D_2$, with respectively $n_1$ points and $n_2$ points, and that $C$ is the cost matrix with block structure
Proof of Prop. 1.
Let $n := n_1 + n_2$. Since the two extended measures are point measures, that is, discrete measures of the same mass with integer weights at each point of their support, computing their optimal transport cost is an assignment problem of size $n$, as introduced in §2. It is equivalent to finding an optimal matching, represented by some permutation $\sigma$, for the square cost matrix built from $C$ by repeating the last line $n_2$ times in total, repeating the last column $n_1$ times in total, and replacing the lower-right corner by a matrix of zeros. The optimal $\sigma$ defines a partial matching between $D_1$ and $D_2$, in which $x_i$ is matched to $y_j$ iff $\sigma(i) = j$ with $i \le n_1$, $j \le n_2$. Such pairs of points induce a cost $\| x_i - y_j \|^p$, while the other points (referred to as unmatched) induce a cost equal to the $p$-th power of their distance to the diagonal. Then:
Error control due to discretization:
Let $D_1, D_2$ be two diagrams and $\mathbf{d}_1, \mathbf{d}_2$ their respective representations as histograms. These histograms can be viewed as diagrams deduced from $D_1, D_2$ respectively by moving any mass located at a point to the center of its grid cell, inducing an error of at most half a cell diagonal for each point. We identify each histogram with the corresponding diagram in the following. We recall that $d_p$ is a distance over persistence diagrams and thus satisfies the triangle inequality, leading to:
Thus, the error made is upper bounded by a quantity that vanishes as the grid resolution increases.
We keep the same notations as in the core of the article and give details regarding the iteration schemes provided in the paper.
Proof of prop 2.
Given a histogram and a mass on the virtual point, one can observe that (see below):
In particular, the operation can be performed by manipulating only $d \times d$ matrices. Indeed, observe that:
so we have:
Thus we have in our case:
where $\langle \cdot, \cdot \rangle$ designates the Frobenius dot product between two histograms. Note that these computations only involve manipulations of matrices of size $d \times d$. ∎
Proof of prop 3.
Thus, we finally have:
Finally, taking the diagonal bin into consideration,
Remark: the first term corresponds to the cost of the effective (point-to-point) mapping, and the other two to the mass mapped to the diagonal. ∎
To address the last proof, we recall below the rounding_to_feasible algorithm introduced by Altschuler et al. (2017); the first and second marginals of a matrix denote respectively its vector of row sums and its vector of column sums.
Proof of prop 4.
By straightforward computations, the first and second marginals of are given by:
Observe that and can be computed using Proposition 2.
Now, the transport cost computation is:
The first term is the transport cost induced by a rescaling of the plan and can be computed with Prop. 3. Consider now the second term. Without considering the additional bin, we have:
so that, in our framework (with the additional bin), it follows that:
Putting things together finally proves the claim. ∎
7.2 Omitted proofs from Section 4
We first observe that the functional defined over arbitrary measures does not have local minima (while its restriction to persistence diagrams does). For points of $\Omega$, we extend the Euclidean norm by the distance from a point to its orthogonal projection onto the diagonal $\Delta$, and we denote the corresponding cost function (the continuous analogue of the matrix defined in (8)). (Optimal transport between non-discrete measures was not introduced in the core of this article for the sake of concision; it is a natural extension of the notions introduced in §2: distances, primal and dual problems, barycenters. We refer the reader to (Santambrogio, 2015; Villani, 2008) for more details.)
Proposition (Convexity of ).
For any two measures $\mu$ and $\nu$, we have:
We denote the dual variables involved when computing the optimal transport plan between the two measures. Note that the maxima are taken over the corresponding feasible dual set: