PersLay: A Simple and Versatile Neural Network Layer for Persistence Diagrams

04/20/2019 · Mathieu Carrière et al. (Fujitsu, Inria, Columbia University)

Persistence diagrams, a key descriptor from Topological Data Analysis, encode and summarize all sorts of topological features and have already proved pivotal in many different applications of data science. But persistence diagrams are weakly structured and therefore constitute a difficult input for most Machine Learning techniques. To address this concern, several vectorization methods have been put forward that embed persistence diagrams into either finite-dimensional Euclidean spaces or implicit Hilbert spaces with kernels. But finite-dimensional embeddings are prone to miss a lot of information about persistence diagrams, while kernel methods require the full computation of the kernel matrix. We introduce PersLay: a simple, highly modular layer of learning architecture for persistence diagrams that makes it possible to exploit the full capacities of neural networks on topological information from any dataset. This layer encompasses most of the vectorization methods of the literature. We illustrate its strengths on challenging classification problems on dynamical system orbits and real-life graph data, with results improving on or comparable to the state of the art. In order to exploit topological information from graph data, we show how graph structures can be encoded in so-called extended persistence diagrams computed with the heat kernel signatures of the graphs.

1 Introduction

Topological Data Analysis is a field of data science whose purpose is to capture and encode the topological features (such as connected components, loops, cavities, ...) that are present in datasets in order to improve inference and prediction. Its main descriptor is the so-called persistence diagram, which takes the form of a set of points in the Euclidean plane R^2, each point corresponding to a topological feature of the data. This descriptor has been successfully used in many different applications of data science, such as material science (Buchet et al., 2018), signal analysis (Perea & Harer, 2015), cellular data (Cámara, 2017), or shape recognition (Li et al., 2014), to name a few. This wide range of applications is mainly due to the fact that persistence diagrams encode information whose essence is topological, and as such this information tends to be complementary to that captured by more classical descriptors.

However, the space of persistence diagrams heavily lacks structure: different persistence diagrams may have different numbers of points, and several basic operations, such as addition and averaging, are not well-defined. This has so far dramatically impeded their use in machine learning applications. To handle this issue, a lot of attention has been devoted to the vectorization of persistence diagrams, either through finite-dimensional embeddings (Adams et al., 2017; Carrière et al., 2015; Chazal et al., 2015; Kališnik, 2018), i.e., embeddings turning diagrams into vectors of a Euclidean space, or through kernels for persistence diagrams (Bubenik, 2015; Carrière et al., 2017; Kusano et al., 2016; Le & Yamada, 2018; Reininghaus et al., 2015), i.e., generalized scalar products that implicitly turn diagrams into elements of infinite-dimensional Hilbert spaces. Unfortunately, both approaches suffer from major issues: it has been shown that finite-dimensional embeddings miss a lot of information about persistence diagrams (Carrière & Bauer, 2019), and kernel methods require computing and storing the kernel evaluations for each pair of persistence diagrams. Since all available kernels have a complexity that is at least linear, and often quadratic, in the number of persistence diagram points for a single matrix entry, kernel methods quickly become very expensive in terms of running time and memory usage on large sets or for large diagrams. An interesting preliminary approach for feeding persistence diagrams to neural networks was developed in Hofer et al. (2017); it consists in adapting and branching an existing vectorization method (the so-called persistence images, Adams et al. (2017)) onto a larger neural network. However, there are applications for which persistence images might be an inappropriate choice, and it is generally hard to tell ahead of time which kernel or vectorization method will be the most relevant one.

In this article, we present a framework to make full use of the modularity, learning capacities and computing power of neural networks for Topological Data Analysis. Building on the recently introduced DeepSet architecture (Zaheer et al., 2017), a neural network design targeted at processing sets of points, we apply and extend that framework to the context of persistence diagrams, thus defining PersLay: a simple, highly versatile, automatically differentiable layer for neural network architectures that can process topological information from all sorts of datasets. Our framework encompasses most of the popular vectorization and kernel methods that exist in the literature. We demonstrate its strengths on two challenging applications where we reach, match or significantly improve over state-of-the-art classification accuracies, with performances obtained by simple network architectures based on PersLay layers. The first is a large-scale classification application on a synthetic set of a hundred thousand dynamical system orbits. The second is a real graph classification application with benchmark datasets from biology, chemistry and social networks. Since graph data also suffer from a lack of general structure, we independently provide a robust method to efficiently encode graph information in a theoretically sound way, using an extension of ordinary persistence called extended persistence, computed from a family of functions defined on the graph vertices called heat kernel signatures, which come with stability properties. Lastly, we make our contribution available as a public tensorflow-based Python package.

To summarize our contributions:

  • We introduce a simple, versatile neural network layer for persistence diagrams, called PersLay, for handling topological information from all sorts of datasets; it encompasses most of the vectorization approaches in the literature.

  • We showcase the strength of our method by achieving state-of-the-art accuracy when classifying a collection of a hundred thousand synthetic orbits from dynamical systems, as well as on difficult graph classification problems from the literature.

  • We introduce a theoretically grounded approach for extracting topological information from graph data using a combination of extended persistence and graph signatures.

  • We provide a ready-to-use PersLay Python package based on tensorflow: https://github.com/MathieuCarriere/perslay.

The basics of (extended) persistence theory are first introduced in Section 2. Our general neural network layer for persistence diagrams, PersLay, is then defined in Section 3. Finally, the two showcased applications are presented in Section 4.1 for the dynamical system problem and in Section 4.2 for the graph application. Section 4.2 also describes the family of functions used to generate persistence diagrams, the heat kernel signatures, together with their stability properties.

2 Persistence diagrams as topological features

In this section, we recall the basics of persistence (and extended persistence) diagrams. We point to (Cohen-Steiner et al., 2009; Edelsbrunner & Harer, 2010; Oudot, 2015) for a thorough description of general (extended) persistence theory for metric spaces.

Ordinary persistence. Let us consider a pair (X, f), where X is a topological space and f : X → R is a real-valued function, and define the sublevel set F_α = f^{-1}((−∞, α]). Making α increase from −∞ to +∞ gives an increasing sequence of sets, called the filtration induced by f, which starts with the empty set and ends with the whole space X. Ordinary persistence records the times of appearance and disappearance of topological features (connected components, loops, cavities, etc.) in this sequence. For instance, one can record the value α_b for which a new connected component appears in F_{α_b}, called the birth time of the connected component. This connected component eventually gets merged with another one for some value α_d ≥ α_b, which is stored and called the death time of the component, and one says that the component persists on the interval [α_b, α_d]. Similarly, we save the birth and death values of each loop, cavity, etc. that appears in a specific sublevel set and disappears (gets "filled") in another. This family of intervals is called the barcode, or persistence diagram, of (X, f), and can be represented as a multiset of points (i.e., a point cloud where points are counted with multiplicity) supported on the half-plane {(α_b, α_d) : α_d ≥ α_b}, each point having the endpoints of its interval as coordinates.

In some contexts, ordinary persistence might not be sufficient to encode the topology of an object X. For instance, consider a graph G = (V, E) with vertices V and (non-oriented) edges E. Let f be a function defined on its vertices, and consider the sublevel graphs G_α = (V_α, E_α), where V_α = {v ∈ V : f(v) ≤ α} and E_α = {(v, v') ∈ E : v, v' ∈ V_α}, see Figure 1. In this sequence of sublevel graphs, loops persist forever since they never disappear (they never get "filled"), and the same applies to whole connected components of G. Moreover, branches pointing upwards (with respect to the orientation given by f, see Figure 2) are missed (while those pointing downwards are detected), since they do not create connected components when they appear in the sublevel graphs, making ordinary persistence unable to detect them.

Figure 1: Illustration of sublevel and superlevel graphs: the input graph along with the values of a function f (blue), sublevel graphs for increasing thresholds, and superlevel graphs for decreasing thresholds.

Extended persistence. To handle the issue stated above, extended persistence refines the analysis by also looking at the superlevel sets F^α = f^{-1}([α, +∞)). Making α decrease from +∞ to −∞ gives another increasing sequence of subsets, for which structural changes can be recorded as before.

Although extended persistence can be defined for general metric spaces (see the references given above), we restrict ourselves to the case where X is a graph. The sequence of increasing superlevel graphs is illustrated in Figure 1. In particular, death times can now be defined for loops and whole connected components by picking the superlevel graph in which the feature appears again, and using the corresponding value as the death time of that feature. In this case, branches pointing upwards can be detected in the sequence of superlevel graphs in the exact same way that downward branches were in the sublevel graphs. See Figure 2 for an illustration.

Figure 2: Extended persistence diagram computed on a graph: topological features of the graph are detected in the sequence of sublevel and superlevel graphs shown on the left of the figure. The corresponding intervals are displayed under the sequence: the black interval represents the connected component of the graph, the red one represents its downward branch, the blue one represents its upward branch, and the green one represents its loop. The extended persistence diagram given by the intervals is shown on the right.

Finally, the family of intervals of the form [α_b, α_d] is turned into a multiset of points in the Euclidean plane by using the interval endpoints as coordinates. This multiset is called the extended persistence diagram of f and is denoted by Dg(G, f).

Since graphs have four types of topological features (see Figure 2), namely upwards branches, downwards branches, loops and connected components, the corresponding points in extended persistence diagrams can be of four different types. These types are denoted Ord0, Rel1, Ext0 and Ext1 for downwards branches, upwards branches, connected components and loops respectively:

Dg(G, f) = Ord0(G, f) ⊔ Rel1(G, f) ⊔ Ext0(G, f) ⊔ Ext1(G, f).    (1)

Note that an advantage of using extended persistence is that all diagram types can be treated similarly, in the sense that points of each type all have finite coordinates. Moreover, as each point represents a topological feature that persists along a given interval, Dg(G, f) provides explainable information about the topological structure of the pair (G, f). In practice, extended persistence diagrams can be computed efficiently with the C++/Python Gudhi library (The GUDHI Project, 2015). Persistence diagrams are usually compared with the so-called bottleneck distance, whose proper definition is not required for this work (see e.g. (Edelsbrunner & Harer, 2010, VIII.2)). However, the resulting metric space is not a Hilbert space, and as such, incorporating diagrams into a learning pipeline requires designing specific tools.
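For illustration, here is a minimal sketch of this computation with Gudhi, assuming a recent release that exposes SimplexTree.extend_filtration() and SimplexTree.extended_persistence(); the toy graph and function values are arbitrary.

```python
# Minimal sketch (not the paper's code): extended persistence of a function on a
# small graph, using Gudhi's lower-star filtration on a simplex tree.
import gudhi

vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle
f = {0: 0.0, 1: 1.0, 2: 0.5, 3: 2.0}              # function value per vertex

st = gudhi.SimplexTree()
for v in vertices:
    st.insert([v], filtration=f[v])
for (u, v) in edges:
    # An edge appears when both of its endpoints are present (lower-star filtration).
    st.insert([u, v], filtration=max(f[u], f[v]))

st.extend_filtration()
# Four sub-diagrams, in the order documented by Gudhi: ordinary, relative,
# extended+ and extended-, matching Ord0, Rel1, Ext0 and Ext1 of Equation (1).
ord0, rel1, ext0, ext1 = st.extended_persistence()
print(ord0, rel1, ext0, ext1)
```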

3 PersLay: A Neural Network Layer for Persistence Diagrams

We have seen in Section 2 how to derive topological descriptors from data, namely (extended) persistence diagrams, which we aim to use in machine learning applications. In this section, we present PersLay, our general neural network layer for (extended) persistence diagrams. To define it, we leverage a recent neural network architecture called DeepSet (Zaheer et al., 2017), targeted at processing sets of points.

The main purpose of the DeepSet design is to be invariant to the ordering of points in the input sets. Any such neural network is called a permutation invariant network. In order to achieve this, Zaheer et al. (Zaheer et al., 2017) propose to process point sets with a layer implementing the general equation D(X) = Σ_{x ∈ X} φ(x), where D is the function implemented by the layer, X is an input point set, and φ is a point transformation. Their final neural network architecture is then obtained by composing D with a transformation ρ, which can be parametrized by any other neural network architecture. It is shown in (Zaheer et al., 2017, Theorem 2) that if the cardinality is the same for all sets, then for any permutation invariant function f, there exist ρ and φ such that f = ρ ∘ D. Moreover, this is still true for variable cardinalities if the sets belong to some countable space.
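As a toy illustration of this permutation invariance (a sketch, not the code of Zaheer et al.), note that the inner sum makes the output independent of the input ordering:

```python
# Tiny sketch of the DeepSet principle: rho(sum of phi(x)) is order-independent.
import numpy as np

def deep_set(points, phi, rho):
    return rho(np.sum([phi(x) for x in points], axis=0))

phi = lambda x: np.array([x, x ** 2])    # point transformation
rho = lambda v: np.tanh(v)               # post-processing network (stub)
assert np.allclose(deep_set([1., 2., 3.], phi, rho),
                   deep_set([3., 1., 2.], phi, rho))
```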

In this work, we transpose this architecture to the context of persistence diagrams by defining and implementing a series of new permutation invariant layers that generalize some standard tools used in Topological Data Analysis. To that end, we define our generic neural network layer for persistence diagrams, which we call PersLay, through the following equation:

PersLay(Dg) := op({w(p) φ(p)}_{p ∈ Dg}),    (2)

where op is any permutation invariant operation (such as minimum, maximum, sum, k-th largest value, ...), w : R^2 → R is a weight function for the persistence diagram points, and φ : R^2 → R^q is a function that we call the point transformation function. We emphasize that any neural network architecture can be composed with PersLay to generate a neural network architecture for persistence diagrams. Let us now introduce the three point transformation functions that we use and implement for the parameter φ in Equation (2).

  • The triangle point transformation φ_Λ : p ↦ [Λ_p(t_1), Λ_p(t_2), ..., Λ_p(t_q)], where the triangle function associated to a point p = (x, y) ∈ R^2 is Λ_p(t) = max{0, y − |t − x|}, with q ∈ N and samples t_1, ..., t_q ∈ R.

  • The Gaussian point transformation φ_Γ : p ↦ [Γ_p(t_1), Γ_p(t_2), ..., Γ_p(t_q)], where the Gaussian function associated to a point p ∈ R^2 is Γ_p(t) = exp(−‖t − p‖_2^2 / (2σ^2)) for a given σ > 0, with q ∈ N and samples t_1, ..., t_q ∈ R^2.

  • The line point transformation φ_L : p ↦ [L_{Δ_1}(p), L_{Δ_2}(p), ..., L_{Δ_q}(p)], where the line function associated to a line Δ with direction vector e_Δ ∈ R^2 and bias b_Δ ∈ R is L_Δ(p) = ⟨p, e_Δ⟩ + b_Δ, with q ∈ N and Δ_1, ..., Δ_q lines.

Note that line point transformations are examples of permutation equivariant functions, which are specific functions defined on sets of points that were used to build the DeepSet layer in (Zaheer et al., 2017).
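To make Equation (2) concrete, here is a minimal numpy sketch of the layer with a Gaussian point transformation; the sample grid, bandwidth and constant weight are illustrative choices, not the released package's defaults.

```python
# Minimal sketch of Equation (2): op({w(p) * phi(p)}_{p in Dg}) with a Gaussian phi.
import numpy as np

def perslay(diagram, samples, sigma=0.1, weight=lambda p: 1.0, op=np.sum):
    """diagram: (n, 2) array of diagram points; samples: (q, 2) array of locations."""
    diffs = diagram[:, None, :] - samples[None, :, :]                   # (n, q, 2)
    phi = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * sigma ** 2))       # (n, q)
    w = np.array([weight(p) for p in diagram])                          # (n,)
    return op(w[:, None] * phi, axis=0)                # permutation invariant output

dgm = np.array([[0.2, 0.8], [0.4, 0.5], [0.1, 0.9]])
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5),
                            np.linspace(0, 1, 5)), -1).reshape(-1, 2)
vec = perslay(dgm, grid)        # a 25-dimensional, ordering-independent vector
```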

Formulation (2) has high representative power: it remarkably allows most classical persistence diagram representations to be encoded with a very small set of point transformation functions φ, so that the choice of φ can be treated as a hyperparameter of sorts. Let us show how it connects to popular vectorizations and kernel methods for persistence diagrams in the literature.

  • Using φ_Λ with samples t_1, ..., t_q, op = k-th largest value, and w ≡ 1, amounts to evaluating the k-th persistence landscape (Bubenik, 2015) on t_1, ..., t_q.

  • Using φ_Λ with samples t_1, ..., t_q, op = sum, and an arbitrary weight w, amounts to evaluating the persistence silhouette weighted by w (Chazal et al., 2015) on t_1, ..., t_q.

  • Using φ_Γ with samples t_1, ..., t_q, op = sum, and an arbitrary weight w, amounts to evaluating the persistence surface weighted by w (Adams et al., 2017) on t_1, ..., t_q. Moreover, characterizing points of persistence diagrams with Gaussian functions is also the approach advocated in several kernel methods for persistence diagrams (Kusano et al., 2016; Le & Yamada, 2018; Reininghaus et al., 2015).

  • Using a suitable modification of the Gaussian point transformation together with op = sum is the approach presented in (Hofer et al., 2017).

  • Using φ_L with lines Δ_1, ..., Δ_q, op = k-th largest value, and w ≡ 1, is similar to the approach advocated in (Carrière et al., 2017), where the sorted projections of the points onto the lines are then compared with the l1 norm and exponentiated to build the so-called Sliced Wasserstein kernel for persistence diagrams.

In practice, the samples t_1, ..., t_q and lines Δ_1, ..., Δ_q are parameters that are optimized on the training set by the network. This means that our approach looks for the best locations at which to evaluate the landscapes, silhouettes and persistence surfaces, as well as the best lines to project the diagrams onto, with respect to the problem the network is trying to solve.

4 Applications and experimental results

We now introduce two different applications aimed at showcasing the scaling power, accuracy and versatility of PersLay. In order to truly showcase the contribution of this layer, we use a very simple network architecture, namely a two-layer network. The first layer is the PersLay layer that processes persistence diagrams. The output of this layer is then either used as such (as in the first application, §4.1) or concatenated with additional features (for the graph application, §4.2). The resulting vector is normalized and fed to the second and final layer, a fully connected layer whose output is used for predictions. An illustration in the context of graph classification is given in Fig. 3. We emphasize that this simplistic two-layer architecture is designed to produce knowledge and understanding, rather than the best possible performance.

In those applications, the weight function w from Equation (2) is treated as a trainable parameter whose values are optimized during training. More precisely, the input diagrams are scaled to the unit square, so that w becomes a function defined on that square, which we approximate by discretizing the square into a grid of pixels. The value of w on each pixel is then optimized individually during training. As both w and φ are functions defined on the plane, this suggests (we leave it as future work) that they can be interpreted and visualized in ways that help explain the network's learning behavior on persistence diagrams.
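To make this concrete, a minimal sketch of such a pixel-grid weight function is given below; TensorFlow 2.x is assumed, and the resolution, initialization and helper names are illustrative rather than the released package's API.

```python
# Sketch of a trainable pixel-grid weight function w on the unit square.
import tensorflow as tf

res = 10                                           # unit square split into res x res pixels
w_grid = tf.Variable(tf.ones((res, res)))          # one trainable weight per pixel

def weight(points):
    """points: (n, 2) tensor with coordinates already scaled to the unit square."""
    idx = tf.cast(tf.clip_by_value(points * res, 0, res - 1), tf.int32)  # pixel indices
    return tf.gather_nd(w_grid, idx)               # w(p) for every diagram point
```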

Our results can be reproduced with the help of Table 5 in the supplementary material, which lists the specific settings for each experiment. The implementation relies on the open source C++/Python library Gudhi (The GUDHI Project, 2015), the Python packages sklearn-tda (Carrière, 2018) and tensorflow (Abadi et al., 2015), and is available at https://github.com/MathieuCarriere/perslay.

Figure 3: Network architecture illustrated in the case of our graph classification experiments (§4.2). Each graph is encoded as a set of persistence diagrams, then processed by an independent instance of PersLay. Each instance embeds diagrams in some vector space using a weight function w and a point transformation φ that are optimized during training, together with a fixed permutation-invariant operation op.

4.1 A large-scale dynamical system orbits application

The first application is demonstrated on synthetic data used as a benchmark in Topological Data Analysis (Adams et al., 2017; Carrière et al., 2017; Le & Yamada, 2018). It consists of sequences of points generated by different dynamical systems, see Hertzsch et al. (2007). Given an initial position (x_0, y_0) ∈ [0, 1]^2 and a parameter r > 0, we generate a point cloud (x_n, y_n)_{n=1,...,N} following x_{n+1} = x_n + r y_n (1 − y_n) (mod 1) and y_{n+1} = y_n + r x_{n+1} (1 − x_{n+1}) (mod 1). The orbits of this dynamical system are highly dependent on the parameter r. More precisely, for some values of r, voids can form in these orbits (see Fig. 5 in the Supplementary Material), and as such, persistence diagrams are likely to perform well when attempting to classify orbits with respect to the value of r that generated them. As in previous works (Adams et al., 2017; Carrière et al., 2017; Le & Yamada, 2018), we use the five different parameters r = 2.5, 3.5, 4.0, 4.1 and 4.3 to simulate the different classes of orbits, with random initialization of (x_0, y_0) and 1,000 points in each simulated orbit. These point clouds are then turned into persistence diagrams using a standard geometric filtration (Chazal et al., 2014), namely the AlphaComplex filtration (http://gudhi.gforge.inria.fr/python/latest/alpha_complex_ref.html) in dimensions 0 and 1. Doing so, we generate two datasets. The first is ORBIT5K, where for each value of r we generate 1,000 orbits, ending up with a dataset of 5,000 point clouds. This dataset is the same as the one used in (Le & Yamada, 2018). The second, ORBIT100K, contains 20,000 orbits per class, resulting in a dataset of 100,000 point clouds, a scale that kernel methods cannot handle. This dataset aims to show the edge of our neural-network-based approach over kernel methods when dealing with very large datasets of large diagrams. The goal is now to perform classification with the aforementioned architecture, whereas all previous works dealing with this data (Adams et al., 2017; Carrière et al., 2017; Le & Yamada, 2018) use a kernel method to classify the persistence diagrams built on top of the point clouds.
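As an illustration, here is a sketch of the orbit generation and diagram computation, assuming the Gudhi Python package; the recurrence follows the benchmark described above, while the function names and seed handling are our own.

```python
# Sketch: generate one orbit of the dynamical system and compute its Alpha-complex
# persistence diagrams with Gudhi.
import numpy as np
import gudhi

def orbit(r, n_points=1000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = rng.random(), rng.random()            # random initial position in [0, 1]^2
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        x = (x + r * y * (1.0 - y)) % 1.0
        y = (y + r * x * (1.0 - x)) % 1.0
        pts[i] = (x, y)
    return pts

pts = orbit(r=4.3)
st = gudhi.AlphaComplex(points=pts).create_simplex_tree()
st.persistence()                                  # compute persistence in all dimensions
dgm1 = st.persistence_intervals_in_dimension(1)   # 1-dimensional features (loops)
```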

Results are displayed in Table 1. Not only do we improve on previous results for ORBIT5K, but the performance on ORBIT100K also shows that classification accuracy further increases as more observations become available. For consistency, we use the same accuracy metric as (Le & Yamada, 2018): we split observations into 70%-30% training-test sets and report the average test accuracy across runs. The parameters used are summarized in Section C of the Supplementary Material.

Dataset PSS-K PWG-K SW-K PF-K PersLay
ORBIT5K 72.38(2.4) 76.63(0.7) 83.6(0.9) 85.9(0.8) 87.7(1.0)
ORBIT100K - - - - 89.2(0.3)
Table 1: Performance table. PSS-K, PWG-K, SW-K, PF-K stand for Persistence Scale Space Kernel (Reininghaus et al., 2015), Persistence Weighted Gaussian Kernel (Kusano et al., 2016), Sliced Wasserstein Kernel (Carrière et al., 2017) and Persistence Fisher Kernel (Le & Yamada, 2018) respectively. We report the scores given in (Le & Yamada, 2018) for competitors on ORBIT5K, and the ones we obtained using PersLay for both the ORBIT5K and ORBIT100K datasets.

4.2 PersLay for graph classification

We now specialize our framework to the evaluation of graph datasets. We have seen in Section 2 how extended persistence diagrams can be computed from a graph for a given function f defined on its vertices. As the choice of this function is essential for capturing relevant topological information, we first provide more details about the family of such functions that we use in our experiments: the heat kernel signatures (HKS).

Heat kernel signatures on graphs. HKS is an example of a spectral family of signatures, i.e., functions derived from the spectral decomposition of graph Laplacians, which provide informative features for graph analysis. The adjacency matrix A of a graph G = (V, E) with vertex set V = {v_1, ..., v_n} is the n × n matrix with A_{ij} = 1 if (v_i, v_j) ∈ E and 0 otherwise. The degree matrix D is the diagonal matrix defined by D_{ii} = Σ_j A_{ij}. The normalized graph Laplacian L_w = I − D^{−1/2} A D^{−1/2} is the linear operator acting on the space of functions defined on the vertices of G. It admits an orthonormal basis of eigenfunctions Φ = {φ_1, ..., φ_n}, and its eigenvalues satisfy 0 ≤ λ_1 ≤ ... ≤ λ_n ≤ 2. As the orthonormal eigenbasis is not uniquely defined, the eigenfunctions cannot be used as such to compare graphs. Instead we consider the heat kernel signatures:

Definition 4.1 ((Hu et al., 2014; Sun et al., 2009))

Given a graph G = (V, E) and t ≥ 0, the heat kernel signature at time t is the function hks_{G,t} defined on the vertices of G by hks_{G,t}(v) = Σ_{k=1}^{n} exp(−t λ_k) φ_k(v)^2.
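For concreteness, Definition 4.1 can be sketched in a few lines of numpy; the toy adjacency matrix and time value are arbitrary choices.

```python
# Sketch of the heat kernel signature of Definition 4.1 (assumes no isolated vertices).
import numpy as np

def hks(A, t):
    """HKS at time t for a graph given by adjacency matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt       # normalized Laplacian
    lam, phi = np.linalg.eigh(L)                           # eigenvalues / eigenvectors
    # hks_{G,t}(v) = sum_k exp(-t * lam_k) * phi_k(v)^2, one value per vertex
    return (np.exp(-t * lam)[None, :] * phi ** 2).sum(axis=1)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(hks(A, t=0.1))   # per-vertex signature used as the filtration function f
```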

The HKS have already been used as signatures to address graph matching problems (Hu et al., 2014) and to define spectral descriptors for comparing graphs (Tsitsulin et al., 2018). These approaches rely on the distribution of values taken by the HKS, but not on their global topological structure, which is encoded in the extended persistence diagrams. Moreover, the following theorem shows these diagrams to be stable with respect to the bottleneck distance between persistence diagrams. The proof is given in the Supplementary Material, Section D.

Theorem 4.2

Let t > 0 and let L_w be the normalized Laplacian matrix of a graph G with n vertices. Let G' be another graph with n vertices and normalized Laplacian matrix L_{w'}. Then there exists a constant C(t, G) depending only on t and the spectrum of L_w such that, for ‖L_w − L_{w'}‖_F small enough:

d_B(Dg(G, hks_{G,t}), Dg(G', hks_{G',t})) ≤ C(t, G) ‖L_w − L_{w'}‖_F.    (3)

Graph classification experiments. We are now ready to evaluate our architecture on a series of graph datasets commonly used as baselines in graph classification problems. REDDIT5K, REDDIT12K, COLLAB (from (Yanardag & Vishwanathan, 2015)), IMDB-B and IMDB-M (from (Tran et al., 2018)) are composed of social graphs. BZR, COX2, DHFR, MUTAG, PROTEINS, NCI1, NCI109 and FRANKENSTEIN are graphs coming from medical or biological frameworks (also from (Tran et al., 2018)). A quantitative summary of these datasets is given in Table 4 of the Supplementary Material.

We compare performances with four other top graph classification methods. Scale-variant topo (Tran et al., 2018) uses a kernel for ordinary persistence diagrams computed on the graphs. RetGK (Zhang et al., 2018) is a kernel method for graphs that leverages attributes on the graph vertices and edges, and reaches state-of-the-art results on many datasets. Note that while the exact computation (denoted RetGK_I in Table 2) can be quite long, the method can be efficiently approximated (RetGK_II in Table 2) while preserving good accuracy scores. FGSD (Verma & Zhang, 2017) is a finite-dimensional graph embedding that does not leverage attributes, and reaches state-of-the-art results on several datasets. Finally, GCNN (Xinyi & Chen, 2019) is a neural network approach that also reaches top-tier results. One can also compare our results on the REDDIT datasets to those of Hofer et al. (2017), where the authors also feed persistence diagrams to a network (using as first channel a particular case of PersLay, see Section 3), achieving 54.5% and 44.5% accuracy on REDDIT5K and REDDIT12K respectively.

In order to extract topological features, we use the following general scheme: for each graph, we compute the HKS filtration at one or two time steps t. After generating the corresponding extended persistence diagrams induced by the HKS (see §2 and §4.2), we keep, for each diagram, the N points that are the farthest away from the diagonal (a specific N is fixed for each dataset, see Table 5 in the supplementary material). We combine these topological features with more traditional graph features formed by the eigenvalues of the normalized graph Laplacian along with the deciles of the computed HKS (right-side channel in Figure 3). So as to evaluate the impact of persistence diagrams, we also report classification performances for those features alone (thus unplugging the PersLay layers), denoted "Spectral + HKS" in Table 2.
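The per-diagram truncation and the "Spectral + HKS" baseline features can be sketched as follows; the helper names and exact feature composition are illustrative assumptions, not the precise experimental pipeline.

```python
# Sketch of the per-graph feature extraction described above.
import numpy as np

def prom(diagram, N):
    """Keep the N points farthest from the diagonal (most persistent)."""
    pers = np.abs(diagram[:, 1] - diagram[:, 0])
    return diagram[np.argsort(-pers)[:N]]

def spectral_hks_features(eigenvalues, hks_values):
    """Baseline 'Spectral + HKS' features: Laplacian eigenvalues + HKS deciles."""
    deciles = np.percentile(hks_values, np.arange(10, 100, 10))
    return np.concatenate([eigenvalues, deciles])
```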

We use the same accuracy metric as in (Zhang et al., 2018). For each dataset, we compute a final score by averaging ten 10-folds, where a single 10-fold is computed by randomly shuffling the dataset, splitting it into 10 different parts, classifying each part using the nine others for training, and averaging the classification accuracies obtained over the folds. We report in Table 2 the average and standard deviation of the scores we obtain. In most cases, our approach is comparable to, if not better than, state-of-the-art results, despite using a very simple neural network architecture. More importantly, it can be observed from the last two columns of Table 2 that, for all datasets, the use of extended persistence diagrams significantly improves over using the additional features alone.

Dataset ScaleVariant(1) RetGK_I*(2) RetGK_II*(2) FGSD(3) GCNN(4) Spectral+HKS(5) PersLay
REDDIT5K 56.1(0.5) 55.3(0.3) 47.8 52.9 49.7(0.3) 56.6(0.3)
REDDIT12K 48.7(0.2) 47.1(0.3) 46.6 39.7(0.1) 47.7(0.2)
COLLAB 81.0(0.3) 80.6(0.3) 80.0 79.6 67.8(0.2) 76.4(0.4)
IMDB-B 72.9 71.9(1.0) 72.3(0.6) 73.6 73.1 67.6(0.6) 70.9(0.7)
IMDB-M 50.3 47.7(0.3) 48.7(0.6) 52.4 50.3 44.5(0.4) 48.7(0.6)
BZR * 86.6 80.8(0.8) 87.2(0.7)
COX2 * 78.4 80.1(0.9) 81.4(0.6) 78.2(1.3) 81.6(1.0)
DHFR * 78.4 81.5(0.9) 82.5(0.8) 69.5(1.0) 81.8(0.8)
MUTAG * 88.3 90.3(1.1) 90.1(1.0) 92.1 86.7 85.8(1.3) 89.8(0.9)
PROTEINS * 72.6 75.8(0.6) 75.2(0.3) 73.4 76.3 73.5(0.3) 74.8(0.3)
NCI1 * 71.6 84.5(0.2) 83.5(0.2) 79.8 78.4 65.3() 72.8(0.3)
NCI109 * 70.5 78.8 64.9(0.2) 71.7(0.3)
FRANKENSTEIN 69.4 62.9(0.1) 70.7(0.4)
Table 2: Mean accuracies and standard deviations over ten 10-folds, for (2) (Zhang et al., 2018) and our own evaluations (right-hand side). In (1) (Tran et al., 2018), the average accuracy over 100 random splits at different proportions of training data is reported, while (3) (Verma & Zhang, 2017) and (4) (Xinyi & Chen, 2019) report the mean accuracy over a single 10-fold. The * indicates datasets that contain attributes (labels) on graph nodes and, symmetrically, the methods that leverage such attributes for classification. Blue color represents the best method that does not use node attributes (when there are any), and bold fonts mark the overall best score on a given problem.

5 Conclusion

In this article, we propose a versatile, powerful and simple neural network layer for processing persistence diagrams, called PersLay, which generalizes most of the techniques used to vectorize persistence diagrams found in the literature, while optimizing them task-wise. Our code is publicly available at https://github.com/MathieuCarriere/perslay. We showcase the efficiency of our approach by achieving state-of-the-art results on synthetic orbit classification coming from dynamical systems and on several graph classification problems from real-life data, while working at larger scales than kernel methods developed for persistence diagrams and remaining simpler than most of its neural network competitors. We believe that PersLay has the potential to become a central tool for incorporating topological descriptors in a wide variety of complex machine learning tasks.

References

  • Abadi et al. [2015] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
  • Adams et al. [2017] Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. Persistence images: a stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8), 2017.
  • Bubenik [2015] Bubenik, P. Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16(77):77–102, 2015.
  • Buchet et al. [2018] Buchet, M., Hiraoka, Y., and Obayashi, I. Persistent homology and materials informatics. In Nanoinformatics, pp. 75–95. 2018.
  • Cámara [2017] Cámara, P. Topological methods for genomics: present and future directions. Current Opinion in Systems Biology, 1:95–101, feb 2017.
  • Carrière [2018] Carrière, M. sklearn-tda: a scikit-learn compatible python package for Machine Learning and TDA. https://github.com/MathieuCarriere/sklearn_tda, 2018.
  • Carrière & Bauer [2019] Carrière, M. and Bauer, U. On the metric distortion of embedding persistence diagrams into separable Hilbert spaces. Accepted for publication in International Symposium on Computational Geometry, 2019.
  • Carrière et al. [2015] Carrière, M., Oudot, S., and Ovsjanikov, M. Stable topological signatures for points on 3d shapes. In Computer Graphics Forum, volume 34, pp. 1–12. Wiley Online Library, 2015.
  • Carrière et al. [2017] Carrière, M., Cuturi, M., and Oudot, S. Sliced Wasserstein kernel for persistence diagrams. In International Conference on Machine Learning, volume 70, pp. 664–673, jul 2017.
  • Chazal et al. [2014] Chazal, F., de Silva, V., and Oudot, S. Persistence stability for geometric complexes. Geometriae Dedicata, 173(1):193–214, 2014.
  • Chazal et al. [2015] Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., and Wasserman, L. Stochastic convergence of persistence landscapes and silhouettes. Journal of Computational Geometry, 6(2):140–161, 2015.
  • Chazal et al. [2016] Chazal, F., de Silva, V., Glisse, M., and Oudot, S. The structure and stability of persistence modules. Springer International Publishing, 2016.
  • Cohen-Steiner et al. [2009] Cohen-Steiner, D., Edelsbrunner, H., and Harer, J. Extending persistence using Poincaré and Lefschetz duality. Foundations of Computational Mathematics, 9(1):79–103, feb 2009.
  • Edelsbrunner & Harer [2010] Edelsbrunner, H. and Harer, J. Computational topology: an introduction. American Mathematical Society, 2010.
  • Hertzsch et al. [2007] Hertzsch, J.-M., Sturman, R., and Wiggins, S. Dna microarrays: design principles for maximizing ergodic, chaotic mixing. Small, 3(2):202–218, 2007.
  • Hofer et al. [2017] Hofer, C., Kwitt, R., Niethammer, M., and Uhl, A. Deep learning with topological signatures. In Advances in Neural Information Processing Systems, pp. 1634–1644, 2017.
  • Hu et al. [2014] Hu, N., Rustamov, R., and Guibas, L. Stable and informative spectral signatures for graph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2305–2312, 2014.
  • Kališnik [2018] Kališnik, S. Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics, pp. 1–29, jan 2018.
  • Kingma & Ba [2014] Kingma, D. and Ba, J. Adam: a method for stochastic optimization. arXiv, dec 2014.
  • Kusano et al. [2016] Kusano, G., Hiraoka, Y., and Fukumizu, K. Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning, volume 48, pp. 2004–2013, jun 2016.
  • Le & Yamada [2018] Le, T. and Yamada, M. Persistence Fisher kernel: a Riemannian manifold kernel for persistence diagrams. In Advances in Neural Information Processing Systems, pp. 10027–10038, 2018.
  • Li et al. [2014] Li, C., Ovsjanikov, M., and Chazal, F. Persistence-based structural recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2003–2010, jun 2014.
  • Oudot [2015] Oudot, S. Persistence theory: from quiver representations to data analysis. American Mathematical Society, 2015.
  • Perea & Harer [2015] Perea, J. and Harer, J. Sliding windows and persistence: an application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, jun 2015.
  • Reininghaus et al. [2015] Reininghaus, J., Huber, S., Bauer, U., and Kwitt, R. A stable multi-scale kernel for topological machine learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
  • Sun et al. [2009] Sun, J., Ovsjanikov, M., and Guibas, L. A concise and provably informative multi-scale signature based on heat diffusion. Computer graphics forum, 28:1383–1392, 2009.
  • The GUDHI Project [2015] The GUDHI Project. GUDHI User and Reference Manual. GUDHI Editorial Board, 2015. URL http://gudhi.gforge.inria.fr/doc/latest/.
  • Tran et al. [2018] Tran, Q. H., Vo, V. T., and Hasegawa, Y. Scale-variant topological information for characterizing complex networks. arXiv preprint arXiv:1811.03573, 2018.
  • Tsitsulin et al. [2018] Tsitsulin, A., Mottin, D., Karras, P., Bronstein, A., and Müller, E. Netlsd: hearing the shape of a graph. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2347–2356. ACM, 2018.
  • Verma & Zhang [2017] Verma, S. and Zhang, Z.-L. Hunt for the unique, stable, sparse and fast feature learning on graphs. In Advances in Neural Information Processing Systems, pp. 88–98, 2017.
  • Xinyi & Chen [2019] Xinyi, Z. and Chen, L. Capsule graph neural network. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Byl8BnRcYm.
  • Yanardag & Vishwanathan [2015] Yanardag, P. and Vishwanathan, S. Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pp. 1365–1374, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3664-2. doi: 10.1145/2783258.2783417. URL http://doi.acm.org/10.1145/2783258.2783417.
  • Zaheer et al. [2017] Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R., and Smola, A. Deep sets. In Advances in Neural Information Processing Systems, pp. 3391–3401, 2017.
  • Zhang et al. [2018] Zhang, Z., Wang, M., Xiang, Y., Huang, Y., and Nehorai, A. RetGK: Graph Kernels based on Return Probabilities of Random Walks. In Advances in Neural Information Processing Systems, pp. 3968–3978, 2018.

Appendix A Diagram vectorization techniques

(a) Example of triangle functions on a persistence diagram with four points; one of them is not displayed since its triangle function is null.
(b) Example of Gaussian functions on a persistence diagram with four points.
(c) Example of line functions for two lines on a persistence diagram with four points. We use ⟨·⟩_i to denote the i-th coordinate of a vector.
Figure 4: Illustration of commonly used diagram vectorizations that are particular cases of PersLay.

Appendix B Datasets description

Tables 3 and 4 summarize key information on the datasets used in each of our experiments. We also provide an illustration of the orbits we generated (§4.1).

Dataset Number of orbits Number of classes Number of points per orbit
ORBIT5K 5000 5 1000
ORBIT100K 100000 5 1000
Table 3: Description of the two orbit datasets we generated. The five classes correspond to the five parameter choices for r. In both ORBIT5K and ORBIT100K, classes are balanced.
Figure 5: Some examples of orbits generated by the different choices of r (three simulations are shown for each value of r).
Dataset Nb graphs Nb classes Av. nodes Av. edges Av. β0 Av. β1
REDDIT5K 5000 5 508.5 594.9 3.71 90.1
REDDIT12K 12000 11 391.4 456.9 2.8 68.29
COLLAB 5000 3 74.5 2457.5 1.0 2383.7
IMDB-B 1000 2 19.77 96.53 1.0 77.76
IMDB-M 1500 3 13.00 65.94 1.0 53.93
BZR 405 2 35.75 38.36 1.0 3.61
COX2 467 2 41.22 43.45 1.0 3.22
DHFR 756 2 42.43 44.54 1.0 3.12
MUTAG 188 2 17.93 19.79 1.0 2.86
PROTEINS 1113 2 39.06 72.82 1.08 34.84
NCI1 4110 2 29.87 32.30 1.19 3.62
NCI109 4127 2 29.68 32.13 1.20 3.64
FRANKENSTEIN 4337 2 16.90 17.88 1.09 2.07
Table 4: Datasets description. β0 (resp. β1) stands for the 0th Betti number (resp. 1st), that is, the number of connected components (resp. cycles) in a graph. In particular, an average β0 of 1.0 means that all graphs in the dataset are connected, and in this case β1 = |E| − |V| + 1.

Appendix C Parameters used in our experiments

Dataset Func. used PD preproc. DeepSet Channel Optim.
ORBIT5K , prom(400) Pm(25,25,10,top-5) adam(0.01, 0., 300)
ORBIT100K , prom(400) Pm(25,25,10,top-5) adam(0.01, 0., 300)
REDDIT5K prom(300) Pm(25,25,10,sum) adam(0.01, 0.99, 500)
REDDIT12K prom(400) Pm(5,5,10,sum) adam(0.01, 0.99, 1000)
COLLAB , prom(200) Pm(5,5,10,sum) adam(0.01, 0.9, 1000)
IMDB-B , prom(200) Im(20,(10,2),20,sum) adam(0.01, 0.9, 500)
IMDB-M , prom(500) Im(10,(10,2),10,sum) adam(0.01, 0.9, 500)
BZR , Im(15,(10,2),10,sum) adam(0.01, 0.9, 100)
COX2 , Im(20,(10,2),20,sum) adam(0.01, 0.9, 500)
DHFR , Im(20,(10,2),20,sum) adam(0.01, 0.9, 500)
MUTAG Im(20,(10,2),20,sum) adam(0.01, 0.9, 300)
PROTEINS prom(200) Im(15,(10,2),10,sum) adam(0.01, 0.9, 70)
NCI1 , Pm(25,25,10,sum) adam(0.01, 0.9, 300)
NCI109 , Pm(25,25,10,sum) adam(0.01, 0.9, 300)
FRANKENSTEIN Im(20,(10,2),20,sum) adam(0.01, 0.9, 300)

Table 5: Settings used to generate our experimental results.

Input data was fed to the network in mini-batches of size 128. For each dataset, we give the various parameters (extended persistence diagrams, neural network architecture, optimizers, etc.) used to obtain the scores of Table 2. In Table 5, we use the following shortcuts:

  • : persistence diagrams obtained with Gudhi’s -dimensional AlphaComplex filtration.

  • : extended persistence diagram obtained with HKS on the graph with parameter .

  • prom(N): preprocessing step selecting the N points that are the farthest away from the diagonal.

  • PersLay channel Im(p, (n, s), q, op) stands for a function φ obtained by using a Gaussian point transformation sampled on a p × p grid on the unit square, followed by a convolution with n filters of size s, for a weight function w optimized on a q × q grid and for an operation op.

  • PersLay channel Pm(p, d, q, op) stands for a function φ obtained by using a line point transformation with p lines followed by a permutation equivariant function [33] in dimension d, for a weight function w optimized on a q × q grid and for an operation op.

  • adam(lr, decay, epochs) stands for the ADAM optimizer [19] with learning rate lr, using an Exponential Moving Average (https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage) with decay rate decay, and run during the given number of epochs.

Appendix D Proof of Theorem 4.2

The proof directly follows from the following two theorems. The first one, proved in [17], is a consequence of classical arguments from matrix perturbation theory.

Theorem D.1 ([17], Theorem 1)

Let and let be the Laplacian matrix of a graph with vertices. Let , be the distinct eigenvalues of and denote by the smallest distance between two distinct eigenvalues: . Let be another graph with vertices and Laplacian matrix with , where denotes the Frobenius norm of . Then, if , there exists a constant such that for any vertex ,

if , there exist two constants such that for any vertex ,

In particular, if , there exists a constant (which also depends on the parameters above) such that, in the two above cases,

Theorem 4.2 then immediately follows from the second theorem below, which is a special case of general stability results for persistence diagrams.

Theorem D.2 ([12, 13])

Let G be a graph and let f, g be two functions defined on its vertices. Then:

d_B(Dg(G, f), Dg(G, g)) ≤ ‖f − g‖_∞,    (4)

where d_B stands for the so-called bottleneck distance between the persistence diagrams Dg(G, f) and Dg(G, g). Moreover, this inequality is also satisfied for each of the subtypes Ord0, Rel1, Ext0 and Ext1 individually.