The field of unsupervised learning has evolved significantly over the past few years thanks to publications on adversarial networks. In [1], Goodfellow et al. introduced the Generative Adversarial Network (GAN), a class of generative models that plays a competitive game between two networks, in which a generator network must compete against an adversary according to a game-theoretic scenario. The generator network produces samples from a noise distribution and its adversary, the discriminator network, tries to distinguish real samples from generated samples, that is, samples inherited from the training data and samples produced by the generator, respectively. Meanwhile, Variational Auto-Encoders (VAE), presented by Kingma et al. in [3], have emerged as a well-established approach for synthetic data generation. Nevertheless, they may approximate the target distribution poorly because of the KL divergence. We recall that an auto-encoder (AE) is a neural network trained to copy its input manifold to its output manifold through a hidden layer: the encoder function maps the input space to the hidden space and the decoder function maps the hidden space back to the input space. By applying some of the Optimal Transport (OT) concepts gathered in [4], notably the Wasserstein distance, Arjovsky et al. introduced the Wasserstein GAN (WGAN) in [5]. It tries to avoid mode collapse, a typical training convergence issue occurring between the generator and the discriminator. Gulrajani et al. further optimized the concept in [6] by adding a Gradient Penalty to the Wasserstein GAN (GP-WGAN), capable of generating adversarial samples of higher quality. Similarly, Tolstikhin et al. in [7] applied the same OT concepts to AEs and introduced the Wasserstein AE (WAE), a new type of generative AE model that avoids the use of the KL divergence.
Nonetheless, describing the distribution of a generative model, which involves describing the generated scattered data points drawn from $P_g$ based on the distribution $P_x$ of the original manifold $\mathcal{X}$, is very difficult using traditional distance measures such as the Fréchet Inception Distance. We highlight the distribution and manifold notations in figure 1 for GAN and in figure 2 for AE. Indeed, traditional distance measures are not able to acknowledge the shapes of the data manifolds or the scale at which the manifold should be analyzed. However, persistent homology [9, 10] is specifically designed to highlight the topological features of the data [11]. Therefore, building upon persistent homology, the Wasserstein distance [4] and generative models, our main contribution is to propose qualitative and quantitative ways to evaluate the scattered generated distributions and the performance of generative models.
In this work, we describe the persistent homology features of the generative model while minimizing the OT function $W(P_x, P_g)$ for a squared cost, where $P_x$ is the distribution of the data contained in the manifold $\mathcal{X}$ and $P_g$ the distribution of the generative model capable of generating adversarial samples. Our contributions are summarized below:
A persistent homology procedure for generative models, including GP-WGAN, WGAN, WAE and VAE, which we call PHom-GeM, to highlight the topological properties of the generated distributions of the data at different spatial resolutions. The objective is a persistent homology description of the generated data distribution $P_g$ of the generative model.
A distance measure for persistence diagrams, the bottleneck distance, applied to generative models to compare quantitatively the true and target distributions on any data set. We measure the shortest distance for which there exists a perfect matching between the points of the two persistence diagrams. A persistence diagram is a stable summary representation of the topological features of a simplicial complex, a collection of simplices associated to the data set.
Finally, we propose the first application of algebraic topology and generative models to a public data set containing credit card transactions, which is particularly challenging for this type of model and for traditional distance measures.
The paper is structured as follows. In section II, we review the optimized GP-WGAN and WAE formulations using OT derived by Gulrajani et al. in [6] and Tolstikhin et al. in [7], respectively. Using persistent homology, we are able to compare the topological properties of the original distribution $P_x$ and the generated distribution $P_g$. We highlight experimental results in section III and we conclude in section IV by addressing promising directions for future work.
II Proposed Method
Our method computes the persistent homology of both the true manifold $\mathcal{X}$ and the generated manifold $\mathcal{G}$ of the generative model, based on the minimization of the optimal transport cost $W(P_x, P_g)$. In the resulting topological problem, the points of the manifolds are transformed into a metric space set to which a Vietoris-Rips simplicial complex filtration is applied (see definition 2). PHom-GeM achieves two main goals simultaneously: it computes the birth-death pairing generators of the iterated inclusions while measuring the bottleneck distance between the persistence diagrams of the manifolds of the generative models.
II-A Optimal Transport and Dual Formulation
Following the description of the optimal transport problem [4] and relying on the Kantorovich-Rubinstein duality, the Wasserstein-1 distance is computed as

$$W_1(P_x, P_g) = \inf_{\gamma \in \Pi(P_x, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\, d(x, y) \,\big] = \sup_{f \in \mathcal{F}_L} \mathbb{E}_{x \sim P_x}[f(x)] - \mathbb{E}_{y \sim P_g}[f(y)] \qquad (1)$$

where $(\mathcal{X}, d)$ is a metric space, $\Pi(P_x, P_g)$ is the set of all joint distributions $\gamma(x, y)$ with marginals $P_x$ and $P_g$, respectively, and $\mathcal{F}_L$ is the class of all bounded 1-Lipschitz functions on $(\mathcal{X}, d)$.
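The primal (joint-distribution) form of equation (1) admits a simple closed form in one dimension, which can serve as a sanity check. The sketch below (an illustrative helper of ours, not the paper's code) computes the Wasserstein-1 distance between two equal-size empirical 1D distributions by pairing sorted samples, which is the optimal coupling on the real line.

```python
def wasserstein_1d(xs, ys):
    # For 1D empirical distributions of equal size, the optimal transport
    # plan matches the i-th smallest x with the i-th smallest y, so W1 is
    # the mean absolute difference of the sorted samples.
    xs, ys = sorted(xs), sorted(ys)
    assert len(xs) == len(ys), "equal sample sizes assumed for simplicity"
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Shifting every sample by a constant shifts the distance by exactly that constant, as the example shows.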
II-B Gradient Penalty Wasserstein GAN (GP-WGAN)
As described in [6], the GP-WGAN objective loss function with gradient penalty is expressed such that

$$L = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_x}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\big[ (\| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1)^2 \big] \qquad (2)$$

where $D$ belongs to the set of 1-Lipschitz functions on $(\mathcal{X}, d)$, $P_x$ is the original data distribution and $P_g$ the generative model distribution implicitly defined by $\tilde{x} = G(z)$. The input $z$ to the generator is sampled from a noise distribution $p(z)$, such as a uniform distribution. $P_{\hat{x}}$ defines the uniform sampling along straight lines between pairs of points sampled from the data distribution $P_x$ and the generative distribution $P_g$. A penalty on the gradient norm is enforced for random samples $\hat{x} \sim P_{\hat{x}}$. For further details, we refer to [5] and [6].
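To make the penalty term concrete, the toy sketch below (an assumed setup of ours, not the paper's network) uses a linear critic $D(x) = \langle w, x \rangle$, whose gradient is $w$ everywhere, so the penalty can be computed in closed form; real models need automatic differentiation.

```python
import random

def gradient_penalty(w, real_batch, fake_batch, lam=10.0):
    # Toy gradient penalty for a linear critic D(x) = <w, x>:
    # sample x_hat uniformly on segments between real and fake points,
    # then penalize deviation of the gradient norm from 1.
    penalties = []
    for x, x_tilde in zip(real_batch, fake_batch):
        eps = random.random()
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x, x_tilde)]
        # gradient of a linear critic is w, independent of x_hat
        grad_norm = sum(wi * wi for wi in w) ** 0.5
        penalties.append((grad_norm - 1.0) ** 2)
    return lam * sum(penalties) / len(penalties)

# a critic with ||w||_2 = 1 satisfies the soft Lipschitz constraint exactly
print(gradient_penalty([1.0, 0.0], [[0.0, 0.0]], [[1.0, 1.0]]))  # 0.0
```

A critic with gradient norm 2 instead incurs the full penalty $\lambda (2 - 1)^2 = 10$, which is why training pushes the critic toward the 1-Lipschitz set.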
II-C Wasserstein Auto-Encoders
As described in [7], the WAE objective function is expressed such that

$$D_{\mathrm{WAE}}(P_x, P_g) = \inf_{Q(Z|X) \in \mathcal{Q}} \mathbb{E}_{P_x} \, \mathbb{E}_{Q(Z|X)} \big[ c\big(X, G(Z)\big) \big] + \lambda \, \mathcal{D}_Z(Q_Z, P_Z) \qquad (3)$$

where $c : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ is any measurable cost function. In our experiments, we use a square cost function $c(x, y) = \|x - y\|_2^2$ for data points $x, y \in \mathcal{X}$. $G(Z)$ denotes the sending of $Z$ to $G(Z)$ for a given map $G : \mathcal{Z} \to \mathcal{X}$. $\mathcal{Q}$ and $\mathcal{G}$ are any nonparametric sets of probabilistic encoders and decoders, respectively.
We use the Maximum Mean Discrepancy (MMD) for the penalty $\mathcal{D}_Z(Q_Z, P_Z)$ for a positive-definite reproducing kernel $k : \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$:

$$\mathrm{MMD}_k(P_Z, Q_Z) = \Big\| \int_{\mathcal{Z}} k(z, \cdot) \, dP_Z(z) - \int_{\mathcal{Z}} k(z, \cdot) \, dQ_Z(z) \Big\|_{\mathcal{H}_k} \qquad (4)$$

where $\mathcal{H}_k$ is the reproducing kernel Hilbert space of real-valued functions mapping $\mathcal{Z}$ on $\mathbb{R}$. For details on the MMD implementation, we refer to [7].
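On finite samples, the squared MMD reduces to kernel averages. The sketch below (our illustration, assuming an RBF kernel and the biased V-statistic estimator, not the paper's implementation) makes this concrete.

```python
import math

def rbf(z1, z2, sigma=1.0):
    # Gaussian (RBF) kernel k(z1, z2) = exp(-||z1 - z2||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd_squared(zs_p, zs_q, sigma=1.0):
    # Biased estimator: MMD^2 = E[k(p,p')] - 2 E[k(p,q)] + E[k(q,q')]
    def mean_kernel(xs, ys):
        return sum(rbf(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_kernel(zs_p, zs_p) - 2.0 * mean_kernel(zs_p, zs_q) + mean_kernel(zs_q, zs_q)

# identical samples have zero discrepancy
print(mmd_squared([[0.0], [1.0]], [[0.0], [1.0]]))  # 0.0
```

The estimator is non-negative on identical samples and grows as the two latent samples separate, which is exactly what the WAE penalty exploits.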
II-D Persistence Diagram and Vietoris-Rips Complex
Definition 1. Let $V$ be a set of vertices. A simplex $\sigma$ is a subset of vertices $\sigma \subseteq V$. A simplicial complex $K$ on $V$ is a collection of simplices $\sigma \in K$ such that every subset of a simplex in $K$ is also in $K$: $\tau \subseteq \sigma \in K \Rightarrow \tau \in K$. The dimension of $\sigma$ is its number of elements minus 1. Examples of simplicial complexes are represented in figure 3.
Definition 2. Let $(X, d)$ be a metric space. The Vietoris-Rips complex $\mathrm{VR}_s(X)$ at scale $s$ associated to $X$ is the abstract simplicial complex whose vertex set is $X$, and where $\{x_0, x_1, \ldots, x_k\}$ is a $k$-simplex if and only if $d(x_i, x_j) \leq s$ for all $i, j$.
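Definition 2 can be implemented directly for small point clouds. The sketch below (a hypothetical helper of ours, not the paper's code) builds the Vietoris-Rips complex at scale $s$ up to 2-simplices by checking all pairwise distances.

```python
from itertools import combinations
import math

def vietoris_rips(points, s, max_dim=2):
    # A subset of points is a simplex iff every pairwise distance is <= s.
    complex_ = [(i,) for i in range(len(points))]  # 0-simplices: the vertices
    for k in range(2, max_dim + 2):  # subsets of size k give (k-1)-simplices
        for idx in combinations(range(len(points)), k):
            if all(math.dist(points[i], points[j]) <= s
                   for i, j in combinations(idx, 2)):
                complex_.append(idx)
    return complex_

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# at s = 1 only the two unit-length edges appear: the hypotenuse has length sqrt(2) > 1
print(vietoris_rips(pts, 1.0))  # [(0,), (1,), (2,), (0, 1), (0, 2)]
```

Raising the scale to $s = 1.5$ adds the missing edge and the filled triangle, illustrating how the complex grows with the scale parameter.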
We obtain an increasing sequence of Vietoris-Rips complexes $\mathrm{VR}_{s_1}(X) \subseteq \mathrm{VR}_{s_2}(X) \subseteq \cdots \subseteq \mathrm{VR}_{s_n}(X)$ by considering $\mathrm{VR}_{s_i}(X)$ for an increasing sequence $s_1 < s_2 < \cdots < s_n$ of values of the scale parameter. Applying the $k$-th homology functor with coefficients in a field $\mathbb{F}$ to this sequence yields a sequence of $\mathbb{F}$-vector spaces connected by linear maps, called the $k$-th persistence module of $X$.
Definition 3. For $i \leq j$, the $(i,j)$-persistent $k$-homology group with coefficients in $\mathbb{F}$ of $X$, denoted $H_k^{i \to j}(X)$, is defined to be the image of the homomorphism $H_k(\mathrm{VR}_{s_i}(X); \mathbb{F}) \to H_k(\mathrm{VR}_{s_j}(X); \mathbb{F})$ induced by the inclusion $\mathrm{VR}_{s_i}(X) \subseteq \mathrm{VR}_{s_j}(X)$.
Using the interval decomposition theorem [14], we extract a finite family of intervals of $\mathbb{R}$ called the persistence diagram. Each interval $[b, d)$ can be considered as a point $(b, d)$ in the set $\{(b, d) \in \mathbb{R}^2 : b \leq d\}$. Hence, we obtain a finite subset of this set. The space of such finite subsets is endowed with a matching distance called the bottleneck distance, defined as follows:

$$d_B(D_1, D_2) = \inf_{\gamma} \sup_{p \in D_1} \| p - \gamma(p) \|_\infty \qquad (5)$$

where $D_1$ and $D_2$ are two persistence diagrams, each augmented with the points of the diagonal $\{(x, x) : x \in \mathbb{R}\}$, and the $\inf$ is over all the bijections $\gamma$ from $D_1$ to $D_2$.
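For very small diagrams, this infimum over bijections can be evaluated by brute force. The sketch below (our illustrative helper, exponential in the number of points and therefore not how production TDA libraries compute it) augments each diagram with the diagonal projections of the other's points, so a bijection always exists.

```python
from itertools import permutations

def bottleneck(d1, d2):
    def diag(p):  # closest diagonal point in the sup norm
        m = (p[0] + p[1]) / 2.0
        return (m, m)
    # augment each diagram with diagonal projections of the other's points,
    # standing in for the diagonal with infinite multiplicity
    a = list(d1) + [diag(p) for p in d2]
    b = list(d2) + [diag(p) for p in d1]
    def linf(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    # min over all bijections of the worst matched pair
    return min(max(linf(p, q) for p, q in zip(a, perm))
               for perm in permutations(b))

print(bottleneck([(0.0, 2.0)], [(0.0, 2.0)]))  # identical diagrams: 0.0
```

A feature with no counterpart is matched to the diagonal: a single bar $(0, 2)$ against the empty diagram yields distance 1, half its persistence.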
II-E Application: PHom-GeM, Persistent Homology for Generative Models
Bridging the gap between persistent homology and generative models, PHom-GeM uses a two-step procedure. First, the minimization problem is solved for the generator and the discriminator when considering GP-WGAN and WGAN; the gradient penalty coefficient $\lambda$ in equation (2) is fixed to 10 for GP-WGAN and to 0 for WGAN. For auto-encoders, the minimization problem is solved for the encoder and the decoder. We use the RMSProp optimizer for the optimization procedure. Then, the samples of the original and generated distributions, $P_x$ and $P_g$, are mapped to persistent homology for the description of their respective manifolds. The points contained in the manifold $\mathcal{X}$ inherited from $P_x$ and the points contained in the manifold $\mathcal{G}$ generated with $P_g$ are randomly selected into respective batches. Two samples, one from $\mathcal{X}$ following $P_x$ and one from $\mathcal{G}$ following $P_g$, are selected to differentiate the topological features of the original manifold $\mathcal{X}$ and the generated manifold $\mathcal{G}$. The two samples are then transformed into metric space sets for computational purposes, and we filter these metric space sets using the Vietoris-Rips simplicial complex filtration: given a scale parameter $s$, edges between data points are created for data points separated by a distance smaller than $s$. This leads to the construction of a collection of simplices resulting in the Vietoris-Rips simplicial complex filtration. We decided to use the Vietoris-Rips simplicial complex as it offers the best compromise between filtration accuracy and memory requirements. Subsequently, the persistence diagrams of the two samples are constructed. We recall that a persistence diagram is a stable summary representation of the topological features of a simplicial complex. The persistence diagrams allow the computation of the bottleneck distance. Finally, the barcodes represent in a simple way the birth-death of the pairing generators of the iterated inclusions detected by the persistence diagrams.
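The barcode computation above can be made concrete for the 0-dimensional case. The sketch below (our illustrative helper, not the paper's implementation) computes the $H_0$ barcode of a Vietoris-Rips filtration: every connected component is born at scale 0 and dies when an edge of the filtration merges it into another component, which is exactly Kruskal's algorithm with a union-find structure.

```python
from itertools import combinations
import math

def h0_barcode(points):
    # union-find with path halving
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    # process edges of the filtration in order of increasing length
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge merges two components: one bar dies at d
            parent[ri] = rj
            bars.append((0.0, d))
    bars.append((0.0, math.inf))  # one component lives forever
    return bars

print(h0_barcode([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))  # [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```

The long bar ending at 4.0 reflects the isolated point at distance 4 from its nearest neighbor, illustrating how barcodes expose the scales at which the point cloud is connected.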
III Experimental Results

We empirically evaluate the proposed methodology PHom-GeM. We assess, on a data set highly challenging for generative models, whether PHom-GeM can simultaneously achieve (i) a precise persistent homology mapping of the generated data points and (ii) an accurate persistent homology distance measurement with the bottleneck distance.
Data Availability and Data Description
We train PHom-GeM on one real-world open data set: the credit card transactions data set from the Kaggle database (available at https://www.kaggle.com/mlg-ulb/creditcardfraud), containing 284,807 transactions including 492 frauds. This data set is particularly interesting because it reflects the scattered point distribution of the reconstructed manifold observed during generative models' training, which afterward impacts the generated adversarial samples. Furthermore, this data set is challenging because of the strong imbalance between normal and fraudulent transactions, while being of high interest for the banking industry. To preserve transaction confidentiality, each transaction is composed of 28 components obtained with PCA, without any description, and two additional features, Time and Amount, that remain unchanged. Each transaction is labeled as fraudulent or normal in a feature called Class, which takes a value of 1 or 0, respectively.
Experimental Setup and Code Availability
In our experiments, we use the Euclidean latent space and the square cost function previously defined, $c(x, y) = \|x - y\|_2^2$, for data points $x, y \in \mathcal{X}$. The dimension of the true data set is 29: we kept the 28 components obtained with PCA and the amount. For the error minimization process, we used RMSProp gradient descent [15] with a batch size of 64. Different values of the gradient penalty coefficient $\lambda$ have been tested; we empirically obtained the lowest reconstruction error with $\lambda = 10$ for both GP-WGAN and WAE. The coefficients of persistent homology are evaluated within a finite field. We only consider the homology groups $H_0$ and $H_1$, which represent the connected components and the loops, respectively; higher-dimensional homology groups did not noticeably improve the quality of the results while leading to longer computation times. The simulations were performed on a computer with 16GB of RAM, an Intel i7 CPU and a Tesla K80 GPU accelerator. To ensure the reproducibility of the experiments, the code is available at https://github.com/dagrate/phomgem.
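For completeness, a single RMSProp update can be sketched as follows (parameter names and default values are our assumptions; the paper does not report its hyperparameters): the gradient is divided element-wise by the square root of a running average of its recent squared magnitude.

```python
def rmsprop_step(theta, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # cache accumulates an exponential moving average of squared gradients
    new_cache = [decay * c + (1 - decay) * g * g for c, g in zip(cache, grad)]
    # each coordinate is rescaled by the root of its own average squared gradient
    new_theta = [t - lr * g / (c ** 0.5 + eps)
                 for t, g, c in zip(theta, grad, new_cache)]
    return new_theta, new_cache

theta, cache = [1.0, -2.0], [0.0, 0.0]
theta, cache = rmsprop_step(theta, [0.5, -0.5], cache)
print(theta)  # each coordinate moves opposite to its gradient's sign
```

The per-coordinate rescaling is what makes RMSProp robust to the very different gradient magnitudes produced by the generator and the critic.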
Results and Discussions about PHom-GeM
We test PHom-GeM, Persistent Homology for Generative Models, on four different generative models: GP-WGAN, WGAN, WAE and VAE. We compare the performance of PHom-GeM on two specificities: first, a qualitative visualization of the persistence diagrams and barcodes and, secondly, a quantitative estimation of the persistent homology closeness, using the bottleneck distance between the generated manifolds $\mathcal{G}$ of the generative models and the original manifold $\mathcal{X}$.
At the top of figure 4, the rotated persistence diagram and the barcode diagram of the original sample are highlighted. In the persistence diagram, black points represent the 0-dimensional homology group $H_0$, the connected components of the complex. The red triangles represent the 1-dimensional homology group $H_1$, the 1-dimensional features known as cycles or loops. The barcode diagram is a simple way of representing the information contained in the persistence diagram. For the sake of simplicity, we represent only the barcode diagrams of the generative models to compare qualitatively the generated distribution of each model with the distribution of the original sample. The generated distribution of GP-WGAN is the closest to the original distribution, followed by WGAN, WAE and VAE. Indeed, the spectrum of the barcodes of GP-WGAN is very similar to the original sample's spectrum, as well as denser on the right. On the opposite, the WAE and VAE distributions are not able to reproduce all of the features contained in the original distribution, which explains their narrower barcode spectra.
In order to quantitatively assess the quality of the generated distributions, we use the bottleneck distance between the persistence diagram of the original data points and the persistence diagram of the generated data points. In table I, we highlight the mean value of the bottleneck distance with a 95% confidence interval; we also report the lower and upper bounds of the 95% confidence interval for each generative model. Confirming the visual observations, we notice that the smallest bottleneck distance, and therefore the best result, is obtained with GP-WGAN, followed by WGAN, WAE and VAE. This means GP-WGAN is capable of generating the data distribution sharing the most topological features with the original data distribution, including the nearness measurements and the overall shape. It confirms topologically, on a real-world data set, the claims addressed in [6] of the superior performance of GP-WGAN over WGAN. Furthermore, the performance of the AEs cannot match the generative performance achieved by the GANs. However, the WAE, which relies on optimal transport theory, achieves a better generative distribution in comparison to the popular VAE.
Table I: bottleneck distance per generative model (columns: Gen. Model, Mean Value, Lower Bound, Upper Bound).
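One plausible way to obtain such bounds is sketched below (a hypothetical percentile-bootstrap helper of ours; the paper does not specify how its confidence intervals were computed): the bottleneck distances of repeated batches are resampled with replacement to form a 95% interval around the mean.

```python
import random

def bootstrap_ci(distances, n_boot=2000, alpha=0.05, seed=0):
    # percentile bootstrap: resample the per-batch distances with replacement,
    # record each resample's mean, and take the empirical 2.5%/97.5% quantiles
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(distances) for _ in distances) / len(distances)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# hypothetical per-batch bottleneck distances, for illustration only
lo, hi = bootstrap_ci([0.31, 0.28, 0.35, 0.30, 0.33, 0.29])
print(lo, hi)  # lower and upper bounds bracketing the sample mean
```

The interval narrows as more batches are used, which is why reporting both bounds alongside the mean conveys the stability of the comparison between models.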
Building upon optimal transport and unsupervised learning, we introduced PHom-GeM, Persistent Homology for Generative Models, a new characterization of generative manifolds that uses topology and persistent homology to highlight manifold features and scattered generated distributions. We discussed the relations of GP-WGAN, WGAN, WAE and VAE in the context of unsupervised learning. Furthermore, relying on persistent homology, the bottleneck distance was introduced to estimate quantitatively the topological feature similarities between the original distribution and the generated distributions of the generative models, a specificity that current traditional distance measures fail to acknowledge. We conducted experiments showing the performance of PHom-GeM on the four generative models GP-WGAN, WGAN, WAE and VAE. We used a challenging, imbalanced, real-world open data set containing credit card transactions, capable of illustrating the scattered generated data distributions of the generative models and particularly relevant for the banking industry. We showed the superior topological performance of GP-WGAN in comparison to the other generative models, as well as the superior performance of WAE over VAE. Future work will include further exploration of the topological features, such as the influence of the simplicial complex, and the possibility of integrating a topological optimization function as a regularization term.
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning, volume 1. MIT Press, Cambridge, 2016.
[3] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[4] Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Society, 2003.
[5] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
[6] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
[7] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.
[8] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. In Advances in Neural Information Processing Systems, pages 899–907, 2013.
[9] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28(4):511–533, 2002.
[10] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005.
[11] Frédéric Chazal and Bertrand Michel. An introduction to topological data analysis: fundamental and practical aspects for data scientists. arXiv preprint arXiv:1710.04019, 2017.
[12] Olivier Bousquet, Sylvain Gelly, Ilya Tolstikhin, Carl-Johann Simon-Gabriel, and Bernhard Schoelkopf. From optimal transport to generative modeling: the vegan cookbook. arXiv preprint arXiv:1705.07642, 2017.
[13] A. Hatcher. Algebraic topology. Cambridge University Press, 2002.
[14] Steve Y. Oudot. Persistence theory: from quiver representations to data analysis. Number 209 in Mathematical Surveys and Monographs. American Mathematical Society, 2015.
[15] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Rmsprop: divide the gradient by a running average of its recent magnitude. Neural Networks for Machine Learning, Coursera lecture 6e, 2012.