
Geometry Score: A Method For Comparing Generative Adversarial Networks

by   Valentin Khrulkov, et al.

One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation. Our algorithm can be applied to datasets of an arbitrary nature and is not limited to visual data. We test the obtained metric on various real-life models and datasets and demonstrate that our method provides new insights into properties of GANs.


1 Introduction

Generative adversarial networks (GANs) (Goodfellow et al., 2014) are a class of methods for training generative models, which have recently been shown to be very successful in producing image samples of excellent quality. They have been applied in numerous areas (Radford et al., 2015; Salimans et al., 2016; Ho & Ermon, 2016). Briefly, this framework can be described as follows. We attempt to mimic a given target distribution $p_{data}(x)$ by constructing two networks $G$ and $D$, called the generator and the discriminator. The generator learns to sample from the target distribution by transforming a random input vector $z$ to a vector $x = G(z)$, and the discriminator learns to distinguish the model distribution $p_{model}(x)$ from $p_{data}(x)$. The training procedure for GANs is typically based on applying gradient descent in turn to the discriminator and the generator in order to minimize a loss function. Finding a good loss function is a topic of ongoing research, and several options have been proposed (Mao et al., 2016; Arjovsky et al., 2017).

One of the main challenges in the GAN framework (Lucic et al., 2017; Barratt & Sharma, 2018) is estimating the quality of the generated samples. In traditional GAN models, the discriminator loss cannot be used as a metric and does not necessarily decrease during training. In more involved architectures such as WGAN (Arjovsky et al., 2017), the discriminator (critic) loss is argued to be correlated with the image quality; however, using this loss as a measure of quality is nontrivial. Training GANs is known to be difficult in general and presents issues such as mode collapse, when $p_{model}(x)$ fails to capture the multimodal nature of $p_{data}(x)$; in extreme cases all the generated samples might be identical. Several techniques to improve the training procedure were proposed in (Salimans et al., 2016; Gulrajani et al., 2017).

In this work, we attack the problem of estimating the quality and diversity of the generated images by using the machinery of topology. The well-known Manifold Hypothesis (Goodfellow et al., 2016) states that in many cases, such as the case of natural images, the support of the distribution $p_{data}(x)$ is concentrated on a low-dimensional manifold $\mathcal{M}_{data}$ in a Euclidean space. This manifold is assumed to have a very complex non-linear structure and is hard to define explicitly. It can be argued that interesting features and patterns of the images from $p_{data}(x)$ can be analyzed in terms of topological properties of $\mathcal{M}_{data}$, namely in terms of loops and higher-dimensional holes in $\mathcal{M}_{data}$. Similarly, we can assume that $p_{model}(x)$ is supported on a manifold $\mathcal{M}_{model}$ (under mild conditions on the architecture of the generator this statement can be made precise (Shao et al., 2017)), and for sufficiently good GANs this manifold can be argued to be quite similar to $\mathcal{M}_{data}$ (see Fig. 1). This intuitive claim will later be supported by numerical experiments. Based on this hypothesis, we develop an approach that compares the topology of the underlying manifolds of two point clouds in a stochastic manner, providing a visual way to detect mode collapse and a score for comparing the quality of various trained models. Informally, since the task of computing the precise topological properties of the underlying manifolds based only on samples is ill-posed by nature, we estimate them using a certain probability distribution (see Section 4).

We test our approach on several real-life datasets and popular GAN models (DCGAN, WGAN, WGAN-GP) and show that the obtained results agree well with intuition and allow for comparison of various models (see Section 5).

Figure 1: The Manifold Hypothesis suggests that in the case of natural images the data is supported on a low-dimensional data manifold $\mathcal{M}_{data}$. Similarly, GANs sample images from an immersed manifold $\mathcal{M}_{model}$. By comparing topological properties of the manifolds $\mathcal{M}_{data}$ and $\mathcal{M}_{model}$ we can gain insight into how well the GAN captured the intricacies of the data distribution $p_{data}(x)$, and quantitatively estimate the difference.

2 Main idea

Let us briefly discuss our approach before delving into technical details. As described in the introduction, we would like to compare topological properties of $\mathcal{M}_{data}$ and $\mathcal{M}_{model}$ in some way. This task is complicated by the fact that we do not have access to the manifolds themselves but merely to samples from them. A natural approach in this case is to approximate these manifolds using some simpler spaces in such a way that topological properties of these spaces resemble those of $\mathcal{M}_{data}$ and $\mathcal{M}_{model}$.

Figure 2: Simplicial complex. The topological space is constructed from several edges and a two-dimensional face.

The main example of such spaces is the simplicial complex (Fig. 2), which is built from intervals, triangles and other higher-dimensional simplices. Several methods exist for reconstructing the underlying manifold using a simplicial complex. All such approaches use proximity information of the data, such as pairwise distances between samples. Typically one chooses some threshold parameter $\varepsilon$, and based on the value of this parameter one decides which simplices are added to the approximation (see Fig. 3).

Figure 3: A simplicial complex constructed on a sample $X$. First, we fix the proximity parameter $\varepsilon$. Then we take balls of radius $\varepsilon$ centered at each point, and if for some subset of $X$ of size $k+1$ all the pairwise intersections of the corresponding balls are non-empty, we add the $k$-dimensional simplex spanning this subset to the simplicial complex.
Figure 4: Using different values of the proximity parameter $\varepsilon$ we obtain different simplicial complexes (a). For the smallest value the balls do not intersect and there are just isolated components (b, [left]). For an intermediate value several components have merged and one loop has appeared (b, [middle]). The filled triangle corresponding to the triple pairwise intersection is topologically trivial and does not affect the topology (and similarly the darker tetrahedron on the right). For the largest value all the components have merged into one and the same hole still exists (b, [right]). Over a short range of values, one smaller hole, as in Fig. 3, appeared and quickly disappeared. This information can be conveniently summarized in the persistence barcode (c). The number of connected components (holes) in the simplicial complex for some value $\varepsilon_0$ is given by the number of intervals in $H_0$ ($H_1$) intersecting the vertical line $\varepsilon = \varepsilon_0$.

However, a single value of the proximity parameter $\varepsilon$ is not enough: for very small values the reconstructed space will be just a disjoint union of points, and for very large $\varepsilon$ it will be a single connected blob, while the correct approximation is somewhere in between. This issue is resolved by considering a family (Fig. 4, a) of simplicial complexes, parametrized by the (‘persistence’) parameter $\varepsilon$. It is also convenient to refer to the parameter $\varepsilon$ as time, with the idea that we gradually throw more simplices into our simplicial complex as time goes by. For each value of $\varepsilon$ we can compute topological properties of the corresponding simplicial complex, namely homology, which encodes the number of holes of various dimensions in a space. Controlling the value of $\varepsilon$ allows us to decide which sizes of holes are meaningful and should not be discarded as noise. For the simplicial complex presented in Fig. 3 there are two one-dimensional holes, and for a slightly bigger value of $\varepsilon$ the lower hole disappears (Fig. 4, b), while the top one remains intact, which suggests that the top hole is a more important topological feature. Information about how homology changes with respect to $\varepsilon$ can be conveniently encoded in the so-called persistence barcodes (Ghrist, 2008; Zomorodian & Carlsson, 2005). An example of such a barcode is given in Fig. 4, c. In general, to find the rank of $k$-homology (delivering the number of $k$-dimensional holes) at some fixed value $\varepsilon_0$, one has to count intersections of the vertical line $\varepsilon = \varepsilon_0$ with the intervals in the desired block $H_k$.
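Reading a Betti number off a barcode amounts to counting the intervals that contain a chosen parameter value. The following minimal Python sketch illustrates this; the function name `betti_at`, the closed-interval convention, and the example barcode are illustrative assumptions, not taken from the paper or its code.

```python
from typing import List, Tuple

# One persistence interval: (birth, death) of a single homology generator.
Interval = Tuple[float, float]


def betti_at(intervals: List[Interval], value: float) -> int:
    """Count the persistence intervals that contain `value`.

    Assumption: intervals are treated as closed, [birth, death].
    """
    return sum(1 for birth, death in intervals if birth <= value <= death)


# Hypothetical one-dimensional barcode: one long-lived hole and one
# short-lived hole (the numbers are made up for illustration).
h1_barcode = [(0.2, 0.9), (0.3, 0.35)]

print(betti_at(h1_barcode, 0.32))  # 2: both holes are alive
print(betti_at(h1_barcode, 0.6))   # 1: only the persistent hole remains
```

The short interval contributes to the count only in a narrow range of values, which is the sense in which it can be regarded as noise.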

These barcodes provide a way to compare topological properties of the underlying manifolds. In principle, we could obtain a measure of similarity of two datasets by comparing the barcodes of the simplicial complexes constructed from each dataset (as described in Fig. 3), but this approach has disadvantages, such as a huge number of simplices for large datasets. Moreover, extracting interesting topological properties from such large simplicial complexes requires various tricks (Ghrist, 2008). To remedy these issues, we note that we are in fact interested in topological approximations rather than geometrical ones. The difference is that a much smaller number of simplices is often sufficient to obtain a correct estimate of the topological properties; e.g., for any number of points sampled from a circle the correct answer could be obtained by taking just three points (thus obtaining a triangle, which is topologically equivalent to the circle). Based on these ideas the so-called witness complex was introduced (De Silva & Carlsson, 2004), which provides a topological approximation with a small number of simplices. To achieve this, a small subset of landmark points is chosen and a simplicial complex is constructed using these points as vertices (while also taking into account the proximity information about all the remaining points, called witnesses).

To construct a numerical measure which could be compared across datasets, we would like to estimate the correct values of homology. Comparing the computed barcodes is a challenging task since they are non-trivial mathematical objects (though some metrics exist, they are hard to compute). We take a simpler route: to extract meaningful topological data from the barcode, we propose computing the Relative Living Times (RLT) of each number of holes that was observed. They are defined as the ratio of the total time during which this number of holes was present to the value of the persistence parameter at which the points connect into a single blob. These relative living times can be interpreted as a confidence in our approximation: if, say, for a large fraction of the whole period of topological activity we have observed that there is at least one one-dimensional hole (as in Fig. 4), then this is probably an accurate estimate of the topology of the underlying space.

Choosing the correct landmarks is a nontrivial task. We follow the discussion in (De Silva & Carlsson, 2004) which advises doing it randomly. To account for this randomness, we compute the RLT stochastically by repeating the experiment a large number of times. By averaging the obtained RLT we compute the Mean Relative Living Times (MRLT).

Figure 5: Estimation of the topology of a dataset sampled from a hyperplane embedded in a higher-dimensional space. With high confidence, we can say that there are no one-dimensional holes. For details see Section 4.

By construction, the MRLT add up to $1$, and employing a Bayesian point of view we can interpret them as a probability distribution reflecting our confidence about the correct number of holes on average. An example of such a distribution is given in Fig. 5, where we run our method on a simple planar dataset (in a high-dimensional space). To quantitatively evaluate the topological difference between two datasets we propose computing the $L_2$-error between these distributions. Note that in practice (when activation functions such as ReLU are used) the resulting space $\mathcal{M}_{model}$ may fail to be a manifold in the precise mathematical sense; however, the analysis is still applicable since it deals with arbitrary topological spaces. Now let us introduce all the technical details.

3 Homology to the rescue

In this section we briefly discuss the important concepts of simplicial complexes and homology. For a thorough introduction we refer the reader to classical texts such as (Hatcher, 2002; May, 1999).

Simplicial complexes

The simplicial complex is a classical concept widely used in topology. Formally it is defined as follows.

Definition 1.

A simplicial complex $S$ (more precisely, an abstract simplicial complex) is specified by the following data:

  • The vertex set $Z = \{z_1, z_2, \ldots, z_n\}$

  • A collection of simplices $\Sigma$, where a $p$-dimensional simplex $\sigma_p$ is defined just as a $(p+1)$-element subset of $Z$: $\sigma_p = \{z_{i_1}, z_{i_2}, \ldots, z_{i_{p+1}}\}$

  • We require that the collection $\Sigma$ is closed under taking faces, that is, for each $p$-dimensional simplex $\sigma_p$ all the $(p-1)$-dimensional simplices obtained by deleting one of the vertices $z_{i_1}, \ldots, z_{i_{p+1}}$ are also elements of $\Sigma$.
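The closure condition in the definition can be checked mechanically. The sketch below is an illustration only: the function name `is_simplicial_complex` and the string vertex labels are assumptions. It represents each simplex as a set of vertices and verifies that deleting any single vertex yields another simplex of the collection. Checking only these immediate faces is enough, because the same check is applied to every simplex in the collection.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Simplex = FrozenSet[str]


def is_simplicial_complex(simplices: Iterable[Iterable[str]]) -> bool:
    """Return True if the collection is closed under taking faces."""
    collection: Set[Simplex] = {frozenset(s) for s in simplices}
    for simplex in collection:
        if len(simplex) <= 1:
            continue  # a single vertex has no smaller faces to check
        # Faces obtained by deleting exactly one vertex.
        for face in combinations(simplex, len(simplex) - 1):
            if frozenset(face) not in collection:
                return False
    return True


# A filled triangle together with all of its edges and vertices.
filled_triangle = [
    {"a"}, {"b"}, {"c"},
    {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"a", "b", "c"},
]
print(is_simplicial_complex(filled_triangle))  # True

# The same triangle with the edge {b, c} missing violates the closure rule.
broken = [s for s in filled_triangle if s != {"b", "c"}]
print(is_simplicial_complex(broken))  # False
```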

An example of a simplicial complex is presented in Fig. 2. It contains several vertices, edges and faces: a two-dimensional face and several one-dimensional edges. Note that these are maximal simplices, since by the third property all the edges of the two-dimensional face are also elements of the collection of simplices. Important topological properties of a simplicial complex (such as connectedness or the existence of a one-dimensional loop) do not depend on the Euclidean space in which it is embedded or on the precise positions of the vertices, but merely on the combinatorial data: the number of points and which vertices together span a simplex.

As described in Section 2, given a dataset $X$ sampled from a manifold $\mathcal{M}$ we would like to compute a family of simplicial complexes topologically approximating $\mathcal{M}$ on various scales, namely witness complexes. This family is defined as follows. First we choose some subset $L \subset X$ of points called landmarks (whereas points in $W \subseteq X$ are called witnesses) and some distance function $d(x, x')$, e.g., the ordinary Euclidean distance. There is not much theory about how to choose the best landmarks, but several strategies were proposed in (De Silva & Carlsson, 2004). The first one is to choose landmarks sequentially by solving a certain minimax problem, and the second one is to just pick landmarks at random (by uniformly selecting a fixed number of points from $X$). We follow the second approach since the minimax strategy is known to have some flaws, such as a tendency to pick up outliers. The selected landmarks will serve as the vertices of the simplicial complex, and the witnesses will help to decide which simplices are inserted via the predicate “is witnessed”:

$$\sigma \subset L \text{ is witnessed by } w \in W \text{ if } \forall l \in \sigma,\ \forall l' \in L \setminus \sigma:\ d(w, l)^2 \le d(w, l')^2 + \alpha, \tag{1}$$

with $\alpha$ being a relaxation parameter which provides us with a sequence of simplicial complexes. The maximal value of $\alpha$ for the analysis is typically chosen to be proportional to the maximal pairwise distance between points in $L$. Witness complexes even for small values of $\alpha$ are good topological approximations to $\mathcal{M}$. The main advantage of a witness complex is that it allows constructing a reliable approximation using a relatively small number of simplices and makes the problem tractable even for large datasets. Even though it is known that in some cases it may fail to recover the correct topology (Boissonnat et al., 2009), it can still be used to compare topological properties of datasets, and if a better method is devised, we can easily replace the witness complex with this new, more reliable simplicial complex.


Homology

The precise definition of homology is technical, and we have to omit it due to limited space. We refer the reader to (Hatcher, 2002, Chapter 2) for a thorough discussion. The most important properties of homology can be summarized as follows. For any topological space $X$ the so-called homology groups $H_k(X)$, $k = 0, 1, 2, \ldots$, are introduced. The actual number of $k$-dimensional holes in $X$ is given by the rank of $H_k(X)$, a concept which is quite similar to the dimension of a vector space. These ranks are called the Betti numbers $\beta_k$ and serve as a coarse numerical measure of homology.

Homology is known to be one of the most easily computable topological invariants. When $X$ is a simplicial complex, $H_k(X)$ can be computed by essentially linear algebra, namely by analyzing kernels and images of certain linear maps. Dimensions of the matrices appearing in this task are equal to the numbers of simplices of specific dimensions in $X$; e.g., in the case of Fig. 2 the matrix sizes are determined by the numbers of vertices, edges and two-dimensional faces. Existing algorithms (Kaczynski et al., 2006) can handle extremely large simplicial complexes (with millions of simplices) and are available in numerous software packages. An important property of homology is that $H_k(X)$ depends only on simplices of dimension at most $k+1$, which significantly speeds up computations.
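To make the linear-algebra view concrete, the sketch below computes Betti numbers of a small simplicial complex from the ranks of its boundary matrices. It is a minimal illustration under stated assumptions, not the algorithm used by the software packages mentioned above: coefficients are taken in the two-element field GF(2), the rank is found by plain elimination, and the helper names (`rank_gf2`, `boundary_rank`, `betti_numbers`) are hypothetical. It relies on the standard identity that the $k$-th Betti number equals the number of $k$-simplices minus the rank of the $k$-th boundary map minus the rank of the $(k+1)$-th boundary map.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Simplex = Tuple[int, ...]  # sorted tuple of vertex indices


def rank_gf2(rows: List[int]) -> int:
    """Rank over GF(2) of a 0/1 matrix whose rows are stored as int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_rank(simplices_k: List[Simplex], index_km1: Dict[Simplex, int]) -> int:
    """Rank of the boundary map from k-simplices to (k-1)-simplices."""
    rows = []
    for simplex in simplices_k:
        mask = 0
        for face in combinations(simplex, len(simplex) - 1):
            mask |= 1 << index_km1[face]
        rows.append(mask)
    return rank_gf2(rows)


def betti_numbers(complex_by_dim: List[List[Simplex]]) -> List[int]:
    """complex_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    ranks = [0]  # the boundary of a vertex is zero
    for k in range(1, len(complex_by_dim)):
        index = {s: i for i, s in enumerate(complex_by_dim[k - 1])}
        ranks.append(boundary_rank(complex_by_dim[k], index))
    ranks.append(0)  # nothing above the top dimension
    return [
        len(complex_by_dim[k]) - ranks[k] - ranks[k + 1]
        for k in range(len(complex_by_dim))
    ]


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]

# Hollow triangle: one connected component and one loop.
print(betti_numbers([vertices, edges]))  # [1, 1]

# Filling the triangle kills the loop.
print(betti_numbers([vertices, edges, [(0, 1, 2)]]))  # [1, 0, 0]
```

Only simplices of dimension $k$ and $k+1$ (and their faces) enter the computation of the $k$-th Betti number, which is why a truncated complex suffices when a single dimension is of interest.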

Persistent homology

In Section 2 we discussed that to find a proxy of the correct topology of the underlying manifold it is insufficient to use a single simplicial complex; rather, a family of simplicial complexes is required. As we transition from one simplicial complex to another, some holes may appear and some may disappear. To distinguish between holes that are essential and those that should be considered noise, the concept of persistence was introduced (Edelsbrunner et al., 2000; Zomorodian & Carlsson, 2005). The formal Structure Theorem (Zomorodian & Carlsson, 2005) states that for each generator of homology (“hole” in our notation) one can provide the time of its “birth” and “death”. This data is pictorially represented as a barcode (Fig. 4, c), with the horizontal axis representing the persistence parameter and the vertical axis representing the various homology generators. To compute these barcodes, an efficient algorithm was proposed in (Zomorodian & Carlsson, 2005). As an input to this algorithm one has to supply a sequence of pairs $(\sigma_i, \varepsilon_i)$, with $\sigma_i$ being a simplex and $\varepsilon_i$ being its time of appearance in the family. This algorithm is implemented in several software packages such as Dionysus and GUDHI (Maria et al., 2014), but the witness complex is supported only in the latter.

4 Algorithm

Let us now explain how we apply these concepts to construct a metric to compare the topological properties of two datasets. First let us define the key part of the algorithm: the relative living times (RLT) of homology. Suppose that for a dataset $X$ and some choice of landmarks $L$ we have obtained a persistence barcode with the persistence parameter $\alpha$ spanning the range $[0, \alpha_{max}]$. Let us fix the dimension $k$ in which we study the homology, and let $\mathcal{I}_k = \{[b_j, d_j]\}_{j=1}^{m}$ be the collection of persistence intervals in this dimension. Then in order to find the Betti number $\beta_k$ for a fixed value $\alpha$ one has to count the number of persistence intervals containing $\alpha$, and we obtain the integer-valued function

$$\beta_k(\alpha) = \left|\{[b_j, d_j] \in \mathcal{I}_k : \alpha \in [b_j, d_j]\}\right|. \tag{2}$$

Then the RLT are defined as follows (for non-negative integers $i$):

$$\mathrm{RLT}(i, k, X, L) = \frac{\mu\left(\{\alpha \in [0, \alpha_{max}] : \beta_k(\alpha) = i\}\right)}{\alpha_{max}}, \tag{3}$$

where $\mu$ denotes the length (Lebesgue measure) of a set; that is, for each possible value of $\beta_k(\alpha)$ we find how long it existed relative to the whole period of topological activity. Note that in our analysis we use witness complexes, which depend on the choice of landmarks, which is random. Thus it is reasonable to consider the distribution of $\mathrm{RLT}(i, k, X, L)$ over the set of landmarks (tuples of points); in other words, we repeatedly sample the landmarks and compute the RLT of the obtained persistence barcode. After sufficiently many experiments we can approximate the Mean Relative Living Times (MRLT):

$$\mathrm{MRLT}(i, k, X) = \mathbb{E}_L\left[\mathrm{RLT}(i, k, X, L)\right]. \tag{4}$$

We hypothesize that these quantities provide us with a good way to compare the topological properties of datasets, as they serve as measures of confidence in the estimation of the topology of the underlying manifolds. From Eq. 3 it follows that

$$\sum_i \mathrm{MRLT}(i, k, X) = 1,$$

which suggests that for a fixed value of $k$ we can interpret $\mathrm{MRLT}(i, k, X)$ as a probability distribution (over integers $i$). This distribution defines our certainty about the number of $k$-dimensional holes in the underlying manifold of $X$ on average. In this work we consider the case $k = 1$, i.e., we study the first homology of datasets. We motivate this by drawing an analogy with the Taylor series: we can get a good understanding of the behavior of a function by looking at the first term of the series (see also (Ghrist, 2008) for discussion). Based on this probabilistic understanding, given two datasets $X_1$ and $X_2$ we define a measure of their topological similarity (Geometry Score) in the following way:


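$$\mathrm{GeomScore}(X_1, X_2) = \sum_{i=0}^{i_{max}-1} \left(\mathrm{MRLT}(i, 1, X_1) - \mathrm{MRLT}(i, 1, X_2)\right)^2, \tag{5}$$

with $i_{max}$ being an upper bound on the first Betti number $\beta_1(\alpha)$ for $X_1$ and $X_2$ (for typical datasets we found that a fixed bound suffices).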
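The quantities introduced in this section can be sketched in a few lines of Python operating directly on persistence intervals. This is a minimal illustration and not the authors' implementation: the function names, the closed-interval convention, the clipping of intervals to the analyzed range, and the choice of a sum of squared differences as the score are assumptions made for the example. The barcodes themselves are assumed to come from an external persistence computation.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (birth, death) of one homology generator


def relative_living_times(
    intervals: Sequence[Interval], alpha_max: float, i_max: int
) -> List[float]:
    """Fraction of [0, alpha_max] during which exactly i intervals are alive."""
    # The number of alive intervals can only change at interval endpoints.
    points = {0.0, alpha_max}
    for birth, death in intervals:
        points.add(min(max(birth, 0.0), alpha_max))
        points.add(min(max(death, 0.0), alpha_max))
    ordered = sorted(points)

    rlt = [0.0] * i_max
    for left, right in zip(ordered, ordered[1:]):
        mid = 0.5 * (left + right)
        alive = sum(1 for birth, death in intervals if birth <= mid <= death)
        if alive < i_max:  # counts at or above i_max are dropped
            rlt[alive] += (right - left) / alpha_max
    return rlt


def mean_rlt(rlts: Sequence[Sequence[float]]) -> List[float]:
    """Average the RLT vectors obtained for different landmark choices."""
    return [sum(column) / len(rlts) for column in zip(*rlts)]


def geometry_score(mrlt_a: Sequence[float], mrlt_b: Sequence[float]) -> float:
    """Sum of squared differences between two MRLT vectors (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(mrlt_a, mrlt_b))


# Two hypothetical experiments: one barcode with a single hole, one with none.
rlt_1 = relative_living_times([(0.2, 0.6)], alpha_max=1.0, i_max=3)
rlt_2 = relative_living_times([], alpha_max=1.0, i_max=3)
mrlt = mean_rlt([rlt_1, rlt_2])
print(mrlt)
print(geometry_score(mrlt, rlt_2))
```

As long as the number of simultaneously alive intervals stays below `i_max`, each RLT vector sums to one, so it can be read as a discrete probability distribution over the number of holes.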

To construct the witness complex given the sets of landmarks $L$ and witnesses $W$, one has to provide the matrix of pairwise distances between $L$ and $W$ and the maximal value of the persistence parameter $\alpha_{max}$ (see Eq. 1). In our experiments, we have chosen $\alpha_{max}$ to be proportional to the maximal pairwise distance between points in $L$ and $W$ with some coefficient $\gamma$. Since we only compute $\beta_1(\alpha)$, only simplices of dimension at most $2$ are needed. In principle, any value of $\gamma$ suffices to compare two datasets; however, in our experiments we found that a sufficiently small value of $\gamma$ is needed to get a reasonable distribution (for large $\gamma$ a lot of time is spent in the regime of a single connected blob, which shifts the distributions towards $i = 0$). We summarize our approach in Algorithm 1 and Algorithm 2. We also suggest that, to obtain accurate results, datasets of the same size should be used for comparison.

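  Require: $X$: array representing the dataset
  Require: $L_0$: Number of landmarks to use
  Require: $\gamma$: Coefficient determining $\alpha_{max}$
  Require: $i_{max}$: Upper bound on $i$ in $\mathrm{RLT}(i, 1, X, L)$
  Require: $n$: Number of experiments
  Require: dist$(A, B)$: Function computing the matrix of pairwise (Euclidean) distances between samples from $A$ and $B$
  Require: witness$(d, \alpha, k)$: Function computing the family of witness complexes using the matrix of pairwise distances $d$, maximal value of persistence parameter $\alpha$ and maximal dimension of simplices $k$
  Require: persistence$(w, k)$: Function computing the persistence intervals of a family $w$ in dimension $k$
  Returns: An array of size $n \times i_{max}$ of the obtained RLT for each experiment
  for $i = 0$ to $n - 1$ do
     $L^{(i)} \leftarrow$ random subset of $L_0$ points of $X$
     $d^{(i)} \leftarrow$ dist$(L^{(i)}, X)$
     $\alpha_{max}^{(i)} \leftarrow \gamma \cdot \max(d^{(i)})$
     $W^{(i)} \leftarrow$ witness$(d^{(i)}, \alpha_{max}^{(i)}, 2)$
     $\mathcal{I}^{(i)} \leftarrow$ persistence$(W^{(i)}, 1)$
     for $j = 0$ to $i_{max} - 1$ do
        Compute $\mathrm{RLT}(j, 1, X, L^{(i)})$ using Eqs. 2 and 3
     end for
  end for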
Algorithm 1 The algorithm to compute RLT of a dataset. See Section 4 for details. Suggested default values of the parameters for a dataset are , , , .
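  Require: $X_1, X_2$: arrays representing the datasets
  Returns: $s$: a number representing the topological similarity of $X_1$ and $X_2$
  For $X_1$ and $X_2$ run Algorithm 1 with the same collection of parameters, obtaining arrays $\mathrm{rlt}_1$ and $\mathrm{rlt}_2$
  $\mathrm{mrlt}_1 \leftarrow$ mean of $\mathrm{rlt}_1$ over the experiments
  $\mathrm{mrlt}_2 \leftarrow$ mean of $\mathrm{rlt}_2$ over the experiments
  $s \leftarrow$ sum of the squared differences between $\mathrm{mrlt}_1$ and $\mathrm{mrlt}_2$
Algorithm 2 Geometry Score, the proposed algorithm to compute topological similarity between datasets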


Let us briefly discuss the complexity of each step in the main loop of Algorithm 1. Suppose that we have a dataset $X \in \mathbb{R}^{N \times D}$. Computing the matrix of pairwise distances between all points in the dataset and the $L_0$ landmark points requires $O(N D L_0)$ operations. The complexity of the next step, computing the persistence barcode, is hard to estimate; however, we note that it does not depend on the dimensionality $D$ of the data. In practice this computation is faster than computing the distance matrix in the previous step (for datasets of significant dimensionality). All the remaining steps of the algorithm take a negligible amount of time. This linear scaling of the complexity with respect to the dimensionality of the data allows us to apply our method even to high-dimensional datasets. On a typical laptop (3.1 GHz Intel Core i5 processor), one iteration of the inner loop of Algorithm 1 for one class of the MNIST dataset takes on the order of milliseconds.
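The distance step whose cost is discussed above can be sketched as follows. This is a standard-library illustration under stated assumptions rather than the authors' code: the function name `landmark_distances` and its parameters are hypothetical, landmarks are drawn uniformly at random, and the maximal persistence value is taken as a coefficient times the largest landmark-to-point distance. Each of the (number of landmarks) × (number of points) entries costs time proportional to the dimensionality, which gives the linear scaling in the dimension.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def landmark_distances(
    points: Sequence[Sequence[float]],
    n_landmarks: int,
    gamma: float,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[List[float]], float]:
    """Pick random landmarks and compute their distances to all points.

    Returns the landmark indices, a matrix with one row per landmark and one
    column per data point, and the maximal persistence value alpha_max.
    """
    rng = random.Random(seed)
    landmark_idx = rng.sample(range(len(points)), n_landmarks)
    # n_landmarks * len(points) Euclidean distances, each O(dimension).
    dist = [[math.dist(points[i], p) for p in points] for i in landmark_idx]
    alpha_max = gamma * max(max(row) for row in dist)
    return landmark_idx, dist, alpha_max


# Toy dataset: the four corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx, dist, alpha_max = landmark_distances(square, n_landmarks=2, gamma=0.5, seed=0)
print(idx, alpha_max)
```

The resulting matrix and `alpha_max` are the inputs a witness-complex routine would need; that routine is not shown here.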

5 Experiments

Experimental setup

We have implemented Algorithms 1 and 2 in Python using GUDHI for computing witness complexes and persistence barcodes. Our code is available on GitHub. Default values of the parameters in Algorithm 1 were used for experiments unless otherwise specified. We test our method on several datasets and GAN models:

  • Synthetic data — on synthetic datasets we demonstrate that our method allows for distinguishing the datasets based on their topological properties.

  • MNIST — as the next experiment we test our approach on the MNIST dataset of handwritten digits. We compare two recently proposed models: WGAN (Arjovsky et al., 2017) and WGAN-GP (Gulrajani et al., 2017) in order to verify if the improved model WGAN-GP indeed produces better images.

  • CelebA — to demonstrate that our method can be applied to datasets of large dimensionality we analyze the CelebA dataset (Liu et al., 2015) and check if we can detect mode collapse in a GAN using MRLT.

  • CaloGAN — as the final experiment we apply our algorithm to a dataset of a non-visual origin and evaluate the specific generative model CaloGAN (Paganini et al., 2017).

Synthetic data

For this experiment we have generated a collection of simple datasets (see Fig. 6), each containing the same number of points. As a test problem we would like to evaluate which of the datasets is the best approximation to the ground truth. For each of the datasets we ran Algorithm 1 and computed the MRLT using Eq. 4. The resulting distributions are visualized in Fig. 6, [bottom]. We observe that we can correctly identify the number of one-dimensional holes in each space $X$ using the MAP estimate

$$i^*(X) = \arg\max_i \mathrm{MRLT}(i, 1, X).$$

It is clear that the second dataset from the left in Fig. 6 is the most similar to the ground truth, which is supported by the fact that their MRLT are almost identical. Note that on such simple datasets we were able to recover the correct homology with almost complete confidence, and this will not be the case for more complicated manifolds in the next experiment.


MNIST

In this experiment we compare topological properties of the MNIST dataset and of samples generated by the WGAN and WGAN-GP models trained on MNIST. It was claimed that the WGAN-GP model produces better images, and we would like to verify whether we can detect this using topology. For the GAN implementations we used the code provided by the authors of (Gulrajani et al., 2017). We trained each model for a fixed number of epochs and generated a set of samples. To compare the topology of each class individually, we trained a CNN classifier on MNIST and split the generated datasets into classes of roughly equal size. For every class and each of the corresponding datasets (‘base’, ‘wgan’, ‘wgan-gp’) we run Algorithm 1 and compute the MRLT. Similarly, we evaluate the MRLT for the entire datasets without splitting them into classes. The obtained MRLT are presented in Fig. 7 and the corresponding Geometry Scores for each model are given in Table 1. We observe that both models produce distributions which are very close to the ground truth, but for most classes (7 of 10 in Table 1) WGAN-GP shows better (lower) scores. We can also note that for the entire datasets (Fig. 7, [right]) the predicted values of homology do not seem to be much bigger than for each individual digit. One possible explanation is that some samples of one class fill the holes in the underlying manifolds of other, visually similar classes.

Figure 6: Mean Relative Living Times (MRLT) for various datasets. The number of one-dimensional holes is correctly identified in all the cases. By comparing MRLT we find that the second dataset from the left is the most similar to the ‘ground truth’ (noisy circle on the left).
Figure 7: Comparison of MRLT of the MNIST dataset and of samples generated by WGAN and WGAN-GP trained on MNIST. MRLT match almost perfectly, however, WGAN-GP shows slightly better performance on most of the classes.

Label      0     1     2     3     4     5     6     7     8     9     all
WGAN       0.85  21.4  0.60  7.04  1.52  0.47  22.8  2.20  0.76  1.27  26.1
WGAN-GP    5.19  1.44  0.54  0.27  2.16  0.03  13.5  1.38  0.14  5.00  2.04
Table 1: Geometry Scores of WGAN and WGAN-GP trained on the MNIST dataset (see also Fig. 7). Each class contained roughly images, except for ‘All’ which corresponds to the total datasets of images.


CelebA

We now analyze the popular CelebA dataset consisting of photos of various celebrities. In this experiment we would like to study whether we can detect mode collapse using our method. To achieve this we train two GAN models: a good model with a high-capacity generator and a second model with a generator much weaker than the discriminator. We utilize the DCGAN model and use the implementation provided by the authors (Radford et al., 2015). For the first model (‘dcgan’) we use the default settings, and for the second (‘bad-dcgan’) we change the latent dimension and reduce the size of the fully connected layer in the generator and the number of filters in the convolutional layers. To obtain faces we perform a central crop of the images in the dataset, which reduces their size. We trained both models for the same number of epochs and produced images for our analysis. Similarly, we randomly picked (cropped) images from the original dataset. We report the obtained results in Fig. 8. The MRLT obtained using the good model match the ground truth almost perfectly, and the Geometry Score of the generated dataset is correspondingly small, confirming the good visual quality of the samples (Radford et al., 2015). The MRLT obtained using the weak model are maximized for $i = 0$, which suggests that the samples are either identical or present very little topological diversity (compare with Fig. 5), which we confirmed visually. In Fig. 8, [right] we report the behavior of the Geometry Score and the Inception Score (Salimans et al., 2016) with respect to the iteration number. The Inception Score uses the pretrained Inception network (Szegedy et al., 2015) and is defined as

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x \sim p_{model}}\, \mathrm{KL}\left(p(y \mid x) \,\|\, p(y)\right)\right),$$

where $p(y \mid x)$ is approximated by the Inception network and $p(y)$ is computed as $p(y) = \mathbb{E}_{x \sim p_{model}}\, p(y \mid x)$. Note that the Geometry Score of the better model rapidly decreases, while that of the mode-collapsed model stagnates at high values. Such behavior could not be observed in the Inception Score.

Figure 8: MRLT of the (cropped) CelebA dataset and samples generated using DCGAN and DCGAN with forced mode collapse. Plots on the right present the behavior of the Geometry Score and Inception Score for these two models during the training. Mode collapse in ‘bad-dcgan’ is easily observable using the Geometry Score.


CaloGAN

In this experiment, we apply our technique to a dataset from experimental particle physics. This dataset represents a collection of responses of a calorimeter (an experimental apparatus measuring the energy of particles), and it was used to create a generative model (Paganini et al., 2017) in order to help physicists working at the LHC. Evaluating the obtained model is a non-trivial task and was performed by comparing physical properties of the generated and the real data. Since our method is not limited to visual datasets, we can apply it to confirm the quality of this model. For the analysis we used the ‘eplus’ dataset, which is split into three parts (‘layer 0’, ‘layer 1’, ‘layer 2’) containing matrices of different sizes. We train the CaloGAN model with default settings and generate samples (each sample combines data for all layers). We then randomly pick samples from the original dataset and compare the MRLT of the data and of the generated samples for each layer. Results are presented in Fig. 9. It appears that the topological properties of this dataset are rather trivial; however, they are correctly identified by CaloGAN. Slight dissimilarities between the distributions could be connected to the fact that the physical properties of the generated samples do not exactly match those of the real ones, as was analyzed by the authors of (Paganini et al., 2017).

Figure 9: MRLT of the dataset used in experimental particle physics and of the samples generated using the corresponding CaloGAN model.

6 Related work and discussion

Several performance measures have been introduced to assess the performance of GANs used for natural images. The Inception Score (Salimans et al., 2016) uses the outputs of the pretrained Inception network, and a modification called the Fréchet Inception Distance (FID) (Heusel et al., 2017) also takes into account second-order information of the final layer of this model. Contrary to these methods, our approach does not use auxiliary networks and is not limited to visual data. We note, however, that since we only take topological properties into account (which do not change if we, say, shift the entire dataset by a constant), assessing the visual quality of samples may be difficult based only on our algorithm; thus, in the case of natural images we propose to use our method in conjunction with other metrics such as FID. We also hypothesize that for high-dimensional data the Geometry Score of the features extracted using some network will adequately assess the performance of a GAN.

7 Conclusion

We have introduced a new algorithm for evaluating a generative model. We show that the topology of the underlying manifold of generated samples may be different from the topology of the original data manifold, which provides insight into properties of GANs and can be used for hyperparameter tuning. We do not claim however that the obtained metric correlates with the visual quality as estimated by humans and leave the analysis to future work. We hope that our research will be useful to further theoretical understanding of GANs.


Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments. We also thank Maxim Rakhuba for productive discussions and making our illustrations better. This study was supported by the Ministry of Education and Science of the Russian Federation (grant 14.756.31.0001).