
Towards GAN Benchmarks Which Require Generalization

by Ishaan Gulrajani, et al.

For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation and implement an example black-box metric based on these ideas. Through experimental validation we show that it can effectively measure diversity, sample quality, and generalization.
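The abstract describes metrics built from a critic trained to distinguish a model's samples from real data. As a hedged illustration of the general idea (not the paper's actual metric or architecture), the sketch below trains a toy linear critic with logistic loss to separate two sample sets and reports the negated final loss as a divergence-like score: samples from identical distributions score near -log 2 (chance), while distinguishable distributions score higher.

```python
import numpy as np

def nnd_score(real, fake, steps=500, lr=0.1, seed=0):
    """Toy neural-network-divergence-style estimate (illustrative only):
    train a linear logistic-regression critic to separate `real` from
    `fake` samples, then return the negated mean logistic loss. A critic
    that separates the sets easily yields low loss, hence a high score,
    indicating a large divergence between the two sample distributions."""
    rng = np.random.default_rng(seed)
    X = np.concatenate([real, fake])                      # stacked samples, shape (2n, d)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # critic's probability of "real"
        grad = p - y                                      # gradient of the logistic loss
        w -= lr * (X.T @ grad) / len(X)
        b -= lr * grad.mean()
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return -loss  # higher score = more distinguishable = larger divergence
```

Note that the score is computed only from samples, matching the black-box setting the abstract emphasizes; the paper's actual metric uses a trained neural network critic rather than this linear stand-in.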
