HYPE: Human eYe Perceptual Evaluation of Generative Models

04/01/2019
by   Sharon Zhou, et al.

Generative models often use human evaluations to determine and justify progress. Unfortunately, existing human evaluation methods are ad-hoc: there is currently no standardized, validated evaluation that (1) measures perceptual fidelity, (2) is reliable, (3) separates models into clear rank order, and (4) ensures high-quality measurement without intractable cost. In response, we construct Human eYe Perceptual Evaluation (HYPE), a human metric that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time. We introduce two methods. The first, HYPE-Time, measures visual perception under adaptive time constraints to determine the minimum length of time (e.g., 250ms) that model output such as a generated face needs to be visible for people to distinguish it as real or fake. The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost. We test HYPE across four state-of-the-art generative adversarial networks (GANs) on unconditional image generation using two datasets, the popular CelebA and the newer higher-resolution FFHQ, and two sampling techniques of model outputs. By simulating HYPE's evaluation multiple times, we demonstrate consistent ranking of different models, identifying StyleGAN with truncation trick sampling (27.6% HYPE-Infinity deception rate, i.e., roughly one quarter of images misclassified by humans) as superior to StyleGAN without truncation (19.0%).
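The HYPE-Infinity score described above is simply the human error (deception) rate over a mix of real and generated images, and the "simulating HYPE's evaluation multiple times" step can be mimicked by resampling the collected judgments. The following is a minimal sketch, not the authors' implementation; the `(is_fake, labeled_fake)` judgment format and function names are hypothetical.

```python
import random
from statistics import mean

def hype_infinity(judgments):
    """Deception rate as a percentage: the fraction of judgments where
    the human evaluator was wrong about an image being real or fake.

    `judgments` is a list of (is_fake, labeled_fake) boolean pairs,
    one per image shown with no time constraint (hypothetical format).
    """
    errors = [is_fake != labeled_fake for is_fake, labeled_fake in judgments]
    return 100.0 * mean(errors)

def bootstrap_scores(judgments, n_resamples=1000, seed=0):
    """Resample judgments with replacement to mimic re-running the
    evaluation many times; the spread of scores indicates reliability."""
    rng = random.Random(seed)
    return [
        hype_infinity([rng.choice(judgments) for _ in judgments])
        for _ in range(n_resamples)
    ]
```

Two models are cleanly separable in this scheme when their bootstrap score distributions do not overlap, e.g., one centered near 27.6% and the other near 19.0%.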

research
04/01/2019

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Generative models often use human evaluations to measure the perceived q...
research
06/02/2023

Sampling and Ranking for Digital Ink Generation on a tight computational budget

Digital ink (online handwriting) generation has a number of potential ap...
research
11/10/2018

Use of Neural Signals to Evaluate the Quality of Generative Adversarial Network Performance in Facial Image Generation

There is a growing interest in using Generative Adversarial Networks (GA...
research
06/10/2016

Improved Techniques for Training GANs

We present a variety of new architectural features and training procedur...
research
04/04/2023

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

Human evaluation is critical for validating the performance of text-to-i...
research
02/07/2022

A new face swap method for image and video domains: a technical report

Deep fake technology became a hot field of research in the last few year...
research
06/06/2018

PieAPP: Perceptual Image-Error Assessment through Pairwise Preference

The ability to estimate the perceptual error between images is an import...
