HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

04/11/2023
by Eslam Mohamed Bakr, et al.

In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks rely heavily on subjective human evaluation, limiting their ability to holistically assess a model's capabilities. Furthermore, there is a significant gap between efforts in developing new T2I architectures and those in evaluation. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that fall into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps ease future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io

research
02/08/2022

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

Generating images from textual descriptions has gained a lot of attentio...
research
06/15/2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Recent text-to-image generative models can generate high-fidelity images...
research
07/12/2023

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Despite the stunning ability to generate high-quality images by recent t...
research
05/24/2023

Visual Programming for Text-to-Image Generation and Evaluation

As large language models have demonstrated impressive performance in man...
research
04/04/2023

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

Human evaluation is critical for validating the performance of text-to-i...
research
07/06/2023

On the Cultural Gap in Text-to-Image Generation

One challenge in text-to-image (T2I) generation is the inadvertent refle...
research
04/27/2020

PuzzLing Machines: A Challenge on Learning From Small Data

Deep neural models have repeatedly proved excellent at memorizing surfac...