TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models

12/15/2022
by   Federico A. Galatolo, et al.
0

Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper a novel evaluation approach is experimented, on the basis of: (i) a curated data set, made by high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score, (iii) a human evaluation task to distinguish, for a given text, the real and the generated images. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully coherent with the CLIP-score. The dataset has been made available to the public.

READ FULL TEXT
research
11/08/2020

A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

Most Natural Language Generation systems need to produce accurate texts....
research
06/15/2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Recent text-to-image generative models can generate high-fidelity images...
research
10/02/2022

Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2

The field of image synthesis has made great strides in the last couple o...
research
03/30/2023

Discriminative Class Tokens for Text-to-Image Diffusion Models

Recent advances in text-to-image diffusion models have enabled the gener...
research
07/10/2023

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

The field of text-conditioned image generation has made unparalleled pro...
research
11/22/2022

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

We provide a new multi-task benchmark for evaluating text-to-image model...
research
05/24/2023

I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

Visual metaphors are powerful rhetorical devices used to persuade or com...

Please sign up or login with your details

Forgot password? Click here to reset