LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

05/18/2023
by   Yujie Lu, et al.

Existing automatic evaluation of text-to-image synthesis can only provide an image-text matching score and does not consider object-level compositionality, which results in poor correlation with human judgments. In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality. LLMScore leverages large language models (LLMs) to evaluate text-to-image models. It first transforms the image into image-level and object-level visual descriptions. An evaluation instruction is then fed into the LLM to measure the alignment between the synthesized image and the text, ultimately generating a score accompanied by a rationale. Our extensive analysis shows that LLMScore has the highest correlation with human judgments across a wide range of datasets (Attribute Binding Contrast, Concept Conjunction, MSCOCO, DrawBench, PaintSkills). Notably, LLMScore achieves a Kendall's tau correlation with human evaluations that is 58.8% and 31.2% higher than the commonly used text-image matching metrics CLIP and BLIP, respectively.
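
The abstract reports agreement with human judgments as Kendall's tau. As a rough illustration of what that statistic measures, here is a minimal pure-Python sketch of the tau-b variant. Whether the paper uses tau-b or another variant is not stated in the abstract, so that choice is an assumption. The function name `kendall_tau_b` and the example scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    # Compare every pair of items and check whether both lists order it the same way.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither term
        if dx == 0:
            ties_x += 1  # tied only in xs
        elif dy == 0:
            ties_y += 1  # tied only in ys
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when a list has no untied pairs")
    return (concordant - discordant) / denom


# Hypothetical scores for five generated images (illustrative values only).
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [2, 1, 4, 3, 5]
print(kendall_tau_b(human_ratings, metric_scores))
```

A value of 1 means the metric ranks every pair of images in the same order as the human raters, -1 means it reverses every pair, and values near 0 mean the two rankings are largely unrelated.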


research
05/23/2022

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

We present Imagen, a text-to-image diffusion model with an unprecedented...
research
12/02/2021

TISE: A Toolbox for Text-to-Image Synthesis Evaluation

In this paper, we conduct a study on state-of-the-art methods for single...
research
05/17/2023

What You See is What You Read? Improving Text-Image Alignment Evaluation

Automatically determining whether a text and a corresponding image are s...
research
08/12/2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for...
research
08/20/2022

Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks

Text-to-image synthesis aims to generate a photo-realistic and semantic ...
research
07/10/2023

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

The field of text-conditioned image generation has made unparalleled pro...
research
03/29/2022

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

Although progress has been made for text-to-image synthesis, previous me...
