X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

05/18/2023
by   Yixiong Chen, et al.

This paper introduces X-IQE, a novel explainable image quality evaluation approach that leverages visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations. X-IQE uses a hierarchical Chain of Thought (CoT) to enable MiniGPT-4 to produce self-consistent, unbiased texts that are highly correlated with human evaluation. It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics, all without requiring model training or fine-tuning. X-IQE is more cost-effective and efficient than human evaluation, and it significantly enhances the transparency and explainability of deep image quality evaluation models. We validate the effectiveness of our method as a benchmark using images generated by prevalent diffusion models. X-IQE performs comparably to state-of-the-art (SOTA) evaluation methods on COCO Caption, while overcoming the limitations of previous evaluation models on DrawBench, particularly in handling ambiguous generation prompts and recognizing text in generated images. Project website: https://github.com/Schuture/Benchmarking-Awesome-Diffusion-Models
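
As a rough illustration of how a staged chain-of-thought evaluation of this kind might be wired together, the Python sketch below runs three stages in sequence and feeds each earlier answer into the next prompt. It is not the authors' implementation: the stage order, the prompt wording, and the names `STAGES`, `TEMPLATES`, `evaluate`, and `ask_model` are all assumptions made for illustration, and `ask_model` stands in for whatever call would actually query a visual LLM such as MiniGPT-4 about an image.

```python
from typing import Callable, Dict, List

# Assumed stage order. The abstract names these three tasks, but the
# ordering and the hierarchy used here are illustrative only.
STAGES: List[str] = ["fidelity", "alignment", "aesthetics"]

# Hypothetical prompt templates, not the prompts used in the paper.
TEMPLATES: Dict[str, str] = {
    "fidelity": "Describe the image, then judge whether it looks real or generated.",
    "alignment": "Given the prompt '{prompt}', judge how well the image matches it.",
    "aesthetics": "Judge the aesthetic quality of the image.",
}


def evaluate(prompt: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Run each stage in order, prepending earlier answers as context."""
    context: List[str] = []
    results: Dict[str, str] = {}
    for stage in STAGES:
        question = TEMPLATES[stage].format(prompt=prompt)
        # Earlier explanations are carried forward so later judgments
        # can stay consistent with them.
        full_prompt = "\n".join(context + [question])
        answer = ask_model(full_prompt)
        results[stage] = answer
        context.append(f"[{stage}] {answer}")
    return results
```

Passing the model call in as a plain callable keeps the sketch independent of any particular model API, so a stub function can stand in for the visual LLM when testing the control flow.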

research
05/09/2023

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Diffusion models, which have emerged to become popular text-to-image gen...
research
05/24/2023

I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

Visual metaphors are powerful rhetorical devices used to persuade or com...
research
07/06/2023

On the Cultural Gap in Text-to-Image Generation

One challenge in text-to-image (T2I) generation is the inadvertent refle...
research
09/22/2022

Implementing and Experimenting with Diffusion Models for Text-to-Image Generation

Taking advantage of the many recent advances in deep learning, text-to-i...
research
07/18/2023

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

Research in Image Generation has recently made significant progress, par...
research
06/15/2023

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment

Text-conditioned image generation models often generate incorrect associ...
research
05/24/2023

Visual Programming for Text-to-Image Generation and Evaluation

As large language models have demonstrated impressive performance in man...
