What You See is What You Read? Improving Text-Image Alignment Evaluation

05/17/2023
by Michal Yarom, et al.

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to-image and image-to-text generation tasks, with human judgements for whether a given text-image pair is semantically aligned. We then describe two automatic methods to determine alignment: the first involving a pipeline based on question generation and visual question answering models, and the second employing an end-to-end classification approach by finetuning multimodal pretrained models. Both methods surpass prior approaches in various text-image alignment tasks, with significant improvements in challenging cases that involve complex composition or unnatural images. Finally, we demonstrate how our approaches can localize specific misalignments between an image and a given text, and how they can be used to automatically re-rank candidates in text-to-image generation.
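The question-generation and visual-question-answering pipeline can be illustrated with a minimal sketch. It assumes that a text-image alignment score is the average of per-question agreement between the VQA model's answers and the answers implied by the text, and it reuses that score to flag failed questions (localization) and to re-rank candidate images. This is not the authors' implementation. `alignment_score`, `find_misalignments`, `rerank` and the toy stand-ins are hypothetical names. The toy "image" is just a set of object tags, used here in place of real QG and VQA models.

```python
from typing import Callable, List, Tuple

# A (question, expected_answer) pair derived from the text.
QAPair = Tuple[str, str]
# Hypothetical interfaces: a question-generation (QG) model, and a VQA model that
# returns how strongly its answer agrees with the expected answer (0.0 to 1.0).
QuestionGenerator = Callable[[str], List[QAPair]]
VQAScorer = Callable[[object, str, str], float]


def alignment_score(text: str, image: object,
                    generate_qa: QuestionGenerator, vqa_score: VQAScorer) -> float:
    """Average VQA agreement over all QA pairs generated from the text."""
    pairs = generate_qa(text)
    if not pairs:
        return 0.0
    return sum(vqa_score(image, q, a) for q, a in pairs) / len(pairs)


def find_misalignments(text: str, image: object, generate_qa: QuestionGenerator,
                       vqa_score: VQAScorer, threshold: float = 0.5):
    """Return (question, expected_answer, score) for QA pairs the image fails.

    Low-scoring questions point at the parts of the text that the image does
    not depict, which is one simple way to localize a misalignment.
    """
    failures = []
    for q, a in generate_qa(text):
        s = vqa_score(image, q, a)
        if s < threshold:
            failures.append((q, a, s))
    return failures


def rerank(text: str, candidates: List[object], generate_qa: QuestionGenerator,
           vqa_score: VQAScorer) -> List[object]:
    """Order candidate images from most to least aligned with the text."""
    return sorted(candidates,
                  key=lambda img: alignment_score(text, img, generate_qa, vqa_score),
                  reverse=True)


# --- Toy stand-ins (hypothetical, for illustration only) ---------------------
# An "image" is a set of object tags; QG asks one yes/no question per content word.
STOPWORDS = {"a", "an", "the", "on", "in", "of"}


def toy_generate_qa(text: str) -> List[QAPair]:
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return [(f"Is there {w} in the image?", "yes") for w in words]


def toy_vqa_score(image: set, question: str, expected: str) -> float:
    keyword = question.split()[2]  # the word inserted by toy_generate_qa
    present = keyword in image
    return 1.0 if present == (expected == "yes") else 0.0


if __name__ == "__main__":
    text = "a red cat on a mat"
    full, partial, wrong = {"red", "cat", "mat"}, {"cat", "mat"}, {"dog"}
    print(alignment_score(text, partial, toy_generate_qa, toy_vqa_score))
    print(find_misalignments(text, partial, toy_generate_qa, toy_vqa_score))
    print(rerank(text, [partial, wrong, full], toy_generate_qa, toy_vqa_score))
```

With the toy stand-ins, the image missing "red" scores two out of three and the failed question names the missing attribute. In practice, the QG and VQA callables would wrap real pretrained models, and the averaging rule is only one plausible way to aggregate per-question answers.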

research
05/24/2022

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

Integrating vision and language has gained notable attention following t...
research
06/29/2023

CLIPAG: Towards Generator-Free Text-to-Image Generation

Perceptually Aligned Gradients (PAG) refer to an intriguing property obs...
research
04/15/2022

Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer

Though deep generative models have gained a lot of attention, most of th...
research
05/18/2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Existing automatic evaluation on text-to-image synthesis can only provid...
research
10/01/2012

Enhanced Techniques for PDF Image Segmentation and Text Extraction

Extracting text objects from the PDF images is a challenging problem. Th...
research
07/18/2023

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

Research in Image Generation has recently made significant progress, par...
research
05/24/2023

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

Tasks involving text generation based on multiple input texts, such as m...