Towards Better Text-Image Consistency in Text-to-Image Generation

10/27/2022
by Zhaorui Tan et al.

Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency remains a major concern in existing GAN-based methods. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, often resulting in very misleading semantics in the generated images. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a step forward and develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel Deep Fusion Generative Adversarial Network (PDF-GAN), which can fuse semantic information at different granularities and capture accurate semantics. Equipped with two novel plug-and-play components, a Hard-Negative Sentence Constructor and Semantic Projection, the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments shows that, compared with current state-of-the-art methods, our PDF-GAN achieves significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.
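To make the contrast between the two kinds of metrics concrete, here is a minimal toy sketch in numpy: R-precision as a top-R retrieval check (does the ground-truth caption rank highest in similarity to the image?), and a simple embedding-space distance (1 minus cosine similarity) as a hypothetical stand-in for SSD. The abstract does not give SSD's exact formulation, so the `semantic_distance` function below is an illustrative assumption, not the paper's definition; in practice the embeddings would come from a CLIP encoder rather than the toy vectors used here.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def r_precision(image_emb, candidate_text_embs, true_idx, R=1):
    """Top-R retrieval check: 1.0 if the ground-truth caption ranks
    among the R candidates most similar to the image, else 0.0."""
    sims = [cosine_sim(image_emb, t) for t in candidate_text_embs]
    top_r = np.argsort(sims)[::-1][:R]
    return float(true_idx in top_r)

def semantic_distance(image_emb, text_emb):
    """Hypothetical embedding-space distance (1 - cosine similarity);
    a stand-in for SSD, whose actual formulation is in the paper."""
    return 1.0 - cosine_sim(image_emb, text_emb)

# Toy 3-d "CLIP" embeddings: one image, one matching caption, two distractors.
img = np.array([1.0, 0.0, 0.0])
texts = [
    np.array([0.9, 0.1, 0.0]),  # ground-truth caption (close to the image)
    np.array([0.0, 1.0, 0.0]),  # distractor
    np.array([0.0, 0.0, 1.0]),  # distractor
]

print(r_precision(img, texts, true_idx=0))              # 1.0
print(round(semantic_distance(img, texts[0]), 4))       # small distance
```

Note that R-precision is binary per sample (ranked in the top R or not), whereas a distance is continuous, which is one intuition for why a distance-style metric can be more sensitive to partial semantic mismatch.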


Related research

03/14/2019 · MirrorGAN: Learning Text-to-image Generation by Redescription
Generating an image from a given text description has two goals: visual ...

08/13/2020 · DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
Synthesizing high-resolution realistic images from text descriptions is ...

04/01/2021 · Text to Image Generation with Semantic-Spatial Aware GAN
A text to image generation (T2I) model aims to generate photo-realistic ...

02/17/2023 · Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis
Text-to-image synthesis refers to generating visual-realistic and semant...

08/20/2022 · Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
Text-to-image synthesis aims to generate a photo-realistic and semantic ...

04/02/2019 · Semantics Disentangling for Text-to-Image Generation
Synthesizing photo-realistic images from text descriptions is a challeng...

08/27/2021 · DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
Text-to-image synthesis refers to generating an image from a given text ...
