Improving Text-to-Image Synthesis Using Contrastive Learning

07/06/2021
by   Hui Ye, et al.

The goal of text-to-image synthesis is to generate a visually realistic image that matches a given text description. In practice, the captions that humans annotate for the same image vary widely in content and word choice. This linguistic discrepancy between captions of the same image causes the synthetic images to deviate from the ground truth. To address this issue, we propose a contrastive learning approach that improves the quality and enhances the semantic consistency of synthetic images. In the pre-training stage, we use contrastive learning to learn consistent textual representations for the captions corresponding to the same image. In the subsequent GAN training stage, we use contrastive learning to enhance the consistency between images generated from captions related to the same image. We evaluate our approach on two popular text-to-image synthesis models, AttnGAN and DM-GAN, on the CUB and COCO datasets. Experimental results show that our approach effectively improves the quality of synthetic images in terms of three metrics: IS, FID and R-precision. In particular, on the challenging COCO dataset, our approach improves the FID significantly, by 29.60% over AttnGAN and by 21.96% over DM-GAN.
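The idea of pulling together the representations of captions that describe the same image can be illustrated with a small sketch. The abstract does not state which contrastive objective is used, so the code below assumes a normalized temperature-scaled cross-entropy (NT-Xent) loss, one common choice for this kind of setup. The function name `nt_xent_loss`, the temperature value, and the batch layout (row i of each array holds an embedding of a different caption of image i) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Assumed NT-Xent contrastive loss over paired caption embeddings.

    z_a[i] and z_b[i] are embeddings of two different captions of image i,
    each array of shape (N, D). All other rows in the batch act as negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)                # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit length -> cosine similarity
    sim = (z @ z.T) / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # a sample is never its own positive

    # Index of the positive for each row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive pair.
    return float(-log_prob[np.arange(2 * n), pos].mean())

# Usage: caption pairs that agree should score a lower loss than unrelated pairs.
rng = np.random.default_rng(0)
captions_a = rng.normal(size=(8, 16))
captions_b_consistent = captions_a + 0.01 * rng.normal(size=(8, 16))
captions_b_unrelated = rng.normal(size=(8, 16))
loss_consistent = nt_xent_loss(captions_a, captions_b_consistent)
loss_unrelated = nt_xent_loss(captions_a, captions_b_unrelated)
```

Under the same assumption, the GAN-stage objective described above would apply a loss of this form to features of the images generated from the paired captions, rather than to the caption embeddings themselves.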


