Cross-Modal Contrastive Learning for Text-to-Image Generation

01/12/2021
by   Han Zhang, et al.
The output of text-to-image synthesis systems should be coherent, clear, photo-realistic scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses this challenge by maximizing the mutual information between image and text. It does this via multiple contrastive losses which capture inter-modality and intra-modality correspondences. XMC-GAN uses an attentional self-modulation generator, which enforces strong text-image correspondence, and a contrastive discriminator, which acts as a critic as well as a feature encoder for contrastive learning. The quality of XMC-GAN's output is a major step up from previous models, as we show on three challenging datasets. On MS-COCO, XMC-GAN not only improves state-of-the-art FID from 24.70 to 9.33 but, more importantly, is preferred by human raters 77.3% of the time for image quality and 74.1% of the time for image-text alignment, compared to three other recent models. XMC-GAN also generalizes to the challenging Localized Narratives dataset (which has longer, more detailed descriptions), improving state-of-the-art FID from 48.70 to 14.12. Lastly, we train and evaluate XMC-GAN on the challenging Open Images data, establishing a strong benchmark FID score of 26.91.
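
The abstract does not spell out the form of the contrastive losses. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over cosine similarities, a common choice for contrastive objectives whose minimization raises a lower bound on mutual information. The function names, the temperature value, and the toy 2-D embeddings are all hypothetical and are not taken from the paper. Only the image-to-text direction of the inter-modality case is shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(image_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed form, not the paper's exact loss).

    Pair i is (image_embs[i], text_embs[i]); every other text in the
    batch serves as a negative for image i.
    """
    total = 0.0
    for i, image in enumerate(image_embs):
        logits = [cosine_similarity(image, text) / temperature
                  for text in text_embs]
        # Numerically stable log-sum-exp over all candidate texts.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the matching text.
        total += log_denominator - logits[i]
    return total / len(image_embs)


# Toy embeddings: each image matches the text at the same index.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
swapped_images = [[0.0, 1.0], [1.0, 0.0]]  # deliberately mismatched

loss_aligned = info_nce_loss(aligned_images, texts)
loss_swapped = info_nce_loss(swapped_images, texts)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is small when each image embedding points the same way as its own caption and large when the pairs are swapped, which is the behavior a generator trained against such a loss would be pushed toward. An intra-modality variant would apply the same function to two sets of image embeddings instead of an image set and a text set.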

research
02/21/2022

Vision-Language Pre-Training with Triple Contrastive Learning

Vision-language representation learning largely benefits from image-text...
research
05/18/2019

Variational Hetero-Encoder Randomized Generative Adversarial Networks for Joint Image-Text Modeling

For bidirectional joint image-text modeling, we develop variational hete...
research
10/29/2017

A Novel Approach to Artistic Textual Visualization via GAN

While the visualization of statistical data tends to a mature technology...
research
02/26/2023

Contrast-PLC: Contrastive Learning for Packet Loss Concealment

Packet loss concealment (PLC) is challenging in concealing missing conte...
research
06/15/2022

Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation

Diffusion probabilistic models (DPMs) have become a popular approach to ...
research
09/16/2023

Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition

Vocoder models have recently achieved substantial progress in generating...
research
03/26/2020

Cycle Text-To-Image GAN with BERT

We explore novel approaches to the task of image generation from their r...
