Image-to-Image Translation with Text Guidance

02/12/2020
by Bowen Li, et al.

This paper aims to embed controllable factors, namely natural language descriptions, into image-to-image translation with generative adversarial networks, so that text descriptions determine the visual attributes of the synthetic images. We propose four key components: (1) part-of-speech tagging to filter out non-semantic words in the given description, (2) an affine combination module to effectively fuse text and image features, which come from different modalities, (3) a novel refined multi-stage architecture to strengthen the discriminative ability of the discriminators and the rectification ability of the generators, and (4) a new structure loss that further helps the discriminators distinguish real from synthetic images. Extensive experiments on the COCO dataset demonstrate that our method achieves superior performance in both visual realism and semantic consistency with the given descriptions.
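As a rough illustration of component (1), the sketch below filters a description down to the words a part-of-speech tag marks as carrying visual meaning. It is not the authors' implementation: the lookup table `TOY_POS_TAGS`, the set `KEPT_TAGS`, and the function `filter_semantic_words` are hypothetical names introduced here, the choice to keep nouns, adjectives, and verbs is an assumption, and a real system would use a trained tagger rather than a hand-written table.

```python
# Toy part-of-speech lookup table (hypothetical; stands in for a trained tagger).
TOY_POS_TAGS = {
    "a": "DET", "the": "DET",
    "is": "AUX", "are": "AUX",
    "on": "ADP", "in": "ADP", "with": "ADP",
    "and": "CCONJ",
    "red": "ADJ", "green": "ADJ", "wooden": "ADJ",
    "bus": "NOUN", "street": "NOUN", "grass": "NOUN", "bench": "NOUN",
    "parked": "VERB", "standing": "VERB",
}

# Assumption: nouns, adjectives, and verbs are treated as semantic words.
KEPT_TAGS = {"NOUN", "ADJ", "VERB"}


def filter_semantic_words(description, tags=TOY_POS_TAGS, kept=KEPT_TAGS):
    """Return the words of `description` whose tag is in `kept`, in order.

    Words missing from the lookup table have no tag and are dropped.
    """
    # Lowercase and strip simple punctuation before splitting on whitespace.
    cleaned = description.lower().replace(",", " ").replace(".", " ")
    tokens = cleaned.split()
    return [token for token in tokens if tags.get(token) in kept]


if __name__ == "__main__":
    print(filter_semantic_words("A red bus is parked on the street."))
```

Under these assumptions, determiners, auxiliaries, and prepositions are discarded before the text is encoded, so only attribute- and object-bearing words would reach the text encoder.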


research
06/18/2022

Multi-Modality Image Inpainting using Generative Adversarial Networks

Deep learning techniques, especially Generative Adversarial Networks (GA...
research
07/23/2018

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

Recent studies on unsupervised image-to-image translation have made rema...
research
05/28/2019

Video-to-Video Translation for Visual Speech Synthesis

Despite remarkable success in image-to-image translation that celebrates...
research
08/10/2020

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach

Manipulating visual attributes of images through human-written text is a...
research
12/12/2019

ManiGAN: Text-Guided Image Manipulation

The goal of our paper is to semantically edit parts of an image to match...
research
10/23/2020

Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

We propose a novel lightweight generative adversarial network for effici...
research
12/18/2019

CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Typical methods for text-to-image synthesis seek to design effective gen...
