DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation

11/05/2020
by Zhenxing Zhang, et al.

Most existing text-to-image generation methods adopt a multi-stage modular architecture, which has three significant problems: (1) training multiple networks increases run time and can hurt the convergence and stability of the generative model; (2) these approaches ignore the quality of the images produced by early-stage generators; and (3) many discriminators must be trained. To this end, we propose the Dual Attention Generative Adversarial Network (DTGAN), which can synthesize high-quality and visually realistic images using only a single generator/discriminator pair. The proposed model introduces channel-aware and pixel-aware attention modules that guide the generator to focus on text-relevant channels and pixels based on the global sentence vector, and that refine the original feature maps using the resulting attention weights. In addition, Conditional Adaptive Instance-Layer Normalization (CAdaILN) is presented to help our attention modules flexibly control the amount of change in shape and texture according to the input natural-language description. Furthermore, a new type of visual loss is utilized to enhance image quality by encouraging vivid shapes and perceptually uniform color distributions in generated images. Experimental results on benchmark datasets demonstrate the superiority of our proposed method over state-of-the-art models with a multi-stage framework. Visualization of the attention maps shows that the channel-aware attention module is able to localize discriminative regions, while the pixel-aware attention module captures global visual content for the generation of an image.
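To make the abstract's three mechanisms concrete, below is a minimal PyTorch-style sketch of text-conditioned channel attention, pixel attention, and CAdaILN. The class names, dimensions, sigmoid gating, and the learnable mixing ratio rho are illustrative assumptions inferred from the abstract (rho follows the AdaILN convention), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ChannelAwareAttention(nn.Module):
    """Rescale feature-map channels by weights predicted from the
    global sentence vector (illustrative formulation)."""
    def __init__(self, num_channels, sent_dim):
        super().__init__()
        self.fc = nn.Linear(sent_dim, num_channels)

    def forward(self, x, sent):            # x: (B, C, H, W), sent: (B, D)
        w = torch.sigmoid(self.fc(sent))   # (B, C) weights in [0, 1]
        return x * w.unsqueeze(-1).unsqueeze(-1)

class PixelAwareAttention(nn.Module):
    """Predict a spatial attention map from the sentence vector and the
    feature map, emphasizing text-relevant pixels."""
    def __init__(self, num_channels, sent_dim):
        super().__init__()
        self.sent_proj = nn.Linear(sent_dim, num_channels)
        self.conv = nn.Conv2d(num_channels, 1, kernel_size=1)

    def forward(self, x, sent):
        s = self.sent_proj(sent).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        m = torch.sigmoid(self.conv(x * s))                   # (B, 1, H, W)
        return x * m

class CAdaILN(nn.Module):
    """Conditional Adaptive Instance-Layer Normalization: mix instance
    and layer statistics with a learnable ratio rho, then apply a
    scale/shift predicted from the sentence vector."""
    def __init__(self, num_channels, sent_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.9))
        self.gamma = nn.Linear(sent_dim, num_channels)
        self.beta = nn.Linear(sent_dim, num_channels)

    def forward(self, x, sent):
        in_mean = x.mean(dim=[2, 3], keepdim=True)             # per-instance stats
        in_var = x.var(dim=[2, 3], keepdim=True)
        ln_mean = x.mean(dim=[1, 2, 3], keepdim=True)          # per-layer stats
        ln_var = x.var(dim=[1, 2, 3], keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        x_hat = rho * x_in + (1.0 - rho) * x_ln                # mixed normalization
        gamma = self.gamma(sent).unsqueeze(-1).unsqueeze(-1)   # text-conditioned scale
        beta = self.beta(sent).unsqueeze(-1).unsqueeze(-1)     # text-conditioned shift
        return x_hat * gamma + beta
```

In this sketch, both attention modules return a reweighted copy of the input feature map, so they can be dropped between generator blocks, and CAdaILN replaces an unconditional normalization layer wherever the sentence vector is available.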


Related Research

11/17/2021 · DiverGAN: An Efficient and Effective Single-Stage Framework for Diverse Text-to-Image Generation
In this paper, we present an efficient and effective single-stage framew...

05/17/2023 · Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation
The goal of a speech-to-image transform is to produce a photo-realistic ...

03/14/2019 · MirrorGAN: Learning Text-to-image Generation by Redescription
Generating an image from a given text description has two goals: visual ...

09/16/2019 · Controllable Text-to-Image Generation
In this paper, we propose a novel controllable text-to-image generative ...

05/07/2019 · Spatially Constrained Generative Adversarial Networks for Conditional Image Generation
Image generation has raised tremendous attention in both academic and in...

10/29/2018 · Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
This paper addresses the problem of manipulating images using natural la...

04/22/2022 · Recurrent Affine Transformation for Text-to-image Synthesis
Text-to-image synthesis aims to generate natural images conditioned on t...
