Recurrent Affine Transformation for Text-to-image Synthesis

04/22/2022
by   Senmao Ye, et al.

Text-to-image synthesis aims to generate natural images conditioned on text descriptions. The main difficulty of this task lies in effectively fusing text information into the image synthesis process. Existing methods usually fuse text information into the synthesis process adaptively through multiple isolated fusion blocks (e.g., Conditional Batch Normalization and Instance Normalization). However, isolated fusion blocks not only conflict with each other but also make training more difficult (see first page of the supplementary). To address these issues, we propose a Recurrent Affine Transformation (RAT) for Generative Adversarial Networks that connects all the fusion blocks with a recurrent neural network to model their long-term dependency. In addition, to improve semantic consistency between texts and synthesized images, we incorporate a spatial attention model in the discriminator. Because the discriminator is aware of the matching image regions, text descriptions supervise the generator to synthesize more relevant image content. Extensive experiments on the CUB, Oxford-102 and COCO datasets demonstrate the superiority of the proposed model in comparison to state-of-the-art models. Code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis.git
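
The idea of connecting fusion blocks through a recurrent network can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a plain tanh recurrent cell, a single sentence embedding fed in at every step, and per-channel scale and shift parameters predicted from the hidden state. The names `init_params`, `recurrent_affine`, `W_xh`, `W_hh`, `W_g` and `W_b` are hypothetical, and the convolution and upsampling layers that would sit between fusion blocks in a real generator are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(text_dim, hidden_dim, channels):
    """Random weights for one shared recurrent cell plus one output head per block.

    All shapes and names are illustrative assumptions, not taken from the paper.
    """
    s = 0.1
    return {
        "W_xh": rng.normal(0, s, (text_dim, hidden_dim)),
        "W_hh": rng.normal(0, s, (hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        # One (scale, shift) head per fusion block, since channel counts may differ.
        "heads": [
            {"W_g": rng.normal(0, s, (hidden_dim, c)),
             "W_b": rng.normal(0, s, (hidden_dim, c))}
            for c in channels
        ],
    }

def recurrent_affine(features, text, params):
    """Apply a text-conditioned affine transformation to each feature map.

    features: list of arrays shaped (C_t, H_t, W_t), one per fusion block.
    text:     sentence embedding shaped (text_dim,).
    The hidden state h is carried from block to block, so the parameters of
    later blocks depend on what earlier blocks received.
    """
    h = np.zeros(params["b_h"].shape[0])
    outputs = []
    for x, head in zip(features, params["heads"]):
        # Recurrent update shared by all blocks (assumed tanh cell).
        h = np.tanh(text @ params["W_xh"] + h @ params["W_hh"] + params["b_h"])
        gamma = 1.0 + h @ head["W_g"]  # per-channel scale, identity when W_g is zero
        beta = h @ head["W_b"]         # per-channel shift
        # Broadcast (C,) parameters over the spatial dimensions.
        outputs.append(gamma[:, None, None] * x + beta[:, None, None])
    return outputs

if __name__ == "__main__":
    params = init_params(text_dim=8, hidden_dim=16, channels=[4, 2])
    feats = [rng.normal(size=(4, 3, 3)), rng.normal(size=(2, 6, 6))]
    sentence = rng.normal(size=8)
    for out in recurrent_affine(feats, sentence, params):
        print(out.shape)
```

By contrast, isolated fusion blocks would each compute their scale and shift from the text embedding alone, with no shared state between blocks; the carried hidden state `h` is what the sketch uses to stand in for the dependency between blocks.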


research
08/13/2020

DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis

Synthesizing high-resolution realistic images from text descriptions is ...
research
02/17/2023

Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis

Text-to-image synthesis refers to generating visual-realistic and semant...
research
04/01/2021

Text to Image Generation with Semantic-Spatial Aware GAN

A text to image generation (T2I) model aims to generate photo-realistic ...
research
12/18/2019

CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Typical methods for text-to-image synthesis seek to design effective gen...
research
11/05/2020

DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation

Most existing text-to-image generation methods adopt a multi-stage modul...
research
09/07/2023

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Due to the difficulty in scaling up, generative adversarial networks (GA...
research
12/18/2018

Recurrent Calibration Network for Irregular Text Recognition

Scene text recognition has received increased attention in the research ...
