STransGAN: An Empirical Study on Transformer in GANs

10/25/2021
by Rui Xu, et al.

Transformers have become prevalent in computer vision, especially for high-level vision tasks. However, deploying Transformers within the generative adversarial network (GAN) framework remains an open and challenging problem. In this paper, we conduct a comprehensive empirical study of the intrinsic properties of Transformers in GANs for high-fidelity image synthesis. Our analysis highlights the importance of feature locality in image generation. We first investigate effective ways to implement local attention. We then examine the influence of residual connections in self-attention layers and propose a novel way to reduce their negative impact on learning discriminators and conditional generators. Our study leads to a new Transformer design for GANs: a convolutional neural network (CNN)-free generator, termed STrans-G, which achieves competitive results in both unconditional and conditional image generation. The Transformer-based discriminator, STrans-D, also significantly narrows the gap with CNN-based discriminators.
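To make the notion of "feature locality" concrete, below is a minimal sketch of window-based local self-attention over an image feature map, the general family of attention the abstract refers to. The module name LocalSelfAttention, the window size, and all dimensions are illustrative assumptions, not the paper's exact STrans-G design.

```python
# Minimal sketch of windowed (local) self-attention on a (B, C, H, W) feature map.
# Assumes PyTorch; window size and dims are illustrative, not the paper's settings.
import torch
import torch.nn as nn


class LocalSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W); H and W are assumed divisible by the window size.
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows.
        x = x.view(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        # Attention is restricted to tokens within the same window (locality).
        out, _ = self.attn(x, x, x, need_weights=False)
        # Reverse the window partition back to (B, C, H, W).
        out = out.view(B, H // w, W // w, w, w, C)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return out


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    print(LocalSelfAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch each token only attends to the other tokens in its own window, which is one common way to inject the locality prior that the study finds important for image generation.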
