TransGAN: Two Transformers Can Make One Strong GAN

02/14/2021
by Yifan Jiang, et al.

The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks such as classification, detection, and segmentation. But how much further can transformers go - are they ready to take on more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN benefits notably from data augmentation (more so than standard GANs), from a multi-task co-training strategy for the generator, and from a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up to larger models and higher-resolution image datasets. Our best architecture achieves highly competitive performance compared with current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a new state-of-the-art IS of 10.10 and FID of 25.32 on STL-10. It also reaches a competitive IS of 8.64 and FID of 11.89 on CIFAR-10, and an FID of 12.23 on CelebA 64×64. We conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at <https://github.com/VITA-Group/TransGAN>.
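The abstract only sketches the generator design, so the following is a minimal, hypothetical PyTorch sketch of the idea it describes: a convolution-free generator that treats pixels as tokens and alternates transformer stages with pixel-shuffle upsampling, so that spatial resolution grows while embedding dimension shrinks. All module names, dimensions, depths, and the upsampling scheme here are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch (not the authors' code): a convolution-free generator whose
# stages double spatial resolution via pixel-shuffle while dividing the embedding
# dimension by 4, keeping token memory roughly constant as resolution grows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerStage(nn.Module):
    """A small stack of standard transformer encoder blocks over pixel tokens."""
    def __init__(self, dim, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):  # x: (B, H*W, dim)
        return self.blocks(x)

class ToyTransGANGenerator(nn.Module):
    def __init__(self, latent_dim=128, base_dim=256, init_res=8, out_res=32):
        super().__init__()
        self.init_res, self.base_dim, self.out_res = init_res, base_dim, out_res
        # Project the latent vector to an initial grid of pixel tokens.
        self.fc = nn.Linear(latent_dim, init_res * init_res * base_dim)
        stages, res, dim = [], init_res, base_dim
        while res < out_res:
            stages.append(TransformerStage(dim))
            res, dim = res * 2, dim // 4   # pixel-shuffle: 2x resolution, dim/4
        stages.append(TransformerStage(dim))
        self.stages = nn.ModuleList(stages)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, z):
        B = z.size(0)
        x = self.fc(z).reshape(B, self.init_res ** 2, self.base_dim)
        res, dim = self.init_res, self.base_dim
        for stage in self.stages:
            x = stage(x)
            if res < self.out_res:  # upsample the token grid between stages
                x = x.transpose(1, 2).reshape(B, dim, res, res)
                x = F.pixel_shuffle(x, 2)          # (B, dim//4, 2*res, 2*res)
                res, dim = res * 2, dim // 4
                x = x.reshape(B, dim, res * res).transpose(1, 2)
        img = self.to_rgb(x)                       # (B, H*W, 3)
        return img.transpose(1, 2).reshape(B, 3, self.out_res, self.out_res)

# Usage: z = torch.randn(4, 128); imgs = ToyTransGANGenerator()(z)  # (4, 3, 32, 32)
```

A patch-level transformer discriminator, in the same spirit, would simply embed image patches as tokens (ViT-style) and attach a real/fake classification head to a class token.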

