Improved Transformer for High-Resolution GANs

06/14/2021
by Long Zhao, et al.

Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but they suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation with Generative Adversarial Networks (GANs). In this paper, we introduce two key ingredients to the Transformer to address this challenge. First, in the low-resolution stages of the generative process, standard global self-attention is replaced with the proposed multi-axis blocked self-attention, which allows efficient mixing of local and global attention. Second, in the high-resolution stages, we drop self-attention and keep only multi-layer perceptrons, reminiscent of implicit neural functions. To further improve performance, we introduce an additional self-modulation component based on cross-attention. The resulting model, denoted HiT, has linear computational complexity with respect to the image size and thus scales directly to synthesizing high-definition images. Our experiments show that HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet 128 × 128 and FFHQ 256 × 256, respectively, with reasonable throughput. We believe HiT is an important milestone toward GAN generators that are completely free of convolutions.
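
As a concrete illustration of the first ingredient, the sketch below is a minimal, hypothetical PyTorch rendering of multi-axis blocked self-attention, not the authors' implementation: half of the channels attend locally within non-overlapping b × b blocks, while the other half attend globally across a block-strided (dilated) grid, so each attention call covers far fewer tokens than full self-attention over all H × W positions. The class name MultiAxisBlockAttention, the even channel split, and the default block size are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code) of multi-axis blocked
# self-attention: one half of the channels attends inside local blocks,
# the other half attends across a dilated grid of blocks.
import torch
import torch.nn as nn

class MultiAxisBlockAttention(nn.Module):
    def __init__(self, dim, heads=4, block=8):
        super().__init__()
        assert dim % 2 == 0 and (dim // 2) % heads == 0
        self.block = block
        half = dim // 2
        self.local_attn = nn.MultiheadAttention(half, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(half, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C) feature map; H and W must be divisible by `block`.
        B, H, W, C = x.shape
        b, c = self.block, C // 2
        xl, xg = x.split(c, dim=-1)

        # Local branch: self-attention restricted to each b x b block.
        xl = xl.reshape(B, H // b, b, W // b, b, c)
        xl = xl.permute(0, 1, 3, 2, 4, 5).reshape(-1, b * b, c)
        xl, _ = self.local_attn(xl, xl, xl)
        xl = xl.reshape(B, H // b, W // b, b, b, c)
        xl = xl.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, c)

        # Global branch: self-attention across blocks, between pixels that
        # share the same position within their block (a dilated grid).
        xg = xg.reshape(B, H // b, b, W // b, b, c)
        xg = xg.permute(0, 2, 4, 1, 3, 5).reshape(-1, (H // b) * (W // b), c)
        xg, _ = self.global_attn(xg, xg, xg)
        xg = xg.reshape(B, b, b, H // b, W // b, c)
        xg = xg.permute(0, 3, 1, 4, 2, 5).reshape(B, H, W, c)

        return self.proj(torch.cat([xl, xg], dim=-1))
```

Under these assumptions, a 32 × 32 feature map with block size 8 yields local sequences of 64 tokens and global sequences of 16 tokens, instead of a single sequence of 1024:

```python
attn = MultiAxisBlockAttention(dim=256, heads=4, block=8)
out = attn(torch.randn(2, 32, 32, 256))  # -> torch.Size([2, 32, 32, 256])
```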

Related research

05/21/2018 | Self-Attention Generative Adversarial Networks
In this paper, we propose the Self-Attention Generative Adversarial Netw...

04/04/2022 | MaxViT: Multi-Axis Vision Transformer
Transformers have recently gained significant attention in the computer ...

03/19/2023 | Q-RBSA: High-Resolution 3D EBSD Map Generation Using An Efficient Quaternion Transformer Network
Gathering 3D material microstructural information is time-consuming, exp...

08/26/2021 | Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?
In this paper we address the problem of fine-tuned text generation with ...

08/11/2022 | Deep is a Luxury We Don't Have
Medical images come in high resolutions. A high resolution is vital for ...

02/15/2018 | Image Transformer
Image generation has been successfully cast as an autoregressive sequenc...

08/18/2023 | Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images
Many medical or pharmaceutical processes have strict guidelines regardin...
