Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

11/05/2021
by Yanhong Zeng, et al.

We present a new perspective on image synthesis by viewing the task as a visual token generation problem. Unlike existing paradigms that synthesize a full image directly from a single input (e.g., a latent code), the new formulation enables flexible local manipulation of different image regions, making it possible to learn content-aware and fine-grained style control for image synthesis. Specifically, the model takes a sequence of latent tokens as input and predicts the visual tokens for synthesizing an image. Under this perspective, we propose a token-based generator (i.e., TokenGAN). In particular, the TokenGAN takes two semantically different kinds of tokens as input: learned constant content tokens and style tokens drawn from the latent space. Given a sequence of style tokens, the TokenGAN controls the image synthesis by assigning the styles to the content tokens through an attention mechanism with a Transformer. We conduct extensive experiments and show that the proposed TokenGAN achieves state-of-the-art results on several widely used image synthesis benchmarks, including FFHQ and LSUN CHURCH at different resolutions. In particular, the generator is able to synthesize high-fidelity images at 1024x1024 resolution, dispensing with convolutions entirely.
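The core operation described above, assigning styles to content tokens with attention, can be sketched as a single cross-attention step in which the learned constant content tokens act as queries and the style tokens act as keys and values. The following is a minimal, illustrative sketch (not the paper's implementation): the function name `assign_styles`, the projection matrices, and all dimensions are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def assign_styles(content_tokens, style_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: each content token gathers a
    weighted mixture of style tokens (a sketch of the idea, not
    the paper's exact architecture)."""
    Q = content_tokens @ Wq              # queries from content tokens
    K = style_tokens @ Wk                # keys from style tokens
    V = style_tokens @ Wv                # values from style tokens
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores, axis=-1)      # (n_content, n_style)
    return attn @ V                      # styled content tokens

rng = np.random.default_rng(0)
d = 16                                    # token dimension (illustrative)
content = rng.normal(size=(64, d))        # learned constant content tokens
styles = rng.normal(size=(8, d))          # style tokens from the latent space
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

out = assign_styles(content, styles, Wq, Wk, Wv)
print(out.shape)  # (64, 16): one styled vector per content token
```

Because every content token attends to the style tokens independently, different image regions can receive different style mixtures, which is what enables the local, fine-grained control the abstract describes.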


Related research

- 10/12/2021: Fine-grained style control in Transformer-based Text-to-speech Synthesis. In this paper, we present a novel architecture to realize fine-grained s...
- 11/01/2017: Uncovering Latent Style Factors for Expressive Speech Synthesis. Prosodic modeling is a core problem in speech synthesis. The key challen...
- 08/19/2021: Controlled GAN-Based Creature Synthesis via a Challenging Game Art Dataset – Addressing the Noise-Latent Trade-Off. The state-of-the-art StyleGAN2 network supports powerful methods to crea...
- 02/08/2022: MaskGIT: Masked Generative Image Transformer. Generative transformers have experienced rapid popularity growth in the ...
- 05/10/2022: Reduce Information Loss in Transformers for Pluralistic Image Inpainting. Transformers have achieved great success in pluralistic image inpainting...
- 07/15/2021: StyleFusion: A Generative Model for Disentangling Spatial Segments. We present StyleFusion, a new mapping architecture for StyleGAN, which t...
- 11/27/2020: Image Generators with Conditionally-Independent Pixel Synthesis. Existing image generator networks rely heavily on spatial convolutions a...
