StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

01/23/2023
by Axel Sauer, et al.

Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) need only a single forward pass. They are thus much faster, but they currently remain far behind the state of the art in large-scale text-to-image synthesis. This paper aims to identify the steps necessary to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and a controllable tradeoff between variation and text alignment. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models (the previous state of the art in fast text-to-image synthesis) in terms of sample quality and speed.


