Zero-Shot Text-to-Image Generation

02/24/2021
by Aditya Ramesh, et al.

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
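The single-stream idea can be sketched in a few lines: text tokens and image tokens are mapped into disjoint id ranges and concatenated, so one transformer can model them autoregressively with ordinary next-token prediction. The vocabulary sizes, token ids, and helper names below are illustrative assumptions, not the paper's actual values.

```python
# Hedged sketch: text and image tokens joined into one stream for
# autoregressive modeling. All sizes here are illustrative assumptions.

TEXT_VOCAB = 16384   # assumed text (BPE) vocabulary size
IMAGE_VOCAB = 8192   # assumed image codebook size (e.g. from a discrete VAE)

def build_stream(text_tokens, image_tokens):
    """Concatenate text and image tokens into a single sequence.

    Image token ids are offset past the text vocabulary so both
    modalities can share one embedding table and one softmax.
    """
    offset_image = [t + TEXT_VOCAB for t in image_tokens]
    return text_tokens + offset_image

def next_token_pairs(stream):
    """(input, target) pairs for autoregressive training:
    at each position, the model is trained to predict the next token."""
    return list(zip(stream[:-1], stream[1:]))

# Toy example with made-up token ids.
text = [5, 42, 7]
image = [0, 3, 1]
stream = build_stream(text, image)   # [5, 42, 7, 16384, 16387, 16385]
pairs = next_token_pairs(stream)
```

At generation time the same model is given only the text prefix and samples image tokens one at a time, which is what allows zero-shot generation from an arbitrary caption.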


