CogView: Mastering Text-to-Image Generation via Transformers

05/26/2021
by Ming Ding, et al.

Text-to-image generation in the general domain has long been an open problem, requiring both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate finetuning strategies for various downstream tasks (e.g., style learning, super-resolution, text-image ranking, and fashion design) and methods to stabilize pretraining (e.g., eliminating NaN losses). CogView achieves a new state-of-the-art FID on blurred MS COCO in the zero-shot setting, outperforming previous GAN-based models and DALL-E, a recent work with a similar approach.
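The two-stage recipe the abstract describes (discretize the image with a VQ-VAE tokenizer, then model the concatenated text and image tokens with an autoregressive Transformer) can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: every class name, vocabulary size, and the toy nearest-neighbor quantizer are assumptions, and the tiny pre-norm Transformer stands in for both the 4-billion-parameter model and the Sandwich-LayerNorm / PB-relaxation tricks the paper uses to stabilize training.

```python
# Minimal sketch (assumed names/sizes, not the released CogView code):
#   1. a VQ-VAE-style tokenizer turns image latents into discrete tokens;
#   2. a decoder-only Transformer models [text tokens, image tokens]
#      autoregressively under a causal mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVectorQuantizer(nn.Module):
    """Maps continuous latents to indices of the nearest codebook vector."""
    def __init__(self, num_codes=8192, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (B, N, dim)
        codes = self.codebook.weight.expand(z.size(0), -1, -1)
        return torch.cdist(z, codes).argmin(dim=-1)         # (B, N) token ids

class ToyCogView(nn.Module):
    """Decoder-only Transformer over text tokens followed by image tokens."""
    def __init__(self, text_vocab=50000, image_vocab=8192, dim=256,
                 depth=4, heads=8, max_len=1088):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.image_emb = nn.Embedding(image_vocab, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        # Plain pre-norm layers; the paper instead uses Sandwich-LayerNorm
        # and PB-relaxation to avoid NaN losses at 4B-parameter scale.
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.to_logits = nn.Linear(dim, image_vocab)

    def forward(self, text_tokens, image_tokens):
        x = torch.cat([self.text_emb(text_tokens),
                       self.image_emb(image_tokens)], dim=1)
        x = x + self.pos_emb(torch.arange(x.size(1), device=x.device))
        # Causal mask: position i may only attend to positions <= i.
        L = x.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf"),
                                     device=x.device), diagonal=1)
        h = self.blocks(x, mask=mask)
        # Hidden states that predict each next image token.
        t = text_tokens.size(1)
        return self.to_logits(h[:, t - 1:-1])               # (B, N, image_vocab)

text = torch.randint(0, 50000, (2, 64))               # toy text tokens
img = ToyVectorQuantizer()(torch.randn(2, 1024, 64))  # 32x32 token grid
logits = ToyCogView()(text, img)
loss = F.cross_entropy(logits.reshape(-1, 8192), img.reshape(-1))
```

At generation time the Transformer samples image tokens one at a time conditioned on the text prompt, and the VQ-VAE decoder (omitted above) maps the sampled token grid back to pixels.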

research · 04/28/2022 · CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
The development of the transformer-based text-to-image models is impeded...

research · 03/26/2020 · StrokeCoder: Path-Based Image Generation from Single Examples using Transformers
This paper demonstrates how a Transformer Neural Network can be used to ...

research · 08/18/2023 · DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
Recently, large-scale diffusion models, e.g., Stable Diffusion and DALL-E...

research · 03/02/2023 · X&Fuse: Fusing Visual Information in Text-to-Image Generation
We introduce X&Fuse, a general approach for conditioning on visual inf...

research · 11/22/2021 · L-Verse: Bidirectional Generation Between Image and Text
Far beyond learning long-range interactions of natural language, transfo...

research · 10/25/2022 · Lafite2: Few-shot Text-to-Image Generation
Text-to-image generation models have progressed considerably in recent y...

research · 06/06/2023 · On the Difference of BERT-style and CLIP-style Text Encoders
Masked language modeling (MLM) has been one of the most popular pretrain...
