Lafite2: Few-shot Text-to-Image Generation

10/25/2022
by Yufan Zhou, et al.

Text-to-image generation models have progressed considerably in recent years and can now generate impressively realistic images from arbitrary text. Most such models are trained on web-scale image-text paired datasets, which many researchers cannot afford. In this paper, we propose a novel method for pre-training text-to-image generation models on image-only datasets. It uses a retrieval-then-optimization procedure to synthesize pseudo text features: for a given image, relevant pseudo text features are first retrieved, then optimized for better alignment. The low requirements of the proposed method yield high flexibility and usability: it benefits a wide range of settings, including few-shot, semi-supervised and fully-supervised learning, and it can be applied to different models, including generative adversarial networks (GANs) and diffusion models. Extensive experiments illustrate the effectiveness of the proposed method. On the MS-COCO dataset, our GAN model obtains a Fréchet Inception Distance (FID) of 6.78, the new state of the art (SoTA) for GANs under the fully-supervised setting. Our diffusion model obtains FIDs of 8.42 and 4.28 in the zero-shot and supervised settings respectively, which are competitive with SoTA diffusion models at a much smaller model size.
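The retrieval-then-optimization idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature encoder, the retrieval rule, or the alignment objective. The sketch assumes features are plain vectors in a shared embedding space, retrieval is a top-k cosine-similarity lookup in a bank of text features, and "better alignment" means a few capped steps of gradient ascent on cosine similarity to the image feature. The names `retrieve_then_optimize`, `text_bank`, `k`, `steps` and `lr` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_then_optimize(image_feat, text_bank, k=4, steps=5, lr=0.1):
    """Hypothetical sketch: build a pseudo text feature for one image.

    image_feat: (d,) image feature vector.
    text_bank:  (n, d) bank of candidate text features.
    Returns the retrieved feature and the optimized feature, both unit norm.
    """
    img = l2_normalize(image_feat)
    bank = l2_normalize(text_bank)

    # Retrieval (assumed rule): average the k bank entries with the
    # highest cosine similarity to the image feature.
    sims = bank @ img
    top_k = np.argsort(-sims)[:k]
    retrieved = l2_normalize(bank[top_k].mean(axis=0))

    # Optimization (assumed objective): gradient ascent on cosine
    # similarity, re-projected onto the unit sphere after each step.
    pseudo = retrieved.copy()
    for _ in range(steps):
        # Component of img orthogonal to pseudo: the ascent direction
        # for (pseudo . img) under the unit-norm constraint.
        grad = img - (pseudo @ img) * pseudo
        pseudo = l2_normalize(pseudo + lr * grad)
    return retrieved, pseudo


# Usage with random stand-in features (no real encoder involved).
rng = np.random.default_rng(0)
demo_bank = rng.normal(size=(100, 16))
demo_image = rng.normal(size=16)
demo_retrieved, demo_optimized = retrieve_then_optimize(demo_image, demo_bank)
print(demo_retrieved @ l2_normalize(demo_image),
      demo_optimized @ l2_normalize(demo_image))
```

The step count is deliberately small in this sketch: run to convergence, this particular objective would simply reproduce the image feature, so the cap stands in for whatever regularization an actual method would use.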


research
11/24/2022

Shifted Diffusion for Text-to-image Generation

We present Corgi, a novel method for text-to-image generation. Corgi is ...
research
11/27/2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

One of the major challenges in training text-to-image generation models ...
research
03/29/2023

WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models

Text-to-Image synthesis is the task of generating an image according to ...
research
05/26/2021

CogView: Mastering Text-to-Image Generation via Transformers

Text-to-Image generation in the general domain has long been an open pro...
research
03/28/2023

Variational Distribution Learning for Unsupervised Text-to-Image Generation

We propose a text-to-image generation algorithm based on deep neural net...
research
11/11/2022

StrokeGAN+: Few-Shot Semi-Supervised Chinese Font Generation with Stroke Encoding

The generation of Chinese fonts has a wide range of applications. The cu...
research
09/12/2023

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Diffusion models have revolutionized text-to-image generation with its e...
