LAFITE: Towards Language-Free Training for Text-to-Image Generation

11/27/2021
by   Yufan Zhou, et al.

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs. While image samples are often easily accessible, the associated text descriptions typically require careful human captioning, which is particularly time-consuming and costly. In this paper, we present the first work to train text-to-image generation models without any text data. Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model: the requirement for text conditioning is removed by generating text features from image features. Extensive experiments illustrate the effectiveness of the proposed method. We obtain state-of-the-art results on the standard text-to-image generation tasks. Importantly, the proposed language-free model outperforms most existing models trained with full image-text pairs. Furthermore, our method can be applied to fine-tuning pre-trained models, which reduces both the time and the cost of training text-to-image generation models. Our pre-trained model obtains competitive results in zero-shot text-to-image generation on the MS-COCO dataset, yet with around only 1% of the model size and training data size relative to the recently proposed large DALL-E model.
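The idea of generating text features from image features can be illustrated with a minimal sketch. The abstract does not say how the features are generated, so the scheme below (perturbing an image feature with scaled Gaussian noise and renormalizing it) is an assumption made for illustration only. The function name `pseudo_text_feature`, the `noise_level` parameter, and the random 512-dimensional vector standing in for a CLIP image embedding are all hypothetical.

```python
import math
import random


def pseudo_text_feature(image_feature, noise_level=0.1, rng=None):
    """Return a unit-norm 'text-like' feature near the given image feature.

    Assumption (not stated in the abstract): the pseudo text feature is the
    image feature plus Gaussian noise whose length is noise_level times the
    length of the image feature, renormalized to unit length.
    """
    rng = rng or random.Random()
    norm_h = math.sqrt(sum(v * v for v in image_feature))
    eps = [rng.gauss(0.0, 1.0) for _ in image_feature]
    norm_eps = math.sqrt(sum(e * e for e in eps))
    # Rescale the noise so its length is noise_level * ||image_feature||.
    scale = noise_level * norm_h / norm_eps
    perturbed = [v + scale * e for v, e in zip(image_feature, eps)]
    norm_p = math.sqrt(sum(p * p for p in perturbed))
    return [p / norm_p for p in perturbed]


# Stand-in for an image embedding; in practice this would come from a
# pre-trained image encoder such as CLIP's (not loaded in this sketch).
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(512)]
raw_norm = math.sqrt(sum(v * v for v in raw))
image_feature = [v / raw_norm for v in raw]

text_like = pseudo_text_feature(image_feature, noise_level=0.1, rng=rng)
cosine = sum(a * b for a, b in zip(text_like, image_feature))
```

Under these assumptions the pseudo text feature stays close to the image feature in cosine similarity, so a generator conditioned on it during training never needs a real caption. At inference time a real text embedding from the same joint space would take its place.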

research
11/24/2022

Shifted Diffusion for Text-to-image Generation

We present Corgi, a novel method for text-to-image generation. Corgi is ...
research
05/24/2023

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

The recent popularity of text-to-image diffusion models (DM) can largely...
research
05/23/2023

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Recent text-to-image generation models have demonstrated impressive capa...
research
04/06/2023

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Recent advances in personalized image generation allow a pre-trained tex...
research
06/26/2023

Localized Text-to-Image Generation for Free via Cross Attention Control

Despite the tremendous success in text-to-image generative models, local...
research
12/07/2021

A Generic Approach for Enhancing GANs by Regularized Latent Optimization

With the rapidly growing model complexity and data volume, training deep...
research
10/25/2022

Lafite2: Few-shot Text-to-Image Generation

Text-to-image generation models have progressed considerably in recent y...
