Shifted Diffusion for Text-to-image Generation

11/24/2022
by   Yufan Zhou, et al.
0

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better in generating image embedding from the text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments are conducted and evaluated in terms of both quantitative measures and human evaluation, indicating a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7 our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.

READ FULL TEXT

page 2

page 7

page 14

page 15

page 16

page 17

page 18

page 19

research
11/27/2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

One of the major challenges in training text-to-image generation models ...
research
10/25/2022

Lafite2: Few-shot Text-to-Image Generation

Text-to-image generation models have progressed considerably in recent y...
research
01/17/2023

GLIGEN: Open-Set Grounded Text-to-Image Generation

Large-scale text-to-image diffusion models have made amazing advances. H...
research
03/28/2023

Variational Distribution Learning for Unsupervised Text-to-Image Generation

We propose a text-to-image generation algorithm based on deep neural net...
research
05/21/2018

Turbo Learning for Captionbot and Drawingbot

We study in this paper the problems of both image captioning and text-to...
research
05/22/2023

The CLIP Model is Secretly an Image-to-Prompt Converter

The Stable Diffusion model is a prominent text-to-image generation model...
research
03/13/2023

ODIN: On-demand Data Formulation to Mitigate Dataset Lock-in

ODIN is an innovative approach that addresses the problem of dataset con...

Please sign up or login with your details

Forgot password? Click here to reset