Variational Distribution Learning for Unsupervised Text-to-Image Generation

03/28/2023
by Minsoo Kang, et al.

We propose a text-to-image generation algorithm based on deep neural networks for the setting in which text captions for images are unavailable during training. Instead of simply generating pseudo-ground-truth sentences for training images with existing image captioning methods, we employ a pretrained CLIP model, which aligns the embeddings of images and their corresponding texts in a joint space and, consequently, works well on zero-shot recognition tasks. We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings. To better align data in the two domains, we adopt a principled approach based on variational inference, which efficiently estimates an approximate posterior of the hidden text embedding given an image and its CLIP feature. Experimental results show that the proposed framework outperforms existing approaches by large margins in both unsupervised and semi-supervised text-to-image generation settings.
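
The variational formulation described in the abstract can be illustrated with a generic conditional evidence lower bound. This is only a sketch under assumed notation, not the objective as stated in the paper: x denotes an image, z_img its CLIP image embedding, z_txt the hidden (unobserved) text embedding, p_theta the generator, and q_phi the approximate posterior. All of these symbols are introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   x      : training image
%   z_img  : CLIP image embedding of x
%   z_txt  : hidden text embedding (latent variable)
%   p_\theta : text-to-image generator,  q_\phi : approximate posterior
\begin{aligned}
\log p_\theta(x \mid z_{\mathrm{img}})
  &= \log \int p_\theta(x \mid z_{\mathrm{img}}, z_{\mathrm{txt}})\,
       p(z_{\mathrm{txt}} \mid z_{\mathrm{img}})\, dz_{\mathrm{txt}} \\
  &\ge \mathbb{E}_{q_\phi(z_{\mathrm{txt}} \mid x,\, z_{\mathrm{img}})}
       \bigl[ \log p_\theta(x \mid z_{\mathrm{img}}, z_{\mathrm{txt}}) \bigr]
     - \mathrm{KL}\bigl( q_\phi(z_{\mathrm{txt}} \mid x, z_{\mathrm{img}})
       \,\big\|\, p(z_{\mathrm{txt}} \mid z_{\mathrm{img}}) \bigr).
\end{aligned}
```

Under this reading, maximizing the right-hand side trains the generator on image-text embedding pairs while keeping the inferred text embedding close to a prior conditioned on the image's CLIP feature. The exact prior, posterior family, and loss terms used by the authors may differ.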


