Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

08/16/2023
by Minho Park, et al.

Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates images and their corresponding semantic layouts. Our experiments reveal that training the model to generate semantic labels for each pixel guides text-to-image generation models to be aware of the semantics of different image regions. We demonstrate that our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets, where text-image pairs are scarce. Code is available at https://pmh9960.github.io/research/GCDP
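The abstract only names the Gaussian-categorical diffusion process, so the following is a minimal sketch of what a joint forward corruption step over an image/layout pair could look like, assuming a DDPM-style Gaussian schedule for the image and a uniform-transition discrete diffusion for the per-pixel labels. The function name, schedule, and number of classes here are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical hyperparameters (assumptions, not from the paper):
# T diffusion steps, a linear beta schedule shared by both modalities,
# and K semantic classes (e.g., 19 for Cityscapes-style layouts).
T, K = 1000, 19
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_diffuse(image, layout, t):
    """Jointly corrupt an image (Gaussian noise) and its one-hot semantic
    layout (uniform categorical noise) at timestep t.

    image:  (B, 3, H, W) tensor in [-1, 1]
    layout: (B, K, H, W) one-hot semantic labels
    t:      (B,) integer timesteps
    """
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)

    # Gaussian branch: standard DDPM-style corruption of the image.
    noise = torch.randn_like(image)
    noisy_image = a_bar.sqrt() * image + (1.0 - a_bar).sqrt() * noise

    # Categorical branch: interpolate the label distribution toward uniform
    # and resample. This uniform transition is a common choice for discrete
    # diffusion and an assumption here, not necessarily the paper's kernel.
    probs = a_bar * layout + (1.0 - a_bar) / K
    flat = probs.permute(0, 2, 3, 1).reshape(-1, K)
    sampled = torch.multinomial(flat, 1).view(image.shape[0], *image.shape[2:])
    noisy_layout = F.one_hot(sampled, K).permute(0, 3, 1, 2).float()

    return noisy_image, noisy_layout, noise
```

A denoising network trained on such pairs would then be asked to recover both the Gaussian noise and the clean labels from (noisy_image, noisy_layout, t), which is one way a single model can learn which semantic class each image region belongs to while learning to generate the image itself.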


