LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

08/09/2023
by Leigang Qu, et al.

In text-to-image generation, recent progress in Stable Diffusion has made it possible to generate a wide variety of novel photorealistic images. However, current models still suffer from misalignment issues (e.g., faulty understanding of spatial relations and numeration failures) in complex natural scenes, which impedes high-faithfulness text-to-image generation. Although recent efforts have improved controllability through fine-grained guidance (e.g., sketches and scribbles), the issue has not been fundamentally resolved, since users have to provide such guidance manually. In this work, we strive to synthesize high-fidelity images that are semantically aligned with a given textual prompt without any such guidance. To this end, we propose a coarse-to-fine paradigm that performs layout planning followed by image generation. Concretely, we first generate a coarse-grained layout conditioned on the given textual prompt via in-context learning with Large Language Models. We then propose a fine-grained object-interaction diffusion method to synthesize high-faithfulness images conditioned on the prompt and the automatically generated layout. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art models in both layout and image generation. Our code and settings are available at https://layoutllm-t2i.github.io.
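
The coarse layout-planning step described above can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the authors' implementation: the `Box` type, the `label: x, y, w, h` text format, and the helper names `build_prompt` and `parse_layout` are all hypothetical, and the call to an actual LLM is deliberately left out. The sketch shows how caption/layout pairs might be assembled into an in-context prompt and how a textual layout returned by a model could be parsed back into boxes.

```python
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical layout element: a labelled box in pixel units."""
    label: str
    x: int
    y: int
    w: int
    h: int


def build_prompt(examples, query):
    """Assemble a few-shot prompt from (caption, boxes) pairs plus a new caption.

    The instruction text and the 'label: x, y, w, h' format are assumptions
    made for this sketch.
    """
    parts = ["Plan an object layout (label: x, y, w, h) for each caption."]
    for caption, boxes in examples:
        layout = "; ".join(
            f"{b.label}: {b.x}, {b.y}, {b.w}, {b.h}" for b in boxes
        )
        parts.append(f"Caption: {caption}\nLayout: {layout}")
    # The final entry is left open so the model completes the layout.
    parts.append(f"Caption: {query}\nLayout:")
    return "\n\n".join(parts)


# One box per match: a label followed by four integers.
_BOX = re.compile(
    r"([A-Za-z ]+?)\s*:\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)"
)


def parse_layout(text):
    """Parse a model completion in the assumed format into Box objects."""
    return [
        Box(m[1].strip(), int(m[2]), int(m[3]), int(m[4]), int(m[5]))
        for m in _BOX.finditer(text)
    ]
```

In this sketch, the string returned by `build_prompt` would be sent to whichever LLM is in use, and the completion passed to `parse_layout`. The resulting boxes, together with the original prompt, would then serve as the conditioning input to the image-generation stage.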

research
02/16/2023

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

Layout-to-image generation refers to the task of synthesizing photo-real...
research
11/25/2022

SpaText: Spatio-Textual Representation for Controllable Image Generation

Recent text-to-image diffusion models are able to generate convincing re...
research
05/24/2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Attaining a high degree of user controllability in visual generation oft...
research
05/23/2023

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Recent text-to-image generation models have demonstrated impressive capa...
research
05/29/2023

Controllable Text-to-Image Generation with GPT-4

Current text-to-image generation models often struggle to follow textual...
research
06/30/2023

Counting Guidance for High Fidelity Text-to-Image Synthesis

Recently, the quality and performance of text-to-image generation signif...
research
04/13/2023

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

Spatial control is a core capability in controllable image generation. A...