Layout-Bridging Text-to-Image Synthesis

08/12/2022
by Jiadong Liang, et al.

The crux of text-to-image synthesis lies in the difficulty of preserving cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, can only capture keywords in the text that indicate common objects or actions, but fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, an approach that a few methods have attempted. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object locations. In this paper, we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them. In the layout-to-image synthesis stage, we focus on learning the textual-visual semantic alignment per object in the layout, so that the input text is precisely incorporated into the layout-to-image synthesis process. To evaluate the quality of the generated layout, we design a dedicated new metric, dubbed Layout Quality Score, which considers both the absolute distribution errors of the bounding boxes in the layout and the mutual spatial relationships between them. Extensive experiments on three datasets demonstrate the superior performance of our method over state-of-the-art methods, both in predicting the layout and in synthesizing the image from the given text.
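
To make the sequence-to-sequence framing concrete, the following is a minimal sketch of how a layout could be flattened into a token sequence that a Transformer decoder might predict. It is an illustration only: the abstract does not specify the tokenization, so the `Box` type, the `quantize` and `layout_to_tokens` helpers, the coordinate binning, and the `<bos>`/`<eos>` markers are all assumptions rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical layout element: a category plus a normalized bounding box."""
    label: str  # object category, e.g. "person"
    x: float    # left edge, normalized to [0, 1]
    y: float    # top edge, normalized to [0, 1]
    w: float    # width, normalized to [0, 1]
    h: float    # height, normalized to [0, 1]


def quantize(v: float, bins: int) -> int:
    """Clamp v to [0, 1] and map it to one of `bins` discrete indices."""
    v = min(max(v, 0.0), 1.0)
    return min(int(v * bins), bins - 1)


def layout_to_tokens(boxes: List[Box], bins: int = 32) -> List[str]:
    """Serialize a layout as: <bos>, then (label, x, y, w, h) per box, then <eos>.

    The token format is an assumption made for illustration.
    """
    tokens = ["<bos>"]
    for b in boxes:
        tokens.append(b.label)
        for name, v in (("x", b.x), ("y", b.y), ("w", b.w), ("h", b.h)):
            tokens.append(f"{name}{quantize(v, bins)}")
    tokens.append("<eos>")
    return tokens


if __name__ == "__main__":
    layout = [Box("person", 0.5, 0.25, 0.2, 1.0)]
    print(layout_to_tokens(layout, bins=4))
```

Under this assumed encoding, each object contributes five tokens, so a model generating the sequence left to right conditions every new box on the categories and positions of the boxes already emitted, which is one way sequential dependencies can carry spatial relationships.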

research
01/16/2018

Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis

We propose a novel hierarchical approach for text-to-image synthesis by ...
research
12/18/2019

CPGAN: Full-Spectrum Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Typical methods for text-to-image synthesis seek to design effective gen...
research
08/19/2019

Seq-SG2SL: Inferring Semantic Layout from Scene Graph Through Sequence to Sequence Learning

Generating semantic layout from scene graph is a crucial intermediate ta...
research
04/06/2022

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

Text logo design heavily relies on the creativity and expertise of profe...
research
08/24/2023

A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions

Creating layouts is a fundamental step in graphic design. In this work, ...
research
09/07/2019

Scene Recognition with Prototype-agnostic Scene Layout

Exploiting the spatial structure in scene images is a key re...
research
10/25/2021

Accelerate 3D Object Processing via Spectral Layout

3D image processing is an important problem in computer vision and patte...
