LayoutBERT: Masked Language Layout Model for Object Insertion

by Kerem Turgutlu, et al.

Image compositing is one of the most fundamental steps in creative workflows. It involves taking objects or parts of several images and combining them into a new image, called a composite. Currently, this process is done manually by creating accurate masks of the objects to be inserted and carefully blending them with the target scene or images, usually with the help of tools such as Photoshop or GIMP. While there have been several works on automatic selection of objects for creating masks, placing an object within an image with the correct position, scale, and harmony remains underexplored. Automatic object insertion in images or designs is difficult because it requires an understanding of the scene geometry and of the color harmony between objects. We propose LayoutBERT for the object insertion task. It uses a novel self-supervised masked language model objective and bidirectional multi-head self-attention. It outperforms previous layout-based likelihood models and shows favorable properties in terms of model capacity. We demonstrate the effectiveness of our approach for object insertion in the image compositing setting and in other settings such as documents and design templates. We further demonstrate the usefulness of the learned representations for layout-based retrieval tasks. We provide both qualitative and quantitative evaluations on datasets from diverse domains: COCO, PubLayNet, and two new datasets that we call Image Layouts and Template Layouts. Image Layouts, which consists of 5.8 million images with layout annotations, is the largest image layout dataset to our knowledge. We also share ablation study results on the effect of dataset size, model size, and class sample size for this task.
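To make the masked-layout idea concrete, the following is a minimal sketch of how a layout might be serialized into tokens and masked. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tokenization, so the `(label, x, y, w, h)` format, the 32-bin quantization, the 15% masking rate, and all function names here are hypothetical.

```python
import random

NUM_BINS = 32      # assumed coordinate quantization resolution (not from the paper)
MASK = "[MASK]"    # assumed mask symbol


def quantize(value, num_bins=NUM_BINS):
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    return min(int(value * num_bins), num_bins - 1)


def layout_to_tokens(layout):
    """Flatten [(label, x, y, w, h), ...] into a single token sequence."""
    tokens = []
    for label, x, y, w, h in layout:
        tokens.append(label)
        tokens.extend(quantize(v) for v in (x, y, w, h))
    return tokens


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly hide tokens; targets hold the hidden values, None elsewhere."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


def insertion_query(layout, new_label):
    """Append the new object's label followed by four masked box slots."""
    return layout_to_tokens(layout) + [new_label] + [MASK] * 4


# Hypothetical scene with two objects, coordinates normalized to [0, 1].
layout = [("person", 0.1, 0.2, 0.3, 0.5), ("dog", 0.6, 0.7, 0.2, 0.2)]
tokens = layout_to_tokens(layout)
inputs, targets = mask_tokens(tokens)
query = insertion_query(layout, "cat")
```

In this sketch, training would ask a bidirectional model to recover the entries of `targets` from `inputs`, and object insertion would amount to filling the four masked slots at the end of `query` with a predicted box for the new object.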




