Modeling Image Composition for Complex Scene Generation

06/02/2022
by Zuopeng Yang, et al.

We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling the textures, structures, and relationships contained in a complex scene. After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) to explore object-to-object, object-to-patch, and patch-to-patch dependencies. Existing CNN-based and Transformer-based generation models entangle modeling at the pixel and patch levels and at the object and patch levels, respectively. In contrast, the proposed focal attention predicts the current patch token by focusing only on its highly related tokens, as specified by the spatial layout, thereby achieving disambiguation during training. Furthermore, TwFA substantially increases data efficiency during training, so we propose the first few-shot complex scene generation strategy based on a well-trained TwFA. Comprehensive experiments show the superiority of our method, which significantly improves both quantitative metrics and qualitative visual realism over state-of-the-art CNN-based and Transformer-based methods. Code is available at https://github.com/JohnDreamer/TwFA.
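The abstract describes focal attention only at a high level: each patch token attends to the tokens the spatial layout marks as highly related, rather than to the whole sequence. The sketch below is a simplified, hypothetical reading of that idea, not the TwFA implementation (the linked repository has the real code). It assumes a sequence laid out as object tokens followed by patch tokens in raster order, bounding boxes given in patch-grid units as `(x0, y0, x1, y1)` with exclusive upper bounds, and a fallback to plain causal attention for patches no box covers. The function names `focal_attention_mask` and `masked_attention` are illustrative only.

```python
import numpy as np


def focal_attention_mask(boxes, grid_h, grid_w):
    """Hypothetical focal-attention mask (True = may attend).

    Sequence layout (assumed): [object tokens..., patch tokens in raster order].
    Boxes are (x0, y0, x1, y1) in patch-grid units, upper bounds exclusive.
    """
    n_obj = len(boxes)
    n_patch = grid_h * grid_w

    # cover[p, o] is True when patch p lies inside the box of object o.
    cover = np.zeros((n_patch, n_obj), dtype=bool)
    for o, (x0, y0, x1, y1) in enumerate(boxes):
        for y in range(y0, y1):
            for x in range(x0, x1):
                cover[y * grid_w + x, o] = True

    n = n_obj + n_patch
    mask = np.zeros((n, n), dtype=bool)

    # Object-to-object: object tokens see each other (they never see patches).
    mask[:n_obj, :n_obj] = True
    # Object-to-patch: a patch sees only the objects whose boxes contain it.
    mask[n_obj:, :n_obj] = cover

    # Patch-to-patch: causal, restricted to earlier patches sharing an object.
    causal = np.tril(np.ones((n_patch, n_patch), dtype=bool))
    shares = (cover.astype(int) @ cover.T.astype(int)) > 0
    patch_mask = shares & causal
    # Assumed fallback: patches outside every box use plain causal attention.
    uncovered = ~cover.any(axis=1)
    patch_mask[uncovered] = causal[uncovered]
    mask[n_obj:, n_obj:] = patch_mask

    # Every token attends to itself, so no softmax row is entirely masked.
    np.fill_diagonal(mask, True)
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked entries become exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    # Two overlapping objects on a 4x4 patch grid: 2 object + 16 patch tokens.
    boxes = [(0, 0, 2, 2), (1, 1, 4, 4)]
    mask = focal_attention_mask(boxes, 4, 4)
    rng = np.random.default_rng(0)
    x = rng.standard_normal((mask.shape[0], 8))
    out, w = masked_attention(x, x, x, mask)
    print(mask.shape, out.shape)
```

In this toy layout, the bottom-right patch attends only to the second object and to earlier patches inside that object's box, so patches belonging only to the first object contribute nothing to its prediction. That restriction is the kind of disambiguation the abstract attributes to focal attention; the exact neighborhoods TwFA uses may differ.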
