Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

08/29/2022
by   Wan-Cyuan Fan, et al.

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images of complex scenes, properly describing both global image structure and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model that performs a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by coarse-to-fine gating to produce the output image. During this multi-scale representation learning stage, additional input conditions such as text, scene graph, or image layout can be further exploited. Thus, Frido can also be applied to conditional or cross-modality image synthesis. We conduct extensive experiments on various unconditional and conditional image generation tasks, including text-to-image, layout-to-image, scene-graph-to-image, and label-to-image synthesis. More specifically, we achieve state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at https://github.com/davidhalladay/Frido.
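To make the idea of "scale-dependent vector quantized features" more concrete, the following is a minimal toy sketch, not the Frido implementation. It assumes a feature map given as a grid of vectors, stands in average pooling for the learned multi-scale encoder, and uses a random codebook; the helper names (avg_pool, nearest_code, quantize, pyramid_codes) and the pooling factors are hypothetical and chosen only for illustration.

```python
import random

def avg_pool(grid, factor):
    """Downsample an H x W grid of feature vectors by averaging
    factor x factor blocks (H and W must be divisible by factor)."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append([sum(v[k] for v in block) / len(block) for k in range(c)])
        pooled.append(row)
    return pooled

def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
    return dists.index(min(dists))

def quantize(grid, codebook):
    """Replace every feature vector in the grid by its nearest code index."""
    return [[nearest_code(v, codebook) for v in row] for row in grid]

def pyramid_codes(grid, codebook, factors=(4, 2, 1)):
    """Code maps ordered coarse to fine (assumption: pooling replaces
    the learned encoder a real model would use)."""
    return [quantize(avg_pool(grid, f), codebook) for f in factors]

# Toy data: an 8 x 8 feature map with 4 channels and a 16-entry codebook.
rng = random.Random(0)
features = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)] for _ in range(8)]
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]

pyramid = pyramid_codes(features, codebook)
shapes = [(len(p), len(p[0])) for p in pyramid]
```

In this sketch the coarsest code map summarizes large regions of the feature map, while the finest keeps per-location detail. A coarse-to-fine generator would handle the levels in that order, which is the intuition behind the multi-scale denoising described in the abstract.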

research
03/30/2023

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Recently, diffusion models have achieved great success in image synthesi...
research
05/05/2022

Scene Graph Expansion for Semantics-Guided Image Outpainting

In this paper, we address the task of semantics-guided image outpainting...
research
03/08/2023

Transformer-based Image Generation from Scene Graphs

Graph-structured scene descriptions can be efficiently used in generativ...
research
03/25/2023

Freestyle Layout-to-Image Synthesis

Typical layout-to-image synthesis (LIS) models generate images for a clo...
research
08/19/2021

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Autoregressive models and their sequential factorization of the data lik...
research
11/17/2021

Compositional Transformers for Scene Generation

We introduce the GANformer2 model, an iterative object-oriented transfor...
research
06/01/2022

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Recently most successful image synthesis models are multi stage process ...
