ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

08/19/2021
by Patrick Esser, et al.

Autoregressive models and their sequential factorization of the data likelihood have recently demonstrated great potential for image representation and synthesis. Nevertheless, they incorporate image context in a linear 1D order by attending only to previously synthesized image patches above or to the left. This unidirectional, sequential bias of attention is unnatural for images, as it disregards large parts of a scene until synthesis is almost complete. It also processes the entire image on a single scale, thus ignoring more global contextual information up to the gist of the entire scene. As a remedy, we incorporate a coarse-to-fine hierarchy of context by combining the autoregressive formulation with a multinomial diffusion process: whereas a multistage diffusion process successively removes information to coarsen an image, we train a (short) Markov chain to invert this process. In each stage, the resulting autoregressive ImageBART model progressively incorporates context from previous stages in a coarse-to-fine manner. Experiments show greatly improved image modification capabilities over autoregressive models while also providing high-fidelity image generation, both of which are enabled through efficient training in a compressed latent space. Specifically, our approach can take unrestricted, user-provided masks into account to perform local image editing. Thus, in contrast to pure autoregressive models, it can solve free-form image inpainting and, in the case of conditional models, local, text-guided image modification without requiring mask-specific training.
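To make the forward half of this idea concrete, the following is a minimal sketch of a multistage multinomial diffusion over a grid of discrete latent tokens. It assumes the commonly used kernel in which each token is independently replaced by a uniformly random codebook entry with probability beta; the abstract does not specify the kernel, so this is an illustrative assumption. The function names, the beta values, and the codebook size are hypothetical. The learned autoregressive model that inverts the chain is not shown.

```python
import numpy as np


def multinomial_forward_step(tokens, beta, num_classes, rng):
    """One forward step (assumed kernel): each token is resampled
    uniformly from the codebook with probability beta, else kept."""
    tokens = np.asarray(tokens)
    resample = rng.random(tokens.shape) < beta
    random_tokens = rng.integers(0, num_classes, size=tokens.shape)
    return np.where(resample, random_tokens, tokens)


def forward_chain(tokens, betas, num_classes, seed=0):
    """Apply one forward step per entry of betas (hypothetical schedule).

    Returns the list [x_0, x_1, ..., x_T]; later entries retain
    less information about x_0.
    """
    rng = np.random.default_rng(seed)
    chain = [np.asarray(tokens)]
    for beta in betas:
        chain.append(multinomial_forward_step(chain[-1], beta, num_classes, rng))
    return chain


if __name__ == "__main__":
    # Toy 4x4 grid of token indices from a codebook of size 8.
    x0 = np.arange(16).reshape(4, 4) % 8
    for t, x_t in enumerate(forward_chain(x0, [0.25, 0.5, 1.0], num_classes=8)):
        kept = float((x_t == x0).mean())
        print(f"stage {t}: fraction of tokens equal to x_0 = {kept:.2f}")
```

Under these assumptions, a final stage with beta = 1 yields tokens that are uniformly random and independent of the input, which is the fully coarsened end of the chain; a reverse model would be trained to predict each stage from the next, coarser one.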

research
12/03/2021

Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation

The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) wit...
research
10/05/2022

Progressive Denoising Model for Fine-Grained Text-to-Image Generation

Recently, vector quantized autoregressive (VQ-AR) models have shown rema...
research
12/20/2021

High-Resolution Image Synthesis with Latent Diffusion Models

By decomposing the image formation process into a sequential application...
research
08/29/2022

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Diffusion models (DMs) have shown great potential for high-quality image...
research
03/06/2019

Hierarchical Autoregressive Image Models with Auxiliary Decoders

Autoregressive generative models of images tend to be biased towards cap...
research
06/23/2023

DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology

We present DiffInfinite, a hierarchical diffusion model that generates a...
research
09/06/2022

Semantic Image Synthesis with Semantically Coupled VQ-Model

Semantic image synthesis enables control over unconditional image genera...
