Collage Diffusion

03/01/2023
by Vishnu Sarukkai, et al.

Text-conditional diffusion models generate high-quality, diverse images. However, text is often an ambiguous specification for a desired target image, creating the need for additional user-friendly controls for diffusion-based image generation. We focus on providing precise control over image output for scenes containing several objects. Users control image generation by defining a collage: a text prompt paired with an ordered sequence of layers, where each layer comprises an RGBA image and a corresponding text prompt. We introduce Collage Diffusion, a collage-conditional diffusion algorithm that allows users to control both the spatial arrangement and visual attributes of objects in the scene, and also enables users to edit individual components of generated images. To ensure that different parts of the input text correspond to the locations specified by the input collage layers, Collage Diffusion modifies text-image cross-attention with the layers' alpha masks. To preserve characteristics of individual collage layers that are not specified in text, Collage Diffusion learns specialized text representations per layer. Collage input also enables layer-based controls that provide fine-grained control over the final output: users can control image harmonization on a layer-by-layer basis, and they can edit individual objects in generated images while keeping other objects fixed. Collage-conditional image generation requires harmonizing the input collage so that objects fit together; the key challenge is minimizing changes to the positions and key visual attributes of objects in the input collage while allowing other attributes to change during harmonization. By leveraging the rich information present in layer input, Collage Diffusion generates globally harmonized images that maintain desired object locations and visual characteristics better than prior approaches.
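The sketch below illustrates, in simplified form, the two collage-specific ideas named in the abstract: a collage as an ordered sequence of (RGBA layer, per-layer prompt) pairs, and cross-attention in which each layer's text tokens are restricted to that layer's alpha-masked spatial region. It is a minimal illustration, not the authors' implementation; all names, shapes, token-span bookkeeping, and the exact masking rule are assumptions made for clarity.

```python
# Illustrative sketch (not the paper's code) of collage input and alpha-masked
# cross-attention. Shapes, names, and the masking rule are assumptions.
from dataclasses import dataclass
from typing import List, Tuple
import torch
import torch.nn.functional as F

@dataclass
class CollageLayer:
    rgba: torch.Tensor           # (4, H, W) layer image with alpha channel
    prompt: str                  # per-layer text prompt
    token_span: Tuple[int, int]  # this layer's token indices in the full prompt

@dataclass
class Collage:
    prompt: str                  # full scene prompt
    layers: List[CollageLayer]   # ordered back-to-front

def masked_cross_attention(q, k, v, collage: Collage, spatial_hw: Tuple[int, int]):
    """Cross-attention where a layer's text tokens are suppressed outside the
    layer's alpha mask. q: (N, d) image queries with N = h*w; k, v: (T, d)."""
    h, w = spatial_hw
    scores = q @ k.T / q.shape[-1] ** 0.5              # (N, T) attention logits
    bias = torch.zeros_like(scores)
    for layer in collage.layers:
        alpha = layer.rgba[3:4].unsqueeze(0)            # (1, 1, H, W) alpha mask
        alpha = F.interpolate(alpha, size=(h, w), mode="bilinear")
        alpha = alpha.reshape(-1, 1)                    # (N, 1), values in [0, 1]
        s, e = layer.token_span
        # Large negative bias where alpha is ~0, no change where alpha is ~1,
        # so this layer's tokens attend (mostly) within the layer's region.
        bias[:, s:e] += (alpha - 1.0) * 1e4
    attn = (scores + bias).softmax(dim=-1)              # (N, T)
    return attn @ v                                     # (N, d) attended features
```

In this reading, the mask acts as an additive attention bias rather than a hard constraint, which leaves the rest of the prompt free to influence the whole image while keeping each layer's tokens localized.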

Related research

- 08/11/2023 · Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation
- 02/25/2023 · Directed Diffusion: Direct Control of Object Placement through Attention Guidance
- 04/13/2023 · Expressive Text-to-Image Generation with Rich Text
- 03/20/2023 · Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
- 05/27/2023 · FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
- 11/25/2022 · SpaText: Spatio-Textual Representation for Controllable Image Generation
- 12/06/2022 · M-VADER: A Model for Diffusion with Multimodal Context
