Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

02/24/2023
by Cusuh Ham, et al.

We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updates to the diffusion network's parameters. MCM is a small module trained to modulate the diffusion network's predictions during sampling using 2D modalities (e.g., semantic segmentation maps, sketches) that were unseen during the original training of the diffusion model. We show that MCM gives users control over the spatial layout of the image and, more broadly, over the image generation process. Training MCM is cheap: it does not require gradients from the original diffusion network, it has only ∼1% of the parameters of the base diffusion model, and it is trained on only a limited number of examples. We evaluate our method on unconditional and text-conditional models to demonstrate the improved control over the generated images and their alignment with the conditioning inputs.
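To make the idea of modulating a frozen network's predictions concrete, here is a minimal NumPy sketch. It is not the authors' architecture or sampler. The names `frozen_denoiser`, `ToyModulator`, and `modulated_step`, the scale-and-shift form of the modulation, and the simplified update rule are all assumptions chosen for illustration. The only properties taken from the abstract are that the base network is never updated and that a small extra module adjusts its prediction using a 2D conditioning input.

```python
import numpy as np

rng = np.random.default_rng(0)


def frozen_denoiser(x_t, t):
    """Toy stand-in for a pretrained diffusion network (hypothetical).

    Its "weights" are fixed constants and are never updated.
    """
    return 0.1 * x_t * (1.0 + t)


class ToyModulator:
    """Hypothetical small module that adjusts the frozen prediction.

    Assumption: the modulation is a per-pixel scale and shift computed
    from the noisy image, the frozen prediction, and the conditioning map.
    """

    def __init__(self, channels, rng):
        # Per-pixel linear maps (equivalent to 1x1 convolutions).
        self.w_scale = rng.normal(0.0, 0.01, size=(3 * channels, channels))
        self.w_shift = rng.normal(0.0, 0.01, size=(3 * channels, channels))

    def __call__(self, x_t, eps, cond):
        # Stack the three inputs along the channel axis: (H, W, 3C).
        feats = np.concatenate([x_t, eps, cond], axis=-1)
        scale = feats @ self.w_scale  # (H, W, C)
        shift = feats @ self.w_shift  # (H, W, C)
        return eps * (1.0 + scale) + shift


def modulated_step(x_t, t, cond, modulator, step_size=0.1):
    """One simplified sampling step (not a real DDPM/DDIM update)."""
    eps = frozen_denoiser(x_t, t)        # frozen network's prediction
    eps_mod = modulator(x_t, eps, cond)  # prediction adjusted by the small module
    return x_t - step_size * eps_mod


# Usage: cond stands in for a 2D modality (e.g. a segmentation map)
# already encoded to the same shape as the image (assumption).
H, W, C = 4, 4, 3
x_t = rng.normal(size=(H, W, C))
cond = rng.normal(size=(H, W, C))
modulator = ToyModulator(C, rng)
x_next = modulated_step(x_t, t=0.5, cond=cond, modulator=modulator)
```

In this sketch only `ToyModulator` holds trainable parameters. With its weights set to zero, the step reduces to the unmodified frozen prediction, which is one way to see that the base model's behavior is left intact.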

research
02/10/2023

Adding Conditional Control to Text-to-Image Diffusion Models

We present a neural network structure, ControlNet, to control pretrained...
research
05/07/2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

With the help of conditioning mechanisms, the state-of-the-art diffusion...
research
05/24/2023

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

The recent progress in diffusion-based text-to-image generation models h...
research
04/25/2022

Retrieval-Augmented Diffusion Models

Generative image synthesis with diffusion models has recently achieved e...
research
04/13/2023

Learning Controllable 3D Diffusion Models from Single-view Images

Diffusion models have recently become the de-facto approach for generati...
research
07/08/2023

Measuring the Success of Diffusion Models at Imitating Human Artists

Modern diffusion models have set the state-of-the-art in AI image genera...
research
12/19/2022

Optimizing Prompts for Text-to-Image Generation

Well-designed prompts can guide text-to-image models to generate amazing...
