DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

05/24/2023
by Sungnyun Kim, et al.

The recent progress in diffusion-based text-to-image generation models has significantly expanded generative capabilities by conditioning on text descriptions. However, since relying solely on text prompts remains restrictive for fine-grained customization, we aim to extend the boundaries of conditional generation to incorporate diverse types of modalities, e.g., sketch, box, and style embedding, simultaneously. We thus design a multimodal text-to-image diffusion model, coined DiffBlender, that achieves this goal in a single model by training only a few small hypernetworks. DiffBlender facilitates convenient scaling of input modalities without altering the parameters of the existing large-scale generative model, thereby retaining its well-established knowledge. Furthermore, our study sets new standards for multimodal generation by conducting quantitative and qualitative comparisons with existing approaches. By diversifying the channels of conditioning modalities, DiffBlender faithfully reflects the provided information or, in its absence, generates imaginative content.
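To make the adapter-style design described above concrete, here is a minimal sketch, in PyTorch, of conditioning a frozen diffusion backbone with small per-modality hypernetworks whose outputs are blended into one residual signal. This is not the authors' code: the class names, dimensions, and the feature-injection scheme are illustrative assumptions based only on the abstract.

from __future__ import annotations
import torch
import torch.nn as nn

class ModalityHypernetwork(nn.Module):
    """Small trainable network mapping one condition (e.g., a sketch
    feature or a style embedding) to a residual for the frozen UNet."""
    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim, feat_dim),
            nn.SiLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Zero-init the last layer so training starts from the unmodified
        # base model's behavior (a common trick in adapter-style methods).
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        return self.net(cond)

class MultimodalConditioner(nn.Module):
    """Blends any subset of modality conditions into one residual.
    Absent modalities are skipped, so generation can also proceed
    freely when little or no conditioning is provided."""
    def __init__(self, modality_dims: dict[str, int], feat_dim: int):
        super().__init__()
        self.hypernets = nn.ModuleDict(
            {name: ModalityHypernetwork(d, feat_dim)
             for name, d in modality_dims.items()}
        )

    def forward(self, conditions: dict[str, torch.Tensor]) -> torch.Tensor | None:
        residuals = [self.hypernets[k](v)
                     for k, v in conditions.items() if k in self.hypernets]
        return torch.stack(residuals).sum(dim=0) if residuals else None

# Usage: the large pretrained backbone stays frozen; only the small
# hypernetworks are trained. Dimensions here are made up for illustration.
conditioner = MultimodalConditioner(
    {"sketch": 512, "box": 64, "style": 768}, feat_dim=1024
)
residual = conditioner({"sketch": torch.randn(1, 512),
                        "style": torch.randn(1, 768)})
# `residual` would be added to intermediate UNet features during denoising.

Adding a new modality under this scheme only means registering one more small hypernetwork, which is what makes the input channels convenient to scale without touching the backbone's weights.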


Related research:

05/07/2023 · Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
With the help of conditioning mechanisms, the state-of-the-art diffusion...

05/19/2023 · Any-to-Any Generation via Composable Diffusion
We present Composable Diffusion (CoDi), a novel generative model capable...

08/22/2023 · MatFuse: Controllable Material Generation with Diffusion Models
Creating high quality and realistic materials in computer graphics is a ...

02/24/2023 · Modulating Pretrained Diffusion Models for Multimodal Image Synthesis
We present multimodal conditioning modules (MCM) for enabling conditiona...

06/07/2020 · Realistic text replacement with non-uniform style conditioning
In this work, we study the possibility of realistic text replacement, th...

09/15/2023 · Breathing New Life into 3D Assets with Generative Repainting
Diffusion-based text-to-image models ignited immense attention from the ...

02/16/2023 · T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
The incredible generative ability of large-scale text-to-image (T2I) mod...
