Collaborative Diffusion for Multi-Modal Face Generation and Editing

04/20/2023
by Ziqi Huang, et al.

Diffusion models have recently emerged as a powerful generative tool. Despite this great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary with respect to the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
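To make the fusion mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the idea as described in the abstract: each pre-trained uni-modal model predicts noise at a denoising step, and a small per-modality meta-network predicts a spatial influence map conditioned on the noisy latent, the timestep, and the modality's condition; the maps are softmax-normalized across modalities and used to fuse the noise predictions. All module names, tensor shapes, and the softmax fusion are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of influence-weighted fusion of uni-modal denoising steps.
# Shapes, module names, and the fusion rule are assumptions for illustration.
import torch
import torch.nn as nn

class DynamicDiffuser(nn.Module):
    """Hypothetical meta-network: maps (noisy latent, timestep, condition)
    to a per-pixel influence logit for one uni-modal diffusion model."""
    def __init__(self, latent_channels: int, cond_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels + cond_channels + 1, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # one influence logit per pixel
        )

    def forward(self, x_t, t, cond):
        # Broadcast the timestep to a spatial channel; cond is assumed to be
        # a spatial feature map of shape (B, cond_channels, H, W).
        t_map = t.view(-1, 1, 1, 1).float().expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, cond, t_map], dim=1))

def collaborative_denoise_step(x_t, t, models, conds, diffusers):
    """Fuse uni-modal noise predictions using softmax-normalized
    spatial-temporal influence maps (one dynamic diffuser per modality)."""
    eps = [m(x_t, t, c) for m, c in zip(models, conds)]        # noise preds
    logits = [d(x_t, t, c) for d, c in zip(diffusers, conds)]  # influence
    weights = torch.softmax(torch.stack(logits, dim=0), dim=0) # sum to 1
    return (weights * torch.stack(eps, dim=0)).sum(dim=0)      # fused noise
```

Under this reading, the influence maps can let, for example, a mask-driven model dominate the early, layout-determining steps in face-shape regions while a text-driven model steers appearance attributes elsewhere; the sketch realizes that only as a learned per-pixel, per-timestep softmax, which is one plausible instantiation of the "spatial-temporal influence functions" named in the abstract.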



Related research

Multi-modal Latent Diffusion (06/07/2023)
Multi-modal data-sets are ubiquitous in modern applications, and multi-m...

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale (03/12/2023)
This paper proposes a unified diffusion framework (dubbed UniDiffuser) t...

Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model (08/26/2022)
Generating images from hand-drawings is a crucial and fundamental task i...

Boundary Guided Mixing Trajectory for Semantic Control with Diffusion Models (02/16/2023)
Applying powerful generative denoising diffusion models (DDMs) for downs...

Cross-Modal 3D Shape Generation and Manipulation (07/24/2022)
Creating and editing the shape and color of 3D objects require tremendou...

Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model (05/04/2023)
Multimodal-driven talking face generation refers to animating a portrait...

M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing (05/24/2022)
The fashion industry has diverse applications in multi-modal image gener...
