HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation

11/11/2022
by Kaiduo Zhang, et al.

Text-driven person image generation is an emerging and challenging task in cross-modality image generation. Controllable person image generation enables a wide range of applications, such as digital human interaction and virtual try-on. However, previous methods mostly use single-modality information as the prior condition (e.g., pose-guided person image generation) or rely on preset words for text-driven human synthesis. Describing a person's appearance with a free-form sentence together with an editable semantic pose map is more user-friendly. In this paper, we propose HumanDiffusion, a coarse-to-fine alignment diffusion framework for text-driven person image generation. Specifically, we propose two collaborative modules: the Stylized Memory Retrieval (SMR) module for fine-grained feature distillation in data processing, and the Multi-scale Cross-modality Alignment (MCA) module for coarse-to-fine feature alignment in diffusion. Together, these two modules ensure the alignment quality of text and image, from image level to feature level and from low resolution to high resolution. As a result, HumanDiffusion achieves open-vocabulary person image generation with desired semantic poses. Extensive experiments on DeepFashion demonstrate the superiority of our method over previous approaches. Moreover, it obtains better results for complicated person images with varied details and uncommon poses.

research
04/18/2023

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Existing person image generative models can do either image generation o...
research
08/02/2023

Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

The recently rising markup-to-image generation poses greater challenges ...
research
06/26/2023

Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models

Text-to-image diffusion models have advanced towards more controllable g...
research
05/31/2022

Text2Human: Text-Driven Controllable Human Image Generation

Generating high-quality and diverse human images is an important yet cha...
research
11/26/2021

Self-supervised Correlation Mining Network for Person Image Generation

Person image generation aims to perform non-rigid deformation on source ...
research
10/02/2020

MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination

In this paper, we investigate the problem of text-to-pedestrian synthesi...
research
05/23/2023

Understanding Text-driven Motion Synthesis with Keyframe Collaboration via Diffusion Models

The emergence of text-driven motion synthesis technique provides animato...
