Text2Human: Text-Driven Controllable Human Image Generation

05/31/2022
by Yuming Jiang, et al.

Generating high-quality and diverse human images is an important yet challenging task in vision and graphics. However, existing generative models often fall short under the high diversity of clothing shapes and textures. Moreover, the generation process should ideally be intuitively controllable for lay users. In this work, we present Text2Human, a text-driven controllable framework for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose in two dedicated steps. 1) Given texts describing the shapes of clothes, the human pose is first translated into a human parsing map. 2) The final human image is then generated by providing the system with additional attributes describing the textures of clothes. Specifically, to model the diversity of clothing textures, we build a hierarchical texture-aware codebook that stores multi-scale neural representations for each type of texture. The codebook at the coarse level captures the structural representations of textures, while the codebook at the fine level focuses on texture details. To synthesize desired images from the learned hierarchical codebook, a diffusion-based transformer sampler with a mixture of experts first samples indices from the coarsest level of the codebook; these indices are then used to predict the indices of the codebooks at finer levels. The predicted indices at different levels are translated into human images by a decoder learned jointly with the hierarchical codebooks. The mixture of experts allows the generated image to be conditioned on fine-grained text input, while the prediction of finer-level indices refines the quality of clothing textures. Extensive quantitative and qualitative evaluations demonstrate that our framework generates more diverse and realistic human images than state-of-the-art methods.
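To make the two-level codebook idea concrete, the following is a minimal sketch of hierarchical vector quantization: feature vectors at a coarse and a fine resolution are each mapped to the index of their nearest codebook entry. All names, codebook sizes, and grid dimensions here are illustrative assumptions, not values from the paper; the real system additionally learns the codebooks end to end and uses a transformer to predict fine indices from coarse ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): a coarse codebook for texture
# structure and a larger fine codebook for texture details.
COARSE_SIZE, FINE_SIZE, DIM = 512, 1024, 64
coarse_codebook = rng.normal(size=(COARSE_SIZE, DIM))
fine_codebook = rng.normal(size=(FINE_SIZE, DIM))

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    # features: (N, DIM); codebook: (K, DIM) -> pairwise squared distances (N, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Stand-ins for encoder outputs at two spatial resolutions.
coarse_feats = rng.normal(size=(16, DIM))   # e.g. a 4x4 grid of coarse tokens
fine_feats = rng.normal(size=(64, DIM))     # e.g. an 8x8 grid of fine tokens

coarse_idx = quantize(coarse_feats, coarse_codebook)
fine_idx = quantize(fine_feats, fine_codebook)

# A decoder would translate these index maps back into image pixels.
print(coarse_idx.shape, fine_idx.shape)  # (16,) (64,)
```

In the framework described above, the sampler first produces the coarse index map conditioned on the text, and the fine index map is then predicted from it rather than quantized directly, which is what lets the finer level refine texture quality.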


