Text-guided 3D Human Generation from 2D Collections

05/23/2023
by Tsu-Jui Fu, et al.

3D human modeling is widely used for engaging interaction in gaming, film, and animation. Customizing these characters is crucial for creativity and scalability, which makes controllability important. In this work, we introduce Text-guided 3D Human Generation, a task in which a model generates a 3D human guided by a fashion description. There are two goals: 1) the 3D human should render articulately, and 2) its outfit should be controlled by the given text. To address this task, we propose Compositional Cross-modal Human (CCH). CCH adopts cross-modal attention to fuse compositional human rendering with the extracted fashion semantics, so that each human body part perceives the relevant textual guidance as its visual patterns. We incorporate a human prior and semantic discrimination to enhance 3D geometry transformation and fine-grained consistency, which enables CCH to learn from 2D collections for data efficiency. We conduct evaluations on DeepFashion and SHHQ with diverse fashion attributes covering the shape, fabric, and color of upper and lower clothing. Extensive experiments demonstrate that CCH achieves superior results on this task with high efficiency.
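
As a rough illustration of the cross-modal attention idea described in the abstract, the sketch below lets each body-part feature attend over text-token features using generic scaled dot-product attention. It is not the authors' implementation: the function names, the toy two-dimensional features, and the use of plain dot-product attention are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(part_queries, text_keys, text_values):
    """Hypothetical helper: each body-part query attends over text tokens.

    part_queries: one feature vector per body part
    text_keys, text_values: one feature vector per text token
    Returns the fused per-part features and the attention weights.
    """
    dim = len(text_keys[0])
    fused, weights = [], []
    for query in part_queries:
        # Scaled dot-product similarity between this part and every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in text_keys
        ]
        attn = softmax(scores)
        # Weighted sum of token values gives the text guidance for this part.
        out = [
            sum(a * value[d] for a, value in zip(attn, text_values))
            for d in range(len(text_values[0]))
        ]
        fused.append(out)
        weights.append(attn)
    return fused, weights


# Toy features (assumed): two body parts and three text tokens.
part_queries = [[4.0, 0.0], [0.0, 4.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = cross_modal_attention(part_queries, text_keys, text_values)
```

In this toy setup, the first part's query is aligned with the first token, so that token receives the largest attention weight for that part. A real model would learn the query, key, and value projections rather than use raw features.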


