Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

09/07/2023
by Sungwon Hwang, et al.

Recent advances in diffusion models such as ControlNet have enabled geometrically controllable, high-fidelity text-to-image generation. However, none of them addresses the question of adding such controllability to text-to-3D generation. In response, we propose Text2Control3D, a controllable text-to-3D avatar generation method whose facial expression is controllable given a monocular video casually captured with a hand-held camera. Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF), optimized with a set of controlled viewpoint-aware images that we generate from ControlNet, whose condition input is the depth map extracted from the input video. When generating the viewpoint-aware images, we utilize cross-reference attention to inject well-controlled, referential facial expression and appearance via cross attention. We also conduct low-pass filtering of the diffusion model's Gaussian latent to ameliorate the viewpoint-agnostic texture problem observed in our empirical analysis, where the viewpoint-aware images contain identical textures at identical pixel positions, which is incomprehensible in 3D. Finally, to train NeRF with images that are viewpoint-aware yet not strictly consistent in geometry, our approach regards the per-image geometric variation as a deformation from a shared 3D canonical space. Consequently, we construct the 3D avatar in the canonical space of a deformable NeRF by learning a set of per-image deformations via a deformation field table. We present empirical results and discuss the effectiveness of our method.
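
The generation side of this pipeline can be sketched with off-the-shelf components. Below is a minimal, hypothetical sketch (not the authors' released code) of two of the ideas above: conditioning a text-to-image diffusion model on a per-frame depth map via ControlNet, and low-pass filtering the Gaussian latent shared across viewpoints. The model IDs, the cutoff value, the renormalization step, the prompt, and the file path are illustrative assumptions.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

def low_pass(latent: torch.Tensor, cutoff: float = 0.5) -> torch.Tensor:
    """Zero out high spatial frequencies of a Gaussian latent (..., H, W)."""
    freq = torch.fft.fftshift(torch.fft.fft2(latent), dim=(-2, -1))
    h, w = latent.shape[-2:]
    ys = torch.linspace(-1, 1, h).view(-1, 1)
    xs = torch.linspace(-1, 1, w).view(1, -1)
    mask = ((ys**2 + xs**2).sqrt() <= cutoff).to(latent.dtype)
    out = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return out / out.std()  # renormalize toward unit variance (our assumption)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Depth map extracted from one frame of the input video (hypothetical path).
depth = load_image("frame_000_depth.png")
# One shared, low-pass-filtered Gaussian latent, reused for every viewpoint
# so that differences between samples come from the depth condition.
latents = low_pass(torch.randn(1, 4, 64, 64)).to("cuda", torch.float16)
image = pipe("a high-quality portrait of a person", image=depth,
             latents=latents).images[0]
```

Reusing one filtered latent across all depth conditions is one plausible reading of the paper's remedy for the viewpoint-agnostic texture problem: removing the high-frequency components of the shared noise leaves less pixel-aligned texture for the sampler to reproduce at identical positions across viewpoints.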
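
Similarly, the deformation field table admits a compact reading: a learnable embedding with one row per training image, feeding a small MLP that warps ray samples into the shared canonical space where the NeRF lives. The sketch below is a hypothetical re-implementation under that reading; the code dimension, the MLP width, and the omission of positional encoding are simplifications.

```python
import torch
import torch.nn as nn

class DeformationFieldTable(nn.Module):
    """One learnable deformation code per training image; an MLP maps a
    sample point plus its image's code to an offset into canonical space."""

    def __init__(self, num_images: int, code_dim: int = 32):
        super().__init__()
        self.codes = nn.Embedding(num_images, code_dim)
        self.deform = nn.Sequential(
            nn.Linear(3 + code_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 3),  # per-point 3D offset
        )

    def forward(self, x: torch.Tensor, image_idx: torch.Tensor) -> torch.Tensor:
        code = self.codes(image_idx)                      # (N, code_dim)
        delta = self.deform(torch.cat([x, code], dim=-1))
        return x + delta                                  # warped sample point

# Ray samples from image 17 are warped before querying the canonical NeRF,
# so every per-image geometry variant supervises one shared 3D avatar.
table = DeformationFieldTable(num_images=100)
x = torch.rand(4096, 3)
idx = torch.full((4096,), 17, dtype=torch.long)
x_canonical = table(x, idx)
```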

Related research

04/03/2023 · DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
We present DreamAvatar, a text-and-shape guided framework for generating...

03/12/2022 · 3D-GIF: 3D-Controllable Object Generation via Implicit Factorized Representations
While NeRF-based 3D-aware image generation methods enable viewpoint cont...

08/09/2018 · Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
The recent advances in deep learning have made it possible to generate p...

09/20/2023 · Controllable Dynamic Appearance for Neural 3D Portraits
Recent advances in Neural Radiance Fields (NeRFs) have made it possible...

04/13/2022 · Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions
Recently, talking-face video generation has received considerable attent...

07/10/2023 · Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models
The ability to generate diverse 3D articulated head avatars is vital to...

04/20/2023 · Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion
We present Farm3D, a method to learn category-specific 3D reconstructors...
