Local 3D Editing via 3D Distillation of CLIP Knowledge

06/21/2023
by Junha Hyung, et al.

3D content manipulation is an important computer vision task with many real-world applications (e.g., product design, cartoon generation, and 3D avatar editing). Recently proposed 3D GANs can generate diverse, photorealistic 3D-aware content using neural radiance fields (NeRF). However, manipulating NeRF remains challenging: visual quality tends to degrade after manipulation, and existing methods rely on suboptimal control handles such as 2D semantic maps. Text-guided manipulation has shown potential for 3D editing, but such approaches often lack locality. To overcome these problems, we propose Local Editing NeRF (LENeRF), which requires only text inputs for fine-grained, localized manipulation. Specifically, we present three add-on modules of LENeRF, the Latent Residual Mapper, the Attention Field Network, and the Deformation Network, which are used jointly to manipulate 3D features locally by estimating a 3D attention field. The 3D attention field is learned in an unsupervised way, by distilling the zero-shot mask generation capability of CLIP into 3D space with multi-view guidance. We conduct diverse experiments and thorough quantitative and qualitative evaluations.
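The abstract describes manipulating 3D features locally by estimating a 3D attention field, but it does not spell out how the field is applied. One common way to use such a soft mask is a per-point convex blend of source and edited features; the sketch below illustrates that idea only. The blending rule, the function name `blend_features`, and the toy feature values are all assumptions made for illustration, not the paper's stated implementation.

```python
from typing import List, Sequence


def blend_features(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    attention: Sequence[float],
) -> List[List[float]]:
    """Blend per-point features with a soft mask in [0, 1].

    Assumed rule (hypothetical, for illustration):
        blended = (1 - m) * source + m * target
    """
    if not (len(source) == len(target) == len(attention)):
        raise ValueError("source, target, and attention must have the same length")
    blended = []
    for src, tgt, m in zip(source, target, attention):
        if not 0.0 <= m <= 1.0:
            raise ValueError("attention values must lie in [0, 1]")
        blended.append([(1.0 - m) * s + m * t for s, t in zip(src, tgt)])
    return blended


# Three hypothetical 3D sample points, each with a 2-dimensional feature.
source_feats = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
target_feats = [[3.0, 5.0], [3.0, 5.0], [3.0, 5.0]]
attention = [0.0, 0.5, 1.0]  # 0 keeps the source, 1 takes the edited feature

result = blend_features(source_feats, target_feats, attention)
print(result)
```

Under this assumed rule, points where the attention value is 0 are left untouched, which is what would keep an edit local; points with intermediate values receive a proportional mix of the two features.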


