3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

03/21/2023
by   Yu-Jhe Li, et al.
We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs (NeRFs that generate 3D objects from an input latent code). Recent works such as DreamFusion and Magic3D have shown great success in generating 3D content from text prompts using NeRFs, but optimizing a NeRF for every text prompt is 1) extremely time-consuming and 2) often yields low-resolution outputs. To address these challenges, we propose 3D-CLFusion, a novel method that leverages pre-trained latent-based NeRFs to create 3D content in less than a minute. In particular, we introduce a latent diffusion prior network that learns the w latent from input CLIP text/image embeddings. This pipeline produces the w latent without further optimization at inference time, and the pre-trained NeRF then performs multi-view, high-resolution 3D synthesis from that latent. The novelty of our model lies in introducing contrastive learning when training the diffusion prior, which enables the generation of valid, view-invariant latent codes. Experiments demonstrate the effectiveness of the proposed view-invariant diffusion process for fast text-to-3D creation, e.g., 100 times faster than DreamFusion. Our model can also serve as a plug-and-play tool for text-to-3D with pre-trained NeRFs.
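
To make the idea of a contrastive objective for view-invariant latents more concrete, here is a minimal sketch in NumPy. It is not the loss used in 3D-CLFusion, whose exact form the abstract does not give. It swaps in a generic InfoNCE-style loss and assumes that latents predicted from two views of the same object form a positive pair, while latents of other objects in the batch act as negatives. The function name `info_nce`, the temperature value, and the synthetic data are all hypothetical.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE-style loss (an illustrative stand-in, not the paper's loss).

    anchor, positive: arrays of shape (batch, dim). Row i of `positive` is
    assumed to be the latent predicted from another view of the same object
    as row i of `anchor`; every other row serves as a negative.
    """
    # Normalize so the dot product below is a cosine similarity.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (batch, batch)
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Synthetic stand-ins for latents: 8 objects, 16 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
objects = rng.normal(size=(8, 16))
view_a = objects + 0.01 * rng.normal(size=objects.shape)
view_b = objects + 0.01 * rng.normal(size=objects.shape)

aligned = info_nce(view_a, view_b)
# Pair each latent with a different object's latent to simulate view inconsistency.
mismatched = info_nce(view_a, np.roll(view_b, 1, axis=0))
```

In this toy setup the loss is small when both views of an object map to nearly the same latent and large when they do not, which is the behavior a view-invariance term is meant to reward.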

