ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

05/25/2023
by Zhengyi Wang, et al.

Score distillation sampling (SDS) has shown great promise in text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models, but it suffers from over-saturation, over-smoothing, and low diversity. In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS, and we present variational score distillation (VSD), a principled particle-based variational framework that explains and addresses these issues in text-to-3D generation. We show that SDS is a special case of VSD and that it leads to poor samples with both small and large CFG weights. In comparison, VSD works well across a range of CFG weights, as ancestral sampling from diffusion models does, and it simultaneously improves diversity and sample quality with a common CFG weight (i.e., 7.5). We further present several improvements in the design space for text-to-3D, such as the distillation time schedule and density initialization, which are orthogonal to the distillation algorithm yet not well explored. Our overall approach, dubbed ProlificDreamer, can generate high-fidelity NeRFs at a high rendering resolution (i.e., 512×512), with rich structure and complex effects (e.g., smoke and drops). Further, meshes initialized from NeRF and fine-tuned by VSD are meticulously detailed and photo-realistic. Project page: https://ml.cs.tsinghua.edu.cn/prolificdreamer/
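
To make the relationship between SDS and VSD concrete, here is a minimal sketch, assuming the commonly stated noise-prediction form of the two updates. In that form the SDS direction is the pretrained model's noise prediction minus the sampled Gaussian noise, while VSD replaces the sampled noise with the prediction of a second noise model fitted to the current renderings. All function names, the flat-list representation, and the CFG convention (unconditional plus weight times the conditional difference) are assumptions made for illustration and do not come from the abstract. The sketch also omits the renderer Jacobian that maps this direction back onto the 3D parameters.

```python
from typing import List

Vector = List[float]


def cfg_noise(eps_uncond: Vector, eps_cond: Vector, cfg_weight: float = 7.5) -> Vector:
    """Classifier-free guidance (assumed convention): move from the
    unconditional prediction toward the conditional one by cfg_weight."""
    return [u + cfg_weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_direction(eps_pretrained: Vector, eps_sampled: Vector, weight: float = 1.0) -> Vector:
    """SDS-style direction: pretrained noise prediction minus the Gaussian
    noise that was actually added to the rendered image."""
    return [weight * (p - e) for p, e in zip(eps_pretrained, eps_sampled)]


def vsd_direction(eps_pretrained: Vector, eps_variational: Vector, weight: float = 1.0) -> Vector:
    """VSD-style direction: pretrained noise prediction minus the prediction
    of a second (hypothetical) noise model fitted to the current renderings."""
    return [weight * (p - v) for p, v in zip(eps_pretrained, eps_variational)]


if __name__ == "__main__":
    # Toy two-pixel "image"; the numbers are arbitrary placeholders.
    eps_uncond = [0.0, 1.0]
    eps_cond = [1.0, 3.0]
    eps_sampled = [0.5, 0.5]

    guided = cfg_noise(eps_uncond, eps_cond, cfg_weight=7.5)
    print("SDS direction:", sds_direction(guided, eps_sampled))

    # If the second noise model simply returned the sampled noise, the
    # VSD-style direction would coincide with the SDS-style one.
    print("VSD direction:", vsd_direction(guided, eps_sampled))
```

In this toy form, the only difference between the two functions is what gets subtracted, which is one way to read the abstract's claim that SDS is a special case of VSD.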

research
03/27/2023

Debiasing Scores and Prompts of 2D Diffusion for Robust Text-to-3D Generation

The view inconsistency problem in score-distilling text-to-3D generation...
research
03/14/2023

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Text-to-3D generation has shown rapid progress in recent days with the a...
research
05/24/2023

Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers

Large language models (LLMs) have shown remarkable ability on controllab...
research
11/21/2022

VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models

Diffusion models have shown impressive results in text-to-image synthesi...
research
09/19/2023

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Diffusion models power a vast majority of text-to-audio (TTA) generation...
research
07/30/2023

HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation

In this paper, we study Text-to-3D content generation leveraging 2D diff...
research
03/21/2023

Compositional 3D Scene Generation using Locally Conditioned Diffusion

Designing complex 3D scenes has been a tedious, manual process requiring...
