TextCraft: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Text

11/02/2022
by   Aditya Sanghi, et al.

Language is one of the primary means by which we describe the 3D world around us. While rapid progress has been made in text-to-2D-image synthesis, similar progress in text-to-3D-shape synthesis has been hindered by the lack of paired (text, shape) data. Moreover, extant methods for text-to-shape generation produce shapes of limited diversity and fidelity. We introduce TextCraft, a method that addresses these limitations by producing high-fidelity and diverse 3D shapes without requiring (text, shape) pairs for training. TextCraft achieves this by leveraging CLIP together with a multi-resolution approach: it first generates in a low-dimensional latent space and then upscales to a higher resolution, improving the fidelity of the generated shape. To improve shape diversity, we use a discrete latent space which is modelled using a bidirectional transformer conditioned on the interchangeable image-text embedding space induced by CLIP. Moreover, we present a novel variant of classifier-free guidance, which further improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that TextCraft outperforms state-of-the-art baselines.
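To give a sense of the classifier-free guidance mentioned in the abstract, here is a minimal sketch of its standard form: the model is queried with and without the text condition, and the two outputs are blended with a guidance scale. This is an illustrative generic implementation, not the paper's specific variant; the function name, toy logits, and scale value are all assumptions.

```python
import numpy as np

def classifier_free_guidance(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional model outputs.

    scale = 0 -> purely unconditional; scale = 1 -> purely conditional;
    scale > 1 extrapolates toward the condition, trading sample
    diversity for fidelity to the text prompt.
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)

# Toy example: logits over a tiny discrete latent vocabulary
# (hypothetical values chosen for illustration only).
cond = np.array([2.0, 0.5, -1.0])    # text-conditioned logits
uncond = np.array([1.0, 1.0, 1.0])   # unconditional logits
guided = classifier_free_guidance(cond, uncond, scale=3.0)
print(guided)  # [ 4.  -0.5 -5. ]
```

Raising the scale sharpens agreement with the text prompt at the cost of diversity, which is exactly the accuracy-diversity trade-off the paper's variant aims to improve.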

research
03/28/2022

Towards Implicit Text-Guided 3D Shape Generation

In this work, we explore the challenging task of generating 3D shapes fr...
research
09/09/2022

ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation

Text-guided 3D shape generation remains challenging due to the absence o...
research
06/30/2023

Counting Guidance for High Fidelity Text-to-Image Synthesis

Recently, the quality and performance of text-to-image generation signif...
research
03/23/2023

TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision

In this paper, we investigate an open research task of generating contro...
research
11/14/2022

Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces

Conditional text-to-image generation has seen countless recent improveme...
research
02/02/2023

Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors

Fast generation of high-quality 3D digital humans is important to a vast...
research
06/14/2023

ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

Current state-of-the-art methods for text-to-shape generation either req...
