Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

04/20/2023
by Tomas Jakab, et al.

We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects entirely from "free" virtual supervision provided by a pre-trained 2D diffusion-based image generator. Recent approaches can, given a collection of single-view images of an object category, learn a monocular network that predicts the 3D shape, albedo, illumination and viewpoint of any instance of that category. We propose a framework that uses an image generator such as Stable Diffusion to generate virtual training data for learning such a reconstruction network from scratch. Furthermore, we use the diffusion model as a score to further improve learning: the idea is to randomise some aspects of the reconstruction, such as viewpoint and illumination, render synthetic views of the reconstructed 3D object, and have the 2D network assess the quality of the resulting images, providing feedback to the reconstructor. Unlike distillation-based approaches, which take hours to produce a single 3D asset per textual prompt, our approach produces a monocular reconstruction network that can output a controllable 3D asset from a given image, real or generated, in only seconds. The network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
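The feedback loop described above (reconstruct, randomise viewpoint and illumination, render, score with the frozen 2D model) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `reconstruct`, `render`, and `diffusion_score` are hypothetical stand-ins for the monocular reconstruction network, a differentiable renderer, and the frozen diffusion critic.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruct(image):
    # Stand-in for the monocular reconstructor: predict a toy
    # "3D state" (here just a 3-vector) from an H x W x 3 image.
    return image.mean(axis=(0, 1))

def render(state, viewpoint, light):
    # Stand-in for a differentiable renderer: produce a synthetic
    # view of the reconstructed object under the sampled conditions.
    return np.tanh(state[None, None, :] * viewpoint + light)

def diffusion_score(rendering):
    # Stand-in for the frozen 2D diffusion model used as a critic:
    # returns a gradient-like signal on the rendered image.
    return -rendering  # toy score: pull pixel values toward zero

def training_step(image, lr=0.1):
    state = reconstruct(image)
    # Randomise viewpoint and illumination before rendering.
    viewpoint = rng.uniform(0.5, 1.5)
    light = rng.uniform(-0.1, 0.1)
    rendering = render(state, viewpoint, light)
    # Average the critic's per-pixel score into an update on the
    # reconstructor's output (a crude proxy for backpropagation).
    feedback = diffusion_score(rendering).mean(axis=(0, 1))
    return state + lr * feedback

img = rng.uniform(0.0, 1.0, size=(8, 8, 3))
updated = training_step(img)
print(updated.shape)
```

In the actual method the score is backpropagated through the renderer into the network weights, so that a single trained network can then reconstruct any image of the category in seconds.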


Related research

08/27/2023
Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views
Reconstructing 3D objects from extremely sparse views is a long-standing...

03/20/2023
Zero-1-to-3: Zero-shot One Image to 3D Object
We introduce Zero-1-to-3, a framework for changing the camera viewpoint ...

09/14/2023
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
Text-to-image diffusion models understand spatial relationship between o...

04/21/2022
Planes vs. Chairs: Category-guided 3D shape learning without any 3D cues
We present a novel 3D shape reconstruction method which learns to predic...

12/20/2022
InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
In this paper, we take a significant step towards real-world applicabili...

03/23/2023
SAOR: Single-View Articulated Object Reconstruction
We introduce SAOR, a novel approach for estimating the 3D shape, texture...

09/07/2023
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model
Recent advances in diffusion models such as ControlNet have enabled geom...
