Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement

07/22/2022
by Prafull Sharma, et al.
Human perception reliably identifies movable and immovable parts of 3D scenes, and completes the 3D structure of objects and background from incomplete observations. We learn this skill not via labeled examples, but simply by observing objects move. In this work, we propose an approach that observes unlabeled multi-view videos at training time and learns to map a single image observation of a complex scene, such as a street with cars, to a 3D neural scene representation that is disentangled into movable and immovable parts while plausibly completing its 3D structure. We separately parameterize movable and immovable scene parts via 2D neural ground plans. These ground plans are 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields. Our model is trained in a self-supervised manner via neural rendering. We demonstrate that the structure inherent to our disentangled 3D representation enables a variety of downstream tasks in street-scale 3D scenes using simple heuristics, such as extraction of object-centric 3D representations, novel view synthesis, instance segmentation, and 3D bounding box prediction, highlighting its value as a backbone for data-efficient 3D scene understanding models. This disentanglement further enables scene editing via object manipulation such as deletion, insertion, and rigid-body motion.
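To make the ground-plan idea concrete, the following is a minimal sketch of how a 3D point might be queried against two 2D feature grids, one for immovable and one for movable content. It is not the authors' implementation: the function names (`bilinear_sample`, `decode`, `query_scene`), the single linear layer standing in for the decoder, the normalized coordinates in [0, 1], and the density-weighted blending of the two fields are all assumptions chosen for illustration.

```python
import numpy as np

def bilinear_sample(plan, x, z):
    """Interpolate an (H, W, C) ground plan at ground-plane coords x, z in [0, 1]."""
    h, w, _ = plan.shape
    gx = np.clip(x, 0.0, 1.0) * (w - 1)
    gz = np.clip(z, 0.0, 1.0) * (h - 1)
    x0, z0 = int(np.floor(gx)), int(np.floor(gz))
    x1, z1 = min(x0 + 1, w - 1), min(z0 + 1, h - 1)
    tx, tz = gx - x0, gz - z0
    top = (1 - tx) * plan[z0, x0] + tx * plan[z0, x1]
    bottom = (1 - tx) * plan[z1, x0] + tx * plan[z1, x1]
    return (1 - tz) * top + tz * bottom

def decode(feature, height, weights):
    """Toy decoder (assumption): one linear layer mapping (feature, height)
    to a non-negative density and an RGB color in (0, 1)."""
    out = weights @ np.concatenate([feature, [height]])
    density = np.log1p(np.exp(out[0]))       # softplus keeps density >= 0
    color = 1.0 / (1.0 + np.exp(-out[1:]))   # sigmoid keeps color in (0, 1)
    return density, color

def query_scene(point, static_plan, dynamic_plan, weights):
    """Query both fields at a 3D point (x, y, z), with y the height above ground.
    Blending by density is an assumed composition rule, not taken from the paper."""
    x, y, z = point
    s_density, s_color = decode(bilinear_sample(static_plan, x, z), y, weights)
    d_density, d_color = decode(bilinear_sample(dynamic_plan, x, z), y, weights)
    total = s_density + d_density
    denom = max(total, 1e-8)
    color = (s_density * s_color + d_density * d_color) / denom
    movable_fraction = d_density / denom     # share of density from movable content
    return total, color, movable_fraction
```

In a sketch like this, the movable share of the density is available at every queried point, which is the kind of structure that heuristics for segmentation or object removal could build on; deleting an object would amount to zeroing the corresponding region of the movable ground plan.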

