Learning 3D Scene Priors with 2D Supervision

11/25/2022
by   Yinyu Nie, et al.

Holistic 3D scene understanding entails estimating both the layout configuration and the object geometry of a 3D environment. Recent works have advanced 3D scene estimation from various input modalities (e.g., images, 3D scans) by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), which is expensive and often intractable to collect at scale. To address this shortcoming, we propose a new method that learns 3D scene priors of layout and shape without requiring any 3D ground truth. Instead, we rely on 2D supervision from multi-view RGB images. Our method represents a 3D scene as a latent vector, which we progressively decode into a sequence of objects characterized by their class categories, 3D bounding boxes, and meshes. With the trained autoregressive decoder representing the scene prior, our method supports many downstream applications, including scene synthesis, interpolation, and single-view reconstruction. Experiments on 3D-FRONT and ScanNet show that our method outperforms the state of the art in single-view reconstruction and achieves state-of-the-art results in scene synthesis compared with baselines that require 3D supervision.
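
To make the idea of decoding a latent vector into a sequence of objects more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical category list, random untrained weights in place of a learned network, a six-number box parameterization (center plus size), and a stop token to end the sequence. Mesh decoding and the 2D-supervised training are omitted entirely.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical label set for illustration; "<stop>" ends the sequence.
CATEGORIES = ["bed", "chair", "table", "<stop>"]


@dataclass
class SceneObject:
    category: str
    box: tuple  # assumed layout: (cx, cy, cz, sx, sy, sz)


def make_toy_weights(n_out, n_in, rng):
    """Random matrix standing in for trained decoder weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def decode_scene(latent, max_objects=8, seed=0):
    """Autoregressively emit objects from a scene latent vector.

    Each step predicts a category and a box from the current state, then
    folds the emitted box back into the state, so later objects are
    conditioned on earlier ones.
    """
    rng = random.Random(seed)
    dim = len(latent)
    w_cls = make_toy_weights(len(CATEGORIES), dim, rng)
    w_box = make_toy_weights(6, dim, rng)
    w_state = make_toy_weights(dim, dim, rng)
    w_feedback = make_toy_weights(dim, 6, rng)

    state = list(latent)
    objects = []
    for _ in range(max_objects):
        logits = linear(w_cls, state)
        best = max(range(len(logits)), key=logits.__getitem__)
        if CATEGORIES[best] == "<stop>":
            break
        raw = linear(w_box, state)
        # exp keeps the three size entries strictly positive.
        box = tuple(raw[:3]) + tuple(math.exp(r) for r in raw[3:])
        objects.append(SceneObject(CATEGORIES[best], box))
        # Feed the emitted box back into the next state.
        mixed = [a + b for a, b in zip(linear(w_state, state), linear(w_feedback, box))]
        state = [math.tanh(m) for m in mixed]
    return objects


if __name__ == "__main__":
    for obj in decode_scene([0.1, -0.2, 0.3, 0.4]):
        print(obj)
```

In this sketch, sampling a different latent vector yields a different object sequence, which is the sense in which a decoder of this kind can act as a scene prior; a real model would replace the random weights with learned ones.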


