VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling

10/20/2022
by Beiwen Tian, et al.

3D scene parsing with deep learning has recently become a hot topic. However, current fully supervised methods require manually annotated point-wise supervision, which is extremely user-unfriendly and time-consuming to obtain. Training 3D scene parsing models with sparse supervision is therefore an intriguing alternative. We term this task data-efficient 3D scene parsing and propose an effective two-stage framework named VIBUS to address it by exploiting the enormous number of unlabeled points. In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function. The loss function is derived from an information bottleneck objective imposed on scenes under different viewpoints, making the process of representation learning free of degradation and sampling. In the second stage, pseudo labels are harvested from the sparse labels based on uncertainty-spectrum modeling. By combining data-driven uncertainty measures and 3D mesh spectrum measures (derived from normal directions and geodesic distances), a robust local affinity metric is obtained. Finite gamma/beta mixture models are used to decompose category-wise distributions of these measures, leading to automatic selection of thresholds. We evaluate VIBUS on the public ScanNet benchmark and achieve state-of-the-art results on both the validation set and the online test server. Ablation studies show that both Viewpoint Bottleneck and uncertainty-spectrum modeling bring significant improvements. Code and models are publicly available at https://github.com/AIR-DISCOVER/VIBUS.
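
The abstract does not state the formula of the Viewpoint Bottleneck loss, so the following is only an illustrative sketch of one way a viewpoint-based objective can avoid both collapsed ("degraded") features and negative sampling. It assumes a redundancy-reduction form: the cross-correlation matrix between per-point features of the same scene under two viewpoints is pushed toward the identity. The function name `viewpoint_bottleneck_loss`, the parameter `off_diag_weight`, and its default value are hypothetical and are not taken from the VIBUS code base.

```python
import numpy as np


def viewpoint_bottleneck_loss(feat_a, feat_b, off_diag_weight=5e-3, eps=1e-6):
    """Illustrative cross-correlation loss between two views (assumed form).

    feat_a, feat_b: arrays of shape (num_points, dim) holding features of the
    same points observed under two different viewpoints.
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    if feat_a.ndim != 2 or feat_a.shape != feat_b.shape:
        raise ValueError("expected two arrays of identical shape (num_points, dim)")

    num_points = feat_a.shape[0]

    # Standardise every feature channel across points, separately per view.
    z_a = (feat_a - feat_a.mean(axis=0)) / (feat_a.std(axis=0) + eps)
    z_b = (feat_b - feat_b.mean(axis=0)) / (feat_b.std(axis=0) + eps)

    # (dim, dim) cross-correlation matrix between the two views.
    corr = z_a.T @ z_b / num_points
    diag = np.diagonal(corr)

    # Diagonal -> 1: each channel should agree across viewpoints.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Off-diagonal -> 0: different channels should not be redundant.
    off_diag = np.sum(corr ** 2) - np.sum(diag ** 2)

    return float(on_diag + off_diag_weight * off_diag)


# Usage with synthetic features (a hypothetical stand-in for network outputs).
rng = np.random.default_rng(0)
features = rng.normal(size=(1024, 16))
loss_same_view = viewpoint_bottleneck_loss(features, features)
loss_unrelated = viewpoint_bottleneck_loss(features, rng.normal(size=(1024, 16)))
print(loss_same_view < loss_unrelated)
```

In this sketch no negative pairs are drawn: the only inputs are two views of the same points, and a constant (collapsed) feature would fail to produce unit diagonal correlation, which is why this family of objectives is often described as free of sampling and degenerate solutions.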


research
09/17/2021

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck

Semantic understanding of 3D point clouds is important for various robot...
research
04/03/2020

Self-Supervised Viewpoint Learning From Image Collections

Training deep neural networks to estimate the viewpoint of objects requi...
research
03/11/2019

GOGGLES: Automatic Training Data Generation with Affinity Coding

Generating large labeled training data is becoming the biggest bottlenec...
research
07/22/2022

Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

Adversarial training (AT) for robust representation learning and self-su...
research
08/21/2021

SSR: Semi-supervised Soft Rasterizer for single-view 2D to 3D Reconstruction

Recent work has made significant progress in learning object meshes with...
research
07/26/2018

Unified Perceptual Parsing for Scene Understanding

Humans recognize the visual world at multiple levels: we effortlessly ca...
