A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes

08/11/2021
by   Davide Menini, et al.

This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic labels. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse the depth over frames with suitable semantic labels in the scene space. Our approach exploits the joint volumetric representation of the depth and semantics in the scene feature space to solve this task. For a compelling online fusion of the semantic labels and geometry in real time, we introduce an efficient vortex pooling block while dropping the routing network in online depth fusion to preserve high-frequency surface details. We show that the context information provided by the semantics of the scene helps the depth fusion network learn noise-resistant features. It also helps overcome the shortcomings of the current online depth fusion method in dealing with thin object structures, thickening artifacts, and false surfaces. Experimental evaluation on the Replica dataset shows that our approach can perform depth fusion at 37, 10 frames per second with an average reconstruction F-score of 88. Our model shows an average IoU score of 0.515 on the ScanNet 3D semantic benchmark leaderboard.
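For readers unfamiliar with the semantic metric quoted in the abstract, the following is a minimal sketch of how a mean intersection-over-union (IoU) score is commonly computed from per-element class labels. It is not the benchmark's evaluation script or the authors' code: the function name `mean_iou`, the flat label sequences, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes present in pred or target.

    Hypothetical helper for illustration; labels are assumed to be
    integers in the range [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # elements where both agree on class c
    pred_count = [0] * num_classes    # elements predicted as class c
    target_count = [0] * num_classes  # elements labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy labels for four elements (e.g. voxels or mesh vertices), two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. A leaderboard score such as the one above would be obtained over a full test set with the benchmark's own class list and evaluation rules.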


