MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

07/18/2022
by   Runze Li, et al.
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments. However, depth prediction results remain unsatisfactory in indoor scenes, where most existing data are captured with hand-held devices. Compared with outdoor environments, estimating depth from monocular indoor videos with self-supervised methods poses two additional challenges: (i) the depth range of indoor video sequences varies considerably across frames, making it difficult for the depth network to induce consistent depth cues for training; (ii) indoor sequences recorded with hand-held devices often contain much more rotational motion, which makes it difficult for the pose network to predict accurate relative camera poses. In this work, we propose a novel framework, MonoIndoor++, which gives special consideration to these challenges and consolidates a set of good practices for improving the performance of self-supervised monocular depth estimation in indoor environments. First, a depth factorization module with a transformer-based scale regression network is proposed to estimate a global depth scale factor explicitly; the predicted scale factor can indicate the maximum depth values. Second, rather than using a single-stage pose estimation strategy as in previous methods, we propose a residual pose estimation module that estimates relative camera poses across consecutive frames iteratively. Third, to incorporate extensive coordinate guidance into our residual pose estimation module, we propose to perform coordinate convolutional encoding directly over the inputs to the pose networks. The proposed method is validated on a variety of benchmark indoor datasets, i.e., EuRoC MAV, NYUv2, ScanNet and 7-Scenes, demonstrating state-of-the-art performance.
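
The abstract names three components without giving formulas, so the following is only a minimal sketch of the arithmetic they suggest, not the authors' implementation. It assumes (i) absolute depth is a relative depth map in [0, 1] multiplied by one global scale, (ii) residual poses are 4x4 rigid transforms composed by left-multiplication onto an initial pose, and (iii) coordinate encoding means appending normalized x and y channels to the input. All function names here are hypothetical, and the networks that would predict the scale and the poses are omitted.

```python
import math


def factorized_depth(relative_depth, scale):
    """Assumed factorization: absolute depth = global scale * relative depth in [0, 1]."""
    return [[scale * d for d in row] for row in relative_depth]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def compose_poses(initial_pose, residual_poses):
    """Refine an initial relative pose by applying residual poses one after another.

    The left-multiplication order is an assumption made for this sketch.
    """
    pose = initial_pose
    for residual in residual_poses:
        pose = matmul4(residual, pose)
    return pose


def rotation_z_pose(angle_rad):
    """Hypothetical helper: 4x4 pose that rotates about the z axis, no translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def add_coord_channels(image):
    """Append normalized x and y coordinate channels to a C x H x W nested list."""
    height, width = len(image[0]), len(image[0][0])

    def norm(i, n):
        # Map index 0..n-1 to [-1, 1]; a single-pixel axis maps to 0.
        return 0.0 if n == 1 else 2.0 * i / (n - 1) - 1.0

    xs = [[norm(col, width) for col in range(width)] for _ in range(height)]
    ys = [[norm(row, height) for _ in range(width)] for row in range(height)]
    return image + [xs, ys]


if __name__ == "__main__":
    depth = factorized_depth([[0.5, 1.0]], scale=4.0)
    pose = compose_poses(rotation_z_pose(0.1), [rotation_z_pose(0.2)])
    encoded = add_coord_channels([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
    print(depth, pose[0][0], len(encoded))
```

Under these assumptions the largest value in the factorized depth map equals the scale whenever the relative map reaches 1, which is one way to read the claim that the predicted scale factor indicates the maximum depth values. Splitting a large rotation into an initial estimate plus small residuals likewise means each individual step only has to explain a small motion.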

research
07/26/2021

MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

Self-supervised depth estimation for indoor environments is more challen...
research
11/19/2018

Indoor GeoNet: Weakly Supervised Hybrid Learning for Depth and Pose Estimation

Humans naturally perceive a 3D scene in front of them through accumulati...
research
11/20/2019

Unsupervised Monocular Depth Prediction for Indoor Continuous Video Streams

This paper studies unsupervised monocular depth prediction problem. Most...
research
12/14/2022

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Training a Neural Radiance Field (NeRF) without pre-computed camera pose...
research
04/03/2022

Distortion-Aware Self-Supervised 360° Depth Estimation from A Single Equirectangular Projection Image

360 images are widely available over the last few years. This paper prop...
research
05/13/2020

Self-Supervised Deep Visual Odometry with Online Adaptation

Self-supervised VO methods have shown great success in jointly estimatin...
research
11/10/2020

SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments

We present a novel algorithm for self-supervised monocular depth complet...
