Improving 360 Monocular Depth Estimation via Non-local Dense Prediction Transformer and Joint Supervised and Self-supervised Learning

09/22/2021
by IlWi Yun, et al.

Due to difficulties in acquiring ground-truth depth for equirectangular (360) images, the quality and quantity of equirectangular depth data available today are insufficient to represent the variety of scenes in the world. Therefore, 360 depth estimation studies that rely solely on supervised learning are destined to produce unsatisfactory results. Although self-supervised learning methods focusing on equirectangular images (EIs) have been introduced, they often have incorrect or non-unique solutions, causing unstable performance. In this paper, we propose 360 monocular depth estimation methods that improve on the areas that limited previous studies. First, we introduce a self-supervised 360 depth learning method that only utilizes gravity-aligned videos, which has the potential to eliminate the need for depth data during training. Second, we propose a joint learning scheme that combines supervised and self-supervised learning, so that the weaknesses of each are compensated by the other, leading to more accurate depth estimation. Third, we propose a non-local fusion block, which retains the global information encoded by the vision transformer when reconstructing depths. With the proposed methods, we successfully apply the transformer to 360 depth estimation, which, to the best of our knowledge, has not been tried before. On several benchmarks, our approach achieves significant improvements over previous works and establishes a new state of the art.
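The idea of joint supervised and self-supervised learning can be illustrated as a weighted sum of two loss terms. The sketch below is a minimal, hypothetical version and not the paper's actual formulation. It assumes an L1 depth loss evaluated only on pixels with valid ground truth, a plain L1 photometric loss between the target frame and a view that has already been synthesized (warped) from a neighbouring video frame using the predicted depth, and a hypothetical balancing weight `lam`. The equirectangular warping itself is not shown, and all function names are illustrative.

```python
import numpy as np


def supervised_depth_loss(pred, gt, valid):
    """Mean absolute depth error over pixels where ground truth exists."""
    diff = np.abs(pred - gt)[valid]  # boolean mask keeps only labelled pixels
    return float(diff.mean()) if diff.size else 0.0  # no labels -> no supervised signal


def photometric_loss(target, synthesized):
    """Mean absolute intensity difference between the target frame and a view
    synthesized from a neighbouring frame (assumed precomputed elsewhere from
    the predicted depth and relative camera pose)."""
    return float(np.abs(target - synthesized).mean())


def joint_loss(pred, gt, valid, target, synthesized, lam=1.0):
    """Hypothetical joint objective: supervised term + lam * self-supervised term."""
    return supervised_depth_loss(pred, gt, valid) + lam * photometric_loss(target, synthesized)


if __name__ == "__main__":
    h, w = 4, 8  # toy equirectangular resolution (width = 2 * height)
    pred = np.ones((h, w))
    gt = np.full((h, w), 2.0)
    gt[0] = 0.0  # pretend the top row has no ground-truth depth
    valid = gt > 0
    target = np.zeros((h, w, 3))
    synthesized = np.full((h, w, 3), 0.5)
    print(joint_loss(pred, gt, valid, target, synthesized, lam=2.0))
```

Because the supervised term only sees pixels with ground truth, frames or regions without depth labels still contribute through the photometric term; this is one plausible reading of how the two kinds of learning can compensate for each other.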
