Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

07/10/2022
by Ashutosh Agarwal, et al.

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their ability to capture long-range dependencies in an image. However, the benefit of transformers for monocular depth prediction has seldom been explored so far. This paper benchmarks various transformer-based models for the depth estimation task on the indoor NYUV2 dataset and the outdoor KITTI dataset. We propose a novel attention-based architecture, Depthformer, for monocular depth estimation that uses multi-head self-attention to produce multiscale feature maps, which are effectively combined by our proposed decoder network. We also propose a Transbins module that divides the depth range into bins whose center values are estimated adaptively per image. The final depth estimate for each pixel is a linear combination of the bin centers. The Transbins module takes advantage of the global receptive field by using the transformer module in the encoding stage. Experimental results on the NYUV2 and KITTI depth estimation benchmarks demonstrate that our proposed method improves the state-of-the-art by 3.3 available at https://github.com/ashutosh1807/Depthformer.git.
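To make the adaptive-bins idea concrete, here is a minimal NumPy sketch of how per-image bin centers and a per-pixel linear combination of them could produce a depth map. It is not the authors' Transbins code. It assumes an AdaBins-style formulation in which the network emits N bin-width logits per image and N bin scores per pixel. Softmax-normalizing the widths, the names `bin_centers` and `depth_from_bins`, and the 10 m depth range in the usage lines are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def bin_centers(width_logits, d_min, d_max):
    """Hypothetical helper: map per-image width logits [N] to N bin centers.

    Assumption: widths are softmax-normalized so they are positive and
    sum to the full depth range (d_max - d_min).
    """
    widths = softmax(width_logits) * (d_max - d_min)
    # N + 1 bin edges, starting at d_min and accumulating the widths.
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    # Each center is the midpoint between consecutive edges.
    return 0.5 * (edges[:-1] + edges[1:])


def depth_from_bins(pixel_logits, centers):
    """Hypothetical helper: pixel_logits [N, H, W], centers [N] -> depth [H, W].

    Each pixel's depth is a linear (convex) combination of the bin centers,
    weighted by a softmax over that pixel's N bin scores.
    """
    probs = softmax(pixel_logits, axis=0)
    return np.einsum("k,khw->hw", centers, probs)


# Usage with random stand-ins for network outputs (16 bins, 4x5 image).
rng = np.random.default_rng(0)
centers = bin_centers(rng.normal(size=16), d_min=1e-3, d_max=10.0)
depth = depth_from_bins(rng.normal(size=(16, 4, 5)), centers)
print(depth.shape)  # (4, 5)
```

Because the per-pixel weights form a probability distribution, every predicted depth lies between the smallest and largest bin center. The bins themselves shift per image, which is what lets the discretization adapt to each scene.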

research
03/27/2022

DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation

This paper aims to address the problem of supervised monocular depth est...
research
03/06/2023

DwinFormer: Dual Window Transformers for End-to-End Monocular Depth Estimation

Depth estimation from a single image is of paramount importance in the r...
research
04/25/2023

CompletionFormer: Depth Completion with Convolutions and Vision Transformers

Given sparse depths and the corresponding RGB images, depth completion a...
research
12/29/2021

ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama Depth Estimation

Depth estimation is a crucial step for 3D reconstruction with panorama i...
research
10/17/2022

Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention

Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given ...
research
04/03/2022

BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation

Monocular depth estimation is a fundamental task in computer vision and ...
research
03/24/2021

Vision Transformers for Dense Prediction

We introduce dense vision transformers, an architecture that leverages v...