Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention

10/17/2022
by Ashutosh Agarwal, et al.

Monocular Depth Estimation (MDE) aims to predict pixel-wise depth from a single RGB image. For both convolutional and the more recent attention-based models, encoder-decoder architectures have proven useful because the task requires global context and pixel-level resolution simultaneously. Typically, a skip connection module fuses the encoder and decoder features by concatenating the feature maps and applying a convolution. Inspired by the demonstrated benefits of attention in a multitude of computer vision problems, we propose an attention-based fusion of encoder and decoder features. We pose MDE as a pixel query refinement problem, where coarsest-level encoder features are used to initialize pixel-level queries, which are then refined to higher resolutions by the proposed Skip Attention Module (SAM). We formulate the prediction problem as ordinal regression over the bin centers that discretize the continuous depth range, and introduce a Bin Center Predictor (BCP) module that predicts bins at the coarsest level using pixel queries. Apart from the benefit of image-adaptive depth binning, the proposed design helps learn an improved depth embedding in the initial pixel queries via direct supervision from the ground truth. Extensive experiments on the two canonical datasets, NYUV2 and KITTI, show that our architecture outperforms the state of the art by 5.3%, along with a 9.4% improvement in cross-dataset generalization performance. Code is available at https://github.com/ashutosh1807/PixelFormer.git.
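
The abstract does not spell out how a depth value is read out from the predicted bins. The sketch below shows one common adaptive-binning formulation, offered as an assumption rather than the authors' implementation: predicted bin widths are normalized to partition a depth range, and each pixel's depth is the probability-weighted sum of the resulting bin centers. The function names (`bin_centers`, `expected_depth`), the softmax normalization, and the depth range of 0 to 10 are all hypothetical choices made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(width_logits, d_min, d_max):
    """Map unnormalized bin widths to bin centers that partition [d_min, d_max].

    Assumption: widths are normalized with a softmax so they are positive
    and sum to 1; the source does not state which normalization is used.
    """
    widths = softmax(width_logits)
    centers = []
    left_edge = d_min
    for w in widths:
        span = w * (d_max - d_min)
        centers.append(left_edge + span / 2.0)
        left_edge += span
    return centers


def expected_depth(pixel_logits, centers):
    """Depth for one pixel: probability-weighted sum of bin centers (assumed readout)."""
    probs = softmax(pixel_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Four equally wide bins over a hypothetical 0-10 depth range.
centers = bin_centers([0.0, 0.0, 0.0, 0.0], d_min=0.0, d_max=10.0)

# A pixel with no preference among bins lands at the middle of the range.
uniform_depth = expected_depth([0.0, 0.0, 0.0, 0.0], centers)

# A pixel strongly favoring the last bin lands near that bin's center.
far_depth = expected_depth([0.0, 0.0, 0.0, 20.0], centers)
```

Because the bin widths come from the image itself, the centers shift per image, which is what makes the binning image-adaptive in this sketch.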


