Transformers Solve the Limited Receptive Field for Monocular Depth Prediction

03/22/2021
by Guanglei Yang, et al.

While convolutional neural networks have had a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation. Transformers, initially designed for natural language processing tasks, have emerged as alternative architectures with an innate global self-attention mechanism to capture long-range dependencies. In this paper, we propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. To prevent the network from losing its ability to capture local-level details due to the adoption of transformers, we propose a novel decoder that employs attention mechanisms based on gates. Notably, this is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation). Extensive experiments demonstrate that the proposed TransDepth achieves state-of-the-art performance on three challenging datasets. The source code and trained models are available at https://github.com/ygjwd12345/TransDepth.
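For readers who want a concrete picture of the hybrid design the abstract describes, the sketch below shows one way to combine a convolutional stem (local detail), a transformer encoder over flattened feature tokens (long-range context), and a gated skip connection in the decoder. This is a minimal PyTorch illustration under assumed shapes and layer sizes, not the authors' TransDepth implementation: the module names (TinyTransDepth, AttentionGate) and the additive gating form are hypothetical, and the actual code is in the linked repository.

# Minimal sketch of a hybrid CNN + transformer depth network with a gated
# decoder. Layer names, channel sizes, and the gating formulation are
# illustrative assumptions, not the authors' exact TransDepth code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate (hypothetical form): re-weights a skip
    feature map using a gating signal before it enters the decoder."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, gate):
        gate = F.interpolate(gate, size=skip.shape[-2:], mode="bilinear",
                             align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.w_skip(skip) + self.w_gate(gate))))
        return skip * attn  # local details kept where attention is high

class TinyTransDepth(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        # CNN stem: local features at 1/4 resolution
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer over flattened feature tokens: global context
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.gate = AttentionGate(skip_ch=dim, gate_ch=dim, inter_ch=dim // 2)
        self.head = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 1, 1),  # one continuous depth value per pixel
        )

    def forward(self, x):
        feat = self.stem(x)                       # (B, C, H/4, W/4), local
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, HW, C)
        glob = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        fused = self.gate(feat, glob) + glob      # gated skip restores detail
        depth = self.head(fused)
        return F.interpolate(depth, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

if __name__ == "__main__":
    net = TinyTransDepth()
    out = net(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 1, 128, 128])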


Related Research

06/04/2021 · Glance-and-Gaze Vision Transformer
Recently, there emerges a series of vision Transformers, which show supe...

10/07/2021 · TranSalNet: Towards perceptually relevant visual saliency prediction
Convolutional neural networks (CNNs) have significantly advanced computa...

04/12/2021 · LocalViT: Bringing Locality to Vision Transformers
We study how to introduce locality mechanisms into vision transformers. ...

03/01/2021 · OmniNet: Omnidirectional Representations from Transformers
This paper proposes Omnidirectional Representations from Transformers (O...

10/11/2021 · Leveraging Transformers for StarCraft Macromanagement Prediction
Inspired by the recent success of transformers in natural language proce...

07/23/2019 · Compact Global Descriptor for Neural Networks
Long-range dependencies modeling, widely used in capturing spatiotempora...

01/12/2022 · Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning
It is a challenging task to learn rich and multi-scale spatiotemporal se...

Code Repositories

TransDepth
Code for "Transformers Solve the Limited Receptive Field for Monocular Depth Prediction"
https://github.com/ygjwd12345/TransDepth