MonoFormer: Towards Generalization of Self-Supervised Monocular Depth Estimation with Transformers

05/23/2022
by   Jinwoo Bae, et al.

Self-supervised monocular depth estimation has been widely studied recently. Most of the work has focused on improving performance on benchmark datasets, such as KITTI, but few experiments have examined generalization performance. In this paper, we investigate backbone networks (e.g., CNNs, Transformers, and CNN-Transformer hybrid models) with respect to the generalization of monocular depth estimation. We first evaluate state-of-the-art models on diverse public datasets that are never seen during network training. Next, we investigate the effects of texture-biased and shape-biased representations using various texture-shifted datasets that we generated. We observe that Transformers exhibit a strong shape bias, whereas CNNs exhibit a strong texture bias. We also find that shape-biased models show better generalization performance for monocular depth estimation than texture-biased models. Based on these observations, we design a CNN-Transformer hybrid network with a multi-level adaptive feature fusion module, called MonoFormer. The design intuition behind MonoFormer is to increase shape bias by employing Transformers while compensating for their weak locality bias by adaptively fusing multi-level representations. Extensive experiments show that the proposed method achieves state-of-the-art performance on various public datasets and the best generalization ability among competing methods.
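The abstract does not give implementation details of the multi-level adaptive feature fusion module, but one common way to realize "adaptively fusing multi-level representations" is a learned softmax weighting over feature maps taken from different network depths. The sketch below is an illustrative assumption, not the authors' code: the names `adaptive_fusion` and `level_logits`, and the choice of a single global weight per level, are hypothetical simplifications (real modules typically predict spatially varying weights).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the level axis.
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_fusion(features, level_logits):
    """Fuse multi-level feature maps with learned per-level weights.

    features: (L, C, H, W) array -- L feature maps from different depths,
              assumed already projected to a common channel size C.
    level_logits: (L,) learnable scores; softmax turns them into
                  fusion weights that sum to 1.
    """
    weights = softmax(level_logits)            # (L,)
    # Contract the level axis: sum_l weights[l] * features[l]
    return np.tensordot(weights, features, 1)  # (C, H, W)

# Toy usage: three levels, 4 channels, 8x8 spatial resolution.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 8, 8))
fused = adaptive_fusion(feats, np.array([0.5, 1.0, -0.2]))
print(fused.shape)  # (4, 8, 8)
```

Because the weights are normalized with a softmax, the module can smoothly shift emphasis between shallow (local, texture-rich) and deep (global, shape-rich) levels during training.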


Related research

01/09/2023  A Study on the Generality of Neural Network Structures for Monocular Depth Estimation
Monocular depth estimation has been widely studied, and significant impr...

02/02/2023  Domain Generalization Emerges from Dreaming
Recent studies have proven that DNNs, unlike human vision, tend to explo...

08/06/2022  MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer
Self-supervised monocular depth estimation is an attractive solution tha...

02/20/2023  GlocalFuse-Depth: Fusing Transformers and CNNs for All-day Self-supervised Monocular Depth Estimation
In recent years, self-supervised monocular depth estimation has drawn mu...

08/16/2023  Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN
Monocular depth estimation is an ongoing challenge in computer vision. R...

04/10/2022  DILEMMA: Self-Supervised Shape and Texture Learning with Transformers
There is a growing belief that deep neural networks with a shape bias ma...

05/30/2022  SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth Estimation with Learnable Patchmatch
Unsupervised monocular trained depth estimation models make use of adjac...
