Multi-Frame Self-Supervised Depth with Transformers

04/15/2022
by Vitor Guizilini, et al.

Multi-frame depth estimation improves over single-frame approaches by also leveraging geometric relationships between images via feature matching, in addition to learning appearance-based features. In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. We use depth-discretized epipolar sampling to select matching candidates, and refine predictions through a series of self- and cross-attention layers. These layers sharpen the matching probability between pixel features, improving over standard similarity metrics prone to ambiguities and local minima. The refined cost volume is decoded into depth estimates, and the whole pipeline is trained end-to-end from videos using only a photometric objective. Experiments on the KITTI and DDAD datasets show that our DepthFormer architecture establishes a new state of the art in self-supervised monocular depth estimation, and is even competitive with highly specialized supervised single-frame architectures. We also show that our learned cross-attention network yields representations transferable across datasets, increasing the effectiveness of pre-training strategies. Project page: https://sites.google.com/tri.global/depthformer
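As a rough illustration of the depth-discretized epipolar sampling described in the abstract, the following NumPy sketch projects one target pixel into a context view at each hypothesized depth and then scores the sampled candidates. It is not the authors' implementation: the function names, the inverse-depth bin spacing, and the pinhole intrinsics K with relative pose (R, t) are assumptions made for illustration, and the scaled dot-product softmax stands in for a plain similarity metric rather than the learned self- and cross-attention layers.

```python
import numpy as np

def depth_bins(d_min, d_max, num_bins):
    # Hypothetical choice: bins spaced uniformly in inverse depth.
    inv = np.linspace(1.0 / d_min, 1.0 / d_max, num_bins)
    return 1.0 / inv

def epipolar_candidates(pixel, depths, K, R, t):
    """Project one target pixel, lifted to each depth bin, into the context view.

    K is a 3x3 pinhole intrinsics matrix; (R, t) maps target-camera points
    into the context-camera frame. Returns (D, 2) pixel coordinates that lie
    along the epipolar line of the target pixel.
    """
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, target frame
    points = depths[:, None] * ray[None, :]          # (D, 3) points along the ray
    points_ctx = points @ R.T + t                    # (D, 3) in the context frame
    proj = points_ctx @ K.T                          # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]

def matching_distribution(query, candidates, temperature=1.0):
    """Softmax over scaled dot-product similarity between one target feature
    (C,) and the D candidate features (D, C) sampled along the epipolar line."""
    scores = candidates @ query / (np.sqrt(query.shape[0]) * temperature)
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

Stacking the per-pixel distributions over all target pixels gives a cost volume with one channel per depth bin. In practice the candidate features would be gathered by bilinear interpolation at the returned coordinates, which this sketch leaves out.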

research
04/25/2023

Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation

Although both self-supervised single-frame and multi-frame depth estimat...
research
04/07/2023

DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium

Self-supervised multi-frame depth estimation achieves high accuracy by c...
research
01/14/2023

Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes

Self-supervised methods have shown promising results on depth estimatio...
research
08/30/2022

SSORN: Self-Supervised Outlier Removal Network for Robust Homography Estimation

The traditional homography estimation pipeline consists of four main ste...
research
03/26/2023

Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes

Multi-frame methods improve monocular depth estimation over single-frame...
research
02/28/2023

Monocular Depth Estimation using Diffusion Models

We formulate monocular depth estimation using denoising diffusion models...
research
10/08/2022

Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation

Monocular depth estimation (MDE) in the self-supervised scenario has eme...
