Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes

01/14/2023
by Songchun Zhang, et al.

Self-supervised methods have shown promising results on the depth estimation task. However, previous methods estimate the target depth map and camera ego-motion simultaneously, underusing multi-frame correlation information and ignoring the motion of dynamic objects. In this paper, we propose a novel Dyna-DepthFormer framework, which jointly predicts scene depth and the 3D motion field and aggregates multi-frame information with a transformer. Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers to obtain an enhanced depth feature representation. Specifically, we use a perspective transformation to acquire the initial reference points, and use deformable attention to reduce the computational cost. Second, we propose a warping-based Motion Network that estimates the motion field of dynamic objects without relying on semantic priors. To improve the motion field predictions, we introduce an iterative optimization strategy together with a sparsity-regularized loss. The entire pipeline is trained end-to-end in a self-supervised manner by constructing a minimum reprojection loss. Extensive experiments on the KITTI and Cityscapes benchmarks demonstrate the effectiveness of our method, which outperforms state-of-the-art algorithms.
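The first contribution hinges on seeding deformable attention with reference points obtained by perspective-warping target pixels into the source views. Below is a minimal PyTorch sketch of that standard warp (backproject each pixel with its predicted depth, apply the relative pose, reproject into the source image); the function name, tensor layout, and clamping are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def warp_reference_points(depth, K, K_inv, T_tgt_to_src):
    """Project target pixels into a source view to obtain initial reference
    points for deformable cross-attention (a sketch of the standard
    perspective warp; all names and shapes here are assumptions).

    depth:        (B, 1, H, W) predicted target-frame depth
    K, K_inv:     (B, 3, 3) camera intrinsics and their inverse
    T_tgt_to_src: (B, 4, 4) relative pose from target to source frame
    returns:      (B, H, W, 2) sub-pixel reference points in the source image
    """
    B, _, H, W = depth.shape
    device, dtype = depth.device, depth.dtype

    # Pixel grid in homogeneous coordinates: (3, H*W)
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)

    # Back-project to 3D camera points: X = d * K^{-1} p
    cam = K_inv @ pix.unsqueeze(0).expand(B, -1, -1)      # (B, 3, HW)
    cam = cam * depth.reshape(B, 1, -1)

    # Rigid transform into the source frame
    ones = torch.ones(B, 1, H * W, device=device, dtype=dtype)
    src = (T_tgt_to_src @ torch.cat([cam, ones], dim=1))[:, :3]

    # Perspective projection back to pixels (clamp avoids divide-by-zero)
    proj = K @ src
    ref = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)      # (B, 2, HW)
    return ref.permute(0, 2, 1).reshape(B, H, W, 2)
```

In a deformable-attention layer, these sub-pixel locations would serve as the initial sampling positions, with learned offsets refining them; normalizing the returned coordinates by the image width and height would put them in the [0, 1] range that deformable attention implementations typically expect.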


Related research

04/25/2023 · Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation
Although both self-supervised single-frame and multi-frame depth estimat...

04/15/2022 · Multi-Frame Self-Supervised Depth with Transformers
Multi-frame depth estimation improves over single-frame approaches by al...

10/13/2021 · Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
Estimating the motion of the camera together with the 3D structure of th...

07/21/2020 · Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
Photometric loss is widely used for self-supervised depth and egomotion ...

08/04/2019 · Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space
For ego-motion estimation, the feature representation of the scenes is c...

04/27/2020 · Self-Supervised Attention Learning for Depth and Ego-motion Estimation
We address the problem of depth and ego-motion estimation from image seq...

08/29/2019 · Improving Self-Supervised Single View Depth Estimation by Masking Occlusion
Single view depth estimation models can be trained from video footage us...
