Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation

10/13/2021
by   Seokju Lee, et al.

Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task that often relies on the so-called scene rigidity assumption. When observing a dynamic environment, this assumption is violated, which leads to an ambiguity between the ego-motion of the camera and the motion of the objects. To solve this problem, we present a self-supervised learning framework for 3D object motion field estimation from monocular videos. Our contributions are two-fold. First, we propose a two-stage projection pipeline that explicitly disentangles the camera ego-motion and the object motions with a dynamics attention module, called DAM. Specifically, we design an integrated motion model that estimates the motion of the camera and the objects in the first and second warping stages, respectively, controlled by the attention module through a shared motion encoder. Second, we propose object motion field estimation through contrastive sample consensus, called CSAC, which takes advantage of a weak semantic prior (bounding boxes from an object detector) and geometric constraints (each object respects the rigid-body motion model). Experiments on KITTI, Cityscapes, and the Waymo Open Dataset demonstrate the relevance of our approach and show that our method outperforms state-of-the-art algorithms on self-supervised monocular depth estimation, object motion segmentation, monocular scene flow estimation, and visual odometry.
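The two-stage warping idea above can be illustrated with a minimal geometric sketch: first move every back-projected point by the camera ego-motion, then additionally move the points inside each object's region by that object's rigid motion. The interface below (per-object boolean masks and 4x4 rigid transforms) is a hypothetical simplification for illustration, not the authors' implementation:

```python
import numpy as np

def backproject(depth, K):
    # Lift each pixel (u, v) with depth d to a 3D point in the camera frame:
    # X = d * K^{-1} [u, v, 1]^T.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    return (pix * depth[..., None]) @ np.linalg.inv(K).T  # (h, w, 3)

def two_stage_warp(depth, K, T_ego, obj_masks, T_objs):
    """Stage 1: warp the whole scene by the camera ego-motion T_ego (4x4).
    Stage 2: warp the points inside each object mask by that object's
    rigid motion T_obj (4x4). Returns the reprojected pixel coordinates."""
    pts = backproject(depth, K)
    h, w, _ = pts.shape
    pts_h = np.concatenate([pts, np.ones((h, w, 1))], axis=-1)  # homogeneous
    # Stage 1: ego-motion applied everywhere (static-scene hypothesis).
    pts_h = pts_h @ T_ego.T
    # Stage 2: residual per-object rigid motions, restricted to each mask.
    for mask, T_obj in zip(obj_masks, T_objs):
        pts_h[mask] = pts_h[mask] @ T_obj.T
    # Reproject to pixel coordinates with the pinhole model.
    proj = pts_h[..., :3] @ K.T
    return proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)
```

With identity transforms the warp is the identity map on pixels, which makes the decomposition easy to sanity-check before plugging in learned poses and motions.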

