Instance-wise Depth and Motion Learning from Monocular Videos

12/19/2019
by   Seokju Lee, et al.

We present an end-to-end joint training framework that explicitly models the 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without supervision. The only annotation used in our pipeline is a video instance segmentation map, which can be predicted by our new auto-annotation scheme. Our technical contributions are threefold. First, we propose a differentiable forward rigid projection module that plays a key role in our instance-wise depth and motion learning. Second, we design an instance-wise photometric and geometric consistency loss that effectively decomposes background and moving object regions. Lastly, we introduce an instance-wise mini-batch re-arrangement scheme that does not require additional iterations in training. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI dataset, our framework is shown to outperform state-of-the-art depth and motion estimation methods.
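
For readers unfamiliar with the term, the geometry behind a rigid projection can be illustrated with a short NumPy sketch. This is not the authors' module: it is not differentiable, it does no warping or occlusion handling, and it only computes where each reference pixel lands after a rigid motion under a standard pinhole camera model. The function name `rigid_project` and all parameter names are hypothetical, and depth is assumed to be strictly positive.

```python
import numpy as np


def rigid_project(depth, K, R, t):
    """Map every pixel of a reference frame into a second view.

    Hypothetical helper for illustration only (not from the paper).
    depth: (H, W) per-pixel depth of the reference frame, assumed > 0
    K:     (3, 3) pinhole camera intrinsics
    R, t:  (3, 3) rotation and (3,) translation of the rigid motion
    Returns (2, H, W) projected pixel coordinates (u, v) and (H, W) new depth.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, one column per pixel: rows are u, v, 1.
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pixels = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)

    # Back-project to 3D camera coordinates: X = D(p) * K^-1 * p.
    points = (np.linalg.inv(K) @ pixels) * depth.reshape(1, -1)

    # Apply the 6-DoF rigid motion (rotation followed by translation).
    moved = R @ points + np.asarray(t, dtype=np.float64).reshape(3, 1)

    # Perspective projection back onto the image plane.
    projected = K @ moved
    new_depth = projected[2]
    coords = projected[:2] / new_depth
    return coords.reshape(2, h, w), new_depth.reshape(h, w)


# Toy example: constant depth of 2 and a pure sideways translation of 0.1.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
coords, new_depth = rigid_project(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
# Each pixel moves by fx * tx / depth = 100 * 0.1 / 2 = 5 pixels in u.
```

In an instance-wise setting, one would presumably apply a separate `(R, t)` to the pixels of each object mask and the ego-motion to the background, but how the paper combines these is not described in the abstract.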

research
02/04/2021

Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

We present an end-to-end joint training framework that explicitly models...
research
03/02/2022

Instance-aware multi-object self-supervision for monocular depth prediction

This paper proposes a self-supervised monocular image-to-depth predictio...
research
10/13/2021

Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation

Estimating the motion of the camera together with the 3D structure of th...
research
09/12/2018

End-to-end depth from motion with stabilized monocular videos

We propose a depth map inference system from monocular videos based on a...
research
11/29/2021

Instance-wise Occlusion and Depth Orders in Natural Scenes

In this paper, we introduce a new dataset, named InstaOrder, that can be...
research
05/25/2021

Unsupervised Scale-consistent Depth Learning from Video

We propose a monocular depth estimator SC-Depth, which requires only unl...
research
08/15/2022

Uni6Dv2: Noise Elimination for 6D Pose Estimation

Few prior 6D pose estimation methods use a backbone network to extract f...
