Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

02/04/2021
by Seokju Lee, et al.

We present an end-to-end joint training framework that explicitly models the 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection when modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we design a unified instance-aware photometric and geometric consistency loss that holistically imposes self-supervisory signals on every background and object region. Lastly, we introduce a general-purpose auto-annotation scheme that uses any off-the-shelf instance segmentation and optical flow models to produce video instance segmentation maps, which serve as input to our training pipeline. These proposed elements are validated in a detailed ablation study. In extensive experiments on the KITTI and Cityscapes datasets, our framework outperforms state-of-the-art depth and motion estimation methods. Our code, dataset, and models are available at https://github.com/SeokjuLee/Insta-DM.
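
To make the notion of forward projection concrete, the following is a minimal NumPy sketch, not the authors' neural forward projection module. It lifts every pixel of a depth map to 3D, applies a rigid motion, and scatters the moved points into the target view, keeping the nearest depth per pixel. The function names (`backproject`, `forward_warp_depth`), the nearest-pixel splatting, and the z-buffer are all illustrative assumptions; the module described in the abstract is learned and differentiable, which this sketch is not.

```python
import numpy as np


def backproject(depth, K):
    """Lift each pixel (u, v) with depth D to a 3D point D * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    return (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)


def forward_warp_depth(depth, K, R, t):
    """Forward-project a depth map under a rigid motion (R, t).

    Hypothetical helper: source pixels are scattered to their target
    locations (nearest pixel), and a z-buffer keeps the closest point
    when several land on the same pixel. Pixels that receive no point
    are holes and are set to 0.
    """
    h, w = depth.shape
    pts = R @ backproject(depth, K) + t.reshape(3, 1)  # moved 3D points
    cam = K @ pts
    z = cam[2]
    safe_z = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
    u = np.rint(cam[0] / safe_z).astype(int)
    v = np.rint(cam[1] / safe_z).astype(int)
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    out = np.full((h, w), np.inf)
    np.minimum.at(out, (v[valid], u[valid]), z[valid])  # z-buffer scatter
    out[np.isinf(out)] = 0.0  # holes: no source pixel landed here
    return out
```

Inverse projection would instead start from target pixels and sample the source view at the projected coordinates. Forward projection scatters source pixels to where they land, which is why it needs explicit handling of collisions and holes.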
