Learning Depth from Monocular Videos Using Synthetic Data: A Temporally-Consistent Domain Adaptation Approach

07/16/2019
by   Yipeng Mou, et al.
1

Majority of state-of-the-art monocular depth estimation methods are supervised learning approaches. The success of such approaches heavily depends on the high-quality depth labels which are expensive to obtain. Recent methods try to learn depth networks by exploring unsupervised cues from monocular videos which are easier to acquire but less reliable. In this paper, we propose to resolve this dilemma by transferring knowledge from synthetic videos with easily obtainable ground truth depth labels. Due to the stylish difference between synthetic and real images, we propose a temporally-consistent domain adaptation (TCDA) approach that simultaneously explores labels in the synthetic domain and temporal constraints in the videos to improve style transfer and depth prediction. Furthermore, we make use of the ground truth optical flow and pose information in the synthetic data to learn moving mask and pose prediction networks. The learned moving masks can filter out moving regions that produces erroneous temporal constraints and the estimated poses provide better initializations for estimating temporal constraints. The experimental results demonstrate the effectiveness of our method and comparable performance against state-of-the-art.

READ FULL TEXT

page 1

page 3

page 4

page 6

page 7

page 8

research
04/03/2019

Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

Supervised depth estimation has achieved high accuracy due to the advanc...
research
02/27/2020

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

Leveraging synthetically rendered data offers great potential to improve...
research
09/19/2022

3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling

For monocular depth estimation, acquiring ground truths for real data is...
research
02/26/2019

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

While learning based depth estimation from images/videos has achieved su...
research
12/17/2019

Learning from Synthetic Animals

Despite great success in human parsing, progress for parsing other defor...
research
01/16/2019

Lightweight Markerless Monocular Face Capture with 3D Spatial Priors

We present a simple lightweight markerless facial performance capture fr...
research
05/04/2022

ShoeRinsics: Shoeprint Prediction for Forensics with Intrinsic Decomposition

Shoe tread impressions are one of the most common types of evidence left...

Please sign up or login with your details

Forgot password? Click here to reset