Self-Supervised Relative Depth Learning for Urban Scene Understanding

12/13/2017
by Huaizu Jiang, et al.

As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, far away mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we train a deep network, using fully automatic supervision, to predict relative scene depth from single images. The depth training images are derived automatically from simple videos of cars moving through a scene, using classic depth-from-motion techniques and no human-provided labels. We show that this pretext task of predicting depth from a single image induces features in the network that yield large improvements, over a network trained from scratch, on a set of downstream tasks including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation. In particular, our pre-trained model outperforms an ImageNet counterpart on the monocular depth estimation task. Unlike work that analyzes video paired with additional information about direction of motion, our agent learns from "raw egomotion" video recorded from cars moving through the world. Unlike methods that require videos of moving objects, we neither depend on, nor are disrupted by, moving objects in the video. Indeed, we can benefit from predicting depth in the videos associated with various downstream tasks, showing that we can adapt to new scenes in an unsupervised manner to improve performance. By doing so, we achieve consistently better results across several different urban scene understanding tasks, obtaining results that are competitive with state-of-the-art methods for monocular depth estimation.
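To make the automatic supervision signal concrete, here is a minimal sketch of how a pseudo relative-depth map could be derived from two consecutive frames of a driving video by inverting the optical-flow magnitude, following the idea that apparent motion falls off roughly as 1/depth for a translating camera. This is not the authors' pipeline: the function name, the choice of Farneback flow, and the eps constant are illustrative assumptions standing in for the classic depth-from-motion technique used in the paper.

```python
# Sketch only: pseudo relative-depth labels from the inverse of flow magnitude.
import cv2
import numpy as np

def pseudo_relative_depth(frame_t, frame_t1, eps=1e-3):
    """Return a per-pixel relative-depth proxy from dense optical flow.

    frame_t, frame_t1: consecutive grayscale frames (H x W uint8 arrays).
    The result is only defined up to an unknown scale, so it is normalized.
    """
    # Dense optical flow between the two frames (Farneback); an illustrative
    # stand-in for the classic depth-from-motion method used in the paper.
    flow = cv2.calcOpticalFlowFarneback(
        frame_t, frame_t1, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)   # pixels of apparent motion
    depth = 1.0 / (magnitude + eps)            # near scene -> large flow -> small depth
    return depth / depth.max()                 # relative depth in (0, 1]
```

Maps produced this way could then serve as regression (or ranking) targets for a single-image depth network; since the labels are only relative, the downstream network learns the scale-free depth ordering of the scene rather than metric depth.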


