Distilled Semantics for Comprehensive Scene Understanding from Videos

03/31/2020
by Fabio Tosi, et al.

A comprehensive understanding of the surroundings is paramount for autonomous systems. Recent works have shown that deep neural networks can learn geometry (depth) and motion (optical flow) from monocular video without any explicit supervision from ground-truth annotations, which are particularly hard to source for these two tasks. In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics, with supervision for the latter provided by a pre-trained network that distills proxy ground-truth images. We address the three tasks jointly through a) a novel training protocol based on knowledge distillation and self-supervision and b) a compact network architecture that enables efficient scene understanding on both power-hungry GPUs and low-power embedded platforms. We thoroughly assess the performance of our framework and show that it yields state-of-the-art results for monocular depth estimation, optical flow, and motion segmentation.
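The two supervision signals mentioned above (proxy semantic labels distilled from a pre-trained network, and self-supervision from video) can be illustrated with a minimal NumPy sketch. It assumes the teacher's per-pixel argmax is used as hard proxy ground truth and uses a plain L1 photometric term between a target frame and a view reconstructed from predicted depth/pose or flow. The abstract does not specify the exact loss formulations, so the function names (`distillation_loss`, `photometric_loss`) and both loss choices are hypothetical simplifications, not the authors' method.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable log-softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def distillation_loss(student_logits: np.ndarray, teacher_logits: np.ndarray) -> float:
    """Hypothetical distillation term: cross-entropy of the student against
    the teacher's per-pixel argmax (hard proxy labels).
    Both inputs have shape (C, H, W): C classes over an H x W image."""
    proxy_labels = teacher_logits.argmax(axis=0)             # (H, W) proxy ground truth
    log_probs = log_softmax(student_logits, axis=0)           # (C, H, W)
    # Select the student's log-probability of the proxy class at every pixel.
    picked = np.take_along_axis(log_probs, proxy_labels[None], axis=0)[0]  # (H, W)
    return float(-picked.mean())


def photometric_loss(target: np.ndarray, reconstructed: np.ndarray) -> float:
    """Hypothetical self-supervised term: mean absolute (L1) difference between
    the target frame and a view synthesized from predicted depth/pose or flow."""
    return float(np.abs(target - reconstructed).mean())


if __name__ == "__main__":
    teacher = np.zeros((3, 2, 2))
    teacher[1] = 10.0                      # teacher predicts class 1 everywhere
    uniform_student = np.zeros((3, 2, 2))  # student has no preference yet
    print(distillation_loss(uniform_student, teacher))  # equals log(3) for 3 classes
    print(photometric_loss(np.ones((2, 2)), np.zeros((2, 2))))
```

Using hard argmax labels keeps the semantic target identical in form to real annotations; a soft, temperature-scaled teacher distribution is a common alternative, and a practical photometric term often adds structural-similarity or smoothness components on top of L1.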

research
03/28/2022

Learning Optical Flow, Depth, and Scene Flow without Real-World Labels

Self-supervised monocular depth estimation enables robots to learn 3D pe...
research
10/14/2018

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

Learning to estimate 3D geometry in a single frame and optical flow from...
research
04/08/2021

Learning optical flow from still images

This paper deals with the scarcity of data for training optical flow net...
research
12/13/2017

Self-Supervised Depth Learning for Urban Scene Understanding

As an agent moves through the world, the apparent motion of scene elemen...
research
11/24/2021

MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images

Real-scale scene flow estimation has become increasingly important for 3...
research
02/26/2019

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

While learning based depth estimation from images/videos has achieved su...
research
10/11/2022

Weakly-Supervised Optical Flow Estimation for Time-of-Flight

Indirect Time-of-Flight (iToF) cameras are a widespread type of 3D senso...