Visuomotor Understanding for Representation Learning of Driving Scenes

09/16/2019
by   Seokju Lee, et al.
1

Dashboard cameras capture a tremendous amount of driving scene video each day. These videos are purposefully coupled with vehicle sensing data, such as from the speedometer and inertial sensors, providing an additional sensing modality for free. In this work, we leverage the large-scale unlabeled yet naturally paired data for visual representation learning in the driving scenario. A representation is learned in an end-to-end self-supervised framework for predicting dense optical flow from a single frame with paired sensing data. We postulate that success on this task requires the network to learn semantic and geometric knowledge in the ego-centric view. For example, forecasting a future view to be seen from a moving vehicle requires an understanding of scene depth, scale, and movement of objects. We demonstrate that our learned representation can benefit other tasks that require detailed scene understanding and outperforms competing unsupervised representations on semantic segmentation.

READ FULL TEXT
research
01/16/2021

Self-Supervised Representation Learning from Flow Equivariance

Self-supervised representation learning is able to learn semantically me...
research
12/04/2016

End-to-end Learning of Driving Models from Large-scale Video Datasets

Robust perception-action models should be learned from training data wit...
research
10/04/2022

Dfferentiable Raycasting for Self-supervised Occupancy Forecasting

Motion planning for safe autonomous driving requires learning how the en...
research
06/13/2016

Visual-Inertial-Semantic Scene Representation for 3-D Object Detection

We describe a system to detect objects in three-dimensional space using ...
research
04/14/2021

Adaptive Intermediate Representations for Video Understanding

A common strategy to video understanding is to incorporate spatial and m...
research
06/09/2019

Cross-view Semantic Segmentation for Sensing Surroundings

Sensing surroundings is ubiquitous and effortless to humans: It takes a ...
research
06/07/2019

Evolving Losses for Unlabeled Video Representation Learning

We present a new method to learn video representations from unlabeled da...

Please sign up or login with your details

Forgot password? Click here to reset