Neural World Models for Computer Vision

06/15/2023
by Anthony Hu, et al.

Humans navigate their environment by learning a mental model of the world through passive observation and active interaction. This world model allows them to anticipate what might happen next and act accordingly with respect to an underlying objective. Such world models hold strong promise for planning in complex environments such as autonomous driving. A human driver, or a self-driving system, perceives their surroundings with their eyes or their cameras. They infer an internal representation of the world which should: (i) have spatial memory (e.g. to handle occlusions), (ii) fill in partially observable or noisy inputs (e.g. when blinded by sunlight), and (iii) be able to reason about unobservable events probabilistically (e.g. predict different possible futures). They are embodied intelligent agents that can predict, plan, and act in the physical world through their world model. In this thesis we present a general framework to train a world model and a policy, parameterised by deep neural networks, from camera observations and expert demonstrations. We leverage important computer vision concepts such as geometry, semantics, and motion to scale world models to complex urban driving scenes. First, we propose a model that predicts important quantities in computer vision: depth, semantic segmentation, and optical flow. We then use 3D geometry as an inductive bias to operate in the bird's-eye view space. We present for the first time a model that can predict probabilistic future trajectories of dynamic agents in bird's-eye view from 360° surround monocular cameras only. Finally, we demonstrate the benefits of learning a world model in closed-loop driving. Our model can jointly predict the static scene, the dynamic scene, and ego-behaviour in an urban driving environment.

research
10/14/2022

Model-Based Imitation Learning for Urban Driving

An accurate model of the environment and the dynamic agents acting in it...
research
03/13/2020

Probabilistic Future Prediction for Video Scene Understanding

We present a novel deep learning architecture for probabilistic future p...
research
04/21/2021

FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras

Driving requires interacting with road agents and predicting their futur...
research
11/09/2022

Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review

Computer vision applications in intelligent transportation systems (ITS)...
research
04/14/2020

Footprints and Free Space from a Single Color Image

Understanding the shape of a scene from a single color image is a formid...
research
09/14/2021

Vision Transformer for Learning Driving Policies in Complex Multi-Agent Environments

Driving in a complex urban environment is a difficult task that requires...
research
05/11/2022

NMR: Neural Manifold Representation for Autonomous Driving

Autonomous driving requires efficient reasoning about the Spatio-tempora...
