Log In Sign Up

Probabilistic Future Prediction for Video Scene Understanding

by   Anthony Hu, et al.

We present a novel deep learning architecture for probabilistic future prediction from video. We predict the future semantics, geometry and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle. This work is the first to jointly predict ego-motion, static scene, and the motion of dynamic agents in a probabilistic manner, which allows sampling consistent, highly probable futures from a compact latent space. Our model learns a representation from RGB video with a spatio-temporal convolutional module. The learned representation can be explicitly decoded to future semantic segmentation, depth, and optical flow, in addition to being an input to a learnt driving policy. To model the stochasticity of the future, we introduce a conditional variational approach which minimises the divergence between the present distribution (what could happen given what we have seen) and the future distribution (what we observe actually happens). During inference, diverse futures are generated by sampling from the present distribution.


page 4

page 12

page 13

page 21


Dense Optical Flow Prediction from a Static Image

Given a scene, what is going to move, and in what direction will it move...

StretchBEV: Stretching Future Instance Prediction Spatially and Temporally

In self-driving, predicting future in terms of location and motion of al...

Deep Learning Driven Visual Path Prediction from a Single Image

Capabilities of inference and prediction are significant components of v...

Predicting Deeper into the Future of Semantic Segmentation

The ability to predict and therefore to anticipate the future is an impo...

Motion Prediction Under Multimodality with Conditional Stochastic Networks

Given a visual history, multiple future outcomes for a video scene are e...

Future Segmentation Using 3D Structure

Predicting the future to anticipate the outcome of events and actions is...

T3VIP: Transformation-based 3D Video Prediction

For autonomous skill acquisition, robots have to learn about the physica...