Motion Prediction Under Multimodality with Conditional Stochastic Networks

05/05/2017
by   Katerina Fragkiadaki, et al.
0

Given a visual history, multiple future outcomes for a video scene are equally probable, in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the former regress to the mean and the latter discretize a continuous high dimensional output space. In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables. Such random variables are sampled from simple Gaussian distributions whose means and variances are parametrized by the output of convolutional encoders over the visual history. We introduce novel convolutional architectures for predicting future body joint trajectories that outperform fully connected alternatives DBLP:journals/corr/WalkerDGH16. We introduce stochastic spatial transformers through optical flow warping for predicting future frames, which outperform their deterministic equivalents DBLP:journals/corr/PatrauceanHC15. Training stochastic networks involves an intractable marginalization over stochastic variables. We compare various training schemes that handle such marginalization through a) straightforward sampling from the prior, b) conditional variational autoencoders NIPS2015_5775,DBLP:journals/corr/WalkerDGH16, and, c) a proposed K-best-sample loss that penalizes the best prediction under a fixed "prediction budget". We show experimental results on object trajectory prediction, human body joint trajectory prediction and video prediction under varying future uncertainty, validating quantitatively and qualitatively our architectural choices and training schemes.

READ FULL TEXT

page 5

page 7

research
03/22/2017

Predicting Deeper into the Future of Semantic Segmentation

The ability to predict and therefore to anticipate the future is an impo...
research
02/21/2018

Stochastic Video Generation with a Learned Prior

Generating video frames that accurately predict future world states is c...
research
10/30/2017

Stochastic Variational Video Prediction

Predicting the future in real-world settings, particularly from raw sens...
research
03/13/2020

Probabilistic Future Prediction for Video Scene Understanding

We present a novel deep learning architecture for probabilistic future p...
research
07/31/2015

Action-Conditional Video Prediction using Deep Networks in Atari Games

Motivated by vision-based reinforcement learning (RL) problems, in parti...
research
09/07/2021

CovarianceNet: Conditional Generative Model for Correct Covariance Prediction in Human Motion Prediction

The correct characterization of uncertainty when predicting human motion...
research
10/09/2020

Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic

Video anticipation is the task of predicting one/multiple future represe...

Please sign up or login with your details

Forgot password? Click here to reset