High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

by Ruben Villegas et al.

Predicting future video frames is extremely challenging, as many factors of variation make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground/background separation. In this work, we question whether such handcrafted architectures are necessary and instead propose a different approach: finding a minimal inductive bias for video prediction while maximizing network capacity. We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.
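To make the "minimal inductive bias, maximal capacity" idea concrete, here is a minimal sketch of a stochastic recurrent video predictor in PyTorch. All names, layer sizes, and the fully connected encoder/decoder are illustrative assumptions, not the paper's actual architecture: the point is only the shape of the approach, where a generic recurrent core consumes an encoded frame together with a sampled latent variable, with no optical flow, masks, or other specialized modules.

```python
import torch
import torch.nn as nn


class StochasticRecurrentPredictor(nn.Module):
    """Illustrative sketch (not the paper's model): a generic recurrent
    predictor with a per-step stochastic latent, and no handcrafted
    motion-specific components."""

    def __init__(self, frame_dim=64 * 64, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(frame_dim, hidden_dim)
        # Gaussian latent per time step, sampled with the reparameterization trick.
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.rnn = nn.LSTMCell(hidden_dim + latent_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, frame_dim)

    def forward(self, frames):
        # frames: (time, batch, frame_dim), values in [0, 1]
        t_steps, batch, _ = frames.shape
        h = torch.zeros(batch, self.rnn.hidden_size)
        c = torch.zeros(batch, self.rnn.hidden_size)
        preds = []
        for t in range(t_steps):
            e = torch.relu(self.encoder(frames[t]))
            mu, logvar = self.to_mu(e), self.to_logvar(e)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            h, c = self.rnn(torch.cat([e, z], dim=-1), (h, c))
            preds.append(torch.sigmoid(self.decoder(h)))
        return torch.stack(preds)  # one predicted next frame per input frame


model = StochasticRecurrentPredictor()
video = torch.rand(5, 2, 64 * 64)  # 5 frames, batch of 2, flattened 64x64 pixels
out = model(video)
print(out.shape)
```

Under the paper's hypothesis, scaling up `hidden_dim` and the encoder/decoder of such a generic model would matter more than adding specialized structure.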




Related papers:

- Novel Video Prediction for Large-scale Scene using Optical Flow
- Transformation-based Adversarial Video Prediction on Large-Scale Data
- Fully Context-Aware Video Prediction
- Semantic Video CNNs through Representation Warping
- Predictive-Corrective Networks for Action Detection
- Comparing Correspondences: Video Prediction with Correspondence-wise Losses
- Motion Prediction Under Multimodality with Conditional Stochastic Networks