High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

11/05/2019
by Ruben Villegas, et al.

Predicting future video frames is extremely challenging because many factors of variation determine how frames change over time. Previously proposed solutions build complex inductive biases into network architectures through highly specialized computation, including segmentation masks, optical flow, and foreground/background separation. In this work, we question whether such handcrafted architectures are necessary and instead propose a different approach: using minimal inductive bias for video prediction while maximizing network capacity. We investigate this question through the first large-scale empirical study of its kind and demonstrate state-of-the-art performance by training large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.
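The abstract describes the model family only at a high level. As a rough illustration of what a stochastic recurrent video predictor does, here is a toy sketch in plain Python. It assumes frames are small flattened vectors, a vanilla tanh RNN cell with no biases, and a fixed N(0, I) prior over a per-step latent `z`. The class `ToyStochasticRNN`, its random untrained weights, and its methods are all hypothetical; this is not the architecture from the paper, only a minimal instance of the general idea (recurrent state plus a sampled latent at every step).

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small Gaussian-initialized weights (untrained, for illustration only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyStochasticRNN:
    """Hypothetical toy predictor: a tanh RNN cell whose input is the current
    frame plus a latent z drawn from a fixed standard normal prior."""

    def __init__(self, frame_dim: int, hidden_dim: int, z_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.z_dim = z_dim
        self.w_x = random_matrix(hidden_dim, frame_dim, rng)    # frame -> hidden
        self.w_z = random_matrix(hidden_dim, z_dim, rng)        # latent -> hidden
        self.w_h = random_matrix(hidden_dim, hidden_dim, rng)   # hidden -> hidden
        self.w_out = random_matrix(frame_dim, hidden_dim, rng)  # hidden -> next frame

    def sample_z(self, rng: random.Random) -> Vector:
        # Fixed prior N(0, I); a trained model could instead learn this distribution.
        return [rng.gauss(0.0, 1.0) for _ in range(self.z_dim)]

    def step(self, frame: Vector, h: Vector, z: Vector):
        """One recurrent step: returns (predicted next frame, new hidden state)."""
        pre = [a + b + c for a, b, c in
               zip(matvec(self.w_x, frame), matvec(self.w_z, z), matvec(self.w_h, h))]
        h_new = [math.tanh(p) for p in pre]
        return matvec(self.w_out, h_new), h_new

    def rollout(self, context: List[Vector], n_future: int, sample_rng: random.Random) -> List[Vector]:
        """Condition on observed frames, then predict n_future frames autoregressively."""
        h = [0.0] * self.hidden_dim
        # Warm up the hidden state on all but the last observed frame.
        for frame in context[:-1]:
            _, h = self.step(frame, h, self.sample_z(sample_rng))
        frame = context[-1]
        futures = []
        for _ in range(n_future):
            # Each predicted frame is fed back in as the next input.
            frame, h = self.step(frame, h, self.sample_z(sample_rng))
            futures.append(frame)
        return futures


model = ToyStochasticRNN(frame_dim=4, hidden_dim=8, z_dim=2, seed=0)
context = [[0.1 * i] * 4 for i in range(3)]
future_a = model.rollout(context, n_future=5, sample_rng=random.Random(1))
future_b = model.rollout(context, n_future=5, sample_rng=random.Random(2))
```

Rolling out the same context with different sampling seeds yields different futures, which is the purpose of the stochastic latent: one model can represent several plausible continuations. In a trained model the latent for observed frames would typically come from an inference network rather than the prior; this sketch skips training entirely.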
