Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

03/06/2021
by   Bohan Wu, et al.
11

A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model. However, while existing video prediction models have produced promising results on small datasets, they suffer from severe underfitting when trained on large and diverse datasets. To address this underfitting challenge, we first observe that the ability to train larger video prediction models is often bottlenecked by the memory constraints of GPUs or TPUs. In parallel, deep hierarchical latent variable models can produce higher quality predictions by capturing the multi-level stochasticity of future observations, but end-to-end optimization of such models is notably difficult. Our key insight is that greedy and modular optimization of hierarchical autoencoders can simultaneously address both the memory constraints and the optimization challenges of large-scale video prediction. We introduce Greedy Hierarchical Variational Autoencoders (GHVAEs), a method that learns high-fidelity video predictions by greedily training each level of a hierarchical autoencoder. In comparison to state-of-the-art models, GHVAEs provide 17-55 on four video datasets, a 35-40 can improve performance monotonically by simply adding more modules.

READ FULL TEXT

page 8

page 14

page 15

page 16

page 17

page 19

page 20

page 23

research
02/18/2021

Clockwork Variational Autoencoders

Deep learning has enabled algorithms to generate realistic images. Howev...
research
09/15/2022

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

Video prediction is an important yet challenging problem; burdened with ...
research
06/24/2021

FitVid: Overfitting in Pixel-Level Video Prediction

An agent that is capable of predicting what happens next can perform a v...
research
03/02/2021

Predicting Video with VQVAE

In recent years, the task of video prediction-forecasting future video g...
research
04/04/2018

Stochastic Adversarial Video Prediction

Being able to predict what may happen in the future requires an in-depth...
research
06/06/2019

Scaling Autoregressive Video Models

Due to the statistical complexity of video, the high degree of inherent ...
research
12/11/2020

A Log-likelihood Regularized KL Divergence for Video Prediction with A 3D Convolutional Variational Recurrent Network

The use of latent variable models has shown to be a powerful tool for mo...

Please sign up or login with your details

Forgot password? Click here to reset