Clockwork Variational Autoencoders

02/18/2021
by   Vaibhav Saxena, et al.

Deep learning has enabled algorithms to generate realistic images. However, accurately predicting long video sequences requires understanding long-term dependencies and remains an open challenge. While existing video prediction models succeed at generating sharp images, they tend to fail at accurately predicting far into the future. We introduce the Clockwork VAE (CW-VAE), a video prediction model that leverages a hierarchy of latent sequences, where higher levels tick at slower intervals. We demonstrate the benefits of both hierarchical latents and temporal abstraction on 4 diverse video prediction datasets with sequences of up to 1000 frames, where CW-VAE outperforms top video prediction models. Additionally, we propose a Minecraft benchmark for long-term video prediction. We conduct several experiments to gain insights into CW-VAE and confirm that slower levels learn to represent objects that change more slowly in the video, and faster levels learn to represent faster objects.
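The phrase "higher levels tick at slower intervals" can be made concrete with a small scheduling sketch. This is only an illustration, not the authors' implementation: the function name `tick_schedule`, the parameter `factor`, and the rule that level l updates every factor**l steps are all assumptions, since the abstract states only that higher levels tick more slowly.

```python
def tick_schedule(num_steps, num_levels, factor):
    """Return, for each time step, the list of levels that update at that step.

    Assumption (not stated in the abstract): level `l` updates once every
    `factor ** l` steps, so level 0 updates at every step and each higher
    level updates `factor` times less often than the level below it.
    """
    schedule = []
    for t in range(num_steps):
        active = [level for level in range(num_levels)
                  if t % (factor ** level) == 0]
        schedule.append(active)
    return schedule


# Hypothetical configuration: 3 levels, each ticking 2x slower than the last.
schedule = tick_schedule(num_steps=8, num_levels=3, factor=2)
for t, active in enumerate(schedule):
    print(f"step {t}: levels updating = {active}")
```

Under these assumed settings the top level changes state only every fourth step, which gives an intuition for why the slower levels might end up representing content that changes slowly in the video.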

research
10/05/2022

Temporally Consistent Video Transformer for Long-Term Video Prediction

Generating long, temporally consistent video remains an open challenge i...
research
03/06/2021

Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

A video prediction model that generalizes to diverse scenes would enable...
research
03/13/2018

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

The Variational Autoencoder (VAE) has proven to be an effective model fo...
research
08/21/2023

Long-Term Prediction of Natural Video Sequences with Robust Video Predictors

Predicting high dimensional video sequences is a curiously difficult pro...
research
01/28/2021

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

Predicting future frames of video sequences is challenging due to the co...
research
12/29/2022

Long-horizon video prediction using a dynamic latent hierarchy

The task of video prediction and generation is known to be notoriously d...
research
06/06/2021

Technical Report: Temporal Aggregate Representations

This technical report extends our work presented in [9] with more experi...
