VideoFlow: A Flow-Based Generative Model for Video

03/04/2019
by   Manoj Kumar, et al.

Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. In particular, learning predictive models of videos offers an especially appealing mechanism to enable a rich understanding of the physical world: videos of real-world interactions are plentiful and readily available, and a model that can predict future video frames can not only capture useful representations of the world, but can be useful in its own right, for problems such as model-based robotic control. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally (as in the case of pixel-level autoregressive models), or do not directly optimize the likelihood of the data. In this work, we propose a model for video prediction based on normalizing flows, which allows for direct optimization of the data likelihood, and produces high-quality stochastic predictions. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.
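
To illustrate what "direct optimization of the data likelihood" means for a normalizing flow, the following is a minimal sketch and not the VideoFlow architecture. It assumes a single elementwise affine transformation and a standard normal base distribution, and the function names (`affine_flow_log_likelihood`, `affine_flow_sample`) and parameters (`shift`, `log_scale`) are hypothetical. Because the transformation is invertible, the change-of-variables formula gives the exact log-likelihood of a data point, and sampling amounts to running the transformation in the opposite direction.

```python
import numpy as np


def standard_normal_logpdf(z):
    """Elementwise log density of a standard normal distribution."""
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))


def affine_flow_log_likelihood(x, shift, log_scale):
    """Exact log p(x) under an elementwise affine flow (illustrative only).

    Assumed generative direction: x = shift + exp(log_scale) * z, z ~ N(0, I).
    Change of variables: log p(x) = log p_z(f(x)) + log |det df/dx|,
    where f(x) = (x - shift) * exp(-log_scale) maps data to the latent space.
    """
    x = np.asarray(x, dtype=float)
    log_scale = np.broadcast_to(np.asarray(log_scale, dtype=float), x.shape)
    z = (x - shift) * np.exp(-log_scale)   # data -> latent
    log_det = -np.sum(log_scale)           # log-determinant of the diagonal Jacobian
    return np.sum(standard_normal_logpdf(z)) + log_det


def affine_flow_sample(rng, shift, log_scale):
    """Draw one sample by pushing base noise through the inverse map."""
    shift = np.asarray(shift, dtype=float)
    z = rng.standard_normal(shift.shape)
    return shift + np.exp(log_scale) * z   # latent -> data


# Hypothetical parameters for a 3-dimensional toy example.
shift = np.array([0.0, 1.0, 2.0])
log_scale = np.array([0.0, np.log(2.0), -0.5])
x = np.array([0.5, -1.0, 2.0])

log_px = affine_flow_log_likelihood(x, shift, log_scale)
sample = affine_flow_sample(np.random.default_rng(0), shift, log_scale)
```

In a trained model, `shift` and `log_scale` would be produced by learned networks and many such invertible layers would be composed; the negative of `log_px`, averaged over the data, is the quantity that gradient descent would minimize.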


