Generating Videos with Scene Dynamics

09/08/2016
by   Carl Vondrick, et al.
0

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this model can generate tiny videos up to a second at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning. We believe generative video models can impact many applications in video understanding and simulation.

READ FULL TEXT

page 5

page 6

page 8

research
02/21/2022

Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

In the deep learning era, long video generation of high-quality still re...
research
08/26/2023

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models

Text-to-video (T2V) synthesis has gained increasing attention in the com...
research
07/20/2021

Generative Video Transformer: Can Objects be the Words?

Transformers have been successful for many natural language processing t...
research
05/10/2019

Towards Unsupervised Familiar Scene Recognition in Egocentric Videos

Nowadays, there is an upsurge of interest in using lifelogging devices. ...
research
11/30/2017

Towards an Understanding of Our World by GANing Videos in the Wild

Existing generative video models work well only for videos with a static...
research
05/10/2021

Stochastic Image-to-Video Synthesis using cINNs

Video understanding calls for a model to learn the characteristic interp...
research
06/28/2020

Video Representations of Goals Emerge from Watching Failure

We introduce a video representation learning framework that models the l...

Please sign up or login with your details

Forgot password? Click here to reset