Probabilistic Video Generation using Holistic Attribute Control

by   Jiawei He, et al.

Videos express highly structured spatio-temporal patterns of visual data. A video can be thought of as being governed by two factors: (i) temporally invariant (e.g., person identity), or slowly varying (e.g., activity), attribute-induced appearance, encoding the persistent content of each frame, and (ii) an inter-frame motion or scene dynamics (e.g., encoding evolution of the person ex-ecuting the action). Based on this intuition, we propose a generative framework for video generation and future prediction. The proposed framework generates a video (short clip) by decoding samples sequentially drawn from a latent space distribution into full video frames. Variational Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from the latent space and RNN as a wayto model the dynamics in the latent space. We improve the video generation consistency through temporally-conditional sampling and quality by structuring the latent space with attribute controls; ensuring that attributes can be both inferred and conditioned on during learning/generation. As a result, given attributes and/orthe first frame, our model is able to generate diverse but highly consistent sets ofvideo sequences, accounting for the inherent uncertainty in the prediction task. Experimental results on Chair CAD, Weizmann Human Action, and MIT-Flickr datasets, along with detailed comparison to the state-of-the-art, verify effectiveness of the framework.



There are no comments yet.


page 2

page 13

page 14

page 18

page 19

page 20

page 21


Simple Video Generation using Neural ODEs

Despite having been studied to a great extent, the task of conditional g...

Prediction Under Uncertainty with Error-Encoding Networks

In this work we introduce a new framework for performing temporal predic...

VOS-GAN: Adversarial Learning of Visual-Temporal Dynamics for Unsupervised Dense Prediction in Videos

Recent GAN-based video generation approaches model videos as the combina...

Latent Video Transformer

The video generation task can be formulated as a prediction of future vi...

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

Human-motion generation is a long-standing challenging task due to the r...

Multi-frame sequence generator of 4D human body motion

We examine the problem of generating temporally and spatially dense 4D h...

Deep Probabilistic Video Compression

We propose a variational inference approach to deep probabilistic video ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.