VOS-GAN: Adversarial Learning of Visual-Temporal Dynamics for Unsupervised Dense Prediction in Videos

03/24/2018
by   C. Spampinato, et al.

Recent GAN-based video generation approaches model videos as the combination of a time-independent scene component and a time-varying motion component, thus factorizing the generation problem into generating background and foreground separately. A main limitation of current approaches is that both factors are learned by mapping a single source latent space to videos, which complicates the generation task, as one latent point must encode both background and foreground content. In this paper we propose a GAN framework for video generation that instead employs two latent spaces to structure the generative process more naturally: 1) a latent space for the static visual content of a scene (the background), which remains the same for the whole video, and 2) a latent space in which motion is encoded as a trajectory between sampled points; its dynamics are modeled by an RNN encoder (jointly trained with the generator and the discriminator) and mapped by the generator to the motion of visual objects. Additionally, we extend current video discrimination approaches by incorporating motion estimation into the learning procedure and, by leveraging the structure of the generation process, unsupervised pixel-wise dense prediction. Extensive performance evaluation shows that our approach a) synthesizes more realistic videos than state-of-the-art methods, b) learns both local and global video dynamics effectively, as demonstrated by the results achieved on a video action recognition task over the UCF-101 dataset, and c) accurately performs unsupervised video object segmentation on standard video benchmarks such as DAVIS, SegTrack and F4K-Fish.
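The generation scheme described above (one static content latent per video, plus a trajectory of motion latents run through an RNN encoder before reaching the generator) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the Elman-style RNN, the dimensions, and all names (`MotionRNN`, `generate_video`) are assumptions, and real frames would come from a convolutional generator rather than simple concatenation.

```python
# Toy sketch of a two-latent-space video generation scheme (illustrative only).
import math
import random

random.seed(0)

def randn(n):
    # Sample an n-dimensional standard-normal latent vector.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def mat(rows, cols):
    return [randn(cols) for _ in range(rows)]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

class MotionRNN:
    """Toy Elman RNN: encodes a trajectory of sampled motion latents
    into one hidden code per frame (stand-in for the paper's RNN encoder)."""
    def __init__(self, z_dim, h_dim):
        self.Wxh = mat(h_dim, z_dim)
        self.Whh = mat(h_dim, h_dim)
        self.h_dim = h_dim

    def encode(self, trajectory):
        h = [0.0] * self.h_dim
        codes = []
        for z_t in trajectory:
            pre = [a + b for a, b in zip(matvec(self.Wxh, z_t),
                                         matvec(self.Whh, h))]
            h = [math.tanh(p) for p in pre]
            codes.append(h)
        return codes

def generate_video(z_content, motion_codes):
    """Each 'frame' input pairs the static content code (shared across the
    whole video) with that frame's motion code; a real generator would map
    this to pixels."""
    return [z_content + m_t for m_t in motion_codes]

z_content = randn(8)                         # static scene latent, one per video
trajectory = [randn(4) for _ in range(16)]   # motion latents, one per frame
rnn = MotionRNN(z_dim=4, h_dim=6)
frames = generate_video(z_content, rnn.encode(trajectory))
print(len(frames), len(frames[0]))           # 16 frames, each 8 + 6 = 14 dims
```

Note how the factorization shows up structurally: the content code is sampled once and reused in every frame, while only the RNN-encoded motion code varies over time, which is what lets the discriminator side exploit per-pixel motion cues.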


Related research

- MoCoGAN: Decomposing Motion and Content for Video Generation (07/17/2017)
  Visual signals in a video can be divided into content and motion. While ...
- LaMD: Latent Motion Diffusion for Video Generation (04/23/2023)
  Generating coherent and natural movement is the key challenge in video g...
- Towards an Understanding of Our World by GANing Videos in the Wild (11/30/2017)
  Existing generative video models work well only for videos with a static...
- Probabilistic Video Generation using Holistic Attribute Control (03/21/2018)
  Videos express highly structured spatio-temporal patterns of visual data...
- Evaluating and Mitigating Static Bias of Action Representations in the Background and the Foreground (11/23/2022)
  Deep neural networks for video action recognition easily learn to utiliz...
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering (06/28/2020)
  This paper addresses the task of unsupervised learning of representation...
- Video Interpolation and Prediction with Unsupervised Landmarks (09/06/2019)
  Prediction and interpolation for long-range video data involves the comp...
