Decomposing Motion and Content for Natural Video Sequence Prediction

06/25/2017
by Ruben Villegas, et al.

We propose a deep neural network for predicting future frames in natural video sequences. To handle the complex evolution of pixels in videos effectively, we propose to decompose motion and content, the two key components that generate the dynamics in videos. Our model is built upon an Encoder-Decoder Convolutional Neural Network and a Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By modeling motion and content independently, predicting the next frame reduces to transforming the extracted content features into the next-frame content using the identified motion features, which simplifies the prediction task. The model is end-to-end trainable over multiple time steps and naturally learns to decompose motion and content without separate training. We evaluate the proposed architecture on human activity videos from the KTH, Weizmann action, and UCF-101 datasets and show state-of-the-art performance in comparison to recent approaches. To the best of our knowledge, this is the first end-to-end trainable network architecture that separates motion and content to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.
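The following is a minimal, illustrative PyTorch sketch of the motion-content decomposition idea described above: a content encoder applied to the most recent frame, a motion encoder whose convolutional LSTM runs over frame differences, and a decoder that fuses the two streams into the predicted next frame. The class names, layer sizes, the simplified ConvLSTM cell, and the fusion by concatenation are assumptions made for brevity; the full MCnet described in the paper is deeper, adds residual connections between encoder and decoder, and is trained over multiple prediction steps.

# A minimal sketch (not the authors' code) of the motion-content decomposition
# idea in PyTorch. The ConvLSTM cell, layer sizes, and concatenation-based
# fusion are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """A single, simplified convolutional LSTM cell."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g
        h = o * c.tanh()
        return h, c


class MotionContentNet(nn.Module):
    """Content encoder on the last frame, motion encoder (ConvLSTM over frame
    differences), and a decoder that fuses both streams into the next frame."""
    def __init__(self, ch=64):
        super().__init__()
        self.content_enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.motion_enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.motion_rnn = ConvLSTMCell(ch, ch)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):
        # frames: (batch, time, 3, H, W); predict the frame after the last one.
        b, t, _, h, w = frames.shape
        hid = torch.zeros(b, self.motion_rnn.hid_ch, h // 4, w // 4,
                          device=frames.device)
        state = (hid, hid.clone())
        # Motion stream: ConvLSTM over differences of consecutive frames.
        for i in range(1, t):
            diff = frames[:, i] - frames[:, i - 1]
            state = self.motion_rnn(self.motion_enc(diff), state)
        motion_feat = state[0]
        # Content stream: spatial layout of the most recent frame only.
        content_feat = self.content_enc(frames[:, -1])
        # Fuse the two streams and decode the predicted next frame.
        return self.decoder(torch.cat([content_feat, motion_feat], dim=1))


if __name__ == "__main__":
    net = MotionContentNet()
    clip = torch.rand(2, 4, 3, 64, 64)   # two clips of four RGB frames
    print(net(clip).shape)                # torch.Size([2, 3, 64, 64])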


Related research

04/13/2018 · MSnet: Mutual Suppression Network for Disentangled Video Representations
03/06/2023 · Polar Prediction of Natural Videos
05/06/2021 · FDNet: A Deep Learning Approach with Two Parallel Cross Encoding Pathways for Precipitation Nowcasting
04/10/2022 · Learning Pixel-Level Distinctions for Video Highlight Detection
10/16/2020 · Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
12/05/2017 · Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering
01/19/2017 · FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos
