Hierarchical Long-term Video Prediction without Supervision

06/12/2018
by   Nevan Wichers, et al.
10

Much of recent research has been devoted to video prediction and generation, yet most of the previous works have demonstrated only limited success in generating videos on short-term horizons. The hierarchical video prediction method by Villegas et al. (2017) is an example of a state-of-the-art method for long-term video prediction, but their method is limited because it requires ground truth annotation of high-level structures (e.g., human joint landmarks) at training time. Our network encodes the input frame, predicts a high-level encoding into the future, and then a decoder with access to the first frame produces the predicted image from the predicted encoding. The decoder also produces a mask that outlines the predicted foreground object (e.g., person) as a by-product. Unlike Villegas et al. (2017), we develop a novel training method that jointly trains the encoder, the predictor, and the decoder together without highlevel supervision; we further improve upon this by using an adversarial loss in the feature space to train the predictor. Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset.

READ FULL TEXT

page 5

page 7

page 8

research
06/27/2017

Hierarchical Model for Long-term Video Prediction

Video prediction has been an active topic of research in the past few ye...
research
04/14/2021

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Learning to predict the long-term future of video frames is notoriously ...
research
04/16/2019

Long-Term Video Generation of Multiple Futures Using Human Poses

Predicting the near-future from an input video is a useful task for appl...
research
04/16/2019

Long-Term Video Generation of Multiple FuturesUsing Human Poses

Predicting the near-future from an input video is a useful task for appl...
research
01/05/2023

HierVL: Learning Hierarchical Video-Language Embeddings

Video-language embeddings are a promising avenue for injecting semantics...
research
05/08/2017

Multi Resolution LSTM For Long Term Prediction In Neural Activity Video

Epileptic seizures are caused by abnormal, overly syn- chronized, electr...
research
04/07/2022

Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping

We investigate the "Visual Pushing for Grasping" (VPG) system by Zeng et...

Please sign up or login with your details

Forgot password? Click here to reset