From Single to Multiple: Leveraging Multi-level Prediction Spaces for Video Forecasting

by Mengcheng Lan, et al.

Although video forecasting has been widely explored in recent years, most existing work still restricts its models to a single prediction space and neglects the possibility of using multiple prediction spaces. This work fills this gap. For the first time, we study in depth numerous strategies for performing video forecasting in multiple prediction spaces and fusing their results to boost performance. Prediction in the pixel space usually fails to preserve the semantic and structural content of the video, whereas prediction in a high-level feature space is prone to errors introduced by the reduction and recovery process. We therefore build a recurrent connection between different feature spaces and incorporate their generations in the upsampling process. Rather surprisingly, this simple idea yields a significant performance boost over PhyDNet (performance improved by 32.1 on one dataset and by 21.4 on another). Evaluations on four datasets demonstrate the generalization ability and effectiveness of our approach. We show that our model significantly reduces troublesome distortions and blurry artifacts and brings remarkable improvements in accuracy for long-term video prediction. The code will be released soon.
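
The idea of predicting in more than one space and fusing the results during upsampling can be illustrated with a toy sketch. This is not the authors' architecture: the average-pooling "encoder", nearest-neighbour "decoder", linear-extrapolation predictor, and the fusion weight `alpha` are all hypothetical stand-ins chosen only to make the data flow concrete.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool an (H, W) frame by `factor` (stand-in for a learned encoder)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(x, factor=2):
    """Nearest-neighbour upsampling (stand-in for a learned decoder)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def predict_next(prev, curr):
    """Hypothetical per-space predictor: linear extrapolation of two frames."""
    return curr + (curr - prev)

def multi_level_forecast(frames, steps, alpha=0.5):
    """Roll out `steps` frames, fusing pixel-level and coarse-level predictions.

    `alpha` is an assumed fixed fusion weight; a real model would learn the fusion.
    """
    prev, curr = frames[-2], frames[-1]
    outputs = []
    for _ in range(steps):
        # Prediction made directly in the pixel space.
        pixel_pred = predict_next(prev, curr)
        # Prediction made in a lower-resolution space.
        coarse_pred = predict_next(downsample(prev), downsample(curr))
        # Fuse the two predictions while upsampling the coarse one.
        fused = alpha * pixel_pred + (1 - alpha) * upsample(coarse_pred)
        outputs.append(fused)
        # Recurrence: the fused frame feeds both spaces at the next step.
        prev, curr = curr, fused
    return np.stack(outputs)

# Usage: two 4x4 input frames whose intensity grows by 1 per step.
frames = np.stack([np.full((4, 4), float(t)) for t in range(2)])
future = multi_level_forecast(frames, steps=3)
print(future.shape)
```

In this sketch the two spaces agree on a uniform input, so fusion changes nothing; with real video the coarse branch would carry structure while the pixel branch carries detail, which is the trade-off the abstract describes.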




