VPTR: Efficient Transformers for Video Prediction

03/29/2022
by Xi Ye, et al.

In this paper, we propose a new Transformer block for video future frame prediction based on an efficient local spatial-temporal separation attention mechanism. Building on this block, we propose a fully autoregressive video future frame prediction Transformer. We also propose a non-autoregressive video prediction Transformer that increases inference speed and reduces the accumulation of inference errors seen in its autoregressive counterpart. To avoid predicting very similar future frames, a contrastive feature loss is applied to maximize the mutual information between predicted and ground-truth future frame features. This work is the first to formally compare these two types of attention-based video future frame prediction models across different scenarios. The proposed models achieve performance competitive with more complex state-of-the-art models. The source code is available at https://github.com/XiYe20/VPTR.
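The local spatial-temporal separation attention named in the abstract can be illustrated with a minimal NumPy sketch. It assumes that "separation" means window-based self-attention among tokens of the same frame, followed by self-attention across frames at each spatial location. The window size, the identity query/key/value projections, the single head, the residual wiring, and all function names (`self_attention`, `local_spatial_attention`, `temporal_attention`, `separated_block`) are hypothetical choices for illustration, not the released VPTR code. The optional causal mask shows how an autoregressive model could keep each frame from attending to later frames.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; rows may contain -inf from masking.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, causal=False):
    # Hypothetical simplification: identity Q/K/V projections, single head.
    # tokens: (..., N, C); attention is computed over the N axis.
    n, c = tokens.shape[-2:]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(c)
    if causal:
        # Block attention from position i to any later position j > i.
        mask = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores, axis=-1) @ tokens


def local_spatial_attention(x, window):
    # x: (T, H, W, C). Tokens attend only within their own frame and
    # their own non-overlapping window x window patch.
    t, h, w, c = x.shape
    assert h % window == 0 and w % window == 0
    xw = x.reshape(t, h // window, window, w // window, window, c)
    xw = xw.transpose(0, 1, 3, 2, 4, 5).reshape(t, -1, window * window, c)
    out = self_attention(xw)
    out = out.reshape(t, h // window, w // window, window, window, c)
    return out.transpose(0, 1, 3, 2, 4, 5).reshape(t, h, w, c)


def temporal_attention(x, causal=False):
    # x: (T, H, W, C). Each spatial location attends across the T frames.
    t, h, w, c = x.shape
    xt = x.reshape(t, h * w, c).transpose(1, 0, 2)  # (H*W, T, C)
    out = self_attention(xt, causal=causal)
    return out.transpose(1, 0, 2).reshape(t, h, w, c)


def separated_block(x, window=4, causal=False):
    # Hypothetical wiring: residual spatial step, then residual temporal step.
    x = x + local_spatial_attention(x, window)
    return x + temporal_attention(x, causal=causal)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16, 32))  # (frames, height, width, channels)
y = separated_block(x, window=4, causal=True)
print(y.shape)  # (8, 16, 16, 32)
```

For (T, H, W) = (8, 16, 16), joint attention over all 2,048 tokens would score about 4.2 million token pairs per layer, while this factorized version scores 49,152 (32,768 spatial plus 16,384 temporal), which is the kind of saving that motivates separating the two axes.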

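The abstract does not spell out the form of the contrastive feature loss. One common way to maximize mutual information between paired features is an InfoNCE-style objective, sketched below in NumPy under that assumption: each predicted feature vector should be more similar to its own ground-truth feature than to the ground-truth features of other samples. The temperature value, the feature shapes, and the function name `info_nce` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def info_nce(pred, target, temperature=0.07):
    # Hypothetical InfoNCE-style loss; temperature 0.07 is an assumed value.
    # pred, target: (N, D). Row i of `pred` is the positive partner of row i
    # of `target`; every other row of `target` serves as a negative.
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = p @ t.T / temperature  # (N, N) scaled cosine similarities
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching (diagonal) pair as the correct class.
    return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
gt = rng.standard_normal((16, 64))  # hypothetical ground-truth features
good = gt + 0.01 * rng.standard_normal(gt.shape)  # close to the targets
collapsed = np.tile(gt.mean(axis=0), (16, 1))  # every prediction identical
print(info_nce(good, gt) < info_nce(collapsed, gt))  # True
```

Since log N minus the InfoNCE loss is a lower bound on mutual information, lowering the loss raises that bound. Collapsed predictions that are all identical can never score below log N, which illustrates why such a loss discourages very similar predicted frames.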