STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction

06/09/2022
by   Zheng Chang, et al.

Although significant progress has been made by recurrent neural network (RNN) based video prediction methods, their performance on high-resolution datasets is still far from satisfactory because of the information loss problem and the perception-insensitive mean square error (MSE) based loss functions. In this paper, we propose a Spatiotemporal Information-Preserving and Perception-Augmented Model (STIP) to solve these two problems. To solve the information loss problem, the proposed model aims to preserve spatiotemporal information during both feature extraction and state transitions. Firstly, a Multi-Grained Spatiotemporal Auto-Encoder (MGST-AE) is designed based on the X-Net structure. The proposed MGST-AE helps the decoders recall multi-grained information from the encoders in both the temporal and spatial domains, so that more spatiotemporal information can be preserved during feature extraction for high-resolution videos. Secondly, a Spatiotemporal Gated Recurrent Unit (STGRU) is designed based on the standard Gated Recurrent Unit (GRU) structure, which efficiently preserves spatiotemporal information during state transitions. The proposed STGRU achieves more satisfactory performance with a much lower computational load than the popular Long Short-Term Memory (LSTM) based predictive memories. Furthermore, to improve on traditional MSE loss functions, a Learned Perceptual Loss (LP-loss) is designed based on Generative Adversarial Networks (GANs), which helps obtain a satisfactory trade-off between objective quality and perceptual quality. Experimental results show that the proposed STIP predicts videos with more satisfactory visual quality than a variety of state-of-the-art methods. Source code is available at <https://github.com/ZhengChang467/STIPHR>.
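The abstract does not spell out the STGRU update equations, but since the unit is built on the standard GRU structure, its state transition can be illustrated with an ordinary GRU step. The NumPy sketch below shows only that baseline GRU update; the weight names and shapes are generic assumptions for illustration, not the paper's actual STGRU, which adds spatiotemporal gating on top of this structure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU state transition (Cho et al. formulation):
    h' = (1 - z) * h + z * h_tilde.
    x: input vector, h: previous hidden state; Wz/Uz, Wr/Ur, Wh/Uh are
    the update-gate, reset-gate, and candidate weight matrices
    (illustrative shapes only)."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde

# Tiny usage example with random weights
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
x = rng.standard_normal(d_in)
h = np.zeros(d_h)
W = [rng.standard_normal((d_h, d_in)) for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) for _ in range(3)]
h_next = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
print(h_next.shape)
```

Because the update is a convex combination of the previous state and a tanh-bounded candidate, every entry of the new state stays in (-1, 1) here, which is part of what makes GRU-style transitions cheaper and more stable than LSTM memories for long rollouts.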


