Spatial-Temporal Transformer based Video Compression Framework

09/21/2023
by Yanbo Gao, et al.

Learned video compression (LVC) has witnessed remarkable advancements in recent years. As in traditional video coding, LVC comprises motion estimation/compensation, residual coding, and other modules, all of which are implemented with neural networks (NNs). However, within the framework of NNs and their training mechanism of gradient backpropagation, most existing works struggle to consistently generate stable motion information, which takes the form of geometric features, from the input color features. Moreover, modules such as inter-prediction and residual coding are independent of each other, making it inefficient to fully reduce the spatial-temporal redundancy. To address these problems, in this paper we propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework. It contains a Relaxed Deformable Transformer (RDT) with Uformer-based offset estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multiple reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression. Specifically, RDT is developed to stably estimate the motion information between frames by thoroughly investigating the relationship between similarity-based geometric motion feature extraction and self-attention. MGP is designed to fuse multi-reference-frame information by effectively exploring the coarse-grained prediction feature generated with the coded motion information. SFD-T compresses the residual information by jointly exploring the spatial feature distributions in both the residual and the temporal prediction to further reduce the spatial-temporal redundancy. Experimental results demonstrate that our method achieves the best result with 13.5
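The motion-compensation step the abstract describes builds on a standard idea: predict the current frame by sampling the reference frame at positions displaced by estimated per-pixel offsets. The sketch below is not the paper's RDT (which couples offset estimation with self-attention); it is only a minimal, generic illustration of offset-based bilinear warping, with the function name and toy data chosen here for illustration.

```python
import numpy as np

def warp_with_offsets(ref, offsets):
    """Bilinearly sample a reference frame `ref` (H, W) at positions
    displaced by per-pixel `offsets` (H, W, 2) holding (dy, dx).
    A toy stand-in for deformable motion compensation."""
    H, W = ref.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions = base grid + estimated offsets, clipped to the frame.
    py = np.clip(ys + offsets[..., 0], 0, H - 1)
    px = np.clip(xs + offsets[..., 1], 0, W - 1)
    y0, x0 = np.floor(py).astype(int), np.floor(px).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = py - y0, px - x0
    # Bilinear interpolation from the four neighbouring reference pixels.
    top = ref[y0, x0] * (1 - wx) + ref[y0, x1] * wx
    bot = ref[y1, x0] * (1 - wx) + ref[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Toy check: a uniform offset of +1 in x samples each pixel's right neighbour,
# i.e. it reproduces a one-pixel horizontal shift of the reference frame.
ref = np.arange(16, dtype=float).reshape(4, 4)
shift_right = np.zeros((4, 4, 2))
shift_right[..., 1] = 1.0
pred = warp_with_offsets(ref, shift_right)
```

In an actual learned codec the offsets would be predicted by a network (here, the Uformer-based estimator), transmitted in compressed form, and the warp applied in feature space rather than pixel space; the interpolation itself is what keeps the operation differentiable for end-to-end training.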


Related research

- FVC: A New Framework towards Deep Video Compression in Feature Space (05/20/2021). Learning based video compression attracts increasing attention in the pa...
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model (07/09/2020). Over the past two decades, traditional block-based video coding has made...
- Scene Matters: Model-based Deep Video Compression (03/08/2023). Video compression has always been a popular research area, where many tr...
- Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression (07/13/2022). For neural video codec, it is critical, yet challenging, to design an ef...
- Boosting neural video codecs by exploiting hierarchical redundancy (08/08/2022). In video compression, coding efficiency is improved by reusing pixels fr...
- Learned Video Compression via Joint Spatial-Temporal Correlation Exploration (12/13/2019). Traditional video compression technologies have been developed over deca...
- Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis (07/11/2022). Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in ge...
