Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

01/25/2022
by   Xin Jin, et al.

Traditional media coding schemes typically encode images and video into a binary stream with no exposed semantics, so downstream intelligent tasks cannot be supported directly at the bitstream level. The Semantically Structured Image Coding (SSIC) framework made the first attempt to enable decoding-free or partial-decoding analysis for image intelligent tasks via a Semantically Structured Bitstream (SSB). However, SSIC only considers image coding, and the SSB it generates contains only static object information. In this paper, we extend the idea of semantically structured coding to video and propose an advanced Semantically Structured Video Coding (SSVC) framework to support heterogeneous intelligent applications. Video signals carry richer dynamic motion information than images and contain more redundancy, owing to the similarity between adjacent frames. We therefore reformulate the SSB in SSVC so that it contains both static object characteristics and dynamic motion clues. Specifically, we introduce optical flow to encode continuous motion information and reduce cross-frame redundancy via a predictive coding architecture; the optical flow and residual information are then reorganized into the SSB, which enables SSVC to adaptively support video-based downstream intelligent applications. Extensive experiments demonstrate that the proposed SSVC framework can directly support multiple intelligent tasks using only a partially decoded bitstream. This avoids full bitstream decompression and thus significantly reduces bitrate and bandwidth consumption for intelligent analytics. We verify this on tasks including image object detection, pose estimation, video action recognition, and video object segmentation.
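To make the idea of partial decoding concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical three-segment layout (`static`, `motion`, `residual`) with a length index in the header, uses a global integer shift of a 1-D "frame" as a stand-in for optical-flow warping, and uses `zlib` in place of a learned entropy coder. All function names and the byte layout are illustrative assumptions; the abstract does not specify them.

```python
import struct
import zlib

# Hypothetical segment order; the real SSB layout is not given in the abstract.
SEGMENTS = ("static", "motion", "residual")

def pack_ssb(parts):
    """Pack named payloads into one stream: a length index, then compressed segments."""
    blobs = [zlib.compress(parts[name]) for name in SEGMENTS]
    header = struct.pack("<B", len(blobs))
    for blob in blobs:
        header += struct.pack("<I", len(blob))
    return header + b"".join(blobs)

def read_segment(stream, name):
    """Decompress only the requested segment, skipping the others via the index."""
    count = stream[0]
    lengths = struct.unpack_from("<%dI" % count, stream, 1)
    idx = SEGMENTS.index(name)
    offset = 1 + 4 * count + sum(lengths[:idx])
    return zlib.decompress(stream[offset:offset + lengths[idx]])

def predict(prev, shift):
    """Motion-compensate a 1-D frame by a global shift (toy stand-in for flow warping)."""
    n = len(prev)
    return [prev[(i - shift) % n] for i in range(n)]

def encode_frame(prev, cur, shift, static_info):
    """Predictive coding: store the motion and the residual against the prediction."""
    pred = predict(prev, shift)
    residual = bytes((c - p) % 256 for c, p in zip(cur, pred))
    return pack_ssb({
        "static": static_info,               # e.g. object labels or boxes (assumed format)
        "motion": struct.pack("<b", shift),  # one signed byte of motion
        "residual": residual,
    })

def decode_frame(prev, stream):
    """Full reconstruction needs the motion and residual segments."""
    shift = struct.unpack("<b", read_segment(stream, "motion"))[0]
    residual = read_segment(stream, "residual")
    pred = predict(prev, shift)
    return [(p + r) % 256 for p, r in zip(pred, residual)]

prev_frame = [0, 10, 20, 30, 40, 50, 60, 70]
cur_frame = predict(prev_frame, 2)
cur_frame[3] = 12  # a small change that motion alone does not explain
stream = encode_frame(prev_frame, cur_frame, 2, b"label=person")

# A motion-only task reads one segment and never touches the residual.
motion_only = struct.unpack("<b", read_segment(stream, "motion"))[0]
reconstructed = decode_frame(prev_frame, stream)
```

In this sketch, a task that needs only motion clues reads the `motion` segment, a task that needs only object information reads `static`, and only full reconstruction pays the cost of decoding the residual. That mirrors, in a simplified form, the bandwidth saving the abstract attributes to partial decoding.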


research
10/22/2019

Predictive Coding Networks Meet Action Recognition

Action recognition is a key problem in computer vision that labels video...
research
07/26/2019

Unsupervised Learning for Optical Flow Estimation Using Pyramid Convolution LSTM

Most of current Convolution Neural Network (CNN) based methods for optic...
research
12/11/2019

Deep motion estimation for parallel inter-frame prediction in video compression

Standard video codecs rely on optical flow to guide inter-frame predicti...
research
08/06/2020

Optical Flow and Mode Selection for Learning-based Video Coding

This paper introduces a new method for inter-frame coding based on two c...
research
01/11/2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Motion has shown to be useful for video understanding, where motion is t...
research
06/06/2022

A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

Deep spatiotemporal models are used in a variety of computer vision task...
research
10/22/2021

IVS3D: An Open Source Framework for Intelligent Video Sampling and Preprocessing to Facilitate 3D Reconstruction

The creation of detailed 3D models is relevant for a wide range of appli...
