Self-Conditioned Probabilistic Learning of Video Rescaling

07/24/2021
by Yuan Tian, et al.

Bicubic downscaling is a prevalent technique for reducing video storage requirements or accelerating downstream processing. However, the inverse upscaling step is non-trivial, and the downscaled video may also degrade the performance of downstream tasks. In this paper, we propose a self-conditioned probabilistic framework for video rescaling that learns the paired downscaling and upscaling procedures simultaneously. During training, we decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on the strong spatial-temporal prior information within the downscaled video. After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks such as video action recognition. We further extend the framework to a lossy video compression system, for which we propose a gradient estimator for non-differentiable industrial lossy codecs to enable end-to-end training of the whole system. Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks.
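The objective described above, which lowers the entropy of the lost information by maximizing its probability conditioned on the downscaled video, can be made concrete with a minimal NumPy sketch. It rests on several assumptions that do not come from the abstract. A fixed 2x2 average pooling stands in for the learned downscaler, so the "lost information" z is simply the residual between the frame and a nearest-neighbour upscale of the downscaled frame y. The conditional model is assumed to be a diagonal Gaussian whose mean `mu` and log-scale `log_sigma` would, in a real system, be predicted from y by a network. The helper names `upscale_nearest`, `split_frame`, and `gaussian_nll` are hypothetical and are not the paper's implementation.

```python
import numpy as np


def upscale_nearest(low):
    """Nearest-neighbour 2x upscale: each low-res pixel fills a 2x2 block."""
    return np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)


def split_frame(frame):
    """Split a frame of shape (H, W), with H and W even, into a 2x-downscaled
    frame y and the residual z lost by downscaling.

    Hypothetical simplification: fixed 2x2 average pooling replaces the
    learned downscaler used in the paper.
    """
    h, w = frame.shape
    blocks = frame.reshape(h // 2, 2, w // 2, 2)   # blocks[i, a, j, b] = frame[2i+a, 2j+b]
    low = blocks.mean(axis=(1, 3))                 # downscaled frame y
    lost = frame - upscale_nearest(low)            # information z lost by downscaling
    return low, lost


def gaussian_nll(z, mu, log_sigma):
    """Mean negative log-likelihood (in nats) of z under N(mu, sigma^2).

    Assumption: in a trained system mu and log_sigma would be predicted from
    the downscaled video y; minimizing this value maximizes p(z | y).
    """
    sigma2 = np.exp(2.0 * log_sigma)
    return np.mean(0.5 * np.log(2.0 * np.pi) + log_sigma + 0.5 * (z - mu) ** 2 / sigma2)
```

Usage: for a frame `frame`, `low, lost = split_frame(frame)` yields y and z, and `upscale_nearest(low) + lost` reconstructs the frame exactly. Comparing `gaussian_nll(lost, mu, log_sigma)` for a conditional prediction `mu` that tracks `lost` against an uninformed `mu = 0` shows the intuition: the better y predicts z, the lower the negative log-likelihood, which is the quantity that conditioning on the spatial-temporal prior is meant to drive down.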

research
04/08/2022

Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning

Recent self-supervised video representation learning methods focus on ma...
research
06/28/2020

Video Representation Learning with Visual Tempo Consistency

Visual tempo, which describes how fast an action goes, has shown its pot...
research
12/13/2019

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Annotating videos is cumbersome, expensive and not scalable. Yet, many s...
research
11/23/2022

Dynamic Appearance: A Video Representation for Action Recognition with Joint Training

Static appearance of video may impede the ability of a deep neural netwo...
research
05/10/2023

Few-shot Action Recognition via Intra- and Inter-Video Information Maximization

Current few-shot action recognition involves two primary sources of info...
research
07/08/2022

Beyond Transfer Learning: Co-finetuning for Action Localisation

Transfer learning is the predominant paradigm for training deep networks...
research
07/07/2021

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

Standard lossy image compression algorithms aim to preserve an image's a...