Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

09/15/2023
by Rui Deng, et al.

Optimizing video inference efficiency has become increasingly important with the growing demand for video analysis in various fields. Some existing methods achieve high efficiency by explicitly discarding spatial or temporal information, which poses challenges in fast-changing and fine-grained scenarios. To address these issues, we propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations. Specifically, we leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features, refining and updating the features into a high-low resolution video sequence. To process the new sequence, we introduce a new Resolution-Align Transformer Layer that captures global temporal correlations among frame features with different resolutions, while reducing spatial computation costs quadratically by using fewer spatial tokens in low-resolution non-saliency frames. Because the compression module is differentiable, the entire network can be optimized end-to-end. Experimental results show that our method achieves the best trade-off between efficiency and performance on near-duplicate video retrieval and competitive results on dynamic video classification compared to state-of-the-art methods. Code: https://github.com/dun-research/DRCA
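
To make the quadratic saving concrete, the sketch below counts query-key pairs in spatial self-attention for a mixed high-low resolution sequence. It is a back-of-the-envelope illustration, not the DRCA implementation: it assumes plain full self-attention within each frame (the abstract does not specify the attention variant), and the function names, frame counts, and token grids (14x14 and 7x7) are hypothetical values chosen for the example.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Query-key pairs in full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def spatial_attention_cost(num_frames: int, num_salient: int,
                           high_tokens: int, low_tokens: int) -> int:
    """Total per-frame spatial attention pairs for a high-low resolution sequence.

    Salient frames keep high_tokens spatial tokens; the remaining frames
    are compressed to low_tokens spatial tokens.
    """
    if not 0 <= num_salient <= num_frames:
        raise ValueError("num_salient must be between 0 and num_frames")
    num_non_salient = num_frames - num_salient
    return (num_salient * attention_pair_count(high_tokens)
            + num_non_salient * attention_pair_count(low_tokens))


# Hypothetical setting: 8 frames, 14x14 = 196 tokens at high resolution,
# 7x7 = 49 tokens at low resolution.
baseline = spatial_attention_cost(8, 8, 196, 49)  # every frame at high resolution
mixed = spatial_attention_cost(8, 2, 196, 49)     # only 2 salient frames at high resolution

print(baseline, mixed, round(mixed / baseline, 3))
```

Under these assumed numbers, halving the side length of a frame's token grid cuts its token count by 4x and its attention pairs by 16x, which is why compressing the non-saliency frames dominates the saving.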

research
10/28/2021

MEGAN: Memory Enhanced Graph Attention Network for Space-Time Video Super-Resolution

Space-time video super-resolution (STVSR) aims to construct a high space...
research
10/15/2021

EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

In this paper, we consider the problem of reference-based video super-re...
research
03/27/2022

RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

Space-time video super-resolution (STVSR) is the task of interpolating v...
research
03/13/2023

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Video semantic segmentation (VSS) is a computationally expensive task du...
research
12/28/2021

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Recent works have shown that the computational efficiency of video recog...
research
07/11/2022

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in ge...
research
03/29/2021

Video Classification with FineCoarse Networks

A rich representation of the information in video data can be realized b...
