VA-RED^2: Video Adaptive Redundancy Reduction

02/15/2021
by Bowen Pan, et al.

Performing inference with deep learning models on videos remains a challenge due to the large amount of computational resources required for robust recognition. An inherent property of real-world videos is the high correlation of information across frames, which can translate into redundancy in the temporal or spatial feature maps of the models, or both. The type of redundant features depends on the dynamics and type of events in the video: static videos have more temporal redundancy, while videos focusing on objects tend to have more channel redundancy. Here we present an input-dependent redundancy reduction framework, termed VA-RED^2. Specifically, our VA-RED^2 framework uses an input-dependent policy to decide how many features need to be computed along the temporal and channel dimensions. To preserve the capacity of the original model, after fully computing the necessary features, we reconstruct the remaining redundant features from the computed ones using cheap linear operations. We learn the adaptive policy jointly with the network weights in a differentiable way via a shared-weight mechanism, making it highly efficient. Extensive experiments on multiple video datasets and different visual tasks show that our framework achieves a 20%-40% reduction in computation (FLOPs) compared to state-of-the-art methods without any performance loss. Project page: http://people.csail.mit.edu/bpan/va-red/.
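The channel-redundancy idea from the abstract can be illustrated with a small sketch: fully compute only a fraction of the output channels of a (1x1) convolution, then reconstruct the rest with a cheap linear map over the computed channels. This is a toy numpy illustration under our own assumptions, not the authors' implementation; the function name, weight shapes, and the fixed `keep_ratio` (which the real framework would choose per input via the learned policy) are all hypothetical.

```python
import numpy as np

def reduced_channel_conv(x, w_full, w_cheap, keep_ratio):
    """Illustrative channel redundancy reduction (not the paper's code).

    x:        input feature map, shape (C_in, H, W)
    w_full:   full 1x1 conv weights, shape (C_out, C_in)
    w_cheap:  cheap reconstruction weights, shape (C_out, C_out);
              only the slice mapping kept -> remaining channels is used
    keep_ratio: fraction of output channels to compute fully
                (in VA-RED^2 this would be decided per input by a policy)
    """
    c_out = w_full.shape[0]
    kept = max(1, int(round(c_out * keep_ratio)))
    # Expensive path: fully compute only `kept` output channels.
    full = np.einsum('oc,chw->ohw', w_full[:kept], x)
    # Cheap path: reconstruct the remaining channels as linear
    # combinations of the kept ones (far fewer multiply-adds).
    cheap = np.einsum('ok,khw->ohw', w_cheap[kept:, :kept], full)
    return np.concatenate([full, cheap], axis=0)
```

With `keep_ratio=1.0` this reduces to the ordinary full convolution, so the layer's capacity is only traded away when the policy judges the input redundant enough.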


Related research:
- IA-RED^2: Interpretability-Aware Redundancy Reduction for Vision Transformers (06/23/2021)
- Fine-grained Video Categorization with Redundancy Reduction Attention (10/26/2018)
- AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition (02/10/2021)
- Feature-dependent Cross-Connections in Multi-Path Neural Networks (06/24/2020)
- Robust and Efficient Memory Network for Video Object Segmentation (04/24/2023)
- Dynamic Network Quantization for Efficient Video Inference (08/23/2021)
- Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos (01/03/2023)
