AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

07/31/2020
by Yue Meng, et al.

Action recognition is an open and challenging problem in computer vision. While current state-of-the-art models offer excellent recognition results, their computational expense limits their impact in many real-world applications. In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects the optimal resolution for each frame on the fly, conditioned on the input, for efficient action recognition in long untrimmed videos. Specifically, given a video frame, a policy network decides what input resolution the action recognition model should use to process it, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on several challenging action recognition benchmark datasets demonstrate the efficacy of our proposed approach over state-of-the-art methods. The project page can be found at https://mengyuest.github.io/AR-Net.
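
To make the per-frame decision described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The candidate resolutions, the quadratic cost model, the logit values, and every function name are hypothetical and chosen only for illustration. The abstract does not say how the discrete choice is made trainable with back-propagation; the Gumbel-softmax relaxation shown here is one common technique for that and is included as an assumption.

```python
import math
import random

# Hypothetical candidate input resolutions (pixels per side); not from the abstract.
RESOLUTIONS = [224, 168, 112, 84]


def relative_cost(res, base=224):
    """Assumed cost model: compute scales with the number of pixels."""
    return (res / base) ** 2


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample over resolutions (assumed training-time trick).

    Adds Gumbel noise to the logits and applies a temperature-scaled softmax,
    so the choice stays differentiable in an autodiff framework.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    return softmax(noisy)


def select_resolutions(frame_logits):
    """Inference-time policy: pick the highest-scoring resolution per frame."""
    choices = []
    for logits in frame_logits:
        best = max(range(len(logits)), key=lambda i: logits[i])
        choices.append(RESOLUTIONS[best])
    return choices


def average_cost(choices):
    """Mean compute per frame relative to always using the largest resolution."""
    return sum(relative_cost(r) for r in choices) / len(choices)


# Hypothetical policy-network outputs for two frames.
example_logits = [[2.0, 0.1, 0.0, -1.0], [0.0, 0.0, 3.0, 0.0]]
example_choices = select_resolutions(example_logits)
example_cost = average_cost(example_choices)
```

In this toy setup the first frame is processed at the full resolution and the second at a reduced one, so the average per-frame cost falls below that of always using the largest input. A real system would replace the hand-written logits with the output of a learned policy network.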

05/11/2021

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

Multi-modal learning, which focuses on utilizing various modalities to i...

02/10/2021

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Temporal modelling is the key for efficient video action recognition. Wh...

08/23/2021

Dynamic Network Quantization for Efficient Video Inference

Deep convolutional networks have recently achieved great success in vide...

02/19/2022

Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study

While action recognition (AR) has gained large improvements with the int...

06/28/2020

Dynamic Sampling Networks for Efficient Action Recognition in Videos

The existing action recognition methods are mainly based on clip-level c...

07/12/2022

Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

Adaptive sampling that exploits the spatiotemporal redundancy in videos ...

11/03/2021

Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution

Automatic action identification from video and kinematic data is an impo...