Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

07/27/2023
by Harry Cheng, et al.

Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets. Current methods primarily aim to either reduce model size or utilize pre-trained models, which limits their adaptability to various backbone architectures. This paper investigates the issue of over-sampled frames, a problem that is prevalent in many approaches yet has received relatively little attention. Although using fewer frames is a potential solution, it often results in a substantial decline in performance. To address this issue, we propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames. This feature restoration technique brings a negligible increase in computational requirements compared to resource-intensive image encoders, such as ViT. To evaluate the effectiveness of our method, we conduct extensive experiments on four public datasets: Kinetics-400, ActivityNet, UCF-101, and HMDB-51. With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy. In addition, our method also surprisingly helps improve the generalization ability of the models under zero-shot settings.
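The idea of restoring intermediate features between two sparsely sampled, adjacent frames can be illustrated with a minimal sketch. The abstract does not describe the restoration module itself, so the code below substitutes a plain linear blend as a stand-in. The names `restore_between`, `densify`, and `encoder_calls_saved` are hypothetical and do not come from the paper. The sketch only shows where the saving comes from: the expensive encoder runs on the sparse frames, and a cheap operation fills in the features in between.

```python
from typing import List

Vector = List[float]


def restore_between(feat_a: Vector, feat_b: Vector, alpha: float = 0.5) -> Vector:
    """Approximate the feature of a frame lying between two encoded frames.

    ASSUMPTION: a linear blend is used purely for illustration; the paper's
    restoration module is not specified in the abstract.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(feat_a, feat_b)]


def densify(sparse_feats: List[Vector]) -> List[Vector]:
    """Insert one restored feature between each pair of adjacent sparse features."""
    if not sparse_feats:
        return []
    dense = [sparse_feats[0]]
    for prev, nxt in zip(sparse_feats, sparse_feats[1:]):
        dense.append(restore_between(prev, nxt))
        dense.append(nxt)
    return dense


def encoder_calls_saved(num_sparse: int) -> float:
    """Fraction of encoder calls avoided versus encoding every dense frame."""
    num_dense = 2 * num_sparse - 1
    return 1.0 - num_sparse / num_dense


# Toy 2-dimensional features standing in for image-encoder outputs.
sparse = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
dense = densify(sparse)
print(len(sparse), "encoded ->", len(dense), "features after restoration")
```

In this toy setup, three encoded frames yield five feature vectors, so the encoder is called on three frames instead of five. The figures reported in the abstract come from the authors' experiments and are not reproduced by this sketch.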


