Alignment-guided Temporal Attention for Video Action Recognition

09/30/2022
by   Yizhou Zhao, et al.
10

Temporal modeling is crucial for various video learning tasks. Most recent approaches employ either factorized (2D+1D) or joint (3D) spatial-temporal operations to extract temporal contexts from the input frames. While the former is more efficient in computation, the latter often obtains better performance. In this paper, we attribute this to a dilemma between the sufficiency and the efficiency of interactions among various positions in different frames. These interactions affect the extraction of task-relevant information shared among frames. To resolve this issue, we prove that frame-by-frame alignments have the potential to increase the mutual information between frame representations, thereby including more task-relevant information to boost effectiveness. Then we propose Alignment-guided Temporal Attention (ATA) to extend 1-dimensional temporal attention with parameter-free patch-level alignments between neighboring frames. It can act as a general plug-in for image backbones to conduct the action recognition task without any model-specific design. Extensive experiments on multiple benchmarks demonstrate the superiority and generality of our module.

READ FULL TEXT

page 6

page 12

research
07/20/2022

Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition

A primary challenge faced in few-shot action recognition is inadequate v...
research
08/22/2018

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

We propose a self-supervised learning method to jointly reason about spa...
research
05/10/2023

Few-shot Action Recognition via Intra- and Inter-Video Information Maximization

Current few-shot action recognition involves two primary sources of info...
research
11/18/2022

Look More but Care Less in Video Recognition

Existing action recognition methods typically sample a few frames to rep...
research
04/20/2023

Search-Map-Search: A Frame Selection Paradigm for Action Recognition

Despite the success of deep learning in video understanding tasks, proce...
research
07/27/2023

Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

Training an effective video action recognition model poses significant c...
research
03/26/2023

Frame Flexible Network

Existing video recognition algorithms always conduct different training ...

Please sign up or login with your details

Forgot password? Click here to reset