Action Recognition in Untrimmed Videos with Composite Self-Attention Two-Stream Framework

08/04/2019
by   Dong Cao, et al.

With the rapid development of deep learning algorithms, action recognition in video has achieved many important research results. Zero-Shot Action Recognition (ZSAR), which requires no positive examples to classify new categories, has recently attracted considerable attention. Another difficulty in action recognition is that untrimmed data can seriously degrade model performance. We propose a composite two-stream framework with a pre-trained model. Our framework comprises a classifier branch and a composite feature branch. A graph network model is adopted in each branch, which effectively improves the feature extraction and reasoning ability of the framework. In the composite feature branch, a 3-channel self-attention model is constructed to weight each frame in the video and give more attention to key frames. Each self-attention channel outputs a set of attention weights, represented as a one-dimensional vector, that focuses on a particular aspect of the video. The 3-channel self-attention model can therefore evaluate key frames from multiple aspects, and the output attention weight vectors form an attention matrix, which effectively enhances attention to key frames strongly correlated with the action. The model can perform action recognition under zero-shot conditions and shows good recognition performance on untrimmed video data. Experimental results on relevant datasets confirm the effectiveness of our model.
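The abstract's 3-channel self-attention can be sketched as follows: each channel scores every frame, a softmax turns the scores into a one-dimensional weight vector, and stacking the channels yields the attention matrix used to pool frame features. This is a minimal NumPy sketch under stated assumptions; the projection weights `W` and the concatenation of per-channel descriptors are hypothetical simplifications, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(frames, w):
    """One attention channel: score each frame with a learned
    projection (hypothetical here) and normalize to weights."""
    scores = frames @ w          # (T,) one score per frame
    return softmax(scores)       # (T,) 1-D attention weight vector

rng = np.random.default_rng(0)
T, D, C = 8, 16, 3               # frames, feature dim, attention channels
frames = rng.normal(size=(T, D)) # per-frame features from a backbone (assumed given)
W = rng.normal(size=(C, D))      # one projection vector per channel (placeholder for learned weights)

# Each channel emits a 1-D weight vector over frames; stacking the
# C vectors forms the attention matrix described in the abstract.
A = np.stack([channel_attention(frames, W[c]) for c in range(C)])  # (C, T)

pooled = A @ frames                 # (C, D): per-channel weighted video descriptor
video_feature = pooled.reshape(-1)  # concatenate channels into one composite feature
```

Each row of `A` sums to 1, so a frame that several channels weight heavily acts as a key frame from multiple aspects, which is the intuition the abstract describes.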

Related research:

- 09/13/2019 - Zero-Shot Action Recognition in Videos: A Survey
- 12/02/2021 - Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips
- 12/17/2022 - Inductive Attention for Video Action Anticipation
- 02/20/2019 - Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision
- 07/27/2023 - Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
- 12/15/2020 - GTA: Global Temporal Attention for Video Action Understanding
- 09/29/2022 - REST: REtrieve Self-Train for generative action recognition
