ACTION-Net: Multipath Excitation for Action Recognition

03/11/2021
by Zhengwei Wang, et al.

Spatio-temporal, channel-wise, and motion patterns are three complementary and crucial types of information for video action recognition. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNNs can achieve good performance but are computationally intensive. In this work, we tackle this dilemma by designing a generic and effective module that can be embedded into 2D CNNs. To this end, we propose a spAtio-temporal, Channel and moTion excitatION (ACTION) module consisting of three paths: a Spatio-Temporal Excitation (STE) path, a Channel Excitation (CE) path, and a Motion Excitation (ME) path. The STE path employs a single-channel 3D convolution to characterize the spatio-temporal representation. The CE path adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels along the temporal dimension. The ME path calculates feature-level temporal differences, which are then used to excite motion-sensitive channels. We equip 2D CNNs with the proposed ACTION module to form a simple yet effective ACTION-Net with very limited extra computational cost. ACTION-Net consistently outperforms its 2D CNN counterparts on three backbones (i.e., ResNet-50, MobileNet V2, and BNInception) across three datasets (i.e., Something-Something V2, Jester, and EgoGesture). Code is available at <https://github.com/V-Sense/ACTION-Net>.
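To make the ME path idea concrete, the following is a minimal NumPy sketch of motion excitation from feature-level temporal differences. It is an illustration under simplifying assumptions, not the authors' implementation: the function name `motion_excitation`, the `(N, T, C, H, W)` layout, the zero-padding of the last time step, and the residual form `x + x * mask` are all assumed here, and any learned layers of the actual module are omitted.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def motion_excitation(x):
    """Simplified, parameter-free motion excitation (hypothetical helper).

    x: feature tensor of shape (N, T, C, H, W), i.e. batch, time,
       channels, height, width. Returns an array of the same shape.
    """
    # Feature-level temporal difference between adjacent frames.
    diff = x[:, 1:] - x[:, :-1]                    # (N, T-1, C, H, W)
    # Assumption: zero-pad the last time step so the length stays T.
    pad = np.zeros_like(x[:, :1])
    diff = np.concatenate([diff, pad], axis=1)     # (N, T, C, H, W)
    # Spatial average pooling: one motion score per frame and channel.
    score = diff.mean(axis=(3, 4), keepdims=True)  # (N, T, C, 1, 1)
    # Squash scores into (0, 1) to obtain a channel-wise mask.
    mask = sigmoid(score)
    # Assumed residual excitation: keep the input, add the re-weighted copy.
    return x + x * mask


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 16, 7, 7))  # 2 clips, 8 frames, 16 channels
y = motion_excitation(x)
```

In this sketch a static clip (all frames identical) yields zero differences, so every mask entry is sigmoid(0) = 0.5 and the output is uniformly 1.5 times the input; channels whose activations grow from one frame to the next receive a larger weight.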

research
12/05/2021

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition

The modeling, computational cost, and accuracy of traditional Spatio-tem...
research
06/03/2021

CT-Net: Channel Tensorization Network for Video Classification

3D convolution is powerful for video classification but often computatio...
research
09/27/2021

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

The explosive growth in video streaming requires video understanding at ...
research
11/20/2018

Temporal Shift Module for Efficient Video Understanding

The explosive growth in online video streaming gives rise to challenges ...
research
11/21/2019

TEINet: Towards an Efficient Architecture for Video Recognition

Efficiency is an important issue in designing video architectures for ac...
research
10/20/2021

GTM: Gray Temporal Model for Video Recognition

Data input modality plays an important role in video action recognition....
research
04/20/2022

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Attention mechanisms have significantly boosted the performance of video...
