STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition

03/18/2020
by   Xu Li, et al.

Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from a trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid Convolution Network (denoted as "STH"), which simultaneously encodes spatial and temporal video information at a small parameter cost. Unlike existing works that extract spatial and temporal information sequentially or in parallel with different convolutional layers, we divide the input channels into multiple groups and interleave the spatial and temporal operations within a single convolutional layer, which deeply integrates spatial and temporal cues. This design enables efficient spatio-temporal modeling while keeping the model small. STH-Conv is a general building block that can be plugged into existing 2D CNN architectures such as ResNet and MobileNet by replacing their conventional 2D convolution (2D-Conv) blocks. The STH network achieves competitive or better performance than its competitors on benchmark datasets such as Something-Something (V1 & V2), Jester, and HMDB-51. Moreover, STH outperforms 3D CNNs while using even fewer parameters than 2D CNNs.
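
The parameter claim in the abstract can be made concrete with a rough count. The sketch below is illustrative only, not the authors' implementation. It assumes that, within each filter, some input channels use a k x 1 x 1 temporal kernel and the remaining channels use a 1 x k x k spatial kernel. The kernel shapes, the 64-channel layer, the split of 16 temporal channels, and all function names are hypothetical choices made for this example; the abstract does not specify them.

```python
# Illustrative parameter counts for one convolutional layer (biases ignored).
# ASSUMPTION: kernel shapes and the channel split are hypothetical; the
# abstract only says that input channels are divided into groups that
# receive either spatial or temporal operations within a single layer.

def conv2d_params(c_in, c_out, k=3):
    """Weights in a standard k x k 2D convolution."""
    return c_out * c_in * k * k


def conv3d_params(c_in, c_out, k=3):
    """Weights in a standard k x k x k 3D convolution."""
    return c_out * c_in * k * k * k


def hybrid_params(c_in, c_out, temporal_channels, k=3):
    """Weights when, in every filter, `temporal_channels` input channels use
    a k x 1 x 1 temporal kernel and the rest use a 1 x k x k spatial kernel."""
    if not 0 <= temporal_channels <= c_in:
        raise ValueError("temporal_channels must lie in [0, c_in]")
    spatial_channels = c_in - temporal_channels
    per_filter = temporal_channels * k + spatial_channels * k * k
    return c_out * per_filter


if __name__ == "__main__":
    c_in, c_out = 64, 64  # hypothetical layer width
    print("2D conv:", conv2d_params(c_in, c_out))       # 36864
    print("3D conv:", conv3d_params(c_in, c_out))       # 110592
    print("hybrid :", hybrid_params(c_in, c_out, 16))   # 30720
```

Under these assumptions, each temporal slice costs k weights instead of k x k, so any layer with at least one temporal channel has fewer weights than the 2D layer it replaces, which is consistent with the abstract's statement that STH can be smaller than a 2D CNN.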

research
09/30/2019

Spatio-Temporal FAST 3D Convolutions for Human Action Recognition

Effective processing of video input is essential for the recognition of ...
research
12/14/2018

TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition

We present Temporal Aggregation Network (TAN) which decomposes 3D convol...
research
03/04/2019

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action rec...
research
11/05/2018

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Despite the success of deep learning for static image understanding, it ...
research
10/29/2021

ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

It is difficult for people to interpret the decision-making in the infer...
research
01/08/2023

STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Recently privacy-preserving action recognition (PPAR) has been becoming ...
research
09/28/2019

Grouped Spatial-Temporal Aggregation for Efficient Action Recognition

Temporal reasoning is an important aspect of video analysis. 3D CNN show...
