GTM: Gray Temporal Model for Video Recognition

10/20/2021
by Yanping Zhang, et al.

Data input modality plays an important role in video action recognition. Normally, there are three types of input: RGB frames, optical flow, and compressed data. In this paper, we propose a new input modality: the gray stream. Specifically, taking three stacked consecutive gray images as input, which together have the same size as one RGB image, not only skips the conversion from decoded video data to RGB but also improves spatio-temporal modeling ability at zero extra computation and zero extra parameters. Meanwhile, we propose a 1D Identity Channel-wise Spatio-temporal Convolution (1D-ICSC), which captures temporal relationships at the channel-feature level within a controllable computation budget (set by the parameters G and R). Finally, we confirm its effectiveness and efficiency on several action recognition benchmarks, including Kinetics, Something-Something, HMDB-51 and UCF-101, where it achieves impressive results.
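The core idea of the gray stream is a shape argument: three single-channel gray frames stacked along the channel axis form a 3 x H x W tensor, exactly the shape of one RGB frame, so a standard 2D backbone can consume it without any change to its first layer while seeing three time steps instead of one. The sketch below illustrates only that stacking step under stated assumptions. The function names (`luma_bt601`, `rgb_frame_to_gray`, `gray_stream_input`) are hypothetical, the BT.601 luma weights are an assumption for the case where only RGB is available (with YUV-decoded video the Y plane could be used directly, which is what lets the conversion to RGB be skipped), and choosing clips by a `start` index is an illustrative sampling choice, not the authors' documented pipeline.

```python
from typing import List, Tuple

# A gray plane is an H x W grid of 8-bit intensities (row-major nested lists).
Plane = List[List[int]]


def luma_bt601(r: int, g: int, b: int) -> int:
    """Gray value from RGB using ITU-R BT.601 luma weights.

    Assumption: only needed when frames arrive as RGB; a YUV decoder
    already provides this as the Y plane.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def rgb_frame_to_gray(frame: List[List[Tuple[int, int, int]]]) -> Plane:
    """Convert an H x W grid of (r, g, b) pixels into an H x W gray plane."""
    return [[luma_bt601(*px) for px in row] for row in frame]


def gray_stream_input(gray_frames: List[Plane], start: int) -> List[Plane]:
    """Stack three consecutive gray planes into a 3 x H x W tensor.

    Hypothetical helper: the result has the same channel-first shape as a
    single RGB frame, so time is folded into the channel axis at no cost.
    """
    if start < 0 or start + 3 > len(gray_frames):
        raise IndexError("need three consecutive frames starting at `start`")
    clip = gray_frames[start:start + 3]
    h, w = len(clip[0]), len(clip[0][0])
    for plane in clip:
        if len(plane) != h or any(len(row) != w for row in plane):
            raise ValueError("all frames must share the same H x W size")
    # Channel c holds frame start + c; copy rows so the input is not aliased.
    return [[row[:] for row in plane] for plane in clip]


if __name__ == "__main__":
    # Five synthetic 2 x 4 gray frames whose values encode the frame index.
    frames = [[[t * 10 + x for x in range(4)] for _ in range(2)] for t in range(5)]
    x = gray_stream_input(frames, 1)
    print(len(x), len(x[0]), len(x[0][0]))  # 3 2 4
```

Because the stacked clip has the shape of an RGB frame, any model that already accepts RGB input accepts it unchanged; this is the sense in which the abstract's "zero computation and zero parameters" claim can be read.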
