STM: SpatioTemporal and Motion Encoding for Action Recognition

08/07/2019
by   Boyuan Jiang, et al.
4

Spatiotemporal and motion features are two complementary and crucial information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion features. In this work, we aim to efficiently encode these two features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to present the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace original residual blocks in the ResNet architecture with STM blcoks to form a simple yet effective STM network by introducing very limited extra computation cost. Extensive experiments demonstrate that the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51) with the help of encoding spatiotemporal and motion features together.

READ FULL TEXT

page 1

page 5

research
11/07/2016

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performan...
research
07/11/2019

Two-stream Spatiotemporal Feature for Video QA Task

Understanding the content of videos is one of the core techniques for de...
research
01/04/2018

What have we learned from deep representations for action recognition?

As the success of deep models has led to their deployment in all areas o...
research
07/04/2022

Automated Classification of General Movements in Infants Using a Two-stream Spatiotemporal Fusion Network

The assessment of general movements (GMs) in infants is a useful tool in...
research
12/10/2019

HalluciNet-ing Spatiotemporal Representations Using 2D-CNN

Spatiotemporal representations learnt using 3D convolutional neural netw...
research
08/17/2014

Action Classification with Locality-constrained Linear Coding

We propose an action classification algorithm which uses Locality-constr...
research
11/24/2017

Appearance-and-Relation Networks for Video Classification

Spatiotemporal feature learning in videos is a fundamental and difficult...

Please sign up or login with your details

Forgot password? Click here to reset