Real-time Action Recognition with Enhanced Motion Vector CNNs

04/26/2016
by Bowen Zhang, et al.

The deep two-stream architecture has exhibited excellent performance on video-based action recognition. Its most computationally expensive step is the calculation of optical flow, which prevents it from running in real time. This paper accelerates the architecture by replacing optical flow with motion vectors, which can be obtained directly from compressed videos without extra calculation. However, motion vectors lack fine structure and contain noisy and inaccurate motion patterns, which leads to an evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vectors are inherently correlated, so transferring the knowledge learned by an optical flow CNN to a motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this: initialization transfer, supervision transfer, and their combination. Experimental results show that our method achieves recognition performance comparable to the state of the art while processing 390.7 frames per second, which is 27 times faster than the original two-stream method.
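The two transfer strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so the code below assumes a common teacher-student setup: initialization transfer copies the optical flow network's weights into the motion vector network, and supervision transfer adds a temperature-softened cross-entropy term between the two networks' outputs. The function names, the `temperature` and `weight` parameters, and the dictionary-of-lists weight format are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def initialization_transfer(teacher_weights):
    """Assumed form of initialization transfer: start the motion vector
    (student) network from a copy of the optical flow (teacher) weights."""
    return {name: list(values) for name, values in teacher_weights.items()}


def supervision_transfer_loss(student_logits, teacher_logits, label,
                              temperature=2.0, weight=1.0):
    """Assumed form of supervision transfer: the usual label loss plus a
    term that pulls the student's softened output toward the teacher's."""
    hard_target = [1.0 if i == label else 0.0
                   for i in range(len(student_logits))]
    hard_loss = cross_entropy(hard_target, softmax(student_logits))
    soft_loss = cross_entropy(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return hard_loss + weight * soft_loss


# Illustrative values only: three action classes, ground-truth class 0.
teacher = [4.0, 1.0, 0.0]
agreeing = supervision_transfer_loss([4.0, 1.0, 0.0], teacher, label=0)
disagreeing = supervision_transfer_loss([0.0, 1.0, 4.0], teacher, label=0)
print(agreeing < disagreeing)
```

Under these assumptions, the combined strategy would initialize the student with `initialization_transfer` and then train it with `supervision_transfer_loss`. A student that agrees with the teacher incurs a lower loss than one that does not.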
