Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

12/10/2019
by Shiyuan Huang, et al.

Two-stream networks have achieved great success in video recognition. A two-stream network combines a spatial stream of RGB frames and a temporal stream of optical flow to make predictions. However, the temporal redundancy of RGB frames and the high cost of optical flow computation create challenges for both performance and efficiency. Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve inference speed by orders of magnitude. Previous works create one stream per modality and combine these streams with an additional temporal stream through late fusion. This is redundant, since some modalities, such as motion vectors, already contain temporal information. Based on this observation, we propose IP TSN, a compressed-domain two-stream network for compressed video recognition in which the two streams correspond to the two types of frames (I-frames and P-frames) in compressed videos, so no separate temporal stream is needed. To this end, we fully exploit the motion information of the P-stream through generalized distillation from optical flow, which substantially improves both efficiency and accuracy. Our P-stream runs 60 times faster than using optical flow while achieving higher accuracy. Our full IP TSN, evaluated on public action recognition benchmarks (UCF101, HMDB51 and a subset of Kinetics), outperforms other compressed-domain methods by large margins while improving the total inference speed by 20
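The two ideas named in the abstract, distillation from an optical-flow teacher into the P-stream and late fusion of the I-stream and P-stream, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss or the fusion rule, so the sketch assumes one common form of generalized distillation (a weighted sum of a hard-label cross-entropy term and a cross-entropy term against temperature-softened teacher outputs) and a simple weighted sum of class scores for fusion. All function names and the parameters `lam`, `temperature`, and `p_weight` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      lam=0.5, temperature=2.0):
    """Assumed generalized-distillation objective for the P-stream.

    student_logits: P-stream (student) outputs for one clip.
    teacher_logits: optical-flow network (teacher) outputs for the same clip.
    lam and temperature are illustrative values, not taken from the paper.
    """
    one_hot = [1.0 if i == label else 0.0
               for i in range(len(student_logits))]
    student_probs = softmax(student_logits)
    hard_term = cross_entropy(one_hot, student_probs)
    soft_targets = softmax(teacher_logits, temperature)
    soft_term = cross_entropy(soft_targets, student_probs)
    return (1.0 - lam) * hard_term + lam * soft_term


def late_fusion(i_scores, p_scores, p_weight=1.0):
    """Weighted sum of per-class scores from the I- and P-streams (assumed rule)."""
    return [a + p_weight * b for a, b in zip(i_scores, p_scores)]


def predict(i_logits, p_logits, p_weight=1.0):
    """Fuse both streams and return the index of the top-scoring class."""
    fused = late_fusion(softmax(i_logits), softmax(p_logits), p_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the teacher is needed only when computing the training loss; `predict` uses the I-stream and P-stream outputs alone, which matches the abstract's point that no separate temporal stream is required at inference time.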

research
12/19/2018

D3D: Distilled 3D Networks for Video Action Recognition

State-of-the-art methods for video action recognition commonly use an en...
research
01/11/2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Motion has shown to be useful for video understanding, where motion is t...
research
12/02/2017

Compressed Video Action Recognition

Training robust deep video representations has proven to be much more ch...
research
01/06/2023

Triple-stream Deep Metric Learning of Great Ape Behavioural Actions

We propose the first metric learning system for the recognition of great...
research
04/12/2017

Predictive-Corrective Networks for Action Detection

While deep feature learning has revolutionized techniques for static-ima...
research
04/26/2016

Real-time Action Recognition with Enhanced Motion Vector CNNs

The deep two-stream architecture exhibited excellent performance on vide...
