Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

12/10/2019
by Shiyuan Huang, et al.

Two-stream networks have achieved great success in video recognition. A two-stream network combines a spatial stream of RGB frames and a temporal stream of optical flow to make predictions. However, the temporal redundancy of RGB frames and the high cost of optical flow computation create challenges for both performance and efficiency. Recent works instead use modern compressed video modalities as an alternative to the RGB spatial stream and improve inference speed by orders of magnitude. These previous works create one stream per modality and combine the streams with an additional temporal stream through late fusion. This is redundant, since some modalities, such as motion vectors, already contain temporal information. Based on this observation, we propose IP TSN, a compressed-domain two-stream network for compressed video recognition, in which the two streams correspond to the two types of frames in compressed videos (I-frames and P-frames), without the need for a separate temporal stream. To this end, we fully exploit the motion information in the P-stream through generalized distillation from optical flow, which substantially improves both efficiency and accuracy. Our P-stream runs 60 times faster than using optical flow while achieving higher accuracy. Our full IP TSN, evaluated on public action recognition benchmarks (UCF101, HMDB51 and a subset of Kinetics), outperforms other compressed-domain methods by large margins while improving the total inference speed by 20
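To make the two ideas in the abstract more concrete, the sketch below shows (a) a generalized-distillation loss in which an optical-flow "teacher" supervises a P-stream "student" during training only, and (b) late fusion of I-stream and P-stream class scores at inference. This is a minimal illustration under stated assumptions, not the authors' implementation: the loss follows the standard generalized-distillation form (a weighted mix of hard-label cross-entropy and cross-entropy against temperature-softened teacher outputs), and the function names and default values (`temperature=4.0`, `imitation=0.5`, `p_weight=0.5`) are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], pred: Sequence[float]) -> float:
    """Cross-entropy H(target, pred) between two probability vectors."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,   # hypothetical default
                      imitation: float = 0.5) -> float:  # hypothetical default
    """Generalized distillation loss (assumed form, not the paper's exact loss).

    student_logits: scores from the P-stream (motion vectors / residuals).
    teacher_logits: scores from an optical-flow network, used only in training.
    label: ground-truth class index.
    imitation: weight on the soft (teacher) term; 1 - imitation on the hard term.
    """
    one_hot = [1.0 if c == label else 0.0 for c in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft_targets = softmax(teacher_logits, temperature)
    soft = cross_entropy(soft_targets, softmax(student_logits, temperature))
    return (1.0 - imitation) * hard + imitation * soft


def late_fusion(i_logits: Sequence[float],
                p_logits: Sequence[float],
                p_weight: float = 0.5) -> List[float]:  # hypothetical weight
    """Weighted average of I-stream and P-stream class probabilities."""
    i_prob = softmax(i_logits)
    p_prob = softmax(p_logits)
    return [(1.0 - p_weight) * a + p_weight * b for a, b in zip(i_prob, p_prob)]


if __name__ == "__main__":
    teacher = [3.0, 0.0, 0.0]  # flow teacher favours class 0
    print(distillation_loss([3.0, 0.0, 0.0], teacher, label=0))
    print(distillation_loss([0.0, 3.0, 0.0], teacher, label=0))
    fused = late_fusion([2.0, 0.5, -1.0], [0.1, 1.5, -0.5])
    print(max(range(len(fused)), key=fused.__getitem__))
```

Because the teacher only appears in the training loss, inference needs just the I-stream and P-stream, which is consistent with the abstract's point that no separate optical-flow stream is required at test time. Some distillation variants also multiply the soft term by the squared temperature; that scaling is omitted here for simplicity.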

10/14/2017

Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

We investigate video classification via a two-stream convolutional neura...
10/22/2021

Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation

Several video-based 3D pose and shape estimation algorithms have been pr...
06/07/2022

TadML: A fast temporal action detection with Mechanics-MLP

Temporal Action Detection(TAD) is a crucial but challenging task in vide...
11/19/2019

Mimic The Raw Domain: Accelerating Action Recognition in the Compressed Domain

Video understanding usually requires expensive computation that prohibit...
06/19/2015

Crowd Flow Segmentation in Compressed Domain using CRF

Crowd flow segmentation is an important step in many video surveillance ...
01/06/2023

Triple-stream Deep Metric Learning of Great Ape Behavioural Actions

We propose the first metric learning system for the recognition of great...