Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

10/14/2017
by   Aaron Chadha, et al.
0

We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a., MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive to the best two-stream video classification approaches found in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature.

READ FULL TEXT

page 3

page 4

research
12/10/2019

Flow-Distilled IP Two-Stream Networks for Compressed Video ActionRecognition

Two-stream networks have achieved great success in video recognition. A ...
research
09/27/2018

Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks

Advanced video classification systems decode video frames to derive the ...
research
08/31/2016

Efficient Two-Stream Motion and Appearance 3D CNNs for Video Classification

The video and action classification have extremely evolved by deep neura...
research
04/26/2016

Real-time Action Recognition with Enhanced Motion Vector CNNs

The deep two-stream architecture exhibited excellent performance on vide...
research
05/09/2017

Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

Shot boundary detection (SBD) is an important pre-processing step for vi...
research
12/03/2018

Spatial-temporal Fusion Convolutional Neural Network for Simulated Driving Behavior Recognition

Abnormal driving behaviour is one of the leading cause of terrible traff...
research
03/26/2018

CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF

This paper addresses the problem of video object segmentation, where the...

Please sign up or login with your details

Forgot password? Click here to reset