DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

01/11/2019
by Zheng Shou, et al.

Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy, since motion vectors are noisy and have substantially reduced resolution, which makes them a less discriminative motion representation. To remedy these issues, we propose a lightweight generator network that reduces noise in motion vectors and captures fine motion details, yielding a more Discriminative Motion Cue (DMC) representation. Since optical flow is a more accurate motion representation, we train the DMC generator to approximate flow using a reconstruction loss and a generative adversarial loss, jointly with the downstream action classification task. Extensive evaluations on three action recognition benchmarks (HMDB-51, UCF-101, and a subset of Kinetics) confirm the effectiveness of our method. Our full system, consisting of the generator and the classifier, is named DMC-Net; it achieves accuracy close to that of using flow and runs two orders of magnitude faster than using optical flow at inference time.
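To make the joint training objective more concrete, here is a minimal sketch of how a classification loss, a reconstruction loss, and a generator-side adversarial loss might be combined into one scalar. The abstract does not give the loss forms or weights, so everything specific here is an assumption for illustration: mean squared error for reconstruction, softmax cross-entropy for classification, the non-saturating form for the adversarial term, and the hypothetical weights `w_rec` and `w_adv`.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length flat lists (assumed reconstruction loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def generator_adv_loss(disc_prob_fake):
    """Non-saturating generator loss -log D(G(x)); this form is an assumption."""
    return -math.log(disc_prob_fake)

def joint_loss(dmc, flow, logits, label, disc_prob_fake, w_rec=1.0, w_adv=0.1):
    """Weighted sum of the three terms; w_rec and w_adv are hypothetical weights."""
    return (
        cross_entropy(logits, label)
        + w_rec * mse(dmc, flow)
        + w_adv * generator_adv_loss(disc_prob_fake)
    )

# Toy values: a generated cue, a flow target, two-class logits, and the
# discriminator's probability that the generated cue is real flow.
loss = joint_loss(
    dmc=[0.1, -0.2, 0.4],
    flow=[0.0, -0.25, 0.5],
    logits=[2.0, 0.5],
    label=0,
    disc_prob_fake=0.3,
)
print(f"joint loss: {loss:.4f}")
```

In an actual training setup the three terms would be computed on tensors and backpropagated through the generator and classifier together; the sketch only shows how the terms add up.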


research
12/10/2019

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

Two-stream networks have achieved great success in video recognition. A ...
research
03/05/2021

Unsupervised Motion Representation Enhanced Network for Action Recognition

Learning reliable motion representation between consecutive frames, such...
research
04/26/2016

Real-time Action Recognition with Enhanced Motion Vector CNNs

The deep two-stream architecture exhibited excellent performance on vide...
research
08/08/2020

PAN: Towards Fast Action Recognition via Learning Persistence of Appearance

Efficiently modeling dynamic motion information in videos is crucial for...
research
06/19/2015

Crowd Flow Segmentation in Compressed Domain using CRF

Crowd flow segmentation is an important step in many video surveillance ...
research
01/25/2022

Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

Traditional media coding schemes typically encode image/video into a sem...
research
02/08/2021

Analysis of Latent-Space Motion for Collaborative Intelligence

When the input to a deep neural network (DNN) is a video signal, a seque...
