Predictive Coding Networks Meet Action Recognition

10/22/2019
by Xia Huang, et al.

Action recognition is a key problem in computer vision: it labels videos with one of a set of predefined actions. Capturing both semantic content and motion across the video frames is key to achieving high accuracy on this task. Most state-of-the-art methods rely on RGB frames to extract the semantics and on pre-computed optical flow fields as a motion cue, and then combine the two using deep neural networks. Yet it has been argued that such models do not actually leverage the motion information in the optical flow; instead, the optical flow allows for better recognition of people and objects in the video. This motivates exploring different cues or models that can extract motion in a more informative fashion. To tackle this issue, we propose to explore the predictive coding network known as PredNet, a recurrent neural network that propagates predictive coding errors across layers and time steps. We analyze whether PredNet can better capture motion in videos by predicting over time the representations extracted from networks pre-trained for action recognition. In this way, the model relies only on the video frames and does not need pre-processed optical flow as input. We report the effectiveness of our proposed model on the UCF101 and HMDB51 datasets.
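
To make the idea of propagating prediction errors over time more concrete, here is a minimal NumPy sketch. It is not the authors' model: it assumes a single layer, a plain tanh recurrent state in place of PredNet's convolutional LSTM, and random vectors standing in for the per-frame features of a pre-trained network. The function name `prediction_errors` and the weight matrices `W_pred`, `W_rec`, and `W_err` are hypothetical and untrained. The only part taken from PredNet is the form of the error signal, which concatenates the rectified positive and negative differences between the target and the prediction.

```python
import numpy as np


def relu(x):
    """Element-wise rectification."""
    return np.maximum(x, 0.0)


def prediction_errors(features, W_pred, W_rec, W_err):
    """Run a one-layer predictive-coding-style recurrence over a feature sequence.

    features: (T, D) array, one feature vector per frame (assumed to come
              from a pre-trained network; random in the demo below).
    Returns a (T, 2 * D) array of rectified prediction errors.
    """
    T, D = features.shape
    h = np.zeros(W_rec.shape[0])   # recurrent state
    e = np.zeros(2 * D)            # error from the previous time step
    errors = np.zeros((T, 2 * D))
    for t in range(T):
        # Update the state from the previous state and the previous error.
        h = np.tanh(W_rec @ h + W_err @ e)
        # Predict the current frame's features from the state.
        pred = W_pred @ h
        # Error units: positive and negative parts, each rectified.
        e = np.concatenate([relu(features[t] - pred), relu(pred - features[t])])
        errors[t] = e
    return errors


# Demo with random stand-ins for per-frame features (hypothetical sizes).
rng = np.random.default_rng(0)
T, D, H = 8, 16, 32
features = rng.standard_normal((T, D))
W_pred = 0.1 * rng.standard_normal((D, H))
W_rec = 0.1 * rng.standard_normal((H, H))
W_err = 0.1 * rng.standard_normal((H, 2 * D))

errors = prediction_errors(features, W_pred, W_rec, W_err)
print(errors.shape)
```

In this sketch the error sequence, rather than optical flow, is what would carry the temporal signal to a downstream classifier; how the paper actually trains the predictor and feeds it to the recognition model is not specified in the abstract.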

research
12/09/2016

ActionFlowNet: Learning Motion Representation for Action Recognition

Even with the recent advances in convolutional neural networks (CNN) in ...
research
07/20/2020

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

Motion plays a crucial role in understanding videos and most state-of-th...
research
01/25/2022

Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

Traditional media coding schemes typically encode image/video into a sem...
research
04/12/2017

Predictive-Corrective Networks for Action Detection

While deep feature learning has revolutionized techniques for static-ima...
research
06/21/2020

Motion Representation Using Residual Frames with 3D CNN

Recently, 3D convolutional networks (3D ConvNets) yield good performance...
research
08/08/2020

PAN: Towards Fast Action Recognition via Learning Persistence of Appearance

Efficiently modeling dynamic motion information in videos is crucial for...
research
12/30/2017

A Unified Method for First and Third Person Action Recognition

In this paper, a new video classification methodology is proposed which ...
