Deep Local Video Feature for Action Recognition

01/25/2017
by   Zhenzhong Lan, et al.
0

We investigate the problem of representing an entire video using CNN features for human action recognition. Currently, limited by GPU memory, we have not been able to feed a whole video into CNN/RNNs for end-to-end learning. A common practice is to use sampled frames as inputs and video labels as supervision. One major problem of this popular approach is that the local samples may not contain the information indicated by global labels. To deal with this problem, we propose to treat the deep networks trained on local inputs as local feature extractors. After extracting local features, we aggregate them into global features and train another mapping function on the same training data to map the global features into global labels. We study a set of problems regarding this new type of local features such as how to aggregate them into global features. Experimental results on HMDB51 and UCF101 datasets show that, for these new local features, a simple maximum pooling on the sparsely sampled features lead to significant performance improvement.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/11/2017

End-to-end Video-level Representation Learning for Action Recognition

From the frame/clip-level feature learning to the video-level representa...
research
06/11/2015

P-CNN: Pose-based CNN Features for Action Recognition

This work targets human action recognition in video. While recent method...
research
01/27/2018

Interactive Deep Colorization With Simultaneous Global and Local Inputs

Colorization methods using deep neural networks have become a recent tre...
research
12/17/2020

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

3D Convolutional Neural Network (3D CNN) captures spatial and temporal i...
research
02/19/2020

Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines

This paper proposes a simple yet effective method for human action recog...
research
03/26/2018

Video Representation Learning Using Discriminative Pooling

Popular deep models for action recognition in videos generate independen...
research
09/05/2019

Discriminative Video Representation Learning Using Support Vector Classifiers

Most popular deep models for action recognition in videos generate indep...

Please sign up or login with your details

Forgot password? Click here to reset