Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web

12/22/2015
by   Shugao Ma, et al.

Recently, attempts have been made to collect millions of videos to train CNN models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier, and training on images requires much less computation. In addition, labeled web images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. We explore whether web action images can be used to train better CNN models for action recognition in videos. We collect 23.8K manually filtered images from the Web that depict the 101 actions in the UCF101 action video dataset. We show that using web action images along with videos in training yields significant performance boosts for CNN models. We then investigate the scalability of the process by leveraging crawled (unfiltered) web images for UCF101 and ActivityNet. Replacing 16.2M video frames with 393K unfiltered images gives comparable performance.
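The abstract does not say how web images and video frames are combined during training. As a rough illustration only, the sketch below shows one way to build training batches that draw from both sources. The function name `mixed_batches`, the `image_fraction` parameter, and the toy sample IDs are all hypothetical and are not taken from the paper.

```python
import random


def mixed_batches(video_frames, web_images, batch_size=8, image_fraction=0.25, seed=0):
    """Yield batches that mix video frames with web action images.

    Each sample is a (sample_id, action_label) pair. `image_fraction` is a
    hypothetical knob for this sketch, not a value reported in the paper.
    """
    if not 0 <= image_fraction < 1:
        raise ValueError("image_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_images = round(batch_size * image_fraction)
    n_frames = batch_size - n_images
    frames = list(video_frames)
    rng.shuffle(frames)
    # One pass over the video frames; web images are resampled for each batch.
    for start in range(0, len(frames) - n_frames + 1, n_frames):
        batch = frames[start:start + n_frames] + rng.sample(web_images, n_images)
        rng.shuffle(batch)
        yield batch


# Toy stand-ins for real data: 12 video frames and 5 web images of one class.
frames = [(f"video_{i}", "Archery") for i in range(12)]
images = [(f"web_{i}", "Archery") for i in range(5)]
batches = list(mixed_batches(frames, images))
```

With these toy inputs, each batch holds six video frames and two web images. In a real pipeline the sample IDs would be replaced by decoded images fed to the CNN, and the mixing ratio would need to be tuned.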


research
03/22/2020

Ensembles of Deep Neural Networks for Action Recognition in Still Images

Despite the fact that notable improvements have been made recently in th...
research
08/03/2017

Attention Transfer from Web Images for Video Recognition

Training deep learning based video classifiers for action recognition re...
research
06/14/2017

Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

Webly-supervised learning has recently emerged as an alternative paradig...
research
04/04/2015

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

We address the problem of fine-grained action localization from temporal...
research
10/22/2019

Human Action Recognition in Drone Videos using a Few Aerial Training Examples

Drones are enabling new forms of human actions surveillance due to their...
research
06/06/2015

First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos

With the prevalence of the commodity depth cameras, the new paradigm of ...
research
12/23/2015

Convolutional Architecture Exploration for Action Recognition and Image Classification

Convolutional Architecture for Fast Feature Encoding (CAFFE) [11] is a s...
