SLAC: A Sparsely Labeled Dataset for Action Classification and Localization

12/26/2017
by Hang Zhao, et al.

This paper describes a procedure for creating large-scale video datasets for action classification and localization from unconstrained, realistic web data. The scalability of the proposed procedure is demonstrated by building a novel video benchmark, named SLAC (Sparsely Labeled ACtions), consisting of over 520K untrimmed videos and 1.75M clip annotations spanning 200 action categories. Using our proposed framework, annotating a clip takes merely 8.8 seconds on average. This represents a saving in labeling time of over 95% [...] localization of actions. Our approach dramatically reduces the amount of human labeling by automatically identifying hard clips, i.e., clips that contain coherent actions but lead to prediction disagreement between action classifiers. A human annotator can verify whether such a clip truly contains the hypothesized action in a handful of seconds, thus generating labels for highly informative samples at little cost. We show that our large-scale dataset can be used to effectively pre-train action recognition models, significantly improving final metrics on smaller-scale benchmarks after fine-tuning. On Kinetics, UCF-101 and HMDB-51, models pre-trained on SLAC outperform baselines trained from scratch by 2.0% [...] accuracy, respectively, when RGB input is used. Furthermore, we introduce a simple procedure that leverages the sparse labels in SLAC to pre-train action localization models. On THUMOS14 and ActivityNet-v1.3, our localization model improves the mAP of the baseline model by 8.6% [...]
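The hard-clip selection step can be illustrated with a minimal Python sketch. It assumes that each action classifier returns a probability vector over the action categories, that a hypothetical `actionness` score stands in for the "contains a coherent action" test, and that the hypothesized action shown to the annotator is the class with the highest mean probability across classifiers. `ClipScores`, `mine_hard_clips`, and the 0.5 threshold are illustrative names and values, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class ClipScores:
    """Hypothetical per-clip record (illustrative, not from the SLAC paper)."""
    clip_id: str
    actionness: float             # assumed score in [0, 1] that the clip shows a coherent action
    probs: List[Sequence[float]]  # one class-probability vector per classifier


def top1(p: Sequence[float]) -> int:
    """Index of the highest-probability class."""
    return max(range(len(p)), key=lambda k: p[k])


def mine_hard_clips(
    clips: Iterable[ClipScores], actionness_threshold: float = 0.5
) -> List[Tuple[str, int]]:
    """Return (clip_id, hypothesized_label) pairs for clips that pass the
    actionness filter but on which the classifiers' top-1 predictions disagree."""
    queue = []
    for c in clips:
        if c.actionness < actionness_threshold:
            continue  # unlikely to contain a coherent action: skip
        if len({top1(p) for p in c.probs}) < 2:
            continue  # all classifiers agree: low information, skip
        # Assumed hypothesis rule: the class with the highest mean probability.
        n = len(c.probs)
        mean = [sum(p[k] for p in c.probs) / n for k in range(len(c.probs[0]))]
        queue.append((c.clip_id, top1(mean)))
    return queue


clips = [
    ClipScores("a", 0.9, [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),    # classifiers agree
    ClipScores("b", 0.8, [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1]]),    # disagree -> queued
    ClipScores("c", 0.2, [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]),  # low actionness
]
print(mine_hard_clips(clips))  # [('b', 1)]
```

In this sketch, disagreement acts as a cheap proxy for informativeness: clips on which every classifier agrees are skipped, so the annotator's few seconds per clip go to the ambiguous cases, where a yes/no answer on the hypothesized action adds the most information.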

research
04/12/2018

STAIR Actions: A Video Dataset of Everyday Home Actions

A new large-scale video dataset for human action recognition, called STA...
research
04/04/2015

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

We address the problem of fine-grained action localization from temporal...
research
05/26/2023

CVB: A Video Dataset of Cattle Visual Behaviors

Existing image/video datasets for cattle behavior recognition are mostly...
research
11/01/2019

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

An event happening in the world is often made of different activities an...
research
09/29/2021

Grounding Predicates through Actions

Symbols representing abstract states such as "dish in dishwasher" or "cu...
research
01/10/2019

Cricket stroke extraction: Towards creation of a large-scale cricket actions dataset

In this paper, we deal with the problem of temporal action localization ...
research
09/08/2016

Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data

Action classification in still images has been a popular research topic ...