
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

by Serena Yeung et al.
Stanford University

Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
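The core setup the abstract describes can be illustrated with a minimal sketch: a recurrent model that consumes per-frame features and emits an independent sigmoid score for every action class at every frame, so a single frame can carry multiple dense labels. This is a simplified illustration, not the paper's actual MultiLSTM architecture (which adds extra temporal input and output connections); all sizes and names below are invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenseLabelLSTM:
    """Vanilla LSTM cell with a per-frame multi-label sigmoid head.

    Illustrative only: the paper's MultiLSTM extends this basic
    recurrence with multiple input/output temporal connections.
    """

    def __init__(self, input_dim, hidden_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)
        z = input_dim + hidden_dim
        # Stacked gate weights: input, forget, output, candidate.
        self.W = rng.normal(0, 0.1, size=(4 * hidden_dim, z))
        self.b = np.zeros(4 * hidden_dim)
        # Multi-label head: sigmoid per class, not a softmax,
        # so several actions can be active in the same frame.
        self.W_out = rng.normal(0, 0.1, size=(num_actions, hidden_dim))
        self.b_out = np.zeros(num_actions)
        self.hidden_dim = hidden_dim

    def forward(self, frames):
        """frames: (T, input_dim) -> (T, num_actions) probabilities."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        probs = []
        for x in frames:
            z = np.concatenate([x, h])
            gates = self.W @ z + self.b
            i = sigmoid(gates[:H])          # input gate
            f = sigmoid(gates[H:2 * H])     # forget gate
            o = sigmoid(gates[2 * H:3 * H]) # output gate
            g = np.tanh(gates[3 * H:])      # candidate cell state
            c = f * c + i * g
            h = o * np.tanh(c)
            # A label probability for every action, at every frame.
            probs.append(sigmoid(self.W_out @ h + self.b_out))
        return np.stack(probs)

model = DenseLabelLSTM(input_dim=8, hidden_dim=16, num_actions=5)
video = np.random.default_rng(1).normal(size=(10, 8))  # 10 frames of features
frame_probs = model.forward(video)  # shape (10, 5): dense labels per frame
```

Using independent sigmoids rather than a softmax is what permits the dense, overlapping labels the dataset requires: thresholding `frame_probs` per class yields a set of active actions at each frame rather than a single winner.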
