Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

11/01/2019
by Mathew Monfort, et al.

An event happening in the world is often made up of different activities and actions that can unfold simultaneously or sequentially within a few seconds. However, most large-scale datasets built to train models for action recognition provide a single label per video clip. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled, and they do not learn the full spectrum of information needed to comprehend different events more completely and eventually learn causality between them. To address this, we augmented the existing video dataset, Moments in Time (MiT), to include over two million action labels for over one million three-second videos. This multi-label dataset introduces novel challenges in how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long-tail multi-label learning and provide improved methods for visualizing and interpreting models trained for multi-label action detection.
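
The penalization problem described above can be illustrated with a toy comparison. The sketch below is not the paper's method: it contrasts a standard single-label softmax cross-entropy with a plain per-class sigmoid (binary) cross-entropy, and the class names, logits, and function names are all hypothetical values chosen for illustration. The long-tail adaptations mentioned in the abstract are not reproduced here.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Single-label loss: negative log softmax probability of one class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]

def binary_cross_entropy(logits, targets):
    """Multi-label loss: independent sigmoid cross-entropy per class, averaged."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))]
        total += max(z, 0.0) - t * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Hypothetical clip in which both "running" and "jumping" occur.
classes = ["running", "jumping", "cooking"]
logits_both = [4.0, 4.0, -4.0]   # model detects both actions present
logits_one = [4.0, -4.0, -4.0]   # model detects only "running"

# Single-label annotation: only "running" (index 0) is labeled.
single_both = softmax_cross_entropy(logits_both, 0)
single_one = softmax_cross_entropy(logits_one, 0)

# Multi-label annotation: "running" and "jumping" are both labeled.
multi_targets = [1, 1, 0]
multi_both = binary_cross_entropy(logits_both, multi_targets)
multi_one = binary_cross_entropy(logits_one, multi_targets)

print(f"single-label loss: both={single_both:.4f}, one={single_one:.4f}")
print(f"multi-label loss:  both={multi_both:.4f}, one={multi_one:.4f}")
```

Under the single-label annotation, the model that correctly detects both actions receives the higher loss, because probability assigned to the unlabeled "jumping" class counts against it. Under the multi-label annotation the ordering reverses, and the model that misses "jumping" is the one penalized.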

research
01/09/2018

Moments in Time Dataset: one million videos for event understanding

We present the Moments in Time Dataset, a large-scale human-annotated co...
research
04/12/2018

STAIR Actions: A Video Dataset of Everyday Home Actions

A new large-scale video dataset for human action recognition, called STA...
research
09/05/2017

Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks

Automatic analysis of the video is one of most complex problems in the f...
research
09/15/2017

Multi-Label Zero-Shot Human Action Recognition via Joint Latent Embedding

Human action recognition refers to automatic recognizing human actions f...
research
12/26/2017

SLAC: A Sparsely Labeled Dataset for Action Classification and Localization

This paper describes a procedure for the creation of large-scale video d...
research
05/21/2023

Prompt Learning for Action Recognition

We present a new general learning approach for action recognition, Promp...
research
07/31/2020

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

Understanding and interpreting human actions is a long-standing challeng...
