AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

05/23/2017
by   Chunhui Gu, et al.

This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 192 15-minute video clips, where actions are localized in space and time, resulting in 740k action labels, with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations, with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) the use of movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon current state-of-the-art methods and demonstrates better performance on the JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 16.2% mAP, underscoring the need for developing new approaches for video understanding.
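The annotation structure the abstract describes (per-person bounding boxes at keyframes, possibly multiple atomic-action labels per person, and person identities linked across consecutive segments) can be sketched as follows. The CSV field layout below is a hypothetical illustration, not the paper's released format:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical AVA-style annotation rows (field layout assumed for illustration):
# video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id
# Box coordinates are normalized to [0, 1]. A person may carry several action
# labels at the same timestamp, and person_id links boxes across segments.
raw = """\
clip_001,902,0.10,0.20,0.45,0.90,12,0
clip_001,902,0.10,0.20,0.45,0.90,17,0
clip_001,903,0.12,0.21,0.47,0.91,12,0
"""

# Group action labels per (video, timestamp, person): this surfaces the
# multi-label property, since one person can get several atomic actions at once.
labels = defaultdict(list)
for row in csv.reader(StringIO(raw)):
    video, ts, x1, y1, x2, y2, action, person = row
    labels[(video, int(ts), int(person))].append(int(action))

for key, actions in sorted(labels.items()):
    print(key, actions)
```

Grouping by person identity rather than by box is what makes the temporal linking usable: the same `person_id` reappearing at consecutive timestamps forms a short track of that person's atomic actions.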

