BABEL: Bodies, Action and Behavior with English Labels

06/17/2021
by Abhinanda R. Punnakkal, et al.

Understanding the semantics of human movement – the what, how and why of the movement – is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of action labels for about 43 hours of mocap sequences from AMASS. Action labels are at two levels of abstraction – sequence labels describe the overall action in the sequence, and frame labels describe all actions in every frame of the sequence. Each frame label is precisely aligned with the duration of the corresponding action in the mocap sequence, and multiple actions can overlap. There are over 28k sequence labels and 63k frame labels in BABEL, which belong to over 250 unique action categories. Labels from BABEL can be leveraged for tasks like action recognition, temporal action localization, and motion synthesis. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark of progress in 3D action recognition. The dataset, baseline method, and evaluation code are made available, and supported for academic research purposes, at https://babel.is.tue.mpg.de/.
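The two-level annotation scheme described above can be illustrated with a small sketch. This is a hypothetical data model, not the official BABEL file format or schema: the class and field names (`FrameLabel`, `SequenceAnnotation`, `actions_at`, etc.) are illustrative assumptions chosen to mirror the abstract's description of sequence labels, temporally aligned frame labels, and overlapping actions.

```python
# Hypothetical sketch of a BABEL-style annotation record; field names are
# illustrative and do NOT reflect the official BABEL data format.
from dataclasses import dataclass, field

@dataclass
class FrameLabel:
    action: str     # action category, e.g. "walk"
    start_t: float  # start time (seconds) within the mocap sequence
    end_t: float    # end time (seconds); label spans the action's duration

@dataclass
class SequenceAnnotation:
    seq_id: str                 # identifier of the AMASS mocap sequence
    sequence_labels: list       # overall action(s) for the whole sequence
    frame_labels: list = field(default_factory=list)

    def actions_at(self, t: float) -> set:
        """All actions active at time t; overlapping labels are allowed."""
        return {f.action for f in self.frame_labels
                if f.start_t <= t < f.end_t}

# Example: the subject walks throughout, and waves while still walking,
# so two frame labels overlap in time.
ann = SequenceAnnotation(
    seq_id="example_0001",
    sequence_labels=["walk", "wave"],
    frame_labels=[
        FrameLabel("walk", 0.0, 6.0),
        FrameLabel("wave", 3.0, 5.0),
    ],
)
print(ann.actions_at(4.0))  # both "walk" and "wave" are active at t=4s
```

Querying `actions_at` at different times shows how frame labels localize each action within the sequence, which is exactly what tasks like temporal action localization consume.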
