Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

03/28/2022
by Fadime Sener, et al.

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, recorded simultaneously from 8 static and 4 egocentric views. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation, and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.

research
04/24/2023

AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

We present AssemblyHands, a large-scale benchmark dataset with accurate ...
research
07/01/2020

The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose

The availability of a large labeled dataset is a key requirement for app...
research
04/12/2020

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

The goal of the YouMakeup VQA Challenge 2020 is to provide a common benc...
research
08/03/2017

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Understanding human activity and being able to explain it in detail surp...
research
04/18/2022

Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding

Understanding animals' behaviors is significant for a wide range of appl...
research
07/05/2022

MVP: Robust Multi-View Practice for Driving Action Localization

Distracted driving causes thousands of deaths per year, and how to apply...
research
04/08/2018

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

First-person vision is gaining interest as it offers a unique viewpoint ...