Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

04/08/2018
by Dima Damen, et al.

First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.2K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground truths based on these narrations. We describe our object, action, and anticipation challenges, and evaluate several baselines over two test splits: seen and unseen kitchens. Dataset and project page: http://epic-kitchens.github.io


