EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

by Ahmad Darkhalil, et al.

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions: for example, when an onion is peeled, diced and cooked, we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards:




EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset

Visual object tracking is a key component to many egocentric vision prob...

Implicit Motion Handling for Video Camouflaged Object Detection

We propose a new video camouflaged object detection (VCOD) framework tha...

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

We propose an end-to-end learning framework for segmenting generic objec...

STEP: Segmenting and Tracking Every Pixel

In this paper, we tackle video panoptic segmentation, a task that requir...

Reducing the Annotation Effort for Video Object Segmentation Datasets

For further progress in video object segmentation (VOS), larger, more di...

FusionSeg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos

We propose an end-to-end learning framework for segmenting generic objec...

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation

Interactive segmentation enables users to segment as needed by providing...