EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

09/26/2022
by Ahmad Darkhalil et al.

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which bring a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions. For example, when an onion is peeled, diced and cooked, we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. For scalability and quality, VISOR introduces an annotation pipeline that is partly AI-powered. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges: video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards, see http://epic-kitchens.github.io/VISOR
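One of the three challenges is video object segmentation. A common way VOS benchmarks score a predicted mask against a ground-truth mask is region similarity, i.e. the Jaccard index (intersection over union). The sketch below illustrates that metric only. It is not VISOR's official evaluation code, and the exact metrics VISOR uses are not stated in this abstract. It assumes each binary mask is a nested list of 0/1 values of identical shape; the names `jaccard` and `mean_jaccard` are hypothetical.

```python
from typing import List

# Hypothetical representation: a binary mask as rows of 0/1 pixel values.
Mask = List[List[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J = |pred & gt| / |pred | gt| for same-sized binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += bool(p) and bool(g)  # pixel is in both masks
            union += bool(p) or bool(g)   # pixel is in at least one mask
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_jaccard(preds: List[Mask], gts: List[Mask]) -> float:
    """Average J over paired frames (assumes equal, non-zero lengths)."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same non-zero number of predicted and ground-truth masks")
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 0, 0],
          [0, 1, 1]]
    print(jaccard(pred, gt))  # 2 shared pixels / 4 covered pixels = 0.5
```

Averaging per frame, as `mean_jaccard` does, is one simple aggregation choice. A real benchmark may instead average per object or per sequence, or combine J with a boundary measure.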
