HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

12/16/2021
by Ali Athar, et al.

Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across a video. This requires a large amount of densely annotated video data, which is costly to produce and largely redundant, since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles VOS by effectively leveraging annotated static images for understanding object appearance and scene context. We encode object instances and scene information from an image frame into robust high-level descriptors, which can then be used to re-segment those objects in different frames. As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks compared to existing methods trained without video annotations. Without any architectural modification, HODOR can also learn from video context around single annotated video frames by utilizing cyclic consistency, whereas other methods rely on dense, temporally consistent annotations.
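To make the "encode into descriptors, then re-segment" idea concrete, here is a toy sketch in plain Python. It is not the HODOR architecture, which this abstract does not specify and which uses learned networks. As a stand-in, the sketch builds each descriptor by averaging the feature vectors under a mask, and labels each pixel of a new frame by cosine similarity to those descriptors. The function names `pool_descriptor`, `cosine`, and `resegment`, and the tiny hand-written feature maps, are all hypothetical.

```python
import math

def pool_descriptor(features, mask):
    """Average the feature vectors of all pixels where mask is 1.

    Assumption: masked average pooling stands in for a learned encoder.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for i, v in enumerate(vec):
                    total[i] += v
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def resegment(features, descriptors):
    """Label each pixel with the index of its most similar descriptor."""
    return [
        [max(range(len(descriptors)), key=lambda k: cosine(vec, descriptors[k]))
         for vec in row]
        for row in features
    ]

# Annotated frame: 2x2 feature map with one object pixel at the top-left.
frame_a = [[[1.0, 0.0], [0.0, 1.0]],
           [[0.0, 1.0], [0.0, 1.0]]]
object_mask = [[1, 0], [0, 0]]
background_mask = [[0, 1], [1, 1]]

# Descriptor 0 summarizes the background (scene), descriptor 1 the object.
descriptors = [pool_descriptor(frame_a, background_mask),
               pool_descriptor(frame_a, object_mask)]

# A different frame in which the object has moved to the bottom-right.
frame_b = [[[0.1, 0.9], [0.2, 0.8]],
           [[0.1, 1.0], [0.9, 0.1]]]
labels = resegment(frame_b, descriptors)
```

In this sketch the descriptors depend only on a single annotated frame, which mirrors the abstract's point that static image annotations can suffice. The per-pixel matching step is a simplification chosen for brevity.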

research
07/21/2022

Region Aware Video Object Segmentation with Deep Motion Modeling

Current semi-supervised video object segmentation (VOS) methods usually ...
research
05/20/2019

Learning Video Representations from Correspondence Proposals

Correspondences between frames encode rich information about dynamic con...
research
07/29/2023

XMem++: Production-level Video Segmentation From Few Annotated Frames

Despite advancements in user-guided video segmentation, extracting compl...
research
10/04/2018

Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

Deep learning based object detectors require thousands of diversified bo...
research
03/03/2023

Unified Perception: Efficient Video Panoptic Segmentation with Minimal Annotation Costs

Depth-aware video panoptic segmentation is a promising approach to camer...
research
02/01/2021

ConvNets for Counting: Object Detection of Transient Phenomena in Steelpan Drums

We train an object detector built from convolutional neural networks to ...
research
12/30/2015

LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

Supervoxel segmentation has strong potential to be incorporated into ear...
