Self-supervised Video Object Segmentation

06/22/2020
by Fangrui Zhu, et al.

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking). We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching, which resolves the challenge caused by the disappearance and reappearance of objects; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatial-temporal discontinuities, e.g. occlusions, disocclusions, and fast motion; (iii) we explore the efficiency of self-supervised representation learning for dense tracking and, surprisingly, show that a powerful tracking model can be trained with as few as 100 raw video clips (equivalent to a duration of 11 minutes), indicating that low-level statistics are already effective for tracking tasks; (iv) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS, and surpass most methods trained with millions of manual segmentation annotations, further bridging the gap between self-supervised and supervised learning. Code is released to foster further research (https://github.com/fangruizhu/self_sup_semiVOS).
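Contribution (i) builds on memory-based label propagation, the mechanism shared by self-supervised dense trackers: each pixel in a new frame is labelled with a softmax-weighted average of the labels of the most similar pixels stored in memory. The abstract does not specify the exact memory design, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. The names `propagate_labels` and `MemoryBank`, the cosine affinity, the top-k restriction, the `temperature=0.07` default, and the "first frame plus a few recent frames" memory layout are all illustrative choices.

```python
from collections import deque

import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def propagate_labels(query_feat, memory_feats, memory_labels, temperature=0.07, top_k=5):
    """Hypothetical helper: copy labels from memory pixels to query pixels.

    query_feat:    (N, C) embeddings of the N pixels in the target frame.
    memory_feats:  (M, C) embeddings of the M pixels stored in memory.
    memory_labels: (M, K) soft or one-hot masks over K classes (incl. background).
    Returns an (N, K) array whose rows are convex combinations of memory labels.
    temperature and top_k are assumed values, not taken from the paper.
    """
    q = l2_normalize(query_feat)
    m = l2_normalize(memory_feats)
    affinity = (q @ m.T) / temperature  # (N, M) scaled cosine similarity
    k = min(top_k, affinity.shape[1])
    # Indices of the k most similar memory pixels per query pixel (unordered).
    idx = np.argpartition(-affinity, k - 1, axis=1)[:, :k]
    top = np.take_along_axis(affinity, idx, axis=1)  # (N, k)
    # Numerically stable softmax over the k retained neighbours only.
    weights = np.exp(top - top.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    neighbour_labels = memory_labels[idx]  # (N, k, K)
    return np.einsum("nk,nkc->nc", weights, neighbour_labels)


class MemoryBank:
    """Hypothetical memory: the annotated first frame plus a few recent frames.

    The first frame is never evicted; older predicted frames are dropped once
    more than max_recent of them have been added.
    """

    def __init__(self, max_recent=3):
        self.first = None
        self.recent = deque(maxlen=max_recent)

    def add(self, feats, labels):
        if self.first is None:
            self.first = (feats, labels)
        else:
            self.recent.append((feats, labels))

    def gather(self):
        entries = [self.first, *self.recent]
        feats = np.concatenate([f for f, _ in entries], axis=0)
        labels = np.concatenate([lab for _, lab in entries], axis=0)
        return feats, labels


# Toy usage: 4 first-frame pixels with 2-D features, classes {background, object}.
bank = MemoryBank(max_recent=2)
first_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
first_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
bank.add(first_feats, first_labels)

query = np.array([[0.95, 0.05], [0.05, 0.95]])
mem_f, mem_l = bank.gather()
pred = propagate_labels(query, mem_f, mem_l, top_k=2)
print(pred.argmax(axis=1))  # [0 1]
```

In this sketch, keeping the annotated first frame permanently means an object that leaves the view can still be matched against its original appearance when it reappears, while the bounded `deque` keeps the memory cost constant as the video grows.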
