In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

03/29/2022
by Xiao Pan, et al.

In this paper, we focus on the unsupervised Video Object Segmentation (VOS) task, which learns visual correspondence from unlabeled videos. Previous methods are mainly based on the contrastive learning paradigm, which optimizes at either the pixel level or the image level and shows unsatisfactory scalability. Image-level optimization learns pixel-wise information only implicitly and is therefore sub-optimal for such a dense prediction task, while pixel-level optimization ignores the high-level semantic scope needed to capture object deformation. To learn these two levels of information complementarily in a unified framework, we propose In-aNd-Out (INO) generative learning, which takes a purely generative perspective, captures both high-level and fine-grained semantics by leveraging the structural superiority of the Vision Transformer (ViT), and achieves better scalability. Specifically, the in-generative learning recovers the corrupted parts of an image by inferring its fine-grained semantic structure, while the out-generative learning captures high-level semantics by imagining the global information of an image given only random fragments. To better discover temporal information, we additionally enforce inter-frame consistency at both the feature level and the affinity-matrix level. Extensive experiments on DAVIS-2017 val and YouTube-VOS 2018 val show that our INO outperforms previous state-of-the-art methods by significant margins.
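To make two of the ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "corrupting" an image means hiding a random subset of ViT patch tokens, and that the inter-frame affinity matrix is a softmax over cosine similarities between the patch features of two frames. The helper names `random_patch_mask` and `affinity_matrix`, the mask ratio, and the temperature are all hypothetical choices made for illustration.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask that is True for patches to hide.

    Hypothetical helper: the source does not specify how patches are corrupted.
    """
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    hidden = rng.choice(num_patches, size=num_masked, replace=False)
    mask[hidden] = True
    return mask


def affinity_matrix(feats_a, feats_b, temperature=0.1):
    """Row-stochastic affinity between two frames' patch features.

    feats_a has shape (N, D) and feats_b has shape (M, D). Entry (i, j)
    is the softmax-normalised cosine similarity between patch i of the
    first frame and patch j of the second (an assumed formulation).
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


# Toy data standing in for ViT patch features of two adjacent frames.
rng = np.random.default_rng(0)
feats_t = rng.normal(size=(16, 8))
feats_t1 = feats_t + 0.01 * rng.normal(size=(16, 8))  # nearly identical frame

mask = random_patch_mask(num_patches=16, mask_ratio=0.5, rng=rng)
aff = affinity_matrix(feats_t, feats_t1)
```

In this toy setup each row of `aff` is a probability distribution over the patches of the next frame, so a consistency term at the affinity level could compare such matrices across frames; how the paper actually formulates that loss is not stated in the abstract.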


