UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

05/22/2023
by Zhenghao Zhang, et al.

Unsupervised video object segmentation has made significant progress in recent years, but manually annotating video mask datasets is expensive, which limits the diversity of the datasets available. The Segment Anything Model (SAM) has introduced a new prompt-driven paradigm for image segmentation, enabling a range of previously unexplored capabilities. In this paper, we propose UVOSAM, a novel paradigm that leverages SAM for unsupervised video object segmentation without requiring video mask labels. To address SAM's limitations in discovering instances and associating their identities across frames, we introduce a video salient object tracking network that automatically generates trajectories for prominent foreground objects. These trajectories then serve as prompts for SAM, which produces the video masks frame by frame. Our experimental results show that UVOSAM significantly outperforms current mask-supervised methods, suggesting that it can advance unsupervised video object segmentation while reducing the cost of manual annotation.
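The data flow described above, in which tracker trajectories are used as prompts for a frame-by-frame segmenter, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each trajectory is a per-frame bounding box keyed by an object ID, and `box_fill_segmenter`, `masks_from_trajectories`, and the overlap rule are hypothetical. The stub segmenter simply fills each box so the example runs without model weights; with the real `segment_anything` package, the segmenter would presumably call `SamPredictor.set_image` on the frame and then `SamPredictor.predict(box=...)` with the tracked box.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]           # (x0, y0, x1, y1); x1 and y1 are exclusive
Mask = List[List[bool]]                   # mask[y][x] is True for foreground pixels
IdMap = List[List[int]]                   # 0 = background, k > 0 = object identity k
Trajectories = Dict[int, Dict[int, Box]]  # object_id -> {frame_index: box} (assumed format)
Segmenter = Callable[[Tuple[int, int], Box], Mask]


def box_fill_segmenter(frame_hw: Tuple[int, int], box: Box) -> Mask:
    """Hypothetical stand-in for a box-prompted segmenter such as SAM.

    It marks every pixel inside the box as foreground, which is enough to
    exercise the data flow without any model.
    """
    h, w = frame_hw
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(w)] for y in range(h)]


def masks_from_trajectories(
    num_frames: int,
    frame_hw: Tuple[int, int],
    trajectories: Trajectories,
    segment: Segmenter,
) -> List[IdMap]:
    """Prompt the segmenter once per (frame, tracked object) and merge the masks.

    Identity comes from the tracker: every mask produced from object k's box
    is written into the ID map with label k, so no extra association step is
    performed here.
    """
    h, w = frame_hw
    id_maps: List[IdMap] = []
    for t in range(num_frames):
        id_map = [[0] * w for _ in range(h)]
        for obj_id in sorted(trajectories):
            box = trajectories[obj_id].get(t)
            if box is None:
                continue  # the tracker reports no box for this object in frame t
            mask = segment(frame_hw, box)
            for y in range(h):
                for x in range(w):
                    if mask[y][x]:
                        # Hypothetical overlap rule: higher IDs overwrite lower ones.
                        id_map[y][x] = obj_id
        id_maps.append(id_map)
    return id_maps


# Toy example: object 1 is tracked in frames 0 and 1, object 2 only in frame 1.
trajectories = {
    1: {0: (0, 0, 2, 2), 1: (1, 1, 3, 3)},
    2: {1: (3, 0, 4, 1)},
}
id_maps = masks_from_trajectories(2, (4, 4), trajectories, box_fill_segmenter)
for row in id_maps[1]:
    print(row)
# [0, 0, 0, 2]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

Keeping the segmenter behind a callable separates the two roles the abstract describes: the tracker supplies instance discovery and identity, while the promptable segmenter only turns each prompt into a mask.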
