OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

07/18/2023
by Dongming Wu, et al.

Referring video object segmentation (RVOS) aims to segment an object in a video according to a human instruction. Current state-of-the-art methods follow an offline pattern, in which each clip independently interacts with the text embedding for cross-modal understanding. These methods usually claim that the offline pattern is necessary for RVOS, yet they model only limited temporal association within each clip. In this work, we challenge that belief and propose a simple yet effective online model based on explicit query propagation, named OnlineRefer. Specifically, our approach leverages target cues that gather semantic information and a position prior to improve the accuracy and ease of referring predictions for the current frame. Furthermore, we generalize our online model into a semi-online framework to be compatible with video-based backbones. To show the effectiveness of our method, we evaluate it on four benchmarks, i.e., Refer-Youtube-VOS, Refer-DAVIS17, A2D-Sentences, and JHMDB-Sentences. Without bells and whistles, our OnlineRefer with a Swin-L backbone achieves 63.5 J&F on Refer-Youtube-VOS and 64.8 J&F on Refer-DAVIS17, outperforming all prior offline methods.
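The online pattern described in the abstract can be illustrated with a toy sketch of frame-by-frame query propagation. This is not the authors' implementation: `toy_decoder` is a hypothetical stand-in for a cross-modal decoder, features are plain lists of floats, and every name and blending weight is an assumption made for illustration only. The one point it tries to show is that the queries refined on one frame become the input queries for the next frame, rather than being re-initialized per clip.

```python
from typing import Callable, List

Vector = List[float]


def toy_decoder(queries: List[Vector], frame_feat: Vector, text_feat: Vector) -> List[Vector]:
    """Hypothetical stand-in for a cross-modal decoder (assumption, not the paper's model).

    Each query is blended with the current frame feature and the text feature.
    """
    return [
        [0.5 * q + 0.25 * f + 0.25 * t for q, f, t in zip(query, frame_feat, text_feat)]
        for query in queries
    ]


def run_online(
    frames: List[Vector],
    text_feat: Vector,
    init_queries: List[Vector],
    decoder: Callable[[List[Vector], Vector, Vector], List[Vector]] = toy_decoder,
) -> List[List[Vector]]:
    """Process frames one at a time, carrying the refined queries forward."""
    queries = init_queries
    per_frame_queries = []
    for frame_feat in frames:
        # Queries refined on the previous frame are the input for this frame.
        queries = decoder(queries, frame_feat, text_feat)
        per_frame_queries.append(queries)
    return per_frame_queries


if __name__ == "__main__":
    frames = [[1.0, 1.0], [0.0, 0.0]]  # toy per-frame features
    text = [0.0, 0.0]                  # toy text embedding
    init = [[0.0, 0.0]]                # a single initial query
    print(run_online(frames, text, init))
```

In this toy run, the second frame's output still carries information from the first frame, because the propagated query is the only state passed between frames.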

research
01/05/2023

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Video instance segmentation (VIS) aims at segmenting and tracking object...
research
01/03/2022

Language as Queries for Referring Video Object Segmentation

Referring video object segmentation (R-VOS) is an emerging cross-modal t...
research
06/09/2022

VITA: Video Instance Segmentation via Object Token Association

We introduce a novel paradigm for offline Video Instance Segmentation (V...
research
07/21/2022

In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely adva...
research
02/15/2023

Offline-to-Online Knowledge Distillation for Video Instance Segmentation

In this paper, we present offline-to-online knowledge distillation (OOKD...
research
09/05/2023

Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

Referring video object segmentation (RVOS), as a supervised learning tas...
research
10/23/2020

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

In this paper, we address several inadequacies of current video object s...
