InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

by Fei He, et al.

Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them using either additional tracking heads or complex instance matching algorithms. This explicit instance association approach increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast, and yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework can perform accurate instance association implicitly. Specifically, we generate frame-level object instances based on a set of instance query-proposal pairs propagated from previous frames. Each instance query-proposal pair is learned to bind with one specific object across frames through carefully designed strategies. When such a pair is used to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and elegantly take advantage of temporal cues in videos. To show the effectiveness of our method, InsPro, we evaluate it on two popular VIS benchmarks, i.e., YouTube-VIS 2019 and YouTube-VIS 2021. Without bells and whistles, InsPro with a ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks, respectively, outperforming all other online VIS methods.
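To make the idea of implicit association concrete, here is a toy Python sketch, not the authors' implementation. Every name in it (`QueryProposal`, `step`, `run_video`, `toy_refine`), the score threshold, and the rule of dropping low-scoring pairs are hypothetical illustrations. The learned decoder is replaced by a stand-in `refine` function, and the sketch does not model how new objects enter the video. What it shows is the bookkeeping: each output is keyed by the identity of the pair that produced it, so no separate matching step is needed between frames.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class QueryProposal:
    query: Tuple[float, ...]  # hypothetical instance-query embedding
    proposal: Box             # box proposal carried over from the previous frame
    track_id: int             # identity this pair stays bound to across frames


# Hypothetical stand-in for a learned decoder:
# (query, proposal, frame) -> (refined query, refined box, confidence score)
Refine = Callable[[Tuple[float, ...], Box, object], Tuple[Tuple[float, ...], Box, float]]


def step(pairs: List[QueryProposal], frame: object, refine: Refine,
         score_thresh: float = 0.5) -> Tuple[Dict[int, Box], List[QueryProposal]]:
    """Predict instances on one frame from the propagated pairs.

    Outputs are keyed by the producing pair's track_id, so association with
    earlier frames is implicit rather than computed by a matching algorithm.
    """
    outputs: Dict[int, Box] = {}
    next_pairs: List[QueryProposal] = []
    for pair in pairs:
        new_query, new_box, score = refine(pair.query, pair.proposal, frame)
        if score >= score_thresh:  # toy rule: drop pairs whose object seems gone
            outputs[pair.track_id] = new_box
            next_pairs.append(QueryProposal(new_query, new_box, pair.track_id))
    return outputs, next_pairs


def run_video(initial: List[QueryProposal], frames: List[object], refine: Refine,
              score_thresh: float = 0.5) -> List[Dict[int, Box]]:
    """Process frames online, feeding each frame's surviving pairs to the next."""
    pairs = initial
    results: List[Dict[int, Box]] = []
    for frame in frames:
        outputs, pairs = step(pairs, frame, refine, score_thresh)
        results.append(outputs)
    return results


# Usage with a toy "frame" that is just a horizontal shift dx.
def toy_refine(query: Tuple[float, ...], box: Box, dx: object):
    x1, y1, x2, y2 = box
    moved = (x1 + dx, y1, x2 + dx, y2)
    score = 1.0 if moved[0] < 10 else 0.0  # toy: an object "leaves" past x = 10
    return query, moved, score


init = [QueryProposal((0.0,), (0, 0, 2, 2), track_id=0),
        QueryProposal((1.0,), (5, 0, 7, 2), track_id=1)]
res = run_video(init, [1.0, 2.0, 3.0], toy_refine)
for t, frame_out in enumerate(res):
    print(t, frame_out)
```

In this toy run, track 1 moves past x = 10 on the third frame and its pair is released, while track 0 keeps the same identity on every frame without any cross-frame matching.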

