In Defense of Online Models for Video Instance Segmentation

07/21/2022
by Junfeng Wu, et al.

In recent years, video instance segmentation (VIS) has been advanced largely by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have an inherent advantage in handling long video sequences and ongoing videos, where offline models fail due to limited computational resources. Therefore, it would be highly desirable for online models to achieve performance comparable to, or even better than, that of offline models. By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association between frames, caused by the similar appearance of different instances in the feature space. Motivated by this observation, we propose an online framework based on contrastive learning that learns more discriminative instance embeddings for association and fully exploits historical information for stability. Despite its simplicity, our method outperforms all online and offline methods on three benchmarks. Specifically, we achieve 49.5 AP on YouTube-VIS 2019, a significant improvement of 13.2 AP and 2.1 AP over the prior online and offline state of the art, respectively. Moreover, we achieve 30.2 AP on OVIS, a more challenging dataset with significant crowding and occlusions, surpassing the prior state of the art by 14.8 AP. The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022). We hope that the simplicity and effectiveness of our method, as well as our insights into current methods, will shed light on the exploration of VIS models.
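The abstract does not specify the loss or the matching rule, so the following is only a rough, generic illustration of the two ideas it names, not the authors' implementation. It pairs an InfoNCE-style contrastive loss (which rewards an embedding for being closer to another view of the same instance than to other instances) with greedy cosine-similarity association against a memory of per-track embeddings averaged over past frames. The names `info_nce` and `OnlineTracker`, the temperature `tau`, the `momentum` update, the `threshold`, and the greedy matching are all hypothetical choices made for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    Low when the anchor is more similar to its positive (same instance in
    another frame) than to any negative (other instances).
    """
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


class OnlineTracker:
    """Hypothetical frame-by-frame associator with a history memory."""

    def __init__(self, momentum=0.8, threshold=0.5):
        self.memory = {}          # track id -> embedding averaged over past frames
        self.momentum = momentum  # weight kept from the stored history
        self.threshold = threshold
        self.next_id = 0

    def update(self, embeddings):
        """Assign a track id to each instance embedding of the current frame."""
        ids, used = [], set()
        for emb in embeddings:
            best_id, best_sim = None, self.threshold
            # Greedily pick the most similar unused track above the threshold.
            for tid, mem in self.memory.items():
                if tid in used:
                    continue
                sim = cosine(emb, mem)
                if sim > best_sim:
                    best_id, best_sim = tid, sim
            if best_id is None:
                # No sufficiently similar track: start a new one.
                best_id = self.next_id
                self.next_id += 1
                self.memory[best_id] = list(emb)
            else:
                # Blend the new embedding into the track's history.
                mem = self.memory[best_id]
                self.memory[best_id] = [
                    self.momentum * m + (1 - self.momentum) * e
                    for m, e in zip(mem, emb)
                ]
            used.add(best_id)
            ids.append(best_id)
        return ids


if __name__ == "__main__":
    tracker = OnlineTracker()
    print(tracker.update([[1.0, 0.0], [0.0, 1.0]]))  # two new tracks: [0, 1]
    print(tracker.update([[0.1, 0.9], [0.9, 0.1]]))  # same instances, swapped order: [1, 0]
    print(tracker.update([[-1.0, 0.0]]))             # dissimilar to both: new track [2]
```

Averaging each track's embedding over time is one simple way to use history so that a single ambiguous frame is less likely to flip an identity; an optimal assignment (e.g. Hungarian matching) could replace the greedy loop when many similar-looking instances appear at once.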
