Video Instance Segmentation in an Open-World

by Omkar Thawakar, et al.

Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only instances of categories seen during training are identified and spatio-temporally segmented at inference. The open-world formulation relaxes this closed-world, static-learning assumption as follows: (a) it first distinguishes a set of known categories and labels any unknown object as "unknown", and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, which introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism, based on a lightweight auxiliary network, aims to accurately delineate (unknown) objects from the background at the pixel level while also distinguishing among the known semantic classes. The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. We also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in the OW-VIS setting. Further, we evaluate our contributions in the standard fully supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the YouTube-VIS 2019 val set. Lastly, we show the generalizability of our contributions to the open-world object detection (OWOD) setting, where they outperform the best existing OWOD method in the literature. Code and models, along with the OW-VIS splits, are available at <>.
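The abstract does not give the STO module's equations, so the following is only a rough illustration of the general idea of turning foreground activations into "unknown" pseudo-labels. It is a mask-level sketch similar in spirit to the activation-based pseudo-labeling used by open-world detectors such as OW-DETR: candidate masks that do not overlap any known ground-truth object are ranked by their mean activation, and the top-k become "unknown" pseudo-labels. The activation map, the binary-mask format, the function names (`mask_iou`, `objectness`, `pseudo_label_unknowns`), `top_k`, and the 0.5 IoU threshold are all hypothetical assumptions. This is not OW-VISFormer's STO module, and it omits the contrastive loss entirely.

```python
from typing import List, Sequence

# Hypothetical binary mask format: 1 = pixel belongs to the instance, 0 = background.
Mask = List[List[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    return inter / union if union else 0.0


def objectness(activation: Sequence[Sequence[float]], mask: Mask) -> float:
    """Mean foreground activation inside a mask (0.0 for an empty mask)."""
    total, count = 0.0, 0
    for act_row, mask_row in zip(activation, mask):
        for act, m in zip(act_row, mask_row):
            if m:
                total += act
                count += 1
    return total / count if count else 0.0


def pseudo_label_unknowns(
    activation: Sequence[Sequence[float]],
    candidates: List[Mask],
    known_masks: List[Mask],
    top_k: int = 1,
    iou_thresh: float = 0.5,
) -> List[int]:
    """Return indices of candidate masks to pseudo-label as 'unknown'.

    Candidates overlapping any known ground-truth mask (IoU >= iou_thresh) are
    skipped; the remaining ones are ranked by objectness, highest first.
    """
    unmatched = [
        i
        for i, cand in enumerate(candidates)
        if all(mask_iou(cand, gt) < iou_thresh for gt in known_masks)
    ]
    unmatched.sort(key=lambda i: objectness(activation, candidates[i]), reverse=True)
    return unmatched[:top_k]


if __name__ == "__main__":
    # Toy 4x4 activation map: a known object top-left, a strongly activated
    # unlabeled region bottom-right, and weakly activated background bottom-left.
    activation = [
        [0.9, 0.9, 0.1, 0.1],
        [0.9, 0.9, 0.1, 0.1],
        [0.2, 0.2, 0.8, 0.8],
        [0.2, 0.2, 0.8, 0.8],
    ]
    known = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    cand_known = [row[:] for row in known]
    cand_bright = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    cand_dim = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
    print(pseudo_label_unknowns(activation, [cand_known, cand_bright, cand_dim], [known]))
    # prints [1]
```

In this toy example the candidate that duplicates the known object is discarded, and the brighter of the two remaining regions is chosen as the "unknown" pseudo-label. A real system would operate on learned feature maps across video frames rather than a hand-written grid.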


TCOVIS: Temporally Consistent Online Video Instance Segmentation

In recent years, significant progress has been made in video instance se...

ElC-OIS: Ellipsoidal Clustering for Open-World Instance Segmentation on LiDAR Data

Open-world Instance Segmentation (OIS) is a challenging task that aims t...

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) app...

Bayesian Semantic Instance Segmentation in Open Set World

This paper addresses the instance segmentation task in the open-set cond...

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

Accurate 3D mitochondria instance segmentation in electron microscopy (E...

Expanding Low-Density Latent Regions for Open-Set Object Detection

Modern object detectors have achieved impressive progress under the clos...

Contrastive Learning for Cross-Domain Open World Recognition

The ability to evolve is fundamental for any valuable autonomous agent w...