Towards Open-Vocabulary Video Instance Segmentation

04/04/2023
by Haochen Wang, et al.

Video Instance Segmentation (VIS) aims to segment and categorize objects in videos from a closed set of training categories, and therefore lacks the ability to generalize to novel categories in real-world videos. To address this limitation, we make three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS) that contains well-annotated objects from 1,212 diverse categories, exceeding the category count of existing datasets by more than one order of magnitude. Third, we propose an efficient Memory-Induced Vision-Language Transformer, MindVLT, the first to achieve Open-Vocabulary VIS in an end-to-end manner with near real-time inference speed. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of MindVLT on novel categories. We will release the dataset and code to facilitate future work.


