OpenVIS: Open-vocabulary Video Instance Segmentation

05/26/2023
by Pinxue Guo, et al.

We propose and study a new computer vision task named open-vocabulary video instance segmentation (OpenVIS), which aims to simultaneously segment, detect, and track arbitrary objects in a video according to corresponding text descriptions. Compared to conventional video instance segmentation, OpenVIS enables users to identify objects of desired categories, regardless of whether those categories were included in the training dataset. To achieve this goal, we propose a two-stage pipeline that first generates high-quality class-agnostic object masks and then predicts their categories with a pre-trained vision-language model (VLM). Specifically, we first employ a query-based mask proposal network to generate masks of all potential objects, where we replace the original class head with an instance head trained with a binary object loss, thereby enhancing the class-agnostic mask proposal ability. Then, we introduce a proposal post-processing approach that better adapts the proposals to the pre-trained VLM, avoiding distorted and unnatural proposal inputs. Meanwhile, to facilitate research on this new task, we also propose an evaluation benchmark that uses off-the-shelf datasets to comprehensively assess performance on the task. Experimentally, the proposed OpenVIS achieves a remarkable 148% improvement on BURST over fully supervised baselines, which were trained on all categories.
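The second stage of the pipeline described above, matching class-agnostic proposals to text categories, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each proposal and each category name has already been embedded by some pre-trained VLM (for example a CLIP-style model), and it uses a square crop as one simple, assumed way to avoid distorting a proposal when it is resized for the VLM. The function names `square_crop_box` and `classify_proposals` and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def square_crop_box(box, frame_w, frame_h):
    """Expand a tight (x0, y0, x1, y1) box to a square, clamped to the frame.

    A square crop keeps its aspect ratio when resized to a square VLM input.
    This is an assumed stand-in for the paper's proposal post-processing.
    """
    x0, y0, x1, y1 = box
    side = min(max(x1 - x0, y1 - y0), frame_w, frame_h)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Shift the square back inside the frame if it would cross a border.
    nx0 = min(max(cx - side / 2, 0), frame_w - side)
    ny0 = min(max(cy - side / 2, 0), frame_h - side)
    return (nx0, ny0, nx0 + side, ny0 + side)


def classify_proposals(proposal_embs, text_embs, names, temperature=0.01):
    """Label each proposal with the category whose text embedding is closest.

    Returns a list of (category_name, probability) pairs, where the
    probability is a softmax over temperature-scaled cosine similarities.
    """
    results = []
    for emb in proposal_embs:
        logits = [cosine(emb, t) / temperature for t in text_embs]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs[best]))
    return results


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real VLM features.
    names = ["cat", "dog"]
    text_embs = [[1.0, 0.0], [0.0, 1.0]]
    proposal_embs = [[0.9, 0.1], [0.2, 0.8]]
    print(classify_proposals(proposal_embs, text_embs, names))
    print(square_crop_box((10, 20, 50, 40), 100, 100))
```

Because the category list is supplied only at inference time as text, categories unseen during training can be queried without retraining the proposal network, which is the property the open-vocabulary setting relies on.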

research
04/04/2023

Towards Open-Vocabulary Video Instance Segmentation

Video Instance Segmentation (VIS) aims at segmenting and categorizing obj...
research
08/18/2022

Open-Vocabulary Panoptic Segmentation with MaskCLIP

In this paper, we tackle a new computer vision task, open-vocabulary pan...
research
06/23/2023

OpenMask3D: Open-Vocabulary 3D Instance Segmentation

We introduce the task of open-vocabulary 3D instance segmentation. Tradi...
research
12/23/2022

Learning to Detect and Segment for Open Vocabulary Object Detection

Open vocabulary object detection has been greatly advanced by the recent...
research
03/18/2021

SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation

Video instance segmentation (VIS) is a new and critical task in computer...
research
12/04/2019

EmbedMask: Embedding Coupling for One-stage Instance Segmentation

Current instance segmentation methods can be categorized into segmentati...
research
10/09/2022

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

Open-vocabulary semantic segmentation aims to segment an image into sema...
