Open-Vocabulary Panoptic Segmentation with MaskCLIP

08/18/2022
by   Zheng Ding, et al.
0

In this paper, we tackle a new computer vision task, open-vocabulary panoptic segmentation, that aims to perform panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions. We first build a baseline method without finetuning nor distillation to utilize the knowledge in the existing CLIP model. We then develop a new method, MaskCLIP, that is a Transformer-based approach using mask queries with the ViT-based CLIP backbone to perform semantic segmentation and object instance segmentation. Here we design a Relative Mask Attention (RMA) module to account for segmentations as additional tokens to the ViT CLIP model. MaskCLIP learns to efficiently and effectively utilize pre-trained dense/local CLIP features by avoiding the time-consuming operation to crop image patches and compute feature from an external CLIP image model. We obtain encouraging results for open-vocabulary panoptic segmentation and state-of-the-art results for open-vocabulary semantic segmentation on ADE20K and PASCAL datasets. We show qualitative illustration for MaskCLIP with custom categories.

READ FULL TEXT

page 7

page 8

page 9

research
06/23/2023

OpenMask3D: Open-Vocabulary 3D Instance Segmentation

We introduce the task of open-vocabulary 3D instance segmentation. Tradi...
research
10/09/2022

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

Open-vocabulary semantic segmentation aims to segment an image into sema...
research
05/23/2023

3D Open-vocabulary Segmentation with Foundation Models

Open-vocabulary segmentation of 3D scenes is a fundamental function of h...
research
09/22/2018

Artistic Instance-Aware Image Filtering by Convolutional Neural Networks

In the recent years, public use of artistic effects for editing and beau...
research
05/26/2023

OpenVIS: Open-vocabulary Video Instance Segmentation

We propose and study a new computer vision task named open-vocabulary vi...
research
01/22/2023

Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

In this paper, we consider the problem of open-vocabulary semantic segme...
research
09/01/2023

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Current 3D open-vocabulary scene understanding methods mostly utilize we...

Please sign up or login with your details

Forgot password? Click here to reset