Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

03/29/2023
by   Vibashan VS, et al.
0

Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) categories. To alleviate this problem, Open-Vocabulary (OV) methods leverage large-scale image-caption pairs and vision-language models to learn novel categories. In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-captions pairs. This difference between strong and weak supervision leads to overfitting on base categories, resulting in poor generalization towards novel categories. In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free OVIS pipeline. Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. The generated pseudo-mask annotations are then used to supervise an instance segmentation model, freeing the entire pipeline from any labour-expensive instance-level annotations and overfitting. Our extensive experiments show that our method trained with just pseudo-masks significantly improves the mAP scores on the MS-COCO dataset and OpenImages dataset compared to the recent state-of-the-art methods trained with manual masks. Codes and models are provided in https://vibashan.github.io/ovis-web/.

READ FULL TEXT

page 2

page 4

page 8

page 12

page 13

page 14

page 15

page 16

research
11/28/2017

Learning to Segment Every Thing

Existing methods for object instance segmentation require all training i...
research
03/25/2023

IFSeg: Image-free Semantic Segmentation via Vision-Language Model

Vision-language (VL) pre-training has recently gained much attention for...
research
09/01/2023

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Current 3D open-vocabulary scene understanding methods mostly utilize we...
research
01/02/2023

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

In this work, we focus on instance-level open vocabulary segmentation, i...
research
08/01/2023

Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

Open-world instance-level scene understanding aims to locate and recogni...
research
04/12/2022

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

Open-world instance segmentation is the task of grouping pixels into obj...
research
03/09/2023

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

Many top-down architectures for instance segmentation achieve significan...

Please sign up or login with your details

Forgot password? Click here to reset