Towards Universal Vision-language Omni-supervised Segmentation

03/12/2023
by Bowen Dong, et al.

Existing open-world universal segmentation approaches usually leverage CLIP and pre-computed proposal masks to treat open-world segmentation tasks as proposal classification. However, 1) these works cannot handle universal segmentation in an end-to-end manner, and 2) the limited scale of panoptic datasets restricts the open-world segmentation ability on thing classes. In this paper, we present Vision-Language Omni-Supervised Segmentation (VLOSS). VLOSS starts from a Mask2Former universal segmentation framework with a CLIP text encoder. To improve the open-world segmentation ability and segmentation accuracy, we incorporate omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pair data) into training. To improve training efficiency and fully exploit the omni-supervised data, we propose several techniques: an FPN-style encoder, a switchable training technique, and a positive classification loss. Thanks to end-to-end training with the proposed techniques, VLOSS can be applied to various open-world segmentation tasks without further adaptation. Experimental results on different open-world panoptic and instance segmentation benchmarks demonstrate the effectiveness of VLOSS. Notably, with fewer parameters, our VLOSS with a Swin-Tiny backbone surpasses MaskCLIP by ~2% in terms of mask AP on the LVIS v1 dataset.
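
As a rough illustration of how a text encoder can serve as an open-vocabulary classifier for mask queries, the sketch below scores each query embedding against class-name embeddings by cosine similarity and applies a softmax. This is a generic sketch, not the VLOSS implementation: the function names, the toy 3-dimensional embeddings, and the temperature of 0.07 are all assumptions made for illustration. In a real system the text embeddings would come from a CLIP text encoder and the query embeddings from the segmentation decoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_queries(query_embeddings, text_embeddings, temperature=0.07):
    """Return per-query class probabilities from cosine similarity to text embeddings.

    `temperature` is an assumed value chosen for illustration only.
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    results = []
    for query in query_embeddings:
        q = l2_normalize(query)
        logits = [sum(a * b for a, b in zip(q, t)) / temperature for t in texts]
        results.append(softmax(logits))
    return results


# Toy stand-ins (hypothetical): one embedding per class name, one per mask query.
CLASS_NAMES = ["cat", "dog", "sky"]
TEXT_EMBEDDINGS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
QUERY_EMBEDDINGS = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.8]]

probs = classify_queries(QUERY_EMBEDDINGS, TEXT_EMBEDDINGS)
predicted = [CLASS_NAMES[p.index(max(p))] for p in probs]
print(predicted)  # ['cat', 'sky']
```

Because the class set is defined only by the list of text embeddings, new categories can be added at inference time by embedding new names, without retraining a fixed classification layer.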

research
05/05/2021

QueryInst: Parallelly Supervised Mask Query for Instance Segmentation

Recently, query based object detection frameworks achieve comparable per...
research
05/03/2021

ISTR: End-to-End Instance Segmentation with Transformers

End-to-end paradigms significantly improve the accuracy of various deep-...
research
06/02/2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Observing the close relationship among panoptic, semantic and instance s...
research
03/21/2023

Detecting Everything in the Open World: Towards Universal Object Detection

In this paper, we formally address universal object detection, which aim...
research
08/04/2023

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Open-vocabulary segmentation is a challenging task requiring segmenting ...
research
03/09/2023

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

Many top-down architectures for instance segmentation achieve significan...
research
04/12/2022

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

In this paper, we study the challenging instance-wise vision-language ta...
