SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

11/27/2022
by Huaishao Luo, et al.

Recently, contrastive language-image pre-training, e.g., CLIP, has demonstrated promising results on various downstream tasks. The pre-trained model can capture rich visual concepts for images by learning from large-scale text-image data. However, transferring the learned visual knowledge to open-vocabulary semantic segmentation is still under-explored. In this paper, we propose a CLIP-based model named SegCLIP for open-vocabulary segmentation in an annotation-free manner. SegCLIP performs segmentation on top of ViT, and the main idea is to gather patches into semantic regions using learnable centers, trained on text-image pairs. The gathering operation dynamically captures semantic groups, which are then used to generate the final segmentation results. We further propose a reconstruction loss on masked patches and a superpixel-based KL loss with pseudo-labels to enhance the visual representation. Experimental results show that our model achieves comparable or superior segmentation accuracy on benchmarks including PASCAL VOC 2012 (+1.4) and PASCAL Context (+2.4). The code is available at https://github.com/ArrowLuo/SegCLIP.
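To make the gathering idea concrete, here is a minimal NumPy sketch. It is not the SegCLIP implementation: it assumes a hard argmax assignment of each patch to its most similar center, mean pooling within each region, and cosine similarity against text embeddings for labeling. The function names (`gather_patches`, `label_patches`), the shapes, and the random inputs are all hypothetical and stand in for trained ViT patch features, learned centers, and CLIP text embeddings.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gather_patches(patches, centers):
    """Group patch features around centers (illustrative simplification).

    patches: (N, D) patch features; centers: (K, D) center vectors.
    Returns region features (K, D) and the center index of each patch (N,).
    """
    scale = patches.shape[1] ** -0.5
    logits = centers @ patches.T * scale           # (K, N) center-patch similarity
    weights = softmax(logits, axis=0)              # each patch spreads mass over centers
    assignment = weights.argmax(axis=0)            # hard assignment, an assumption here
    one_hot = np.eye(centers.shape[0])[assignment].T   # (K, N)
    counts = one_hot.sum(axis=1, keepdims=True)
    regions = one_hot @ patches / np.maximum(counts, 1.0)  # mean of assigned patches
    return regions, assignment


def label_patches(regions, assignment, text_embeds):
    """Label each region by its closest text embedding, then map labels to patches."""
    def l2norm(x):
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-8)

    sim = l2norm(regions) @ l2norm(text_embeds).T  # (K, C) cosine similarity
    region_labels = sim.argmax(axis=1)             # (K,)
    return region_labels[assignment]               # (N,)


# Toy usage with random stand-ins for real features (14 x 14 patches, 8 centers, 3 classes).
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))
centers = rng.normal(size=(8, 64))
text_embeds = rng.normal(size=(3, 64))

regions, assignment = gather_patches(patches, centers)
seg = label_patches(regions, assignment, text_embeds).reshape(14, 14)
```

In this sketch `seg` is a patch-level label map; a full pipeline would upsample it to image resolution. In the model described above the centers are learned from text-image pairs, whereas here they are random placeholders.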

