ICPC: Instance-Conditioned Prompting with Contrastive Learning for Semantic Segmentation

08/14/2023
by Chaohui Yu, et al.

Modern supervised semantic segmentation methods are usually fine-tuned from supervised or self-supervised models pre-trained on ImageNet. Recent work shows that transferring knowledge from CLIP to semantic segmentation via prompt learning can achieve promising performance. The performance boost comes from feature enhancement through multimodal alignment, i.e., the dot product between vision and text embeddings. However, how to improve this multimodal alignment for better transfer performance in dense tasks remains underexplored. In this work, we focus on improving the quality of vision-text alignment from two aspects, prompt design and loss function, and present an instance-conditioned prompting with contrastive learning (ICPC) framework. First, compared with static prompt designs, we show that dynamic prompting conditioned on image content can use the text encoder more efficiently for complex dense tasks. Second, we propose an align-guided contrastive loss to refine the alignment of vision and text embeddings. We further propose lightweight multi-scale alignment for better performance. Extensive experiments on three large-scale datasets (ADE20K, COCO-Stuff10k, and ADE20K-Full) demonstrate that ICPC brings consistent improvements across diverse backbones. Taking ResNet-50 as an example, ICPC outperforms the state-of-the-art counterpart by 1.71 respectively.
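The multimodal alignment mentioned in the abstract, a dot product between vision and text embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the L2 normalization step, and the random inputs are all assumptions made for illustration, and the instance-conditioned prompting and align-guided contrastive loss are not modeled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (assumed preprocessing step)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def pixel_text_score_map(vision_emb, text_emb):
    """Dot product between every pixel embedding and every class text embedding.

    vision_emb: (H, W, C) array of per-pixel vision embeddings (hypothetical shape).
    text_emb:   (K, C) array with one text embedding per class (hypothetical shape).
    Returns an (H, W, K) array of cosine similarities.
    """
    v = l2_normalize(vision_emb)
    t = l2_normalize(text_emb)
    # Contract over the shared channel dimension C.
    return np.einsum("hwc,kc->hwk", v, t)


# Random stand-ins for encoder outputs; real embeddings would come from
# a vision encoder and a text encoder.
rng = np.random.default_rng(0)
vision_emb = rng.normal(size=(4, 4, 8))  # 4x4 feature map, 8 channels
text_emb = rng.normal(size=(3, 8))       # 3 classes

scores = pixel_text_score_map(vision_emb, text_emb)
pred = scores.argmax(axis=-1)  # per-pixel class with the highest alignment score
```

In this sketch the score map acts as a per-pixel, per-class similarity that a segmentation head could consume as an extra feature. How ICPC actually uses and refines the alignment is described in the full paper.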


