DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

12/02/2021
by Yongming Rao, et al.

Recent progress has shown that large-scale pre-training using contrastive image-text pairs can be a promising alternative for high-quality visual representation learning from natural language supervision. Benefiting from a broader source of supervision, this new paradigm exhibits impressive transferability to downstream classification tasks and datasets. However, the problem of transferring the knowledge learned from image-text pairs to more complex dense prediction tasks has barely been explored. In this work, we present a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from CLIP. Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models. By further using the contextual information from the image to prompt the language model, we enable our model to better exploit the pre-trained knowledge. Our method is model-agnostic and can be applied to arbitrary dense prediction systems and to various pre-trained visual backbones, including both CLIP models and ImageNet pre-trained models. Extensive experiments demonstrate the superior performance of our method on semantic segmentation, object detection, and instance segmentation tasks. Code is available at https://github.com/raoyongming/DenseCLIP
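The pixel-text matching idea can be illustrated with a small NumPy sketch. It assumes the dense visual features have already been projected into the shared image-text embedding space, and it computes one score map per class as the cosine similarity between every pixel embedding and every class text embedding. The `context_aware_text_update` function is a hypothetical, heavily simplified stand-in for context-aware prompting: a single-head cross-attention with no learned projections, where each text embedding attends to all pixels and receives a residual update scaled by a small `gamma` (the value used here is an assumption). All function names are illustrative and are not taken from the official repository.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-6):
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def pixel_text_score_map(pixel_feats, text_feats):
    """Cosine similarity between every pixel embedding and every class text embedding.

    pixel_feats: (H, W, C) dense visual embeddings, assumed already projected
                 into the shared image-text space.
    text_feats:  (K, C) one text embedding per class.
    Returns:     (K, H, W) score maps, one per class.
    """
    z = l2_normalize(pixel_feats)          # (H, W, C) unit-norm pixel embeddings
    t = l2_normalize(text_feats)           # (K, C) unit-norm text embeddings
    return np.einsum("hwc,kc->khw", z, t)  # dot products of unit vectors

def context_aware_text_update(text_feats, pixel_feats, gamma=1e-4):
    """Hypothetical simplification of visual-context prompting.

    Each text embedding attends to all pixel embeddings (single-head
    cross-attention, no learned projections), and the attended visual
    context is added back as a residual scaled by `gamma`.
    """
    H, W, C = pixel_feats.shape
    v = pixel_feats.reshape(H * W, C)            # (N, C) visual context tokens
    logits = text_feats @ v.T / np.sqrt(C)       # (K, N) scaled dot-product scores
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over pixels
    context = attn @ v                           # (K, C) attended visual context
    return text_feats + gamma * context          # residual update of text embeddings

# Usage on random features: 4x5 feature map, 8-dim embeddings, 3 classes.
rng = np.random.default_rng(0)
pixels = rng.standard_normal((4, 5, 8))
texts = rng.standard_normal((3, 8))
scores = pixel_text_score_map(pixels, context_aware_text_update(texts, pixels))
pred = scores.argmax(axis=0)  # (4, 5) per-pixel class index
```

Because both sides are L2-normalized, each score lies in roughly [-1, 1], and taking the argmax over the class axis already gives a crude per-pixel labeling. In a full dense prediction model, such score maps would instead serve as extra guidance for a trainable segmentation or detection head.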

03/03/2023

Unleashing Text-to-Image Diffusion Models for Visual Perception

Diffusion models (DMs) have become the new trend of generative models an...
04/03/2023

Probabilistic Prompt Learning for Dense Prediction

Recent progress in deterministic prompt learning has become a promising ...
01/17/2023

Masked Visual Reconstruction in Language Semantic Space

Both masked image modeling (MIM) and natural language supervision have f...
08/14/2023

ICPC: Instance-Conditioned Prompting with Contrastive Learning for Semantic Segmentation

Modern supervised semantic segmentation methods are usually finetuned ba...
08/22/2023

Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

Grounded on pre-trained language models (PLMs), dense retrieval has been...
06/08/2023

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

The advent of large pre-trained models has brought about a paradigm shif...
03/23/2023

CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

Learning dense visual representations without labels is an arduous task ...
