A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model

12/29/2021
by Mengde Xu, et al.

Recently, zero-shot image classification with vision-language pre-training has achieved impressive results: the model can classify arbitrary categories without seeing any additional annotated images of those categories. However, it remains unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation. In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, namely CLIP. This is difficult because semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP processes whole images. To remedy this discrepancy in processing granularity, we forgo the prevalent one-stage FCN-based framework and advocate a two-stage semantic segmentation framework: the first stage extracts generalizable mask proposals, and the second stage applies an image-based CLIP model to perform zero-shot classification on the masked image crops generated in the first stage. Our experimental results show that this simple framework surpasses the previous state of the art by a large margin: +29.5 hIoU on the Pascal VOC 2012 dataset and +8.9 hIoU on the COCO Stuff dataset. With its simplicity and strong performance, we hope this framework will serve as a baseline to facilitate future research.
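The second stage of the two-stage framework described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the stage-1 mask proposals and the CLIP-style image and text embeddings are taken as given inputs (no real encoder is called), the function names `masked_crop`, `classify_proposals`, and `assemble_segmentation` are hypothetical, the temperature of 100 is an assumed CLIP-style logit scale, and the MaskFormer-style step that merges per-mask class probabilities into a per-pixel label map is an assumed aggregation rule.

```python
import numpy as np


def masked_crop(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Blank out pixels outside `mask`, then crop to the mask's bounding box.

    image: (H, W, 3) array; mask: (H, W) boolean proposal from stage 1.
    The resulting crop is what would be fed to the (not shown) image encoder.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask proposal")
    out = np.where(mask[..., None], image, fill)  # keep only masked pixels
    return out[ys.min(): ys.max() + 1, xs.min(): xs.max() + 1]


def classify_proposals(crop_emb: np.ndarray, text_emb: np.ndarray,
                       logit_scale: float = 100.0) -> np.ndarray:
    """Zero-shot classify each masked crop against class-name text embeddings.

    crop_emb: (N, D) embeddings of the N masked crops (assumed to come from a
    CLIP-like image encoder); text_emb: (C, D) embeddings of class prompts.
    Returns (N, C) class probabilities.
    """
    img = crop_emb / np.linalg.norm(crop_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = logit_scale * img @ txt.T            # scaled cosine similarity
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def assemble_segmentation(masks: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Merge per-mask class probabilities into an (H, W) label map.

    masks: (N, H, W) binary or soft masks; probs: (N, C).
    Each pixel takes the class c maximizing sum_n probs[n, c] * masks[n, h, w]
    (assumed MaskFormer-style aggregation).
    """
    scores = np.einsum("nc,nhw->chw", probs, masks.astype(float))
    return scores.argmax(axis=0)
```

For example, with two orthogonal class embeddings and two proposals covering the left and right halves of a small image, each half receives the label whose text embedding is closest to its crop embedding. Pixels covered by no proposal score zero for every class and fall back to class 0 here; a fuller pipeline would need an explicit rule for such pixels.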

