Zero-Shot Semantic Segmentation with Decoupled One-Pass Network

04/03/2023
by Cong Han, et al.

Recently, the zero-shot semantic segmentation problem has attracted increasing attention, and the best-performing methods are based on two-stream networks: one stream generates proposal masks and the other classifies each segment using a pre-trained visual-language model. However, existing two-stream methods must pass a large number of image crops (up to a hundred) into the visual-language model, which is highly inefficient. To address this problem, we propose a network that needs only a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to focus spatially on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference. We release our code at https://github.com/CongHan0808/DeOP.git.
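The abstract contrasts feeding up to a hundred crops through the visual-language model with a single pass per image. Below is a minimal sketch of what one-pass segment classification can look like. It assumes dense patch embeddings from one forward pass of a CLIP-style encoder, mask-average pooling of those embeddings inside each proposal mask, and cosine similarity against class-name text embeddings. The function name `classify_masks_one_pass`, the pooling scheme and the `temperature` value are hypothetical illustrations. This is not the DeOP implementation, and it models neither patch severance nor classification anchor learning.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def classify_masks_one_pass(patch_embeds, masks, text_embeds, temperature=0.01):
    """Hypothetical one-pass classifier (illustration only, not the paper's code).

    patch_embeds: (H, W, D) dense features from ONE forward pass of the image encoder
    masks:        (N, H, W) binary or soft proposal masks at the same resolution
    text_embeds:  (C, D) class-name embeddings from the text encoder
    returns:      (N, C) class probabilities for each proposal mask
    """
    H, W, D = patch_embeds.shape
    feats = patch_embeds.reshape(H * W, D)                   # (H*W, D)
    m = masks.reshape(masks.shape[0], H * W).astype(np.float64)
    # Mask-average pooling: each row of `weights` sums to 1 over the masked patches.
    weights = m / (m.sum(axis=1, keepdims=True) + 1e-8)      # (N, H*W)
    segment_embeds = weights @ feats                         # (N, D)
    # Cosine similarity between each pooled segment and each class embedding.
    logits = l2_normalize(segment_embeds) @ l2_normalize(text_embeds).T  # (N, C)
    logits = logits / temperature
    # Numerically stable softmax over classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy example: the left half of the feature map "looks like" class 0,
    # the right half "looks like" class 1.
    H, W, D = 4, 4, 8
    basis = np.eye(D)
    patch_embeds = np.zeros((H, W, D))
    patch_embeds[:, :2] = basis[0]
    patch_embeds[:, 2:] = basis[1]
    masks = np.zeros((2, H, W))
    masks[0, :, :2] = 1.0
    masks[1, :, 2:] = 1.0
    text_embeds = basis[:3]  # three hypothetical class-name embeddings
    probs = classify_masks_one_pass(patch_embeds, masks, text_embeds)
    print(probs.argmax(axis=1))
```

In this sketch the image encoder runs once, and classifying N proposals costs a single pooling matrix multiply rather than N forward passes over cropped images. That is the kind of redundancy a one-pass design avoids.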


