Open-Vocabulary Image Segmentation

12/22/2021
by Golnaz Ghiasi, et al.

We design an open-vocabulary image segmentation model that organizes an image into meaningful regions indicated by arbitrary text. We observe that recent open-vocabulary models cannot localize visual concepts well, even though they recognize what is in an image. We argue that these models miss an important step, visual grouping, which organizes pixels into groups before visual-semantic alignments are learned. We propose OpenSeg to address this issue. First, it learns to propose segmentation masks for possible organizations of the image. Then it learns visual-semantic alignments by aligning each word in a caption to one or a few predicted masks. We find that mask representations are key to supporting learning from captions, which makes it possible to scale up the dataset and vocabulary sizes. Our work is the first to perform zero-shot transfer on holdout segmentation datasets. We set up two strong baselines by applying class activation maps to, or fine-tuning with pixel-wise labels, a pre-trained ALIGN model. OpenSeg outperforms these baselines by 3.4 mIoU on PASCAL-Context (459 classes) and 2.7 mIoU on ADE-20k (847 classes).
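
The idea of aligning each caption word to one or a few predicted masks can be illustrated with a small sketch. This is not the OpenSeg implementation: the function names, the use of cosine similarity, the softmax weighting, and the temperature value are all assumptions chosen for illustration, and the embeddings are toy vectors rather than model outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def align_words_to_masks(word_embs, mask_embs, temperature=0.1):
    """Hypothetical helper: soft weights of every word over every mask.

    A low temperature concentrates each word's weight on one or a few
    masks. The value 0.1 is an assumption, not taken from the paper.
    """
    weights = []
    for w in word_embs:
        sims = [cosine(w, m) / temperature for m in mask_embs]
        weights.append(softmax(sims))
    return weights

def caption_image_score(word_embs, mask_embs, temperature=0.1):
    """Hypothetical helper: mean over words of the weighted word-mask similarity."""
    weights = align_words_to_masks(word_embs, mask_embs, temperature)
    per_word = [
        sum(a * cosine(w, m) for a, m in zip(ws, mask_embs))
        for w, ws in zip(word_embs, weights)
    ]
    return sum(per_word) / len(per_word)

# Toy data: two caption words and three mask embeddings (all made up).
word_embs = [[1.0, 0.0], [0.0, 1.0]]
mask_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = align_words_to_masks(word_embs, mask_embs)
score = caption_image_score(word_embs, mask_embs)
```

In this toy setup, each word puts most of its weight on the mask whose embedding points in the same direction, and a caption-level score is obtained by averaging over words. A score of this kind could feed a contrastive loss between matching and non-matching image-caption pairs, though the exact loss used by OpenSeg is described in the full paper rather than in this abstract.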

research
06/03/2019

Zero-Shot Semantic Segmentation

Semantic segmentation models are limited in their ability to scale to la...
research
12/02/2021

DenseCLIP: Extract Free Dense Labels from CLIP

Contrastive Language-Image Pre-training (CLIP) has made a remarkable bre...
research
10/09/2022

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

Open-vocabulary semantic segmentation aims to segment an image into sema...
research
03/23/2023

Zero-guidance Segmentation Using Zero Segment Labels

CLIP has enabled new and exciting joint vision-language applications, on...
research
03/26/2017

Open Vocabulary Scene Parsing

Recognizing arbitrary objects in the wild has been a challenging problem...
research
02/14/2023

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

In this work, instead of directly predicting the pixel-level segmentatio...
research
08/22/2023

Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)

This work targets what we consider to be the foundational step for urban...
