Hierarchical Open-vocabulary Universal Image Segmentation

07/03/2023
by Xudong Wang, et al.

Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions. However, complex visual scenes can be naturally decomposed into simpler parts and abstracted at multiple levels of granularity, introducing inherent segmentation ambiguity. Unlike existing methods that typically sidestep this ambiguity and treat it as an external factor, our approach actively incorporates a hierarchical representation encompassing different semantic levels into the learning process. We propose a decoupled text-image fusion mechanism and representation learning modules for both "things" and "stuff". Additionally, we systematically examine the differences in textual and visual features between these two types of categories. Our resulting model, named HIPIE, tackles HIerarchical, oPen-vocabulary, and unIvErsal segmentation tasks within a unified framework. Benchmarked on over 40 datasets, e.g., ADE20K, COCO, Pascal-VOC Part, RefCOCO/RefCOCOg, ODinW and SeginW, HIPIE achieves state-of-the-art results at various levels of image comprehension, including semantic-level (e.g., semantic segmentation), instance-level (e.g., panoptic/referring segmentation and object detection), and part-level (e.g., part/subpart segmentation) tasks. Our code is released at https://github.com/berkeley-hipie/HIPIE.


