Position-Aware Contrastive Alignment for Referring Image Segmentation

12/27/2022
by Bo Chen, et al.

Referring image segmentation aims to segment the target object described by a given natural language expression. Referring expressions typically contain complex relationships between the target and its surrounding objects. The main challenge of this task is to understand the visual and linguistic content simultaneously and to locate the referred object accurately among all instances in the image. Currently, the most effective way to address this problem is to obtain aligned multi-modal features by computing the correlation between the visual and linguistic modalities under the supervision of the ground-truth mask. However, existing paradigms struggle to fully understand the visual and linguistic content because they cannot directly perceive information about the surrounding objects that the expression uses to refer to the target. This prevents them from learning well-aligned multi-modal features, which leads to inaccurate segmentation. To address this issue, we present a position-aware contrastive alignment network (PCAN) that enhances the alignment of multi-modal features by guiding the interaction between vision and language with prior position information. PCAN consists of two modules: 1) the Position Aware Module (PAM), which provides position information for all objects related to the natural language description, and 2) the Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment by comparing the features of the referred object with those of related objects. Extensive experiments on three benchmarks demonstrate that PCAN performs favorably against state-of-the-art methods. Our code will be made publicly available.
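The abstract does not say how PAM or CLUM are implemented. As a rough illustration of the general contrastive idea only, the Python sketch below assumes that position priors arrive as axis-aligned boxes (a hypothetical format), average-pools pixel features inside each box, and scores the pooled region features against a sentence embedding with a standard InfoNCE-style loss in which the referred object is the positive and the other related objects are the negatives. The names `pool_region`, `cosine`, `contrastive_alignment_loss`, the box layout and the temperature value are our own assumptions, not details of PCAN.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical position prior: (row0, col0, row1, col1), end-exclusive.
Box = Tuple[int, int, int, int]


def pool_region(feature_map: List[List[List[float]]], box: Box) -> List[float]:
    """Average-pool the C-dim pixel features that fall inside one box."""
    r0, c0, r1, c1 = box
    cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not cells:
        raise ValueError("box covers no pixels")
    dim = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def contrastive_alignment_loss(
    text_emb: Sequence[float],
    region_feats: Sequence[Sequence[float]],
    target_idx: int,
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss: -log softmax(similarity / T) at the referred object.

    The referred object (target_idx) is the positive; every other related
    object region acts as a negative. Uses log-sum-exp for stability.
    """
    logits = [cosine(text_emb, f) / temperature for f in region_feats]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[target_idx]


if __name__ == "__main__":
    # Toy 4x4 map with 2-dim features: left half [1, 0], right half [0, 1].
    fmap = [[[1.0, 0.0] if c < 2 else [0.0, 1.0] for c in range(4)] for r in range(4)]
    boxes = [(0, 0, 4, 2), (0, 2, 4, 4)]  # referred object, related object
    regions = [pool_region(fmap, b) for b in boxes]
    print(contrastive_alignment_loss([1.0, 0.0], regions, target_idx=0))
    print(contrastive_alignment_loss([0.0, 1.0], regions, target_idx=0))
```

In this toy setup, a sentence embedding that points toward the referred region yields a loss near zero, while one that matches the related object yields a large loss. Minimizing such a loss pushes the language feature toward the target region and away from the other objects the expression mentions, which is the kind of comparison the abstract attributes to CLUM.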
