Towards Robust Referring Image Segmentation

09/20/2022
by Jianzong Wu, et al.

Referring Image Segmentation (RIS), a fundamental vision-language task, aims to connect image and language by outputting the object mask that corresponds to a given text description. Although many works have made considerable progress on RIS, in this work we explore an essential question: "what if the text description is wrong or misleading?" We term such a sentence a negative sentence, and we find that existing works cannot handle this setting. To this end, we propose a novel formulation of RIS, named Robust Referring Image Segmentation (R-RIS), which considers negative sentence inputs in addition to the regularly given text inputs. We present three different datasets built by augmenting the inputs with negative sentences, and a new metric that unifies both input types. Furthermore, we design a new transformer-based model named RefSegformer, in which we introduce a token-based vision and language fusion module. This module can be easily extended to our R-RIS setting by adding extra blank tokens. Our proposed RefSegformer achieves new state-of-the-art results on three regular RIS datasets and three R-RIS datasets, and serves as a solid new baseline for further research. The project page is at <https://lxtgh.github.io/project/robust_ref_seg/>.
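
To make the R-RIS setting concrete, the following is a minimal sketch of how a single sample might be scored. It assumes that a negative sentence should yield an empty mask and that two empty masks count as a perfect match; neither assumption is stated in the abstract. The names `iou`, `empty_like`, and `robust_target` are hypothetical, and this is not the unified metric proposed in the paper, only an illustration of the input/output contract.

```python
from typing import List

Mask = List[List[int]]  # binary mask as nested lists, 1 = object pixel


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape.

    Assumption: when both masks are empty the score is 1.0, so that
    predicting nothing for a negative sentence is rewarded.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return 1.0 if union == 0 else inter / union


def empty_like(mask: Mask) -> Mask:
    """Return an all-zero mask with the same shape as `mask`."""
    return [[0 for _ in row] for row in mask]


def robust_target(gt_mask: Mask, is_negative: bool) -> Mask:
    """Target mask for one sample (hypothetical helper).

    Assumption: a negative (wrong or misleading) sentence refers to
    nothing in the image, so its target is the empty mask.
    """
    return empty_like(gt_mask) if is_negative else gt_mask


# Toy 2x2 example: the annotated object covers three pixels.
gt = [[0, 1], [1, 1]]
pred = [[0, 1], [1, 0]]

score_positive = iou(pred, robust_target(gt, is_negative=False))
score_negative_empty = iou(empty_like(gt), robust_target(gt, is_negative=True))
score_negative_wrong = iou(pred, robust_target(gt, is_negative=True))
```

Under these assumptions, a model that still segments an object for a negative sentence scores zero on that sample, while a model that returns an empty mask scores one, which is the behaviour the R-RIS formulation is meant to expose.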


research
08/30/2016

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

Image segmentation from referring expressions is a joint vision and lang...
research
12/21/2022

Generalized Decoding for Pixel, Image, and Language

We present X-Decoder, a generalized decoding model that can predict pixe...
research
08/26/2023

Beyond One-to-One: Rethinking the Referring Image Segmentation

Referring image segmentation aims to segment the target object referred ...
research
04/21/2021

Comprehensive Multi-Modal Interactions for Referring Image Segmentation

We investigate Referring Image Segmentation (RIS), which outputs a segme...
research
11/16/2017

Language-Based Image Editing with Recurrent Attentive Models

We investigate the problem of Language-Based Image Editing (LBIE) in thi...
research
05/24/2023

MMNet: Multi-Mask Network for Referring Image Segmentation

Referring image segmentation aims to segment an object referred to by na...
research
08/18/2023

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Referring image segmentation aims to segment an object mentioned in natu...
