MMNet: Multi-Mask Network for Referring Image Segmentation

05/24/2023
by Yichen Yan, et al.

Referring image segmentation aims to segment the object in an image that a natural language expression refers to. The task is challenging because text and image data have distinct properties, and because diverse objects and unrestricted language expressions introduce randomness. Most previous work focuses on improving cross-modal feature fusion and does not fully address the inherent uncertainty caused by diverse objects and unrestricted language. To tackle these problems, we propose an end-to-end Multi-Mask Network for referring image segmentation (MMNet). We first fuse image and language features and then employ an attention mechanism to generate multiple queries that represent different aspects of the language expression. We then use these queries to produce a series of corresponding segmentation masks, assigning each mask a score that reflects its importance. The final result is the weighted sum of all masks, which greatly reduces the randomness of the language expression. Our framework outperforms state-of-the-art approaches on three commonly used datasets, RefCOCO, RefCOCO+ and G-Ref, without any post-processing, which further validates its efficacy.
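The score-weighted fusion of masks described in the abstract can be illustrated with a minimal NumPy sketch. The abstract specifies only that each query yields a mask and a score and that the final result is a weighted sum. Everything else here is an assumption for illustration: the function names (`fuse_masks`, `multi_mask_sketch`), the dot-product mask head, the linear score projection `score_proj`, the softmax normalization of scores, and the threshold at zero. It is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_masks(mask_logits, scores):
    """Weighted sum of N candidate masks.

    mask_logits: (N, H, W) per-query mask logits.
    scores:      (N,) raw importance scores, one per mask.
    Softmax normalization of the scores is an assumption.
    """
    weights = softmax(scores)                            # (N,), sums to 1
    fused = np.tensordot(weights, mask_logits, axes=1)   # (H, W)
    return fused, weights

def multi_mask_sketch(pixel_feats, queries, score_proj):
    """Hypothetical end of the pipeline: queries -> masks -> scores -> fusion.

    pixel_feats: (H, W, C) fused image-language features.
    queries:     (N, C) query vectors from the attention step.
    score_proj:  (C,) assumed linear head that scores each query.
    """
    # One mask per query via dot product with every pixel feature (assumed head).
    mask_logits = np.einsum("nc,hwc->nhw", queries, pixel_feats)
    scores = queries @ score_proj
    fused, weights = fuse_masks(mask_logits, scores)
    return fused > 0.0, weights                          # binary mask, mask weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask, weights = multi_mask_sketch(
        rng.normal(size=(4, 5, 8)),   # H=4, W=5, C=8
        rng.normal(size=(3, 8)),      # N=3 queries
        rng.normal(size=8),
    )
    print(mask.shape, weights.sum())
```

Because the weights sum to one under this assumed normalization, a mask produced by an unreliable query contributes little to the fused output, which is one way to read the abstract's claim that the weighted sum reduces the effect of randomness in the expression.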
