Comprehensive Multi-Modal Interactions for Referring Image Segmentation

04/21/2021
by   Kanishk Jain, et al.

We investigate Referring Image Segmentation (RIS), the task of producing a segmentation map for the image region described by a given natural language expression. To solve RIS efficiently, we need to model three kinds of interactions: each word's relationship with the other words, each image region's relationship with the other regions, and the cross-modal alignment between the linguistic and visual domains. Recent methods model these three types of interactions sequentially. We argue that such a modular approach limits these methods' performance, and that reasoning over all three interactions jointly and simultaneously can help resolve ambiguities. To this end, we propose a Joint Reasoning Module (JRM) and a novel Cross-Modal Multi-Level Fusion (CMMLF) module for this task. JRM models the referent's multi-modal context by jointly reasoning over the visual and linguistic modalities, performing word-word, region-region, and word-region interactions in a single module. The CMMLF module further refines the segmentation masks by exchanging contextual information across the visual hierarchy, with linguistic features acting as a bridge. We present thorough ablation studies and validate our approach on four benchmark datasets, showing that the proposed method outperforms existing state-of-the-art methods on all four by significant margins.
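To make the idea of performing word-word, region-region, and word-region interactions in a single step more concrete, here is a minimal sketch. It is not the authors' JRM: it assumes plain scaled dot-product self-attention over the concatenation of region and word features, with no learned projections, and the function name `joint_attention`, the feature sizes, and the random inputs are all hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(regions, words):
    """Illustrative joint attention over both modalities (an assumption,
    not the paper's module).

    regions: (N, d) array of visual region features.
    words:   (T, d) array of word features.
    Returns the fused tokens (N+T, d) and the attention matrix (N+T, N+T).
    """
    # Put both modalities into one token sequence so every token can
    # attend to every other token in a single step.
    tokens = np.concatenate([regions, words], axis=0)
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    fused = attn @ tokens
    return fused, attn


# Hypothetical sizes: 6 image regions, 4 words, 8-dimensional features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 8))
words = rng.normal(size=(4, 8))

fused, attn = joint_attention(regions, words)

# The single attention matrix contains all interaction types as blocks.
n = regions.shape[0]
region_region = attn[:n, :n]  # region-to-region
region_word = attn[:n, n:]    # region-to-word (cross-modal)
word_region = attn[n:, :n]    # word-to-region (cross-modal)
word_word = attn[n:, n:]      # word-to-word
```

In this sketch each row of `attn` is normalized over regions and words together, so intra-modal and cross-modal evidence compete within one softmax instead of being resolved in separate sequential stages.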


research
04/09/2019

Cross-Modal Self-Attention Network for Referring Image Segmentation

We consider the problem of referring image segmentation. Given an input ...
research
06/26/2023

Mutual Query Network for Multi-Modal Product Image Segmentation

Product image segmentation is vital in e-commerce. Most existing methods...
research
01/16/2023

Learning Aligned Cross-modal Representations for Referring Image Segmentation

Referring image segmentation aims to segment the image region of interes...
research
06/16/2021

CMF: Cascaded Multi-model Fusion for Referring Image Segmentation

In this work, we address the task of referring image segmentation (RIS),...
research
03/23/2017

Recurrent Multimodal Interaction for Referring Image Segmentation

In this paper we are interested in the problem of image segmentation giv...
research
12/09/2020

Hateful Memes Detection via Complementary Visual and Linguistic Networks

Hateful memes are widespread in social media and convey negative informa...
research
09/20/2022

Towards Robust Referring Image Segmentation

Referring Image Segmentation (RIS) aims to connect image and language vi...
