CMF: Cascaded Multi-model Fusion for Referring Image Segmentation

06/16/2021
by Jianhua Yang, et al.

In this work, we address the task of referring image segmentation (RIS), which aims to predict a segmentation mask for the object described by a natural language expression. Most existing methods focus on establishing unidirectional or bidirectional relationships between visual and linguistic features to associate the two modalities, while multi-scale context is ignored or insufficiently modeled. Multi-scale context is crucial for localizing and segmenting objects with large scale variations during multi-modal fusion. To solve this problem, we propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel and further introduces a cascaded branch to fuse visual and linguistic features. The cascaded branch progressively integrates multi-scale contextual information and facilitates the alignment of the two modalities during fusion. Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods. Code is available at https://github.com/jianhua2022/CMF-Refseg.
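
To make the idea of parallel atrous layers with a cascaded branch concrete, here is a minimal pure-Python sketch on 1D feature lists. It is not the authors' implementation (see the repository linked above for that). The function names, the elementwise-product fusion, the way each branch's output is added to the next branch's input, and the summation across scales are all assumptions chosen for illustration; the abstract does not specify these details.

```python
from typing import List, Sequence


def atrous_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Zero-padded 1D atrous (dilated) convolution; output has the same length as x."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` positions apart
            if 0 <= j < len(x):        # positions outside the input count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def fuse(visual: Sequence[float], language: Sequence[float]) -> List[float]:
    """Hypothetical fusion: elementwise product of visual and linguistic features."""
    return [v * l for v, l in zip(visual, language)]


def cascaded_multiscale(x: Sequence[float], kernel: Sequence[float],
                        rates: Sequence[int]) -> List[float]:
    """Run one atrous branch per rate; each branch also receives the previous
    branch's output (the assumed 'cascade'), and all branch outputs are summed."""
    carried = [0.0] * len(x)
    branch_outputs = []
    for rate in rates:
        branch_in = [a + b for a, b in zip(x, carried)]
        carried = atrous_conv1d(branch_in, kernel, rate)
        branch_outputs.append(carried)
    return [sum(vals) for vals in zip(*branch_outputs)]


if __name__ == "__main__":
    visual = [1.0, 2.0, 3.0, 4.0, 5.0]
    language = [1.0, 0.5, 1.0, 0.5, 1.0]
    fused = fuse(visual, language)
    print(cascaded_multiscale(fused, kernel=[0.25, 0.5, 0.25], rates=[1, 2, 4]))
```

Larger rates widen the receptive field without adding kernel weights, which is why a set of rates can cover objects of different scales; the cascade lets coarse-scale branches build on what finer-scale branches have already aggregated.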

research
05/05/2021

Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation

Recently, referring image segmentation has aroused widespread interest. ...
research
12/27/2022

Position-Aware Contrastive Alignment for Referring Image Segmentation

Referring image segmentation aims to segment the target object described...
research
03/30/2022

Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation

Referring video segmentation aims to segment the corresponding video obj...
research
07/18/2021

Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos

Anticipating human actions is an important task that needs to be address...
research
06/26/2023

Mutual Query Network for Multi-Modal Product Image Segmentation

Product image segmentation is vital in e-commerce. Most existing methods...
research
03/11/2023

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Referring image segmentation segments an image from a language expressio...
research
04/21/2021

Comprehensive Multi-Modal Interactions for Referring Image Segmentation

We investigate Referring Image Segmentation (RIS), which outputs a segme...
