Dual Convolutional LSTM Network for Referring Image Segmentation

01/30/2020
by Linwei Ye, et al.

We consider referring image segmentation, a problem at the intersection of computer vision and natural language understanding. Given an input image and a referring expression in the form of a natural language sentence, the goal is to segment the object of interest in the image referred to by the linguistic query. To this end, we propose a dual convolutional LSTM (ConvLSTM) network. Our model consists of an encoder network and a decoder network, both of which use ConvLSTM to capture spatial and sequential information. The encoder network extracts visual and linguistic features for each word in the expression and adopts an attention mechanism to focus on the words that are more informative in the multimodal interaction. The decoder network takes the features generated by the encoder network at multiple levels as its input and produces the final precise segmentation mask. Experimental results on four challenging datasets demonstrate that the proposed network achieves superior segmentation performance compared with other state-of-the-art methods.
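The word-level attention mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so the code below assumes a generic softmax attention: each word gets a scalar relevance score, the scores are normalized into weights, and the per-word feature vectors are combined as a weighted sum. The function names `softmax` and `attend_over_words`, the flat feature vectors, and the way scores are supplied are all hypothetical choices made for illustration; in the proposed network the per-word features are spatial maps processed by ConvLSTM.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_words(word_features, word_scores):
    """Combine per-word feature vectors using softmax attention weights.

    word_features: one feature vector (list of floats) per word.
    word_scores:   one raw relevance score per word (assumed to be given;
                   how the paper computes them is not stated in the abstract).
    Returns (weights, pooled_feature).
    """
    if len(word_features) != len(word_scores):
        raise ValueError("need exactly one score per word")
    weights = softmax(word_scores)
    dim = len(word_features[0])
    pooled = [0.0] * dim
    for weight, feature in zip(weights, word_features):
        for k in range(dim):
            pooled[k] += weight * feature[k]
    return weights, pooled


if __name__ == "__main__":
    # Toy expression with three words and 2-dimensional features.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = [0.1, 2.0, 0.3]  # the second word is scored as most informative
    weights, pooled = attend_over_words(features, scores)
    print("weights:", weights)
    print("pooled feature:", pooled)
```

Words with higher scores contribute more to the pooled feature, which is the sense in which an attention mechanism "focuses" on informative words.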

research
03/23/2017

Recurrent Multimodal Interaction for Referring Image Segmentation

In this paper we are interested in the problem of image segmentation giv...
research
03/30/2021

Locate then Segment: A Strong Pipeline for Referring Image Segmentation

Referring image segmentation aims to segment the objects referred by a n...
research
05/24/2023

MMNet: Multi-Mask Network for Referring Image Segmentation

Referring image segmentation aims to segment an object referred to by na...
research
08/26/2023

Beyond One-to-One: Rethinking the Referring Image Segmentation

Referring image segmentation aims to segment the target object referred ...
research
03/20/2016

Segmentation from Natural Language Expressions

In this paper we approach the novel problem of segmenting an image based...
research
06/28/2021

Recurrent neural network transducer for Japanese and Chinese offline handwritten text recognition

In this paper, we propose an RNN-Transducer model for recognizing Japane...
research
06/05/2018

Mining for meaning: from vision to language through multiple networks consensus

Describing visual data into natural language is a very challenging task,...
