Recurrent Instance Segmentation using Sequences of Referring Expressions

11/05/2019
by   Alba Herrera-Palacio, et al.
16

The goal of this work is to segment the objects in an image that are referred to by a sequence of linguistic descriptions (referring expressions). We propose a deep neural network with recurrent layers that output a sequence of binary masks, one for each referring expression provided by the user. The recurrent layers in the architecture allow the model to condition each predicted mask on the previous ones, from a spatial perspective within the same image. Our multimodal approach uses off-the-shelf architectures to encode both the image and the referring expressions. The visual branch provides a tensor of pixel embeddings that are concatenated with the phrase embeddings produced by a language encoder. Our experiments on the RefCOCO dataset for still images indicate how the proposed architecture successfully exploits the sequences of referring expressions to solve a pixel-wise task of instance segmentation.

READ FULL TEXT

page 2

page 4

page 5

research
10/06/2020

Vec2Instance: Parameterization for Deep Instance Segmentation

Current advances in deep learning is leading to human-level accuracy in ...
research
01/26/2020

SDOD:Real-time Segmenting and Detecting 3D Objects by Depth

Most existing instance segmentation methods only focus on 2D objects and...
research
07/17/2023

Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions

In this study, we aim to develop a model that comprehends a natural lang...
research
12/12/2018

Weakly Supervised Instance Segmentation Using Hybrid Network

Weakly-supervised instance segmentation, which could greatly save labor ...
research
08/16/2020

InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving

Moving object segmentation is a crucial task for autonomous vehicles as ...
research
02/18/2019

PointIT: A Fast Tracking Framework Based on 3D Instance Segmentation

Recently most popular tracking frameworks focus on 2D image sequences. T...
research
09/10/2019

Multimodal Attention Branch Network for Perspective-Free Sentence Generation

In this paper, we address the automatic sentence generation of fetching ...

Please sign up or login with your details

Forgot password? Click here to reset