Dynamic Multimodal Instance Segmentation guided by natural language queries

07/06/2018
by Edgar A. Margffoy-Tuay, et al.

In this paper, we address the task of segmenting an object given a natural language expression that refers to it, i.e. a referring expression. Current techniques tackle this task either by (i) merging the linguistic and visual information along the channel dimension, directly or recursively, and then performing convolutions; or by (ii) mapping the expression to a space in which it can be treated as a filter whose response is directly related to the presence of the object at a given spatial coordinate in the image, so that a convolution can be applied to look for the object. We propose a novel method that combines both strategies to exploit the recursive nature of language, and that also, during the upsampling process, takes advantage of the intermediate information generated when downsampling the image, so that detailed segmentations can be obtained. Our method is compared with state-of-the-art approaches on four standard datasets, where it yields high performance and surpasses all previous methods in six of the eight standard dataset splits for this task. Code will be made available in the final version of this paper. The full implementation of our method and training routines, written in PyTorch, can be found at <https://github.com/andfoy/query-objseg>
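To make the two existing strategies concrete, here is a minimal, framework-free sketch on a toy feature map. It is not the authors' architecture: the function names, the 2x2 feature map, and the hand-written query vector are all hypothetical, and in a real model both the visual features and the query-derived filter would come from learned networks. Strategy (i) appends the query embedding to the channels of every pixel (a convolution would follow); strategy (ii) treats the query as a 1x1 filter and evaluates its response at every spatial location.

```python
import math


def tile_and_concat(visual, query):
    """Strategy (i): append the query embedding to every pixel's channel vector.

    `visual` is a nested list of shape H x W x C and `query` is a list of
    length D; the result has shape H x W x (C + D). A real model would then
    apply learned convolutions to this fused map.
    """
    return [[pixel + query for pixel in row] for row in visual]


def dynamic_filter_response(visual, query_filter):
    """Strategy (ii): use the query as a 1x1 filter over the feature map.

    The response at each location is the sigmoid of the dot product between
    the pixel's channel vector and the filter, so higher values mean the
    location matches the query more strongly.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        [sigmoid(sum(v * w for v, w in zip(pixel, query_filter))) for pixel in row]
        for row in visual
    ]


# Hypothetical 2x2 feature map with 3 channels per location.
visual = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [2.0, 0.0, 0.0]],
]
# Hypothetical query vector, standing in for an encoded referring expression.
query = [1.0, -1.0, 0.0]

fused = tile_and_concat(visual, query)             # shape 2 x 2 x 6
response = dynamic_filter_response(visual, query)  # shape 2 x 2
```

In this toy setup the strongest response falls on the bottom-right location, whose features align best with the query vector. The method described in the abstract combines both ideas and additionally reuses intermediate downsampling features during upsampling, which this sketch does not attempt to model.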


research
04/04/2019

VQD: Visual Query Detection in Natural Scenes

We propose Visual Query Detection (VQD), a new visual grounding task. In...
research
03/20/2016

Segmentation from Natural Language Expressions

In this paper we approach the novel problem of segmenting an image based...
research
10/10/2019

Referring Expression Object Segmentation with Caption-Aware Consistency

Referring expressions are natural language descriptions that identify a ...
research
05/04/2020

Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

Visual referring expression recognition is a challenging task that requi...
research
08/18/2023

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Referring image segmentation aims to segment an object mentioned in natu...
research
05/30/2018

Visual Referring Expression Recognition: What Do Systems Actually Learn?

We present an empirical analysis of the state-of-the-art systems for ref...
research
08/16/2023

InsightMapper: A Closer Look at Inner-instance Information for Vectorized High-Definition Mapping

Vectorized high-definition (HD) maps contain detailed information about ...
