MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension

03/18/2020
by Shuai Wang, et al.

Referring expression comprehension (REC) aims to localize the image region described by a natural-language referring expression. Existing methods focus on building convincing visual and language representations independently, which may significantly isolate visual and language information. In this paper, we argue that in REC the referring expression and the target region are semantically correlated, and that consistency in subject, location and relationship exists between vision and language. Building on this, we propose a novel approach called MutAtt that constructs mutual guidance between vision and language, treating the two modalities equally and thus yielding compact information matching. Specifically, for each of the subject, location and relationship modules, MutAtt builds two kinds of attention-based mutual guidance strategies. One strategy generates a vision-guided language embedding to match the relevant visual feature. The other, conversely, generates a language-guided visual feature to match the relevant language embedding. This mutual guidance strategy effectively guarantees vision-language consistency in the three modules. Experiments on three popular REC datasets demonstrate that the proposed approach outperforms the current state-of-the-art methods.
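The two guidance directions described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: dot-product attention, mean-pooled summary vectors as the guiding signal, cosine similarity as the matching score, and the hypothetical names `attend` and `mutual_score`. It is not the authors' implementation, and it shows a single module rather than the three (subject, location, relationship) used by MutAtt.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def attend(guide, items):
    """Pool `items` with weights softmax(item . guide) (assumed attention form)."""
    weights = softmax([dot(item, guide) for item in items])
    pooled = [sum(w * item[k] for w, item in zip(weights, items))
              for k in range(len(items[0]))]
    return pooled, weights


def mutual_score(word_vecs, region_cells):
    """Illustrative matching score for one candidate region.

    word_vecs:    list of word embeddings for the expression
    region_cells: list of visual feature vectors for the candidate region
    """
    visual_summary = mean(region_cells)
    language_summary = mean(word_vecs)
    # Vision-guided language embedding, matched against the visual feature.
    lang_emb, _ = attend(visual_summary, word_vecs)
    # Language-guided visual feature, matched against the language embedding.
    vis_feat, _ = attend(language_summary, region_cells)
    return cosine(lang_emb, visual_summary) + cosine(vis_feat, language_summary)


# Toy usage: pick the candidate region with the highest score.
words = [[1.0, 0.0], [0.8, 0.2]]
candidates = {
    "region_a": [[1.0, 0.0], [0.9, 0.1]],
    "region_b": [[0.0, 1.0], [0.1, 0.9]],
}
best = max(candidates, key=lambda name: mutual_score(words, candidates[name]))
```

In this toy setup both directions contribute equally to the score, which mirrors the abstract's point that vision and language are treated symmetrically; a real model would learn the projections and attention parameters rather than use raw dot products.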

research
01/24/2018

MAttNet: Modular Attention Network for Referring Expression Comprehension

In this paper, we address referring expression comprehension: localizing...
research
12/12/2018

Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

The task in referring expression comprehension is to localise the object...
research
04/21/2022

Referring Expression Comprehension via Cross-Level Multi-Modal Fusion

As an important and challenging problem in vision-language tasks, referr...
research
12/09/2018

Real-Time Referring Expression Comprehension by Single-Stage Grounding Network

In this paper, we propose a novel end-to-end model, namely Single-Stage ...
research
10/10/2019

Referring Expression Object Segmentation with Caption-Aware Consistency

Referring expressions are natural language descriptions that identify a ...
research
07/19/2020

Referring Expression Comprehension: A Survey of Methods and Datasets

Referring expression comprehension (REC) aims to localize a target objec...
research
10/24/2022

Towards Unifying Reference Expression Generation and Comprehension

Reference Expression Generation (REG) and Comprehension (REC) are two hi...
