One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

07/31/2022
by Zhipeng Zhang, et al.

Referring Expression Comprehension (REC) is one of the most important tasks in visual reasoning; it requires a model to detect the target object referred to by a natural language expression. Among the proposed pipelines, one-stage Referring Expression Comprehension (OSREC) has become the dominant trend because it merges the region proposal and selection stages. Many state-of-the-art OSREC models adopt a multi-hop reasoning strategy, since a single expression frequently mentions a sequence of objects and multi-hop reasoning is needed to analyze the semantic relations among them. However, one unsolved issue with these models is that the number of reasoning steps must be pre-defined and fixed before inference, ignoring the varying complexity of expressions. In this paper, we propose a Dynamic Multi-step Reasoning Network, which allows the number of reasoning steps to be adjusted dynamically based on the reasoning state and expression complexity. Specifically, we adopt a Transformer module to memorize and process the reasoning state, and a Reinforcement Learning strategy to dynamically infer the reasoning steps. The work achieves state-of-the-art performance or significant improvements on several REC datasets, ranging from RefCOCO (+, g) with short expressions, to Ref-Reasoning, a dataset with long and complex compositional expressions.
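To make the idea of a dynamically chosen number of reasoning steps concrete, here is a minimal sketch of such a loop. It is not the paper's implementation: the function `dynamic_reasoning`, the callables `update` and `stop_score`, and the parameters `max_steps` and `threshold` are all hypothetical names introduced for illustration. In the abstract, the state is handled by a Transformer module and the stopping behaviour is learned with Reinforcement Learning; this sketch replaces both with plain functions and a fixed sigmoid threshold.

```python
import math
from typing import Callable, List, Tuple


def dynamic_reasoning(
    state: List[float],
    update: Callable[[List[float], int], List[float]],
    stop_score: Callable[[List[float]], float],
    max_steps: int = 10,
    threshold: float = 0.5,
) -> Tuple[List[float], int]:
    """Run reasoning hops until the stop rule fires or max_steps is reached.

    `update` stands in for the module that refreshes the reasoning state;
    `stop_score` stands in for a learned stopping policy (both hypothetical).
    """
    steps = 0
    while steps < max_steps:
        state = update(state, steps)  # one reasoning hop
        steps += 1
        # Squash the raw score into a stop probability with a sigmoid.
        p_stop = 1.0 / (1.0 + math.exp(-stop_score(state)))
        if p_stop > threshold:  # greedy decision, no sampling
            break
    return state, steps


def toy_update(state: List[float], step: int) -> List[float]:
    # Toy state update: every hop adds one unit of "evidence".
    return [x + 1.0 for x in state]


def toy_stop_score(state: List[float]) -> float:
    # Toy stop rule: score turns positive once enough evidence accumulates.
    return sum(state) - 2.5


final_state, used_steps = dynamic_reasoning([0.0], toy_update, toy_stop_score)
```

With these toy functions the loop halts as soon as the accumulated evidence pushes the stop probability above the threshold, so simple inputs use few hops and harder ones use more, up to the `max_steps` cap. A fixed-step model corresponds to dropping the stop check and always running `max_steps` hops.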

research
09/18/2019

Dynamic Graph Attention for Referring Expression Comprehension

Referring expression comprehension aims to locate the object instance de...
research
03/01/2020

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Referring expression comprehension (REF) aims at identifying a particula...
research
12/07/2019

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

Referring Expression Comprehension (REC) is an emerging research spot in...
research
01/03/2019

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

Referring object detection and referring image segmentation are importan...
research
06/06/2023

Referring Expression Comprehension Using Language Adaptive Inference

Different from universal object detection, referring expression comprehe...
research
02/17/2023

CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension

The task of multimodal referring expression comprehension (REC), aiming ...
research
07/19/2020

Referring Expression Comprehension: A Survey of Methods and Datasets

Referring expression comprehension (REC) aims to localize a target objec...
