Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

03/01/2020
by Zhenfang Chen, et al.

Referring expression comprehension (REF) aims to identify a particular object in a scene from a natural language expression, which requires joint reasoning over the textual and visual domains. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of models, mainly because 1) their expressions typically describe only simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine that renders various reasoning logics, which can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting that adds distracting images containing objects that share similar properties with the referent, thus minimising the success rate of reasoning-free cross-domain alignment. We evaluate several state-of-the-art REF models and find that none of them achieves promising performance. A proposed modular hard mining strategy performs best but still leaves substantial room for improvement. We hope this new dataset and task can serve as a benchmark for deeper visual reasoning analysis and foster research on referring expression comprehension.
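The two features described in the abstract can be illustrated with a toy sketch. This is not the authors' expression engine or test protocol: the `SceneObject` type, the "that is ..." template, the category-only matching rule, and the image-pool layout are all assumptions made purely for illustration. The sketch shows how chaining more relation hops gives a more compositional expression, and how distracting images might be chosen because they contain an object resembling the referent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object record: a category plus visual attributes."""
    name: str                # object category, e.g. "apple"
    attributes: tuple = ()   # visual properties, e.g. ("red",)


def describe(obj):
    """Render an object as 'attr1 attr2 name'."""
    return " ".join([*obj.attributes, obj.name])


def compose_expression(target, chain):
    """Build an expression from a target and a chain of (relation, object) hops.

    Each extra hop adds one reasoning step, so a longer chain yields a
    more compositional expression. The template is an assumption.
    """
    phrase = "the " + describe(target)
    for relation, other in chain:
        phrase += f" that is {relation} the {describe(other)}"
    return phrase


def pick_distractors(target, image_pool, k=2):
    """Return ids of up to k images that contain an object of the target's category.

    image_pool maps an image id to a list of SceneObject (assumed layout).
    Matching on category alone is a simplification of "similar properties".
    """
    hits = [
        image_id
        for image_id, objects in image_pool.items()
        if any(o.name == target.name for o in objects)
    ]
    return hits[:k]


apple = SceneObject("apple", ("red",))
table = SceneObject("table", ("wooden",))
pool = {
    "img_a": [SceneObject("apple", ("green",))],
    "img_b": [SceneObject("dog")],
    "img_c": [SceneObject("apple", ("red",)), SceneObject("bowl")],
}

print(compose_expression(apple, [("on", table)]))
print(pick_distractors(apple, pool))
```

In this sketch a model that only matches the word "apple" to image regions cannot tell the referent from the apples in the distracting images; it has to follow the relation to the wooden table as well.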

