Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge

06/02/2020
by Peng Wang, et al.

Conventional referring expression comprehension (REF) assumes that people query something in an image by describing its visual appearance and spatial location, but in practice we often ask for an object by describing its affordance or other non-visual attributes, especially when we do not have a precise target in mind. For example, we sometimes say 'Give me something to eat'. In this case, we need commonsense knowledge to identify the right objects in the image. Unfortunately, there is no existing referring expression dataset that reflects this requirement, not to mention a model that tackles this challenge. In this paper, we collect a new referring expression dataset, called KB-Ref, containing 43k expressions on 16k images. In KB-Ref, answering each expression (i.e., detecting the target object referred to by the expression) requires at least one piece of commonsense knowledge. We then test state-of-the-art (SoTA) REF models on KB-Ref and find that all of them show a large performance drop compared to their outstanding results on general REF datasets. We also present an expression conditioned image and fact attention (ECIFA) network that extracts information from correlated image regions and commonsense knowledge facts. Our method leads to a significant improvement over SoTA REF models, although there is still a gap between this strong baseline and human performance. The dataset and baseline models will be released.
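The abstract describes ECIFA only at a high level. As a rough illustration of what attending over commonsense facts conditioned on an expression can look like, here is a minimal Python sketch. It is not the authors' ECIFA implementation: the function names (`attend`, `pick_referent`), the dot-product relevance scores, the additive fusion of expression and fact features, and the toy 2-D embeddings are all hypothetical assumptions chosen for readability.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, items: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Soft attention: weight each item (e.g. a fact embedding) by its
    dot-product relevance to the expression embedding and return
    (weights, weighted-sum summary vector)."""
    weights = softmax([dot(query, v) for v in items])
    summary = [sum(w * v[i] for w, v in zip(weights, items)) for i in range(len(query))]
    return weights, summary


def pick_referent(expression: Vector, facts: Sequence[Vector], regions: Sequence[Vector]) -> int:
    """Return the index of the candidate region that best matches the
    expression after enriching it with attended fact features.
    HYPOTHETICAL fusion: the fact summary is simply added to the expression."""
    _, fact_summary = attend(expression, facts)
    fused = [e + f for e, f in zip(expression, fact_summary)]
    scores = [dot(fused, r) for r in regions]
    return max(range(len(regions)), key=lambda i: scores[i])


if __name__ == "__main__":
    expression = [1.0, 0.0]             # toy embedding of "Give me something to eat"
    facts = [[2.0, 0.0], [0.0, 2.0]]    # toy embeddings of "banana is edible", "chair is for sitting"
    regions = [[1.0, 0.1], [0.1, 1.0]]  # toy embeddings of a banana region and a chair region
    weights, _ = attend(expression, facts)
    print([round(w, 3) for w in weights])             # [0.881, 0.119]
    print(pick_referent(expression, facts, regions))  # 0
```

In this toy setup the "edible" fact receives most of the attention weight, which pushes the score of the banana region above the chair region even though the expression never names the object. A real model would learn the embeddings and the fusion, and would also attend over image regions rather than scoring them with a single dot product.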
