Come Again? Re-Query in Referring Expression Comprehension

10/19/2021
by Stephan J. Lemmer, et al.

To build a shared perception of the world, humans rely on the ability to resolve misunderstandings by requesting and accepting clarifications. However, when evaluating visiolinguistic models, metrics such as accuracy enforce the assumption that a decision must be made based on a single piece of evidence. In this work, we relax this assumption for the task of referring expression comprehension by allowing the model to request help when its confidence is low. We consider two ways in which this help can be provided: multimodal re-query, where the user is allowed to point or click to provide additional information to the model, and rephrase re-query, where the user is only allowed to provide another referring expression. We demonstrate the importance of re-query by showing that providing the best referring expression for all objects can increase accuracy by up to 21.9% [...] re-querying only 12% [...] re-query functions for both multimodal and rephrase re-query across three modern approaches and demonstrate combined replacement for rephrase re-query, which improves average single-query performance by up to 6.5% [...] as close as 1.6% [...]
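The idea of requesting help when confidence is low can be illustrated with a minimal Python sketch of rephrase re-query. This is not the authors' implementation: the `Model` interface, the `comprehend_with_requery` function, the confidence threshold, and the fallback to the most confident prediction seen so far are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface (an assumption, not from the paper): a comprehension
# model maps a referring expression to (predicted object index, confidence).
Model = Callable[[str], Tuple[int, float]]


def comprehend_with_requery(
    model: Model,
    expressions: Sequence[str],
    threshold: float = 0.5,
) -> Tuple[int, int]:
    """Return (prediction, number of queries used).

    Expressions are tried in order. The first prediction whose confidence
    reaches `threshold` is accepted. If no prediction is confident enough,
    the most confident prediction seen is returned (an assumed fallback).
    """
    if not expressions:
        raise ValueError("at least one referring expression is required")

    best_pred: Optional[int] = None
    best_conf = float("-inf")
    queries = 0

    for expr in expressions:
        pred, conf = model(expr)
        queries += 1
        # Track the most confident prediction so far.
        if conf > best_conf:
            best_pred, best_conf = pred, conf
        # Confident enough: stop asking the user to rephrase.
        if conf >= threshold:
            break

    assert best_pred is not None
    return best_pred, queries
```

In this sketch, each additional entry in `expressions` stands in for a rephrased expression supplied by the user after a re-query. A multimodal re-query would instead replace the second call with a pointing or clicking signal, which is not modeled here.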

