Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding

01/19/2019
by Hexiang Hu, et al.

Providing systems the ability to relate linguistic and visual content is one of the hallmarks of computer vision. Tasks such as image captioning and retrieval were designed to test this ability, but they come with complex evaluation measures that gauge various other abilities and biases simultaneously. This paper presents an alternative evaluation task for visual-grounding systems: given a caption, the system is asked to select the image that best matches the caption from a pair of semantically similar images. A system's accuracy on this Binary Image SelectiON (BISON) task is not only interpretable but also measures its ability to relate fine-grained text content in the caption to visual content in the images. We gathered a BISON dataset that complements the COCO Captions dataset and used it in auxiliary evaluations of captioning and caption-based retrieval systems. While captioning measures suggest that visual-grounding systems outperform humans, BISON shows that these systems are still far from human performance.
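Because each BISON item is a binary choice between two images, the task accuracy is simply the fraction of items where the system picks the matching image, and a random guesser is expected to score 50%. The following is a minimal Python sketch of that evaluation loop under stated assumptions: the `BisonExample` record, the `score` callable, and the toy tag-set image representation are hypothetical stand-ins, not the authors' code or the dataset's actual format. A real system would supply its own caption-image compatibility score.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set, Tuple

# Hypothetical record for one BISON item (not the official dataset format).
@dataclass
class BisonExample:
    caption: str
    images: Tuple[object, object]  # two semantically similar images, in any form the scorer accepts
    true_index: int  # 0 or 1: which image actually matches the caption

def bison_select(score: Callable[[str, object], float], caption: str,
                 images: Tuple[object, object]) -> int:
    """Return the index of the image the system judges to match the caption best."""
    s0, s1 = score(caption, images[0]), score(caption, images[1])
    # Ties are broken toward the first image (an arbitrary choice for this sketch).
    return 0 if s0 >= s1 else 1

def bison_accuracy(score: Callable[[str, object], float],
                   examples: Sequence[BisonExample]) -> float:
    """Fraction of items where the selected image is the matching one."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(bison_select(score, ex.caption, ex.images) == ex.true_index
                  for ex in examples)
    return correct / len(examples)

# Toy, hypothetical scorer: images are represented as sets of content tags,
# and the score is the number of caption words that appear among the tags.
def overlap_score(caption: str, image_tags: Set[str]) -> float:
    return float(len(set(caption.lower().split()) & image_tags))

examples = [
    BisonExample("a dog on a red couch", ({"dog", "blue", "couch"}, {"dog", "red", "couch"}), 1),
    BisonExample("a man riding a horse", ({"man", "riding", "horse"}, {"man", "horse"}), 0),
]
print(bison_accuracy(overlap_score, examples))  # 1.0
```

In the toy data the two images in each pair differ by a single fine-grained detail ("red" vs. "blue", "riding" present or absent), which mirrors the abstract's point that BISON probes whether fine-grained caption content is grounded in the image.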

research
09/26/2020

Neural Twins Talk

Inspired by how the human brain employs more neural pathways when increa...
research
09/04/2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

This paper presents a new metric called TIGEr for the automatic evaluati...
research
02/23/2023

HL Dataset: Grounding High-Level Linguistic Concepts in Vision

Current captioning datasets focus on object-centric captions, describin...
research
12/06/2019

Connecting Vision and Language with Localized Narratives

We propose Localized Narratives, an efficient way to collect image capti...
research
03/24/2020

TextCaps: a Dataset for Image Captioning with Reading Comprehension

Image descriptions can help visually impaired people to quickly understa...
research
09/22/2019

Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators

Grounding language to visual relations is critical to various language-a...
research
11/07/2021

Machine-in-the-Loop Rewriting for Creative Image Captioning

Machine-in-the-loop writing aims to enable humans to collaborate with mo...