Evaluating Pragmatic Abilities of Image Captioners on A3DS

05/22/2023
by Polina Tsvilodub, et al.

Evaluating the performance of grounded neural language models with respect to pragmatic qualities, such as the trade-off between truthfulness, contrastivity and overinformativity of generated utterances, remains a challenge in the absence of data collected from humans. To enable such evaluation, we present a novel open-source image-text dataset, "Annotated 3D Shapes" (A3DS), comprising over nine million exhaustive natural language annotations and over 12 million variable-granularity captions for the 480,000 images provided by Burgess & Kim (2018). We showcase the evaluation of pragmatic abilities developed by a task-neutral image captioner fine-tuned in a multi-agent communication setting to produce contrastive captions. The dataset enables this evaluation because its exhaustive annotations make it possible to quantify the presence of contrastive features in the model's generations. We show that the model develops human-like patterns: informativity, brevity, and overinformativity for specific features (e.g., shape and color biases).
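To make the idea of quantifying contrastive features more concrete, here is a minimal sketch in Python. It is not the authors' evaluation code: the dictionary representation of an annotation, the feature names, and the helper functions `contrastive_features` and `mentioned_features` are all assumptions made for illustration, and the token-matching rule is deliberately naive.

```python
from typing import Dict, Set

# Hypothetical representation: each image is annotated with feature -> value.
# Feature names and values here are illustrative, not the A3DS schema.
Features = Dict[str, str]


def contrastive_features(target: Features, distractor: Features) -> Set[str]:
    """Feature names whose values differ between target and distractor."""
    return {name for name, value in target.items()
            if distractor.get(name) != value}


def mentioned_features(caption: str, target: Features) -> Set[str]:
    """Feature names whose value occurs as a whole token in the caption.

    Naive assumption: every feature value is a single, unambiguous word.
    """
    tokens = set(caption.lower().split())
    return {name for name, value in target.items() if value.lower() in tokens}


target = {"shape": "ball", "object_color": "red", "size": "tiny"}
distractor = {"shape": "ball", "object_color": "blue", "size": "tiny"}
caption = "a tiny red ball"

contrastive = contrastive_features(target, distractor)
mentioned = mentioned_features(caption, target)

# Mentions that actually help tell the target apart from the distractor.
discriminative = mentioned & contrastive
# Mentions that are true of the target but do not help discriminate.
redundant = mentioned - contrastive
```

In this toy case only the object color separates the two images, so a caption that also names the shape and size mentions two features that are true but non-discriminative. Counting such mentions over many target/distractor pairs is one way an overinformativity pattern could be measured, under the assumptions stated above.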

research
01/25/2022

BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment

Evaluating video captioning systems is a challenging task as there are m...
research
04/01/2015

Microsoft COCO Captions: Data Collection and Evaluation Server

In this paper we describe the Microsoft COCO Caption dataset and evaluat...
research
04/30/2021

Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark

Knowledge-grounded dialogue agents are systems designed to conduct a con...
research
07/06/2021

Improving Text-to-Image Synthesis Using Contrastive Learning

The goal of text-to-image synthesis is to generate a visually realistic ...
research
05/14/2020

Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning

We present a method for combining multi-agent communication and traditio...
research
04/02/2016

Reasoning About Pragmatics with Neural Listeners and Speakers

We present a model for pragmatically describing scenes, in which contras...
research
05/16/2018

Defoiling Foiled Image Captions

We address the task of detecting foiled image captions, i.e. identifying...
