Communication breakdown: On the low mutual intelligibility between human and neural captioning

10/20/2022
by Roberto Dessì, et al.

We compare the zero-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced ImageCoDe dataset <cit.>, which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever performs much better when fed neural rather than human captions, even though the former, unlike the latter, were generated without awareness of the distractors that make the task hard. Even more remarkably, when the same neural captions are given to human subjects, their retrieval performance is almost at chance level. Our results thus add to the growing body of evidence that, even when the “language” of neural models resembles English, this superficial resemblance might be deeply misleading.
