Automated Testing of Image Captioning Systems

06/14/2022
by   Boxi Yu, et al.

Image captioning (IC) systems, which automatically generate a text description of the salient objects in an image (real or synthetic), have seen great progress over the past few years due to the development of deep neural networks. IC plays an indispensable role in human society, for example in labeling massive photo collections for scientific studies and in assisting visually impaired people in perceiving the world. However, even top-notch IC systems, such as Microsoft Azure Cognitive Services and IBM Image Caption Generator, may return incorrect results, leading to the omission of important objects, deep misunderstanding, and threats to personal safety. To address this problem, we propose MetaIC, the first metamorphic testing approach to validate IC systems. Our core idea is that the object names in a caption should exhibit directional changes after an object is inserted into the image. Specifically, MetaIC (1) extracts objects from existing images to construct an object corpus; (2) inserts an object into an image via novel object resizing and location tuning algorithms; and (3) reports image pairs whose captions do not differ in the expected way. In our evaluation, we use MetaIC to test one widely adopted image captioning API and five state-of-the-art (SOTA) image captioning models. Using 1,000 seeds, MetaIC successfully reports 16,825 erroneous issues with high precision (84.9%-98.4%). The errors fall into three kinds: misclassification, omission, and incorrect quantity. We visualize the errors reported by MetaIC; the visualization shows that a flexible overlapping setting facilitates IC testing by increasing and diversifying the reported errors. In addition, MetaIC can be generalized to detect label errors in training datasets: it detected 151 incorrect labels in MS COCO Caption, a standard dataset in image captioning.
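To make step (3) concrete, the following is a minimal sketch of how a caption pair might be checked after object insertion. It is not MetaIC's implementation: the abstract does not spell out the exact relation, so the sketch assumes a simplified one (the inserted object should be named in the new caption, and the objects named before should still be named). The names `KNOWN_OBJECTS`, `extract_objects`, and `check_pair` are hypothetical, and the word-list matching stands in for whatever object extraction a real pipeline would use.

```python
import re

# Hypothetical toy vocabulary (an assumption for this sketch); a real
# pipeline would rely on a tagger or a dataset's object-class list.
KNOWN_OBJECTS = {"dog", "cat", "car", "person", "bench", "umbrella"}


def extract_objects(caption):
    """Return the set of known object names mentioned in a caption."""
    words = re.findall(r"[a-z]+", caption.lower())
    found = set()
    for word in words:
        if word in KNOWN_OBJECTS:
            found.add(word)
        elif word.endswith("s") and word[:-1] in KNOWN_OBJECTS:
            # Naive plural handling: "cats" -> "cat".
            found.add(word[:-1])
    return found


def check_pair(original_caption, synthesized_caption, inserted_object):
    """List violations of the assumed insertion relation for one image pair.

    An empty list means the caption pair changed in the expected way.
    """
    before = extract_objects(original_caption)
    after = extract_objects(synthesized_caption)
    issues = []
    if inserted_object not in after:
        issues.append(f"inserted object '{inserted_object}' missing")
    for lost in sorted(before - after):
        issues.append(f"original object '{lost}' disappeared")
    for extra in sorted(after - before - {inserted_object}):
        issues.append(f"unexpected object '{extra}' appeared")
    return issues
```

Calling `check_pair("a dog sitting on a bench", "a dog sitting on a bench", "cat")` would flag the pair because the inserted cat is never named, which loosely corresponds to an omission. This set-based check ignores counts, so it cannot catch the incorrect-quantity errors mentioned above.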

research
06/04/2023

ROME: Testing Image Captioning Systems via Recursive Object Melting

Image captioning (IC) systems aim to generate a text description of the ...
research
05/17/2021

Multi-Modal Image Captioning for the Visually Impaired

One of the ways blind people understand their surroundings is by clickin...
research
09/10/2021

Partially-supervised novel object captioning leveraging context from paired data

In this paper, we propose an approach to improve image captioning soluti...
research
12/21/2020

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Image captioning has recently demonstrated impressive progress largely o...
research
07/31/2019

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at t...
research
04/28/2023

Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

Automated image captioning has the potential to be a useful tool for peo...
research
12/02/2021

Object-Centric Unsupervised Image Captioning

Training an image captioning model in an unsupervised manner without uti...
