Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

by Lu Yu, et al.

Automated image captioning has the potential to be a useful tool for people with vision impairments. Images taken by this user group are often noisy, which leads to incorrect and even unsafe model predictions. In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people. We address this problem from three angles: data, model, and evaluation. First, we show how data augmentation techniques that generate synthetic noise can address data sparsity in this domain. Second, we enhance the robustness of the model by extending a state-of-the-art model to a dual network architecture, using the augmented data and leveraging different consistency losses. Our results demonstrate increased performance, e.g. an absolute improvement of 2.15 CIDEr points compared to state-of-the-art image captioning networks, as well as increased robustness to noise, with improvements of up to 3 CIDEr points in noisier settings. Finally, we evaluate prediction reliability using confidence calibration on images with different difficulty/noise levels, showing that our models perform more reliably in safety-critical situations. The improved model is part of an assisted living application, which we are developing in partnership with the Royal National Institute of Blind People.
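To make the data-augmentation idea concrete, here is a minimal sketch of severity-controlled synthetic noise on a grayscale image stored as rows of 0-255 integers. The choice of corruptions (additive Gaussian noise plus a box blur standing in for defocus/motion blur), the `SEVERITY` table, and the function names are all hypothetical illustrations, not the exact augmentation pipeline used in the paper.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip back to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]

def box_blur(image, radius):
    """Mean filter over a (2r+1)x(2r+1) window, truncated at the image borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out

# Hypothetical severity levels: level -> (noise sigma, blur radius).
SEVERITY = {1: (5.0, 0), 2: (15.0, 1), 3: (30.0, 2)}

def augment(image, level, seed=0):
    """Return a noisier copy of `image`; a fixed seed makes the corruption reproducible."""
    rng = random.Random(seed)
    sigma, radius = SEVERITY[level]
    noisy = add_gaussian_noise(image, sigma, rng)
    return box_blur(noisy, radius) if radius > 0 else noisy
```

Generating several severity levels per clean training image is one plausible way to enlarge a sparse dataset with controlled quality variation; real pipelines would typically operate on RGB tensors with an image library instead.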
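The dual network idea can be illustrated with a consistency term that pulls together the word distributions predicted for a clean image and for its noisy counterpart. The sketch below assumes a symmetric KL divergence averaged over decoding steps and a weighting factor `lam`; both are hypothetical choices, and the paper's own consistency losses may differ in form and in where they are applied (e.g. features instead of output distributions).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(clean_logits, noisy_logits):
    """Symmetric KL between per-step vocabulary distributions of the two branches,
    averaged over decoding steps."""
    total = 0.0
    for lc, ln in zip(clean_logits, noisy_logits):
        p, q = softmax(lc), softmax(ln)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(clean_logits)

def dual_objective(ce_clean, ce_noisy, clean_logits, noisy_logits, lam=1.0):
    """Hypothetical total loss: captioning cross-entropy on both branches plus a
    weighted consistency term."""
    return ce_clean + ce_noisy + lam * consistency_loss(clean_logits, noisy_logits)
```

When both branches agree exactly the consistency term is zero, so the objective reduces to the two cross-entropy terms; any disagreement induced by noise adds a positive penalty.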
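Confidence calibration is commonly summarized with the expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The sketch below assumes that each caption has already been reduced to a scalar confidence and a binary correctness label; how those are derived for captions (for example, from token probabilities and a quality threshold) is a hypothetical step not specified here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        acc = sum(ok for _, ok in b) / len(b)
        conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(acc - conf)
    return ece
```

A lower ECE means the model's stated confidence tracks how often it is actually right, which matters most for noisy, safety-critical images where an overconfident wrong caption is the worst outcome. Computing ECE separately per noise level is one way to compare reliability across difficulty settings.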


Multi-Modal Image Captioning for the Visually Impaired

One of the ways blind people understand their surroundings is by clickin...

Multimodal Data Augmentation for Image Captioning using Diffusion Models

Image captioning, an important vision-language task, often requires a tr...

Data augmentation to improve robustness of image captioning solutions

In this paper, we study the impact of motion blur, a common quality flaw...

A Survey on Biomedical Image Captioning

Image captioning applied to biomedical images can assist and accelerate ...

Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired

We propose a simple yet effective image captioning framework that can de...

Automated Testing of Image Captioning Systems

Image captioning (IC) systems, which automatically generate a text descr...

Assessing Image Quality Issues for Real-World Problem

We introduce a new large-scale dataset that links the assessment of imag...