Data augmentation to improve robustness of image captioning solutions

06/10/2021
by Shashank Bujimalla, et al.

In this paper, we study the impact of motion blur, a common quality flaw in real-world images, on a state-of-the-art two-stage image captioning solution, and observe a degradation in solution performance as blur intensity increases. We investigate techniques to improve the robustness of the solution to motion blur using training data augmentation at either or both stages of the solution, i.e., object detection and captioning, and observe improved results. In particular, augmenting both stages reduces the CIDEr-D degradation for high motion blur intensity from 68.7 to 11.7 on the MS COCO dataset, and from 22.4 to 6.8 on the VizWiz dataset.
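To make the augmentation step concrete, below is a minimal sketch (not the authors' published implementation) of applying synthetic motion blur to training images with OpenCV. The kernel-length and angle ranges, the blur_prob parameter, and the helper names motion_blur and augment_batch are illustrative assumptions; the paper's exact blur model and augmentation schedule are not described here. The same transform would be applied to the training data of both the object detector and the captioner.

import numpy as np
import cv2

def motion_blur(image, kernel_size=15, angle_deg=0.0):
    """Apply a linear motion-blur kernel of the given length and angle.

    kernel_size controls blur intensity; angle_deg sets the blur direction.
    (Illustrative parameters; the paper's exact blur model is not given here.)
    """
    # Start from a horizontal line kernel, then rotate it to the requested angle.
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0
    center = (kernel_size / 2 - 0.5, kernel_size / 2 - 0.5)
    rot = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (kernel_size, kernel_size))
    kernel /= kernel.sum()
    return cv2.filter2D(image, -1, kernel)

def augment_batch(images, blur_prob=0.5, rng=None):
    """Randomly motion-blur a fraction of the training images."""
    rng = rng or np.random.default_rng()
    out = []
    for img in images:
        if rng.random() < blur_prob:
            size = int(rng.integers(5, 25))       # random blur intensity
            angle = float(rng.uniform(0.0, 180.0))  # random blur direction
            img = motion_blur(img, size, angle)
        out.append(img)
    return out

Randomizing both the kernel length and the angle per image exposes the model to a range of blur intensities and directions, which is the intent of augmenting for robustness rather than fitting to one fixed corruption.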
