A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

Image captioning is a computer vision task that involves generating natural language descriptions for images. This method has numerous applications in various domains, including image retrieval systems, medicine, and various industries. However, while there has been significant research in image captioning, most studies have focused on high quality images or controlled environments, without exploring the challenges of real-world image captioning. Real-world image captioning involves complex and dynamic environments with numerous points of attention, with images which are often very poor in quality, making it a challenging task, even for humans. This paper evaluates the performance of various models that are built on top of different encoding mechanisms, language decoders and training procedures using a newly created real-world dataset that consists of over 800+ images of over 65 different scene classes, built using MIT Indoor scenes dataset. This dataset is captioned using the IC3 approach that generates more descriptive captions by summarizing the details that are covered by standard image captioning models from unique view-points of the image.

READ FULL TEXT

page 3

page 9

page 13

research
02/01/2020

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Image Captioning, the task of automatic generation of image captions, ha...
research
12/21/2020

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Image captioning has recently demonstrated impressive progress largely o...
research
11/28/2018

Towards Task Understanding in Visual Settings

We consider the problem of understanding real world tasks depicted in vi...
research
05/21/2021

Visual representation of negation: Real world data analysis on comic image design

There has been a widely held view that visual representations (e.g., pho...
research
11/17/2022

Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired

We propose a simple yet effective image captioning framework that can de...
research
06/15/2020

Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

Multi-sentence summarization is a well studied problem in NLP, while gen...
research
02/11/2023

See Your Heart: Psychological states Interpretation through Visual Creations

In psychoanalysis, generating interpretations to one's psychological sta...

Please sign up or login with your details

Forgot password? Click here to reset