Image to Language Understanding: Captioning approach

02/21/2020
by   Madhavan Seshadri, et al.
0

Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representation of such a format in Natural Language has a huge variety of applications such as helping the visually impaired etc. Such an approach is a combination of Computer Vision and Natural Language techniques which is a hard problem to solve. This project aims to compare different approaches for solving the image captioning problem. In specific, the focus was on comparing two different types of models: Encoder-Decoder approach and a Multi-model approach. In the encoder-decoder approach, inject and merge architectures were compared against a multi-modal image captioning approach based primarily on object detection. These approaches have been compared on the basis on state of the art sentence comparison metrics such as BLEU, GLEU, Meteor, and Rouge on a subset of the Google Conceptual captions dataset which contains 100k images. On the basis of this comparison, we observed that the best model was the Inception injected encoder model. This best approach has been deployed as a web-based system. On uploading an image, such a system will output the best caption associated with the image.

READ FULL TEXT

page 2

page 4

page 6

page 7

research
11/03/2020

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate textual description of a give...
research
06/15/2019

Generating Diverse and Informative Natural Language Fashion Feedback

Recent advances in multi-modal vision and language tasks enable a new se...
research
05/14/2021

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning w...
research
07/14/2021

From Show to Tell: A Survey on Image Captioning

Connecting Vision and Language plays an essential role in Generative Int...
research
01/16/2020

Delving Deeper into the Decoder for Video Captioning

Video captioning is an advanced multi-modal task which aims to describe ...
research
02/25/2021

Bangla language textual image description by hybrid neural network model

Automatic image captioning task in different language is a challenging t...
research
10/14/2019

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

In this paper, we investigate a novel problem of telling the difference ...

Please sign up or login with your details

Forgot password? Click here to reset