Empirical Analysis of Image Caption Generation using Deep Learning

05/14/2021
by Aditya Bhattacharya, et al.

Automated image captioning is an application of deep learning that fuses work from computer vision and natural language processing, and it is typically performed using encoder-decoder architectures. In this project, we implemented and experimented with several variants of multi-modal image captioning networks, exploring CNN encoders based on ResNet101, DenseNet121 and VGG19 together with attention-based LSTM decoders. We studied the effect of beam size and of pretrained word embeddings, and compared these variants to a baseline CNN encoder and RNN decoder architecture. The goal is to analyze the performance of each approach using several evaluation metrics, including BLEU, CIDEr, ROUGE and METEOR. We also explored model explainability using Visual Attention Maps (VAM), which highlight the parts of an image that contribute most to predicting each word of the generated caption.
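To make the role of beam size concrete, the following is a minimal sketch of beam search decoding over a toy next-word table. It is not the authors' implementation: `beam_search`, `toy_step` and the probabilities in `TOY_MODEL` are hypothetical stand-ins for a trained decoder, chosen only so that a beam size of 1 (greedy decoding) and a beam size of 2 return different captions.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)


def beam_search(step_fn: Callable[[List[str]], Dict[str, float]],
                start: str, end: str, beam_size: int, max_len: int) -> List[str]:
    """Return the highest-scoring token sequence found with the given beam size."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_size best expansions (stable sort keeps ties in order).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete
    return max(finished, key=lambda c: c[1])[0]


# Hypothetical next-word distribution, conditioned only on the previous word.
# A real captioning decoder would also condition on the image features.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    return TOY_MODEL[seq[-1]]


greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1, max_len=5)
wider = beam_search(toy_step, "<s>", "</s>", beam_size=2, max_len=5)
print(" ".join(greedy))
print(" ".join(wider))
```

In this toy table, greedy decoding commits to the locally most likely first word and ends with a caption of total probability 0.30, whereas a beam of 2 keeps the second-best first word alive and finds a caption of probability 0.36. A wider beam searches more of the hypothesis space at higher decoding cost, which is the trade-off behind varying the beam size.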


research
11/03/2020

Attention Beam: An Image Captioning Approach

The aim of image captioning is to generate textual description of a give...
research
02/22/2021

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images...
research
02/21/2020

Image to Language Understanding: Captioning approach

Extracting context from visual representations is of utmost importance i...
research
06/15/2019

Generating Diverse and Informative Natural Language Fashion Feedback

Recent advances in multi-modal vision and language tasks enable a new se...
research
07/09/2020

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network

Domestic service robots (DSRs) are a promising solution to the shortage ...
research
12/17/2020

Efficient CNN-LSTM based Image Captioning using Neural Network Compression

Modern Neural Networks are eminent in achieving state of the art perform...
research
03/06/2019

A Character-Level Approach to the Text Normalization Problem Based on a New Causal Encoder

Text normalization is a ubiquitous process that appears as the first ste...
