Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

02/22/2021
by Sulabh Katiyar, et al.

Image Captioning, the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress through Deep Learning techniques. We propose using an Inception-ResNet Convolutional Neural Network as the encoder to extract image features, Hierarchical Context-based Word Embeddings for word representations, and a Deep Stacked Long Short-Term Memory (LSTM) network as the decoder. We also use Image Data Augmentation to avoid overfitting: Horizontal and Vertical Flipping in addition to Perspective Transformations on the images. We evaluate our proposed methods with two image captioning frameworks, Encoder-Decoder and Soft Attention. Evaluation on widely used metrics shows that our approach leads to considerable improvement in model performance.
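The augmentation scheme mentioned above (horizontal and vertical flips plus perspective transformations) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The application probability `p=0.5`, the 10% corner jitter (`max_shift=0.1`), nearest-neighbour sampling and zero padding are all assumptions, and the helper names (`horizontal_flip`, `vertical_flip`, `homography_from_points`, `perspective_transform`, `augment`) are hypothetical. The perspective step estimates a 3x3 homography from four corner correspondences and inverse-warps the image.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror an (H, W) or (H, W, C) image left to right by reversing the width axis."""
    return image[:, ::-1]


def vertical_flip(image):
    """Mirror an image top to bottom by reversing the height axis."""
    return image[::-1, :]


def homography_from_points(src, dst):
    """Solve the 3x3 homography H (with H[2, 2] = 1) mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way.
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def perspective_transform(image, max_shift=0.1, rng=None):
    """Warp the image so each corner moves by up to max_shift of the width/height (assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]], dtype=float
    )
    moved = corners + rng.uniform(-max_shift, max_shift, size=(4, 2)) * [width, height]
    # Inverse warping: map every output pixel back to its source location in the input.
    out_to_in = homography_from_points(moved, corners)
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])
    src = out_to_in @ coords
    src_x = np.rint(src[0] / src[2]).astype(int)  # nearest-neighbour sampling
    src_y = np.rint(src[1] / src[2]).astype(int)
    valid = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    out = np.zeros(image.shape, dtype=image.dtype)  # pixels mapped from outside the input stay 0
    out_flat = out.reshape(height * width, *image.shape[2:])
    out_flat[valid] = image[src_y[valid], src_x[valid]]
    return out


def augment(image, p=0.5, rng=None):
    """Independently apply each of the three transforms with probability p (assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        image = horizontal_flip(image)
    if rng.random() < p:
        image = vertical_flip(image)
    if rng.random() < p:
        image = perspective_transform(image, rng=rng)
    return image
```

Each call draws fresh random choices, so applying `augment` on the fly during training gives the encoder a slightly different view of each image every epoch. A horizontal flip can contradict caption words such as "left" or "right". The abstract does not say whether or how this was handled.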

research
05/14/2021

Empirical Analysis of Image Caption Generation using Deep Learning

Automated image captioning is one of the applications of Deep Learning w...
research
03/08/2021

Analysis of Convolutional Decoder for Image Caption Generation

Recently Convolutional Neural Networks have been proposed for Sequence M...
research
04/04/2016

Image Captioning with Deep Bidirectional LSTMs

This work presents an end-to-end trainable deep bidirectional LSTM (Long...
research
05/20/2019

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task and attracting more and more atte...
research
02/28/2022

Interactive Machine Learning for Image Captioning

We propose an approach for interactive learning for an image captioning ...
research
02/25/2021

Bangla language textual image description by hybrid neural network model

Automatic image captioning task in different language is a challenging t...
research
06/06/2023

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory

Interactive machine learning (IML) is a beneficial learning paradigm in ...
