Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network

10/24/2021
by   Md Aminul Haque Palash, et al.

Automatic image captioning is the ongoing effort to generate syntactically correct and accurate textual descriptions of an image in natural language with context. The encoder-decoder structures used throughout existing Bengali Image Captioning (BIC) research take abstract image feature vectors as the encoder's input. We propose a novel transformer-based architecture with an attention mechanism that uses a pre-trained ResNet-101 model as the image encoder for feature extraction. Experiments demonstrate that the language decoder in our technique captures fine-grained information in the caption and, when paired with image features, produces accurate and diverse captions on the BanglaLekhaImageCaptions dataset. Our approach outperforms all existing Bengali Image Captioning work and sets a new benchmark by scoring 0.694 on BLEU-1, 0.630 on BLEU-2, 0.582 on BLEU-3, and 0.337 on METEOR.
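
For readers unfamiliar with the reported BLEU-n metrics, the following is a minimal sketch of the standard definition: clipped n-gram precision combined by a geometric mean and multiplied by a brevity penalty. The abstract does not say how the authors computed their scores, so the cumulative uniform weighting, the absence of smoothing, whitespace tokenization, and the names `ngrams` and `bleu` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=1):
    """Cumulative BLEU-max_n with uniform weights and no smoothing (assumed).

    candidate: list of tokens; references: non-empty list of token lists.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Hypothetical caption pair, not taken from the BanglaLekhaImageCaptions dataset.
reference = "a boy is playing football in the field".split()
candidate = "a boy is playing football".split()
print(bleu(candidate, [reference], max_n=1))
print(bleu(candidate, [reference], max_n=3))
```

In this example every candidate n-gram appears in the reference, so the score is below 1 only because the brevity penalty discounts the shorter candidate. Published scores are normally computed over a whole test set with an established toolkit, so this sketch should not be expected to reproduce the figures above.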

research
02/14/2021

Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model

Image Captioning is an arduous task of producing syntactically and seman...
research
09/11/2021

Bornon: Bengali Image Captioning with Transformer-based Deep learning approach

Image captioning using Encoder-Decoder based approach where CNN is used ...
research
03/05/2023

Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning

In a globalized world at the present epoch of generative intelligence, m...
research
05/25/2016

Review Networks for Caption Generation

We propose a novel extension of the encoder-decoder framework, called a ...
research
07/26/2022

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providi...
research
06/26/2019

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive proble...
research
12/22/2020

Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit

There is very little notable research on generating descriptions of the ...
