Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model

02/14/2021
by   Mohammad Faiyaz Khan, et al.
0

Image Captioning is an arduous task of producing syntactically and semantically correct textual descriptions of an image in natural language with context related to the image. Existing notable pieces of research in Bengali Image Captioning (BIC) are based on encoder-decoder architecture. This paper presents an end-to-end image captioning system utilizing a multimodal architecture by combining a one-dimensional convolutional neural network (CNN) to encode sequence information with a pre-trained ResNet-50 model image encoder for extracting region-based visual features. We investigate our approach's performance on the BanglaLekhaImageCaptions dataset using the existing evaluation metrics and perform a human evaluation for qualitative analysis. Experiments show that our approach's language encoder captures the fine-grained information in the caption, and combined with the image features, it generates accurate and diversified caption. Our work outperforms all the existing BIC works and achieves a new state-of-the-art (SOTA) performance by scoring 0.651 on BLUE-1, 0.572 on CIDEr, 0.297 on METEOR, 0.434 on ROUGE, and 0.357 on SPICE.

READ FULL TEXT

page 8

page 9

page 10

research
10/24/2021

Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network

Automatic Image Captioning is the never-ending effort of creating syntac...
research
07/26/2018

Recurrent Fusion Network for Image Captioning

Recently, much advance has been made in image captioning, and an encoder...
research
05/20/2019

Multimodal Transformer with Multi-View Visual Representation for Image Captioning

Image captioning aims to automatically generate a natural language descr...
research
04/15/2022

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorpora...
research
11/20/2017

End-to-end Trained CNN Encode-Decoder Networks for Image Steganography

All the existing image steganography methods use manually crafted featur...
research
09/11/2017

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sen...
research
10/14/2019

Tell-the-difference: Fine-grained Visual Descriptor via a Discriminating Referee

In this paper, we investigate a novel problem of telling the difference ...

Please sign up or login with your details

Forgot password? Click here to reset