Analysis of Convolutional Decoder for Image Caption Generation

03/08/2021
by   Sulabh Katiyar, et al.
13

Recently Convolutional Neural Networks have been proposed for Sequence Modelling tasks such as Image Caption Generation. However, unlike Recurrent Neural Networks, the performance of Convolutional Neural Networks as Decoders for Image Caption Generation has not been extensively studied. In this work, we analyse various aspects of Convolutional Neural Network based Decoders such as Network complexity and depth, use of Data Augmentation, Attention mechanism, length of sentences used during training, etc on performance of the model. We perform experiments using Flickr8k and Flickr30k image captioning datasets and observe that unlike Recurrent Neural Network based Decoder, Convolutional Decoder for Image Captioning does not generally benefit from increase in network depth, in the form of stacked Convolutional Layers, and also the use of Data Augmentation techniques. In addition, use of Attention mechanism also provides limited performance gains with Convolutional Decoder. Furthermore, we observe that Convolutional Decoders show performance comparable with Recurrent Decoders only when trained using sentences of smaller length which contain up to 15 words but they have limitations when trained using higher sentence lengths which suggests that Convolutional Decoders may not be able to model long-term dependencies efficiently. In addition, the Convolutional Decoder usually performs poorly on CIDEr evaluation metric as compared to Recurrent Decoder.

READ FULL TEXT

page 8

page 11

page 12

page 14

page 16

research
02/22/2021

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images...
research
03/03/2022

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Image captioning is a fast-growing research field of computer vision and...
research
07/26/2018

Recurrent Fusion Network for Image Captioning

Recently, much advance has been made in image captioning, and an encoder...
research
12/21/2016

An Empirical Study of Language CNN for Image Captioning

Language Models based on recurrent neural networks have dominated recent...
research
09/05/2020

Reverse-engineering Bar Charts Using Neural Networks

Reverse-engineering bar charts extracts textual and numeric information ...
research
07/09/2020

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network

Domestic service robots (DSRs) are a promising solution to the shortage ...
research
01/28/2021

Development of a Vertex Finding Algorithm using Recurrent Neural Network

Deep learning is a rapidly-evolving technology with possibility to signi...

Please sign up or login with your details

Forgot password? Click here to reset