CNN+CNN: Convolutional Decoders for Image Captioning

05/23/2018
by   Qingzhong Wang, et al.
0

Image captioning is a challenging task that combines the field of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural network (RNN) or long-short term memory (LSTM) based models dominate this field. However, RNNs or LSTMs cannot be calculated in parallel and ignore the underlying hierarchical structure of a sentence. In this paper, we propose a framework that only employs convolutional neural networks (CNNs) to generate captions. Owing to parallel computing, our basic model is around 3 times faster than NIC (an LSTM-based model) during training time, while also providing better results. We conduct extensive experiments on MSCOCO and investigate the influence of the model width and depth. Compared with LSTM-based models that apply similar attention mechanisms, our proposed models achieves comparable scores of BLEU-1,2,3,4 and METEOR, and higher scores of CIDEr. We also test our model on the paragraph annotation dataset, and get higher CIDEr score compared with hierarchical LSTMs

READ FULL TEXT
research
11/05/2016

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an em...
research
07/02/2019

Neural Image Captioning

In recent years, the biggest advances in major Computer Vision tasks, su...
research
09/03/2022

vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM

This study presents our approach on the automatic Vietnamese image capti...
research
11/24/2017

Convolutional Image Captioning

Image captioning is an important but challenging task, applicable to vir...
research
01/07/2018

Approximate FPGA-based LSTMs under Computation Time Constraints

Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM...
research
10/15/2019

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models comb...
research
11/23/2016

Video Captioning with Transferred Semantic Attributes

Automatically generating natural language descriptions of videos plays a...

Please sign up or login with your details

Forgot password? Click here to reset