A sequential guiding network with attention for image captioning

11/01/2018
by Daouda Sow, et al.

The recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, which lets us tackle more challenging tasks such as automatic description generation from natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained in an end-to-end manner using image/description pairs. We validate our approach by conducting extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves significant improvement compared to other state-of-the-art deep learning models.
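To make the idea of attention combined with a guiding signal more concrete, here is a minimal sketch of a single attention step in plain Python. It is not the paper's model: the abstract does not say how the guiding LSTM output enters the decoder, so the element-wise sum in `guided_query` is an assumption, dot-product scoring stands in for whatever attention function the authors use, and all names and toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Dot-product soft attention over image region features.

    Returns the attention weights and the weighted sum of the regions.
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context


def guided_query(decoder_state, guide):
    """Assumed combination: element-wise sum of decoder state and guiding vector."""
    return [h + g for h, g in zip(decoder_state, guide)]


# Toy inputs: three CNN region features, one decoder state, one guiding vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, 0.0]
guide = [0.5, 1.0]

weights, context = attend(regions, guided_query(decoder_state, guide))
```

In this toy setup the guiding vector shifts the query toward the second feature dimension, so the region that is active in both dimensions receives the largest weight. In a trained model the region features, decoder state, and guiding vector would all be learned end to end rather than fixed by hand.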

research
09/03/2022

vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM

This study presents our approach on the automatic Vietnamese image capti...
research
07/28/2021

Experimenting with Self-Supervision using Rotation Prediction for Image Captioning

Image captioning is a task in the field of Artificial Intelligence that ...
research
01/16/2018

Image denoising and restoration with CNN-LSTM Encoder Decoder with Direct Attention

Image denoising is always a challenging task in the field of computer vi...
research
04/04/2019

An End-to-End Baseline for Video Captioning

Building correspondences across different modalities, such as video and ...
research
06/26/2019

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive proble...
research
12/23/2020

ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition

Despite the recent advances in optical character recognition (OCR), math...
research
09/02/2015

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

We propose an end-to-end, domain-independent neural encoder-aligner-deco...
