Learning to Guide Decoding for Image Captioning

04/03/2018
by Wenhao Jiang, et al.

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be learned adaptively from the decoder's signal, so that it embeds information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
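
To make the data flow concrete, here is a minimal sketch of how a guiding vector might be computed once per image and then composed into the decoder input at every time step. The abstract does not specify the architecture, so everything here is an assumption for illustration: the mean-pooling and tanh projection in `guiding_vector`, the use of concatenation in `decoder_inputs`, and all names and dimensions are hypothetical. Training, the decoder itself, and the discriminative supervision are omitted.

```python
import math
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def guiding_vector(region_features, weights):
    """Hypothetical guiding network: mean-pool the image features,
    then apply a learned projection followed by tanh (an assumption)."""
    dim = len(region_features[0])
    count = len(region_features)
    pooled = [sum(f[i] for f in region_features) / count for i in range(dim)]
    return [math.tanh(z) for z in linear(weights, pooled)]


def decoder_inputs(word_embeddings, guide):
    """Compose the decoder input at each time step.
    Concatenation [word embedding ; guiding vector] is an assumption."""
    return [emb + guide for emb in word_embeddings]


# Toy data standing in for encoder features and word embeddings.
rng = random.Random(0)
feat_dim, guide_dim, embed_dim = 4, 3, 2
features = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(5)]
W = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(guide_dim)]
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(6)]

v = guiding_vector(features, W)          # computed once per image
inputs = decoder_inputs(embeddings, v)   # one composed input per time step
```

In an end-to-end setting, the weights `W` would receive gradients from the decoder's loss, which is one way to read the claim that the guiding vector adapts to the signal from the decoder.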

research
08/19/2019

Attention on Attention for Image Captioning

Attention mechanisms are widely used in current encoder/decoder framewor...
research
09/07/2020

VisCode: Embedding Information in Visualization Images using Encoder-Decoder Network

We present an approach called VisCode for embedding information into vis...
research
03/30/2018

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

Recently, caption generation with an encoder-decoder framework has been ...
research
12/13/2020

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

Transformer-based architectures have shown great success in image captio...
research
05/28/2021

New Image Captioning Encoder via Semantic Visual Feature Matching for Heavy Rain Images

Image captioning generates text that describes scenes from input images....
research
04/15/2022

Image Captioning In the Transformer Age

Image Captioning (IC) has achieved astonishing developments by incorpora...
research
01/06/2023

An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU)

Image captioning by the encoder-decoder framework has shown tremendous a...
