
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

by Jialin Wu, et al.

Most RNN-based image captioning models are supervised only on their output words, which are trained to mimic human captions. As a result, the hidden states receive only noisy gradient signals, propagated through many steps of back-propagation through time, which leads to less accurate captions. We therefore propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on an easier task: autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-level evaluation metrics distributed equally to every generated word, regardless of each word's relevance. HSG instead provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using raw images, detected objects, or scene graph features as inputs.
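The contrast between a sentence-level reward shared by all words and a per-word signal derived from hidden-state matching can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the distance function, how the reward is computed, or how student and teacher time steps are aligned, so the squared Euclidean distance, the negative-distance reward, the one-to-one step alignment, and all function names below are assumptions made for illustration only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two hidden-state vectors."""
    if len(a) != len(b):
        raise ValueError("hidden states must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sentence_level_rewards(sentence_score: float, num_words: int) -> List[float]:
    """Conventional scheme: every generated word gets the same sentence score."""
    return [sentence_score] * num_words


def word_level_rewards(
    student_states: Sequence[Vector], teacher_states: Sequence[Vector]
) -> List[float]:
    """Assumed word-level reward: negative distance to the teacher state.

    Assumes the student and teacher states are aligned step by step,
    which the abstract does not confirm.
    """
    if len(student_states) != len(teacher_states):
        raise ValueError("student and teacher must have the same number of steps")
    return [-squared_distance(s, t) for s, t in zip(student_states, teacher_states)]


def hidden_state_matching_loss(
    student_states: Sequence[Vector], teacher_states: Sequence[Vector]
) -> float:
    """Mean per-step squared distance (an assumed choice of matching loss)."""
    rewards = word_level_rewards(student_states, teacher_states)
    if not rewards:
        raise ValueError("at least one time step is required")
    return -sum(rewards) / len(rewards)


if __name__ == "__main__":
    # Toy hidden states for a three-word caption (made-up numbers).
    student = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    teacher = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
    print(sentence_level_rewards(0.9, len(student)))
    print(word_level_rewards(student, teacher))
    print(hidden_state_matching_loss(student, teacher))
```

In this toy setup the sentence-level scheme gives all three words an identical reward, while the per-word signal singles out the third step, whose hidden state is farthest from the teacher's.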




Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in und...

Senti-Attend: Image Captioning using Sentiment and Attention

There has been much recent work on image captioning models that describe...

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Unsupervised image captioning is a challenging task that aims at generat...

Rethinking the Reference-based Distinctive Image Captioning

Distinctive Image Captioning (DIC) – generating distinctive captions tha...

Improving Image Captioning with Conditional Generative Adversarial Nets

In this paper, we propose a novel conditional generative adversarial net...

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

In this work, we study the robustness of a CNN+RNN based image captionin...

SubICap: Towards Subword-informed Image Captioning

Existing Image Captioning (IC) systems model words as atomic units in ca...