Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

10/31/2019
by   Jialin Wu, et al.
12

Most RNN-based image captioning models receive supervision on the output words to mimic human captions. Therefore, the hidden states can only receive noisy gradient signals via layers of back-propagation through time, leading to less accurate generated captions. Consequently, we propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on an easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics equally distributed to each generated word, no matter their relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using either raw images, detected objects, or scene graph features as inputs.

READ FULL TEXT

page 1

page 7

research
04/12/2017

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in und...
research
11/24/2018

Senti-Attend: Image Captioning using Sentiment and Attention

There has been much recent work on image captioning models that describe...
research
04/28/2021

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Unsupervised image captioning is a challenging task that aims at generat...
research
07/22/2022

Rethinking the Reference-based Distinctive Image Captioning

Distinctive Image Captioning (DIC) – generating distinctive captions tha...
research
05/18/2018

Improving Image Captioning with Conditional Generative Adversarial Nets

In this paper, we propose a novel conditional generative adversarial net...
research
05/10/2019

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

In this work, we study the robustness of a CNN+RNN based image captionin...
research
12/24/2020

SubICap: Towards Subword-informed Image Captioning

Existing Image Captioning (IC) systems model words as atomic units in ca...

Please sign up or login with your details

Forgot password? Click here to reset