Improved Image Captioning via Policy Gradient optimization of SPIDEr

12/01/2016
by   Siqi Liu, et al.
0

Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, while CIDEr score ensures our captions are syntactically fluent. The PG method we propose improves on the prior MIXER approach, by using Monte Carlo rollouts instead of mixing MLE training with PG. We show empirically that our algorithm leads to easier optimization and improved results compared to MIXER. Finally, we show that using our PG method we can optimize any of the metrics, including the proposed SPIDEr metric which results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained to optimize MLE or the COCO metrics.

READ FULL TEXT
research
04/06/2020

B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning

Bayesian deep neural networks (DNN) provide a mathematically grounded fr...
research
06/23/2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Image captioning aims to describe visual content in natural language. As...
research
09/08/2020

Towards Unique and Informative Captioning of Images

Despite considerable progress, state of the art image captioning models ...
research
10/12/2018

Pre-gen metrics: Predicting caption quality metrics without generating captions

Image caption generation systems are typically evaluated against referen...
research
06/12/2015

Technical Report: Image Captioning with Semantically Similar Images

This report presents our submission to the MS COCO Captioning Challenge ...
research
04/30/2018

Improved Image Captioning with Adversarial Semantic Alignment

In this paper we propose a new conditional GAN for image captioning that...
research
05/30/2018

Neural Joking Machine : Humorous image captioning

What is an effective expression that draws laughter from human beings? I...

Please sign up or login with your details

Forgot password? Click here to reset