B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning

04/06/2020
by   Shashank Bujimalla, et al.
14

Bayesian deep neural networks (DNN) provide a mathematically grounded framework to quantify uncertainty in their predictions. We propose a Bayesian variant of policy-gradient based reinforcement learning training technique for image captioning models to directly optimize non-differentiable image captioning quality metrics such as CIDEr-D. We extend the well-known Self-Critical Sequence Training (SCST) approach for image captioning models by incorporating Bayesian inference, and refer to it as B-SCST. The "baseline" reward for the policy-gradients in B-SCST is generated by averaging predictive quality metrics (CIDEr-D) of the captions drawn from the distribution obtained using a Bayesian DNN model. This predictive distribution is inferred using Monte Carlo (MC) dropout, which is one of the standard ways to approximate variational inference. We observe that B-SCST improves all the standard captioning quality scores on both Flickr30k and MS COCO datasets, compared to the SCST approach. We also provide a detailed study of uncertainty quantification for the predicted captions, and demonstrate that it correlates well with the CIDEr-D scores. To our knowledge, this is the first such analysis, and it can pave way to more practical image captioning solutions with interpretable models.

READ FULL TEXT
research
12/01/2016

Improved Image Captioning via Policy Gradient optimization of SPIDEr

Current image captioning methods are usually trained via (penalized) max...
research
08/11/2018

Dropout during inference as a model for neurological degeneration in an image captioning network

We replicate a variation of the image captioning architecture by Vinyals...
research
12/28/2021

Extended Self-Critical Pipeline for Transforming Videos to Text (TRECVID-VTT Task 2021) – Team: MMCUniAugsburg

The Multimedia and Computer Vision Lab of the University of Augsburg par...
research
12/02/2016

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcemen...
research
05/30/2018

Neural Joking Machine : Humorous image captioning

What is an effective expression that draws laughter from human beings? I...
research
07/14/2020

Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

A wide range of image captioning models has been developed, achieving si...
research
03/08/2020

Better Captioning with Sequence-Level Exploration

Sequence-level learning objective has been widely used in captioning tas...

Please sign up or login with your details

Forgot password? Click here to reset