Self-critical n-step Training for Image Captioning

04/15/2019
by   Junlong Gao, et al.
0

Existing methods for image captioning are usually trained by cross entropy loss, which leads to exposure bias and the inconsistency between the optimizing function and evaluation metrics. Recently it has been shown that these two issues can be addressed by incorporating techniques from reinforcement learning, where one of the popular techniques is the advantage actor-critic algorithm that calculates per-token advantage by estimating state value with a parametrized estimator at the cost of introducing estimation bias. In this paper, we estimate state value without using a parametrized value estimator. With the properties of image captioning, namely, the deterministic state transition function and the sparse reward, state value is equivalent to its preceding state-action value, and we reformulate advantage function by simply replacing the former with the latter. Moreover, the reformulated advantage is extended to n-step, which can generally increase the absolute value of the mean of reformulated advantage while lowering variance. Then two kinds of rollout are adopted to estimate state-action value, which we call self-critical n-step training. Empirically we find that our method can obtain better performance compared to the state-of-the-art methods that use the sequence level advantage and parametrized estimator respectively on the widely used MSCOCO benchmark.

READ FULL TEXT
research
06/29/2017

Actor-Critic Sequence Training for Image Captioning

Generating natural language descriptions of images is an important capab...
research
12/02/2016

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcemen...
research
08/16/2018

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

Many vision-language tasks can be reduced to the problem of sequence pre...
research
09/30/2020

Teacher-Critical Training Strategies for Image Captioning

Existing image captioning models are usually trained by cross-entropy (X...
research
04/12/2017

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in und...
research
12/27/2017

Consensus-based Sequence Training for Video Captioning

Captioning models are typically trained using the cross-entropy loss. Ho...
research
09/13/2021

Direct Advantage Estimation

Credit assignment is one of the central problems in reinforcement learni...

Please sign up or login with your details

Forgot password? Click here to reset