Consensus-based Sequence Training for Video Captioning

12/27/2017
by Sang Phan, et al.

Captioning models are typically trained using the cross-entropy loss. However, their performance is evaluated on other metrics designed to better correlate with human assessments. Recently, it has been shown that reinforcement learning (RL) can directly optimize these metrics in tasks such as captioning. However, this is computationally costly and requires specifying a baseline reward at each step to make training converge. We propose a fast approach to optimize one's objective of interest through the REINFORCE algorithm. First we show that, by replacing model samples with ground-truth sentences, RL training can be seen as a form of weighted cross-entropy loss, giving a fast, RL-based pre-training algorithm. Second, we propose to use the consensus among ground-truth captions of the same video as the baseline reward. This can be computed very efficiently. We call the complete proposal Consensus-based Sequence Training (CST). Applied to the MSR-VTT video captioning benchmark, our proposals train significantly faster than comparable methods and establish a new state-of-the-art on the task, improving the CIDEr score from 47.3 to 54.2.
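The weighted cross-entropy view described in the abstract can be sketched in a few lines: ground-truth captions replace model samples, and each caption's negative log-likelihood is weighted by its reward minus the consensus baseline computed from the video's other captions. The following is an illustrative sketch, not the authors' implementation; the `cst_loss` helper and all numbers are hypothetical.

```python
import math

def cst_loss(caption_logprobs, rewards, baseline):
    """Consensus-weighted cross-entropy (a sketch of CST-style pre-training):
    each ground-truth caption's summed token log-probability is weighted by
    its advantage, i.e. its metric reward minus the consensus baseline."""
    loss = 0.0
    for logprobs, reward in zip(caption_logprobs, rewards):
        advantage = reward - baseline      # baseline needs no model sampling
        loss -= advantage * sum(logprobs)  # weighted negative log-likelihood
    return loss

# Toy example: one video with two ground-truth captions (numbers are illustrative).
caption_logprobs = [
    [math.log(0.5), math.log(0.4), math.log(0.6)],  # token log-probs, caption 1
    [math.log(0.3), math.log(0.2)],                 # token log-probs, caption 2
]
rewards = [0.8, 0.4]                    # e.g. CIDEr of each caption vs. the others
baseline = sum(rewards) / len(rewards)  # consensus reward for this video
loss = cst_loss(caption_logprobs, rewards, baseline)
```

Because the baseline depends only on the ground-truth captions, it can be precomputed once per video, which is what makes this pre-training step cheap compared with sampling-based REINFORCE.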

Related research

- Reinforced Video Captioning with Entailment Rewards (08/07/2017)
- Teacher-Critical Training Strategies for Image Captioning (09/30/2020)
- Paraphrasing Is All You Need for Novel Object Captioning (09/25/2022)
- Boosting Video Captioning with Dynamic Loss Network (07/25/2021)
- Fast Sequence Generation with Multi-Agent Reinforcement Learning (01/24/2021)
- Self-critical n-step Training for Image Captioning (04/15/2019)
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation (08/14/2019)
