Teacher-Critical Training Strategies for Image Captioning

09/30/2020
by Yiqing Huang, et al.

Existing image captioning models are usually trained with cross-entropy (XE) loss and reinforcement learning (RL), which set ground-truth words as hard targets and force the captioning model to learn from them. However, these widely adopted training strategies suffer from misalignment in XE training and inappropriate reward assignment in RL training. To tackle these problems, we introduce a teacher model that serves as a bridge between the ground-truth caption and the caption model by generating easier-to-learn word proposals as soft targets. The teacher model is constructed by incorporating the ground-truth image attributes into the baseline caption model. To learn effectively from the teacher model, we propose Teacher-Critical Training Strategies (TCTS) for both XE and RL training to facilitate better learning processes for the caption model. Experimental evaluations of several widely adopted caption models on the benchmark MSCOCO dataset show that the proposed TCTS comprehensively enhances most evaluation metrics, especially the Bleu and Rouge-L scores, in both training stages. TCTS achieves to-date the best published single-model Bleu-4 and Rouge-L performances, including a Bleu-4 of 40.2, on the test split. Our code and pre-trained models will be open-sourced.
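The exact TCTS objectives are defined in the full paper; the sketch below is only a rough, distillation-style illustration of the soft-target idea described above, mixing the standard hard-target XE loss with a KL term toward the teacher's word distribution. The function name teacher_guided_xe_loss, the mixing weight alpha, and the temperature tau are illustrative assumptions, not the paper's notation.

```python
# Minimal illustrative sketch (not the paper's exact formulation) of combining
# hard ground-truth targets with soft word proposals from a teacher model
# during cross-entropy training. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def teacher_guided_xe_loss(student_logits, teacher_logits, gt_tokens,
                           alpha=0.5, tau=1.0, pad_id=0):
    """student_logits, teacher_logits: (batch, seq_len, vocab_size)
    gt_tokens: (batch, seq_len) ground-truth word indices."""
    vocab = student_logits.size(-1)

    # Standard hard-target cross-entropy against the ground-truth caption.
    hard_loss = F.cross_entropy(
        student_logits.reshape(-1, vocab),
        gt_tokens.reshape(-1),
        ignore_index=pad_id,
    )

    # Soft-target term: KL divergence toward the teacher's (temperature-scaled)
    # word distribution, which serves as an easier-to-learn proposal.
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    mask = (gt_tokens != pad_id).unsqueeze(-1).float()
    kl = F.kl_div(log_p_student, p_teacher, reduction="none")
    soft_loss = (kl * mask).sum() / mask.sum()

    return alpha * hard_loss + (1.0 - alpha) * soft_loss


if __name__ == "__main__":
    B, T, V = 2, 5, 100                   # toy batch, sequence length, vocabulary
    student = torch.randn(B, T, V, requires_grad=True)
    teacher = torch.randn(B, T, V)        # e.g. an attribute-augmented caption model
    gt = torch.randint(1, V, (B, T))
    loss = teacher_guided_xe_loss(student, teacher, gt)
    loss.backward()
    print(loss.item())
```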


Related research

12/27/2017 · Consensus-based Sequence Training for Video Captioning
Captioning models are typically trained using the cross-entropy loss. Ho...

01/04/2022 · StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams
In this paper, we build two automatic evaluation metrics for evaluating ...

01/20/2021 · Macroscopic Control of Text Generation for Image Captioning
Despite the fact that image captioning models have been able to generate...

04/15/2019 · Self-critical n-step Training for Image Captioning
Existing methods for image captioning are usually trained by cross entro...

05/09/2021 · A Hybrid Model for Combining Neural Image Caption and k-Nearest Neighbor Approach for Image Captioning
A hybrid model is proposed that integrates two popular image captioning ...

01/15/2020 · Show, Recall, and Tell: Image Captioning with Recall Mechanism
Generating natural and accurate descriptions in image captioning has al...

10/13/2021 · Language Modelling via Learning to Rank
We consider language modelling (LM) as a multi-label structured predicti...
