Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

05/10/2020
by Longteng Guo, et al.

Most image captioning models are autoregressive, i.e., they generate each word by conditioning on previously generated words, which leads to high latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use a word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider sentence-level consistency, resulting in inferior generation quality for these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system in which positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
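
The abstract does not spell out how the counterfactual signal is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a COMA-style counterfactual baseline: each position (agent) is credited with the sentence-level reward minus the expected reward obtained when only that position's word is resampled from its own policy. The names `toy_reward`, `counterfactual_advantages`, and `sample_parallel`, the three-word vocabulary, and the position-matching reward are all hypothetical stand-ins (a real system would use a captioning metric as the reward).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(words, reference):
    """Hypothetical sentence-level reward: fraction of positions matching the reference."""
    matches = sum(1 for w, r in zip(words, reference) if w == r)
    return matches / len(reference)


def sample_parallel(position_logits, vocab, rng):
    """Non-autoregressive sampling: every position picks its word independently."""
    return [rng.choices(vocab, weights=softmax(l), k=1)[0] for l in position_logits]


def counterfactual_advantages(position_logits, sampled, reference, vocab):
    """Assumed COMA-style credit assignment, one advantage per position (agent).

    advantage_i = R(sampled) - sum_w pi_i(w) * R(sampled with position i set to w)
    """
    full_reward = toy_reward(sampled, reference)
    advantages = []
    for i, logits in enumerate(position_logits):
        probs = softmax(logits)
        baseline = 0.0
        for word, p in zip(vocab, probs):
            alternative = list(sampled)
            alternative[i] = word  # change only agent i's action
            baseline += p * toy_reward(alternative, reference)
        advantages.append(full_reward - baseline)
    return advantages


vocab = ["a", "dog", "runs"]
reference = ["a", "dog", "runs"]
position_logits = [[0.0, 0.0, 0.0] for _ in reference]  # uniform policy per position

# All positions are decoded in one parallel step.
drawn = sample_parallel(position_logits, vocab, random.Random(0))

# A fixed sample with one wrong word at position 1, used to inspect the credit signal.
sampled = ["a", "runs", "runs"]
advantages = counterfactual_advantages(position_logits, sampled, reference, vocab)
# Positions 0 and 2 receive a positive advantage; position 1 receives a negative one.
```

In a policy-gradient update, each position's log-probability would be weighted by its own advantage, so that a single sentence-level reward still yields a position-specific learning signal.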


research
01/24/2021

Fast Sequence Generation with Multi-Agent Reinforcement Learning

Autoregressive sequence generation models have achieved state-of-the-art...
research
11/30/2022

Uncertainty-Aware Image Captioning

It is well believed that the higher uncertainty in a word of the caption...
research
08/05/2021

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. D...
research
01/15/2020

Show, Recall, and Tell: Image Captioning with Recall Mechanism

Generating natural and accurate descriptions in image captioning has al...
research
06/17/2021

Semi-Autoregressive Transformer for Image Captioning

Current state-of-the-art image captioning models adopt autoregressive de...
research
12/13/2019

Fast Image Caption Generation with Position Alignment

Recent neural network models for image captioning usually employ an enco...
research
04/18/2022

Cross-view Brain Decoding

How the brain captures the meaning of linguistic stimuli across multiple...
