Fast Image Caption Generation with Position Alignment

12/13/2019
by Zheng-cong Fei, et al.

Recent neural network models for image captioning usually employ an encoder-decoder architecture in which the decoder generates the sequence recursively, one word at a time. However, such autoregressive decoding can lead to sequential error accumulation and slow generation, which limit its practical applications. Non-autoregressive (NA) decoding has been proposed to address these issues, but it suffers from reduced language quality due to its indirect modeling of the target distribution. To this end, we propose an improved NA prediction framework to accelerate image captioning. Our decoder consists of a position alignment step, which orders the words that describe the content detected in the given image, and a fine non-autoregressive decoder, which generates elegant descriptions. Furthermore, we introduce an inference strategy that treats position information as a latent variable to guide subsequent sentence generation. Experimental results on public datasets show that our proposed model outperforms general NA captioning models while achieving performance comparable to autoregressive image captioning models with a significant speedup.
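The contrast between the two decoding styles can be illustrated with a toy sketch. This is not the paper's model: the `NEXT_WORD` table, the function names, and the hand-written position list are all hypothetical stand-ins chosen for illustration. In the autoregressive loop, each word depends on the previous one, so the number of dependent steps grows with caption length. In the position-aligned variant, each detected word is placed at an assigned position independently of the others, so the placement could in principle be done in one parallel pass.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "language model": the next word given the previous one.
NEXT_WORD: Dict[Optional[str], str] = {
    None: "dog", "dog": "runs", "runs": "on", "on": "grass", "grass": "<eos>",
}


def autoregressive_decode(max_len: int = 10) -> Tuple[List[str], int]:
    """Generate word by word; every step waits for the previous word."""
    caption: List[str] = []
    prev: Optional[str] = None
    steps = 0
    for _ in range(max_len):
        word = NEXT_WORD[prev]
        steps += 1
        if word == "<eos>":
            break
        caption.append(word)
        prev = word
    return caption, steps


def position_aligned_decode(
    detected: List[str], positions: List[int]
) -> Tuple[List[str], int]:
    """Place each detected word at its assigned position.

    The assignments do not depend on each other, so they count as a
    single dependent step here (the Python loop is only for clarity).
    In this sketch the positions are given by hand, not predicted.
    """
    caption = [""] * len(detected)
    for word, pos in zip(detected, positions):
        caption[pos] = word
    return caption, 1


ar_caption, ar_steps = autoregressive_decode()
na_caption, na_steps = position_aligned_decode(
    ["grass", "dog", "on", "runs"], [3, 0, 2, 1]
)
print(ar_caption, ar_steps)
print(na_caption, na_steps)
```

Both routes produce the same word order in this toy case; the difference is only in how many steps must run one after another.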

research
06/03/2019

Masked Non-Autoregressive Image Captioning

Existing captioning models often adopt the encoder-decoder architecture,...
research
10/11/2021

Semi-Autoregressive Image Captioning

Current state-of-the-art approaches for image captioning typically adopt...
research
05/10/2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Most image captioning models are autoregressive, i.e. they generate each...
research
01/24/2021

Fast Sequence Generation with Multi-Agent Reinforcement Learning

Autoregressive sequence generation models have achieved state-of-the-art...
research
08/22/2019

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

Diverse and accurate vision+language modeling is an important goal to re...
research
05/20/2023

DiffCap: Exploring Continuous Diffusion on Image Captioning

Current image captioning works usually focus on generating descriptions ...
research
12/06/2022

Semantic-Conditional Diffusion Networks for Image Captioning

Recent advances on text-to-image generation have witnessed the rise of d...
