Uncertainty-Aware Image Captioning

11/30/2022
by   Zhengcong Fei, et al.
0

It is well believed that the higher uncertainty in a word of the caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually consider the generation of all words in a sentence sequentially and equally. In this paper, we propose an uncertainty-aware image captioning framework, which parallelly and iteratively operates insertion of discontinuous candidate words between existing words from easy to difficult until converged. We hypothesize that high-uncertainty words in a sentence need more prior information to make a correct decision and should be produced at a later stage. The resulting non-autoregressive hierarchy makes the caption generation explainable and intuitive. Specifically, we utilize an image-conditioned bag-of-word model to measure the word uncertainty and apply a dynamic programming algorithm to construct the training pairs. During inference, we devise an uncertainty-adaptive parallel beam search technique that yields an empirically logarithmic time complexity. Extensive experiments on the MS COCO benchmark reveal that our approach outperforms the strong baseline and related methods on both captioning quality as well as decoding speed.

READ FULL TEXT
research
05/10/2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Most image captioning models are autoregressive, i.e. they generate each...
research
10/11/2021

Semi-Autoregressive Image Captioning

Current state-of-the-art approaches for image captioning typically adopt...
research
06/17/2021

Semi-Autoregressive Transformer for Image Captioning

Current state-of-the-art image captioning models adopt autoregressive de...
research
07/22/2022

Efficient Modeling of Future Context for Image Captioning

Existing approaches to image captioning usually generate the sentence wo...
research
09/19/2019

Adaptively Aligned Image Captioning via Adaptive Attention Time

Recent neural models for image captioning usually employs an encoder-dec...
research
08/05/2021

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. D...
research
05/24/2017

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

We develop the first approximate inference algorithm for 1-Best (and M-B...

Please sign up or login with your details

Forgot password? Click here to reset