Adaptively Aligned Image Captioning via Adaptive Attention Time

09/19/2019
by Lun Huang, et al.

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, an assumption that never holds. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), which can adaptively align source to target for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, image regions and caption words can be aligned adaptively in the decoding process: an image region can be mapped to an arbitrary number of caption words, while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and does not introduce any noise to the parameter gradients. AAT is also generic and can be employed by any sequence-to-sequence learning task. In this paper, we empirically show that AAT improves over state-of-the-art methods on the task of image captioning.
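To make the idea of a learned, variable number of attention steps per decoding step more concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' formulation: the function names (`attend`, `adaptive_attention`), the dot-product attention, the `tanh` state update, the sigmoid halting unit with fixed `halt_weights`, and the threshold `eps` are all hypothetical choices made for this example. It only shows how a confidence value can decide, deterministically and without sampling, how many attention steps contribute to one output.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(state, regions):
    # Dot-product attention: returns a convex combination of region vectors.
    scores = [sum(s * r for s, r in zip(state, region)) for region in regions]
    weights = softmax(scores)
    return [sum(w * region[i] for w, region in zip(weights, regions))
            for i in range(len(state))]


def adaptive_attention(state, regions, halt_weights, max_steps=4, eps=1e-2):
    # Hypothetical sketch: take attention steps until the remaining
    # "unconfident" mass drops below eps or max_steps is reached.
    remainder = 1.0                  # mass not yet assigned to any step
    mixed = [0.0] * len(state)       # weighted sum of per-step states
    betas = []                       # weight given to each attention step
    for _ in range(max_steps):
        context = attend(state, regions)
        # Assumed state update; a real model would use a learned cell here.
        state = [math.tanh(s + c) for s, c in zip(state, context)]
        # Assumed halting unit: sigmoid of a linear function of the state.
        logit = sum(w * s for w, s in zip(halt_weights, state))
        p = 1.0 / (1.0 + math.exp(-logit))
        beta = p * remainder
        betas.append(beta)
        mixed = [m + beta * s for m, s in zip(mixed, state)]
        remainder *= 1.0 - p
        if remainder < eps:
            break
    # Normalize so the step weights sum to one.
    total = sum(betas)
    betas = [b / total for b in betas]
    mixed = [m / total for m in mixed]
    return mixed, betas


regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mixed, betas = adaptive_attention([0.2, -0.1], regions, halt_weights=[3.0, 3.0])
print(len(betas), "attention step(s) used")
```

Because the output is a weighted average of the per-step states rather than a sampled choice, every operation in the sketch is a smooth function of its inputs, which is the property the abstract refers to as deterministic and differentiable. In a trained model the halting parameters would be learned rather than fixed.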

research
12/06/2016

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopt...
research
10/19/2022

Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning

Recently, attention based models have been used extensively in many sequ...
research
04/02/2020

Consistent Multiple Sequence Decoding

Sequence decoding is one of the core components of most visual-lingual m...
research
08/27/2018

A neural attention model for speech command recognition

This paper introduces a convolutional recurrent network with attention f...
research
11/10/2019

Can Neural Image Captioning be Controlled via Forced Attention?

Learned dynamic weighting of the conditioning signal (attention) has bee...
research
10/28/2018

Middle-Out Decoding

Despite being virtually ubiquitous, sequence-to-sequence models are chal...
research
11/30/2022

Uncertainty-Aware Image Captioning

It is well believed that the higher uncertainty in a word of the caption...
