Attention on Attention for Image Captioning

08/19/2019
by   Lun Huang, et al.

Attention mechanisms are widely used in current encoder/decoder frameworks for image captioning, where a weighted average over the encoded vectors is generated at each time step to guide the caption decoding process. However, the decoder has little idea of whether or how well the attended vector and the given attention query are related, which can lead it to produce misleading results. In this paper, we propose an Attention on Attention (AoA) module, which extends conventional attention mechanisms to determine the relevance between attention results and queries. AoA first generates an information vector and an attention gate from the attention result and the current context, then applies a second attention by multiplying them element-wise, and finally obtains the attended information, i.e., the expected useful knowledge. We apply AoA to both the encoder and the decoder of our image captioning model, which we name the AoA Network (AoANet). Experiments show that AoANet outperforms all previously published methods and achieves a new state-of-the-art performance of 129.8 CIDEr-D on the MS COCO Karpathy offline test split and 129.6 CIDEr-D (C40) on the official online testing server. Code is available at https://github.com/husthuaan/AoANet.
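
To make the AoA computation concrete, below is a minimal sketch in PyTorch, written for this summary rather than taken from the authors' released code: the query and the attention result are concatenated, projected into an information vector and a sigmoid attention gate, and the two are multiplied element-wise. Layer names, dimensions, and the use of plain dot-product attention are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AoA(nn.Module):
    """Attention on Attention: gates the attention result using the query.

    Minimal sketch based on the abstract, not the released AoANet code.
    """
    def __init__(self, dim):
        super().__init__()
        # Linear maps producing the information vector and the attention gate
        self.info = nn.Linear(2 * dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query, keys, values):
        # Conventional scaled dot-product attention (any attention module could be plugged in here)
        scores = torch.matmul(query, keys.transpose(-2, -1)) / keys.size(-1) ** 0.5
        attended = torch.matmul(F.softmax(scores, dim=-1), values)

        # Information vector and sigmoid gate, both conditioned on the query and the attention result
        qv = torch.cat([query, attended], dim=-1)
        i = self.info(qv)
        g = torch.sigmoid(self.gate(qv))

        # "Attention on attention": the gate filters the information vector element-wise
        return g * i

In AoANet this gated output replaces the raw attended vector in both the encoder (refining object features) and the decoder (guiding word prediction), so the decoder only receives information whose relevance to the query has already been assessed.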


