Boost Image Captioning with Knowledge Reasoning

11/02/2020
by Feicheng Huang, et al.

Automatically generating a human-like description for a given image is a challenging research problem in artificial intelligence that has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in the sentence and regions in the image, but such unconstrained matching sometimes causes inharmonious alignments that degrade the quality of the generated captions. In this paper, we aim to reason about more accurate and meaningful captions. We first propose word attention to improve the correctness of visual attention when generating sequential descriptions word by word. This word attention emphasizes word importance when focusing on different regions of the input image, and makes full use of internal annotation knowledge to assist the computation of visual attention. Then, to reveal implicit intentions that cannot be expressed straightforwardly by machines, we introduce a new strategy that injects external knowledge extracted from a knowledge graph into the encoder-decoder framework to facilitate meaningful captioning. Finally, we validate our model on two freely available captioning benchmarks: the Microsoft COCO dataset and the Flickr30k dataset. The results demonstrate that our approach achieves state-of-the-art performance and outperforms many existing approaches.
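The word-attention idea described above can be pictured as a small add-on to a standard visually attentive decoder: the words generated so far are weighted by importance, and the resulting word context helps score image regions. The sketch below is a minimal, hypothetical PyTorch illustration of that interplay; the module names, dimensions, and additive fusion are assumptions for illustration only, not the authors' implementation, and the knowledge-graph injection step is not covered here.

```python
# Hypothetical sketch: visual attention over region features, modulated by a
# "word attention" over previously generated words. All names and sizes are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordGuidedVisualAttention(nn.Module):
    def __init__(self, region_dim=2048, hidden_dim=512, embed_dim=512):
        super().__init__()
        self.proj_region = nn.Linear(region_dim, hidden_dim)
        self.proj_hidden = nn.Linear(hidden_dim, hidden_dim)
        self.proj_word = nn.Linear(embed_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        self.word_score = nn.Linear(embed_dim, 1)

    def forward(self, regions, hidden, word_embeds):
        # regions:     (B, R, region_dim)  image region features
        # hidden:      (B, hidden_dim)     decoder state at the current step
        # word_embeds: (B, T, embed_dim)   embeddings of words generated so far
        # Word attention: weight previously generated words by importance.
        w_alpha = F.softmax(self.word_score(word_embeds).squeeze(-1), dim=-1)  # (B, T)
        word_ctx = torch.bmm(w_alpha.unsqueeze(1), word_embeds).squeeze(1)     # (B, embed_dim)
        # Visual attention guided by both the decoder state and the word context.
        e = self.score(torch.tanh(
            self.proj_region(regions)
            + self.proj_hidden(hidden).unsqueeze(1)
            + self.proj_word(word_ctx).unsqueeze(1)
        )).squeeze(-1)                                                          # (B, R)
        v_alpha = F.softmax(e, dim=-1)
        visual_ctx = torch.bmm(v_alpha.unsqueeze(1), regions).squeeze(1)        # (B, region_dim)
        return visual_ctx, v_alpha, w_alpha

# Example usage with random tensors (2 images, 36 regions, 5 generated words):
attn = WordGuidedVisualAttention()
ctx, v_a, w_a = attn(torch.randn(2, 36, 2048), torch.randn(2, 512), torch.randn(2, 5, 512))
```

In this reading, the word context acts as an extra query term when scoring regions, so the visual attention is steered by what has already been said rather than by the decoder state alone.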

Related research

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning (12/06/2016)
Attention-based neural encoder-decoder frameworks have been widely adopt...

Top-down Visual Saliency Guided by Captions (12/21/2016)
Neural image/video captioning models can generate accurate descriptions,...

Learning to Caption Images with Two-Stream Attention and Sentence Auto-Encoder (11/22/2019)
Automatically generating natural language descriptions from an image is ...

A Deep Decoder Structure Based on Word Embedding Regression for An Encoder-Decoder Based Model for Image Captioning (06/26/2019)
Generating textual descriptions for images has been an attractive proble...

Show, Edit and Tell: A Framework for Editing Image Captions (03/06/2020)
Most image captioning frameworks generate captions directly from images,...

MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) (02/15/2020)
While image captioning through machines requires structured learning and...

A Semi-supervised Framework for Image Captioning (11/16/2016)
State-of-the-art approaches for image captioning require supervised trai...
