Guiding Attention using Partial-Order Relationships for Image Captioning

04/15/2022
by Murad Popattia, et al.

The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions of images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this line of work with a guided attention network that exploits the relationship between the visual scene and its text descriptions. It combines spatial features from the image, high-level information from the topics, and temporal context from caption generation, all embedded together in an ordered embedding space. This embedding space is trained with a pairwise ranking objective, which lets similar images, topics, and captions in the shared semantic space maintain a partial order in the visual-semantic hierarchy and hence helps the model produce more visually accurate captions. Experimental results on the MSCOCO dataset show that our approach is competitive with many state-of-the-art models on various evaluation metrics.
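To make the pairwise ranking objective over an ordered embedding space more concrete, here is a minimal NumPy sketch. It is not taken from the paper: it assumes the common order-embedding formulation in which a more specific item (an image) should dominate a more general one (a caption) coordinate-wise, and it covers image-caption pairs only, leaving out topics. The function names, the margin value, and the choice to sum over all in-batch contrastive pairs are illustrative assumptions.

```python
import numpy as np


def order_violation(specific, general):
    """Assumed penalty E(x, y) = ||max(0, y - x)||^2.

    It is zero exactly when `specific` is >= `general` in every coordinate,
    i.e. when the partial order between the two embeddings is respected.
    """
    return np.sum(np.maximum(0.0, general - specific) ** 2, axis=-1)


def pairwise_ranking_loss(images, captions, margin=0.05):
    """Hinge-based ranking loss over a batch of matching pairs.

    images, captions: arrays of shape (n, d); row k of each is a true pair.
    margin: hypothetical value chosen for illustration only.
    """
    # scores[a, b] = -E(image_a, caption_b); higher means better ordered.
    scores = -order_violation(images[:, None, :], captions[None, :, :])
    positive = np.diag(scores)

    # For each image (row), every other caption is a contrastive example.
    cost_captions = np.maximum(0.0, margin - positive[:, None] + scores)
    # For each caption (column), every other image is a contrastive example.
    cost_images = np.maximum(0.0, margin - positive[None, :] + scores)

    # Exclude the true pairs, which sit on the diagonal.
    off_diagonal = ~np.eye(len(images), dtype=bool)
    return float(cost_captions[off_diagonal].sum() + cost_images[off_diagonal].sum())


if __name__ == "__main__":
    images = np.array([[1.0, 0.0], [0.0, 1.0]])
    aligned = np.array([[0.5, 0.0], [0.0, 0.5]])
    swapped = aligned[::-1]
    print(pairwise_ranking_loss(images, aligned))
    print(pairwise_ranking_loss(images, swapped))
```

Under these assumptions, a batch whose true pairs respect the order and whose mismatched pairs violate it by more than the margin incurs no loss, while a batch with the captions swapped is penalized.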

research
12/12/2016

Text-guided Attention Model for Image Captioning

Visual attention plays an important role to understand images and demons...
research
08/08/2019

Image Captioning using Facial Expression and Attention

Benefiting from advances in machine vision and natural language processi...
research
05/29/2019

Vision-to-Language Tasks Based on Attributes and Attention Mechanism

Vision-to-language tasks aim to integrate computer vision and natural la...
research
11/19/2015

Order-Embeddings of Images and Language

Hypernymy, textual entailment, and image captioning can be seen as speci...
research
08/25/2019

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings

Understanding images without explicit supervision has become an importan...
research
11/22/2019

TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning

Image captioning can be improved if the structure of the graphical repre...
research
09/16/2022

Belief Revision based Caption Re-ranker with Visual Semantic Information

In this work, we focus on improving the captions generated by image-capt...
