Text-guided Attention Model for Image Captioning

12/12/2016
by Jonghwan Mun, et al.

Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of images. Recent studies also show that language associated with an image can steer visual attention in the scene during human cognitive processing. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves captions associated with each image from the training data and uses them to learn attention on visual features. Our attention model makes it possible to describe scenes in detail by effectively distinguishing small or easily confused objects. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics.
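The general idea of letting text steer attention over visual features can be illustrated with a minimal sketch. It is not the authors' model, whose details the abstract does not give. It assumes that each image region is a feature vector, that the retrieved captions have already been encoded into a single guidance vector, and that relevance is scored with a plain dot product followed by a softmax. The names `text_guided_attention`, `region_features`, and `guidance_vector` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_attention(region_features, guidance_vector):
    """Weight image regions by similarity to a caption-derived vector.

    Assumption: dot-product scoring stands in for whatever learned
    scoring function a real model would use.
    """
    scores = [dot(region, guidance_vector) for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Attended context: the attention-weighted sum of region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy data (made up): three regions with 2-D features and one guidance vector.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
guidance = [2.0, 0.0]
weights, context = text_guided_attention(regions, guidance)
```

In this toy example the first region aligns best with the guidance vector, so it receives the largest weight and dominates the attended context. A captioning decoder would then condition on that context.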


