Can Neural Image Captioning be Controlled via Forced Attention?

11/10/2019
by   Philipp Sadler, et al.

Learned dynamic weighting of the conditioning signal (attention) has been shown to improve neural language generation in a variety of settings. The weights applied when generating a particular output sequence have also been viewed as providing potentially explanatory insight into the internal workings of the generator. In this paper, we reverse the direction of this connection and ask whether, by controlling the model's attention, we can control its output. Specifically, we take a standard neural image captioning model that uses attention, and fix the attention to pre-determined areas in the image. We evaluate whether the resulting output is more likely to mention the class of the object in that area than the normally generated caption. We introduce three effective methods to control the attention and find that they produce the expected results in up to 28.56% of cases.
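The idea of "forced attention" can be illustrated with a short sketch: during decoding, the soft-attention weights the model would normally compute are replaced with fixed weights concentrated on a pre-determined image region, and generation otherwise proceeds as usual. The sketch below assumes a Show, Attend and Tell style decoder over a grid of CNN features; the decoder API (init_hidden, step), the vocab object, and the 7x7 grid size are hypothetical placeholders for illustration, not the authors' actual code.

# Sketch of forced-attention decoding for a soft-attention captioning model.
# Assumptions (hypothetical, not from the paper's code): `decoder` exposes
# init_hidden() and step(); `vocab` maps tokens to ids and has an `itos` list.
import torch

def decode_with_forced_attention(decoder, features, region_mask, vocab, max_len=20):
    """Generate a caption while overriding the learned attention weights.

    features:    (49, d) CNN feature vectors for a 7x7 spatial grid
    region_mask: (49,) binary mask marking the pre-determined image area
    """
    # Replace the model's attention with uniform weights over the chosen region.
    forced_alpha = region_mask.float()
    forced_alpha = forced_alpha / forced_alpha.sum()

    h, c = decoder.init_hidden(features)              # hypothetical initializer
    word = torch.tensor([vocab['<start>']])
    caption = []
    for _ in range(max_len):
        # A normal decoding step would compute alpha from (h, features);
        # here we ignore it and build the context from the forced weights.
        context = (forced_alpha.unsqueeze(1) * features).sum(dim=0)
        logits, h, c = decoder.step(word, context, h, c)   # hypothetical API
        word = logits.argmax(dim=-1)
        if word.item() == vocab['<end>']:
            break
        caption.append(vocab.itos[word.item()])
    return caption

Normalizing the region mask into a probability distribution keeps the forced weights on the same scale as ordinary soft-attention weights, so the rest of the decoder can be reused unchanged.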

