Gated Hierarchical Attention for Image Captioning

10/30/2018
by Qingzhong Wang, et al.

Attention modules connecting encoders and decoders have been widely applied in object recognition, image captioning, visual question answering and neural machine translation, and they significantly improve performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our model employs a CNN as the decoder, which is able to learn different concepts at different layers, and different concepts correspond to different areas of an image. We therefore develop the GHA, in which low-level concepts are merged into high-level concepts while low-level attended features are simultaneously passed to the top layer to make predictions. Our GHA significantly improves on a model that applies only one level of attention: for example, the CIDEr score increases from 0.923 to 0.999, which is comparable to state-of-the-art models that employ attribute boosting and reinforcement learning (RL). We also conduct extensive experiments to analyze the CNN decoder and our proposed GHA. We find that deeper decoders do not obtain better performance, and that as the convolutional decoder becomes deeper the model is likely to collapse during training.
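The idea of merging low-level attended features into higher levels through a gate can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not specify. The dot-product attention, the scalar sigmoid gate, and all names here (`attend`, `gated_hierarchical_attention`, `gates`) are assumptions chosen only to make the bottom-up flow concrete.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attend(query, regions):
    """Dot-product attention over image regions (an assumed scoring function).

    Returns the attention-weighted sum of the region feature vectors.
    """
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def gated_hierarchical_attention(layer_states, regions, gates):
    """Merge attended features bottom-up across decoder layers.

    layer_states: one query vector per decoder layer, lowest layer first.
    gates: hypothetical (weight, bias) pairs, one per layer above the first.
    """
    merged = attend(layer_states[0], regions)
    for state, (w, b) in zip(layer_states[1:], gates):
        context = attend(state, regions)
        # Assumed scalar gate: how much of the lower-level feature passes up.
        g = sigmoid(w * dot(merged, context) + b)
        merged = [g * lo + (1.0 - g) * hi for lo, hi in zip(merged, context)]
    return merged


# Two image regions and two decoder layers that attend to different regions.
regions = [[1.0, 0.0], [0.0, 1.0]]
layer_states = [[10.0, 0.0], [0.0, 10.0]]
top_feature = gated_hierarchical_attention(layer_states, regions, [(0.0, 0.0)])
```

In this toy setup the lower layer attends to the first region and the upper layer to the second, so the top-level feature carries information from both, which is the behavior a single level of attention would not provide.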


research
10/09/2018

Image Captioning as Neural Machine Translation Task in SOCKEYE

Image captioning is an interdisciplinary research problem that stands be...
research
06/21/2021

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Existing research for image captioning usually represents an image using...
research
03/09/2016

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Much recent progress in Vision-to-Language problems has been achieved th...
research
06/03/2015

What value do explicit high level concepts have in vision to language problems?

Much of the recent progress in Vision-to-Language (V2L) problems has bee...
research
11/13/2018

Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

Automatically generating the descriptions of an image, i.e., image capti...
research
06/16/2019

Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding

Image captioning has attracted considerable attention in recent years. H...
research
10/20/2020

Bayesian Attention Modules

Attention modules, as simple and effective tools, have not only enabled ...
