Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

11/13/2018
by Shiyang Yan, et al.

Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with attention mechanisms, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem in RNN-based supervised training for language problems. In addition, by having the discriminator in the GAN framework automatically measure the consistency between the generated caption and the image content, and by optimizing with RL, we make the final generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference.
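To make the idea of attending over both global and local features concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. It assumes dot-product scoring, a single query vector standing in for the LSTM hidden state, and a second attention step that weighs the global context against the local one. The names `attend` and `hierarchical_attend` and all feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Soft attention: average the feature vectors, weighted by
    softmax(query . feature). Dot-product scoring is an assumption."""
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights


def hierarchical_attend(query, global_feats, local_feats):
    """Two-level attention (illustrative only).

    Level 1: attend separately over global CNN features and
             local object features.
    Level 2: attend over the two resulting context vectors.
    """
    g_ctx, g_w = attend(query, global_feats)
    l_ctx, l_w = attend(query, local_feats)
    fused, level_w = attend(query, [g_ctx, l_ctx])
    return fused, level_w, g_w, l_w


# Toy inputs: 2-dimensional features, made up for illustration.
query = [1.0, 0.0]                                  # stand-in for the decoder state
global_feats = [[1.0, 0.0], [0.0, 1.0]]             # e.g. CNN grid cells
local_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # e.g. detected objects

fused, level_w, g_w, l_w = hierarchical_attend(query, global_feats, local_feats)
print("fused context:", fused)
print("global vs. local weights:", level_w)
```

In a real captioning model the query would change at every decoding step, so the weights over image regions and over the two feature levels would be recomputed for each generated word.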


research
06/16/2022

Image Captioning based on Feature Refinement and Reflective Decoding

Automatically generating a description of an image in natural language i...
research
05/18/2018

Improving Image Captioning with Conditional Generative Adversarial Nets

In this paper, we propose a novel conditional generative adversarial net...
research
12/30/2022

On the Interpretability of Attention Networks

Attention mechanisms form a core component of several successful deep le...
research
05/09/2017

CHAM: action recognition using convolutional hierarchical attention model

Recently, the soft attention mechanism, which was originally proposed in...
research
06/07/2019

Figure Captioning with Reasoning and Sequence-Level Training

Figures, such as bar charts, pie charts, and line plots, are widely used...
research
06/21/2020

Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation

Recently, several approaches have been proposed to solve language genera...
research
10/30/2018

Gated Hierarchical Attention for Image Captioning

Attention modules connecting encoder and decoders have been widely appli...
