Deep Reinforcement Learning-based Image Captioning with Embedding Reward

04/12/2017
by   Zhou Ren, et al.
0

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.

READ FULL TEXT
research
09/13/2018

Image Captioning based on Deep Reinforcement Learning

Recently it has shown that the policy-gradient methods for reinforcement...
research
11/17/2018

Improving Automatic Source Code Summarization via Deep Reinforcement Learning

Code summarization provides a high level natural language description of...
research
10/31/2019

Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

Most RNN-based image captioning models receive supervision on the output...
research
10/05/2020

A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

We deal with the problem of generating textual captions from optical rem...
research
02/14/2020

ResCap V1: Deep Residual Learning Based Image Captioning

Image Captioning alludes to the process of generating text description f...
research
12/19/2018

Generating Diverse and Meaningful Captions

Image Captioning is a task that requires models to acquire a multi-modal...
research
04/15/2019

Self-critical n-step Training for Image Captioning

Existing methods for image captioning are usually trained by cross entro...

Please sign up or login with your details

Forgot password? Click here to reset