Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

09/11/2017
by   Jiuxiang Gu, et al.
0

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders each of which operates on the output of the previous stage, producing increasingly refined image descriptions. Our proposed learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective function that enforces intermediate supervisions. Particularly, we optimize our model with a reinforcement learning approach which utilizes the output of each intermediate decoder's test-time inference algorithm as well as the output of its preceding decoder to normalize the rewards, which simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that our approach can achieve the state-of-the-art performance.

READ FULL TEXT
research
02/14/2021

Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model

Image Captioning is an arduous task of producing syntactically and seman...
research
03/24/2020

Learning Compact Reward for Image Captioning

Adversarial learning has shown its advances in generating natural and di...
research
12/02/2016

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcemen...
research
05/18/2018

Improving Image Captioning with Conditional Generative Adversarial Nets

In this paper, we propose a novel conditional generative adversarial net...
research
01/30/2016

Convolutional Pose Machines

Pose Machines provide a sequential prediction framework for learning ric...
research
06/14/2022

Measuring Representational Harms in Image Captioning

Previous work has largely considered the fairness of image captioning sy...
research
01/07/2022

Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping

We propose a novel optimization framework that crops a given image based...

Please sign up or login with your details

Forgot password? Click here to reset