Towards Diverse and Natural Image Descriptions via a Conditional GAN

03/17/2017
by Bo Dai, et al.

Despite substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice: maximizing the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach that aims to improve naturalness and diversity, two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. Training a sequence generator in this setting is nontrivial, because the words it produces are discrete samples and the evaluator's score cannot be backpropagated through them directly. We overcome this difficulty with Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
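
As a rough illustration of how a policy gradient lets an evaluator's score train a generator of discrete tokens, the following is a minimal REINFORCE-style sketch in plain Python. It is not the paper's model or training procedure. The per-position logit table `theta`, the `toy_evaluator` function, the `preferred` token list, and all hyperparameters are assumptions made up for this example. It also uses a single reward for the whole sequence, so it omits the early, per-step feedback the abstract mentions.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(theta, rng):
    """Sample one token per position from the toy generator.

    theta[t] holds the logits over the vocabulary at position t
    (a hypothetical stand-in for an image-conditioned RNN generator).
    """
    tokens = []
    for logits in theta:
        probs = softmax(logits)
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def toy_evaluator(tokens, preferred):
    """Hypothetical evaluator: fraction of positions matching `preferred`.

    In a CGAN setup this score would come from a learned evaluator network.
    """
    return sum(t == p for t, p in zip(tokens, preferred)) / len(preferred)


def reinforce_step(theta, tokens, reward, baseline, lr):
    """Apply one REINFORCE update in place.

    For a softmax policy, d log p(a) / d logit_v = 1[v == a] - p(v).
    The update scales that gradient by (reward - baseline).
    """
    advantage = reward - baseline
    for t, token in enumerate(tokens):
        probs = softmax(theta[t])
        for v in range(len(probs)):
            grad_logp = (1.0 if v == token else 0.0) - probs[v]
            theta[t][v] += lr * advantage * grad_logp


def train(preferred, vocab_size, steps=200, lr=0.5, seed=0):
    """Run the sample / score / update loop and return the logit table."""
    rng = random.Random(seed)
    theta = [[0.0] * vocab_size for _ in preferred]
    baseline = 0.0
    for _ in range(steps):
        tokens = sample_sequence(theta, rng)
        reward = toy_evaluator(tokens, preferred)
        reinforce_step(theta, tokens, reward, baseline, lr)
        # A running-average baseline reduces the variance of the update.
        baseline = 0.9 * baseline + 0.1 * reward
    return theta
```

The key point is that the reward only multiplies the gradient of the log-probability of the sampled tokens, so the evaluator never needs to be differentiated through the sampling step. A call such as `train(preferred=[2, 0, 1], vocab_size=4)` runs the loop on a three-token toy target.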


research
10/13/2021

Diverse Audio Captioning via Adversarial Training

Audio captioning aims at generating natural language descriptions for au...
research
12/05/2022

Towards Generating Diverse Audio Captions via Adversarial Training

Automated audio captioning is a cross-modal translation task for describ...
research
10/31/2019

Can adversarial training learn image captioning ?

Recently, generative adversarial networks (GAN) have gathered a lot of i...
research
10/26/2019

Diverse Video Captioning Through Latent Variable Expansion with Conditional GAN

Automatically describing video content with text description is challeng...
research
11/21/2019

Reinforcing an Image Caption Generator Using Off-Line Human Feedback

Human ratings are currently the most accurate way to assess the quality ...
