Diverse and Controllable Image Captioning with Part-of-Speech Guidance

05/31/2018
by Aditya Deshpande, et al.

Automatically describing an image is an important capability for virtual assistants. Significant progress has been achieved in recent years on this task of image captioning. However, classical prediction techniques based on maximum-likelihood-trained LSTM nets do not embrace the inherent ambiguity of image captioning. To address this concern, recent variational auto-encoder and generative adversarial network based methods produce a set of captions by sampling from an abstract latent space. But this latent space has limited interpretability, and therefore a control mechanism for captioning remains an open problem. This paper proposes a captioning technique conditioned on part-of-speech. Our method provides human-interpretable control in the form of part-of-speech. Importantly, part-of-speech is a language prior, and conditioning on it provides (i) more diversity, as evaluated by counting n-grams and the novel sentences generated, and (ii) high accuracy for the diverse captions on standard captioning metrics.
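The abstract does not spell out how n-grams and novel sentences are counted, so the following is only a minimal sketch of one common way to compute such diversity statistics. It is not the authors' evaluation code. The function names `distinct_n` and `novel_fraction`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(captions: Iterable[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over a set of captions.

    Assumption: captions are tokenized by lowercasing and splitting on
    whitespace; a real evaluation may use a different tokenizer.
    """
    total = 0
    unique: Set[Tuple[str, ...]] = set()
    for caption in captions:
        grams = ngrams(caption.lower().split(), n)
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def novel_fraction(captions: Iterable[str],
                   training_captions: Iterable[str]) -> float:
    """Fraction of generated captions that do not appear in the training set.

    Assumption: two captions match if they are equal after lowercasing
    and stripping surrounding whitespace.
    """
    seen = {c.lower().strip() for c in training_captions}
    generated = list(captions)
    if not generated:
        return 0.0
    novel = sum(1 for c in generated if c.lower().strip() not in seen)
    return novel / len(generated)


if __name__ == "__main__":
    sampled = ["a dog runs", "a dog sits"]   # hypothetical sampled captions
    train = ["a dog runs"]                   # hypothetical training captions
    print(distinct_n(sampled, 1))
    print(distinct_n(sampled, 2))
    print(novel_fraction(sampled, train))
```

Under these assumptions, higher values of either statistic indicate a more varied caption set. They say nothing about accuracy, which the abstract measures separately with standard captioning metrics.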

