Weakly Supervised Content Selection for Improved Image Captioning

09/10/2020
by Khyathi Raghavi Chandu, et al.

Image captioning involves identifying semantic concepts in a scene and describing them in fluent natural language. Recent approaches do not explicitly model these semantic concepts and train the model only for the end goal of caption generation. Such models lack interpretability and controllability, primarily due to sub-optimal content selection. We address this problem by breaking the captioning task down into two simpler, more manageable and more controllable tasks: skeleton prediction and skeleton-based caption generation. We approach the former as a weakly supervised task, using a simple off-the-shelf language syntax parser and avoiding the need for additional human annotations; the latter uses a supervised learning approach. We investigate three methods of conditioning the caption on the skeleton: in the encoder, in the decoder, and in both. Our compositional model generates significantly better captions on out-of-domain test images, as judged by human annotators. Additionally, we demonstrate that the English skeleton transfers effectively to other languages, including French, Italian, German, Spanish and Hindi. This compositional formulation of captioning shows the potential of unpaired image captioning, thereby reducing the dependence on expensive image-caption pairs. Furthermore, we investigate the use of skeletons as a knob to control certain properties of the generated caption, such as length, content, and gender expression.


research
10/23/2018

A Neural Compositional Paradigm for Image Captioning

Mainstream captioning models often follow a sequential structure to gene...
research
09/10/2019

Compositional Generalization in Image Captioning

Image captioning models are usually evaluated on their ability to descri...
research
05/31/2018

Diverse and Controllable Image Captioning with Part-of-Speech Guidance

Automatically describing an image is an important capability for virtual...
research
03/07/2022

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

The goal of unpaired image captioning (UIC) is to describe images withou...
research
03/27/2018

Neural Baby Talk

We introduce a novel framework for image captioning that can produce nat...
research
11/30/2020

Language-Driven Region Pointer Advancement for Controllable Image Captioning

Controllable Image Captioning is a recent sub-field in the multi-modal t...
