Language-Driven Region Pointer Advancement for Controllable Image Captioning

11/30/2020
by Annika Lindh, et al.

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.
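
The NEXT-token idea can be illustrated with a small sketch. This is not the authors' implementation: the token spelling `<NEXT>`, the helper names, and the position-matching definition of precision and recall are all assumptions made for illustration. The sketch shows how a decoded sequence containing NEXT-tokens maps each word to a region, and how predicted advancement timing might be compared against a reference.

```python
NEXT = "<NEXT>"  # hypothetical spelling of the pointer-advancement token


def align_tokens_to_regions(tokens, regions):
    """Assign each word to the region the pointer is on when it is emitted.

    Assumes `regions` is non-empty. NEXT tokens advance the pointer and are
    dropped from the returned caption words.
    """
    pointer = 0
    words, word_regions = [], []
    for tok in tokens:
        if tok == NEXT:
            # Advance, but stay on the last region once the list is exhausted.
            pointer = min(pointer + 1, len(regions) - 1)
            continue
        words.append(tok)
        word_regions.append(regions[pointer])
    return words, word_regions


def advancement_positions(tokens):
    """Return the set of word indices before which a NEXT token occurs."""
    positions, n_words = set(), 0
    for tok in tokens:
        if tok == NEXT:
            positions.add(n_words)
        else:
            n_words += 1
    return positions


def timing_precision_recall(predicted, reference):
    """Precision and recall of predicted advancement positions (assumed metric)."""
    pred = advancement_positions(predicted)
    ref = advancement_positions(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Illustrative data only, not taken from Flickr30k Entities.
regions = ["man", "frisbee", "park"]
reference = ["a", "man", NEXT, "throws", "a", "frisbee", NEXT, "in", "a", "park"]
predicted = ["a", "man", NEXT, "throws", NEXT, "a", "frisbee", NEXT, "in", "a", "park"]

words, word_regions = align_tokens_to_regions(reference, regions)
precision, recall = timing_precision_recall(predicted, reference)
```

In this toy case the prediction contains one extra advancement, so every reference advancement is recovered while precision drops. That is the same direction as the gap between the reported recall and precision, although the paper's exact matching criterion may differ from the one assumed here.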
