Informative Image Captioning with External Sources of Information

06/20/2019
by Sanqiang Zhao, et al.

An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained to generate captions that contain only common object names, and thus fall short on an important "informativeness" dimension. We present a mechanism for integrating image information with fine-grained labels (assumed to be generated by some upstream models) into a caption that describes the image in a fluent and informative manner. We introduce a multimodal, multi-encoder model based on the Transformer that ingests both image features and multiple sources of entity labels. We demonstrate that we can learn to control the appearance of these entity labels in the output, resulting in captions that are both fluent and informative.
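To make the multi-encoder idea concrete, here is a minimal sketch under stated assumptions. It is a hypothetical illustration, not the authors' model. It assumes each input source (image features and each label source) has already been encoded into a matrix of d-dimensional vectors. It combines them by letting one decoder position attend to each source separately with scaled dot-product attention, then concatenating the per-source context vectors. The function names `attention` and `multi_source_step` and the concatenation strategy are illustrative choices.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row now sums to 1
    return weights @ values, weights

def multi_source_step(decoder_state, memories):
    """Hypothetical decoder step: attend to each encoded source separately,
    then concatenate the per-source context vectors (an assumed fusion rule)."""
    contexts, all_weights = [], []
    for memory in memories:
        ctx, w = attention(decoder_state, memory, memory)
        contexts.append(ctx)
        all_weights.append(w)
    return np.concatenate(contexts, axis=-1), all_weights

# Toy usage with random stand-ins for already-encoded inputs.
rng = np.random.default_rng(0)
d = 8
state = rng.normal(size=(1, d))     # one decoding position
image = rng.normal(size=(5, d))     # e.g. 5 image region features
labels_a = rng.normal(size=(3, d))  # e.g. labels from one upstream model
labels_b = rng.normal(size=(2, d))  # e.g. labels from another upstream model
ctx, weights = multi_source_step(state, [image, labels_a, labels_b])
print(ctx.shape)  # (1, 24): three 8-dim contexts concatenated
```

Keeping one attention distribution per source makes it possible to inspect how strongly each label source is consulted at a given step. The paper's actual mechanism for controlling which entity labels appear in the caption may differ from this sketch.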
