Generating image captions with external encyclopedic knowledge

10/10/2022
by   Sofia Nikiforova, et al.
0

Accurately reporting what objects are depicted in an image is largely a solved problem in automatic caption generation. The next big challenge on the way to truly humanlike captioning is being able to incorporate the context of the image and related real world knowledge. We tackle this challenge by creating an end-to-end caption generation system that makes extensive use of image-specific encyclopedic data. Our approach includes a novel way of using image location to identify relevant open-domain facts in an external knowledge base, with their subsequent integration into the captioning pipeline at both the encoding and decoding stages. Our system is trained and tested on a new dataset with naturally produced knowledge-rich captions, and achieves significant improvements over multiple baselines. We empirically demonstrate that our approach is effective for generating contextualized captions with encyclopedic knowledge that is both factually accurate and relevant to the image.

READ FULL TEXT

page 1

page 14

research
08/09/2015

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic ca...
research
03/20/2021

3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model

In this paper, we build a multi-style generative model for stylish image...
research
06/06/2023

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

In scholarly documents, figures provide a straightforward way of communi...
research
07/15/2022

LineCap: Line Charts for Data Visualization Captioning Models

Data visualization captions help readers understand the purpose of a vis...
research
04/17/2020

Transform and Tell: Entity-Aware News Image Captioning

We propose an end-to-end model which generates captions for images embed...
research
03/14/2023

Implicit and Explicit Commonsense for Multi-sentence Video Captioning

Existing dense or paragraph video captioning approaches rely on holistic...
research
08/20/2018

Deep Multimodal Image-Repurposing Detection

Nefarious actors on social media and other platforms often spread rumors...

Please sign up or login with your details

Forgot password? Click here to reset