SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

06/06/2023
by Zhishen Yang, et al.

In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move models' understanding of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task in which models need to utilize knowledge embedded across modalities for caption generation. To this end, we extend the large-scale SciCap dataset <cit.> to SciCap+, which includes mention-paragraphs (paragraphs mentioning figures) and OCR tokens. We then conduct experiments with the M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts standard automatic image captioning evaluation scores compared to the figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset will be publicly available at https://github.com/ZhishenYang/scientific_figure_captioning_dataset


