MedICaT: A Dataset of Medical Images, Captions, and Textual References

10/12/2020
by Sanjay Subramanian, et al.

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure-to-subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at https://github.com/allenai/medicat.


