MedICaT: A Dataset of Medical Images, Captions, and Textual References

by   Sanjay Subramanian, et al.

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure-to-subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at
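To make the dataset's structure concrete, the following is a minimal sketch of what a compound-figure record with aligned subcaptions might look like. All class and field names here are hypothetical illustrations, not the dataset's actual schema or release format:

```python
from dataclasses import dataclass, field

@dataclass
class SubFigure:
    """One panel of a compound figure (hypothetical layout)."""
    label: str            # panel label, e.g. "A"
    bbox: tuple           # (x, y, width, height) within the compound figure
    subcaption: str       # caption span aligned to this panel

@dataclass
class FigureRecord:
    """One figure with its caption and inline references (hypothetical layout)."""
    figure_id: str
    caption: str
    inline_refs: list = field(default_factory=list)  # sentences in the paper body citing the figure
    subfigures: list = field(default_factory=list)

def align_subcaptions(record: FigureRecord):
    """Return (panel label, subcaption) pairs for a compound figure."""
    return [(sf.label, sf.subcaption) for sf in record.subfigures]

# Toy example of the subfigure-to-subcaption alignment the task targets.
record = FigureRecord(
    figure_id="fig_001",
    caption="(A) CT scan of the chest. (B) Follow-up MRI.",
    inline_refs=["As shown in Figure 1A, the lesion is clearly visible."],
    subfigures=[
        SubFigure("A", (0, 0, 100, 100), "CT scan of the chest."),
        SubFigure("B", (100, 0, 100, 100), "Follow-up MRI."),
    ],
)
pairs = align_subcaptions(record)
```

Representing panels and caption spans separately like this is what makes the alignment task well-posed: a model must map each subcaption span to the correct panel rather than treat the caption as a single undivided string.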



