Look, Read and Enrich. Learning from Scientific Figures and their Captions

Compared to natural images, understanding scientific figures is particularly hard for machines. However, there is a valuable source of information in scientific literature that has so far remained untapped: the correspondence between a figure and its caption. In this paper we investigate what can be learnt by looking at a large number of figures and reading their captions, and introduce a figure-caption correspondence learning task that makes use of our observations. We show that visual and language networks trained with no supervision other than pairs of unconstrained figures and captions successfully solve this task. We also show that transferring lexical and semantic knowledge from a knowledge graph significantly enriches the resulting features. Finally, we demonstrate the positive impact of such features in other tasks involving scientific text and figures, such as multi-modal classification and machine comprehension for question answering, outperforming supervised baselines and ad-hoc approaches.
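To make the figure-caption correspondence task more concrete, the following is a minimal sketch of how training examples could be derived from figure-caption pairs alone. It assumes a binary formulation (label 1 for a figure with its own caption, label 0 for a figure with a caption drawn from a different figure), which the abstract does not spell out. The function name `build_correspondence_pairs`, the figure identifiers, and the toy captions are all hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]          # (figure identifier, caption text)
Example = Tuple[str, str, int]  # (figure identifier, caption text, label)


def build_correspondence_pairs(pairs: List[Pair], seed: int = 0) -> List[Example]:
    """Build a balanced set of matching and mismatching figure-caption examples.

    Assumption: mismatching examples are created by pairing each figure with
    the caption of a different, randomly chosen figure. If two figures share
    an identical caption, such an example may still be a true match.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two figure-caption pairs")
    rng = random.Random(seed)
    examples: List[Example] = []
    for i, (figure, caption) in enumerate(pairs):
        # Positive example: the figure with its own caption.
        examples.append((figure, caption, 1))
        # Negative example: pick an index j != i uniformly at random.
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        examples.append((figure, pairs[j][1], 0))
    rng.shuffle(examples)
    return examples


# Toy data, purely illustrative.
toy_pairs = [
    ("fig_001.png", "Accuracy as a function of training epochs."),
    ("fig_002.png", "Architecture of the two-branch network."),
    ("fig_003.png", "Distribution of figure types in the corpus."),
]
examples = build_correspondence_pairs(toy_pairs, seed=42)
```

A visual network and a language network could then be trained jointly to predict the label of each example, so that the only supervision is the pairing already present in the literature.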


