Symlink: A New Dataset for Scientific Symbol-Description Linking

04/26/2022
by Viet Dac Lai, et al.

Mathematical symbols and their descriptions appear in various forms, often across document section boundaries and without explicit markup. In this paper, we present a new large-scale dataset that emphasizes extracting symbols and their descriptions from scientific documents. Symlink annotates scientific papers from five domains: computer science, biology, physics, mathematics, and economics. Our experiments on Symlink show that the symbol-description linking task remains challenging for existing models, and they call for further research in this area. We will publicly release Symlink to facilitate future research.
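
To make the task concrete, here is a toy sketch of symbol-description linking. It is not the Symlink annotation schema or any model from the paper; the abstract describes neither. The names `Link`, `PATTERN`, and `link_symbols` are hypothetical, and the assumption is that symbols are written as `$...$` and introduced by a cue word ("denotes", "is", "represents") in the same sentence. A cue-phrase heuristic like this only catches explicit, local definitions. The abstract points out that real symbols and descriptions can sit across section boundaries with no markup, and those cases are what make the task hard.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """A hypothetical (symbol, description) pair; not the Symlink data format."""
    symbol: str
    description: str


# Assumed cue pattern: "$<symbol>$ denotes|is|represents <description>",
# where the description runs until the next '.', ',', ';' or '$'.
PATTERN = re.compile(
    r"\$(?P<sym>[^$]+)\$\s+(?:denotes|is|represents)\s+(?P<desc>[^.,;$]+)"
)


def link_symbols(text: str) -> List[Link]:
    """Return every same-sentence symbol-description pair the cue pattern finds."""
    return [
        Link(m.group("sym").strip(), m.group("desc").strip())
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Here $x$ denotes the input vector, and $W$ is the weight matrix."
    for link in link_symbols(sentence):
        print(link.symbol, "->", link.description)
```

For the sample sentence, this links `x` to "the input vector" and `W` to "the weight matrix". It would miss a description stated in a different paragraph or section from the symbol.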

research
04/17/2023

What Makes a Good Dataset for Symbol Description Reading?

The usage of mathematical formulas as concise representations of a docum...
research
05/27/2019

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

In this paper, we present a new dataset for Form Understanding in Noisy ...
research
05/14/2022

ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

Systems that can automatically define unfamiliar terms hold the promise ...
research
06/08/2022

Network Report: A Structured Description for Network Datasets

The rapid development of network science and technologies depends on sha...
research
06/13/2017

A Supervised Approach to Extractive Summarisation of Scientific Papers

Automatic summarisation is a popular approach to reduce a document to it...
research
12/16/2020

MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification

We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeN...
research
04/18/2021

SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

Determining coreference of concept mentions across multiple documents is...
