DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

07/03/2022
by Tanishq Gupta, et al.

A crucial component in curating a knowledge base (KB) for a scientific domain is information extraction from tables in the domain's published articles. Tables carry important information (often numeric), which must be adequately extracted for a comprehensive machine understanding of an article. Existing table extractors assume prior knowledge of table structure and format, which may not be known for scientific tables. We study a specific and challenging table extraction problem: extracting compositions of materials (e.g., glasses, alloys). We first observe that materials science researchers organize similar compositions in a wide variety of table styles, necessitating an intelligent model for table understanding and composition extraction. Consequently, we define this novel task as a challenge for the ML community and create a training dataset comprising 4,408 distantly supervised tables, along with 1,475 manually annotated dev and test tables. We also present DiSCoMaT, a strong baseline geared towards this specific task, which combines multiple graph neural networks with several task-specific regular expressions, features, and constraints. We show that DiSCoMaT outperforms recent table processing architectures by significant margins.
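
To make the task concrete, the following is a minimal sketch of the kind of task-specific regular expression the abstract alludes to. It is not DiSCoMaT's actual pattern: the name `parse_composition`, the pattern `TERM`, and the example cell string are all assumptions made for illustration. The sketch parses a composition written inside a single table cell into (compound, amount) pairs.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# a numeric amount followed by a chemical formula built from element
# symbols with optional integer subscripts, e.g. "30Na2O".
TERM = re.compile(r"(\d+(?:\.\d+)?)\s*((?:[A-Z][a-z]?\d*)+)")


def parse_composition(cell: str) -> list[tuple[str, float]]:
    """Return (compound, amount) pairs found in a table cell string."""
    return [(formula, float(amount)) for amount, formula in TERM.findall(cell)]


print(parse_composition("70SiO2-30Na2O"))
# → [('SiO2', 70.0), ('Na2O', 30.0)]
```

A pattern like this only covers compositions packed into one cell with explicit separators. Tables that spread compounds and amounts across rows and columns need structural reasoning, which is the gap the abstract says the graph neural network components are meant to address.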

research
05/12/2021

TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables

Information Extraction (IE) from the tables present in scientific articl...
research
08/23/2022

Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

Tables are widely used in several types of documents since they can brin...
research
08/24/2021

Relation Extraction from Tables using Artificially Generated Metadata

Relation Extraction (RE) from tables is the task of identifying relation...
research
05/23/2023

Schema-Driven Information Extraction from Heterogeneous Tables

In this paper, we explore the question of whether language models (LLMs)...
research
02/05/2021

Analysing the use of graphs to represent the results of Systematic Reviews in Software Engineering

The presentation of results from Systematic Literature Reviews (SLRs) is...
research
02/26/2019

A framework for information extraction from tables in biomedical literature

The scientific literature is growing exponentially, and professionals ar...
research
02/16/2021

TableLab: An Interactive Table Extraction System with Adaptive Deep Learning

Table extraction from PDF and image documents is a ubiquitous task in th...