TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables

05/12/2021
by Harsh Desai, et al.

Information Extraction (IE) from tables in scientific articles is challenging because of complicated tabular representations and complex embedded text. This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles. TabLeX consists of two subsets, one for table structure extraction and the other for table content extraction. Each table image is accompanied by its corresponding LaTeX source code. To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts. Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail even on simple table images. Finally, we experiment with an existing transformer-based baseline and report its performance scores. In contrast to static benchmarks, we plan to augment this dataset with more complex and diverse tables at regular intervals.

research
10/31/2022

Tables to LaTeX: structure and content extraction from scientific tables

Scientific documents contain tables that list important information in a...
research
07/03/2022

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

A crucial component in the curation of KB for a scientific domain is inf...
research
03/27/2023

A large-scale dataset for end-to-end table recognition in the wild

Table recognition (TR) is one of the research hotspots in pattern recogn...
research
05/23/2023

Schema-Driven Information Extraction from Heterogeneous Tables

In this paper, we explore the question of whether language models (LLMs)...
research
02/26/2019

A framework for information extraction from tables in biomedical literature

The scientific literature is growing exponentially, and professionals ar...
research
05/30/2021

ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX

Tables present important information concisely in many scientific docume...
research
05/25/2021

Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System

Table extraction is an important but still unsolved problem. In this pap...
