Schema-Driven Information Extraction from Heterogeneous Tables

05/23/2023
by Fan Bai, et al.

In this paper, we explore whether large language models (LLMs) can support cost-efficient information extraction from complex tables. We introduce schema-driven information extraction, a new task that uses LLMs to transform tabular data into structured records following a human-authored schema. To assess various LLMs' capabilities on this task, we develop a benchmark composed of tables from three diverse domains: machine learning papers, chemistry tables, and webpages. Accompanying the benchmark, we present InstrucTE, a table extraction method based on instruction-tuned LLMs. The method requires only a human-constructed extraction schema and incorporates an error-recovery strategy. Notably, InstrucTE demonstrates competitive performance without task-specific labels, achieving F1 scores ranging from 72.3 to 95.7. Moreover, we validate the feasibility of distilling more compact table extraction models to minimize extraction costs and reduce API reliance. This study paves the way for the future development of instruction-following models for cost-efficient table extraction.
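To make the idea of records that follow a human-authored schema more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the InstrucTE method: the abstract does not describe the paper's schema format or its error-recovery strategy. The attribute names in `SCHEMA` and the helpers `validate_record` and `parse_model_output` are hypothetical. The sketch only checks that a JSON string, standing in for model output, parses and matches the schema; reporting problems like this is one plausible input to a recovery step.

```python
import json

# Hypothetical schema: attribute name -> accepted Python type(s).
# These names are illustrative and are not taken from the paper.
SCHEMA = {
    "task": str,
    "dataset": str,
    "metric": str,
    "value": (int, float),
}


def validate_record(record, schema):
    """Return a list of problems found in one extracted record."""
    problems = []
    for name, expected in schema.items():
        if name not in record:
            problems.append(f"missing attribute: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    for name in record:
        if name not in schema:
            problems.append(f"unexpected attribute: {name}")
    return problems


def parse_model_output(text, schema):
    """Parse one JSON record from model output; return (record, problems)."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        return None, ["output is not valid JSON"]
    if not isinstance(record, dict):
        return None, ["output is not a JSON object"]
    return record, validate_record(record, schema)
```

A caller could re-prompt the model or discard the record whenever the returned problem list is non-empty; which of these the paper does, if either, is not stated in the abstract.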

research
11/21/2019

Schemaless Queries over Document Tables with Dependencies

Unstructured enterprise data such as reports, manuals and guidelines oft...
research
05/12/2021

TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables

Information Extraction (IE) from the tables present in scientific articl...
research
07/03/2022

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

A crucial component in the curation of KB for a scientific domain is inf...
research
09/06/2021

Text-to-Table: A New Way of Information Extraction

We study a new problem setting of information extraction (IE), referred ...
research
04/17/2023

InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

Large language models have unlocked strong multi-task capabilities from ...
research
08/24/2021

Relation Extraction from Tables using Artificially Generated Metadata

Relation Extraction (RE) from tables is the task of identifying relation...
