Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents

05/26/2022
by Nguyen Hong Son, et al.

This paper introduces a new information extraction model for business documents. Unlike prior studies, which rely on either span extraction or sequence labeling alone, the model takes advantage of both. The combination allows the model to handle long documents with sparse information (that is, documents from which only a small amount of information is extracted). The model is trained end-to-end to jointly optimize the two tasks in a unified manner. Experimental results on four business datasets in English and Japanese show that the model achieves promising results and is significantly faster than the normal span-based extraction method. The code is also available.
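
To make the idea of jointly optimizing span extraction and sequence labeling concrete, the sketch below combines a start/end span loss with a per-token tagging loss into a single objective. It is only an illustration under stated assumptions: the abstract does not give the paper's actual loss, so the weighted sum, the `alpha` parameter, and all function names here are hypothetical and not taken from the authors' code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def span_loss(start_logits, end_logits, start, end):
    """Span extraction: score the gold start and end token positions."""
    return 0.5 * (cross_entropy(start_logits, start)
                  + cross_entropy(end_logits, end))


def tagging_loss(tag_logits, tags):
    """Sequence labeling: average per-token loss over the gold tags."""
    losses = [cross_entropy(l, t) for l, t in zip(tag_logits, tags)]
    return sum(losses) / len(losses)


def joint_loss(start_logits, end_logits, start, end,
               tag_logits, tags, alpha=0.5):
    """Hypothetical joint objective: a weighted sum of the two task losses.

    `alpha` is an assumed mixing weight, not a value from the paper.
    """
    return (alpha * span_loss(start_logits, end_logits, start, end)
            + (1.0 - alpha) * tagging_loss(tag_logits, tags))


# Toy example: 4 tokens, gold span covers tokens 1..2, tags use O=0, B=1, I=2.
start_logits = [0.0, 0.0, 0.0, 0.0]
end_logits = [0.0, 0.0, 0.0, 0.0]
tag_logits = [[0.0, 0.0, 0.0] for _ in range(4)]
tags = [0, 1, 2, 0]

loss = joint_loss(start_logits, end_logits, 1, 2, tag_logits, tags)
```

In an end-to-end setup, both sets of logits would come from one shared encoder, so that minimizing this single scalar updates the encoder for both tasks at once.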
