Cost-effective End-to-end Information Extraction for Semi-structured Document Images

04/16/2021
by Wonseok Hwang, et al.

A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost. One can instead consider an end-to-end model that directly maps the input to the target output, thereby simplifying the entire process. However, such a generation approach is known to lead to unstable performance if not designed carefully. Here we present our recent effort on transitioning from our existing pipeline-based IE system to an end-to-end system, focusing on the practical challenges associated with replacing and deploying the system in real, large-scale production. By carefully formulating document IE as a sequence generation task, we show that a single end-to-end IE system can be built and still achieve competent performance.


