TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

05/27/2020
by Peng Zhang, et al.

Since ubiquitous real-world documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks: (1) text reading, which detects and recognizes text in the images, and (2) information extraction, which analyzes and extracts key elements from the previously extracted plain text. However, they mainly focus on improving the information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated. In this paper, we propose a unified end-to-end text reading and information extraction network, in which the two tasks can reinforce each other. Specifically, the multimodal visual and textual features of text reading are fused for information extraction and, in turn, the semantics in information extraction contribute to the optimization of text reading. On three real-world datasets with diverse document images (from fixed layout to variable layout, from structured text to semi-structured text), our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
