Spatial Dual-Modality Graph Reasoning for Key Information Extraction

by Hongbin Sun, et al.

Key information extraction from document images is of paramount importance in office automation. Conventional approaches based on template matching generalize poorly to document images with unseen templates and are not robust against text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images. We model document images as dual-modality graphs, whose nodes encode both the visual and textual features of detected text regions and whose edges represent the spatial relations between neighboring text regions. Key information extraction is then solved by iteratively propagating messages along graph edges and reasoning about the categories of graph nodes. To evaluate the proposed method thoroughly and to support future research, we release a new dataset named WildReceipt, which was collected and annotated specifically for evaluating key information extraction from document images of unseen templates in the wild. It contains 25 key information categories and about 69,000 text boxes in total, and is about 2 times larger than the existing public datasets. Extensive experiments validate that visual features, textual features, and spatial relations all benefit key information extraction. The experiments also show that SDMG-R can effectively extract key information from document images of unseen templates, and that it obtains new state-of-the-art results on the recent popular benchmark SROIE and on our WildReceipt. Our code and dataset will be publicly released.
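The graph formulation described in the abstract can be illustrated with a small sketch. It is not the SDMG-R implementation: the abstract does not give the actual feature encodings or update rules, so everything below is an assumption made for illustration. The function names (`spatial_relation`, `propagate`), the relative-position encoding, the distance-based softmax weighting, and the fixed 0.5 mixing factor are all hypothetical; a real model would use learned networks in their place. Node features here stand in for the fused visual and textual features of each detected text region.

```python
import math

def spatial_relation(box_i, box_j):
    """Hypothetical edge feature: position and size of box_j relative to box_i.
    Boxes are (x, y, width, height) tuples."""
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    return [(xj - xi) / wi, (yj - yi) / hi, wj / wi, hj / hi]

def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def propagate(node_feats, boxes, steps=2):
    """Illustrative message passing: each node mixes its own feature with a
    weighted average of the other nodes' features, where spatially closer
    boxes receive larger weights. Assumed rule, not the paper's."""
    feats = [list(f) for f in node_feats]
    n = len(feats)
    for _ in range(steps):
        updated = []
        for i in range(n):
            neighbors = [j for j in range(n) if j != i]
            if not neighbors:
                # A single-node graph has nothing to aggregate.
                updated.append(list(feats[i]))
                continue
            # Score each neighbor by negative normalized distance.
            scores = []
            for j in neighbors:
                dx, dy, _, _ = spatial_relation(boxes[i], boxes[j])
                scores.append(-math.hypot(dx, dy))
            weights = softmax(scores)
            dim = len(feats[i])
            message = [
                sum(w * feats[j][k] for w, j in zip(weights, neighbors))
                for k in range(dim)
            ]
            updated.append([0.5 * a + 0.5 * m for a, m in zip(feats[i], message)])
        feats = updated
    return feats

if __name__ == "__main__":
    # Three toy text boxes: two on the same row, one far below.
    boxes = [(0, 0, 10, 10), (20, 0, 10, 10), (0, 200, 10, 10)]
    node_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    for row in propagate(node_feats, boxes, steps=2):
        print([round(v, 3) for v in row])
```

In a full system, the propagated node features would feed a classifier that assigns each text region to one of the key information categories; that step is omitted here.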


