Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

03/23/2023
by Zhibo Yang, et al.

Visual Information Extraction (VIE) has become increasingly important in both academia and industry, owing to its wide range of real-world applications. Numerous methods have been proposed to tackle this problem. However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in them. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging: they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. All these factors may lead to failures in information extraction. Therefore, as the second contribution, we explore an alternative approach to precisely and robustly extract key information from document images under such tough conditions. Specifically, in contrast to previous methods, which usually either incorporate visual information into a multi-modal architecture or train text spotting and information extraction in an end-to-end fashion, we explicitly model entities as semantic points, i.e., the center points of entities are enriched with semantic information describing the attributes and relationships of different entities, which could largely benefit entity labeling and linking. Extensive experiments on standard benchmarks in this field, as well as on the proposed dataset, demonstrate that the proposed method achieves significantly enhanced performance on entity labeling and linking compared with previous state-of-the-art models. The dataset is available at https://www.modelscope.cn/datasets/damo/SIBR/summary.
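
To make the idea of "entities as semantic points" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each annotated entity comes with a polygon box, a category label, and optional links to other entities; the names `SemanticPoint`, `center_of`, and `to_semantic_points`, as well as the input dictionary layout, are hypothetical. The center is approximated as the mean of the polygon's vertices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SemanticPoint:
    """Hypothetical container: an entity reduced to its center point plus semantics."""
    x: float
    y: float
    label: str                                      # entity category (attribute)
    links: List[int] = field(default_factory=list)  # indices of related entities


def center_of(box: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Approximate the center of a polygon as the mean of its vertices."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def to_semantic_points(entities: List[Dict]) -> List[SemanticPoint]:
    """Convert annotated entities (assumed layout) into semantic points."""
    points = []
    for ent in entities:
        cx, cy = center_of(ent["box"])
        points.append(SemanticPoint(cx, cy, ent["label"], list(ent.get("links", []))))
    return points


# Toy annotation: a "question" entity linked to an "answer" entity.
entities = [
    {"box": [(0, 0), (4, 0), (4, 2), (0, 2)], "label": "question", "links": [1]},
    {"box": [(6, 0), (10, 0), (10, 2), (6, 2)], "label": "answer"},
]
points = to_semantic_points(entities)
```

In this toy setup, entity labeling corresponds to the `label` carried by each point and entity linking to the `links` between points; how the paper actually predicts and encodes these is described in the full text.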


research
09/20/2019

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Extracting entity from images is a crucial part of many OCR applications...
research
05/12/2023

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Visual information extraction (VIE), which aims to simultaneously perfor...
research
06/24/2021

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Visual Information Extraction (VIE) task aims to extract key information...
research
07/20/2023

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

Key Information Extraction (KIE) is a challenging multimodal task that a...
research
06/02/2021

End-to-End Hierarchical Relation Extraction for Generic Form Understanding

Form understanding is a challenging problem which aims to recognize sema...
research
10/13/2020

Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

Joint-event-extraction, which extracts structural information (i.e., ent...
research
03/26/2021

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

Key information extraction from document images is of paramount importan...
