Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

05/12/2023
by   Jianfeng Kuang, et al.
0

Visual information extraction (VIE), which aims to simultaneously perform OCR and information extraction in a unified framework, has drawn increasing attention due to its essential role in various applications like understanding receipts, goods, and traffic signs. However, as existing benchmark datasets for VIE mainly consist of document images without the adequate diversity of layout structures, background disturbs, and entity categories, they cannot fully reveal the challenges of real-world applications. In this paper, we propose a large-scale dataset consisting of camera images for VIE, which contains not only the larger variance of layout, backgrounds, and fonts but also much more types of entities. Besides, we propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion. Different from the previous end-to-end approaches that directly adopt OCR features as the input of an information extraction module, we propose to use contrastive learning to narrow the semantic gap caused by the difference between the tasks of OCR and information extraction. We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities. These results demonstrate our dataset is more practical for promoting advanced VIE algorithms. In addition, experiments demonstrate that the proposed VIE method consistently achieves the obvious performance gains on the proposed and SROIE datasets.

READ FULL TEXT
research
01/24/2021

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Visual information extraction (VIE) has attracted considerable attention...
research
07/20/2023

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

Key Information Extraction (KIE) is a challenging multimodal task that a...
research
04/16/2021

ScreenSeg: On-Device Screenshot Layout Analysis

We propose a novel end-to-end solution that performs a Hierarchical Layo...
research
03/23/2023

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Recently, Visual Information Extraction (VIE) has been becoming increasi...
research
04/16/2021

Cost-effective End-to-end Information Extraction for Semi-structured Document Images

A real-world information extraction (IE) system for semi-structured docu...
research
06/24/2021

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Visual Information Extraction (VIE) task aims to extract key information...
research
12/21/2021

Transferable End-to-end Room Layout Estimation via Implicit Encoding

We study the problem of estimating room layouts from a single panorama i...

Please sign up or login with your details

Forgot password? Click here to reset