VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations

05/13/2021
by Peng Zhang, et al.

Document layout analysis is crucial for understanding document structures. In this task, the vision and semantics of documents and the relations between layout components all contribute to the understanding process. Although many works have been proposed to exploit this information, their results remain unsatisfactory. NLP-based methods model layout analysis as a sequence labeling task and show insufficient capability in layout modeling. CV-based methods model layout analysis as a detection or segmentation task, but suffer from inefficient modality fusion and a lack of relation modeling between layout components. To address these limitations, we propose VSR, a unified framework for document layout analysis that combines vision, semantics and relations. VSR supports both NLP-based and CV-based methods. Specifically, we first introduce vision through the document image and semantics through text embedding maps. Then, modality-specific visual and semantic features are extracted using a two-stream network and adaptively fused to make full use of complementary information. Finally, given component candidates, a relation module based on a graph neural network is incorporated to model relations between components and output the final results. On three popular benchmarks, VSR outperforms previous models by large margins. Code will be released soon.
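
The abstract does not spell out how the text embedding map is built or how the two streams are fused, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes each token has a pixel bounding box and an embedding vector, paints that vector into the box to form the map, and then fuses visual and semantic features with a sigmoid gate. The function names (`text_embedding_map`, `adaptive_fuse`), the gating formula, and the parameter shapes are hypothetical, and the two-stream feature extractors are omitted.

```python
import numpy as np


def text_embedding_map(height, width, boxes, embeddings):
    """Paint each token's embedding into the pixels covered by its box.

    boxes: iterable of (x0, y0, x1, y1) pixel coordinates, upper bounds exclusive.
    embeddings: array of shape (num_tokens, dim).
    Pixels covered by no token stay zero.
    """
    dim = embeddings.shape[1]
    grid = np.zeros((height, width, dim), dtype=np.float32)
    for (x0, y0, x1, y1), vec in zip(boxes, embeddings):
        grid[y0:y1, x0:x1, :] = vec
    return grid


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_fuse(visual, semantic, weight, bias):
    """Assumed gated fusion of two feature maps of shape (H, W, C).

    A gate in (0, 1) is computed per position and channel from the
    concatenated features, then used to mix the two streams.
    weight has shape (2C, C) and bias has shape (C,).
    """
    stacked = np.concatenate([visual, semantic], axis=-1)  # (H, W, 2C)
    gate = sigmoid(stacked @ weight + bias)                # (H, W, C)
    return gate * visual + (1.0 - gate) * semantic


# Toy usage: one token occupying a 2x2 region of a 4x4 page.
semantic_map = text_embedding_map(
    4, 4, [(1, 0, 3, 2)], np.array([[1.0, 2.0]], dtype=np.float32)
)
visual_map = np.ones((4, 4, 2), dtype=np.float32)
# Zero weights give a gate of 0.5 everywhere, i.e. a plain average.
fused = adaptive_fuse(visual_map, semantic_map, np.zeros((4, 2)), np.zeros(2))
```

In a trained model the gate parameters would be learned, which is what lets the mix lean on whichever modality is more informative at each location; the fused map would then feed the candidate generation and relation stages the abstract describes.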


research
11/27/2021

Document Layout Analysis with Aesthetic-Guided Image Augmentation

Document layout analysis (DLA) plays an important role in information ex...
research
08/22/2022

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Recognizing the layout of unstructured digital documents is crucial when...
research
04/12/2022

Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison

In this paper, we present a novel neural graph matching approach applied...
research
08/12/2021

VTLayout: Fusion of Visual and Text Features for Document Layout Analysis

Documents often contain complex physical structures, which make the Docu...
research
09/11/2019

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

For understanding generic documents, information like font sizes, column...
research
02/19/2020

LAMBERT: Layout-Aware language Modeling using BERT for information extraction

In this paper we introduce a novel approach to the problem of understand...
research
06/24/2021

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Visual Information Extraction (VIE) task aims to extract key information...
