Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network

06/07/2017
by Xiao Yang, et al.

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
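
As a rough illustration of what "pixel-wise segmentation" on combined visual and textual cues can look like, the sketch below concatenates a visual feature map with a text-embedding map at every pixel and applies a per-pixel linear classifier, which is what a 1x1 convolution computes. It is not the architecture from the paper, whose details are not given in this abstract. The function names (`fuse`, `classify_pixels`), the channel layout, and the toy weights are all assumptions made for the example.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [row][col][channel]


def fuse(visual: FeatureMap, text: FeatureMap) -> FeatureMap:
    """Concatenate visual and text channels at every pixel (hypothetical helper)."""
    return [
        [v_px + t_px for v_px, t_px in zip(v_row, t_row)]
        for v_row, t_row in zip(visual, text)
    ]


def classify_pixels(
    fused: FeatureMap, weights: List[Vector], bias: Vector
) -> List[List[int]]:
    """Score every pixel with one linear layer per class, then take the argmax.

    `weights[c]` holds the weights for class c over the fused channels, so this
    is the same computation as a 1x1 convolution followed by a per-pixel argmax.
    """
    labels = []
    for row in fused:
        out_row = []
        for px in row:
            scores = [
                sum(w * x for w, x in zip(w_c, px)) + b
                for w_c, b in zip(weights, bias)
            ]
            out_row.append(max(range(len(scores)), key=scores.__getitem__))
        labels.append(out_row)
    return labels


# Toy 1x2 "image": one visual channel and one text channel per pixel.
visual_map = [[[0.9], [0.1]]]
text_map = [[[0.2], [0.8]]]
toy_weights = [[1.0, 0.0], [0.0, 1.0]]  # class 0 reads the visual channel, class 1 the text channel
toy_bias = [0.0, 0.0]

label_map = classify_pixels(fuse(visual_map, text_map), toy_weights, toy_bias)
```

In this toy case the first pixel is assigned class 0 and the second class 1, because each class's weights read only one of the two modalities. A real model would learn the weights and would compute the feature maps with convolutional layers rather than take them as given.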

research
01/26/2018

PDNet: Semantic Segmentation integrated with a Primal-Dual Network for Document binarization

Binarization of digital documents is the task of classifying each pixel ...
research
04/14/2021

Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network

As camera-based documents are increasingly used, the rectification of di...
research
09/05/2017

PageNet: Page Boundary Extraction in Historical Handwritten Documents

When digitizing a document into an image, it is common to include a surr...
research
09/11/2019

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

For understanding generic documents, information like font sizes, column...
research
12/15/2020

docExtractor: An off-the-shelf historical document element extraction

We present docExtractor, a generic approach for extracting visual elemen...
research
09/24/2018

Chargrid: Towards Understanding 2D Documents

We introduce a novel type of text representation that preserves the 2D l...
research
06/12/2019

Handwritten Text Segmentation via End-to-End Learning of Convolutional Neural Network

We present a new handwritten text segmentation method by training a conv...
