Synthetic Document Generator for Annotation-free Layout Recognition

11/11/2021
by Natraj Raman, et al.
Analyzing the layout of a document to identify headers, sections, tables, figures, etc. is critical to understanding its content. Deep-learning-based approaches for detecting the layout structure of document images have been promising. However, these methods require a large number of annotated examples during training, which are both expensive and time-consuming to obtain. We describe here a synthetic document generator that automatically produces realistic documents with labels for the spatial positions, extents and categories of the layout elements. The proposed generative process treats every physical component of a document as a random variable and models their intrinsic dependencies using a Bayesian Network graph. Our hierarchical formulation using stochastic templates allows parameter sharing between documents to retain broad themes, while its distributional characteristics produce visually unique samples, thereby capturing complex and diverse layouts. We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents.
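To make the idea of a hierarchical generative process concrete, the following is a minimal sketch of ancestral sampling from a toy Bayesian network over layout variables. It is not the authors' generator: the variable names (`n_columns`, `margin`, `table_prob`), the distributions, the numeric ranges and the helper functions `sample_template` and `sample_document` are all assumptions chosen for illustration. Template-level variables are sampled once and shared across documents, while element-level variables are sampled per document, and every element is emitted together with its category and bounding box, so no manual annotation is needed.

```python
import random
from dataclasses import dataclass


@dataclass
class Element:
    """A labelled layout element; coordinates are fractions of the page size."""
    category: str
    x: float
    y: float
    width: float
    height: float


def sample_template(rng):
    """Sample template-level variables (hypothetical choices, not from the paper).

    These are the parent nodes: every document drawn from the same template
    shares them, which keeps a broad common theme.
    """
    return {
        "n_columns": rng.choice([1, 2]),
        "margin": rng.uniform(0.05, 0.10),
        "table_prob": rng.uniform(0.1, 0.4),
    }


def sample_document(template, rng, gutter=0.03, gap=0.02):
    """Sample one labelled page conditioned on the template (child nodes)."""
    margin = template["margin"]
    n_cols = template["n_columns"]
    col_w = (1.0 - 2 * margin - (n_cols - 1) * gutter) / n_cols

    # The header spans the full text width; its height is a random variable.
    header_h = rng.uniform(0.04, 0.08)
    elements = [Element("header", margin, margin, 1.0 - 2 * margin, header_h)]

    top = margin + header_h + gap
    for c in range(n_cols):
        x = margin + c * (col_w + gutter)
        y = top
        while True:
            # The category depends on the template; the height depends on the category.
            is_table = rng.random() < template["table_prob"]
            category = "table" if is_table else "paragraph"
            h = rng.uniform(0.10, 0.20) if is_table else rng.uniform(0.05, 0.15)
            if y + h > 1.0 - margin:
                break  # the column is full
            elements.append(Element(category, x, y, col_w, h))
            y += h + gap
    return elements


rng = random.Random(0)
template = sample_template(rng)
documents = [sample_document(template, rng) for _ in range(3)]
for doc in documents:
    print(len(doc), "elements:", sorted({e.category for e in doc}))
```

In this sketch the documents share the same margins and column count because they condition on one template, yet each differs in the number, order and size of its elements. A real generator would also render the page image, which is omitted here.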

research
04/18/2020

A Large Dataset of Historical Japanese Documents with Complex Layouts

Deep learning-based approaches for automatic document layout analysis an...
research
07/09/2021

Graph-based Deep Generative Modelling for Document Layout Generation

One of the major prerequisites for any deep learning approach is the ava...
research
01/28/2021

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Creating presentation materials requires complex multimodal reasoning sk...
research
08/26/2023

Bengali Document Layout Analysis with Detectron2

Document digitization is vital for preserving historical records, effici...
research
09/01/2019

READ: Recursive Autoencoders for Document Layout Generation

Layout is a fundamental component of any graphic design. Creating large ...
research
01/24/2022

Cross-Domain Document Layout Analysis via Unsupervised Document Style Guide

The document layout analysis (DLA) aims to decompose document images int...
research
01/11/2021

Learning to Automate Chart Layout Configurations Using Crowdsourced Paired Comparison

We contribute a method to automate parameter configurations for chart la...
