LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

01/20/2017
by   Christian Reul, et al.

A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule-based connected-components approach that is very fast, easy for the user to follow, and allows intuitive manual correction where necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
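
To make the idea of a rule-based connected-components approach concrete, the following is a minimal sketch and not LAREX's actual implementation. It assumes a page that has already been binarized into a grid of 0s and 1s, finds 4-connected foreground blobs, and applies a single hypothetical rule (a bounding-box area threshold) to label each blob as "image" or "text". The function names, the `min_image_area` parameter, and the toy page are all illustrative assumptions; the abstract does not state which rules LAREX applies.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected blobs of 1s."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify(box, min_image_area=6):
    """Hypothetical rule: large bounding boxes are images, small ones are text."""
    top, left, bottom, right = box
    area = (bottom - top + 1) * (right - left + 1)
    return "image" if area >= min_image_area else "text"


# Toy binarized page: one large blob and two small ones.
page = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
regions = [(box, classify(box)) for box in connected_components(page)]
```

Because each rule is a plain threshold on a component's geometry, a user can see why a region received its label and adjust the threshold or override the result by hand, which is the kind of comprehensibility and manual correction the abstract describes.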


