Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts

04/30/2021
by Mark-Christoph Müller, et al.

We describe a simple procedure for automatically creating word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses only standard, off-the-shelf components, and reaches an F-score of 85.01 in the basic setup and up to 86.63 with pre- and post-processing. Potential areas of application are manual database curation (including document triage) and biomedical expression OCR.
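
To make the idea of word-level alignment and its F-score evaluation concrete, the following is a minimal sketch only. The abstract does not say which alignment algorithm or components the authors use, so Python's standard `difflib.SequenceMatcher` stands in as an assumed substitute, and the function names `align_words` and `f_score` as well as the toy token lists are hypothetical. The score is returned on a 0-1 scale, whereas the figures above appear to be percentages.

```python
from difflib import SequenceMatcher


def align_words(ocr_tokens, text_tokens):
    """Return (ocr_index, text_index) pairs for tokens that match exactly.

    Assumption: exact, order-preserving matching via difflib; this is an
    illustrative stand-in, not the procedure described in the paper.
    """
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive matching tokens.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over alignment pairs (0-1 scale)."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the OCR output misreads one word.
ocr = "We descrlbe a simple procedure".split()
txt = "We describe a simple procedure".split()
gold = [(i, i) for i in range(len(txt))]

pairs = align_words(ocr, txt)
score = f_score(pairs, gold)
```

In this toy case the misread token stays unaligned, which lowers recall while precision is unaffected; a practical system would presumably need fuzzy matching to recover such tokens.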


