Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription

12/16/2021
by Nikolai Vogler, et al.

We present a self-supervised pre-training approach for learning rich visual language representations for both handwritten and printed historical document transcription. After supervised fine-tuning of our pre-trained encoder representations for low-resource document transcription in two languages, (1) a heterogeneous set of handwritten Islamicate manuscript images and (2) early modern English printed documents, we show a meaningful improvement in recognition accuracy over the same supervised model trained from scratch, with as few as 30 transcribed line images for training. Our masked language model-style pre-training strategy, in which the model is trained to identify the true masked visual representation among distractors sampled from within the same line, encourages the learning of robust contextualized language representations that are invariant to the scribal writing styles and printing noise present across documents.
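The pre-training objective described above is contrastive: at each masked position, the model's contextual output must score the true visual latent higher than distractor latents drawn from the same line. The sketch below illustrates that InfoNCE-style loss with numpy; the function name, cosine-similarity scoring, and distractor count are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def contrastive_mask_loss(context, targets, mask_idx, num_distractors=10, rng=None):
    """Illustrative InfoNCE-style loss for masked visual pre-training.

    context:  (T, d) contextual outputs of the encoder for one line image
    targets:  (T, d) true (unmasked) visual latents for the same line
    mask_idx: positions that were masked; at each, the context vector must
              identify the true latent among distractors from the same line
    """
    if rng is None:
        rng = np.random.default_rng(0)
    T, _ = targets.shape
    losses = []
    for t in mask_idx:
        # Sample distractor positions from within the same line, excluding t.
        pool = [i for i in range(T) if i != t]
        distr = rng.choice(pool, size=num_distractors, replace=False)
        candidates = np.vstack([targets[t], targets[distr]])  # true latent first
        # Cosine similarity between the context vector and each candidate.
        c = context[t]
        sims = candidates @ c / (
            np.linalg.norm(candidates, axis=1) * np.linalg.norm(c) + 1e-8
        )
        # Softmax cross-entropy with the true latent at index 0.
        losses.append(np.log(np.exp(sims).sum()) - sims[0])
    return float(np.mean(losses))
```

A context vector that closely matches its true latent yields a lower loss than an uninformative one, which is what drives the encoder to produce representations that remain predictive across writing styles and printing noise.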

