Transformer-based HTR for Historical Documents

We apply the TrOCR framework to real-world historical manuscripts and show that TrOCR is a strong model in its own right and well suited to transfer learning. Although TrOCR was trained on English only, it adapts fairly easily, and with little training material, to other languages that use the Latin alphabet. We compare TrOCR against a state-of-the-art HTR framework (Transkribus) and show that it can outperform such systems. This finding matters because Transkribus performs best when it has access to baseline information, which is not needed at all to fine-tune TrOCR.


