On the Accuracy of CRNNs for Line-Based OCR: A Multi-Parameter Evaluation

08/06/2020
by Bernhard Liebl, et al.

We investigate how to train a high-quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper. Through extensive grid searches, we obtain a neural network architecture and a set of optimal data augmentation settings. We discuss the influence of factors such as binarization, input line height, network width, and network depth, as well as other training parameters such as dropout. Incorporating these findings into a practical model, we obtain a 0.44% character error rate (CER) from only 10,000 lines of training data, outperforming currently available pretrained models that were trained on more than 20 times as much data. We show ablations for all components of our training pipeline, which relies on the open-source framework Calamari.
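Accuracy figures for line-based OCR are commonly reported as a character error rate: the total edit distance between predicted and ground-truth lines, divided by the total number of ground-truth characters. The following is a minimal sketch of that metric in plain Python. It is not taken from the paper or from Calamari; the function names `levenshtein` and `character_error_rate` are hypothetical, and the paper's exact normalization may differ.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER over (prediction, ground_truth) line pairs, as a fraction in [0, inf)."""
    total_edits = 0
    total_chars = 0
    for prediction, ground_truth in pairs:
        total_edits += levenshtein(prediction, ground_truth)
        total_chars += len(ground_truth)
    if total_chars == 0:
        raise ValueError("ground truth contains no characters")
    return total_edits / total_chars


if __name__ == "__main__":
    # Illustrative lines only; not data from the paper.
    lines = [("abcd", "abcd"), ("abxd", "abcd")]
    print(f"CER: {100 * character_error_rate(lines):.2f}%")
```

Summing edits and characters over the whole test set before dividing, rather than averaging per-line rates, keeps short lines from dominating the score.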

research
06/29/2015

Dropout as data augmentation

Dropout is typically interpreted as bagging a large number of models sha...
research
09/30/2017

Variational Grid Setting Network

We propose a new neural network architecture for automatic generation of...
research
05/19/2022

Neural Network Architecture Beyond Width and Depth

This paper proposes a new neural network architecture by introducing an ...
research
06/15/2021

Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning

In order to apply Optical Character Recognition (OCR) to historical prin...
research
05/17/2023

Kitana: Efficient Data Augmentation Search for AutoML

AutoML services provide a way for non-expert users to benefit from high-...
research
09/09/2019

OCR4all -- An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

Optical Character Recognition (OCR) on historical printings is a challen...
research
02/02/2023

Neural Network Architecture for Database Augmentation Using Shared Features

The popularity of learning from data with machine learning and neural ne...
