Self-supervised Data Bootstrapping for Deep Optical Character Recognition of Identity Documents

08/12/2019
by Oliver Mothes, et al.

The essential task of verifying person identities at airports and national borders is very time-consuming. Dictionary-based optical character recognition (OCR) is not an appropriate way to accelerate it, because the text content of identity documents (IDs) is highly variable, e.g., individual street names or surnames. Additionally, no properties of the fonts used in IDs are known. Therefore, we propose an iterative self-supervised bootstrapping approach that uses a smart strategy to mine real character data from IDs. In combination with synthetically generated character data, the real data is used to train efficient convolutional neural networks for character classification that offer a practical runtime as well as high accuracy. On a dataset with 74 character classes, we achieve an average class-wise accuracy of 99.4%. If we instead apply a classifier trained only on synthetic data, the accuracy is reduced to 58.1%. Our approach also outperforms an established open-source framework.
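The abstract reports an "average class-wise accuracy" without defining it here. A minimal sketch follows, assuming the term means the unweighted mean of the per-class accuracies, so that rare characters count as much as frequent ones. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def average_classwise_accuracy(y_true, y_pred):
    """Return (mean of per-class accuracies, per-class accuracy dict).

    Assumption: every class present in y_true is weighted equally,
    regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples seen per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    per_class = {c: correct[c] / total[c] for c in total}
    mean_acc = sum(per_class.values()) / len(per_class)
    return mean_acc, per_class


# Toy data (hypothetical): the frequent class "A" is always right,
# the rare class "B" is confused with "8" half of the time.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "A", "B", "8"]
mean_acc, per_class = average_classwise_accuracy(y_true, y_pred)
overall_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class, mean_acc, overall_acc)
```

In this toy case the class-wise average (0.75) is lower than the plain sample-wise accuracy (5 of 6 correct), which illustrates why a class-wise average is the stricter figure when class frequencies are unbalanced, as they typically are for characters in real documents.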


