Language Model Supervision for Handwriting Recognition Model Adaptation

08/04/2018
by   Chris Tensmeyer, et al.

Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but such datasets are not available in all languages and domains because manual labeling is expensive. We address this problem by showing how high-resource languages can be leveraged to help train models for low-resource languages. We propose a transfer learning methodology in which HWR models trained on a source language are adapted to a target language that uses the same writing script. This methodology requires only labeled data in the source language, unlabeled data in the target language, and a language model of the target language. The language model is used in a bootstrapping fashion to refine predictions in the target language, and the refined predictions then serve as ground truth for training the model. Using this approach, we demonstrate improved transferability among French, English, and Spanish using both historical and modern handwriting datasets. In the best case, transferring with the proposed methodology results in character error rates nearly as good as full supervised training.
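The bootstrapping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the language model refines predictions, so this sketch substitutes a simple stand-in (snapping each predicted word to the closest entry in a target-language word-frequency table). The names `refine_with_lm`, `bootstrap`, `recognize`, `fine_tune`, and `vocab_counts` are all hypothetical.

```python
from difflib import get_close_matches


def refine_with_lm(prediction, vocab_counts, cutoff=0.6):
    """Refine a predicted text line with a toy word-level 'language model'.

    ASSUMPTION: vocab_counts (word -> frequency) stands in for the target
    language model; the source does not specify the actual refinement step.
    """
    refined = []
    for word in prediction.split():
        if word in vocab_counts:
            refined.append(word)  # already a known target-language word
            continue
        candidates = get_close_matches(word, list(vocab_counts), n=3, cutoff=cutoff)
        if candidates:
            # Among similar-looking words, prefer the most frequent one.
            refined.append(max(candidates, key=lambda w: vocab_counts[w]))
        else:
            refined.append(word)  # nothing close enough; keep the raw prediction
    return " ".join(refined)


def bootstrap(recognize, fine_tune, unlabeled_images, vocab_counts, rounds=3):
    """Alternate between pseudo-labeling target data and retraining.

    recognize(image) -> str and fine_tune(pairs) -> None are hypothetical
    callables wrapping an HWR model pretrained on the source language.
    """
    pseudo_labeled = []
    for _ in range(rounds):
        # 1. Predict on unlabeled target-language lines, then refine with the LM.
        pseudo_labeled = [
            (image, refine_with_lm(recognize(image), vocab_counts))
            for image in unlabeled_images
        ]
        # 2. Treat the refined predictions as ground truth and retrain.
        fine_tune(pseudo_labeled)
    return pseudo_labeled
```

In this sketch, each round is expected to produce slightly better predictions for the next round to refine. A real system would more plausibly apply the language model during decoding over character sequences rather than word by word after the fact.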
