Enhancing Handwritten Text Recognition with N-gram sequence decomposition and Multitask Learning

12/28/2020
by   Vasiliki Tassopoulou, et al.
National Technical University of Athens

Current state-of-the-art approaches in the field of Handwritten Text Recognition are predominantly single-task, with unigram, character-level target units. In our work, we utilize a Multi-task Learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity, from fine to coarse. We consider this method a way to utilize n-gram information implicitly during training, while the final recognition is performed using only the unigram output. The difference in the unigram decoding of such a multi-task approach highlights the capability of the learned internal representations, which are shaped by the different n-grams during training. We select n-grams as our target units and experiment from unigrams to fourgrams, i.e., subword-level granularities. These multiple decompositions are learned by the network with task-specific CTC losses. Concerning network architectures, we propose two alternatives, namely the Hierarchical and the Block Multi-task. Overall, our proposed model, even though evaluated only on the unigram task, outperforms its single-task counterpart by an absolute 2.52% WER and 1.02% CER under greedy decoding, without any computational overhead during inference, hinting towards a successfully imposed implicit language model.
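To make the training scheme concrete, below is a minimal sketch (not the authors' code) of multi-task CTC training over n-gram decompositions, written in PyTorch. The `ngram_decompose` helper, the `MultiTaskCTCHead` module, and the vocabulary sizes are illustrative assumptions: a shared encoder feature sequence feeds one output head per granularity, each head is trained with its own CTC loss, and only the unigram head would be used at inference. With all heads attached to the same encoder output, this is closer in spirit to the Block variant; the Hierarchical variant presumably attaches heads at different depths.

```python
# Illustrative sketch of multi-task CTC over n-gram target decompositions.
# Assumptions: encoder features are given, head/vocab choices are hypothetical.
import torch
import torch.nn as nn


def ngram_decompose(chars, n):
    """Split a character sequence into consecutive, non-overlapping n-gram tokens
    (assumed decomposition; the paper may use a different tokenisation)."""
    return [tuple(chars[i:i + n]) for i in range(0, len(chars), n)]


class MultiTaskCTCHead(nn.Module):
    """One linear classifier per granularity (unigram, bigram, ...),
    each trained with its own CTC loss; blank symbol is index 0."""

    def __init__(self, feat_dim, vocab_sizes):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(feat_dim, v + 1) for v in vocab_sizes)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, features, targets_per_task, target_lengths_per_task, input_lengths):
        # features: (T, N, feat_dim) sequence of shared encoder outputs
        total = 0.0
        for head, tgt, tgt_len in zip(self.heads, targets_per_task, target_lengths_per_task):
            log_probs = head(features).log_softmax(-1)  # (T, N, vocab + 1)
            total = total + self.ctc(log_probs, tgt, input_lengths, tgt_len)
        return total


# Example decomposition of a character sequence into bigram tokens.
print(ngram_decompose(list("hello"), 2))  # [('h', 'e'), ('l', 'l'), ('o',)]

# Toy usage with random encoder features and dummy unigram/bigram targets.
T, N, D = 50, 2, 128
feats = torch.randn(T, N, D)
heads = MultiTaskCTCHead(D, vocab_sizes=[80, 400])  # e.g. unigram and bigram vocabularies
uni_tgt = torch.randint(1, 81, (N, 20))
bi_tgt = torch.randint(1, 401, (N, 10))
loss = heads(
    feats,
    targets_per_task=[uni_tgt, bi_tgt],
    target_lengths_per_task=[
        torch.full((N,), 20, dtype=torch.long),
        torch.full((N,), 10, dtype=torch.long),
    ],
    input_lengths=torch.full((N,), T, dtype=torch.long),
)
loss.backward()
```

At inference time, only the unigram head would be decoded (greedily, in the abstract's evaluation), so the extra heads add no computational overhead once training is done.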
