Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning

05/06/2020
by Anh Duc Le, et al.

Recognizing full pages of Japanese historical documents is a challenging problem because of their complex layouts and backgrounds and their difficult writing styles, such as cursive and connected characters. Most previous methods divide the recognition process into character segmentation and character recognition. However, those methods provide only character bounding boxes and classes, without a text transcription. In this paper, we extend our previous human-inspired recognition system from multiple lines to full pages of Kuzushiji documents. The human-inspired recognition system simulates human eye movement during the reading process. To address the lack of training data, we propose a random text line erasure approach that randomly erases text lines and distorts documents. To address the convergence problem of the recognition system on full-page documents, we employ curriculum learning, which trains the recognition system step by step from the easy level (documents with several text lines) to the difficult level (full-page documents). We tested the step-by-step training approach and the random text line erasure approach on the dataset of the Kuzushiji recognition competition on Kaggle. The experimental results demonstrate the effectiveness of the proposed approaches and are competitive with those of other participants in the Kuzushiji recognition competition.
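
The two training ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(x0, y0, x1, y1)` line-box format, the constant background fill, and the linear stage schedule are all assumptions made for illustration, and the document distortion step mentioned in the abstract is omitted. The page is a plain list of pixel rows so the example needs no third-party library.

```python
import random


def erase_random_lines(page, line_boxes, keep_prob=0.5, fill=255, rng=None):
    """Return a copy of `page` with some text lines painted over.

    page:       list of rows, each a list of grayscale pixel values.
    line_boxes: list of (x0, y0, x1, y1) pixel boxes, one per text line
                (hypothetical format; x1 and y1 are exclusive).
    keep_prob:  probability that a line is left untouched.
    fill:       assumed background value used to paint over erased lines.
    Returns (augmented_page, kept_boxes); the transcription target for the
    augmented page would contain only the lines in kept_boxes.
    """
    rng = rng or random.Random()
    out = [row[:] for row in page]  # copy so the input page is not modified
    kept = []
    for x0, y0, x1, y1 in line_boxes:
        if rng.random() < keep_prob:
            kept.append((x0, y0, x1, y1))
            continue
        for y in range(y0, y1):
            out[y][x0:x1] = [fill] * (x1 - x0)
    return out, kept


def curriculum_stages(max_lines, n_stages):
    """Maximum number of text lines per training sample for each stage.

    An assumed linear schedule: early stages see few lines (easy level),
    and the last stage sees up to max_lines (full-page level).
    """
    return [max(1, round(max_lines * (s + 1) / n_stages)) for s in range(n_stages)]
```

In a training loop built on this sketch, each stage would draw pages, erase lines until at most the stage's line limit remains, and continue training from the previous stage's weights, so the model reaches full pages only after it has converged on shorter inputs.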

research
05/14/2019

A human-inspired recognition system for premodern Japanese historical documents

Recognition of historical documents is a challenging problem due to the ...
research
04/15/2020

An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers

One important and particularly challenging step in the optical character...
research
05/11/2023

Combining OCR Models for Reading Early Modern Printed Books

In this paper, we investigate the usage of fine-grained font recognition...
research
01/12/2012

Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach

We study the task of cleaning scanned text documents that are strongly c...
research
08/29/2023

Is it an i or an l: Test-time Adaptation of Text Line Recognition Models

Recognizing text lines from images is a challenging problem, especially ...
research
03/16/2021

Digital Peter: Dataset, Competition and Handwriting Recognition Methods

This paper presents a new dataset of Peter the Great's manuscripts and d...
research
10/21/2019

KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning

Kuzushiji, a cursive writing style, had been used in Japan for over a th...
