A human-inspired recognition system for premodern Japanese historical documents

05/14/2019
by   Anh Duc Le, et al.

Recognition of historical documents is a challenging problem because of noisy, damaged characters and backgrounds. Japanese historical documents pose these problems and more: premodern Japanese characters were written in cursive and are connected to one another. As a result, methods based on character segmentation do not work well, which motivates a new kind of recognition system. In this paper, we propose a human-inspired document reading system to recognize multiple lines of premodern Japanese historical documents. When reading, people use eye movements to find the start of a text line. They then move their eyes from the current character or word to the next one. They can also determine the end of a line, or skip a figure, to move to the next line. In the brain, these eye movements are integrated with visual processing to drive the reading process. We implement this recognition system with an attention-based encoder-decoder. First, the system detects where a text line starts. Second, it scans and recognizes character by character until the text line is complete. Then, it detects the start of the next text line. This process repeats until the whole document has been read. We tested our human-inspired recognition system on the premodern Japanese historical documents provided by the PRMU Kuzushiji competition. The experimental results demonstrate the effectiveness of the proposed system, which achieves a Sequence Error Rate of 9.87 level 3 of the dataset, respectively. These results outperform all other systems that participated in the PRMU Kuzushiji competition.
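The reading loop described in the abstract (find the start of a line, read character by character until the line ends, then look for the next line) can be illustrated with a minimal Python sketch. This is not the authors' model: the attention-based encoder-decoder is replaced by two hypothetical callables, `find_line_start` and `read_char`, and the page is assumed to be a toy character grid with vertical lines read from right to left. All names here are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

Position = Tuple[int, int]  # (row, column) on a toy character grid


def read_page(
    find_line_start: Callable[[int], Optional[Position]],
    read_char: Callable[[Position], Optional[str]],
) -> List[str]:
    """Control flow only: line start -> characters -> end of line -> next line."""
    lines: List[str] = []
    line_index = 0
    while True:
        start = find_line_start(line_index)  # step 1: where does the next line begin?
        if start is None:                    # no line left: the whole page is read
            break
        row, col = start
        chars: List[str] = []
        while True:
            ch = read_char((row, col))       # step 2: recognize the attended character
            if ch is None:                   # end of the current line
                break
            chars.append(ch)
            row += 1                         # move attention down to the next character
        lines.append("".join(chars))
        line_index += 1                      # step 3: look for the next line
    return lines


def make_toy_reader(page: List[str]):
    """Hypothetical stand-ins for the learned attention (an assumption of this sketch).

    Lines are the non-blank columns of the grid, visited from right to left.
    """
    n_rows = len(page)
    n_cols = max(len(r) for r in page)
    grid = [r.ljust(n_cols) for r in page]
    text_cols = [
        c for c in range(n_cols - 1, -1, -1)
        if any(grid[r][c] != " " for r in range(n_rows))
    ]

    def find_line_start(k: int) -> Optional[Position]:
        if k >= len(text_cols):
            return None
        c = text_cols[k]
        top = next(r for r in range(n_rows) if grid[r][c] != " ")
        return (top, c)

    def read_char(pos: Position) -> Optional[str]:
        r, c = pos
        if r >= n_rows or grid[r][c] == " ":
            return None
        return grid[r][c]

    return find_line_start, read_char


toy_page = ["c a", "d b", "e  "]
transcript = read_page(*make_toy_reader(toy_page))
```

In the proposed system, both decisions would come from a trained attention-based decoder rather than from fixed rules; the sketch only shows how line-start detection and character-by-character reading alternate until the page is finished.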


research
05/06/2020

Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning

Recognizing the full-page of Japanese historical documents is a challeng...
research
07/14/2020

Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization

In this paper, we propose an end-to-end trainable framework for restorin...
research
04/10/2007

Text Line Segmentation of Historical Documents: a Survey

There is a huge amount of historical documents in libraries and in vario...
research
10/21/2019

KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning

Kuzushiji, a cursive writing style, had been used in Japan for over a th...
research
06/08/2015

License Plate Recognition System Based on Color Coding Of License Plates

License Plate Recognition Systems are used to determine the license plat...
research
03/08/2019

ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records

We propose a Historical Document Reading Challenge on Large Chinese Stru...
