UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

11/10/2022
by Nghia Hieu Nguyen, et al.

Recognizing handwriting images is challenging due to the vast variation in writing style across writers and the distinct linguistic aspects of each written language. In Vietnamese, besides the modern Latin characters, there are accent and letter marks, together with characters that confuse state-of-the-art handwriting recognition methods. Moreover, Vietnamese is a low-resource language: few datasets exist for handwriting recognition research, which creates a barrier for researchers approaching the problem. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset, constructed by connecting pen stroke coordinates without further processing. This approach cannot measure the ability of recognition methods effectively, as such images are trivial and may lack features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that incorporates the crucial natural attributes required of offline handwriting images. Using our method, we provide the first high-quality synthetic dataset that is complex and natural enough for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to identify the challenges that remain for handwriting recognition in Vietnamese.
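To make the criticized baseline concrete, the following is a minimal sketch of rendering online handwriting as an image by connecting pen stroke coordinates with no further processing. It is an illustration only, not the authors' code and not the Transferring method: the names `draw_line` and `render_strokes` are hypothetical, and the sketch assumes each stroke is a list of integer pixel coordinates that already lie inside the canvas.

```python
from typing import List, Tuple

Point = Tuple[int, int]   # (x, y) pixel coordinate; assumed to be inside the canvas
Stroke = List[Point]      # one pen-down to pen-up trajectory


def draw_line(canvas: List[List[int]], p0: Point, p1: Point) -> None:
    """Mark the pixels on the segment p0 -> p1 using Bresenham's line algorithm."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        canvas[y0][x0] = 1
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_strokes(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Rasterize strokes into a binary image (1 = ink, 0 = background)."""
    canvas = [[0] * width for _ in range(height)]
    for stroke in strokes:
        if len(stroke) == 1:
            # A single-point stroke (a dot) has no segment to connect.
            x, y = stroke[0]
            canvas[y][x] = 1
        # Connect consecutive pen positions; nothing else is applied.
        for p0, p1 in zip(stroke, stroke[1:]):
            draw_line(canvas, p0, p1)
    return canvas


if __name__ == "__main__":
    # Hypothetical toy input: one horizontal stroke and one diagonal stroke.
    demo = [[(0, 0), (3, 0)], [(0, 2), (2, 4)]]
    for row in render_strokes(demo, width=5, height=5):
        print("".join("#" if v else "." for v in row))
```

Every stroke in this sketch comes out as a uniform one-pixel line on an empty background, which illustrates why images built this way may lack features found in offline handwriting images.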

