Zero-shot Generation of Training Data with Denoising Diffusion Probabilistic Model for Handwritten Chinese Character Recognition

05/25/2023
by Dongnan Gui, et al.

Chinese has more than 80,000 character categories, most of which are rarely used. Building a high-performance handwritten Chinese character recognition (HCCR) system that supports the full character set with a traditional approach requires collecting many training samples for each character category, which is both time-consuming and expensive. In this paper, we propose a novel approach that transforms Chinese character glyph images generated from font libraries into handwritten ones with a denoising diffusion probabilistic model (DDPM). Trained on handwritten samples of a small character set, the DDPM learns to map printed strokes to handwritten ones, which makes it possible to generate photo-realistic, stylistically diverse handwritten samples of unseen character categories. By combining DDPM-synthesized samples of unseen categories with real samples of the other categories, we can build an HCCR system that supports the full character set. Experimental results on the CASIA-HWDB dataset with 3,755 character categories show that HCCR systems trained with synthetic samples achieve recognition accuracy comparable to that of the system trained with real samples. The proposed method has the potential to address HCCR with a larger vocabulary.
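
As a rough illustration of the DDPM training setup described above, the sketch below shows the standard forward noising step applied to a handwritten sample, paired with a printed glyph image as the condition. The abstract does not state the noise schedule, image size, or conditioning mechanism, so the linear schedule, the 64x64 resolution, the channel-wise stacking, and the names `linear_beta_schedule`, `q_sample`, and `make_denoiser_input` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule: the linear one commonly used with DDPMs.
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    # Standard DDPM forward process:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

def make_denoiser_input(x_t, glyph):
    # Hypothetical conditioning: stack the noisy handwritten image and the
    # printed glyph as two channels for a denoising network.
    return np.stack([x_t, glyph], axis=0)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
# Random arrays stand in for real images scaled to [-1, 1].
handwritten = rng.uniform(-1.0, 1.0, size=(64, 64))
glyph = rng.uniform(-1.0, 1.0, size=(64, 64))

t = 500
x_t, noise = q_sample(handwritten, t, alpha_bar, rng)
model_input = make_denoiser_input(x_t, glyph)
# A denoiser would be trained to predict `noise` from (model_input, t).
```

At generation time, under the same assumptions, the glyph of an unseen character would be supplied as the condition while the handwritten channel starts from pure noise and is denoised step by step.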
