Open Set Chinese Character Recognition using Multi-typed Attributes

08/27/2018
by Sheng He, et al.

Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval methods, but new unseen classes can also be expected under open learning conditions (even for CNNs). Chinese character recognition with zero or a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method based on multi-type attributes, which are derived from the pronunciation, structure and radicals of Chinese characters, and we apply it to character recognition in historical books. This intermediate attribute code has a strong advantage over the common 'one-hot' class representation because it allows complex and unseen patterns to be understood symbolically through attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: Pinyin label, layout structure, number of strokes, three different input methods (Cangjie, Zhengma and Wubi), and a four-corner encoding method. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be recognized from the predicted attributes using a distance metric and a complete lexicon that is encoded in attribute space. We evaluate the proposed method on two open data sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and on one closed set (handwritten Chinese characters). Experimental results show good general classification of seen classes as well as a very promising generalization ability to unseen characters.
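The attribute-space lookup described in the abstract can be illustrated with a minimal sketch; it is not the paper's implementation. The four-character toy lexicon, the use of only three attribute groups (toneless Pinyin, layout structure, stroke count), the one-hot block per group, the Euclidean distance, and the helper names `build_vocab`, `encode` and `recognize` are all assumptions made for illustration. A noisy copy of a character's code stands in for the CNN output, since no trained network is available here.

```python
import math

# Toy lexicon (illustrative): character -> (toneless Pinyin, layout, stroke count).
LEXICON = {
    "好": ("hao", "left-right", 6),
    "明": ("ming", "left-right", 8),
    "字": ("zi", "top-bottom", 6),
    "国": ("guo", "enclosure", 8),
}


def build_vocab(lexicon):
    """Collect the sorted distinct values of each attribute group."""
    n_groups = len(next(iter(lexicon.values())))
    return [sorted({attrs[g] for attrs in lexicon.values()}) for g in range(n_groups)]


def encode(attrs, vocab):
    """Concatenate one one-hot block per attribute group into a flat code."""
    code = []
    for value, values in zip(attrs, vocab):
        code.extend(1.0 if v == value else 0.0 for v in values)
    return code


def recognize(scores, lexicon, vocab):
    """Return the lexicon character whose attribute code is nearest to `scores`.

    Euclidean distance is an assumption; the abstract only says "a distance metric".
    """
    return min(lexicon, key=lambda ch: math.dist(scores, encode(lexicon[ch], vocab)))


VOCAB = build_vocab(LEXICON)

# Stand-in for a CNN's attribute scores on an image of an unseen character:
# the true code of 国, softened so that 1 becomes 0.9 and 0 becomes 0.1.
target = encode(LEXICON["国"], VOCAB)
noisy_scores = [0.8 * t + 0.1 for t in target]

print(recognize(noisy_scores, LEXICON, VOCAB))  # prints 国
```

Because the class is looked up in the lexicon rather than read from a fixed output layer, a character with no training images can still be returned as long as its attribute code is in the lexicon, which is the property the zero-shot evaluation relies on.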
