Reading Chinese in Natural Scenes with a Bag-of-Radicals Prior

10/05/2022
by Liu Yongbin, et al.

Scene text recognition (STR) on Latin datasets has been extensively studied in recent years, and state-of-the-art (SOTA) models often reach high accuracy. However, their performance on non-Latin scripts, such as Chinese, is not satisfactory. In this paper, we collect six open-source Chinese STR datasets and evaluate a series of classic methods that perform well on Latin datasets, finding a significant performance drop. To improve performance on Chinese datasets, we propose a novel radical-embedding (RE) representation that exploits the ideographic descriptions of Chinese characters. The ideographic description of each character is first converted into a bag of radicals and then fused with a learnable character embedding by a character vector fusion module (CVFM). In addition, we use the bag of radicals as a supervision signal for multi-task training to improve the model's perception of ideographic structure. Experiments show that the model with RE + CVFM + multi-task training outperforms the baseline on the six Chinese STR datasets.
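The abstract does not say how CVFM combines the two representations or how the bag-of-radicals supervision is formulated. The following is a minimal plain-Python sketch of the general idea under stated assumptions: the four-entry IDS table, the count-weighted mean pooling of radical vectors, the element-wise sum standing in for CVFM, and the binary cross-entropy auxiliary loss are all hypothetical choices, not the authors' implementation. It shows how an ideographic description sequence (IDS) reduces to a bag of radicals once the layout operators are dropped.

```python
from collections import Counter
import math
import random

# Hypothetical toy IDS table (character -> ideographic description sequence).
# A real system would load a full IDS dictionary; these entries are illustrative only.
IDS = {
    "好": "⿰女子",
    "妈": "⿰女马",
    "字": "⿱宀子",
    "林": "⿰木木",
}


def is_idc(ch: str) -> bool:
    """True for Ideographic Description Characters (U+2FF0..U+2FFF), which encode layout, not components."""
    return 0x2FF0 <= ord(ch) <= 0x2FFF


def bag_of_radicals(ids: str) -> Counter:
    """Drop the layout operators and count the remaining components (structure is discarded)."""
    return Counter(ch for ch in ids if not is_idc(ch))


# Radical vocabulary built from the toy table, sorted so the index order is stable.
RADICALS = sorted({r for ids in IDS.values() for r in bag_of_radicals(ids)})
RADICAL_INDEX = {r: i for i, r in enumerate(RADICALS)}


def count_vector(ch: str) -> list[int]:
    """Per-radical counts for one character, indexed by RADICALS."""
    vec = [0] * len(RADICALS)
    for r, n in bag_of_radicals(IDS[ch]).items():
        vec[RADICAL_INDEX[r]] = n
    return vec


def multi_hot(ch: str) -> list[float]:
    """Presence-only target for the (assumed) multi-label bag-of-radicals task."""
    return [1.0 if n > 0 else 0.0 for n in count_vector(ch)]


DIM = 4
rng = random.Random(0)
# In a trained model these tables would be learnable parameters; here they are random lists.
char_table = {ch: [rng.uniform(-1, 1) for _ in range(DIM)] for ch in IDS}
radical_table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in RADICALS]


def radical_embedding(ch: str) -> list[float]:
    """Count-weighted mean of radical vectors (an EmbeddingBag-style pooling)."""
    counts = count_vector(ch)
    total = sum(counts)
    return [sum(c * radical_table[i][d] for i, c in enumerate(counts)) / total for d in range(DIM)]


def fuse(ch: str) -> list[float]:
    """Hypothetical stand-in for CVFM: element-wise sum of character and radical embeddings."""
    return [a + b for a, b in zip(char_table[ch], radical_embedding(ch))]


def bce(probs: list[float], targets: list[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for a multi-label radical prediction head."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)


if __name__ == "__main__":
    print(RADICALS)
    print(count_vector("林"))
    print(fuse("好"))
    print(bce([0.5] * len(RADICALS), multi_hot("好")))
```

With this toy table, `count_vector("林")` returns `[0, 0, 0, 2, 0]`, since 林 is two copies of 木. Pooling a bag rather than a sequence deliberately discards the layout operators; the structural signal in this sketch comes only from the auxiliary loss, which in a multi-task setup would be added, with some weight (an assumption here), to the main recognition loss.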
