SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check

04/26/2020
by Xingyi Cheng, et al.

Chinese Spelling Check (CSC) is the task of detecting and correcting spelling errors in Chinese natural language text. Existing methods have attempted to incorporate similarity knowledge between Chinese characters. However, they treat this knowledge either as an external input resource or as mere heuristic rules. This paper proposes to incorporate phonological and visual similarity knowledge into language models for CSC via a specialized graph convolutional network (SpellGCN). The model builds a graph over the characters, and SpellGCN learns to map this graph into a set of inter-dependent character classifiers. These classifiers are applied to the representations extracted by another network, such as BERT, which makes the whole network end-to-end trainable. Experiments are conducted on three human-annotated datasets, and our method outperforms previous models by a large margin.
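The abstract describes the architecture only at a high level: a graph over characters is turned into classifier weights, which then score the per-token representations of a network such as BERT. The NumPy sketch below illustrates that data flow under assumptions the abstract does not state: one standard GCN layer (Kipf and Welling symmetric normalization with self-loops) per similarity graph, the two graph outputs simply averaged, a residual connection back to the original character embeddings, and random vectors standing in for BERT outputs. The function names (`normalize_adjacency`, `gcn_classifiers`, `predict`) and the toy graphs are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Standard GCN normalization D^{-1/2} (A + I) D^{-1/2} (assumed here)."""
    a = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    # Element (i, j) becomes a_ij / sqrt(d_i * d_j)
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_classifiers(char_emb, graphs, weights):
    """Map character embeddings to classifier vectors (hypothetical scheme).

    One GCN layer per similarity graph (e.g. phonological, visual); the
    outputs are averaged and added to the original embeddings (residual).
    """
    outputs = [normalize_adjacency(a) @ char_emb @ w
               for a, w in zip(graphs, weights)]
    return char_emb + np.mean(outputs, axis=0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict(hidden, classifiers):
    """Score each token representation against every character classifier."""
    return softmax(hidden @ classifiers.T, axis=-1)

rng = np.random.default_rng(0)
vocab, dim, seq_len = 5, 8, 3
char_emb = rng.normal(size=(vocab, dim))

# Toy graphs with hypothetical similarity edges between character indices
phon = np.zeros((vocab, vocab))
phon[0, 1] = phon[1, 0] = 1                  # characters 0 and 1 sound alike
visual = np.zeros((vocab, vocab))
visual[2, 3] = visual[3, 2] = 1              # characters 2 and 3 look alike

weights = [rng.normal(size=(dim, dim)) * 0.1 for _ in range(2)]
classifiers = gcn_classifiers(char_emb, [phon, visual], weights)

hidden = rng.normal(size=(seq_len, dim))     # stand-in for BERT token outputs
probs = predict(hidden, classifiers)
print(probs.shape)  # (3, 5)
```

In this sketch, characters connected in either graph receive classifier vectors that mix each other's embeddings, which is one way to read "inter-dependent character classifiers". Because the classifier matrix is computed rather than stored directly, gradients from the prediction loss reach the GCN weights as well as the encoder, which matches the end-to-end trainability the abstract claims.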


