Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking

10/19/2022
by Yinghui Li, et al.

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling errors. Recent work builds on the pretrained knowledge of language models and incorporates multimodal information into CSC models to improve performance. However, it overlooks the rich knowledge in the dictionary, the reference book where one can learn how a character should be pronounced, written, and used. In this paper, we propose the LEAD framework, which enables the CSC model to learn heterogeneous knowledge from the dictionary in terms of phonetics, vision, and meaning. LEAD first constructs positive and negative samples according to the knowledge of character phonetics, glyphs, and definitions in the dictionary. Then a unified contrastive learning-based training scheme is employed to refine the representations of the CSC model. Extensive experiments and detailed analyses on the SIGHAN benchmark datasets demonstrate the effectiveness of our proposed methods.
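The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of how positive and negative samples might be scored, assuming a standard InfoNCE-style loss over cosine similarities. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not confirmed by the source).

    The loss is small when the anchor is closer to the positive sample
    than to every negative sample.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - pos_logit


# Toy vectors standing in for learned representations (hypothetical values).
char_repr = [1.0, 0.0]                      # representation of a character
dict_positive = [0.9, 0.1]                  # e.g. its own dictionary knowledge
dict_negatives = [[0.0, 1.0], [-1.0, 0.0]]  # knowledge of other characters

loss = info_nce_loss(char_repr, dict_positive, dict_negatives)
```

In a sketch like this, one such loss could be computed per knowledge type (phonetics, glyphs, definitions) and the results combined, but how LEAD actually unifies them is described only in the full text.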

research
03/02/2022

The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling...
research
07/17/2022

Contextual Similarity is More Valuable than Character Similarity: Curriculum Learning for Chinese Spell Checking

Chinese Spell Checking (CSC) task aims to detect and correct Chinese spe...
research
07/11/2018

Neural Chinese Word Segmentation with Dictionary Knowledge

Chinese word segmentation (CWS) is an important task for Chinese NLP. Re...
research
09/15/2022

uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers

The task of Chinese Spelling Check (CSC) is aiming to detect and correct...
research
05/30/2023

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training

We introduce CDBERT, a new learning paradigm that enhances the semantics...
research
03/09/2023

Dynamic Multi-View Fusion Mechanism For Chinese Relation Extraction

Recently, many studies incorporate external knowledge into character-lev...
research
08/26/2022

AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in Educational Applications

To automatically correct handwritten assignments, the traditional approa...
