Block the Label and Noise: An N-Gram Masked Speller for Chinese Spell Checking

05/05/2023
by Haiyun Yang, et al.

Recently, Chinese Spell Checking (CSC), a task to detect erroneous characters in a sentence and correct them, has attracted extensive interest because of its wide applications in various NLP tasks. Most existing methods use BERT to extract semantic information for the CSC task. However, these methods directly take sentences with only a few errors as inputs. The correct characters may leak answers to the model and dampen its ability to capture distant context, while the erroneous characters may disturb the semantic encoding process and result in poor representations. Based on these observations, this paper proposes an n-gram masking layer that masks the current and/or surrounding tokens to avoid label leakage and error disturbance. Moreover, because the masking strategy may discard multi-modal information carried by the errors, a novel dot-product gating mechanism is proposed to integrate phonological and morphological information with the semantic representation. Extensive experiments on the SIGHAN datasets demonstrate that the pluggable n-gram masking mechanism improves the performance of prevalent CSC models, and that the proposed methods outperform multiple strong state-of-the-art models.
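The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' implementation: `ngram_attention_mask` reads "masking current and/or surrounding tokens" as hiding a window around each position when that position is encoded, and `dot_product_gate` reads "dot-product gating" as a sigmoid of a scaled dot product that controls how much of a phonological or morphological vector is added to the semantic one. The function names and the `radius` and `mask_current` parameters are hypothetical.

```python
import math


def ngram_attention_mask(seq_len, radius=1, mask_current=True):
    """Return allowed[i][j]: True if position i may attend to position j.

    Assumed reading of n-gram masking (not taken from the paper): while
    encoding position i, hide every token within `radius` of i. The token
    at i itself is hidden only when `mask_current` is True.
    """
    allowed = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            in_window = abs(i - j) <= radius
            hidden = in_window and (i != j or mask_current)
            row.append(not hidden)
        allowed.append(row)
    return allowed


def dot_product_gate(semantic, modal):
    """Fuse a semantic vector with a phonological/morphological vector.

    Assumed form (not taken from the paper):
        gate  = sigmoid(<semantic, modal> / sqrt(d))
        fused = semantic + gate * modal
    """
    d = len(semantic)
    score = sum(s * m for s, m in zip(semantic, modal)) / math.sqrt(d)
    gate = 1.0 / (1.0 + math.exp(-score))
    return [s + gate * m for s, m in zip(semantic, modal)]


# Position 0 of a 4-token sentence cannot see itself or its right neighbour.
mask = ngram_attention_mask(4, radius=1)
print(mask[0])  # [False, False, True, True]

# Orthogonal vectors give a score of 0, so the gate is 0.5.
print(dot_product_gate([1.0, 0.0], [0.0, 1.0]))  # [1.0, 0.5]
```

Under this reading, hiding the current token forces the encoder to predict it from distant context (avoiding label leakage), and hiding neighbours keeps a nearby error from corrupting the representation. The gate then reintroduces pronunciation or glyph cues that the mask removed.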

research
05/26/2021

Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking

Chinese Spell Checking (CSC) aims to detect and correct erroneous charac...
research
12/08/2022

Investigating Glyph Phonetic Information for Chinese Spell Checking: What Works and What's Next

While pre-trained Chinese language models have demonstrated impressive p...
research
11/15/2022

Chinese Spelling Check with Nearest Neighbors

Chinese Spelling Check (CSC) aims to detect and correct error tokens in ...
research
03/02/2022

The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling...
research
08/30/2019

Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation

The task of Chinese text spam detection is very challenging due to both ...
research
05/24/2023

Disentangled Phonetic Representation for Chinese Spelling Correction

Chinese Spelling Correction (CSC) aims to detect and correct erroneous c...
research
07/17/2022

Contextual Similarity is More Valuable than Character Similarity: Curriculum Learning for Chinese Spell Checking

Chinese Spell Checking (CSC) task aims to detect and correct Chinese spe...
