Spelling Error Correction with Soft-Masked BERT

05/15/2020
by   Shaohua Zhang, et al.

Spelling error correction is an important yet challenging task because a satisfactory solution to it essentially requires human-level language understanding. Without loss of generality, we consider Chinese spelling error correction (CSC) in this paper. A state-of-the-art method for the task uses BERT, the language representation model, to select a character from a list of candidates for correction (including non-correction) at each position of the sentence. The accuracy of the method can be sub-optimal, however, because BERT does not have sufficient capability to detect whether there is an error at each position, apparently because it is pre-trained with masked language modeling. In this work, we propose a novel neural architecture to address this issue. It consists of a network for error detection and a BERT-based network for error correction, with the former connected to the latter by what we call the soft-masking technique. Our method, 'Soft-Masked BERT', is general and may be employed in other language detection-correction problems. Experimental results on two datasets demonstrate that our proposed method performs significantly better than the baselines, including the one based solely on BERT.
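
The abstract does not spell out how soft-masking connects the two networks. A minimal sketch follows, under the assumption that each character's input embedding is blended with a mask embedding in proportion to the error probability produced by the detection network. The function name `soft_mask`, its parameters, and the toy vectors are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def soft_mask(
    embeddings: List[Vector],
    error_probs: List[float],
    mask_embedding: Vector,
) -> List[Vector]:
    """Blend each input embedding with the mask embedding.

    Assumed rule (illustrative): e'_i = p_i * e_mask + (1 - p_i) * e_i,
    where p_i is the detection network's error probability at position i.
    """
    if len(embeddings) != len(error_probs):
        raise ValueError("need one error probability per position")
    blended = []
    for emb, p in zip(embeddings, error_probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("error probabilities must lie in [0, 1]")
        if len(emb) != len(mask_embedding):
            raise ValueError("embedding sizes must match")
        # p near 1: the position looks erroneous, so it is (almost) masked.
        # p near 0: the original embedding passes through (almost) unchanged.
        blended.append(
            [p * m + (1.0 - p) * x for x, m in zip(emb, mask_embedding)]
        )
    return blended


# Toy usage: three positions with 2-dimensional embeddings.
inputs = [[1.0, 2.0], [3.0, 4.0], [2.0, 4.0]]
probs = [0.0, 1.0, 0.5]  # hypothetical detector outputs
mask = [0.0, 0.0]        # hypothetical mask embedding
soft_masked = soft_mask(inputs, probs, mask)
```

In this sketch the blended embeddings would then be fed to the BERT-based correction network. A probability of 0 leaves a position untouched and a probability of 1 is equivalent to hard-masking it, which is why the interpolation is called soft.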


research
03/01/2022

"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction

Whole word masking (WWM), which masks all subwords corresponding to a wo...
research
01/10/2020

Towards Minimal Supervision BERT-based Grammar Error Correction

Current grammatical error correction (GEC) models typically consider the...
research
09/29/2021

Hierarchical Character Tagger for Short Text Spelling Error Correction

State-of-the-art approaches to spelling error correction problem include...
research
05/19/2021

Combining GCN and Transformer for Chinese Grammatical Error Detection

This paper describes our system at NLPTEA-2020 Task: Chinese Grammatical...
research
08/17/2023

Chinese Spelling Correction as Rephrasing Language Model

This paper studies Chinese Spelling Correction (CSC), which aims to dete...
research
06/03/2021

Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction

We investigate the problem of Chinese Grammatical Error Correction (CGEC...
research
05/28/2023

Rethinking Masked Language Modeling for Chinese Spelling Correction

In this paper, we study Chinese Spelling Correction (CSC) as a joint dec...
