Chinese Spelling Correction as Rephrasing Language Model

08/17/2023
by Linfeng Liu, et al.

This paper studies Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence. Current state-of-the-art methods treat CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs. However, we note a critical flaw in tagging one character to another: the correction becomes excessively conditioned on the error. This runs counter to the human mindset, in which individuals rephrase the complete sentence based on its semantics rather than relying solely on previously memorized error patterns. Such a counter-intuitive learning process creates a bottleneck in the generalizability and transferability of machine spelling correction. To address this, we propose Rephrasing Language Modeling (ReLM), in which the model is trained to rephrase the entire sentence by infilling additional slots, instead of performing character-to-character tagging. This novel training paradigm achieves new state-of-the-art results across fine-tuned and zero-shot CSC benchmarks, outperforming previous counterparts by a large margin. Our method also learns transferable language representations when CSC is jointly trained with other tasks.
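The contrast between the two training formats can be sketched in a few lines. The example below is a hypothetical illustration of the idea described in the abstract, not the paper's exact implementation: the mask token, slot layout, and loss masking are assumptions.

```python
# Hypothetical sketch contrasting sequence tagging with rephrasing-style
# training for CSC. Exact tokens and layout are illustrative assumptions.

MASK = "[MASK]"


def tagging_example(src: str, tgt: str):
    """Sequence tagging: each source character is mapped directly to a
    target character, so the correction is conditioned on the error
    position itself."""
    assert len(src) == len(tgt)
    return list(src), list(tgt)


def relm_example(src: str, tgt: str):
    """Rephrasing: the model sees the full (possibly misspelled) sentence
    followed by mask slots and must infill the corrected sentence,
    conditioning on whole-sentence semantics rather than per-character
    error patterns."""
    inputs = list(src) + [MASK] * len(tgt)
    # None marks positions excluded from the loss: only the infilled
    # slots are supervised.
    targets = [None] * len(src) + list(tgt)
    return inputs, targets


if __name__ == "__main__":
    # "平果" is a misspelling of "苹果" (apple): 平 -> 苹.
    src, tgt = "我喜欢吃平果", "我喜欢吃苹果"
    print(tagging_example(src, tgt))
    print(relm_example(src, tgt))
```

In the tagging format the model predicts one output character per input position; in the rephrasing format the whole corrected sentence is regenerated in appended slots, which is what decouples the correction from memorized error patterns.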


research
05/28/2023

Rethinking Masked Language Modeling for Chinese Spelling Correction

In this paper, we study Chinese Spelling Correction (CSC) as a joint dec...
research
03/01/2022

"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction

Whole word masking (WWM), which masks all subwords corresponding to a wo...
research
08/31/2023

DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

We present DictaBERT, a new state-of-the-art pre-trained BERT model for ...
research
05/15/2020

Spelling Error Correction with Soft-Masked BERT

Spelling error correction is an important yet challenging task because a...
research
09/15/2022

uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers

The task of Chinese Spelling Check (CSC) is aiming to detect and correct...
research
12/28/2020

Universal Sentence Representation Learning with Conditional Masked Language Model

This paper presents a novel training method, Conditional Masked Language...
research
02/28/2019

Better, Faster, Stronger Sequence Tagging Constituent Parsers

Sequence tagging models for constituent parsing are faster, but less acc...
