Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

10/20/2022
by Jiahao Li, et al.

Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts. Because most of these spelling errors are caused by phonetic similarity, effectively modeling the pronunciation of Chinese characters is a key factor for CSC. In this paper, we introduce an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC and, for the first time, systematically discuss the adaptivity and granularity of this auxiliary task. We propose SCOPE, which builds two parallel decoders on top of a shared encoder, one for the primary CSC task and the other for a fine-grained auxiliary CPP task, and balances the two tasks with a novel adaptive weighting scheme. In addition, we design a carefully crafted iterative correction strategy for further improvements during inference. Empirical evaluation shows that SCOPE achieves new state-of-the-art results on three CSC benchmarks, demonstrating the effectiveness and superiority of the auxiliary CPP task. Comprehensive ablation studies further verify the positive effects of the adaptivity and granularity of the task. Code and data used in this paper are publicly available at https://github.com/jiahaozhenbang/SCOPE.
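
The abstract does not define what "fine-grained" means for the CPP task. One plausible reading, assumed here purely for illustration, is that instead of predicting a whole pinyin syllable such as `zhong1` as a single label, the auxiliary decoder predicts its initial, final, and tone as separate labels. The sketch below shows only that label decomposition; `split_pinyin` and `INITIALS` are hypothetical names, not part of the SCOPE code release, and the initial inventory is simplified (for example, `y` and `w` are treated as initials).

```python
# Hypothetical helper (not from the SCOPE repository): split a numbered-tone
# pinyin syllable into (initial, final, tone) labels for a fine-grained CPP head.
# Simplification: "y" and "w" are treated as initials.
INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
     "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s", "y", "w"],
    key=len,
    reverse=True,  # try two-letter initials (zh/ch/sh) before z/c/s
)


def split_pinyin(syllable: str) -> tuple[str, str, int]:
    """Split e.g. 'zhong1' into ('zh', 'ong', 1); no tone digit means neutral tone 5."""
    syllable = syllable.strip().lower()
    if syllable and syllable[-1] in "12345":
        tone = int(syllable[-1])
        body = syllable[:-1]
    else:
        tone = 5
        body = syllable
    for initial in INITIALS:
        # Require a non-empty final so the whole body is never consumed as an initial.
        if body.startswith(initial) and len(body) > len(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # zero-initial syllable such as 'an' or 'er'


print([split_pinyin(s) for s in ["zhong1", "guo2"]])  # → [('zh', 'ong', 1), ('g', 'uo', 2)]
```

Under this decomposition, 中 (`zhong1`) and 肿 (`zhong3`) share the initial and final labels and differ only in tone, so a component-level loss could give partial credit for a near-miss pronunciation, whereas a single syllable-level label treats them as entirely unrelated classes. Whether SCOPE uses exactly these components is not stated in this abstract.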
