RSpell: Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check

08/16/2023
by Siqi Song, et al.

Chinese Spelling Check (CSC) is the task of detecting and correcting spelling errors in Chinese text. In practical applications, CSC models need to correct errors across different domains. In this paper, we propose RSpell, a retrieval-augmented spelling check framework that searches for relevant domain terms and incorporates them into CSC models. Specifically, we use pinyin fuzzy matching to retrieve terms, which are combined with the input and fed into the CSC model. We then introduce an adaptive process control mechanism that dynamically adjusts the impact of external knowledge on the model. Additionally, we develop an iterative strategy for the RSpell framework to enhance reasoning capabilities. We conduct experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell.

research
03/15/2023

UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

Large Language Models (LLMs) are popular for their impressive abilities,...
research
10/20/2022

Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

Chinese spelling check (CSC) is a fundamental NLP task that detects and ...
research
06/06/2022

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval

Recent work has shown that small distilled language models are strong co...
research
03/21/2022

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

The lack of label data is one of the significant bottlenecks for Chinese...
research
12/12/2022

In Defense of Cross-Encoders for Zero-Shot Retrieval

Bi-encoders and cross-encoders are widely used in many state-of-the-art ...
research
08/01/2023

Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation

Interpreting ancient Chinese has been the key to comprehending vast Chin...
research
11/15/2022

Chinese Spelling Check with Nearest Neighbors

Chinese Spelling Check (CSC) aims to detect and correct error tokens in ...