Adaptable Filtering using Hierarchical Embeddings for Chinese Spell Check

08/27/2020
by   Minh Nguyen, et al.

Spell check is a useful application that involves processing noisy human-generated text. Detecting and correcting spelling errors is more challenging in Chinese than in languages such as English because Chinese has more characters (up to 100k). For Chinese spell check, confusion sets narrow the search space and make corrections easier to find. However, most, if not all, confusion sets used to date are fixed and thus do not include new, evolving error patterns. We propose a scalable approach to adapt confusion sets by exploiting hierarchical character embeddings to (1) obviate the need to handcraft confusion sets, and (2) resolve sparsity issues related to seldom-occurring errors. Our approach establishes new state-of-the-art results in spelling error correction on the 2014 and 2015 Chinese Spelling Correction Bake-off datasets.
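
To illustrate how a fixed confusion set narrows the search space, here is a minimal Python sketch. It is not the authors' method: the `CONFUSION_SET` entries, the function names, and the candidate scores are hypothetical, and the scores stand in for whatever a language model might assign to each replacement character.

```python
from typing import Dict, Set

# Hypothetical toy confusion set: each character maps to the characters it is
# assumed to be commonly confused with (illustrative entries, not from the paper).
CONFUSION_SET: Dict[str, Set[str]] = {
    "在": {"再"},
    "再": {"在"},
    "的": {"得", "地"},
}


def filter_candidates(char: str, scores: Dict[str, float]) -> Dict[str, float]:
    """Keep only candidates that are the input character or in its confusion set."""
    allowed = CONFUSION_SET.get(char, set()) | {char}
    return {c: s for c, s in scores.items() if c in allowed}


def correct_char(char: str, scores: Dict[str, float]) -> str:
    """Return the highest-scoring allowed candidate, or the input if none remain."""
    filtered = filter_candidates(char, scores)
    if not filtered:
        return char
    return max(filtered, key=filtered.get)
```

Because the set is fixed, a character with no entry (or an error pattern missing from its entry) can never be corrected to anything but itself, which is the limitation the abstract points to when it argues for adaptable confusion sets.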


