An Alignment-Agnostic Model for Chinese Text Error Correction

04/15/2021
by Liying Zheng, et al.

This paper investigates how to correct Chinese text errors involving mistaken, missing, and redundant characters, which are common among native Chinese speakers. Most existing models based on the detect-correct framework can correct mistaken characters, but they cannot deal with missing or redundant characters. The reason is that the sentence lengths before and after correction differ, leading to an inconsistency between model inputs and outputs. Although Seq2Seq-based and sequence tagging methods provide solutions to this problem and have achieved relatively good results on English text, they do not perform well on Chinese text according to our experimental results. In our work, we propose a novel detect-correct framework that is alignment-agnostic, meaning that it can handle both aligned and non-aligned text, and it can also serve as a cold-start model when no annotated data are provided. Experimental results on three datasets demonstrate that our method is effective and achieves the best performance among existing published models.
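
To make the three error types concrete, the sketch below labels the differences between an erroneous sentence and its correction as mistaken, missing, or redundant characters. It is only an illustration of why the latter two change the sentence length; it is not the framework proposed in the paper. The function name `classify_errors` and the example sentences are assumptions introduced here, and the alignment comes from Python's standard `difflib`, not from any model.

```python
from difflib import SequenceMatcher

# Map difflib edit operations onto the abstract's three error types.
# This mapping is an illustrative assumption, not the paper's method.
ERROR_TYPES = {"replace": "mistaken", "insert": "missing", "delete": "redundant"}


def classify_errors(source: str, target: str):
    """Return (error_type, source_span, target_span) for each differing span.

    `source` is the erroneous sentence, `target` is the corrected one.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (ERROR_TYPES[tag], source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical sentence pairs: (erroneous, corrected).
    pairs = [
        ("我喜欢吃平果", "我喜欢吃苹果"),    # wrong character, same length
        ("我喜欢苹果", "我喜欢吃苹果"),      # one character missing
        ("我喜欢吃吃苹果", "我喜欢吃苹果"),  # one character repeated
    ]
    for source, target in pairs:
        print(len(source), len(target), classify_errors(source, target))
```

Only the mistaken-character case keeps the two sentences the same length, so a model that emits exactly one output character per input character can represent that case but not the other two.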

research
10/22/2022

FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Grammatical Error Correction (GEC) has been broadly applied in automatic...
research
08/27/2020

Adaptable Filtering using Hierarchical Embeddings for Chinese Spell Check

Spell check is a useful application which involves processing noisy huma...
research
03/02/2022

The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling...
research
03/05/2018

Automatic Transferring between Ancient Chinese and Contemporary Chinese

During the long time of development, Chinese language has evolved a grea...
research
05/24/2023

Disentangled Phonetic Representation for Chinese Spelling Correction

Chinese Spelling Correction (CSC) aims to detect and correct erroneous c...
research
02/01/2021

Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning

The majority of Chinese characters are monophonic, i.e. their pronunciati...
research
05/31/2021

Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models

A sequence-to-sequence learning with neural networks has empirically pro...
