CSED: A Chinese Semantic Error Diagnosis Corpus

05/09/2023
by Bo Sun, et al.

Recently, much work on Chinese text error correction has focused on Chinese Spelling Check (CSC) and Chinese Grammatical Error Diagnosis (CGED). In contrast, little attention has been paid to the more complicated problem of Chinese Semantic Error Diagnosis (CSED), which lacks relevant datasets. Studying semantic errors is important because they are very common and may lead to syntactic irregularities or even problems of comprehension. To investigate this, we build the CSED corpus, which includes two datasets: one for the CSED-Recognition (CSED-R) task and one for the CSED-Correction (CSED-C) task. Our annotation process uses quality assurance mechanisms to ensure high-quality data. Our experiments show that powerful pre-trained models perform poorly on this corpus. We also find that the CSED task is challenging, as evidenced by the fact that even humans receive a low score. We propose syntax-aware models specifically adapted to the CSED task, and the experimental results show that introducing the syntax-aware approach is beneficial.

research
04/15/2022

Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition

Existing Chinese text error detection mainly focuses on spelling and sim...
research
10/22/2022

FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Grammatical Error Correction (GEC) has been broadly applied in automatic...
research
08/27/2020

Adaptable Filtering using Hierarchical Embeddings for Chinese Spell Check

Spell check is a useful application which involves processing noisy huma...
research
12/30/2021

YACLC: A Chinese Learner Corpus with Multidimensional Annotation

Learner corpus collects language data produced by L2 learners, that is s...
research
11/16/2021

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Due to the recent advances of natural language processing, several works...
research
11/16/2022

CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME

Chinese Spelling Correction (CSC) is a task to detect and correct spelli...
research
11/05/2021

A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction

Grammatical Error Correction (GEC) is a task of detecting and correcting...
