NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts

05/25/2023
by Yue Zhang, et al.

We introduce NaSGEC, a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains. Previous CGEC research primarily focuses on correcting texts from a single domain, especially learner essays. To broaden the target domain, we annotate multiple references for 12,500 sentences from three native domains, i.e., social media, scientific writing, and examination. We provide solid benchmark results for NaSGEC by employing cutting-edge CGEC models and different training data. We further perform detailed analyses of the connections and gaps between our domains from both empirical and statistical views. We hope this work can inspire future studies on an important but under-explored direction: cross-domain GEC.

research
10/15/2020

Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

Evaluation of grammatical error correction (GEC) systems has primarily f...
research
10/22/2022

FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Grammatical Error Correction (GEC) has been broadly applied in automatic...
research
10/19/2022

Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction

Chinese Grammatical Error Correction (CGEC) is both a challenging NLP ta...
research
08/11/2022

Overview of CTC 2021: Chinese Text Correction for Native Speakers

In this paper, we present an overview of the CTC 2021, a Chinese text co...
research
05/13/2022

MuCPAD: A Multi-Domain Chinese Predicate-Argument Dataset

During the past decade, neural network models have made tremendous progr...
research
10/25/2022

Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation

Research on Korean grammatical error correction (GEC) is limited compare...
