YACLC: A Chinese Learner Corpus with Multidimensional Annotation

12/30/2021
by Yingying Wang, et al.

A learner corpus collects language data produced by L2 learners, that is, second- or foreign-language learners. Such a resource is highly relevant to second language acquisition research, foreign-language teaching, and automatic grammatical error correction. However, learner corpora for Chinese as a Foreign Language (CFL) learners have received little attention. We therefore propose to construct a large-scale Chinese learner corpus with multidimensional annotation. To construct the corpus, we first obtain a large number of topic-rich texts written by CFL learners. We then design an annotation scheme that includes a sentence acceptability score as well as grammatical-error and fluency-based corrections. We build a crowdsourcing platform to perform the annotation effectively (https://yaclc.wenmind.net). We name the corpus YACLC (Yet Another Chinese Learner Corpus) and release it as part of the CUGE benchmark (http://cuge.baai.ac.cn). By analyzing the original sentences and annotations in the corpus, we find that YACLC is of considerable size and very high annotation quality. We hope this corpus will further advance research on Chinese International Education and Chinese automatic grammatical error correction.
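
As a rough illustration of what a multidimensional annotation record could look like, the sketch below pairs one learner sentence with acceptability scores and two kinds of corrections. It is not the released YACLC format: the class name, field names, the score scale, and the example sentence are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class AnnotatedSentence:
    """One learner sentence with hypothetical multidimensional annotations.

    All field names are illustrative assumptions, not the corpus schema.
    """
    original: str
    # Acceptability scores from individual annotators (scale is assumed).
    acceptability_scores: List[int] = field(default_factory=list)
    # Corrections that aim only at fixing grammatical errors.
    grammatical_corrections: List[str] = field(default_factory=list)
    # Corrections that additionally aim at more fluent wording.
    fluency_corrections: List[str] = field(default_factory=list)

    def mean_acceptability(self) -> float:
        """Average the annotators' acceptability scores."""
        if not self.acceptability_scores:
            raise ValueError("no acceptability scores recorded")
        return float(mean(self.acceptability_scores))

    def needs_correction(self) -> bool:
        """True if any proposed correction differs from the original."""
        candidates = self.grammatical_corrections + self.fluency_corrections
        return any(c != self.original for c in candidates)


# Invented example sentence (a typical word-order error), not corpus data.
example = AnnotatedSentence(
    original="我学习中文在北京。",
    acceptability_scores=[2, 3, 4],
    grammatical_corrections=["我在北京学习中文。"],
    fluency_corrections=["我在北京学中文。"],
)

print(example.mean_acceptability())
print(example.needs_correction())
```

Keeping the two correction types in separate lists lets a grammatical error correction system be evaluated against either reference set on its own.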


research
08/17/2023

Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis

This paper presents a corpus-based analysis of argument structure errors...
research
04/23/2022

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

This paper presents MuCGEC, a multi-reference multi-source evaluation da...
research
04/22/2016

SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies

We present a new resource for Swedish, SweLL, a corpus of Swedish Learne...
research
10/05/2020

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Many English-as-a-second language learners have trouble using near-synon...
research
05/09/2023

CSED: A Chinese Semantic Error Diagnosis Corpus

Recently, much Chinese text error correction work has focused on Chinese...
research
10/25/2022

Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation

Research on Korean grammatical error correction (GEC) is limited compare...
research
03/14/2022

Interpretability for Language Learners Using Example-Based Grammatical Error Correction

Grammatical Error Correction (GEC) should not focus only on high accurac...
