MCTS: A Multi-Reference Chinese Text Simplification Dataset

06/05/2023
by Ruining Chong, et al.
Text simplification aims to make text easier to understand by applying rewriting transformations. Chinese text simplification has long received very little research attention, largely because generic evaluation data is lacking. In this paper, we introduce MCTS, a multi-reference Chinese text simplification dataset. We describe the annotation process of the dataset and provide a detailed analysis of it. Furthermore, we evaluate the performance of several unsupervised methods and advanced large language models. Through this foundational work, we hope to build a basic understanding of Chinese text simplification and provide references for future research. We release our data at https://github.com/blcuicall/mcts.


