Cross-corpus Readability Compatibility Assessment for English Texts

06/16/2023
by Zhenzhen Li, et al.

Text readability assessment has gained significant attention from researchers in various domains. However, different research groups use different corpora, and the compatibility between those corpora has been little explored. In this study, we propose a novel evaluation framework, Cross-corpus text Readability Compatibility Assessment (CRCA), to address this issue. The framework encompasses three key components: (1) Corpora and features: six corpora (CEFR, CLEC, CLOTH, NES, OSP, and RACE), from which linguistic features, GloVe word vector representations, and their fusion features were extracted. (2) Classification models: machine learning methods (XGBoost, SVM) and deep learning methods (BiLSTM, Attention-BiLSTM). (3) Compatibility metrics: RJSD, RRNSS, and NDCG. Our findings show: (1) Corpus compatibility was validated, with OSP differing significantly from the other datasets. (2) There is an adaptation effect among corpora, feature representations, and classification methods. (3) The three metrics yield consistent outcomes, supporting the robustness of the compatibility assessment framework. These outcomes offer valuable insights into corpus selection, feature representation, and classification methods, and can serve as a starting point for cross-corpus transfer learning.
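Of the three compatibility metrics named above, NDCG has a widely used standard definition, so a small illustration may help. The sketch below assumes the common log2-discounted form; the abstract does not say how CRCA applies NDCG to corpus compatibility, so the function names and the sample relevance scores are hypothetical, and RJSD and RRNSS are left out because their definitions are not given here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    # The item at rank r (1-based) contributes rel / log2(r + 1).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # With no positive relevance there is nothing to rank, so return 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance scores, listed in the order a model ranked them.
    print(ndcg([3, 2, 1]))  # already in ideal order
    print(ndcg([1, 2, 3]))  # reversed order scores lower
```

A score of 1 means the predicted ordering matches the ideal ordering; lower values indicate that highly relevant items were ranked too low.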


