CEFR-Based Sentence Difficulty Annotation and Assessment

10/21/2022
by Yuki Arase, et al.

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, which contains 17k English sentences annotated by English-education professionals with levels from the Common European Framework of Reference for Languages (CEFR). In addition, we propose a sentence-level assessment model designed to handle the unbalanced level distribution, because the most basic and the most highly proficient sentences are naturally scarce. In our experiments, the method achieved a macro-F1 score of 84.5%, outperforming baselines employed in readability assessment.
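
Macro-F1 is a natural metric when levels are unbalanced, since it averages per-level F1 scores with equal weight regardless of how many sentences each level has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1` and the toy gold/predicted labels are assumptions made purely for illustration.

```python
from collections import Counter

# The six CEFR levels, from most basic to most proficient.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def macro_f1(gold, pred, labels=CEFR_LEVELS):
    """Unweighted mean of per-level F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted level p, but the sentence was not p
            fn[g] += 1  # gold level g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A level that is never gold and never predicted scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy, invented data: a classifier that always predicts the majority level.
gold = ["B1"] * 8 + ["A1", "C2"]
pred = ["B1"] * 10
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
score = macro_f1(gold, pred, labels=["A1", "B1", "C2"])
print(f"accuracy={accuracy:.2f} macro-F1={score:.2f}")
```

In this toy case the always-majority predictor looks strong on accuracy but weak on macro-F1, because the two rare levels each contribute an F1 of zero to the average.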

research
05/23/2023

Towards Massively Multi-domain Multilingual Readability Assessment

We present ReadMe++, a massively multi-domain multilingual dataset for a...
research
06/17/2019

Manipulating the Difficulty of C-Tests

We propose two novel manipulation strategies for increasing and decreasi...
research
12/20/2022

Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

We present, Naamapadam, the largest publicly available Named Entity Reco...
research
10/18/2020

hinglishNorm – A Corpus of Hindi-English Code Mixed Sentences for Text Normalization

We present hinglishNorm – a human annotated corpus of Hindi-English code...
research
11/23/2020

An Interactive Foreign Language Trainer Using Assessment and Feedback Modalities

English has long been set as the universal language. Basically most, if ...
research
11/23/2019

Discourse Level Factors for Sentence Deletion in Text Simplification

This paper presents a data-driven study focusing on analyzing and predic...
research
09/26/2020

ARPA: Armenian Paraphrase Detection Corpus and Models

In this work, we employ a semi-automatic method based on back translatio...