A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity

03/29/2016
by Ildikó Pilán, et al.

Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we address this issue by presenting the first approach for predicting linguistic complexity for Swedish second language learning material on a 5-point scale. After showing that the traditional Swedish readability measure, Läsbarhetsindex (LIX), is not suitable for this task, we propose a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level. Our model obtained an accuracy of 81.3% and an F-score of 0.8, which is comparable to the state of the art in English and is considerably higher than previously reported results for other languages. We further studied the utility of our features with single sentences instead of full texts, since sentences are a common linguistic unit in language learning exercises. We trained a separate model on sentence-level data with five classes, which yielded 63.4% accuracy. Although this is lower than the document-level performance, we achieved an adjacent accuracy of 92%. Furthermore, we found that using a combination of different features, compared to using lexical features alone, resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant. Our models are intended for use in a freely accessible web-based language learning platform for the automatic generation of exercises.
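Two quantities named in the abstract can be illustrated with a short sketch. LIX is conventionally computed as the average sentence length plus the percentage of long words (more than six characters). Adjacent accuracy is commonly taken to mean the share of predictions that fall within one level of the gold label. The tokenization rules, the function names `lix` and `adjacent_accuracy`, and the integer encoding of the five levels as 1 to 5 are assumptions made for illustration, not details taken from the paper.

```python
import re


def lix(text: str) -> float:
    """Return the LIX readability score of a text.

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than 6 characters.
    """
    # Assumption: words are runs of word characters, and sentences
    # end at '.', '!' or '?'. A real system would use a proper tokenizer.
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def adjacent_accuracy(gold: list[int], predicted: list[int]) -> float:
    """Return the share of predictions within one level of the gold label.

    Assumption: levels are encoded as consecutive integers (e.g. 1 to 5).
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    hits = sum(abs(g - p) <= 1 for g, p in zip(gold, predicted))
    return hits / len(gold)
```

Because LIX looks only at sentence length and word length, two texts with very different vocabulary or syntax can receive the same score, which is consistent with the abstract's finding that a richer feature set is needed for this task.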

research
01/07/2020

Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL

The goal of this work is to build a classifier that can identify text co...
research
07/31/2021

Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts

In order to ensure quality and effective learning, fluency, and comprehe...
research
06/06/2023

Exploring Linguistic Features for Turkish Text Readability

This paper presents the first comprehensive study on automatic readabili...
research
08/21/2023

Age Recommendation from Texts and Sentences for Children

Children have less text understanding capability than adults. Moreover, ...
research
05/14/2021

DaLAJ - a dataset for linguistic acceptability judgments for Swedish: Format, baseline, sharing

We present DaLAJ 1.0, a Dataset for Linguistic Acceptability Judgments f...
research
10/01/2021

Under the Microscope: Interpreting Readability Assessment Models for Filipino

Readability assessment is the process of identifying the level of ease o...
research
12/03/2015

Predicting the top and bottom ranks of billboard songs using Machine Learning

The music industry is a 130 billion industry. Predicting whether a song ...
