Automatic Difficulty Classification of Arabic Sentences

by Nouran Khallaf, et al.

In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or a binary classification as simple or complex. We compare the use of sentence embeddings of different kinds (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results have been achieved using fine-tuned Arabic-BERT. Our 3-way CEFR classification reaches an F-1 of 0.80 with Arabic-BERT and 0.75 with XLM-R, and regression reaches a Spearman correlation of 0.71. Our binary difficulty classifier reaches an F-1 of 0.94, and our sentence-pair semantic similarity classifier reaches an F-1 of 0.98.
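For readers unfamiliar with the two metrics reported above, the following is a minimal pure-Python sketch of how an F-1 score and a Spearman correlation can be computed. It is an illustration only, not the authors' evaluation code: the abstract does not say whether F-1 is macro- or weighted-averaged, so macro averaging is an assumption here, and the function names, the labels "A", "B", "C" and the toy data are all hypothetical.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F-1 (macro averaging is an assumption)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F-1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank vectors (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: three difficulty classes for six sentences.
gold = ["A", "A", "B", "B", "C", "C"]
pred = ["A", "B", "B", "B", "C", "A"]
print(round(macro_f1(gold, pred), 3))

# Hypothetical regression scores against ordinal gold levels.
print(round(spearman([1, 1, 2, 2, 3, 3], [0.9, 1.4, 1.8, 2.5, 2.9, 1.1]), 3))
```

F-1 rewards a classifier that balances precision and recall on each level, while Spearman correlation only checks whether predicted difficulty scores preserve the ordering of the gold levels, which is why it suits the regression setting.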



