MultiAzterTest: a Multilingual Analyzer on Multiple Levels of Language for Readability Assessment

09/10/2021
by Kepa Bengoetxea, et al.

Readability assessment is the task of determining how difficult or easy a text is, or which level or grade it corresponds to. Traditionally, language-dependent readability formulae have been used, but these formulae take few text characteristics into account. By contrast, Natural Language Processing (NLP) tools that assess the complexity of texts can measure many more features and can be adapted to different languages. In this paper, we present the MultiAzterTest tool: (i) an open-source NLP tool that analyzes texts on over 125 measures of cohesion, language, and readability for English, Spanish, and Basque, and whose architecture is designed to be easily adapted to other languages; (ii) readability assessment classifiers that improve on the performance of Coh-Metrix in English, Coh-Metrix-Esp in Spanish, and ErreXail in Basque; and (iii) a web tool. MultiAzterTest obtains 90.09% accuracy when classifying texts into three reading levels (elementary, intermediate, and advanced) in English, and 95.50% when classifying into two levels (simple and complex), using an SMO classifier. Using cross-lingual features, MultiAzterTest also obtains competitive results, above all in the complex vs. simple distinction.
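
To illustrate the point that traditional formulae take few text characteristics into account, the following is a minimal sketch of one well-known English formula, Flesch Reading Ease. The abstract does not name this formula; it is used here only as an example. It relies on two surface features, average sentence length and average syllables per word. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a linguistically accurate one.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count runs of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive segmentation: split sentences on ., ! and ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = "The cat sat. The dog ran."
complex_text = (
    "Multilingual readability assessment necessitates comprehensive "
    "computational linguistic analysis."
)
print(flesch_reading_ease(simple) > flesch_reading_ease(complex_text))
```

Because the score depends only on these two ratios and on English-specific constants, it cannot reflect cohesion or syntactic features, and it does not transfer directly to other languages.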
