A User-Centered Evaluation of Spanish Text Simplification

08/15/2023
by Adrian de Wynter, et al.

We present an evaluation of text simplification (TS) in Spanish for a production system, by means of two corpora focused on both complex-sentence and complex-word identification. We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS. As part of our analysis, we find that multilingual models underperform against equivalent Spanish-only models on the same task, yet all models focus too often on spurious statistical features, such as sentence length. We release the corpora in our evaluation to the broader community in the hope of pushing forward the state of the art in Spanish natural language processing.


