ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

09/19/2022
by   Kai North, et al.
0

Lexical simplification (LS) is the task of automatically replacing complex words for easier ones making texts more accessible to various target populations (e.g. individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT has been compiled following the ALEXSIS protocol for Spanish opening exciting new avenues for cross-lingual models. ALEXSIS-PT is the first LS multi-candidate dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/12/2020

Toward Cross-Lingual Definition Generation for Language Learners

Generating dictionary definitions automatically can prove useful for lan...
research
05/19/2023

Deep Learning Approaches to Lexical Simplification: A Survey

Lexical Simplification (LS) is the task of replacing complex for simpler...
research
03/10/2020

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

We introduce Multi-SimLex, a large-scale lexical resource and evaluation...
research
10/09/2014

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simp...
research
03/03/2020

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Cross-lingual entity linking (XEL) is the task of finding referents in a...
research
05/08/2018

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Gender prediction has typically focused on lexical and social network fe...
research
09/17/2020

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Recognizing toponyms and resolving them to their real-world referents is...

Please sign up or login with your details

Forgot password? Click here to reset