Predicting Lexical Complexity in English Texts

02/17/2021
by Matthew Shardlow, et al.

The first step in most text simplification pipelines is to predict which words are considered complex for a given target population before carrying out lexical substitution. This task is commonly referred to as Complex Word Identification (CWI), and it is often modelled as a supervised classification problem. Training such systems requires annotated datasets in which words, and sometimes multi-word expressions, are labelled for complexity. In this paper, we analyze previous work carried out on this task and investigate the properties of complex word identification datasets for English.
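
As a rough illustration of what "modelled as a supervised classification problem" can mean, the sketch below fits a single word-length cutoff (a decision stump) to a handful of labelled words. The word list, the labels, and the names `TRAIN`, `fit_length_threshold`, and `predict` are all invented for this example; it is not the approach taken in the paper, and real CWI systems typically use richer features and larger annotated datasets.

```python
# Toy sketch of CWI as supervised binary classification.
# All data below is hypothetical and not drawn from any CWI dataset.
from typing import List, Tuple

# (word, label) pairs: 1 = complex, 0 = not complex (invented annotations)
TRAIN: List[Tuple[str, int]] = [
    ("cat", 0), ("house", 0), ("water", 0), ("green", 0),
    ("ubiquitous", 1), ("ameliorate", 1), ("perfunctory", 1), ("obfuscate", 1),
]


def fit_length_threshold(data: List[Tuple[str, int]]) -> int:
    """Return the smallest length cutoff with the highest training accuracy."""
    best_threshold, best_correct = 0, -1
    max_len = max(len(word) for word, _ in data)
    for threshold in range(1, max_len + 2):
        # A word is predicted complex when its length reaches the cutoff.
        correct = sum(int(len(word) >= threshold) == label for word, label in data)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(word: str, threshold: int) -> int:
    """Label a word as complex (1) or not complex (0) using the learned cutoff."""
    return int(len(word) >= threshold)


THRESHOLD = fit_length_threshold(TRAIN)

if __name__ == "__main__":
    for candidate in ["dog", "incomprehensible"]:
        print(candidate, predict(candidate, THRESHOLD))
```

Word length is only a stand-in feature here; the point is the shape of the task: labelled examples go in, a decision rule comes out, and that rule is applied to unseen words.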


research
03/16/2020

CompLex — A New Corpus for Lexical Complexity Predicition from Likert Scale Data

Predicting which words are considered hard to understand for a given tar...
research
10/13/2017

Complex Word Identification: Challenges in Data Annotation and System Performance

This paper revisits the problem of complex word identification (CWI) fol...
research
03/08/2023

Lexical Complexity Prediction: An Overview

The occurrence of unknown words in texts significantly hinders reading c...
research
05/15/2022

Domain Adaptation in Multilingual and Multi-Domain Monolingual Settings for Complex Word Identification

Complex word identification (CWI) is a cornerstone process towards prope...
research
05/12/2020

Detecting Multiword Expression Type Helps Lexical Complexity Assessment

Multiword expressions (MWEs) represent lexemes that should be treated as...
research
10/29/2021

Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

This paper summarizes the main findings of the ADoBo 2021 shared task, p...
research
09/12/2022

Lexical Simplification Benchmarks for English, Portuguese, and Spanish

Even in highly-developed countries, as many as 15-30% of the population ...
