Complex Word Identification: Challenges in Data Annotation and System Performance

10/13/2017
by Marcos Zampieri et al.

This paper revisits the problem of complex word identification (CWI), following up on the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. We then analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and that one of the reasons for this is the way in which the human annotation was performed.
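As a rough illustration of what an ensemble classifier for CWI can look like, the sketch below combines three toy word-level classifiers by majority voting. It treats CWI as binary classification (1 = complex, 0 = non-complex). The features, thresholds, frequency list and all function names are hypothetical and chosen only for illustration. The features are word length, frequency in a tiny made-up frequency list, and a crude vowel-group syllable estimate. This is a minimal sketch under those assumptions, not the system evaluated in the paper.

```python
from collections import Counter
from typing import Callable, Dict, List

# A classifier maps a word to a label: 1 = complex, 0 = non-complex.
Classifier = Callable[[str], int]

# Hypothetical toy frequency list; a real system would use corpus counts.
TOY_FREQUENCIES: Dict[str, int] = {
    "the": 1000, "house": 300, "walk": 250, "ubiquitous": 3, "ephemeral": 2,
}

def length_clf(word: str) -> int:
    # Long words are labeled complex (the 8-character threshold is an assumption).
    return int(len(word) >= 8)

def frequency_clf(word: str) -> int:
    # Rare or unseen words are labeled complex (the threshold of 10 is an assumption).
    return int(TOY_FREQUENCIES.get(word.lower(), 0) < 10)

def vowel_group_clf(word: str) -> int:
    # Crude syllable estimate: count runs of consecutive vowels.
    groups, prev_is_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev_is_vowel:
            groups += 1
        prev_is_vowel = is_vowel
    # Words with three or more vowel groups are labeled complex (assumption).
    return int(groups >= 3)

def majority_vote(classifiers: List[Classifier], word: str) -> int:
    # Each classifier casts one vote; the most frequent label wins.
    # With an odd number of binary voters there can be no tie.
    votes = Counter(clf(word) for clf in classifiers)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    ensemble = [length_clf, frequency_clf, vowel_group_clf]
    for w in ["the", "house", "ubiquitous", "ephemeral", "cat"]:
        print(w, majority_vote(ensemble, w))
```

The word "cat" shows why an ensemble can help. The frequency classifier labels it complex only because it is missing from the toy list. The other two classifiers outvote it, so the ensemble labels it non-complex.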
