Measuring Lexical Diversity in Texts: The Twofold Length Problem

07/10/2023
by Yves Bestgen, et al.

The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century. Numerous indices have been proposed, and many studies have been conducted to evaluate them, but the problem remains. This methodological review provides a critical analysis not only of the most commonly used indices in language learning studies, but also of the length problem itself, as well as of the methodology for evaluating the proposed solutions. The analysis of three datasets of English language-learners' texts revealed that indices that reduce all texts to the same length using a probabilistic or an algorithmic approach solve the length dependency problem; however, all these indices failed to address the second problem, which is their sensitivity to the parameter that determines the length to which the texts are reduced. The paper concludes with recommendations for optimizing lexical diversity analysis.
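The two problems named in the abstract can be illustrated with a small sketch. It is not the paper's method or any of the indices it evaluates. It assumes the simplest possible setup: the raw type-token ratio (TTR) as the length-dependent baseline, and a probabilistic reduction that averages the TTR over random subsamples of a fixed size. The function names, the `sample_size` parameter, and the synthetic Zipf-like text generator are all hypothetical choices made for illustration.

```python
import random


def ttr(tokens):
    """Raw type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mean_sampled_ttr(tokens, sample_size, n_samples=200, seed=0):
    """Average TTR over random subsamples of a fixed size.

    Illustrative stand-in for an index that reduces every text to the
    same length; `sample_size` is the reduction parameter.
    """
    if sample_size > len(tokens):
        raise ValueError("sample_size cannot exceed the text length")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sampling without replacement keeps each subsample a subset of the text.
        total += ttr(rng.sample(tokens, sample_size))
    return total / n_samples


def zipf_text(n_tokens, vocab_size=500, seed=1):
    """Synthetic text (assumption): words drawn with weight 1/rank."""
    rng = random.Random(seed)
    vocab = [f"w{rank}" for rank in range(1, vocab_size + 1)]
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(vocab, weights=weights, k=n_tokens)


if __name__ == "__main__":
    short_text = zipf_text(200, seed=1)
    long_text = zipf_text(2000, seed=2)

    # First problem: the raw TTR drops as the text gets longer.
    print("raw TTR  short:", round(ttr(short_text), 3),
          " long:", round(ttr(long_text), 3))

    # Reducing both texts to the same length makes them comparable.
    print("sampled TTR at 50  short:",
          round(mean_sampled_ttr(short_text, 50), 3),
          " long:", round(mean_sampled_ttr(long_text, 50), 3))

    # Second problem: the value still depends on the chosen sample size.
    for size in (50, 100, 150):
        print("sample_size", size, "->",
              round(mean_sampled_ttr(short_text, size), 3))
```

In this sketch, fixing `sample_size` removes most of the dependence on the original text length, but the score still shifts when `sample_size` itself changes, so values computed with different settings are not directly comparable.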

research
02/07/2015

An investigation into language complexity of World-of-Warcraft game-external texts

We present a language complexity analysis of World of Warcraft (WoW) com...
research
01/03/2023

Measuring the diversity of data and metadata in digital libraries

Diversity indices have been traditionally used to capture the biodiversi...
research
05/16/2022

Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks

Discourse cohesion facilitates text comprehension and helps the reader f...
research
03/08/2023

Lexical Complexity Prediction: An Overview

The occurrence of unknown words in texts significantly hinders reading c...
research
04/27/2021

Top-tier and predatory alike? A lexical structure perspective from the Academy of Management Journal and Espacios

This study compares the lexical structure of articles titles and abstrac...
research
05/17/2020

Analyzing the relationship between text features and research proposal productivity

Predicting the output of research grants is of considerable relevance to...
research
09/12/2022

Lexical Simplification Benchmarks for English, Portuguese, and Spanish

Even in highly-developed countries, as many as 15-30% of the population ...
