Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature

by Joseph Marvin Imperial, et al.

Proper identification of the grade levels of children's reading materials is an important step towards effective learning. Recent studies in readability assessment for English have applied modern natural language processing (NLP) approaches, such as machine learning (ML) techniques, to automate the process. Modeling readability formulas also requires extracting the right linguistic features. For the Filipino language, limited work has been done [1, 2], especially work that treats the language's lexical complexity as the main features. In this paper, we explore the use of lexical features to improve readability identification of children's books written in Filipino. Results show that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with the traditional features (TRAD) used by previous works, such as sentence length, average syllable length, polysyllabic words, and word, sentence, and phrase counts, increased the performance of readability models by almost 5%. A ranking of the most important features is also shown, identifying which features contribute the most to reading complexity.
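To make the LEX feature group concrete, the following is a minimal sketch of how the four lexical features named above could be computed from part-of-speech-tagged text. The definitions are common textbook ones and are assumptions here, since the abstract does not give the paper's exact formulas: type-token ratio as unique words over total words, lexical density as content words over total words, and lexical variation as unique content words over total content words. The tag names (`NOUN`, `VERB`, `ADJ`, `ADV`, `FW`), the function `lexical_features`, and the sample sentence with its tags are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Assumed coarse tag inventory; the paper's actual tagger and tag set
# are not stated in the abstract.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}
FOREIGN_TAG = "FW"


def lexical_features(tagged: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute illustrative LEX features from (word, tag) pairs."""
    tokens = [word.lower() for word, _ in tagged]
    content = [word.lower() for word, tag in tagged if tag in CONTENT_TAGS]
    n_tokens = len(tokens)
    n_content = len(content)
    return {
        # Unique word types divided by total tokens.
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Share of tokens that are content words.
        "lexical_density": n_content / n_tokens if n_tokens else 0.0,
        # Unique content-word types divided by total content words.
        "lexical_variation": len(set(content)) / n_content if n_content else 0.0,
        # Number of tokens the tagger marked as foreign.
        "foreign_word_count": sum(1 for _, tag in tagged if tag == FOREIGN_TAG),
    }


# Hand-tagged toy sentence, for illustration only.
sample = [
    ("Ang", "DET"), ("bata", "NOUN"), ("ay", "PART"), ("kumain", "VERB"),
    ("ng", "PART"), ("cake", "FW"), ("at", "CONJ"), ("ang", "DET"),
    ("aso", "NOUN"), ("ay", "PART"), ("kumain", "VERB"), ("din", "ADV"),
]
feats = lexical_features(sample)
print(feats)
```

In a full pipeline, values like these would be concatenated with the TRAD features (sentence length, syllable counts, and so on) into one feature vector per document before training a classifier; the choice of tagger and classifier is left open in this sketch.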




