A Baseline Readability Model for Cebuano

In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano's documented orthography, and neural embeddings from the multilingual BERT model. Results show that combining the first two handcrafted linguistic feature sets obtained the best performance when used to train an optimized Random Forest model, scoring approximately 87%. This is similar to previous results in readability assessment for the Filipino language, showing the potential of crosslingual application. To encourage more work on readability assessment in Philippine languages such as Cebuano, we open-sourced both code and data.
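
To make the two handcrafted feature families more concrete, the following is a minimal sketch of how surface-based features and consonant-vowel patterns might be computed from raw text. It is not the released implementation: the function names, the chosen features, the five-vowel inventory, and the whole-word pattern simplification (which ignores digraphs such as "ng" and does no real syllabification) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a plain five-vowel inventory; every other letter counts as a consonant.
VOWELS = set("aeiou")


def surface_features(text):
    """Compute a few traditional surface-based counts for a text (hypothetical feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'-]+", text)
    n_words = len(words)
    n_sents = len(sentences)
    return {
        "word_count": n_words,
        "sentence_count": n_sents,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,
    }


def cv_pattern(word):
    """Map each letter of a word to 'C' or 'V' (simplification: no syllable splitting)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha())


def cv_pattern_counts(text):
    """Count how often each whole-word consonant-vowel pattern occurs in a text."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(cv_pattern(w) for w in words)


if __name__ == "__main__":
    sample = "Ang bata mikaon. Mikaon siya."  # illustrative sample string only
    print(surface_features(sample))
    print(cv_pattern_counts(sample))
```

Feature dictionaries like these could then be turned into fixed-length vectors and passed to a classifier such as a Random Forest; that step is left out here because the exact feature list and hyperparameters are not given in the abstract.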
