An Improved Classification Model for Igbo Text Using N-Gram And K-Nearest Neighbour Approaches

04/01/2020
by Nkechi Ifeanyi-Reuben, et al.
This paper presents an improved classification model for Igbo text using N-gram and K-Nearest Neighbour approaches. The N-gram model is used for text representation, and classification is carried out on the represented text using the K-Nearest Neighbour model. The work follows an Object-Oriented design methodology and is implemented in the Python programming language with tools from the Natural Language Toolkit (NLTK). The performance of the Igbo text classification system is measured by computing the precision, recall and F1-measure of the results obtained on unigram-, bigram- and trigram-represented text. Classification on bigram-represented text has the highest degree of exactness (precision), the results obtained with the three N-gram models have the same level of completeness (recall), and trigram-represented text has the lowest precision. This shows that classification on bigram-represented Igbo text outperforms classification on unigram- and trigram-represented text. The bigram text representation model is therefore highly recommended for any intelligent text-based system in the Igbo language.
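
The pipeline summarised in the abstract (N-gram representation, K-Nearest Neighbour classification, then precision, recall and F1) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library rather than NLTK, and it assumes whitespace tokenisation, raw n-gram counts, cosine similarity and majority voting, none of which the abstract specifies. The function names and the tiny English `TRAIN` set are hypothetical placeholders, not the paper's Igbo corpus.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n):
    """Count word-level n-grams in a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, n=2, k=3):
    """Return the majority label among the k training texts most similar to `text`."""
    query = ngram_counts(text, n)
    ranked = sorted(
        train,
        key=lambda item: cosine(ngram_counts(item[0], n), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F1 for one class, from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data for illustration only (not from the paper).
TRAIN = [
    ("the team won the match", "sport"),
    ("the team lost the match", "sport"),
    ("the bank raised the interest rate", "finance"),
    ("the bank cut the interest rate", "finance"),
]

if __name__ == "__main__":
    # n=1, 2, 3 select unigram, bigram and trigram representations.
    for n in (1, 2, 3):
        print(n, knn_predict(TRAIN, "the team won the cup", n=n, k=3))
```

Changing `n` switches between the unigram, bigram and trigram representations that the paper compares; an actual evaluation would run `precision_recall_f1` over a held-out labelled test set for each value of `n`.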

research
08/10/2023

Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis

Deceptive text classification is a critical task in natural language pro...
research
06/24/2021

byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings

This article introduces byteSteady – a fast model for classification usi...
research
01/13/2021

On consistency scores in text data with an implementation in R

In this paper, we introduce a reproducible cleaning process for the text...
research
11/29/2019

A Multi-cascaded Deep Model for Bilingual SMS Classification

Most studies on text classification are focused on the English language....
research
12/04/2020

Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

In medical fields, text classification is one of the most important task...
research
08/19/2019

A novel text representation which enables image classifiers to perform text classification, applied to name disambiguation

Patent data are often used to study the process of innovation and resear...
research
07/27/2020

Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Next word prediction is an input technology that simplifies the process ...
