Analysis and representation of Igbo text document for a text-based system

09/05/2020
by Ifeanyi-Reuben Nkechi J., et al.

Advances in Information Technology (IT) have helped bring Nigeria's three major languages into text-based applications such as text mining, information retrieval and natural language processing. This paper focuses on the Igbo language, which uses compounding as a common type of word formation and has a large vocabulary of compound words. Collocation, word ordering and compounding play a major role in Igbo. The ambiguity involved in handling these compound words makes the representation of Igbo text documents very difficult, because it cannot be addressed by the most common, standard approach to text representation, the Bag-of-Words (BOW) model, which ignores word order and the relations between words. This is a cause for concern and motivates the development of an improved model that captures these properties. This paper analyses Igbo text documents, considering their compounding nature, and describes their representation with the word-based n-gram model to properly prepare them for any text-based application. The results show that the bigram and trigram text representation models provide more semantic information and also address the issues of compounding, word ordering and collocation, which are the major peculiarities of Igbo. They are likely to give better performance when used in any Igbo text-based system.
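The abstract contrasts the Bag-of-Words model with word-based n-grams. The sketch below illustrates that difference; it is not the authors' implementation. It assumes naive whitespace tokenization, and the function names (`tokenize`, `word_ngrams`, `bag_of_words`, `ngram_features`) are hypothetical. The Igbo phrase is our own illustrative choice, not drawn from the paper's corpus: *nwa akwụkwọ* ("student") and *ụlọ akwụkwọ* ("school") are compounds that both contain *akwụkwọ* ("book").

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase + whitespace split. A real Igbo pipeline
    # would also need Unicode normalization of diacritics and punctuation handling.
    return text.lower().split()


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    # Contiguous word n-grams in text order, each joined by a single space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens: list[str]) -> Counter:
    # BOW: unigram counts only; word order and adjacency are discarded.
    return Counter(tokens)


def ngram_features(tokens: list[str], n: int) -> Counter:
    # Word-based n-gram representation: counts of contiguous n-word sequences.
    return Counter(word_ngrams(tokens, n))


if __name__ == "__main__":
    # Illustrative sentence (our assumption): "the student went to school".
    tokens = tokenize("nwa akwụkwọ gara ụlọ akwụkwọ")

    # BOW counts 'akwụkwọ' twice but cannot tell the two compounds apart.
    print(bag_of_words(tokens))

    # Bigrams keep both compounds as units:
    # ['nwa akwụkwọ', 'akwụkwọ gara', 'gara ụlọ', 'ụlọ akwụkwọ']
    print(word_ngrams(tokens, 2))

    # Trigrams additionally capture the verb together with its neighbours.
    print(word_ngrams(tokens, 3))

    # Word order: BOW treats a phrase and its reversal as identical, bigrams do not.
    a, b = tokenize("ụlọ akwụkwọ"), tokenize("akwụkwọ ụlọ")
    print(bag_of_words(a) == bag_of_words(b))            # True
    print(ngram_features(a, 2) == ngram_features(b, 2))  # False
```

In practice the unigram, bigram and trigram counts are often concatenated into a single feature vector, so that a downstream classifier or retrieval system sees both the individual words and the compounds they form.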
