Unknown Words Analysis in POS tagging of Sinhala Language

01/06/2015
by A. J. P. M. P. Jayaweera, et al.

Part-of-speech (POS) tagging is a vital task in Natural Language Processing (NLP) for any language. It involves analysing the construction, behaviour and dynamics of the language, knowledge that can be utilized in computational linguistic analysis and automation applications. In this context, dealing with unknown words (words that do not appear in the lexicon) is also an important task, since NLP systems are being used in more and more new applications. One aid in predicting the lexical categories of unknown words is syntactic knowledge of the language. In this research, the distinction between open class words and closed class words, together with syntactic features of the language, is used to predict the lexical categories of unknown words during tagging. An experiment is performed to investigate the ability of the approach to parse unknown words using syntactic knowledge without human intervention. The experiment shows that tagging performance improves when the word class distinction is used together with syntactic rules to parse sentences containing unknown words in Sinhala.
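The general idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names, the toy English lexicon, and the two rule tables (`FOLLOWS`, `PRECEDES`) are all hypothetical and chosen only for readability. The sketch assumes that an unknown word must belong to an open class, and that the tags of known neighbouring words narrow the remaining candidates.

```python
from typing import Dict, List, Optional, Set

# Hypothetical tag inventory; the paper's actual Sinhala tag set is not given here.
OPEN_CLASSES: Set[str] = {"NOUN", "VERB", "ADJ", "ADV"}

# Toy lexicon (illustrative English entries, not data from the paper).
# Closed-class words are assumed to be fully listed.
LEXICON: Dict[str, str] = {
    "the": "DET",
    "a": "DET",
    "on": "PREP",
    "and": "CONJ",
    "dog": "NOUN",
    "sleeps": "VERB",
}

# Hypothetical syntactic rules.
# FOLLOWS[t]: open-class tags allowed immediately after a word tagged t.
FOLLOWS: Dict[str, Set[str]] = {
    "DET": {"NOUN", "ADJ"},
    "PREP": {"NOUN", "ADJ"},
    "ADJ": {"NOUN", "ADJ"},
    "NOUN": {"VERB", "NOUN"},
    "VERB": {"ADV", "NOUN", "ADJ"},
}
# PRECEDES[t]: open-class tags allowed immediately before a word tagged t.
PRECEDES: Dict[str, Set[str]] = {
    "NOUN": {"ADJ", "NOUN", "VERB"},
    "VERB": {"NOUN", "ADV"},
    "DET": {"VERB", "NOUN"},
    "PREP": {"NOUN", "VERB"},
}


def lookup(word: str) -> Optional[str]:
    """Return the lexicon tag for a word, or None if the word is unknown."""
    return LEXICON.get(word.lower())


def guess_tags(tokens: List[str], i: int) -> Set[str]:
    """Return candidate tags for tokens[i].

    Known words keep their lexicon tag. Unknown words start with every
    open class as a candidate, then are filtered by the tags of known
    neighbours.
    """
    known = lookup(tokens[i])
    if known is not None:
        return {known}

    candidates = set(OPEN_CLASSES)  # closed classes are never guessed
    prev_tag = lookup(tokens[i - 1]) if i > 0 else None
    next_tag = lookup(tokens[i + 1]) if i + 1 < len(tokens) else None

    if prev_tag in FOLLOWS:
        candidates &= FOLLOWS[prev_tag]
    if next_tag in PRECEDES:
        candidates &= PRECEDES[next_tag]
    return candidates


if __name__ == "__main__":
    sentence = ["the", "blorf", "sleeps"]  # "blorf" is not in the lexicon
    print(guess_tags(sentence, 1))
```

In this toy setting, the determiner on the left and the verb on the right leave `NOUN` as the only candidate for the unknown word. With weaker context, several candidates can remain, so a real tagger would need a further disambiguation step.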

