CKMorph: A Comprehensive Morphological Analyzer for Central Kurdish

09/17/2021
by Morteza Naserzade, et al.

A morphological analyzer, which is a significant component of many natural language processing applications, especially for morphologically rich languages, divides an input word into all its constituent morphemes and identifies their morphological roles. In this paper, we introduce a comprehensive morphological analyzer for Central Kurdish (CK), a low-resourced language with a rich morphology. Building upon the limited existing literature, we first assembled and systematically categorized a comprehensive collection of the morphological and morphophonological rules of the language. Additionally, we collected and manually labeled a generative lexicon containing nearly 10,000 verb, noun and adjective stems, named entities, and other types of word stems. We used these rule sets and resources to implement CKMorph Analyzer based on finite-state transducers. To provide a benchmark for future research, we collected, manually labeled, and publicly shared test sets for evaluating the accuracy and coverage of the analyzer. CKMorph correctly analyzed 95.9% of the accuracy test set, which contains 1,000 CK words morphologically analyzed according to their context. Moreover, CKMorph gave at least one analysis for 95.5% of the CK tokens in the coverage test set. The demonstration of the application and the resources, including the CK verb database and the test sets, are openly accessible at https://github.com/CKMorph.
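To make the finite-state design and the two evaluation measures more concrete, here is a minimal sketch in Python. It is not CKMorph's implementation: the `ToyFST` class, the two-stem lexicon (`pyaw`, `ktêb`, in Latin transliteration), and the tags (`+N`, `+Def`, `+Pl`, `+Indf`) are all hypothetical. The suffixes are loosely modeled on common CK nominal endings (`-eke`, `-ekan`, `-êk`, `-an`), and every morphophonological alternation is ignored. The transducer reads one character per arc, emits an output string on arcs, and returns every analysis on an accepting path. Accuracy is counted here as the gold, context-based analysis appearing among the outputs; that criterion is an assumption, not the paper's stated definition.

```python
from collections import defaultdict


class ToyFST:
    """Minimal nondeterministic letter transducer (hypothetical, no epsilon arcs)."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input_char, output, next_state)]
        self.finals = set()
        self.count = 1                 # state 0 is the start state

    def new_state(self):
        self.count += 1
        return self.count - 1

    def add_path(self, start, surface, output, end):
        """Spell `surface` from `start` to `end`; emit `output` on the last arc."""
        if not surface:
            raise ValueError("surface string must be non-empty")
        state = start
        for i, ch in enumerate(surface):
            last = i == len(surface) - 1
            nxt = end if last else self.new_state()
            self.arcs[state].append((ch, output if last else "", nxt))
            state = nxt

    def analyze(self, word):
        """Return every output string on an accepting path for `word` (all analyses)."""
        results, stack = [], [(0, 0, "")]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word):
                if state in self.finals:
                    results.append(out)
                continue
            for ch, emitted, nxt in self.arcs[state]:
                if ch == word[pos]:
                    stack.append((nxt, pos + 1, out + emitted))
        return sorted(results)


def build_toy_analyzer():
    fst = ToyFST()
    noun, end = fst.new_state(), fst.new_state()
    fst.finals |= {noun, end}          # bare stems and suffixed forms are both accepted
    for stem in ["pyaw", "ktêb"]:      # hypothetical mini-lexicon (transliterated)
        fst.add_path(0, stem, stem + "+N", noun)
    for suffix, tags in [("eke", "+Def"), ("ekan", "+Def+Pl"), ("êk", "+Indf"), ("an", "+Pl")]:
        fst.add_path(noun, suffix, tags, end)
    return fst


def coverage(fst, tokens):
    """Share of tokens that receive at least one analysis."""
    return sum(1 for t in tokens if fst.analyze(t)) / len(tokens)


def accuracy(fst, gold):
    """Share of (word, gold_analysis) pairs whose gold analysis is among the outputs (assumed criterion)."""
    return sum(1 for w, g in gold if g in fst.analyze(w)) / len(gold)


if __name__ == "__main__":
    fst = build_toy_analyzer()
    print(fst.analyze("pyawekan"))                                 # ['pyaw+N+Def+Pl']
    print(coverage(fst, ["pyaw", "pyaweke", "ktêbêk", "xanû"]))    # 0.75
```

Because the transducer is nondeterministic, a real lexicon in which two stems or suffix sequences can spell the same surface string would simply yield several analyses for one word, which is the "all composing morphemes" behavior the abstract describes. The coverage figure only checks that some analysis exists, while the accuracy figure checks that the correct one is included.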
