
Nefnir: A high accuracy lemmatizer for Icelandic

by Svanhvít Lilja Ingólfsdóttir, et al.
Reykjavik University
Háskóli Íslands

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
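The abstract describes lemmatizing tagged text with suffix substitution rules derived from a morphological database. The following Python code is a minimal sketch of that general idea under stated assumptions. It is not Nefnir's actual implementation. It assumes each rule is learned from a (word form, tag, lemma) triple by stripping the longest shared prefix, that the most frequent replacement is kept for each (suffix, tag) key, and that the longest matching suffix wins at lookup time. The function names, the tiny training list, and the IFD-style tags (for example `nkeþ` for a masculine singular dative noun) are illustrative assumptions.

```python
from collections import Counter, defaultdict
from os.path import commonprefix


def derive_rule(form, tag, lemma):
    """Map one (form, tag, lemma) triple to ((form_suffix, tag), lemma_suffix).

    Hypothetical helper: the longest prefix shared by form and lemma is
    treated as the stem, and what remains on each side is the rule.
    """
    stem = commonprefix([form, lemma])
    return (form[len(stem):], tag), lemma[len(stem):]


def build_rules(triples):
    """Keep the most frequent lemma suffix for each (form_suffix, tag) key."""
    counts = defaultdict(Counter)
    for form, tag, lemma in triples:
        key, replacement = derive_rule(form, tag, lemma)
        counts[key][replacement] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def lemmatize(form, tag, rules):
    """Apply the rule with the longest form suffix that matches for this tag."""
    # i = 0 tries the whole word as the suffix; i = len(form) tries "".
    for i in range(len(form) + 1):
        replacement = rules.get((form[i:], tag))
        if replacement is not None:  # "" is a valid (identity) replacement
            return form[:i] + replacement
    return form  # no rule for this tag: fall back to the word form itself


# Illustrative training data: inflected forms of "hestur" (horse).
triples = [
    ("hestur", "nken", "hestur"),  # nominative singular
    ("hesti", "nkeþ", "hestur"),   # dative singular
    ("hests", "nkee", "hestur"),   # genitive singular
    ("hestar", "nkfn", "hestur"),  # nominative plural
]
rules = build_rules(triples)

# Rules learned from "hestur" generalize to "hundur" (dog).
for word, tag in [("hundi", "nkeþ"), ("hunds", "nkee"), ("hundar", "nkfn")]:
    print(word, tag, "->", lemmatize(word, tag, rules))
```

Because the rules in this sketch are keyed on the tag as well as the suffix, the same ending can map to different lemma endings under different tags, so a wrong tag can select a wrong rule. That is consistent with the lower accuracy the abstract reports for automatically tagged text.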
