
Nefnir: A high accuracy lemmatizer for Icelandic

07/27/2019
by Svanhvít Lilja Ingólfsdóttir, et al.
Reykjavik University
Háskóli Íslands

Lemmatization, the task of finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
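
The idea of lemmatizing tagged text with suffix substitution rules can be illustrated with a minimal Python sketch. This is not Nefnir's actual code or rule set: the function name `lemmatize`, the `RULES` table, and the three hand-written rules are hypothetical and serve only as an illustration. The sketch assumes each rule maps a (PoS tag, word-form suffix) pair to a lemma suffix, that the longest matching suffix wins, and that a word with no matching rule is returned unchanged in lowercase.

```python
from typing import Dict, Tuple

# Hypothetical toy rules, written by hand for illustration only:
# (PoS tag, suffix of the word form) -> suffix of the lemma.
# A real system would derive such rules from a morphological database.
RULES: Dict[Tuple[str, str], str] = {
    ("nkfn", "ar"): "ur",  # e.g. "hestar" -> "hestur"
    ("nvfn", "ur"): "a",   # e.g. "konur"  -> "kona"
    ("nhfþ", "um"): "",    # e.g. "borðum" -> "borð"
}


def lemmatize(form: str, tag: str,
              rules: Dict[Tuple[str, str], str] = RULES) -> str:
    """Apply the longest matching suffix rule for the given tag.

    Assumption: if no rule matches, the lowercased word form is
    returned as its own lemma.
    """
    word = form.lower()
    # start = 0 tries the whole word as a suffix, so longer suffixes
    # are always tried before shorter ones.
    for start in range(len(word)):
        replacement = rules.get((tag, word[start:]))
        if replacement is not None:
            return word[:start] + replacement
    return word


if __name__ == "__main__":
    for form, tag in [("hestar", "nkfn"), ("konur", "nvfn"), ("borðum", "nhfþ")]:
        print(form, tag, lemmatize(form, tag))
```

Keying the rules on the tag as well as the suffix means the same ending can map to different lemma endings depending on the morphological analysis. This would also explain why accuracy is lower when the tags come from an automatic PoS tagger rather than from a correctly tagged corpus.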

