Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

12/02/2019
by Nelda Kote, et al.

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it. There is currently a lack of available NLP resources for Albanian, and its complex grammar and morphology make such resources challenging to develop. We have created an Albanian part-of-speech corpus based on the Universal Dependencies schema for morphological annotation. It contains about 118,000 tokens of naturally occurring text collected from different sources, plus an additional 67,000 tokens of artificially created simple sentences that are used only in training. On this corpus, we train and evaluate segmentation, morphological tagging and lemmatization models using the Turku Neural Parser Pipeline. On the held-out evaluation set, the model achieves 92.74 […] 85.31 […]. The annotated corpus, as well as the trained models, are available under an open license.
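Because the corpus follows the Universal Dependencies annotation schema, it is presumably stored in the CoNLL-U format used by UD treebanks and by the Turku Neural Parser Pipeline. Under that assumption, the minimal Python sketch below shows how a tagger's output could be scored against gold annotations field by field (LEMMA, UPOS, FEATS). The helper names (`read_tokens`, `field_accuracy`, `row`) and the placeholder tokens `w1`…`w4` are hypothetical. They do not come from the paper's data or code, and the sketch does not reproduce the paper's reported figures. It also assumes gold tokenization. Scoring a model that performs its own segmentation requires aligning predicted and gold tokens first, which the official CoNLL 2018 shared-task evaluation script does.

```python
# Minimal sketch (assumption: data is in 10-column CoNLL-U format).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
FIELDS = {"LEMMA": 2, "UPOS": 3, "FEATS": 5}


def read_tokens(conllu_text):
    """Return the column lists of ordinary token lines in a CoNLL-U string."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append(cols)
    return tokens


def field_accuracy(gold_text, pred_text):
    """Per-field accuracy (%) of predicted vs. gold, assuming identical tokenization."""
    gold = read_tokens(gold_text)
    pred = read_tokens(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data must contain the same tokens")
    scores = {}
    for name, idx in FIELDS.items():
        correct = sum(g[idx] == p[idx] for g, p in zip(gold, pred))
        scores[name] = 100.0 * correct / len(gold)
    return scores


def row(i, form, lemma, upos, feats):
    """Hypothetical helper: build one CoNLL-U token line (no syntax annotated)."""
    return "\t".join([str(i), form, lemma, upos, "_", feats, "_", "_", "_", "_"])


# Placeholder sentence, not taken from the Albanian corpus.
gold = "\n".join([
    "# sent_id = demo-1",
    row(1, "w1", "l1", "NOUN", "Number=Sing"),
    row(2, "w2", "l2", "VERB", "Tense=Pres"),
    row(3, "w3", "l3", "ADJ", "Number=Plur"),
    row(4, "w4", "l4", "PUNCT", "_"),
])
pred = "\n".join([
    "# sent_id = demo-1",
    row(1, "w1", "l1", "NOUN", "Number=Sing"),
    row(2, "w2", "l2", "AUX", "Tense=Pres"),   # UPOS error
    row(3, "w3", "l3", "ADJ", "Number=Sing"),  # FEATS error
    row(4, "w4", "l4", "PUNCT", "_"),
])

print(field_accuracy(gold, pred))
# {'LEMMA': 100.0, 'UPOS': 75.0, 'FEATS': 75.0}
```

FEATS is compared here as a whole string, so a token counts as correct only if its entire feature bundle matches. This is a common strict way to score morphological tagging, though individual features could also be scored separately.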
