Creating a morphological and syntactic tagged corpus for the Uzbek language

10/27/2022
by Maksud Sharipov, et al.

The creation of tagged corpora is becoming one of the most important tasks in Natural Language Processing (NLP). There are not enough tagged corpora to build machine learning models for Uzbek, a low-resource language. In this paper, we try to fill that gap by developing a novel part-of-speech (POS) and syntactic tagset for creating a morphologically and syntactically tagged corpus of the Uzbek language. This work also includes a detailed description and presentation of a web-based application for tagging. Based on the developed annotation tool and software, we share the results of our experience from the first stage of the tagged corpus creation.
