Challenges in Developing LRs for Non-Scheduled Languages: A Case of Magahi

11/30/2021
by Ritesh Kumar, et al.

Magahi is an Indo-Aryan language spoken mainly in the eastern parts of India. Despite its significant number of speakers, virtually no language resources (LRs) or language technologies (LTs) have been developed for it, mainly because of its status as a non-scheduled language. The present paper describes an attempt to develop an annotated corpus of Magahi. The data is drawn mainly from a couple of Magahi blogs, a collection of Magahi stories, and recordings of conversations in Magahi, and it is annotated at the part-of-speech (POS) level using the BIS tagset.

