Developing Universal Dependency Treebanks for Magahi and Braj

04/26/2022
by Mohit Raj, et al.

In this paper, we discuss the development of treebanks for two low-resource Indian languages, Magahi and Braj, based on the Universal Dependencies framework. The Magahi treebank contains 945 sentences and the Braj treebank around 500 sentences, each annotated with lemmas, part-of-speech tags, morphological features and universal dependencies. The paper describes the different dependency relations found in the two languages and gives some statistics for the two treebanks. The dataset will be made publicly available in the Universal Dependencies (UD) repository (https://github.com/UniversalDependencies/UD_Magahi-MGTB/tree/master) in the next (v2.10) release.
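UD treebanks are distributed in the CoNLL-U format, where each token line carries ten tab-separated columns, including the lemma, universal POS tag, morphological features, head index and dependency relation. As a minimal sketch of how treebank statistics of this kind could be computed, the following Python snippet parses CoNLL-U text and counts sentences, tokens, POS tags and dependency relations. The function names (`parse_conllu`, `treebank_stats`) are hypothetical, and the sample is a toy English sentence used as a placeholder, not data from the Magahi or Braj treebanks.

```python
from collections import Counter

# The ten CoNLL-U columns, in order, as defined by the UD format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


def treebank_stats(sentences):
    """Count sentences, tokens, UPOS tags and dependency relations."""
    n_sentences = 0
    n_tokens = 0
    upos = Counter()
    deprel = Counter()
    for sent in sentences:
        n_sentences += 1
        for tok in sent:
            n_tokens += 1
            upos[tok["upos"]] += 1
            deprel[tok["deprel"]] += 1
    return {"sentences": n_sentences, "tokens": n_tokens,
            "upos": upos, "deprel": deprel}


# Toy placeholder sentence (English, NOT from the Magahi or Braj treebanks).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

stats = treebank_stats(parse_conllu(SAMPLE))
print(stats["sentences"], stats["tokens"])
print(stats["deprel"].most_common())
```

To run the same counts on a released treebank, one would read the contents of a `.conllu` file into a string and pass it to `parse_conllu` in place of `SAMPLE`.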
