Creating and Managing a large annotated parallel corpora of Indian languages

12/03/2021
by Ritesh Kumar, et al.

This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt. of India, and running in parallel at 10 different universities in India. To manage the creation and dissemination of these huge corpora efficiently, the web-based annotation tool ILCIANN (Indian Languages Corpora Initiative Annotation Tool) has been developed; it also has a reduced stand-alone version. It was developed primarily for POS annotation and for managing corpus annotation by people with differing levels of competence working at locations situated far apart. To maintain consistency and standards in the creation of the corpora, everyone needed to work on a common platform, which this tool provides.


