
Creating and Managing a large annotated parallel corpora of Indian languages

by Ritesh Kumar, et al.

This paper presents the challenges in creating and managing large parallel corpora of 12 major Indian languages (soon to be extended to 23 languages) as part of a major consortium project funded by the Department of Information Technology (DIT), Govt. of India, and running in parallel at 10 different universities in India. To manage the creation and dissemination of these huge corpora efficiently, the web-based annotation tool ILCIANN (Indian Languages Corpora Initiative Annotation Tool) has been developed; it also has a reduced stand-alone version. It was developed primarily for POS annotation and for managing corpus annotation by people with differing levels of competence working at physically distant locations. To maintain consistency and standards in the creation of the corpora, it was necessary for everyone to work on a common platform, which this tool provides.




Charon: a FrameNet Annotation Tool for Multimodal Corpora

This paper presents Charon, a web tool for annotating multimodal corpora...

Creating a morphological and syntactic tagged corpus for the Uzbek language

Nowadays, creation of the tagged corpora is becoming one of the most imp...

Graph Querying for Semantic Annotations

This paper presents how the online tool GREW-MATCH can be used to make q...

Seshat: A tool for managing and verifying annotation campaigns of audio data

We introduce Seshat, a new, simple and open-source software to efficient...

Towards a parallel corpus of Portuguese and the Bantu language Emakhuwa of Mozambique

Major advancement in the performance of machine translation models has b...

Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features

A prerequisite for the computational study of literature is the availabi...

Overview of Annotation Creation: Processes & Tools

Creating linguistic annotations requires more than just a reliable annot...