SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

02/19/2023
by   Jivnesh Sandhan, et al.
0

We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritShala is deployed as a web-based application, which allows a user to get real-time analysis for the given input. It is built with easy-to-use interactive data annotation features that allow annotators to correct the system predictions when it makes mistakes. We publicly release the source codes of the 4 modules included in the toolkit, 7 word embedding models that have been trained on publicly available Sanskrit corpora and multiple annotated datasets such as word similarity, relatedness, categorization, analogy prediction to assess intrinsic properties of word embeddings. So far as we know, this is the first neural-based Sanskrit NLP toolkit that has a web-based interface and a number of NLP modules. We are sure that the people who are willing to work with Sanskrit will find it useful for pedagogical and annotative purposes. SanskritShala is available at: https://cnerg.iitkgp.ac.in/sanskritshala. The demo video of our platform can be accessed at: https://youtu.be/x0X31Y9k0mw4.

READ FULL TEXT

page 3

page 5

research
01/04/2018

VnCoreNLP: A Vietnamese Natural Language Processing Toolkit

We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NL...
research
08/17/2023

Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

The primary focus of this thesis is to make Sanskrit manuscripts more ac...
research
03/22/2022

Demo of the Linguistic Field Data Management and Analysis System – LiFE

In the proposed demo, we will present a new software - Linguistic Field ...
research
06/27/2019

PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

Chinese word segmentation (CWS) is a fundamental step of Chinese natural...
research
08/25/2015

Visualizing NLP annotations for Crowdsourcing

Visualizing NLP annotation is useful for the collection of training data...
research
10/04/2022

Text Characterization Toolkit

In NLP, models are usually evaluated by reporting single-number performa...
research
12/06/2020

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

We present NaturalCC, an efficient and extensible toolkit to bridge the ...

Please sign up or login with your details

Forgot password? Click here to reset