Systematic Investigation of Strategies Tailored for Low-Resource Settings for Sanskrit Dependency Parsing

01/27/2022
by Jivnesh Sandhan, et al.

Existing state-of-the-art approaches for Sanskrit Dependency Parsing (SDP) are hybrid in nature and rely on a lexicon-driven shallow parser for linguistically motivated feature engineering. However, these methods fail to handle out-of-vocabulary (OOV) words, which limits their applicability in realistic scenarios. On the other hand, purely data-driven approaches do not match the performance of hybrid approaches because labelled data is sparse. Thus, in this work, we investigate the following question: how far can we push a purely data-driven approach using recently proposed strategies for low-resource settings? We experiment with five strategies, namely data augmentation, sequential transfer learning, cross-lingual/monolingual pretraining, multi-task learning and self-training. Our proposed ensemble system outperforms the purely data-driven state-of-the-art system by an absolute gain of 2.8/3.9 points in Unlabelled Attachment Score (UAS)/Labelled Attachment Score (LAS). Interestingly, it also surpasses the state-of-the-art hybrid system by 1.2 points (UAS, absolute gain) and shows comparable performance in terms of LAS. Code and data will be made publicly available at: <https://github.com/Jivnesh/SanDP>.
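The reported gains are differences in UAS and LAS, so it may help to see how these two metrics are typically computed. The sketch below assumes each token carries a (head index, relation label) pair, with head 0 denoting the root; the function name `attachment_scores` and the relation labels in the usage example are hypothetical and illustrative only, not the paper's evaluation script or tagset. Real evaluations may also differ in details such as punctuation handling.

```python
from typing import Sequence, Tuple

# Hypothetical token annotation: (head_index, relation_label); head 0 is the root.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens.

    UAS counts tokens whose predicted head is correct.
    LAS counts tokens whose predicted head AND relation label are both correct.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    head_correct = 0
    labelled_correct = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_head == p_head:
            head_correct += 1
            # A labelled attachment is only correct if the head is correct too.
            if g_label == p_label:
                labelled_correct += 1

    n = len(gold)
    return 100.0 * head_correct / n, 100.0 * labelled_correct / n


# Usage with illustrative (hypothetical) labels for a four-token sentence.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "instr"), (1, "mod")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

Here three of four heads are correct (UAS 75.0), but one of those three has the wrong label, so only two tokens count toward LAS (50.0). Because LAS requires both conditions, it is always less than or equal to UAS, which is why the abstract reports the two scores separately.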

