A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages

02/12/2021
by Jivnesh Sandhan, et al.

Neural dependency parsing has achieved remarkable performance for many domains and languages, but the need for massive labeled data limits the effectiveness of these approaches in low-resource languages. In this work, we focus on dependency parsing for morphologically rich languages (MRLs) in a low-resource setting. Although morphological information is essential for dependency parsing, the difficulty of morphological disambiguation and the lack of powerful analyzers make this information hard to obtain for MRLs. To address these challenges, we propose simple auxiliary tasks for pretraining. We run experiments on 10 MRLs in low-resource settings to measure the efficacy of the proposed pretraining method and observe average absolute gains of 2 points (UAS) and 3.6 points (LAS). Code and data are available at: https://github.com/jivnesh/LCM
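The abstract does not spell out the auxiliary tasks themselves, so the sketch below only illustrates the general recipe it gestures at: pretrain a shared encoder on a cheap token-level auxiliary task (here, morphological tag prediction), then initialize the parser's encoder from the pretrained weights. The module names, toy data, and hyperparameters are all illustrative assumptions, not the authors' implementation from the LCM repository.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared BiLSTM encoder: pretrained on the auxiliary task, then reused by the parser."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> contextual states: (batch, seq_len, 2 * hidden_dim)
        states, _ = self.lstm(self.embed(token_ids))
        return states

class AuxiliaryTagger(nn.Module):
    """Pretraining head: predicts a per-token label, e.g. a morphological tag."""
    def __init__(self, encoder, num_tags):
        super().__init__()
        self.encoder = encoder
        self.classify = nn.Linear(2 * encoder.lstm.hidden_size, num_tags)

    def forward(self, token_ids):
        return self.classify(self.encoder(token_ids))

# Stage 1: pretrain the encoder on the auxiliary task (random toy data here).
vocab_size, num_tags = 1000, 40
encoder = Encoder(vocab_size)
tagger = AuxiliaryTagger(encoder, num_tags)
optimizer = torch.optim.Adam(tagger.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(1, vocab_size, (8, 12))  # 8 sentences of 12 tokens
tags = torch.randint(0, num_tags, (8, 12))      # per-token auxiliary labels

optimizer.zero_grad()
logits = tagger(tokens)
loss = loss_fn(logits.reshape(-1, num_tags), tags.reshape(-1))
loss.backward()
optimizer.step()

# Stage 2: initialize the parser's encoder from the pretrained weights, then
# continue with standard supervised parser training on the small treebank.
parser_encoder = Encoder(vocab_size)
parser_encoder.load_state_dict(encoder.state_dict())
```

The reported metrics are the standard parsing ones: UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the predicted dependency label to match.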


Related research:

12/23/2022
MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning
Transformer language models (TLMs) are critical for most NLP tasks, but ...

01/27/2022
Systematic Investigation of Strategies Tailored for Low-Resource Settings for Sanskrit Dependency Parsing
Existing state of the art approaches for Sanskrit Dependency Parsing (SD...

08/17/2021
Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing
Different linearizations have been proposed to cast dependency parsing a...

09/29/2020
Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
Pretrained multilingual contextual representations have shown great succ...

08/22/2022
A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit
The phenomenon of compounding is ubiquitous in Sanskrit. It serves for a...

08/17/2023
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more ac...

06/10/2022
Unsupervised Sentence Simplification via Dependency Parsing
Text simplification is the task of rewriting a text so that it is readab...
