A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

09/06/2019
by Clara Vania, et al.

Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sámi, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.


