The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages

10/10/2017
by Dingquan Wang, et al.

We release Galactic Dependencies 1.0, a large set of synthetic languages not found on Earth but annotated in Universal Dependencies format. This new resource is intended to provide training and development data for NLP methods that must adapt to unfamiliar languages. Each synthetic treebank is produced from a real treebank by stochastically permuting the dependents of nouns and/or verbs to match the word order of other real languages. We discuss the usefulness, realism, parsability, perplexity, and diversity of the synthetic languages. As a simple demonstration of the use of Galactic Dependencies, we consider single-source transfer, which attempts to parse a real target language using a parser trained on a "nearby" source language. We find that including synthetic source languages somewhat increases the diversity of the source pool, which significantly improves results for most target languages.
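
To make the permutation step concrete, here is a minimal Python sketch of reordering the dependents of nouns and verbs in a dependency tree. It is an illustration under stated assumptions, not the authors' procedure: the abstract says the permutation is chosen to match the word order of another real language, whereas this sketch substitutes a uniform random shuffle. The function name `permute_dependents`, the 0-based `heads` encoding with -1 for the root, and the choice to shuffle the head together with its dependents are all hypothetical.

```python
import random
from collections import defaultdict


def permute_dependents(heads, tags, targets=("NOUN", "VERB"), rng=None):
    """Return a new left-to-right order of token indices.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    tags[i] is the coarse POS tag of token i.
    ASSUMPTION: a uniform shuffle stands in for an ordering model
    learned from another language's treebank.
    """
    if rng is None:
        rng = random.Random(0)

    # Collect each node's dependents in their original surface order.
    children = defaultdict(list)
    root = None
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)

    def linearize(node):
        # The head and its direct dependents form the units to be ordered.
        units = sorted(children[node] + [node])
        if tags[node] in targets:
            # ASSUMPTION: the head is shuffled along with its dependents.
            rng.shuffle(units)
        order = []
        for u in units:
            # Each dependent's subtree stays contiguous in the output.
            order.extend([u] if u == node else linearize(u))
        return order

    return linearize(root)


# Toy sentence: "the dog chased a cat"
words = ["the", "dog", "chased", "a", "cat"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
heads = [1, 2, -1, 4, 2]

order = permute_dependents(heads, tags, rng=random.Random(1))
print(" ".join(words[i] for i in order))
```

Because each subtree is emitted as one contiguous block, the reordered sentence keeps the same dependency arcs and labels; only the surface order changes. Passing `targets=("NOUN",)` or `targets=("VERB",)` restricts the shuffle to one category, mirroring the "nouns and/or verbs" choice in the abstract.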


