Cross-lingual Parsing with Polyglot Training and Multi-treebank Learning: A Faroese Case Study

10/17/2019
by James Barry, et al.

Cross-lingual dependency parsing involves transferring syntactic knowledge from one language to another. It is a crucial component for inducing dependency parsers in low-resource scenarios where no training data exists for a language. Using Faroese as the target language, we compare two annotation-projection approaches: first, projecting from multiple monolingual source models; second, projecting from a single polyglot model trained on the combination of all source languages. Furthermore, we reproduce multi-source projection (Tyers et al., 2018), in which dependency trees from multiple sources are combined. Finally, we apply multi-treebank modelling to the projected treebanks, either in addition to or as an alternative to polyglot modelling on the source side. We find that polyglot training on the source languages tends to yield better results on the target language overall, but the single best result for the target language is obtained by projecting from monolingual source parsing models and then training multi-treebank POS tagging and parsing models on the target side.
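
To make the idea of combining dependency trees from multiple sources concrete, the following is a minimal sketch that assumes a simple per-token majority vote over projected head indices. It is an illustration only and is not claimed to be the combination procedure used in the paper or in Tyers et al. (2018); the function name `combine_heads` and the toy input are hypothetical, and a plain vote like this does not guarantee a well-formed tree.

```python
from collections import Counter


def combine_heads(projected_trees):
    """Combine several projected trees for one target sentence by voting.

    projected_trees: one list of head indices per source language.
    Position i holds the head of token i+1 (0 means root); None marks a
    token that projection left unattached.
    Assumption: a per-token majority vote, which may not yield a valid tree.
    """
    n_tokens = len(projected_trees[0])
    combined = []
    for i in range(n_tokens):
        # Count the heads proposed for this token, ignoring missing ones.
        votes = Counter(t[i] for t in projected_trees if t[i] is not None)
        combined.append(votes.most_common(1)[0][0] if votes else None)
    return combined


# Toy example: three hypothetical source projections of a 3-token sentence.
trees = [
    [2, 0, 2],
    [2, 0, 2],
    [3, 0, 2],
]
result = combine_heads(trees)
```

In this toy input, two of the three sources agree on the head of the first token, so the vote keeps that attachment. A fuller implementation would need a decoding step that enforces a single root and no cycles.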


research
10/16/2021

Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing

We present substructure distribution projection (SubDP), a technique tha...
research
05/19/2022

Cross-lingual Inflection as a Data Augmentation Method for Parsing

We propose a morphology-based method for low-resource (LR) dependency pa...
research
01/15/2022

Automatic Correction of Syntactic Dependency Annotation Differences

Annotation inconsistencies between data sets can cause problems for low-...
research
08/18/2017

Cross-Lingual Dependency Parsing for Closely Related Languages - Helsinki's Submission to VarDial 2017

This paper describes the submission from the University of Helsinki to t...
research
06/03/2016

Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task Learning

Various treebanks have been released for dependency parsing. Despite tha...
research
05/02/2020

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

A recent advance in monolingual dependency parsing is the idea of a tree...
research
03/12/2019

Bootstrapping Method for Developing Part-of-Speech Tagged Corpus in Low Resource Languages Tagset - A Focus on an African Igbo

Most languages, especially in Africa, have fewer or no established part-...
