Genre as Weak Supervision for Cross-lingual Dependency Parsing

09/10/2021
by Max Müller-Eberstein, et al.

Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation that can be used for domain-targeted training data selection. Dataset genre labels are already frequently available, yet remain largely unexplored in cross-lingual setups. We harness this genre metadata as a weak supervision signal for targeted data selection in zero-shot dependency parsing. Specifically, we project treebank-level genre information to the finer-grained sentence level, with the goal of amplifying information implicitly stored in unsupervised contextualized representations. We demonstrate that genre is recoverable from multilingual contextual embeddings and that it provides an effective signal for training data selection in cross-lingual, zero-shot scenarios. For 12 low-resource language treebanks, six of which are test-only, our genre-specific methods significantly outperform competitive baselines as well as recent embedding-based methods for data selection. Moreover, genre-based data selection provides new state-of-the-art results for three of these target languages.
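
The idea of projecting treebank-level genre labels down to individual sentences and then selecting training data by genre can be illustrated with a minimal sketch. This is not the authors' method: it swaps in a simple nearest-centroid rule over toy two-dimensional vectors that stand in for multilingual contextual embeddings. The treebank names, genre names, and the helper functions `genre_centroids`, `project_genres`, and `select_by_genre` are all hypothetical and exist only for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def genre_centroids(embeddings, treebank_ids, treebank_genres):
    """Mean sentence embedding per genre, using single-genre treebanks only.

    Sentences from a single-genre treebank inherit that genre directly,
    so they can serve as weakly labelled anchors.
    """
    rows_by_genre = {}
    for tb, genres in treebank_genres.items():
        if len(genres) != 1:
            continue  # mixed-genre treebanks give no unambiguous label
        (genre,) = genres
        rows = embeddings[treebank_ids == tb]
        if len(rows) > 0:
            rows_by_genre.setdefault(genre, []).append(rows)
    return {g: np.concatenate(r).mean(axis=0) for g, r in rows_by_genre.items()}


def project_genres(embeddings, treebank_ids, treebank_genres, centroids):
    """Assign each sentence one genre from its own treebank's label set.

    The treebank-level labels restrict the candidates; the nearest
    centroid (by cosine similarity) picks among them.
    """
    labels = []
    for vec, tb in zip(embeddings, treebank_ids):
        candidates = [g for g in sorted(treebank_genres[tb]) if g in centroids]
        if not candidates:
            labels.append(None)
            continue
        sims = [cosine(vec, centroids[g]) for g in candidates]
        labels.append(candidates[int(np.argmax(sims))])
    return labels


def select_by_genre(labels, target_genre):
    """Indices of sentences whose projected genre matches the target genre."""
    return [i for i, g in enumerate(labels) if g == target_genre]


# Toy stand-ins for sentence embeddings (assumption: real ones would come
# from a multilingual masked language model).
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.0],   # sentences from tb_news
    [0.0, 1.0], [0.1, 0.9],   # sentences from tb_wiki
    [1.0, 0.2], [0.2, 1.0],   # sentences from tb_mixed
])
treebank_ids = np.array(
    ["tb_news", "tb_news", "tb_wiki", "tb_wiki", "tb_mixed", "tb_mixed"]
)
treebank_genres = {
    "tb_news": {"news"},
    "tb_wiki": {"wiki"},
    "tb_mixed": {"news", "wiki"},
}

centroids = genre_centroids(embeddings, treebank_ids, treebank_genres)
labels = project_genres(embeddings, treebank_ids, treebank_genres, centroids)
selected = select_by_genre(labels, "news")
```

In this toy setup, the two sentences from the mixed-genre treebank are split between the two genres, and `selected` holds the indices of the sentences that would be used as training data for a target treebank labelled as news.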

research
05/19/2022

Cross-lingual Inflection as a Data Augmentation Method for Parsing

We propose a morphology-based method for low-resource (LR) dependency pa...
research
03/24/2022

Revisiting the Effects of Leakage on Dependency Parsing

Recent work by Søgaard (2020) showed that, treebank size aside, overlap ...
research
03/03/2021

Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation

Linear embedding transformation has been shown to be effective for zero-...
research
09/15/2019

Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing

This paper investigates the problem of learning cross-lingual representa...
research
05/02/2020

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

A recent advance in monolingual dependency parsing is the idea of a tree...
research
02/25/2019

Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing

We introduce a novel method for multilingual transfer that utilizes deep...
research
03/01/2021

On the Effectiveness of Dataset Embeddings in Mono-lingual, Multi-lingual and Zero-shot Conditions

Recent complementary strands of research have shown that leveraging info...
