Genre as Weak Supervision for Cross-lingual Dependency Parsing

09/10/2021
by   Max Müller-Eberstein, et al.
0

Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection. Dataset genre labels are already frequently available, yet remain largely unexplored in cross-lingual setups. We harness this genre metadata as a weak supervision signal for targeted data selection in zero-shot dependency parsing. Specifically, we project treebank-level genre information to the finer-grained sentence level, with the goal to amplify information implicitly stored in unsupervised contextualized representations. We demonstrate that genre is recoverable from multilingual contextual embeddings and that it provides an effective signal for training data selection in cross-lingual, zero-shot scenarios. For 12 low-resource language treebanks, six of which are test-only, our genre-specific methods significantly outperform competitive baselines as well as recent embedding-based methods for data selection. Moreover, genre-based data selection provides new state-of-the-art results for three of these target languages.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 17

03/03/2021

Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation

Linear embedding transformation has been shown to be effective for zero-...
09/15/2019

Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing

This paper investigates the problem of learning cross-lingual representa...
02/15/2021

Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers

We explore cross-lingual transfer of register classification for web doc...
05/02/2020

Treebank Embedding Vectors for Out-of-domain Dependency Parsing

A recent advance in monolingual dependency parsing is the idea of a tree...
02/25/2019

Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing

We introduce a novel method for multilingual transfer that utilizes deep...
03/01/2021

On the Effectiveness of Dataset Embeddings in Mono-lingual,Multi-lingual and Zero-shot Conditions

Recent complementary strands of research have shown that leveraging info...
10/08/2021

Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data

Providing technologies to communities or domains where training data is ...

Code Repositories

ud-selection

Genre-driven Data Selection for Zero-shot Parsing (EMNLP 2021)


view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.