Revisiting the Effects of Leakage on Dependency Parsing

03/24/2022
by Nathaniel Krasner, et al.

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
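Søgaard's leakage statistic is, roughly, the fraction of test sentences whose delexicalized dependency graphs also occur in the training treebank. As a rough illustration of that idea (not the authors' released code, which is in the repository linked above), here is a minimal Python sketch assuming CoNLL-U input and the `conllu` package; the signature encoding and file names are illustrative assumptions.

```python
# Hedged sketch of a Søgaard-style leakage measure: the fraction of test
# sentences whose delexicalized dependency graphs also appear in training.
# Canonicalization details are illustrative assumptions, not the paper's
# reference implementation.

from conllu import parse_incr  # assumes the `conllu` package is installed


def graph_signature(sentence):
    """Encode a sentence's dependency tree as a hashable, word-free signature:
    the sequence of (head index, dependency relation) pairs."""
    return tuple(
        (tok["head"], tok["deprel"])
        for tok in sentence
        if isinstance(tok["id"], int)  # skip multiword-token and empty-node ids
    )


def leakage(train_path, test_path):
    with open(train_path, encoding="utf-8") as f:
        train_graphs = {graph_signature(s) for s in parse_incr(f)}
    with open(test_path, encoding="utf-8") as f:
        test_graphs = [graph_signature(s) for s in parse_incr(f)]
    # Fraction of test graphs already seen (in delexicalized form) in training.
    return sum(g in train_graphs for g in test_graphs) / len(test_graphs)


# Hypothetical file names for illustration:
print(leakage("train.conllu", "test.conllu"))
```

The paper's finer-grained measure refines this binary seen/unseen notion; the sketch above corresponds only to the original graph-overlap definition.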


Related research

09/10/2021 · Genre as Weak Supervision for Cross-lingual Dependency Parsing
Recent work has shown that monolingual masked language models learn to r...

07/07/2021 · Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test
The Shuffle Test is the most common task to evaluate whether NLP models ...

05/14/2021 · A cost-benefit analysis of cross-lingual transfer methods
An effective method for cross-lingual transfer is to fine-tune a bilingu...

06/02/2023 · Distilling Efficient Language-Specific Models for Cross-Lingual Transfer
Massively multilingual Transformers (MMTs), such as mBERT and XLM-R, are...

08/18/2017 · Cross-Lingual Dependency Parsing for Closely Related Languages - Helsinki's Submission to VarDial 2017
This paper describes the submission from the University of Helsinki to t...

02/10/2017 · Universal Semantic Parsing
Universal Dependencies (UD) offer a uniform cross-lingual syntactic repr...

06/29/2022 · How Train-Test Leakage Affects Zero-shot Retrieval
Neural retrieval models are often trained on (subsets of) the millions o...
