Are UD Treebanks Getting More Consistent? A Report Card for English UD

02/01/2023
by Amir Zeldes et al.

Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies (UD) project raise the expectation that joint training and dataset comparison are increasingly possible for high-resource languages such as English, which have multiple corpora. Focusing on the two largest UD English treebanks, we examine progress in data consolidation and answer several questions: Are UD English treebanks becoming more internally consistent? Are they becoming more like each other, and if so, to what extent? Is joint training a good idea, and if so, since which UD version? Our results indicate that while consolidation has made progress, joint models may still suffer from inconsistencies, which hamper their ability to leverage a larger pool of training data.
