X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

10/05/2020
by Angel Daza, et al.

Although semantic role labeling (SRL) is studied for many languages, major improvements have mostly been obtained for English, for which more resources are available. Moreover, existing multilingual SRL datasets use disparate annotation styles or come from different domains, which hampers generalization in multilingual learning. In this work, we propose a method to automatically construct an SRL corpus that is parallel in four languages (English, French, German, and Spanish), with unified predicate and role annotations that are fully comparable across languages. We apply high-quality machine translation to the English CoNLL-09 dataset and use multilingual BERT to project its high-quality annotations to the target languages. We include human-validated test sets that we use to measure the projection quality, and show that projection is denser and more precise than a strong baseline. Finally, we train different state-of-the-art models on our novel corpus for mono- and multilingual SRL, showing that the multilingual annotations improve performance, especially for the weaker languages.
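
To make the projection idea concrete, here is a minimal sketch of label projection by embedding similarity. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify how multilingual BERT is used, so this sketch assumes each source token is aligned to its most similar target token by cosine similarity. The function name `project_labels`, the `threshold` parameter, the toy 3-dimensional vectors (standing in for contextual embeddings), and the example labels are all hypothetical.

```python
import numpy as np


def project_labels(src_vecs, tgt_vecs, src_labels, threshold=0.5):
    """Copy each labeled source token's label to its most similar target token.

    Hypothetical helper: similarity is plain cosine similarity, and "_" marks
    an unlabeled token. Alignments scoring below `threshold` are dropped.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape: (n_source_tokens, n_target_tokens)

    tgt_labels = ["_"] * tgt.shape[0]
    best_score = [-1.0] * tgt.shape[0]
    for i, label in enumerate(src_labels):
        if label == "_":
            continue  # nothing to project from unlabeled source tokens
        j = int(np.argmax(sim[i]))
        score = float(sim[i, j])
        # Keep only confident alignments; if two source tokens compete for
        # the same target token, the higher-scoring one wins.
        if score >= threshold and score > best_score[j]:
            tgt_labels[j] = label
            best_score[j] = score
    return tgt_labels


# Toy vectors standing in for contextual embeddings (assumption: in practice
# these would come from a multilingual encoder).
src_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
src_labels = ["A0", "buy.01", "A1"]  # illustrative role and predicate labels

# Target tokens appear in a different order than the source tokens.
tgt_vecs = np.array([[0.0, 0.1, 0.9], [0.1, 0.9, 0.0], [0.9, 0.1, 0.0]])

projected = project_labels(src_vecs, tgt_vecs, src_labels)
print(projected)  # ['A1', 'buy.01', 'A0']
```

The threshold trades density for precision in this sketch: raising it drops uncertain alignments, so fewer target tokens receive labels, while lowering it labels more tokens at the risk of wrong projections.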

research
05/17/2020

Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles

We present a semantic role labeling resource for Hebrew built semi-autom...
research
04/14/2020

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Many efforts of research are devoted to semantic role labeling (SRL) whi...
research
07/26/2021

Multilingual Coreference Resolution with Harmonized Annotations

In this paper, we present coreference resolution experiments with a newl...
research
04/14/2021

I Wish I Would Have Loved This One, But I Didn't – A Multilingual Dataset for Counterfactual Detection in Product Reviews

Counterfactual statements describe events that did not or cannot take pl...
research
04/14/2017

Cross-lingual Abstract Meaning Representation Parsing

Abstract Meaning Representation (AMR) annotation efforts have mostly foc...
research
03/04/2016

A Bayesian Model of Multilingual Unsupervised Semantic Role Induction

We propose a Bayesian model of unsupervised semantic role induction in m...
research
10/11/2020

Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

Health departments have been deploying text classification systems for t...
