Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

05/18/2023
by Elisa Bassignana, et al.

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion of more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating the high quality of the translation and of the resulting dataset.
