DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction

04/17/2021
by Abhyuday Bhartiya, et al.

Distant supervision (DS) is a well-established technique for creating large-scale datasets for relation extraction (RE) without using human annotations. However, research in DS-RE has been mostly limited to the English language. Constraining RE to a single language prevents the use of large amounts of data in other languages, which could allow extraction of more diverse facts. Very recently, a dataset for multilingual DS-RE has been released. However, our analysis reveals that this dataset exhibits unrealistic characteristics: 1) it lacks sentences that do not express any relation, and 2) all sentences for a given entity pair express exactly one relation. We show that these characteristics lead to a gross overestimation of model performance. In response, we propose a new dataset, DiS-ReX, which alleviates these issues. Our dataset has more than 1.5 million sentences, spanning 4 languages, with 36 relation classes plus 1 no-relation (NA) class. We also modify the widely used bag attention models by encoding sentences using mBERT and provide the first benchmark results on multilingual DS-RE. Unlike the competing dataset, we show that our dataset is challenging and leaves ample room for future research in this field.
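The two dataset characteristics discussed in the abstract (the share of sentences labeled NA, and whether an entity pair is ever associated with more than one relation) can be measured with a few lines of code. The following is a minimal sketch, not the authors' analysis script: the `(head, tail, relation, sentence)` tuple format, the `"NA"` label string, the function name `dataset_diagnostics`, and the toy examples are all assumptions made for illustration.

```python
from collections import defaultdict

NA_LABEL = "NA"  # assumed label string for the "no relation" class


def dataset_diagnostics(instances):
    """Compute two simple statistics for a distantly supervised RE dataset.

    `instances` is an iterable of (head, tail, relation, sentence) tuples.
    This input format is hypothetical and chosen only for this sketch.
    """
    total = 0
    na_count = 0
    relations_per_pair = defaultdict(set)

    for head, tail, relation, _sentence in instances:
        total += 1
        if relation == NA_LABEL:
            na_count += 1
        else:
            # Collect the distinct non-NA relations seen for each entity pair.
            relations_per_pair[(head, tail)].add(relation)

    # Entity pairs whose sentences express more than one distinct relation.
    multi = sum(1 for rels in relations_per_pair.values() if len(rels) > 1)
    n_pairs = len(relations_per_pair)

    return {
        "na_fraction": na_count / total if total else 0.0,
        "multi_relation_pair_fraction": multi / n_pairs if n_pairs else 0.0,
    }


# Toy, made-up examples (not taken from any real dataset).
toy = [
    ("Paris", "France", "capital_of", "Paris is the capital of France."),
    ("Paris", "France", "located_in", "Paris lies in northern France."),
    ("Berlin", "Germany", "capital_of", "Berlin ist die Hauptstadt von Deutschland."),
    ("Berlin", "Paris", "NA", "Berlin and Paris are both large cities."),
]
stats = dataset_diagnostics(toy)
print(stats)
```

A dataset with the unrealistic characteristics described above would report an `na_fraction` of zero and a `multi_relation_pair_fraction` of zero under this measure.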


