MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

11/11/2022
by Amir Pouran Ben Veyseh, et al.

Event Detection (ED) is the task of identifying and classifying the trigger words of event mentions in text. Despite considerable research effort on English text in recent years, ED in other languages remains significantly less explored. Extending ED to non-English languages raises important research questions: how well do existing ED models perform on different languages, how challenging is ED in other languages, and how well can ED knowledge and annotation be transferred across languages? Answering these questions requires multilingual ED datasets that provide consistent event annotation across multiple languages. Some multilingual ED datasets exist; however, they tend to cover only a handful of languages, mostly popular ones, leaving many languages uncovered. In addition, the current datasets are often small and not publicly accessible. To overcome these shortcomings, we introduce MINION, a new large-scale multilingual ED dataset that consistently annotates events for 8 languages, 5 of which are not supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION; taken together, the results call for more research effort in this area.
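To make the task definition concrete, the sketch below shows one common way to frame ED: as token-level sequence labeling, where each trigger word (or phrase) receives a BIO tag carrying its event type. This is a minimal illustration under stated assumptions, not MINION's annotation format: the `Trigger` class, the `to_bio` helper, the example sentence, and the event types `Attack`, `Die`, and `Meet` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trigger:
    """Hypothetical trigger annotation: a token span plus an event type."""
    start: int        # index of the first token of the trigger
    end: int          # exclusive end index of the trigger span
    event_type: str   # illustrative event label, not MINION's ontology


def to_bio(num_tokens: int, triggers: List[Trigger]) -> List[str]:
    """Convert trigger spans into token-level BIO tags.

    Tokens outside any trigger get "O"; the first token of a trigger gets
    "B-<type>" and any following tokens in the span get "I-<type>".
    Assumes triggers do not overlap (a later span would overwrite an earlier one).
    """
    labels = ["O"] * num_tokens
    for t in triggers:
        if not (0 <= t.start < t.end <= num_tokens):
            raise ValueError(f"invalid trigger span: {t}")
        labels[t.start] = f"B-{t.event_type}"
        for i in range(t.start + 1, t.end):
            labels[i] = f"I-{t.event_type}"
    return labels


if __name__ == "__main__":
    # Hypothetical sentence with two single-token triggers.
    tokens = ["Rebels", "attacked", "the", "town", "and", "killed", "three", "people", "."]
    triggers = [Trigger(1, 2, "Attack"), Trigger(5, 6, "Die")]
    for token, label in zip(tokens, to_bio(len(tokens), triggers)):
        print(f"{token}\t{label}")
```

Under this framing, an ED model predicts one label per token, and evaluation compares predicted trigger spans and types against the gold annotation.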
