HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response

10/10/2022
by Selim Fekih, et al.

Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises across the globe from 2018 to 2021. For each document, HumSet provides selected snippets (entries) together with the classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also introduces novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments with Pre-trained Language Models (PLMs) to establish strong baselines for future research in this domain. The dataset is available at https://blog.thedeep.io/humset/.
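To make the "multi-label entry classification" task concrete, here is a minimal sketch, assuming a simplified schema: a document holds entries, and each entry can carry several classes at once, so the target for one entry is a 0/1 vector with one slot per label rather than a single class id. The class names `Document` and `Entry`, their fields, the label names in `LABELS`, and the 0.5 decision threshold are all hypothetical illustrations, not HumSet's actual release format or label set (which comes from humanitarian analysis frameworks).

```python
from dataclasses import dataclass, field

# Hypothetical label set for illustration only; HumSet's real labels
# come from humanitarian information analysis frameworks.
LABELS = ["label_a", "label_b", "label_c", "label_d"]


@dataclass
class Entry:
    """A snippet selected from a document, with zero or more assigned classes."""
    text: str
    classes: set[str] = field(default_factory=set)


@dataclass
class Document:
    """A source document; 'language' is one of "en", "fr", "es" (assumed codes)."""
    doc_id: str
    language: str
    entries: list[Entry] = field(default_factory=list)


def multi_hot(entry: Entry, labels: list[str] = LABELS) -> list[int]:
    """Encode an entry's classes as a multi-hot vector, one slot per label."""
    return [1 if label in entry.classes else 0 for label in labels]


def decode(scores: list[float], labels: list[str] = LABELS,
           threshold: float = 0.5) -> set[str]:
    """Turn independent per-label scores (e.g. sigmoid outputs) into a label set.

    Each label is kept if its score reaches the threshold, so an entry can
    receive any number of labels, including none.
    """
    return {label for label, s in zip(labels, scores) if s >= threshold}


if __name__ == "__main__":
    doc = Document("doc-1", "en", [Entry("example snippet", {"label_b", "label_d"})])
    print(multi_hot(doc.entries[0]))
    print(sorted(decode([0.9, 0.2, 0.6, 0.1])))
```

The per-label threshold is one common way to decode multi-label predictions; it is shown here only to contrast with single-label classification, where a model would pick exactly one class via an argmax.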


