Leave no Place Behind: Improved Geolocation in Humanitarian Documents

09/06/2023
by Enrico M. Belliardo, et al.

Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Recent developments in Natural Language Processing may help extract vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that the humanitarian-domain data not only improves the performance of the classifiers (up to F1 = 0.92) but also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.
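The abstract does not say which features FeatureRank uses or how it weights them. As a rough, hypothetical illustration of the general idea of geocoding by ranking gazetteer candidates on features, the sketch below scores each candidate for a toponym by name match, feature class, population, and agreement with countries already mentioned in the document. The feature set, the weights, the `geocode`/`score` helpers, and the toy gazetteer entries (IDs and populations) are all invented for illustration and are not taken from the paper or from GeoNames; only the feature-class letters follow GeoNames conventions (A = administrative area, P = populated place, H = hydrographic, T = terrain).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    geoname_id: int             # toy identifier, NOT a real GeoNames ID
    name: str
    alt_names: tuple[str, ...]
    feature_class: str          # GeoNames-style class letter ("P", "A", ...)
    country_code: str           # ISO 3166-1 alpha-2
    population: int             # illustrative value only


# Hypothetical toy gazetteer: one ambiguous toponym in three countries.
GAZETTEER = [
    GazetteerEntry(1, "Tripoli", ("Tarabulus",), "P", "LY", 1_000_000),
    GazetteerEntry(2, "Tripoli", ("Trablous",), "P", "LB", 200_000),
    GazetteerEntry(3, "Tripoli", (), "P", "US", 1_000),
]

# Hypothetical preference for settlements and admin areas over natural features.
FEATURE_CLASS_WEIGHT = {"A": 1.0, "P": 1.0, "H": 0.3, "T": 0.3}


def candidates(toponym, gazetteer):
    """Return entries whose primary or alternate name matches (case-insensitive)."""
    key = toponym.casefold()
    return [
        e for e in gazetteer
        if key == e.name.casefold() or key in (a.casefold() for a in e.alt_names)
    ]


def score(entry, toponym, context_countries):
    """Weighted sum of hand-picked features (assumed weights, for illustration)."""
    exact = 1.0 if entry.name.casefold() == toponym.casefold() else 0.5
    fclass = FEATURE_CLASS_WEIGHT.get(entry.feature_class, 0.1)
    pop = math.log10(entry.population + 1) / 8      # roughly scaled to [0, 1]
    country = 1.0 if entry.country_code in context_countries else 0.0
    return 1.0 * exact + 0.5 * fclass + 0.5 * pop + 2.0 * country


def geocode(toponym, gazetteer, context_countries=frozenset()):
    """Link a geotagged toponym to its highest-scoring gazetteer entry, or None."""
    cands = candidates(toponym, gazetteer)
    if not cands:
        return None
    return max(cands, key=lambda e: score(e, toponym, context_countries))


if __name__ == "__main__":
    # Without context the most populous candidate wins; a document that
    # also mentions Lebanon pulls the decision toward the Lebanese city.
    print(geocode("Tripoli", GAZETTEER).country_code)
    print(geocode("Tripoli", GAZETTEER, {"LB"}).country_code)
```

The document-context feature is included because population-only ranking is one plausible way a geocoder can systematically prefer large or well-documented places; a learned ranker would fit such weights from annotated data rather than fixing them by hand.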
