Multilingual News Location Detection using an Entity-Based Siamese Network with Semi-Supervised Contrastive Learning and Knowledge Base

12/22/2022
by Víctor Suárez-Paniagua, et al.

Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, armed conflicts, disease outbreaks, or political turmoil. This detection also helps recommender systems promote relevant news based on user locations. When the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because they rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and given the lack of datasets in this area, we also contribute a gold-standard multilingual news-location dataset, NewsLOC, to the research community. It contains annotations of the relevant locations (and their WikiData IDs) for 600+ Wikinews articles in five languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines, and that fine-tuning the model with semi-supervised data increases the classification rate. The source code and the NewsLOC dataset are publicly available to the research community at https://github.com/vsuarezpaniagua/NewsLocation.
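To make the knowledge-base idea concrete, here is a minimal sketch of how linked entities can point to locations that are never named in the text. It assumes entities have already been linked to knowledge-base IDs. The `TOY_KB` dictionary, its placeholder IDs, and the function `infer_locations` are hypothetical. It is a simple frequency-voting heuristic, not the authors' entity-based Siamese network or their semi-supervised contrastive training.

```python
from collections import Counter

# Hypothetical toy knowledge base: entity ID -> associated location IDs.
# IDs are illustrative placeholders, not real WikiData identifiers.
TOY_KB = {
    "ENT_eiffel_tower": ["LOC_paris", "LOC_france"],
    "ENT_louvre": ["LOC_paris", "LOC_france"],
    "ENT_french_president": ["LOC_france"],
}


def infer_locations(entity_ids, kb, top_k=2):
    """Rank candidate locations by how many linked entities point to them.

    A simplified voting heuristic, assumed for illustration only.
    """
    votes = Counter()
    for ent in entity_ids:
        # Entities missing from the knowledge base contribute no evidence.
        for loc in kb.get(ent, []):
            votes[loc] += 1
    # Highest vote count first; ties keep first-seen order.
    return [loc for loc, _ in votes.most_common(top_k)]


if __name__ == "__main__":
    # An article mentioning the Eiffel Tower and the French president,
    # but never the word "France".
    print(infer_locations(
        ["ENT_eiffel_tower", "ENT_french_president", "ENT_unknown"], TOY_KB))
```

Here the article yields `["LOC_france", "LOC_paris"]` even though neither location appears in the text. The described system learns this relevance with an entity-based Siamese network rather than by counting votes.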

research
09/03/2019

Neural Attentive Bag-of-Entities Model for Text Classification

This study proposes a Neural Attentive Bag-of-Entities model, which is a...
research
03/19/2022

Automatic Detection of Entity-Manipulated Text using Factual Knowledge

In this work, we focus on the problem of distinguishing a human written ...
research
07/18/2022

GOAL: Towards Benchmarking Few-Shot Sports Game Summarization

Sports game summarization aims to generate sports news based on real-tim...
research
03/23/2023

SwissBERT: The Multilingual Language Model for Switzerland

We present SwissBERT, a masked language model created specifically for p...
research
04/17/2020

Batch Clustering for Multilingual News Streaming

Nowadays, digital news articles are widely available, published by vario...
research
04/26/2023

Evaluating Code Metrics in GitHub Repositories Related to Fake News and Misinformation

The surge of research on fake news and misinformation in the aftermath o...
