DeepAI AI Chat
Log In Sign Up

Multilingual News Location Detection using an Entity-Based Siamese Network with Semi-Supervised Contrastive Learning and Knowledge Base

12/22/2022
by   Víctor Suárez-Paniagua, et al.
HUAWEI Technologies Co., Ltd.
0

Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoils. Additionally, this detection also helps recommender systems to promote relevant news based on user locations. Note that, when the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because these methods rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and due to the lack of datasets in this area, we also contribute to the research community with a gold-standard multilingual news-location dataset, NewsLOC. It contains the annotation of the relevant locations (and their WikiData IDs) of 600+ Wikinews articles in five different languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines and the fine-tuned version of the model using semi-supervised data that increases the classification rate. The source code and the NewsLOC dataset are publicly available for being used by the research community at https://github.com/vsuarezpaniagua/NewsLocation.

READ FULL TEXT

page 1

page 2

page 3

page 4

09/03/2019

Neural Attentive Bag-of-Entities Model for Text Classification

This study proposes a Neural Attentive Bag-of-Entities model, which is a...
03/19/2022

Automatic Detection of Entity-Manipulated Text using Factual Knowledge

In this work, we focus on the problem of distinguishing a human written ...
07/18/2022

GOAL: Towards Benchmarking Few-Shot Sports Game Summarization

Sports game summarization aims to generate sports news based on real-tim...
03/23/2023

SwissBERT: The Multilingual Language Model for Switzerland

We present SwissBERT, a masked language model created specifically for p...
04/17/2020

Batch Clustering for Multilingual News Streaming

Nowadays, digital news articles are widely available, published by vario...
11/15/2022

MM-Locate-News: Multimodal Focus Location Estimation in News

The consumption of news has changed significantly as the Web has become ...
02/28/2019

Adversarial Training for Satire Detection: Controlling for Confounding Variables

The automatic detection of satire vs. regular news is relevant for downs...