News Across Languages - Cross-Lingual Document Similarity and Event Tracking

12/22/2015
by Jan Rupnik, et al.

News today is distributed globally: significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within the recently developed Event Registry system, we examine two aspects of this problem: how to compare articles in different languages, and how to link collections of articles in different languages that refer to the same event. Taking a multilingual stream and clusters of articles from each language, we compare different cross-lingual document similarity measures based on Wikipedia. This allows us to compute the similarity of any two articles regardless of language. Building on previous work, we show that there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages that represent the same event. We provide an extensive evaluation of the system as a whole, as well as an evaluation of the quality and robustness of the similarity measure and the linking algorithm.

