Multilingual Clustering of Streaming News

09/03/2018
by Sebastião Miranda, et al.

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that assume a small, known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient, and produces state-of-the-art results on datasets in German, English, and Spanish.
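
To make the idea of an open-ended, online label set concrete, the following is a minimal sketch of generic threshold-based stream clustering. It is not the method of the paper: the bag-of-words representation, the cosine similarity, the `threshold` value, and the names `OnlineClusterer`, `bag_of_words`, and `cosine` are all assumptions chosen for illustration, and crosslingual linking is left out entirely. Each incoming document joins the most similar existing cluster if the similarity is high enough; otherwise it starts a new cluster, so the number of clusters grows with the stream.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real document representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Hypothetical single-pass clusterer with an unbounded number of clusters."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, would need tuning on real data
        self.centroids = []         # one Counter of summed term counts per cluster

    def add(self, text):
        """Assign one document to a cluster and return the cluster id."""
        vec = bag_of_words(text)
        best_id, best_sim = None, 0.0
        for cid, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= self.threshold:
            # Similar enough: fold the document into the existing cluster.
            self.centroids[best_id].update(vec)
            return best_id
        # No sufficiently similar cluster: open a new one.
        self.centroids.append(Counter(vec))
        return len(self.centroids) - 1


clusterer = OnlineClusterer(threshold=0.3)
for headline in [
    "earthquake strikes coastal city overnight",
    "rescue teams reach coastal city after earthquake",
    "central bank raises interest rates again",
]:
    print(clusterer.add(headline), headline)
```

Because documents are processed one at a time and only cluster summaries are kept, memory grows with the number of stories rather than the number of articles, which is the property that makes this family of approaches attractive for streams.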

research
04/28/2022

Simplifying Multilingual News Clustering Through Projection From a Shared Space

The task of organizing and clustering multilingual news articles for med...
research
04/17/2020

Batch Clustering for Multilingual News Streaming

Nowadays, digital news articles are widely available, published by vario...
research
03/16/2023

Team SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification

This paper describes our approach for SemEval-2023 Task 3: Detecting the...
research
07/06/2023

MultiVENT: Multilingual Videos of Events with Aligned Natural Text

Everyday news coverage has shifted from traditional broadcasts towards a...
research
04/08/2023

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

Unsupervised discovery of stories with correlated news articles in real-...
research
03/01/2018

Growing Story Forest Online from Massive Breaking News

We describe our experience of implementing a news content organization s...
research
01/26/2021

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

We propose a method for online news stream clustering that is a variant ...
