Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.
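The clustering loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the representation names (`sparse`, `dense`), the hand-set weights, the cosine similarity, and the fixed threshold are all assumptions for illustration. In the paper the similarity weights are learned and the final decision is made by a neural classifier. The sketch only shows the non-parametric streaming K-means pattern: score each incoming document against every existing cluster along several representations, aggregate the scores, then either join the best cluster or open a new one.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class Cluster:
    """Keeps one running-mean centroid per document representation."""

    def __init__(self, doc):
        self.size = 1
        self.centroids = {name: list(vec) for name, vec in doc.items()}

    def add(self, doc):
        # Incremental mean update: c += (x - c) / n
        self.size += 1
        for name, vec in doc.items():
            centroid = self.centroids[name]
            for i, x in enumerate(vec):
                centroid[i] += (x - centroid[i]) / self.size


def assign(doc, clusters, weights, threshold):
    """Assign doc to the best-scoring cluster, or create a new cluster.

    weights and threshold are hypothetical hand-set values standing in for
    the learned similarity model and neural classifier in the paper.
    """
    best_idx, best_score = None, float("-inf")
    for idx, cluster in enumerate(clusters):
        # Weighted aggregation of per-representation similarities.
        score = sum(
            w * cosine(doc[name], cluster.centroids[name])
            for name, w in weights.items()
        )
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is None or best_score < threshold:
        clusters.append(Cluster(doc))  # no cluster is similar enough
        return len(clusters) - 1
    clusters[best_idx].add(doc)
    return best_idx


# Toy stream: two near-duplicate documents followed by an unrelated one.
stream = [
    {"sparse": [1.0, 0.0, 0.0], "dense": [1.0, 0.0]},
    {"sparse": [1.0, 0.1, 0.0], "dense": [0.9, 0.1]},
    {"sparse": [0.0, 0.0, 1.0], "dense": [0.0, 1.0]},
]
clusters = []
weights = {"sparse": 0.5, "dense": 0.5}
labels = [assign(doc, clusters, weights, threshold=0.7) for doc in stream]
```

Because the number of clusters is not fixed in advance, the threshold (or, in the paper, the classifier) is what controls how readily new event clusters are created.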

research
04/28/2022

Simplifying Multilingual News Clustering Through Projection From a Shared Space

The task of organizing and clustering multilingual news articles for med...
research
12/12/2021

Topic Detection and Tracking with Time-Aware Document Embeddings

The time at which a message is communicated is a vital piece of metadata...
research
09/03/2018

Multilingual Clustering of Streaming News

Clustering news across languages enables efficient media monitoring by a...
research
02/15/2021

Within-Document Event Coreference with BERT-Based Contextualized Representations

Event coreference continues to be a challenging problem in information e...
research
01/21/2021

Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds

Short text stream clustering is an important but challenging task since ...
research
05/28/2018

Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization

We present an approach to event coreference resolution by developing a g...
