Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds

01/21/2021
by Md Rashadul Hasan Rakib, et al.
Short text stream clustering is an important but challenging task because massive amounts of text are generated by sources such as micro-blogging, question-answering, and social news aggregation websites. A major challenge is clustering this volume of text within a reasonable amount of time. Existing state-of-the-art short text stream clustering methods cannot do so, because they compute the similarity between each incoming text and all existing clusters before assigning the text to a cluster. To overcome this challenge, we propose a fast short text stream clustering method, called FastStream, that indexes the clusters with an inverted index and computes the similarity between a text and only a selected number of clusters when assigning the text. This indexing reduces the running time not only of our proposed method but also of several state-of-the-art short text stream clustering methods. FastStream assigns a text to a new or existing cluster using similarity thresholds that are computed dynamically from statistical measures, which allows the method to deal efficiently with the concept drift problem. Experimental results demonstrate that FastStream outperforms the state-of-the-art short text stream clustering methods by a significant margin on several short text datasets. In addition, FastStream runs several orders of magnitude faster than the state-of-the-art methods.
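The two ideas in the abstract, looking up candidate clusters through an inverted index and deciding the assignment with a threshold computed from the observed similarities, can be illustrated with a minimal Python sketch. This is not the FastStream implementation: the class name, the whitespace tokenization, the term-overlap similarity, the "mean plus standard deviation" threshold rule, and the `min_sim` floor are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


class InvertedIndexClusterer:
    """Illustrative stream clusterer; every design detail here is an assumption."""

    def __init__(self, min_sim=0.5):
        self.min_sim = min_sim            # hypothetical floor on the threshold
        self.clusters = {}                # cluster id -> Counter of term frequencies
        self.index = defaultdict(set)     # term -> ids of clusters containing it
        self.next_id = 0

    def candidate_clusters(self, terms):
        """Return only the clusters that share at least one term with the text."""
        found = set()
        for term in terms:
            found |= self.index.get(term, set())
        return found

    @staticmethod
    def similarity(terms, cluster_terms):
        """Assumed measure: fraction of the text's terms present in the cluster."""
        return sum(1 for t in terms if t in cluster_terms) / len(terms)

    def add(self, text):
        """Assign one text to an existing or new cluster and return its id."""
        terms = set(text.lower().split())
        if not terms:
            raise ValueError("text has no terms")

        sims = {
            cid: self.similarity(terms, self.clusters[cid])
            for cid in self.candidate_clusters(terms)
        }

        target = None
        if sims:
            values = list(sims.values())
            # Assumed dynamic rule: the best candidate must reach the mean plus
            # one standard deviation of the candidate similarities.
            threshold = max(self.min_sim, mean(values) + pstdev(values))
            best = max(sims, key=sims.get)
            if sims[best] + 1e-9 >= threshold:  # small tolerance for float rounding
                target = best

        if target is None:                # no candidate is close enough
            target = self.next_id
            self.next_id += 1
            self.clusters[target] = Counter()

        # Update the cluster contents and the inverted index.
        self.clusters[target].update(terms)
        for term in terms:
            self.index[term].add(target)
        return target
```

In this sketch, a text that shares no term with any cluster triggers no similarity computation at all, and a text that shares terms with a few clusters is compared only against those. Restricting comparisons in this way is the kind of saving the abstract attributes to cluster indexing.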

research
01/31/2020

Enhancement of Short Text Clustering by Iterative Classification

Short text clustering is a challenging task due to the lack of signal co...
research
05/27/2019

Scaling Fine-grained Modularity Clustering for Massive Graphs

Modularity clustering is an essential tool to understand complicated gra...
research
06/24/2022

SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Sequence clustering in a streaming environment is challenging because it...
research
06/10/2015

Fast Online Clustering with Randomized Skeleton Sets

We present a new fast online clustering algorithm that reliably recovers...
research
02/14/2020

Clustering based on Point-Set Kernel

Measuring similarity between two objects is the core operation in existi...
research
01/26/2021

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

We propose a method for online news stream clustering that is a variant ...
research
06/12/2015

Leading Tree in DPCLUS and Its Impact on Building Hierarchies

This paper reveals the tree structure as an intermediate result of clust...
