Efficient Sparse Spherical k-Means for Document Clustering

07/30/2021
by Johannes Knittel, et al.

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, its time complexity grows linearly with the number of clusters k, which, depending on the size of the collection, limits its suitability for larger values of k. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric (it violates the triangle inequality). We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to significantly reduce the number of comparisons per iteration.
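The abstract does not describe the proposed index itself, so the following is only a minimal sketch of standard spherical k-means on sparse document vectors, assuming nonnegative term weights such as tf-idf. Documents and centroids are unit-length sparse vectors, each document is assigned to the centroid with the highest dot product (cosine similarity), and each centroid is recomputed as the normalized sum of its members. To illustrate why sparsity helps, assignment uses a plain inverted index over centroid features (feature to a list of (cluster, weight) pairs), so a document only accumulates similarity for centroids that share at least one nonzero feature with it. This is a generic sparse dot-product technique, not the authors' indexing structure, and it does not use convergence behavior to skip comparisons. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # hypothetical representation: feature index -> weight


def normalize(v: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {i: x / norm for i, x in v.items()} if norm > 0 else dict(v)


def build_inverted_index(centroids: List[SparseVec]) -> Dict[int, List[Tuple[int, float]]]:
    """Map each feature to the (cluster, weight) pairs of centroids that use it."""
    index: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for c, centroid in enumerate(centroids):
        for i, w in centroid.items():
            if w != 0.0:
                index[i].append((c, w))
    return index


def assign(docs: List[SparseVec], centroids: List[SparseVec]) -> List[int]:
    """Assign each document to the centroid with the highest cosine similarity."""
    index = build_inverted_index(centroids)
    labels = []
    for doc in docs:
        sims: Dict[int, float] = defaultdict(float)
        # Only centroids sharing a nonzero feature with doc are ever touched;
        # with nonnegative weights, all untouched centroids have similarity 0.
        for i, x in doc.items():
            for c, w in index.get(i, ()):
                sims[c] += x * w
        # A document sharing no feature with any centroid falls back to cluster 0.
        labels.append(max(sims, key=sims.get) if sims else 0)
    return labels


def update(docs: List[SparseVec], labels: List[int], k: int,
           old: List[SparseVec]) -> List[SparseVec]:
    """Recompute each centroid as the normalized sum of its member documents."""
    sums: List[Dict[int, float]] = [defaultdict(float) for _ in range(k)]
    for doc, c in zip(docs, labels):
        for i, x in doc.items():
            sums[c][i] += x
    # Empty clusters keep their previous centroid.
    return [normalize(s) if s else old[c] for c, s in enumerate(sums)]


def spherical_kmeans(docs: List[SparseVec], k: int, max_iter: int = 20,
                     seed: int = 0) -> Tuple[List[int], List[SparseVec]]:
    docs = [normalize(d) for d in docs]
    rng = random.Random(seed)
    centroids = [dict(d) for d in rng.sample(docs, k)]  # random documents as seeds
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign(docs, centroids)
        if new_labels == labels:
            break  # converged: no document changed its cluster
        labels = new_labels
        centroids = update(docs, labels, k, centroids)
    return labels, centroids
```

For example, `spherical_kmeans([{0: 1.0, 1: 1.0}, {0: 2.0, 1: 1.0}, {2: 1.0, 3: 1.0}, {2: 1.0, 3: 3.0}], k=2)` puts the first two documents in one cluster and the last two in the other. The cost of `assign` in this sketch still grows with the number of centroids that share features with a document, which is the dependence on k that the abstract aims to reduce.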
