K-tree: Large Scale Document Clustering

01/06/2010
by Christopher M. de Vries, et al.

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and quality with CLUTO on document collections. The K-tree has a low time complexity that makes it suitable for large document collections. Its tree structure allows for efficient disk-based implementations when space requirements exceed the capacity of main memory.
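The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a height-balanced tree in which each internal key is the centroid of a child node and a full node is split with 2-means. The names `KTree`, `Node`, `order`, and `leaf_depths` are hypothetical and chosen for illustration; this is not the authors' implementation and it omits the sparse-representation extension.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_mean(vectors, weights):
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # data vectors (leaf) or child centroids (internal)
        self.counts = []    # number of data vectors represented by each key
        self.children = []  # child nodes, parallel to keys (internal only)

    def size(self):
        return sum(self.counts)

    def centroid(self):
        return weighted_mean(self.keys, self.counts)

class KTree:
    """Illustrative K-tree-style index (assumed design, see lead-in)."""

    def __init__(self, order=4, seed=0):
        self.order = order  # maximum number of keys per node (use >= 2)
        self.rng = random.Random(seed)
        self.root = Node(leaf=True)

    def insert(self, vec):
        halves = self._insert(self.root, list(vec))
        if halves:  # the root overflowed, so the tree grows by one level
            self.root = Node(leaf=False)
            self.root.keys = [h.centroid() for h in halves]
            self.root.counts = [h.size() for h in halves]
            self.root.children = halves

    def _insert(self, node, vec):
        if node.leaf:
            node.keys.append(vec)
            node.counts.append(1)
        else:
            # Descend into the child whose centroid is nearest to vec.
            i = min(range(len(node.keys)), key=lambda j: dist2(node.keys[j], vec))
            halves = self._insert(node.children[i], vec)
            new = halves if halves else [node.children[i]]
            node.keys[i:i + 1] = [c.centroid() for c in new]
            node.counts[i:i + 1] = [c.size() for c in new]
            node.children[i:i + 1] = new
        return self._split(node) if len(node.keys) > self.order else None

    def _split(self, node):
        """Split an overfull node into two nodes with weighted 2-means."""
        n = len(node.keys)
        centers = [node.keys[j] for j in self.rng.sample(range(n), 2)]
        for _ in range(10):
            side = [0 if dist2(k, centers[0]) <= dist2(k, centers[1]) else 1
                    for k in node.keys]
            if len(set(side)) < 2:  # degenerate case: fall back to alternating
                side = [j % 2 for j in range(n)]
                break
            for s in (0, 1):
                idx = [j for j in range(n) if side[j] == s]
                centers[s] = weighted_mean([node.keys[j] for j in idx],
                                           [node.counts[j] for j in idx])
        halves = [Node(node.leaf), Node(node.leaf)]
        for j in range(n):
            half = halves[side[j]]
            half.keys.append(node.keys[j])
            half.counts.append(node.counts[j])
            if not node.leaf:
                half.children.append(node.children[j])
        return halves

    def leaf_depths(self, node=None, depth=0):
        """Set of depths at which leaves occur (one value if balanced)."""
        node = node or self.root
        if node.leaf:
            return {depth}
        return set().union(*(self.leaf_depths(c, depth + 1) for c in node.children))

if __name__ == "__main__":
    tree = KTree(order=4, seed=1)
    points = random.Random(42)
    for _ in range(200):
        tree.insert([points.random(), points.random()])
    print(tree.root.size(), tree.leaf_depths())
```

In this sketch, each insertion follows a single root-to-leaf path and clustering work happens only when a node overflows, which is where the low cost relative to repeated full k-means passes would come from. Because nodes hold at most `order` keys, they could in principle be mapped to fixed-size disk pages, in the spirit of the disk-based implementations the abstract mentions.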
