Random Indexing K-tree

01/06/2010
by Christopher M. de Vries, et al.

Random Indexing (RI) K-tree combines two algorithms for clustering: Random Indexing and K-tree. Document clustering presents many large-scale problems, and RI K-tree scales well to large inputs because of its low complexity. It also has properties that are useful for managing a changing collection, and it resolves earlier problems with sparse document vectors in K-tree. The paper defines, explains, and motivates the algorithms and data structures, and describes the specific modifications made to K-tree for use with RI. Experiments measuring cluster quality indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.
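
To make the Random Indexing half of the combination concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes the common formulation in which each term receives a sparse random index vector with a few +1 and -1 entries, and a document vector is the sum of the index vectors of its terms. The function names, the dimensionality of 64, the four non-zero entries per index vector, and the unweighted sum are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def index_vector(rng, dim, nonzeros):
    """Build a sparse ternary index vector: `nonzeros` entries are +1 or -1, the rest 0."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return vec


def random_indexing(docs, dim=64, nonzeros=4, seed=0):
    """Map each tokenized document to a dense vector of length `dim`.

    Hypothetical helper: parameter values are assumptions for illustration,
    and term occurrences are summed without any term weighting.
    """
    rng = np.random.default_rng(seed)
    term_vectors = {}  # term -> its fixed random index vector
    doc_vectors = np.zeros((len(docs), dim))
    for i, tokens in enumerate(docs):
        for term in tokens:
            if term not in term_vectors:
                term_vectors[term] = index_vector(rng, dim, nonzeros)
            doc_vectors[i] += term_vectors[term]
    return doc_vectors


# Toy collection: three tokenized documents, the last one empty.
docs = [["tree", "cluster", "document"], ["tree", "index"], []]
vectors = random_indexing(docs)
```

The resulting fixed-length dense vectors are the kind of input a tree-based clustering algorithm such as K-tree could then consume, which is one way to read the abstract's remark about resolving problems with sparse document vectors.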

