DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering

07/30/2013
by Brian P. Kent, et al.

The level set tree approach of Hartigan (1975) provides a probabilistically based and highly interpretable encoding of the clustering behavior of a dataset. By representing the hierarchy of data modes as a dendrogram of the level sets of a density estimator, this approach offers many advantages for exploratory analysis and clustering, especially for complex and high-dimensional data. Several R packages exist for level set tree estimation, but their practical usefulness is limited by computational inefficiency, the absence of interactive graphical capabilities and, from a theoretical perspective, reliance on asymptotic approximations. To make it easier for practitioners to capture the advantages of level set trees, we have written the Python package DeBaCl for DEnsity-BAsed CLustering. In this article we illustrate how DeBaCl's level set tree estimates can be used for difficult clustering tasks and interactive graphical data analysis. The package is intended to promote the practical use of level set trees through improvements in computational efficiency and a high degree of user customization. In addition, the flexible algorithms implemented in DeBaCl enjoy finite sample accuracy, as demonstrated in recent literature on density clustering. Finally, we show that the level set tree framework can be easily extended to deal with functional data.
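As a rough illustration of the idea behind a level set tree, the sketch below estimates a density with a k-nearest-neighbor rule, builds a k-nearest-neighbor graph, and counts the connected components of the upper level set at a few density levels. Tracking how those components appear and split as the level rises is what the dendrogram encodes. This is not DeBaCl's API: the function names (`knn_density`, `knn_graph`, `count_components`, `components_by_level`), the unnormalized density estimate, and the choice of graph are all assumptions made for illustration, using only NumPy.

```python
import numpy as np


def knn_density(X, k):
    """Unnormalized kNN density estimate (an illustrative choice).

    Density is taken as inversely proportional to the k-th neighbor
    radius raised to the data dimension. Assumes no duplicate points,
    so that the radius is strictly positive.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    r_k = np.sort(dist, axis=1)[:, k]  # column 0 is the point itself
    return 1.0 / r_k ** X.shape[1], dist


def knn_graph(dist, k):
    """Symmetric boolean adjacency matrix of the kNN graph."""
    n = dist.shape[0]
    neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip self
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = True
    return adj | adj.T


def count_components(adj, keep):
    """Count connected components of the subgraph on nodes where keep is True."""
    unseen = set(np.flatnonzero(keep).tolist())
    n_components = 0
    while unseen:
        stack = [unseen.pop()]
        n_components += 1
        while stack:  # depth-first search restricted to kept nodes
            u = stack.pop()
            for v in np.flatnonzero(adj[u] & keep).tolist():
                if v in unseen:
                    unseen.remove(v)
                    stack.append(v)
    return n_components


def components_by_level(X, k, levels):
    """Number of upper-level-set components at each density level."""
    density, dist = knn_density(X, k)
    adj = knn_graph(dist, k)
    return [count_components(adj, density >= lam) for lam in levels]
```

Calling `components_by_level(X, k, levels)` on an `(n, d)` array returns one component count per level. A full level set tree would additionally record which points belong to each component and how components at one level nest inside those at the level below, which this sketch omits.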

