Cluster Size Management in Multi-Stage Agglomerative Hierarchical Clustering of Acoustic Speech Segments

10/30/2018
by Lerato Lerato, et al.

Agglomerative hierarchical clustering (AHC) requires only the similarity between objects to be known. This is attractive when clustering signals of varying length, such as speech, which are not readily represented in a fixed-dimensional vector space. However, AHC is characterised by O(N^2) space and time complexity, making it infeasible for partitioning large datasets. This has recently been addressed by an approach based on the iterative re-clustering of independent subsets of the larger dataset. We show that, due to its iterative nature, this procedure can sometimes lead to unchecked growth of individual subsets, thereby compromising its effectiveness. We propose the integration of a simple space management strategy into the iterative process, and show experimentally that this leads to no loss in performance in terms of F-measure while guaranteeing that a threshold space complexity is not breached.
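To make the space argument concrete, the following is a minimal sketch of one possible way to keep subset sizes under a threshold between re-clustering iterations. It is not the authors' strategy, which the abstract does not specify. The function names `condensed_entries` and `cap_subset_sizes`, the `max_size` parameter, and the even-split rule are all assumptions made for illustration.

```python
from math import ceil


def condensed_entries(n):
    """Pairwise similarities AHC must store for n objects: n(n-1)/2."""
    return n * (n - 1) // 2


def cap_subset_sizes(subsets, max_size):
    """Split every subset larger than max_size into near-equal parts.

    Hypothetical helper: the even split is an assumed rule, chosen only
    to show that a size cap bounds the per-subset similarity matrix.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    capped = []
    for subset in subsets:
        parts = ceil(len(subset) / max_size)
        if parts <= 1:
            # Already within the threshold; keep unchanged.
            capped.append(list(subset))
            continue
        # Distribute elements so part sizes differ by at most one.
        base, extra = divmod(len(subset), parts)
        start = 0
        for i in range(parts):
            size = base + (1 if i < extra else 0)
            capped.append(list(subset[start:start + size]))
            start += size
    return capped


# Example: one subset has grown to 10 segments, the threshold is 4.
subsets = [list(range(10)), list(range(10, 13))]
capped = cap_subset_sizes(subsets, max_size=4)
worst_case_storage = max(condensed_entries(len(s)) for s in capped)
```

With a cap in place, each AHC run stores at most `condensed_entries(max_size)` similarities, regardless of how unevenly earlier iterations distributed the segments.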


