SMLSOM: The shrinking maximum likelihood self-organizing map

04/28/2021
by Ryosuke Motegi, et al.

Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many methods have been proposed to select the number of clusters, treating the task as a model selection problem. This paper proposes a greedy algorithm that automatically selects a suitable number of clusters within a probability distribution model framework. The algorithm has two components. First, it introduces a generalization of Kohonen's self-organizing map (SOM) in which each node is linked to a probability distribution model, so that the winner is searched for based on the likelihood of each node. Second, the proposed method uses a graph structure and a neighborhood defined by the length of the shortest path between nodes, in contrast to Kohonen's SOM, in which the nodes are fixed in Euclidean space. This implementation makes it possible to update the graph structure by cutting links to weakly connected nodes, which avoids unnecessary node deletion. The weakness of a node connection is measured using the Kullback–Leibler divergence, and the redundancy of a node is measured by the minimum description length (MDL). This updating step makes it easy to determine a suitable number of clusters. Compared with existing methods, the proposed method is computationally efficient and can accurately select the number of clusters and perform clustering.
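The three ingredients named in the abstract (likelihood-based winner search, a neighborhood measured by shortest-path length on a graph, and a Kullback–Leibler measure of how weak a link is) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each node carries a one-dimensional Gaussian given as a (mean, variance) pair, and every function name, the toy node values, and the adjacency list are hypothetical choices made for illustration. The MDL-based redundancy check is not shown.

```python
import math
from collections import deque


def gaussian_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) at x (assumed node model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def find_winner(x, nodes):
    """Index of the node whose model assigns x the highest log-likelihood."""
    return max(range(len(nodes)), key=lambda i: gaussian_loglik(x, *nodes[i]))


def graph_distances(adjacency, source):
    """Shortest-path length in hops from source to each reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def kl_gaussian(p, q):
    """KL(p || q) for two 1-D Gaussians given as (mean, var) pairs."""
    (m0, v0), (m1, v1) = p, q
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


# Hypothetical toy map: three nodes on a chain 0 - 1 - 2.
nodes = [(0.0, 1.0), (0.5, 1.0), (5.0, 1.0)]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

winner = find_winner(4.2, nodes)           # node with the highest likelihood
hops = graph_distances(adjacency, winner)  # neighborhood by path length

# A large divergence across a link marks it as a candidate for cutting.
link_weakness = {
    (u, v): kl_gaussian(nodes[u], nodes[v])
    for u in adjacency for v in adjacency[u] if u < v
}
```

In this toy setup the link between nodes 1 and 2 has a much larger divergence than the link between nodes 0 and 1, so it is the one a link-cutting step would examine first. Because neighborhoods are defined by hops on the graph rather than by fixed grid coordinates, cutting that link changes which nodes count as neighbors without deleting any node.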


