A Novel Initial Clusters Generation Method for K-means-based Clustering Algorithms for Mixed Datasets

01/31/2019
by   Amir Ahmad, et al.
Mixed datasets consist of numeric and categorical attributes. Various K-means-based clustering algorithms have been developed to cluster these datasets. Generally, these clustering algorithms use random initial clusters, which in turn produce different clustering results in different runs. A few cluster initialisation methods have been developed to compute initial clusters; however, they are either computationally expensive or do not produce the same clustering results in different runs. In this paper, we propose a novel approach to find initial clusters for K-means-based clustering algorithms for mixed datasets. The proposed approach is based on the observation that some data points in a dataset remain in the same clusters created by a K-means-based clustering algorithm irrespective of the choice of initial clusters. It is proposed that individual attribute information can be used to create initial clusters. A K-means-based clustering algorithm is run many times; in each run, one of the attributes is used to create initial clusters. The clustering results of the various runs are combined to produce a single clustering result, which is then used as the initial clusters for a K-means-based clustering algorithm. Experiments with various categorical and mixed datasets showed that the proposed clustering approach produced accurate and consistent results.
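
The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how an attribute is turned into initial clusters or how the runs are combined, so the quantile binning in `partition_from_attribute` and the "keep the k most frequent label combinations" rule in `attribute_based_initial_clusters` are assumptions made for illustration. All function names are hypothetical, and the sketch handles numeric attributes only, whereas the paper targets mixed data.

```python
from collections import Counter

import numpy as np


def kmeans_from_partition(X, labels, k, n_iter=100):
    """Plain K-means (Lloyd iterations) started from an initial partition."""
    labels = np.asarray(labels)
    for _ in range(n_iter):
        # Centre of each cluster; an empty cluster falls back to the global mean.
        centres = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else X.mean(axis=0)
            for c in range(k)
        ])
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def partition_from_attribute(column, k):
    """ASSUMPTION: build initial clusters by cutting one attribute into k quantile bins."""
    edges = np.quantile(column, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, column, side="right")


def attribute_based_initial_clusters(X, k):
    """One K-means run per attribute, then combine the runs into one partition."""
    X = np.asarray(X, dtype=float)
    runs = [
        kmeans_from_partition(X, partition_from_attribute(X[:, j], k), k)
        for j in range(X.shape[1])
    ]
    # Each point's "signature" is its tuple of cluster labels across all runs.
    signatures = list(zip(*(r.tolist() for r in runs)))
    # ASSUMPTION: the k most frequent signatures define the core clusters.
    core = [sig for sig, _ in Counter(signatures).most_common(k)]
    initial = np.array([core.index(s) if s in core else -1 for s in signatures])
    # Points outside the core groups join the nearest core-cluster centre.
    centres = np.array([X[initial == c].mean(axis=0) for c in range(len(core))])
    loose = initial == -1
    if loose.any():
        d = ((X[loose, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        initial[loose] = d.argmin(axis=1)
    return initial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
    initial = attribute_based_initial_clusters(X, k=2)
    final = kmeans_from_partition(X, initial, k=2)  # final run from the combined result
    print(np.bincount(final, minlength=2))
```

Because nothing in the sketch is random, repeated calls on the same data give the same initial clusters, which is the consistency property the abstract emphasises. Handling categorical attributes would need a mixed-data distance and centre definition that the abstract does not specify.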

research
08/15/2022

POCS-based Clustering Algorithm

A novel clustering technique based on the projection onto convex set (PO...
research
10/09/2021

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

This paper introduces k-splits, an improved hierarchical algorithm based...
research
03/09/2020

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heur...
research
02/08/2012

Robust seed selection algorithm for k-means type algorithms

Selection of initial seeds greatly affects the quality of the clusters a...
research
08/21/2020

ConiVAT: Cluster Tendency Assessment and Clustering with Partial Background Knowledge

The VAT method is a visual technique for determining the potential clust...
research
12/27/2016

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Clustering is a widely used unsupervised learning method for finding str...
research
11/08/2022

Significance-Based Categorical Data Clustering

Although numerous algorithms have been proposed to solve the categorical...
