Similarity-based Distance for Categorical Clustering using Space Structure

11/19/2020
by Utkarsh Nath, et al.

Clustering is the task of spotting patterns in a group of objects and grouping similar objects together. Attributes of objects are not always numerical; some attributes instead take values from a domain of categories. Such data is called categorical data. Many clustering algorithms have been used to group categorical data, among which the k-modes algorithm has so far given the most significant results. Nevertheless, there is still considerable room for improvement: algorithms such as k-means, fuzzy c-means, and hierarchical clustering achieve far better accuracy on numerical data. In this paper, we propose a novel distance metric, similarity-based distance (SBD), to measure the distance between objects of categorical data. Experiments show that SBD, when used with a space structure based clustering (SBC) type algorithm, significantly outperforms existing algorithms such as k-modes and other SBC-type algorithms on categorical datasets.
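For readers unfamiliar with the k-modes baseline mentioned in the abstract, the following is a minimal sketch of the simple matching dissimilarity and the cluster mode that k-modes conventionally relies on. It is not the SBD metric proposed in the paper, whose definition is not given in the abstract; the function names `matching_distance` and `cluster_mode` and the toy data are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def matching_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Simple matching dissimilarity: the number of attributes on which
    two categorical objects disagree. This is the baseline distance
    typically used by k-modes, NOT the paper's SBD."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects: Sequence[Sequence[Hashable]]) -> Tuple[Hashable, ...]:
    """Per-attribute most frequent category, which k-modes uses as the
    cluster center in place of the mean used by k-means."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


# Hypothetical toy objects with attributes (color, size, shape).
objects = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "round"),
]

center = cluster_mode(objects)
distances = [matching_distance(obj, center) for obj in objects]
```

Because this distance only counts mismatches, every pair of distinct categories is treated as equally far apart. That coarseness is one reason distance measures for categorical data remain an active research topic.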

