Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data

08/17/2016
by Rajiv Sambasivan, et al.

Datasets with a mixture of numerical and categorical attributes are routinely encountered in many application domains. In this work we examine an approach to clustering such datasets using homogeneity analysis. Homogeneity analysis determines a Euclidean representation of the data, which can then be analyzed with the large body of tools and techniques available for Euclidean data. Experiments conducted as part of this study suggest that this approach can be useful in the analysis and exploration of big datasets with a mixture of numerical and categorical attributes.
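
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that numeric attributes are first binned into categories (an assumption made here, not stated above), computes a one-dimensional homogeneity analysis solution by the classical alternating least squares scheme (reciprocal averaging), and then clusters the resulting Euclidean object scores with a small 1-D two-means routine. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def bin_numeric(values, n_bins=2):
    """Equal-width binning (assumed preprocessing for numeric columns)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def standardize(x):
    """Center the scores and scale them to mean square 1."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    scale = sqrt(sum(v * v for v in x) / n)
    if scale == 0.0:
        raise ValueError("scores collapsed to a constant")
    return [v / scale for v in x]


def object_scores(columns, n_iter=100):
    """One-dimensional homogeneity analysis via alternating least squares.

    columns: list of categorical columns, each a list of n labels.
    Returns one Euclidean score per row (object).
    """
    n = len(columns[0])
    x = standardize([float(i) for i in range(n)])  # arbitrary non-constant start
    for _ in range(n_iter):
        new_x = [0.0] * n
        for col in columns:
            # Category quantification: mean score of the objects in each category.
            sums, counts = defaultdict(float), defaultdict(int)
            for label, v in zip(col, x):
                sums[label] += v
                counts[label] += 1
            # Object score: average of the quantifications of its categories.
            for i, label in enumerate(col):
                new_x[i] += sums[label] / counts[label] / len(columns)
        x = standardize(new_x)
    return x


def two_means_1d(x, n_iter=20):
    """Tiny 1-D k-means with k=2, standing in for any Euclidean clustering tool."""
    c0, c1 = min(x), max(x)
    labels = []
    for _ in range(n_iter):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


# Hypothetical mixed dataset: two categorical columns and one numeric column.
color = ["red", "red", "red", "blue", "blue", "blue"]
size = ["small", "small", "large", "large", "large", "large"]
weight = [1.0, 1.2, 0.9, 5.0, 5.5, 4.8]

scores = object_scores([color, size, bin_numeric(weight)])
labels = two_means_1d(scores)
print(scores)
print(labels)
```

On this toy input the first three rows and the last three rows end up in separate clusters, even though the third row shares its `size` category with the second group. In practice one would typically compute several dimensions and hand the scores to a standard clustering library; the single dimension here only keeps the sketch short.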

research
11/11/2018

A Survey of Mixed Data Clustering Algorithms

Most of the datasets normally contain either numeric or categorical feat...
research
11/19/2020

Similarity-based Distance for Categorical Clustering using Space Structure

Clustering is spotting pattern in a group of objects and resultantly gro...
research
09/30/2019

K-Metamodes: frequency- and ensemble-based distributed k-modes clustering for security analytics

Nowadays processing of Big Security Data, such as log messages, is commo...
research
04/01/2022

Real-world K-Anonymity Applications: the KGen approach and its evaluation in Fraudulent Transactions

K-Anonymity is a property for the measurement, management, and governanc...
research
08/24/2017

GALILEO: A Generalized Low-Entropy Mixture Model

We present a new method of generating mixture models for data with categ...
research
12/23/2022

Balanced Subsampling for Big Data with Categorical Covariates

The use and analysis of massive data are challenging due to the high sto...
research
10/18/2022

Clustering Categorical Data: Soft Rounding k-modes

Over the last three decades, researchers have intensively explored vario...
