Hierarchical clustering of mixed-type data based on barycentric coding

01/31/2022
by Odysseas Moschidis, et al.

Clustering mixed-type datasets can be particularly challenging, as it requires taking into account the associations between variables with different levels of measurement, i.e., nominal, ordinal and/or interval. In some cases, hierarchical clustering is considered a suitable approach, as it makes few assumptions about the data and its solution can be easily visualized. Since most hierarchical clustering approaches assume that all variables are measured on the same scale, a simple strategy for clustering mixed-type data is to homogenize the variables before clustering, either by recoding the continuous variables as categorical ones or vice versa. However, typical discretization of continuous variables entails a loss of information. In this work, an agglomerative hierarchical clustering approach for mixed-type data is proposed, which relies on a barycentric coding of continuous variables. The proposed approach minimizes information loss and is compatible with the framework of correspondence analysis. The utility of the method is demonstrated on real and simulated data.
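The abstract does not spell out the coding scheme, so the following is only a minimal sketch of one common form of barycentric (fuzzy) coding, assuming three categories (low, mid, high) and three user-chosen pivot values. The function names `barycentric_code` and `one_hot`, the pivots, and the toy data are hypothetical and are not taken from the paper. Each continuous value is spread over the two neighbouring categories so that the weights sum to 1 and their weighted average of the pivots recovers the original value, which is how such a coding limits the information lost by crisp discretization.

```python
def barycentric_code(value, lo, mid, hi):
    """Return (low, mid, high) weights for one continuous value.

    Assumed scheme: piecewise-linear coding on three pivots lo < mid < hi.
    The weights are non-negative and sum to 1.
    """
    if not lo < mid < hi:
        raise ValueError("pivots must satisfy lo < mid < hi")
    v = min(max(value, lo), hi)  # clamp values that fall outside the pivots
    if v <= mid:
        w = (v - lo) / (mid - lo)  # share of weight moved from "low" to "mid"
        return (1.0 - w, w, 0.0)
    w = (v - mid) / (hi - mid)  # share of weight moved from "mid" to "high"
    return (0.0, 1.0 - w, w)


def one_hot(label, categories):
    """Crisp indicator coding for a nominal variable."""
    return tuple(1.0 if label == c else 0.0 for c in categories)


# Hypothetical toy data: one continuous and one nominal variable.
ages = [20.0, 30.0, 50.0, 60.0]
colours = ["red", "blue", "red", "blue"]
pivots = (20.0, 40.0, 60.0)

# Each row concatenates the fuzzy and crisp codings, as in an indicator matrix.
rows = [
    barycentric_code(a, *pivots) + one_hot(c, ("red", "blue"))
    for a, c in zip(ages, colours)
]

# The weighted average of the pivots gives back each original value.
recovered = [sum(w * p for w, p in zip(r[:3], pivots)) for r in rows]
```

Every row of `rows` sums to the number of variables, as in an ordinary indicator matrix, so a table of this kind could be passed to correspondence analysis or to a distance-based agglomerative routine. Which distance and linkage the paper actually uses is not stated in the abstract.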

