Fuzzy clustering of distribution-valued data using adaptive L2 Wasserstein distances

05/02/2016
by Antonio Irpino, et al.

Distributional (or distribution-valued) data are a new type of data arising from several sources and are considered as realizations of distributional variables. A new set of fuzzy c-means algorithms for data described by distributional variables is proposed. The algorithms use the L2 Wasserstein distance between distributions as the dissimilarity measure. Besides extending the fuzzy c-means algorithm to distributional data, and building on a decomposition of the squared L2 Wasserstein distance, we propose a set of algorithms that use different automatic ways to compute the weights associated with the variables, as well as with their components, either globally or cluster-wise. The relevance weights are computed during the clustering process by introducing product-to-one constraints. The relevance weights induce adaptive distances that express the importance of each variable, or of each component, in the clustering process, and thereby also act as a variable selection method in clustering. We have tested the proposed algorithms on artificial and real-world data. The results confirm that the proposed methods capture the cluster structure of the data better than the standard fuzzy c-means with non-adaptive distances.
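The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name and parameter below is hypothetical. It assumes one-dimensional distributions represented by quantile functions sampled on a uniform grid, and it uses one standard split of the squared L2 Wasserstein distance into a position part (difference of means) and a variability part (difference of centered quantile functions); the decomposition used in the paper may be finer. The membership update is the textbook fuzzy c-means rule, and the weight formula is the closed-form minimizer of a weighted sum of dispersions under a product-to-one constraint.

```python
import numpy as np


def quantiles(sample, n_levels=100):
    """Empirical quantile function evaluated on a uniform midpoint grid of (0, 1)."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    return np.quantile(np.asarray(sample, dtype=float), levels)


def w2_squared_components(q1, q2):
    """Split the squared L2 Wasserstein distance into (position, variability).

    On a uniform grid the squared distance is mean((q1 - q2) ** 2), and it
    equals the sum of the two returned components.
    """
    position = (q1.mean() - q2.mean()) ** 2
    variability = np.mean(((q1 - q1.mean()) - (q2 - q2.mean())) ** 2)
    return position, variability


def fuzzy_memberships(d2, m=2.0, eps=1e-12):
    """Textbook fuzzy c-means update from squared distances d2 (objects x clusters)."""
    inv = (np.asarray(d2, dtype=float) + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def product_to_one_weights(dispersion, eps=1e-12):
    """Weights w minimizing sum(w * dispersion) subject to prod(w) == 1.

    The solution is the geometric mean of the dispersions divided by each
    dispersion, so low-dispersion variables (or components) get large weights.
    """
    dispersion = np.asarray(dispersion, dtype=float) + eps
    geometric_mean = np.exp(np.mean(np.log(dispersion)))
    return geometric_mean / dispersion


# Illustrative data only: two synthetic samples standing in for two distributions.
rng = np.random.default_rng(0)
qa = quantiles(rng.normal(0.0, 1.0, 500))
qb = quantiles(rng.normal(2.0, 3.0, 500))
position, variability = w2_squared_components(qa, qb)
weights = product_to_one_weights([position, variability])
```

In an adaptive variant, the dispersions passed to `product_to_one_weights` would be membership-weighted sums of each component's distances to the cluster prototypes, computed either over all clusters (global weights) or per cluster (cluster-wise weights).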

research
11/18/2010

A Fuzzy Clustering Model for Fuzzy Data with Outliers

In this paper a fuzzy clustering model for fuzzy data with outliers is p...
research
04/19/2018

Multiple factor analysis of distributional data

In the framework of Symbolic Data Analysis (SDA), distribution-variables...
research
10/17/2021

Noise-robust Clustering

This paper presents noise-robust clustering techniques in unsupervised m...
research
02/18/2021

Fuzzy clustering algorithms with distance metric learning and entropy regularization

The clustering methods have been used in a variety of fields such as ima...
research
02/13/2018

Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances

We revisit Markowitz's mean-variance portfolio selection model by consid...
research
04/24/2018

Classifying variable-structures: a general framework

In this work, we unify recent variable-clustering techniques within a co...
research
06/07/2022

Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances

The Sliced-Wasserstein distance (SW) is a computationally efficient and ...
