Fast Discrete Distribution Clustering Using Wasserstein Barycenter with Sparse Support

09/30/2015
by Jianbo Ye, et al.

In a variety of research areas, the weighted bag of vectors and the histogram are widely used descriptors for complex objects. Both can be expressed as discrete distributions. D2-clustering seeks the minimum total within-cluster variation for a set of discrete distributions under the Kantorovich-Wasserstein metric. D2-clustering scales poorly: its bottleneck is the computation of a centroid distribution, called the Wasserstein barycenter, that minimizes the sum of squared distances to the cluster members. In this paper, we develop a modified Bregman ADMM approach for computing the approximate discrete Wasserstein barycenter of large clusters. When the support points of the barycenters are unknown and have low cardinality, our method empirically achieves high accuracy at a much lower computational cost. We examine the strengths and weaknesses of our method and its alternatives through experiments, and we recommend scenarios in which each should be used. Moreover, we develop both serial and parallelized versions of the algorithm. By experimenting with large-scale data, we demonstrate the computational efficiency of the new methods and investigate their convergence properties and numerical stability. The clustering results obtained on several datasets in different domains are highly competitive with widely used methods in the corresponding areas.
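
To make the objective concrete, the sketch below evaluates the within-cluster variation (the sum of squared Wasserstein distances from cluster members to a candidate centroid) for one-dimensional discrete distributions. It is not the modified Bregman ADMM method of the paper: it assumes 1-D supports, where the sorted (monotone) coupling is optimal for the squared-distance cost, so no optimization solver is needed. The function names `w2_squared_1d` and `within_cluster_variation` are hypothetical and chosen only for illustration.

```python
def w2_squared_1d(xs, ws, ys, vs):
    """Squared 2-Wasserstein distance between two 1-D discrete distributions.

    xs, ys are support points; ws, vs are nonnegative weights summing to 1.
    Assumption: supports are one-dimensional, so matching mass in sorted
    order is an optimal transport plan for the squared-distance cost.
    """
    p = sorted(zip(xs, ws))
    q = sorted(zip(ys, vs))
    i = j = 0
    wi, vj = p[0][1], q[0][1]   # mass still to be moved at the current points
    cost = 0.0
    eps = 1e-12                 # tolerance for floating-point leftovers
    while i < len(p) and j < len(q):
        m = min(wi, vj)         # move as much mass as both points allow
        cost += m * (p[i][0] - q[j][0]) ** 2
        wi -= m
        vj -= m
        if wi <= eps:           # source point exhausted, advance
            i += 1
            wi = p[i][1] if i < len(p) else 0.0
        if vj <= eps:           # target point exhausted, advance
            j += 1
            vj = q[j][1] if j < len(q) else 0.0
    return cost


def within_cluster_variation(members, centroid):
    """Sum of squared Wasserstein distances from each member to the centroid.

    Each distribution is a (support_points, weights) pair.
    """
    cx, cw = centroid
    return sum(w2_squared_1d(x, w, cx, cw) for x, w in members)


if __name__ == "__main__":
    members = [([0.0, 1.0], [0.5, 0.5]), ([1.0, 2.0], [0.5, 0.5])]
    candidate = ([0.5, 1.5], [0.5, 0.5])
    print(within_cluster_variation(members, candidate))
```

A barycenter is a centroid that minimizes this quantity over candidate distributions; the sketch only evaluates the objective for a given candidate and does not search for the minimizer.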

