Importance Sampling Deterministic Annealing for Clustering

02/09/2023
by   Jiangshe Zhang, et al.

A common assumption of most clustering methods is that the training data and future data are drawn from the same distribution. However, this assumption may not hold in some real-world scenarios. In this paper, we propose an importance sampling based deterministic annealing approach (ISDA) for clustering problems, which minimizes the worst case of the expected distortion under a constraint on distribution deviation. The distribution deviation constraint can be converted into a constraint over a set of weight distributions, derived from importance sampling and centered on the uniform distribution. The objective of the proposed approach is to minimize the loss under maximum degradation; the resulting problem is therefore a constrained minimax optimization problem, which can be reformulated as an unconstrained problem using the Lagrange method and solved by a quasi-Newton algorithm. Experimental results on synthetic datasets and a real-world load forecasting problem validate the effectiveness of the proposed ISDA. Furthermore, we show that fuzzy c-means is a special case of ISDA with the logarithmic distortion. This observation sheds new light on the relationship between fuzzy c-means and deterministic annealing clustering algorithms and provides an interesting physical and information-theoretical interpretation of the fuzzy exponent m.
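Since the abstract relates ISDA to fuzzy c-means and its fuzzy exponent m, a minimal sketch of the textbook fuzzy c-means iteration may help fix notation. This is not the ISDA algorithm, whose update rules are not given here; the function name `fuzzy_c_means`, its parameters, and the random initialization are illustrative assumptions.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Illustrative sketch only (not ISDA). Returns (centers, memberships),
    where memberships[i][k] is the degree to which points[i] belongs to
    cluster k; each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzy exponent m must be greater than 1")
    rng = random.Random(seed)
    # Assumption: initialize centers from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)),
        # where d_ik is the Euclidean distance from point i to center k.
        memberships = []
        for p in points:
            sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(sq) == 0.0:
                # The point coincides with a center: assign it fully there.
                hits = [1.0 if s == 0.0 else 0.0 for s in sq]
                total = sum(hits)
                memberships.append([h / total for h in hits])
            else:
                inv = [s ** (-1.0 / (m - 1.0)) for s in sq]
                total = sum(inv)
                memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        for k in range(n_clusters):
            w = [row[k] ** m for row in memberships]
            total = sum(w)
            centers[k] = [
                sum(wi * p[j] for wi, p in zip(w, points)) / total
                for j in range(dim)
            ]
    return centers, memberships
```

In this standard formulation, values of m close to 1 push the memberships toward hard (k-means-like) assignments, while larger m makes them more uniform across clusters.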

research
02/09/2023

An information-theoretic learning model based on importance sampling

A crucial assumption underlying the most current theory of machine learn...
research
06/27/2023

Adaptive Annealed Importance Sampling with Constant Rate Progress

Annealed Importance Sampling (AIS) synthesizes weighted samples from an ...
research
09/09/2023

Correcting sampling biases via importance reweighting for spatial modeling

In machine learning models, the estimation of errors is often complex du...
research
07/05/2023

Privacy Amplification via Importance Sampling

We examine the privacy-enhancing properties of subsampling a data set vi...
research
03/21/2022

Coresets for Weight-Constrained Anisotropic Assignment and Clustering

The present paper constructs coresets for weight-constrained anisotropic...
research
09/07/2010

Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

This paper proposes an organized generalization of Newman and Girvan's m...
research
01/10/2020

Entropy Regularized Power k-Means Clustering

Despite its well-known shortcomings, k-means remains one of the most wid...
