The Informativeness of k-Means and Dimensionality Reduction for Learning Mixture Models

03/30/2017
by Zhaoqiang Liu, et al.

Learning a mixture model can be viewed as a clustering problem. Indeed, given data samples generated independently from a mixture of distributions, we often want to recover the correct target clustering, which groups the samples according to the component distribution that generated each one. For clustering problems, practitioners often use the simple k-means algorithm, which attempts to find an optimal clustering, that is, one that minimizes the sum of squared distances between each point and its cluster center. In this paper, we provide sufficient conditions under which any optimal clustering is close to the correct target clustering, assuming that the data samples are generated from a mixture of log-concave distributions. Moreover, we show that under similar or even weaker conditions on the mixture model, any optimal clustering of the samples after dimensionality reduction is also close to the correct target clustering. These results provide intuition for the informativeness of k-means (with and without dimensionality reduction) as an algorithm for learning mixture models. We verify the correctness of our theorems with numerical experiments, and we demonstrate that reducing the dimensionality of the datasets yields significant speedups in the time required to perform clustering.
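The setting described above can be sketched in standard-library Python as a minimal illustration under stated assumptions, not the authors' experimental code. The sketch samples from a mixture of spherical Gaussians (Gaussian densities are log-concave), runs k-means with Lloyd's heuristic using k-means++ seeding and several restarts, and keeps the run with the lowest sum of squared distances as a practical stand-in for an optimal clustering (Lloyd's method does not guarantee one). It then repeats the clustering after reducing the data to k - 1 dimensions, and measures closeness to the target clustering as the fraction of misclustered points under the best relabeling of cluster indices. All function names (`run_experiment`, `pca_project`, `clustering_distance`, and the rest), the parameter values, the use of PCA computed by power iteration as the reduction, and this particular distance between clusterings are hypothetical choices made for the example, not taken from the paper.

```python
# Illustrative sketch only: names, parameter values, PCA as the reduction, and the
# misclustering distance are assumptions for this example, not the paper's method.
import itertools
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sample_gaussian_mixture(centers, sigma, n_per_comp, rng):
    """Draw n_per_comp points from each spherical Gaussian N(center, sigma^2 I).
    Gaussians are one example of log-concave component distributions."""
    points, labels = [], []
    for label, c in enumerate(centers):
        for _ in range(n_per_comp):
            points.append([ci + rng.gauss(0.0, sigma) for ci in c])
            labels.append(label)
    return points, labels

def cluster_means(points, labels, k, fallback):
    """Mean of each cluster; a cluster with no points keeps fallback[j]."""
    means = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        means.append([sum(col) / len(members) for col in zip(*members)] if members else fallback[j])
    return means

def kmeans_objective(points, labels, k):
    """Sum of squared distances between each point and its cluster center (mean)."""
    means = cluster_means(points, labels, k, [None] * k)
    return sum(sq_dist(p, means[lab]) for p, lab in zip(points, labels))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to D^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers

def lloyd(points, k, rng, max_iter=100):
    """Lloyd's heuristic: alternate nearest-center assignment and mean updates.
    It reaches a local optimum of the k-means objective, not necessarily a global one."""
    centers = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        centers = cluster_means(points, labels, k, centers)
    return labels

def best_kmeans(points, k, rng, n_init=10):
    """Keep the restart with the lowest objective as a proxy for an optimal clustering."""
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda labels: kmeans_objective(points, labels, k))

def pca_project(points, target_dim, rng, iters=200):
    """Project centered data onto its top target_dim principal directions, found by
    power iteration on the sample covariance with Gram-Schmidt deflation."""
    n, dim = len(points), len(points[0])
    mean = [sum(col) / n for col in zip(*points)]
    X = [[x - m for x, m in zip(p, mean)] for p in points]
    cov = [[sum(row[a] * row[b] for row in X) / n for b in range(dim)] for a in range(dim)]
    directions = []
    for _ in range(target_dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            for u in directions:  # remove components along directions already found
                dot = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - dot * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        directions.append(v)
    return [[sum(xi * ui for xi, ui in zip(x, u)) for u in directions] for x in X]

def clustering_distance(labels_a, labels_b, k):
    """Fraction of points on which two k-clusterings disagree, minimized over relabelings."""
    mismatches = (sum(perm[a] != b for a, b in zip(labels_a, labels_b))
                  for perm in itertools.permutations(range(k)))
    return min(mismatches) / len(labels_a)

def run_experiment(seed=0, k=3, dim=20, n_per_comp=100, sigma=1.0, separation=10.0):
    """Compare k-means on the raw samples and on a PCA projection to k - 1 dimensions."""
    rng = random.Random(seed)
    centers = [[separation if d == j else 0.0 for d in range(dim)] for j in range(k)]
    points, truth = sample_gaussian_mixture(centers, sigma, n_per_comp, rng)
    full = best_kmeans(points, k, rng)
    reduced = best_kmeans(pca_project(points, k - 1, rng), k, rng)
    return clustering_distance(full, truth, k), clustering_distance(reduced, truth, k)

if __name__ == "__main__":
    full_err, reduced_err = run_experiment()
    print(f"misclustered fraction, full dimension: {full_err:.3f}")
    print(f"misclustered fraction, PCA to k - 1 dimensions: {reduced_err:.3f}")
```

Varying `separation`, `sigma`, or `dim` in `run_experiment` is one way to explore how the error of each variant depends on how well separated the components are, and timing the two `best_kmeans` calls gives a rough comparison of clustering time in the full and reduced spaces.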

research
11/12/2017

Parameter Estimation in Finite Mixture Models by Regularized Optimal Transport: A Unified Framework for Hard and Soft Clustering

In this short paper, we formulate parameter estimation for finite mixtur...
02/03/2015

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Laplacian mixture models identify overlapping regions of influence in un...
09/05/2017

A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms

Probabilistic mixture models have been widely used for different machine...
12/11/2018

Robust Bregman Clustering

Using a trimming approach, we investigate a k-means type method based on...
10/05/2017

Reliable Learning of Bernoulli Mixture Models

In this paper, we have derived a set of sufficient conditions for reliab...
03/04/2022

False clustering rate control in mixture models

The clustering task consists in delivering labels to the members of a sa...
09/16/2020

Clustering Data with Nonignorable Missingness using Semi-Parametric Mixture Models

We are concerned in clustering continuous data sets subject to nonignora...
