Reliable Learning of Bernoulli Mixture Models

10/05/2017
by Amir Najafi, et al.

In this paper, we derive a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMMs) when the number of clusters is unknown. A BMM describes a random binary vector whose components are independent Bernoulli trials with cluster-specific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics, where researchers aim to infer population structure from multilocus genotype data. Our findings stipulate a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample such that the existence of a clustering algorithm with sufficient accuracy is guaranteed. Moreover, the mathematical intuitions and tools behind our work can help researchers design more effective and theoretically plausible heuristic methods for similar problems.
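To make the generative model concrete, the following is a minimal sketch of how BMM data could be simulated. It is not the authors' code. The function name `sample_bmm`, the mixing proportions `weights`, and the frequency table `freqs` are hypothetical names and values chosen only for illustration: each sample first picks a cluster, then draws one independent Bernoulli trial per locus using that cluster's frequencies.

```python
import random


def sample_bmm(n_samples, weights, freqs, seed=0):
    """Draw binary vectors from a Bernoulli mixture (illustrative sketch).

    weights[k]  : assumed mixing proportion of cluster k
    freqs[k][l] : assumed success probability of trial (locus) l in cluster k
    Returns (labels, data), where data[i] is a list of 0/1 values.
    """
    rng = random.Random(seed)
    n_clusters = len(weights)
    labels, data = [], []
    for _ in range(n_samples):
        # Pick the latent cluster according to the mixing proportions.
        k = rng.choices(range(n_clusters), weights=weights)[0]
        # Each component is an independent Bernoulli trial with the
        # cluster-specific frequency for that position.
        x = [1 if rng.random() < p else 0 for p in freqs[k]]
        labels.append(k)
        data.append(x)
    return labels, data


# Hypothetical example: 2 clusters, 5 trials (loci) per sample.
weights = [0.5, 0.5]
freqs = [
    [0.9, 0.9, 0.1, 0.1, 0.5],
    [0.1, 0.1, 0.9, 0.9, 0.5],
]
labels, data = sample_bmm(1000, weights, freqs)
```

In this toy setup the last locus has the same frequency in both clusters and so carries no information about cluster membership, which hints at why the number of informative trials per sample matters alongside the dataset size.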


