A Random Finite Set Model for Data Clustering

03/14/2017
by Dinh Phung, et al.

The goal of data clustering is to partition data points into groups so as to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, that is, a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is typically not known in advance. This paper proposes a new class of models for data clustering that handles set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and the mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
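To make the idea of a set-valued datum concrete, the following minimal sketch scores a point pattern under a Poisson random finite set, assuming the common convention in which the set density is exp(-rate) times the product of rate * p(x) over the points. It is an illustration only, not the paper's model or inference procedure: the one-dimensional Gaussian feature density, the two fixed clusters, and all names (`poisson_rfs_loglik`, `CLUSTERS`, `best_cluster`) are assumptions introduced here, and the hard maximum-likelihood assignment stands in for the Dirichlet Process mixture and MCMC inference that the abstract describes.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian (assumed feature density for this sketch)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def poisson_rfs_loglik(points, rate, mean, std):
    """Log density of a point pattern under a Poisson RFS.

    Assumed convention: f(X) = exp(-rate) * prod_{x in X} rate * p(x),
    so the number of points (Poisson with mean `rate`) and their
    locations both contribute to the score.
    """
    n = len(points)
    return (-rate
            + n * math.log(rate)
            + sum(gaussian_logpdf(x, mean, std) for x in points))

# Hypothetical clusters: name -> (rate, mean, std). Not taken from the paper.
CLUSTERS = {
    "sparse": (2.0, 0.0, 1.0),   # few points per pattern, centred at 0
    "dense": (10.0, 5.0, 1.0),   # many points per pattern, centred at 5
}

def best_cluster(points):
    """Hard-assign a point pattern to the cluster with the highest log density."""
    return max(CLUSTERS, key=lambda k: poisson_rfs_loglik(points, *CLUSTERS[k]))

if __name__ == "__main__":
    print(best_cluster([0.1, -0.3]))
    print(best_cluster([4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.1, 4.9]))
```

Note that each datum here is a list of any length, including the empty list, so patterns with different cardinalities can be compared directly without padding or truncating them to fixed-size vectors.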

research
10/31/2016

Flexible Models for Microclustering with Application to Entity Resolution

Most generative models for clustering implicitly assume that the number ...
research
03/22/2021

Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics

Clustering methods group data points together and assign them group-leve...
research
07/31/2020

Bayesian Approaches for Flexible and Informative Clustering of Microbiome Data

We propose two unsupervised clustering methods that are designed for hum...
research
02/28/2019

Efficient Parameter-free Clustering Using First Neighbor Relations

We present a new clustering method in the form of a single clustering eq...
research
04/21/2016

Markov models for ocular fixation locations in the presence and absence of colour

We propose to model the fixation locations of the human eye when observi...
research
10/03/2017

Monte Carlo approximation certificates for k-means clustering

Efficient algorithms for k-means clustering frequently converge to subop...
research
04/06/2021

A New Parallel Adaptive Clustering and its Application to Streaming Data

This paper presents a parallel adaptive clustering (PAC) algorithm to au...
