Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering

by Amar Shah, et al.

Semi-supervised clustering is the task of grouping data points into clusters when only a fraction of the points are labelled. The true number of clusters in the data is often unknown, yet most models require it as an input. Dirichlet process mixture models are appealing because they can infer the number of clusters from the data. However, these models do not handle high dimensional data well and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel based method that clusters data points without the need to prespecify the number of clusters or to model the complicated densities from which data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points is. We explore some theoretical properties of the model and derive a natural Gibbs based algorithm with MCMC hyperparameter learning. The model is applied to a variety of synthetic and real world data sets.
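To make the key insight concrete, the following minimal sketch compares the log-determinant of a kernel submatrix for a tightly packed set of points with that of a widely spread set. It is an illustration only, not the paper's model or algorithm: the RBF kernel, the `lengthscale` and `jitter` values, and the helper names `rbf_kernel` and `subset_logdet` are all assumptions introduced here.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix (an assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale ** 2)


def subset_logdet(K, idx, jitter=1e-8):
    """Log-determinant of the submatrix of K indexed by idx.

    The jitter (an assumption for numerical stability) keeps the
    submatrix positive definite when points nearly coincide.
    """
    sub = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
    _, logdet = np.linalg.slogdet(sub)
    return logdet


rng = np.random.default_rng(0)

# Five points packed tightly around the origin.
tight = rng.normal(0.0, 0.1, size=(5, 2))
# Five points placed far apart from one another.
spread = np.array([[20.0, 20.0], [25.0, 20.0], [20.0, 25.0],
                   [25.0, 25.0], [30.0, 20.0]])

X = np.vstack([tight, spread])
K = rbf_kernel(X)

tight_logdet = subset_logdet(K, [0, 1, 2, 3, 4])
spread_logdet = subset_logdet(K, [5, 6, 7, 8, 9])

print("log det, tight set :", tight_logdet)
print("log det, spread set:", spread_logdet)
```

Under these assumptions the tight set yields a nearly singular submatrix and therefore a much lower log-determinant, while the spread set yields a submatrix close to the identity with a log-determinant near zero. Working with `slogdet` rather than the raw determinant avoids underflow when the submatrix is close to singular.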



