Semi-supervised model-based clustering with controlled clusters leakage

05/04/2017
by Marek Śmieja, et al.

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds high-quality clustering models, which can be applied to discovering meaningful groups in partially classified data.
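The abstract does not specify how the leakage level enters the model. The sketch below is a purely illustrative picture of the idea and is not the C3L algorithm from the paper. It treats leakage as a cap `eps` on the posterior responsibility that a labeled point may assign to Gaussian components belonging to other categories. It assumes that each category owns one or more components, so the natural subgroups of a category appear as several components, and that every category has at least one labeled point. It uses a plain EM loop in NumPy. The names `leakage_gmm`, `comp_class`, `eps` and `reg` are hypothetical.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def masked_softmax(logits, mask):
    """Row-wise softmax over entries where mask is True (each row needs one True)."""
    z = np.where(mask, logits, -np.inf)
    z = z - z.max(axis=1, keepdims=True)
    e = np.where(mask, np.exp(z), 0.0)
    return e / e.sum(axis=1, keepdims=True)


def leakage_gmm(X, y, comp_class, eps=0.05, n_iter=100, reg=1e-6, seed=0):
    """Hypothetical leakage-capped semi-supervised GMM (illustration only).

    X: (n, d) data; y: (n,) integer labels, -1 for unlabeled points;
    comp_class: (K,) category owning each Gaussian component;
    eps: assumed cap on the responsibility a labeled point gives to
    components of other categories.
    """
    X, y, comp_class = np.asarray(X, float), np.asarray(y), np.asarray(comp_class)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = len(comp_class)
    labeled = y >= 0
    # Start each component at a random labeled point of its own category.
    means = np.array([X[rng.choice(np.flatnonzero(y == c))] for c in comp_class])
    covs = np.array([np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)
    # own[i, j] is True when component j belongs to the label of point i.
    own = labeled[:, None] & (comp_class[None, :] == y[:, None])
    foreign = labeled[:, None] & ~own
    can_fix = labeled & own.any(axis=1) & foreign.any(axis=1)

    for _ in range(n_iter):
        # E-step: ordinary GMM posterior over all components.
        logp = np.column_stack(
            [np.log(weights[j]) + log_gauss(X, means[j], covs[j]) for j in range(K)]
        )
        resp = masked_softmax(logp, np.ones_like(logp, dtype=bool))
        # Leakage cap: if a labeled point sends more than eps of its mass to
        # foreign components, rescale so that exactly eps goes there.
        in_mass = (resp * own).sum(axis=1)
        fix = can_fix & (in_mass < 1 - eps)
        if fix.any():
            r_in = masked_softmax(logp[fix], own[fix])
            r_out = masked_softmax(logp[fix], foreign[fix])
            resp[fix] = (1 - eps) * r_in + eps * r_out
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(K):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs, resp
```

For example, `comp_class=[0, 0, 1]` lets category 0 split into two subgroups while category 1 stays a single Gaussian. Setting `eps=0` would force labeled points to stay entirely inside their own category's components, and larger values let the clustering drift further from the given labels. The cap here is applied per point inside the E-step. That is one simple design choice and not necessarily how the paper formulates the leakage constraint.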
