Semi-supervised cross-entropy clustering with information bottleneck constraint

05/03/2017
by Marek Śmieja, et al.

In this paper, we propose CEC-IB, a semi-supervised clustering method that models data with a set of Gaussian distributions and retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades off three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with the side information. Experiments demonstrate that CEC-IB performs comparably to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, is more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied to discover natural subgroups when the partition-level side information is derived from the top levels of a hierarchical clustering.
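The three-way trade-off described in the abstract can be illustrated with a small sketch. The abstract does not state the exact objective, so everything here is an assumption: `cec_cost` uses the usual cross-entropy clustering form (cluster weight times the cost of coding the cluster identity plus the entropy of a fitted Gaussian), `label_cond_entropy` stands in for the side-information term as the conditional entropy of the known labels given the cluster, and the weight `beta`, the `-1` marker for unlabeled points, and all function names are hypothetical.

```python
import numpy as np

def cec_cost(X, assign):
    """Assumed CEC-style cost: sum_k p_k * (-ln p_k + H(N(m_k, S_k)))."""
    n, d = X.shape
    total = 0.0
    for k in np.unique(assign):
        pts = X[assign == k]
        p = len(pts) / n
        # Maximum-likelihood covariance; needs more than d points per cluster
        cov = np.atleast_2d(np.cov(pts, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        # -ln p penalizes many small clusters (model simplicity);
        # the Gaussian entropy measures how tightly the cluster is modeled
        total += p * (-np.log(p) + 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * logdet)
    return total

def label_cond_entropy(assign, labels):
    """H(label | cluster) over labeled points; labels == -1 means unlabeled."""
    mask = labels >= 0
    a, z = assign[mask], labels[mask]
    m = len(z)
    h = 0.0
    for k in np.unique(a):
        zk = z[a == k]
        _, counts = np.unique(zk, return_counts=True)
        q = counts / counts.sum()
        h += (len(zk) / m) * -(q * np.log(q)).sum()
    return h

def penalized_cost(X, assign, labels, beta=1.0):
    """Hypothetical combined objective: model cost plus weighted label inconsistency."""
    return cec_cost(X, assign) + beta * label_cond_entropy(assign, labels)
```

Under these assumptions, a clustering that mixes differently labeled points inside one cluster pays a positive penalty, while one that keeps each cluster label-pure pays none; increasing `beta` shifts the balance toward agreement with the side information.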


