Semi-Supervised Clustering via Markov Chain Aggregation

12/17/2021
by Sophie Steger, et al.

We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities between corresponding data points, and by incorporating semi-supervision information as hard constraints in a Hartigan-style algorithm. The introduced Constrained Markov Clustering (CoMaC) is an extension of a recent information-theoretic framework for (unsupervised) Markov aggregation to the semi-supervised case. Instantiating CoMaC for certain parameter settings further generalizes two previous information-theoretic objectives for unsupervised clustering. Our results indicate that CoMaC is competitive with the state of the art. Furthermore, our approach is less sensitive to hyperparameter settings than its unsupervised counterpart, which is especially attractive in the semi-supervised setting, where labeled data is scarce.
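The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' CoMaC implementation, and the abstract does not state the CoMaC cost function. The sketch assumes a Gaussian similarity kernel, uses the mutual information between consecutive states of the aggregated chain as a stand-in objective, and treats labeled points as fixed assignments during Hartigan-style single-point moves. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def similarity_chain(X, sigma=1.0):
    """Build a Markov chain over data points from Gaussian similarities (assumed kernel)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))   # symmetric similarity matrix
    deg = W.sum(axis=1)
    P = W / deg[:, None]                   # row-stochastic transition matrix
    mu = deg / deg.sum()                   # stationary distribution (W is symmetric)
    return P, mu

def aggregated_mi(P, mu, labels, k):
    """Stand-in objective: mutual information (nats) of consecutive aggregated states."""
    Z = np.eye(k)[labels]                  # one-hot cluster membership, shape (n, k)
    Q = Z.T @ (mu[:, None] * P) @ Z        # joint distribution over cluster pairs
    outer = Q.sum(axis=1, keepdims=True) @ Q.sum(axis=0, keepdims=True)
    mask = Q > 0                           # skip zero entries (0 * log 0 := 0)
    return float((Q[mask] * np.log(Q[mask] / outer[mask])).sum())

def constrained_hartigan(P, mu, labels, fixed, k, max_sweeps=20):
    """Move one free point at a time to the cluster that most improves the objective.

    Points with fixed[i] == True keep their label (hard constraints).
    """
    labels = labels.copy()
    best = aggregated_mi(P, mu, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in np.flatnonzero(~fixed):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c
                score = aggregated_mi(P, mu, labels, k)
                if score > best + 1e-12:   # accept strict improvements only
                    best, current, moved = score, c, True
            labels[i] = current            # keep the best label found for point i
        if not moved:
            break
    return labels, best

# Toy data: two well-separated 3x3 grids, one labeled point in each.
grid = np.array([[x, y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)])
X = np.vstack([grid, grid + 10.0])
P, mu = similarity_chain(X, sigma=1.0)

seeds = np.array([0, 9])                   # indices of the labeled points
seed_labels = np.array([0, 1])
fixed = np.zeros(len(X), dtype=bool)
fixed[seeds] = True

# Initialize each point with the label of the seed it most likely transitions to.
init = seed_labels[np.argmax(P[:, seeds], axis=1)]
labels, score = constrained_hartigan(P, mu, init, fixed, k=2)
print(labels, score)
```

Each candidate move recomputes the full objective, which keeps the sketch short but costs roughly O(n²) per evaluation; a practical implementation would update the aggregated joint distribution incrementally.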

