Fast Randomized Semi-Supervised Clustering

05/20/2016
by Alaa Saade, et al.

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a simple model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from O(n) randomly chosen measurements, where n is the number of items in the dataset. Our algorithm is therefore efficient in both time and space complexity. We also numerically investigate the performance of the algorithm on synthetic and real-world data.
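The abstract names the ingredients (random pairwise comparisons, a few known labels, power iteration of the non-backtracking operator) without spelling out the update. Below is a minimal Python sketch under our own assumptions, not the authors' implementation. It uses a hypothetical toy model: two clusters, m random pairs, each reporting +1 (same cluster) or -1 (different) and flipped with probability `noise`. Each directed edge i -> j carries a message. Labeled items keep sending their known label, and an unlabeled item i sends j the signed sum of the messages it receives from every neighbour except j, which is the non-backtracking rule. The names `make_instance` and `nb_power_iteration`, the clamping of labeled messages, and the per-round rescaling are illustrative choices.

```python
import random


def make_instance(n, m, noise, n_labeled, seed=0):
    """Hypothetical toy model: two clusters (labels +1/-1) and m distinct random
    pairs; each pair reports +1 (same cluster) or -1 (different cluster),
    flipped with probability `noise`. Requires m <= n*(n-1)/2."""
    rng = random.Random(seed)
    truth = [rng.choice((-1, 1)) for _ in range(n)]
    edges = {}
    while len(edges) < m:
        i, j = rng.sample(range(n), 2)
        key = (min(i, j), max(i, j))
        if key in edges:
            continue  # measure each pair at most once
        weight = truth[i] * truth[j]
        if rng.random() < noise:
            weight = -weight  # noisy comparison
        edges[key] = weight
    # Reveal the true label of a few randomly chosen items.
    labeled = {i: truth[i] for i in rng.sample(range(n), n_labeled)}
    return truth, edges, labeled


def nb_power_iteration(n, edges, labeled, iters=30):
    """Propagate signed messages on directed edges with the non-backtracking
    rule and return a +1/-1 label for every item (assumed sketch)."""
    nbrs = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    # One message per directed edge i -> j; labeled items send their label.
    msg = {}
    for i, j in edges:
        msg[(i, j)] = float(labeled.get(i, 0))
        msg[(j, i)] = float(labeled.get(j, 0))

    for _ in range(iters):
        new = {}
        for i, j in msg:
            if i in labeled:
                new[(i, j)] = float(labeled[i])  # clamp known labels
            else:
                # Non-backtracking: sum over neighbours k of i, excluding j.
                new[(i, j)] = sum(w * msg[(k, i)] for k, w in nbrs[i] if k != j)
        # Rescale the free (unlabeled) messages so values stay bounded.
        free = [abs(v) for (src, _), v in new.items() if src not in labeled]
        scale = max(free, default=0.0) or 1.0
        msg = {e: (v if e[0] in labeled else v / scale) for e, v in new.items()}

    est = []
    for i in range(n):
        if i in labeled:
            est.append(labeled[i])
        else:
            total = sum(w * msg[(k, i)] for k, w in nbrs[i])
            est.append(1 if total >= 0 else -1)  # ties (e.g. isolated items) -> +1
    return est


if __name__ == "__main__":
    # m = 5n measurements, i.e. linear in the number of items.
    truth, edges, labeled = make_instance(n=500, m=2500, noise=0.1, n_labeled=25, seed=1)
    est = nb_power_iteration(500, edges, labeled)
    acc = sum(e == t for e, t in zip(est, truth)) / len(truth)
    print(f"accuracy: {acc:.3f}")
```

The number of comparisons grows linearly with n here, in the spirit of the O(n) regime the abstract mentions. Each round costs time proportional to the number of directed edges times the typical degree. Any accuracy printed by this toy is not a result from the paper.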

research
03/16/2020

A semi-supervised sparse K-Means algorithm

We consider the problem of data clustering with unidentified feature qua...
research
10/20/2015

Optimal Cluster Recovery in the Labeled Stochastic Block Model

We consider the problem of community detection or clustering in the labe...
research
07/02/2016

Rademacher Complexity Bounds for a Penalized Multiclass Semi-Supervised Algorithm

We propose Rademacher complexity bounds for multiclass classifiers train...
research
07/19/2012

Hierarchical Clustering using Randomly Selected Similarities

The problem of hierarchical clustering items from pairwise similarities ...
research
05/13/2019

Learning to Search Efficiently Using Comparisons

We consider the problem of searching in a set of items by using pairwise...
research
06/04/2021

Fuzzy Clustering with Similarity Queries

The fuzzy or soft k-means objective is a popular generalization of the w...
research
10/14/2019

Optimal Clustering from Noisy Binary Feedback

We study the problem of recovering clusters from binary user feedback. I...
