Fuzzy Clustering with Similarity Queries

by Wasim Huleihel, et al.

The fuzzy or soft k-means objective is a popular generalization of the well-known k-means problem, extending the clustering capability of k-means to datasets that are uncertain, vague, and otherwise hard to cluster. In this paper, we propose a semi-supervised active clustering framework, where the learner is allowed to interact with an oracle (domain expert), asking for the similarity between a certain set of chosen items. We study the query and computational complexities of clustering in this framework. We prove that a few such similarity queries enable a polynomial-time approximation algorithm for an otherwise conjecturally NP-hard problem. In particular, we provide probabilistic algorithms for fuzzy clustering in this setting that ask O(poly(k) log n) similarity queries and run in polynomial time, where n is the number of items and k is the number of clusters. The fuzzy k-means objective is nonconvex, with k-means as a special case, and is equivalent to other generic nonconvex problems such as non-negative matrix factorization. The ubiquitous Lloyd-type algorithms (or expectation-maximization algorithms) can get stuck at a local minimum. Our results show that by making a few similarity queries, the problem becomes easier to solve. Finally, we test our algorithms on real-world datasets, showing their effectiveness in real-world applications.
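The abstract does not spell out the objective, so the following is only a minimal sketch of the textbook fuzzy k-means (fuzzy c-means) formulation with the Lloyd-type alternating updates mentioned above. It is not the query-based algorithm of the paper. The fuzzifier `m`, the function names, and the toy data are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0):
    """Soft memberships that minimize the objective for fixed centers.

    Each returned row sums to 1. Assumes a fuzzifier m > 1.
    """
    rows = []
    for p in points:
        d = [sq_dist(p, c) for c in centers]
        if any(v == 0.0 for v in d):
            # The point sits exactly on one or more centers:
            # split its membership evenly among those centers.
            hits = [1.0 if v == 0.0 else 0.0 for v in d]
            total = sum(hits)
            rows.append([h / total for h in hits])
            continue
        inv = [v ** (-1.0 / (m - 1.0)) for v in d]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def update_centers(points, U, m=2.0):
    """Centers that minimize the objective for fixed memberships U."""
    k, dim = len(U[0]), len(points[0])
    centers = []
    for j in range(k):
        weights = [row[j] ** m for row in U]
        total = sum(weights)
        centers.append(
            [sum(w * p[t] for w, p in zip(weights, points)) / total
             for t in range(dim)]
        )
    return centers


def objective(points, centers, U, m=2.0):
    """Fuzzy k-means cost: sum over i, j of U[i][j]**m * ||x_i - c_j||^2."""
    return sum(
        U[i][j] ** m * sq_dist(points[i], centers[j])
        for i in range(len(points))
        for j in range(len(centers))
    )


def fuzzy_kmeans(points, centers, m=2.0, iters=50):
    """Alternate the two updates; this may stop at a local minimum."""
    for _ in range(iters):
        U = memberships(points, centers, m)
        centers = update_centers(points, U, m)
    return centers, memberships(points, centers, m)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    final_centers, U = fuzzy_kmeans(pts, [[0.0, 0.0], [10.0, 10.0]])
    print(final_centers)
    print(objective(pts, final_centers, U))
```

Each of the two update steps cannot increase the cost, which is why the iteration converges, but the result depends on the initial centers. That dependence is the local-minimum issue the abstract refers to.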







Clustering with Same-Cluster Queries

We propose a framework for Semi-Supervised Active Clustering framework (...

Approximate Correlation Clustering Using Same-Cluster Queries

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for c...

Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Several clustering frameworks with interactive (semi-supervised) queries...

Fuzzy Discriminant Clustering with Fuzzy Pairwise Constraints

In semi-supervised fuzzy clustering, this paper extends the traditional ...

Impact of Exponent Parameter Value for the Partition Matrix on the Performance of Fuzzy C Means Algorithm

Soft Clustering plays a very important role in clustering real world dat...

Fast Randomized Semi-Supervised Clustering

We consider the problem of clustering partially labeled data from a mini...

A Soft Recommender System for Social Networks

Recent social recommender systems benefit from friendship graph to make ...