Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle

06/18/2021
by Pan Peng, et al.

Motivated by applications in crowdsourced entity resolution in databases, signed edge prediction in social networks, and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with a faulty oracle. In this model, we are given a set of n items that belong to k unknown groups (or clusters), and our goal is to recover the clusters by asking pairwise queries to an oracle. The oracle answers queries of the form “do items u and v belong to the same cluster?”. However, the answer to each pairwise query is wrong with probability ε, for some ε∈(0,1/2). Mazumdar and Saha provided two algorithms under this model: one is query-optimal but time-inefficient (i.e., it runs in quasi-polynomial time), while the other is time-efficient (i.e., it runs in polynomial time) but query-suboptimal. Larsen, Mitzenmacher and Tsourakakis [WWW 2020] then gave a new time-efficient algorithm for the special case of 2 clusters, which is query-optimal if the bias δ:=1-2ε of the model is large. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of k clusters and for other regimes of δ. In this paper, we make progress on this question and provide a time-efficient algorithm with nearly optimal query complexity (up to a factor of O(log^2 n)) for any constant k and any δ in the regime where information-theoretic recovery is possible. Our algorithm is built on a connection to the stochastic block model.
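To make the model concrete, here is a minimal Python sketch of a faulty oracle. It is not the authors' algorithm, and the names `FaultyOracle` and `signed_sbm_means` are hypothetical. It assumes that asking the same pair twice returns the same answer, so the noise cannot be removed simply by re-querying a pair and taking a majority vote. Encoding "same" as +1 and "different" as -1, a same-cluster pair has expected answer (1-ε) - ε = 1-2ε = δ, and a cross-cluster pair has expected answer -δ. Equivalently, if only the "yes" answers are kept as edges, same-cluster pairs become edges with probability 1-ε and cross-cluster pairs with probability ε, which is an instance of a stochastic block model. The script checks the ±δ bias empirically.

```python
import random
from itertools import combinations


class FaultyOracle:
    """Sketch of the faulty-oracle model: each pairwise answer is wrong w.p. eps.

    Assumption (hypothetical choice for this sketch): repeated queries on the
    same pair return the same cached answer, so re-asking a pair gives no new
    information.
    """

    def __init__(self, labels, eps, seed=0):
        if not 0 < eps < 0.5:
            raise ValueError("eps must lie in (0, 1/2)")
        self.labels = labels   # hidden ground-truth cluster of each item
        self.eps = eps
        self.rng = random.Random(seed)
        self.cache = {}        # (u, v) with u < v -> +1 ("same") or -1 ("different")
        self.queries = 0       # number of distinct pairs queried so far

    def query(self, u, v):
        key = (min(u, v), max(u, v))
        if key not in self.cache:
            self.queries += 1
            truth = 1 if self.labels[u] == self.labels[v] else -1
            flipped = self.rng.random() < self.eps   # answer errs w.p. eps
            self.cache[key] = -truth if flipped else truth
        return self.cache[key]


def signed_sbm_means(oracle, n):
    """Query every pair once and average the +1/-1 answers.

    Uses the hidden labels only to *measure* the bias; a real clustering
    algorithm would not have access to them.
    """
    same, diff = [], []
    for u, v in combinations(range(n), 2):
        ans = oracle.query(u, v)
        (same if oracle.labels[u] == oracle.labels[v] else diff).append(ans)
    return sum(same) / len(same), sum(diff) / len(diff)


if __name__ == "__main__":
    n, k, eps = 200, 2, 0.2
    labels = [i % k for i in range(n)]
    oracle = FaultyOracle(labels, eps, seed=1)
    mean_same, mean_diff = signed_sbm_means(oracle, n)
    delta = 1 - 2 * eps
    print(f"delta={delta:.2f} mean_same={mean_same:.3f} mean_diff={mean_diff:.3f}")
    print("distinct queries:", oracle.queries)  # n*(n-1)/2 = 19900
```

The simulation queries all n(n-1)/2 pairs only to measure the bias; a query-efficient algorithm of the kind discussed above would aim to recover the clusters while querying far fewer pairs.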
