PCCC: The Pairwise-Confidence-Constraints-Clustering Algorithm

12/29/2022
by   Philipp Baumann, et al.

We consider a semi-supervised k-clustering problem in which information is available on whether pairs of objects belong to the same cluster or to different clusters. This information is available either with certainty or with a limited level of confidence. We introduce the PCCC algorithm, which iteratively assigns objects to clusters while accounting for this pairwise information. The algorithm can include relationships as hard constraints, which are guaranteed to be satisfied, or as soft constraints, which can be violated subject to a penalty. This flexibility distinguishes it from state-of-the-art approaches, which treat all pairwise constraints either as hard or as soft. Unlike existing algorithms, it scales to large instances with up to 60,000 objects, 100 clusters, and millions of cannot-link constraints (the most challenging constraints to incorporate). We compare the PCCC algorithm with state-of-the-art approaches in an extensive computational study. Although the PCCC algorithm is more general in its applicability, it outperforms these approaches on instances with all hard constraints or all soft constraints, both in running time and in various metrics of solution quality. The source code of the PCCC algorithm is publicly available on GitHub.
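To make the distinction between hard and soft pairwise constraints concrete, here is a minimal Python sketch of how a candidate clustering could be scored. It is not the PCCC algorithm or its published code. The names `PairConstraint`, `evaluate`, and `penalty`, the convention that `confidence=None` marks a hard constraint, and the choice of squared Euclidean distance plus a confidence-weighted penalty are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PairConstraint:
    """Hypothetical container for one pairwise relationship."""
    i: int                              # index of the first object
    j: int                              # index of the second object
    must_link: bool                     # True: same cluster, False: different clusters
    confidence: Optional[float] = None  # None marks a hard constraint (assumption)


def is_violated(c: PairConstraint, labels: Sequence[int]) -> bool:
    """A must-link is violated if the labels differ, a cannot-link if they match."""
    same_cluster = labels[c.i] == labels[c.j]
    return same_cluster != c.must_link


def evaluate(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
    constraints: Sequence[PairConstraint],
    penalty: float = 1.0,
) -> Tuple[bool, float]:
    """Return (feasible, cost) for a candidate assignment.

    feasible is False as soon as any hard constraint is violated.
    cost is the sum of squared distances to the assigned centers plus
    penalty * confidence for every violated soft constraint.
    """
    distance_cost = sum(
        sum((p - q) ** 2 for p, q in zip(points[k], centers[labels[k]]))
        for k in range(len(points))
    )
    feasible = True
    soft_cost = 0.0
    for c in constraints:
        if not is_violated(c, labels):
            continue
        if c.confidence is None:
            feasible = False                      # hard constraint broken
        else:
            soft_cost += penalty * c.confidence   # soft constraint: pay a penalty
    return feasible, distance_cost + soft_cost
```

In this sketch, an assignment step would discard any candidate for which `feasible` is False and compare the remaining candidates by `cost`, so that a soft constraint stated with higher confidence is more expensive to violate.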

research
03/23/2022

Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Clustering under pairwise constraints is an important knowledge discover...
research
01/30/2018

COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

Clustering is inherently ill-posed: there often exist multiple valid clu...
research
09/23/2016

Constraint-Based Clustering Selection

Semi-supervised clustering methods incorporate a limited amount of super...
research
02/07/2014

Active Clustering with Model-Based Uncertainty Reduction

Semi-supervised clustering seeks to augment traditional clustering metho...
research
03/30/2018

A Rule for Committee Selection with Soft Diversity Constraints

Committee selection with diversity or distributional constraints is a ub...
research
09/23/2019

Inducing Hypernym Relationships Based On Order Theory

This paper introduces Strict Partial Order Networks (SPON), a novel neur...
research
05/06/2018

Clustering With Pairwise Relationships: A Generative Approach

Semi-supervised learning (SSL) has become important in current data anal...