Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions

Clustering is a well-known unsupervised machine learning approach that automatically groups instances with similar characteristics. Constrained clustering is a semi-supervised extension of this process that can be used when expert knowledge is available in the form of constraints. Well-known examples of such constraints are must-link (two instances belong to the same group) and cannot-link (two instances do not belong to the same group). The research area of constrained clustering has grown significantly over the years, with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available that makes the wide variety of methods, constraints and benchmarks easy to understand. To remedy this, this study presents the background of constrained clustering in detail and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on instance-level pairwise constraints and gives an overview of their applications and historical context. It then presents a statistical analysis covering 307 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based on this analysis, potential pitfalls and future research directions are provided.
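
To make the two pairwise constraint types concrete, the following minimal Python sketch checks a given cluster assignment against must-link and cannot-link pairs. It is an illustration only and is not taken from the study: the function name `violated_constraints`, the index-pair representation of constraints, and the toy labels are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a constraint is a pair of instance indices.
Pair = Tuple[int, int]


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs that a clustering breaks."""
    # A must-link pair is broken when the two instances get different labels.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is broken when the two instances share a label.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy assignment of five instances to three clusters (illustrative data).
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]
cannot_link = [(0, 2), (2, 3)]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print("broken must-link:", broken_ml)
print("broken cannot-link:", broken_cl)
```

A check of this kind only reports which constraints a finished clustering breaks. Constrained clustering methods differ in how they use the constraints during clustering itself, for example by enforcing them strictly or by penalizing violations.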

research
11/30/2021

An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering

The minimum sum-of-squares clustering (MSSC), or k-means type clustering...
research
07/01/2013

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous s...
research
09/23/2016

Constraint-Based Clustering Selection

Semi-supervised clustering methods incorporate a limited amount of super...
research
02/25/2023

Semi-supervised Clustering with Two Types of Background Knowledge: Fusing Pairwise Constraints and Monotonicity Constraints

This study addresses the problem of performing clustering in the presenc...
research
03/02/2021

Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints

Metric clustering is fundamental in areas ranging from Combinatorial Opt...
research
03/29/2018

COBRAS: Fast, Iterative, Active Clustering with Pairwise Constraints

Constraint-based clustering algorithms exploit background knowledge to c...
research
10/13/2021

Expert-driven Trace Clustering with Instance-level Constraints

Within the field of process mining, several different trace clustering a...
