Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints

03/02/2021
by Brian Brubach, et al.

Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of stochastic pairwise constraints, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model a range of applications, including Individual Fairness in clustering and Must-link constraints in semi-supervised learning, among others. Our main result is a general framework that yields approximation algorithms with provable guarantees for important clustering objectives while producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.

research
02/06/2020

Fair Correlation Clustering

In this paper, we study correlation clustering under fairness constraint...
research
09/23/2016

Constraint-Based Clustering Selection

Semi-supervised clustering methods incorporate a limited amount of super...
research
07/14/2020

A Pairwise Fair and Community-preserving Approach to k-Center Clustering

Clustering is a foundational problem in machine learning with numerous a...
research
07/16/2020

Maximizing coverage while ensuring fairness: a tale of conflicting objectives

Ensuring fairness in computational problems has emerged as a key topic d...
research
06/14/2021

Allocating Stimulus Checks in Times of Crisis

We study the problem of allocating bailouts (stimulus, subsidy allocatio...
research
02/28/2023

Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions

Clustering is a well-known unsupervised machine learning approach capabl...
research
09/22/2011

Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications

This paper presents a novel pairwise constraint propagation approach by ...
