Clustering without Over-Representation

05/29/2019
by Sara Ahmadian, et al.
Google

In this paper we consider clustering problems in which each point is endowed with a color. The goal is to cluster the points so as to minimize the classical clustering cost, subject to the additional constraint that no color is over-represented in any cluster. This problem is motivated by practical clustering settings: for example, when clustering news articles where the color of an article is its source, it is preferable that no single news source dominates any cluster. For the most general version of this problem, we obtain an algorithm with provable performance guarantees; the algorithm finds a fractional solution using a linear program and then rounds it. For the special case of the problem where no color has an absolute majority in any cluster, we obtain a simpler combinatorial algorithm, also with provable guarantees. Experiments on real-world data show that our algorithms are effective in finding good clusterings without over-representation.
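To make the constraint concrete, here is a minimal Python sketch of a check for over-representation in a given clustering. It is an illustration only, not the paper's algorithm: the function name `is_over_represented` and the cap parameter `alpha` (the largest fraction of a cluster that a single color may occupy) are assumptions introduced here. Under this reading, the "no absolute majority" special case would correspond to `alpha = 0.5`.

```python
from collections import Counter


def is_over_represented(labels, colors, alpha):
    """Return True if some color makes up more than an `alpha` fraction
    of some cluster.

    labels: cluster id of each point
    colors: color of each point (same length as labels)
    alpha:  hypothetical cap on the per-cluster fraction of any one color
    """
    if len(labels) != len(colors):
        raise ValueError("labels and colors must have the same length")

    # Number of points in each cluster.
    cluster_sizes = Counter(labels)
    # Number of points of each color within each cluster.
    pair_counts = Counter(zip(labels, colors))

    return any(
        count > alpha * cluster_sizes[cluster]
        for (cluster, _color), count in pair_counts.items()
    )


# Cluster 0 holds two "a" points out of three, so "a" has an absolute majority.
labels = [0, 0, 0, 1, 1, 1, 1]
colors = ["a", "a", "b", "a", "b", "c", "c"]
print(is_over_represented(labels, colors, alpha=0.5))
```

A check like this only validates a clustering after the fact; finding a low-cost clustering that passes it is the optimization problem the paper addresses.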

06/14/2021

Fair Clustering Under a Bounded Cost

Clustering is a fundamental unsupervised learning problem where a datase...
10/26/2020

KFC: A Scalable Approximation Algorithm for k-center Fair Clustering

In this paper, we study the problem of fair clustering on the k-center o...
02/15/2018

Fair Clustering Through Fairlets

We study the question of fair clustering under the disparate impact doc...
01/26/2023

Re-embedding data to strengthen recovery guarantees of clustering

We propose a clustering method that involves chaining four known techniq...
07/07/2022

Individual Preference Stability for Clustering

In this paper, we propose a natural notion of individual preference (IP)...
09/20/2022

Explainable Clustering via Exemplars: Complexity and Efficient Approximation Algorithms

Explainable AI (XAI) is an important developing area but remains relativ...
02/27/2019

Reconciliation k-median: Clustering with Non-Polarized Representatives

We propose a new variant of the k-median problem, where the objective fu...
