Fair Clustering Using Antidote Data

06/01/2021
by Anshuman Chhabra, et al.

Clustering algorithms are widely used in modern data science applications, which motivates the need to make their outputs fair. Traditionally, a new fair variant of a clustering algorithm is developed for a specific notion of fairness. However, different application contexts may call for different definitions of fairness. As a result, a new algorithm and analysis must be proposed for each combination of clustering algorithm and fairness definition, and each new algorithm must then be reimplemented for deployment in a real-world system. We therefore propose an alternative approach to fairness in clustering: we augment the original dataset with a small number of data points, called antidote data. When the chosen clustering algorithm is run on this augmented dataset, its output is fair under the chosen fairness definition. We formulate this as a general bi-level optimization problem that can accommodate any center-based clustering algorithm and fairness notion. We then categorize approaches for solving this bi-level optimization in different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets by adding only a very small percentage of antidote data. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.
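To make the bi-level structure concrete, here is a minimal sketch, not the authors' method. The inner problem is plain k-means (Lloyd's algorithm) run on the original data plus candidate antidote points. The outer problem is a naive random search over antidote sets, scored by a fairness cost measured on the original points only. The fairness cost (one minus the worst per-cluster balance between two groups), the random-search solver, the toy data, and all function names are assumptions chosen for illustration.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    # Index of the center closest to point p.
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def kmeans(points, k, iters=50, seed=0):
    # Inner problem: Lloyd's algorithm with a seeded random initialization.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        updated = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def fairness_cost(centers, points, groups):
    # Assumed fairness notion: 1 - (worst per-cluster balance) for two groups
    # labeled 0 and 1. Lower is fairer; the value lies in [0, 1].
    counts = [[0, 0] for _ in centers]
    for p, g in zip(points, groups):
        counts[nearest(p, centers)][g] += 1
    balances = [min(a, b) / max(a, b) for a, b in counts if a + b > 0]
    return 1.0 - min(balances)

def find_antidote(points, groups, k, budget, trials=200, seed=0):
    # Outer problem (naive random search, an assumption): propose `budget`
    # antidote points inside the bounding box of the data, cluster the
    # augmented set, and score fairness on the ORIGINAL points only.
    rng = random.Random(seed)
    lo = [min(xs) for xs in zip(*points)]
    hi = [max(xs) for xs in zip(*points)]
    best_v = []
    best_cost = fairness_cost(kmeans(points, k), points, groups)
    for _ in range(trials):
        v = [tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
             for _ in range(budget)]
        cost = fairness_cost(kmeans(points + v, k), points, groups)
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v, best_cost

# Toy data (hypothetical): two blobs, each dominated by one group.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]
points += [(data_rng.gauss(4, 1), data_rng.gauss(0, 1)) for _ in range(20)]
groups = [0] * 15 + [1] * 5 + [1] * 15 + [0] * 5

base_cost = fairness_cost(kmeans(points, 2), points, groups)
antidote, new_cost = find_antidote(points, groups, k=2, budget=2)
print("fairness cost without antidote data:", base_cost)
print("fairness cost with antidote data:   ", new_cost)
```

Because the search starts from the empty antidote set, the returned cost is never worse than the baseline. Whether it improves depends on the data, the budget, and the number of trials. The clustering algorithm and the fairness cost are passed through as ordinary functions, so either can be swapped without touching the outer loop, which mirrors the generality the abstract describes.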


