Exploring Rawlsian Fairness for K-Means Clustering

05/04/2022
by Stanley Simoes, et al.

We conduct an exploratory study on incorporating John Rawls' ideas on fairness into existing unsupervised machine learning algorithms. Our focus is on the task of clustering, specifically the k-means clustering algorithm. To the best of our knowledge, this is the first work that uses Rawlsian ideas in clustering. To this end, we attempt to develop a postprocessing technique, i.e., one that operates on the cluster assignment generated by the standard k-means clustering algorithm. Our technique perturbs this assignment over a number of iterations to make it fairer according to Rawls' difference principle while minimally affecting the overall utility. As a first step, we consider two simple perturbation operators, 𝐑_1 and 𝐑_2, that reassign examples in a given cluster assignment to new clusters: 𝐑_1 reassigns a single example to a new cluster, and 𝐑_2 reassigns a pair of examples to new clusters. Our experiments on a sample of the Adult dataset demonstrate that both operators make meaningful perturbations in the cluster assignment towards incorporating Rawls' difference principle, with 𝐑_2 being more efficient than 𝐑_1 in terms of the number of iterations. However, we observe that there is still a need to design operators that make significantly better perturbations. Nevertheless, both operators provide good baselines for designing and comparing any future operator, and we hope our findings will aid future work in this direction.
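To make the idea of a single-example perturbation concrete, the following is a minimal sketch of an 𝐑_1-style postprocessing loop. It is not the authors' implementation: the abstract does not say how fairness or utility is measured, so this sketch assumes that an example's cost is its squared distance to its cluster centroid, that the "worst-off" example is the one with the largest cost, and that overall utility is the total cost (SSE). The names `centroids`, `costs`, `r1_postprocess`, and `sse_tolerance`, as well as the random proposal and acceptance rule, are hypothetical.

```python
import random


def centroids(points, labels, k):
    """Mean of the points in each cluster (None for an empty cluster)."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dims):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def costs(points, labels, k):
    """Assumed per-example cost: squared distance to the own centroid."""
    cents = centroids(points, labels, k)
    return [sum((x - m) ** 2 for x, m in zip(p, cents[c]))
            for p, c in zip(points, labels)]


def r1_postprocess(points, labels, k, iters=500, sse_tolerance=0.10, seed=0):
    """Repeatedly propose moving ONE example to a different cluster.

    A move is kept only if it lowers the worst-off example's cost
    (assumed reading of the difference principle) and keeps the total
    cost within (1 + sse_tolerance) of the starting total. Requires k >= 2.
    """
    rng = random.Random(seed)
    labels = list(labels)
    start = costs(points, labels, k)
    budget = (1.0 + sse_tolerance) * sum(start)
    worst = max(start)
    for _ in range(iters):
        i = rng.randrange(len(points))
        old = labels[i]
        if labels.count(old) == 1:
            continue  # never empty a cluster
        labels[i] = rng.choice([c for c in range(k) if c != old])
        trial = costs(points, labels, k)
        if max(trial) < worst and sum(trial) <= budget:
            worst = max(trial)  # accept the perturbation
        else:
            labels[i] = old  # reject and restore the old assignment
    return labels
```

Under the same assumptions, an 𝐑_2-style variant would propose reassigning two examples per iteration before applying the acceptance test. The starting `labels` would typically come from a standard k-means run.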


