How to Solve Fair k-Center in Massive Data Models

02/18/2020
by Ashish Chiplunkar, et al.

Fueled by massive data, important decision making is increasingly automated by algorithms, so algorithmic fairness has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair k-center problem, which models fair data summarization. The streaming and distributed models of computation are attractive because they can handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm, which has a provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching that of the best known algorithm (which is not a streaming algorithm). Our algorithms are easy to implement in practice, run in linear time, use very small working memory and communication, and outperform existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds even for the special case of k-center.
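To make the objective concrete, here is a minimal Python sketch of what a fair k-center solution is scored on. It is not the streaming or distributed algorithm of the paper. It assumes the common formulation in which each group has a cap on how many centers it may contribute, and the names `kcenter_cost`, `is_fair`, and `farthest_first` are hypothetical helpers introduced only for illustration. The last function is the classical farthest-first greedy heuristic for plain (unconstrained) k-center, included as a baseline.

```python
import math
from collections import Counter


def kcenter_cost(points, centers):
    """Largest distance from any point to its nearest chosen center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def is_fair(center_groups, caps):
    """True if no group contributes more centers than its cap allows.

    Assumption: fairness is a per-group upper bound on selected centers.
    """
    counts = Counter(center_groups)
    return all(counts[g] <= caps.get(g, 0) for g in counts)


def farthest_first(points, k):
    """Greedy baseline for plain k-center; ignores groups entirely."""
    chosen = [0]  # start from the first point (an arbitrary choice)
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Pick the point currently farthest from all chosen centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


# Toy data: two well-separated pairs, one pair per group.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = ["a", "a", "b", "b"]
caps = {"a": 1, "b": 1}

idx = farthest_first(points, 2)
centers = [points[i] for i in idx]
print(idx, kcenter_cost(points, centers), is_fair([groups[i] for i in idx], caps))
```

On this toy input the unconstrained baseline happens to pick one center per group, so it also satisfies the caps. In general it does not, which is the gap a fair k-center algorithm has to close.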


