1 Introduction
The growing use of automated decision making has sparked a debate about bias and what it means to be fair in this setting. As a result, an extensive literature exists on algorithmic fairness, and in particular on how to define fairness for problems in supervised learning (Dwork12Fairness; Romei13Multidisciplinary; Feldman2015DisparateImpact; hardt2016equality; arvindtutorial; mitchell2018predictionbased). However, these notions are not readily applicable to unsupervised learning problems such as clustering. The reason is that, unlike in the supervised setting, a well-defined notion of ground truth does not exist in such problems. In 2017, Chierichetti et al. (chierichetti2017fair) proposed the idea of balance as a notion of fairness in clustering. Given a set of data points with a type assigned to each one, balance asks for a clustering where each cluster has roughly the same proportion of types as the overall population. This definition spawned a flurry of research on efficient algorithms for fair clustering (chierichetti2017fair; kleindessner2019fair; kleindessner2019guarantees; chen2019proportionally; schmidt2018fair; ahmadian2019clustering; rosner2018privacy). Further work by other researchers has extended this definition, but with the same basic principle of proportionality (abraham2019fairness; backurs2019scalable; bercea2018cost; huang2019coresets; bera2019fair; wang2019towards; zikoclustering).
There are two sources of concern with balance as a normative principle. The idea that enforcing proportionate clusters leads to fairness would make sense if the objective were to pick one cluster as representative of the entire set. However, this is not a typical goal in clustering. The objective in clustering problems is often to group similar data points together, where each cluster center is representative of its cluster. This means that, unlike in supervised learning, the labels assigned by a clustering algorithm do not always carry an inherent meaning like being accepted to college or defaulting on a loan. So the representativeness of a particular cluster may not always be meaningful, or worse, may incorrectly “represent” the set of points.
One example of such a concern is the problem of redistricting: a partitioning of a region into voting districts that achieves “balance” between voters from different political parties will result in every district having a majority of voters from the same political party, which is in fact a technique in gerrymandering called cracking. Secondly, if we borrow the notion of disparate impact, we would want “protected” classes to have approximately equal representation in the decision space compared to the majority group. However, enforcing balance does not necessarily guarantee such a requirement. To illustrate why, see the example presented in Figure 1, which shows a balance-preserving k-means clustering on the left for two groups denoted by the colors red and blue, and a regular k-means clustering on the right. Here, the number of red points is larger than that of the blue group. Therefore, each cluster center is chosen close to the centroid of its cluster's red points. As a result, red points are better represented by the chosen centers than blue points.
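To make the two notions in this discussion concrete, the following sketch (Python, with hypothetical one-dimensional data; all names are ours) computes the balance of a clustering alongside each group's average distance to its nearest center. In the toy instance, every cluster mirrors the overall 3:1 red-to-blue ratio, yet the blue points are represented noticeably worse, echoing the phenomenon of Figure 1.

```python
def balance(assignment, colors):
    """Balance (Chierichetti et al.): minimum over clusters of the
    minority-to-majority color ratio inside the cluster."""
    clusters = {}
    for cl, col in zip(assignment, colors):
        clusters.setdefault(cl, []).append(col)
    ratios = []
    for members in clusters.values():
        r, b = members.count("red"), members.count("blue")
        if r == 0 or b == 0:
            return 0.0
        ratios.append(min(r / b, b / r))
    return min(ratios)

def group_avg_cost(points, colors, centers, group):
    """Average distance from members of `group` to their nearest center."""
    member_pts = [p for p, c in zip(points, colors) if c == group]
    return sum(min(abs(p - ctr) for ctr in centers) for p in member_pts) / len(member_pts)

# Hypothetical 1-D data: six red points, two blue points, two centers.
points = [0, 1, 2, 3, 10, 11, 12, 13]
colors = ["red", "red", "red", "blue", "red", "red", "red", "blue"]
centers = [1.5, 11.5]
assignment = [min(range(len(centers)), key=lambda i: abs(p - centers[i]))
              for p in points]

b = balance(assignment, colors)                        # 1/3: matches the population ratio
red_cost = group_avg_cost(points, colors, centers, "red")
blue_cost = group_avg_cost(points, colors, centers, "blue")  # worse than red_cost
```

The clustering is perfectly balanced (each cluster is 3 red to 1 blue, like the population), yet the blue group's average cost is almost twice the red group's.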
1.1 Our Contributions
In this paper we propose a notion of fairness in clustering based on the idea of minimizing gaps in representativeness between groups. We present a number of different ways of measuring representativeness and, interestingly, show that they naturally parallel standard notions of fairness in the supervised learning literature. We establish some basic properties of these measures, as well as show their incompatibility with each other. We also present bicriteria approximation algorithms for computing k-medians under these different notions of fairness, and support this with an experimental study that illustrates both the effectiveness of these measures and their incompatibility with notions of balance.
2 Related Work
Chierichetti et al. (chierichetti2017fair) introduced balance as a fairness constraint in clustering for two groups. In the same setting, with a binary group attribute, Backurs et al. (backurs2019scalable) improved the running time of their algorithm for fair k-median. Rösner and Schmidt (rosner2018privacy) proposed a constant-factor approximation algorithm for the fair k-center problem with multiple protected classes. Bercea et al. (bercea2018cost) proposed bicriteria constant-factor approximations for several classical clustering objectives, improving the results of Rösner and Schmidt. Bera et al. (Bera19) generalized previous works by allowing maximum over- and minimum under-representation of groups in clusters, as well as multiple, non-disjoint sensitive types, in their framework. Other works have studied the multiple-types setting (wang2019towards), multiple non-disjoint types (huang2019coresets), and cluster-dependent proportions (zikoclustering).
In a different line of work, Ahmadian et al. (ahmadian2019clustering) studied a fair k-center problem in which there is an upper bound on the maximum fraction of a single type within each cluster. Chen et al. (chen2019proportionally) studied a variant of the fair clustering problem in which any group of points that is large enough relative to the number of clusters is entitled to its own cluster center, if that center is closer in distance to all of them.
A large body of work in the area of algorithmic fairness has focused on ensuring fair representation of all social groups in the machine learning pipeline (bolukbasi2016man; samadi_price_2018; abbasi2019fairness). Recent work by Mahabadi et al. (mahabadi2020individual) studies the problem of individually fair clustering, under the fairness constraint proposed by Jung et al. (jung2019center). In their framework, if r(v) denotes the minimum radius such that the ball of radius r(v) centered at a point v contains at least n/k points, then at least one center should be opened within distance r(v) of v.
3 Fair Clustering
In this paper we will consider clustering objectives that satisfy the Voronoi property: the optimal assignment for a point is the cluster center nearest to it. This includes the usual clustering formulations like k-center, k-means and k-median. Thus, we can represent a clustering as its set of cluster centers C. The cost of a clustering C of a set of points X is given by the function cost(X, C). For any subset of points Y ⊆ X, we denote the cost of assigning Y to the cluster centers of a given clustering C by cost(Y, C). Finally, given a cost function cost and a set of points Y, we denote the set of centers in an optimal clustering of Y by OPT_cost(Y). When the context is clear, we will drop the subscript and merely write this as OPT(Y).
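As a minimal illustration of this setup, the cost function for objectives with the Voronoi property can be sketched as follows (Python; the distance is passed in as a parameter, so the same code covers the k-median cost and, with squared distances, the k-means cost):

```python
def clustering_cost(points, centers, dist):
    """Voronoi property: each point is charged its distance to the
    nearest center, so a clustering is fully described by its centers."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def subset_cost(subset, centers, dist):
    """cost(Y, C): cost of assigning only the points of Y to centers C."""
    return clustering_cost(subset, centers, dist)

d_med = lambda a, b: abs(a - b)        # k-median distance on the line
d_means = lambda a, b: (a - b) ** 2    # k-means uses squared distances

total = clustering_cost([0, 1, 10], [0, 10], d_med)  # 0 + 1 + 0 = 1
```

The optimal centers OPT(Y) of a subset can then be found by minimizing this function over candidate center sets, as done in later sketches.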
Our notions of fairness in clustering are rooted in the idea of equitable representation. To that end, we introduce different ways to measure the cost of group representation. We can then define a fair clustering.
Definition 1.
(Fair Clustering). Given a set of data points X partitioned into groups X_1, …, X_m, a fair clustering minimizes the maximum average (representation) cost across all groups:

argmin_{C ∈ 𝒞} max_{l ∈ [m]} cost(X_l, C) / |X_l|,

where 𝒞 is the set of all possible clusterings.
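As a sanity check of this definition, one can brute-force the min-max objective on tiny instances, restricting candidate centers to the input points (a common simplification; the definition itself ranges over all clusterings). A sketch, with illustrative names:

```python
from itertools import combinations

def avg_group_cost(group, centers, dist):
    return sum(min(dist(p, c) for c in centers) for p in group) / len(group)

def fair_clustering_bruteforce(points, groups, k, dist):
    """Exhaustively minimize the maximum average representation cost
    across groups, with candidate centers drawn from the input points."""
    best, best_val = None, float("inf")
    for centers in combinations(points, k):
        val = max(avg_group_cost(g, centers, dist) for g in groups)
        if val < best_val:
            best, best_val = centers, val
    return best, best_val

groups = [[0, 1, 2], [10]]
points = [p for g in groups for p in g]
centers, val = fair_clustering_bruteforce(points, groups, 2,
                                          lambda a, b: abs(a - b))
# One center must serve the singleton group {10}; the other sits at 1,
# the best single center for {0, 1, 2}, giving a min-max cost of 2/3.
```

The exponential search is only for illustration; the rest of the paper develops LP-based and local-search algorithms for this objective.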
3.1 Quality of group representation
We now introduce different ways to measure group representation cost.
3.1.1 Absolute Representation Error
In supervised learning, statistical parity captures the idea that groups should have similar outcomes. Rephrasing, it says that groups should be represented equally well in the output. In the case of binary classification, statistical parity requires

P(Ŷ = 1 | A = a) = P(Ŷ = 1 | A = b)

for two groups a and b, where A denotes the sensitive attribute. A clustering adaptation of statistical parity would require that cluster centers represent all groups equally well, regardless of their potentially different distributions. More specifically, the average distance between members of a group and their respective cluster centers should look the same across groups. Motivated by this, we introduce the following definition of representation cost.
Definition 2 (AbsError).
The absolute (representation) error of a clustering is given by

AbsError(Y, C) = Σ_{y ∈ Y} d(y, C),

where Y is a set of points, C is a set of centers, and d(y, C) is an arbitrary distance function between y and the nearest center to it in C.
An AbsError-fair clustering is a fair clustering that uses AbsError to measure group representation cost in Definition 1.
3.1.2 Relative Representation Error
AbsError does not take different group distributions into account. To see how that might be problematic, let us consider minimizing the maximum value of AbsError for the two groups and in Figure 2, using three clusters. Assume and . The points in group could be grouped in two clusters and with close to zero cost, as shown in the figure. The points in group lie on three line segments , and , where points are placed on and points are placed on each of the other two, and . We should note that the line segments are all of size and the points are distributed uniformly on each one. Since we assumed the size of group is much larger than the size of group , an optimal clustering for would have , and as its centers. Without loss of generality, if we assume all the points on and are closer to than and , then the total cost of clustering with for group is . If for we have , then the optimal clustering for group would have , and as its centers and . In addition, it is easy to see also minimizes the maximum average AbsError for both groups, where this value for each group is . In such a setting, for a small enough value of , we see that in an AbsError-fair clustering, the total cost for group has increased substantially compared to an unconstrained clustering, while group has not gained a noticeable benefit.
In the example above, AbsError-fair clustering fails to achieve a fair and at the same time acceptable clustering, because it ignores the fact that the two groups have drastically different distributions. Attention to this form of “base rates” is the motivation behind the introduction of fairness measures like equality of opportunity, which balance error rates rather than outcomes (hardt2016equality).
A clustering adaptation of a notion like equality of opportunity would require two steps: first, comparing the average distance between members of a social group and their respective cluster centers to the corresponding “optimal” value for that group; second, ensuring that the difference between these two values for the minority group is roughly equal to the corresponding difference for the majority group.
This relative measure of representation error motivates the following definition.
Definition 3 (RelError).
The relative (representation) error of a clustering is given by

RelError(Y, C) = AbsError(Y, C) / AbsError(Y, OPT(Y)),

where Y is a set of points, C is a set of centers, OPT(Y) is the set of centers in an optimal clustering of Y alone, and d(y, C) is an arbitrary distance function between y and the nearest center to it in C.
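A minimal sketch of this idea (Python; the brute-force group optimum, the restriction of centers to group points, and all names are our own simplifications): the group's cost under the shared centers is normalized by the best cost the group could achieve on its own.

```python
from itertools import combinations

def total_cost(points, centers, dist):
    return sum(min(dist(p, c) for c in centers) for p in points)

def group_optimum(group, k, dist):
    """Brute-force optimal cost of clustering the group by itself,
    with centers restricted to the group's own points."""
    return min(total_cost(group, c, dist) for c in combinations(group, k))

def rel_error(group, centers, k, dist):
    # Note: undefined when the group optimum is zero; Section 4's
    # tightness example is adjusted for exactly this reason.
    return total_cost(group, centers, dist) / group_optimum(group, k, dist)

d = lambda a, b: abs(a - b)
r = rel_error([0, 2, 10, 12], (0, 5), 2, d)  # 14 / 4 = 3.5
```

A RelError of 1 means the shared centers serve the group as well as its own optimal clustering would; larger values quantify the representation gap relative to the group's "base rate".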
Alternatively, one can capture the relative error via a difference instead of a division.
This is similar to the formulation used by Samadi et al. (samadi_price_2018) in their work on fair PCA. For technical reasons relating to the difficulty of optimizing differences, we will not discuss this variant further here.
Equality of cost in fair clustering
Our definition of fair clustering asks to minimize the maximum representation cost over groups. Another way to think of fair clustering with respect to group representations is to enforce equality of representation costs across groups. Though it may not seem obvious at first glance, this approach to fairness is related to Definition 1. To connect the two definitions, we present an argument similar to that of Samadi et al. (samadi_price_2018). In Observation 5, we describe how and under what conditions minimizing the maximum cost across groups leads to equal costs for them.
Definition 4.
(Homogeneous group) Given a set of data points X and an arbitrary subset Y ⊆ X, we call Y homogeneous with respect to X and a given clustering cost function if there is at least one clustering whose average cost on Y is smaller than or equal to X's optimal average cost. Formally, we call Y homogeneous if cost(Y, OPT(Y)) / |Y| ≤ cost(X, OPT(X)) / |X|.
Observation 5.
Assume we are given a clustering algorithm with a continuous and convex cost function (e.g., soft clustering with k-means) and a set of points that can be partitioned into two homogeneous groups. Minimizing the maximum average cost over the two groups is equivalent to equalizing their average costs.
Proof.
Let denote the clustering returned after minimizing the maximum cost over two groups and . If , we’re done. So without loss of generality, let’s assume . In this case, (which is the global minimum for the function ) is a local minimum for group ’s cost function. Otherwise, since the cost function is continuous, there should be another clustering, , where:
which means min max procedure should have returned instead of . A convex function has only one local minimum which is also a global minimum. Therefore, since we assumed the given cost function is convex, is a global minimum for group ’s cost function. On the other hand, clustering returns a global minimum for group ’s cost function. Therefore, because the two groups are homogeneous:
(1) 
Inequality 1 tells us that overall average cost given clustering , is smaller than the average cost for in clustering which contradicts the optimality of the clustering .
Since the two groups are homogeneous, continuity of the cost function guarantees that there is at least one clustering where the average cost for the two groups is equal. Therefore, minimizing the maximum average cost over the two groups, would return such a clustering with the smallest value possible. ∎
4 Algorithms for fair clustering
We now present algorithms for fair clustering under these measures of fairness. We start with an observation about the difference between optimizing in a “group-blind” manner and explicit optimization for group representations. Such observations are generally referred to as the “price” of fair clustering. (We use this terminology because it is commonly used; in a broader sense, however, we believe that discussions of fairness in terms of a compromise in quality are misguided and represent a false trade-off between two fundamentally different values.)
Theorem 6.
Consider an arbitrary clustering algorithm and a set of data points X which can be partitioned into m groups X_1, …, X_m. If, in the optimal clustering for the entire set, group X_j suffers the largest average cost, then the total cost of fair clustering is no larger than m times that of the optimal solution.
Proof.
Let us denote the fair clustering by C_f. By assumption, the average cost of every group in C_f is no larger than the average cost of group X_j in the optimal clustering:
Therefore:
∎
We now show that the analysis in Theorem 6 is tight for all variations of fair clustering introduced in Section 3.1. Consider the relaxed version of the k-means problem, namely linear subspace clustering. In this problem, the goal is to find subspaces of bounded rank that minimize the sum of squared distances between the input points and the subspaces (turning; cohen). The cost of a clustering in this problem is the minimum cost of projecting the data points onto such subspaces. We first build the example for fair clustering using AbsError as the cost function, and later make a small adjustment so that it also applies to RelError-fair clustering.
Consider minimizing the maximum cost for two groups and in Figure 3, using one cluster center (here, the cluster center is one-dimensional, or simply a direction). Let us assume there are two points in each group and . Since , an optimal clustering with no fairness constraint would pick the axis as the subspace, where the average AbsError for group is zero and the average AbsError for group is . However, it is easy to see that in order to minimize the maximum average AbsError over both groups, we should pick as the fair subspace. Referring to Observation 5, we know that the average AbsError for the two groups is equal. (Observation 5 applies here because the Frobenius norm is convex.) Therefore:
is inversely related to . As a result, by increasing , the average cost of projecting group onto the fair solution () gets arbitrarily close to the corresponding cost in the optimal solution (the axis). Therefore, the cost of projection for all points in the fair solution asymptotically approaches twice the corresponding cost in the optimal unconstrained solution. (We should note that since the optimal cost for each group is zero, the same reasoning applies if the difference-based RelErrorDiff is used as the cost function.)
The example above can be used to prove the tightness of the analysis in Theorem 6 for RelError-fair clustering. However, since division by zero is not defined, we make a small adjustment: instead of having two points for each group, we consider two sets of points, i.e., the points in the AbsError-fair example all become centers of sets of points belonging to the same group. We assume these points are close enough to the centers that their within-group distances are negligible. The rest of the analysis is the same as before.
Given the result above, it may seem that the different measures all behave similarly across data sets. However, the examples provided in Figure 4 showcase how different measures of representation cost induce different clusterings for the linear subspace clustering problem.
4.1 Approximation algorithm via LP relaxation
For the fair k-median and k-means problems, we now study the natural linear programming relaxation and develop a rounding algorithm.
4.2 Relaxation for AbsError-fair clustering
Let X_1, …, X_m be the groups of vertices, and let X = X_1 ∪ ⋯ ∪ X_m. For i, j ∈ X, the variable x_{ij} is intended to denote whether vertex j is assigned to center i. These are called assignment variables. We also have variables y_i that are intended to denote whether i is chosen as one of the k centers (or medians). The LP (called FairLP-AbsError) is now the following:
minimize λ
subject to  Σ_{i ∈ X} x_{ij} = 1                                    for all j ∈ X
            x_{ij} ≤ y_i                                            for all i, j ∈ X
            Σ_{i ∈ X} y_i ≤ k
            (1/|X_l|) Σ_{j ∈ X_l} Σ_{i ∈ X} d(i, j) x_{ij} ≤ λ      for all groups X_l    (2)
            x_{ij}, y_i ∈ [0, 1]
The only new constraint compared to the standard LP for k-median (e.g., charikar2002constant) is the constraint (2) for all groups X_l. This is to ensure that we minimize the maximum average k-median objective over the groups. To handle k-means clustering, it suffices to replace d(i, j) in the constraint with d(i, j)². See Remark 9 for details.
Theorem 7.
The integrality gap of FairLP-AbsError is Ω(n), where n is the number of points.
Proof.
Consider an instance in which we have n points in total, each in a different group. Formally, let X_l = {l} for all l ∈ [n]. Suppose that d(u, v) = 1 for all u ≠ v, and let k = n − 1.
Now, consider the fractional solution in which y_i = (n − 1)/n for all i. Also, let x_{jj} = (n − 1)/n for all j, and let x_{ij} = 1/n for some i ≠ j (it does not matter which one). It is easy to see that this solution satisfies all the constraints. Moreover, the LP objective value is 1/n.
However, in any integral solution, one of the points is not chosen as a center, and thus the objective value is at least 1. Thus the integrality gap is n. ∎
Theorem 7 makes it hard for an LP approach to give an approximation factor better than O(m) (which follows via a simple algorithm that finds an approximate k-median solution on X, using the weight 1/|X_l| for all points in X_l). However, the LP can still be used to obtain a bicriteria approximation.
Theorem 8.
Consider a feasible solution (x, y, λ) for FairLP-AbsError. For any ε ∈ (0, 1), there is an algorithm that opens at most k/(1 − ε) centers, while achieving an objective value of at most (2/ε) · λ.
Proof.
The proof is based on the well-known “filtering” technique (Lin1992approximation; charikar2002constant). Define C_j as the LP's “connection cost” for the point j. Formally, C_j = Σ_{i ∈ X} d(i, j) x_{ij}. Now, construct a subset S of the points as follows. Set S = ∅ and T = X to begin with, and in every step, find j ∈ T that has the smallest C_j value (breaking ties arbitrarily) and add it to S. Then, remove all u such that d(u, j) ≤ (2/ε) C_u from the set T. Suppose we continue this process until T is empty.
The set S obtained satisfies the following property: for all u, v ∈ S, d(u, v) > (2/ε) max(C_u, C_v). This is true because if u was added to S before v, then C_u ≤ C_v, and further, v should not have been removed from T, which gives the desired bound. The property above implies that the metric balls B(j, C_j/ε) for j ∈ S are all disjoint.
Next, we observe that each such ball contains a total y-value of at least 1 − ε. This is by a simple application of Markov's inequality. By definition, C_j = Σ_i d(i, j) x_{ij}, and thus Σ_{i : d(i, j) > C_j/ε} x_{ij} < ε. This means that Σ_{i ∈ B(j, C_j/ε)} x_{ij} ≥ 1 − ε, and thus Σ_{i ∈ B(j, C_j/ε)} y_i ≥ 1 − ε. As the balls are disjoint, we have that |S| (1 − ε) ≤ Σ_i y_i ≤ k.
Now, consider an algorithm that opens all the points of S as centers. By construction, every u ∈ X is at a distance at most (2/ε) C_u from some point in S, and thus for any group X_l, we have that (1/|X_l|) Σ_{u ∈ X_l} d(u, S) ≤ (2/ε) λ, thus completing the proof of the theorem. ∎
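The filtering step of this proof can be sketched as follows (Python; the removal threshold (2/ε)·C_u is one plausible instantiation, chosen to be consistent with the disjointness property the argument relies on):

```python
def filter_points(points, conn_cost, dist, eps):
    """Greedy filtering: repeatedly pick the surviving point with the
    smallest LP connection cost C_j, add it to S, and discard every
    surviving point u with dist(j, u) <= (2 / eps) * C_u."""
    remaining = sorted(points, key=lambda j: conn_cost[j])
    S = []
    while remaining:
        j = remaining.pop(0)                       # smallest C among survivors
        S.append(j)
        remaining = [u for u in remaining
                     if dist(j, u) > (2 / eps) * conn_cost[u]]
    return S

d = lambda a, b: abs(a - b)
C = {0: 0.1, 1: 1.0, 5: 0.1, 6: 1.0}               # hypothetical LP costs
S = filter_points([0, 1, 5, 6], C, d, eps=0.5)     # keeps the two "cheap" points
# Every pair in S is far apart relative to their connection costs,
# so the balls around S's points are disjoint.
assert all(d(u, v) > 4 * max(C[u], C[v]) for u in S for v in S if u != v)
```

Opening exactly the points of S then gives the bicriteria guarantee: each discarded point is served by the nearby cheap point that caused its removal.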
Remark 9 (Extension to means).
The argument above can be extended easily to obtain similar results for the k-means objective. We simply replace all distances with squared distances. The metric ball around each point can be replaced with the corresponding ball for the squared distance, and the same approximation factors hold.
LP-based heuristic.
The instance showing the Ω(n) integrality gap is special in the sense that every group has exactly one point, and thus it is impossible for an integral solution with fewer than n centers to achieve a small cost for all of them. We now see that in the case of k-median, there exist randomized rounding strategies that ensure that, in expectation, the connection cost of every group is within a constant factor of the LP objective. (Of course, all the costs need not simultaneously be small, e.g., in our gap instance.)
Definition 10 (Faithful rounding).
A (randomized) rounding procedure for FairLP-AbsError is said to be faithful if it takes a feasible solution (x, y) and produces a feasible integral solution S with the guarantee that for every j ∈ X, E[d(j, S)] ≤ β · C_j.
Using a simple dependent rounding procedure (see Chekuri2010dependent; Srinivasan2001distributions), charikar2012dependent showed that there exists a faithful rounding for FairLP-AbsError with constant β. We note that some of the other LP rounding schemes (e.g., charikar2002constant) are not faithful. Formally,
Theorem 11 (charikar2012dependent).
There exists a faithful (randomized) rounding algorithm for FairLP-AbsError, with β = O(1).
Corollary 12.
Let (x, y) be a solution to FairLP-AbsError. There exists a rounding algorithm that ensures that the expected connection cost for every group is O(1) times its LP connection cost.
The corollary follows directly from Theorem 11, by linearity of expectation. While this does not guarantee that the rounding simultaneously produces a small connection cost for all groups, it gives a good heuristic rounding algorithm. In examples where every group has many points well-distributed across clusters, the costs tend to concentrate around their expectations, leading to small connection costs for all groups. We will see this via examples in the experimental section.
4.3 Relaxation for RelError-fair clustering
We now see that the rounding methods introduced in Section 4.2 can also be used for RelError-fair clustering. However, the LP in this case is not quite a relaxation:

minimize λ
subject to  Σ_{i ∈ X} x_{ij} = 1                                for all j ∈ X
            x_{ij} ≤ y_i                                        for all i, j ∈ X
            Σ_{i ∈ X} y_i ≤ k
            Σ_{j ∈ X_l} Σ_{i ∈ X} d(i, j) x_{ij} ≤ λ · τ_l      for all groups X_l    (3)
            x_{ij}, y_i ∈ [0, 1]
The constraint (3) now involves a new term, τ_l, which is an approximation to the optimum k-median objective of the set X_l. For our purposes, we do not care how this approximation is achieved – it can be via an LP relaxation (charikar2002constant; Li2013approximating), local search (arya2004local; gupta2008), or any other method. We assume that if o_l is the optimum k-median objective for X_l, then o_l ≤ τ_l ≤ c · o_l for all l, for some constant c. (From the works above, we can even think of c as being a small constant.)
Lemma 13.
Suppose there is a rounding procedure that takes a solution (x, y, λ) to FairLP-RelError and outputs a set S of centers with the property that, for some parameter α,

Σ_{j ∈ X_l} d(j, S) ≤ α · λ · τ_l   for all groups X_l.    (4)

Then, this algorithm provides a (c·α)-approximation to RelError-fair clustering.
Proof.
Let Opt be the optimum value of the ratio-fair objective on the instance. The main observation is that the LP provides a lower bound on Opt. This is true because any solution to ratio-fair clustering leads to a feasible integral solution to FairLP-RelError, where the RHS of the constraint (3) is replaced by λ · o_l. Since o_l ≤ τ_l, it is also feasible for FairLP-RelError, showing that the optimum LP value is at most Opt.
Next, consider a rounding algorithm that takes the optimum LP solution and produces a set S that satisfies (4). Then, since the LP optimum is at most Opt, we have

Σ_{j ∈ X_l} d(j, S) ≤ α · Opt · τ_l   for all groups X_l,

and using τ_l ≤ c · o_l completes the proof of the lemma. ∎
Thus, it suffices to develop a rounding procedure for FairLP-RelError. Here, we observe that the rounding from Theorem 8 directly applies (because constraint (3) ensures that every group's LP connection cost is at most λ · τ_l), giving us the same bicriteria guarantee (and the same adjustment under faithful rounding).
Corollary 14 (Corollary to Theorem 8).
For any ε ∈ (0, 1), there is an efficient algorithm that opens at most k/(1 − ε) centers and achieves a (2c/ε)-approximation to the optimum value of the ratio-fair objective.
5 Experiments
In this section, we present two types of experiments. In the first part, we evaluate balance-based approaches to fair clustering with regard to our representation-based notions. In the second part, we evaluate our algorithms for fair clustering and provide an empirical assessment of their performance. We consider four datasets:

Synthetic. A synthetic dataset with three features. The first feature is binary (“majority” or “minority”) and determines the group an example belongs to. The second and third attributes are generated from one distribution in the majority group and from a different distribution in the minority group. The majority and minority groups are of size 250 and 50, respectively.

Iris (https://archive.ics.uci.edu/ml/datasets/iris). The data set consists of 50 samples from each of three species of Iris: Iris setosa, Iris virginica and Iris versicolor. Selected features are the length and width of the petals.

Census (https://archive.ics.uci.edu/ml/datasets/adult). The dataset comes from the 1994 US Census; selected attributes are “age”, “fnlwgt”, “education-num”, “capital-gain” and “hours-per-week”. Groups of interest are “female” and “male”.

Bank (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing). The dataset contains records of a phone-based marketing campaign run by a Portuguese banking institution. Selected attributes are “age”, “balance” and “duration”; groups of interest are “married” and “single”.
We should note that in all experiments, points were clustered using 3 centers.
5.1 On balance and representations
In this section, we empirically study the effects of enforcing balance on group representations. More specifically, we compare each group's average cost under unconstrained k-median to the corresponding value under the balance constraint. For the balance-fair k-median, we chose the algorithm proposed by Backurs et al. (backurs2019scalable), whose implementation is available online. In this experiment, we used the entire Synthetic and Iris datasets, and sampled 300 examples from each of Census (150 male, 150 female) and Bank (150 married, 150 single). In Table 1, we present the average costs for all groups within each dataset, under the two clusterings generated by unconstrained k-median and balanced k-median. In all datasets, we observe that enforcing balance amplifies representation disparity across groups and leads to a higher maximum average cost. This is especially noticeable in the Synthetic and Iris datasets, where the groups have drastically different distributions. (The algorithm proposed by Backurs et al. works on only two groups; we chose two groups out of three from Iris, and repeating the experiment with the other groups led to similar results.)
Dataset         Synthetic             Iris                  Census                 Bank
Group           majority   minority   Setosa   Versicolor   female      male       married   single
Unconstrained   0.514      0.678      0.169    0.256        34492.40    35083.73   627.05    682.76
Balanced        0.430      3.476      0.101    2.819        34019.34    35876.70   622.87    694.78
5.2 Algorithm evaluation
The empirical results in the last section show that balance-based algorithms do not mitigate representation disparity across groups. Therefore, in this section we propose two heuristic algorithms to compute group-representative k-median clusterings, which we call LS-Fair k-median and LP-Fair k-median:
LS-Fair k-median
Arya et al. proposed a local search algorithm to approximately solve the k-median problem (arya2004local). Their algorithm starts with an arbitrary solution and repeatedly improves it by swapping a subset of the centers in the current solution with a set of centers not in it. We modify this algorithm to minimize the maximum average cost over all groups. Assuming we are given a cost function and groups X_1, …, X_m, LS-Fair k-median is presented in Algorithm 1.
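A minimal sketch of the modified local search (single-center swaps with first improvement; Arya et al.'s swap neighborhood is more general, and all names here are ours):

```python
def ls_fair_median(points, groups, k, dist):
    """Local search for the min-max (AbsError-fair) k-median objective:
    a single-center swap is accepted whenever it reduces the maximum
    average group cost; stop at a local optimum."""
    def objective(centers):
        return max(sum(min(dist(p, c) for c in centers) for p in g) / len(g)
                   for g in groups)

    centers = list(points[:k])        # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        best_val = objective(centers)
        for i in range(k):
            for c_in in points:
                if c_in in centers:
                    continue
                cand = centers[:i] + [c_in] + centers[i + 1:]
                val = objective(cand)
                if val < best_val:
                    centers, best_val, improved = cand, val, True
    return centers, best_val

groups = [[0, 1, 2], [10, 11, 12]]
points = [p for g in groups for p in g]
centers, val = ls_fair_median(points, groups, 2, lambda a, b: abs(a - b))
# Converges to one center per group's midpoint, min-max cost 2/3.
```

Each accepted swap strictly decreases the objective, so the loop terminates; the next paragraphs show that such a local optimum can nevertheless be far from the global one.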
Later in this section we see that LS-Fair k-median works well in practice. However, the following example shows that k-median with the AbsError-fair objective can have local optima that are arbitrarily worse than the global optimum. Let and be two sets that are far apart (think of the distance between any pair as ). , where and , for some integer parameter . Likewise, suppose that , of sizes respectively. Suppose that all the elements of (so also ) are at distance away from one another. Suppose the distance between and (so also and ) is .
Now, suppose the two groups are and . Let . The optimal solution is to choose one point in and another in . This results in an objective value of
Consider the solution that chooses the unique points from and . The k-median objective for both groups is , and thus the AbsError-fair objective is . Now, consider swapping with some point . This changes the k-median objective for group 1 from to , and so even though the swap significantly decreases the objective for the second group, the local search algorithm will not perform the swap. The same argument holds for swapping with a point . It is thus easy to see that is a locally optimal solution.
However, the ratio between the AbsError-fair objectives of this solution and the optimum is for . Thus the gap can be as bad as the number of points.
LP-Fair k-median
LP-Fair k-median first solves FairLP, presented in Sections 4.1 and 4.3, and then rounds the solution using the matching idea proposed by Charikar et al. (charikar2012dependent). The rounding is done in four phases:

Filtering: Similar to the filtering technique described in Section 4.1, we construct a subset S of the points, with the small adjustment that after adding a point to S, all points from the original set within the corresponding distance threshold are no longer considered for addition to S.

Bundling: For each point in S, we create a bundle comprising the centers that exclusively serve it. In the rounding procedure, each bundle is treated as a single entity, from which at most one center will be opened. The probability of opening a center from a bundle is the sum of the y-values of its centers, which we call the bundle's volume.
Matching: The generated bundles have the nice property that their volume lies between 1/2 and 1. So given any two bundles, at least one center from them should be opened. Therefore, while there are at least two unmatched points in S, we match the corresponding bundles of the two closest unmatched points in S.

Sampling: Given the matching generated in the last phase, we iterate over its members and treat the bundle volumes as probabilities, opening k centers in expectation.
The centers picked in the sampling phase are returned as the final centers.
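Assuming, as in the matching phase, that each bundle's volume lies in [1/2, 1], the sampling step for a single matched pair can be sketched as dependent rounding: each bundle's center opens with probability equal to its volume, and the two draws are coupled so that at least one always opens (a simplified sketch with our own names, not the exact procedure of Charikar et al.):

```python
import random

def sample_matched_pair(v1, v2, rng):
    """Open bundle 1's center with probability v1 and bundle 2's with
    probability v2, never opening neither (valid since v1 + v2 >= 1)."""
    p_both = v1 + v2 - 1       # P(open both)
    p_only1 = 1 - v2           # P(open only bundle 1); the rest is "only 2"
    r = rng.random()
    if r < p_both:
        return True, True
    if r < p_both + p_only1:
        return True, False
    return False, True

# Empirical check of the marginals for hypothetical volumes 0.7 and 0.6.
rng = random.Random(0)
n = 20000
opened1 = opened2 = neither = 0
for _ in range(n):
    a, b = sample_matched_pair(0.7, 0.6, rng)
    opened1 += a
    opened2 += b
    neither += (not a) and (not b)
```

The marginals are preserved (P(open 1) = p_both + p_only1 = v1, and symmetrically for v2), which is what makes the overall rounding open k centers in expectation while guaranteeing one center per matched pair.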
Results.
In this experiment, in order to save space, we focus on the Census and Bank datasets. However, we consider two subsamples of each dataset: 1:1 Census contains 150 female and 150 male examples, 1:5 Census contains 50 female and 250 male examples, 1:1 Bank contains 150 married and 150 single examples, and 1:5 Bank contains 50 married and 250 single examples. In Table 2, we present the average cost for all groups within each sample. “Group optimal” is the optimal average cost for a group when it is clustered by itself via k centers. “k-median” is a group's average cost in a clustering generated by unconstrained k-median, performed on all groups together. The other rows show the average group cost for each heuristic algorithm, using the various cost functions. In general, the results demonstrate the effectiveness of our algorithms. We emphasize, however, the difference between the 1:1 and 1:5 samples. In the 1:1 case, the groups have the same size and unconstrained k-median treats them roughly the same. But in the 1:5 case, if the groups have different distributions, unconstrained k-median favors the majority group over the other, and the effectiveness of our proposed algorithms is more evident. (Each dataset was sampled 10 times and we report the overall average.)
Dataset                  1:1 Census        1:5 Census        1:1 Bank          1:5 Bank
Group                    female   male     female   male     married  single   married  single
Group optimal            34499    31528    35349    32619    569      686      659      655
k-median                 35264    32351    40212    32689    596      730      948      665
AbsError   LS-Fair       34827    33298    37887    36144    627      718      749      740
AbsError   LP-Fair       34668    33971    38396    35675    630      717      740      763
RelError   LS-Fair       35390    32341    38099    34702    611      727      745      747
RelError   LP-Fair       35397    32343    38067    33865    613      722      767      743
6 Conclusion
In this work we presented a novel way to think about and formulate fairness in clustering tasks, based on group representativeness. Our main contributions are introducing a fairness notion that parallels the development of fairness in the classification setting, proposing bicriteria approximation algorithms for k-medians under different variations of this notion, and providing theoretical bounds. Our results suggest that our formulation provides better-quality representations, especially when the groups are skewed in size.
7 Broader Impact
Clustering is a critical part, and often one of the early steps, of the learning pipeline; preprocessing data for supervised learning is one of its many use cases. It is therefore critical to understand how bias might enter the pipeline through clustering, and how one might mitigate it. The underlying assumption in most clustering tasks is that cluster centers act as representatives and summarize the variety of points in their cluster. If there exist predefined groups beyond clusters, it is possible that some groups are not as well-represented as others in a clustering. Poor representation of a specific set of data points in clustering may lead to that group being neglected in the rest of the learning pipeline. Our research introduces a new way of understanding and mitigating poor representation of protected social groups in clustering. This is crucial for ensuring equal treatment of all social groups in any learning task that uses clustering as a preprocessing tool.
In all discussions around what it means for something to be “fair”, it is important to look at the normative basis for the claim. We argue that representation quality acts as such a normative basis. While it is true that this is not captured by existing formulations, a proper understanding of normative concerns around representation requires a deeper understanding of specific use cases, even if it is just a matter of deciding whether to use AbsError or RelError. In that respect, our work is on the more theoretical end – trying to understand the computational elements of these measures – rather than providing a recommendation for how we should do clustering fairly.
Another concern that our work exposes but does not resolve is that the process of enforcing fair representations may come at the price of making a noticeable fraction of the population suffer a larger burden in terms of representation cost. The nature of such a trade-off is not well-studied at this point and remains both a point of caution and an avenue for further study.