1 Introduction
Machine learning algorithms have been rapidly adopted in numerous human-centric domains, from personalized advertising to lending to health care. Fast on the heels of this ubiquity have come a whole host of concerning behaviors from these algorithms: facial recognition has higher accuracy on white, male faces (Buolamwini & Gebru, 2017); DUI arrest help advertisements are shown more regularly to minority profiles (Sweeney, 2013); and criminal recidivism tools are more likely to label African-American low-risk defendants as high-risk (Angwin et al., 2016). There are also several examples of unsavory ML behavior pertaining to unsupervised learning tasks, whether considering the bias evident in word2vec embeddings (Bolukbasi et al., 2016) or the gender imbalance of CEO image search results (Kay et al., 2015). Most of the academic work on fairness in machine learning, however, has investigated how to solve classification tasks subject to various constraints on the behavior of a classifier on different demographic groups (e.g., Hardt et al., 2016; Zafar et al., 2017; Agarwal et al., 2018).
This paper adds to the literature on fair methods for unsupervised learning tasks (see Section 4 for related work). We consider the problem of data summarization (Hesabi et al., 2015) through the lens of algorithmic fairness. The goal of data summarization is to output a small but representative subset of a data set. Think of an image database and a user entering a query that is matched by many images. Rather than presenting the user with all matching images, we only want to show a summary. In such an example, a data summary can be quite unfair on a demographic group. Indeed, Google Images has been found to answer the query “CEO” with a much higher fraction of images of men compared to the real-world fraction of male CEOs (Kay et al., 2015).
One approach to the problem of data summarization is provided by centroid-based clustering, such as $k$-center (formally defined in Section 2) or $k$-medoid (Hastie et al., 2009, Section 14.3.10; sometimes referred to as $k$-median). For a centroid-based clustering objective, an optimal clustering of a data set $S$ can be defined by $k$ points $c_1, \ldots, c_k \in S$, called centroids, such that the clusters are formed by assigning every $s \in S$ to its closest centroid. Since the centroids are good representatives of their clusters, the set of centroids can be used as a summary of $S$. This approach of data summarization via clustering is used in numerous domains: in the social sciences (Bartholomew et al., 2008; Cameron & Trivedi, 2010), in psychology (Borgen & Barnett, 1987; Campbell et al., 2008), and in biology (Eisen et al., 1998).
If the data set $S$ comprises several demographic groups $S_1, \ldots, S_m$, we may consider $c_1, \ldots, c_k$ to be a fair summary only if the groups are represented fairly: if in the real world 70% of CEOs are male and we want to output ten images for the query “CEO”, then three of the ten images should show women. Formally, this can be encoded with one parameter $k_{S_i}$ for every group $S_i$. Our goal is then to minimize the clustering objective under the constraint that $k_{S_i}$ many centroids belong to $S_i$. A constraint of this form can also enforce balanced summaries: even if in the real world there are more male CEOs than female ones, we might want to output an equal number of male and female images to reflect that gender is not definitional to the role of CEO.
Centroid-based clustering under such a constraint has been studied in the theoretical computer science literature (see Sections 2 and 4). However, existing approximation algorithms for this problem run in time superlinear in $|S|$, while the unconstrained $k$-center clustering problem can be approximated in time linear in $|S|$. Since data summarization is particularly useful for massive data sets, such a slowdown may be practically prohibitive. The contribution of this paper is to present a simple approximation algorithm for $k$-center clustering under our fairness constraint with running time only linear in $|S|$ and $k$. The improved running time comes at the price of a worse guarantee on the approximation factor if the number of demographic groups is large. However, note that in practical situations concerning fairness, the number of groups is often quite small (e.g., when the groups encode gender or race). Furthermore, in our extensive numerical simulations we never observed a large approximation factor, even when the number of groups was large (cf. Section 5), indicating the practical usefulness of our algorithm.
Outline of the paper. In Section 2, we formally state the $k$-center and the fair $k$-center problem. In Section 3, we present our algorithm and provide a sketch of its analysis. The full proofs can be found in Appendix A. We discuss related work in Section 4 and present a number of experiments in Section 5. Further experiments can be found in Appendix B. We conclude with a discussion in Section 6.
2 Definition of $k$-Center and Fair $k$-Center
Let $S$ be a finite data set and $d$ be a metric on $S$. In particular, we assume $d$ to satisfy the triangle inequality. The standard $k$-center clustering problem is the following minimization problem
$$\min_{C = \{c_1, \ldots, c_k\} \subseteq S} \; \max_{s \in S} \; d(s, C), \tag{1}$$
where $k \in \mathbb{N}$ is a given parameter and $d(s, C) = \min_{c \in C} d(s, c)$. Here, $c_1, \ldots, c_k$ are called centers. Any set of centers defines a clustering of $S$ by assigning every $s \in S$ to its closest center. The $k$-center problem is NP-hard and is also NP-hard to approximate to a factor better than 2 (Gonzalez, 1985; Vazirani, 2001, Chapter 5). The famous greedy strategy of Gonzalez (1985) is a 2-approximation algorithm with running time $O(k|S|)$ if we assume that $d$ can be evaluated in constant time (this is the case, e.g., if a problem instance is given via the distance matrix $(d(s, s'))_{s, s' \in S}$). This algorithm randomly selects an element of the data set as first center and then iteratively adds the next center to be the point with maximum distance to the current set of centers.
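As a minimal illustration of the greedy strategy just described, the following Python sketch evaluates the objective of problem (1) and runs the farthest-first selection on a toy instance. The function names, the toy data on the real line, and the use of `abs` as the metric are assumptions made for this example only; they are not taken from the paper.

```python
import random


def kcenter_cost(points, centers, dist):
    """Objective of problem (1): largest distance from a point to its closest center."""
    return max(min(dist(s, c) for c in centers) for s in points)


def greedy_kcenter(points, k, dist, rng=random):
    """Farthest-first selection of k >= 1 centers with O(k * |S|) distance evaluations."""
    centers = [rng.choice(points)]  # random first center
    # closest[i] is the distance from points[i] to its closest chosen center
    closest = [dist(s, centers[0]) for s in points]
    while len(centers) < k:
        i = max(range(len(points)), key=closest.__getitem__)
        centers.append(points[i])  # the farthest point becomes the next center
        closest = [min(old, dist(s, points[i])) for old, s in zip(closest, points)]
    return centers


# Toy instance (illustrative data only): six points on the real line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = lambda x, y: abs(x - y)
centers = greedy_kcenter(points, 2, dist, random.Random(0))
cost = kcenter_cost(points, centers, dist)
```

On this toy instance the optimal centers are 1.0 and 11.0 with cost 1.0, whereas the greedy strategy ends with cost 2.0 whatever the first center is, which is consistent with a 2-approximation.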
We consider a fair variant of $k$-center as described in Section 1. Our variant also allows for the user to specify a given subset $C_0 \subseteq S$ that has to be included in the set of centers (think of the example of the image database and the case that we always want to show five prespecified images as part of the summary). Assuming that $S = \dot{\bigcup}_{i=1}^{m} S_i$, where $S_1, \ldots, S_m$ are the demographic groups, the fair $k$-center problem can be stated as the minimization problem
$$\min_{\substack{C = \{c_1, \ldots, c_k\} \subseteq S: \\ |C \cap S_i| = k_{S_i},\ i = 1, \ldots, m}} \; \max_{s \in S} \; d(s, C \cup C_0), \tag{2}$$
where $k_{S_i} \in \mathbb{N}_0$ with $\sum_{i=1}^{m} k_{S_i} = k$ and $C_0 \subseteq S$ are given. By means of a partition matroid, the fair $k$-center problem can be phrased as a matroid center problem, for which Chen et al. (2016) provide a 3-approximation algorithm using matroid intersection (e.g., Cook et al., 1998). Chen et al. (2016) do not discuss the running time of their algorithm, but it requires sorting all distances between elements in $S$ and hence has running time at least $\Omega(|S|^2 \log |S|)$. In our experiments in Section 5 we observe a running time in .
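To make the constraint in problem (2) concrete, the following Python sketch solves a tiny instance by exhaustive search: it enumerates every choice of centers that meets the per-group quotas and keeps the cheapest one. This is only feasible for very small data sets; the function name, the group labels, and the toy data are hypothetical and serve illustration only.

```python
from itertools import combinations, product


def fair_kcenter_bruteforce(points, groups, quotas, dist, initial=()):
    """Exhaustively solve problem (2) on a tiny instance.

    groups[i] is the group label of points[i]; quotas maps each label to the
    required number of centers from that group; initial plays the role of C_0.
    Assumes at least one center overall (a positive quota or non-empty initial).
    """
    by_group = {g: [i for i, gi in enumerate(groups) if gi == g] for g in quotas}
    best_cost, best_choice = float("inf"), None
    # pick exactly quotas[g] center indices inside every group g
    per_group = [combinations(by_group[g], quotas[g]) for g in quotas]
    for choice in product(*per_group):
        chosen = [i for part in choice for i in part]
        centers = [points[i] for i in chosen] + list(initial)
        cost = max(min(dist(s, c) for c in centers) for s in points)
        if cost < best_cost:
            best_cost, best_choice = cost, chosen
    return best_cost, best_choice


# Toy instance (illustrative data only): two groups of three points each.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
groups = ["a", "a", "a", "b", "b", "b"]
dist = lambda x, y: abs(x - y)
balanced = fair_kcenter_bruteforce(points, groups, {"a": 1, "b": 1}, dist)
skewed = fair_kcenter_bruteforce(points, groups, {"a": 2, "b": 0}, dist)
```

Here `balanced` attains cost 1.0, while forcing both centers into group "a" (`skewed`) raises the optimal cost to 10.0, showing how strongly the quotas can affect the optimal value.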
Notation For , we sometimes use .
3 A Linear-time Approximation Algorithm
In this section, we present our approximation algorithm for the minimization problem (2). It is a recursive algorithm with respect to the number of groups $m$. To increase comprehensibility, we first present the case of two groups and then the general case of an arbitrary number of groups.
At several points, we will consider the standard (unfair) $k$-center problem (1) generalized to the case of initially given centers $C_0' \subseteq S$, that is
$$\min_{C = \{c_1, \ldots, c_k\} \subseteq S} \; \max_{s \in S} \; d(s, C \cup C_0'). \tag{3}$$
We can adapt the greedy strategy of Gonzalez (1985) while maintaining its 2-approximation guarantee for (3). For the sake of completeness, we provide the algorithm as Algorithm 1 and state the following lemma:
Lemma 1. Algorithm 1 is a 2-approximation algorithm for the unfair $k$-center problem (3).
A proof of Lemma 1, similar in structure to a proof in Har-Peled (2011, Section 4.2) for the strategy of Gonzalez (1985) for problem (1), can be found in Appendix A.
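The adaptation of the greedy strategy to initially given centers can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a transcription of Algorithm 1: the function name and toy data are hypothetical, and when no initial centers are given the sketch starts from the first point of the list rather than from a random one.

```python
def greedy_with_initial_centers(points, k, dist, initial):
    """Greedily pick k new centers for problem (3), given fixed initial centers."""
    inf = float("inf")
    # closest[i] is the distance from points[i] to the nearest center so far;
    # with no initial centers every entry is infinite and the first pick is arbitrary
    closest = [min((dist(s, c) for c in initial), default=inf) for s in points]
    new_centers = []
    for _ in range(k):
        i = max(range(len(points)), key=closest.__getitem__)
        new_centers.append(points[i])  # farthest point from all current centers
        closest = [min(old, dist(s, points[i])) for old, s in zip(closest, points)]
    return new_centers


# Toy instance (illustrative data only).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = lambda x, y: abs(x - y)
added = greedy_with_initial_centers(points, 2, dist, initial=[0.0])
```

Maintaining the array `closest` keeps the work per added center linear in the number of points, in line with the running time discussed in Section 2.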
3.1 Fair $k$-Center with Two Groups
Assume that $m = 2$, that is, $S = S_1 \,\dot\cup\, S_2$. Our algorithm first runs Algorithm 1 for the unfair problem (3) with $k = k_{S_1} + k_{S_2}$ and $C_0' = C_0$. If we are lucky and Algorithm 1 picks $k_{S_1}$ many centers from $S_1$ and $k_{S_2}$ many centers from $S_2$, our algorithm terminates. Otherwise, Algorithm 1 picks too many centers from one group, say $S_1$, and too few from $S_2$. We try to decrease the number of centers in $S_1$ by replacing any such center with an element in its cluster belonging to $S_2$. Once we have made all such available swaps, the remaining clusters with centers in $S_1$ are entirely contained within $S_1$. We then run Algorithm 1 on this subset with $k = k_{S_1}$ and the centers from $S_2$ as well as $C_0$ as initial centers, and return both the centers from the recursive call and those from $S_2$ from the initial call.
This algorithm is formally stated as Algorithm 2. The following theorem states that it is a 5-approximation algorithm and that our analysis is tight: in general, Algorithm 2 does not achieve a better approximation factor.
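Theorem 1. Algorithm 2 is a 5-approximation algorithm for the fair $k$-center problem (2) with two groups, and this factor is tight for Algorithm 2.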
Proof.
Here we only present a sketch of the proof. The full proof can be found in Appendix A. For showing that Algorithm 2 is a 5-approximation algorithm, let be the optimal value of (2) and be the optimal value of (3) (for ). Clearly, . Let be the set of centers returned by Algorithm 2. It is clear that comprises many elements from and many elements from . We need to show that for every . Let be the output of Algorithm 1 when called in Line 3 of Algorithm 2. Since Algorithm 1 is a 2-approximation algorithm for (3) according to Lemma 1, we have , . Assume that . It follows from the triangle inequality that after exchanging centers in the while-loop in Line 9 of Algorithm 2 we have , . Assume that still . We only need to show that , . Let be an optimal solution to (2). We split into two subsets , where comprises all for which the closest center in is in . Using the triangle inequality we can show that , . We partition into at most many clusters corresponding to the closest center in . Each of these clusters has diameter not greater than . If Algorithm 1 in Line 15 of Algorithm 2 chooses one element from each of these clusters, we immediately have , . Otherwise, Algorithm 1 chooses an element from or two elements from the same cluster of . In both cases, it follows from the greedy choice property of Algorithm 1 that , .
A family of examples shows that Algorithm 2 is not a $(5 - \varepsilon)$-approximation algorithm for any $\varepsilon > 0$. ∎
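The two-group procedure of Section 3.1 can be sketched in Python as follows. This is a hedged illustration rather than a transcription of Algorithm 2: points are represented by indices, all names are hypothetical, the points are assumed to be pairwise distinct, and filling a remaining deficit of the under-represented group with arbitrary elements is an assumption of this sketch.

```python
def greedy(candidates, k, d, initial):
    """Farthest-first choice of k new centers among candidates, given initial centers."""
    inf = float("inf")
    closest = {s: min((d(s, c) for c in initial), default=inf) for s in candidates}
    new = []
    for _ in range(k):
        c = max(candidates, key=closest.__getitem__)
        new.append(c)
        for s in candidates:
            closest[s] = min(closest[s], d(s, c))
    return new


def fair_two_groups(n, group, quota, d, c0=()):
    """Sketch for two groups: points are 0..n-1, group[i] in {1, 2}, quota = {1: k1, 2: k2}."""
    everything = list(range(n))
    centers = greedy(everything, quota[1] + quota[2], d, list(c0))
    count = {g: sum(group[c] == g for c in centers) for g in (1, 2)}
    if count == quota:
        return centers
    big = 1 if count[1] > quota[1] else 2  # group with too many centers
    small = 3 - big                        # group with too few centers
    # owner[s] is the closest center of s and thus defines the cluster of s
    owner = {s: min(centers + list(c0), key=lambda c: d(s, c)) for s in everything}
    for j, c in enumerate(centers):
        if count[big] == quota[big]:
            break
        if group[c] != big:
            continue
        # swap c for an element of the other group inside its own cluster, if any
        swap = next((s for s in everything if owner[s] == c and group[s] == small), None)
        if swap is not None:
            centers[j] = swap
            count[big] -= 1
            count[small] += 1
    if count[big] > quota[big]:
        kept = [c for c in centers if group[c] == small]
        rest = [c for c in centers if group[c] == big]
        # clusters of the remaining centers of `big` contain only elements of `big`
        subset = [s for s in everything if owner[s] in rest]
        centers = kept + greedy(subset, quota[big], d, kept + list(c0))
    # assumption of this sketch: fill a remaining deficit with arbitrary elements
    pool = [s for s in everything if group[s] == small and s not in centers]
    missing = quota[small] - sum(group[c] == small for c in centers)
    return centers + pool[:missing]


# Toy instance (illustrative data only): positions on the real line.
xs = [0.0, 10.0, 20.0, 19.0, 30.0, 18.0]
group = [1, 1, 1, 2, 1, 2]
summary = fair_two_groups(6, group, {1: 1, 2: 2}, lambda i, j: abs(xs[i] - xs[j]))
```

In this toy run no swap is possible, so the second greedy call on the clusters of the over-represented group is exercised, and the returned summary contains exactly one element of group 1 and two of group 2.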
3.2 Fair $k$-Center with Arbitrary Number of Groups
The main idea to handle an arbitrary number of groups is the same as for the case $m = 2$: we first run Algorithm 1. We then exchange centers for elements in their clusters in such a way that the number of centers from a group $S_i$ comes closer to $k_{S_i}$, which is the requested number of centers from $S_i$. If via exchanging centers we can actually hit $k_{S_i}$ for every group $S_i$, we are done. Otherwise, we wish that, when no more exchanging is possible, we are left with a subset $S' \subseteq S$ that only comprises elements from $m - 1$ or fewer groups. Denote the set of these groups by $\mathcal{G}$. We also wish that for those groups not in $\mathcal{G}$ we have picked only the requested number of centers or fewer and we can consider the groups not in $\mathcal{G}$ to have been “resolved”. If both are true, we can recursively apply our algorithm to $S'$ and a smaller number of groups. We might recurse down to the case of only one group, which we can solve with Algorithm 1.
The difficulty with this idea comes from the exchanging process. Formally, we are given centers and the corresponding clustering , where is the union of clusters with a center in , and we want to exchange some centers for an element in their cluster such that there exists a strict subset of groups with the following properties:
(4)  
(5) 
While in the case of only two groups this can easily be achieved by exchanging centers from the group that has more than the requested number of centers for elements from the other group, as we do in Algorithm 2, it is not immediately clear how to deal with a situation as shown in Figure 1. There are three groups (elements of these groups are shown in blue, green, and red, respectively), and we have . For the current set of centers (elements at the centers of the circles) there does not exist satisfying (4) and (5). We would like to decrease the number of centers in and increase the number of centers in , but the clusters with a center in do not comprise an element from . Hence, we cannot directly exchange a center from for an element in . Rather, we first have to exchange a center from for an element in (although this increases the number of centers from over ) and then a center from for an element in . An algorithm that can deal with such a situation is Algorithm 3. It exchanges some centers for an element in their cluster and yields that provably satisfies (4) and (5), as stated by the following lemma. Its proof can be found in Appendix A.
Lemma 2.
Observing that the number of iterations of the while-loop in Line 7 is upper-bounded by as the proof of Lemma 2 shows, that the number of iterations of the for-loop in Line 8 is upper-bounded by , and that all shortest paths on can be computed in running time (Cormen et al., 2009, Chapter 25), it is not hard to see that Algorithm 3 can be implemented with running time .
Using Algorithm 3, it is straightforward to design a recursive approximation algorithm for the fair $k$-center problem (2) as outlined at the beginning of Section 3.2. We state the algorithm as Algorithm 4. Applying, by means of induction, a similar technique as in the proof of Theorem 1 to every (recursive) call of Algorithm 4, we can prove the following:
Theorem 2.
It is not clear to us whether our analysis of Algorithm 4 is tight and the approximation factor achieved by Algorithm 4 can indeed be as large as or whether the dependence on $m$ is actually less severe (compare with Section 5 and Section 6). Although we tried hard to find instances for which the approximation factor of Algorithm 4 is large, we never observed a factor greater than .
4 Related Work
Fairness By now, there is a huge body of work on fairness in machine learning. For a recent paper providing an overview of the literature on fair classification see Donini et al. (2018). Our paper adds to the literature on fair methods for unsupervised learning tasks (Chierichetti et al., 2017; Celis et al., 2018a, b, c; Samadi et al., 2018; Schmidt et al., 2018). Note that all these papers assume, just as we do, that the demographic group of every data point is known. We discuss the two works most closely related to our paper.
First, Celis et al. (2018b) also deal with the problem of fair data summarization. They study the same fairness constraint on the summary as we do, that is, the summary must contain $k_{S_i}$ many elements from group $S_i$. However, while we aim for a representative summary, where every data point should be close to at least one center in the summary, Celis et al. aim for a diverse summary. Their approach requires the data set $S$ to consist of points in $\mathbb{R}^n$, and then the diversity of a subset of $S$ is measured by the volume of the parallelepiped that it spans (Kulesza & Taskar, 2012). Note that the summarization objective of Celis et al. is different from ours, and in different application domains one or the other may be more appropriate. An advantage of our approach is that it only requires access to a metric on the data set, rather than assuming feature representations of the data points.
The second line of work we discuss centers around the paper of Chierichetti et al. (2017). Their paper proposes a notion of fairness for clustering different from ours. Based on the fairness notion of disparate impact (Feldman et al., 2015) for classification (or the $p\%$-rule; Zafar et al., 2017), the paper by Chierichetti et al. asks that every group be approximately equally represented in each cluster. In their paper, Chierichetti et al. focus on $k$-medoid and $k$-center clustering and the case of two groups. Subsequently, Rösner & Schmidt (2018) study such a fair $k$-center problem for multiple groups, and Schmidt et al. (2018) build upon the work of Chierichetti et al. to devise algorithms for such a fair $k$-means problem. While we certainly consider the fairness notion of Chierichetti et al. (2017), which can be applied to any kind of clustering, to be meaningful in some scenarios, we believe that in certain applications of centroid-based clustering (such as data summarization) our proposed fairness notion provides a more sensible alternative.
Centroid-based clustering There are many papers proposing heuristics and approximation algorithms for both $k$-center (e.g., Hochbaum & Shmoys, 1986; Mladenović et al., 2003; Ferone et al., 2017) and $k$-medoid (e.g., Charikar et al., 2002; Arya et al., 2004; Li & Svensson, 2013) under various assumptions on $S$ and the distance function $d$. There are also numerous papers on versions with constraints, such as lower or upper bounds on the size of the clusters (Aggarwal et al., 2010; Cygan et al., 2012; Rösner & Schmidt, 2018).
Most important to mention are the works by Hajiaghayi et al. (2010), Krishnaswamy et al. (2011) and Chen et al. (2016). Hajiaghayi et al. are the first to consider our fairness constraint (for two groups and without a set $C_0$ that has to be included in the set of centers) for $k$-medoid. They present a local search algorithm and prove it to be a constant-factor approximation algorithm. Their work has been generalized by Krishnaswamy et al., who consider $k$-medoid under the constraint that the set of centers has to form an independent set in a given matroid. This kind of constraint contains our fairness constraint as a special case (for an arbitrary number of groups and an arbitrary set $C_0$). Krishnaswamy et al. obtain a 16-approximation algorithm for this so-called matroid median problem based on rounding the solution of a linear programming relaxation. Subsequently, Chen et al. study the matroid center problem. Using an algorithm for matroid intersection as black box, they obtain a 3-approximation algorithm. Note that none of Hajiaghayi et al., Krishnaswamy et al. or Chen et al. discuss the running time of their algorithm, except for arguing it to be polynomial time (compare with Section 2).
5 Experiments
In this section, we present a number of experiments. We begin with a motivating example on a small image data set illustrating that a summary produced by Algorithm 1 (i.e., the standard greedy strategy for the unfair $k$-center problem) can be quite unfair. We also compare summaries produced by our algorithm to summaries produced by the method of Celis et al. (2018b). We then investigate the approximation factor of our algorithm on several artificial instances for which we know or can compute the optimal value of the fair $k$-center problem (2) and compare our algorithm to the one for the matroid center problem by Chen et al. (2016), both in terms of approximation factor and running time. Next, on both synthetic and real data, we compare our algorithm in terms of the cost of its output to two baseline heuristics. Finally, we compare our algorithm to Algorithm 1 more systematically. We study the difference in the costs of the outputs of our algorithm and Algorithm 1, a quantity one may refer to as price of fairness, and measure how unfair the output of Algorithm 1 can be. In the following, all boxplots show results of 200 runs of an experiment.
5.1 Motivating Example and Comparison with Celis et al. (2018b)
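[Figure 2: the 14 images of medical doctors (first row) and a table of summaries with one column each for Algorithm 1, Our Algorithm, and Celis et al.]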
Consider the 14 images of medical doctors shown in the first row of Figure 2 (all images were found on https://commons.wikimedia.org, https://pexels.com or https://pixnio.com and are in the public domain). Assume we want to generate a summary of size four of these images. One way to do so is to run Algorithm 1. The first column of the table in Figure 2 shows in each row the summary produced in one run of Algorithm 1 (recall that all algorithms considered here are randomized algorithms). These summaries are quite unfair: although there is an equal number of images of female doctors and images of male doctors, all these summaries show three or even four females. To overcome this bias we can apply our algorithm or the method of Celis et al. (2018b), which both allow us to explicitly state the numbers of females and males that we want in the summary. The second and the third column of the table show summaries produced by these algorithms. It is hard to say which of them produces more useful summaries and the results ultimately depend on the feature representations of the images (see the next paragraph). To provide further illustration, we present a similar experiment in Figure 11 in Appendix B. Note that we chose very small numbers of images in these experiments solely for the purpose of easy visual digestion.
For computing feature representations of the images and running the algorithm of Celis et al. we used the code provided by them. The feature vector of an image is a histogram based on the image’s SIFT descriptors; see Celis et al. for details. We used the Euclidean metric between these feature vectors as metric for Algorithm 1 and our algorithm.
5.2 Approximation Factor and Comparison with Chen et al. (2016)
We implemented the algorithm by Chen et al. (2016) using the generic algorithm for matroid intersection provided in SageMath (http://sagemath.org/). To speed up computation, rather than testing all distance values as threshold as suggested by Chen et al., we implemented binary search to look for the optimal value.
In the experiment shown in the left part of Figure 3, we study the approximation factor achieved by our algorithm (Alg. 4) and the algorithm by Chen et al. (M.C.) in various settings of values of , and , . The data set always consists of 25 vertices of a random graph and is small enough to explicitly compute an optimal solution to the fair $k$-center problem (2). The random graph is constructed according to an Erdős-Rényi model, where any possible edge between two vertices is contained in the graph with probability . With high probability such a graph is connected (if not, we discard it). We put random weights on the edges, drawn from the uniform distribution on , and let the metric $d$ be the shortest-path distance on the graph. We assign every vertex to one of $m$ groups uniformly at random and randomly choose a subset $C_0 \subseteq S$ of initially given centers. As we can see from the boxplots, the approximation factor achieved by our algorithm is never larger than 2.4. We also see that in each of the seven settings that we consider the median of the achieved approximation factors (indicated by the red lines in the boxes) is smaller for our algorithm than for the algorithm by Chen et al.
In the experiment shown in the right part of Figure 3, we study the running time of the two algorithms as a function of the size of the data set, which is created analogously to the experiment in the left part. We set , and , . The shown curves are obtained from averaging the running times of 200 runs of the experiment. While our algorithm never runs for more than 0.01 seconds, the algorithm by Chen et al., on average, runs for 240 seconds when . Its run time grows at least as , which shows it to be impractical for massive data sets.
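The construction of such a random-graph instance can be sketched in pure Python as follows. The edge probability `p`, the weight range `(0, max_weight)`, the seed, and all function names are assumptions chosen for illustration; they are not the settings of the experiment, and the all-pairs shortest paths are computed with the Floyd-Warshall algorithm.

```python
import random


def random_graph_metric(n, p, max_weight, rng):
    """Shortest-path distances of a weighted Erdős-Rényi graph, or None if disconnected."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # include the edge {i, j} with probability p
                d[i][j] = d[j][i] = rng.uniform(0.0, max_weight)
    for via in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][via] + d[via][j] < d[i][j]:
                    d[i][j] = d[i][via] + d[via][j]
    if any(inf in row for row in d):
        return None  # the graph is disconnected
    return d


def sample_instance(n, m, p=0.5, max_weight=1.0, seed=0):
    """Draw a connected graph metric on n vertices and a uniform assignment to m groups."""
    rng = random.Random(seed)
    d = None
    while d is None:  # discard disconnected graphs
        d = random_graph_metric(n, p, max_weight, rng)
    groups = [rng.randrange(m) for _ in range(n)]
    return d, groups


# 25 vertices as in the experiment; m = 3 groups is an illustrative choice.
d, groups = sample_instance(25, 3)
```

The returned matrix `d` can be passed to any of the algorithms as a distance matrix, so that distances are evaluated in constant time.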
In the experiment of Figure 4, we once more study the approximation factor achieved by our algorithm. We place 100 optimal centers at , , and sample points around them such that for every center the farthest point in its cluster is at distance 0.5 from the center (Euclidean distance). One such point set can be seen in the left plot of Figure 4. We randomly assign every point and center to one of $m$ groups and set $k_{S_i}$ to the number of centers that have been assigned to group $S_i$. We let . For , the right part of Figure 4 shows boxplots of the approximation factors for our algorithm. As before, the approximation factor achieved by our algorithm is never large: here it is never larger than 2.6. Most interestingly, the approximation factor increases very moderately with $m$.
5.3 Comparison with Baseline Approaches
We compare our algorithm in terms of the cost of an approximate solution to two baseline heuristics for the fair $k$-center problem (2). The first one, referred to as Heuristic A, runs Algorithm 1 on each group separately (with and for group ) and returns the union of the centers obtained for the groups as output. The second one, Heuristic B, greedily chooses centers similarly to Algorithm 1, but only from those groups for which we have not reached the requested number of centers yet. It is easy to see that the approximation factor achieved by these heuristics can be arbitrarily large on some worst-case instances.
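The two baseline heuristics can be sketched in Python as follows. The sketch assumes that there are no initially given centers and that points are hashable and pairwise distinct; the function names and toy data are hypothetical. For brevity, the helper recomputes distances to all current centers instead of maintaining them incrementally.

```python
def farthest_first(candidates, k, d, centers):
    """Greedily pick k elements of candidates, each farthest from the current centers."""
    inf = float("inf")
    centers = list(centers)
    picked = []
    for _ in range(k):
        far = max(candidates, key=lambda s: min((d(s, c) for c in centers), default=inf))
        picked.append(far)
        centers.append(far)
    return picked


def heuristic_a(points_by_group, quotas, d):
    """Run the greedy strategy on every group separately and return the union."""
    out = []
    for g, members in points_by_group.items():
        out += farthest_first(members, quotas[g], d, [])
    return out


def heuristic_b(points_by_group, quotas, d):
    """Greedy choice over all points, restricted to groups whose quota is not yet met."""
    group_of = {s: g for g, members in points_by_group.items() for s in members}
    left = dict(quotas)
    centers = []
    while any(left.values()):
        eligible = [s for s in group_of if left[group_of[s]] > 0 and s not in centers]
        far = farthest_first(eligible, 1, d, centers)[0]
        centers.append(far)
        left[group_of[far]] -= 1
    return centers


# Toy instance (illustrative data only): two groups on the real line.
data = {"a": [0.0, 1.0, 2.0], "b": [10.0, 11.0, 12.0]}
quotas = {"a": 1, "b": 1}
d = lambda x, y: abs(x - y)
summary_a = heuristic_a(data, quotas, d)
summary_b = heuristic_b(data, quotas, d)
```

Both sketches always return the requested number of centers per group; they differ in that Heuristic A ignores the other groups when choosing centers, whereas Heuristic B measures distances to all centers chosen so far.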
Figure 6 shows boxplots of the costs of the approximate solutions returned by our algorithm and the two heuristics for three data sets: the data set in the left plot consists of vertices of a random graph constructed similarly as in the experiments of Figure 3. We set , , , and . The data set in the middle and in the right plot consists of the first 25000 records of the Adult data set (Dheeru & Karra Taniskidou, 2017). We only use its six numerical features (e.g., age, hours worked per week), normalized to have mean zero and standard deviation one, for representing records and use the distance as metric $d$. For the experiment shown in the middle plot, we split the data set into two groups according to the sensitive feature gender (there are 16709 males and 8291 females) and set . For the experiment shown in the right plot, we split the data set into five groups according to the feature race (#White=21391, #Asian-Pac-Islander=775, #Amer-Indian-Eskimo=241, #Other=214, #Black=2379) and set , . In both cases, we let $C_0$ be a subset of randomly chosen records of size . In Figure 10 in Appendix B we present results for other choices of . The two heuristics perform surprisingly well. Although coming without any worst-case guarantees, overall, the cost of their solutions is comparable to the cost of the output of our algorithm. Still, in four out of seven experiments, our algorithm is superior to Heuristic B, which in turn is superior to Heuristic A in all seven experiments.
5.4 Comparison with Unfair Algorithm 1
We compare the cost of the solution produced by our algorithm to the cost of the (potentially) unfair solution provided by Algorithm 1. Of course, we expect the latter to be lower. We consider the case , , and also examine how balanced the numbers of centers from a group in the output of Algorithm 1 are. Figure 5 shows the results, where the data sets and settings equal the ones in the experiments of Figure 6. Similar experiments with different settings are provided in Figure 12 in Appendix B. Remarkably, the costs of the solution produced by our algorithm and Algorithm 1 have the same order of magnitude in all experiments, showing that the price of fairness is small. On the other hand, the output of Algorithm 1 can be highly unfair.
6 Discussion
We considered $k$-center clustering under a fairness constraint that is motivated by the application of centroid-based clustering for data summarization. We presented a simple approximation algorithm with running time only linear in the size of the data set $|S|$ and the number of centers $k$. We proved our algorithm to be a 5-approximation algorithm when $S$ consists of two groups. For more than two groups, our analysis yields an upper bound on the approximation factor that increases exponentially with the number of groups. We do not know whether this exponential dependence is necessary or whether our analysis is loose; in our extensive numerical simulations we never observed a large approximation factor. Besides answering this question, in future work it would be interesting to extend our results to other clustering objectives such as $k$-medoid or $k$-means. It would also be interesting to characterize properties of data sets that guarantee that fast algorithms find an optimal fair clustering.
References
 Agarwal et al. (2018) Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. A reductions approach to fair classification. In International Conference on Machine Learning (ICML), 2018.
 Aggarwal et al. (2010) Aggarwal, G., Panigrahy, R., Feder, T., Thomas, D., Kenthapadi, K., Khuller, S., and Zhu, A. Achieving anonymity via clustering. ACM Transactions on Algorithms, 6(3):49:1–49:19, 2010.
 Angwin et al. (2016) Angwin, J., Larson, J., Mattu, S., and Kirchner, L. ProPublica: Machine bias, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
 Arya et al. (2004) Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., and Pandit, V. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
 Bartholomew et al. (2008) Bartholomew, D. J., Steele, F., Galbraith, J., and Moustaki, I. Analysis of multivariate social science data. Chapman and Hall, 2008.
 Bolukbasi et al. (2016) Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., and Kalai, A. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Neural Information Processing Systems (NIPS), 2016.
 Borgen & Barnett (1987) Borgen, F. H. and Barnett, D. C. Applying cluster analysis in counseling psychology research. Journal of Counseling Psychology, 34(4):456, 1987.
 Buolamwini & Gebru (2017) Buolamwini, J. and Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability, and Transparency (ACM FAT), 2017.
 Cameron & Trivedi (2010) Cameron, A. C. and Trivedi, P. K. Microeconometrics Using Stata, volume 2. Stata Press, 2010.
 Campbell et al. (2008) Campbell, R., Greeson, M. R., Bybee, D., and Raja, S. The co-occurrence of childhood sexual abuse, adult sexual assault, intimate partner violence, and sexual harassment: a mediational model of posttraumatic stress disorder and physical health outcomes. Journal of Consulting and Clinical Psychology, 76(2):194–207, 2008.
 Celis et al. (2018a) Celis, L. E., Huang, L., and Vishnoi, N. K. Multiwinner voting with fairness constraints. In International Joint Conference on Artificial Intelligence (IJCAI), 2018a.
 Celis et al. (2018b) Celis, L. E., Keswani, V., Straszak, D., Deshpande, A., Kathuria, T., and Vishnoi, N. K. Fair and diverse DPP-based data summarization. In International Conference on Machine Learning (ICML), 2018b. Code available on https://github.com/DamianStraszak/FairDiverseDPPSampling.
 Celis et al. (2018c) Celis, L. E., Straszak, D., and Vishnoi, N. K. Ranking with fairness constraints. In International Colloquium on Automata, Languages and Programming (ICALP), 2018c.
 Charikar et al. (2002) Charikar, M., Guha, S., Tardos, E., and Shmoys, D. B. A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65(1):129–149, 2002.
 Chen et al. (2016) Chen, D. Z., Li, J., Liang, H., and Wang, H. Matroid and knapsack center problems. Algorithmica, 75:27–52, 2016.
 Chierichetti et al. (2017) Chierichetti, F., Kumar, R., Lattanzi, S., and Vassilvitskii, S. Fair clustering through fairlets. In Neural Information Processing Systems (NIPS), 2017.
 Cook et al. (1998) Cook, W. J., Cunningham, W. H., Pulleyblank, W. R., and Schrijver, A. Combinatorial Optimization. Wiley, 1998.
 Cormen et al. (2009) Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms. MIT Press, 3rd edition, 2009.
 Cygan et al. (2012) Cygan, M., Hajiaghayi, M., and Khuller, S. LP rounding for k-centers with non-uniform hard capacities. In Symposium on Foundations of Computer Science (FOCS), 2012.
 Dheeru & Karra Taniskidou (2017) Dheeru, D. and Karra Taniskidou, E. UCI machine learning repository, 2017. https://archive.ics.uci.edu/ml/datasets/adult.
 Donini et al. (2018) Donini, M., Oneto, L., Ben-David, S., Shawe-Taylor, J., and Pontil, M. Empirical risk minimization under fairness constraints. In Neural Information Processing Systems (NeurIPS), 2018.
 Eisen et al. (1998) Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–14868, 1998.
 Feldman et al. (2015) Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., and Venkatasubramanian, S. Certifying and removing disparate impact. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2015.
 Ferone et al. (2017) Ferone, D., Festa, P., Napoletano, A., and Resende, M. G. C. A new local search for the p-center problem based on the critical vertex concept. In International Conference on Learning and Intelligent Optimization (LION), 2017.
 Gonzalez (1985) Gonzalez, T. F. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
 Hajiaghayi et al. (2010) Hajiaghayi, M., Khandekar, R., and Kortsarz, G. The red-blue median problem and its generalization. In European Symposium on Algorithms (ESA), 2010.
 Har-Peled (2011) Har-Peled, S. Geometric approximation algorithms. American Mathematical Society, 2011.
 Hardt et al. (2016) Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Neural Information Processing Systems (NIPS), 2016.
 Hastie et al. (2009) Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition, 2009.
 Hesabi et al. (2015) Hesabi, Z. R., Tari, Z., Goscinski, A., Fahad, A., Khalil, I., and Queiroz, C. Data summarization techniques for big data—a survey. In Handbook on Data Centers, pp. 1109–1152. Springer, 2015.
 Hochbaum & Shmoys (1986) Hochbaum, D. S. and Shmoys, D. B. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33(3):533–550, 1986.
 Kay et al. (2015) Kay, M., Matuszek, C., and Munson, S. A. Unequal representation and gender stereotypes in image search results for occupations. In Conference on Human Factors in Computing Systems (CHI), 2015.
 Krishnaswamy et al. (2011) Krishnaswamy, R., Kumar, A., Nagarajan, V., Sabharwal, Y., and Saha, B. The matroid median problem. In Symposium on Discrete Algorithms (SODA), 2011.
 Kulesza & Taskar (2012) Kulesza, A. and Taskar, B. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5:123–286, 2012.
 Li & Svensson (2013) Li, S. and Svensson, O. Approximating k-median via pseudo-approximation. In Symposium on the Theory of Computing (STOC), 2013.
 Mladenović et al. (2003) Mladenović, N., Labbé, M., and Hansen, P. Solving the p-center problem with tabu search and variable neighborhood search. Networks, 42(1):48–64, 2003.
 Rösner & Schmidt (2018) Rösner, C. and Schmidt, M. Privacy preserving clustering with constraints. In International Colloquium on Automata, Languages, and Programming (ICALP), 2018.
 Samadi et al. (2018) Samadi, S., Tantipongpipat, U., Morgenstern, J., Singh, M., and Vempala, S. The price of fair PCA: One extra dimension. In Neural Information Processing Systems (NeurIPS), 2018.
 Schmidt et al. (2018) Schmidt, M., Schwiegelshohn, C., and Sohler, C. Fair coresets and streaming algorithms for fair k-means clustering. arXiv:1812.10854 [cs.DS], 2018.
 Sweeney (2013) Sweeney, L. Discrimination in online ad delivery. Queue, 11(3):10–29, 2013.
 Vazirani (2001) Vazirani, V. Approximation Algorithms. Springer, 2001.
 Zafar et al. (2017) Zafar, M. B., Valera, I., Rodriguez, M. G., and Gummadi, K. P. Fairness constraints: Mechanisms for fair classification. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
Appendix
Appendix A Proofs
Proof of Lemma 1:
It is straightforward to see that Algorithm 1 can be implemented in time . We only need to show that it is a 2-approximation algorithm for (3).
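To make the running-time remark concrete, the following is a minimal Python sketch of a greedy farthest-point strategy with initially given centers. It assumes that Algorithm 1 follows this greedy pattern, which the proof's reference to a greedy choice suggests; the function name `greedy_centers` and its parameters are hypothetical and not taken from the paper. Keeping, for every point, the distance to its closest center so far means each new center costs one pass over the data set.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_centers(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
    given_centers: Sequence[T] = (),
) -> List[T]:
    """Greedily pick k new centers (hypothetical sketch, not the paper's code).

    Each new center is a point that is farthest from all centers
    selected so far, including the initially given ones.
    """
    if k <= 0 or not points:
        return []

    # nearest[i] = distance from points[i] to its closest center so far
    # (infinity if no center exists yet).
    nearest = [
        min((dist(p, c) for c in given_centers), default=float("inf"))
        for p in points
    ]

    chosen: List[T] = []
    for _ in range(k):
        # Greedy choice: a point maximizing the distance to the centers.
        # If there is no center yet, all entries are infinite and the
        # first point is taken.
        i = max(range(len(points)), key=nearest.__getitem__)
        center = points[i]
        chosen.append(center)
        # One pass over the data updates the closest-center distances.
        for j, p in enumerate(points):
            nearest[j] = min(nearest[j], dist(p, center))
    return chosen


# Usage on points of the real line with one given center at 0.
pts = [0, 1, 2, 10, 11, 20]
print(greedy_centers(pts, lambda a, b: abs(a - b), 2, given_centers=[0]))
# -> [20, 10]
```

The sketch evaluates the distance function `(k + len(given_centers)) * len(points)` times in total, so its cost grows linearly in the size of the data set for a fixed number of centers.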
If , there is nothing to show, so assume that . Let be the output of Algorithm 1 and be an optimal solution of (3) with objective value . Let be arbitrary. We need to show that for some . If , there is nothing to show. So assume . If
there exists with and we are done. Otherwise, let and hence . We distinguish two cases:

with :
We have and hence .

with :
There must be , where not both and can be in , and such that
Since and , it follows that .
Without loss of generality assume that in the execution of Algorithm 1, has been added to the set of centers after has been added. In particular, we have and for some . Due to the greedy choice in Line 5 of the algorithm and since has not been chosen by the algorithm, we have
Proof of Theorem 1:
Again, it is easy to see that Algorithm 2 can be implemented in time . We need to prove that it is a 5-approximation algorithm, but not a approximation algorithm for any :

Algorithm 2 is a 5-approximation algorithm:
Let be the optimal value of the fair problem (2) and be the optimal value of the unfair problem (3). Clearly, . Let with and be an optimal solution to the fair problem (2) with cost and be the centers returned by Algorithm 2. It is clear that Algorithm 2 returns many elements from and many elements from and hence with and . We need to show that