1 Introduction
As more and more decisions are automated, there has been an increasing interest in incorporating fairness aspects in algorithms by design. This applies in particular to clustering problems, where considerable attention has recently been dedicated to developing and studying various models of fair clustering, see, e.g., [CKLV17], [BIPV19], and [BCCN21].
In this paper, we focus on the so-called colorful setting, which was introduced in [BIPV19]. In colorful clustering, each client is a member of certain subgroups and every clustering is required to cover at least a given number of clients of each subgroup. This may be considered under various clustering objectives (like median and mean), though only the center case has been studied so far.
Colorful clustering is an appealing notion as it is a natural generalization of the robust (or outlier) setting, where there is only a single group which every client belongs to. Various clustering problems have been studied in depth in the robust setting, see, e.g., [CN19], [HPST19], and [BCCN21].
While the robust setting is amenable to a variety of well-known and basic algorithmic techniques, the only constant-factor approximations for the colorful setting, which imposes multiple covering constraints leading to more balanced clusterings, are based on significantly more sophisticated techniques, tailored specifically to those settings. More precisely, three distinct techniques have been successful at achieving constant-factor approximations in the context of colorful center clustering, namely the combinatorial approach of [JSS21], the round-or-cut-based approach of [AAKZ21], and the iterative greedy reductions of [IV21].
However, these approaches do not immediately generalize to variants with constraints on the facilities, even for the common Matroid Center or Knapsack Center clustering variants. On the other hand, techniques for the Knapsack and Matroid Center problems in the robust setting (see [CN19] and [HPST19]) do not easily extend to multiple covering constraints.
Thus, prior to this work, no approaches were known that lead to constant-factor approximations for colorful variants of otherwise well-studied center problems like Matroid Center or Knapsack Center. Filling this gap is the goal of this paper.
1.1 Our contributions
Our main contribution is a partitioning procedure which leads to a general reduction of colorful center clustering problems with constraints on the facilities to a significantly simpler multidimensional covering problem (see Theorem 3). This reduction comes at the cost of a constant factor depending on the number of colors.
It is inspired by recent insights of [IV21] on decoupling multiple covering constraints and iteratively applying a greedy partitioning procedure of [CKMN01]. By taking into account multiple colors at the same time, our framework gives an improved way of dealing with multiple covering constraints while also becoming more versatile. Our framework also extends and simplifies ideas of the approximation algorithm for Robust Matroid Center of [CLLW16].
We start by introducing the Colorful Supplier problem, which formalizes colorful center problems with (down-closed) constraints on the facilities.
Definition 1 (Colorful Supplier problem).
Let be a finite metric space on a set of clients and facilities , let be a down-closed family of subsets of , and let . Moreover, we are given for each :

a unary encoded weight/color function , and

a covering requirement .
The Colorful Supplier problem asks to find the smallest radius together with a set such that for all . We use the common notation for functions and , as well as for the ball of radius around point . Moreover, we use the shorthand for sets .
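To make the covering condition of Definition 1 concrete, the following minimal Python sketch checks whether a candidate set of facilities covers enough weight of every color at a given radius. It assumes that a client counts as covered when it lies within the radius of at least one chosen facility. The names `covers_requirements`, `dist`, `weights`, and `requirements` are illustrative choices rather than notation from this paper, and the test of whether the facility set belongs to the down-closed family is deliberately left out.

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence


def covers_requirements(
    clients: Iterable[Hashable],
    facilities: Iterable[Hashable],
    dist: Callable[[Hashable, Hashable], float],
    weights: Sequence[Dict[Hashable, int]],
    requirements: Sequence[int],
    radius: float,
) -> bool:
    """Return True if the balls of the given radius around the chosen
    facilities cover weight at least requirements[i] for every color i."""
    chosen = list(facilities)
    # Assumption: a client is covered if it is within `radius` of some chosen facility.
    covered = [c for c in clients if any(dist(f, c) <= radius for f in chosen)]
    return all(
        sum(w.get(c, 0) for c in covered) >= m
        for w, m in zip(weights, requirements)
    )


# Hypothetical toy instance on the real line with two colors.
clients = [0, 1, 5, 6]
dist = lambda a, b: abs(a - b)
weights = [{0: 1, 1: 1}, {5: 1, 6: 1}]  # disjoint supports, one dict per color
requirements = [2, 1]
```

Calling `covers_requirements(clients, [0.5, 5], dist, weights, requirements, 0.5)` evaluates the two facilities at positions 0.5 and 5 with radius 0.5.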
We note that it is also common to define colorful center versions in an unweighted way (thus not using weight functions ) by assigning to each client a subset of the many colors and requiring that, for each color, many clients of that color are covered. The definition we use clearly captures this case (and can easily be seen to be equivalent). This connection also explains why the weights are assumed to be given in unary encoding.
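As a minimal sketch of this correspondence, the following Python snippet turns an unweighted instance, in which every client carries a set of colors, into 0/1 weight functions with one function per color. The function name `colors_to_weights` and the dictionary-based layout are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def colors_to_weights(
    client_colors: Dict[Hashable, Set[int]], num_colors: int
) -> List[Dict[Hashable, int]]:
    """For each color i, build a weight function that is 1 on clients
    having color i and (implicitly) 0 on all other clients."""
    return [
        {client: 1 for client, colors in client_colors.items() if i in colors}
        for i in range(num_colors)
    ]


# Hypothetical example: client "b" belongs to both colors, client "c" to none.
client_colors = {"a": {0}, "b": {0, 1}, "c": set()}
weights = colors_to_weights(client_colors, 2)
```

Covering a given number of clients of color i then amounts to covering at least that much weight under the i-th weight function.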
Following common terminology in the literature, when is the family of independent sets of a matroid or feasible sets with respect to a knapsack constraint, we call the problem Colorful Matroid Supplier and Colorful Knapsack Supplier, respectively.
Our main contribution is a general reduction of Colorful Supplier to an auxiliary problem, which we call CoverPromise (CP). CP, which is formally defined below, is a multidimensional cover problem with the added promise that highly structured solutions exist. The promise is key, as the problem without the promise can be thought of as a multidimensional max-cover problem.
Definition 2 (CoverPromise (CP)).
In the CoverPromise problem (CP), we are given a set family over a finite universe , a family of feasible subsets of , and many unary encoded weight functions each with a requirement (for ). The task is to find a feasible family of sets such that
The promise is that there exists a family and a way to pick for each a single representative such that
In words, the promise is that there is a solution that picks a family of sets and the requirements can be fulfilled by only using a single representative in each set. However, the solution we are allowed to build is such that the weight of all elements covered by our sets is counted instead of just that of a single representative per set.
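The difference between the two ways of counting weight can be illustrated with a small Python sketch. It assumes that the weight of a solution is measured on the union of the chosen sets, with each element counted once, and that the promised weight is measured on the set of chosen representatives. All identifiers are hypothetical, and feasibility of the chosen family is not checked.

```python
def covered_weight(chosen_sets, weight):
    """Weight of all elements covered by the chosen sets; each element counts once."""
    covered = set()
    for s in chosen_sets:
        covered |= set(s)
    return sum(weight.get(u, 0) for u in covered)


def representative_weight(representatives, weight):
    """Weight obtained when every chosen set contributes only its representative.
    `representatives` maps each chosen set to one of its elements."""
    for s, u in representatives.items():
        if u not in s:
            raise ValueError("a representative must belong to its set")
    return sum(weight.get(u, 0) for u in set(representatives.values()))


def meets_requirements(value_per_color, requirements):
    """Check the requirement of every color."""
    return all(v >= m for v, m in zip(value_per_color, requirements))


# Hypothetical toy instance with two colors.
sets = [frozenset({"a", "b"}), frozenset({"c"})]
weights = [{"a": 1, "b": 2}, {"c": 1}]
requirements = [2, 1]
reps = {sets[0]: "b", sets[1]: "c"}
```

In this toy instance the representatives alone already meet both requirements, so counting every covered element can only help.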
We are now ready to state our main reduction theorem, which, as we discuss later, readily leads, for a constant number of colors , to the first constant-factor approximations for Colorful Matroid Supplier for linear matroids and Colorful Knapsack Supplier. Our reduction to CP comes at the cost of an factor in the approximation guarantee.
Theorem 3.
For any family of down-closed set systems, we have that if CP can be solved efficiently for any in that family, then there is an approximation algorithm for Colorful Supplier for any in the family. When talking about the same set system both in the context of CP and Colorful Supplier, we consider to be the same set system in both settings even if the ground sets are different, as long as there is a one-to-one relation between the ground sets mapping sets of one system to sets of the other one and vice versa.
While the dependence of the approximation factor on may be undesirable, the algorithmic barriers for prior approaches remain even when and, for hardness reasons, we do not expect approximation algorithms to exist at all when grows too quickly. In particular, [AAKZ21] showed that even a simple version of colorful clustering, where any centers can be chosen, does not admit an approximation algorithm when under the Exponential Time Hypothesis. Thus, in what follows, we restrict ourselves to .
We now discuss implications of Theorem 3 for Colorful Matroid Supplier for linear matroids and Colorful Knapsack Supplier. When is the family of independent sets of a linear matroid, we show how CP can be solved with techniques relying on an efficient randomized procedure for the Exact Weight Basis (XWB) problem for linear matroids. In XWB, one is given a matroid on a ground set with unary encoded weights and a target weight; the goal is to find a basis of the matroid of weight equal to the target weight. The technique in [CGM92] to solve XWB for linear matroids needs an explicit linear representation of the linear matroid. We make the common assumption that such a representation is given whenever we make a statement about linear matroids. Linear matroids include as special cases many other well-known matroid classes, including uniform matroids, and more generally partition and laminar matroids, graphic matroids, transversal matroids, gammoids, and regular matroids.
Theorem 4.
For and being the independent sets of a linear matroid, CP can be solved efficiently by a randomized algorithm. Hence (by Theorem 3), there is a randomized approximation algorithm for Colorful Matroid Supplier for linear matroids.
The restriction to linear matroids and the fact that the algorithm is randomized are not artifacts of our framework. Indeed, by an observation in [JSS21], rephrased for matroids below, we do not only have that XWB implies results for Colorful Matroid Supplier (which will follow from our reduction), but also a reverse implication. More precisely, even for Colorful Matroid Supplier, deciding whether there is a solution of radius zero requires being able to solve XWB on that matroid. However, it is unknown whether XWB can be solved efficiently on general matroids, and the only technique known for XWB on linear matroids is inherently randomized [CGM92]. (Derandomization is a longstanding open question in this context.)
Lemma 5 (based on [JSS21]).
If there is an efficient algorithm for deciding whether Colorful Matroid Supplier with respect to a given class of matroids admits a solution of radius zero, then XWB can be solved efficiently on the same class of matroids.
Note that if we cannot decide the existence of a radius zero solution, then no approximation algorithm with any finite approximation guarantee can exist.
For the case where are the feasible sets for a knapsack problem, one can use standard dynamic programming techniques to see that CP can be solved efficiently, which readily leads to a approximation for Colorful Knapsack Supplier.
Whereas our reduction given by Theorem 3 is broadly applicable and readily leads to first constant-factor approximations for Colorful Supplier problems, it remains open whether and in which settings a dependence of the approximation factor on the number of colors is necessary. We make first progress toward this question for Colorful Knapsack Supplier, where we show how techniques from [JSS21] can be modified and extended to give a approximation (independent of the number of colors).
Theorem 6.
For , there is a approximation algorithm for Colorful Knapsack Supplier.
Our technical contribution here lies in handling the knapsack constraint in this approach; modifying the algorithm of [JSS21] to the supplier setting and to weighted instances is straightforward. In fact, their algorithm can be seen to give a approximation even for Colorful Supplier, which is tight in light of a hardness result in [CKMN01], namely that it is hard to approximate Robust center with forbidden centers to within . This remains the strongest hardness result even for Colorful Supplier problems.
1.2 Organization of this paper
Our main reduction, Theorem 3, is based on what we call partitions, which is a way to judiciously partition the clients into parts that we want to cover together. We introduce partitions in Section 2 and show how the existence of certain strong partitions implies Theorem 3. In Section 3, we show how our reduction framework can be used to obtain first constant-factor approximations for Colorful Matroid Supplier for linear matroids (thus showing Theorem 4) and Colorful Knapsack Supplier. Finally, in Section 4 we prove existence of strong partitions. The proof of Lemma 5 and our approximation for Colorful Knapsack Supplier, i.e., the proof of Theorem 6, are presented in Appendix A and Appendix B, respectively.
2 Reducing to CP through partitions
Consider a Colorful Supplier problem on a metric space with weights for and covering requirements for . An partition is a partition of the clients into parts of small diameter, each of which we consider in our analysis to be either fully covered or not covered at all. The key property of an partition is that, if our instance admits a radius solution, then there is a radius solution where we allow each center to cover only a single part of the partition. It is the existence of such highly structured solutions that we exploit to design approximation algorithms.
A crucial property of partitions is that they neither depend on nor the covering requirements , but only on the metric space and the weight functions, which we call a colorful space for convenience.
Definition 7 (colorful space ).
A colorful space consists of

a metric space , and

color functions for .
We assume for convenience that the supports of the color functions, i.e., for , are pairwise disjoint. One can reduce to this case without loss of generality by co-locating copies of clients. We are now ready to formally define the notion of partition.
Definition 8 (partition).
Let be a colorful space and . A partition is an partition if

, and

for any , there exists a subfamily and injection such that

, where, for any set and , we use the shorthand , and

.

To connect partitions to colorful clustering problems, think of as centers of a Colorful Supplier problem that satisfy the covering requirements with radius . The definition of an partition then implies that there is a subset of the parts such that (i) for each there exists an element such that any client in has distance at most from , which follows from properties 1 and 2a of the definition, and (ii) the clients in cover as much as in each color. Thus, the set of facilities satisfies the covering requirements with respect to the radius , and, furthermore, is feasible because and is down-closed. In short, is an approximate solution to the Colorful Supplier problem. Hence, to obtain an approximation, the problem reduces to deciding which of the parts of to cover. A key simplification we gain from this connection is that the client sets in are non-overlapping because is a partition, which we will heavily exploit later to design our algorithms.
The key structural result of our work is to show that partitions with constant (for a fixed ) exist and can also be constructed efficiently, which is summarized below.
Lemma 9.
For every colorful space and , one can construct in polynomial time a partition. As we highlight later, a more careful analysis of our approach allows for a slight improvement in the constant factor, leading to the construction of partitions. However, in the interest of simplicity, we present a simpler analysis that shows the bound claimed in the lemma.
We defer the proof of Lemma 9 to Section 4, and first show how it implies our main reduction theorem, Theorem 3, and how this reduction readily leads to approximations for Colorful Matroid Supplier for linear matroids and Colorful Knapsack Supplier.
Proof of Theorem 3.
Consider an instance of Colorful Supplier on a colorful space . We can guess the radius of an optimal solution to the problem. This can be achieved by considering all pairwise distances between facilities and clients , repeating the steps below for each guess and only considering the best output (and discarding outputs where the procedure fails). Hence, assume that is the optimal radius from now on.
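The guessing step can be sketched in Python as follows. The callback `try_radius` is a hypothetical stand-in for the remaining steps of the proof; it is assumed to return a solution for a given guess or `None` if the procedure fails. Since candidate radii are tried in increasing order, the first success corresponds to the best output.

```python
def candidate_radii(facilities, clients, dist):
    """All facility-client distances, sorted and without duplicates."""
    return sorted({dist(f, c) for f in facilities for c in clients})


def best_over_guesses(facilities, clients, dist, try_radius):
    """Run the (hypothetical) procedure `try_radius` for each guess and
    return the smallest guess for which it succeeds, with its output."""
    for r in candidate_radii(facilities, clients, dist):
        result = try_radius(r)
        if result is not None:  # discard guesses where the procedure fails
            return r, result
    return None


# Hypothetical toy instance on the real line.
facilities = [0, 10]
clients = [1, 2, 12]
dist = lambda a, b: abs(a - b)
```

The number of guesses is at most the number of facility-client pairs, so the overhead is polynomial.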
By Lemma 9, we can efficiently construct an partition of for . Consider the instance with universe , family of sets
The family of feasible subsets of is the same as when identifying with the element . To make this relation explicit, if we denote by the family of feasible subsets, then some subset of , say where , is in if and only if . Moreover, the weights and coverage thresholds are inherited from those of the given Colorful Supplier problem; formally, for , the th weight of is given by .
To make sure that this indeed leads to a CP problem, we have to verify that the promise holds. Thus, let be a solution to the given Colorful Supplier problem for radius , which exists because we assume that was guessed correctly. As is an partition of , there is a subfamily and injection satisfying property 2 of Definition 8. We claim that a solution fulfilling the promise is given by choosing
and setting as representative element the element , where . Note that because and is down-closed, we indeed have . Furthermore, because the injection satisfies , we have , as desired. Moreover,
where the first inequality follows because fulfills the second property of Definition 8, and the last inequality is a consequence of being centers that are a radius solution to the given Colorful Supplier problem. Hence, the promised solution exists.
Thus, we can compute a CP solution , which can be written as for some . We claim that is a solution to the given Colorful Supplier problem with radius , which finishes the proof. This follows from the fact that is a CP solution, and that, for any , each client in has distance at most from because is an partition. Hence, the clustering solution with centers and radius covers all clients in
and the weight (for any ) that it covers is at least
where the equality uses that the ground set consists of sets that are disjoint, and the inequality holds because is a solution to CP. Thus, all coverage requirements are fulfilled by the clustering with centers and radius , as desired. ∎
3 Applications of our reduction framework
We now discuss implications of our reduction framework, Theorem 3, to Colorful Matroid Supplier for linear matroids and Colorful Knapsack Supplier.
3.1 Colorful Matroid Supplier
To apply our reduction framework to Colorful Matroid Supplier for linear matroids, we have to solve CP when are the independent sets of a linear matroid. We show how this problem can be reduced to XWB in a suitably defined matroid. More precisely, we use a reduction to the Exact Weight Independent Set (XWI) problem for matroids. This problem is identical to XWB except that an independent set with the desired target weight needs to be returned, instead of a basis. However, XWI easily reduces to XWB on linear matroids, by adding zero weight copies of the elements.
This reduction relies on Rado matroids, which are a way to construct a matroid from another one (see, e.g., [welsh2010matroid, Section 8.2]). This construction of Rado matroids is also called the induction of a matroid by a bipartite graph. It relies on the notion of a system of representatives, where, for a finite universe and a set system , a system of representatives of is any set with for . In words, a system of representatives is obtained by replacing each set in by an element in that set (its representative). (Note that an element can be chosen more than once as a representative, but, as defined above, only appears once in the system of representatives.)
Definition 10 (Rado matroid).
Let be a finite universe, be some set system, and let be a matroid. The Rado matroid induced by is a matroid on the ground set with independent sets
A proof that a Rado matroid is indeed a matroid can be found, e.g., in [welsh2010matroid, Section 8.2]. We will reduce to XWI on a Rado matroid obtained from a linear matroid. For this, we need that also the Rado matroid we obtain is linear and, moreover, that an explicit linear representation of it can be found efficiently, which is the case due to a result from [PW70].
Lemma 11 (see Theorem 3 of [PW70]).
For a set family and a linear matroid , the Rado matroid induced by is a linear matroid. Moreover, given a linear representation of , one can find a linear representation of in time polynomial in , , and the size of the linear representation of .
We are now ready to show that CP can be solved efficiently for linear matroids, which implies Theorem 4.
Lemma 12.
CP can be solved efficiently when is the family of independent sets of a linear matroid.
Proof.
We recall that we are given a CP instance, which defines a set system over a finite universe , and a family such that is a linear matroid. Let be the Rado matroid induced by . is a linear matroid by Lemma 11 and we can obtain a linear representation of in polynomial time. The promise of CP implies the existence of an independent set of satisfying the covering requirements, i.e.,
(1) 
To solve CP, we guess, for each color , the weight that covers. Note that is at most , which, due to the unary encoding of , is polynomially bounded in the input. Hence, the guessing of the , for , can be performed in time , which is polynomially bounded because .
We now determine an independent set in with for each . This can be achieved by encoding all many (unary encoded) weight functions for into a single one and then solving an appropriate XWI problem with respect to . More precisely, for an element , we obtain a new single weight whose first bits represent the weight , the next bits the weight , and so on. Because and all have unary encoding, this leads to combined weights whose unary encoding is polynomially bounded. Analogously, we encode the guessed weights for into a single one . We now solve XWI on with weights and target weight . As is linear, this is possible by a randomized algorithm in time pseudopolynomial in the total weight [CGM92]. Moreover, because the weights are unary encoded in our setting, this implies a polynomial running time as desired.
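The weight-combination step can be illustrated with a small Python sketch. Instead of blocks of bits, it uses digits in a common base that exceeds the total weight of every color, which serves the same purpose: sums of encoded weights never carry between coordinates, so a set of distinct elements hits the encoded target exactly when it hits every individual target. The names `encode`, `decode`, and `base` are assumptions made for this illustration.

```python
def encode(vector, base):
    """Pack a vector of non-negative integers (each < base) into one integer;
    the first coordinate becomes the least significant digit."""
    code = 0
    for value in reversed(vector):
        code = code * base + value
    return code


def decode(code, base, length):
    """Inverse of `encode` for vectors of the given length."""
    vector = []
    for _ in range(length):
        code, digit = divmod(code, base)
        vector.append(digit)
    return tuple(vector)


# Hypothetical per-element weight vectors for two colors.
element_weights = {"a": (1, 2), "b": (2, 0), "c": (0, 3)}
# A base larger than the total weight of each color rules out carries.
base = 1 + max(sum(w[i] for w in element_weights.values()) for i in range(2))
combined = {u: encode(w, base) for u, w in element_weights.items()}
```

With such combined weights, a single exact-weight query with target `encode(targets, base)` stands in for one exact-weight condition per color.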
Let be a solution of this XWI problem, which must exist for the correct guess of the because of the promised solution . being independent in implies that it is a system of representatives for some independent set of . Such a set can be found through matroid intersection. More precisely, it is known that the minimal (inclusion-wise) sets such that is a system of representatives for form the bases of a matroid , for which an efficient independence oracle can be obtained. (See [welsh2010matroid, Section 7.3].) Hence, the desired set can be obtained by finding a basis of that is independent in , which can be computed through matroid intersection algorithms. The set is the solution of CP that we return. Because , the set fulfills the covering requirements due to (1). ∎
3.2 Colorful Knapsack Supplier
To showcase the versatility of our reduction, we now show how it implies an approximation for Colorful Knapsack Supplier, by discussing an efficient way to solve CP when are the feasible solutions to a knapsack constraint. Even though there is a stronger (and more sophisticated) approximation result for this problem (as stated in Theorem 6), this application is a nice example of how one can readily obtain constant-factor approximations through our reduction technique combined with known methods; in this case, by solving CP through a standard dynamic programming approach.
Lemma 13.
Let be the feasible sets of a knapsack constraint, i.e., for some and budget . Then CP can be solved efficiently.
Proof.
Recall that the problem to be solved defines a family over a finite universe , and a family , which is defined by a knapsack constraint, i.e., . We define the following weight function on :
In words, corresponds to the cost of the cheapest set in that covers . Consider the following binary program, which can be solved efficiently by standard dynamic programming techniques due to the unary encoding of the weights for (see, e.g., [AAKZ21] for details):
We compute an optimal solution to the above binary program. Let . For each , let be a set of minimum cost that contains ; hence, . We claim that is a solution to CP. Because fulfills the constraints of the binary program, we have that fulfills the covering requirements. It remains to show that it fulfills the knapsack constraint, i.e., its cost is at most . This reduces to showing that the optimal value of the binary program is at most . We claim that this holds because of the promise of CP. Indeed, the promise guarantees that there is and a system of representatives for such that for . Hence, setting for all , and setting all other coordinates of to zero, is a solution to the binary program which has objective value at most . ∎
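A dynamic program of the kind used in this proof can be sketched in Python as follows. The sketch assumes that the binary program selects universe elements so as to minimize the total cost of their cheapest containing sets, subject to covering the required weight in every color. States are vectors of covered weight, capped at the requirements. The identifiers `cheapest_set_cost` and `min_cost_cover` are hypothetical.

```python
def cheapest_set_cost(universe, sets, cost):
    """For each element, the cost of the cheapest set containing it.
    `sets` maps a set name to its elements; `cost` maps a set name to its cost."""
    return {
        u: min(cost[name] for name, members in sets.items() if u in members)
        for u in universe
    }


def min_cost_cover(elements, kappa, weights, requirements):
    """Minimum total kappa-cost of a subset of elements whose weight in
    every color i is at least requirements[i]; returns (cost, subset) or None."""
    start = tuple(0 for _ in requirements)
    best = {start: (0, [])}
    for u in elements:
        updated = dict(best)
        for state, (c, chosen) in best.items():
            # Cap at the requirement: surplus weight never matters.
            nxt = tuple(
                min(m, s + w.get(u, 0))
                for s, w, m in zip(state, weights, requirements)
            )
            if nxt not in updated or c + kappa[u] < updated[nxt][0]:
                updated[nxt] = (c + kappa[u], chosen + [u])
        best = updated
    return best.get(tuple(requirements))


# Hypothetical toy instance with two colors.
sets = {"S1": {"a", "b"}, "S2": {"b"}, "S3": {"c"}}
cost = {"S1": 3, "S2": 1, "S3": 2}
kappa = cheapest_set_cost(["a", "b", "c"], sets, cost)
weights = [{"a": 2, "b": 1}, {"a": 1, "c": 1}]
requirements = [2, 1]
```

The number of states is at most the product of the requirements plus one, which is polynomial for a constant number of colors and unary encoded weights.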
4 Existence and construction of strong partitions
We now prove our key structural result, Lemma 9, which guarantees the existence and efficient constructability of partitions for colorful spaces. Our proof proceeds by induction on . The base case, i.e., , holds because the family is a grouping on every colorful space . The key step is extending an partition of a colorful space to a suitable partition of a colorful space.
To this end, we extend ideas on the greedy algorithm of [CKMN01], which was originally introduced to deal with a single color center problem. More precisely, to augment a partition of a colorful space, we apply a greedy subroutine on the points of color . A careful construction and analysis (which takes into account the earlier colors) then shows that this yields a partition of the colorful space. Our refined charging scheme improves on a decoupled analysis of [IV21] (which gives an approximation algorithm for Colorful Center).
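For orientation, the single-color greedy idea that this construction builds on can be sketched as follows. This is a simplified illustration for one color and not Algorithm 1 of this paper: it repeatedly picks the facility whose ball of the guessed radius covers the most uncovered weight and then removes all clients in an expanded ball around it. The expansion factor of 3 and all identifiers are assumptions made for the sketch.

```python
def greedy_balls(facilities, clients, dist, weight, radius, k, expansion=3):
    """Pick up to k facilities greedily. Returns a list of pairs
    (facility, clients removed when that facility was picked)."""
    uncovered = set(clients)
    picks = []
    for _ in range(k):
        if not uncovered:
            break

        def gain(f):
            # Uncovered weight inside the ball of the guessed radius around f.
            return sum(weight[c] for c in uncovered if dist(f, c) <= radius)

        best = max(facilities, key=gain)
        # Remove everything in the expanded ball around the chosen facility.
        removed = {c for c in uncovered if dist(best, c) <= expansion * radius}
        picks.append((best, removed))
        uncovered -= removed
    return picks


# Hypothetical toy instance on the real line with unit weights.
facilities = [0, 10]
clients = [0, 1, 2, 3, 10, 11]
weight = {c: 1 for c in clients}
dist = lambda a, b: abs(a - b)
```

The removed client sets are pairwise disjoint by construction, which is the feature that a partition-based analysis relies on.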
The lemma below formalizes the induction step.
Lemma 14.
Given a partition for a colorful space, one can efficiently construct a partition for any colorful space obtained by adding one color to the colorful space.
Proof.
Let be a colorful space, and let be the first colors. (Hence, we omitted the last color.) Let and , and let be a partition of the colorful space . Note that we assumed that the supports of the weights are disjoint. Hence, for . Moreover, without loss of generality, we assume that for every client , there is a facility with . All clients not fulfilling this condition can be deleted from the instance without changing the statement as they can never be covered by any radius solution. Indeed, a partition of the clients of this purged instance can simply be extended to a partition of all clients by adding the deleted clients as singleton sets to the partition.
We now prove that Algorithm 1 returns an partition of , where . Algorithm 1 goes through all facilities in a well-chosen order and iteratively builds new parts consisting of parts in together with a subset of . (See Fig. 2 for an illustration of this procedure.)
First, observe that is a partition. It clearly covers all clients as no client is farther than distance away from its nearest facility, and we consider all facilities. Moreover, the sets in are disjoint by construction. Now, observe that any has small diameter, because
where the second inequality holds because for any due to the following. Consider . If , then we even have . Otherwise, let be the set in the partition containing . Note that implies . Hence, , where we use , because , and , which holds because is an partition. Thus, property 1 of the definition of an partition (Definition 8) is fulfilled for .
It remains to show that property 2 holds for a given selection . To this end, we use that is an partition, which implies that there is a subfamily and a corresponding injection fulfilling property 2 of Definition 8 for the colorful space . In the following we construct and such that property 2 of Definition 8 is satisfied for and . At the same time when constructing , we employ a careful charging argument that makes sure that , i.e., that the constructed covers at least as much as of color . For the remaining colors, we show that the new selection includes all of ; formally, we show that for each , there is an such that . This, as well as for all and injectivity of , are proved later.
For , we define
to be the clients that are “uncovered” at step . By the way Algorithm 1 selects in each iteration , we have
which we call the greediness property.
We now describe the construction of and the charging scheme in detail. We successively add sets to , where the sets are considered in increasing order of their index. When adding a set to , we also perform two further steps: (i) we identify an element and set , and (ii) we mark as assigned to make sure that we never assign it again in the future (as needs to be an injection). For convenience, for and , we write for performing these steps, i.e., adding to , setting to , and marking as assigned.
The charging argument charges the coverage of color of against the coverage in . Whenever we charge a set against some subset , we make sure that . Algorithm 2 shows our procedure to construct both and the desired injection together with the charging argument. (See also Fig. 2.)
We start by showing that is an injection. Suppose is assigned using Rule 1 or 2. Then was not assigned so far as we only assign unassigned facilities. Now suppose is assigned using Rule 3. We claim that is not assigned so far. Assume for the sake of deriving a contradiction that it was assigned in a previous iteration . It cannot have been assigned by Rule 3, since is injective. So assume it was assigned by Rule 1 or 2. Hence, satisfies . This implies that and thus , which contradicts .
Moreover, fulfills property 2a of a partition because of the following. Let and ; we have to show that . Because , we called at some point during Algorithm 2 the procedure . In both Rule 1 and Rule 2 we have , which implies that contains a client in , as desired. If was called in Rule 3, then we have , which implies by the fact that is an partition.
It remains to show that fulfills property 2b of an partition. We first consider the last color (color ) and show . To this end, observe that the charging indeed charges clients in against clients in . We allow for charging a client in against more than one client in . However, no client in gets charged against more than once because in iteration we only charge against clients in , and the sets form a partition of . Also note that we always charge clients of against clients of of at least the same weight. This is true whenever charging happens in Rule 2 or Rule 3, because of the greediness property, and holds trivially for all other charging operations, which only charge clients against themselves. To conclude that , it remains to observe that all of gets charged against something.
To this end, fix a facility . Consider an iteration of Algorithm 2 such that intersects . We claim that for each such iteration, either is called, or is charged. To prove the claim, suppose is not assigned in iteration . By Algorithm 2, either Rule 1 or Rule 2 must have applied in this iteration , as satisfies the condition of Rule 2. Thus Assign was called on and all points in have been charged. Now suppose the first case applies, i.e., is called for some . Then all of is charged (and is already charged by the second case). If the first case never applies, then all of is charged by the second case since is empty. Hence, all of is charged, as desired.
To see that property 2b of Definition 8 is fulfilled also for all colors , observe that Rule 3 makes sure that any component that was in will still be selected in . Thus, for all colors .
It remains to show that . If Rule 1 or Rule 2 is applied, this is satisfied as there is a client ; because by construction, we have . If Rule 3 is applied for , we also have , where the last inequality follows from being an partition. ∎
Proof of Lemma 9.
The proof follows by induction on . For the induction start, consider . The set