Finding a Collective Set of Items: From Proportional Multirepresentation to Group Recommendation

02/13/2014 ∙ by Piotr Skowron, et al.

We consider the following problem: There is a set of items (e.g., movies) and a group of agents (e.g., passengers on a plane); each agent has some intrinsic utility for each of the items. Our goal is to pick a set of K items that maximizes the total utility derived by all the agents (in our example, we pick K movies to put on the plane's entertainment system). However, the actual utility that an agent derives from a given item is only a fraction of its intrinsic one, and this fraction depends on how the agent ranks the item among the chosen, available ones. We provide a formal specification of the model and give concrete examples and settings where it is applicable. We show that the problem is hard in general, but we also obtain a number of tractability results for its natural special cases.


1 Introduction

A number of real-world problems consist of selecting a set of items for a group of agents to jointly use. Examples of such activities include picking a set of movies to put on a plane’s entertainment system, deciding which journals a university library should subscribe to, deciding what common facilities to build, or even voting for a parliament (or other assembly of representatives). Let us consider some common features of these examples.

First, there is a set of items and a set of agents; each agent has some intrinsic utility for each of the items (e.g., this utility can be the level of appreciation for a movie, the average number of articles one reads from a given issue of a journal, the expected benefit from building a particular facility, or the feeling, measured in some way, of being represented by a particular politician). We use the term ‘item’ in the most neutral possible way: items may be candidates running for an election, or movies, or possible facilities, and so on.

Second, typically it is not possible to provide all the items to the agents and we can only pick some of them, say $K$ (a plane’s entertainment system fits only a handful of movies, the library has a limited budget, only several sites for the facilities are available, the parliament has a fixed size).

Third, the intrinsic utilities for items extend to sets of items in such a way that the utility derived by an agent from a given item may depend on the rank of this item (from the agent’s point of view) among the selected ones. Extreme examples include the case where each agent derives utility from his or her most preferred item only (e.g., an agent will watch his or her favorite movie only, will read/use the favorite journal/favorite facility only, will feel represented by the most appropriate politician only), from his or her least preferred item only (say, the agent worries that the family will force him or her to watch the worst available movie), or derives $1/K$ of the utility from each of the available items (e.g., the agent chooses the item, say, a movie, at random). However, in practice one should expect much more complicated schemes (e.g., an agent watches the top movie certainly, the second one probably, the third one perhaps, etc.; or an agent is interested in having at least a certain number of interesting journals in the library; or an agent feels represented by some top members of the parliament, etc.).

The goal of this paper is to formally define a model that captures all the above-described scenarios, provide a set of examples where the model is applicable, and provide an initial set of computational results for it in terms of efficient algorithms (exact or approximate) and computational hardness results (NP-hardness and inapproximability results).

Our work builds upon, generalizes, and extends quite a number of settings that have already been studied in the literature. We provide a deeper overview of this research in Section 8; here we only mention the two most directly related lines of work. First, our model where the agents derive utility from their most preferred item among the selected ones directly corresponds to winner determination under the Chamberlin–Courant voting rule [18, 48, 7] (it is also very deeply connected to the model of budgeted social choice [39, 47, 40]) and is, in a certain formal sense, a variant of the facility location problem. Second, the case where each agent derives the same fraction of the utility from each item is, in essence, the same as $K$-winner range voting (or $K$-winner Borda [21]); the assumption that agents enjoy all the items they get equally is also key in the Santa Claus problem [6] and in the problem of designing optimal picking sequences [14, 10, 33].

The paper is organized as follows. First, in Section 2 we discuss several important modeling choices and provide the formal description of our model. Then, in Section 3, we discuss the applicability of the model in various scenarios; specifically, we show a number of examples that lead to particular parameter values of our model. We give an overview of our results in Section 4 and then, in Sections 5, 6, and 7, we present these results formally. In Section 5 we present results regarding the complexity of computing exact solutions for our model. In the next two sections we discuss computing approximate solutions: first without putting restrictions on agents’ utilities (Section 6) and then for what we call non-finicky utilities (Section 7). Intuitively put, under non-finicky utilities the agents are required to give relatively high utility values to a relatively large fraction of the items. We believe that the notion of non-finicky utilities is one of the important contributions of this paper. We discuss related work in Section 8 and conclude in Section 9.

2 The Model

In this section we give a formal description of our model. However, before we move on to the mathematical details, let us explain and justify some high-level assumptions and choices that we have made.

First, we assume that the agents have separable preferences. This means that the intrinsic utility of an object does not depend on which other objects are selected. This is very different from, for example, the case of combinatorial auctions. However, in our model the impact of an object on the global utility of an agent does depend on its rank (according to that agent) among the selected items. This distinction between the intrinsic value of an item and its value distorted by its rank is also considered in several other research fields, especially in decision theory (where it is known as “rank-dependent utility theory”) and in multicriteria decision making, from which we borrow one of the main ingredients of our approach, the ordered weighted average (OWA) operators [55] (for technical details see the work of Kacprzyk et al. [32]). OWAs were recently used in social choice in several contexts [29, 3, 23]; we discuss these works in detail in Section 8.

Second, throughout the paper we navigate between two views of the agents’ intrinsic utilities:

  1. Generally, we assume that the utilities are provided explicitly in the input as numerical values, and that these values are comparable between agents. Yet, we make no further assumptions about the nature of agents’ utilities: they do not need to be normalized, they do not need to come from any particular range of values, etc. Indeed, it is possible that some agent has very strong preferences regarding the items, modeled through high, diverse utility values, whereas some other agent does not care much about the selection process and has low utility values only.

  2. In some parts of the paper (which will always be clearly identified), we assume that utilities are heavily constrained and derive from non-numerical information, such as approval ballots specifying which items an agent approves (leading to approval-based utilities), or rankings over alternatives, from which utilities are derived using an agent-independent scoring vector (typically, a Borda-like vector).

Formally, the latter view is a special case of the former, but we believe that it is worthwhile to consider it separately. Indeed, many multiwinner voting rules (such as the Chamberlin–Courant [18] rule or the Proportional Approval Voting rule [35]) fit the second view far more naturally, whereas for other applications the former view is more natural.

Third, we take the utilitarian view and measure the social welfare of the agents as the sum of their perceived utilities. One could study other variants, such as the egalitarian variant, where the social welfare is measured as the utility of the worst-off agent. We leave this as possible future research (our preliminary attempts indicated that the egalitarian setting is computationally even harder than the utilitarian one). Very recently, Elkind and Ismaïli [23] used OWA operators to define variants of the Chamberlin–Courant rule that lie between the utilitarian and egalitarian variants; we discuss this work in more detail in Section 8.

2.1 The Formal Setting

Let $N = \{1, \ldots, n\}$ be a set of agents and let $A = \{a_1, \ldots, a_m\}$ be a set of items. The goal is to pick a size-$K$ set $W \subseteq A$ of items that, in some sense, is most satisfying for the agents. To this end, (1) for each agent $i \in N$ and for each item $a_j \in A$ we have an intrinsic utility $u_{i,j}$ that agent $i$ derives from $a_j$; (2) the utility that each agent derives from a set of items is an ordered weighted average [55] of this agent’s intrinsic utilities for these items.

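An ordered weighted average (OWA) operator over $n$ numbers is a function defined through a vector $\alpha = (\alpha_1, \ldots, \alpha_n)$ of (nonnegative) numbers as follows. (The standard definition of OWAs assumes normalization, that is, $\sum_{i=1}^{n} \alpha_i = 1$. We do not make this assumption here, for the sake of convenience; note that whether OWA vectors are normalized or not is irrelevant to all notions and results of this paper.) Let $x = (x_1, \ldots, x_n)$ be a vector of $n$ numbers and let $x^{\downarrow} = (x^{\downarrow}_1, \ldots, x^{\downarrow}_n)$ be the nonincreasing rearrangement of $x$, that is, $x^{\downarrow}_i = x_{\sigma(i)}$, where $\sigma$ is a permutation of $\{1, \ldots, n\}$ such that $x_{\sigma(1)} \ge x_{\sigma(2)} \ge \cdots \ge x_{\sigma(n)}$. Then we set:

$$\mathrm{OWA}_{\alpha}(x) = \sum_{i=1}^{n} \alpha_i x^{\downarrow}_i.$$

To make the notation lighter, we write $\alpha(x_1, \ldots, x_n)$ instead of $\mathrm{OWA}_{\alpha}(x_1, \ldots, x_n)$.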

We provide a more detailed discussion of the OWA operators useful in our context later; here we only mention that, for example, they can be used to express the arithmetic average (through the size-$n$ vector $(\frac{1}{n}, \ldots, \frac{1}{n})$), the maximum and minimum operators (through the vectors $(1, 0, \ldots, 0)$ and $(0, \ldots, 0, 1)$, respectively), and the median operator (through the vector of all zeros with a single one in the middle position).
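As a minimal sketch of the definition above, the following Python function applies an OWA weight vector to a list of numbers by sorting the numbers in nonincreasing order and taking the weighted sum. The function name `owa` and the sample numbers are our own illustrative choices; the four vectors reproduce the average, maximum, minimum, and median operators just mentioned.

```python
from fractions import Fraction


def owa(alpha, x):
    """OWA operator: weighted sum of x rearranged in nonincreasing order."""
    if len(alpha) != len(x):
        raise ValueError("alpha and x must have the same length")
    x_sorted = sorted(x, reverse=True)  # x_sorted[0] is the largest value
    return sum(a * v for a, v in zip(alpha, x_sorted))


n = 5
x = [3, 9, 1, 7, 5]
mean_vec = [Fraction(1, n)] * n                      # arithmetic average
max_vec = [1] + [0] * (n - 1)                        # maximum
min_vec = [0] * (n - 1) + [1]                        # minimum
median_vec = [0] * (n // 2) + [1] + [0] * (n // 2)  # median (n odd)

print(owa(mean_vec, x), owa(max_vec, x), owa(min_vec, x), owa(median_vec, x))
# prints: 5 9 1 5
```

Exact `Fraction` weights are used for the average so that the result is not affected by floating-point rounding.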

We formalize our problem of computing “the most satisfying set of items” as follows.

Definition 1.

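In the OWA-Winner problem we are given a set $N = \{1, \ldots, n\}$ of agents, a set $A = \{a_1, \ldots, a_m\}$ of items, a collection of agents’ utilities $(u_{i,j})_{i \in N, a_j \in A}$, a positive integer $K$ ($K \le m$), and a $K$-number OWA $\alpha$. The task is to compute a subset $W = \{a_{j_1}, \ldots, a_{j_K}\}$ of $A$ such that $u_{\alpha}(W) = \sum_{i \in N} \alpha(u_{i,j_1}, \ldots, u_{i,j_K})$ is maximal.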
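To make Definition 1 concrete, here is a brute-force sketch that enumerates all size-$K$ subsets with `itertools.combinations` and returns one with maximal total OWA score. The utility matrix is hypothetical (it is not the data of Example 1), items are 0-based indices, and the function names are ours.

```python
from itertools import combinations


def owa(alpha, x):
    """Weighted sum of x sorted in nonincreasing order."""
    return sum(a * v for a, v in zip(alpha, sorted(x, reverse=True)))


def owa_winner_bruteforce(u, K, alpha):
    """Return (W, score) maximizing sum_i alpha(u[i][j] for j in W) over |W| = K.

    u[i][j] is agent i's intrinsic utility for item j (items are 0..m-1).
    """
    m = len(u[0])
    best_set, best_score = None, float("-inf")
    for W in combinations(range(m), K):
        score = sum(owa(alpha, [row[j] for j in W]) for row in u)
        if score > best_score:  # keep the first maximizer found
            best_set, best_score = W, score
    return best_set, best_score


# Hypothetical utilities: 3 agents, 4 items.
u = [
    [3, 2, 1, 0],
    [0, 1, 2, 3],
    [2, 3, 1, 0],
]
print(owa_winner_bruteforce(u, K=2, alpha=(1, 0.5)))  # ((0, 1), 9.0)
```

The enumeration visits all $\binom{m}{K}$ subsets, so it is only practical for small $K$; for constant $K$ its running time is polynomial.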

Example 1.

Consider six agents with the following utilities over the items from the set :

We want to select items and we use OWA . What is the score of ? The first three agents get utility each, the next two get each, and the last one gets . So, the score of is . Indeed, this is the optimal set; the next best ones are , and , all with score . The rule defined by the OWA $(1, \ldots, 1)$, known as $K$-Borda (due to the very specific values of agents’ utilities; see Example 2 in the next section), would choose and Chamberlin–Courant’s rule (in our terms, the rule defined by the OWA operator $(1, 0, \ldots, 0)$) would choose .

For a family $\Lambda = (\alpha^{(K)})_{K \ge 1}$ of OWAs, where each $\alpha^{(K)}$ is a $K$-number OWA, we write $\Lambda$-OWA-Winner to denote the variant of the problem where, for each given solution size $K$, we use OWA $\alpha^{(K)}$. From now on we will not mention the size of the OWA vector explicitly; it will always be clear from the context. We implicitly assume that OWAs in our families are polynomial-time computable.

2.2 Classes of Intrinsic Utilities

While our general setting allows agents to express arbitrary utilities, we also focus on two cases where they only provide dichotomous or ordinal information:

Dichotomous information.

Agents provide dichotomous information if they only have to specify which items they like. This information is then mapped into dichotomous (or, as we typically refer to them, approval-based) utilities, defined by $u_{i,j} = 1$ if agent $i$ likes item $a_j$ and $u_{i,j} = 0$ otherwise.

Ordinal information.

Agents provide ordinal information if they only have to specify their rankings over items, called their preference orders. This information is then mapped into utilities using a scoring vector, exactly in the same way as positional scoring rules (for single-winner voting) do. We focus on the particular case where this scoring vector is the Borda vector, i.e., if the rank of item $a_j$ in agent $i$’s ranking is $t$, then $u_{i,j} = m - t$. We refer to this setting as Borda-based utilities.

Naturally, these are special cases of our general setting. Yet using approval-based or Borda-based utilities can be more convenient than using the general approach.
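The two mappings can be sketched as follows, assuming items are numbered $0, \ldots, m-1$ and a ranking is given as a list of items from most to least preferred (both encodings and the function names are hypothetical). Under the Borda vector, the item at rank $t$ receives $m - t$.

```python
def approval_utilities(approved, m):
    """0/1 utilities: 1 for each approved item, 0 otherwise (items 0..m-1)."""
    return [1 if j in approved else 0 for j in range(m)]


def borda_utilities(ranking):
    """Borda-based utilities from a ranking listed best-first.

    The item at rank t (1-based) gets m - t: m-1 for the top item, 0 for the last.
    """
    m = len(ranking)
    u = [0] * m
    for position, item in enumerate(ranking):  # position = t - 1
        u[item] = m - 1 - position
    return u


print(approval_utilities({0, 2}, 4))  # [1, 0, 1, 0]
print(borda_utilities([3, 1, 0, 2]))  # [1, 2, 0, 3]
```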

Example 2.

The utilities of the agents from Example 1 are Borda-based and can be expressed as the following preference orders:

Both approval-based utilities and Borda-based utilities are inspired by analogous notions from the theory of voting, where approval and Borda count are very well-known single-winner voting rules (briefly put, under these rules we treat the utilities of the items as their scores, sum up the scores assigned to the items by the voters, and elect the item that has the highest score). Further, Borda-based utilities have been used in the original Chamberlin–Courant’s rule and in several works on fair division (see, e.g., a paper of Brams and King [13]).

One of the high-level messages of this paper is that OWA-Winner problems tend to be computationally easier for the case of Borda-based utilities than for the case of approval-based ones (while we typically obtain NP-hardness in both settings, we find good approximation algorithms for many of the Borda-based cases, whereas for the approval-based setting our algorithms are either significantly weaker or we obtain outright inapproximability results). This is so mostly because under Borda-based utilities all the agents assign relatively high utility values to a relatively large fraction of items. In the following definition we try to capture this property.

Definition 2.

Consider a setting with $m$ items and let $u_{\max}$ denote the highest utility that some agent gives to an item. Let $x$ and $y$ be two numbers in $[0, 1]$. We say that the agents have $(x, y)$-non-finicky utilities if every agent has utility at least $y \cdot u_{\max}$ for at least $x \cdot m$ items.
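A direct check of Definition 2 can be sketched as below. Here we name the two parameters $x$ (the required fraction of items) and $y$ (the required fraction of $u_{\max}$), and we use `Fraction` to avoid floating-point rounding at the threshold; the utility matrix is a hypothetical Borda-based example with four items.

```python
from fractions import Fraction


def is_non_finicky(u, x, y):
    """True if every agent has utility >= y * u_max for at least x * m items."""
    m = len(u[0])
    u_max = max(max(row) for row in u)
    threshold = y * u_max
    return all(sum(1 for v in row if v >= threshold) >= x * m for row in u)


# Hypothetical Borda-based utilities over m = 4 items (u_max = 3).
u = [
    [3, 2, 1, 0],
    [0, 1, 2, 3],
    [1, 3, 2, 0],
]
print(is_non_finicky(u, Fraction(1, 2), Fraction(2, 3)))  # True
print(is_non_finicky(u, Fraction(3, 4), Fraction(2, 3)))  # False
```

With threshold $\frac{2}{3} \cdot 3 = 2$, each agent has exactly two items at or above it, which meets $x \cdot m = 2$ but not $3$.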

To understand this notion better, let us consider the following example.

Example 3.

Let and . The utilities are as defined below:

The agents have -non-finicky utilities. Indeed, all the agents have utility at least 8 for at least half of the items. They also have -non-finicky utilities, and -non-finicky utilities. We will also use the agents and items from this example later, when presenting our algorithms.

As we can expect, Borda-based utilities are non-finicky in a very natural sense.

Observation 1.

For every , , Borda-based utilities are -non-finicky.

However, there are also other natural cases of non-finicky utilities. For example, consider agents that have approval-based utilities and where each agent approves of at least a fraction $x$ of the items. These agents have $(x, 1)$-non-finicky utilities. (The reader may be surprised that approval-based utilities may be non-finicky even though we said that we obtain inapproximability results for them. Yet, there is no contradiction here: these inapproximability results rely on the fact that some agents approve of very few items.)

2.3 A Dictionary of Useful OWA Families

Below we give a catalog of OWA families that we focus on throughout the paper (in the description below we take $K$ to be the dimension of the vectors to which we apply a given OWA).

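  1. $\ell$-median OWA. For each $\ell \in \{1, \ldots, K\}$, $\ell$-med is the OWA defined by the vector of $\ell - 1$ zeros, followed by a single one, followed by $K - \ell$ zeros. It is easy to see that $\ell\text{-med}(x_1, \ldots, x_K)$ is the $\ell$-th largest number among $x_1, \ldots, x_K$ and is known as the $\ell$-median of $(x_1, \ldots, x_K)$. In particular, $1$-med is the maximum operator, $K$-med is the minimum operator, and if $K$ is odd, $\frac{K+1}{2}$-med is the median operator.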

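  2. $t$-best OWA. For each $t \in \{1, \ldots, K\}$, the $t$-best OWA is defined through the vector of $t$ ones followed by $K - t$ zeros. That is, $t\text{-best}(x_1, \ldots, x_K)$ is the sum of the top $t$ values in $(x_1, \ldots, x_K)$ (with appropriate scaling, this means an arithmetic average of the top $t$ numbers). $K$-best is simply the sum of all the numbers in $(x_1, \ldots, x_K)$ (after scaling, the arithmetic average).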

  3. Arithmetic progression OWA. This OWA is defined through the vector , where and . (One can easily check that the choice of has no impact on the outcome of OWA-Winner; this is not the case for , though.)

  4. Geometric progression OWA. This OWA is defined through the vector , where . (This is without loss of generality, because multiplying the vector by a constant factor has no impact on the outcome of OWA-Winner; but the choice of matters.)

  5. Harmonic OWA. This OWA is defined through the vector $(1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{K})$.

  6. Hurwicz OWA. This OWA is defined through a vector $(\lambda, 0, \ldots, 0, 1 - \lambda)$, where $\lambda$, $0 \le \lambda \le 1$, is a parameter.

Naturally, all sorts of middle-ground OWAs are possible between these particular cases, and can be tailored for specific applications. As our natural assumption is that highly ranked items have more impact than lower-ranked ones, we often assume that OWA vectors are nonincreasing, that is, $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_K$. While most OWA operators we consider in the paper are indeed nonincreasing, this is not the case for $\ell$-medians (except for the $1$-median) and for Hurwicz OWAs (except for $\lambda = 1$).
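The families in the catalog can be written as small constructors. This sketch covers the $\ell$-median, $t$-best, harmonic, and Hurwicz vectors (the latter in the form $(\lambda, 0, \ldots, 0, 1-\lambda)$, assuming $K \ge 2$), plus a test for the nonincreasing property discussed above. All function names are our own.

```python
from fractions import Fraction


def l_median(l, K):
    """l-median OWA: weight 1 on position l (1-based), 0 elsewhere."""
    return [1 if i == l else 0 for i in range(1, K + 1)]


def t_best(t, K):
    """t-best OWA: t ones followed by K - t zeros."""
    return [1] * t + [0] * (K - t)


def harmonic(K):
    """Harmonic OWA: (1, 1/2, ..., 1/K)."""
    return [Fraction(1, i) for i in range(1, K + 1)]


def hurwicz(lam, K):
    """Hurwicz OWA (K >= 2): lam on the top position, 1 - lam on the last."""
    return [lam] + [0] * (K - 2) + [1 - lam]


def is_nonincreasing(alpha):
    return all(a >= b for a, b in zip(alpha, alpha[1:]))


for name, vec in [("1-median", l_median(1, 4)), ("2-median", l_median(2, 4)),
                  ("2-best", t_best(2, 4)), ("harmonic", harmonic(4)),
                  ("Hurwicz(1/2)", hurwicz(Fraction(1, 2), 4))]:
    print(name, is_nonincreasing(vec))
```

For $K = 4$, only the 2-median and the Hurwicz vector with $\lambda = \frac{1}{2}$ fail the nonincreasing test, in line with the exceptions listed above.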

3 Applications of the Model

We believe that our model is very general. To substantiate this claim, in this section we provide four quite different scenarios where it is applicable.

Generalizing Voting Rules. Our research started as an attempt to generalize the rule of Chamberlin and Courant [18] for electing sets of representatives. For this rule, the voters (the agents) have Borda-based utilities over a set of candidates and we wish to elect a $K$-member committee (e.g., a parliament) such that each voter is represented by one member of the committee. If we select $K$ candidates, then a voter is “represented” by the selected candidate that she ranks highest among the chosen ones. Thus, winner determination under Chamberlin–Courant’s voting rule boils down to solving $1$-best-OWA-Winner for the case of Borda-based utilities. On the other hand, solving $K$-best-OWA-Winner for Borda-based utilities is equivalent to finding winners under $K$-Borda, the rule that picks the $K$ candidates with the highest Borda scores (see the work of Elkind et al. [22] for a classification of multiwinner voting rules, including, e.g., $K$-Borda and Chamberlin–Courant’s rule).

Our model also captures one more appealing voting rule, known as Proportional Approval Voting (PAV; see the work of Kilgour [35] for a review of approval-based multiwinner rules, and the work of Aziz et al. [5] and Elkind and Lackner [24] for computational results). Winner determination under PAV is equivalent to solving OWA-Winner with the harmonic OWA for the case of approval-based utilities.
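As a quick sanity check of the PAV correspondence, the sketch below computes the usual PAV score, where each voter contributes $1 + \frac{1}{2} + \cdots + \frac{1}{k}$ for $k$ approved committee members, and compares it with the harmonic-OWA score on 0/1 utilities for every committee of a small hypothetical profile. The function names and the profile are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def owa(alpha, x):
    return sum(a * v for a, v in zip(alpha, sorted(x, reverse=True)))


def pav_score(approvals, W):
    """Each voter contributes 1 + 1/2 + ... + 1/k, k = approved members of W."""
    return sum(sum(Fraction(1, r) for r in range(1, len(A & set(W)) + 1))
               for A in approvals)


def harmonic_owa_score(approvals, W):
    """Sum over voters of the harmonic OWA applied to 0/1 utilities for W."""
    alpha = [Fraction(1, r) for r in range(1, len(W) + 1)]
    return sum(owa(alpha, [1 if j in A else 0 for j in W]) for A in approvals)


# Hypothetical approval profile over items 0..3.
approvals = [{0, 1}, {1, 2}, {0, 1, 2}, {3}]
for W in combinations(range(4), 3):
    assert pav_score(approvals, W) == harmonic_owa_score(approvals, W)
print(pav_score(approvals, (0, 1, 3)))  # 5
```

The two scores agree because sorting 0/1 utilities puts all approved members first, so the harmonic weights are summed exactly up to the number of approved members.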

Malfunctioning Items or Unavailable Candidates. Consider a setting where we pick the items off-line, but on-line it may turn out that some of them are unavailable (for example, we pick a set of journals the library subscribes to, but when an agent goes to the library, a particular journal could already be borrowed by someone else; see the work of Lu and Boutilier [38] for other examples of social choice with possibly unavailable candidates). We assume that each item is available with the same, given, probability $p$ (i.i.d.). The utility an agent gets from a set of selected items is the expected value of the best available item. The probability that the $i$th item (in the agent's ranking of the selected items) is available while the preceding $i - 1$ items are not is $p(1-p)^{i-1}$. So, to model the problem of selecting items in this case, we should use the geometric progression OWA with initial value $p$ and coefficient $1 - p$.
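The availability argument can be checked exactly on a tiny instance. The sketch enumerates all $2^K$ availability patterns (with `itertools.product`) and compares the expected value of the best available item with the geometric OWA whose weights are $p(1-p)^{i-1}$. The utilities and the value of $p$ are hypothetical, and an agent who finds no selected item available is assumed to get 0.

```python
from fractions import Fraction
from itertools import product


def expected_best_available(values, p):
    """Exact E[value of best available item]; each item available i.i.d. w.p. p.

    If no selected item is available, the agent is assumed to get 0.
    """
    vals = sorted(values, reverse=True)
    total = Fraction(0)
    for pattern in product([True, False], repeat=len(vals)):
        prob = Fraction(1)
        for available in pattern:
            prob *= p if available else 1 - p
        best = next((v for v, a in zip(vals, pattern) if a), 0)
        total += prob * best
    return total


def geometric_owa_value(values, p):
    """OWA with weights p, p(1-p), p(1-p)^2, ... on the sorted values."""
    vals = sorted(values, reverse=True)
    return sum(p * (1 - p) ** i * v for i, v in enumerate(vals))


p = Fraction(1, 3)
values = [5, 2, 8]  # hypothetical utilities of the selected items
print(expected_best_available(values, p), geometric_owa_value(values, p))
# prints: 110/27 110/27
```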

Uncertainty Regarding How Many Items a User Enjoys. There may be some uncertainty about the number of items a user would enjoy (e.g., on a plane, it is uncertain how many movies a passenger would watch; one might fall asleep or might only watch those movies that are good enough). We give two possible models for the choice of the OWA vectors:

  1. The number of items that an agent enjoys is uniformly distributed over $\{1, \ldots, K\}$, i.e., for each $j \in \{1, \ldots, K\}$, an agent enjoys exactly his or her first $j$ items in $W$ with probability $\frac{1}{K}$. So, the agent enjoys the $i$th item if she enjoys at least $i$ items, which occurs with probability $\frac{K - i + 1}{K}$; hence we should use the OWA vector defined by $\alpha_i = K - i + 1$ (we disregard the normalizing constant), i.e., an arithmetic progression.

  2. We assume that the values given by each user to each item are distributed uniformly, i.i.d., on $[0, 1]$ and that each user uses only the items that have a value of at least $\tau$, where $\tau$ is a fixed (user-independent) threshold. Therefore, a user enjoys the item in $W$ ranked in position $i$ if she values at least $i$ items of $W$ at least $\tau$, which occurs with probability $\sum_{j=i}^{K} \binom{K}{j} (1-\tau)^{j} \tau^{K-j}$, thus leading to the OWA vector defined by $\alpha_i = \sum_{j=i}^{K} \binom{K}{j} (1-\tau)^{j} \tau^{K-j}$.
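Both OWA vectors can be computed directly; this sketch uses exact rational arithmetic and `math.comb` (Python 3.8+). The names `uniform_count_owa` and `threshold_owa` are hypothetical, and `tau` stands for the threshold of the second model.

```python
from fractions import Fraction
from math import comb


def uniform_count_owa(K):
    """Model 1: the item at rank i is enjoyed with probability (K - i + 1) / K."""
    return [Fraction(K - i + 1, K) for i in range(1, K + 1)]


def threshold_owa(K, tau):
    """Model 2: the item at rank i is enjoyed iff at least i of the K values reach tau.

    Each value, uniform on [0, 1], reaches tau with probability 1 - tau.
    """
    q = 1 - tau
    return [sum(comb(K, j) * q**j * tau**(K - j) for j in range(i, K + 1))
            for i in range(1, K + 1)]


print(uniform_count_owa(4))
# [Fraction(1, 1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)]
print(threshold_owa(2, Fraction(1, 2)))
# [Fraction(3, 4), Fraction(1, 4)]
```

As a consistency check, the first entry of the threshold vector equals $1 - \tau^K$, the probability that at least one of the $K$ values reaches $\tau$.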

Ignorance About Which Item Will Be Assigned to a User. We now assume that a matching mechanism will be used after selecting the items. The matching mechanism is not specified; it might also be randomized. If the agents are completely ignorant of the mechanism used, then it makes sense to use known criteria for decision-making under complete uncertainty:

  1. The Wald criterion assumes that agents are extremely risk-averse, and corresponds to $(0, \ldots, 0, 1)$: the agents consider their worst possible items.

  2. The Hurwicz criterion is a linear combination of the worst and the best outcomes, and corresponds to $(\lambda, 0, \ldots, 0, 1 - \lambda)$ for some fixed $\lambda \in [0, 1]$.

If the agents know that they are guaranteed to get one of their best $\ell$ items, then the Wald and Hurwicz criteria lead, respectively, to the OWAs $(0, \ldots, 0, 1, 0, \ldots, 0)$ and $(\lambda, 0, \ldots, 0, 1 - \lambda, 0, \ldots, 0)$, with $1$ (respectively, $1 - \lambda$) in position $\ell$. If the agents know that the mechanism gives them one of their top $\ell$ items, each with the same probability, then we should use the $\ell$-best OWA. More generally, the matching mechanism may assign items to agents with a probability that decreases when the rank increases.
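The three vectors for agents guaranteed one of their top $\ell$ items can be sketched as below. The function names are ours; $\ell$ is the size of the guaranteed top group and $\lambda \in [0, 1]$ is the Hurwicz parameter. The uniform variant is the $\ell$-best vector scaled by $1/\ell$, so its OWA value is the average of the agent's top $\ell$ utilities among the selected items.

```python
from fractions import Fraction


def owa(alpha, x):
    return sum(a * v for a, v in zip(alpha, sorted(x, reverse=True)))


def wald_top(l, K):
    """Worst case within the guaranteed top l: weight 1 on position l."""
    return [1 if i == l else 0 for i in range(1, K + 1)]


def hurwicz_top(lam, l, K):
    """lam * (best value) + (1 - lam) * (l-th best value)."""
    alpha = [0] * K
    alpha[0] += lam
    alpha[l - 1] += 1 - lam  # for l = 1 this collapses to weight 1 on the top
    return alpha


def uniform_top(l, K):
    """Uniformly random item among the top l: the l-best vector scaled by 1/l."""
    return [Fraction(1, l)] * l + [0] * (K - l)


x = [4, 1, 3, 2]  # hypothetical utilities for the K = 4 selected items
print(owa(wald_top(2, 4), x))                     # 3
print(owa(hurwicz_top(Fraction(1, 4), 2, 4), x))  # 13/4
print(owa(uniform_top(2, 4), x))                  # 7/2
```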

4 Overview of the Results

OWA family   general and approval utilities   non-finicky and Borda utilities   References
-median ( fixed) -hard -hard (Borda) Proposition 8
DkS-bounded -approx. Theorem 15 and Corollary 26
PTAS (Borda) Theorem 30
-median -hard -hard Theorems 6 and 7
MEBP-bounded ? Theorem 19, open problem
-best -hard (approval) -hard (Borda) Literature [48, 39]
-approx. -approx. Literature [39], Corollary 26
PTAS (Borda) Literature [53]
-best ( fixed) -hard (approval) -hard (Borda) Proposition 8
-approx. -approx. Theorem 13 and Corollary 26
-best -hard (approval) -hard (Borda) Theorems 6 and 7
PTAS PTAS Theorem 23
-best folk result
arithmetic progression -hard ? Theorem 3, open problem
-approx. -approx. Theorem 13
geometric progression -hard ? Theorem 3, open problem
-approx. Theorem 13, Corollary 31
Hurwicz[] -hard (approval) ? Corollary 20, open problem
-approx. -approx. Corollary 22
for each
Table 1: Summary of our results for the OWA families from Section 2.3. For each OWA family we provide four entries: in the first row (for a given OWA family) we give its worst-case complexity (in the general case and in the non-finicky utilities case), and in the second row we list the best known approximation result (in the general case and in the non-finicky utilities case). We write $K$ to mean the cardinality of the winner set that we seek. In the “References” column we point to the respective result in the paper/literature. For negative results we indicate the simplest types of utilities where they hold; for positive results we give the most general types of utilities where they hold. For approximability results for the case of non-finicky utilities, we write -approx to mean that there is a polynomial-time approximation algorithm whose approximation ratio approaches as the size of the committee increases (in effect, for each , , there is a polynomial-time algorithm that achieves approximation ratio, by using a brute-force algorithm if the size of the committee is smaller than a certain constant). For inapproximability results, by DkS-bounded and MEBP-bounded we mean, respectively, inapproximability results derived from the Densest-k-Subgraph problem and from the Maximum Edge Biclique Problem.

In this section we provide a high-level overview of our results. It turns out that computational properties of the OWA-Winner problem are quite varied and strongly depend on the types of OWA operators and the allowed agent utilities. We present a summary of our results in Table 1 (however, we stress that some of our technical results are not listed in the table and can be found only in the following sections).

Our first observation is that without any restrictions, OWA-Winner is NP-hard. This is hardly surprising since the problem generalizes other NP-hard problems, and it is natural to ask if there are any special cases where it is easy. Unfortunately, as we show in Section 5, they are very rare. For example, without restrictions on the agents’ utilities, OWA-Winner can be solved in polynomial time if we treat $K$ as a constant or if we use the constant OWA vector $(1, \ldots, 1)$ (i.e., if we use the $K$-best OWA). Indeed, the problem becomes NP-hard already for the OWA. This holds even if the agents are restricted to have approval-based utilities (Theorem 6) or Borda-based utilities (Theorem 7). More generally, we show that OWA-Winner is NP-hard for every family of OWA vectors that are nonconstant and nonincreasing (Theorem 3), which captures a significant fraction of all interesting settings.

After considering the worst-case complexity of computing exact solutions in Section 5, in Section 6 we focus on the approximability of the OWA-Winner problem. We show that in this respect there is a significant difference between two main classes of OWA vectors: those that are nonincreasing and the remaining ones. For nonincreasing OWA vectors, the standard greedy algorithm for optimizing submodular functions achieves an approximation ratio of $(1 - \frac{1}{e})$, irrespective of the nature of the agents’ utilities (Lemma 12 and Theorem 13). On the other hand, we present evidence that there is little hope for good approximation algorithms for OWA vectors that are not nonincreasing (Example 5 and Theorems 15 and 19).
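The standard greedy algorithm can be sketched as follows: starting from the empty set, repeatedly add the item with the largest increase of the total score. One assumption here is ours: a partial set $S$ is scored with the first $|S|$ OWA weights, which for nonnegative utilities is the same as padding $S$ with zero-utility items. The sketch does not itself establish the approximation guarantee (that is the content of Lemma 12 and Theorem 13), and the utilities are hypothetical.

```python
def owa(alpha, x):
    return sum(a * v for a, v in zip(alpha, sorted(x, reverse=True)))


def total_score(u, W, alpha):
    """Score of a (possibly partial) set W using the first |W| OWA weights."""
    weights = alpha[: len(W)]
    return sum(owa(weights, [row[j] for j in W]) for row in u)


def greedy_owa_winner(u, K, alpha):
    """Greedily add the item with the largest marginal gain, K times."""
    m = len(u[0])
    W = []
    for _ in range(K):
        candidates = [j for j in range(m) if j not in W]
        # Maximizing total_score(W + [j]) is the same as maximizing the gain.
        best = max(candidates, key=lambda j: total_score(u, W + [j], alpha))
        W.append(best)
    return W, total_score(u, W, alpha)


u = [  # hypothetical utilities: 3 agents, 4 items
    [3, 2, 1, 0],
    [0, 1, 2, 3],
    [2, 3, 1, 0],
]
print(greedy_owa_winner(u, K=2, alpha=[1, 0.5]))  # ([1, 0], 9.0)
```

On this tiny instance greedy first takes item 1 (score 6) and then item 0, which happens to be optimal (all six pairs can be checked by hand); in general greedy only guarantees the approximation ratio stated above.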

Next, in Section 7, we consider approximation algorithms for OWA-Winner for the case where agents have non-finicky utilities. It turns out that for non-finicky utilities we can sometimes obtain much better approximability guarantees than in the general setting. The key feature of the non-finicky utilities assumption is that every agent gives sufficiently high utility values to sufficiently many items, so that the algorithms have enough flexibility in picking the items to achieve high-quality results. Specifically, we show a strong approximation algorithm for the case of non-finicky utilities and OWA vectors that concentrate most of the weight in a constant number of their top coefficients (Theorems 25, 29, 30, and Corollary 31). These results apply, for example, to the case of geometric progression OWAs, OWAs, and OWAs (for fixed values of ). Further, when applied to the case of Borda-based utilities (which, as we have argued in Section 2.2, are non-finicky in a very strong sense), we obtain polynomial-time approximation schemes (that is, approximation algorithms that can compute solutions with arbitrarily good precision, and whose running time is polynomial in the size of the problem, though not necessarily in the desired precision).

5 Computing Exact Solutions

We start our analysis by discussing the complexity of solving the OWA-Winner problem exactly. In general, OWA-Winner appears to be a rather difficult problem, and below we show this section’s main negative result: our problem is NP-hard for any class of OWA vectors satisfying a certain natural restriction. Intuitively, this restriction says that in a considered family of OWAs, the impact of more-liked items on the total satisfaction of an agent is greater than that of the less-liked ones.

Theorem 3.

Fix an OWA family $\Lambda = (\alpha^{(K)})_{K \ge 1}$ such that for every $K \ge 2$, $\alpha^{(K)}$ is nonincreasing and nonconstant. Then $\Lambda$-OWA-Winner is NP-hard, even for approval-based utilities.

For the sake of readability, we first prove two simpler results that we later use in the proof of Theorem 3. In these proofs, we give reductions from the standard VertexCover problem and from CubicVertexCover, its variant restricted to cubic graphs.

Definition 4.

In the VertexCover problem we are given an undirected graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, and a positive integer $k$. We ask if there is a set $C$ of up to $k$ vertices such that each edge is incident to at least one vertex from $C$. The CubicVertexCover problem is the same problem, restricted to graphs where each vertex has degree exactly three.
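For intuition about the source problems of the reductions, here is a brute-force sketch of VertexCover together with a cubicity check (exponential time, for tiny graphs only; the function names are ours). The complete graph on four vertices is cubic, and its smallest vertex cover has three vertices.

```python
from itertools import combinations


def has_vertex_cover(vertices, edges, k):
    """Brute force: is there a set of at most k vertices touching every edge?"""
    for size in range(k + 1):
        for C in combinations(vertices, size):
            chosen = set(C)
            if all(a in chosen or b in chosen for a, b in edges):
                return True
    return False


def is_cubic(vertices, edges):
    """Every vertex has degree exactly three."""
    degree = {v: 0 for v in vertices}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return all(d == 3 for d in degree.values())


vertices = list(range(4))
edges = list(combinations(vertices, 2))  # complete graph K4 (6 edges)
print(is_cubic(vertices, edges))             # True
print(has_vertex_cover(vertices, edges, 2))  # False
print(has_vertex_cover(vertices, edges, 3))  # True
```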

VertexCover is well known to be NP-hard [28]; NP-hardness for CubicVertexCover was shown by Alimonti and Kann [1].

Theorem 5.

Fix an OWA family , such that there exists such that for every we have . -OWA-Winner is NP-hard, even for approval-based utilities.

Proof.

We give a reduction from the CubicVertexCover problem. Let be an instance of CubicVertexCover with graph , where and , and positive integer . W.l.o.g., we assume that .

We construct an instance of -OWA-Winner. In we set (the agents correspond to the edges), (there are dummy items; other items correspond to the vertices), and we seek a collection of items of size . Each agent , , has utility exactly for all the dummy items and for two vertices that connects and for each of the dummy items (for the remaining items has utility ). In effect, each agent has utility for exactly items.

We claim that is a yes-instance of CubicVertexCover if and only if there exists a solution for with the total utility at least .

If there is a vertex cover of size for , then by selecting the items we obtain the required utility of the agents. Indeed, for every agent there are at least items in for which gives value (the dummy items and at least one vertex incident to ). These items contribute the value to the total agents’ utility. Additionally, since every non-dummy item has value for exactly 3 agents, and since every agent has at most items with value , there are exactly agents that have exactly items in with values . These ’th additional utility- items of the agents contribute to the total utility. Altogether, the agents’ utility is , as claimed.

Let us assume that there is a set of items with total utility at least . In we have items that have value for each of the agents, and every other item has value for exactly agents. Thus, the sum of the utilities of items (without applying the OWA operator yet) is at most . Thus, the total utility of the agents (now applying the OWA operator) is only if for each agent the solution contains items with utility . Since there are only dummy items, it means that for each agent there is a vertex in the solution such that is incident to . That is, is a yes-instance of CubicVertexCover. ∎

Theorem 6.

-OWA-Winner is NP-complete even for approval-based utilities.

Proof.

Membership in NP is clear. We show a reduction from the VertexCover problem. Let be an instance of VertexCover with graph , where and , and with a positive integer (without loss of generality, we assume that and ).

We construct an instance of -best-OWA-Winner in the following way. We let the set of items be and we form agents, two for each edge. Specifically, if is an edge connecting two vertices, call them and , then we introduce two agents, and , with the following utilities: has utility for and for , and has utility for all the other items; has opposite utilities—it has utility for and for , and has utility for all the remaining ones.

Let be some set of items (i.e., vertices) and consider the sum of the utilities derived by the two agents and from under -best-OWA. If neither nor belong to , then the total utility of and is equal to (the former agent gets utility and the latter one gets ). If only one of the items, i.e., either or , belongs to , then the total utility of and is equal to (the former agent gets utility and the latter one still gets ). Finally, if both items belong to , then the total utility of and is also equal to (the former gets utility and the latter gets utility ). Thus the total utility of all agents is equal to if and only if the answer to the instance is “yes”. This shows that the reduction is correct and, since the reduction is computable in polynomial time, the proof is complete. ∎

Using a proof that combines the ideas of the proofs of Theorems 5 and 6, we show that, indeed, $\alpha$-OWA-Winner is NP-hard for a large class of natural families $\alpha$ of OWAs.

Proof of Theorem 3.

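We give a reduction from CubicVertexCover. Let $I$ be an instance of CubicVertexCover with graph $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ and $E = \{e_1, \ldots, e_m\}$, and with a positive integer $k$ (the bound on the size of the vertex cover).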

Now let us consider the OWA vector $\alpha = (\alpha_1, \ldots, \alpha_K)$. Since $\alpha$ is nonincreasing and nonconstant, one of the following two conditions must hold.

  1. There exists such that .

  2. There exists such that , and for every , we have .

If (1) is the case, then we use a reduction similar to that in the proof of Theorem 5. The only difference is that, apart from the set of dummy items (ranked first by all agents), we introduce the set of dummy items and sets , each consisting of dummy agents. The dummy items from are introduced only to fill up the solution to members. The dummy agents from have utility for each of the items from and for the ’th item from (they have utility for all the other items). This enforces that the items from are selected in the optimal solution. The rest of the reduction is as in the proof of Theorem 5.

If (2) is the case, then we use a reduction similar to that in the proof of Theorem 6. We let the set of items be , where , , and , are sets of dummy items that we need for our construction. As in the proof of Theorem 6, for each edge $e$ we introduce two agents $e^{(1)}$ and $e^{(2)}$. Here, however, we additionally need the set of dummy agents. Each dummy agent from assigns utility 1 to each dummy item from and utility 0 to the remaining items; consequently, since , each dummy item from must be selected in every optimal solution. Further, each non-dummy agent assigns utility 1 to each dummy item from ; this way we ensure that every item from must be selected in every optimal solution. Finally, the utilities of the non-dummy agents for the non-dummy items are defined exactly as in the proof of Theorem 6. This ensures that the optimal solution, apart from and , will contain the non-dummy items that correspond to the vertices from the optimal vertex cover. ∎

One may wonder whether our just-presented hardness results also hold for other restrictions on agents’ utilities. Below we show a variant of the result from Theorem 6 for Borda-based utilities. It relies on an idea similar to the one in the proof of Theorem 6, but the restriction to Borda-based utilities requires a much more technical proof (available in the appendix).

Theorem 7.

$(K-1)$-best-OWA-Winner is NP-hard even for Borda-based utilities.

5.1 Inherited Hardness Results

We now consider the cases of $k$-best-OWA-Winner and $k$-med-OWA-Winner (where $k$ is a constant). By results of Procaccia, Rosenschein and Zohar [48] and Lu and Boutilier [39], we know that the 1-best-OWA-Winner problem is NP-hard both for approval-based utilities and for Borda-based utilities (in this case the problem is equivalent to winner determination under appropriate variants of the Chamberlin–Courant voting rule; in effect, many results regarding the complexity of this rule are applicable to this variant of the problem [7, 53, 56, 52]). A simple reduction shows that this result carries over to each family of $k$-best OWAs and of $k$-med OWAs, where $k$ is a fixed positive integer (note that for the case of approval-based utilities, these results also follow from Theorem 3).

Proposition 8.

For each fixed $k$, $k$-best-OWA-Winner and $k$-med-OWA-Winner are NP-complete, even if the utility profiles are restricted to be approval-based or Borda-based.

Proof.

Let $k$ be a fixed constant. It is easy to see that $k$-best-OWA-Winner and $k$-med-OWA-Winner are both in NP. To show NP-hardness, we give reductions from 1-best-OWA-Winner (either with approval-based utilities or with Borda-based utilities) to $k$-best-OWA-Winner and to $k$-med-OWA-Winner (with the same types of utilities).

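Let $I$ be an instance of 1-best-OWA-Winner with $n$ agents, $m$ items, and where we seek a winner set of size $K$. We form an instance $I'$ of $k$-best-OWA-Winner that is identical to $I$ except that: (1) we add $k-1$ special items $c_1, \ldots, c_{k-1}$ such that under approval-based utilities each agent has utility 1 for each item $c_j$, $1 \leq j \leq k-1$, and under Borda-based utilities each agent has utility $m+k-1-j$ for item $c_j$, $1 \leq j \leq k-1$ (that is, every agent ranks $c_1, \ldots, c_{k-1}$ on top, and its Borda utilities for the original items do not change); (2) we set the size of the desired winner set to be $K+k-1$. It is easy to see that if there is an optimal solution $W$ for $I$ that achieves some utility $U$, then the solution $W \cup \{c_1, \ldots, c_{k-1}\}$ for $I'$ achieves utility $U + C$, where $C$ is the (fixed) total utility that the agents derive from the special items. Further, $I'$ always has an optimal solution that contains all the special items (replacing an original item with a missing special item never decreases any agent's utility), and removing the special items from such a solution yields a solution for $I$ whose utility is smaller by exactly $C$.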

An analogous argument shows that 1-best-OWA-Winner reduces to $k$-med-OWA-Winner (both for approval-based and for Borda-based utilities). ∎
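The approval-based part of this reduction can be sanity-checked by brute force on a tiny random instance. In the sketch below the special items are modeled as strings `c1`, ..., `c{k-1}` approved by every agent; the instance sizes, the random seed, and the helper names are hypothetical. Under the $k$-best OWA the optimum should grow by exactly $n(k-1)$ (each agent gets 1 from each special item), while under the $k$-med OWA it should stay unchanged.

```python
import random
from itertools import combinations

def owa_value(alpha, values):
    """Apply the OWA vector alpha to values sorted in nonincreasing order."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

def optimum(profile, items, size, alpha):
    """Brute-force optimal utilitarian value (tiny instances only)."""
    return max(
        sum(owa_value(alpha, [agent[x] for x in W]) for agent in profile)
        for W in combinations(items, size)
    )

random.seed(1)
n, m, K, k = 4, 5, 2, 3                 # hypothetical sizes
items = list(range(m))
profile = [{x: random.randint(0, 1) for x in items} for _ in range(n)]

# Special items c1, ..., c_{k-1}, approved by every agent.
specials = [f"c{j}" for j in range(1, k)]
padded = [{**agent, **{c: 1 for c in specials}} for agent in profile]
big_items, size = items + specials, K + k - 1

one_best = optimum(profile, items, K, [1] + [0] * (K - 1))
k_best = optimum(padded, big_items, size, [1] * k + [0] * (size - k))
k_med = optimum(padded, big_items, size, [0] * (k - 1) + [1] + [0] * (size - k))
print(one_best, k_best, k_med)
```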

We leave the problem of generalizing the above two theorems to more general classes of OWA vectors as a technical (but conceptually easy) open problem.

5.2 Rare Easy Cases

While the OWA-Winner problem is, in general, NP-hard, there are also some natural easy cases. For example, the problem is in P provided that we seek a winner set of a fixed size $K$. Naturally, in practice, the variant of the problem with fixed $K$ has only limited applicability.

Proposition 9.

For each fixed constant $K$ (the size of the winner set), OWA-Winner is in P.

Proof.

For a profile with $m$ items, there are only $\binom{m}{K} = O(m^K)$ sets of winners to try, and each of them can be evaluated in polynomial time. We try them all and pick one that yields the highest utility. ∎
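A direct Python sketch of this enumeration follows; the utility-matrix encoding (`utilities[i][j]` is agent $i$'s utility for item $j$) and the sample numbers are made up for illustration. Each of the $\binom{m}{K}$ candidate sets costs $O(nK\log K)$ to evaluate, so for fixed $K$ the running time is polynomial.

```python
from itertools import combinations

def owa_value(alpha, values):
    """Weight alpha[l] multiplies the (l+1)-th largest value."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

def brute_force_winners(utilities, K, alpha):
    """Enumerate all C(m, K) winner sets; utilities[i][j] is agent i's utility for item j."""
    m = len(utilities[0])
    best_set, best_value = None, float("-inf")
    for W in combinations(range(m), K):
        value = sum(owa_value(alpha, [row[j] for j in W]) for row in utilities)
        if value > best_value:          # keep the first optimal set found
            best_set, best_value = W, value
    return best_set, best_value

utilities = [
    [3, 0, 1, 2],
    [0, 3, 2, 1],
    [1, 2, 3, 0],
]
print(brute_force_winners(utilities, 2, [1, 0.5]))   # ((0, 2), 9.0)
```

Here the sets $\{0, 2\}$ and $\{1, 2\}$ tie at 9.0; the strict comparison keeps the first one enumerated.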

Similarly, the problem is in P when the number $m$ of available items is fixed (this follows from the above proposition: if the number of items is fixed, then so is $K$). Throughout the rest of the paper we focus on the $\alpha$-OWA-Winner variant of the problem, where $K$ is given as part of the input and $\alpha$ represents a family of OWAs, one for each value of $K$.

It is easy to see that for the $K$-best OWA (that is, for the family of constant OWAs $(1, \ldots, 1)$) the problem is in P.

Proposition 10.

$K$-best-OWA-Winner is in P.

Proof.

Let $I$ be an input instance with $m$ items and $n$ agents, where we seek a winner set of size $K$. Under the $K$-best OWA every item of the winner set contributes its full utility to every agent. Thus it suffices to compute, for each item, the total utility that all the agents would derive if this item were included in the winner set, and to return $K$ items for which this value is highest. ∎
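The argument amounts to scoring items independently. A minimal Python sketch, with a made-up utility matrix and a hypothetical function name, cross-checked against exhaustive search:

```python
from itertools import combinations

def k_best_winners(utilities, K):
    """Under the K-best OWA (1, ..., 1), each agent sums its utilities over all
    K winners, so items can be scored independently and the top K kept."""
    m = len(utilities[0])
    scores = [sum(row[j] for row in utilities) for j in range(m)]
    ranking = sorted(range(m), key=lambda j: scores[j], reverse=True)
    winners = sorted(ranking[:K])
    return winners, sum(scores[j] for j in winners)

utilities = [
    [3, 0, 1, 2],
    [0, 3, 2, 1],
    [1, 2, 3, 0],
]
winners, value = k_best_winners(utilities, 2)

# Cross-check against exhaustive search (with the K-best OWA this is a plain sum over W).
brute = max(sum(row[j] for row in utilities for j in W) for W in combinations(range(4), 2))
print(winners, value, brute)   # [1, 2] 11 11
```

Sorting dominates, so the procedure runs in $O(nm + m \log m)$ time.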

Indeed, if the agents’ utilities are either approval-based or Borda-based, $K$-best-OWA-Winner boils down to (polynomial-time) winner determination for the $K$-best approval rule and for the $K$-Borda rule [21], respectively (see also the work of Elkind et al. [22] for a general discussion of multiwinner rules). In light of this fact, Theorems 6 and 7 appear quite surprising: they show that the problem is NP-hard already for the $(K-1)$-best OWA, which differs from the $K$-best OWA only in its last entry.

Given the results in this section so far, we conjecture that the family of constant OWAs, that is, the family of $K$-best OWAs, is the only natural family $\alpha$ for which $\alpha$-OWA-Winner is in P. We leave this conjecture as a natural follow-up question.

[Footnote 4] It is tempting to conjecture that for all families of nonconstant OWAs, not just the natural ones, the problem is NP-hard. This, however, is not the case. Indeed, by following the arguments of the classic theorem of Ladner [36], it is possible to show (assuming P ≠ NP) a polynomial-time computable family $\alpha$ of OWAs such that $\alpha$-OWA-Winner is in NP, but is neither NP-complete nor in P. (Intuitively put, such a family could consist of interspersed long fragments where the OWAs are either $K$-best or 1-best. The $K$-best fragments would prevent the problem from being NP-complete, while the 1-best fragments would prevent it from being in P.)

5.3 Integer Programming

In spite of all the hardness results that we have seen so far, we still might need an exact solution for a given $\alpha$-OWA-Winner instance in a situation where the brute-force algorithm from Proposition 9 is too slow. In such a case, it might be possible to use an integer linear programming (ILP) formulation of the problem, given below. We believe that this ILP formulation is interesting in its own right and, in particular, that it would be interesting future work to experimentally assess the size of instances for which it yields solutions in a reasonable amount of time.

Theorem 11.

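OWA-Winner reduces to computing a solution for the following integer linear program, where $u_{i,j}$ is the utility that agent $i$ derives from item $a_j$, $x_{i,\ell,j} = 1$ means that $a_j$ is agent $i$'s $\ell$-th most preferred item among the selected ones, and $y_j = 1$ means that $a_j$ is selected:

$$
\begin{aligned}
\text{maximize}\quad & \textstyle\sum_{i=1}^{n}\sum_{\ell=1}^{K}\sum_{j=1}^{m} \alpha_\ell\, u_{i,j}\, x_{i,\ell,j} \\
\text{subject to:}\quad
\text{(a)}\quad & \textstyle\sum_{j=1}^{m} y_j = K, \\
\text{(b)}\quad & x_{i,\ell,j} \leq y_j, && i \in [n],\ \ell \in [K],\ j \in [m], \\
\text{(c)}\quad & \textstyle\sum_{j=1}^{m} x_{i,\ell,j} = 1, && i \in [n],\ \ell \in [K], \\
\text{(d)}\quad & \textstyle\sum_{\ell=1}^{K} x_{i,\ell,j} \leq 1, && i \in [n],\ j \in [m], \\
\text{(e)}\quad & \textstyle\sum_{j=1}^{m} u_{i,j}\, x_{i,\ell,j} \geq \sum_{j=1}^{m} u_{i,j}\, x_{i,\ell+1,j}, && i \in [n],\ \ell \in [K-1], \\
\text{(f)}\quad & x_{i,\ell,j} \in \{0,1\}, && i \in [n],\ \ell \in [K],\ j \in [m], \\
\text{(g)}\quad & y_j \in \{0,1\}, && j \in [m].
\end{aligned}
$$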
Proof.

Consider an input instance with agents $1, \ldots, n$ and items $a_1, \ldots, a_m$, where we seek a winner set of size $K$ under OWA $\alpha = (\alpha_1, \ldots, \alpha_K)$. For each $i \in [n]$ and $j \in [m]$, we write $u_{i,j}$ to denote the utility that agent $i$ derives from item $a_j$.

We form an instance of ILP with the following variables: (1) For each $i \in [n]$, $\ell \in [K]$, and $j \in [m]$, there is an indicator variable $x_{i,\ell,j}$ (intuitively, we interpret $x_{i,\ell,j} = 1$ to mean that for agent $i$, item $a_j$ is the $\ell$-th most preferred one among those selected for the solution). (2) For each $j \in [m]$, there is an indicator variable $y_j$ (intuitively, we interpret $y_j = 1$ to mean that $a_j$ is included in the solution). Given these variables (and assuming that we enforce their intuitive meaning), the goal of our ILP is to maximize the function $\sum_{i=1}^{n}\sum_{\ell=1}^{K}\sum_{j=1}^{m} \alpha_\ell\, u_{i,j}\, x_{i,\ell,j}$.

We require that our variables are indeed indicator variables and, thus, take values from the set $\{0,1\}$ only (constraints (f) and (g)). We require that the variables of the form $x_{i,\ell,j}$ are internally consistent: constraint (c) says that each agent ranks exactly one of the items from the solution as $\ell$-th best, and constraint (d) says that there is no agent $i$ and item $a_j$ such that $i$ views $a_j$ as ranked on two different positions among the items from the solution. Then, we require that the variables of the form $x_{i,\ell,j}$ are consistent with those of the form $y_j$ (constraint (b)) and that exactly $K$ items are selected for the solution (constraint (a)).

Our final constraint, constraint (e), requires that, for each agent, the variables $x_{i,\ell,j}$ indeed sort the items from the solution in nonincreasing order of utility. We mention that constraint (e) is necessary only for OWAs that are not nonincreasing. For a nonincreasing $\alpha$, an optimal solution for our ILP already ensures the correct “sorting”: pairing larger weights with larger utilities can only increase the objective, so otherwise our goal function would not be maximized. ∎
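The role of constraint (e) can be checked for a single agent. For a fixed winner set, dropping (e) lets the ILP pick any assignment of positions to items, so its objective becomes the maximum over all orderings. The Python sketch below compares that maximum with the true OWA value; the utilities are hypothetical, and the harmonic OWA is scaled by 6 only to keep integer arithmetic.

```python
from itertools import permutations

def owa_value(alpha, values):
    """True OWA value: the values are sorted in nonincreasing order first."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

def best_assignment(alpha, values):
    """Largest objective the ILP could reach for a fixed winner set if
    constraint (e) were dropped: any position-to-item bijection is allowed."""
    return max(sum(a * v for a, v in zip(alpha, perm)) for perm in permutations(values))

values = [2, 9, 5]            # one agent's utilities for a 3-item winner set
harmonic = [6, 3, 2]          # nonincreasing (the harmonic OWA scaled by 6)
two_med = [0, 1, 0]           # 2-med OWA: not nonincreasing

print(best_assignment(harmonic, values), owa_value(harmonic, values))  # 73 73
print(best_assignment(two_med, values), owa_value(two_med, values))    # 9 5
```

For the nonincreasing vector the two values coincide (by the rearrangement inequality), whereas for the 2-med OWA the relaxed objective overstates the agent's utility. This is why (e) is needed in that case.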

We should note that linear-programming formulations of OWA-based optimization problems appeared in the literature well before our work; see, for example, the paper of Ogryczak and Śliwiński [46]. Yet, we use the OWA operators in a very different way and, thus, our approach is different. (In essence, Ogryczak and Śliwiński use an OWA operator to aggregate a number of values, whereas we use a simple sum to aggregate the agents’ perceived utilities, but we compute these perceived utilities by applying an OWA operator to each agent’s individual, intrinsic utilities.)

6 Approximation: General Utilities and Approval Utilities

The OWA-Winner problem is particularly well-suited for applications that involve recommendation systems (see, e.g., the work of Lu and Boutilier [39] for a discussion of 1-best-OWA-Winner in this context). For recommendation systems it often suffices to find good approximate solutions instead of perfect, exact ones, especially if we only have estimates of agents’ utilities. It turns out that the quality of the approximate solutions that we can produce for $\alpha$-OWA-Winner depends very strongly both on the properties of the particular family of OWAs used and on the nature of agents’ utilities.

First, we show that as long as our OWA is nonincreasing, a simple greedy algorithm achieves a $(1 - 1/e)$ approximation ratio. This result follows by showing that for a nonincreasing OWA $\alpha$, the utility function $u_\alpha$ (recall Definition 1) is submodular and nondecreasing, and by applying the famous result of Nemhauser et al. [44].

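Notation: $\alpha^{(j)}$ denotes the input OWA operator $\alpha$, restricted to its top $j$ entries.
$W \leftarrow \emptyset$;
for $j \leftarrow 1$ to $K$ do
       foreach $a \in A \setminus W$ do
             $\mathit{gain}(a) \leftarrow u_{\alpha^{(j)}}(W \cup \{a\}) - u_{\alpha^{(j-1)}}(W)$;
       $W \leftarrow W \cup \{a^*\}$, where $a^* \in A \setminus W$ maximizes $\mathit{gain}(a^*)$;
return $W$;
Algorithm 1 The greedy algorithm for finding the utilitarian set of winners.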

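Recall that if $A$ is some set and $f$ is a function $f\colon 2^A \to \mathbb{R}$, then we say that: (1) $f$ is submodular if for each $X$ and $Y$, $X \subseteq Y \subseteq A$, and each $a \in A \setminus Y$ it holds that:

$$f(X \cup \{a\}) - f(X) \geq f(Y \cup \{a\}) - f(Y),$$

and (2) $f$ is nondecreasing if for each $X \subseteq A$ and each $a \in A$ it holds that $f(X \cup \{a\}) \geq f(X)$.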

Lemma 12.

Let $I$ be an instance of OWA-Winner with a nonincreasing OWA $\alpha$. The function $u_\alpha$ is submodular and nondecreasing.

Proof.

Let $I$ be an instance of OWA-Winner with agent set $N$, item set $A$, desired solution size $K$, and OWA $\alpha = (\alpha_1, \ldots, \alpha_K)$. For each agent $i \in N$ and each item $a \in A$, $u_i(a)$ is the nonnegative utility that $i$ derives from $a$.

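Since all the utilities and all the entries of the OWA vector are nonnegative, we note that $u_\alpha$ is nondecreasing. To show submodularity, we decompose $u_\alpha$ as follows (with $\alpha_{K+1} = 0$):

$$u_\alpha(W) = \sum_{i \in N} \sum_{\ell=1}^{K} \alpha_\ell \cdot u_i\big(\ell\text{-th best item of } W \text{ for } i\big) = \sum_{i \in N} \sum_{\ell=1}^{K} (\alpha_\ell - \alpha_{\ell+1}) \sum_{a \in \mathrm{top}_{i,\ell}(W)} u_i(a).$$

For each $i \in N$, $\ell \in [K]$, and $W \subseteq A$, let $\mathrm{top}_{i,\ell}(W)$ be the set of those $\ell$ items from $W$ whose utility, from the point of view of agent $i$, is highest (we break ties in an arbitrary way; if $|W| < \ell$, then $\mathrm{top}_{i,\ell}(W) = W$). Since $\alpha$ is nonincreasing, each coefficient $\alpha_\ell - \alpha_{\ell+1}$ is nonnegative. As nonnegative linear combinations of submodular functions are submodular, it suffices to prove that for each $i \in N$ and each $\ell \in [K]$, the function $f_{i,\ell}(W) = \sum_{a \in \mathrm{top}_{i,\ell}(W)} u_i(a)$ is submodular.

To show submodularity of $f_{i,\ell}$, consider two sets, $X$ and $Y$, $X \subseteq Y \subseteq A$, and some $a \in A \setminus Y$. We claim that:

$$f_{i,\ell}(X \cup \{a\}) - f_{i,\ell}(X) \geq f_{i,\ell}(Y \cup \{a\}) - f_{i,\ell}(Y). \qquad (1)$$

Let $x$ and $y$ denote the utilities that the $i$-th agent has for the $\ell$-th best items from $X$ and $Y$, respectively (or 0 if a given set has fewer than $\ell$ elements). Of course, $x \leq y$. Let $z = u_i(a)$ denote the $i$-th agent’s utility for $a$. Adding $a$ increases $f_{i,\ell}$ by exactly $\max(0, z - x)$ for $X$ and by exactly $\max(0, z - y)$ for $Y$. We consider two cases. If $z \leq x$, then both sides of (1) have value 0. Otherwise:

$$f_{i,\ell}(X \cup \{a\}) - f_{i,\ell}(X) = z - x \geq \max(0, z - y) = f_{i,\ell}(Y \cup \{a\}) - f_{i,\ell}(Y),$$

which proves (1) and completes the proof. ∎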
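Both steps of this proof can be checked numerically for one agent. The Python sketch below verifies the decomposition into weighted top-$\ell$ sums for every set of at most $K$ items, and then counts violations of inequality (1) over all pairs $X \subseteq Y$ with $|Y| < K$. The OWA vector, the agent's utilities, and the helper names are hypothetical.

```python
from itertools import combinations

def owa_value(alpha, values):
    """OWA with alpha truncated to len(values) entries (sets smaller than K)."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

def top_sum(values, l):
    """f_{i,l}: sum of the l largest values (all values if there are fewer)."""
    return sum(sorted(values, reverse=True)[:l])

def decomposed(alpha, values):
    """sum_l (alpha_l - alpha_{l+1}) * top_sum(values, l), with alpha_{K+1} = 0."""
    w = list(alpha) + [0]
    return sum((w[l] - w[l + 1]) * top_sum(values, l + 1) for l in range(len(alpha)))

alpha = [3, 2, 2, 1]                               # nonincreasing, K = 4
utils = {"a": 7, "b": 4, "c": 4, "d": 1, "e": 0}   # one hypothetical agent
items = sorted(utils)

def f(W):
    return owa_value(alpha, [utils[x] for x in W])

# The decomposition holds for every set of at most K items.
for r in range(len(alpha) + 1):
    for W in combinations(items, r):
        assert f(W) == decomposed(alpha, [utils[x] for x in W])

# Count violations of f(X+item) - f(X) >= f(Y+item) - f(Y) for X subset of Y, |Y| < K.
violations = 0
for r in range(len(alpha)):
    for Y in combinations(items, r):
        for s in range(r + 1):
            for X in combinations(Y, s):
                for item in items:
                    if item not in Y and f(X + (item,)) - f(X) < f(Y + (item,)) - f(Y):
                        violations += 1
print(violations)   # 0
```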

Based on the above result, we can easily show that Algorithm 1 is a polynomial-time $(1 - 1/e)$-approximation algorithm for the OWA-Winner problem with nonincreasing OWA vectors (see Theorem 13 below). Algorithm 1 is a natural incarnation of the greedy algorithm of Nemhauser et al. [44]. It starts by setting the found-so-far solution $W$ to be empty. Then, in each of the $K$ iterations, it extends $W$ by adding the item that causes the greatest increase in utility.
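A minimal Python rendering of Algorithm 1 follows. Evaluating a partial set with `zip`, which truncates $\alpha$ to $|W|$ entries, plays the role of $\alpha^{(j)}$. The utility matrix and the function names are illustrative assumptions, and ties are broken toward the lowest item index, which is one arbitrary choice among many.

```python
def owa_value(alpha, values):
    """Use only the top len(values) entries of alpha, i.e. alpha restricted to |W|."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

def total(utilities, W, alpha):
    """Utilitarian utility of the item list W; utilities[i][j] is agent i's utility for item j."""
    return sum(owa_value(alpha, [row[j] for j in W]) for row in utilities)

def greedy_winners(utilities, K, alpha):
    """K rounds; each round adds the item with the largest marginal gain."""
    m = len(utilities[0])
    W = []
    for _ in range(K):
        current = total(utilities, W, alpha)
        best_item, best_gain = None, float("-inf")
        for j in range(m):
            if j in W:
                continue
            gain = total(utilities, W + [j], alpha) - current
            if gain > best_gain:        # ties go to the lowest index
                best_item, best_gain = j, gain
        W.append(best_item)
    return W

utilities = [
    [3, 0, 1, 2],
    [0, 3, 2, 1],
    [1, 2, 3, 0],
]
W = greedy_winners(utilities, 2, [1, 0.5])
print(W, total(utilities, W, [1, 0.5]))   # [2, 0] 9.0
```

On this toy instance the greedy set happens to be optimal: exhaustive search also gives 9.0. In general, Theorem 13 only guarantees a $1 - 1/e$ fraction of the optimum for nonincreasing OWAs.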

Example 4.

Let the items and agents be as in Example 3. Let and consider OWA vector . Throughout the iterations, we obtain the following gain values (the contents of are given at the beginning of each iteration; below we also explain some of the computation):

At the beginning of the first iteration and the algorithm simply computes the utility of each item separately, using OWA operator . For example, . In the first iteration both and lead to the highest gain and, so, the algorithm is free to pick either of them. We assume it picks . In the second iteration, we have and, for example, the gain value for is computed as:

It is the highest gain value and so the algorithm includes in the solution. In the third iteration, item has the highest gain and so the algorithm includes it in . Finally, the algorithm outputs .

Theorem 13.

For a nonincreasing OWA $\alpha$, Algorithm 1 is a polynomial-time $(1 - 1/e)$-approximation algorithm for the problem of finding the utilitarian set of winners.

Proof.

The theorem follows from Lemma 12 and from the results of Nemhauser et al. [44] on maximizing nondecreasing submodular functions subject to a cardinality constraint (note that the utility of the empty set is 0). ∎

Algorithm 1 has an interesting interpretation in the context of voting systems. This greedy algorithm can be viewed not only as an approximation algorithm, but also as a new iterative voting rule. Indeed, many popular voting rules are defined as iterative (greedy) algorithms. Such rules are not only polynomial-time computable, but also easier for society to understand. Further, Caragiannis et al. [17] and, later, Elkind et al. [22] advocate viewing approximation algorithms for computationally hard voting rules as new election systems, and study their axiomatic properties (often showing that they are better than those of the original rules).

Here we give another interesting observation. It turns out that the algorithm from Theorem 13, when applied to approval-based utilities and the harmonic OWA, is simply the winner determination procedure for the Sequential Proportional Approval Voting rule [11] (developed by the Danish astronomer and mathematician Thorvald N. Thiele, and used for a short period in Sweden during the early 1900s). That is, the Sequential Proportional Approval Voting rule is simply an approximation of the Proportional Approval Voting (PAV) rule, whose winner determination problem is exactly OWA-Winner with approval-based utilities and the harmonic OWA. We believe that this observation provides further evidence that approximation algorithms for computationally hard voting rules can indeed be viewed as new full-fledged voting rules. (We point readers interested in approval-based multiwinner voting rules to the overview of Kilgour [35] and to the works of Aziz et al. [5, 4], Elkind and Lackner [24], and Skowron and Faliszewski [52].)
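The equivalence can be seen in code: with 0/1 utilities and the harmonic OWA, a voter who already approves $r$ winners gains exactly $1/(r+1)$ when another approved candidate is added. The Python sketch below implements the greedy step directly and, separately, the usual sequential PAV scoring, then compares them on made-up ballots. Exact `Fraction` arithmetic is used so that ties are not affected by rounding; the ballots and function names are hypothetical.

```python
from fractions import Fraction

def harmonic_owa(values):
    """Harmonic OWA (1, 1/2, 1/3, ...) applied to 0/1 approval utilities."""
    ranked = sorted(values, reverse=True)
    return sum(Fraction(v, l + 1) for l, v in enumerate(ranked))

def greedy_harmonic(ballots, candidates, K):
    """Greedy step of Algorithm 1 with approval utilities and the harmonic OWA."""
    W = []
    for _ in range(K):
        def gain(c):
            return sum(
                harmonic_owa([int(x in b) for x in W + [c]])
                - harmonic_owa([int(x in b) for x in W])
                for b in ballots
            )
        W.append(max((c for c in candidates if c not in W), key=gain))
    return W

def spav(ballots, candidates, K):
    """Sequential PAV: a voter who already approves r winners adds 1/(r + 1)."""
    W = []
    for _ in range(K):
        def score(c):
            return sum(Fraction(1, 1 + sum(x in b for x in W)) for b in ballots if c in b)
        W.append(max((c for c in candidates if c not in W), key=score))
    return W

ballots = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"c"}]
print(greedy_harmonic(ballots, ["a", "b", "c"], 2), spav(ballots, ["a", "b", "c"], 2))
# ['a', 'c'] ['a', 'c']
```

Plain approval voting would pick $\{a, b\}$ here (scores 4, 3, 2); both sequential procedures instead pick $\{a, c\}$, giving the two voters who approve only $c$ a representative.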

Is a $(1 - 1/e)$-approximation algorithm a good result? After all, $1 - 1/e \approx 0.63$, and so the algorithm guarantees only about 63% of the maximum possible satisfaction for the agents. Irrespective of whether one views it as sufficient or not, this is the best possible approximation ratio of a polynomial-time algorithm for (unrestricted) OWA-Winner with a nonincreasing OWA. The reason is that 1-best-OWA-Winner with approval-based utilities is, in essence, another name for the MaxCover problem, and if P ≠ NP, then $1 - 1/e$ is the approximation upper bound for MaxCover [25]. We omit the exact details of the connection between MaxCover and 1-best-OWA-Winner and instead point the readers to the work of Skowron and Faliszewski [52], who discuss this point in detail (we mention that they refer to what we call 1-best-OWA-Winner as winner determination for Chamberlin–Courant’s voting rule).
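The MaxCover connection is easy to see on a small case. With approval utilities and the 1-best OWA, an agent gets utility 1 exactly when at least one winner is approved, so the utilitarian value is the number of agents covered by the chosen items, where each item covers the agents who approve it. A short Python check with made-up ballots and hypothetical function names:

```python
def one_best_utility(ballots, W):
    """1-best OWA (1, 0, ..., 0): each agent counts only its best item in W."""
    return sum(max((int(c in b) for c in W), default=0) for b in ballots)

def coverage(ballots, W):
    """MaxCover view: item c covers the agents who approve it."""
    return sum(1 for b in ballots if b & set(W))

ballots = [{"a", "b"}, {"b"}, {"c"}, {"a", "d"}, set()]
for W in (["a"], ["b", "c"], ["a", "c"], ["d"]):
    print(W, one_best_utility(ballots, W), coverage(ballots, W))
```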

For OWAs that are not nonincreasing, it seems that we cannot even hope for a $(1 - 1/e)$-approximation algorithm. There are two arguments to support this belief. First, such OWAs yield utility functions that are not necessarily submodular and, so, it is impossible to apply the result of Nemhauser et al. [44]. As an example, we show that a $k$-med OWA (with $k \geq 2$) can yield a utility function that is not submodular.

Example 5.

Let us consider a single agent, two sets of items and (of course ), and -OWA . The utilities of the agent over the items , , , and are equal to 10, 9, 2, and 1, respectively. We get:

That is, is not submodular. Indeed, this example works even for approval-based utilities: it suffices to set the utilities for and to be , and for and to be .
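The failure of submodularity can be checked mechanically. The Python sketch below reuses the utilities 10, 9, 2, and 1 from the example, but the item names `a`, `b`, `c`, `d`, the choice of $K = 3$ with the 2-med OWA $(0, 1, 0)$, and the sets $X \subseteq Y$ are assumptions made for illustration. The approval variant sets the utilities of `a` and `b` to 1 and those of `c` and `d` to 0.

```python
def owa_value(alpha, values):
    """Apply alpha (truncated to len(values)) to values sorted nonincreasingly."""
    return sum(a * v for a, v in zip(alpha, sorted(values, reverse=True)))

TWO_MED = [0, 1, 0]   # assumed: 2-med OWA for winner sets of size K = 3

def marginal_gains(utils, X, Y, item):
    """Return (f(X + item) - f(X), f(Y + item) - f(Y)) for X a subset of Y."""
    def f(S):
        return owa_value(TWO_MED, [utils[x] for x in S])
    return f(X | {item}) - f(X), f(Y | {item}) - f(Y)

X, Y = {"c"}, {"a", "c"}          # hypothetical sets with X a subset of Y
print(marginal_gains({"a": 10, "b": 9, "c": 2, "d": 1}, X, Y, "b"))  # (2, 7)
print(marginal_gains({"a": 1, "b": 1, "c": 0, "d": 0}, X, Y, "b"))   # (0, 1)
```

In both cases the gain from adding `b` is larger for the bigger set $Y$, which submodularity forbids.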

Second, it is quite plausible that there are no constant-factor approximation algorithms for many OWAs that are not nonincreasing. As an example, let us consider families of OWAs with the following structure: their first entries are zeros, followed by some nonzero entry at a sufficiently early position. If there were a good approximation algorithm for winner determination under such OWAs, then there would be a good approximation algorithm for the Densest-k-Subgraph problem, which seems unlikely.

Definition 14.

In the Densest-k-Subgraph problem we are given an undirected graph $G = (V, E)$ and a positive integer $k$. We ask for a subgraph of $G$ with $k$ vertices that has the maximum number of edges.

Theorem 15.

Fix some integer ,