Recognizing Single-Peaked Preferences on an Arbitrary Graph: Complexity and Algorithms

04/11/2020 ∙ by Bruno Escoffier, et al. ∙ Laboratoire d'Informatique de Paris 6

This paper is devoted to a study of single-peakedness on arbitrary graphs. Given a collection of preferences (rankings of a set of alternatives), we aim at determining a connected graph G on which the preferences are single-peaked, in the sense that all the preferences are traversals of G. Note that a collection of preferences is always single-peaked on the complete graph. We propose an Integer Linear Programming formulation (ILP) of the problem of minimizing the number of edges in G or the maximum degree of a vertex in G. We prove that both problems are NP-hard in the general case. However, we show that if the optimal number of edges is m-1 (where m is the number of candidates), then any extremal optimal solution of the continuous relaxation of the ILP is integral, and thus the integrality constraints can be relaxed. This provides an alternative proof of the polynomial-time complexity of recognizing single-peaked preferences on a tree. We prove the same result for the case of a path (an axis), providing here also an alternative proof of polynomiality of the recognition problem. Furthermore, we provide a polynomial-time procedure to recognize single-peaked preferences on a pseudotree (a connected graph that contains at most one cycle). We also give some experimental results, both on real and synthetic datasets.


1 Introduction

Aggregating the preferences of multiple agents is a primary task in many applications of artificial intelligence, e.g., in preference learning [8, 9] or in recommender systems [2, 20]. The individual preferences of the agents are often represented as rankings on a set of alternatives, where the alternatives may be cultural products (books, songs, movies…), technological products, candidates for an election, etc. The aim of preference aggregation is then to produce an aggregate ranking from a collection of rankings (called a preference profile).

The preferences are said to be structured if they share some common structure [12]. For example, in a political context, it is conventional to assume that each individual preference decreases as one moves away from the preferred candidate along a left-right axis over the candidates, an axis on which all individuals agree. Such preferences are called single-peaked [5]. They have been the subject of much work in social choice theory. The best-known result states that if preferences are single-peaked, then Arrow's impossibility theorem can be escaped. We recall that Arrow's theorem states that, when there are at least three alternatives, any unanimous aggregation function defined on all possible preference profiles, for which the pairwise comparison between two alternatives is independent of irrelevant alternatives, is dictatorial. Furthermore, from the computational viewpoint, many NP-hard social choice problems (e.g., Kemeny rule and Young rule for rank aggregation [6], Chamberlin-Courant rule for proportional representation [4]) become polynomially solvable if the preferences are single-peaked.

Given these axiomatic and algorithmic consequences, the question of the computational complexity of recognizing single-peaked preferences is natural. Bartholdi and Trick [3] have proposed a polynomial-time algorithm to compute a compact representation of all axes on which a collection of $n$ preferences on $m$ candidates is single-peaked, or to state that none exists. This running time can be reduced if one looks for only one possible axis [14].

Several classes of structured preferences have been proposed in the literature in order to generalize the single-peaked domain with respect to an axis, i.e., a path, to more general graphs. Given a set $C$ of candidates, a preference order $\succ$ over $C$ is single-peaked on an undirected graph $G = (C, E)$ if it is a traversal of $G$, i.e., for each $c \in C$ the upper-contour set $\{c' \in C : c' \succeq c\}$ induces a connected subgraph of $G$. A preference profile is then single-peaked on $G$ if every preference in it is single-peaked on $G$. Demange studied single-peakedness on a tree [11]; Peters and Lackner studied single-peakedness on a circle [23].

Some good axiomatic properties remain valid when preferences are single-peaked on a tree: if the number of voters is odd, such profiles still admit a Condorcet winner (a candidate who is preferred over each other candidate by a majority of voters) [11], and returning this Condorcet winner is a strategyproof voting rule. By contrast, every majority relation can be realized by a collection of preferences that are single-peaked on a circle [23], and consequently single-peaked preferences on a circle do not inherit the good axiomatic properties of single-peakedness on an axis regarding voting rules that are based on the majority relation.

The goal of this paper is to study the recognition problem for single-peaked preferences on arbitrary connected graphs. Although one cannot expect social choice theoretic guarantees from single-peakedness on an arbitrary graph (as it does not result in a domain restriction: any preference profile is single-peaked on some graph, e.g., the complete graph), knowing such graphs does provide information, derived from the preferences, about the similarity of candidates/items, which could be used for instance in recommender systems. For instance, assume that one discovers that the preferences over movies are single-peaked w.r.t. an axis (1,2,3,4,5) over the movies. If one knows that an agent likes movies 3 and 5, then it is natural to recommend movie 4 (since 4 lies between 3 and 5 on the axis, single-peakedness implies that the agent prefers 4 to at least one of 3 and 5). More generally, one can take advantage of single-peakedness on a sparse graph in order to make recommendations in the neighbourhood of liked items. Thereby, we focus here on determining a graph that minimizes (1) the number of edges of the graph or (2) the maximum degree of a vertex. This choice is motivated by the fact that these criteria are measures of sparsity of a graph (the sparser the graph, the more informative it is), but also because they generalize known cases such as paths, cycles and trees. Let us indeed emphasize that the mathematical programming approach we propose to identify a graph generalizes the best-known instances of the single-peaked recognition problem and provides a uniform treatment of them, leading to simple polynomial time algorithms.

Our contribution. We propose here an Integer Linear Programming formulation (ILP) of problems (1) and (2), and we show that both of them are NP-hard. Nevertheless, if the optimal value for problem (1) is $m-1$ (where $m$ is the number of candidates), we prove that the optimal basic solutions of the linear program obtained by relaxing the integrality constraints of the ILP are integral. This provides an alternative polynomial time method, based on a simple linear programming solver, to recognize single-peakedness on a tree, as a connected graph with $m$ vertices and $m-1$ edges is a tree. By adding some constraints on the maximum degree of a vertex, we obtain the same result for the case of paths. As a last theoretical result, we prove that preferences single-peaked on a pseudotree (a connected graph that contains at most one cycle) can be recognized in polynomial time. We also provide some experimental results, both on real-world and synthetic datasets, where we measure the density of the graphs depending on the diversity of preferences of voters.

Related work. We briefly describe here some previous contributions that have addressed the concept of single-peakedness on arbitrary graphs, the optimization view of the recognition problem and the use of integer linear programming formulations for computational social choice problems related to structured preferences:
  Nehring and Puppe defined a general notion of single-peaked preferences based on abstract betweenness relations between candidates [19]. In their setting, it is possible to define single-peaked preferences on a graph $G$ by considering the graphic betweenness relation: candidate $b$ is between candidates $a$ and $c$ if and only if $b$ lies on a shortest path between $a$ and $c$ in $G$. A preference profile is then single-peaked on $G$ if, for every preference $\succ$ in the profile, whenever $a$ is the most preferred candidate w.r.t. $\succ$ and $b \neq c$ lies on a shortest path between $a$ and $c$, we have $b \succ c$. This definition enables them to state general results regarding strategyproofness on restricted domains of preferences. Note that this definition of single-peakedness on a graph does not coincide with the one we use.
  Peters and Elkind showed how to compute in polynomial time a compact representation of all trees with respect to which a given profile is single-peaked [22]. This structure allows them to find in polynomial time trees that have, e.g., the minimum degree, diameter, or number of internal nodes among all trees with respect to which a given profile is single-peaked. We provide here alternative proofs for some of these results, based on linear programming arguments.
  Peters recently proposed ILP formulations for proportional representation problems, and showed that the binary constraint matrix is totally unimodular if preferences are single-peaked, because the matrix then has the consecutive ones property [21]. We recall that the vertices of a polyhedron defined by a totally unimodular constraint matrix and an integral right-hand side are all integral; thus solving the linear programming relaxation yields an optimal solution to the original ILP problem. We also rely on linear programming for proving the polynomial time complexity of some of the recognition problems we tackle here.

Organization of the paper. The two optimization variants of the recognition problem tackled in the paper are defined in Section 2. The two problems are proved NP-hard, and an ILP formulation is proposed. In Section 3, we consider the continuous relaxation of the ILP to show how to recognize single-peaked preferences on a path or a tree by linear programming, thus providing a unified view of known polynomiality results. Section 4 is devoted to a new tractable case of recognition problem, namely the recognition of single-peaked preferences on a pseudotree. Finally, the results of numerical tests are presented in Section 5, on both real and synthetic data, to give some hints about the kind of graphs that are returned by the ILP on real data, and to study how the density of the graph varies with the number of voters and the diversity of preferences.

2 ILP Formulation and Complexity

2.1 Problem Definition

We start by recalling some basic terminology of social choice theory. Given a set $C = \{1, \dots, m\}$ of candidates and a set $V$ of $n$ voters, each voter $v \in V$ ranks all candidates from the most to the least preferred one. This ranking is called the preference $\succ_v$ of $v$. It is simply a permutation of $C$, which can be formally described as an $m$-tuple $(c_1^v, \dots, c_m^v)$, where $c_k^v$ is the $k$-th most preferred candidate of voter $v$. The set $P = \{\succ_v : v \in V\}$ of preferences of all voters is called the profile.

As emphasized in the introduction, several definitions of single-peakedness on an arbitrary graph can be found in the literature. In our study, we use the following one [12]:

Definition 2.1.

Single-peakedness on arbitrary graph (SP). Let $C$ be a set of $m$ candidates and $P$ the profile of preferences of $n$ voters. Let $G = (C, E)$ be a connected undirected graph. We say that $P$ is single-peaked on the graph $G$ (SP) if every $\succ_v \in P$ is a traversal of $G$, i.e., for each $v \in V$ and for each $k \in \{1, \dots, m\}$, the subgraph of $G$ induced by the vertices $\{c_1^v, \dots, c_k^v\}$ is connected.

This notion of single-peakedness coincides with the standard definition on an axis/cycle/tree when $G$ is a path/cycle/tree. In this article, when a profile $P$ is single-peaked w.r.t. a graph $G$, for conciseness we will say that $P$ is compatible with $G$ (or that $G$ is compatible with $P$).
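Definition 2.1 can be checked directly. The following Python sketch is our own illustration, not code from the paper: it assumes candidates are integers, each preference is a tuple ordered from most to least preferred, and the edge set of $G$ is a set of frozensets; the helper names (`induced_connected`, `is_traversal`, `is_sp`, `make_edges`) are hypothetical. It tests, for every voter and every $k$, whether the first $k$ candidates induce a connected subgraph.

```python
from collections import deque

def induced_connected(vertices, edges):
    """True if the subgraph induced by `vertices` is connected (BFS)."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in vertices - seen:
            if frozenset((u, w)) in edges:
                seen.add(w)
                queue.append(w)
    return seen == vertices

def is_traversal(pref, edges):
    """Definition 2.1: every prefix c_1, ..., c_k induces a connected subgraph."""
    return all(induced_connected(pref[:k], edges) for k in range(1, len(pref) + 1))

def is_sp(profile, edges):
    """A profile is SP on G if every preference is a traversal of G."""
    return all(is_traversal(pref, edges) for pref in profile)

def make_edges(*pairs):
    return {frozenset(p) for p in pairs}

path = make_edges((1, 2), (2, 3), (3, 4))          # the axis 1-2-3-4
print(is_sp([(2, 1, 3, 4), (3, 4, 2, 1)], path))   # True
print(is_sp([(1, 3, 2, 4)], path))                 # False: {1, 3} is not connected
```

The prefix with $k = m$ also forces $G$ itself to be connected, which matches the requirement in the definition.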

Example 1.

Consider the profile with 4 voters and 5 candidates:

Note that, for each of the four voters, the connectivity constraint applied to his or her two most preferred candidates makes the edge between them necessary in the graph. These four necessary edges form a 4-cycle, so any solution contains this 4-cycle (in particular, the profile is not SP on a tree or on a cycle). One can easily check that adding one appropriate edge yields a graph with 5 edges compatible with the profile, and this is the (unique) optimal solution if we want to minimize the number of edges.

Obviously, any profile is SP on the complete graph. However, this case is not interesting because it does not give any information about the preference structure. That is why we are looking for a minimal graph on which the profile is SP. The notion of minimality needs to be made more precise. In our study, we focus essentially on minimizing the number of graph edges. Another criterion we consider is the minimization of the maximum degree of vertices. Put another way, given a preference profile $P$, we want to determine a graph $G$ on which the profile is SP, so as to minimize either the number of edges of $G$, or its (maximum) degree. We emphasize the fact that:

  • minimizing the number of edges allows us to detect when the profile is compatible with a tree (this occurs iff the minimum number of edges is $m-1$, since $G$ is necessarily connected);

  • minimizing the degree of $G$ allows us to detect when the profile is compatible with a cycle (this occurs iff there exists a compatible graph with maximum degree 2);

  • combining the objectives allows us to detect when the profile is compatible with an axis: this occurs iff there is a compatible graph with maximum degree 2 and $m-1$ edges.

So the tackled problems generalize the best-known (tractable) recognition problems of single-peakedness.
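For very small instances, the three questions listed above can be answered by brute force. The sketch below is our own illustration (function names are hypothetical): it enumerates every edge subset of the complete graph, keeps those on which the profile is SP, and reports the minimum number of edges, the minimum maximum degree, and whether some compatible graph is a path. It uses an equivalent form of Definition 2.1 in which each candidate must be adjacent to at least one candidate ranked before it. The search is exponential in $m(m-1)/2$ and the cycle test assumes $m \ge 3$.

```python
from itertools import combinations

def is_sp(profile, edges):
    # Equivalent form of Definition 2.1: every candidate after the first one
    # must be adjacent to at least one candidate ranked before it.
    return all(any(frozenset((pref[k], pref[l])) in edges for l in range(k))
               for pref in profile for k in range(1, len(pref)))

def classify(profile):
    """Exhaustive search over all graphs on the candidates (tiny m only)."""
    cands = sorted(profile[0])
    m = len(cands)
    pairs = [frozenset(p) for p in combinations(cands, 2)]
    min_edges = min_degree = None
    path_like = False  # some compatible graph with m-1 edges and max degree <= 2
    for r in range(len(pairs) + 1):
        for subset in combinations(pairs, r):
            edges = set(subset)
            if not is_sp(profile, edges):
                continue
            degree = max(sum(c in e for e in edges) for c in cands)
            if min_edges is None:
                min_edges = r  # r increases, so the first hit is minimal
            if min_degree is None or degree < min_degree:
                min_degree = degree
            if r == m - 1 and degree <= 2:
                path_like = True
    return {"min_edges": min_edges, "min_degree": min_degree,
            "tree": min_edges == m - 1, "cycle": min_degree <= 2,
            "path": path_like}

print(classify([(1, 2, 3, 4), (4, 3, 2, 1)]))      # tree, path and cycle all True
print(classify([(1, 2, 3), (2, 3, 1), (3, 1, 2)]))  # only the cycle test succeeds
```

In the second profile, the top two candidates of the three voters force the whole triangle, so the profile is compatible with a cycle but not with any tree.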

2.2 ILP Formulation

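We now present an ILP formulation of the tackled problems. We are looking for a graph $G = (C, E)$ with $m$ vertices. For each pair $\{i, j\}$ of vertices, we define a binary variable $x_{ij}$ (with $x_{ji} = x_{ij}$) such that

$$x_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E,\\ 0 & \text{otherwise.} \end{cases}$$

Hence, if we are minimizing the number of graph edges, the objective function takes the form

$$f(x) = \sum_{1 \le i < j \le m} x_{ij}.$$

If the objective is to minimize the maximum degree, then

$$f(x) = \max_{i \in C} \sum_{j \in C \setminus \{i\}} x_{ij}.$$

In this latter case, the classical way of linearizing $f$ in a minimization setting is to introduce an auxiliary variable $z$ as follows:

$$\min z \quad \text{s.t.} \quad z \ge \sum_{j \in C \setminus \{i\}} x_{ij} \qquad \forall i \in C.$$

Regardless of the objective function, the other constraints of the problem remain the same. Each $\succ_v$ for $v \in V$ has to be a graph traversal. In other words, for each $k \in \{2, \dots, m\}$, $c_k^v$ is connected to at least one of the vertices $c_1^v, \dots, c_{k-1}^v$. In terms of LP constraints, it is formulated as follows:

$$\sum_{l=1}^{k-1} x_{c_k^v c_l^v} \ge 1 \qquad \forall v \in V,\ \forall k \in \{2, \dots, m\}.$$

To sum up, the ILP formulation of the tackled problems is (when minimizing the maximum degree, $f(x)$ is replaced by $z$ together with the linearization constraints above):

$$\begin{aligned} \min\ & f(x)\\ \text{s.t.}\ & \sum_{l=1}^{k-1} x_{c_k^v c_l^v} \ge 1 && \forall v \in V,\ \forall k \in \{2, \dots, m\}\\ & x_{ij} \in \{0, 1\} && \forall\, 1 \le i < j \le m \end{aligned}$$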
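The covering constraints of this ILP can be generated mechanically. The following Python sketch is an illustration with hypothetical helper names, not the authors' implementation: a (possibly fractional) solution $x$ is a dictionary from unordered pairs to values in $[0, 1]$, one constraint is built per voter $v$ and rank $k \ge 2$, and both objective functions are evaluated.

```python
def traversal_constraints(profile):
    """One constraint per voter v and rank k >= 2: the pairs {c_k, c_l}, l < k,
    whose variables must sum to at least 1."""
    return [{frozenset((pref[k], pref[l])) for l in range(k)}
            for pref in profile for k in range(1, len(pref))]

def is_feasible(x, constraints, tol=1e-9):
    """x maps frozenset pairs to values in [0, 1]; missing pairs count as 0."""
    return all(sum(x.get(e, 0.0) for e in con) >= 1 - tol for con in constraints)

def num_edges(x):
    """Objective (number of edges): sum of all x_ij."""
    return sum(x.values())

def max_degree(x, candidates):
    """Objective (maximum degree): max over i of sum_j x_ij."""
    return max(sum(val for e, val in x.items() if c in e) for c in candidates)

profile = [(1, 2, 3, 4), (4, 3, 2, 1)]
cons = traversal_constraints(profile)
x_path = {frozenset(p): 1 for p in [(1, 2), (2, 3), (3, 4)]}
print(len(cons), is_feasible(x_path, cons), num_edges(x_path), max_degree(x_path, [1, 2, 3, 4]))
# 6 True 3 2
```

Each pair $\{i, j\}$ appears in exactly one constraint of each voter (the one associated with whichever of $i$ and $j$ that voter ranks lower), so a single voter contributes $m-1$ constraints that together cover all $m(m-1)/2$ variables.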
Example 2.

Consider profile . The ILP formulation of the problem of determining a graph compatible with while minimizing the number of graph edges is:

s.t.

2.3 Minimizing the Number of Edges

In this section, we study the computational complexity of the problem of minimizing the number of edges of $G$. As a first observation, note that we cannot use the continuous relaxation of this ILP ($x_{ij} \in [0, 1]$) to solve the problem. The following example indeed shows that the optimal solution (when minimizing the number of edges) of this relaxation is not necessarily integral:

Example 3.

Consider the profile with 3 voters and 4 candidates:

From the two first options of each voter, we see immediately that three edges (one per voter) are necessarily present in the graph. Then, we observe that one of the vertices needs to be connected to at least one vertex of each of three different pairs of vertices, no vertex being common to all three pairs. Consequently, any integer solution of the problem will be a graph with at least 5 edges. However, there exists a fractional solution of the continuous relaxation with value 4.5:

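Figure: a fractional solution of value 4.5 on candidates 1, 2, 3 and 4 (three edges with value 1 and three edges with value 0.5).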
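The integrality gap can be reproduced on a hypothetical profile of our own (not necessarily the one of Example 3) with the structure described above: the top two candidates of the three voters force the triangle on $\{1, 2, 3\}$, and candidate 4, ranked third by every voter, must be adjacent to one vertex of each pair $\{1, 2\}$, $\{2, 3\}$ and $\{1, 3\}$. This Python sketch finds the best integer solution by exhaustive search and checks that the half-integral point is feasible with value 4.5.

```python
from itertools import combinations

def traversal_constraints(profile):
    return [{frozenset((pref[k], pref[l])) for l in range(k)}
            for pref in profile for k in range(1, len(pref))]

def feasible(x, constraints):
    return all(sum(x.get(e, 0) for e in con) >= 1 - 1e-9 for con in constraints)

# Hypothetical profile: top-two pairs {1,2}, {2,3}, {1,3}; candidate 4 ranked third.
profile = [(1, 2, 4, 3), (2, 3, 4, 1), (1, 3, 4, 2)]
cons = traversal_constraints(profile)
pairs = [frozenset(p) for p in combinations(range(1, 5), 2)]

# Best integer solution by exhaustive search over the 2^6 graphs.
best_int = min(r for r in range(len(pairs) + 1)
               for subset in combinations(pairs, r)
               if feasible(dict.fromkeys(subset, 1), cons))

# Fractional solution: triangle at value 1, the three edges at candidate 4 at 0.5.
frac = {e: (0.5 if 4 in e else 1.0) for e in pairs}
print(best_int, feasible(frac, cons), sum(frac.values()))  # 5 True 4.5
```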

We now show that the problem is actually NP-hard.

Theorem 2.1.

Given a preference profile $P$, it is NP-hard to find a graph compatible with $P$ with a minimum number of edges.

Proof.

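We use a polynomial time reduction from the set cover problem, known to be NP-hard [16]. We recall its definition:
Set cover problem: Given a finite set $U = \{e_1, \dots, e_q\}$ of elements, a set $\mathcal{S} = \{S_1, \dots, S_p\}$ of subsets of $U$ and an integer $k$, the question is to determine if there exists a subset $\mathcal{S}' \subseteq \mathcal{S}$ of size $k$ such that $\bigcup_{S \in \mathcal{S}'} S = U$.
From an instance $(U, \mathcal{S}, k)$ of the set cover problem, we define a preference profile $P$ as follows:

Let $C = \{S_1, \dots, S_p, w\}$ be a set of $p+1$ candidates: one candidate per subset, plus an additional candidate $w$.

Let $\{v_1, \dots, v_q\}$ be a set of voters. Let $S_{i_1}, \dots, S_{i_r}$ be the subsets in $\mathcal{S}$ containing element $e_i$, and $S_{j_1}, \dots, S_{j_{p-r}}$ the other subsets in $\mathcal{S}$. Then, the preference of voter $v_i$ is defined as

$$S_{i_1} \succ_{v_i} \dots \succ_{v_i} S_{i_r} \succ_{v_i} w \succ_{v_i} S_{j_1} \succ_{v_i} \dots \succ_{v_i} S_{j_{p-r}}.$$

We add voters $v_{jl}$, $1 \le j < l \le p$, such that $S_j$ and $S_l$ are ranked first and second by $v_{jl}$, followed by the other subsets (in any order) and finally by $w$.

We prove that there exists a set cover of size $k$ if and only if there exists a graph compatible with $P$ that has $p(p-1)/2 + k$ edges.

Let $\mathcal{S}'$ be a set cover solution of size $k$. We generate a graph $G$ compatible with $P$ in the following manner:

  • For each $1 \le j < l \le p$, the edge $\{S_j, S_l\}$ is in $G$: this is necessary for the preferences of the voters $v_{jl}$ to be SP on $G$.

  • For each $S_j \in \mathcal{S}$, the edge $\{S_j, w\}$ is in $G$ if and only if $S_j \in \mathcal{S}'$.

Hence, the subgraph formed by the vertices $S_1, \dots, S_p$ is a clique having $p(p-1)/2$ edges, and there are exactly $k$ more edges adjacent to $w$; in total, $G$ has $p(p-1)/2 + k$ edges. As $k \ge 1$, the graph is connected and all preferences of the voters $v_{jl}$ are SP on $G$. Let $\succ_{v_i}$ be one of the preferences of the voters $v_1, \dots, v_q$. We need to prove that $w$ is connected to at least one of the vertices $S_{i_1}, \dots, S_{i_r}$ (the subsets ranked after $w$ are then connected to the clique). As the sets $S_{i_1}, \dots, S_{i_r}$ are the only sets of $\mathcal{S}$ containing the element $e_i$, and as $\mathcal{S}'$ is a solution of the set cover instance, this holds because some set of $\mathcal{S}'$ contains $e_i$ and is adjacent to $w$. So, $G$ is a graph compatible with $P$ that has $p(p-1)/2 + k$ edges.

To prove the other implication, let $G$ be a graph compatible with $P$ that has $p(p-1)/2 + k$ edges. As $G$ is compatible with $P$, the subgraph induced by the set of vertices $\{S_1, \dots, S_p\}$ must be a clique so that the preferences of the voters $v_{jl}$ are SP on $G$. Hence, this subgraph contains $p(p-1)/2$ edges, and so there are exactly $k$ edges adjacent to $w$. Let us define $\mathcal{S}'$ as containing $S_j$ iff $S_j$ is adjacent to $w$ in $G$. As $G$ is compatible with $P$, each preference $\succ_{v_i}$ is SP on $G$. It means that at least one of $S_{i_1}, \dots, S_{i_r}$ is adjacent to $w$, and so is in $\mathcal{S}'$. As all these sets contain $e_i$, there is an element of $\mathcal{S}'$ that covers $e_i$. The subset $\mathcal{S}'$ is thus a solution of size $k$ of the set cover instance. ∎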
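The reduction can be sanity-checked on tiny instances. The Python sketch below builds the profile and compares, by exhaustive search, the minimum number of edges with $p(p-1)/2$ plus the minimum cover size. It follows the construction as we read it: candidates are the subset indices $0, \dots, p-1$ plus an extra candidate named `"w"` (a name we chose); each element yields a voter ranking the subsets that contain it, then `w`, then the remaining subsets; each pair of subsets yields a voter ranking that pair first, then the other subsets, then `w`. Ties inside each group are broken by index, which is our assumption.

```python
from itertools import combinations

def is_sp(profile, edges):
    # Each candidate after the first must be adjacent to an earlier candidate.
    return all(any(frozenset((pref[k], pref[l])) in edges for l in range(k))
               for pref in profile for k in range(1, len(pref)))

def reduction_profile(universe, subsets):
    """Candidates: subset indices 0..p-1 and the extra candidate "w"."""
    p = len(subsets)
    profile = []
    for e in universe:                       # one voter per element
        inside = [j for j in range(p) if e in subsets[j]]
        outside = [j for j in range(p) if e not in subsets[j]]
        profile.append(tuple(inside + ["w"] + outside))
    for j, l in combinations(range(p), 2):   # one voter per pair of subsets
        rest = [i for i in range(p) if i not in (j, l)]
        profile.append(tuple([j, l] + rest + ["w"]))
    return profile

def min_edges(profile, cands):
    pairs = [frozenset(q) for q in combinations(cands, 2)]
    for r in range(len(pairs) + 1):
        if any(is_sp(profile, set(s)) for s in combinations(pairs, r)):
            return r

def min_cover(universe, subsets):
    for size in range(1, len(subsets) + 1):
        if any(set().union(*chosen) >= set(universe)
               for chosen in combinations(subsets, size)):
            return size

universe, subsets = [1, 2, 3], [{1, 2}, {2, 3}, {3}]
p = len(subsets)
prof = reduction_profile(universe, subsets)
cands = list(range(p)) + ["w"]
print(min_edges(prof, cands), p * (p - 1) // 2 + min_cover(universe, subsets))  # 5 5
```

The exhaustive search is only meant to confirm the edge count on toy instances; it plays no role in the hardness argument.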

2.4 Minimizing the Maximum Degree

We now consider our second objective function, namely the maximum degree of a vertex in the graph (to be minimized). We obtain similar results.

First, as for the minimization of the number of edges, the continuous relaxation of the ILP formulation proposed in Section 2.2 does not necessarily have an integral optimal solution, as shown by the following example:

Example 4.

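Consider a profile with 3 candidates and one voter with ranking $1 \succ 2 \succ 3$. The ILP formulation of the problem of determining a graph of minimum (maximum) degree compatible with this profile is:

$$\begin{aligned} \min\ & z\\ \text{s.t.}\ & z \ge x_{12} + x_{13}, \quad z \ge x_{12} + x_{23}, \quad z \ge x_{13} + x_{23}\\ & x_{12} \ge 1\\ & x_{13} + x_{23} \ge 1\\ & x_{12}, x_{13}, x_{23} \in \{0, 1\} \end{aligned}$$

The value of an optimal integer solution is 2 (vertex 1 or vertex 2 gets degree 2), but there exists a fractional solution of the continuous relaxation with value 1.5:

Figure: fractional solution on candidates 1, 2 and 3, with $x_{12} = 1$ and $x_{13} = x_{23} = 0.5$.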
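The gap in Example 4 can be verified by enumerating the $2^3$ integer solutions and evaluating the fractional point. This Python sketch is our own check; it assumes the voter's ranking is $1 \succ 2 \succ 3$ (any ranking whose top two candidates are 1 and 2 yields the same constraints), and pairs are written as sorted tuples.

```python
from itertools import product

pref = (1, 2, 3)            # the single voter's ranking (assumed)
cands = sorted(pref)
pairs = [(1, 2), (1, 3), (2, 3)]

# Traversal constraints: each candidate after the first is adjacent to an earlier one.
cons = [[tuple(sorted((pref[k], pref[l]))) for l in range(k)] for k in range(1, len(pref))]

def feasible(x):
    return all(sum(x[p] for p in con) >= 1 for con in cons)

def max_degree(x):
    return max(sum(v for p, v in x.items() if c in p) for c in cands)

int_opt = min(max_degree(x) for bits in product((0, 1), repeat=3)
              for x in [dict(zip(pairs, bits))] if feasible(x))
frac = {(1, 2): 1.0, (1, 3): 0.5, (2, 3): 0.5}
print(int_opt, feasible(frac), max_degree(frac))   # 2 True 1.5
```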

Here again, we show that the problem of minimizing the degree of $G$ is NP-hard, by a similar reduction sketched below.

Theorem 2.2.

Given a preference profile $P$, it is NP-hard to find a graph compatible with $P$ with minimum (maximum) degree.

Proof.

Let $U$, $\mathcal{S}$, $k$ be an instance of the set cover problem. Consider the profile defined as follows:

Let be a set of candidates.

Let be a set of voters. Let be the subsets in containing the element , and the other subsets in . Then, the preference of voter is defined as

We add voters such that

We add m voters where the preference of is defined as

and a voter with preference .

Note that in any graph compatible with the profile:

  • the form a clique (edge is enforced by voter ),

  • is adjacent to all (due to voter ),

  • is in the graph (due to ).

We call these edges necessary edges. We claim that there exists a set cover of size (at most) iff there is a graph compatible with the profile with degree at most .

Suppose that there is a set cover of size at most . Then, beyond the necessary edges mentioned above, we put an edge iff is in . Then the vertex with maximum degree is , with degree . The graph is compatible with each voter because is a set cover. It is compatible with voter thanks to the necessary edges (and , so is connected as well). It is compatible with and thanks to the necessary edges.

Now suppose that there is a solution with degree at most . In particular, has degree at most , hence is adjacent to at most vertices . The preference of voter imposes that is adjacent to some which contains . In other words, the set of these (at most) sets is a set cover of size at most . ∎

3 Recognition of Trees and Paths

In this section, we focus on tree and path recognition: given a profile $P$, we are looking for a tree (or a path) on which the profile is SP.

Recognizing single-peaked preferences on a tree can be done using the procedure by Trick [24]. As an alternative proof of this result, we show in this section that the continuous relaxation of the ILP formulations given in Section 2.2 can be used to solve this recognition problem in polynomial time: in fact, all extremal optimal solutions are integral (Theorem 3.1). We show in Theorem 3.2 a similar result for the recognition of profiles SP on a path.

We start by recalling Trick’s procedure [24], as we will use it in the proof of the results of the two theorems mentioned above.

Recognition of profiles SP on a tree [24]

Let $P$ be a profile containing the preference lists of $n$ voters over $m$ candidates. Trick's algorithm builds a tree in an iterative way. In each iteration, it identifies one candidate (i.e., graph vertex) which is necessarily a leaf, and it determines the set of vertices this leaf can be connected to. It then deletes this leaf from all preferences, and repeats this process on the modified profile with preferences over $m-1$ candidates. The algorithm stops when only one candidate remains. Let us describe the leaf recognition step in more detail:

Let $c$ be a candidate placed at the last position by at least one voter. Trick shows that, if the preferences are SP on a tree, then $c$ must necessarily be a leaf. More formally, for each $v \in V$, let us denote by $B_v(c)$ the set of candidates ranked better than $c$ by voter $v$ if $c$ is not ranked first by $v$; if $c$ is ranked first by $v$, then $B_v(c)$ is the singleton containing the second most-preferred candidate of $v$. From $B(c) = \bigcap_{v \in V} B_v(c)$, the following conclusions can be drawn:

  • if $B(c) = \emptyset$, there does not exist a tree solution.

  • Otherwise, $B(c)$ is the set of vertices the leaf $c$ can be connected to.
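The leaf-elimination step can be sketched in Python as follows. This is our illustration of the procedure as recalled above, not Trick's original code: it always picks the last candidate of the first voter (any candidate ranked last by some voter would do), computes $B(c)$ as the intersection of the sets $B_v(c)$, and returns `None` as soon as some $B(c)$ is empty. The hypothetical helper `one_tree` attaches each removed leaf to the smallest candidate of its set, and the resulting tree is checked against the profile. The example profile is made up for illustration.

```python
def is_sp(profile, edges):
    # Each candidate after the first must be adjacent to an earlier candidate.
    return all(any(frozenset((pref[k], pref[l])) in edges for l in range(k))
               for pref in profile for k in range(1, len(pref)))

def trick_leaves(profile):
    """Return [(leaf, B(leaf)), ...] in removal order, or None if some B(c) is empty."""
    prefs = [list(p) for p in profile]
    steps = []
    while len(prefs[0]) > 1:
        c = prefs[0][-1]                 # ranked last by the first voter
        b = None
        for pref in prefs:
            i = pref.index(c)
            # B_v(c): candidates ranked before c, or {second candidate} if c is first.
            b_v = set(pref[:i]) if i > 0 else {pref[1]}
            b = b_v if b is None else b & b_v
        if not b:
            return None                  # no compatible tree exists
        steps.append((c, b))
        for pref in prefs:
            pref.remove(c)               # delete the leaf from every preference
    return steps

def one_tree(steps):
    """Attach each removed leaf to the smallest candidate of its B set."""
    return {frozenset((c, min(b))) for c, b in steps}

profile = [(1, 2, 3, 4), (1, 4, 2, 3), (2, 1, 4, 3)]   # made-up example
steps = trick_leaves(profile)
print(steps)                                           # [(4, {1}), (3, {1, 2}), (2, {1})]
print(is_sp(profile, one_tree(steps)))                 # True
print(trick_leaves([(1, 2, 3), (2, 3, 1), (3, 1, 2)])) # None
```

For this made-up profile, vertices 2 and 4 must be attached to vertex 1 while vertex 3 may be attached to 1 or 2, which mirrors the situation of Example 5.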

Example 5.

Consider the profile defined by:

  • The candidate is ranked last by at least one voter; we determine the set :

    The candidate is then deleted from the preference lists, and we continue with the next iteration on the sub-profile , and .

  • The candidate is ranked last by at least one voter; we see that . We continue with the sub-profile , , .

  • We get , and the algorithm stops as we obtain a sub-profile involving only one candidate.

To sum up, we have obtained (first iteration), (second iteration) and (third iteration). Consequently, in any tree compatible with the profile, vertices 2 and 4 have to be connected to vertex 1, and vertex 3 has to be connected to vertex 1 or 2. Hence, there exist two trees on which the profile is SP, and these are:

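Figure: the two trees on vertices 1, 2, 3 and 4. In both, vertices 2 and 4 are adjacent to vertex 1; vertex 3 is adjacent to vertex 1 in one tree and to vertex 2 in the other.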

Using LP to recognize SP preferences on a tree or a path

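Let us consider the following continuous relaxation LP-SP (linear program for single-peakedness) of the ILP introduced in Section 2.2:

$$\begin{aligned} \min\ & \sum_{1 \le i < j \le m} x_{ij}\\ \text{s.t.}\ & \sum_{l=1}^{k-1} x_{c_k^v c_l^v} \ge 1 && \forall v \in V,\ \forall k \in \{2, \dots, m\}\\ & 0 \le x_{ij} \le 1 && \forall\, 1 \le i < j \le m \end{aligned}$$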

We show in Theorem 3.1 that LP-SP can be used to determine in polynomial time, given a profile, whether or not there exists a tree compatible with it.

Theorem 3.1.

If a profile is compatible with a tree, then any extremal optimal solution of LP-SP is integral, i.e., $x_{ij} \in \{0, 1\}$ for any $1 \le i < j \le m$.

Proof.

The proof is based on two properties of optimal solutions of LP-SP when the profile is compatible with a tree. These two properties allow us to reformulate the problem as a maximum flow problem, where there is a bijection between the solutions of LP-SP of value $m-1$ and the (optimal) flows of value $m-1$. The result then comes from the fact that any extremal solution of the flow problem (with integral capacities) is integral [1].

The first property states that all constraints of LP-SP are tight in a solution of value $m-1$.

Property 1. If the optimal value of LP-SP is $m-1$, then all constraints are tight in an optimal solution $x^*$: $\sum_{l=1}^{k-1} x^*_{c_k^v c_l^v} = 1$ for all $v \in V$ and $k \in \{2, \dots, m\}$.

Proof of Property 1. Let $v$ be a voter. There are $m-1$ constraints associated with $v$, and each variable $x_{ij}$ appears in exactly one of these constraints (the one associated with whichever of $i$ and $j$ is ranked lower by $v$). Since on the one hand the sum of all variables is $m-1$ (objective function), and on the other hand the sum of variables in each of these constraints is at least one, each constraint must be tight. This concludes the proof of Property 1.

Now, let us consider that the profile is SP with respect to a tree. The recognition procedure recalled above starts by identifying a candidate, say $m$, ranked last in at least one ranking and such that $B(m) \neq \emptyset$. This procedure is then applied recursively, until only one candidate remains. For simplicity, let us assume that the first removed (identified) candidate is $m$, the second $m-1$, and so on. To avoid confusion, we denote by $\hat{B}(k)$ the set $B(k)$ computed on the profile restricted to the first $k$ candidates $\{1, \dots, k\}$ (when $k$ is identified as a leaf, and then removed from the profile).

Property 2. If the profile is SP on a tree, then in an optimal solution $x^*$ of LP-SP, for any candidate $k \ge 2$ we have $\sum_{j \in \hat{B}(k)} x^*_{jk} = 1$, and $x^*_{jk} = 0$ for any $j \in \{1, \dots, k-1\} \setminus \hat{B}(k)$.

Proof of Property 2. Let us consider some candidate $k \ge 2$ and some optimal solution $x^*$ of LP-SP, and assume that the statements of Property 2 hold for any $k' > k$. We will show that they also hold for $k$.

To do this, let us define LP-SP($k$) as the linear program corresponding to the problem restricted to the candidates $\{1, \dots, k\}$. We first need to show that the optimal solution restricted to the first $k$ candidates, let us call it $x^k$, is feasible and optimal for LP-SP($k$). To do this, let us consider a constraint of LP-SP($k$), say the constraint of connecting candidate $j$ for some voter $v$. In the corresponding constraint of the initial program LP-SP, there are possibly some other variables $x_{jk'}$ with $k' > k$: there is such a variable for each candidate $k' > k$ ranked before $j$ by $v$. But then, when $k'$ was removed, if $x_{jk'}$ appears in the constraint then $j \notin \hat{B}(k')$, and then $x^*_{jk'} = 0$ by the induction hypothesis. So all ‘removed variables’ in the constraint were set to 0 in $x^*$, hence $x^k$ is feasible for LP-SP($k$). We can now easily see that it is optimal: each time a candidate $k' > k$ has been removed, we had $\sum_{j < k'} x^*_{jk'} = 1$ by the induction hypothesis, so the total weight of the (remaining) variables decreased by exactly 1. Thus $x^k$ is a feasible solution of LP-SP($k$) of value $k-1$, which is optimal since summing the $k-1$ constraints of any single voter shows that every feasible solution of LP-SP($k$) has value at least $k-1$.

Now we can focus on $x^k$ and LP-SP($k$). Note that the profile restricted to the first $k$ candidates is trivially SP on a tree (as the whole profile is). Candidate $k$ is ranked in last position by some voter $v$ (in the restricted profile), so we have $\sum_{j < k} x^k_{jk} \ge 1$ (constraint of connecting $k$ for voter $v$), and by Property 1 we have $\sum_{j < k} x^k_{jk} = 1$. If all candidates $j < k$ are in $\hat{B}(k)$ then we are done. Otherwise, consider a candidate $j < k$ with $j \notin \hat{B}(k)$. Then $j$ is ranked after $k$ by some other voter $v'$ (and, for this voter $v'$, $k$ and $j$ are not the best two candidates). Then, if $x^k_{jk} > 0$, the variables $x^k_{lk}$ of the candidates $l$ ranked before $k$ by $v'$ sum to less than 1, and the constraint associated with $k$ for connecting it to its predecessors in the ranking of $v'$ is violated. So $x^k_{jk} = 0$ for any $j \notin \hat{B}(k)$, and consequently $\sum_{j \in \hat{B}(k)} x^k_{jk} = 1$. This concludes the proof of Property 2.

Now we reformulate the problem as a flow problem. From the sets $\hat{B}(k)$, we build a network (directed graph) $N$ with:

  • A source $s$, a destination $t$, and for each candidate $i$ two vertices $i'$ and $i''$.

  • We have an arc from $s$ to each $i'$ with capacity 1, and an arc from each $i''$ to $t$ with capacity $m-1$ (any capacity of at least $m-1$ works, since such an arc never restricts the flow).

  • For each candidate $k \ge 2$, we have an arc $(k', j'')$ for each $j < k$. The capacity of this arc is 1 if $j \in \hat{B}(k)$, and 0 otherwise.

Let us denote by $f$ a flow on this network, with $f(a, b)$ the flow on arc $(a, b)$. Note that $1'$ has no outgoing arc, so the optimal flow is at most $m-1$.

We show that the correspondence $x_{jk} = f(k', j'')$ (for each $j < k$) is a bijection between solutions of value $m-1$ of LP-SP and (optimal) flows of value $m-1$ in $N$.

Let $f$ be a flow of value $m-1$. As there is no flow through $1'$, there is a flow of value 1 through each $k'$, $k \ge 2$. Since arc $(k', j'')$ has capacity 0 if $j \notin \hat{B}(k)$, by flow conservation we have $\sum_{j \in \hat{B}(k)} f(k', j'') = 1$, which means that $\sum_{j \in \hat{B}(k)} x_{jk} = 1$. Now consider a voter $v$ by whom $k$ is not ranked first. By the procedure of Trick, when $k$ is identified as a leaf, all candidates in $\hat{B}(k)$ are ranked before $k$ by $v$, and the corresponding constraint is satisfied. This is true for all candidates and voters, so $x$ is a feasible solution of LP-SP, of value $m-1$.

Conversely, let $x$ be a feasible solution of LP-SP of value $m-1$. From Property 2, we have $\sum_{j \in \hat{B}(k)} x_{jk} = 1$ for each candidate $k \ge 2$ (and $x_{jk} = 0$ for $j \notin \hat{B}(k)$). This immediately gives a flow of value $m-1$.

By integrality of extremal flows (any non-integral optimal flow is a convex combination of integral flows), any extremal optimal solution of LP-SP is integral (when there exists a tree compatible with $P$). ∎
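The network $N$ can be built and solved with any max-flow routine. The sketch below uses a small Edmonds-Karp implementation written for this illustration (not taken from the paper) and reads off $x_{jk} = f(k', j'')$. Arcs of capacity 0 are simply omitted, and each arc from $i''$ to $t$ gets capacity $m-1$, a value we chose because it can never restrict the flow. The input sets are hypothetical, consistent with the conclusion of Example 5 (vertices 2 and 4 attached to 1, vertex 3 to 1 or 2) under the assumption that candidates are removed in the order 4, 3, 2.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp: cap maps arcs (u, v) to capacities; returns (value, flow)."""
    flow = defaultdict(int)
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)                      # reverse arcs of the residual graph
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap.get((u, v), 0) - flow[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap.get(a, 0) - flow[a] for a in path)
        for u, v in path:
            flow[(u, v)] += delta
            flow[(v, u)] -= delta
        value += delta

def x_from_flow(b_hat, m):
    """Network N, where ("prime", i) stands for i' and ("second", i) for i''."""
    cap = {}
    for i in range(1, m + 1):
        cap[("s", ("prime", i))] = 1           # arc s -> i'
        cap[(("second", i), "t")] = m - 1      # arc i'' -> t, never a bottleneck
    for k, b in b_hat.items():
        for j in b:                            # arcs k' -> j'' of capacity 1
            cap[(("prime", k), ("second", j))] = 1
    value, flow = max_flow(cap, "s", "t")
    x = {(j, k): flow[(("prime", k), ("second", j))] for k, b in b_hat.items() for j in b}
    return value, x

b_hat = {4: {1}, 3: {1, 2}, 2: {1}}
value, x = x_from_flow(b_hat, 4)
print(value)   # 3, i.e. m - 1
```

A flow value of $m-1$ yields an integral $x$ with exactly one selected neighbour per removed candidate; which of 1 or 2 is chosen for candidate 3 depends on the search order.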

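Let us now turn to the recognition of profiles SP on a path. A (connected) graph is a path iff it is a tree with maximum degree at most 2. Hence, we consider the following ILP formulation where we minimize the number of edges and add constraints on the vertex degrees:

$$\begin{aligned} \min\ & \sum_{1 \le i < j \le m} x_{ij}\\ \text{s.t.}\ & \sum_{l=1}^{k-1} x_{c_k^v c_l^v} \ge 1 && \forall v \in V,\ \forall k \in \{2, \dots, m\}\\ & \sum_{j \in C \setminus \{i\}} x_{ij} \le 2 && \forall i \in C\\ & x_{ij} \in \{0, 1\} && \forall\, 1 \le i < j \le m \end{aligned}$$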

Clearly, a profile is compatible with a path iff the optimal value of the previous ILP is $m-1$. Let us call LP-SP2 its continuous relaxation. By using very similar arguments as above (same reformulation as a flow problem), one can prove the following result.

Theorem 3.2.

If a profile is compatible with a path, then any extremal optimal solution of LP-SP2 is integral, i.e., $x_{ij} \in \{0, 1\}$ for any $1 \le i < j \le m$.
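As a point of comparison with LP-SP2, path compatibility can be checked by brute force on tiny profiles: try every ordering of the candidates as an axis and test whether every ranking is a traversal of the corresponding path. This Python sketch is our own illustration (it explores $m!$ orderings, so it only suits small $m$), and the profiles in the usage are made up.

```python
from itertools import permutations

def is_sp(profile, edges):
    # Each candidate after the first must be adjacent to an earlier candidate.
    return all(any(frozenset((pref[k], pref[l])) in edges for l in range(k))
               for pref in profile for k in range(1, len(pref)))

def find_axis(profile):
    """Try every ordering of the candidates as an axis (m! orderings)."""
    for axis in permutations(sorted(profile[0])):
        edges = {frozenset(e) for e in zip(axis, axis[1:])}
        if is_sp(profile, edges):
            return axis
    return None

print(find_axis([(1, 2, 3, 4), (1, 4, 2, 3), (2, 1, 4, 3)]))  # (3, 2, 1, 4)
print(find_axis([(1, 2, 3), (2, 3, 1), (3, 1, 2)]))           # None
```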

4 Recognition of Pseudotrees

So far, we have seen that our minimization problem is NP-hard in the general case, but polynomially solvable in the case where the optimal solution is a tree. As a natural extension, we consider the problem of recognizing profiles that are single-peaked with respect to a graph with $m-1+k$ edges, for some fixed $k$, thus allowing more edges than in a tree. In this section, we consider the case $k = 1$. A connected graph on $m$ vertices with $m$ edges is called a pseudotree. We show that recognizing whether there exists a pseudotree compatible with a given profile can be done in polynomial time. We leave as an open question the parameterized complexity of the problem when $k$ is the parameter: is the problem in XP? Or even in FPT?

Let us now deal with the case of pseudotrees. Hence, the set of solutions we want to recognize is the class of connected graphs having (at most) $m$ edges. To solve the problem in polynomial time, we devise an algorithm that first identifies the leaves of the pseudotree and then the cycle on the remaining vertices. The second step (cycle recognition) relies on the polynomial-time recognition of single-peakedness on a cycle [23]. For the first step, we need to modify Trick's procedure recalled in Section 3. This procedure correctly identifies leaves when the profile is compatible with a tree, but it may fail to do so when the underlying structure is a pseudotree. With a slight modification, though, we obtain in Proposition 4.1 a necessary and sufficient condition for a candidate to be a leaf in a pseudotree. This is the stepping stone leading to the polynomiality of detecting whether a given profile is compatible with a pseudotree, stated in Theorem 4.1.
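To make the target of this section concrete, the following brute-force Python sketch (ours, exponential, small $m$ only, and not the polynomial algorithm of this section) searches for a compatible graph with exactly $m$ edges. Since the traversal condition forces a compatible graph to be connected, any hit is a pseudotree; and since adding an edge never breaks compatibility, a profile compatible with a tree also yields a hit when $m \ge 3$. Both example profiles are made up.

```python
from itertools import combinations

def is_sp(profile, edges):
    # Each candidate after the first must be adjacent to an earlier candidate.
    return all(any(frozenset((pref[k], pref[l])) in edges for l in range(k))
               for pref in profile for k in range(1, len(pref)))

def find_pseudotree(profile):
    """Return a compatible edge set with exactly m edges, or None (m >= 3)."""
    cands = sorted(profile[0])
    pairs = [frozenset(p) for p in combinations(cands, 2)]
    for subset in combinations(pairs, len(cands)):
        edges = set(subset)
        if is_sp(profile, edges):  # compatibility implies connectivity
            return edges
    return None

# A 4-cycle 1-2-3-4 forced by the top two candidates, plus candidate 5 ranked last.
cycle_plus_leaf = [(1, 2, 3, 4, 5), (2, 3, 4, 1, 5), (3, 4, 1, 2, 5), (4, 1, 2, 3, 5)]
# Every pair of {1, 2, 3, 4} is ranked first and second by some voter: K4 is forced.
complete = [(1, 2, 3, 4), (1, 3, 2, 4), (1, 4, 2, 3),
            (2, 3, 1, 4), (2, 4, 1, 3), (3, 4, 1, 2)]
print(find_pseudotree(cycle_plus_leaf) is not None)  # True
print(find_pseudotree(complete))                     # None
```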

Example 6.

Let us consider the profile on 4 voters and 5 candidates given in Example 1, for which there is a (unique) pseudotree compatible with it.

The procedure to detect leaves when looking for a tree focuses on candidates ranked last in some , candidates and here, and . Note that the whole profile is not compatible with a cycle, so we need somehow to first detect as a leaf, and then detect that the candidates are SP with respect to a cycle.

The central property that allows to recognize profiles compatible with a pseudotree is given in the following proposition.

Proposition 4.1.

Let be a preference profile, and suppose that a candidate is such that . Then is compatible with a pseudotree if and only if it is compatible with a pseudotree where is a leaf.

Proof.

Let be a pseudotree compatible with where is not a leaf. We transform into a pseudotree compatible with where is a leaf. Let .

Case 1: . Let us first consider an easy case, where . Then we build from by simply replacing each edge (with ) by the edge . Since for each voter either is ranked before , or is first and second, then this modification creates a graph compatible with all the preferences. Note that has (at most) as many edges as , so it is a pseudotree (or a tree, and we can add any edge to create a pseudotree).

Case 2: . Let us now consider the case where . Note that then is ranked before in all preferences (otherwise is first and is second, and the edge is forced to be in any compatible graph, a contradiction). Then we transform into a graph which is a pseudotree containing the edge , and then Case 1 applies to . To do this, let us consider two subcases.

Case 2a. Suppose that, in $G$, the predecessor of $x$ is the same vertex $z$ in all simple paths from $y$ to $x$. Then we create $G'$ by replacing the edge $\{z,x\}$ by the edge $\{y,x\}$. Consider a voter $v$. Since $y$ is ranked before $x$ by $v$, $z$ is also ranked before $x$ by $v$ (the subgraph induced by $x$ and the candidates ranked before him by $v$ is connected and contains $x$ and $y$, so it contains $z$). Hence the modification does not affect $z$ (it is still connected to one of the candidates ranked before him), and $x$ is now connected to $y$.

Case 2b. Otherwise, there are two simple paths in $G$ from $y$ to $x$ such that the predecessor of $x$ is $z_1$ in the first one and $z_2$ in the second one (there cannot be more than 2 such predecessors since $G$ is a pseudotree). We build $G'$ from $G$ by deleting the edges $\{z_1,x\}$ and $\{z_2,x\}$, and adding the edges $\{x,y\}$ and $\{z_1,z_2\}$, see Figure 1.

Figure 1: Edge swap

Consider a voter $v$. Since $v$ prefers $y$ to $x$ and the subgraph of $G$ induced by the candidates up to $x$ in the ranking of $v$ is connected, $z_1$ or $z_2$ is ranked before $x$ by $v$; we assume w.l.o.g. that $z_1$ is preferred to $z_2$ by $v$, so that $z_1$ is ranked before $x$. Then $G'$ is compatible with the preference of $v$: indeed, when considering the candidates one by one in the order of $v$, the only modifications concern $x$, which is now connected to $y$ (ranked before him), and $z_2$, which is now connected to $z_1$ (ranked before him). ∎

Note that, since $y$ was chosen arbitrarily in $N(x)$, the leaf $x$ can be connected to any vertex $y \in N(x)$.
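The set $N(x)$ and the easy Case 1 of the proof can be checked mechanically. The Python sketch below (hypothetical helper names, made-up toy profile) computes $N(x)$ from its definition, applies the Case 1 rewiring to a small pseudotree in which $x$ is not a leaf, and re-checks compatibility. It does not implement the edge swaps of Case 2.

```python
def n_set(profile, x):
    """N(x): candidates y such that, in every ranking, y is ranked before x,
    or x is ranked first and y second."""
    result = set(profile[0]) - {x}
    for ranking in profile:
        pos = ranking.index(x)
        result &= {ranking[1]} if pos == 0 else set(ranking[:pos])
    return result


def make_leaf(edges, x, y):
    """Case 1 (x and y adjacent): replace every edge {x, z} with z != y by
    {y, z}, so that x becomes a leaf attached to y."""
    new_edges = set()
    for e in map(frozenset, edges):
        if x in e and y not in e:
            (z,) = e - {x}
            new_edges.add(frozenset((y, z)))
        else:
            new_edges.add(e)
    return new_edges


def is_compatible(profile, edges):
    """Each candidate after the first must be adjacent to an earlier one."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for ranking in profile:
        for i in range(1, len(ranking)):
            if not adj.get(ranking[i], set()) & set(ranking[:i]):
                return False
    return True


# Pseudotree G: triangle x-a-b plus the pendant edge x-c; x is not a leaf.
G = [("x", "a"), ("x", "b"), ("a", "b"), ("x", "c")]
P = [["b", "x", "c", "a"], ["b", "a", "x", "c"]]
print(is_compatible(P, G))   # True
print(n_set(P, "x"))         # {'b'}
G2 = make_leaf(G, "x", "b")  # edges {x,b}, {a,b}, {b,c}: x is now a leaf
print(is_compatible(P, G2))  # True
```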

Before giving the procedure that recognizes preferences compatible with a pseudotree, we need to establish another property regarding such preferences.

Proposition 4.2.

If a preference profile $P$ is compatible with a pseudotree, then either there exists a candidate $x$ such that $N(x) \neq \emptyset$ (with $N(x)$ as in Proposition 4.1), or $P$ is compatible with a cycle.

Proof.

Suppose that $P$ is not compatible with a cycle. Then a pseudotree $G$ compatible with $P$ is not a cycle, so it has a leaf $x$. Let $y$ be the unique neighbor of $x$ in $G$, and suppose that $y \notin N(x)$. Then there is a voter $v$ who prefers $x$ to $y$ and, if $x$ is first in the ranking of $v$, does not rank $y$ second. If $x$ is not first, the subgraph induced by the candidates up to $x$ (in the ranking of $v$) is not connected, since it contains $x$ but not its only neighbor $y$. If $x$ is first (and $y$ is not second), the subgraph induced by the first two candidates (in the ranking of $v$) is not connected. This is a contradiction, hence $N(x) \neq \emptyset$. ∎

Consider now the following procedure Detect_PseudoTree, which detects single-peakedness on a pseudotree:

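  1. Set $E \leftarrow \emptyset$.

  2. While there are at least 4 candidates and a candidate $x$ such that $N(x) \neq \emptyset$ (computed on the current profile):

    1. Add the edge $\{x,y\}$ to $E$ for some $y \in N(x)$.

    2. Remove $x$ from the profile.

  3. Detect whether there is a cycle $C$ which is compatible with the (remaining) profile:

    • If YES: output $E \cup C$.

    • If NO: output NO.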
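A compact Python sketch of Detect_PseudoTree follows. It is an illustration under stated assumptions, not the authors' implementation: the polynomial cycle recognition of [23] is replaced by a brute-force search over cyclic orders (exponential, usable only on small instances), $N(x)$ is recomputed on the current profile at each iteration, and an arbitrary $y \in N(x)$ is taken. Function names and profiles are hypothetical toy examples.

```python
from itertools import permutations


def n_set(profile, x):
    """N(x) on the current profile."""
    result = set(profile[0]) - {x}
    for r in profile:
        pos = r.index(x)
        result &= {r[1]} if pos == 0 else set(r[:pos])
    return result


def is_compatible(profile, edges):
    """Each candidate after the first must be adjacent to an earlier one."""
    adj = {c: set() for c in profile[0]}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(adj[r[i]] & set(r[:i]) for r in profile for i in range(1, len(r)))


def find_cycle(profile):
    """Brute-force stand-in for cycle recognition: try every cyclic order."""
    first, *rest = profile[0]
    for perm in permutations(rest):
        order = [first, *perm]
        edges = [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]
        if is_compatible(profile, edges):
            return edges
    return None


def detect_pseudotree(profile):
    """Returns the edge list E + C of a compatible pseudotree, or None for NO."""
    profile = [list(r) for r in profile]
    E = []
    while len(profile[0]) >= 4:
        x = next((c for c in profile[0] if n_set(profile, c)), None)
        if x is None:
            break
        y = next(iter(n_set(profile, x)))  # any y in N(x) works
        E.append((x, y))
        profile = [[c for c in r if c != x] for r in profile]
    C = find_cycle(profile)
    return None if C is None else E + C


P = [["b", "x", "c", "a"], ["b", "a", "x", "c"]]
print(detect_pseudotree(P))  # [('x', 'b'), ('b', 'c'), ('c', 'a'), ('a', 'b')]

# Every pair of a, b, c, d is ranked in the first two positions by some voter,
# so all 6 pairs are necessary edges and no pseudotree (<= 4 edges) fits.
Q = [list("abcd"), list("acbd"), list("adbc"),
     list("bcad"), list("bdac"), list("cdab")]
print(detect_pseudotree(Q))  # None
```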

Theorem 4.1.

Given a preference profile $P$ on at least 3 candidates, Detect_PseudoTree is a polynomial-time procedure which outputs a pseudotree compatible with $P$ if one exists, and outputs NO otherwise.

Proof.

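Detect_PseudoTree clearly runs in polynomial time: Step 2 performs at most $m-3$ iterations, each of which takes polynomial time, and Step 3 is polynomial [23]. We prove correctness by induction on the number of candidates. If there are three candidates, the procedure outputs a cycle on these 3 candidates, i.e., the complete graph on them, which is compatible with any profile. Now suppose that the result holds for profiles on $m-1 \geq 3$ candidates, and consider a profile $P$ on $m$ candidates.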

Suppose that $P$ is compatible with a pseudotree $G$.

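  • If there exists a candidate with a nonempty set $N(\cdot)$, let $x$ be the candidate selected in the first iteration of Step 2 and $y \in N(x)$ the candidate it is attached to. By Proposition 4.1, there exists a pseudotree $G'$ compatible with $P$ where $x$ is a leaf. Then the profile $P'$ obtained from $P$ by removing $x$ is compatible with a pseudotree (namely $G'$ without $x$), so by induction Detect_PseudoTree outputs a pseudotree compatible with $P'$, and adding the edge $\{x,y\}$ as done by Detect_PseudoTree gives a pseudotree compatible with $P$, since $y \in N(x)$.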

  • Otherwise, by Proposition 4.2, $P$ is compatible with a cycle, which is found by Detect_PseudoTree (Step 3).

Suppose now that Detect_PseudoTree does not output NO. If there was no candidate $x$ with $N(x) \neq \emptyset$, then $P$ is compatible with the cycle $C$ found in Step 3. Otherwise, let $x$ be the candidate in the first iteration of the loop in Step 2, and $y \in N(x)$ the candidate it is attached to. Then, on the profile where $x$ is removed, Detect_PseudoTree outputs a pseudotree which, by induction, is compatible with this profile without $x$. Since $y \in N(x)$, adding the edge $\{x,y\}$ makes the pseudotree (in which $x$ is a new leaf) compatible with $P$. ∎

Note that generalizing this polynomiality result to, say, connected graphs with $m-1+k$ edges seems to require new techniques (even for fixed $k$, i.e., to show that the problem is in XP when parameterized by $k$). Indeed, enumerating all subsets of $k$ edges does not allow us to reduce the problem to trees. Nor does procedure Detect_PseudoTree seem to generalize, as it specifically relies on the decomposition of the solution into one cycle and leaves.

5 Experimental Study

We carried out numerical experiments¹ on real and randomly generated instances of the problems tackled in the paper. In the case of real data, we compare the optimal solution of the ILP to that of its continuous relaxation. We also focus on the ability to detect structure in voters’ preferences depending on the election context. To go further, we use randomly generated instances to study structural aspects of solutions; in particular, we study the graph density depending on the number of voters and on the dispersion of their opinions.

¹ All tests were performed on an Intel Core i7-1065G7 CPU with 8 GB of RAM under the Windows operating system. We used the IBM CPLEX solver to solve the ILPs.

5.1 Numerical tests on real data

We used PrefLib data sets available on www.preflib.org to perform our numerical tests [18]. While this database offers four different types of data, only the ED (Election Data) type is relevant for our study. Among the ED data sets, we used the complete strict order lists (which correspond to files with .soc extension).
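A minimal sketch for loading such files, assuming the legacy PrefLib .soc layout of that period: a line with the number of candidates, one `index,name` line per candidate, a `voters,vote_count,unique_orders` line, then one `count,c1,...,cm` line per distinct order. This layout is an assumption (the current PrefLib format uses a different, comment-based header), `parse_soc` is a hypothetical name, and the sample content is made up.

```python
def parse_soc(text):
    """Parse a .soc file assumed to follow the legacy PrefLib layout.

    Returns (names, profile): names maps candidate indices to labels, and
    profile holds one ranking per voter (an order with count k appears k times).
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    m = int(lines[0])
    names = {}
    for ln in lines[1:m + 1]:
        idx, name = ln.split(",", 1)
        names[int(idx)] = name.strip()
    n_voters, _, n_orders = (int(t) for t in lines[m + 1].split(","))
    profile = []
    for ln in lines[m + 2:m + 2 + n_orders]:
        count, *ranking = (int(t) for t in ln.split(","))
        if sorted(ranking) != sorted(names):
            raise ValueError(f"not a complete strict order: {ln}")
        profile.extend(list(ranking) for _ in range(count))
    if len(profile) != n_voters:
        raise ValueError("vote counts do not add up to the number of voters")
    return names, profile


SAMPLE = """3
1,Skater A
2,Skater B
3,Skater C
4,4,2
3,1,2,3
1,3,2,1
"""
names, profile = parse_soc(SAMPLE)
print(names[1], len(profile))  # Skater A 4
```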

At the time we carried out these experiments, 315 data files of this type were available in PrefLib; however, many of them were not suited to our study, for several reasons. The first one is that many elections involved only 3 or 4 candidates and a large number of voters, hence the obtained graph was, unsurprisingly, always complete. We also met the opposite problem when there were very few voters, typically 4, so that there was no point in looking for a general structure. Thus, in practice, 25 real data files were usable for our purposes, namely:

  • 20 files from the ED-00006 data set, which contains figure skating rankings from various competitions during the 1998 season including the World Juniors, World Championships, and the Olympics.

  • 2 files from the ED-00009 data set, which contains the results of surveying students at AGH University of Science and Technology (Krakow, Poland) about their course preferences.

  • 1 file from the ED-00012 data set, where individuals ranked T-shirt designs.

  • 1 file from the ED-00014 data set, which contains preferences about various kinds of sushi (surveys conducted by Toshihiro Kamishima).

  • 1 file from the ED-00032 data set, which contains the results of surveying students in the Faculty of Informatics, Instituto Superior Politécnico José Antonio Echeverría (Cujae, Havana, Cuba), about the most important criteria affecting their performance as students.

We now present the results obtained for these 25 instances. The optimization problem tackled was to determine a graph with a minimal number of edges. For all instances considered here, an optimal graph was obtained in about 40 milliseconds with the ILP formulation and 20 milliseconds with its LP relaxation. In fact, the linear programming formulation always returned an integer solution. Figure 2 summarizes the obtained results.

ED-00006 data set.

The number of candidates (skaters) varies from 14 to 30, and the number of voters (judges) from 7 to 9. For the 20 instances considered, a tree was obtained 15 times, a pseudotree 4 times, and a solution with 23 edges for 19 candidates (vertices) once. A possible interpretation of these results is that, even though the rankings are based on subjective opinions of the judges, there is something like a “true ranking” behind them, as some skaters are objectively better than others. Thus, the rankings given by the judges can be viewed as biased observations of the true ranking, so that they are quite close to each other. Note also that even when the solution was not a tree, the continuous LP relaxation gave an integer solution (identical to that of the ILP). Finally, we also checked compatibility with an axis: no profile was single-peaked with respect to an axis.

ED-00009 data set.

Each student provided a rank ordering over all the courses, with no missing elements. There were 9 courses to choose from in 2003 and 7 in 2004, and about 150 students. For both years, the optimal solution was a tree, more specifically a star. This is easily explained by the fact that, in both years, one course was ranked first by every student.
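The star explanation can be checked directly: if all voters rank the same candidate first, every other candidate is adjacent to it in the star centred on it, so each prefix of each ranking is connected. Since a star has $m-1$ edges, the minimum for a connected graph, it is then optimal. A minimal Python sketch with hypothetical names and made-up rankings:

```python
def common_top_star(profile):
    """If every voter ranks the same candidate first, return the star centred
    on that candidate (m - 1 edges); otherwise return None."""
    top = profile[0][0]
    if any(r[0] != top for r in profile):
        return None
    return [(top, c) for c in profile[0] if c != top]


def is_compatible(profile, edges):
    """Each candidate after the first must be adjacent to an earlier one."""
    adj = {c: set() for c in profile[0]}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(adj[r[i]] & set(r[:i]) for r in profile for i in range(1, len(r)))


courses = [[1, 2, 3, 4], [1, 4, 3, 2], [1, 3, 2, 4]]  # course 1 is everyone's favourite
star = common_top_star(courses)
print(star, is_compatible(courses, star))  # [(1, 2), (1, 3), (1, 4)] True
```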

ED-00012 and ED-00014 data sets.

The optimal solution for the preferences over the T-shirt designs had 25 edges, which is a lot given the number of candidates (11) and voters (30). However, this is consistent with the intuition that there is probably no structure behind T-shirt designs. The same can be said for the preferences over kinds of sushi, where 5000 voters were asked for their preferences about 10 kinds of sushi (the optimal solution is a complete graph in this case).

ED-00032 data set.

In the single instance with neither ties nor missing elements, 15 students ranked the 6 criteria affecting their performance. Both the ILP and LP formulations returned a solution with 7 edges.

File                    #candidates  #voters  #edges
ED-00006-00000003.soc   14           9        13 (tree)
ED-00006-00000004.soc   14           9        13 (tree)
ED-00006-00000007.soc   23           9        22 (tree)
ED-00006-00000008.soc   23           9        22 (tree)
ED-00006-00000011.soc   20           9        20 (pseudotree)
ED-00006-00000012.soc   20           9        20 (pseudotree)
ED-00006-00000018.soc   24           9        23 (tree)
ED-00006-00000021.soc   18           7        17 (tree)
ED-00006-00000022.soc   18           7        17 (tree)
ED-00006-00000028.soc   24           9        23 (tree)
ED-00006-00000029.soc   19           9        23
ED-00006-00000032.soc   23           9        23 (pseudotree)
ED-00006-00000033.soc   23           9        22 (tree)
ED-00006-00000034.soc   23           9        22 (tree)
ED-00006-00000035.soc   18           9        17 (tree)
ED-00006-00000036.soc   18           9        17 (tree)
ED-00006-00000037.soc   19           9        18 (tree)
ED-00006-00000044.soc   20           9        19 (tree)
ED-00006-00000046.soc   30           9        30 (pseudotree)
ED-00006-00000048.soc   24           9        23 (tree)
ED-00009-00000001.soc   9            146      8 (tree)
ED-00009-00000002.soc   7            153      6 (tree)
ED-00012-00000001.soc   11           30       25
ED-00014-00000001.soc   10           5000     45 (clique)
ED-00032-00000002.soc   6            15       7
Figure 2: Minimal number of edges on real data sets from PrefLib. The first column gives the file name, the second and third the number of candidates and voters. The fourth column gives the minimal number of edges; specific structures (axis, tree, pseudotree or clique) are indicated in parentheses when they appear.

5.2 Experimental study on randomly generated data

The experimental study on real data revealed some interesting information. Nevertheless, it is limited by the small amount of available data: it indeed seems hard to obtain real election data with complete strict order rankings. To overcome this difficulty, other experiments could be considered. One option is to adapt the approach to partial order rankings, which are met more often in practice. However, in this paper, we chose to generate random data in order to study the structure of solutions and the relation between the solutions returned by the ILP and LP formulations.

As mentioned in the previous section, it seems that in some contexts we can assume that the voters’ preferences are biased observations of a “true” ranking. This idea can be modeled using the Mallows distribution on rankings. In this model, the “true” ranking is called the central permutation and has the highest probability. The probability of other permutations decreases with their Kendall tau distance from the central permutation. Formally, let $\pi_0$ be the central permutation. The probability of a permutation $\pi$ is:

$$P(\pi) = \frac{e^{-\theta\, d_K(\pi, \pi_0)}}{\psi(\theta)}$$

where $d_K$ is the Kendall tau distance, $\theta$ is a dispersion parameter modeling the opinion heterogeneity, and $\psi(\theta)$ is a normalisation constant. The parameter $\theta$ is a real number greater than or equal to 0. If $\theta = 0$, the uniform distribution is obtained. The greater the value of $\theta$, the more the voters agree on the central permutation.
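For small $m$, the Mallows probabilities can be computed exactly by enumerating all $m!$ permutations, which gives a convenient sanity check of the formula above. This brute force is our own illustration (hypothetical function names), unrelated to how PerMallows works internally.

```python
import math
from itertools import permutations


def kendall_tau(pi, sigma):
    """Number of candidate pairs that pi and sigma order differently."""
    pos = {c: i for i, c in enumerate(sigma)}
    seq = [pos[c] for c in pi]
    return sum(seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq)))


def mallows_pmf(pi0, theta):
    """Exact probabilities exp(-theta * d_K(pi, pi0)) / psi(theta), obtained by
    enumerating all m! permutations (small m only)."""
    weights = {p: math.exp(-theta * kendall_tau(p, pi0)) for p in permutations(pi0)}
    psi = sum(weights.values())  # normalisation constant psi(theta)
    return {p: w / psi for p, w in weights.items()}


pmf = mallows_pmf((1, 2, 3, 4), theta=0.5)
print(max(pmf, key=pmf.get))  # (1, 2, 3, 4): the central permutation is the mode
```

With `theta=0.0` every one of the 24 permutations gets probability 1/24, matching the uniform case.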

We used the PerMallows R package² to generate random data according to the Mallows model. The number of candidates was set to $m = 20$, the value of $\theta$ varied from 0 to 1 in steps of 0.1, and the number of voters $n$ varied from 20 to 100 in steps of 10. For each pair of parameter values, the results are averaged over 1000 randomly drawn preference profiles. The curves in Figure 3 show the evolution of the graph density according to these parameter values.

² https://cran.r-project.org/web/packages/PerMallows/index.html
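PerMallows is an R package; for readers working in Python, a standard way to sample from the Mallows model under the Kendall tau distance is the repeated insertion model, sketched below as a swapped-in alternative (it is not the code used for the experiments, and the names are hypothetical). Item $i$ of $\pi_0$ (0-based) is inserted at position $j \in \{0, \dots, i\}$ with probability proportional to $e^{-\theta(i-j)}$, because it is then ranked before exactly $i-j$ items that precede it in $\pi_0$.

```python
import math
import random


def sample_mallows(pi0, theta, rng=random):
    """Draw one Mallows ranking via the repeated insertion model: each insertion
    at position j creates i - j discordant pairs with respect to pi0."""
    ranking = []
    for i, item in enumerate(pi0):
        weights = [math.exp(-theta * (i - j)) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking


def sample_profile(pi0, theta, n, seed=None):
    """n independent rankings drawn with a seeded generator."""
    rng = random.Random(seed)
    return [sample_mallows(pi0, theta, rng) for _ in range(n)]


# One profile with the experimental setting m = 20 candidates and n = 100 voters.
profile = sample_profile(list(range(20)), theta=0.5, n=100, seed=1)
print(len(profile), profile[0][:5])
```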

Figure 3: Density of the graph according to parameters $n$ and $\theta$ (with $m = 20$).
Figure 4: Expected number of necessary edges according to parameters $n$ and $\theta$ (with $m = 20$).

In the best case, the obtained solution is a tree, hence the density is $2/m$. As we set $m = 20$, this corresponds to a density of 0.1. The function representing the graph density indeed seems to converge to the constant function of value 0.1 as the value of $\theta$ increases and the preferences in the profile become similar (the curves get closer and closer to the x-axis). Put another way, the density captures the similarity of voters’ preferences: the higher $\theta$, the lower the curve. Conversely, the graph density naturally increases with the number of voters. Nevertheless, note that, even for 100 voters, the graph is still quite far from being complete. Besides, the slope of the curve decreases with $n$. During our experiments, we plotted functions and obtained a set of (approximately) straight lines, thus indicating that the convergence towards density 1 (complete graphs) is of the form , where is a parameter decreasing with .
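The value 0.1 comes from counting edges: a tree on $m$ vertices has $m-1$ edges, out of $\binom{m}{2}$ possible ones:

$$\frac{m-1}{\binom{m}{2}} = \frac{m-1}{m(m-1)/2} = \frac{2}{m} = \frac{2}{20} = 0.1 \qquad (m = 20).$$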

We now give some theoretical arguments that support this observation. Let us recall that if a voter ranks $i$ first and $j$ second (or the opposite), then the edge $\{i,j\}$ must be present in the graph and is called a necessary edge. Assuming that the preferences in the profile are generated with the Mallows model, let us now estimate the number of necessary edges in the graph for $n$ voters and $m$ candidates, which gives us a lower bound and, hopefully, a good approximation of the total number of edges. Let $\theta$ be the model parameter and $\pi_0$ the central permutation. The probability that a preference induces the necessary edge $\{i,j\}$ is

$$p_{ij} = \sum_{\pi \in S_{ij}} \frac{e^{-\theta\, d_K(\pi, \pi_0)}}{\psi(\theta)} \qquad (1)$$

where $S_{ij}$ is the set of permutations of $\{1, \dots, m\}$ that rank $i$ and $j$ in the first two positions. In a profile with $n$ voters drawn independently, the probability that no preference induces the necessary edge $\{i,j\}$ is then $(1 - p_{ij})^n$. Hence, by passing to the complement, the probability that $\{i,j\}$ is a necessary edge is $1 - (1 - p_{ij})^n$. Finally, we obtain the expected number of necessary edges as

$$\mathbb{E}[N_{\text{nec}}] = \sum_{1 \le i < j \le m} \left(1 - (1 - p_{ij})^n\right) \qquad (2)$$
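For small $m$, Equations (1) and (2) can be evaluated by brute force, enumerating all $m!$ rankings. This is only a sanity-check sketch with hypothetical names, not the dedicated algorithm of Appendix A. A useful check: with a single voter ($n = 1$), exactly one edge is necessary, so Equation (2) must return 1 for any $\theta$.

```python
import math
from itertools import combinations, permutations


def kendall_tau(pi, sigma):
    """Number of candidate pairs that pi and sigma order differently."""
    pos = {c: i for i, c in enumerate(sigma)}
    seq = [pos[c] for c in pi]
    return sum(seq[a] > seq[b] for a in range(len(seq)) for b in range(a + 1, len(seq)))


def necessary_edge_probs(pi0, theta):
    """p_ij of Equation (1), by enumerating all m! rankings (small m only)."""
    weights = {p: math.exp(-theta * kendall_tau(p, pi0)) for p in permutations(pi0)}
    psi = sum(weights.values())
    probs = {frozenset(pair): 0.0 for pair in combinations(pi0, 2)}
    for p, w in weights.items():
        probs[frozenset(p[:2])] += w / psi  # p belongs to S_ij for its top two {i, j}
    return probs


def expected_necessary_edges(pi0, theta, n):
    """Equation (2): sum over pairs {i, j} of 1 - (1 - p_ij)^n."""
    return sum(1 - (1 - p) ** n for p in necessary_edge_probs(pi0, theta).values())


pi0 = tuple(range(5))
for theta in (0.0, 0.5, 1.0):
    print(theta, round(expected_necessary_edges(pi0, theta, n=20), 3))
```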

For $\theta = 0$, as the distribution is uniform, we get $p_{ij} = \frac{2(m-2)!}{m!} = \frac{2}{m(m-1)}$ for every pair. We then directly obtain that the average number of necessary edges is $\binom{m}{2}\left(1 - (1-p)^n\right)$ with $p = \frac{2}{m(m-1)}$, thus contributing $1 - (1-p)^n$ to the density, in accordance with the experiments. The curves in Figure 4 show the evolution of the expected contribution of necessary edges to the graph density according to the values of $n$ and $\theta$. The result extends to any value of $\theta$, but requires a dedicated algorithm to compute $p_{ij}$ efficiently in Equation 2 (see Appendix A). As expected, the shapes of the curves in Figures 3 and 4 coincide. Note, however, that the scale of the y-axis in Figure 4 slightly differs from that of Figure 3 (the curves in Figure 4 only account for necessary edges, so the analytical values are smaller than the experimental ones).
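The uniform case can be evaluated directly. A small Python sketch (hypothetical function name) of the density contribution $1 - (1-p)^n$ with $p = 2/(m(m-1))$, evaluated for $m = 20$; for $n = 1000$ it gives about 0.995, i.e., almost all pairs are expected to be necessary edges.

```python
def uniform_necessary_density(m, n):
    """theta = 0: expected fraction of the C(m, 2) pairs that are necessary
    edges, i.e. 1 - (1 - p)^n with p = 2 / (m (m - 1))."""
    p = 2 / (m * (m - 1))
    return 1 - (1 - p) ** n


for n in (20, 100, 1000):
    print(n, round(uniform_necessary_density(20, n), 3))
```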

Using the formula in Equation 2, we can estimate the evolution of the number of necessary edges in the graph for up to 1000 voters whose preferences follow the Mallows model. The obtained curves for various values of $\theta$ are shown in Figure 5. For instance, if one assumes that all votes are equally likely (impartial culture assumption, corresponding to $\theta = 0$), then the graph becomes complete for a thousand voters, while only around 45% of the edges are present if one sets