Efficient Rank Aggregation via Lehmer Codes

01/28/2017 ∙ by Pan Li, et al. ∙ University of Massachusetts Amherst ∙ University of Illinois at Urbana-Champaign

We propose a novel rank aggregation method based on converting permutations into their corresponding Lehmer codes or other subdiagonal images. Lehmer codes, also known as inversion vectors, are vector representations of permutations in which each coordinate can take values not restricted by the values of other coordinates. This transformation allows for decoupling of the coordinates and for performing aggregation via simple scalar median or mode computations. We present simulation results illustrating the performance of this completely parallelizable approach and analytically prove that both the mode and median aggregation procedures recover the correct centroid aggregate with small sample complexity when the permutations are drawn according to the well-known Mallows models. The proposed Lehmer code approach may also be used on partial rankings, with similar performance guarantees.


1 Introduction

Rank aggregation is a family of problems concerned with fusing disparate ranking information; it arises in application areas as diverse as social choice, meta-search, natural language processing, bioinformatics, and information retrieval [1, 2, 3]. The observed rankings are either linear orders (permutations) or partial (element-tied) rankings¹. Sometimes, rankings are assumed to take the form of a set of pairwise comparisons [4, 5]. Note that many massive ordinal datasets arise from ratings rather than from actual comparisons. Rank aggregation, rather than averaging of ratings, is justified by the fact that most raters use different rating “scales”: a rating of three from one user may indicate that the user liked the item, while a rating of three from another user may indicate that the user disliked it. Hence, actual preferences can only be deduced from ranked ratings.

¹ In the mathematics literature, partial rankings are commonly referred to as weak orders, while the term partial order is used to describe orders of subsets of elements of a ground set. We nevertheless use the term partial ranking to denote orders with ties, as this terminology is more widely adopted by the machine learning community.

In rank aggregation, the task at hand is to find a ranking that is at the smallest cumulative distance from a given set of rankings. Here, the cumulative distance from a set equals the sum of the distances from each element of the set. The most frequently used distance measure for the case of permutations is the Kendall distance; for the case of partial rankings, the distance of choice is the Kemeny distance [6]. The Kendall distance between two permutations equals the smallest number of adjacent transpositions needed to convert one permutation into the other. The Kemeny distance contains an additional weighted correction term that accounts for ties in the rankings.

It is well known that for a wide range of distance functions, learning the underlying models and aggregating rankings is computationally hard [7]. Nevertheless, for the case when the distance measure is the Kendall distance, a number of approximation algorithms have been developed that offer various trade-offs between quality of aggregation and computational complexity [8, 9]. The techniques used for aggregating permutations in a given set include randomly choosing a permutation from the set (PICK-A-PERM), pivoting via random selections of elements and divide-and-conquer approaches (FAS-PIVOT), Markov chain methods akin to PageRank, and minimum weight graph matching methods exploiting the fact that the Kendall distance is well-approximated by the Spearman footrule distance (SM) [10]. Methods with provable performance guarantees – PICK-A-PERM, FAS-PIVOT, and SM – give a 2-approximation for the objective function, although combinations thereof are known to improve the constant to 11/7 or 4/3 [9]. There also exists a polynomial time approximation scheme (PTAS) for the aggregation problem [11].

Unfortunately, most of these approximate rank aggregation algorithms have complexity too high for use with massive datasets and may not be implemented in a parallel fashion. Furthermore, they do not easily extend to partial rankings. In many cases, a performance analysis on probabilistic models [12], such as the Plackett-Luce model [13] or the Mallows model [14, 15], is intractable.

In this paper, we propose a new approach to the problem of rank aggregation that uses a combinatorial transform, the Lehmer code (LC). The gist of the approach is to convert permutations into their Lehmer code representations, in which each coordinate takes values independently from the other coordinates. Aggregation in the Lehmer code domain reduces to computing the median or mode of a bounded set of numbers, which can be done in linear time. Furthermore, efficient conversion algorithms between permutations and Lehmer codes – also running in linear time – are known, making the overall complexity of the parallel implementation of the scheme O(m + n), where m denotes the number of permutations to be aggregated and n denotes the length (size) of the permutations. To illustrate the performance of the Lehmer code aggregators (LCAs) on permutations, we carry out simulation studies showing that the algorithms perform comparably with the best known methods for approximate aggregation, but at a significantly lower computational cost. We then proceed to establish a number of theoretical performance guarantees for the LCA algorithms. In particular, we consider the Mallows model with the Kendall distance for permutations and the Kemeny distance for partial rankings where ties are allowed, and show that the centroid permutation of the model, or a derivative thereof, may be recovered with high probability from a small number of samples drawn from the corresponding distribution.

The paper is organized as follows. Section 2 contains the mathematical preliminaries and the definitions used throughout the paper. Section 3 introduces our new aggregation methods for two types of rankings, while Section 4 describes our analysis pertaining to the Mallows and generalized Mallows models. Section 5 contains illustrative simulation results comparing the performance of the LC aggregators to that of other known aggregation methods, both on simulated and real ranking data. A number of technical results, namely detailed proofs of theorems and lemmas, can be found in the Appendix.

2 Mathematical Preliminaries

Let the ground set comprise n elements, which without loss of generality we assume to be equal to [n] = {1, …, n}. A ranking is an ordering of a subset of elements of [n] according to a predefined rule. When the subset equals [n], we refer to the order as a permutation (full ranking). When a ranking includes ties, we refer to it as a partial ranking (weak or bucket order). Partial rankings may be used to complete rankings of subsets of elements in [n] in a number of different ways [16], one being to tie all unranked elements at the last position.

Rigorously, a permutation is a bijection σ : [n] → [n], and the set of permutations over [n] forms the symmetric group of order n!, denoted by 𝕊_n. For any σ ∈ 𝕊_n and e ∈ [n], σ(e) denotes the rank (position) of element e in σ. We say that e₁ is ranked higher than e₂ (ranked lower than e₂) iff σ(e₁) < σ(e₂) (σ(e₁) > σ(e₂)). The inverse of a permutation σ is denoted by σ⁻¹; clearly, σ⁻¹(i) represents the element ranked at position i in σ. We define the projection of a permutation σ onto a subset of elements E ⊆ [n], denoted by σ_E, as the ordering of the elements of E such that σ_E(e₁) < σ_E(e₂) iff σ(e₁) < σ(e₂). As an example, the projection of σ = (2 1 4 5 7 3 6 9 8) onto E = {1, 3, 6} ranks element 1 first, element 6 second, and element 3 third, since σ(1) < σ(6) < σ(3). As can be seen, σ_E(e) equals the rank of the element e in σ restricted to the elements of E.

We use a similar set of definitions for partial rankings [16]. A partial ranking ρ is also defined as a mapping from [n] to the positive integers. In contrast to permutations, where the mapping is a bijection, the mapping in a partial ranking allows for ties, i.e., there may exist two elements e₁ ≠ e₂ such that ρ(e₁) = ρ(e₂). A partial ranking is often represented using buckets, and is in this context referred to as a bucket order [16]. In a bucket order, the elements of the set are partitioned into a number of subsets, or buckets, B₁ ≺ B₂ ≺ ⋯ ≺ B_K, listed from the highest-ranked to the lowest-ranked. We let ρ(e) denote the index of the bucket containing the element e in ρ, so the element e is assigned to bucket B_{ρ(e)}. Two elements lie in the same bucket if and only if they are tied in ρ. We may also define a projection of a partial ranking onto a subset of elements E ⊆ [n], denoted by ρ_E, so that for e₁, e₂ ∈ E, ρ_E(e₁) < ρ_E(e₂) iff ρ(e₁) < ρ(e₂) and ρ_E(e₁) = ρ_E(e₂) iff ρ(e₁) = ρ(e₂). For a given partial ranking ρ, we use B₁, …, B_K to denote its corresponding buckets. In addition, we define ℓ(e) = |{e′ : ρ(e′) < ρ(e)}| and r(e) = |{e′ : ρ(e′) ≤ ρ(e)}|. Based on the previous discussion, r(e) − ℓ(e) = |B_{ρ(e)}| (the number of elements that are in the bucket containing e). When referring to the bucket of a certain element e, we write B(e) whenever no confusion arises. Note that if we arbitrarily break ties in ρ to create a permutation σ, then ℓ(e) < σ(e) ≤ r(e); clearly, if ρ is itself a permutation, we have ρ = σ.

A number of distance functions between permutations are known from the social choice, learning, and discrete mathematics literature [10]. One distance function of interest is based on transpositions: a transposition (a b) is a swap of the elements at positions a and b, a ≠ b. If b = a + 1, the transposition is referred to as an adjacent transposition. It is well known that transpositions (adjacent transpositions) generate 𝕊_n, i.e., any permutation can be converted into any other permutation through a sequence of transpositions (adjacent transpositions) [17]. The smallest number of adjacent transpositions needed to convert a permutation σ into another permutation π is known as the Kendall distance between σ and π, and is denoted by d_K(σ, π). Alternatively, the Kendall distance between two permutations σ and π over [n] equals the number of mutual inversions between the elements of the two permutations:

d_K(σ, π) = |{(e₁, e₂) : σ(e₁) < σ(e₂), π(e₁) > π(e₂)}|.   (1)

Another distance measure, one that does not rely on transpositions, is the Spearman footrule, defined as

d_S(σ, π) = Σ_{e ∈ [n]} |σ(e) − π(e)|.

A well known result by Diaconis and Graham [10] asserts that d_K(σ, π) ≤ d_S(σ, π) ≤ 2 d_K(σ, π).
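To make the two distances concrete, the following minimal Python sketch computes both (the function names and the quadratic-time pair count are our own choices; an O(n log n) inversion count via merge sort is also possible):

from itertools import combinations

def kendall_distance(sigma, pi):
    # Kendall distance (1): the number of pairs of elements ranked in
    # opposite order; sigma[e] and pi[e] hold the rank of element e.
    return sum(1 for e1, e2 in combinations(range(len(sigma)), 2)
               if (sigma[e1] - sigma[e2]) * (pi[e1] - pi[e2]) < 0)

def spearman_footrule(sigma, pi):
    # Spearman footrule: the sum of absolute rank displacements.
    return sum(abs(s - p) for s, p in zip(sigma, pi))

# Diaconis-Graham: d_K <= d_S <= 2 d_K
sigma, pi = [1, 0, 2, 3], [3, 1, 0, 2]
assert kendall_distance(sigma, pi) <= spearman_footrule(sigma, pi) \
       <= 2 * kendall_distance(sigma, pi)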

One may also define an extension of the Kendall distance to the case of two partial rankings ρ₁ and ρ₂ over the set [n], known as the Kemeny distance: a pair of elements contributes 1 if the two rankings order it in opposite ways, and 1/2 if it is tied in exactly one of the two rankings, i.e.,

d_K(ρ₁, ρ₂) = |{(e₁, e₂) : ρ₁(e₁) < ρ₁(e₂), ρ₂(e₁) > ρ₂(e₂)}| + (1/2) |{(e₁, e₂) : e₁ < e₂, and ρ₁(e₁) = ρ₁(e₂), ρ₂(e₁) ≠ ρ₂(e₂), or ρ₂(e₁) = ρ₂(e₂), ρ₁(e₁) ≠ ρ₁(e₂)}|.   (2)

The Kemeny distance includes a component equal to the Kendall distance between the linear chains in the partial rankings, and another, scaled component that characterizes the distance between tied pairs of elements [16]. The Spearman footrule distance may also be defined so as to apply to partial rankings [16]; it equals the sum of the absolute differences between the “positions” of the elements in the partial rankings. Here, the position of an element e in a partial ranking ρ is defined as

pos(e) = Σ_{k < ρ(e)} |B_k| + (|B_{ρ(e)}| + 1)/2,

i.e., all elements of a bucket share the average of the positions the bucket occupies. The above defined Spearman distance is a 2-approximation for the Kemeny distance between two partial rankings [16].
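The Kemeny distance admits an equally short transcription; the sketch below assumes the bucket-index representation ρ(e) introduced above (the helper name is ours):

from itertools import combinations

def kemeny_distance(rho1, rho2):
    # Kemeny distance (2): opposed strict orders cost 1; a pair tied in
    # exactly one of the two partial rankings costs 1/2. rho[e] is the
    # bucket index of element e (a smaller index means a higher rank).
    d = 0.0
    for e1, e2 in combinations(range(len(rho1)), 2):
        a = (rho1[e1] > rho1[e2]) - (rho1[e1] < rho1[e2])  # -1 / 0 / +1
        b = (rho2[e1] > rho2[e2]) - (rho2[e1] < rho2[e2])
        if a * b < 0:
            d += 1.0          # strict orders disagree
        elif (a == 0) != (b == 0):
            d += 0.5          # tied in exactly one ranking
    return d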

A permutation σ ∈ 𝕊_n may be uniquely represented via its Lehmer code (also called the inversion vector), i.e., a word of the form

c(σ) = (c₁(σ), c₂(σ), …, c_n(σ)) ∈ [0, 0] × [0, 1] × ⋯ × [0, n − 1],

where for i ∈ [n],

c_i(σ) = |{j ∈ [i − 1] : σ(j) > σ(i)}|,   (3)

and for integers s ≤ t, [s, t] = {s, s + 1, …, t}. By default, c₁(σ) = 0, and this coordinate is typically omitted. For instance, we have

e        1 2 3 4 5 6 7 8 9
σ(e)     2 1 4 5 7 3 6 9 8
c_e(σ)   0 1 0 0 0 3 1 0 1

It is well known that the Lehmer code is bijective, and that the encoding and decoding algorithms both run in linear time [18, 19]. Codes with properties similar to those of the Lehmer code have been extensively studied under the name of subdiagonal codes. An overview of such codes and their relationship to Mahonian statistics on permutations may be found in [20].
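A minimal Python sketch of the transform pair is given below; the O(n²) encoder and list-based decoder favor clarity over the linear-time algorithms of [18, 19], and the function names are ours:

def lehmer_encode(sigma):
    # Lehmer code (3): c_i = |{j < i : sigma(j) > sigma(i)}|, where
    # sigma[i-1] holds the rank of element i.
    return [sum(1 for j in range(i) if sigma[j] > sigma[i])
            for i in range(len(sigma))]

def lehmer_decode(code):
    # Invert the code: filling positions from right to left, entry i
    # must be the (c_i + 1)-th largest value not yet used.
    n = len(code)
    remaining = list(range(n, 0, -1))        # values n, n-1, ..., 1
    sigma = [0] * n
    for i in range(n - 1, -1, -1):
        sigma[i] = remaining.pop(code[i])
    return sigma

# The example from the text:
sigma = [2, 1, 4, 5, 7, 3, 6, 9, 8]
assert lehmer_encode(sigma) == [0, 1, 0, 0, 0, 3, 1, 0, 1]
assert lehmer_decode(lehmer_encode(sigma)) == sigma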

We propose next our generalization of Lehmer codes to partial rankings. Recall that the i-th entry of the Lehmer code of a permutation σ is the number of elements with index smaller than i that are ranked lower than i in σ (3). For a partial ranking ρ, in addition to c(ρ), defined exactly as in (3), we use another code cℓ(ρ) that takes ties into account according to

cℓ_i(ρ) = |{j ∈ [i − 1] : ρ(j) ≥ ρ(i)}|.   (4)

Clearly, cℓ_i(ρ) ≥ c_i(ρ) for all i ∈ [n]. It is straightforward to see that using c(ρ) and cℓ(ρ), one may recover the original partial ranking ρ. In fact, we show next that the linear-time Lehmer encoding and decoding algorithms may be used to encode and decode c(ρ) and cℓ(ρ) in linear time as well.

Given a partial ranking ρ, we may break the ties in each bucket to arrive at a permutation σ_ρ as follows: for i < j,

if ρ(i) = ρ(j), then σ_ρ(i) < σ_ρ(j),   (5)

i.e., ties are broken according to the order of the element indices. We observe that the entries of the Lehmer codes of ρ and σ_ρ satisfy the following relationships for all i ∈ [n]:

c_i(σ_ρ) = c_i(ρ)  and  cℓ_i(ρ) = c_i(ρ) + IN_i − 1,

where IN_i denotes the position of element i within its bucket under the tie-breaking rule (5). An example illustrating these concepts is given below.

e          1 2 3 4 5 6 7 8 9
ρ(e)       1 1 2 2 3 1 2 3 3
σ_ρ(e)     1 2 4 5 7 3 6 8 9
c_e(σ_ρ)   0 0 0 0 0 3 1 0 0
IN_e       1 2 1 2 1 3 3 2 3
c_e(ρ)     0 0 0 0 0 3 1 0 0
cℓ_e(ρ)    0 1 0 1 0 5 3 1 2

Note that σ_ρ, as well as c(ρ) and cℓ(ρ), may be computed in linear time. The encoding procedure is outlined in Algorithm 1.

Algorithm 1:
Lehmer encoder for partial rankings
Input: a partial ranking ρ;
 1: Set K to be the number of buckets in ρ;
 2: Initialize the bucket counters s_k ← 0 for all k ∈ [K]
   and IN ← 0ⁿ;
 3: For i from 1 to n do
 4:   s_{ρ(i)} ← s_{ρ(i)} + 1;
 5:   IN_i ← s_{ρ(i)};
 6: Break ties of ρ to get σ_ρ according to (5);
 7: c(ρ) ← c(σ_ρ) via the linear-time Lehmer encoder; cℓ_i(ρ) ← c_i(ρ) + IN_i − 1 for all i ∈ [n];
Output: Output c(ρ), cℓ(ρ);
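The following Python sketch mirrors Algorithm 1, reusing lehmer_encode from the sketch above (the names and list conventions are ours); the asserts reproduce the worked example:

def lehmer_encode_partial(rho):
    # rho[e] is the bucket index of element e (1 = top bucket).
    n = len(rho)
    seen, IN = {}, []
    for b in rho:                     # IN[e]: position of e in its bucket
        seen[b] = seen.get(b, 0) + 1
        IN.append(seen[b])
    order = sorted(range(n), key=lambda e: (rho[e], e))   # tie-break (5)
    sigma = [0] * n
    for rank, e in enumerate(order, start=1):
        sigma[e] = rank               # sigma_rho: ties broken by index
    c = lehmer_encode(sigma)          # c(rho) = c(sigma_rho)
    cl = [ci + ine - 1 for ci, ine in zip(c, IN)]
    return c, cl

rho = [1, 1, 2, 2, 3, 1, 2, 3, 3]
c, cl = lehmer_encode_partial(rho)
assert c  == [0, 0, 0, 0, 0, 3, 1, 0, 0]
assert cl == [0, 1, 0, 1, 0, 5, 3, 1, 2]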

3 Aggregation Algorithms

Assume that we have to aggregate a set of m rankings, denoted by Σ = {σ₁, …, σ_m}. Aggregation may be performed via the distance-based Kemeny–Young model, in which one seeks a ranking π̂ that minimizes the cumulative Kendall (Kemeny) distance from the set Σ, formally defined as

π̂ = arg min_π Σ_{k=1}^{m} d(π, σ_k),

where d stands for the Kendall distance d_K in the case of permutations and for the Kemeny distance in the case of partial rankings. Note that when the set Σ comprises permutations only, π̂ is required to be a permutation; if Σ comprises partial rankings, we allow the output to be either a permutation or a partial ranking.

The LCA procedure under the Kendall distance is described in Algorithm 2.

Algorithm 2: The LCA Method (Permutations)
Input: Σ = {σ₁, …, σ_m}, where σ_k ∈ 𝕊_n.
 1: Compute the Lehmer codewords c(σ_k) for all k ∈ [m];
 2: Compute the median/mode of the coordinates:
    ĉ_i ← median/mode of (c_i(σ₁), …, c_i(σ_m)) for all i ∈ [n];
 3: Compute σ̂, the inverse Lehmer code of ĉ = (ĉ₁, …, ĉ_n).
Output: Output σ̂

Note that each step of the algorithm may be executed in parallel. If no parallelization is used, the first step requires O(mn) time, given that the Lehmer code of a single permutation may be computed in O(n) time [18, 19]. If parallelization over the m rankings is used instead, the time reduces to O(n). Similarly, without parallelization the second step requires O(mn) time, while coordinate-level parallelization reduces this time to O(m). The third step requires O(n) computations. Hence, the overall complexity of the algorithm is either O(m + n) or O(mn), depending on whether or not parallelization is used.
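A serial Python sketch of Algorithm 2 follows, reusing lehmer_encode and lehmer_decode from Section 2 (the mode is computed naively, and the names are ours):

from statistics import median_low

def lca_aggregate(rankings, rule="median"):
    # Aggregate permutations coordinate-wise in the Lehmer code domain,
    # then decode. rankings: list of rank vectors of equal length.
    codes = [lehmer_encode(s) for s in rankings]
    aggregated = []
    for i in range(len(rankings[0])):
        column = [code[i] for code in codes]     # i-th coordinates
        if rule == "median":
            aggregated.append(median_low(column))
        else:                                    # mode
            aggregated.append(max(set(column), key=column.count))
    return lehmer_decode(aggregated)

Since the i-th coordinate of every codeword lies in [0, i − 1], the coordinate-wise median or mode is again a valid codeword, so the decoder always returns a permutation.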

For permutations, the aggregation procedure may be viewed as specialized voting: the ranking σ_k casts a vote to rank element i at position c_i(σ_k) for the case that only the elements of [i] are considered (a vote corresponds to a score confined to [0, i − 1]). However, when ρ_k is a partial ranking involving ties, the vote should account for all possible placements between c_i(ρ_k) and cℓ_i(ρ_k). More precisely, suppose that the vote cast by ρ_k to place element i in position p is denoted by v_{k,i}(p). Then, one should have

v_{k,i}(p) = 1 (mode rule)  or  v_{k,i}(p) = 1/(cℓ_i(ρ_k) − c_i(ρ_k) + 1) (median rule)   (6)

if and only if p ∈ [c_i(ρ_k), cℓ_i(ρ_k)], and v_{k,i}(p) = 0 otherwise. Note that when the mode is used, the “positive votes” are all equal to one, while when the median is used, a vote counts only a fractional value dictated by the length of the “ranking interval”.

Next, we use V_i(p) = Σ_{k ∈ [m]} v_{k,i}(p) to denote the total voting score that element i receives to be ranked at position p. The i-th coordinate of the Lehmer code of the aggregator output is computed as

ĉ_i = arg max_p V_i(p) (mode rule), or the median position of the vote distribution V_i (median rule).   (7)

To compute the values V_i(p) for all i and p naively requires O(mn) time per coordinate, which yields an overall aggregation complexity of O(mn²) when no parallelization is used. This complexity is reduced to O(mn) for the parallel implementation. Note that the evaluations of the functions V_i may be performed in a simple iterative manner provided that the votes are positive constants, leading to a reduction of the overall complexity of this step to O(mn) even without parallelization. Relevant details regarding the iterative procedure may be found in Appendix G.
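One standard way to realize such an iterative evaluation — assuming, as stated above, that each ranking votes a constant weight on a contiguous interval of positions — is a difference array followed by a running sum; this is our illustration, not necessarily the exact procedure of Appendix G:

def vote_scores(intervals, num_positions):
    # Each ranking contributes weight w on a contiguous interval
    # [lo, hi] of positions (0-indexed, inclusive), so all m intervals
    # are absorbed in O(m + num_positions) time instead of
    # O(m * num_positions).
    diff = [0.0] * (num_positions + 1)
    for lo, hi, w in intervals:
        diff[lo] += w
        diff[hi + 1] -= w
    scores, running = [], 0.0
    for p in range(num_positions):
        running += diff[p]
        scores.append(running)
    return scores

# Two mode-rule votes on position intervals [0, 2] and [1, 1]:
assert vote_scores([(0, 2, 1), (1, 1, 1)], 4) == [1, 2, 1, 0]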

Note that the output of Algorithm 2 is a permutation. To generate a partial ranking that minimizes the Kemeny distance while being consistent² with σ̂, one can use the algorithm outlined in Appendix G. Alternatively, the following simple greedy method produces practically good partial rankings with O(mn) complexity: scan the elements of the output permutation from the highest rank (σ̂⁻¹(1)) to the lowest rank (σ̂⁻¹(n)) and decide whether to put σ̂⁻¹(i + 1) in the same bucket as σ̂⁻¹(i) based on which of the two choices yields the smaller Kemeny distance with respect to the set Σ.

² We say that two rankings are consistent if for any two elements e₁, e₂, a strict order between e₁ and e₂ in one ranking is never reversed in the other, and vice versa.
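A Python sketch of this greedy scan follows; as a simplification, only the pairs formed by the new element and the current bucket are re-scored, and the data conventions and names are ours:

def greedy_bucketize(order, rankings):
    # order: elements from highest to lowest aggregate rank;
    # rankings: list of bucket-index vectors (rho[e] = bucket of e).
    buckets = [[order[0]]]
    for e in order[1:]:
        tie_cost = sep_cost = 0.0
        for f in buckets[-1]:
            for rho in rankings:
                if rho[f] == rho[e]:
                    sep_cost += 0.5   # splitting a tied pair costs 1/2
                elif rho[f] < rho[e]:
                    tie_cost += 0.5   # tying an agreeing pair costs 1/2
                else:
                    tie_cost += 0.5   # tying a disagreeing pair costs 1/2,
                    sep_cost += 1.0   # keeping it inverted costs 1
        if tie_cost < sep_cost:
            buckets[-1].append(e)     # tie e with the previous bucket
        else:
            buckets.append([e])       # start a new bucket
    return buckets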

Discussion. In what follows, we briefly outline the similarities and differences between the LCA method and existing positional as well as InsertionSort-based aggregation methods. Positional methods are a class of aggregation algorithms that seek to output a ranking in which the position of each element is “close” to the positions the element occupies in the rankings of Σ. One example of a positional method is Borda’s algorithm, which is known to produce a 5-approximation to the Kemeny–Young problem for permutations [21]. Another method is Spearman footrule aggregation, which seeks a permutation that minimizes the sum of the Spearman footrule distances between the output and each ranking in Σ. As already mentioned, the latter method produces a 2-approximation for the Kendall aggregate for both permutations and partial rankings. LCA also falls under the category of positional methods, but the positions on which scoring is performed are highly specialized by the Lehmer code. Although it appears hard to prove worst-case performance guarantees for the method, statistical analysis on particular ranking models shows that it can recover the correct results with small sample complexity. It also offers significant reductions in computational time compared to the Spearman footrule method, which reduces to solving a weighted bipartite matching problem and hence has complexity at least O(n³) [22]; a MapReduce implementation is described in [23].
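For reference, Borda aggregation is nearly a one-liner, which is part of what makes positional methods attractive (a sketch under our rank-vector convention):

def borda(rankings):
    # Order the elements by their total rank over the input rankings;
    # the output is a rank vector with ranks 1..n (ties broken by index).
    n = len(rankings[0])
    totals = [sum(r[e] for r in rankings) for e in range(n)]
    sigma = [0] * n
    for rank, e in enumerate(sorted(range(n), key=lambda e: totals[e]), 1):
        sigma[e] = rank
    return sigma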

A related type of aggregation is based on InsertionSort [8, 22]. In each iteration, an element is randomly chosen to be inserted into the sequence containing the already sorted elements. Assume for simplicity that the elements are inserted according to the identity order, so that at iteration i, element i is inserted into the previously constructed ranking over [i − 1] to arrive at the ranking available after iteration i. If element i is inserted between two adjacent elements of the current ranking, the elements above the insertion point retain their ranks while those below it shift down by one. Let p_i denote the rank assigned to element i at iteration i; the choice of p_i may vary from method to method. The authors of [8] proposed choosing p_i via pairwise majority comparisons between element i and the already inserted elements, with a default position when the corresponding comparison set is empty. This insertion rule does not ensure a constant approximation guarantee in the worst case, although it leads to a locally Kemeny optimal solution.

We next describe how the LCA method may be viewed as an InsertionSort method with a special choice of the insertion position p_i. Consider the permutation LCA method of Algorithm 2, and view the estimation of the i-th coordinate of the Lehmer code (step 2) and the inversion of the Lehmer code (step 3) as being performed simultaneously, via insertion. Once ĉ_i is generated, the corresponding step of the inverse Lehmer transform may be viewed as the operation of placing element i at the position over [i] dictated by ĉ_i. In other words, inverting the incomplete ranking reduces to setting p_i according to ĉ_i, which essentially equals the mode or median of the positions of element i in the rankings of Σ projected onto [i]. The same is true of partial rankings, with the only difference being that the selection of p_i has to be modified to account for ties between elements.

4 Analysis of the Mallows Model

We provide next a theoretical performance analysis of the LCA algorithm under the assumption that the rankings are generated according to the Mallows and generalized Mallows models. In the Mallows model MM(σ₀, φ) with parameters σ₀ and φ ∈ (0, 1], σ₀ denotes the centroid ranking and φ determines the variance of the rankings with respect to σ₀: the probability of a permutation σ is proportional to φ^{d_K(σ, σ₀)}. For partial rankings, we assume that the samples are generated from a generalized Mallows model (GMM) whose centroid is allowed to be a partial ranking and in which the distance is the Kemeny distance rather than the Kendall distance d_K.
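Samples from the Mallows model can be drawn with the repeated insertion method (RIM), the standard sampler used, e.g., in [14]; the sketch below is our illustration:

import random

def sample_mallows(centroid, phi):
    # centroid lists the elements from highest to lowest rank;
    # phi in (0, 1], with small phi concentrating mass on the centroid.
    sample = []
    for i, elem in enumerate(centroid):
        # inserting at slot j (0 = top) creates i - j new inversions
        # w.r.t. the centroid, hence receives weight phi**(i - j)
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = random.choices(range(i + 1), weights=weights)[0]
        sample.insert(j, elem)
    return sample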

Our analysis is based on the premise that, given a sufficiently large number of sample permutations, one expects the ranking obtained by a good aggregation algorithm to equal the centroid with high probability. Alternative ways to analytically assess the quality of an aggregation algorithm are to perform a worst-case analysis, which for the LCA method appears hard, or to perform a simulation-based analysis comparing the objective function values of the Kemeny–Young problem achieved by different aggregation methods. We report on the latter study in the section to follow.

To ease the notational burden, we henceforth write MM for MM(σ₀, φ) in all subsequent results and derivations. Detailed proofs are relegated to the appendix. One of our main theoretical results is the following.

Theorem 4.1.

Assume that Σ = {σ₁, …, σ_m}, where the σ_k ∼ MM(σ₀, φ) are i.i.d. samples of the given Mallows model. If φ is sufficiently small and the number of samples m is sufficiently large, with thresholds depending on n and ε, then the output ranking of Algorithm 2 under the mode rule equals σ₀ with probability at least 1 − ε.

The idea behind the proof is to view the LCA procedure as an InsertionSort method in which the probability of the event that a selected position is incorrect with respect to σ₀ is very small for sufficiently large m. Based on the lemma that follows (Lemma 4.2), one may show that when φ is sufficiently small, the most probable position of an element in a ranking drawn from the MM corresponds to its rank in the centroid σ₀. Given enough samples, one can then estimate the rank of an element in the centroid by directly using the mode of the rank of the element in the drawn samples.

Lemma 4.2.

Let σ ∼ MM(σ₀, φ) and consider an element e ∈ [n]. Then, the following two statements describe the distribution of σ(e):

1)
2)

In both 1) and 2), the upper and lower bounds are achieved for extremal placements of the element e in the centroid and extremal offsets between the two positions.

Remark 4.1.

The result above may seem counterintuitive, since it implies that for sufficiently large φ, the probability of ranking some element at a position different from its position in σ₀ may be larger than the probability of ranking it at its centroid position. An easy-to-check example on a small ground set, with φ close to one, shows that this may indeed be the case.

Lemma 4.2 does not guarantee that in any single iteration the position of the inserted element will be correct, since each ranking involves only a subset of the elements. Therefore, Lemma 4.3, a generalization to subset-projected rankings, is required for the proof.

Lemma 4.3.

Let σ ∼ MM(σ₀, φ) and let E ⊆ [n]. Consider an element e ∈ E. Then, the following two statements describe the distribution of σ_E(e):

1)
2)

Observe that the conditions that allow one to achieve the upper bound in Lemma 4.2 also ensure that the upper bounds are achieved in Lemma 4.3. Moreover, when E = [n], the bounds reduce to those of Lemma 4.2.

The next result establishes the performance guarantees for the LCA algorithm with the median operation.

Theorem 4.4.

Assume that Σ = {σ₁, …, σ_m}, where the σ_k ∼ MM(σ₀, φ) are i.i.d. If φ is sufficiently small and m is sufficiently large, with thresholds depending on n and ε, then the output of Algorithm 2 under the median operation equals σ₀ with probability at least 1 − ε.

The proof follows by observing that the median of the i-th Lehmer coordinate over all rankings in Σ converges to c_i(σ₀) as m → ∞ provided that, for each i, the population median of c_i(σ) equals c_i(σ₀). According to the following lemma, this is ensured when φ is sufficiently small.

Lemma 4.5.

Let σ ∼ MM(σ₀, φ) and let i ∈ [n]. Then the following two bounds hold:

The inequalities 1) and 2) are each tight for appropriate extremal choices of the centroid σ₀ and of the index i.

We now turn our attention to partial rankings and prove the following extension of the previous result for the GMM, under the LCA algorithm that uses the median of the coordinate values. Note that the output of the algorithm is essentially a permutation, although it may be transformed into a partial ranking via the bucketing method described in Section 3.

Theorem 4.6.

Assume that Σ = {ρ₁, …, ρ_m}, where the ρ_k ∼ GMM(ρ₀, φ) are i.i.d. If φ is sufficiently small and m is sufficiently large, with thresholds depending on n and ε, then the output ranking of the LCA algorithm (see Appendix E) under the median operation lies in S(ρ₀) with probability at least 1 − ε. Here, S(ρ₀) denotes the set of permutations generated by breaking ties in ρ₀.

The proof of this theorem relies on showing that the InsertionSort procedure places the elements in their correct positions with high probability. If the median is used for partial ranking aggregation, each vote is uniformly distributed amongst all positions in the interval given by (6). To ensure that the output permutation is in S(ρ₀), we need to guarantee that, for large enough m, the median of the positions of the votes for each element e over Σ falls within the interval of positions corresponding to the bucket of ρ₀ that contains e.

For a GMM, let v_k(p) denote the vote that the partial ranking ρ_k casts for position p. For the median of the vote positions to fall within the correct interval, one requires that the expected vote mass on positions no lower than the start of the interval, as well as the mass on positions no higher than its end, each exceed 1/2. The expectations in the two requirements may be evaluated as follows (we only consider the first one, because of symmetry): if ρ_k ranks the element strictly within the relevant range, the full unit vote contributes to the sum, while if the vote interval of ρ_k straddles the boundary, only the corresponding fraction of the vote contributes. Summing over the two types of events yields the expression denoted by

(8)

The following lemma describes a lower bound for (8).

Lemma 4.7.

Let ρ ∼ GMM(ρ₀, φ) and let E ⊆ [n] be such that it contains a predefined element e. Then, one can prove that the expectation in (8) admits an explicit lower bound.

If φ is sufficiently small, the lower bound above exceeds 1/2. Theorem 4.6 then follows using the union bound and Hoeffding’s inequality.

5 Performance Evaluation

We next evaluate the performance of the LCA algorithms via experimental methods and compare it to that of other rank aggregation methods using both synthetic and real datasets. For comparative analysis, we choose the Fas-Pivot and FasLP-Pivot (LP) methods [9], InsertionSort with Comparison (InsertionComp) from [8], and the optimal Spearman Footrule distance aggregator (Spearman) [10]. For the randomized algorithms Fas-Pivot and FasLP-Pivot, the pivot in each iteration is chosen randomly. For InsertionSort with Comparison, the insertion order of the elements is also chosen randomly. Furthermore, for all three methods, the procedure is executed five times, and the best solution is selected. For Fas-Pivot and FasLP-Pivot, we chose the better result of Pick-A-Perm and the given method, as suggested in [9].

In the context of synthetic data, we only present results for the Mallows model, with the number of ranked items n and the number of rankings m held fixed while the variance parameter φ is varied over a range of values. For each parameter setting, we ran independent simulations and computed the average cumulative Kendall distance (normalized by m) between the output ranking π and Σ, given as (1/m) Σ_{k=1}^{m} d_K(π, σ_k). We then normalized the value attained by each algorithm by that of FasLP-Pivot, since FasLP-Pivot always offered the best performance. The results are depicted in Fig. 1. Note that we use MostProb to denote the most probable ranking, which is the centroid of the Mallows model.

Figure 1: The normalized Kendall distance versus the parameter φ of the Mallows model.

Note that for most parameter values, the LCA algorithms perform almost identically to the best aggregation method, the LP-based pivoting scheme. For smaller values of φ, small performance differences may be observed; these are compensated for by the significantly smaller complexity of the LCA methods, which in the parallel implementation mode is only linear in m and n. Note that the InsertionComp method performs poorly, although it ensures local Kemeny optimality.

We also conducted experiments on a number of real-world datasets. To test the permutation LCA aggregation algorithms, we used the Sushi ranking dataset [24] and the Jester dataset [25]. The Sushi dataset consists of 5,000 permutations of 10 types of sushi. The Jester dataset contains continuous-valued scores for 100 jokes submitted by a large pool of individuals; we chose the scores of the individuals who rated all 100 jokes and transformed the ratings into permutations by sorting the scores. For each dataset, we tested our algorithms by randomly choosing m samples out of the complete list and computing the average cumulative Kendall distance, normalized by m, over independent tests. The results are listed in Tables 1 and 2.

m             10    50    200   1000  5000
Fas-Pivot     14.51 15.98 16.18 16.38 16.06
FasLP-Pivot   13.59 15.00 15.33 15.39 15.39
InsertionComp 15.87 16.60 16.70 16.80 16.65
Spearman      14.41 15.24 15.54 15.56 15.61
LC-median     14.03 15.25 15.57 15.58 15.74
LC-mode       14.19 15.33 15.46 15.47 15.49
Table 1: Rank aggregator comparison for the Sushi dataset (permutations)
m             50    200   1000  5000  10000
Fas-Pivot     2102  2137  2144  2127  2127
FasLP-Pivot   1874  1915  1920  1922  1921
InsertionComp 2327  2331  2337  2323  2390
Spearman      1900  1936  1935  1937  1937
LC-median     1932  1962  1965  1966  1965
LC-mode       1973  1965  1962  1964  1965
Table 2: Rank aggregator comparison for the Jester dataset (permutations)

To test our partial ranking aggregation algorithms, we used the complete Jester dataset [25] and the MovieLens dataset [26]. For the Jester dataset, we first rounded the scores to the nearest integer and then placed the jokes with the same integer score in the same bucket of the resulting partial ranking. We also assumed that the unrated jokes were placed in a bucket ranked lower than any bucket of rated jokes. The MovieLens dataset contains incomplete lists of integer scores for several thousand movies rated by a large pool of users, so that many ties are present. We chose the most frequently rated movies and the users who rated these movies with the largest coverage. As for the Jester dataset, we assumed that the unrated movies were tied for the last position. In each test, we used the greedy method described in Section 3 to transform the output permutations into partial rankings. Note that when computing the Kemeny distance (2) between two partial rankings, we omitted the penalty incurred by ties between unrated elements, because otherwise the method would yield too many ties in the output partial ranking. More precisely, we assessed the distance between two incomplete partial rankings using the modification of (2), denoted by (9), in which pairs of elements that are unrated in both rankings contribute no penalty.

The results are listed in Table 3 and Table 4. As may be seen, the parallelizable, low-complexity LCA methods tend to offer very similar performance to that of the significantly more computationally demanding LP pivoting algorithm.

m             50    200   1000  5000  10000
Fas-Pivot     1265  1280  1279  1279  1281
FasLP-Pivot   1264  1280  1279  1279  1281
InsertionComp 1980  1967  1956  1949  1979
Spearman      1272  1284  1281  1281  1282
LC-median     1275  1287  1284  1283  1287
LC-mode       1311  1304  1289  1283  1283
Table 3: Rank aggregator comparison for the Jester dataset (partial rankings)
m             20    50    100   200   500
Fas-Pivot     328.8 344.4 350.3 351.4 353.3
FasLP-Pivot   328.6 344.4 350.3 351.4 353.5
InsertionComp 386.3 390.2 392.6 393.1 393.0
Spearman      332.9 347.3 352.5 353.5 355.4
LC-median     334.2 350.4 355.4 355.9 359.1
LC-mode       340.1 353.5 357.5 359.0 360.0
Table 4: Rank aggregator comparison for the MovieLens dataset (partial rankings)

References

  • [1] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender, “Learning to rank using gradient descent,” in Proceedings of the 22nd international conference on Machine learning. ACM, 2005, pp. 89–96.
  • [2] Tie-Yan Liu, “Learning to rank for information retrieval,” Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
  • [3] Minji Kim, Farzad Farnoud, and Olgica Milenkovic, “Hydra: gene prioritization via hybrid distance-score rank aggregation,” Bioinformatics, p. btu766, 2014.
  • [4] Sahand Negahban, Sewoong Oh, and Devavrat Shah, “Iterative ranking from pair-wise comparisons,” in Advances in Neural Information Processing Systems, 2012, pp. 2474–2482.
  • [5] Xi Chen, Paul N Bennett, Kevyn Collins-Thompson, and Eric Horvitz, “Pairwise ranking aggregation in a crowdsourced setting,” in Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013, pp. 193–202.
  • [6] John G Kemeny, “Mathematics without numbers,” Daedalus, vol. 88, no. 4, pp. 577–591, 1959.
  • [7] Andrew Davenport and Jayant Kalagnanam, “A computational study of the Kemeny rule for preference aggregation,” in AAAI, 2004, vol. 4, pp. 697–702.
  • [8] Cynthia Dwork, Ravi Kumar, Moni Naor, and D Sivakumar, “Rank aggregation revisited,” 2001.
  • [9] Nir Ailon, Moses Charikar, and Alantha Newman, “Aggregating inconsistent information: ranking and clustering,” Journal of the ACM (JACM), vol. 55, no. 5, pp. 23, 2008.
  • [10] Persi Diaconis and Ronald L Graham, “Spearman’s footrule as a measure of disarray,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 262–268, 1977.
  • [11] Claire Kenyon-Mathieu and Warren Schudy, “How to rank with few errors,” in Proceedings of the thirty-ninth annual ACM symposium on Theory of computing. ACM, 2007, pp. 95–103.
  • [12] Michael A Fligner and Joseph S Verducci, Probability models and statistical analyses for ranking data, vol. 80, Springer, 1993.
  • [13] Francois Caron and Arnaud Doucet, “Efficient Bayesian inference for generalized Bradley–Terry models,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 174–196, 2012.
  • [14] Tyler Lu and Craig Boutilier, “Learning Mallows models with pairwise preferences,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 145–152.
  • [15] Guy Lebanon and John Lafferty, “Cranking: Combining rankings using conditional probability models on permutations,” in ICML. Citeseer, 2002, vol. 2, pp. 363–370.
  • [16] Ronald Fagin, Ravi Kumar, Mohammad Mahdian, D Sivakumar, and Erik Vee, “Comparing and aggregating rankings with ties,” in Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, 2004, pp. 47–58.
  • [17] Richard P Stanley, Enumerative combinatorics, Number 49. Cambridge university press, 2011.
  • [18] Martin Mareš and Milan Straka, “Linear-time ranking of permutations,” in Algorithms–ESA 2007, pp. 187–193. Springer, 2007.
  • [19] Wendy Myrvold and Frank Ruskey, “Ranking and unranking permutations in linear time,” Information Processing Letters, vol. 79, no. 6, pp. 281–284, 2001.
  • [20] Vincent Vajnovszki, “Lehmer code transforms and Mahonian statistics on permutations,” Discrete Mathematics, vol. 313, no. 5, pp. 581–589, 2013.
  • [21] Don Coppersmith, Lisa Fleischer, and Atri Rudra, “Ordering by weighted number of wins gives a good ranking for weighted tournaments,” in Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm. Society for Industrial and Applied Mathematics, 2006, pp. 776–782.
  • [22] Cynthia Dwork, Ravi Kumar, Moni Naor, and Dandapani Sivakumar, “Rank aggregation methods for the web,” in Proceedings of the 10th international conference on World Wide Web. ACM, 2001, pp. 613–622.
  • [23] Karthik Kambatla, Georgios Kollias, and Ananth Grama, “Efficient large-scale graph analysis in mapreduce,” 2012.
  • [24] Toshihiro Kamishima, “Nantonac collaborative filtering: recommendation based on order responses,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003, pp. 583–588.
  • [25] Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins, “Eigentaste: A constant time collaborative filtering algorithm,” Information Retrieval, vol. 4, no. 2, pp. 133–151, 2001.
  • [26] F Maxwell Harper and Joseph A Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp. 19, 2016.
  • [27] Pranjal Awasthi, Avrim Blum, Or Sheffet, and Aravindan Vijayaraghavan, “Learning mixtures of ranking models,” in Advances in Neural Information Processing Systems, 2014, pp. 2609–2617.

Appendix A Proof of Lemma 4.2

Before proceeding with the proof, we remark that some of the ideas in our derivation are motivated by Lemma 10.7 of [27].

Let . Suppose that and that we want to prove statement 1) (the second case when may be handled similarly). When , the underlying ratio is exactly equal to . Hence, we only consider the case when . Let and . In this case, and . Define the sets:

Clearly, and . By swapping and , we can construct two bijections and . Statement 1) can then be easily proved by using the following three claims:

(10)

Observe that inequality is achieved in a) when . The first two claims are straightforward to check, and hence we only prove the third claim.

Consider a mapping from to based on circular swapping of elements, and let . Since and , there must exist an element such that and . Choose the element with the largest corresponding value of and construct a new ranking such that

It is easy to see that . Given that all elements ranked between and in have rank higher than , we have . Note that the above mapping is neither a bijection nor an injection. Denote the mapping by . For each , define , so that for all , . Then, forms a partition of the set . Next, consider two distinct rankings