1 Introduction
The problem of ranking a collection of items from noisy pairwise comparisons arises in a wide range of applications, including recommender systems for rating movies, books, or other consumer items [piech_tuned_2013, aggarwal_recommender_2016]; peer grading for ranking students in massive open online courses [shah2013case]; ranking players in tournaments; search engines; quantifying people’s perception of cities from pairwise comparison of street views of the cities [salesses_collaborative_2013]; and online sequential survey sampling for assessing the popularity of proposals in a population of voters [salganik_wiki_2015].
In each of these applications, the aim is to obtain a statistically sound ranking from as few comparisons as possible. In this work, we investigate the power of adaptively selecting which pairs to compare based on the outcomes of previous comparisons, a setting we call active or adaptive ranking. In contrast, passive or nonadaptive ranking approaches fix the comparisons to make before any data is collected. It is well understood that one can typically learn a ranking using fewer adaptively chosen comparisons than one would need when passively choosing comparisons [heckel_active_2016]. However, for moderately large or large collections of items, such as the ones that appear in most of the applications mentioned above, or for collections with many items of “similar quality” (to be made rigorous below), learning the exact ground-truth ranking may still require prohibitively many comparisons.
Motivated by these large-scale ranking problems, this work studies the problem of adaptively obtaining approximate rankings. We demonstrate that learning an approximate ranking may still be statistically tractable even when recovering the exact ranking is not. Formally, we consider a collection of items, and make comparison queries between pairs of items . We assume that the responses to those queries are stochastic, where the probability that item “beats” item is given by . We assume that the outcomes of all queries are statistically independent, and assume that either item or item “wins” the comparison with probability , which means that for all . Our aim is to rank the items in terms of their Borda scores [de1781memoire], defined as the probability that item defeats an item chosen uniformly at random from :
(1) 
Apart from their intuitive appeal, the Borda scores generalize the orderings considered in several popular comparison models, including the classical, parametric Bradley-Terry-Luce (BTL) [bradley_rank_1952, luce_individual_1959] and Thurstone [thurstone_law_1927] models, as well as the nonparametric Strong Stochastic Transitivity (SST) model [tversky_substitutability_1969]. In all of these models, the intrinsic model-defined ordering coincides with that given by the scores . Rather than learning the scores exactly, or ranking items according to their exact score, this paper considers the problem of approximately partitioning the items into sets of prespecified sizes according to their respective scores. This includes finding a total ordering that is approximately correct, and the task of finding a set of items that is close to the top items. For simplicity, we exclusively focus on the latter problem in this paper.
Contributions:
Our main contribution is to present and analyze a novel active ranking algorithm for estimating an approximate ranking of the items. The algorithm is based on adaptively estimating the scores to within sufficient resolution to deduce a ranking. We establish that with high probability, the algorithm returns a ranking which satisfies the desired approximation guarantee, and attains a distribution-dependent sample complexity which can be parameterized in terms of the scores . We then prove distribution-dependent lower bounds that match our upper bound up to logarithmic factors for many problem instances. Our analysis leverages the fact that ranking in terms of the scores is related to a particular class of multi-armed bandit problems [evendar_action_2006, bubeck_multiple_2013, urvoy_generic_2013]; this same connection has been observed in the context of finding the top item [yuekarmed2012, jamieson_sparse_2015, urvoy_generic_2013]. Since, to the best of our knowledge, the approximate subset selection problem has not been studied in the bandit literature, a version of our algorithm and results are also new when specialized to the multi-armed bandit problem. Finally, we examine pathological distributions for which the complexity of approximate ranking (or approximate subset selection in the multi-armed bandit setup) seems to diverge from what one would expect. In these cases, we show that careful randomized guessing strategies can yield significant improvements in sample complexity.
Motivation for Approximate Rankings:
In order to understand how approximation can drastically reduce the number of comparisons required, let us consider a motivating example. Suppose that we are interested in identifying the top items, and suppose for simplicity that the items are ordered, i.e., (of course this ordering is not known a priori). The paper [heckel_active_2016] shows that in the active setting, the number of comparisons necessary and sufficient for finding the top items is of the order
(2) 
up to a logarithmic factor. Thus, the sample complexity depends on the distribution of the scores; see Figure 1 for how these scores are distributed in some applications. In practice, the differences between the scores often obey the scaling on average (see Figure 1). To identify the top items exactly, the aforementioned optimal active scheme would require on the order of comparisons, and a minimax-optimal passive ranking scheme would even require on the order of comparisons [shah_simple_2015].
Theorem 1 in this paper shows that if one does not need to extract the exact top items, but is instead willing to tolerate a few (say, many) mistakes, then the number of comparisons shrinks drastically, specifically by a factor proportional to . In particular, if we want to find a set of of the items () such that all but of the elements of are among the true top of items (, ), then the overall number of comparisons required would be on the order of . Thus, relaxing to approximate ranking can yield speedups that are linear and quadratic in the number of items, compared to optimal exact active and exact passive schemes, respectively. Moreover, our algorithm (Algorithm 1 below) that obtains this speedup does not require a priori information about the spacings of the , but instead learns a near-optimal measurement allocation for these scores adaptively.
Related works:
There is a vast literature on ranking and estimation from pairwise comparison data; however, most work focuses on finding exact rankings. There are a number of papers [hunter2004mm, negahban_iterative_2012, hajek2014minimax, shah_estimation_2015, shah_simple_2015] devoted to settings in which pairs to be compared are chosen a priori, whereas here we assume that the pairs may be chosen in an active manner. Moreover, several works impose restrictions on the pairwise comparison probabilities, e.g., by assuming the Bradley-Terry-Luce (BTL) parametric model (discussed below) [szorenyi_online_2015, hunter2004mm, negahban_iterative_2012, hajek2014minimax, shah_estimation_2015]. The work [eriksson_learning_2013] considers the problem of finding the very top items using graph-based techniques, whereas [busafekete_topk_2013] considers the problem of finding the top-k items. The work [ailon_active_2011] considers the problem of linearly ordering the items so as to disagree in as few pairwise preference labels as possible. Our work is also related to the literature on multi-armed bandits, as discussed later in the paper.
2 Problem formulation and background
In this section we formally state the approximate ranking problem considered in this paper.
2.1 Pairwise probabilities and scores
Given a collection of items , let us denote by the (unknown) probability that item wins a comparison with item . We let denote a Bernoulli random variable taking a value of if beats and otherwise, so that . Moreover, we require that any comparison results in a winner, so that . For each item , recall that the score (1) defined by corresponds to the probability that item wins a comparison with an item chosen uniformly at random from . We let denote any (possibly nonunique) permutation such that
In words, denotes the item with the largest score. Ranking corresponds to partitioning the items into disjoint sets according to their scores. For simplicity, in this paper we focus on the ranking problem of splitting into the top items and its complement . In this work, our goal is to find an approximation to and in terms of the Hamming distance between two sets , defined as . Specifically, we say the ranking with is Hamming-accurate if
For future reference, we define
corresponding to the set of pairwise comparison matrices with pairwise comparison probabilities lower bounded by .
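As a concrete illustration of these definitions, the following minimal Python sketch computes the scores from a pairwise comparison matrix and measures how far a candidate top set is from the true one. The names `M`, `borda_scores`, `top_k`, and `hamming_distance` are illustrative choices rather than the paper's notation, the 4-item matrix is hypothetical, and the sketch assumes that the Hamming distance between two sets is the size of their symmetric difference.

```python
def borda_scores(M):
    """Score of item i: probability that i beats a uniformly random other item."""
    n = len(M)
    return [sum(M[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def top_k(scores, k):
    """Indices of the k largest scores (ties broken by index)."""
    return set(sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k])


def hamming_distance(a, b):
    """Assumed definition: number of elements contained in exactly one set."""
    return len(set(a) ^ set(b))


# Hypothetical 4-item instance with M[i][j] + M[j][i] == 1 for i != j.
M = [
    [0.5, 0.8, 0.7, 0.9],
    [0.2, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.1, 0.3, 0.4, 0.5],
]
scores = borda_scores(M)
true_top = top_k(scores, 2)
guess = {0, 2}  # candidate top set that swaps item 1 for item 2
print(scores, true_top, hamming_distance(guess, true_top))
```

Under this assumed definition, a single swapped item already contributes two to the distance, since it adds one wrong element and removes one correct element.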
2.2 The active approximate ranking problem
An active ranking algorithm acts on a pairwise comparison model . The goal is to obtain an approximate partition of the items into disjoint sets from active comparisons. At each time instant, the algorithm can compare two arbitrary items, which the algorithm may select based on the outcomes of previous comparisons. When comparing and , the algorithm obtains an independent draw of the random variable in response. The algorithm terminates based on an associated stopping rule, and returns an approximate ranking . For a given tolerance parameter , we say a ranking algorithm is accurate for a pairwise comparison matrix , if the ranking returned is Hamming accurate with probability at least . Moreover, we say that is uniformly accurate over a given set of pairwise comparison models if it is accurate for each .
2.3 Relation to multi-armed bandits
The exact version of the ranking problem considered in this paper is related to the subset selection problem in the bandit literature [kalyanakrishnan_pac_2012]. Specifically, a multi-armed bandit model consists of arms, each a random variable with unknown distribution. The subset selection problem is concerned with identifying the top arms (according to the means) by taking independent draws of the random variables. Various works [yue_beat_2011, yuekarmed2012, urvoy_generic_2013, jamieson_sparse_2015] have observed that, by definition of the score , comparing item to an item chosen uniformly at random from can be modeled as drawing a Bernoulli random variable with mean . Our subsequent analysis relies on this relation.
However, when viewing our problem as a multi-armed bandit problem with means , we are ignoring the fact that the means are coupled, as they must be realized by some pairwise comparison matrix . Due to , this matrix must satisfy certain constraints, such as and (e.g., see the papers [landau_dominance_1953, joe_majorization_1988]). Our algorithm turns out to be near-optimal, even though it does not take those constraints into account. This seems to corroborate the observation in [simchowitz_simulator_2017] that many types of constraints surprisingly do not improve the sample complexity of bandit problems.
Finally, at least to the best of our knowledge, the problem of approximate subset selection has not been studied in the bandit literature, meaning that our algorithm and results are also new when specialized to the multi-armed bandit problem. However, it should be noted that other versions of approximation have been considered in the literature; for instance, [zhou_optimal_2014] studied the problem of selecting arms with low aggregate regret, defined as the gap between the average reward of the optimal solution and the solution given by the algorithm.
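The reduction to a bandit arm can be checked numerically. The minimal sketch below treats one comparison of an item against a uniformly random opponent as a single Bernoulli "arm pull" and compares the empirical mean with the score computed directly from the matrix. The function name `pull_arm`, the 4-item matrix, the seed, and the number of draws are all hypothetical choices made for illustration.

```python
import random


def pull_arm(M, i, rng):
    """One 'arm pull' for item i: duel against a uniformly random other item."""
    n = len(M)
    j = rng.choice([x for x in range(n) if x != i])
    return 1 if rng.random() < M[i][j] else 0


# Hypothetical 4-item instance with M[i][j] + M[j][i] == 1 for i != j.
M = [
    [0.5, 0.8, 0.7, 0.9],
    [0.2, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.1, 0.3, 0.4, 0.5],
]
rng = random.Random(0)
draws = 20000
empirical = sum(pull_arm(M, 1, rng) for _ in range(draws)) / draws
# Score of item 1 computed directly from the matrix.
exact = sum(M[1][j] for j in range(4) if j != 1) / 3
print(empirical, exact)
```

The empirical mean concentrates around the exact score as the number of draws grows, which is the property the subsequent analysis relies on.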
2.4 Parametric models
In this section, we introduce a family of parametric models that are popular in the pairwise comparison literature [szorenyi_online_2015, hunter2004mm, negahban_iterative_2012, hajek2014minimax, shah_estimation_2015]. We focus on these parametric models in Section 3.3, where we show that, perhaps surprisingly, if the pairwise comparison probabilities are bounded away from zero, then for most configurations of scores, these parametric assumptions can provide at most small gains in sample complexity.
Any member of this family is defined by a strictly increasing and continuous function obeying , for all . The function is assumed to be known. A pairwise comparison matrix in this family is associated with an unknown vector , where each entry of represents some quality or strength of the corresponding item. The parametric model associated with the function is defined as:
Popular examples of models in this family are the Bradley-Terry-Luce (BTL) model, obtained by setting equal to the sigmoid function , and the Thurstone model, obtained by setting equal to the Gaussian CDF. Since is equivalent to , the ranking induced by the scores is equivalent to that induced by .
3 Hamming-LUCB: Algorithm and analysis
In this section, we present our approximate ranking algorithm, and an analysis proving that it is near-optimal for many interesting and natural problem instances.
3.1 The Hamming-LUCB algorithm
Our algorithm is based on actively identifying sets and consisting of items and items, respectively, such that with high confidence the items in the first set have a larger score than the items in the second set. Once we have found such sets, we can arbitrarily distribute the remaining items to the sets and in order to obtain a Hamming-accurate ranking with high confidence.
Our algorithm identifies those sets based on adaptively estimating the scores . We estimate the score of item by comparing item with items chosen uniformly at random from , which yields an unbiased estimate of . The key idea is to estimate the scores only well enough to obtain the two sets and from them. Based on the current estimates of the scores and their associated confidence intervals, the strategy decides which estimate to “update” by comparing the corresponding item to a randomly chosen item. Our strategy to update the estimates of the scores is guided by the insight that the “easiest” items to distinguish are the top items, , and the bottom items, . Hence, our algorithm focuses on what it “thinks” are those top and bottom items.
We define a confidence bound based on a non-asymptotic version of the law of the iterated logarithm [kaufmann_complexity_2014, jamieson_lil_2014]; it is of the form , where is an integer corresponding to the number of comparisons, and with the constants involved explicitly chosen by setting
For each item , the algorithm stores a counter of the number of comparisons in which it has been involved, along with an empirical estimate of the associated score . For notational convenience, we adopt the shorthands and . Within each round, we also let denote a permutation of such that . We then define the indices
(3) 
These indices are the analogues of the standard indices of the Lower-Upper Confidence Bound (LUCB) strategy from the bandit literature [kalyanakrishnan_pac_2012] for the top and bottom items. The LUCB strategy for exact top recovery would update the scores and (for ) at each round. As mentioned before, our strategy will go after what it “thinks” are the top items, , and what it “thinks” are the bottom items, . Moreover, the algorithm keeps all the other items in consideration for inclusion in these sets, by keeping their confidence intervals below the confidence intervals of the items in and (cf. equation (4) in the algorithm below). This is crucial to ensure that the algorithm does not get stuck trying to distinguish the middle items , which in general requires many comparisons, as their scores are typically closer. In Figure 2 we show an example run of the Hamming-LUCB algorithm, to illustrate the idea.
(5) 
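The loop described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not a verbatim transcription of Algorithm 1: the confidence radius is a simple Hoeffding-style anytime bound standing in for the law-of-the-iterated-logarithm bound, the index rules are one plausible reading of the description (guard the least certain of the presumed top and bottom items, and keep the middle items' intervals at least as tight), and all names (`hamming_lucb`, `confidence_radius`, `max_comparisons`) are hypothetical.

```python
import math
import random


def confidence_radius(t, n, delta):
    # Assumed Hoeffding-style anytime radius (union bound over items and
    # counts); a stand-in for the iterated-logarithm bound in the text.
    return math.sqrt(math.log(4.0 * n * t * t / delta) / (2.0 * t))


def hamming_lucb(M, k, h, delta=0.1, rng=None, max_comparisons=10**6):
    """Sketch: return (top set, bottom set, number of comparisons used)."""
    rng = rng or random.Random(0)
    n = len(M)
    if h < 0 or k - h < 1 or n - k - h < 1:
        raise ValueError("need 0 <= h, k - h >= 1 and n - k - h >= 1")

    def duel(i):
        # Compare item i with a uniformly random other item.
        j = rng.choice([x for x in range(n) if x != i])
        return 1.0 if rng.random() < M[i][j] else 0.0

    wins = [duel(i) for i in range(n)]  # one initial comparison per item
    counts = [1] * n
    total = n
    while total < max_comparisons:
        est = [wins[i] / counts[i] for i in range(n)]
        rad = [confidence_radius(counts[i], n, delta) for i in range(n)]
        order = sorted(range(n), key=lambda i: -est[i])
        top, mid_hi = order[:k - h], order[k - h:k]
        mid_lo, bottom = order[k:k + h], order[k + h:]
        d1 = min(top, key=lambda i: est[i] - rad[i])     # weakest presumed top
        d2 = max(bottom, key=lambda i: est[i] + rad[i])  # strongest presumed bottom
        if est[d1] - rad[d1] >= est[d2] + rad[d2]:
            break  # presumed top and bottom items are separated
        # Sample the least certain item on each side, middle items included.
        b1 = max([d1] + mid_hi, key=lambda i: rad[i])
        b2 = max([d2] + mid_lo, key=lambda i: rad[i])
        for b in (b1, b2):
            wins[b] += duel(b)
            counts[b] += 1
            total += 1
    order = sorted(range(n), key=lambda i: -wins[i] / counts[i])
    return order[:k], order[k:], total


# Hypothetical instance: item i beats item j with probability 0.9 when i < j.
M = [[0.5 if i == j else (0.9 if i < j else 0.1) for j in range(6)]
     for i in range(6)]
S1, S2, used = hamming_lucb(M, k=3, h=1, rng=random.Random(1))
print(sorted(S1), sorted(S2), used)
```

With `h=0` the middle sets are empty and the loop reduces to an LUCB-style rule that samples only the two critical items. The `max_comparisons` cap is a safety limit added for the sketch and is not part of the described strategy.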
3.2 Guarantees and optimality of the Hamming-LUCB algorithm
We next establish guarantees on the number of comparisons for the Hamming-LUCB algorithm to succeed. As we show below, the number of comparisons depends on the following gaps between the scores
Thus, as one might intuitively expect, the number of comparisons is typically smaller when is larger, as the corresponding gaps typically become larger.
Theorem 1.
For any , the Hamming-LUCB algorithm run with confidence parameter is Hamming-accurate, and with probability at least , makes at most comparisons, where
(6) 
The notation absorbs factors logarithmic in , and doubly logarithmic in the gaps.
Theorem 1 proves that the Hamming-LUCB algorithm is accurate, and characterizes the number of comparisons that it requires as a function of the gaps between the scores.
Compared with the number of comparisons necessary and sufficient for finding the top items, we see that the Hamming-LUCB algorithm depends on the gaps and instead of the gaps and which appear in the sample complexity for finding the top items (cf. equation (2)). These gaps are typically significantly larger, resulting in a lower sample complexity. For example, in practice, the scores are often increasing in that is on average on the order of . Thus, for sufficiently large , several real-world models belong to the class (see Figure 1 for plausible members of this class):
(7) 
For this class, the complexity of finding the top items with the Hamming-LUCB algorithm is on the order of , which is smaller by a factor of than the complexity for finding the exact top items.
Moreover, Hamming-LUCB provides a strict improvement over the optimal sample complexity in the passive setup, for which Shah and Wainwright [shah_simple_2015] establish upper bounds and minimax lower bounds which state that comparisons are necessary and sufficient to identify the top items up to a Hamming error with high probability.
As increases, the upper bound depends on gaps between items with increasingly disparate position in the ranking, and thus, the upper bound on the sample complexity decreases. The following lower bound shows that, up to logarithmic factors in , doubly logarithmic factors in the gaps, and a multiplicative scaling of , the Hamming-LUCB algorithm is optimal.
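To get a feel for the size of this effect, the following Python sketch evaluates a simplified complexity proxy on linearly spaced scores. Everything here is an assumption made for illustration: the proxy is a plain sum of inverse squared gaps that ignores logarithmic factors and the middle items, the reference ranks are one reading of the gaps discussed above, and the values of `n`, `k`, and `h` as well as the name `inverse_square_gap_sum` are hypothetical. It is not the exact expression from Theorem 1.

```python
def inverse_square_gap_sum(tau, k, h):
    """Assumed proxy: sum of 1 / gap^2 over items outside the middle band.

    tau is sorted in decreasing order (tau[0] is the best item). Items ranked
    1..k-h are measured against rank k+1+h, and items ranked k+1+h..n are
    measured against rank k-h. Middle items are ignored in this proxy.
    """
    n = len(tau)
    upper_ref = tau[k + h]      # score at rank k+1+h (0-based index k+h)
    lower_ref = tau[k - h - 1]  # score at rank k-h
    top = sum(1.0 / (tau[i] - upper_ref) ** 2 for i in range(k - h))
    bottom = sum(1.0 / (lower_ref - tau[i]) ** 2 for i in range(k + h, n))
    return top + bottom


n, k = 1000, 100
tau = [1.0 - (i + 1) / (n + 1) for i in range(n)]  # linearly spaced scores
exact = inverse_square_gap_sum(tau, k, h=0)    # exact top-k proxy
approx = inverse_square_gap_sum(tau, k, h=10)  # tolerate a few mistakes
print(exact, approx, exact / approx)
```

On such linearly spaced scores the proxy for the exact problem is dominated by the few items adjacent to the boundary, which is why even a small tolerance removes most of the cost.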
Theorem 2.
For any , let denote an algorithm which is uniformly accurate over . Then, when is run on any comparison instance , must make at least comparisons in expectation, where
for some universal constant .
Note that the above lower bound does not depend on the gaps involving the items . However, we can still relate the lower bound to the upper bound by (see Section A for the simple proof)
(8) 
so that we see that, up to rescaling our Hamming error tolerance , our upper and lower bounds ( and , respectively) match up to logarithmic factors. For many problem instances of interest, such as models in the class in equation (7), the sample complexity bounds and degrade gracefully with the Hamming tolerance , so that typically we have .
Observe that if , we recover the exact top recovery upper bound in equation (2), which is related to similar results for multi-armed bandits [kalyanakrishnan_pac_2012]. We believe that by modifying the confidence intervals in Hamming-LUCB as in the LUCB++ algorithm of Simchowitz et al. [simchowitz_simulator_2017], one can sharpen the upper bound on the sample complexity by replacing with on the terms corresponding to items , thereby matching known lower bounds for the top subset selection problem in the bandit literature [simchowitz_simulator_2017, chen2017nearly, kalyanakrishnan_pac_2012]. In the interest of simplicity, we defer refining these logarithmic factors to later work.
3.3 Parametric models
Even though the lower bound of qualitatively matches the upper bound , it gives the misleading impression that an approximate algorithm can get away without querying the items in . In the proof section, we use techniques from [simchowitz_simulator_2017] and [chen2017nearly] to establish a more refined technical lower bound showing that all items, including those with ranks close to must be compared an “adequate” number of times. For simplicity, we state a consequence of this lower bound applied to the parametric models described in Section 2.4. In addition to showing that each item has to be compared a certain number of times, this bound also establishes that even knowledge of the exact parametric form of the pairwise comparison probabilities cannot drastically improve the performance of an active ranking algorithm.
In more detail, we say that a model is parametric, if there exists a strictly increasing CDF such that for some weights . For any pair of constants , we say that a CDF is bounded, if it is differentiable, and if its derivative satisfies the bounds
(9) 
Note that for the popular BTL and Thurstone models, equation (9) holds with close to one, provided that is not too small. We say that an algorithm is symmetric if its distribution of comparisons commutes with permutations of the items. For any such algorithm, our main lower bound is as follows:
Theorem 3.
For a given , let be any symmetric algorithm that is uniformly Hamming accurate over . Then, when is run on the instance , for any integer and any item , it must make at least
comparisons involving item on average.
In particular, by choosing , we see that the total sample complexity is lower bounded by
(10) 
which is equivalent to the upper bound achieved by the Hamming-LUCB algorithm up to logarithmic factors. The lower bound from Theorem 3 is stronger than the lower bound from Theorem 2, in that it applies to the larger class of algorithms that are only accurate over the smaller class of parametric models. In fact, the parametric subclass is significantly smaller than the full set of pairwise comparison models , in the sense that one can find matrices in that cannot be well-approximated by any parametric model [shah_stochastically_2015]. Therefore, Theorem 3 shows that, up to rescaling the Hamming error tolerance and logarithmic factors, the Hamming-LUCB algorithm is optimal, even if we restrict ourselves to algorithms that are uniformly accurate only over a parametric subclass. Thus, in the regime where the pairwise comparison probabilities are bounded away from zero, parametric assumptions cannot substantially reduce the sample complexity of finding an approximate ranking, an observation that has been made previously in the paper [heckel_active_2016] for exact rankings.
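To make the parametric family of Section 2.4 concrete, the following minimal Python sketch builds BTL and Thurstone comparison matrices from a weight vector and checks that ranking by score agrees with ranking by weight. It assumes the usual form in which the probability that item i beats item j is the link function applied to the difference of their weights; the weight vector `w` and the function names are hypothetical.

```python
import math


def sigmoid(t):
    """Link function of the BTL model."""
    return 1.0 / (1.0 + math.exp(-t))


def gaussian_cdf(t):
    """Link function of the Thurstone model."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def parametric_matrix(w, phi):
    """Assumed form: M[i][j] = phi(w[i] - w[j])."""
    n = len(w)
    return [[phi(w[i] - w[j]) for j in range(n)] for i in range(n)]


def borda_scores(M):
    n = len(M)
    return [sum(M[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


w = [1.2, 0.4, 0.0, -0.9]  # hypothetical item strengths
for phi in (sigmoid, gaussian_cdf):
    M = parametric_matrix(w, phi)
    scores = borda_scores(M)
    order_by_score = sorted(range(len(w)), key=lambda i: -scores[i])
    order_by_weight = sorted(range(len(w)), key=lambda i: -w[i])
    print(phi.__name__, order_by_score == order_by_weight)
```

Because both link functions are strictly increasing, a larger weight gives a larger winning probability against every opponent, so the two orderings agree in this example.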
3.4 Random guessing
Even though our upper and lower bounds essentially match whenever , there are pathological instances where , and where the Hamming-LUCB algorithm will make considerably more comparisons than a careful random guessing strategy.
As an example, consider a problem instance parameterized by , with scores given by
for some and . The upper bound (6) for the Hamming-LUCB strategy is at least on the order of , since the gap between the th and the th largest score is . However, the lower bound provided by Theorem 2 is , which is independent of . Thus, by making small, the ratio of upper and lower bounds becomes arbitrarily large. Intuitively, Hamming-LUCB is wasteful because it is attempting to identify the exact top arms with too much precision. However, for this particular problem instance, the following random guessing strategy will attain our lower bound. First, we obtain estimates of each score by comparing item to randomly chosen items. For each score, test whether there are items obeying and whether there are items obeying . If yes, assign these items the estimates and , respectively, and assign all remaining items uniformly at random to the sets and , and terminate.
4 Experimental results
In this section, we provide experimental evidence that corroborates our theoretical claim that the Hamming-LUCB algorithm can significantly reduce the number of comparisons if one is content with an approximate ranking. We show that these gains are attained on a real-world data set. Specifically, we generate a pairwise comparison model by choosing such that the Borda scores coincide with those found empirically in the PlaNYC survey [salganik_wiki_2015]; see panel (b) of Figure 1. We emphasize that, since Hamming-LUCB depends only on the Borda scores and not on the comparison probabilities , these simulations provide a faithful representation of how Hamming-LUCB performs on real-world data. In Figure 3, we plot the results of running the Hamming-LUCB algorithm on the PlaNYC pairwise comparison model in order to determine the top items, for different values of . We observed that the results for other values of are very similar. As suggested by our theory, the number of comparisons needed to find an approximate ranking decays in a manner inversely proportional to . We compare the Hamming-LUCB algorithm to another sensible active ranking strategy for obtaining a Hamming-accurate ranking. Specifically, we consider a version of the successive elimination strategy proposed in [heckel_active_2016, Sec. 3.1] for finding an exact ranking. This strategy can be adapted to yield a Hamming-accurate ranking by changing its stopping criterion. Instead of stopping once all items have been eliminated, we stop when either items have been assigned to the top, or items have been assigned to the bottom. While this strategy yields a Hamming-accurate ranking, its sample complexity is, up to logarithmic factors, equal to , which is strictly larger than that of the Hamming-LUCB algorithm. As Figure 3 shows, this strategy requires significantly more comparisons for finding an approximate ranking, thereby validating the benefits of our approach.
5 Proofs
In this section, we provide the proofs of our theorems. In order to simplify notation, we assume without loss of generality (reindexing as needed) that the underlying permutation is equal to the identity, so that .
5.1 Proof of Theorem 1
Our analysis uses an argument inspired by the proof of the performance guarantee of the original LUCB algorithm from the bandit literature, presented in [kalyanakrishnan_pac_2012]. We begin by showing that the estimate is guaranteed to be close to , for all , with high probability.
Lemma 1 ([kaufmann_complexity_2014, Lem. 19]).
For any , with probability at least , the event
(11) 
occurs. The statement continues to hold for any with , .
Lemma 1 is a non-asymptotic version of the law of the iterated logarithm from [kaufmann_complexity_2014] and [jamieson_lil_2014].
We first show that, on the event defined in equation (11), the Hamming-LUCB algorithm returns sets and obeying , as desired. Indeed, suppose that . This implies that and differ in at most values, which in turn implies that and differ by at most values. Therefore, . Next, suppose that . Then, at least one item in is in . Thus, on , the termination condition (5) implies that . Similarly as above, this in turn implies that .
We next show that on the event , Hamming-LUCB terminates after the desired number of comparisons. Let , and define the event that item is bad as
Lemma 2.
If occurs and the termination condition (5) is false, then either or occurs.
Given Lemma 2, we can complete the proof in the following way. For an item , define
and let be the largest integer satisfying the bound . A simple calculation (see Section 5.1.1 for the details) yields that
(12) 
Let be the th iteration of the steps in the LUCB algorithm, and let and be the two items selected by the algorithm for comparison in that iteration. Note that in each iteration only those two items are compared to other items. By Lemma 2, we can therefore bound the total number of comparisons by
(13) 
For inequality (i), we used the fact (12), and inequality (ii) follows because can only be true for iterations .
We conclude the proof by noting that the definition of and some algebra yields (see [heckel_active_2016, Eq. (20)]) that for sufficiently large
Applying this inequality to the RHS of equation (13) above concludes the proof.
5.1.1 Proof of fact (12)
First, consider an item . We show that if , then is false. On the event ,
(14) 
where inequality (i) follows from for , by definition of , and the last inequality follows from and . Thus, does not occur.
For an item , the argument that is false is analogous. For an item in the middle , the event is false by definition. This concludes the proof.
5.1.2 Proof of Lemma 2
We prove the lemma by considering all different values the indices and selected by the LUCB algorithm can take on, and showing that in each case and cannot occur simultaneously. For notational convenience, we define the indices
and note that

Suppose that and , and that both and do not occur. First note that
(15)
In order to establish this claim, note that the inequality holds trivially with equality if . If , then it follows from and . Thus, we obtain
(16)
where the last inequality holds by the assumption that does not occur. An analogous argument yields that
(17)
Combining those inequalities yields , which contradicts that the termination condition (5) is false.

Next, suppose that is an index in the middle and is in the very bottom, i.e., , and , and both and do not occur.
First note that from and not occurring, we have that
Here, inequality (i) holds by and , and inequality (ii) follows by the definition of . On the event , this implies
(18)
Inequality (18) can only be true for all if , which is equivalent to
Again using that and not occurring, we have that
(19)
where inequality (i) holds since the termination condition (5) is false, and inequality (ii) follows from , where the last inequality holds since does not occur, by assumption.
From for all , it follows that for ,
(20)
Below, we show that
, for all .
(21)
It follows that
(22)
Together with equation (18), this yields that for all