1 Introduction
Rank aggregation has been investigated in a variety of contexts such as social choice (Caplin and Nalebuff, 1991; Azari Soufiani et al., 2014), web search and information retrieval (Dwork et al., 2001), recommendation systems (Baltrunas et al., 2010), and crowdsourcing (Chen et al., 2013), to name a few. The task aims to bring a consistent ordering to a collection of items, given only partial preference information.
Due to its broad range of applications, a large volume of work on ranking has been produced. Among the numerous ranking schemes developed in the literature, arguably the most dominant paradigms are spectral ranking algorithms (Brin and Page, 1998; Dwork et al., 2001; Negahban et al., 2012; Seeley, 1949; Wei, 1952; Vigna, 2009) and maximum likelihood estimation (MLE) (Ford, 1957; Hunter, 2004). Postulating the existence of underlying real-valued true preferences of the items, these paradigms intend to produce preference estimates that are consistent in a global sense, usually measured by $\ell_2$ estimation error, in order to rank the items. While such estimates are faithful globally with respect to the latent preferences, they do not necessarily guarantee optimal ranking accuracy. Accurate ranking has more to do with how well the ordering of the estimates matches that of the true preferences, and less to do with how close the estimates are to the true preferences in terms of overall estimation error.
In many realistic applications of interest, however, what we expect from accurate ranking is not an ordering that respects the entire item preferences in a global sense. Instead, we expect an ordering that precisely separates only the few items with the highest ranks from the rest. In light of this, recent work (Chen and Suh, 2015) investigated top-$K$ identification, which aims to recover only the correct set of top-ranked items. As a result, it characterized the minimax limit on the sample size (i.e., sample complexity) under a long-established and prominent statistical model, namely the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959). In achieving the fundamental limit, its proposed scheme, called Spectral MLE, merges the two popular paradigms in series so as to yield estimates with low $\ell_\infty$ error, shown therein to be crucial in identifying top-ranked items with respect to their preferences. To start with, Spectral MLE obtains preference estimates via a spectral method, particularly Rank Centrality (Negahban et al., 2012), which produces estimates with low squared loss. Then, by performing additional pointwise MLEs on these estimates, it drives down their $\ell_\infty$ estimation error, leading to successful top-$K$ ranking.
Analyzing $\ell_\infty$ error bounds can be interesting in its own right, as seen in (Chen and Suh, 2015), where such analysis led to characterizing the minimax limit on the sample size for top-$K$ ranking. What makes it even more appealing is its technical challenge. Even after decades of research since the introduction of spectral methods and MLE, the two dominant approaches in the literature, we still lack notable results on $\ell_\infty$ error bounds. Analytical techniques that have proven useful for obtaining tight $\ell_2$ error bounds do not translate well into meaningful $\ell_\infty$ error bounds. Therein lie our main contributions: tangible progress in $\ell_\infty$ analysis.
Main contributions. In this work, we provide a tight analysis of $\ell_\infty$ error bounds of a spectral method, making progress toward a richer understanding of rank aggregation. The analysis makes it possible to characterize conditions under which the spectral method achieves minimax optimal performance, by comparing it with a fundamental bound that delineates the performance limit no ranking algorithm can surpass. To be more concrete, we investigate reliable recovery of the top-$K$ ranked items under the BTL pairwise comparison model, in which one ranks items according to their perceived utilities, modeled as noisy observations of their underlying true utilities. We mainly consider two comparison models: one is a deterministic model in which the item pairs we compare are given a priori; the other is a random model in which the item pairs we compare are chosen in a random and non-adaptive manner. As our main results, in the former model, we derive an upper and a lower bound on the sample size for reliable recovery of the top-$K$ ranked items (Theorems 1 and 2), which respectively correspond to sufficient and necessary conditions for reliable top-$K$ identification when the spectral method Rank Centrality is employed. Inspecting the gap between the derived bounds allows us to identify conditions under which Rank Centrality can be optimal. We observe that, for well-balanced cases where the number of distinct items that an item is compared to (which we call its degree) does not deviate greatly from its minimum to its maximum, how the gap scales can be nicely expressed in terms of the degree (details in Section 3). In the random model we consider, item pairs are first chosen independently with probability $p$, and the chosen pairs are then repeatedly compared (hence random and non-adaptive). Finding the random model fit for the well-balanced case, we extend the aforementioned results and obtain a stronger one.
We demonstrate that the gap shrinks to the order of a constant (Theorem 3), and hence show that a spectral method alone can achieve the order-wise optimal sample complexity for top-$K$ ranking that was recently characterized under the same model in (Chen and Suh, 2015). There are two distinctions to note in comparison with the results in (Chen and Suh, 2015). First, we show that a spectral method can achieve the limit in so-called dense regimes, where the number of distinct item pairs we compare is somewhat large. That is, in comparison to the regimes in which Chen and Suh characterized the limit, the regimes in which we achieve it are slightly restricted. Second, we show that applying Rank Centrality alone suffices to achieve the limit in the regimes mentioned above, and hence is more advantageous in terms of computational complexity than Spectral MLE, which merges a spectral method (particularly Rank Centrality (Negahban et al., 2012)) with an additional stage performing coordinate-wise MLEs.
Related work. Perhaps most relevant are (Chen and Suh, 2015) and (Negahban et al., 2012). To the best of our knowledge, Chen and Suh were the first to focus on top-$K$ identification under the random comparison model of our interest. A key distinction from our work is that while we employ only a spectral method to obtain bounds on estimation error, they incorporated an additional refinement stage that performs successive pointwise MLEs. Negahban et al. developed Rank Centrality, on which our proposed ranking scheme is solely based. Perhaps surprisingly, it was proved that Rank Centrality, a simple spectral method, achieves the same performance as MLE in $\ell_2$ error. A priori, there is no reason to believe that a spectral method can achieve such a strong minimax optimal performance. In a similar spirit, we show that this spectral method is also minimax optimal in $\ell_\infty$ error, achieving the same optimality guarantee as the MLE-based algorithm in (Chen and Suh, 2015). The main objective of our work is to identify the regimes where spectral methods are as good as MLE, and to prove the minimax optimality of Rank Centrality in those regimes.
Maystre and Grossglauser (2015) recently developed an algorithm that also shares the spirit of spectral ranking, called Iterative Luce Spectral Ranking (ILSR), and showed its performance to be the same as that of MLE for estimating underlying preference scores. Rajkumar and Agarwal (2014) put forth statistical assumptions that ensure the convergence of several rank aggregation methods, including Rank Centrality and MLE, to an optimal ranking. They derived sample complexity bounds, although statistical optimality is not rigorously justified, and total ordering rather than top-$K$ ranking is concerned. In the pairwise preference setting, many works with interests different from ours have appeared. Some studied active ranking, where samples are obtained adaptively. Jamieson and Nowak (2011) considered perfect total ranking and characterized the query complexity gain of adaptive sampling in the noise-free case, and the works of (Braverman and Mossel, 2008; Jamieson and Nowak, 2011; Ailon, 2012; Wauthier et al., 2013) explored the query complexity in the presence of noise while aiming at approximate total rankings. Eriksson (2013) proposed a scheme that intends to find the top-ranked items when observation errors are assumed to be i.i.d. Other works looked into models different from the BTL model. The works of (Lu and Boutilier, 2011; Busa-Fekete et al., 2014) considered ranking problems with pairwise comparison data under the Mallows model (Mallows, 1957). Azari Soufiani et al. (2013) broke full rankings into pairwise comparisons toward parameter estimation under the Plackett-Luce (PL) model (Plackett, 1975; Luce, 1959). Hajek et al. (2014), under the PL model, derived minimax lower bounds on parameter estimation error when schemes that break partial rankings into pairwise comparisons are used.
Very recently, Shah and Wainwright (2015) showed that a simple counting method (Borda, 1781) can achieve the fundamental limit on the sample size, up to constant factors, for top-$K$ ranking under a general parametric model in which observations depend only on predefined probabilities of one item being preferred to another (Shah et al., 2015), including the BTL model as a special case. However, their assumption that the number of comparisons for each item pair follows a Binomial distribution leads to a nearly complete observation model in which almost every item pair is compared at least once. On the contrary, we examine a different observation model (described in detail below), also considered in (Negahban et al., 2012) and (Chen and Suh, 2015), which well captures the comparison graph structure that affects the sample complexity (see Theorem 1 of (Negahban et al., 2012) and the numerical experiments in Section 4).
Notation. Unless otherwise specified, we use $[n]$ to represent $\{1, \ldots, n\}$, $\mathcal{G}(n, p)$ to represent an Erdős–Rényi random graph with $n$ vertices in which each pair of vertices is connected by an edge independently with probability $p$, and $d_i$ to represent the out-degree of vertex $i$.
2 Problem Formulation
Comparison model and assumptions. Suppose we perform a few pairwise evaluations on $n$ items. To gain a statistical understanding of the ranking limits, we assume the pairwise comparison outcomes are generated based on the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959), a long-established model that has been studied in numerous applications (Agresti, 2014; Hunter, 2004).

Preference scores.
The BTL model postulates the existence of an underlying preference vector $w = (w_1, \ldots, w_n)$, where $w_i$ represents the preference score of item $i$. The outcome of each pairwise comparison depends solely on the latent scores of the items being compared. Without loss of generality, we assume that
(1)
We assume that the range of the scores is fixed irrespective of $n$. For some positive constants $w_{\min}$ and $w_{\max}$:
(2) In fact, the case in which the range grows with $n$ can be reduced to the above fixed-range regime by separating out those items with vanishing scores (e.g., via a voting method like Borda count (Borda, 1781; Ammar and Shah, 2011)).
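To make the model concrete, the following minimal sketch computes the BTL comparison probability from two latent scores (the function and variable names are ours, for illustration only, not the paper's notation):

```python
def btl_win_prob(w_i: float, w_j: float) -> float:
    """Probability that item i is preferred to item j under the BTL
    model: determined solely by the two latent scores."""
    return w_i / (w_i + w_j)

# Equal scores give a fair coin; a larger score wins more often.
p_fair = btl_win_prob(1.0, 1.0)   # 0.5
p_skew = btl_win_prob(3.0, 1.0)   # 0.75
```

Note that the probability is invariant to a common rescaling of the scores, which is why the stationary distribution in later sections recovers the scores only up to a constant factor.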

Comparison model. We denote by $\mathcal{G} = ([n], \mathcal{E})$ a comparison graph in which items $i$ and $j$ are compared if and only if $(i, j)$ belongs to the edge set $\mathcal{E}$. We take two kinds of comparison graphs into account. We examine general comparison graphs, which can exhibit all possible topologies described by an edge set $\mathcal{E}$ given a vertex set $[n]$. Furthermore, we investigate random comparison graphs constructed by the Erdős–Rényi random graph model, in which each pair of vertices is connected by an edge independently with probability $p$.
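A random comparison graph of the kind just described can be sampled as follows (a minimal sketch; the helper name is ours):

```python
import numpy as np

def erdos_renyi_edges(n: int, p: float, seed: int = 0) -> list:
    """Sample the edge set of an Erdos-Renyi comparison graph:
    each of the n*(n-1)/2 item pairs is compared (i.e., is an edge)
    independently with probability p."""
    rng = np.random.default_rng(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

edges = erdos_renyi_edges(200, 0.1)
# The number of compared pairs concentrates around p * n * (n - 1) / 2.
```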

Pairwise comparisons. For each $(i, j) \in \mathcal{E}$, we observe $L$ comparisons between items $i$ and $j$. The outcome of the $\ell$-th comparison between them, denoted by $y_{ij}^{(\ell)}$, is generated based on the BTL model:
(3) where $y_{ij}^{(\ell)} = 1$ indicates that item $i$ is preferred over item $j$. We adopt the convention $y_{ji}^{(\ell)} = 1 - y_{ij}^{(\ell)}$. We assume that, conditional on $\mathcal{G}$, the $y_{ij}^{(\ell)}$'s are jointly independent over all $(i, j)$ and $\ell$. For ease of presentation, we represent the collection of sufficient statistics as
(4)
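The sufficient statistic for a pair, i.e., the empirical fraction of wins over the $L$ repeated comparisons, can be simulated as below (an illustrative sketch; the helper name and the convention that it returns the fraction won by the first item are our assumptions):

```python
import numpy as np

def empirical_win_fraction(w, i, j, L, rng):
    """Simulate L independent BTL comparisons of items i and j and
    return the fraction of comparisons won by item i."""
    p_ij = w[i] / (w[i] + w[j])
    return rng.binomial(L, p_ij) / L

rng = np.random.default_rng(0)
w = np.array([2.0, 1.0])
y = empirical_win_fraction(w, 0, 1, L=10_000, rng=rng)
# y concentrates around w[0] / (w[0] + w[1]) = 2/3 as L grows.
```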
Performance metric and goal. Given the pairwise comparisons, one wishes to know whether or not the top-$K$ ranked items are identifiable. In light of this, we consider the probability of error in identifying the correct set of the top-$K$ ranked items, namely,
(5) 
where $\psi$ is any ranking scheme that returns a set of $K$ indices and $[K]$ is the set of the first $K$ indices. Our goal in this work is to characterize the admissible region in which top-$K$ ranking is feasible for a given BTL parameter $w$, in other words, in which the probability of error can be made vanishingly small as $n$ grows. The admissible region is defined as
(6) 
For a comparison graph , we are interested in the sample complexity defined as
(7) 
where . Note that the way the sample complexity is defined in (7) shows that we investigate minimax scenarios in which nature may behave in an adversarial manner, choosing the worst-case preference scores $w$.
3 Main Results
The most crucial part of top-$K$ ranking hinges on separating the two items near the decision boundary, i.e., the $K$-th and $(K+1)$-th ranked items. Unless the gap between their scores is large enough, noise in the observations can lead to erroneous estimates. In view of this, we pinpoint a separation measure as
(8) 
This measure turns out to play a key role in determining the fundamental limits of top-$K$ identification.
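As an illustration, one common form of such a separation measure, the gap between the $K$-th and $(K+1)$-th largest scores, can be computed as follows (the normalization by the largest score is our assumption for this sketch, since the precise definition in (8) is not reproduced here):

```python
import numpy as np

def separation(w, K: int) -> float:
    """Gap between the K-th and (K+1)-th largest scores, normalized
    by the largest score (the normalization is illustrative)."""
    s = np.sort(np.asarray(w))[::-1]   # scores in decreasing order
    return (s[K - 1] - s[K]) / s[0]

delta = separation([4.0, 3.0, 2.0, 1.0], K=2)   # (3 - 2) / 4 = 0.25
```

The larger this quantity, the easier it is to tell the boundary items apart from noisy comparisons.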
As noted in (Ford, 1957), if the comparison graph is not connected, it is impossible to determine the relative preferences between two disconnected components. Hence, we assume all comparison graphs considered in this paper are connected. For the Erdős–Rényi model, we make the following assumption for connectivity:
(9) 
Our main findings are sufficient and necessary conditions derived for reliable top-$K$ identification on a general comparison graph. In particular, for a random comparison graph drawn according to the Erdős–Rényi model, we attain an order-wise tight sufficient condition for feasible top-$K$ ranking. We first state our results for general comparison graphs. Given a comparison graph and , if
(10) 
then Rank Centrality correctly identifies the top-$K$ ranked items with probability at least , where , and are some numerical constants, is the Laplacian matrix of graph whose entries are defined as , and . Here is the spectral gap of the matrix, defined as the difference between its two largest absolute eigenvalues, is the maximum out-degree of the vertices in and is the minimum. Note that in terms of the sample complexity defined in (7), this theorem establishes a sufficient condition for reliable top-$K$ ranking. Precisely,
(11)
We provide the proof of this theorem in Section 6.
What follows next is a necessary condition for reliable top-$K$ ranking. Fix . Given a comparison graph , if
(12) 
for some numerical constant , then for any ranking scheme , there exists a preference score vector with separation such that . This result implies that we need at least for reliable top-$K$ ranking. Expressing this result in terms of the sample complexity, we have
(13) 
The proof is a generalization of that of Theorem 2 in (Chen and Suh, 2015). We provide it in Section 7.
For well-balanced cases where , one can verify that is on the order of and is on the order of . Taking these two together, the gap between the necessary condition and the sufficient condition is a factor of . We note that for well-balanced graphs, when is at least , the gap disappears; that is, Rank Centrality is optimal. We make this point precise in the following theorem, where we analyze comparisons over Erdős–Rényi graphs.
Suppose . There exist positive numerical constants , and such that if , , and
(14) 
then Rank Centrality correctly identifies the top-$K$ ranked items with probability at least , where . This result offers a much tighter bound than the general sufficient condition above. In terms of the sample complexity, a sufficient condition on a random comparison graph is
(15) 
since the sufficient condition for reliable ranking on a random comparison graph given by (14) is , and the number of item pairs being compared concentrates around for Erdős–Rényi random comparison graphs. Note that this sufficient condition for reliable top-$K$ ranking matches the necessary condition in (13). That is, for random comparison graphs that follow the Erdős–Rényi model, we can establish the minimax optimality of Rank Centrality. Precisely,
(16) 
We provide the proof of this theorem in Section 8.
Our main contribution is the establishment of a sufficient condition for top-$K$ identification which matches the necessary condition derived in (Chen and Suh, 2015) for random comparison graphs constructed by the Erdős–Rényi model. It is important to point out two notable distinctions in achievability compared to (Chen and Suh, 2015). First, a spectral method such as Rank Centrality (Negahban et al., 2012) suffices to achieve the order-wise tight sample complexity, without relying on the additional local refinement process employed in (Chen and Suh, 2015). Second, our main results concern a slightly denser regime, indicated by the condition , in which many distinct item pairs are likely to be compared. As shown in (Chen and Suh, 2015), the dense regime condition is not necessary for top-$K$ identification. However, it is not yet clear whether or not the condition is required under our approach, which employs only a spectral method. Our speculation is that the sparse regime condition, indicated by , may not be sufficient for spectral methods to achieve reliable top-$K$ identification (to be discussed in Section 4).
To validate our main result based on the Erdős–Rényi model, we conducted numerical experiments (illustrated in Section 4). In the dense regime indicated by , the experimental results clearly show that Rank Centrality alone (a spectral method) achieves reliable top-$K$ identification, just as Spectral MLE does. In the sparse regime indicated by , however, Rank Centrality fails to do so, which leads us to the aforementioned speculation.
As mentioned earlier, our ranking algorithm is based solely on a spectral method, Rank Centrality (Negahban et al., 2012), which enjoys nearly-linear-time computational complexity. Hence, not only can the information-theoretic limit promised by (16) be achieved by a computationally efficient low-complexity algorithm, but we can also achieve it with much less computational overhead compared to Spectral MLE (Chen and Suh, 2015), which employs an additional refinement stage.
By the hypothesis in Theorem 3, . This means that, unless , the minimax optimal sample complexity we claim to characterize is on the order of , not . We note that we consider a regime where is not of constant order, so it is reasonable to assume , which leads to . Note that since there are items, each with , a typical regime of scales as . Therefore, we conclude that the minimax optimal sample complexity is on the order of .
4 Experimental Results
We conduct a series of synthetic experiments to corroborate our main result in Theorem 3. We consider both dense () and sparse () regimes. To be more precise, we set the constant , and set and , so that each is in its proper range. To specify the implementation parameters, we use , , and . Each result in the numerical simulations is obtained by averaging over 10,000 Monte Carlo trials.
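The Monte Carlo protocol used in these experiments can be sketched generically as follows (the helper `success_rate` and the toy oracle ranker are our own illustrations, not the paper's implementation):

```python
import numpy as np

def success_rate(rank_fn, w, K, trials, seed=0):
    """Monte Carlo estimate of the probability that rank_fn returns
    exactly the true top-K set (the performance metric in (5))."""
    true_top = set(np.argsort(w)[::-1][:K])
    rng = np.random.default_rng(seed)
    hits = sum(set(rank_fn(rng)) == true_top for _ in range(trials))
    return hits / trials

# Example with a trivial "oracle" ranker that always returns {0, 1}.
w = np.array([4.0, 3.0, 2.0, 1.0])
rate = success_rate(lambda rng: [0, 1], w, K=2, trials=100)   # 1.0
```

In the actual experiments, `rank_fn` would draw fresh comparison data each trial and run the ranking algorithm under test (e.g., Rank Centrality or Spectral MLE).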
Figure 2 illustrates the numerical experiments conducted in the dense regime. We see that as increases, i.e., as we obtain pairwise evaluation samples beyond the minimal sample complexity, (1) the estimation error of Rank Centrality decreases and soon meets that of Spectral MLE (left); and (2) the success rate of Rank Centrality increases and soon reaches that of Spectral MLE (right). The curves clearly support our results: in the dense regime specified by , Rank Centrality (a spectral method) alone can achieve reliable top-$K$ ranking.
Figure 3 illustrates the numerical experiments conducted in the sparse regime. We see that, in contrast with the experiments in the dense regime, as increases, (1) the estimation error of Rank Centrality decreases but does not meet that of Spectral MLE (left); and (2) the success rate of Rank Centrality increases but does not reach that of Spectral MLE, which comes close to 1 (right). The curves lead us to speculate that the sparse regime condition specified by may not be sufficient for spectral methods to achieve reliable top-$K$ identification.
5 Conclusion and Future Work
We investigated top-$K$ rank aggregation from pairwise data. We demonstrated that a spectral method alone, which features nearly-linear-time computational complexity, is sufficient to achieve the minimal sample complexity in the dense regime. Some limitations of our results suggest future directions. Exploring whether a spectral method can also achieve reliable top-$K$ identification in (part of) the sparse regime would be the most interesting one. Maystre and Grossglauser (2015) proposed the Iterative Luce Spectral Ranking (ILSR) algorithm, which shares the spirit of spectral ranking, and showed that, surprisingly, the performance of ILSR matches that of MLE for estimating underlying preference scores. Motivated by their results, one can investigate whether ILSR achieves minimax optimality in the sparse regime. Analyzing spectral methods under comparison graphs beyond Erdős–Rényi graphs, as well as under other choice models such as the Plackett-Luce model (Plackett, 1975; Luce, 1959; Hajek et al., 2014; Maystre and Grossglauser, 2015), could be another.
6 Proof of Theorem 1
6.1 Algorithm Description
In an ideal scenario where we obtain an infinite number of samples per pairwise comparison, i.e., $L \to \infty$, the sufficient statistics converge to their expectations. Then, the matrix constructed in Algorithm 1 becomes a matrix whose entries are defined as
(17) 
The entries for observed pairs of items represent the relative likelihood of one item being preferred to the other. Intuitively, random walks over the long run will visit some states (corresponding to items) more often if those states have been preferred to other frequently-visited states or preferred to many other states.
The random walks have properties that lead to a desirable outcome. The walks are reversible, as the detailed balance condition holds, and thus have a stationary distribution equal to the preference score vector $w$ up to constant scaling. Under the assumption that guarantees connectivity, the walks are also irreducible, so the stationary distribution is unique. Finding the stationary distribution of the walks therefore retrieves the precise underlying preference scores.
It is then clear that random walks on the empirical matrix, which can be viewed as a noisy version of the ideal one, give us an approximation to the ground-truth preference scores. The algorithm described above adopts a power method to compute the stationary distribution. Power methods are known to be computationally efficient in obtaining the leading eigenvector of a sparse matrix (Meirovitch, 1997). Initially starting with the uniform distribution over $[n]$, the algorithm iteratively computes the following until convergence:
(18)
where the iterate is a vector that represents the distribution of the random walk at the current iteration. Upon convergence, the algorithm returns the indices of the $K$ largest components of the distribution, which are the top-$K$ ranked items.
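The full pipeline described above, building the random-walk matrix from pairwise win fractions and running the power method, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; in particular, the convention that `y[(i, j)]` stores the fraction of comparisons in which item `j` beat item `i`, and the scaling by the maximum degree, are our assumptions in the spirit of Rank Centrality (Negahban et al., 2012):

```python
import numpy as np

def rank_centrality(n, edges, y, K, iters=500):
    """Sketch of Rank Centrality: build a random-walk matrix from
    win fractions, run the power method, return the top-K items.
    y[(i, j)] = fraction of comparisons in which item j beat item i."""
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    d_max = deg.max()
    P = np.zeros((n, n))
    for i, j in edges:
        P[i, j] = y[(i, j)] / d_max          # step toward the winner
        P[j, i] = (1.0 - y[(i, j)]) / d_max
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # lazy self-loops
    pi = np.full(n, 1.0 / n)                  # uniform initialization
    for _ in range(iters):
        pi = pi @ P                           # power iteration, cf. (18)
    return list(np.argsort(pi)[::-1][:K]), pi

# Demo with ideal statistics on a complete graph over 4 items.
w = [4.0, 3.0, 2.0, 1.0]
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
y = {(i, j): w[j] / (w[i] + w[j]) for i, j in edges}
top2, pi = rank_centrality(4, edges, y, K=2)
# top2 recovers {0, 1}; pi is approximately proportional to w.
```

With the ideal statistics of (17), detailed balance holds exactly, so the stationary distribution is proportional to the true scores and the returned set coincides with the true top-$K$.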
6.2 Proof Outline
To distinguish the top-$K$ items from the rest, the pointwise error of each item becomes a fundamental bottleneck for top-$K$ ranking. It is impossible to separate the $K$-th and $(K+1)$-th ranked items unless their score separation exceeds the aggregate error of the score estimates for the two items. Based on this observation, we focus on bounding the maximal pointwise error.
For the sake of clear demonstration, we use our stronger result, Theorem 3, attained by extending our general results to the Erdős–Rényi model. The explanations carry over to the general model of our interest just as well, illustrating the motivation and the steps needed for the proof. Now, let us see how the following $\ell_\infty$ norm bound on the pointwise error (derived under the Erdős–Rényi random comparison graph model) plays a key role in top-$K$ identification:
(19) 
given and where and are some constants.
We assume for ease of presentation. Suppose , then
(20) 
for all and , indicating that the algorithm will output the top-$K$ items as desired. Hence, as long as holds (coinciding with the claimed bound in (19)), in other words, as long as holds, reliable top-$K$ ranking is guaranteed with the given sample size.
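The separation argument above can be checked directly: if every pointwise estimation error stays strictly below half the score gap at the decision boundary, sorting the estimates recovers the top-$K$ set. A small sketch (illustrative numbers and helper name, ours):

```python
import numpy as np

def topk_from_estimates(w_hat, K):
    """Rank by the estimated scores and return the top-K set."""
    return set(np.argsort(w_hat)[::-1][:K])

w = np.array([4.0, 3.0, 2.0, 1.0])   # true scores, K = 2
gap = w[1] - w[2]                    # separation at the K / K+1 boundary
rng = np.random.default_rng(0)
# Any perturbation strictly below gap / 2 per item preserves the
# relative order of the K-th and (K+1)-th items, hence the top-K set.
noise = rng.uniform(-0.99 * gap / 2, 0.99 * gap / 2, size=4)
assert topk_from_estimates(w + noise, K=2) == {0, 1}
```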
What remains toward proving the theorem is the proof of the following $\ell_\infty$ norm bound on the maximal pointwise error for general comparison graphs (i.e., a generalized version of (19)):
(21) 
given where is some constant.
To prove (21), we first derive an upper bound (proved at the end of this section) on the pointwise error between the score estimate of item at iteration and the true score, which consists of three terms:
(22) 
Then, we use the three lemmas stated below (proved in the following sections). We consider the regime where is sufficiently large. For , applying the three lemmas to (22) and solving the resulting recursion, we get
(23) 
where , , , and is a term that vanishes as tends to infinity. The above bound converges to as tends to infinity. Since it holds for all , we complete the proof of (21).
For a comparison graph ,
(24) 
with probability at least .
Suppose , where . Then,
(25) 
with probability at least .
Suppose . Then, in the regime where is sufficiently large,
(26) 
with probability at least , where , , and are some constants.
We prove (22) here, and the proofs of the three lemmas are to follow.
6.3 Proof of Lemma 6.2
From the definitions of and ,
(30) 
First, let us bound the absolute value of the summations in (30). Under the model of our interest, all pairwise comparison samples are independent across pairs and comparison indices. Applying Hoeffding's inequality, conditional on , we get
(31) 
Then we choose to get the tail probability as follows:
(32) 
Therefore, with probability at least ,
(33) 
Now, let us put (33) into (30) and use . Conditional on , with probability at least ,
(34) 
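The Hoeffding step used in the proof above can be sanity-checked numerically; the sketch below (illustrative parameters, Bernoulli samples, helper name ours) verifies that the empirical tail never exceeds the stated bound:

```python
import numpy as np

def hoeffding_tail_bound(L: int, t: float) -> float:
    """Two-sided Hoeffding bound for the mean of L independent
    [0, 1]-valued samples: P(|mean - p| >= t) <= 2 exp(-2 L t^2)."""
    return 2.0 * np.exp(-2.0 * L * t * t)

rng = np.random.default_rng(0)
L, t, p = 200, 0.1, 0.6
means = rng.binomial(L, p, size=20_000) / L
empirical_tail = np.mean(np.abs(means - p) >= t)
assert empirical_tail <= hoeffding_tail_bound(L, t)
```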
6.4 Proof of Lemma 6.2
6.5 Proof of Lemma 6.2
We define a sequence as follows.
(39) 
From Lemma 6.2, with probability at least ,
(40) 
Putting (22) with (40) into (39), we get
(41) 
We simplify the last two terms. The first of the two is straightforward: the definition of gives . The last term requires extra effort; we defer its proof to a later part of this section, stating the following for now.
(42) 
Putting and (42) into (41), we get
(43) 
From Lemma 6.2, we can find a constant such that for all . Using such , we get
We now use an upper bound that prior work derived on (see Lemma 2 of (Negahban et al., 2012)). When , for some constants and ,
(44) 
We use and for uniformly distributed , to get
(45) 
We let , and . Putting (45) into (6.5) and solving it, we get
(46) 
From the definition of , we complete the proof of Lemma 6.2.
Proof of (42): By changing the order of summation and applying the Cauchy–Schwarz inequality, we get
(47) 
From the definitions of , we can bound the term as follows.