Top-K Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal

03/14/2016 ∙ by Minje Jang, et al. ∙ KAIST Department of Mathematical Sciences, University of Illinois at Urbana-Champaign, ETRI

We explore the top-K rank aggregation problem. Suppose a collection of items is repeatedly compared in pairs, and we aim to recover a consistent ordering that focuses on the top-K ranked items, based on partially revealed preference information. We investigate the Bradley-Terry-Luce model, in which one ranks items according to their perceived utilities, modeled as noisy observations of their underlying true utilities. Our main contributions are two-fold. First, in a general comparison model where the item pairs to compare are given a priori, we derive an upper and a lower bound on the sample size required for reliable recovery of the top-K ranked items. Second, and more importantly, we extend this result to a random comparison model where the item pairs to compare are chosen independently with some probability. There, we show that in slightly restricted regimes the gap between the derived bounds reduces to a constant factor, which reveals that a spectral method can achieve minimax optimality on the (order-wise) sample size required for top-K ranking. In other words, a spectral method alone is sufficient to achieve optimality. It is also advantageous in terms of computational complexity, because it does not require the additional stage of maximum likelihood estimation that a state-of-the-art scheme employs to achieve optimality. We corroborate our main results with numerical experiments.



page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Rank aggregation has been investigated in a variety of contexts, such as social choice (Caplin and Nalebuff, 1991; Azari Soufiani et al., 2014), web search and information retrieval (Dwork et al., 2001), recommendation systems (Baltrunas et al., 2010), and crowdsourcing (Chen et al., 2013), to name a few. The task is to bring a consistent ordering to a collection of items, given only partial preference information.

Due to its broad range of applications, a large body of work on ranking exists. Among the numerous ranking schemes developed in the literature, arguably the most dominant paradigms are spectral ranking algorithms (Brin and Page, 1998; Dwork et al., 2001; Negahban et al., 2012; Seeley, 1949; Wei, 1952; Vigna, 2009) and maximum likelihood estimation (MLE) (Ford, 1957; Hunter, 2004). Postulating the existence of underlying real-valued true preferences of the items, these paradigms aim to produce preference estimates that are consistent in a global sense, usually measured by ℓ2 estimation error, and then order the items accordingly. Although such estimates can be faithful globally with respect to the latent preferences, they are not necessarily guaranteed to yield optimal ranking accuracy. Accurate ranking has more to do with how well the ordering of the estimates matches that of the true preferences, and less to do with how close the estimates are to the true preferences in terms of overall estimation error.

In many realistic applications, however, accurate ranking does not require an ordering that respects all item preferences in a global sense. Instead, it requires an ordering that precisely separates the few items with the highest ranks from the rest. In light of this, recent work (Chen and Suh, 2015) investigated top-K identification, which aims to recover only the correct set of top-K ranked items. That work characterized the minimax limit on the sample size (i.e., the sample complexity) under a long-standing, prominent statistical model, the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959). To achieve this fundamental limit, its proposed scheme, Spectral MLE, combines the two popular paradigms in series so as to yield estimates with low ℓ∞ error, which is shown there to be crucial for identifying the top-ranked items. Spectral MLE first obtains preference estimates via a spectral method, namely Rank Centrality (Negahban et al., 2012), which produces estimates with low squared (ℓ2) loss. It then performs additional point-wise MLEs on these estimates to reduce their ℓ∞ estimation error, which leads to successful top-K ranking.

Analyzing ℓ∞ error bounds is interesting in its own right, as illustrated by (Chen and Suh, 2015), where it led to the characterization of the minimax limit on the sample size for top-K ranking. It is also technically challenging. Even after decades of research on spectral methods and MLE, the two dominant approaches in the literature, notable results on ℓ∞ error bounds are still lacking. Analytical techniques that have proven useful for obtaining tight ℓ2 error bounds do not translate well into meaningful ℓ∞ error bounds. This is where our main contribution lies: tangible progress in the analysis.

Main contributions. In this work, we provide a tight analysis of the ℓ∞ error bounds of a spectral method, making progress toward a richer understanding of rank aggregation. This analysis lets us characterize conditions under which the spectral method achieves minimax optimal performance. We do this by comparing it with a fundamental bound that delineates a performance limit no ranking algorithm can exceed. More concretely, we investigate reliable recovery of the top-K ranked items under the BTL pairwise comparison model, in which one ranks items according to their perceived utilities, modeled as noisy observations of their underlying true utilities. We consider two comparison models. The first is a deterministic model in which the item pairs to compare are given a priori. The second is a random model in which the item pairs to compare are chosen in a random, non-adaptive manner. For the deterministic model, we derive an upper and a lower bound on the sample size for reliable recovery of the top-K ranked items (Theorems 1 and 2). These correspond, respectively, to sufficient and necessary conditions for reliable top-K identification when the spectral method Rank Centrality is employed. Inspecting the gap between the derived bounds allows us to identify conditions under which Rank Centrality can be optimal. Call the number of distinct items that an item is compared to its degree. For well-balanced cases, where the degree varies little between its minimum and maximum, the scaling of the gap can be expressed neatly in terms of degree (details in Section 3 after Theorem 2). In the random model, item pairs are first chosen independently with probability p, and the chosen pairs are then compared repeatedly (hence random and non-adaptive). Since the random model fits the well-balanced case, we extend the aforementioned results and obtain a stronger one.
We demonstrate that the gap shrinks to a constant factor (Theorem 3). This shows that a spectral method alone can achieve the order-wise optimal sample complexity for top-K ranking that was recently characterized under the same model in (Chen and Suh, 2015). Our results differ from those of (Chen and Suh, 2015) in two respects. First, we show that a spectral method achieves the limit in so-called dense regimes, where the number of distinct item pairs compared is relatively large. That is, the regimes in which we achieve the limit are slightly more restricted than those in which Chen and Suh characterized it. Second, we show that applying Rank Centrality alone is sufficient to achieve the limit in these regimes. It is therefore more advantageous in terms of computational complexity than Spectral MLE, which combines a spectral method (specifically Rank Centrality (Negahban et al., 2012)) with an additional stage performing coordinate-wise MLEs.

Related work. Perhaps most relevant are (Chen and Suh, 2015) and (Negahban et al., 2012). To the best of our knowledge, Chen and Suh were the first to study top-K identification under the random comparison model of our interest. A key distinction from our work concerns how the ℓ∞ bounds on the estimation error are obtained. We employ only a spectral method, whereas they incorporated an additional refinement stage that performs successive point-wise MLEs. Negahban et al. developed Rank Centrality, on which our ranking scheme is solely based. Perhaps surprisingly, they proved that Rank Centrality, a simple spectral method, achieves the same performance as MLE in ℓ2 error. A priori, there is no reason to believe that a spectral method can achieve such strong minimax optimal performance. In a similar spirit, we show that this spectral method is also minimax optimal in ℓ∞ error, achieving the same optimality guarantee as the MLE-based algorithm in (Chen and Suh, 2015). The main objective of our work is to identify the regimes where spectral methods are as good as MLE and to prove the minimax optimality of Rank Centrality in those regimes.

Maystre and Grossglauser (2015) recently developed an algorithm in the spirit of spectral ranking, called Iterative Luce Spectral Ranking (I-LSR). They showed that its performance in estimating the underlying preference scores is the same as that of MLE. Rajkumar and Agarwal (2014) put forth statistical assumptions that ensure the convergence of several rank aggregation methods, including Rank Centrality and MLE, to an optimal ranking. They derived sample complexity bounds, although statistical optimality is not rigorously justified and they consider total ordering rather than top-K ranking. In the pairwise preference setting, many works with interests different from ours have appeared. Some studied active ranking, where samples are obtained adaptively. Jamieson and Nowak (2011) considered perfect total ranking and characterized the query complexity gain of adaptive sampling in the noise-free case. The works of (Braverman and Mossel, 2008; Jamieson and Nowak, 2011; Ailon, 2012; Wauthier et al., 2013) explored query complexity in the presence of noise while aiming at approximate total rankings. Eriksson (2013) proposed a scheme that aims to find the top-K queries when observation errors are assumed to be i.i.d. Other works considered models different from the BTL model. The works of (Lu and Boutilier, 2011; Busa-Fekete et al., 2014) considered ranking problems with pairwise comparison data under the Mallows model (Mallows, 1957). Azari Soufiani et al. (2013) broke full rankings into pairwise comparisons for parameter estimation under the Plackett-Luce (PL) model (Plackett and Luce, 1975). Hajek et al. (2014) derived, under the PL model, minimax lower bounds on the parameter estimation error when schemes that break partial rankings into pairwise comparisons are used.

Very recently, Shah and Wainwright (2015) showed that a simple counting method (Borda, 1781) can achieve the fundamental limit on the sample size, up to constant factors, for top-K ranking. Their result holds under a general parametric model in which observations depend only on the predefined probabilities of one item being preferred to another (Shah et al., 2015), which includes the BTL model as a special case. However, they assume that the number of comparisons for each item pair follows a Binomial distribution, which leads to a nearly complete observation model in which almost every item pair is compared at least once. In contrast, we examine a different observation model (described in detail in Section 2), also considered in (Negahban et al., 2012) and (Chen and Suh, 2015). This model captures the comparison graph structure that affects the sample complexity (see Theorem 1 of (Negahban et al., 2012) and the numerical experiments in Section 4).

Notation. Unless specified otherwise, we use $[n]$ to represent $\{1, 2, \ldots, n\}$ and $d_i$ to represent the out-degree of vertex i. We use $\mathcal{G}_{n,p}$ to represent an Erdős-Rényi random graph on n vertices in which each pair of vertices is connected by an edge independently with probability p.

2 Problem Formulation

Comparison model and assumptions. Suppose we perform a few pairwise evaluations on n items. To gain a statistical understanding of the ranking limits, we assume that the pairwise comparison outcomes are generated by the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959). This long-established model has been studied in numerous applications (Agresti, 2014; Hunter, 2004).

  • Preference scores.

    The BTL model postulates the existence of an underlying preference vector $w = (w_1, \ldots, w_n)$, where $w_i$ represents the preference score of item i. The outcome of each pairwise comparison depends solely on the latent scores of the items being compared. Without loss of generality, we assume that

    $$w_1 \ge w_2 \ge \cdots \ge w_n > 0. \qquad (1)$$

    We assume that the range of the scores is fixed irrespective of n. For some positive constants $w_{\min}$ and $w_{\max}$:

    $$w_i \in [w_{\min}, w_{\max}], \quad 1 \le i \le n. \qquad (2)$$

    In fact, the case in which the range grows with n can be translated into the above fixed-range regime by separating out the items with vanishing scores (e.g., via a voting method such as Borda count (Borda, 1781; Ammar and Shah, 2011)).

  • Comparison model. We denote by $G = ([n], E)$ a comparison graph in which items i and j are compared if and only if $(i, j)$ belongs to the edge set E. We consider two kinds of comparison graphs. First, we examine general comparison graphs, which can exhibit any topology described by an edge set E on the vertex set $[n]$. Second, we investigate random comparison graphs constructed by the Erdős-Rényi model, in which each pair of vertices is connected by an edge independently with probability p.

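  • Pairwise comparisons. For each $(i, j) \in E$, we observe L comparisons between items i and j. The outcome of the ℓ-th comparison between them, denoted by $y_{ij}^{(\ell)}$, is generated according to the BTL model:

    $$y_{ij}^{(\ell)} = \begin{cases} 1, & \text{with probability } \frac{w_i}{w_i + w_j}, \\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$

    where $y_{ij}^{(\ell)} = 1$ indicates that item i is preferred over item j. We adopt the convention $y_{ji}^{(\ell)} = 1 - y_{ij}^{(\ell)}$. We assume that, conditional on G, the $y_{ij}^{(\ell)}$'s are jointly independent over all $(i, j)$ and ℓ. For ease of presentation, we represent the collection of sufficient statistics as

    $$y_{ij} := \frac{1}{L} \sum_{\ell=1}^{L} y_{ij}^{(\ell)}, \qquad y := \{ y_{ij} : (i, j) \in E \}. \qquad (4)$$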
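This observation model can be simulated directly. The sketch below is a minimal, standard-library-only illustration. It assumes the convention that $y_{ij}^{(\ell)} = 1$ means item i wins, and it stores $y_{ij}$ for both orders of every compared pair. The helper names `erdos_renyi_edges` and `sample_btl` are hypothetical and not part of the paper. Items are indexed from 0 in the code and from 1 in the text.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, rng):
    """Edges (i, j), i < j, of an Erdos-Renyi graph G(n, p) on vertices 0..n-1."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def sample_btl(w, edges, L, rng):
    """Draw L BTL comparisons per edge and return the sufficient statistics.

    y[(i, j)] is the fraction of the L comparisons won by item i; the
    convention y[(j, i)] = 1 - y[(i, j)] is stored as well.
    """
    y = {}
    for i, j in edges:
        p_i_wins = w[i] / (w[i] + w[j])  # BTL probability that i beats j
        wins = sum(rng.random() < p_i_wins for _ in range(L))
        y[(i, j)] = wins / L
        y[(j, i)] = 1.0 - y[(i, j)]
    return y


rng = random.Random(0)
w = [1.0, 0.8, 0.6, 0.4]                 # illustrative scores, already sorted
edges = erdos_renyi_edges(4, 1.0, rng)   # p = 1: every pair is compared
y = sample_btl(w, edges, 20000, rng)
print(y[(0, 3)])  # random, but should lie close to w_0 / (w_0 + w_3) = 1 / 1.4
```

With many samples per pair, each $y_{ij}$ concentrates around its BTL probability. The code uses independent Bernoulli draws rather than a binomial sampler so that it stays close to the per-comparison definition in (3).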
Performance metric and goal. Given the pairwise comparisons, we wish to know whether the top-K ranked items are identifiable. To this end, we consider the probability of error in identifying the correct set of top-K ranked items:

$$P_e(\psi) := \mathbb{P}\{\psi(y) \neq [K]\}, \qquad (5)$$

where ψ is any ranking scheme that returns a set of K indices and $[K]$ is the set of the first K indices. Our goal in this work is to characterize the admissible region in which top-K ranking is feasible for a given BTL parameter w, in other words, the region in which $P_e(\psi)$ can be made vanishingly small as n grows. The admissible region is defined as


For a comparison graph , we are interested in the sample complexity defined as,


where . Note that defining the sample complexity as in (7) means that we investigate minimax scenarios, in which nature may behave adversarially by choosing the worst-case preference scores.

3 Main Results

The most crucial part of top-K ranking is separating the two items near the decision boundary, i.e., the K-th and (K+1)-th ranked items. Unless the gap between their scores is large enough, noise in the observations can lead to erroneous estimates. In view of this, we define a separation measure as


This measure turns out to play a key role in determining the fundamental limits of top-K identification.
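The separation measure can be computed directly from a score vector. The sketch assumes the normalized-gap form $\Delta_K = (w_K - w_{K+1})/w_{\max}$. This form is an assumption made for the sketch, since the defining display above did not survive. `separation` is a hypothetical helper.

```python
def separation(w, K):
    """Normalized gap Delta_K = (w_(K) - w_(K+1)) / w_max between the K-th and
    (K+1)-th largest scores (assumed definition; see the lead-in)."""
    s = sorted(w, reverse=True)
    if not 1 <= K < len(s):
        raise ValueError("need 1 <= K < n")
    return (s[K - 1] - s[K]) / s[0]


# Scores 1.0, 0.8, 0.6, 0.4 with K = 2: the boundary pair is (0.8, 0.6).
print(separation([1.0, 0.8, 0.6, 0.4], 2))  # approximately 0.2, up to rounding
```

Normalizing by the largest score makes the measure invariant to rescaling w. This matters because the BTL probabilities depend only on score ratios.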

As noted in (Ford, 1957), if the comparison graph is not connected, it is impossible to determine the relative preferences between two disconnected components. Hence, we assume that all comparison graphs considered in this paper are connected. For the Erdős-Rényi model, we make the following assumption to ensure connectivity:


Our main findings are sufficient and necessary conditions for reliable top-K identification on a general comparison graph. In particular, for a random comparison graph generated by the Erdős-Rényi model, we attain an order-wise tight sufficient condition for feasible top-K ranking. We first state our results for general comparison graphs.

Theorem 1. Given a comparison graph G and , if


then Rank Centrality correctly identifies the top-K ranked items with probability at least , where , and  are some numerical constants,  is the Laplacian matrix of graph G whose entries are defined as , and . Here  is the spectral gap of matrix , defined as the difference between the two largest absolute eigenvalues of . Also, $d_{\max}$ is the maximum out-degree of the vertices in G and $d_{\min}$ is the minimum. In terms of the sample complexity defined in (7), this theorem establishes a sufficient condition for reliable top-K ranking. Precisely,


We provide the proof of this theorem in Section 6.
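Theorem 1 depends on the graph through its degrees and a spectral gap. The sketch below computes a spectral gap by power iteration with deflation. It assumes the relevant matrix is the degree-normalized adjacency $S = D^{-1/2} A D^{-1/2}$, which has the same eigenvalues as the random-walk matrix $D^{-1} A$. The statement above does not pin down the matrix, so treat this choice as an assumption. The function name `spectral_gap` is hypothetical.

```python
import math
import random


def spectral_gap(n, edges, iters=500, seed=0):
    """Spectral gap 1 - max(|lambda_2|, |lambda_n|) of S = D^{-1/2} A D^{-1/2}
    for a simple undirected graph without isolated vertices."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deg = [len(a) for a in nbrs]
    if min(deg) == 0:
        raise ValueError("graph has an isolated vertex")
    # Eigenvalue 1 of S has eigenvector proportional to sqrt(degree).
    v1 = [math.sqrt(d) for d in deg]
    norm_v1 = math.sqrt(sum(t * t for t in v1))
    v1 = [t / norm_v1 for t in v1]

    def deflate(x):
        # Remove the component along the top eigenvector v1.
        c = sum(a * b for a, b in zip(x, v1))
        return [a - c * b for a, b in zip(x, v1)]

    def apply_s(x):
        return [sum(x[j] / math.sqrt(deg[i] * deg[j]) for j in nbrs[i])
                for i in range(n)]

    rng = random.Random(seed)
    x = deflate([rng.uniform(-1.0, 1.0) for _ in range(n)])
    lam = 0.0
    for _ in range(iters):
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            lam = 0.0
            break
        x = deflate(apply_s([t / norm_x for t in x]))
        # ||S u|| for a unit u orthogonal to v1 tends to max(|lambda_2|, |lambda_n|).
        lam = math.sqrt(sum(t * t for t in x))
    return 1.0 - lam


complete5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(spectral_gap(5, complete5))  # about 0.75
```

For the complete graph on five vertices, every nonunit eigenvalue of S equals -1/4, so the gap is 0.75. For a bipartite graph such as a 4-cycle, the eigenvalue -1 drives the gap to 0. The code tracks the norm ratio rather than a Rayleigh quotient so that it also handles eigenvalue pairs of equal magnitude and opposite sign.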

Next, we state a necessary condition for reliable top-K ranking.

Theorem 2. Fix . Given a comparison graph G, if


for some numerical constant , then for any ranking scheme there exists a preference score vector with separation  such that . This result implies that we need at least  for reliable top-K ranking. Expressed in terms of the sample complexity, the result of Theorem 2 reads


The proof generalizes that of Theorem 2 in (Chen and Suh, 2015). We provide it in Section 7.

For well-balanced cases where , one can verify that  is on the order of  and  is on the order of . Taken together, these imply that the gap between the necessary and sufficient conditions is a factor of . We note that for well-balanced graphs, when  is at least , the gap disappears; that is, Rank Centrality is optimal. We make this point precise in the following theorem, where we analyze comparisons over Erdős-Rényi graphs.

Theorem 3. Suppose . There exist positive numerical constants , and  such that if , , and


then Rank Centrality correctly identifies the top-K ranked items with probability at least , where . This result offers a much tighter bound than Theorem 1. In terms of the sample complexity, a sufficient condition on a random comparison graph is


This follows because the sufficient condition for reliable ranking on a random comparison graph given by (14) is , and because the number of item pairs being compared concentrates around $p\binom{n}{2}$ for Erdős-Rényi random comparison graphs. This sufficient condition for reliable top-K ranking matches the necessary condition in (13). That is, for random comparison graphs that follow the Erdős-Rényi model, we establish the minimax optimality of Rank Centrality. Precisely,


We provide the proof of this theorem in Section 8.
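Two facts about Erdős-Rényi comparison graphs are used above: the number of compared pairs concentrates around $p\,n(n-1)/2$, and the graph is well balanced, meaning its degrees stay within a constant factor of each other. The sketch below checks both empirically. The sizes and the helper name `er_graph_stats` are illustrative assumptions.

```python
import random
from itertools import combinations


def er_graph_stats(n, p, seed=0):
    """Sample G(n, p) and return (edges, expected edges, d_min, d_max)."""
    rng = random.Random(seed)
    deg = [0] * n
    m = 0
    for i, j in combinations(range(n), 2):
        if rng.random() < p:  # each pair is compared independently w.p. p
            m += 1
            deg[i] += 1
            deg[j] += 1
    return m, p * n * (n - 1) / 2, min(deg), max(deg)


m, expected, d_min, d_max = er_graph_stats(400, 0.1)
print(m, expected, d_min, d_max)  # m near 7980; degrees spread around 39.9
```

In this dense setting the edge count deviates from its mean by only about one percent, and $d_{\max}/d_{\min}$ stays bounded. This is the well-balanced behavior that lets the gap between Theorems 1 and 2 close.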

Figure 1: Spectral MLE, which combines Rank Centrality with an additional refinement stage that performs coordinate-wise MLEs, achieves reliable top-K ranking in the entire admissible region depicted above. Our analysis reveals that, in the dense regime, Rank Centrality alone suffices to achieve it.

Our main contribution is a sufficient condition for top-K identification that matches the necessary condition derived in (Chen and Suh, 2015) for random comparison graphs constructed by the Erdős-Rényi model. Compared with (Chen and Suh, 2015), our achievability result differs in two notable ways. First, a spectral method such as Rank Centrality (Negahban et al., 2012) suffices to achieve the order-wise tight sample complexity, without the additional local refinement employed in (Chen and Suh, 2015). Second, our main results concern a slightly denser regime, indicated by the condition , in which many distinct item pairs are likely to be compared. As shown in (Chen and Suh, 2015), the dense regime condition is not necessary for top-K identification. However, it is not yet clear whether the condition is required under our approach, which employs only a spectral method. We speculate that the sparse regime condition, indicated by , may not be sufficient for spectral methods to achieve reliable top-K identification (discussed in Section 4).

To validate our main result for the Erdős-Rényi model, we conducted numerical experiments (presented in Section 4). In the dense regime indicated by , the experimental results clearly show that Rank Centrality alone (a spectral method) achieves reliable top-K identification, as Spectral MLE does. In the sparse regime indicated by , however, Rank Centrality fails to do so, which leads to the speculation stated above.

As mentioned earlier, our ranking algorithm is based solely on a spectral method, Rank Centrality (Negahban et al., 2012), which has nearly-linear time computational complexity. Hence, the information-theoretic limit promised by (16) can be achieved by a computationally efficient, low-complexity algorithm. This algorithm also incurs much less computational overhead than Spectral MLE (Chen and Suh, 2015), which employs an additional refinement stage.

By the hypothesis in Theorem 3, . This means that, unless , the minimax optimal sample complexity we characterize is on the order of , not . Since we consider a regime where  is not of constant order, it is reasonable to assume , which leads to . Moreover, since there are n items, each with , a typical regime of  scales as . Therefore, we conclude that the minimax optimal sample complexity is on the order of .

4 Experimental Results

We conduct a series of synthetic experiments to corroborate our main result in Theorem 3. We consider both the dense () and sparse () regimes. More precisely, we fix the constant  and set  and  so that each  lies in its proper range. For the implementation parameters, we use , , and . Each result is obtained by averaging over 10000 Monte Carlo trials.

Figure 2: Dense regime (): empirical estimation error vs.  (left); empirical success rate vs.  (right).
Figure 3: Sparse regime (): empirical estimation error vs.  (left); empirical success rate vs.  (right).

Figure 2 shows the numerical experiments conducted in the dense regime. As  increases, i.e., as we obtain more pairwise evaluation samples beyond the minimal sample complexity, two things happen. First, the estimation error of Rank Centrality decreases and soon matches that of Spectral MLE (left). Second, the success rate of Rank Centrality increases and soon reaches 1, as does that of Spectral MLE (right). The curves clearly support our results: in the dense regime specified by , Rank Centrality (a spectral method) alone can achieve reliable top-K ranking.

Figure 3 shows the numerical experiments conducted in the sparse regime. In contrast with the dense regime, as  increases, the estimation error of Rank Centrality decreases but does not meet that of Spectral MLE (left). Likewise, the success rate of Rank Centrality increases but does not reach that of Spectral MLE, which comes close to 1 (right). The curves lead us to speculate that the sparse regime condition specified by  may not be sufficient for spectral methods to achieve reliable top-K identification.
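A compact sketch of this kind of Monte Carlo experiment is given below. It measures the empirical success rate of Rank Centrality on Erdős-Rényi comparison graphs under BTL sampling. The score vector (evenly spaced in [0.5, 1]), the trial counts, the iteration budget, and the function names are hypothetical choices. They are not the settings behind Figures 2 and 3, and Spectral MLE is not implemented here.

```python
import random
from itertools import combinations


def rank_centrality_topk(n, y, K, iters=1000):
    """Top-K set from Rank Centrality run by plain power iteration.

    y[(i, j)] is the fraction of comparisons between i and j won by i, stored
    for both orders of every compared pair.
    """
    nbrs = [[] for _ in range(n)]
    for i, j in y:
        nbrs[i].append(j)
    d_max = max(len(a) for a in nbrs)
    pi = [1.0 / n] * n  # uniform start
    for _ in range(iters):
        new = [0.0] * n
        for i in range(n):
            stay = 1.0
            for j in nbrs[i]:
                move = y[(j, i)] / d_max  # walk from i towards j if j tends to beat i
                new[j] += pi[i] * move
                stay -= move
            new[i] += pi[i] * stay
        pi = new
    return set(sorted(range(n), key=lambda i: -pi[i])[:K])


def success_rate(n, p, L, K, trials, seed=0):
    """Fraction of trials in which Rank Centrality returns exactly the true top-K."""
    rng = random.Random(seed)
    w = [1.0 - 0.5 * i / (n - 1) for i in range(n)]  # hypothetical scores in [0.5, 1]
    hits = 0
    for _ in range(trials):
        edges = [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]
        y = {}
        for i, j in edges:
            wins = sum(rng.random() < w[i] / (w[i] + w[j]) for _ in range(L))
            y[(i, j)], y[(j, i)] = wins / L, 1.0 - wins / L
        hits += rank_centrality_topk(n, y, K) == set(range(K))
    return hits / trials


for L in (5, 50, 500):
    print(L, success_rate(n=8, p=1.0, L=L, K=3, trials=10))
```

The sketch only illustrates the measurement procedure. Reproducing the dense-versus-sparse contrast would require the paper's parameter settings and much larger n.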

5 Conclusion and Future Work

We investigated top-K rank aggregation from pairwise data. We demonstrated that a spectral method alone, which has nearly-linear time computational complexity, is sufficient to achieve the minimal sample complexity in the dense regime. Some limitations of our results suggest future directions. The most interesting one is to explore whether a spectral method can also achieve reliable top-K identification in (part of) the sparse regime. Maystre and Grossglauser (2015) proposed an Iterative Luce Spectral Ranking (I-LSR) algorithm in the spirit of spectral ranking, and showed that, surprisingly, the performance of I-LSR is the same as that of MLE for the underlying preference scores. Motivated by their results, one could investigate whether I-LSR achieves minimax optimality in the sparse regime. Another direction is analyzing spectral methods on comparison graphs beyond Erdős-Rényi graphs, as well as under other choice models such as the Plackett-Luce model (Plackett and Luce, 1975; Hajek et al., 2014; Maystre and Grossglauser, 2015).

6 Proof of Theorem 1

6.1 Algorithm Description

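  Input: The collection of sufficient statistics $y$.
  Compute the transition matrix $\hat P$, where $d_{\max}$ is the maximum out-degree of the vertices in G:
    $$\hat P_{ij} = \begin{cases} \frac{1}{d_{\max}}\, y_{ji}, & (i,j) \in E, \\ 1 - \frac{1}{d_{\max}} \sum_{k:(i,k) \in E} y_{ki}, & i = j, \\ 0, & \text{otherwise}. \end{cases}$$
  Output the stationary distribution of matrix $\hat P$.
Algorithm 1 Rank Centrality (Negahban et al., 2012)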
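The transition matrix of Algorithm 1 can be assembled as follows. The sketch assumes that `y[(i, j)]` stores the fraction of comparisons between i and j won by i, for both orders of each compared pair. The dictionary layout and the function name `transition_matrix` are our own choices for illustration.

```python
def transition_matrix(n, y):
    """Rank Centrality transition matrix (Algorithm 1) as a list of rows.

    y[(i, j)] is the fraction of comparisons between i and j won by i, stored
    for both orders of every compared pair. Off-diagonal entries are
    P[i][j] = y[(j, i)] / d_max; the diagonal absorbs the remaining mass.
    """
    deg = [0] * n
    for i, _ in y:
        deg[i] += 1  # each compared pair appears once with i first
    d_max = max(deg)
    P = [[0.0] * n for _ in range(n)]
    for i, j in y:
        P[i][j] = y[(j, i)] / d_max
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P


w = [2.0, 1.0, 1.0]
y = {(i, j): w[i] / (w[i] + w[j]) for i in range(3) for j in range(3) if i != j}
for row in transition_matrix(3, y):
    print([round(v, 3) for v in row])
# [0.667, 0.167, 0.167]
# [0.333, 0.417, 0.25]
# [0.333, 0.25, 0.417]
```

Dividing by $d_{\max}$, rather than by each vertex's own degree, keeps every row's off-diagonal mass at most one. It is also what makes the ideal chain reversible with respect to w.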

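In an ideal scenario where we obtain an infinite number of samples per pairwise comparison, i.e., $L \to \infty$, the sufficient statistics converge: $y_{ij} \to \frac{w_i}{w_i + w_j}$. The matrix $\hat P$ constructed in Algorithm 1 then becomes a matrix $P$ whose entries are defined as

$$P_{ij} = \begin{cases} \frac{1}{d_{\max}} \frac{w_j}{w_i + w_j}, & (i,j) \in E, \\ 1 - \frac{1}{d_{\max}} \sum_{k:(i,k) \in E} \frac{w_k}{w_i + w_k}, & i = j, \\ 0, & \text{otherwise}. \end{cases} \qquad (17)$$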
The entries for observed pairs of items, $(i, j) \in E$, represent the relative likelihood of item j being preferred to item i. Intuitively, in the long run a random walk on this matrix visits some states (items) more often than others. These are the items preferred to other frequently visited states, or preferred to many other states.

These random walks have properties that lead to a desirable outcome. First, the walks are reversible, since $w_i P_{ij} = w_j P_{ji}$ holds; hence they have a stationary distribution equal to the preference score vector w, up to constant scaling. Second, under the assumption that guarantees connectivity, the walks are irreducible, so the stationary distribution is unique. Therefore, finding the stationary distribution of the random walk on the ideal matrix amounts to retrieving the exact underlying preference scores.
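The reversibility argument can be checked numerically on the ideal matrix. In this minimal sketch, the helper names and the 4-cycle example are hypothetical. The code builds P from known scores and measures the detailed-balance and stationarity residuals for $\pi = w / \sum_k w_k$. Both residuals should vanish up to floating-point error.

```python
def ideal_transition(w, edges):
    """Ideal (L -> infinity) Rank Centrality matrix:
    P[i][j] = w[j] / (w[i] + w[j]) / d_max on edges; the self-loop takes the rest."""
    n = len(w)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    d_max = max(len(a) for a in nbrs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            P[i][j] = w[j] / (w[i] + w[j]) / d_max
        P[i][i] = 1.0 - sum(P[i][j] for j in nbrs[i])
    return P


def detailed_balance_violation(pi, P):
    """max over (i, j) of |pi_i P_ij - pi_j P_ji|; zero for a reversible chain."""
    n = len(pi)
    return max(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) for i in range(n) for j in range(n))


def stationarity_violation(pi, P):
    """max over j of |(pi^T P)_j - pi_j|; zero iff pi is stationary."""
    n = len(pi)
    return max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) for j in range(n))


w = [4.0, 3.0, 2.0, 1.0]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
P = ideal_transition(w, cycle)
pi = [x / sum(w) for x in w]
print(detailed_balance_violation(pi, P) < 1e-12, stationarity_violation(pi, P) < 1e-12)  # True True
```

Detailed balance holds term by term because $\pi_i P_{ij} = w_i w_j / ((w_i + w_j) d_{\max} \sum_k w_k)$ is symmetric in i and j. Stationarity then follows by summing over i.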

Random walks on $\hat P$, which can be viewed as a noisy version of $P$, therefore yield an approximation of the ground-truth preference scores. The algorithm described above adopts the power method to compute the stationary distribution. Power methods are known to be computationally efficient for obtaining the leading eigenvector of a sparse matrix (Meirovitch, 1997). Starting from the uniform distribution over the n items, the algorithm iteratively computes the following until convergence:

$$q_{t+1}^{\top} = q_t^{\top} \hat P, \qquad (18)$$

where $q_t$ represents the distribution of the random walk at iteration t. When convergence is reached, the algorithm returns the indices of the K largest components of the distribution as the top-K ranked items.
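The power iteration and the final top-K selection can be sketched as follows. The stopping rule, an ℓ1 change below a tolerance, and the helper names are assumptions made for illustration. Convergence also presumes that the chain is irreducible and aperiodic.

```python
def stationary_power(P, tol=1e-12, max_iter=100000):
    """Power method q_{t+1}^T = q_t^T P from the uniform distribution.

    Stops when successive iterates differ by less than tol in l1 norm;
    convergence assumes the chain is irreducible and aperiodic.
    """
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        q_next = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(q_next, q)) < tol:
            return q_next
        q = q_next
    return q


def top_k(q, K):
    """Indices of the K largest entries of q, returned in increasing order."""
    return sorted(sorted(range(len(q)), key=lambda i: q[i], reverse=True)[:K])


q = stationary_power([[0.5, 0.5], [0.25, 0.75]])
print([round(v, 6) for v in q])  # [0.333333, 0.666667]
print(top_k(q, 1))               # [1]
```

Each iteration is a vector-matrix product. With a sparse (adjacency-list) representation of $\hat P$, its cost is proportional to the number of compared pairs, which is the source of the nearly-linear running time noted earlier.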

6.2 Proof Outline

To distinguish the top-K items from the rest, the pointwise error of each item's score estimate is the fundamental bottleneck. The K-th and (K+1)-th ranked items cannot be separated unless their score separation exceeds the aggregate error of the two items' score estimates. Based on this observation, we focus on bounding the maximal pointwise error, i.e., the ℓ∞ error of the score estimates.

For clarity of exposition, we use our stronger result, Theorem 3, which extends Theorems 1 and 2 to the Erdős-Rényi model. The explanation carries over equally well to the general model of interest, illustrating the motivation and the steps needed to prove Theorem 1. We now show how the following ℓ∞ norm bound on the pointwise error, derived under the Erdős-Rényi random comparison graph model, plays a key role in top-K identification:


given and where and are some constants.

We assume for ease of presentation. Suppose , then


for every top-K item i and every remaining item j. This means the algorithm outputs the top-K items as desired. Hence, as long as  holds (coinciding with the claimed bound in (19)), or equivalently as long as  holds, reliable top-K ranking is guaranteed with sample size .
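The separation step can be written out explicitly. The following are assumptions made for this sketch: the normalization $w_{\max} = 1$, the normalized-gap form $\Delta_K = (w_K - w_{K+1})/w_{\max}$, and estimates $\hat w$ on the same scale as $w$. Suppose $|\hat w_i - w_i| < \Delta_K / 2$ for every item. Then for any top-K item i and any remaining item j, the ordering of the scores gives $w_i \ge w_K$ and $w_j \le w_{K+1}$, so

```latex
\begin{aligned}
\hat w_i - \hat w_j
  &= (w_i - w_j) + (\hat w_i - w_i) - (\hat w_j - w_j) \\
  &\ge (w_K - w_{K+1}) - |\hat w_i - w_i| - |\hat w_j - w_j| \\
  &> \Delta_K - \tfrac{\Delta_K}{2} - \tfrac{\Delta_K}{2} = 0 .
\end{aligned}
```

Every top-K item therefore outranks every other item, and returning the K largest components recovers the top-K set.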

To complete the proof of Theorem 1, it remains to prove the following ℓ∞ norm bound on the maximal pointwise error in general comparison graphs, which generalizes (19):


given where is some constant.

To prove (21), we first derive an upper bound on the pointwise error between the score estimate of item i at iteration t and the true score (proved at the end of this subsection). The bound consists of three terms:


We then use the three lemmas stated below, which are proved in the following subsections. We consider the regime where t is sufficiently large. For , applying the three lemmas to (22) and solving the resulting recursion, we get


where , , , and  is a term that vanishes as t tends to infinity. The above bound therefore converges to  as t tends to infinity. Since this holds for all i, the proof of (21) is complete.
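Solving the recursion from (22) follows the usual contraction pattern. The sketch below uses hypothetical coefficients $\alpha \in (0, 1)$ and $\beta \ge 0$ in place of the lemma-dependent quantities, whose exact forms come from the lemmas. A bound $e_{t+1} \le \alpha e_t + \beta$ unrolls to $\alpha^t e_0 + \beta (1 - \alpha^t)/(1 - \alpha)$. The first part vanishes as $t \to \infty$, and the bound converges to $\beta / (1 - \alpha)$.

```python
def unroll(e0, alpha, beta, t):
    """Iterate e_{s+1} = alpha * e_s + beta for t steps (the recursion at equality)."""
    e = e0
    for _ in range(t):
        e = alpha * e + beta
    return e


def closed_form(e0, alpha, beta, t):
    """alpha^t * e0 + beta * (1 - alpha^t) / (1 - alpha), valid for alpha != 1."""
    return alpha ** t * e0 + beta * (1.0 - alpha ** t) / (1.0 - alpha)


for t in (1, 5, 50):
    print(t, unroll(1.0, 0.5, 0.1, t), closed_form(1.0, 0.5, 0.1, t))
```

The limit $\beta/(1-\alpha)$ plays the role of the final ℓ∞ bound. The $\alpha^t e_0$ part is the term that vanishes as the number of power iterations grows.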

Lemma 1. For a comparison graph G,


with probability at least .

Lemma 2. Suppose , where . Then,


with probability at least .

Lemma 3. Suppose . Then, in the regime where t is sufficiently large,


with probability at least , where , , and are some constants.

We prove (22) here; the proofs of the three lemmas follow.

Proof of (22): For fixed , applying , we get


Using the fact that random walks on the ideal version of the transition matrix (the limit as L → ∞) are reversible, we get


Using (27) and (28), we get


We note that from . Similarly, . Thus, . Applying this equality and the triangle inequality to (29), we get the recursive relation (22).

6.3 Proof of Lemma 1

From the definitions of and ,


First, we bound the absolute value of the summations in (30). Under our model, all pairwise comparison samples $y_{ij}^{(\ell)}$ are independent over pairs $(i, j)$ and ℓ. Applying the Hoeffding inequality, conditional on G, we get


Choosing , we obtain the following tail probability:


Therefore, with probability at least ,


Substituting (33) into (30) and using , we obtain that, conditional on G, with probability at least ,


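The Hoeffding step can be sanity-checked numerically. The sketch assumes that the L samples for a pair are i.i.d. Bernoulli given G, as in the model. It compares the two-sided bound $2\exp(-2Lt^2)$ with a Monte Carlo estimate, and it shows that a choice of the form $t = \sqrt{c \log n / L}$ turns the bound into $2n^{-2c}$. Here c is a hypothetical constant, and the function names are illustrative.

```python
import math
import random


def hoeffding_bound(L, t):
    """Two-sided Hoeffding bound on P(|mean of L i.i.d. [0,1] samples - mu| >= t)."""
    return 2.0 * math.exp(-2.0 * L * t * t)


def empirical_tail(mu, L, t, trials, seed=0):
    """Monte Carlo estimate of P(|mean of L Bernoulli(mu) samples - mu| >= t)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < mu for _ in range(L)) / L
        bad += abs(mean - mu) >= t
    return bad / trials


# Choosing t = sqrt(c log n / L) turns the bound into 2 n^(-2c).
n, c, L = 100, 1.0, 200
t = math.sqrt(c * math.log(n) / L)
print(hoeffding_bound(L, t), 2 * n ** (-2 * c))
print(empirical_tail(0.5, 50, 0.2, 4000), hoeffding_bound(50, 0.2))
```

The empirical tail is typically far below the bound. The point of the analysis is not tightness at a single pair but a polynomially small failure probability that survives a union bound over all pairs.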
6.4 Proof of Lemma 2

Using the Hoeffding inequality as in the proof of Lemma 1, one can verify that, with probability at least ,


Using (35), we get


We let . From the definition of ,


Putting (37) into (36), we get


Choosing  completes the proof of Lemma 2.

6.5 Proof of Lemma 3

We define a sequence as follows.


From Lemma 1, with probability at least ,


Substituting (22), together with (40), into (39), we get


We now simplify the last two terms. The first is straightforward: the definition of  gives . The last term requires extra effort, so we defer its proof to the end of this subsection and state the result for now.


Putting and (42) into (41), we get


From Lemma 2, we can find a constant  such that  for all . Using this constant, we get

We now use an upper bound on  derived in prior work (see Lemma 2 of (Negahban et al., 2012)). When , for some constants  and ,


We use and for uniformly distributed , to get


We let , and . Putting (45) into (6.5) and solving it, we get


The definition of  then completes the proof of Lemma 3.
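The power-method bound used above contains a term that decays geometrically in t. This reflects convergence at a rate set by the second-largest absolute eigenvalue of the transition matrix, i.e., one minus its spectral gap. The minimal sketch below uses a two-state chain, where that rate is exactly $|1 - a - b|$, and shows the per-step error ratio. The chain and the helper name are illustrative only and are not taken from Negahban et al. (2012).

```python
import math


def power_errors(P, pi, q0, steps):
    """Euclidean distance ||q_t - pi|| after each power-method step q_{t+1}^T = q_t^T P."""
    n = len(q0)
    q = list(q0)
    errs = []
    for _ in range(steps):
        q = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
        errs.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(q, pi))))
    return errs


# Two-state chain with a = 0.3, b = 0.1: stationary pi = (0.25, 0.75), lambda_2 = 0.6.
P = [[0.7, 0.3], [0.1, 0.9]]
errs = power_errors(P, [0.25, 0.75], [0.5, 0.5], 12)
print([round(errs[k + 1] / errs[k], 6) for k in range(3)])  # [0.6, 0.6, 0.6]
```

Here $q_0 - \pi$ is a left eigenvector of P for the eigenvalue 0.6, so the error shrinks by exactly that factor per step. In general, the slowest mode dominates after a few iterations, which is why a larger spectral gap means faster convergence.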

Proof of (42): Changing the order of summation and applying the Cauchy-Schwarz inequality, we get


From the definitions of , we can bound the term as follows.