1 Introduction
This paper studies a stylized, yet natural, learning-to-rank problem and points out the critical incorrectness of a widely used nearest neighbor algorithm. In this problem, let be the set of agents (users) and be the set of alternatives. Each agent or alternative is associated with a latent feature vector , where . The utility of to is determined by , where is a bivariate function. We observe a (partial) ranking of agent (for all ) over the alternatives. The distribution of ranking is determined by the alternatives’ utilities, . When is larger, is more likely to rank higher in .
The nearest neighbor problem. For an and a parameter , we aim to design an efficient algorithm that finds (almost) all ’s such that . The nearest neighbor problem for alternatives can also be defined similarly.
This fundamental machine learning problem is embedded in many critical operations. For example, recommender systems use partial ranking information (partial observations of
’s) to estimate agents’ preferences over unranked alternatives, product designers estimate the demand curve of a new product based on consumers’ past choices
(Berry et al., 1995), security firms estimate terrorists’ preferences based on their past behavior, and political firms estimate campaign options based on voters’ preferences Liu (2009).

A widely used algorithm produces incorrect results. The most widely studied and deployed algorithm Liu (2009); Katz-Samuels and Scott (2017) uses Kendall-tau (KT) distance (see Section 2) as the metric and uses k-nearest neighbors (kNN) to identify similar agents: for any given , it finds all such that the KT distances between and are minimized. We will refer to this algorithm as KT-kNN.
In this paper, we show that under a natural and widely applied preference model, the KT-distance-based kNN for agents is provably incorrect even when the sample size grows to infinity.
Novel (correct) algorithms. First, we design a new algorithm that correctly identifies similar agents based on . We introduce a set of new features, denoted by , so that enables us to identify similar agents. A salient property of is that it relies on the rankings of other agents, which we will refer to as “global information”. This property is in sharp contrast to most existing practices of feature engineering in learning-to-rank algorithms Liu (2009).
Second, we design another new algorithm for identifying similar alternatives. We find that construction of alternative features can be done using local information, making identifying similar alternatives significantly easier.
Agent-wise or alternative-wise similarities. Finding similar alternatives (items) is easier than finding similar agents (users) in collaborative filtering Sarwar et al. (2001): in practice, recommender systems based on “item-similarities” are usually more effective. One explanation is the “missing data problem”. Because there are often more users than items, the intersection between the items ranked by two arbitrary users is often small, so measuring user similarities is usually unreliable.
Our result provides a new explanation of the performance discrepancies: under the Plackett-Luce model, finding similar agents is fundamentally more difficult than finding similar alternatives.
Additional remarks
Finding neighbors implies learning to rank. We focus on the problem of identifying nearest neighbors in this work. Our approach can be naturally extended to infer ’s preferences over unranked alternatives by aggregating rankings from the neighbors using methods developed in the literature, such as in Conitzer et al. (2006); Alon (2006); Ailon (2007); Kenyon-Mathieu and Schudy (2007).
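As a concrete (if simplistic) illustration of this extension, the neighbors' rankings can be aggregated with a plain Borda count. The cited works use more sophisticated Kemeny-style aggregation, so this sketch is only a stand-in:

```python
from collections import defaultdict

def borda_aggregate(neighbor_rankings):
    """Aggregate neighbors' rankings into one predicted ranking.

    Each ranking is a list of alternative ids, most-preferred first.
    An alternative at position p in a ranking of length m earns
    m - 1 - p points; alternatives are returned by total points.
    """
    score = defaultdict(float)
    for ranking in neighbor_rankings:
        m = len(ranking)
        for pos, alt in enumerate(ranking):
            score[alt] += m - 1 - pos
    # Sort by decreasing score; break ties by alternative id.
    return sorted(score, key=lambda alt: (-score[alt], alt))
```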
Nondeterministic preferences. We assume that agent ranks alternatives according to her perceived utility , where
is a radial basis function (RBF)
Scholkopf and Smola (2001) (i.e., the value of depends only on ) and is a random noise.

Practical implications. We focus on a conceptual and theoretical investigation of the learning-to-rank problem with nondeterministic preferences. Although we point out the harm from using , it may not be the root cause of problems in a practical system based on . To diagnose a ranking algorithm (specifically, whether our theoretical results are relevant), one should first check whether our model is suitable for the dataset at hand.
2 Preliminaries
Our model. Let be the set of agents (or users) and be the set of alternatives (or items).
Utility functions. Agent ’s utility on alternative is determined by a utility function . Throughout this paper, we use , where is the norm. Most results developed in this paper can be generalized to many radial-basis functions Scholkopf and Smola (2001).
Observation and rankings. We observe the ranking of each user in the decreasing order of perceived utility of the alternatives . When follows a Gumbel distribution, the nondeterministic preference model is also known as the Plackett-Luce model Plackett (1975); Luce (1977). Let be a random permutation of , and we have
(1) 
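Concretely, the Gumbel-noise characterization gives a direct way to simulate this model: perturbing each utility with i.i.d. standard Gumbel noise and sorting in decreasing order samples a Plackett-Luce ranking. The sketch below assumes one-dimensional features, the utility u = -|x - y|, and unit noise scale; these are illustrative choices, since the paper's exact parametrization was lost with the math in extraction:

```python
import math
import random

def sample_gumbel():
    # Standard Gumbel via inverse CDF: G = -log(-log(U)).
    u = random.random()
    return -math.log(-math.log(u))

def sample_ranking(x_agent, y_alts):
    """Sample one Plackett-Luce ranking for an agent with latent
    feature x_agent over alternatives with latent features y_alts.

    With utility u_j = -|x_agent - y_j| (an illustrative RBF-like
    choice), Gumbel-perturbed utilities sorted in decreasing order
    are a Plackett-Luce sample with weights exp(u_j).
    """
    perceived = [-abs(x_agent - y) + sample_gumbel() for y in y_alts]
    # Ranking = alternative indices sorted by decreasing perceived utility.
    return sorted(range(len(y_alts)), key=lambda j: -perceived[j])
```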
Distributions of and . We further assume that and are i.i.d. generated from fixed but unknown distributions and , respectively. Let the cdf (respectively, pdf) of and be and (respectively, and ). For exposition purposes, we make the simplifying assumptions that (i) , and (ii) and are on and “near-uniform” (i.e., and are bounded by a constant ). These assumptions are widely used in latent space models and can be relaxed via more careful analysis; see Abraham et al. (2013) and references therein.
Our problem. Given an agent , we say is an nearest neighbor set for if

For all such that , .

For all such that , .

For any such that , we do not require any performance guarantee (i.e., whether ).
Similarly, we can define nearest neighbor set for alternatives. In other words, all ’s that are within away from should be included in , and all ’s that are more than away from should not be in . Therefore, our goal is to design efficient algorithms to compute
nearest neighbor sets with high probability, where
.

Partial observations and forecasts. All results presented in this paper can be generalized to the partial ranking scenario, where each only consists of a subset of linear size. Furthermore, a natural problem in this scenario is to infer an agent’s preferences over unranked alternatives. We note that an nearest neighbor set for can be used to infer its rankings over the entire via existing techniques Conitzer et al. (2006); Alon (2006); Ailon (2007). Therefore, our problem is strictly harder than the preference estimation problem.
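Given latent positions, the nearest-neighbor-set criterion defined above can be checked directly. In the sketch below, `eps` is the inclusion radius and `(1 + gamma) * eps` the exclusion radius; these names and the exact slack form are our assumptions, since the paper's symbols were lost in extraction:

```python
def is_valid_nn_set(S, x, u, eps, gamma):
    """Check the nearest-neighbor-set criterion given latent positions.

    x[v] is agent v's latent feature and u is the query agent.  Agents
    within eps of x[u] must be in S; agents farther than
    (1 + gamma) * eps must not be; agents in between are unconstrained.
    """
    for v in range(len(x)):
        if v == u:
            continue
        d = abs(x[v] - x[u])
        if d <= eps and v not in S:
            return False  # a true neighbor was missed
        if d > (1 + gamma) * eps and v in S:
            return False  # a far-away agent was wrongly included
    return True
```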
Comparison to the KS model by Katz-Samuels and Scott (2017). In the KS model, agent ’s ranking is deterministic, i.e., iff , whereas our model allows to “add noise” to the observations, which is a more standard practice in learning to rank.
2.1 Kendall-tau distance and prior algorithms
Let and be two rankings over and let denote the rank of the th alternative. The Kendall-tau distance is
(2) 
where is an indicator function that evaluates to one if and only if its argument is true. The normalized Kendall-tau distance between and is .
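In code, Equation (2) and its normalized variant amount to counting discordant pairs. A minimal sketch, representing a ranking as a list in which entry a is the rank position of alternative a:

```python
def kendall_tau(sigma1, sigma2):
    """Count discordant pairs between two rankings.

    sigma[a] is the rank position of alternative a (lower = preferred).
    """
    m = len(sigma1)
    assert len(sigma2) == m
    discordant = 0
    for a in range(m):
        for b in range(a + 1, m):
            # Pair (a, b) is discordant iff the rankings order it oppositely.
            if (sigma1[a] < sigma1[b]) != (sigma2[a] < sigma2[b]):
                discordant += 1
    return discordant

def normalized_kendall_tau(sigma1, sigma2):
    """Normalize by the number of pairs, m choose 2."""
    m = len(sigma1)
    return kendall_tau(sigma1, sigma2) / (m * (m - 1) / 2)
```

The double loop is O(m^2); a merge-sort-based inversion count would bring this to O(m log m), but the quadratic version matches Equation (2) most transparently.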
Nearest-neighbor algorithms. See Algorithm 1. We shall refer to this algorithm as . It uses KT-distance as the distance metric and runs a kNN algorithm on top of it.
3 Incorrectness of  under Nondeterministic Preferences
This section explains why  is incorrect under the Plackett-Luce model. Let be the ground-truth ranking of agent , i.e., the th element in is the th largest value of the set .
Intuition behind . Previously,  was considered correct because of two intuitions: (1) if and are close, then and are also close, and (2) if and are close, their “realizations” and will also be close. Therefore, when minimizes for large and , it also minimizes .
Intuition (1) is theoretically grounded (see Katz-Samuels and Scott (2017)). The key problematic part is that for nondeterministic users, and do not have a monotone relationship. That is, an increase in does not necessarily imply an increase in , and vice versa.
Example 3.1
Let , , and . Consider the following two optimization problems:
(3) 
(4) 
We can see that the structures of these two optimization problems are very different. For (3), the optimal solution set is . But for (4), the optimal solution set is . The key difference is that itself would be an optimal solution to (3), but it is not an optimal solution to (4).
Interpreting the result. We need to solve (3) to find nearest neighbors, but the objective of  is closer to (4). Specifically, consider a scenario with only two alternatives but is sufficiently large. The above example shows that is far away from . Because is sufficiently large, we have . The right side of the approximation resembles the nearest-neighbor approximation (i.e., ) because converges to its expectation for large . Therefore,  solves (4), which is different from solving (3).
This observation can be used to build a more general negative theorem, which implies that  cannot output any nearest neighbor set with high probability, because with probability the output of  is away.
Theorem 3.1
Let and
be a near-uniform distribution on
. Let be an arbitrary agent and be its ranking (a random variable conditioned on
). We have:
where and .
Proof: Since is near-uniform on , we know . Then, we prove the following two claims, indicating for all :


When :

When :
Those two claims above also indicate that  cannot output any nearest neighbor set with probability. Here, we focus on the most difficult case in the first claim above to highlight our new analytical techniques (see Appendix A for the full proof). Specifically, below we show
Because is a continuous function of , we use to characterize the minimal point . Specifically, we shall show that for all , which means the function is minimized when .
We next analyze in the following events respectively:


: when .

: when and .

: when and .

: when and .
According to the case-by-case analysis shown in Lemma A.1 and noting that ,
The equality holds if and only if . Therefore, is minimized at for . Appendix A completes the proof for using similar techniques.
4 Nearestneighbor with global information
We propose a novel (and correct) kNN algorithm based on a new set of features for all that can be used by nearest-neighbor algorithms. Each feature uses global information from all the rankings.
Features based on all-pair normalized Kendall-tau distance. We associate each agent with a feature , where is constructed as below:
First, we group to pairs so that the th pair consists of . We then let:
Our features are
(5) 
It follows that ’s are independent and for all . Then we define the distance function between agent and agent as
(6) 
The new kNN (). Let , (i.e., ). Our algorithm, hereafter , returns the set . See Algorithm 2.
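A hedged sketch of this construction is below. The paper pairs up reference agents before computing features; since the exact pairing scheme (and all symbols) were lost in extraction, we use single reference agents and an average absolute coordinate difference as the distance. This preserves the key idea: each agent's feature vector is built from all agents' rankings.

```python
def global_features(rankings, nkt):
    """Build a feature vector for every agent from ALL rankings.

    rankings[u] is agent u's ranking and nkt is a normalized
    Kendall-tau distance function.  Feature k of agent u is u's
    normalized KT distance to reference agent k's ranking (one
    plausible reading of the paper's pair-based construction).
    """
    n = len(rankings)
    return [[nkt(rankings[u], rankings[k]) for k in range(n)]
            for u in range(n)]

def feature_knn(features, u, threshold):
    """Return agents whose feature vectors are within `threshold` of
    agent u's, using the average absolute coordinate difference."""
    fu = features[u]
    out = []
    for v, fv in enumerate(features):
        if v == u:
            continue
        d = sum(abs(a - b) for a, b in zip(fu, fv)) / len(fu)
        if d <= threshold:
            out.append(v)
    return out
```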
Global vs. local information. Algorithm  uses only local information to construct features (i.e., feature of depends only on ), whereas  needs to use all ranking information to construct a feature . We note that relying on local information is unlikely to be sufficient to construct highquality features. Instead, we need to use a slower procedure that takes advantage of all of the local information available to construct . Earlier works in network analysis (see Li et al. (2017)
and references therein) developed similar techniques in classifying nodes.
Theorem 4.1
Using the notations above, for all , that are near-uniform on and , let . There exist positive constants , and such that is an nearest neighbor set of with probability at least .
Proof: We first show that there exist constants and such that
(7) 
Here, we outline a proof for the upper bound of (7), which is applicable to any distributions on (see Appendix B for the full proof and the lower bound analysis, which uses the near-uniform conditions).
All the analysis below assumes conditioning on knowing and (i.e., means ). W.l.o.g., assume . We use techniques similar to those developed in Theorem 3.1. Specifically, let . We have
Let us define three events:


: when or .

: when and .

: when or .
We compute conditioned on the three events (i.e., for ).
Event : One can see that by a symmetry argument.
Event .
We have
Event . We have
follows from combining all results for .
Next, we show the tail bound of , where decays exponentially in . Observing that ’s are independent in any , we have by a standard Chernoff bound. Combining the tail bound above with (7), we know there exist constants , and such that
(8) 
We now interpret (8) in the context of nearest neighbor set. We analyze the part first. For any agent , we have,
We also note that there are at most agents in . By applying the union bound to all agents in , we get the conclusion of .
Letting , we get . Then, Theorem 4.1 follows by applying the union bound to ’s and ’s conclusions.
5 NearestNeighbor algorithm for alternatives
This section designs an algorithm for finding an nearest neighbor set for an alternative . While global information is needed for finding nearest neighbors for agents, we need only local information for alternatives. For exposition purposes, we focus on uniform and .
Additional notations. Define . Here, represents an agent and represents its ranking over all alternatives. Intuitively, is if and is otherwise. Next, define . Note that the terms in the summation are i.i.d. random variables, each of which has the same distribution as .
Our algorithm and its intuition. Our goal is to find an nearest neighbor set of . To determine whether and are close, we shall check : when , with probability exactly that , which implies . When and are far away, then it is unlikely that (there is a catch; see below). As is the mean of copies of independent , it will drift away from .
A “bug” due to symmetry. One issue of the above argument is that large does not always imply . For any , when , we have:
One can check that for any . Therefore, for .
A twostep algorithm. Let be a suitable parameter and . We design a twostep algorithm to circumvent the symmetric bug:


Step 1. Construction of candidate set: We let . All neighbors of are in .

Step 2. Filtering: We design a procedure that determines, for each , whether is close to or to . We then use this procedure to filter out all the alternatives in that are not close to .
Details of Steps 1 and 2 are given below. The performance of our algorithm is characterized by the following proposition.
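Under the Plackett-Luce setup above, the two steps can be sketched as follows. The win-rate statistic and the candidate threshold follow the intuition above; the reference-alternative trick used for the split is our illustrative stand-in for the paper's Algorithm 3, and all thresholds are assumptions:

```python
def pairwise_win_rates(rankings, m_alts):
    """win[a][b] = fraction of agents ranking alternative a above b."""
    n = len(rankings)
    pos = [{alt: i for i, alt in enumerate(r)} for r in rankings]
    win = [[0.0] * m_alts for _ in range(m_alts)]
    for a in range(m_alts):
        for b in range(m_alts):
            if a != b:
                win[a][b] = sum(pos[u][a] < pos[u][b] for u in range(n)) / n
    return win

def candidate_set(win, a, tau):
    """Step 1: keep b iff its win rate against a is within tau of 1/2.

    True neighbors of a satisfy this, but so do mirror images of a."""
    return [b for b in range(len(win))
            if b != a and abs(win[a][b] - 0.5) <= tau]

def split_by_reference(win, cands, d, gap):
    """Step 2 (illustrative stand-in for the paper's Algorithm 3):
    compare candidates against a reference alternative d outside the
    candidate set.  Candidates near the same latent point get similar
    win rates against d; the mirror cluster generically does not."""
    pivot = win[cands[0]][d]
    same = [b for b in cands if abs(win[b][d] - pivot) <= gap]
    other = [b for b in cands if abs(win[b][d] - pivot) > gap]
    return same, other
```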
Proposition 5.1
Using the above notations, let be an arbitrary alternative. There exists an efficient algorithm that constructs an nearest neighbor set for any . Here, and are two suitably chosen constants.
Step 1: Construction of candidate set. Let .
Lemma 5.2
Let (for all ) and . Then there exist constants and such that
(9) 
Step 2: Filtering out unwanted alternatives. Now we have a candidate set such that for any , is either close to or . Next, we describe an algorithm that eliminates the elements of that are not close to . We now formally describe the problem.
The Split-cluster problem. Let be a set such that for any , either or , where . Our goal is to find all such that .
Our split-cluster algorithm is shown in Algorithm 3, with analysis and remarks given in the Appendix.
Lemma 5.3
When , Algorithm 3 returns all such that .
6 Numeric validation
This section presents results of experiments based on synthetic data to validate our theoretical results. We randomly generated 1200 agents and 6000 alternatives according to . Then we introduce a new agent and reveal its partial ranking to the system. Our goal is to predict . We examine three algorithms: (i) , (ii) , (iii) Ground-truth (i.e., directly using the nearest neighbors of an agent in the latent space). The ground-truth algorithm cannot be implemented in practice and only serves as an optimal bound for any kNN-based algorithm. We consider ( is the number of neighbors to keep). See Figure 1. One can see that  consistently performs poorly, whereas ’s performance is very close to the lower bound.
Figure 1(c) shows experiments for highdimensional latent spaces () under the same setting as 1D except . We see  consistently has worse performance than , whose performance is very close to ground truth.
7 Additional related work
Nonparametric learning in practice.
Our model is sometimes considered a nonparametric model. Nonparametric preference learning methods are widely applied in practice, but little is known about their theoretical guarantees. Our work is related to the recent line of work in preference completion McNee et al. (2006); Liu and Yang (2008b); Cremonesi et al. (2010); Wang et al. (2012, 2014); Huang et al. (2015); Cheng et al. (2017); Katz-Samuels and Scott (2017). Some of the most recent algorithms (e.g., Wang et al. (2014); Huang et al. (2015)) have impressive performance in practice but no theoretical explanations justifying their successes.

Non-ranking observations. There is a rich literature (e.g., see Herlocker et al. (1999); Liu and Yang (2008a); Bobadilla et al. (2013); Lee et al. (2016) and references therein) on learning information about based on partial observations. For example, in the classical collaborative filtering problem, one observes noisy values of (e.g., the observation is for some white noise ). These results are not comparable to ours. Other work Kleinberg and Sandler (2003, 2008) assumes an observation model related to ours: an alternative is more likely to be used/evaluated by an agent if is high.

Low-rank assumption. The work of Park et al. (2015); Gunasekar et al. (2016) assumes the matrix (or its expectation) has low rank. This matrix is full rank under all the utility functions and models considered in this paper. Furthermore, their loss functions are not in terms of rank correlations (the most natural choice).
Parametric inference. Parametric preference learning has been extensively studied in machine learning, especially learning to rank Cheng et al. (2010); Mollica and Tardella (2016); Negahban et al. (2017); Azari Soufiani et al. (2012, 2014, 2013b, 2013a); Maystre and Grossglauser (2015); Khetan and Oh (2016); Hughes et al. (2015); Zhao et al. (2016). These methods often assume the existence of a parametric model, usually a Random Utility Model or Mallows’ model.
8 Concluding remarks
This paper introduced a natural learning-to-rank model and showed that, under this model, a widely-used KT-distance-based kNN algorithm fails to find similar agents (users). To fix the problem, we introduced a new set of features for agents that relies on the rankings of other agents (i.e., on “global information”). We also designed an algorithm for finding similar alternatives based only on local information. The two algorithmic results show that the “item-similarity” problem is fundamentally different from the “user-similarity” problem.
Generalization. We made two assumptions in our analysis: (i) we observe each agent’s full ranking over ; and (ii) and are in a 1-dimensional space. Relaxing assumption (i) is straightforward because we need only develop specialized tail bounds for (discussed in Section 4). Relaxing assumption (ii), however, is challenging because our analysis relies heavily on symmetry properties of the 1-dimensional line, many of which break in high-dimensional spaces. We note that in practice, the improvement in predictive power from high-dimensional models is usually incremental Li et al. (2017).
Limitation. RBF utilities are not universally applicable in all recommender systems (e.g.,
in some circumstances, “cosine similarities” are more suitable utility functions). This paper’s major contribution is the theoretical investigation of a fundamental learning-to-rank problem. It remains future work to apply our results to understand their impact on practical recommender systems.
References
 Abraham et al. [2013] Ittai Abraham, Shiri Chechik, David Kempe, and Aleksandrs Slivkins. Low-distortion inference of latent similarities from a multiplex social network. In Proceedings of the Twenty-fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’13, pages 1853–1883, Philadelphia, PA, USA, 2013. Society for Industrial and Applied Mathematics.
 Ailon [2007] Nir Ailon. Aggregation of partial rankings, p-ratings and top-m lists. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.
 Alon [2006] Noga Alon. Ranking tournaments. SIAM Journal of Discrete Mathematics, 20:137–142, 2006.
 Azari Soufiani et al. [2012] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proceedings of Advances in Neural Information Processing Systems (NIPS), pages 126–134, Lake Tahoe, NV, USA, 2012.

 Azari Soufiani et al. [2013a] Hossein Azari Soufiani, William Chen, David C. Parkes, and Lirong Xia. Generalized method-of-moments for rank aggregation. In Proceedings of Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 2013a.
 Azari Soufiani et al. [2013b] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Preference Elicitation For General Random Utility Models. In Proceedings of Uncertainty in Artificial Intelligence (UAI), Bellevue, Washington, USA, 2013b.
 Azari Soufiani et al. [2014] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Computing Parametric Ranking Models via Rank-Breaking. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014.
 Berry et al. [1995] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, 63(4):841–890, 1995.
 Bobadilla et al. [2013] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowledge-Based Systems, 46:109–132, 2013.
 Cheng et al. [2017] Peizhe Cheng, Shuaiqiang Wang, Jun Ma, Jiankai Sun, and Hui Xiong. Learning to recommend accurate and diverse items. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, pages 183–192, 2017. ISBN 9781450349130.
 Cheng et al. [2010] Weiwei Cheng, Krzysztof J. Dembczynski, and Eyke Hüllermeier. Label ranking methods based on the Plackett-Luce model. Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 215–222, 2010.
 Conitzer et al. [2006] Vincent Conitzer, Andrew Davenport, and Jayant Kalagnanam. Improved bounds for computing Kemeny rankings. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 620–626, Boston, MA, USA, 2006.
 Cremonesi et al. [2010] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, pages 39–46. ACM, 2010. ISBN 9781605589060.
 Gunasekar et al. [2016] Suriya Gunasekar, Oluwasanmi O. Koyejo, and Joydeep Ghosh. Preference Completion from Partial Rankings. In Advances in Neural Information Processing Systems, 2016.
 Herlocker et al. [1999] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 230–237, 1999.
 Huang et al. [2015] Shanshan Huang, Shuaiqiang Wang, Tie-Yan Liu, Jun Ma, Zhumin Chen, and Jari Veijalainen. Listwise collaborative filtering. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 343–352. ACM, 2015.
 Hughes et al. [2015] David Hughes, Kevin Hwang, and Lirong Xia. Computing Optimal Bayesian Decisions for Rank Aggregation via MCMC Sampling. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pages 385–394, 2015.
 Katz-Samuels and Scott [2017] Julian Katz-Samuels and Clayton Scott. Nonparametric preference completion. CoRR, abs/1705.08621, 2017. URL http://arxiv.org/abs/1705.08621.

 Kenyon-Mathieu and Schudy [2007] Claire Kenyon-Mathieu and Warren Schudy. How to Rank with Few Errors: A PTAS for Weighted Feedback Arc Set on Tournaments. In Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pages 95–103, San Diego, California, USA, 2007.
 Khetan and Oh [2016] Ashish Khetan and Sewoong Oh. Data-driven rank breaking for efficient rank aggregation. In Proceedings of the 33rd International Conference on Machine Learning, volume 48, 2016.
 Kleinberg and Sandler [2003] Jon Kleinberg and Mark Sandler. Convergent algorithms for collaborative filtering. In Proceedings of the 4th ACM conference on Electronic commerce, pages 1–10, 2003.
 Kleinberg and Sandler [2008] Jon Kleinberg and Mark Sandler. Using mixture models for collaborative filtering. Journal of Computer and System Sciences, 74(1):49–69, 2008.
 Lee et al. [2016] Christina E. Lee, Yihua Li, Devavrat Shah, and Dogyoon Song. Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering. In Advances in Neural Information Processing Systems, 2016.
 Li et al. [2017] Cheng Li, Felix MF Wong, Zhenming Liu, and Varun Kanade. From which world is your graph. In Advances in Neural Information Processing Systems, pages 1468–1478, 2017.
 Liu and Yang [2008a] Nathan N. Liu and Qiang Yang. EigenRank: A ranking-oriented approach to collaborative filtering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 83–90, 2008a.
 Liu and Yang [2008b] Nathan N. Liu and Qiang Yang. EigenRank: A ranking-oriented approach to collaborative filtering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pages 83–90, 2008b. ISBN 9781605581644.
 Liu [2009] Tie-Yan Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3(3):225–331, March 2009. ISSN 1554-0669.
 Luce [1977] R. Duncan Luce. The choice axiom after twenty years. Journal of Mathematical Psychology, 15(3):215–233, 1977.
 Maystre and Grossglauser [2015] Lucas Maystre and Matthias Grossglauser. Fast and accurate inference of Plackett-Luce models. In Proceedings of the 28th International Conference on Neural Information Processing Systems, pages 172–180, 2015.
 McNee et al. [2006] Sean M. McNee, John Riedl, and Joseph A. Konstan. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’06, pages 1097–1101, New York, NY, USA, 2006. ACM. ISBN 1595932984.
 Mollica and Tardella [2016] Cristina Mollica and Luca Tardella. Bayesian Plackett–Luce mixture models for partially ranked data. Psychometrika, pages 1–17, 2016.
 Negahban et al. [2017] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266–287, 2017.
 Park et al. [2015] Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, and Inderjit S. Dhillon. Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, pages 1907–1916, 2015.
 Plackett [1975] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), 24(2):193–202, 1975.
 Sarwar et al. [2001] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Itembased collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295. ACM, 2001.

 Scholkopf and Smola [2001] Bernhard Scholkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
 Wang et al. [2012] Shuaiqiang Wang, Jiankai Sun, Byron J. Gao, and Jun Ma. Adapting vector space model to ranking-based collaborative filtering. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 1487–1491, 2012. ISBN 9781450311564.
 Wang et al. [2014] Shuaiqiang Wang, Jiankai Sun, Byron J. Gao, and Jun Ma. VSRank: A novel framework for ranking-based collaborative filtering. ACM Trans. Intell. Syst. Technol., 5(3):51:1–51:24, July 2014. ISSN 2157-6904.
 Zhao et al. [2016] Zhibing Zhao, Peter Piech, and Lirong Xia. Learning Mixtures of Plackett-Luce Models. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16), 2016.
Appendix A Missing analysis for analyzing 
This section presents the missing analysis in Section 3. We have the following three major lemmas.
Lemma A.1
Let be a uniform distribution on , , and . Let be an arbitrary agent and be the ranking of the agent (which is a random variable conditioned on ). We have:
Proof: Because is a continuous function of , we use to characterize the minimal point . Specifically, we shall show that for all , which means the function is minimized when .
We next calculate . Let and . When the context is clear, we can write and instead. We have