Collaborative filtering (CF) is one of the most widely used methods in recommender systems. CF systems recommend items to a user based on the preferences of other users who share similar traits or tastes. Learning to Rank (LTR) methods, which directly learn to accurately rank items based on users' ratings, rankings, or implicit feedback over a set of items, are widely used to learn rankings for top-N recommendation scenarios [2, 3].
I-A Related Work
Learning to Rank (LTR) methods can be categorized into point-wise, pair-wise, and list-wise methods. Point-wise methods learn ranking models from the scores assigned by users to individual items (see, e.g., ). Pair-wise methods (e.g., BPR ) learn binary classifiers that compare ordered pairs of items to decide whether the first item is preferred to the second. The applicability of such methods is limited by the high computational cost of the pair-wise comparisons of user-rated items needed to generate the training samples for the binary classifiers [7, 8]. List-wise methods, in contrast, learn from entire ranked lists of items. Typically, such methods optimize a smooth approximation of a loss function that measures the distance between the reference lists of ranked items in the training data and the ranked list of items produced by the ranking model. For example, CLiMF , which optimizes a smooth lower bound of the mean reciprocal rank (MRR), aims at ranking a small set of most-preferred items at the top of the list; TFMAP  optimizes the mean average precision (MAP) of top-ranked items for each user in a given context. Other methods that optimize the discounted cumulative gain (DCG) or normalized DCG (NDCG) can be found in [10, 11]. Examples of list-wise methods that optimize the probability of permutations that map items to ranks include: ListPMF , which represents each user as a probability distribution over permutations of rated items based on the Plackett-Luce model; ListRank , which seeks a ranking permutation that minimizes the cross-entropy between the distribution of the observed ranking of items based on user ratings and that of the predicted rankings with respect to the top-ranked item; methods that optimize the log-posterior of the predicted preference orders given the observed preference orders ; and methods that leverage deep neural nets (e.g., ) to learn the non-linear interactions between user-item pairs (see  for a survey of such methods).
Existing LTR approaches suffer from several limitations. Although, in practical applications, only the top (say, N) items in the ranked list are of interest, and the lower-ranked items in the list are less reliable, most existing LTR methods optimize the ranks of entire lists, which can reduce the ranking quality of the top-ranked items. Furthermore, the computational complexity of straightforward approaches to optimizing ranking measures (e.g., DCG , MRR , AUC , or MAP ) scales quadratically with $\bar{m}$ (the average number of observed items across all users), which renders such methods impractical in large-scale real-world settings.
I-B Overview and Contributions
To address the limitations of existing LTR systems, we propose Top-N-Rank, a novel latent-factor-based list-wise ranking model for the top-N recommendation problem, which directly optimizes a novel weighted "top-heavy" truncated variant of the DCG ranking measure, namely, wDCG@N. Since users typically attend only to the top-ranked items in a list, the higher positions should have more impact on the ranking score than the lower ones. Our proposed measure, wDCG@N, differs from the conventional DCG in two important aspects: (i) it considers only the top N items in the ranked lists, thereby eliminating the impact of low-ranked items; and (ii) it incorporates weights that allow the model to learn from multiple kinds of implicit feedback.
Because wDCG@N is non-smooth, we introduce the rectified linear unit (ReLU) as a smoothing function, which is better suited to top-N ranking problems than the traditional sigmoid function. ReLU not only eliminates the contribution of the low-ranked items to our loss function, but also allows us to obtain a significantly faster variant of the wDCG@N-based LTR approach (Top-N-Rank.ReLU), yielding a substantial reduction in computational complexity from $O(db\bar{m}^2)$ to $O(db\bar{m})$, where $d$ denotes the dimension of the latent factors, $b$ denotes the batch size of users for the stochastic gradient descent algorithm, and $\bar{m}$ is the average number of (observed) items, making Top-N-Rank.ReLU scalable to large-scale real-world settings.
The main contributions of this paper can be summarized as follows:
We have introduced a novel list-wise ranking model for top-N recommendation, which directly optimizes a weighted top-heavy truncated ranking objective function, wDCG@N. Our model improves the quality of the top-N item lists by mitigating the impact of the lower-ranked items, and is capable of handling multiple types of implicit feedback (if available).
We have introduced the rectified linear unit (ReLU) to smooth our objective function. We demonstrate that ReLU (1) eliminates the impact of the lower-ranked items and (2) substantially speeds up the computation through careful algorithm design.
We have proposed a fast learning algorithm (Top-N-Rank) for generic smoothing functions, and a substantially more efficient variant (Top-N-Rank.ReLU) for the ReLU smoothing function, which reduces the computational complexity from $O(db\bar{m}^2)$ to $O(db\bar{m})$.
We compared the performance of Top-N-Rank and Top-N-Rank.ReLU with several state-of-the-art list-wise LTR methods [10, 3, 16, 13, 12] using the MovieLens (20M) data set  and the Amazon video games data set . All experiments were performed on an Apache Spark cluster , with the raw data stored on the Hadoop Distributed File System (HDFS).
Let $\mathcal{U}$ be the set of users, $\mathcal{I}$ the set of items, and $\mathcal{F}$ the set of types of implicit feedback. The interactions of users with items and the associated implicit feedback are represented by a matrix $\mathbf{Y}$, where the entry $y_{ui}$ denotes the interaction of user $u$ with item $i$, and $f_{ui} \in \mathcal{F}$ denotes the associated implicit feedback. We further denote by $\mathcal{I}_u \subseteq \mathcal{I}$ the subset of items actually observed by or presented to $u$. For each $i \in \mathcal{I}_u$, we denote the rating of $i$ by $y_{ui}$ and the position of $i$ based on $u$'s rank ordering of the items by $r_{ui}$. We reserve the indexing letter $u$ to indicate an arbitrary user in $\mathcal{U}$ and $i$ to represent an arbitrary item in $\mathcal{I}$.
II-A Latent Factor Model
Latent factor models (LFMs) are state-of-the-art in terms of both the quality of recommendations and scalability . LFMs represent both users and items using low-dimensional vectors of latent factors. Let $\Theta = \{U, V\}$ be the set of latent factors, where $U$ is a $|\mathcal{U}| \times d$ matrix whose $u$-th row $U_u$ denotes the latent factors of user $u$, and $V$ is a $|\mathcal{I}| \times d$ matrix whose $i$-th row $V_i$ denotes the latent factors of item $i$. The rank $d$ of the latent factor matrices is much smaller than $|\mathcal{U}|$ or $|\mathcal{I}|$. The predicted rating $\hat{y}_{ui}$ of user $u$ for item $i$ is given by the dot product $U_u V_i^{\top}$.
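As a concrete illustration (a minimal sketch with toy sizes of our own choosing, not the paper's implementation), the LFM prediction step is simply a matrix product of the two factor matrices:

```python
import numpy as np

# Users and items are embedded as d-dimensional latent factor vectors;
# the predicted rating is the dot product of the corresponding rows.
rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 3   # toy sizes; in practice d << |items|

U = rng.normal(size=(n_users, d))   # one row of latent factors per user
V = rng.normal(size=(n_items, d))   # one row of latent factors per item

# Predicted rating matrix: entry (u, i) is the dot product U[u] . V[i]
Y_hat = U @ V.T

assert Y_hat.shape == (n_users, n_items)
assert np.isclose(Y_hat[1, 2], U[1] @ V[2])
```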
II-B Discounted Cumulative Gain
The discounted cumulative gain (DCG)  is a widely used measure of the quality of recommendations, which measures the degree to which higher-ranked items are placed ahead of lower-ranked ones, with the contribution of lower-ranked items discounted by a logarithmic factor. Let $z_{ui}$ be a binary indicator of whether item $i$ is relevant to user $u$; then the DCG of $u$ is computed by:

$$\mathrm{DCG}_u = \sum_{i \in \mathcal{I}_u} \frac{z_{ui}}{\log_2 (r_{ui} + 2)} \qquad (1)$$

Notice that the ranked position (starting from zero) of item $i$ can be computed by:

$$r_{ui} = \sum_{j \in \mathcal{I}_u,\, j \neq i} \mathbb{1}(\hat{y}_{uj} > \hat{y}_{ui}) \qquad (2)$$

where $\mathbb{1}(\cdot)$ is an indicator function with $\mathbb{1}(x) = 1$ if $x$ is true and $\mathbb{1}(x) = 0$ otherwise. Given our emphasis on getting the top-rated items ranked correctly in the list of recommended items, DCG appears to be a good criterion to optimize. However, as evident from (1), DCG suffers from two important limitations: (i) although DCG de-emphasizes the contribution of the lower-ranked items, it does not eliminate the collective effect of a large number of lower-ranked items, even when the ranking of such items is less reliable; if the goal is to optimize the ranking of the N top-rated items, it makes sense to tailor the objective function to focus explicitly on the ranking of those items and ignore the rest. (ii) Because DCG assigns equal weights to all implicit user feedback, it fails to account for differences in their trustworthiness.
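The rank computation in (2) and the DCG in (1) can be transcribed directly (an illustrative sketch of the definitions; function names are ours):

```python
import numpy as np

def ranks(scores):
    """Zero-based rank of each item, as in (2): the number of items
    that received a strictly higher predicted score."""
    s = np.asarray(scores, dtype=float)
    return (s[None, :] > s[:, None]).sum(axis=1)

def dcg(relevance, scores):
    """DCG as in (1): binary relevance discounted by log2(rank + 2),
    since ranks start from zero."""
    r = ranks(scores)
    return float(np.sum(np.asarray(relevance) / np.log2(r + 2)))

rel, scores = [1, 0, 1], [3.0, 1.0, 2.0]
assert list(ranks(scores)) == [0, 2, 1]
assert abs(dcg(rel, scores) - (1.0 + 1.0 / np.log2(3))) < 1e-9
```

Note the quadratic cost of the pairwise comparison inside `ranks`; this is exactly the cost that the ReLU-based algorithm later avoids.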
We proceed to introduce wDCG@N, a variant of DCG that overcomes these drawbacks. We then describe two smoothing functions (the sigmoid and the rectified linear unit (ReLU)) that convert wDCG@N into a smooth function amenable to standard optimization techniques. Finally, we show how to use the ReLU approximation of wDCG@N to obtain a scalable LTR algorithm.
III-A Top-N-Rank Training Objective
To address the limitations of DCG, we introduce wDCG@N, which is defined as follows:

$$\mathrm{wDCG@N}_u = \sum_{i \in \mathcal{I}_u} \mathbb{1}(r_{ui} < N) \cdot \frac{w_{f_{ui}}\, z_{ui}}{\log_2 (r_{ui} + 2)} \qquad (3)$$

The first term in (3), $\mathbb{1}(r_{ui} < N)$, is an indicator function that selects only the N top-rated items and ignores the rest. The coefficient $w_{f_{ui}}$ in the second term denotes the weight of the implicit feedback $f_{ui}$, which can model the reliability or importance of the feedback. The choice of $w_{f_{ui}}$ is application- and data-dependent. For example, one can set it to the number of items rated by (or presented to) the user  or to the conversion rate (the proportion of buyers among the users who produced the implicit feedback). The resulting ranking objective can be formulated as:

$$\max_{\Theta} \; \sum_{u \in \mathcal{U}} \mathrm{wDCG@N}_u - \lambda \|\Theta\|^2 \qquad (4)$$

where $\|\cdot\|$ denotes the $\ell_2$-norm and $\lambda$ is the regularization coefficient that controls over-fitting.
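A direct transcription of the wDCG@N definition in (3) (our sketch; variable names are ours) makes the roles of the truncation indicator and the feedback weights explicit:

```python
import numpy as np

def wdcg_at_n(relevance, weights, scores, N):
    """wDCG@N as in (3): only items whose zero-based rank is below N
    contribute, each weighted by the reliability weight of its
    implicit-feedback type."""
    s = np.asarray(scores, dtype=float)
    r = (s[None, :] > s[:, None]).sum(axis=1)   # zero-based ranks, as in (2)
    keep = r < N                                # "top-N term": truncation
    return float(np.sum(keep * np.asarray(weights) * np.asarray(relevance)
                        / np.log2(r + 2)))
```

With $N = 1$ only the top-scored item contributes; with $N$ equal to the list length and unit weights, wDCG@N reduces to the ordinary DCG of (1).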
III-B Smooth Approximations of the Top-N-Rank Training Objective
A non-smooth training objective such as the one in (4) is challenging to optimize. Hence, we replace it with a smooth approximation. Specifically, we approximate the indicator function $\mathbb{1}(x)$ in (2) by a smooth function $g(x)$ such that $g(x) \approx \mathbb{1}(x > 0)$, with $x = \hat{y}_{uj} - \hat{y}_{ui}$. In what follows, we consider two different smooth functions that accomplish this goal.
Sigmoid function. The sigmoid function is widely used in existing list-wise LTR-based recommendation models (e.g., [9, 3]) for its appealing performance in practice. Instead of adopting the sigmoid function directly, we introduce a scaling constant $c$ to provide a more accurate estimation, such that the indicator function is approximated by $g(x) = 1/(1 + e^{-cx})$, where $x = \hat{y}_{uj} - \hat{y}_{ui}$.
Rectifier function. The rectified linear unit (ReLU)  is a nonlinear smooth function with several properties that make it attractive in our setting. First, the one-sided nature of ReLU ($g(x) = \max(0, x)$) eliminates the contribution of the lower-rated items to the objective function. Second, ReLU is computationally simpler: only comparison and addition operations are required. Third, the form of ReLU permits an efficient algorithm (see Algorithm 2) with computational complexity that is linear in the average number of (observed) items across all users (see section III-C2). When ReLU is used, the indicator in (2) is replaced by $g(x) = \max(0, x)$.
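The contrast between the two smoothing functions can be seen numerically (a minimal sketch; the scaling constant is illustrative):

```python
import numpy as np

def sigmoid(x, c=1.0):
    # Scaled sigmoid approximation of the indicator 1(x > 0)
    return 1.0 / (1.0 + np.exp(-c * x))

def relu(x):
    # Rectifier: exactly zero for x <= 0, so comparisons against
    # higher-scored items contribute nothing to the smoothed rank
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
# ReLU is one-sided: non-positive score differences contribute nothing...
assert np.all(relu(x[:3]) == 0.0)
# ...whereas the sigmoid assigns every negative difference a positive value,
# so a large number of low-ranked items still accumulates.
assert np.all(sigmoid(x[:3]) > 0.0)
```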
III-B1 Parameterization of the Smooth Functions
Recall that the “top-N term”, $\mathbb{1}(r_{ui} < N)$, was introduced to indicate whether item $i$ ranks among the top N items. However, a poor choice of the hyper-parameters in the smooth function could lead to gross under- or over-estimation of $r_{ui}$ and thus negate the utility of the “top-N term”.
Here we examine how to choose the parameters of the sigmoid and ReLU functions so that they behave as intended. In the case of the sigmoid function, the choice of the scaling constant $c$ matters, with proper values of $c$ yielding the desired behavior. In the case of ReLU, we can ensure the desired behavior by controlling the initial distribution of the latent factors $\Theta$. If the entries of $U$ and $V$ are drawn i.i.d. from a zero-mean Gaussian, each predicted score $\hat{y}_{ui} = U_u V_i^{\top}$ approximately follows a zero-mean Gaussian distribution whose variance is determined by $d$ and the variance of the entries. Choosing the initial variance so that the smoothed ranks remain well-calibrated provides the basic setting for all of the Top-N models using ReLU as the smoothing function.
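The claim that the predicted scores are approximately Gaussian under i.i.d. Gaussian initialization can be checked numerically (our own sanity check, not from the paper): the dot product of two independent $d$-dimensional vectors with $\mathcal{N}(0, \sigma^2)$ entries has mean $0$ and variance $d\sigma^4$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, trials = 8, 0.5, 200_000

U = rng.normal(0.0, sigma, size=(trials, d))
V = rng.normal(0.0, sigma, size=(trials, d))
scores = np.einsum('td,td->t', U, V)   # one dot product per trial

# Each score is a sum of d products of independent N(0, sigma^2)
# variables: mean 0, variance d * sigma^4.
assert abs(scores.mean()) < 0.01
assert abs(scores.var() - d * sigma**4) < 0.01
```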
III-C Fast LTR Algorithms
III-C1 Fast LTR Algorithm for Generic Smooth Functions
To optimize the objective function in (4), we need to compute the predicted score of each item and then perform pair-wise comparisons to determine their positions in the rank-ordered list. Because in most cases the number of items $|\mathcal{I}|$ far outnumbers the dimension of the latent factors $d$, the complexity of a single pass is quadratic in $|\mathcal{I}|$. One common practice is to exploit the sparsity of the interaction data by considering only the predicted scores of the observed items, yielding a smooth objective function such as:

$$L(\Theta) = \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}_u} g(N - \hat{r}_{ui}) \cdot \frac{w_{f_{ui}}\, z_{ui}}{\log_2 (\hat{r}_{ui} + 2)} - \lambda \|\Theta\|^2, \quad \text{where} \quad \hat{r}_{ui} = \sum_{j \in \mathcal{I}_u} g(\hat{y}_{uj} - \hat{y}_{ui}) \qquad (5)$$
The gradient of the objective w.r.t. $U_u$ is given by (6). The gradients w.r.t. $V_i$ are obtained analogously, where $g'(\cdot)$ denotes the derivative of the smooth function presented in section III-B. The pseudo-code for Top-N-Rank (using stochastic gradient descent) is given in Algorithm 1.
The computational complexity of Top-N-Rank for one iteration is $O(db\bar{m}^2)$, where $\bar{m}$ denotes the average number of observed items across all users.
III-C2 Enhanced LTR Algorithm for Large-scale Top-N Recommendation
Algorithm 1 can be intractable in large-scale systems with a massive number of items. The use of ReLU permits a more efficient version of Top-N-Rank (denoted Top-N-Rank.ReLU) that further reduces the complexity to $O(db\bar{m})$. The pseudo-code for Top-N-Rank.ReLU is given in Algorithm 2.
For a single user, steps 5 and 12 are computed in $O(\bar{m} \log \bar{m})$. Note that the partial sums in steps 7 and 14 and in steps 8 and 15 can be calculated in $O(\bar{m})$ through step-by-step accumulation, so the complexity of steps 6-11 and steps 13-17 is $O(d\bar{m})$. Therefore, the overall computational complexity of Top-N-Rank.ReLU for one iteration is $O(b\bar{m}(d + \log \bar{m}))$. In practice, $d$ is usually very small (less than 20) even in large-scale systems; thus, we can expect $d$ to be of the same scale as $\log \bar{m}$, and the complexity simplifies to $O(db\bar{m})$, making Top-N-Rank.ReLU suitable for large-scale settings with massive item sets.
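The accumulation idea can be illustrated as follows (our sketch, not the paper's Algorithm 2): with ReLU smoothing, the smoothed rank of item $i$ is $\sum_j \max(0, \hat{y}_{uj} - \hat{y}_{ui})$; after sorting the scores once, each such sum reduces to a suffix sum, so all smoothed ranks for a user are obtained in $O(\bar{m} \log \bar{m})$ rather than $O(\bar{m}^2)$.

```python
import numpy as np

def smoothed_ranks_naive(scores):
    # O(m^2): explicit pairwise ReLU comparisons, as in the generic algorithm
    s = np.asarray(scores, dtype=float)
    return np.maximum(0.0, s[None, :] - s[:, None]).sum(axis=1)

def smoothed_ranks_fast(scores):
    # O(m log m): sort once; for item i, sum_j max(0, s_j - s_i) equals
    # (sum of scores above s_i) - (count above s_i) * s_i, via suffix sums.
    s = np.asarray(scores, dtype=float)
    order = np.argsort(s)
    sorted_s = s[order]
    m = len(s)
    suffix_sum = np.cumsum(sorted_s[::-1])[::-1]   # sum of sorted_s[k:] at each k
    above_sum = suffix_sum - sorted_s              # scores strictly after position k
    above_cnt = m - 1 - np.arange(m)
    ranks_sorted = above_sum - above_cnt * sorted_s
    out = np.empty(m)
    out[order] = ranks_sorted                      # undo the sort
    return out
```

Ties are handled consistently, since equal scores contribute zero in both formulations.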
IV Experiments and Results
We report the results of two sets of experiments. The first set compares the performance of Top-N-Rank models using either the sigmoid or the ReLU function for smoothing, with or without the “top-N truncation”. Our results show that Top-N-Rank.ReLU (using “top-N truncation” and the ReLU function, i.e., Algorithm 2) outperforms the other variants on both benchmark data sets. The second set compares the performance of Top-N-Rank.ReLU with several state-of-the-art list-wise LTR CF approaches. Our results show that the Top-N-Rank algorithms outperform these methods on both benchmark data sets.
All of our experiments were performed on an Apache Spark cluster  with four compute nodes (Intel Xeon 2.1 GHz CPUs with 20 GB RAM per node), with the raw data stored on the Hadoop Distributed File System (HDFS). The model parameters were tuned to optimize performance on the training data. We describe below the details of the experiments and the results.
IV-A Experimental Setup
IV-A1 Data Sets
We used two benchmark data sets in our experiments: (i) the Amazon video games data set , which contains a subset of video game product reviews (ratings, text, etc.) from Amazon, with 7,077 users, 25,744 items, and more than 1 million ratings; and (ii) the MovieLens (20M) data set , which contains 138,493 users, 27,278 items, and more than 20 million ratings. The ratings in both data sets are given on a scale of 1-5 stars, with more stars corresponding to higher ratings. We use only the user rating data in our experiments.
IV-A2 Evaluation Procedure
We first remove users who rated fewer than 10 items. For the remaining users, we convert the ratings to implicit feedback: for each $i \in \mathcal{I}_u$, we assign $z_{ui} = 1$ when the rating $y_{ui}$ exceeds a relevance threshold and $z_{ui} = 0$ otherwise. We randomly select half of the ratings provided by each user for training, and use the rest for evaluation. On each test run, we average the performance over all users. We repeat this process 5 times and report the performance averaged across the 5 independent experiments.
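The protocol above can be sketched as follows (the 4-star relevance threshold and the helper names are our illustrative assumptions, not values stated in this section):

```python
import numpy as np

rng = np.random.default_rng(2)

def split_user(rated_items, ratings, threshold=4, min_items=10):
    """Binarize one user's ratings into implicit feedback and split them
    50/50 into train/test; users with too few ratings are dropped."""
    if len(rated_items) < min_items:
        return None                                      # user removed
    items = np.asarray(rated_items)
    z = (np.asarray(ratings) >= threshold).astype(int)   # implicit feedback
    perm = rng.permutation(len(items))
    half = len(items) // 2
    train, test = perm[:half], perm[half:]
    return (items[train], z[train]), (items[test], z[test])

result = split_user(list(range(12)), [5, 4, 1, 2, 3, 5, 4, 4, 2, 1, 3, 5])
assert result is not None
(train_i, train_z), (test_i, test_z) = result
assert len(train_i) + len(test_i) == 12
assert split_user(list(range(5)), [5] * 5) is None       # fewer than 10 ratings
```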
Table I (excerpt): Amazon Video Games, Top-N-Rank.ReLU — 0.8186, 0.8009, 0.8079, 0.8334, 0.8455.
We measure performance based only on the rated items, as in . Because we focus on the placement of the top-rated items in the rank-ordered list, it is natural to use the normalized discounted cumulative gain (NDCG)  as the performance measure. In this paper, we report the average of NDCG@1 through NDCG@N across all users.
The NDCG at the top-N positions for a user $u$ is defined by:

$$\mathrm{NDCG@N}_u = \frac{\mathrm{DCG@N}_u}{\mathrm{IDCG@N}_u}$$

where DCG@N is the DCG value for the top-N ranked items as described in (1), and IDCG@N is the perfect ranking score, obtained when the ranked list is created by sorting the items in descending order of their implicit feedback values (ratings).
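An illustrative implementation of this evaluation measure (names are ours):

```python
import numpy as np

def dcg_at_n(relevance_in_rank_order, N):
    """DCG over the first N positions of an already-ordered relevance list."""
    rel = np.asarray(relevance_in_rank_order[:N], dtype=float)
    return float(np.sum(rel / np.log2(np.arange(len(rel)) + 2)))

def ndcg_at_n(relevance, scores, N):
    """NDCG@N = DCG@N of the predicted ordering / DCG@N of the ideal one."""
    order = np.argsort(scores)[::-1]            # predicted ranking (descending)
    ideal = np.sort(np.asarray(relevance))[::-1]  # ideal: sort by relevance
    idcg = dcg_at_n(ideal, N)
    if idcg == 0.0:
        return 0.0
    return dcg_at_n(np.asarray(relevance)[order], N) / idcg
```

A perfect ranking yields NDCG@N of 1, and a ranking that pushes all relevant items below position N yields 0.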
IV-B Comparison of Variants of Top-N-Rank
We compare the performance of LTR models trained with the smoothed and regularized wDCG@N objective, using either the sigmoid or the ReLU function for smoothing, with or without the “top-N truncation”: (i) Top-N-Rank.ReLU: our proposed Top-N-Rank model trained to optimize wDCG@N smoothed using the ReLU function (Algorithm 2); (ii) non-Top-N.ReLU: the LTR model trained to optimize wDCG smoothed using ReLU; (iii) Top-N-Rank.sgm: our proposed top-N model trained to optimize wDCG@N smoothed using the sigmoid function (Algorithm 1); and (iv) non-Top-N.sgm: the LTR model trained to optimize wDCG smoothed using the sigmoid function.
In these experiments, we fixed the number of latent factors $d$ and set the number of items ranked to $N = 20$. The scaling constant for the sigmoid function and the initialization for the ReLU function were chosen as described in section III-B1. The regularization coefficient $\lambda$ was fixed, and the batch size was set to 10% of the users in the training data. All methods are run until either the maximum number of iterations is reached or the sum-of-squares distance between the parameters of two consecutive iterations falls below a threshold.
The results of our experiments are summarized in Table I. Our results clearly show that the Top-N-Rank models with the “top-N truncation” term in the objective function consistently and statistically significantly (based on a paired Student’s $t$-test) outperform their non-top-N counterparts. This confirms our intuition that Top-N-Rank models focus on correctly ordering the top-rated items, and hence are resistant to the cumulative effect of the (often unreliable) lower-rated items. The results in Table I also show that Top-N-Rank.ReLU substantially outperforms Top-N-Rank.sgm, and that the performance of Top-N-Rank.sgm is comparable to that of non-Top-N.ReLU. We conclude that the ReLU function, with an appropriate choice of initialization, is better able to accurately rank the top-rated items. The runtime of Top-N-Rank.ReLU is also significantly lower than that of Top-N-Rank.sgm (results not shown), demonstrating the efficiency of Top-N-Rank.ReLU.
Table II (excerpt): Amazon Video Games, Top-N-Rank.ReLU — 0.8135, 0.7964, 0.8043, 0.8325, 0.8383.
IV-C Top-N-Rank.ReLU Compared with the State-of-the-Art List-wise LTR Models
We compare Top-N-Rank.ReLU with several state-of-the-art list-wise LTR CF approaches: (i) MF-ADG: an algorithm that optimizes the averaged discounted gain (ADG), obtained by averaging the DCG across all users . Like our work, MF-ADG is designed to work with implicit feedback data sets; the sampling parameter is fixed at 100. (ii) CLiMF : a MF model designed to work with binarized implicit feedback data sets, which optimizes the mean reciprocal rank (MRR); instead of directly optimizing MRR, CLiMF learns the latent factors by maximizing a smoothed lower bound of MRR. (iii) xCLiMF: an extension of CLiMF that optimizes the expected reciprocal rank (ERR) and is designed to work with graded user ratings . (iv) ListRank: a MF model that optimizes the cross-entropy between the distributions of the observed and predicted ratings using the top-one probability, obtained using the softmax function . (v) ListPMF-PL: a list-wise probabilistic matrix factorization method that maximizes the log-posterior of the predicted rank order given the observed preference order, using the Plackett-Luce permutation probability .
The results of our experiments are summarized in Table II. Top-N-Rank.ReLU consistently outperforms the baseline models on both the Amazon Video Games and MovieLens data sets, regardless of the length of the recommended item lists. Student’s $t$-tests further confirm the significance of our results (details not shown). Although Top-N-Rank.ReLU maximizes wDCG@N over the top-20 items, the results show that the model offers better quality of recommendations across the top 1-20 items relative to the baselines. This may be explained in part by the following limitations of the individual methods: CLiMF and xCLiMF optimize a smoothed reciprocal rank (RR), which does not fully exploit the user ratings because of its emphasis on optimizing only a few of the relevant items for each user; MF-ADG maximizes an approximation of ADG on a small set of sampled data, which may limit the quality of the estimates; ListRank and ListPMF-PL are designed for rating data but assign the same weight to all items with the same rating. Perhaps more importantly, all of the methods except Top-N-Rank.ReLU attempt to optimize the ranking over the entire set of user-rated items, as opposed to only the N top-ranked items, which makes them susceptible to noise in the ratings of low-ranked items.
V Summary and Discussion
In this paper, we proposed Top-N-Rank, a novel family of list-wise Learning-to-Rank models for reliably recommending the N top-ranked items. The proposed models optimize wDCG@N, a variant of the widely used discounted cumulative gain (DCG) objective function, which differs from DCG in two important aspects: (1) it limits the evaluation of DCG to the top N items in the ranked lists, thereby eliminating the impact of low-ranked items on the learned ranking function; and (2) it incorporates weights that allow the model to learn from multiple kinds of implicit user feedback with differing levels of reliability or trustworthiness. Because wDCG@N is non-smooth, we considered two smooth approximations of wDCG@N, using the traditional sigmoid function and the rectified linear unit (ReLU). We proposed a family of learning-to-rank algorithms (Top-N-Rank) that work with any smooth objective function (e.g., smooth approximations of wDCG@N), and we designed Top-N-Rank.ReLU, a more efficient version of Top-N-Rank that exploits the properties of the ReLU function to reduce the computational complexity of Top-N-Rank from quadratic to linear in the average number of items rated by users. The results of our experiments using two widely used benchmarks, namely the Amazon Video Games data set and the MovieLens data set, demonstrate that: (i) the “top-N truncation” of the objective function substantially improves the ranking quality; (ii) using ReLU to smooth the wDCG@N objective function yields significant improvements in both ranking quality and runtime compared to using the sigmoid function; and (iii) Top-N-Rank.ReLU substantially outperforms the state-of-the-art list-wise ranking CF methods (MF-ADG, CLiMF, xCLiMF, ListRank, and ListPMF-PL) in terms of ranking quality.
Some promising directions for further research include: (i) fusing the proposed top-N truncation and ReLU smoothing function with other list-wise LTR objectives (e.g., MAP, AUC, or MRR); (ii) investigating the complex interaction structure of user-item pairs with the help of deep neural nets; and (iii) extending the proposed model to tensor factorization or factorization machines to incorporate multiple types of features.
Dr. Jinlong Hu and Dr. Shoubin Dong were supported in part by the Scientific Research Joint Funds of Ministry of Education of China and China Mobile [No. MCM20150512], and the Natural Science Foundation of Guangdong Province of China [No. 2018A030313309]; Junjie Liang was supported in part by a research assistantship funded by the National Science Foundation through the grant [No. CCF 1518732] to Dr. Vasant G. Honavar. Dr. Vasant Honavar was supported in part by the Edward Frymoyer Endowed Chair in Information Sciences and Technology at Pennsylvania State University, and in part by the Sudha Murty Distinguished Visiting Chair in Neurocomputing and Data Science at the Indian Institute of Science.
-  A. Gunawardana and G. Shani, “A survey of accuracy evaluation metrics of recommendation tasks,” Journal of Machine Learning Research, vol. 10, pp. 2935–2962, 2009.
-  T.-Y. Liu, “Learning to rank for information retrieval,” Foundations and Trends® in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
-  Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, N. Oliver, and A. Hanjalic, “CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering,” in Proceedings of the sixth ACM conference on Recommender systems. ACM, 2012, pp. 139–146.
-  X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 173–182.
-  S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian personalized ranking from implicit feedback,” in Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 2009, pp. 452–461.
-  S. Huang, S. Wang, T.-Y. Liu, J. Ma, Z. Chen, and J. Veijalainen, “Listwise collaborative filtering,” in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2015, pp. 343–352.
-  Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li, “Learning to rank: from pairwise approach to listwise approach,” in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 129–136.
-  F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li, “Listwise approach to learning to rank: theory and algorithm,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 1192–1199.
-  Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver, “TFMAP: optimizing MAP for top-n context-aware recommendation,” in Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2012, pp. 155–164.
-  D. Lim, J. McAuley, and G. Lanckriet, “Top-n recommendation with missing implicit feedback,” in Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 2015, pp. 309–312.
-  H. Steck, “Gaussian ranking by matrix factorization,” in Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 2015, pp. 115–122.
-  J. Liu, C. Wu, Y. Xiong, and W. Liu, “List-wise probabilistic matrix factorization for recommendation,” Information Sciences, vol. 278, pp. 434–447, 2014.
-  Y. Shi, M. Larson, and A. Hanjalic, “List-wise learning to rank with matrix factorization for collaborative filtering,” in Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010, pp. 269–272.
-  S. Zhang, L. Yao, and A. Sun, “Deep learning based recommender system: A survey and new perspectives,” arXiv preprint arXiv:1707.07435, 2017.
-  N. Ifada and R. Nayak, “Do-Rank: DCG optimization for learning-to-rank in tag-based item recommendation systems,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2015, pp. 510–521.
-  Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, and A. Hanjalic, “xCLiMF: optimizing expected reciprocal rank for data with multiple levels of relevance,” in Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013, pp. 431–434.
-  F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, p. 19, 2016.
-  J. McAuley and A. Yang, “Addressing complex and subjective product-related queries with customer reviews,” in Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016, pp. 625–635.
-  A. G. Shoro and T. R. Soomro, “Big data analysis: Apache spark perspective,” Global Journal of Computer Science and Technology, vol. 15, no. 1, 2015.
-  C. C. Aggarwal, “Model-based collaborative filtering,” in Recommender Systems. Springer, 2016, pp. 71–138.
-  K. Järvelin and J. Kekäläinen, “Cumulated gain-based evaluation of ir techniques,” ACM Transactions on Information Systems (TOIS), vol. 20, no. 4, pp. 422–446, 2002.
-  Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008, pp. 426–434.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
-  K. Järvelin and J. Kekäläinen, “IR evaluation methods for retrieving highly relevant documents,” in Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2000, pp. 41–48.