Implementation of 'Top-n Recommendation on Graphs'
Recommender systems play an increasingly important role in online applications to help users find what they need or prefer. Collaborative filtering algorithms that generate predictions by analyzing the user-item rating matrix perform poorly when the matrix is sparse. To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and the intrinsic structural information of the user-item matrix. The proposed method constructs a new representation which preserves affinity and structure information in the user-item rating matrix and then performs the recommendation task. To capture proximity information about users and items, two graphs are constructed. The idea of manifold learning is used to constrain the new representation to be smooth on these graphs, so as to enforce user and item proximities. Our model is formulated as a convex optimization problem, for which we only need to solve the well-known Sylvester equation. We carry out extensive empirical evaluations on six benchmark datasets to show the effectiveness of this approach.
Recommender systems have become increasingly indispensable in many applications. Collaborative filtering (CF) based methods are a fundamental building block in many recommender systems. CF based recommender systems predict the ratings of items to be given by a user based on the ratings of the items previously rated by other users who are most similar to the target user.
CF based methods can be classified into memory-based methods and model-based methods [14, 15]. The former includes two popular families, user-oriented and item-oriented (e.g., ItemKNN), depending on whether the neighborhood information is derived from similar users or items. They first compute similarities between the active item and other items, or between the active user and other users. They then predict the unknown rating by combining the known ratings of the top neighbors. Owing to its simplicity, memory-based CF has been successfully applied in industry. However, it suffers from several problems, including data sparsity, cold start and data correlation, as users typically rate only a small portion of the available items, and they also tend to rate similar items closely. Therefore, the similarities between users or items cannot be accurately obtained with existing similarity measures such as cosine and Pearson correlation, which in turn compromises the recommendation accuracy.
To alleviate the problems of memory-based methods, many model-based methods have been proposed, which use observed ratings to learn a predictive model. Among them, matrix factorization (MF) based models, e.g., PureSVD and weighted regularized MF (WRMF), are very popular due to their capability of capturing the implicit relationships among items and their outstanding performance. Nevertheless, they introduce high computational complexity and also face the problem of uninterpretable recommendations. Because the rating matrix is sparse, the factorization of the user-item matrix may lead to inferior solutions. More recently, the sparse linear method (SLIM), which learns an aggregation coefficient matrix, has been proposed and shown to be effective. However, it only captures relations between items that have been co-purchased/co-rated by at least one user. Moreover, it only explores the linear relations between items. Another class of methods uses the Bayesian personalized ranking (BPR) criterion to measure the difference between the rankings of user-purchased items and the remaining items. For instance, BPRMF and BPRKNN have been demonstrated to be effective for implicit feedback datasets.
In this paper, we propose a novel Top-$N$ recommendation model based on graphs. This method not only takes into account the neighborhood information, which is encoded by our user graph and item graph, but also reveals hidden structure in the data by deploying graph regularization. In the real world, data often reside on low-dimensional manifolds embedded in a high-dimensional ambient space. In the Netflix Prize problem, for example, where the user-item matrix can be huge, there exist relationships between users (such as their age, hobbies, education, etc.) and between movies (such as their genre, release year, actors, origin country, etc.). Moreover, people sharing the same tastes for a class of movies are likely to rate them similarly. As a result, the rows and columns of the user-item matrix possess important structural information, which should be taken advantage of in actual applications.
To preserve local geometric and discriminating structures embedded in a high-dimensional space, numerous manifold learning methods, such as locally linear embedding (LLE) and locality preserving projection (LPP), have been proposed. In recent years, graph regularization based non-negative matrix factorization for data representation has been developed to remedy the failure to represent geometric structures in data. Inspired by this line of work, and to comprehensively consider the associations between users and items and the local manifold structures of the user-item data space, we propose to apply both user and item graph regularizations. Unlike many existing recommendation algorithms, we first establish a new representation which is infused with the above information. It turns out that this new representation is no longer sparse. We then perform the recommendation task with this novel representation.
Let $\mathcal{U} = \{u_1, \dots, u_m\}$ and $\mathcal{T} = \{t_1, \dots, t_n\}$ represent the sets of all users and all items, respectively. The whole set of user-item purchases/ratings is represented by the user-item matrix $X$ of size $m \times n$. Element $x_{ij}$ is 1 or a positive value if user $u_i$ has ever purchased/rated item $t_j$; otherwise it is marked as 0. The $i$-th row of $X$ denotes the purchase/rating history of user $u_i$ on all items. The $j$-th column of $X$ is the purchase/rating history of all users on item $t_j$. $\|X\|_F^2$ is the squared Frobenius norm of $X$. $\mathrm{Tr}(\cdot)$ stands for the trace operator. $I$ denotes the identity matrix. $\odot$ is the Hadamard product.
The user-item rating matrix is an overfit representation of user tastes and item descriptions. This leads to problems of synonymy, computational complexity, and potentially poorer results. Therefore, a more compact representation of user tastes and item descriptions is preferred. Graph regularization is effective in preserving local geometric and discriminating structures embedded in a high-dimensional space. It is based on the well-known manifold assumption: if two data points $x_i$ and $x_j$ are close in the geodesic distance on the data manifold, then their corresponding representations $z_i$ and $z_j$ are also close to each other. In practice, it is difficult to accurately estimate the global manifold structure of the data due to the insufficient number of samples and the high dimensionality of the ambient space. Therefore, many methods resort to local manifold structures. Much work on manifold learning has shown that local geometric structures of the data manifold can be effectively modeled through a nearest neighbor graph on sampled data points.
We adopt graph regularization to incorporate user and item proximities. In this paper, we construct two graphs: the user graph and the item graph. We assume that users having similar tastes for items form communities in the user graph, while items having similar appeals to users form communities in the item graph. Since "birds of a feather flock together", this assumption is plausible and indeed turns out to benefit recommender systems substantially in our experimental results. As an example for movie recommendation, the users are the vertices of a "social graph" whose edges represent relations induced by similar tastes.
More formally, we construct an undirected weighted graph $G_I = (V_I, E_I, W_I)$ on items, called the item graph. The vertex set $V_I$ corresponds to the items $\{t_1, \dots, t_n\}$, with each node $t_j$ corresponding to a data point $x_j$, which is the $j$-th column of $X$. The symmetric adjacency matrix $W_I \in \mathbb{R}^{n \times n}$ encodes the inter-item information: its entry $w_{ij}$ is the weight of the edge joining vertices $t_i$ and $t_j$ and represents how strong the relationship or similarity between items $t_i$ and $t_j$ is. $E_I$ is the edge set, with each edge $e_{ij}$ between nodes $t_i$ and $t_j$ associated with the weight $w_{ij}$. Writing $z_j$ for the $j$-th column of the new representation $Z$, the graph regularization on the item graph is formulated as

$$\frac{1}{2} \sum_{i,j=1}^{n} \|z_i - z_j\|_2^2 \, w_{ij} = \mathrm{Tr}(Z D_I Z^\top) - \mathrm{Tr}(Z W_I Z^\top) = \mathrm{Tr}(Z L_I Z^\top), \qquad (1)$$

where $D_I$ is a diagonal matrix with $(D_I)_{ii} = \sum_j w_{ij}$, and $L_I = D_I - W_I$ is the graph Laplacian. To preserve the structural information of the manifold, we want (1) to be as small as possible. Minimizing (1) imposes smoothness on the representation coefficients; i.e., if items $t_i$ and $t_j$ are similar (with a relatively large $w_{ij}$), their low-dimensional representations $z_i$ and $z_j$ are also close to each other. Therefore, optimizing (1) is an attempt to enforce the manifold assumption.
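The identity behind (1), namely that half the weighted sum of squared differences between item columns equals $\mathrm{Tr}(Z L Z^\top)$ with $L = D - W$, can be checked numerically. The following is a minimal sketch on random toy data; the function names and the toy matrices are illustrative assumptions, not part of the original method.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W for a symmetric adjacency matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def pairwise_smoothness(Z, W):
    """Compute 0.5 * sum_{i,j} W[i, j] * ||Z[:, i] - Z[:, j]||^2 directly."""
    n = W.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = Z[:, i] - Z[:, j]
            total += W[i, j] * (diff @ diff)
    return 0.5 * total

# Hypothetical toy data: 3 users, 5 items, random symmetric item similarities.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
W_toy = (A + A.T) / 2
np.fill_diagonal(W_toy, 0.0)
Z_toy = rng.random((3, 5))

L_toy = graph_laplacian(W_toy)
trace_form = np.trace(Z_toy @ L_toy @ Z_toy.T)
pair_form = pairwise_smoothness(Z_toy, W_toy)
```

Both quantities agree up to floating-point error, which is why the regularizer can be written compactly as a trace instead of a double sum.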
The crucial part of graph regularization is the definition of the adjacency matrix $W_I$. There exist a number of different similarity metrics in the literature, e.g., cosine similarity, the Pearson correlation coefficient, and adjusted cosine similarity. For simplicity, in our experiments we use cosine similarity for explicit rating datasets and the Jaccard coefficient for implicit feedback datasets. For binary variables, the Jaccard coefficient is a more appropriate similarity metric than cosine because it is insensitive to the amplitudes of ratings. It measures the fraction of users who have interacted with both items over the number of users who have interacted with either of them. Formally, under the cosine definition, the similarity $w_{ij}$ between two items $t_i$ and $t_j$ is defined as $w_{ij} = \frac{x_i \cdot x_j}{\|x_i\|_2 \, \|x_j\|_2}$, where "$\cdot$" denotes the vector dot-product operation. For the Jaccard coefficient, $w_{ij} = \frac{|x_i \cap x_j|}{|x_i \cup x_j|}$, where $\cap$ and $\cup$ represent intersection and union operations, respectively. Likewise, by defining the user graph $G_U$ whose vertex set corresponds to the users $\{u_1, \dots, u_m\}$, we get a corresponding expression $\mathrm{Tr}(Z^\top L_U Z)$. Here $L_U$ denotes the Laplacian of $G_U$, which is similarly obtained from the data points corresponding to the users, that is, the rows of $X$.
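The two similarity measures described above can be sketched for a dense matrix as follows. This is an illustrative sketch, not the authors' released code: the function names are hypothetical, and zeroing the diagonal is a choice made here (it does not change the Laplacian, since self-loops cancel in $D - W$).

```python
import numpy as np

def cosine_item_similarity(X):
    """Cosine similarity between the columns (items) of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0            # items with no ratings get similarity 0
    Xn = X / norms
    W = Xn.T @ Xn
    np.fill_diagonal(W, 0.0)           # assumption: no self-loops
    return W

def jaccard_item_similarity(X):
    """Jaccard coefficient between the columns (items) of a binarized X."""
    B = (np.asarray(X) > 0).astype(float)
    inter = B.T @ B                                  # users who touched both items
    counts = B.sum(axis=0)
    union = counts[:, None] + counts[None, :] - inter
    W = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(W, 0.0)
    return W

# Hypothetical toy data: 3 users x 3 items, implicit feedback.
X_toy = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [1, 1, 0]])
W_cos = cosine_item_similarity(X_toy)
W_jac = jaccard_item_similarity(X_toy)
# The user graph is obtained the same way from the rows of X.
W_user = jaccard_item_similarity(X_toy.T)
```

In the toy matrix, items 0 and 1 share two of the three users who interacted with either, so their Jaccard weight is 2/3, while items 1 and 2 share no user and get weight 0.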
By exploiting both user and item graphs, our proposed model can be written as

$$\min_{Z} \; \|X - Z\|_F^2 + \alpha \, \mathrm{Tr}(Z L_I Z^\top) + \beta \, \mathrm{Tr}(Z^\top L_U Z), \qquad (2)$$

where $Z \in \mathbb{R}^{m \times n}$ is the new representation and $L_I$ and $L_U$ are the Laplacians of the item graph and the user graph, respectively. The first term of (2) penalizes large deviations of the predictions from the given ratings. The last two terms measure the smoothness of the predicted ratings on the graph structures and encourage the ratings of nodes with affinity to be similar. They can alleviate the data sparsity issue to some extent: when item neighborhood information is not available, user neighborhood information might exist, and vice versa. The parameters $\alpha$ and $\beta$ adjust the balance between the reconstruction error and the graph regularizations.
By setting the derivative of the objective function of (2) with respect to $Z$ to zero, we have

$$(I + \beta L_U) Z + \alpha Z L_I = X. \qquad (3)$$

Equation (3), in which $L_U$ and $L_I$ are the user and item graph Laplacians, is the well-known Sylvester equation, which costs $O(m^3)$ or $O(n^3)$ with a general solver. But in our situation, $X$ is usually extremely sparse, and $L_U$ and $L_I$ can also be sparse, especially for large $m$ and $n$, so the cost can be $O(m^2)$ or $O(n^2)$, or sometimes even as low as $O(m)$ or $O(n)$. Many packages or programs are available to solve (3).
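As one possible illustration, assuming the stationarity condition takes the form $(I + \beta L_U) Z + \alpha Z L_I = X$, a dense general-purpose solver such as SciPy's `solve_sylvester` (which solves $AZ + ZB = Q$) can be used on small problems. The function names, toy data and parameter values below are assumptions for the sketch; it does not exploit sparsity and is not the authors' implementation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def laplacian(W):
    """Graph Laplacian L = D - W of a symmetric adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

def reconstruct(X, W_user, W_item, alpha, beta):
    """Solve (I + beta * L_U) Z + alpha * Z L_I = X for Z (dense solver)."""
    m, n = X.shape
    A = np.eye(m) + beta * laplacian(W_user)   # m x m, eigenvalues >= 1
    B = alpha * laplacian(W_item)              # n x n, eigenvalues >= 0
    return solve_sylvester(A, B, X.astype(float))

def random_similarity(size, rng):
    """Hypothetical symmetric similarity matrix with zero diagonal."""
    M = rng.random((size, size))
    W = (M + M.T) / 2
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(1)
X_toy = np.array([[5, 0, 3, 0],
                  [0, 4, 0, 1],
                  [2, 0, 0, 5]])
W_u = random_similarity(3, rng)
W_i = random_similarity(4, rng)
alpha, beta = 0.1, 0.05                        # illustrative values only

Z_toy = reconstruct(X_toy, W_u, W_i, alpha, beta)
residual = (np.eye(3) + beta * laplacian(W_u)) @ Z_toy + alpha * Z_toy @ laplacian(W_i) - X_toy
Z_no_reg = reconstruct(X_toy, W_u, W_i, 0.0, 0.0)
```

Because the eigenvalues of $I + \beta L_U$ are at least 1 and those of $-\alpha L_I$ are at most 0, the two spectra never overlap, so the equation has a unique solution. With $\alpha = \beta = 0$ the solution reduces to $Z = X$.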
To use the reconstructed matrix $Z$ to make recommendations for user $u_i$, we just sort $u_i$'s non-purchased/non-rated items based on their scores in non-increasing order and recommend the top $N$ items.
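The ranking step can be sketched as below, assuming a reconstructed score matrix `Z` with the same shape as the user-item matrix `X`. The function name and the toy values are hypothetical.

```python
import numpy as np

def recommend_top_n(X, Z, user, n_items=10):
    """Return indices of the top-N unobserved items for one user, best first."""
    observed = np.asarray(X[user]) > 0
    scores = np.asarray(Z[user], dtype=float).copy()
    scores[observed] = -np.inf                 # never recommend known items
    n_candidates = int((~observed).sum())
    k = min(n_items, n_candidates)
    order = np.argsort(-scores, kind="stable") # non-increasing score order
    return order[:k]

# Hypothetical toy data: 2 users x 4 items.
X_toy = np.array([[5, 0, 0, 3],
                  [0, 4, 0, 0]])
Z_toy = np.array([[4.8, 2.0, 3.5, 2.9],
                  [1.0, 3.9, 0.5, 2.2]])
top_user0 = recommend_top_n(X_toy, Z_toy, user=0, n_items=2)
top_user1 = recommend_top_n(X_toy, Z_toy, user=1, n_items=2)
```

For user 0 only items 1 and 2 are unobserved, so the list is item 2 (score 3.5) followed by item 1 (score 2.0).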
To the best of our knowledge, there are very few studies on the graph Laplacian in the context of the recommendation task. Graph regularized weighted nonnegative matrix factorization (GWNMF) was proposed to incorporate the neighborhood information in an MF approach. It solves the following problem

$$\min_{U \ge 0, \, V \ge 0} \; \|Y \odot (X - U V^\top)\|_F^2 + \lambda \, \mathrm{Tr}(U^\top L_U U) + \mu \, \mathrm{Tr}(V^\top L_I V), \qquad (4)$$

where $Y$ is an indicator matrix. $U$ and $V$ are in latent spaces, whose dimensionality is usually specified with an additional parameter. The latent factors are generally not obvious and might not be interpretable or intuitively understandable. Here (4) has to learn both user and item representations in the latent spaces. In our approach, we only need to learn one representation, and thus the learning process is simplified. On the other hand, $U$ and $V$ are supposed to be of low dimensionality, so useful information can be lost during the low-rank approximation of $X$ from $U$ and $V$. The encoding of the graph Laplacian on $U$ and $V$ might no longer be accurate. By contrast, our method can better preserve the information in $X$, so it can potentially give better recommendations than GWNMF. Moreover, it is well known that the MF approach has several drawbacks, e.g., a low convergence rate and many local optima of $U$ and $V$ due to the non-convexity of (4). In contrast, our model (2) is strongly convex, admitting a unique, globally optimal solution.
In this table, the "#users", "#items" and "#trns" columns represent the number of users, number of items and number of transactions, respectively, in each dataset. The "rsize" and "csize" columns show the average number of ratings of each user and of each item, respectively, in each dataset. The "density" column shows the density of each dataset (i.e., density = #trns / (#users × #items)). The "ratings" column is the rating range of each dataset. The ratings in FilmTrust are real values with step 0.5, while those in the other datasets are integers.
Table 1 shows the characteristics of the datasets. Delicious, lastfm and BX have only implicit feedback. In particular, Delicious was derived from bookmarking and tagging information (http://www.delicious.com), in which each URL was bookmarked by at least 3 users. Lastfm represents music artist listening information (http://www.last.fm), in which each music artist was listened to by at least 10 users and each user listened to at least 5 artists. BX is derived from the Book-Crossing dataset (http://www.informatik.uni-freiburg.de/ cziegler/BX/) such that only implicit interactions were contained and each book was read by at least 10 users.
FilmTrust, Netflix and Yahoo contain multi-value ratings. Specifically, FilmTrust is a dataset crawled from the entire FilmTrust website (http://www.librec.net/datasets.html). The Netflix dataset is derived from the Netflix Prize dataset (http://www.netflixprize.com/), and each user rated at least 10 movies. The Yahoo dataset is a subset obtained from Yahoo! Movies user ratings (http://webscope.sandbox.yahoo.com/catalog.php?datatype=r). In this dataset, each user rated at least 5 movies and each movie was rated by at least 3 users.
For fair comparison, we follow the dataset preparation approach used by SLIM and adopt 5-fold cross validation. For each fold, a dataset is split into training and test sets by randomly selecting one non-zero entry for each user and putting it in the test set, while using the rest of the data for training. Then a size-$N$ ranked list of items is produced for each user. We subsequently evaluate the method by comparing the ranked list of recommended items with the item in the test set. In the results presented in this paper, $N$ is equal to 10 by default.
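One fold of this leave-one-out protocol can be sketched as follows. The function name, the `-1` marker for users without any rating, and the toy matrix are assumptions made for illustration; repeating the call with different random seeds would give the remaining folds.

```python
import numpy as np

def leave_one_out_split(X, rng):
    """Hold out one randomly chosen non-zero entry per user.

    Returns the training matrix and, per user, the held-out item index
    (-1 for users with no non-zero entry, an assumption of this sketch).
    """
    train = X.copy()
    test_items = np.full(X.shape[0], -1, dtype=int)
    for u in range(X.shape[0]):
        rated = np.flatnonzero(X[u])
        if rated.size == 0:
            continue
        held_out = rng.choice(rated)
        train[u, held_out] = 0
        test_items[u] = held_out
    return train, test_items

# Hypothetical toy data: 3 users x 4 items.
X_toy = np.array([[5, 0, 3, 0],
                  [0, 4, 0, 1],
                  [2, 2, 0, 0]])
train_toy, test_toy = leave_one_out_split(X_toy, np.random.default_rng(0))
```

Every user contributes exactly one test item, so the training matrix has one fewer non-zero entry per user than the original.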
For Top-$N$ recommendation, the most direct and meaningful metrics are the hit-rate (HR) and the average reciprocal hit-rank (ARHR), since users only care whether a short recommendation list contains the items of interest, rather than about a very long recommendation list. HR is defined as $\mathrm{HR} = \frac{\#\text{hits}}{\#\text{users}}$, where #hits is the number of users whose item in the test set is contained (i.e., hit) in the size-$N$ recommendation list, and #users is the total number of users. ARHR is defined as $\mathrm{ARHR} = \frac{1}{\#\text{users}} \sum_{i=1}^{\#\text{hits}} \frac{1}{p_i}$, where $p_i$ is the position of the item in the ranked Top-$N$ list for the $i$-th hit. In this metric, hits that occur earlier in the ranked list are weighted higher than those that occur later, and thus ARHR indicates how strongly an item is recommended.
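Both metrics can be computed in a few lines. The sketch below assumes one held-out test item per user and one ranked list per user; the function name and the toy lists are hypothetical.

```python
def hit_rate_and_arhr(ranked_lists, test_items):
    """Compute HR and ARHR given per-user ranked lists and held-out items."""
    hits = 0
    reciprocal_sum = 0.0
    for ranked, held_out in zip(ranked_lists, test_items):
        ranked = list(ranked)
        if held_out in ranked:
            hits += 1
            # Positions are 1-based: a hit at the top of the list counts 1.
            reciprocal_sum += 1.0 / (ranked.index(held_out) + 1)
    n_users = len(test_items)
    return hits / n_users, reciprocal_sum / n_users

# Hypothetical toy data: three users, lists of size 3.
ranked_toy = [[3, 1, 2], [0, 2, 4], [5, 6, 7]]
held_out_toy = [1, 0, 9]
hr_toy, arhr_toy = hit_rate_and_arhr(ranked_toy, held_out_toy)
```

Here two of three users are hit, at positions 2 and 1, giving HR = 2/3 and ARHR = (1/2 + 1)/3 = 0.5.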
We use 5-fold cross-validation to choose parameters for all competing methods and report their best performance in Table 2. It can be seen that the HR improvements achieved by our method against the next best performing scheme (i.e., SLIM) are quite substantial on the lastfm, Yahoo, BX and FilmTrust datasets. (Code is available at https://github.com/sckangz/CIKM16.) For the Delicious and Netflix datasets, our performance is close to the best performance of the other methods. In most cases, there is not much difference among the other state-of-the-art methods in terms of HR. Figure 1 shows the performance in HR of various methods for different values of $N$ (i.e., 5, 10, 15, 20 and 25) on all six datasets. Our method works the best in most cases.
Our model involves two trade-off parameters and , which dictate how strongly item and user neighborhoods and structure information contribute to the objective and performance. In Figure 2, we depict the effects of different and values on HR and ARHR for dataset FilmTrust and Yahoo. The search for ranges from 1e-6 to 1e-2 with points from , the search points for are from . As can be seen from all figures, our algorithm performs well over a wide range of and values. HR and ARHR share the same trend with varying and . Specifically, when is small, HR and ARHR both increase with . After a certain point, they begin to decrease. For FilmTrust, the performance with is very stable with respect to . This suggests that user-user similarity dominates the FilmTrust dataset.
To show how our method reconstructs the user-item matrix, we compare it with the method of next best performance, SLIM, on FilmTrust. The density of FilmTrust is 1.14% and the mean of its non-zero elements is 2.998. The reconstructed matrix from SLIM has a density of 83.21%. Of the 1.14% non-zero entries in $X$, SLIM recovers 99.69%, with a mean value of 1.686. In contrast, the reconstructed matrix from our proposed algorithm has a density of 91.7%. Of the 1.14% non-zero entries in $X$, our method recovers all of them, with a mean of 2.975. These facts suggest that our method recovers $X$ better than SLIM. In other words, SLIM loses too much information. This appears to explain the superior performance of our method.
In fact, the above analysis is equivalent to the two widely used prediction accuracy metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Since our method can recover the original ratings much better than SLIM, our algorithm gives lower MAE and RMSE. This conclusion is consistent with our HR and ARHR evaluation.
As we discussed previously, similarity is an important ingredient of graph construction. In many recommendation algorithms, the similarity computation is crucial to the recommendation quality. To demonstrate the importance of the similarity metric, we use the cosine measure rather than the Jaccard coefficient to measure the similarity in the binary datasets. We compare the results in Table 3. As demonstrated, for the lastfm dataset, HR and ARHR increase after we adopt the cosine similarity. However, for the Delicious and BX datasets, the Jaccard coefficient works better. Therefore, the final results can differ considerably for certain datasets depending on the similarity measure. We expect that the experimental results in Table 2 can be further improved if one performs a more careful analysis of the user and item similarity graphs. For example, it has been reported that normalizing the similarity scores can improve the performance. Also, a number of new similarity metrics have recently been proposed, which may also be exploited.
Table 4: HR on FilmTrust and Yahoo with the user graph, the item graph, and both graphs (columns: User Graph, Item Graph, User-Item Graph).
Another important parameter is the neighborhood size $k$, which cannot be known a priori. For some small datasets, a small $k$ may not include all useful neighbors and would infer incomplete relationships. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. We therefore simply use the fully connected graph in our experiments. To justify this choice, we test the effects of the neighborhood size $k$ on the FilmTrust data. As can be seen from Figure 3, the neighborhood size indeed influences the performance of our proposed recommendation method. Specifically, the performance keeps increasing with $k$ while $k$ is small compared to the size of the dataset; as $k$ becomes larger, the performance stays almost the same as the final accuracy reported in Table 2. This confirms that a small neighborhood size cannot capture all the similarity information.
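The source does not spell out how a graph with a given neighborhood size is built from the similarity matrix. One common construction, shown here purely as an assumption, keeps for each node its $k$ most similar neighbors and symmetrizes the result by keeping an edge if either endpoint selects it. The function name and toy matrix are hypothetical.

```python
import numpy as np

def knn_sparsify(W, k):
    """Keep each node's k strongest edges; symmetrize by union (an assumption)."""
    n = W.shape[0]
    S = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(S, -np.inf)              # a node is not its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar nodes per row
    keep = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), idx.shape[1])
    keep[rows, idx.ravel()] = True
    keep |= keep.T                            # edge kept if either side picks it
    np.fill_diagonal(keep, False)
    return np.where(keep, W, 0.0)

# Hypothetical symmetric similarity matrix for 4 items.
W_toy = np.array([[0.0, 0.9, 0.1, 0.2],
                  [0.9, 0.0, 0.3, 0.4],
                  [0.1, 0.3, 0.0, 0.8],
                  [0.2, 0.4, 0.8, 0.0]])
W_k1 = knn_sparsify(W_toy, k=1)
W_full = knn_sparsify(W_toy, k=3)
```

With $k = 1$ only the two strongest pairs survive, while $k = n - 1$ returns the fully connected graph used in the experiments.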
While the overall improvements are impressive, it is also interesting to analyze the impact of the user-user and item-item similarity graphs at a finer granularity. We use the FilmTrust and Yahoo datasets as examples to show the effects of the user and item graphs. Table 4 summarizes the HR values obtained with the user graph, the item graph, and both the user and item graphs. It demonstrates that we obtain the best performance when we combine the user and item graphs. Thus, neighborhood information of users and items can alleviate the problem of data sparsity by taking advantage of structural information more extensively, which in turn benefits the recommendation accuracy.
In this paper, we address the demand for high-quality recommendation on both implicit and explicit feedback datasets. We reconstruct the user-item matrix by fully exploiting the similarity information between users and items concurrently. Moreover, the reconstructed data matrix also respects the manifold structure of the user-item matrix. We conduct a comprehensive set of experiments and compare our method with other state-of-the-art Top-$N$ recommendation algorithms. The results demonstrate that the proposed algorithm works effectively. Due to the simplicity of our model, there is much room for improvement. For instance, our model can be easily extended to include side information (e.g., user demographic information, item genre, social trust network) by utilizing the graph representation. In some cases, external information is more informative than the neighborhood information.
This work is supported by the U.S. National Science Foundation under Grant IIS 1218712, National Natural Science Foundation of China under grant 11241005, and Shanxi Scholarship Council of China 2015-093. Q. Cheng is the corresponding author.