Location-based networks such as Foursquare and Yelp are becoming increasingly popular. These platforms offer many kinds of points of interest (POIs), such as hotels, restaurants, and markets, which makes our lives much easier than before. Meanwhile, the question of “where to go” starts to bother people, since finding a desired place among so many POIs is time-consuming. POI recommendation addresses this problem by helping users filter out uninteresting POIs and saving their decision-making time [28, 8].
Among the existing research in the POI recommendation community, Matrix Factorization (MF) techniques have drawn a lot of attention [6, 5, 26], and they have proven attractive in many recommendation applications [12, 4, 11, 3]. However, existing MF approaches train the recommendation model in a centralized way. That is, the recommender system, or more precisely the organization that built it, first needs to collect the action data of all users on all POIs, and then trains an MF model. The shortcomings of such centralized MF methods are mainly threefold. (1) High resource cost. Centralized MF not only needs large storage for the collected user action data on POIs, but also requires huge computing resources to train the model, especially when datasets are large. (2) Low model training efficiency. A centralized MF model is trained on a single machine or a cluster of machines. Thus, the model training efficiency is restricted by the number of machines available. (3) Shallow user privacy protection. In recommender systems, user privacy is an important concern, especially in POI scenarios. Most people do not want to release their activities on location-based networks, not only to other people but also to the platforms, for privacy reasons. However, centralized MF trains the model on the basis of having all users’ activity data. In fact, the safest way is for users’ data to be kept in their own hands, without sending them to anyone or any organization (platform).
To solve the above problems caused by centralized training of MF, we propose a Decentralized MF (DMF) framework. That is, instead of training user and item latent factors in a centralized way by obtaining all the ratings, we train the MF model on each user’s end, e.g., the user’s cell phone or Pad. Figure 1 shows the difference between centralized and decentralized learning. DMF treats each user’s device as an autonomous learner (individual computation unit). The essence of MF is that user and item latent factors are learnt collaboratively, and the same should hold for DMF. Three key challenges exist when making users collaborate in DMF while preserving user privacy. The first challenge is which users to communicate with. We answer this question by analyzing the data in POI recommendation scenarios, and propose nearby user communication on a user adjacent graph, which is built from users’ geographical information. Along with this, the second challenge naturally appears: how far users should communicate with their neighbors. To address this, we present a random walk method to enhance local user communication. That is, we use a random walk technique to intelligently select users’ high-order neighbors instead of only their direct neighbors. The third and most serious challenge is what information users should communicate to each other without leaking their private data. To solve this challenge, we decompose item preference into global (common) and local (personal) latent factors, and allow users to communicate with each other by sending the gradients of the global item latent factors.
Our proposed DMF framework addresses the shortcomings of centralized MF. (1) Each user only needs to store his own latent factor and the items’ latent factors. The computation is also cheap: each user only needs to update the corresponding user and item latent factors when he or one of his neighbors rates an item. (2) Since each user updates the DMF model on his own side, DMF can be regarded as a distributed learning system with #users as #machines, which makes the model efficient to train. (3) The ratings of each user on items are kept in the user’s own hands, which prevents the user’s private data from being disclosed.
We summarize our main contributions as follows:
We propose a novel DMF framework for POI recommendation, which is scalable and able to preserve user privacy. To the best of our knowledge, it is the first such attempt in the literature.
We propose an efficient way to train DMF. Specifically, when a user rates an item on his side, the user and item gradients are first calculated. We then present a random walk based technique for users to send the item gradient to their neighbors. Finally, the corresponding user/item latent factors are updated using stochastic gradient descent.
Experiments conducted on two real-world datasets demonstrate that DMF can achieve even better performance than classic and state-of-the-art latent factor models in terms of precision and recall. Parameter analysis also shows the effectiveness of our proposed random walk based optimization method.
In this section, we review the background that forms the basis of our work, i.e., (1) Matrix Factorization (MF) in POI recommendation and (2) decentralized learning.
Matrix Factorization in POI Recommendation
where and denote the latent factors of user and item , respectively, and denotes the known rating of user on item . We will describe other parameters in detail later.
MF and its variants have been extensively applied to POI recommendation due to their promising performance and scalability [6, 5, 26]. However, these methods are all trained in a centralized manner. Centralized MF training requires expensive resources, trains inefficiently, and offers only shallow protection of user privacy.
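As a point of reference, the centralized setting described above can be sketched in a few lines. This is a minimal sketch, not a transcription of Equation 1: it assumes a least-squares loss with L2 regularization, and the names `train_mf`, `predict`, `squared_error`, `k`, `lr`, and `reg` are hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Centralized MF trained with SGD on (user, item, rating) triples.

    All ratings are gathered in one place before training, which is the
    centralized setting the text contrasts with DMF.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for i, j, r in data:
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(k):
                u_f, v_f = U[i][f], V[j][f]
                # Gradient step on the regularized squared error.
                U[i][f] += lr * (err * v_f - reg * u_f)
                V[j][f] += lr * (err * u_f - reg * v_f)
    return U, V


def predict(U, V, i, j):
    """Predicted rating: inner product of user and item factors."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def squared_error(U, V, ratings):
    """Sum of squared errors over the given ratings."""
    return sum((r - predict(U, V, i, j)) ** 2 for i, j, r in ratings)
```

Note that `train_mf` needs the full list of ratings as input, so whoever runs it sees every user's activity; this is the property DMF is designed to avoid.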
Decentralized learning has emerged to solve the above problems of centralized learning [18, 24]. Recently, it has been applied in many scenarios such as multiarmed bandits, network distance prediction, hash function learning, and deep networks [22, 16].
The most similar existing works to ours are on decentralized matrix completion [15, 29]. However, our work differs from them in two major ways. (1) They either allow each learner (user) to communicate with those who have rated the same items or to communicate with all the learners, and thus have low accuracy or high communication cost. In practice, users tend to collaborate with the users they are close to. To capture this, in this paper, we propose a random walk approach for users on the adjacent graph to communicate collaboratively with each other. (2) They allow direct exchange of item preference among learners, which may cause information leakage. For example, such an exchange is easy to exploit using the idea of collaborative filtering, i.e., similar items tend to be preferred by similar users. Assume user is a malicious user who has his own latent factor of item (). He also gets the latent factor of item from user (). If likes item , and and are similar, then will know likes as well. In contrast, in this paper, we propose a gradient exchange scheme to limit the possibility of privacy leakage.
The Proposed Model
In this section, we first formally describe the Decentralized Matrix Factorization (DMF) problem. We then discuss a nearby user communication scheme for users to collaborate with each other. Next, we propose an enhanced version by applying random walk theory. Then, we present a privacy preserving nearby user collaboration algorithm to optimize DMF. We analyze the model complexity in the end.
Formally, let and be the user and item (POI) set with and denoting user size and item size, respectively. Let be an interaction between user and item , and be the rating of user on item . Without loss of generality, we assume in this paper. Let be the training dataset, where all the user-item ratings in it are known.
For the traditional centralized MF, it first collects all the , and then learns and using MF technique (Equation 1). Here, and denote the user and item latent factor matrices, with their column vectors and being the -dimensional latent factors for user and item , respectively.
For DMF, to guarantee the privacy of each user, we need to keep all the known ratings and latent factors on each user’s end during the whole training procedure. To do this, we use to denote the user latent factor matrix, with each column vector denoting the -dimensional latent factors for user . We also use to denote the item latent factor tensor, with denoting the item latent factor matrix for user , and further with denoting the -dimensional latent factors for item of user . Thus, each user only needs to store ’s own -dimensional latent factor , and ’s item latent factor matrix .
Besides, users need to collaboratively learn their stored factors, i.e., and , in DMF scenario. For centralized MF, all the users share the same item latent factor matrix, i.e., . For DMF, each user stores his and , and they should be trained collaboratively with other users—which we call ‘neighbors’. Suppose we have a user adjacent graph , we use to denote user adjacency matrix, where each element denotes the degree of relationship between user and . Of course, user and have no relationship if . We use to denote the order neighbors of on , as the neighbor size, and . Obviously, denotes the direct neighbors of . Besides, to save communication cost, we use to denote the maximum number of direct neighbors of each user.
DMF aims to learn and for each user, and the model learning procedure is performed on one’s own side, e.g., cell phone and Pad.
Nearby User Communication
The essence of MF is that the user and item latent factors are learnt collaboratively, and the same should hold for DMF. Thus, which users to communicate with under the DMF framework becomes the first challenging question. We answer this question by first analyzing the data in POI recommendation scenarios.
Observation. Figure 2 shows the user-POI check-in distributions on two real datasets, i.e., Foursquare and Alipay. Both datasets contain user-item-check-in-location records, and we divide locations into different cities. We randomly select the user-item check-in records in five cities from both datasets, and plot their check-in distributions in Figure 2, where each dot denotes a user-item check-in record. From it, we have the following observation: in POI scenarios, most users are only active in a certain city, which we call “location aggregation”. Only a few users are active in multiple cities, and their number is negligible.
User Adjacent Graph. We represent the affinities among users with a user adjacency graph. It can be built using whatever information is available, e.g., rating similarity and user social relationships. However, in DMF for POI recommendation scenarios, users’ ratings are kept in their own hands, and social relationships are not always available. Thus, we focus on using user geographical information to build the user adjacent graph, similar to existing research [28, 7, 5]. Specifically, suppose is the distance between user and , the relationship degree between and is defined as
where , is the indicator function that equals 1 if and are in the same city and 0 otherwise, and is a mapping function from distance to relationship degree: the smaller the distance between and is, the bigger their relationship degree is. Existing research has proposed different such mapping functions. In practice, it is extremely expensive to maintain the communications for super-users who have a huge number of neighbors. Thus, we set to be the maximum number of neighbors each user can have. With the nearby user communication scheme, the first question is answered.
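The graph construction just described (same-city indicator times a decreasing function of distance, with a cap on neighbors per user) can be sketched as follows. This is an illustrative sketch under assumptions: planar coordinates with Euclidean distance, and a hypothetical mapping `f(d) = 1 / (1 + d)`; the paper leaves the choice of mapping function open, and `build_adjacency` is not a name from the source.

```python
import math


def build_adjacency(positions, cities, max_neighbors, f=lambda d: 1.0 / (1.0 + d)):
    """Return {user: {neighbor: weight}} with at most max_neighbors per user.

    positions: {user: (x, y)} planar coordinates (an assumption; real data
               would use latitude/longitude and a geodesic distance).
    cities:    {user: city_id}; users in different cities get weight 0.
    f:         hypothetical decreasing map from distance to relationship degree.
    """
    graph = {}
    for i in positions:
        candidates = []
        for j in positions:
            if i == j or cities[i] != cities[j]:
                continue  # the indicator function is 0 across cities
            d = math.dist(positions[i], positions[j])
            candidates.append((f(d), j))
        # Keep only the highest-weight (closest) users; ties broken by name.
        candidates.sort(key=lambda t: (-t[0], str(t[1])))
        graph[i] = {j: w for w, j in candidates[:max_neighbors]}
    return graph
```

Because each user keeps only his own closest neighbors, the resulting graph can be asymmetric; whether the original construction symmetrizes it is not stated in the text.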
Random Walk Enhanced Nearby User Communication
With the adjacency matrix representing the communication graph among users, how far users should communicate with their neighbors becomes the second challenging question. Practically, a user’s decision on an item is affected not only by his direct neighbors, but also by further neighbors, e.g., the neighbors of neighbors. Thus, when a user rates an item , this information should be sent not only to the direct neighbors of user , but also to his further neighbors, as shown in Figure 3. The challenge is how far to explore the network, since there is a tradeoff between decentralization and communication/computation cost: the further the communication reaches, the more users can collaborate, but the more communication and computation need to be done. We propose to solve this challenge by using random walk theory, which has been used to model trust relationships between users.
Random walk. We aim to find an intelligent way to determine how far a user communicates with his neighbors. Assuming user wants to communicate with his direct neighbors (), we define as the activity of user selecting a user from his neighbor set, and thus
According to the Markov property, the probability of user choosing his order of neighbors () is
We use to denote the max distance of random walk, and generally, the adjacent matrix of order of neighbors is . With random walk theory on user adjacent graph, the second question is answered.
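The idea that higher-order neighbors are obtained from powers of the adjacency matrix can be illustrated with a small sketch. It assumes that the adjacency matrix is first row-normalized into a transition matrix, which the extracted text does not spell out; `row_normalize`, `matmul`, and `walk_probabilities` are hypothetical helper names.

```python
def row_normalize(W):
    """Turn a non-negative adjacency matrix into a row-stochastic matrix."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total if total > 0 else 0.0 for w in row])
    return P


def matmul(A, B):
    """Plain matrix product for small dense matrices given as nested lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def walk_probabilities(W, max_distance):
    """Return [P, P^2, ..., P^D].

    Entry [d - 1][i][j] is the probability that a d-step random walk
    starting at user i ends at user j.
    """
    P = row_normalize(W)
    powers, current = [], P
    for _ in range(max_distance):
        powers.append(current)
        current = matmul(current, P)
    return powers
```

For a three-user chain 0, 1, 2, a one-step walk from user 0 always reaches user 1, while a two-step walk ends at user 0 or user 2 with equal probability, so user 2 becomes a second-order neighbor of user 0.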
DMF: Privacy Preserving Nearby User Communication.
The random walk based nearby user communication is an intelligent way of selecting the neighbors to communicate with in POI recommendation scenarios, but the third challenging question remains: what information should users communicate to each other, that is, how should users collaboratively learn the DMF model without leaking their private data? The original rating of a user on an item, of course, clearly reflects his preference for this item. However, the rating itself discloses too much of the user’s privacy. Inspired by existing work, we propose a privacy preserving collaborative approach for decentralized POI recommendation scenarios. Specifically, we suppose that for each user, the corresponding -th item latent factor can be decomposed as follows:
which implies that the latent factor of item for user is the sum of one common (global) latent factor and one personal (local) latent factor , where the common factor represents the common preference of all the users while the personal factor shows the personal favor of user . Under this assumption, the DMF model can be formulated as
where can be least square loss
which minimizes the error between real ratings and predicted ratings , or a listwise loss, as well as a pairwise loss. In this paper, we will take least square loss as an example. The last three terms in Equation (6) are regularizers to prevent overfitting.
The factors and only depend on the information stored by user , while the item common latent factor depends on the information of all the users. Practically, in the decentralized learning scenario, is also saved on each user’s (learner’s) side, and thus, for each user , is actually saved as . Consequently, Equation (5) becomes
Thus, a protocol is needed for users to exchange to learn a global . To solve this issue, inspired by existing work, we propose to send the gradient of the loss with respect to , i.e., for each user , to his neighbors, to help learn a global . This gradient exchange method has been successfully applied in decentralized learning scenarios [18, 24], where it not only guarantees model convergence, but also protects the privacy of user raw data. For each user , the gradients of with respect to , , and are:
Based on the above gradient exchange protocol, users collaboratively learn a global . Figure 3 shows a demo of this protocol: will send to his neighbors, to collaboratively learn . Combining our proposed random walk enhanced nearby user communication method and the gradient exchange protocol, we summarize our proposed privacy preserving DMF optimization framework for POI recommendation (Equation 6) in Algorithm 1. As we can see, users communicate with each other by sending the common item latent factor gradients instead of raw data, i.e., ratings, which significantly reduces the possibility of information leakage. With the gradient exchange protocol, the third question is answered.
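The gradient exchange described above can be sketched for a single rating as follows. This is a simplified illustration and not Algorithm 1 itself: it assumes a least-squares loss with L2 regularizers, a neighbor update scaled by a walk weight, and hypothetical names (`Learner`, `local_step`, `receive`, and the regularizer arguments `alpha`, `beta`, `gamma`).

```python
class Learner:
    """State kept on one user's device: the user factor u, plus one global
    copy p[j] and one local factor q[j] per item j."""

    def __init__(self, k, n_items, init=0.1):
        self.u = [init] * k
        self.p = [[init] * k for _ in range(n_items)]
        self.q = [[init] * k for _ in range(n_items)]

    def predict(self, j):
        # The item factor is the sum of its global and local parts.
        return sum(a * (b + c) for a, b, c in zip(self.u, self.p[j], self.q[j]))


def local_step(user, j, r, lr=0.05, alpha=0.01, beta=0.01, gamma=0.01):
    """Update the rating user's own factors and return the gradient with
    respect to the global factor p[j], the only value sent to neighbors."""
    err = r - user.predict(j)
    v = [b + c for b, c in zip(user.p[j], user.q[j])]
    grad_u = [-err * v_f + alpha * u_f for v_f, u_f in zip(v, user.u)]
    grad_p = [-err * u_f + beta * p_f for u_f, p_f in zip(user.u, user.p[j])]
    grad_q = [-err * u_f + gamma * q_f for u_f, q_f in zip(user.u, user.q[j])]
    user.u = [x - lr * g for x, g in zip(user.u, grad_u)]
    user.p[j] = [x - lr * g for x, g in zip(user.p[j], grad_p)]
    user.q[j] = [x - lr * g for x, g in zip(user.q[j], grad_q)]
    return grad_p


def receive(neighbor, j, grad_p, weight, lr=0.05):
    """A neighbor applies the received gradient to its own copy of p[j],
    scaled by its (assumed) random walk weight; u and q stay untouched."""
    neighbor.p[j] = [x - lr * weight * g for x, g in zip(neighbor.p[j], grad_p)]
```

In this sketch the rating `r` never leaves the device; a neighbor sees only a vector of the same size as one item factor.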
From the objective function of DMF in Equation (6), we can easily make the following observations:
If the item global regularizer is very large, the common factor is driven toward zero, and thus users will not exchange item common preferences, which means that item preference is learnt only from each user’s own data.
If the item local regularizer is very large, the personal factor is driven toward zero, which indicates that users will no longer keep their personal favor on items. The model will then work more like centralized MF.
The values of and determine how well item (common and personal) preferences are learnt. We will empirically study their effects on our model performance in experiments.
Unobserved rating sample.
A universal problem in POI recommendation is that the observations are extremely sparse. Unless we have access to negative observations, we will probably obtain an estimator that tends to predict all the unknown as 1. Following existing research, we solve this problem by sampling unobserved during SGD optimization. Specifically, for each , we randomly sample missing entries and treat them as negative examples, i.e., . However, a missing entry can denote either that does not like or that does not know of the existence of . Therefore, we decrease the confidence of to .
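The sampling scheme above can be sketched for a single user as follows. The sketch assumes that observed check-ins carry rating 1 and full confidence; the reduced confidence is left as a caller-supplied parameter because its value is not recoverable from the text here. `sample_unobserved`, `training_examples`, and `negative_confidence` are hypothetical names.

```python
import random


def sample_unobserved(visited, n_items, m, rng):
    """Pick up to m items the user has no observed interaction with."""
    unobserved = [j for j in range(n_items) if j not in visited]
    return rng.sample(unobserved, min(m, len(unobserved)))


def training_examples(user_checkins, n_items, m, negative_confidence, seed=0):
    """Yield (item, rating, confidence) triples for one user.

    Each observed check-in is emitted with rating 1 and confidence 1,
    followed by m sampled missing entries with rating 0 and a reduced
    confidence, since a missing entry may mean "dislikes" or "unaware of".
    """
    rng = random.Random(seed)
    visited = set(user_checkins)
    for j in user_checkins:
        yield j, 1.0, 1.0
        for neg in sample_unobserved(visited, n_items, m, rng):
            yield neg, 0.0, negative_confidence
```

The confidence would typically multiply the per-example loss or gradient during SGD, so that sampled negatives pull the factors less strongly than observed check-ins.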
Here we analyze the communication and computation complexity of Algorithm 1. Recall that denotes the dimension of latent factor, denotes the maximum number of direct neighbors of each user, denotes the max distance of random walk, as the training data, and as its size.
Communication Complexity. The communication cost depends on both the length of the item gradient and the number of neighbors to be communicated with. Each item gradient contains bytes of information, since it is a -dimensional real-valued vector. For user , the max number of neighbors to be communicated with in an order random walk is , where is the number of users in the current city of . Thus, for each , the communication cost is bytes. It will be bytes for passing the training dataset once. The values of , , and are usually small, and thus the communication cost is linear in the training data size.
Computation Complexity. The computation cost mainly relies on two parts, (1) calculating gradients, i.e., Line in Algorithm 1, and (2) updating user and item latent factors, i.e., Line in Algorithm 1. For a single pass of the training data, the time complexity of (1) is , and the time complexity of (2) is . Therefore, the total computational complexity in one iteration is . The values of , , and are usually small, and thus the time complexity is linear with the training data size.
The above communication and computation complexity analysis shows that our proposed approach is very efficient and can scale to very large datasets.
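To make the linear-in-data claim concrete, the communication cost can be estimated with a back-of-the-envelope sketch. The bound used here is an assumption rather than the paper's exact expression: it counts every user reachable within the maximum walk distance, caps that by the other users in the same city, and assumes 8 bytes per real value. All function and parameter names are hypothetical.

```python
def max_contacts(max_neighbors, max_distance, users_in_city):
    """Upper bound on distinct users reachable within max_distance hops
    when every user has at most max_neighbors direct neighbors."""
    reachable = sum(max_neighbors ** d for d in range(1, max_distance + 1))
    return min(reachable, max(users_in_city - 1, 0))


def bytes_per_rating(k, max_neighbors, max_distance, users_in_city, bytes_per_value=8):
    """Bytes sent for one rating: one k-dimensional gradient per contact."""
    return k * bytes_per_value * max_contacts(max_neighbors, max_distance, users_in_city)


def bytes_per_epoch(n_ratings, k, max_neighbors, max_distance, users_in_city,
                    bytes_per_value=8):
    """Bytes sent for one pass over the training data (linear in n_ratings)."""
    return n_ratings * bytes_per_rating(k, max_neighbors, max_distance,
                                        users_in_city, bytes_per_value)
```

With 10-dimensional factors, at most 2 direct neighbors, walk distance 3, and a city of 1000 users, this bound gives at most 14 contacts and 1120 bytes per rating, so a dataset of 1000 ratings costs about 1.12 MB per pass.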
In this section, we empirically compare the performance of DMF with classic centralized MF models. We also study the effects of parameters on model performance.
We first describe the datasets, metrics, and comparison methods we use during our experiments.
Datasets. We use two real-world datasets, i.e., Foursquare and Alipay. Foursquare is a famous benchmark dataset for evaluating POI recommendation models. We randomly choose two cities for each country from the original dataset, and further remove the users and POIs that have too few or too many interactions. We sample a small dataset because we simulate decentralized learning in our experiments; that is, there will be POI (global and local) latent matrices in total, which is actually not a small scale. Our Alipay dataset is sampled from user-merchant offline check-in records from 2017/07/01 to 2017/07/31, and we perform similar preprocessing on it. Table 1 shows the statistics of both datasets after preprocessing, from which we randomly sample 90% as the training set and use the remaining 10% as the test set.
where denotes the visited POI set of user in the test data, and denotes the recommended POI set of user which contains POIs.
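The two metrics used throughout, precision and recall over a recommended list of fixed size, can be sketched per user as follows. The sketch assumes the standard definitions (hits divided by list size, and hits divided by the number of visited POIs in the test data); `precision_recall_at_k` is a hypothetical name.

```python
def precision_recall_at_k(recommended, visited, k):
    """Precision and recall of the top-k recommended POIs for one user.

    recommended: ranked list of POI ids, best first.
    visited:     collection of POI ids the user visited in the test data.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(visited))
    precision = hits / k
    recall = hits / len(visited) if visited else 0.0
    return precision, recall
```

For a user with three test POIs of which two appear in a list of five recommendations, this gives a precision of 0.4 and a recall of about 0.67. Dataset-level numbers would be obtained by averaging over users.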
Comparison methods. Our proposed DMF framework is a novel decentralized algorithm for POI recommendation, which is a decentralized version of the classic MF model. We compare our proposed DMF with the following classic and state-of-the-art latent factor models, including several variants of DMF:
MF is a classic centralized latent factor model which uses least square loss.
Bayesian Personalized Ranking (BPR) is a state-of-the-art centralized latent factor model which uses pairwise loss.
Global DMF (GDMF). Users do not keep their personal favor anymore, and they tend to share a similar latent factor for the same item. This is a special case of our proposed DMF model, i.e., when the item local regularizer is very large.
Local DMF (LDMF). Users do not exchange preferences, and they learn the model only from their own data. This is also a special case of our proposed DMF model, i.e., when the item global regularizer is very large.
Note that we do not compare with the state-of-the-art POI recommendation methods. This is because (1) most of them improve the classic MF model by using additional information, e.g., user social information and contextual information [5, 26, 8], so a comparison with them would not be fair, and (2) our focus is to compare the effectiveness of the traditional centralized latent factor models and our proposed decentralized MF model.
Hyper-parameters. During comparison, we set user regularizer , learning rate , and the returned number of POI . We also set the maximum number of neighbor , and the number of sampled unobserved ratings . After we build the user adjacent graph, we simply set to eliminate the effect of mapping function on model performance, since this is not the focus of this paper. For the latent factor dimension , we vary its values in . For the random walk distance , we vary its values in . We also vary and in to study their effects on DMF. We tune parameters of each model to achieve their best performance for comparison.
All the model performances increase with the dimension of latent factor (). This is because the latent factor with a bigger contains more information, and thus user and POI preferences can be learnt more precisely.
GDMF achieves performance comparable to MF, since users collaboratively learn the global item latent factor, which works similarly to traditional MF.
LDMF performs the worst, since each user learns item preference only from his own check-in data, which is very sparse. This indicates the importance of user collaboration during recommendation.
DMF consistently outperforms MF, GDMF, and LDMF, and even beats the pairwise ranking model (BPR) in most cases. Take the result of DMF on the Alipay dataset for example: and of DMF improve those of MF by 27.75% and 25.89% when . This is because the item preference of DMF contains not only the common preference obtained from all the users’ data, but also each user’s personal preference learnt from his own data. Moreover, the random walk technique intelligently helps users choose neighbors to communicate with. Therefore, item preferences are learnt more precisely, to better match each user’s favor.
Finally, we study the effects of parameters on DMF, including item global and local regularizer ( and ), maximum random walk distance (), and maximum iteration ().
Effect of and .
The item global regularizer () and local regularizer () control how much of one’s item preference comes from his own data versus other users’ data. The bigger is, the more one’s item preference comes from his own data, and similarly, the bigger is, the more one’s item preference comes from other users’ data through global item gradient () exchange. Figure 5 shows their effects on DMF on both datasets. From it, we can find that, with good choices of and , DMF can make full use of one’s own data and his neighbors’ data, and thus achieves the best performance.
Effect of maximum random walk distance (). The maximum random walk distance determines how many neighbors will be communicated with after a user interacts with a POI, as described in the complexity analysis. Figure 6 shows its effect on the DMF model on both the Foursquare and Alipay datasets, where we set and fix other parameters to their best values. From it, we see that with the increase of , our model performance increases, and tends to be relatively stable when is bigger than 3. This shows that DMF achieves good performance with only a small value of , which indicates a low communication cost.
Effect of maximum iteration (). As we analyzed above, the computing time complexity is linear in the training data size, and thus the convergence speed determines how long DMF should be trained. Figure 4 shows the effect of on training loss and test loss on both the Foursquare and Alipay datasets. As we can observe, DMF converges steadily with the increase of , and it takes about 100 epochs to converge on Foursquare and about 200 epochs on Alipay.
Conclusion and Future Work
In this paper, we proposed a Decentralized MF (DMF) framework for POI recommendation. Specifically, we proposed a random walk based nearby collaborative decentralized training technique to train the DMF model on each user’s end. By doing this, the data of each user on items are kept in the user’s own hands. Moreover, decentralized learning can be regarded as distributed learning with multiple learners (users), and thus solves the efficiency problem. Experimental results on two real-world datasets demonstrate that, compared with classic and state-of-the-art latent factor models, DMF significantly improves the recommendation performance in terms of precision and recall.
We would like to take model compression as our future work. Currently, each user needs to store the real-valued item latent matrix. A binary type of latent matrix will significantly reduce the storage cost. How to balance the model storage and model accuracy will be our next stage of work.
[1] Reversible Markov chains and random walks on graphs. Berkeley.
[2] (2002) Collaborative filtering with privacy. In Security and Privacy, 2002. Proceedings. 2002 IEEE Symposium on, pp. 45–57.
[3] (2016) Capturing semantic correlation for item recommendation in tagging systems. In AAAI, pp. 108–114.
[4] (2014) Context-aware collaborative topic regression with social matrix factorization for recommender systems. In AAAI, Vol. 14, pp. 9–15.
[5] (2012) Fused matrix factorization with geographical and social influence in location-based social networks. In AAAI, Vol. 12, pp. 17–23.
[6] (2011) Exploring millions of footprints in location sharing services. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media 2011, pp. 81–88.
[7] (2011) Friendship and mobility: user movement in location-based social networks. In SIGKDD, pp. 1082–1090.
[8] (2015) Content-aware point of interest recommendation on location-based social networks. In AAAI, pp. 1721–1727.
[9] (2009) Trustwalker: a random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 397–406.
[10] (2014) Decentralized learning for multiplayer multiarmed bandits. IEEE Transactions on Information Theory 60 (4), pp. 2331–2345.
[11] (2011) Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy. In Proceedings of the 5th ACM conference on Recommender Systems, pp. 165–172.
[12] (2009) Matrix factorization techniques for recommender systems. Computer 42 (8), pp. 30–37.
[13] Hashing for distributed data. International Conference on Machine Learning, pp. 1642–1650.
[14] (2010) Network distance prediction based on decentralized matrix factorization. Proceedings of the 2010 International Conference on Research in Networking, pp. 15–26.
[15] (2012) Decentralized low-rank matrix completion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2925–2928.
[16] (2016) Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629.
[17] (2007) Probabilistic matrix factorization. In Proceedings of the 20th Advances in Neural Information Processing Systems, pp. 1257–1264.
[18] (2009) Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control 54 (1), pp. 48–61.
[19] BPR: Bayesian personalized ranking from implicit feedback. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pp. 452–461.
[20] (2001) Item-based collaborative filtering recommendation algorithms. In WWW, pp. 285–295.
[21] (2010) List-wise learning to rank with matrix factorization for collaborative filtering. In Proceedings of the 4th ACM conference on Recommender systems, pp. 269–272.
[22] Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321.
[23] (2009) A survey of collaborative filtering techniques. Advances in Artificial Intelligence 2009, pp. 4.
[24] (2013) Distributed autonomous online learning: regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering 25 (11), pp. 2483–2493.
[25] (2016) Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Transactions on Intelligent Systems and Technology (TIST) 7 (3), pp. 30.
[26] (2013) A sentiment-enhanced personalized location recommendation system. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, pp. 119–128.
[27] (2011) Like like alike: joint friendship and interest propagation in social networks. In Proceedings of the 20th International Conference on World Wide Web, pp. 537–546.
[28] (2011) Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR, pp. 325–334.
[29] (2014) NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion. Proceedings of the VLDB Endowment 7 (11), pp. 975–986.
[30] (2016) A survey of point-of-interest recommendation in location-based social networks. arXiv preprint arXiv:1607.00647.