The rapid growth of cities has increased the number of points of interest (POIs) such as restaurants, theaters, stores, and hotels, which provide entertainment, enrich people’s lives, and offer more choices of life experiences than ever before. People are willing to explore their city and neighborhood in their daily lives and decide where to go according to their interests and the available POIs. However, because of the large number of POIs across cities, the complexity of modern cities, and newcomers’ unfamiliarity with them, finding a POI or making a satisfactory decision efficiently becomes a problem. Fortunately, advanced mobile devices with wireless communication and location-based social network (LBSN) applications such as Foursquare, Yelp, and Facebook have become increasingly popular. These applications are among the most popular Internet applications and have attracted millions of users, as they help solve the problem of finding places of interest in a specific area Bao et al. (2015). Through these applications, individuals share their footprints, opinions, and experiences, contribute assorted forms of location-specific multimedia content, and may also declare their presence through an action known as a check-in, which is very helpful for individuals wishing to find new restaurants, events, bars, hotels, and so on.
Nonetheless, despite the information made available by LBSNs, a user still faces an overwhelming amount of information that in most cases is biased toward the popularity of a POI rather than individual preference. More specifically to this study, a user traveling or migrating to a new geographical region with no prior information about it will have a hard time deciding which POIs to visit aside from the tourist attractions. People are usually active only within small geographical regions within their home city. While it is easy to associate users when they visit similar sets of venues in the same geographical region, it is interesting and challenging to investigate ways to correlate users across different regions based on their local behavior. In this paper we propose the New Region Location Recommender System (NRLRS), a system that makes relevant recommendations in a new location based on an individual’s preferences, discovered from the reviews and ratings in their history from previously visited regions.
2 Problem formulation
Suppose that there is a set of users and a set of POIs in a given geographical area, which is divided into a set of geographical regions. For each region we have a rating matrix in which each entry holds the rating of a user on a POI, with zero indicating an unknown or missing rating. In LBSNs users are connected to each other explicitly (i.e. through friendship) or implicitly through a similarity function that creates neighborhoods of similar users. We therefore also use a user relationship matrix (a similarity matrix in our case) in which each entry represents the strength of the relationship between a pair of users. Users rate POIs and write reviews, so each user is associated with a set of reviews. For this content we use a user-word matrix over a given vocabulary of words, in which each entry represents the importance of a word to a user based on how often the user uses that word in expressing their preferences. In this matrix we treat each user as a document whose content is all the words they have ever written that appear in the vocabulary. Given the rating, relationship, and user-word matrices, we aim to make personalized recommendations of locally interesting POIs in a target region to users from a source region when they migrate to or visit the target region, where the two regions are geographically different.
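To make the user-as-document idea concrete, the following is a minimal sketch of how such a user-word matrix could be built. It assumes that "importance" is the relative frequency of a vocabulary word in everything the user has written and that whitespace tokenization is adequate; both are illustrative assumptions, and the function name `user_word_matrix` is hypothetical.

```python
from collections import Counter


def user_word_matrix(reviews_by_user, vocabulary):
    """Treat each user as one document and weight words by relative frequency.

    reviews_by_user: dict mapping a user id to a list of review strings.
    vocabulary: list of words; words outside it are ignored.
    Returns a dict mapping each user id to a list aligned with `vocabulary`.
    """
    vocab = set(vocabulary)
    matrix = {}
    for user, reviews in reviews_by_user.items():
        # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
        counts = Counter(
            token
            for text in reviews
            for token in text.lower().split()
            if token in vocab
        )
        total = sum(counts.values())
        matrix[user] = [counts[w] / total if total else 0.0 for w in vocabulary]
    return matrix
```

A user with no in-vocabulary words gets an all-zero row, which keeps the matrix rectangular without inventing preferences for that user.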
3 New Region Location Recommender
3.1 Baseline model: Latent factor Model
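We adopt the latent factor model of Koren and Bell (2011), based on matrix factorization, as the baseline rating prediction model:
$$\hat{r}_{u,i} = \mu + b_u + b_i + p_u^{\top} q_i \tag{1}$$
where $\hat{r}_{u,i}$ is the predicted rating of user $u$ on POI $i$, $\mu$ is the global offset (the average rating across the dataset), $b_u$ and $b_i$ are the user and POI biases, and $p_u$ and $q_i$ are the $K$-dimensional user and POI factors, respectively. We consider $p_u$ as the user's preferences towards POIs and $q_i$ as the POI's properties; hence the dot product $p_u^{\top} q_i$ captures the interaction between a user and a POI. This gives us the following optimization problem:
$$\min_{P,\,Q,\,b}\ \sum_{(u,i)\,:\,r_{u,i} \neq 0} \left(r_{u,i} - \hat{r}_{u,i}\right)^2 + \lambda_1 \lVert P \rVert_F^2 + \lambda_2 \lVert Q \rVert_F^2 \tag{2}$$
where the sum runs over users $u$ and POIs $i$ that have a non-zero rating $r_{u,i}$ in the rating matrix $R$, $P$ and $Q$ are the user and POI latent factor matrices, $b$ collects the biases, and $\lambda_1$ and $\lambda_2$ are weights that control the capacity of $P$ and $Q$ in order to avoid over-fitting, for which we use $\lVert P \rVert_F^2$ and $\lVert Q \rVert_F^2$ as smoothness regularization terms. Consequently, under this baseline a user traveling to a different region will get a prediction of a locally interesting POI that is mainly influenced by the global offset and the user’s and POI’s rating biases.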
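As an illustration of the baseline, the sketch below predicts a rating as the sum of a global offset, a user bias, a POI bias, and a factor dot product, and fits the parameters by stochastic gradient descent on observed ratings. The function names, the learning rate, the single shared regularization weight, and the regularization of the biases are assumptions made for this example, not settings reported here.

```python
import random


def predict(mu, b_u, b_i, p_u, q_i):
    """Global offset + user bias + POI bias + dot product of the latent factors."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


def sgd_epoch(ratings, mu, bu, bi, P, Q, lr=0.01, reg=0.05):
    """One stochastic gradient pass over observed (user, poi, rating) triples."""
    for u, i, r in ratings:
        err = r - predict(mu, bu[u], bi[i], P[u], Q[i])
        # Assumption: biases share the regularization weight used for the factors.
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        for k in range(len(P[u])):
            p_uk, q_ik = P[u][k], Q[i][k]
            P[u][k] += lr * (err * q_ik - reg * p_uk)
            Q[i][k] += lr * (err * p_uk - reg * q_ik)


def train(ratings, n_users, n_pois, n_factors=5, epochs=200, seed=0):
    """Fit the baseline model; unknown (zero) ratings are simply not in `ratings`."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    bu = [0.0] * n_users
    bi = [0.0] * n_pois
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_pois)]
    for _ in range(epochs):
        sgd_epoch(ratings, mu, bu, bi, P, Q)
    return mu, bu, bi, P, Q
```

For a user with no ratings in the new region, the factor vector stays close to its small random initialization, so the prediction is driven by the offset and bias terms, which is the limitation the later components are meant to address.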
3.2 Integrating Rating with Review: Latent Dirichlet Allocation (LDA)
LDA uncovers hidden dimensions in review text from which characteristics such as the categories, quality, and services of a POI reviewed by users can be deduced. We add an LDA component to our baseline model as a regularizer so as to constrain $p_u$ (the user latent vector) by giving us more information about the user. Therefore, our optimization problem changes to the following:
$$\min\ \sum_{(u,i)\,:\,r_{u,i} \neq 0} \left(r_{u,i} - \hat{r}_{u,i}\right)^2 - \lambda_t \sum_{u} \sum_{n=1}^{N_u} \log\left(\theta_{u,z_{u,n}}\, \phi_{z_{u,n},w_{u,n}}\right) + \Omega \tag{3}$$
where $r_{u,i}$ and $\hat{r}_{u,i}$ are the observed and predicted ratings; the LDA parameters $\theta$ and $\phi$ denote the topic and word distributions, respectively; $w_{u,n}$ and $z_{u,n}$ are the $n$-th word occurring in the document of user $u$ (of length $N_u$) and its corresponding topic; $\lambda_t$ controls the contribution of the LDA regularization term, i.e. the effect of the user reviews; and $\Omega$ represents the regularization terms on the latent factors. Ratings and reviews are fused through the transformation
$$\theta_{u,k} = \frac{\exp\left(\kappa\, p_{u,k}\right)}{\sum_{k'=1}^{K} \exp\left(\kappa\, p_{u,k'}\right)} \tag{4}$$
where the parameter $\kappa$ controls the peakiness of the transformation and the denominator sums over all latent topics $k'$. Through this transformation, the real-valued parameters in the user preference vector associated with ratings are mapped to the probabilistic ones associated with the reviews. We adopt the Hidden Factors as Topics (HFT) algorithm of McAuley and Leskovec (2013) as a component of our proposed system.
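The transformation from real-valued user factors to topic proportions can be sketched as a softmax scaled by the peakiness parameter, which is the form used by HFT. The function name `factors_to_topics` and the default value of `kappa` are illustrative assumptions.

```python
import math


def factors_to_topics(p_u, kappa=1.0):
    """Map a real-valued user factor vector to a probability vector over topics.

    Larger `kappa` gives a peakier distribution; `kappa` near zero gives
    a nearly uniform one.
    """
    scaled = [kappa * x for x in p_u]
    shift = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the mapping is monotone, the largest user factor always corresponds to the most probable topic, so the rating model and the topic model share one interpretation of each latent dimension.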
3.3 Integrating Social Influence
Most LBSN recommendation systems only consider direct friendships or users with physically overlapping visits to POIs as a basis for social influence to improve accuracy Bao et al. (2015); Cheng et al. (2012); Gao et al. (2012); Li et al. (2016); Gao et al. (2015). However, they are less effective when a targeted user has very few social connections or little location history. We use the weighted hierarchical category approach developed in Adomavicius and Tuzhilin (2005) to model user preferences and form a basis for similarity comparison between users irrespective of their geographical region. We extend this similarity computation by adding user reviews, which give a more descriptive understanding of a user’s preferences. We use the LDA model to discover user-to-word interactions and fit it using Gibbs sampling. Using this similarity information we build a similarity matrix containing the similarity between any two users. This matrix provides additional information for building a neighborhood of similar users who supply social influence, and hence local opinion, in a new geographical region unfamiliar to the user. We integrate the social influence of a user’s nearest neighbors into our recommender system by extending the objective function with a social term.
This term combines the similarity between two users, the corresponding user vectors from the user latent factor matrix, and a social correlation matrix. Additional weights are introduced to control the contribution of the social correlation and to guard against over-fitting.
3.4 Integrating POI Characteristics
In LBSNs a POI’s characteristics affect its rating Liu and Xiong (2013); Pavan et al. (2017); Gonzalez et al. (2008). Therefore, the more information we have about a POI, the more accurate its recommendation can be. In this study, we assume that a user’s rating of a given business is determined by its intrinsic characteristics and by the extrinsic characteristics of its geographical neighbors. We therefore divide the POI representation into latent factors for extrinsic properties (its geographical neighbors) and latent factors for intrinsic properties (its categories and the ratings of its neighbors).
Several studies Cheng et al. (2012); Adomavicius and Tuzhilin (2005); Liu et al. (2014); Rhee et al. (2011); Zhang and Chow (2015); He et al. (2016); Yu et al. (2016) have linked geographical influence to improved POI recommendation. Therefore, we incorporate geographical neighborhood influence to improve the accuracy of business rating prediction. For each business we consider a set of geographical neighbors satisfying a certain selection criterion (e.g. the top ten nearest neighbors). We consider POI category to be important because it indicates the services or activities that take place at a POI or the way business is conducted there. We annotate each POI with category information by integrating one category latent factor vector per category. This implies that POIs of similar categories tend to influence each other’s ratings. The rating prediction is extended accordingly with neighbor and category terms,
whose weights control the importance of the influence of the geographical neighborhood, together with the cardinalities of the sets of neighbors. We add the corresponding latent factors to the objective function as regularization terms.
Finally, we incorporate POI popularity as an indicator of the quality of the services or products the POI offers. Studies Gao et al. (2012); Li et al. (2016); Yu et al. (2016) have shown that popularity can influence user check-in behavior to a great extent. We model popularity using the normalized popularity score of Liu et al. (2013) and integrate it into our recommendation model,
with a weight that controls the contribution of popularity to the predicted rating.
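The neighbor selection criterion mentioned above (e.g. the top ten nearest neighbors) can be illustrated with a short sketch. It assumes that each POI has latitude and longitude coordinates and that great-circle (haversine) distance is the distance measure; the text does not state which measure is used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_neighbors(poi_id, coords, k=10):
    """Return the ids of the k POIs closest to `poi_id`, nearest first.

    coords: dict mapping a POI id to its (lat, lon) pair.
    """
    distances = [
        (haversine_km(coords[poi_id], c), other)
        for other, c in coords.items()
        if other != poi_id
    ]
    return [other for _, other in sorted(distances)[:k]]
```

The brute-force scan is quadratic when repeated for every POI; for a city-scale dataset a spatial index would be the usual replacement, but the selected sets would be the same.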
4 Model training
Finally, the objective function that we wish to optimize in order to make accurate predictions combines all of the components above.
We write it as $f(\Theta, \Phi, \kappa, z)$, the function we wish to minimize. $\Theta$ represents the parameter set associated with the ratings and social relations, i.e. the user, POI, social correlation, POI neighbor, and category latent factors, and $\Phi$ represents the parameters associated with the review text. The remaining parameters, $z$ and $\kappa$, are the latent topic assignments and the controller of the transformation between ratings and reviews. The objective also includes the regularization terms on the latent factors.
We use a stochastic gradient descent (SGD) approach to find our solution. The connection between ratings and social influence is realized through the users’ latent feature space, and ratings and reviews are linked through the transformation in Equation 4. The rating-related parameters of our objective can be fitted by gradient descent and the topic assignments by Gibbs sampling, so we design a procedure that alternates between the following two steps:
$$\text{update } \Theta^{(t)}, \Phi^{(t)}, \kappa^{(t)} = \arg\min_{\Theta,\Phi,\kappa} f\left(\Theta, \Phi, \kappa, z^{(t-1)}\right) \tag{11}$$
$$\text{sample } z_{u,n}^{(t)} \text{ with probability } p\left(z_{u,n}^{(t)} = k\right) \propto \theta_{u,k}^{(t)}\, \phi_{k,w_{u,n}}^{(t)} \tag{12}$$
In the first step (Equation 11), we fix the topic assignments $z$ for each word in the review corpus and update the terms $\Theta$, $\Phi$, and $\kappa$ by gradient descent. The user factors and the topic proportions $\theta$ depend on each other; we fit only the user factors and then determine $\theta$ by Equation 4. In the second step (Equation 12), all parameters associated with the review corpus, $\theta$ and $\phi$, are fixed; we then sample topic assignments by iterating through all documents and each word within them, setting $z_{u,n} = k$ with probability proportional to $\theta_{u,k}\, \phi_{k,w_{u,n}}$. This is similar to updating $z$ via LDA, except that the topic proportions $\theta$ are not sampled from a Dirichlet distribution but are instead determined by the parameters found in Equation 11. Finally, these two steps are repeated until a local optimum is reached.
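The sampling step of the alternating procedure can be sketched as follows: with the topic proportions and word distributions held fixed, each word of each user document is assigned a topic with probability proportional to the product of the user's topic proportion and the topic's probability of that word. The data layout (`docs` as lists of word ids, `theta` and `phi` as nested lists) and the function name are assumptions for this example; the gradient step that would update the other parameters between calls is not shown.

```python
import random


def sample_topic_assignments(docs, theta, phi, rng):
    """Draw one topic per word with the topic proportions and word distributions fixed.

    docs[u]     : list of word ids written by user u (the user's "document")
    theta[u][k] : proportion of topic k for user u
    phi[k][w]   : probability of word w under topic k
    rng         : a random.Random instance, passed in for reproducibility
    """
    n_topics = len(phi)
    assignments = []
    for u, words in enumerate(docs):
        z_u = []
        for w in words:
            # Unnormalized weights; random.choices normalizes them internally.
            weights = [theta[u][k] * phi[k][w] for k in range(n_topics)]
            z_u.append(rng.choices(range(n_topics), weights=weights)[0])
        assignments.append(z_u)
    return assignments
```

Unlike standard LDA, nothing here resamples the topic proportions themselves: they would be recomputed from the user factors after each gradient step, so this function only needs read access to them.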
5 Experimental results
In this section, we provide an empirical evaluation of the performances of the proposed model.
5.1 Experimental dataset
We evaluated our model using the Yelp Dataset Challenge (https://www.yelp.com/dataset_challenge), which comprises 2.2M reviews and 591K tips by 552K users for 77K businesses; 566K business attributes (e.g. hours, parking availability, ambience); a social network of 552K users with a total of 3.5M social edges; aggregated check-ins over time for each of the 77K businesses; and 200,000 pictures from the included businesses. The dataset was collected from 10 cities in 4 countries.
We selected Phoenix and Las Vegas because of their relatively large amounts of rating and review data coupled with a high number of overlapping users (users with activity in both cities). In our experiments, the target users are the users from Phoenix with ratings in Las Vegas and vice versa. For simplicity, we treat a city as our geographical region for testing. The dataset statistics for the two cities are shown in Table 1. The Yelp dataset does not explicitly contain a user’s home location or address. Therefore, a user’s most active city is assumed to be their home location, where a user’s activity is the total count of ratings and reviews left by the user at POIs in a given city. We use the local ratings and reviews as the training set, including 1-3 foreign reviews for our target users, and use the remaining foreign ratings and reviews as test data.
|#users with review||65191||173703|
|#Min/Max review per Business||1/1354||Jan-37|
|#Min/Max review per User||1/607||1/1126|
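The home-city assumption and the selection of target users described in this section can be sketched as below. The input format (one `(user_id, city)` pair per rating or review) and the function names are assumptions for this example, and ties between equally active cities are broken by first occurrence, which is a choice made here rather than one stated in the text.

```python
from collections import Counter, defaultdict


def infer_home_cities(activities):
    """Assign each user the city where they left the most ratings and reviews.

    activities: iterable of (user_id, city) pairs, one pair per rating or review.
    """
    counts = defaultdict(Counter)
    for user, city in activities:
        counts[user][city] += 1
    # most_common keeps first-seen order among ties (assumed tie-break rule).
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def target_users(activities, home, away):
    """Users whose inferred home is `home` and who also have activity in `away`."""
    activities = list(activities)
    homes = infer_home_cities(activities)
    visited = defaultdict(set)
    for user, city in activities:
        visited[user].add(city)
    return sorted(u for u, h in homes.items() if h == home and away in visited[u])
```

Calling `target_users(activities, "Phoenix", "Las Vegas")` and then swapping the two arguments would give the two groups of overlapping users used as test subjects.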
5.2 Evaluation Metrics
We adopt the Mean Absolute Error (MAE) and the normalized MAE (rMAE) to measure the accuracy of predicted ratings. MAE measures the average absolute deviation between the predicted ratings and the users’ true ratings and is defined as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{(u,i)} \left| r_{u,i} - \hat{r}_{u,i} \right|$$
where $N$ denotes the number of tested ratings, $r_{u,i}$ is the real rating, and $\hat{r}_{u,i}$ is the predicted rating. Because the predicted rating values create an ordering across the items, predictive accuracy can also be used to measure the ability of a recommender system to rank items with respect to user preference Bao et al. (2015).
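As a small worked example of the metric, the sketch below computes MAE over paired lists of true and predicted ratings. The function name and the list-based input format are assumptions; the normalized variant is not shown because its normalization constant is not specified here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute deviation between true and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)
```

For true ratings 5, 3, 4 and predictions 4, 3, 2 the absolute errors are 1, 0, and 2, so the MAE is 3 / 3 = 1.0.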
5.3 Experimental Evaluation
To evaluate the effectiveness of our proposed solution, we compare it with the following baseline approaches: User-KNN Sarwar et al. (2001); Spertus et al. (2005) and Item-KNN Sarwar et al. (2001), for which we set the neighborhood size k = 150; User Cluster (UC) Cheng et al. (2012); CKNN Gao et al. (2012); SVD++ Koren (2008); HFT McAuley and Leskovec (2013); and our model, NRLRS. We use LibRec (https://www.librec.net/), a recommender system library in Java, for algorithm implementation and extension Bao et al. (2012). For all the latent factor models we use a default number of latent factors unless otherwise stated, together with fixed values for the learning rate and the weights.
The results for the two cities in Table 2 show the neighborhood models UKNN and IKNN as the worst-performing models. This is expected: when a user moves to a geographical region where they have little or no activity history, there is limited information with which to match them with other users. Neighborhood approaches rely on overlapping visited POIs/items between users to determine similarity in preferences; this information is limited for a user with few ratings, leading to a cold-start problem. The CKNN and User Cluster methods perform slightly better because these approaches incorporate the user’s category preferences from their activity history in previous cities to build user preferences. The latent factor models (HFT and NRLRS) outperform the neighborhood models because of their ability to exploit active users’ reviews from their previous geographical region in the new region. Our NRLRS incorporates a variety of information, which helps us model POI properties and user preferences better for a new user, in addition to integrating the neighborhood-model feature of social influence. Different cities show different prediction accuracy values due to differences in dataset statistics and patterns specific to individual cities; however, the relative performance of the algorithms is consistent across the two cities.
We test the performance of the latent factor models by varying the number of latent factors: we adjust the number of latent factors and record the prediction accuracy for each algorithm per city. This tests the earlier wisdom that latent factor models tend to benefit from more factors, so an increase in the number of factors is expected to improve rating prediction Koren et al. (2009). We show the results in Table 3, using MAE to measure how accuracy varies as the number of factors increases. The accuracy of SVD++, HFT, and NRLRS does not vary much with the number of latent factors and is stable across different numbers of factors. For the experimental evaluation we adopt as the default the number of factors that gives the best prediction results.
We further investigate the impact of social local opinion, POI properties, and reviews. We define the following variants of our algorithm: NRLRS/Rev considers all features but the reviews component; NRLRS/Social considers all features but the social relations component; NRLRS/Social/Rev considers all features but the social relation and review components; and NRLRS/POI considers all features but the POI properties. Each variant is obtained by setting the weights that control the excluded components accordingly. We show the results in Table 4. Performance degrades when any of the components is eliminated, demonstrating the importance of each component’s contribution to the entire model. Without the proposed integrated components, our model performs comparably to the SVD++ algorithm across the two cities. Further, reviews make a very strong contribution: the performance of the algorithm degrades significantly when they are removed (NRLRS/Rev).
Although our proposed model shows small improvements in accuracy, they are significant in recommender systems: Koren (2008) provides evidence that even a small improvement in rating prediction error can affect the ordering of items and have a significant impact on the quality of the top few presented recommendations, and thus on the overall performance of the recommender system.
We demonstrate that our proposed solution achieves a higher prediction accuracy than the state-of-the-art methods we compared against. This is especially true in our chosen context of exclusively considering users traveling to new geographical regions; in the case of our datasets, Phoenix users traveling to Las Vegas and vice versa. Our algorithm outperforms the compared recommendation techniques in both cities.
- Bao et al. (2015) J. Bao, Y. Zheng, D. Wilkie, M. Mokbel, Recommendations in location-based social networks: a survey, GeoInformatica 19 (2015) 525–565.
- Koren and Bell (2011) Y. Koren, R. Bell, Advances in collaborative filtering, in: Recommender systems handbook, Springer, 2011, pp. 145–186.
- McAuley and Leskovec (2013) J. McAuley, J. Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in: Proceedings of the 7th ACM conference on Recommender systems, ACM, pp. 165–172.
- Cheng et al. (2012) C. Cheng, H. Yang, I. King, M. R. Lyu, Fused matrix factorization with geographical and social influence in location-based social networks, in: AAAI, volume 12, p. 1.
- Gao et al. (2012) H. Gao, J. Tang, H. Liu, gscorr: modeling geo-social correlations for new check-ins on location-based social networks, in: Proceedings of the 21st ACM international conference on Information and knowledge management, ACM, pp. 1582–1586.
- Li et al. (2016) H. Li, Y. Ge, R. Hong, H. Zhu, Point-of-interest recommendations: Learning potential check-ins from friends., in: KDD, pp. 975–984.
- Gao et al. (2015) H. Gao, J. Tang, H. Liu, Addressing the cold-start problem in location recommendation using geo-social correlations, Data Mining and Knowledge Discovery 29 (2015) 299–323.
- Adomavicius and Tuzhilin (2005) G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE transactions on knowledge and data engineering 17 (2005) 734–749.
- Liu and Xiong (2013) B. Liu, H. Xiong, Point-of-interest recommendation in location based social networks with topic and location awareness, in: Proceedings of the 2013 SIAM International Conference on Data Mining, SIAM, pp. 396–404.
- Pavan et al. (2017) M. Pavan, S. Mizzaro, I. Scagnetto, Mining movement data to extract personal points of interest: A feature based approach, in: Information Filtering and Retrieval, Springer, 2017, pp. 35–61.
- Gonzalez et al. (2008) M. C. Gonzalez, C. A. Hidalgo, A.-L. Barabasi, Understanding individual human mobility patterns, Nature 453 (2008) 779–782.
- Liu et al. (2014) Y. Liu, W. Wei, A. Sun, C. Miao, Exploiting geographical neighborhood characteristics for location recommendation, in: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, ACM, pp. 739–748.
- Rhee et al. (2011) I. Rhee, M. Shin, S. Hong, K. Lee, S. J. Kim, S. Chong, On the levy-walk nature of human mobility, IEEE/ACM transactions on networking (TON) 19 (2011) 630–643.
- Zhang and Chow (2015) J.-D. Zhang, C.-Y. Chow, Geosoca: Exploiting geographical, social and categorical correlations for point-of-interest recommendations, in: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp. 443–452.
- He et al. (2016) J. He, X. Li, L. Liao, D. Song, W. K. Cheung, Inferring a personalized next point-of-interest recommendation model with latent behavior patterns, in: Thirtieth AAAI Conference on Artificial Intelligence.
- Yu et al. (2016) Z. Yu, H. Xu, Z. Yang, B. Guo, Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints, IEEE Transactions on Human-Machine Systems 46 (2016) 151–158.
- Liu et al. (2013) B. Liu, Y. Fu, Z. Yao, H. Xiong, Learning geographical preferences for point-of-interest recommendation, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 1043–1051.
- Sarwar et al. (2001) B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th international conference on World Wide Web, ACM, pp. 285–295.
- Spertus et al. (2005) E. Spertus, M. Sahami, O. Buyukkokten, Evaluating similarity measures: a large-scale study in the orkut social network, in: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM, pp. 678–684.
- Koren (2008) Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 426–434.
- Bao et al. (2012) J. Bao, Y. Zheng, M. F. Mokbel, Location-based and preference-aware recommendation using sparse geo-social networking data, in: Proceedings of the 20th international conference on advances in geographic information systems, ACM, pp. 199–208.
- Koren et al. (2009) Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009).