A Systematic Analysis on the Impact of Contextual Information on Point-of-Interest Recommendation

As the popularity of Location-based Social Networks (LBSNs) increases, designing accurate models for Point-of-Interest (POI) recommendation receives more attention. POI recommendation is often performed by incorporating contextual information into previously designed recommendation algorithms. Some of the major types of contextual information considered in POI recommendation are location attributes (e.g., the exact coordinates of a location, its category, and check-in time), user attributes (e.g., comments, reviews, tips, and check-ins made to locations), and other information, such as the distance of a POI from the user’s main activity location and the social ties between users. The right selection of such factors can significantly impact the performance of POI recommendation. However, previous research does not consider the impact of combining these different factors. In this paper, we propose different contextual models and analyze the fusion of different major contextual information in POI recommendation. The major contributions of this paper are: (i) providing an extensive survey of context-aware location recommendation; (ii) quantifying and analyzing the impact of different contextual information (e.g., social, temporal, spatial, and categorical) in POI recommendation on available baselines and on two new models, one linear and one non-linear, that can incorporate all the major contextual information into a single recommendation model; and (iii) evaluating the considered models using two well-known real-world datasets. Our results indicate that while modeling geographical and temporal influences can improve recommendation quality, fusing all other contextual information into a recommendation model is not always the best strategy.




Code Repositories


A Systematic Analysis on the Impact of Contextual Information on Point-of-Interest Recommendation (TOIS 2022)


1. Introduction and Motivations

With the ever-increasing number of smartphone users, Location-based Social Networks (LBSNs), such as Foursquare (https://www.foursquare.com), Swarm (https://www.swarmapp.com), and Yelp (https://www.yelp.com), are growing. LBSNs allow users to check in through their smartphones at a location or Point-of-Interest (POI), such as a restaurant, mall, or movie theater. Users’ check-in data often include geographical information, i.e., latitude and longitude, and the check-in timestamp. With thousands of potential POIs in the vicinity of each user, choosing a POI becomes overwhelming. POI recommendation algorithms address this issue by filtering the large variety of available options and returning those that are most likely to interest the user. In recent years, Matrix Factorization (MF), as a linear technique, and Neural Networks (NNs), as a non-linear one, have proven to offer promising solutions to the problem of designing efficient filtering algorithms for recommendation systems (Liu et al., 2017; Zhang et al., 2019; Adomavicius and Tuzhilin, 2005; Ricci et al., 2015, 2011). The main difference between MF and NNs lies in whether they model the relation between users and POIs linearly or non-linearly (He et al., 2017; Rendle et al., 2020; Anelli et al., 2021).

In general, POI recommendation suffers from two well-known problems, namely, data sparsity and cold start. Given the large number of POIs and each user’s ability to visit only a few of them, the user-POI interaction matrix becomes very sparse, which makes it very hard to model collaborative interactions between users and POIs. This problem is referred to as data sparsity. The cold-start problem, on the other hand, refers to recommending POIs to users who have very limited or no interaction records. Similarly, POIs with very limited or no interaction records are considered cold items, and models often fail to recommend them effectively (Liu et al., 2017; Li et al., 2015; Elahi et al., 2016; Xiong et al., 2020). To address the data sparsity issue, Context-Aware Recommender Systems (CARSs) have recently gained popularity, as many researchers in disciplines such as recommender systems, information retrieval, and data mining have recognized the importance of contextual information (Liu et al., 2017; Stepan et al., 2016; Baltrunas, 2008; Rahmani et al., 2019c; Adomavicius and Tuzhilin, 2011; Chakraborty et al., 2020). A CARS should provide a user with recommendations that take the user’s current context into consideration. The context of a user can be defined as a set of factors and limitations that impact the user’s perception and acceptance of a particular item. Various definitions of context exist in the literature (Liu et al., 2017; Baral and Li, 2018; Sánchez, 2019; Ricci, 2010), from which the most popular contextual factors for POI recommendation can be listed as follows: the time of check-in, weather, location, prices, or even the users’ friendships.

Fig. 1 shows a typical check-in record in an LBSN. Based on such records, four important and effective types of contextual information are typically considered for POI recommendation, namely, geographical, temporal, social, and categorical information (Liu et al., 2017; Baral and Li, 2018; Ricci, 2002; Manotumruksa et al., 2018). Among the various contextual factors, finding the right combination for a specific user is of great importance, as it affects both the effectiveness and the efficiency of the models. Previous studies have shown that incorporating all available contextual factors is not always beneficial (Liu et al., 2017; Baral and Li, 2018): besides the extra data processing load, using all contextual information does not necessarily improve recommendation accuracy (Baral and Li, 2018). Furthermore, the effect of different contextual factors and their combinations on the performance of linear and non-linear algorithms has not been studied in depth (Liu et al., 2017; Baral and Li, 2018; Manotumruksa, 2019). Liu et al. (2017) studied previously proposed POI recommendation methods, and Baral and Li (2018) explored the impact of contextual information only on the PageRank algorithm. Indeed, a careful selection of such factors that takes into account the characteristics of the recommender model can significantly impact the effectiveness of the system. Therefore, the main challenge is to identify which contextual information, or which combination of it, should be incorporated into a POI recommendation system to improve recommendation quality. To create an accurate POI recommender system, we need to be able to answer questions such as: What might be the impact of using temporal information instead of geographical information? What is the impact of using social and categorical information instead of geographical and temporal information?
Moreover, users in an LBSN behave differently; for instance, one user might prefer to visit the same POIs again and again, while another may prefer to discover new, unvisited locations. Therefore, analyzing the impact of users’ behavioral biases on the performance of the models and on the effectiveness of each contextual factor is of high importance. Such an analysis would enable systems to adopt different strategies for different groups of users. For instance, geographical context might be more effective for users who tend to revisit the same POIs regularly, while social context might be more beneficial for users who tend to discover new POIs more often. With this knowledge, a recommender system would be able to employ a separate strategy for each case.

Figure 1. Illustration of a typical check-in in an LBSN.

In this work, we seek to provide a more comprehensive understanding of contextual factors’ impact on a set of representative POI recommendation models. Therefore, our main research questions can be summarized as follows:

  • RQ1: How effectively do different models incorporate multiple contextual factors in recommendation? (cf. Section 5.1)

  • RQ2: How can different evaluation metrics capture the effect of contextual information on various models? (cf. Section …)

  • RQ3: How can we incorporate different contextual factors in linear and non-linear models? (cf. Section 5.3)

  • RQ4: How do models incorporating contextual information perform for users with different behavior? (cf. Section 5.4)

To answer these research questions, we consider several previously proposed baseline models to demonstrate that they lack in-depth consideration of contextual information. We also define different contextual models that can be used in a POI recommendation system and fuse them into the MF and NN recommendation approaches. Next, we analyze the impact of the most influential contexts for POI recommendation by combining different contextual information to create fused models. In particular, we do not aim to justify the effect of any contextual information on any model. Instead, our goal is to characterize, and provide a structured review of, the impact of context on these two specific models. To the best of our knowledge, the impact of users’ behavior on the performance of contextual models in POI recommendation has not been extensively studied before. We intend to fill this gap via a detailed analysis that defines different user-behavior factors, such as geographical distance, check-in density, and exploration. Our aim is to identify the contextual factors that perform best and the impact of combining multiple factors (e.g., social and temporal factors, or spatial and temporal factors). We analyze the results on two different datasets to see how far they generalize. This analysis can help select contextual factors that are suitable for implementation in real systems. Our findings show that neural network-based models are more accurate than matrix factorization. Moreover, in most cases, temporal information has a greater effect than the other factors. Also, combining temporal and geographical information helps the recommendation algorithms achieve better performance. Finally, to enable reproducibility, we have made our code, datasets, and analysis publicly available as open source.


The rest of this paper is organized as follows. We review the relevant studies on POI recommendation in Section 2. Our proposed experimental framework and analysis approach are presented in Section 3, followed by the experimental evaluation in Section 4. Finally, Section 5 presents the discussion and Section 6 concludes the paper.

2. Related Work

This section first discusses similar papers that reproduced, examined, and analyzed multiple POI recommender models. Then, we review the different contextual POI recommendation approaches that have been proposed.

Previously, some researchers reproduced and examined multiple POI recommendation models to analyze and discuss the impact of different contextual information in POI recommendation. The first such study was done by Liu et al. (2017), who reproduced and comprehensively evaluated 12 state-of-the-art POI recommendation models proposed by different researchers. They then compared these models on different evaluation metrics, such as Precision, Recall, and nDCG (Sec. 4.2 gives the formula for each metric). In another work, Stepan et al. (2016) incorporate spatial, temporal, and social information in their recommendation model. They analyze the impact of each type of contextual information on the proposed model by adding them separately; however, they do not consider the impact of combining them. Baral and Li (2018) exploited different contextual information in POI recommendation, employing PageRank as a ranking model. More recently, Sánchez (2019) discusses the impact and importance of evaluating contextual information in a POI recommendation model, proposing evaluation metrics based on different contextual information to reveal its impact on the performance of models. In addition, Rendle et al. (2020) compare the performance of matrix factorization when it uses a dot product versus a multi-layer perceptron (MLP); the latter approach is often referred to as neural collaborative filtering (NCF). They conclude that the simple, traditional dot product achieves better results, and that an MLP is too costly to use for recommendation in production environments. Our work is in line with the earlier-mentioned studies by Liu et al. (2017), Stepan et al. (2016), Baral and Li (2018), and Sánchez (2019). However, there are some major differences between our study and these works. The experimental research of Liu et al. (2017) studies several POI recommendation models and compares their performance in terms of precision, recall, and nDCG. In contrast, we propose two extensible models, one based on Matrix Factorization and the other based on Neural Networks, that can easily incorporate contextual information. By doing so, we are able to analyze the importance of contextual information on both a linear and a non-linear model. Furthermore, the study presented in (Liu et al., 2017) only compares the individual geographical and social components of the models (see Section 5.3 in Liu et al. (2017)). In this paper, we additionally study the categorical context. We also perform a much broader set of comparisons: beyond comparing only the social or geographical components of different models, as done in (Liu et al., 2017), we also study the performance of models based on differences in user behavior. The study of Stepan et al. (2016) considers fusing all different contextual information into one single model, whereas we analyze the impact of different selections of contextual information in creating fused models. The study of Baral and Li (2018) considers different fused models of contextual information, but applies them to a single ranking model based on the PageRank algorithm, which is a linear model; it does not analyze how performance changes when the underlying ranking model changes.
We, however, study the performance achieved by both linear and non-linear models. The study of Sánchez (2019) implements some traditional POI recommendation systems that incorporate contextual information to show that studying and evaluating different contextual information is important in the domain of POI recommendation, without actually performing the analysis. In comparison, our paper provides a detailed analysis of the impact of different contextual factors in POI recommendation. In particular, to evaluate different combinations of contextual information in different fused models, we analyze the impact of different contextual information on two commonly used models in POI recommendation, i.e., matrix factorization and neural networks, representing linear and non-linear models, respectively. To the best of our knowledge, none of the existing recommendation models incorporates these aspects in both a linear and a non-linear manner.
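To make the linear versus non-linear distinction concrete, the following minimal Python sketch contrasts the two scoring functions discussed above. The first is an MF-style dot product. The second is an NCF-style MLP over the concatenated latent factors. The sketch assumes the user and POI latent vectors are already learned. The function names and the tiny hand-set weights are hypothetical placeholders. They are not trained parameters, and they do not reproduce the architecture of any specific paper.

```python
# Hypothetical scoring functions contrasting a linear (MF-style) and a
# non-linear (MLP-style) model of the user-POI relation.

def dot_score(user_vec, poi_vec):
    """Linear score: inner product of user and POI latent factors."""
    return sum(u * p for u, p in zip(user_vec, poi_vec))


def relu(x):
    return max(0.0, x)


def mlp_score(user_vec, poi_vec, w_hidden, b_hidden, w_out, b_out):
    """Non-linear score: one ReLU hidden layer over the concatenated factors."""
    x = list(user_vec) + list(poi_vec)
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    user, poi = [0.5, 1.0], [2.0, -1.0]
    print(dot_score(user, poi))  # 0.0
    # Placeholder (untrained) weights: 2 hidden units over 4 inputs.
    w_hidden = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
    print(mlp_score(user, poi, w_hidden, [0.0, 0.0], [1.0, 1.0], 0.5))  # 4.0
```

The dot product has no parameters beyond the latent factors and is cheap to evaluate for every candidate POI. The MLP must learn its interaction function and runs a full forward pass for each user-POI pair, which is consistent with Rendle et al.’s (2020) observation about its cost in production.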

Since different approaches use these two models in different settings, it is important to know which one is better and in which situation. In what follows, we provide an overview of non-contextual and contextual POI recommendation systems and review relevant previous studies in each category. Table 1 shows a summary of the related studies.

Related work Interaction Geographical Temporal Social Categorical
Ye et al. (2011)
Ference et al. (2013)
Cheng et al. (2012)
Rahmani et al. (2019a)
Aliannejadi et al. (2019)
Zhang and Chow (2013)
Cheng et al. (2016)
Li et al. (2015)
Guo et al. (2019)
Griesner et al. (2015)
Gao et al. (2013)
Li et al. (2017)
Cho et al. (2011)
Bao et al. (2012)
Gibson et al. (1998)
Rahmani et al. (2019b)
Baral and Li (2018)
Stepan et al. (2016)
Cheng et al. ([n.d.])
Baral and Li (2016)
Baral et al. (2016)
Liu and Xiong (2013)
Yin et al. (2013)
Hu and Ester (2013)
Yin et al. (2015)
Xie et al. (2016)
Rahmani et al. (2020)
Chen et al. (2020)
Baral et al. (2019)
Pan et al. (2019)
Zheng et al. (2020)
Lim et al. (2020)
Zhou et al. (2019)
Zhang et al. (2014)
Zhang and Chow (2015)
Manotumruksa et al. (2017)
Manotumruksa et al. (2018)
Manotumruksa et al. (2020)
Chang et al. (2020a)
Lim et al. (2020)
Ma et al. (2018)
Zhou et al. (2019)
Table 1. Summary of POI recommendation papers in the related works in relation to the use of interaction (i.e., check-ins) and contextual information.

2.1. Non-contextual Information

2.1.1. Interaction (I)

Most traditional recommendation systems make recommendations for items such as movies or music using explicit ratings. In an LBSN, explicit ratings are rare. Instead, the check-in frequency (i.e., the interaction of users with POIs without any context such as geographical or temporal information) implicitly reflects users’ preferences for POIs. Hence, to produce POI recommendations, several earlier studies (Berjani and Strufe, 2011; Ye et al., 2011; Mülligann et al., 2011; Davidsson and Moritz, 2011; Manotumruksa et al., 2020) adopted traditional recommendation models to infer users’ personalized preferences for POIs by mining their check-in patterns. With the available check-in information, existing recommendation approaches (e.g., user-based and item-based Collaborative Filtering (CF)) can be employed for POI recommendation in LBSNs by treating POIs as items. Taking this approach, Ye et al. (2010) were the first to provide location recommendation services in LBSNs, proposing user-based and item-based POI recommendation algorithms. The user-based approach assumes that similar users have similar tastes for locations and makes POI recommendations based on the opinions of the most similar neighbors. The item-based approach, on the other hand, assumes that users are interested in POIs similar to those they have already visited.
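The following minimal sketch illustrates user-based CF on implicit check-in counts. It assumes cosine similarity between users’ check-in vectors and a top-k neighborhood. The similarity measure, the weighting, and the names `user_based_scores` and `checkins` are illustrative assumptions. They are not necessarily the exact formulation of Ye et al. (2010).

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse check-in vectors (POI -> count)."""
    num = sum(a[p] * b[p] for p in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def user_based_scores(checkins, target, k=2):
    """Score POIs the target user has not visited, using the k most similar users."""
    neighbors = sorted(
        ((cosine(checkins[target], checkins[u]), u) for u in checkins if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, u in neighbors:
        for poi, count in checkins[u].items():
            if poi not in checkins[target]:
                # Neighbors' check-in counts, weighted by their similarity.
                scores[poi] = scores.get(poi, 0.0) + sim * count
    return scores


checkins = {
    "alice": {"cafe": 3, "gym": 1},
    "bob": {"cafe": 2, "gym": 1, "bar": 4},
    "carol": {"museum": 5},
}
scores = user_based_scores(checkins, "alice")
print(sorted(scores, key=scores.get, reverse=True))  # ['bar', 'museum']
```

An item-based variant would compute cosine similarity between POI columns instead of user rows. It would then score a candidate POI by its similarity to the POIs the user has already visited.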

2.2. Contextual Information

2.2.1. Geographical information (G)

Incorporating geographical information is one of the most important factors that distinguish POI recommendation from conventional item recommendation. Tobler’s First Law of Geography (1970) states that “everything is related to everything else, but near things are more related than distant things”. Indeed, analysis of users’ check-in data shows that a user’s check-ins happen in geographically constrained areas (Ye et al., 2011; Cheng et al., 2012; Sun et al., 2020) and thus follow this general rule. This reflects users’ interest in visiting nearby POIs rather than distant ones. Several studies (Ye et al., 2011; Cheng et al., 2012; Ference et al., 2013; Zhao et al., 2017) employ such geographical information to improve POI recommendation systems. Ye et al. (2011) showed that users’ check-in behavior follows a power-law distribution. They proposed a unified POI recommendation system that incorporates this geographical information to address the data sparsity problem. Ference et al. (2013) took into consideration user preference, geographical proximity, and social information for out-of-town POI recommendation. Cheng et al. (2012, 2016) proposed a Multi-center Gaussian model to capture users’ movement patterns, as they assumed that users’ check-ins happen around several centers. Lian et al. (2014) proposed a POI recommendation approach based on weighted matrix factorization that explicitly models users’ “activity areas” and POIs’ “influence areas”. Li et al. (2015) modeled POI recommendation as a ranking task. They incorporated geographical information into a pair-wise ranking model by defining an extra factor matrix.
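One common way to turn the power-law observation into a score is sketched below in the spirit of Ye et al. (2011), not as their exact model. It assumes the probability of a user checking in at two POIs decays as P(d) = a·d^b with their distance d, and it fits a and b by least squares in log-log space. A candidate POI is then scored by the product of P(d) over the user’s visited POIs. All names, the product over visited POIs, and the minimum-distance guard are assumptions of this sketch.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fit_power_law(distances, probs):
    """Least-squares fit of log p = log a + b * log d; returns (a, b)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - b * mx), b


def geo_score(candidate, visited, a, b, min_km=0.01):
    """Unnormalized score: product of a * d**b over the user's visited POIs."""
    score = 1.0
    for v in visited:
        d = max(haversine_km(candidate, v), min_km)  # guard against d = 0
        score *= a * d ** b
    return score


# Synthetic (distance, probability) pairs generated from a = 0.5, b = -1.2.
a, b = fit_power_law([1, 2, 5, 10], [0.5 * d ** -1.2 for d in [1, 2, 5, 10]])
visited = [(40.0, -74.0), (40.01, -74.0)]
near, far = (40.005, -74.0), (40.5, -74.0)
print(geo_score(near, visited, a, b) > geo_score(far, visited, a, b))  # True
```

Because b is negative, POIs close to the user’s past check-ins receive higher scores. In practice the score would be combined with a collaborative preference score rather than used alone.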

Conversely, Zhang and Chow (2013) argued that geographical information should be considered for each user separately. To this end, they proposed a model based on kernel density estimation of the distribution of distances between the POIs checked in by each user.
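The per-user idea can be sketched as a one-dimensional Gaussian KDE over the pairwise distances between a user’s visited POIs. A candidate is scored by the average estimated density of its distances to those POIs. This is a minimal sketch under stated assumptions. Coordinates are assumed to be already projected to a plane, so Euclidean distance is meaningful. The bandwidth follows Silverman’s rule of thumb. The function names are hypothetical, and the estimator is not necessarily the exact one used by Zhang and Chow (2013).

```python
import math
from itertools import combinations
from statistics import pstdev


def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    sigma = pstdev(samples)
    return 1.06 * sigma * len(samples) ** (-1 / 5) if sigma > 0 else 1.0


def gaussian_kde(x, samples, h):
    """Density estimate at x from 1-D samples with a Gaussian kernel of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def personalized_geo_score(candidate, visited):
    """Mean KDE density of the distances from a candidate to each visited POI."""
    if len(visited) < 2:
        raise ValueError("need at least two visited POIs to estimate distances")
    # The user's own distance distribution: all pairwise distances between visits.
    samples = [math.dist(p, q) for p, q in combinations(visited, 2)]
    h = silverman_bandwidth(samples)
    return sum(gaussian_kde(math.dist(candidate, v), samples, h)
               for v in visited) / len(visited)


visited = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # projected coordinates, e.g. km
near = personalized_geo_score((0.5, 0.5), visited)
far = personalized_geo_score((10.0, 10.0), visited)
print(near > far)  # True
```

Unlike a single power law shared by all users, each user’s density here is estimated only from that user’s own check-ins.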

Rahmani et al. (2019a) modeled geographic information from two different perspectives: the user’s and the location’s. They showed that the recommendation model’s performance could be improved by incorporating the impact of neighboring POIs. Similarly, Yuan et al. (2016) addressed the data sparsity problem by assuming that users tend to show more interest in POIs that are geographically close to the ones they have already visited. Guo et al. (2019) proposed a location neighborhood-aware weighted matrix factorization model that exploits the location perspective by incorporating the geographical distance among POIs. More recently, Aliannejadi et al. (2019) proposed a two-phase collaborative ranking algorithm for POI recommendation that takes into account the geographical information of POIs located in the same neighborhood. Manotumruksa et al. (2017) capture the complex relations between users and POIs using a deep recurrent collaborative filtering method. They apply a pairwise ranking function to capture check-ins as sequences of observed feedback, using a multi-layer perceptron and a recurrent neural network architecture. In particular, their method can learn complex user and POI features using element-wise and dot products as well as the concatenation of latent factors.

Chang et al. (2020a) proposed a graph neural network-based method inspired by the idea that consecutive check-ins at two POIs indicate a greater geographical influence between them. Their model incorporates user preferences using a user-POI graph and geographical influences using a POI-POI graph, whose edges are weighted by the frequency of users’ consecutive visits.

Ma et al. (2018) address the challenge of modeling complex user-POI interactions from sparse implicit feedback using an autoencoder-based approach named SAE-NAD, which combines a self-attentive encoder (SAE) and a neighbor-aware decoder (NAD). The self-attentive encoder adopts a multi-dimensional attention mechanism to differentiate between users’ degrees of preference. The neighbor-aware decoder incorporates geographical context so that users are more likely to reach POIs that are similar and close to the POIs they have checked in at. This is achieved by combining the inner product of POI embeddings with a radial basis function (RBF) kernel.
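The neighbor-aware idea can be illustrated with a simplified, hypothetical scoring rule. An unvisited POI is scored by its embedding inner product with each visited POI, weighted by an RBF kernel exp(-γ·d²) on their geographical distance d. This sketch assumes given embeddings and planar coordinates. It is not the exact SAE-NAD decoder, and `gamma` and the function names are illustrative.

```python
import math


def rbf(d, gamma=1.0):
    """Radial basis function kernel on a distance: exp(-gamma * d**2)."""
    return math.exp(-gamma * d * d)


def neighbor_aware_scores(visited, coords, emb, gamma=1.0):
    """Score unvisited POIs by embedding similarity to visited POIs,
    weighted by an RBF kernel on the geographical distance between them."""
    scores = {}
    for j in coords:
        if j in visited:
            continue
        scores[j] = sum(
            rbf(math.dist(coords[i], coords[j]), gamma)
            * sum(a * b for a, b in zip(emb[i], emb[j]))  # inner product
            for i in visited
        )
    return scores


coords = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (3.0, 0.0)}
emb = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [1.0, 1.0]}
scores = neighbor_aware_scores({"a"}, coords, emb)
print(scores["b"] > scores["c"])  # True
```

With identical embeddings, only geography separates the candidates. The nearby POI "b" keeps almost its full similarity, while the distant "c" is suppressed by the kernel.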

2.2.2. Temporal information (T)

Temporal constraints can result in specific user check-in patterns. Users’ temporal check-in behaviors in LBSNs typically exhibit a periodic pattern. For instance, users are observed to visit places around their office or home on weekdays and to spend time in shopping malls on weekends. Check-ins also show hourly patterns: check-ins at restaurants typically happen during lunchtime, whereas check-ins at nightclubs naturally happen at night. Capturing such temporal information is of vital importance for POI recommendation.

Many researchers have previously studied the effect of temporal information on users’ preferences, proposing different models to improve POI recommendation accuracy (Ding and Li, 2005; Yuan et al., 2013; Zhao et al., 2017; Liu et al., 2017). Griesner et al. (2015) proposed an approach to integrate temporal information into weighted matrix factorization. Their approach adjusts the influence area of each POI by accounting for the time a user spends going from the current POI to the next. Gao et al. (2013) divided users’ check-ins into hourly time slots. Next, to train a user-based CF model, they computed the similarity between users based on the temporal overlap of their visits to the same POIs. Zhao et al. (2016) proposed a latent ranking method that explicitly models the interactions between users, POIs, and time. In particular, they built upon a ranking-based pairwise tensor factorization framework.
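The time-slot similarity of Gao et al. (2013) can be approximated as follows. Each user’s check-ins are bucketed by hour of day into a sparse count vector keyed by (hour, POI), and users are compared with cosine similarity. This is a minimal sketch. The choice of cosine similarity and the helper names are assumptions, not necessarily the measure used in the original paper.

```python
import math
from collections import Counter
from datetime import datetime


def slot_vector(checkins):
    """Count a user's check-ins per (hour-of-day slot, POI) pair."""
    return Counter((ts.hour, poi) for poi, ts in checkins)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[k] * b[k] for k in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


u1 = [("cafe", datetime(2021, 5, 3, 8, 10)), ("cafe", datetime(2021, 5, 4, 8, 45))]
u2 = [("cafe", datetime(2021, 6, 1, 8, 30))]
u3 = [("cafe", datetime(2021, 6, 1, 20, 0))]
print(cosine(slot_vector(u1), slot_vector(u2)))  # 1.0
print(cosine(slot_vector(u1), slot_vector(u3)))  # 0.0
```

Users u1 and u3 visit the same cafe, but at different hours, so they share no (hour, POI) key. This is exactly the temporal distinction that a plain user-POI matrix would miss.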

Li et al. (2017) proposed a time-aware personalized model adopting a fourth-order tensor factorization-based ranking, enabling the model to capture short-term and long-term preferences. Yao et al. (2016) matched users’ temporal regularity with the popularity of POIs to create a factorization-based algorithm. Also, Yuan et al. (2013) preserved the similarity of personal preferences in consecutive time slots by considering different latent variables at each time slot for each user. Moreover, Manotumruksa et al. (2018) proposed a contextual attention recurrent architecture model called CARA based on the success of recurrent neural network (RNN) models in modeling sequential patterns. Their model incorporates contextual information related to users’ sequence of check-ins (e.g., time of the day) to effectively capture the users’ dynamic preferences.

2.2.3. Social Information (S)

It has been observed that a user’s movements can be socially influenced by other users. The effect of social information has been studied to enhance POI recommendation, based on the assumption that friends in LBSNs share more common interests than non-friends. Modeling social information has been explored in traditional recommendation systems (Tang et al., 2013; Jiang et al., 2012), and most existing work on social POI recommendation is inspired by ideas taken from traditional recommendation systems. The analysis of Cho et al. (2011) shows that around 10–30% of human movements can be socially influenced. Also, using the Gowalla dataset, they show that recommendation accuracy improves when considering the influence of friendships on users’ mobility (estimated to be around 61%) and the influence of mobility on new friendships (24%).

Qiao et al. ([n.d.]) present SocialMix, a hybrid model that considers (i) users’ familiarity and (ii) preference similarity for POI recommendation. To calculate the familiarity score between users, they use three features: the number of mutual friends, Jaccard similarity (based on users’ friend lists), and cosine similarity (based on users’ check-in histories). The weight of each feature is determined through maximum likelihood estimation. The preference similarity, which captures how similar users are in terms of the POIs they visit, is calculated as the cosine similarity of the user-location check-in data.
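The three SocialMix familiarity features listed above can be computed as in the following sketch. The paper estimates the feature weights by maximum likelihood. Here they are fixed, hypothetical placeholders, and the helper names are illustrative.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two sets (e.g., friend lists)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two sparse check-in vectors (POI -> count)."""
    num = sum(a[k] * b[k] for k in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def familiarity(u, v, friends, checkins, weights=(0.1, 0.5, 0.4)):
    """Weighted sum of the three familiarity features (placeholder weights)."""
    features = (
        len(friends[u] & friends[v]),       # number of mutual friends
        jaccard(friends[u], friends[v]),    # friend-list overlap
        cosine(checkins[u], checkins[v]),   # check-in history similarity
    )
    return sum(w * f for w, f in zip(weights, features))


friends = {"u": {"a", "b", "c"}, "v": {"b", "c", "d"}}
checkins = {"u": {"cafe": 2}, "v": {"cafe": 5}}
print(familiarity("u", "v", friends, checkins))
```

The mutual-friend count is unbounded while the other two features lie in [0, 1]. This is one reason the weights need to be estimated from data rather than set uniformly.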

Zeng et al. (2018) create vectors representing user check-ins in 24 hourly time slots and then calculate user similarities as the cosine similarity of these vectors. Conversely, Ye et al. (2010) showed on a dataset of Foursquare check-ins that 96% of users share less than 10% of commonly visited places, and that 87% share nothing at all. Nevertheless, to incorporate social information, they proposed a friend-based CF method that recommends POIs to users based on commonly visited POIs. Moreover, Gao et al. (2013) assumed that people share their friends’ check-in activities. Their model uses the Hierarchical Pitman-Yor (HPY) language model to represent check-in patterns, with effective results.

2.2.4. Categorical information (C)

Category information of POIs provides a strong indication of the activities that can be performed at them, and users have been shown to have distinct biases toward the categories of POIs they check in to. In LBSNs, POI categories are typically organized in a hierarchical category tree. Foursquare offers a 3-level category hierarchy: the top-most level includes categories such as nightlife and food, while the lowest level contains categories such as bars, pubs, Japanese food, or cafes. Considering such information in the analysis of check-in data can reveal users’ preferences for the corresponding categories. Liu et al. (2013) propose a category-aware recommendation method that models users’ preference transitions among POIs over the POIs’ categories. They then recommend POIs to a user based on the categories that the user prefers. Bao et al. (2012) model the preferences of users based on their social opinions using Hypertext Induced Topic Search (HITS) (Gibson et al., 1998). HITS regards an individual’s visit to a POI as a directed link from the user to that POI. Each user has a hub score denoting their knowledge of POIs, and each location is associated with an authority score indicating its interest level. The goal is to obtain a mutually reinforcing relationship between a user’s knowledge and the interest level of a POI. The users’ location history is categorized according to the POI’s type (such as shopping or restaurants). A user-location matrix is used to identify the local experts who have a higher affinity toward a POI category, and such experts’ social opinions are used in the recommendation. More recently, Rahmani et al. (2019b) proposed a category-aware POI embedding model that considers both users’ sequences of check-ins and the category information of POIs. To this end, they made use of Word2Vec (Mikolov et al., 2013) to generate a high-dimensional representation of the sequences of check-ins and POI categories.
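The mutually reinforcing hub and authority scores can be sketched with a standard HITS power iteration on the user-POI bipartite graph. To mirror the per-category local-expert idea, the iteration is run on the check-ins of a single category. Using visit counts as edge weights, L2 normalization, and a fixed iteration count are assumptions of this sketch, not details taken from Bao et al. (2012).

```python
import math


def hits(visits, iterations=50):
    """HITS on a user -> POI bipartite graph.

    visits: dict mapping user -> dict mapping POI -> visit count (edge weight).
    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pois = {p for v in visits.values() for p in v}
    hub = {u: 1.0 for u in visits}
    auth = {}
    for _ in range(iterations):
        # Authority of a POI: weighted sum of the hub scores of its visitors.
        auth = {p: 0.0 for p in pois}
        for u, v in visits.items():
            for p, w in v.items():
                auth[p] += w * hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {p: a / norm for p, a in auth.items()}
        # Hub of a user: weighted sum of the authority scores of visited POIs.
        hub = {u: sum(w * auth[p] for p, w in v.items()) for u, v in visits.items()}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


category = {"p1": "food", "p2": "food", "p3": "food", "p4": "shopping"}
visits = {"expert": {"p1": 5, "p2": 4, "p3": 3}, "novice": {"p1": 1, "p4": 2}}
# Per-category analysis: keep only the "food" check-ins.
food_visits = {u: {p: c for p, c in v.items() if category[p] == "food"}
               for u, v in visits.items()}
hub, auth = hits(food_visits)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # expert p1
```

Within the food category, the user with broad and frequent visits emerges as the local expert (highest hub score). The POI that this expert visits most becomes the top authority.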

2.2.5. Fusion Models

Contextual information has proven to be beneficial in improving the performance of POI recommendation models (Mazumdar et al., 2020; Chang et al., 2020b; Wang et al., 2020; Davtalab and Alesheikh, 2021). A class of studies has tried to incorporate different contextual information into a single model (Liu et al., 2020; Yu et al., 2020; Ma and Gan, 2020; Si et al., 2019). Baral and Li (2016) proposed a multi-aspect POI recommendation system using geographical, temporal, categorical, and social information. They model a graph in which each node corresponds to a location, and each user-time tuple is regarded as an attribute of the location node that shows the transition of a user between locations. For recommendation, they use the Topic-Sensitive PageRank model (Haveliwala, 2003), an extension of the PageRank algorithm, to rank each location for each user. Liu and Xiong (2013) propose a topic- and location-aware POI recommendation system that exploits textual and contextual information. They use an aggregated Latent Dirichlet Allocation (LDA) model to learn users’ interest topics and find interesting POIs by mining the textual information associated with them. They then utilize this contextual information for POI recommendation in combination with probabilistic matrix factorization.

Zhang and Chow (2015) proposed GeoSoCa, which incorporates three different contextual factors, namely, geographical, social, and categorical information. GeoSoCa models geographical influence based on a check-in probability distribution over a two-dimensional space using a kernel density estimation (KDE) method. To model the social and categorical influences, GeoSoCa considers the check-in frequency of a user’s friends at a POI and the power-law distribution of all users’ categorical check-in frequencies, respectively.
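Topic-Sensitive PageRank, used by Baral and Li (2016) as described above, biases the random jump toward a chosen subset of nodes. The sketch below shows a minimal personalized PageRank on a location transition graph, with the jump restricted to locations of interest for one user. The graph, damping factor, and function name are assumptions, and the user-time node attributes of Baral and Li (2016) are omitted.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iterations=100):
    """Power iteration for PageRank with a biased (personalized) teleport vector.

    edges: dict node -> list of out-neighbors (location transitions).
    personalization: dict node -> non-negative weight for the random jump.
    """
    nodes = (set(edges) | {v for outs in edges.values() for v in outs}
             | set(personalization))
    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            outs = edges.get(src, [])
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling node: send its mass back through the teleport vector.
                for n in nodes:
                    new[n] += damping * rank[src] * teleport[n]
        rank = new
    return rank


edges = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
rank = personalized_pagerank(edges, {"a": 1.0})
print(sorted(rank, key=rank.get, reverse=True))  # ['a', 'b', 'c', 'd']
```

Location "d" links into the graph, but it is neither reachable nor part of the teleport set, so its score stays at zero. Changing the personalization vector per user yields a different ranking for each user over the same graph.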

Baral et al. (2016) proposed a multi-aspect model based on weighted matrix factorization that incorporates geographical, temporal, categorical, and social aspects. The intuition behind using category information is that locations of the same category may receive similar visits. To model temporal information, they treat check-ins made at the same time on a POI as its temporal popularity. To model geographical information, they consider the user's activity area (or influence area), defined as the region where the user frequently appears. Yin et al. (2013) use POIs' content information, such as item tags and category keywords, to link spatial items with similar content. Their model learns each user's interests and the local preferences of each city by capturing item co-occurrence patterns and exploiting item contents to produce top item recommendations. Hu and Ester (2013) used topic modeling to exploit the spatial and textual aspects of user posts. Cheng et al. ([n.d.]) considered users' movement constraints, i.e., moving around a localized region, and proposed a successive personalized POI recommendation model using a matrix factorization method that embeds personalized Markov chains and localized regions.

Yin et al. (2015) proposed a probabilistic generative model that exploits geographical, temporal, word-of-mouth, and semantic information. Zhang et al. (2014) proposed a fusion method that considers geographical, temporal, and social influences. They model the geographical influence as a two-dimensional check-in probability distribution estimated with KDE, and the social influence based on the friendship relations of users. The temporal influence, in contrast, is modeled by mining location sequences on a location-location transition graph built from all users and dividing the transition count between every pair of POIs by the outgoing count of each node. Also, Xie et al. (2016) used geographical, temporal, and semantic aspects in their heterogeneous graph embedding model. They use a time decay method, which they claim is an efficient predictor of a user's latest preferences. Moreover, Rahmani et al. (2020) proposed a joint model of geographical and temporal information: they first find users' activity centers for different temporal states and then recommend to users the unvisited POIs located in these areas. Lim et al. (2020) proposed a graph attention network using users' spatial-temporal-preference data. Their method exploits personalized user preferences and explores new POIs in global spatial-temporal-preference neighborhoods at the same time, while allowing users to learn selectively from other users. In addition, by using random walks as a masked self-attention option, their method can leverage the spatial-temporal-preference graph and find new higher-order POI neighbors during exploration. Zhou et al. (2019) proposed an adversarial model called APOIR that learns the distribution of users' latent preferences over POIs. APOIR includes two major components: a recommender and a discriminator. The recommender suggests the unvisited POIs that have the highest probability under the learned distribution. The discriminator distinguishes the recommended POIs from the true check-ins and provides gradients that guide and improve the recommender within a reward framework.

3. Evaluation Framework

In this section, we describe how we designed our experiments on contextual information for POI recommendation to answer the research questions posed in the introduction (Section 1). We outline what can be learned from each experiment, focusing on the different types of contextual information and how their differences can be quantified. To this end, we propose an analytical and experimental evaluation framework in which we examine the effect of each piece of contextual information. An important advantage of this framework is that it can fuse any contextual model, whether proposed by us or by others.

Making use of contextual information involves two tasks. In the first task, to show that previously proposed contextual information can be fused into our experimental framework, we evaluate the contextual information incorporated into existing POI recommendation systems (referred to as baselines in this paper). In the second task, to show that new contextual models can also be incorporated, we model each piece of contextual information and fuse them in different ways within our analytical framework (referred to as the proposed models in this paper). Therefore, we consider two types of models: (i) three state-of-the-art, well-known context-aware POI recommendation algorithms, which we analyze within our analytical and experimental evaluation framework, and (ii) different contextual models that we propose and apply within the same framework. For (i), we selected GeoSoCa (Zhang and Chow, 2015), FCFKDEAMC (or LORE) (Zhang et al., 2014), and PFMMGM (Cheng et al., 2012), which are well known in the community for their ability to exploit various contextual information; we demonstrate how their contextual models are incorporated into our experimental framework. For (ii), we use matrix factorization (linear) and neural network (non-linear) approaches to build a contextual model for each piece of contextual information (geographical, social, temporal, and categorical); these contextual models are then fused by aggregation into the matrix factorization and neural network models. Moreover, different users exhibit different behaviors; for instance, some users prefer to visit nearby POIs rather than distant ones.
Such behavioral differences affect both the modeling and the performance of the different types of contextual information, so we conduct additional experiments to study the impact of users' behavior. To this end, we first categorize users by three behavioral parameters in visiting POIs, namely geographical distance, check-in density, and exploration. We then study the performance of each model with the different contextual models for each user category and compare the effect of contextual information across these user groups. In the following, we first introduce the baseline algorithms and the contextual models, and then present our analytical framework.

3.1. Baseline Algorithm Analysis

Here, we describe the three selected baseline algorithms that can incorporate different contextual information to improve the performance of the POI recommendation. These are GeoSoCa (Zhang and Chow, 2015), FCFKDEAMC (Zhang et al., 2014), and PFMMGM (Cheng et al., 2012).

GeoSoCa performs recommendation by exploiting geographical (Geo), social (So), and categorical (Ca) correlations among users and POIs. GeoSoCa models geographical influence based on a check-in probability distribution over a two-dimensional space using a kernel density estimation (KDE) method. To model the social influence, GeoSoCa considers the check-in frequency of users’ friends on a POI to compute the social check-in frequency or rating between users and POIs. GeoSoCa computes a power-law distribution over users’ categorical check-in frequency to incorporate the categorical information. FCFKDEAMC fuses temporal information based on sequential patterns (AMC), two-dimensional check-in probability distributions as geographical (KDE) information, and social (FCF) information. To model the geographical influence, similar to GeoSoCa, FCFKDEAMC considers a two-dimensional check-in probability distribution using KDE. It models the social influence using a friend-based collaborative filtering (FCF) approach that allows making POI recommendations based on similar friends. Moreover, FCFKDEAMC incrementally mines location sequences of all users and represents the sequential patterns as a dynamic location-location transition graph. Then, transition probabilities can be dynamically calculated by dividing the transition counts on the location-location transition graph by the outgoing counts of each node. PFMMGM fuses the probabilistic factorization model (PFM) and the multi-center Gaussian model (MGM), which models user check-ins based on a user’s geographical centers to predict the probability of a user visiting a POI.

We have selected these methods for our experiments mainly for two reasons: (i) they have successfully modeled all the contextual factors that we plan to study in this work, and (ii) they have been recognized as state-of-the-art by a significant number of works in the literature (Liu et al., 2017).

Notation Information
M Matrix Factorization (interaction-based)
N Neural Network (interaction-based)
G Geographical Information
T Temporal Information
S Social Information
C Categorical Information
MG Matrix Factorization and Geographical Information
GT Geographical and Temporal Information
GS Geographical and Social Information
GC Geographical and Categorical Information
TS Temporal and Social Information
TC Temporal and Categorical Information
SC Social and Categorical Information
GTS Geographical, Temporal, and Social Information
GTC Geographical, Temporal, and Categorical Information
GSC Geographical, Social, and Categorical Information
TSC Temporal, Social, and Categorical Information
GTSC Geographical, Temporal, Social, and Categorical Information
Table 2. Notation of the proposed models. The Notation column shows the contextual information that is part of each proposed model, and the Information column shows which pieces of information the notation captures. Interaction-based means no context, i.e., only check-in information is considered.

Our experiments on the baseline models are defined as follows. We carry out two experiments on the baseline models to consider both the impact of different contextual information and the choice of evaluation metrics. Table 2 shows the notation we use to specify the different combinations of contexts within the baselines. The first column (Notation) lists the contextual information that forms part of the proposed models (see Section 3.2), and the second column (Information) shows which pieces of information each notation captures. To refer to the baseline models, we use the Baseline-(Contexts) notation, where Baseline is the model name (i.e., GeoSoCa, FCFKDEAMC, or PFMMGM) and Contexts indicates which contexts are included in the model. For instance, GeoSoCa restricted to its geographical and categorical parts is denoted GeoSoCa-(GC). We use this notation to refer to the baseline combinations in the experiments (Section 5).

3.1.1. Experiment 1: Focus on Contexts

In the first experiment, to answer RQ1, we study and analyze the impact of each piece of contextual information separately. We also consider different combinations to evaluate the joint use of contextual information in the recommendation process. To this end, we report each model's results when keeping only the mentioned contextual information and removing the rest. This helps us better understand how each contextual factor, as well as their combinations, affects each model. Since the four pieces of contextual information can be combined in up to 15 non-empty ways, this allows us to characterize the impact of the different contextual factors more carefully.
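The combination space mentioned above can be enumerated directly. The short sketch below (the function name `context_combinations` is ours, for illustration) lists every non-empty subset of the four context letters; with the letters ordered G, T, S, C it reproduces the context rows of Table 2 in the same order.

```python
from itertools import combinations

CONTEXTS = "GTSC"  # geographical, temporal, social, categorical

def context_combinations(contexts=CONTEXTS):
    """All non-empty subsets of the given contexts, smallest first."""
    return ["".join(c)
            for r in range(1, len(contexts) + 1)
            for c in combinations(contexts, r)]

combos = context_combinations()
print(len(combos))  # 15
print(combos)
```

Dropping the categorical letter, as is necessary for Gowalla, leaves `context_combinations("GTS")`, i.e. seven combinations.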

3.1.2. Experiment 2: Focus on Metrics

In this experiment, we aim to answer RQ2 by studying the results of the different evaluation metrics used in our analytical framework to evaluate the proposed models and compare their use of contextual information. To this end, we consider three of the most commonly used evaluation metrics in the information retrieval and recommender systems domains, namely Precision, Recall, and nDCG. We then evaluate the effect of each piece of contextual information, and of their combinations, using these metrics. This experiment shows how different evaluation metrics capture the effect of contextual information on various models. For example, if Precision shows that a model using geographical information outperforms one using social information, does Recall agree?

Terms Definition
The set of users in the dataset
The set of POIs in the dataset
The set of all POI categories
A sample user in the set of users
A sample POI in the set of POIs
The dimension of latent features
The user latent-factor matrix of matrix factorization
The POI latent-factor matrix of matrix factorization
The user latent-factor matrix of neural network
The POI latent-factor matrix of neural network
The user-POI frequency matrix
The social links matrix
The categorical popularity for user on POI
The number of check-ins of user ’s friends on POI
The frequency of user visiting the POIs that belong to category
The check-in frequency of all users on POI in category
The number of check-ins by user
The total number of unique visited POIs by user
Table 3. Definition of terms used in the paper.

3.2. Contextual Model Analysis

We selected matrix factorization as the linear approach to model the relation between users and POIs. Neural networks are instead selected as the non-linear way of capturing such a relation. Different contextual information can be fused and used in combination with matrix factorization and neural networks. This section presents our proposed linear and non-linear models, followed by the contextual information models. Table 3 presents the terms and notations used in the rest of this paper.

3.2.1. Matrix Factorization

To model users' preferences based on check-in data, we apply matrix factorization, a widely used linear model (Liu et al., 2017; Yuan et al., 2013; Rahmani et al., 2019a). Since user feedback in POI recommendation is implicit, we use a probabilistic factor model that handles check-in data efficiently by defining a Poisson distribution on the check-in frequency (Cheng et al., 2012, 2016).

Figure 2. Matrix Factorization Model.

As shown in Fig. 2, the goal of matrix factorization is to find two low-rank matrices, the user latent-factor matrix and the POI latent-factor matrix, whose product approximately equals the user-POI frequency matrix. The predicted probability of a user visiting a POI is determined by:


which can be obtained by solving the following optimization problem, which places Gamma distributions as priors on the latent matrices while defining a Poisson distribution on the frequency:


where the two hyperparameters are the parameters of the Gamma distributions, and the remaining term is a constant.
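The probabilistic factor model above combines a Poisson likelihood on check-in counts with priors on the latent matrices. As a simplified sketch, the code below fits only the Poisson-likelihood part, using the Lee-Seung multiplicative updates for the generalized KL divergence, which are equivalent to Poisson maximum likelihood. The priors are deliberately omitted, and the function name, initialization, and iteration count are our assumptions, not the configuration used in the paper.

```python
import numpy as np

def poisson_mf(F, k=10, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative user-POI frequency matrix F ~= U @ L.T.

    Multiplicative updates minimize the generalized KL divergence between
    F and U @ L.T, i.e. they maximize a Poisson likelihood on the counts.
    Priors on U and L are not modeled in this sketch.
    """
    rng = np.random.default_rng(seed)
    n_users, n_pois = F.shape
    U = rng.random((n_users, k)) + 0.1   # user latent factors
    L = rng.random((n_pois, k)) + 0.1    # POI latent factors
    for _ in range(n_iter):
        R = F / (U @ L.T + eps)          # observed / predicted counts
        U *= (R @ L) / (L.sum(axis=0) + eps)
        R = F / (U @ L.T + eps)
        L *= (R.T @ U) / (U.sum(axis=0) + eps)
    return U, L

F = np.random.default_rng(1).poisson(2.0, size=(20, 15)).astype(float)
U, L = poisson_mf(F, k=3)
scores = U @ L.T  # predicted check-in intensity for every user-POI pair
```

The multiplicative form keeps both factor matrices nonnegative and never increases the divergence, which is why it is a common choice for count data.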

3.2.2. Neural Network

To model collaborative filtering (CF) with a neural network, as shown in Fig. 3, we consider a multi-layer representation of user-POI interactions, similar to He et al. (2017).

Figure 3. Neural Network Model.

The bottom input layer consists of two feature vectors that describe the user and the POI, respectively. Above the input layer is the embedding layer, a fully connected layer that projects the sparse representation onto a dense vector. The user and POI embeddings are then fed into a multi-layer neural architecture that maps the latent vectors to prediction scores. The final output layer produces the predicted score, which indicates whether the user will visit the POI:


where the user and POI latent factor matrices parameterize the embeddings, and the remaining symbol denotes the model parameters of the interaction function. Since the interaction function is defined as a multi-layer neural network, it can be formulated as:


where the outer and inner functions denote the mapping functions of the output layer and of the x-th neural layer, respectively, for a network with the given total number of neural layers.
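To make the layered formulation concrete, the following is a minimal NumPy sketch of one forward pass, following the MLP variant of He et al. (2017) in which the user and POI embeddings are concatenated. The embedding size of 30 matches the experimental setup in Section 4.3. The hidden sizes `[32, 16]`, the ReLU activations, the random weights, and the function names are hypothetical choices made only for illustration; training is not shown.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_score(P, Q, layers, h_out, user, poi):
    """Predicted visit probability for one (user, POI) pair.

    P, Q: user and POI embedding matrices (one latent vector per row).
    layers: list of (W, b) pairs for the hidden ReLU layers.
    h_out: weight vector of the sigmoid output layer.
    """
    z = np.concatenate([P[user], Q[poi]])  # embedding lookup + concatenation
    for W, b in layers:
        z = relu(W @ z + b)                 # x-th neural layer
    return float(sigmoid(h_out @ z))        # output layer, in (0, 1)

rng = np.random.default_rng(0)
n_users, n_pois, dim = 5, 8, 30
hidden = [32, 16]                           # hypothetical hidden sizes
P = rng.normal(scale=0.1, size=(n_users, dim))
Q = rng.normal(scale=0.1, size=(n_pois, dim))
sizes = [2 * dim] + hidden
layers = [(rng.normal(scale=0.1, size=(o, i)), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]
h_out = rng.normal(scale=0.1, size=hidden[-1])
print(mlp_score(P, Q, layers, h_out, user=0, poi=3))
```

The sigmoid output makes the score directly interpretable as the probability that the user visits the POI, which is what the implicit-feedback training with negative samples in Section 4.3 relies on.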

With the above two models, we can take into account all the different types of context that we have seen in Section 1. In what follows, we provide a formal definition of these contextual factors.

3.2.3. Geographical Information

Different models use geographical information to generate more accurate recommendations for each user based on the locations of the POIs and of the user. Most of them are based on the distance between users and POIs. Some studies (Zhang and Chow, 2015; Zhang et al., 2014) show that modeling geographical information leads to more accurate recommendations, since each user exhibits different mobility behavior. Here, we model a check-in probability distribution over a two-dimensional space using kernel density estimation (Zhang and Chow, 2015). We then add a user-dependent variable to the kernel to make the geographical modeling more personalized. The geographical probability of a user visiting a new POI is then estimated from its location on the check-in probability distribution:


where the normalizing term is the number of check-ins by the user, the frequencies are taken from the user-POI check-in frequency matrix, and the kernel is a normal kernel function with a user-dependent variable.
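The estimator above can be sketched as a frequency-weighted two-dimensional Gaussian KDE over the locations a user has visited. In this sketch, the "user-dependent variable" is interpreted as a per-user bandwidth chosen with Silverman's rule of thumb. That choice, the treatment of latitude/longitude as planar coordinates, and the function names are all our assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def user_bandwidth(coords, min_h=1e-3):
    """Hypothetical per-user bandwidth: Silverman's rule for 2-D data."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    sigma = np.sqrt(coords.var(axis=0).mean()) if n > 1 else 0.0
    return max(sigma * n ** (-1.0 / 6.0), min_h)

def geo_probability(target, coords, freqs, h):
    """Frequency-weighted 2-D Gaussian KDE of a user's check-ins at `target`.

    coords: (n, 2) locations the user visited; freqs: check-in counts there.
    Dividing by the user's total number of check-ins keeps it a density.
    """
    coords = np.asarray(coords, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    sq_dist = ((coords - np.asarray(target, dtype=float)) ** 2).sum(axis=1)
    kernel = np.exp(-sq_dist / (2 * h ** 2)) / (2 * np.pi * h ** 2)
    return float((freqs * kernel).sum() / freqs.sum())

coords = [[0.0, 0.0], [0.01, 0.01], [0.02, 0.0]]
freqs = [5, 2, 1]
h = user_bandwidth(coords)
print(geo_probability([0.0, 0.0], coords, freqs, h))  # near the user's area
print(geo_probability([1.0, 1.0], coords, freqs, h))  # far away, close to 0
```

A per-user bandwidth lets users who roam widely get a smoother, wider density, while users who stay in one neighborhood get a sharply peaked one.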

3.2.4. Social Information

Zhang and Chow (2015) show that users' social check-ins follow a power-law distribution, and we follow the same approach to model social information. The social-link check-in frequency of a user on a POI is the check-in frequency on that POI by the user's friends. We then use the cumulative distribution function of this power-law distribution as the social information in recommendations:


where is estimated by the check-in matrix and social link matrix as follows:


in which the first quantity is the social check-in frequency (or rating) of the user's friends on the POI, and the second is the aggregated check-in frequency of the user's friends on that POI, given by:


where the binary social-link matrix indicates whether a social link exists between two users.
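Putting the pieces of this subsection together, the sketch below aggregates friends' check-ins with the binary social-link matrix and then maps each aggregated frequency through a fitted power-law CDF. It assumes a plain (unweighted) sum over friends, the smallest positive frequency as the power law's lower cut-off, and the standard continuous maximum-likelihood estimate of the exponent. These are our assumptions, and the function names are hypothetical.

```python
import numpy as np

def social_frequency(S, F):
    """X[u, l] = sum over v of S[u, v] * F[v, l] (friends' check-ins on l).

    S: binary (users x users) social-link matrix; F: (users x POIs) counts.
    """
    return np.asarray(S, dtype=float) @ np.asarray(F, dtype=float)

def powerlaw_cdf(X):
    """Fit a continuous power law to the positive entries of X by maximum
    likelihood and evaluate its CDF elementwise (0 where X <= x_min)."""
    X = np.asarray(X, dtype=float)
    pos = X[X > 0]
    if pos.size == 0:
        return np.zeros_like(X)
    x_min = pos.min()
    log_sum = np.log(pos / x_min).sum()
    alpha = 1.0 + pos.size / max(log_sum, 1e-12)  # MLE of the exponent
    cdf = 1.0 - (np.maximum(X, x_min) / x_min) ** (1.0 - alpha)
    return np.where(X >= x_min, cdf, 0.0)

S = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
F = [[1, 0, 3], [2, 1, 0], [0, 4, 1]]
X = social_frequency(S, F)   # user 0's friends are users 1 and 2
print(X[0])                  # [2. 5. 1.]
print(powerlaw_cdf(X)[0])
```

Using the CDF rather than the raw count compresses the heavy tail, so a POI with very many friend check-ins does not dominate the fused score.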

3.2.5. Temporal Information

To model temporal information, we employ an additive Markov chain (Zhang et al., 2014) that exploits users' sequential transition patterns between POIs. The temporal probability of a user visiting a POI is based on the transition probabilities between all of the user's visited POIs and the target POI, that is:


where the count term is the number of POIs in the user's visit sequence, and the sequential probability of visiting a new location conditioned on that sequence of visits is derived from the additive Markov chain as:


in which the first factor is the sequence decay weight with a decay-rate parameter, and the transition probability from an earlier POI to the target POI is the number of transitions between the two POIs divided by the outgoing count of the earlier POI as a transition predecessor to other POIs.
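The following sketch shows how such a score could be computed. It first builds the location-location transition counts from all users' time-ordered sequences, then combines each earlier POI's transition probability to the target with a decay weight `2 ** (-alpha * (n - i))`, so recent visits count more. The exponential form of the weight follows our reading of LORE. Normalizing by the total weight, the value `alpha=0.05`, and the function names are assumptions made for this sketch.

```python
from collections import defaultdict

def build_transitions(sequences):
    """Count POI-to-POI transitions over all users' time-ordered check-ins."""
    counts = defaultdict(lambda: defaultdict(int))
    out_deg = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1   # edge a -> b in the transition graph
            out_deg[a] += 1     # outgoing count of a as a predecessor
    return counts, out_deg

def amc_probability(user_seq, target, counts, out_deg, alpha=0.05):
    """Additive Markov chain score of `target` given a user's visit sequence.

    The i-th visited POI (1-based) contributes its transition probability to
    the target, weighted by 2 ** (-alpha * (n - i)); the result is divided by
    the total weight so that it stays in [0, 1].
    """
    n = len(user_seq)
    score, norm = 0.0, 0.0
    for i, li in enumerate(user_seq, start=1):
        w = 2.0 ** (-alpha * (n - i))
        norm += w
        if out_deg[li] > 0:
            score += w * counts[li][target] / out_deg[li]
    return score / norm if norm > 0 else 0.0

counts, out_deg = build_transitions([["a", "b", "c"], ["a", "b"], ["b", "a"]])
print(amc_probability(["a"], "b", counts, out_deg))       # 1.0
print(amc_probability(["a", "b"], "c", counts, out_deg))
```

Here "a" is always followed by "b", so the first score is 1.0, while "c" is reachable only from "b", which is followed by "c" in half of its transitions.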

3.2.6. Categorical Information

To model categorical information, inspired by Zhang and Chow (2015), we estimate a power-law distribution over users' categorical check-in frequencies, i.e., the check-in frequency of a user on all the POIs of a given category. The cumulative distribution of the users' categorical check-in frequency is used as categorical information in recommendations as follows:


Here, the set of all POI categories, the frequency of a user visiting POIs that belong to a category, and the check-in frequency of all users on a POI in that category are as defined in Table 3. The categorical popularity of the user on the POI, based on the set of categories, is then calculated as:


3.2.7. Experiment 3: Focus on Contextual Models

To answer RQ1, this experiment considers two well-known linear and non-linear methods: matrix factorization and neural networks. To answer RQ2, we evaluate different combinations of our proposed contextual information on these two models. Finally, to integrate the proposed models with the contextual information given by Eqs. (1), (3), (5), (9), (6), and (11) into a unified preference score of a user for an unvisited POI, we use the following sum rule:


where the interaction term equals the matrix factorization prediction for the matrix-factorization-based models or the neural network prediction for the neural network models, and the contextual term is any sum combination of the contextual models. The top-K POIs with the highest scores are recommended to the user. The sum rule has been widely used to fuse different factors for POI recommendation in previous works (Cheng et al., 2012; Zhang and Chow, 2015, 2013; Rahmani et al., 2019a) and has proven robust.
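The sum rule and the top-K step can be sketched in a few lines. Here the interaction scores (from M or N) and the selected contextual scores are added without weights or rescaling, previously visited POIs are excluded, and the highest-scoring remaining POIs are returned. Treating all inputs as already comparable in scale is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def fuse_and_rank(interaction, contextual, visited, k=10):
    """Sum-rule fusion followed by top-k selection for one user.

    interaction: (n_pois,) scores from the MF or NN model.
    contextual: list of (n_pois,) scores, one per selected context.
    visited: indices of POIs the user has already checked in to.
    """
    score = np.asarray(interaction, dtype=float).copy()
    for ctx in contextual:
        score += np.asarray(ctx, dtype=float)   # unweighted sum rule
    score[np.asarray(list(visited), dtype=int)] = -np.inf  # unvisited only
    order = np.argsort(-score, kind="stable")
    return [int(i) for i in order[:k] if np.isfinite(score[i])]

# An M-(GT)-style fusion: interaction scores plus a G and a T score vector.
print(fuse_and_rank([0.1, 0.5, 0.2, 0.9],
                    [[0.3, 0.0, 0.4, 0.0], [0.0, 0.1, 0.1, 0.0]],
                    visited={3}, k=2))  # [2, 1]
```

Swapping the list of contextual score vectors is all it takes to move between notations such as M-(G), M-(GT), or N-(GTSC).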

As shown in Table 2, each proposed model is denoted by a capital letter. To refer to the proposed models, we use the Model-(Contexts) notation, where Model indicates the model name (i.e., M or N) and Contexts shows which contexts are included. For instance, matrix factorization combined with geographical and categorical context is denoted M-(GC). We use this notation to refer to the model combinations in the experiments (Section 5).

Finally, to answer RQ3, we study the results of the proposed models based on the selected evaluation metrics, and we compare the results of different models on those metrics.

3.3. User Behavior Analysis

Users behave differently with respect to each type of contextual information. For instance, one user may tend to visit the same POIs repeatedly, whereas another may prefer to visit different POIs and never visit the same POI twice. In this section, we therefore define three analyses of users' behavior, namely (i) geographical distance, (ii) temporal check-in density, and (iii) exploration.

3.3.1. User Behavior on Geographical Distance

To model users' behavior with respect to geographical information, we categorize their geographical behavior: users who tend to stay in a small neighborhood differ from those who move around a lot. The geographical categorization therefore distinguishes users by the range of geographical distances between their consecutive check-ins.

3.3.2. User Behavior on Temporal Check-in Density

Similar to geographical distance, we investigate how the density of users' check-ins affects the performance of the models. To this end, we consider the time between a user's consecutive check-ins, i.e., the temporal distance of consecutive check-ins. In this way, we distinguish users who perform multiple check-ins on the same day from those who check in over several days.
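Both behavioral parameters rely on consecutive pairs of a user's time-ordered check-ins. The sketch below computes, for one user, the mean great-circle (haversine) distance and the mean time gap between consecutive check-ins. Averaging the pairs and using the haversine formula on a spherical Earth of radius 6371 km are our assumptions. The cut-offs used to form the user groups are not given here and are not invented.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def consecutive_stats(checkins):
    """Mean distance (km) and mean gap (hours) between consecutive check-ins.

    checkins: list of (timestamp: datetime, (lat, lon)) for one user.
    """
    seq = sorted(checkins, key=lambda c: c[0])
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0, 0.0
    dists = [haversine_km(a[1], b[1]) for a, b in pairs]
    gaps = [(b[0] - a[0]).total_seconds() / 3600 for a, b in pairs]
    return sum(dists) / len(dists), sum(gaps) / len(gaps)

user = [(datetime(2020, 1, 1, 12), (0.0, 1.0)),
        (datetime(2020, 1, 1, 10), (0.0, 0.0))]
print(consecutive_stats(user))  # about (111.19, 2.0)
```

Sorting by timestamp first matters. Raw check-in logs are not guaranteed to be in chronological order, and an unsorted sequence would give meaningless distances and negative time gaps.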

3.3.3. User Behavior on Exploration

We also define a new metric to measure how much a user tends to explore new POIs instead of revisiting known ones. As denoted in Eq. 14, the user exploration factor is calculated by dividing the total number of unique POIs visited by the user by the user's total number of check-ins. If the exploration factor equals one, all of the user's check-ins are to new POIs and the user never visits a POI twice; conversely, lower values mean that the user tends to visit the same POIs again and again. We categorize users by this exploration factor and analyze the models' performance accordingly.
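The exploration factor is simple to compute from a user's list of visited POI identifiers. The grouping function below is a hypothetical illustration of how users could be bucketed. Its thresholds (0.3 and 0.7) and group names are placeholders chosen for this sketch and are not the cut-offs used in the experiments.

```python
def exploration_factor(poi_sequence):
    """Unique POIs visited divided by total check-ins (Eq. 14 in words).

    1.0 means every check-in is to a new POI; values near 0 mean the user
    keeps returning to the same few POIs.
    """
    if not poi_sequence:
        raise ValueError("user has no check-ins")
    return len(set(poi_sequence)) / len(poi_sequence)

def exploration_group(factor, thresholds=(0.3, 0.7)):
    """Hypothetical three-way split; the cut-offs are illustrative only."""
    low, high = thresholds
    if factor < low:
        return "revisitor"
    if factor < high:
        return "mixed"
    return "explorer"

f = exploration_factor(["a", "b", "a", "c"])
print(f, exploration_group(f))  # 0.75 explorer
```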

Figure 7. User Behavior Aspects Correlation: (a) exploration-geographical correlation; (b) exploration-temporal correlation; (c) temporal-geographical correlation.

3.4. Behavior Correlation

In this section, we examine whether the three aspects of user behavior that we study are correlated with each other. One may argue that each of these variables depends on the others and, therefore, that their values correlate. To test this, we compute the pairwise Pearson's R between the three factors; the results are presented in Fig. 7. Overall, we observe low correlations between these aspects, which supports our assumption that they can be considered independently. We do see a negative correlation between the temporal and exploration aspects (Pearson's R: -0.33), which suggests that the more time passes between a user's consecutive check-ins, the less exploration the user carries out. Furthermore, Fig. 7(a) shows a low positive correlation between geographical distance and users' check-in exploration (Pearson's R: 0.12). This indicates that users who stay in the same neighborhoods tend to visit the same places, while as the geographical distance increases (e.g., when visiting a new city or a place far from their home location), they are more likely to explore more places. However, in none of the cases do we observe a considerable correlation between the variables. We therefore conclude that, even though there may be some dependencies among these variables (such as those mentioned above), their effect is negligible.

3.4.1. Experiment 4: Focus on User Behavior

In this experiment, we aim to answer RQ4 by studying users' behavior with respect to different contexts. We consider three behavioral experiments based on geographical, temporal, and user preference (exploration) information.

4. Evaluation Methodology

In this section, we present the experimental settings, including the datasets, the evaluation metrics, and the experimental setup we employed.

4.1. Datasets

We use two real-world check-in datasets, from Yelp and Gowalla, which are commonly used in the literature because they provide a large amount of check-in information. The Yelp dataset includes all the contextual information we need (geographical coordinates, POI categories, friendship information, and check-in timestamps). The Yelp dataset provided in round 7 of the Yelp dataset challenge (https://www.yelp.com/dataset/challenge; access date: Feb 2016) includes data from 10 metropolitan areas across two countries. From this dataset, we eliminate users with fewer than 10 check-in POIs and POIs with fewer than 10 visitors, in order to keep users and POIs with sufficient data and avoid the cold-start problem. The Gowalla dataset, on the other hand, does not include POI category information. It includes check-in data from February 2009 to October 2010. Before using this dataset, we filter out users with fewer than 15 check-ins and POIs with fewer than 10 visitors. The statistics of the datasets are provided in Table 4.

Datasets %Sparsity
Yelp 7,135 16,621 301,753 285,608 595 46,778 159.42 68.43 99.94%
Gowalla 5,628 31,803 620,683 378,968 - 46,001 110.28 19.51 99.78%
Table 4. Characteristics of the evaluation datasets used in the experiments: the number of users, the number of POIs, the number of check-ins, the number of unique check-ins, the number of categories, and the number of social links.

4.2. Evaluation Metrics

To evaluate the performance of the proposed experiments, we use three metrics commonly used for assessing location-based recommendation: Precision at K (Pre@K), Recall at K (Rec@K), and Normalized Discounted Cumulative Gain at K (nDCG@K), with K ∈ {10, 20}. Given the top-K POIs returned for a user, Pre@K is defined as:


while Rec@K is defined as:


where the three counts are the number of recommended POIs visited by the user (true positives, i.e., correct recommendations), the number of recommended POIs not visited by the user (false positives, i.e., incorrect recommendations), and the number of POIs visited by the user that are not in the top-K recommendations (false negatives).

Finally, we have the measure nDCG@K evaluating the ranking quality of the recommendation models. For each user, nDCG@K is defined as:


where IDCG@K is the DCG@K value when the recommended POIs are ideally ranked, and the relevance term refers to the graded relevance of the result ranked at a given position. nDCG@K ranges from 0 to 1, and higher values indicate better results. In the experiments, we report the average nDCG over all users.
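For reference, the three metrics can be computed per user as below. Precision divides the hits by the number of recommended POIs (TP + FP), and Recall divides them by the number of POIs the user actually visited (TP + FN). For nDCG, this sketch assumes binary relevance (a test POI is relevant if visited) with the usual 1 / log2(i + 1) position discount. Graded relevance would need the user's visit frequencies, which this sketch does not model.

```python
import math

def precision_recall_at_k(ranked, relevant, k):
    """Pre@K = TP / (TP + FP) and Rec@K = TP / (TP + FN) for one user."""
    top_k = ranked[:k]
    rel = set(relevant)
    hits = len(set(top_k) & rel)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(rel) if rel else 0.0
    return precision, recall

def ndcg_at_k(ranked, relevant, k):
    """nDCG@K with binary relevance and a log2(i + 1) position discount."""
    rel = set(relevant)
    dcg = sum(1.0 / math.log2(i + 1)
              for i, poi in enumerate(ranked[:k], start=1) if poi in rel)
    ideal_hits = min(len(rel), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["a", "b", "c", "d"]     # a model's top recommendations
relevant = ["b", "d", "e"]        # POIs the user visited in the test set
print(precision_recall_at_k(ranked, relevant, 2))  # (0.5, 0.333...)
print(ndcg_at_k(ranked, relevant, 2))
```

Averaging these per-user values over all users gives the figures reported in Tables 5 and 6.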

4.3. Experimental Setup

In this paper, we use critical difference diagrams based on the Wilcoxon-Holm method to detect pairwise significance and to compare the ranking of the contextual models across all evaluation metrics. In these diagrams, which present the overall ranking of the methods, a thick horizontal line groups a set of models that are not significantly different. We partition each dataset into training, validation, and test sets: we first sort each user's check-ins by timestamp, then use the first 70% of each user's check-ins as training data, the following 20% as test data, and the remaining 10% as validation data. Before testing for significance, we checked the distribution of the datasets and observed that they follow a normal distribution. Therefore, we determine statistically significant differences between contextual models in terms of Pre@K, Rec@K, and nDCG@K, where K ∈ {10, 20}, using a two-tailed paired t-test at a 95% confidence level (p < 0.05). To evaluate the neural-network-based models, we add negative samples of check-ins, i.e., unvisited POIs, for each user to the datasets to prevent the models from being biased. To this end, we count each user's unique visited POIs and add the same number of unvisited POIs for that user (He et al., 2017, 2018). For the test set, we consider 1,000 negative samples for each user. We implement our methods in Python and TensorFlow (https://www.tensorflow.org/). For the baseline algorithms, the parameters are initialized as reported in the corresponding papers (Zhang and Chow, 2015; Zhang et al., 2014; Cheng et al., 2012; Zhang and Chow, 2013; Liu et al., 2017). We set the latent factors parameter of K to for M and PFMMGM models. For the N model, we set the user and POI embedding sizes to 30 and initialize them with the latent factors extracted from matrix factorization. Finally, we optimize the model with the mini-batch Adam optimizer (Kingma and Ba, 2014) and employ two hidden layers for neural layers with size . These parameter choices are likewise based on the literature (He et al., 2017).
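The data preparation described above can be sketched as follows. Each user's check-ins are split chronologically into 70% training, then 20% test, then 10% validation, in the order stated in the text. Negative POIs are then drawn uniformly from the POIs the user never visited. Rounding the split sizes, uniform sampling, and the function names are our assumptions. In use, `n_samples` would be the user's number of unique visited POIs for training and 1,000 for the test set.

```python
import random

def temporal_split(checkins, train=0.7, test=0.2):
    """Chronological per-user split into (train, test, validation).

    checkins: list of (timestamp, poi_id). The first 70% go to training,
    the next 20% to test and the remainder (about 10%) to validation.
    """
    seq = sorted(checkins, key=lambda c: c[0])
    n = len(seq)
    n_train = int(round(n * train))
    n_test = int(round(n * test))
    return (seq[:n_train],
            seq[n_train:n_train + n_test],
            seq[n_train + n_test:])

def sample_negatives(visited, all_pois, n_samples, rng=random):
    """Draw unvisited POIs uniformly at random, without replacement."""
    candidates = sorted(set(all_pois) - set(visited))
    return rng.sample(candidates, min(n_samples, len(candidates)))

checkins = [(t, f"p{t}") for t in reversed(range(10))]
train_set, test_set, val_set = temporal_split(checkins)
print(len(train_set), len(test_set), len(val_set))  # 7 2 1
negs = sample_negatives(["p1", "p2"], ["p1", "p2", "p3", "p4", "p5"], 2,
                        rng=random.Random(0))
```

Splitting by time rather than at random keeps future check-ins out of the training data, which matches how a deployed recommender would be used.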

5. Results and Analysis

This section presents the results obtained from the experiments outlined in Section 3.

Baseline Contexts Pre@10 Pre@20 Rec@10 Rec@20 nDCG@10 nDCG@20
GeoSoCa G 0.0108 0.0106 0.0167 0.0324 0.0108 0.0107
GeoSoCa S 0.0127 0.0113 0.0154 0.0267 0.0130 0.0120
GeoSoCa C 0.0017 0.0017 0.0025 0.0047 0.0018 0.0018
GeoSoCa GS 0.0169 0.0142 0.0194 0.0329 0.0178 0.0157
GeoSoCa GC 0.0156 0.0142 0.0230 0.0424 0.0160 0.0149
GeoSoCa SC 0.0121 0.0114 0.0149 0.0276 0.0126 0.0120
GeoSoCa GSC 0.0183 0.0150 0.0214 0.0340 0.0195 0.0168
FCFKDEAMC G 0.0108 0.0103 0.0163 0.0311 0.0108 0.0104
FCFKDEAMC S 0.0156 0.0129 0.0177 0.0283 0.0162 0.0143
FCFKDEAMC T 0.0158 0.0145 0.0229 0.0406 0.0165 0.0154
FCFKDEAMC SG 0.0170 0.0146 0.0192 0.0329 0.0179 0.0160
FCFKDEAMC ST 0.0174 0.0148 0.0195 0.0321 0.0185 0.0164
FCFKDEAMC GT 0.0198 0.0173 0.0284 0.0487 0.0203 0.0184
FCFKDEAMC SGT 0.0182 0.0156 0.0203 0.0334 0.0195 0.0173
PFMMGM M 0.0089 0.0076 0.0104 0.0181 0.0086 0.0078
PFMMGM G 0.0075 0.0072 0.0113 0.0216 0.0074 0.0072
PFMMGM MG 0.0160 0.0132 0.0241 0.0398 0.0172 0.0149

Table 5. Performance comparison in terms of Precision@K, Recall@K, and nDCG@K for K ∈ {10, 20} on Yelp. The superscript letters (a-q) denote significant improvements compared to the other models (p < 0.05). A dedicated symbol denotes the set of all the letters, and the best result among the different combinations of contexts for each baseline is shown in bold.

Baseline Contexts Pre@10 Pre@20 Rec@10 Rec@20 nDCG@10 nDCG@20
GeoSoCa G 0.0149 0.0151 0.0180 0.0363 0.0151 0.0151
GeoSoCa S 0.0207 0.0181 0.0212 0.0355 0.0213 0.0193
GeoSoCa GS 0.0215 0.0195 0.0253 0.0449 0.0222 0.0206
FCFKDEAMC G 0.0159 0.0152 0.0191 0.0355 0.0159 0.0155
FCFKDEAMC S 0.0297 0.0242 0.0283 0.0441 0.0318 0.0274
FCFKDEAMC T 0.0502 0.0420 0.0470 0.0781 0.0536 0.0469
FCFKDEAMC SG 0.0334 0.0273 0.0321 0.0501 0.0355 0.0306
FCFKDEAMC ST 0.0466 0.0360 0.0420 0.0612 0.0516 0.0428
FCFKDEAMC GT 0.0479 0.0404 0.0459 0.0761 0.0512 0.0450
FCFKDEAMC SGT 0.0450 0.0348 0.0403 0.0590 0.0498 0.0413
PFMMGM M 0.0168 0.0142 0.0149 0.0254 0.0202 0.0173
PFMMGM G 0.0187 0.0169 0.0210 0.0364 0.0192 0.0178
PFMMGM MG 0.0238 0.0208 0.0257 0.0442 0.0256 0.0229

Table 6. Performance comparison in terms of Precision@k, Recall@k, and nDCG@k for k ∈ {10, 20} on Gowalla. The superscript letters (a-m) denote significant improvements compared to the other models (). The notation shows the set of all the letters, and the best result among the context combinations of each baseline is shown in bold.

5.1. Experiment 1: Focus on Contexts (RQ1)

Here, our goal is to analyze the contribution of the different types of contextual information, and of their combinations, to the performance of POI recommendation. Tables 5 and 6 show the results of different combinations of contextual models in the baselines on the Yelp and Gowalla datasets, respectively. As the Gowalla dataset does not include categorical information, we cannot evaluate categorical models on it; Table 6 therefore omits the models with categorical context. The first column of Table 2 shows the notation we use to refer to the different combinations of contextual information in the baseline algorithms (Zhang and Chow, 2015; Zhang et al., 2014; Cheng et al., 2012). Below, we summarize the conclusions that can be drawn from these tables.

Temporal information plays a pivotal role among all combinations of contextual models. As Tables 5 and 6 show, on both datasets FCFKDEAMC-(T), which uses temporal context, outperforms all other models that use a single type of contextual information (GeoSoCa-(G), GeoSoCa-(S), GeoSoCa-(C), FCFKDEAMC-(S), FCFKDEAMC-(G), PFMMGM-(M), and PFMMGM-(G)). Location-based social networks capture each user's trajectory as a sequence of visited locations, and our results show the importance of capturing such temporal information to model users' behavior. After FCFKDEAMC-(T), the social-only models GeoSoCa-(S) and FCFKDEAMC-(S) perform close to, and in some cases better than, all of the geographical-only models, namely GeoSoCa-(G), FCFKDEAMC-(G), and PFMMGM-(G). Temporal information also remains one of the most important contexts when combined with other contextual information: in Table 5, FCFKDEAMC-(GT), which combines geographical and temporal information, achieves the best results of all models.
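The role of visit sequences can be illustrated with a simplified sequential model in the spirit of the additive Markov chain (AMC) component that the FCFKDEAMC name refers to. The sketch below is an assumption-laden illustration, not the baseline's exact formulation: it estimates first-order transition probabilities from chronologically ordered check-ins and scores a candidate POI by summing the transition probabilities from each past visit, using a hypothetical exponential weight `2 ** (-decay * age)` so that recent visits count more. The function names, the `decay` parameter, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def transition_probs(sequences):
    """Estimate P(next POI | previous POI) from ordered check-in sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {poi: c / sum(nexts.values()) for poi, c in nexts.items()}
            for prev, nexts in counts.items()}


def sequential_score(history, candidate, probs, decay=0.5):
    """Weighted sum of transitions from every past visit to `candidate`.

    The most recent visit has age 0 and weight 1; older visits decay as
    2 ** (-decay * age). The result is normalized by the total weight.
    """
    n = len(history)
    weights = [2 ** (-decay * (n - 1 - i)) for i in range(n)]
    total = sum(w * probs.get(poi, {}).get(candidate, 0.0)
                for w, poi in zip(weights, history))
    return total / sum(weights)


# Toy check-in sequences (hypothetical data).
sequences = [["home", "cafe", "office"],
             ["home", "cafe", "gym"],
             ["cafe", "office"]]
probs = transition_probs(sequences)
history = ["home", "cafe"]
for poi in ["office", "gym", "museum"]:
    print(poi, round(sequential_score(history, poi, probs), 3))
```

Because "cafe" was followed by "office" more often than by "gym" in the toy data, a user whose latest check-in is the cafe gets a higher sequential score for the office, and a never-observed transition scores zero.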

Modeling geographical information individually per user is more effective than learning a universal distribution shared by all users. Among the models that use geographical information (G), namely GeoSoCa-(G), FCFKDEAMC-(G), and PFMMGM-(G), FCFKDEAMC-(G) achieves the most stable results across the two datasets. As Tables 5 and 6 show, PFMMGM-(G) outperforms GeoSoCa-(G) and FCFKDEAMC-(G) on Gowalla, whereas on Yelp the trend is reversed and GeoSoCa-(G) achieves the best results. On both datasets, FCFKDEAMC-(G) is less effective than the best geographical model but remains comparable to it. By contrast, PFMMGM-(G) and GeoSoCa-(G) each perform well on one dataset but noticeably worse on the other. Since FCFKDEAMC-(G) models the geographical component per user, whereas PFMMGM-(G) and GeoSoCa-(G) learn a universal geographical distribution, its more robust behavior on both datasets suggests that a user-specific geographical distribution leads to a more effective and robust geographical model.
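The contrast between a user-specific and a universal geographical model can be sketched with a two-dimensional Gaussian kernel density estimate (KDE) over check-in coordinates. This is a simplified contrast under stated assumptions, not the formulation of GeoSoCa, FCFKDEAMC, or PFMMGM: coordinates are treated as planar, the bandwidth is a fixed hypothetical value, and the two toy users and their coordinates are invented for illustration. The per-user KDE is fitted on one user's check-ins only, while the pooled KDE mixes everyone's.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Density function of an isotropic 2-D Gaussian KDE over `points`."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points)

    return density


# Hypothetical planar check-in coordinates for two users with distinct areas.
checkins = {
    "alice": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "bob": [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)],
}
BANDWIDTH = 0.5  # ASSUMPTION: fixed bandwidth chosen for the toy data

per_user = {u: gaussian_kde_2d(pts, BANDWIDTH) for u, pts in checkins.items()}
pooled = gaussian_kde_2d([p for pts in checkins.values() for p in pts], BANDWIDTH)

near_alice, near_bob = (0.05, 0.05), (5.05, 5.05)
# Per-user model: Alice's density strongly prefers POIs in her own area.
print(per_user["alice"](*near_alice) > per_user["alice"](*near_bob))
# Pooled model: both candidates look almost equally likely for every user.
print(pooled(*near_alice), pooled(*near_bob))
```

The pooled density cannot tell that a POI in Bob's area is irrelevant to Alice, which is one intuition for why per-user geographical modeling tends to be more robust.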

Categorical information alone does not improve performance, but its impact is significant when combined with geographical information. Table 5 shows that GeoSoCa-(C) performs worse than all 16 other models on every metric. A likely reason is that it ignores the relevant geographical context: when POIs of the same category exist in different locations, users would prefer to visit nearby POIs rather than distant ones. Moreover, the results of GeoSoCa-(GC) in Table 5 show that combining GeoSoCa-(G) and GeoSoCa-(C) is significantly more effective than either of them separately.
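The same-category example above can be made concrete with a toy calculation. The sketch assumes a categorical score equal to the user's share of check-ins in a category and a hypothetical exponential distance decay for the geographical score, and fuses the two by multiplication; these choices, the function names, and the numbers are illustrative assumptions rather than GeoSoCa's actual estimators.

```python
import math


def category_score(user_pref, category):
    """Share of the user's past check-ins that fall in `category`."""
    return user_pref.get(category, 0.0)


def geo_score(distance_km, scale_km=2.0):
    """Hypothetical distance decay: 1.0 at the user's location, near 0 far away."""
    return math.exp(-distance_km / scale_km)


# Hypothetical user profile and candidate POIs: (category, distance in km).
user_pref = {"cafe": 0.6, "museum": 0.4}
pois = {"cafe_near": ("cafe", 0.5),
        "cafe_far": ("cafe", 8.0),
        "museum_near": ("museum", 1.0)}

cat_only = {name: category_score(user_pref, cat) for name, (cat, _) in pois.items()}
fused = {name: category_score(user_pref, cat) * geo_score(dist)
         for name, (cat, dist) in pois.items()}

print("C only:", sorted(cat_only, key=cat_only.get, reverse=True))
print("G x C: ", sorted(fused, key=fused.get, reverse=True))
```

With the categorical score alone, the near and far cafes tie; once the geographical score is multiplied in, the nearby cafe ranks first and the distant one drops below the nearby museum.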

Among the different combinations of contextual information, the importance of geographical and temporal information is clear. As shown in Tables 5 and 6, FCFKDEAMC-(GT) achieves the best performance in comparison with the other combinations of contextual information across out of models on the Yelp dataset and out of models on the Gowalla dataset. Moreover, adding geographical or temporal information to a model improves its performance in most cases.

Considering all contextual information in a model does not necessarily improve performance. Comparing FCFKDEAMC-(GT) with FCFKDEAMC-(SGT) across 12 comparisons (6 evaluation metrics on each of the two datasets), FCFKDEAMC-(GT) outperforms FCFKDEAMC-(SGT) in every case despite using fewer contexts. It is worth noting that FCFKDEAMC-(SGT) is the final model proposed by Zhang et al. (Zhang et al., 2014); yet on Gowalla, even FCFKDEAMC-(ST), which also uses fewer contexts, outperforms FCFKDEAMC-(SGT) on all six metrics.
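The win counts in this comparison can be re-derived from the FCFKDEAMC rows of Tables 5 and 6. The short script below copies those values and counts, per dataset, on how many of the six metrics one variant scores higher than another, which also shows on which dataset FCFKDEAMC-(ST) beats FCFKDEAMC-(SGT); the helper name `wins` is hypothetical.

```python
# FCFKDEAMC rows copied from Table 5 (Yelp) and Table 6 (Gowalla).
# Column order: Pre@10, Pre@20, Rec@10, Rec@20, nDCG@10, nDCG@20.
FCFKDEAMC = {
    "Yelp": {
        "GT": [0.0198, 0.0173, 0.0284, 0.0487, 0.0203, 0.0184],
        "ST": [0.0174, 0.0148, 0.0195, 0.0321, 0.0185, 0.0164],
        "SGT": [0.0182, 0.0156, 0.0203, 0.0334, 0.0195, 0.0173],
    },
    "Gowalla": {
        "GT": [0.0479, 0.0404, 0.0459, 0.0761, 0.0512, 0.0450],
        "ST": [0.0466, 0.0360, 0.0420, 0.0612, 0.0516, 0.0428],
        "SGT": [0.0450, 0.0348, 0.0403, 0.0590, 0.0498, 0.0413],
    },
}


def wins(a, b, table=FCFKDEAMC):
    """Per dataset, count the metrics on which variant `a` beats variant `b`."""
    return {dataset: sum(x > y for x, y in zip(rows[a], rows[b]))
            for dataset, rows in table.items()}


print(wins("GT", "SGT"))  # {'Yelp': 6, 'Gowalla': 6}
print(wins("ST", "SGT"))  # {'Yelp': 0, 'Gowalla': 6}
```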

In summary, our main findings related to RQ1 are as follows:

  • Temporal contextual information plays a pivotal role among all combinations of contextual models.

  • Categorical information alone will not lead to improvement in performance.

  • Considering all the available contextual information in a model will not necessarily improve the performance.

  • Among different combinations of contextual information, the importance of geographical and temporal information is clear.

5.2. Experiment 2: Focus on Metrics (RQ2)

As Tables 5 and 6 show, the relative performance of the models varies with the evaluation metric. For example, in terms of precision, GeoSoCa-(S) outperforms GeoSoCa-(G) on both datasets. Recall shows a different behavior: GeoSoCa-(G) outperforms GeoSoCa-(S) at both cutoffs on Yelp and at Rec@20 on Gowalla, suggesting that geographical information has more impact when focusing on recall. Similarly, comparing PFMMGM-(M) and PFMMGM-(G), which are interaction-based and geographical models respectively, PFMMGM-(G) significantly outperforms PFMMGM-(M) on recall, whereas on Yelp PFMMGM-(M) achieves better results than PFMMGM-(G) on the two other metrics, precision and nDCG.

Given these differences, to obtain an overall ranking of the models for each performance metric, Fig. 10 presents critical difference diagrams, which compare the statistical ranking of different models. In these diagrams, the numbered horizontal axis shows the rank of each model, and thick horizontal lines group models that are not significantly different. Fig. 10 plots the ranking of the different combinations of contextual models proposed in GeoSoCa (Zhang and Chow, 2015) on two evaluation metrics. As Fig. 10(a) shows, the combination of geographical and categorical information achieves the best result on recall, whereas on nDCG (Fig. 10(b)), GeoSoCa-(GC) ranks third and GeoSoCa-(GS) performs better.

The difference arises because these metrics measure different things. nDCG accounts for the position of each relevant POI in the recommended list, while precision and recall only check which recommended POIs appear in the user's test set, regardless of their order. Moreover, these are traditional metrics that were not designed to evaluate how well contextual information is captured. There are currently no metrics that assess recommendations with respect to the consistency of temporal, social, or geographical information. For instance, when evaluating the geographical context, we would ideally want to check whether the recommended POIs are actually located near the user. Similarly, when evaluating a model that incorporates temporal context, the sequence of visits or the check-in times of the users in the test set should be considered in the evaluation process.
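The distinction between position-aware and set-based metrics can be seen in a small sketch of Precision@k, Recall@k, and nDCG@k with binary relevance (a POI in the user's test set counts as relevant). This is a common formulation assumed here for illustration; the evaluation code used in the experiments may differ in details such as the gain and discount functions, and the helper names are hypothetical.

```python
import math


def hits(ranked, relevant, k):
    """Number of top-k recommendations that appear in the user's test set."""
    return sum(poi in relevant for poi in ranked[:k])


def precision_at_k(ranked, relevant, k):
    """Fraction of the k recommended POIs that are relevant."""
    return hits(ranked, relevant, k) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant POIs found in the top k."""
    return hits(ranked, relevant, k) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain nDCG: a hit at position i (0-based) earns 1 / log2(i + 2)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, poi in enumerate(ranked[:k]) if poi in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


test_pois = {"p1", "p2"}
list_a = ["p1", "p2", "x", "y"]  # hits at the top
list_b = ["x", "y", "p1", "p2"]  # same hits, pushed down
for name, ranked in [("A", list_a), ("B", list_b)]:
    print(name, precision_at_k(ranked, test_pois, 4),
          recall_at_k(ranked, test_pois, 4),
          round(ndcg_at_k(ranked, test_pois, 4), 3))
```

Both lists have identical precision and recall, but list B receives a lower nDCG because its hits appear later, which is why a model can rank differently under nDCG than under precision or recall.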

Figure 10. Ranking of different combinations of contextual information in the GeoSoCa model on (a) Rec@20 and (b) nDCG@20.
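Critical difference diagrams such as those in Figure 10 are usually built from the mean rank of each model over a set of evaluation units, together with a critical difference (CD) from the Nemenyi post-hoc test, typically applied after a Friedman test. The section does not state which test or which units (users, folds, or data splits) were used, so the sketch below is a generic illustration: the function names are hypothetical, the Friedman test itself is omitted, and the critical values are the commonly tabulated Nemenyi values for α = 0.05.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, indexed by the
# number of compared models (as tabulated, e.g., by Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores):
    """scores[i][j]: metric of model j on evaluation unit i (higher is better).

    Returns each model's mean rank (1 = best); tied models share the average
    of the ranks they occupy.
    """
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda j: -row[j])
        pos = 0
        while pos < n_models:
            end = pos
            while end + 1 < n_models and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for idx in order[pos:end + 1]:
                totals[idx] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def nemenyi_cd(n_models, n_units):
    """Models whose mean ranks differ by less than this value are not
    significantly different (drawn joined by a thick bar)."""
    q = Q_ALPHA_005[n_models]
    return q * math.sqrt(n_models * (n_models + 1) / (6 * n_units))


# Toy scores: rows are evaluation units, columns are three hypothetical models.
scores = [[0.30, 0.20, 0.10],
          [0.30, 0.10, 0.20],
          [0.20, 0.30, 0.10]]
ranks = average_ranks(scores)
cd = nemenyi_cd(n_models=3, n_units=len(scores))
print([round(r, 3) for r in ranks], round(cd, 3))
```

With only three units the CD is large, so all three toy models would be joined by one bar; with many users as units the CD shrinks and more pairs separate.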

In summary, regarding RQ2, the use of context in POI recommendation models has contrasting effects on different traditional evaluation metrics. For instance, one model might outperform the others on one metric, such as nDCG, while the results on other metrics, such as precision and recall, might be completely different. Such behavior is likely related to the different nature of these evaluation metrics, which suggests the need for new evaluation metrics based on contextual information.
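As one hypothetical example of such a context-aware metric, the geographical check discussed earlier (whether recommended POIs lie near the user) could be approximated by the mean great-circle distance between a user's check-in centroid and the top-k recommended POIs. The sketch below illustrates that idea only; the metric, its name `mean_distance_at_k`, and the toy coordinates are assumptions, not something evaluated in this paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centroid(points):
    """Arithmetic mean of (lat, lon); a rough center for a city-scale area."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def mean_distance_at_k(ranked, poi_coords, user_checkins, k):
    """Hypothetical metric: mean distance (km) from the user's check-in
    centroid to the top-k recommended POIs; lower means more local."""
    center = centroid(user_checkins)
    top = ranked[:k]
    return sum(haversine_km(center, poi_coords[p]) for p in top) / len(top)


# Toy data (hypothetical coordinates).
user_checkins = [(40.74, -73.99), (40.75, -73.98), (40.73, -74.00)]
poi_coords = {"a": (40.745, -73.985), "b": (40.76, -73.97),
              "c": (41.90, -87.63), "d": (34.05, -118.24)}
local_list = ["a", "b", "c", "d"]
remote_list = ["d", "c", "a", "b"]
print(round(mean_distance_at_k(local_list, poi_coords, user_checkins, 2), 2))
print(round(mean_distance_at_k(remote_list, poi_coords, user_checkins, 2), 1))
```

Unlike precision or recall, this score separates the two lists even if neither contains a test-set POI, which is the kind of context consistency the traditional metrics do not capture.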

5.3. Experiment 3: Focus on Contextual Models (RQ3)

Model  Contexts    Pre@10  Pre@20  Rec@10  Rec@20  nDCG@10  nDCG@20
M      No Context  0.0080  0.0066  0.0081  0.0133  0.0077   0.0068
M      G           0.0284  0.0241  0.0411  0.0690  0.0297   0.0263
M      S           0.0178  0.0160  0.0218  0.0380  0.0185   0.0170
M      T           0.0221  0.0195  0.0308  0.0537  0.0232   0.0211
M      C           0.0075  0.0072  0.0071  0.0148  0.0074   0.0072
M      GS          0.0214  0.0181  0.0248  0.0412  0.0229   0.0201
M      GT          0.0253  0.0219  0.0371  0.0617  0.0264   0.0237
M      GC          0.0289  0.0246  0.0418  0.0701  0.0297   0.0265
M      ST          0.0195  0.0166  0.0224  0.0369  0.0203   0.0181
M      SC          0.0184  0.0165  0.0221  0.0382  0.0192   0.0176
M      TC          0.0231  0.0202  0.0322  0.0564  0.0243   0.0220
M      GST         0.0204  0.0172  0.0239  0.0382  0.0213   0.0188
M      GSC         0.0219  0.0186  0.0253  0.0417  0.0232   0.0205
M      GTC         0.0262  0.0225  0.0382  0.0640  0.0273   0.0245
M      STC         0.0197  0.0170  0.0225  0.0378  0.0205   0.0185
M      GSTC        0.0207  0.0175  0.0240  0.0389  0.0216   0.0191

Table 7. Performance comparison of matrix factorization based POI recommendation in terms of Precision@k, Recall@k, and nDCG@k for k ∈ {10, 20} on Yelp. The superscript letters (a-p) denote significant improvements compared to the other models (). The notation shows the set of all the letters. The best result among all models and the best result within each group of models with the same number of combined contexts are shown in bold and italic, respectively.

Model  Contexts    Pre@10  Pre@20  Rec@10  Rec@20  nDCG@10  nDCG@20
M      No Context  0.0168  0.0142  0.0149  0.0254  0.0202   0.0173
M      G           0.0377  0.0326  0.0404  0.0680  0.0402   0.0359
M      S           0.0261  0.0219  0.0242  0.0399  0.0290   0.0252
M      T           0.0542  0.0442  0.0517  0.0826  0.0583   0.0501
M      GS          0.0372  0.0295  0.0360  0.0543  0.0407   0.0343
M      GT          0.0503  0.0410  0.0493  0.0789  0.0538   0.0463
M      ST          0.0471  0.0366  0.0413  0.0617  0.0518   0.0431
M      GST         0.0440  0.0338  0.0396  0.0579  0.0482   0.0398

Table 8. Performance comparison of matrix factorization based POI recommendation in terms of Precision@k, Recall@k, and nDCG@k for k ∈ {10, 20} on Gowalla. The superscript letters (a-h) denote significant improvements compared to the other models (). The notation shows the set of all the letters. The best result among all models and the best result within each group of models with the same number of combined contexts are shown in bold and italic, respectively.

Model  Contexts    Pre@10  Pre@20  Rec@10  Rec@20  nDCG@10  nDCG@20
N      No Context  0.066   0.0551  0.0849  0.1405  0.0687   0.0604
N      G           0.0526  0.042   0.0599  0.0934  0.059    0.0497
N      S           0.1104  0.0863  0.1412  0.2123  0.1205   0.1008
N      T           0.0683  0.0565  0.0886  0.1447  0.0742   0.0642
N      C           0.0643  0.0548  0.0825  0.1419  0.0665   0.0594
N      GS          0.0563  0.0449  0.0644  0.1     0.0623   0.0525
N      GT          0.0528  0.0421  0.0602  0.0936  0.0597   0.0501
N      GC          0.0521  0.0418  0.0588  0.0916  0.0576   0.0487
N      ST          0.1115  0.0868  0.1433  0.2144  0.1222   0.1019
N      SC          0.1115  0.0863  0.1422  0.2128  0.1213   0.1009