1 Introduction
Recently, Online Social Networks (OSNs) have incorporated geographical information into their content. This triggered new functionalities and introduced the concept of Location-based Social Networks (LBSNs). In such networks, e.g., Facebook Places (www.facebook.com/places), Foursquare (www.foursquare.com), and Yelp (www.yelp.com), users may share their interests along with a spatial dimension to obtain recommendations for possibly interesting places based on their recent history. Learning users' history is a crucial task for these models, because it allows them to provide meaningful suggestions for Points-of-Interest (POIs). Unfortunately, factors such as sparsity, heterogeneity, and multidimensionality pose significant challenges that increase the problem complexity, with a large impact on both effectiveness and efficiency.
There has been extensive research on this topic, primarily focused on users' relations with geographical information over user-location bipartite networks. However, such approaches fail to eliminate the sparsity problem, since they miss preference dynamics and auxiliary information related to users' interactions. Similarly, other works enrich user-location information with social network ties but still miss the remaining factors. Moreover, research considering the temporal evolution of users' preferences still fails to deal with sparsity, because additional contextual information related to users changes along with their preferences over time.
Figure 1: (a) Spatial distribution of one user's check-ins close to home or work; (b) temporal distribution of check-ins, where the two lines indicate the proportional probability of being in each of these two states; (c) social behavior, indicating the influence of the social network on the target user; and (d) evolution of preference dynamics between two months.
In this paper, we present a technique that considers social, geographical and temporal influence, along with users’ preference dynamics, in a unified model, embedding eight relational graphs into a single latent space.
Spatial-based Behavior. Recent research points out that there is a spatial pattern in users' check-in behavior during daily activities [2]. In particular, users who check in at a location within a region have a high probability of visiting nearby locations. For example, a user located close to work or home is more likely to visit a nearby location than a distant one [6]. We claim that users tend to perform a sequence of activities related to a task during the same time period and within the same region. For example, users who want to go shopping usually check in at a mall, a supermarket, a grocery store, or a bakery located close to home, as shown in Figure 1(a). Also, during weekends people perform multiple check-ins at clubs and restaurants that are close to each other. Thus, spatial proximity should be considered a repeated geographical pattern, and the related locations can be considered a route of locations associated with one activity.
Temporal-based Behavior. Users usually maintain a fixed daily program in their activities and in the check-ins they perform. Thus, on weekdays a user performs check-ins at locations close to work during working hours, whereas from the evening until midnight she checks in at locations close to home, as shown in Figure 1(b). This pattern is reversed on weekends, because users tend to check in at bars and restaurants. Several works [12, 17] aim to model this behavior by focusing on temporal drift, but they ignore explicit and implicit contextual information related to users' activities.
Preference Dynamics. Users tend to change their behavior over time, which makes this problem even harder. For example, a user may go to clubs every weekend in September and to cinemas or restaurants on April weekends, as shown in Figure 1(d). In both cases, the same user alters her check-in behavior, and this should be taken into account. According to [21, 6], this preference evolution may be due to: 1) New POI exploration: contrary to their ordinary check-in behavior, users tend to visit new locations; 2) User experience: users choose a location according to locations where they had a pleasant experience in the past; 3) Popularity: some locations tend to be more popular during a specific time interval than during the rest of the year; 4) Social influence: friends' opinions bias users' decisions (as shown in Figure 1(c)), so users tend to examine their friends' evaluations of locations before visiting them and follow their lead.
Motivation and Contribution. Motivated by these behavioral patterns, we summarize the limitations of existing approaches:

many methods consider POIs as conventional nodes and do not capture the spatial dimension and the proximity among them,

other works consider geographical influence but ignore preference dynamics, which hurts the overall accuracy,

methods that capture temporal dynamics do not simultaneously treat the spatial dimension,

finally, models that consider both spatial and temporal behavior ignore the preference evolution.
Thus, there is a need to consider all the aforementioned factors in a unified model, which allows us to better understand users' behavior and personalize the recommendations. The contributions of our work are as follows:

we present a probabilistic weighting strategy over the information graphs to overcome sparsity,

we propose two novel algorithms to extract the routes and the stay points from users' past check-in history,

we introduce a novel graph-based approach that jointly learns user and POI embeddings from these weighted graphs in the same latent space and provides personalized POI recommendations,

our approach extends the LINE model [16] and simultaneously learns the embeddings of large unipartite and bipartite graphs in a low-dimensional space,

we experimentally evaluate our model by measuring the accuracy of POI recommendations for (i) all users, (ii) cold-start users, and (iii) cold-start locations.
Extensions beyond the Conference Version. This work is an extended version of our work presented at the 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018) [3]. This journal version contains several enhancements with respect to the conference paper. The most significant changes are summarized below:
we introduce two novel networks: (i) Stay Points, representing the locations where a user stayed the longest, and (ii) Routes, the paths followed when visiting POIs,

we incorporate two novel algorithms to build the aforementioned information networks which we weigh according to their importance,

we jointly capture users’ and POIs sequential dynamics,

the performance evaluation section has been significantly extended by studying two important topics in the domain, namely the cold-start problem for users and for locations, and

we compare our approach against additional state-of-the-art methods in terms of accuracy.
Roadmap. The rest of the paper is organized as follows. Section 2 summarizes the related work, whereas Section 3 describes the preliminaries and the problem definition. In Section 4, we describe the structural parts of the model in detail, while in Section 5, we present the findings of our experimentation. Finally, Section 6 concludes this paper.
2 Related work
In this section, we discuss research related to POI recommendation. In particular, we analyze how prior work exploits multidimensional information networks to overcome the sparsity and cold-start problems. These networks include social ties, geographical proximity, temporal distance, and the preference dynamics of users' past check-in history.
POI Recommendations. The lack of a direct relation between trajectory points and the users' preferences derived from their check-in records has driven research in this direction. In recent years, with the rise of LBSNs, users are able to check in at locations, which has made their data anonymously accessible for research purposes. In this context, recommendation models that use such information have attracted a lot of attention. Most related work uses Collaborative Filtering (CF) [24], Content-based Filtering (CB) [13, 4], or hybrid [6, 20, 22] approaches to learn users' preferences over visited POIs and make predictions for unvisited locations. CF approaches offer recommendations based on the assumption that users who visited the same POIs are likely to visit the same locations in the future. For instance, Yuan et al. exploit the similarity among users through their check-in history and use collaborative filtering [24]. On the other hand, content-based approaches use additional information related to users or POIs to tackle the sparsity problem. For example, Gao et al. [4] exploit the content information of LBSNs by investigating the types of content information related to POI attributes along with check-in records. Hybrid approaches were introduced to overcome the problems each method faces separately, such as (i) treating POIs as nodes and ignoring geographical proximity, and (ii) missing additional dimensions such as social influence or temporal evolution. Below, we discuss in detail models that use additional information networks.
Social Influence. Since check-in records alone cannot always overcome the sparsity and cold-start problems, many approaches use social ties, following the assumption that users tend to follow their friends' lead [6]. For example, Li et al. [8] distinguish three types of friends, namely (i) linked, (ii) co-located, and (iii) proximate friends, and exploit their check-in records through a unified framework. First, this model learns the common POIs at which the target user and all three types of friends have checked in in the past. Then, it uses matrix factorization to minimize two loss functions over the learned POIs to personalize recommendations. Similarly, Zhang et al. [25] introduced another unified model, named LORE (Location Recommendations), that combines sequential patterns with social and geographical influence. This model exploits the sequential influence of POIs over users' records through a dynamic graph with an Additive Markov Chain (AMC). Finally, it combines all the aforementioned influences into one model with a product fusion rule.
Geographical Influence. To further enhance the knowledge about users and alleviate sparsity, many methods use geographical information. Unfortunately, some of them treat POIs as conventional items [11] and miss the geographical proximity between locations. On the other hand, recent studies [19, 22, 23, 18] consider geographical proximity and treat POIs as 'spatial items'. In particular, Wang et al. [19] focused on the importance of the sequential influence of the spatial items a user purchases, and proposed a Sequential PersOnalized item REcommender system (SPORE) that fuses the sequential influence of spatial items and the preferences of a target user into the same latent space through a probabilistic topic-region unified model. Extending previous work on spatial item recommendation, another probabilistic model, named LSARS [18], was introduced that jointly correlates geographical influence, item attributes, and users' reviews in one unified framework. Both models support the claim that users are willing to interact with proximate items, and thus they use geographical influence jointly with additional factors.
Yin et al. [23] claimed that users keep the same preferences whether they are in their hometown or visiting a new region. The authors used spatial attributes of POIs to alleviate the sparsity and cold-start problems for out-of-town users. Geographical influence, such as region attributes, is used as auxiliary information to correlate users with new locations. That model, named SHCDL, jointly learns these attributes along with additive representations of users' check-in preferences. In this way, it searches other regions for proximate items that are close to users' past preferences. In the same context, Yang et al. [22] developed a Preference And Context Embedding (PACE) model that jointly learns the embeddings of users and locations. This hybrid model bridges Semi-Supervised Learning (SSL) with Collaborative Filtering (CF) in a unified way.
Temporal Influence. Recent state-of-the-art studies indicate that there is periodicity in users' behavior, which should also be accounted for along with all the aforementioned factors [6, 26, 10]. Yuan et al. [24] examined spatiotemporal behavior during different hours of the day, claiming that users tend to visit specific locations at different time periods. They proposed a unified model that combines CF and Bayes' rule to provide POI recommendations considering both proximity and periodicity. Similarly, Kefalas et al. [6] proposed another hybrid model that combines CF and CB in a unified way, exploring i) the proximate users' preferences, ii) the alternation of textual influence within time periods, and iii) the evolution of preference dynamics. The results indicate an evolution of users' check-in behavior, since users are highly influenced by all factors combined, and show improved precision compared with models that consider each factor separately. In the same direction, Liu et al. [10] developed another hybrid spatiotemporal-aware model that jointly learns representations of users, spatiotemporal patterns, and POIs.
On the other hand, following embedding models such as word2vec [14], Zhao et al. [26] presented GeoTemporal sequential embedding rank (Geo-Teaser), which combines (i) temporal POI embeddings and (ii) spatial hierarchical pairwise ranking. In particular, the first half of the model learns POI representations with a temporal POI embedding model that treats each user's daily check-in history as a sequence. The second half ranks POIs based on geographical information in a hierarchical pairwise preference. Similarly, Xie et al. [20] presented a unified graph-based model named GE that jointly learns POI embeddings by capturing the sequential effect, the spatial influence, the periodicity, and the semantics in the same latent space.
In contrast to existing work, we propose a model that considers all the aforementioned factors in one unified model. In particular, we apply a probabilistic strategy to weigh the importance of edges over eight information graphs. Then, we introduce a graph-based approach that jointly learns user and POI embeddings from these weighted graphs in the same latent space and provides personalized POI recommendations. Furthermore, we examine the accuracy of our approach on the cold-start user and cold-start POI problems. The proposed technique shows significant performance improvement compared with existing state-of-the-art methods. To the best of our knowledge, this is the first attempt in the literature to face both problems simultaneously.
3 Preliminaries and Problem Definition
In this section, we present the problem definition in detail and also discuss some fundamental concepts related to our research. Figure 2 depicts all participating networks, whereas Table I presents the most frequently used symbols in the sequel. Some fundamental definitions follow.
Definition 1
(POI): A POI $l \in L$ is a unique location at which users have checked in. It is represented as a tuple $l = \langle id, \mathit{longitude}, \mathit{latitude} \rangle$.
Definition 2
(Check-in): A check-in $c$ is a self-reported positioning of a user $u$ at a location $l$ at time $t$, represented as a tuple $c = \langle u, l, t \rangle$. A check-in is performed by exactly one user, but the same user may have multiple check-ins recorded in her profile.
Definition 3
(Time Period): A time period $t_j \in T$ is one of the distinct, equal-size intervals (days, weeks, or months) into which the time span of the entire dataset is divided. Each period contains the check-ins performed by all users during that interval.
Symbol  Description 

set of users ,  
set of locations ,  
set of time intervals ,  
set of routes ,  
set of check-ins  
user's check-in  
location belonging to a user's route  
number of POIs in routes  
time interval  
,  number of samples, number of negative samples 
edge between two nodes  
set of edges over each graph  
weighted edge  
set of weights over each graph  
an edge sampled from 
Definition 4
(User-User graph): is a unipartite graph that describes the social network of the users. This graph is denoted as $G_{UU} = (U, U, E_{UU})$, where both vertex sets are the set of users $U$, and $E_{UU}$ is the set of weighted edges among them. This network is an undirected graph of friendship relations among users; thus, the connection between two vertices is bidirectional, i.e., $e_{ij} = e_{ji}$. The weight $w_{ij}$ is defined as 1 over the number of users who are friends with user $u_i$:

$$w_{ij} = \frac{1}{|F(u_i)|}$$

where $F(u_i)$ denotes the set of friends of $u_i$.
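The User-User weighting can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code. The input format (a list of friendship pairs) and the function name `user_user_edges` are hypothetical. Every friendship yields an edge in both directions, but the two directed weights differ whenever the two users have different numbers of friends. This happens because the weight of an edge leaving $u_i$ depends only on how many friends $u_i$ has.

```python
from collections import defaultdict

def user_user_edges(friendships):
    """Weighted User-User edges with w_ij = 1 / (number of friends of u_i).

    friendships: iterable of undirected (u, v) pairs; each pair yields
    the two directed edges (u, v) and (v, u).
    """
    friends = defaultdict(set)
    for u, v in friendships:
        if u != v:                 # ignore self-loops
            friends[u].add(v)
            friends[v].add(u)
    return {(u, v): 1.0 / len(fs) for u, fs in friends.items() for v in fs}

edges = user_user_edges([("ann", "bob"), ("ann", "cat")])
print(edges[("ann", "bob")], edges[("bob", "ann")])  # 0.5 1.0
```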
Definition 5
(User-POI graph): is a directed bipartite graph that indicates the relation between users and locations over the entire check-in history. In particular, it defines the importance of one location for a target user relative to all other locations. This graph is denoted as $G_{UL} = (U, L, E_{UL})$, where $U$ is the set of users, $L$ is the set of locations, and $E_{UL}$ is the set of weighted edges among them. The weight is computed as the number of times user $u_i$ visited location $l_j$ divided by the total number of her check-ins:

$$w_{ij} = \frac{n(u_i, l_j)}{\sum_{l_k \in L} n(u_i, l_k)}$$

where $n(u_i, l_j)$ is the number of check-ins of $u_i$ at $l_j$.
Definition 6
(User-Time Period graph): is a directed bipartite graph representing the interaction of a user with a time period. This graph is denoted as $G_{UT} = (U, T, E_{UT})$, where $U$ is the set of users, $T$ is the set of time periods, and $E_{UT}$ is the set of weighted edges among them. The weight is computed as the number of check-ins user $u_i$ made during time period $t_j$ divided by the total number of check-ins she has made:

$$w_{ij} = \frac{n(u_i, t_j)}{\sum_{t_k \in T} n(u_i, t_k)}$$

where $n(u_i, t_j)$ is the number of check-ins of $u_i$ during $t_j$.
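Definitions 5 and 6 (and, analogously, Definition 11) share the same normalization: a pair count divided by the total count of the source node. Below is a minimal Python sketch of that normalization. It assumes the check-in history is available as a list of (source, target) occurrences, one per check-in. `normalized_edges` is a hypothetical helper, not part of the original work.

```python
from collections import Counter

def normalized_edges(pairs):
    """Weight each (source, target) edge by its share of the source's records.

    pairs: list of (source, target) occurrences, e.g. one (user, poi) pair per
    check-in for Definition 5 or one (user, period) pair per check-in for
    Definition 6. Returns {(source, target): n(source, target) / n(source)}.
    """
    pair_counts = Counter(pairs)
    source_totals = Counter(source for source, _ in pairs)
    return {(s, t): n / source_totals[s] for (s, t), n in pair_counts.items()}

checkins = [("u1", "cafe"), ("u1", "cafe"), ("u1", "gym"), ("u2", "gym")]
weights = normalized_edges(checkins)
print(weights[("u1", "cafe")], weights[("u2", "gym")])  # 0.6666666666666666 1.0
```

Since each source's outgoing weights sum to 1, every row can be read as an empirical distribution over targets.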
Definition 7
(Route): is the sequence of POIs each user visited during a time period $t$. For example, $r_u^{t} = \langle l_1 \rightarrow l_2 \rightarrow l_3 \rangle$ indicates that user $u$ moved from location $l_1$ to location $l_2$ and then ended her route at location $l_3$ during time period $t$. To extract the routes of each user, we first sort the check-ins by time. Then, we split the dataset into time periods and, for each user, construct the route sequence within each time interval.
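The route construction described above can be sketched as follows. Algorithm 1 itself is not reproduced in this section, so this is only an illustration under stated assumptions. Timestamps are taken to be numeric. Periods are assumed to be fixed-length bins starting at the earliest check-in. One route is formed per user per period. The function name and the `period_length` parameter are hypothetical.

```python
from collections import defaultdict

def extract_routes(checkins, period_length):
    """Group check-ins into one time-ordered route per (user, period).

    checkins: list of (user, location, timestamp) tuples with numeric timestamps.
    period_length: hypothetical bin size in timestamp units
                   (e.g. 86400 for daily periods with Unix timestamps).
    """
    if not checkins:
        return {}
    start = min(t for _, _, t in checkins)
    routes = defaultdict(list)
    # Sort by time so that each route lists locations in visiting order.
    for user, loc, t in sorted(checkins, key=lambda c: c[2]):
        period = int((t - start) // period_length)
        routes[(user, period)].append(loc)
    return dict(routes)

history = [("u", "bakery", 5), ("u", "home", 1), ("u", "gym", 100), ("v", "mall", 3)]
print(extract_routes(history, period_length=50))
# {('u', 0): ['home', 'bakery'], ('v', 0): ['mall'], ('u', 1): ['gym']}
```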
Definition 8
(User-Route graph): is a directed bipartite graph that emphasizes the importance of a route for each user. First, we extract the routes from the given POIs (as described in Definition 1) with Algorithm 1.
Then, we associate each user $u_i \in U$ with the routes $r_j \in R$ she followed. This relation is represented as $G_{UR} = (U, R, E_{UR})$. The weight of the edge between two nodes is computed as the number of times the user followed that particular route divided by the total number of routes made by the same user over all time periods:

$$w_{ij} = \frac{n(u_i, r_j)}{\sum_{r_k \in R} n(u_i, r_k)}$$

where $n(u_i, r_j)$ is the number of times $u_i$ followed route $r_j$.
Definition 9
(POI-POI graph): is a bidirectional bipartite graph capturing the spatial proximity between two locations. In particular, two locations $l_i$ and $l_j$ are connected by an edge if and only if a user checks in at both of them during the same time period and they lie within a given distance range of each other. Based on this assumption, we build our graph, denoted as $G_{LL} = (L, L, E_{LL})$, where $L$ is the set of locations and $E_{LL}$ is the set of weighted edges between location nodes, reflecting their geographical proximity. This weight is computed as:
where $d(l_i, l_j)$ computes the geographical distance between two locations.
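The sketch below shows one way candidate POI-POI edges could be collected. It assumes the haversine formula as the geographical distance and uses a hypothetical `max_km` threshold for the distance range. The exact weighting function is not given in this section. The sketch therefore only records, for each proximate pair, how many routes contain both locations and how far apart they are. A distance-based weight could then be computed from these values. The route and coordinate formats are also assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def poi_poi_pairs(routes, coords, max_km):
    """Collect proximate POI pairs that co-occur in the same user route.

    routes: {(user, period): [location, ...]}
    coords: {location: (lat, lon)}
    Returns {(l_i, l_j): (co_occurrence_count, distance_km)}, stored in
    both directions because the POI-POI graph is bidirectional.
    """
    counts = Counter()
    for locations in routes.values():
        for a, b in combinations(sorted(set(locations)), 2):
            if haversine_km(coords[a], coords[b]) <= max_km:
                counts[(a, b)] += 1
    pairs = {}
    for (a, b), n in counts.items():
        d = haversine_km(coords[a], coords[b])
        pairs[(a, b)] = pairs[(b, a)] = (n, d)
    return pairs

coords = {"a": (0.0, 0.0), "b": (0.0, 0.01), "c": (10.0, 10.0)}
routes = {("u", 0): ["a", "b", "c"], ("v", 0): ["b", "a"]}
print(poi_poi_pairs(routes, coords, max_km=5.0)[("a", "b")])  # co-visited twice, about 1.11 km apart
```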
Definition 10
(POI-User graph): is also a directed bipartite graph, which captures the relation between a location and a user. Its main difference from the User-POI graph is the weighting strategy, since the influence of a location on a user differs from that of a user on a location. In particular, this graph is denoted as $G_{LU} = (L, U, E_{LU})$, where $L$ is the set of locations, $U$ is the set of users, and $E_{LU}$ is the set of weighted edges among them. The weight is computed as the number of times location $l_i$ is visited by user $u_j$ divided by the number of all users who checked in at that location:

$$w_{ij} = \frac{n(l_i, u_j)}{|U_{l_i}|}$$

where $n(l_i, u_j)$ is the number of check-ins of $u_j$ at $l_i$ and $U_{l_i} \subseteq U$ is the set of users who checked in at $l_i$.
Definition 11
(POI-Time Period graph): is a directed bipartite graph that indicates the importance of a location during a time period. This graph is denoted as $G_{LT} = (L, T, E_{LT})$, where $L$ is the set of locations, $T$ is the set of time periods, and $E_{LT}$ is the set of weighted edges among them. The weight is computed as the number of check-ins performed at location $l_i$ by all users during time period $t_j$, divided by the total number of check-ins at the same location over all time periods:

$$w_{ij} = \frac{n(l_i, t_j)}{\sum_{t_k \in T} n(l_i, t_k)}$$

where $n(l_i, t_j)$ is the number of check-ins at $l_i$ during $t_j$.
Definition 12
(POI-Stay Points graph): is a bidirectional bipartite graph that describes the significance of certain locations for a user. In their daily schedule, users follow routes and spend some time at each location of a route sequence. The time spent at each location indicates the importance of that location to this particular user: the more time spent at a location, the higher its importance. To extract these important locations, denoted as 'stay points' ($ST$), we use Algorithm 2.
Then, we construct the graph $G_{LST} = (L, ST, E_{LST})$, where $L$ is the set of locations, $ST$ is the set of stay points, and $E_{LST}$ is the set of weighted edges among them. The weight is computed as the number of times a location is considered a stay point in a route, divided by the total number of times this location is considered a stay point over all routes:
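Algorithm 2 is not reproduced here. The sketch below is only a rough illustration and rests on two assumptions. First, the time spent at a location is approximated by the gap until the next check-in in the same route. Second, a location counts as a stay point when that gap reaches a hypothetical threshold `min_dwell`. The sketch then counts, per location, the routes in which it is a stay point, which is the kind of count the weight above relies on.

```python
from collections import Counter

def stay_points(route, min_dwell):
    """Locations of a time-ordered route whose approximate dwell time >= min_dwell.

    route: list of (location, timestamp) sorted by timestamp.
    Dwell time is approximated by the gap until the next check-in, so the
    last check-in of a route has no measurable dwell time and is skipped.
    """
    return [loc for (loc, t), (_, t_next) in zip(route, route[1:])
            if t_next - t >= min_dwell]

def stay_point_counts(routes, min_dwell):
    """Count, for each location, the number of routes in which it is a stay point."""
    counts = Counter()
    for route in routes:
        counts.update(set(stay_points(route, min_dwell)))
    return counts

day1 = [("home", 0), ("cafe", 100), ("work", 130)]
day2 = [("gym", 0), ("home", 10), ("bar", 200)]
print(stay_points(day1, min_dwell=60))                 # ['home']
print(stay_point_counts([day1, day2], min_dwell=60))   # Counter({'home': 2})
```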
Problem Definition: “Given a user $u$, a location $l$, and a time instance $t$ (expressed as a query $q = \langle u, l, t \rangle$), and the check-in history, predict the top-$k$ unvisited proximate POIs for that target user.”
4 Proposed Approach
Next, we present RELINE, an optimized solution for jointly learning the graph embeddings of different information networks in the same latent space and we propose a unified framework for POI recommendations.
4.1 Learning Embeddings in a Bipartite Graph
Nodes that are directly connected by an edge $e_{ij}$ with weight $w_{ij}$ exhibit first-order proximity, that is, the local pairwise closeness between two nodes. On the other hand, nodes that share many connections but are not directly connected by an edge belong to the same neighborhood and are thus likely to be similar to each other. These nodes exhibit second-order proximity, that is, the similarity between two unlinked nodes according to the network structure. To capture this kind of proximity on unipartite graphs, the LINE model [16] learns the embeddings of large graphs in a low-dimensional space. In this work, we extend this model to learn the embeddings of bipartite graph nodes. Moreover, our model can be applied to all kinds of bipartite graphs, i.e., directed/undirected, weighted/unweighted, or combinations of them.
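Given the two disjoint vertex sets $V_A$ and $V_B$ of a bipartite graph, nodes in $V_A$ that share many neighbors in $V_B$ but are not directly connected to each other are likely to have similar distributions. The conditional probability that node $v_j \in V_B$ is generated by node $v_i \in V_A$ is defined as:

$$p(v_j \mid v_i) = \frac{\exp(\vec{u}_j^{\top} \vec{u}_i)}{\sum_{v_k \in V_B} \exp(\vec{u}_k^{\top} \vec{u}_i)} \qquad (1)$$

where the embedding vectors of vertices $v_i \in V_A$ and $v_j \in V_B$ are represented as $\vec{u}_i$ and $\vec{u}_j$, respectively. Thus, for each node $v_i \in V_A$, Equation (1) defines a conditional distribution $p(\cdot \mid v_i)$ over all nodes in $V_B$. Each edge $e_{ij}$ carries a weight $w_{ij}$ that reflects the strength of this tie. To preserve the proximity of unlinked nodes in $V_A$, we let this conditional distribution approximate the empirical distribution $\hat{p}(v_j \mid v_i) = w_{ij} / d_i$, where $d_i = \sum_{v_k \in V_B} w_{ik}$ is the out-degree of $v_i$, by minimizing the following objective function:

$$O = \sum_{v_i \in V_A} \lambda_i \, \mathrm{KL}\big(\hat{p}(\cdot \mid v_i) \,\|\, p(\cdot \mid v_i)\big) \qquad (2)$$

where $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler divergence between the empirical and the conditional distributions, and $\lambda_i$ is a regularization parameter that tunes the significance of node $v_i$. For simplicity, we set this parameter equal to the out-degree $d_i$ of each node. Thus, after omitting constants, Equation (2) corresponds to the minimization of the following objective function:

$$O = -\sum_{e_{ij} \in E} w_{ij} \log p(v_j \mid v_i) \qquad (3)$$

The vectors $\vec{u}_i$ and $\vec{u}_j$ that minimize Equation (3) are the low-rank representations of the nodes in $V_A$ and $V_B$.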
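To make Equations (1) and (3) concrete, the following NumPy sketch evaluates the softmax conditional distribution and the weighted negative log-likelihood for a toy bipartite graph. The embeddings are random, and the edges are given as hypothetical `(i, j, w_ij)` triples. The full softmax over $V_B$ is computed explicitly, and that is exactly the cost negative sampling is meant to avoid.

```python
import numpy as np

def conditional_probs(U_a, U_b):
    """Equation (1): row i holds p(. | v_i), a softmax over all nodes of V_B."""
    scores = U_a @ U_b.T                          # scores[i, j] = u_j^T u_i
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def objective(U_a, U_b, edges):
    """Equation (3): O = -sum_{(i, j)} w_ij * log p(v_j | v_i)."""
    P = conditional_probs(U_a, U_b)
    return -sum(w * np.log(P[i, j]) for i, j, w in edges)

rng = np.random.default_rng(0)
U_a = rng.normal(size=(3, 4))     # 3 nodes in V_A, 4-dimensional embeddings
U_b = rng.normal(size=(5, 4))     # 5 nodes in V_B
edges = [(0, 1, 0.5), (0, 2, 0.5), (2, 4, 1.0)]   # hypothetical (i, j, w_ij)
print(conditional_probs(U_a, U_b).sum(axis=1))     # each row sums to 1
print(objective(U_a, U_b, edges))
```

With all-zero embeddings every conditional distribution is uniform over the five nodes of $V_B$, so the objective reduces to $(\sum w_{ij}) \log 5$. This gives a quick sanity check.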
4.2 Model Optimization through Negative Sampling
To avoid computing the conditional probability, which requires a summation over the entire set of nodes in Equation (3), we apply negative sampling (NEG) [14] to each edge. In particular, for each edge $e_{ij}$ we sample negative edges from a noise distribution, which results in the following objective function:

$$\log \sigma(\vec{u}_j^{\top} \vec{u}_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)} \big[\log \sigma(-\vec{u}_n^{\top} \vec{u}_i)\big] \qquad (4)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function, with output values in $(0, 1)$, and $K$ is the number of negative samples. $P_n(v) \propto d_v^{3/4}$ is the noise distribution from which the negative samples are drawn. It is a unigram distribution empirically tuned in [14], under which each node occurrence in $V_B$ is independent of all other node occurrences. Thus, the probability of selecting a node as a negative sample depends on its out-degree $d_v$ raised to that power. To optimize Equation (4), we apply the 'Hogwild' algorithm [15], which is based on asynchronous stochastic gradient descent (ASGD). In particular, each time an edge $e_{ij}$ is sampled, we calculate the gradient with respect to the embedding vector of node $v_i$ as follows:

$$\frac{\partial O}{\partial \vec{u}_i} = -w_{ij} \, \frac{\partial \log p(v_j \mid v_i)}{\partial \vec{u}_i}$$

where $\log p(v_j \mid v_i)$ is approximated by Equation (4). Notice that the gradient of node $v_i$ is multiplied by the weight of the edge. Thus, tuning the learning rate of the model may cause problems due to the high variance of the weights. On the one hand, 'overfitting' may occur for gradients with large weights if a large learning rate is chosen to suit edges with small weights. On the other hand, 'underfitting' may occur for gradients with small weights if a small learning rate is chosen to suit edges with large weights.
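The update for one sampled edge can be sketched as follows. This is a single-threaded illustration; Hogwild would run such updates concurrently without locks. Negative indices are passed in explicitly rather than drawn from $P_n(v)$. The function name, the argument layout, and the learning-rate handling are assumptions. The gradients follow from $\frac{d}{dx}\log\sigma(x) = 1 - \sigma(x)$. The step ascends the edge-weighted Equation (4), which corresponds to descending the objective.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_step(U_a, U_b, i, j, w, negatives, lr):
    """One ascent step on w_ij * Eq. (4) for the sampled edge (i, j).

    U_a, U_b: embedding matrices of V_A and V_B (updated in place).
    negatives: indices of K nodes in V_B, assumed drawn from P_n(v).
    Returns the value of Eq. (4) before the update.
    """
    u_i = U_a[i].copy()
    grad_i = np.zeros_like(u_i)
    # Positive pair: d/du_i log sigma(u_j . u_i) = (1 - sigma) * u_j
    s = sigmoid(U_b[j] @ u_i)
    value = np.log(s)
    grad_i += (1.0 - s) * U_b[j]
    U_b[j] += lr * w * (1.0 - s) * u_i
    for n in negatives:
        # Negative pair: d/du_i log sigma(-u_n . u_i) = -sigma(u_n . u_i) * u_n
        s_n = sigmoid(U_b[n] @ u_i)
        value += np.log(1.0 - s_n)
        grad_i -= s_n * U_b[n]
        U_b[n] -= lr * w * s_n * u_i
    U_a[i] += lr * w * grad_i   # gradient of v_i scaled by the edge weight
    return value

rng = np.random.default_rng(1)
U_a = rng.normal(scale=0.1, size=(2, 8))   # embeddings of V_A
U_b = rng.normal(scale=0.1, size=(4, 8))   # embeddings of V_B
before = neg_sampling_step(U_a, U_b, i=0, j=1, w=1.0, negatives=[2, 3], lr=0.1)
after = neg_sampling_step(U_a, U_b, i=0, j=1, w=1.0, negatives=[2, 3], lr=0.0)
print(before, after)   # the Eq. (4) value is higher after the update
```

Because the weight multiplies every update, a single learning rate `lr` interacts badly with weights of very different magnitudes, which is the problem the next paragraph addresses.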
To balance the learning rate of our model, we adopt the edge-sampling method presented in [16]. In particular, we draw a random value $r \in [0, w_{sum})$, where $w_{sum} = \sum_{e_{ij} \in E} w_{ij}$ denotes the sum of all weights in the particular network. We then examine the interval $[\sum_{k<m} w_k, \sum_{k \le m} w_k)$ into which this value falls, which identifies the sampled edge $e_m$. Finally, we draw sampled edges using an alias table [7], which reduces the complexity of each draw to $O(1)$. Table II presents the overall complexity of the edge-sampling optimization.
Sample edge from alias table  

Negative sampling optimization  
Overall complexity 
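The alias method referenced above can be sketched as follows. We use Vose's variant for illustration and do not claim it is the exact variant of [7]. `build_alias` runs once in $O(n)$. After that, each draw in `sample_alias` costs $O(1)$: it picks a column uniformly, then keeps it or jumps to its alias.

```python
import random

def build_alias(weights):
    """Vose's alias method: O(n) setup for O(1) weighted sampling."""
    n = len(weights)
    total = float(sum(weights))
    prob = [w * n / total for w in weights]   # scaled so that the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        alias[s] = g                  # column s is topped up by g
        prob[g] -= 1.0 - prob[s]      # g gives away the mass s was missing
        (small if prob[g] < 1.0 else large).append(g)
    for i in small + large:           # leftovers are 1 up to rounding error
        prob[i] = 1.0
    return prob, alias

def sample_alias(prob, alias, rng=random):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

weights = [1.0, 2.0, 3.0, 4.0]        # e.g. edge weights w_ij of one graph
prob, alias = build_alias(weights)
rng = random.Random(42)
draws = [sample_alias(prob, alias, rng) for _ in range(100_000)]
print([round(draws.count(i) / len(draws), 2) for i in range(4)])  # close to [0.1, 0.2, 0.3, 0.4]
```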
4.3 Joint Learning of Graph Dynamics
Given the input bipartite graphs, the next step is to integrate them into our model. The graphs that correspond to users' relations are User-User, User-POI, User-Route, and User-Time Period. On the other hand, the graphs that correspond to the ties of locations with other networks are POI-POI, POI-User, POI-Stay Points, and POI-Time Period. We collectively integrate the embeddings of these graphs, corresponding to users' and POIs' ties, by minimizing the objective function:
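$$O = O_{UU} + O_{UL} + O_{UR} + O_{UT} + O_{LL} + O_{LU} + O_{LST} + O_{LT} \qquad (5)$$

where each of the above objective functions is computed over the edges $E_{XY}$ of the corresponding graph, as in Equation (3):

$$O_{XY} = -\sum_{e_{ij} \in E_{XY}} w_{ij} \log p(v_j \mid v_i), \quad XY \in \{UU, UL, UR, UT, LL, LU, LST, LT\}$$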
To minimize the objective function in Equation (5), we first merge the edges of all unipartite and bipartite graphs. Then, in each step, we update the model by sampling a new edge, with a probability proportional to the weight of that edge. In this way, our model walks through the heterogeneous bipartite graphs with respect to the inner and outer vertices of the graphs and the weight influence. The model is trained jointly and dynamically, as shown in Algorithm 3.
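The joint training loop can be sketched as below. The graph names, the edge-list format, and the `update` callback are hypothetical. The callback would perform a negative-sampling step such as the one in Section 4.2. Algorithm 3 is not reproduced here. Edges from all graphs are pooled and drawn with probability proportional to their weight. For brevity, the sketch uses the standard library's `random.choices` with precomputed cumulative weights instead of an alias table. The weight already drives sampling, so the callback receives the edge without its weight, in line with the edge-sampling idea of [16].

```python
import random
from collections import Counter
from itertools import accumulate

def merge_graphs(graphs):
    """Flatten {graph_name: [(src, dst, w), ...]} into one pooled edge list."""
    return [(name, src, dst, w)
            for name, edges in graphs.items()
            for src, dst, w in edges]

def train_jointly(graphs, update, num_samples, seed=0):
    """Draw edges from all graphs with probability proportional to their weight.

    `update(name, src, dst)` is a hypothetical callback that would perform one
    negative-sampling step on the embeddings of src and dst; the weight is not
    passed because it has already decided how often the edge is drawn.
    """
    edges = merge_graphs(graphs)
    cum_weights = list(accumulate(e[-1] for e in edges))
    rng = random.Random(seed)
    for name, src, dst, _ in rng.choices(edges, cum_weights=cum_weights, k=num_samples):
        update(name, src, dst)

graphs = {"UU": [("u1", "u2", 1.0)], "UL": [("u1", "l1", 3.0)]}
seen = Counter()
train_jointly(graphs, lambda name, src, dst: seen.update([name]), num_samples=40_000)
print(seen["UL"] / 40_000)   # roughly 0.75
```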
4.4 Unified Model for POI Recommendations
Once all the embeddings presented in the previous section have been learned, consider a prediction request $q = \langle u, l, t \rangle$ concerning a user $u$ at a location $l$ at a timestamp $t$. We project these values onto the corresponding time period, route, and stay point, and consider candidate POIs within a geographical range of 10 km. We claim that a user is willing to visit proximate locations; thus, for a recommendation farther than this range, the probability of a visit is very small. Then, we rank the top-$k$ unvisited candidate POIs for that user within that distance. The prediction score for each unvisited location is computed as:
(6) 
where the terms involved are the embedding of user $u$, the embedding of location $l$, the embedding of the route this check-in belongs to, the embedding of the time period in which the check-in was made, and the embedding of the stay points. Moreover, RELINE jointly learns the embeddings from different information networks in the same latent space. Thus, the learned POI embeddings capture the information of all the participating networks presented in the previous sections. In this way, we aim to alleviate sparsity simply by using additional information networks. In particular, we simultaneously learn the dynamics of the social influence, the geographical influence, the temporal influence, and the users' preference dynamics. Finally, four regularization parameters define the importance of each information network separately in our model.
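The candidate filtering and ranking step can be sketched as follows. The exact form of Equation (6) is not reproduced in this section, so the score is passed in as a hypothetical callable. The haversine distance and the data layout are also assumptions. Only the 10 km radius and the exclusion of already visited POIs come from the text.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def recommend(query_location, candidates, visited, score, k, max_km=10.0):
    """Return the top-k unvisited POIs within max_km of the query location.

    candidates: {poi: (lat, lon)}; visited: POIs already in the user's history;
    score: hypothetical callable poi -> float standing in for Equation (6).
    """
    nearby = [poi for poi, xy in candidates.items()
              if poi not in visited and haversine_km(query_location, xy) <= max_km]
    return sorted(nearby, key=score, reverse=True)[:k]

candidates = {"a": (0.0, 0.01), "b": (0.0, 0.05), "c": (0.0, 0.5), "d": (0.0, 0.02)}
scores = {"a": 0.2, "b": 0.9, "c": 1.0, "d": 5.0}
print(recommend((0.0, 0.0), candidates, visited={"d"}, score=scores.get, k=2))  # ['b', 'a']
```

POI `c` has the highest score but lies about 55 km away, and `d` was already visited, so neither is returned.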
5 Performance Evaluation
In this section, we focus on the performance of the proposed methodology. In particular, we compare our technique against previous ones. The source code of our approach is available at https://github.com/thedx4/RELINE.
5.1 Datasets and Techniques
We have used three real-world datasets: (i) Foursquare [20] (https://sites.google.com/site/dbhongzhi), (ii) Weeplaces [1] (http://www.yongliu.org/datasets), and (iii) Gowalla [1]. Their main characteristics are presented in Table III. All datasets contain users' check-in history with timestamps and geographical information. Additionally, they contain information about users' friendship ties. The datasets span time periods of 5, 91, and 31 months, respectively.
Foursquare  Weeplaces  Gowalla  

Users  114,508  15,799  319,063 
POIs  62,462  971,309  2,844,076 
Check-ins  1,434,668  7,658,368  36,001,959 
Friendships  32,511  119,930  1,900,653 
Time span  Sep 2010 to Jan 2011  Nov 2003 to Jun 2011  Jan 2009 to Aug 2011 
In Figure 3, we present the check-in distributions for all three datasets. As shown, the datasets follow a power-law distribution both for the number of users' check-ins and for the number of visits to a particular location. Under this distribution, there is a small number of users with many check-ins (short head) and many users with a small check-in record (long tail). The same principle holds for locations: on the one hand, there are popular locations with many visits, and on the other hand, there are locations that are visited only a few times. Moreover, all datasets present a good example of the cold-start problem [5], i.e., recommending new locations to users with a short check-in history.
Furthermore, we present the temporal distribution of all datasets in Figure 4. In particular, Figure 4 presents the distribution of check-ins per day of the week. Users tend to be more active from Thursday to Sunday than on the other days of the week. Since users check in at locations during their spare time, this figure reflects a tendency to perform more check-ins towards the end of the week. The same principle holds for the distribution of check-ins during the day, as shown in Figure 4. Once again, we observe that users are more active from 13:00 to 02:00, with a peak at 18:00, and they tend to check in at more locations during the night than during the morning.
In the experimentation we evaluate the performance of the following techniques:

RankGeoFM [9]: is an MF-based unified model that simultaneously learns users' preferences and incorporates the spatial influence of proximate POIs. The second term of the model is regularized with a distance-based weighting related to the target POI.

ASMF [8]: is another MF model consisting of a two-step procedure. First, the model learns candidate POIs that have been visited by friends (social influence), and then a categorical-based weight is applied considering geographical influence.

GE [20]: is a Graph Factorization (GF) approach in which two joint probability distributions are computed for each pair of nodes: the first is related to the adjacency matrix and the second to the embeddings. This method embeds four bipartite information networks into the same latent space to predict the next unvisited POI by dynamically updating users' profiles.

PACE [22]: is a semi-supervised learning framework, based on users' Preference and Context Embeddings, that jointly learns user and location embeddings.

Versions of RELINE: To evaluate the influence of each network we have used three versions of our model. In particular, we start with a simplistic version and then we proceed with more enhanced alternatives.

RELINE: Is a simplified version of RELINE, which contains only the social influence information, that is .

RELINE: the previous model enriched with the geographical influence .

RELINE: the last version, which enriches the former model with the temporal influence .
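The GE description above pairs an empirical joint distribution over node pairs (from edge weights) with a model distribution (from embeddings). As an illustration only, the sketch below follows the first-order proximity formulation of LINE [16], not GE's exact objective: the empirical probability of edge (i, j) is w_ij / W, the model probability is the sigmoid of the inner product of the two embeddings, and, replacing the distance between the two distributions with KL divergence and omitting constants, the objective becomes the negative weighted log-likelihood. All function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def empirical_joint(edges):
    """Empirical joint probability w_ij / W for every weighted edge (i, j)."""
    total = sum(edges.values())
    return {pair: w / total for pair, w in edges.items()}


def model_log_joint(u_i, u_j):
    """Log of the model probability sigma(u_i . u_j) from two embeddings."""
    return log_sigmoid(sum(a * b for a, b in zip(u_i, u_j)))


def first_order_objective(edges, emb):
    """LINE-style objective: -sum over edges of w_ij * log sigma(u_i . u_j).

    edges: dict mapping (i, j) node pairs to non-negative weights.
    emb:   dict mapping node ids to embedding vectors (lists of floats).
    """
    return -sum(w * model_log_joint(emb[i], emb[j]) for (i, j), w in edges.items())
```

Minimizing this objective pulls the embeddings of strongly connected nodes (e.g., a user and a frequently visited POI) closer together in the shared latent space.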




5.2 Evaluation Methodology
We partition the check-ins of each target user into three sets: (i) the training set , which contains of the total check-ins and is treated as known information; (ii) the probe set , which contains and is used to test our model; and (iii) the validation set, which contains the remaining check-ins and is used to tune the hyperparameters. It holds that = and =. Thus, for each target user we generate recommendations based only on the POIs in .
For the evaluation, we measure Accuracy@ as proposed in [19]. In particular, for each given as a query , we compute the prediction score for that along with all unvisited proximate POIs of the target user using Equation (6). We rank the predicted scores and select the top@ POIs. If the ground-truth POI appears in the top@, the prediction is correct (i.e., a true positive); otherwise, it is wrong. The overall accuracy of the top@ is computed by averaging over all test cases:
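A per-user three-way split of this kind can be sketched as follows. The fractions are left as parameters, and ordering each user's check-ins chronologically before splitting is an assumption, since the text does not state whether the split is temporal or random. The function name `split_user_checkins` is hypothetical.

```python
def split_user_checkins(checkins, train_frac, probe_frac):
    """Split one user's check-ins into disjoint training, probe and validation sets.

    checkins: list of (timestamp, poi_id) tuples for a single user.
    train_frac, probe_frac: non-negative fractions whose sum is at most 1;
    the validation set receives whatever remains.
    Sorting by timestamp is an assumption of this sketch.
    """
    if train_frac < 0 or probe_frac < 0 or train_frac + probe_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    ordered = sorted(checkins)                  # chronological order
    n = len(ordered)
    n_train = int(n * train_frac)
    n_probe = int(n * probe_frac)
    train = ordered[:n_train]                   # known information
    probe = ordered[n_train:n_train + n_probe]  # used for testing
    validation = ordered[n_train + n_probe:]    # used for hyperparameter tuning
    return train, probe, validation
```

Because the three slices are contiguous and non-overlapping, their union is the user's full history and their pairwise intersections are empty.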
(7) 
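The procedure above can be sketched in code. In this sketch we write k for the list cutoff, and `score_fn` is a hypothetical stand-in for the prediction score of Equation (6); the test-case format is also an assumption. Each test case counts as a hit when its ground-truth POI is ranked within the top k of its candidate pool, and Accuracy@k is the fraction of hits.

```python
def accuracy_at_k(test_cases, score_fn, k):
    """Fraction of test cases whose ground-truth POI is ranked in the top k.

    test_cases: iterable of (user, ground_truth_poi, candidate_pois), where
        candidate_pois are the user's unvisited proximate POIs.
    score_fn: callable(user, poi) -> float, standing in for Equation (6).
    """
    hits = 0
    total = 0
    for user, truth, candidates in test_cases:
        pool = set(candidates) | {truth}  # rank the ground truth with the candidates
        ranked = sorted(pool, key=lambda poi: score_fn(user, poi), reverse=True)
        if truth in ranked[:k]:           # true positive
            hits += 1
        total += 1
    return hits / total if total else 0.0
```

Note that ties in `score_fn` are broken arbitrarily here; a deterministic tie-breaking rule would be needed for exactly reproducible numbers.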
5.3 Impact of Information Networks
Next, we examine the influence of each information network on the overall prediction accuracy. In particular, we explore how beneficial each enrichment is for the top predictions. The networks examined in this section are i) the social influence, ii) the geographical influence, iii) the temporal influence, and iv) the users' preference dynamics. Thus, we compare RELINE with the three models RELINE, RELINE, and RELINE, which are described in Section 5.1. As more information networks are embedded, the accuracy increases, as shown in each column of Tables V(a) to V(c). This indicates that our model gradually alleviates the sparsity problem, since it exploits more information about the users and the POIs. Moreover, the accuracy of each model increases with , meaning that the models fit users' behavior well.
5.4 Comparison with Other Techniques
In this section, we compare our model with other state-of-the-art approaches in terms of accuracy for the top predictions [=] on three large datasets. In particular, we examine the performance of all models when providing POI recommendations for (i) all users, (ii) cold-start users, and (iii) cold-start POIs.
5.4.1 Accuracy over all users
First, we examine how the models perform when providing recommendations to all users, regardless of history size or data sparsity. As shown in Figure 5, RELINE significantly outperforms all other methods. Methods that either a) learn users' preferences and then incorporate geographical influence, like Rank-GeoFM, or b) learn from the check-ins of the social network, like ASMF, ignore i) the sequence of the POIs, ii) the temporal influence, and iii) preference dynamics; our method performs much better because it exploits richer information in the same latent space. On the other hand, methods that learn user and POI embeddings in the same latent space through multiple information networks, like GE and PACE, each miss some of the following important factors: i) periodicity, ii) preference evolution, and iii) the sequential importance of the check-ins; as a result, they achieve lower accuracy than our method. We highlight that all models achieve higher accuracy when the number of POIs is small and the check-in activity is denser, as shown in Figure 5. Conversely, when the dataset is sparse and many POIs are available to visit, as in the other two datasets shown in Figure 5, the accuracy is lower. This finding supports the claim that learning both user and POI embeddings from as many information networks as possible increases the model's ability to correlate a user with POIs and eventually improves the accuracy.
5.4.2 Accuracy over cold-start users
Next, we examine the effectiveness of our model for cold-start users and compare it with other approaches. The concept was initially introduced in [5] and refers to users with a short history. Supporting recommendations for such users is difficult due to the lack of adequate information. To this end, we performed experiments providing recommendations only to cold-start users and compared the models in terms of accuracy, as shown in Figure 6. Since only ASMF and GE support cold-start recommendations, we compare our model with these two methods. As expected, the overall accuracy is lower than in the experiments of the previous section, since little information is available for these users. Although ASMF learns the POIs visited by the target user's social network and then refines the results with a categorical weighting strategy w.r.t. geographical influence, its accuracy is significantly lower than that of our method. Similarly, the performance of GE, which jointly captures the spatial influence, the sequential effect, the periodicity, and the semantics in the same latent space, is lower, since it does not consider: 1) the importance of stay points in each sequence, 2) the preference dynamics, 3) the temporal effect, and 4) the social influence. In contrast, we use side information related to both users and POIs from eight weighted information networks, which has a significant impact on effectiveness.



5.4.3 Accuracy over cold-start POIs
Next, we examine a similar problem, called the cold-start POI problem. The goal in this case is to recommend unvisited POIs to users that have at least one check-in at a POI with fewer than 15 check-ins. Thus, we examine not only how the models behave for new users but also how they behave when a new location enters the system. In simple terms, we want to check whether the new location is among the top recommendations.
Once again, we evaluate our approach only against the models that support the cold-start POI problem. In Figure 7, we present the results of all models on all three datasets. Our model outperforms the compared methods since, in addition to all the aforementioned factors, it treats POIs as sequences of routes. In particular, when we extract users' routes, every cold-start POI becomes correlated with other proximate POIs. Moreover, some of them are considered 'stay points', depending on how long a user spends at that location. We then weight the edge between a user and a cold-start POI as an important one if that user spends a lot of time there. In this way, we tackle cold-start POIs by exploiting their relational influence on other nodes of the graph during the learning phase. Both comparison models achieve lower accuracy for the top predictions when the number of POIs is small, as shown in Figure 7, because they either explore the check-ins of users' friends, who have not visited cold-start POIs, or explore POI sequences based on how frequently a user checks in at a cold-start POI. POIs with few check-ins are not connected to others in the POI-POI graph that GE uses. Finally, both comparison models appear to achieve higher accuracy when many POIs are in the system, as for the Gowalla dataset in Figure 7.
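The selection of the cold-start POI test population can be sketched directly from the definition above: count check-ins per POI, mark POIs with fewer than 15 check-ins as cold-start, and keep users with at least one check-in at such a POI. Counting over the full check-in log (rather than, say, only the training set) is an assumption, and both function names are hypothetical.

```python
from collections import Counter


def cold_start_pois(checkins, threshold=15):
    """POIs with fewer than `threshold` check-ins.

    checkins: iterable of (user_id, poi_id) pairs (hypothetical format).
    """
    counts = Counter(poi for _, poi in checkins)
    return {poi for poi, count in counts.items() if count < threshold}


def users_with_cold_start_checkin(checkins, threshold=15):
    """Users that have at least one check-in at a cold-start POI."""
    checkins = list(checkins)  # iterated twice below
    cold = cold_start_pois(checkins, threshold)
    return {user for user, poi in checkins if poi in cold}
```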
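The stay-point weighting described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the text: the time spent at a POI is approximated by the gap until the user's next check-in in the route, a POI counts as a stay point when that gap is at least `min_stay`, and a stay point adds a fixed `stay_bonus` to the user-POI edge weight. `min_stay`, `stay_bonus` and the function name are hypothetical.

```python
from collections import defaultdict


def stay_point_edge_weights(route, min_stay, stay_bonus):
    """User-POI edge weights for one user's route, boosting stay points.

    route: list of (timestamp, poi_id) for a single user, sorted by time.
    Dwell time at each POI is approximated by the gap to the next check-in.
    """
    weights = defaultdict(float)
    for (t, poi), (t_next, _) in zip(route, route[1:]):
        weights[poi] += 1.0              # base weight: one visit
        if t_next - t >= min_stay:
            weights[poi] += stay_bonus   # long dwell time: treat as a stay point
    if route:
        weights[route[-1][1]] += 1.0     # last check-in: dwell time unknown
    return dict(weights)
```

Under this scheme a cold-start POI with a single long visit can still receive a heavier edge than a frequently passed-through POI, which is the intuition the paragraph relies on.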



5.5 Parameter Tuning
In this section, we study the importance of parameter tuning. In particular, we examine the impact of 1) adding information networks to the model, 2) the number of samples , 3) the embedding dimensionality , and 4) the time period size on the performance of our model in terms of accuracy.
5.5.1 Impact of the Number of Samples and Dimensions
Here, we present the experiments conducted to select the best values for the number of samples and dimensions. The results for each dataset are presented in Table V. Our findings for the top@10 indicate that our model is not greatly affected by the dimensionality . The accuracy increases at a higher rate with the dimensionality up to =100 for Foursquare and Weeplaces and =140 for Gowalla; beyond that point, it does not change significantly. In contrast to the dimensionality, our model is sensitive to the number of samples (). Its accuracy keeps increasing with the number of samples until a convergence point is reached, after which the improvement is marginal. Moreover, the more edges a network contains, the more samples are needed. To maximize accuracy, we set the number of samples to 100 for Foursquare, 200 for Weeplaces, and 300 for Gowalla, together with the dimensionalities discussed above.
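One plausible reading of the "number of samples", assuming training draws edges with probability proportional to their weights as LINE [16] does, is sketched below. LINE itself uses alias tables for constant-time draws; here Python's `random.Random.choices` with a `weights` argument is a simpler substitute. The function name `sample_edges` is hypothetical, and the link between this sampler and the paper's sample counts is an assumption.

```python
import random


def sample_edges(edges, num_samples, seed=None):
    """Draw `num_samples` edges with probability proportional to their weights.

    edges: dict mapping (source, target) pairs to non-negative weights.
    Uses random.Random.choices instead of LINE's alias-table sampler.
    """
    rng = random.Random(seed)            # seeded for reproducible draws
    pairs = list(edges)
    weights = [edges[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=num_samples)
```

Denser networks have more distinct edges to cover, so more draws are needed before every heavily weighted edge is visited, which is consistent with the observation that larger datasets call for more samples.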
5.5.2 Impact of Time Period
Next, we examine the influence of the time interval T on the overall accuracy for varying values of the top@ predictions. T is crucial for our model, since it is used to construct multiple graphs such as User-Route, User-Time period, POI-Time period, etc. When extracting user-route edges, if there are not enough check-in data within small time intervals, it is difficult to correlate that user with other candidate POIs. To address this issue, we vary the size of T and examine which value achieves the highest accuracy. Table VI presents the results for each dataset separately. In each sub-table, the accuracy reaches a maximum value and then gradually decreases. The reason is that, when T is too small, there are few data and the accuracy is low; when T is large, too many nodes are correlated to the target user, which leads to overfitting. Thus, we set T to 20, 40, and 15 for Foursquare, Weeplaces, and Gowalla, respectively.
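To make the role of T concrete, the sketch below builds weighted User-Time period edges by assigning each check-in to a period index of width T. It assumes numeric timestamps in the same unit as T and consecutive (non-cyclic) periods starting at `origin`; the paper does not specify these details, and the function name is hypothetical. The same bucketing applies to POI-Time period edges.

```python
from collections import Counter


def user_time_period_edges(checkins, period_size, origin=0):
    """Weighted (user, period) edges: number of check-ins per time period.

    checkins: iterable of (user_id, timestamp) with numeric timestamps.
    period_size: the interval T, in the same unit as the timestamps.
    """
    if period_size <= 0:
        raise ValueError("period_size must be positive")
    return Counter(
        (user, int((ts - origin) // period_size))  # index of the period
        for user, ts in checkins
    )
```

A smaller T spreads the same check-ins over more periods, so each (user, period) edge is supported by fewer observations, which matches the low accuracy reported for small T.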
5.5.3 Tuning parameters , and
In Figure 8, we present the results of tuning the parameters , and in terms of accuracy for all datasets. As shown in Equation (6), these parameters correspond to the social, geographical and temporal influences, along with the preference dynamics, respectively. For simplicity, we vary each parameter in [0, 1] while setting all other parameters equal to . Each diagram contains an intersection point where the accuracy curves meet. There is also a peak point where our model achieves its highest accuracy, and this value was used to train our model. Moreover, the importance of each influence network varies across datasets, as the model adapts to users' behavior.
As shown in Figure 8, the most influential parameter differs across datasets. For example, Foursquare users are highly influenced by their social network, as shown in Figure 8; thus, they tend to visit the locations their friends visit. On the other hand, Weeplaces users tend to follow specific routes every day, so their movement to a new location is highly influenced by geographical factors, as shown in Figure 8. Finally, in datasets with many check-ins, like Gowalla, capturing the influence of preference dynamics is more important when predicting the next location, as Figure 8 depicts.
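The one-parameter-at-a-time sweep described above can be sketched as follows. `evaluate` is a hypothetical hook that would train the model with the given weights and return its accuracy on the validation set; the parameter names, the shared `fixed_value` for the remaining parameters, and the 11-point grid are all illustrative assumptions.

```python
def sweep_parameter(evaluate, names, target, fixed_value, steps=11):
    """Vary `target` over an even grid on [0, 1], holding the others fixed.

    evaluate: callable(dict name -> weight) -> accuracy (hypothetical hook).
    names: all parameter names, e.g. one per influence network.
    Returns (best_value, best_accuracy, curve) where curve lists (value, accuracy).
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if target not in names:
        raise ValueError("target must be one of names")
    curve = []
    for s in range(steps):
        value = s / (steps - 1)                       # grid point in [0, 1]
        params = {name: fixed_value for name in names}
        params[target] = value                        # only the target varies
        curve.append((value, evaluate(params)))
    best_value, best_accuracy = max(curve, key=lambda point: point[1])
    return best_value, best_accuracy, curve
```

Running one sweep per parameter yields one curve per influence network, which is the shape of the diagrams described for Figure 8.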
6 Conclusions
The rapid growth of users' interactions in OSNs and the huge amount of information they provide have led researchers to develop models that retrieve personalized information close to users' past preferences. Unfortunately, these models are hampered by the sparsity and cold-start problems. The geographical information related to the content posted in such networks triggered new functionalities and research directions to address both problems. Many models provide POI recommendations using either the social network influence or the geographical proximity of the user's current location. However, these models miss the influence of the temporal dimension or users' preference dynamics, which are crucial when personalizing the retrieved information.
In this paper, we present a novel model that considers all the aforementioned factors when providing POI recommendations. In particular, our model uses a probabilistic weighting strategy over eight information graphs that correspond to user and POI relations. It then uses a graph-based approach that jointly learns user and POI embeddings from these weighted graphs in the same latent space and provides personalized POI recommendations. We examine the influence of social, geographical, temporal, and preference dynamics in terms of accuracy. We compare our approach against four state-of-the-art models, measuring the accuracy of the recommendations for i) all users, ii) cold-start users, and iii) cold-start locations. Our method significantly outperforms the state-of-the-art approaches.
References
[1] R. Baral and T. Li. MAPS: A multi aspect personalized POI recommender system. In Proc. 10th ACM Conf. on Recommender Systems (RecSys), pages 281–284, Boston, MA, 2016.
[2] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: User movement in location-based social networks. In Proc. 17th ACM SIGKDD International Conf. on Knowledge Discovery & Data Mining (KDD), pages 1082–1090, San Diego, CA, 2011.
[3] G. Christoforidis, P. Kefalas, A. Papadopoulos, and Y. Manolopoulos. Recommendation of points-of-interest using graph embeddings. In Proc. 5th IEEE International Conf. on Data Science & Advanced Analytics (DSAA), Turin, Italy, 2018.
[4] H. Gao, J. Tang, X. Hu, and H. Liu. Content-aware point of interest recommendation on location-based social networks. In Proc. 29th AAAI Conf. on Artificial Intelligence, pages 1721–1727, Austin, TX, 2015.
[5] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53, 2004.
[6] P. Kefalas and Y. Manolopoulos. A time-aware spatio-textual recommender system. Expert Systems with Applications, 78:396–406, 2017.
[7] A. Q. Li, A. Ahmed, S. Ravi, and A. J. Smola. Reducing the sampling complexity of topic models. In Proc. 20th ACM SIGKDD International Conf. on Knowledge Discovery & Data Mining (KDD), pages 891–900, New York, NY, 2014.
[8] H. Li, Y. Ge, R. Hong, and H. Zhu. Point-of-Interest recommendations: Learning potential check-ins from friends. In Proc. 22nd ACM SIGKDD International Conf. on Knowledge Discovery & Data Mining (KDD), pages 975–984, San Francisco, CA, 2016.
[9] X. Li, G. Cong, X.-L. Li, T.-A. N. Pham, and S. Krishnaswamy. Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation. In Proc. 38th International ACM Conf. on Research & Development in Information Retrieval (SIGIR), pages 433–442, Santiago, Chile, 2015.
[10] B. Liu, T. Qian, B. Liu, L. Hong, Z. You, and Y. Li. Learning spatiotemporal-aware representation for POI recommendation. CoRR, abs/1704.08853, 2017.
[11] L. Liu, J. Xu, S. S. Liao, and H. Chen. A real-time personalized route recommendation system for self-drive tourists based on vehicle to vehicle communication. Expert Systems with Applications, 41(7):3409–3417, 2014.
[12] Z. Lu, B. Savas, W. Tang, and I. S. Dhillon. Supervised link prediction using multiple sources. In Proc. 10th IEEE International Conf. on Data Mining (ICDM), pages 923–928, Sydney, Australia, 2010.
[13] F. Meng, D. Gao, W. Li, X. Sun, and Y. Hou. A unified graph model for personalized query-oriented reference paper recommendation. In Proc. 22nd ACM International Conf. on Information & Knowledge Management (CIKM), 2013.
[14] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Proc. 26th International Conf. on Neural Information Processing Systems (NIPS), pages 3111–3119, Lake Tahoe, NV, 2013.
[15] F. Niu, B. Recht, C. Ré, and S. J. Wright. HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent. In Proc. 24th International Conf. on Neural Information Processing Systems (NIPS), pages 693–701, Granada, Spain, 2011.
[16] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In Proc. 24th International Conf. on World Wide Web (WWW), pages 1067–1077, Florence, Italy, 2015.
[17] V. Vasuki, N. Natarajan, Z. Lu, B. Savas, and I. Dhillon. Scalable affiliation recommendation using auxiliary networks. ACM Transactions on Intelligent Systems & Technology, 3(1), 2011.
[18] H. Wang, Y. Fu, Q. Wang, H. Yin, C. Du, and H. Xiong. A location-sentiment-aware recommender system for both home-town and out-of-town users. In Proc. 23rd ACM SIGKDD International Conf. on Knowledge Discovery & Data Mining (KDD), 2017.
[19] W. Wang, H. Yin, S. Sadiq, L. Chen, M. Xie, and X. Zhou. SPORE: A sequential personalized spatial item recommender system. In Proc. 32nd IEEE International Conf. on Data Engineering (ICDE), pages 954–965, Helsinki, Finland, 2016.
[20] M. Xie, H. Yin, H. Wang, F. Xu, W. Chen, and S. Wang. Learning graph-based POI embedding for location-based recommendation. In Proc. 25th ACM International Conf. on Information & Knowledge Management (CIKM), pages 15–24, Indianapolis, IN, 2016.
[21] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proc. 10th SIAM International Conf. on Data Mining (SDM), pages 211–222, 2010.
[22] C. Yang, L. Bai, C. Zhang, Q. Yuan, and J. Han. Bridging collaborative filtering and semi-supervised learning: A neural approach for POI recommendation. In Proc. 23rd ACM SIGKDD International Conf. on Knowledge Discovery & Data Mining (KDD), pages 1245–1254, Halifax, Canada, 2017.
[23] H. Yin, W. Wang, H. Wang, L. Chen, and X. Zhou. Spatial-aware hierarchical collaborative deep learning for POI recommendation. IEEE Transactions on Knowledge and Data Engineering, 29:2537–2551, 2017.
[24] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. M. Thalmann. Time-aware point-of-interest recommendation. In Proc. 36th ACM International Conf. on Research & Development in Information Retrieval (SIGIR), pages 363–372, Dublin, Ireland, 2013.
[25] J.-D. Zhang, C.-Y. Chow, and Y. Li. LORE: Exploiting sequential influence for location recommendations. In Proc. 22nd ACM SIGSPATIAL International Conf. on Advances in Geographic Information Systems (GIS), pages 103–112, Dallas, TX, 2014.
[26] S. Zhao, T. Zhao, I. King, and M. R. Lyu. Geo-Teaser. In Companion Proc. 26th International Conf. on World Wide Web (WWW), pages 153–162, 2017.