PG^2Net: Personalized and Group Preferences Guided Network for Next Place Prediction

10/15/2021 ∙ by Huifeng Li, et al. ∙ Shanghai Jiao Tong University ∙ NetEase, Inc.

Predicting the next place to visit is a key task in human mobility behavior modeling, which plays a significant role in various fields, such as epidemic control, urban planning, traffic management, and travel recommendation. A typical solution is to design RNN-based modules to capture users' preferences for various locations. Although these RNN-based methods can effectively learn an individual's hidden personalized preferences for her visited places, the interactions among users can only be weakly learned through the representations of locations. Targeting this, we propose an end-to-end framework named personalized and group preference guided network (PG^2Net), which considers users' preferences for various places at both the individual and collective levels. Specifically, PG^2Net combines Bi-LSTM with an attention mechanism to capture each user's long-term mobility tendency. To learn the population's group preferences, we utilize the spatial and temporal information of visitations to construct a spatio-temporal dependency module. We adopt a graph embedding method to map users' trajectories into a hidden space, capturing their sequential relations. In addition, we devise an auxiliary loss to learn the vectorial representation of the next location. Experimental results on two Foursquare check-in datasets and one mobile phone dataset indicate the advantages of our model compared to state-of-the-art baselines. Source codes are available at




1 Introduction

With the rapid development of information and communication technologies, users can share their locations almost anytime and anywhere to acquire location-aware services, generating abundant trajectory data. Location-based social networks (LBSNs), such as Foursquare and Yelp, collect huge amounts of location data from millions of individuals [38, 33]. These trajectory data provide an unprecedented opportunity to study human mobility behavior at scale [11, 16]. With massive mobile phone data, González et al. found a high degree of temporal and spatial regularity in human trajectories [11]. The radii of gyration clearly follow a power-law distribution, indicating simple, reproducible patterns of human mobility. In a follow-up study, Song et al. measured the potential predictability of human mobility as 93% [27]. Due to this high predictability, researchers have attempted to model the mobility behavior of the population at urban scale and utilize it to tackle various urban challenges, including traffic congestion mitigation [7, 34, 24], air pollution exposure estimation [35], planning of electric vehicle charging behavior [32], and control of epidemics [29, 3].

Prediction of a user's next place to visit is a key task in human mobility modeling and has attracted increasing attention from researchers [2]. The task aims to predict the destination of the next trip for each user. Assuming that a user's next destination is highly correlated with her recently visited locations, Rendle et al. developed a personalized transition matrix based on the Markov chain to capture the influence of users' recently visited venues on the current mobility decision [25]. Researchers have also noticed that users' periodic behavior, observed from long-term historical trajectories, plays a critical role in the decision of the next travel [30, 15]. For instance, in her daily routine, a user may be accustomed to going to the library every weekend. Therefore, combining the influence of users' long-term and short-term trajectories can benefit next location prediction [9, 23]. Feng et al. proposed DeepMove, an attention-based model that extracts the influence of a user's historical trajectory on the current situation and provides personalized location recommendations [9]. Manotumruksa et al. proposed a deep recurrent collaborative filtering framework (DRCF) that leverages the locations of users with similar historical trajectories to assist next location prediction. More recently, researchers have proposed CNN-based methods and gated networks within RNN-based frameworks to extract both the long-term and short-term preferences of users [5, 28].

Fig. 1: An illustration of next location prediction.

Although prediction accuracy has been gradually improved in the aforementioned literature, predicting the mobility behavior of a large number of users in complex urban environments remains significantly challenging. On the one hand, existing studies focus on modeling mobility patterns at the individual level. The collective pattern of interaction between the population and the space is not explicitly modeled in these methods. In early studies, statistical physicists used historical movement data to discover collective patterns of humans. For example, Schneider et al. [26] found that more than 90% of residents' mobility behavior conforms to one of 17 basic network modes (motifs). Alessandretti et al. [1] found that the number of places visited repeatedly by individuals is stable, revealing people's stable social relations. These studies suggest that statistical physical characteristics are valuable for studying movement trends at the group level.

On the other hand, existing methods for next location prediction usually concatenate the embedding of the user ID with the latent vectors of locations in the long-term historical and recent trajectories to capture the user's personalized preferences [13, 19, 20]. While users have various preferences for different locations, these methods cannot directly model such heterogeneity and the dynamic change of preferences. Targeting this weakness, researchers introduced an attention mechanism to learn the personalized preference of each user [31]. However, users' mobility patterns in space and time are not fully explored from their trajectory sequences. Specifically, these methods focus mainly on exploiting the sequential patterns of users' personalized preferences without considering the associated timestamps and geographic information that reflect collective mobility patterns. Therefore, how to effectively exploit trajectory data to learn users' personalized preferences is also a main focus of our work.

Fig. 2: The overall framework of our model.

To this end, we propose a novel personalized and group preference guided network (PG2Net) to tackle the above issues. The framework is devised to predict each user's next place to visit by considering her preferences for various locations at both the individual and collective levels. Fig. 1 illustrates the end-to-end procedure of our learning-based model. PG2Net first learns users' personalized features, as well as the group characteristics, from the training data, and integrates the urban information into the prediction module. In the testing phase, the well-trained model is used to predict the next location given users' current trajectories. Specifically, we consider several key factors that impact collective mobility behavior, including the geographic environment, distance attenuation, and individual spatial activity characteristics. Among them, the geographic environment determines the spatial distribution of potential visit locations that users can reach; the distance attenuation indicates the relationship between users' visit frequency and distance; and the individual spatial activity characteristics reflect people's latent life habits. Considering the above factors, we design a module that combines prior statistical information with recurrent neural networks, namely the dynamic spatio-temporal dependency module, which can learn the influence of group preferences on users' mobility patterns. To learn the personalized preferences of each user, we propose a module based on bidirectional long short-term memory networks (Bi-LSTM) and an attention mechanism to capture her dynamic preference. Besides, we propose a novel graph embedding method to represent each location and its category, which can efficiently learn the sequential relations between visited places. It should be mentioned that most previous studies did not take category information into account. In our model, we attempt to use category information to construct a statistical model to study the characteristics of user group activities. Finally, we propose a novel auxiliary loss function to learn the vectorial representation of the target location and improve prediction accuracy. The network structure is shown in Fig. 2.


Our contributions are summarized as follows:

  1. We propose the PG2Net framework to learn users' personalized and group preferences and predict the next place to visit. The framework consists of two parts: (i) a dynamic spatio-temporal dependency module that leverages temporal and spatial information to model the group preferences of users; (ii) a personalized preference module that uses Bi-LSTM and an attention mechanism to capture users' personalized preferences.

  2. To seek a more efficient hypothesis space for location embedding, we devise an auxiliary loss to enhance the similarity between the representations of the predicted and actual next locations. Consequently, the final loss function is designed to increase the probability of the actual location while decreasing the distance between the predicted and actual locations in the hypothesis space.

  3. We design a group preference module in PG2Net to model people's collective preferences for various locations over long-term and short-term periods. PG2Net captures the long-term and short-term preferences by integrating prior statistical knowledge of people's mobility behavior with the encodings of the historical and recent trajectories, respectively.

  4. We conduct extensive experiments on three real-world datasets in different countries, including two public check-in datasets and a mobile phone dataset, a.k.a. call detail records (CDRs). Experiments show that our model achieves significant improvements over the state-of-the-art methods.

2 Related Work

The purpose of the next location prediction task is to recommend a set of ranked locations for each user, where the highest-ranked location is taken as the predicted next location. At present, there are two types of methods for this task: conventional (non-deep learning) machine learning-based methods and deep learning-based methods.


2.1 Conventional machine learning-based methods

The trajectory prediction task needs to mine users' past trajectory information to predict where they will go. Typically, for this sequence prediction task, methods based on the Markov model and its variants can be used. Markov-based methods mainly compute a location transition matrix; according to this matrix, we can predict where the user will go next. For example, Rendle et al. [25] proposed factorized personalized Markov chains (FPMC), which combine personalized Markov chains with matrix factorization to learn users' transition matrices and overall preferences. Cheng et al. [6] proposed FPMC-LR to obtain users' personalized preferences and realize next location prediction. Apart from Markov-based models, Alhasoun et al. utilized information from similar strangers for next place prediction [2]. They proposed several human mobility similarity metrics to identify other users with similar mobility characteristics, and a dynamic Bayesian network (DBN) model that incorporates the mobility patterns of similar strangers to better predict next locations.

2.2 Deep learning-based methods

In recent years, deep learning has developed rapidly. In particular, RNN-based methods have attracted increasing attention and have been successfully applied to many sequential problems, such as natural language processing, speech recognition, image annotation, and machine translation. Researchers have also applied RNN-based methods to travel information prediction and achieved inspiring results [21, 9, 18, 28, 39]. For example, Liu et al. [21] proposed the ST-RNN network, which takes users' adjacent temporal and spatial information as the input of the RNN module to capture spatio-temporal influence. Yao et al. [36] established a semantics-enriched recurrent model (SERM), which learns the embeddings of multiple factors (user, location, time) and captures semantics-aware temporal and spatial transition regularities.

Recently, the attention mechanism has been widely used in various fields. Feng et al. [9] proposed a model named DeepMove based on an attention mechanism and a recurrent neural network to capture human mobility. In DeepMove, a multi-module embedding method is adopted to convert sparse features (user, location, time) into dense representations, and a historical attention module is then used to obtain the most relevant historical trajectory information. However, this method fails to capture users' dynamic personalized preferences and barely considers the temporal and spatial dependencies of actual users. Gao et al. [10] proposed a variational-attention-based next location prediction model to overcome the sparsity problem of trajectory data. Wu et al. [31] proposed a model named PLSPL to learn the specific preference of each user. This method attempts to model category information to predict the next location, but it does not consider the specific regularities of human mobility or sequence interaction influences. One of the latest models for next location prediction is LSTPM [28], which uses a context-aware non-local network and a geo-dilated RNN to obtain users' long-term and short-term preferences, respectively.

Fig. 3: Personalized Preference Modeling.

3 Problem Formulation

We define $\mathcal{U} = \{u_1, u_2, \ldots, u_M\}$ as a set of LBSN users and $\mathcal{L} = \{l_1, l_2, \ldots, l_N\}$ as a set of locations, each of which is geocoded by a (longitude, latitude) tuple representing urban information. Each user generates a large number of trajectories. Taking user $u$ as an example, $\mathcal{T}_u$ represents all trajectory data of $u$ over a period. Because each visited location corresponds to a category and a timestamp, the trajectory can be expressed as $\mathcal{T}_u = \{q_1, q_2, \ldots\}$, where each record $q_i$ contains three attributes $(l_i, c_i, t_i)$. We then divide each user's trajectory into multiple sub-trajectories in chronological order; for example, $\mathcal{T}_u = \{S_1, \ldots, S_m, S_c\}$ is the sub-trajectory sequence of $u$, where $\{S_1, \ldots, S_m\}$ is the historical trajectory sequence and $S_c$ is the recent trajectory. $S_c$ can be expressed as $\{q_1, \ldots, q_k\}$, where each $q_j$ has three attributes $(l_j, c_j, t_j)$, and $q_k$ represents the recent situation, i.e., $l_k$ is the location that has been visited most recently. The purpose of this paper is to study the influence of the historical trajectories $\{S_1, \ldots, S_m\}$ and the recent trajectory $S_c$ on the current situation $q_k$, and to predict the top-N most probable locations of $u$ at the next timestamp $t_{k+1}$.

4 Our model

In this section, we present our PG2Net model. We begin with an overview before zooming into the details.

4.1 Overview

The overall framework of PG2Net is depicted in Fig. 2. PG2Net characterizes the user's preferences for various places at both the personalized and group levels and fuses them into a unified framework. Specifically, we learn the personalized preference of user $u$ from her historical trajectory, which contains longer trajectory data reflecting the general preference of $u$. Since different users have different preferences for the same location, we use user-location attention to learn the latent vectors of the user and locations. First, we learn the latent vectors of user $u$ and each POI (which contains a location, category, and timestamp) in the multi-modal trajectory embedding module. Then we use Bi-LSTM to learn the transition relations of the historical trajectory and compute the importance of each POI to the user. Finally, we integrate the sequence information of POIs to represent the user's personalized preference.

As a human's movement decisions are impacted by her periodic lifestyle and recent travel behavior, we use the historical trajectory to capture her lifestyle and the recent trajectory to capture her instant decision. Specifically, we first learn the latent vectors of each POI in the embedding layer, where a POI contains the location, category, and timestamp. To better understand the user's check-in behavior, we feed the concatenated embeddings into a Long Short-Term Memory (LSTM) network. Then we use a statistical spatio-temporal module to model the user's historical and recent trajectories and learn her group preferences. Finally, a concatenation layer combines the outputs of the personalized preference and the long- and short-term group preferences, and feeds them into the output layer to generate the final probabilities of candidate locations. Notably, we propose a novel graph embedding method to learn the latent vectors of locations and categories. In addition, in the output layer, we propose an auxiliary loss function to supervise the vectorial representation of the next location and improve prediction accuracy.

4.2 Multi-modal Trajectory Embedding Module

A trajectory sequence usually contains a large amount of human mobility information. Due to limitations of mobile devices and the users themselves, trajectory sequences are highly sparse. Targeting this weakness, we use sequence embedding for this kind of data. For example, a check-in sequence contains four different types of attributes: user ID, timestamp, location, and location category. We adopt different embedding methods for these different types of attributes.

user ID and timestamp. The original user ID and timestamp cannot be directly input into the model. We refer to the embedding method mentioned in [9, 4] for these two attributes. As each timestamp is continuous and thus difficult to embed, we map it into discrete hours. First, we divide one week into 48 slots, where slots 0-23 represent weekdays and slots 24-47 represent weekends. Each hour is then represented as a one-hot 48-dimensional vector, where the non-zero entry denotes the index of the hour. Because one-hot encodings cannot reflect the correlations between sequences, we transform them into $d_t$-dimensional dense vectors, represented as $e^t$. For the user ID, we utilize the same embedding method to map it into a dense vector of dimension $d_u$, represented as $e^u$.
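The 48-slot scheme described above can be sketched as follows; this is a minimal illustration, and the function names `time_slot` and `one_hot` are ours, not from the paper:

```python
from datetime import datetime

def time_slot(ts: datetime) -> int:
    """Map a timestamp to one of 48 slots: 0-23 for weekday hours, 24-47 for weekend hours."""
    is_weekend = ts.weekday() >= 5  # Saturday = 5, Sunday = 6
    return ts.hour + (24 if is_weekend else 0)

def one_hot(slot: int, dim: int = 48) -> list:
    """One-hot encode a slot index before it is projected to a dense vector."""
    v = [0] * dim
    v[slot] = 1
    return v
```

In the full model, the one-hot vector would be multiplied by a learned embedding matrix to obtain the dense representation.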

location and location category. In recent years, graph embedding (also known as network embedding) has been applied to many graph-related research areas, such as text classification [22] and anomaly detection in financial or social networks [17, 8]. The task of our paper is to predict the user's next location, and all potential locations that the user could reach can be organized as a graph. Therefore, we adopt a graph embedding method to learn the location representations. First, we use the training dataset to construct a directed weighted graph, where the direction follows the visit order and the weight is the frequency with which the two locations are visited consecutively. Then we use the graph embedding method node2vec [12] to map each location into a low-dimensional vector of dimension $d_l$, represented as $e^l$. We adopt the same embedding method for the location category sequence; the resulting embedding vector is represented as $e^c$, with dimension $d_c$. Through this method, we can capture the characteristics of group mobility patterns and location interactions. During subsequent network training, the location and location category embeddings are frozen.
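As an illustration of the graph-construction step, the sketch below counts consecutive-visit transitions to obtain the directed edge weights; feeding the resulting graph to node2vec is omitted here, and the helper name `build_transition_graph` is ours:

```python
from collections import defaultdict

def build_transition_graph(trajectories):
    """Build a directed weighted graph from visit sequences: the weight of edge
    (a, b) is the number of times location b was visited immediately after a."""
    weights = defaultdict(int)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            weights[(a, b)] += 1
    return dict(weights)
```

The resulting edge list could then be passed to a node2vec implementation to obtain the low-dimensional location vectors.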

The embedding of each POI (which contains the location, location category, and timestamp) can be represented as:

$$e^{q}_i = e^{l}_i \oplus e^{c}_i \oplus e^{t}_i,$$

where $\oplus$ denotes concatenation and $e^{q}_i$ represents the latent vector of each POI. Different from [9, 28], which only learn the latent vector of the location, we further consider context information such as the category of the location and the check-in time.

4.3 Personalized Preference Modeling

When modeling personalized preferences, an intuitive idea is to learn each user's location preference. Motivated by this, we propose a user-location attention structure on top of a Bi-LSTM to learn the latent representation of the target user's personalized preference. As shown in Fig. 3, for a user $u$, we first embed all POIs in each historical trajectory, together with the user embedding, into low-dimensional vectors. Then a Bi-LSTM layer is used to learn each POI's high-level representation and sequential dependency. Finally, we compute the importance of each POI to the user and integrate the sequence information of POIs to represent the user's personalized preference.

To capture the user's high-level representations and the sequential dependencies of different locations, it is beneficial to learn from future as well as past context. Different from [9], which uses the RNN variant LSTM to process sequences in temporal order and thus ignores future context, the Bi-LSTM network can exploit information from both the past and the future. This is important for learning users' personalized preferences.

The calculation of the user's personalized preference can be summarized as follows:

$$h_i = \overrightarrow{h}_i \oplus \overleftarrow{h}_i, \qquad \alpha_i = \mathrm{softmax}_i\big(f(e^{u}, h_i)\big), \qquad s^{per} = \sum_i \alpha_i h_i,$$

where $e^{u}$ represents the user's latent vector, $\oplus$ denotes the concatenation of the forward and backward outputs, $h_i$ represents the hidden information of the user's historical trajectory, $\alpha_i$ denotes the importance of each POI (with $f$ the attention scoring function), and $s^{per}$ is the final representation of the personalized preference of user $u$.
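A minimal sketch of the attention-pooling step follows, assuming a simple dot-product score between the user vector and each Bi-LSTM hidden state; the paper's exact scoring function is not reproduced in this excerpt, so this form is an assumption:

```python
import math

def attention_pool(user_vec, hidden_states):
    """Dot-product attention: score each POI's hidden state against the user
    vector, softmax the scores, and return the weights and the weighted sum
    (the personalized preference vector)."""
    scores = [sum(u * h for u, h in zip(user_vec, hs)) for hs in hidden_states]
    m = max(scores)                       # subtract the max for numerical stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    alphas = [e / z for e in exp]
    dim = len(hidden_states[0])
    pooled = [sum(a * hs[d] for a, hs in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, pooled
```

In the actual network the same computation would be batched over tensors rather than Python lists.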

Fig. 4: Dynamic Spatio-Temporal Dependency Modeling.

4.4 Group Preference Modeling

When predicting the user's next location, her personalized preferences reflect her general preference. However, the user's interest in the next trajectory not only follows her personalized preference but is also affected by group behavior patterns. At the same time, the user's preference for the next location changes dynamically over time and space. Therefore, we use trajectory data to construct a statistical model named the dynamic spatio-temporal dependency module, which uses temporal and spatial information to model the user's group preferences and realize next location prediction. See Fig. 4 for an illustration of the proposed module. The dynamic spatio-temporal dependency module comprises three parts: the dynamic spatial dependency module, the dynamic time dependency module, and the dynamic activity preference module.

Fig. 5: Statistical analysis of distance factors on users' spatial preference, showing the distribution of distances between adjacent locations in the three datasets.

Dynamic spatial dependency module. Generally, the distance between geographic locations has a great impact on next location prediction [28]. The statistical analysis of the distances between adjacent locations is shown in Fig. 5. We can observe that users' travel patterns closely follow the distance attenuation rule of group mobility: the higher the cost of long-distance travel, the lower the probability that people choose it. That is to say, users tend to visit nearby locations. Based on this observation, we propose a dynamic spatial dependency module to characterize the changing spatial preferences of users as they move. The module captures the user's dynamic interest in geographic locations, rather than assuming her preference is fixed over time [37]. When modeling users' distance preferences, the key issue is to select, from the historical trajectories, the trajectory that has the greatest impact on the recent situation based on the distances among locations. Specifically, we first generate a geo-distance matrix based on real-world geographic locations and the historical trajectory data, whose values represent the distance between any two locations. Then we generate the weight vector between the recent situation and the historical trajectory based on the distance matrix as follows,
$$w^{s}_{c,k} = \frac{\exp\big(-d(l_c, l_k)\big)}{\sum_{k'} \exp\big(-d(l_c, l_{k'})\big)},$$

where $d(l_c, l_k)$ is the distance between locations $l_c$ and $l_k$. Finally, we utilize the generated weight vector to integrate the sequence information of POIs.
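To make the distance weighting concrete, the sketch below computes great-circle distances and normalizes them with a softmax over negative distances; the exponential-decay form and the helper names are our assumptions for illustration:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def distance_weights(current, history):
    """Softmax over negative distances: nearby historical locations get higher weight."""
    d = [haversine_km(current, h) for h in history]
    exp = [math.exp(-x) for x in d]
    z = sum(exp)
    return [e / z for e in exp]
```
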

Specifically, for the recent situation of a user, we first learn its latent embedding vector before modeling the spatial preference. Considering that the historical and recent trajectories have different influences on the current situation of users, we utilize geo-distance to model the long- and short-term spatial group preferences,
$$s^{long}_{spa} = \sum_{k} w^{s}_{c,k}\, h^{his}_k, \qquad s^{short}_{spa} = \sum_{k} w^{s}_{c,k}\, h^{rec}_k,$$

where $h^{his}_k$ and $h^{rec}_k$ are the outputs of the Bi-LSTM and LSTM respectively, representing the historical and recent trajectory information, and $s^{long}_{spa}$ and $s^{short}_{spa}$ represent the long- and short-term spatial group preferences of a user.

Fig. 6: The visualization of the time-correlation matrix.

Dynamic time dependency module. Traditional methods often consider the influence of time factors on next location prediction [9, 31]. However, these methods simply learn the semantic relationships of timestamp sequences and ignore the interactions between time slots. For example, many users are accustomed to eating in the cafeteria at 12:00 and 18:00, and drinking coffee in a coffee shop at 15:00 and 20:00. For such users, the location at 12:00 is more related to that at 18:00 than to that at 15:00, because both 12:00 and 18:00 are mealtimes. This reflects group regularities in human movement. Moreover, as the user moves, her location preferences at different timestamps also change dynamically. Therefore, we propose a dynamic time dependency module to capture the influence of the user's historical trajectory information on the recent state in the time dimension.

We first divide one week into 48 slots, where slots 0-23 represent weekdays and slots 24-47 represent weekends. We construct a location set $S_i$ to represent the location preference of each slot; for example, $S_i$ contains all locations visited in the $i$-th slot. Then we calculate the time-correlation matrix. As shown in Fig. 6, the time correlation of any two slots is expressed as follows,
$$T_{i,j} = \frac{|S_i \cap S_j|}{|S_i \cup S_j|},$$
Finally, we generate the weight vector between the recent state and the historical trajectory based on the time-correlation matrix, and utilize the weight vector to integrate the sequence information of POIs,
$$w^{t}_{c,k} = \frac{T_{t_c, t_k}}{\sum_{k'} T_{t_c, t_{k'}}},$$

where $T_{t_c, t_k}$ is the time correlation between the $c$-th and $k$-th time slots, and $w^{t}_{c,k}$ is the weight vector between the recent state and the historical trajectory. Similar to the modeling of users' spatial preferences, we utilize time information to model the long- and short-term time group preferences of the user,
$$s^{long}_{tem} = \sum_{k} w^{t}_{c,k}\, h^{his}_k, \qquad s^{short}_{tem} = \sum_{k} w^{t}_{c,k}\, h^{rec}_k,$$

where $h^{his}_k$ and $h^{rec}_k$ are the hidden states of the historical and recent trajectories, and $s^{long}_{tem}$ and $s^{short}_{tem}$ represent the long- and short-term time group preferences of the user.
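One plausible instantiation of the time-correlation matrix is the Jaccard similarity between the location sets of two slots, sketched below; this is an assumption for illustration, since the paper's exact formula is not shown in this excerpt:

```python
def time_correlation(slot_locations):
    """slot_locations: dict mapping slot -> set of locations visited in that slot.
    Returns a dict (i, j) -> Jaccard similarity of the two slots' location sets."""
    slots = sorted(slot_locations)
    corr = {}
    for i in slots:
        for j in slots:
            si, sj = slot_locations[i], slot_locations[j]
            union = si | sj
            corr[(i, j)] = len(si & sj) / len(union) if union else 0.0
    return corr
```

Two slots that share many visited locations (e.g. the 12:00 and 18:00 mealtime slots) thus receive a high correlation, matching the intuition in the text.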

Fig. 7: The statistical analysis of location categories in the NYC dataset.

Dynamic activity preference module. Users usually tend to have different activities at different times. Fig. 7(a) and (b) show that users tend to work and commute on workdays, while (c) and (d) show that users are more inclined to relax on weekends. Users also have different activity preferences at different times of the same day. Based on this, we propose a framework that can characterize the user's activity preference at different times. It is worth mentioning that, to the best of our knowledge, this is the first time a graphical representation has been used to define this problem. We construct a bipartite graph in which location categories and time slots are the two types of end nodes, and the correlations between them are the edges. See Fig. 8 for details.

Fig. 8: Using the bipartite graph to construct the activity preference matrix.

We define a bipartite graph as $G = (V, E)$, where $V = V_c \cup V_t$ contains two types of vertices: location categories $V_c$ and time slots $V_t$. $E$ is the collection of edges connecting the two types of vertices, where the weight of the edge between category $c$ and time slot $t$ is $W_{c,t}$, representing the weight of activity $c$ at time $t$. The task is to learn the user's activity preference at each timestamp. However, it would be expensive to iterate directly over the bipartite graph. To alleviate this problem, we generate a candidate list of all activities at each moment and obtain the correlation between each timestamp and all location categories. Then we generate the weight vector between the recent state and the historical trajectory based on the bipartite graph,
$$w^{a}_{c,k} = \frac{W_{c_k, t_c}}{\sum_{k'} W_{c_{k'}, t_c}},$$

where $W_{c_k, t_c}$ is the weight of activity $c_k$ at timestamp $t_c$ in the activity preference bipartite graph, and $w^{a}_{c,k}$ is the weight vector between the recent state and the historical trajectory.

Finally, we utilize the activity preference bipartite graph to model the long and short-term activity group preference of the user,
$$s^{long}_{act} = \sum_{k} w^{a}_{c,k}\, h^{his}_k, \qquad s^{short}_{act} = \sum_{k} w^{a}_{c,k}\, h^{rec}_k,$$

where $s^{long}_{act}$ and $s^{short}_{act}$ represent the long- and short-term activity group preferences of a user, and $h^{his}_k$, $h^{rec}_k$ are the hidden states of the historical and recent trajectories.
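The bipartite edge weights can be estimated from check-in counts, as in this hedged sketch; the name `activity_preference` and the per-slot normalization are our choices, not the paper's:

```python
from collections import Counter, defaultdict

def activity_preference(checkins):
    """checkins: iterable of (slot, category) pairs. Returns a dict
    slot -> {category: probability}, i.e. normalized bipartite edge
    weights between time slots and activity categories."""
    counts = defaultdict(Counter)
    for slot, cat in checkins:
        counts[slot][cat] += 1
    prefs = {}
    for slot, ctr in counts.items():
        total = sum(ctr.values())
        prefs[slot] = {cat: n / total for cat, n in ctr.items()}
    return prefs
```
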

Using the dynamic spatio-temporal dependency module, we can capture the influence of spatio-temporal factors on users’ long and short-term group preferences, which are represented as follows respectively,
$$s^{long} = s^{long}_{spa} \oplus s^{long}_{tem} \oplus s^{long}_{act}, \qquad s^{short} = s^{short}_{spa} \oplus s^{short}_{tem} \oplus s^{short}_{act},$$

where $\oplus$ denotes concatenation.
4.5 Multi-supervised Prediction Module

After obtaining the representations of the personalized preference and the long- and short-term group preferences, we make use of the softmax function to compute the probability distribution $p$ over the next location as follows:

$$p = \mathrm{softmax}\big(W_p \, (s^{per} \oplus s^{long} \oplus s^{short})\big),$$

where $s^{per} \oplus s^{long} \oplus s^{short}$ represents the concatenation of the personalized preference with the long- and short-term group preferences, and $W_p$ is a trainable matrix. Consequently, the index with the largest probability is taken as the predicted next location. When training the model, we use the negative log-likelihood as the loss function. However, in sequence models, the output hidden state can more effectively represent the user's potential interest [40]. Therefore, to improve prediction accuracy, we propose an auxiliary loss function to supervise the hidden state of the user's target location. Our proposed loss function, shown in Fig. 9, is defined as follows:
$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \log p_n\big(l^{*}_n\big) + \lambda \, \frac{1}{N} \sum_{n=1}^{N} \big\lVert h_n - e^{l^{*}_n} \big\rVert_2^2,$$

where $N$ represents the number of training samples and $\lambda$ is a hyperparameter used to balance the weights of the prediction and auxiliary loss terms. We choose the L2 loss as our auxiliary loss function. With the help of the auxiliary loss, the generated hidden vector can better express the user's interest and increase the prediction accuracy of the network.
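A minimal sketch of the multi-supervised objective for a single sample, combining the negative log-likelihood with a λ-weighted L2 term; the exact batch reduction and function name are our assumptions:

```python
import math

def multi_supervised_loss(probs, target_idx, hidden, target_emb, lam=0.1):
    """Negative log-likelihood of the true next location plus a lambda-weighted
    squared-L2 auxiliary term pulling the hidden state toward the target
    location's embedding."""
    nll = -math.log(probs[target_idx])
    l2 = sum((h - e) ** 2 for h, e in zip(hidden, target_emb))
    return nll + lam * l2
```

When the hidden state already matches the target embedding, the auxiliary term vanishes and only the prediction loss remains.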

Fig. 9: Multi-supervised loss function

5 Experiments

In this section, we evaluate the PG2Net model on three real-world datasets (two check-in datasets and one CDR dataset). We compare our proposed approach with state-of-the-art next location prediction models and discuss the experimental results.

5.1 Datasets

We evaluate our model on the publicly available Foursquare check-in data collected from New York City (NYC) and Tokyo (TKY), which are widely used in related studies. In addition, we leverage CDR data collected from Shanghai. The check-in data contain the anonymized user ID, location ID and its coordinates, location category, and timestamp, while the CDR data contain the anonymized user ID, the base station ID and its coordinates, and the timestamp. Thus, we remove the embedding of category information when testing our model on the CDR data; for the same reason, our model cannot be compared with the PLSPL method on this dataset. The check-in datasets contain about 10 months of records in NYC and TKY, collected via Foursquare from 12 April 2012 to 16 February 2013. Note that temporary visitors are removed from the check-in datasets by eliminating users who were present for less than two weeks. The CDRs were collected during March 2014 from 1,000 anonymized users. Table I presents the details of the three datasets. For the sparse check-in data, we first filter out users with fewer than 10 records. Then, we split the trajectory of each user into multiple sub-trajectories at an interval of three days, and merge two consecutive records if their time interval is less than 10 minutes. Next, we limit the number of records per sub-trajectory to between 5 and 10: sub-trajectories with fewer than 5 records are filtered out, and sub-trajectories with more than 10 records are further divided into multiple trajectories. Finally, we use 80% of each user's trajectories as the training set and the rest as the testing set.
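The splitting and merging steps above can be sketched as follows; as a simplification, we treat "merging" two records closer than 10 minutes as keeping only the earlier one, and the function name is ours:

```python
def split_subtrajectories(records, gap_hours=72, merge_minutes=10):
    """records: time-sorted list of (timestamp_seconds, location).
    Collapse consecutive records closer than `merge_minutes`, then start a new
    sub-trajectory whenever the gap to the previous record exceeds `gap_hours`."""
    merged = []
    for ts, loc in records:
        if merged and ts - merged[-1][0] < merge_minutes * 60:
            continue  # collapse near-duplicate check-ins
        merged.append((ts, loc))
    subs, cur = [], []
    for ts, loc in merged:
        if cur and ts - cur[-1][0] > gap_hours * 3600:
            subs.append(cur)
            cur = []
        cur.append((ts, loc))
    if cur:
        subs.append(cur)
    return subs
```

The length filtering (5 to 10 records per sub-trajectory) would then be applied to the returned list.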

City      # users  # locations  Timespan
New York  1,083    227,420      10 months
Tokyo     2,293    573,703      10 months
Shanghai  1,000    44,476       1 month
TABLE I: Statistics of the evaluation datasets.

More information about the three datasets is shown in Fig. 10, where panel (a) presents the distribution of the number of trajectories per user and panel (b) presents the proportion of trajectories in each hour. Panel (a) shows that the NYC and TKY datasets are sparser than the CDRs dataset, and panel (b) shows that the distributions of the TKY and CDRs datasets behave similarly, reflecting similar living habits of the inhabitants of Shanghai and Tokyo.

Fig. 10: Statistical distributions of the number of trajectories per user (a) and per hour (b) in the check-in and CDRs data.

5.2 Metrics

To compare our model with the baselines, we utilize two evaluation metrics: Recall@K and normalized discounted cumulative gain (NDCG@K). Recall@K measures whether a correct location appears among the top-K recommended locations, while NDCG@K measures the quality of the top-K recommended list. In this paper, we choose K = {1, 5, 10} for a comprehensive evaluation. Recall@K and NDCG@K are defined as follows,

Recall@K = |R_u^K ∩ T_u| / |T_u|,

NDCG@K = (1 / IDCG@K) Σ_{j=1}^{K} I(r_j ∈ T_u) / log2(j + 1),

where R_u^K denotes the top-K locations recommended for user u, T_u represents the list of locations visited in the test set, I(·) is an indicator function, r_j represents the j-th location recommended in R_u^K, and IDCG@K is the maximum attainable value of the discounted sum, which serves as a normalizing constant.
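For clarity, the two metrics can be sketched as below for the common single-ground-truth case in next location prediction, where IDCG@K reduces to 1 (our illustration; `ranked` is the model's ranked list of location IDs and `truth` the ground-truth next location):

```python
import numpy as np

def recall_at_k(ranked, truth, k):
    # 1 if the true location appears among the top-k recommendations
    return 1.0 if truth in ranked[:k] else 0.0

def ndcg_at_k(ranked, truth, k):
    # discounted gain of the true location's rank; IDCG = 1 for one relevant item
    for j, loc in enumerate(ranked[:k], start=1):
        if loc == truth:
            return 1.0 / np.log2(j + 1)
    return 0.0
```

Both are averaged over all test predictions to obtain the reported numbers.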

5.3 Baselines and Settings

To verify the effectiveness of our proposed method, we compare PG2Net with a classical method and several mainstream deep learning methods:

Markov Chain (MC): It is widely used to predict human trajectories. It builds a transition matrix from past trajectories to estimate the probability of each location being visited next. In this paper, we use the first-order MC method.

LSTM: A variant of the recurrent neural network that can efficiently process sequential data.

DeepMove [9]: A neural network model based on the attention mechanism that leverages each user's historical and recent trajectory data to learn her preferences. An attention mechanism is used to capture the correlation between long-term and short-term trajectory data.

PLSPL [31]: A neural network model that learns a specific preference for each user and is the first to incorporate category information into the network.

LSTPM [28]: The state-of-the-art model for next location prediction, which uses a context-aware non-local network structure and a geo-dilated RNN to capture users' long-term and short-term preferences respectively.
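Of these baselines, the first-order Markov chain can be sketched in a few lines (our illustration, not the paper's implementation): count transitions in the training trajectories and rank candidate next locations by transition frequency.

```python
from collections import defaultdict, Counter

def fit_markov(trajectories):
    # transition counts: counts[a][b] = number of observed moves a -> b
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, current, k=5):
    # top-k candidate next locations, ranked by transition frequency
    return [loc for loc, _ in counts[current].most_common(k)]
```

Normalizing each row of the counts yields the transition probability matrix; for ranking, the raw counts suffice.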

For our method, we set the embedding dimensions of users, locations, and categories separately, and set the dimension of the hidden state to 500. We use the Adam optimizer to learn all the parameters, with the initial learning rate set to 0.0001 and the weight of L2 regularization set to 1e-5. During training, we adopt gradient clipping and adjust the learning rate to ensure that the model achieves its best performance. We take the TKY dataset as an example to show the training process of the proposed model; see details in Fig. 11. For the baseline models, we set their parameters to the default values reported in the original papers.
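These settings can be illustrated with a single optimizer step. Below is a minimal NumPy sketch of Adam with the stated learning rate and L2 weight, not the training code; the clipping threshold `clip_norm=5.0` is a hypothetical value, as the paper does not report it.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999,
              eps=1e-8, weight_decay=1e-5, clip_norm=5.0):
    """One Adam update with gradient clipping and L2 regularization.
    m, v are the first/second moment accumulators; t is the step index (>= 1)."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:                    # gradient clipping by global norm
        grad = grad * (clip_norm / norm)
    grad = grad + weight_decay * param      # L2 regularization (weight 1e-5)
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

In practice a framework optimizer (e.g., `torch.optim.Adam` with `weight_decay=1e-5` plus `clip_grad_norm_`) implements the same update.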

Fig. 11: Loss in train and test process of TKY dataset.

5.4 Result and Analysis

Datasets  Methods        Rec@1   Rec@5   Rec@10  NDCG@1  NDCG@5  NDCG@10
NYC       Markov         0.1356  0.2732  0.3441  0.1356  0.2078  0.2306
NYC       LSTM           0.1545  0.3300  0.3860  0.1545  0.2482  0.2665
NYC       DeepMove [9]   0.1828  0.3978  0.4678  0.1828  0.2967  0.3195
NYC       PLSPL [31]     0.1820  0.3947  0.4753  0.1820  0.2949  0.3210
NYC       LSTPM [28]     0.1864  0.4302  0.5230  0.1864  0.3143  0.3446
NYC       PG2Net         0.2120  0.4585  0.5326  0.2120  0.3437  0.3679
TKY       Markov         0.1286  0.2500  0.3037  0.1286  0.1929  0.2102
TKY       LSTM           0.1440  0.3051  0.3650  0.1440  0.2293  0.2488
TKY       DeepMove [9]   0.1658  0.3609  0.4526  0.1658  0.2733  0.3034
TKY       PLSPL [31]     0.1631  0.3516  0.4294  0.1631  0.2615  0.2867
TKY       LSTPM [28]     0.1773  0.4052  0.4917  0.1773  0.2977  0.3258
TKY       PG2Net         0.1994  0.4336  0.5105  0.1994  0.3240  0.3490
CDRs      Markov         0.2234  0.4715  0.5520  0.2234  0.3549  0.3811
CDRs      LSTM           0.2337  0.5516  0.6724  0.2337  0.3996  0.4390
CDRs      DeepMove [9]   0.2360  0.5724  0.6800  0.2360  0.4126  0.4479
CDRs      LSTPM [28]     0.2248  0.5742  0.7016  0.2248  0.4047  0.4462
CDRs      PG2Net         0.2346  0.5981  0.7021  0.2346  0.4262  0.4604
TABLE II: Performance comparison with five baselines on three datasets. The best method is shown in bold.

The experimental results are reported in Table II, with the best results in each column highlighted in boldface. We make the following observations:

(1) The proposed PG2Net is compared with the baselines on the three datasets and achieves superior overall performance. All the metrics on the NYC and TKY datasets are better than those of the baseline methods, and on the CDRs dataset four of the six metrics are better than all comparison methods. Concretely, for Rec@K on the NYC dataset, our method is 7.64%–18.85% higher than Markov, 5.75%–14.66% higher than LSTM, 2.92%–6.48% higher than DeepMove, and 0.96%–2.83% higher than LSTPM. For NDCG@10, our model outperforms Markov, LSTM, DeepMove, and LSTPM by 13.73%, 10.14%, 4.84%, and 2.33% respectively. Our model also performs better than the other baselines under all metrics on the TKY dataset. On the CDRs dataset, DeepMove performs best on Rec@1 and NDCG@1, followed by our model, while our model is the best under the remaining metrics. This quantitative evaluation demonstrates the effectiveness of our method.

(2) Among all the methods, the Markov model performs worst on the three datasets, which shows that neural network-based methods have great advantages over this traditional method.

(3) PLSPL performs better than LSTM on all metrics on the NYC and TKY datasets, because PLSPL incorporates context information such as category to learn the specific preference of each user. However, it performs slightly worse than DeepMove on these datasets, which can be explained by the fact that PLSPL does not gather useful information from the historical trajectory based on the current state.

(4) Among the baseline methods, LSTPM performs best on most metrics, followed by DeepMove. Unlike DeepMove, both our model and LSTPM take spatio-temporal factors into consideration, which illustrates the importance of considering temporal and spatial factors when predicting a user's next location.

(5) Although the deep learning methods perform well on the check-in datasets, they do not improve Rec@1 much over the Markov method on the CDRs dataset. We argue that this is probably related to the sparsity of the trajectory data: the check-in datasets are much sparser than the CDRs. On sparse data, deep learning methods can capture high-level semantic information, whereas traditional methods are limited in this respect.

5.5 Comparison of model variants

Datasets  Methods     Rec@1   Rec@5   Rec@10  NDCG@1  NDCG@5  NDCG@10
NYC       PG2Net      0.2120  0.4585  0.5326  0.2120  0.3437  0.3679
NYC       GNet        0.1795  0.4304  0.5176  0.1795  0.3116  0.3400
NYC       PNet        0.1851  0.4107  0.4932  0.1851  0.3098  0.3364
NYC       L-PG2Net    0.1918  0.4245  0.5003  0.1918  0.3159  0.3407
NYC       S-PG2Net    0.2019  0.4462  0.5227  0.2019  0.3299  0.3550
TKY       PG2Net      0.1994  0.4336  0.5105  0.1994  0.3240  0.3490
TKY       GNet        0.1645  0.4103  0.4852  0.1645  0.3065  0.3223
TKY       PNet        0.1782  0.3951  0.4762  0.1782  0.3027  0.3203
TKY       L-PG2Net    0.1848  0.3997  0.4801  0.1848  0.3099  0.3247
TKY       S-PG2Net    0.1890  0.4224  0.4981  0.1890  0.3134  0.3379
CDRs      PG2Net      0.2346  0.5981  0.7021  0.2346  0.4262  0.4604
CDRs      GNet        0.2302  0.5956  0.6995  0.2302  0.4232  0.4573
CDRs      PNet        0.2320  0.5926  0.6954  0.2320  0.4207  0.4545
CDRs      L-PG2Net    0.2325  0.5936  0.6975  0.2325  0.4214  0.4595
CDRs      S-PG2Net    0.2337  0.5970  0.6993  0.2337  0.4229  0.4565
TABLE III: Performance comparison of PG2Net and its variants on three datasets.

In this section, we analyze four variants of PG2Net to further evaluate the effectiveness of our model. The four variants are shown as follows.

GNet: a variant that retains only users' group preferences, including both long-term and short-term group preferences.

PNet: a variant that retains only users' personalized preferences, removing the dynamic spatio-temporal dependency module.

L-PG2Net: a variant that combines users' personalized preferences with long-term group preferences.

S-PG2Net: a variant that combines users' personalized preferences with short-term group preferences.

Fig. 12: Weight proportions of the personalized and group preference components on the three datasets.

The experimental results of the ablation tests are shown in Table III. Our PG2Net outperforms all four variants. Specifically:

(1) PNet performs better than GNet on Rec@1 and NDCG@1, while GNet performs better on the other metrics. The main reason is that GNet learns the group behavior pattern of all users; this pattern is general and supports only a coarse prediction of the next location, so GNet performs better on Rec@5, Rec@10, NDCG@5, and NDCG@10 but worse on Rec@1 and NDCG@1. PNet, in contrast, learns the precise, personalized preferences of each user, so it performs better on Rec@1 and NDCG@1. This also shows that PNet and GNet achieve accurate and coarse predictions of user trajectories, respectively. To further demonstrate the effectiveness of personalized and group preferences, we compare their weight proportions when predicting the next location. As shown in Fig. 12, the user's personalized preferences have a greater influence on trajectory prediction than the group preferences.

(2) S-PG2Net always performs better than L-PG2Net, mainly because S-PG2Net can better capture a user's group preferences based on her recent state. This also shows that the recent trajectory has a greater impact on the current situation.

(3) The proposed PG2Net, which combines PNet and GNet, achieves the best performance on all test datasets, showing that both personalized and group preferences have a positive impact on a user's choice of the next location.

5.6 Importance of key components in PG2Net

To better understand the influence of the node2vec embedding and the auxiliary loss function on network training, we evaluate each module on the NYC dataset. As shown in Fig. 13, PG2Net is our complete model; PG2Net-Node2vec denotes the model without graph embedding (node2vec) of locations and location categories; and PG2Net-Auxiliary Loss denotes the model whose prediction module ignores the hidden state of the target location. Fig. 13 shows that the complete model performs best, while both variants show decreased performance. PG2Net-Auxiliary Loss drops the most, with a decrease of 2.68% in Rec@5, indicating that the hidden vector of the target location strongly affects the prediction accuracy of the next location. PG2Net-Node2vec follows, which shows that graph-embedding training on locations and location categories improves model performance.

Fig. 13: Analysis of the impact of the node2vec embedding and the auxiliary loss.

5.7 Analysis of spatial distribution of predicted locations

Fig. 14: Distance distributions between the current location and the next predicted location.

To assess the ability of our model to predict next locations at different distances, we examine the distance distribution between the current location and the next predicted location on the three datasets. For each dataset, we compare the actual distances with those predicted by LSTM, DeepMove, and PG2Net. Fig. 14 shows that for short prediction distances, PG2Net performs similarly to LSTM and DeepMove, while for long-distance prediction, PG2Net outperforms both and can effectively predict distant locations. In addition, DeepMove consistently outperforms LSTM on the three datasets. The CDRs data illustrate this phenomenon especially clearly: for next places located within 20 km, the performance of LSTM and DeepMove is comparable to that of our model, but when the next places are farther away, e.g., over 40 km, their performance gradually deteriorates. For places over 80 km away, the distances predicted by LSTM deviate completely from the real distance distribution and its performance is the worst, followed by DeepMove. In this scenario, the distance distribution of the locations predicted by PG2Net matches the empirical data well. The main reason is that two distant locations may appear similar given only the recent state. LSTM and DeepMove learn only the sequential relations within a trajectory and fail to take into account the user's personalized characteristics and the temporal and spatial information that reflects the regularities of group behavior, which can make them unable to distinguish the two locations and lead to long-distance jumps in the predicted locations.
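To reproduce this distance analysis, the great-circle distance between the current and the predicted coordinates can be computed with the standard haversine formula (a generic formula, not specific to the paper):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

Applying this to each (current location, predicted location) pair and histogramming the results yields the distance distributions compared in Fig. 14.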

6 Conclusion and future work

In this paper, we propose a novel end-to-end deep neural network, PG2Net, to predict the next place to visit by considering users' preferences for various locations at both the individual and collective levels. In the personalized preference module, we use Bi-LSTM and the attention mechanism to capture each user's long-term mobility tendency. In the group preference module, we use spatio-temporal and categorical information of the visited places to represent users' long-term and short-term group preferences. In addition, we utilize a graph embedding method, node2vec, to capture the sequential relations of users' visited locations, and propose an auxiliary loss function to learn the vectorial representation of the target location. Extensive experimental results on three real-world datasets demonstrate the effectiveness of the proposed model. In future work, we will model more heterogeneous information and use graph neural networks to learn the interactions among them to further improve next-POI recommendation performance.


This work was jointly supported by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Science and Technology Commission of Shanghai Municipality Project (2051102600), and the National Key Research and Development Program of China (2020YFC2008701).


  • [1] L. Alessandretti, P. Sapiezynski, V. Sekara, S. Lehmann, and A. Baronchelli (2018) Evidence for a conserved quantity in human mobility. Nature Human Behaviour 2 (7), pp. 485–491.
  • [2] F. Alhasoun, M. Alhazzani, F. Aleissa, R. Alnasser, and M. González (2017) City scale next place prediction from sparse data through similar strangers. In Proceedings of ACM KDD Workshop, pp. 191–196.
  • [3] S. Chang, E. Pierson, P. W. Koh, J. Gerardin, B. Redbird, D. Grusky, and J. Leskovec (2020) Mobility network models of COVID-19 explain inequities and inform reopening. Nature, pp. 1–8.
  • [4] J. Chen, J. Li, M. Ahmed, J. Pang, M. Lu, and X. Sun (2020) Next location prediction with a graph convolutional network based on a seq2seq framework. KSII Transactions on Internet and Information Systems (TIIS) 14 (5), pp. 1909–1928.
  • [5] J. Chen, J. Li, and Y. Li (2020) Predicting human mobility via long short-term patterns. Computer Modeling in Engineering & Sciences 124 (3), pp. 847–864.
  • [6] C. Cheng, H. Yang, M. R. Lyu, and I. King (2013) Where you like to go next: successive point-of-interest recommendation. In Twenty-Third International Joint Conference on Artificial Intelligence.
  • [7] S. Çolak, A. Lima, and M. C. González (2016) Understanding congested travel in urban areas. Nature Communications 7 (1), pp. 1–8.
  • [8] K. Ding, J. Li, R. Bhanushali, and H. Liu (2019) Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 594–602.
  • [9] J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, and D. Jin (2018) DeepMove: predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 World Wide Web Conference, pp. 1459–1468.
  • [10] Q. Gao, F. Zhou, G. Trajcevski, K. Zhang, T. Zhong, and F. Zhang (2019) Predicting human mobility via variational attention. In The World Wide Web Conference, pp. 2750–2756.
  • [11] M. C. González, C. A. Hidalgo, and A. Barabási (2008) Understanding individual human mobility patterns. Nature 453 (7196), pp. 779–782.
  • [12] A. Grover and J. Leskovec (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
  • [13] Q. Guo, Z. Sun, J. Zhang, and Y. Theng (2020) An attentional recurrent neural network for personalized next location recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 83–90.
  • [14] Z. Huang, W. Xu, and K. Yu (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
  • [15] J. Jiang, S. Tao, D. Lian, Z. Huang, and E. Chen (2020) Predicting human mobility with self-attention and feature interaction. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, pp. 117–131.
  • [16] S. Jiang, Y. Yang, S. Gupta, D. Veneziano, S. Athavale, and M. C. González (2016) The TimeGeo modeling framework for urban mobility without travel surveys. Proceedings of the National Academy of Sciences 113 (37), pp. E5370–E5378.
  • [17] A. Khazane, J. Rider, M. Serpe, A. Gogoglou, K. Hines, C. B. Bruss, and R. Serpe (2019) DeepTrax: embedding graphs of financial transactions. In 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 126–133.
  • [18] W. Lan, Y. Xu, and B. Zhao (2019) Travel time estimation without road networks: an urban morphological layout representation approach. In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1772–1778.
  • [19] D. Lian, Y. Wu, Y. Ge, X. Xie, and E. Chen (2020) Geography-aware sequential location recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2009–2019.
  • [20] W. Liang and W. Zhang (2020) Learning social relations and spatiotemporal trajectories for next check-in inference. IEEE Transactions on Neural Networks and Learning Systems.
  • [21] Q. Liu, S. Wu, L. Wang, and T. Tan (2016) Predicting the next location: a recurrent model with spatial and temporal contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
  • [22] Z. Lu, P. Du, and J. Nie (2020) VGCN-BERT: augmenting BERT with graph embedding for text classification. In European Conference on Information Retrieval, pp. 369–382.
  • [23] J. Manotumruksa, C. Macdonald, and I. Ounis (2017) A deep recurrent collaborative filtering framework for venue recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1429–1438.
  • [24] L. E. Olmos, S. Çolak, S. Shafiei, M. Saberi, and M. C. González (2018) Macroscopic dynamics and the collapse of urban traffic. Proceedings of the National Academy of Sciences 115 (50), pp. 12654–12661.
  • [25] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme (2010) Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, pp. 811–820.
  • [26] C. M. Schneider, V. Belik, T. Couronné, Z. Smoreda, and M. C. González (2013) Unravelling daily human mobility motifs. Journal of The Royal Society Interface 10 (84), pp. 20130246.
  • [27] C. Song, Z. Qu, N. Blumm, and A. Barabási (2010) Limits of predictability in human mobility. Science 327 (5968), pp. 1018–1021.
  • [28] K. Sun, T. Qian, T. Chen, Y. Liang, Q. V. H. Nguyen, and H. Yin (2020) Where to go next: modeling long- and short-term user preferences for point-of-interest recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 214–221.
  • [29] M. Tizzoni, P. Bajardi, A. Decuyper, G. K. K. King, C. M. Schneider, V. Blondel, Z. Smoreda, M. C. González, and V. Colizza (2014) On the use of human mobility proxies for modeling epidemics. PLoS Computational Biology 10 (7), pp. e1003716.
  • [30] Y. Wu, K. Li, G. Zhao, and X. Qian (2019) Long- and short-term preference learning for next POI recommendation. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2301–2304.
  • [31] Y. Wu, K. Li, G. Zhao, and Q. Xueming (2020) Personalized long- and short-term preference learning for next POI recommendation. IEEE Transactions on Knowledge and Data Engineering.
  • [32] Y. Xu, S. Çolak, E. C. Kara, S. J. Moura, and M. C. González (2018) Planning for electric vehicle needs by coupling charging profiles with urban mobility. Nature Energy 3, pp. 484–493.
  • [33] Y. Xu, R. Di Clemente, and M. C. González (2021) Understanding vehicular routing behavior with location-based service data. EPJ Data Science 10 (1), pp. 1–17.
  • [34] Y. Xu and M. C. González (2017) Collective benefits in traffic during mega events via the use of information technologies. Journal of The Royal Society Interface 14 (129), pp. 20161041.
  • [35] Y. Xu, S. Jiang, R. Li, J. Zhang, J. Zhao, S. Abbar, and M. C. González (2019) Unraveling environmental justice in ambient PM exposure in Beijing: a big data approach. Computers, Environment and Urban Systems 75, pp. 12–21.
  • [36] D. Yao, C. Zhang, J. Huang, and J. Bi (2017) SERM: a recurrent model for next location prediction in semantic trajectories. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2411–2414.
  • [37] R. Zhang, J. Guo, H. Jiang, P. Xie, and C. Wang (2019) Multi-task learning for location prediction with deep multi-model ensembles. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1093–1100.
  • [38] Z. Zhang, C. Li, Z. Wu, A. Sun, D. Ye, and X. Luo (2020) NEXT: a neural network framework for next POI recommendation. Frontiers of Computer Science 14 (2), pp. 314–333.
  • [39] P. Zhao, A. Luo, Y. Liu, F. Zhuang, J. Xu, Z. Li, V. S. Sheng, and X. Zhou (2020) Where to go next: a spatio-temporal gated network for next POI recommendation. IEEE Transactions on Knowledge and Data Engineering.
  • [40] G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, and K. Gai (2019) Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5941–5948.