1 Introduction
Recent years have witnessed the rapid growth of location-based social network services, such as Foursquare, Facebook Places, and Yelp. These services have attracted many users to share their locations and experiences, accumulating massive amounts of geo-tagged data; e.g., by December 2017, 55 million users had generated more than 10 billion check-ins on Foursquare. These online footprints (or check-ins) provide an excellent opportunity to understand users' mobility behaviors. For example, we can analyze and predict where a user will go next based on historical footprints. Moreover, such analysis can help POI holders predict customer arrivals in the next time period.
In the literature, approaches such as latent factor models and Markov chains have been widely applied to sequential data analysis and recommendation.
[Rendle et al., 2010] proposed the Factorizing Personalized Markov Chain (FPMC), which bridges matrix factorization and Markov chains, for next-basket recommendation. [Cheng et al., 2013] extended FPMC to embed a personalized Markov chain and user movement constraints for next POI recommendation. [He et al., 2016] proposed a unified tensor-based latent model that captures successive check-in behavior by exploring each user's latent pattern-level preference. Recently, Recurrent Neural Networks (RNNs) have been successfully employed to model sequential data and have become state-of-the-art methods.
[Hidasi et al., 2015] focused on RNN solutions for the session-based recommendation task, where no user id exists and recommendations are made only on short session data. [Zhu et al., 2017] proposed a variant of the Long Short-Term Memory (LSTM) network, called Time-LSTM, which equips LSTM with time gates to model time intervals for next item recommendation.
However, none of the above recommendation methods considers both time intervals and geographical distances between neighboring items, which makes next POI recommendation different from other sequential tasks such as language modeling and next-basket recommender systems (RS). As shown in Figure 1, there is no spatio-temporal interval between neighboring words in language modeling, and there is no distance interval between neighboring items in next-basket RS, while there are time and distance intervals between neighboring check-ins in next POI recommendation. Traditional RNNs and their variants, e.g., LSTM and GRU, do well in modeling the order information of sequential data with constant intervals, but cannot model dynamic time and distance intervals, as shown in Figure 1(c). A recent work, ST-RNN [Liu et al., 2016a], tried to extend RNN to model the temporal and spatial context for next location prediction. In order to model temporal context, ST-RNN models multiple check-ins within a time window in each RNN cell. Meanwhile, ST-RNN employs time-specific and distance-specific transition matrices to characterize dynamic time intervals and geographical distances, respectively. Thus, ST-RNN obtains improvement in spatio-temporal sequential recommendation. However, several challenges prevent ST-RNN from being the best solution for next POI recommendation.
First of all, ST-RNN may fail to properly model the spatial and temporal relations of neighboring check-ins. ST-RNN adopts time-specific and distance-specific transition matrices between cell hidden states within the RNN. Due to data sparsity, ST-RNN cannot learn a matrix for every possible continuous time interval and geographical distance, but instead partitions them into discrete bins. Secondly, ST-RNN is designed for short-term interests and not well designed for long-term interests. [Jannach et al., 2015] reported that both users' short-term and long-term interests are significant for achieving the best performance. Short-term interest here means that recommended POIs should depend on recently visited POIs, while long-term interest means that recommended POIs should depend on all historically visited POIs. Thirdly, it is hard to select a proper time-window width for different applications in ST-RNN, since it models not one element in each layer but multiple elements within a fixed time period.
To this end, in this paper, we propose a new recurrent neural network model, named ST-LSTM, to model users' sequential visiting behaviors. Time intervals and distance intervals between neighboring check-ins are modeled by time gates and distance gates, respectively. Note that there are two time gates and two distance gates in the ST-LSTM model. One pair of time and distance gates is designed to exploit time and distance intervals to capture the short-term interest, and the other is to memorize time and distance intervals to model the long-term interest. Furthermore, enlightened by [Greff et al., 2017], we use coupled input and forget gates to reduce the number of parameters, making our model more efficient. Experimental results on four real-world datasets show that ST-LSTM significantly improves next POI recommendation performance.
To summarize, our contributions are listed as follows.

To the best of our knowledge, this is the first work that models spatio-temporal intervals between check-ins within an LSTM architecture to learn users' visiting behavior for next POI recommendation.

An ST-LSTM model is proposed that incorporates carefully designed time gates and distance gates to capture the spatio-temporal interval information between check-ins. As a result, ST-LSTM models users' short-term and long-term interests simultaneously.

Experiments on four large-scale real-world datasets are conducted to evaluate the performance of our proposed model. The experimental results show that our method outperforms state-of-the-art methods.
2 Related Work
In this section, we discuss related work from two aspects: POI recommendation and neural networks for recommendation.
2.1 POI Recommendation
Different from traditional recommendation tasks (e.g., movie recommendation, music recommendation), POI recommendation is characterized by geographic information and the absence of explicit rating information [Ye et al., 2011; Lian et al., 2014]. Moreover, additional information, such as social influence, temporal information, review information, and transitions between POIs, has been leveraged for POI recommendation. [Ye et al., 2011] integrated social influence with a user-based Collaborative Filtering (CF) model and modeled geographical influence with a Bayesian model. [Yuan et al., 2013] utilized temporal preference to enhance the efficiency and effectiveness of the solution. [Kurashima et al., 2013] proposed a topic model in which a POI is sampled based on its topics and its distance to the target user's historically visited POIs. [Liu et al., 2016b] exploited users' interests and their evolving sequential preferences with temporal interval assessment to recommend POIs in a specified time period.
Next POI recommendation, as a natural extension of general POI recommendation, has recently been proposed and has attracted great research interest. Research has shown that the sequential influence between successive check-ins plays a crucial role in next POI recommendation, since human movement exhibits sequential patterns. A tensor-based model, named FPMC-LR, was proposed by integrating the first-order Markov chain of POI transitions and distance constraints for next POI recommendation [Cheng et al., 2013]. [He et al., 2016] further proposed a tensor-based latent model considering the influence of a user's latent behavior patterns, which are determined by contextual temporal and categorical information. [Feng et al., 2015] proposed a personalized ranking metric embedding method (PRME) to model personalized check-in sequences for next POI recommendation. [Xie et al., 2016] proposed a graph-based embedding learning approach, named GE, which utilizes bipartite graphs to model context factors in a unified optimization framework.
2.2 Neural Networks for Recommendation
Neural networks are not only naturally used for feature learning to model various features of users or items, but are also explored as core recommendation models to capture nonlinear, complex interactions between users and items [Wang and Wang, 2014; Zhang et al., 2016]. [Zheng et al., 2016] modeled collaborative filtering with a neural autoregressive method. [Yang et al., 2017a] proposed a deep neural architecture named PACE for POI recommendation, which utilizes the smoothness of semi-supervised learning to alleviate the sparsity of collaborative filtering. [Yang et al., 2017b] jointly modeled the social network structure and users' trajectory behaviors with a neural network model named JNTM. [Zhang et al., 2017] tried to learn a user's next movement intention and incorporated different contextual factors to improve next POI recommendation. [Zhu et al., 2017] proposed the Time-LSTM model and two variants, which equip LSTM with time gates to model time intervals for next item recommendation. A recent work that proposed a model named ST-RNN, which considers spatial and temporal contexts to model user behavior for next location prediction, is closely related to ours [Liu et al., 2016a]. However, our proposed ST-LSTM model differs significantly from ST-RNN in two aspects. First, ST-LSTM equips the LSTM model with time and distance gates, while ST-RNN adds spatio-temporal transition matrices to the RNN model. Second, ST-LSTM models time and distance intervals between neighboring check-ins to extract long-term and short-term interests, whereas ST-RNN recommends the next POI depending only on POIs in the most recent time window, which makes it hard to distinguish short-term from long-term interests.
3 Preliminaries
In this section, we first give the formal problem definition of next POI recommendation, and then briefly introduce LSTM.
3.1 Problem Formulation
Let U = {u_1, u_2, ..., u_M} be the set of users and V = {v_1, v_2, ..., v_N} be the set of POIs. For a user u ∈ U, she has a sequence of historical POI visits up to time t_k represented as H^u = {v^u_{t_1}, v^u_{t_2}, ..., v^u_{t_k}}, where v^u_{t_i} means that user u visited POI v at time t_i. The goal of next POI recommendation is to recommend a list of unvisited POIs for the user to visit next at time t_{k+1}. Specifically, a higher prediction score of a user for an unvisited POI indicates a higher probability that the user would like to visit that POI at time t_{k+1}. According to the prediction scores, we can recommend the top-N POIs to user u.
3.2 LSTM
LSTM [Hochreiter and Schmidhuber, 1997], a variant of RNN, is capable of learning short- and long-term dependencies. LSTM has become an effective and scalable model for sequential prediction problems, and many improvements have been made to the original LSTM architecture. We use the basic LSTM model in our approach for conciseness and generality, and it is easy to extend our approach to other variants of LSTM. The basic update equations of LSTM are as follows:
(1) i_t = σ(x_t W_{xi} + h_{t−1} W_{hi} + b_i)
(2) f_t = σ(x_t W_{xf} + h_{t−1} W_{hf} + b_f)
(3) c̃_t = tanh(x_t W_{xc} + h_{t−1} W_{hc} + b_c)
(4) c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
(5) o_t = σ(x_t W_{xo} + h_{t−1} W_{ho} + b_o)
(6) h_t = o_t ⊙ tanh(c_t)
where i_t, f_t, and o_t represent the input, forget, and output gates of the t-th object, deciding what information to store, forget, and output, respectively.
c_t is the cell activation vector representing the cell state, which is the key to LSTM.
x_t and h_t represent the input feature vector and the hidden output vector, respectively. σ represents a sigmoid layer mapping values into the range between 0 and 1, where 1 represents "completely keep this" while 0 represents "completely get rid of this". W_{x·} and W_{h·} are the weights of the gates, and b_i, b_f, b_c, and b_o are the corresponding biases. ⊙ represents the element-wise (Hadamard) product. The update of the cell state c_t has two parts: the former part is the previous cell state c_{t−1}, controlled by the forget gate f_t, and the latter part is the new candidate value c̃_t, scaled by the input gate i_t, which decides how much new state value to add.
4 Our Approach
In this section, we first propose a spatio-temporal LSTM model, ST-LSTM, which utilizes time and distance intervals to model a user's short-term interest and long-term interest simultaneously. Then, we improve ST-LSTM with coupled input and forget gates for efficiency.
4.1 Spatio-Temporal LSTM
When using LSTM for next POI recommendation, x_t represents the user's last visited POI, which can be exploited to learn the user's short-term interest, while c_{t−1} contains the information of the user's historically visited POIs, which reflects the user's long-term interest. However, how much the short-term interest determines where to go next heavily depends on the time interval and the geographical distance between the last POI and the next POI. Intuitively, a POI visited a long time ago and a long distance away has little influence on the next POI, and vice versa. In our proposed ST-LSTM model, we use time gates and distance gates to control the influence of the last visited POI on next POI recommendation. Furthermore, the time gates and distance gates also help store time and distance intervals in the cell state c_t, which memorizes the user's long-term interest. In this way, we utilize time and distance intervals to model the user's short-term interest and long-term interest simultaneously.
As shown in the two dotted red rectangles in Figure 2, we add two time gates and two distance gates to LSTM, denoted as T1_t, T2_t, D1_t, and D2_t, respectively. T1_t and D1_t are used to control the influence of the latest visited POI on the next POI, and T2_t and D2_t are used to capture time and distance intervals to model the user's long-term interest. Based on LSTM, we add the following equations for the time gates and distance gates:
(7) T1_t = σ(x_t W_{xt1} + σ(Δt_t W_{t1}) + b_{t1}), s.t. W_{t1} ≤ 0
(8) T2_t = σ(x_t W_{xt2} + σ(Δt_t W_{t2}) + b_{t2})
(9) D1_t = σ(x_t W_{xd1} + σ(Δd_t W_{d1}) + b_{d1}), s.t. W_{d1} ≤ 0
(10) D2_t = σ(x_t W_{xd2} + σ(Δd_t W_{d2}) + b_{d2})
We then modify Eqs. (4)–(6) to:
(11) ĉ_t = f_t ⊙ c_{t−1} + i_t ⊙ T1_t ⊙ D1_t ⊙ c̃_t
(12) c_t = f_t ⊙ c_{t−1} + i_t ⊙ T2_t ⊙ D2_t ⊙ c̃_t
(13) o_t = σ(x_t W_{xo} + h_{t−1} W_{ho} + Δt_t W_{to} + Δd_t W_{do} + b_o)
(14) h_t = o_t ⊙ tanh(ĉ_t)
where Δt_t is the time interval and Δd_t is the distance interval. Besides the input gate i_t, T1_t can be regarded as an input information filter considering the time interval, and D1_t can be regarded as another input information filter considering the distance interval. We add a new cell state ĉ_t to store the result, then transfer it to the hidden state h_t, which finally influences the next recommendation. Along this line, the input is filtered by the time gate T1_t and the distance gate D1_t as well as the input gate i_t for the current recommendation.
The cell state c_t is used to memorize the user's general interest, i.e., long-term interest. We design a time gate T2_t and a distance gate D2_t to control the cell state update. T2_t first memorizes Δt_t and then transfers it to c_t, and further to c_{t+1}, c_{t+2}, and so on. So T2_t helps store Δt_t to model the user's long-term interest. In a similar way, D2_t memorizes Δd_t and transfers it to the cell state c_t to help model the user's long-term interest. In this way, c_t captures the user's long-term interest by memorizing not only the order of the user's historically visited POIs, but also the time and distance interval information between neighboring POIs. Modeling distance intervals can help capture the user's general spatial interest, while modeling time intervals helps capture the user's periodical visiting behavior.
Normally, a more recently visited POI with a shorter distance should have a larger influence on choosing the next POI. To incorporate this knowledge into the designed gates, we add the constraints W_{t1} ≤ 0 and W_{d1} ≤ 0 in Eq. (7) and Eq. (9). Accordingly, if Δt_t is smaller, T1_t is larger according to Eq. (7). Similarly, if Δd_t is shorter, D1_t is larger according to Eq. (9). For example, if the time and distance intervals between x_t and the next POI are smaller, then x_t better indicates the short-term interest, so its influence should be increased. If Δt_t or Δd_t is larger, x_t has a smaller influence on the new cell state ĉ_t. In this case, the short-term interest is uncertain, so we should depend more on the long-term interest. This is why we set two time gates and two distance gates: to distinguish the short-term and long-term interest updates.
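Putting the gate equations and the modified cell updates together, one forward step of ST-LSTM can be sketched as follows. This is an illustrative NumPy sketch under our own parameter naming (the keys of `p` are assumptions, not the paper's notation), not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def st_lstm_step(x_t, h_prev, c_prev, dt, dd, p):
    """One ST-LSTM forward step (a sketch of Eqs. (7)-(14)).
    dt, dd: scalar time / distance intervals since the previous check-in.
    p: dict of weights and biases (hypothetical names)."""
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])   # input gate
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])   # forget gate
    # Time gates: T1 filters the short-term path, T2 feeds the long-term cell.
    T1 = sigmoid(p["Wxt1"] @ x_t + sigmoid(dt * p["wt1"]) + p["bt1"])
    T2 = sigmoid(p["Wxt2"] @ x_t + sigmoid(dt * p["wt2"]) + p["bt2"])
    # Distance gates, analogous to the time gates.
    D1 = sigmoid(p["Wxd1"] @ x_t + sigmoid(dd * p["wd1"]) + p["bd1"])
    D2 = sigmoid(p["Wxd2"] @ x_t + sigmoid(dd * p["wd2"]) + p["bd2"])
    c_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])
    c_hat = f * c_prev + i * T1 * D1 * c_tilde   # new short-term cell state
    c = f * c_prev + i * T2 * D2 * c_tilde       # long-term cell state
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev
                + dt * p["wto"] + dd * p["wdo"] + p["bo"])  # output gate
    h = o * np.tanh(c_hat)   # hidden output, driven by the short-term state
    return h, c
```

Note the asymmetry the text describes: the hidden output `h` reads the short-term state `c_hat`, while the long-term state `c` is what propagates to the next step.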
4.2 Variant with Coupled Input and Forget Gates
Enlightened by [Greff et al., 2017], we propose another version of ST-LSTM, named STC-LSTM, to reduce the number of parameters and improve efficiency. STC-LSTM uses coupled input and forget gates instead of separately deciding what to forget and what new information to add, as shown in Figure 3. Specifically, we remove the forget gate and modify Eq. (11) and Eq. (12) to:
(15) ĉ_t = (1 − i_t ⊙ T1_t ⊙ D1_t) ⊙ c_{t−1} + i_t ⊙ T1_t ⊙ D1_t ⊙ c̃_t
(16) c_t = (1 − i_t) ⊙ c_{t−1} + i_t ⊙ T2_t ⊙ D2_t ⊙ c̃_t
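In code, the coupling only changes the two cell-state updates: the forget gate is dropped and replaced by the complement of the (gated) input. A minimal NumPy sketch (function and argument names are ours, assuming the gates have already been computed):

```python
import numpy as np

def stc_lstm_cell_update(i, T1, T2, D1, D2, c_tilde, c_prev):
    """Coupled cell-state updates of STC-LSTM (a sketch of Eqs. (15)-(16)).
    All arguments are vectors of the hidden size; i, T1, T2, D1, D2 lie in
    (0, 1) and c_tilde in (-1, 1), as produced by the gate equations."""
    g1 = i * T1 * D1                                  # gated short-term input
    c_hat = (1.0 - g1) * c_prev + g1 * c_tilde        # replaces Eq. (11)
    c = (1.0 - i) * c_prev + i * T2 * D2 * c_tilde    # replaces Eq. (12)
    return c_hat, c
```

The design choice mirrors the coupled-gate variant studied in [Greff et al., 2017]: forgetting is tied to how much new information enters, halving the forget-gate parameters.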
4.3 Training
The way we adapt our model to next POI recommendation is as follows. First, we transform the check-in sequence H^u into model inputs: x_t in ST-LSTM corresponds to the visited POI v^u_{t_q}, Δt_t corresponds to the time interval t_q − t_{q−1}, and Δd_t corresponds to the geographical distance d(v^u_{t_q}, v^u_{t_{q−1}}), where d(·, ·) is the function computing the distance between two geographical points. Moreover, we make use of all users' behavioral histories for learning and recommendation. We leverage the mini-batch learning method and train the model on users' existing histories until convergence. The model output is a probability distribution over all POIs calculated from the hidden state h_t and a softmax layer. We then take a gradient step to optimize the loss based on the output and the one-hot representation of the actually visited next POI. We use Adam, a variant of Stochastic Gradient Descent (SGD), to optimize the parameters of ST-LSTM; it adapts the learning rate for each parameter by performing smaller updates for frequent parameters and larger updates for infrequent parameters. We use the projection operator described in [Rakhlin et al., 2012] to meet the constraint W_{t1} ≤ 0 in Eq. (7) and W_{d1} ≤ 0 in Eq. (9): if W_{t1} > 0 occurs during training, we set W_{t1} = 0, and W_{d1} is handled in the same way. The computational complexity of learning LSTM models per weight and time step with the SGD optimization technique is O(1). Hence, the LSTM algorithm is very efficient, with an excellent update complexity of O(W) per time step, where W is the number of weights, determined by the number of memory cells, the number of input units, and the number of output units. Similarly, the computational complexity of ST-LSTM is also O(1) per weight and time step, with a weight count of the same order. The training time of our proposed model on the four datasets after data cleaning is measured in minutes on an M6000 GPU.
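The projection step used to maintain the constraints of Eqs. (7) and (9) amounts to clipping the constrained weights back onto the feasible set after every Adam update. A minimal sketch (the function name is ours):

```python
import numpy as np

def project_nonpositive(w):
    """Project a constrained weight vector back onto {w <= 0}.
    Applied to W_t1 and W_d1 after each gradient step, this keeps the
    time gate T1 (resp. distance gate D1) non-increasing in the interval,
    i.e., any entry pushed above zero by the update is reset to zero."""
    return np.minimum(w, 0.0)
```

Because the feasible set is a box, this clipping is exactly the Euclidean projection, so the convergence guarantees of projected SGD-type methods apply.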
5 Experiments
In this section, we conduct experiments to evaluate the performance of our proposed ST-LSTM model on four real-world datasets. We first briefly describe the datasets, followed by the baseline methods. Finally, we present our experimental results and discussions.
5.1 Dataset
We use four public LBSN datasets that contain user–POI interactions and the locations of POIs. The statistics of the four datasets are listed in Table 1. CA is a Foursquare dataset from users whose homes are in California, collected from January 2010 to February 2011 and used in [Gao et al., 2012]. SIN is a Singapore dataset crawled from Foursquare and used by [Yuan et al., 2013]. Gowalla (http://snap.stanford.edu/data/loc-gowalla.html) and Brightkite (http://snap.stanford.edu/data/loc-brightkite.html) are two widely used LBSN datasets, which have appeared in many related research papers. We eliminate users with fewer than 10 check-ins and POIs visited by fewer than 10 users in the four datasets. Then, we sort each user's check-in records in timestamp order, taking the first 70% as the training set and the remaining 30% as the test set.
Dataset  #user  #POI  #Check-in  Density 

CA  49,005  206,097  425,691  0.004% 
SIN  30,887  18,995  860,888  0.014% 
Gowalla  18,737  32,510  1,278,274  0.209% 
Brightkite  51,406  772,967  4,747,288  0.012% 
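The preprocessing described above (frequency filtering and the 70%/30% chronological split) can be sketched as follows. This is an illustrative single-pass sketch with our own function name and signature; the paper does not specify whether the filtering is iterated until a fixed point:

```python
from collections import Counter

def preprocess(checkins, min_user=10, min_poi=10, train_ratio=0.7):
    """checkins: list of (user, poi, timestamp) tuples.
    Drops users with fewer than `min_user` check-ins and POIs visited by
    fewer than `min_poi` distinct users, then splits each user's
    time-ordered POI sequence into train/test by `train_ratio`."""
    user_cnt = Counter(u for u, _, _ in checkins)
    poi_users = {}
    for u, p, _ in checkins:
        poi_users.setdefault(p, set()).add(u)
    kept = [(u, p, t) for u, p, t in checkins
            if user_cnt[u] >= min_user and len(poi_users[p]) >= min_poi]
    # Build each user's check-in sequence in timestamp order.
    seqs = {}
    for u, p, t in sorted(kept, key=lambda r: r[2]):
        seqs.setdefault(u, []).append(p)
    train, test = {}, {}
    for u, seq in seqs.items():
        cut = int(len(seq) * train_ratio)   # first 70% for training
        train[u], test[u] = seq[:cut], seq[cut:]
    return train, test
```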
5.2 Baseline Methods
We compare our proposed ST-LSTM model with seven representative methods for next POI recommendation.

FPMC-LR [Cheng et al., 2013]: It combines personalized Markov chains with user movement constraints around a localized region. It factorizes the transition tensor matrices of all users and predicts the next location by computing transition probabilities.

PRME-G [Feng et al., 2015]: It utilizes the metric embedding method to avoid the drawbacks of matrix factorization (MF). Specifically, it embeds users and POIs into the same latent space to capture user transition patterns.

GE [Xie et al., 2016]: It embeds four relational graphs (POI–POI, POI–Region, POI–Time, POI–Word) into a shared low-dimensional space. The recommendation score is then calculated as a linear combination of inner products over these contextual factors.

RNN [Zhang et al.2014]: This method leverages the temporal dependency in user’s behavior sequence through a standard recurrent structure.

LSTM [Hochreiter and Schmidhuber, 1997]: This is a variant of the RNN model, which contains a memory cell and three multiplicative gates to allow long-term dependency learning.

GRU [Cho et al.2014]: This is a variant of RNN model, which is equipped with two gates to control the information flow.

ST-RNN [Liu et al., 2016a]: Based on the standard RNN model, ST-RNN replaces the single transition matrix in RNN with time-specific transition matrices and distance-specific transition matrices to model temporal and spatial contexts.
5.3 Evaluation Metrics
To evaluate the performance of our proposed ST-LSTM model and compare it with the seven baselines described above, we use two standard metrics: Acc@K and Mean Average Precision (MAP). These two metrics are widely used for evaluating recommendation results, e.g., in [Liu et al., 2016a; He et al., 2016; Xie et al., 2016]. Note that for an instance in the test set, Acc@K is 1 if the visited POI appears in the set of top-K recommended POIs, and 0 otherwise. The overall Acc@K is calculated as the average value over all test instances. In this paper, we choose K = {1, 5, 10, 15, 20} to illustrate different results of Acc@K.
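For concreteness, the two metrics can be computed as below. This is a standard implementation sketch (function names are ours); note that with a single ground-truth POI per test instance, average precision reduces to the reciprocal rank of that POI:

```python
def acc_at_k(ranked, truth, k):
    """Acc@K for one test instance: 1 if the visited POI `truth`
    appears among the top-K of the ranked recommendation list."""
    return 1.0 if truth in ranked[:k] else 0.0

def mean_average_precision(instances):
    """MAP over test instances, each a (ranked_poi_list, true_poi) pair.
    With one relevant POI per instance, AP = 1 / rank of the true POI
    (and 0 if it is absent from the list)."""
    aps = []
    for ranked, truth in instances:
        aps.append(1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0)
    return sum(aps) / len(aps)
```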
5.4 Results and Discussions
CA  SIN  
Acc@1  Acc@5  Acc@10  MAP  Acc@1  Acc@5  Acc@10  MAP  
FPMC-LR  0.0378  0.0493  0.0784  0.1791  0.0395  0.0625  0.0826  0.1724 
PRME-G  0.0422  0.065  0.0813  0.1868  0.0466  0.0723  0.0876  0.1715 
GE  0.0294  0.0329  0.0714  0.1691  0.0062  0.0321  0.0607  0.1102 
RNN  0.0475  0.0901  0.1138  0.1901  0.1321  0.1867  0.2043  0.2186 
LSTM  0.0486  0.0937  0.1276  0.1975  0.1261  0.1881  0.2019  0.2123 
GRU  0.0483  0.0915  0.1216  0.1934  0.1237  0.1921  0.1992  0.2101 
ST-RNN  0.0505  0.0922  0.1232  0.2075  0.1379  0.1957  0.2091  0.2239 
ST-LSTM  0.0716  0.1232  0.1508  0.2208  0.1978  0.2436  0.2651  0.3194 
STC-LSTM  0.0801  0.1308  0.1612  0.2556  0.2037  0.2542  0.2861  0.3433 
Gowalla  Brightkite  
Acc@1  Acc@5  Acc@10  MAP  Acc@1  Acc@5  Acc@10  MAP  
FPMC-LR  0.0293  0.0524  0.0849  0.1745  0.1634  0.2475  0.3164  0.33 
PRME-G  0.0334  0.0652  0.0869  0.1916  0.1976  0.2993  0.3495  0.3115 
GE  0.0174  0.06  0.0947  0.1973  0.0521  0.1376  0.2118  0.2602 
RNN  0.0473  0.0892  0.1207  0.1998  0.3401  0.4087  0.432  0.413 
LSTM  0.0503  0.0967  0.1241  0.2004  0.3575  0.4146  0.4489  0.4303 
GRU  0.0498  0.0931  0.1289  0.2045  0.331  0.4007  0.4377  0.4042 
ST-RNN  0.0519  0.09532  0.1304  0.2187  0.3672  0.4231  0.4477  0.4369 
ST-LSTM  0.0713  0.1355  0.1669  0.2338  0.4389  0.4807  0.5035  0.5266 
STC-LSTM  0.0778  0.1492  0.1818  0.2557  0.4443  0.4953  0.5231  0.5626 
Method Comparison.
The performance of our proposed ST-LSTM model and the seven baselines on the four datasets, evaluated by Acc@K and MAP, is shown in Table 2. The cell size and the hidden state size are set to 128 in our experiments. The number of epochs is set to 100 and the batch size to 10 for our proposed model. The baseline parameters follow the best settings reported in their papers. From the experimental results, we make the following observations. RNN performs better than the Markov chain method FPMC-LR and the embedding method PRME-G, due to its capability of modeling sequential data and user interests with the RNN cell. Both LSTM and GRU slightly improve the performance compared with RNN because of their advantages in modeling long-term interests. The result of GE is not good because the social and textual information it relies on is missing in our datasets. The performance of the state-of-the-art method ST-RNN is close to that of the standard RNN, which may be caused by the difficulty of manually setting the windows of time and distance intervals. Another reason may be that the window setting does not model well the relation between recently visited POIs and the next POI. Our ST-LSTM model outperforms all baselines on the four datasets. The significant improvement of ST-LSTM indicates that it models temporal and spatial contexts well, because we add time and distance gates to integrate time and distance intervals into the model. Moreover, STC-LSTM not only reduces the number of parameters, but also further improves the performance compared with ST-LSTM.
Effectiveness of Time and Distance Gates. There are two time gates and two distance gates in our STC-LSTM model. We first investigate the effectiveness of the time and distance gates in modeling time and distance intervals. Specifically, we set D1_t = 1 and D2_t = 1 in Eq. (9) and Eq. (10), respectively; that is, we close the two distance gates and only consider time intervals. Similarly, we set T1_t = 1 and T2_t = 1 in Eq. (7) and Eq. (8), respectively; that is, we close the two time gates and only consider distance information. From Figure 4, we can observe that the time gates and distance gates have almost equal importance on the two datasets (i.e., Gowalla and CA). Moreover, both are critical for improving the recommendation performance.
We also investigate the effectiveness of the time and distance gates in modeling short-term and long-term interests. We set T2_t = 1 and D2_t = 1 in Eq. (8) and Eq. (10), which means we close the time and distance gates for long-term interest and only activate the time and distance gates for short-term interest. Similarly, we set T1_t = 1 and D1_t = 1 in Eq. (7) and Eq. (9), which means we close the time and distance gates for short-term interest. As shown in Figure 4, both settings perform worse than the original STC-LSTM, which means that time and distance intervals are not only critical to short-term interest but also important to long-term interest. Distance intervals may help model a user's general spatial preference, and time intervals may help model a user's long-term periodical behavior.


Performance on Cold Start. We also evaluate the performance of ST-LSTM by comparing it with the other competitors for cold-start users. If a user has visited only a few POIs, we consider the user cold. Specifically, we take users with fewer than 5 check-ins as cold users in our experiments. We conduct the experiments on two datasets (i.e., Gowalla and Brightkite) and use Acc@K as the evaluation metric. As shown in Figure 5, we can observe that STC-LSTM performs best among all methods under the cold-start scenario. The reason is that STC-LSTM models long-term interests as well as short-term interests while considering time and distance intervals.


Impact of Parameters. In the standard RNN, different cell sizes and batch sizes may lead to different performance. We investigate the impact of these two parameters on ST-LSTM and STC-LSTM, varying the cell size and the batch size to observe the performance and the training time of our two proposed models. We only show the impact of the two parameters on the Gowalla dataset due to space constraints. As shown in Figure 6, increasing the cell size can improve our models in terms of the Acc@10 metric, and a proper batch size helps achieve the best performance. The cell size determines the model complexity, and a cell with a larger size may fit the data better. Moreover, a small batch size may lead to a local optimum, and a big one may lead to insufficient updating of the parameters in our two models.


6 Conclusions
In this paper, a spatio-temporal recurrent neural network, named ST-LSTM, was proposed for next POI recommendation. Time and distance intervals between neighboring check-ins were modeled using time and distance gates in ST-LSTM. Specifically, we added a new cell state, so there are two cell states to memorize users' short-term and long-term interests, respectively. We designed time and distance gates to control the user's short-term interest update and another pair of gates to control the long-term interest update, so as to improve next POI recommendation performance. We further coupled the input and forget gates to improve the efficiency of ST-LSTM. Experimental results on four large-scale real-world datasets demonstrated the effectiveness of our model, which performed better than state-of-the-art methods. In future work, we will incorporate more context information, such as social networks and textual description content, into the model to further improve next POI recommendation accuracy.
References
[Cheng et al., 2013] Chen Cheng, Haiqin Yang, Michael R. Lyu, and Irwin King. Where you like to go next: Successive point-of-interest recommendation. In IJCAI, pages 2605–2611, 2013.
[Cho et al., 2014] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, pages 1724–1734, 2014.
[Feng et al., 2015] Shanshan Feng, Xutao Li, Yifeng Zeng, Gao Cong, Yeow Meng Chee, and Quan Yuan. Personalized ranking metric embedding for next new POI recommendation. In IJCAI, pages 2069–2075, 2015.
[Gao et al., 2012] Huiji Gao, Jiliang Tang, and Huan Liu. gSCorr: modeling geo-social correlations for new check-ins on location-based social networks. In CIKM, pages 1582–1586. ACM, 2012.
[Greff et al., 2017] Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–2232, 2017.
[He et al., 2016] Jing He, Xin Li, Lejian Liao, Dandan Song, and William K. Cheung. Inferring a personalized next point-of-interest recommendation model with latent behavior patterns. In AAAI, pages 137–143, 2016.
[Hidasi et al., 2015] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks. CoRR, abs/1511.06939, 2015.
[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[Jannach et al., 2015] Dietmar Jannach, Lukas Lerche, and Michael Jugovac. Adaptation and evaluation of recommendations for short-term shopping goals. In RecSys, pages 211–218. ACM, 2015.
[Kurashima et al., 2013] Takeshi Kurashima, Tomoharu Iwata, Takahide Hoshide, Noriko Takaya, and Ko Fujimura. Geo topic model: joint modeling of user's activity area and interests for location recommendation. In WSDM, pages 375–384. ACM, 2013.
[Lian et al., 2014] Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen, and Yong Rui. GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation. In SIGKDD, pages 831–840. ACM, 2014.
[Liu et al., 2016a] Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. Predicting the next location: A recurrent model with spatial and temporal contexts. In AAAI, pages 194–200, 2016.
[Liu et al., 2016b] Yanchi Liu, Chuanren Liu, Bin Liu, Meng Qu, and Hui Xiong. Unified point-of-interest recommendation with temporal interval assessment. In KDD, pages 1015–1024, 2016.
[Rakhlin et al., 2012] Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Making gradient descent optimal for strongly convex stochastic optimization. In ICML, pages 1571–1578, 2012.
[Rendle et al., 2010] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW, pages 811–820. ACM, 2010.
[Wang and Wang, 2014] Xinxi Wang and Ye Wang. Improving content-based and hybrid music recommendation using deep learning. In MM, pages 627–636. ACM, 2014.
[Xie et al., 2016] Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, and Sen Wang. Learning graph-based POI embedding for location-based recommendation. In CIKM, pages 15–24. ACM, 2016.
[Yang et al., 2017a] Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, and Jiawei Han. Bridging collaborative filtering and semi-supervised learning: A neural approach for POI recommendation. In SIGKDD, pages 1245–1254. ACM, 2017.
[Yang et al., 2017b] Cheng Yang, Maosong Sun, Wayne Xin Zhao, Zhiyuan Liu, and Edward Y. Chang. A neural network approach to jointly modeling social networks and mobile trajectories. ACM TOIS, 35(4):36:1–36:28, 2017.
[Ye et al., 2011] Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR, pages 325–334. ACM, 2011.
[Yuan et al., 2013] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann. Time-aware point-of-interest recommendation. In SIGIR, pages 363–372. ACM, 2013.
[Zhang et al., 2014] Yuyu Zhang, Hanjun Dai, Chang Xu, Jun Feng, Taifeng Wang, Jiang Bian, Bin Wang, and Tie-Yan Liu. Sequential click prediction for sponsored search with recurrent neural networks. In AAAI, pages 1369–1375, 2014.
[Zhang et al., 2016] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. Collaborative knowledge base embedding for recommender systems. In SIGKDD, pages 353–362. ACM, 2016.
[Zhang et al., 2017] Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, and Xiangyang Luo. NEXT: A neural network framework for next POI recommendation. arXiv preprint arXiv:1704.04576, 2017.
[Zheng et al., 2016] Yin Zheng, Bangsheng Tang, Wenkui Ding, and Hanning Zhou. A neural autoregressive approach to collaborative filtering. In ICML, pages 764–773, 2016.
[Zhu et al., 2017] Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai. What to do next: Modeling user behaviors by Time-LSTM. In IJCAI, pages 3602–3608, 2017.