Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence

09/27/2018 ∙ by Chen Ma, et al. ∙ McGill University

The rapid growth of Location-based Social Networks (LBSNs) provides a great opportunity to satisfy the strong demand for personalized Point-of-Interest (POI) recommendation services. However, with the tremendous increase of users and POIs, POI recommender systems still face several challenging problems: (1) the difficulty of modeling non-linear user-POI interactions from implicit feedback; (2) the difficulty of incorporating context information such as POIs' geographical coordinates. To cope with these challenges, we propose a novel autoencoder-based model, SAE-NAD, to learn the non-linear user-POI relations. It consists of a self-attentive encoder (SAE) and a neighbor-aware decoder (NAD). In particular, unlike previous works that treat users' checked-in POIs equally, our self-attentive encoder adaptively differentiates the user preference degrees in multiple aspects by adopting a multi-dimensional attention mechanism. To incorporate the geographical context information, we propose a neighbor-aware decoder that makes users' reachability higher on the similar and nearby neighbors of checked-in POIs, which is achieved by the inner product of POI embeddings together with the radial basis function (RBF) kernel. To evaluate the proposed model, we conduct extensive experiments on three real-world datasets with many state-of-the-art baseline methods and evaluation metrics. The experimental results demonstrate the effectiveness of our model.




1. Introduction

With the rapid growth of mobile devices and location-acquisition technologies, it has become more convenient for people to access their real-time location information. This development enables the advent of Location-based Social Networks (LBSNs), such as Yelp and Foursquare. These LBSNs allow users to connect with each other, post physical positions and share experiences associated with a location, namely, a Point-of-Interest (POI). The large amount of user-POI interaction data facilitates a promising service: personalized POI recommendation. POI recommender systems serve a potentially huge service demand and bring significant benefits to at least two parties: (1) they help residents or tourists to explore interesting unvisited places; (2) they create opportunities for POIs to attract more visitors.

In the literature, effective methods have been proposed for personalized POI recommendation. These methods mainly rely on collaborative filtering (CF), which can be divided into memory-based and model-based methods (Bobadilla et al., 2013). Memory-based methods infer a user’s preferences regarding unvisited POIs based on the weighted average of ratings from similar users or POIs. For example, (Ye et al., 2011) and (Zhang and Chow, 2013) applied friend-based collaborative filtering methods to recommend POIs by the similarities between a user and her friends. On the other hand, model-based methods make use of the collection of user-POI records to learn a model for the recommendation. Popularized by the Netflix Prize, some of the most successful realizations of model-based methods are built on matrix factorization (MF). MF discovers the latent features underlying the interactions between users and POIs, which predicts user preferences by the inner product of latent factors. For instance, (Lian et al., 2014), (Liu et al., 2014) and (Li et al., 2016) adopted a weighted regularized MF to infer user preferences on unvisited POIs.

However, the aforementioned methods may not fully leverage the complicated user-POI interactions in large-scale data. They usually model user preferences by the weighted average of ratings or the inner product of latent factors, which can only capture the linear relations between users and POIs. It has been shown in (He et al., 2017) that the inner product combines latent features linearly, which limits the expressiveness of MF.

Recently, owing to their ability to represent non-linear and complex data, autoencoders have achieved great success in the domain of recommendation and bring more opportunities to reshape conventional recommendation architectures (Wang et al., 2015; Li et al., 2015b; Wu et al., 2016). Motivated by this, we propose an autoencoder-based model to cope with the complicated user-POI check-in data. The primary reason we adopt the stacked autoencoder (AE) is that, with the deep neural network structure and the non-linear activation function, the stacked AE effectively captures the non-linear and non-trivial relationships between users and POIs, and enables more complex data representations in the latent space. Besides, AE also has strong relations with multiple MF methods (Wu et al., 2016), which can be directly utilized to model the user rating data.

Nevertheless, applying AE in POI recommendation is a non-trivial task. There are still several challenges. First, we argue that, in the user check-in records, some POIs are more representative than others to reflect users’ preferences. Equally treating these representative POIs along with other POIs may lead to inaccurate understanding of users’ preferences. Hence, how to further distinguish the user preference degrees on checked-in POIs is significant for learning personalized user preferences. Second, the spatial context information is a unique property in the check-in records, which is critical for improving recommendation performance. Therefore, how to incorporate the auxiliary information into the neural network-based method is a problem. Third, check-in data is a kind of implicit feedback, which means there are only positive samples in the data, and all of the negative samples and missing positive samples are mixed together (Pan et al., 2008). Moreover, users can only visit a small number of POIs from millions of POIs, which makes the user-POI check-in data extremely sparse. Thus, how to capture users’ preferences from the sparse implicit feedback is a challenge.

To address the challenges above, we propose a novel autoencoder-based model, SAE-NAD, which consists of two components: a self-attentive encoder (SAE) and a neighbor-aware decoder (NAD). First, unlike existing methods, which do not deeply explore the implicitness of users’ preferences, we propose the self-attentive encoder to adaptively compute an importance vector for each POI in a user’s check-in records, which reflects the user preference on those checked-in POIs in multiple aspects. As such, users’ preferences on checked-in POIs can be further distinguished. The POIs with larger importance values contribute more to learning the user hidden representation, which makes the user hidden representation more personalized. Second, we propose the neighbor-aware decoder to incorporate the geographical influence (Ye et al., 2011; Cheng et al., 2012), which widely exists in human mobility behavior on LBSNs. We adopt the inner product between the embeddings of checked-in POIs and unvisited POIs, together with the radial basis function (RBF) kernel (based on the pair-wise distance of corresponding POIs), to calculate the influence that checked-in POIs exert on unvisited POIs. By doing this, the user reachability on the neighbors that are similar and close to checked-in POIs will be higher than on distant POIs. Third, to model the sparse implicit feedback, we assign the same small weight to all unvisited POIs and assign larger weights to visited POIs according to the visit frequency, which makes a distinction between unvisited POIs, less-visited POIs and frequently visited POIs for each user. We extensively evaluate our model against many state-of-the-art baseline methods with different validation metrics on three real-world datasets. The experimental results demonstrate the improvements of our model over other baseline methods on POI recommendation.

The major contributions of this paper are as follows:


  • Due to the fact that memory-based and MF-based CF can only capture the linear relationships between users and POIs, we adopt an autoencoder-based model to capture the non-linear and complex user-POI interactions in the check-in data.

  • To distinguish the user preference on checked-in POIs, we propose a self-attentive encoder to adaptively compute an importance vector for each checked-in POI, and make the POI contribute to the user hidden representation according to the importance values. To the best of our knowledge, this is the first paper to use attention-based autoencoders in POI recommendation.

  • To incorporate the geographical influence, we propose a neighbor-aware decoder, which adopts the inner product between the embeddings of checked-in POIs and unvisited POIs, along with the RBF kernel based on the distance of POIs, to model the influence checked-in POIs exerted on unvisited POIs.

  • The proposed model achieves the best performance on three real-world datasets compared with the state-of-the-art methods, exhibiting the superiority of our model.

2. Related Work

POI recommendation, also referred to as location recommendation or venue recommendation, is an important topic in the domain of recommender systems (Bao et al., 2015). In this section, we describe related work in personalized location recommendation, as well as the applications of attention mechanisms in recommendation tasks.

2.1. Personalized Location Recommendation

Recently, with the advance of LBSNs, location recommendation has been widely studied. Users’ historical data (check-ins, comments, etc.) has been used to make recommendations personalized. Most of the proposed methods that use historical records are based on collaborative filtering (CF). Some researchers employ memory-based CF (Bobadilla et al., 2013) to learn user preferences (Ye et al., 2010; Ye et al., 2011). For example, Ye et al. (Ye et al., 2010) proposed a friend-based CF integrating the preferences of a user’s social friends, which was based on user-based CF. On the other hand, recent work utilized model-based CF (Bobadilla et al., 2013), such as matrix factorization (Koren et al., 2009), to recommend POIs (Li et al., 2015a; Cheng et al., 2012; Yin et al., 2015). Furthermore, in (Lian et al., 2014), (Liu et al., 2014) and (Li et al., 2016), researchers found that check-ins can be treated as implicit feedback, and applied weighted regularized matrix factorization (Hu et al., 2008) to model the implicit feedback data. Other researchers considered the recommendation problem as a pair-wise ranking task: in (Cheng et al., 2013) and (Zhao et al., 2016), researchers adopted the Bayesian personalized ranking loss (Rendle et al., 2009) to learn the pair-wise preferences on POIs.

To make more accurate recommendations, researchers incorporated POI geographical influence into their proposed models (Cho et al., 2011; Ye et al., 2011; Cheng et al., 2012; Liu et al., 2013; Yuan et al., 2014; Liu et al., 2014). There are several ways to model the geographical influence. In particular, some researchers employed Gaussian distributions to characterize users’ check-in activities. For example, Cho et al. (Cho et al., 2011) applied a two-state Gaussian mixture to model the check-ins that are close to users’ home or work places. Cheng et al. (Cheng et al., 2012) proposed a multi-center discovering algorithm to detect a user’s check-in centers. A Gaussian distribution was then built on each center, and together these distributions were used to calculate the user’s check-in probabilities on unvisited locations. On the other hand, some researchers proposed kernel density estimation (KDE) to estimate users’ check-in activities. Ye et al. (Ye et al., 2011) discovered that users’ check-in behaviors followed a power law distribution pattern. The power law pattern revealed two locations’ co-occurrence probability distribution over their distance, and this discovery was also employed in (Liu et al., 2013). Besides, Liu et al. (Liu et al., 2014) exploited geographical characteristics from a location perspective, which were modeled by two levels of neighborhoods, i.e., instance-level and region-level.

Recently, deep neural networks have also been applied in POI recommendation. In (Yang et al., 2017), Yang et al. proposed to exploit context graphs and apply user/POI smoothing to address data sparsity and various contexts. In (Manotumruksa et al., 2017), Manotumruksa et al. proposed a deep recurrent collaborative filtering framework with a pairwise ranking function.

2.2. Attention Mechanism in Recommendation

The idea of the attention mechanism in neural networks is loosely based on the visual attention found in humans, and it has proven effective in various machine learning tasks such as document classification (Yang et al., 2016) and machine translation (Luong et al., 2015; Vaswani et al., 2017).

Recently, researchers have also adopted attention mechanisms in recommendation tasks. In (Pei et al., 2017), Pei et al. adopted an attention model to measure the relevance between users and items, which can capture the joint effects on user-item interactions. Wang et al. (Wang et al., 2017) proposed a hybrid attention model to adaptively capture the change of editors’ selection criteria. In (Gong and Zhang, 2016), Gong et al. adopted an attentional mechanism to scan input microblogs and select trigger words. Chen et al. (Chen et al., 2017) proposed item- and component-level attention mechanisms to model the implicit feedback in multimedia recommendation. In (Seo et al., 2017), Seo et al. proposed to model user preferences and item properties using convolutional neural networks (CNNs) with dual local and global attention, where the local attention provides insight on a user’s preferences or an item’s properties and the global attention helps CNNs focus on the semantic meaning of the review text.

However, our method is different from existing works. We propose a self-attentive encoder to further discriminate the user preference on checked-in POIs in multiple aspects. This is achieved by the proposed multi-dimensional attention mechanism, which utilizes an importance vector to depict the user preference. Furthermore, we adopt a neighbor-aware decoder to incorporate the geographical influence checked-in POIs applied on unvisited POIs, which makes the user reachability higher on the nearby neighbors of checked-in POIs. To the best of our knowledge, this is the first study to apply attention-based autoencoder in POI recommendation.

3. Preliminaries

In this section, we first introduce the definitions and notations. Then we review the basic ideas of autoencoders.

3.1. Definition and Notation

For ease of illustration, we first summarize the definitions and notations.

Definition 1. (POI) A POI is defined as a uniquely identified site (e.g., a restaurant) with two attributes: an identifier and geographical coordinates (latitude and longitude).

Definition 2. (Check-in) A check-in is a record that demonstrates a user has visited a POI at a certain time. A user’s check-in is represented by a 3-tuple: user ID, POI ID, and the timestamp.

Definition 3. (POI Recommendation) Given users’ check-in records, POI recommendation aims at recommending a list of POIs for each user that the user is interested in but never visited.

, the number of users and POIs
, the input data and reconstructed data
the check-in frequency matrix
the confidence matrix

the weight matrix and bias vector

the activation function
the dimension of the bottleneck layer
the dimension of the importance vector
the parameter to control POI’s correlation level
, the parameters of the weighting scheme
the regularization term
Table 1. List of notations.

POI recommendation is commonly studied on a user-POI check-in matrix , where there are users and locations, each entry represents the frequency user checked-in at location . We denote the binary rating matrix as , where each entry indicates whether user has visited location . The terms POI and location are used interchangeably in this paper. Here, following common symbolic notation, upper case bold letters denote matrices, lower case bold letters denote column vectors without any specification, and non-bold letters represent scalars. The notations are shown in Table 1.

3.2. Autoencoders

A single hidden-layer autoencoder (AE) is an unsupervised neural network, which is composed of two parts, i.e., an encoder and a decoder. The encoder has one activation function that maps the input data to the latent space. The decoder also has one activation function mapping the representations in the latent space to the reconstruction space. Given the input , a single hidden-layer autoencoder is shown as follows:


where the encoder and the decoder each have their own weight matrix, bias vector and activation function. The decoder output is the reconstructed version of the input, and the output of the encoder is the representation of the input in the latent space. The goal of the autoencoder is to minimize the reconstruction error between the output and the input. The loss function is shown as follows:


Relations to Matrix Factorization. One reason that the autoencoder is suitable for recommendation is that its formulation is very similar to classical matrix factorization (Koren et al., 2009). Let and denote user latent factors and item latent factors, respectively, where has the same dimension as . If is set to the identity function, then the formula in Eq. 1 can be rephrased as follows:


where is the predicted rating of a user on an item , is the -th element of , which can be treated as the item bias. The rephrased formula demonstrates the strong relations between autoencoders and matrix factorization in recommendation, which makes autoencoders have the ability to recommend items.
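The relation described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: all names (`W1`, `b1`, `W2`, `b2`, `autoencoder`) and the toy values are assumptions chosen for illustration. It runs a single hidden-layer autoencoder and shows that, when the decoder activation is the identity, each reconstructed entry is the inner product of the hidden code with one row of the decoder weights plus a bias, which has the same shape as a biased matrix factorization score.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def autoencoder(x, W1, b1, W2, b2, enc_act=sigmoid, dec_act=sigmoid):
    """Single hidden-layer autoencoder: returns (hidden code, reconstruction)."""
    h = [enc_act(s + b) for s, b in zip(matvec(W1, x), b1)]
    x_hat = [dec_act(s + b) for s, b in zip(matvec(W2, h), b2)]
    return h, x_hat

def reconstruction_error(x, x_hat):
    """Squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Toy example with 3 items and 2 hidden units (all values are made up).
x = [1.0, 0.0, 1.0]
W1 = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]      # 2 x 3 encoder weights
b1 = [0.0, 0.1]
W2 = [[0.3, 0.1], [-0.2, 0.4], [0.6, -0.5]]    # 3 x 2 decoder weights
b2 = [0.05, -0.05, 0.0]

identity = lambda z: z
h, x_hat = autoencoder(x, W1, b1, W2, b2, dec_act=identity)

# With an identity decoder, the score of item i is an inner product of the
# user code h with the i-th decoder row, plus the i-th bias (the item bias).
i = 1
mf_score = sum(hk * wk for hk, wk in zip(h, W2[i])) + b2[i]
```

Here `h` plays the role of the user latent factor and row `i` of `W2` the role of the item latent factor.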

Relations to word2vec. word2vec (Mikolov et al., 2013) is an effective and scalable method to learn embedding representations in word sequences, modeling words’ contextual correlations in sentences. word2vec utilizes either of two model architectures to produce a distributed representation of words: continuous bag-of-words (CBOW) or continuous skip-gram. Taking continuous skip-gram as an example, the input of this model is a one-hot vector representing the current word, and the model uses the current word to predict the surrounding window of context words. This model is highly similar to an AE whose input is a one-hot vector. If the current word is and the target word is , and we set the activation function to identity and the bias to zero, then the output of the decoder is:


where and are the -th column and -th row of and , respectively. If we further apply on the output of the decoder:


where this probability shows how likely the word will appear in the window of the current word . The combination of Eq. 4 and 5 is similar to the Eq. 2 in (Mikolov et al., 2013). In our POI recommendation setting, this formula demonstrates if a user has checked-in location , how likely the user would check-in location . Therefore, the product of and can be used for capturing the relation between and in a single hidden-layer AE.
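A small sketch of this skip-gram-style reading, under assumed names and toy values (`W_in`, `W_out` and `co_visit_probabilities` are hypothetical, not taken from the paper): with identity activations, zero biases and a one-hot input, the decoder output reduces to inner products between one input embedding and every context embedding, and a softmax turns those scores into probabilities.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def co_visit_probabilities(i, W_in, W_out):
    """Probability of each POI j given a check-in at POI i.

    W_in:  d x N matrix whose columns are input POI embeddings.
    W_out: N x d matrix whose rows are context POI embeddings.
    """
    v_i = [row[i] for row in W_in]  # a one-hot input selects column i
    scores = [sum(a * b for a, b in zip(w_j, v_i)) for w_j in W_out]
    return softmax(scores)

# Toy embeddings for 3 POIs in 2 dimensions (made-up values).
W_in = [[1.0, 0.0, 0.5],
        [0.0, 1.0, 0.5]]
W_out = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]

probs = co_visit_probabilities(0, W_in, W_out)
```

In this toy case POIs 0 and 2 receive the same, larger probability because their context embeddings have the same inner product with the embedding of POI 0.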

4. Methodologies

In this section, we introduce the proposed model for POI recommendation, which consists of two components, i.e., a self-attentive encoder and a neighbor-aware decoder, as shown in Figure 1. We first present the stacked autoencoder as our major building block. Then we describe the self-attentive encoder, which adaptively selects representative POIs that reflect users’ preferences. Next, we describe the neighbor-aware decoder, which models the geographical influence in POI recommendation, i.e., the phenomenon that users tend to check in at unvisited POIs that are close to a POI they checked in at before. Lastly, we present the loss function for implicit feedback and how to optimize the proposed model.

4.1. Model Basics

To learn the user hidden representation and reconstruct user preferences on unvisited POIs, we propose to adopt a stacked autoencoder, where the deep network architecture and non-linear activation functions can capture the non-linear and complex user-POI interactions (He et al., 2017). Formally, the stacked autoencoder is shown as follows:


where , , and are parameter matrices of the stacked AE. is the dimension of the first hidden layer, and is the dimension of the bottleneck layer. and are the hidden representation and reconstructed ratings of user , respectively.

Figure 1. The model architecture. The yellow part is the self-attentive encoder, the green part is the neighbor-aware decoder, and the gray part is the attention network. The bright yellow rectangle is the user hidden representation. Specifically, Att_Layer denotes the attention layer and Agg_Layer denotes the aggregation layer.

4.2. Self-Attentive Encoder

As presented in section 4.1, we apply a stacked AE to learn users’ hidden representations. In the proposed model, the input is a multi-hot user preference vector , where 1 in the vector indicates the user has been to a certain POI. Based on the input, the encoder of a vanilla stacked AE works as follows: (1) given a user’s check-in set , where is the index of a POI, corresponding POI vectors (e.g., ) in are selected and summed; (2) after having the summed vector, perform the activation function to get the user hidden representation. Here, works like a POI embedding matrix, which is similar to the word embedding matrix in the word2vec model.

Because the model input is a multi-hot vector, each embedding in contributes equally to the user hidden representation, where is the slicing operation that selects the corresponding POI vectors to form a sub-matrix of , which has the size -by-:


where is the -th column of .
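The equal-contribution property can be illustrated with a short sketch (function names and values are assumptions for illustration): multiplying the first-layer weights by a multi-hot vector gives the same result as summing the weight columns indexed by the user's check-in set.

```python
def encode_multi_hot(x, W1):
    """Pre-activation of the first layer for a multi-hot input.

    x:  length-N list of 0/1 flags (1 = the user checked in at that POI).
    W1: d x N weight matrix stored as a list of d rows.
    """
    return [sum(w * v for w, v in zip(row, x)) for row in W1]

def sum_selected_columns(checked_in, W1):
    """Sum the columns of W1 indexed by the user's check-in set."""
    return [sum(row[j] for j in checked_in) for row in W1]

# Toy weights: 2 hidden units, 4 POIs (made-up values).
W1 = [[0.1, 0.2, 0.3, 0.4],
      [0.5, 0.6, 0.7, 0.8]]
x = [1, 0, 1, 0]          # the user visited POIs 0 and 2
checked_in = [0, 2]

via_product = encode_multi_hot(x, W1)
via_slicing = sum_selected_columns(checked_in, W1)
```

Both routes weight every visited POI by exactly 1, which is the behaviour the attention mechanism is meant to relax.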

However, in the user check-in history, there should be some POIs more representative than others that can directly reflect a user’s preferences. These representative POIs should contribute more to the user hidden representation to express the user preference. This inspires us to propose a self-attentive mechanism, which learns a weighted sum of embeddings in to form a user’s hidden representation.

The goal of the self-attentive encoder is to adaptively assign different importances on checked-in POIs for expressing various users’ preference levels. Then the embeddings of checked-in POIs are aggregated in a weighted manner to characterize users. Given checked-in POI embeddings of user , we use a single-layer network without bias to compute the importance score (attention score):


where is the parameter in the attention layer, the ensures all the computed weights sum up to 1. Then we sum up the embeddings in according to the importance score provided by to get a vector representation of the user:


However, the standard attention mechanism, which assigns a single importance value to a POI, makes the model focus on only one specific aspect of POIs (Lin et al., 2017), which is not sufficient to reflect the sophisticated human sentiment on POIs. Take a restaurant as an example: from the perspective of food flavor, a user may like the restaurant, while from the perspective of dining environment, the same user may think the restaurant is not good enough. Thus, to capture the user preference from different aspects, we may need to apply Eq. 8 multiple times with different sets of parameters.

Therefore, we adopt an importance score matrix to capture the effects of multi-dimensional attention (Vaswani et al., 2017) on POIs. Each dimension of the importance scores represents the importance levels of checked-in POIs in a certain aspect. Suppose we want aspects of attention to be extracted from the embeddings; then we can extend to the following form (we also tried using a two-layer neural network to compute the importance score matrix, which achieves performance similar to the single-layer one):


where is the importance score matrix, each column of is the importance vector of a specific POI, and each row of depicts the importance levels of checked-in POIs in a certain aspect. The is performed along the second dimension of its input. By multiplying the importance score matrix with the POI embeddings, we have:


where is the matrix representation of user , which depicts the user from aspects. To make the matrix representation of users fit our encoder, we have one more neural layer to aggregate users’ representations from different aspects into one aspect. Then the vector representation of user is shown:


where is the parameter in the aggregation layer.
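The encoder steps described in this subsection can be sketched as follows. This is an illustrative sketch only: the parameter names (`W_a`, `w_t`, `b_t`), the `tanh` non-linearities and the exact form of the aggregation layer are assumptions, since the text only states that a single-layer network without bias produces the scores and that one further layer aggregates the aspects.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attentive_encode(E, W_a, w_t, b_t=0.0):
    """Multi-dimensional attention over a user's checked-in POI embeddings.

    E:   n x d list of embeddings, one row per checked-in POI.
    W_a: d_a x d attention parameters, one row per aspect (assumed name).
    w_t: length-d_a aggregation weights (assumed name).
    Returns (A, Z, z): importance matrix, per-aspect user matrix, user vector.
    """
    d = len(E[0])
    T = [[math.tanh(v) for v in e] for e in E]
    # A is d_a x n: each row is a softmax over the n POIs for one aspect,
    # so each column is the importance vector of one POI.
    A = [softmax([sum(w * t for w, t in zip(w_row, t_row)) for t_row in T])
         for w_row in W_a]
    # Z is d_a x d: each row is a weighted sum of POI embeddings.
    Z = [[sum(a * e[c] for a, e in zip(a_row, E)) for c in range(d)]
         for a_row in A]
    # Aggregate the d_a aspect rows into a single d-dimensional vector.
    z = [math.tanh(sum(w * Z[r][c] for r, w in enumerate(w_t)) + b_t)
         for c in range(d)]
    return A, Z, z

# Toy user with 3 checked-in POIs, 2-d embeddings, 2 aspects (made-up values).
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[0.0, 0.0], [2.0, -2.0]]
w_t = [0.5, 0.5]

A, Z, z = self_attentive_encode(E, W_a, w_t)
```

With the all-zero first row of `W_a`, the first aspect falls back to uniform weights (a plain average of embeddings), while the second aspect favours POI 0 over POI 1.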

4.3. Neighbor-Aware Decoder

In LBSNs, there is physical distance between users and POIs, which is a unique property distinguishing POI recommendation from other recommendation tasks. In a user’s check-in history, the user’s occurrences are typically constrained in several certain areas. This is the well-known geographical clustering phenomenon (a.k.a geographical influence) in users’ check-in activities, which has been exploited to largely improve the POI recommendation performance (Ye et al., 2011; Cheng et al., 2012; Liu et al., 2014; Lian et al., 2014; Li et al., 2015, 2016). Different from most of the previous studies that mainly exploit geographical influence from a user’s perspective: learning the geographical distribution of each individual user’s check-ins (Ye et al., 2011; Cheng et al., 2012; Li et al., 2016) or modeling the user preference on a POI from both this POI and its neighbors (Lian et al., 2014; Li et al., 2015), the proposed neighbor-aware influence model captures the geographical influence solely from the perspective of POIs.

According to the aforementioned geographical influence, one intuition explains this phenomenon: users prefer to check in at POIs surrounding a POI that they visited before. From this intuition, a checked-in POI may have impacts on other unvisited POIs, and the impact level is determined by the properties and distance of the two POIs. Inspired by the skip-gram model of word2vec, which applies the inner product to predict the context words given an input word, we leverage similar techniques to model the influence that a checked-in POI exerts on unvisited POIs (section 3.2, relations to word2vec). The proposed technique can discover unvisited POIs that may be similar and close to the visited POIs. Similarly, we treat as the POI embedding matrix (the first weight matrix in word2vec) and as the context POI embedding matrix (the second weight matrix in word2vec). Moreover, the proposed method is also similar to FISM (Kabbur et al., 2013), which adopts two matrices of item latent factors to model the similarity between items.

Formally, given a user’s check-in set , the influence checked-in POIs exerted on unvisited POIs is shown:


where . Each column of is the influence a certain checked-in POI applied on all other POIs (the influence on itself is set to 0).

The above inner product gives a basic indication of how related two POIs are. However, it does not explicitly take the distance between two POIs into consideration. According to Tobler’s First Law of Geography, everything is related to everything else, but near things are more related than distant things. To incorporate the geographical distance property, we adopt the Gaussian radial basis function kernel (RBF kernel) to make checked-in POIs exert more influence on nearby unvisited POIs. The RBF kernel is shown as follows:


where and are the geographical coordinates of two POIs and . is a hyper-parameter to control the geographical correlation level of two given POIs; a larger value of will lead to a larger . The value range of RBF kernel is . For computational simplicity, if the value of is less than 0.1, we set it to 0. We can pre-compute the pair-wise RBF value of each POI pair to get an RBF value matrix .
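A sketch of the pre-computation step, under stated assumptions: the kernel is taken in the common Gaussian form exp(-gamma * squared distance), the distance is plain Euclidean distance on (latitude, longitude) pairs, and `gamma`, `cutoff` and the coordinates are hypothetical names and values. A great-circle distance could be substituted without changing the structure.

```python
import math

def rbf(p, q, gamma, cutoff=0.1):
    """Gaussian RBF kernel between two coordinate pairs.

    Values below `cutoff` are set to 0, following the 0.1 threshold
    mentioned in the text.
    """
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    k = math.exp(-gamma * d2)
    return k if k >= cutoff else 0.0

def rbf_matrix(coords, gamma, cutoff=0.1):
    """Pair-wise RBF values for all POIs (dense N x N list of lists)."""
    n = len(coords)
    return [[rbf(coords[i], coords[j], gamma, cutoff) for j in range(n)]
            for i in range(n)]

# Two nearby POIs and one distant POI (illustrative coordinates).
coords = [(45.50, -73.57), (45.51, -73.58), (40.71, -74.01)]
K = rbf_matrix(coords, gamma=100.0)
```

Because most POI pairs are far apart, thresholding makes the matrix sparse in practice, so a sparse representation would be the natural choice for real datasets; the dense list here is only for readability.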

By incorporating the RBF kernel, our neighbor-aware influence model is shown:


where is the RBF kernel value from Eq. 14, is the element-wise multiplication.

To obtain the accumulated influence from all checked-in POIs, we sum along the row of to get :


where and are the row and column index, respectively.

To incorporate the neighbor-aware influence, the decoder of the proposed model can be rewritten as:


where captures the user preference, models the neighbor-aware geographical influence.
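The neighbor-aware term can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's code: `poi_emb` and `ctx_emb` stand for the two embedding matrices, `K` for the pre-computed RBF matrix, and the decoder is assumed to add the accumulated influence to the preference score before a sigmoid.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def neighbor_influence(checked_in, poi_emb, ctx_emb, K):
    """Accumulated influence of a user's checked-in POIs on every POI.

    For each checked-in POI i and each other POI j, the influence is the
    inner product of their embeddings scaled by the RBF value K[i][j];
    the influence of a POI on itself is 0.
    """
    n = len(ctx_emb)
    p = [0.0] * n
    for j in range(n):
        for i in checked_in:
            if i != j:
                p[j] += K[i][j] * dot(poi_emb[i], ctx_emb[j])
    return p

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(pref_scores, influence, bias):
    """Assumed decoder form: preference + neighbor influence + bias."""
    return [sigmoid(s + p + b) for s, p, b in zip(pref_scores, influence, bias)]

# Toy setup with 3 POIs (made-up values); the user has visited POI 0 only.
poi_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx_emb = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]

p = neighbor_influence([0], poi_emb, ctx_emb, K)
scores = decode([0.0, 0.0, 0.0], p, [0.0, 0.0, 0.0])
```

POI 1 is both near POI 0 (RBF value 0.5) and similar in embedding space, so it is the only POI whose score is raised; POI 2 is too far away to receive any influence.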

Discussion. As we mentioned before, the way we adopt the inner product to capture the relations between POIs is similar to FISM (Kabbur et al., 2013), if we treat as and as in FISM. In FISM, the predicted rating of user on item is mainly estimated by , where is the set of items rated by user , and are learned item latent factors from and , respectively.

4.4. Weighted Loss for Implicit Feedback

In POI recommendation, check-in data is treated as implicit feedback. A user’s check-in records only include the locations she visited, and the visit frequency indicates the confidence level of her preference. Therefore, there are only positive examples observed in the check-in records, which makes POI recommendation a One-Class Collaborative Filtering (OCCF) problem (Pan et al., 2008; Hu et al., 2008).

To tackle the OCCF problem and capture user preferences from check-in data, we adopt a general weighting scheme (Hu et al., 2008) to distinguish visited and unvisited POIs. Specifically, we consider all unvisited locations as negative examples, and assign the weights of all negative examples to the same value, e.g., 1. As for visited locations, the weights are increased monotonically with users’ check-in frequencies. With such a weighting scheme, our model not only distinguishes visited and unvisited POIs, but also discriminates the confidence levels of all visited POIs. The objective function for implicit feedback is presented as follows,


where is the element-wise multiplication of matrices. is the Frobenius norm of matrices. In particular, we set the confidence matrix as follows:


where and are hyper-parameters. This setting exactly encodes the observation that the frequency is a confidence of user preferences. This weighted loss with a vanilla autoencoder can be used in other recommendation tasks that take implicit feedback as input.
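A sketch of the weighting scheme for a single user. The confidence function below assumes the logarithmic form proposed by Hu et al. (2008), 1 + alpha * log(1 + r / eps), which has two hyper-parameters and gives weight 1 to unvisited POIs; the parameter names and values are illustrative, and regularization is omitted.

```python
import math

def confidence(freq, alpha, eps):
    """Confidence weight for a check-in frequency (assumed log form).

    Unvisited POIs (freq = 0) get weight 1; the weight grows
    monotonically with the visit frequency.
    """
    return 1.0 + alpha * math.log(1.0 + freq / eps)

def weighted_loss(x, x_hat, freqs, alpha, eps):
    """Confidence-weighted squared reconstruction error for one user."""
    return sum(confidence(f, alpha, eps) * (a - b) ** 2
               for a, b, f in zip(x, x_hat, freqs))

# One user over 3 POIs: never visited, visited once, visited five times.
freqs = [0, 1, 5]
x = [0.0, 1.0, 1.0]          # binary rating vector
x_hat = [0.5, 0.5, 0.5]      # a made-up reconstruction
alpha, eps = 2.0, 10.0

weights = [confidence(f, alpha, eps) for f in freqs]
loss = weighted_loss(x, x_hat, freqs, alpha, eps)
```

The same reconstruction error of 0.25 is penalized more heavily on the frequently visited POI than on the unvisited one.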

4.5. Network Training

By combining regularization terms, the objective function of the proposed model is shown as follows:


where is the regularization parameter, includes , , and . and are the learned parameters in the attention layer and aggregation layer, respectively. To minimize the objective function, the partial derivatives with respect to all the parameters are computed by back-propagation and the parameters are updated by gradient descent. We apply Adam (Kingma and Ba, 2014) to automatically adapt the learning rate during the learning procedure. The mini-batch training algorithm is shown in Alg. 1.

Input: , ;
Initialize parameters , , , , ;
numBatches = ;
while iter < numIterations do
      Shuffle() ;
      for batchID = 0; batchID < numBatches; batchID++ do
            = ExtractBatchData(batchID, ) ;
            Apply Eq. 7 to get for each user in ;
            Apply Eq. 10, Eq. 11 and Eq. 12 to get ;
            Apply Eq. 15 and Eq. 16 to get ;
            Apply Eq. 6 and Eq. 17 to get ;
            Apply Eq. 20 to obtain and back-propagate the error through the entire network ;
      end for
end while
Algorithm 1 Training Algorithm

Recommendation. At prediction time, the proposed model takes each user’s binary rating vector as input and obtains the reconstructed rating vector as output. Then the POIs that are not in the training set and have the largest prediction scores in are recommended to the user.
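The selection step can be sketched as follows (the function name `recommend` and the example scores are hypothetical): POIs already present in the user's training set are excluded, and the remaining POIs are ranked by their reconstructed scores.

```python
def recommend(scores, train_pois, k):
    """Return the indices of the top-k POIs not seen in training.

    scores:     reconstructed rating vector for one user.
    train_pois: set of POI indices the user visited in the training set.
    """
    candidates = [j for j in range(len(scores)) if j not in train_pois]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return candidates[:k]

# POI 0 has the highest score but was already visited, so it is skipped.
top2 = recommend([0.9, 0.2, 0.8, 0.5], train_pois={0}, k=2)
```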

5. Experiments

In this section, we evaluate the proposed model against state-of-the-art methods on three real-world datasets.

5.1. Datasets

We evaluate the proposed model on three real-world datasets: Gowalla (Cho et al., 2011), Foursquare (Liu et al., 2017) and Yelp (Liu et al., 2017). The Gowalla dataset was generated worldwide from February 2009 to October 2010. The Foursquare dataset comprises check-ins from April 2012 to September 2013 within the United States (except Alaska and Hawaii). The Yelp dataset is obtained from the Yelp dataset challenge round 7. Each check-in record in above datasets includes a timestamp, a user ID, a POI ID, and the latitude and longitude of this POI.

To filter noisy data, for the Gowalla dataset, we remove users with fewer than 20 check-ins in total and POIs visited fewer than 20 times; for the Foursquare and Yelp datasets, we eliminate users with fewer than 10 checked-in POIs, as well as POIs with fewer than 10 visitors. The data statistics after preprocessing are shown in Table 2. For each user, we randomly select 20% of her visited locations as ground truth for testing; the remaining locations constitute the training set. Similar data partition methods have been widely used in previous work (Lian et al., 2014; Gao et al., 2015; Ye et al., 2011) to validate the performance of POI recommendation. The random selection is carried out six times independently; we tune the model on one partition and report the average results over the remaining five partitions.
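The preprocessing described above can be sketched in a few lines. The sketch makes several assumptions that the text does not settle: filtering is done in a single pass over distinct (user, POI) pairs (the Foursquare/Yelp style rule, not the check-in-count rule used for Gowalla), the test size is rounded down, and the names `filter_sparse` and `split_per_user` are hypothetical.

```python
import random
from collections import defaultdict


def filter_sparse(pairs, min_pois_per_user, min_users_per_poi):
    """Keep (user, poi) pairs whose user and POI both meet the thresholds (single pass)."""
    pois_of, users_of = defaultdict(set), defaultdict(set)
    for user, poi in pairs:
        pois_of[user].add(poi)
        users_of[poi].add(user)
    return [
        (user, poi)
        for user, poi in sorted(set(pairs))
        if len(pois_of[user]) >= min_pois_per_user
        and len(users_of[poi]) >= min_users_per_poi
    ]


def split_per_user(pairs, test_ratio=0.2, seed=0):
    """Randomly hold out a fraction of each user's visited POIs for testing."""
    rng = random.Random(seed)
    pois_of = defaultdict(list)
    for user, poi in sorted(set(pairs)):
        pois_of[user].append(poi)
    train, test = {}, {}
    for user, pois in pois_of.items():
        rng.shuffle(pois)
        n_test = int(len(pois) * test_ratio)  # assumption: round down
        test[user] = set(pois[:n_test])
        train[user] = set(pois[n_test:])
    return train, test


pairs = [("u1", p) for p in range(10)] + [("u2", p) for p in range(5)]
kept = filter_sparse(pairs, min_pois_per_user=6, min_users_per_poi=1)
train, test = split_per_user(pairs, test_ratio=0.2, seed=0)
# Six independent partitions, one for tuning and five for reporting.
partitions = [split_per_user(pairs, seed=s) for s in range(6)]
```

Using a different seed per partition keeps the six random selections independent while remaining reproducible.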

Dataset      #Users   #POIs    #Check-ins   Density
Gowalla      43,074   46,234   1,720,082    0.0500%
Foursquare   24,941   28,593   1,196,248    0.1006%
Yelp         30,887   18,995     860,888    0.1399%
Table 2. The statistics of datasets.

5.2. Evaluation Metrics

We evaluate our model against other models in terms of Precision@k, Recall@k and MAP@k. For each user, Precision@k indicates what percentage of the top-k recommended POIs have been visited by her, while Recall@k indicates what percentage of her visited locations appear in the top-k recommended POIs. MAP@k is the mean average precision at k, where average precision is the average of precision values at all ranks where relevant POIs are found. They are formally defined as follows,


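$$\text{Precision@}k = \frac{1}{|U|}\sum_{u \in U}\frac{|S_u(k) \cap V_u|}{k}, \qquad \text{Recall@}k = \frac{1}{|U|}\sum_{u \in U}\frac{|S_u(k) \cap V_u|}{|V_u|},$$

$$\text{MAP@}k = \frac{1}{|U|}\sum_{u \in U}\frac{\sum_{n=1}^{k} p(n) \times rel(n)}{\min\{k, |V_u|\}},$$

where $U$ is the set of users, $S_u(k)$ is the set of top-$k$ unvisited locations recommended to user $u$, excluding the locations in the training set, and $V_u$ is the set of locations visited by user $u$ in the test set. $p(n)$ is the precision of the ranked list cut off at rank $n$, and $rel(n)$ is an indicator function that equals 1 if the location at rank $n$ is visited in the test set, and 0 otherwise.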
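For a single user, the three metrics can be computed as in the sketch below. It assumes that average precision is normalized by min(k, number of test locations), which is one common convention and is labeled as an assumption here; the function names and the toy ranking are illustrative.

```python
def precision_recall_ap_at_k(ranked, relevant, k):
    """Return (Precision@k, Recall@k, AP@k) for one user's ranked list."""
    hits = 0
    precision_sum = 0.0
    for n, poi in enumerate(ranked[:k], start=1):
        if poi in relevant:            # indicator: location at rank n was visited in testing
            hits += 1
            precision_sum += hits / n  # precision of the list cut off at rank n
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    # Assumption: AP is normalized by min(k, number of test locations).
    norm = min(k, len(relevant))
    average_precision = precision_sum / norm if norm else 0.0
    return precision, recall, average_precision


def mean_metrics_at_k(rankings, ground_truth, k):
    """Average the per-user metrics over all users present in ground_truth."""
    per_user = [
        precision_recall_ap_at_k(rankings[u], ground_truth[u], k) for u in ground_truth
    ]
    count = len(per_user)
    return tuple(sum(values) / count for values in zip(*per_user))


ranked = [3, 7, 1, 9, 4]  # hypothetical top-5 recommendation list
relevant = {7, 9, 5}      # hypothetical locations visited in the test set
p, r, ap = precision_recall_ap_at_k(ranked, relevant, k=5)
```

In this toy case two of the five recommended POIs are relevant (at ranks 2 and 4), so precision is 2/5, recall is 2/3, and the average precision is (1/2 + 2/4) / 3.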

5.3. Methods Studied

To demonstrate the effectiveness of our model, we compare it with the following POI recommendation methods.

Traditional MF methods for implicit feedback (the implementations are from LibRec: https://www.librec.net/):

  • WRMF, weighted regularized matrix factorization (Hu et al., 2008), which minimizes the squared error loss of a matrix factorization model while assigning different confidence values to observed and unobserved check-ins.

  • BPRMF, Bayesian personalized ranking (Rendle et al., 2009), which optimizes the ordering of the preferences for the observed locations and the unobserved locations.

Classical POI recommendation methods (in a recent study (Liu et al., 2017) that evaluated a number of POI recommendation methods, RankGeoFM and IRENMF achieve the best results on three datasets):

  • MGMMF, a multi-center Gaussian model fused with matrix factorization (Cheng et al., 2012), which learns regions of activities for each user using multiple Gaussian distributions.

  • IRENMF, instance-region neighborhood matrix factorization (Liu et al., 2014), which incorporates instance-level and region-level geographical influence into weighted matrix factorization.

  • RankGeoFM, ranking-based geographical factorization (Li et al., 2015), which is a ranking-based matrix factorization model that learns users’ preference rankings for POIs and also includes the geographical influence of neighboring POIs.

Deep learning-based methods:

  • PACE, preference and context embedding (Yang et al., 2017), a deep neural architecture that jointly learns the embeddings of users and POIs to predict both user preferences over POIs and the various contexts associated with users and POIs.

  • DeepAE, a three-hidden-layer autoencoder with a weighted loss function (section 4.4).

The proposed method:

  • SAE-NAD, the proposed model with self-attentive encoder (section 4.2) and neighbor-aware decoder (section 4.3) for implicit feedback (section 4.4).

Figure 2. The comparison of performance on Gowalla: (a) Precision@k, (b) Recall@k, (c) MAP@k.
Figure 3. The comparison of performance on Foursquare: (a) Precision@k, (b) Recall@k, (c) MAP@k.
Figure 4. The comparison of performance on Yelp: (a) Precision@k, (b) Recall@k, (c) MAP@k.

5.4. Parameter Settings

In the experiments, the latent dimension of all the models is set to 50. The dimension of the importance vector and the geographical correlation level are selected by grid search and set to 20 and 60, respectively. The two parameters of the weighting scheme are set to 2.0 and 1e-5, respectively. The learning rate and the regularization parameter are both set to 0.001. - are set as the function, is set to the function. The batch size is set to 256. On the Gowalla dataset, we set the network architecture as ; otherwise, the network architecture is set as . In addition, dropout with probability 0.5 is applied to all layers except the first and the last. Our model is implemented with PyTorch (https://pytorch.org/) and runs on machines with Nvidia GeForce GTX 1080 Ti GPUs. Code is available at https://github.com/allenjack/SAE-NAD.

For the baseline methods, the following parameter settings achieve relatively good performance. DeepAE adopts the same network architecture and weighted loss function as the proposed model. PACE uses the same network architecture (except for the hidden dimension) and parameters as the original paper. For RankGeoFM, the number of nearest neighbors is set to 300, the regularization radius is set to 1.0, the regularization balance is set to 0.2, and the ranking margin is set to 0.3 on all datasets. As for IRENMF, , and are set to 0.015, 0.015 and 1, respectively; the instance weighting parameter is set to 0.6; as a preprocessing step, the model uses the k-means algorithm to cluster locations into 100 groups, and the number of nearest neighbors for each location is set to 10. For MGMMF, the and of the Poisson Factor Model are set to 20 and 0.2, respectively; , and the distance threshold of the Multi-center Gaussian Model are set to 0.2, 0.02 and 15, respectively. WRMF adopts the same weighting scheme as the proposed model.

5.5. Performance Comparison

The performance comparison of our model and the baseline models is shown in Figures 2, 3 and 4.

Observations about our model. First, our proposed model, SAE-NAD, achieves the best performance on all three datasets under all evaluation metrics, which illustrates the superiority of our model. Second, SAE-NAD outperforms PACE. One possible reason is that PACE models the geographical influence through a context graph, which does not explicitly model user reachability to unvisited POIs. In contrast, SAE-NAD directly captures the geographical influence between checked-in POIs and unvisited POIs through the neighbor-aware decoder. Third, SAE-NAD achieves better results than DeepAE. The major reason is that DeepAE only applies a multi-layer perceptron to model the check-in data, without considering other context information in the check-in records. Fourth, SAE-NAD outperforms RankGeoFM and IRENMF. Although these two methods effectively incorporate geographical influence into a ranking model and an MF model, respectively, they still apply the inner product to predict users’ preferences on POIs, which cannot sufficiently capture the non-linear interactions between users and POIs. SAE-NAD, on the other hand, adopts a deep neural structure with non-linear activation functions to model the complex interactions in the user check-in data. Fifth, although MGMMF models the geographical influence effectively, it is not good at capturing user preferences from implicit feedback. SAE-NAD, by contrast, encodes users’ check-in frequencies into the weighting scheme, which indicates the confidence of users’ preferences. Sixth, SAE-NAD outperforms BPRMF, because BPRMF only learns the pair-wise ranking of locations based on user preferences and does not incorporate context information such as the spatial information of POIs. SAE-NAD, on the contrary, integrates the geographical influence to further improve the performance. Besides, unlike existing methods that do not deeply explore the implicitness of users’ preferences on checked-in POIs, SAE-NAD assigns an importance vector to each checked-in POI to characterize the user preference in multiple aspects.

Other observations. First, PACE outperforms all other baseline methods, since its neural embedding part models the user-POI interactions through the implicit feedback data, while its context graph incorporates context knowledge from the unlabeled data. Second, RankGeoFM and IRENMF both perform relatively well, which confirms the results reported in (Liu et al., 2017). Third, although DeepAE applies a deep neural structure with a weighted loss for implicit feedback, it still does not achieve better results than RankGeoFM and IRENMF. The reason is that DeepAE does not use the geographical information that is distinctive of POI recommendation. However, DeepAE performs better than WRMF and BPRMF, which may confirm that a deep network structure with non-linear activation functions can capture more sophisticated relations. Fourth, both WRMF and BPRMF are superior to MGMMF. One possible reason is that MGMMF is based on the probabilistic factor model, which models user check-in frequencies directly instead of modeling user preferences on POIs. In contrast, WRMF and BPRMF are designed for implicit feedback: WRMF not only considers the observed check-ins but also gives a small confidence to all unvisited locations, while BPRMF leverages location pairs as training data and optimizes for correctly ranking these pairs.

Dataset      Method    P@10      R@10      MAP@10
Gowalla      WAE       0.05599   0.13819   0.06728
             SAE-WAE   0.06039   0.14808   0.07257
             NAD-WAE   0.07029   0.17915   0.08699
Foursquare   WAE       0.05961   0.11134   0.05632
             SAE-WAE   0.06346   0.11813   0.06054
             NAD-WAE   0.06598   0.12546   0.06333
Yelp         WAE       0.03764   0.07386   0.03198
             SAE-WAE   0.03951   0.07586   0.03307
             NAD-WAE   0.04115   0.08016   0.03402
Table 3. The performance of the self-attentive encoder and neighbor-aware decoder on Gowalla, Foursquare, and Yelp.

5.6. Impacts of Self-Attentive Encoder and Neighbor-Aware Decoder

The self-attentive encoder and neighbor-aware decoder are two important components of the proposed model. To verify the contribution of each component, we evaluate each one separately within the weighted stacked autoencoder (section 4.4). Here, we denote the stacked autoencoder with the weighted loss as WAE (equivalent to DeepAE), the self-attentive encoder (SAE) with WAE as SAE-WAE, and the neighbor-aware decoder (NAD) with WAE as NAD-WAE. The performance is shown in Table 3.

The results in Table 3 exhibit the effectiveness of each individual component of the proposed model. There are several observations: (1) The autoencoder with the weighted loss (WAE) achieves a reasonably good result, which is even better than some baseline methods that incorporate the geographical influence. This illustrates that the frequency of the implicit feedback is a significant signal of user preferences. (2) By adopting the self-attention mechanism, SAE-WAE outperforms WAE on all three datasets. The reason is that the self-attentive encoder attends to the POIs that are more representative of user preferences, leading to more personalized and effective user hidden representations. (3) NAD-WAE achieves better performance than SAE-WAE and WAE on all three datasets. The reason is that NAD-WAE captures the correlations between checked-in POIs and unvisited POIs, and applies these effects to the last layer of the decoder, which directly determines the model output. The results further confirm that modeling geographical influence is essential for POI recommendation.
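To make the size of these gaps concrete, the relative improvement of each variant over WAE can be computed directly from the numbers in Table 3. The snippet below is only a worked-arithmetic aid; the dictionary layout and the helper names `relative_gain` and `gains_over_wae` are illustrative.

```python
METRICS = ("P@10", "R@10", "MAP@10")

# Values copied from Table 3.
TABLE3 = {
    "Gowalla": {
        "WAE": (0.05599, 0.13819, 0.06728),
        "SAE-WAE": (0.06039, 0.14808, 0.07257),
        "NAD-WAE": (0.07029, 0.17915, 0.08699),
    },
    "Foursquare": {
        "WAE": (0.05961, 0.11134, 0.05632),
        "SAE-WAE": (0.06346, 0.11813, 0.06054),
        "NAD-WAE": (0.06598, 0.12546, 0.06333),
    },
    "Yelp": {
        "WAE": (0.03764, 0.07386, 0.03198),
        "SAE-WAE": (0.03951, 0.07586, 0.03307),
        "NAD-WAE": (0.04115, 0.08016, 0.03402),
    },
}


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def gains_over_wae(dataset):
    """Per-metric relative gains of each variant over the WAE baseline."""
    base = TABLE3[dataset]["WAE"]
    return {
        variant: {m: relative_gain(v, b) for m, v, b in zip(METRICS, scores, base)}
        for variant, scores in TABLE3[dataset].items()
        if variant != "WAE"
    }


for dataset in TABLE3:
    for variant, gains in gains_over_wae(dataset).items():
        print(dataset, variant, {m: f"{g:.1f}%" for m, g in gains.items()})
```

On Gowalla, for example, the P@10 gain of SAE-WAE over WAE is roughly 8%, while that of NAD-WAE is roughly 26%.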

5.7. Sensitivity of Parameters

Figure 5. The effect of the number of attention aspects: (a) on Gowalla, (b) on Foursquare.
Figure 6. The effect of the geographical correlation level: (a) on Gowalla, (b) on Foursquare.

In the proposed model, two hyper-parameters are critical for performance improvement: the number of attention aspects in the self-attentive encoder (section 4.2) and the geographical correlation level of POIs in the neighbor-aware decoder (section 4.3). The effects of these two parameters are shown in Figures 5 and 6. Due to the space limit, we only present the effects on the Gowalla and Foursquare datasets; the parameter effects on the Yelp dataset show similar trends.

The effect of varying the number of attention aspects is shown in Figure 5. We can observe that a single importance value from the attention layer is not sufficient to express the complex human sentiment on checked-in POIs. By assigning an importance vector to each checked-in POI, the user preference on the visited POIs can be captured from different aspects. As the number of aspects increases, the model performance improves substantially and then becomes steady.
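The idea of giving each checked-in POI an importance vector, one weight per aspect, can be illustrated with a multi-aspect attention layer in the style of the structured self-attention of Lin et al. (2017). This is a sketch under assumptions: the exact form of the attention layer in section 4.2 is not reproduced here, and the matrix names `w_t` and `w_a`, the toy dimensions, and the random initialization are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_aspect_attention(poi_emb, w_t, w_a):
    """poi_emb: n x h, w_t: d x h, w_a: d_a x d. Returns (attention, aspects)."""
    emb_t = [list(col) for col in zip(*poi_emb)]                          # h x n
    hidden = [[math.tanh(v) for v in row] for row in matmul(w_t, emb_t)]  # d x n
    # One softmax per aspect, taken over the user's n checked-in POIs.
    attention = [softmax(row) for row in matmul(w_a, hidden)]             # d_a x n
    aspects = matmul(attention, poi_emb)                                  # d_a x h
    return attention, aspects


rng = random.Random(0)
n, h, d, d_a = 4, 3, 5, 2  # toy sizes: 4 checked-in POIs, 2 attention aspects


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


poi_emb, w_t, w_a = rand_matrix(n, h), rand_matrix(d, h), rand_matrix(d_a, d)
attention, aspects = multi_aspect_attention(poi_emb, w_t, w_a)
```

Column j of `attention` is the importance vector of the j-th checked-in POI; with a single aspect it collapses to one scalar weight per POI, which is the setting the paragraph above reports as insufficient.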

The effect of varying the geographical correlation level is shown in Figure 6. From the figure, we can observe that the results are poor when the model does not consider the distance between POIs. This also testifies to the significance of geographical influence in POI recommendation. A larger value of the geographical correlation level strengthens the correlation between two given POIs, so that the neighbors of checked-in POIs make a bigger difference in the inference of users’ preferences.
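How an RBF kernel turns the distance between two POIs into a correlation weight can be sketched as follows. The sketch assumes the common kernel form exp(-gamma * d^2), great-circle distance in kilometres, and the names `haversine_km`, `rbf_weight` and `gamma`; the exact kernel, distance unit, and parameterization of the neighbor-aware decoder (section 4.3) may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rbf_weight(distance_km, gamma):
    """Assumed kernel form: exp(-gamma * d^2)."""
    return math.exp(-gamma * distance_km ** 2)


near = haversine_km(45.5048, -73.5772, 45.5088, -73.5617)  # two nearby points
far = haversine_km(45.5048, -73.5772, 43.6532, -79.3832)   # two distant points

# With gamma = 0 the kernel ignores distance entirely: every pair gets weight 1.
flat = (rbf_weight(near, 0.0), rbf_weight(far, 0.0))
# With gamma > 0 nearby POIs receive a larger weight than distant ones.
shaped = (rbf_weight(near, 0.5), rbf_weight(far, 0.5))
```

Under this assumed form, a zero parameter makes all POI pairs equally correlated, which corresponds to a model that does not consider distance at all.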

6. Conclusion

In this paper, we proposed an autoencoder-based model for POI recommendation, which consists of a self-attentive encoder and a neighbor-aware decoder. In particular, the self-attentive encoder was used to adaptively discriminate the degree of user preference on each checked-in POI by assigning an importance score vector. The neighbor-aware decoder was adopted to model the geographical influence that checked-in POIs exert on unvisited POIs, which differentiates the user reachability on unvisited POIs. Experimental results on three real-world datasets clearly validated the improvements of our model over many state-of-the-art baseline methods.


  • Bao et al. (2015) Jie Bao, Yu Zheng, David Wilkie, and Mohamed F. Mokbel. 2015. Recommendations in location-based social networks: a survey. GeoInformatica 19, 3 (2015), 525–565.
  • Bobadilla et al. (2013) Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. 2013. Recommender systems survey. Knowl.-Based Syst. 46 (2013), 109–132.
  • Chen et al. (2017) Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR. ACM, 335–344.
  • Cheng et al. (2012) Chen Cheng, Haiqin Yang, Irwin King, and Michael R. Lyu. 2012. Fused Matrix Factorization with Geographical and Social Influence in Location-Based Social Networks. In AAAI. AAAI Press.
  • Cheng et al. (2013) Chen Cheng, Haiqin Yang, Michael R. Lyu, and Irwin King. 2013. Where You Like to Go Next: Successive Point-of-Interest Recommendation. In IJCAI. IJCAI/AAAI, 2605–2611.
  • Cho et al. (2011) Eunjoon Cho, Seth A. Myers, and Jure Leskovec. 2011. Friendship and mobility: user movement in location-based social networks. In KDD. ACM, 1082–1090.
  • Gao et al. (2015) Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. 2015. Content-Aware Point of Interest Recommendation on Location-Based Social Networks. In AAAI. AAAI Press, 1721–1727.
  • Gong and Zhang (2016) Yuyun Gong and Qi Zhang. 2016. Hashtag Recommendation Using Attention-Based Convolutional Neural Network. In IJCAI. IJCAI/AAAI Press, 2782–2788.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. ACM, 173–182.
  • Hu et al. (2008) Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM. IEEE Computer Society, 263–272.
  • Kabbur et al. (2013) Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: factored item similarity models for top-N recommender systems. In KDD. ACM, 659–667.
  • Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014).
  • Koren et al. (2009) Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
  • Li et al. (2016) Huayu Li, Yong Ge, Richang Hong, and Hengshu Zhu. 2016. Point-of-Interest Recommendations: Learning Potential Check-ins from Friends. In KDD. ACM, 975–984.
  • Li et al. (2015a) Huayu Li, Richang Hong, Shiai Zhu, and Yong Ge. 2015a. Point-of-Interest Recommender Systems: A Separate-Space Perspective. In ICDM. IEEE Computer Society, 231–240.
  • Li et al. (2015b) Sheng Li, Jaya Kawale, and Yun Fu. 2015b. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder. In CIKM. ACM, 811–820.
  • Li et al. (2015) Xutao Li, Gao Cong, Xiaoli Li, Tuan-Anh Nguyen Pham, and Shonali Krishnaswamy. 2015. Rank-GeoFM: A Ranking based Geographical Factorization Method for Point of Interest Recommendation. In SIGIR. ACM, 433–442.
  • Lian et al. (2014) Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen, and Yong Rui. 2014. GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation. In KDD. ACM, 831–840.
  • Lin et al. (2017) Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A Structured Self-attentive Sentence Embedding. CoRR abs/1703.03130 (2017).
  • Liu et al. (2013) Bin Liu, Yanjie Fu, Zijun Yao, and Hui Xiong. 2013. Learning geographical preferences for point-of-interest recommendation. In KDD. ACM, 1043–1051.
  • Liu et al. (2017) Yiding Liu, Tuan-Anh Pham, Gao Cong, and Quan Yuan. 2017. An Experimental Evaluation of Point-of-interest Recommendation in Location-based Social Networks. PVLDB 10, 10 (2017), 1010–1021.
  • Liu et al. (2014) Yong Liu, Wei Wei, Aixin Sun, and Chunyan Miao. 2014. Exploiting Geographical Neighborhood Characteristics for Location Recommendation. In CIKM. ACM, 739–748.
  • Luong et al. (2015) Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP. The Association for Computational Linguistics, 1412–1421.
  • Manotumruksa et al. (2017) Jarana Manotumruksa, Craig Macdonald, and Iadh Ounis. 2017. A Deep Recurrent Collaborative Filtering Framework for Venue Recommendation. In CIKM. ACM, 1429–1438.
  • Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS. 3111–3119.
  • Pan et al. (2008) Rong Pan, Yunhong Zhou, Bin Cao, Nathan Nan Liu, Rajan M. Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In ICDM. IEEE Computer Society, 502–511.
  • Pei et al. (2017) Wenjie Pei, Jie Yang, Zhu Sun, Jie Zhang, Alessandro Bozzon, and David M. J. Tax. 2017. Interacting Attention-gated Recurrent Networks for Recommendation. In CIKM. ACM, 1459–1468.
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461.
  • Seo et al. (2017) Sungyong Seo, Jing Huang, Hao Yang, and Yan Liu. 2017. Interpretable Convolutional Neural Networks with Dual Local and Global Attention for Review Rating Prediction. In RecSys. ACM, 297–305.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 6000–6010.
  • Wang et al. (2015) Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In KDD. ACM, 1235–1244.
  • Wang et al. (2017) Xuejian Wang, Lantao Yu, Kan Ren, Guanyu Tao, Weinan Zhang, Yong Yu, and Jun Wang. 2017. Dynamic Attention Deep Model for Article Recommendation by Learning Human Editors’ Demonstration. In KDD. ACM, 2051–2059.
  • Wu et al. (2016) Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In WSDM. ACM, 153–162.
  • Yang et al. (2017) Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, and Jiawei Han. 2017. Bridging Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI Recommendation. In KDD. ACM, 1245–1254.
  • Yang et al. (2016) Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and Eduard H. Hovy. 2016. Hierarchical Attention Networks for Document Classification. In HLT-NAACL. The Association for Computational Linguistics, 1480–1489.
  • Ye et al. (2010) Mao Ye, Peifeng Yin, and Wang-Chien Lee. 2010. Location recommendation for location-based social networks. In GIS. ACM, 458–461.
  • Ye et al. (2011) Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik Lun Lee. 2011. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR. ACM, 325–334.
  • Yin et al. (2015) Hongzhi Yin, Xiaofang Zhou, Yingxia Shao, Hao Wang, and Shazia Wasim Sadiq. 2015. Joint Modeling of User Check-in Behaviors for Point-of-Interest Recommendation. In CIKM. ACM, 1631–1640.
  • Yuan et al. (2014) Quan Yuan, Gao Cong, and Aixin Sun. 2014. Graph-based Point-of-interest Recommendation with Geographical and Temporal Influences. In CIKM. ACM, 659–668.
  • Zhang and Chow (2013) Jia-Dong Zhang and Chi-Yin Chow. 2013. iGSLR: personalized geo-social location recommendation: a kernel density estimation approach. In SIGSPATIAL/GIS. ACM, 324–333.
  • Zhao et al. (2016) Shenglin Zhao, Tong Zhao, Haiqin Yang, Michael R. Lyu, and Irwin King. 2016. STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation. In AAAI. AAAI Press, 315–322.