Neighborhood-Enhanced and Time-Aware Model for Session-based Recommendation

09/25/2019 ∙ by Yang Lv, et al.

Session-based recommendation has become one of the research hotspots in the field of recommender systems due to its high practical value. Previous deep learning methods mostly focus on the sequential characteristics within the current session and ignore collaborative information. SessionKNN is a strong baseline for session-based recommendation because it exploits the collaborative information in neighborhood sessions; however, SessionKNN neglects the sequential characteristics within the current session. To this end, we propose a novel neural network framework, namely the Neighborhood Enhanced and Time Aware Recommendation Machine (NETA), for session-based recommendation. First, we introduce an efficient neighborhood retrieval mechanism to find similar sessions that carry collaborative information. Then we design a time-aware guided attention mechanism to extract a collaborative representation from the neighborhood sessions; in particular, the temporal recency between sessions is considered separately. Finally, we design a simple co-attention mechanism to determine the importance of the complementary collaborative representation when predicting the next item. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of our proposed model.




1 Introduction

Recommender systems (RS) help online users cope with information overload in the era of big data and are widely used in various scenarios. Session-based recommendation (SRS) has become one of the research hotspots in the field of recommender systems due to its high practical value. In many situations, users' profiles or past interactions are not available to the recommender system, because some users are anonymous or first-time visitors, or because the online platform only tracks session identifiers [1]. Whereas general recommendation methods rely on users' profiles, session-based recommendation addresses this problem by modeling the limited interactions in the ongoing session [1][2].

Previous works have made great progress in session-based recommendation over the past few years. Early works are devoted to discovering item-to-item relations, such as transition and co-occurrence relations; Markov Chains (MC) [3][4] and ItemKNN [5][6] are typical examples. A Markov Chain assumes that the next action depends on the previous one and learns the transition patterns between items, while ItemKNN computes the similarity between items based on their co-occurrence frequency. However, these item-to-item models cannot learn complex dependencies within the current session. Recent studies instead use neural networks to model the whole action sequence. For instance, Hidasi et al. [1] apply recurrent neural networks (RNN) to session-based recommendation and treat the problem as time-series prediction. Li et al. [2] improve the RNN-based model with an attention mechanism that additionally captures the user's main purpose. Liu et al. [7] highlight the importance of short-term memory by employing an attentive network that models the last item separately.

It is noteworthy that the aforementioned deep learning methods focus only on sequence modeling and rely on the limited actions of the current session, while the rich collaborative-filtering information available in other sessions remains under-exploited. A well-known intuition is that similar sessions tend to click similar items. SessionKNN [8] compares the entire current session with past sessions and recommends items based on their occurrences in the k most similar past sessions (neighbors). However, SessionKNN ignores the user's sequential behavior and interest shift within the current session, which are essential because the order of the clicked items indicates the user's main intention. Recently, CSRM [9] was proposed to retrieve neighborhood sessions by applying an inner memory encoder (which models the current session) and an outer memory encoder (which remembers the representations of recent sessions and identifies neighbors among them). However, CSRM lacks an efficient retrieval capability and, because of its limited memory, cannot associate the current session with truly similar neighborhood sessions; for example, in the released implementation the memory network only stores the 10,000 most recent sessions.

To tackle the above problems, we propose a novel neural network framework, namely the Neighborhood Enhanced and Time Aware Recommendation Machine (NETA), for session-based recommendation. We approach SRS as a combination of a neighborhood model and a sequential model. Figure 1 illustrates the workflow of the proposed model. First, we introduce an efficient neighborhood retrieval mechanism that finds the k sessions most similar to the current session; these sessions share a similar behavior pattern with the current session and carry collaborative information that indicates the next item. Then, to extract a complementary representation (i.e., collaborative information) from the neighborhood sessions, and inspired by the Transformer model in machine translation [10], we apply a guided attention mechanism that computes an attention weight for each neighborhood session guided by the current session; the weighted sum of all neighborhood session representations is used as the complementary collaborative representation. In particular, the attention weights are computed from both the sequential characteristics and the temporal recency, since the nearness of sessions in time (recency) has been shown to be useful for determining similar sessions [11]. Finally, we design a simple co-attention mechanism to determine the importance of the collaborative information when predicting the next item. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of our proposed framework. Our main contributions are as follows:

1. We propose a novel NETA model, which combines the ability of the k-nearest-neighbors (KNN) approach to find similar neighborhood sessions effectively with the advantage of neural networks in modeling the sequential behavior within the current session.

2. A novel attention mechanism is proposed, in which the attention weights are computed from sequential behavior characteristics and temporal recency. To the best of our knowledge, this is the first effort to take temporal recency into account in neural networks for session-based recommendation.

3. We conduct extensive experiments on two real-world datasets. Experimental results show that our proposed model achieves state-of-the-art performance.

Figure 1: Pipeline

2 Methods

In this section, we first introduce the session-based recommendation task. Then we describe the proposed NETA in detail.

2.1 Session based Recommendation

Let $\mathcal{V} = \{v_1, v_2, \ldots, v_m\}$ denote the set of all unique items that appear in any session; we call it the item dictionary. A session can be regarded as a sequence of click actions created by an anonymous visitor, e.g., listening to a song or watching a video. Let $s = [x_1, x_2, \ldots, x_n]$ denote a session, where $x_j \in \mathcal{V}$ ($1 \le j \le n$) denotes the item clicked at timestamp $j$ in session $s$, and let $s_t = [x_1, x_2, \ldots, x_t]$ denote the prefix of session $s$ truncated at the $t$-th timestamp. Our model generates a ranking list over the item dictionary by computing the predicted probability for each item, $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m]$, where $\hat{y}_i$ denotes the recommendation score for the $i$-th item in the item dictionary. Finally, the top-$K$ items in $\hat{y}$ are recommended.

2.2 Neighborhood Retrieval

As shown in Figure 1, the first step is to retrieve neighborhood sessions, which supply dynamic collaborative information to the current session, since the current session consists of only a few actions. Candidate neighborhood sessions are the past sessions that interacted with at least one of the items in the current session. The retrieval is performed in two steps.

The first step is to search for sessions that interacted with the same items, based on the intuition that like-minded users share similar interests, which are reflected in similar sessions. This step is especially valuable in cold-start scenarios. Following SessionKNN [8], we consider the whole session rather than only the last item (as ItemKNN does) when finding neighborhood sessions. Note that the session lookup is implemented with two hash tables, one mapping session IDs to item IDs and one mapping item IDs to session IDs, which makes it highly efficient.
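The two-hash-table lookup described above can be sketched as follows. This is a minimal illustration, assuming past sessions are given as a Python dict from session ID to a list of item IDs; the function names `build_indexes` and `candidate_sessions` are hypothetical and not taken from the authors' code.

```python
from collections import defaultdict

def build_indexes(sessions):
    """sessions: dict session_id -> list of clicked item_ids (hypothetical input format).
    Returns the two hash tables: session_to_items and item_to_sessions."""
    session_to_items = {sid: set(items) for sid, items in sessions.items()}
    item_to_sessions = defaultdict(set)
    for sid, items in sessions.items():
        for item in items:
            item_to_sessions[item].add(sid)
    return session_to_items, item_to_sessions

def candidate_sessions(current_items, item_to_sessions):
    """All past sessions that share at least one item with the current session."""
    candidates = set()
    for item in set(current_items):
        # .get avoids inserting empty entries for unseen items into the defaultdict
        candidates |= item_to_sessions.get(item, set())
    return candidates

past_sessions = {"s1": ["a", "b"], "s2": ["b", "c", "b"], "s3": ["d"]}
s2i, i2s = build_indexes(past_sessions)
print(sorted(candidate_sessions(["b", "e"], i2s)))  # ['s1', 's2']
```

Because each lookup only touches the sessions indexed under the current session's items, its cost grows with the number of matching sessions rather than with the size of the whole session log.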

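The second step is to determine the k most similar neighbors. Technically, given a session $s$, we determine its $k$ most similar past sessions (neighbors) $N_s$ by applying a suitable similarity measure, e.g., cosine similarity. Specifically, each session is represented as a binary vector in the $m$-dimensional item space (a value of 1 in the $n$-th dimension means that the $n$-th item of the item dictionary appears in the session). The cosine similarity between binary vectors reflects the similarity between sessions quite well. The cosine similarity between sessions $s$ and $s_j$ is given as follows:

$$\mathrm{sim}(s, s_j) = \frac{\mathbf{s} \cdot \mathbf{s}_j}{\lVert \mathbf{s} \rVert \, \lVert \mathbf{s}_j \rVert} = \frac{|s \cap s_j|}{\sqrt{|s|}\,\sqrt{|s_j|}}$$

where $|s \cap s_j|$ is the number of distinct items shared by the two sessions and $|s|$ is the number of distinct items in $s$. Given the current session $s$, all of its candidate neighbors can thus be scored, and we choose the $k$ most similar sessions as the neighborhood sessions.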
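The cosine similarity on binary session vectors reduces to set operations, so the k-nearest-neighbor selection can be sketched as below. This is an illustrative sketch, assuming candidate sessions arrive as a dict from session ID to item list; `cosine_binary` and `top_k_neighbors` are hypothetical names.

```python
import heapq
import math

def cosine_binary(a, b):
    """Cosine similarity of two sessions seen as binary item vectors:
    |a ∩ b| / (sqrt|a| * sqrt|b|), counting distinct items only."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (math.sqrt(len(a)) * math.sqrt(len(b)))

def top_k_neighbors(current, candidates, k):
    """candidates: dict session_id -> items (hypothetical format).
    Returns up to k (similarity, session_id) pairs, most similar first."""
    scored = ((cosine_binary(current, items), sid) for sid, items in candidates.items())
    return heapq.nlargest(k, scored)

cands = {"s1": ["a", "b", "c", "d"], "s2": ["b", "x"], "s3": ["a", "b"]}
print([sid for _, sid in top_k_neighbors(["a", "b"], cands, 2)])  # ['s3', 's1']
```

`heapq.nlargest` avoids sorting every candidate when only the top k (40 in the experiments below) are needed.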

2.3 Guided Attention with Time Awareness

The current session contains only a few behaviors and lacks the user's profile and long-term interactions, which is insufficient for accurate prediction. To improve the precision and diversity of predictions, we make full use of the neighborhood sessions by extracting related collaborative information from them. To this end, we propose a novel attention mechanism motivated by the Transformer [10].

Figure 2: Guided Attention

Consider a query $q$, a key matrix $K$ and a value matrix $V$. We compute the similarity between the query and each row of $K$, and apply a softmax function to obtain the attention weights over the rows of $V$. Here the similarity function is the scaled dot product ("Scaled Dot-Product Attention"), and the attended feature is the weighted sum of the rows of $V$ under these attention weights:

$$\mathrm{Attention}(q, K, V) = \mathrm{softmax}\!\left(\frac{q K^{\top}}{\sqrt{d}}\right) V$$

where $d$ is the dimension of the query and key vectors.
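For a single query vector, the scaled dot-product attention formula can be sketched in plain Python as follows. This is a minimal illustration of the standard Transformer formula, not the authors' TensorFlow code; the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, keys, values):
    """q: query vector of length d; keys, values: lists of row vectors.
    Returns (attended feature, attention weights)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    # attended feature = weighted sum of the value rows
    attended = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return attended, weights

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = scaled_dot_product_attention(q, keys, values)
print(weights)  # the first key, aligned with q, gets the larger weight
```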

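To improve the expressive power of attention, multi-head attention is introduced to jointly attend to representations from different subspaces at different positions; e.g., some heads may focus on long-term interests while others concentrate on short-term interests:

$$\mathrm{MultiHead}(q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(q W_i^{Q}, K W_i^{K}, V W_i^{V})$$

Notice that the projection matrices $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ and $W^{O}$ are learnable parameters.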

In our problem, we need to generate collaborative information from the neighborhood sessions guided by the current session. For example, we can use the current session's main purpose as the query and the neighborhood sessions' sequential behaviors as the keys and values. The generated collaborative representation for the current session can then be understood as a reconstruction of the current session from all neighborhood sessions, weighted by their scaled dot-product similarity to it; i.e., it gathers the behaviors in the neighborhood sessions that match the main purpose of the current session.
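The guided multi-head attention can be sketched as below: the current-session vector is the query, and the neighborhood-session vectors serve as both keys and values. This is an illustrative sketch only; the projection matrices are random here (a trained model learns them), and the names `guided_multi_head` and `random_matrix` are hypothetical.

```python
import math
import random

def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m) -> length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def random_matrix(rng, n, m):
    # stand-in for learned parameters (assumption: random init, no training)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(m)] for _ in range(n)]

def guided_multi_head(current, neighbors, num_heads, d_k, rng):
    """current: current-session vector (query); neighbors: neighborhood-session
    vectors (keys and values). Returns the collaborative representation."""
    d = len(current)
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (random_matrix(rng, d, d_k) for _ in range(3))
        q = matvec(current, Wq)
        K = [matvec(n, Wk) for n in neighbors]
        V = [matvec(n, Wv) for n in neighbors]
        heads.extend(attention(q, K, V))  # concatenate head outputs
    Wo = random_matrix(rng, num_heads * d_k, d)
    return matvec(heads, Wo)

current = [0.2, -0.1, 0.4, 0.0]
neighbors = [[0.1, 0.0, 0.3, 0.2], [-0.5, 0.2, 0.0, 0.1], [0.3, -0.2, 0.5, 0.0]]
collab = guided_multi_head(current, neighbors, num_heads=2, d_k=2, rng=random.Random(0))
print(len(collab))  # 4: same dimension as the session vectors
```

The output has the same dimension as the input session vectors, so it can later be fused with the current-session representation.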

Recent work [11] shows that the nearness of sessions in time (recency) is of great use when determining similar neighborhood sessions. The occurrences of items in the item dictionary do not obey the assumption of independent and identical distribution (i.i.d.): an item can only appear in a session after it has been released [12]. Besides, some items are time-sensitive and tend to be clicked repeatedly during a certain period, such as seasonal fruits and trending products in e-commerce. Therefore, it is not appropriate for the current session to treat all neighborhood sessions from different periods as equally significant.

To address this problem, we dedicate one head of the multi-head attention, namely the Time Head, to the time intervals between the neighborhood sessions and the current session, so that it pays more attention to temporally close sessions.

The multi-head attention learns a pairwise relationship between the current session $s$ and each neighborhood session $n_i$, where $n_i$ denotes the $i$-th sample of the neighborhood sessions. The Time Head computes a weight for each pair $(s, n_i)$ from the time interval between the occurrence of $n_i$ and $t_s$, where $t_s$ is the occurrence time of the current session. The Time Head mechanism thus emphasizes temporal recency by adding the weights of the time intervals to the multi-head attention.
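The exact Time Head formula is not recoverable from this text, so the sketch below assumes an exponential decay $\exp(-(t_s - t_{n_i})/\lambda)$, similar in spirit to the recency weighting used by STAN [11], normalized with a softmax so that the weights sum to one. The decay constant `decay` ($\lambda$) and both function names are hypothetical.

```python
import math

def time_head_weights(current_time, neighbor_times, decay):
    """Assumed recency weights: softmax of -(t_s - t_n) / decay, i.e. normalized
    exponential decay. Not the paper's equation, which is not available here."""
    logits = [-(current_time - t) / decay for t in neighbor_times]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def time_head(current_time, neighbor_times, neighbor_values, decay=3600.0):
    """Time Head output: recency-weighted sum of neighborhood-session vectors."""
    w = time_head_weights(current_time, neighbor_times, decay)
    dim = len(neighbor_values[0])
    return [sum(wi * v[j] for wi, v in zip(w, neighbor_values)) for j in range(dim)]

times = [9000, 5000, 9940]  # seconds; the last neighbor is the most recent
vals = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(time_head_weights(10000, times, 3600.0))
print(time_head(10000, times, vals))
```

Unlike the other heads, this head ignores item content and looks only at time gaps, so the closest sessions in time dominate its output.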

2.4 Sequence modeling

We now present two sequence modeling methods built on the proposed framework. Two sequential models for SRS are used as examples in this paper, i.e., NARM [2] and STAMP [7]; we denote the resulting models as NETA-NARM and NETA-STAMP, respectively.

NARM consists of two components: a global encoder and a local encoder. The global encoder models the user's sequential behavior, and the local encoder captures the user's main intention in the current session. First, a GRU layer converts the input action sequence $[x_1, x_2, \ldots, x_t]$ into a set of high-dimensional hidden representations $[h_1, h_2, \ldots, h_t]$. The final hidden state is then used as the sequential behavior representation, i.e., $c_t^{g} = h_t$. The attention-weighted sum of all hidden states, $c_t^{l} = \sum_{j=1}^{t} \alpha_{tj} h_j$, reflects the user's main purpose, i.e., which items in the session deserve more attention.
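Given the GRU hidden states, the two NARM encoders can be sketched as below. This sketch assumes the hidden states are already computed (the GRU itself is omitted) and uses the additive weighting $\alpha_{tj} = v^{\top}\sigma(A_1 h_t + A_2 h_j)$, which is our reading of the NARM paper [2]; the function names and toy matrices are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    """Matrix W (given as a list of rows) times vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def narm_encoders(hidden, A1, A2, v):
    """hidden: GRU states [h_1, ..., h_t] (assumed precomputed).
    Returns (c_global, c_local, alphas)."""
    h_t = hidden[-1]
    c_global = list(h_t)  # global encoder: the final hidden state
    a1 = matvec(A1, h_t)
    alphas = []
    for h_j in hidden:
        gate = [sigmoid(x + y) for x, y in zip(a1, matvec(A2, h_j))]
        alphas.append(sum(vi * gi for vi, gi in zip(v, gate)))  # v^T sigma(A1 h_t + A2 h_j)
    dim = len(h_t)
    # local encoder: attention-weighted sum of all hidden states
    c_local = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(dim)]
    return c_global, c_local, alphas

I2 = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.1]]
c_g, c_l, alphas = narm_encoders(hidden, I2, I2, [1.0, 1.0])
print(c_g, c_l)
```

In NARM the two vectors are concatenated, $[c_t^{g}; c_t^{l}]$, to form the session representation.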

The STAMP model captures the user's long-term and short-term interests in order to obtain the user's main purpose. The long-term interest is generated by an attention mechanism over the items in the current session, and the short-term interest is simply defined as the embedding of the last clicked item $x_t$. Two fully-connected layers are then used for feature abstraction, and their outputs $h_s$ and $h_t$ represent the user's long-term and short-term interests, respectively.
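A STAMP-style extraction of the two interest vectors can be sketched as follows, following our reading of the STAMP paper [7]: attention weights $\alpha_i = w_0^{\top}\sigma(W_1 x_i + W_2 x_t + W_3 m_s)$ with $m_s$ the mean item embedding, then one tanh layer per interest. Biases are omitted for brevity, and all names and toy weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    """Matrix W (given as a list of rows) times vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def stamp_interests(embs, w0, W1, W2, W3, Ws, Wt):
    """embs: embeddings [x_1, ..., x_t] of the clicked items (biases omitted).
    Returns (h_s, h_t): long-term and short-term interest vectors."""
    x_t = embs[-1]                                      # short-term memory: last click
    m_s = [sum(col) / len(embs) for col in zip(*embs)]  # mean of the session embeddings
    shared = [a + b for a, b in zip(matvec(W2, x_t), matvec(W3, m_s))]
    alphas = []
    for x_i in embs:
        z = [sigmoid(a + b) for a, b in zip(matvec(W1, x_i), shared)]
        alphas.append(sum(w * zi for w, zi in zip(w0, z)))
    dim = len(x_t)
    m_a = [sum(a * x[j] for a, x in zip(alphas, embs)) for j in range(dim)]  # attentive memory
    h_s = [math.tanh(u) for u in matvec(Ws, m_a)]  # fully-connected layer: long-term interest
    h_t = [math.tanh(u) for u in matvec(Wt, x_t)]  # fully-connected layer: short-term interest
    return h_s, h_t

I2 = [[1.0, 0.0], [0.0, 1.0]]
embs = [[0.2, 0.1], [0.0, 0.4], [0.3, -0.2]]
h_s, h_t = stamp_interests(embs, [0.5, 0.5], I2, I2, I2, I2, I2)
print(h_s, h_t)
```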

Here NARM and STAMP are used as sequence feature extractors that transform the item embeddings of a session into a session representation. Taking NETA-NARM as an example, we apply NARM to both the current session and the neighborhood sessions, and extract collaborative information from the sequential behaviors of the neighborhood sessions guided by the main purpose of the current session.

2.5 Prediction layer

Both the current session representation (from NARM or STAMP) and the complementary representation from the neighborhood sessions have strengths and weaknesses, so it is essential to balance them. Instead of simply concatenating them, we design an adaptive information-fusion method: we apply a co-attention mechanism to the current session representation and the complementary representation to determine which part should play a more important role, and the result is used as the final session representation.
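The paper's co-attention equation is not recoverable from this text. As a stand-in, the sketch below shows one simple adaptive fusion of the same kind: each representation is scored with a vector `w`, the two scores are softmax-normalized, and the result is their weighted sum. The scoring form, `w`, and the name `fuse` are all hypothetical.

```python
import math

def fuse(c_session, c_neighbor, w):
    """Hypothetical adaptive fusion (not the paper's exact co-attention):
    score each representation with vector w, softmax the two scores,
    and return the weighted sum together with the two weights."""
    scores = [sum(wi * ci for wi, ci in zip(w, c)) for c in (c_session, c_neighbor)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    fused = [beta[0] * a + beta[1] * b for a, b in zip(c_session, c_neighbor)]
    return fused, beta

fused, beta = fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(beta)  # the session representation receives the larger weight here
```

Compared with plain concatenation, the weights `beta` let the model lean on the neighborhood signal only when it scores higher for the given session.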

Let $\mathbf{v}_i$ be the embedding of the $i$-th item in the item dictionary, which is regarded here as a single candidate item. We compute the recommendation score of each candidate item as the dot product between its embedding and the final session representation $\mathbf{c}$:

$$\hat{z}_i = \mathbf{v}_i^{\top} \mathbf{c}$$

Predicting the next item essentially forces the model to predict the embedding of the next item. For example, if a user clicks on three bottles of milk and a dozen eggs, the final session representation will lie between the milk and egg embeddings, and the model will recommend items whose embeddings are close to those of milk and eggs in the semantic space.
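The milk-and-eggs intuition can be made concrete with a toy scoring sketch: dot-product scores, a softmax over the item dictionary, and top-K selection. The 2-D embeddings, the averaged session vector, and the name `recommend` are illustrative assumptions, not learned values.

```python
import math

def recommend(session_repr, item_embeddings, k):
    """Score each candidate item by its dot product with the session representation,
    turn the scores into probabilities with softmax, and return the top-k indices."""
    scores = [sum(a * b for a, b in zip(session_repr, e)) for e in item_embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return probs, top_k

# hypothetical toy embeddings: milk, eggs, cheese, nails
items = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.3], [-1.0, -1.0]]
session = [0.5, 0.5]  # assumed representation lying between milk and eggs
probs, top3 = recommend(session, items, 3)
print(top3)  # [2, 0, 1]: cheese, milk, eggs; nails rank last
```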

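The objective function is defined as the cross-entropy between the prediction and the ground truth:

$$\mathcal{L}(\hat{y}) = -\sum_{i=1}^{m} y_i \log \hat{y}_i$$

where $\hat{y}$ is the predicted probability distribution over the item dictionary and $y$ denotes the one-hot vector of the ground-truth item.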
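With a one-hot target, the cross-entropy sum collapses to the negative log-probability of the ground-truth item, as this short sketch shows. The `eps` guard against `log(0)` is an assumption added for numerical safety.

```python
import math

def cross_entropy(probs, target_index, eps=1e-12):
    """L = -sum_i y_i * log(yhat_i) with a one-hot y;
    equivalent to -log(probs[target_index]). eps guards against log(0)."""
    one_hot = [1.0 if i == target_index else 0.0 for i in range(len(probs))]
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))

print(cross_entropy([0.25, 0.5, 0.25], 1))  # about 0.693, i.e. log 2
```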

3 Experimental Setup

3.1 Research Questions

In this section, we first detail our experimental setup. We then conduct experiments to evaluate the performance of NETA and answer the following research questions:

(RQ1) What is the performance of NETA on session-based recommendation tasks? Does it achieve state-of-the-art performance?

(RQ2) How does the collaborative information influence the performance of NETA?

(RQ3) How does the temporal recency influence the performance of NETA?

(RQ4) How do the key hyperparameters, such as the number of neighbors and the weight of the Time Head, influence the performance of NETA?

3.2 Datasets

We conduct all experiments on two real-world datasets: Diginetica and Retailrocket. Diginetica comes from the CIKM Cup 2016; we only use its released transactional data. Retailrocket comes from an e-commerce personalization company and contains six months of user browsing activity. Following [13], we divide the interaction history into sessions using a 30-minute interval: clicks by the same user within 30 minutes of each other are regarded as belonging to the same session, consistent with common server-side session definitions. Following [12, 2, 7], we filter out sessions of length 1, items that appear fewer than 5 times, and items that appear only in the test set. The test set consists of the sessions from the subsequent week. After preprocessing, 202633 sessions with 982961 clicks on 43097 items remain in the Diginetica dataset, and 113433 sessions with 413648 clicks on 24095 items remain in the Retailrocket dataset.
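The 30-minute sessionization and the frequency filtering can be sketched as below. This is a simplified illustration, assuming raw events are `(user_id, timestamp_in_seconds, item_id)` tuples; the removal of test-only items and the weekly train/test split are omitted, and all names are hypothetical.

```python
from collections import Counter

SESSION_GAP = 30 * 60  # 30 minutes, in seconds

def sessionize(events, gap=SESSION_GAP):
    """events: (user_id, timestamp_seconds, item_id) tuples (assumed format).
    Starts a new session whenever consecutive clicks of a user are more than `gap` apart."""
    by_user = {}
    for user, ts, item in sorted(events):
        by_user.setdefault(user, []).append((ts, item))
    sessions = []
    for clicks in by_user.values():
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, item) in zip(clicks, clicks[1:]):
            if ts - prev_ts > gap:
                sessions.append(current)
                current = []
            current.append(item)
        sessions.append(current)
    return sessions

def filter_sessions(sessions, min_item_count=5):
    """Drop items seen fewer than min_item_count times, then sessions of length 1."""
    counts = Counter(item for s in sessions for item in s)
    kept = []
    for s in sessions:
        s = [item for item in s if counts[item] >= min_item_count]
        if len(s) > 1:
            kept.append(s)
    return kept

events = [("u1", 0, "a"), ("u1", 600, "b"), ("u1", 4200, "c"), ("u2", 100, "a")]
print(sessionize(events))  # [['a', 'b'], ['c'], ['a']]
```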

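Following [12, 2, 7], for an input session $s = [x_1, x_2, \ldots, x_n]$, we generate the sequences and corresponding labels $([x_1], x_2), ([x_1, x_2], x_3), \ldots, ([x_1, \ldots, x_{n-1}], x_n)$ for both the training and test sets. The statistics of the two datasets are shown in Table 1.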
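The prefix-based sequence generation amounts to one line of Python; `augment` is a hypothetical name used for illustration.

```python
def augment(session):
    """Split one session into (prefix, next-item label) pairs:
    ([x1], x2), ([x1, x2], x3), ..., ([x1..x_{n-1}], x_n)."""
    return [(session[:t], session[t]) for t in range(1, len(session))]

print(augment(["a", "b", "c"]))  # [(['a'], 'b'), (['a', 'b'], 'c')]
```

A session of length $n$ therefore yields $n - 1$ training or test samples, which is why the sample counts in Table 1 exceed the raw session counts.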

3.3 Experimental Settings

Baselines. We compare against the following baseline methods, including state-of-the-art and closely related work.

Pop [1]: Always recommends the most popular items in the training set; a simple baseline commonly used in recommender systems.

Session-Pop [1]: Recommends popular items based on their occurrence frequency in the current session and in the whole training set.

ItemKNN [5]: An item-based k-nearest-neighbor method that recommends the k items most similar to the last item in the current session. The similarity between two items is determined by their co-occurrence in the same sessions.

SessionKNN [8]: A session-based k-nearest-neighbor method. When predicting the next item for the current session, candidate items are scored by their occurrences in the neighborhood sessions.

GRU4Rec [1]: An RNN-based deep learning model for session-based recommendation, which uses session-parallel mini-batch training and applies negative sampling during training.

NARM [2]: An improved model based on GRU4Rec, which consists of a global encoder to model the user's sequential behavior and a local encoder to capture the user's main purpose.

STAMP [7]: A short-term attention/memory priority model that considers both the short-term and long-term interests in the current session. It abandons the RNN structure in favor of an attention mechanism, which makes it computationally efficient.

CSRM [9]: A state-of-the-art deep learning model for session-based recommendation, consisting of an Inner Memory Encoder (IME) and an Outer Memory Encoder (OME); the IME models the current session, and the OME remembers the most recent sessions to generate collaborative information.

Dataset | Train sessions | Test sessions | Clicks | Items | Avg. length
Diginetica | 719470 | 60858 | 982961 | 43097 | 5.12
Retailrocket | 264453 | 35762 | 413648 | 24095 | 6.68
Table 1: Statistics of the experiment datasets

Evaluation Metrics.

To measure the performance of the SRS models, we apply the following evaluation metrics, which are widely used in related work.

Recall@20: Recall@K is the proportion of test samples whose desired item appears among the top-K positions of the ranking list:

$$\mathrm{Recall@}K = \frac{n_{hit}}{N}$$

where $N$ is the number of test samples and $n_{hit}$ is the number of samples whose desired item is in the top-K list.

MRR@20: We also use Mean Reciprocal Rank, the average of the reciprocal ranks of the desired items; the reciprocal rank is set to 0 when the rank exceeds K:

$$\mathrm{MRR@}K = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{\mathrm{rank}_t}$$

The main difference between MRR and Recall is that MRR takes the order of the recommended items into account, which is valuable because the position of the desired item in the list reflects recommendation quality.
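Both metrics can be computed in one pass over the test samples, as in this sketch; it assumes each test sample provides a best-first ranked list of items and its true next item, and the function name is hypothetical.

```python
def recall_and_mrr_at_k(ranked_lists, targets, k=20):
    """ranked_lists[i]: items ranked best-first for test sample i; targets[i]: true next item.
    Recall@K: fraction of samples whose target is in the top K.
    MRR@K: mean reciprocal rank of the target, counting 0 when it falls outside the top K."""
    hits, rr = 0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top = ranking[:k]
        if target in top:
            hits += 1
            rr += 1.0 / (top.index(target) + 1)
    n = len(targets)
    return hits / n, rr / n

ranked = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
recall, mrr = recall_and_mrr_at_k(ranked, ["a", "a", "a"], k=2)
print(recall, mrr)  # recall 2/3; MRR (1 + 1/2 + 0) / 3 = 0.5
```

The third sample counts as a miss for both metrics because its target sits at rank 3, outside the top 2; this is how MRR@K penalizes low-ranked hits that Recall@K would count in full only if they fall within K.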


We implement NETA with TensorFlow and conduct experiments on a GeForce GTX Titan X GPU.

Following [2, 9], to alleviate over-fitting, we use two dropout layers: the first is placed right after the embedding layer with a 25% dropout rate, and the second right before the inner product between the session representation and the candidate item embeddings with a 50% dropout rate.

We hold out 10% of the training set as a validation set for tuning the hyperparameters of all models that have them, and report the best models selected by early stopping on the validation Recall@20. Note that the validation set is not used to train the neural networks.

Based on the validation set, we tune the embedding dimension, the learning rate, the learning-rate decay and the mini-batch size. The number of neighborhood sessions for NETA is selected from a set of candidate values and finally set to 40.

4 Results and Analysis

4.1 RQ1

Method | Diginetica Recall@20 | Diginetica MRR@20 | RetailRocket Recall@20 | RetailRocket MRR@20
Pop | 0.96 | 0.24 | 1.24 | 0.32
Session-Pop | 21.11 | 14.60 | 40.48 | 32.04
IKNN | 21.11 | 14.60 | 40.48 | 32.04
SKNN | 49.79 | 18.59 | 61.78 | 34.39
GRU4Rec | 57.95 | 24.93 | - | -
STAMP | 62.03 | 27.38 | 61.08 | 33.10
NETA-STAMP | 63.43 | 27.83 | 63.44 | 33.84
NARM | 62.58 | 27.35 | 61.79 | 34.07
CSRM(NARM) | 63.07 | 27.45 | 63.64 | 34.76
NETA-NARM | 63.34 | 27.60 | 64.12 | 34.85
Table 2: Performance comparison (%). More experiments need to be conducted for hyperparameter adjustment.

We compare our NETA model to all baselines and find that NETA achieves state-of-the-art performance. The experimental results are shown in Table 2.

We have the following observations:

1) Session-Pop gains substantially over Pop; the main difference between them is that Session-Pop repeatedly recommends items from the current session, which indicates that repeat consumption is of great significance. This is mainly because a session is, to a certain extent, a collection of similar items, and a user's interest within an individual session can be considered stable.

2) Between the two KNN-based methods (ItemKNN and SessionKNN), SessionKNN outperforms ItemKNN. The reason is that SessionKNN makes full use of every clicked item within the session, while ItemKNN only attends to the last item, which is clearly insufficient. Notably, although SessionKNN exploits the entire session and considers collaborative filtering, it neglects the sequential order within the session, which is exactly the problem addressed in this paper.

3) NETA outperforms both conventional baselines and state-of-the-art methods. The improvement of NETA over CSRM suggests that explicitly retrieving neighborhood sessions is more useful than remembering recent sessions, because the latter may introduce considerable noise from irrelevant sessions.

4) To verify the performance of NETA in more realistic scenarios, where the recommender system only shows a few items at once because users are impatient, we additionally evaluate Recall@10, Recall@5, MRR@10 and MRR@5; the results are summarized in Table 3. NETA still retains certain advantages, which indicates that it tends to make more precise recommendations.

5) On the RetailRocket dataset, the small performance difference between NETA and SessionKNN may indicate that the sequential order within a session is not always as important as we assumed, which is a direction of our future work. We suppose this is due to the different characteristics of the datasets.

4.2 RQ2


  • [1] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” arXiv preprint arXiv:1511.06939, 2015.
  • [2] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma, “Neural attentive session-based recommendation,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1419–1428, ACM, 2017.
  • [3] F. Garcin, C. Dimitrakakis, and B. Faltings, “Personalized news recommendation with context trees,” arXiv preprint arXiv:1303.0665, 2013.
  • [4] Q. He, D. Jiang, Z. Liao, S. C. Hoi, K. Chang, E.-P. Lim, and H. Li, “Web query recommendation via sequential query prediction,” in 2009 IEEE 25th International Conference on Data Engineering, pp. 1443–1454, IEEE, 2009.
  • [5] J. Davidson, B. Liebald, J. Liu, P. Nandy, T. Van Vleet, U. Gargi, S. Gupta, Y. He, M. Lambert, B. Livingston, et al., “The youtube video recommendation system,” in Proceedings of the fourth ACM conference on Recommender systems, pp. 293–296, ACM, 2010.
  • [6] G. Linden, B. Smith, and J. York, “Industry report: Amazon.com recommendations: Item-to-item collaborative filtering,” in IEEE Distributed Systems Online, Citeseer, 2003.
  • [7] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang, “Stamp: short-term attention/memory priority model for session-based recommendation,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1831–1839, ACM, 2018.
  • [8] D. Jannach and M. Ludewig, “When recurrent neural networks meet the neighborhood for session-based recommendation,” in Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 306–310, ACM, 2017.
  • [9] M. Wang, P. Ren, L. Mei, Z. Chen, J. Ma, and M. de Rijke, “A collaborative session-based recommendation approach with parallel memory modules,” 2019.
  • [10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, pp. 5998–6008, 2017.
  • [11] D. Garg, P. Gupta, P. Malhotra, L. Vig, and G. Shroff, “Sequence and time aware neighborhood for session-based recommendations: Stan,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1069–1072, ACM, 2019.
  • [12] Y. K. Tan, X. Xu, and Y. Liu, “Improved recurrent neural networks for session-based recommendations,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 17–22, ACM, 2016.
  • [13] M. Ludewig and D. Jannach, “Evaluation of session-based recommendation algorithms,” User Modeling and User-Adapted Interaction, vol. 28, no. 4-5, pp. 331–390, 2018.