TAGNN: Target Attentive Graph Neural Networks for Session-based Recommendation

05/06/2020 ∙ by Feng Yu, et al. ∙ 0

Session-based recommendation nowadays plays a vital role in many websites, which aims to predict users' actions based on anonymous sessions. There have emerged many studies that model a session as a sequence or a graph via investigating temporal transitions of items in a session. However, these methods compress a session into one fixed representation vector without considering the target items to be predicted. The fixed vector will restrict the representation ability of the recommender model, considering the diversity of target items and users' interests. In this paper, we propose a novel target attentive graph neural network (TAGNN) model for session-based recommendation. In TAGNN, target-aware attention adaptively activates different user interests with respect to varied target items. The learned interest representation vector varies with different target items, greatly improving the expressiveness of the model. Moreover, TAGNN harnesses the power of graph neural networks to capture rich item transitions in sessions. Comprehensive experiments conducted on real-world datasets demonstrate its superiority over state-of-the-art methods.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Recommender systems are one of the most successful applications in the fields of data mining and machine learning research. Most previous work studies approaches that personalize the recommendation according to constantly recorded user profiles. In many real-world applications, however, such long-term profiles, even the users’ identities may not exist. As an emerging method aiming to solve this problem, session-based recommendation predicts the next action (e.g., which item to click) of a user, given her previous behaviors within the ongoing session.

Considering the high practical values of session-based recommendation, many approaches have been proposed so far. Traditionally, Markov-chain-based methods

(Rendle et al., 2010)

predict the user’s next action solely based on the previous action. Such a strong independence assumption suffers from noisy data and thus restricts its use in session-based recommendation scenarios. Recent trends in recommender systems have led to a proliferation of studies using deep neural network techniques. Models based on recurrent neural networks (RNNs) have achieved promising performance. For example,

Hidasi et al. propose a RNN-based method GRU4Rec (Hidasi et al., 2016)

to model short-term preferences with gated recurrent units (GRUs). Recently, NARM

(Li et al., 2017) proposes two RNN-based subsystems to capture users’ local and global preference respectively. Similar to NARM, STAMP (Liu et al., 2018) extracts users’ potential interests using a simple multilayer perception model and an attentive network.

Despite their effectiveness, we would argue that those methods are still in their infancy. Previous work highlights that complex user behavioral patterns are of great significance for session-based recommendation (Li et al., 2017; Liu et al., 2018). However, these sequence-based methods only model sequential transitions between consecutive items, with complex transitions neglected. Take repeated purchases as an example, which is one of the most prominent behaviors in e-shopping scenarios. Suppose a session for a user is . Then, it is hard for these sequence-based methods to capture such a to-and-fro relationship between items. Specifically, they will be confused about the relationship between item and items . In this paper, we propose to discover the complex transitional patterns underneath sessions through session graphs (Wu et al., 2019). By modeling items in sessions as session graphs, this natural means of encoding the abundant temporal patterns within sessions produces more accurate representation for each item.

Moreover, candidate items are usually abundant and users’ interests are usually diverse. Previous work (Li et al., 2017; Liu et al., 2018; Wu et al., 2019) represents one session using one embedding vector. That fixed-size vector represents all interests of a single user, which cannot express diverse user interests, and thereby limits the expressiveness of the recommender model. As a brute-force solution, we may consider enlarging the dimension of that fixed-length vector, which will in turn expose the risk of overfitting and deteriorate the model performance. However, we observe that it is not necessary to embed all user interests into one vector when making prediction for a specific candidate item. For example, suppose that a customer has a historical session of (swimming suits, purse, milk, frying pan). If we want to recommend a handbag for her, we focus on her interests in the purse rather than the frying pan. That is to say, the interests of a user with rich behaviors can be specifically activated given a target item (Zhou et al., 2018). In this paper, we refine the proposed graph-based model through a novel target attention module. We term the resulting model as Target Attentive Graph Neural Networks for session-based recommendation, TAGNN111Code available at https://github.com/CRIPAC-DIG/TAGNN for brevity. The proposed target attention module aims to adaptively activate user interests by considering the relevance of historical behaviors given a target item. By introducing a local target attentive unit, specific user interests related to a target item in the current session are activated, which will benefit downstream session representations as a result.

Figure 1 gives an overview of the TAGNN method. We first construct session graphs using items in historical sessions. After that, we obtain corresponding embeddings using graph neural networks to capture complex item transitions based on session graphs. Given the item embeddings, we employ a target-aware attentive network to activate specific user interests with respect to a target item. Following that, we construct session embeddings. At last, for each session, we can infer the user’s next action based on item embeddings and the session embedding.

The main contribution of this work is threefold. Firstly, we model items in sessions as session graphs to capture complex item transitions within sessions. Then, we employ graph neural networks to obtain item embeddings. Secondly, to adaptively activate users’ diverse interests in sessions, we propose a novel target attentive network. The proposed target attentive module can reveal the relevance of historical actions given a certain target item, which further improves session representations. Finally, we conduct extensive experiments on real-world datasets. The empirical studies show that our method achieves state-of-the-art performance.

2. The Proposed Method: TAGNN

2.1. Problem Statement and Constructing Session Graphs

In session-based recommendation, an anonymous session can be represented by a list ordered by timestamps and we denote as the set consisting of all unique items (e.g., user clicks) involved in sessions. Session-based recommendation aims to predict the next action given session

. Our model produces a ranking list of probabilities for all candidate items and items with top-

probability values will be selected for recommendation.

In our model, we represent each session as a directed session graph , where are the node set, the edge set, and the adjacency matrix, respectively. In this graph , each node represents an item and each edge represents a user visits item and consecutively. Here we define as the concatenation of two adjacency matrices and to reflect the bidirectional relationship between items in sessions, where and represents weighted connections of outgoing and incoming edges respectively. Here we use the same strategy of constructing session graphs as SR-GNN (Wu et al., 2019). However, it is flexible to adopt different mechanisms of constructing the session graph for different session-based recommendation scenarios. For clarity, we omit subscript for referring item embeddings hereafter.

Figure 1. An overview of the proposed TAGNN method. Session graphs are constructed based on sessions at first. Then, graph neural networks capture rich item transitions in sessions. Last, from one session embedding vector, target-aware attention adaptively activates different user interests concerning varied target items to be predicted.

2.2. Learning Item Embeddings

After constructed session graphs, we transform every node into a unified embedding space. The resulting vector is a -dimensional representation of item obtained using graph neural networks. Then, we can represent each session using item embeddings. The graph neural network (GNN) (Scarselli et al., 2009; Kipf and Welling, 2017; Li et al., 2016)

is a class of widely used deep learning models. GNNs generate node representations on top of graph topology, which models complex item connections. Therefore, they are particularly suitable for session-based recommendation. In this paper, we employ gated graph neural networks (GGNNs)

(Li et al., 2016), a variant of GNN, to learn the node vectors. Formally, for node of graph , its update rules are:


where is the training step, is the -th row in matrix corresponding to node , and are weight and bias parameter respectively, is the list of node vectors in session , and are the reset and update gates respectively,

is the sigmoid function, and

denotes element-wise multiplication. For each session graph , the GGNN model propagates information between neighboring nodes. The update and reset gate decides what information to be preserved and discarded respectively.

2.3. Constructing Target-Aware Embeddings

Previous work captures users’ interests only using intra-session item representations. In our model, once we obtained the node vector for each item, we proceed to construct target embeddings, to adaptively consider the relevance of historical behaviors concerning target items. Here we define the target items as all candidate items to predict. Usually, a user’s action given a recommended item only matches a part of her interests. To model this process, we design a novel target attention mechanism to calculate soft attention scores over all items in the session with respect to each target item.

In this section, we introduce a local target attentive module to calculate attention scores between all items in session and each target item

. Firstly, a shared non-linear transformation parameterized by a weight matrix

is applied to every node-target pair. Then, we normalize the self-attention scores using the softmax function:


Finally, for each session , the users’ interests towards a target item is represented by , as given below:


The obtained target embedding for representing users’ interests varies with different target items.

2.4. Generating Session Embeddings

In this section, we further exploit users’ short- and long-term preference exhibited in the current session using node representations involved in session . The resulting two representations along with the user’s target embedding will be further concatenated to generate better session embeddings.

Local embedding. As the user’s final action is usually determined by her last action, we simply represent the user’s short-term preference as a local embedding as the representation of the last-visited item .

Global embedding. Then, we represent the user’s long-term preference as a global embedding by aggregating all involved node vectors. We adopt another soft-attention mechanism to draw dependencies between the last-visited item and each item involved in the session:


where and are weight parameters.

Session embedding. Finally, we generate the session embedding of session by taking linear transformation over the concatenation of the local and global embeddings and the target embedding:


where projects the three vectors into one embedding space . Please kindly note that we generate different session embeddings for each target item.

2.5. Making Recommendation

After obtaining all item embeddings and session embeddings, we compute the recommendation score for each target item by taking inner-product of item embedding and session representation . Following that, we use the softmax function over all unnormalized scores for all target items and get the final output vector:


Here denotes the probabilities of nodes being the next action in . The items with the top- probabilities in will be selected as recommended items.

For training the model, we define the loss function as the cross-entropy of the prediction and the ground truth:



denotes the one-hot encoding vector of the ground truth items. We use the back-propagation through time (BPTT) algorithm to train the proposed model.

3. Experiments

In this section, we aim to answer the following two questions:

RQ1. Does the proposed TAGNN achieve state-of-the-art performance compared with existing representative baseline algorithms?

RQ2. How do different schemes for representing user interests affect the model performance?

3.1. Experimental Configurations


We evaluate the proposed method using two widely-used real-world datasets Yoochoose222http://2015.recsyschallenge.com/challenge.html and Diginetica333http://cikm2016.cs.iupui.edu/cikm-cup, obtained from two contests in data mining conference RecSys 2015 and CIKM 2016 respectively. For fair comparison, we closely follow the same data preprocessing scheme as Li et al. (2017); Liu et al. (2018); Wu et al. (2019). Specifically, we drop items appearing less than 5 times and sessions consisting of less than 2 items. For generating training and test sets, sessions of last days are used as the test set for Yoochoose, and sessions of last weeks as the test set for Diginetica. For an existing session , we generate a series of input session sequences and corresponding labels as . Since the Yoochoose dataset is too large, we only use its the most recent 1/64 fractions of the training sessions, denoted as Yoochoose 1/64.


To evaluate the performance of the proposed method, we comprehensively compare TAGNN with representative baselines. The traditional baselines include (a) frequency-based methods POP and S-POP, (b) similarity-based method Item-KNN

(Sarwar et al., 2001), and (c) factorization-based methods Bayesian personalized ranking (BPR-MF) (Rendle et al., 2009) and factorizing personalized Markov chain model (FPMC) (Rendle et al., 2010). We also consider deep learning baselines, including RNN-based recommender model GRU4REC (Hidasi et al., 2016), neural attentive recommender model (NARM) (Li et al., 2016), short-term attention/memory priority model (STAMP) (Liu et al., 2018), and GNN-based recommender model SR-GNN (Wu et al., 2019).

Evaluation Metrics.

We adopt two commonly-used metrics for evaluation, including Precision@20 and MRR@20. The former one evaluates the proportion of correct recommendation in an unranked list, while the latter one further considers the position of correct recommended items in a ranked list.

Hyperparameter Setup.

Following previous methods (Liu et al., 2018; Wu et al., 2019), we set

for hidden dimensionality in all experiments. We tune other hyperparameters based on a random 10% validation set. The initial learning rate for Adam is set to 0.001 and will decay by 0.1 after every 3 training epochs. The batch size is set to 100 for both datasets and the

penalty is set to .

3.2. Comparison with Baseline Methods (RQ1)

To evaluate the performance of the proposed method, we firstly compare it with existing representative baselines (RQ1). The overall performance in terms of Precision@20 and MRR@20 is summarized in Table 1, with the highest performance highlighted in boldface.

Method Diginetica Yoochoose 1/64
Precision@20 MRR@20 Precision@20 MRR@20
POP 0.89 0.20 6.71 1.65
S-POP 21.06 13.68 30.44 18.35
Item-KNN 35.75 11.57 51.60 21.81
BPR-MF 5.24 1.98 31.31 12.08
FPMC 26.53 6.95 45.62 15.01
GRU4REC 29.45 8.33 60.64 22.89
NARM 49.70 16.17 68.32 28.63
STAMP 45.64 14.32 68.74 29.67
SR-GNN 50.73 17.59 70.57 30.94
TAGNN 51.31 18.03 71.02 31.12
Improv.(%) 1.14 2.50 0.64 0.58
Table 1. The performance of TAGNN compared with other baseline methods using two datasets.

In all, TAGNN aggregates session items into session graphs and further considers modeling user preference through target-aware attention. It is apparent from the table that the proposed TAGNN model achieves state-of-the-art performance on all datasets in terms of Precision@20 and MRR@20, which confirms the effectiveness of the proposed method.

This table is quite revealing in several ways. Firstly, traditional methods including POP and S-POP achieve poor performance. They mainly emphasize items with high co-occurrence, which is over-simplified in session-based recommendation. Interestingly, the simple method Item-KNN still shows favorable performance, compared with POP, BPR-MF, and FPMC. Without knowing the sequential information, Item-KNN only recommends items with high similarity. This may be explained by the fact that latent factors representing user preference play a key role in generating accurate recommendation. It can also be seen that Item-KNN surpasses most Markov-chain-based methods, such as BPR-MF and FPMC, which demonstrates that modeling limited dependencies in session sequence is not realistic in session-based recommendation scenarios.

Secondly, there is a clear trend that deep learning methods greatly outperform conventional models. These methods have a stronger capability to capture complex user behavior, leading to superior performance over traditional ones. Sequential models such as GRU4REC and NARM only considers single-way transitions between successive item. Compared with SR-GNN, which further models session as graphs and is able to capture more implicit connections between user clicks, these methods neglect complex item transitional patterns. However, the performance of these models is still inferior to that of the proposed method. TAGNN elaborates graph-based models by further considering user interests with target-aware attentions. This mechanism specifically activates diverse user interests given different target items, which improves the expressiveness of the recommender model. In summary, these results demonstrate the efficacy of the proposed TAGNN method.

3.3. Ablation Studies (RQ2)

Figure 2. The performance of different session representations.

We conduct ablation studies on session embedding strategies in this section. We design 4 model variants to analyze how different representations for user preference affect the model performance (RQ2): (a) local embedding only (TAGNN-L), (b) global embedding using average pooling only (TAGNN-Avg), (c) attentive global embedding (TAGNN-Att), and (d) local embedding plus attentive global embedding (TAGNN-L+Att). Figure 2 presents experimental results using different session representations.

It is apparent from this figure that the hybrid embedding strategy used by TAGNN achieves the best performance on all datasets, which verifies the necessity of explicitly incorporating target-aware attention for better representing user interests. Also, it is clear that the mixed use of attentive global embedding with local embedding outperforms other opponents, demonstrating the necessity to capture both long- and short-term preference exhibited within sessions. Please note that the performance of TAGNN-L, which merely uses the last item as the session representation stands out in this figure. It indicates that the last item has a great impact on a user’s final action. Moreover, we can observe that using average pooling over items in session achieves bad performance. This phenomenon may be explained by the diversity of user behavior in the session, which further highlights the importance of using the proposed target-aware attention to capture diverse user interests.

4. Conclusions

In this paper, we have developed a novel target attentive graph neural network model for session-based recommendation. By incorporating graph modeling and a target-aware attention module, the proposed TAGNN jointly considers user interests given a certain target item as well as complex item transitions in sessions. We have conducted thorough empirical evaluation to investigate TAGNN. Extensive experiments on real-world benchmark datasets demonstrate the effectiveness of our model.

This work is jointly supported by National Key Research and Development Program (2018YFB1402600, 2016YFB1001000) and National Natural Science Foundation of China (U19B2038, 61772528).


  • B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk (2016) Session-based Recommendations with Recurrent Neural Networks. In ICLR, Cited by: §1, §3.1.
  • T. N. Kipf and M. Welling (2017) Semi-Supervised Classification with Graph Convolutional Networks. In ICLR, Cited by: §2.2.
  • J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma (2017) Neural Attentive Session-based Recommendation. In CIKM, Cited by: §1, §1, §1, §3.1.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2016) Gated Graph Sequence Neural Networks. In ICLR, Cited by: §2.2, §3.1.
  • Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang (2018) STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation. In KDD, Cited by: §1, §1, §1, §3.1, §3.1, §3.1.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI, Cited by: §3.1.
  • S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme (2010) Factorizing Personalized Markov Chains for Next-basket Recommendation. In WWW, Cited by: §1, §3.1.
  • B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl (2001) Item-based Collaborative Filtering Recommendation Algorithms. In WWW, Cited by: §3.1.
  • F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The Graph Neural Network Model. IEEE T-NN 20 (1). Cited by: §2.2.
  • S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019) Session-based Recommendation with Graph Neural Networks. In AAAI, Cited by: §1, §1, §2.1, §3.1, §3.1, §3.1.
  • G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai (2018) Deep Interest Network for Click-Through Rate Prediction. In KDD, Cited by: §1.