1 Introduction
With the rapid growth of the amount of information on the Internet, recommendation systems have become fundamental for helping users alleviate information overload and select interesting information in many Web applications, e.g., search, e-commerce, and media streaming sites. Most existing recommendation systems assume that the user profile and past activities are constantly recorded. However, in many services, user identification may be unknown and only the user behavior history during an ongoing session is available. It is therefore of great importance to model the limited behavior in one session and generate recommendations accordingly. Under this circumstance, conventional recommendation methods, which rely on adequate user-item interactions, have difficulty yielding accurate results.
Owing to its high practical value, this problem has attracted increasing research interest, and many proposals for session-based recommendation have been developed. Based on Markov chains, some work [Shani, Brafman, and Heckerman 2002; Rendle, Freudenthaler, and Schmidt-Thieme 2010] predicts the user’s next behavior based on the previous one. Because of the strong independence assumption, combining past components independently limits the prediction accuracy. In recent years, the majority of research [Hidasi et al. 2016a; Tan, Xu, and Liu 2016; Tuan and Phuong 2017; Li et al. 2017a] applies Recurrent Neural Networks (RNNs) to session-based recommendation systems and obtains promising results. Hidasi et al. [2016a] first propose a recurrent neural network approach, and the model is then enhanced by data augmentation and by considering temporal shifts in user behavior [Tan, Xu, and Liu 2016]. Recently, NARM [Li et al. 2017a] designs a global and local RNN recommender to capture the user’s sequential behavior and main purposes simultaneously. Similar to NARM, STAMP [Liu et al. 2018] also captures users’ general interests and current interests, by employing simple MLP networks and an attentive net.
Although the methods above achieve satisfactory results and have become the state of the art, they still have some limitations. Firstly, without adequate user behavior in one session, these methods have difficulty in estimating user representations. Usually, the hidden vectors of these RNN methods are treated as the user representations, and recommendations are then generated based on these representations, for instance, in the global recommender of NARM. In session-based recommendation systems, however, sessions are mostly anonymous and numerous, and the user behavior implied by session clicks is often limited. It is thus difficult to accurately estimate the representation of each user from each session. Secondly, previous work reveals that patterns of item transitions are important and can be used as a local factor [Li et al. 2017a; Liu et al. 2018] in session-based recommendation, but these methods always model single-way transitions between consecutive items and neglect the transitions among the contexts, i.e. other items in the session. Thus, complex transitions among distant items are often overlooked by these methods.
To overcome the limitations mentioned above, we propose a novel method for Session-based Recommendation with Graph Neural Networks, SR-GNN for brevity, to explore rich transitions among items and generate accurate latent vectors of items. Graph Neural Networks (GNNs) [Scarselli et al. 2009; Li et al. 2015] are designed for generating representations for graphs. Recently, they have been widely employed to model graph-structured dependencies in natural language processing and computer vision applications, e.g., script event prediction [Li, Ding, and Liu 2018], situation recognition [Li et al. 2017b], and image classification [Marino, Salakhutdinov, and Gupta 2017]. For session-based recommendation, we first construct directed graphs from historical session sequences. Based on the session graph, a GNN is capable of capturing transitions of items and generating accurate item embedding vectors correspondingly, which are difficult to reveal with conventional sequential methods, such as MC-based and RNN-based methods. Based on accurate item embedding vectors, the proposed SR-GNN constructs more reliable session representations, from which the next-click item can be inferred.
Figure 1 illustrates the workflow of the proposed SR-GNN method. At first, all session sequences are modeled as directed session graphs, where each session sequence can be treated as a subgraph. Then, each session graph is processed in turn, and the latent vectors for all nodes involved in each graph are obtained through gated graph neural networks. After that, we represent each session as a composition of the global preference and the current interest of the user in that session, where the global and local session embedding vectors are both composed of the latent vectors of nodes. Finally, for each session, we predict the probability of each item being the next click. Extensive experiments conducted on real-world representative datasets demonstrate the effectiveness of the proposed method over the state-of-the-art methods. The main contributions of this work are summarized as follows:

We model separated session sequences as graph-structured data and use graph neural networks to capture complex item transitions. To the best of our knowledge, this presents a novel perspective on modeling in the session-based recommendation scenario.

To generate session-based recommendations, we do not rely on user representations, but use the session embedding, which can be obtained merely from the latent vectors of the items involved in each single session.

Extensive experiments conducted on real-world datasets show that SR-GNN evidently outperforms the state-of-the-art methods.
To make our results fully reproducible, all the relevant source code has been made public at https://github.com/CRIPACDIG/SRGNN.
The rest of this paper is organized as follows. We review prior related literature in Section 2. Section 3 presents the proposed method of session-based recommendation with graph neural networks. Detailed experiment results and analysis are shown in Section 4. Finally, we conclude this paper in Section 5.
2 Related Work
In this section, we review some related work on session-based recommendation systems, including conventional methods, sequential methods based on Markov chains, and RNN-based methods. Then, we introduce neural networks on graphs.
Conventional recommendation methods. Matrix factorization [Mnih and Salakhutdinov 2007; Koren, Bell, and Volinsky 2009; Koren and Bell 2011] is a general approach to recommendation systems. The basic objective is to factorize a user-item rating matrix into two low-rank matrices, each of which represents the latent factors of users or items. It is not very suitable for session-based recommendation, because the user preference is only provided by some positive clicks. The item-based neighborhood methods [Sarwar et al. 2001] are a natural solution, in which item similarities are calculated from co-occurrence in the same session. These methods have difficulty in considering the sequential order of items and generate predictions merely based on the last click.
Then, the sequential methods based on Markov chains are proposed, which predict users’ next behavior based on the previous ones. Treating recommendation generation as a sequential optimization problem, Shani, Brafman, and Heckerman [2002] employ Markov decision processes (MDPs) for the solution. Via factorization of the personalized probability transition matrices of users, FPMC [Rendle, Freudenthaler, and Schmidt-Thieme 2010] models sequential behavior between every two adjacent clicks and provides a more accurate prediction for each sequence. However, the main drawback of Markov-chain-based models is that they combine past components independently. Such an independence assumption is too strong and thus limits the prediction accuracy.
Deep-learning-based methods. Recently, some prediction models, especially language models [Mikolov et al. 2013], are proposed based on neural networks. Among numerous language models, the recurrent neural network (RNN) has been the most successful one in modeling sentences [Mikolov et al. 2010] and has been widely applied in various natural language processing tasks, such as machine translation [Cho et al. 2014], conversation machines [Serban et al. 2016], and image captioning [Mao et al. 2015]. RNNs have also been applied successfully in numerous other applications, such as sequential click prediction [Zhang et al. 2014], location prediction [Liu et al. 2016], and next basket recommendation [Yu et al. 2016].
For session-based recommendation, Hidasi et al. [2016a] propose the recurrent neural network approach, which is then extended to an architecture with parallel RNNs [Hidasi et al. 2016b] that can model sessions based on the clicks and features of the clicked items. After that, several methods are proposed based on these RNN approaches. Tan, Xu, and Liu [2016] enhance the performance of the recurrent model by using proper data augmentation techniques and taking temporal shifts in user behavior into account. Jannach and Ludewig [2017] combine the recurrent method with the neighborhood-based method to mix sequential patterns and co-occurrence signals. Tuan and Phuong [2017] incorporate session clicks with content features, such as item descriptions and item categories, to generate recommendations by using 3-dimensional convolutional neural networks. Besides, a list-wise deep neural network [Wu and Yan 2017] models the limited user behavior within each session, and uses a list-wise ranking model to generate the recommendation for each session. Furthermore, a neural attentive recommendation machine with an encoder-decoder architecture, i.e. NARM [Li et al. 2017a], employs the attention mechanism on an RNN to capture users’ features of sequential behavior and main purposes. Then, a short-term attention priority model (STAMP) [Liu et al. 2018], using simple MLP networks and an attentive net, is proposed to efficiently capture both users’ general interests and current interests.
Neural network on graphs. Nowadays, neural networks have been employed for generating representations for graph-structured data, e.g., social networks and knowledge bases. Extending word2vec [Mikolov et al. 2013], the unsupervised algorithm DeepWalk [Perozzi, Al-Rfou, and Skiena 2014] is designed to learn representations of graph nodes based on random walks. Following DeepWalk, the unsupervised network embedding algorithms LINE [Tang et al. 2015] and node2vec [Grover and Leskovec 2016] are the most representative methods. On the other hand, the classical neural networks CNN and RNN are also deployed on graph-structured data. Duvenaud et al. [2015] introduce a convolutional neural network that operates directly on graphs of arbitrary sizes and shapes. A scalable approach [Kipf and Welling 2016] chooses the convolutional architecture via a localized approximation of spectral graph convolutions, which is an efficient variant and can operate on graphs directly as well. However, these methods can only be implemented on undirected graphs. Previously, in the form of recurrent neural networks, Graph Neural Networks (GNNs) [Gori, Monfardini, and Scarselli 2005; Scarselli et al. 2009] were proposed to operate on directed graphs. As a modification of GNN, gated GNN [Li et al. 2015] uses gated recurrent units and employs back-propagation through time (BPTT) to compute gradients. Recently, GNNs have been broadly applied to different tasks, e.g., script event prediction [Li, Ding, and Liu 2018], situation recognition [Li et al. 2017b], and image classification [Marino, Salakhutdinov, and Gupta 2017].
3 The Proposed Method
In this section, we introduce the proposed SR-GNN, which applies graph neural networks to session-based recommendation. We formulate the problem first, then explain how to construct the graph from sessions, and finally describe the SR-GNN method thoroughly.
3.1 Notations
Session-based recommendation aims to predict which item a user will click next, solely based on the user’s current sequential session data without accessing the long-term preference profile. Here we give a formulation of this problem as below.
In session-based recommendation, let $V = \{v_1, v_2, \ldots, v_m\}$ denote the set consisting of all unique items involved in all the sessions. An anonymous session sequence $s$ can be represented by a list $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$ ordered by timestamps, where $v_{s,i} \in V$ represents a clicked item of the user within the session $s$. The goal of session-based recommendation is to predict the next click, i.e. the sequence label, $v_{s,n+1}$ for the session $s$. Under a session-based recommendation model, for the session $s$, we output probabilities $\hat{\mathbf{y}}$ for all possible items, where an element value of vector $\hat{\mathbf{y}}$ is the recommendation score of the corresponding item. The items with the top-$K$ values in $\hat{\mathbf{y}}$ will be the candidate items for recommendation.
3.2 Constructing session graphs
Each session sequence $s$ can be modeled as a directed graph $\mathcal{G}_s = (\mathcal{V}_s, \mathcal{E}_s)$. In this session graph, each node represents an item $v_{s,i} \in V$. Each edge $(v_{s,i-1}, v_{s,i}) \in \mathcal{E}_s$ means that a user clicks item $v_{s,i}$ after $v_{s,i-1}$ in the session $s$. Since several items may appear in the sequence repeatedly, we assign each edge a normalized weight, which is calculated as the occurrence of the edge divided by the outdegree of that edge’s start node. We embed every item $v \in V$ into a unified embedding space, and the node vector $\mathbf{v} \in \mathbb{R}^d$ indicates the latent vector of item $v$ learned via graph neural networks, where $d$ is the dimensionality. Based on node vectors, each session $s$ can be represented by an embedding vector $\mathbf{s}$, which is composed of the node vectors used in that graph.
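As an illustration of the edge weighting described above, the following minimal sketch computes normalized edge weights for a single session. The function name `session_edge_weights` and the toy session are hypothetical, and the sketch assumes that the outdegree of a node is counted as its total number of outgoing transitions, so that the weights leaving each node sum to one.

```python
from collections import Counter


def session_edge_weights(session):
    """Return {(src, dst): weight} for consecutive clicks in one session.

    Assumption: the weight of an edge is its number of occurrences divided
    by the total number of transitions leaving its start node.
    """
    edge_counts = Counter(zip(session[:-1], session[1:]))
    out_totals = Counter()
    for (src, _dst), count in edge_counts.items():
        out_totals[src] += count
    return {edge: count / out_totals[edge[0]] for edge, count in edge_counts.items()}


# Toy session in which item 2 is clicked twice.
weights = session_edge_weights([1, 2, 3, 2, 4])
print(weights)  # {(1, 2): 1.0, (2, 3): 0.5, (3, 2): 1.0, (2, 4): 0.5}
```

Item 2 has two outgoing transitions in this toy session, so each of its edges receives weight 0.5.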
3.3 Implementing graph neural networks with session graphs
Then, we present how to obtain latent vectors of nodes via graph neural networks. The vanilla graph neural network is proposed by Scarselli et al. [2009], extending neural network methods to process graph-structured data. Li et al. [2015] further introduce gated recurrent units and propose the gated GNN. Graph neural networks are well-suited for session-based recommendation, because they can automatically extract features of session graphs while taking rich node connections into account. We first demonstrate the learning process of node vectors in a session graph. Formally, for the node $v_{s,i}$ of graph $\mathcal{G}_s$, the update functions are given as follows:
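$$\mathbf{a}^{t}_{s,i} = \mathbf{A}_{s,i:}\left[\mathbf{v}^{t-1}_{1}, \ldots, \mathbf{v}^{t-1}_{n}\right]^{\top}\mathbf{H} + \mathbf{b}, \tag{1}$$
$$\mathbf{z}^{t}_{s,i} = \sigma\left(\mathbf{W}_{z}\mathbf{a}^{t}_{s,i} + \mathbf{U}_{z}\mathbf{v}^{t-1}_{i}\right), \tag{2}$$
$$\mathbf{r}^{t}_{s,i} = \sigma\left(\mathbf{W}_{r}\mathbf{a}^{t}_{s,i} + \mathbf{U}_{r}\mathbf{v}^{t-1}_{i}\right), \tag{3}$$
$$\tilde{\mathbf{v}}^{t}_{i} = \tanh\left(\mathbf{W}_{o}\mathbf{a}^{t}_{s,i} + \mathbf{U}_{o}\left(\mathbf{r}^{t}_{s,i} \odot \mathbf{v}^{t-1}_{i}\right)\right), \tag{4}$$
$$\mathbf{v}^{t}_{i} = \left(1 - \mathbf{z}^{t}_{s,i}\right) \odot \mathbf{v}^{t-1}_{i} + \mathbf{z}^{t}_{s,i} \odot \tilde{\mathbf{v}}^{t}_{i}, \tag{5}$$
where $\mathbf{H}$ controls the weight and $\mathbf{b}$ is a bias term, $\mathbf{z}_{s,i}$ and $\mathbf{r}_{s,i}$ are the update and reset gates respectively, $[\mathbf{v}^{t-1}_{1}, \ldots, \mathbf{v}^{t-1}_{n}]$ is the list of node vectors in session $s$, $\sigma(\cdot)$ is the sigmoid function, and $\odot$ is the element-wise multiplication operator. $\mathbf{v}_i \in \mathbb{R}^d$ represents the latent vector of node $v_{s,i}$. The connection matrix $\mathbf{A}_s \in \mathbb{R}^{n \times 2n}$ determines how nodes in the graph communicate with each other, and $\mathbf{A}_{s,i:} \in \mathbb{R}^{1 \times 2n}$ are the two columns of blocks in $\mathbf{A}_s$ corresponding to node $v_{s,i}$.
Here $\mathbf{A}_s$ is defined as a concatenation of two adjacency matrices $\mathbf{A}^{(\text{out})}_s$ and $\mathbf{A}^{(\text{in})}_s$, which represent the weighted connections of outgoing and incoming edges in the session graph respectively. For example, for a session $s$, the corresponding graph $\mathcal{G}_s$ and the matrix $\mathbf{A}_s$ are shown in Figure 2. Please note that SR-GNN can support different connection matrices for various kinds of constructed session graphs. If different strategies for constructing the session graph are used, the connection matrix will change accordingly. Moreover, when content features of nodes exist, such as descriptions and categorical information, the method can be further generalized. To be specific, we can concatenate the features with the node vector to deal with such information.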
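To make the structure of the connection matrix concrete, the sketch below builds it for one session as the concatenation of an outgoing and an incoming adjacency matrix. The function name `connection_matrix` and the toy session are hypothetical. The outgoing block follows the outdegree normalization described in Section 3.2; normalizing the incoming block by the indegree is an assumption of this sketch.

```python
def connection_matrix(session):
    """Return (nodes, A), where A is the n x 2n matrix [A_out | A_in] as nested lists."""
    nodes = list(dict.fromkeys(session))  # unique items, in first-click order
    index = {item: i for i, item in enumerate(nodes)}
    n = len(nodes)
    counts = [[0.0] * n for _ in range(n)]
    for src, dst in zip(session[:-1], session[1:]):
        counts[index[src]][index[dst]] += 1.0

    def normalize_rows(matrix):
        normalized = []
        for row in matrix:
            total = sum(row)
            normalized.append([x / total if total else 0.0 for x in row])
        return normalized

    a_out = normalize_rows(counts)  # row i: weights of edges leaving node i
    # Assumption: incoming edges are normalized by the indegree of node i.
    a_in = normalize_rows([list(col) for col in zip(*counts)])
    return nodes, [out_row + in_row for out_row, in_row in zip(a_out, a_in)]


nodes, A = connection_matrix([1, 2, 3, 2, 4])
for item, row in zip(nodes, A):
    print(item, row)
```

Each row of the result holds the outgoing weights of a node followed by its incoming weights, which is the row that the propagation step reads for that node.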
For each session graph $\mathcal{G}_s$, the gated graph neural network processes all nodes at the same time. Eq. (1) is used for information propagation between different nodes, under restrictions given by the matrix $\mathbf{A}_s$. Specifically, it extracts the latent vectors of neighborhoods and feeds them as input into the graph neural network. Then, two gates, i.e. the update and reset gates, decide what information is to be preserved and discarded respectively. After that, we construct the candidate state from the previous state, the current state, and the reset gate as described in Eq. (4). The final state is then the combination of the previous hidden state and the candidate state, under the control of the update gate. After updating all nodes in session graphs until convergence, we obtain the final node vectors.
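The propagation described in this paragraph can be sketched with NumPy as follows. This is only one possible reading, not a definitive implementation: all parameter names and shapes are hypothetical, messages along outgoing and incoming edges are transformed by separate weight matrices and then concatenated, and the candidate state uses a tanh nonlinearity as in a standard gated recurrent unit.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ggnn_step(A, V, p):
    """One propagation step for all n nodes of a session graph.

    A: (n, 2n) connection matrix [A_out | A_in]; V: (n, d) node vectors;
    p: dict of hypothetical parameter arrays.
    """
    n = V.shape[0]
    a_out = A[:, :n] @ (V @ p["H_out"] + p["b_out"])  # messages along outgoing edges
    a_in = A[:, n:] @ (V @ p["H_in"] + p["b_in"])     # messages along incoming edges
    a = np.concatenate([a_out, a_in], axis=1)         # (n, 2d) aggregated neighborhood
    z = sigmoid(a @ p["W_z"] + V @ p["U_z"])          # update gate
    r = sigmoid(a @ p["W_r"] + V @ p["U_r"])          # reset gate
    candidate = np.tanh(a @ p["W_o"] + (r * V) @ p["U_o"])
    return (1.0 - z) * V + z * candidate              # gated mix of old and candidate state


rng = np.random.default_rng(0)
n, d = 4, 8
shapes = {"H_out": (d, d), "H_in": (d, d), "b_out": (d,), "b_in": (d,),
          "W_z": (2 * d, d), "W_r": (2 * d, d), "W_o": (2 * d, d),
          "U_z": (d, d), "U_r": (d, d), "U_o": (d, d)}
params = {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}
A = np.array([[0, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, .5, .5, .5, 0, .5, 0],
              [0, 1, 0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0, 1, 0, 0]], dtype=float)
V = rng.normal(0.0, 0.1, size=(n, d))
V_next = ggnn_step(A, V, params)
print(V_next.shape)  # (4, 8)
```

Repeating `ggnn_step` a small number of times yields the final node vectors used to build the session embedding.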
3.4 Generating session embedding vectors
Previous session-based recommendation methods always assume that there exists a distinct latent representation of the user for each session. In contrast, the proposed SR-GNN method does not make any assumptions about that vector. Instead, a session is represented directly by the nodes involved in that session. To better predict users’ next clicks, we develop a strategy that combines the long-term preference and current interests of the session, and use this combined embedding as the session embedding.
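After feeding all session graphs into the gated graph neural networks, we obtain the vectors of all nodes. Then, to represent each session as an embedding vector $\mathbf{s} \in \mathbb{R}^d$, we first consider the local embedding $\mathbf{s}_{\text{l}}$ of session $s$. For session $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$, the local embedding can be simply defined as $\mathbf{v}_n$ of the last-clicked item $v_{s,n}$, i.e. $\mathbf{s}_{\text{l}} = \mathbf{v}_n$.
Then, we consider the global embedding $\mathbf{s}_{\text{g}}$ of the session graph $\mathcal{G}_s$ by aggregating all node vectors. Considering that the information in these embeddings may have different levels of priority, we further adopt the soft-attention mechanism to better represent the global session preference:
$$\alpha_i = \mathbf{q}^{\top}\sigma\left(\mathbf{W}_1\mathbf{v}_n + \mathbf{W}_2\mathbf{v}_i + \mathbf{c}\right), \qquad \mathbf{s}_{\text{g}} = \sum_{i=1}^{n}\alpha_i\mathbf{v}_i, \tag{6}$$
where parameters $\mathbf{q} \in \mathbb{R}^d$ and $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d \times d}$ control the weights of item embedding vectors, and $\mathbf{c}$ is a bias vector.
Finally, we compute the hybrid embedding $\mathbf{s}_{\text{h}}$ by taking a linear transformation over the concatenation of the local and global embedding vectors:
$$\mathbf{s}_{\text{h}} = \mathbf{W}_3\left[\mathbf{s}_{\text{l}}; \mathbf{s}_{\text{g}}\right], \tag{7}$$
where matrix $\mathbf{W}_3 \in \mathbb{R}^{d \times 2d}$ compresses the two combined embedding vectors into the latent space $\mathbb{R}^d$.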
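A minimal NumPy sketch of the session embedding follows, combining the last-clicked item (local part) with a soft-attention aggregation of the node vectors (global part). The names `q`, `W1`, `W2`, `c`, and `W3` are hypothetical parameter arrays. The sketch assumes that the attention weights are left unnormalized and that the rows of `V_seq` are the node vectors in click order.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def session_embedding(V_seq, q, W1, W2, c, W3):
    """Hybrid session embedding.

    V_seq: (n, d) node vectors in click order (last row = last-clicked item).
    q: (d,), W1 and W2: (d, d), c: (d,), W3: (d, 2d). Returns a (d,) vector.
    """
    s_local = V_seq[-1]                                        # current interest
    alpha = sigmoid(s_local @ W1.T + V_seq @ W2.T + c) @ q     # (n,) attention weights
    s_global = alpha @ V_seq                                   # weighted sum of node vectors
    return W3 @ np.concatenate([s_local, s_global])            # compress 2d -> d


rng = np.random.default_rng(1)
n, d = 5, 8
V_seq = rng.normal(0.0, 0.1, size=(n, d))
q, c = rng.normal(0.0, 0.1, size=d), np.zeros(d)
W1, W2 = rng.normal(0.0, 0.1, size=(d, d)), rng.normal(0.0, 0.1, size=(d, d))
W3 = rng.normal(0.0, 0.1, size=(d, 2 * d))
s_hybrid = session_embedding(V_seq, q, W1, W2, c, W3)
print(s_hybrid.shape)  # (8,)
```

Because the local part enters the concatenation directly, the last click always contributes to the session embedding even when its attention weight is small.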
3.5 Making recommendation and model training
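After obtaining the embedding of each session, we compute the score $\hat{z}_i$ for each candidate item $v_i \in V$ by multiplying its embedding $\mathbf{v}_i$ by the session representation $\mathbf{s}_{\text{h}}$, which can be defined as:
$$\hat{z}_i = \mathbf{s}_{\text{h}}^{\top}\mathbf{v}_i. \tag{8}$$
Then we apply a softmax function to get the output vector of the model $\hat{\mathbf{y}}$:
$$\hat{\mathbf{y}} = \operatorname{softmax}\left(\hat{\mathbf{z}}\right), \tag{9}$$
where $\hat{\mathbf{z}} \in \mathbb{R}^m$ denotes the recommendation scores over all candidate items and $\hat{\mathbf{y}} \in \mathbb{R}^m$ denotes the probabilities of nodes appearing to be the next click in session $s$.
For each session graph, the loss function is defined as the cross-entropy of the prediction and the ground truth. It can be written as follows:
$$\mathcal{L}(\hat{\mathbf{y}}) = -\sum_{i=1}^{m}\left[\mathbf{y}_i\log\left(\hat{\mathbf{y}}_i\right) + \left(1 - \mathbf{y}_i\right)\log\left(1 - \hat{\mathbf{y}}_i\right)\right], \tag{10}$$
where $\mathbf{y}$ denotes the one-hot encoding vector of the ground truth item.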
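The scoring and loss computation can be sketched in plain Python as below. The function names and toy vectors are hypothetical. The loss in this sketch is an assumption: it keeps both the $y\log\hat{y}$ and the $(1 - y)\log(1 - \hat{y})$ terms of a cross-entropy against a one-hot target, and adds a small constant inside the logarithms for numerical safety.

```python
import math


def recommend_scores(session_vec, item_vecs):
    """Dot-product score of every candidate item against the session vector."""
    return [sum(s * v for s, v in zip(session_vec, item)) for item in item_vecs]


def softmax(scores):
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index, eps=1e-12):
    """Cross-entropy against a one-hot target, with both y and (1 - y) terms."""
    loss = 0.0
    for i, p in enumerate(probs):
        y = 1.0 if i == target_index else 0.0
        loss -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return loss


session_vec = [0.2, -0.1, 0.4]
item_vecs = [[0.1, 0.0, 0.3], [0.5, 0.5, -0.2], [0.0, -0.4, 0.9]]
probs = softmax(recommend_scores(session_vec, item_vecs))
loss = cross_entropy(probs, target_index=2)
print(probs, loss)
```

In this toy example the third item has the largest dot product with the session vector, so it receives the highest probability and the lowest loss when it is the ground truth.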
Finally, we use the Back-Propagation Through Time (BPTT) algorithm to train the proposed SR-GNN model. Note that in session-based recommendation scenarios, most sessions are relatively short. Therefore, we suggest choosing a relatively small number of training steps to prevent overfitting.
3.6 Scalability and practical deployment
To train the model, the neighbors’ information is aggregated first and then the node status is updated using GRUs. Thus, the overall time complexity of training the model is determined by the average length of the sequence and the number of sessions. Since the average sequence length is much smaller than the number of sessions in reality, the proposed method scales linearly with the number of sessions.
As for practical deployment, the recommender can be divided into two parts, i.e. the offline part and the online part. The offline part learns item embeddings and thus does not require real-time updates, while the online part is only responsible for prediction, which can be done in real time.
4 Experiments and Analysis
In this section, we first describe the datasets, compared methods, and evaluation metrics used in the experiments. Then, we compare the proposed SR-GNN with other comparative methods. Finally, we make a detailed analysis of SR-GNN under different experimental settings.
4.1 Datasets
We evaluate the proposed method on two real-world representative datasets, i.e. Yoochoose (http://2015.recsyschallenge.com/challege.html) and Diginetica (http://cikm2016.cs.iupui.edu/cikmcup). The Yoochoose dataset is obtained from the RecSys Challenge 2015, which contains a stream of user clicks on an e-commerce website within 6 months. The Diginetica dataset comes from CIKM Cup 2016, where only its transactional data is used.
Table 1: Statistics of the datasets used in the experiments.
Statistics               Yoochoose 1/64   Yoochoose 1/4   Diginetica
# of clicks              557,248          8,326,407       982,961
# of training sessions   369,859          5,917,745       719,470
# of test sessions       55,898           55,898          60,858
# of items               16,766           29,618          43,097
Average length           6.16             5.71            5.12
For fair comparison, following [Li et al. 2017a; Liu et al. 2018], we filter out all sessions of length 1 and items appearing fewer than 5 times in both datasets. The remaining 7,981,580 sessions and 37,483 items constitute the Yoochoose dataset, while 204,771 sessions and 43,097 items constitute the Diginetica dataset. We set the sessions of subsequent days as the test set for Yoochoose, and the sessions of subsequent weeks as the test set for Diginetica. Furthermore, similar to [Tan, Xu, and Liu 2016], we generate sequences and corresponding labels by splitting the input sequence. For example, for an input session $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$, we generate a series of sequences and labels $([v_{s,1}], v_{s,2}), ([v_{s,1}, v_{s,2}], v_{s,3}), \ldots, ([v_{s,1}, v_{s,2}, \ldots, v_{s,n-1}], v_{s,n})$, where $[v_{s,1}, v_{s,2}, \ldots, v_{s,n-1}]$ is the generated sequence and $v_{s,n}$ denotes the next-clicked item, i.e. the label of the sequence. Following [Li et al. 2017a; Liu et al. 2018], we also use the most recent fractions 1/64 and 1/4 of the training sequences of Yoochoose. The statistics of the datasets are summarized in Table 1.
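The preprocessing described above can be sketched as two small helpers. The function names are hypothetical, and the order of the two filtering steps (rare items first, then sessions that have become too short) is an assumption of this sketch rather than something stated in the text.

```python
from collections import Counter


def filter_sessions(sessions, min_item_count=5):
    """Drop rare items, then drop sessions left with fewer than two clicks."""
    counts = Counter(item for session in sessions for item in session)
    kept = [[item for item in session if counts[item] >= min_item_count]
            for session in sessions]
    return [session for session in kept if len(session) > 1]


def split_session(session):
    """Turn one session into (prefix, label) pairs: each prefix predicts its next click."""
    return [(session[:i], session[i]) for i in range(1, len(session))]


pairs = split_session(["a", "b", "c", "d"])
print(pairs)  # [(['a'], 'b'), (['a', 'b'], 'c'), (['a', 'b', 'c'], 'd')]
```

A session with n clicks therefore contributes n - 1 training sequences, which is why the number of training sequences can exceed the number of raw sessions.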
4.2 Baseline Algorithms
To evaluate the performance of the proposed method, we compare it with the following representative baselines:

POP and S-POP recommend the most frequent items in the training set and in the current session, respectively.

Item-KNN [Sarwar et al. 2001] recommends items similar to the previously clicked item in the session, where similarity is defined as the cosine similarity between the vectors of sessions.

BPR-MF [Rendle et al. 2009] optimizes a pairwise ranking objective function via stochastic gradient descent.

FPMC [Rendle, Freudenthaler, and Schmidt-Thieme 2010] is a sequential prediction method based on Markov chains.

GRU4REC [Hidasi et al. 2016a] uses RNNs to model user sequences for session-based recommendation.

NARM [Li et al. 2017a] employs RNNs with an attention mechanism to capture the user’s main purpose and sequential behavior.

STAMP [Liu et al. 2018] captures users’ general interests from the current session and current interests from the last click.
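For intuition about the two simplest baselines in the list above, the popularity-based recommenders can be sketched as follows. The function names are hypothetical, and tie-breaking among equally frequent items (here, first-seen order) is an assumption of this sketch.

```python
from collections import Counter


def pop_recommend(training_sessions, k=20):
    """POP: the k most frequent items over the whole training set."""
    counts = Counter(item for session in training_sessions for item in session)
    return [item for item, _count in counts.most_common(k)]


def spop_recommend(current_session, k=20):
    """S-POP: the k most frequent items within the current session only."""
    counts = Counter(current_session)
    return [item for item, _count in counts.most_common(k)]


print(pop_recommend([[1, 2], [2, 3], [2, 1]], k=2))  # [2, 1]
print(spop_recommend([5, 6, 5, 7], k=1))             # [5]
```

POP returns the same list for every session, whereas S-POP depends only on the clicks observed so far in the current session.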
4.3 Evaluation Metrics
The following metrics are used to evaluate the compared methods.
P@20 (Precision) is widely used as a measure of predictive accuracy. It represents the proportion of correctly recommended items amongst the top-20 items.
MRR@20 (Mean Reciprocal Rank) is the average of the reciprocal ranks of the correctly recommended items. The reciprocal rank is set to 0 when the rank exceeds 20. The MRR measure considers the order of the recommendation ranking, where a large MRR value indicates that correct recommendations appear at the top of the ranking list.
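Both metrics can be computed with a short helper such as the one below. The function name is hypothetical. The sketch assumes one ground-truth next item per test sequence, so that P@k reduces to the fraction of test sequences whose target appears in the top-k list, and it returns fractions in [0, 1] rather than percentages.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """Return (P@k, MRR@k) as fractions in [0, 1].

    ranked_lists[i] is the ranking for test case i (best item first) and
    targets[i] is its single ground-truth next item.
    """
    hits, reciprocal_ranks = 0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top_k = ranking[:k]
        if target in top_k:
            hits += 1
            reciprocal_ranks += 1.0 / (top_k.index(target) + 1)
        # A target ranked below k contributes 0 to both metrics.
    n = len(targets)
    return hits / n, reciprocal_ranks / n


rankings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p_at_2, mrr_at_2 = precision_and_mrr_at_k(rankings, targets=[2, 6, 0], k=2)
print(p_at_2, mrr_at_2)
```

In the toy call only the first target falls within the top 2, at rank 2, so one of three cases is a hit and its reciprocal rank is 1/2.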
4.4 Parameter Setup
Following previous methods [Li et al. 2017a; Liu et al. 2018], we use the same dimensionality of latent vectors for both datasets. Besides, we select other hyper-parameters on a validation set, which is a random subset of the training set. All parameters are initialized using a Gaussian distribution with a mean of 0 and a standard deviation of 0.1. The mini-batch Adam optimizer is used to optimize these parameters, where the initial learning rate is set to 0.001 and decays by a factor of 0.1 after every 3 epochs. Moreover, the batch size is set to 100 and an L2 penalty is applied.
4.5 Comparison with baseline methods
To demonstrate the overall performance of the proposed model, we compare it with other state-of-the-art session-based recommendation methods. The overall performance in terms of P@20 and MRR@20 is shown in Table 2, with the best results highlighted in boldface. Please note that, as in [Li et al. 2017a], due to insufficient memory to initialize FPMC, its performance on Yoochoose 1/4 is not reported.
Table 2: Performance of SR-GNN and the baseline methods in terms of P@20 and MRR@20.
Algorithm   Yoochoose 1/64      Yoochoose 1/4       Diginetica
            P@20     MRR@20     P@20     MRR@20     P@20     MRR@20
POP         6.71     1.65       1.33     0.30       0.91     0.23
S-POP       30.44    18.35      27.08    17.75      21.07    14.69
Item-KNN    51.60    21.81      52.31    21.70      28.35    9.45
BPR-MF      31.31    12.08      3.40     1.57       15.19    8.63
FPMC        45.62    15.01      –        –          31.55    8.92
GRU4REC     60.64    22.89      59.53    22.60      43.82    15.46
NARM        68.32    28.63      69.73    29.23      62.58    27.35
STAMP       68.74    29.67      70.44    30.00      62.03    27.38
SR-GNN      70.57    30.94      71.36    31.89      63.03    27.42
SR-GNN aggregates separated session sequences into graph-structured data. In this model, we jointly consider the global session preference as well as the local interests. According to the experiments, the proposed SR-GNN method achieves the best performance among all methods on the three datasets in terms of P@20 and MRR@20. This verifies the effectiveness of the proposed method.
The performance of traditional algorithms like POP and S-POP is relatively poor. Such simple models make recommendations solely based on repetitive co-occurring items or successive items, which is problematic in session-based recommendation scenarios. Even so, S-POP still outperforms competitors such as POP, BPR-MF, and FPMC in terms of MRR@20, demonstrating the importance of session contextual information. Item-KNN achieves better results than FPMC, which is based on Markov chains, in most cases. Please note that Item-KNN utilizes only the similarity between items without considering sequential information. This indicates that the assumption of independence between successive items, which traditional MC-based methods mostly rely on, is not realistic.
Neural-network-based methods, such as NARM and STAMP, outperform the conventional methods, demonstrating the power of adopting deep learning in this domain. Short/long-term memory models, like GRU4REC and NARM, use recurrent units to capture a user’s general interest, while STAMP improves the short-term memory by utilizing the last-clicked item. Those methods explicitly model the users’ global behavioral preferences and consider transitions between users’ previous actions and the next click, leading to superior performance over the traditional methods. However, their performance is still inferior to that of the proposed method. Compared with state-of-the-art methods like NARM and STAMP, SR-GNN further considers transitions between items in a session and thereby models every session as a graph, which can capture more complex and implicit connections between user clicks. In contrast, NARM and GRU4REC explicitly model each user and obtain the user representations through separated session sequences, ignoring possible interactive relationships between items. Therefore, the proposed model is more powerful in modeling session behavior.
Besides, SR-GNN adopts the soft-attention mechanism to generate a session representation, which can automatically select the most significant item transitions and neglect noisy and ineffective user actions in the current session. On the contrary, STAMP only uses the transition between the last-clicked item and previous actions, which may not be sufficient. Other RNN models, such as GRU4REC and NARM, also fail to select impactful information during the propagation process. They use all previous items to obtain a vector representing the user’s general interest. When a user’s behavior is aimless, or the user’s interests drift quickly in the current session, conventional models cope poorly with noisy sessions.
4.6 Comparison with different connection schemes
The proposed SR-GNN method is flexible in constructing connecting relationships between items in the graph. Since user behavior in sessions is limited, we propose in this section another two connection variants in order to augment the limited relationships between items in each session graph. Firstly, we aggregate all session sequences together and model them as a directed whole item graph, which is termed the global graph hereafter. In the global graph, each node denotes a unique item, and each edge denotes a directed transition from one item to another. Secondly, we explicitly model all high-order relationships between items within one session as direct connections. In summary, the following two connection schemes are proposed to compare with SR-GNN:

SR-GNN with normalized global connections (SR-GNN-NGC) replaces the connection matrix with edge weights extracted from the global graph on the basis of SR-GNN.

SR-GNN with full connections (SR-GNN-FC) represents all higher-order relationships using boolean weights and appends its corresponding connection matrix to that of SR-GNN.
The results of different connection schemes are shown in Figure 3. From the figures, it is seen that all three connection schemes achieve better or almost the same performance as the state-of-the-art STAMP and NARM methods, confirming the usefulness of modeling sessions as graphs.
Compared with SR-GNN, for each session, SR-GNN-NGC takes the impact of other sessions into consideration in addition to items in the current session, which subsequently reduces the influence of edges that are connected to nodes with high degree within the current session graph. Such a fusion method notably affects the integrity of the current session, especially when the weights of the edges in the graph vary, leading to a performance downgrade.
In regard to SR-GNN and SR-GNN-FC, the former only models the exact relationship between consecutive items, while the latter further explicitly regards all high-order relationships as direct connections. SR-GNN-FC performs worse than SR-GNN, though the experimental results of the two methods do not differ much. Such a small difference in results suggests that in most recommendation scenarios, not every high-order transition can be directly converted to a straight connection, and intermediate stages between high-order items are still necessary. For instance, considering that the user has viewed the following pages when browsing a website: A → B → C, it is not appropriate to recommend page C directly after A without the intermediate page B, due to the lack of a direct connection between A and C.
4.7 Comparison with different session representations
We compare the session embedding strategy with the following three approaches: (1) local embedding only (SR-GNN-L), (2) global embedding with average pooling (SR-GNN-AVG), and (3) global embedding with the attention mechanism (SR-GNN-ATT). The results of the methods with three different embedding strategies are given in Figure 4.
From the figures, it can be observed that the hybrid embedding method SR-GNN achieves the best results on all three datasets, which validates the importance of explicitly incorporating current session interests with the long-term preference. Furthermore, the figures show that SR-GNN-ATT performs better than SR-GNN-AVG with average pooling on the three datasets. This indicates that the session may contain some noisy behavior, which cannot be treated independently. Besides, it is shown that attention mechanisms are helpful in extracting the significant behavior from the session data to construct the long-term preference.
Please note that SR-GNN-L, a downgraded version of SR-GNN, still outperforms SR-GNN-AVG and achieves almost the same performance as SR-GNN-ATT, supporting that both the current interest and the long-term preference are crucial for session-based recommendation.
4.8 Analysis on session sequence lengths
We further analyze the capability of different models to cope with sessions of different lengths. For comparison, we partition the sessions of Yoochoose 1/64 and Diginetica into two groups, where “Short” indicates that the length of a session is less than or equal to 5, while each session in “Long” has more than 5 items. The pivot value 5 is chosen because it is the closest integer to the average length of all sessions in all datasets. The proportions of sessions belonging to the short group and the long group are 0.701 and 0.299 on the Yoochoose data, and 0.764 and 0.236 on the Diginetica data. For each method, we report the results evaluated in terms of P@20 in Table 3.
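The partition into the two groups can be expressed as a short helper. The function name and the toy sessions are hypothetical; only the pivot value 5 and the "less than or equal to" rule come from the description above.

```python
def split_by_length(sessions, pivot=5):
    """Partition sessions into a short group (length <= pivot) and a long group."""
    short = [session for session in sessions if len(session) <= pivot]
    long_group = [session for session in sessions if len(session) > pivot]
    return short, long_group


toy_sessions = [[1, 2], [3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6], [9, 8, 7]]
short, long_group = split_by_length(toy_sessions)
print(len(short) / len(toy_sessions), len(long_group) / len(toy_sessions))  # 0.75 0.25
```

Evaluating each model separately on the two groups then gives one P@20 value per group and dataset.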
Our proposed SRGNN and its variants perform stably on the two datasets across session lengths, which demonstrates the superior performance of the proposed method and the adaptability of graph neural networks to session-based recommendation. In contrast, the performance of STAMP changes greatly between the short and long groups. STAMP [Liu et al.2018] explains such a difference in terms of replicated actions: it adopts the attention mechanism, so replicated items can be ignored when obtaining user representations. Similar to STAMP, on Yoochoose, NARM achieves good performance on the short group, but its performance drops quickly as session length increases, partly because RNN models have difficulty coping with long sequences.
Then we analyze the performance of SRGNNL, SRGNNATT, and SRGNN with different session representations. These three methods achieve promising results compared with STAMP and NARM. This is probably because, based on the learning framework of graph neural networks, our methods can attain more accurate node vectors. Such node embeddings not only capture the latent features of nodes but also model the node connections globally. On this basis, the performance is stable among the variants of SRGNN, while the performance of the two state-of-the-art methods fluctuates considerably between short and long sessions. Moreover, the table shows that SRGNNL can also achieve good results, although this variant uses only local session embedding vectors. This may be because SRGNNL also implicitly considers the properties of the first-order and higher-order nodes in session graphs. Such results are also validated by Figure 4, where both SRGNNL and SRGNNATT achieve close-to-optimal performance.
Table 3: P@20 of different methods on short and long sessions.
Method  Yoochoose 1/64 (Short)  Yoochoose 1/64 (Long)  Diginetica (Short)  Diginetica (Long)
NARM  71.44  60.79  62.04  64.33
STAMP  70.69  64.73  59.91  64.58
SRGNNL  70.11  69.73  59.2  60.34
SRGNNATT  70.31  70.64  62.72  63.55
SRGNN  70.47  70.70  62.97  63.92
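The stability claim can be checked with simple arithmetic on the reported numbers. The sketch below copies the P@20 values from Table 3 and computes, for each method, the absolute gap between the short and long groups on each dataset; the variable and function names are hypothetical.

```python
# P@20 from Table 3, ordered as:
# (Yoochoose short, Yoochoose long, Diginetica short, Diginetica long)
table3 = {
    "NARM": (71.44, 60.79, 62.04, 64.33),
    "STAMP": (70.69, 64.73, 59.91, 64.58),
    "SRGNNL": (70.11, 69.73, 59.2, 60.34),
    "SRGNNATT": (70.31, 70.64, 62.72, 63.55),
    "SRGNN": (70.47, 70.70, 62.97, 63.92),
}


def length_gaps(row):
    """Absolute short/long gap on Yoochoose 1/64 and on Diginetica."""
    y_short, y_long, d_short, d_long = row
    return round(abs(y_short - y_long), 2), round(abs(d_short - d_long), 2)


gaps = {name: length_gaps(row) for name, row in table3.items()}

# Smallest gap among the two baselines versus largest gap among SRGNN variants.
baseline_min = min(min(gaps[m]) for m in ("NARM", "STAMP"))
srgnn_max = max(max(gaps[m]) for m in ("SRGNNL", "SRGNNATT", "SRGNN"))

for name, (yoochoose_gap, diginetica_gap) in gaps.items():
    print(name, yoochoose_gap, diginetica_gap)
```

With these numbers, the largest short/long gap among the SRGNN variants is 1.14 points, while the smallest gap for NARM or STAMP is 2.29 points.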
5 Conclusions
Session-based recommendation is indispensable where users’ preferences and historical records are hard to obtain. This paper presents a novel architecture for session-based recommendation that incorporates graph models into the representation of session sequences. The proposed method not only considers the complex structure and transitions between items of session sequences, but also develops a strategy to combine long-term preferences and current interests of sessions to better predict users’ next actions. Comprehensive experiments confirm that the proposed algorithm can consistently outperform other state-of-the-art methods.
Acknowledgements
The first two authors, Shu Wu and Yuyuan Tang, contributed equally to this work. The work was done during the internship of Yanqiao Zhu and Yuyuan Tang at the Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. The corresponding author is Yanqiao Zhu.
This work is jointly supported by the National Key Research and Development Program (2016YFB1001000) and the National Natural Science Foundation of China (61403390, U1435221).
References
 [Cho et al.2014] Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1724–1734.
 [Duvenaud et al.2015] Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; and Adams, R. P. 2015. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2, NIPS’15.
 [Gori, Monfardini, and Scarselli2005] Gori, M.; Monfardini, G.; and Scarselli, F. 2005. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, 729–734.
 [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. Node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, 855–864. New York, NY, USA: ACM.
 [Hidasi et al.2016a] Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; and Tikk, D. 2016a. Session-based recommendations with recurrent neural networks. In Proceedings of the 2016 International Conference on Learning Representations, ICLR ’16.
 [Hidasi et al.2016b] Hidasi, B.; Quadrana, M.; Karatzoglou, A.; and Tikk, D. 2016b. Parallel recurrent neural network architectures for feature-rich session-based recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, 241–248. New York, NY, USA: ACM.
 [Jannach and Ludewig2017] Jannach, D., and Ludewig, M. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, 306–310. New York, NY, USA: ACM.
 [Kipf and Welling2016] Kipf, T. N., and Welling, M. 2016. Semi-supervised classification with graph convolutional networks. In Proceedings of the 2016 International Conference on Learning Representations, ICLR ’16.
 [Koren and Bell2011] Koren, Y., and Bell, R. 2011. Advances in collaborative filtering. In Recommender Systems Handbook. Springer. 145–186.
 [Koren, Bell, and Volinsky2009] Koren, Y.; Bell, R.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. Computer 42(8):30–37.
 [Li et al.2015] Li, Y.; Tarlow, D.; Brockschmidt, M.; and Zemel, R. S. 2015. Gated graph sequence neural networks. In Proceedings of the 2015 International Conference on Learning Representations, volume abs/1511.05493 of ICLR ’15.
 [Li et al.2017a] Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; and Ma, J. 2017a. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM ’17, 1419–1428. New York, NY, USA: ACM.
 [Li et al.2017b] Li, R.; Tapaswi, M.; Liao, R.; Jia, J.; Urtasun, R.; and Fidler, S. 2017b. Situation recognition with graph neural networks. In 2017 IEEE International Conference on Computer Vision (ICCV), 4183–4192.
 [Li, Ding, and Liu2018] Li, Z.; Ding, X.; and Liu, T. 2018. Constructing narrative event evolutionary graph for script event prediction.

 [Liu et al.2016] Liu, Q.; Wu, S.; Wang, L.; and Tan, T. 2016. Predicting the next location: A recurrent model with spatial and temporal contexts. In AAAI Conference on Artificial Intelligence, 194–200.
 [Liu et al.2018] Liu, Q.; Zeng, Y.; Mokhosi, R.; and Zhang, H. 2018. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, 1831–1839. New York, NY, USA: ACM.
 [Mao et al.2015] Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Huang, Z.; and Yuille, A. 2015. Deep captioning with multimodal recurrent neural networks (m-RNN). In International Conference on Learning Representations.

 [Marino, Salakhutdinov, and Gupta2017] Marino, K.; Salakhutdinov, R.; and Gupta, A. 2017. The more you know: Using knowledge graphs for image classification. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 20–28.
 [Mikolov et al.2010] Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; and Khudanpur, S. 2010. Recurrent neural network based language model. In INTERSPEECH, volume 2, 3.
 [Mikolov et al.2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Annual Conference on Neural Information Processing Systems, 3111–3119.
 [Mnih and Salakhutdinov2007] Mnih, A., and Salakhutdinov, R. 2007. Probabilistic matrix factorization. In Advances in neural information processing systems, 1257–1264.
 [Perozzi, AlRfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, 701–710. New York, NY, USA: ACM.
 [Rendle et al.2009] Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 452–461.
 [Rendle, Freudenthaler, and SchmidtThieme2010] Rendle, S.; Freudenthaler, C.; and Schmidt-Thieme, L. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, 811–820. ACM.
 [Sarwar et al.2001] Sarwar, B.; Karypis, G.; Konstan, J.; and Riedl, J. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01.
 [Scarselli et al.2009] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009. The graph neural network model. IEEE Transactions on Neural Networks 20(1):61–80.
 [Serban et al.2016] Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A.; and Pineau, J. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, 3776–3784.
 [Shani, Brafman, and Heckerman2002] Shani, G.; Brafman, R. I.; and Heckerman, D. 2002. An MDP-based recommender system. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, UAI’02, 453–460. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
 [Tan, Xu, and Liu2016] Tan, Y. K.; Xu, X.; and Liu, Y. 2016. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, 17–22. New York, NY, USA: ACM.
 [Tang et al.2015] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, 1067–1077. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.
 [Tuan and Phuong2017] Tuan, T. X., and Phuong, T. M. 2017. 3D convolutional networks for session-based recommendation with content features. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, 138–146. New York, NY, USA: ACM.
 [Wu and Yan2017] Wu, C., and Yan, M. 2017. Session-aware information embedding for e-commerce product recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM ’17, 2379–2382. New York, NY, USA: ACM.
 [Yu et al.2016] Yu, F.; Liu, Q.; Wu, S.; Wang, L.; and Tan, T. 2016. A dynamic recurrent basket recommendation model. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
 [Zhang et al.2014] Zhang, Y.; Dai, H.; Xu, C.; Feng, J.; Wang, T.; Bian, J.; Wang, B.; and Liu, T.-Y. 2014. Sequential click prediction for sponsored search with recurrent neural networks. In AAAI Conference on Artificial Intelligence, 1369–1376.