Balancing Multi-level Interactions for Session-based Recommendation

10/29/2019 ∙ by Yujia Zheng, et al. ∙ 0

Predicting user actions based on anonymous sessions is a challenge for general recommendation systems because the lack of user profiles heavily limits data-driven models. Recently, session-based recommendation methods have achieved remarkable results on this task. However, the upper bound of performance can still be raised through the innovative exploration of limited data. In this paper, we propose a novel method, namely Intra- and Inter-session Interaction-aware Graph-enhanced Network, to take inter-session item-level interactions into account. Different from existing intra-session item-level interactions and session-level collaborative information, our introduced data represents complex item-level interactions between different sessions. To mine the new data without breaking the equilibrium of the model between different interactions, we construct an intra-session graph and an inter-session graph for the current session. The former focuses on item-level interactions within a single session and the latter models those between items among neighborhood sessions. The two graphs are then encoded with different approaches that match their different structures, and the generated latent vectors are combined to balance the model across different scopes. Experiments on real-world datasets verify that our method outperforms other state-of-the-art methods.




1. Introduction

Session-based Recommendation Systems (SRS) have attracted much attention for their practical value, especially in real-world scenarios that generate large volumes of anonymous interaction data (e.g., social media, e-commerce and web search) (Ludewig and Jannach, 2018). Unlike most other recommendation tasks, which need explicit user preference profiles, SRS relies only on anonymous user action logs (e.g., clicks) in an ongoing session to predict the user's next action (Ludewig and Jannach, 2018; Quadrana et al., 2018).

Under these circumstances, several methods have been proposed to tackle the SRS task. Markov Chains (MC) (Zimdars et al., 2001; Shani et al., 2005; Rendle et al., 2010) are representative of traditional methods. They predict the user's next action based on the previous one and thus introduce sequentiality into SRS. Recently, neural network-based methods, e.g., those built on Recurrent Neural Networks (RNN), have become popular due to their strong ability to model sequential data. For instance, experiments conducted on real-world datasets (Hidasi et al., 2015) show that the Gated Recurrent Unit (GRU) significantly improves the performance of SRS compared with traditional methods. Building on (Hidasi et al., 2015), Tan et al. (Tan et al., 2016) improve the recommendation performance by applying data augmentation. After that, Li et al. (Li et al., 2017) propose NARM to capture more representative features by using a global and local RNN structure. Similarly, Liu et al. (Liu et al., 2018) propose STAMP to model general and current interests using a novel attention mechanism. Moreover, as geometric deep learning methods (e.g., Graph Neural Networks (GNN)) have achieved state-of-the-art performance in various tasks, they have also been applied to SRS after modeling sessions as graph-structured data (Wu et al., 2019).

Despite the surprising performance of deep learning methods, neighborhood-based methods can still provide competitive results. Traditional Item-based KNN (Item-KNN) (Sarwar et al., 2001) considers the similarities between the last item in a current session and other items and recommends the most similar items to users. More recent methods, such as Session-based KNN (SKNN), consider each session as a whole and use the similarity between sessions to make recommendations (Ludewig and Jannach, 2018; Jannach and Ludewig, 2017; Bonnin and Jannach, 2015). Then, based on SKNN, KNN-RNN (Jannach and Ludewig, 2017) widens the application area by integrating GRU4REC to model the sequentiality. Most recently, CSRM (Wang et al., 2019) achieves state-of-the-art performance by applying Parallel Memory Modules on NARM to incorporate collaborative information. It calculates the similarities between the current and other session representations from the external memory module to extract collaborative information.

However, all existing methods are deficient in exploiting the depth or width of session data, or both. RNN-based methods (Hidasi et al., 2015; Li et al., 2017; Liu et al., 2018; Jannach and Ludewig, 2017; Wang et al., 2019), including those combined with neighborhood-based methods (Jannach and Ludewig, 2017; Wang et al., 2019), only support unidirectional interactions between consecutive items and neglect those among other contextual items in the same session. SR-GNN (Wu et al., 2019) applies GNN to overcome this limitation but only within a small scope of a single session, which is a common deficiency in almost all purely neural network-based methods. Because there is no side-information (e.g., user profiles) in SRS, being poor at mining in a wider scope consisting of multiple sessions leads to a lack of collaborative information that limits the upper bound of model performance. Item-KNN-based methods do consider a large set of items that spread across multiple sessions, but they only treat items as independent elements, ignoring the integrity and sequentiality of sessions. While SKNN considers the integrity by measuring similarities between different sessions and its improved version KNN-RNN integrates GRU4REC to extract intra-session item-level interactions, interactions of items among different sessions are ignored by treating sessions as the minimum granularity. CSRM only introduces an end-to-end neural network method to push performance higher, but it does not address the aforementioned issue of coarse granularity and still suffers from the lack of contextual item-level interactions as RNN-based methods.

Figure 1. Here are three kinds of interactions. (a): Interactions between four users are Inter-session Session-level Interactions, and those of items within a single user’s session are Intra-session Item-level Interactions. (b): Inter-session Item-level Interactions.

Thus, we propose a novel method, namely Intra- and Inter-Session Interaction-aware Graph-enhanced Network (I3GN), to exploit a source of information that existing methods leave untouched: inter-session item-level interactions. Figure 1 visualizes the difference between the newly introduced interactions and the others. The motivation for introducing them is to model interactions among related items naturally. For example, in Figure 1, a computer in the current session is linked with a camera, a phone, and a briefcase in different sessions. We can therefore assume that those items have certain commonalities and that each of them may be significantly relevant to the computer. Other items in the sessions of those related items may have similar effects, albeit to a lesser extent. Previous methods such as CSRM mix related items with relatively unimportant ones when they coarsely compress them into separate session representations; these are the inter-session session-level interactions (Figure 1a). In contrast, our inter-session item-level interactions explicitly highlight the importance of related items by considering all interactions among them (Figure 1b). Focusing on these interactions alone, however, is not enough to make a strong recommendation. Therefore, we propose the Session Merging Module (SMM) to construct intra- and inter-session graphs for intra- and inter-session item-level interactions respectively. Because related items share a certain commonality but occupy different positions relative to the computer, we use undirected edges to build the inter-session graph. Because sequentiality matters a lot for a single current session without any collaborative information, we use a directed graph as the intra-session graph. The Cross-Scope Encoder (CSE) is then applied to these graphs to model complex item-level interactions across different scopes, using the ability of GNNs to extract rich contextual interactions through node propagation. In this way, not only are contextual items in the current session well represented, but different items in the inter-session graph also receive different weights based on their distance to the target item node. Because of the structural difference, we use different approaches to encode each kind of graph. Finally, for the current session, we combine the latent vectors of its two graphs and generate its representation vector for prediction.

The main contributions of our work are summarized as follows:

  • We propose a novel graph-based model I3GN. To the best of our knowledge, it is the first method to integrate inter-session item-level interactions in SRS.

  • We introduce an SMM to model a current session as an intra-session graph and an inter-session graph, which lays a solid foundation for the extraction of complex item-level interactions.

  • We design a CSE to balance the model across different kinds of session data. Inspired by SR-GNN, CSE introduces a wider scope of item-level interactions to raise the performance upper bound.

  • We evaluate our model on two real-world datasets. Our extensive experiments show that I3GN outperforms state-of-the-art methods.

2. Related Work

In this section, we introduce some related works in Session-based Recommendation, Collaborative Filtering, and Graph Embedding.

2.1. Session-based Recommendation

Traditional methods for session-based recommendation are mainly based on Markov Chains (MC) (Zimdars et al., 2001; Mobasher et al., 2002; Shani et al., 2005; Rendle et al., 2010), which introduce sequentiality into SRS by predicting the user's next action based on the last action. Zimdars et al. (Zimdars et al., 2001) apply probabilistic decision-tree models to study how to extract the sequentiality. Mobasher et al. (Mobasher et al., 2002) choose contiguous sequential patterns for SRS after studying the effect of different patterns. Shani et al. (Shani et al., 2005) employ Markov Decision Processes that consider the long-term effect and the expected value of each recommendation. However, MC-based methods fail to balance the user's general preference and sequential behavior, for they seldom consider sequentiality between items that are not adjacent in the same session. To achieve that balance, Rendle et al. (Rendle et al., 2010) propose FPMC, a hybrid method that combines Matrix Factorization and MC.

As in most other fields of recommendation, deep learning methods appear frequently in recent SRS models and achieve new state-of-the-art accuracy, especially RNN-based methods (Hidasi et al., 2015; Tan et al., 2016; Li et al., 2017; Liu et al., 2018). Hidasi et al. (Hidasi et al., 2015) apply an RNN with the Gated Recurrent Unit (GRU) to SRS and outperform traditional methods. Tan et al. (Tan et al., 2016) further improve it by introducing data augmentation, distillation integrating privileged information, and a pre-training approach to account for temporal shifts in the data distribution. Later, an attention mechanism is applied by an encoder-decoder recommendation method (NARM) to combine sequentiality and the user's general preference (Li et al., 2017). STAMP (Liu et al., 2018) also combines general and current interest, but it explicitly models the current interest reflected by the last click to emphasize its importance, while NARM considers them equally important. Most recently, geometric deep learning has become popular in a variety of tasks. SR-GNN (Wu et al., 2019) transforms sessions into graph-structured data and applies a GNN to it. The significant improvement in recommendation performance demonstrates the potential of geometric deep learning in SRS, and the motivation behind that work informs ours. However, all aforementioned deep learning methods only consider intra-session item-level interactions, which limits the upper bound of performance because of the lack of collaborative information.

Moreover, Collaborative Filtering (CF) idea-based methods are also popular in SRS. Unlike traditional user-based (Resnick et al., 1994; Hill et al., 1995; Miller et al., 2003; Jin et al., 2004) or item-based (Linden et al., 2003; Billsus and Pazzani, 1998; Pareek et al., 2013; Sarwar et al., 2000) CF models in other recommendation tasks, modifications need to be made for them to perform well in SRS. Simply using item neighborhood information (Sarwar et al., 2001) cannot extract the integrity and sequentiality of items in the current session, which are extremely important for SRS application scenarios because of the lack of auxiliary data. Thus, SKNN (Bonnin and Jannach, 2015) is proposed to consider each session as a whole and its improved version KNN-RNN (Jannach and Ludewig, 2017) integrates GRU4REC to extract the sequentiality. Later, an end-to-end neural model (CSRM (Wang et al., 2019)) outperforms KNN-RNN with learnable latent session representations. The major difference between our method and theirs (KNN-RNN and CSRM) is that they only stay at the minimum granularity of session as the collaborative information, while we dig deeper and integrate inter-session item-level interactions into SRS. And the performance of CSRM is highly related to the quality of RNN-based encoder, which suffers from the lack of contextual item-level interactions that can be easily extracted by GNN in our method.

Figure 2. The framework of I3GN.

2.2. Graph Embedding

The most important part of aforementioned deep learning-based methods is embedding, because generating more accurate and meaningful session embedding directly decides the performance. Thus, graph embedding becomes a critical component of the method when it comes to graph-structured data. However, traditional kernel-based methods (e.g., Weisfeiler-Lehman kernel (Shervashidze et al., 2011), Deep Graph Kernels (Yanardag and Vishwanathan, 2015)) focus more on the unsupervised tasks and have trouble scaling to large graphs, so we mainly introduce neural network-based graph embedding methods here.

The concept of Graph Neural Networks (GNN) is first proposed by Gori et al. (Gori et al., 2005), then developed and deepened by Scarselli et al. (Scarselli et al., 2008) and Micheli (Micheli, 2009). These early methods mainly generate representations of target nodes by using a recurrent neural unit to aggregate information from neighbor nodes. Inspired by the success of Convolutional Neural Networks (CNN) in image classification, Bruna et al. (Bruna et al., 2013) propose the spectral Graph Convolutional Neural Network (GCN). Then Defferrard et al. propose a variant model by introducing fast localized spectral filtering (Defferrard et al., 2016), and Kipf et al. improve it with a first-order approximation of spectral graph convolutions to motivate the choice of convolutional architecture (Kipf and Welling, 2016). Moreover, the Message Passing Neural Network (MPNN) generalizes these GCN-based methods and introduces a two-step framework: message passing and readout (Gilmer et al., 2017; Duvenaud et al., 2015; Li et al., 2015b). Gated Graph Neural Networks (GGNN) (Li et al., 2015a) extend GNN to sequential output, which is of great significance for sequential recommendation tasks such as SRS. As a large number of studies have shown that the attention mechanism improves the performance of deep learning-based methods in various tasks, it is natural for researchers to bring it to graphs (Veličković et al., 2017; Choi et al., 2017). Veličković et al. (Veličković et al., 2017) propose the Graph Attention Network (GAT), which uses attention mechanisms to learn node embeddings in a graph. By making the weights of nodes trainable, GAT can extract more information from the most critical part of the graph structure without a priori knowledge of that structure, which is especially important for the scalability of graph embedding. Thus we apply it to our relatively large inter-session graphs.

3. Model

In this section, we introduce the proposed model. We start with a general introduction to the overall process of the model and then explain the internal structure of the Intra- and Inter-Session Interaction-aware Graph-enhanced Network (I3GN) in detail.

3.1. Notations

The aim of SRS is to use a user's current sequential session data to predict the item the user will click next. Let $V = \{v_1, v_2, \ldots, v_m\}$ represent the set of unique items in all sessions. $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$ represents an anonymous session, which contains $n$ items ordered by timestamps, and $S$ denotes the set of all sessions. Each item $v_i \in V$ is embedded into a unified embedding space: $\mathbf{v}_i \in \mathbb{R}^d$ denotes the latent vector of item $v_i$, and $\{\mathbf{v}_1, \ldots, \mathbf{v}_m\}$ represents the set of all latent representations. Given a session $s$, the aim of our model is to predict the user's possible next click item, i.e., the sequence label $v_{s,n+1}$. We generate probabilities $\hat{\mathbf{y}}$ for all possible items based on the input session $s$. Each element of $\hat{\mathbf{y}}$ is the recommendation score of the corresponding item. The items with the top-$K$ recommendation scores are recommended as our model's output.

3.2. Framework

As illustrated in Figure 2, I3GN consists of the following parts: the Session Merging Module (SMM), the Cross-Scope Encoder (CSE) and the final Prediction Module. CSE is further divided into an Intra-session Module and an Inter-session Module, because the two graph structures are encoded with different approaches. Specifically, for the current session $s$, SMM uses a similarity measure to find the $k$ most similar sessions (neighbors) $N_s$ among all past sessions. The intra-session graph $G_{intra}$ is then built from $s$, and SMM uses $s$ and $N_s$ to construct the inter-session graph $G_{inter}$. From $G_{intra}$ and $G_{inter}$, the intra- and inter-session representations $\mathbf{s}_{intra}$ and $\mathbf{s}_{inter}$ are generated by the Intra- and Inter-session Modules respectively. CSE then combines them to generate the final session representation $\mathbf{s}_f$. Based on $\mathbf{s}_f$, the Prediction Module produces the final output vector $\hat{\mathbf{y}}$, which holds the recommendation scores for all possible items. Finally, the top-$K$ items in $\hat{\mathbf{y}}$ are recommended.

3.3. Session Merging Module

To integrate inter-session item-level interactions, we need to model sessions into graph-structured data.

Given a session $s$, the first step is to determine the neighbor set of the most similar past sessions in the training set for graph building. To achieve this, we construct the set of possible neighbors $C_s$ as the union of all sessions that contain at least one item of $s$. A recent study indicates that it is most effective to focus only on the most recent sessions when selecting neighbors (Ludewig and Jannach, 2018), so we create a subsample of $C_s$ that contains the $M$ most recent sessions, denoted as $R_s$. In our method, we set $M$ to 1000 following (Ludewig and Jannach, 2018).

After obtaining the recent candidate set $R_s$, we need to choose the neighbors of the current session $s$ from it. First, we compare $s$ with every other session $s_j \in R_s$. Sessions $s$ and $s_j$ are encoded as binary vectors $\mathbf{b}_s$ and $\mathbf{b}_{s_j}$ over the item set, where an element is set to one if the corresponding item appears in the session and to zero otherwise. We then use cosine similarity to calculate the similarity between $s$ and $s_j$, which can be defined as:

$$\mathrm{sim}(s, s_j) = \frac{\mathbf{b}_s^{\top} \mathbf{b}_{s_j}}{\sqrt{l(s)\, l(s_j)}} \tag{1}$$

where $l(s)$ and $l(s_j)$ represent the length of $s$ and $s_j$ respectively.

For all sessions in $R_s$, we first filter out those whose similarity is lower than 0.5. Then the top-$k$ most similar sessions are selected to form the neighborhood set $N_s$.
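The neighbor selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the assumption that `past_sessions` is ordered from oldest to newest, and the default `k` are hypothetical. Sessions are treated as sets of item IDs, which matches the binary-vector encoding.

```python
from math import sqrt

def cosine_similarity(session_a, session_b):
    """Cosine similarity of two sessions encoded as binary item vectors."""
    items_a, items_b = set(session_a), set(session_b)
    if not items_a or not items_b:
        return 0.0
    # Dot product of binary vectors = number of shared items.
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))

def select_neighbors(current, past_sessions, max_candidates=1000,
                     threshold=0.5, k=2):
    """Pick the top-k most similar recent sessions (hypothetical helper)."""
    current_items = set(current)
    # Candidate set: every past session sharing at least one item.
    candidates = [s for s in past_sessions if current_items & set(s)]
    # Keep only the most recent candidates (assumes oldest-to-newest order).
    recent = candidates[-max_candidates:]
    scored = [(cosine_similarity(current, s), s) for s in recent]
    # Drop sessions whose similarity is below the threshold.
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept[:k]]

past = [[1, 2, 3], [4, 5], [2, 3], [1, 9, 8, 7]]
neighbors = select_neighbors([1, 2, 3], past)
```

In this toy run, `[4, 5]` never becomes a candidate because it shares no item with the current session, and `[1, 9, 8, 7]` is filtered out by the 0.5 threshold.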

The second step is to build the intra- and inter-session graphs for intra- and inter-session item-level interactions, respectively. For the intra-session graph, we model the current session $s$ as a directed graph $G_{intra}$ by treating each item $v_i$ in $s$ as a node and each pair $(v_{i-1}, v_i)$ as an edge, which represents a user clicking on item $v_{i-1}$ and then clicking on $v_i$ in $s$. We use a directed graph because sequentiality matters a lot when dealing with a single session.

For the inter-session graph, we model all sessions in $N_s$ together with $s$ as a single undirected inter-session graph $G_{inter}$. In $G_{inter}$, each node represents an item that appears in session $s$ or in any neighbor session in $N_s$, and each edge $(v_i, v_j)$ denotes a user clicking on item $v_i$ immediately before or after $v_j$ in session $s$ or in any neighbor session in $N_s$. We use an undirected graph because related items may be located at different positions relative to the target items in the current session.

The visualization of this process is shown in Figure 2.
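As a rough sketch of the two graph constructions, the snippet below builds a directed edge set for one session and an undirected edge set for a session plus its neighbors. The function names and the choice to skip self-loops in the undirected graph are illustrative assumptions, not details given in the text.

```python
def build_intra_session_graph(session):
    """Directed graph: edge (a, b) means a was clicked right before b."""
    nodes = list(dict.fromkeys(session))      # unique items, click order kept
    edges = set(zip(session, session[1:]))    # consecutive click pairs
    return nodes, edges

def build_inter_session_graph(session, neighbors):
    """Undirected graph over the current session and its neighbor sessions."""
    nodes = {}
    edges = set()
    for s in [session, *neighbors]:
        for item in s:
            nodes.setdefault(item, None)
        for a, b in zip(s, s[1:]):
            if a != b:                        # assumption: ignore self-loops
                edges.add(frozenset((a, b)))  # frozenset drops the direction
    return list(nodes), edges

intra_nodes, intra_edges = build_intra_session_graph([1, 2, 3])
inter_nodes, inter_edges = build_inter_session_graph([1, 2, 3], [[3, 2, 4]])
```

Here the neighbor session contributes the pair 3 then 2, which collapses onto the existing undirected edge between 2 and 3, so only the edge between 2 and 4 is new.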

3.4. Intra-session Module

After obtaining the intra-session graph $G_{intra}$, the Intra-session Module extracts intra-session item-level interactions in session $s$ through two processes: node representation learning and intra-session representation generating, which are described below.

Figure 3. An example of a session graph and the connection matrices $\mathbf{A}^{out}$ and $\mathbf{A}^{in}$.


3.4.1. Node representation learning

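Based on $G_{intra}$, the first step in obtaining node representations is to gather each node's contextual information by aggregating information from other nodes into the target node. The aggregation consists of two parallel processes: out-degree aggregation and in-degree aggregation. They depend on two adjacency matrices $\mathbf{A}^{out}, \mathbf{A}^{in} \in \mathbb{R}^{n \times n}$, which denote the weighted connections of out-degree and in-degree edges in the session graph respectively, where $n$ is the number of nodes. An example session with its intra-session graph and adjacency matrices is illustrated in Figure 3. Using these matrices, the aggregation for target node $v_i$ in graph $G_{intra}$ at step $t$ can be denoted as follows:

$$\mathbf{a}_i^{out,t} = \mathbf{A}^{out}_{i:}\,[\mathbf{v}_1^{t-1}, \ldots, \mathbf{v}_n^{t-1}]^{\top}\,\mathbf{W}^{out} + \mathbf{b}^{out}, \qquad \mathbf{a}_i^{in,t} = \mathbf{A}^{in}_{i:}\,[\mathbf{v}_1^{t-1}, \ldots, \mathbf{v}_n^{t-1}]^{\top}\,\mathbf{W}^{in} + \mathbf{b}^{in} \tag{2}$$

where $\mathbf{W}^{out}, \mathbf{W}^{in} \in \mathbb{R}^{d \times d}$ are parameter matrices, $\mathbf{b}^{out}, \mathbf{b}^{in} \in \mathbb{R}^{d}$ represent bias vectors, $[\mathbf{v}_1^{t-1}, \ldots, \mathbf{v}_n^{t-1}]$ is the list of node vectors in the session $s$, and $\mathbf{A}^{out}_{i:}, \mathbf{A}^{in}_{i:} \in \mathbb{R}^{1 \times n}$ are the two rows of $\mathbf{A}^{out}$ and $\mathbf{A}^{in}$ corresponding to the node $v_i$. After extracting the contextual information of out-degree nodes $\mathbf{a}_i^{out,t}$ and in-degree nodes $\mathbf{a}_i^{in,t}$, we combine them to get the final contextual information representation of node $v_i$, denoted as $\mathbf{a}_i^{t}$, by the following operation:

$$\mathbf{a}_i^{t} = \left[\mathbf{a}_i^{out,t} \,\|\, \mathbf{a}_i^{in,t}\right] \tag{3}$$

where $\|$ represents the concatenation operation.

Figure 4. Details of an attention head.
Figure 5. The graphical model of the single convolutional layer using multi-head attention mechanism.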
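To make the two connection matrices concrete, here is a small sketch that derives them from a click sequence. The text only says the matrices hold weighted connections; normalizing each row by the node's out-degree (or in-degree) is an assumption borrowed from common practice in session graph models, and the function name is hypothetical.

```python
def adjacency_matrices(session):
    """Return (nodes, A_out, A_in) for a session's directed graph."""
    nodes = list(dict.fromkeys(session))          # unique items in click order
    index = {item: i for i, item in enumerate(nodes)}
    n = len(nodes)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(session, session[1:]):
        counts[index[a]][index[b]] += 1.0         # edge a -> b

    def row_normalize(matrix):
        # Assumption: each row is divided by its sum (the node's degree).
        result = []
        for row in matrix:
            total = sum(row)
            result.append([x / total if total else 0.0 for x in row])
        return result

    a_out = row_normalize(counts)                 # outgoing connections
    transposed = [[counts[j][i] for j in range(n)] for i in range(n)]
    a_in = row_normalize(transposed)              # incoming connections
    return nodes, a_out, a_in

nodes, a_out, a_in = adjacency_matrices([1, 2, 3, 2, 4])
```

For the session `[1, 2, 3, 2, 4]`, item 2 has two outgoing edges (to 3 and to 4), so its row in `a_out` holds 0.5 in both positions.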

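After obtaining the contextual information $\mathbf{a}_i^{t}$, the second step is to feed $\mathbf{a}_i^{t}$ and the previous hidden state $\mathbf{v}_i^{t-1}$ into gated update functions to update the hidden state of node $v_i$, where the update functions are as follows:

$$\begin{aligned}
\mathbf{z}_i^{t} &= \sigma\left(\mathbf{W}_z \mathbf{a}_i^{t} + \mathbf{U}_z \mathbf{v}_i^{t-1}\right), \\
\mathbf{r}_i^{t} &= \sigma\left(\mathbf{W}_r \mathbf{a}_i^{t} + \mathbf{U}_r \mathbf{v}_i^{t-1}\right), \\
\tilde{\mathbf{v}}_i^{t} &= \tanh\left(\mathbf{W}_o \mathbf{a}_i^{t} + \mathbf{U}_o \left(\mathbf{r}_i^{t} \odot \mathbf{v}_i^{t-1}\right)\right), \\
\mathbf{v}_i^{t} &= \left(1 - \mathbf{z}_i^{t}\right) \odot \mathbf{v}_i^{t-1} + \mathbf{z}_i^{t} \odot \tilde{\mathbf{v}}_i^{t}
\end{aligned} \tag{4}$$

where $\mathbf{r}_i^{t}$ and $\mathbf{z}_i^{t}$ are the reset and update gates respectively, which decide what information to be preserved and discarded respectively, and the $\mathbf{W}$ and $\mathbf{U}$ matrices are learnable parameters. $\sigma(\cdot)$ represents the logistic sigmoid function and $\odot$ denotes element-wise multiplication. The whole process resembles a typical GRU update that integrates information from other nodes and previous states to update the current hidden state of the target node. When the update process for all nodes in the graph is finished, we obtain the final vector representation of each node.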
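The gated update can be illustrated with a plain-Python GRU-style step for a single node. This is a sketch assuming the standard GRU form; the parameter names (`W_z`, `U_z`, and so on) and the dictionary layout are hypothetical, and a real model would use a tensor library and learned weights.

```python
from math import exp, tanh

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def gated_update(context, h_prev, params):
    """One GRU-style update of a node's hidden state.

    context: aggregated neighbor information (length 2d after concatenation)
    h_prev:  previous hidden state of the node (length d)
    """
    # Update gate: how much of the candidate state replaces the old state.
    z = [sigmoid(x) for x in add(matvec(params["W_z"], context),
                                 matvec(params["U_z"], h_prev))]
    # Reset gate: how much of the old state feeds the candidate state.
    r = [sigmoid(x) for x in add(matvec(params["W_r"], context),
                                 matvec(params["U_r"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    candidate = [tanh(x) for x in add(matvec(params["W_o"], context),
                                      matvec(params["U_o"], gated))]
    return [(1.0 - zi) * hi + zi * ci
            for zi, hi, ci in zip(z, h_prev, candidate)]

d = 2
zeros_w = [[0.0] * (2 * d) for _ in range(d)]   # shape d x 2d
zeros_u = [[0.0] * d for _ in range(d)]         # shape d x d
params = {"W_z": zeros_w, "U_z": zeros_u, "W_r": zeros_w,
          "U_r": zeros_u, "W_o": zeros_w, "U_o": zeros_u}
h_new = gated_update([1.0, 1.0, 1.0, 1.0], [2.0, -4.0], params)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the new state is simply half of the previous one, which makes the blending role of the update gate easy to see.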

3.4.2. Intra-session representation generating

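After feeding the session graph $G_{intra}$ into the gated graph neural network, we obtain the updated vectors of all nodes in session $s$. To alleviate the random interest drift caused by users' unintended clicks, we combine both the user's long-term preference and the current interest of the session to generate the final intra-session representation. For the session $s$, we use the embedding of the last clicked item to represent the user's current interest, i.e., $\mathbf{s}_{cur} = \mathbf{v}_n$. Then, conditioned on the current interest $\mathbf{s}_{cur}$, we aggregate all node vectors in $G_{intra}$ to obtain the long-term preference representation $\mathbf{s}_{long}$ by adopting a soft-attention mechanism. Specifically, we derive $\mathbf{s}_{long}$ by the following calculation:

$$\alpha_i = \mathbf{q}^{\top} \sigma\left(\mathbf{W}_1 \mathbf{v}_n + \mathbf{W}_2 \mathbf{v}_i + \mathbf{c}\right), \qquad \mathbf{s}_{long} = \sum_{i=1}^{n} \alpha_i \mathbf{v}_i \tag{5}$$

where $\mathbf{q} \in \mathbb{R}^{d}$ and $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d \times d}$ are learnable weight parameters and $\mathbf{c}$ is a bias vector.

Finally, we combine $\mathbf{s}_{cur}$ and $\mathbf{s}_{long}$ to generate $\mathbf{s}_{intra}$. Technically, we first concatenate the two interests $\mathbf{s}_{cur}$ and $\mathbf{s}_{long}$, then use a linear transformation to compress the concatenation:

$$\mathbf{s}_{intra} = \mathbf{W}_3 \left[\mathbf{s}_{cur} \,\|\, \mathbf{s}_{long}\right] \tag{6}$$

where $\mathbf{W}_3 \in \mathbb{R}^{d \times 2d}$ maps the concatenated vector from $\mathbb{R}^{2d}$ to $\mathbb{R}^{d}$. $\mathbf{s}_{intra}$ is the final representation of the current session that contains only information within a single session, namely the intra-session representation.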
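The readout step (current interest, soft-attention pooling, and the final linear compression) can be sketched as follows. The parameter names `q`, `W1`, `W2`, `c` and `W3`, the bias term, and the assumption that the last entry of `node_vecs` belongs to the last clicked item are illustrative choices, not confirmed details of the model.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def session_readout(node_vecs, q, W1, W2, c, W3):
    """Combine current interest and attention-pooled long-term preference."""
    s_cur = node_vecs[-1]                 # assumption: last node = last click
    query_part = matvec(W1, s_cur)
    s_long = [0.0] * len(s_cur)
    for v in node_vecs:
        hidden = [sigmoid(x + y + b)
                  for x, y, b in zip(query_part, matvec(W2, v), c)]
        alpha = sum(qk * hk for qk, hk in zip(q, hidden))  # attention weight
        s_long = [acc + alpha * x for acc, x in zip(s_long, v)]
    # Linear map from the 2d-dimensional concatenation back to d dimensions.
    return matvec(W3, s_cur + s_long)

zeros = [[0.0, 0.0], [0.0, 0.0]]
W3 = [[1.0, 0.0, 0.0, 0.0],   # copies the first entry of s_cur
      [0.0, 0.0, 0.0, 1.0]]   # copies the second entry of s_long
s_intra = session_readout([[1.0, 2.0], [3.0, 4.0]],
                          q=[1.0, 1.0], W1=zeros, W2=zeros,
                          c=[0.0, 0.0], W3=W3)
```

In this toy setting every attention weight equals 1, so the long-term vector is the plain sum of node vectors and the chosen `W3` just picks one coordinate from each half of the concatenation.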

3.5. Inter-session Module

The aim of the Inter-session Module is to integrate inter-session item-level interactions into the collaborative information for the current session based on the inter-session graph $G_{inter}$. As in the Intra-session Module, the process is divided into two steps: node representation learning and inter-session representation generating, which are described below.

3.5.1. Node representation learning

Because the graph $G_{inter}$ is relatively large and the importance of nodes is unevenly distributed (not all items in neighbor sessions are equally related to the current session), assigning each node a fixed normalized weight is not the best choice when aggregating neighbor nodes. Therefore, we introduce attention mechanisms (Veličković et al., 2017) to model the complex interactions between the current session $s$ and its neighbor sessions $N_s$. The attention mechanism adaptively assigns different weights to different neighbor nodes and thus reduces the noise caused by less relevant items.

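For the set of all item node vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N\}$, where $N$ denotes the number of nodes in $G_{inter}$, a shared self-attention mechanism is applied to every node to compute attention coefficients:

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{v}_i \,\|\, \mathbf{W}\mathbf{v}_j\right]\right) \tag{7}$$

where $\mathbf{W}$ represents a shared weight matrix applied to every node vector and $e_{ij}$ indicates the importance of node $v_j$'s vector to node $v_i$. The attention mechanism is a single-layer feedforward neural network with a weight vector $\mathbf{a}$, followed by a LeakyReLU nonlinearity with a fixed negative input slope. In our experiment, we consider the first-order neighbors of $v_i$ (including $v_i$) in a single layer, so we only compute $e_{ij}$ for nodes $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ denotes the neighborhood of node $v_i$ in the graph (including $v_i$).

Then we normalize the attention coefficients using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})} \tag{8}$$

To stabilize the learning process of self-attention, we apply multi-head attention (Veličković et al., 2017) in the Inter-session Module. Specifically, we use $H$ independent attention heads to extract different latent vectors of node $v_i$. Details of each attention head are illustrated in Figure 4. The module then concatenates those node vectors as its output, so the update process of node $v_i$ is defined by the following operation:

$$\mathbf{v}_i' = \Big\Vert_{h=1}^{H} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{h} \mathbf{W}^{h} \mathbf{v}_j\Big) \tag{9}$$

where $\alpha_{ij}^{h}$ represents the normalized attention coefficients computed by the $h$-th attention mechanism, $\mathbf{W}^{h}$ is the weight matrix of that head, $\Vert$ represents the concatenation of all attention mechanism outputs, and $\sigma$ represents the sigmoid function. $\mathbf{v}_i'$ denotes the updated vector of node $v_i$.

For the output layer, because we need to reduce the dimension of the node vector from $Hd$ to $d$, we use averaging instead of concatenation. So Equation 9 can be rewritten as:

$$\mathbf{v}_i' = \sigma\Big(\frac{1}{H}\sum_{h=1}^{H} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{h} \mathbf{W}^{h} \mathbf{v}_j\Big) \tag{10}$$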
The structure of a single convolutional layer with multi-head attention is demonstrated in Figure 5.
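A single attention head of the kind described above can be sketched in plain Python. The implementation follows the general graph attention recipe (shared projection, LeakyReLU scoring, softmax over first-order neighbors, sigmoid activation); the function names, the slope default of 0.2, and the list-based data layout are assumptions made for illustration only.

```python
from math import exp

def leaky_relu(x, slope=0.2):   # slope value is an assumption
    return x if x > 0 else slope * x

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def attention_head(vectors, neighbors, W, a):
    """One attention head over a graph.

    vectors:   list of node vectors
    neighbors: neighbors[i] lists the node indices adjacent to i (i included)
    W, a:      shared projection matrix and attention weight vector
    """
    projected = [matvec(W, v) for v in vectors]
    updated, all_alphas = [], []
    for i, nbrs in enumerate(neighbors):
        # Unnormalized importance of each neighbor j for node i.
        scores = [leaky_relu(sum(w * x for w, x in
                                 zip(a, projected[i] + projected[j])))
                  for j in nbrs]
        peak = max(scores)                       # for numerical stability
        weights = [exp(s - peak) for s in scores]
        total = sum(weights)
        alphas = [w / total for w in weights]    # softmax over neighbors
        combined = [sum(al * projected[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(len(projected[i]))]
        updated.append([sigmoid(x) for x in combined])
        all_alphas.append(alphas)
    return updated, all_alphas

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
identity = [[1.0, 0.0], [0.0, 1.0]]
updated, alphas = attention_head(vectors, neighbors, identity,
                                 a=[0.0, 0.0, 0.0, 0.0])
```

With a zero attention vector every neighbor gets the same weight, so each node's output is the sigmoid of the mean of its neighbors. Running several such heads with different parameters and concatenating (or, in the output layer, averaging) their results gives the multi-head variant.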

According to the experiment result, the number of convolutional layers is set to two and the number of attention heads is set to eight for both layers to obtain the best result.

After updating, each node in session $s$ has aggregated the information of its neighbor nodes across multiple sessions. Thus the collaborative information within related items among different sessions is encoded into each node's representation. To distinguish the vector representations obtained by the Intra-session Module and the Inter-session Module, we use $\mathbf{u}_i$ to denote the latent vector of node $v_i$ updated by the Inter-session Module.

Dataset | # of clicks | # of training sessions | # of testing sessions | # of items | Average length
YOOCHOOSE 1/64 | 565,552 | 375,043 | 55,405 | 17,319 | 6.07
YOOCHOOSE 1/4 | 7,980,529 | 5,969,416 | 55,872 | 30,638 | 5.71
Diginetica | 982,961 | 719,470 | 68,977 | 43,097 | 5.12
Table 1. Statistics of datasets used in our experiments.

3.5.2. Inter-session representation generating

The process of generating the inter-session representation is similar to that in the Intra-session Module. For session $s$, we use the updated node vector of the last clicked item to represent the user's current interest with collaborative information, thus $\mathbf{n}_{cur} = \mathbf{u}_n$. Then the same soft-attention mechanism is adopted to aggregate all updated node vectors of $s$ to obtain the long-term preference with collaborative information $\mathbf{n}_{long}$; the calculation is the same as Equation 5, with the node vectors of the Intra-session Module replaced by the node vectors $\mathbf{u}_i$ updated by the Inter-session Module.

Finally, a linear transformation is applied to the concatenation of the two types of neighborhood information $\mathbf{n}_{cur}$ and $\mathbf{n}_{long}$ to compute the inter-session information representation:

$$\mathbf{s}_{inter} = \mathbf{W}_4 \left[\mathbf{n}_{cur} \,\|\, \mathbf{n}_{long}\right] \tag{11}$$

$\mathbf{s}_{inter}$ represents the inter-session representation generated from the current session's neighbors, which should be highly relevant to the current session.

3.6. Intra- and Inter-Session Representations Combination

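After obtaining the intra-session representation $\mathbf{s}_{intra}$ and the inter-session representation $\mathbf{s}_{inter}$, the last step is to combine them into the final session representation. Inspired by (Wang et al., 2019), we use a fusion gating mechanism to obtain the final session representation $\mathbf{s}_f$:

$$\mathbf{f} = \sigma\left(\mathbf{W}_5 \mathbf{s}_{intra} + \mathbf{W}_6 \mathbf{s}_{inter} + \mathbf{b}_f\right), \qquad \mathbf{s}_f = \mathbf{f} \odot \mathbf{s}_{intra} + \left(1 - \mathbf{f}\right) \odot \mathbf{s}_{inter} \tag{12}$$

where $\mathbf{W}_5$ and $\mathbf{W}_6$ denote the weight matrices in the fusion gate and $\mathbf{b}_f$ represents the bias vector. $\mathbf{s}_f$ is the final session representation.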
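A fusion gate of this kind can be written compactly. The sketch below assumes the common form in which a sigmoid gate weights the intra-session vector and its complement weights the inter-session vector; the function and parameter names are hypothetical.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def fuse(s_intra, s_inter, W_intra, W_inter, bias):
    """Gate-controlled mix of intra- and inter-session representations."""
    pre = [a + b + c for a, b, c in zip(matvec(W_intra, s_intra),
                                        matvec(W_inter, s_inter), bias)]
    gate = [sigmoid(x) for x in pre]          # one gate value per dimension
    return [g * a + (1.0 - g) * b
            for g, a, b in zip(gate, s_intra, s_inter)]

zeros = [[0.0, 0.0], [0.0, 0.0]]
s_final = fuse([2.0, 4.0], [0.0, 0.0], zeros, zeros, [0.0, 0.0])
```

With zero weights the gate sits at 0.5 everywhere, so the result is the element-wise average of the two inputs; learned weights let the model lean toward either scope per dimension.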

3.7. Prediction Module and Objective Function

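After obtaining the final session representation $\mathbf{s}_f$, we multiply it with each candidate item vector $\mathbf{v}_i$ to generate the recommendation score $\hat{z}_i$ for the corresponding item:

$$\hat{z}_i = \mathbf{s}_f^{\top} \mathbf{v}_i \tag{13}$$

Then we apply a softmax function to generate the output vector of the model $\hat{\mathbf{y}}$:

$$\hat{\mathbf{y}} = \mathrm{softmax}(\hat{\mathbf{z}}) \tag{14}$$

where $\hat{\mathbf{z}}$ represents the recommendation scores over all candidate items and $\hat{\mathbf{y}}$ denotes the probabilities of items becoming the next-click item in session $s$.

In the training process, we apply cross-entropy as the loss function:

$$\mathcal{L}(\hat{\mathbf{y}}) = -\sum_{i=1}^{m} y_i \log(\hat{y}_i) \tag{15}$$

where $\mathbf{y}$ denotes the one-hot encoding vector of the ground truth item.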
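The prediction step and the training loss can be illustrated with a short sketch: dot-product scores, a softmax over candidates, a top-K cut, and a cross-entropy value for a one-hot target. The function names and the categorical form of the loss are assumptions for illustration.

```python
from math import exp, log

def predict(session_vec, item_vecs, top_k=2):
    """Score every candidate item and return (probabilities, top-k indices)."""
    scores = [sum(s * v for s, v in zip(session_vec, item))
              for item in item_vecs]
    peak = max(scores)                           # for numerical stability
    exps = [exp(z - peak) for z in scores]
    total = sum(exps)
    probs = [e / total for e in exps]            # softmax over candidates
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return probs, ranked[:top_k]

def cross_entropy(probs, target_index):
    """Cross-entropy against a one-hot target at target_index."""
    return -log(probs[target_index])

probs, top_items = predict([1.0, 0.0], [[2.0, 0.0], [0.0, 5.0], [1.0, 0.0]])
loss_if_first = cross_entropy(probs, 0)
loss_if_second = cross_entropy(probs, 1)
```

Because the target is one-hot, the sum over items reduces to the negative log-probability of the ground-truth item, so a correct high-probability prediction gives a smaller loss.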

The proposed I3GN is trained by Back-Propagation Through Time (BPTT) algorithm in the learning process.

4. Experiments

In this section, we describe the information of datasets used in experiments and introduce the baseline methods used for comparison.

4.1. Datasets

To evaluate the effectiveness of our proposed method, we conduct experiments on two real-world datasets: YOOCHOOSE and Diginetica. The YOOCHOOSE dataset was released for the RecSys Challenge 2015 and records click sequences (item views, purchases) over a period of six months. The Diginetica dataset was published for CIKM Cup 2016; we use only its transaction data in our experiments.

Following previous studies (Liu et al., 2018; Wu et al., 2019), we filter out sessions of length one and items that appear fewer than five times in both datasets. Furthermore, we use the last day of YOOCHOOSE and the last seven days of Diginetica to generate the test data. Because collaborative filtering-based methods cannot recommend an item that has not appeared before (Hidasi et al., 2015), we filter out items from the test set that do not appear in the training set. Following previous studies (Liu et al., 2018; Wu et al., 2019), we use the most recent 1/64 and 1/4 of the training sessions in YOOCHOOSE, which make up the YOOCHOOSE 1/64 and YOOCHOOSE 1/4 datasets respectively. Similar to (Wu et al., 2019), data augmentation is applied to preprocess the data. Specifically, we augment the data by splitting input sessions. For example, for an input session $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$, we generate the sub-sessions and their corresponding labels $([v_{s,1}], v_{s,2})$, $([v_{s,1}, v_{s,2}], v_{s,3})$, …, $([v_{s,1}, \ldots, v_{s,n-1}], v_{s,n})$. We also sort all sessions in chronological order for all datasets.
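The session-splitting augmentation can be sketched in one line of Python: every proper prefix of a session becomes an input, and the item that follows it becomes the label. The function name `augment` is hypothetical.

```python
def augment(session):
    """Split a session into (prefix, next-item label) training pairs."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

pairs = augment([10, 20, 30, 40])
```

A session of length n yields n - 1 training pairs, and a session of length one yields none, which is consistent with filtering such sessions out beforehand.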

The statistics of datasets are shown in Table 1.

4.2. Baseline

We compare the proposed method with the following representative and state-of-the-art baselines to evaluate its performance:

  • POP: A model that always recommends the most popular items in the training set.

  • Item-KNN (Sarwar et al., 2001): A traditional model that recommends items based on the similarity between the existing items in the session.

  • FPMC (Rendle et al., 2010): A hybrid model that combines Matrix Factorization and Markov Chain for next-basket recommendation.

  • BPR-MF (Rendle et al., 2009): A widely used matrix factorization method, which optimizes a pairwise ranking objective function with Bayesian Personalized Ranking loss.

  • SKNN (Bonnin and Jannach, 2015): A neighborhood-based method considering the integrity of sessions.

  • GRU4REC (Hidasi et al., 2015): An RNN-based SRS model. It employs GRU units and the session-parallel mini-batch training process.

  • NARM (Li et al., 2017): This model employs RNNs with the attention mechanism to capture the user’s main purpose and sequential behavior and combines them to make recommendations.

  • STAMP (Liu et al., 2018): This model uses the last click to represent the short-term interest and utilizes the attention mechanism to capture the user’s long-term interest. Then it combines them to make recommendations.

  • SR-GNN (Wu et al., 2019): A model that uses Graph Neural Networks to generate latent vectors of items and makes recommendations with attention mechanisms.

  • KNN-RNN (Jannach and Ludewig, 2017): A hybrid model that combines GRU4REC with SKNN through a weighted scheme to get a better result.

  • CSRM (Wang et al., 2019): A hybrid neural network-based framework which takes session-level collaborative information into account.

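| Method | YOOCHOOSE 1/64 MRR@5 | YOOCHOOSE 1/64 MRR@10 | YOOCHOOSE 1/64 Recall@5 | YOOCHOOSE 1/64 Recall@10 | YOOCHOOSE 1/4 MRR@5 | YOOCHOOSE 1/4 MRR@10 | YOOCHOOSE 1/4 Recall@5 | YOOCHOOSE 1/4 Recall@10 | Diginetica MRR@5 | Diginetica MRR@10 | Diginetica Recall@5 | Diginetica Recall@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POP | 1.36 | 1.51 | 3.29 | 4.59 | 0.20 | 0.27 | 0.37 | 0.85 | 0.19 | 0.22 | 0.39 | 0.63 |
| Item-KNN | 19.97 | 21.38 | 32.80 | 43.39 | 19.57 | 21.08 | 32.07 | 43.38 | 8.05 | 8.95 | 14.47 | 21.30 |
| SKNN | 22.55 | 24.31 | 39.22 | 52.34 | 22.52 | 24.29 | 39.17 | 52.38 | 16.37 | 17.79 | 27.46 | 38.12 |
| FPMC | 19.76 | 20.85 | 29.61 | 37.81 | 16.69 | 17.90 | 26.79 | 35.96 | 15.84 | 16.09 | 18.55 | 20.45 |
| BPR-MF | 17.77 | 18.35 | 24.86 | 29.16 | 15.89 | 16.12 | 20.15 | 21.85 | 13.39 | 13.50 | 16.69 | 17.53 |
| GRU4REC | 24.60 | 26.48 | 39.22 | 52.34 | 20.11 | 21.78 | 34.55 | 47.12 | 6.69 | 7.69 | 12.98 | 20.52 |
| NARM | 26.21 | 27.97 | 44.34 | 57.50 | 26.08 | 28.10 | 44.34 | 57.83 | 25.02 | 26.53 | 40.67 | 51.91 |
| STAMP | 27.26 | 28.92 | 45.69 | 58.07 | 27.47 | 29.24 | 46.39 | 59.62 | 25.21 | 26.69 | 41.04 | 52.07 |
| SR-GNN | 28.01 | 29.97 | 46.49 | 60.33 | 29.34 | 31.08 | 48.15 | 61.06 | 25.56 | 26.82 | 41.11 | 51.47 |
| KNN-RNN | 25.39 | 27.26 | 43.15 | 57.03 | 22.00 | 23.60 | 37.17 | 49.06 | 10.42 | 11.51 | 13.40 | 21.06 |
| CSRM | 27.84 | 29.62 | 46.76 | 60.06 | 28.91 | 29.12 | 47.98 | 60.28 | 25.17 | 26.64 | 41.36 | 52.89 |
| I3GN | 28.67 | 30.44 | 47.32 | 60.41 | 29.53 | 31.28 | 48.33 | 61.29 | 26.30 | 27.84 | 41.81 | 52.66 |

Table 2. Performance comparison of I3GN with baseline methods.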

4.3. Evaluation Metrics

To evaluate the performance of the proposed method, we adopt the following two common metrics in our experiments:

  • Recall@$K$: Recall@$K$ is a common metric for evaluating SRS models. It is the proportion of test cases in which the desired item is amongst the top-$K$ recommended items.

  • MRR@$K$: MRR@$K$ (Mean Reciprocal Rank) is the average of the reciprocal ranks of the desired items. The reciprocal rank is set to zero if the rank exceeds $K$. MRR is especially important for measuring the performance of SRS because it considers the order of the recommendation results, and users tend to focus on higher-ranked items.

Because most users only view recommendations on the first page in real application scenarios (e.g., e-commerce websites), the relevant item should be amongst the first few items in the recommendation list (Hu et al., 2017; Quadrana et al., 2017). We therefore report all metrics at $K = 5$ and $K = 10$.
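As a concrete reading of the two metric definitions, the following sketch computes both from ranked recommendation lists. The function names and the toy data are illustrative assumptions; the values are fractions in [0, 1], whereas the tables report percentages.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose target item appears in the top-k list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank; a target ranked below position k contributes zero."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


# Three toy test cases: target ranked first, ranked third, and missing.
ranked = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "a", "z"]
print(recall_at_k(ranked, targets, 3), mrr_at_k(ranked, targets, 3))
```

In the toy data, raising the cutoff from two to three adds one hit to Recall but only one third of a hit to MRR, which illustrates why MRR rewards placing the desired item near the top.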

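| Method | YOOCHOOSE 1/64 MRR@5 | YOOCHOOSE 1/64 MRR@10 | YOOCHOOSE 1/64 Recall@5 | YOOCHOOSE 1/64 Recall@10 | YOOCHOOSE 1/4 MRR@5 | YOOCHOOSE 1/4 MRR@10 | YOOCHOOSE 1/4 Recall@5 | YOOCHOOSE 1/4 Recall@10 | Diginetica MRR@5 | Diginetica MRR@10 | Diginetica Recall@5 | Diginetica Recall@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I3GN without Intra-session Module | 26.87 | 28.79 | 44.86 | 57.76 | 26.99 | 28.69 | 45.10 | 57.65 | 25.28 | 26.67 | 40.22 | 50.62 |
| I3GN without Inter-session Module (SR-GNN) | 28.01 | 29.97 | 46.49 | 60.33 | 29.34 | 31.08 | 48.15 | 61.06 | 25.56 | 26.82 | 41.11 | 51.47 |
| I3GN | 28.67 | 30.44 | 47.32 | 60.41 | 29.53 | 31.28 | 48.33 | 61.29 | 26.30 | 27.84 | 41.81 | 52.66 |

Table 3. Performance comparison of I3GN with different level session representation on three datasets.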

4.4. Parameter Setup

In our experiments, we set the embedding dimension of items to 100 on the two YOOCHOOSE datasets and 50 on the Diginetica dataset. We use a Gaussian distribution with a mean of zero and a standard deviation of 0.1 to initialize model parameters. We adopt the mini-batch Adam optimizer, with the initial learning rate set to 0.001. The batch size is 128 on both YOOCHOOSE 1/64 and Diginetica, and 256 on YOOCHOOSE 1/4. For the parameters of the Inter-session Module, we decay the learning rate by 0.1 every five epochs, while the other modules in I3GN decay the learning rate by 0.1 every three epochs. We set the number of nearest neighbors in the neighbor-session retrieval according to the experimental results. For a fair comparison, on each dataset we use the same embedding dimension for all baselines and set the number of neighbors in CSRM to the same value as ours. We implement our model in PyTorch, with the graph models built on the PyTorch Geometric library (Fey and Lenssen, 2019). The model is trained on a Geforce Titan V GPU.
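To make the two decay schedules concrete, the sketch below tabulates the learning rate per epoch for both groups of parameters. It is plain Python rather than the training code, and it rests on two assumptions that the text does not spell out: "decay by 0.1" is read as multiplying the rate by 0.1, and epochs are counted from zero. The function name `stepped_learning_rate` is hypothetical.

```python
def stepped_learning_rate(epoch, initial_lr=0.001, decay_factor=0.1, step_size=3):
    """Learning rate after multiplying by `decay_factor` once every `step_size` epochs."""
    return initial_lr * decay_factor ** (epoch // step_size)


# Inter-session Module: decay every five epochs; other modules: every three.
for epoch in range(10):
    inter_lr = stepped_learning_rate(epoch, step_size=5)
    other_lr = stepped_learning_rate(epoch, step_size=3)
    print(f"epoch {epoch}: inter-session {inter_lr:.0e}, other modules {other_lr:.0e}")
```

Under these assumptions, the Inter-session Module keeps the initial rate for epochs 0 to 4, while the remaining modules have already dropped to one tenth of it from epoch 3 onward.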

5. Results and analyses

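| Method | YOOCHOOSE 1/64 MRR@5 | YOOCHOOSE 1/64 MRR@10 | YOOCHOOSE 1/64 Recall@5 | YOOCHOOSE 1/64 Recall@10 | YOOCHOOSE 1/4 MRR@5 | YOOCHOOSE 1/4 MRR@10 | YOOCHOOSE 1/4 Recall@5 | YOOCHOOSE 1/4 Recall@10 | Diginetica MRR@5 | Diginetica MRR@10 | Diginetica Recall@5 | Diginetica Recall@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I3GN- | 24.69 | 26.84 | 41.72 | 55.12 | 25.38 | 27.23 | 43.12 | 56.91 | 20.28 | 21.87 | 34.80 | 46.65 |
| I3GN- | 27.76 | 29.48 | 46.02 | 58.89 | 28.50 | 30.25 | 47.10 | 60.12 | 24.86 | 26.28 | 39.93 | 50.55 |
| I3GN- | 28.04 | 29.77 | 46.62 | 59.53 | 28.74 | 30.48 | 47.78 | 60.70 | 25.45 | 26.87 | 40.48 | 51.12 |
| I3GN- | 28.49 | 30.10 | 47.05 | 60.06 | 29.18 | 30.92 | 47.95 | 60.85 | 26.12 | 27.45 | 41.53 | 52.24 |
| I3GN | 28.67 | 30.44 | 47.32 | 60.41 | 29.53 | 31.28 | 48.33 | 61.29 | 26.30 | 27.84 | 41.81 | 52.66 |

Table 4. Performance comparison of I3GN with different graph modeling strategies.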

In this section, we first compare the proposed model with other state-of-the-art methods and then conduct detailed analyses of our model under different experimental settings.

5.1. Comparison with baseline methods

The experimental results of all methods for top-5 and top-10 recommendation on the YOOCHOOSE and Diginetica datasets are reported in Table 2, and the following observations stand out:

  • The two KNN-based methods, Item-KNN and SKNN, considerably outperform the other conventional baseline methods. This indicates the effectiveness of adopting inter-session collaborative information for recommendation. Furthermore, SKNN takes the entire current session into consideration when calculating similarity, while Item-KNN only considers the last item in the current session and ignores session contextual information. As a result, SKNN achieves better results.

  • All of the neural network-based methods distinctly outperform the other conventional recommendation methods, demonstrating the superiority of adopting deep learning technology to make recommendations. The key reason for this may be the RNN’s ability to process sequentiality and thus model the intra-session item-level interactions.

  • By comparing the performance of the original model and its neighborhood-enhanced version (e.g., GRU4REC and KNN-RNN, NARM and CSRM), we can observe that utilizing neighborhood information can enhance model performance. This result confirms the effectiveness of combining different scopes in session-based recommendations.

  • On the whole, graph-based methods (SR-GNN and I3GN) outperform RNN-based methods (GRU4REC, NARM, STAMP, KNN-RNN, and CSRM). This indicates that it is important for SRS to explicitly model contextual item-level interactions because GNN can easily extract those by aggregating information among multiple nodes, but RNN can only deal with unidirectional transitions between consecutive items.

  • Finally, our proposed I3GN obtains the best performance in almost every experiment, which supports the claim that taking inter-session item-level interactions into account is beneficial. Although I3GN is outperformed by CSRM on Diginetica in terms of Recall@10, our model performs better than CSRM under stricter cutoffs (e.g., Recall@5 on all datasets).

5.2. Influence of Inter-session Module

We further analyze the effect of utilizing inter-session information. Two downgraded versions of our model are compared with I3GN: one without the Intra-session Module and one without the Inter-session Module. The version without the Inter-session Module only models intra-session interactions and extracts information from them to make recommendations. Since its process of modeling the current session and obtaining the intra-session representation is identical to SR-GNN, we use the performance of SR-GNN to represent it. The version without the Intra-session Module ignores the intra-session information and extracts item-level interactions between the current session and its neighbor sessions directly. Table 3 displays the experimental results on three datasets.

The results show that the version without the Inter-session Module substantially outperforms the version without the Intra-session Module, which suggests that intra-session information plays a more crucial role when making recommendations. Moreover, the best performance obtained by I3GN reveals that combining the two types of information is an effective strategy.

5.3. Influence of the Number of Neighbors

In this section, we vary the number of neighbors to investigate its influence. We vary it from zero to 200 and conduct our experiments on the YOOCHOOSE 1/64 dataset. A value of zero means that the neighbor sessions are eliminated and only the current session is used to build the graph in the Inter-session Module, so no information from other sessions can have an influence. To better analyze the role of the neighbor number, we compare I3GN with SKNN. The parameters of SKNN are the same as those of the neighbor retrieval process in I3GN. Formally, given a session $s$, the recommendation score for each item $i$ is computed as:

$$\mathrm{score}(i, s) = \sum_{n \in N_s} \mathrm{sim}(s, n) \cdot 1_n(i)$$

where $N_s$ denotes the neighbor sessions of $s$, $\mathrm{sim}(s, n)$ is the similarity between $s$ and neighbor session $n$, and the indicator function $1_n(i)$ returns one if session $n$ contains item $i$ and zero otherwise. SKNN is unable to make recommendations without neighbor sessions, so we do not report the result of SKNN when the number of neighbors is zero. The results of Recall@10 with different numbers of neighbors are illustrated in Figure 6.
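The SKNN scoring rule can be illustrated with a small sketch. It is not the implementation used in the experiments: the function names are hypothetical, and cosine similarity over binary item sets is an assumption about the session similarity, which this section does not specify.

```python
import math


def cosine_similarity(session_a, session_b):
    """Cosine similarity between two sessions treated as binary item sets (assumed measure)."""
    items_a, items_b = set(session_a), set(session_b)
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / math.sqrt(len(items_a) * len(items_b))


def sknn_scores(current_session, past_sessions, num_neighbors):
    """Score each item by summing the similarities of the neighbor sessions containing it."""
    neighbors = sorted(
        past_sessions,
        key=lambda session: cosine_similarity(current_session, session),
        reverse=True,
    )[:num_neighbors]
    scores = {}
    for neighbor in neighbors:
        similarity = cosine_similarity(current_session, neighbor)
        for item in set(neighbor):  # the indicator function: each contained item counts once
            scores[item] = scores.get(item, 0.0) + similarity
    return scores


past = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
print(sknn_scores(["a", "b"], past, num_neighbors=2))
```

With zero neighbors the score dictionary is empty, which mirrors the remark that SKNN cannot make recommendations without neighbor sessions.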

From Figure 6 we can observe that, as the number of neighbors increases, the performance of both I3GN and SKNN increases at first, since the more neighbors there are for each session, the more information can be utilized to make recommendations. However, beyond a certain number of neighbors, the performance of I3GN starts to drop and the improvement of SKNN becomes marginal. A possible explanation is that once the number of neighbors reaches a certain value, the benefit that additional, less similar neighbor sessions bring to SKNN gradually decreases, and the extra noisy sessions begin to have negative effects on I3GN.

Figure 6. The Recall@10 of I3GN and SKNN with different number of neighbors on the YOOCHOOSE 1/64 dataset.

5.4. Further Analysis of Inter-session Module

To deeply understand the mechanism of Inter-session Module in I3GN, we further conduct experiments to analyze the efficacy of two pivotal components: graph modeling and attention mechanisms. Four variants of our model are proposed for comparisons:

  • I3GN-: This model uses an average pooling operation on all item node vectors in neighborhood sessions to generate , which is formulated as:


    where . I3GN- represents the simplest way to employ collaborative information without graph structure modeling and attention mechanisms.

  • I3GN-: This model uses a mean-based aggregation method instead of applying attention mechanisms while aggregating current session’s neighbors. Each edge in has fixed normalized weight and the aggregation process of node can be formalized as:


    where .

  • I3GN-: This model applies average pooling to the current session’s node vectors, in place of the soft attention mechanism, when generating the neighbor information related to long-term preference. So the corresponding calculation in the Inter-session Module can be rewritten as:

  • I3GN-: This variant combines I3GN- and I3GN-, removing all attention mechanisms in the Inter-session Module but preserving the graph structure.
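The contrast between the attention-free variants and attention-based aggregation can be illustrated with a generic sketch. It is not the I3GN formulation: the dot-product scoring against a query vector is an assumption chosen for brevity, whereas the model uses its own learned attention parameters, and both function names are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average pooling: every node vector receives the same fixed weight."""
    dim = len(vectors[0])
    return [sum(vector[d] for vector in vectors) / len(vectors) for d in range(dim)]


def attention_pool(vectors, query):
    """Softmax-weighted pooling; dot-product scoring is an illustrative assumption."""
    logits = [sum(q * x for q, x in zip(query, vector)) for vector in vectors]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    weights = [value / total for value in exps]
    dim = len(vectors[0])
    pooled = [sum(w * vector[d] for w, vector in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


nodes = [[1.0, 0.0], [0.0, 1.0]]
print(mean_pool(nodes))
print(attention_pool(nodes, query=[10.0, 0.0]))
```

Mean pooling treats every node as equally relevant, while the attention weights concentrate on the nodes that match the query, which is the property the attention-based variants rely on to down-weight less relevant items.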

The results of comparisons among I3GN and its variants are shown in Table 4. Observations of the results can be listed as follows:

  • I3GN- gets the worst performance in the experiments: the lack of graph structure and attention mechanisms leads to dramatic performance drops.

  • Comparing the performance of I3GN- with that of I3GN-, the adoption of the graph structure significantly increases the model performance. In addition, applying attention mechanisms (I3GN- and I3GN-) further boosts the performance. This result indicates that not all interacted items in the Inter-session Graph contribute equally to the current session, i.e., not all items from neighborhood sessions are relevant to users’ interests.

  • I3GN- outperforms I3GN- in all experiments. We speculate that the reason is that neighbor nodes of less important items in the current session may also be less important when making recommendations.

  • I3GN takes full advantage of the graph structure to extract complex item-level interactions between the current session and its neighbors, and applies attention mechanisms to alleviate the influence of less relevant items. Hence I3GN surpasses all of its variants.

Figure 7. The number of attention heads.
Figure 8. The number of convolution layers.

5.5. Parameter sensitivity

To study the effect of the number of attention heads and the number of graph convolutional layers in the Inter-session Module, we vary the number of heads from one to 32 and the number of layers from one to four, respectively. Other parameters are fixed in these experiments. The results of Recall@10 on the YOOCHOOSE 1/64 dataset are shown in Figure 7 and Figure 8. From Figure 7 we can observe that as the number of attention heads increases, the performance of I3GN also increases, because multi-head attention brings a more stable learning process. However, the performance gets worse beyond a certain number of heads, which may be due to overfitting.

According to Figure 8, the best performance of I3GN is achieved when the layer number of graph convolution is set to two. We assume that fewer layers could only encode limited information from neighborhood sessions and more layers may aggregate higher-order nodes which may be less relevant to the current session’s nodes.

6. Conclusions

We have proposed a novel method, I3GN, to integrate inter-session item-level interactions into session-based recommendation. I3GN consists of two major parts: SMM and CSE. SMM uses similarity to find neighbor sessions and, based on them, constructs the intra-session graph and the inter-session graph for the current session. After that, CSE employs different strategies to encode the two kinds of graphs and combines them to get the final session representation for prediction. Extensive experiments on real-world datasets show that I3GN outperforms other state-of-the-art methods on different evaluation metrics. Further experiments and analysis support the following conclusions: (1) Inter-session item-level interactions have high potential in session-based recommendation. (2) Combining intra- and inter-session graphs is a rational way to balance the model across scopes, and its superior performance indicates that we need to take both the width and depth of session data into account.

References


  • D. Billsus and M. J. Pazzani (1998) Learning collaborative information filters.. In ICML, Vol. 98, pp. 46–54. Cited by: §2.1.
  • G. Bonnin and D. Jannach (2015) Automated generation of music playlists: survey and experiments. ACM Computing Surveys (CSUR) 47 (2), pp. 26. Cited by: §1, §2.1, 5th item.
  • J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §2.2.
  • E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun (2017) GRAM: graph-based attention model for healthcare representation learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 787–795. Cited by: §2.2.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §2.2.
  • D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: §2.2.
  • M. Fey and J. E. Lenssen (2019) Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428. Cited by: §4.4.
  • J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1263–1272. Cited by: §2.2.
  • M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734. Cited by: §2.2.
  • B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk (2015) Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939. Cited by: §1, §1, §2.1, 6th item, §4.1.
  • W. Hill, L. Stead, M. Rosenstein, and G. Furnas (1995) Recommending and evaluating choices in a virtual community of use. In Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 194–201. Cited by: §2.1.
  • L. Hu, L. Cao, S. Wang, G. Xu, J. Cao, and Z. Gu (2017) Diversifying personalized recommendation with user-session context.. In IJCAI, pp. 1858–1864. Cited by: §4.3.
  • D. Jannach and M. Ludewig (2017) When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 306–310. Cited by: §1, §1, §2.1, 10th item.
  • R. Jin, J. Y. Chai, and L. Si (2004) An automatic weighting scheme for collaborative filtering. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 337–344. Cited by: §2.1.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.2.
  • J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma (2017) Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1419–1428. Cited by: §1, §1, §2.1, 7th item.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015a) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §2.2.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015b) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §2.2.
  • G. Linden, B. Smith, and J. York (2003) Amazon. com recommendations: item-to-item collaborative filtering. IEEE Internet computing (1), pp. 76–80. Cited by: §2.1.
  • Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang (2018) STAMP: short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1831–1839. Cited by: §1, §1, §2.1, 8th item, §4.1.
  • M. Ludewig and D. Jannach (2018) Evaluation of session-based recommendation algorithms. User Modeling and User-Adapted Interaction 28 (4-5), pp. 331–390. Cited by: §1, §1, §3.3.
  • A. Micheli (2009) Neural network for graphs: a contextual constructive approach. IEEE Transactions on Neural Networks 20 (3), pp. 498–511. Cited by: §2.2.
  • B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and J. Riedl (2003) MovieLens unplugged: experiences with an occasionally connected recommender system. In Proceedings of the 8th international conference on Intelligent user interfaces, pp. 263–266. Cited by: §2.1.
  • B. Mobasher, H. Dai, T. Luo, and M. Nakagawa (2002) Using sequential and non-sequential patterns in predictive web usage mining tasks. In 2002 IEEE International Conference on Data Mining, 2002. Proceedings., pp. 669–672. Cited by: §2.1.
  • J. Pareek, M. Jhaveri, A. Kapasi, and M. Trivedi (2013) SNetRS: social networking in recommendation system. In Advances in Computing and Information Technology, pp. 195–206. Cited by: §2.1.
  • M. Quadrana, P. Cremonesi, and D. Jannach (2018) Sequence-aware recommender systems. ACM Computing Surveys (CSUR) 51 (4), pp. 66. Cited by: §1.
  • M. Quadrana, A. Karatzoglou, B. Hidasi, and P. Cremonesi (2017) Personalizing session-based recommendations with hierarchical recurrent neural networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 130–137. Cited by: §4.3.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence, pp. 452–461. Cited by: 4th item.
  • S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme (2010) Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web, pp. 811–820. Cited by: §1, §2.1, 3rd item.
  • P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl (1994) GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM conference on Computer supported cooperative work, pp. 175–186. Cited by: §2.1.
  • B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2000) Application of dimensionality reduction in recommender system-a case study. Technical report Minnesota Univ Minneapolis Dept of Computer Science. Cited by: §2.1.
  • B. M. Sarwar, G. Karypis, J. A. Konstan, J. Riedl, et al. (2001) Item-based collaborative filtering recommendation algorithms.. Www 1, pp. 285–295. Cited by: §1, §2.1, 2nd item.
  • F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §2.2.
  • G. Shani, D. Heckerman, and R. I. Brafman (2005) An mdp-based recommender system. Journal of Machine Learning Research 6 (Sep), pp. 1265–1295. Cited by: §1, §2.1.
  • N. Shervashidze, P. Schweitzer, E. J. v. Leeuwen, K. Mehlhorn, and K. M. Borgwardt (2011) Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12 (Sep), pp. 2539–2561. Cited by: §2.2.
  • Y. K. Tan, X. Xu, and Y. Liu (2016) Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 17–22. Cited by: §1, §2.1.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §2.2, §3.5.1, §3.5.1.
  • M. Wang, P. Ren, L. Mei, Z. Chen, J. Ma, and M. de Rijke (2019) A collaborative session-based recommendation approach with parallel memory modules. Cited by: §1, §1, §2.1, §3.6, 11th item.
  • S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019) Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 346–353. Cited by: §1, §1, §2.1, 9th item, §4.1.
  • P. Yanardag and S. Vishwanathan (2015) Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374. Cited by: §2.2.
  • A. Zimdars, D. M. Chickering, and C. Meek (2001) Using temporal data for making recommendations. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pp. 580–588. Cited by: §1, §2.1.