Session-based Recommendation with Heterogeneous Graph Neural Network

08/12/2021 ∙ by Jinpeng Chen, et al.

The purpose of a session-based recommendation system is to predict the user's next click from the previous session sequence. Current studies generally learn user preferences from the transitions of items in the user's session sequence. However, other useful information in the session sequence, such as user profiles, is largely ignored, which may leave the model unable to learn the user's specific preferences. In this paper, we propose a heterogeneous graph neural network-based session recommendation method, named SR-HetGNN, which learns session embeddings with a heterogeneous graph neural network (HetGNN) and captures the specific preferences of anonymous users. Specifically, SR-HetGNN first constructs heterogeneous graphs containing various types of nodes from the session sequence, which capture the dependencies among items, users, and sessions. Second, HetGNN captures the complex transitions between items and learns item embeddings that contain user information. Finally, to account for users' long-term and short-term preferences, local and global session embeddings are combined by an attention network to obtain the final session embedding. Extensive experiments on two large real-world datasets, Diginetica and Tmall, show that SR-HetGNN outperforms existing state-of-the-art session-based recommendation methods.




I Introduction

In the era of big data, it is extremely difficult for users to select the information they need from a large number of products and services. A recommendation system can learn user preferences from the user’s historical data to help users make reasonable decisions and choices. With the development of recommendation systems and the analysis of user data, user preferences can be divided into long-term preferences and short-term preferences [20]. However, traditional recommendation systems consider only the user’s long-term preferences and ignore shifts in user preferences. For example, on an e-commerce platform, the items purchased by users constitute the users’ behavior sequence, and the traditional recommendation system aims to learn users’ long-term preferences from the transitions of items. Fig. 1(a) shows the transitions of six kinds of items. The traditional recommendation algorithm considers the transitions of items across all user behaviors, so the recommendation system focuses only on the items that the user has paid attention to for a long time and cannot quickly perceive changes in users’ interests. For example, consider a user who has bought jeans for a long time and suddenly starts to like sports pants. Because the user has bought jeans more often than sports pants, the traditional recommendation system believes that the user still prefers jeans. The traditional recommendation system therefore ignores shifts in user preferences because it does not consider the transaction structure of user behavior. To solve this problem, it is necessary to decompose user behavior into a smaller granularity, namely the session. A session is a transactional collection of one event or a group of events. For example, the goods purchased by a user at one time, the songs a user listens to in an hour, and the web pages browsed within a day can all be considered sessions [20]. The session-based recommendation method takes the session as the basic unit of recommendation, which can reduce the loss of user information, and it has been extensively studied.

Fig. 1: Transitions of items and session sequence. (a) The transitions between items in the traditional recommendation system. (b) The session sequence in the session-based recommendation system: the user behavior is divided into session sets. Here, an arrow from one item to another indicates that users purchase the second item after purchasing the first.

As shown in Fig. 1(b), on the e-commerce platform, the users’ purchasing behavior in Fig. 1(a) is divided into a smaller granularity, which generates the session sequence {s1, s2, s3}. The session-based recommendation system decomposes user behavior into a set of sessions, thereby giving user behavior a transactional attribute (a session can be regarded as a transaction), so that the recommendation system can focus on shifts in the user’s preferences. For example, if a user has bought leather shoes for a long time but recently prefers sports shoes, the session-based recommendation system can capture this shift promptly. Instead of continuing to recommend items related to leather shoes, as traditional recommendation systems do, it will recommend items related to sports shoes. However, such a session-based recommendation system in turn ignores the long-term preferences of users. To make the session-based recommendation system consider both the short-term and the long-term preferences of users, it is necessary to fully consider the dependencies between sessions. There are three main kinds of dependency in the session set: dependencies between different items in the same session, dependencies between different sessions, and dependencies between different items in different sessions. This requires the session-based recommendation model to fully consider the context of sessions and learn the complex transitions between items.

In recent years, graph neural networks (GNNs) have been applied to session-based recommendation systems. In a GNN-based recommendation system, the session set is first constructed as a directed graph, where the nodes represent items and the edges represent the transition from one item to another. Then, from the constructed directed graph, the graph neural network learns the complex transitions between items and learns expressive item embeddings, from which it generates session embeddings that contain the complex transition information of items. For example, Session-based Recommendation with Graph Neural Networks (SR-GNN) [22] can not only capture the transitions of items over a short period but also consider the dependencies between distant items, so accurate item embedding vectors can be learned. Meanwhile, SR-GNN adds an attention network that attends to the user’s local and global session embeddings, so that the model can consider the user’s long-term and short-term preferences. Although SR-GNN considers the long-term and short-term preferences of users and the complex transitions between items, it ignores other valid information in the session sequence, such as user information, which may lead to the loss of the specific preferences of different users. For example, in Fig. 2, two users buy shoes after buying jeans, but they buy different types of shoes: one buys leather shoes and the other buys sports shoes. Likewise, one of them buys gloves and the other buys hats. Suppose a new (anonymous) user generates a session whose two items are related to one of these users. When recommending the next item for the new user, that user’s preference should be considered to a certain extent. Certain user information can thus be captured from the item sequence in the session, which can improve the accuracy of the recommendation results.

Fig. 2: Session sequence of different users. Here, the red boxes show that different users may have different preferences after purchasing the same item.

This paper introduces the heterogeneous graph (HetG) [17] to model user information and other information that cannot be expressed in a homogeneous graph. A HetG contains multiple types of nodes and edges that represent different relationships. It can represent not only the transitions between nodes of the same type but also the dependencies between nodes of different types, so the information that a heterogeneous graph can express is richer. Although HetG has a stronger ability to express data, it is very difficult to embed the different nodes of a heterogeneous graph into a unified vector space. The Heterogeneous Graph Neural Network (HetGNN) [26] learns latent vectors of different types of nodes in the heterogeneous graph by aggregating heterogeneous neighbors. HetGNN can capture the heterogeneity of structure and content at the same time and fully consider the transitions of items.

To recommend the next item using rich, multi-type information, we propose a novel method for Session-based Recommendation with Heterogeneous Graph Neural Networks, SR-HetGNN. SR-HetGNN takes full account of the dependencies among items, users, and sessions, and it can learn session embeddings that carry rich information and the complex transitions of items.

The main contributions of this paper are as follows:

  • We construct the session sequence into a heterogeneous graph, in which the rich dependency relationships among items, sessions, and users can be fully considered.

  • We propose a session-based recommendation method with a heterogeneous graph neural network, SR-HetGNN. SR-HetGNN can capture users’ latent information from the item sequence in an anonymous session.

  • We conduct a large number of experiments on real-world datasets. The results show that SR-HetGNN is superior to other existing methods.

The rest of this paper is organized as follows: Section 2 introduces the related work on recommendation systems; Section 3 defines the related symbols and problems; Section 4 introduces Session-based Recommendation with Heterogeneous Graph Neural Networks in detail; Section 5 presents the experiments and analysis; and the final section summarizes this paper.

II Related Work

In this section, we first review the related work on session-based recommendation systems, including traditional recommendation methods and neural-network-based recommendation methods. Then we introduce the heterogeneous graph neural network.

II-A Traditional recommendation methods

Collaborative filtering (CF) [1] is one of the most popular recommendation methods. It groups users according to their ratings of items and finds other users with interests similar to those of the target user. CF then recommends to the target user items that these similar users are interested in but that the target user has not seen or purchased. Matrix Factorization (MF) [9] is a typical traditional recommendation approach. It embeds users and items in the same vector space, and the inner product of a user embedding and an item embedding gives the user’s degree of interest in the item. However, MF cannot learn the sequential transitions of items. Therefore, Rendle et al. combine Matrix Factorization and Markov Chains (MC) to obtain Factorized Personalized Markov Chains (FPMC) [16]. In this model, MF is used to learn user preferences and MC uses transition graphs to model users’ sequential behavior, so that FPMC can better recommend the next item. Rendle et al. also propose a general optimization criterion, BPR-Opt, for personalized ranking and apply it to the MF algorithm to obtain BPR-MF [15].

II-B Neural-network-based methods

In recent years, neural networks have been widely used in session-based recommendation methods. Hidasi et al. apply the Recurrent Neural Network (RNN) to the recommendation system and propose an RNN-based recommendation approach, GRU4Rec [7]. GRU4Rec is essentially an improvement of the basic GRU model that considers the dependency between the previous node and the current node. GRU4Rec is sensitive to the user’s sequential behavior, so it can capture shifts in the user’s preferences in time, but it cannot learn the user’s long-term preferences. Li et al. propose the Neural Attentive Recommendation Machine (NARM) [10]. NARM learns users’ short-term preferences and long-term preferences separately with two RNNs. Liu et al. propose the Short-Term Attention/Memory Priority Model (STAMP) [11]. That work first proposes the short-term memory priority model (STMP), which gives priority to the user’s current interests while considering the long-term preferences of users built outside the model. However, STMP may suffer from interest drift, which leads to incorrect recommendations. To address this issue, an attention network is added to STMP to learn users’ long-term preferences, which yields STAMP. STAMP captures the user’s long-term preferences from the long-term user behavior and obtains the user’s short-term preference from the last clicked item in the current session, which achieves good recommendation performance. Guo et al. implement Hierarchical Leaping Networks (HLN) [6]. HLN can explicitly model the user’s multiple preferences. It has a Leap Recurrent Unit (LRU), which is used to skip items that are not related to user preferences and accept learned preferences. HLN also has a Preference Manager (PM) to manage the learned preferences, and it recommends items for users based on the users’ preferences in the PM.

Graph Neural Networks (GNNs) [27, 23, 13] can capture the dependencies between nodes through message passing between nodes. With the development of graph embedding [4], GNNs have also been widely used in recommendation systems. For example, Wang et al. propose Global Context Enhanced Graph Neural Networks (GCE-GNN) [21]. GCE-GNN constructs a local session graph and a global session graph from the local session sequence and the global session sequence, respectively, and uses a GNN to learn session-level item embeddings and global item embeddings. Here the GNN can consider the complex transitions between the items in the local session graph and the global session graph, which makes the learning of item embeddings more effective. GCE-GNN also combines session-level item embeddings and global item embeddings through the attention mechanism and finally generates mixed session embeddings. Yu et al. implement Target Attentive Graph Neural Networks (TAGNN) [25]. TAGNN constructs the session sequence into a session graph and learns the item embeddings with a GNN. TAGNN also designs a target-aware attention mechanism, which can capture the complex transitions between items and, at the same time, learn how the user’s interest changes across different items.

Compared with the traditional recommendation methods, the neural-network-based session recommendation methods are able to capture the complex dependencies among users, items, and sessions. Therefore, more and more researchers apply different deep learning models to session-based recommendation and solve a series of problems, as shown in Table I.

| Model | Problems solved |
| --- | --- |
| Traditional recommendation methods | Mining user preferences with explicit information |
| Recurrent Neural Network | Mining sequential patterns of user behavior using an RNN |
| Attention mechanism | Emphasizing the user’s current purpose and ignoring the user’s erroneous interactions |
| Graph Neural Network | Considering complex structures and transitions between items |

TABLE I: Comparison of session recommendation models

II-C Heterogeneous Graph Neural Network

A Heterogeneous Information Network (HIN) [18] [24] can build multiple types of nodes into the same graph and link these nodes with different types of edges, so that a HIN can model complex context information [18]. However, because a HIN has different types of nodes and edges, it is difficult to embed them into the same vector space. This paper refers to a HIN as a heterogeneous graph (HetG), in contrast to a homogeneous graph.

Recently, graph representation learning [2] has developed rapidly, and many graph embedding methods have been proposed, such as DeepWalk [12], metapath2vec [3], node2vec [5], and LINE [19]. The DeepWalk algorithm generates node sequences by random walks, treats each sequence as a sentence, and uses language modeling to generate the embedded representations of nodes. This paper uses the DeepWalk algorithm to embed heterogeneous graphs into a single vector space to obtain pre-embedding representations of the various types of nodes. In recent years, heterogeneous graph models have also often been applied to recommendation systems. Ren et al. propose an effective citation recommendation method based on information network clustering, ClusCite [14], which learns the relationships between citations in a heterogeneous information network and clusters them into interest groups. Hu et al. develop a deep neural network with a co-attention mechanism, Meta-path based Context for Recommendation (MCRec) [8]. MCRec learns effective representations of users, items, and meta-path-based context, and it can model the interactions among them.

III Problem Definition

This section first defines the symbols that will be used, and then formally defines the studied problem.

III-A Symbol definition

User behavior $V=\{v_1, v_2, v_3, \dots, v_{|V|}\}$ is composed of the click sequences of all users $U=\{u_1, u_2, u_3, \dots, u_{|U|}\}$. A session-based recommendation system decomposes user behavior V into a smaller granularity to obtain the session sequence $S=\{s_1, s_2, s_3, \dots, s_{|S|}\}$. A session s is composed of multiple items, $s=\{v_1, v_2, v_3, \dots, v_n\}$, where each $v_i \in V$. To handle all nodes in the same model, all nodes are embedded into one vector space, which yields the set of all node embeddings, consisting of the session embedding set, the item embedding set, and the user embedding set. Common symbols can be seen in Table II.

| Notation | Description |
| --- | --- |
| V | Item set, containing all items |
| S | Session set, containing all sessions |
| U | User set, containing all users |
| Arrt | Attribute set of a node |
| v | An item in the item set |
| s | A session in the session set |
| u | A user in the user set |
| arrt | An attribute in the attribute set |
|  | Item embedding set |
|  | Session embedding set |
|  | User embedding set |
|  | Attribute embedding set |
|  | An item embedding |
|  | A session embedding |
|  | A user embedding |
|  | An attribute embedding |
|  | Set of all node embeddings |

TABLE II: Symbol Table

III-B Problem definition

III-B1 Heterogeneous graph construction

We construct the session sequence S into a heterogeneous graph $G=(V, S, U, E_d, E_u)$. G contains three types of nodes (item nodes V, session nodes S, and user nodes U) and two types of edges: the set of directed edges between items, $E_d$, and the set of undirected edges, $E_u$. A directed edge $(v_i, v_j) \in E_d$ means that a user purchased item $v_j$ right after purchasing item $v_i$, so $E_d$ represents the complex transitions between items. The undirected edges $E_u = \{(v, s), (v, u), (s, u)\}$ represent the relationships between items and sessions, items and users, and sessions and users, respectively.
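The construction above can be illustrated with a minimal Python sketch. It assumes that each session arrives as a `(session_id, user_id, item_list)` tuple; the function name `build_hetero_graph`, the tagged-tuple node encoding, and the toy data are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def build_hetero_graph(sessions):
    """Build the two edge sets of the heterogeneous graph.

    sessions: iterable of (session_id, user_id, [item_id, ...]) tuples.
    Returns (directed, undirected):
      directed   - set of (item, next_item) transition edges
      undirected - dict mapping each node to its neighbors over the
                   item-session, item-user and session-user links
    Nodes are tagged tuples such as ("item", "jeans") so that ids of
    different node types never collide.
    """
    directed = set()
    undirected = defaultdict(set)

    def link(a, b):
        undirected[a].add(b)
        undirected[b].add(a)

    for session_id, user_id, items in sessions:
        s, u = ("session", session_id), ("user", user_id)
        link(s, u)                                   # session-user edge
        for item in items:
            v = ("item", item)
            link(v, s)                               # item-session edge
            link(v, u)                               # item-user edge
        for prev, nxt in zip(items, items[1:]):      # consecutive purchases
            directed.add((("item", prev), ("item", nxt)))
    return directed, dict(undirected)


# Toy data (hypothetical): two users who both start from jeans.
sessions = [
    ("s1", "u1", ["jeans", "leather_shoes", "gloves"]),
    ("s2", "u2", ["jeans", "sports_shoes", "hat"]),
]
directed, undirected = build_hetero_graph(sessions)
```

In this toy graph the item node for jeans is linked to both users, which is the kind of user information a purely item-to-item session graph would not carry.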

III-B2 Learning item embeddings with Heterogeneous Graph Neural Network

After the heterogeneous graph G is constructed, this paper designs a Heterogeneous Graph Neural Network model with learnable parameters. The model is used to learn the embedded representations of items, in which the information of user nodes and session nodes is aggregated into the item embeddings; each item embedding is the vector representation of an item node v.

III-B3 Session-based Recommendation with HetGNN

The goal of session-based recommendation is to recommend the next item for the user in the current session. The model in this paper decomposes user behavior V into the session sequence S and constructs S into the heterogeneous graph G. G is then fed into the HetGNN model, which learns item embeddings that contain rich information and the transitions of items. The model then generates session embeddings from the item embeddings. Finally, the scores of the candidate items are obtained through the softmax layer, and the top-n items are recommended as candidates for the user’s next item.
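The final scoring step can be sketched as follows. The text only states that scores pass through a softmax layer; scoring each candidate by the inner product of the session embedding and the item embedding is an assumption borrowed from SR-GNN-style models, and `recommend_top_n` and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recommend_top_n(session_emb, item_embs, n):
    """Rank candidate items for one session.

    item_embs: dict item_id -> embedding (list of floats).
    Assumption: the raw score of an item is the inner product of the
    session embedding and the item embedding.
    """
    ids = list(item_embs)
    logits = [sum(a * b for a, b in zip(session_emb, item_embs[i])) for i in ids]
    probs = softmax(logits)
    ranked = sorted(zip(ids, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Toy embeddings (hypothetical values).
item_embs = {"jeans": [1.0, 0.0], "hat": [0.0, 1.0], "gloves": [0.5, 0.5]}
top2 = recommend_top_n([0.2, 0.9], item_embs, n=2)
```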

IV The Proposed Method

This section introduces the model structure of SR-HetGNN, which is divided into three main parts: constructing the session sequence into a heterogeneous graph; learning the item embeddings; and generating the session embeddings. Fig. 3 shows the model structure of SR-HetGNN.

Fig. 3: Model structure diagram. (a) The model structure of SR-HetGNN: the session sequence is constructed into a heterogeneous graph, which contains three types of nodes: item nodes, session nodes, and user nodes; next, the item embeddings are learned through HetGNN; then the session embeddings are generated through the attention network; finally, the recommendation result for the next item is given. (b) The structure of the heterogeneous graph neural network (HetGNN): first, sample heterogeneous neighbors for all item nodes; next, aggregate the content of the heterogeneous neighbor nodes; then aggregate the heterogeneous neighbors of the same type; finally, aggregate the different types to obtain the final embedding of the item nodes.

IV-A Building the Heterogeneous Graph

First, the session sequence S is constructed into a heterogeneous graph $G=(V, S, U, E_d, E_u)$, where $E_d$ is the set of directed item-to-item edges and $E_u$ is the set of undirected edges; G is shown in Fig. 3(a), and its structure is described in Section 3.2. After G is created, the DeepWalk algorithm [12] is used to embed all nodes of the heterogeneous graph into one vector space. The steps of the DeepWalk algorithm are as follows. First, a random walk of fixed length is started from each node to obtain node sequences, in which each element is the symbolic representation of a node in G; these sequences play the role of sentences. Then Word2vec is trained on the sequences to generate the pre-embedding vectors for all nodes. For example, as shown in Fig. 3(a), a random walk starts at an item node, moves randomly to a neighboring user node, then jumps to another item node, and repeats this process until a sequence of the fixed length is obtained.
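The walk-generation half of DeepWalk can be sketched in a few lines of Python. The adjacency-dict input, the function name `random_walks`, and the parameter names are hypothetical; the resulting walks would then be passed as "sentences" to a skip-gram model such as Word2vec, which is not shown here.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate fixed-length uniform random walks from every node.

    adjacency: dict node -> list of neighbor nodes, with item, user and
    session nodes mixed in one graph.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:          # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy heterogeneous graph (hypothetical): two items, one user, one session.
adjacency = {
    "v1": ["u1", "s1", "v2"],
    "v2": ["u1", "s1", "v1"],
    "u1": ["v1", "v2", "s1"],
    "s1": ["v1", "v2", "u1"],
}
walks = random_walks(adjacency, walk_length=5, walks_per_node=2)
```

Because every node type shares one adjacency structure, a single walk can pass through items, users, and sessions, which is what lets all node types end up in the same pre-embedding space.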

IV-B Learning item embeddings

After the pre-embedding node vectors are generated, each node in the heterogeneous graph G can be represented by a vector of the same dimension. However, the pre-embedding node vectors lack expressive power and cannot express the complex transitions between items. Therefore, this paper uses the Heterogeneous Graph Neural Network (HetGNN) to learn item embeddings that contain rich information and complex transitions. The model structure of HetGNN can be seen in Fig. 3(b). The core idea of HetGNN is aggregation, which is divided into four main steps: sampling heterogeneous neighbors, aggregating the node content of heterogeneous neighbors, aggregating heterogeneous neighbors of the same type, and aggregating different types.

IV-B1 Sampling heterogeneous neighbors

An important issue in learning item embeddings is how to aggregate heterogeneous neighbor nodes with different content. The types and numbers of these heterogeneous neighbor nodes differ, and aggregating them may require different feature transformations. For example, one user purchases three items in two sessions, and another user purchases four items in three sessions. Here, the numbers of neighbor nodes of the two users are different. The problem is therefore how to select heterogeneous neighbors so that the same model can be used to aggregate them. To solve this problem, this paper adopts the random walk with restart (RWR) [26] method to sample heterogeneous neighbors. The main steps of RWR are as follows:

  1. For all item nodes $V=\{v_1, v_2, v_3, \dots, v_{|V|}\}$, start a random walk from each item node v. At each step of the walk, there is a probability $p$ that the walk returns to the initial node v. To make sure that nodes of all types are collected during the random walk, RWR constrains the number of nodes of each type that are visited, and stores all the visited nodes in a list, namely RWR(v);

  2. Group the nodes in RWR(v) by type, and for each type t select the top-k most frequently visited nodes as heterogeneous neighbors of item node v. In this way, every item node obtains neighbors of all types and the same number of neighbors per type.
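The two steps above can be sketched as follows. This is a simplified illustration rather than the HetGNN sampler: it uses a fixed number of walk steps instead of per-type visit quotas, and the function name `sample_neighbors_rwr`, its parameters, and the toy graph are hypothetical.

```python
import random
from collections import Counter


def sample_neighbors_rwr(adjacency, node_type, start, restart_p, walk_len, k, seed=0):
    """Random walk with restart from `start`.

    Returns {type: [node, ...]} holding, for each node type, up to k of
    the most frequently visited nodes (the start node itself is excluded).
    Simplification: a fixed walk length replaces per-type visit quotas.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_len):
        if rng.random() < restart_p or not adjacency[current]:
            current = start                        # restart at the initial node
        else:
            current = rng.choice(adjacency[current])
            if current != start:
                visits[current] += 1
    by_type = {}
    for node, _count in visits.most_common():      # most visited first
        bucket = by_type.setdefault(node_type[node], [])
        if len(bucket) < k:
            bucket.append(node)
    return by_type


# Toy graph (hypothetical): two items, one user, one session.
adjacency = {
    "v1": ["u1", "s1", "v2"],
    "v2": ["u1", "s1", "v1"],
    "u1": ["v1", "v2", "s1"],
    "s1": ["v1", "v2", "u1"],
}
node_type = {"v1": "item", "v2": "item", "u1": "user", "s1": "session"}
neighbors = sample_neighbors_rwr(adjacency, node_type, "v1",
                                 restart_p=0.5, walk_len=200, k=1)
```

Capping each type at k nodes is what gives every item node a neighbor set of the same shape, so one aggregation model can be shared across nodes.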

IV-B2 Aggregating the node content of heterogeneous neighbors

Different types of heterogeneous neighbors have different node content. For example, user nodes may contain attributes such as age and gender, and item nodes contain attributes such as item name and type. The question is how to encode the different attributes of a node into a fixed-dimensional embedding through neural networks. HetGNN designs an architecture based on a bi-directional LSTM (BiLSTM) to capture the interactions between features and aggregate all the attributes of a node into its embedding, which gives the embedding greater representation ability. The specific steps are as follows:

Let $v'$ be a heterogeneous neighbor of node $v$ with attribute set $Arrt=\{arrt_1, arrt_2, \dots, arrt_m\}$. Different models are used to transform each attribute $arrt_i$ into an embedding vector $x_i$ of the same dimension; HetGNN provides different solutions for different types of attributes, such as one-hot encoding for text attributes and a CNN for images. After the embedding vector of each attribute of the node is obtained, the node embedding is formulated as follows:

$$f_1(v') = \frac{1}{m} \sum_{i=1}^{m} \left[ \overrightarrow{LSTM}\{FC_{\theta_x}(x_i)\} \oplus \overleftarrow{LSTM}\{FC_{\theta_x}(x_i)\} \right] \quad (1)$$

where $f_1(v') \in \mathbb{R}^{d}$ and $d$ is the dimension of the node embedding; $FC_{\theta_x}$ is the vector transformation layer, which can be a fully connected layer; and $\oplus$ is the concatenation operation. Each direction of the BiLSTM is computed as follows:

$$\begin{aligned}
z_i &= \sigma\left(U_z FC_{\theta_x}(x_i) + W_z h_{i-1} + b_z\right) \\
f_i &= \sigma\left(U_f FC_{\theta_x}(x_i) + W_f h_{i-1} + b_f\right) \\
o_i &= \sigma\left(U_o FC_{\theta_x}(x_i) + W_o h_{i-1} + b_o\right) \\
\hat{c}_i &= \tanh\left(U_c FC_{\theta_x}(x_i) + W_c h_{i-1} + b_c\right) \\
c_i &= f_i \circ c_{i-1} + z_i \circ \hat{c}_i \\
h_i &= \tanh(c_i) \circ o_i
\end{aligned} \quad (2)$$

where $U_j$, $W_j$, and $b_j$ ($j \in \{z, f, o, c\}$) are learnable parameters; $f_i$, $z_i$, and $o_i$ are the forget gate vector, input gate vector, and output gate vector, respectively; $\circ$ is the element-wise product; and $h_i$ is the output hidden state.

This method aggregates heterogeneous content to make node embeddings more expressive, and it makes it convenient to add further content attributes to nodes.

IV-B3 Aggregating heterogeneous neighbors of the same type

After the node contents of the heterogeneous neighbors are aggregated, the embedding representation of each heterogeneous neighbor is available. Each node has multiple types of heterogeneous neighbors, and each type contains multiple neighbors. HetGNN designs a neural network to aggregate the nodes of the same type into one embedding vector. This part again uses a BiLSTM to aggregate nodes of the same type and learn the complex relationships between them, so that the learned type embedding is more expressive. The type embedding is formulated as follows:

$$f_2^t(v) = \frac{1}{|N_t(v)|} \sum_{v' \in N_t(v)} \left[ \overrightarrow{LSTM}\{f_1(v')\} \oplus \overleftarrow{LSTM}\{f_1(v')\} \right] \quad (3)$$

where $N_t(v)$ is the set of sampled type-$t$ neighbors of node $v$, $f_1(v')$ is the content embedding of neighbor $v'$, and the BiLSTM model is the same as in Formula (2).

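IV-B4 Aggregating different types

After the type embeddings are obtained, all of them need to be aggregated into one vector, which is the final embedding of node v. However, different types of heterogeneous neighbors have different effects on node v, so an attention mechanism is introduced. The importance of each type to node v is calculated as follows:

$$\alpha^{v,i} = \frac{\exp\left\{\text{LeakyReLU}\left(\mathbf{a}^{\top}\left[f_i \oplus f_1(v)\right]\right)\right\}}{\sum_{f_j \in F(v)} \exp\left\{\text{LeakyReLU}\left(\mathbf{a}^{\top}\left[f_j \oplus f_1(v)\right]\right)\right\}} \quad (4)$$

where LeakyReLU is the leaky version of a Rectified Linear Unit and $\mathbf{a}$ is the attention parameter. $O$ is the type set of heterogeneous nodes, $F(v) = \{f_1(v)\} \cup \{f_2^t(v), t \in O\}$ is the set of embeddings to be combined (the content embedding $f_1(v)$ of node v and its type embeddings $f_2^t(v)$), and $t$ is a heterogeneous node type in $O$. After the importance of each type to node v is calculated, the final embedding of node v is calculated as follows:

$$\mathbf{e}_v = \alpha^{v,v} f_1(v) + \sum_{t \in O} \alpha^{v,t} f_2^t(v) \quad (5)$$

As the final embedding of the item node v, $\mathbf{e}_v$ not only contains the transitions between items but also incorporates the information of other types of nodes, which makes it more expressive.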
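The type-level attention can be illustrated with a small pure-Python sketch. It assumes the HetGNN-style form in which each candidate embedding is concatenated with the node's own content embedding, scored by a single attention vector, passed through LeakyReLU, and normalized by a softmax; the function names, the slope value, and the toy vectors are hypothetical.

```python
import math


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def attention_combine(self_emb, type_embs, attn):
    """Combine a node's own embedding with its per-type neighbor embeddings.

    self_emb:  content embedding of the node (length d)
    type_embs: list of type embeddings, one per neighbor type (length d each)
    attn:      attention vector of length 2 * d
    Returns (weights, final_embedding).
    """
    candidates = [self_emb] + list(type_embs)
    logits = []
    for f in candidates:
        concat = list(f) + list(self_emb)          # candidate joined with self
        logits.append(leaky_relu(sum(a * x for a, x in zip(attn, concat))))
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]            # softmax over candidates
    d = len(self_emb)
    final = [sum(w * f[j] for w, f in zip(weights, candidates)) for j in range(d)]
    return weights, final


# With a zero attention vector every candidate gets the same weight.
weights, final = attention_combine([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]],
                                   attn=[0.0, 0.0, 0.0, 0.0])
```

The zero-vector case is a convenient sanity check: the result reduces to a plain average of the candidate embeddings, and training the attention vector then moves the weights away from uniform.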

IV-C Generating Session Embeddings

After the item embeddings are learned with HetGNN, the session embeddings can be generated. For a session $s=\{v_1, v_2, v_3, \dots, v_n\}$, the pre-embedding vector is calculated as follows:

$$\mathbf{s}_{pre} = \mathbf{e}_1 \oplus \mathbf{e}_2 \oplus \dots \oplus \mathbf{e}_n \oplus \mathbf{0} \quad (6)$$

where $\mathbf{e}_i$ is the node embedding of node $v_i$; $\mathbf{0}$ is a zero vector without a fixed dimension; and $\oplus$ is the concatenation operation. Since the number of items may differ from session to session, $\mathbf{0}$ is appended so that all pre-embedding session vectors have the same dimension.
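The zero-padding step can be shown with a short sketch. The function name `pad_session` and the `max_len` parameter are hypothetical, and truncating sessions longer than `max_len` is an assumption that the text does not state.

```python
def pad_session(item_embs, max_len):
    """Concatenate item embeddings and zero-pad to max_len * d values.

    item_embs: list of item embeddings (each a list of d floats).
    Assumption: sessions longer than max_len are truncated.
    """
    d = len(item_embs[0])
    flat = [x for emb in item_embs[:max_len] for x in emb]
    return flat + [0.0] * (max_len * d - len(flat))


short = pad_session([[1.0, 2.0], [3.0, 4.0]], max_len=3)
full = pad_session([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], max_len=3)
```

Both results have the same length (here 6), which is what allows sessions of different lengths to be batched together.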

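Since the user’s long-term preferences and short-term preferences have different effects on the recommendation results, an attention mechanism is added to the model to obtain a hybrid session embedding that can express both long-term and short-term preferences. We first consider the local embedding $\mathbf{s}_l$ of the session, which is formulated as follows:

$$\mathbf{s}_l = \mathbf{e}_n \quad (7)$$

where $\mathbf{e}_n$ is the embedding vector of the last item in the current session.

For the user’s long-term preferences, it is necessary to consider the transitions between all items. In this paper, a soft attention mechanism is used to learn the global embedding $\mathbf{s}_g$. The global session embedding is formulated as follows:

$$\alpha_i = \mathbf{q}^{\top} \sigma\left(W_1 \mathbf{e}_n + W_2 \mathbf{e}_i + \mathbf{c}\right), \qquad \mathbf{s}_g = \sum_{i=1}^{n} \alpha_i \mathbf{e}_i \quad (8)$$

where $\mathbf{e}_i$ is the embedding of the $i$-th item in the session, $\sigma$ is the sigmoid function, and the vector $\mathbf{q}$, the bias $\mathbf{c}$, and the matrices $W_1$ and $W_2$ are learnable weights applied to the item embeddings. After the global embedding and the local embedding of the session are obtained, the hybrid session embedding $\mathbf{s}_h$ is formulated as follows:

$$\mathbf{s}_h = W_3 \left[\mathbf{s}_l \oplus \mathbf{s}_g\right] \quad (9)$$

where the matrix $W_3$ is used to fuse $\mathbf{s}_l$ and $\mathbf{s}_g$ into the hybrid session embedding.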
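The combination of local and global embeddings can be sketched in pure Python. This assumes the SR-GNN-style soft attention, in which each item is scored against the last item through two weight matrices, a bias, and a projection vector; the names `W1`, `W2`, `W3`, `c`, and `q`, the helper functions, and the toy values are all hypothetical.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def session_embedding(item_embs, W1, W2, c, q, W3):
    """Return (local, global, hybrid) embeddings of one session.

    item_embs: list of item embeddings in click order (each of length d).
    W1, W2: d x d matrices; c, q: vectors of length d; W3: d x 2d matrix.
    """
    local = list(item_embs[-1])                    # last clicked item
    a = matvec(W1, local)
    global_emb = [0.0] * len(local)
    for e in item_embs:
        b = matvec(W2, e)
        hidden = [sigmoid(ai + bi + ci) for ai, bi, ci in zip(a, b, c)]
        alpha = sum(qi * hi for qi, hi in zip(q, hidden))   # attention score
        global_emb = [g + alpha * ei for g, ei in zip(global_emb, e)]
    hybrid = matvec(W3, local + global_emb)        # fuse the concatenation
    return local, global_emb, hybrid


# Toy parameters chosen so that every attention score equals 1.
zeros = [[0.0, 0.0], [0.0, 0.0]]
local, global_emb, hybrid = session_embedding(
    item_embs=[[1.0, 0.0], [0.0, 1.0]],
    W1=zeros, W2=zeros, c=[0.0, 0.0], q=[1.0, 1.0],
    W3=[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],
)
```

With these toy values the global embedding is the plain sum of the item embeddings and the hybrid embedding is the sum of the local and global parts, which makes the data flow easy to check by hand.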

After the session embedding is generated, the score of each candidate item is calculated through the softmax layer, and the model is then trained with the backpropagation algorithm. The detailed training process of the model is shown in Algorithm 1.

Algorithm 1 The training process of SR-HetGNN
1:Input: Session set S, user set U, item set V; number of user neighbors, number of item neighbors, number of session neighbors.
2:Initialize the parameters of the SR-HetGNN model.
3:Build the heterogeneous graph G=(V, S, U, E_d, E_u), with directed item edges E_d and undirected edges E_u.
4:Generate the pre-embedding vectors for all nodes by DeepWalk.
5:for each training iteration do
6:     Sample a mini-batch of sessions from the session sequence S.
7:     for each session s in the mini-batch do
8:         for each item v in s do
9:              Sample heterogeneous neighbors by random walk with restart (RWR).
10:              Get the aggregated embedding of each heterogeneous node by (1).
11:              Get the aggregated embedding of each heterogeneous type by (3).
12:              Calculate the attention factor of each heterogeneous type by (4).
13:              Get the final embedding of the item by (5).
14:         end for
15:         Get the local session embedding by (7).
16:         Get the global session embedding by (8).
17:         Generate the final session embedding by (9).
18:     end for
19:     Calculate the scores of candidate items through the softmax layer.
20:     Calculate the loss with the cross-entropy loss function.
21:     Update the parameters of the SR-HetGNN model.
22:end for
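Steps 19 and 20 of Algorithm 1 can be sketched as follows. The sketch assumes the usual form of softmax cross-entropy with one ground-truth next item per session, and averaging over the mini-batch is an additional assumption; the function names are hypothetical.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-probability of the target item under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def batch_loss(batch_logits, targets):
    """Mean cross-entropy over a mini-batch of sessions (assumed reduction)."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Two sessions, three candidate items each (hypothetical scores).
loss = batch_loss([[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]], targets=[0, 2])
```

Computing the loss through the log-sum-exp form avoids overflow when the raw scores are large.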

V Experiments and Analysis

In this section, we first introduce the data set used in the experiment, baselines for comparison, and evaluation metrics. Then, we give the experimental results and analyze the results.

V-A Dataset

Two real-world datasets, Diginetica and Tmall, are used to evaluate the performance of the proposed model.

Diginetica comes from CIKM Cup 2016. First, based on its transaction data, Diginetica is preprocessed to remove the records of anonymous users, because user nodes are needed when constructing heterogeneous graphs. Second, we delete items that appear fewer than 5 times in the dataset and all sessions that contain only one item. Finally, we split the dataset, using the last few days of data as the test set and the rest as the training set. To verify the influence of user information on the recommendation results, users that do not appear in the training set are deleted from the test set, and items that do not appear in the training set are deleted as well. The resulting dataset is summarized in Table III.
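The filtering steps can be sketched as below. The input layout, the function name `filter_sessions`, and the order of operations (item counts are taken after anonymous sessions are dropped, and session length is checked after rare items are removed) are assumptions for illustration; the text does not fix them.

```python
from collections import Counter


def filter_sessions(sessions, min_item_count=5, min_session_len=2):
    """Drop anonymous sessions, rare items, and too-short sessions.

    sessions: list of (user_id, [item_id, ...]); user_id None = anonymous.
    """
    known = [(u, items) for u, items in sessions if u is not None]
    counts = Counter(i for _, items in known for i in items)
    kept = []
    for u, items in known:
        items = [i for i in items if counts[i] >= min_item_count]
        if len(items) >= min_session_len:
            kept.append((u, items))
    return kept


# Toy log (hypothetical); a threshold of 2 keeps the example small.
raw = [
    ("u1", ["a", "b"]),
    ("u2", ["a", "c"]),
    (None, ["a", "b"]),
    ("u3", ["b", "a"]),
]
clean = filter_sessions(raw, min_item_count=2)
```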

| Statistics | Diginetica |
| --- | --- |
| # of items | 16882 |
| # of training sessions | 149295 |
| # of testing sessions | 2346 |
| # of users | 42205 |

TABLE III: Diginetica Dataset

The Tmall dataset comes from the Tianchi Dataset of Alibaba Cloud. This paper selects the add-to-favorite data of users in the Tmall dataset within 4 months. First, the Tmall dataset is preprocessed to remove users with fewer than 20 operations, items that appear fewer than 10 times, and sessions of length 1. Then, the dataset is split, with the data of the last 15 days selected as the test set. As with the Diginetica data, users and items that do not exist in the training set are removed from the test set. The resulting dataset after preprocessing is shown in Table IV.
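The time-based split and the test-set filtering can be sketched as follows. The `(timestamp, user_id, items)` layout, the function name `temporal_split`, and the rule that a test session must keep at least two items (one input and one target) are assumptions for illustration.

```python
def temporal_split(sessions, split_time):
    """Split sessions by time and drop unseen users and items from the test set.

    sessions: list of (timestamp, user_id, [item_id, ...]).
    Sessions before split_time form the training set.
    """
    train = [s for s in sessions if s[0] < split_time]
    train_users = {u for _, u, _ in train}
    train_items = {i for _, _, items in train for i in items}
    test = []
    for ts, u, items in sessions:
        if ts < split_time or u not in train_users:
            continue
        items = [i for i in items if i in train_items]
        if len(items) >= 2:            # assumed: need an input and a target
            test.append((ts, u, items))
    return train, test


# Toy log (hypothetical timestamps in days).
log = [
    (1, "u1", ["a", "b"]),
    (2, "u2", ["b", "c"]),
    (10, "u1", ["a", "z", "c"]),
    (11, "u9", ["a", "b"]),
    (12, "u2", ["z", "a"]),
]
train, test = temporal_split(log, split_time=10)
```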

Statistics                Tmall
# of items                10513
# of training sessions    75348
# of testing sessions     9175
# of users                13076
TABLE IV: Tmall Dataset
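
The item and session filtering described for both datasets can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script: the function names are hypothetical, sessions are assumed to be lists of item IDs, user filtering is omitted, and item counts are computed once rather than re-counted after each removal.

```python
from collections import Counter

def filter_sessions(sessions, min_item_count=5, min_session_len=2):
    """Drop rare items, then drop sessions that are left too short."""
    counts = Counter(item for s in sessions for item in s)
    kept = []
    for s in sessions:
        filtered = [i for i in s if counts[i] >= min_item_count]
        if len(filtered) >= min_session_len:
            kept.append(filtered)
    return kept

def restrict_test_to_train_items(train, test, min_session_len=2):
    """Remove test items unseen in training; dropping test sessions that
    become too short afterwards is an assumption of this sketch."""
    train_items = {i for s in train for i in s}
    kept = []
    for s in test:
        filtered = [i for i in s if i in train_items]
        if len(filtered) >= min_session_len:
            kept.append(filtered)
    return kept
```

With these defaults, `min_item_count=5` matches the Diginetica setting, and `min_item_count=10` would match the Tmall setting.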

V-B Baselines

To evaluate the performance of this model, we compare our model with the following baselines:

  • BPR-MF [15]: It uses Matrix Factorization (MF) to learn user preferences. It proposes BPR-opt, a general optimization criterion for personalized ranking, and applies it to MF.

  • FPMC [16]: It uses Matrix Factorization (MF) to learn users’ preferences and a Markov Chain (MC) to model users’ sequential behavior through a transition graph.

  • GRU4Rec [7]: It is an RNN-based recommendation model built on an improved GRU, and it is sensitive to the order of items in the sequence.

  • SR-GNN [22]: It converts the session sequence into a homogeneous graph and uses a Graph Neural Network (GNN) to learn item embeddings.
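
To make the graph-based baseline more concrete, the sketch below shows one common way to turn a session's click sequence into the directed item-to-item edges of a homogeneous session graph. The function name and the use of transition counts as edge weights are assumptions for illustration; they are not taken from the SR-GNN code.

```python
from collections import Counter

def session_to_edges(session):
    """Count directed transitions between consecutive items in one session."""
    return Counter(zip(session, session[1:]))

# Hypothetical session of item IDs.
edges = session_to_edges(["a", "b", "c", "b", "c"])
```

Each key of `edges` is an (source item, target item) pair, and each value is how often that transition occurred in the session.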

V-C Evaluation Metrics

This paper uses the following evaluation metric to evaluate the model.

Recall@n: Recall is widely used as a measure of recommendation quality. It represents the proportion of correctly recommended items among the test samples. Recall@n is defined as follows:

$$\mathrm{Recall@}n = \frac{1}{|T|} \sum_{t \in T} h(t)$$

where $T$ is the set of targets for sessions in the test set, $t$ is the recommended target of a session $s$, and $r(s)$ is the result of the top-n recommendation. If $t$ is in the recommended result $r(s)$, $h(t) = 1$; otherwise, $h(t) = 0$.
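
The metric described above can be computed with a few lines of code. The following is a minimal sketch, assuming each test session provides a ranked list of recommended item IDs and a single target item; the function name `recall_at_n` is hypothetical.

```python
def recall_at_n(ranked_lists, targets, n):
    """Fraction of test sessions whose target appears in the top-n list."""
    if not targets:
        raise ValueError("targets must not be empty")
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:n])
    return hits / len(targets)

# Hypothetical recommendations for three test sessions.
ranked = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
targets = [2, 6, 0]
recall_2 = recall_at_n(ranked, targets, n=2)
recall_3 = recall_at_n(ranked, targets, n=3)
```

Because the top-n list only grows with n, Recall@n never decreases as n increases.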

Fig. 4: Performance w.r.t. sampled neighbor size on Diginetica. Panels: (a) User type, (b) Item type, (c) Session type.
Fig. 5: Performance w.r.t. sampled neighbor size on Tmall. Panels: (a) User type, (b) Item type, (c) Session type.

V-D Parameter Setup

Hyper-parameters strongly affect model training and, to a certain extent, determine the quality of the trained model. In our model, the number of heterogeneous neighbors of an item node is a key set of parameters: the number of user neighbors, the number of item neighbors, and the number of session neighbors. We select the values of these parameters through extensive experiments on the Diginetica and Tmall datasets and plot the results in Fig. 4 and Fig. 5.

According to Fig. 4, on the Diginetica dataset the model performs best when the number of user neighbors is 15, the number of item neighbors is 10, and the number of session neighbors is 1. In contrast, on the Tmall dataset the model performs best when the number of user neighbors is 1, the number of item neighbors is 1, and the number of session neighbors is 15. A possible reason is that the model needs a different number of neighbors to extract information from session sequences of different lengths. It can also be seen that, in either scenario, using too many heterogeneous neighbors may introduce noisy nodes and thus reduce the performance of the model.
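
The per-type neighbor sizes reported above can be captured in a small configuration and applied when selecting neighbors for an item node. The sketch below is only an illustration: it assumes that candidate neighbors arrive as (node type, node ID) pairs, for example from random walks, and that the most frequently visited nodes of each type are kept. The names `NEIGHBOR_SIZES` and `select_neighbors` are hypothetical.

```python
from collections import Counter

# Per-type neighbor sizes reported in the text for each dataset.
NEIGHBOR_SIZES = {
    "Diginetica": {"user": 15, "item": 10, "session": 1},
    "Tmall": {"user": 1, "item": 1, "session": 15},
}

def select_neighbors(visited, sizes):
    """Keep the most frequently visited nodes of each type, capped per type.

    visited: iterable of (node_type, node_id) pairs (assumed input format).
    sizes:   mapping from node type to the maximum number of neighbors.
    """
    by_type = {t: Counter() for t in sizes}
    for node_type, node_id in visited:
        if node_type in by_type:
            by_type[node_type][node_id] += 1
    return {
        t: [node_id for node_id, _ in by_type[t].most_common(k)]
        for t, k in sizes.items()
    }

# Hypothetical visits collected around one item node.
visits = [("user", "u1"), ("user", "u1"), ("user", "u2"),
          ("item", "i1"), ("item", "i2"), ("item", "i2"),
          ("session", "s1")]
neighbors = select_neighbors(visits, NEIGHBOR_SIZES["Tmall"])
```

Capping each type separately keeps one frequent node type from crowding out the others, which is the motivation for tuning the three sizes independently.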

Fig. 6: Performance w.r.t. learning rate. Panels: (a) Performance on Diginetica, (b) Performance on Tmall.

As an important hyper-parameter in deep learning, the learning rate (lr) determines whether and how quickly the objective function converges to a local minimum. We train the model with different learning rates; the results are shown in Fig. 6. The figure shows that the same learning rate gives the best performance of SR-HetGNN on both the Diginetica and Tmall datasets, so this value is chosen as the learning rate.

To understand the effect of the value of n in top-n recommendation on performance, we compare SR-HetGNN with SR-GNN under different values of n and plot the results in Fig. 7. Fig. 7(a) shows that, on Diginetica, Recall@n of SR-GNN is higher than that of SR-HetGNN when n < 40, while Recall@n of SR-HetGNN is higher than that of SR-GNN when n ≥ 40. Fig. 7(b) shows that SR-HetGNN always outperforms SR-GNN on Tmall. As n increases, the performance advantage of SR-HetGNN becomes more obvious.

Fig. 7: Comparison of top-n recommendation results. Panels: (a) Comparison on Diginetica, (b) Comparison on Tmall.

During training, the objective function reaches its optimal value at some point. Continuing to train beyond that point is of little use, because the performance of the model will hardly improve. Knowing when to stop training therefore saves time. In this paper, we plot the loss curve of the model to judge when the model converges. The loss curve of the training process is shown in Fig. 8.


Fig. 8: Convergence process of the model.

As shown in Fig. 8, the loss decreases rapidly in the first four epochs, and the model converges by the 10th epoch on both the Diginetica and Tmall datasets. Therefore, the number of epochs can be set to 10 when training the model.
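
Judging convergence from the loss curve can also be automated. The following minimal sketch reports the first epoch at which the loss has stopped improving by more than a tolerance for several consecutive epochs. The function name, the tolerance, and the patience value are hypothetical choices, not settings from the paper.

```python
def first_converged_epoch(losses, tol=1e-3, patience=2):
    """Return the 1-based epoch at which the loss has improved by less than
    `tol` for `patience` consecutive epochs, or None if that never happens.

    losses[0] is the loss after epoch 1.
    """
    stalled = 0
    for idx in range(1, len(losses)):
        improvement = losses[idx - 1] - losses[idx]
        stalled = stalled + 1 if improvement < tol else 0
        if stalled >= patience:
            return idx + 1
    return None

# Hypothetical loss curve: fast decrease, then a plateau.
curve = [5.0, 3.0, 2.0, 1.5, 1.4995, 1.4993]
stop_epoch = first_converged_epoch(curve)
```

A rule of this kind yields a stopping point without inspecting the plot by hand, although the thresholds still need to be chosen per dataset.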

V-E Comparison with Baselines

In order to test the performance of our model, it is compared with other recommendation methods. The results are shown in Fig. 9 and Fig. 10.

Fig. 9: Performance of different recommendation algorithms on Diginetica.
Fig. 10: Performance of different recommendation algorithms on Tmall.

It can be seen from the figures that SR-HetGNN achieves better results in both Recall@40 and Recall@50. The reason is that SR-HetGNN converts the session sequence into a heterogeneous graph containing item nodes, user nodes, and session nodes. The heterogeneous graph contains complex dependencies, which capture the transitions between items and the connections between items and users. At the same time, HetGNN learns item embeddings that contain not only the complex transition relationships of items but also the information of users and sessions, which makes the generated session embeddings more expressive.

Two of the baselines adopted in this paper, BPR-MF and FPMC, are traditional recommendation methods. BPR-MF uses Matrix Factorization (MF) to learn user preferences, but MF does not handle sequential item relationships well, so its results are poor. FPMC combines Matrix Factorization with a Markov Chain (MC): MF learns user preferences, and MC models the user’s sequential behavior. However, this method has difficulty learning the transfer of user preferences, that is, the user’s short-term preferences. On the datasets used in this paper, FPMC performs better than BPR-MF, mainly because of the user sequence behavior modeled by the Markov Chain.

Two neural network methods, GRU4Rec and SR-GNN, are also used for comparison. The core of GRU4Rec is the GRU, a variant of the Recurrent Neural Network (RNN). An RNN is sensitive to sequential data and can learn the sequence patterns of user behaviors to produce better recommendations. However, an RNN-based recommendation algorithm can only model the one-way transitions between consecutive items and cannot consider the dependencies between distant items, so it ignores some information in the session sequence. SR-GNN constructs a homogeneous graph from the items in the session sequence and uses a Graph Neural Network to learn item embeddings. The Graph Neural Network can learn the complex transition relationships between items. It not only extracts the relationships between different items in the same session but also captures the relationships between items in different sessions, which makes the learned item embeddings more expressive. The experimental results in Fig. 9 and Fig. 10 show that SR-GNN performs better than GRU4Rec, because the Graph Neural Network learns the complex transitions between items through the constructed graph. However, SR-GNN is still limited: its homogeneous graph is built only from the items in the session sequence, so it captures the transitions between items but ignores other information in the session sequence, especially user information.

In general, SR-HetGNN can learn both the user’s long-term and short-term preferences, whereas the matrix factorization methods can only learn the user’s long-term preferences. In addition, the Heterogeneous Graph Neural Network learns the complex dependencies between nodes in the heterogeneous graph, and its ability to express sequential behavior is better than that of the Markov Chain. At the same time, the heterogeneous graph constructed by SR-HetGNN can express more information than the homogeneous graph built by SR-GNN. As a result, SR-HetGNN performs better.

V-F Ablation Experiment

To further verify the effectiveness of different components of SR-HetGNN, this paper compares SR-HetGNN with the following methods:

  • The nodes of session type are ignored when building the heterogeneous graph (SR-HetGNN-S).

  • The pre-embedding vectors generated by the DeepWalk algorithm are fed directly into the attention layer. That is, the heterogeneous graph neural network is not used (SR-HetGNN-Het).

The experimental results are shown in Table V. SR-HetGNN achieves the best results, which indicates that both components have a positive impact on the recommendation results. Using session-type nodes enables the recommender system to capture the interdependence of items across different sessions. At the same time, the heterogeneous graph neural network aggregates the information of neighbor nodes, which improves the performance of the model.

Model            Diginetica              Tmall
                 Recall@40   Recall@50   Recall@40   Recall@50
SR-HetGNN-Het    59.16       61.72       25.91       27.38
SR-HetGNN-S      59.51       62.28       26.10       27.51
SR-HetGNN        60.87       64.24       26.82       28.45
TABLE V: Comparison of ablation experiments

VI Conclusions

In this paper, the Heterogeneous Graph Neural Network (HetGNN) is applied to session-based recommendation. First, the session sequence is converted into a heterogeneous graph (HetG) containing multiple types of nodes. Then, HetGNN is used to learn item embeddings that contain the complex transitions of items and user information. Finally, the attentional network is used to generate highly expressive session embeddings. SR-HetGNN not only captures the complex transition relationships between items but also extracts user information from session sequences, so that the learned session embeddings can express users’ specific preferences. Experiments on the Diginetica and Tmall datasets show that this method outperforms the compared baselines.

This paper uses two e-commerce datasets, and we hope that SR-HetGNN can be applied to other types of datasets. Due to the limited fields in the datasets, only three types of nodes are considered when constructing the heterogeneous graph. In the future, we will work on real-world datasets with more types of nodes, so that the learned session embeddings are more expressive. In addition, SR-HetGNN does not optimize the ranking of the recommended results. In future work, we will focus on adding ranking optimization to the model.


  • [1] J. M. Barajas and X. Li (2005) Collaborative filtering on data streams. See DBLP:conf/pkdd/2005, pp. 429–436. Cited by: §II-A.
  • [2] P. Cui, X. Wang, J. Pei, and W. Zhu ieee. Cited by: §II-C.
  • [3] Y. Dong, N. V. Chawla, and A. Swami (2017) Metapath2vec: scalable representation learning for heterogeneous networks. See DBLP:conf/kdd/2017, pp. 135–144. Cited by: §II-C.
  • [4] P. Goyal and E. Ferrara Graph embedding techniques, applications, and performance: A survey. knowl.. Cited by: §II-B.
  • [5] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. See DBLP:conf/kdd/2016, pp. 855–864. Cited by: §II-C.
  • [6] C. Guo, M. Zhang, J. Fang, J. Jin, and M. Pan (2020) Session-based recommendation with hierarchical leaping networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1705–1708. Cited by: §II-B.
  • [7] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk (2015) Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939. Cited by: §II-B, 3rd item.
  • [8] B. Hu, C. Shi, W. X. Zhao, and P. S. Yu (2018) Leveraging meta-path based context for top-N recommendation with a neural co-attention model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1531–1540. Cited by: §II-C.
  • [9] Y. Koren and R. M. Bell (2015) Advances in collaborative filtering. See DBLP:reference/sp/2015rsh, pp. 77–118. Cited by: §II-A.
  • [10] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma (2017) Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1419–1428. Cited by: §II-B.
  • [11] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang (2018) STAMP: short-term attention/memory priority model for session-based recommendation. See DBLP:conf/kdd/2018, pp. 1831–1839. Cited by: §II-B.
  • [12] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. See DBLP:conf/kdd/2014, pp. 701–710. Cited by: §II-C, §IV-A.
  • [13] R. Qiu, Z. Huang, J. Li, and H. Yin Exploiting cross-session information for session-based recommendation with graph neural networks. ACM. Cited by: §II-B.
  • [14] X. Ren, J. Liu, X. Yu, U. Khandelwal, Q. Gu, L. Wang, and J. Han (2014) ClusCite: effective citation recommendation by information network-based clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 821–830. Cited by: §II-C.
  • [15] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidtthieme BPR: bayesian personalized ranking from implicit feedback. arxiv. Cited by: §II-A, 1st item.
  • [16] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme (2010) Factorizing personalized markov chains for next-basket recommendation. See DBLP:conf/www/2010, pp. 811–820. Cited by: §II-A, 2nd item.
  • [17] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu Pathsim: meta path-based top-k similarity search in heterogeneous information networks. proceedings. Cited by: §I.
  • [18] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu PathSelClus: integrating meta-path selection with user-guided object clustering in heterogeneous information networks. ACM. Cited by: §II-C.
  • [19] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) LINE: large-scale information network embedding. See DBLP:conf/www/2015, pp. 1067–1077. Cited by: §II-C.
  • [20] S. Wang, L. Cao, and Y. Wang A survey on session-based recommender systems. arxiv. Cited by: §I.
  • [21] Z. Wang, W. Wei, G. Cong, X. Li, X. Mao, and M. Qiu (2020) Global context enhanced graph neural networks for session-based recommendation. See DBLP:conf/sigir/2020, pp. 169–178. Cited by: §II-B.
  • [22] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019) Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 346–353. Cited by: §I, 4th item.
  • [23] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu ieee. Cited by: §II-B.
  • [24] D. Yang, Z. Wang, J. Jiang, and Y. Xiao (2019) Knowledge embedding towards the recommendation with sparse user-item interactions. In ASONAM ’19: International Conference on Advances in Social Networks Analysis and Mining, Vancouver, British Columbia, Canada, 27-30 August, 2019, pp. 325–332. Cited by: §II-C.
  • [25] F. Yu, Y. Zhu, Q. Liu, S. Wu, L. Wang, and T. Tan TAGNN: target attentive graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, pp. 1921–1924. Cited by: §II-B.
  • [26] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla (2019) Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 793–803. Cited by: §I, §IV-B1.
  • [27] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: A review of methods and applications. corr abs/1812.08434. External Links: 1812.08434 Cited by: §II-B.