Introduction
Sequential recommendation (SR) is a vital task in recommender systems. SR aims to predict the successive items a user is likely to interact with by modeling sequential and transitional correlations [18, 5, 6, 23, 8, 12].
Modeling the item correlation within a user's sequence of interacted items or across different users' sequences lies at the core of modern SR [6, 23, 8, 12]; we refer to these as intra-sequence and inter-sequence item correlation, respectively. Figure 1 shows an intuitive example from MovieLens. Concretely, we term intra-sequence item correlation the sequential dependence of two items within a single user's sequence of interacted items. For example, movies 1721 and 1682 are intra-correlated (with order 1) because they are sequentially correlated in User 8's behavior sequence. We term inter-sequence item correlation the occurrence of two items in different users' sequences such that a path with some intermediate nodes exists between them. For example, movies 1721 and 1784 are inter-correlated (with order 4) because they appear in User 8's and User 201's sequences and there exists a path between them. In fact, we observe in the log data (https://grouplens.org/datasets/movielens/1m/) that user 90 actually watched movies 1682 and 1784 after watching movie 1721. This observation verifies that both intra- and inter-sequence item correlations are informative for SR.
However, existing works for SR mainly focus on modeling intra-sequence correlation and neglect the effect of inter-sequence correlation. The authors in [6, 15, 7] apply recurrent neural networks (RNNs), which aggregate the sequence of a user's interacted items to capture the sequential correlation among them. Different from RNN-based approaches, [23, 31, 29] treat the embedding matrix of the items in a sequence as an image and apply convolutional neural networks (CNNs) to model the sequential correlation. However, these RNN-based and CNN-based approaches do not model the different impacts that items consumed at different time steps have on the current decision. Hence, the authors in [8, 21, 32] adopt attention networks to differentiate and learn the contribution of each individual item in a sequence when modeling the user interest for next-item prediction. To obtain accurate item embeddings and take complex item transitions into account, Wu et al. [27] propose SR-GNN for session-based recommendation. In addition, gated networks [12] and neural variational models [28] have also been utilized in SR. However, the above methods mainly focus on modeling the intra-sequence item correlation within each individual sequence, while the inter-sequence item correlation across different sequences is neglected. Though the intra-sequence item correlation is vital, we argue that explicitly modeling the inter-sequence item correlation is also critical, as it not only captures users' general tastes but also helps remedy the data-scarcity issue in SR [5].

To the best of our knowledge, only a few existing studies improve recommendation quality by considering both intra- and inter-sequence item correlations in SR [18, 5, 24]. FPMC [18] applies a first-order Markov chain (MC) to model users' sequential behavior and utilizes matrix factorization (MF) to learn the inter-sequence item correlation. Later, Fossil [5] was proposed to address the data-scarcity problem by utilizing high-order Markov chains and similarity models. Recently, Wang et al. [24] propose the Collaborative Session-based Recommendation Machine (CSRM), which considers the inter-sequence correlation between different sessions. However, the inter-sequence information is not fully exploited in these works: (i) for FPMC and Fossil, the order of the inter-sequence item correlation is limited by the underlying latent model, i.e., MF and the similarity model, respectively; (ii) CSRM considers the inter-sequence correlation only implicitly, via a simple session-level nearest-neighbor approach.

Therefore, to make better use of the inter-sequence item correlation, we propose the Inter-Sequence enhanced framework for personalized Sequential Recommendation (ISSR), where both intra- and inter-sequence item correlations are considered and encoded in two separate modules. Figure 2 shows the workflow of the framework. In particular, the inter-sequence item correlation is depicted with graphs, and graph neural networks are used to propagate information along the paths between any two items. We choose graph neural networks for their ability to model high-order item correlations and for their promising performance in information propagation [26].
In summary, the main contributions are as follows:

We propose the Inter-Sequence enhanced framework for personalized Sequential Recommendation (ISSR), which integrates both intra-sequence and inter-sequence item correlations.

In the inter-sequence item correlation encoder, graph neural networks are applied to encode high-order inter-sequence item correlations, gathering information from different sequences in an explicit manner.

We conduct experimental studies on three large-scale real-world datasets. Extensive experimental results show that (1) ISSR outperforms many state-of-the-art models; (2) the performance of classic intra-sequence-based models can be boosted significantly by adding our proposed inter-sequence item correlation encoder; and (3) applying graph neural networks to model inter-sequence correlation performs better than low-order methods, e.g., those based on matrix factorization (MF).
Related Work
In this section, we review both the conventional and the deep learning-based methods for sequential recommendation.
Conventional methods. Two categories of conventional methods can be applied to sequential recommendation. The first category, such as matrix factorization [10] and k-nearest-neighbor [4] methods, relies on computing user-item or item-item similarities for recommendation. However, this line of work ignores the sequential patterns in users' behavior. In the second category, Shani et al. [20] model item-item transitions in sequences with first-order Markov chains to capture the sequential patterns, and the authors in [18] consider both user-item similarities and first-order item-item transitions for sequential recommendation (FPMC). To better capture users' general interests and sequential patterns, Wang et al. [25] extend FPMC with a hierarchical structure that learns user representations. Moreover, He et al. [5] improve FPMC by utilizing high-order Markov chains to address the sparsity problem in sequential recommendation. However, the above MC-based methods only model the intra-sequence correlation between adjacent interactions.
Deep learning-based methods. Recently, benefiting from their powerful feature-representation ability, deep learning-based methods have become increasingly popular in sequential recommendation. Recurrent neural networks (RNNs) are the most popular technique [6, 7, 16] due to their inherent ability to model sequential dynamics. The authors in [6] utilize gated recurrent units (GRUs) to model the sequential dynamics for session-based recommendation and use the session-parallel mini-batch technique to train the model. An improved version is proposed in [7], featuring a novel ranking loss function and an efficient sampling strategy. In addition to RNN-based methods, convolutional neural networks (CNNs) are also adopted for sequential recommendation [23, 31, 29]. In [23], the researchers embed the recently engaged items into an "image" in the latent space and then employ different convolutional kernels to extract sequential patterns. The authors in [31] improve on [23] by applying dilated convolutional layers and residual blocks, which boosts performance especially for long sequences. Moreover, Xu et al. [29] combine RNNs and CNNs to learn users' long- and short-term interests, where the hidden states of the RNN layer are the input of the CNN layer. However, such RNN- and CNN-based methods encode user interactions into hidden states or latent factors without considering the different impacts that items consumed at different time steps have on the current decision. Therefore, attention-based models, which exhibit promising performance in sequence learning, are also utilized in sequential recommendation [8, 21]. In addition, the authors in [12] propose gated networks for sequential recommendation, where a feature gating layer and an instance gating layer select which item features are passed to the downstream layers at the feature and instance levels, respectively. Recently, graph neural networks (GNNs) have gained great attention in the recommendation community [14, 22, 3, 33, 27]; the authors in [27] first employ GNNs for session-based recommendation. Their model handles the item sequence of each session separately, however, so its ability to capture cross-sequence correlations among items is limited. In other lines of work, transfer learning [13] and variational models [28] are also utilized for sequential recommendation.

As we can see, the majority of the conventional and deep learning-based methods for sequential recommendation focus on the intra-sequence correlation within each individual sequence but neglect the inter-sequence correlation across different sequences. Different from the above methods, our proposed ISSR coordinates intra-sequence and inter-sequence correlation modeling in sequential recommendation.
Problem Formulation
Let $\mathcal{U}$ and $\mathcal{I}$ denote the set of users and the set of items, respectively. Given the sequences of interacted items from all users, the goal of sequential recommendation is to recommend a list of items from $\mathcal{I}$ to each user $u \in \mathcal{U}$ such that the user is most likely to interact with the recommended items. For a specific user, our framework outputs the probabilities for all candidate items, which represent how likely she will engage with each item based on her sequence of engaged items.
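To make the setup concrete, here is a minimal sketch of the task's input and output; the data values and function names are purely illustrative, not part of the framework.

```python
# Each user maps to the chronologically ordered list of item IDs she interacted with.
user_sequences = {
    8:   [1721, 1682, 2340, 1784],   # e.g., a user's interaction sequence
    201: [1721, 2762, 1784],
}

def top_k(scores, k=10):
    """Rank candidate items by their predicted interaction probability."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_k({1682: 0.41, 1784: 0.33, 2340: 0.26}, k=2))  # -> [1682, 1784]
```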
Methodology
As shown in Figure 2, the proposed framework ISSR consists of an inter-sequence item correlation encoder, an intra-sequence item correlation encoder, and a prediction decoder. First, we model the inter-sequence item correlation with two graphs, the user-item bipartite graph [26] and the item-item co-occurrence graph [11]. Based on these graphs, high-order inter-sequence item correlations can be captured by stacking multiple GNN layers. Although the bipartite graph and the co-occurrence graph have been utilized in recommender systems before, we are the first to combine the two graphs to exploit high-order inter-sequence item correlations in the sequential recommendation scenario. Next, an intra-sequence item correlation encoder is developed, which fuses the inter-sequence item correlation information with the intra-sequence sequential correlations and temporal dynamics. We then integrate the item representations, which comprehensively capture both the inter- and intra-sequence item correlations, to generate the representation of the user's current interest. Finally, in the prediction decoder, the user's preference on different items is computed based on the user's interest representation. Each part is elaborated below.
Inter-Sequence Item Correlation Encoder
To obtain informative inter-sequence item correlations, we propose an item correlation encoder, which is specified in Figure 3. As Figure 3 shows, ISSR exploits the item correlation from both the user-item bipartite graph and the item-item co-occurrence graph. ISSR also includes a residual connection to preserve the original item representations. Finally, we generate the integrated item representation by fusing the three types of item information.
Inter-Sequence Item Correlation from the User-Item Bipartite Graph
The user-item bipartite graph contains two types of nodes, user nodes and item nodes. An edge exists between a user and an item if the user has interacted with the item. For clarity, the adjacent nodes of a target node in the graph are defined as the 1-hop neighbors of the target node; in particular, for each node $v$ in the graph, $\mathcal{N}(v)$ denotes the set of 1-hop neighbors of $v$. High-order inter-sequence item correlations can thus be captured via multiple hops on the graph through the user nodes. For ease of exposition, a path connecting multiple item nodes and user nodes is highlighted in the bipartite graph (top left of Figure 3), where the correlation between the item nodes at the start and the end of the path can be captured (note that the arrows only highlight the path; both graphs are undirected). We apply a graph convolutional network [30] on the user-item bipartite graph (denoted as GCN-UI) to aggregate neighborhood information. As a result, item correlations from multi-hop neighbors can be captured by stacking multiple GCN-UI layers.
We denote the initial embedding of an item node $i$ as $\mathbf{e}_i \in \mathbb{R}^d$ with dimension $d$ (or $\mathbf{e}_u$ for a user node $u$), and the hidden representation of $i$ at layer $l$ as $\mathbf{h}_i^{(l)}$ (or $\mathbf{h}_u^{(l)}$ for a user node $u$). In GCN-UI, a node embedding depends on both the node's own information and the graph structure around it. We first aggregate the neighborhood information of the target node (as shown in Eq. (1)) and then integrate the aggregated neighborhood information with the target node (as shown in Eq. (2)). Specifically, we represent the neighborhood of an item node $i$ at layer $l$ (or a user node $u$ at layer $l$) by applying an aggregation function to all its neighbors at layer $(l-1)$:

$$\mathbf{h}_{\mathcal{N}(i)}^{(l)} = \sigma\Big(\mathbf{W}_I^{(l)} \cdot \mathrm{pool}\big(\{\mathbf{h}_u^{(l-1)} : u \in \mathcal{N}(i)\}\big) + \mathbf{b}_I^{(l)}\Big) \qquad (1)$$

$\mathbf{h}_{\mathcal{N}(i)}^{(l)}$ (or $\mathbf{h}_{\mathcal{N}(u)}^{(l)}$) is the representation of the neighborhood of item node $i$ (or user node $u$) at layer $l$. $\mathbf{W}_I^{(l)}$ and $\mathbf{b}_I^{(l)}$ (or $\mathbf{W}_U^{(l)}$ and $\mathbf{b}_U^{(l)}$) are the weight matrix and bias of the item (or user) aggregator at layer $l$, respectively. $\mathrm{pool}(\cdot)$ is a pooling function (such as weighted sum or weighted average) that aggregates the neighbor representations, and $\sigma$ is an activation function.
Note that different neural networks are utilized to transform the representations of user nodes and item nodes from lower layers to higher layers, which differs from existing GCN works [30, 26], where a unified network is normally utilized. The reason is that user nodes and item nodes in bipartite graphs are intrinsically different, and this difference should be accounted for when aggregating information from user nodes or from item nodes.
After generating the representation of the target node's neighborhood, we integrate the neighborhood representation with the current representation of the target node:

$$\mathbf{h}_i^{(l)} = \sigma\Big(\mathbf{V}_I^{(l)} \cdot \big[\mathbf{h}_i^{(l-1)} \,\|\, \mathbf{h}_{\mathcal{N}(i)}^{(l)}\big] + \mathbf{c}_I^{(l)}\Big) \qquad (2)$$

$\mathbf{V}_I^{(l)}$ and $\mathbf{c}_I^{(l)}$ (or $\mathbf{V}_U^{(l)}$ and $\mathbf{c}_U^{(l)}$) are the transformation weight matrix and bias of the item (or user) nodes at layer $l$, respectively. $\|$ represents concatenation.
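To illustrate the two-step scheme of Eqs. (1)-(2), the following sketch implements one GCN-UI layer, assuming mean pooling as the aggregator, tanh as the activation, and dictionary-based adjacency; all variable names are illustrative and this is not the paper's implementation.

```python
import numpy as np

def gcn_ui_layer(h_items, h_users, item_nbrs, user_nbrs,
                 W_I, b_I, W_U, b_U,      # aggregator parameters, Eq. (1)
                 V_I, c_I, V_U, c_U,      # transformation parameters, Eq. (2)
                 act=np.tanh):
    """One GCN-UI layer on the user-item bipartite graph (a sketch).
    In a bipartite graph an item's neighbors are users and vice versa,
    and item and user nodes use separate weights, as described above."""
    new_items, new_users = {}, {}
    for i, nbrs in item_nbrs.items():
        # Eq. (1): pool the neighboring users' previous-layer representations
        n_i = act(W_I @ np.mean([h_users[u] for u in nbrs], axis=0) + b_I)
        # Eq. (2): concatenate with the item's own representation and transform
        new_items[i] = act(V_I @ np.concatenate([h_items[i], n_i]) + c_I)
    for u, nbrs in user_nbrs.items():
        n_u = act(W_U @ np.mean([h_items[i] for i in nbrs], axis=0) + b_U)
        new_users[u] = act(V_U @ np.concatenate([h_users[u], n_u]) + c_U)
    return new_items, new_users
```

Stacking this layer k times mixes information from k-hop neighborhoods, which is how the multi-hop inter-sequence correlation is captured.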
Inter-Sequence Item Correlation from the Item-Item Co-occurrence Graph
GCN-UI captures item correlations through multi-hop neighbors in the user-item bipartite graph. In addition, item correlations can be modeled from an item-item graph. Though multiple methods exist to construct such a graph, we build it from the item co-occurrence information in users' behavior sequences following [11]; this graph can be treated as a complement to the user-item bipartite graph, directly modeling the frequency of item dependence.
As shown in Figure 3, the co-occurrence graph includes a set of nodes, each representing an item. An edge connects two nodes if the two items are adjacent in a certain user's behavior sequence. The weight of an edge counts the number of times the two items co-occur in users' behavior sequences, and this weight is utilized for neighborhood sampling, which we specify later. We apply a graph convolutional network on this item-item co-occurrence graph and denote it as GCN-II. GCN-II differs from GCN-UI in two aspects. First, the item-item co-occurrence graph includes only one type of node, so GCN-II has neither aggregator weight matrices nor transformation weight matrices for user nodes. Second, the item-item co-occurrence graph associates weights with its edges, which leads to a different neighborhood-sampling strategy, as discussed in the 'Network Training' section. Due to space limits, we do not elaborate the details of GCN-II. We denote the embedding of an item node $i$ learned from the item-item co-occurrence graph as $\mathbf{h}_i^{co}$.
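As an illustration, a minimal sketch of how such a weighted co-occurrence graph can be built from the behavior sequences (names are illustrative):

```python
from collections import defaultdict

def build_cooccurrence_graph(user_sequences):
    """Build the item-item co-occurrence graph: an undirected edge links two
    items that are adjacent in some user's sequence; its weight counts how
    often that adjacency occurs across all users (a sketch)."""
    weight = defaultdict(int)
    for seq in user_sequences.values():
        for a, b in zip(seq, seq[1:]):              # adjacent item pairs
            if a != b:
                weight[(min(a, b), max(a, b))] += 1  # undirected edge key
    return weight                                   # {(item_a, item_b): count}
```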
In summary, the user-item bipartite graph and the item-item co-occurrence graph are designed to capture the inter-sequence item correlation, and they play complementary roles.
Residual Connection
Inspired by [8], we introduce a residual connection component in the inter-sequence item correlation encoder, which preserves the original item representations: $\tilde{\mathbf{e}}_i = \sigma(\mathbf{W}_r \mathbf{e}_i + \mathbf{b}_r)$, where $\mathbf{e}_i$ is the original embedding of item $i$. The residual connection transforms the original embedding into $\tilde{\mathbf{e}}_i$ via a hidden layer with weight matrix $\mathbf{W}_r$ and bias $\mathbf{b}_r$.
Information Fusion
As presented in Figure 3, the item representations learned from the two graphs and the residual connection are fused through an information fusion function $f(\cdot)$. More formally, the final representation of an item $i$ is generated as $\mathbf{e}_i^{*} = f(\mathbf{h}_i^{bi}, \mathbf{h}_i^{co}, \tilde{\mathbf{e}}_i)$, where $\mathbf{h}_i^{bi}$ denotes the item's representation from GCN-UI. The choice of $f$ varies across application scenarios; in this paper, we empirically choose element-wise sum as the fusion function (sum pooling in Figure 3) due to its superior performance compared to other operations such as concatenation, element-wise mean, and gated networks [2].
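Putting the residual connection and the fusion together, a one-item sketch (the parameter names W_r and b_r are illustrative):

```python
import numpy as np

def fused_item_representation(e_orig, h_bi, h_co, W_r, b_r, act=np.tanh):
    """Final inter-sequence representation of one item (a sketch):
    h_bi and h_co are the outputs of GCN-UI and GCN-II for this item,
    and the residual branch transforms the original embedding e_orig."""
    e_res = act(W_r @ e_orig + b_r)   # residual connection (one hidden layer)
    return h_bi + h_co + e_res        # element-wise sum, i.e., sum pooling
```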
Intra-Sequence Item Correlation Encoder
In the intra-sequence item correlation encoder, ISSR aims to model the intra-sequence item correlation while taking into account the inter-sequence item correlation captured by the inter-sequence encoder, and finally generates the representation of the user's current interest over the candidate items. As presented in Figure 2, a GRU layer captures the intra-sequence sequential correlation among the items in a sequence, where the hidden states of the GRU layer represent the user's interests at different time steps. An attention network then aggregates the user's interests at different time steps, generating the final representation of the user's current interest.
The GRU Layer
The Attention Layer
The user's interests at different time steps contribute differently to the user's current decision. Moreover, different users have different levels of sensitivity to temporal dynamics. Motivated by this, we devise a personalized attention network to capture the evolving interests of each particular user.

Specifically, the inputs of the attention network are the sequence of hidden states generated by the GRU layer, i.e., $\{\mathbf{h}_1, \dots, \mathbf{h}_T\}$, and the embedding of the user, $\mathbf{e}_u$. With multilayer perceptrons, the attention network generates a weight for each input hidden state, representing the contribution of the user's interest at that time step (represented by the hidden vector) to the user's final decision:
$$\alpha_t = \mathbf{w}_2^{\top}\,\sigma\big(\mathbf{W}_1 [\mathbf{h}_t \,\|\, \mathbf{e}_u] + \mathbf{b}_1\big) + b_2 \qquad (3)$$

As shown in Eq. (3), $\alpha_t$ is the weight of hidden state $\mathbf{h}_t$ on the final decision of user $u$, and $\mathbf{W}_1$, $\mathbf{b}_1$, $\mathbf{w}_2$, $b_2$ are the parameters of the multilayer perceptron. For simplicity, we only present two layers of parameters in Eq. (3), which is also the case in our implementation. The user's interest, considering both the intra-sequence item sequential correlation and the temporal dynamics, is computed as:

$$\mathbf{z}_u = \sum_{t=1}^{T} \alpha_t \mathbf{h}_t \qquad (4)$$

As shown in [27], the latest engaged item reflects the user's most recent interest. Therefore, we also integrate the latest hidden state $\mathbf{h}_T$ into the user's final interest representation.
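A compact sketch of the attention layer follows. The softmax normalization of the weights and the way the latest hidden state enters the final representation (a plain addition here) are assumptions; several variants are plausible.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def current_interest(H, e_u, W1, b1, w2, b2):
    """Personalized attention over GRU hidden states (a sketch of Eqs. (3)-(4)).
    H is a (T, d) array of hidden states; e_u is the user embedding."""
    scores = np.array([w2 @ np.tanh(W1 @ np.concatenate([h_t, e_u]) + b1) + b2
                       for h_t in H])  # two-layer MLP score per time step, Eq. (3)
    alpha = softmax(scores)            # normalized contribution weights
    z_u = alpha @ H                    # weighted sum of hidden states, Eq. (4)
    return z_u + H[-1]                 # integrate the latest hidden state (assumed form)
```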
Prediction Decoder
After obtaining the representation of the user's current interest, we adopt the classic matrix factorization approach to infer the user's preference on the items. The prediction score of user $u$ on item $i$ is the inner product of the user's interest representation $\mathbf{z}_u$ and the item embedding $\mathbf{e}_i^{*}$. The probability that the user will interact with the item is defined as the softmax of the prediction score over all candidate items: $\hat{y}_{u,i} = \mathrm{softmax}(\mathbf{z}_u^{\top}\mathbf{e}_i^{*})$.
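A minimal sketch of the decoder (names are illustrative):

```python
import numpy as np

def predict_probabilities(z_u, item_embs):
    """Inner-product scores of the user interest z_u against all candidate
    item embeddings, turned into probabilities with a softmax (a sketch)."""
    scores = item_embs @ z_u               # one inner product per item
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    return exp / exp.sum()
```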
Network Training
Loss Function
To generate the training data, we extract $L$ consecutive items in a sequence as the user's behavior sequence and the following $T$ items as positive samples. We also sample a number of items that the user did not interact with as negative samples, following [8]. We adopt cross entropy as the training loss to describe the discrepancy between the predicted probabilities and the ground-truth labels, as shown in Eq. (5):

$$\mathcal{L} = -\frac{1}{N}\sum_{j=1}^{N} \Big[ y_j \log \hat{y}_j + (1 - y_j)\log(1 - \hat{y}_j) \Big] \qquad (5)$$

where $N$ is the number of training instances, and $y_j = 1$ if the predicted item is engaged by the user; otherwise, $y_j = 0$.
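The following sketch mirrors this procedure: sliding-window extraction of training instances and the cross-entropy loss of Eq. (5). The values of L and T follow the settings used later in the experiments; negative sampling is omitted for brevity.

```python
import numpy as np

def make_training_instances(seq, L=5, T=3):
    """Extract L consecutive items as the behavior sequence and the following
    T items as positive targets (a sketch)."""
    return [(seq[i:i + L], seq[i + L:i + L + T])
            for i in range(len(seq) - L - T + 1)]

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Eq. (5): binary cross entropy averaged over the N training instances,
    with y_true[j] = 1 for engaged items and 0 for sampled negatives."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # numerical stability
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```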
Neighborhood Sampling
We adopt neighborhood-sampling techniques to facilitate network training. Specifically, for GCN-UI on the bipartite graph, we sample a fixed number of neighbors for each node uniformly at random. For GCN-II on the co-occurrence graph, we apply importance sampling, drawing neighbors for each node according to the weights of the edges.
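A sketch of the two sampling strategies (sampling with replacement is an implementation assumption here):

```python
import random

def sample_uniform(neighbors, k):
    """Uniform neighborhood sampling for GCN-UI on the bipartite graph."""
    return random.choices(neighbors, k=k) if neighbors else []

def sample_by_weight(neighbors, weights, k):
    """Importance sampling for GCN-II: neighbors are drawn with probability
    proportional to the co-occurrence edge weights."""
    return random.choices(neighbors, weights=weights, k=k) if neighbors else []
```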
Experiments
In this section, we compare the proposed framework with state-of-the-art methods on three real-world datasets. We also comprehensively analyze the results of the proposed framework under different experimental settings.
Datasets
Experiments are conducted on three public benchmark datasets that differ in scale and sparsity: MovieLens (1M) (abbreviated as ML (1M)), Steam [8], and MovieLens (20M) (abbreviated as ML (20M)). We process the three datasets following existing research [23], in which all ratings are treated as implicit feedback, and the items in each user's sequence are sorted in chronological order. We hold out the first 70%, the following 10%, and the last 20% of items in each user's sequence as the training, validation, and test sets, respectively. (Note that this data-partition strategy differs from the original SASRec [8], which treats the last item of each user's sequence as the test set, the second-to-last item as the validation set, and all previous items as the training set; their training set is thus much larger than ours while their test set is much smaller. Moreover, in HGN the authors treat all ratings below four as negative samples and then filter the noisy data, whereas in Caser all ratings are treated as implicit feedback; therefore, the ML (20M) dataset in HGN is only about half the size of the one used in this paper.) The statistics of the three processed datasets are summarized in Table 1.
Datasets  #users  #items  avg.#items/user  sparsity  #interactions 

ML (1M)  6,040  3,416  165.50  95.16%  999,611 
Steam  334,730  13,047  11.01  99.92%  3,686,172 
ML (20M)  138,493  15,451  144.16  99.07%  19,964,833 
Experimental Settings
Evaluation Metrics
Following [23], we adopt Recall@K, nDCG@K, HR@K, and MRR@K with K ∈ {5, 10} to evaluate the effectiveness of the different methods.
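For reference, minimal implementations of the four metrics under common conventions (binary relevance is assumed; the exact definitions in [23] may differ in detail):

```python
import numpy as np

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's test items appearing in the top-k list."""
    return len(set(ranked[:k]) & set(relevant)) / max(len(relevant), 1)

def hr_at_k(ranked, relevant, k):
    """Hit rate: 1 if any test item appears in the top-k list, else 0."""
    return float(bool(set(ranked[:k]) & set(relevant)))

def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance."""
    dcg = sum(1.0 / np.log2(i + 2) for i, it in enumerate(ranked[:k]) if it in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first hit within the top-k, else 0."""
    for i, it in enumerate(ranked[:k]):
        if it in relevant:
            return 1.0 / (i + 1)
    return 0.0
```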
Compared Methods
To demonstrate the superiority of the proposed ISSR, we compare it with several state-of-the-art baselines. For clarity, we group them into two categories according to whether they model the intra- and inter-sequence item correlations: (1) intra-sequence-only methods: GRU4Rec [6], Caser [23], SASRec [8], SR-GNN [27], and HGN [12]; (2) methods modeling both intra- and inter-sequence correlations: FPMC [18], Fossil [5], and CSRM [24].
Methods such as the popularity-based baseline [1] and BPR-MF [17], which we treat as inter-sequence-only methods, are omitted from the comparison because they have been shown to be inferior to the compared methods, lacking the ability to model intra-sequence sequential correlations [23, 12].
We implement FPMC, Fossil, and ISSR using TensorFlow with the Adam optimizer [9]. We utilize the source code of GRU4Rec, Caser, SASRec, SR-GNN, CSRM, and HGN to reproduce their performance.

Dataset  Model  Recall@5  Recall@10  nDCG@5  nDCG@10  HR@5  HR@10  MRR@5  MRR@10

ML (1M)  Intra  GRU4Rec  0.0707  0.1232  0.4661  0.5012  0.1558  0.2507  0.0792  0.0918 
Caser  0.0764  0.1326  0.4783  0.5176  0.1613  0.2624  0.0826  0.0961  
SASRec  0.0812  0.1320  0.4778  0.5156  0.1726  0.2773  0.0891  0.1018  
SRGNN  0.0834  0.1385  0.4859  0.5234  0.1854  0.2906  0.1034  0.1152  
HGN  0.0816  0.1378  0.4973  0.5295  0.1836  0.2945  0.1012  0.1154  
Intra & Inter  FPMC  0.0653  0.1095  0.4183  0.4632  0.1567  0.2554  0.0798  0.0904
  Fossil  0.0701  0.1213  0.4555  0.4925  0.1611  0.2501  0.0803  0.0912
  CSRM  0.0815  0.1381  0.4901  0.5238  0.1768  0.2836  0.0965  0.1107
ISSR  
Improv.  11.99%  12.85%  7.66%  6.36%  16.34%  12.67%  14.70%  15.86%  
Steam  Intra  GRU4Rec  0.0381  0.0714  0.0485  0.0702  0.0404  0.0755  0.0134  0.0181 
Caser  0.0408  0.0744  0.0507  0.0735  0.0459  0.0825  0.0158  0.0206  
SASRec  0.0403  0.0737  0.0502  0.0726  0.0447  0.0821  0.0151  0.0199  
SRGNN  0.0415  0.0759  0.0548  0.0782  0.0472  0.0837  0.0181  0.0232  
HGN  0.0437  0.0767  0.0588  0.0803  0.0495  0.0851  0.0200  0.0247  
Intra & Inter  FPMC  0.0354  0.0612  0.0491  0.0663  0.0380  0.0658  0.0174  0.0210
  Fossil  0.0408  0.0745  0.0504  0.0721  0.0455  0.0800  0.0171  0.0232
  CSRM  0.0427  0.0747  0.0573  0.0773  0.0453  0.0825  0.0197  0.0239
ISSR  
Improv.  12.59%  13.69%  9.52%  10.09%  11.31%  14.22%  11.94%  12.15%  
ML (20M)  Intra  GRU4Rec  0.0584  0.1031  0.3171  0.3612  0.0963  0.1630  0.0477  0.0564 
Caser  0.0644  0.1127  0.3371  0.3829  0.1045  0.1750  0.0530  0.0623  
SASRec  0.0624  0.1102  0.3341  0.3794  0.1177  0.1940  0.0609  0.0756  
SRGNN  0.0752  0.1271  0.3643  0.4060  0.1349  0.2142  0.0710  0.0814  
HGN  0.0812  0.1350  0.3749  0.4135  0.1468  0.2288  0.0780  0.0888  
Intra & Inter  FPMC  0.0569  0.0982  0.3054  0.3459  0.0928  0.1515  0.0476  0.0548
  Fossil  0.0607  0.1030  0.3152  0.3681  0.0989  0.1589  0.0483  0.0562
  CSRM  0.0724  0.1259  0.3625  0.4054  0.1370  0.2102  0.0650  0.0759
ISSR  
Improv.  8.74%  10.00%  8.64%  7.67%  10.29%  12.28%  8.72%  9.57% 
Parameter Settings
The best hyperparameters for each model are found by exhaustive search on the validation set. In particular, for Caser, the numbers of vertical and horizontal filters are tuned on the validation set. For SASRec, the number of self-attention blocks is tuned likewise. For SR-GNN and HGN, we follow the best parameter settings in the original papers. For CSRM, the memory size is tuned per dataset. For our proposed framework ISSR, neighborhood information is aggregated from at most 3-hop neighbors for efficiency; the best performance is observed with 2-hop neighbors in GCN-UI and 1-hop neighbors in GCN-II. For fair comparison, we follow the settings in [23, 12] and set the sequence length to 5 (i.e., $L = 5$) and the number of subsequent items to 3 (i.e., $T = 3$) for all methods unless stated otherwise.
Overall Performance Comparison
Table 2 summarizes the overall performance of all compared models on the three datasets, where the underlined numbers are the best baseline results and the bold numbers are the best results overall. Statistically significant improvements over the best baseline are assessed with a two-sided t-test [19]. The row "Improv." indicates the relative improvement of ISSR over the best baseline. We have the following observations.

ISSR is superior to HGN, SR-GNN, SASRec, Caser, and GRU4Rec, for three possible reasons. First, besides capturing the intra-sequence item correlation with the GRU and attention layers, the two graphs constructed by ISSR capture inter-sequence item correlations across different users; in contrast, no matter how the session graph is constructed in SR-GNN or which methodology the other baselines adopt, they only consider the intra-sequence item correlation within each individual session. Second, the residual connections in ISSR preserve the original item representations, serving as a complement to the representations produced by the graph neural networks and making the item representations more informative. Third, the attention network in ISSR is personalized, reflecting the fact that users have different levels of sensitivity to temporal dynamics, which the baselines do not consider.
ISSR also outperforms FPMC, Fossil, and CSRM, for the following reasons: (1) ISSR captures high-order inter-sequence item correlations from the two graph neural networks with multiple hops, whereas FPMC and Fossil only capture low-order inter-sequence correlations with MF and a similarity-based model; in addition, both utilize Markov chains to model the intra-sequence sequential correlation, which is proven inferior to deep neural network-based models [6, 23, 8]; (2) CSRM captures inter-sequence correlations with a simple session-level nearest-neighbor approach instead of the fine-grained item-level correlations addressed in ISSR.
Another observation is that, on the sparse Steam dataset, Fossil achieves performance comparable to Caser and SASRec, and is even slightly better on some metrics such as Recall@10. This also indicates that explicitly modeling the inter-sequence item correlation can remedy the data-scarcity issue to some extent.
To summarize, ISSR shows consistent and significant improvements over the compared baselines on all datasets (dense or sparse, small or large scale) in terms of Recall, nDCG, HR, and MRR, which demonstrates the superiority of the proposed framework.
Effect of the Inter-Sequence Item Correlation Encoder for Sequential Recommendation
To verify the effectiveness of the proposed inter-sequence item correlation encoder, we incorporate it into existing RNN-based, CNN-based, and attention-based sequential recommendation methods, respectively. The results are reported in Table 3. We observe that these models are all enhanced after being equipped with the proposed inter-sequence item correlation encoder; for instance, we achieve 25.0%, 14.8%, and 9.0% improvements in terms of Recall@10 on the MovieLens (1M) dataset. This demonstrates that modeling the inter-sequence item correlation as in the proposed ISSR can boost the performance of existing SR methods.
variants  ML(1M)  Steam  

Recall@10  nDCG@10  Recall@10  nDCG@10  
GRU4Rec  0.1232  0.5012  0.0714  0.0702 
Inter+GRU4Rec  0.1545  0.5609  0.0856  0.0872 
SASRec  0.1320  0.5156  0.0737  0.0726 
Inter+SASRec  0.1558  0.5621  0.0864  0.0878 
Caser  0.1326  0.5176  0.0744  0.0735 
Inter+Caser  0.1445  0.5460  0.0815  0.0828 
Effect of the Graphs in the Inter-Sequence Item Correlation Encoder
To verify the effectiveness of the bipartite and co-occurrence graphs in capturing high-order inter-sequence item correlations, we design several variants of the inter-sequence encoder: Only intra, MF+intra, Co+intra, Bi+intra, and ISSR, as shown in Table 4, where ISSR is our proposed framework. They represent, respectively, no inter-sequence encoder, an MF-based low-order encoder, a co-occurrence-graph-based high-order encoder, a bipartite-graph-based high-order encoder, and our dual-graph high-order encoder. From Table 4, we have the following observations. (1) Only intra performs clearly worse than the other variants, which demonstrates the indispensability of the inter-sequence encoder. (2) MF+intra is inferior to the graph-based variants due to MF's inability to explicitly capture higher-order item correlations across different sequences, whereas graph neural networks can model such correlations with multiple hops. (3) The minor difference between Bi+intra and Co+intra indicates that the user-item bipartite graph and the item-item co-occurrence graph contribute almost equally to the final performance, and combining the two graphs further enhances it.
variants  ML(1M)  Steam  

Recall@10  nDCG@10  Recall@10  nDCG@10  
Only intra  0.1378  0.5244  0.0747  0.0772 
MF+intra  0.1412  0.5265  0.0766  0.0787 
Co+intra  0.1541  0.5604  0.0852  0.0865 
Bi+intra  0.1548  0.5608  0.0856  0.0869 
ISSR  0.1563  0.5632  0.0872  0.0884 
Effect of Dimensionality of Item Embedding
Figure 4 presents the performance of all compared models as the dimensionality $d$ of the item embeddings varies. Due to space limits, we only present Recall@10 on ML (1M) and Steam. As we increase $d$, the performance of all models increases until reaching its best value; then, the performance drops or remains stable as $d$ continues to increase. The reason is that model capacity grows with the dimensionality of the item embeddings; after the peak, however, extra dimensions no longer help because capacity is limited by the amount of informative data. ISSR reaches its best performance at a moderate value of $d$.
Conclusion
In this paper, we propose an Inter-Sequence enhanced framework for personalized Sequential Recommendation (ISSR). The inter-sequence item correlation encoder in ISSR utilizes two graphs (a user-item bipartite graph and an item-item co-occurrence graph) to capture inter-sequence item correlations. The intra-sequence item correlation encoder aggregates the learned inter-sequence information and considers the sequential correlations and temporal dynamics within the current sequence to generate the representation of the user's interests, from which the user's next behavior on the candidate items is predicted. Extensive experiments on three real-world datasets demonstrate the superiority of ISSR over many state-of-the-art methods.
References
[1] (2010) Performance of recommender algorithms on top-n recommendation tasks. In RecSys, pp. 39–46.
[2] (2017) Language modeling with gated convolutional networks. In ICML, pp. 933–941.
[3] (2016) node2vec: Scalable feature learning for networks. In KDD, pp. 855–864.
[4] (2019) A novel KNN approach for session-based recommendation. In PAKDD, pp. 381–393.
[5] (2016) Fusing similarity models with Markov chains for sparse sequential recommendation. In ICDM, pp. 191–200.
[6] (2016) Session-based recommendations with recurrent neural networks. In ICLR.
[7] (2018) Recurrent neural networks with top-k gains for session-based recommendations. In CIKM, pp. 843–852.
[8] (2018) Self-attentive sequential recommendation. In ICDM, pp. 197–206.
[9] (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[10] (2009) Matrix factorization techniques for recommender systems. Computer (8), pp. 30–37.
[11] (2019) Graph intention network for click-through rate prediction in sponsored search. In SIGIR, pp. 961–964.
[12] (2019) Hierarchical gating networks for sequential recommendation. In KDD, pp. 825–833.
[13] (2019) π-Net: A parallel information-sharing network for shared-account cross-domain sequential recommendations. In SIGIR, pp. 685–694.
[14] (2014) DeepWalk: Online learning of social representations. In KDD, pp. 701–710.
[15] (2017) Personalizing session-based recommendations with hierarchical recurrent neural networks. In RecSys, pp. 130–137.
[16] (2019) Context-aware sequential recommendations with stacked recurrent neural networks. In WWW, pp. 3172–3178.
[17] (2009) BPR: Bayesian personalized ranking from implicit feedback. In UAI, pp. 452–461.
[18] (2010) Factorizing personalized Markov chains for next-basket recommendation. In WWW, pp. 811–820.
[19] (2006) The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test. Behavioral Ecology 17 (4), pp. 688–690.
[20] (2005) An MDP-based recommender system. Journal of Machine Learning Research 6, pp. 1265–1295.
[21] (2019) BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. arXiv preprint arXiv:1904.06690.
[22] (2015) LINE: Large-scale information network embedding. In WWW, pp. 1067–1077.
[23] (2018) Personalized top-n sequential recommendation via convolutional sequence embedding. In WSDM, pp. 565–573.
[24] (2019) A collaborative session-based recommendation approach with parallel memory modules. In SIGIR, pp. 345–354.
[25] (2015) Learning hierarchical representation model for next-basket recommendation. In SIGIR, pp. 403–412.
[26] (2019) Neural graph collaborative filtering. In SIGIR.
[27] (2019) Session-based recommendation with graph neural networks. In AAAI, pp. 346–353.
[28] (2019) Hierarchical neural variational model for personalized sequential recommendation. In WWW, pp. 3377–3383.
[29] (2019) Recurrent convolutional neural network for sequential recommendation. In WWW, pp. 3398–3404.
[30] (2018) Graph convolutional neural networks for web-scale recommender systems. In KDD, pp. 974–983.
[31] (2019) A simple convolutional generative network for next item recommendation. In WSDM, pp. 582–590.
[32] (2019) Next item recommendation with self-attentive metric learning. In AAAI, Vol. 9.
[33] (2017) Scalable graph embedding for asymmetric proximity. In AAAI.