SCoRe
SCoRe is a sequential recommendation model with dual-side neighbor-based collaborative filtering. Implementation of our WSDM 2020 paper.
Sequential recommendation aims to predict users' future preferences over items given their historical behaviors. The order of user behaviors implies that rich sequential patterns are embedded in the behavior history, revealing the underlying dynamics of user interests. Various sequential recommendation methods have been proposed to model dynamic user behaviors. However, most of them only consider the user's own behaviors and dynamics, while ignoring the collaborative relations among users and items, i.e., similar tastes of users or analogous properties of items. Without modeling collaborative relations, these methods suffer from a lack of recommendation diversity and thus may perform worse. Worse still, most existing methods only consider the user-side sequence and ignore the temporal dynamics on the item side. To tackle these problems, we propose the Sequential Collaborative Recommender (SCoRe), which effectively mines high-order collaborative information using cross-neighbor relation modeling and additionally utilizes both user-side and item-side historical sequences to better capture user and item dynamics. Experiments on three large-scale real-world datasets demonstrate the superiority of the proposed model over strong baselines.
With the emergence of large online information systems such as e-commerce platforms, the amount of user behavioral data grows rapidly. Therefore, in recent years, researchers in both academia and industry have devoted much effort to the sequential recommendation task, which aims to mine the rich yet complex temporal dynamics embedded in user behavior sequences.
As stated in many related works (Hidasi and Karatzoglou, 2018; Koren, 2009; He et al., 2016; Agarwal et al., 2009), temporal dynamics have a strong impact on future user behaviors, e.g., through concept drift (Widmer and Kubat, 1996), long-term behavior dependency (Koren, 2009) and periodic patterns (Ren et al., 2018a).
Commonly, current sequential recommendation models regard a user's purchasing or browsing behaviors as “tokens”, as in natural language processing (NLP). Mainstream models therefore use sequential modeling techniques that are widely used in NLP, such as recurrent neural networks (RNNs) (Hidasi et al., 2016; Zhou et al., 2018a), convolutional neural networks (CNNs) (Tang and Wang, 2018) and the Transformer (Kang and McAuley, 2018). These models have achieved great success on the sequential recommendation task, with many deployed real-world applications (Zhou et al., 2018a; Wu et al., 2016).

Despite this success, current sequential recommendation methods still have limitations. The first is that most models (Hidasi et al., 2016; Tang and Wang, 2018; Kang and McAuley, 2018) only consider the user's (or item's) own interaction history, while ignoring similar users or items that have collaborative relations with it. As a result, each user (item) is represented only by her (its) own behaviors, which limits the diversity of recommendations and may hurt recommendation performance.
We can regard the user-item interactions as a bipartite graph in which the nodes are users and items and the links are interaction records, as illustrated in Figure 1. Traditional models (Koren, 2009, 2008) only consider the items (users) directly interacted with by the target user (item), which are the 1-hop neighbors of the target node. In this way, it is difficult to capture the collaborative relations among users and items. If we go one step further and consider the 2-hop neighbors, we find that they have collaborative relations with the target node, because both the 2-hop neighbors and the target node have interacted with the same group of nodes, namely the 1-hop neighbors. As these relations are found through 2 hops on the graph, we call them high-order collaborative relations.
In this way, we can find the corresponding collaborative information for the target user or item. Moreover, there are various patterns across the neighbors that can be utilized. By aggregating these collaborative relations into the representations of users and items, we can model more complex and diverse user interests (item attractions). This collaborative modeling can also be done sequentially to better handle temporal dynamics.
Another key limitation of current models is that they only consider user-side temporal dynamics while ignoring those on the item side. The user-side sequence consists of the items browsed by the user, so it can reveal the user's drifting interests. However, the item side also contains sequential patterns: an item attracts different users at different times, which can reveal item dynamics such as popularity trends or social topic drift. For example, the recommender system may present a Christmas card to a user when the holiday is coming, even before she interacts with any related items, because the Christmas card has attracted other users who share similar interests or collaborative relations with that user. Modeling the item-side sequence is similar to modeling information dissemination (Petrovic et al., 2011; Rizoiu et al., 2018): the item information disseminates from user to user over time, and the task is to predict which user the information will reach next.
Some sequential recommendation models have already tried combining user-side and item-side sequences to perform dual sequence modeling (Wu et al., 2017, 2019a). However, these works treat the two sequences relatively independently, and the sequential representations of the two sequences interact only in the final prediction stage. In contrast, our work models the dual sequences more interactively, so that the information of both sequences interacts along the timeline. As we capture collaborative relations on both the user side and the item side, it is natural to let the two sequences interact at synchronized time slices, as illustrated in Figure 2.
To address the limitations mentioned above, we propose the Sequential Collaborative Recommender (SCoRe), which considers high-order collaborative relations and models dual sequences in an interactive and more expressive manner. The contributions of this paper are threefold:
We propose to aggregate high-order collaborative relations, which enriches the representations of users and items. More importantly, through cross-neighbor relation modeling, our model can effectively capture the diverse and complex patterns in neighbor-to-neighbor collaborative relations.
We propose to model both user-side and item-side sequences. The interactions between the dual sequences are modeled more thoroughly, which makes dual sequence modeling more expressive.
We conduct extensive experiments evaluating and comparing our model with several strong baselines on three large-scale real-world recommendation datasets. The results demonstrate the efficacy of SCoRe, and the detailed analysis reveals some key principles for training our model.
The rest of the paper is organized as follows. Sections 2 and 3 present the preliminaries and describe the SCoRe model in detail, including a discussion of model efficiency. Section 4 presents the experimental setups and the corresponding results. In Section 5, we discuss related work. Finally, we conclude the paper and point out future work in Section 6.
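In a recommender system, there are $M$ users in $\mathcal{U} = \{u_1, \dots, u_M\}$ and $N$ items in $\mathcal{V} = \{v_1, \dots, v_N\}$. In the history, any user may show interest in some items, and these interaction behaviors are tracked by the system as

$$y_{uv} = \begin{cases} 1, & \text{if user } u \text{ has interacted with item } v, \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$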
The user preference is either implicit feedback (Agarwal et al., 2009), e.g., clicks, or explicit user ratings (Koren, 2009). Without loss of generality, we focus on implicit feedback, which is more common in practice. For sequential recommendation, each user-item interaction has a corresponding timestamp $t$, so we use the triplet $(u, v, t)$ to denote one interaction. All the observed interaction records are denoted as $\mathcal{H}$.
For a target pair $(u, v)$ whose interaction probability we need to predict, we can extract some interaction relations of the target user $u$ and the target item $v$.

Definition 1. (Interaction Set): Given the target user $u$, we construct the interaction set of $u$ as

$$\mathcal{S}_u = \{ v' \mid (u, v', t) \in \mathcal{H} \}. \qquad (2)$$
The user's interaction set is the collection of all the items that the user has interacted with. Symmetrically, we define the interaction set of item $v$ as

$$\mathcal{S}_v = \{ u' \mid (u', v, t) \in \mathcal{H} \}, \qquad (3)$$

which contains all the users that have interacted with the item $v$.
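Definition 2. (Co-Interaction Set): To explicitly capture the collaborative relations among users and items, i.e., similar users or similar items, it is natural to consider the users (items) that have similar tastes (attractions). Therefore, we define the co-interaction sets of $u$ and $v$ respectively as

$$\mathcal{C}_u = \{ u' \mid (u', v', t) \in \mathcal{H},\ v' \in \mathcal{S}_u,\ u' \neq u \}, \qquad (4)$$

$$\mathcal{C}_v = \{ v' \mid (u', v', t) \in \mathcal{H},\ u' \in \mathcal{S}_v,\ v' \neq v \}. \qquad (5)$$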
For user $u$, the co-interaction set $\mathcal{C}_u$ consists of a group of users who share similar behaviors with $u$, because they all interacted with the items in $u$'s interaction set; therefore, they have collaborative relations with $u$ to some extent. Similarly, for item $v$, the co-interaction set $\mathcal{C}_v$ consists of the items that attract the same group of users ($v$'s interaction set) as the target item $v$.
As shown in Figure 1, the interaction set and the co-interaction set are essentially the 1-hop and 2-hop neighbors of the root node ($u$ or $v$), respectively.
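To make the 1-hop/2-hop view concrete, the following minimal Python sketch builds interaction sets and co-interaction sets from a list of (user, item, timestamp) triplets. The function names and toy IDs are hypothetical, and excluding the root node itself from its own co-interaction set is an assumption of this sketch rather than a detail confirmed by the text.

```python
from collections import defaultdict


def build_indexes(records):
    """Index the user-item bipartite graph in both directions."""
    user_items = defaultdict(set)  # user -> items the user interacted with
    item_users = defaultdict(set)  # item -> users who interacted with the item
    for u, v, _ts in records:
        user_items[u].add(v)
        item_users[v].add(u)
    return user_items, item_users


def interaction_sets(u, v, user_items, item_users):
    """1-hop neighbours: S_u (items of user u) and S_v (users of item v)."""
    return set(user_items.get(u, ())), set(item_users.get(v, ()))


def co_interaction_sets(u, v, user_items, item_users):
    """2-hop neighbours: C_u (other users sharing an item with u) and
    C_v (other items sharing a user with v). Dropping u and v themselves
    is an assumption of this sketch."""
    s_u, s_v = interaction_sets(u, v, user_items, item_users)
    c_u = {other for item in s_u for other in item_users[item]} - {u}
    c_v = {other for user in s_v for other in user_items[user]} - {v}
    return c_u, c_v


# Hypothetical toy log of (user, item, timestamp) triplets.
records = [("u1", "v1", 1), ("u1", "v2", 2), ("u2", "v2", 3),
           ("u2", "v3", 4), ("u3", "v1", 5)]
user_items, item_users = build_indexes(records)
s_u, s_v = interaction_sets("u1", "v3", user_items, item_users)     # S_u = {v1, v2}, S_v = {u2}
c_u, c_v = co_interaction_sets("u1", "v3", user_items, item_users)  # C_u = {u2, u3}, C_v = {v2}
```

On this toy log, user u1 shares item v1 with u3 and item v2 with u2, so both users end up in u1's co-interaction set, while v2 is the only other item reached through v3's single user u2.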
Now that we have defined the local interaction relations of user $u$ (item $v$), we go one step further and consider them in a temporal way.
Specifically, users (items) may interact with different items (users) at different times. Thus, the interaction relations evolve over time and can be regarded as a series of time-sliced processes.
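To better model the temporal patterns of the interaction relations, we slice the whole timeline into $T$ time frames, each spanning a uniform time interval $\Delta t$. In this way, all observed interaction records satisfy $\mathcal{H} = \bigcup_{t=1}^{T} \mathcal{H}_t$, where $\mathcal{H}_t$ contains the triplets that happen in the $t$-th time slice. Using the interaction records within $\mathcal{H}_t$, we construct user $u$'s and item $v$'s interaction relations, denoted as

$$\mathcal{R}_u^t = (\mathcal{S}_u^t, \mathcal{C}_u^t), \qquad (6)$$

$$\mathcal{R}_v^t = (\mathcal{S}_v^t, \mathcal{C}_v^t), \qquad (7)$$

where $\mathcal{S}^t$ and $\mathcal{C}^t$ are the interaction and co-interaction sets of Definitions 1 and 2 computed on $\mathcal{H}_t$.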
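The time-slicing step can be sketched as follows, assuming integer timestamps, a known start time and a fixed interval `delta_t`; the helper name `slice_timeline` is hypothetical. Each returned bucket plays the role of one per-slice record set, on which the interaction and co-interaction sets can then be built.

```python
from collections import defaultdict


def slice_timeline(records, t_start, delta_t, num_slices):
    """Bucket (user, item, timestamp) triplets into num_slices windows.

    Slice t (1-based) covers [t_start + (t - 1) * delta_t, t_start + t * delta_t);
    records outside the covered range are dropped.
    """
    buckets = defaultdict(list)
    for u, v, ts in records:
        idx = (ts - t_start) // delta_t + 1
        if 1 <= idx <= num_slices:
            buckets[idx].append((u, v, ts))
    # Element t-1 of the result holds the records of slice t.
    return [buckets[t] for t in range(1, num_slices + 1)]


DAY = 86_400  # seconds; Table 2 lists a 1-day interval for the Taobao dataset
log = [("u1", "v1", 0), ("u1", "v2", 3 * 3600), ("u2", "v2", DAY + 60)]
per_slice = slice_timeline(log, t_start=0, delta_t=DAY, num_slices=2)
# per_slice[0] holds the first two records, per_slice[1] holds the third.
```

Dropping records outside the covered window is a simplification; a real pipeline would choose `t_start` and `num_slices` so that the whole log is covered.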
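The goal of the recommender system is to estimate the probability of interaction $\hat{y}_{uv}$ between the target user $u$ and the given item $v$, taking into account the user's interaction history and the item's interaction history, as

$$\hat{y}_{uv} = \mathcal{F}(\mathcal{R}_u, \mathcal{R}_v; \Theta) \qquad (8)$$

through the learned function $\mathcal{F}$ with parameters $\Theta$, where $\mathcal{R}_u = \{\mathcal{R}_u^1, \dots, \mathcal{R}_u^T\}$ and $\mathcal{R}_v = \{\mathcal{R}_v^1, \dots, \mathcal{R}_v^T\}$. We summarize the notations and their descriptions in Table 1.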
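Notation | Description |
$u$, $v$ | The target user and the target item. |
$M$, $N$ | The number of users and items. |
$y_{uv}$, $\hat{y}_{uv}$ | The indicator and the predicted probability of the user-item interaction. |
$\mathbf{e}_u$, $\mathbf{e}_v$ | Dense representations of target user $u$ and target item $v$. |
$\mathcal{S}_u^t$, $\mathcal{S}_v^t$ | User $u$'s/item $v$'s interaction set at the $t$-th time slice. |
$\mathcal{C}_u^t$, $\mathcal{C}_v^t$ | User $u$'s/item $v$'s co-interaction set at the $t$-th time slice. |
$\mathcal{R}_u^t$, $\mathcal{R}_v^t$ | User $u$'s/item $v$'s interaction relations at the $t$-th time slice. |
$\mathbf{x}_u^t$, $\mathbf{x}_v^t$ | Aggregated user-/item-side representation at the $t$-th time slice. |
$X_u$, $X_v$ | User-/item-side sequence. |
$\Delta t$ | Time interval used to split the whole timeline. |
$K$ | Size of the (co-)interaction set. |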
In this section, we present our proposed model SCoRe (Sequential Collaborative Recommender) in detail. We first introduce the high-order collaborative relation mining through cross neighbor modeling, and then we describe the dual sequence modeling in an interactive manner. Furthermore, we analyze the time complexity of the proposed model.
In this section, we describe the proposed Co-Attention Network, which is applied at each time slice $t$ to capture the complex relations across the neighbors in the interaction and co-interaction sets.
A key ingredient in the recent success of recommendation models (Song et al., 2019; Wu et al., 2019b; Zhou et al., 2018a) is the attention mechanism, which assigns different credits to different item representations or temporal representations (e.g., hidden states of RNNs). The attentive weight $\alpha_j$ of a user-interacted item $v_j$ with respect to the target item $v$ is calculated following the paradigm

$$\alpha_j = f(\mathbf{e}_j, \mathbf{e}_v), \qquad (9)$$

where the function $f(\cdot)$ can take various forms and $\mathbf{e}_j$ is the representation of $v_j$, which could be an embedding or a hidden state. The calculated $\alpha_j$ measures the correlation (e.g., similarity) between $v_j$ and $v$. This paradigm only focuses on the relations between the user-interacted items and the single target item $v$. However, $v$ has many neighboring items (those in its co-interaction set $\mathcal{C}_v$), so we can calculate neighbor-to-neighbor correlations between the items in $\mathcal{S}_u$ and those in $\mathcal{C}_v$. In this way, the relation between the target user $u$ and the target item $v$ can be modeled with richer information.
To model these cross-neighbor collaborative relations, we propose the Co-Attention Network, illustrated in Figure 3. We consider not only the relatedness between the user-interacted items and the target item, but also the relatedness between the user-interacted items and the collaborative neighbors of the target item.
At each time slice $t$, we calculate a co-attention relatedness matrix, each element $M_{ij}$ of which is calculated as
(10)
where $\mathbf{e}_v$ is the embedding of the target item $v$, $\mathrm{ReLU}(\cdot)$ is the ReLU activation function, and $K$ is the number of items in each of $\mathcal{S}_u^t$ and $\mathcal{C}_v^t$. As the objects in $\mathcal{S}_u^t$ and $\mathcal{C}_v^t$ are all items, we denote this relatedness matrix as $\mathbf{M}_V^t \in \mathbb{R}^{K \times K}$. Applying a softmax operation, we obtain the co-attention matrix $\mathbf{A}_V^t$, each element of which is calculated as
(11)
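Symmetrically, we calculate each element of the user-side relatedness matrix $\mathbf{M}_U^t$, which relates the users in $\mathcal{S}_v^t$ to those in $\mathcal{C}_u^t$, as
(12)
and obtain the co-attention matrix $\mathbf{A}_U^t$ using the softmax operation described in Eq. (11).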
In this way, we can capture cross-neighbor relations, which are more complex and informative than those captured by the original paradigm in Eq. (9).
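The sketch below illustrates the shape of this computation for one side (items in the user's interaction set against items in the target item's co-interaction set). It is not the paper's Eq. (10): as a hypothetical stand-in, the relatedness score is a ReLU of a weighted inner product with weight vector `w`, the target-item term is omitted, and the softmax is assumed to normalize over all K x K entries (a row-wise softmax would be an equally plausible reading).

```python
import math


def relu(x):
    return max(0.0, x)


def relatedness_matrix(left, right, w, b=0.0):
    """Hypothetical stand-in for Eq. (10).

    left:  K embeddings (e.g. items in S_u^t), each a list of d floats
    right: K embeddings (e.g. items in C_v^t)
    M[i][j] = ReLU(sum_k w[k] * left[i][k] * right[j][k] + b)
    """
    return [[relu(sum(wk * a * c for wk, a, c in zip(w, li, rj)) + b) for rj in right]
            for li in left]


def co_attention(m):
    """Softmax over all K x K entries (assumed normalisation for Eq. (11))."""
    peak = max(max(row) for row in m)  # subtract the max for numerical stability
    exps = [[math.exp(x - peak) for x in row] for row in m]
    total = sum(sum(row) for row in exps)
    return [[x / total for x in row] for row in exps]


# Two toy 2-d item embeddings on each side (hypothetical values).
m_v = relatedness_matrix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], w=[1.0, 1.0])
a_v = co_attention(m_v)  # entries sum to 1; the diagonal entries are the largest
```

The user-side matrix would be computed the same way from the user embeddings in the target item's interaction set and the target user's co-interaction set.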
In collaborative filtering (CF) models (Koren, 2008; He et al., 2017; Cheng et al., 2018) and sequential recommendation models (Hidasi et al., 2016; Tang and Wang, 2018; Zhou et al., 2018a), only the items (users) that a user (item) has directly interacted with are used to represent that user (item), which may lead to a narrow understanding of user or item properties.
However, in the previous section, by incorporating the information of co-interaction sets, we can not only model cross-neighbor collaborative relations but also integrate high-order information by summarizing the co-interaction sets. Integrating these co-interaction objects enriches the representations of the target user $u$ and the target item $v$.
By sum pooling (SP) along the rows or columns of the two co-attention matrices computed above, we obtain four attentive vectors:
(13) |
(14) |
(15) |
(16) |
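We denote the embeddings of the interaction set and co-interaction set of $u$ and $v$ at the $t$-th time slice as $\mathbf{E}_{\mathcal{S}_u}^t$ (interaction), $\mathbf{E}_{\mathcal{C}_u}^t$ (co-interaction), $\mathbf{E}_{\mathcal{S}_v}^t$ and $\mathbf{E}_{\mathcal{C}_v}^t$, respectively, each in $\mathbb{R}^{K \times d}$, where $d$ is the dimension of the embedding and $K$ is the size of the (co-)interaction set. The aggregated representations of $u$ and $v$ at the $t$-th time slice are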
(17) |
(18) |
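Sum pooling and the subsequent attentive aggregation can be sketched as below, assuming a co-attention matrix whose rows index the interaction set and whose columns index the co-interaction set. The weighted sum of embeddings is an assumed aggregation step; how Eqs. (17) and (18) combine the pooled vectors with other terms is not reproduced here, and the function names are hypothetical.

```python
def sum_pool(a):
    """Sum-pool a K x K co-attention matrix along both axes.

    Row sums give one weight per row object (e.g. items in S_u^t);
    column sums give one weight per column object (e.g. items in C_v^t).
    """
    row_weights = [sum(row) for row in a]
    col_weights = [sum(col) for col in zip(*a)]
    return row_weights, col_weights


def attentive_pool(weights, embeddings):
    """Weighted sum of K embeddings of dimension d (assumed aggregation step)."""
    d = len(embeddings[0])
    return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(d)]


a_v = [[0.1, 0.2], [0.3, 0.4]]             # toy co-attention matrix whose entries sum to 1
w_rows, w_cols = sum_pool(a_v)              # approx. [0.3, 0.7] and [0.4, 0.6]
e_s_u = [[1.0, 0.0], [0.0, 2.0]]            # hypothetical K x d embeddings of S_u^t
pooled_s_u = attentive_pool(w_rows, e_s_u)  # approx. [0.3, 1.4]
```

If the co-attention matrix is normalized over all of its entries, both the row weights and the column weights sum to 1, so each pooled vector is a convex combination of the corresponding embeddings.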
In this section, we describe our approach to temporal dynamics modeling. We adopt a dual sequence modeling method that considers both user-side and item-side sequences and interactively models the relations between the two sequences at synchronized time slices.
At each time slice, we obtain the aggregated representations of the target user and the target item, $\mathbf{x}_u^t$ and $\mathbf{x}_v^t$, following the co-attention mechanism in Section 3.1. We then obtain two sequences, $X_u$ and $X_v$, consisting of the target user's and the target item's aggregated representations over time:

$$X_u = [\mathbf{x}_u^1, \dots, \mathbf{x}_u^T], \qquad X_v = [\mathbf{x}_v^1, \dots, \mathbf{x}_v^T]. \qquad (19)$$
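We use two recurrent neural networks to model the temporal dynamics on the user side and the item side, respectively, and implement each recurrent cell as a Gated Recurrent Unit (GRU) (Cho et al., 2014). Each GRU unit takes the corresponding representation $\mathbf{x}_u^t$ (or $\mathbf{x}_v^t$) at each time step and the hidden state $\mathbf{h}_u^{t-1}$ from the last time step, and then calculates $\mathbf{h}_u^t$ as

$$\begin{aligned}
\mathbf{r}^t &= \sigma(\mathbf{W}_r \mathbf{x}_u^t + \mathbf{U}_r \mathbf{h}_u^{t-1} + \mathbf{b}_r), \\
\mathbf{z}^t &= \sigma(\mathbf{W}_z \mathbf{x}_u^t + \mathbf{U}_z \mathbf{h}_u^{t-1} + \mathbf{b}_z), \\
\tilde{\mathbf{h}}_u^t &= \tanh(\mathbf{W}_h \mathbf{x}_u^t + \mathbf{U}_h (\mathbf{r}^t \odot \mathbf{h}_u^{t-1}) + \mathbf{b}_h), \\
\mathbf{h}_u^t &= \mathbf{z}^t \odot \mathbf{h}_u^{t-1} + (1 - \mathbf{z}^t) \odot \tilde{\mathbf{h}}_u^t,
\end{aligned} \qquad (20)$$

where $\sigma(\cdot)$ is the sigmoid function and $\odot$ is the element-wise product operator.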
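For reference, a single GRU step in the formulation of Cho et al. (2014) can be written in plain Python as below. Biases are omitted for brevity, the parameter dictionary keys are hypothetical, and in SCoRe the same kind of cell would be run separately over the user-side and item-side sequences.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def gru_step(x, h_prev, p):
    """One GRU step; p maps hypothetical keys W_r, U_r, W_z, U_z, W_h, U_h to matrices."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]  # reset gate
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h_prev))]  # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r * h_{t-1}, element-wise
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], gated))]
    # h_t = z * h_{t-1} + (1 - z) * candidate, element-wise
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(xs, h0, p):
    """Run the cell over x^1..x^T and return the hidden states h^1..h^T."""
    hs, h = [], h0
    for x in xs:
        h = gru_step(x, h, p)
        hs.append(h)
    return hs


zero = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zero for name in ("W_r", "U_r", "W_z", "U_z", "W_h", "U_h")}
states = run_gru([[1.0, 2.0]] * 3, [1.0, -2.0], params)
# -> [[0.5, -1.0], [0.25, -0.5], [0.125, -0.25]]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so each step simply halves the previous hidden state, which makes a convenient sanity check.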
The item-side temporal dynamics are modeled in the same way. We thus obtain the user-side and item-side sequences of temporal representations, $[\mathbf{h}_u^1, \dots, \mathbf{h}_u^T]$ and $[\mathbf{h}_v^1, \dots, \mathbf{h}_v^T]$.
As illustrated in Figure 4, different time slices have different impacts on the final prediction. We therefore introduce our Interactive Attention Mechanism. Unlike the attention mechanisms in (Zhou et al., 2018a) and (Zhou et al., 2018b), which use the target item to query the sequence of interacted items, we use the information of both sequences at the same time to interactively weigh the different time slices. The attention value of each time slice is calculated as
(21) |
(22) |
where R is a three-layer MLP with ReLU activation function.
The final representations of the user-side and item-side sequences are
(23) |
It is natural to use the information of both sides to calculate the attention, because the cross-neighbor relation modeling described in Section 3.1 utilizes both user-side and item-side neighbors. We therefore use the representations of both sides at synchronized time slices to interactively calculate the attention values. In this way, the modeling of the two sequences is highly correlated.
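A minimal sketch of the interactive attention idea follows: one score per time slice is computed from the user-side and item-side hidden states of that same slice, normalized with a softmax over slices, and used to weigh both sequences. The plain inner product `dot_score` is a hypothetical stand-in for the paper's three-layer MLP R, and sharing the same weights across both sides is an assumption of this sketch.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interactive_attention(hu_seq, hv_seq, score_fn):
    """Weigh T time slices using both sides' hidden states at the same slice.

    Returns the per-slice weights and the weighted user-/item-side summaries.
    """
    alphas = softmax([score_fn(hu, hv) for hu, hv in zip(hu_seq, hv_seq)])
    d_u, d_v = len(hu_seq[0]), len(hv_seq[0])
    h_u = [sum(a * h[k] for a, h in zip(alphas, hu_seq)) for k in range(d_u)]
    h_v = [sum(a * h[k] for a, h in zip(alphas, hv_seq)) for k in range(d_v)]
    return alphas, h_u, h_v


def dot_score(hu, hv):
    """Hypothetical scoring function: inner product of the two hidden states."""
    return sum(a * b for a, b in zip(hu, hv))


# Two time slices; the first has aligned user/item states, so it gets more weight.
alphas, h_u, h_v = interactive_attention([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 0.0]], dot_score)
```

Because the score of a slice depends on both hidden states at that slice, a slice in which the user and item sides agree is emphasized for both sequences at once.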
The predicted probability of interaction between the target user and the target item is calculated as
(24) |
where $g(\cdot)$ is implemented as a multi-layer perceptron (MLP) with the ReLU activation function and parameter set $\Theta_g$. The inference procedure is illustrated in Figure 4.

As for the loss function, we train the model end-to-end with (i) the widely used cross-entropy loss (Ren et al., 2018b; Zhou et al., 2018b, a) over the whole training dataset $\mathcal{D}$ and (ii) a regularization term on the parameters. We use the Adam algorithm for optimization. The final loss function is

$$\mathcal{L} = -\sum_{(u,v) \in \mathcal{D}} \big[ y_{uv} \log \hat{y}_{uv} + (1 - y_{uv}) \log (1 - \hat{y}_{uv}) \big] + \lambda \|\Theta\|_2^2, \qquad (25)$$

where $\lambda$ is the regularization coefficient and $\Theta$ includes the parameters of the GRUs, the parameters of the Co-Attention Network and the parameters of the three-layer MLP in the Interactive Attention Mechanism.
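The objective can be sketched as follows, assuming an L2 penalty on a flat list of parameters and a sum (rather than a mean) of the cross-entropy over samples; the function name and the `eps` clipping are choices of this sketch, and the Adam update itself is not shown.

```python
import math


def training_loss(y_true, y_pred, params, lam, eps=1e-7):
    """Summed binary cross-entropy plus lam * (squared L2 norm of params)."""
    ce = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        ce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    l2 = sum(w * w for w in params)
    return ce + lam * l2


loss = training_loss([1, 0], [0.5, 0.5], params=[1.0, 2.0], lam=0.1)
# 2 * ln 2 + 0.1 * 5, i.e. about 1.886
```

In practice the penalty is usually computed over every weight tensor of the model and the gradients are left to an autodiff framework; the flat list here only illustrates the arithmetic.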
In this section, we analyze the computational complexity of SCoRe. From the previous sections, the forward inference of SCoRe can be regarded as two relatively separate parts. The first part is the cross-neighbor collaborative relation modeling, which can be conducted in parallel for all time slices. Its cost can be viewed as a constant $C_1$, since the co-attention network only performs a single-layer non-linear transformation, softmax and sum-pooling operations. The second part is the GRU temporal modeling. We assume the average time cost of one GRU step is a constant $C_2$, which depends on the implementation of the GRU module and can be reduced by parallel execution on a GPU. Recall that we have $T$ time slices; thus the time complexity of temporal inference is $O(T C_2)$. Therefore, the overall time complexity of SCoRe is $O(C_1 + T C_2) = O(T)$, which is the same as that of ordinary recurrent neural networks.

In this section, we present the details of the experimental setups and the corresponding results. To illustrate the effectiveness of our proposed model, we compare it with several strong baselines on the sequential recommendation task. Moreover, we have published our code for reproducibility at https://github.com/qinjr/SCoRe.
We start with three research questions (RQ) to lead the experiments and the following discussions.
Compared to the baseline models, does SCoRe achieve state-of-the-art performance in sequential recommendation task?
What is the influence of different components in SCoRe? Are the proposed co-attention network and interactive attention necessary for improving performance?
What patterns does the proposed model capture for the final recommendation decision?
In this part, we describe our experiment setups including datasets with preprocessing method, some important implementation details, evaluation metrics and the compared baselines.
We use three large-scale real-world datasets to evaluate all the compared models. The dataset statistics are shown in Table 2.
CCMR (Cao et al., 2016) is a dataset of movie rating logs (integer scores from 1 to 5) collected from Douban, one of China's largest movie review websites. The data was collected and dumped in May 2015.
Taobao (Zhu et al., 2018) is a dataset of user behavior data collected from Taobao (https://tianchi.aliyun.com/dataset/dataDetail?dataId=649), an e-commerce platform in China. It contains user behaviors from November 25 to December 3, 2017, of several types, including click, purchase, add-to-cart and item favoring.
Tmall is provided by Alibaba Group and contains user behavior history on the Tmall e-commerce platform from May 2015 to November 2015.
Dataset Preprocessing. We cut the timeline into $T$ time slices with a specific time interval $\Delta t$; both are listed for each dataset in Table 2. For each time slice, we use the interaction records within it to construct the interaction and co-interaction sets for both users and items. Here we use a simple time-slicing approach and leave finer segmentation strategies to future work.
Dataset | Users # | Items # | Interaction # | Time slices | Time interval |
CCMR | 4,920,695 | 190,129 | 283,775,314 | 41 | 90 days |
Taobao | 987,994 | 4,162,024 | 100,150,807 | 9 | 1 day |
Tmall | 424,170 | 1,090,390 | 54,925,331 | 13 | 15 days |
Positive & Negative Samples. To evaluate recommendation performance, for each user we use one positive item and sample 100 negative items at the prediction time, in all three datasets. For the Tmall and Taobao datasets, we only have positive user feedback (clicks, purchases, etc.), so the negative items are sampled randomly. For the CCMR dataset, we regard items rated 5 or 4 as positive and those rated 1, 2 or 3 as negative. If a user does not have enough negative items, we randomly sample additional negative items for her. The positive items in CCMR form the behavior sequence.
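The candidate construction for one evaluation case can be sketched as below. The helper names are hypothetical; for CCMR the low-rated items would be passed as explicit negatives, while for Taobao and Tmall that list is empty and all 100 negatives come from uniform random sampling. Drawing at random from the explicit negatives when there are more than 100 of them, and not excluding the user's other positive items, are assumptions of this sketch.

```python
import random


def label_ccmr_rating(rating):
    """CCMR labelling rule: ratings 4-5 are positive (1), ratings 1-3 negative (0)."""
    return 1 if rating >= 4 else 0


def build_candidates(positive_item, explicit_negatives, all_items, num_neg=100, rng=None):
    """Return [positive] followed by num_neg distinct negative items."""
    rng = rng if rng is not None else random.Random(0)
    # De-duplicate explicit negatives (keeping order) and never reuse the positive.
    unique = [n for n in dict.fromkeys(explicit_negatives) if n != positive_item]
    negs = rng.sample(unique, min(len(unique), num_neg))
    # Fill any shortfall by uniform sampling from the remaining catalogue.
    chosen = set(negs) | {positive_item}
    pool = [i for i in all_items if i not in chosen]
    negs += rng.sample(pool, num_neg - len(negs))
    return [positive_item] + negs


catalogue = ["v%d" % i for i in range(200)]  # hypothetical item IDs
cands = build_candidates("v0", ["v1", "v2"], catalogue, rng=random.Random(42))
# cands[0] == "v0", followed by v1, v2 and 98 randomly drawn items
```

Putting the positive first makes it easy to recover its rank after the model scores all 101 candidates.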
Train & Test Splitting. The training set contains the sequential behaviors from the first to the th time slice, we use the interactions history from 1 to to predict in . For the validation set, we use the interactions data from 1 to to predict in . In testing set, interactions data from 1 to are used to predict in .
Implementation Details. It is common that the target user does not have any interaction record in a time slice; similarly, the target item may not be visited by any user in a time slice. To handle this situation, we use a unified embedding vector to represent it.
We set the size of the interaction set to $K$, which can be regarded as a hyperparameter. For simplicity, the size of the co-interaction set is also $K$. If there are more than $K$ objects in a set, we randomly sample $K$ of them. If there are fewer than $K$ objects (say $k$ objects, $k < K$), we randomly sample $K - k$ additional times from the original set.

Three evaluation metrics are used, all of which are widely used in recommendation tasks.
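HR@k (Hit Ratio@k) measures the proportion of test cases in which the positive item is ranked within the top $k$:

$$\mathrm{HR@}k = \frac{1}{|\mathcal{D}_{\text{test}}|} \sum_{(u,v) \in \mathcal{D}_{\text{test}}} \mathbb{I}(p_{uv} \le k), \qquad (26)$$

where $p_{uv}$ is the ranking position of item $v$ interacted with by user $u$, and $\mathbb{I}(\cdot)$ is the indicator function.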
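NDCG@k (Normalized Discounted Cumulative Gain) is a position-aware metric that assigns larger weights to higher ranks of the positive item:

$$\mathrm{NDCG@}k = \frac{1}{|\mathcal{D}_{\text{test}}|} \sum_{(u,v) \in \mathcal{D}_{\text{test}}} \frac{\mathbb{I}(p_{uv} \le k)}{\log_2 (p_{uv} + 1)}. \qquad (27)$$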
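MRR (Mean Reciprocal Rank) is another position-aware metric, calculated as

$$\mathrm{MRR} = \frac{1}{|\mathcal{D}_{\text{test}}|} \sum_{(u,v) \in \mathcal{D}_{\text{test}}} \frac{1}{p_{uv}}. \qquad (28)$$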
Since HR@1 equals NDCG@1, we report HR@{1,5,10}, NDCG@{5,10} and MRR in this work.
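These metrics are straightforward to compute once the rank of the positive item among its candidates is known. A minimal sketch, assuming one positive item per test case and counting ties against the positive item (the tie rule and the function names are choices of this sketch):

```python
import math


def rank_of_positive(scores, pos_index=0):
    """1-based rank of the positive candidate; ties are counted pessimistically."""
    pos_score = scores[pos_index]
    return 1 + sum(1 for i, s in enumerate(scores) if i != pos_index and s >= pos_score)


def hr_at_k(ranks, k):
    return sum(1 for p in ranks if p <= k) / len(ranks)


def ndcg_at_k(ranks, k):
    # One relevant item per test case, so the ideal DCG is 1.
    return sum(1.0 / math.log2(p + 1) for p in ranks if p <= k) / len(ranks)


def mrr(ranks):
    return sum(1.0 / p for p in ranks) / len(ranks)


ranks = [1, 3, 12]  # hypothetical ranks for three test cases
# HR@5 = 2/3, NDCG@5 = (1 + 1/2) / 3 = 0.5, MRR = (1 + 1/3 + 1/12) / 3 = 17/36
```

At rank 1 the NDCG gain is 1 / log2(2) = 1, which is why HR@1 and NDCG@1 coincide.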
To illustrate the effectiveness of our model, we compare SCoRe with two CF models, three single-sequence recommendation models and two dual-sequence models. Following (Zhou et al., 2018a), all the models take sparse features as input and feed them through an embedding layer for subsequent inference.
The first group of models are CF models:
SVD++ (Koren, 2008) is a hybrid of the latent factor model and the neighbor-based model, and is a fundamental approach to collaborative filtering recommendation. It treats all sequential behaviors as a whole and ignores temporal dynamics.
DELF (Cheng et al., 2018) is a state-of-the-art CF method that uses deep neural networks to capture complex non-linear interaction patterns on both the user side and the item side.
The second group contains sequential recommendation methods that utilize only the user-side sequence: GRU4Rec (Hidasi et al., 2016), Caser (Tang and Wang, 2018) and SASRec (Kang and McAuley, 2018), which are based on RNNs, CNNs and the Transformer architecture, respectively.
The third group is dual sequence recommendation models.
RRN (Wu et al., 2017) is the first RNN-based model that considers both user-side and item-side sequences. It uses sum pooling to aggregate the information inside a time slice.
DEEMS (Wu et al., 2019a) feeds the user-side and item-side sequences into two identical sequential models, respectively, and lets the two models play a game with each other, in which each model uses the predicted score of the other as feedback to guide training.
SCoRe is our proposed model, described in Section 3.
Groups: Group 1 = SVD++, DELF; Group 2 = GRU4Rec, Caser, SASRec; Group 3 = RRN, DEEMS, SCoRe.
Dataset | Metric | SVD++ | DELF | GRU4Rec | Caser | SASRec | RRN | DEEMS | SCoRe |
CCMR | HR@1 | 0.0797 | 0.0755 | 0.0739 | 0.0845 | 0.0817 | 0.0739 | 0.0968 | 0.1035 |
CCMR | HR@5 | 0.1865 | 0.2255 | 0.2477 | 0.2469 | 0.2480 | 0.2214 | 0.2444 | 0.2518 |
CCMR | HR@10 | 0.2686 | 0.3422 | 0.3494 | 0.3663 | 0.3613 | 0.3431 | 0.3599 | 0.3688 |
CCMR | NDCG@5 | 0.1340 | 0.1638 | 0.1689 | 0.1736 | 0.1779 | 0.1733 | 0.1776 | 0.1891 |
CCMR | NDCG@10 | 0.1604 | 0.2051 | 0.1985 | 0.2113 | 0.2128 | 0.2060 | 0.2115 | 0.2167 |
CCMR | MRR | 0.1516 | 0.1750 | 0.1706 | 0.1829 | 0.1893 | 0.1799 | 0.1896 | 0.1954 |
Taobao | HR@1 | 0.1947 | 0.3381 | 0.3439 | 0.3562 | 0.3510 | 0.3204 | 0.3255 | 0.3688 |
Taobao | HR@5 | 0.4489 | 0.6077 | 0.6035 | 0.6085 | 0.6159 | 0.6220 | 0.6478 | 0.6816 |
Taobao | HR@10 | 0.5933 | 0.7084 | 0.7189 | 0.7224 | 0.7371 | 0.7620 | 0.7517 | 0.8068 |
Taobao | NDCG@5 | 0.3256 | 0.4731 | 0.4866 | 0.5005 | 0.5101 | 0.4779 | 0.4814 | 0.5339 |
Taobao | NDCG@10 | 0.3723 | 0.5089 | 0.5139 | 0.5174 | 0.5199 | 0.5233 | 0.5476 | 0.5745 |
Taobao | MRR | 0.3224 | 0.4405 | 0.4617 | 0.4744 | 0.4818 | 0.4615 | 0.4988 | 0.5121 |
Tmall | HR@1 | 0.3447 | 0.3386 | 0.3501 | 0.3588 | 0.3622 | 0.3634 | 0.3669 | 0.3770 |
Tmall | HR@5 | 0.5594 | 0.5636 | 0.5727 | 0.5712 | 0.5819 | 0.7310 | 0.7331 | 0.7381 |
Tmall | HR@10 | 0.6554 | 0.6562 | 0.6646 | 0.6662 | 0.6686 | 0.8378 | 0.8373 | 0.8479 |
Tmall | NDCG@5 | 0.4589 | 0.4654 | 0.4784 | 0.4768 | 0.4843 | 0.5594 | 0.5565 | 0.5693 |
Tmall | NDCG@10 | 0.4901 | 0.4986 | 0.5080 | 0.5074 | 0.5124 | 0.5942 | 0.5951 | 0.6051 |
Tmall | MRR | 0.4527 | 0.4669 | 0.4741 | 0.4730 | 0.4778 | 0.5256 | 0.5259 | 0.5363 |
The experimental results are shown in Table 3, from which we make the following observations.
Compared with the baseline models, SCoRe improves MRR by 3.1% to 28.9%, 2.7% to 58.8% and 2.0% to 18.5% on the CCMR, Taobao and Tmall datasets, respectively. It also shows consistent improvements on the other metrics, so SCoRe achieves state-of-the-art performance on the sequential recommendation task.
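The MRR improvement ranges quoted above can be reproduced from Table 3 with a few lines of arithmetic: each range is the relative gain of SCoRe over the strongest and over the weakest baseline on that dataset. The dictionary layout and function name are choices of this sketch; the numbers are copied from Table 3.

```python
# MRR values copied from Table 3.
mrr_table = {
    "CCMR":   {"SVD++": 0.1516, "DELF": 0.1750, "GRU4Rec": 0.1706, "Caser": 0.1829,
               "SASRec": 0.1893, "RRN": 0.1799, "DEEMS": 0.1896, "SCoRe": 0.1954},
    "Taobao": {"SVD++": 0.3224, "DELF": 0.4405, "GRU4Rec": 0.4617, "Caser": 0.4744,
               "SASRec": 0.4818, "RRN": 0.4615, "DEEMS": 0.4988, "SCoRe": 0.5121},
    "Tmall":  {"SVD++": 0.4527, "DELF": 0.4669, "GRU4Rec": 0.4741, "Caser": 0.4730,
               "SASRec": 0.4778, "RRN": 0.5256, "DEEMS": 0.5259, "SCoRe": 0.5363},
}


def improvement_range(row, model="SCoRe"):
    """Relative gain (%) of `model` over the strongest and the weakest baseline."""
    ours = row[model]
    baselines = [value for name, value in row.items() if name != model]

    def gain(base):
        return (ours / base - 1.0) * 100.0

    return round(gain(max(baselines)), 1), round(gain(min(baselines)), 1)


for dataset, row in mrr_table.items():
    print(dataset, improvement_range(row))
# CCMR (3.1, 28.9)
# Taobao (2.7, 58.8)
# Tmall (2.0, 18.5)
```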
Models in Group 1 do not consider the temporal dynamics of user behaviors and thus generally perform worse than the models in Groups 2 and 3. DELF uses both user-side and item-side interaction information, so it generally achieves better performance than SVD++, which only utilizes user-side information.
Comparing Groups 2 and 3, we find that the Group 3 baselines generally outperform Group 2 on the Tmall and Taobao datasets, whereas on CCMR, SASRec and Caser outperform the Group 3 baselines on several metrics (e.g., HR@5 and HR@10). As shown in Table 2, Tmall and Taobao have many more items than CCMR (about 1.09 million and 4.16 million versus 0.19 million), which makes ranking items more difficult on these two datasets. It is therefore more important on Tmall and Taobao to take the item-side sequence into account, because it gives the models more information than the user-side sequence alone.
DEEMS performs better than RRN in many cases, which indicates that the two-player game between the dual sequences in DEEMS is effective and shows the potential of a finer design for dual sequence modeling.
Influence of the size of the (co-)interaction set. We vary the size of the (co-)interaction set to further investigate the robustness of SCoRe. For simplicity, the interaction and co-interaction sets on both sides are set to the same size. The results on the Tmall and Taobao datasets are shown in Figure 5. As the size increases, performance first improves, because larger sets contain more information. As the size continues to increase, performance begins to drop, which indicates that too much noise and irrelevant information is introduced.
In this section, we conduct ablation studies to investigate the effectiveness of three important components of SCoRe: (1) the Interactive Attention Mechanism in dual sequence modeling; (2) the Co-Attention Network for cross-neighbor relation mining and aggregation; (3) the use of both user-side and item-side sequences.
Dataset | Metric | RIA | RCA | User | Item | SCoRe |
CCMR | HR@10 | 0.3667 | 0.3461 | 0.3491 | 0.3218 | 0.3688 |
CCMR | NDCG@10 | 0.2098 | 0.2077 | 0.2012 | 0.1892 | 0.2167 |
CCMR | MRR | 0.1872 | 0.1795 | 0.1782 | 0.1567 | 0.1954 |
Taobao | HR@10 | 0.7888 | 0.7702 | 0.7678 | 0.7453 | 0.8068 |
Taobao | NDCG@10 | 0.5436 | 0.5286 | 0.5192 | 0.5001 | 0.5745 |
Taobao | MRR | 0.4785 | 0.4669 | 0.4652 | 0.4495 | 0.5121 |
Tmall | HR@10 | 0.8467 | 0.8406 | 0.8355 | 0.8122 | 0.8479 |
Tmall | NDCG@10 | 0.6030 | 0.5903 | 0.5843 | 0.5671 | 0.6051 |
Tmall | MRR | 0.5352 | 0.5289 | 0.5192 | 0.5085 | 0.5363 |
We set up four comparative settings, whose performance is shown in Table 4. The details of the settings are listed below.
RCA (Remove Co-Attention) removes the co-attention network and simply uses sum pooling to aggregate the neighbor information in the interaction and co-interaction sets.
User only uses the user-side sequence representation for the final prediction.
Item only uses the item-side sequence representation for the final prediction.
Except for the changes mentioned above, the other parts of the models and experimental settings remain identical to ensure the fairness of comparison.
From Table 4 we find that (1) SCoRe performs the best, indicating the efficacy of each component of the model; (2) performance decreases more when removing co-attention than when removing interactive attention, which means that cross-neighbor relation modeling is more important and fundamental to SCoRe's performance; (3) using a single sequence hurts performance substantially, and the item side is harder to model than the user side, resulting in worse performance.
In this section, we further investigate what patterns SCoRe captures by studying and visualizing a specific case sampled from the CCMR dataset. In Figure 6, we plot the prediction of user-item pair (u36, m1911) where m1911 is the movie American Sniper. The ground truth is , we plot the Caser and SCoRe predictions which are and respectively. By looking into the user behavior sequence, we find the reason of SCoRe’s better prediction result.
The user’s recent behaviors are favouring comedy, cartoon or fiction, as illustrated in the upper part of Figure 6 which are not very relevant to the target item American Sniper which is a biographical and action movie. So it is natural that models like Caser tend to give lower prediction score.
However, when we plot SCoRe’s interactive attention values over the recent time slices, we find that the 40th slice receives a higher attention value. We therefore further plot the co-attention matrix of this time slice . We find that m4682 (Interstellar, fiction and adventure) of is highly related to m12508 (007: Casino Royale, action and fiction) of . Movie m12508 is more relevant to the user’s interests, and its representation is aggregated into the item side, so it is reasonable that SCoRe assigns a higher predicted score to the target item, which the target user did in fact interact with in the test data.
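The two-step inspection used in this case study (find the most-attended time slice, then the most strongly related item pair in that slice's co-attention matrix) can be sketched as follows. The dot-product slice scoring, the toy embeddings, and the identifiers `a0`, `b1`, etc. are hypothetical; they are neither SCoRe's attention formulas nor values from the CCMR case.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def slice_attention(user_slices, item_slices):
    # user_slices, item_slices: (T, d) per-time-slice representations.
    # Assumption: each slice is scored by the dot product of its user-side and
    # item-side vectors; SCoRe's actual scoring function may differ.
    scores = np.sum(user_slices * item_slices, axis=1)
    return softmax(scores)

def most_related_pair(affinity, row_ids, col_ids):
    # affinity: (m, n) co-attention matrix for a single time slice.
    # Returns (row id, column id, score) of its largest entry.
    r, c = np.unravel_index(np.argmax(affinity), affinity.shape)
    return row_ids[r], col_ids[c], float(affinity[r, c])

# Toy inputs (not from the paper): slice 2 is where both sides agree most.
user_slices = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
item_slices = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
weights = slice_attention(user_slices, item_slices)
top_slice = int(np.argmax(weights))  # index of the most attended slice
affinity = np.array([[0.1, 0.9], [0.3, 0.2]])
print(top_slice, most_related_pair(affinity, ["a0", "a1"], ["b0", "b1"]))
# prints: 2 ('a0', 'b1', 0.9)
```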
In the recommender systems literature, the most widely used method is collaborative filtering (CF) (Goldberg et al., 1992), which learns from historical user-item interactions without exogenous information about items or users. It makes recommendations according to the modeled user preferences over items, e.g., clicks (Qu et al., 2016; Agarwal et al., 2009) and ratings (Koren, 2009). Many works (Koren et al., 2009; Salakhutdinov and Mnih, 2007; Yang et al., 2012; Zhang et al., 2013) build on collaborative filtering. Among them, latent factor models play a key role, ranging from pLSA (Hofmann, 2004) and Latent Dirichlet Allocation (Blei et al., 2003) to SVD-based models (Koren, 2008; Chen et al., 2012) and factorization machines (Rendle, 2010). Nowadays, deep neural networks (DNNs) have attracted much attention in recommender systems because of their effective feature extraction and end-to-end training with satisfactory generalization (Zhang et al., 2017). Several DNN methods (He et al., 2017, 2018; Qu et al., 2016) have been proposed for latent factor collaborative filtering. However, almost all of these approaches, whether conventional matrix factorization methods or deep models, lack temporal pattern mining.
Recently, sequential recommendation has drawn much attention, since sequences of user behaviors carry rich information about user interests, especially concept drift (Widmer and Kubat, 1996), long-term behavior dependency (Koren, 2009; Ren et al., 2019), and periodic patterns (Ren et al., 2018a).
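For readers less familiar with the latent factor models discussed above, here is a minimal matrix factorization sketch in the spirit of Koren et al. (2009): each user and item receives a k-dimensional vector, and stochastic gradient descent fits their dot products to observed ratings with L2 regularization. It is a generic illustration, not part of SCoRe; the function name `train_mf`, the hyperparameters, and the toy ratings are assumptions.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.05, reg=0.01, epochs=500, seed=0):
    # ratings: list of (user_index, item_index, rating) triples.
    rng = np.random.default_rng(seed)
    p = 0.1 * rng.standard_normal((n_users, k))  # user latent factors
    q = 0.1 * rng.standard_normal((n_items, k))  # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - p[u] @ q[i]   # prediction error for this rating
            pu = p[u].copy()        # keep the old user vector for q's update
            p[u] += lr * (err * q[i] - reg * p[u])
            q[i] += lr * (err * pu - reg * q[i])
    return p, q

# Toy ratings (hypothetical): 3 users x 3 items, 6 observed entries.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
p, q = train_mf(ratings, n_users=3, n_items=3)
print(np.round(p @ q.T, 2))  # predicted scores, including unobserved pairs
```

Because the ratings are treated as an unordered set, a model like this cannot capture temporal patterns, which is the limitation that motivates sequential recommendation.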
Sequential recommendation methods fall into three categories. The first takes the view of temporal collaborative filtering (Koren, 2009) and accounts for drifting user preferences. The second stream is based on the Markov-chain methodology (Rendle et al., 2010; He and McAuley, 2016; He et al., 2016), which implicitly models user state dynamics and derives the resulting behaviors. The third school is based on deep neural networks, such as recurrent neural networks (RNNs) (Hidasi et al., 2016; Hidasi and Karatzoglou, 2018; Wu et al., 2017; Jing and Smola, 2017; Liu et al., 2016; Beutel et al., 2018; Villatel et al., 2018) and convolutional neural networks (CNNs) that treat the behavior history as an image (Tang and Wang, 2018; Kang and McAuley, 2018). However, most of these methods only model the drift of user interests and ignore the sequential patterns of items, which also carry rich information for user-item matching. Models such as those of Wu et al. (2017, 2019a) consider both sequences, but in a relatively independent way, which leaves room for a finer design of dual-sequence modeling. Furthermore, most of these sequential models consider only the user’s own interaction history and ignore the information available from similar users or items; as a result, their recommendations may be narrow.
In this paper, we propose SCoRe, a model that utilizes and aggregates high-order collaborative information through cross-neighbor modeling to improve representation learning and collaborative relation mining. Furthermore, we propose an interactive attention mechanism to model the user-side and item-side sequences. In this way, dual-sequence modeling captures temporal dynamics on both the user side and the item side and significantly improves final recommendation performance.
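The dual-sequence input that SCoRe relies on, a user-side sequence of items and an item-side sequence of users, both segmented into time slices, could be built from an interaction log roughly as follows. Fixed-length slices and the function name `build_dual_sequences` are assumptions for illustration; the authors list the choice of segmentation strategy as future work.

```python
def build_dual_sequences(logs, user, item, t_now, slice_len, n_slices):
    # logs: iterable of (user, item, timestamp) interactions.
    # Returns the user-side and item-side sequences over the n_slices
    # fixed-length slices immediately before t_now; slice 0 is the oldest.
    user_seq = [[] for _ in range(n_slices)]
    item_seq = [[] for _ in range(n_slices)]
    start = t_now - slice_len * n_slices
    for u, i, ts in logs:
        if not (start <= ts < t_now):
            continue  # outside the history window (too old, or not yet observed)
        k = int((ts - start) // slice_len)  # slice this event falls into
        if u == user:
            user_seq[k].append(i)  # items the target user interacted with
        if i == item:
            item_seq[k].append(u)  # users who interacted with the target item
    return user_seq, item_seq

# Toy log (hypothetical ids): predict whether u1 will interact with m9 at t = 8.
logs = [("u1", "m1", 1), ("u2", "m9", 2), ("u1", "m2", 5), ("u3", "m9", 6), ("u1", "m9", 9)]
us, its = build_dual_sequences(logs, "u1", "m9", t_now=8, slice_len=2, n_slices=3)
print(us, its)  # prints: [[], ['m2'], []] [['u2'], [], ['u3']]
```

Only events strictly before `t_now` are used, so the interaction being predicted never leaks into its own input sequences.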
For future work, we plan to further investigate the time segmentation strategy for evolving sequential interactions and its influence on recommendation performance. We also seek to deploy our method in real-world recommender systems.
Acknowledgement. The corresponding author Weinan Zhang thanks the support of National Natural Science Foundation of China (61702327, 61772333, 61632017) and Shanghai Sailing Program (17YF1428200).
Journal of Machine Learning Research 3, Jan (2003), 993–1022.