Conventional recommendation methods often assume that user intents are static; they ignore the dynamic and evolving characteristics of user behavior. Sequential Recommendations (SRs) have been introduced to address this issue; they aim to predict the next item(s) by modeling the sequence of a user’s previous behavior.
Early studies into SRs are mostly based on Markov Chains (MCs). Because the state space of MCs quickly becomes unmanageable, neural models based on Recurrent Neural Networks (RNNs) or Transformers have attracted considerable attention. A number of studies have investigated factors that might influence SR performance, e.g., personalization, repeat consumption, and context. However, these methods focus only on improving recommendation accuracy, which risks over-specialization, i.e., the recommended items are highly homogeneous.
This is problematic because users usually have multiple intents. For example, as shown in Fig. 1, although the user’s watching history shows most interest in cartoon movies, s/he also watches family and action movies occasionally. A better recommendation strategy should provide a diverse recommendation list that satisfies all these intents. In the case of Fig. 1, we should recommend a list containing action, cartoon and family movies instead of only cartoons. Besides, user intents are sometimes exploratory, which means that users do not have a specific intent in mind. Homogeneous recommendation lists cannot satisfy such users, who easily get bored with low-diversity lists.
Diversification has been well studied in conventional recommendation as well as in Web search. Current approaches to diversified recommendation mainly focus on re-ranking the items scored by a general recommendation model according to a certain diversity metric. However, they are not well suited to SRs for two reasons. First, some of them assume that user intents are static and require user intents to be prepared beforehand, which is unrealistic in most SR scenarios [4, 6]. Second, most of them follow the post-processing paradigm and achieve recommendation accuracy and diversity in two separate steps, i.e., 1) scoring items and generating a candidate item set with a recommendation model; and 2) selecting a diverse recommendation list based on both the item scores and some implicit/explicit diversity metrics [26, 15]. Because the recommendation models are not aware of diversity during learning, and because it is hard to design an ideal diversity strategy for each recommendation model, these methods generally yield suboptimal results.
To address the above issues, we take both recommendation accuracy and diversity into account and propose an end-to-end neural model for SRs, namely Intent-aware Diversified Sequential Recommendation (IDSR). IDSR employs an Implicit Intent Mining (IIM) module to automatically capture multiple latent user intents reflected in user behavior sequences, and directly generates accurate and diverse recommendation lists w.r.t. the latent user intents. To supervise the learning of the IIM module and force the model to take recommendation diversity into consideration during training, we design an Intent-aware Diversity Promoting (IDP) loss, which evaluates recommendation accuracy and diversity based on the whole generated recommendation list. Specifically, a sequence encoder is first used to encode the user behaviors into representations. Then, the IIM module employs multiple attentions to mine user intents, with each attention capturing a particular latent user intent. Finally, an intent-aware recommendation decoder generates a recommendation list by selecting one item at a time. In particular, when selecting the next item, IDSR also takes the already selected items as input so that it can track the degree to which each latent user intent has been satisfied. During training, the IDP loss instructs IDSR to learn to mine and track user intents. All parameters are learned end-to-end with back-propagation within a unified framework. We conduct extensive experiments on two publicly available benchmark datasets. The results show that IDSR outperforms state-of-the-art baselines in terms of both the accuracy metrics, i.e., Recall and MRR, and the diversity metric, i.e., ILD.
Our contributions can be summarized as follows:
- We propose an Intent-aware Diversified Sequential Recommendation (IDSR) method, which is, to the best of our knowledge, the first end-to-end neural framework that considers diversification for SRs.
- We devise an Implicit Intent Mining (IIM) module to automatically mine latent user intents from user behaviors and an intent-aware recommendation decoder to generate diverse recommendation lists.
- We present an IDP loss to better supervise IDSR in terms of recommendation accuracy and diversity.
- We carry out extensive experiments and analyses on two benchmark datasets to verify the effectiveness of IDSR.
2 Related Work
2.1 Sequential recommendation
Traditional methods are mainly based on MCs, which investigate how to extract sequential patterns to learn users’ next preferences with probabilistic decision-tree models. Following this idea, Fusing2016 fuse similarity models with MCs for SRs to solve sparse recommendation problems. However, MC-based methods only model local sequential patterns between adjacent interactions and cannot take the whole sequence into consideration.
Recently, RNNs have been devised to model variable-length sequential data. GRU2016 introduce an RNN-based model for SRs that consists of Gated Recurrent Units and uses a session-parallel mini-batch training process. HRNN2017 extend this idea and develop a hierarchical RNN structure that takes users’ profiles into account by considering cross-session information. Attention mechanisms have been applied to recommendation tasks to help models exploit users’ preferences better. NARM2017 propose a neural attentive session-based recommendation machine that takes the last hidden state from the session-based RNN as the sequential behavior, and uses the other hidden states of previous clicks for computing attention to capture users’ current preferences in a given session. Xu:2019:RCN propose a Recurrent Convolutional Neural Network model to capture both long-term and short-term dependencies for sequential recommendation. Recently, a purely attention-based sequence-to-sequence model, Transformer, has achieved competitive performance on machine translation tasks. self-attentive18 introduce Transformer into SRs by presenting a two-layer Transformer model to capture users’ sequential behaviors. BERT2019 introduce Bidirectional Encoder Representations from Transformers for sequential recommendation.
Although there are a number of studies for SRs, they only focus on accuracy of the recommendation list. None of the aforementioned studies has considered users’ multiple intents and the diversification for SRs.
2.2 Diversified recommendation
Promoting the diversity of recommendation or search results has received increasing research attention. The most representative implicit approach is Maximal Marginal Relevance (MMR). MMR represents relevance and diversity by independent metrics and uses the notion of marginal relevance to combine the two metrics with a trade-off parameter. Sha:2016:FRR introduce a submodular objective function to combine relevance, coverage of the user’s interests, and the diversity between items. Learning To Rank (LTR) has also been exploited to address diversification. Cheng:2017:LRA first label each user with a set of diverse as well as relevant items using a heuristic method and then propose a diversified collaborative filtering algorithm that learns to optimize both accuracy and diversity. The main problem is that these methods all need diversified ranking lists as ground truth for learning, which are usually unavailable in recommendation. Recently, Chen:2018:FGM propose to improve recommendation diversity through a Determinantal Point Process (DPP) with a greedy maximum a posteriori inference algorithm.
All of the above methods achieve recommendation accuracy and diversity in two separate processes, i.e., training an offline recommendation model to score items in terms of accuracy and then re-ranking items by taking diversity into account. In addition, they are not suitable for SRs, where users’ sequential behaviors need to be considered.
3 Intent-aware Diversified Sequential Recommendation
Given a user $u$ and her/his behavior sequence $S_u = \{v_1, v_2, \ldots, v_T\}$ ($v_i$ is an item that $u$ interacts with, e.g., a watched movie), our goal is to provide $u$ with a recommendation list $R_L$ for predicting her/his next interaction, in which the items are expected to be both relevant and diverse.
Different from existing SR methods, we assume there are $M$ latent intents behind each behavior sequence, i.e., $A = \{a_1, \ldots, a_M\}$. Then, we seek to generate a recommendation list $R_L$ by maximizing the degree to which it satisfies all intents:
$$P(R_L \mid u, S_u) = \sum_{m=1}^{M} P(a_m \mid u)\, P(R_L \mid a_m), \qquad (1)$$
where $P(a_m \mid u)$ denotes the importance of the intent $a_m$ to user $u$, and $P(R_L \mid a_m)$ is the probability that $R_L$ satisfies $a_m$.
It is hard to directly optimize Eq. 1 due to the huge search space. Therefore, we propose to generate $R_L$ greedily, i.e., selecting one item at a time with the maximum score $S(v)$:
$$v_t \leftarrow \arg\max_{v \in V \setminus R_{t-1}} S(v), \qquad (2)$$
where $v_t$ is the item to be selected at step $t$; $V$ is the set of all items; $R_{t-1}$ is the recommendation list generated until step $t-1$; $V \setminus R_{t-1}$ guarantees that the item selected at step $t$ is different from the previously generated recommendations in $R_{t-1}$. $S(v)$ returns the score of item $v$ by
$$S(v) = \lambda\, P(v \mid u) + (1 - \lambda) \sum_{m=1}^{M} P(a_m \mid u)\, P(v \mid a_m)\, W(R_{t-1}, a_m). \qquad (3)$$
Generally, it is a combination of the relevance score and the diversification score, balanced by a hyper-parameter $\lambda$. $P(v \mid u)$ is the relevance score reflecting the interest of $u$ in $v$. $P(a_m \mid u)$ is the probability of $u$ having the intent $a_m$. $P(v \mid a_m)$ is the degree to which $v$ satisfies $a_m$. $W(R_{t-1}, a_m)$ denotes the likelihood that the already generated recommendation list $R_{t-1}$ cannot satisfy $a_m$.
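The greedy, intent-aware selection described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `greedy_select`, the dictionary-based inputs, and the choice to model the "not yet satisfied" likelihood of an intent as a running product of (1 - item-intent satisfaction) over the items selected so far are assumptions made for this example.

```python
from typing import Dict, List


def greedy_select(
    rel: Dict[str, float],                # assumed relevance of each item to the user
    intent_weight: List[float],           # assumed importance of each latent intent
    item_intent: Dict[str, List[float]],  # assumed degree to which each item satisfies each intent
    lam: float,                           # trade-off between relevance and diversification
    k: int,
) -> List[str]:
    """Greedily build a list of k items, selecting one item per step."""
    selected: List[str] = []
    # unsatisfied[m]: likelihood that the list so far does NOT satisfy intent m.
    unsatisfied = [1.0] * len(intent_weight)
    while len(selected) < min(k, len(rel)):
        best_item, best_score = None, float("-inf")
        for v in rel:
            if v in selected:  # an item can be recommended only once
                continue
            div = sum(
                w * item_intent[v][m] * unsatisfied[m]
                for m, w in enumerate(intent_weight)
            )
            score = lam * rel[v] + (1.0 - lam) * div
            if score > best_score:
                best_item, best_score = v, score
        selected.append(best_item)
        # Intents covered by the new item become less "unsatisfied".
        for m in range(len(unsatisfied)):
            unsatisfied[m] *= 1.0 - item_intent[best_item][m]
    return selected


# Toy data: two cartoon items and one action item, two latent intents.
rel = {"toon_a": 0.9, "toon_b": 0.8, "action_a": 0.5}
item_intent = {
    "toon_a": [1.0, 0.0],
    "toon_b": [1.0, 0.0],
    "action_a": [0.0, 1.0],
}
intent_weight = [0.7, 0.3]
diverse = greedy_select(rel, intent_weight, item_intent, lam=0.2, k=2)
accurate = greedy_select(rel, intent_weight, item_intent, lam=1.0, k=2)
```

With a small trade-off value the second pick switches to the action item because the cartoon intent is already covered; with the trade-off set to 1 the list is ranked by relevance alone.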
We propose an end-to-end IDSR model to directly generate a diversified recommendation list upon Eq. 3. As shown in Fig. 2, IDSR consists of three modules: a Sequence encoder, an Implicit Intent Mining (IIM) module and an Intent-aware Diversity Promoting (IDP) decoder. The sequence encoder encodes users’ sequential behaviors into latent representations. Then, the IIM module is used to capture users’ multiple latent intents reflected in the sequential behaviors. Finally, the IDP decoder is employed to generate a recommendation list w.r.t. Eq. 3. We devise an IDP loss to train IDSR which evaluates the whole recommendation list in terms of both recommendation accuracy and diversity. Note that there is no re-ranking involved in IDSR and both recommendation accuracy and diversity are jointly learned in an end-to-end way. Next, we introduce the separate modules.
3.2 Sequence encoder
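Two commonly used techniques for sequence modeling are GRU and Transformer. We use both in our experiments (see §5) and find that IDSR performs better with a GRU. Thus here we use a GRU to encode $S_u$:
$$z_t = \sigma(W_z [e_t, h_{t-1}]), \quad r_t = \sigma(W_r [e_t, h_{t-1}]), \quad \hat{h}_t = \tanh(W_h [e_t, r_t \odot h_{t-1}]), \quad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t,$$
where $e_t$ denotes the embedding of item $v_t$; $W_z$, $W_r$ and $W_h$ are weight parameters; $\sigma$ denotes the sigmoid function. The input of the encoder is the behavior sequence $S_u = \{v_1, v_2, \ldots, v_T\}$ and the outputs are hidden representations $\{h_1, h_2, \ldots, h_T\}$, where $h_t \in \mathbb{R}^{d}$. We stack those representations into a matrix $H_S \in \mathbb{R}^{T \times d}$. Generally, we consider the last representation $h_T$ as a summary of the whole sequence. Thus we set the global preference $F_u = h_T$.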
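As an illustration of the encoder, the following pure-Python sketch runs a GRU over a sequence of item embeddings and keeps every hidden state, so that the last one can serve as the global preference. The helper names (`matvec`, `gru_step`, `encode`), the omission of bias terms, and the particular gate convention are assumptions of this example rather than details confirmed by the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(e: Vector, h_prev: Vector, w_z: Matrix, w_r: Matrix, w_h: Matrix) -> Vector:
    """One GRU update; each weight matrix acts on the concatenation [e, h]."""
    x = e + h_prev  # list concatenation
    z = [sigmoid(a) for a in matvec(w_z, x)]  # update gate
    r = [sigmoid(a) for a in matvec(w_r, x)]  # reset gate
    gated = e + [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in matvec(w_h, gated)]  # candidate state
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def encode(embeddings: List[Vector], w_z: Matrix, w_r: Matrix, w_h: Matrix,
           hidden_size: int) -> List[Vector]:
    """Return the hidden state after every item of the sequence."""
    h = [0.0] * hidden_size
    states = []
    for e in embeddings:
        h = gru_step(e, h, w_z, w_r, w_h)
        states.append(h)
    return states


# Toy example: embedding size 2, hidden size 2, hand-picked weights.
zeros = [[0.0] * 4 for _ in range(2)]
w_h = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
states = encode([[1.0, 0.0], [0.0, 1.0]], zeros, zeros, w_h, hidden_size=2)
global_preference = states[-1]  # last hidden state summarizes the sequence
```

In practice a library implementation with learned weights would replace this hand-written cell; the sketch only shows which quantities the later modules consume.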
3.3 IIM module
The IIM module evaluates the intent-importance term in Eq. 3. Intuitively, a user’s multiple intents can be reflected by different interactions in the sequential behaviors. Some interactions are more representative of a particular intent than others, e.g., the last two behaviors in Fig. 1 reflect the intent of watching cartoon movies. Motivated by this, we use a multi-intent attention mechanism, with each attention capturing one particular intent. Specifically, IIM first projects the stacked hidden representations and the global preference into separate spaces, one for each latent intent. Then, attention functions are employed in parallel to produce the user’s intent-specific representations.
The three projection matrices for each intent are learnable parameters. We use scaled dot-product attention in this work. After that, the importance of each intent, i.e., the intent-importance term in Eq. 3, can be calculated.
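The multi-intent attention idea can be sketched in pure Python as below. This is a minimal illustration under stated assumptions: the global preference is used as the attention query, the stacked hidden states supply keys and values, and each intent owns one hypothetical triple of projection matrices `(w_q, w_k, w_v)`. How the intent importance is derived from the resulting representations is not shown, because that formula is not recoverable from the text.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def project(x: Vector, w: Matrix) -> Vector:
    """Row vector times matrix: x (d_in) @ w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intent_attention(query: Vector, states: Matrix,
                     w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention for ONE intent; returns (representation, weights)."""
    q = project(query, w_q)
    keys = [project(h, w_k) for h in states]
    values = [project(h, w_v) for h in states]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    rep = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return rep, weights


def mine_intents(query: Vector, states: Matrix,
                 heads: List[Tuple[Matrix, Matrix, Matrix]]) -> List[Vector]:
    """One intent-specific representation per (w_q, w_k, w_v) triple."""
    return [intent_attention(query, states, *head)[0] for head in heads]


# Toy example: two behaviors, two intents with different query projections.
states = [[1.0, 0.0], [0.0, 1.0]]
query = states[-1]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
reps = mine_intents(query, states, [(identity, identity, identity),
                                    (swap, identity, identity)])
```

Because the two heads project the query differently, the first representation attends mostly to the last behavior and the second mostly to the first one, which is the sense in which each attention can pick up a different intent.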
3.4 IDP decoder
The IDP decoder generates the recommendation list based on the intents mined by the IIM module. To begin with, we model the relevance score of an item to the user (i.e., the relevance term in Eq. 3) as follows:
Similarly, we model the relevance of an item to each intent (i.e., the item-intent term in Eq. 3) as follows:
To track the items selected so far, we use another GRU to encode the partial recommendation list into hidden representations. Then we estimate how well the partial list satisfies each intent (i.e., the list-intent term in Eq. 3) by calculating the matching between the encoded list and each intent-specific representation as follows:
Finally, we can calculate the score of each item (Eq. 3), select the item with the highest score, and append it to the recommendation list.
3.5 IDP loss
Since our goal is to generate a recommendation list that is both relevant and diverse, we design the IDP loss function to evaluate the whole generated list:
Here, the trade-off parameter is the same as the one in Eq. 3, as both control the contributions of accuracy and diversification. When it increases, both the IDP decoder and the IDP loss put more weight on accuracy. Given the output recommendation list from IDSR and the ground-truth item (i.e., the next consumed item), the accuracy term is defined as follows:
where is an indicator function that equals 1 if and 0 otherwise. denotes the probability of the ground-truth item at -th step when generating the recommendation list . encourages the positive recommendation list which contains the ground-truth item and punishes the negative ones otherwise. Note that also takes the ranking position of the ground-truth item into consideration by weighting the loss with the position .
To promote diversity, inspired by result diversification in Web search, we define the diversity term as the probability of each intent having at least one relevant item in the generated list.
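The notion of "each intent having at least one relevant item" can be made concrete with a small sketch. The exact loss used by IDSR is not recoverable from the text, so the form below is an assumption borrowed from intent-aware coverage in search result diversification: for each intent, one minus the probability that no listed item satisfies it, weighted by the intent importance. The function names and the negative-log form of the loss are likewise hypothetical.

```python
import math
from typing import List


def intent_coverage(intent_weight: List[float], list_probs: List[List[float]]) -> float:
    """list_probs[i][m]: assumed probability that the i-th listed item satisfies intent m."""
    covered = 0.0
    for m, w in enumerate(intent_weight):
        miss = 1.0
        for probs in list_probs:
            miss *= 1.0 - probs[m]   # no listed item satisfies intent m
        covered += w * (1.0 - miss)  # at least one listed item does
    return covered


def diversity_loss(intent_weight: List[float], list_probs: List[List[float]],
                   eps: float = 1e-12) -> float:
    # Assumed form: negative log of the coverage probability.
    return -math.log(intent_coverage(intent_weight, list_probs) + eps)


intent_weight = [0.7, 0.3]
homogeneous = [[0.9, 0.0], [0.9, 0.0]]  # both items serve the first intent only
mixed = [[0.9, 0.0], [0.0, 0.9]]        # one item per intent
cov_homogeneous = intent_coverage(intent_weight, homogeneous)
cov_mixed = intent_coverage(intent_weight, mixed)
```

Under this assumed form, the list that covers both intents has the higher coverage and therefore the lower loss, which is the behavior a diversity-promoting term is meant to reward.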
All parameters of IDSR as well as the item embeddings can be learned in an end-to-end back-propagation training paradigm.
4 Experimental Setup
We seek to answer the following research questions:
RQ1: What is the performance of IDSR compared with state-of-the-art baselines in terms of accuracy?
RQ2: Does IDSR outperform state-of-the-art baselines in terms of diversity?
RQ3: What is the impact of different sequence encoders (i.e., GRU, Transformer) on IDSR?
RQ4: How does the trade-off parameter affect the performance of IDSR?
RQ5: How does IDSR influence the recommendation results?
|Statistic||ML100K||ML1M|
|Number of users||943||6,040|
|Number of items||1,682||3,706|
|Number of interactions||100,000||1,000,209|
|Number of item genres||19||18|
|Avg. number of genres per item||1.7||1.6|
To answer our research questions, we use two publicly available datasets for the experiments. Table 1 lists the statistics of the two datasets.
We do not use the datasets used in prior work because they contain only item IDs, so we could conduct neither a diversity evaluation nor a case study. We preprocess ML100K and ML1M for SR experiments with the following steps. First, we filter out users who have fewer than 5 interactions and movies that are rated fewer than 5 times. Then, we sort the rated movies according to the “timestamp” field to get a sequence of behaviors for each user. Finally, we prepare each data sample by regarding the preceding 9 behaviors as input and the next behavior as output. For evaluation, we randomly divide the datasets into training (70%), validation (10%) and test (20%) sets. We make sure that all movies in the test set have been rated by at least one user in the training set and that the test set contains the most recent behaviors, which happened later than those in the training and validation sets.
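The preprocessing steps can be sketched as follows. This is an illustration, not the authors' script: the tuple layout `(user_id, item_id, timestamp)`, the function name `build_samples`, the single-pass filtering (counts are computed once rather than re-checked after filtering), and the sliding-window construction of samples are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, int]          # (user_id, item_id, timestamp), assumed layout
Sample = Tuple[str, List[str], str]    # (user_id, input items, target item)


def build_samples(ratings: List[Rating], min_count: int = 5, window: int = 9) -> List[Sample]:
    """Filter rare users/items, sort by time, and cut (window -> next item) samples."""
    user_count = Counter(u for u, _, _ in ratings)
    item_count = Counter(i for _, i, _ in ratings)
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for u, i, ts in ratings:
        if user_count[u] >= min_count and item_count[i] >= min_count:
            by_user[u].append((ts, i))
    samples: List[Sample] = []
    for u, events in by_user.items():
        seq = [i for _, i in sorted(events)]  # chronological order
        for end in range(window, len(seq)):
            samples.append((u, seq[end - window:end], seq[end]))
    return samples


# Toy example with a small window so the result is easy to inspect.
toy = [("u1", "a", 4), ("u1", "b", 1), ("u1", "c", 3), ("u1", "d", 2)]
toy_samples = build_samples(toy, min_count=1, window=2)
```

With the defaults (`min_count=5`, `window=9`) the function mirrors the thresholds stated above; the time-aware split into training, validation and test sets is left out of the sketch.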
4.2 Comparison methods
We select several traditional recommendation methods as well as recent state-of-the-art neural SR methods as baselines.
- POP: POP ranks items based on the number of interactions; it is a non-personalized approach.
- FPMC: FPMC is a hybrid model that combines MCs and collaborative filtering for SRs.
- GRU4Rec: GRU4Rec is an RNN-based model for SRs. It uses session-parallel mini-batches as well as a ranking-based loss function in the training process.
- HRNN: HRNN is a hierarchical RNN for SRs based on GRU4Rec. It adopts a session-level RNN and a user-level RNN to model users’ short-term and long-term preferences.
Because there is no previous work specific to diversified SRs, we construct three baselines ourselves: FPMC+MMR, GRU4Rec+MMR and HRNN+MMR. Specifically, we first get the relevance score of each item with FPMC, GRU4Rec or HRNN. Then, we re-rank the items using the MMR criterion
where is a candidate item set and is a trade-off parameter to balance the relevance and the minimal dissimilarity between item and item . MMR first initializes and then iteratively selects the item into , until . When =0, MMR returns diversified recommendations without considering relevance; when =1, it returns the same results as the original baseline models. We cannot choose the best relying solely on accuracy or diversity. To balance both, we set in our experiments.
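The MMR re-ranking used for these baselines can be sketched as below. The code follows the description in the text (a trade-off between the relevance score and the minimal dissimilarity to the items already selected); the function name `mmr_rerank`, the Jaccard distance on genre sets used as the dissimilarity, and the alphabetical tie-breaking are assumptions of this example.

```python
from typing import Callable, Dict, List, Set


def mmr_rerank(scores: Dict[str, float], dissim: Callable[[str, str], float],
               lam: float, k: int) -> List[str]:
    """Iteratively pick the candidate with the best relevance/novelty trade-off."""
    candidates = set(scores)
    selected: List[str] = []
    while candidates and len(selected) < k:
        def mmr(v: str) -> float:
            # Minimal dissimilarity to the items already selected (0 for the first pick).
            novelty = min((dissim(v, s) for s in selected), default=0.0)
            return lam * scores[v] + (1.0 - lam) * novelty
        best = max(sorted(candidates), key=mmr)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: two Drama movies and one Comedy, scored by a baseline model.
genres: Dict[str, Set[str]] = {"a": {"Drama"}, "b": {"Drama"}, "c": {"Comedy"}}
scores = {"a": 0.9, "b": 0.8, "c": 0.6}


def jaccard_distance(x: str, y: str) -> float:
    return 1.0 - len(genres[x] & genres[y]) / len(genres[x] | genres[y])


balanced = mmr_rerank(scores, jaccard_distance, lam=0.5, k=2)
relevance_only = mmr_rerank(scores, jaccard_distance, lam=1.0, k=2)
```

Setting the trade-off to 1 reproduces the baseline ranking, while smaller values let a dissimilar item overtake a more relevant but redundant one.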
In addition, we consider two variants of our IDSR model:
- IDSR (Transformer variant): We use a Transformer to encode users’ behavior sequences in IDSR.
- IDSR (GRU variant): We use a GRU to encode users’ behavior sequences in IDSR.
4.3 Evaluation metrics
- Recall: A primary metric that evaluates the recall of the recommender system, i.e., whether the test item is contained in the recommendation list.
- MRR: A metric that measures the ranking accuracy of the recommender system, i.e., whether the test item is ranked near the top of the list.
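For a single test sample, the accuracy metrics above and the ILD diversity metric reported in the experiments can be computed as in the sketch below. ILD is taken here to be the average pairwise distance between the items in the list; using the Euclidean distance between genre vectors is an assumption of this example, as are the function names.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def recall_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1 if the test item appears in the top-k list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0


def mrr_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Reciprocal rank of the test item within the top-k list (0 if absent)."""
    top = list(ranked[:k])
    return 1.0 / (top.index(target) + 1) if target in top else 0.0


def ild_at_k(ranked: Sequence[str], genre_vec: Dict[str, List[float]], k: int) -> float:
    """Average pairwise distance between the top-k items (assumed Euclidean)."""
    pairs = list(combinations(ranked[:k], 2))
    if not pairs:
        return 0.0
    return sum(math.dist(genre_vec[a], genre_vec[b]) for a, b in pairs) / len(pairs)


ranked = ["a", "b", "c"]
genre_vec = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
recall = recall_at_k(ranked, "b", k=2)
mrr = mrr_at_k(ranked, "b", k=2)
ild = ild_at_k(ranked, genre_vec, k=3)
```

Reported scores would be averages of these per-sample values over the whole test set.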
4.4 Implementation details
We set the item embedding size and GRU hidden state sizes to 100. We use dropout with drop ratio . We initialize model parameters randomly using the Xavier method . We optimize the model using Adam  with the initial learning rate , two momentum parameters and , and
. The mini-batch size is set to 128. We evaluate the model on the validation set after every epoch. The code used to run the experiments is available online (https://url.suppressed.for.anonymity).
5 Results and Analysis
5.1 Performance in terms of accuracy
To answer RQ1, we examine the performance of IDSR and the baseline models in terms of Recall and MRR; see Table 2.
First, we can see that IDSR outperforms the traditional methods, i.e., POP and FPMC, as well as the neural methods, i.e., GRU4Rec and HRNN, in terms of Recall and MRR. When the size of the recommendation list changes from 10 to 20, the improvements of IDSR over the best baseline, HRNN, increase. In detail, on ML100K the improvements are 4.16% and 6.37% in terms of Recall@10 and Recall@20, and 5.52% and 6.72% in terms of MRR@10 and MRR@20. Besides, the improvements of IDSR over HRNN in terms of MRR are larger than those in terms of Recall. For instance, on the ML1M dataset, the improvement of IDSR over HRNN is 5.32% in terms of MRR@20 but 3.22% in terms of Recall@20. This shows that our model not only retrieves more relevant items but also ranks them higher. This may be because our IIM module can capture a user’s multiple intents and their importance in the user’s current preferences. Thus, at the first steps of the IDP decoder, IDSR can assign high probabilities to relevant items.
Second, we note that after re-ranking with MMR, the performance of the baseline models drops slightly in terms of both Recall and MRR. This indicates that although post-processing with MMR can improve the diversity of the recommendation list, it might hurt accuracy. Because most of the candidate items generated by the baseline models have similar genres, the diversity score of the relevant item may be much lower than that of irrelevant ones. As a result, irrelevant items may receive higher final scores than the relevant item, which worsens performance.
5.2 Performance in terms of diversity
To answer RQ2, we report the ILD scores on both datasets in Table 2. We can see that IDSR consistently outperforms all baselines. The improvements of IDSR over HRNN are 19.90% and 20.29% in terms of ILD@10 and ILD@20 on the ML100K dataset, and 57.76% and 50.85% on the ML1M dataset. Compared with the post-processing baselines, our model outperforms the best one, i.e., HRNN+MMR, by 7.39% and 7.97% in terms of ILD@10 and ILD@20 on ML100K, and by 10.83% and 11.81% on ML1M. The improvements of IDSR over the best-performing baseline are significant under a paired t-test. This shows that our model is more effective than post-processing methods at improving diversity for sequential recommendation. A likely reason is that the effectiveness of post-processing methods, e.g., MMR, depends on the underlying baseline models: when the candidate items generated by the sequential recommendation baselines are of similar genres, MMR has limited room to improve diversity.
5.3 Performance with different encoders
To answer RQ3, we test the performance of IDSR with different sequence encoders, i.e., GRU or Transformer. Table 3 shows the results.
IDSR performs better with GRU than with Transformer in terms of all metrics on both datasets. We believe the reason is that Transformers rely on position embeddings to capture sequential information, which is less effective than the recurrence of a GRU, especially when Transformers are not pre-trained on a large-scale dataset. In contrast, the recurrent nature of GRU is designed for sequence modeling, which means it needs less data to capture sequential information. This is supported by the fact that the improvements of the GRU variant over the Transformer variant are smaller on ML1M than on ML100K. For example, the improvements in terms of Recall@20 are 10.94% and 5.69% on ML100K and ML1M, respectively.
5.4 Influence of trade-off parameter
To answer RQ4, we investigate the impact of on IDSR by ranging it from to with a step size of . The results are shown in Fig. 3.
We can see that the accuracy metrics, i.e., Recall@20 and MRR@20, show upward trends generally when increases from to . When , IDSR shows the worst performance. However, a noticeable increase is observed when changes from to . It is because that means we only consider diversity without accuracy, thus the model cannot be trained well to recommend relevant items. IDSR shows its best performance in terms of accuracy metrics with at around 0.5 on ML100K, while around 0.8 on ML1M dataset.
Regarding recommendation diversity, ILD@20 of IDSR increases when the trade-off parameter changes from 0 to 0.1, and decreases as the parameter goes from its best value to 1 on both datasets. The best diversification performance of IDSR appears with a small value of the trade-off parameter, i.e., 0.2 on ML100K and 0.4 on ML1M. When the parameter is set to 1, ILD@20 of IDSR drops significantly because diversification is then not considered in the IDP loss. To conclude, Fig. 3 demonstrates that our IDP loss can boost the performance of IDSR when we take both accuracy and diversification into consideration simultaneously.
5.5 Case study
To answer RQ5, we select an example from the test set of ML100K to illustrate the different recommendation results by IDSR and HRNN in Fig. 4.
Fig. 4 shows 8 movies that the user watched recently and the top 5 recommendations generated by the IDSR and HRNN models, respectively. The ground-truth item is marked with a red box. According to the user’s watching history, we can tell that the user likes Drama the most, but also shows interest in Comedy, Action and Thriller. HRNN recommends four Drama movies, which only serves the main intent of this user. In contrast, IDSR accommodates all intents and diversifies the recommendation list with Drama, Comedy, Action, Adventure and Thriller movies. Notably, IDSR also recognizes the most important intent and ranks a Drama movie at the top. This confirms that IDSR can not only mine users’ implicit intents, but also generate a diversified recommendation list to cover those intents.
6 Conclusion and Future Work
In this paper, we propose the IDSR model to improve diversification for SRs. We devise the IIM module to capture users’ multiple intents and the IDP decoder to generate a diversified recommendation list covering those intents. We also design an IDP loss to supervise the model to consider accuracy and diversification simultaneously during training. Our experimental results and in-depth analyses confirm the effectiveness of IDSR on two datasets.
As to future work, on the one hand, we plan to apply our model to other recommendation scenarios, e.g., shared-account recommendation, where the behaviors can come from multiple users with totally different intents. On the other hand, we hope to improve recommendation accuracy by incorporating useful mechanisms from recent SR models into IDSR.
-  (2016) A survey on search results diversification techniques. Neural Computing and Applications 27 (5), pp. 1207–1229. Cited by: §1.
-  (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering. 17 (6), pp. 734–749. Cited by: 1st item.
-  (2009) Diversifying search results. In WSDM ’09, New York, NY, USA, pp. 5–14. Cited by: §3.5.
-  (2015) Optimal greedy diversity for recommendation. In IJCAI’15, pp. 1742–1748. Cited by: §1, 3rd item.
-  (1998) The use of mmr, diversity-based reranking for reordering documents and producing summaries. In SIGIR ’98, New York, NY, USA, pp. 335–336. Cited by: §2.2.
-  (2018) Fast greedy map inference for determinantal point process to improve recommendation diversity. In NIPS’18, USA, pp. 5627–5638. Cited by: §1.
-  (2017) Learning to recommend accurate and diverse items. In WWW ’17, pp. 183–192. Cited by: §2.2.
-  (2010) Understanding the difficulty of training deep feedforward neural networks. In AI&Statistics ’10, Chia Laguna Resort, Sardinia, Italy, pp. 249–256. Cited by: §4.4.
-  (2018) NAIS: neural attentive item similarity model for recommendation. IEEE Transactions on Knowledge and Data Engineering 30 (12), pp. 2354–2366. Cited by: §2.1.
-  (2016) Session-based recommendations with recurrent neural networks. In ICLR ’16, Cited by: §1.
-  (2016) Session-based recommendations with recurrent neural networks. In ICLR’16, pp. 1–10. Cited by: 3rd item.
-  (2018) Self-attentive sequential recommendation. In ICDM ’18, pp. 197–206. Cited by: §1.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.4.
Determinantal point processes for machine learning. Now Publishers Inc., Hanover, MA, USA. Cited by: §2.2.
-  (2017) Diversity in recommender systems a survey. Know.-Based Syst. 123 (C), pp. 154–162. Cited by: §1.
-  (2017) Neural attentive session-based recommendation. In CIKM ’17, New York, NY, USA, pp. 1419–1428. Cited by: §3.2, §4.3.
-  (2018) STAMP: short-term attention/memory priority model for session-based recommendation. In KDD ’18, New York, NY, USA, pp. 1831–1839. Cited by: §4.3.
-  (2018) Sequence-aware recommender systems. ACM Computing Surveys 51 (4), pp. 66:1–66:36. Cited by: §1.
-  (2017) Personalizing session-based recommendations with hierarchical recurrent neural networks. In RecSys ’17, New York, NY, USA, pp. 130–137. Cited by: §1, 4th item, §4.1.
-  (2019) Context-aware sequential recommendations with stacked recurrent neural networks. In The Web Conference, pp. 3172–3178. Cited by: §1.
-  (2019) RepeatNet: a repeat aware neural recommendation machine for session-based recommendation. In AAAI ’19, Cited by: §1.
-  (2010) Factorizing personalized markov chains for next-basket recommendation. In WWW ’10, New York, NY, USA, pp. 811–820. Cited by: §1, 2nd item.
-  (2016) A framework for recommending relevant and diverse items. In IJCAI’16, pp. 3868–3874. Cited by: §1.
-  (2017) Attention is all you need. CoRR abs/1706.03762. Cited by: §3.3.
-  (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30, pp. 5998–6008. Cited by: §2.1.
-  (2019) Recent advances in diversified recommendation. CoRR abs/1905.06589. External Links: Cited by: §1, §2.2.
-  (2008) Avoiding monotony: improving the diversity of recommendation lists. In RecSys ’08, New York, NY, USA, pp. 123–130. Cited by: §4.3.
-  (2017) Deep learning based recommender system: A survey and new perspectives. arXiv preprint arXiv:1707.07435. Cited by: 2nd item.
-  (2001) Using temporal data for making recommendations. In UAI ’01, San Francisco, CA, USA, pp. 580–588. Cited by: §2.1.