Purchase demand refers to a user’s desire and willingness to pay a price for a specific product. It implies a strong purchase intent and, if well utilized, can be one of the most useful levers for promoting product sales (Ha et al., 2002; Skaer et al., 1993). Users’ purchase demands are time sensitive (Afeche et al., 2015); hence a good recommender system should not only find the right products but also recommend them at the right time to meet users’ demands, so as to maximize the value of each recommendation. So far, most recommendation models, e.g., collaborative filtering (He et al., 2016; Koren, 2008a) and sequence-based models (Wang et al., 2015a; Li et al., 2017; Quadrana et al., 2017), have mainly focused on modeling users’ general interests to find the right products, while meeting users’ demands at the right time has been much less explored. Based on this consideration, we aim to learn time-sensitive purchase demands from users’ purchase histories and to leverage this information to better understand a user’s real-time demand, so as to predict the next item more accurately. To this end, we have to address two challenging issues: (1) how to characterize users’ time-sensitive purchase demands; and (2) how to use such purchase demands in our model to predict users’ next items.
To characterize time-sensitive demands, inspired by studies of marketing strategies and human behavior (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), we identify two aspects of time-sensitive demands, which we term long-time demands and short-time demands. Long-time demands refer to a user purchasing the same product repeatedly, showing a persistent long-term interest (Bhagat et al., 2018); short-time demands refer to the co-purchase of items (Guidotti et al., 2017), e.g., buying paintbrushes after pigments. Both kinds of demands are common and have been widely considered in marketing (Tsai and Chiu, 2004). For example, online companies such as the contact lens retailer Clearly (https://www.clearly.ca/) and Shoppers Drug Mart (https://www1.shoppersdrugmart.ca/en/health-and-pharmacy/patient-contact) predict users’ long-time repeated purchase demands by estimating the service life of the products they have purchased. With the users’ authorization, these companies send emails or notifications reminding users to refill products before they run out. Short-time demands are also exploited as a marketing strategy for promotional activities in online retail shops (Ha et al., 2002); e.g., the system recommends co-purchased products after a user has bought a related product (Guidotti et al., 2017; Yap et al., 2012). Although such long- and short-time demands can be estimated with simple counts or statistics in some scenarios, on most e-commerce websites the service life of a large number of products is hard to estimate, and co-purchased items may also vary with users’ usage and purchasing habits.
To utilize such purchase demands, we propose a novel Long-Short Demands-aware Model (LSDM), which incorporates both users’ interests in items and users’ demands over time.
To model users’ long-short time demands, we use a time scale (see Sec. 2.2 for details) to cluster successive product purchases within a time span together (i.e., short-time demands) and to cope with users’ repeated purchases of items at a certain time frequency (i.e., long-time demands). However, the repeated purchase frequencies of the same or similar products can vary greatly; e.g., a user may purchase milk more frequently than detergent. Such differing purchase demands may not be easily discovered with a single time scale, as used in most session-based models (Hidasi et al., 2015; Li et al., 2017; Jannach and Ludewig, 2017). Moreover, a single time scale may be insufficient to group the various types of co-purchased items. In our model, we therefore design multiple time scales to learn from different types of long-short purchase demands. More specifically, we propose a hierarchical neural architecture with multiple time scales in LSDM (see Figure 1), in which users’ purchase preferences over time are captured by recurrent neural networks, and long-short purchase demands at different time scales are addressed by a joint learning strategy.
We found limited research on demands-aware purchase recommendation, and none of it deals with real-world purchase data from e-commerce websites. Our contributions in this paper are as follows:
We propose a novel Long-Short Demands-aware Model (LSDM), which incorporates both users’ interests in items and users’ demands over time. To the best of our knowledge, end-to-end neural models that automatically incorporate users’ purchase demands into recommendation have been little explored.
We characterize two aspects of users’ purchase demands: long-time demands (i.e., repeated purchase demands) and short-time demands (i.e., successive purchase demands), and incorporate them into a hierarchical architecture with multi-time scales.
Experimental results on three real-world commercial datasets (Ta-Feng, BeiRen and Amazon) demonstrate the effectiveness of our model for next-item recommendation and show the usefulness of modeling users’ long-short purchase demands.
The rest of this paper is organized as follows: we first introduce some preliminary definitions in Section 2. Section 3 then presents our long-short demands-aware model for recommendation, and Section 4 reports the experiments. Related work is summarized in Section 5. Finally, Section 6 concludes the paper.
In this section, we formulate the next-item recommendation problem and explain some key concepts used in this paper (e.g., long-short purchase demands and time scales).
2.1. Problem Statement
Assume we have a set of users and a set of items, denoted by $\mathcal{U}$ and $\mathcal{I}$ respectively. Let $u \in \mathcal{U}$ denote a user and $i \in \mathcal{I}$ an item. The numbers of users and items are denoted as $|\mathcal{U}|$ and $|\mathcal{I}|$ respectively. Given a user $u$, his purchase records form a sequence of items sorted by time, which can be represented as $S^u = \{i^u_{t_1}, i^u_{t_2}, \ldots, i^u_{t_n}\}$, where $i^u_{t_k}$ is the item purchased by user $u$ at time $t_k$. Given the purchase history $S^u$ of user $u$, next-item recommendation aims to predict the next item that $u$ would probably buy at time $t_{n+1}$:

$$p\big(i^u_{t_{n+1}} = i \mid S^u\big) = f\big(S^u, i\big) \tag{1}$$

where $p(i^u_{t_{n+1}} = i \mid S^u)$ is the probability of item $i$ being purchased by $u$ at the next time $t_{n+1}$, and $f$ is the prediction function. The prediction problem can also be formulated as ranking all items for each user: with the ranked list of all items, we recommend the top-$K$ items to the user.
2.2. Long-Short Purchase Demands
Purchase demands refer to a user’s desire and willingness to pay a price for a specific product. Inspired by studies of marketing strategies and human behavior (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), we identify two aspects of time-sensitive demands, which we term long-time demands and short-time demands.
Long-time demands. Long-time demands refer to the fact that a user purchases the same product repetitively, showing a long-time persistent interest (Bhagat et al., 2018).
Short-time demands. Short-time demands refer to users’ co-purchase demands, e.g., buying paintbrushes after pigments.
Time scale. A time scale is a specific unit of time that divides a user’s purchase history into meaningful periods. We formulate long-short demands at different time scales as follows: given a user $u$ and his purchase records $S^u$, we use a time window $\Delta t$ to define the time-scale interval. With all purchases within the same time window grouped together, the purchase records of user $u$ can be split into sub-sequences at time scale $\Delta t$:

$$S^u_{\Delta t} = \{T^u_1, T^u_2, \ldots, T^u_m\} \tag{2}$$

where each $T^u_j$ is the set of items purchased within the same ($j$-th) time window.
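The grouping of one user's purchases into time-window transactions can be sketched as follows. This is a minimal illustration, not the authors' implementation; it assumes windows are aligned to a fixed, hypothetical `origin` timestamp (e.g., midnight of the first day) and that empty windows are skipped, so only non-empty transactions are returned.

```python
from datetime import datetime, timedelta
from typing import Hashable, List, Sequence, Tuple

Purchase = Tuple[datetime, Hashable]  # (purchase time, item id)


def group_by_time_window(purchases: Sequence[Purchase],
                         window: timedelta,
                         origin: datetime) -> List[List[Hashable]]:
    """Split one user's purchases into consecutive transactions.

    A purchase at time t falls into window number (t - origin) // window.
    Consecutive purchases in the same window form one transaction; windows
    without purchases produce no transaction (an assumption of this sketch).
    """
    transactions: List[List[Hashable]] = []
    current_window = None
    for timestamp, item in sorted(purchases, key=lambda p: p[0]):
        window_index = (timestamp - origin) // window  # timedelta // timedelta -> int
        if window_index != current_window:
            transactions.append([])
            current_window = window_index
        transactions[-1].append(item)
    return transactions


# Hypothetical purchase history illustrating the daily and weekly scales.
origin = datetime(2000, 12, 1)
history = [
    (datetime(2000, 12, 1, 9), "pigments"),
    (datetime(2000, 12, 1, 18), "paintbrushes"),
    (datetime(2000, 12, 3, 10), "pigments"),
    (datetime(2000, 12, 9, 11), "laptop"),
]
daily = group_by_time_window(history, timedelta(days=1), origin)
weekly = group_by_time_window(history, timedelta(weeks=1), origin)
print(daily)   # [['pigments', 'paintbrushes'], ['pigments'], ['laptop']]
print(weekly)  # [['pigments', 'paintbrushes', 'pigments'], ['laptop']]
```

Changing only `window` switches between scales, which is what makes several time scales cheap to build from the same raw history.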
Multi-time scales. Multi-time scales are composed of multiple different time scales. As mentioned above, to model users’ long-short time demands, we use a time scale to cluster successive purchases together (i.e., short-time demands) and to cope with users’ repeated purchases of items at different time frequencies (i.e., long-time demands). However, as shown in Fig. 1, the repeated purchase frequencies of different products can vary greatly; e.g., a user purchases pigments more frequently than paintbrushes, which may not be easily discovered at a single time scale. Moreover, a single time scale may be insufficient to group the various types of co-purchased items: e.g., purchasing paintbrushes right after pigments is a short-time demand at a small time scale, whereas buying a laptop and, a few purchases later, its cleaner is a short-time demand at a slightly larger time scale. Hence we use multi-time scales to model more general long-short purchase demands (see Sec. 3.2).
By organizing a user’s purchase history at multiple time scales, our model can capture more general successive purchase demands (i.e., short-time demands) and repeated purchase demands (i.e., long-time demands). The utility of observing users’ purchase sequences at multiple time scales is well documented in studies of marketing strategies and human behavior (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), which provide abundant evidence that human activities are largely regulated at several time scales and that the final decision results from the superimposition of these scales. Our model is designed to fit these general observations of human behavior.
3. Our Proposed Model
In this section, we describe our Long-Short Demands-aware Model (LSDM) for personalized next-item recommendation. We design a hierarchical neural architecture with multi-time scales in LSDM (see Figure 2).
In each time scale, we model users’ purchase preferences over time sequentially with an LSTM. The final next-item prediction is computed by jointly learning the long-short purchase demands across the multi-time scales.
3.1. Modeling Purchase Records
Two main elements should be modeled: the sequence of purchased items and the user’s interests.
3.1.1. Encoding Attentive Transaction Information.
In each time scale, we use a time window $\Delta t$ (see Eq. 2) to group all purchases within the same window together; each such group is termed a transaction. Given a user $u$ and his purchase records $S^u_{\Delta t}$, the transaction at step $j$ is $T^u_j$. We represent each item $i$ in $T^u_j$ using an $|\mathcal{I}|$-dimensional one-hot vector, denoted by $\mathbf{x}_i$, in which only the entry corresponding to item $i$ is set to 1. A lookup layer then translates each item $i$ into a latent vector

$$\mathbf{e}_i = \mathbf{W}_I^{\top}\mathbf{x}_i \tag{3}$$

where $\mathbf{W}_I \in \mathbb{R}^{|\mathcal{I}| \times d}$ is the transformation matrix for lookup and $d$ is the embedding dimension of each item. To obtain the representation of $T^u_j$, we concatenate the latent vectors of all items in the transaction:

$$\mathbf{v}_j = \mathrm{concat}\big(\mathbf{e}_{i_1}, \mathbf{e}_{i_2}, \ldots, \mathbf{e}_{i_{n_j}}\big) \tag{4}$$

where $\mathbf{v}_j$ is the latent vector of transaction $T^u_j$ and $n_j$ is the number of items in $T^u_j$. Since the number of items varies across transactions, we use masked zero-padding in the embedding layer to convert each transaction into a representation vector of fixed dimension.
Inspired by the success of attention mechanisms in capturing the important information of previous states (Luong et al., 2015; Li et al., 2017), we adopt an attention mechanism to carry the user’s appetite over from one transaction to another. For a transaction $T^u_j$, we denote the user’s appetite for the transaction by a vector $\mathbf{a}$ of the same dimension as $\mathbf{v}_j$; $\mathbf{a}$ is initialized randomly and learned during training over all transactions. We integrate the attention weights with the latent vector of the transaction as follows:

$$\tilde{\mathbf{v}}_j = \mathbf{a} \odot \mathbf{v}_j \tag{5}$$

where “$\odot$” denotes the element-wise product of two vectors. Having obtained the attentive representation $\tilde{\mathbf{v}}_j$ of each transaction, we next describe how to model a user’s sequential behavior.
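The transaction encoding (lookup, concatenation, masked zero-padding and element-wise appetite weighting) can be sketched in plain Python as below. This is an illustrative sketch only: the lookup table, `max_items`, and the fixed appetite vector are hypothetical stand-ins for parameters the model learns, and a practical implementation would use a deep-learning framework.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def make_lookup_table(items: Sequence[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical item embedding table; a row lookup equals W_I^T x_i for one-hot x_i."""
    rng = random.Random(seed)
    return {item: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for item in items}


def encode_transaction(transaction: Sequence[str], table: Dict[str, Vector],
                       max_items: int, dim: int) -> Tuple[Vector, List[int]]:
    """Concatenate item embeddings and zero-pad to a fixed length max_items * dim.

    Returns the flat transaction vector and a mask with one entry per item slot
    (1 = real item, 0 = padding), so later layers can ignore padded slots.
    """
    if len(transaction) > max_items:
        raise ValueError("transaction has more items than max_items")
    vector: Vector = []
    mask: List[int] = []
    for item in transaction:
        vector.extend(table[item])
        mask.append(1)
    n_padding = max_items - len(transaction)
    vector.extend([0.0] * (n_padding * dim))
    mask.extend([0] * n_padding)
    return vector, mask


def apply_appetite(vector: Vector, appetite: Vector) -> Vector:
    """Element-wise product of the appetite (attention) vector and the transaction vector."""
    if len(vector) != len(appetite):
        raise ValueError("appetite and transaction vectors must have the same length")
    return [a * v for a, v in zip(appetite, vector)]


dim, max_items = 4, 3
table = make_lookup_table(["pigments", "paintbrushes", "laptop"], dim)
vector, mask = encode_transaction(["pigments", "paintbrushes"], table, max_items, dim)
appetite = [0.5] * (max_items * dim)  # placeholder for a learned vector
attended = apply_appetite(vector, appetite)
print(len(vector), mask)  # 12 [1, 1, 0]
```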
3.1.2. Modeling User’s Sequential Behavior.
Given a user $u$, we represent it using a $|\mathcal{U}|$-dimensional one-hot vector, denoted by $\mathbf{x}_u$. We then apply a lookup layer to transform $\mathbf{x}_u$ into a latent vector

$$\mathbf{e}_u = \mathbf{W}_U^{\top}\mathbf{x}_u \tag{6}$$

where $\mathbf{W}_U$ is the transformation matrix for lookup.
The sequence of transactions of user $u$ at time granularity $\Delta t$ is $S^u_{\Delta t} = \{T^u_1, T^u_2, \ldots, T^u_m\}$. We obtain the representation of each attentive transaction by Eq. 5, so the sequence of all attentive transactions can be represented as $\{\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \ldots, \tilde{\mathbf{v}}_m\}$. To model the sequential behavior of a user, we adopt Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber, 1997), which have proven effective for sequence learning problems (He et al., 2017a). The input of the LSTM at step $j$ is $\tilde{\mathbf{v}}_j$, and its output (i.e., hidden state) is $\mathbf{h}_j$. We model the interaction of user $u$ and transaction $T^u_j$ by

$$\mathbf{z}^u_j = g\big(\mathbf{e}_u, \mathbf{h}_j\big) \tag{7}$$

where $\mathbf{e}_u$ (see Eq. 6) is the embedding vector of user $u$, $g(\cdot,\cdot)$ denotes the interaction operation, and $\mathbf{z}^u_j$ is the interaction vector of user $u$ and the current transaction. By updating the user’s latent vector at each transaction step, our model can learn the user’s evolving interests in items over the sequential records.
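Given a user $u$ and his or her previous transactions $\{T^u_1, \ldots, T^u_j\}$, we define the probability of an item $i$ being purchased in the next transaction by a softmax function

$$p\big(i \mid u, T^u_1, \ldots, T^u_j\big) = \frac{\exp\big(\mathbf{w}_i^{\top}\mathbf{z}^u_j\big)}{\sum_{i' \in \mathcal{I}} \exp\big(\mathbf{w}_{i'}^{\top}\mathbf{z}^u_j\big)} \tag{8}$$

where $\mathbf{z}^u_j$ is the interaction vector of the user and the transaction at step $j$ (see Eq. 7) and $\mathbf{w}_i$ is the output weight vector of item $i$. For the whole item set $\mathcal{I}$, the predicted probabilities can be collected into a vector $\hat{\mathbf{y}}^u_{j+1} \in \mathbb{R}^{|\mathcal{I}|}$.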
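A plain-Python sketch of the forward pass at one time scale is shown below: a standard LSTM cell (biases omitted for brevity) runs over the transaction vectors, and a softmax turns item scores into next-item probabilities. Two parts are assumptions not fixed by the text: the user-state interaction is taken to be an element-wise product, and item scores come from a hypothetical output matrix `output_weights`. All weights are random placeholders, not trained values.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Sequence[float]) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyLSTMCell:
    """Standard LSTM cell; every gate reads concat(input, previous hidden state)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.weights = {gate: random_matrix(hidden_dim, input_dim + hidden_dim, rng)
                        for gate in ("i", "f", "o", "g")}

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        xh = list(x) + list(h_prev)
        i = [sigmoid(z) for z in matvec(self.weights["i"], xh)]
        f = [sigmoid(z) for z in matvec(self.weights["f"], xh)]
        o = [sigmoid(z) for z in matvec(self.weights["o"], xh)]
        g = [math.tanh(z) for z in matvec(self.weights["g"], xh)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


input_dim, hidden_dim, num_items = 6, 4, 5
rng = random.Random(1)
transactions = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(3)]
user_embedding = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
output_weights = random_matrix(num_items, hidden_dim, rng)  # hypothetical

cell = TinyLSTMCell(input_dim, hidden_dim)
h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
for x in transactions:  # one step per (attentive) transaction vector
    h, c = cell.step(x, h, c)
    z = [e * hj for e, hj in zip(user_embedding, h)]  # assumed interaction
    next_item_probs = softmax(matvec(output_weights, z))
print([round(p, 3) for p in next_item_probs])
```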
3.2. Joint Learning with Multi-Time Scales
As introduced in Sec. 2.2, we use time scales with different time windows to capture users’ long and short purchase demands. Denote the $C$ time scales by $\{\Delta t_1, \Delta t_2, \ldots, \Delta t_C\}$. At each training epoch, the next-step predictions (see Eq. 8) at all time scales are stacked into $\mathbf{P} = [\hat{\mathbf{y}}_{\Delta t_1}; \hat{\mathbf{y}}_{\Delta t_2}; \ldots; \hat{\mathbf{y}}_{\Delta t_C}] \in \mathbb{R}^{C \times |\mathcal{I}|}$, the matrix of prediction results of our long-short demands-aware model. We then feed $\mathbf{P}$ into a joint learning function to generate the final prediction:

$$\hat{\mathbf{y}} = \phi(\mathbf{P}) \tag{9}$$

where $\hat{\mathbf{y}}$ is the final prediction of our model and $\phi$ is the joint learning function.
The joint learning function can be flexibly extended to arbitrarily complex methods; we discuss this choice in the experiments (see Sec. 4.4). In our experiments, we consider two kinds of joint learning functions: linear (i.e., average, max and weighted joint learning) and non-linear (i.e., multilayer perceptron joint learning). Given the set of time scales and their corresponding prediction results $\mathbf{P}$, the four joint learning functions are defined as follows.
Average joint learning function. It uses the average of the prediction results of all time scales.
Max joint learning function. It uses a maximum operation on the prediction results of all time scales.
Weighted joint learning function. It learns the user’s preference for each item at the different time scales through weights. For a time granularity $\Delta t_c$, we learn a weight vector $\boldsymbol{\beta}_{\Delta t_c} \in \mathbb{R}^{|\mathcal{I}|}$, which is initialized randomly and learned automatically during training. We obtain the weighted prediction over all time scales by

$$\hat{\mathbf{y}} = \sum_{c=1}^{C} \boldsymbol{\beta}_{\Delta t_c} \odot \hat{\mathbf{y}}_{\Delta t_c}$$
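Multilayer Perceptron (MLP) joint learning function. This is a non-linear joint learning function; we examine whether it is more powerful in capturing the user’s preferences at different time scales. We first concatenate the prediction results at the different time scales:

$$\mathbf{o}_0 = \mathrm{concat}\big(\hat{\mathbf{y}}_{\Delta t_1}, \hat{\mathbf{y}}_{\Delta t_2}, \ldots, \hat{\mathbf{y}}_{\Delta t_C}\big)$$

where concat is the vector concatenation operation. Then a multilayer perceptron (Gardner and Dorling, 1998) is used to obtain $\hat{\mathbf{y}}$:

$$\mathbf{o}_l = \sigma_l\big(\mathbf{W}_l\,\mathbf{o}_{l-1} + \mathbf{b}_l\big), \quad l = 1, \ldots, H, \qquad \hat{\mathbf{y}} = \mathbf{o}_H$$

where $\mathbf{W}_l$, $\mathbf{b}_l$, and $\sigma_l$ denote the weight matrix, bias vector, and activation function of the $l$-th layer’s perceptron respectively, and $H$ is the number of layers. For the activation functions of the MLP layers, one can freely choose among sigmoid, hyperbolic tangent (tanh), and rectifier (ReLU), among others.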
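The four joint learning functions can be compared on a toy example as sketched below. This is a plain-Python illustration under stated assumptions: the weighted variant uses one weight vector per time scale (as long as the item set), the MLP variant is the one-layer sigmoid configuration mentioned for the experiments, and all numbers are made-up placeholders, whereas in LSDM the weights are learned.

```python
import math
from typing import List, Sequence

Vector = List[float]


def joint_average(preds: Sequence[Vector]) -> Vector:
    """Average the per-scale predictions item by item."""
    return [sum(column) / len(preds) for column in zip(*preds)]


def joint_max(preds: Sequence[Vector]) -> Vector:
    """Take the item-wise maximum over time scales."""
    return [max(column) for column in zip(*preds)]


def joint_weighted(preds: Sequence[Vector], weights: Sequence[Vector]) -> Vector:
    """Sum of per-scale predictions, each scaled item-wise by its weight vector."""
    num_items = len(preds[0])
    return [sum(w[i] * p[i] for w, p in zip(weights, preds)) for i in range(num_items)]


def joint_mlp(preds: Sequence[Vector], weight: List[Vector], bias: Vector) -> Vector:
    """One-layer MLP with sigmoid activation over the concatenated predictions."""
    concatenated = [value for p in preds for value in p]
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, concatenated)) + b)))
            for row, b in zip(weight, bias)]


preds = [[0.2, 0.5, 0.3],   # e.g., item scale
         [0.4, 0.4, 0.2]]   # e.g., daily scale
print(joint_average(preds))  # approximately [0.3, 0.45, 0.25]
print(joint_max(preds))      # [0.4, 0.5, 0.3]
equal_weights = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(joint_weighted(preds, equal_weights))  # equals the average up to rounding
zero_weight = [[0.0] * 6 for _ in range(3)]
print(joint_mlp(preds, zero_weight, [0.0, 0.0, 0.0]))  # [0.5, 0.5, 0.5]
```

With equal weights of 1/C the weighted function reduces to the average, which shows how the learned weights generalize the simplest combination.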
3.3. The Loss Function for Optimization
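Our LSDM is optimized through a joint learning process. Denote the objective functions to be optimized at the individual time scales by $\{\mathcal{L}_{\Delta t_1}, \ldots, \mathcal{L}_{\Delta t_C}\}$. The overall LSDM model can be written as

$$\hat{\mathbf{y}} = \phi\Big(f_{\Delta t_1}\big(S^u_{\Delta t_1}; \Theta_{\Delta t_1}\big), \ldots, f_{\Delta t_C}\big(S^u_{\Delta t_C}; \Theta_{\Delta t_C}\big); \Theta_{\phi}\Big)$$

where $S^u_{\Delta t_1}, \ldots, S^u_{\Delta t_C}$ are the sequences at the different time scales and $\Theta_{\Delta t_1}, \ldots, \Theta_{\Delta t_C}$ are the model parameters at those time scales. $\phi$ (see Eq. 9) and $f_{\Delta t_c}$ are the learning functions of LSDM and of each time-scale model respectively, and the parameters to be learned in LSDM are $\Theta = \{\Theta_{\phi}, \Theta_{\Delta t_1}, \ldots, \Theta_{\Delta t_C}\}$.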
For a time scale $\Delta t_c$, we adopt a weighted cross-entropy as the optimization objective at each LSTM step:

$$\mathcal{L}_{\Delta t_c} = -\sum_{j}\sum_{i \in \mathcal{I}}\Big[\lambda_{+}\, y_{j+1,i}\log p_{j+1,i} + \lambda_{-}\,\big(1 - y_{j+1,i}\big)\log\big(1 - p_{j+1,i}\big)\Big]$$

where $j$ indexes the transactions $T^u_j$ at time scale $\Delta t_c$ and $p_{j+1,i}$ is the probability of item $i$ being purchased in the next transaction (see Eq. 8). If item $i$ is purchased in the next transaction, $y_{j+1,i} = 1$; otherwise, $y_{j+1,i} = 0$. $\lambda_{+}$ and $\lambda_{-}$ are the weights of positive and negative instances (i.e., whether or not the item is purchased in the next transaction); they cope with the imbalance between the numbers of positive and negative examples. In our experiments, negative instances outnumber positive ones by a ratio of about 500, so we set $\lambda_{+}$ to be 500 times $\lambda_{-}$ to reduce the training bias.
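In our LSDM, the objective function is defined as:

$$\mathcal{L}_{\mathrm{LSDM}} = -\sum_{i \in \mathcal{I}}\Big[\lambda_{+}\, y_i \log \hat{y}_i + \lambda_{-}\,(1 - y_i)\log\big(1 - \hat{y}_i\big)\Big]$$

where $\hat{y}_i$ is the probability of item $i$ given by LSDM (see Eq. 9) and $y_i$ indicates whether item $i$ is purchased at the next step.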
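For a single prediction step, the weighted cross-entropy can be computed as sketched below. The sketch assumes the per-item binary cross-entropy form with weights 500 and 1 for positive and negative instances, matching the 500:1 setting described for the experiments; a per-scale objective would sum this quantity over all steps, and the clamping constant `eps` is an implementation detail added here.

```python
import math
from typing import Sequence


def weighted_cross_entropy(probs: Sequence[float], targets: Sequence[int],
                           pos_weight: float = 500.0, neg_weight: float = 1.0,
                           eps: float = 1e-12) -> float:
    """Weighted binary cross-entropy summed over all items for one prediction step.

    probs[i]: predicted probability that item i is in the next transaction.
    targets[i]: 1 if item i is actually purchased next, else 0.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    loss = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y:
            loss -= pos_weight * math.log(p)
        else:
            loss -= neg_weight * math.log(1.0 - p)
    return loss


# One positive item and two negatives; the up-weighted positive term dominates.
print(weighted_cross_entropy([0.5, 0.1, 0.2], [1, 0, 0]))
```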
After training, given a user’s purchase history, we obtain the probability of each item being purchased at the next step according to Eq. 8 and Eq. 9. We then rank the items by probability and select the top-$K$ items as the final recommendations for the user.
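The final ranking step can be sketched as below; the item names and probabilities are made-up placeholders. `heapq.nlargest` avoids fully sorting a large item set when only the top-K items are needed.

```python
import heapq
from typing import Dict, Hashable, List


def top_k_items(probabilities: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Return the k items with the highest predicted probability, best first."""
    ranked = heapq.nlargest(k, probabilities.items(), key=lambda pair: pair[1])
    return [item for item, _ in ranked]


scores = {"milk": 0.42, "bread": 0.31, "detergent": 0.05, "eggs": 0.22}
print(top_k_items(scores, 2))  # ['milk', 'bread']
```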
We conduct experiments on three real-world datasets to verify the effectiveness of our proposed model for next-item recommendation. In particular, we aim at answering the following questions:
Q1: Are users’ purchase demands really useful for the next-item recommendation task?
Q2: Are multi-time scales more powerful in capturing users’ long-short purchase demands?
Q3: Is the joint learning function effective in incorporating users’ purchase demands at multi-time scales?
In the following sections, we first introduce our experimental settings, including datasets, baselines, and evaluation metrics. We then analyze the experimental results to answer the three questions one by one.
4.1. Experimental Settings
4.1.1. Datasets.
We experiment with three real-world datasets: Ta-Feng (http://www.bigdatalab.ac.cn/benchmark/bm/dd?data=Ta-Feng), BeiRen (http://www.bigdatalab.ac.cn/benchmark/bm/dd?data=Ta-Feng) and Amazon (http://jmcauley.ucsd.edu/data/amazon/). Ta-Feng and BeiRen are two online shopping datasets with users’ real purchase records; they are the only publicly available datasets we are aware of that contain users’ real purchase histories. We also conduct experiments on the Amazon dataset, which is commonly used to evaluate recommender models. Amazon is a review dataset: users’ purchase records are collected from their reviews (note that a review usually implies a purchase, but a user may purchase an item without leaving a review).
Ta-Feng (Wang et al., 2015a) is a grocery shopping dataset covering products from food and office supplies to furniture. We use one quarter (i.e., December 2000 to February 2001) of the Ta-Feng supermarket’s shopping transactions. Since users with few purchases or a short active period cannot be evaluated reliably, we first remove products bought fewer than 15 times and then keep users with purchase records in at least 5 weeks. This leaves 7,044 items and 1,951 users with 90,986 purchase records in total. The average number of purchase records per user is 50, and the average number of times each item has been purchased is 14.
BeiRen (Le et al., 2017) is an online shopping dataset. We use 4 months of BeiRen data (April 2013 to July 2013) and apply the same filtering as for the Ta-Feng dataset. We finally obtain 211,519 purchase records involving 3,264 users and 5,818 items. The average number of purchase records per user is 65.
Amazon (He and McAuley, 2016) is collected from one of the largest Internet retailers in the world. Only product review records are available; since users usually post reviews only after purchasing a product, we assume that reviews on Amazon correspond to actual purchases most of the time (Bai et al., 2018). We use review records from half a year (i.e., January 1st, 2014 to June 30th, 2014). We first remove products purchased fewer than 5 times and then retain users with purchase records in at least 5 weeks. We obtain 6,092 items and 1,443 users with 15,811 purchases. The average number of products reviewed per user is 11.
4.1.2. Time Scales Selection.
To implement the multi-time scale model, we need to determine which time scales to use. As introduced in Sec. 2.2, a time scale is used to cluster successive purchases together (i.e., short-time demands) and to cope with users’ repeated purchases of items at a certain time frequency (i.e., long-time demands). Studies of marketing strategies and human behavior (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013) show abundant evidence that regular structure is a defining feature of human activities, with the strongest influence exerted by daily regularities, followed by weekly and seasonal ones. It has also been found that the “rhythms of life” are a superimposition of different regularities (Nowak and Vallacher, 1998). It is therefore reasonable to model users’ long-short purchase demands by following these strongest rhythms, i.e., daily, weekly and seasonal. Since the time periods covered by our datasets are no longer than half a year, we select the daily and weekly scales in our experiments. We also keep the original sequence of users’ purchase history, which we call the item scale. The usefulness of these rhythm-based time scales (i.e., the daily and weekly scales) is demonstrated in Sec. 4.3.
4.1.3. Evaluation Metrics.
Given a user, we infer the next item that the user would probably buy. Each candidate method produces an ordered list of items for recommendation. We adopt two widely used ranking-based metrics to evaluate a ranked list: Hit ratio at rank $K$ (Hit@$K$) and Normalized Discounted Cumulative Gain at rank $K$ (NDCG@$K$).
Hit ratio at rank $K$ (Hit@$K$). Given the predicted ordered list of items for a user, Hit@$K$ is defined as:

$$\mathrm{Hit@}K = \sum_{k=1}^{K} r_k$$

Normalized Discounted Cumulative Gain at rank $K$ (NDCG@$K$). Given the predicted ordered list of items for a user, NDCG@$K$ is defined as:

$$\mathrm{NDCG@}K = \frac{1}{Z_K}\sum_{k=1}^{K}\frac{2^{r_k} - 1}{\log_2(k + 1)}$$

where $k$ is the position of an item in the ranking list, $r_k$ returns 1 if the item at position $k$ was adopted by the user in the original dataset and 0 otherwise, and $Z_K$ is the DCG of the ideal ranking (which equals 1 when there is a single target item).
Hit@$K$ intuitively measures whether the test item is present in the top-$K$ list, while NDCG@$K$ also accounts for the position of the hit by assigning higher scores to hits at higher ranks. We report results at two cutoff values of $K$, treating the top-$K$ items in the ranking list as the recommended set.
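Since each user has a single held-out target item, both metrics reduce to simple per-user functions, sketched below with a made-up ranking; the reported scores would be the averages of these values over all test users.

```python
import math
from typing import Hashable, Sequence


def hit_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """1.0 if the target item appears among the first k recommendations, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """NDCG@k with a single relevant item, so the ideal DCG equals 1."""
    for position, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(position + 1)  # (2^1 - 1) / log2(position + 1)
    return 0.0


ranking = ["milk", "bread", "eggs", "detergent"]
print(hit_at_k(ranking, "bread", 2), round(ndcg_at_k(ranking, "bread", 2), 4))  # 1.0 0.6309
print(hit_at_k(ranking, "eggs", 2), ndcg_at_k(ranking, "eggs", 2))              # 0.0 0.0
```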
4.1.4. Baseline Methods Compared.
We compare our model with the state-of-the-art methods from different types of recommendation approaches, including:
Pop. It ranks items by popularity, measured by the number of times they have been purchased. This is a widely used simple baseline.
BPR (Rendle et al., 2009). It optimizes a matrix factorization (MF) model with a pairwise ranking loss. This is a state-of-the-art model for item recommendation, but it ignores sequential information.
HRM (Wang et al., 2015a): It employs a neural network with nonlinear operations to integrate the representations of customers and of the items purchased in adjacent transactions.
RRN (Wu et al., 2017): This is a representative approach that uses an RNN to learn dynamic representations of users and items in recommender systems. Our model with only the item sequence can be regarded as equivalent to RRN.
NARM (Li et al., 2017): This is a state-of-the-art approach in personalized session-based recommendation with RNN models. It uses attention mechanism to determine the relatedness of the past purchases in the session for the next purchase. As our datasets do not have explicit information of sessions, we simulate sessions by the transactions within each day.
The above methods cover different kinds of approaches in recommender systems: BPR is a classical traditional recommendation method; FPMC and HRM are representative methods that use adjacent sequential information; RRN and NARM are recent methods that use the whole sequential information for recommendation. Table 1 summarizes the properties of the different methods. Our LSDM is a demands-aware model: users’ short-time demands are modeled by the local sequence information of items within a transaction, and long-time demands are captured by the global information of the whole record sequence.
Other sequential methods, e.g., DREAM (Yu et al., 2016), TransRec (He et al., 2017a), user-based RNN (Donkers et al., 2017) and HRNN (Quadrana et al., 2017), are similar to our baseline methods, so they are not included in our comparison. Sequential methods with auxiliary information, e.g., content-based neural models (Suglia et al., 2017; Beutel et al., 2018), neural tensor factorization (Wu et al., 2018) and pattern-mining-based models (Yap et al., 2012; Song and Yang, 2014; Guidotti et al., 2017; Quadrana et al., 2018), are also excluded because they use additional information.
4.1.5. Parameter Settings.
The hyperparameters with which each method obtains its best prediction results are listed below. (1) BPR: the numbers of latent factors are 300, 300 and 200, and the learning rates are 0.001, 0.001 and 0.0005 on the Ta-Feng, BeiRen and Amazon datasets respectively. (2) FPMC: the numbers of latent factors are 32, 32 and 16, and the learning rates are 0.015, 0.01 and 0.001 on the three datasets. (3) HRM: the embedding size is 40, the learning rate is 0.005 and the dropout rate is 0.5 on all datasets. (4) RRN: the embedding size is 50 and the learning rate is 0.001 on all datasets; the batch sizes are 100, 20 and 100 respectively. (5) NARM: the embedding sizes are 25, 15 and 20, with 25, 25 and 20 hidden units; the learning rates are 0.0001, 0.0008 and 0.0008, and the batch sizes are 256, 640 and 640 on the three datasets. (6) MGASM: the embedding size is 50 and the learning rate is 0.001 on all datasets; the batch sizes are 100, 20 and 100 on the three datasets respectively.
For all methods, we take the last item of each user as the prediction target, the penultimate item as validation data for model selection, and the remaining part of each sequence as training data to optimize the model parameters.
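The per-user split can be sketched as below. It assumes each user's purchases are already sorted by time; the minimum-length check is an addition of this sketch, since the text does not say how very short histories are handled.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_last_out(sequence: Sequence[T]) -> Tuple[List[T], T, T]:
    """Split one user's time-ordered purchases into (train, validation, test)."""
    if len(sequence) < 3:
        raise ValueError("need at least three purchases to build all three splits")
    return list(sequence[:-2]), sequence[-2], sequence[-1]


train_items, valid_item, test_item = leave_last_out(
    ["pigments", "paintbrushes", "pigments", "laptop"])
print(train_items, valid_item, test_item)  # ['pigments', 'paintbrushes'] pigments laptop
```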
4.2. Performance Comparison (RQ1)
We present the next-item recommendation performance in terms of Hit@$K$ and NDCG@$K$ (at two cutoff values of $K$) in Table 2.
Note: LSDM uses three typical time scales, i.e., item, daily and weekly (discussed in Sec. 4.3), and the MLP joint learning function.
The significance marker in Table 2 indicates statistically significant improvements over the best baseline (two-sided $t$-test).
We have the following observations:
(1) Pop is the weakest baseline on all datasets, since it is a non-personalized method. BPR performs better than Pop but not as well as FPMC, which uses adjacent sequential information through the transition cube. This shows that local adjacent sequential information is useful for predicting the next item.
(2) On the Ta-Feng and BeiRen datasets, HRM and RRN perform better than BPR and FPMC, which do not use neural networks. This indicates that neural networks are capable of modeling complex interactions between users’ general tastes and their sequential behavior. RRN performs better than HRM, possibly because RRN uses a recurrent neural network to learn from the whole sequence, while HRM only uses adjacent sequential information.
(3) NARM, the state-of-the-art neural model for sequential prediction, performs the best among all baseline methods except for the Hit metric on the BeiRen dataset. The attention mechanism enables NARM to attend to the most relevant purchases in a session and generate more accurate results; the effect is most visible on NDCG.
(4) Our LSDM with three typical time scales (i.e., item, daily and weekly) and the MLP joint learning function (discussed in Sec. 4.3 and Sec. 4.4) performs the best on almost all datasets, and significantly outperforms all baseline methods on the Ta-Feng and BeiRen datasets. This indicates that the long-short purchase demand information used in our model is useful for predicting users’ real-time demand for the next item. Compared with RRN, LSDM not only uses global information from the whole sequence but also captures more complex local sequential information by grouping items into transactions. Compared with NARM, the multi-time scales of LSDM can adaptively learn from different repeated purchase demands and co-purchased items, yielding a more fine-grained model of users.
(5) On the Amazon dataset, although LSDM improves the recommendation performance, the improvement is significant only on Hit@$K$. Similarly, the sequential models HRM and RRN lose their advantage over BPR and FPMC. The reason is that Amazon is a review dataset, in which a user’s purchase records are collected from his or her reviews. Users do not write reviews after every purchase, so such incomplete sequences may not be learned well by sequential models.
To further verify whether repeated purchases exist at different time scales, we calculate the percentage of users who periodically purchase at least one item (i.e., an item is purchased successively in at least half of all their transactions). The percentages are 20.4%, 41.2% and 5.3% at the daily scale, and 43.4%, 75.0% and 10.7% at the weekly scale, on the Ta-Feng, BeiRen and Amazon datasets respectively. This indicates that many users make repeated purchases in the purchase datasets Ta-Feng and BeiRen, while repeated purchases are far rarer in Amazon, because it is a review dataset with incomplete purchase records. Because of this incompleteness, in the following analysis we only use the real purchase datasets Ta-Feng and BeiRen to demonstrate the effectiveness of our model.
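The repeated-purchase statistic can be computed per user as sketched below, given the user's transactions at a chosen time scale. The text does not fully specify what "successively" means here, so this sketch makes a simplifying assumption: a user counts as a periodic buyer if at least one item appears in at least half of his or her transactions, whether or not those transactions are adjacent.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_periodic_buyer(transactions: Sequence[Sequence[Hashable]], min_share: float = 0.5) -> bool:
    """True if some item appears in at least min_share of the user's transactions."""
    if not transactions:
        return False
    counts: Counter = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return any(count >= min_share * len(transactions) for count in counts.values())


def periodic_user_percentage(users: Sequence[Sequence[Sequence[Hashable]]]) -> float:
    """Percentage of users who are periodic buyers."""
    if not users:
        return 0.0
    return 100.0 * sum(is_periodic_buyer(t) for t in users) / len(users)


users = [
    [["milk"], ["milk", "bread"], ["eggs"]],         # milk in 2 of 3 transactions
    [["laptop"], ["cleaner"], ["pens"], ["ink"]],    # no item repeats
]
print(periodic_user_percentage(users))  # 50.0
```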
4.3. Usefulness of Multi-Time Scales (RQ2)
In LSDM, we use the rhythm-based time scales (i.e., daily and weekly) together with the item scale. To demonstrate the usefulness of the multi-time scales used in our model, we address two issues:
The effectiveness of rhythm-based time scales. Following the “rhythms of life” theory (Nowak and Vallacher, 1998), we use daily and weekly time scales (i.e., $\text{LSDM}_{\text{day}}$ and $\text{LSDM}_{\text{week}}$), termed rhythm-based time scales in our experiments. To demonstrate their effectiveness, we generate other, non-rhythm-based time scales: we cluster every two, five or ten items in the purchase sequence into a transaction. Borrowing the terminology of natural language processing, we call these the 2-gram, 5-gram and 10-gram scales (i.e., $\text{LSDM}_{\text{2-gram}}$, $\text{LSDM}_{\text{5-gram}}$ and $\text{LSDM}_{\text{10-gram}}$). We compute the relative increase in Hit@$K$ and NDCG@$K$ of each time scale over the model with the item scale only (i.e., $\text{LSDM}_{\text{item}}$).
The usefulness of multi-time scales. We further examine the usefulness of multi-time scales by comparing the full LSDM with its degraded single-time-scale variants, i.e., $\text{LSDM}_{\text{item}}$, $\text{LSDM}_{\text{day}}$ and $\text{LSDM}_{\text{week}}$.
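The non-rhythm-based n-gram scales amount to fixed-size chunking of the item sequence, sketched below. How a leftover chunk shorter than n is handled is not stated, so keeping it as a final, shorter transaction is an assumption of this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_by_count(items: Sequence[T], n: int) -> List[List[T]]:
    """Cluster every n consecutive purchases into one transaction (n-gram scale).

    n = 1 reproduces the item scale; a shorter final chunk is kept as is.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [list(items[start:start + n]) for start in range(0, len(items), n)]


sequence = ["pigments", "paintbrushes", "laptop", "cleaner", "pigments"]
print(group_by_count(sequence, 2))
# [['pigments', 'paintbrushes'], ['laptop', 'cleaner'], ['pigments']]
```

Unlike the daily and weekly windows, these chunks ignore purchase timestamps entirely, which is what makes them a useful contrast to the rhythm-based scales.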
We compute the percentage of improvement in Hit@$K$ and NDCG@$K$ of the different methods over the model with the item scale only (i.e., $\text{LSDM}_{\text{item}}$), and present the results in Fig. 3. We observe that: (1) both the rhythm-based and the non-rhythm-based LSDM variants outperform the model with the item scale only. This shows that much of the underlying long and short purchase demands, e.g., co-purchase information, may not be extracted from the item sequence alone; every time scale we consider helps to learn long-short demands for next-item prediction. (2) The rhythm-based LSDM variants perform better than the non-rhythm-based ones on all datasets, indicating that time rhythms, i.e., daily and weekly, really help to capture users’ purchase demands for next-item prediction. (3) LSDM with multi-time scales performs the best, i.e., the full LSDM outperforms the degraded $\text{LSDM}_{\text{day}}$ and $\text{LSDM}_{\text{week}}$, indicating that users’ complex purchase demands can be captured well by multi-time scales. The daily scale performs better than the weekly scale, which indicates that a one-day time scale is more useful than a one-week scale on our datasets. However, this observation is highly data dependent: larger time scales may become more useful on datasets with a longer time span (e.g., covering several years of records). Nevertheless, we can conclude that different time scales are useful for detecting different user purchase demands, and that these time scales tend to be complementary, leading to improved results when all of them are considered. The multi-time scale architecture of our model can learn from any time scale, e.g., years, and is easy to extend when longer observation periods of users’ purchase records are available.
4.4. Effects of Joint Training Strategy (RQ3)
We use a joint learning function (see Eq. 9) to integrate the purchase demands at multiple time scales in LSDM. In our experiments, we examine four joint learning functions: average, max, weighted, and multilayer perceptron (MLP). The MLP uses one layer with a sigmoid activation function. Table 3 presents the results of LSDM with the four joint learning functions on Hit@k and NDCG@k.
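The four joint learning functions compared in Table 3 can be sketched as operations over per-scale vectors. Eq. 9 is not reproduced in this section, so the following is only an illustration under stated assumptions: each time scale (e.g., item, daily, weekly) is assumed to yield a vector of the same length, the weighted variant uses one scalar weight per scale, and the one-layer MLP is assumed to act on the concatenation of the per-scale vectors with a sigmoid activation. All names are hypothetical.

```python
import math


def combine_average(vectors):
    """Element-wise mean of the per-scale vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def combine_max(vectors):
    """Element-wise max of the per-scale vectors."""
    return [max(col) for col in zip(*vectors)]


def combine_weighted(vectors, weights):
    """Weighted sum with one scalar weight per time scale.
    Assumption: weights would be learned in practice; here they are given."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*vectors)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combine_mlp(vectors, W, b):
    """One-layer perceptron with sigmoid activation.
    Assumption: it is applied to the concatenation of all per-scale vectors;
    W has one row per output unit, each row as long as the concatenation."""
    concat = [x for v in vectors for x in v]
    return [sigmoid(sum(w * x for w, x in zip(row, concat)) + bias)
            for row, bias in zip(W, b)]
```

Only the MLP variant introduces a non-linearity over the joint input, which is the property the results below attribute its advantage to.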
We can see that the MLP joint learning function performs best on both datasets. This suggests that a non-linear function (i.e., the MLP) is more effective at capturing the purchase-demand information at different time scales. To further examine the effects of joint learning, Fig. 4 presents the training loss and the performance on Hit@5 and NDCG@5 of LSDM and of the degenerated models with a single time scale (i.e., item, daily, and weekly). We can see that: (1) the training loss of LSDM is smaller than that of the degenerated models, and the item-scale model is the worst in this respect; (2) as the number of iterations increases, LSDM tends to outperform the degenerated models and converges faster than they do; (3) the degenerated LSDM with the daily time scale performs better than the one with the weekly scale, and the item-scale model is the worst. This indicates that much of the purchase behavior may not be observable from the item sequence alone, whereas it becomes easier to capture at the larger daily time scale.
5. Related Work
Recommender systems have attracted much attention from both the research community and industry. Depending on whether sequence information is used, we summarize the related methods as follows.
Non-sequential Methods. Early non-sequential approaches can be roughly divided into two categories, namely memory-based and model-based approaches (Sarwar et al., 2001). Memory-based methods mainly rely on neighborhood information for collaborative filtering, while model-based methods learn a prediction function from historical data. Model-based collaborative filtering methods such as matrix factorization algorithms and their variants have proven effective at addressing the scalability and sparsity challenges of recommendation tasks (Koren et al., 2009; Rendle et al., 2009; Koren, 2008b; Chen et al., 2017). Recently, deep learning techniques have been successfully applied to recommendation tasks, and some pioneering studies have yielded promising results. Deep recommendation models mainly use deep learning as a powerful data representation model, in which complicated user-item interactions and auxiliary information can be modeled in a unified representation. Examples include neural rating prediction (Salakhutdinov et al., 2007), neural collaborative filtering (He et al., 2017b; Bai et al., 2017), and auto-encoder-based recommenders (Sedhain et al., 2015; Wang et al., 2015b; Wu et al., 2016). These traditional and neural approaches do not use sequential information, which prevents them from capturing users' changing appetite for items over time.
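To make the memory-based versus model-based distinction concrete, the sketch below scores a user-item pair in both ways on a binary implicit-feedback matrix. It is a simplified illustration, not a reproduction of any cited system: the memory-based part uses item-item cosine similarity in the spirit of item-based collaborative filtering (Sarwar et al., 2001), and the model-based part assumes that matrix-factorization latent factors `P` (users) and `Q` (items) have already been learned, with training omitted. Function names are hypothetical.

```python
import math


def item_cosine(interactions, i, j):
    """Cosine similarity between item columns i and j (rows = users, 0/1 entries)."""
    dot = sum(row[i] * row[j] for row in interactions)
    norm_i = math.sqrt(sum(row[i] ** 2 for row in interactions))
    norm_j = math.sqrt(sum(row[j] ** 2 for row in interactions))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def memory_based_score(interactions, user, item):
    """Memory-based: sum the target item's similarity to each item the user already has.
    The neighborhood is recomputed from the stored data at prediction time."""
    row = interactions[user]
    return sum(item_cosine(interactions, item, j)
               for j, r in enumerate(row) if r and j != item)


def mf_score(P, Q, user, item):
    """Model-based (matrix factorization): inner product of learned latent factors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))
```

The design difference shows up at prediction time: the memory-based score consults the raw interaction matrix, whereas the model-based score only needs the compact learned factors.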
Sequential Methods. Detecting users' purchase appetites and their evolution over time has been an active research topic in recent years. Approaches to modeling a user's sequential behavior have been developed in different recommendation settings: next-basket recommendation (Rendle et al., 2010; Wang et al., 2015a; Yu et al., 2016; Guidotti et al., 2017), session-based recommendation (Hidasi et al., 2015, 2016; Quadrana et al., 2017; Li et al., 2017; Jannach and Ludewig, 2017), and direct transaction-based recommendation (Song et al., 2016; He et al., 2017a; Donkers et al., 2017; Wu et al., 2017; Beutel et al., 2018; Wu et al., 2018). Next-basket prediction aims to predict which items a user will put into his or her next basket. These items depend not only on the user's general interests but also on the items the user has purchased in previous baskets and in the current basket. Two main approaches have been used to address the next-basket recommendation problem: Markov chains (MC) and recurrent neural networks (RNNs). The Factorizing Personalized Markov Chains (FPMC) approach (Rendle et al., 2010) models both a user's sequential behavior and general tastes by conducting a tensor factorization over the transition cubes. The Hierarchical Representation Model (HRM) (Wang et al., 2015a) improves on FPMC by employing a two-layer architecture to construct a non-linear hybrid aggregation of the user profile vector and the transaction representation. The Dynamic REcurrent bAsket Model (DREAM) (Yu et al., 2016) adopts an RNN to model global sequential features that reflect interactions among baskets, and uses the RNN's hidden state to represent a user's dynamic interests over time. Session-based sequence models are commonly used in web-page clicking scenarios. They differ from next-basket recommendation in that the order of clicks on items within a session is taken into account.
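The FPMC score described above can be sketched as the sum of a matrix-factorization term for general taste and a factorized first-order Markov-chain term averaged over the items of the previous basket, which is our reading of the pairwise-interaction factorization in Rendle et al. (2010). The sketch assumes the four factor matrices are already trained (the S-BPR training procedure is omitted); all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fpmc_score(V_UI, V_IU, V_IL, V_LI, user, item, prev_basket):
    """Score `item` for `user` given the item ids in the user's previous basket.

    General taste:   <V_UI[user], V_IU[item]>   (matrix-factorization term)
    Sequential part: mean over l in prev_basket of <V_IL[item], V_LI[l]>
                     (factorized item-to-item transition term)
    """
    taste = dot(V_UI[user], V_IU[item])
    if not prev_basket:
        # No previous basket: only the general-taste term is available.
        return taste
    transition = sum(dot(V_IL[item], V_LI[l]) for l in prev_basket) / len(prev_basket)
    return taste + transition
```

Factorizing the transition cube in this way lets transitions that were never observed for a given user still receive a score through shared item factors.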
RNN-based methods (Medsker and Jain, 2001; Hidasi et al., 2015, 2016; Quadrana et al., 2017; Li et al., 2017; Jannach and Ludewig, 2017) are usually adopted to capture users' long historical records. Quadrana et al. (2017) learn users' characteristics by modeling a user representation over the sequence. Rich item features have also been incorporated into RNN models to learn users' preferences (Hidasi et al., 2016). To make more accurate predictions, Li et al. (2017) use an attention mechanism to capture the user's main interests in the current session. Unlike basket- and session-based methods, which generally group items explicitly into baskets or sessions, some transaction-based approaches directly model the sequence of item transactions (Song et al., 2016; He et al., 2017a; Donkers et al., 2017; Wu et al., 2017; Beutel et al., 2018; Wu et al., 2018). In addition, sequential patterns have been extracted to reflect the co-occurrences (or dependencies) of items or the periodic characteristics of item purchases (Mobasher et al., 2002; Tzvetkov et al., 2005; Yap et al., 2012; Guidotti et al., 2017). Recent work also leverages low-rank tensor completion and a product-category inter-purchase duration vector (Yi et al., 2017) to model the inter-purchase duration of items rather than the time scales of purchases.
In all the above studies, the models work on a single sequence of items, transactions, or sessions. These studies demonstrated that some types of purchase patterns can be extracted from such a sequence, but none of them attempted to extract different patterns at different time scales. The latter is exactly the goal of our study: LSDM considers several sequences at different time scales, so as to draw a more complete picture of the user's sequential behavior and to discover various co-purchase and repeated-purchase patterns at different time scales. Modeling users' purchases at multiple time scales enables our model to better understand users' real-time purchase demands and to recommend items at the right time. As our experiments show, this results in better predictions.
6. Conclusion
In this paper, we explored the use of different time scales for next-item recommendation. Our assumption was that users' long- and short-time purchase demands (i.e., repetitive purchases and co-purchases) manifest themselves at different time scales. This assumption was validated by the experimental results of our model on the next-item recommendation task. Our proposed Long-Short Demands-aware Model (LSDM) captures both users' interests in items and their demands over time. Experimental results on three public datasets (i.e., Ta-Feng, BeiRen, and Amazon) demonstrate the effectiveness of our model. While the idea of using multiple time scales is validated, our implementation can be further improved, for example by automatically detecting the best time scales from the data. It is also possible to incorporate richer information into the recommendation process, such as item attributes (e.g., category and price) and textual descriptions of items. We will explore these avenues in future work.
- Afeche et al. (2015) Philipp Afeche, Opher Baron, Joseph Milner, and Ricky Roet-Green. 2015. Pricing and prioritizing time-sensitive customers with heterogeneous demand rates. Under review (2015).
- Bai et al. (2017) Ting Bai, Ji-Rong Wen, Jun Zhang, and Wayne Xin Zhao. 2017. A Neural Collaborative Filtering Model with Interaction-based Neighborhood. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1979–1982.
- Bai et al. (2018) Ting Bai, Xin Zhao, Yulan He, Jian-Yun Nie, and Ji-Rong Wen. 2018. Characterizing and Predicting Early Reviewers for Effective Product Marketing on E-Commerce Websites. IEEE Transactions on Knowledge and Data Engineering (2018).
- Beutel et al. (2018) Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. (2018).
- Bhagat et al. (2018) Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze, and Shankar Vishwanath. 2018. Buy It Again: Modeling Repeat Purchase Recommendations. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 62–70.
- Chen et al. (2017) Chao Chen, Dongsheng Li, Qin Lv, Junchi Yan, Li Shang, and Stephen M. Chu. 2017. GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering. In AAAI. 1295–1301.
- Donkers et al. (2017) Tim Donkers, Benedikt Loepp, and Jürgen Ziegler. 2017. Sequential User-based Recurrent Neural Network Recommendations. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 152–160.
- Gardner and Dorling (1998) Matt W Gardner and SR Dorling. 1998. Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmospheric environment 32, 14-15 (1998), 2627–2636.
- Guidotti et al. (2017) Riccardo Guidotti, Giulio Rossetti, Luca Pappalardo, Fosca Giannotti, and Dino Pedreschi. 2017. Next Basket Prediction using Recurring Sequential Patterns. arXiv preprint arXiv:1702.07158 (2017).
- Ha et al. (2002) Sung Ho Ha, Sung Min Bae, and Sang Chan Park. 2002. Customer’s time-variant purchase behavior and corresponding marketing strategies: an online retailer’s case. Computers & Industrial Engineering 43, 4 (2002), 801–820.
- He et al. (2017a) Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017a. Translation-based Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 161–169.
- He and McAuley (2016) Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 507–517.
- He et al. (2017b) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017b. Neural Collaborative Filtering. In WWW. 173–182.
- He et al. (2016) Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 549–558.
- Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
- Hidasi et al. (2016) Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. 2016. Parallel recurrent neural network architectures for feature-rich session-based recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 241–248.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
- Jannach and Ludewig (2017) Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 306–310.
- Koren (2008a) Yehuda Koren. 2008a. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 426–434.
- Koren (2008b) Yehuda Koren. 2008b. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. 426–434.
- Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
- Le et al. (2017) Duc Trong Le, Hady W Lauw, and Yuan Fang. 2017. Basket-sensitive personalized item recommendation. IJCAI.
- Li et al. (2017) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1419–1428.
- Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).
- Medsker and Jain (2001) LR Medsker and LC Jain. 2001. Recurrent neural networks. Design and Applications 5 (2001).
- Mobasher et al. (2002) Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. 2002. Using sequential and non-sequential patterns in predictive web usage mining tasks. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002). IEEE, 669–672.
- Nowak and Vallacher (1998) Andrzej Nowak and Robin R Vallacher. 1998. Dynamical social psychology. Vol. 647. Guilford Press.
- Quadrana et al. (2018) Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. arXiv preprint arXiv:1802.08452 (2018).
- Quadrana et al. (2017) Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 130–137.
- Rahimi and Wang (2013) Seyyed Mohammadreza Rahimi and Xin Wang. 2013. Location recommendation based on periodicity of human activities and location categories. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 377–389.
- Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 452–461.
- Rendle et al. (2010) Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web. ACM, 811–820.
- Salakhutdinov et al. (2007) Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey E. Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In ICML. 791–798.
- Sarwar et al. (2001) Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. ACM, 285–295.
- Sedhain et al. (2015) S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. 2015. AutoRec: Autoencoders meet collaborative filtering. In WWW. 111–112.
- Skaer et al. (1993) TL Skaer, DA Sclar, DJ Markowski, and JK Won. 1993. Effect of value-added utilities on prescription refill compliance and health care expenditures for hypertension. Journal of human hypertension 7, 5 (1993), 515–518.
- Song and Yang (2014) Wei Song and Kai Yang. 2014. Personalized Recommendation Based on Weighted Sequence Similarity. In Practical Applications of Intelligent Systems. Springer, 657–666.
- Song et al. (2016) Yang Song, Ali Mamdouh Elkahky, and Xiaodong He. 2016. Multi-rate deep learning for temporal recommendation. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 909–912.
- Suglia et al. (2017) Alessandro Suglia, Claudio Greco, Cataldo Musto, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2017. A Deep Architecture for Content-based Recommendations Exploiting Recurrent Neural Networks. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 202–211.
- Tsai and Chiu (2004) C-Y Tsai and C-C Chiu. 2004. A purchase-based market segmentation methodology. Expert Systems with Applications 27, 2 (2004), 265–276.
- Tzvetkov et al. (2005) Petre Tzvetkov, Xifeng Yan, and Jiawei Han. 2005. TSP: Mining top-k closed sequential patterns. Knowledge and Information Systems 7, 4 (2005), 438–457.
- Wang et al. (2015b) Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015b. Collaborative Deep Learning for Recommender Systems. In KDD. 1235–1244.
- Wang et al. (2015a) Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, and Xueqi Cheng. 2015a. Learning hierarchical representation model for next basket recommendation. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. ACM, 403–412.
- Wu et al. (2017) Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J Smola, and How Jing. 2017. Recurrent recommender networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 495–503.
- Wu et al. (2018) Xian Wu, Baoxu Shi, Yuxiao Dong, Chao Huang, and Nitesh Chawla. 2018. Neural Tensor Factorization. arXiv preprint arXiv:1802.04416 (2018).
- Wu et al. (2016) Yao Wu, Christopher DuBois, Alice X Zheng, and Martin Ester. 2016. Collaborative denoising auto-encoders for top-n recommender systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 153–162.
- Yap et al. (2012) Ghim-Eng Yap, Xiao-Li Li, and Philip S Yu. 2012. Effective next-items recommendation via personalized sequential pattern mining. In International Conference on Database Systems for Advanced Applications. Springer, 48–64.
- Yi et al. (2017) Jinfeng Yi, Cho-Jui Hsieh, Kush R Varshney, Lijun Zhang, and Yao Li. 2017. Scalable Demand-Aware Recommendation. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 2412–2421. http://papers.nips.cc/paper/6835-scalable-demand-aware-recommendation.pdf
- Yu et al. (2016) Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 729–732.