1. Introduction
Purchase demand refers to a user’s desire and willingness to pay a price for a specific product. It implies a user’s strong purchase intent for a product, and can be one of the most useful signals for promoting product sales if well utilized (Ha et al., 2002; Skaer et al., 1993). Users’ purchase demands are time-sensitive (Afeche et al., 2015); hence a good recommender system should not only find the right products, but also recommend them at the right time to meet users’ demands, so as to maximize their value. So far, the majority of recommendation models, e.g., collaborative filtering (He et al., 2016; Koren, 2008a) and sequence-based models (Wang et al., 2015a; Li et al., 2017; Quadrana et al., 2017), have mainly focused on modeling users’ general interests to find the right products, while the aspect of meeting users’ demands at the right time has been much less explored. Based on this consideration, our aim is to learn the time-sensitive purchase demands from users’ purchase history, and to leverage such information to better understand the real-time demand of a user, so as to make a more accurate prediction of the next item. To fulfill this purpose, we have to address two challenging issues: (1) how to characterize the time-sensitive purchase demands of users; (2) how to utilize such purchase demands in our model for predicting users’ next items.
To characterize the time-sensitive demands, inspired by studies in marketing strategies and human behaviors (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), we summarize two aspects of time-sensitive demands, termed long-time demands and short-time demands. Long-time demands refer to the fact that a user purchases the same product repetitively, showing a long-time persistent interest (Bhagat et al., 2018); short-time demands refer to the co-purchase of items (Guidotti et al., 2017), e.g., buying paintbrushes after pigments. These two kinds of demands are commonly seen and have been widely accounted for in marketing (Tsai and Chiu, 2004). For example, online companies such as the contact lens company Clearly (https://www.clearly.ca/) and Shoppers Drug Mart (https://www1.shoppersdrugmart.ca/en/healthandpharmacy/patientcontact) predict the long-time repeated purchase demands of users by estimating the service life of the products users have purchased. These companies send emails or notifications to users (after authorization), with the aim of reminding users to refill products before they run out. Short-time demands are also exploited as a marketing strategy for promotional activities in online retail shops (Ha et al., 2002), e.g., a system may recommend co-purchased products after you have bought a related product (Guidotti et al., 2017; Yap et al., 2012). Although such long- and short-time demands of users can be estimated by some simple quantitative or statistical scenarios, on most e-commerce websites the service life of a large number of products is hard to estimate, and co-purchased items may also vary due to users’ different usage or purchasing habits. To utilize such purchase demands, we propose a novel Long-Short Demands-aware Model (LSDM), in which both users’ interests in items and users’ demands over time are incorporated.
To model users’ long- and short-time demands, we use a time scale (see Sec. 2.2 for details) to cluster successive product purchases within a time span together (i.e., short-time demands), in order to cope with users’ repeated purchase demands for items at a certain time frequency (i.e., long-time demands). However, the repeated purchase frequencies of the same or similar products can vary greatly, e.g., a user may purchase milk more frequently than detergent. Such different purchase demands may not be easily discovered from a single time scale, as is used in most session-based models (Hidasi et al., 2015; Li et al., 2017; Jannach and Ludewig, 2017). Also, a single time scale may be insufficient to group the various types of co-purchased items. In our model, we design multiple time scales to learn from different types of long-short purchase demands. More specifically, we propose a hierarchical neural architecture with multi-time scales in LSDM (see Figure 1), in which users’ purchase preferences for items over time are captured by recurrent neural networks, and long-short purchase demands at different time scales are addressed by joint learning strategies.
We found limited research on demands-aware purchase recommendation, and none of it deals with real-world purchase data from e-commerce websites. Our contributions in this paper are as follows:

We propose a novel Long-Short Demands-aware Model (LSDM), in which both users’ interests in items and users’ demands over time are incorporated. To the best of our knowledge, such an end-to-end neural model that automatically incorporates users’ purchase demands into the recommendation model has been much less explored.

We characterize two aspects of users’ purchase demands: long-time demands (i.e., repeated purchase demands) and short-time demands (i.e., successive purchase demands), and incorporate them into a hierarchical architecture with multi-time scales.

Experimental results on three real-world commercial datasets (TaFeng, BeiRen and Amazon) demonstrate the effectiveness of our model for next-item recommendation, showing the usefulness of modeling users’ long-short purchase demands for items.
The rest of this paper is organized as follows: we first introduce some preliminary definitions in Section 2. Section 3 then presents our long-short demands-aware model for recommendation, and Section 4 describes the experiments. Related work is summarized in Section 5. Finally, Section 6 concludes this paper.
2. Preliminaries
In this section, we formulate the next-item recommendation problem and give detailed explanations of some key concepts used in this paper (e.g., long-short purchase demands and time scales).
2.1. Problem Statement
Assume we have a set of users and a set of items, denoted by $\mathcal{U}$ and $\mathcal{I}$ respectively. Let $u \in \mathcal{U}$ denote a user and $i \in \mathcal{I}$ an item. The numbers of users and items are denoted as $M$ and $N$ respectively. Given a user $u$, his purchase records are a sequence of items sorted by time, which can be represented as $S^u = \{i^u_1, i^u_2, \dots, i^u_T\}$, where $i^u_t$ is the item purchased by user $u$ at time step $t$. Given the purchase history of a user $u$, next-item recommendation aims to predict the next item $i$ that $u$ would probably buy at time $T+1$:

$$p\big(i^u_{T+1} = i \mid S^u\big) = f(S^u, i), \qquad (1)$$

where $p(i^u_{T+1} = i \mid S^u)$ is the probability of item $i$ being purchased by $u$ at the next time step $T+1$, and $f$ is the prediction function. The prediction problem can also be formulated as a ranking problem over all items for each user. With the ranked list of all items, we recommend the top-$K$ items to the user.
2.2. LongShort Purchase Demands
Purchase demands refer to a user’s desire and willingness to pay a price for a specific product. Inspired by the studies in marketing strategies and human behaviors (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), we summarize two aspects of the time sensitive demands: termed as longtime demands and shorttime demands.
Longtime demands. Longtime demands refer to the fact that a user purchases the same product repetitively, showing a longtime persistent interest (Bhagat et al., 2018).
Shorttime demands. Shorttime demands refer to the copurchase purchase demands of users, e.g., buying paintbrushes after pigments.
Time scale. A time scale refers to a specific unit of time that divides the purchase history of users into meaningful periods. We formulate the long-short demands with different time scales as follows: given a user $u$ and his purchase records $S^u$, we use a time window $w$ to define the time-scale interval. With all purchases within a time window grouped together, the purchase records of a user can be grouped into subsequences by time scale $w$ as follows:

$$S^u_w = \{T^u_1, T^u_2, \dots, T^u_n\}, \qquad (2)$$

where $T^u_t$ is a set of items within the same time window.
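As an illustration of this grouping, the following sketch buckets a user’s time-stamped purchases into transactions under a given time window; the function name and the toy records are ours, not from the paper:

```python
from collections import defaultdict

def group_by_time_scale(records, window_seconds):
    """Group time-stamped purchases into transactions (Eq. 2):
    all items whose timestamps fall into the same window form one set."""
    buckets = defaultdict(set)
    for ts, item in records:
        buckets[ts // window_seconds].add(item)
    # transactions ordered by window index
    return [items for _, items in sorted(buckets.items())]

DAY = 24 * 3600
records = [(0, "milk"), (3600, "bread"), (2 * DAY, "milk")]
daily = group_by_time_scale(records, DAY)        # [{"milk", "bread"}, {"milk"}]
weekly = group_by_time_scale(records, 7 * DAY)   # [{"milk", "bread"}]
```

Note that at the weekly window the two milk purchases collapse into one transaction, while at the daily scale the repeated purchase stays visible; this is exactly why different scales expose different demands.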
Multi-time scales. Multi-time scales are composed of multiple different time scales. As mentioned above, to model users’ long-short time demands, we use a time scale to cluster successive products together (i.e., short-time demands) and to cope with users’ repeated purchase demands for items at different time frequencies (i.e., long-time demands). However, as shown in Fig. 1, the repeated purchase frequencies of the same products can vary greatly, e.g., a user may purchase pigments more frequently than paintbrushes, which may not be easily discovered from a single time scale. Also, a single time scale may be insufficient to group the various types of co-purchased items, e.g., successively purchasing paintbrushes after pigments (a kind of short-time demand at a small time scale) versus a laptop and its cleaner purchased separated by a few other products (a short-time demand at a slightly larger time scale). Hence we use multi-time scales to model more general long-short purchase demands (see Sec. 3.2).
By organizing the purchase history of a user with multi-time scales, our model is able to capture more general successive purchase demands (i.e., short-time demands) and repeated purchase demands (i.e., long-time demands). The utility of multiple time scales for observing a user’s purchase sequence is well documented in studies of marketing strategies and human behaviors (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013), which show abundant evidence that human activities are largely regulated at several time scales and that the final decision arises from a superposition of them. The design of our model aims to fit these general observations on human behaviors.
3. Our Proposed Model
In this section, we describe our Long-Short Demands-aware Model (LSDM) for personalized next-item recommendation. We design a hierarchical neural architecture in LSDM with multi-time scales (see Figure 2).
We use sequential modeling of users’ purchase preferences for items over time via an LSTM at each time scale. The final prediction of the next item is calculated by applying joint learning of the long-short purchase demands at multiple time scales.
3.1. Modeling Purchase Records
Two main elements should be modeled: the sequence of purchased items and the user’s interests.
3.1.1. Encoding Attentive Transaction Information.
In each time scale, we use a time window (see Eq. 2) to group all purchases within the window together; the resulting group is termed a transaction. Given a user $u$ and his purchase records $S^u_w$, a transaction at step $t$ is $T^u_t$. We represent each item $i$ in a transaction using an $N$-dimensional one-hot vector, denoted by $\mathbf{e}_i$, in which only the entry corresponding to the item is set to 1. Then a lookup layer is applied to translate each item $i$ into a latent vector:

$$\mathbf{v}_i = \mathbf{W}^\top \mathbf{e}_i, \qquad (3)$$

where $\mathbf{W}$ is the transformation matrix for lookup, and $d$ is the embedding dimension of each item. To obtain the representation of $T^u_t$, we adopt a concatenation operation to integrate the information of all items in the transaction:

$$\mathbf{v}_{T_t} = \operatorname{concat}\big(\mathbf{v}_{i_1}, \mathbf{v}_{i_2}, \dots, \mathbf{v}_{i_k}\big), \qquad (4)$$

where $\mathbf{v}_{T_t}$ is the latent vector of transaction $T^u_t$ and $k$ is the number of items in $T^u_t$. Since the number of items in each transaction varies, we use masked zero-padding in the embedding layer to convert each transaction into a fixed-dimension representation vector.
Inspired by the success of the attention mechanism in capturing the important information of previous states (Luong et al., 2015; Li et al., 2017), we adopt an attention mechanism to relay the user’s appetite from one transaction to another. For a transaction $T^u_t$, we assume the user’s appetite for the transaction is $\mathbf{a}_t$. $\mathbf{a}_t$ is initialized randomly and learned through the training process over all transactions. We integrate the attention weights with the latent vector of the transaction as follows:

$$\hat{\mathbf{v}}_{T_t} = \mathbf{a}_t \odot \mathbf{v}_{T_t}, \qquad (5)$$

where $\odot$ denotes the element-wise product of two vectors. Once we have obtained the attentive representation $\hat{\mathbf{v}}_{T_t}$ for each transaction, we introduce how to model the user’s sequential behavior.
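A minimal numerical sketch of this encoding follows; the toy dimensions, the lookup table `W`, the padding width and the appetite vector `a_t` are illustrative stand-ins for the learned parameters, not the paper’s actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, emb_dim, max_items = 5, 4, 3          # toy sizes

W = rng.normal(size=(num_items, emb_dim))        # item lookup table (Eq. 3)

def encode_transaction(item_ids, attention):
    """Concatenate the item embeddings of a transaction (Eq. 4),
    zero-pad to a fixed width, and apply the element-wise
    attention weights (Eq. 5)."""
    vecs = [W[i] for i in item_ids]
    vecs += [np.zeros(emb_dim)] * (max_items - len(item_ids))  # masked zero-padding
    v_t = np.concatenate(vecs)                   # fixed-size transaction vector
    return attention * v_t                       # element-wise product

a_t = rng.uniform(size=max_items * emb_dim)      # "appetite" weights (learned in the model)
v_hat = encode_transaction([0, 2], a_t)          # transaction containing items 0 and 2
```

The padded positions stay zero after the element-wise product, so variable-length transactions all map to vectors of the same width.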
3.1.2. Modeling User’s Sequential Behavior.
Given a user $u$, we represent him or her using an $M$-dimensional one-hot vector, denoted by $\mathbf{e}_u$. Then we apply a lookup layer to transform the one-hot vector of $u$ into a latent vector:

$$\mathbf{v}_u = \mathbf{W}_u^\top \mathbf{e}_u, \qquad (6)$$

where $\mathbf{W}_u$ is the transformation matrix for lookup.
The sequence of transactions of user $u$ at time granularity $w$ is $S^u_w = \{T^u_1, T^u_2, \dots, T^u_n\}$. We obtain the representation of each attentive transaction from Eq. 5, so the sequence of all attentive transactions can be represented as $\{\hat{\mathbf{v}}_{T_1}, \hat{\mathbf{v}}_{T_2}, \dots, \hat{\mathbf{v}}_{T_n}\}$. To model the sequential behavior of a user, we adopt Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber, 1997), which have proven effective in solving sequence learning problems (He et al., 2017a). The input of the LSTM at step $t$ is $\hat{\mathbf{v}}_{T_t}$, and the output of the LSTM (i.e., the hidden state) is represented as $\mathbf{h}_t$. We model the interaction of user $u$ and transaction $T^u_t$ by

$$\mathbf{z}_t = \mathbf{h}_t \odot \mathbf{v}_u, \qquad (7)$$

where $\mathbf{v}_u$ (see Eq. 6) is the embedding vector of user $u$ and $\mathbf{z}_t$ is the interaction vector of the user and the current transaction. By updating the user’s latent vector at each transaction step, our model can learn the user’s evolving interests in items over the sequential records.
Given a user $u$ and his or her previous transactions $T^u_1, \dots, T^u_t$, we define the probability of an item $i$ being purchased in the next transaction by a softmax function:

$$p\big(i \in T^u_{t+1} \mid T^u_1, \dots, T^u_t\big) = \frac{\exp(\mathbf{z}_t^\top \mathbf{v}_i)}{\sum_{j \in \mathcal{I}} \exp(\mathbf{z}_t^\top \mathbf{v}_j)}, \qquad (8)$$

where $\mathbf{z}_t$ is the interaction vector of the user and the transaction at step $t$ (see Eq. 7). For the whole set of items $\mathcal{I}$, the predicted probabilities can be represented as $\mathbf{y} = [p_1, p_2, \dots, p_N]$.
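The sequential part can be sketched with a hand-rolled LSTM cell in NumPy. The element-wise user-transaction interaction and the random stand-in inputs below are our own assumptions for illustration, not the paper’s exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wx, Wh, b):
    """One vanilla LSTM step (gate order i, f, o, g is a convention choice)."""
    z = Wx @ x + Wh @ h + b
    d = h.size
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
    g = np.tanh(z[3*d:])
    c = f * c + i * g
    return o * np.tanh(c), c

rng = np.random.default_rng(1)
d, num_items, steps = 8, 6, 3                    # toy sizes
Wx = rng.normal(scale=0.1, size=(4 * d, d))
Wh = rng.normal(scale=0.1, size=(4 * d, d))
b = np.zeros(4 * d)
V = rng.normal(scale=0.1, size=(num_items, d))   # output item embeddings
v_u = rng.normal(scale=0.1, size=d)              # user embedding (Eq. 6)

h = c = np.zeros(d)
for _ in range(steps):
    x = rng.normal(scale=0.1, size=d)            # stand-in for an attentive transaction vector
    h, c = lstm_step(x, h, c, Wx, Wh, b)

z = h * v_u                                      # user-transaction interaction (assumed element-wise)
scores = V @ z
p = np.exp(scores - scores.max())
p /= p.sum()                                     # softmax over all items (Eq. 8)
```

In practice one would use a library LSTM and learn all parameters end to end; the sketch only shows how a hidden state combined with a user embedding yields a probability distribution over all items.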
3.2. Joint Learning with MultiTime Scales
As introduced in Sec. 2.2, we use time scales with different time windows to capture the long- and short-time purchase demands of users. The set of different time scales can be denoted as $\{w_1, w_2, \dots, w_K\}$. At each training epoch, the prediction results (see Eq. 8) for the next step at all time scales are denoted as $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_K]$, where $\mathbf{Y}$ is the matrix of prediction results of our long-short demands-aware model. We then feed $\mathbf{Y}$ into a joint learning function to generate the final prediction:

$$\hat{\mathbf{y}} = g(\mathbf{Y}), \qquad (9)$$

where $\hat{\mathbf{y}}$ is the final prediction result of our model and $g$ is the joint learning function.
The joint learning function is flexible and can be extended to an arbitrarily complex method; we will discuss it in the experiment section (see Sec. 4.4). In our experiments, we consider two kinds of joint learning functions: linear (i.e., average, max and weighted joint learning) and nonlinear (i.e., multilayer perceptron joint learning) functions. Given the set of time scales and their corresponding prediction results $\mathbf{Y}$, the four joint learning functions are defined as follows.

Average joint learning function. It uses the average of the prediction results of all time scales:

$$\hat{\mathbf{y}} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{y}_k. \qquad (10)$$
Max joint learning function. It applies an element-wise maximum operation over the prediction results of all time scales:

$$\hat{\mathbf{y}} = \max\big(\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_K\big). \qquad (11)$$
Weighted joint learning function. It learns the user’s preference for all items at the different time scales through weights. Given a time granularity $w_k$, we learn a weight vector $\mathbf{w}_k$. $\mathbf{w}_k$ is initialized randomly and learned automatically during the training of our model. We obtain the weighted prediction results of all time scales by

$$\hat{\mathbf{y}} = \sum_{k=1}^{K} \mathbf{w}_k \odot \mathbf{y}_k. \qquad (12)$$
Multilayer Perceptron (MLP) joint learning function. This is a nonlinear joint learning function. We examine whether this function is more powerful in capturing the user’s preferences at different time scales. We first concatenate the prediction results at the different time scales by

$$\mathbf{y}_c = \operatorname{concat}\big(\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_K\big), \qquad (13)$$

where concat is the concatenation operation on vectors. Then a multilayer perceptron (Gardner and Dorling, 1998) is used to obtain $\hat{\mathbf{y}}$ by

$$\mathbf{h}_l = a_l\big(\mathbf{W}_l \mathbf{h}_{l-1} + \mathbf{b}_l\big), \quad \mathbf{h}_0 = \mathbf{y}_c, \qquad (14)$$

where $\mathbf{W}_l$, $\mathbf{b}_l$, and $a_l$ denote the weight matrix, bias vector, and activation function of the $l$-th perceptron layer respectively. For the activation functions of the MLP layers, one can freely choose among sigmoid, hyperbolic tangent (tanh), and rectifier (ReLU), among others.
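The four joint functions can be sketched as follows under toy assumptions of $K = 3$ scales and $N = 5$ items; the weights and MLP parameters below are random stand-ins for the learned ones:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 3, 5                          # number of time scales, number of items
Y = rng.uniform(size=(K, N))         # per-scale prediction vectors y_k (rows)

avg_joint = Y.mean(axis=0)                       # average (Eq. 10)
max_joint = Y.max(axis=0)                        # element-wise max (Eq. 11)

w = rng.uniform(size=(K, 1))
w /= w.sum()
weighted_joint = (w * Y).sum(axis=0)             # weighted sum (Eq. 12)

# one-layer MLP on the concatenated predictions (Eqs. 13-14), sigmoid activation
W1 = rng.normal(scale=0.1, size=(N, K * N))
b1 = np.zeros(N)
mlp_joint = 1.0 / (1.0 + np.exp(-(W1 @ Y.reshape(-1) + b1)))
```

All four map the $K \times N$ matrix of per-scale predictions to a single length-$N$ score vector; only the MLP can mix scores across different items.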
3.3. The Loss Function for Optimization
Our LSDM is optimized by a joint learning process. The objective functions to be optimized at the different time scales are denoted as $\{L_{w_1}, L_{w_2}, \dots, L_{w_K}\}$. The overall objective function of our LSDM can be defined as

$$L(\Theta) = L_g + \sum_{k=1}^{K} L_{w_k}, \qquad (15)$$

where $S^u_{w_1}, \dots, S^u_{w_K}$ are the sequences at the different time scales, with model parameters $\Theta_{w_1}, \dots, \Theta_{w_K}$; $g$ (see Eq. 9) and the per-scale prediction functions are the learning functions of LSDM and of each time-scale model respectively; and $\Theta$ denotes the full set of parameters to be learned in LSDM.
For a time scale $w_k$, we adopt a weighted cross-entropy as the optimization objective at each step of the LSTM:

$$L_{w_k} = -\sum_{t} \sum_{i \in \mathcal{I}} \big[\lambda_p\, y_{t,i} \log p_{t,i} + \lambda_n\, (1 - y_{t,i}) \log(1 - p_{t,i})\big], \qquad (16)$$

where $T^u_t$ is the $t$-th transaction at time scale $w_k$, and $p_{t,i}$ is the probability of item $i$ being purchased in the next transaction (see Eq. 8). If item $i$ is purchased in the next transaction, $y_{t,i} = 1$; otherwise, $y_{t,i} = 0$. $\lambda_p$ and $\lambda_n$ are the weights of positive and negative instances (i.e., whether the item is purchased in the next transaction or not). These weights are used to cope with the unbalanced numbers of positive and negative examples. In our experiments, the ratio of negative to positive instances is about 500, so we set $\lambda_p$ to be 500 times higher than $\lambda_n$ to reduce the training bias.
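The re-balancing effect of the two weights can be sketched as follows; the function name and toy numbers are ours, and with the roughly 500:1 imbalance above, a missed positive dominates the loss:

```python
import numpy as np

def weighted_cross_entropy(p, y, w_pos, w_neg, eps=1e-12):
    """Weighted binary cross-entropy over all items at one step (Eq. 16).
    p: predicted purchase probabilities, y: 0/1 labels for the next
    transaction, w_pos/w_neg: weights of positive/negative instances."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.sum(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 0.0, 0.0])               # one item purchased next
good = np.array([0.9, 0.1, 0.1, 0.1])            # high probability on the positive
bad = np.array([0.1, 0.9, 0.1, 0.1])             # probability mass on a negative
loss_good = weighted_cross_entropy(good, y, w_pos=500.0, w_neg=1.0)
loss_bad = weighted_cross_entropy(bad, y, w_pos=500.0, w_neg=1.0)
```

Without the up-weighting, a model could trivially shrink the loss by predicting near-zero probability for every item.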
In our LSDM, the joint objective function $L_g$ is defined analogously as a weighted cross-entropy:

$$L_g = -\sum_{i \in \mathcal{I}} \big[\lambda_p\, y_i \log \hat{p}_i + \lambda_n\, (1 - y_i) \log(1 - \hat{p}_i)\big], \qquad (17)$$

where $\hat{p}_i$ is the probability of item $i$ predicted by our LSDM (see Eq. 9).
After training, given a user’s purchase history, we can obtain the probability of each item being purchased at the next step according to Eq. 8. We then rank the items according to their probabilities, and select the top-$K$ results as the final recommended items for the user.
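This final ranking step amounts to sorting items by their predicted probabilities; the names below are illustrative:

```python
import numpy as np

def recommend_top_k(probs, item_ids, k):
    """Rank all items by predicted purchase probability, return the top k."""
    order = np.argsort(probs)[::-1][:k]
    return [item_ids[j] for j in order]

probs = np.array([0.10, 0.70, 0.05, 0.15])
top2 = recommend_top_k(probs, ["a", "b", "c", "d"], k=2)   # ["b", "d"]
```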
4. Experiments
We conduct experiments on three realworld datasets to verify the effectiveness of our proposed model for nextitem recommendation. In particular, we aim at answering the following questions:

Q1: Are the purchase demands of users really useful for the next-item recommendation task?

Q2: Are multi-time scales more powerful in capturing users’ long-short purchase demands?

Q3: Does the joint learning function effectively incorporate users’ purchase demands at multiple time scales?
In the following section, we will first introduce our experimental settings, including datasets, baselines, and evaluation metrics. Then we will analyze the various experimental results to answer the three questions one by one.
4.1. Experimental Settings
4.1.1. Datasets.
We experiment with three real-world datasets: TaFeng (http://www.bigdatalab.ac.cn/benchmark/bm/dd?data=TaFeng), BeiRen (http://www.bigdatalab.ac.cn/benchmark/bm/dd?data=TaFeng) and Amazon (http://jmcauley.ucsd.edu/data/amazon/). TaFeng and BeiRen are two online shopping datasets with real purchase records of users; they are the only publicly available datasets we are aware of that contain real purchase histories of users. We also conduct experiments on the Amazon dataset, which is commonly used in many recommender models. Amazon is a review dataset: users’ purchase records are collected from reviews (note that a review usually implies a purchase, but a user may purchase an item without leaving a review).

TaFeng (Wang et al., 2015a) is a grocery shopping dataset covering products ranging from food and office supplies to furniture. We use one quarter (i.e., December 2000 to February 2001) of the shopping transactions of the TaFeng supermarket. Since it is unreliable to include users with few purchases or limited active time in the evaluation, we first remove products bought fewer than 15 times and then keep users with purchase records in at least 5 weeks. This leaves 7,044 items and 1,951 users with 90,986 purchase records in total. The average number of purchase records per user is 50, and the average number of times each item was purchased is 14.

BeiRen (Le et al., 2017) is an online shopping dataset. We use 4 months of data (April 2013 to July 2013) from BeiRen and apply the same data filtering as for the TaFeng dataset. Finally, we obtain 211,519 purchase records involving 3,264 users and 5,818 items. The average number of purchase records per user is 65.

Amazon (He and McAuley, 2016) is one of the largest Internet retailers in the world. We can only obtain product review records. Since users usually post reviews only after making product purchases, we assume that reviews on Amazon correspond to actual purchases most of the time (Bai et al., 2018). We use review records from half a year (i.e., January 1st, 2014 to June 30th, 2014). We first remove products purchased fewer than 5 times and then retain users with purchase records in at least 5 weeks. We obtain 6,092 items and 1,443 users with 15,811 purchases. The average number of products reviewed per user is 11.
4.1.2. Time Scales Selection.
To implement the multi-time scale model, we need to determine which time scales to use. As introduced in Sec. 2.2, a time scale is used to cluster successive products together (i.e., short-time demands) and to cope with users’ repeated purchase demands for items at a certain time frequency (i.e., long-time demands). Studies in marketing strategies and human behaviors (Tsai and Chiu, 2004; Nowak and Vallacher, 1998; Rahimi and Wang, 2013) provide abundant evidence that a regular structure is a defining feature of human activities, with the strongest influence exerted by daily regularities, followed by weekly and seasonal ones. It has also been found that the “rhythms of life” are a superimposition of different regularities (Nowak and Vallacher, 1998). So it is well justified to model users’ long-short purchase demands by following these strongest rhythms, i.e., daily, weekly and seasonal. Considering that the time periods covered by our datasets are less than half a year, we select the daily and weekly scales in our experiments. We also keep the original sequence information of users’ purchase histories, which we call the item scale. The usefulness of these rhythm-based time scales (i.e., the daily and weekly scales) is demonstrated in Sec. 4.3.
4.1.3. Evaluation Metrics.
Given a user, we infer the next item that the user would probably buy at the next purchase. Each candidate method produces an ordered list of items for the recommendation. We adopt two widely used ranking-based metrics to evaluate the performance of a ranked list: Hit ratio at rank $K$ (Hit@$K$) and Normalized Discounted Cumulative Gain at rank $K$ (NDCG@$K$).

Hit ratio at rank $K$ (Hit@$K$). Given the predicted ordered list of items for each user, Hit@$K$ is defined as:

$$\text{Hit@}K = \frac{\#\text{hits}@K}{|\mathcal{U}|}, \qquad (18)$$

where $\#\text{hits}@K$ is the number of users whose test item appears in the top-$K$ list.

Normalized Discounted Cumulative Gain at rank $K$ (NDCG@$K$). Given the predicted ordered list of items for a user, NDCG@$K$ is defined as:

$$\text{NDCG@}K = \frac{1}{Z} \sum_{j=1}^{K} \frac{2^{r_j} - 1}{\log_2(j+1)}, \qquad (19)$$

where $j$ is the position of an item in the ranking list, $r_j$ is 1 if the item at position $j$ was adopted by the user in the original dataset and 0 otherwise, and $Z$ is a normalizer such that the ideal ranking yields a score of 1.

Hit@$K$ intuitively measures whether the test item is present in the top-$K$ list, while NDCG@$K$ additionally accounts for the position of the hit by assigning higher scores to hits at higher ranks. We report the top-$K$ (i.e., $K = 5$ and $K = 10$) items in the ranking list as the recommended set.
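Since each user contributes a single held-out item, both metrics reduce to the simple per-user forms below (a sketch under that single-target assumption; function names are ours):

```python
import numpy as np

def hit_at_k(ranked, target, k):
    """1 if the held-out item appears in the top-k list, else 0 (Eq. 18 per user)."""
    return int(target in ranked[:k])

def ndcg_at_k(ranked, target, k):
    """With one relevant item the ideal DCG is 1/log2(2), so the
    per-user NDCG is log(2)/log(pos+1) at the hit position (Eq. 19)."""
    for pos, item in enumerate(ranked[:k], start=1):
        if item == target:
            return np.log(2) / np.log(pos + 1)
    return 0.0

ranked = ["a", "b", "c", "d"]
```

A hit at rank 1 yields NDCG of 1.0, while a hit at rank 3 yields 0.5, so NDCG rewards placing the target higher in the list where Hit@$K$ treats all positions within the top $K$ equally.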
4.1.4. Baseline Methods Compared.
We compare our model with state-of-the-art methods from different types of recommendation approaches, including:

Pop. It ranks the items according to their popularity, measured by the number of times they were purchased. This is a widely used simple baseline.

BPR (Rendle et al., 2009). It optimizes the MF model with a pairwise ranking loss. This is a state-of-the-art model for item recommendation, but sequential information is ignored in this method.

FPMC (Rendle et al., 2010): It learns a transition matrix based on underlying Markov chains. Sequential behavior is modeled only between adjacent transactions.

HRM (Wang et al., 2015a): It employs a neural network to perform nonlinear operations that integrate the representations of customers and the purchase history of items from adjacent transactions.

RRN (Wu et al., 2017): This is a representative approach that utilizes RNNs to learn dynamic representations of users and items in recommender systems. Our model restricted to the plain item sequence can be regarded as equivalent to the RRN model.

NARM (Li et al., 2017): This is a state-of-the-art approach for personalized session-based recommendation with RNN models. It uses an attention mechanism to determine the relatedness of past purchases in the session to the next purchase. As our datasets do not contain explicit session information, we simulate sessions using the transactions within each day.
The above methods cover different kinds of approaches in recommender systems: BPR is a classical method among traditional recommendation approaches; FPMC and HRM are representative methods that utilize adjacent sequential information; RRN and NARM are recent methods using the whole sequential information for recommendation. Table 1 summarizes the properties of the different methods. Our LSDM is a demands-aware model: users’ short-time demands are modeled by the local sequential information of items within a transaction, and long-time demands are captured by the global information from the whole sequence of records.
Table 1. Properties of the compared methods (Pop, BPR, FPMC, HRM, RRN, NARM and LSDM): whether each method is personalized, uses local or global sequential information, and is demands-aware.
Other sequential methods, e.g., DREAM (Yu et al., 2016), TransRec (He et al., 2017a), user-based RNN (Donkers et al., 2017) and HRNN (Quadrana et al., 2017), are similar to our baseline methods, so they are not included in our comparison. For sequential methods with auxiliary information, e.g., content-based neural models (Suglia et al., 2017; Beutel et al., 2018), neural tensor factorization (Wu et al., 2018) and pattern-mining-based models (Yap et al., 2012; Song and Yang, 2014; Guidotti et al., 2017; Quadrana et al., 2018), we also do not make comparisons, due to the additional information used by them.

4.1.5. Parameter Settings.
The hyperparameters of each method with which we obtained the best prediction results are listed below. (1) BPR: the numbers of latent factors are 300, 300 and 200, and the learning rates are 0.001, 0.001 and 0.0005 on the TaFeng, BeiRen and Amazon datasets respectively. (2) FPMC: the numbers of latent factors are 32, 32 and 16, and the learning rates are 0.015, 0.01 and 0.001 on the three datasets. (3) HRM: the embedding size is 40, the learning rate is 0.005 and the dropout rate is 0.5 on all datasets. (4) RRN: the embedding size is 50 and the learning rate is 0.001 on all datasets; the batch sizes are 100, 20 and 100 respectively. (5) NARM: the embedding sizes are 25, 15 and 20, with 25, 25 and 20 hidden units; the learning rates are 0.0001, 0.0008 and 0.0008 and the batch sizes are 256, 640 and 640 on the three datasets. (6) LSDM: the embedding size is 50 and the learning rate is 0.001 on all datasets; the batch sizes are 100, 20 and 100 on the three datasets respectively.
For all methods, we take the last item of each user as the prediction target, the penultimate item as validation data for model selection, and the remaining part of each sequence as training data to optimize the model parameters.
4.2. Performance Comparison (RQ1)
We present the results of Hit@$K$ and NDCG@$K$ (i.e., $K = 5$ and $K = 10$) for next-item recommendation in Table 2.
Datasets  TaFeng  BeiRen  Amazon  

Models  Hit@5  Hit@10  NDCG@5  NDCG@10  Hit@5  Hit@10  NDCG@5  NDCG@10  Hit@5  Hit@10  NDCG@5  NDCG@10 
Pop  0.0731  0.0862  0.0573  0.0663  0.1743  0.1982  0.1035  0.1109  0.0133  0.0188  0.0095  0.0114 
BPR  0.0928  0.1065  0.0698  0.0822  0.1814  0.2114  0.1255  0.1367  0.0172  0.0249  0.0105  0.0121 
FPMC  0.0945  0.1100  0.0772  0.0829  0.1843  0.2155  0.1273  0.1390  0.0174  0.0256  0.0102  0.0122 
HRM  0.0912  0.1143  0.0770  0.0825  0.1863  0.2001  0.1082  0.1120  0.0169  0.0236  0.0121  0.0134 
RRN  0.0984  0.1117  0.0720  0.0833  0.1869  0.2190  0.1317  0.1430  0.0163  0.0245  0.0114  0.0136 
NARM  0.1021  0.1186  0.0789  0.0833  0.1824  0.2053  0.1487  0.1518  0.0177  0.0261  0.0117  0.0144 
LSDM  0.1194*  0.1281*  0.0824*  0.0890*  0.2187*  0.2290*  0.1617*  0.1646*  0.0182*  0.0265  0.0119  0.0147 

Note: LSDM uses three typical time scales, i.e., item, daily and weekly (discussed in Sec. 4.3), and the MLP joint learning function.

* indicates statistically significant improvements (i.e., two-sided test) over the best baseline.
We have the following observations:
(1) Pop is the weakest baseline on all datasets, since it is a non-personalized method. BPR performs better than Pop, but not as well as FPMC, which uses adjacent sequential information from the transition cubes. This shows that local adjacent sequential information is useful for predicting the next item.
(2) HRM and RRN perform better than BPR and FPMC, which do not use neural networks, on the TaFeng and BeiRen datasets. This indicates that neural networks are capable of modeling complex interactions between users’ general taste and their sequential behavior. RRN performs better than HRM, which may be because RRN uses a recurrent neural network to learn from the whole sequence, while HRM only utilizes adjacent sequential information.
(3) NARM is the state-of-the-art neural model for the sequential prediction task, and it performs best among all the baseline methods except for Hit@$K$ on the BeiRen dataset. The attention mechanism enables NARM to attend to the most related purchases in the sessions and generate more accurate results. The effect is most visible on NDCG.
(4) Our LSDM with three typical time scales, i.e., item, daily and weekly, and the MLP joint learning function (which will be discussed in Sec. 4.3 and Sec. 4.4) performs best on almost all datasets. LSDM significantly outperforms all the baseline methods on the TaFeng and BeiRen datasets. This indicates that the long-short purchase demand information used in our model is useful for predicting the real-time purchase demand of the next item. Compared with RRN, LSDM not only utilizes global information from the whole sequence, but also captures more complex local sequential information by grouping items into transactions. Compared with NARM, the multi-time scales of LSDM can adaptively learn from different repeated purchase demands and co-purchased items, creating a more fine-grained model for users.
(5) On the Amazon dataset, although our LSDM improves the recommendation performance, the results are only significant for Hit@5. Similarly, the sequence models HRM and RRN also lose their advantage compared with BPR and FPMC. The reason is that Amazon is a review dataset, in which the purchase records of a user are collected from the user’s reviews. Users do not write reviews for all of their purchases, hence such incomplete sequences may not be well learned by sequence models.
To further verify whether there exist repeated purchases of items at different time scales, we calculate the percentage of users who purchase at least one item periodically (i.e., some item is purchased in at least half of all their transactions). The percentages are 20.4%, 41.2% and 5.3% at the daily scale, and 43.4%, 75.0% and 10.7% at the weekly scale, on the TaFeng, BeiRen and Amazon datasets respectively. This indicates that many users have repeated purchase records in the purchase datasets TaFeng and BeiRen, while repeated purchases are far fewer in Amazon, since it is a review dataset with incomplete real purchase records. Due to the incomplete sequence information in the Amazon dataset, in the following analysis we only use the real purchase datasets TaFeng and BeiRen to demonstrate the effectiveness of our model.
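The periodic-purchase statistic above can be computed with a sketch like the following (criterion simplified to: some item appears in at least half of the user’s transactions; names and toy data are ours):

```python
def repeat_purchase_ratio(users):
    """Fraction of users with at least one item bought in
    at least half of their transactions."""
    def is_repeater(transactions):
        items = set().union(*transactions)
        return any(sum(i in t for t in transactions) >= len(transactions) / 2
                   for i in items)
    return sum(is_repeater(t) for t in users) / len(users)

users = [
    [{"milk"}, {"milk", "bread"}, {"milk"}],   # milk appears in 3/3 transactions
    [{"pen"}, {"book"}, {"lamp"}],             # no repeated item
]
ratio = repeat_purchase_ratio(users)           # 0.5
```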
4.3. Usefulness of MultiTime Scales (RQ2)
In our LSDM, we use the rhythm-based time scales, i.e., the daily and weekly scales, together with the item scale. To demonstrate the usefulness of the multi-time scales used in our model, we address two issues:
The effectiveness of rhythm-based time scales. Following the “rhythms of life” theory in society (Nowak and Vallacher, 1998), we use the daily and weekly time scales (i.e., LSDM-day and LSDM-week), termed rhythm-based time scales in our experiments. To demonstrate their effectiveness, we generate other non-rhythm-based time scales: we cluster every two, five or ten items in the purchase sequence into a transaction. Borrowing terms from natural language processing, we call these the 2-gram, 5-gram and 10-gram scales (i.e., LSDM-2gram, LSDM-5gram and LSDM-10gram). We compute the increase ratios on Hit@$K$ and NDCG@$K$ of the different time scales compared to the model with the item scale only (i.e., LSDM-item).
The usefulness of multi-time scales. To further examine the usefulness of our multi-time scales, we compare our full LSDM with its degraded versions, i.e., LSDM-item, LSDM-day and LSDM-week.
We compute the percentage of improvement in Hit@K and NDCG@K of the different methods over the model with the item scale only (i.e., LSDM-item), and present the results in Fig. 3. We can see that: (1) Both the rhythm-based and non-rhythm-based LSDM variants are better than the model with the item scale only. This shows that much of the underlying long- and short-time purchase demand, e.g., co-purchase information, cannot be extracted from the raw item sequence; any of the time scales we consider helps to learn the long- and short-time demands for the next-item prediction task. (2) The rhythm-based variants perform better than the non-rhythm-based ones on all datasets, indicating that the time rhythms, i.e., daily and weekly, really help to capture users' purchase demands for next-item prediction. (3) The full LSDM with multiple time scales performs best, i.e., better than the degraded LSDM-day and LSDM-week, indicating that users' complex purchase demands are well captured by multiple time scales. The performance of the daily scale is higher than that of the weekly scale, which suggests that the time scale of a day is more useful than that of a week on our datasets. However, this observation is highly data dependent: a larger time scale may become more useful on a dataset with a larger time span (e.g., covering records of several years). Nevertheless, we can conclude that different time scales are useful to detect different user purchase demands. These time scales tend to be complementary, leading to improved results when they are all considered. The multi-time-scale architecture of our model is flexible enough to learn from additional time scales, e.g., yearly periods, and is easy to extend once we have a longer observation of users' purchase records.
4.4. Effects of Joint Training Strategy (RQ3)
We use a joint learning function (see Eq. 9) to integrate the purchase demands at multiple time scales in LSDM. In our experiments, we examine four joint learning functions: average, max, weighted and multi-layer perceptron (MLP). The MLP method uses one layer with sigmoid as the activation function. We present the results of LSDM with the four joint learning functions in terms of Hit@K and NDCG@K (K=5 and 10) in Table 3.
Evaluation | Hit@5 | Hit@10 | NDCG@5 | NDCG@10

TaFeng
Max | 0.1148 | 0.1246 | 0.0817 | 0.0874
Avg | 0.1180 | 0.1275 | 0.0802 | 0.0883
Weighted | 0.1071 | 0.1266 | 0.0819 | 0.0872
MLP | 0.1194 | 0.1281 | 0.0824 | 0.0890

BeiRen
Max | 0.2107 | 0.2285 | 0.1496 | 0.1534
Avg | 0.2118 | 0.2234 | 0.1598 | 0.1604
Weighted | 0.1976 | 0.2242 | 0.1480 | 0.1504
MLP | 0.2187 | 0.2290 | 0.1617 | 0.1646
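The four joint learning functions can be sketched as simple fusions of the per-scale score vectors. The code below is our own illustration under assumed shapes (parameter names `w`, `W`, `b` are ours); in the paper these parameters are learned jointly with the rest of the model rather than fixed or random.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(scores, how="mlp", w=None, W=None, b=None):
    """Fuse per-scale item scores of shape (n_scales, n_items) into a
    single score vector, mirroring the four joint learning functions
    compared in Table 3 (illustrative parameters, not learned here)."""
    if how == "avg":
        return scores.mean(axis=0)
    if how == "max":
        return scores.max(axis=0)
    if how == "weighted":
        w = np.ones(scores.shape[0]) / scores.shape[0] if w is None else w
        return w @ scores
    if how == "mlp":  # one layer with a sigmoid activation
        W = rng.normal(size=(1, scores.shape[0])) if W is None else W
        b = 0.0 if b is None else b
        return sigmoid(W @ scores + b).ravel()
    raise ValueError(how)
```

The weighted variant reduces to the average when all weights are equal; the MLP variant is the only nonlinear fusion, which is consistent with it performing best in Table 3.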
We can see that the MLP joint learning function performs best on both datasets. This implies that a nonlinear function (i.e., MLP) is more effective at capturing the purchase demand information at the different time scales. To further examine the effects of joint learning, we present the training loss and the performance in Hit@5 and NDCG@5 of LSDM and its degenerated single-time-scale models (i.e., item, daily and weekly) in Fig. 4. We can see that: (1) The training loss of LSDM is smaller than that of the degenerated models, and the item scale is the worst from this perspective. (2) As the number of iterations increases, LSDM tends to outperform the degenerated models and converges faster. (3) The degenerated LSDM with the daily time scale performs better than with the weekly one, and the item scale is the worst. This indicates that much of the purchase behavior cannot be observed from the raw item sequence, while it is easier to capture with a larger time scale such as a day.
5. Related Work
Recommender systems have attracted a lot of attention from both the research community and industry. According to whether sequence information is used, we summarize the related recommendation methods as follows.
Non-sequential Methods. Traditional non-sequential approaches at the early stage can be roughly divided into two categories, namely memory-based approaches and model-based approaches (Sarwar et al., 2001). Memory-based methods mainly rely on neighborhood information for collaborative filtering, while model-based methods try to learn a prediction function from the historical data. Model-based collaborative filtering methods such as matrix factorization algorithms and their variants have proven effective in addressing the scalability and sparsity challenges of recommendation tasks (Koren et al., 2009; Rendle et al., 2009; Koren, 2008b; Chen et al., 2017). Recently, deep learning techniques have been successfully applied to recommendation tasks, and some pioneering studies have yielded promising results. Deep recommendation models mainly utilize deep learning as a powerful data representation technique, in which complicated user-item interactions and auxiliary information can be modeled in a unified representation, e.g., neural rating prediction (Salakhutdinov et al., 2007), neural collaborative filtering (He et al., 2017b; Bai et al., 2017) and autoencoder-based recommenders (Sedhain et al., 2015; Wang et al., 2015b; Wu et al., 2016). These traditional approaches and neural methods do not utilize sequential information, which prevents them from capturing a user's varying appetite for items over time.

Sequential Methods. Detecting the purchase appetites of users and their evolution over time has been an active research topic in recent years. The main approaches to modeling the sequential behavior of a user have been developed in different recommendation settings: next-basket recommendation (Rendle et al., 2010; Wang et al., 2015a; Yu et al., 2016; Guidotti et al., 2017), session-based recommendation (Hidasi et al., 2015, 2016; Quadrana et al., 2017; Li et al., 2017; Jannach and Ludewig, 2017) and direct transaction-based recommendation (Song et al., 2016; He et al., 2017a; Donkers et al., 2017; Wu et al., 2017; Beutel et al., 2018; Wu et al., 2018). Next-basket prediction aims at predicting which items a user will put in his basket. The items certainly depend on the general interests of the user, but also on the items the user has purchased in both his previous baskets and the current basket. Two main approaches have been used to address the next-basket recommendation problem: Markov Chains (MC) and Recurrent Neural Networks (RNN). The Factorizing Personalized Markov Chains (FPMC) approach (Rendle et al., 2010) models both a user's sequential behavior and general tastes by conducting a tensor factorization over the transition cubes. The RNN-based Hierarchical Representation Model (HRM) (Wang et al., 2015a) improves on FPMC by employing a two-layer architecture to construct a nonlinear hybrid aggregation of the user profile vector and the transaction representation.
The Dynamic REcurrent bAsket Model (DREAM) (Yu et al., 2016) adopts an RNN to model global sequential features that reflect interactions among baskets, and uses the hidden state of the RNN to represent a user's dynamic interests over time. Session-based sequence models are commonly used in web-page clicking scenarios. They differ from next-basket recommendation in that the order of clicks on items within a session is considered. RNN-based methods (Medsker and Jain, 2001; Hidasi et al., 2015, 2016; Quadrana et al., 2017; Li et al., 2017; Jannach and Ludewig, 2017) are usually adopted to capture the long historical records of users. In (Quadrana et al., 2017), a user's characteristics are learned by modeling the user's representation in the sequence. Rich item features are also incorporated into the RNN model to learn user preferences (Hidasi et al., 2016). To make more accurate predictions, an attention mechanism is utilized in (Li et al., 2017) to capture a user's main interests in the current session. Different from basket- and session-based methods, which generally cluster items explicitly into baskets or sessions, some transaction-based approaches directly model the sequence of item transactions (Song et al., 2016; He et al., 2017a; Donkers et al., 2017; Wu et al., 2017; Beutel et al., 2018; Wu et al., 2018). In addition, sequential patterns have also been extracted to reflect the co-occurrences (or dependencies) of items or the periodical characteristics of item purchases (Mobasher et al., 2002; Tzvetkov et al., 2005; Yap et al., 2012; Guidotti et al., 2017). Recent work also leverages low-rank tensor completion and product-category inter-purchase duration vectors (Yi et al., 2017) to model the duration of items instead of the time scales of purchases.
In all the above studies, the models work on a single sequence of items, transactions or sessions. Previous studies demonstrated that some types of purchase patterns can be extracted from such a sequence, but none of them attempted to extract different patterns at different time scales. The latter is exactly the goal of our study: our LSDM considers several sequences at different time scales so as to draw a more complete picture of the sequential behavior of the user, allowing us to discover various co-purchase patterns and repeated purchases at different time scales. Modeling users' purchases at multiple time scales enables our model to better understand users' real-time purchase demands and recommend items at the right time. We show in our experiments that this results in better predictions.
6. Conclusion
In this paper, we explored the use of different time scales for next-item recommendation. Our assumption was that users' different long- and short-time purchase demands (i.e., repetitive purchases and co-purchases) manifest themselves at different time scales. This assumption was validated by the experimental results of our model on the next-item recommendation task. Our proposed Long-Short Demands-aware Model (LSDM) captures both a user's interests in items and the user's demands over time. Experimental results on three public datasets (i.e., TaFeng, BeiRen and Amazon) demonstrate the effectiveness of our model. While the idea of using multiple time scales is validated, our implementation can be further improved, e.g., by detecting the best time scales from the data automatically. It is also possible to incorporate richer information into the recommendation process, such as attribute information of items (e.g., category, price) and textual descriptions of items. We will explore these avenues in the future.
References
 Afeche et al. (2015) Philipp Afeche, Opher Baron, Joseph Milner, and Ricky Roet-Green. 2015. Pricing and prioritizing time-sensitive customers with heterogeneous demand rates. Under review (2015).
 Bai et al. (2017) Ting Bai, Ji-Rong Wen, Jun Zhang, and Wayne Xin Zhao. 2017. A Neural Collaborative Filtering Model with Interaction-based Neighborhood. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1979–1982.
 Bai et al. (2018) Ting Bai, Xin Zhao, Yulan He, Jian-Yun Nie, and Ji-Rong Wen. 2018. Characterizing and Predicting Early Reviewers for Effective Product Marketing on E-Commerce Websites. IEEE Transactions on Knowledge and Data Engineering (2018).
 Beutel et al. (2018) Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and H Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. (2018).
 Bhagat et al. (2018) Rahul Bhagat, Srevatsan Muralidharan, Alex Lobzhanidze, and Shankar Vishwanath. 2018. Buy It Again: Modeling Repeat Purchase Recommendations. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 62–70.
 Chen et al. (2017) Chao Chen, Dongsheng Li, Qin Lv, Junchi Yan, Li Shang, and Stephen M. Chu. 2017. GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering. In AAAI. 1295–1301.
 Donkers et al. (2017) Tim Donkers, Benedikt Loepp, and Jürgen Ziegler. 2017. Sequential Userbased Recurrent Neural Network Recommendations. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 152–160.
 Gardner and Dorling (1998) Matt W Gardner and SR Dorling. 1998. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment 32, 14-15 (1998), 2627–2636.
 Guidotti et al. (2017) Riccardo Guidotti, Giulio Rossetti, Luca Pappalardo, Fosca Giannotti, and Dino Pedreschi. 2017. Next Basket Prediction using Recurring Sequential Patterns. arXiv preprint arXiv:1702.07158 (2017).
 Ha et al. (2002) Sung Ho Ha, Sung Min Bae, and Sang Chan Park. 2002. Customer’s timevariant purchase behavior and corresponding marketing strategies: an online retailer’s case. Computers & Industrial Engineering 43, 4 (2002), 801–820.
 He et al. (2017a) Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017a. Translation-based Recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 161–169.
 He and McAuley (2016) Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 507–517.
 He et al. (2017b) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017b. Neural Collaborative Filtering. In WWW. 173–182.
 He et al. (2016) Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 549–558.
 Hidasi et al. (2015) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015).
 Hidasi et al. (2016) Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, and Domonkos Tikk. 2016. Parallel recurrent neural network architectures for feature-rich session-based recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 241–248.
 Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
 Jannach and Ludewig (2017) Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 306–310.
 Koren (2008a) Yehuda Koren. 2008a. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 426–434.
 Koren (2008b) Yehuda Koren. 2008b. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD. 426–434.
 Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
 Le et al. (2017) Duc Trong Le, Hady W Lauw, and Yuan Fang. 2017. Basket-sensitive personalized item recommendation. IJCAI.
 Li et al. (2017) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1419–1428.
 Luong et al. (2015) Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).
 Medsker and Jain (2001) LR Medsker and LC Jain. 2001. Recurrent neural networks. Design and Applications 5 (2001).
 Mobasher et al. (2002) Bamshad Mobasher, Honghua Dai, Tao Luo, and Miki Nakagawa. 2002. Using sequential and nonsequential patterns in predictive web usage mining tasks. In Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on. IEEE, 669–672.
 Nowak and Vallacher (1998) Andrzej Nowak and Robin R Vallacher. 1998. Dynamical social psychology. Vol. 647. Guilford Press.
 Quadrana et al. (2018) Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. arXiv preprint arXiv:1802.08452 (2018).
 Quadrana et al. (2017) Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 130–137.
 Rahimi and Wang (2013) Seyyed Mohammadreza Rahimi and Xin Wang. 2013. Location recommendation based on periodicity of human activities and location categories. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 377–389.

 Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
 Rendle et al. (2010) Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web. ACM, 811–820.
 Salakhutdinov et al. (2007) Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey E. Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In ICML. 791–798.
 Sarwar et al. (2001) Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Itembased collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web. ACM, 285–295.

 Sedhain et al. (2015) S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. 2015. Autorec: Autoencoders meet collaborative filtering. In WWW. 111–112.
 Skaer et al. (1993) TL Skaer, DA Sclar, DJ Markowski, and JK Won. 1993. Effect of value-added utilities on prescription refill compliance and health care expenditures for hypertension. Journal of Human Hypertension 7, 5 (1993), 515–518.
 Song and Yang (2014) Wei Song and Kai Yang. 2014. Personalized Recommendation Based on Weighted Sequence Similarity. In Practical Applications of Intelligent Systems. Springer, 657–666.
 Song et al. (2016) Yang Song, Ali Mamdouh Elkahky, and Xiaodong He. 2016. Multi-rate deep learning for temporal recommendation. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 909–912.
 Suglia et al. (2017) Alessandro Suglia, Claudio Greco, Cataldo Musto, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2017. A Deep Architecture for Content-based Recommendations Exploiting Recurrent Neural Networks. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, 202–211.
 Tsai and Chiu (2004) CY Tsai and CC Chiu. 2004. A purchasebased market segmentation methodology. Expert Systems with Applications 27, 2 (2004), 265–276.
 Tzvetkov et al. (2005) Petre Tzvetkov, Xifeng Yan, and Jiawei Han. 2005. TSP: Mining top-k closed sequential patterns. Knowledge and Information Systems 7, 4 (2005), 438–457.
 Wang et al. (2015b) Hao Wang, Naiyan Wang, and DitYan Yeung. 2015b. Collaborative Deep Learning for Recommender Systems. In KDD. 1235–1244.
 Wang et al. (2015a) Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, and Xueqi Cheng. 2015a. Learning hierarchical representation model for next-basket recommendation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 403–412.
 Wu et al. (2017) Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J Smola, and How Jing. 2017. Recurrent recommender networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 495–503.
 Wu et al. (2018) Xian Wu, Baoxu Shi, Yuxiao Dong, Chao Huang, and Nitesh Chawla. 2018. Neural Tensor Factorization. arXiv preprint arXiv:1802.04416 (2018).
 Wu et al. (2016) Yao Wu, Christopher DuBois, Alice X Zheng, and Martin Ester. 2016. Collaborative denoising autoencoders for top-n recommender systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 153–162.
 Yap et al. (2012) Ghim-Eng Yap, Xiao-Li Li, and Philip S Yu. 2012. Effective next-items recommendation via personalized sequential pattern mining. In International Conference on Database Systems for Advanced Applications. Springer, 48–64.
 Yi et al. (2017) Jinfeng Yi, Cho-Jui Hsieh, Kush R Varshney, Lijun Zhang, and Yao Li. 2017. Scalable Demand-Aware Recommendation. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 2412–2421. http://papers.nips.cc/paper/6835scalabledemandawarerecommendation.pdf
 Yu et al. (2016) Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 729–732.