I. Introduction
Nowadays, a growing number of companies operate e-commerce websites that allow users to browse and purchase a variety of items via the Internet [45]. Accordingly, there is great potential value in analyzing users' item-choice behavior from clickstream data, which record users' page view (PV) histories on an e-commerce website. If we grasp a user's purchase intention behind a PV history, we can lead the user to a target page or design a special sales promotion, which gives companies an opportunity to build profitable relationships with website users [22, 33]. Companies can also use clickstream data to enhance the quality of operational forecasting and inventory management [18]. Meanwhile, users often find it difficult to select an appropriate item from the plethora of choices presented by an e-commerce website [1]. Analyzing users' item-choice behavior can improve the performance of recommender systems that help users discover new and worthwhile items [20]. For all of these reasons, a number of prior studies have investigated clickstream data from various perspectives [7]. In this paper, we focus on closely examining the relationship between users' PV histories and their item-choice behavior on an e-commerce website.
It has been demonstrated that the recency and frequency of a user's past purchases are critical indicators for purchase prediction [13, 46] and sequential pattern mining [9]. In light of this observation, Iwanaga et al. [19] developed a shape-restricted optimization model for estimating item-choice probabilities from the recency and frequency of each user's previous PVs. This method creates a two-dimensional probability table consisting of item-choice probabilities for all recency and frequency combinations of each user's previous PVs. Nishimura et al. [32] employed latent-class modeling to integrate item heterogeneity into the two-dimensional probability table. Their experimental results demonstrated that higher prediction performance was achieved with the two-dimensional probability table than with common machine learning methods, namely, logistic regression, kernel-based support vector machines, artificial neural networks, and random forests. It is notable, however, that each user's PV history is reduced to two dimensions (i.e., recency and frequency) by the two-dimensional probability table. Such a dimensionality reduction may markedly decrease the amount of information that the PV history contains about users' item-choice behavior.

This paper focuses on the PV sequence, which represents a time series of the numbers of PVs for a user–item pair in each period. In contrast to the two-dimensional probability table, the PV sequence retains detailed information contained in the PV history. However, because there is a huge number of possible PV sequences, it is extremely difficult to accurately estimate item-choice probabilities for all of them. To overcome this difficulty, we propose a shape-restricted optimization model in which a monotonicity constraint is imposed on item-choice probabilities based on a partially ordered set (poset) specialized for PV sequences. Although this optimization model contains a huge number of constraints, all redundant constraints can be eliminated according to the transitivity of the partial order. To accomplish this, we compute a transitive reduction [2] of a directed graph representing the poset. We demonstrate the effectiveness of our method through experiments using real-world clickstream data.
The main contributions of this paper are highlighted as follows.

We propose a shape-restricted optimization model for estimating item-choice probabilities from each user's previous PV sequence. This PV sequence model exploits a monotonicity constraint to provide precise estimates of item-choice probabilities.

We derive two types of posets of PV sequences based on the recency and frequency of each user's previous PVs. Experimental results show that the monotonicity constraint based on these posets greatly enhances the prediction performance of our PV sequence model.

We devise constructive algorithms for transitive reduction specialized for our posets. The time complexity of our algorithms is much smaller than that of general-purpose algorithms. Experimental results reveal that the transitive reduction improves the efficiency of our PV sequence model in terms of both computation time and memory usage.

We verify through experiments that our method achieves higher prediction performance than the two-dimensional probability table and common machine learning methods, namely, logistic regression, artificial neural networks, and random forests.
The remainder of this paper is organized as follows. Section 2 gives a brief review of related work. Section 3 explains the two-dimensional probability table [19], and Section 4 presents our PV sequence model. Section 5 describes our constructive algorithms for transitive reduction. Section 6 evaluates the effectiveness of our method through experiments. Section 7 concludes with a brief summary of our work and a discussion of future research directions.
II. Related Work
This section briefly surveys research on predicting online user behavior and discusses related work on shape-restricted regression.
II-A Prediction of Online User Behavior
A number of prior studies have aimed at predicting users' purchase behavior on e-commerce websites [10]. A mainstream line of research predicts, by means of stochastic/statistical models [5, 23, 30, 31, 36, 41, 46], whether a purchase will occur in each session; however, this approach gives no consideration to which items users choose.
Various machine learning methods have been employed to predict online item-choice behavior; these include logistic regression [12, 53], association rule mining [37], support vector machines [38, 53], ensemble learning methods [25, 26, 39, 52, 54], and artificial neural networks [21, 47, 50]. Some tailored statistical models have also been proposed; for instance, Moe [29] devised a two-stage multinomial logit model that separates the decision-making process into an item-view decision and a purchase decision. Yao et al. [51] proposed a joint framework consisting of user-level factor estimation and item-level factor aggregation based on the buyer decision process. Borges and Levene [6] employed Markov chain models to estimate the probability of a user's next link choice.
These prior studies have made effective use of clickstream data in various prediction methods. In addition, paying attention to time-evolving user behavior is crucial for precise prediction of online item-choice behavior. In light of these insights, we focus on sequences of user PVs to estimate users' item-choice probabilities on e-commerce websites. Moreover, we evaluate the prediction performance of our method by comparison with machine learning methods commonly employed in prior studies.
II-B Shape-Restricted Regression
In many practical situations, we have prior information about the relationship between explanatory and response variables. For instance, utility functions are assumed to be increasing and concave according to economic theory [28], and option pricing functions are restricted to be monotone and convex according to finance theory [3]. Shape-restricted regression fits a nonparametric function to a set of given observations under such shape restrictions (e.g., monotonicity, convexity/concavity, and unimodality) [8, 15, 16, 48].

Isotonic regression is the most commonly used method of shape-restricted regression. In general, isotonic regression is the problem of estimating a real-valued monotone (i.e., nondecreasing or nonincreasing) function with respect to a given partial order of observations [35]. Some regularization techniques [14, 44] and estimation algorithms [17, 35, 43] have been proposed for isotonic regression.
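For a totally ordered set of observations, isotonic regression can be solved exactly by the classical pool-adjacent-violators algorithm. The following is a minimal illustrative sketch (not from the paper under discussion); the function name and interface are our own.

```python
# A minimal sketch of isotonic regression via the pool-adjacent-violators
# algorithm (PAVA). `y` are observations and `w` their positive weights;
# this illustrative implementation is not taken from the paper.
def isotonic_regression(y, w=None):
    """Return the nondecreasing fit minimizing sum_i w_i * (f_i - y_i)^2."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores its weighted mean, total weight, and length.
    blocks = []  # list of [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the block means back into a full-length solution.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, the violating observations [1, 3, 2] are pooled into the nondecreasing fit [1, 2.5, 2.5].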
One of the greatest advantages of shape-restricted regression is that prediction performance can be improved because shape restrictions mitigate overfitting [4]. To exploit this advantage, Iwanaga et al. [19] devised a shape-restricted optimization model for estimating item-choice probabilities on e-commerce websites. In line with Iwanaga et al. [19], we propose a shape-restricted optimization model based on order relations of PV sequences to improve prediction performance.
III. Two-Dimensional Probability Table
This section gives a brief review of the two-dimensional probability table proposed by Iwanaga et al. [19].
III-A Empirical Probability Table
Table I. Example of a PV history of six user–item pairs; the first three numeric columns give the number of PVs (#PVs) on each day, and the last column gives the choice on the base date (April 4th).

user  item  Apr. 1st  Apr. 2nd  Apr. 3rd  choice (Apr. 4th)
u1    i1    1         0         1         0
u1    i2    0         1         0         1
u2    i1    3         0         0         0
u2    i2    0         0         3         1
u3    i1    1         1         1         0
u3    i2    2         0         1         0
Table I gives an example of a PV history of six user–item pairs. For instance, user u1 viewed the webpage of item i1 once each on April 1st and 3rd. We focus on user choices (e.g., revisit and purchase) on April 4th, which is called the base date. For instance, user u1 chose not item i1 but item i2 on the base date. For each user–item pair, recency and frequency are characterized by the day of the last PV and the total number of PVs, respectively. As shown in Table I, the PV history can be summarized by the recency and frequency combination $(r, f) \in R \times F$, where $R$ and $F$ are the index sets representing recency and frequency, respectively.

Let us denote by $n_{rf}$ the number of user–item pairs that have the combination $(r, f) \in R \times F$. We also set $x_{rf}$ to the number of choices made on the base date by user–item pairs that have $(r, f)$. In the case of Table I, the empirical probability table is calculated as
$$\hat{p}_{rf} = \frac{x_{rf}}{n_{rf}} \quad ((r, f) \in R \times F), \qquad (1)$$

where, for reasons of expediency, we set $\hat{p}_{rf} := 0$ for $(r, f) \in R \times F$ with $n_{rf} = 0$.
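The computation of the empirical probability table can be sketched as follows on toy data mirroring Table I; the variable names (`n`, `x`, `p_hat`) and the history layout are our own illustrative choices.

```python
# Build the empirical probability table (1) from a toy PV history that
# mirrors Table I: recency = day of the last PV, frequency = total #PVs.
# Data and variable names are illustrative, not taken from the paper.
from collections import defaultdict

# Each entry: (PVs on Apr. 1st-3rd, choice on the base date Apr. 4th)
history = [
    ((1, 0, 1), 0), ((0, 1, 0), 1), ((3, 0, 0), 0),
    ((0, 0, 3), 1), ((1, 1, 1), 0), ((2, 0, 1), 0),
]

n = defaultdict(int)  # n[r, f]: #user-item pairs with recency r, frequency f
x = defaultdict(int)  # x[r, f]: #choices among those pairs
for pvs, choice in history:
    days = [d + 1 for d, c in enumerate(pvs) if c > 0]
    r = max(days) if days else 0   # recency: day of the last PV
    f = sum(pvs)                   # frequency: total number of PVs
    n[r, f] += 1
    x[r, f] += choice

# Empirical item-choice probability for every observed combination (r, f).
p_hat = {key: x[key] / n[key] for key in n}
```

On this toy history, three pairs share the combination (r, f) = (3, 3), one of which led to a choice, so p_hat[3, 3] = 1/3.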
III-B Two-Dimensional Monotonicity Model
It is reasonable to assume that the recency and frequency of user–item pairs are positively associated with users' item-choice probabilities. To estimate users' item-choice probabilities $p_{rf}$ for all recency and frequency combinations $(r, f) \in R \times F$, the two-dimensional monotonicity model [19] minimizes the weighted sum of squared errors under monotonicity constraints with respect to recency and frequency:
$$\text{minimize} \quad \sum_{(r, f) \in R \times F} n_{rf}\,(p_{rf} - \hat{p}_{rf})^2 \qquad (2)$$
$$\text{subject to} \quad p_{rf} \le p_{r+1, f} \quad (\text{larger } r \text{ corresponds to more recent PVs}), \qquad (3)$$
$$p_{rf} \le p_{r, f+1} \quad (\text{larger } f \text{ corresponds to more frequent PVs}), \qquad (4)$$
$$0 \le p_{rf} \le 1 \quad ((r, f) \in R \times F). \qquad (5)$$
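The monotonicity constraints (3)–(4) can be enumerated mechanically on a recency–frequency grid; the following sketch (grid sizes are illustrative only) lists the constrained pairs.

```python
# Enumerate the monotonicity constraints (3)-(4) on a small recency-
# frequency grid: probabilities are nondecreasing in both recency r and
# frequency f. The grid sizes R and F below are illustrative only.
R, F = 4, 4  # number of recency / frequency levels

constraints = []  # pairs ((r, f), (r', f')) meaning p[r, f] <= p[r', f']
for r in range(1, R + 1):
    for f in range(1, F + 1):
        if r < R:
            constraints.append(((r, f), (r + 1, f)))  # recency constraint (3)
        if f < F:
            constraints.append(((r, f), (r, f + 1)))  # frequency constraint (4)
```

For an R x F grid this produces (R - 1) * F + R * (F - 1) constraints, i.e., 24 constraints when R = F = 4.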
It is notable, however, that different PV histories are often indistinguishable in terms of recency and frequency. A typical example is the set of the last three user–item pairs in Table I; although their PV histories are quite different, they share the same recency and frequency combination. To distinguish between these PV histories, we exploit the PV sequence in the next section.
IV. PV Sequence Model
This section presents our shape-restricted optimization model for estimating item-choice probabilities from each user's previous PV sequence.
IV-A PV Sequence
The PV sequence for each user–item pair represents a time series of the numbers of PVs:
$$\boldsymbol{v} = (v_1, v_2, \ldots, v_T),$$
where $v_t$ is the number of PVs $t$ periods ago; see also Table I. Note that the sequence terms are arranged in reverse chronological order; that is, $v_t$ moves back into the past as the index $t$ increases.

Throughout the paper, we express the set of consecutive integers as
$$[i, j] := \{i, i+1, \ldots, j\},$$
where $[i, j] = \emptyset$ when $j < i$. Then, the set of possible PV sequences is defined as
$$\mathcal{V} := [0, m]^T,$$
where $m$ is the maximum number of PVs in each period, and $T$ is the number of considered periods.
Our objective is to estimate the item-choice probabilities $p(\boldsymbol{v})$ for all PV sequences $\boldsymbol{v}$ in the set $\mathcal{V}$ of possible PV sequences. However, it is extremely difficult to accurately estimate such probabilities because there is a huge number of PV sequences. With $m = 6$ and $T = 5$, for instance, the number of different PV sequences is $(m+1)^T = 7^5 = 16{,}807$, whereas the number of recency and frequency combinations is far smaller. To avoid this difficulty, we make effective use of monotonicity constraints on item-choice probabilities, as in the optimization model (2)–(5). In the next section, we introduce three operations underlying the development of the monotonicity constraints.
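The gap between the two representations can be checked directly for a small setting; in this sketch (parameter values are illustrative) the recency of a reverse-chronological sequence is taken as the index of the most recent nonzero term.

```python
# Enumerate the set V = [0, m]^T of possible PV sequences for small m, T
# and compare its size (m+1)^T with the number of distinct recency-
# frequency summaries. Parameter values are illustrative only.
from itertools import product

def all_pv_sequences(m, T):
    """All time series (v_1, ..., v_T) with 0 <= v_t <= m."""
    return list(product(range(m + 1), repeat=T))

m, T = 2, 3
sequences = all_pv_sequences(m, T)

# Recency-frequency summaries collapse many sequences onto one cell:
# recency = index of the most recent nonzero term (0 if no PVs),
# frequency = total number of PVs.
summaries = set()
for s in sequences:
    nonzero = [t + 1 for t, v in enumerate(s) if v > 0]
    recency = min(nonzero) if nonzero else 0
    summaries.add((recency, sum(s)))
```

Already for m = 2 and T = 3 there are 27 PV sequences but only 13 distinct recency–frequency summaries.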
IV-B Operations Based on Recency and Frequency
From the perspective of frequency, it is reasonable that the item-choice probability increases as the number of PVs in a particular period grows. To formulate this reasoning, we define the following operation.
Definition 1 (Up). On the domain
$$\mathcal{V}_t^{\mathrm{up}} := \{\boldsymbol{v} \in \mathcal{V} \mid v_t \le m - 1\},$$
the function $\mathrm{up}_t$ is defined as
$$\mathrm{up}_t(\boldsymbol{v}) := (v_1, \ldots, v_{t-1},\ v_t + 1,\ v_{t+1}, \ldots, v_T).$$
For instance, we have $\mathrm{up}_2((1, 0, 2)) = (1, 1, 2)$. Since this operation increases the frequency of PVs, the monotonicity constraint $p(\boldsymbol{v}) \le p(\mathrm{up}_t(\boldsymbol{v}))$ should be satisfied by the item-choice probabilities, where $p(\boldsymbol{v})$ denotes the item-choice probability of a user–item pair with PV sequence $\boldsymbol{v}$.
From the perspective of recency, it is inferred that more recent PVs have a larger effect of increasing the item-choice probability. To formulate this inference, we consider the following operation, which moves one PV from an old period to a new period.
Definition 2 (Move). For $s < t$, on the domain
$$\mathcal{V}_{s,t}^{\mathrm{move}} := \{\boldsymbol{v} \in \mathcal{V} \mid v_s \le m - 1,\ v_t \ge 1\},$$
the function $\mathrm{move}_{s,t}$ is defined as
$$\mathrm{move}_{s,t}(\boldsymbol{v}) := (v_1, \ldots,\ v_s + 1,\ \ldots,\ v_t - 1,\ \ldots, v_T).$$
For instance, we have $\mathrm{move}_{1,3}((0, 1, 2)) = (1, 1, 1)$. Since this operation increases the number of recent PVs, the monotonicity constraint $p(\boldsymbol{v}) \le p(\mathrm{move}_{s,t}(\boldsymbol{v}))$ should be satisfied by the item-choice probabilities.
The PV sequence $(1, 1, 1)$ represents a user's continued interest in a certain item over three periods. In contrast, the PV sequence $(0, 1, 2)$ implies that the user's interest has decreased during the most recent two periods. In this sense, the monotonicity constraint $p((0, 1, 2)) \le p((1, 1, 1))$, which is imposed by the Move operation, may not be validated. Accordingly, we define the following alternative operation, which exchanges the numbers of PVs such that the number of recent PVs is increased.
Definition 3 (Swap). For $s < t$, on the domain
$$\mathcal{V}_{s,t}^{\mathrm{swap}} := \{\boldsymbol{v} \in \mathcal{V} \mid v_s < v_t\},$$
the function $\mathrm{swap}_{s,t}$ is defined as
$$\mathrm{swap}_{s,t}(\boldsymbol{v}) := (v_1, \ldots,\ v_t,\ \ldots,\ v_s,\ \ldots, v_T),$$
which exchanges the values of the $s$th and $t$th terms.

We have $\mathrm{swap}_{1,3}((0, 1, 2)) = (2, 1, 0)$ because $v_1 = 0 < 2 = v_3$, and $\mathrm{swap}_{1,2}((0, 1, 2)) = (1, 0, 2)$ because $v_1 = 0 < 1 = v_2$. Since this operation increases the number of recent PVs, the monotonicity constraint $p(\boldsymbol{v}) \le p(\mathrm{swap}_{s,t}(\boldsymbol{v}))$ should be satisfied by the item-choice probabilities. It is notable that the monotonicity constraint $p((0, 1, 2)) \le p((1, 1, 1))$ is not implied by this operation.
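The three operations can be sketched in code as follows; indices are 0-based here, and the assertions enforce the domains of Definitions 1–3 (the function names are ours).

```python
# Illustrative implementations of the operations Up, Move, and Swap on a
# PV sequence v (reverse chronological: v[0] is the most recent period).
# Indices are 0-based here, and m is the per-period cap on #PVs.
def up(v, t, m):
    """Add one PV in period t (defined only when v[t] <= m - 1)."""
    assert v[t] <= m - 1
    return v[:t] + (v[t] + 1,) + v[t + 1:]

def move(v, s, t, m):
    """Move one PV from an old period t to a newer period s < t."""
    assert s < t and v[s] <= m - 1 and v[t] >= 1
    w = list(v)
    w[s] += 1
    w[t] -= 1
    return tuple(w)

def swap(v, s, t):
    """Exchange v[s] and v[t] so that the newer period s gets more PVs."""
    assert s < t and v[s] < v[t]
    w = list(v)
    w[s], w[t] = w[t], w[s]
    return tuple(w)
```

For example, with m = 2, `up((1, 0, 2), 1, 2)` gives (1, 1, 2), `move((0, 1, 2), 0, 2, 2)` gives (1, 1, 1), and `swap((0, 1, 2), 0, 2)` gives (2, 1, 0).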
IV-C Partially Ordered Sets
Let $\mathcal{S} \subseteq \mathcal{V}$ be a subset of PV sequences. The image of each operation is then defined as
$$\mathrm{up}_t(\mathcal{S}) := \{\mathrm{up}_t(\boldsymbol{v}) \mid \boldsymbol{v} \in \mathcal{S} \cap \mathcal{V}_t^{\mathrm{up}}\},$$
and similarly for $\mathrm{move}_{s,t}(\mathcal{S})$ and $\mathrm{swap}_{s,t}(\mathcal{S})$. Let us define $\mathcal{O}_{\mathrm{UM}}(\mathcal{S})$ as the union of the images of $\mathcal{S}$ under all the Up and Move operations. The following definition states that the binary relation $\boldsymbol{v} \prec_{\mathrm{UM}} \boldsymbol{w}$ holds when $\boldsymbol{v}$ can be transformed into $\boldsymbol{w}$ by repeated application of Up and Move.

Definition 4 ($\prec_{\mathrm{UM}}$). Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. We write $\boldsymbol{v} \prec_{\mathrm{UM}} \boldsymbol{w}$ if and only if there exists a positive integer $n$ such that
$$\boldsymbol{w} \in \mathcal{O}_{\mathrm{UM}}^{\,n}(\{\boldsymbol{v}\}),$$
where $\mathcal{O}_{\mathrm{UM}}^{\,n}$ denotes $n$ repeated applications of $\mathcal{O}_{\mathrm{UM}}$. We also write $\boldsymbol{v} \preceq_{\mathrm{UM}} \boldsymbol{w}$ if $\boldsymbol{v} \prec_{\mathrm{UM}} \boldsymbol{w}$ or $\boldsymbol{v} = \boldsymbol{w}$.
Similarly, we define $\mathcal{O}_{\mathrm{US}}(\mathcal{S})$ as the union of the images of $\mathcal{S}$ under all the Up and Swap operations. Then, the binary relation $\boldsymbol{v} \prec_{\mathrm{US}} \boldsymbol{w}$ holds when $\boldsymbol{v}$ can be transformed into $\boldsymbol{w}$ by repeated application of Up and Swap.

Definition 5 ($\prec_{\mathrm{US}}$). Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. We write $\boldsymbol{v} \prec_{\mathrm{US}} \boldsymbol{w}$ if and only if there exists a positive integer $n$ such that
$$\boldsymbol{w} \in \mathcal{O}_{\mathrm{US}}^{\,n}(\{\boldsymbol{v}\}).$$
We also write $\boldsymbol{v} \preceq_{\mathrm{US}} \boldsymbol{w}$ if $\boldsymbol{v} \prec_{\mathrm{US}} \boldsymbol{w}$ or $\boldsymbol{v} = \boldsymbol{w}$.
To prove properties of these binary relations, we can use the lexicographic order, which is a well-known linear order [40].

Definition 6 ($\prec_{\mathrm{lex}}$). Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. We write $\boldsymbol{v} \prec_{\mathrm{lex}} \boldsymbol{w}$ if and only if there exists $t^{\ast} \in [1, T]$ such that $v_{t^{\ast}} < w_{t^{\ast}}$ and $v_t = w_t$ for $t \in [1, t^{\ast} - 1]$. We also write $\boldsymbol{v} \preceq_{\mathrm{lex}} \boldsymbol{w}$ if $\boldsymbol{v} \prec_{\mathrm{lex}} \boldsymbol{w}$ or $\boldsymbol{v} = \boldsymbol{w}$.
Each application of Up, Move, and Swap makes a PV sequence strictly greater in the lexicographic order. Therefore, we can obtain the following lemma.

Lemma 1. Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. If $\boldsymbol{v} \prec_{\mathrm{UM}} \boldsymbol{w}$ or $\boldsymbol{v} \prec_{\mathrm{US}} \boldsymbol{w}$, then $\boldsymbol{v} \prec_{\mathrm{lex}} \boldsymbol{w}$.
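Lemma 1 can be checked by brute force for a small setting; the sketch below (parameter values are illustrative, and the operations are re-implemented inline as in Definitions 1–3) relies on the fact that Python tuples compare lexicographically.

```python
# Brute-force check for small (m, T): each single application of Up,
# Move, or Swap makes a PV sequence strictly greater in the
# lexicographic order (Python tuples compare lexicographically).
from itertools import product

m, T = 2, 3
greater = []
for v in product(range(m + 1), repeat=T):
    for t in range(T):
        if v[t] <= m - 1:  # Up
            w = v[:t] + (v[t] + 1,) + v[t + 1:]
            greater.append(v < w)
    for s in range(T):
        for t in range(s + 1, T):
            if v[s] <= m - 1 and v[t] >= 1:  # Move
                w = list(v); w[s] += 1; w[t] -= 1
                greater.append(v < tuple(w))
            if v[s] < v[t]:  # Swap
                w = list(v); w[s], w[t] = w[t], w[s]
                greater.append(v < tuple(w))

assert all(greater)  # every operation increases the lexicographic order
```

Since each operation strictly increases the lexicographic order, no sequence of operations can return to its starting point, which is what Lemma 1 captures.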
The following theorem states that a partial order of PV sequences is derived from the operations Up and Move.

Theorem 1. The pair $(\mathcal{V}, \preceq_{\mathrm{UM}})$ is a poset.

Proof. Reflexivity and transitivity follow directly from Definition 4, and antisymmetry follows from Lemma 1 because the lexicographic order is a linear order. ∎

In the same manner, we can prove the following theorem for the operations Up and Swap.

Theorem 2. The pair $(\mathcal{V}, \preceq_{\mathrm{US}})$ is a poset.
IV-D Shape-Restricted Optimization Model
Let $n(\boldsymbol{v})$ be the number of user–item pairs that have the PV sequence $\boldsymbol{v} \in \mathcal{V}$. Also, let $x(\boldsymbol{v})$ be the number of choices made on the base date by user–item pairs that have $\boldsymbol{v}$. Similarly to Eq. (1), we can calculate the empirical item-choice probabilities as
$$\hat{p}(\boldsymbol{v}) = \frac{x(\boldsymbol{v})}{n(\boldsymbol{v})} \quad (\boldsymbol{v} \in \mathcal{V}), \qquad (6)$$
where we set $\hat{p}(\boldsymbol{v}) := 0$ for $\boldsymbol{v} \in \mathcal{V}$ with $n(\boldsymbol{v}) = 0$.
Our shape-restricted optimization model minimizes the weighted sum of squared errors subject to the monotonicity constraint:
$$\text{minimize} \quad \sum_{\boldsymbol{v} \in \mathcal{V}} n(\boldsymbol{v})\,(p(\boldsymbol{v}) - \hat{p}(\boldsymbol{v}))^2 \qquad (7)$$
$$\text{subject to} \quad p(\boldsymbol{v}) \le p(\boldsymbol{w}) \quad (\boldsymbol{v} \preceq \boldsymbol{w}), \qquad (8)$$
$$0 \le p(\boldsymbol{v}) \le 1 \quad (\boldsymbol{v} \in \mathcal{V}), \qquad (9)$$
where the partial order $\preceq$ in Eq. (8) is defined by one of $\preceq_{\mathrm{UM}}$ and $\preceq_{\mathrm{US}}$.
The monotonicity constraint (8) helps to enhance the estimation accuracy of item-choice probabilities. In addition, our shape-restricted optimization model can be used as a post-processing step to upgrade the prediction performance of other machine learning methods. Specifically, we first compute item-choice probabilities by using a machine learning method. We next substitute the computed values into $\hat{p}(\boldsymbol{v})$ and then solve the optimization model (7)–(9). Consequently, we obtain item-choice probabilities corrected by the monotonicity constraint (8). The usefulness of this approach will be illustrated in Section 6.4.
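Model (7)–(9) is a convex quadratic program, so it can be handed to any QP solver once its data are assembled. The following sketch builds that data in a generic dense form on a tiny illustrative instance; the function name, the instance, and the layout are our own, not the paper's implementation.

```python
# Assemble the PV sequence model (7)-(9) as generic quadratic-program
# data: minimize (1/2) p' P p + q' p subject to pairs p[v] <= p[w] and
# bounds 0 <= p[v] <= 1. A QP solver such as OSQP could consume such
# data after conversion to its sparse format; all names are illustrative.
def build_qp(n_counts, p_hat, order_pairs):
    """n_counts, p_hat: dicts over PV sequences; order_pairs: (v, w) with v < w."""
    seqs = sorted(n_counts)
    idx = {v: i for i, v in enumerate(seqs)}
    # The objective sum_v n(v) (p(v) - p_hat(v))^2 expands (up to a
    # constant) to (1/2) p' P p + q' p with diagonal P = 2 n(v) and
    # q = -2 n(v) p_hat(v).
    P_diag = [2.0 * n_counts[v] for v in seqs]
    q = [-2.0 * n_counts[v] * p_hat[v] for v in seqs]
    # Each order pair contributes one row p[w] - p[v] >= 0 of (8).
    rows = [(idx[v], idx[w]) for v, w in order_pairs]
    return P_diag, q, rows

# Tiny illustrative instance with T = 2 and m = 1.
n_counts = {(0, 0): 10, (1, 0): 4, (0, 1): 4, (1, 1): 2}
p_hat = {(0, 0): 0.1, (1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.0}
pairs = [((0, 0), (1, 0)), ((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 1), (1, 1))]
P_diag, q, rows = build_qp(n_counts, p_hat, pairs)
```

Separating the model data from the solver in this way also makes it easy to swap in the machine-learning predictions as the target values in the post-processing use described above.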
V. Algorithms for Transitive Reduction
[Fig. 1: (a) operation-based graph and (b) its transitive reduction.]
[Fig. 2: (a) operation-based graph and (b) its transitive reduction.]
This section describes our constructive algorithms for transitive reduction, which decrease the problem size of our shape-restricted optimization model.

V-A Transitive Reduction

A poset can be represented by a directed graph whose node set and directed edge set correspond to the PV sequences and their order relations, respectively. Each directed edge $(\boldsymbol{v}, \boldsymbol{w})$ in this graph corresponds to the order relation $\boldsymbol{v} \prec \boldsymbol{w}$, so the number of directed edges coincides with the number of constraints in Eq. (8).

Figs. 1 and 2 show directed graph representations of the posets $(\mathcal{V}, \preceq_{\mathrm{UM}})$ and $(\mathcal{V}, \preceq_{\mathrm{US}})$, respectively. Each edge in Figs. 1(a) and 2(a) corresponds to one of the operations Up, Move, and Swap, and is colored according to the operation that generates it. The directed graphs shown in Figs. 1(a) and 2(a) can be created easily.
Now, let us suppose that the graph contains the three edges $(\boldsymbol{u}, \boldsymbol{v})$, $(\boldsymbol{v}, \boldsymbol{w})$, and $(\boldsymbol{u}, \boldsymbol{w})$. In this case, the edge $(\boldsymbol{u}, \boldsymbol{w})$ is implied by the other edges due to the transitivity of the partial order:
$$\boldsymbol{u} \prec \boldsymbol{v} \ \text{ and } \ \boldsymbol{v} \prec \boldsymbol{w} \ \Longrightarrow \ \boldsymbol{u} \prec \boldsymbol{w}.$$
As a result, the edge $(\boldsymbol{u}, \boldsymbol{w})$ is redundant and can be removed from the directed graph.
A transitive reduction, also known as a Hasse diagram, of a directed graph is the subgraph obtained by removing all edges that are redundant due to the transitivity of the partial order [2]. Figs. 1(b) and 2(b) show transitive reductions of the directed graphs shown in Figs. 1(a) and 2(a), respectively. By computing transitive reductions, the number of edges is reduced from 90 to 42 in Fig. 1 and from 81 to 46 in Fig. 2. It is known that the transitive reduction is unique [2].
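The edge counts above can be reproduced in code. The sketch below assumes the figures use the setting m = 2 and T = 3 (an inference from the counts, since the operation-based graphs then have exactly 90 and 81 edges); it builds the operation-based graphs and prunes redundant edges via reachability.

```python
# Reproduce the edge counts reported for Figs. 1 and 2: with m = 2 and
# T = 3, the operation-based graphs of the posets derived from {Up, Move}
# and {Up, Swap} have 90 and 81 edges, and their transitive reductions
# have 42 and 46 edges, respectively. (Setting inferred from the text.)
from itertools import product

def operation_edges(m, T, use_move):
    edges = set()
    for v in product(range(m + 1), repeat=T):
        for t in range(T):
            if v[t] <= m - 1:  # Up
                edges.add((v, v[:t] + (v[t] + 1,) + v[t + 1:]))
        for s in range(T):
            for t in range(s + 1, T):
                if use_move and v[s] <= m - 1 and v[t] >= 1:  # Move
                    w = list(v); w[s] += 1; w[t] -= 1
                    edges.add((v, tuple(w)))
                if not use_move and v[s] < v[t]:  # Swap
                    w = list(v); w[s], w[t] = w[t], w[s]
                    edges.add((v, tuple(w)))
    return edges

def transitive_reduction(edges):
    succ = {}
    for u, w in edges:
        succ.setdefault(u, set()).add(w)
    def reachable(u):  # nodes reachable from u in one or more steps
        seen, stack = set(), list(succ.get(u, ()))
        while stack:
            z = stack.pop()
            if z not in seen:
                seen.add(z)
                stack.extend(succ.get(z, ()))
        return seen
    reach = {u: reachable(u) for u in succ}
    # Keep (u, w) only if there is no intermediate z with u -> z -> w.
    return {(u, w) for u, w in edges
            if not any(w in reach.get(z, set())
                       for z in reach[u] if z != w)}
```

Because the operations strictly increase the lexicographic order, both graphs are acyclic, so this pruning yields exactly the Hasse diagram.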
V-B General-Purpose Algorithms
The transitive reduction is characterized by the following lemma [40].

Lemma 2. Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. Then, the edge $(\boldsymbol{v}, \boldsymbol{w})$ belongs to the transitive reduction if and only if both of the following conditions are fulfilled:

(C1) $\boldsymbol{v} \prec \boldsymbol{w}$;

(C2) if $\boldsymbol{z} \in \mathcal{V}$ satisfies $\boldsymbol{v} \prec \boldsymbol{z} \preceq \boldsymbol{w}$, then $\boldsymbol{z} = \boldsymbol{w}$.
A basic strategy of general-purpose algorithms for transitive reduction involves the following steps.

Step 1: An exhaustive directed graph representing all order relations is generated from a given poset.

Step 2: The transitive reduction is computed from the directed graph using Lemma 2.
Various algorithms have been proposed to speed up the computation of Step 2. Recall that the number of nodes is $|\mathcal{V}| = (m+1)^T$ in our situation. Warshall's algorithm [49] requires $O(|\mathcal{V}|^3)$ time to complete Step 2 [40]. This time complexity can be reduced by using a sophisticated algorithm for fast matrix multiplication [24].
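Warshall's algorithm computes reachability with three nested loops over the nodes, which is where the cubic cost comes from; the following is a minimal self-contained sketch on a three-node chain.

```python
# Warshall's algorithm computes the transitive closure of a directed
# graph on n nodes in O(n^3) time; general-purpose transitive reduction
# uses such reachability information to detect redundant edges.
def warshall(adj):
    """adj: n x n boolean adjacency matrix; returns the reachability matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

# Chain 0 -> 1 -> 2: the closure also contains the implied edge 0 -> 2,
# so the direct edge 0 -> 2 (if present) would be redundant.
adj = [[False, True, False],
       [False, False, True],
       [False, False, False]]
closure = warshall(adj)
```

On the chain, the closure contains the implied relation from node 0 to node 2 even though no direct edge exists in the input.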
However, these general-purpose algorithms are clearly inefficient, especially when $|\mathcal{V}|$ is very large. In addition, a huge amount of computation is also required for Step 1. To resolve this difficulty, we devise specialized algorithms that directly construct a transitive reduction.
V-C Constructive Algorithms
Let $(\mathcal{V}, \mathcal{E}_{\mathrm{UM}})$ be the transitive reduction of a directed graph representing the poset $(\mathcal{V}, \preceq_{\mathrm{UM}})$. The transitive reduction can then be characterized by the following theorem.

Theorem 3. Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. Then, $(\boldsymbol{v}, \boldsymbol{w}) \in \mathcal{E}_{\mathrm{UM}}$ holds if and only if one of the following conditions is fulfilled:

(UM1) …;

(UM2) … such that ….

Proof. See Appendix A-A. ∎
Theorem 3 yields a constructive algorithm that directly computes the transitive reduction $(\mathcal{V}, \mathcal{E}_{\mathrm{UM}})$ without generating an exhaustive directed graph. Our algorithm is based on the breadth-first search algorithm [11]. Specifically, we start with a node list containing only the zero sequence $(0, 0, \ldots, 0)$. At each iteration of the algorithm, we choose a node $\boldsymbol{v}$ from the list, enumerate the nodes $\boldsymbol{w}$ such that $(\boldsymbol{v}, \boldsymbol{w}) \in \mathcal{E}_{\mathrm{UM}}$, and add these nodes to the list.

Table II shows this enumeration process for an example sequence. The operations Up and Move generate candidate nodes $\boldsymbol{w}$, which amounts to searching edges in Fig. 1(a). We next check whether each candidate satisfies the conditions (UM1) and (UM2) of Theorem 3. As shown in Table II, we choose the candidates fulfilling these conditions and add them to the list; this amounts to enumerating edges in Fig. 1(b).
[Table II: enumeration process of the operations Up and Move, with checks of the conditions (UM1) and (UM2).]
We present the pseudocode of our constructive algorithm (Algorithm 1) in Appendix B-A. Recalling the time complexity analysis of breadth-first search [11], one readily sees that the time complexity of Algorithm 1 is much smaller than that of the general-purpose algorithm [24], especially when $|\mathcal{V}|$ is very large.
Next, we focus on the transitive reduction $(\mathcal{V}, \mathcal{E}_{\mathrm{US}})$ of a directed graph representing the poset $(\mathcal{V}, \preceq_{\mathrm{US}})$. This transitive reduction can be characterized by the following theorem.

Theorem 4. Suppose that $\boldsymbol{v}, \boldsymbol{w} \in \mathcal{V}$. Then, $(\boldsymbol{v}, \boldsymbol{w}) \in \mathcal{E}_{\mathrm{US}}$ holds if and only if one of the following conditions is fulfilled:

(US1) … such that … and … for all …;

(US2) … such that … and … for all ….

Proof. See Appendix A-B. ∎
Theorem 4 also yields a constructive algorithm for computing the transitive reduction $(\mathcal{V}, \mathcal{E}_{\mathrm{US}})$. Let us consider again the same example sequence. As shown in Table III, the operations Up and Swap generate candidate nodes, and we choose those satisfying the conditions (US1) and (US2).
[Table III: enumeration process of the operations Up and Swap, with checks of the conditions (US1) and (US2).]
VI. Experiments
The experimental results reported in this section evaluate the effectiveness of our method for estimating item-choice probabilities.

We used real-world clickstream data collected from Tmall, a Chinese e-commerce website (https://tianchi.aliyun.com/dataset/). Specifically, we used the data set preprocessed by Ludewig and Jannach [27] (https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbwigKjcPTBI6ZPAa?dl=0). Each record corresponds to one PV and contains information such as the user ID, item ID, and time stamp. The data set involves 28,316,459 unique user–item pairs composed of 422,282 users and 624,221 items.
VI-A Methods for Comparison
Table IV. Methods for comparison.

Abbrev.   Method
2dimEmp   Empirical probability table (1) [19]
2dimMono  Two-dimensional monotonicity model (2)–(5) [19]
SeqEmp    Empirical probabilities (6) for PV sequences
SeqUM     Our PV sequence model (7)–(9) using $\preceq_{\mathrm{UM}}$
SeqUS     Our PV sequence model (7)–(9) using $\preceq_{\mathrm{US}}$
LR        Regularized logistic regression
ANN       Artificial neural network for regression
RF        Random forest of regression trees
Table V. Training and validation periods.

Pair ID  Training start  Training end       Validation
1        21 May 2015     18 August 2015     19 August 2015
2        31 May 2015     28 August 2015     29 August 2015
3        10 June 2015    7 September 2015   8 September 2015
4        20 June 2015    17 September 2015  18 September 2015
5        30 June 2015    27 September 2015  28 September 2015
Table VI. Number of constraints (#Cons) in Eq. (8) for each setting (T, m) of PV sequence.

                   #Cons in Eq. (8)
                   Enumeration             Operation          Reduction
T  m  #Vars    SeqUM       SeqUS       SeqUM    SeqUS    SeqUM   SeqUS
5  1  32       430         430         160      160      48      48
5  2  243      21,383      17,945      1,890    1,620    594     634
5  3  1,024    346,374     255,260     9,600    7,680    3,072   3,546
5  4  3,125    3,045,422   2,038,236   32,500   25,000   10,500  12,898
5  5  7,776    18,136,645  11,282,058  86,400   64,800   28,080  36,174
5  6  16,807   82,390,140  48,407,475  195,510  144,060  63,798  85,272
1  6  7        21          21          6        6        6       6
2  6  49       1,001       861         120      105      78      93
3  6  343      42,903      32,067      1,638    1,323    798     1,018
4  6  2,401    1,860,622   1,224,030   18,816   14,406   7,350   9,675
5  6  16,807   82,390,140  48,407,475  195,510  144,060  63,798  85,272
Table VII. Computation time [s] for each setting (T, m) of PV sequence; "OM" denotes out of memory.

                   Enumeration       Operation        Reduction
T  m  #Vars    SeqUM   SeqUS   SeqUM   SeqUS   SeqUM  SeqUS
5  1  32       0.00    0.01    0.00    0.00    0.00   0.00
5  2  243      2.32    1.66    0.09    0.07    0.03   0.02
5  3  1,024    558.22  64.35   3.41    0.71    0.13   0.26
5  4  3,125    OM      OM      24.07   13.86   1.72   5.80
5  5  7,776    OM      OM      180.53  67.34   9.71   36.94
5  6  16,807   OM      OM      906.76  522.84  86.02  286.30
1  6  7        0.00    0.00    0.00    0.00    0.00   0.00
2  6  49       0.03    0.01    0.01    0.00    0.00   0.00
3  6  343      12.80   1.68    0.20    0.03    0.05   0.02
4  6  2,401    OM      OM      8.07    4.09    2.12   2.87
5  6  16,807   OM      OM      906.76  522.84  86.02  286.30
Table VIII. Computational performance in the reduction case for each setting (T, m) of PV sequence.

                   #Cons in Eq. (8)  Time [s]        F1 score [%]
T  m   #Vars   SeqUM   SeqUS    SeqUM   SeqUS   SeqEmp  SeqUM  SeqUS
3  30  29,791  84,630  118,850  86.72   241.46  12.25   12.40  12.40
4  12  28,561  99,372  142,800  198.82  539.76  12.68   12.93  12.95
5  6   16,807  63,798  85,272   86.02   286.30  12.90   13.18  13.18
6  4   15,625  62,500  76,506   62.92   209.67  13.14   13.49  13.48
7  3   16,384  67,584  76,818   96.08   254.31  13.23   13.52  13.53
8  2   6,561   24,786  25,879   19.35   17.22   13.11   13.37  13.35
9  2   19,683  83,106  86,386   244.15  256.42  13.07   13.40  13.37
We compared the performance of the methods listed in Table IV. All computations were performed on an Apple MacBook Pro computer with an Intel Core i7-5557U CPU (3.10 GHz) and 16 GB of memory.

The optimization models (2)–(5) and (7)–(9) were solved using OSQP (https://osqp.org/docs/index.html) [42], a numerical optimization package for solving convex quadratic optimization problems. As in Table I, daily PV sequences $\boldsymbol{v} = (v_1, v_2, \ldots, v_T)$ were calculated for each user–item pair, where $m$ is the maximum number of daily PVs and $T$ is the number of terms (i.e., past days) in the PV sequence. In this process, all PVs more than $T$ days before the base date were added to the number of PVs $T$ days ago, and daily PV counts larger than $m$ were rounded down to $m$. Similarly, the recency and frequency combinations were calculated from daily PVs as in Table I.
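The preprocessing of raw view dates into capped daily PV sequences can be sketched as follows; the dates and the function name are illustrative.

```python
# Build a capped daily PV sequence from raw view dates, as described
# above: PVs older than T days are folded into the oldest term, and
# daily counts above m are rounded down to m. Names are illustrative.
from datetime import date

def pv_sequence(view_dates, base_date, m, T):
    """Return (v_1, ..., v_T), where v_t counts PVs t days before base_date."""
    counts = [0] * T
    for d in view_dates:
        age = (base_date - d).days  # 1 = yesterday, 2 = two days ago, ...
        if age >= 1:
            counts[min(age, T) - 1] += 1  # fold older PVs into term T
    return tuple(min(c, m) for c in counts)  # cap daily counts at m

base = date(2015, 9, 28)
views = [date(2015, 9, 27), date(2015, 9, 27), date(2015, 9, 25),
         date(2015, 9, 1), date(2015, 8, 30)]
seq = pv_sequence(views, base, m=3, T=5)
```

Here the two views far in the past both land in the oldest term, giving the sequence (2, 0, 1, 0, 2).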
The other machine learning methods (i.e., LR, ANN, and RF) were implemented using the LogisticRegressionCV, MLPRegressor, and RandomForestRegressor functions, respectively, in scikit-learn, a Python library of machine learning tools. Related hyperparameters were tuned through 3-fold cross-validation according to the parameter settings of a benchmark study [34]. These machine learning methods employed the PV sequence as input variables for computing item-choice probabilities. Here, each input variable was standardized, and undersampling was conducted to improve prediction performance.

VI-B Performance Evaluation Methodology
The preprocessed data set [27] contains five pairs of training and validation sets of clickstream data. As shown in Table V, each training period is 90 days, and the following day is the validation period. The first four pairs of training and validation sets were used for model estimation, and the fifth pair was used for performance evaluation. To examine how the sample size affects prediction performance, we prepared small-sample training sets by randomly choosing user–item pairs from the original training set. Here, the sampling rates are 0.1%, 1%, and 10%, and the original training set is referred to as "full-sample." Note that the results for the sampled training sets were averaged over ten trials.
We considered the top-$N$ selection task to evaluate prediction performance. Specifically, we focused on items that were viewed by a particular user during a training period. From these, we selected $S_N$, the set of top-$N$ items for the user according to the estimated item-choice probabilities; when two or more items had the same choice probability, recently viewed ones were selected. Let $C$ be the set of items viewed by the user in the validation period. Then, the F1 score is defined as the harmonic average of $\mathrm{Precision} = |S_N \cap C| / |S_N|$ and $\mathrm{Recall} = |S_N \cap C| / |C|$:
$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

In the following sections, we examine F1 scores averaged over all users. Note that the percentage of user–item pairs leading to item choices is only 0.16%.
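The F1 computation for one user can be sketched as follows (function name and example items are illustrative); the score is defined as zero when the selected and viewed sets do not overlap.

```python
# F1 score for the top-N selection task: the harmonic average of
# precision |S & C| / |S| and recall |S & C| / |C|, where S is the set
# of selected items and C the items viewed in the validation period.
def f1_score(selected, viewed):
    hits = len(set(selected) & set(viewed))
    if hits == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = hits / len(set(selected))
    recall = hits / len(set(viewed))
    return 2 * precision * recall / (precision + recall)
```

For example, selecting {a, b, c} when {b, d} was viewed gives precision 1/3 and recall 1/2, hence an F1 score of 0.4.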
VI-C Effects of the Transitive Reduction
We generated the constraints in Eq. (8) based on the following three directed graphs.

Case 1 (Enumeration): all edges $(\boldsymbol{v}, \boldsymbol{w})$ satisfying the order relation $\boldsymbol{v} \prec \boldsymbol{w}$ were enumerated.

Case 2 (Operation): only the edges generated by a single application of the associated operations (Up and Move for SeqUM; Up and Swap for SeqUS) were used.

Case 3 (Reduction): the transitive reduction was computed by our constructive algorithms.
Table VI gives the problem size of our PV sequence model (7)–(9) for various settings $(T, m)$ of PV sequence. Here, the column labeled "#Vars" shows the number of decision variables (i.e., $(m+1)^T$), and the subsequent columns show the number of constraints in Eq. (8) for the three cases mentioned above.

The number of constraints grew rapidly as $T$ and $m$ increased in the enumeration case. In contrast, the number of constraints was always kept smallest by the transitive reduction among the three cases. When $(T, m) = (5, 6)$, for instance, the transitive reduction decreased the number of constraints in the operation case to 32.6% for SeqUM and 59.2% for SeqUS.

The number of constraints was larger for SeqUM than for SeqUS in the enumeration and operation cases. In contrast, the number of constraints was often smaller for SeqUM than for SeqUS in the reduction case. This means that the transitive reduction had a greater impact on SeqUM than on SeqUS in terms of the number of constraints.

Table VII gives the computation time required for solving the optimization problem (7)–(9) for various settings $(T, m)$ of PV sequence. Here, "OM" indicates that the computation was aborted because it ran out of memory. The enumeration case often ran out of memory because of the huge number of constraints; see also Table VI. The operation and reduction cases completed the computations for all settings of PV sequence. Moreover, the transitive reduction made the computations faster. A notable example is SeqUM with $(T, m) = (5, 6)$: the computation time in the reduction case (i.e., 86.02 s) was only about one-tenth of that in the operation case (i.e., 906.76 s). These results demonstrate that the transitive reduction improves the efficiency in terms of both computation time and memory usage.

Table VIII gives the computational performance of our optimization model (7)–(9) for various settings $(T, m)$ of PV sequence. Here, for each $T$, the largest $m$ was chosen such that the computation finished within 30 min. Both SeqUM and SeqUS always delivered higher F1 scores than SeqEmp did. This means that our monotonicity constraint (8) works well for improving prediction performance. The F1 scores provided by SeqUM and SeqUS were very similar, and they were largest with $(T, m) = (7, 3)$. In view of these results, we focus on the setting $(T, m) = (7, 3)$ in the following sections.
VI-D Prediction Performance of Our PV Sequence Model
[Fig. 5: estimated item-choice probabilities; panels (a)–(c) SeqEmp, (d)–(f) SeqUM, (g)–(i) SeqUS.]
[Fig. 6: estimated item-choice probabilities; panels (a)–(c) SeqEmp, (d)–(f) SeqUM, (g)–(i) SeqUS.]
Fig. 3 shows the F1 scores of the two-dimensional probability table and our PV sequence model using the sampled training sets for several numbers $N$ of selected items, with the PV sequence setting $(T, m) = (7, 3)$.

When the full-sample training set was used, SeqUM and SeqUS always delivered better prediction performance than the other methods did. When the 1%- and 10%-sampled training sets were used, the prediction performance of SeqUS decreased slightly, whereas SeqUM still performed best of all the methods. When the 0.1%-sampled training set was used, 2dimMono always performed better than SeqUS did, and in some cases 2dimMono showed the best prediction performance of all the methods. These results suggest that our PV sequence model performs very well, especially when the sample size is sufficiently large.

The prediction performance of SeqEmp deteriorated rapidly as the sampling rate decreased, and this performance was always much worse than that of 2dimEmp. Meanwhile, SeqUM and SeqUS maintained high prediction performance even when the 0.1%-sampled training set was used. This means that the monotonicity constraint (8) in our PV sequence model is more effective than the monotonicity constraints (3)–(4) in the two-dimensional monotonicity model.
Fig. 4 shows the F1 scores of the machine learning methods (i.e., LR, ANN, and RF) and our PV sequence model (i.e., SeqUM) using the full-sample training set for several numbers $N$ of selected items, with the PV sequence setting $(T, m) = (7, 3)$. Note that in this figure, SeqUM(·) represents the optimization model (7)–(9) in which the item-choice probabilities computed by each machine learning method were substituted into $\hat{p}(\boldsymbol{v})$; see also Section 4.4.

SeqUM delivered better prediction performance than all the machine learning methods did, except in the case of Fig. 4(f), where only LR showed better prediction performance. Moreover, SeqUM(·) improved the prediction performance of the machine learning methods, and these improvements were especially large for ANN and RF. This means that our monotonicity constraint (8) is also very helpful in correcting the prediction values of other machine learning methods.
VI-E Analysis of Estimated Item-Choice Probabilities
Fig. 5 shows itemchoice probabilities estimated by our PV sequence model using the fullsample training set, where the setting of PV sequence is . Here, we focus on PV sequences of the form and depict estimates of itemchoice probabilities on for each . Note also that the number of associated user–item pairs got smaller as the value of increased.
Since SeqEmp takes no account of the monotonicity constraint (8), the item-choice probabilities estimated by SeqEmp have irregular shapes for . In contrast, the item-choice probabilities estimated with the monotonicity constraint (8) are relatively smooth. Because of the Up operation, the item-choice probabilities estimated by SeqUM and SeqUS increase as moves from to . Because of the Move operation, the item-choice probabilities estimated by SeqUM also increase as moves from to . On the other hand, the item-choice probabilities estimated by SeqUS are relatively high around . This highlights the difference in the monotonicity constraint (8) between the two posets and .
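A simple way to sanity-check a set of estimates against a monotonicity constraint of this kind is to scan the cover pairs of the poset and list those whose estimates decrease. The helper below is an illustrative sketch; the names `estimates` (a mapping from sequence to estimated probability) and `cover_pairs` (pairs ordered from smaller to larger element) are hypothetical:

```python
def violations(estimates, cover_pairs):
    """Return the cover pairs (lo, hi) whose estimates violate
    monotonicity, i.e., the probability decreases from lo to hi."""
    return [(lo, hi) for lo, hi in cover_pairs
            if estimates[lo] > estimates[hi]]
```

An empty result means the estimates are monotone along every cover relation, and hence along the whole partial order.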
Fig. 6 shows item-choice probabilities estimated by our PV sequence model using the 10%-sampled training set, where the setting of the PV sequence is . In this case, because the sample size was reduced, the item-choice probabilities estimated by SeqEmp are highly unstable. In particular, the item-choice probabilities were estimated to be zero for all with in Fig. 6(c); however, this is unreasonable from the perspective of frequency. In contrast, SeqUM and SeqUS estimated item-choice probabilities that increase monotonically with respect to .
VII Conclusion
This paper dealt with a shape-restricted optimization model for estimating item-choice probabilities on an e-commerce website. Our monotonicity constraint, based on tailored order relations, made it possible to obtain better estimates of item-choice probabilities for all possible PV sequences. To improve the computational efficiency of our optimization model, we devised constructive algorithms for transitive reduction, which removes all redundant constraints from the optimization model.
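For intuition, the transitive reduction of a directed acyclic graph can be computed generically by discarding every edge whose endpoints remain connected through some longer path. The sketch below shows this general-purpose approach only; our specialized algorithms instead exploit the structure of the posets and achieve better time complexity. The representation (`succ`, a dictionary mapping each node to its set of direct successors) is an assumption of the sketch:

```python
def transitive_reduction(succ):
    """Transitive reduction of a DAG: drop each edge (u, v) if v is
    still reachable from u without using that edge (generic sketch)."""
    def reachable(src, dst, skip_edge):
        # Depth-first search that ignores the single edge `skip_edge`.
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u in seen:
                continue
            seen.add(u)
            for w in succ[u]:
                if (u, w) != skip_edge:
                    stack.append(w)
        return False

    reduced = {u: set(vs) for u, vs in succ.items()}
    for u in succ:
        for v in list(succ[u]):
            if reachable(u, v, (u, v)):
                reduced[u].discard(v)  # edge is implied by transitivity
    return reduced
```

For a DAG the transitive reduction is unique, so each redundant edge can be tested against the original graph independently.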
We assessed the effectiveness of our method through experiments using real-world clickstream data. The experimental results demonstrated that the transitive reduction enhanced the efficiency of our optimization model in terms of both computation time and memory usage. In addition, our method delivered better prediction performance than did the two-dimensional monotonicity model [19] and the common machine learning methods. Our method was also helpful in correcting prediction values computed by other machine learning methods.
Our research contribution is threefold. First, we derived two types of posets by exploiting the properties of recency and frequency of a user's previous PVs. These posets allow us to place appropriate monotonicity constraints on item-choice probabilities. Next, we developed algorithms for transitive reduction specialized for our posets. These algorithms are more efficient in terms of time complexity than general-purpose algorithms for transitive reduction. Finally, our method demonstrates the great potential of shape-restricted regression for predicting user behavior on e-commerce websites.
Once the optimization model for estimating item-choice probabilities has been solved, the obtained results can easily be put into practical use on e-commerce websites. Accurate estimates of item-choice probabilities will be useful in customizing a sales promotion according to the needs of a particular user. In addition, our method, which can estimate user preferences from clickstream data, aids in creating a high-quality user–item rating matrix for recommender algorithms [20].
A future direction of study will be to develop new posets that further improve the prediction performance of our PV sequence model. Another direction of future research will be to incorporate user/item heterogeneity into our optimization model, as in the latent-class modeling of the two-dimensional probability table [32].
Appendix A Proofs
A-A Proof of Theorem 3
The “only if” part
First, we suppose that . We then have from Definition 4 and Lemma 2. Therefore, we consider the following two cases.
Case 1: for some
For the sake of contradiction, we assume that (i.e., ).
Then there exists an index such that .
If , then we have and .
If , then we have and .
These results imply that , which contradicts due to condition (C2) of Lemma 2.
Case 2: for some
We assume that (i.e., ) for the sake of contradiction.
Then there exists an index such that .
If , then we have and .
If , then we have and .
These results imply that , which contradicts due to condition (C2) of Lemma 2.
The “if” part
Next, we show that in the following two cases.
Case 1: Condition (UM1) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied.
To derive condition (C2), we consider such that .
From Lemma 1, we have .
Since is next to in the lexicographic order, we have .
Case 2: Condition (UM2) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied.
To derive condition (C2), we consider such that .
From Lemma 1, we have , which implies that for all .
Therefore, we cannot apply any operations to for in the process of transforming from into .
To keep the value of constant, we can apply only the Move operation.
However, once the Move operation is applied to for , the resultant sequence cannot be converted into .
As a result, only can be performed.
This means that or .
A-B Proof of Theorem 4
The “only if” part
First, we suppose that . We then have from Definition 5 and Lemma 2. Therefore, we consider the following two cases.
Case 1: for some
For the sake of contradiction, we assume that for some .
If , then we have and .
If , then we have and .
These results imply that , which contradicts due to condition (C2) of Lemma 2.
Case 2: for some
For the sake of contradiction, we assume that for some .
If , then we have , , and .
If , then we have and .
If