Predicting Online Item-choice Behavior: A Shape-restricted Regression Perspective

04/18/2020 ∙ by Naoki Nishimura, et al. ∙ University of Tsukuba

This paper is concerned with examining the relationship between users' page view (PV) history and their item-choice behavior on an e-commerce website. We focus particularly on the PV sequence, which represents a time series of the number of PVs for each user–item pair. We propose a shape-restricted optimization model to accurately estimate item-choice probabilities for all possible PV sequences. In this model, we impose monotonicity constraints on item-choice probabilities by exploiting partial orders specialized for the PV sequences based on the recency and frequency of each user's previous PVs. To improve the computational efficiency of our optimization model, we devise efficient algorithms for eliminating all redundant constraints according to the transitivity of the partial orders. Experimental results using real-world clickstream data demonstrate that higher prediction performance is achieved with our method than with the state-of-the-art optimization model and common machine learning methods.


I Introduction

Nowadays, a growing number of companies operate e-commerce websites that allow users to browse and purchase a variety of items via the Internet [45]. Consequently, there is great potential value in analyzing users’ item-choice behavior from clickstream data, which is a record of users’ page view (PV) history on an e-commerce website. If we can grasp a user’s purchase intention from the PV history, we can lead the user to a target page or design a special sales promotion. This gives companies an opportunity to build profitable relationships with website users [22, 33]. Companies can also use clickstream data to enhance the quality of operational forecasting and inventory management [18]. Meanwhile, users often find it difficult to select an appropriate item from the plethora of choices presented by an e-commerce website [1]. Analyzing users’ item-choice behavior can improve the performance of recommender systems that assist users in discovering new and worthwhile items [20]. For all of these reasons, a number of prior studies have investigated clickstream data from various perspectives [7]. In this paper, we closely examine the relationship between users’ PV history and their item-choice behavior on an e-commerce website.

It has been demonstrated that the recency and frequency of a user’s past purchases are critical indicators for purchase prediction [13, 46] and sequential pattern mining [9]. In light of this observation, Iwanaga et al. [19] developed a shape-restricted optimization model for estimating item-choice probabilities from the recency and frequency of each user’s previous PVs. Their method creates a two-dimensional probability table consisting of item-choice probabilities for all recency and frequency combinations. Nishimura et al. [32] employed latent-class modeling to integrate item heterogeneity into the two-dimensional probability table. Their experimental results demonstrated that higher prediction performance was achieved with the two-dimensional probability table than with common machine learning methods, namely, logistic regression, kernel-based support vector machines, artificial neural networks, and random forests. It is notable, however, that the two-dimensional probability table reduces each user’s PV history to only two dimensions (i.e., recency and frequency). Such a dimensionality reduction may markedly decrease the amount of information contained in the PV history about users’ item-choice behavior.

This paper focuses on the PV sequence, which represents a time series of the number of PVs for each user–item pair in each period. In contrast to the two-dimensional probability table, the PV sequence retains detailed information contained in the PV history. However, because there are a huge number of possible PV sequences, it is extremely difficult to accurately estimate item-choice probabilities for all of them. To overcome this difficulty, we propose a shape-restricted optimization model in which the monotonicity constraint is imposed on item-choice probabilities based on a partially ordered set (poset) specialized for PV sequences. Although this optimization model contains a huge number of constraints, all redundant constraints can be eliminated according to the transitivity of the partial order. To accomplish this, we compute a transitive reduction [2] of a directed graph representing the poset. We demonstrate the effectiveness of our method through experiments using real-world clickstream data.

The main contributions of this paper are highlighted as follows.

  • We propose a shape-restricted optimization model for estimating item-choice probabilities from each user’s previous PV sequence. This PV sequence model exploits the monotonicity constraint to provide precise estimates of item-choice probabilities.

  • We derive two types of posets of PV sequences according to the recency and frequency of each user’s previous PVs. Experimental results show that the monotonicity constraint based on these posets greatly enhances the prediction performance of our PV sequence model.

  • We devise constructive algorithms for transitive reduction specialized for our posets. The time complexity of our algorithms is much smaller than that of general-purpose algorithms. Experimental results reveal that the transitive reduction improves the efficiency of our PV sequence model in terms of both computation time and memory usage.

  • We verify based on experimental results that higher prediction performance is achieved with our method than with the two-dimensional probability table and common machine learning methods, namely, logistic regression, artificial neural networks, and random forests.

The remainder of this paper is organized as follows. Section II gives a brief review of related work. Section III explains the two-dimensional probability table [19], and Section IV presents our PV sequence model. Section V describes our constructive algorithms for transitive reduction. Section VI evaluates the effectiveness of our method based on experimental results. Section VII concludes with a brief summary of our work and a discussion of future research directions.

II Related work

This section gives a brief survey of predicting online user behavior and discusses some related work on shape-restricted regression.

II-A Prediction of online user behavior

A number of prior studies have aimed at predicting users’ purchase behavior on e-commerce websites [10]. A mainstream line of research involves predicting the occurrence of a purchase in each session by means of stochastic/statistical models [5, 23, 30, 31, 36, 41, 46], but this approach gives no consideration to which item will be chosen by users.

Various machine learning methods have been employed for predicting online item-choice behavior; these include logistic regression [12, 53], association rule mining [37], support vector machines [38, 53], ensemble learning methods [25, 26, 39, 52, 54], and artificial neural networks [21, 47, 50]. Some tailored statistical models have also been proposed; for instance, Moe [29] devised a two-stage multinomial logit model that separates the decision-making process into an item-view decision and a purchase decision. Yao et al. [51] proposed a joint framework consisting of user-level factor estimation and item-level factor aggregation based on the buyer decision process. Borges and Levene [6] employed Markov chain models to estimate the probability of a user’s next link choice.

These prior studies have made effective use of clickstream data in various prediction methods. Additionally, paying attention to time-evolving user behavior is crucial for precise prediction of online item-choice behavior. In light of these insights, we focus on sequences of user PVs to estimate users’ item-choice probabilities on e-commerce websites. Moreover, we evaluate the prediction performance of our method by comparison with machine learning methods that are commonly employed in prior studies.

II-B Shape-restricted regression

In many practical situations, we have prior knowledge about the relationship between explanatory and response variables. For instance, utility functions are assumed to be increasing and concave in economic theory [28], and option pricing functions are restricted to be monotone and convex in finance theory [3]. Shape-restricted regression fits a nonparametric function to a set of given observations under such shape restrictions (e.g., monotonicity, convexity/concavity, and unimodality) [8, 15, 16, 48].

Isotonic regression is the most commonly used method of shape-restricted regression. In general, isotonic regression is the problem of estimating a real-valued monotone (i.e., non-decreasing or non-increasing) function with respect to a given partial order of observations [35]. Regularization techniques [14, 44] and estimation algorithms [17, 35, 43] have been proposed for isotonic regression.

One of the greatest advantages of shape-restricted regression is that prediction performance can be improved because shape restrictions mitigate overfitting [4]. To exploit this advantage, Iwanaga et al. [19] devised a shape-restricted optimization model for estimating item-choice probabilities on e-commerce websites. In line with Iwanaga et al. [19], we propose a shape-restricted optimization model based on order relations of PV sequences to improve prediction performance.

III Two-dimensional probability table

This section gives a brief review of the two-dimensional probability table proposed by Iwanaga et al. [19].

III-A Empirical probability table

            #PVs                          choice
user  item  Apr. 1st  Apr. 2nd  Apr. 3rd  Apr. 4th
u1    i1        1         0         1        0
u1    i2        0         1         0        1
u2    i1        3         0         0        0
u2    i2        0         0         3        1
u3    i1        1         1         1        0
u3    i2        2         0         1        0
TABLE I: Page View History of Six User–item Pairs

Table I gives an example of the PV history of six user–item pairs. For instance, user u1 viewed the webpage of item i1 once each on April 1st and 3rd. We focus on user choices (e.g., revisit and purchase) on April 4th, which is called the base date. For instance, user u1 chose not item i1 but item i2 on the base date. For each user–item pair, recency is characterized by the day of the last PV, and frequency by the total number of PVs. As shown in Table I, the PV history can thus be summarized by a recency and frequency combination (r, f) ∈ R × F, where R and F are the index sets representing recency and frequency, respectively.

Let us denote by n_{rf} the number of user–item pairs that have (r, f) ∈ R × F. We also write y_{rf} for the number of choices that occurred on the base date among the user–item pairs having (r, f). In the case of Table I, the entries of the empirical probability table are calculated as

p̂_{rf} = y_{rf} / n_{rf}   ((r, f) ∈ R × F),   (1)

where, for reasons of expediency, p̂_{rf} := 0 for (r, f) with n_{rf} = 0.
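As a concrete illustration, the following Python sketch (the dictionary layout and variable names are ours) computes the empirical probability table of Eq. (1) from the PV history in Table I:

```python
from collections import defaultdict

# PV history from Table I: (user, item) -> (PVs on Apr. 1st-3rd, choice on Apr. 4th)
history = {
    ("u1", "i1"): ((1, 0, 1), 0), ("u1", "i2"): ((0, 1, 0), 1),
    ("u2", "i1"): ((3, 0, 0), 0), ("u2", "i2"): ((0, 0, 3), 1),
    ("u3", "i1"): ((1, 1, 1), 0), ("u3", "i2"): ((2, 0, 1), 0),
}

n = defaultdict(int)  # n[r, f]: number of user-item pairs with combination (r, f)
y = defaultdict(int)  # y[r, f]: number of those pairs that chose the item

for (user, item), (pvs, choice) in history.items():
    r = max(day + 1 for day, c in enumerate(pvs) if c > 0)  # recency: day of last PV
    f = sum(pvs)                                            # frequency: total PVs
    n[r, f] += 1
    y[r, f] += choice

p_hat = {rf: y[rf] / n[rf] for rf in n}  # Eq. (1); missing (r, f) treated as 0
print(p_hat)  # {(3, 2): 0.0, (2, 1): 1.0, (1, 3): 0.0, (3, 3): 0.333...}
```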

III-B Two-dimensional monotonicity model

It is reasonable to assume that the recency and frequency of user–item pairs are positively associated with users’ item-choice probabilities. To estimate the item-choice probabilities x_{rf} for all recency and frequency combinations (r, f) ∈ R × F, the two-dimensional monotonicity model [19] minimizes the weighted sum of squared errors under monotonicity constraints with respect to recency and frequency:

minimize     Σ_{(r,f) ∈ R×F} n_{rf} (x_{rf} − p̂_{rf})²   (2)
subject to   x_{r,f} ≤ x_{r+1,f}   (r, r+1 ∈ R; f ∈ F),   (3)
             x_{r,f} ≤ x_{r,f+1}   (r ∈ R; f, f+1 ∈ F),   (4)
             0 ≤ x_{r,f} ≤ 1   ((r, f) ∈ R × F),   (5)

where larger indices r and f correspond to more recent and more frequent PVs, respectively.
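For illustration, the quadratic program (2)–(5) can be written almost verbatim with a modeling layer such as cvxpy; the following is a minimal sketch with toy data of our own (the experiments in Section VI solve the models with OSQP directly):

```python
import cvxpy as cp
import numpy as np

R, F = 3, 5                        # sizes of the recency and frequency index sets
rng = np.random.default_rng(0)
n = rng.integers(1, 50, size=(R, F)).astype(float)  # n_rf: pair counts
p_hat = rng.random((R, F))                          # empirical probabilities (1)

x = cp.Variable((R, F))            # item-choice probabilities to be estimated
objective = cp.Minimize(cp.sum(cp.multiply(n, cp.square(x - p_hat))))  # (2)
constraints = [x[:-1, :] <= x[1:, :],   # (3): monotone in recency
               x[:, :-1] <= x[:, 1:],   # (4): monotone in frequency
               x >= 0, x <= 1]          # (5)
cp.Problem(objective, constraints).solve()
print(np.round(x.value, 3))
```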

It is notable, however, that different PV histories are often indistinguishable in terms of recency and frequency. A typical example is the set of user–item pairs (u2, i2), (u3, i1), and (u3, i2) in Table I; although their PV histories are quite different, they share the same recency and frequency combination. To distinguish between these PV histories, we exploit the PV sequence in the next section.

IV PV sequence model

This section presents our shape-restricted optimization model for estimating item-choice probabilities from each user’s previous PV sequence.

IV-A PV sequence

The PV sequence for each user–item pair represents a time series of the number of PVs:

v = (v_1, v_2, …, v_τ),

where v_t is the number of PVs t periods ago; see also Table I. Note that the sequence terms are arranged in reverse chronological order; that is, v_t moves back into the past as the index t increases.

Throughout the paper, we express a set of consecutive integers as

[l, u] := {l, l+1, …, u},

where [l, u] := ∅ when u < l. Then, the set of possible PV sequences is defined as

V := [0, m]^τ = {(v_1, v_2, …, v_τ) : v_t ∈ [0, m] for t ∈ [1, τ]},

where m is the maximum number of PVs in each period, and τ is the number of considered periods.

Our objective is to estimate the item-choice probabilities for all PV sequences v ∈ V. However, it is extremely difficult to estimate such probabilities accurately because there are a huge number of PV sequences. In the case of (τ, m) = (5, 6), for instance, the number of different PV sequences is (m+1)^τ = 7^5 = 16,807, whereas the number of recency and frequency combinations is far smaller. To avoid this difficulty, we make effective use of monotonicity constraints on item-choice probabilities, as in the optimization model (2)–(5). In the next section, we introduce three operations underlying the development of these monotonicity constraints.
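For intuition, the set V is easy to enumerate for moderate (τ, m); this snippet reproduces the count of 16,807 sequences mentioned above:

```python
from itertools import product

m, tau = 6, 5                          # max PVs per period, number of periods
V = list(product(range(m + 1), repeat=tau))
print(len(V))                          # (m + 1) ** tau = 16807
```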

IV-B Operations based on recency and frequency

It is reasonable from the perspective of frequency that the item-choice probability increases as the number of PVs in a particular period gets larger. To formulate this reasoning, we define the following operation.

Definition 1 (Up).

For t ∈ [1, τ], on the domain

dom(Up_t) := {v ∈ V : v_t ≤ m − 1},

the function Up_t : dom(Up_t) → V is defined as

Up_t(v) := (v_1, …, v_{t−1}, v_t + 1, v_{t+1}, …, v_τ).

For instance, we have Up_1((0, 1, 2)) = (1, 1, 2) and Up_3((1, 1, 0)) = (1, 1, 1). Since the frequency of PVs is increased by this operation, the monotonicity constraint x_v ≤ x_{Up_t(v)} should be satisfied by item-choice probabilities, where x_v denotes the item-choice probability of a user–item pair with PV sequence v.

It is inferred from the perspective of recency that more recent PVs have a larger effect of increasing the item-choice probability. To formulate this inference, we consider the following operation, which moves one PV from an older period to a more recent period.

Definition 2 (Move).

For s, t ∈ [1, τ] with t < s, on the domain

dom(Move_{s→t}) := {v ∈ V : v_s ≥ 1, v_t ≤ m − 1},

the function Move_{s→t} : dom(Move_{s→t}) → V is defined as

Move_{s→t}(v) := v + e_t − e_s,

where e_t denotes the t-th unit vector. For instance, we have Move_{3→1}((0, 1, 2)) = (1, 1, 1) and Move_{2→1}((1, 1, 1)) = (2, 0, 1). Since the number of recent PVs is increased by this operation, the monotonicity constraint x_v ≤ x_{Move_{s→t}(v)} should be satisfied by item-choice probabilities.

The PV sequence (1, 1, 1) represents a user’s continued interest in a certain item over three periods. In contrast, the PV sequence (0, 0, 3) implies that the user’s interest decreased during the two most recent periods. In this sense, the monotonicity constraint x_{(0,0,3)} ≤ x_{(1,1,1)}, which is implied by repeated application of Move, may not be valid. Accordingly, we define the following alternative operation, which exchanges the numbers of PVs such that the number of recent PVs is increased.

Definition 3 (Swap).

For s, t ∈ [1, τ] with t < s, on the domain

dom(Swap_{s,t}) := {v ∈ V : v_t < v_s},

the function Swap_{s,t} : dom(Swap_{s,t}) → V is defined as the sequence obtained from v by exchanging the entries v_s and v_t.

We have Swap_{3,1}((0, 0, 3)) = (3, 0, 0) because v_1 = 0 < 3 = v_3, and Swap_{2,1}((0, 2, 1)) = (2, 0, 1) because v_1 = 0 < 2 = v_2. Since the number of recent PVs is increased by this operation, the monotonicity constraint x_v ≤ x_{Swap_{s,t}(v)} should be satisfied by item-choice probabilities. It is notable that the monotonicity constraint x_{(0,0,3)} ≤ x_{(1,1,1)} is not implied by this operation.
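The three operations are straightforward to implement. The following helpers (our own function names; each returns None outside the domains of Definitions 1–3) are reused in later sketches:

```python
def up(v, t, m):
    """Up_t: add one PV in period t; defined when v_t <= m - 1."""
    if v[t - 1] >= m:
        return None
    w = list(v); w[t - 1] += 1
    return tuple(w)

def move(v, s, t, m):
    """Move_{s->t} (t < s): shift one PV from older period s to newer period t."""
    if not (t < s and v[s - 1] >= 1 and v[t - 1] <= m - 1):
        return None
    w = list(v); w[s - 1] -= 1; w[t - 1] += 1
    return tuple(w)

def swap(v, s, t):
    """Swap_{s,t} (t < s): exchange v_s and v_t; defined when v_t < v_s."""
    if not (t < s and v[t - 1] < v[s - 1]):
        return None
    w = list(v); w[t - 1], w[s - 1] = w[s - 1], w[t - 1]
    return tuple(w)

# Examples consistent with the text:
assert move((0, 0, 3), s=3, t=1, m=3) == (1, 0, 2)
assert swap((0, 0, 3), s=3, t=1) == (3, 0, 0)
```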

IV-C Partially ordered sets

Let W ⊆ V be a subset of PV sequences. The image of each operation is then defined as

Up(W) := {Up_t(v) : v ∈ W ∩ dom(Up_t), t ∈ [1, τ]},
Move(W) := {Move_{s→t}(v) : v ∈ W ∩ dom(Move_{s→t}), t < s},
Swap(W) := {Swap_{s,t}(v) : v ∈ W ∩ dom(Swap_{s,t}), t < s}.

Let us define UM(W) := Up(W) ∪ Move(W) for W ⊆ V. The following definition states that the binary relation v ≺_UM w holds when v can be transformed into w by repeated application of Up and Move.

Definition 4 (≺_UM).

Suppose that v, w ∈ V. We write v ≺_UM w if and only if there exists a positive integer k such that w ∈ UM^k({v}), where UM^k denotes the k-fold composition of UM.

We also write v ⪯_UM w if v ≺_UM w or v = w.

Similarly, we define US(W) := Up(W) ∪ Swap(W) for W ⊆ V. Then, the binary relation v ≺_US w holds when v can be transformed into w by repeated application of Up and Swap.

Definition 5 (≺_US).

Suppose that v, w ∈ V. We write v ≺_US w if and only if there exists a positive integer k such that w ∈ US^k({v}).

We also write v ⪯_US w if v ≺_US w or v = w.

To prove properties of these binary relations, we use the lexicographic order, which is a well-known linear order [40].

Definition 6 (≺_lex).

Suppose that v, w ∈ V. We write v ≺_lex w if and only if there exists t ∈ [1, τ] such that v_t < w_t and v_s = w_s for all s ∈ [1, t − 1]. We also write v ⪯_lex w if v ≺_lex w or v = w.

Each application of Up, Move, and Swap makes a PV sequence greater in the lexicographic order. Therefore, we can obtain the following lemma.

Lemma 1.

Suppose that v, w ∈ V. If v ≺_UM w or v ≺_US w, then v ≺_lex w.

The following theorem states that a partial order of PV sequences is derived by the operations Up and Move.

Theorem 1.

The pair (V, ⪯_UM) is a poset.

Proof.

It is clear from Definition 4 that the relation ⪯_UM is reflexive and transitive. Suppose that v ⪯_UM w and w ⪯_UM v. It follows from Lemma 1 that v ⪯_lex w and w ⪯_lex v. Since the relation ⪯_lex is antisymmetric, we have v = w, which proves that the relation ⪯_UM is also antisymmetric. ∎

In the same manner, we can prove the following theorem for the operations Up and Swap.

Theorem 2.

The pair (V, ⪯_US) is a poset.
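A brute-force membership test for ⪯_UM follows directly from Definition 4 by breadth-first search over the operations, reusing the up and move helpers above. It is feasible only for small (τ, m), but it is handy for validating the constructions in Section V:

```python
from collections import deque

def leq_um(v, w, m):
    """Check v <=_UM w by exhaustive search over Up and Move (Definition 4)."""
    seen, queue = {v}, deque([v])
    while queue:
        u = queue.popleft()
        if u == w:
            return True
        succs = [up(u, t, m) for t in range(1, len(u) + 1)]
        succs += [move(u, s, t, m) for s in range(2, len(u) + 1)
                                   for t in range(1, s)]
        for nxt in succs:
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

assert leq_um((0, 0, 3), (1, 1, 1), m=3)       # the Move example of Section IV-B
assert not leq_um((1, 1, 1), (0, 0, 3), m=3)   # the converse relation does not hold
```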

IV-D Shape-restricted optimization model

Let n_v be the number of user–item pairs that have the PV sequence v ∈ V. Also, let y_v be the number of choices provoked on the base date by user–item pairs that have v. Similarly to Eq. (1), we can calculate empirical item-choice probabilities as

p̂_v = y_v / n_v   (v ∈ V),   (6)

where p̂_v := 0 for v with n_v = 0.

Our shape-restricted optimization model minimizes the weighted sum of squared errors subject to the monotonicity constraint.

minimize     Σ_{v ∈ V} n_v (x_v − p̂_v)²   (7)
subject to   x_v ≤ x_w   ((v, w) ∈ V × V with v ≺ w),   (8)
             0 ≤ x_v ≤ 1   (v ∈ V),   (9)

where the order relation ≺ in Eq. (8) is given by one of the partial orders ≺_UM and ≺_US.

The monotonicity constraint (8) aids in enhancing the estimation accuracy of item-choice probabilities. In addition, our shape-restricted optimization model can be used as a post-processing step to upgrade the prediction performance of other machine learning methods. Specifically, we first compute item-choice probabilities using a machine learning method. We then substitute the computed values into p̂_v and solve the optimization model (7)–(9). Consequently, we obtain item-choice probabilities corrected by the monotonicity constraint (8). The usefulness of this approach is illustrated in Section VI-D.

However, since |V| = (m+1)^τ, the number of constraints in Eq. (8) is O(|V|²), which can be extremely large. When (τ, m) = (5, 6), for instance, we have |V| = 16,807, and the number of order relations exceeds 8 × 10⁷ (see Table VI). In the next section, we cope with this difficulty by removing redundant constraints from Eq. (8).
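Given the list of order relations retained after the transitive reduction of Section V, the QP (7)–(9) can be assembled in OSQP’s standard form l ≤ Ax ≤ u, with one linear constraint x_w − x_v ≥ 0 per edge (v, w). The following sketch (the helper name is ours) illustrates this:

```python
import numpy as np
import scipy.sparse as sp
import osqp

def fit_pv_sequence_model(n, p_hat, edges):
    """Solve (7)-(9): minimize sum_v n_v (x_v - p_hat_v)^2 subject to
    x_v <= x_w for each edge (v, w) and 0 <= x_v <= 1."""
    K, E = len(n), len(edges)
    P = sp.diags(2.0 * np.asarray(n, float), format="csc")       # quadratic term of (7)
    q = -2.0 * np.asarray(n, float) * np.asarray(p_hat, float)   # linear term of (7)
    ri = np.repeat(np.arange(E), 2)                              # one row per edge
    ci = np.asarray(edges).ravel()
    data = np.tile([-1.0, 1.0], E)                               # x_w - x_v >= 0, i.e., (8)
    A = sp.vstack([sp.csc_matrix((data, (ri, ci)), shape=(E, K)),
                   sp.identity(K)], format="csc")                # plus bounds (9)
    l = np.zeros(E + K)
    u = np.concatenate([np.full(E, np.inf), np.ones(K)])
    solver = osqp.OSQP()
    solver.setup(P, q, A, l, u, verbose=False)
    return solver.solve().x
```

The same routine also covers the post-processing use described above: substitute the probabilities predicted by a machine learning method for p_hat and re-solve.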

V Algorithms for transitive reduction

Fig. 1: Directed graph representations of the poset (V, ⪯_UM) with (τ, m) = (3, 2): (a) operation-based graph; (b) transitive reduction
Fig. 2: Directed graph representations of the poset (V, ⪯_US) with (τ, m) = (3, 2): (a) operation-based graph; (b) transitive reduction

This section describes our constructive algorithms for transitive reduction to decrease the problem size of our shape-restricted optimization model.

V-A Transitive reduction

A poset (V, ⪯) can be represented by a directed graph G = (V, E), where V and E are the sets of nodes and directed edges, respectively. Each directed edge (v, w) ∈ E in this graph corresponds to the order relation v ≺ w, so the number of directed edges coincides with the number of constraints in Eq. (8).

Figs. 1 and 2 show directed graph representations of the posets (V, ⪯_UM) and (V, ⪯_US), respectively. Each edge in Figs. 1(a) and 2(a) corresponds to one of the operations Up, Move, and Swap; an edge is red-colored if it corresponds to Up, and black-colored if it corresponds to Move or Swap. The directed graphs shown in Figs. 1(a) and 2(a) can be created easily.

Now, let us suppose that there are three edges

(u, v), (v, w), (u, w) ∈ E.

In this case, edge (u, w) is implied by the other two edges due to the transitivity of the partial order:

u ≺ v and v ≺ w  ⟹  u ≺ w,

or equivalently, the constraint x_u ≤ x_w in Eq. (8) is implied by the constraints x_u ≤ x_v and x_v ≤ x_w. As a result, the edge (u, w) is redundant and can be removed from the directed graph.

A transitive reduction, also known as a Hasse diagram, of a directed graph is its subgraph such that all redundant edges are removed using the transitivity of partial order [2]. Figs. 1(b) and 2(b) show transitive reductions of the directed graphs shown in Figs. 1(a) and 2(a), respectively. By computing transitive reductions, the number of edges is reduced from 90 to 42 in Fig. 1, and from 81 to 46 in Fig. 2. It is known that the transitive reduction is unique [2].
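Before specializing, the reduction can be validated with a general-purpose routine: building the operation-based graph of Fig. 1(a) with the up and move helpers from Section IV-B and reducing it with networkx reproduces the drop from 90 to 42 edges:

```python
from itertools import product
import networkx as nx

m, tau = 2, 3
V = list(product(range(m + 1), repeat=tau))

G = nx.DiGraph()                       # operation-based graph, as in Fig. 1(a)
G.add_nodes_from(V)
for v in V:
    for t in range(1, tau + 1):
        if (w := up(v, t, m)) is not None:
            G.add_edge(v, w)
    for s in range(2, tau + 1):
        for t in range(1, s):
            if (w := move(v, s, t, m)) is not None:
                G.add_edge(v, w)

H = nx.transitive_reduction(G)         # Hasse diagram, as in Fig. 1(b)
print(G.number_of_edges(), H.number_of_edges())   # 90 42
```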

V-B General-purpose algorithms

The transitive reduction is characterized by the following lemma [40].

Lemma 2.

Suppose that (V, E) is the transitive reduction of a directed graph representing a poset (V, ⪯), and that v, w ∈ V. Then, (v, w) ∈ E holds if and only if both of the following conditions are fulfilled:

(C1)

v ≺ w;

(C2)

if u ∈ V satisfies v ⪯ u ⪯ w, then u ∈ {v, w}.

A basic strategy of general-purpose algorithms for transitive reduction involves the following steps.

Step 1:

An exhaustive directed graph G = (V, E) is generated from a given poset (V, ⪯).

Step 2:

The transitive reduction is computed from the directed graph G using Lemma 2.

Various algorithms have been proposed to speed up the computation of Step 2. Recall that |V| = (m+1)^τ in our situation. Warshall’s algorithm [49] has time complexity O(|V|³) to complete Step 2 [40]. This time complexity can be reduced to O(|V|^2.38) using a sophisticated algorithm for fast matrix multiplication [24].

However, these general-purpose algorithms are clearly inefficient, especially when |V| is very large. In addition, a large amount of computation is also required for Step 1. To resolve this difficulty, we devise specialized algorithms that directly construct a transitive reduction.

V-C Constructive algorithms

Let G_UM = (V, E_UM) be the transitive reduction of a directed graph representing the poset (V, ⪯_UM). The transitive reduction can then be characterized by the following theorem.

Theorem 3.

Suppose that v, w ∈ V. Then, (v, w) ∈ E_UM holds if and only if one of the following conditions is fulfilled:

(UM1)

w = Up_τ(v);

(UM2)

w = Move_{t+1→t}(v) for some t ∈ [1, τ − 1].

Proof.

See Appendix A-A. ∎

Theorem 3 gives a constructive algorithm that directly computes the transitive reduction G_UM = (V, E_UM) without generating an exhaustive directed graph G = (V, E). Our algorithm is based on the breadth-first search algorithm [11]. Specifically, we start with the node list L = {(0, 0, …, 0)}. At each iteration of the algorithm, we choose v ∈ L, enumerate the sequences w such that (v, w) ∈ E_UM, and add these nodes to the list L.

Table II shows this enumeration process for a PV sequence v with (τ, m) = (3, 2). The operations Up and Move first generate candidate sequences w, which amounts to searching the edges in Fig. 1(a). We next check whether each candidate w satisfies condition (UM1) or (UM2) of Theorem 3. As shown in Table II, we keep only the candidates satisfying one of these conditions and add them to the list L; this amounts to enumerating the edges in Fig. 1(b).

TABLE II: Process of enumerating w such that (v, w) ∈ E_UM (columns: operation, condition (UM1), condition (UM2))

We show a pseudocode of our constructive algorithm (Algorithm 1) in Appendix B-A. Recalling the time-complexity analysis of breadth-first search [11], one readily sees that the time complexity of Algorithm 1 is proportional to the size of the reduced graph, namely O(τ|V|), which is much smaller than the O(|V|^2.38) achieved by the general-purpose algorithm [24], especially when |V| is very large.
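Under Theorem 3, each node emits at most τ reduced edges, which is what makes the linear-time construction possible. A compact sketch reusing the earlier helpers (names are ours):

```python
from itertools import product

def reduced_edges_um(v, m, tau):
    """Outgoing edges of the transitive reduction E_UM at node v (Theorem 3)."""
    edges = []
    if (w := up(v, tau, m)) is not None:          # (UM1): add a PV in period tau
        edges.append((v, w))
    for t in range(1, tau):                       # (UM2): adjacent moves t+1 -> t
        if (w := move(v, t + 1, t, m)) is not None:
            edges.append((v, w))
    return edges

m, tau = 2, 3
E_UM = [e for v in product(range(m + 1), repeat=tau)
        for e in reduced_edges_um(v, m, tau)]
print(len(E_UM))   # 42 edges, matching Fig. 1(b)
```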

Next, we focus on the transitive reduction G_US = (V, E_US) of a directed graph representing the poset (V, ⪯_US). The transitive reduction can be characterized by the following theorem.

Theorem 4.

Suppose that v, w ∈ V. Then, (v, w) ∈ E_US holds if and only if one of the following conditions is fulfilled:

(US1)

there exists t ∈ [1, τ] such that w = Up_t(v) and v_r ∉ {v_t, v_t + 1} for all r ∈ [t + 1, τ];

(US2)

there exist s, t ∈ [1, τ] with t < s such that w = Swap_{s,t}(v) and v_r ∉ [v_t, v_s] for all r ∈ [t + 1, s − 1].

Proof.

See Appendix A-B. ∎

Theorem 4 also gives a constructive algorithm for computing the transitive reduction G_US = (V, E_US). Let us consider again a PV sequence v with (τ, m) = (3, 2) as an example. As shown in Table III, the operations Up and Swap generate candidate sequences w, from which we choose those satisfying condition (US1) or (US2) of Theorem 4; see also Figs. 2(a) and 2(b).

TABLE III: Process of enumerating w such that (v, w) ∈ E_US (columns: operation, condition (US1), condition (US2))

We show a pseudocode of our constructive algorithm (Algorithm 2) in Appendix B-B. Its time complexity is estimated to be O(τ²|V|), which is larger than that of Algorithm 1 but still much smaller than that of the general-purpose algorithm [24], especially when |V| is very large.
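The analogous construction for E_US checks the additional conditions of Theorem 4 at a cost of O(τ²) per node; a sketch in the same style:

```python
from itertools import product

def reduced_edges_us(v, m, tau):
    """Outgoing edges of the transitive reduction E_US at node v (Theorem 4)."""
    edges = []
    for t in range(1, tau + 1):                   # (US1)
        if (w := up(v, t, m)) is not None and all(
                v[r] not in (v[t - 1], v[t - 1] + 1) for r in range(t, tau)):
            edges.append((v, w))
    for s in range(2, tau + 1):                   # (US2)
        for t in range(1, s):
            if (w := swap(v, s, t)) is not None and all(
                    v[r] < v[t - 1] or v[r] > v[s - 1] for r in range(t, s - 1)):
                edges.append((v, w))
    return edges

m, tau = 2, 3
E_US = [e for v in product(range(m + 1), repeat=tau)
        for e in reduced_edges_us(v, m, tau)]
print(len(E_US))   # 46 edges, matching Fig. 2(b)
```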

VI Experiments

The experimental results reported in this section evaluate the effectiveness of our method for estimating item-choice probabilities.

We used real-world clickstream data collected from the Chinese e-commerce website Tmall (https://tianchi.aliyun.com/dataset/). Specifically, we used the data set preprocessed by Ludewig and Jannach [27] (https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0). Each record corresponds to one PV and contains information such as user ID, item ID, and time stamp. The data set involves 28,316,459 unique user–item pairs composed of 422,282 users and 624,221 items.

VI-A Methods for comparison

Abbrev.   Method
2dimEmp   Empirical probability table (1) [19]
2dimMono  Two-dimensional monotonicity model (2)–(5) [19]
SeqEmp    Empirical probabilities (6) for PV sequences
SeqUM     Our PV sequence model (7)–(9) using ⪯_UM
SeqUS     Our PV sequence model (7)–(9) using ⪯_US
LR        ℓ2-regularized logistic regression
ANN       Artificial neural networks for regression
RF        Random forest of regression trees
TABLE IV: Methods for comparison
         Training
Pair ID  Start         End                Validation
1        21 May 2015   18 August 2015     19 August 2015
2        31 May 2015   28 August 2015     29 August 2015
3        10 June 2015  7 September 2015   8 September 2015
4        20 June 2015  17 September 2015  18 September 2015
5        30 June 2015  27 September 2015  28 September 2015
TABLE V: Training and validation periods
                #Cons in Eq. (8)
                Enumeration             Operation          Reduction
τ  m  #Vars     SeqUM       SeqUS       SeqUM    SeqUS     SeqUM   SeqUS
5  1  32        430         430         160      160       48      48
5  2  243       21,383      17,945      1,890    1,620     594     634
5  3  1,024     346,374     255,260     9,600    7,680     3,072   3,546
5  4  3,125     3,045,422   2,038,236   32,500   25,000    10,500  12,898
5  5  7,776     18,136,645  11,282,058  86,400   64,800    28,080  36,174
5  6  16,807    82,390,140  48,407,475  195,510  144,060   63,798  85,272
1  6  7         21          21          6        6         6       6
2  6  49        1,001       861         120      105       78      93
3  6  343       42,903      32,067      1,638    1,323     798     1,018
4  6  2,401     1,860,622   1,224,030   18,816   14,406    7,350   9,675
5  6  16,807    82,390,140  48,407,475  195,510  144,060   63,798  85,272
TABLE VI: Problem size of our PV sequence model (7)–(9)
                Time [s]
                Enumeration      Operation        Reduction
τ  m  #Vars     SeqUM   SeqUS    SeqUM   SeqUS    SeqUM  SeqUS
5  1  32        0.00    0.01     0.00    0.00     0.00   0.00
5  2  243       2.32    1.66     0.09    0.07     0.03   0.02
5  3  1,024     558.22  64.35    3.41    0.71     0.13   0.26
5  4  3,125     OM      OM       24.07   13.86    1.72   5.80
5  5  7,776     OM      OM       180.53  67.34    9.71   36.94
5  6  16,807    OM      OM       906.76  522.84   86.02  286.30
1  6  7         0.00    0.00     0.00    0.00     0.00   0.00
2  6  49        0.03    0.01     0.01    0.00     0.00   0.00
3  6  343       12.80   1.68     0.20    0.03     0.05   0.02
4  6  2,401     OM      OM       8.07    4.09     2.12   2.87
5  6  16,807    OM      OM       906.76  522.84   86.02  286.30
TABLE VII: Computation time for our PV sequence model (7)–(9)
                #Cons in Eq. (8)   Time [s]         F1 score [%]
τ  m   #Vars    SeqUM    SeqUS     SeqUM   SeqUS    SeqEmp  SeqUM  SeqUS
3  30  29,791   84,630   118,850   86.72   241.46   12.25   12.40  12.40
4  12  28,561   99,372   142,800   198.82  539.76   12.68   12.93  12.95
5  6   16,807   63,798   85,272    86.02   286.30   12.90   13.18  13.18
6  4   15,625   62,500   76,506    62.92   209.67   13.14   13.49  13.48
7  3   16,384   67,584   76,818    96.08   254.31   13.23   13.52  13.53
8  2   6,561    24,786   25,879    19.35   17.22    13.11   13.37  13.35
9  2   19,683   83,106   86,386    244.15  256.42   13.07   13.40  13.37
TABLE VIII: Computational performance of our PV sequence model (7)–(9)
Fig. 3: Comparison of prediction performance with the two-dimensional probability table

We compared the performance of the methods listed in Table IV. All computations were performed on an Apple MacBook Pro computer with an Intel Core i7-5557U CPU (3.10 GHz) and 16 GB of memory.

The optimization models (2)–(5) and (7)–(9) were solved using OSQP (https://osqp.org/docs/index.html) [42], a numerical optimization package for solving convex quadratic optimization problems. As in Table I, daily PV sequences v = (v_1, v_2, …, v_τ) were calculated for each user–item pair, where m is the maximum number of daily PVs, and τ is the number of terms (i.e., past days) in the PV sequence. In this process, all PVs made more than τ days before the base date were added to v_τ, and daily PV counts exceeding m were rounded down to m. Similarly, the recency and frequency combinations (r, f) were calculated from the daily PVs as in Table I.

The other machine learning methods (i.e., LR, ANN, and RF) were implemented using the LogisticRegressionCV, MLPRegressor, and RandomForestRegressor functions, respectively, in scikit-learn, a Python library of machine learning tools. Related hyperparameters were tuned through 3-fold cross-validation according to the parameter settings of a benchmark study [34]. These machine learning methods employed the PV sequence as input variables for computing item-choice probabilities. Here, each input variable was standardized, and undersampling was conducted to improve prediction performance.

VI-B Performance evaluation methodology

There are five pairs of training and validation sets of clickstream data in the preprocessed data set [27]. As shown in Table V, each training period is 90 days, and the following day is the validation period. The first four pairs of training and validation sets were used for model estimation, and pair 5 was used for performance evaluation. To examine how sample size affects prediction performance, we also prepared small-sample training sets by choosing user–item pairs randomly from the original training set. Here, the sampling rates are 0.1%, 1%, and 10%, and the original training set is referred to as the “full-sample” set. Note that the results for the sampled training sets were averaged over ten trials.

We considered the top-N selection task to evaluate prediction performance. Specifically, we focused on the items that were viewed by a particular user during the training period. From these, we selected S_N, the set of top-N items for the user according to the estimated item-choice probabilities. Here, more recently viewed items were selected first when two or more items had the same choice probability. Let C be the set of items viewed by the user in the validation period. Then, the F1 score is defined as the harmonic average of Precision := |S_N ∩ C| / |S_N| and Recall := |S_N ∩ C| / |C|:

F1 := 2 · Precision · Recall / (Precision + Recall).

In the following sections, we examine the F1 scores that were averaged over all users. The percentage of user–item pairs leading to item-choices is only 0.16%.
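Concretely, the evaluation for a single user can be computed as follows (tie-breaking by viewing recency is omitted; names are ours):

```python
def f1_top_n(scores, chosen, n_top):
    """F1 score of the top-N selection task: `scores` maps item -> estimated
    item-choice probability; `chosen` is the item set of the validation period."""
    selected = set(sorted(scores, key=scores.get, reverse=True)[:n_top])
    hits = len(selected & chosen)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(selected), hits / len(chosen)
    return 2 * precision * recall / (precision + recall)

print(f1_top_n({"i1": 0.3, "i2": 0.9, "i3": 0.1}, {"i2", "i3"}, n_top=2))  # 0.5
```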

VI-C Effects of the transitive reduction

We generated constraints in Eq. (8) based on the following three directed graphs.

Case 1 (Enumeration):
All edges (v, w) satisfying v ≺ w were enumerated.

Case 2 (Operation):
Edges corresponding to the operations Up, Move, and Swap were generated, as in Figs. 1(a) and 2(a).

Case 3 (Reduction):
Transitive reductions were computed by our constructive algorithms, as in Figs. 1(b) and 2(b).

Table VI gives the problem size of our PV sequence model (7)–(9) for various settings (τ, m) of the PV sequence. Here, the column labeled “#Vars” shows the number of decision variables (i.e., |V| = (m+1)^τ), and the subsequent columns show the number of constraints in Eq. (8) for the three cases mentioned above.

In the enumeration case, the number of constraints grew rapidly as m and τ increased. In contrast, the number of constraints was always smallest in the reduction case. When (τ, m) = (5, 6), for instance, transitive reduction decreased the number of constraints in the operation case from 195,510 to 63,798 for SeqUM and from 144,060 to 85,272 for SeqUS.

The number of constraints was larger for SeqUM than for SeqUS in the enumeration and operation cases. In contrast, the number of constraints was often smaller for SeqUM than for SeqUS in the reduction case. This means that the transitive reduction had a greater impact on SeqUM than on SeqUS in terms of the number of constraints.

Table VII gives the computation time required for solving the optimization problem (7)–(9) for various settings (τ, m) of the PV sequence. Here, “OM” indicates that the computation was aborted because the computer ran out of memory. The enumeration case often caused out-of-memory failures because of the huge number of constraints; see also Table VI. The operation and reduction cases completed the computations for all settings of the PV sequence. Moreover, the transitive reduction made the computations faster. A notable example is SeqUM with (τ, m) = (5, 6); the computation time in the reduction case (i.e., 86.02 s) was about one-tenth of that in the operation case (i.e., 906.76 s). These results demonstrate that the transitive reduction improves efficiency in terms of both computation time and memory usage.

Table VIII gives the computational performance of our optimization model (7)–(9) for various settings (τ, m) of the PV sequence. Here, for each τ, the largest m was chosen such that the computation finished within 30 min. Both SeqUM and SeqUS always delivered higher F1 scores than SeqEmp did. This means that our monotonicity constraint (8) works well for improving prediction performance. The F1 scores provided by SeqUM and SeqUS were very similar, and they were largest with (τ, m) = (7, 3). In view of these results, we focus on the setting (τ, m) = (7, 3) in the following sections.

VI-D Prediction performance of our PV sequence model

Fig. 4: Comparison of prediction performance with machine learning methods
Fig. 5: Item-choice probabilities estimated from the full-sample training set: (a)–(c) SeqEmp; (d)–(f) SeqUM; (g)–(i) SeqUS
Fig. 6: Item-choice probabilities estimated from the 10%-sampled training set: (a)–(c) SeqEmp; (d)–(f) SeqUM; (g)–(i) SeqUS

Fig. 3 shows the F1 scores of the two-dimensional probability table and our PV sequence model using the sampled training sets for several numbers N of selected items, where the setting of the PV sequence is (τ, m) = (7, 3).

When the full-sample training set was used, SeqUM and SeqUS always delivered better prediction performance than the other methods. When the 1%- and 10%-sampled training sets were used, the prediction performance of SeqUS decreased slightly, whereas SeqUM still performed best of all the methods. When the 0.1%-sampled training set was used, 2dimMono always performed better than SeqUS, and in some cases it showed the best prediction performance of all the methods. These results suggest that our PV sequence model performs very well, especially when the sample size is sufficiently large.

The prediction performance of SeqEmp deteriorated rapidly as the sampling rate decreased, and this performance was always much worse than that of 2dimEmp. Meanwhile, SeqUM and SeqUS maintained high prediction performance even when the 0.1%-sampled training set was used. This means that the monotonicity constraint (8) in our PV sequence model is more effective than the monotonicity constraints (3)–(4) in the two-dimensional monotonicity model.

Fig. 4 shows the F1 scores of the machine learning methods (i.e., LR, ANN, and RF) and our PV sequence model (i.e., SeqUM) using the full-sample training set for several numbers N of selected items, where the setting of the PV sequence is (τ, m) = (7, 3). Note that in this figure, SeqUM(·) represents the optimization model (7)–(9) in which the item-choice probabilities computed by the corresponding machine learning method were substituted into p̂_v; see also Section IV-D.

SeqUM delivered better prediction performance than all the machine learning methods except in the case of Fig. 4(f); only in this case did LR show better prediction performance. Moreover, SeqUM(·) improved the prediction performance of the machine learning methods, and these improvements were especially large for ANN and RF. This means that our monotonicity constraint (8) is also very helpful in correcting the prediction values of other machine learning methods.

VI-E Analysis of estimated item-choice probabilities

Fig. 5 shows the item-choice probabilities estimated by our PV sequence model using the full-sample training set, where the setting of the PV sequence is (τ, m) = (7, 3). Here, we focus on PV sequences of a particular form and depict the estimates of item-choice probabilities for each of them. Note also that the number of associated user–item pairs becomes smaller as the number of PVs increases.

Since SeqEmp takes no account of the monotonicity constraint (8), the item-choice probabilities estimated by SeqEmp have irregular shapes. In contrast, the item-choice probabilities estimated with the monotonicity constraint (8) are relatively smooth. Because of the Up operation, the item-choice probabilities estimated by SeqUM and SeqUS increase as the number of PVs grows. Because of the Move operation, the item-choice probabilities estimated by SeqUM also increase as PVs move to more recent periods. On the other hand, the item-choice probabilities estimated by SeqUS are relatively high in a different region of the PV sequences. This highlights the difference in the monotonicity constraint (8) between the two posets (V, ⪯_UM) and (V, ⪯_US).

Fig. 6 shows the item-choice probabilities estimated by our PV sequence model using the 10%-sampled training set, where the setting of the PV sequence is (τ, m) = (7, 3). In this case, since the sample size was reduced, the item-choice probabilities estimated by SeqEmp are highly unstable. In particular, some item-choice probabilities were estimated to be zero even for large numbers of PVs in Fig. 6(c); this is unreasonable from the perspective of frequency. In contrast, SeqUM and SeqUS estimated item-choice probabilities that increase monotonically with the number of PVs.

VII Conclusion

This paper dealt with a shape-restricted optimization model for estimating item-choice probabilities on an e-commerce website. Our monotonicity constraint, based on tailored order relations of PV sequences, made it possible to obtain accurate estimates of item-choice probabilities for all possible PV sequences. To improve the computational efficiency of our optimization model, we devised constructive algorithms for transitive reduction that remove all redundant constraints from the optimization model.

We assessed the effectiveness of our method through experiments using real-world clickstream data. The experimental results demonstrated that the transitive reduction enhanced the efficiency of our optimization model in terms of both computation time and memory usage. In addition, our method delivered better prediction performance than did the two-dimensional monotonicity model [19] and common machine learning methods. Our method was also helpful in correcting prediction values computed by other machine learning methods.

Our research contribution is threefold. First, we derived two types of posets by exploiting the properties of the recency and frequency of a user’s previous PVs. These posets allow us to place appropriate monotonicity constraints on item-choice probabilities. Next, we developed algorithms for transitive reduction specialized for our posets. These algorithms are more efficient in terms of time complexity than general-purpose algorithms for transitive reduction. Finally, our method demonstrates the great potential of shape-restricted regression for predicting user behavior on e-commerce websites.

Once the optimization model for estimating item-choice probabilities has been solved, the obtained results can easily be put into practical use on e-commerce websites. Accurate estimates of item-choice probabilities will be useful in customizing a sales promotion according to the needs of a particular user. In addition, our method, which can estimate user preferences from clickstream data, aids in creating a high-quality user–item rating matrix for recommender algorithms [20].

A future direction of study will be to develop new posets that further improve the prediction performance of our PV sequence model. Another direction of future research will be to incorporate user/item heterogeneity into our optimization model, as in the latent-class modeling of the two-dimensional probability table [32].

Appendix A Proofs

A-A Proof of Theorem 3

The “only if” part

Firstly, we suppose that (v, w) ∈ E_UM. We then have v ≺_UM w from Definition 4 and Lemma 2; moreover, w must be obtained from v by a single application of Up or Move, since otherwise an intermediate sequence would violate condition (C2) of Lemma 2. Therefore, we consider the following two cases.

Case 1: w = Up_t(v) for some t ∈ [1, τ]
For the sake of contradiction, we assume that condition (UM1) fails (i.e., t < τ). Then there exists an index s ∈ [t + 1, τ]. If v_s ≤ m − 1, then we have u := Up_s(v) and w = Move_{s→t}(u). If v_s = m, then we have u := Move_{s→t}(v) and w = Up_s(u). These results imply that v ≺_UM u ≺_UM w, which contradicts (v, w) ∈ E_UM due to condition (C2) of Lemma 2.

Case 2: w = Move_{s→t}(v) for some s, t ∈ [1, τ] with t < s
We assume that condition (UM2) fails (i.e., s > t + 1) for the sake of contradiction. Then there exists an index r ∈ [t + 1, s − 1]. If v_r ≤ m − 1, then we have u := Move_{s→r}(v) and w = Move_{r→t}(u). If v_r = m, then we have u := Move_{r→t}(v) and w = Move_{s→r}(u). These results imply that v ≺_UM u ≺_UM w, which contradicts (v, w) ∈ E_UM due to condition (C2) of Lemma 2.

The “if” part

Next, we show that (v, w) ∈ E_UM in the following two cases.

Case 1: Condition (UM1) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To derive condition (C2), we consider u ∈ V such that v ⪯_UM u ⪯_UM w. From Lemma 1, we have v ⪯_lex u ⪯_lex w. Since w = Up_τ(v) is next to v in the lexicographic order, we have u ∈ {v, w}.

Case 2: Condition (UM2) is fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To derive condition (C2), we consider u ∈ V such that v ⪯_UM u ⪯_UM w. From Lemma 1, we have v ⪯_lex u ⪯_lex w, which implies that u_r = v_r for all r ∈ [1, t − 1]. Therefore, we cannot apply any operations to v_r for r ∈ [1, t − 1] in the process of transforming v into w. To keep the total number of PVs constant, we can apply only the Move operation. However, once a Move operation other than Move_{t+1→t} is applied, the resultant sequence cannot be converted into w. As a result, only Move_{t+1→t} can be performed. This means that u = v or u = w. ∎

A-B Proof of Theorem 4

The “only if” part

Firstly, we suppose that (v, w) ∈ E_US. We then have v ≺_US w from Definition 5 and Lemma 2; as in the proof of Theorem 3, w must be obtained from v by a single application of Up or Swap. Therefore, we consider the following two cases.

Case 1: w = Up_t(v) for some t ∈ [1, τ]
For the sake of contradiction, we assume that v_s ∈ {v_t, v_t + 1} for some s ∈ [t + 1, τ]. If v_s = v_t, then we have u := Up_s(v) and w = Swap_{s,t}(u). If v_s = v_t + 1, then we have u := Swap_{s,t}(v) and w = Up_s(u). These results imply that v ≺_US u ≺_US w, which contradicts (v, w) ∈ E_US due to condition (C2) of Lemma 2.

Case 2: w = Swap_{s,t}(v) for some s, t ∈ [1, τ] with t < s
For the sake of contradiction, we assume that v_t ≤ v_r ≤ v_s for some r ∈ [t + 1, s − 1]. If v_t < v_r < v_s, then we have u_1 := Swap_{r,t}(v), u_2 := Swap_{s,t}(u_1), and w = Swap_{s,r}(u_2). If v_r = v_t, then we have u := Swap_{s,r}(v) and w = Swap_{r,t}(u). If v_r = v_s, then we have u := Swap_{r,t}(v) and w = Swap_{s,r}(u). In every case, there exists a sequence u with v ≺_US u ≺_US w, which contradicts (v, w) ∈ E_US due to condition (C2) of Lemma 2.