The growth of online retailing has created new opportunities for algorithmic techniques to improve the shopper’s experience and drive customer engagement and revenue growth. One such opportunity is optimizing the selection and order of products in response to a search query. In a typical online retailing scenario, upon receiving a search query from a shopper, the platform displays a relevant assortment of products. These products, which can range from grocery items in Amazon Fresh to a list of restaurants on OpenTable, are ranked by the platform, typically in a vertical list. The online shopper then scrolls down this list, to a point depending on her time and patience, in order to find her favorite product. Based on a model of user behavior, the online platform ranks the products in a way that optimizes certain performance metrics. In this paper, we consider two of the most widely-used performance metrics.
User Engagement is a key metric for any online service, and retailing platforms are no exception. In our context, we define it as the likelihood that a visitor to the website clicks on one of the products she browses.
Revenue is also an important financial metric for a firm. Online retailing platforms have various ways of extracting revenue from sellers that can generally be categorized under one of the following schemes: (1) pay-per-impression, (2) pay-per-click, and (3) pay-per-transaction.
At a high level, we study the platform's problem of ranking the products, with respect to these two objective functions, in the presence of users with various browsing behaviors and tastes. The platform offers $n$ products to its customers. When a user visits the website, the platform has the opportunity of presenting her with a permutation of these products. The platform knows the probability distribution of the patience level of the customer (i.e., how far the customer is likely to scroll down and browse the products). Let $\lambda_i$ denote the probability that the customer has patience level $i$, i.e., she browses the first $i$ products on the list.
The platform also has an estimate of how likely it is that a customer with patience level $i$ browsing a set of products clicks on some product during her browsing. We assume that this is given by monotone submodular set functions $f_i(S_i)$, where $S_i$ is the set of the top $i$ products in the ranking. As we will discuss in detail in Section 2, such functions can capture a variety of choice models considered in the current literature on assortment optimization and online retailing. Finally, we assume a revenue structure in which the platform receives a monetary payment $p_{ij}$ for showing product $i$ in rank $j$, in addition to a constant amount $c$ for each click made by the user.
We consider two problems of interest for the platform.
Maximizing User Engagement. Some online platforms mainly focus on maximizing their market share through improving user engagement. One such example is Wayfair (www.wayfair.com), a multi-billion dollar online retailer of home goods (Ferreira et al. 2019). For these platforms, we consider the problem of maximizing user engagement, i.e., choosing the permutation that maximizes the probability that a visitor clicks on some product she browses.
Maximizing Revenue Subject to Lower Bound on Engagement. Most platforms aim to maximize revenue while trying to keep a certain level of market share. In this paper, we focus on the first two schemes of extracting revenue, namely pay-per-impression and pay-per-click, and omit pay-per-transaction. We assume that the platform charges $p_{ij}$ for showing product $i$ at location $j$, in addition to a fixed dollar amount per click. One such example is OpenTable (www.opentable.com), another multi-billion dollar online platform that provides restaurant reservation services (OpenTable 2010). For such platforms, we consider the problem of maximizing revenue subject to the user engagement staying above a given target level $\mathcal{E}$.
We call functions of the form $\sum_{i=1}^{n} \lambda_i f_i(S_i^\pi)$ sequentially submodular. Subsequently, the maximization problems that contain such functions in the objective (and perhaps the constraints) are called sequential submodular maximization problems. In this paper, we initiate the study of sequential submodular maximization and provide approximation algorithms for these problems.
For the problem of maximizing user engagement, our main contribution is an optimal approximation algorithm. On the technical side, our algorithm is based on a reduction of sequential submodular maximization to the classic problem of submodular maximization subject to a (laminar) matroid constraint. Our reduction relies on two major technical components. The first component is lifting the problem to a larger space where every element is copied $n$ times, and then defining a specific submodular function and laminar matroid in this larger space that capture the objective function and the new combinatorial aspect of our problem, i.e., returning a permutation rather than a set. The second component is a post-processing rounding algorithm that, given a feasible base of the mentioned matroid, returns a permutation while only increasing the objective function. For the reduced problem, we use the known approximation algorithm for maximizing monotone submodular functions subject to matroid constraints (Calinescu et al. 2011), which runs the continuous greedy algorithm on the multilinear extension of our submodular function, accompanied by the pipage rounding algorithm to return a feasible base of the laminar matroid. Putting all these pieces together allows us to provide an optimal $(1 - 1/e)$-approximation algorithm for the problem of maximizing user engagement, an unconstrained sequential submodular maximization problem. We note that this generalizes the result of Ferreira et al. (2019), who provide a $(1 - 1/e)$-approximation for the special case of this problem where the underlying submodular functions are coverage functions.
For the problem of maximizing revenue subject to a lower bound on engagement, our main contribution is a bi-criteria approximation algorithm, that is, an algorithm where the objective is approximately optimized and the constraint is approximately satisfied. At the heart of our algorithm lies a novel characterization of the polytope of all feasible policies (deterministic or randomized) for ranking products. Unfortunately, this polytope can only be described using an exponential number of variables and a (doubly) exponential number of constraints. In order to overcome this obstacle, we approximately solve a simpler relaxed linear program with polynomially many constraints using the ellipsoid method, which approximates the optimal solution over the polytope of feasible policies. The main technical ingredient of this approximation is a contention resolution scheme (Chekuri et al. 2014) defined on a laminar matroid that captures the combinatorial properties of permutations, together with a bound on the integrality gap via the correlation gap for submodular functions (Agrawal et al. 2010). We believe our approach of defining the polytope of feasible ranking policies and finding approximate solutions to submodular optimization problems over it could be of independent interest.
1.1 Related Work
A work closely related to ours is that of Ferreira et al. (2019) who, in the context of online retailing, consider the problem of ranking assortments with the objective of maximizing engagement. This problem is a special case of sequential submodular optimization, where the submodular functions of interest are all coverage functions. Ferreira et al. (2019) provide a 0.5-approximation greedy algorithm for this problem (while showing that a $(1 - 1/e)$-approximation is achievable under the assumption that click probabilities and patience levels are independent). They eventually feed their algorithm into an online “learning-then-earning” algorithm, which poses the interesting question of whether our algorithm can be turned into an online learning algorithm as well.
Also important is the result of Golrezaei et al. (2018), who also study product ranking in online platforms. Their models of customer behavior and product ranking differ from ours in crucial ways, and therefore the results are not quantitatively comparable.
Our work is also related to the practice of display advertising. The platform's decision on which ads to show, and in what order, across different pages of a website (e.g., in different sections of an online newspaper) or within a given page (e.g., throughout a long article) is closely tied to the information the platform has access to regarding both the browsing and clicking behavior of the users. A long stream of work, both in the marketing literature (see for instance Anand and Shachar 2011 and Hoban and Bucklin 2015) and in the optimization literature (see for instance Ghosh et al. 2009, Balseiro et al. 2014, Aouad and Segev 2015, and Sayedi 2018), studies various aspects of this problem. In particular, the trade-off between engagement (as an indicator of the quality of service) and revenue (as an indicator of performance) in online advertising has been a topic of investigation from the optimization perspective (see for instance Lahaie and Pennock 2007 and Radlinski et al. 2008). In the case of display advertising, Zhu et al. (2009) provide a machine learning algorithm for jointly maximizing revenue while providing high-quality ads.
From a methodological point of view, our work fits within the literature of submodular optimization. The main technical challenge of our work is due to the less-considered set of feasible solutions (i.e., permutations) we deal with. Part of our technical contribution (Section 3) is a machinery that relates the problem of sequential submodular optimization to the classic problem of submodular optimization over matroids, hence, allowing us to use the rich body of work on the subject (for instance, see Calinescu et al. 2011). We believe that this machinery could be of independent interest.
Our work is also closely related to assortment planning, the study of optimally presenting a subset of products to a user. Assortment optimization has an extensive and growing literature over recent decades; we refer the reader to related surveys and books (cf. Kök et al. 2008a, Lancaster 1990, Ho and Tang 1998) for a comprehensive treatment. Later works have considered various consumer choice models, e.g., multinomial logit models (Talluri and van Ryzin 2004, Liu and van Ryzin 2008, Topaloglu 2013), the Lancaster choice model (Gaur and Honhon 2006), ranked-list preferences (Honhon et al. 2010, Goyal et al. 2016), and non-parametric (data-driven) choice models (Farias et al. 2013). The main difference between our model and the assortment optimization literature is that we optimize over permutations of products rather than subsets.
The rest of the paper is structured as follows. Section 2 is devoted to formalizing the model and notations. We discuss the customer choice model and the platforms’ problem in detail and characterize sequential submodular maximization problems. In Section 3 we study the user engagement problem. Finally, in Section 4 we study the problem of maximizing revenue subject to a lower bound on user engagement.
2 Model and Notations
Consider an online platform presenting $n$ products to users in a ranked list. Denote the set of products by $N$ and the set of all permutations over $N$ by $\Pi$. The platform chooses a permutation $\pi \in \Pi$ to rank the products presented to a user, knowing the distribution from which the user's type is selected. A user's type is specified by a pair consisting of a patience level and a choice function: a user with patience level $i$ and choice function $f$ inspects the first $i$ products in the list, denoted $S_i^\pi$, and makes a click with probability $f(S_i^\pi)$.
We assume $f$ is a monotone non-decreasing and submodular set function for every user type. This is a natural assumption consistent with a wide range of choice functions including multinomial logit, nested logit, mixtures of multinomial logit functions, and the paired combinatorial logit (see Kök et al. (2008b) for definitions of these models and how they are used in practice). Another example is the general choice function arising from the class of random utility models, in which the value of the user for product $j$ is a random variable $v_j$. The user is unit-demand and chooses the item with the highest value from the consideration set $S$, but only if the value of that item is at least $v_0$, namely, the value of the outside option. In that case, $f(S) = \Pr[\max_{j \in S} v_j \geq v_0]$ is monotone submodular as well, even for possibly correlated values $v_j$.
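To make the random-utility example concrete, here is a small sketch (our own illustrative construction, not code from the paper). Assuming the values $v_j$ are independent and $p_j$ denotes the probability that product $j$ beats the outside option, the click probability becomes $f(S) = 1 - \prod_{j \in S}(1 - p_j)$, and monotonicity and submodularity can be verified by brute force on a tiny instance:

```python
from itertools import combinations
from math import prod

# Click probability under a random utility model with independent values:
# product j beats the outside option with probability p[j], so
# f(S) = P(some j in S beats the outside option) = 1 - prod_{j in S} (1 - p[j]).
def f(S, p):
    return 1.0 - prod(1.0 - p[j] for j in S)

p = [0.5, 0.3, 0.8]  # illustrative per-product success probabilities
n = len(p)
subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]

# Brute-force check of monotonicity and submodularity over all subset pairs.
monotone = all(f(A, p) <= f(B, p) + 1e-12
               for A in subsets for B in subsets if A <= B)
submodular = all(f(A | {j}, p) - f(A, p) >= f(B | {j}, p) - f(B, p) - 1e-12
                 for A in subsets for B in subsets if A <= B
                 for j in range(n) if j not in B)
print(monotone, submodular)       # True True
print(round(f({0, 1}, p), 4))     # 0.65
```

The same properties hold for correlated values, since $f(S)$ is then the probability of a union of events; the independent case just makes the check easy to write down.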
Maximizing User Engagement
In Section 3, we focus on finding an ordering $\pi$ that maximizes the expected probability of a click. We call this problem maximizing user engagement. The formulation of this problem can be simplified with a change of variables. For each $i \in [n]$, define $\lambda_i$ to be the probability that the user's patience level is $i$, and $f_i$ to be the expected choice function of a user conditioned on her patience level being $i$. The probability that the user makes a click is then given by $F(\pi)$ defined as

$F(\pi) = \sum_{i=1}^{n} \lambda_i f_i(S_i^\pi),$

where the $f_i$'s are monotone submodular functions and the $\lambda_i$'s are non-negative.
In the problem of maximizing user engagement, the platform solves the following problem:

$\max_{\pi \in \Pi} \;\; \sum_{i=1}^{n} \lambda_i f_i(S_i^\pi). \qquad (1)$
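For intuition, this objective can be evaluated directly; the following sketch (illustrative instance and hypothetical names) computes $F(\pi) = \sum_i \lambda_i f_i(S_i^\pi)$ and brute-forces the optimum over all permutations of two products:

```python
from itertools import permutations

# Engagement of a permutation: F(pi) = sum_i lam[i] * f_i(S_i), where S_i is
# the set of the first i products (patience levels are 0-indexed here).
def engagement(pi, lam, fs):
    total, prefix = 0.0, set()
    for i, product in enumerate(pi):
        prefix.add(product)
        total += lam[i] * fs[i](frozenset(prefix))
    return total

# Tiny illustrative instance: 2 products, the same coverage-style click
# function at every patience level.
p = {0: 0.5, 1: 0.5}
def click_prob(S):
    miss = 1.0
    for j in S:
        miss *= 1.0 - p[j]
    return 1.0 - miss

lam = [0.5, 0.5]                  # distribution over patience levels
fs = [click_prob, click_prob]

best = max(permutations(range(2)), key=lambda pi: engagement(pi, lam, fs))
print(engagement((0, 1), lam, fs))  # 0.5*0.5 + 0.5*0.75 = 0.625
```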
Note that maximizing the expected user welfare under the random utility models can also be cast as an example of the above problem by a small change in the definition of the $f_i$'s, taking $f_i(S)$ to be the expected utility of the best available option in $S$ (including the outside option) rather than the click probability; this function is monotone submodular as well. For consistency, we limit our attention to the user engagement formulation.
We also consider the platform's problem of maximizing its revenue in Section 4. Online retailers typically have multiple sources of revenue. They may charge sellers for the placement of their product, for clicks on its description, and finally if the user makes a transaction. In this paper, we focus on the first two sources of revenue. Suppose the platform receives a monetary payment of $p_{ij}$ for placing product $i$ in position $j$. It is reasonable to assume that the $p_{ij}$'s are monotone non-increasing in $j$. In addition, assume the platform collects a constant amount $c$ for each click made by the users. The total expected revenue generated by the platform thus becomes

$R(\pi) = \sum_{j=1}^{n} p_{\pi(j)\, j} \;+\; c \sum_{i=1}^{n} \lambda_i f_i(S_i^\pi),$

where $\pi(j)$ denotes the product placed at position $j$.
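The revenue expression above can be evaluated in the same way; a minimal sketch with made-up payment values $p_{ij}$ and per-click amount $c$:

```python
# Expected revenue: placement payments plus c times expected clicks,
# R(pi) = sum_k pay[pi[k]][k] + c * sum_i lam[i] * f_i(S_i).
def revenue(pi, pay, c, lam, fs):
    total = sum(pay[product][k] for k, product in enumerate(pi))
    prefix = set()
    for i, product in enumerate(pi):
        prefix.add(product)
        total += c * lam[i] * fs[i](frozenset(prefix))
    return total

# Illustrative instance: payments non-increasing in position, and only
# product 0 attracts clicks.
pay = {0: [3.0, 1.0], 1: [2.0, 1.0]}
c = 10.0
lam = [0.5, 0.5]
fs = [lambda S: 0.5 if 0 in S else 0.0] * 2

print(revenue((0, 1), pay, c, lam, fs))  # 3 + 1 + 10*(0.25 + 0.25) = 9.0
```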
In the problem of maximizing revenue subject to a lower bound on the user engagement, the platform solves the following problem:

$\max_{\pi \in \Pi} \;\; R(\pi) \quad \text{subject to} \quad \sum_{i=1}^{n} \lambda_i f_i(S_i^\pi) \geq \mathcal{E}. \qquad (2)$
Note that in this problem, both the objective and the constraint involve sequentially submodular functions. Hence, one can only hope for a bi-criteria approximate solution, where the objective is approximately optimized and the constraint is approximately satisfied.
[Bi-criteria Approximation Ratio] Let $\pi^*$ denote the optimum permutation for the optimization problem (2). We call an algorithm an $(\alpha, \beta)$-approximation if it finds a permutation $\pi$ where $R(\pi) \geq \alpha\, R(\pi^*)$ and $\sum_{i=1}^{n} \lambda_i f_i(S_i^\pi) \geq \beta\, \mathcal{E}$.
3 Maximizing User Engagement
In this section, we study the problem of maximizing user engagement which (as discussed in Section 2) can be modeled as the optimization problem (1). The well-studied problem of monotone submodular maximization subject to a cardinality constraint is in fact a special case of this problem, as it can be reduced to this problem by setting $\lambda_k = 1$ and $\lambda_i = 0$ for every $i \neq k$, where $k$ is the cardinality bound. Hence no approximation ratio better than $1 - 1/e$ is achievable, unless $P = NP$.
A natural algorithm for this problem is the naive greedy algorithm, that is, picking the item with the maximum marginal gain to the objective as the next item in the ordering. See Algorithm 1. For particular special cases of our problem, e.g., for the assortment ranking problem in Ferreira et al. (2019) where the click probabilities and attention window lengths are independent, the approximation ratio of greedy is known to be $1 - 1/e$ (cf. Ferreira et al. 2019). Interestingly, unlike the mentioned special cases, the approximation ratio of the greedy algorithm for the general submodular version of our problem is exactly $1/2$. The proof of the approximation ratio and the tightness example are rather straightforward, but we mention them here for completeness.
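One natural way to instantiate this greedy rule in code is the following sketch (our own rendering of Algorithm 1, with the marginal gain of a candidate product counting every patience level that will still see it):

```python
# A rendering of the naive greedy: repeatedly append the product with the
# largest marginal gain to the objective sum_i lam[i] * f_i(S_i).
def greedy_rank(n, lam, fs):
    order, prefix, remaining = [], set(), set(range(n))
    for t in range(n):
        def gain(j):
            # A product placed at position t+1 is seen by patience levels > t.
            return sum(lam[i] * (fs[i](frozenset(prefix | {j})) -
                                 fs[i](frozenset(prefix)))
                       for i in range(t, n))
        best = max(remaining, key=gain)
        remaining.remove(best)
        prefix.add(best)
        order.append(best)
    return order

# Illustrative check: with an additive click function and all mass on
# patience level 1, greedy puts the best single product first.
f = lambda S: 0.9 * (1 in S) + 0.2 * (0 in S)
print(greedy_rank(2, [1.0, 0.0], [f, f]))  # [1, 0]
```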
Proof. Suppose the optimal permutation is $\pi^*$, with prefix sets $O_t$ and $t$-th item $o_t$, and let greedy's permutation $\pi^g$ have prefix sets $A_t$. Writing the objective of greedy as the sum of its per-step gains, we have:

$F(\pi^g) = \sum_{t=1}^{n} \delta_t, \quad \text{where} \quad \delta_t = \sum_{i \geq t} \lambda_i \big[ f_i(A_t) - f_i(A_{t-1}) \big]. \qquad (3)$

Now consider the following sum. Since greedy's gain at step $t$ is at least the gain of placing $o_t$ instead,

$\delta_t \;\geq\; \sum_{i \geq t} \lambda_i \big[ f_i(A_{t-1} \cup \{o_t\}) - f_i(A_{t-1}) \big] \;\geq\; \sum_{i \geq t} \lambda_i \big[ f_i(A_i \cup O_t) - f_i(A_i \cup O_{t-1}) \big],$

where the first inequality follows by the greedy rule and the second inequality follows by submodularity (as $A_{t-1} \subseteq A_i$ for every $i \geq t$). Summing the above inequalities for $t = 1, \dots, n$ and telescoping proves the claim:

$\sum_{t=1}^{n} \delta_t \;\geq\; \sum_{i=1}^{n} \lambda_i \big[ f_i(A_i \cup O_i) - f_i(A_i) \big] \;\geq\; \sum_{i=1}^{n} \lambda_i \big[ f_i(O_i) - f_i(A_i) \big] = F(\pi^*) - F(\pi^g),$

where the last inequality follows by monotonicity. Therefore, by (3), we have $2 F(\pi^g) \geq F(\pi^*)$, which shows the approximation ratio of $1/2$. ∎
Consider two products $a$ and $b$, and two users $u_1$ and $u_2$ that each appear with probability $1/2$; user $u_1$ has patience level $1$ and user $u_2$ has patience level $2$. We assume the selection probability functions of the users are linear: the probability of a click on product $a$ is $1$ for user $u_1$ and $0$ for user $u_2$, and the probability of a click on product $b$ is $0$ for user $u_1$ and $1$ for user $u_2$. In this case, the greedy algorithm (breaking the tie at the first step unfavorably) may pick the ordering $(b, a)$, which achieves an expected user engagement of $1/2$, whereas picking the order $(a, b)$ would achieve an expected user engagement of $1$.
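A tight instance of this kind can be checked numerically (products encoded as $a = 0$, $b = 1$; the instance values are our own illustrative choice: a patience-1 user type clicking only on $a$ and a patience-2 user type clicking only on $b$, each with probability $1/2$):

```python
# Tight-style instance for greedy (illustrative numbers): user u1 has
# patience 1 and clicks only on product a=0; user u2 has patience 2 and
# clicks only on product b=1; each user type has probability 1/2.
lam = [0.5, 0.5]                          # lam[i] = P(patience level i+1)
fs = [lambda S: 1.0 if 0 in S else 0.0,   # aggregate clicks at patience 1
      lambda S: 1.0 if 1 in S else 0.0]   # aggregate clicks at patience 2

def engagement(pi):
    total, prefix = 0.0, set()
    for i, product in enumerate(pi):
        prefix.add(product)
        total += lam[i] * fs[i](frozenset(prefix))
    return total

print(engagement((0, 1)))  # 1.0: both user types end up clicking
print(engagement((1, 0)))  # 0.5: the patience-1 user never sees product a
```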
The main result of this section shows the optimal approximation ratio of $1 - 1/e$ for the user engagement problem. We present Algorithm 2 and prove that it attains this approximation ratio.
Proof. We reduce the problem to submodular maximization subject to a matroid constraint, for which a $(1 - 1/e)$-approximation is known due to Calinescu et al. (2011). The underlying matroid is as follows. The ground set contains the $n^2$ elements $E = \{e_{ij}\}_{i, j \in [n]}$. Each element $e_{ij}$ corresponds to placing product $i$ in position $j$ of the permutation. The family of independent sets $\mathcal{I}$ is defined by a laminar family $\mathcal{L}$ and a capacity function $c$ on $\mathcal{L}$, such that a set $S \subseteq E$ is in $\mathcal{I}$ if and only if $|S \cap L| \leq c(L)$ for each $L \in \mathcal{L}$. The laminar family is the set $\mathcal{L} = \{L_1, \dots, L_n\}$ where

$L_j = \{\, e_{ij'} : i \in [n],\; j' \leq j \,\},$

and the capacity function is equal to $c(L_j) = j$.
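This laminar matroid can be sketched directly, encoding each element $e_{ij}$ as a (product, position) pair (the encoding is ours): a set is independent iff, for every $j$, it uses at most $j$ elements with positions in $1, \dots, j$.

```python
# Laminar matroid on placements: elements e_{ij} encoded as (product, position)
# pairs; a set S is independent iff |S ∩ L_j| <= j for every prefix family
# L_j = {elements with position <= j}.
def independent(S, n):
    return all(sum(1 for (_, pos) in S if pos <= j) <= j
               for j in range(1, n + 1))

print(independent({(0, 1), (1, 2)}, 2))          # True
print(independent({(0, 1), (1, 1)}, 2))          # False: two elements in L_1
print(independent({(2, 1), (0, 3), (1, 3)}, 3))  # True: prefix counts 1, 1, 3
```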
Given the $\lambda_k$'s and $f_k$'s, the function $g : 2^E \to \mathbb{R}_{\geq 0}$ is defined for each $S \subseteq E$ as

$g(S) = \sum_{k=1}^{n} \lambda_k f_k\big(N_k(S)\big), \quad \text{where} \quad N_k(S) = \{\, i : e_{ij} \in S \text{ for some } j \leq k \,\}.$
The problem of maximizing the linear combination of the $f_k$'s over the space of all permutations reduces (in polynomial time) to optimizing $g$ over all the independent sets of the laminar matroid $\mathcal{M} = (E, \mathcal{I})$. We show this by first converting any feasible solution $S \in \mathcal{I}$, potentially resulting from maximizing or approximately maximizing $g$ over $\mathcal{I}$, into a permutation $\pi$ such that $F(\pi) \geq g(S)$. The final approximation guarantee is proved by further showing that the optimal objective value of the matroid problem is no smaller than that of the permutation problem, as any permutation $\pi$ can be naturally mapped to a set $S_\pi \in \mathcal{I}$ such that $g(S_\pi) = F(\pi)$.
First, we establish the main properties of $g$.
The function $g$ is monotone and submodular.
Proof. To prove this, we need to show that for any two subsets $A \subseteq B \subseteq E$, the following two properties hold:

1. $g(A) \leq g(B)$;
2. $g(A \cup \{e\}) - g(A) \geq g(B \cup \{e\}) - g(B)$,

for any $e \in E \setminus B$. By the definition of $N_k(\cdot)$, we must have $N_k(A) \subseteq N_k(B)$ for all $k$. Since each $f_k$ is monotone, for each $k$ we have

$f_k\big(N_k(A)\big) \leq f_k\big(N_k(B)\big),$

and summing these inequalities with the non-negative weights $\lambda_k$ proves property 1. To see property 2, note that for any $e = e_{ij}$ and any $k$, adding $e$ to a set either leaves $N_k(\cdot)$ unchanged (when $k < j$, or when product $i$ is already present) or adds the single product $i$ to it. Again since each $f_k$ is submodular and $N_k(A) \subseteq N_k(B)$, we have for each $k$,

$f_k\big(N_k(A \cup \{e\})\big) - f_k\big(N_k(A)\big) \;\geq\; f_k\big(N_k(B \cup \{e\})\big) - f_k\big(N_k(B)\big),$

which proves the second property. ∎
Given a set $S \in \mathcal{I}$, we can create a permutation $\pi$ such that $F(\pi) \geq g(S)$.

Proof. For each product $i$, we define $\sigma(i)$ to be the smallest $j$ such that $e_{ij} \in S$, and $\sigma(i) = +\infty$ if no such $j$ exists. We then sort the products based on their $\sigma$ values (we arbitrarily break the ties) to get a permutation $\pi$. We claim that $F(\pi) \geq g(S)$.

To see this, consider a product $i$ with $\sigma(i) = j$. Since $S \in \mathcal{I}$, we must have

$\sum_{i' \in N} \mathbb{1}\{\sigma(i') \leq j\} \;\leq\; |S \cap L_j| \;\leq\; c(L_j) = j,$

where $\mathbb{1}\{\cdot\}$ is the indicator variable which has value $1$ if its argument holds and $0$ otherwise. This inequality implies that the position at which each product $i$ appears in $\pi$ has to be less than or equal to $\sigma(i)$. Therefore, for every $k$ we have $N_k(S) \subseteq S_k^\pi$, and by monotonicity of the $f_k$'s, we must have $F(\pi) \geq g(S)$. ∎
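The rounding used in this proof, sorting products by the first position at which they appear in $S$, can be sketched as follows (again encoding $e_{ij}$ as (product, position) pairs):

```python
from math import inf

# Rounding: sigma(i) = smallest position j with (i, j) in S (inf if none);
# sort products by sigma, breaking ties by product index.
def round_to_permutation(S, n):
    sigma = {i: inf for i in range(n)}
    for (i, j) in S:
        sigma[i] = min(sigma[i], j)
    return sorted(range(n), key=lambda i: (sigma[i], i))

S = {(2, 1), (0, 3), (1, 3)}  # an independent set for n = 3
pi = round_to_permutation(S, 3)
print(pi)  # [2, 0, 1]
# As in the claim, every product lands at a position no later than sigma(i).
assert all(pi.index(i) + 1 <= j for (i, j) in S)
```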
Proof. Suppose $\pi \in \Pi$. Let

$S_\pi = \{\, e_{\pi(j)\, j} : j \in [n] \,\}.$

Observe that $S_\pi \in \mathcal{I}$ and, by the definition of $g$, $g(S_\pi) = F(\pi)$. ∎
Putting everything together: due to the first claim (monotonicity and submodularity of $g$) and the result of Calinescu et al. (2011), we can find in polynomial time a set $S \in \mathcal{I}$ such that $g(S) \geq (1 - 1/e) \max_{S' \in \mathcal{I}} g(S')$. The last claim shows that $\max_{S' \in \mathcal{I}} g(S') \geq F(\pi^*)$ for the optimal permutation $\pi^*$. Finally, by the second claim, we can turn $S$ into a permutation $\pi$ (the same as the one created by Algorithm 2) such that

$F(\pi) \;\geq\; g(S) \;\geq\; (1 - 1/e) \max_{S' \in \mathcal{I}} g(S') \;\geq\; (1 - 1/e)\, F(\pi^*),$

which finishes the proof of the first part of the theorem.
To prove the hardness, as mentioned earlier, note that the special case where all the functions $f_i$ are coverage functions and only a single $\lambda_k$ is non-zero is exactly the maximum coverage problem, for which a $(1 - 1/e)$ hardness of approximation is known. Therefore, unless $P = NP$, no polynomial-time algorithm can achieve a better than $(1 - 1/e)$-approximation.
Remark 1. As we mentioned earlier, no policy can achieve a better than $(1 - 1/e)$-approximation even when the $f_i$'s are coverage functions. Coverage functions have appeared in special cases of our problem studied in previous work (e.g., Ferreira et al. (2019)). For such functions, it is possible to get a $(1 - 1/e)$-approximation using an LP-based approach, which might be of independent interest. We present this result in Appendix 6.
Remark 2. Instead of defining a laminar matroid, it is tempting to directly optimize over the space of subsets that correspond to permutations. More specifically, we can treat each element as an edge between position and product and then maximize over the space of perfect matchings. We show in Appendix 7 why this approach does not work.
4 Maximizing Revenue Subject to Lower Bound on Engagement
In this section, we consider the scenario where the platform is interested in finding a permutation of products that maximizes the revenue subject to the user engagement not dropping below a certain threshold $\mathcal{E}$, as defined in eq. (2).
4.1 Polytope of Feasible Ranking Policies
Define a feasible policy for the product ranking problem to be a procedure which starts with an empty list and keeps adding products one by one at the end of the list until it ends up with a permutation. The choice of the element at every step can be deterministic or randomized.
For every $k \in [n]$ and $S \subseteq N$ with $|S| = k$, let $x_S$ represent the probability that the set of the first $k$ elements in the permutation is $S$. We say an assignment of values to the $x_S$'s is implementable if there exists a feasible policy so that, for every set $S$ of size $k$, the probability that it places the products in set $S$ in the first $k$ positions is exactly $x_S$. Given $x$, we can find the probability $q_{ik}$ of product $i$ being at position $k$ by

$q_{ik} = \sum_{S : |S| = k,\, i \in S} x_S \;-\; \sum_{S : |S| = k - 1,\, i \in S} x_S.$

Thus, the expected revenue of the policy that implements $x$ can be written as

$\sum_{i = 1}^{n} \sum_{k = 1}^{n} p_{ik}\, q_{ik} \;+\; c \sum_{k = 1}^{n} \lambda_k \sum_{S : |S| = k} x_S\, f_k(S).$
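The marginals $q_{ik}$ can be computed mechanically from the layer variables; a small sketch with illustrative $x$ values for two products:

```python
# Layer variables: x[S] = probability that S is the set of the first |S|
# products (illustrative values for n = 2).
x = {frozenset({0}): 0.6, frozenset({1}): 0.4, frozenset({0, 1}): 1.0}

def q(i, k):
    # P(product i at position k) = P(i in first k) - P(i in first k-1).
    in_k = sum(p for S, p in x.items() if len(S) == k and i in S)
    in_km1 = sum(p for S, p in x.items() if len(S) == k - 1 and i in S)
    return in_k - in_km1

print(q(0, 1), q(0, 2))  # approximately 0.6 and 0.4
print(q(1, 1), q(1, 2))  # approximately 0.4 and 0.6
```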
We will next identify the necessary and sufficient conditions for implementability of $x$.

The vector $x$ is implementable by a feasible policy if and only if:

(i) For each $k \in [n]$ we have $\sum_{S : |S| = k} x_S = 1$.

(ii) For any collection $\mathcal{D}$ of subsets of $N$ with size $k$, we have:

$\sum_{S \in \mathcal{D}} x_S \;\leq\; \sum_{S' \in \partial(\mathcal{D})} x_{S'},$

where $\partial(\mathcal{D})$ is the collection of all the subsets $S'$ of $N$ with size $k + 1$, such that there exists a set $S \in \mathcal{D}$ with $S \subset S'$.
Proof. We first show the easy direction. Every policy by definition fills all the positions with products, so for each $k$ there exists a set of size exactly $k$ that is placed in the first $k$ positions; thus, any implementable $x$ satisfies condition (i). For condition (ii), consider a collection $\mathcal{D}$ of subsets of $N$ with size $k$, and suppose $\pi$ is the (random) permutation generated by the policy. Note that whenever $S_k^\pi \in \mathcal{D}$, we have $S_{k+1}^\pi \in \partial(\mathcal{D})$. Therefore,

$\sum_{S \in \mathcal{D}} x_S = \Pr\big[S_k^\pi \in \mathcal{D}\big] \;\leq\; \Pr\big[S_{k+1}^\pi \in \partial(\mathcal{D})\big] = \sum_{S' \in \partial(\mathcal{D})} x_{S'}.$
For the opposite direction, suppose $x$ satisfies conditions (i) and (ii). We will show how to construct a policy with assignment probabilities equal to $x$. To do this, observe that we can check implementability of $x$ "layer by layer". Assume that for every $S$ with cardinality equal to $k$, the probability that the set of the first $k$ products in the permutations generated by the policy is $S$ equals $x_S$. We want to check whether there exists a way to add an additional product at position $k + 1$, such that for every $S'$ with cardinality $k + 1$, the probability that the set formed by the first $k + 1$ products in the permutation is $S'$ equals $x_{S'}$. We call this property implementability of layer $k + 1$. It is clear from the definition that if $x$ is implementable, then all the layers are implementable as well. In addition, if all the layers are implementable, then $x$ is implementable.
In order to check the implementability of a layer, we use a max-flow argument. Construct a flow network consisting of a node for each subset $S$ of size $k$ and a node for each subset $S'$ of size $k + 1$. For any two subsets $S \subset S'$ of size $k$ and $k + 1$, respectively, there exists an edge from $S$ to $S'$ with infinite capacity. We also have a source $s$ and a sink $t$. The source is connected to each node $S$ with capacity $x_S$, and each node $S'$ is connected to the sink with capacity $x_{S'}$. Note that the sums of the capacities of the edges exiting the source and entering the sink are both $1$ due to condition (i). See Figure 1 for more details.
The layer $k + 1$ is implementable if and only if there exists a flow of value $1$ from source node $s$ to sink node $t$ in the flow network (Figure 1).
Proof. To see the if direction, assume there exists a flow of value $1$ from the source to the sink. We can implement layer $k + 1$ as follows. Whenever the set of the first $k$ elements is $S$, we add product $j \notin S$ in position $k + 1$ with probability $y_{S, S \cup \{j\}} / x_S$, where $y_{S, S'}$ is the flow going from $S$ to $S'$. By doing so, since a flow of value $1$ saturates every edge into the sink, the probability of any set $S'$ of size $k + 1$ appearing in the first $k + 1$ positions is

$\sum_{S \subset S'} x_S \cdot \frac{y_{S, S'}}{x_S} = \sum_{S \subset S'} y_{S, S'} = x_{S'}.$

To see the only if direction, suppose layer $k + 1$ is implementable. Let $z_{S, j}$ be the probability that the policy implementing layer $k + 1$ places product $j$ at position $k + 1$, conditioned on the first $k$ products being the set $S$. We define a feasible flow of value $1$ as follows. Let the flow from $s$ to each node $S$ be $x_S$. Similarly, let the flow from each node $S'$ to $t$ be $x_{S'}$. For any node $S$ and any $j \notin S$, let the flow from $S$ to $S \cup \{j\}$ be $x_S \cdot z_{S, j}$. Note that we have $\sum_{j \notin S} z_{S, j} = 1$, and therefore the inflow and outflow of each node $S$ are equal. In addition, due to implementability, we must have $\sum_{S \subset S'} x_S \cdot z_{S, S' \setminus S} = x_{S'}$ for any $S'$ with $|S'| = k + 1$, and therefore the inflow and outflow of each node $S'$ are equal. So, we have a feasible flow of value $1$ from $s$ to $t$. ∎
Finally, we show that there exists a flow of value $1$ from $s$ to $t$ at each layer if and only if the second property holds. This in fact follows from the generalized Hall's matching theorem for network flows (Hall 1935), or equivalent versions of the max-flow min-cut theorem (Ford and Fulkerson 1958). Therefore, by the claim above, all the layers are implementable.
Given that all layers are implementable, we construct our final policy as follows. We start with an empty sequence, and at each step we add a new product at the end of the current sequence. If at step $k$ the set of added products so far is $S$, at step $k + 1$ we select product $j$ independently with probability $y_{S, S \cup \{j\}} / x_S$, and add it to the end of the sequence. Here, $y_{S, S'}$ is the flow going from $S$ to $S'$ in the graph we constructed to check implementability of layer $k + 1$. Due to implementability of all layers, this randomized policy implements $x$. ∎
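The layer-by-layer check can be sketched end to end with a toy max-flow routine (a simple Edmonds-Karp; the instance numbers are our own). A max flow of value $1$ certifies that the layer is implementable, and the flow values $y_{S,S'}$ give the sampling probabilities of the policy:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map with real capacities."""
    res = {u: dict(vs) for u, vs in cap.items()}
    for u in list(res):
        for v in list(res[u]):
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edges
    value = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if v not in parent and c > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, res
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[u][w] for u, w in path)    # bottleneck capacity
        for u, w in path:
            res[u][w] -= b
            res[w][u] += b
        value += b

# Checking implementability of layer 2 for n = 3 (illustrative x values).
x1 = {frozenset({0}): 0.5, frozenset({1}): 0.5}
x2 = {frozenset({0, 1}): 0.3, frozenset({0, 2}): 0.2, frozenset({1, 2}): 0.5}
cap = {'s': {('A', S): p for S, p in x1.items()}}
for S in x1:
    cap[('A', S)] = {('B', T): 10.0 for T in x2 if S < T}  # "infinite" capacity
for T, p in x2.items():
    cap[('B', T)] = {'t': p}

value, _ = max_flow(cap, 's', 't')
print(abs(value - 1.0) < 1e-9)  # True: the layer is implementable
```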
Given Theorem 4.1, we can define the polytope of feasible policies for the product ranking problem, denoted by $\mathcal{P}$, as follows. The variables are $\{x_S\}_{\emptyset \neq S \subseteq N}$, where $x_S$ represents the probability that $S$ is the set of the first $|S|$ elements realized by the policy:

$\mathcal{P} = \Big\{ x \geq 0 \;:\; \sum_{S \in \mathcal{N}_k} x_S = 1 \;\;\forall k \in [n], \quad \sum_{S \in \mathcal{D}} x_S \leq \sum_{S' \in \partial(\mathcal{D})} x_{S'} \;\;\forall k \in [n-1],\; \forall \mathcal{D} \subseteq \mathcal{N}_k \Big\},$

where $\mathcal{N}_k$ is the collection of all the subsets of $N$ with size $k$. The first two constraints are conditions (i) and (ii) in Theorem 4.1. Now suppose we are interested in the revenue-maximizing policy that keeps the user engagement above some threshold $\mathcal{E}$. Given $\mathcal{P}$, the LP to find such a policy is

$\max \;\; \sum_{i, k} p_{ik}\, q_{ik} + c \sum_{k=1}^{n} \lambda_k \sum_{S \in \mathcal{N}_k} x_S f_k(S) \quad \text{s.t.} \quad q_{ik} = \sum_{S \in \mathcal{N}_k : i \in S} x_S - \sum_{S \in \mathcal{N}_{k-1} : i \in S} x_S, \quad \sum_{k=1}^{n} \lambda_k \sum_{S \in \mathcal{N}_k} x_S f_k(S) \geq \mathcal{E}, \quad x \in \mathcal{P}. \qquad (4.1)$
In the above LP, the first constraint computes the values of $q_{ik}$ from $x$, the second constraint enforces that the solution keeps user engagement above $\mathcal{E}$, and the third constraint guarantees that the solution is implementable. Unfortunately, this LP has exponentially many variables and (doubly) exponentially many constraints, which renders it unsolvable as is. We will show in the next section how we can drop some of these constraints and keep only polynomially many without the objective value increasing by much. This will allow us to approximately solve the relaxed LP in polynomial time and round its solution to get a constant-factor bi-criteria approximation policy.
4.2 The Bi-criteria Approximately Optimal Policy
The linear program (4.1) has exponentially many variables and (doubly) exponentially many constraints, and thus is not solvable exactly. We are also not aware of any technique for obtaining an approximately optimal solution of this LP directly. To circumvent these issues, we find a relaxation of this program that has exponentially many variables but polynomially many constraints. We then consider the dual of that program, which has polynomially many variables and exponentially many constraints. Finally, we find an approximately optimal separation oracle for the dual, and by employing the ellipsoid method we obtain an approximately optimal solution for the primal.
More formally, for every product $i$ and every $k \in [n-1]$, define the collection $\mathcal{D}_{ik} = \{S \in \mathcal{N}_k : i \in S\}$. We relax the program by only keeping the constraints of condition (ii) whose collections are among the $\mathcal{D}_{ik}$'s. Based on our definition of $q$, these constraints correspond to $q_{i, k+1} \geq 0$.
We further change some of the constraints of LP (4.1) from equality to inequality and obtain a relaxed linear program, which we refer to as LP (4.2).
The objective value of the above LP is clearly an upper bound on the revenue of the optimal policy, since we have only relaxed constraints. In Appendix 8, we show by a computer-aided example that the objective value of LP (4.2) might in fact be strictly more than the revenue of the optimal policy.
In the rest of this section, we first present an approach for finding a permutation of products that achieves a constant fraction of $\mathcal{E}$ as its user engagement, and a constant fraction of the optimal objective value of LP (4.2) as its revenue. We present an overview of our approach in Algorithm 3 and then describe how to carry out each step of the algorithm in what follows.
Putting everything together, we prove the following theorem, which is the main result of this section.
Given values $\lambda_i$, $p_{ij}$, $c$, and functions $f_i$, and a lower bound $\mathcal{E}$ on user engagement, Algorithm 3 finds a permutation $\pi$ in polynomial time, such that:
The revenue generated by $\pi$ is within a constant factor of the objective value of LP (4.2) (which is at least the revenue of the optimal policy satisfying the lower bound on user engagement).
The user engagement generated by $\pi$ is at least a constant fraction of $\mathcal{E}$.
Therefore, it is a bi-criteria $(\alpha, \beta)$-approximation algorithm for absolute constants $\alpha$ and $\beta$.
4.3 Detailed Steps of Algorithm 3 and Proof of Theorem 4.2
We start with the following technical lemma, which essentially shows how to find a fractional approximately optimal solution to our relaxed linear program (4.2).
Given values $\lambda_i$, $p_{ij}$, $c$, and functions $f_i$, and a lower bound $\mathcal{E}$ on user engagement, we can find a solution $x$ to LP (4.2) in polynomial time such that:
This solution attains an objective value at least a constant fraction of the optimal objective value of LP (4.2).
This solution satisfies the second (engagement) constraint approximately, i.e., up to a constant factor,
and all the other constraints exactly.
Proof. We approximately optimize a linear program slightly different from LP (4.2), and then show how the obtained solution of this modified LP relates to the solution of the original LP (4.2). We start by writing the dual of LP (4.2).
Define ModifiedLP2 to be the same as LP (4.2), with the exception that the fourth constraint is replaced with