This paper revisits a quintessential problem in online algorithms and mechanism design: how should a planner allocate a limited number of goods or resources to a set of agents arriving over time? Examples of this canonical problem range from selling seats in a concert hall to the multi-billion dollar online retail and sponsored-search auctions. In many of these applications, it is often reasonable to assume that each agent has a private valuation drawn from a known distribution. Moreover, the allocation is often subject to combinatorial constraints such as matroids, matchings, or knapsacks. The goal of the planner is then to maximize social-welfare, i.e. the total value of served agents. (Footnote 1: In a single-parameter Bayesian setting like the one in this paper, the problem of maximizing revenue can also be reduced to the maximization of welfare with a simple transformation using (ironed) virtual values (Myerson, 1981).) This allocation problem, termed Bayesian online selection, originated from the seminal work of Krengel and Sucheston in the 1970s, and has since been studied quite extensively (see Lucier, 2017, for a comprehensive survey).
A common approach to the above stochastic online optimization problem is to attain the so-called prophet inequalities; the goal there is to evaluate the performance of an online algorithm relative to an offline “omniscient prophet”, who knows the valuation of each agent and therefore can easily maximize the social-welfare. The upshot of a significant line of work studying prophet inequalities is that in many complex combinatorial settings there exist simple and elegant take-it-or-leave-it pricing rules that obtain a constant factor approximation with respect to the omniscient prophet benchmark. Examples include (but are not limited to) matroids (Samuel-Cahn et al., 1984; Hajiaghayi et al., 2007; Chawla et al., 2010; Kleinberg and Weinberg, 2012), matchings (Chawla et al., 2010; Alaei et al., 2012; Alaei, 2014), and combinatorial auctions (Feldman et al., 2013). Somewhat surprisingly, it is also often possible to prove matching information theoretic lower-bounds (e.g. in matroids), i.e. showing that no online algorithm can obtain a better constant factor of the omniscient prophet than that of the simple pricing rules.
In this paper, we deviate from the above framework and dig into the question of characterizing and computing optimum online policies. Given the sequence of value distributions, Richard Bellman’s “principle of optimality” (Bellman, 1954) proposes a simple dynamic programming that computes the optimum online policy for all of the above problems. Again, these policies turn out to be simple adaptive pricing rules. On the flip side, the dynamic program needs to track the full state of the system and therefore it often requires exponential time and space.
While there are fairly strong lower bounds for the closely related computation of Markov Decision Processes (see Papadimitriou and Tsitsiklis (1987) for the PSPACE-hardness of general Markov decision processes with partial observations), the computational complexity of stochastic online optimization problems with a concise combinatorial structure, like the one we are considering here, is poorly understood. Here, we ask whether it is possible to approximate the optimum online policy in polynomial time, and obtain improved approximation factors compared to those derived from the prophet inequalities. An affirmative answer to this question justifies the optimum online policy as a less pessimistic benchmark than the omniscient prophet.
We focus on two special cases of the Bayesian online selection problem. Consider a firm producing (or receiving) multiple copies of different product types over time. The firm offers the products to arriving unit-demand buyers, where each buyer is interested in one product type and has a valuation drawn independently from a known, possibly non-identical, distribution. The goal is to compute approximations to the optimum online policy for maximizing social-welfare (or revenue) subject to two constraints. First, at any time the total number of sold items of each type is no more than the number of produced items. Second, the total number of sold items does not exceed the total shipping capacity. We term this stochastic online optimization problem production constrained Bayesian selection.
We also consider a generalization of the above problem to laminar Bayesian selection, which is the special case of the well-known matroid Bayesian online selection problem studied in Kleinberg and Weinberg (2012) in which the underlying matroid is laminar. In this problem, elements arrive over time with values drawn from heterogeneous but known independent distributions. Kleinberg and Weinberg (2012) show a tight 2-approximation prophet inequality for this problem. Again, we focus on approximations to the optimum online policy instead and show that both of these problems are amenable to polynomial time approximations with any degree of accuracy.
Main results: we give Polynomial Time Approximation Schemes (PTAS) for the production constrained Bayesian selection problem, as well as its generalization to the laminar Bayesian selection problem when the depth of the laminar family is bounded by a constant.
Overview of the techniques.
We start by characterizing the optimum online policy for both of the problems through a Linear Programming formulation. The LP formulations capture Bellman’s dynamic program by tracking the state of the system through allocation and state variables (see Section 2 for more details) and express the conditions for a policy to be feasible and online implementable as linear constraints. The resulting LPs are exponentially big, but they admit polynomial-sized relaxations with a small error. Furthermore, the relaxations can be rounded and implemented as online implementable policies, in the same way as the exponential-sized LPs.
More precisely, we propose a hierarchy of linear programming relaxations that systematically strengthen the commonly used “ex-ante” LP formulation of the problem and approximate the optimum solution with any degree of accuracy. The first level of our LP hierarchy is the ex-ante relaxation, which is a simple linear program requiring that the allocation satisfies the capacity constraint(s) only in expectation. It is well-known that the integrality gap of this LP is 2 (Düetting et al., 2017; Alaei, 2014). At the other extreme, the linear program is of exponential size and is equivalent to the dynamic program.
Given $\epsilon$ as the error parameter of the desired PTAS, we show how to choose a linear program that combines the constraints of these two LPs in a careful way to get $\epsilon$-close to the optimum solution. In a nutshell, this hierarchy is parametrized by how we divide up the capacity constraints into “large” and “small”. In the production constrained Bayesian selection, we simply consider two cases based on shipping capacity being large or small (right figure). In the laminar Bayesian selection, we consider the tree corresponding to the laminar family of constraints. Our approach here is based on chopping the tree (with the constraints as its internal nodes) by a horizontal cut, and then marking the constraints above the cut as large and below the cut as small (left figure). The final relaxation then needs to respect all the small constraints exactly and all the large constraints only in expectation.
Our final algorithms start by reducing the capacities of large bins by a small multiplicative factor to create some slack, solve the corresponding LP relaxation, and then adaptively round the solution. A coupling argument shows that the LP solution can be implemented with an adaptive online pricing policy (potentially with randomized tie-breaking). However, the resulting online policy respects the large constraints only in expectation. The main technical ingredient of the remaining analysis is establishing negative cylinder dependency between the allocation events of this policy; this in turn leads to concentration results on the number of allocated items in large bins (e.g. see Dubhashi and Ranjan, 1998), and shows that the policy only violates the large capacity constraints with a small probability.
In the first problem, the negative dependency analysis uses a very careful argument that essentially establishes submodularity of the value function of the dynamic program. See Section 2 for the details. Surprisingly, the negative dependency of optimum online policies no longer holds for laminar matroids with arbitrary arrival order of elements. We present examples in which the event that one agent accepts the offered price leads the optimum online to offer a lower price to the next agent! In this case, we use a different trick by carefully chopping the laminar tree and marking the constraints to ensure negative dependency. See Section 3 for the marking algorithm and its analysis.
Further related work.
Besides the combinatorial settings mentioned earlier, constraints such as knapsack (Feldman et al., 2016), $k$-uniform matroids (for better bounds) (Hajiaghayi et al., 2007; Alaei, 2014), or even general downward-closed constraints (Rubinstein, 2016) have been studied in the literature on prophet inequalities. Moreover, many variations such as prophet inequalities with limited samples from the distributions (Azar et al., 2014), i.i.d. and random order prophets (Esfandiari et al., 2017; Abolhassani et al., 2017; Azar et al., 2018), and free-order prophets (Yan, 2011) have been explored, and connections to the price of anarchy (Düetting et al., 2017), online contention resolution schemes (Feldman et al., 2016; Lee and Singla, 2018), and online combinatorial optimization (Göbel et al., 2014) have been of particular interest in this literature. Finally, techniques and results in this literature have had an immense impact on mechanism design (Chawla et al., 2010; Cai et al., 2012; Feldman et al., 2013; Babaioff et al., 2015; Cai et al., 2016; Chawla and Miller, 2016). For a full list, refer to Lucier (2017).
Stochastic optimization problems with similar flavors, either online or offline, have also been studied extensively in both the computer science and the operations research literature. Examples include (but are not limited to) stochastic knapsack (Dean et al., 2004; Bhalgat et al., 2011; Ma, 2014), online stochastic matching (Manshadi et al., 2012), stochastic probing (Chen et al., 2009; Gupta et al., 2016), and pre-planning in stochastic optimization (Immorlica et al., 2004). Closest to our work in the computer science literature are Li and Yuan (2013) and the recent work of Fu et al. (2018). The closest work in the operations research literature to our paper is Halman et al. (2014). These papers also obtain PTASs for some specific stochastic dynamic programs; however, they diverge from our treatment in terms of techniques, results, and the category of problems they can solve.
2 Production Constrained Bayesian Selection
The goal of this section is to first formalize the production constrained Bayesian selection problem, and then propose a PTAS for the optimal online policy for maximizing social-welfare. On our way to achieve this goal, we will discuss an exponential-sized dynamic program and how it can also be written as a linear program. We further relax this linear program and explore how it can be rounded to a feasible online policy without a considerable loss in expected social-welfare. The combination of these two ideas gives us our first polynomial time approximation scheme.
2.1 Problem description
Consider a firm that produces multiple copies of different product types over time. The firm offers these items in an online fashion to arriving unit-demand buyers, where each buyer is interested in one type and has a private valuation drawn independently from a possibly different but known distribution. We further assume the ordering of buyers and their types are known in advance. Our goal is to find a feasible online policy for allocating the items to the buyers to maximize social-welfare, or equivalently the sum of the valuations of all the served buyers. A feasible policy should respect the production constraints, i.e. at any time the number of sold items of each type is no more than the number of produced items. Moreover, it should respect the shipping constraint, i.e. the total number of sold items does not exceed the total shipping capacity. (Footnote 2: As a running example throughout the paper, the reader is encouraged to think of TESLA Inc. as the firm and its different models of electric cars, i.e. Model 3, Model X and Model S, as different product types.)
We assume each buyer has a private value drawn independently from that buyer's value distribution (which can be atomic or non-atomic). Buyers arrive at continuous times over a span of several days, and reveal their values upon arrival. (Footnote 3: Although the main goal of this paper is the selection problem and not incentive compatible mechanism design, as we will see later, all of our policies are pricing policies and hence truthful for myopic buyers.) Suppose that by the beginning of each day the firm has received (or produced) a known number of units of each type. For each type and each day, the set of buyers of that type arriving before that day is referred to as a bin; we also consider the set of all the buyers of each type. See Fig. 1 for more details.
Throughout the paper, we will focus on characterizing the optimal online policy and will evaluate our algorithms against that benchmark. In that sense, we deviate from the prophet inequality framework that compares various policies against the omniscient prophet, or equivalently the optimum offline. It is not hard to see, and we will show this later, that these two benchmarks can differ by a factor of 2 even for the special case of the single item prophet inequality (see also Kleinberg and Weinberg, 2012).
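As a small illustration of this gap, the sketch below evaluates the classical two-buyer, single-item instance from the prophet inequality literature; it is an assumed textbook example and not necessarily the one used later in the paper. The first buyer has value 1 for sure, and the second has value 1/eps with probability eps and 0 otherwise. The function name `single_item_benchmarks` is hypothetical.

```python
from fractions import Fraction

def single_item_benchmarks(eps):
    """Return (optimal online welfare, prophet welfare) for the two-buyer instance.

    Buyer 1 has value 1 with certainty; buyer 2 has value 1/eps with
    probability eps and value 0 otherwise. Only one item can be sold.
    """
    eps = Fraction(eps)
    high = 1 / eps
    # An online policy either sells to buyer 1 or waits for buyer 2.
    take_first = Fraction(1)
    wait_for_second = eps * high
    online = max(take_first, wait_for_second)
    # The prophet sees both values and always picks the larger one.
    prophet = eps * max(high, Fraction(1)) + (1 - eps) * 1
    return online, prophet

online, prophet = single_item_benchmarks(Fraction(1, 100))
print(online, prophet)  # the ratio prophet / online equals 2 - eps
```

As eps shrinks, the ratio between the two benchmarks approaches 2.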
2.2 A simple exponential-sized dynamic program
Our production constrained Bayesian selection problem can be solved exactly using a simple exponential-sized dynamic program. Let the state be the vector maintaining the current number of sold products of each type. We say that a state is feasible at a given time if it can be reached at that time by a feasible online policy respecting all production constraints and the shipping constraint. Whether a state is feasible at a given time can be checked using a simple greedy algorithm.
Define to be the maximum total expected welfare that an online policy can obtain from time to time given . Define when is not feasible at time and for all . We can compute for the remaining values of and recursively as follows. At time , the policy offers the buyer the price . Depending on whether or not the value of the customer is above , the mechanism obtains either or , where is the standard basis vector with a single non-zero entry of value at location and is the request type of buyer . The probability of each event can be computed using the distribution of valuation of buyer . Therefore, the dynamic programming table can be computed using the following rule also known as the Bellman equation:
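In notation assumed here purely for illustration (the symbols are not taken from the paper): write $V_t(\mathbf{s})$ for the value function at time $t$ and state $\mathbf{s}$, $v_t$ for the value of buyer $t$, $j_t$ for her type, and $\mathbf{e}_{j_t}$ for the corresponding standard basis vector, with the value function equal to zero after the last buyer. Under these assumptions, the recursion described above can be written as:

```latex
V_t(\mathbf{s})
  \;=\; \max_{p}\;\Big\{
      \mathbb{E}\big[\big(v_t + V_{t+1}(\mathbf{s}+\mathbf{e}_{j_t})\big)\,
                     \mathbf{1}\{v_t \ge p\}\big]
      \;+\; \Pr[v_t < p]\; V_{t+1}(\mathbf{s})
  \Big\},
\qquad
p^{*}_t(\mathbf{s}) \;=\; V_{t+1}(\mathbf{s}) - V_{t+1}(\mathbf{s}+\mathbf{e}_{j_t}).
```

The threshold $p^{*}_t(\mathbf{s})$ is the value at which serving and skipping the buyer yield the same continuation welfare, which is why it maximizes the right-hand side.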
Note that the price maximizes the above equation, and so the final prices of an optimal online policy can be computed easily given the table values.
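The recursion can be sketched in a few lines for small instances with discrete value distributions. This is a minimal illustration under stated assumptions, not the paper's implementation: the input encoding (a list of `(type, units_available, distribution)` triples), the function name `optimal_online_welfare`, and the simplification that a production constraint is checked at the moment of sale (valid when cumulative production is nondecreasing) are all assumptions made here.

```python
from functools import lru_cache

def optimal_online_welfare(buyers, capacity):
    """Exact Bellman recursion over (time, state); exponential in the number of types.

    buyers:   list of (type_index, units_available, [(value, probability), ...])
              in arrival order; units_available is the cumulative number of units
              of that type produced by the buyer's arrival (assumed nondecreasing).
    capacity: total shipping capacity.
    """
    n = len(buyers)
    num_types = 1 + max(j for j, _, _ in buyers)

    @lru_cache(maxsize=None)
    def value(t, state):
        if t == n:
            return 0.0
        j, available, dist = buyers[t]
        skip = value(t + 1, state)
        # Serving is impossible if type j is sold out or shipping is exhausted.
        if state[j] >= available or sum(state) >= capacity:
            return skip
        served = state[:j] + (state[j] + 1,) + state[j + 1:]
        serve = value(t + 1, served)
        # Posting the price skip - serve is the same as serving exactly when
        # v + serve >= skip, so each realized value contributes the larger option.
        return sum(p * max(v + serve, skip) for v, p in dist)

    return value(0, (0,) * num_types)

# One type, one unit: a sure value of 1 followed by a value of 3 with probability 1/2.
example = [(0, 1, [(1.0, 1.0)]), (0, 1, [(3.0, 0.5), (0.0, 0.5)])]
print(optimal_online_welfare(example, capacity=1))  # 1.5
```

The memoized state is the full vector of sold counts, which is what makes the table exponentially large in general.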
The above dynamic program has an exponentially large table. In the rest of this section, we describe a linear programming formulation equivalent to the above dynamic program, a natural relaxation for the LP, and a randomized rounding of the relaxation that yields a PTAS.
2.3 Linear programming formulation and ex-ante relaxation
An online policy can be fully described by allocation variables , where for every time and state , represents the conditional probability of the event that the buyer at time is served and the state upon her arrival is , conditioned on . We further use state variables to represent the probability of the event that an online policy reaches the state upon the arrival of buyer , and auxiliary variables for the marginal allocation probability of buyer conditioned on .
Having this description, the LP formulation of the dynamic program in Section 2.2 is a combination of two new ideas. The first idea is to ensure the feasibility of the policy by adding the constraint to the linear program for an infeasible state at time . The second idea is to add another constraint describing how the probability updates from time to as the result of the probabilistic decision made by the policy at time . As will be elaborated more later, this constraint is the necessary and sufficient condition for any policy to be implementable in an online fashion.
Let the state space be a finite set containing all possible feasible states at any time. (Footnote 4: For ease of exposition, we do not consider time-specific state spaces.) In particular, take it to be the set of all states that can arise by serving a subset of buyers whose size is at most the shipping capacity; at any given time, only a subset of these states is actually reachable. Consider the following exponential-sized (both in the number of variables and constraints) linear program:
where is the polytope of point-wise feasible online policies, defined by these linear constraints:
where, as a reminder, is the type of the buyer arriving at and is the standard basis vector with a single non-zero entry of value at location . We also use to denote the set of all forbidden neighboring states at time . It is easy to see that the set has at most states.
It is also not hard to see that any feasible online policy induces a feasible assignment for the linear program (LP). The only tricky constraint to check is the constraint corresponding to the “state update”. To do so, note that the online policy will reach the state at time if and only if either the state at time is and the buyer is not served, or the state at time is and the buyer gets served, evolving the state from to .
More importantly, we show the converse holds by proposing an exact rounding algorithm in the form of an adaptive pricing with randomized tie-breaking policy; such a policy offers the price to buyer if the current state is . In case of a tie (), the pricing policy breaks the tie independently with probability , in favor of selling the item.
There exists an adaptive pricing policy with randomized tie breaking, whose expected social-welfare is equal to the optimal solution of the linear program (LP) and is a feasible online policy for the production constrained Bayesian selection problem.
We postpone the formal proof and a discussion on how to compute prices and tie-breaking probabilities (given the LP optimal assignment) to Appendix A, and just sketch the main ideas here.
Let and be the optimal solutions of LP. Consider the following simple online randomized rounding scheme: start from the all-zero assignment at time . Now, suppose at time , the current state, i.e., number of sold products of different types, is and the realized buyer value is . The rounding algorithm first checks whether is zero. If yes, it skips the buy request. Otherwise, it accepts the request with probability .
It is not hard to show this simple scheme will have allocation and state probabilities matching the LP optimal assignment, i.e. and . Moreover, for all forbidden neighboring states , i.e. infeasible states that can only be reached from a feasible state at time by accepting an extra request. Hence an inductive argument shows that the resulting online policy is always feasible. There is also a simple coupling argument, with shifting the probability masses to higher values, showing that the above algorithm can be implemented using an adaptive pricing policy with the randomized tie breaking. Prices and probabilities can then be computed by straightforward calculations. ∎
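The last step of this sketch, turning a target acceptance probability into a posted price with randomized tie-breaking, can be illustrated for a discrete value distribution. The helper below is a hypothetical sketch (its name, signature, and input encoding are assumptions, and it is not the procedure of Appendix A): it concentrates the acceptance probability on the highest values and randomizes only on the atom at the threshold.

```python
def price_and_tiebreak(dist, q):
    """Find (price, theta) with P[v > price] + theta * P[v == price] == q.

    dist: list of (value, probability) pairs with positive probabilities.
    q:    target probability of serving the buyer, between 0 and 1.
    The policy serves when v > price, and with probability theta when v == price.
    """
    mass_above = 0.0
    for value, prob in sorted(dist, reverse=True):
        # Stop at the first atom whose mass is needed to reach q.
        if mass_above + prob >= q - 1e-12:
            theta = min(1.0, max(0.0, (q - mass_above) / prob))
            return value, theta
        mass_above += prob
    raise ValueError("q exceeds the total probability mass of dist")

price, theta = price_and_tiebreak([(1.0, 0.5), (2.0, 0.3), (3.0, 0.2)], 0.35)
print(price, theta)  # price 2.0, theta close to 0.5
```

Among all selection rules that serve the buyer with probability q, this one serves the highest values, which matches the intuition behind shifting probability mass toward higher values.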
LP has exponential size. Nevertheless, without a shipping constraint, it can be solved in polynomial time. In fact, any online policy can be decomposed into separate online policies for type-specific sub-problems; in each sub-problem, its corresponding policy only requires to respect the production constraints of its type. At the same time, the dynamic programming table of each sub-problem is polynomial-sized, as the state at time is essentially the number of sold products of type before . Therefore the overall optimal online policy can be computed in polynomial time.
What if we relax the shipping constraint to hold only in expectation (over the randomness of the policy/values)? This relaxation is used in the prophet inequality literature (Alaei et al., 2015; Feldman et al., 2016; Düetting et al., 2017; Lee and Singla, 2018), where is termed as the ex-ante relaxation.
Similar to the linear program of the optimal online policy (LP), we formulate the ex-ante relaxation as an LP. First, we re-define the state space to be the set of possible states of each sub-problem. (Footnote 5: Notably, this set only needs to be a superset of all feasible states of each sub-problem at any time.) Second, for each type and each buyer of that type, we use allocation variables, marginal variables, and state variables, where the state now represents the number of sold items of that type before the arrival of the buyer. We further use variables representing the expected number of served buyers of each type.
where is the set of all buyers of type and is the polytope of point-wise feasible online policies for serving type buyers, defined by the following set of linear constraints (similar to LP):
where is the set of all forbidden neighboring states of the sub-problem of type at time , i.e.
Note that because of the collapse of the state space, LP program has polynomial size.
As every online policy for our problem induces a feasible online policy for each request type , and because it respects the shipping capacity point-wise, we have the following proposition.
LP is a relaxation of the optimal online policy for maximizing expected social-welfare in the production constrained Bayesian selection problem.
2.4 Polynomial-Time Approximation Scheme (PTAS)
Given parameter , our proposed polynomial time approximation scheme is based on solving a linear program with size polynomial in and an adaptive pricing mechanism with randomized tie breaking that rounds this LP solution to a -approximation. For notation purposes here and in Section 2.5, let be the optimal assignment of LP, and be the optimal assignment of LP for the buyers .
Consider the linear program of the optimal online policy (LP) and the ex-ante relaxation (LP). For a given constant , we turn to one of these linear programs, depending on whether the shipping capacity is small () or large (). In the former case, we pay the computational cost of solving LP and then round it exactly to a point-wise feasible online policy. In the latter case, we first reduce the large shipping capacity by a factor of to create some slack, and then solve LP with this reduced shipping capacity (which has polynomial size). We then round the LP solution exactly by an adaptive pricing with randomized tie breaking policy. The hope is the resulting online policy respects all the constraints of the production constrained Bayesian selection problem with high probability because:
For every type , the policy respects the corresponding production constraints point-wise.
The policy respects the reduced large shipping capacity in expectation.
Moreover, we show that the total number of served buyers concentrates because of the negative dependency of the selection rule.
Finally, due to the available slack in the large shipping capacity and because of the mentioned concentration, this capacity is exhausted with small probability.
More precisely, we run the following algorithm (Algorithm 1).
Computing prices and tie-breaking probabilities.
Given , the proof of Proposition 2.1 in Appendix A (sketched in Section 2.3) gives a recipe to find and efficiently, so that the corresponding adaptive pricing with randomized tie-breaking policy maintains the same expected marginal allocation as the optimal online policy for every buyer and state , while having at least the same expected social-welfare.
For the case of large shipping capacity, we apply exactly the same argument for each sub-problem separately. Given for , we can efficiently find prices and probabilities , so that the corresponding adaptive pricing with randomized tie-breaking for buyers with type maintains the same expected marginal allocation for every and , while having at least the same expected social-welfare from serving each individual buyer of type .
Feasibility, running time and social-welfare.
Clearly, Algorithm 1 is a feasible online policy in the case of small shipping capacity (Proposition 2.1). In the case of large shipping capacity, as for any forbidden neighboring state of sub-problem , the same argument shows that it respects all of the production constraints of each type . The policy also never violates the shipping capacity by construction, and hence is feasible. In terms of running time, the linear program of optimal online policy LP has at most states, as no more than requests can be accepted. By setting , Algorithm 1 has running time . We further show that its expected welfare is at least fraction of the expected welfare of the optimal online policy (whose proof is deferred to Section 2.4.1, Proposition 2.6).
Theorem 2.3 (PTAS for optimal online policy).
By setting , Algorithm 1 is a -approximation for the expected social-welfare of the optimal online policy of the production constrained Bayesian selection problem, and runs in time .
2.4.1 Analysis of the algorithm (proof of Theorem 2.3)
If the shipping capacity is small, i.e. , Algorithm 1 has the optimal expected social-welfare among all the feasible online policies, because of the optimality of LP (Proposition 2.1). Next consider the ex-ante relaxation LP in the case when . By Proposition 2.2, its optimal solution is an upper bound on the social-welfare of any feasible online policy. By scaling the shipping capacity by a factor , we change the optimal value of this LP by only a multiplicative factor of at least .
As sketched before, for each type the adaptive pricing policies extract an expected value from buyer that is at least equal to the contribution of this buyer to the objective value of the ex-ante relaxation LP. However, buyer can be served by the adaptive pricing policy of type only if the large shipping capacity has not been exceeded yet. So, to bound the loss, the only thing left to prove is that the probability of this bad event is small (as small as ).
Concentration and negative cylinder dependency.
In the case when the shipping capacity is large, associate with each buyer a Bernoulli random variable indicating whether the resulting pricing policy of that buyer's type serves the buyer or not. Note that the expected sum of these indicators is at most the (reduced) shipping capacity, as LP ensures feasibility of the shipping constraint in expectation. Now, if the total count concentrates around its expectation, we will be able to bound the probability of the bad event that the shipping capacity is exceeded.
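To get a feel for the numbers, the sketch below evaluates a textbook multiplicative Chernoff bound, P[X >= (1 + delta) * mu] <= exp(-delta^2 * mu / (2 + delta)), for a count X whose mean is at most (1 - eps) * k. Both the assumption that the reduced capacity is (1 - eps) * k and the constants of this particular bound are illustrative choices made here; they are not taken from the paper, whose thresholds may differ.

```python
import math

def overflow_bound(k, eps):
    """Chernoff-type upper bound on P[X >= k] when E[X] <= (1 - eps) * k."""
    mu = (1.0 - eps) * k
    delta = eps / (1.0 - eps)  # chosen so that (1 + delta) * mu == k
    return math.exp(-delta * delta * mu / (2.0 + delta))

def capacity_threshold(eps):
    """Smallest integer capacity k for which overflow_bound(k, eps) <= eps.

    The bound simplifies to exp(-eps^2 * k / (2 - eps)), so it drops below eps
    once k >= (2 - eps) * ln(1 / eps) / eps^2.
    """
    return math.ceil((2.0 - eps) / eps ** 2 * math.log(1.0 / eps))

k = capacity_threshold(0.1)
print(k, overflow_bound(k, 0.1))
```

Under these assumptions, a capacity of order (1/eps^2) log(1/eps) already makes the overflow probability at most eps, which is the kind of dichotomy between small and large capacities the algorithm relies on.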
Clearly, the indicator random variables of buyers of different types are mutually independent, as we run a separate adaptive pricing policy for each type. However, the indicator random variables of the same type are not mutually independent. So, for proving the required concentration, the Chernoff bound cannot be applied immediately. Yet, we can use a certain variant of negative dependency instead of independence (Dubhashi and Ranjan, 1998; Pemantle, 2000; Chekuri et al., 2009), known as the negative cylinder dependency, and still prove Chernoff-style concentration bounds.
Definition 2.5 (Negative cylinder dependency, Pemantle (2000)).
Random variables satisfy the negative cylinder property if and only if for every ,
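For intuition, the property can be checked by brute force on small joint distributions. The sketch below assumes the two-sided form of the definition commonly stated in the negative-dependence literature: for every subset S of indices, the expectation of the product of the X_i over S is at most the product of their expectations, and the same holds for the variables 1 - X_i. Whether the paper states both sides is an assumption; the upper-tail argument only uses the first. The function name and the dictionary encoding of the joint distribution are hypothetical.

```python
from itertools import combinations

def satisfies_negative_cylinder(pmf, tol=1e-12):
    """Check negative cylinder dependency for 0/1 random variables.

    pmf: dict mapping 0/1 tuples of equal length to their probabilities.
    """
    n = len(next(iter(pmf)))
    # Marginal probability that each coordinate equals 1.
    ones = [sum(p for x, p in pmf.items() if x[i] == 1) for i in range(n)]
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            all_one = sum(p for x, p in pmf.items() if all(x[i] == 1 for i in subset))
            all_zero = sum(p for x, p in pmf.items() if all(x[i] == 0 for i in subset))
            prod_one = prod_zero = 1.0
            for i in subset:
                prod_one *= ones[i]
                prod_zero *= 1.0 - ones[i]
            if all_one > prod_one + tol or all_zero > prod_zero + tol:
                return False
    return True

# Selling a single unit to exactly one of two buyers is negatively dependent.
print(satisfies_negative_cylinder({(1, 0): 0.5, (0, 1): 0.5}))  # True
# Perfectly correlated acceptances are not.
print(satisfies_negative_cylinder({(1, 1): 0.5, (0, 0): 0.5}))  # False
```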
We now prove the following proposition, assuming the required negative cylinder dependency.
If for every request type , the random variables satisfy the negative cylinder dependency and if , then the probability that the shipping capacity is exhausted is .
Applying the negative cylinder dependency property, for every we have:
where in (1) we use the negative cylinder property among for each type , and the mutual independence of the Bernoulli variables across different types. Now, as Eq. 3 is essentially the first step of the proof of Chernoff bound, it ensures that we have a Chernoff-type concentration (Dubhashi and Ranjan, 1998). Therefore, as , we have:
2.5 Negative cylinder dependency for optimal online policy
Fix a product type . For notation simplicity, re-index as , where and . To show satisfy the negative cylinder dependency, it is enough to show that:
To see this, note that . Hence if Eq. 4 holds, we have
which shows the negative cylinder dependence. Given this simple observation, one needs to show that for the adaptive pricing with randomized tie-breaking used in Algorithm 1, the probability of accepting a buy request at time can only decrease conditioned on more requests being accepted in the past. Another neat observation is that given as the optimal soft shipping capacity that needs to hold only in expectation, for is indeed the optimal solution of the following linear program.
Therefore, it is enough to show the same property holds for another adaptive pricing with randomized tie-breaking algorithm that is used for exactly rounding LP-sub, as both of these rounding algorithms have the same allocation distribution for the buyers of type .
For simplicity of the proofs in this section, we assume that the valuations are non-atomic. (Footnote 7: For the case of atomic distributions, one can think of dispersing each value distribution first to get non-atomic distributions, and then proving negative cylinder dependency for any small dispersion. Then the negative cylinder dependency for the original atomic distribution can be deduced from negative cylinder dependency of the dispersed distribution for small enough dispersion.) Note that under this assumption, there will be no need for randomized tie breaking, and indeed our rounding algorithm will be a pure adaptive pricing. We now prove our claim in two steps.
Step 1: by using LP duality, we show that the optimal online policy of the sub-problem of type with an extra soft shipping constraint is indeed the optimal online policy for an instance that has no soft constraint and all the values are shifted by some number , i.e. .
Step 2: we show that the optimal online policy of a particular sub-problem , whether value distributions have negative points in their support or not, satisfies the negative cylinder dependency.
Putting the two pieces together, we obtain the desired negative cylinder dependency among the allocation indicators of each type. In the remainder of this section, we prove the two steps.
Proof of Step 1.
Let be the optimum dual solution. By dropping the constant terms and rearranging we get the following equivalent program for the optimal solution:
This shows that the optimal online policy respecting the soft shipping constraint is equivalent to the optimal online policy for an instance of the problem where all the values are shifted by some constant . ∎
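The effect of the shift can be seen numerically on a single-type sub-problem. The sketch below is an illustration under assumptions made here, not the paper's construction: values are discrete, ties are broken by not serving, the shift `lam` is supplied by the caller rather than derived from the dual, and the function name `shifted_policy` is hypothetical. It runs the polynomial-sized dynamic program with every value reduced by `lam` and reports both the shifted welfare and the expected number of served buyers.

```python
def shifted_policy(buyers, lam):
    """Single-type DP with values shifted by lam.

    buyers: list of (units_available, [(value, probability), ...]) in arrival
            order, with units_available the cumulative production so far.
    Returns (shifted expected welfare, expected number of served buyers).
    """
    n = len(buyers)
    W = [[0.0] * (n + 2) for _ in range(n + 1)]  # shifted value-to-go
    N = [[0.0] * (n + 2) for _ in range(n + 1)]  # expected future sales
    for t in range(n - 1, -1, -1):
        available, dist = buyers[t]
        for c in range(t + 1):  # at most t units sold before buyer t
            skip_w, skip_n = W[t + 1][c], N[t + 1][c]
            if c >= available:  # production constraint is tight
                W[t][c], N[t][c] = skip_w, skip_n
                continue
            serve_w, serve_n = W[t + 1][c + 1], N[t + 1][c + 1]
            price = lam + skip_w - serve_w  # threshold on the original value
            w = cnt = 0.0
            for v, p in dist:
                if v > price:
                    w += p * (v - lam + serve_w)
                    cnt += p * (1.0 + serve_n)
                else:
                    w += p * skip_w
                    cnt += p * skip_n
            W[t][c], N[t][c] = w, cnt
    return W[0][0], N[0][0]

example = [(2, [(1.0, 1.0)]), (2, [(3.0, 0.5), (0.0, 0.5)])]
print(shifted_policy(example, 0.0))  # (2.5, 1.5)
print(shifted_policy(example, 2.0))  # (0.5, 0.5)
```

Raising the shift lowers the expected number of served buyers, so in this sketch one could search over `lam` to meet a given soft capacity in expectation.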
Proof of Step 2.
We only need to show that the negative cylinder dependency holds for the dynamic programming that solves each sub-problem, as the distributions are non-atomic and there is a unique deterministic optimal online policy, characterized both by the LP and the dynamic programming. Consider sub-problem . We use induction to show by serving more customers in the past, the prices for new buyers increase. Let denote the total number of products of type that have been sold up to the arrival of buyer . Note that the algorithm only needs to decide whether buyer should be served.
Let denote the maximum total expected welfare that an online policy can obtain from time to , assuming that it starts from state . Also let denote the set of production checkpoints of type that occur at or after time . Using the Bellman equations we have
As the base of our induction, we know that if we serve the last buyer, the probability that we serve any other buyers does not increase. Now assume while serving buyer , we have
Note that this shows the price offered to buyer increases if we serve more buyers before buyer . When buyer arrives, we need to show
which is equivalent to
Note that this property is linear in the terms involved. So it is enough to assume that the value is deterministic first and prove the above inequality. Then by linearity of expectation, the inequality would hold in the general case.
Note that for the case where , the inequality holds trivially because we assume for any non-negative integer such that . According to our induction hypothesis, if is updated, then the other two variables are updated as well. More precisely, if , then
In a similar way, if , then
Considering these relations, we have three different cases. (i) none of these variables are updated. In this case, Eq. 6 turns into Eq. 5 which holds according to our induction hypothesis. (ii) all of these variables are updated. In this case, Eq. 6 turns into
which holds again according to the induction hypothesis. Finally, (iii) the case where is updated and is not updated. In this case, we can write Eq. 6 as
and this always holds because for any two values , . ∎
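The monotonicity established in Step 2 can be checked numerically on a simplified instance. The sketch below is an illustration under assumptions that are not in the paper: a single product type, one fixed capacity with no production checkpoints and no shipping constraint, and discrete value distributions. The names `reserve_prices`, `dists` and `capacity` are hypothetical. It solves the Bellman recursion backwards and returns the threshold price offered to each buyer in each state.

```python
from fractions import Fraction


def reserve_prices(dists, capacity):
    """Backward induction for a simplified single-type instance.

    dists[t] is a list of (value, probability) pairs for buyer t.
    Returns (price, V) where price[t][s] is the threshold offered to
    buyer t when s units remain, and V[t][s] is the optimal expected
    welfare from buyer t onward with s units remaining.
    """
    n = len(dists)
    V = [[Fraction(0)] * (capacity + 1) for _ in range(n + 1)]
    price = [[None] * (capacity + 1) for _ in range(n)]
    for t in range(n - 1, -1, -1):
        for s in range(1, capacity + 1):
            # Serving buyer t forgoes the marginal value of one unit later on.
            p = V[t + 1][s] - V[t + 1][s - 1]
            price[t][s] = p
            # Bellman equation: E[max(v + V[t+1][s-1], V[t+1][s])].
            V[t][s] = V[t + 1][s] + sum(
                Fraction(q) * max(Fraction(v) - p, 0) for v, q in dists[t]
            )
    return price, V
```

In this simplified setting, fewer remaining units (that is, more buyers served in the past) should never lower the price, which is the direction of the inequality used in the induction.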
3 Generalization to the Laminar Matroid Bayesian Selection
Another approach to the production constrained Bayesian selection problem is to view it as follows: for each product type , nested subsets are given, where has capacity and each subset has capacity . A subset of requests in is considered feasible if, from each subset or , no more requests are selected than its capacity allows. This alternative view suggests looking at the problem as a special case of a combinatorially richer stochastic online optimization problem known as the laminar matroid Bayesian selection (Kleinberg and Weinberg, 2012; Feldman et al., 2016).
3.1 Problem description
In a laminar matroid Bayesian selection problem, we have a set of elements and a laminar family of subsets of these elements termed as the bins, i.e. a collection of subsets where for every either , or . It is often helpful to represent the laminar family as a tree whose internal nodes are the bins and the leaves are the elements. The elements arrive over time in an arbitrary but known order, and reveal their values. We further assume values are drawn independently from known heterogeneous distributions. Moreover, each bin has capacity . The goal is to design an online policy/algorithm for selecting a subset of elements with maximum possible total expected value, so that no more than elements are picked from each bin . Figure 2 describes how this problem generalizes the production constrained Bayesian selection problem of Section 2.
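To make the feasibility structure concrete, the following minimal sketch checks whether a family of bins is laminar and whether a selected set respects every bin capacity. It is only an illustration: the helper names `is_laminar` and `is_feasible`, and the representation of bins as a dictionary from bin name to element set, are assumptions and do not come from the paper.

```python
def is_laminar(bins):
    """True if every two bins are either disjoint or nested."""
    sets = [frozenset(b) for b in bins]
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def is_feasible(selected, bins, capacity):
    """True if no bin contains more selected elements than its capacity.

    bins maps a bin name to its set of elements; capacity maps the same
    bin names to non-negative integers.
    """
    chosen = set(selected)
    return all(len(chosen & set(b)) <= capacity[name] for name, b in bins.items())
```

For example, with bins `{"root": {1, 2, 3, 4}, "A": {1, 2}, "B": {3, 4}}` and capacities `{"root": 2, "A": 1, "B": 2}`, the selection `{1, 3}` is feasible while `{1, 2}` violates the capacity of bin `A`.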
We consider generalizing the machinery developed in Section 2 to this problem. Again, similar to Section 2, we deviate from the prophet inequality framework that compares various policies against the omniscient prophet or equivalently the optimum offline. Our main result in this section is a PTAS for the optimal online policy, when the depth of the family (or equivalently the height of the tree) is constant. We also show that our final algorithm has the form of an adaptive pricing with randomized tie-breaking.
We further generalize our setting by replacing each singleton element with a laminar Bayesian selection sub-problem. As a simple corollary, we show our result holds if the optimal online policy for each sub-problem can be implemented efficiently and satisfies the negative cylinder dependency (Definition 2.5). Notably, the production constrained Bayesian selection is subsumed under this corollary as a special case.
3.2 Sketch of our approach
Our general strategy to solve this problem is to first divide the internal nodes of the laminar tree into large and small bins, similar to what we did for the shipping constraint in the previous section. To do so, we start from the root, and mark each node as either large or small. Once a bin is marked as small, all of its descendants will also be marked as small. Next, inspired by our approach in Section 2, we proceed with these steps:
Finding a linear programming formulation for characterizing feasible online policies in each small bin. As in Section 2, we use a state update rule (similar to the dynamic programming update) to help us with this characterization.
Writing a hierarchy of linear programming relaxations, where the relaxations are parameterized by how we divide the bins into small and large. For a given marking, the corresponding relaxation should select a point-wise feasible online policy in each small bin, and impose global capacity constraints that hold only in expectation for large bins.
Using a particular marking algorithm to select a polynomially solvable linear programming relaxation in the above mentioned hierarchy.
Using an adaptive pricing with randomized tie-breaking to round this LP relaxation.
Using a concentration argument to show large capacities are only violated with small probabilities.
We next elaborate more on each of the bullets above, while highlighting the new technical pieces we need to add to the techniques in Section 2.
3.3 A hierarchy of linear programming relaxations for general laminar matroids
We define a family of linear programming relaxations, parametrized by different markings of bins into small and large. Given a particular marking, as described in Section 3.2, let be the set of large bins and be the set of maximal small bins.
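The top-down marking can be sketched as a tree traversal. The rule used below to decide whether a bin is large (capacity at least a fixed `threshold`) is purely an assumption for illustration; this section does not specify the marking rule, and the names `mark_bins`, `children` and `threshold` are hypothetical. The sketch only captures the structural property that, once a bin is marked small, none of its descendants is examined, so the small bins it returns are exactly the maximal ones.

```python
def mark_bins(children, capacity, root, threshold):
    """Mark bins top-down; return (large bins, maximal small bins).

    children maps a bin to the list of its child bins (elements are not
    included); capacity maps each bin to its capacity. The threshold
    test is a hypothetical stand-in for the actual marking rule.
    """
    large, maximal_small = set(), set()
    stack = [root]
    while stack:
        b = stack.pop()
        if capacity[b] >= threshold:
            large.add(b)
            stack.extend(children.get(b, []))
        else:
            # All descendants of a small bin are small, so stop here.
            maximal_small.add(b)
    return large, maximal_small
```

Each choice of marking yields one relaxation in the hierarchy, so a function of this shape would be the only piece that changes between relaxations.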
For an instance of the laminar matroid Bayesian selection problem with laminar family , an online policy is said to be at the state upon the arrival of an element if is the vector of remaining capacities of all of the bins in . Given this definition, the state is sufficient information to find the optimal online policy from time to , for both the allocation and the expected social-welfare, using an (exponential-time) dynamic program à la Section 2.2.
To avoid exponentially many states in our hierarchy of LP relaxations, we use the same state-space structure, but we only track the local state of each maximal small bin in separately. In other words, we can think of each maximal small bin as a separate laminar matroid Bayesian selection sub-problem with laminar family , where each arriving element is only in one of the sub-problems (because subsets in form a partition of the set of all elements). Now, if the arriving element at time belongs to , the linear program only needs to keep track of the change in the local state of the sub-problem , i.e. the vector of remaining capacities of the bins in .
For every small bin , define to be the set of all feasible local states of the sub-problem , i.e. the set of all possible remaining capacity vectors of the bins in , where each vector can be reached by an online policy for this sub-problem that respects all the capacities in . Note that , because no feasible online policy for the sub-problem can pick more than elements. We now can write a linear program with the following variables and constraints:
We add allocation variables , marginal allocation variables and state variables as before, in a similar fashion to the ex-ante relaxation linear program (LP) in Section 2. For the variables and , assuming the element arriving at time belongs to the maximal small bin , the vector represents the local state of right before arrival of this element.
We add two categories of linear constraints to our LP relaxations:
Global ex-ante constraints: these constraints ensure that the capacities of all large bins are respected in expectation, i.e.
Local online feasibility constraints: for every bin , similar to LP, we can define a polytope of feasible online policies that ensures a feasible assignment of the linear program is online implementable by a feasible policy. So, these constraints will be:
Polytope of feasible online policies.
where is a binary vector denoting which bins in will be used if we pick the element arriving at time (i.e. for every , if and only if ), and is the set of all forbidden neighboring states of sub-problem , i.e.
Given these variables and constraints, the LP relaxation corresponding to the marking (we show later why it is actually a relaxation) can be written as follows.
Again, it is easy to see that any feasible online policy for the sub-problem is represented by a feasible point inside the polytope . As every online policy for the laminar matroid Bayesian selection problem induces a feasible online policy for each sub-problem (by simulating the randomness of the policy and values outside of ), and because it respects all the large bin capacity constraints point-wise, we have the following proposition (formal proof is deferred to Appendix B).
For any marking of the laminar tree, LP is a relaxation of the optimal online policy for maximizing expected social-welfare in the laminar matroid Bayesian selection problem.
3.4 Exact rounding through adaptive pricing with randomized tie-breaking
One can also use exactly the same technique as in Section 2 to develop exact rounding algorithms for the induced optimal solution of LP inside each maximal small bin . Formally speaking, given a particular marking , we show there exists a family of adaptive pricing with randomized tie-breaking policies , where each of these pricing policies exactly rounds the solution induced by the optimal solution of (LP) in each small bin .
The above rounding schemes can then be combined with each other, resulting in an online policy that is point-wise feasible inside each small bin and only ex-ante feasible inside each large bin, i.e. it only respects the large bin capacity constraints in expectation. The combining procedure is simple: once an element that belongs to arrives at time , the algorithm looks at the state of the bin (suppose it is ), and posts the price with tie-breaking probability . The element is then accepted w.p. 1 if , w.p. 0 if , and w.p. if .
For the laminar Bayesian online selection problem, given any marking of the laminar tree, there exists an adaptive pricing policy with randomized tie breaking whose expected welfare is equal to the optimal solution of the linear program (LP). Moreover, the resulting policy is feasible inside each small bin and ex-ante feasible inside each large bin.
We first employ a simple randomized rounding to show how to exactly round the LP, and then use a simple coupling argument to argue why the optimal solution should have the form of a thresholding with randomized tie-breaking. Details of the proof are similar to those of the proof of Proposition 2.1 and hence are omitted for brevity. ∎
By putting all the pieces together, we run the following algorithm given a particular marking.
Once an element arrives, the algorithm identifies the maximal small bin that contains , and finds the current state in this bin. It then posts the price with randomized tie-breaking probability .
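The acceptance rule of such a pricing policy can be sketched as follows. This is a minimal illustration under stated assumptions: the price and tie-breaking probability are read from a precomputed lookup `table` keyed by arrival time and local state, the local state is updated by a caller-supplied `update` function, and the large-bin capacity check is omitted. All names (`accept`, `run_policy`, `small_bin_of`, `table`, `update`) are hypothetical and not taken from the paper.

```python
import random


def accept(value, price, tie_prob, rng=random):
    """Posted price with randomized tie-breaking."""
    if value > price:
        return True
    if value < price:
        return False
    # Ties are accepted with probability tie_prob.
    return rng.random() < tie_prob


def run_policy(arrivals, small_bin_of, table, initial_state, update, rng=random):
    """Apply the per-bin pricing rule to a sequence of (element, value) pairs.

    table[(t, state)] gives (price, tie_prob) for the element arriving at
    time t when its maximal small bin is in the given local state.
    """
    state = dict(initial_state)
    picked = []
    for t, (element, value) in enumerate(arrivals):
        b = small_bin_of[element]
        price, tie_prob = table[(t, state[b])]
        if accept(value, price, tie_prob, rng):
            picked.append(element)
            state[b] = update(state[b], element)
    return picked
```

A price of `float("inf")` in the table encodes a state in which the element cannot be served.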
3.5 Marking and concentration for constant-depth laminar
In this section, we want to show that our rounding algorithm (Algorithm 2) achieves a fraction of the expected social-welfare obtained by the optimal online policy. Note that (LP) is a relaxation, and scaling down the large capacities decreases the benchmark by at most a factor .
Once an element arrives at time , consider all large bins that lie along the path from this element to the root of the laminar tree. By construction, the expected value extracted from this element by the pricing policy would be exactly equal to the contribution of this element to the objective value of (LP) (after scaling down the capacities), but only if the element is not ignored; an element will be ignored, i.e. offered a price of infinity, if one of the mentioned large capacities is exceeded. Therefore, to show that the loss is bounded by a fraction of the total, we only need to show that the bad event of an element being ignored happens with a probability that is bounded by .
In order to bound the above probability, we need a concentration bound for the random variable corresponding to the total number of elements picked in each large bin. Previously in Section 2, we could show negative (cylinder) dependency among selection indicators of the optimal online policy for a chain of nested bins (with a particular ordering of the elements), which gave us the required concentration.
Nevertheless, negative dependency does not hold for general laminar matroids with arbitrary arrival order of elements. To see this, consider the laminar matroid depicted in Fig. 3 with elements arriving one by one from to . Let and