Online posted pricing problems are one of the canonical examples in online decision-making and optimal control. The basic model consists of a fixed supply of non-replenishable items; buyers (demand) arrive in an online fashion over a fixed time interval, and the platform sets prices to maximize some objective such as social surplus (welfare) or revenue. Another variant of this setting is found in internet advertising, where the number of advertisements (supply) is assumed to be fixed (for example, based on contracts between the publisher and advertisers), while keywords/impressions (demand) arrive online, and are matched to ads via some policy. The demand is typically assumed to obey some underlying random process, which allows the problem to be cast as a Markov Decision Process (MDP); however, in many settings, such a formulation suffers from a “curse of dimensionality”, making it infeasible to solve optimally.
An important idea for circumventing the computational intractability of optimal pricing is that of prophet inequalities: heuristics with performance guarantees with respect to the optimal policy in hindsight (i.e., the performance of a prophet with full information of future arrivals). The simplest prophet inequality has its origins in the statistics community: given a single item and arriving buyers with values drawn from known distributions, there is a pricing scheme using only a single price that extracts at least half the social surplus earned by the prophet (moreover, this is tight). More recently, there has been a long line of work generalizing this setting to incorporate multiple (possibly non-identical) items, as well as combinatorial buyer valuations [17, 6, 22, 13, 10, 25, 8, 1, 11].
The aim of our work is to develop a theory of prophet inequalities for settings with uncertainty in future supply. This is a natural extension of the basic posted-price setting, and indeed special cases of our framework have been considered before [27, 17] (in the context of optimal secretary problems with a random “freeze” on hiring). What makes these problems of greater relevance today is the rise of online ‘sharing economy’ marketplaces, such as those for transportation (Lyft, Uber), labor (Taskrabbit, Upwork), lodging (Airbnb), medical services (PlushCare), etc. The novelty in such marketplaces arises because of their two-sided nature: in addition to buyers who arrive online, the supply is now controlled by “sellers” who can arrive and depart in an online fashion. For example, in the case of ridesharing/lodging platforms, the units of supply (empty vehicles/vacant listings) arrive over time, and have some patience interval after which they abandon the system (get matched to rides on other platforms/remove their listings). Supply uncertainty also arises in other settings, for instance, if items are perishable and last for a priori random amounts of time. Our work aims to understand the design of pricing policies for such settings, and characterize how the resulting prophet inequalities depend on the characteristics of the supply uncertainty.
We introduce “supply uncertainty” into the basic prophet inequality setting as follows: There are items present initially; these do not last until the end of the buyer arrivals, but instead depart after an a priori unknown amount of time. Formally, we assume each item samples a horizon from a distribution , at which time it departs. We assume the horizon lengths for items are mutually independent, and also independent of the valuation distribution of the buyers. Note, though, that the items can have different horizon distributions. We denote the maximum possible horizon length for any item as .
On the demand side, we assume there is an infinite stream of unit-demand buyers arriving online, where the valuation of the -th arriving buyer is a random variable drawn i.i.d. from a distribution . From the perspective of a buyer, all items are interchangeable, and hence being matched to any item that has not yet departed yields value . Note that assuming an infinite stream of buyers is without loss of generality, because we can encode any upper bound on the number of buyers in the horizon distributions.
The algorithm designer knows the horizon distribution for each item, and the buyer value distribution , but not the realized horizons for each item (until the item actually departs), or the value for any buyer. The goal is to design an online pricing scheme that competes with a prophet that knows the realized horizons of each item and the valuation sequence of buyers, and extracts full social surplus (or welfare).
The main outcome of the standard prophet inequality is that there are constant-competitive algorithms for maximizing welfare, even when buyers are heterogeneous and arrive in arbitrary order. This however turns out to be impossible in the presence of item horizons without additional assumptions. First, even with i.i.d. horizons, achieving a constant factor turns out to be impossible for general horizon distributions (cf. Theorem 4); thus to make progress, we need more structure on the horizons. One natural assumption is that each item is more and more likely to depart as time goes on, which can be formalized as follows.
A horizon distribution satisfies the monotone-hazard-rate (MHR) condition if:
Several distributions satisfy the MHR condition, including uniform, geometric, deterministic, and Poisson; note also that truncating an MHR distribution preserves the condition.
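The MHR condition can be checked numerically for a finite-support horizon distribution. The sketch below assumes the usual discrete-time form of the condition, namely that the hazard rate Pr[H = t | H ≥ t] is non-decreasing in t; the function names and the example parameters are illustrative and not taken from the paper.

```python
import math


def hazard_rates(pmf):
    """Hazard rates Pr[H = t | H >= t], with pmf[0] = Pr[H = 1], pmf[1] = Pr[H = 2], ..."""
    tails = []
    running = 0.0
    for p in reversed(pmf):  # accumulate Pr[H >= t] from the far end for accuracy
        running += p
        tails.append(running)
    tails.reverse()
    return [p / s for p, s in zip(pmf, tails) if s > 0]


def is_mhr(pmf, tol=1e-9):
    """True if the hazard rate is non-decreasing in t (up to a numerical tolerance)."""
    h = hazard_rates(pmf)
    return all(later >= earlier - tol for earlier, later in zip(h, h[1:]))


T = 30   # assumed cap on the horizon, for illustration only
r = 0.2  # assumed per-step departure probability of the geometric example
# Geometric horizon capped at T: the remaining tail mass is placed on T.
geometric = [r * (1 - r) ** (t - 1) for t in range(1, T)] + [(1 - r) ** (T - 1)]
uniform = [1.0 / T] * T
deterministic = [0.0] * (T - 1) + [1.0]
# Poisson(3) shifted by one so that the support starts at 1, cut after 40 terms.
poisson = [math.exp(-3.0) * 3.0 ** t / math.factorial(t) for t in range(40)]
# Hazard rates 0.5, 0, 1: the rate drops, so this distribution is not MHR.
two_point = [0.5, 0.0, 0.5]
```

The capped geometric example has a constant hazard rate until the cap, where the rate jumps to one, which is consistent with the remark that truncation preserves the condition.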
Finally, even with MHR horizons, buyer heterogeneity is a barrier for obtaining a constant-competitive algorithm, as demonstrated by the following example, with deterministic valuations and known order of arrivals.
Given a single item with horizon following a geometric distribution with parameter , consider a sequence of buyers with for . The expected value of the prophet is , while any algorithm can only achieve a constant value in expectation.
1.2 Main Result: Prophet Inequalities under Uncertain Supply
The above discussion motivates us to study settings with i.i.d. buyers, and items with MHR horizons. Our main result is that these two assumptions are sufficient to obtain a constant-competitive approximation to the prophet welfare. In particular, our main technical result is the following theorem, which we prove in Section 2.
There is a constant-competitive online policy for social surplus for any items with independent and possibly non-identical MHR horizon distributions, and unit-demand buyers arriving with i.i.d. valuations.
Though the complete algorithm is somewhat involved, at a high level, it is based on a simple underlying idea: to be constant-competitive against the prophet, we need to choose prices so as to balance the rate of matches and departures. Achieving this in the general case is non-trivial, and requires some new technical ideas. However, for the special case of a single item, balancing can be achieved via a simple fixed pricing scheme. In Section 3, we use this to obtain the following tight result for the setting (this also serves as a primitive for our overall algorithm):
There is a fixed pricing scheme for a single item with an MHR horizon distribution with mean that has competitive ratio . Further, this bound is tight for the geometric horizon distribution with mean .
Intuitively, the factor of two in the above theorem corresponds to the prophet considering matching and departures as the same, which an algorithm cannot do. The surprising aspect is that this simple policy is worst-case optimal within the class of instances with MHR horizons — this is in contrast to deterministic horizons, where fixed pricing is known to be suboptimal for the special case of one item with known (deterministic) horizon and i.i.d. buyers [18, 11].
1.3 Lower Bounds
We complement our positive results by showing several lower bounds that establish their tightness. As mentioned above, in Section 3, we show a (tight) lower bound of for items with MHR horizons. Our main lower bounds in Section 4 generalize this to items.
For the multi-item setting with i.i.d. geometric horizons:
For any number of items, there is a lower bound of on the competitive ratio of any dynamic pricing scheme; in the limit when the number of items goes to infinity, this improves to .
No fixed pricing scheme can be -competitive where is the number of items.
The above theorem implies that the MHR horizon setting, even with i.i.d. horizons, is significantly different from the setting with multiple items and a single deterministic horizon (where fixed pricing extracts -fraction of surplus). Put differently, the lower bound emphasizes that even with i.i.d. horizons, to obtain a constant-competitive algorithm, it is not sufficient to replace the horizon distributions by their expectations and use standard prophet inequalities: the stochastic nature of the horizons allows for significant deviations in the order of departures of the items, and a policy that knows this ordering can potentially extract much more welfare. Given this, it is quite surprising that a simple dynamic pricing scheme achieves a constant approximation.
Finally, we consider the general case where there is no restriction on the horizon distribution. In this setting, the presence of supply uncertainty severely limits the performance of any non-anticipatory dynamic pricing scheme in comparison to the omniscient prophet. In particular, we show that for any number of items and i.i.d. buyer valuations, the ratio between the welfare of any algorithm and the prophet grows with the horizon, even if the algorithm knows the realized valuations.
For any items, there exists a family of instances such that the prophet has welfare -factor larger than any online policy, even if the policy knows all the realized values, but not the realized horizons. Here, .
1.4 Technical Highlights
At a high level, we achieve our results via a conceptually simple and natural class of balancing policies that generalizes policies for the deterministic-horizon case:
Balancing Policy. Balance the rate at which buyers are accepted to the rate at which items depart the system because their horizon is reached.
Converting this high-level description of balancing into a concrete policy requires new technical ideas. We first note the technical challenges we encounter. In the setting with deterministic identical horizons [23, 13], we can achieve constant-competitive algorithms (or even better) via a global expected value relaxation that yields a fixed pricing scheme. Indeed, such an argument can safely assume buyers are non-identical with adversarial arrival order. However, the setting with stochastic horizons is very different. First, as Example 1 shows, even for a single item with geometric horizon, there is a super-constant lower bound when buyer valuations are not identically distributed. Secondly, for multiple items, we need dynamic pricing even in the simplest settings: when horizons are i.i.d. geometric (see Theorem 3), or when they are deterministic. This precludes the use of a global one-shot analysis.
At this point, we could try using techniques from stochastic optimization, particularly stochastic matchings [7, 5] and multi-armed bandits [15, 16]. Here, the idea is to come up with a weakly coupled relaxation, say one policy per item, and devise a feasible policy by combining these. However, these algorithms crucially require the state of the system to only change via policy actions, and our problem is more similar to a restless bandit problem, where item departures cause the state of the system to change regardless of the policy actions taken. Indeed, the actual departure process itself may significantly deviate from its expected values, making it non-trivial to use a global relaxation.
This brings up our technical highlight: Instead of encoding the departure process in a fine-grained way into a relaxation, we simulate its behavior in our final policy. In more detail, we first write a weak relaxation of the prophet’s welfare separately in a sequence of stages with geometrically decreasing number of items. This only uses the expected number of items that survive in the stage, and not the identity of these items. The advantage of such a weak relaxation is that it yields a solution with nice structure: this policy non-adaptively sets a fixed price in each stage to balance the departure rate with the rate of matches. However, it is non-trivial to construct a feasible policy from this relaxation, since the relaxation decouples the allocations of the prophet across different stages, while any feasible algorithm’s allocations are clearly coupled. Indeed, the optimal feasible policy is the solution to a dynamic program with state space exponential in , and the prophet is further advantaged by knowing which items depart earlier in the future.
Surprisingly, we show that our simple relaxation is still enough to achieve a constant-competitive algorithm. We do so by simulating the departure process, that is, by choosing items for matching with the same probability that they would have departed at a future point in time. This couples the stochastic process that dictates the number of items available in the policy with that in the prophet’s upper bound, albeit with a constant-factor speedup in time. This yields a non-adaptive policy that makes its pricing decisions for the entire horizon, as well as the (randomized) sequence in which to sell the items, in advance. We believe such a policy construction that simulates the evolution of the state of the system may find further applications in the analysis of restless MDPs.
Lower Bounds from Time-Reversal.
Here, we first consider a canonical asymptotic regime where the horizon distribution is geometric with mean approaching infinity, and show that we can closely approximate the behavior of the prophet and the algorithm via an appropriate Markov chain. We then define and analyze a novel time-reversed Markov chain encoding the prophet’s behavior, which captures matching a departing item to the optimal buyer that arrived previously.
1.5 Related Work
The first prophet inequalities are due to Krengel and Sucheston [23, 24]. It was subsequently shown that there is a -competitive fixed pricing scheme that is oblivious to the order in which the buyers arrive, and that this ratio is tight in the worst case over the arrival order. Motivated by applications to online auctions, there have since been several extensions to multiple items [20, 17, 3], matching settings [2, 28], matroid constraints, and general combinatorial valuation functions [13, 25].
Our work is a generalization of the single-item setting where buyer valuations are i.i.d. and the horizon is known, to the case where the horizon is stochastic and there are multiple items. The setting with known horizons was first considered by Hill and Kertz. In this case, the optimal pricing scheme can be computed by a dynamic program, and a sequence of results [21, 1, 8] show a tight competitive ratio of for this dynamic program against the prophet. In contrast, we show that when the horizon is MHR, a simple fixed pricing scheme has optimal competitive ratio of .
A generalization of the i.i.d. setting is the recently-introduced prophet secretary problem where the buyers are not identical, but the order of arrival is a random permutation. In this case, fixed pricing is a tight -approximation [12, 11]; and a dynamic pricing scheme can beat this bound [4, 9] by a slight amount. Though our results extend to this setting, it is not the focus of our paper since the i.i.d.-valuations case is sufficient to bring out our conceptual message.
The random horizon setting has been extensively studied in the context of the classic secretary problem. When the horizon is unknown (that is, no distributional information at all), no constant-competitive algorithm is possible. In the context of prophet inequalities, the unknown-horizon setting was considered by Hajiaghayi et al., who show again that no constant-competitive algorithm is possible. We use a similar example to extend this lower bound to the case where the horizon is stochastic from a known distribution.
2 Prophet Inequality for Heterogeneous Items with MHR Horizons
In this section, we present the proof of Theorem 1. We first give an overview of our algorithm. At a high level, this scheme attempts to balance the rate that items are assigned to buyers and the rate that items naturally depart. In Section 2.1, we first introduce a way to divide the entire time horizon into disjoint stages in a way such that during the -th stage, items depart in expectation. We then bound the prophet’s welfare separately for each stage (Section 2.2) — we do so via a relaxation that ignores the identity of the items, and only captures the constraint that the expected number of matches in a stage is at most the expected number of items present at the beginning of that stage.
The key technical hurdle at this point is that when we make a matching, we do so without knowing exactly when items depart in the future. This changes the distribution of the items available in subsequent stages. To get around this, in each stage, we first simulate the future departure of items, and use this to select items available for matching in the current stage. In more detail, in Section 2.3, we split the stages alternately into even and odd stages, and develop an algorithm whose welfare approximates the welfare of the relaxed prophet from the odd stages (and by symmetry, another algorithm that approximates the welfare from the even stages).
For approximating the welfare from the odd stages, the algorithm re-divides time into a new set of stages corresponding to the odd stages under the old division (see Figure 1). We then use each new stage to approximate the welfare generated in the corresponding odd stage in the old division; to do so, we sample candidate items for matching in the current stage with the probability they would leave in the subsequent even stage under the old division. Consequently, for every item, the probability of departure during an even stage under the old division is the same as the probability of being selected for matching in the current stage. We show that this process couples the behavior of the algorithm and the benchmark, assuming the departure processes are MHR. Using concentration bounds, we show that this approach yields a constant approximation.
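The sampling step described above can be sketched as follows. This is only an illustration under stated assumptions: each item is described by a survival function S(t) = Pr[H ≥ t], the departure probability over an interval is taken conditional on the item being alive at the start of that interval, and the names `departure_prob` and `sample_candidates` are hypothetical rather than the paper's.

```python
import random


def departure_prob(survival, start, end):
    """Pr[start <= H < end | H >= start] for a survival function S(t) = Pr[H >= t]."""
    alive = survival(start)
    if alive <= 0.0:
        return 1.0  # the item has certainly departed before `start`
    return 1.0 - survival(end) / alive


def sample_candidates(survivals, start, end, rng):
    """Select each item independently with the probability that it would have
    departed during the (fictitious) interval [start, end)."""
    return [
        i
        for i, survival in enumerate(survivals)
        if rng.random() < departure_prob(survival, start, end)
    ]


def geometric_survival(r):
    """Survival function of a geometric horizon on {1, 2, ...} with departure rate r."""
    return lambda t: (1.0 - r) ** (t - 1)


def deterministic_survival(horizon):
    """Survival function of an item that departs exactly at time `horizon`."""
    return lambda t: 1.0 if t <= horizon else 0.0


rng = random.Random(0)
items = [geometric_survival(0.5), deterministic_survival(4), deterministic_survival(10)]
candidates = sample_candidates(items, start=3, end=5, rng=rng)
```

In this toy instance the item with deterministic horizon 4 is always selected for the interval [3, 5), and the item with horizon 10 never is, mirroring what their natural departures would have been.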
In addition to the above process, our algorithm needs to separately handle any stage of length 1 (i.e., any single time period where the expected number of available items reduces by at least half), as well as a final stage where the expected number of available items is constant. We show that the welfare in the length-1 stages is approximated by a blind matching algorithm which matches all incoming buyers (Section 2.4), while the welfare of the final period is approximated by an algorithm that randomly selects only one item for matching at the beginning, and discards the rest (Section 2.5). For the latter setting (i.e., for a single item setting), we present a tight -competitive fixed pricing scheme for the setting in Section 3. Finally, the overall algorithm is based on randomly choosing one of the four candidate algorithms (i.e., for approximating the prophet welfare in odd stages, even stages, short stages, and the final stage), with an appropriately chosen distribution.
2.1 Splitting Time into Stages
As a first step, we divide the time horizon into stages. The -th stage corresponds to an interval . For , we define by
Also for ; and .
We set to be the smallest non-negative integer so that , i.e., . Within the first stages, we separate stages of length from the rest. We term the stages of length at least as Long stages, and those of length as Short stages. We term the stage as the Final stage. Note that based on our choice of , the expected number of items which remain in the final stage is at most , and unless , at least items in expectation survive at one time step earlier into the final stage.
2.2 Upper Bound on Prophet’s Welfare
In this section, we develop a tractable upper bound for the prophet. Let Pro denote the optimal welfare obtainable by the prophet. We term the total welfare of Pro in the Long stages as ProLong, the total welfare in the Short stages as ProShort, and the welfare in the Final stage as ProFinal. Clearly, we have:

$$\mathrm{Pro} = \mathrm{ProLong} + \mathrm{ProShort} + \mathrm{ProFinal}.$$
We bound ProLong and ProShort separately for each stage. Let denote the welfare from stage , so that .
For , we have:
where satisfies .

Footnote 1: The existence of such is without loss of generality: Let . When there exists some such that and , we could accept all values greater than and accept with probability .
Fix a stage . Let be the expected welfare that the prophet gets from buyer , and let be the probability that buyer is matched by the prophet ().
Notice that in expectation, at most items have horizons of at least by the definition of stages. Therefore, .
Let be the CDF of the distribution . We have , since when buyer is matched with probability , the prophet cannot do better than getting the top -percentile of the distribution from the buyer. With these constraints, we write a relaxation for the welfare of the prophet during stage :
Clearly ’s should be equal in the optimal solution. Therefore,
where . Summing over the stages finishes the proof. ∎
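The per-stage relaxation can be evaluated directly for a discrete valuation distribution. The sketch below is an illustration under assumptions: `stage_len` stands for the number of buyers arriving in the stage, `expected_items` for the expected number of items whose horizon reaches the start of the stage, and all buyers are given the same matching probability, as in the argument above. Ties at the threshold are broken by fractional acceptance, in the spirit of the footnote. All names are hypothetical.

```python
def top_quantile_value(values, probs, q):
    """Maximum of E[v * x(v)] subject to E[x(v)] <= q and 0 <= x(v) <= 1.

    Probability mass is taken greedily from the highest values downward; the
    value at the threshold may be accepted only fractionally.
    """
    remaining = min(max(q, 0.0), 1.0)
    total = 0.0
    for v, p in sorted(zip(values, probs), reverse=True):
        take = min(p, remaining)
        total += v * take
        remaining -= take
        if remaining <= 0.0:
            break
    return total


def stage_upper_bound(values, probs, stage_len, expected_items):
    """Relaxed welfare of one stage: every buyer is matched with the same
    probability q, and the total match probability is at most expected_items."""
    q = min(1.0, expected_items / stage_len)
    return stage_len * top_quantile_value(values, probs, q)


values = [1.0, 2.0, 3.0, 4.0]
probs = [0.25, 0.25, 0.25, 0.25]
bound = stage_upper_bound(values, probs, stage_len=4, expected_items=2)  # 7.0
```

With four buyers and two expected items, each buyer is matched with probability 1/2, so the relaxation collects the top half of the distribution (values 3 and 4) from every buyer.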
Notice that in our upper bound for , if an item departs during stage , we allow it to be matched once in stage , once in stage , …, and once in stage . However, since the expected number of departures in each stage exponentially decreases, only a constant factor is lost comparing with the finer relaxation where we enforce the constraint that each item is only matched once across the stages. Our coarser relaxation enables a cleaner benchmark to work on.
We next bound ProFinal. Let be the optimal welfare of the prophet (from all stages) if item is the only item available in the system, i.e., the single-item setting. We consider this setting in detail in Section 3.
Let be the welfare that the prophet can get from item during the final stage. We have
where the second inequality comes from the MHR condition of : — item would depart faster if it started at time .
Summing up the items, we have:
where the three terms correspond to an upper bound on the prophet’s welfare in the Long, Short and Final stages respectively (i.e., ProLong, ProShort, and ProFinal). In the next three sections, we describe three separate algorithms, each one of which, if run independently, provides an approximation to one of the terms. Our overall algorithm is then based on randomly choosing between the three algorithms with an appropriately chosen distribution.
2.3 Approximating ProLong: The DepartureSimulation Algorithm
We first approximate the upper bound given in Lemma 2. Within this, we approximate ProLong and ProShort separately. We first focus on ProLong, since this is technically the most interesting, and postpone approximating ProShort to Section 2.4.
We approximate ProLong by Algorithm 1. We divide all the stages into alternate odd and even stages. We focus on illustrating the approximation for odd stages; the approximation for even stages is identical. We then re-divide time into stages corresponding to the original odd stages, as illustrated in Figure 1, where stands for the old stage and stands for the new stage . At each odd stage, we sample items according to their departure rates during the next (fictitious) even stage. During the new process when items become unavailable by being sampled, each item is at least as likely to survive a stage as before, since the sampling is only as frequent as the natural departures during the original even stages.
Note that we set each to be time step shorter than the corresponding and make each fictitious even stage time step longer (unless the length of is ). We do this to ensure enough items will be sampled: Because of integrality constraints, an even stage may be too short (e.g., of length ) and if so, little (or nothing if the stage has length ) can be sampled there. This is also the reason why Short stages are separately considered.
Note that Algorithm 1 can be easily modified to work with even stages instead of odd stages, and will yield the corresponding version of the theorem below with “odd” replaced by “even”. In order to show Theorem 1, we will use either the odd stages or even stages algorithm depending on which yields larger expected welfare. Note that it is entirely possible that one of these stages yields very low welfare compared to the other.
Algorithm 1 is a -approximation to the sum of over odd stages with .
We use to denote . For any odd with , let the random variable be the number of items in the set that have a horizon of at least , i.e., the end of (new) stage . We denote by in the rest of the proof.
is the sum of independent Bernoulli random variables, where the -th one denotes whether item is in and has horizon of at least . We have
where we calculate the probability that item has horizon of at least , was never selected into ’s during previous stages , and was selected into . Further simplifying it, we have
Since the MHR condition implies the item is more likely to survive in earlier time steps, we have:
Now, and . Thus,
Note that for . For , . Thus, for any . By Chernoff bound,
Now let be where is the CDF of distribution , just as in Algorithm 1. Let the random variable denote the number of buyers with valuation of at least among the next buyers. We have
If , then and with probability . In this case, Algorithm 1 gets at least in this stage. Since with probability at least , we know Algorithm 1 gets at least and thus is an -approximation during the stage.
If , then and . By Chernoff bound,
When , Algorithm 1 gets at least the benchmark during the stage. Therefore, it is an -approximation. ∎
2.4 Approximating ProShort
In this section, we deal with length- stages using Algorithm BlindMatch, that simply matches each arriving buyer to any available item.
Algorithm BlindMatch is a -approximation to .
Let , the number of length- stages. Consider the time . Since there are still at least length- stages after time , at least items in expectation have horizons of at least , by the definition of the stages. Using the Chernoff bound, the probability that at least items have horizons of at least is greater than . If this happens, the first items will be matched. Therefore, Algorithm BlindMatch is a -approximation to , completing the proof. ∎
2.5 Approximating ProFinal
We now approximate ProFinal from Lemma 3. by the definition of the stages. We run Algorithm 2. We randomly sample an item and focus on the item in our algorithm. The probability that item is sampled is proportional to . If item is sampled, we run an algorithm for the single-item setting (lines and in Algorithm 2). The single-item policy is analyzed in Section 3 where it is shown to achieve welfare at least .
Algorithm 2 is a -approximation of ProFinal in expectation.
2.6 Proof of Theorem 1
Now we are ready to prove our main theorem.
Proof of Theorem 1.
To summarize our previous discussion:
Theorem 5 yields a -approximation to , where the sum is over odd stages with .
If we replace “odd” with “even” in Theorem 5 and the corresponding algorithm, we have a -approximation over even stages with .
Theorem 6 is a -approximation to over stages with .
Theorem 7 yields a -approximation to ProFinal.
An algorithm can do one of (1) to (4) with probability and respectively, yielding a -approximation to Pro.∎
3 Prophet Inequality for Single Item with MHR Horizon
In this section, we consider the case where there is a single item, and present a proof of Theorem 2. The algorithm also serves as our approximation for ProSingle, which we use for the overall algorithm with multiple items.
We show that the following fixed-price balancing scheme is a -approximation, and this bound is tight for geometric distributions:
Pretend the item departs uniformly over time at rate , where . Choose a price s.t. the rate of acceptance of buyers matches the rate of departure of the item.
We bound the performance of this policy by using a simple linear programming upper bound on Pro that only uses expected values. Though the relaxation is simple, just as in Section 2.2, it brings out the key insight that the upper bound also behaves like a balancing scheme, except it assumes the item lasts forever when performing the matching. Surprisingly, such a simple relaxation yields the worst-case optimal bound over all MHR distributions.
Let . Then for items, there is a fixed pricing policy that is -competitive. This policy sets the price such that where .
First we find an upper bound for Pro. Let be a random variable with distribution . Consider the following LP:
Variable is the probability that a buyer with realized value is chosen by the prophet. The first constraint requires the item to be sold at most once in expectation. The second constraint says each value can be chosen only when it appears. Both of the constraints are relaxations as they should hold for any realization while the constraints are in expectation. The optimal objective is thus an upper bound for the expected value of the prophet.
Let be the Lagrange multiplier associated with the first constraint. The partial Lagrangian of the LP is:
The partial Lagrangian is decoupled for each value and is maximized when for any and otherwise. For any , this gives us an upper bound on the prophet’s welfare. Let be the value such that . If we set , we get the following upper bound for the prophet’s value:
Essentially, the prophet pretends that the horizon is infinite and it can always find a buyer with value at least . Now we look at Alg which is an algorithm with a single price . The algorithm has to also consider the event that the horizon ends before the item is matched.
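The gap between the upper bound and the single-price algorithm can be computed explicitly for a geometric horizon. The sketch below rests on several assumptions that are ours: the horizon H is geometric on {1, 2, ...} with mean μ, the item is available to the first H buyers, the balancing price is read as the price at which a buyer accepts with probability q = 1/μ, and the upper bound equals the expected value of a buyer conditioned on exceeding that price. Under these assumptions the algorithm earns the same conditional value times the probability that some buyer accepts before the item departs, so the ratio is the inverse of that probability.

```python
def sale_probability(horizon_pmf, q):
    """Pr[at least one of the first H buyers accepts], with pmf[0] = Pr[H = 1], etc.
    Each buyer accepts independently with probability q."""
    return sum(p * (1.0 - (1.0 - q) ** t) for t, p in enumerate(horizon_pmf, start=1))


def geometric_sale_probability(mu, q):
    """Closed form of the same quantity for a geometric horizon with mean mu."""
    r = 1.0 / mu
    no_sale = r * (1.0 - q) / (1.0 - (1.0 - r) * (1.0 - q))  # E[(1 - q)^H]
    return 1.0 - no_sale


def balancing_ratio_geometric(mu):
    """Upper bound divided by the fixed-price welfare when q = 1 / mu (assumed rule)."""
    q = 1.0 / mu
    return 1.0 / geometric_sale_probability(mu, q)


mu = 4.0
ratio = balancing_ratio_geometric(mu)  # equals 2 - 1/mu = 1.75 under these assumptions
```

Substituting q = 1/μ into the closed form gives a sale probability of 1/(2 − 1/μ), so in this instance the fixed price loses a factor of 2 − 1/μ against the relaxation, whatever the valuation distribution.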
Now, we show that for MHR horizons, this algorithm is -competitive. The key idea is to use second order stochastic dominance to show that the upper bound is maximized for geometric distributions with the same mean. Somewhat surprisingly, we also show in Theorem 10 that this result is tight in the sense that for geometric distributions, no online policy can do better.
For any MHR distribution with mean ,
In order to prove the above theorem, we use second-order stochastic dominance.
If distribution is second-order stochastically dominant over , and and have the same mean, then for any convex function , .
We now use second order stochastic dominance to show the following.
The geometric distribution with mean is second-order stochastically dominated by any other MHR horizon distribution with the same mean.
Let be the following convex function:
where is a positive integer. Let be the geometric distribution with mean . From Definition 2, the lemma holds if and only if for any and any MHR distribution with the same mean .
We prove this by contradiction. Let be an MHR distribution with mean which satisfies for some . The set of MHR distributions with the same tail after (the same for any ) is homeomorphic to a closed and bounded set in , which means it’s compact. The function is continuous in under -norm, so there is a maximizing among MHR distributions with the same tail after . This differs from at some . Define and . Because is MHR, ’s are decreasing. Also as otherwise the mean cannot be , and as otherwise cannot hold. Thus there is some such that and .
We show that, for a pair of small enough and , decreasing by and increasing by so that the mean is preserved increases . Let . When , we have . This implies , which means is increased. This contradicts the fact that maximizes . ∎
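The consequence of the lemma for a fixed price can be checked numerically. The sketch below uses a convex test function of our own choosing, t ↦ (1 − q)^t, which is the probability that none of the first t buyers accepts a price with acceptance probability q; it is not necessarily the function used in the proof. Since the geometric distribution is dominated, its expectation of this convex function should be the largest among MHR horizons with the same mean. The comparison distributions and parameters are illustrative assumptions.

```python
def expected_no_sale(horizon_pmf, q):
    """E[(1 - q)^H], with pmf[0] = Pr[H = 1], pmf[1] = Pr[H = 2], and so on."""
    return sum(p * (1.0 - q) ** t for t, p in enumerate(horizon_pmf, start=1))


def geometric_pmf(mu, cutoff=2000):
    """Geometric horizon on {1, 2, ...} with mean mu, cut off where the tail is negligible."""
    r = 1.0 / mu
    return [r * (1.0 - r) ** (t - 1) for t in range(1, cutoff + 1)]


mu = 5
q = 1.0 / mu
geometric = geometric_pmf(mu)
uniform = [1.0 / (2 * mu - 1)] * (2 * mu - 1)  # uniform on {1, ..., 2*mu - 1}, mean mu
deterministic = [0.0] * (mu - 1) + [1.0]       # horizon exactly mu

no_sale = {
    "geometric": expected_no_sale(geometric, q),
    "uniform": expected_no_sale(uniform, q),
    "deterministic": expected_no_sale(deterministic, q),
}
```

All three horizons have mean 5 and are MHR, and the geometric one leaves the item unsold most often, which matches the claim that it is the worst case for the fixed-price policy.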
No online algorithm is better than -competitive for items when the horizon distribution is geometric with mean .
Let be the probability that the process continues after each step. We have .
Define Alg* as the expected value of the optimal algorithm and Pro as that of the prophet. Let the valuation distribution be: with probability and with probability , . At each step, Alg* will set the price to if it expects to get more than afterwards. Otherwise it will set the price to . Randomizing over and cannot help Alg*. Also, because the geometric distribution is memoryless, Alg* will make the same decision every time, i.e., the optimal algorithm is single-threshold. We have
When and , the theorem holds because . Otherwise, we set so that Alg* is indifferent between its two options. In that case,