Prophet Secretary Through Blind Strategies

07/19/2018 ∙ by Jose Correa, et al.

In the classic prophet inequality, samples from independent random variables arrive online. A gambler who knows the distributions must decide at each point in time whether to stop and pick the current sample or to continue and lose that sample forever. The goal of the gambler is to maximize the expected value of what she picks, and the performance measure is the worst case ratio between the expected value the gambler gets and what a prophet, who sees all the realizations in advance, gets. In the late seventies, Krengel and Sucheston, and Garling (1977) established that this worst case ratio is a universal constant equal to 1/2. In the last decade prophet inequalities have resurged as an important problem due to their connections to posted price mechanisms, frequently used in online sales. A very interesting variant is the Prophet Secretary problem, in which the only difference is that the samples arrive in a uniformly random order. For this variant several algorithms achieve a constant of 1-1/e, and very recently this barrier was slightly improved. This paper analyzes strategies that set a nonincreasing sequence of thresholds to be applied at different times. The gambler stops the first time a sample surpasses the corresponding threshold. Specifically, we consider a class of strategies called blind quantile strategies. They consist of fixing a function which is used to define a sequence of thresholds once the instance is revealed. Our main result shows that they can achieve a constant of 0.669, improving upon the best known result of Azar et al. (2018), and on Beyhaghi et al. (2018) (order selection). Our proof precisely analyzes the underlying stopping time distribution, relying on Schur-convexity theory. We further prove that blind strategies cannot achieve better than 0.675. Finally, we prove that no nonadaptive algorithm for the gambler can achieve better than 0.732.


1 Introduction

Prophet-Inequalities. For fixed $n \in \mathbb{N}$, let $X_1,\dots,X_n$ be non-negative, independent random variables and $\mathcal{T}_n$ the set of stopping times associated with the filtration generated by $X_1,\dots,X_n$. A classic result of Krengel and Sucheston, and Garling [16, 17] asserts that $\mathbb{E}(\max\{X_1,\dots,X_n\}) \le 2\,\sup\{\mathbb{E}(X_t) : t \in \mathcal{T}_n\}$, and that 2 is the best possible bound. The interpretation of this result is that a gambler, who only knows the distributions of the random variables and looks at them sequentially, can select a stopping rule that guarantees her half of the value that a prophet, who knows all the realizations in advance, could get. The study of this type of inequalities, known as prophet inequalities, was initiated by Gilbert and Mosteller [11] and attracted a lot of attention in the eighties [13, 14, 15, 21]. In particular, Samuel-Cahn [21] noted that rather than looking at the set of all stopping rules one can (quite naturally) restrict attention to threshold stopping rules, in which the decision to stop depends on whether the value of the currently observed random variable is above a certain threshold (and possibly on the rest of the history). In the last decade the theory of prophet inequalities has resurged as an important problem due to its connections to posted price mechanisms (PPMs), which are frequently used in online sales (see [5] and [8]). These mechanisms work as follows. Suppose a seller has an item to sell. Consumers arrive one at a time and the seller proposes to each consumer a take-it-or-leave-it offer. The first customer accepting the offer pays that price and takes the item. This is again a stopping problem, and we refer the reader to [5, 7, 12, 19] for the connection between this stopping problem and prophet inequalities.
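As a concrete illustration of the gambler-versus-prophet comparison, the following minimal simulation sketch (our own code, not taken from any of the cited works) runs a single-threshold rule with the threshold placed at the median of $\max_i X_i$, in the spirit of Samuel-Cahn's observation that threshold rules already guarantee half of the prophet's value; the exponential instance and all names are illustrative assumptions.

# Classic prophet inequality: single threshold at the median of max_i X_i.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
scales = np.arange(1, n + 1)                 # illustrative instance: X_i ~ Exp(scale_i)

def sample(size):
    return rng.exponential(scales, size=(size, n))

T = np.median(sample(trials).max(axis=1))    # threshold: median of the maximum

X = sample(trials)
above = X > T
stopped = above.any(axis=1)
first = above.argmax(axis=1)                 # first variable exceeding T, in arrival order
gambler = np.where(stopped, X[np.arange(trials), first], 0.0)

print("gambler / prophet ≈", gambler.mean() / X.max(axis=1).mean())   # provably at least 1/2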

Although the situation for the standard prophet inequality just described is well understood, there are variants of the problem, which are particularly relevant given the connection to PPMs, for which the situation is very different. In what follows we describe three important variants that are connected to each other and constitute the main focus of this paper.

  • Order selection. In this version the gambler is allowed to select the order in which she examines the random variables. For this version, [5] improved the bound of $1/2$ (of the standard prophet inequality) to $1-1/e$. This bound remained the best known for quite some time until Azar et al. [3] improved it to $1-1/e+1/400$. Interestingly, the bound of Azar et al. actually applies to the random order case described below. Very recently, Beyhaghi et al. [4] used order selection to further improve the bound to $1-1/e+1/30$.

  • Prophet secretary (or random order). In this version the random variables are shown to the gambler in uniformly random order, as in the secretary problem. This version was first studied by Esfandiari et al. [9], who found a bound of $1-1/e$. Their algorithm defines a nonincreasing sequence of thresholds $\tau_1 \ge \dots \ge \tau_n$ that only depend on the expectation of the maximum of the $X_i$ and on $n$. The gambler at time step $i$ stops if the value of $X_{\sigma(i)}$ (the variable shown at step $i$) surpasses $\tau_i$. Later, Correa et al. [6] proved that the same factor of $1-1/e$ can be obtained with a personalized but nonadaptive sequence of thresholds, that is, thresholds $\tau_1,\dots,\tau_n$ such that whenever variable $X_i$ is shown the gambler stops if its value is above $\tau_i$. In recent work, Ehsani et al. [10] showed that the bound of $1-1/e$ can be achieved using a single threshold (having to randomize to break ties in some situations). This result appears to be surprising since, without the ability to break ties at random, $1/2$ is the best possible constant for single threshold strategies, and this insight turns out to be the starting point of our work. Shortly after the work of Ehsani et al., Azar et al. [3] improved the bound to $1-1/e+1/400$ through an algorithm that relies on some subtle case analysis.

  • IID Prophet inequality. Finally, we mention the case when the random variables are identically distributed. Here, the constant $1/2$ can also be improved. Hill and Kertz [13] provided a family of "bad" instances from which Kertz [14] proved that the largest possible bound one could expect is $1/\beta^* \approx 0.745$, where $\beta^*$ is the unique solution to $\int_0^1 \frac{dy}{y(1-\ln y)+\beta-1} = 1$. Quite surprisingly, this upper bound is still the best known upper bound for the two variants above. Regarding algorithms, Hill and Kertz also proved a bound of $1-1/e$, which was improved by Abolhassani et al. [1] to $0.738$. Finally, Correa et al. [6] proved that $0.745$ is a tight value. To this end they exhibit a quantile strategy for the gambler in which some quantiles $q_1,\dots,q_n$, that only depend on $n$ (and not on the distribution), are precomputed and then translated into thresholds, so that if the gambler gets to step $i$, she will stop with probability $q_i$ (a minimal sketch of such a quantile strategy, with illustrative quantiles, appears right after this list).
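To make the quantile-to-threshold translation concrete, here is a minimal sketch for the i.i.d. case (our own illustration). The acceptance probabilities q below are placeholders chosen only for demonstration, not the optimal quantiles of Correa et al. [6], and the exponential distribution is an arbitrary choice that makes the thresholds available in closed form.

# Sketch of an i.i.d. quantile strategy: at step i the gambler stops with probability q_i,
# implemented by using the (1 - q_i)-quantile of F as the threshold at step i.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 20, 100_000

q = np.linspace(0.05, 1.0, n)        # placeholder quantiles, nondecreasing in i
thresholds = -np.log(q)              # for Exp(1): P(X > t) = e^{-t} = q_i  =>  t = -ln q_i

X = rng.exponential(1.0, size=(trials, n))
accept = X > thresholds              # step-dependent thresholds via broadcasting
stops = accept.any(axis=1)
first = accept.argmax(axis=1)
gambler = np.where(stops, X[np.arange(trials), first], 0.0)

print("gambler / prophet ≈", gambler.mean() / X.max(axis=1).mean())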

1.1 Our contribution

In this paper we study the prophet secretary problem and propose improved algorithms for it. In particular, our work improves upon the recent work of Ehsani et al. [10], Azar et al. [3], and Beyhaghi et al. [4] by providing an algorithm that guarantees the gambler a fraction of $0.669$ of the expected maximum in the prophet secretary setting. Our main contribution, however, is not the actual numerical improvement but rather the way in which it is obtained. In addition, we provide an example showing that no algorithm can achieve a factor better than $\sqrt{3}-1 \approx 0.732$ for the prophet secretary setting.

From a conceptual viewpoint we introduce a class of algorithms, which we call blind strategies, that are very robust in nature. This type of algorithm is a clever generalization of the single threshold algorithm of Ehsani et al. to a multi-threshold setting. In their algorithm, Ehsani et al. first compute a threshold $T$ such that $\mathbb{P}(\max_{1\le i\le n} X_i \le T) = 1/e$ and then use $T$ as a single threshold, so that the gambler stops the first time the observed value surpasses $T$. They observe that this strategy only works for random variables with continuous distributions; however, they also note that by allowing randomization the strategy can be extended to general random variables. Rather than fixing a single probability of acceptance, we fix a function $\alpha : [0,1] \to [0,1]$ which is used to define a sequence of thresholds in the following way. Given an instance with continuous distributions, we draw $n$ values uniformly and independently at random in $[0,1]$ and reorder them as $u_1 \le u_2 \le \dots \le u_n$. Then we set thresholds $\tau_i$ such that $\mathbb{P}(\max_{1\le j\le n} X_j \le \tau_i) = \alpha(u_i)$, and the gambler stops at time $i$ if $X_{\sigma(i)} > \tau_i$. Note that if the function $\alpha$ is nonincreasing, this leads to a nonincreasing sequence of thresholds.

The idea of blind strategies comes from the i.i.d. case mentioned above. In that setting such strategies are indeed best possible, as shown by Correa et al. [6]. What makes blind strategies attractive is that although decisions are time dependent, this dependence lies completely in the choice of the function $\alpha$, which is independent of the instance. This independence significantly simplifies the analysis of multi-threshold strategies. Again, when facing discontinuous distributions we also require randomization for our results to hold.

From a technical standpoint our analysis introduces the use of Schur convexity [20] in the prophet inequality setting. We start our analysis by revisiting the single threshold strategy of Ehsani et al., which corresponds to the constant blind strategy $\alpha \equiv 1/e$. We exhibit a new analysis of this strategy that yields a stochastic dominance type result. Indeed, we prove that the probability that the gambler gets a value of more than $x$ is at least that of the maximum being more than $x$, rescaled by a factor $1-1/e$. This result uses Schur convexity to deduce that if there is a value above the threshold $T$, then it is chosen by the gambler with probability at least $1-1/e$. Then we extend this analysis to deal with more general functions $\alpha$, which requires precise bounds on the distribution of the stopping time corresponding to a function $\alpha$. These bounds make use of results of Esfandiari et al. [9] and Azar et al. [3], and of newly derived bounds that follow from the core ideas in Schur convexity theory.

Again, in this more general setting we find an appropriate stochastic dominance type bound on the probability that the gambler obtains at least a certain amount with respect to the probability that the prophet obtains the same amount. Interestingly, we manage to make the bound solely dependent on the blind strategy $\alpha$ by essentially controlling the implied stopping time distribution (the patience of the gambler). Then optimizing over blind strategies leads to the improved bound of $0.669$. Throughout the paper we show two lower bounds on the performance of a blind strategy, the second more involved than the first one. In the first case, there is a natural way to optimize over the choice of $\alpha$ by solving an ordinary differential equation, leading to a guarantee of $0.665$. In the second case, using a refined analysis, we derive the stated bound of $0.669$. Although it may seem that our general approach still leaves some room for improvement, we prove that blind strategies cannot lead to a factor better than 0.675. This bound is obtained by taking two carefully chosen instances and proving that no blind strategy can perform well in both.

Finally, we prove an upper bound on the performance of any algorithm: we construct an instance (which is not i.i.d.) in which no algorithm can perform better than $\sqrt{3}-1 \approx 0.732$. This improves upon the best upper bound previously known, namely $\approx 0.745$, which corresponds to the i.i.d. case and was proved by Hill and Kertz [13]. Furthermore, it improves and generalizes a recent bound of Azar et al. [3] for the more restricted class of deterministic distribution-insensitive algorithms. Prior to our work, no separation between prophet secretary and the i.i.d. prophet inequality was known.

1.2 Preliminaries and statement of results

Given nonnegative independent random variables $X_1,\dots,X_n$ and a uniformly random permutation $\sigma$ of $[n]$ (here $[n]$ denotes the set $\{1,\dots,n\}$), in the prophet secretary problem a gambler is presented with the random variables in the order given by $\sigma$, i.e., at time $i$ she sees the realization of $X_{\sigma(i)}$. The goal of the gambler is to find a stopping time $t$ such that $\mathbb{E}(X_{\sigma(t)})$ is as large as possible. In particular, we want to find the largest possible constant $c$ such that, for every instance,
$$\sup_{t \in \mathcal{T}} \mathbb{E}\big(X_{\sigma(t)}\big) \;\ge\; c\,\mathbb{E}\Big(\max_{1\le i\le n} X_i\Big),$$
where $\mathcal{T}$ is the set of stopping times.

Throughout this paper we denote by $F_1,\dots,F_n$ the underlying distributions of $X_1,\dots,X_n$, which we assume to be continuous. All our results apply unchanged to the case of general distributions by introducing random tie-breaking rules (this is made precise in Section 6). To see why random tie-breaking rules are actually needed, consider the single threshold strategy of Ehsani et al. [10]. Recall that they compute a threshold $T$ such that $\mathbb{P}(\max_{1\le i\le n} X_i \le T) = 1/e$ and then use $T$ as a single threshold, which, by allowing random tie-breaking, leads to a performance of $1-1/e$. However, if random tie-breaking is not allowed, a single threshold strategy cannot achieve a constant better than $1/2$. Indeed, consider the instance with $n-1$ deterministic random variables equal to $1$ and one random variable giving $1/\varepsilon$ with probability $\varepsilon$ and zero otherwise. Now, for a fixed threshold $T<1$ the gambler gets $1/\varepsilon$ with probability $\varepsilon/n$ and $1$ otherwise, so that she gets approximately $1$, whereas if $T\ge 1$ the gambler gets $1/\varepsilon$ with probability $\varepsilon$, leading to an expected value of $1$. Noting that the expectation of the maximum in this instance equals $2-\varepsilon$, we conclude that fixed thresholds cannot achieve a guarantee better than $1/2$. However, as Ehsani et al. note, if in this instance the gambler accepts a deterministic random variable with probability roughly $1/n$, then her expected value approaches $2(1-1/e)$, i.e., a fraction $1-1/e$ of the expected maximum (a small simulation of this instance is sketched below). In Section 6 we extend this idea to the more general multi-threshold strategies.
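The following small simulation sketch (our own code; the parameters n and eps are illustrative choices) reproduces the instance just described, comparing a fixed threshold strictly below 1 with the randomized rule that accepts a deterministic value of 1 with probability roughly 1/n.

# Tie-breaking instance: n-1 deterministic values equal to 1, one value 1/eps w.p. eps.
import numpy as np

rng = np.random.default_rng(2)
n, eps, trials = 50, 0.01, 100_000

def run(randomized):
    total = 0.0
    for _ in range(trials):
        values = np.ones(n)
        values[0] = (1.0 / eps) if rng.random() < eps else 0.0
        for i in rng.permutation(n):
            v = values[i]
            if v > 1.0:                                   # the rare high value is always taken
                total += v
                break
            if v == 1.0 and (not randomized or rng.random() < 1.0 / n):
                total += v                                # a deterministic 1 is taken
                break
    return total / trials

e_max = eps * (1.0 / eps) + (1.0 - eps) * 1.0             # E[max] = 2 - eps
print("fixed threshold below 1:", run(False) / e_max)     # ≈ 1/2
print("randomized acceptance  :", run(True) / e_max)      # ≈ 1 - 1/e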

The main type of stopping rule we deal with in this paper uses a nonincreasing threshold approach. This is a quite natural idea, since Esfandiari et al. [9] already use such an approach to derive a guarantee of $1-1/e$. Interestingly, the analysis of multi-threshold strategies becomes rather difficult when trying to go beyond this bound. This is evident from the fact that the more recent results take a different approach. In this paper we use a rather restrictive class of multi-threshold strategies that we call blind strategies. These are simply given by a nonincreasing function $\alpha : [0,1] \to [0,1]$, which is turned into an algorithm as follows: given an instance of continuous distributions, we independently draw $U_1,\dots,U_n$ from a uniform distribution on $[0,1]$ and find thresholds $\tau_1 \ge \dots \ge \tau_n$ such that
$$\mathbb{P}\Big(\max_{1\le j\le n} X_j \le \tau_i\Big) = \alpha\big(U_{(i)}\big),$$
where $U_{(i)}$ is the $i$-th order statistic of $U_1,\dots,U_n$. Then the gambler stops at the first time at which a value exceeds the corresponding threshold; in other words, the gambler applies the following.

1:  for $i = 1$ to $n$ do
2:     if $X_{\sigma(i)} > \tau_i$ then
3:        Take $X_{\sigma(i)}$
4:     end if
5:  end for
Algorithm 1 Time Threshold Algorithm TTA$(\tau_1,\dots,\tau_n)$
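For concreteness, here is a minimal runnable sketch of a blind strategy combined with Algorithm 1 (our own code, not from the paper). It assumes the thresholds are obtained by numerically inverting $\mathbb{P}(\max_j X_j \le \tau_i) = \alpha(U_{(i)})$; the exponential instance and the particular functions $\alpha$ are arbitrary choices for demonstration.

# Blind strategy: draw uniforms, sort them, invert the CDF of the maximum to get thresholds,
# then run the time threshold algorithm on a uniformly random arrival order.
import numpy as np

rng = np.random.default_rng(3)
scales = np.array([1.0, 2.0, 4.0])                  # instance: X_j ~ Exp(scale_j), independent
n = len(scales)

def prod_cdf(t):
    return np.prod(1.0 - np.exp(-t / scales))       # P(max_j X_j <= t)

def threshold(q):
    lo, hi = 0.0, 1e3                               # bisection for P(max <= tau) = q
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if prod_cdf(mid) < q else (lo, mid)
    return hi

def simulate(alpha, trials=10_000):
    rewards, maxima = 0.0, 0.0
    for _ in range(trials):
        u = np.sort(rng.uniform(size=n))            # order statistics U_(1) <= ... <= U_(n)
        taus = [threshold(alpha(ui)) for ui in u]   # nonincreasing since alpha is nonincreasing
        x = rng.exponential(scales)
        maxima += x.max()
        for i, j in enumerate(rng.permutation(n)):  # Algorithm 1: stop when X_sigma(i) > tau_i
            if x[j] > taus[i]:
                rewards += x[j]
                break
    return rewards / maxima

print("constant alpha = 1/e:", simulate(lambda u: 1 / np.e))            # single threshold rule
print("decreasing alpha    :", simulate(lambda u: np.exp(-(1 + u) / 2)))

For the constant function $\alpha \equiv 1/e$ this reduces to the single threshold rule of Ehsani et al., whose guarantee is $1-1/e$; the estimated ratio should be at least that, up to simulation noise.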

Note that a blind strategy is uniquely determined by the choice of the function $\alpha$, independently of the actual distributions or even the size of the instance. This justifies that we may simply talk about the strategy $\alpha$. Our goal is thus to find a good function $\alpha$ such that the previous algorithm performs well against any instance.

For a blind strategy $\alpha$ and an instance $X_1,\dots,X_n$, we will be interested in the underlying stopping time, which is the random variable defined as $t = \min\{i \in [n] : X_{\sigma(i)} > \tau_i\}$ (with $t = \infty$ if no variable exceeds its threshold), where the $\tau_i$ are the corresponding thresholds. In particular, the reward of the gambler is $X_{\sigma(t)}$ (taken to be $0$ when $t = \infty$), which we simply denote by $X_t$.
Our main result (Theorem 1) is the following: there exists a nonincreasing function $\alpha : [0,1] \to [0,1]$ such that, for every instance of the prophet secretary problem,
$$\mathbb{E}\big(X_{\sigma(t)}\big) \;\ge\; 0.669\;\mathbb{E}\Big(\max_{1\le i\le n} X_i\Big),$$
where $t$ is the stopping time of the blind strategy $\alpha$. In addition, we prove the following upper bound on the performance of blind strategies: no blind strategy can guarantee a constant better than 0.675. Finally, we prove the following upper bound on the performance of general strategies: no strategy can guarantee a constant better than $\sqrt{3}-1 \approx 0.732$. The rest of the paper is organized as follows. In Section 2 we present an alternative, simple proof that single threshold strategies guarantee a constant of $1-1/e$, which will help the reader to understand the proof of Theorem 1. Then, in Section 3, we prove that blind strategies guarantee a constant of 0.665. In Section 4, we sharpen the analysis of Section 3 to prove Theorem 1, and also prove the 0.675 upper bound for blind strategies. In Section 5, we prove the general upper bound of 0.732. Last, Section 6 explains how to deal with discontinuous distributions.

2 Single Threshold

As a warm-up exercise, we illustrate the main ideas in this paper by providing an alternative proof of a recent result by Ehsani et al. [10]. Consider the blind strategy given by $\alpha(u) = q$ for all $u \in [0,1]$, where $q \in [0,1]$ is a fixed number (taking $q = 1/e$ gives exactly the single threshold algorithm of Ehsani et al.).

Theorem 2

Let $q \in [1/e, 1)$ and let $t$ be the stopping time of the blind strategy $\alpha \equiv q$. Given $x \ge 0$,
$$\mathbb{P}\big(X_{\sigma(t)} > x\big) \;\ge\; (1-q)\,\mathbb{P}\Big(\max_{1\le i\le n} X_i > x\Big).$$

Recall that given an instance $X_1,\dots,X_n$, the blind strategy $\alpha \equiv q$ first computes $T$ such that $\mathbb{P}(\max_{1\le i\le n} X_i \le T) = q$ and then uses TTA$(T,\dots,T)$, which simply stops the first time a value above $T$ is observed.

Note that for $x < T$, we have that $\mathbb{P}(X_{\sigma(t)} > x) = \mathbb{P}(X_{\sigma(t)} > T) = \mathbb{P}(\max_{1\le i\le n} X_i > T) = 1-q \ge (1-q)\,\mathbb{P}(\max_{1\le i\le n} X_i > x)$. Now, for $x \ge T$, we have that
$$\mathbb{P}\big(X_{\sigma(t)} > x\big) \;=\; \sum_{i=1}^{n}\mathbb{P}\big(X_{\sigma(t)} = X_i,\ X_i > x\big) \;=\; \sum_{i=1}^{n}\mathbb{P}\big(X_{\sigma(t)} = X_i \mid X_i > x\big)\,\mathbb{P}(X_i > x) \;\ge\; (1-q)\sum_{i=1}^{n}\mathbb{P}(X_i > x) \;\ge\; (1-q)\,\mathbb{P}\Big(\max_{1\le i\le n} X_i > x\Big).$$
The second equality stems from the independence of the $X_i$, the first inequality is a consequence of Lemma 2 (which relies on Schur convexity), and the second inequality follows from the union bound.

For a nonnegative random variable $Y$ we have that $\mathbb{E}(Y) = \int_0^\infty \mathbb{P}(Y > x)\,dx$. Thus, an immediate consequence of Theorem 2 is the following result of Ehsani et al. [10].

Corollary ([10])

Take $q = 1/e$; then $\mathbb{E}\big(X_{\sigma(t)}\big) \ge \big(1-\tfrac{1}{e}\big)\,\mathbb{E}\big(\max_{1\le i\le n} X_i\big)$.
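To spell out the integration step (our own rendering of the argument just indicated):
$$\mathbb{E}\big(X_{\sigma(t)}\big) = \int_0^\infty \mathbb{P}\big(X_{\sigma(t)} > x\big)\,dx \;\ge\; \Big(1-\tfrac{1}{e}\Big)\int_0^\infty \mathbb{P}\Big(\max_{1\le i\le n} X_i > x\Big)\,dx \;=\; \Big(1-\tfrac{1}{e}\Big)\,\mathbb{E}\Big(\max_{1\le i\le n} X_i\Big),$$
where the inequality applies Theorem 2 with $q = 1/e$ pointwise in $x$.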

To complete the previous proof, we prove the following lemma.

Lemma

Consider independent random variables $X_1,\dots,X_n$ and an independent uniformly random permutation $\sigma$ of $[n]$. Let $t$ be the stopping time of TTA$(T,\dots,T)$ and let $q = \mathbb{P}(\max_{1\le j\le n} X_j \le T)$. Then for all $x \ge T$ and all $i$ such that $\mathbb{P}(X_i > x) > 0$, we have that
$$\mathbb{P}\big(X_{\sigma(t)} = X_i \mid X_i > x\big) \;\ge\; \frac{1-q}{\ln(1/q)}.$$

Denoting the distribution of $X_j$ by $F_j$, using the fact that $\sigma$ is a uniformly random order, and using the definition of $t$, we get the following.

Define by

and . Thus we have that

Clearly the function defined above is differentiable and permutation symmetric. Therefore, to check that it is Schur convex we must simply verify the Schur–Ostrowski criterion [20]: a symmetric, differentiable function $f$ is Schur convex if and only if
$$(x_i - x_j)\left(\frac{\partial f}{\partial x_i} - \frac{\partial f}{\partial x_j}\right) \;\ge\; 0 \qquad \text{for all } x \text{ in its domain and all } i \ne j.$$

Straightforward calculations yield

and, by symmetry, . Then,

Finally, since and , we get that if and only if , which holds by monotonicity of the exponential function. Therefore we have proven that is Schur-convex.

Schur convexity readily implies that the optimization problem is solved at the point where all coordinates are equal. Consequently, for fixed $i$, and under the constraint that the product $\prod_{j \ne i} F_j(T)$ is held constant, the quantity $\mathbb{P}(X_{\sigma(t)} = X_i \mid X_i > x)$ is minimal when all the values $F_j(T)$, $j \ne i$, are equal.
It follows that, since $X_i$ and $(X_j)_{j \ne i}$ are independent,
$$\mathbb{P}\big(X_{\sigma(t)} = X_i \mid X_i > x\big) \;\ge\; \frac{1}{n}\sum_{k=1}^{n} q^{\frac{k-1}{n-1}}.$$
Now we note that the left hand side does not depend on $n$: we can add some dummy variables ($X_j \equiv 0$) and the probability does not change. Therefore, taking the limit in $n$ we get
$$\mathbb{P}\big(X_{\sigma(t)} = X_i \mid X_i > x\big) \;\ge\; \int_0^1 q^{s}\,ds \;=\; \frac{1-q}{\ln(1/q)},$$
which concludes the proof.

3 Beating $1-1/e$

In the rest of the paper we assume for simplicity that $F_1,\dots,F_n$ are continuous (see Section 6 for an explanation of how to extend the results to the discontinuous case). In this section, we prove the following proposition, which is a weaker version of Theorem 1 (the constant is 0.665 instead of 0.669):

There exists a nonincreasing function $\alpha : [0,1] \to [0,1]$ such that, for every instance,
$$\mathbb{E}\big(X_{\sigma(t)}\big) \;\ge\; 0.665\;\mathbb{E}\Big(\max_{1\le i\le n} X_i\Big),$$
where $t$ is the stopping time of the blind strategy $\alpha$. We first present this result because it already beats the best known constant in the literature by a significant margin, and it is simpler to prove than Theorem 1.

To do the analysis we first need to note that a blind strategy can be interpreted as the limit, as the size of the instance goes to infinity, of strategies that do not use randomization.

Consider a nonincreasing function $\alpha : [0,1] \to [0,1]$. The deterministic blind strategy given by $\alpha$ is the strategy that applies TTA$(\tau_1,\dots,\tau_n)$ to the sequence of thresholds defined by the conditions
$$\mathbb{P}\Big(\max_{1\le j\le n} X_j \le \tau_i\Big) = \alpha(i/n), \qquad i = 1,\dots,n.$$

To turn a deterministic blind strategy into a blind strategy, consider an instance $X_1,\dots,X_n$ and add to this instance $N-n$ deterministic random variables equal to zero, that is, $X_j \equiv 0$ for $j \in \{n+1,\dots,N\}$, so that the new instance becomes $X_1,\dots,X_N$. Denoting by $t_N$ the stopping time given by the deterministic blind strategy applied to this padded instance, we have that the expected reward $\mathbb{E}(X_{\sigma(t_N)})$ converges, as $N$ grows, to the expected reward of the blind strategy $\alpha$ applied to the original instance. Indeed, recalling the definition of blind strategies in Section 1.2, it is easy to see that a deterministic blind strategy applied to the padded instance is approximately a blind strategy applied to the instance $X_1,\dots,X_n$ in which the random values are drawn uniformly from the grid $\{1/N, 2/N, \dots, 1\}$ rather than from $[0,1]$. Thus, by taking the limit as $N \to \infty$ the claim follows.

The conclusion of this remark is that in order to analyze the performance of blind strategies it is sufficient to study the performance of deterministic blind strategies and then take the limit as the size of the instance grows to infinity.
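The following small numerical sketch (our own code; the instance, the function $\alpha$, and the inversion routine are illustrative assumptions) shows the padding argument at work: the deterministic blind strategy on an instance padded with $N-n$ zeros approaches the behavior of the randomized blind strategy as $N$ grows.

# Deterministic blind strategy on a zero-padded instance of total size N.
import numpy as np

rng = np.random.default_rng(4)
scales = np.array([1.0, 2.0, 4.0])                  # the original instance (n = 3)
alpha = lambda u: np.exp(-(1 + u) / 2)              # some nonincreasing alpha

def threshold(q):
    lo, hi = 0.0, 1e3                               # bisection for P(max <= tau) = q
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.prod(1 - np.exp(-mid / scales)) < q else (lo, mid)
    return hi

def deterministic_padded(N, trials=20_000):
    # thresholds alpha(i/N), i = 1..N; padded zeros never exceed a positive threshold,
    # so only the random slots occupied by the three real variables matter.
    taus = np.array([threshold(alpha(i / N)) for i in range(1, N + 1)])
    reward, best = 0.0, 0.0
    for _ in range(trials):
        x = rng.exponential(scales)
        slots = np.sort(rng.choice(N, size=len(scales), replace=False))
        perm = rng.permutation(len(scales))         # which real variable lands in which slot
        best += x.max()
        for slot, j in zip(slots, perm):
            if x[j] > taus[slot]:
                reward += x[j]
                break
    return reward / best

for N in (3, 10, 100):
    print(f"N = {N:4d}:  gambler / prophet ≈ {deterministic_padded(N):.3f}")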

We are now ready to start analyzing deterministic blind strategies.

Lemma

Given an instance and nonincreasing thresholds , it holds that, for and ,

where $t$ is the stopping time given by TTA$(\tau_1,\dots,\tau_n)$.

Notice that, since thresholds are nonincreasing,

and therefore, we must simply analyze the second term.

To give a lower bound on the right-hand side term, we use the following simple inequality:

Lemma

For all ,

(1)

This is a particular case of Lemma 4, which will be proved in the next section. We can now minorize the probability that the gambler obtains more than $x$ by a quantity that depends only on the cumulative distribution of the stopping time $t$ and on the distribution of the maximum.

Let $\alpha : [0,1] \to [0,1]$ be nonincreasing, and let $t$ be the stopping time of the deterministic blind strategy. For every instance $X_1,\dots,X_n$ and every $x \ge 0$,

where .

Note that for all $i$ and $x$, . Plugging this inequality and inequality (1) into Lemma 3 yields the result. We now bound the cumulative distribution of the stopping time $t$ as a function of the values $\alpha(i/n)$, $i \in [n]$. Both the lower and upper bounds are sharp in the sense that they are achieved by different instances: the lower bound corresponds to the case where there is only one non-zero variable and the upper bound corresponds to the case where all distributions are equal.

Lemma

Fix a nonincreasing $\alpha : [0,1] \to [0,1]$. For every instance $X_1,\dots,X_n$ consider the sequence of thresholds $\tau_1 \ge \dots \ge \tau_n$ such that
$$\mathbb{P}\Big(\max_{1\le j\le n} X_j \le \tau_i\Big) = \alpha(i/n), \qquad i = 1,\dots,n.$$
Denote by $t$ the stopping time of TTA$(\tau_1,\dots,\tau_n)$. For all $k \in \{1,\dots,n\}$, we have

The proof consists in highlighting the role of two of the variables, say $X_i$ and $X_j$, in the probability $\mathbb{P}(t > k)$ and in using the symmetry that the random order induces. The key idea is to distinguish between the following cases:

  1. Only one of the variables $X_i$ and $X_j$ is shown before time $k$.

  2. Both $X_i$ and $X_j$ are shown before time $k$.

  3. Neither $X_i$ nor $X_j$ is shown before time $k$.

To express this formally, define

For , we have that either

  1. s.t. and

  2. s.t. and

  3. .

This is the key decomposition we use to show the inequality. In what follows, we write for the probability that is strictly larger than , given that the instance is . We have

To simplify the notation, let us define

Then,

Let us show that both and change in the correct direction when we change and , by and , or and , respectively. For this, note that for all

and for all

Then,

and

We can conclude on the lower bound by applying the inequality times and noticing that

The upper bound follows from applying the inequality infinitely many times and noticing that

Remember that in the proof of Lemma 2 we solved the following optimization problem: . The value of this problem was obtained by noticing that is Schur convex. This time we considered the problem

where “opt” is a symbol in . This problem is harder since it involves optimizing over functions rather than real numbers. Trying to apply Schur convexity theory again, one could see

as a function of the distributions evaluated at each threshold, that is as a function of the vector

. Unfortunately this domain is not symmetric and moreover the constraint of the product being constant results in different constraints.

However, note that the previous lemma shows that is nearly log-Schur-convex: it increases when the components of the argument get more concentrated in some coordinate. Nevertheless, the behavior of is not always monotone along the curve , a property that would be satisfied by a log-Schur-convex function if were numbers. In spite of the latter, there is a step by step way to go from