
# Are Two (Samples) Really Better Than One? On the Non-Asymptotic Performance of Empirical Revenue Maximization

The literature on "mechanism design from samples," which has flourished in recent years at the interface of economics and computer science, offers a bridge between the classic computer-science approach of worst-case analysis (corresponding to "no samples") and the classic economic approach of average-case analysis for a given Bayesian prior (conceptually corresponding to the number of samples tending to infinity). Nonetheless, the two directions studied so far are two extreme and almost diametrically opposed directions: that of asymptotic results where the number of samples grows large, and that where only a single sample is available. In this paper, we take a first step toward understanding the middle ground that bridges these two approaches: that of a fixed number of samples greater than one. In a variety of contexts, we ask what is possibly the most fundamental question in this direction: "are two samples really better than one sample?". We present a few surprising negative results, and complement them with our main result: showing that the worst-case, over all regular distributions, expected-revenue guarantee of the Empirical Revenue Maximization algorithm given two samples is greater than that of this algorithm given one sample. The proof is technically challenging, and provides the first result that shows that some deterministic mechanism constructed using two samples can guarantee more than one half of the optimal revenue.


## 1 Introduction

Arguably the simplest revenue maximization problem is that of maximizing the revenue of a single seller from selling a single item to a single buyer. In this problem, the seller customarily possesses some prior information about the buyer, traditionally modeled via a distribution from which the valuation (maximum willingness to pay for the item) of the buyer is drawn, and the seller’s task in this Bayesian model is to devise a selling mechanism that maximizes her expected revenue over this distribution. This classic problem was completely resolved in the seminal paper of Myerson (1981), who showed that the optimal mechanism (among all truthful, possibly even randomized, mechanisms) is to offer the item for a take-it-or-leave-it price tailored for the given prior distribution.

In recent years, the literature at the interface of economics and computer science, influenced by the newly found popularity of machine learning, has seen the rise of a line of work that relaxes the assumption of complete knowledge of the underlying distribution by the seller, to the assumption of the seller having access to samples from this distribution. In a sense, this model offers a bridge, via the number of samples that are available to the seller, between the classic computer-science approach of worst-case analysis (corresponding to “no samples”) and the above-mentioned classic economic approach of average-case analysis for a given prior distribution (conceptually corresponding to the number of samples tending to infinity). Nonetheless, all of the results that we know of in this vein are in one of two extreme and almost diametrically opposed directions: one looking at asymptotic results where the number of samples grows large (Cole and Roughgarden, 2014; Morgenstern and Roughgarden, 2015, 2016; Devanur et al., 2016; Roughgarden and Schrijvers, 2016; Gonczarowski and Nisan, 2017; Balcan et al., 2016; Alon et al., 2017; Balcan et al., 2018), and the other asking what can be done with a single sample (Dhangwatnotai et al., 2015; Huang et al., 2015; Fu et al., 2015). For example, a result of the former direction would tell us that under certain conditions, a number of samples that is polynomial in certain parameters of the problem suffices for attaining a certain approximation to the optimal revenue with high probability, while a result of the latter direction would tell us that under certain conditions, access to a single sample from the buyer’s distribution allows the seller to design a mechanism that attains some constant fraction of the optimal revenue in expectation. In this paper, we take a first step towards understanding the middle ground that bridges these two approaches: that of a fixed number of samples greater than one. In particular, we ask what is possibly the most fundamental question in this direction:

are two samples really better than one sample?

To understand the specific context in which we ask the above question, and why it is more involved than may be expected, we zoom in to provide some more context. Arguably the most natural algorithm for pricing a good given samples from an underlying distribution is the Empirical Revenue Maximization (henceforth ERM) algorithm, which sets the price to be the one that would maximize the expected revenue over the empirical distribution, that is, the uniform distribution over the given samples. For a single sample, this means offering the good for a price that equals this sample, which is shown by Dhangwatnotai et al. (2015) to guarantee a revenue of one half (!) of the optimal revenue when the underlying distribution is regular (see Footnotes 1 and 2). Huang et al. (2015) have in fact shown that this guarantee of one half cannot be improved upon by any deterministic mechanism that is designed based on a single sample from a regular distribution (see Footnote 3). As the number of samples grows large, Huang et al. (2015) show that ERM attains revenue that asymptotically tends to optimal for regular distributions. The main question that we ask in this paper is whether, for regular distributions, the worst-case guarantee of ERM constructed based on two samples is better, the same, or worse, than the one-half worst-case guarantee of ERM constructed based on one sample.

Footnote 1: Regularity (sometimes called Myerson-regularity) is a standard mild restriction on valuation distributions. It is well known that without any restriction on the possible class of valuation distributions, no revenue guarantees can be proven when constructing a mechanism from a finite number of samples. To see this, consider, for small $\varepsilon$, a distribution that attains the value with probability  and the value with probability . The optimal expected revenue of is attained by posting a price of , however it is futile to try and learn even the order-of-magnitude of this price (for arbitrarily small $\varepsilon$) from a finite number of samples that does not itself depend on $\varepsilon$.

Footnote 2: To be completely clear, the guarantee here is that for any regular distribution, the expected revenue from this mechanism, where the expectation is taken both over the given sample and over the bidder’s valuation (which are drawn i.i.d. from the underlying regular distribution), is at least half of the optimal expected revenue, where this expectation is taken over the bidder’s valuation.

Footnote 3: Fu et al. (2015) have nonetheless been able to slightly improve upon this guarantee of one half using a randomized pricing mechanism.

While it is clear that the optimal method to price an item based on two samples guarantees at least as much revenue as the optimal method to price an item based on a single sample (indeed, one could always ignore the second sample and only use the first sample as in the single-sample case), it is far less clear that ERM when based on two samples should have a better worst-case guarantee than ERM when based on a single sample (see Footnote 4). To drive this point home, we present two seemingly-similar problems, for which we show that increasing the number of samples has an undesired effect on the revenue of ERM, and an additional problem where another natural notion of monotonicity fails to hold for ERM. To phrase these three problems, it will be convenient to use the following notation: given a distribution $F$ and a natural number $n$, we denote by $\mathrm{ERM}(F,n)$ the expected revenue over $F$ when pricing an item according to ERM given $n$ samples from the (underlying) distribution $F$.

Footnote 4: For example, with many samples, ERM is in fact known to not be a worst-case-optimal pricing algorithm, and to be inferior in this sense to guarded ERM (Dhangwatnotai et al., 2015; Cole and Roughgarden, 2014).

#### Problem 1: Is it true that for any fixed underlying regular distribution F and any number n, it holds that ERM(F,n+1)≥ERM(F,n)?

The answer, even for $n=1$, turns out to be No! In fact, it turns out that for certain distributions $F$, taking more samples “confuses” ERM, hurting revenue: $\mathrm{ERM}(F,2) < \mathrm{ERM}(F,1)$.

#### Problem 2: Is it true that for any two fixed regular distributions F and G, if ERM(F,n)>ERM(G,n) then also ERM(F,n+1)>ERM(G,n+1)?

The answer, even for $n=1$, turns out to be No! In fact, it turns out that for certain distributions $F$ and $G$, while $\mathrm{ERM}(F,1) > \mathrm{ERM}(G,1)$, it is in fact the case that $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$.

#### Problem 3: Is it true that for any two fixed regular distributions F and G such that F first-order stochastically dominates G, and for any number n, it holds that ERM(F,n)≥ERM(G,n)?

While for $n=1$ this is known to hold, the answer, already for $n=2$, turns out to be No! In fact, it turns out that for certain such distributions $F$ and $G$, while the revenue from each fixed posted price is higher from $F$ than it is from $G$, the structure of $F$ “confuses” ERM when based on two samples, hurting revenue by causing ERM to post lower-revenue prices.

The analyses of the above three problems are given in Section 3. Despite the surprising negative answers surveyed above to all three problems, we do manage to show monotonicity in the sense that the worst-case guarantee of the price computed by ERM based on two samples from a regular distribution is strictly higher than one half of the optimal expected revenue obtained by setting a posted price tailored specifically to the underlying distribution. To formalize this result, which is our main result, it will be convenient to use the following notation: given a distribution , we denote by the highest expected revenue over  attained by the optimal truthful mechanism (which, recall, is a posted-price mechanism).

###### Theorem 1 (See Corollaries 1 and 3).

There exists such that for every regular distribution , we have . In particular, .

We note that to the best of our knowledge, Theorem 1 is in fact the first result to show that some deterministic mechanism constructed using two samples can guarantee more than one half of the optimal revenue for every regular distribution (see Footnote 5). Proving Theorem 1 turns out to be considerably more technically challenging than may have been expected (or rather, considerably more technically challenging than may have been expected before observing the negative answers to the above three problems, which may be seen as evidence that the proof of Theorem 1 should be challenging), up to the point that extending our methods even for comparing ERM for two and for three samples (let alone for higher values of $n$) seems intractable. The main problem that we leave open, therefore, is whether the monotonicity of the worst-case guarantee of ERM that is uncovered in Theorem 1 when going from a single sample to two samples, holds for any number of samples.

Footnote 5: We emphasize again, as mentioned above, that Fu et al. (2015) have constructed a randomized mechanism for one sample that guarantees more than one half of the optimal revenue in expectation. Our mechanism is thus the first deterministic mechanism constructed using two samples that guarantees more than one half of the optimal revenue. (Incidentally, our lower bound beats one half by orders of magnitude more than the lower bound of Fu et al. (2015); see Section 4.)

###### Open Problem 1.

Is it true that for every ?

The rest of this paper is structured as follows. In Section 2 we formally present the model and definitions. In Section 3, we present the analysis of the above-surveyed negative results (to Problem 1, to Problem 2, and to an additional more technical problem). In Section 4, we present Theorem 1, which is our main result, and give a high-level overview of its proof. The proof itself is given in Sections 5, 6 and 7, with some calculations relegated to the appendix.

### 1.1 Further Related Work

The literature on “mechanism design from samples” is preceded and inspired by the literature on prior-free and prior-independent mechanism design. Early work in Algorithmic Mechanism Design has mainly focused on prior-free mechanism design, aiming for either worst-case welfare approximation (e.g., Lehmann et al., 2002) or worst-case revenue approximation with respect to some instance-specific benchmark (e.g., the best revenue when selling at least 2 items (Goldberg et al., 2001)). For more results on prior-free mechanisms see, for example, Chapter 7 of Hartline (2017).

The standard economic model of revenue maximization assumes that the value of each player is drawn from a known prior distribution, and the seller aims to maximize her expected revenue for that prior (Myerson, 1981). Bulow and Klemperer (1996) have presented a remarkable result, showing that a seller can attain at least her optimal revenue for buyers with values drawn i.i.d. from a regular distribution by using the VCG mechanism, as long as she can recruit one additional buyer (whose value is drawn independently from the same distribution). This mechanism is prior-independent in the sense that the mechanism does not depend on the priors, yet the approximation is obtained with respect to the optimal revenue for the specific distribution, even though this optimal revenue is unknown when the mechanism is run. Hartline and Roughgarden (2009) have initiated the systematic study of such prior-independent mechanisms. For more results on prior-independent mechanisms see, for example, Chapter 5 of Hartline (2017).

A slightly less demanding model than the prior-independent model is the model in which the mechanism has access to samples from the unknown underlying distribution, with the benchmark still being the (unknown-to-the-mechanism) optimal revenue for that specific distribution. The current paper uses this model, and prior work in this model is surveyed in the introduction above.

A somewhat similar model where the auction is also chosen based on sampled data, which is studied in the learning literature, is an online-learning model where the mechanism designer can use information from prior auctions to on-line optimize the parameters of the next auction (Cesa-Bianchi et al., 2015; Weed et al., 2016). The goal in this model is to optimize the overall performance.

The literature on “mechanism design from samples” restricts the dependence of the auction mechanism on the full details of the buyer’s valuation distribution, by having it depend only on sample valuations drawn from this distribution. An alternative approach to restricting the dependence of the auction mechanism on the full details of the valuation distribution is by having it depend only on certain statistical measures of the valuation distribution, such as its mean, its variance, or its median (Azar and Micali, 2013; Azar et al., 2013).

## 2 Preliminaries

### 2.1 Model and Notation

#### Distributions and Revenues

We consider one seller and one buyer. The seller has one good for sale, which has no value for the seller if it is left unsold. The buyer has a private value (valuation, i.e., maximum willingness to pay) for the good, which is drawn from some distribution . For each real price , the expected revenue attained from posting price is thus simply . The highest possible expected revenue attainable from any price is denoted by . The seller does not know the value of the buyer or the distribution .

#### Empirical Revenue Maximization

Given $n$ samples from $F$, the empirical distribution over these samples is simply the uniform distribution over the samples, i.e., each sample is drawn with probability $1/n$. The Empirical Revenue Maximization algorithm is given $n$ independent samples from $F$, computes the price that maximizes the expected revenue attained from the empirical distribution over the given samples, and posts this price. We denote by $\mathrm{ERM}(F,n)$ the expected revenue of the price computed by the ERM algorithm over a fresh draw from $F$ (independent from the samples used to compute the price posted by the ERM algorithm); note that this revenue is in expectation over both the samples and the fresh draw.
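The pricing rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `erm_price` is hypothetical, and breaking revenue ties in favor of the higher price is an assumption of this sketch (chosen to agree with the two-sample rule of Definition 1).

```python
def erm_price(samples):
    """Return a price that maximizes revenue on the empirical distribution."""
    n = len(samples)
    best_price, best_revenue = None, -1.0
    # An empirical revenue maximizer can always be found among the samples:
    # raising a price up to the next sample never loses an empirical sale.
    for price in sorted(set(samples)):
        # Fraction of samples that would buy at this price (value >= price).
        sale_probability = sum(1 for s in samples if s >= price) / n
        revenue = price * sale_probability
        # '>=' breaks ties toward the higher price (assumption of this sketch).
        if revenue >= best_revenue:
            best_price, best_revenue = price, revenue
    return best_price


# With one sample, ERM simply posts that sample as the price.
single = erm_price([5.0])
# With two samples, ERM posts the higher one only if it is at least twice the lower.
far_apart = erm_price([1.0, 3.0])
close_together = erm_price([2.0, 3.0])
```

In the two-sample calls, the higher sample earns half of its value on the empirical distribution while the lower sample earns its full value, which is where the factor-of-two comparison comes from.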

#### Quantile Space

For our analysis, it would be convenient to reason about the possible values of the buyer using their quantiles. The quantile of a value $v$ with respect to a distribution $F$ is $q(v) = \Pr_{x \sim F}[x \ge v]$ (see Footnote 6). (So the revenue from posting a price $p$ is $p \cdot q(p)$.) Note that lower quantiles correspond to higher valuations. We also define the inverse map, from quantiles back to values: for a quantile $q \in [0,1]$, the value corresponding to that quantile with respect to a given atomless distribution $F$ is denoted by $v(q)$ (and is well defined since $F$ is atomless). We note that sampling a value $v \sim F$ is therefore equivalent to uniformly sampling a quantile $q \sim U([0,1])$ and then taking the value $v(q)$ corresponding to that quantile.

Footnote 6: Throughout this paper, we use many definitions that depend on the distribution $F$. To avoid clutter, we will omit the respective distribution from these notations when it is clear from context.

#### Revenue Curve in Quantile Space

The revenue curve (in quantile space) that corresponds to an atomless value distribution $F$ is the mapping $r$ from a quantile $q \in [0,1]$ to the expected revenue $r(q) = q \cdot v(q)$ of posting the value $v(q)$ as the price. We note that the value function $v$ (and hence also the distribution $F$) can be recovered from the revenue curve via $v(q) = r(q)/q$, that is, $v(q)$ is precisely the slope of the line connecting the origin to the point $(q, r(q))$. We will at times write $r_F$ instead of $r$, write $v_F$ instead of $v$, etc. Note that the optimal expected revenue is the maximum of $r(q)$ over all quantiles $q$. An atomless distribution $F$ is called regular if its corresponding revenue curve (in quantile space) is concave (see Footnote 7).

Footnote 7: While the more popular definition of regularity (Myerson, 1981) is phrased using virtual values, these two standard definitions are known to be equivalent (in fact, the definition used here is more general as it also applies to nondifferentiable revenue curves). Indeed, it is well known that the derivative of the revenue curve at quantile $q$ is Myerson’s virtual value at $v(q)$, and so the definition of regularity as the virtual-value function being increasing corresponds to the derivative of the revenue curve being decreasing, i.e., to the revenue curve being concave.

###### Definition 1 (e2).

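Given a regular distribution $F$ with revenue curve $r$, we define $e_2$, a function of a pair of quantiles $(q_1,q_2) \in [0,1]^2$, as follows.

$$
e_2(q_1,q_2) = e_2^r(q_1,q_2) \triangleq
\begin{cases}
r\bigl(\arg\max_{q\in\{q_1,q_2\}} v(q)\bigr) & \max\{v(q_1),v(q_2)\} \ge 2\min\{v(q_1),v(q_2)\},\\
r\bigl(\arg\min_{q\in\{q_1,q_2\}} v(q)\bigr) & \text{otherwise;}
\end{cases}
$$

note that

$$
\mathrm{ERM}(F,2) \triangleq \mathbb{E}_{(q_1,q_2)\sim U([0,1]^2)}\bigl[e_2(q_1,q_2)\bigr].
$$

Namely, given a pair of quantiles $(q_1,q_2)$, if the value $\max\{v(q_1),v(q_2)\}$ is at least twice as large as the value $\min\{v(q_1),v(q_2)\}$ then $e_2(q_1,q_2)$ is the expected revenue of the price $\max\{v(q_1),v(q_2)\}$. Otherwise $e_2(q_1,q_2)$ is the expected revenue of the price $\min\{v(q_1),v(q_2)\}$.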
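As an illustration of Definition 1, the following Python sketch evaluates $e_2$ for a concrete distribution. The choice of the uniform distribution on $[0,1]$ (for which $v(q) = 1-q$ and $r(q) = q(1-q)$) is an assumption made only for this example and does not appear in the paper, and all function names are hypothetical. The sketch relies on the fact that $v$ is non-increasing in the quantile, so the smaller quantile always carries the (weakly) higher value.

```python
def e2(r, v, q1, q2):
    """Revenue of the price that two-sample ERM posts, in quantile space."""
    low_q, high_q = min(q1, q2), max(q1, q2)
    # v is non-increasing in q, so low_q carries the (weakly) higher value.
    higher_value, lower_value = v(low_q), v(high_q)
    if higher_value >= 2 * lower_value:
        return r(low_q)   # ERM posts the higher of the two sampled values
    return r(high_q)      # ERM posts the lower of the two sampled values


# Illustrative distribution (an assumption of this sketch): uniform on [0, 1].
def v_uniform(q):
    return 1.0 - q

def r_uniform(q):
    return q * (1.0 - q)


# Values 0.8 and 0.3: the higher is more than twice the lower, so ERM posts 0.8.
revenue_far = e2(r_uniform, v_uniform, 0.2, 0.7)
# Values 0.6 and 0.5: the higher is less than twice the lower, so ERM posts 0.5.
revenue_close = e2(r_uniform, v_uniform, 0.4, 0.5)
```

In the first call ERM ends up with revenue $r(0.2) = 0.16$ although the other sample would have yielded $r(0.7) = 0.21$, a small instance of the "confusion" discussed in the introduction.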

During our analysis, it will be useful to work with revenue curves that are normalized so that . The following simple Lemma, whose proof is given in Appendix C for completeness, justifies that this is without loss of generality.

###### Lemma 1.

For every regular distribution with revenue curve , and for every , we have that  (1)  for every ,  (2)  for every , and  (3) .

### 2.3 A Single Sample

In their paper, Dhangwatnotai et al. (2015) show that a celebrated theorem by Bulow and Klemperer (1996) can be reinterpreted to imply that ERM given a single sample guarantees one half of the optimal expected revenue for every regular distribution.

###### Theorem 2 (Dhangwatnotai et al., 2015).

for every regular distribution . Furthermore, this is tight, i.e., the constant cannot be replaced with any larger constant in this statement.

Dhangwatnotai et al. (2015) also give a direct simple proof for Theorem 2 that does not use Bulow and Klemperer’s result: recall that the quantile of a value drawn from $F$ is distributed uniformly in $[0,1]$. Therefore, the expected revenue from using a value drawn from $F$ as a price is precisely the integral of the revenue curve $r$, i.e., the area under the curve $r$. Since this curve is concave, the area under it is at least half of the height of the highest point on this curve. (This bound is tight: fixing any triangular revenue curve, there exists a sequence of regular distributions whose revenue curves uniformly converge to this triangular curve. Thus, revenue curves with areas under them that are arbitrarily close to one half of the height of their highest point can be constructed.) Theorem 2 therefore follows since this height is precisely the optimal expected revenue.
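The area argument can be spelled out as a short worked derivation. The symbol $q^*$ for a revenue-maximizing quantile is notation assumed here for illustration, and the sketch assumes $0 < q^* < 1$ and that the revenue curve is nonnegative at both endpoints. Concavity places the curve above the two chords that connect its endpoints to its highest point:

$$
r(q) \;\ge\; \frac{q}{q^*}\, r(q^*) \quad \text{for } q \le q^*,
\qquad
r(q) \;\ge\; \frac{1-q}{1-q^*}\, r(q^*) \quad \text{for } q \ge q^*,
$$

and integrating the two chords gives

$$
\int_0^1 r(q)\,dq \;\ge\; r(q^*)\left(\int_0^{q^*} \frac{q}{q^*}\,dq + \int_{q^*}^1 \frac{1-q}{1-q^*}\,dq\right)
= r(q^*)\left(\frac{q^*}{2} + \frac{1-q^*}{2}\right) = \frac{r(q^*)}{2}.
$$

Equality holds exactly when the curve coincides with the two chords, which is the triangular case mentioned above.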

As noted in the introduction, while the bound of one half from Theorem 2 cannot be improved upon by any deterministic mechanism, Fu et al. (2015) do manage to nonetheless construct, using one sample, a randomized pricing mechanism that guarantees a strictly higher revenue of .

## 3 Three Negative Results

In this section we present the analysis of the three negative results surveyed in the introduction.

###### Proposition 1.

There exists a regular distribution $F$ such that $\mathrm{ERM}(F,2) < \mathrm{ERM}(F,1)$.

###### Proof.

The equal revenue distribution is the distribution supported on $[1,\infty)$ with revenue $1$ for every posted price, that is, the distribution satisfying $F(v) = 1 - 1/v$ for every $v \ge 1$.

We take $F$ to be the equal revenue distribution, truncated at $10$ so that all of the mass of the equal revenue distribution at values $\ge 10$ is uniformly respread throughout a small interval whose length is controlled by a small $\varepsilon > 0$. The corresponding revenue curve (in quantile space) climbs up almost linearly (with a very slight concave curvature, which tends to linear as $\varepsilon$ grows small) from $q=0$ (revenue $0$) until $q=0.1$ (revenue $1$), and continues at revenue $1$ from that point (i.e., for all quantiles $q \ge 0.1$). For simplicity, we will perform our calculations by approximating the slightly curved concave climb of $r$ in $[0,0.1]$ by a linear climb (conceptually corresponding to $\varepsilon$ tending to zero), that is:

$$
r(q) =
\begin{cases}
10 \cdot q & q \le 0.1,\\
1 & \text{otherwise.}
\end{cases}
$$

It is easy (and standard) to see that this approximation will have a negligible effect on our calculations of $\mathrm{ERM}(F,1)$ and $\mathrm{ERM}(F,2)$, as its effect, for any quantiles $q, q_1, q_2$, on either $r(q)$ or $e_2(q_1,q_2)$ is negligible. For this to indeed hold, in the definition of $e_2$ when defining the quantile chosen by the $\arg\max$ or $\arg\min$ operator, we will henceforth break ties between $q_1 \ne q_2$ that have $v(q_1) = v(q_2)$ (such distinct $q_1, q_2$ can only occur in the initial linear climb of the revenue curve) in favor of larger quantiles (that is, higher revenue), as in the slightly curved initial concave climb of the revenue curve that is approximated by this linear climb, the value of a larger quantile is slightly smaller than that of a smaller quantile.

We start by precisely calculating the expected revenue from posting a price computed by ERM given one sample:

$$
\mathrm{ERM}(F,1) = \int_0^1 r(q)\,dq = 1 - \frac{0.1}{2} = \frac{19}{20} = 0.95.
$$

To calculate the expected revenue from posting a price computed by ERM given two samples, we note that the revenue will be nonoptimal (i.e., less than $1$) in precisely the following two cases:

• Both samples are from a quantile $q \le 0.1$. The expected revenue, conditioned on this case, is $\mathbb{E}\bigl[\max\{10q_1, 10q_2\}\bigr] = 2/3$.

• One sample is from a quantile $q_1 \le 0.1$ and the other is from a quantile $q_2 \ge 0.2$. In this case, since $v(q_1) \ge 2 v(q_2)$, the price calculated by ERM given these two samples is $v(q_1)$, and so the expected revenue, conditioned on this case, is $\mathbb{E}[10 q_1] = 1/2$.

Therefore, the expected revenue from posting a price computed by ERM given two samples is:

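$$
\begin{aligned}
\mathrm{ERM}(F,2) &= 1 - 0.1^2 \cdot \Bigl(1 - \mathbb{E}_{q_1,q_2 \sim U([0,0.1])^2}\bigl[\max\{10q_1, 10q_2\}\bigr]\Bigr) - 2 \cdot 0.1 \cdot 0.8 \cdot \Bigl(1 - \mathbb{E}_{q_1 \sim U([0,0.1])}[10q_1]\Bigr) \\
&= 1 - 0.1^2 \cdot \tfrac{1}{3} - 2 \cdot 0.1 \cdot 0.8 \cdot \tfrac{1}{2} = \tfrac{11}{12} = 0.91\overline{6}.
\end{aligned}
$$

And so, indeed, $\mathrm{ERM}(F,2) < \mathrm{ERM}(F,1)$, as required. ∎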
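The two values computed in this proof can be cross-checked numerically. The sketch below is an illustrative check, not part of the paper: it integrates the linearized revenue curve on a midpoint grid, encodes the tie-breaking rule by always treating the larger quantile as the lower-valued sample, and uses hypothetical function names and an arbitrary grid size.

```python
def v(q):
    # Value at quantile q under the linear approximation of the revenue curve.
    return 10.0 if q <= 0.1 else 1.0 / q

def r(q):
    # Linearized revenue curve of the truncated equal revenue distribution.
    return 10.0 * q if q <= 0.1 else 1.0

def e2(q1, q2):
    low_q, high_q = min(q1, q2), max(q1, q2)
    # Ties in value fall through to the larger quantile, as in the proof.
    if v(low_q) >= 2 * v(high_q):
        return r(low_q)
    return r(high_q)

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def erm1(n=400):
    # One sample: the area under the revenue curve.
    return sum(r(q) for q in midpoints(n)) / n

def erm2(n=400):
    # Two samples: average of e2 over a uniform grid on the unit square.
    qs = midpoints(n)
    return sum(e2(a, b) for a in qs for b in qs) / (n * n)


erm1_value = erm1()
erm2_value = erm2()
print(erm1_value, erm2_value)
```

The grid size is a multiple of ten so that the breakpoints at quantiles $0.1$ and $0.2$ fall on cell boundaries, which keeps the discretization error small.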

###### Proposition 2.

There exist regular distributions $F$ and $G$ such that $\mathrm{ERM}(F,1) > \mathrm{ERM}(G,1)$ and $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$.

###### Proof.

An intuitive strategy for proving Proposition 2 is as follows: recall from the proof of Dhangwatnotai et al. (2015) of Theorem 2 that for distributions with “triangular” revenue curves, the expected revenue from posting a price computed by ERM given one sample, is precisely one half of optimal, since the area beneath the revenue curve is precisely one half of the maximum value of this curve. Nonetheless, when posting a price computed by ERM given two samples, it is not hard to observe that different triangular revenue curves result in different expected revenues. So, we will take $F$ and $G$ to be two distributions corresponding to triangular revenue curves with the same optimal expected revenue, such that $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$. Since both revenue curves are triangular, we have that $\mathrm{ERM}(F,1) = \mathrm{ERM}(G,1)$. By slightly perturbing the revenue curve of $F$ in a way that increases the area under this curve (causing $\mathrm{ERM}(F,1)$ to increase) while only slightly changing $\mathrm{ERM}(F,2)$, we obtain that for this perturbed $F$ it still holds that $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$, but at the same time it also holds that $\mathrm{ERM}(F,1) > \mathrm{ERM}(G,1)$, as required. We omit the full details as Proposition 2 also follows from Proposition 3, whose more subtle proof we give below. ∎
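The observation that different triangular revenue curves yield different two-sample revenues can be checked numerically. The following Python sketch is illustrative only: the two peak positions ($0.5$ and $1$), the normalization of the maximum revenue to $1$, the grid size, and all function names are assumptions of this example rather than choices made in the paper.

```python
def make_triangle(peak):
    """Revenue curve and value function of a triangular curve with maximum 1."""
    def r(q):
        return q / peak if q <= peak else (1.0 - q) / (1.0 - peak)
    def v(q):
        # Constant value (an atom) up to the peak, strictly decreasing afterwards.
        return 1.0 / peak if q <= peak else (1.0 - q) / ((1.0 - peak) * q)
    return r, v

def e2(r, v, q1, q2):
    low_q, high_q = min(q1, q2), max(q1, q2)
    # Ties in value fall through to the larger quantile.
    if v(low_q) >= 2 * v(high_q):
        return r(low_q)
    return r(high_q)

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def erm1(r, n=400):
    return sum(r(q) for q in midpoints(n)) / n

def erm2(r, v, n=400):
    qs = midpoints(n)
    return sum(e2(r, v, a, b) for a in qs for b in qs) / (n * n)


r_half, v_half = make_triangle(0.5)
r_one, v_one = make_triangle(1.0)

erm1_half, erm1_one = erm1(r_half), erm1(r_one)
erm2_half, erm2_one = erm2(r_half, v_half), erm2(r_one, v_one)
print(erm1_half, erm1_one, erm2_half, erm2_one)
```

Both curves give one half of the optimal revenue with a single sample, while their two-sample revenues differ, which is the gap that the perturbation argument above exploits.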

Recall from the proof of Dhangwatnotai et al. (2015) for the single-sample case, that for any distribution $F$, the expected revenue $\mathrm{ERM}(F,1)$ is the integral of the revenue curve $r$ with respect to the uniform measure over quantiles. In the case of two samples, $\mathrm{ERM}(F,2)$ can still be expressed as an appropriate integral of the revenue curve $r$, with two main caveats: first, since two samples are involved, this integral is no longer with respect to the uniform measure on quantiles, and second, since the probability of using the price that corresponds to a certain quantile, i.e., the probability that ERM given two samples chooses this price, in fact depends in quite a delicate manner on the distribution $F$ through the value function $v$, the measure with respect to which this integral is defined is itself intricately dependent on the distribution $F$. To drive this point home, we present the following surprising observation:

###### Proposition 3.

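There exist two regular distributions $F$ and $G$ with $r_F(q) \ge r_G(q)$ for all $q$ (see Footnote 8) (so $\int r_F \, d\mu \ge \int r_G \, d\mu$ with respect to any measure $\mu$), such that $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$.

Footnote 8: Equivalently, $F$ first-order stochastically dominates $G$.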

###### Proof of Proposition 3.

We will choose distributions $F$ and $G$ for which the optimal expected revenue equals $1$. We take $G$ to be the distribution function corresponding to the “triangular” revenue curve $r_G(q) = q$ and take $F$ to be the distribution function corresponding to the following “quadrilateral” revenue curve, which is obtained from $r_G$ by adding a slight “bump” (while maintaining concavity) at $q = 0.1$ (see Figure 1):

$$
r_F(q) =
\begin{cases}
2.2 \cdot q & q \le 0.1,\\
1 - 0.78 \cdot \dfrac{1-q}{0.9} & \text{otherwise.}
\end{cases}
$$

By construction $r_F(q) \ge r_G(q)$ for all $q$. It remains to show that $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$ (see Footnote 9).

Footnote 9: As in our proof of Proposition 1, in order to avoid atoms in $F$ and in $G$, we uniformly spread the mass of the atom at the highest value in the support of each distribution throughout a small interval, which corresponds to replacing the first linear climb in each of their revenue curves with a slightly curved concave climb.

We start with $G$ (which corresponds to the uniform distribution over a small interval at the value $1$, of negligible width $\varepsilon$). By a simple calculation we have for one sample that $\mathrm{ERM}(G,1) = 1/2$ and for two samples that $\mathrm{ERM}(G,2) = 2/3$. We now move on to $F$; the intuition behind our construction is that by adding the “bump” at $q = 0.1$, while we do increase revenue by slightly raising $r_F$ above $r_G$, we in fact decrease the revenue by “confusing” the ERM algorithm and causing it to choose the higher price of the two samples, which corresponds to the lower revenue, in some cases, such as whenever the quantile of one of the samples is below $0.1$ and the quantile of the other is sufficiently high (since in this case the value of the former sample is at least twice the value of the latter). As it turns out, the latter effect dominates the former one, causing an overall decrease in expected revenue compared to that of $G$. In Section A.1, we calculate $\mathrm{ERM}(F,2)$ and show that indeed $\mathrm{ERM}(F,2) < \mathrm{ERM}(G,2)$, thereby completing the proof. ∎
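The comparison in this proof can be illustrated numerically. The sketch below is not the calculation of Section A.1: it takes the quadrilateral curve $r_F$ displayed above, assumes that the triangular curve is $r_G(q) = q$ (the curve through the same endpoints without the bump, an assumption of this sketch), and estimates both two-sample revenues on a midpoint grid. Function names and the grid size are hypothetical.

```python
def r_g(q):
    # Assumed triangular curve: the straight line from (0, 0) to (1, 1).
    return q

def v_g(q):
    # Every quantile corresponds to (essentially) the same value 1.
    return 1.0

def r_f(q):
    # Quadrilateral curve: r_g with a bump at quantile 0.1.
    return 2.2 * q if q <= 0.1 else 1.0 - 0.78 * (1.0 - q) / 0.9

def v_f(q):
    # Constant value 2.2 on the first linear climb, decreasing afterwards.
    return 2.2 if q <= 0.1 else r_f(q) / q

def e2(r, v, q1, q2):
    low_q, high_q = min(q1, q2), max(q1, q2)
    # Ties in value fall through to the larger quantile.
    if v(low_q) >= 2 * v(high_q):
        return r(low_q)
    return r(high_q)

def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]

def erm2(r, v, n=400):
    qs = midpoints(n)
    return sum(e2(r, v, a, b) for a in qs for b in qs) / (n * n)


dominates = all(r_f(q) >= r_g(q) for q in midpoints(400))
erm2_f = erm2(r_f, v_f)
erm2_g = erm2(r_g, v_g)
print(dominates, erm2_f, erm2_g)
```

Under these assumptions the estimate for $F$ comes out below the estimate for $G$ even though $r_F$ lies pointwise above $r_G$, which is the phenomenon that Proposition 3 asserts.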

## 4 Main Result

In this section, we phrase and prove our main result.

###### Theorem 3.

For every regular distribution , we have that .

Theorem 3 is the first part of Theorem 1 from the introduction. Combining Theorem 3 with Theorem 2, we obtain our main monotonicity result (the second part of Theorem 1 from the introduction).

###### Corollary 1.

.

As noted in the introduction, proving Theorem 3 turns out to be considerably more technically challenging than may have been expected. One hint as to why was already given in Proposition 3. To prove Theorem 3, we lower-bound the integral that defines $\mathrm{ERM}(F,2)$ by estimating it over three domains: first, we estimate this integral conditioned upon the two samples corresponding to prices lower than the ideal posted price; second, we estimate this integral conditioned upon the two samples corresponding to prices higher than the ideal posted price; and finally, we estimate this integral conditioned upon the two samples falling on opposite sides of the ideal posted price. Estimating each of these integrals is quite involved. For the first two domains, we manage to show that ERM guarantees quantifiably more than one half of the optimal revenue, while unfortunately for the third domain, we do not manage to show that ERM guarantees even half of the optimal revenue, forcing us to estimate the integral for the first two domains tightly enough to enable us to argue that the losses in this third domain could be charged to the gains in the first two domains. To balance the charging argument, we must utilize quite a few observations in each domain, and furthermore estimate the integral in the first and last domains as functions of the quantile of the ideal price. Putting all of these together, we manage to show an overall guarantee that is quite close to one half of the optimal revenue, but nonetheless strictly bounded away from one half (and still greater by orders of magnitude than the guarantee that Fu et al. (2015) show for their randomized mechanism for one sample).

One possible way to approach Theorem 3 (and more generally, Open Problem 1) could have been to try and identify the worst-case distributions for two samples (and more generally, for $n$ samples), i.e., distributions $F$ for which the fraction of the optimal revenue that ERM attains given two samples (and more generally, given $n$ samples) is smallest, and then to calculate this fraction for such distributions $F$. Indeed, recall that this is how Dhangwatnotai et al. (2015) have proved Theorem 2: they have identified the distributions with triangular revenue curves as the worst-case distributions for one sample, and showed that for these distributions this fraction equals one half. Unfortunately, following this path, even for two samples, turns out to be quite elusive. In this vein, hoping that some distribution with a triangular revenue curve continues to be a worst-case distribution, Huang, Mansour, and Roughgarden (personal communication, 2015) have identified the single distribution with triangular revenue curve for which this fraction is smallest among all distributions with triangular revenue curves, and have calculated this fraction for this distribution (incidentally, it turns out to be considerably higher than our lower bound from Theorem 3). Unfortunately, in light of our proof of Proposition 3, it seems that one can show that for some distribution with a quadrilateral revenue curve (created, similarly to the proof of Proposition 3, by adding a small “bump” to the “left edge” of the triangular revenue curve identified by Huang, Mansour, and Roughgarden), this fraction turns out to be smaller than for the distribution identified by Huang, Mansour, and Roughgarden, and hence we conclude that no distribution with a triangular revenue curve is a worst-case distribution for ERM for two samples. As we do not know how to identify the worst-case distributions for ERM, even for two samples, our analysis must bound the fraction for arbitrary regular distributions.

In the remainder of this section, we survey the high-level ideas behind the proof of Theorem 3, whose full proof is given in Sections 5, 6 and 7, with some calculations relegated to Appendix A. Formally, we partition the set of pairs of quantiles as follows:

###### Definition 2 (L; R; B).

For a regular distribution with revenue curve , we define $q^*$ to be the revenue-maximizing quantile (in case of multiple revenue-maximizing quantiles, we may pick arbitrarily among them) and partition the set of pairs of quantiles into three sets:

1. ,

2. , and

3. .

To prove Theorem 3, we will lower-bound the expected revenue of a price chosen by the ERM algorithm given two samples, conditioned upon these two samples belonging to each of the sets , , and .
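To make the quantity being bounded concrete, the following minimal sketch evaluates ERM with two samples on a single example distribution. It assumes the usual reading of ERM (post the sample value that maximizes revenue against the empirical distribution, so the higher sample is chosen exactly when it exceeds twice the lower one, mirroring the comparison of $2v(x)$ with $v(q)$ in Definition 3). The choice of the uniform distribution on $[0,1]$, the tie-breaking rule, and all function names are illustrative assumptions rather than part of the paper.

```python
def erm_price(v1, v2):
    """Price chosen by ERM from two samples.

    Ties are broken toward the lower price (an assumption; ties have
    probability zero for continuous distributions).
    """
    hi, lo = max(v1, v2), min(v1, v2)
    # Empirical revenue: price `hi` sells to 1 of 2 samples, price `lo` to both.
    return hi if hi * 0.5 > lo * 1.0 else lo


def uniform_revenue(p):
    """Expected revenue of posting price p against a buyer with value ~ U[0, 1]."""
    return p * (1.0 - p)


def erm_two_sample_ratio(n=300):
    """Midpoint-rule estimate of ERM(F, 2) / OPT(F) for F = U[0, 1]."""
    grid = [(i + 0.5) / n for i in range(n)]
    total = sum(uniform_revenue(erm_price(a, b)) for a in grid for b in grid)
    opt = 0.25  # maximum of p * (1 - p), attained at p = 1/2
    return (total / (n * n)) / opt


if __name__ == "__main__":
    print(erm_two_sample_ratio())
```

For this particular example a direct integration gives a ratio of exactly $3/4$, which the grid estimate approximates. The uniform distribution is far from worst-case, so this illustrates the definition only and says nothing about the tightness of Theorem 3.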

### 4.1 Both Samples Below the Ideal Price (R)

In Section 5, we lower-bound the expected revenue, conditioned upon both samples being lower than the ideal price :

###### Lemma 2.

For every regular ,

As noted above, it would not have sufficed to simply bound the expectation on the left-hand side of the above inequality away from one half of , as we will have to charge the losses below one-half-of- of our lower bound for the case where the quantiles are in (given in Lemma 4 below), which depend on , to the gains above one-half-of- from Lemma 2 (and from Lemma 3 below). To prove Lemma 2, we first lower-bound, for each possible quantile , the expected revenue of the price computed by ERM conditioned upon . Recall that since both quantiles are in , we have that is the quantile of the “better” sample, with higher expected revenue. Denoting by the threshold value of (see Figure 2) so that the ERM algorithm chooses the price that corresponds to quantile (the price that attains better expected revenue among the two samples), it turns out that we are in a win-win situation: if is “close” to , then conditioned upon , there is a high probability that the higher quantile lies above , causing the “better” sample (with quantile ) to be chosen; conversely, the farther is from , the larger is, i.e., the revenue curve decreases quite moderately between and , and so even when the “worse” sample is chosen, the revenue is still reasonably high. The full proof of Lemma 2 is given in Section 5, with some calculations relegated to Section A.2.

### 4.2 Both Samples Above the Ideal Price (L)

In Section 6, we lower-bound the expected revenue, conditioned upon both samples being higher than the ideal price :

###### Lemma 3.

For every regular ,

$$\mathbb{E}_{(q_1,q_2)\sim U(L)}\bigl[e_2(q_1,q_2)\bigr] \ge 0.528\cdot\mathrm{OPT}(F).$$

To survey the proof of Lemma 3, we define:

• $E_2^{L}(r)$: the expected revenue of ERM,

• $M_2^{L}(r)$: the expected revenue had we always picked the “better” sample (the sample with higher expected revenue).

The proof of Lemma 3 is based on a coupling of $E_2^{L}(r)$ and $M_2^{L}(r)$, giving that

• $E_2^{L}(r) \ge \tfrac{3}{4}\cdot M_2^{L}(r)$, with equality if and only if $r$ is constant.

The proof of this inequality is similar to the proof of Lemma 2, and is achieved by lower-bounding, for each possible quantile , the expected revenue of the price computed by ERM conditioned upon . Recall that since both quantiles are in , we have that is the quantile of the “better” sample, with higher expected revenue. Denoting by the threshold value of (see Figure 3) so that the ERM algorithm chooses the price that corresponds to quantile (the price that attains better expected revenue among the two samples), we show that we are once again in a win-win situation: if is “close” to , then conditioned upon , there is a high probability that the lower quantile lies below , causing the “better” sample (with quantile ) to be chosen; conversely, the farther is from , the larger is, i.e., the revenue curve decreases quite moderately between and , and so even when the “worse” sample is chosen, the revenue is still reasonably close to that of the “better” sample.

Unlike in the proof of Lemma 2, the proof of Lemma 3 requires some additional case-analysis beyond this point. A simple calculation shows that

• $M_2^{L}(r) \ge \tfrac{2}{3}\cdot\mathrm{OPT}(r)$, with equality if and only if $r$ is linear.

The observations in the two “bullets” above lead to the following win-win situation:

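• Either $r$ is “far from linear”, and then $M_2^{L}(r) > \bigl(\tfrac{2}{3}+\varepsilon\bigr)\cdot\mathrm{OPT}(r)$, giving

$$E_2^{L}(r) \ge \tfrac{3}{4}\cdot M_2^{L}(r) > \tfrac{3}{4}\cdot\bigl(\tfrac{2}{3}+\varepsilon\bigr)\cdot\mathrm{OPT}(r) > \tfrac{1}{2}\cdot\mathrm{OPT}(r),$$

• or $r$ is “far from constant”, and then $E_2^{L}(r) \ge \bigl(\tfrac{3}{4}+\varepsilon\bigr)\cdot M_2^{L}(r)$, giving

$$E_2^{L}(r) \ge \bigl(\tfrac{3}{4}+\varepsilon\bigr)\cdot M_2^{L}(r) > \bigl(\tfrac{3}{4}+\varepsilon\bigr)\cdot\tfrac{2}{3}\cdot\mathrm{OPT}(r) > \tfrac{1}{2}\cdot\mathrm{OPT}(r).$$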

The full proof of Lemma 3 is given in Section 6, with some calculations relegated to Section A.3.

### 4.3 One Sample on Each Side of the Ideal Price (B)

In Section 7, we lower-bound the expected revenue, conditioned upon one of the two samples being lower than the ideal price and the other being higher than the ideal price :

###### Lemma 4.

For every regular ,

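$$\mathbb{E}_{(q_1,q_2)\sim U(B)}\bigl[e_2(q_1,q_2)\bigr] \ge \frac{1}{2}\cdot\left(1-\left(\frac{q^*}{2\cdot(1+q^*)}\right)^{2}\right)\cdot\mathrm{OPT}(F).$$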

As already mentioned above, whenever , the right-hand side of the above inequality (which deteriorates as $q^*$ grows) is strictly less than the one-half-of- guarantee of ERM given one sample, so when proving Theorem 3 we will have to charge this loss to the gains over one-half-of- that we obtained in the lower bounds of Lemmas 2 and 3. The correctness of Lemma 4 follows from the following Sublemma.

###### Sublemma 4.

Let , let be a monotone nonincreasing function, and for every let be a monotone nondecreasing and concave function s.t.  for every . For every , define as follows:

$$G(x,y) \triangleq \begin{cases} r_2(y) & r_2(y) \ge T(x), \\ r_1(x) & \text{otherwise}. \end{cases}$$

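Then $\mathbb{E}_{(x,y)\sim U([0,1]^2)}\bigl[G(x,y)\bigr] \ge \frac{1+m}{2}-\frac{1}{2}\cdot\left(\frac{1+m}{2}\right)^{2}$.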

The proof of Sublemma 4 is given in Section 7. We will now show how Lemma 4 indeed follows from this Sublemma.

###### Proof of Lemma 4.

By Lemma 1, we may assume without loss of generality that . By symmetry of , it is enough to prove the claim w.r.t. , where . We define:

• by ,

• by ,

• by . (See Figure 4(a).)

We notice that under these definitions, we have (where is defined as in Sublemma 4) that

$$\mathbb{E}_{(q_1,q_2)\sim U(B_1)}\bigl[e_2(x,y)\bigr] = \mathbb{E}_{(x,y)\sim U([0,1]^2)}\bigl[G(x,y)\bigr].$$

We note that the lower bound that we obtain by applying Sublemma 4, as is, to the above definitions of , is , which for is worse than the guarantee that we are attempting to prove in Lemma 4. The main idea that drives our improved revenue guarantee in Lemma 4 is to bound , the minimum value attained by , away from . Since the intersection of (the line defined by the points and ) and (the line defined by the points and ) is , then defining , we note that by monotonicity and convexity of and since , the range of is in fact . (See Figure 4(b).)

Applying Sublemma 4 with this value of , we obtain that

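$$\mathbb{E}_{(q_1,q_2)\sim U(B_1)}\bigl[e_2(x,y)\bigr] = \mathbb{E}_{(x,y)\sim U([0,1]^2)}\bigl[G(x,y)\bigr] \ge \frac{1+m}{2}-\frac{1}{2}\cdot\left(\frac{1+m}{2}\right)^{2} = \frac{1}{2}\left(1-\left(\frac{q^*}{2(1+q^*)}\right)^{2}\right),$$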

as required. The calculation justifying the last equality is detailed in Section A.4. ∎
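The last equality above can be spot-checked numerically. The sketch below assumes that $m = 1/(1+q^*)$, a value inferred here from the requirement that the two expressions agree (the definition of $m$ is not reproduced in this text), and assumes the normalization $\mathrm{OPT}(F)=1$. The function names are hypothetical.

```python
def sublemma_bound(m):
    """The bound (1+m)/2 - (1/2) * ((1+m)/2)^2 obtained from Sublemma 4."""
    a = (1.0 + m) / 2.0
    return a - 0.5 * a * a


def lemma4_bound(q_star):
    """Right-hand side of Lemma 4, normalized by OPT(F)."""
    return 0.5 * (1.0 - (q_star / (2.0 * (1.0 + q_star))) ** 2)


def max_gap(steps=1000):
    """Largest difference between the two expressions over a grid of q* in [0, 1]."""
    gap = 0.0
    for i in range(steps + 1):
        q_star = i / steps
        m = 1.0 / (1.0 + q_star)  # assumed value of m; inferred, not quoted
        gap = max(gap, abs(sublemma_bound(m) - lemma4_bound(q_star)))
    return gap


if __name__ == "__main__":
    print(max_gap())            # difference at floating-point rounding level
    print(sublemma_bound(0.0))  # the bound with m = 0
```

Under the same assumption, setting $m=0$ gives $3/8$, which is below the Lemma 4 bound for every $q^* \in [0,1]$, so bounding $m$ away from zero is what produces the improvement.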

### 4.4 Completing the Proof of Theorem 3

Theorem 3 follows from combining Lemmas 2, 3 and 4.

###### Proof of Theorem 3.

By definition,

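$$\mathrm{ERM}(F,2) = (1-q^*)^{2}\cdot\mathbb{E}_{(q_1,q_2)\sim U(R)}\bigl[e_2(q_1,q_2)\bigr] + (q^*)^{2}\cdot\mathbb{E}_{(q_1,q_2)\sim U(L)}\bigl[e_2(q_1,q_2)\bigr] + 2q^*(1-q^*)\cdot\mathbb{E}_{(q_1,q_2)\sim U(B)}\bigl[e_2(q_1,q_2)\bigr].$$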

In Section A.5, we substitute the respective lower bounds from Lemmas 2, 3 and 4 for each of the above summands, and show by calculation that indeed , thereby completing the proof of Theorem 3. ∎

## 5 Both Samples Below the Ideal Price (R): Proof of Lemma 2

###### Proof of Lemma 2.

By Lemma 1, we may assume without loss of generality that . We define:

$$E_2^{R}(r) = \mathbb{E}_{(q_1,q_2)\sim U(R)}\bigl[e_2(q_1,q_2)\bigr].$$

Let . The random variable has density function (see Lemma 5 in Appendix B). For , define

$$E_2^{R}(r \mid q) \triangleq \mathbb{E}\bigl[e_2(q_1,q_2) \bigm| \min\{q_1,q_2\}=q\bigr].$$

Note that

$$E_2^{R}(r) = \mathbb{E}_{q\sim\mu}\bigl[E_2^{R}(r \mid q)\bigr].$$

Let . Conditioned on (this event has probability zero; however, since the pair has an appropriate joint density function, this conditioning is meaningful), we have that is equal to exactly when . Thus, we define as the threshold value that determines when .

###### Definition 3 (t(x); see Figure 2).

For , we define

$$t(q) \triangleq \sup\bigl\{x \ge q \bigm| 2v(x) > v(q)\bigr\}.$$

For , define , and when , define .
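As an illustration of the threshold in Definition 3, the sketch below approximates $t(q)$ by bisection for one example. It assumes that $v$ is nonincreasing in the quantile, so that the set $\{x \ge q : 2v(x) > v(q)\}$ is an interval starting at $q$, and it uses the uniform distribution on $[0,1]$, for which $v(x) = 1-x$. Both the example distribution and the function names are assumptions made for illustration only.

```python
def threshold(v, q, iters=60):
    """Approximate t(q) = sup{x >= q : 2*v(x) > v(q)} by bisection.

    Assumes v is nonincreasing on [q, 1], so the set is an interval starting at q.
    """
    target = v(q)
    if 2.0 * v(1.0) > target:
        return 1.0  # the condition holds on all of [q, 1]
    lo, hi = q, 1.0  # the condition fails at hi and, when v(q) > 0, holds at lo
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if 2.0 * v(mid) > target:
            lo = mid
        else:
            hi = mid
    return lo


def uniform_value(x):
    """Value at quantile x for U[0, 1]: the price that sells with probability x."""
    return 1.0 - x


if __name__ == "__main__":
    print(threshold(uniform_value, 0.6))
```

For this example the condition $2(1-x) > 1-q$ is equivalent to $x < (1+q)/2$, so $t(q) = (1+q)/2$ in closed form and the bisection at $q = 0.6$ converges to $0.8$.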

It is not hard to verify the following properties of the function .

###### Sublemma 5 (Properties of t).

1. if and only if .

2. For all : (due to monotonicity of ).

3. is monotone nondecreasing (due to monotonicity of ).

4. is continuous (due to continuity of ).

5. , whenever .

The following lemma relates with , which is the expected revenue obtained by always choosing the “better” sample when .

###### Sublemma 6.

For all :

 ER2(r|q)≥r(q)1−q(t(q)