Learning to bid in revenue-maximizing auctions

02/27/2019 ∙ by Thomas Nedelec, et al. ∙ 0

We consider the problem of optimizing bidding strategies in prior-dependent revenue-maximizing auctions, when the seller fixes the reserve prices based on the bid distributions. Our study is done in the setting where one bidder is strategic. Using a variational approach, we study the complexity of the original objective and we introduce a relaxation of the objective functional in order to use gradient descent methods. Our approach is simple, general and can be applied to various value distributions and revenue-maximizing mechanisms. The new strategies we derive yield massive uplifts compared to the traditional truthful bidding strategy.




1 Introduction

Modern marketplaces like Uber, Amazon or eBay enable sellers to fine-tune their selling mechanism by reusing their large number of past interactions with consumers. In online advertising or electricity markets, billions of auctions take place every day between the same bidders and sellers. Based on the data gathered, different approaches learn complex mechanisms maximizing the seller revenue (Conitzer and Sandholm, 2002; Ostrovsky and Schwarz, 2011; Paes Leme et al., 2016; Golrezaei et al., 2017).

Most of the literature has focused on the auctioneer side (Milgrom and Tadelis, 2018). Algorithms focused on the bidder’s standpoint to enable them to be strategic against any smart data-driven selling mechanisms are lacking. These algorithms should ideally strengthen the balance of power driving the relationship between buyers and sellers. Our main objective is to exhibit simple robust algorithmic procedures that take advantage of various data-dependent revenue-maximizing mechanisms. This represents a big step forward in understanding possible strategic behaviors in revenue maximizing auctions. This is a new argument supporting the Wilson doctrine (Wilson, 1987) claiming that data-dependent revenue maximizing algorithms are not robust to strategic bidders.

1.1 Framework

In the early stage of the market design literature (see, e.g., Myerson (1981)), a typical underlying assumption is that the bidders’ value distributions are commonly known to the seller and other bidders. This can be justified if different groups of bidders with the same value distribution are interacting successively with one seller. In the aforementioned modern applications, the same bidders have billions of interactions every day with the seller. Even if the latter does not know the value distributions beforehand, she might in many cases use the past bid distributions as proxies for the value distributions. Figure 1 shows why our setup is a good approximation to what is observed for some of the largest ad platforms.

Several mechanisms based on the value distribution of bidders have already been introduced. We will focus on the lazy second price auction with personalized reserve price (Paes Leme et al., 2016), the Myerson auction (Myerson, 1981), the eager version of the second price auction and the boosted second price auction (Golrezaei et al., 2017). When these auctions are repeated (every day, or every millisecond, depending on the context) and the bidder is myopic, i.e., optimizing per stage and not long-term revenue, it is optimal to bid truthfully at each auction. So with myopic bidders, bids and values have the same distribution and the seller can optimally design the mechanism based on the former.

Non-myopic bidders optimize their long-term expected utility, taking into account that their current strategy will induce a certain mechanism (for instance a specific reserve price) in the future. More precisely, we will consider the following steady-state analysis. Assume the valuations $x_i$ of a bidder are drawn from a specific distribution $F_{X_i}$; a bidding strategy $\beta_i$ is a mapping from $\mathbb{R}$ into $\mathbb{R}$ that indicates the actual bid $\beta_i(x_i)$ when the value is $x_i$. As a consequence, the distribution of bids $F_{B_i}$ is the push-forward of $F_{X_i}$ by $\beta_i$. In the steady state, the seller uses the distributions of bids to choose a specific auction mechanism $M(F_{B_i})$ among a given class of mechanisms $\mathcal{M}$. The objective of a long-term strategic bidder is to find her strategy $\beta_i$ that maximizes her expected utility when $x_i \sim F_{X_i}$, she bids $\beta_i(x_i)$ and the induced mechanism is $M(F_{B_i})$. This steady-state objective is particularly relevant in modern applications as most data-driven selling mechanisms use large batches of bids as examples to update their mechanism.

In terms of game theory, these interactions are a game between the seller, whose strategy is to pick a mechanism design that maps bid distributions to reserve prices, and the bidders, who choose bidding strategies. Our overarching objective is to derive the best response, for a given bidder $i$, to the strategy of the seller (i.e., a given mechanism) and the strategies of the other bidders (i.e., their bid distributions).

Figure 1: Theoretical monopoly price of a large bidding engine as a function of the reserve price sent by one of the biggest ad platforms. The monopoly prices are computed on the bid distributions. This plot was built from one day of display requests logged by a large bidding engine. We anonymized the axes for confidentiality reasons. The monopoly price is the theoretically optimal reserve price as detailed in Section 2. The axes are in log-scale.

1.2 Contributions

Our main contributions are the following. We first introduce the optimization problem that strategic bidders are facing when the seller is optimizing personalized reserve prices based on their bid distributions. A straightforward optimization can fail because the objective is discontinuous as a function of the bidding strategy.

To circumvent this issue, we introduce a new relaxation of the problem which is stable to local perturbations of the objective function and computationally tractable and efficient. We numerically optimize this new objective through a simple neural network and get very significant improvements in bidder utility compared to truthful bidding. We also provide a theoretical analysis of thresholded strategies (introduced in Nedelec et al. (2018)) and show their (local) optimality as improvements of bidding strategies with non-zero reserve value.

For the Myerson auction, the strategies learned by the model can be independently proved to be optimal. We apply the approach to other auction settings such as boosted second price or eager second price with monopoly price. We report massive uplifts compared to the traditional truthful strategy advocated in all these settings. Our simple approach can be plugged in any modern bidding algorithms learning distribution of the highest bid of the competition and we test it on other classes of mechanism without any known closed form optimal bidding strategies. We finally provide the code in PyTorch that has been used to run the different experiments. This approach opens avenues of research for designing good bidding strategies in many data-driven revenue-maximizing auctions.

1.3 Related work

Starting with the seminal work of Myerson (1981), a rich line of work indicates the type of auction that is revenue-maximizing for the seller. In the case of symmetric bidders (Myerson, 1981), one revenue-maximizing auction is a second price auction with a reserve price equal to the monopoly price, i.e., the price $r$ that maximizes $r(1 - F(r))$, where $F$ is the common value distribution. However, in most applications, the symmetry assumption is not satisfied (Golrezaei et al., 2017). In the asymmetric case, the Myerson auction is optimal (Myerson, 1981) but is difficult to implement in practice (Morgenstern and Roughgarden, 2015). In this case, a second price auction with a well-chosen vector of reserve prices guarantees at least one-half of the optimal revenue (Hartline and Roughgarden, 2009).

In modern markets, some bidders are myopic simply because truthful bidding is a simple strategy to implement. Receiving truthful bids enables sellers to design various revenue-maximizing auctions. Conitzer and Sandholm (2002) therefore studied automated mechanism design, which fine-tunes mechanisms based on examples of bids. This work was recently extended by Dütting et al. (2017) with the use of deep learning. Ostrovsky and Schwarz (2011), Medina and Mohri (2014) and Paes Leme et al. (2016) show specifically how to learn the optimal reserve prices in the lazy second price auction. This practice was theoretically addressed by Cole and Roughgarden (2014), Huang et al. (2018) and Devanur et al. (2016), who study the sample complexity of a large class of auctions assuming an oracle offering i.i.d. examples of the value distribution.

However, it is quite intuitive that non-myopic bidders should not bid truthfully. Robustness to strategic bidders has been studied in (Balseiro et al., 2017; Kanoria and Nazerzadeh, 2014; Epasto et al., 2018). A potential limitation of this type of approach is that it is either assumed that all bidders have the same value distribution (or up to for some specific metric on distributions) or that there is a very large number of bidders and a global mechanism designed so that any of them has no incentive to bid untruthfully. In (Ashlagi et al., 2016), an involved mechanism was designed that keeps the incentive compatibility property even if the seller is learning on former bids of the bidders.

None of these papers have exhibited optimal strategies that can be used when the seller is optimizing her mechanism based on past bids. This strategic behavior has been studied for posted price with one bidder and one seller (Mohri and Munoz, 2015). An independent line of work has focused on learning to bid when the value is not known to the bidders (Weed et al., 2016; Zhe Feng and Syrgkanis, 2018). Some Bayes-Nash equilibria corresponding to games where bidders can choose their bid distribution were designed (Tang and Zeng, 2018; Abeille et al., 2018) with some derivations of seller revenue and bidders utility at these equilibria. However, no strategies corresponding to these equilibria were provided in the general case. Our work is finally strongly related to (Nedelec et al., 2018) where a new class of shading strategies for second price auctions with personalized reserve price is proposed. Our new optimization pipeline is very general and enables bidders to learn good bidding strategies in multiple settings and for any value distribution.

2 The bidder’s optimization problem

We introduce in this section the optimization problem, starting with the lazy second price auction with personalized reserve prices (formalized below).

2.1 Notations and setting

To describe our approach precisely, we use the traditional setting of auction theory (see e.g. Krishna (2009)). Recall that $F_{X_i}$ is the value distribution of bidder $i$ and $\beta_i$ her strategy that maps values to bids. The corresponding distribution of bids is then $F_{B_i}$, the push-forward of $F_{X_i}$ w.r.t. $\beta_i$. In the steady state, we assume that the seller has perfect knowledge of each bid distribution $F_{B_i}$. Notice that we have implicitly identified the distribution $F_{X_i}$ (resp. $F_{B_i}$) with its cumulative distribution function (cdf) and use both terms interchangeably. We use $f_{X_i}$ (resp. $f_{B_i}$) for the corresponding probability density function (pdf).

For the sake of simplicity, let us first consider a lazy second price auction (Krishna, 2009). We recall that in this auction each bidder has a personalized reserve price. The item is attributed to the highest bidder if she clears her reserve price, and is not attributed otherwise; the winner then pays the maximum between the second highest bid and her reserve price. It is known that the optimal reserve price of bidder $i$ is her monopoly price, equal to $\arg\max_r r(1 - F_{B_i}(r))$, or equivalently (at least for regular distributions, i.e., when $\psi_{B_i}$ is non-decreasing) to $\psi_{B_i}^{-1}(0)$, where $F_{B_i}$ and $f_{B_i}$ denote the cdf and pdf of her bids and $\psi_{B_i}$ is the usual virtual value function defined as

$$\psi_{B_i}(b) = b - \frac{1 - F_{B_i}(b)}{f_{B_i}(b)}.$$

As a consequence, it is natural to assume that the strategy of bidder $i$ does not impact the strategies of the other bidders (who can be either myopic or not) and, from now on, we assume that bids are independent.
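To make the two characterizations of the monopoly price concrete, the following minimal sketch computes it by a grid search on $r(1 - F(r))$ and checks that the virtual value vanishes there. The helper names, the grid search and the two test distributions (exponential with rate 1, uniform on [0, 1]) are illustrative assumptions, not taken from the paper's code.

```python
import math

def virtual_value(x, cdf, pdf):
    """Virtual value psi(x) = x - (1 - F(x)) / f(x)."""
    return x - (1.0 - cdf(x)) / pdf(x)

def monopoly_price(cdf, upper, steps=100_000):
    """Grid search for the reserve r maximizing r * (1 - F(r)) on [0, upper]."""
    best_r, best_rev = 0.0, 0.0
    for k in range(steps + 1):
        r = upper * k / steps
        rev = r * (1.0 - cdf(r))
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r

# Exponential distribution with rate 1 (chosen for illustration).
exp_cdf = lambda x: 1.0 - math.exp(-x)
exp_pdf = lambda x: math.exp(-x)

# Uniform distribution on [0, 1] (chosen for illustration).
uni_cdf = lambda x: min(max(x, 0.0), 1.0)
uni_pdf = lambda x: 1.0

r_exp = monopoly_price(exp_cdf, upper=10.0)
r_uni = monopoly_price(uni_cdf, upper=1.0)

print("exponential:", r_exp, "virtual value there:", virtual_value(r_exp, exp_cdf, exp_pdf))
print("uniform:    ", r_uni, "virtual value there:", virtual_value(r_uni, uni_cdf, uni_pdf))
```

Both distributions are regular, so the maximizer of the revenue curve and the zero of the virtual value coincide up to the grid resolution.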

2.2 A variational approach

A fundamental result in auction theory is the Myerson lemma (Myerson, 1981). It expresses the expected payment of a bidder depending on her virtual value and the value distribution of the competition. An important notation is $G_i$, the cdf of the maximum bid of players other than $i$; obviously, if the other bidders are truthful, $G_i$ is the distribution of the maximum value of the other bidders.

Lemma 1 (Integrated version of the Myerson lemma).

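In a lazy second price auction with personalized reserve price $r$, the payment of bidder $i$ with continuous strategy $\beta_i$ is

$$P(\beta_i) = \mathbb{E}_{B_i \sim F_{B_i}}\big[\psi_{B_i}(B_i)\, G_i(B_i)\, \mathbf{1}(B_i \ge r)\big],$$

where $\psi_{B_i}$ is the virtual value function of the bid distribution $F_{B_i}$ and $G_i$ is the cdf of the highest competing bid.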


The proof is similar to the original one (Myerson, 1981), see also (Krishna, 2009), so we do not spell it out. It consists in using Fubini’s theorem and integration by parts to transform the standard form of the seller revenue, i.e.

into the above equation. It then suffices to work along the lines mentioned above with and realize that ’s expected payment can be written as

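In the lazy second price auction, the seller chooses as reserve price the monopoly price corresponding to the bid distribution of bidder $i$. In this case, Lemma 1 implies that the expected payment of bidder $i$ is equal to

$$P(\beta_i) = \mathbb{E}_{B_i \sim F_{B_i}}\big[\psi_{B_i}(B_i)\, G_i(B_i)\, \mathbf{1}(B_i \ge \psi_{B_i}^{-1}(0))\big],$$

where $\psi_{B_i}$ is the virtual value function of the bid distribution $F_{B_i}$ and $G_i$ the cdf of the highest competing bid. In order to simplify the computation of the expectation and remove the dependence on $F_{B_i}$, this expected payment can be rewritten in the space of values, by introducing

$$h_{\beta_i}(x) = \psi_{B_i}(\beta_i(x))$$

and noting the following equivalent formulation

$$P(\beta_i) = \mathbb{E}_{X_i \sim F_{X_i}}\big[h_{\beta_i}(X_i)\, G_i(\beta_i(X_i))\, \mathbf{1}(X_i \ge x_{\beta_i})\big],$$

where $x_{\beta_i} = h_{\beta_i}^{-1}(0)$ when $h_{\beta_i}$ is increasing. We call it the reserve value, as it is the smallest value above which the seller accepts all bids from bidder $i$.

The expected utility can be derived as a function of $\beta_i$ as

$$U(\beta_i) = \mathbb{E}_{X_i \sim F_{X_i}}\big[(X_i - h_{\beta_i}(X_i))\, G_i(\beta_i(X_i))\, \mathbf{1}(X_i \ge x_{\beta_i})\big]. \tag{1}$$
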
Finally, we remark that if is increasing and differentiable, verifies a simple first order differential equation.

Lemma 2.

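Suppose $\beta_i$ is increasing and differentiable. Then

$$h_{\beta_i}(x) = \psi_{B_i}(\beta_i(x)) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_{X_i}(x)}{f_{X_i}(x)}.$$

Indeed, $\psi_{B_i}(b) = b - \frac{1 - F_{B_i}(b)}{f_{B_i}(b)}$ with $F_{B_i}(\beta_i(x)) = F_{X_i}(x)$ and $f_{B_i}(\beta_i(x)) = f_{X_i}(x)/\beta_i'(x)$. Then, substituting $b = \beta_i(x)$ gives the claim.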
If we consider only monotonically increasing differentiable strategies, the problem of the strategic bidder is therefore to maximize, over the class of such functions, the expected utility $U(\beta_i)$ defined in Equation (1). This equation is crucial, as it indicates that optimizing over bidding strategies can be reduced to finding a distribution with a well-specified virtual value. A crucial difference between the long-term vision and the classical, myopic (or one-shot) auction theory is that bidders maximize their expected utility over the whole sequence of auctions. They might therefore be willing to sometimes overbid (incurring a negative utility at some specific auctions) if this reduces their reserve price. Indeed, having a lower reserve price increases the revenue of many other auctions. Lose small to win big. This reasoning is possible as there exist multiple interactions between bidders and seller, billions every day in the case of online advertising.
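As a sanity check of the differential form of the virtual value, $h_\beta(x) = \beta(x) - \beta'(x)(1 - F_X(x))/f_X(x)$, the sketch below compares it with the virtual value of the bid distribution computed directly. The setting is an assumption chosen because everything is available in closed form: values uniform on [0, 1] and a linear shading $\beta(x) = \alpha x$, so that bids are uniform on $[0, \alpha]$.

```python
def h_beta(x, beta, dbeta, cdf, pdf):
    """Virtual value in the space of values: beta(x) - beta'(x) * (1 - F(x)) / f(x)."""
    return beta(x) - dbeta(x) * (1.0 - cdf(x)) / pdf(x)

alpha = 0.8                       # hypothetical shading factor
beta = lambda x: alpha * x        # linear shading strategy
dbeta = lambda x: alpha

F_X = lambda x: x                 # values: uniform on [0, 1]
f_X = lambda x: 1.0

F_B = lambda b: b / alpha         # induced bids: uniform on [0, alpha]
f_B = lambda b: 1.0 / alpha
psi_B = lambda b: b - (1.0 - F_B(b)) / f_B(b)

xs = [0.1, 0.3, 0.5, 0.7, 0.9]
gaps = [abs(h_beta(x, beta, dbeta, F_X, f_X) - psi_B(beta(x))) for x in xs]

# Here h_beta(x) = alpha * (2x - 1), which vanishes at x = 0.5 for every alpha > 0.
h_at_half = h_beta(0.5, beta, dbeta, F_X, f_X)
print(max(gaps), h_at_half)
```

In this example the reserve value stays at 0.5 whatever the shading factor: a linear shading rescales the reserve price to $0.5\,\alpha$ but does not change which values clear it.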

2.3 Discontinuity of the objective

In the previous section, we assumed the reserve value was defined as $x_\beta = h_\beta^{-1}(0)$, which is well defined only if $h_\beta$ is increasing. This condition is complicated to ensure as, for instance, restricting the strategies $\beta$ to be increasing does not provide any guarantee on $h_\beta$. If the latter is not increasing, then the function that the seller maximizes might have several local optima, as illustrated with a specific bid distribution in Figure 2. We mention here that this distribution actually arises during our numerical optimization using first order splines as described in the next section.

Figure 2: Revenue of the seller as a function of the reserve value. This revenue shape was obtained by running the first order spline method described in Section 2.4. For this distribution, there exist two local optima that are equivalent in terms of revenue for the seller but dramatically change the utility of the strategic bidder.

The fact that the seller's revenue is not always strictly concave as a function of the reserve implies that the set of maximizers is not continuous but only upper hemi-continuous; stated otherwise, the reserve value can “jump” from a small to a high value with an arbitrarily small change in the bidding strategy. In the example of Figure 2, the reserve value switches from 0.18 to 0.58. As a consequence, the expected utility of the bidder, which also depends on the reserve value, might also jump erratically. In the same example, the lower bound of integration increases from 0.18 to 0.58, so that the overall integral decreases from 0.14 to 0.09. This discontinuity makes the optimization of the real objective difficult.
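The jump of the optimal reserve can be reproduced on a toy example. The sketch below is an illustration under simplifying assumptions, not the distribution of Figure 2: bids take only two values, and the seller posts the reserve maximizing reserve times the probability that the bid clears it. Moving one percent of the probability mass is enough to move the optimal reserve from the low atom to the high one.

```python
def seller_revenue(reserve, bids, probs):
    """Expected revenue of posting `reserve` against a discrete bid distribution."""
    return reserve * sum(p for b, p in zip(bids, probs) if b >= reserve)

def optimal_reserve(bids, probs):
    """Revenue-maximizing reserve; only the bid atoms can be optimal.
    Ties are broken toward the lowest reserve."""
    return max(sorted(bids), key=lambda r: seller_revenue(r, bids, probs))

bids = [0.2, 0.6]  # hypothetical low and high bid levels

# Slightly less than one third of the mass on the high bid: low reserve wins.
reserve_before = optimal_reserve(bids, [0.67, 0.33])
# Slightly more than one third of the mass on the high bid: high reserve wins.
reserve_after = optimal_reserve(bids, [0.66, 0.34])

print(reserve_before, reserve_after)
```

The seller's revenue barely changes between the two cases, but a bidder who was served at both bid levels is now excluded whenever she bids low, which is the source of the discontinuity in her utility.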

2.4 An attempt with first-order splines

A natural question is whether the buyer can compute shading strategies numerically. A first approach is to look back at the gradient of the bidder’s utility in the direction of a certain function, i.e., the directional derivative, which can be computed by elementary calculus, and to look at shading functions expressed in a specific basis as

and try to optimize over . It would be also quite natural to do an isotonic regression and optimize over non-decreasing functions directly; this approach is tackled later on.

A natural basis

Splines (see e.g. Hastie et al. (2001) for a practical introduction) are a natural candidate for the basis functions. In particular, first order splines are continuous piecewise-linear functions, hence evaluating derivatives is trivial and it is easy to account in the formula above for the finitely many discontinuities of the derivative that will arise. Given a set of knots, first order splines are the functions

Higher order splines could of course also be used.

Lemma 3.

As described above, the optimal shading problem can be numerically approximated using steepest descent by a succession of linear programs, provided the non-decreasing constraint on the shading function $\beta$ can be written linearly in the basis coefficients. This is of course the case for 1st order splines.


After the function is expanded in a basis, the functional gradient becomes a standard gradient, and the shading function can be improved with a steepest descent. If the reserve value is not one of the knots, the gradient above is easy to compute: each step of the optimization requires solving a constrained LP to ensure that the solution is increasing.

For 1st order splines, the derivative is constant between knots, thus checking that $\beta' \ge 0$ amounts to checking finitely many linear constraints and so is amenable to an LP. ∎

Since the objective is not even continuous, though differentiable in a large part of the parameter space, the optimization problem is hard. In our experiments, we got significant improvement over bidding truthfully by using the above numerical method. However, we encountered the discontinuities of the optimization problem described above: our numerical optimizer got stuck at shading functions around which the reserve value was very unstable, which corresponds to revenue curves for the seller with several distant (approximate) local maxima: a small perturbation in function space does not induce much loss of revenue on the seller side, but can have a huge impact on the reserve value and hence the buyer revenue. Note that in our numerical experiments we did not enforce the non-decreasing constraint on $\beta$ but ended up with solutions that were non-decreasing. More details on this approach are provided in Appendix D.

This is precisely the reason why, in the next section, we introduce a relaxation of the problem that is easier to optimize and has the same solutions as our initial objective. We also change the class of shading functions we consider and use neural networks to fit them. Before we describe these experiments, we provide some theory for the problem of optimizing buyer revenue in lazy second price auctions.

3 Theory and a relaxation of the problem enabling the use of gradient descent

3.1 The family of optimal extensions of a strategy

In the context of lazy second price auctions, any increasing and continuous bidding strategy $\beta$ whose reserve value $x_\beta$ is not 0 can be improved with a strategy $\tilde\beta$ such that $\tilde\beta = \beta$ on $[x_\beta, \infty)$ and $h_{\tilde\beta} = 0$ on $[0, x_\beta)$, i.e. by thresholding the virtual value below the current reserve value and keeping $\beta$ on $[x_\beta, \infty)$. Indeed, Lemma 2 yields that $h_{\tilde\beta} = 0$ on $[0, x_\beta)$ and $h_{\tilde\beta} = h_\beta$ elsewhere. So the seller is indifferent between setting the reserve value anywhere in $[0, x_\beta]$ and we might assume she picks 0 (if she is welfare benevolent, or it is always possible to give an $\epsilon$-incentive to pick 0, for $\epsilon$ arbitrarily small). According to Myerson’s Lemma, the strategy $\tilde\beta$ generates the same payment as $\beta$, so the revenue of the seller coming from this bidder is unchanged. On the other hand, that bidder wins more auctions with this new strategy, hence it improves her revenue and thus her expected utility.
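Thresholding can be written in closed form under one assumption: that the virtual value in the space of values is $h_\beta(x) = \beta(x) - \beta'(x)(1 - F(x))/f(x)$, as in Lemma 2. Solving $h_{\tilde\beta} = 0$ below a threshold value $x_0$ with the continuity condition at $x_0$ then gives $\tilde\beta(x) = \beta(x_0)(1 - F(x_0))/(1 - F(x))$. The sketch below builds this extension for truthful bidding on uniform values (an illustrative choice, with $x_0 = 0.5$) and checks numerically that the virtual value vanishes below the threshold.

```python
def threshold(beta, cdf, x0):
    """Extend `beta` below x0 so that beta_tilde(x) * (1 - F(x)) stays constant there."""
    level = beta(x0) * (1.0 - cdf(x0))
    def beta_tilde(x):
        return level / (1.0 - cdf(x)) if x < x0 else beta(x)
    return beta_tilde

def h_numeric(beta, cdf, pdf, x, eps=1e-6):
    """Virtual value in value space, with beta' estimated by central differences."""
    dbeta = (beta(x + eps) - beta(x - eps)) / (2 * eps)
    return beta(x) - dbeta * (1.0 - cdf(x)) / pdf(x)

F = lambda x: x            # values uniform on [0, 1] (illustrative assumption)
f = lambda x: 1.0
truthful = lambda x: x
x0 = 0.5                   # reserve value of the truthful strategy in this setting

beta_tilde = threshold(truthful, F, x0)

below = [h_numeric(beta_tilde, F, f, x) for x in (0.1, 0.2, 0.3, 0.4)]
above = h_numeric(beta_tilde, F, f, 0.75)
# Revenue of a reserve placed at value x, ignoring competition: flat below x0.
flat = [beta_tilde(x) * (1.0 - F(x)) for x in (0.0, 0.25, 0.5)]
print(below, above, flat)
```

The flat revenue curve below the threshold is what makes the seller indifferent between all reserve values in that region.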

In this subsection, we address the question of whether the strategy , which is simple and robust can be improved for the bidder. Our previous argument already shows that any improvement would be a strategy with 0 reserve value.

Differentiating Equation (2) yields

Let us denote by the current reserve price; we rewrite the family of bidding strategies with reserve value at 0 as elements of the following constraint set:

For all those strategies, the seller revenue is maximal for the reserve value , and hence under the assumption of welfare benevolence, the seller will accept all bids of the bidder. It is also clear that this set of constraints define all possible strategies with reserve value 0.

The strategy (which is increasing and continuous, say) that maximizes the revenue of the bidder corresponds to

under the constraints that

Let us limit ourselves to not changing our strategy beyond , e.g. by bidding truthfully beyond . Then we effectively need to maximize

with the continuity constraints that . The constraints can be rewritten into

along with . We call those strategies continuation strategies as they extend the bidding below the current reserve price/value.

Remark: in this class of feasible strategies, the optimal reserve value for the seller is zero. So the discontinuities of the objective function in the broader class of strategies considered before, which stemmed from discontinuities of the reserve value as a function of the shading function, are no longer problematic.

The following theorem states one of our main results.

Theorem 1.

Let , and be differentiable on . Suppose that the virtual value is such that on . We consider shading functions on with .

Thresholding, i.e. using for is locally optimal among continuation strategies for which is differentiable on , provided on . It is also locally optimal among ’s such that is differentiable as function of .

Furthermore, if and , i.e. the competition’s distribution is Uniform, then thresholding is globally optimal among functions that are bounded by 1 and differentiable.

Sketch of proof : the proof consists in keeping track of the slack function , rewriting locally feasible ’s as functions of through differential equation manipulations and finally comparing their revenue and showing that the optimal is zero for our objective. This requires somewhat lengthy and delicate manipulations. In the case of , we are able to write all feasible ’s as a function of and carry out the program globally.

We note that we did not require in our analysis that our optimization be limited to non-decreasing functions; it turns out that our local optima are optimal in a larger class of functions.

3.2 One relaxation of the objective

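Instead of computing the exact reserve value in the definition of the expected utility of the bidder, we introduce a relaxation of the objective:

$$\tilde U(\beta_i) = \mathbb{E}_{X_i \sim F_{X_i}}\big[(X_i - h_{\beta_i}(X_i))\, G_i(\beta_i(X_i))\, \mathbf{1}(h_{\beta_i}(X_i) \ge 0)\big], \tag{3}$$

where $h_{\beta_i}$ is the virtual value expressed in the space of values and $G_i$ is the cdf of the highest competing bid. We replaced $\mathbf{1}(X_i \ge x_{\beta_i})$ by $\mathbf{1}(h_{\beta_i}(X_i) \ge 0)$. This relaxation avoids computing the reserve value at each step of the gradient descent and removes most of the discontinuities of the previous objective. We now prove that the function maximizing Equation (3) has non-negative virtual value. The value of the relaxed objective at its optimum is equal to the one in the strategic bidder problem.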

Theorem 2.

If an increasing and differentiable function is maximizing

it has non-negative virtual value, a reserve value equal to zero and with


We use the fact that if on a certain interval [a,b], we can find a new strategy with higher . Let us consider the rightmost interval [a,b] where . On , . Then on [a,b], . verifies on . Then if we denote , we define on [0,a] as . We have on [0,a]. is continuous. With , we see that is non-decreasing on [a,b]. Hence . Therefore, and . Hence, . Then, we tackle the next interval where by doing the same manipulation on . We conclude by induction on the intervals where .

Thus, a solution of the relaxation has a virtual value positive everywhere and a reserve value equal to zero. In this case, . ∎

This new objective makes it possible to run simple gradient descent algorithms without the need to recompute the reserve value at each iteration. It is also more stable than the original one since a local change of the virtual value does not completely change the value of the objective, which could be the case when the reserve value was part of the objective.
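A minimal numerical sketch of the relaxed objective, read here as $\mathbb{E}[(X - h_\beta(X))\,G(\beta(X))\,\mathbf{1}(h_\beta(X) \ge 0)]$, is given below. Everything in it is an illustrative assumption: values are uniform on [0, 1], the competition is a single truthful uniform bidder, and the two strategies compared (truthful bidding and its thresholded version at 0.5) are supplied together with their virtual values in closed form. No reserve value is computed anywhere.

```python
import math

def relaxed_utility(beta, h, G, n=100_000):
    """Midpoint-rule estimate of E[(X - h(X)) * G(beta(X)) * 1(h(X) >= 0)], X ~ U[0, 1]."""
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        hx = h(x)
        if hx >= 0.0:
            total += (x - hx) * G(beta(x))
    return total / n

# cdf of the bid of one truthful opponent with uniform value (assumption).
G = lambda b: min(max(b, 0.0), 1.0)

truthful = lambda x: x
h_truthful = lambda x: 2.0 * x - 1.0

# Thresholded version of truthful bidding: zero virtual value below 0.5.
thresholded = lambda x: 0.25 / (1.0 - x) if x < 0.5 else x
h_thresholded = lambda x: 0.0 if x < 0.5 else 2.0 * x - 1.0

u_truthful = relaxed_utility(truthful, h_truthful, G)
u_thresholded = relaxed_utility(thresholded, h_thresholded, G)

# Closed forms for this toy setting, obtained by direct integration.
exact_truthful = 1.0 / 12.0
exact_thresholded = 1.0 / 12.0 + 0.25 * (math.log(2.0) - 0.5)
print(u_truthful, u_thresholded)
```

In this toy setting the thresholded strategy collects extra utility on the values below 0.5 that the truthful strategy forfeits, and the relaxed objective registers that gain directly through the indicator on the virtual value.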

4 Experimental setup

We present in this section the complete approach and report the uplift of the new bidding strategies in various revenue-maximizing auctions.

4.1 Our architecture

To fit the optimal strategies, we use a simple one-layer neural network with ReLUs. We replace the indicator function by a sigmoid function to have a fully differentiable objective and we optimize

with and . We start with a batch size of examples, sampled according to the value distribution of the bidder. We use a stochastic gradient algorithm (SGD) with a decreasing learning rate starting at . The full code in PyTorch is provided with the paper. The learning of an affine shading strategy is also provided in the notebook and is reaching already very decent performance.
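The replacement of the indicator by a sigmoid can be sketched as follows. The `sharpness` parameter is a hypothetical name for the scaling applied inside the sigmoid; its value here is arbitrary and is not the one used in the experiments.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def smooth_indicator(h, sharpness):
    """Differentiable surrogate for 1(h >= 0); a larger sharpness gives a steeper step."""
    return sigmoid(sharpness * h)

# The surrogate approaches the hard indicator as the sharpness grows.
for s in (1.0, 10.0, 1000.0):
    print(s, smooth_indicator(-0.1, s), smooth_indicator(0.1, s))
```

A steeper surrogate is closer to the true objective but gives smaller gradients away from the threshold, so the scaling is a trade-off that has to be tuned.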

In our setting, we assume that is known. However, we could replace its expression by an approximation learned from past examples of bids of the competition or on the winning distribution of bidder computed on past auctions (in practice one may have to use survival analysis techniques to account for censoring of the observations). The results for the lazy second price auction with personalized price are presented in Table 1 and in Table 2.

4.2 Extension to other types of auction

Our approach can easily be extended to many other types of auctions. Only a few lines of code are needed to adapt the objective to other mechanisms.

The Myerson auction.

The Myerson auction (Myerson, 1981) consists in using the virtual value for both the allocation and payment rules. The item is allocated to the bidder with the highest non-negative virtual value, who pays:

As for the lazy second price auction, we can use the Myerson lemma and show that the expected utility of the strategic bidder using the strategy in the Myerson auction is

with the cumulative distribution function of , is the value of bidder , and

is the virtual value function associated with the bid distribution. For some distributions, the optimal strategy can be computed analytically. For instance, for the uniform distribution, we can prove the following lemma, which defines the optimal strategies.

Lemma 4 (Shading against uniform bidders).

Suppose that has a positive density on its support and assume that is bounded by . Let be chosen by bidder 1 arbitrarily close to 0. Let us call

A near-optimal shading strategy is for bidder 1 to shade her value through

As goes to , this strategy approaches the optimum.

If the support of is within , then can be taken equal to 0.

The full proof is in Appendix C. Since in this specific setting optimal strategies have a known closed form, our optimization pipeline can be tested to see if it recovers these strategies. With the same pipeline used in Section 4.1, we optimize

Appendix E.2 focuses on the uniform distribution, where our algorithm recovers exactly the strategies proposed in Lemma 9, showing the robustness of our approach.

The interest of the optimization pipeline is the direct extension to all possible value distributions without the need to solve at each time a new system of differential equations. The performance with an exponential value distribution is provided in Table 1.

| Auction type | Metric | K=2 | K=3 | K=4 |
|---|---|---|---|---|
| Baselines | Utility of truthful strategy (in revenue maximizing) | 0.30 | 0.24 | 0.21 |
| Baselines | Utility of truthful strategy (in welfare maximizing) | 0.50 | 0.33 | 0.25 |
| Lazy second price auction | Utility of strategic bidder | | | |
| Lazy second price auction | Uplift vs truthful bidding | +50% | +29% | +14% |
| Eager second price auction | Utility of strategic bidder | | | |
| Eager second price auction | Uplift vs truthful bidding | +73% | +37% | +19% |
| Myerson auction | Utility of strategic bidder | | | |
| Myerson auction | Uplift vs truthful bidding | +113% | +87% | +67% |
| Boosted second price | Utility of strategic bidder | | | |
| Boosted second price | Uplift vs truthful bidding | +60% | +71% | +52% |

Table 1: All bidders have an exponential value distribution with parameter $\lambda = 1$. The strategic bidder has K-1 opponents bidding truthfully and having a reserve price equal to 1.0, their monopoly price. The reserve price of the strategic bidder is computed on her bid distribution. For each run, the evaluation is based on samples, and we average the performances over 10 learnings. The utility of the strategic bidder can be higher than in the welfare-maximizing auction because revenue-maximizing auctions remove competition below the reserve price, as illustrated by some examples in Appendix E.

Eager second price auction with monopoly reserve prices.

The eager second price auction consists in running a second price auction but only among bidders that clear their personalized reserve price. The objective function is very similar to the one of the lazy second price auction except that the winning distribution is different below the reserve price of the other bidders. Indeed, if all other bidders are below their reserve price, the strategic bidder who bids above her monopoly price is sure to win and only pays her monopoly price. We provide more details in Appendix E.3.
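The difference between the lazy and eager rules shows up on a single auction. The sketch below implements both for one round, following the descriptions in the text; the function names and the returned `(winner, price)` convention are illustrative choices.

```python
def lazy_second_price(bids, reserves):
    """Highest bidder wins only if she clears her own reserve; she pays
    max(second-highest bid overall, her reserve). Returns (winner, price)."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] < reserves[winner]:
        return None, 0.0
    others = [b for i, b in enumerate(bids) if i != winner]
    second = max(others) if others else 0.0
    return winner, max(second, reserves[winner])

def eager_second_price(bids, reserves):
    """Bidders below their reserve are removed first; a second price auction
    then runs among the rest, with the winner's reserve as a price floor."""
    alive = [i for i in range(len(bids)) if bids[i] >= reserves[i]]
    if not alive:
        return None, 0.0
    winner = max(alive, key=lambda i: bids[i])
    others = [bids[i] for i in alive if i != winner]
    second = max(others) if others else 0.0
    return winner, max(second, reserves[winner])

# Bidder 0 clears her reserve (0.5); bidder 1 bids 0.8 but is below her reserve (1.0).
print(lazy_second_price([0.9, 0.8], [0.5, 1.0]))
print(eager_second_price([0.9, 0.8], [0.5, 1.0]))
```

In the example, the bid of 0.8 still sets the price in the lazy auction even though it cannot win, whereas in the eager auction it is discarded and bidder 0 pays only her reserve.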

The boosted second price auction (BSP).

Two small variants of the boosted second price auction (BSP) (Golrezaei et al., 2017) can also be addressed. We deal with the BSP auction as it seems to be one of the state-of-the-art alternatives to the second price auction with personalized reserve price to be used in practice, and it deals with heterogeneities between bidders. In the original paper, the seller first computes the reserve price of each bidder based on their bid distributions. Then, the algorithm computes a boosting factor for each bidder by counterfactually maximizing the revenue of the seller. More precisely, the auction is run according to:

  - First each bidder i submits his bid
  - Define S as a set of bidders whose bids exceed their reserve price, i.e,
  - If the set S is empty, the item is not allocated. Otherwise, the item is allocated to bidder with the highest boosted bid, i.e., and she pays . For other bidders, the payment is zero.
Algorithm 1 Boosted second price

To explain intuitively our two objectives corresponding to this auction, we first consider the example of the family of generalized Pareto distributions. As the virtual value of all distributions in this family is affine, the boosted second price auction is strictly equivalent to the Myerson auction in this family. This explains why this auction can perform well in practice for the seller, since it avoids computing the virtual value exactly by approximating it with a linear fit.

In the first model, we assume that the seller first makes an affine fit through an L2 regression on the virtual values she observes. Then, she runs a Myerson auction based on these L2 fits. In the case of generalized Pareto distributions, this procedure results exactly in the BSP auction. If we note the L2 fit of the virtual value corresponding to the bid distribution and , we optimize the Myerson objective with corresponding to the fit of . The main difference with the BSP auction for non-generalized Pareto distributions is that the fit is used to compute the reserve price. In the second objective, we address this limitation by first computing the reserve price based on the observed . Then, the algorithm computes a linear fit of on bids higher than . This linear fit is used as the boosting parameter for bidder i. To make the objective differentiable, we consider the relaxation where is assumed to be . We verify retrospectively that the final strategy verifies

Our experiments show that our approach can also be empirically generalized to more advanced, intricate, practical and modern settings, on top of working well theoretically on the lazy second price auctions.

4.3 Evaluation and results

Two different value distributions were used to run the experiments: the exponential distribution in Table 1 and the uniform distribution in Table 2 in the Appendix. We focus on a small number of bidders since this is where reserve prices play an important role for the seller. Celis et al. (2014) also noticed that the median number of participants in online advertising auctions is 6.

To compute the real performance of the strategy, we are conservative in the computation of the reserve price since we use . We then compute the performance by computing the objective (expected utility) with Monte-Carlo simulations. For the lazy second price with personalized reserve price, we use for instance
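
A Monte-Carlo estimate of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the evaluation code of the paper: the function name `lazy_second_price_utility` is hypothetical, values are drawn uniformly on [0, 1], opponents bid truthfully, and the reserve price of the strategic bidder is passed in as a given number rather than recomputed from her bid distribution.

```python
import numpy as np


def lazy_second_price_utility(strategy, reserve, n_opponents,
                              n_samples=200_000, seed=0):
    """Estimate the expected utility of one bidder in a lazy second price
    auction with a personalized reserve price.

    strategy: vectorized map from values to bids.
    reserve: reserve price faced by this bidder (taken as given here).
    """
    rng = np.random.default_rng(seed)
    values = rng.uniform(size=n_samples)
    bids = strategy(values)
    # ASSUMPTION: opponents have uniform values and bid truthfully.
    competition = rng.uniform(size=(n_samples, n_opponents)).max(axis=1)
    # Lazy rule: the bidder must beat both her reserve and the highest
    # competing bid, and pays the larger of the two.
    price = np.maximum(reserve, competition)
    wins = bids >= price
    return float(np.mean((values - price) * wins))


# Truthful bidding against one opponent with a reserve price of 0.5.
truthful_utility = lazy_second_price_utility(lambda v: v, 0.5, n_opponents=1)
```

With truthful bidding, one opponent and a reserve of 0.5, the estimate should be close to 1/12, which can be used as a sanity check before plugging in a learned strategy.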

We compare the performance of our strategies with two baselines: the utility of one bidder bidding truthfully in a second price auction without reserve price (the welfare maximizing auction) and in a second price auction with monopoly price (with symmetric bidders, this auction is equivalent to the Myerson auction and is revenue-maximizing for the seller).
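
For uniform values, both baselines have a closed form, which gives a quick check of the reported numbers. The derivation below assumes that $K$ counts all bidders (the strategic bidder included), that values are uniform on $[0,1]$, and that every bidder faces the same reserve price $r$; the notation $U_K(r)$, $X$ and $Y$ is introduced only for this illustration.

```latex
\begin{align*}
U_K(r) &= \mathbb{E}\bigl[(X - \max(r, Y))_+\bigr]
        = \int_r^1 \int_r^x t^{K-1}\,dt\,dx
        = \frac{1}{K}\Bigl[\frac{1 - r^{K+1}}{K+1} - r^{K}(1-r)\Bigr],
\\
U_K(0) &= \frac{1}{K(K+1)},
\qquad
U_K\bigl(\tfrac12\bigr) = \frac{1}{K}\Bigl[\frac{1 - 2^{-(K+1)}}{K+1} - 2^{-(K+1)}\Bigr].
\end{align*}
```

Here $X$ is the value of the truthful bidder and $Y$ the highest of the $K-1$ competing values, with cdf $t^{K-1}$. For $K = 2, 3, 4$ this gives $U_K(0) \approx 0.167, 0.083, 0.050$ and $U_K(1/2) \approx 0.083, 0.057, 0.041$, in line with the two baseline rows of Table 2 in the Appendix.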

For BSP, we report results for the second objective, which is the closest to the corresponding procedure of Golrezaei et al. (2017). The first objective gives uplifts similar to those of the strategic behavior in the Myerson auction. The reported uplifts are large. We observe that BSP and the Myerson auction are less robust to strategic behavior than the lazy second price auction with personalized reserve price. Indeed, as in the eager version of the second price auction, there is no competition in these auctions when all other bidders are below their reserve price. This is not the case for the lazy second price auction, which explains why the uplifts are slightly lower for this specific auction.

We focused on the stationary case where the strategic bidder has to choose one strategy implying a bid distribution and the seller will immediately optimize their mechanism according to this bid distribution. However, our differential approach allows some generalizations. In future work, we could adapt the differential approach to more dynamic settings where the seller uses a particular dynamic to update the reserve price based on past bids of the bidders.

5 Conclusion

Recently, Milgrom and Tadelis (2018) detailed how artificial intelligence and machine learning can impact market design. This survey is mostly focused on the auctioneer side. In this paper, we showed that machine learning can be efficiently used on the bidder side to learn how to shade in revenue-maximizing auctions that are optimized based on past bids (or a distribution announced by the bidder to which she commits). Our work, both theoretical and practical, complements the approach presented in that tutorial by showing that strategic bidding can be implemented in some of the main revenue-maximizing auctions. Our work also raises questions about many automatic mechanism design procedures, since many of them assume that the observed past bids were truthful in order to optimize mechanisms. From an industrial point of view, our work provides a new argument for returning to simple and more transparent auction mechanisms that are less subject to optimization on both the bidders' and the seller's sides.


References

  • Abeille et al. (2018) Marc Abeille, Clément Calauzènes, Noureddine El Karoui, Thomas Nedelec, and Vianney Perchet. 2018. Explicit shading strategies for repeated truthful auctions. arXiv preprint arXiv:1805.00256.
  • Ashlagi et al. (2016) Itai Ashlagi, Constantinos Daskalakis, and Nima Haghpanah. 2016. Sequential mechanisms with ex-post participation guarantees. In Proceedings of EC 2016.
  • Balseiro et al. (2017) Santiago R Balseiro, Vahab S Mirrokni, and Renato Paes Leme. 2017. Dynamic mechanisms with martingale utilities. In Management Science.
  • Celis et al. (2014) L Elisa Celis, Gregory Lewis, Markus Mobius, and Hamid Nazerzadeh. 2014. Buy-it-now or take-a-chance: Price discrimination through randomized auctions. Management Science.
  • Cole and Roughgarden (2014) Richard Cole and Tim Roughgarden. 2014. The sample complexity of revenue maximization. In Proceedings of Theory of Computing.
  • Conitzer and Sandholm (2002) Vincent Conitzer and Tuomas Sandholm. 2002. Complexity of mechanism design. In Proceedings of UAI 2002. Morgan Kaufmann Publishers Inc., 103–110.
  • Devanur et al. (2016) Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. 2016. The sample complexity of auctions with side information. In Proceedings of Theory of Computing.
  • Dütting et al. (2017) Paul Dütting, Zhe Feng, Harikrishna Narasimhan, and David C Parkes. 2017. Optimal auctions through deep learning. arXiv preprint arXiv:1706.03459 (2017).
  • Epasto et al. (2018) Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, and Song Zuo. 2018. Incentive-aware learning for large markets. In Proceedings of WWW 2018.
  • Golrezaei et al. (2017) N. Golrezaei, M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. Boosted Second-price Auctions for Heterogeneous Bidders. In Management Science.
  • Hartline and Roughgarden (2009) Jason D Hartline and Tim Roughgarden. 2009. Simple versus optimal mechanisms. In Proceedings of EC 2009.
  • Hastie et al. (2001) Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer New York Inc., New York, NY, USA.
  • Huang et al. (2018) Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. 2018. Making the most of your samples. SIAM J. Comput.
  • Kanoria and Nazerzadeh (2014) Yash Kanoria and Hamid Nazerzadeh. 2014. Dynamic Reserve Prices for Repeated Auctions: Learning from Bids. In Proceedings of WINE 2014.
  • Krishna (2009) V. Krishna. 2009. Auction Theory.
  • Medina and Mohri (2014) Andres M Medina and Mehryar Mohri. 2014. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of ICML 2014.
  • Milgrom and Tadelis (2018) Paul R Milgrom and Steven Tadelis. 2018. How Artificial Intelligence and Machine Learning Can Impact Market Design. Technical Report. National Bureau of Economic Research.
  • Mohri and Munoz (2015) Mehryar Mohri and Andres Munoz. 2015. Revenue optimization against strategic buyers. In Proceedings of NIPS 2015.
  • Morgenstern and Roughgarden (2015) Jamie H Morgenstern and Tim Roughgarden. 2015. On the pseudo-dimension of nearly optimal auctions. In Proceedings of NIPS 2015.
  • Myerson (1981) R. B. Myerson. 1981. Optimal Auction Design. Math. Oper. Res. 6, 1.
  • Nedelec et al. (2018) Thomas Nedelec, Marc Abeille, Clément Calauzènes, Noureddine El Karoui, Benjamin Heymann, and Vianney Perchet. 2018. Thresholding the virtual value: a simple method to increase welfare and lower reserve prices in online auction systems. arXiv preprint arXiv:1808.06979 (2018).
  • Ostrovsky and Schwarz (2011) M. Ostrovsky and M. Schwarz. 2011. Reserve prices in internet advertising auctions: A field experiment. In Proceedings of EC 2011.
  • Paes Leme et al. (2016) Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. 2016. A field guide to personalized reserve prices. In Proceedings of WWW 2016.
  • Tang and Zeng (2018) Pingzhong Tang and Yulong Zeng. 2018. The price of prior dependence in auctions. In Proceedings of EC 2018.
  • Weed et al. (2016) Jonathan Weed, Vianney Perchet, and Philippe Rigollet. 2016. Online learning in repeated auctions. In Proceedings of COLT 2016.
  • Wilson (1987) R. Wilson. 1987. Game-theoretic analyses of trading processes. In Advances in Economic Theory.
  • Zhe Feng and Syrgkanis (2018) Zhe Feng, Chara Podimata, and Vasilis Syrgkanis. 2018. Learning to bid without knowing your value. In Proceedings of EC 2018.

Appendix A Results for the uniform distribution

| Auction type | Metric | K=2 | K=3 | K=4 |
|---|---|---|---|---|
| Baselines | Utility of truthful strategy (in revenue-maximizing) | 0.083 | 0.057 | 0.040 |
| | Utility of truthful strategy (in welfare-maximizing) | 0.166 | 0.083 | 0.050 |
| Lazy second price auction | Utility of strategic bidder | | | |
| | Uplift vs truthful bidding | +72% | +36% | +20% |
| Eager second price auction | Utility of strategic bidder | | | |
| | Uplift vs revenue-maximizing | +51% | +46% | +25% |
| Myerson auction | Utility of strategic bidder | | | |
| | Uplift vs revenue-maximizing | +195% | +130% | +97.5% |
| Boosted second price | Utility of strategic bidder | | | |
| | Uplift vs revenue-maximizing | +200% | +40% | +37.5% |
Table 2: All bidders have a uniform value distribution. The strategic bidder has opponents, all bidding truthfully. The reserve price of all other bidders is equal to 0.5. The reserve price of the strategic bidder is computed from her bid distribution. For each run, the evaluation is based on samples. We average the performance of the strategies over 10 training runs. The utility of the strategic bidder can be higher than in the welfare-maximizing auction because revenue-maximizing auctions remove the competition below the reserve price. We provide some examples of strategies in Appendix E.

Appendix B Proofs for Section 3

Recall that our setup is that we are given a strategy and a current reserve value . We want to extend our strategy below in a way that is optimal, at least locally optimal. We assume throughout that the seller is welfare benevolent.

So we have to solve the infinite programming problem

where is given by the strategy that had reserve value at . This is just a continuity requirement and it ensures that Myerson’s formula applies.

Of course the seller revenue for bids below if she sets the reserve value at is

Note that the constraints mean that the max revenue of the seller is achieved for the reserve value 0 and it is 0. We know that the reserve value should be 0, because otherwise the buyer could use thresholding below the reserve value to increase her revenue and not change the revenue the seller derives from her in a lazy second price auction. So that guarantees that the reserve value is 0.

B.1 The case

Let be given. Call the revenue of the seller on . The constraints on are that , and on . That way the revenue of the seller is maximized at .

We assume that . So then .

B.1.1 Preliminaries

Using results in the main text, i.e. , our constraints can be written in differential form as

So if , we have equivalently

If we integrate by parts, using the fact that and call , we get


Assuming temporarily is differentiable, we differentiate Equation (4) to get

Because this is a first order ODE, we can integrate this equation fully to get

Note that this is the family of solutions among that are differentiable.

B.1.2 Key result

So we have the following theorem.

Theorem 3.

Suppose and call the revenue of the seller at . The unique shading function such that is differentiable and is such

Recall that our constraint is that . Of course corresponds to the thresholded function.


Proof. Put all of the above together and then integrate by parts to get the formulation above. ∎

B.1.3 Back to the optimization problem

We can naturally view as a slack variable. By contrast to classical finite dimensional optimization, our slack variable is a function. We wish to maximize

as we assume that remains below 1. Note that the part of the integral involving has now disappeared as we consider functions for which the average payment over is zero - that is the sense of our equality constraint.

Consider , with very small. Since , we see that

where we have used that . So we have

The question becomes whether

Lemma 5.

Suppose that on . Then, if ,


We have, using Fubini’s theorem,

Now since , we have equivalently


We conclude that for all ,

We have the following theorem.

Theorem 4.

For the problem of revenue optimization of the buyer, thresholding is locally optimal (among functions such that is differentiable and is bounded by 1 on ) when the competition has cdf for .

Furthermore, it is globally optimal among those bidding strategies.


Proof. The local optimality follows from the previous analysis.

The global optimality comes from the fact that using the concavity of , we have for all . Then the upper bound is what we computed above. So an upper bound on the revenue is lower than the thresholded revenue and we have global optimality. ∎

B.2 The general case, local optimality

We now prove local optimality for general differentiable using differential ideas similar to the ones above. Let and . is the shading function corresponding to thresholding the virtual value. Recall that for continuity at . Below, is a differentiable function. We require to have continuity of at . Recall the constraints that for all feasible ’s,

by construction. We assume that is feasible for , which gives us the notion of locality we need. Because , we have

and hence by limiting ourselves to the first order term in the Taylor expansion (the second order term in is asymptotically negligible),

Assuming that is differentiable, we can differentiate the previous equality to get

and using the fact that because we want continuity at , we get

The last equality comes from integration by parts.

B.2.1 Back to the optimization problem

Recall that we seek to maximize over admissible ’s (i.e. those for which the inequality constraints are verified),

where the equality is to first order in the Taylor expansion. We claim that if is admissible, then the second term is negative.

Lemma 6.

Let . Assume that on [0,r]. Suppose that on . Suppose the function is such that for ,



The strategy is the same as above. Let us call