Reserve Price Optimization for First Price Auctions

06/11/2020 ∙ by Zhe Feng, et al. ∙ Google, Harvard University

The display advertising industry has recently transitioned from second- to first-price auctions as its primary mechanism for ad allocation and pricing. In light of this, publishers need to re-evaluate and optimize their auction parameters, notably reserve prices. In this paper, we propose a gradient-based algorithm to adaptively update and optimize reserve prices based on estimates of bidders' responsiveness to experimental shocks in reserves. Our key innovation is to draw on the inherent structure of the revenue objective in order to reduce the variance of gradient estimates and improve convergence rates in both theory and practice. We show that revenue in a first-price auction can be usefully decomposed into a demand component and a bidding component, and introduce techniques to reduce the variance of each component. We characterize the bias-variance trade-offs of these techniques and validate the performance of our proposed algorithm through experiments on synthetic data and real display ad auction data from the Google ad exchange.




1 Introduction

A reserve price in an auction specifies a minimum acceptable winning bid, below which the item remains with the seller. The reserve price may correspond to some outside offer, or the value of the item to the seller itself, and more generally may be set to maximize expected revenue [25]. In a data-rich environment like online advertising auctions it becomes possible to learn a revenue-optimal reserve price over time, and there is a substantial literature on optimizing reserve prices for second-price auctions, which have been commonly used to allocate ad space [27, 23, 24].

In this work we examine the problem of reserve price optimization in first-price (i.e., pay-your-bid) auctions, motivated by the fact that all the major ad exchanges have recently transitioned to this auction format as their main ad allocation mechanism [10, 6]. First-price auctions have grown in favor because they are considered more transparent, in the sense that there is no uncertainty in the final price upon winning [5].¹ Unless restrictive assumptions are met, there is in theory no revenue ranking between first- and second-price auctions [19], and there is no guarantee that reserve prices optimized for second-price auctions will continue to be effective in a first-price setting.

¹ The full reasons for the transition are complex, and include the rise of “header bidding” [30]. A header bidding auction is a first-price auction usually triggered by code in a webpage header (hence the name).

From a learning standpoint the shift from second- to first-price auctions introduces several new challenges. In a second-price auction, truthful bidding is a dominant strategy no matter what the reserve. The bidders’ value distributions are therefore readily available, and bids stay static (in principle) as the reserve is varied. In a first-price auction, in contrast, bidders have an incentive to shade their values when placing their bids, and bid-shading strategies can vary by bidder. The gain from setting a reserve price now comes if (and only if) it induces higher bidding, so an understanding of bidder responsiveness becomes crucial to setting effective reserves.

Bid adjustments in response to a reserve price can occur at different timescales. If a bidder observes that it wins too few auctions because of the reserve price, it may increase its bid in the long term (over a matter of hours up to weeks). Our focus here is on setting reserve prices by taking into account immediate bidder responses to reserves. We assume that each bidder has a fixed, unknown bidding function that depends on its private value $v$ and the observed auction reserve $r$. This agrees with practice in display ad auctions because the reserve is normally sent out in the “bid request” message to potential bidders [18]. To the extent that the bid function responds to $r$, first-price reserves can potentially show an immediate positive effect on revenue.

Our Results

We propose a gradient-based approach to adaptively improve and optimize reserve prices, where we perturb current reserves upwards and downwards (e.g., by 10%) on random slices of traffic to obtain gradient estimates.

Our key innovation is to draw on the inherent structure of the revenue objective in order to reduce the variance of gradient estimates and improve convergence rates in both theory (e.g., see Corollary 4.1) and practice. We show that revenue in a first-price auction can be usefully decomposed into two terms: a demand curve component which depends only on the bidder’s value distribution; and a bidding component whose variance can be reduced based on natural assumptions on bidding functions.

A demand curve is a simpler, more structured object than the original revenue objective (e.g., it is downward-sloping), so the demand component lends itself to parametric modeling to reduce the variance. We offer two variance reduction techniques for the bidding component,² referred to as bid truncation and quantile truncation. Bid truncation can strictly decrease variance with no additional bias assuming the right bidding model, whereas quantile truncation may introduce bias but is less sensitive to assumptions on the bidding model.

² Variance reduction of the bidding component relies on the insight that bids far above the reserves are little affected by them (under natural bidding models), so these bids can be filtered out when computing gradient estimates; changes in such bids are likely due to noise rather than any effect of reserves.

We evaluate our approach over synthetic data where bidder values are drawn uniformly, and also over real bid distributions collected from the logs of the Google ad exchange with different bidder response models. Our experimental results confirm that the combination of variance reduction on both objective components leads to the fastest convergence rate. For the demand component, a simple logistic model works well over the synthetic (i.e., uniform) data, but a flexible neural net is needed over the semi-synthetic data. For the bidding component, we find that quantile truncation is much more robust to assumptions on the bidding model.

Related Work

This paper connects with the rich literature on reserve price optimization for auctions, e.g., [25, 29]. How to set optimal reserve prices in second-price auctions based on access to bidders’ historical bid data has become an increasingly popular research direction in the machine learning community, e.g., [26, 23, 24]. Another related line of work uses no-regret learning in second-price auctions with partial information feedback to optimize reserve prices, e.g., [7, 9]. All of the works cited so far rely on the fact that the seller can directly learn the valuation distribution from historical bid data, since the second-price auction is truthful.

For first-price auctions, we have found little work on setting optimal reserves for asymmetric bidders, since there are no characterizations of equilibrium strategies for this case. Results are only available for limited environments, such as bidders with uniform valuation distributions [19, 22]. Recently, there has been a line of work regarding revenue optimization against strategic bidders in repeated auctions, e.g., [3, 17]. In this paper, instead of assuming bidders act strategically, we assume each bidder has a fixed bidding function in response to reserves. This is a common assumption in large market settings and in the dynamic pricing literature [21].

The algorithms developed in this paper are related to the literature on online convex optimization with bandit feedback [11, 16, 1, 2]. However, there are two key differences with our work: (1) the revenue function in a first price auction is non-convex, and (2) the seller cannot obtain perfect revenue feedback under perturbed reserves with just a single query (i.e., auction)—the seller needs multiple queries to achieve accurate estimates with high confidence. Our algorithm is also related to zeroth-order stochastic gradient methods [14, 4, 13, 20], which we discuss in detail later in Section 3.

2 Preliminaries

We consider a setting where a seller repeatedly sells a single item to a set of bidders via a first-price auction. In such an auction, the seller first sends out a reserve price $r$ to all bidders. Each bidder $i$ then submits a bid $b_i$. The bidder with the highest bid larger than $r$ wins the item and pays their bid; if no bidder bids above $r$, the item goes unallocated. Note that the type of reserve price we consider in this work is anonymous in the sense that each bidder sees the same reserve price.

Each bidder $i$ has a private valuation $v_i \in [0, 1]$ for the item, where each value is drawn independently (but not necessarily identically) from some unknown distribution.³ In a first-price auction, only the highest bid matters for both allocation and pricing. Thus, to simplify the notation, we write $v = \max_i v_i$ to denote the maximum value; $v$ is drawn i.i.d. from an unknown distribution $F$ across auctions. Our analysis from here on will refer to this ‘representative’ highest bidder. (See Appendix A for a rigorous justification of why we can reduce multiple bidders to a single bidder.)

³ This is without loss of generality; our analysis can easily be applied to any bounded valuation setting.

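We write $b(v, r)$ to denote the maximum bid when the reserve price is $r$ and the maximum value is $v$, and $\mathcal{B}(r)$ to denote the distribution of $b(v, r)$ for a fixed $r$ when $v$ is drawn according to $F$. The main goal of the seller considered in this work is to learn the optimal reserve price $r^*$ that maximizes expected revenue:

$$r^* = \arg\max_{r \in [0, 1]} R(r), \qquad R(r) = \mathbb{E}_{v \sim F}\left[ b(v, r) \cdot \mathbb{1}\{ b(v, r) \ge r \} \right].$$
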
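Note that there is no reason for a bidder to bid a positive value less than the reserve $r$: such a bid is guaranteed to lose. Therefore, without loss of generality we can assume that if $b(v, r) < r$, then $b(v, r) = 0$. This allows us to write the revenue simply as:

$$R(r) = \mathbb{E}_{v \sim F}\left[ b(v, r) \right] = \mathbb{E}_{b \sim \mathcal{B}(r)}\left[ b \right].$$
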
In this paper, we focus on maximizing the revenue function in the steady state.

Response Models

We begin by describing some general properties of bidding functions that hold for any utility-maximizing bidders (see [22] for further discussion).

Definition 2.1.

A bidding function $b(v, r)$ satisfies the following properties: 1) $b(v, r) \le v$ for all $v, r$; 2) $b(v, r) = 0$ for $v < r$; 3) $b(v, r) \ge r$ for $v \ge r$; 4) $b(v, r)$ is non-decreasing in $v$ for all $r$.

In some of our algorithms, we would like to impose additional constraints on the response model which, while not a consequence of utility-maximizing behavior, are likely to hold in practice. One such constraint is the diminishing sensitivity in value of bid to reserve. This says that bidders with a larger value will change their bid less in response to a change in reserves.

Definition 2.2 (Diminishing sensitivity of bid to reserve).

If $v \le v'$, then for $r \le r'$ and $r' \le v$ we have $b(v, r') - b(v, r) \ge b(v', r') - b(v', r)$.

One natural and concrete example of a response model is a bidder that increases its bid to the reserve as long as the reserve is below its value. We refer to this as the perfect response model, formally defined as follows.

Definition 2.3.

A perfect response bidding function takes the form:

$$b(v, r) = \begin{cases} b(v, 0) & \text{if } b(v, 0) \ge r, \\ r & \text{if } b(v, 0) < r \le v, \\ 0 & \text{if } v < r. \end{cases}$$

Note that the perfect response model is based on the original bid of the bidder under reserve price $0$, namely $b(v, 0)$. If $b(v, 0)$ is already above the reserve, then this bidder is unaffected by the reserve. Note that the perfect response model satisfies the diminishing sensitivity property.
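A minimal Python sketch of the perfect response model, assuming (as in the experiments of Section 5) linear shading $b(v, 0) = \alpha v$ and a maximum value $v$ drawn uniformly from $[0, 1]$. The function names `perfect_response_bid` and `revenue_mc` are hypothetical; the second one computes a Monte Carlo estimate of $R(r) = \mathbb{E}_{v \sim F}[b(v, r)]$.

```python
import numpy as np


def perfect_response_bid(v, r, alpha):
    """Maximum bid b(v, r) under perfect response, assuming linear shading b(v, 0) = alpha * v.

    Bidders with v < r bid 0; bidders whose shaded bid is below r raise it to exactly r;
    bidders already bidding at least r keep their original bid.
    """
    v = np.asarray(v, dtype=float)
    base = alpha * v                    # bid under reserve 0 (hypothetical linear shading)
    raised = np.maximum(base, r)        # bids below the reserve are raised to r
    return np.where(v >= r, raised, 0.0)


def revenue_mc(r, alpha, n_samples, rng):
    """Monte Carlo estimate of R(r) = E[b(v, r)], assuming v ~ Uniform[0, 1]."""
    v = rng.uniform(0.0, 1.0, size=n_samples)
    return float(perfect_response_bid(v, r, alpha).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for r in (0.0, 0.2, 0.5, 0.8):
        print(f"r={r:.1f}  R(r) ~ {revenue_mc(r, 0.3, 200_000, rng):.3f}")
```

Under these assumptions with $\alpha = 0.3$, a short calculation gives $R(r) = 0.15 + \tfrac{2}{3} r^2$ for $r \le 0.3$ and $R(r) = r(1 - r)$ for $r \ge 0.3$, so the estimates should peak near $r = 0.5$ at a revenue of about $0.25$.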

In practice, bidders are unlikely to exactly follow the perfect response model; for example, bidders will often increase their bid to some amount strictly above the reserve so as to remain competitive with other bidders. For this reason, we propose a relaxation of the perfect response model which we call the $\epsilon$-bounded response model: if $v \ge r$, the bid is at most $\epsilon$ greater than what it would have been under the perfect response model (see also Definition C.6). Note that the $\epsilon$-bounded response model becomes the perfect response model when $\epsilon = 0$.
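One way to simulate an $\epsilon$-bounded bidder is sketched below. It is an illustrative assumption, not the paper's definition: it adds $\mathrm{Uniform}[0, \epsilon]$ noise on top of the perfect response bid (mirroring the experiments, which add a uniform random variable to the bid), and it additionally caps the result at $v$ so that no bidder bids above its value. The function name is hypothetical and linear shading $b(v, 0) = \alpha v$ with $\alpha \le 1$ is assumed.

```python
import numpy as np


def eps_bounded_bid(v, r, alpha, eps, rng):
    """Maximum bid under an epsilon-bounded response (illustrative assumptions).

    Start from perfect response on top of linear shading b(v, 0) = alpha * v, then
    add Uniform[0, eps] noise to every bid that clears the reserve. Capping the
    result at v is an extra assumption so that no bidder bids above its value.
    """
    v = np.asarray(v, dtype=float)
    perfect = np.where(v >= r, np.maximum(alpha * v, r), 0.0)
    noise = rng.uniform(0.0, eps, size=v.shape)
    bumped = np.minimum(perfect + noise, v)       # at most eps above perfect response
    return np.where(v >= r, bumped, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.uniform(0.0, 1.0, size=5)
    print(eps_bounded_bid(v, r=0.4, alpha=0.3, eps=0.05, rng=rng))
```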

3 Gradient Descent Framework

The first-price auction setting introduces several challenges for setting reserve prices. First, the seller cannot observe true bidder values because truthful bidding is not a dominant strategy in a first-price auction. Second, how the bidders will react to different reserves is unknown to the seller: the only information that the seller receives is bids drawn from distribution $\mathcal{B}(r)$ when the seller sets a reserve price $r$.

One natural idea, and the approach we take in this paper, is to optimize the reserve price via gradient descent. Gradient descent is only guaranteed to converge to the optimal reserve when our objective is convex (or at least, unimodal), which is not necessarily true for an arbitrary revenue function. However, gradient descent has a number of practical advantages for reserve price optimization, including:

  1. Gradient descent allows us to incorporate prior information we may have about the location of a good reserve price (possibly significantly reducing the overall search cost).

  2. The adaptivity of gradient descent allows us to quickly converge to a local optimum and follow this optimum if it changes over time, significantly saving on search cost (over global methods such as grid search).

  3. In practice, many revenue curves have a unique local optimum (see Section 5), so gradient descent is likely to converge to the optimal reserve.

More specifically, since the seller has no direct access to the gradients (i.e., first-order information) of $R$, we consider approaches that fit in the framework of zeroth-order stochastic optimization. Our framework, summarized in Algorithm 1, proceeds in rounds. In round $t$, where the current reserve is $r_t$, the seller selects a perturbation size $\delta_t$ and randomly sets the reserve price to either $r_t + \delta_t$ or $r_t - \delta_t$ on separate slices of experiment traffic, until it has received $N$ samples from both $\mathcal{B}(r_t + \delta_t)$ and $\mathcal{B}(r_t - \delta_t)$. The seller then uses these samples to estimate the gradient of the revenue curve at $r_t$ and updates the reserve price based on this gradient estimate using learning rate (step size) $\eta_t$.

We assume that we have access to a fixed total number of samples (the number of iterations $T$ is a variable that will be fixed later). There is then a trade-off between $N$ (i.e., the number of samples per iteration) and $T$ (the number of iterations available to optimize the reserve price).

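  Input: Initial reserve $r_1$, total number of iterations $T$ (a variable to be fixed later).
  Output: Reserve prices $r_2, \dots, r_{T+1}$.
  for $t = 1, \dots, T$ do
     Set a reserve price of $r_t + \delta_t$ in $N$ auctions.
     Set a reserve price of $r_t - \delta_t$ in $N$ auctions.
     Construct an estimate $\hat{g}_t$ of the gradient of revenue at $r_t$, based on the feedback of experiments.
     Update reserve: $r_{t+1} = \Pi_{[0,1]}(r_t + \eta_t \hat{g}_t)$, where $\Pi_{[0,1]}$ denotes projection onto $[0, 1]$.
  end for
Algorithm 1 Zeroth-order stochastic projected gradient framework for reserve optimization.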
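A compact Python sketch of Algorithm 1 follows. The simplifications are ours, not the paper's: a constant perturbation size and step size, bids simulated from the perfect response model with linear shading ($\alpha = 0.3$) and uniform values, and the naive estimator of Eq. (2) as the gradient estimate. All function names are hypothetical. Because revenue is maximized, the update moves along the estimated gradient before projecting onto $[0, 1]$.

```python
import numpy as np


def sample_bids(r, n, alpha, rng):
    """Draw n maximum bids from B(r): v ~ Uniform[0, 1], perfect response, linear shading."""
    v = rng.uniform(0.0, 1.0, size=n)
    return np.where(v >= r, np.maximum(alpha * v, r), 0.0)


def naive_gradient(bids_plus, bids_minus, delta):
    """Naive discrete-gradient estimate: difference of average revenues over 2 * delta."""
    return (bids_plus.mean() - bids_minus.mean()) / (2.0 * delta)


def reserve_optimization(r1, T, N, delta, eta, alpha, rng):
    """Sketch of Algorithm 1 with constant delta and eta (a simplifying assumption).

    Revenue is maximized, so the projected step moves along the gradient estimate.
    """
    r = r1
    trajectory = [r]
    for _ in range(T):
        plus = sample_bids(r + delta, N, alpha, rng)    # N auctions at r + delta
        minus = sample_bids(r - delta, N, alpha, rng)   # N auctions at r - delta
        g_hat = naive_gradient(plus, minus, delta)
        r = float(np.clip(r + eta * g_hat, 0.0, 1.0))   # projection onto [0, 1]
        trajectory.append(r)
    return trajectory


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    traj = reserve_optimization(r1=0.1, T=50, N=2000, delta=0.05, eta=0.3, alpha=0.3, rng=rng)
    print(f"average of last 20 reserves ~ {np.mean(traj[-20:]):.3f}")
```

In this simulated setting the revenue curve is $0.15 + \tfrac{2}{3} r^2$ below $r = 0.3$ and $r(1 - r)$ above it, so the iterates should settle near $r^* = 0.5$. Swapping `naive_gradient` for a variance-reduced estimator changes only one line of the loop.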

Zeroth-order stochastic gradient descent is a well-studied problem [14, 4, 13, 20]. In this paper, we focus on taking advantage of the structure of $R$ to construct good discrete gradient estimates $\hat{g}_t$, as this aspect is specific to the problem of reserve price optimization. Specifically, we tackle the following problem, which we term the discrete gradient problem:

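  • Input: $N$ samples $b^+_1, \dots, b^+_N$ drawn i.i.d. from $\mathcal{B}(r + \delta)$ and $N$ samples $b^-_1, \dots, b^-_N$ drawn i.i.d. from $\mathcal{B}(r - \delta)$, for known $r$ and $\delta$.

  • Output: An estimator $\hat{g}$ for the discrete derivative $g = \frac{R(r + \delta) - R(r - \delta)}{2\delta}$. This estimator has bias $\beta = |\mathbb{E}[\hat{g}] - g|$ and variance $\sigma^2 = \mathbb{E}[(\hat{g} - \mathbb{E}[\hat{g}])^2]$, where the expectations are taken over the $2N$ samples.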

Solutions to the discrete gradient problem with small bias and variance directly translate into faster convergence rates for our gradient descent. We provide a detailed convergence result in Theorem C.2 in Appendix C.1. We summarize this result informally as follows.

Theorem 3.1 (Informal Restatement of Theorem C.2).

If for all , and then for optimal choices of and (and fixing ), Algorithm 1 satisfies

Here can be thought of as the true gradient at round (see Definition C.1 in Appendix).

Intuitively, we want to design an estimator and choose our parameters , so as to trade off between , , and . In the following sections, we show how to do this for a variety of bidder response models.

Naive Gradient Estimation

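The simplest method for estimating the discrete gradient is to take the difference between the average revenue from bids from $\mathcal{B}(r + \delta)$ and the average revenue from bids from $\mathcal{B}(r - \delta)$, divided by $2\delta$. More formally, we compute the discrete gradient as

$$\hat{g}_{\text{naive}} = \frac{1}{2\delta N} \left( \sum_{i=1}^{N} b^+_i - \sum_{i=1}^{N} b^-_i \right). \qquad (2)$$

We show that $\hat{g}_{\text{naive}}$ has the following properties.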

Theorem 3.2.

Assume that , then .

This leads to the following convergence rate via Theorem 3.1.

Corollary 3.1.

Using this estimator , and setting and , Algorithm 1 achieves convergence,

Although there are no matching lower bounds, this is the best known asymptotic convergence rate for zeroth-order optimization over a non-convex objective [14, 4]. The naive gradient estimation approach has the advantage that it works regardless of the response model, is simple to compute (it uses only revenue information and not individual bids), and leads to an unbiased estimator for the discrete derivative. The disadvantage is that the variance of this estimator can be large (especially as we take $\delta$ small). In the following section, we show how to address this by taking into account the inherent structure of the revenue objective based on an underlying bidder response model.
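The dependence on $\delta$ can be checked numerically. For independent samples, the variance of Eq. (2) equals $(\mathrm{Var}(b^+) + \mathrm{Var}(b^-)) / (4 N \delta^2)$, so shrinking $\delta$ by a factor of 10 inflates it by roughly 100. The sketch below repeats the experiment many times under hypothetical simulation assumptions (uniform values, linear shading with $\alpha = 0.3$, perfect response) and measures the empirical variance; the function names are illustrative.

```python
import numpy as np


def sample_bids(r, n, alpha, rng):
    """n maximum bids from B(r) under perfect response with linear shading, v ~ U[0, 1]."""
    v = rng.uniform(0.0, 1.0, size=n)
    return np.where(v >= r, np.maximum(alpha * v, r), 0.0)


def naive_gradient(bids_plus, bids_minus, delta):
    """Estimator (2): (mean bid at r + delta - mean bid at r - delta) / (2 delta)."""
    return (bids_plus.mean() - bids_minus.mean()) / (2.0 * delta)


def naive_variance(r, delta, n, trials, alpha, rng):
    """Empirical variance of the naive estimator over repeated independent experiments."""
    estimates = [
        naive_gradient(sample_bids(r + delta, n, alpha, rng),
                       sample_bids(r - delta, n, alpha, rng), delta)
        for _ in range(trials)
    ]
    return float(np.var(estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for delta in (0.1, 0.03, 0.01):
        print(delta, naive_variance(0.5, delta, 1000, 500, 0.3, rng))
```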

4 Variance Reduced Gradient Estimation

In this section, we first introduce another representation of the revenue formula by decomposing it into a demand component and a bidding component. We then propose techniques to reduce the variance of the discrete gradient of each component.

4.1 Revenue Decomposition

We can decompose the revenue in the following way.

Theorem 4.1.

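We have that

$$R(r) = r \big(1 - F(r)\big) + \mathbb{E}_{v \sim F}\left[ \big(b(v, r) - r\big) \cdot \mathbb{1}\{ v \ge r \} \right].$$
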
Define $D(r) = 1 - F(r)$ and $R_B(r) = \mathbb{E}_{v \sim F}[(b(v, r) - r) \cdot \mathbb{1}\{v \ge r\}]$, so that $R(r) = r\,D(r) + R_B(r)$. These two terms capture two different aspects of bidder behavior which contribute to revenue. The function $D(r)$ amounts to a “demand curve” which gives the proportion of values that clear the reserve $r$, and therefore the proportion of auctions that are bid on at $r$. If the auction were just a simple posted-price auction (i.e., the winner is charged the quoted price $r$), then the demand component $r\,D(r)$ would be the associated revenue. However, in a first-price auction the winning bidder pays its bid, not the reserve. Therefore the bidding component $R_B(r)$ captures the excess contribution from bids greater than the reserve.

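To construct a good estimator for the discrete gradient of $R$, it suffices to construct good estimators $\hat{g}_D$ and $\hat{g}_B$ for the discrete gradients of the demand component $r\,D(r)$ and the bidding component $R_B(r)$ respectively, and then output $\hat{g} = \hat{g}_D + \hat{g}_B$. Writing $(\beta_D, \sigma_D^2)$ and $(\beta_B, \sigma_B^2)$ for the bias and variance of $\hat{g}_D$ and $\hat{g}_B$, we have $\beta \le \beta_D + \beta_B$ and $\sigma^2 \le 2(\sigma_D^2 + \sigma_B^2)$, so it suffices to bound the bias and variance of each component separately.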
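A small sketch of the decomposition applied to a batch of observed bids. It relies on the convention of Definition 2.1 that a positive bid is at least $r$ exactly when $v \ge r$, so the empirical demand $\hat{D}(r)$ is simply the fraction of bids that clear the reserve, and the two components add up to the empirical revenue by construction. The function name is hypothetical, and the example bids are simulated under perfect response with linear shading ($\alpha = 0.3$) and uniform values.

```python
import numpy as np


def decompose_revenue(bids, r):
    """Split the empirical revenue at reserve r into demand and bidding components.

    Uses the convention b(v, r) = 0 when v < r and b(v, r) >= r otherwise, so for r > 0
    a bid clears the reserve exactly when the value does: 1{b >= r} = 1{v >= r}.
    """
    bids = np.asarray(bids, dtype=float)
    cleared = bids >= r                              # estimates D(r) = Pr[v >= r]
    demand_component = r * cleared.mean()           # posted-price revenue r * D(r)
    bidding_component = np.where(cleared, bids - r, 0.0).mean()  # excess over r
    return demand_component, bidding_component


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.uniform(0.0, 1.0, size=100_000)
    r = 0.2
    bids = np.where(v >= r, np.maximum(0.3 * v, r), 0.0)   # simulated perfect response
    d, b = decompose_revenue(bids, r)
    print(f"demand {d:.4f} + bidding {b:.4f} = revenue {bids.mean():.4f}")
```

In this simulation $D(0.2) = 0.8$, and the bidding component is small (about $1/60$) because linear shading with $\alpha = 0.3$ leaves few bids far above the reserve.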

4.2 Estimating the Demand Component Gradient

We begin by discussing how to estimate the gradient of the demand component of revenue. One method of doing so is by estimating $D$ with a parametric function $D_\theta$, and using this approximation to estimate the gradient. (See Appendix B for additional justification for why this is likely to be possible and helpful.) Suppose that we have access to additional historical data with which we can fit our parametric class to $D$; let $\hat{\theta}$ be the resulting learned parameter. This learned demand function gives rise to the following estimator $\hat{g}_D$:

$$\hat{g}_D = \frac{(r + \delta)\, D_{\hat{\theta}}(r + \delta) - (r - \delta)\, D_{\hat{\theta}}(r - \delta)}{2\delta}. \qquad (4)$$

This decreases the overall variance: the variance of $\hat{g}_D$ is 0, because the randomness of $\hat{\theta}$ only comes from the historical samples, which are independent of the samples obtained in the current round. The cost is a possible increase in bias (due to inaccuracy in estimating $D$).
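Given any fitted demand curve, the estimator in Eq. (4) is just a central difference of $r \cdot D(r)$. The sketch below assumes a hypothetical logistic family $D_\theta(r) = 1 / (1 + e^{-(\theta_0 + \theta_1 r)})$ purely for illustration; the function names are ours. With the exact uniform demand $D(r) = 1 - r$, the estimate equals $\frac{d}{dr}[r(1 - r)] = 1 - 2r$ exactly, because a central difference is exact for quadratics.

```python
import numpy as np


def demand_gradient(demand, r, delta):
    """Estimator (4): central difference of the fitted demand component r * D(r).

    `demand` is any fitted curve r -> D(r). It is fixed during the current round,
    so this estimate has no variance with respect to the round's fresh samples.
    """
    upper = (r + delta) * demand(r + delta)
    lower = (r - delta) * demand(r - delta)
    return (upper - lower) / (2.0 * delta)


def logistic_demand(theta0, theta1):
    """Hypothetical parametric family D_theta(r) = 1 / (1 + exp(-(theta0 + theta1 * r)))."""
    return lambda r: 1.0 / (1.0 + np.exp(-(theta0 + theta1 * r)))


if __name__ == "__main__":
    # Exact uniform demand D(r) = 1 - r: the result is 1 - 2r (about 0.4 at r = 0.3).
    print(demand_gradient(lambda r: 1.0 - r, r=0.3, delta=0.05))
    print(demand_gradient(logistic_demand(3.0, -6.0), r=0.3, delta=0.05))
```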

4.3 Estimating the Bidding Component Gradient

In this section we propose a variance reduction method to obtain a better estimator $\hat{g}_B$ for a variety of bidder models.

Variance reduction via bid truncation.

We first consider the special case of the perfect response (and more generally, the $\epsilon$-bounded response) bidding model. In the perfect response model, if you were going to bid $b \ge r + \delta$ when the reserve was $r - \delta$, you will bid the same bid when the reserve is $r + \delta$. This means that large bids (bids larger than $r + \delta$) do not contribute in expectation to the discrete derivative $g$ of revenue, but they do add noise to our gradient estimate. By filtering these out, we can reduce the variance of our estimator while keeping it unbiased.

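Since we only apply this filtering when estimating the bidding component $R_B$ but not the demand component $r\,D(r)$, we must be careful when implementing this. Note that a large bid $b$ contributes $b - (r + \delta)$ to $R_B(r + \delta)$ and $b - (r - \delta)$ to $R_B(r - \delta)$, and therefore $-2\delta$ to $R_B(r + \delta) - R_B(r - \delta)$. We can therefore construct an unbiased estimator for $R_B(r + \delta) - R_B(r - \delta)$ by computing the contribution of unfiltered bids ($b \le r + \delta$) from both $\mathcal{B}(r + \delta)$ and $\mathcal{B}(r - \delta)$ and then adding $-2\delta$ for each filtered bid in $\mathcal{B}(r - \delta)$ (or equivalently, each filtered bid in $\mathcal{B}(r + \delta)$; under perfect response, the fraction of filtered bids is equal in both models in expectation). Note that every bid from $\mathcal{B}(r + \delta)$ is either filtered or has excess $0$, so we can write this gradient entirely in terms of bids from $\mathcal{B}(r - \delta)$. Formally, we define the truncated bid as

$$\bar{b}^-_i = \min\big(b^-_i,\; r + \delta\big).$$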

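Our estimate for the gradient of $R_B$ is then given by

$$\hat{g}_B = -\frac{1}{2\delta N} \sum_{i=1}^{N} \big(\bar{b}^-_i - (r - \delta)\big) \cdot \mathbb{1}\{ b^-_i > 0 \}. \qquad (5)$$
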
Since any bid in an $\epsilon$-bounded model only differs from one in the perfect response model by at most $\epsilon$, we can apply this same estimator to an $\epsilon$-bounded response model. The following theorem characterizes the bias and variance of the estimator for the $\epsilon$-bounded response model.

Theorem 4.2.

Assume that , then the estimator in Eq. (5) for -bounded response model, satisfies: .

Note that the bias of estimator $\hat{g}_B$ is 0 for the perfect response model. The complete proof is given in Appendix C.5. Combining the above results for $\hat{g}_D$ and $\hat{g}_B$, we have the following improved convergence result for the $\epsilon$-bounded response model.

Corollary 4.1.

Suppose . Using the estimator proposed in Eq. (5) for the -bounded response model, setting and , Algorithm 1 achieves convergence, .

For perfect response bidding models, the above convergence rate is strictly faster than the convergence rate of the naive estimator in Corollary 3.1 (the state-of-the-art convergence rate for zeroth-order stochastic gradient descent), but with additional bias coming from demand estimation. However, we show in our experiments that this bias has a practically negligible effect on revenue.
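The bid-truncation estimator can be sketched as follows. This is our own rendering of the estimator described above, under stated assumptions (perfect response, linear shading with $\alpha = 0.3$, uniform values, and $r > \delta$), with hypothetical function names. It compares the truncated estimate against an untruncated difference of mean excess bids computed on the same simulated traffic.

```python
import numpy as np


def sample_bids(r, n, alpha, rng):
    """n maximum bids from B(r): v ~ U[0, 1], perfect response, linear shading."""
    v = rng.uniform(0.0, 1.0, size=n)
    return np.where(v >= r, np.maximum(alpha * v, r), 0.0)


def truncated_bidding_gradient(bids_minus, r, delta):
    """Bid-truncation estimate of the bidding-component gradient, using only B(r - delta).

    Each positive bid is truncated at r + delta, so a filtered (large) bid contributes
    exactly -2 * delta to R_B(r + delta) - R_B(r - delta), as described in the text.
    """
    b = np.asarray(bids_minus, dtype=float)
    truncated = np.minimum(b, r + delta)
    contrib = np.where(b > 0, truncated - (r - delta), 0.0)
    return -contrib.mean() / (2.0 * delta)


def naive_bidding_gradient(bids_plus, bids_minus, r, delta):
    """Untruncated baseline (assumes r > delta): difference of mean excess bids."""
    ex_plus = np.maximum(bids_plus - (r + delta), 0.0)
    ex_minus = np.maximum(bids_minus - (r - delta), 0.0)
    return (ex_plus.mean() - ex_minus.mean()) / (2.0 * delta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, delta, alpha, n = 0.2, 0.01, 0.3, 1000
    trunc, naive = [], []
    for _ in range(400):
        plus = sample_bids(r + delta, n, alpha, rng)
        minus = sample_bids(r - delta, n, alpha, rng)
        trunc.append(truncated_bidding_gradient(minus, r, delta))
        naive.append(naive_bidding_gradient(plus, minus, r, delta))
    print("truncated: mean %.3f, var %.2e" % (np.mean(trunc), np.var(trunc)))
    print("naive:     mean %.3f, var %.2e" % (np.mean(naive), np.var(naive)))
```

In this simulation the bidding component is $R_B(r) = 0.15 - r + \tfrac{5}{3} r^2$ for $r \le 0.3$, so both estimators should average close to $-1/3$ at $r = 0.2$. The truncated one has a much smaller spread because each of its per-sample terms lies in $[-1, 0]$ after dividing by $2\delta$, no matter how small $\delta$ is.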

Variance reduction via quantile truncation.

In Eq. (5), we reduced the variance of $\hat{g}_B$ by truncating all bids at the fixed threshold of $r + \delta$. In general, this does not quite work: for bidder response models that are far from perfect response, this truncation can introduce a very large bias. Here we demonstrate one technique for constructing good estimators as long as the bidding function possesses diminishing sensitivity in value to reserve.

Instead of truncating in bid space, we truncate in value space to reduce the variance. Even though we cannot directly truncate by values, since $b(v, r)$ is monotonically increasing in $v$, quantiles of bids (e.g., of $\mathcal{B}(r + \delta)$ and $\mathcal{B}(r - \delta)$) directly correspond to quantiles of values (of $F$). Instead of setting a threshold directly on the value, it is therefore equivalent to truncate at a fixed quantile of the bid distribution.

To achieve this, we first sort $\{b^+_i\}$ and $\{b^-_i\}$ in ascending order. Then we compute $\hat{g}_B$ as


where $q$ is the quantile threshold used to truncate bids. The following theorem characterizes the bias and variance of the above $\hat{g}_B$.

Theorem 4.3.

Let , , and . Then the estimator in Eq. (6) satisfies, .

Unlike with bid truncation, with quantile truncation we have a clear bias-variance tradeoff as we change : larger values of decrease the bias (both by decreasing and , which is decreasing due to diminishing sensitivity) but lead to larger variance. Since one can estimate this bound on the bias (by approximating via ), it is possible to choose to optimize this bias-variance tradeoff as one sees fit (for example, to minimize in Theorem 3.1). We show a convergence rate result for this quantile truncation approach in Corollary C.1 in Appendix C.7.
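A hypothetical sketch of quantile truncation, not necessarily identical to Eq. (6): sort both samples, keep the lowest $k = \lceil qN \rceil$ order statistics of each, and credit each of the remaining top bids with $-2\delta$, as if (by diminishing sensitivity) those high-value bidders did not react to the reserve change. With $q = 1$ it reduces to the untruncated difference of mean excess bids; smaller $q$ lowers variance at the cost of bias. The function name is ours, and both samples are assumed to have the same size $N$ with $r > \delta$.

```python
import math

import numpy as np


def quantile_bidding_gradient(bids_plus, bids_minus, r, delta, q):
    """Hypothetical quantile-truncation estimate of the bidding-component gradient.

    Keeps the lowest k = ceil(q * N) bids of each sorted sample. The top N - k bids are
    assumed unaffected by the reserve change, so each is credited with exactly -2 * delta.
    """
    plus = np.sort(np.asarray(bids_plus, dtype=float))
    minus = np.sort(np.asarray(bids_minus, dtype=float))
    n = plus.size
    k = math.ceil(q * n)
    ex_plus = np.maximum(plus[:k] - (r + delta), 0.0)    # excess over r + delta
    ex_minus = np.maximum(minus[:k] - (r - delta), 0.0)  # excess over r - delta
    total = ex_plus.sum() - ex_minus.sum() - 2.0 * delta * (n - k)
    return total / (2.0 * delta * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, delta = 0.2, 0.01

    def sample(res, n):
        # Simulated perfect response with linear shading 0.3 and uniform values.
        v = rng.uniform(0.0, 1.0, size=n)
        return np.where(v >= res, np.maximum(0.3 * v, res), 0.0)

    for q in (1.0, 0.95, 0.5):
        est = quantile_bidding_gradient(sample(r + delta, 1000), sample(r - delta, 1000), r, delta, q)
        print(q, round(est, 3))
```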

5 Experiments

We evaluate the performance of our algorithms on synthetic and semi-synthetic data sets. Due to space limitations, we present the complete experimental results in Appendix D.

5.1 Data Generation

The data generation process consists of two parts: a base bid distribution specifying the distribution of bids when no reserve is set, and a response model describing how a bidder with bid $b$ would update its bid in response to a reserve of $r$.

Response models. We assume that in the absence of a reserve, bidders bid a constant fraction of their value (i.e., $b(v, 0) = \alpha v$ for a shading factor $\alpha$), which we refer to as linear shading. We consider linear shading combined with perfect response and with $\epsilon$-bounded response, which we implement by adding a uniform random variable to the bid. We also examine equilibrium bidding for $n$ i.i.d. bidders with uniformly distributed valuations [19]: $b(v, r) = \frac{n-1}{n} v + \frac{r^n}{n v^{n-1}}$ for $v \ge r$.
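The equilibrium bidding function above can be sketched in Python as follows; the helper names are hypothetical. The highest bid comes from the highest of the $n$ uniform values, so revenue can be simulated directly. For two bidders the result can be checked against the classical revenue-equivalence values: $1/3$ without a reserve and $5/12$ at the reserve $r = 1/2$.

```python
import numpy as np


def equilibrium_bid(v, r, n):
    """Symmetric first-price equilibrium bid with reserve r for n i.i.d. U[0, 1] bidders.

    b(v, r) = (n - 1) / n * v + r**n / (n * v**(n - 1)) for v >= r, and 0 otherwise.
    """
    v = np.asarray(v, dtype=float)
    safe_v = np.maximum(v, 1e-12)   # avoid dividing by zero where the bid is 0 anyway
    bid = (n - 1) / n * v + r**n / (n * safe_v ** (n - 1))
    return np.where(v >= r, bid, 0.0)


def equilibrium_revenue_mc(r, n, n_auctions, rng):
    """Monte Carlo revenue: the winning bid is placed by the highest of n uniform values."""
    v_max = rng.uniform(0.0, 1.0, size=(n_auctions, n)).max(axis=1)
    return float(equilibrium_bid(v_max, r, n).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for r in (0.0, 0.25, 0.5, 0.75):
        print(r, round(equilibrium_revenue_mc(r, 2, 200_000, rng), 4))
```

At $v = r$ the formula gives exactly $b(r, r) = r$, and since $r^n \le v^n$ for $v \ge r$, the bid never exceeds the value, consistent with Definition 2.1.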

Synthetic data. In our synthetic data sets, the (base) bid distribution is the uniform distribution. We apply the perfect response model, -bounded response model and equilibrium bidding model. In the simulations, we apply a constant shading factor of for the perfect response model and -bounded response model. For equilibrium bidding, we assume that each auction contains bidders.

Semi-synthetic data.

For our semi-synthetic data sets, we separately collected the empirical distributions of winning bids over one day for 20 large publishers on a major display ad exchange. Each distribution was filtered for outliers and normalized to the interval $[0, 1]$. For this semi-synthetic data we only test the perfect response model and the $\epsilon$-bounded response model, since there is no closed-form solution for the equilibrium bidding strategy. We use 0.3 as the constant shading factor for semi-synthetic data.

5.2 Methodology

Gradient descent algorithms. We examine five different gradient descent algorithms: (I) Naive GD: naive gradient descent using the gradient estimator in Eq. (2); (II) Naive GD with bid truncation: gradient descent using the gradient estimator in Eq. (5) for the bidding component, and a naive estimate⁴ of the demand component; (III) Naive GD with quantile truncation: gradient descent using the gradient estimator in Eq. (6) for the bidding component, and a naive estimate of the demand component; (IV) Demand modeling with bid truncation: same as the second variant, but with a parametric model of the demand curve to estimate the demand component of the gradient; (V) Demand modeling with quantile truncation: same as the third variant, but with a parametric model of the demand curve to estimate the demand component of the gradient. The parameters used in these algorithms are specified in Appendix D.

⁴ We can form a naive unbiased estimator $\hat{g}_D = \frac{(r + \delta)\hat{D}(r + \delta) - (r - \delta)\hat{D}(r - \delta)}{2\delta}$, where $\hat{D}(r + \delta) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\{b^+_i \ge r + \delta\}$ and similarly for $\hat{D}(r - \delta)$.

Figure 1: Revenue as a function of round for (a) synthetic data with perfect response, (b) synthetic data with equilibrium response, and (c) semi-synthetic data with perfect response.

Demand curve estimation. To reduce variance following the ideas of Section 4, we need a model for the demand component of the discrete gradient. Instead of estimating $D$ from historical data, we adaptively learn the demand curve during the training process. Concretely, at each round $t$, we observe new (reserve, demand) pairs from the samples collected in that round and retrain our demand curve using all the samples observed up to the current round. We use this trained demand curve to compute $\hat{g}_D$ based on Eq. (4). For the synthetic data, a simple logistic regression can effectively learn the demand curve. However, the semi-synthetic data required a more flexible model, so for this case we model demand using a fully connected neural network with 1 hidden layer, 15 hidden nodes and ReLU activations.
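The paper fits the synthetic demand curve with logistic regression; the sketch below is one way to do so, assuming each observation is a (reserve, cleared) pair with cleared $= \mathbb{1}\{b \ge r\}$. The Newton (iteratively reweighted least squares) fitting routine and the names `fit_logistic_demand` and `logistic_curve` are our choices, not necessarily the authors'. Refitting on all pairs seen so far after each round mirrors the adaptive procedure described above, and the fitted curve can then be plugged into Eq. (4). For the demo, reserves are simply spread uniformly over $[0, 1]$, which is an illustrative assumption.

```python
import numpy as np


def fit_logistic_demand(reserves, cleared, iters=25):
    """Fit D_theta(r) = sigmoid(theta0 + theta1 * r) to (reserve, cleared) pairs.

    Uses Newton's method (iteratively reweighted least squares) on the log-likelihood.
    `cleared[i]` is 1 if the auction at reserve `reserves[i]` received a bid >= reserve.
    """
    r = np.asarray(reserves, dtype=float)
    y = np.asarray(cleared, dtype=float)
    X = np.column_stack([np.ones_like(r), r])
    theta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        w = p * (1.0 - p)                    # Bernoulli variance weights
        grad = X.T @ (y - p)                 # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])        # negative Hessian of the log-likelihood
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def logistic_curve(theta):
    """Return the fitted demand curve r -> D_theta(r)."""
    return lambda r: 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * r)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seen_r, seen_y = [], []
    for t in range(5):                               # hypothetical rounds
        r_t = rng.uniform(0.0, 1.0, size=4000)       # reserves tried this round
        v = rng.uniform(0.0, 1.0, size=4000)         # true demand D(r) = 1 - r
        seen_r.append(r_t)
        seen_y.append(v >= r_t)
        theta = fit_logistic_demand(np.concatenate(seen_r), np.concatenate(seen_y))
        print(t, np.round(theta, 2), round(float(logistic_curve(theta)(0.5)), 3))
```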

Figure 2: Revenue as a function of round for synthetic data with equilibrium response.

5.3 Evaluation

Effectiveness of gradient descent. First, we confirm that gradient descent can effectively find optimal reserves in our models. For each semi-synthetic model, we construct the revenue curve as a function of reserve with assumed response models. We find that 19 out of the 20 revenue curves have a clear single local maximum (the remaining curve has 2). In all cases (synthetic and semi-synthetic models), the revenue learned by the naive gradient descent algorithm is at least 95% of the revenue at the optimal reserve, which indicates that gradient descent can efficiently find the optimal reserve in these cases despite the lack of convexity.

Effectiveness of variance reduction methods. We first evaluate the performance of the quantile-based variance reduction method. We run the algorithm variants (I), (III) and (V) on synthetic and semi-synthetic data with multiple bidder response models. Figures 1(a) and 1(c) show the revenue achieved by the three algorithms over time under the perfect response model. We find that quantile-based variance reduction leads to a more stable training process which converges faster than naive gradient descent. Figure 1(b) evaluates the performance of the three algorithm variants on synthetic data with an equilibrium response model, with similar conclusions. Overall, quantile-based variance reduction outperforms naive gradient descent. Moreover, with the addition of demand curve estimation, algorithm variant (V) achieves better revenue and converges to an optimal reserve faster than the other two algorithms, in agreement with our theoretical guarantees.

We next consider variance reduction using bid truncation, which is used in algorithm variants (II) and (IV). Bid truncation is tailored to perfect response and performs the best overall for this response model, in accordance with the theoretical guarantees, but quantile truncation is competitive and often performs as well over the semi-synthetic data (see Appendix D for a detailed comparison). Under the equilibrium response model, bid truncation can in fact hinder the training process and lead to a substantially suboptimal reserve price (see Figure 2). In summary, quantile-based variance reduction coupled with a good demand-curve estimation is the method of choice to achieve good reserve prices under a range of different bid distributions and bidder response models.


  • Agarwal et al. [2010] Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT), June 2010.
  • Agarwal et al. [2011] Alekh Agarwal, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Alexander Rakhlin. Stochastic convex optimization with bandit feedback. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1035–1043, 2011.
  • Amin et al. [2013] Kareem Amin, Afshin Rostamizadeh, and Umar Syed. Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems 26, 2013.
  • Balasubramanian and Ghadimi [2018] Krishnakumar Balasubramanian and Saeed Ghadimi. Zeroth-order nonconvex stochastic optimization: Handling constraints, high-dimensionality and saddle-points, 2018.
  • Benes [2017] Ross Benes. How SSPs use deceptive price floors to squeeze ad buyers., September 2017. Accessed: 2020-01-29.
  • Bigler [2019] Jason Bigler. Rolling out first price auctions to google ad manager partners digiday. products/admanager/rolling-out-first-price-auctions-google-ad-manager-partners, September 2019. Accessed: 2020-01-27.
  • Blum et al. [2003] Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu. Online learning in online auctions. Theor. Comput. Sci., 324:137–146, 2003.
  • Boucheron et al. [2012] Stéphane Boucheron, Maud Thomas, et al. Concentration inequalities for order statistics. Electronic Communications in Probability, 17, 2012.
  • Cesa-Bianchi et al. [2015] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory, 61(1):549–564, Jan 2015.
  • Chen [2017] Yuyu Chen. Programmatic advertising is preparing for the first-price auction era. programmatic-advertising-readying-first-price-auction-era, October 2017. Accessed: 2020-01-29.
  • Flaxman et al. [2005] Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, pages 385–394, Philadelphia, PA, USA, 2005.
  • Gentle [2009] James E Gentle. Computational statistics, volume 308. Springer, 2009.
  • Ghadimi [2019] Saeed Ghadimi. Conditional gradient type methods for composite nonlinear and stochastic optimization. Mathematical Programming, 173(1):431–464, Jan 2019.
  • Ghadimi and Lan [2013] Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  • Ghadimi et al. [2013] Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 155:267–305, 2013.
  • Hazan and Levy [2014] Elad Hazan and Kfir Y. Levy. Bandit convex optimization: Towards tight bounds. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’14, pages 784–792, 2014.
  • Huang et al. [2018] Zhiyi Huang, Jinyan Liu, and Xiangning Wang. Learning optimal reserve price against non-myopic bidders. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, 2018.
  • IAB [2016] IAB. OpenRTB API Specification, Version 2.5. www.iab.com/wp-content/uploads/2016/03/OpenRTB-API-Specification-Version-2-5-FINAL.pdf, December 2016. Accessed: 2020-01-26.
  • Krishna [2009] Vijay Krishna. Auction theory. Academic press, 2009.
  • Liu et al. [2018] S. Liu, X. Li, P. Chen, J. Haupt, and L. Amini. Zeroth-order stochastic projected gradient descent for nonconvex optimization. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1179–1183, Nov 2018.
  • Mao et al. [2018] Jieming Mao, Renato Leme, and Jon Schneider. Contextual pricing for lipschitz buyers. In Advances in Neural Information Processing Systems 31, pages 5643–5651, 2018.
  • Matthews [1995] Steven A. Matthews. A Technical Primer on Auction Theory I: Independent Private Values. Discussion Papers 1096, Northwestern University, Center for Mathematical Studies in Economics and Management Science, May 1995.
  • Mohri and Medina [2016] Mehryar Mohri and Andrés Muñoz Medina. Learning algorithms for second-price auctions with reserve. J. Mach. Learn. Res., 17(1):2632–2656, January 2016.
  • Munoz and Vassilvitskii [2017] Andres Munoz and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. In Advances in Neural Information Processing Systems 30, pages 1858–1866, 2017.
  • Myerson [1981] R. Myerson. Optimal auction design. Mathematics of Operations Research, 6:58–73, 1981.
  • Ostrovsky and Schwarz [2011] Michael Ostrovsky and Michael Schwarz. Reserve prices in internet advertising auctions: A field experiment. In Proceedings of the 12th ACM Conference on Electronic Commerce, EC ’11, pages 59–60, 2011.
  • Paes Leme et al. [2016] Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th international conference on world wide web, pages 1093–1102, 2016.
  • Reddi et al. [2016] Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, and Alexander J. Smola. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In NIPS, 2016.
  • Riley et al. [1981] John G Riley, William F Samuelson, et al. Optimal auctions. American Economic Review, 71(3):381–392, 1981.
  • Weiss [2019] Mark Weiss. Digiday research: Header bidding and first-price auctions boost publisher revenues. media/digiday-research-header-bidding-and-first-price-auctions-boost-publisher-revenues, January 2019. Accessed: 2020-01-29.

Appendix A From multiple bidders to a single bidder

Our auction can contain multiple bidders, each bidder $i$ with their own value distribution $F_i$ and bid function $b_i(v_i, r)$. When setting reserve prices, however, we only care about the maximum bid; more specifically, about the distribution of the maximum bid at each reserve. Thus it is useful to abstract away the set of bidders in the auction as a single “mega-bidder” whose value is the maximum of all the bidders’ values and who always bids the maximum of all the bids.

Theorem A.1.

Let $F$ be the distribution of $v = \max_i v_i$ (where each $v_i$ is independently drawn from $F_i$) and let $G_r$ be the distribution of $\max_i b_i(v_i, r)$. Then there exists a bid function $b(v, r)$ such that the distribution of $b(v, r)$ when $v \sim F$ is equal to the distribution $G_r$.


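Proof. If $F$ is the CDF of $v$ and $G_r$ is the CDF of $\max_i b_i(v_i, r)$, let $b(v, r) = G_r^{-1}(F(v))$, where $G_r^{-1}(u) = \inf\{x : G_r(x) \ge u\}$ is the generalized inverse. Assuming $F$ is continuous, $F(v)$ is uniformly distributed on $[0, 1]$ when $v \sim F$, which guarantees that if $v \sim F$, then $b(v, r) \sim G_r$. ∎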
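A minimal simulation of the quantile construction in the proof, under hypothetical assumptions that are ours and not the paper's: three bidders with independent values $v_i \sim \mathrm{Uniform}[0, h_i]$ who, when $v_i \ge r$, bid $\max(r, s_i v_i)$ for made-up shading factors $s_i$, and otherwise submit no bid (recorded as 0). Here $F$ is the exact, continuous CDF of $\max_i v_i$, $G_r$ is replaced by the empirical distribution of simulated maximum bids, and the mega-bidder bids $G_r^{-1}(F(v))$ using the generalized inverse. The names `HIGHS`, `SHADES`, and the helper functions are illustrative only.

```python
import numpy as np

# Hypothetical bidders (illustrative, not from the paper):
# v_i ~ Uniform[0, HIGHS[i]]; bid max(r, SHADES[i] * v_i) if v_i >= r, else no bid (0).
HIGHS = np.array([1.0, 0.8, 1.2])
SHADES = np.array([0.7, 0.8, 0.6])

def individual_bids(values, r):
    """Per-bidder bids b_i(v_i, r) for an (n, k) array of values."""
    return np.where(values >= r, np.maximum(r, SHADES * values), 0.0)

def mega_value_cdf(v):
    """Exact CDF F of max_i v_i for independent uniform values (continuous)."""
    v = np.asarray(v, dtype=float)[..., None]
    return np.prod(np.clip(v / HIGHS, 0.0, 1.0), axis=-1)

def generalized_inverse(sorted_samples, u):
    """Empirical G^{-1}(u) = inf{x : G(x) >= u}, computed from sorted samples."""
    n = len(sorted_samples)
    idx = np.clip(np.ceil(np.asarray(u) * n).astype(int) - 1, 0, n - 1)
    return sorted_samples[idx]

def simulate(r=0.3, n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    # Samples of the maximum bid, used as the empirical stand-in for G_r.
    max_bids = individual_bids(rng.uniform(0.0, HIGHS, size=(n, 3)), r).max(axis=1)
    # Fresh, independent samples of the mega-bidder's value v = max_i v_i ~ F.
    mega_values = rng.uniform(0.0, HIGHS, size=(n, 3)).max(axis=1)
    # Mega-bidder bid function b(v, r) = G_r^{-1}(F(v)).
    coupled = generalized_inverse(np.sort(max_bids), mega_value_cdf(mega_values))
    return coupled, max_bids

coupled, max_bids = simulate()
levels = [0.1, 0.3, 0.5, 0.7, 0.9]
print(np.quantile(coupled, levels))
print(np.quantile(max_bids, levels))
```

The two printed quantile vectors should agree up to sampling error, including around the atoms at 0 (no bid) and at the reserve itself, which the generalized inverse handles without special cases.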

Note that this reduction also preserves the properties of Definition 2.1. For example, if for every bidder , then the induced also satisfies .

Appendix B Demand Function Estimation

As in Section 3, it is possible to form a naive unbiased estimate of the demand component via the estimator . The variance of the resulting unbiased estimator is then bounded by (see Theorem C.5), .

Note that for small $r$, the variance guarantee here is significantly better than the variance guarantee in Theorem 3.2. Thus, in instances where the optimal reserve is small (and hence we mostly test small $r$), combining this naive estimator with better estimators for the bidding component (like the ones we explore in the next section) can already lead to better convergence rates overall.

To obtain even better estimators, we can leverage the following two facts about the demand function. First, the demand function depends only on the value distribution of the bidders, not on their specific bidding behavior. Since we expect values to be relatively stable compared with bidding behavior, we can reasonably use data from previous rounds to learn the demand function and inform the calculation of the demand component (whereas the naive gradient update uses only data from the current round). Second, we expect the demand function to be simpler and more nicely structured than the full revenue function (for example, demand is weakly decreasing in the reserve) and therefore more amenable to parametric modeling.
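A minimal sketch of pooling rounds with a parametric demand model. The assumptions are ours, not the paper's: logs pooled from several rounds record, for each auction, the reserve tested and whether the item sold; a sale at reserve $r$ is taken to occur exactly when the mega-bidder's value is at least $r$, so the sale probability equals the demand $D(r) = \Pr(v \ge r)$; and demand is modeled as $D(r) = e^{-\lambda r}$, which is weakly decreasing by construction. The rate $\lambda$ is fit by maximum likelihood. The Bernoulli log-likelihood is concave in $\lambda$, so a golden-section search suffices, and the fitted curve yields a smooth slope $D'(r) = -\lambda e^{-\lambda r}$ at any reserve. The function names are hypothetical.

```python
import math
import numpy as np

def log_likelihood(lam, reserves, cleared):
    """Bernoulli log-likelihood of sale outcomes under D(r) = exp(-lam * r)."""
    x = lam * reserves
    # log D(r) = -lam * r; log(1 - D(r)) via expm1 for numerical stability (needs r > 0).
    return float(np.sum(np.where(cleared, -x, np.log(-np.expm1(-x)))))

def fit_demand_rate(reserves, cleared, lo=1e-3, hi=50.0, iters=100):
    """Maximise the concave log-likelihood in lam by golden-section search on [lo, hi]."""
    reserves = np.asarray(reserves, dtype=float)
    cleared = np.asarray(cleared, dtype=bool)
    if np.any(reserves <= 0):
        raise ValueError("reserves must be positive")
    ll = lambda lam: log_likelihood(lam, reserves, cleared)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = ll(c), ll(d)
    for _ in range(iters):
        if fc > fd:   # maximiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = ll(c)
        else:         # maximiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = ll(d)
    return 0.5 * (a + b)

def demand(r, lam):
    """Fitted demand curve D(r) = exp(-lam * r)."""
    return np.exp(-lam * np.asarray(r, dtype=float))

def demand_slope(r, lam):
    """D'(r) = -lam * exp(-lam * r), the slope used by a demand-component estimate."""
    return -lam * demand(r, lam)

# Synthetic pooled logs: reserves tested across rounds, values ~ Exp(rate 2).
rng = np.random.default_rng(1)
reserves = rng.uniform(0.1, 1.5, size=20_000)
cleared = rng.exponential(scale=0.5, size=20_000) >= reserves
lam_hat = fit_demand_rate(reserves, cleared)
print(lam_hat, demand_slope(0.5, lam_hat))
```

Because the fit uses every logged auction rather than only the current round's perturbations, its slope estimate is far less noisy than a per-round finite difference, at the cost of bias if the exponential form is misspecified.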

Appendix C Omitted Proofs

C.1 Formal convergence rate

To show a convergence result for a non-convex problem with constraints, a measure called the gradient mapping is widely used in the literature (e.g., [15, 28, 20]). We define the gradient mapping used in this paper as follows.

Definition C.1 (Gradient Mapping).

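Let $R$ be a differentiable function defined on $\mathcal{X}$, where $\mathcal{X}$ is a convex space, and let $\Pi_{\mathcal{X}}$ be the projection operator defined as

$$\Pi_{\mathcal{X}}(y) = \arg\min_{x \in \mathcal{X}} \|x - y\|_2 .$$

The gradient mapping is then defined as

$$\mathcal{G}(r, \hat{g}, \eta) = \frac{1}{\eta}\left(\Pi_{\mathcal{X}}(r + \eta \hat{g}) - r\right),$$

where $\hat{g}$ is a (possibly biased) estimate of $\nabla R(r)$, $r$ is the reserve, and $\eta$ is the learning rate.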

The gradient mapping can be interpreted as a projected gradient: it gives the direction of a feasible update from the previous reserve $r$. In particular, if the projection operator is the identity, the gradient mapping simply returns the gradient estimate itself.
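A minimal sketch of the gradient mapping, assuming (hypothetically) that the feasible set is a box $[lo, hi]$, so that Euclidean projection is coordinate-wise clipping, and that the update is an ascent step because revenue is maximized. The sketch also checks numerically two standard properties of the gradient mapping for projection onto a convex set, $\langle \hat g, \mathcal{G} \rangle \ge \|\mathcal{G}\|^2$ and $\|\mathcal{G}(r, \hat g_1, \eta) - \mathcal{G}(r, \hat g_2, \eta)\| \le \|\hat g_1 - \hat g_2\|$. These are the kind of inequalities convergence proofs of this type rely on; we have not confirmed that they coincide exactly with Lemma C.4.

```python
import numpy as np

def project(y, lo, hi):
    """Euclidean projection onto the box [lo, hi] (coordinate-wise clipping)."""
    return np.clip(y, lo, hi)

def gradient_mapping(r, g_hat, eta, lo, hi):
    """G(r, g_hat, eta) = (Pi(r + eta * g_hat) - r) / eta for an ascent step."""
    return (project(r + eta * g_hat, lo, hi) - r) / eta

def worst_case_property_gaps(trials=5000, dim=3, eta=0.5, seed=0):
    """Return (min of <g,G> - ||G||^2, max of ||G1-G2|| - ||g1-g2||) over random draws."""
    rng = np.random.default_rng(seed)
    lo, hi = np.zeros(dim), np.ones(dim)
    min_inner, max_lip = np.inf, -np.inf
    for _ in range(trials):
        r = rng.uniform(lo, hi)                         # feasible reserve vector
        g1, g2 = rng.normal(scale=3.0, size=(2, dim))   # two gradient estimates
        G1 = gradient_mapping(r, g1, eta, lo, hi)
        G2 = gradient_mapping(r, g2, eta, lo, hi)
        min_inner = min(min_inner, float(g1 @ G1 - G1 @ G1))
        max_lip = max(max_lip, float(np.linalg.norm(G1 - G2) - np.linalg.norm(g1 - g2)))
    return min_inner, max_lip

# Interior point: the projection is inactive, so the mapping returns the estimate.
print(gradient_mapping(0.5, 0.2, 0.1, 0.0, 1.0))    # approximately 0.2
# Near the upper bound: the step is truncated, so the mapping is smaller than the estimate.
print(gradient_mapping(0.95, 2.0, 0.1, 0.0, 1.0))   # approximately 0.5
print(worst_case_property_gaps())                   # both gaps at most rounding error
```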

Theorem C.2.

Suppose is -smooth. Let be the gradient estimator at time , where almost surely. Fix , and . Assume for all that and . Then with probability at least , Algorithm 1 satisfies that

where is the gradient mapping.
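Algorithm 1 is not reproduced in this section, so the following is only a generic projected stochastic gradient ascent loop, which may differ from Algorithm 1 and does not use the paper's gradient estimators. It runs on a hypothetical toy revenue curve $R(r) = r(1 - r)$ on $[0, 1]$ (which is 2-smooth), with bounded uniform noise added to the true gradient. It then reports the average squared gradient mapping along the iterates, evaluated with the true gradient, which is the kind of quantity a bound like Theorem C.2 controls.

```python
import numpy as np

def projected_sga(grad_est, r0, eta, steps, lo, hi):
    """Generic projected stochastic gradient ascent: r_{t+1} = clip(r_t + eta * g_hat_t)."""
    path = [float(r0)]
    for t in range(steps):
        r = path[-1]
        path.append(float(np.clip(r + eta * grad_est(r, t), lo, hi)))
    return np.array(path)

def run_toy(eta=0.1, steps=2000, seed=0):
    """Toy instance: R(r) = r (1 - r) on [0, 1], gradient noise ~ Uniform[-0.5, 0.5]."""
    rng = np.random.default_rng(seed)
    true_grad = lambda r: 1.0 - 2.0 * r
    noisy_grad = lambda r, t: true_grad(r) + rng.uniform(-0.5, 0.5)
    path = projected_sga(noisy_grad, r0=0.05, eta=eta, steps=steps, lo=0.0, hi=1.0)
    # Gradient mapping at each iterate, evaluated with the true gradient.
    gm = (np.clip(path + eta * true_grad(path), 0.0, 1.0) - path) / eta
    return path, float(np.mean(gm ** 2))

path, avg_sq = run_toy()
print(path[-1], avg_sq)
```

With a constant learning rate the iterates settle into a noise-driven band around the toy optimum $r = 0.5$ rather than converging exactly, which is why guarantees of this kind are stated for an average or minimum of the squared gradient mapping.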

To prove Theorem C.2, we first show some useful inequalities summarized in Lemma C.3 and Lemma C.4.

Lemma C.3 (Bernstein Inequality).

Let be the random variables estimating the revenue’s gradient (which can be correlated), where almost surely, and . Then, we have

holds with probability at least .
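For intuition about the deviation term, here is a sketch of the textbook Bernstein bound for a sum of $n$ independent, zero-mean terms with $|X_i| \le M$ and total variance $V = \sum_i \mathrm{Var}(X_i)$: $\Pr(|\sum_i X_i| \ge t) \le 2\exp\!\left(-t^2 / (2(V + Mt/3))\right)$. Setting the right-hand side to $\delta$ and solving the quadratic in $t$ gives an exact radius; applying $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ gives the simpler $t \le \tfrac{2M}{3}\log(2/\delta) + \sqrt{2V\log(2/\delta)}$. The independent-variable form and the uniform test distribution are our assumptions; since Lemma C.3 allows correlated estimates, a martingale version of Bernstein's inequality would be needed there.

```python
import math
import numpy as np

def bernstein_radius(total_var, bound, delta):
    """Exact root t of t^2 / (2 (V + M t / 3)) = log(2 / delta)."""
    ell = math.log(2.0 / delta)
    a = ell * bound / 3.0
    return a + math.sqrt(a * a + 2.0 * ell * total_var)

def bernstein_radius_simple(total_var, bound, delta):
    """Looser closed form obtained with sqrt(a + b) <= sqrt(a) + sqrt(b)."""
    ell = math.log(2.0 / delta)
    return 2.0 * ell * bound / 3.0 + math.sqrt(2.0 * ell * total_var)

def empirical_failure_rate(n=200, trials=20_000, delta=0.05, seed=0):
    """Monte Carlo frequency of |sum X_i| >= t for X_i ~ Uniform[-1, 1] (M = 1, Var = 1/3)."""
    rng = np.random.default_rng(seed)
    t = bernstein_radius(total_var=n / 3.0, bound=1.0, delta=delta)
    sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
    return float(np.mean(np.abs(sums) >= t)), t

rate, t = empirical_failure_rate()
print(rate, t)  # the failure rate should not exceed delta = 0.05
```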


Proof. By Bernstein’s inequality, we have

Setting and solving for , we have

where the last inequality is based on the fact . Therefore, we have

Lemma C.4.

For any , we have


Proof. Since is a convex space, for any , we have . Letting and , we have

which implies

Thus, we have . Again, since is a convex space, . Then we can prove the second inequality,

Proof of Theorem C.2.

Denote and . First we bound the bias of compared with . can be decomposed as follows,


Then we bound the three terms above separately. For the first term, we have

By the smoothness of ,

Thus, we get

By assumption, the second term is bounded by . Combining this with Lemma C.3, for any fixed ,


holds with probability at least .

The -smoothness of the revenue function implies the following inequalities for any ,

The first inequality holds because and the last inequality follows from the first inequality in Lemma C.4. Rearranging the above inequalities, we have,


The last inequality follows from the Cauchy-Schwarz inequality and the second statement in Lemma C.4. Rearranging Equation (10), we have


Using the Cauchy-Schwarz inequality and Lemma C.4, we can bound in the following way,