1 Introduction
Digital advertising has been a fast-growing industry in recent years: worldwide digital advertising expenditure reached $283 billion in 2018 and is estimated to grow further to $517 billion in 2023.¹

¹ Digital advertising spending worldwide 2018-2023: https://www.statista.com/statistics/237974/online-advertising-spending-worldwide/

Among all advertisement allocation mechanisms, real-time bidding (RTB) is perhaps one of the most significant developments of the past decade, and it is widely used by the major online advertising platforms, including, but not limited to, Google, Facebook, and Amazon. In RTB for display ads, an auction held by an Ad Exchange is triggered each time a user visits a webpage, and the winner of the auction earns the ad slot and pays the publisher a certain price.
A form of auction commonly used in practice by Ad Exchanges is the second-price auction with reserve price [22]. In such auctions, the highest bidder wins the ad slot and pays the maximum of the second-highest bid and a reserve price set by the publisher or the Ad Exchange. In particular, the reserve price of an ad slot can increase revenue when it falls between the top two bids.
One central question for Ad Exchanges is how to set the reserve price for each incoming impression in order to maximize the total revenue. In general, the reserve price is set based on the contextual information of the ad campaign, including data pertaining to the publisher (e.g. ad site and ad size), user (e.g. device type and various geographic information), or time (e.g. date and hour). In this paper, we study an offline linear model to set the reserve price for each individual ad slot by utilizing its contextual information in order to maximize the total revenue on the seller side. This maximization problem can be formalized as:
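$$\max_{\beta \in \mathcal{B}} \; \frac{1}{n} \sum_{i=1}^{n} r\big(\beta^\top x^i;\, b_1^i,\, b_2^i\big), \tag{1}$$
where $b_1^i \ge b_2^i \ge 0$ are the highest and second-highest bidding prices of impression $i$, respectively, $x^i \in \mathbb{R}^p$ is the contextual feature vector of impression $i$, and $\mathcal{B} \subset \mathbb{R}^p$ is a bounded hypercube which serves as the feasible region for the model parameters $\beta$. Additionally, $r$ is the discontinuous reward function given by
$$r(v;\, b_1,\, b_2) = \begin{cases} b_2 & \text{if } v \le b_2, \\ v & \text{if } b_2 < v \le b_1, \\ 0 & \text{if } v > b_1. \end{cases} \tag{2}$$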
Figure 1 plots the reward function $r(v; b_1, b_2)$, which is a simple univariate (though discontinuous) function of $v$ for given constants $b_1$ and $b_2$. The revenue is constant if $v$ is set either below $b_2$ or above $b_1$, and it increases linearly if $v$ lies between $b_2$ and $b_1$. In other words, by setting the reserve price between $b_2$ and $b_1$, the seller can potentially capture more revenue from the auction. However, the reserve price is set before the bidding prices $b_1$ and $b_2$ are observed, and the seller must be careful not to set the reserve price too high, as an unsuccessful auction results in a significant drop in revenue when $v > b_1$. At the two extremes, this setting recovers a first-price auction (by setting $b_2 = b_1$) or a pure price-setting problem (by setting $b_2 = 0$).
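To make definition (2) concrete, here is a minimal Python sketch of the reward function and of the empirical objective in (1) for a linear policy. It assumes that a reserve price exactly equal to the highest bid still results in a sale; the helper names `reward` and `average_reward` are hypothetical and are not taken from the authors' implementation.

```python
from typing import Sequence


def reward(v: float, b1: float, b2: float) -> float:
    """Revenue r(v; b1, b2) of a second-price auction with reserve price v.

    b1 >= b2 >= 0 are the highest and second-highest bids. We assume a
    reserve exactly equal to b1 still results in a sale.
    """
    if v <= b2:
        return b2   # reserve below the second bid: winner pays the second price
    if v <= b1:
        return v    # reserve between the two bids: winner pays the reserve
    return 0.0      # reserve above the highest bid: the impression goes unsold


def average_reward(beta: Sequence[float], X: Sequence[Sequence[float]],
                   B1: Sequence[float], B2: Sequence[float]) -> float:
    """Objective of (1): mean revenue of the linear policy v_i = <beta, x_i>."""
    total = 0.0
    for x, b1, b2 in zip(X, B1, B2):
        v = sum(bj * xj for bj, xj in zip(beta, x))  # reserve price for impression i
        total += reward(v, b1, b2)
    return total / len(X)


# Two impressions, both with bids (2, 1), and a single feature.
print(average_reward([1.0], [[0.5], [1.5]], [2.0, 2.0], [1.0, 1.0]))  # prints 1.25
```

Evaluating `average_reward` on a fine grid of `beta` values for a small lognormal sample is one way to reproduce the kind of jagged, multimodal landscape described for Figure 2.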
Although the univariate function $r$ is simple, the average revenue function $\beta \mapsto \frac{1}{n}\sum_{i=1}^{n} r(\beta^\top x^i; b_1^i, b_2^i)$ can be extremely complicated, even for small problem instances. Figure 2 plots the average revenue with a single feature (i.e. $p = 1$) and a small number of samples, randomly drawn from a lognormal distribution as specified in Section 4. As Figure 2 shows, the average revenue function has many local maximizers and is discontinuous, even in the small-sample, univariate setting. This complexity is only exacerbated in the large-sample, multivariate case, which is the focus of this paper.

1.1 Our Results
Our contribution in this work is threefold.
Hardness (Section 2). Our first main result builds on the intuition gleaned from Figure 2 to show that (1) is, indeed, a hard problem. In particular, we show that no algorithm solves (1) in polynomial time unless the Exponential Time Hypothesis fails. The Exponential Time Hypothesis is a widely used assumption in computational complexity, and it is the basis of many hardness results [34, 42, 29, 13, 18, 36, 10, 1, 11]. It concerns the 3-SAT problem, a famous problem at the core of the theory of NP-completeness [31], and states that 3-SAT cannot be solved in subexponential time in the worst case. To show our result, we reduce the classic densest subgraph problem to (1).
New algorithms (Section 3). Given that there is no polynomial-time algorithm for solving (1) (under the Exponential Time Hypothesis), we model the problem exactly using Mixed-Integer Programming (MIP). MIP is an optimization methodology capable of modeling complex, nonconvex feasible regions, and it is widely used in practice. In particular, MIP allows us to model the underlying discontinuous reward function exactly, without relying on convex or continuous proxies, which may be poor approximations or require careful hyperparameter tuning.
One issue with MIP is that it does not scale beyond medium-sized instances (roughly speaking, we can potentially solve a MIP with hundreds to thousands of variables, but not one with tens of thousands of variables). In order to deal with the large-scale problems arising in daily auctions, we propose a Linear Programming (LP) relaxation of our MIP formulation. Modern LP solvers, such as Gurobi, are capable of solving very large LPs with millions of variables. The optimal value of the LP not only provides a valid upper bound on the optimal expected revenue, but its solution can also lead to acceptable solutions to (1). On the other hand, we show that there exist pathological instances where the LP relaxation can produce arbitrarily bad bounds on the true optimal reward.
Computational validation (Section 4). Finally, we present a thorough computational study on both synthetic and real data. We start with a low-dimensional artificially generated data set, where we observe that existing methods, while exhibiting low generalization error, are substantially outperformed by our MIP-based approaches. We also perform an analysis on a real data set comprising eBay sports memorabilia auctions, where we observe a consistent improvement of our MIP-based methods over existing techniques. In both studies, we observe that our MIP formulation substantially outperforms its convex counterpart, the LP relaxation, suggesting the merit of principled nonconvex approaches for this problem.
1.2 Related Work
Reserve Price Optimization Reserve price optimization has been widely studied in both academia and industry due to its critical role in online advertising. A major difference between our setting and previous work on reserve price optimization is how the contextual information is used. Most previous theoretical works assume that the bidding prices come from a certain distribution, without considering contextual information. For example, [12] studies regret minimization under the assumption that all bids are independently drawn from the same unknown distribution; [30] shows that a constant reserve price is optimal when the distribution is known and satisfies certain regularity assumptions; and [2] studies the case where the buyers are strategic and aim to maximize their long-term surplus.
In practice, however, an Ad Exchange logs the contextual information of every auction and uses it to determine future reserve prices. For example, in a large field study at Yahoo! [39], the contextual information of auctions is used to learn the bidding distribution of buyers, which is then used to set future reserve prices. This is an indirect use of contextual information. In contrast, our optimization problem (1) builds a linear model for reserve price optimization that uses the contextual information directly.
To the best of our knowledge, the only work which directly uses contextual information to set the reserve price is that of Mohri and Medina [38]. In order to handle the discontinuity in the revenue function, [38] present a continuous piecewise linear surrogate function and optimize over this surrogate using difference-of-convex programming. The method proposed in [38] has several difficulties: (i) it is highly nontrivial to tune the hyperparameter of the surrogate function, which controls both how closely the surrogate approximates the original problem and how hard the surrogate problem is to solve; (ii) finding a global optimum with difference-of-convex programming is slow (requiring, e.g., a cutting plane or branch-and-bound method) and requires an exceedingly careful implementation [26]; and (iii) it can only find a local optimizer of the surrogate problem. In contrast, we directly solve the reserve price optimization problem (1) by mixed-integer programming.
Mixed-Integer Programming for Piecewise Linear Functions Mixed-integer programming has long been used to model piecewise linear functions in optimization problems from application areas as disparate as operations [16, 17, 33], analytics [7, 8], engineering [23, 24], and robotics [19, 20, 32, 37]. Within this literature, our approach is most closely related to a recent strain of work applying mixed-integer programming to model high-dimensional piecewise linear functions arising as trained neural networks, for tasks such as verification and reinforcement learning [4, 3, 40, 41]. Moreover, there are highly sophisticated and mature implementations of algorithms for mixed-integer programming (i.e. solvers) that can reliably solve many instances of practical interest in reasonable time frames.

Hardness We study the hardness of the reserve price optimization problem (1) and show that it cannot be solved in polynomial time unless the Exponential Time Hypothesis [27] fails. The Exponential Time Hypothesis is a widely used assumption in computational complexity, and it is the basis for many hardness results, such as those for approximating the best Nash equilibrium [11], densest subgraph [10, 27], SVP [1], network design [14], and many others [34, 42, 29, 13, 18, 36].
2 Hardness
In this section we show the hardness of the reserve price optimization problem (1). Specifically, we show that it is not possible to solve this problem in polynomial time unless the Exponential Time Hypothesis fails. We prove this by showing that a polynomial-time exact algorithm for this problem implies a polynomial-time constant-factor approximation algorithm for the densest subgraph problem.
Definition 1 (densest subgraph problem).
In the densest subgraph problem, we are given a graph $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set, together with a target size $k$. The goal is to find a subgraph $H = (V_H, E_H)$ of $G$ with $|V_H| = k$ that maximizes $|E_H|$.
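Definition 1 is purely combinatorial. As a rough illustration that is not part of the reduction, a brute-force solver for the size-constrained reading of the definition, with a hypothetical target size `k`, simply enumerates every k-vertex subset and counts the induced edges; its running time grows combinatorially with the graph size.

```python
from itertools import combinations


def densest_k_subgraph(vertices, edges, k):
    """Brute-force densest k-subgraph: try every k-vertex subset.

    Exponential in general; only meant for tiny illustrative graphs.
    Returns (best vertex set, number of induced edges).
    """
    edge_set = {frozenset(e) for e in edges}
    best_set, best_count = set(), -1
    for subset in combinations(vertices, k):
        # Count the edges of G induced by this vertex subset.
        count = sum(1 for u, w in combinations(subset, 2)
                    if frozenset((u, w)) in edge_set)
        if count > best_count:
            best_set, best_count = set(subset), count
    return best_set, best_count


# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
print(densest_k_subgraph([0, 1, 2, 3], [(0, 1), (1, 2), (0, 2), (2, 3)], 3))  # ({0, 1, 2}, 3)
```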
It is known that there is no polynomial-time constant-factor approximation algorithm for the densest subgraph problem unless the Exponential Time Hypothesis fails [36]; hence our reduction implies that there is no polynomial-time algorithm for the reserve price optimization problem (1) unless the Exponential Time Hypothesis fails.
Theorem 1.
There is no polynomial time algorithm for the reserve price optimization problem (1), unless the Exponential Time Hypothesis fails.
Proof.
Let $G = (V, E)$ be an arbitrary input graph to the densest subgraph problem, where $V$ is the vertex set and $E$ is the edge set of the graph. We construct an input to the reserve price optimization problem (1) based on $G$ such that, if this input could be solved optimally in polynomial time, then an approximate solution to the densest subgraph problem on $G$ could also be found in polynomial time. However, there is no polynomial-time constant-factor approximation algorithm for the densest subgraph problem unless the Exponential Time Hypothesis fails [36]. This implies that the reserve price optimization problem (1) cannot be solved in polynomial time unless the Exponential Time Hypothesis fails.
Next, we explain how to construct an input to the reserve price optimization problem (1) based on $G$. In the optimization problem we set . We have two types of impressions, as explained below.

We have impressions , where .

For each edge , we have one impression , where is a feature vector in which the components corresponding to and are , and all other components are .
First, we lower-bound the optimal value of the optimization problem (1) for this input. Consider a densest subgraph of , where is the vertex set of and is the edge set of . We define to be a feature vector in which the features corresponding to the vertices of are , and all other features are . Next we bound . We use this as a lower bound on the optimal value of the optimization problem (1).
Note that , and hence the contribution of each of the first type of impressions to is . Also, for each edge we have and hence the contribution of each of the second type of impressions corresponding to an edge in to is . Therefore, we have
(3) 
Next, we upper bound the optimal solution of the optimization problem (1) for our input. Let be the vector that maximizes . Note that if , the contribution of the first type of impressions is . This means that , which is a contradiction. Therefore, without loss of generality we can assume that .
Let be the set of vertices in with . Let be the subgraph of induced by . Note that if for a vertex we have , then for each edge incident to , we have . Therefore, we have
(4) 
Now, we put inequalities (3) and (4) together to complete the proof. By the optimality of we have . This together with inequalities (3) and (4) implies that . Moreover, recall that for every vertex in we have . Also, we have . Hence, we have . Given a graph with vertices, one can easily cover the edges with subgraphs of size . By the pigeonhole principle, one of these subgraphs contains edges, and hence it is a approximate solution to the densest subgraph problem. ∎
3 Mixed-Integer Programming Formulation
In this section, we develop a mixed-integer programming (MIP) formulation for solving (1), study its important computational properties, and discuss how it can be used in practice to solve (1).
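MIP is a common optimization methodology capable of modeling complex, nonconvex feasible regions. In general, a MIP formulation can model a set $S \subseteq \mathbb{R}^d$ as
$$S = \{\, x \in \mathbb{R}^d : \exists\, y \in \mathbb{Z}^q \text{ such that } (x, y) \in P \,\},$$
where $P$ is a polyhedron in $\mathbb{R}^{d+q}$.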
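In order to model (1) with MIP, we start with the graph of the revenue function $r$ over an interval $[L, U]$, which is defined as:
$$\operatorname{gr}(r; [L, U]) = \{\, (v, z) \in [L, U] \times \mathbb{R} : z = r(v; b_1, b_2) \,\}.$$
This set is not closed, due to the discontinuity of $r$ at the input $v = b_1$. However, it is straightforward to compute its closure.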
Lemma 1.
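The closure of $\operatorname{gr}(r; [L, U])$ is $S^1 \cup S^2 \cup S^3$, where
$$S^1 = \{\, (v, z) : L \le v \le b_2,\ z = b_2 \,\}, \tag{5a}$$
$$S^2 = \{\, (v, z) : b_2 \le v \le b_1,\ z = v \,\}, \tag{5b}$$
$$S^3 = \{\, (v, z) : b_1 \le v \le U,\ z = 0 \,\}. \tag{5c}$$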
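Moreover, working with the closure does not alter the optimization problem. That is, (1) can be reformulated as the following optimization problem:
$$\max_{\beta,\, v,\, z} \ \frac{1}{n} \sum_{i=1}^{n} z_i \tag{6a}$$
$$\text{s.t.} \quad v_i = \beta^\top x^i \qquad \forall i \in [n], \tag{6b}$$
$$(v_i, z_i) \in \operatorname{cl}\big(\operatorname{gr}(r(\cdot\,; b_1^i, b_2^i); [L_i, U_i])\big) \qquad \forall i \in [n], \tag{6c}$$
$$\beta \in \mathcal{B}, \tag{6d}$$
where the bounds on the variables are computed as $L_i = \min_{\beta \in \mathcal{B}} \beta^\top x^i$ and $U_i = \max_{\beta \in \mathcal{B}} \beta^\top x^i$. The next proposition shows this formally.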
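Proposition 1.
Problems (1) and (6) have the same optimal value. Moreover, if $(\beta, v, z)$ is an optimal solution to (6), then $\beta$ is an optimal solution to (1).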
Proof.
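First, we show that each optimal solution for (1) has a corresponding feasible point for (6) with equal objective value. Take some optimal $\beta$ for (1). Setting $v_i = \beta^\top x^i$ for each $i$, the feasibility of $\beta$ (i.e. $\beta \in \mathcal{B}$) implies that $v_i \in [L_i, U_i]$ from the definition of $L_i$ and $U_i$. Now take $z_i = r(v_i; b_1^i, b_2^i)$ for each $i$; clearly (6c) is satisfied. Therefore, $(\beta, v, z)$ is feasible for (6) and has objective value $\frac{1}{n} \sum_{i=1}^{n} r(\beta^\top x^i; b_1^i, b_2^i)$.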
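Next, we show that each optimal solution $(\beta, v, z)$ for (6) corresponds to a feasible point for (1) with the same objective value. Clearly $\beta$ is feasible for (1). Additionally, (6c) means that for each $i$, if $v_i \neq b_1^i$ then $z_i = r(v_i; b_1^i, b_2^i)$, whereas if $v_i = b_1^i$ then $z_i \in \{0, b_1^i\}$. As $b_1^i \ge 0$, the optimality of $(\beta, v, z)$ implies that in the latter case we must have $z_i = b_1^i = r(v_i; b_1^i, b_2^i)$. Therefore, the objective value of $\beta$ in (1) is $\frac{1}{n} \sum_{i=1}^{n} z_i$, giving the result. ∎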
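The bounds $L_i$ and $U_i$ used in (6) have a closed form when the hypercube is symmetric. As a sketch, assuming $\mathcal{B} = [-M, M]^p$ for a hypothetical box size `M` (the box size is tuned by cross-validation in Section 4, but this exact parameterization is our assumption), we have $\max_{\beta \in \mathcal{B}} \beta^\top x = M\|x\|_1$. The second helper tests membership in the three-piece closure of Lemma 1 with a small floating-point tolerance, assuming $L \le b_2 \le b_1 \le U$; both helper names are hypothetical.

```python
from typing import Sequence, Tuple


def hypercube_bounds(x: Sequence[float], M: float) -> Tuple[float, float]:
    """L = min and U = max of <beta, x> over beta in the box [-M, M]^p.

    The maximum is attained at beta_j = M * sign(x_j), giving M * ||x||_1.
    """
    radius = M * sum(abs(xj) for xj in x)
    return -radius, radius


def in_closure(v: float, z: float, b1: float, b2: float,
               L: float, U: float, tol: float = 1e-9) -> bool:
    """Is (v, z) in S1, S2, or S3, i.e. the closure of the graph of r over [L, U]?"""
    on_s1 = L - tol <= v <= b2 + tol and abs(z - b2) <= tol  # flat at the second bid
    on_s2 = b2 - tol <= v <= b1 + tol and abs(z - v) <= tol  # z = v between the bids
    on_s3 = b1 - tol <= v <= U + tol and abs(z) <= tol       # no sale
    return on_s1 or on_s2 or on_s3


L, U = hypercube_bounds([1.0, -2.0, 0.5], M=2.0)   # (-7.0, 7.0)
# At the discontinuity v = b1 = 2, both z = b1 and z = 0 lie in the closure.
print(in_closure(2.0, 2.0, 2.0, 1.0, L, U), in_closure(2.0, 0.0, 2.0, 1.0, L, U))  # True True
```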
Given the representation in Lemma 1 of the closure of the graph of $r$ as a union of three polyhedral sets, we can now construct a mixed-integer programming formulation for (6c).
Proposition 2.
A valid MIP formulation for the constraint
(7) 
is:
(8a)  
(8b)  
(8c)  
(8d)  
(8e)  
(8f)  
(8g)  
(8h) 
Proof.
Suppose is feasible for (8). It follows from (8f)-(8h) that exactly one component of is equal to one, with the other two components equal to zero. We now consider each of these three cases.
Piecing it all together, we can present a MIP formulation for the original problem (1).
Corollary 1.
Take as the set of all points feasible for (8), given the data , , , and . Then (1) is equivalent to
(9a)  
s.t.  (9b)  
(9c)  
(9d) 
in the sense that: (i) if is an optimal solution to (9), then is an optimal solution to (1), and (ii) if is an optimal solution to (1), then there exists some and such that is an optimal solution to (9).
In the rest of this section, we discuss different aspects of our MIP formulation (8) and its LP relaxation, and in particular how we can utilize it in practice.
The Tightness of Formulation (8)
In general, there will exist many different possible MIP formulations for a given set. One way to measure the quality of a MIP formulation is by inspecting how tightly the LP relaxation approximates the underlying nonconvex set, as MIP formulations with tight relaxations are likely to solve much more quickly than those with looser relaxations [43]. The tightest possible MIP formulation is an ideal formulation, where the extreme points of the LP relaxation are integral. The next proposition shows that (8) is an ideal formulation of set (7).
Proposition 3.
The MIP formulation (8) is an ideal formulation of (7).
Proof.
Take as the set of all feasible for (8). Using Lemma 1, we can infer that , where is the th unit vector of all zeros except a 1 in the th coordinate. Therefore, it can be expressed as a finite union of bounded polyhedra. Applying techniques due to Balas [5, 6], we can write a lifted representation for the convex hull of , i.e. one with auxiliary and variables:
(10a)  
(10b)  
(10c)  
(10d)  
(10e)  
(10f)  
(10g)  
(10h)  
(10i)  
(10j) 
Moreover, if is the set of all points feasible for (10), it is known that , i.e. the orthogonal projection eliminating the auxiliary variables and yields the convex hull of the set of interest . Therefore, the result follows by explicitly computing this projection, yielding a system of linear constraints equivalent to the LP relaxation of (8), i.e. (8a)-(8g).
Use the three equations (10c), (10e), and (10g) to eliminate the variables. Then we may use the remaining equations (10a)-(10b) to eliminate and , leaving the system
We may then apply the Fourier-Motzkin elimination procedure (e.g. [15, Chapter 3.1]) to project out the last remaining auxiliary variable , giving the result. ∎
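The recipe used in this proof (Balas' lifted hull followed by Fourier-Motzkin elimination) can be carried out by hand for a single impression. Working it through ourselves, assuming $L \le b_2 \le b_1 \le U$ and writing $w = (w_1, w_2, w_3)$ for the binaries selecting $S^1$, $S^2$, or $S^3$, we obtain $w \ge 0$, $w_1 + w_2 + w_3 = 1$, $b_2(w_1 + w_2) \le z \le b_2 w_1 + b_1 w_2$, and $v - U w_3 \le z \le v + (b_2 - L) w_1 - b_1 w_3$. We have not checked that this system coincides constraint by constraint with (8), so treat it as an illustrative equivalent. The sketch below verifies on a rational grid that binary $w$ recovers exactly the closure from Lemma 1, and shows a fractional $w$ admitting a point outside it; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mip_feasible(v, z, w, b1, b2, L, U):
    """Linear constraints of a disjunctive formulation for (v, z) in cl(gr(r)).

    Derived by hand (Balas' lifted hull, then Fourier-Motzkin elimination);
    assumes L <= b2 <= b1 <= U. w = (w1, w2, w3) selects the piece:
    reserve below b2, between the bids, or above b1.
    """
    w1, w2, w3 = w
    return (min(w) >= 0 and w1 + w2 + w3 == 1
            and b2 * (w1 + w2) <= z <= b2 * w1 + b1 * w2
            and v - U * w3 <= z <= v + (b2 - L) * w1 - b1 * w3)


def in_closure(v, z, b1, b2, L, U):
    """Exact membership in the three-segment closure from Lemma 1."""
    return ((L <= v <= b2 and z == b2) or (b2 <= v <= b1 and z == v)
            or (b1 <= v <= U and z == 0))


def check_binary_formulation(b1, b2, L, U, steps=8):
    """On a rational grid: (v, z) in the closure iff feasible for some binary w."""
    grid = [L + (U - L) * Fraction(k, steps) for k in range(steps + 1)]
    z_values = sorted(set(grid) | {Fraction(0), Fraction(b2), Fraction(b1)})
    units = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return all(
        in_closure(v, z, b1, b2, L, U)
        == any(mip_feasible(v, z, w, b1, b2, L, U) for w in units)
        for v, z in product(grid, z_values))


print(check_binary_formulation(b1=3, b2=1, L=0, U=4))  # True
# A fractional w admits (v, z) = (1, 3/2), although r(1; 2, 1) = 1.
half = Fraction(1, 2)
print(mip_feasible(1, Fraction(3, 2), (half, half, 0), b1=2, b2=1, L=0, U=3))  # True
```

Exact `Fraction` arithmetic is used so that equality tests such as `z == v` are not disturbed by rounding.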
The feasible region
While the statement of problem (1) constrains the model parameters to lie within a bounded hypercube, it may be difficult to infer the correct size of the domain a priori. To illustrate this, we present a simple low-dimensional family of instances where the problem data remain bounded in magnitude, but the magnitude of the optimal model parameters nevertheless goes to infinity.
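Proposition 4.
There exists a family of instances of (1), with all problem data bounded in magnitude by one, for which the magnitude of the unique optimal solution grows without bound.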
Proof.
Parameterize the sequence of instances by . For each , define , , , and . Note that , and so all the problem data is bounded in magnitude by one. The unique optimal solution to (1) is , giving the result. ∎
In other words, we cannot bound the magnitude of the components of an optimal solution solely as a function of , , and the magnitude of the data. However, due to existential representability results [28], applying MIP formulation techniques to model (1) requires a bounded domain for the model parameters. To circumvent this, we treat the size of the bounding box as a hyperparameter and tune it using a validation data set. This is the same approach taken in the difference-of-convex algorithm of Mohri and Medina [38].
How to use the MIP formulation in practice
In general, mixed-integer programming encompasses a difficult class of problems, in both a theoretical and a practical sense. Nonetheless, there exist exceedingly mature, robust, and sophisticated solvers that are often capable of producing high-quality solutions and proofs of optimality for problems of practical interest. These implementations use a variant of branch-and-bound (e.g. [15, Chapter 1.2]), which performs an enumerative tree search in an efficient manner. However, the solver can be terminated before the search has been exhausted (and optimality proven), and will return the best solution found. In Section 4, we present two variants of a MIP-based algorithm that use this basic property. The first terminates the solver after a prespecified time budget is exceeded. The second terminates the solver at the root node, before the enumerative procedure begins. Up to this point, the solver will have run a bevy of heuristic methods to generate solutions and strengthen its LP relaxation, but will not have begun its enumerative tree search. Crucially, these heuristics rely on the knowledge that the underlying model is a MIP to produce better solutions and tighter relaxations than are possible with a pure linear programming model like the LP relaxation.
Our MIP formulation (8) comprises two types of constraints: linear equality or inequality constraints (8a)-(8g), and integrality constraints (8h). The linear programming relaxation keeps only the linear constraints, and provides a valid dual (upper) bound on the optimal reward of the MIP formulation. Furthermore, for this particular problem, each feasible solution for the linear programming relaxation corresponds to a feasible solution for the original problem (1).
Proposition 5.
Take as the set of all points feasible for the linear programming relaxation (8a)-(8g), given the data , , , and . Then a linear programming relaxation for (1) is
(11a)  
s.t.  (11b)  
(11c)  
(11d) 
in the sense that the optimal reward of (11) upper bounds the reward of any feasible solution for (1). Moreover, for any feasible solution to (11), is a feasible solution to (1).
Proof.
Therefore, a third approach to solving (1) is simply to solve the linear programming relaxation. Linear programming problems can be solved in polynomial time, and there exist algorithms that can very efficiently solve large-scale problem instances. Therefore, the approach of Proposition 5 can be applied to very large-scale instances of problem (1).
The quality of the LP relaxation
As shown in Proposition 5, the linear programming relaxation offers an alternative approach for heuristically solving the problem (1). Roughly, the quality of the resulting solution will depend on the strength of the relaxation, i.e. how closely it approximates the convex hull of all feasible points for the MIP formulation (9). Additionally, modern MIP solvers depend heavily on the quality of this relaxation to converge quickly by pruning large swaths of the search tree in the hopes of keeping computation times manageable.
A straightforward corollary of Proposition 3 is that, if $n = 1$, the LP relaxation (11) is exact, and so exactly represents the convex hull of feasible points for (9). Unfortunately, the composition of ideal formulations will, in general, fail to be ideal. In this subsection, we show that when $n$ is permitted to grow, the LP relaxation (11) can be of arbitrarily poor quality.
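For a single impression, the projection of the relaxation onto $(v, z)$ is the convex hull of the closure from Lemma 1 (this is what idealness in Proposition 3 buys). The sketch below, which assumes $L \le b_2 \le b_1 \le U$ and uses a hypothetical helper name, computes the largest $z$ that this hull allows at a fixed $v$, i.e. the upper concave envelope of $r$. At a fixed $v$ the envelope can exceed $r(v)$, but its maximum over $v$ equals $\max_v r(v) = b_1$, which is why the relaxation is exact when $n = 1$; with many impressions coupled through a shared $\beta$, such per-impression overestimates need not cancel.

```python
from fractions import Fraction
from itertools import product


def relaxation_max_reward(v, b1, b2, L, U):
    """Largest z such that (v, z) lies in the convex hull of S1, S2, S3.

    Each piece is a segment, so the hull is the hull of the segment endpoints,
    and its upper boundary above v lies on a segment between two endpoints.
    Assumes L <= v <= U and L <= b2 <= b1 <= U.
    """
    endpoints = [(L, b2), (b2, b2), (b1, b1), (b1, 0), (U, 0)]
    best = None
    for (x0, z0), (x1, z1) in product(endpoints, repeat=2):
        if not x0 <= v <= x1:
            continue  # this pair of endpoints does not straddle v
        if x0 == x1:
            z = max(z0, z1)  # vertical pair, only possible when v == x0
        else:
            t = Fraction(v - x0) / (x1 - x0)
            z = z0 + t * (z1 - z0)
        best = z if best is None else max(best, z)
    return best


# With b1 = 2, b2 = 1 on [0, 3]: r(1) = 1, but the hull allows z = 3/2 at v = 1.
print(relaxation_max_reward(1, b1=2, b2=1, L=0, U=3))  # 3/2
```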
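Proposition 6.
The LP relaxation (11) can be of arbitrarily poor quality: for any $C > 0$, there exists an instance of (1) for which the optimal value of (11) exceeds $C$ times the optimal reward of (1).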
Proof.
Consider the following problem instance parameterized by a positive integer . Take , , and . Furthermore, for each , define , , and . Similarly, for each , define , , and . From inspection, we can observe that for any , there is at most one with . Therefore, we can infer that the optimal reward for (1) is , which can be attained by setting for any .
In contrast, the LP relaxation bound can be bounded below by a constant. By projecting out the auxiliary variables from the LP relaxation (8a)-(8g), we can compute that the convex hull of is
Furthermore, for each we can compute valid bounds on as and . Similarly, valid bounds for each are and . Piecing it all together, we now fix , which due to (11b) will fix for each . Accordingly, the largest value we may set such that is . Similarly, the maximum allowed value for each such that (11c) is satisfied is . The reward at this LP feasible point is then
∎
4 Computational study
In this section, we perform a computational study on the efficacy of our proposed methods on both synthetic data and real data.
4.1 Implementation Details
4.1.1 Methods
Throughout, we compare six methods:

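CP: – This is an optimal constant reserve price policy (i.e., the reserve price is set to the same constant for all samples, without using contextual information). It is used as a benchmark to measure the gain from contextual information.

LP: (11) – The linear programming relaxation of the MIP formulation; by Proposition 5, its solution yields a feasible model for (1).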

MIP: (9) – The MIP formulation terminated after a time limit (to be specified in subsequent subsections).

MIPR: (9) – The MIP formulation, terminated at the root node.

DC: The differenceofconvex algorithm of Mohri and Medina [38].

UB: – This is a perfect information upper bound equal to the average first bid price. This is the largest reward that can possibly be garnered from the auction. Note that this may be quite a loose upper bound, as in general there will not exist a linear model capable of setting such reserve prices given the contextual information.
4.1.2 Hyperparameter Tuning
The DC, LP, MIPR, and MIP algorithms require the model domain to be specified explicitly. We use cross-validation to tune the domain size as for . This cross-validation step is the same as that of Mohri and Medina [38].
Additionally, the DC algorithm utilizes a continuous piecewise linear function to approximate the discontinuous reward function . Thus, it requires another hyperparameter for the “slope” of the linear approximation. We perform the same cross-validation on this hyperparameter, as suggested in Mohri and Medina [38].
4.1.3 Evaluation
For each experiment, we report the average reward (i.e. the objective value of (1)) of the final model from each algorithm on both the training and test data sets. Additionally, we report the proportion of sold impressions, namely, the proportion of impressions for which the reserve price is less than the highest bid. Finally, we use the “gap closed” metric to measure the improvement of our proposed MIP algorithm over DC, the best existing algorithm from the literature. Mathematically, we compute the gap closed as $\frac{\text{MIP} - \text{DC}}{\text{UB} - \text{DC}}$, where, in an abuse of notation, we use the algorithm names to denote their respective rewards. Note that UB serves as an upper bound on the reward of the best possible linear model, and may be a conservative estimate.
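The gap-closed metric is simple to compute. The sketch below (with a hypothetical function name) applies it to the averaged training rewards of the baseline synthetic configuration (MIP 1.511, DC 0.786, UB 1.682). The reported percentages appear to be averages over trials, so applying the formula to averaged rewards only approximately reproduces them (about 80.9% here versus the reported 80.82%).

```python
def gap_closed(mip: float, dc: float, ub: float) -> float:
    """Fraction of the gap between DC and the upper bound UB that MIP closes."""
    if ub <= dc:
        raise ValueError("UB must exceed DC for the gap to be defined")
    return (mip - dc) / (ub - dc)


# Baseline synthetic configuration, training set averages (MIP, DC, UB).
print(f"{gap_closed(1.511, 0.786, 1.682):.1%}")  # 80.9%
```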
4.1.4 Implementation
We implement our experiments in Julia [9], using JuMP [21, 35] to model the MIP, MIPR, LP, and DC formulations. We use Gurobi v8.1.1 [25] to solve the optimization problems underlying MIP, MIPR, LP, and DC. We intend to open-source our implementation of the methods in the near future.
4.2 Synthetic data
4.2.1 Data Generation.
Here we describe how we generate our synthetic data . First, the feature vectors are generated i.i.d. from a Gaussian distribution with identity covariance matrix, i.e., , normalized so that . In order to generate the bidding prices and , we assume there are two buyers with underlying generative parameters and , such that their bids come from lognormal distributions as and , where controls the signal-to-noise ratio of the lognormal distribution. We then set and , where is a dilation factor to enlarge the difference between and .² Moreover, the underlying parameters and of the two buyers should be correlated, since the bidding prices for high-valued slots should be high for all buyers. In order to model this, we set and , where and controls the correlation between and .

² We note that the dilation factor is similar to the scaling of the linear functions in the synthetic data generative model used in [38].

Overall, we have three parameters in the data generation process: controls the signal-to-noise level of the model, controls the similarity between the two buyers, and controls the degree of flexibility the seller has when setting a reserve price.
4.2.2 Experimentation
We fix features, training samples, along with test and validation data sets comprising 5000 samples each. We first set a “baseline” configuration for our generative model with , , and . To explore the robustness of our methods to changes in the data generation scheme, we then study three variants of this baseline with “high noise” (), “low correlation” (), and “low margin” (). For each of these four parameter settings, we present aggregate results over three trials in Tables 1-4. In these experiments, we use a time limit of 3 minutes for MIP.
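
Table 1: Baseline configuration.
method | reward (train) | reward (test) | proportion sold (train) | proportion sold (test)
CP | 1.004 | 0.980 | 0.820 | 0.800
LP | 0.944 | 0.940 | 0.613 | 0.619
MIP | 1.511 | 1.486 | 0.998 | 0.990
MIPR | 1.484 | 1.461 | 1.0 | 0.991
DC | 0.786 | 0.735 | 0.996 | 0.991
UB | 1.682 | 1.672 | 1.0 | 1.0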
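
Table 2: High noise configuration.
method | reward (train) | reward (test) | proportion sold (train) | proportion sold (test)
CP | 1.024 | 1.012 | 0.828 | 0.818
LP | 0.841 | 0.839 | 0.544 | 0.550
MIP | 1.376 | 1.352 | 0.978 | 0.964
MIPR | 1.195 | 1.185 | 0.923 | 0.917
DC | 0.783 | 0.757 | 0.965 | 0.963
UB | 1.815 | 1.804 | 1.0 | 1.0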
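
Table 3: Low correlation configuration.
method | reward (train) | reward (test) | proportion sold (train) | proportion sold (test)
CP | 1.031 | 1.028 | 0.844 | 0.841
LP | 0.921 | 0.915 | 0.560 | 0.553
MIP | 1.497 | 1.487 | 0.999 | 0.988
MIPR | 1.477 | 1.471 | 1.0 | 0.990
DC | 0.800 | 0.776 | 0.993 | 0.985
UB | 1.762 | 1.769 | 1.0 | 1.0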
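
Table 4: Low margin configuration.
method | reward (train) | reward (test) | proportion sold (train) | proportion sold (test)
CP | 0.918 | 0.908 | 0.983 | 0.979
LP | 0.908 | 0.894 | 0.796 | 0.794
MIP | 1.123 | 1.097 | 0.999 | 0.985
MIPR | 1.002 | 0.988 | 0.937 | 0.932
DC | 0.947 | 0.917 | 0.999 | 0.991
UB | 1.256 | 1.246 | 1.0 | 1.0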
In all four experiments, we observe that MIP offers a considerable improvement over DC. On the baseline configuration, MIP closes an average of 80.82% of the gap left by DC on the training set, and 80.04% of the remaining gap on the test set. Unsurprisingly, the high noise configuration degrades performance relative to the perfect-information upper bound, but MIP is still able to close 57.5% and 56.87% of the gap on the training and test data sets, respectively. On the low correlation configuration, MIP closes 72.68% and 71.76% of the remaining gap on the training and test data sets, respectively, while on the low margin configuration MIP closes 56.76% of the training gap and 54.67% of the test gap.
While MIPR does not quite attain the same level of performance as MIP, it comes close and still handily outperforms DC both in and out of sample. The LP method also outperforms DC on three of the four experiments, albeit by a smaller margin. Indeed, the DC algorithm fails to match the performance of the constant policy, which completely disregards contextual information, on three of the four experiments. This is despite the fact that its model leads to a sale on nearly every impression. In other words, the DC model fails by not setting reserve prices aggressively enough. In contrast, the LP algorithm sets reserve prices too aggressively, leading to a model that, in most experiments, successfully completes an auction in only slightly more than half of all impressions. The MIP and MIPR methods both attain proportions sold near 1 while attaining very high reward. This indicates that they are not exploiting a small number of impressions that garner a high reward, but instead set a reserve price policy that captures excess reward across the population, without setting prices so aggressively that many impressions fail to sell.
4.3 eBay auctions for sports memorabilia
In this section, we turn our attention to a real data set. In particular, to illustrate the performance of our algorithms, we use a published medium-size eBay data set of sports memorabilia auctions, chosen for reproducibility. The data set is provided by Jay Grossman and was subsequently studied in the context of reserve price optimization in [38].³ There are features in the data set, including seller information (e.g. seller rating and seller location), as well as item information. We refer the reader to [38] for a more detailed description of the data set. Finally, we set a time limit of 5 minutes for MIP, and we preprocess the data by normalizing the bidding prices by the mean of their first (highest) prices.

³ The dataset can be accessed at https://cims.nyu.edu/~munoz/data/.
Table 5 and Table 6 depict the average and the confidence interval of the cumulative reward and of the proportion sold for the different algorithms, on both the training and testing data sets, over random runs. In both, we use 2000 randomly selected samples from the data set for testing and validation. In Table 5, we train using 2000 randomly selected samples, while in Table 6 we use 5000 training samples.
In Table 5, MIP outperforms all other methods, producing the best-performing models on both the training and testing data sets. DC is the next best performer, producing higher-quality models than both LP and MIPR. Relative to the conservative upper bound UB, MIP closes 7.39% of the gap left by DC on the training data set. However, due to weaker generalization, this figure shrinks considerably to 1.66% on the test data set. DC does have a smaller generalization gap, although one plausible explanation is the additional hyperparameters tuned in the DC method. Moreover, we emphasize that these gaps are computed with respect to a conservative upper bound (UB) which, as observed in Section 4.2, may be quite loose.
Table 5: eBay data set, 2000 training samples.
         reward            proportion sold
method   train    test     train    test
CP       0.563    0.568    0.922    0.925
LP       0.653    0.639    0.548    0.540
MIP      0.726    0.714    0.995    0.983
MIPR     0.657    0.652    0.851    0.843
DC       0.704    0.709    0.998    0.989
UB       0.992    1.014    1.0      1.0
To understand the behavior of the algorithms with more data, we increase the training sample size to 5000 and repeat the eBay experiments; the results are reported in Table 6. MIP and DC remain the two best-performing methods, and MIP is able to extract more information from the larger data set: its training reward grows, and its models generalize much more successfully to the testing data set. In contrast, DC appears unable to exploit the extra data, with training and testing rewards nearly identical to those in the previous experiment. Indeed, MIP closes 9.11% of the remaining gap on the training data set, and 7.01% on the testing data set.
Comparing Table 5 and Table 6, we see that the difference between the rewards MIP attains on the training and testing data sets decreases as the number of samples increases. This is consistent with what one would expect from a learning-theoretic analysis, and we expect this gap to keep shrinking in the “big data” regime as the training sample size grows further.
Table 6: eBay data set, 5000 training samples.
         reward            proportion sold
method   train    test     train    test
CP       0.564    0.567    0.943    0.944
LP       0.643    0.635    0.527    0.523
MIP      0.731    0.725    0.994    0.991
MIPR     0.596    0.596    0.999    0.997
DC       0.704    0.704    0.998    0.996
UB       1.002    0.999    1.0      1.0
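The shrinking train-test difference for MIP can be read directly off the two tables. The sketch below uses the rounded rewards from Table 5 (2000 training samples) and Table 6 (5000 training samples) and computes the train-minus-test reward for MIP and DC; for MIP it falls from 0.012 to 0.006, while DC's difference stays near zero. The dictionary layout and helper name are illustrative only.

```python
from typing import Dict, Tuple

# Rounded (train, test) rewards, keyed by method and number of training samples.
rewards: Dict[str, Dict[int, Tuple[float, float]]] = {
    "MIP": {2000: (0.726, 0.714), 5000: (0.731, 0.725)},
    "DC": {2000: (0.704, 0.709), 5000: (0.704, 0.704)},
}

def generalization_gap(train: float, test: float) -> float:
    """Train-minus-test reward; a positive value indicates overfitting."""
    return train - test

if __name__ == "__main__":
    for method, by_n in rewards.items():
        for n, (train, test) in sorted(by_n.items()):
            print(f"{method:4s} n={n}: gap = {generalization_gap(train, test):+.3f}")
```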
5 Conclusion
In this paper, we study the linear model for reserve price optimization in a second-price auction. We first show that this is indeed a hard problem: unless the Exponential Time Hypothesis fails, there is no polynomial-time algorithm that solves it to optimality. We then propose a mixed-integer programming formulation that models this problem exactly, and we show that this formulation is ideal (i.e. the strongest possible formulation) when the number of samples . Since solving the mixed-integer program exactly can be computationally expensive, we study the performance of its linear programming relaxation. However, we construct a counterexample showing that, in the worst case, the objective gap between the linear programming relaxation and the true problem can scale linearly in the number of samples. Finally, we present a computational study of our methods on both synthetic and real data sets, showcasing the advantages of our proposed methods.
References

[1] (2018) (Gap/S)ETH hardness of SVP. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 228–238.
[2] (2013) Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems, pp. 1169–1177.
[3] (To appear) Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming.
[4] (2019) Strong mixed-integer programming formulations for trained neural networks. In Proceedings of the 20th Conference on Integer Programming and Combinatorial Optimization, A. Lodi and V. Nagarajan (Eds.), Cham, pp. 27–42. Note: https://arxiv.org/abs/1811.08359
[5] (1985) Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM Journal on Algorithmic Discrete Methods 6 (3), pp. 466–486.
[6] (1998) Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics 89, pp. 3–44.
[7] (2017) Optimal classification trees. Machine Learning 106 (7), pp. 1039–1082.
[8] (2015) An algorithmic approach to linear regression. Operations Research 64 (1), pp. 2–16.
[9] (2017) Julia: A fresh approach to numerical computing. SIAM Review 59 (1), pp. 65–98.
[10] (2017) ETH hardness for densest-k-subgraph with perfect completeness. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1326–1341.
[11] (2014) Approximating the best Nash equilibrium in n^{o(log n)}-time breaks the exponential time hypothesis. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 970–982.
[12] (2014) Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory 61 (1), pp. 549–564.
[13] (2012) The exponential time hypothesis and the parameterized clique problem. In International Symposium on Parameterized and Exact Computation, pp. 13–24.
[14] (2014) A tight algorithm for strongly connected Steiner subgraph on two terminals with demands. In International Symposium on Parameterized and Exact Computation, pp. 159–171.
[15] (2014) Integer programming. Springer.
[16] (2003) A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Management Science 49 (9), pp. 1268–1273.
[17] (2007) Variable disaggregation in network flow problems with piecewise linear costs. Operations Research 55 (1), pp. 146–157.
[18] (2015) Lower bounds based on the exponential-time hypothesis. In Parameterized Algorithms, pp. 467–521.
[19] (2014) Footstep planning on uneven terrain with mixed-integer convex optimization. In 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 279–286.
[20] (2015) Efficient mixed-integer planning for UAVs in cluttered environments. In IEEE International Conference on Robotics and Automation, pp. 42–49.
[21] (2017) JuMP: A modeling language for mathematical optimization. SIAM Review 59 (2), pp. 295–320.
[22] (2010) Networks, crowds, and markets. Vol. 8, Cambridge University Press, Cambridge.
[23] (2014) Mixed-integer linear methods for layout-optimization of screening systems in recovered paper production. Optimization and Engineering 15, pp. 533–573.
[24] (1990) Simulation of hybrid circuits in constraint logic programming. Computers and Mathematics with Applications 20 (9–10), pp. 45–56.
[25] (2020) Gurobi optimizer reference manual.
[26] (1999) DC programming: overview. Journal of Optimization Theory and Applications 103 (1), pp. 1–43.
[27] (2001) On the complexity of k-SAT. Journal of Computer and System Sciences 62 (2), pp. 367–375.
[28] (1984) Modelling with integer variables. Mathematical Programming Study 22, pp. 167–184.
[29] (2013) Complexity of SAT problems, clone theory and the exponential time hypothesis. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1264–1277.
[30] (2017) Dynamic reserve prices for repeated auctions: learning from bids. Available at SSRN 2444495.
[31] (1975) On the computational complexity of combinatorial problems. Networks 5 (1), pp. 45–68.
[32] (2016) Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Autonomous Robots 40 (3), pp. 429–455.
[33] (2015) Global optimization method for network design problem with stochastic user equilibrium. Transportation Research Part B: Methodological 72, pp. 20–39.
[34] (2011) Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS (105), pp. 41–72.
[35] (2015) Computing in operations research using Julia. INFORMS Journal on Computing 27 (2), pp. 238–248.
[36] (2017) Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In STOC, pp. 954–961.
[37] (2012) Mixed-integer quadratic program trajectory generation for heterogeneous quadrotor teams. In IEEE International Conference on Robotics and Automation, pp. 477–483.
[38] (2016) Learning algorithms for second-price auctions with reserve. The Journal of Machine Learning Research 17 (1), pp. 2632–2656.
[39] (2011) Reserve prices in internet advertising auctions: a field experiment. EC '11, pp. 59–60.
[40] (2019) CAQL: Continuous action Q-learning. Note: https://arxiv.org/abs/1909.12397
[41] (2019) Verifying neural networks with mixed integer programming. In International Conference on Learning Representations.
[42] (2015) Hardness of easy problems: basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In 10th International Symposium on Parameterized and Exact Computation (IPEC 2015).
[43] (2015) Mixed integer linear programming formulation techniques. SIAM Review 57 (1), pp. 3–57.