1 Introduction
One of the main challenges of mathematical optimization is to construct a mathematical model describing the properties of a system. When the structure of the system cannot be fully determined from the knowledge at hand, machine learning and data mining techniques have been used in optimization instead of such models.They have, for example, been used as decision values
[1], fitness functions [2], or model parameters [3]. To the best of our knowledge, so far models learned by data mining or machine learning have only been used for optimization in a blackbox manner, e.g., using only the predictions of learned models but not their internal structure. In contrast to such blackbox approaches, in this paper, we develop a novel whitebox optimization method, that is, we map entire models to a set of integer linear programming constraints such that the internal structure of the models is used during problem solving. The proposed whitebox method together with a blackbox method also provide a solution to an optimization problem of key interest to the artificial intelligence and operations research communities: auction design. We briefly introduce this domain before going into the details of our methods.
1.1 Sequential auction design
Auctions are becoming increasingly popular for allocating resources or items in businesstobusiness and businesstocustomer markets. Often sequential auctions [4] are adopted in practice, where items are sold consecutively to bidders. Sequential auctions are in particular desirable when the number of items for sale is large (e.g., flower auctions [5]), or when the buyers enter and leave the auction dynamically (e.g., online auctions [6]). In a sequential auction, an auctioneer may tune several auction parameters to influence the outcome of an auction, such as reserve prices for items and in which order to sell them. In other words, (s)he can design auctions for the purpose of achieving some predefined goal. In this paper, we solve one specific auction design problem, namely, deciding the optimal ordering of items to sell in a sequential auction in order to maximize the expected revenue (OOSA in short). We assume bidders in such auctions are budget constrained. This is a highly relevant problem in today’s auctions since bidders almost always have limited budget, as seen for instance in industrial procurement [7]. Previous research has shown that with the presence of budget constraints, the revenue collected by the auctioneer is heavily dependent on the ordering of items to sell [8, 9, 10]. This holds already for a toy problem with 2 items. Let us use a simple example to illustrate the importance of ordering in such cases.
Example 1
Two agents and take part in a sequential auction of items. For sale are items and . The values of these items for the agents are given as follows: . In addition, both and have a budget limit of . Assume the reserve prices (i.e., the lowest price at which the auctioneer is willing to sell) for both items are 4.
Suppose the items are sold by means of firstprice, English auction^{1}^{1}1The English auction that we consider is the one where the starting price is the reserve price, and bidders bid openly against each other. Each subsequent bid should be higher than the previous bid, and the item is sold to the highest bidder at a price equal to her bid.. We assume a simple bidding strategy in this example. Each agent ’s reservation price (i.e., the highest price is willing to pay) for item is . Agents bid myopically on each item, i.e., they will bid up to their reservation price, if their budgets allow.^{2}^{2}2We will discuss different bidding strategies in Section 5. The auctioneer’s goal is to maximize the revenue, i.e., the total sold price of the items.
Consider one situation where the auctioneer sells first and then . will get when she bids 5, and will not be sold since is not interested in it and is out of budget. The total revenue is . However, if the selling sequence is (, ), will win with the reserve price 4, and then will win with price 4. The collected revenue is 8 in this case.
Most of the current approaches to the ordering problem in sequential auctions assume a very restricted market environment. They either study the problem of ordering two items, see [11, 12], or a market with homogeneous bidders [13]. To the best of our knowledge, we are the first to consider how to order items for realistic auction settings with many heterogeneous bidders competing for many different items. This problem is highly complex—a good design on ordering needs to take care of many uncertainties in the system. For instance, in order to evaluate the revenue given an ordering, the optimization algorithm needs to know the bidders’ budgets and preferences on items, which is usually private and unshared. Furthermore, the large variety of possible bidding strategies that bidders may use in auctions are unknown. This auction design problem is a typical example where the mathematical optimization model cannot be fully determined, and hence, machine learning and data mining techniques can come into play. This is exactly what our approach builds upon.
1.2 Learning models for whitebox and blackbox optimization
Nowadays more and more auctions utilize information technology, which makes it possible to automatically store detailed information about previous auctions along with their orderings and the selling price per auctioned item. Our approach to solving the problem of optimal ordering for sequential auctions starts with the historical auction data. We define and compute several relevant features and then use them to learn regression trees and linear regression models for the expected revenue. Given the models, we propose two approaches to find the optimal ordering for a new set of items: (1) a bestfirst search that uses the models as a blackbox to evaluate different orderings of the items; and (2) a novel whitebox optimization method that translates the models and the set of items into a mixedinteger program (MIP) and runs this in an ILPsolver (CPLEX). Figure 1 displays the general framework of our approaches using these two optimization methods.
Just like the traditional blackbox optimization approach (see, e.g. [14, 15]
), our bestfirst search is ignorant of the internal structure of the models and only calls it to perform function evaluations, i.e., predicting the revenue of an ordering of the items. Optimization is possible by means of a search procedure that uses heuristics to produce new orderings depending on previously evaluated ones. Our bestfirst search makes use of dynamic programming cuts inspired by sequential decision making in order to reduce the search space.
One of the main contributions of this paper is the realization that learned models can be evaluated efficiently inside modern mathematical optimization solvers. This evaluation includes the computation of feature values (the input to machine learning), the evaluation of these features using a learned model (the output from machine learning), and a possible feedback from such evaluations to new features. In this paper, we efficiently translate all of these steps for two types of learned models (regression trees and linear regression models) into mixedinteger constraints. The resulting mixedinteger program can then be evaluated in any modern integer linear programming (ILP) solver. We call the resulting method whitebox optimization with learned models.
In this way, modern exact solvers can be used instead of a heuristic search. These solvers use (amongst others) advanced branchandbound methods to cut the search space, compute and optimize a dual solution, and can prove optimality without testing every possible solution. This is the main benefit of using the whitebox method over a blackbox one. The downside, however, is that when the learned model is complex, the whitebox method may lead to a large mathematical model that is difficult to optimize. We compare these two approaches and investigate this tradeoff by applying them to the OOSA problem.
Contributions and organization
Although we use sequential auction design to illustrate our method, all of our constructions are general and can be applied to any optimization setting where unknown relations can be represented using regression models that have been learned from data. The only constraint for using the whitebox method is that the feature values need to be computable using integer linear functions from intermediate solutions. Our approach can thus be applied to complex optimization settings where entire orders, schedules, or plans need to be constructed beforehand.
We list our contributions as follows:

We demonstrate how to apply regression methods from machine learning to OOSA.

We give an efficient encoding of regression trees and linear regressors into MIP constraints.

We prove OOSA with budget constrained bidders to be NPhard, also when using these regression models.

We provide the first method that tackles OOSA in realistic settings.

We automate the construction of mathematical models using machine learning.
In Section 2, we formally introduce the problem of optimal ordering for sequential auctions (OOSA), and then we show how to learn regression models from historical auction data in Section 3 using standard machine learning methods. Based on the learned models, our whitebox optimization method and a blackbox optimization are introduced to find the optimal ordering for OOSA in Section 4. Extensive experiments are presented in Section 5 where we compare the performance of the two proposed optimization methods using both the learned models and the auction simulator. Before we conclude, we compare and discuss more related works in Section 6.
2 Optimal ordering for sequential auctions OOSA
We assume there is a finite set of bidders (or agents). Let denote the collection of the item types, and the quantity of each item type can be more than 1. When it is clear from the context, we will slightly abuse the notation and use to denote the multiset of all available items. Each bidder agent has a valuation (or preference) for each type of item . In addition, each agent has a budget on purchasing items, and (s)he desires to win as many items as are being auctioned within the budget limit.
In one auction, a set of items with type set will be auctioned sequentially using a predetermined order. For example, given types and with quantities of 1 and 2 respectively, there are three possible orderings of resources: , , and . For each that is being auctioned, agent puts a bid on based on its valuation function if its remaining budget allows. The agent who bids highest on wins . This sequential auction ends when all items have been auctioned, or when all agents run out of their respective budgets.
We assume that such an auction is repeated over time, and each auction sells possibly different items . At the end of each auction, we have the following information at our disposal: (1) the ordering of auctioned items; and (2) the allocation of items to agents with their winning bids. The optimization problem we study is: given a set of items and budget constrained bidders, finding an optimal ordering of items in sequential auctions such that the expected revenue is maximized. We call it OOSA. We now show that the decision version of this optimization problem is NPhard, even if we have complete information on bidders’ preferences and they are not strategic (i.e., they bid truthfully according to their true preferences).
Theorem 1
Given a set of items , preferences , and budgets for every bidder . The problem of deciding whether there exists an ordering that obtains a revenue of at least is NPhard.
Proof 1
By reduction from the wellknown NPhard partition problem [16]: Given a set of integers , is dividable into two sets and such that ? We need two bidders with preferences such that and for . The reserve price for each item () is . The agents’ budgets are and . The set of items is and is . We claim that is partitionable into two sets with equal sums if and only if there exists an ordering that obtains a revenue of (or more).
Given a partition of into sets and , we sell all items in first, and those in later. In this case, agent will buy all items in with price as it is the minimal bid to win the items from agent . After buying all items in , agent will have spent , which makes in total (since ). This is the entire budget of agent . All items in are then sold to agent with the reserve price . Thus agent pays . This makes a total revenue of .
Suppose we have an ordering such that agent and spend all of their budget ( in total). This means that agent wins the first set of items A, each costing till it uses all its budget. Thus we have , i.e., . Agent pays for the remaining items in , , and it uses all its money: . Hence, we have a partition of where .
The construction is clearly polynomial time.
Several related works deal with this type of ordering optimization problem. For example, in the Economics literature, the authors of [11] investigate the optimal ordering strategy for the case where the auctioneer has two items to sell. They show that when the items are different in value, the higher valued items should be auctioned first in order to increase the seller’s revenue. Pitchik [12] points out that in the presence of budget constraints, in a sealedbid sequential auction, if the bidder who wins the first good has a higher income than the other one, the expected revenue is maximized. With these greatly simplified auction settings, it is possible to derive bidders’ equilibrium bidding strategies. Hence, with some assumptions on the distributions of bidders’ budgets and valuations of items, the optimal ordering can be theoretically derived. However, as realworld auctions are much more complex and uncertain in terms of the sizes of items/bidders, agents’ preferences and bidding strategies, these existing results cannot be applied. In this paper, we instead focus on learning the overall behaviors of the group of bidders from historical auction data by machine learning techniques, as the first step of solving OOSA.
In order to apply ML techniques, we assume in every sequential auction the set of participating bidders and their characteristics (preferences, budgets, bidding strategies) to be similar. This simplifies the problem of learning a good ordering. Instead of learning the individual valuations/budget/bidding strategies of agents, we can treat the agent population as a single entity for which we need to find a single global function. Obviously, such an approach will fail if the agents are radically different in every auction. However we consider this assumption sensible in many auction settings such as industrial procurement auctions where the same companies repeatedly join the auctions with similar interests, and the Dutch flower auction where there can be different bidders every day, but it seldom occurs that one day bidders are only interested in roses and the next day they only want tulips. Although the different participants can be interested in different item types, the interests of the group of participants remain stable.
3 Learning predictive models for OOSA
At the end of each sequential auction, we have the following information at our disposal: (1) the ordering of auctioned items; and (2) the revenue of each sold item. We need to find a suitable way to model the expected revenue of such orderings. An ordering can be thought of as a sequence of items. However, to the best of our knowledge none of the existing sequence models fit our auction setting, see also Section 6.2
. What comes closest to our auction setting are models such as Markov decision processes (MDPs)
[17]. These directly model the expected price per item and come with methods that can be used to optimize the expected total reward (revenue). However, an implementation of the auction design problem as an MDP is not easy. We discuss this in more detail in Section 6.2.3.1 A regression model for OOSA
Instead of representing our problem with a sequence model such as an MDP, we view the prediction of an auction’s outcome as a regression problem. We split this problem into the subproblems of predicting the value of the auctioned items. We then sum these up to obtain the overall objective function:
where is a regression function that determines the expected value of given that was auctioned before and will be auctioned afterwards. The main benefit of this representation is that modern machine learning methods can be used to learn this function from data. In addition, since every item sold represents a single sample, every auction contains many samples that can be used for learning, further reducing the amount of required data.
3.2 The regression functions
In this paper, we use regression trees [18] and least absolute shrinkage and selection operator (LASSO) [19] as regression functions, and train them using features based on the items auctioned before and after the current item. We first briefly introduce these regressors.
Regression trees are a form of decision trees where the predicted values are real numbers. A decision tree is one of the most popular predictive models for mapping feature values to a target value. It is a treeshaped graph with a root node, interior nodes, and leaf nodes. The root and every interior node contains a Boolean test for a specific feature value
, such as . Every leaf node contains an output value . It maps the feature values to an output by performing all the tests along a path from the root to a leaf. For every test performed, if the outcome is true (), the path is continued along the left branch, if the outcome is false (), the path is continued along the right branch. Once a leaf is reached, it outputs the value it contains .A regression tree learner aims to find a tree that minimizes the mean squared error of the predicted and the actual observed values. Most regression tree learning algorithms follow a greedy strategy that splits interior nodes as long as the decrease in error is significant. A split replaces one leaf node by an interior node connected with two new leaf nodes. The interior node receives as Boolean constraint one that minimizes the meansquared error of the resulting tree, where the leaf nodes predict the mean value of all observed data values that end up in that leaf after mapping all data samples to leaf nodes.
LASSO is a method for constructing a linear regression function , where is the value to predict, are constants, are feature values, and
is the intercept. The standard approach to find such a function is to minimize the mean squared error, which is easy to compute. LASSO is a popular regularized version of this simple estimation that penalizes the absolute values of the regression coefficients
. Formally, given a dataset of features and target values , with where is the number of samples, it uses convex optimization in order to find a regression function that solves the following problem^{3}^{3}3This is the version implemented in the scikitlearn Python package [20], which we use to learn the models.:where is a parameter for the effect of the regularization. Intuitively, the larger , the larger the penalty of having large coefficients . Consequently, a larger value of
will drive more coefficients to zero. LASSO is a useful method when there are several correlated feature values, which could make an ordinary least squares model overfit on these values. We use LASSO regression because more zero coefficients implies we need to compute less feature values in order to evaluate the learned model, which has a positive effect on the optimization performance that we will discuss in Section
4.Currently, we provide the following features for these two regression models:
 Feature 1: sold

For every item type , the amount of items already auctioned.
 Feature 2: remain

For every item type , the amount of items still to be auctioned.
 Feature 3: diff

For every pair of item types and , the difference between the amount of and items already auctioned.
 Feature 4: sum

For every item type , the amount of value obtained from auctioning items, and the overall sum.
 Feature 5: index

For every item, the index at which it was auctioned.
Other sequential features such as sliding windows and Ngrams can of course be added to the model. However, since our whitebox method computes these values inside an ILP solver, the only requirement is that they can be represented using an integer linear formulation. Although the
diff feature can be determined using the first, we add it for convenience of learning a regression tree, which requires many nodes to represent such values. The influence of budget constraints is only directly modeled by the fourth feature: once the amount paid for items reaches a certain (to be learned) bound, we can expect all agents that only want items to be out of budget. Although, there only exists an indirect relation between the budget constraints and the first three features, including them can be beneficial and these are easier to compute. If used by the regression model, these features thus reduce the time needed to solve the auction design problem. For similar reasons, we add the last feature. Below we give an example of how an ordering and its obtained values is transformed into a data set using these 5 types of features.type  value  sold  sold  diff  sum  sum  sum  index 
4  0  0  0  0  0  0  1  
4  1  0  1  4  0  4  2  
5  0  0  0  0  0  0  1  
0  0  1  1  0  5  5  2 
Example 2
Consider the setting of Example 1. Assume two auctions have been carried out. One sold first and then . The other reversed. As shown in Example 1, the first auction would obtain a revenue of 8, and the second auction would receive 5. We compute feature values from these two auctions as depicted in Table 1. Subsequently, we learn regression trees for both item types and , as shown in Figure 2.^{4}^{4}4The learned linear regression model is more straightforward. Hence we skip such an example here.
After learning these regression trees, we can optimize the ordering for a new (unseen) multiset of items by trying all orderings and choosing one with maximum expected revenue: gives , gives , gives , and returns . Hence, the optimizer will choose to schedule the item before all items. In general, trying all possible orderings will be impossible: for a multiset of items of types, there are a total of unique orderings, which blows up very quickly.
3.3 Modeling power and tradeoff
Our method of regression modeling allows the use of any regression method from machine learning for predicting unknown quantities in optimization, such as objective values and parameters. In addition, since the regression function uses other values in a (proposed) solution as input () instead of only external parameters/data, a learned regression model represents unknown relations between the different values in a solution. The model thus answers the question “What is the value of X given that we do Y?”, as opposed to “What is the value of X?” that is answered by fitting only model parameters. Answering the first question allows for many more interesting possibilities. For instance, one could use stochastic optimization with fitted parameters to produce a schedule, use regression models to predict the effect of this schedule on the parameters, and use stochastic optimization again on the newly estimated parameters. This way, one can use machine learning tools to plan further ahead. Using our whitebox method, this can even be done using a single call to the optimization software.
This loopback functionality provides a lot of power to our method, but also comes with a risk. Every time the predictive models are used there is a probability that the predictions are inaccurate. When using a loopback, these possible inaccuracies influence all future predictions that depend on it. These future predictions are thus more inaccurate and the predicted overall objective value can potentially diverge from the true value. These cascading inaccuracies are an issue, however, the added modeling power makes up for it. We make use of it in the
sum feature, which relates the predicted value to the predictions of earlier auctioned items. This feature is very important for predicting budget constraints, and consequently is often used by the regression models to produce predictions.4 Whitebox and blackbox optimization for OOSA
Given the predictive models for the expected value per item, it is not straightforward to compute a good ordering as we already showed in Example 2. For a given ordering, we can predict the individual revenues of items using the regression model, and sum these up to obtain the revenue of the ordering. However, testing all possible orderings and choosing the one with the highest revenue will take a very long time. For instance, when we want to order 40 items of 8 types (the experimental setting in Section 5) with 5 of every type, we will need to test possible unique orderings.
In A, we also provide hardness results that demonstrate there is little hope (unless ) of finding an efficient (polynomial time) algorithm that gives the optimal ordering for any regression tree or linear regression predictor. In general, we cannot do better than performing a guided search through the space formed by all possible orderings. We present two such searchbased optimization methods: (1) a novel “whitebox” optimization (i.e., ILP model), and (2) a “blackbox” heuristic (i.e., bestfirst search).
4.1 Whitebox optimization: an ILP model
Given regression tree and linear regression models for the expected value per item type, we automatically formulate the problem of finding an optimal ordering as a mixed integer linear program (MIP). We discuss the encoding of a sequential auction, feature values, objective function, and translating the learned models (regression tree and linear regression respectively) below.
Ordering an auction
Given a multiset of items, each from a set of possible types , we use the following free variables to encode any possible ordering of : . Item is of type if and only if . Thus, if is equal to 1, it means that the third auctioned item is of type . We require that at every index at most one item type is auctioned, and that the total number of auctioned items of type is equal to the number of type items in .
=  for all  

=  for all 
Any assignment of ones and zeros to the variables that satisfies these two types of constraints corresponds to a valid ordering of the items in . The value of such an ordering is determined by the learned regression models.
Translating feature values
In order to compute the prediction of a regression model, we not only need to translate the models into ILP constraints, but also the values of the features used by these models. Feature 1, 2, 3, and 5 can be computed using linear functions from the variables:
=  for all  

=  for all  
=  for all  
=  for all 
For the fourth type of feature, we use an additional variable , which encodes the expected value of the item auctioned at index of type . If the item at index is not of type , is equal to . Since the variables require the predictions (expected values) as input, we provide their definition after defining the regression models.
=  for all , 

To aid the ILP solver, we also precompute the minimum and maximum obtainable values of every feature at every index and provide these as bounds to the solver.
Constructing the objective function
We aim to maximize the expected values :
These are the predictions of the regression functions, defined later. Although it is also possible to compute both the objective function and the sum values as very large sums over the and model variables (described next), specifying parts of these sums as intermediate continuous variables significantly reduces both the encoding size and the computation time.
Finally, we discuss how to encode the learned regression tree and the linear regression model as the constraints in the ILP model.
Encoding regression trees
We translate the regression tree models into ILP using carefully constructed linear functions. Our encoding only requires one new set of variables , representing whether a leaf node is reached for item type at index . The internal (decision) nodes of the trees can be represented implicitly by the constraints on these new variables. Intuitively, we encode that a variable has to be false when the binary test of any of its parent nodes fails. By additionally requiring that exactly one variable is true at every index, we fully encode the learned regression trees.
Let be the set of all decision nodes in the regression tree for type . Every decision node in contains a boolean constraint , which is true if and only if feature has a value less than or equal to a constant . A key insight of our encoding is that every such boolean constraint directly influences the value of several variables: if it is true (at index ), then all variables representing leafs in the right subtree are false; if it is false, then all that represent leafs in the left subtree are false. In this way, we require only two constraints per boolean constraint in order to represent all possible paths to leaf nodes.
for all  
for all 
where is a calculation of feature value for index , and are the leaf nodes in the left and right subtrees of the decision node with constraint in the regression tree for type , and and are the maximum and minimum values of feature at index . For the feature calculation we simply replace with the righthand sides of the corresponding feature definitions.^{5}^{5}5We ignore the possibility that a feature is equal to because, since features have a limited precision, we can always replace the constants in a decision node with one that cannot be obtained by , without changing its behavior.
The above constraints ensure that when obtains a value of , all of the binary test in the parent nodes on the path to in the tree for type return true at index . By construction of the regression trees, this ensures that at most one variable is true for every type and index . We require however that exactly one variable is true at every index.^{6}^{6}6Counterintuitively, it can occur that the objective function (discussed below) is maximized when every variable is false at an index . If a small sum is needed to reach a high revenue prediction, it can be beneficial to auction but not sell an item. This has to be of the same type as the item sold at index , denoted by the variables:
for all 
This completes our encoding of the regression trees. The predictions of the trees at every index are given by the variable that is true for index . We multiply this variable with the constant prediction in the leaf node it represents to obtain the prediction, and store it in the variables used to compute the sum feature values.
,  for all 
where is the constant prediction of leaf in the tree for type .
We now discuss how to encode a linear regression model.
Encoding linear regression model
Due to its linear nature, implementing linear regression in ILP is very straightforward. We can directly compute the value of the variables using the linear predictor function:
,  for all 
where Feat is the set of all features, is feature ’s values at index , and is the constant coefficient for feature in the regression function for type . The only somewhat difficult part is that at every index, the used regression function can change depending on the auctioned item type . We implemented this choice using indicator functions in CPLEX.^{7}^{7}7Many other solvers have similar constructions. If not, these constraints can be implemented using a ‘bigM’ formulation, similar to the one we use to determine the value of the variables in the regression tree formulation. This changes the above formulation as follows:
,  for all  
,  for all 
It states that if is true, then the values of is determined using the regression function. Otherwise, its value is 0. These are the only constraints needed to fully implement a linear regression function. When using LASSO regularization, some coefficients will receive the value 0. These are removed from the encoding, making the models smaller and easier to evaluate in the solver.
Now the two ILP models are complete and ready to solve the OOSA problem. We give the following example to illustrate how the formulation of ILP works given learned regression trees.
Example 3
Given the learned trees in Example 2, suppose we are asked to order a new multiset of items . We translate this new set, together with the learned trees into the following integer linear program with the following decision variables (for all ): :
where  
and 
subject to (for all )
This denotes that exactly one variable is true at every index , variables are true for item type , and for type . This encodes all possible orderings. From this we compute the feature values (for all ):
that are used in the constraints denoting the Boolean tests in the internal nodes:
where and . The first two constraints encode that if , ; and if , . Thus, if a variable is true for a leaf, then all the Boolean tests of internal nodes on the path from the root to that leaf have to succeed. The last two constraints are the same constraints for the second tree. At last, we require that exactly one variable is true at every index:
,  
. 
A satisfying assignment to the variables is set to 1, the rest to 0, corresponding to the ordering . Since , this leads to and , forcing . Since , this implies . For the next index, we have , giving and , and therefore forcing and . Similarly, we obtain . This results in an objective value of .
4.2 A blackbox heuristic: bestfirst search algorithm
We also provide a blackbox heuristic for solving the ordering problem, see also [21]. The traditional method to overcome the computational blowup caused by sequential decision making is to use a dynamic programming method. Although this lessens the computational load by combining the different paths that lead to the same sets of auctioned items, the search space is still too large and waiting for a solution will take too long. Instead, we therefore employ a bestfirst search strategy that can be terminated anytime in order to return the best found solution so far. We show how this bestfirst search strategy works in Algorithm 1.
The algorithm uses a hashtable and a priority queue. The hashtable is used to exclude the possibility of visiting the same nodes twice if the obtained value is less than before (just like a dynamic programming method). These dynamic programming cuts are sensible but lose optimality as on rare occasions it could be better to sell earlier items for less, leaving more budget for the remaining ones. The priority queue provides promising candidate nodes for the bestfirst strategy. By computing random orderings of the remaining items, the learned models can evaluate complete orderings of all items. The best one found is stored and returned if the algorithm is terminated. Unfortunately, this does not result in an admissible heuristic for an A* search procedure. Hence, even if the algorithm pops a solution from the queue, this is not necessarily optimal. In our experience, using random orderings of the remaining items in this heuristic provides a good spread over the search space. Although some nodes can be ‘unlucky’ and obtain a bad ordering of the remaining items, there are always multiple ways to reach nodes in the search space and it is very unlikely that all possibilities will be ‘unlucky’.
4.3 Whitebox or blackbox optimization?
The main difference between the two abovementioned approaches (see Figure 1
) is that the whitebox method specifies the predictors entirely as constraints, which can be used to infer bounds on the predictions and cut the search space. The blackbox method instead uses the predictors as oracles and is ignorant as to how the predictions are made, which are naturally more efficient to compute but cannot be used to infer search space cuts, i.e., to deduce that one ordering is better than another without testing both of them. Another key difference is that the whitebox method results in a single optimization model that can be run in any modern solver, while the blackbox method requires the use of executable code to produce the predictions. In the blackbox setting, it is therefore much harder to use the powerful solving methods available in dedicated solvers for problems such as integer linear optimization (ILP), satisfiability (SAT), or constraint programming (CP). Instead, general search methods can be used such as bestfirst search, beamsearch, metaheuristics, genetic algorithms, etc.
Both blackbox and whitebox approaches have their advantages. The main advantage of blackbox is that its performance is for a large part independent of the complexity of the used regression model. In contrast, by explicitly modeling the regression model as constraints in a whitebox, more complex regression models lead to many more constraints, which can dramatically increase in the time needed to solve it. Another advantage of blackbox optimization is that it is easy to include additional cuts such as the dynamic programming cuts discussed above. Such cuts can be added as constraints in an LP formulation, but this can lead to a blowup in runtime.
The main benefit of using the whitebox approach is the use of modern exact solvers instead of a heuristic search. These solvers use (amongst others) advanced branchandbound methods to cut the search space, compute and optimize a dual solution, and can prove optimality without testing every possible solution. Since Our whitebox constructions can also be easily integrated into existing (I)LP formulations that have been used in a wide range of applications in for instance Operations Research. In this way, one can combine the vast amount of expert knowledge available in these applications with the knowledge in the readily available data. Our whitebox method is the first we know of that makes the results of machine learning directly available to mathematical modeling in this fashion.
The most important downside of whitebox is that an evaluation of translated models likely requires more time than running the code as a blackbox, especially when the models or features are somewhat complex. In our opinion, however, the advantages of whitebox optimization largely outweigh those of blackbox optimization and make it a very interesting topic for research in machine learning and optimization. In the next section, we compare these two modeling approaches and investigate this tradeoff by applying them to OOSA.
5 Experiments
Designing an optimal ordering for sequential auctions is difficult with heterogeneous bidders, as they may value items differently, have different budget constraints, and moreover, bid rationally or irrationally with various bidding strategies. To evaluate the performance of the proposed optimization methods, ideally, we should collect real auction data, build the optimization models, run realworld auctions with real bidders using different ordering of items produced by different methods, and then compare the resulting revenues. Since this evaluation method is not feasible for us, nor is it the main purpose of this paper, we opted for a widely accepted evaluation approach in research community, that is, we created an auction simulator which simulates auctions with agents. We used this simulator to generate auction data sets, and to evaluate the proposed method. An overview of this process is given in Figure 3.
5.1 The simulator
Simulating auctions
We simulate several sequential auction settings with the simulator: (i) first price auctions where agents bid their reservation prices (agents’ reservation prices are lower than their valuations), as in [11]; (ii) second price sealed bid auctions (i.e. Vickrey auctions [22]), where agents bid truthfully on each item based on their valuation functions, as in [6] ^{8}^{8}8Note that in sequential Vickrey auctions with budget constrained agents, truthtelling is not an equilibrium bidding strategy (see [23]). To the best of our knowledge, how to compute the equilibrium bidding strategy in realistic auction settings is an open problem.; (iii) Vickrey auctions where agents bid smartly, i.e., they compare the utility obtained at the end of the auction when buying and not buying the item and place a bid based on the difference (see Section 5.4 for more details). On the last auctioned item, they bid truthfully^{9}^{9}9Vickrey [22] showed it is a weakly dominant strategy for bidders to bid their true values for the last auctioned item.. Given all bids on an item, the highest bid wins. If multiple agents have the same true value, one of these is selected as winner uniformly at random. With these different auction settings, we intend to show our method is robust to the actual auction format and bidding strategies.
Item types
We use a given set of 8 items to initialize the auction simulator. Every type gets assigned an average value of , for , and a reserve price of . Every type is assigned popularity and sparsity values, drawn uniformly from . The popularity value measures the degree of desirability of the item type by the agents. The sparsity is a measure for the frequency that an item type is sold. In every auction, items are generated using a roulette wheel drawing scheme using the sparsity values.
Bidder agents
The simulator starts with 20 randomly generated bidding agents. Every such agent gets assigned a budget between 50 and 250 uniformly at random. They may desire 1 to 5 of the 8 item types, where popular types have a higher probability of being selected, drawn using a roulette wheel selection on the item types’ popularity values. For bidding in firstprice auctions, every desired item type assigned to an agent is also given a reservation price of , where is the average value of that type, and is a uniform random value between 0.5 and 2.0. The following is an example of five agents generated for the experiments:
budget  

123  59  
141  78  81  41  115  
130  32  21  80  25  36  
126  32  53  28  36  
182  51  93 
Training
For a given set of 20 random bidders, the simulator generates 1000 historical auctions. The 40 items in these auctions are generated using the above scheme, and ordered randomly. These items are run in the simulator, where the agents use the abovementioned bidding strategies to decide what value to bid, in order to determine the winners and the item selling prices. The total selling price of all items in one sequential auction is the collected revenue. For the 20 generated bidders, we first experiment the effect of different item orderings on the collected revenue by trying 100 random orderings and comparing the smallest, median, and largest collected revenue. If the difference between the largest and smallest is less than one tenth of the median revenue, 20 new bidders are generated. This process is repeated until we find a set of agents that passes this check, which typically occurs after a few iterations. By performing this check, we remove irrelevant problem instances. For the 20 agents that pass this check, we generate 1000 auctions of 40 items and simulate these auctions together with the agents. The resulting sequences of itemprice pairs are then transformed to the features discussed in Section 3, and the resulting data set is used to train the regression trees and linear regressors.
Testing
For the same set of 20 bidders, we generate 5 sets of 40 items, which are used for testing. First, the regressors are tested by comparing their predictions with the revenues generated by the simulator on 50 random orderings of each of these item sets. Second, we translate each of the item sets into constraints for both the blackbox and whitebox optimization solvers. The best ordering found by these solvers are compared based on their values on the regression model, and in the simulator.
5.2 Experimental setup
In each experiment, we generate agents and items as described above. We use an implementation of regression trees and LASSO from the scikitlearn machine learning module [20] in Python to learn (and evaluate) the regressors. We learn trees of different depths of 3, 5, 8 (we call them tree3, tree5, tree8), and we set the minimum number of samples required to split an internal node to 10. The LASSO regressor is run with 3 different values for : 1.0, 0.1, and 0.000001 (lasso1, lasso2, lasso3). The resulting trees and linear models then get translated to ILP, which in turn gets solved by an ILPsolver (CPLEX [24]). In addition, we provide the solver with an initial solution (the best of 1000 random orderings) in order to start the search, and set the focus of the solver to finding integer feasible solutions. We set a time limit on the ILP solver of 15 minutes for each instance using a single thread on an Intel core i5 with 8GB RAM and record the best ordering of items that the ILP solver has obtained. The last minute is spent on solution polishing (a local search procedure in CPLEX). We apply our bestfirst search method on the same problem instances with the same running time limit.
Evaluations
There are two levels of evaluations involved in our problem. Firstly, we determine the quality of the learned regressors, as they influence the quality of the solution after optimization. For this, we tested different regression trees with different maximum depths and linear regression models with different parameters.
Next, the optimization methods are evaluated in terms of the quality of the predicted revenue. The optimization methods that we compare include the proposed whitebox ILP model which finds a solution based on the abovementioned 6 regression models, the proposed blackbox bestfirst search which evaluates a solution based on the 6 regression models, and in addition, two other simple ordering methods: (i) auctioning the most valuable item first (i.e., mvf), as suggested in [11]; and (ii) a random ordering strategy (i.e., mean5000), as seen in many realworld auctions for the purpose of fairness.
It is not feasible for us to compute the best solution given the problem size. Thus, we obtain a lower bound on the optimal solution as follows: given a set of items, we generate 5000 random orderings, and we use the true model (i.e., the simulator) to evaluate them and pick the one returning the highest revenue. We use the mean value of these 5000 random orderings as the output of the random ordering strategy.
We evaluate the 15 ordering methods in two ways:

Model evaluation: we use the learned regression models to evaluate the solutions returned by the ordering methods to compute the predicted revenues.

Actual evaluation: we run auctions with the solutions (i.e., orderings) returned by the ordering methods in our simulator to obtain the corresponding revenues. Note that such an evaluation is possible only when a simulator is available.
There are in total three sets of experiments presented in this paper.

In the first set of experiments, we simulate first price auctions where agents bid and pay (if winning) their reservation prices.

In the second set of experiments, the simulator runs Vickrey auctions where agents bid their true values on each item and the winner pays the second highest bid, or the reserve price if (s)he is the only bidder.

In the third set of experiments, the simulator runs Vickrey auctions and agents bid smartly based on their expected utility at the end of the auction (see 5.4).
5.3 Experiment 1
tree3lp 
tree3bf 
tree5lp 
tree5bf 
tree8lp 
tree8bf 
lasso1lp 
lasso1bf 
lasso2lp 
lasso2bf 
lasso3lp 
lasso3bf 
mvf 
best5000 
mean5000 
total wins 


tree3lp  0  171  105  152  107  125  102  130  71  127  82  119  231  17  240  2068 
tree3bf  125  0  83  129  81  105  71  116  61  94  67  86  226  15  230  1770 
tree5lp  192  213  0  197  149  168  144  169  111  156  128  164  249  24  273  2632 
tree5bf  142  163  97  0  105  122  94  134  74  118  90  115  238  16  241  2037 
tree8lp  188  217  142  193  0  178  137  181  113  160  128  162  260  23  287  2668 
tree8bf  170  192  130  168  121  0  112  157  91  133  106  138  251  28  269  2361 
lasso1lp  193  226  150  202  160  183  0  185  113  178  136  170  263  38  269  2756 
lasso1bf  168  181  127  158  113  139  100  0  88  135  91  131  233  18  251  2219 
lasso2lp  224  236  184  225  182  205  174  204  0  195  159  199  265  40  286  3073 
lasso2bf  171  200  141  177  134  163  115  155  94  0  107  124  243  17  273  2407 
lasso3lp  214  231  169  204  167  188  155  201  126  184  0  189  257  36  281  2899 
lasso3bf  176  210  131  182  134  161  124  154  89  156  96  0  245  15  276  2439 
mvf  68  73  51  61  38  48  36  65  34  55  41  54  0  9  123  982 
best5000  282  284  268  284  277  271  257  280  255  278  252  280  291  0  299  4158 
mean5000  57  65  27  58  13  30  31  48  12  26  18  24  174  1  0  864 
We generate 60 sets of agents. For each set of agents we generate and run new sets of items 5 times. This results in 300 models for each learning method.
We first report the performance of the learned trees and linear models in terms of prediction accuracy. Given the same set of items as used during learning, we randomly generate 50 permutations of these items as orderings and compute the predicted valued of these orderings using the learned models. Thus, for every set of agents, there are bids to predict. These predictions are compared with the evaluated values of the orderings by the simulator. We report the coefficients of determination, or scores, in Figure 4. This coefficient is a standard measure for comparing regression models and is defined as:
where is the number of samples, is the th data value (simulator evaluation), the predicted value, and the mean of all data values. A large value (close to 1.0) means the regressor is an almost perfect predictor, and smaller values indicate worse performance. Figure 4 shows that all regression models lead to good prediction, with the lowest score over 0.85. The learned trees with depth 8 give the best performance, followed by the trees with depth 5. Intuitively a larger tree may give a better prediction. The scores of lasso2 and lasso3 are very similar, and slightly better than lasso1. This result makes sense since LASSO with a higher regularization parameter (i.e., lasso1) implies the use of less features, and hence, may have less prediction power. The tree with depth 3 shows much worse performance than the larger trees, and it is on average worse than the three linear regression models.
We now discuss the actual performance of the different ordering methods. After every ordering method returns its best ordering, these orderings are then evaluated in the simulator to get corresponding revenues. As we ran 300 different instances (60 sets of different agents, each with 5 sets of different items), each method has 300 such revenues. We calculate the frequencies of wins by comparing the revenues in pairs in Table 2. One obvious conclusion from the table is that the ordering heuristic mvf (i.e., most valuable item first) performs worst, regardless of which method it compared to. In fact, this heuristic performed even worse than the random ordering strategy mean5000 (123 wins vs. 174).^{10}^{10}10Notice that there is one instance where mean5000 is better than best5000. This is due to the random scheme that we used in selecting winners who give two identical bids. Consequently, evaluating the same ordering twice in the simulator may result in two different revenues. We want to point out that this does not happen often and the effect is often negligible. This result contradicts the theoretical finding that was concluded using much simpler auction settings. Another observation is that given the learned models, the developed whitebox methods win over the blackbox methods more than half of the time. This holds consistently for all 12 proposed methods (wins lp vs. bf: 171, 197, 178, 185, 195, 189). It shows that our new way of utilizing the internal structure of the learned models for optimization is promising.
If we look at the results of the whitebox methods built from the learned regression trees, i.e., tree3lp, tree5lp, tree8lp, we notice that tree5lp and tree8lp performed similarly and they are slightly better than tree3lp. This result is consistent with the higher scores of the larger trees. Interestingly, despite their lower scores, the linear regression LP methods return good orderings especially lasso2lp and lasso3lp which are on average better than the three regression tree LP models. These are further confirmed with Figure 5, which depicts the revenue differences in the simulator between the ordering methods and mean5000. It is more obvious from this figure that the proposed linear regression LP models are the better optimization methods than the tree LP models in practice. We believe that this is due to the cascading inaccuracies, caused by the sum feature that relates the predicted value to the predictions of earlier auctioned items (see discussions in Section 3.3). Due to the crisp boundaries in regression trees, the effect of these errors on the solution evaluation is much greater than using linear regressors. We believe the effect is smaller for larger trees because they are more accurate.
Figure 6 shows the performance difference between the LP model and the bestfirst search, which are built upon the same learned regression model. The solutions are evaluated using the predictive models. The value differences between the whitebox and the blackbox methods are more significant on smaller trees than on bigger trees (tree3 vs tree5 vs tree8), and on linear models with higher regularization parameter than on lower regularization parameter (lasso1 vs lasso2 vs lasso3). This trend shown in the results is somehow expected. The smaller tree leads to a smaller LP model, which is easier to optimize by the solver and consequently gives a much better performance than the bestfirst search. Similarly, the linear regression with a higher regularization parameter implies less feature values to compute during whitebox optimization, and therefore, its advantage over the blackbox method is more obvious than lasso2 and lasso3 which are with smaller values of .
We observe that all whitebox LP methods are better than the blackbox methods, except the LP models resulting from the trees with depth 8. The depth 8 regression trees perform better for bestfirst search when evaluated on the model (Figure 6), but better for LP when evaluated in the simulator (see Figure 5). The most likely reason for the strange behavior of the depth 8 trees is that the bestfirst search outperforms LP on harder to predict instances.^{11}^{11}11Another possible cause is that the LP solver makes small rounding errors. Such errors are unavoidable because using both very small and very large coefficients and/or precision in one problem formulation can cause numerical instability. We therefore round the mean values of the leaf predictions to two digits after the decimal point in the regression tree models before translating it to LP. Unfortunately, due to the crisp decision boundaries in regression trees, these small errors can sometimes have a large effect. Intuitively, because hardertopredict instances typically result in larger models, they are harder to optimize in CPLEX. We checked this cause by investigating whether the scores of the depth 8 regression tree are correlated with which method performing better in the model evaluation. The mean of these scores are 0.950 when the LP performs better, and 0.942 when the bestfirst performs better. Although this difference seems small, it is significant.
Moreover, the small difference in scores between trees with depth 5 and depth 8 also has a significant effect on the difference between their model and simulator evaluations (e.g., due to cascading errors). To demonstrate this, we report in Figure 7 the solution differences of the same ordering methods when being evaluated by the model and the simulator. The purpose of this comparison is to test whether the predicted outcome (using learned models) corresponds to the actual outcome (using simulator). This test is important as in general, there are no simulators available to evaluate the solutions. The figure demonstrates that the linear regression based optimization methods return more reliable solutions, i.e., their solutions evaluated on the learned linear models are closer to the solution values returned by the simulator. Note that it is logical that most values are over estimated because we the optimization tries to solve a maximization problem. The trees with depth 5 and 8 show a significant difference in this evaluation. The depth 3 trees end up with the highest evaluation difference, and overestimate the solution values the most.
5.4 Experiment 2 and 3
In order to demonstrate that our method of auction optimization using learned models is robust to the used auction rule or the bidding strategies, we test it in a secondprice auction in Experiment 2, and with smart agents in Experiment 3. We generate 10 sets of agents. For each set of agents we run new sets of items 5 times. Figures 8 and 9 show the same plots for these settings as Figures 5 and 6 for the setting in the first experiment.
In the second experiment, we test with truthfultelling bidders in secondprice auctions. The only difference with the setting in the first experiment is that here agents bid their true values on the items and pay an amount equal to the second highest bid, or the reserve price if there is only one bidder. As Figure 8 shows, the results are very similar to those in the first experiments: (1) all methods outperform the naive ordering strategies from the literature, (2) the whitebox outperforms the blackbox methods, and (3) the linear regression models perform best. However, the difference between the best performing methods tree5 and lasso2 is no longer significant.
In the third experiment, we test with smart agents that aim to maximize their final utility in a secondprice auction. Specifically, when deciding what value to bid on the current item , they have access to the auction simulator and use it to run the remaining items . For computational reasons, when running the remaining items , it is assumed that every agent bids truthfully and pays according to the secondprice rule in these runs. For every bid , they run the simulator twice: once in the situation where they bid the item with their reservation prices (run1), and once where they do not buy the item (run2). They then decide what value to bid according to the following rules:

If after run2 the agent has a remaining budget greater than its value for the item, it bids truthfully. The intuition is that it is better to buy an item than to have remaining budget.

Else, if the total utility after run2 is less than after run1, it also bids truthfully. When it is better to buy an item, try to obtain it.

Else, it bids its true value minus the difference in utility after run2 and run1. When it is monetary units better to not buy an item, try to obtain it for the value minus .
Using these rules, the agents bid the highest value that they expect will give them an increase in utility.
As can be seen in Figure 9
, also in this challenging setting, our method performs significantly better than the naive ordering rules from the literature. There is however a much larger variance in the performance of the different methods, causing the difference between blackbox and whitebox to be insignificant in the simulator. When evaluated on the model, however, whitebox is still better than blackbox in all cases except
tree8. An interesting final observation is that, with these smart agents, the mvf method does seem to perform slightly better than mean5000 (although not significantly), while in the other experiments it performed consistently worse.6 Comparison to related work
We discuss related works and how our work contributes to and from several related research communities.
6.1 Interplay between mathematical optimization and machine learning
Many studies have investigated the interplay of data mining and machine learning with mathematical modeling techniques, see overview in e.g. [25, 26, 27]. Most of these investigate how to use data mining to estimate the value of parameters in decision making models or to replace decision model structure when it cannot be fully determined from the hypotheses at hand. For instance, Brijs et al. [28] build a decision model as an integer program that maximizes product assortment of a retail store. The decision model is then refined by incorporating additional decision attributes that are the learned patterns from recorded sales data. Li and Olafsson [3] use a decision tree to learn dispatching rules that are then used to decide which job should be dispatched first. These dispatching rules are previously unknown, and it is assumed that it is worthwhile to capture the current practices from previous data. Gabel and Riedmiller [1]
model production scheduling problem as multiagent reinforcement learning where each agent makes its dispatching decisions using a reinforcement learning algorithm based on a neural network function approximation.
Another line of work investigates how to use learning techniques during optimization in order to learn properties of good solutions. For instance, Defourny et al. [30] combine the estimation of statistical models for returning a decision rule given a state with scenario tree techniques from multistage stochastic programming. This line of work shares similarities with the field of blackbox optimization, see, e.g., [14, 15, 31]. In blackbox optimization, methods are used to approximate a function with unknown analytical form and which typically is expensive to execute. In contrast, in multistage stochastic programming this form is known but stochastic. An often applied technique for blackbox optimization is the use of surrogate methods, see, e.g., [32]. Surrogates are approximations of the blackbox function that are less expensive to execute, typical examples include linear/polynomial regression, neural networks, and other methods from machine learning. These functions are trained during optimization from (as few as possible) blackbox function calls.
As learning tasks can lead to challenging optimization problems, researchers have also applied mathematical optimization methods in order to increase learning efficiency. For instance, Bennett et al. [33] use linear programming for determining linear combination splits within twoclass decision trees. Chang et al. [29] propose a Constrained Conditional Model (CCM) framework to incorporate domain knowledge into a conditional model for structured learning, in the form of declarative constraints. CCMs solve prediction problems. In [34], the authors build a mixed integer program for multiclass data classification. A comprehensive overview of optimization techniques used in learning is given in [35]. Researchers are also interested in using mathematical optimization methods in order to find entire models and rules, see e.g., [36, 37, 38].
Our approach fits in the first line of research of this interplay. The proposed bestfirst search method uses regression models to learn good orderings, which is then applied during search to evaluate the solutions of OOSA. Hence, similar to the existing work, the models learned from data are used in a blackbox fashion. This approach shares similarities with surrogate methods for blackbox optimization. An important difference is that the (surrogate) models here are learned from data.
Furthermore, different from the existing work, our proposed whitebox optimization method makes all the properties of the learned models visible to the optimization solver. To the best of our knowledge, this way of using the results of machine learning is entirely novel. Moreover, it realizes the construction of an optimization model automatically from data, providing a new way of modeling in mathematical optimization.
6.2 Sequence models
As an auction ordering is essentially a sequence of items, our work is also related to the many machine learning approaches for sequence modeling. However, to the best of our knowledge none of the existing sequence models fits our auction setting. Language models such as deterministic automata [39]
are too powerful since they can model every possible sequence independently and therefore require too much data to learn accurately. Short sequence models such as hidden Markov models or Ngrams
[40] do not model the dependence on items sold a long time (more than the sliding window length) before.Markov decision processes (MDPs) (see, e.g., [41]) may be closest to our auction setting, as they can directly model the expected price per item and come with methods that can be used to optimize the expected total reward (revenue). However, we notice that a straightforward implementation of the auction design problem as an MDP is not possible. Let us try to model auction design as an MDP. Because of the Markov assumption, every state in this MDP has to contain all the relevant information for the auctioneer’s decision on which item to auction: the set of available items, the bidders’ valuation functions, budgets and strategies, and for every bidder the items (s)he already possesses. In every state of this process, the auctioneer can choose what item to put to auction from a multiset of available items . The next state resulting from auctioning item depends on the bidders and their valuations. These are unknown to the auctioneer, but probabilities can be used to estimate them. These probabilities provide a distribution over the possible next states and the corresponding rewards , given is auctioned. In every possible next state , the set of available items is equal to , i.e., equal to the items in minus the sold item . The goal of the auctioneer is to maximize the expected rewards (revenue) for a given set of items . In every state , (s)he thus has to take an action (choosing an item) that maximizes the sum of the expected rewards of items in starting in state :
In this equation, we separated from to highlight the major hurdle that needs to be overcome in order to represent the auction design problem as an MDP. needs to be included in the MDP since it determines the set of available actions in every state. However, since the set of items is finite this makes the MDP acyclic and at least as large as the number of possible subsets of items from (assuming the effect of their ordering is represented differently), i.e., at least . In order to learn the rewards and transition probabilities, an auctioneer would therefore need an extremely large data sample.
This intuitively shows why it is difficult to represent the auction design problem as an MDP. However, with a suitable factored representation of the states and/or function approximation [41] of the rewards, it could be possible to represent our auction problem as an MDP. In this case, a major hurdle will be to find a representation that results in Markovian states, which is needed to apply the dynamic programming methods. Since the problem of deciding whether good auction ordering exists is NPcomplete (Theorem 1), and these methods run in polynomial time, this is impossible without an exponentially large state space unless . Our method relies on solvers and search methods for NPcomplete problems, making a polynomial state space possible, and therefore requiring much less data to estimate the model parameters.
6.3 Auction design
In the auction literature, a few existing papers investigate the impact of ordering on the performance of sequential auctions. One line of related research focuses on theoretical analysis. In the economics literature (see [8, 12, 11]), such theoretical studies were typically carried out under very restricted markets. The main research focus there is to analyze equilibrium bidding strategies of bidders who compete for (usually) two items (heterogeneous or homogeneous), and then to gain insights on the impact of ordering on the auction outcome based on derived bidding behaviours. For instance, Elmaghraby [8] studies the influence of ordering on the efficiency of the sequential second price procurement auctions, where a buyer outsources two heterogeneous jobs to suppliers with capacity constraints. Suppliers can only win 1 job in this setting. The author shows that specific sequences lower procurement costs and identifies a class of bidders’ cost functions where the efficient orderings (i.e. the auction rewards the jobs to the suppliers with the lowest total costs) and equilibrium bidding strategies exist. Pitchik [12] points out that in the presence of budget constraints, a sealedbid sequential auction with two bidders and two goods may have multiple symmetric equilibrium bidding functions, and the ordering of sale affects the expected revenue. If the bidder who wins the first good has a higher income than the other one, the expected revenue is maximized. Subramaniam and Venkatesh [11] investigate the optimal auctioning strategy of a revenuemaximizing seller, who auctions two items, which could be complements or substitutes. They show that when the items are different in value, the higher valued item (among the two) should be auctioned first in order to increase the seller’s revenue. A similar revenuemaximizing strategy is proposed by Benoit and Krishna [42] in a complete information auction setting. The authors conclude that in such a setting, when selling two items to budget constrained bidders, it is always better to sell the more valued item first. However, this strategy does not optimize the revenue anymore when more than two items are to be auctioned.
In the computer science literature, Elkind and Fatima [13]
study how to maximize revenue in sequential auctions with secondprice sealedbid rules, where bidders are homogeneous, i.e., all their valuations are drawn from public known uniform distributions, they want to win only one item (but they can bid any of items). In this setting, the authors analyze the equilibrium bids, and develop an algorithm that finds an optimal agenda (i.e., ordering). Vetsikas et al.
[43] study a similar auction setting, but unlike [13], they assume the valuations are known to the bidders at the beginning of the auction. The focus of their work was to compute the equilibrium strategies for bidders. Later, Vetsikas [23] analyzes the bidding strategies for budget constrained bidders in sequential Vickrey auctions. However, it is a challenge to compute the equilibrium strategies in practice.Several empirical research has been conducted to test the theoretical findings in the economics community. Grether et al. [9] report on a field experiment that tests the ordering strategies of a seller in sequential, ascending automobile auctions. They conclude that the worst performing ordering in terms of revenue is for the seller to auction vehicles from highest to lowest values. Raviv [10] uses an example to demonstrate that there are cases where the ordering of the auction does not affect revenue using a second price sealbid auction. In his example, there are two heterogeneous items for sale among three risk neutral bidders. Raviv shows that when the ordering is randomized, the expected selling price stays the same, no matter which one of the items is sold first.
We are not the first who consider learning from the previous auctions. However, the difference lies in the fact that the most existing work study how bidders learn from the past information, and update their bids. Boutilier et al. [44] propose a learning model for bidders to update their bidding policies in sequential auctions for resources with complementarities. The bidding strategies are computed based on the estimated distribution over prices, that is modeled by dynamic programming. Goes et al. [45] present an empirical study of real sequential online auctions. They analyze the data from an online auction retailer, and show that bidders learn and update their willingness to pay in repeated auctions of the same item. In [6], the authors show the benefits of using earlier auction data for the management of sequential, multiunit auctions, where the seller needs to split its entire inventory into sequential auctions of smaller lots in order to increase its profit. In their work, an auction feedback mechanism is developed based on a Bayesian model, and it is used to update the auctioneer’s beliefs about the bidders’ valuation distribution.
Our contribution to the auction literature lies on the fact that our approach can be applied to design optimal auctions based on historical auction data. The advantage of using machine learning and data mining methods is that they are robust to the uncertainty (or noise), and hence have high potential to be applied to realworld auction design. Moreover, the approach itself is general and can be applied to many different auction optimization problems, such as finding best reserve price for items for sale, or maximizing social welfare instead of revenue. The only necessary change is on the selection of the features for learning regression models.
7 Conclusions and future work
Mathematical optimization relies on the availability of knowledge that can be used to construct a mathematical model for the problem at hand. This knowledge is not always available. For instance, in multiagent problems, agents are autonomous and often unwilling to share their local information. Frequently, this autonomy and private information influence the outcome of the optimization, making finding an optimal solution very difficult. In this paper, we adopt the idea of using machine learning techniques to estimate these influences for an optimization problem with many unknowns: the optimal ordering for sequential auctions (OOSA) problem.
We have demonstrated our approach by transforming historical auctions into data sets for learning regression trees and linear regression models, which subsequently are used to predict the expected value of orderings for new auctions. We proposed two types of optimization methods with learned models, a blackbox bestfirst search approach, and a novel whitebox approach that maps learned models to integer linear programs (ILP). We built an auction simulator with a set of bidder agents to simulate an auction environment. The simulator was used for generating historical auction data, and for evaluating the orderings of items returned by our methods. We ran an extensive set of experiments with different agents and bidding strategies. Although optimizing the orderings in sequential auctions is a hard problem, our proposed methods obtained very high values, significantly outperforming the naive methods proposed in the literature. The experimental results also demonstrate the advantage of using the whitebox method for optimization, which significantly outperforms the blackbox approach in nearly all settings. In addition, they indicate that when the learned model becomes more complex, it potentially results in more constraints and consequently, an increase in the time needed to solve the problem in a whitebox fashion. Since more complex models are (potentially) better predictors, this shows a clear tradeoff between modeling and optimization power in whitebox optimization. In our opinion, the benefits of the whitebox approach largely outweigh the benefits of using blackbox optimization.
Besides an improved performance, a very big benefit of the whitebox formulation is that it provides a new way of obtaining traditional mathematical models. Our method therefore has many other potential application areas, especially in problems where more and more data is being collected. Even in cases where there already exists a handcrafted optimization model, a model that is learned and translated using our method can easily be integrated into existing (I)LP formulations in order to determine part of the objective function based on data. In this way, one can combine the vast amount of expert knowledge available in these domains with the knowledge in the readily available data. We would like to investigate how such integration works in the future.
We chose a relatively simple auction model for ease of explanation in this paper. However, our approach works whenever regression models are able to provide reliable predictions of the bidding values. Hence we believe it can be applied to other auction formats with more complex valuation functions (i.e. combinatorial preferences [46]) and more complex bidding strategies. In the future, we plan to test our approach on real auction data.
Our experiments also highlight some interesting properties of the whitebox method. Firstly, they show an improvement in performance when the number of features is reduced and/or the models are less complex. It would therefore be very interesting to investigate the effect of pruning and feature selection or reduction on the performance of our methods. Secondly, they show a tendency of the regression tree optimizer to overestimate, i.e., find orderings that have a much higher expected revenue than its revenue in practice. Intuitively, the solver abuses the crisp nature of the regression tree in order to find a solution that satisfies exactly the right constraints. Part of the problem is that, although these constraints are learned from data, and therefore uncertain, the solver treats them as exact. Fortunately, there exists a long history of methods that try to optimize in the presence of such uncertainties in the area of robust optimization, see, e.g.,
[47]. In future work, we will investigate the potential uses of these techniques for learned models.Recently, regression tree models with linear models in the leaf nodes have also been successfully used as blackbox surrogate functions [48]. Since it is also straightforward to translate these trees given our two encodings (replace the leaf variables by indicators for which linear function to use), it would be very interesting to investigate the possibility of a whitebox alternative.
References
 [1] T. Gabel, M. Riedmiller, Adaptive reactive jobshop scheduling with learning agents, International Journal of Information Technology and Intelligent Computing 2 (4) (2008) 1–30.
 [2] A. Huyet, Optimization and analysis aid via datamining for simulated production systems, European Journal of Operational Research 173 (3) (2006) 827–838.
 [3] X. Li, S. Ólafsson, Discovering dispatching rules using data mining, Journal of Scheduling 8 (6) (2005) 515–527.
 [4] D. Bernhardt, D. Scoones, A note on sequential auctions, American Economic Review (3) 653–657.
 [5] E. van Heck, P. M. A. Ribbers, Experiences with electronic auctions in the dutch flower industry, Electronic Markets 7 (4) (1997) 29–34.
 [6] E. J. Pinker, A. Seidmann, Y. Vakrat, Using bid data for the management of sequential, multiunit, online auctions with uniformly distributed bidder valuations, European Journal of Operational Research 202 (2) (2010) 574–583.
 [7] J. Gallien, L. M. Wein, A smart market for industrial procurement with capacity constraints, Management Science 51 (2005) 76–91.
 [8] W. Elmaghraby, The importance of ordering in sequential auctions, Management Science 49 (2003) 673–682.
 [9] D. M. Grether, C. R. Plott, Sequencing strategies in large, competitive, ascending price automobile auctions: An experimental examination, Journal of Economic Behavior & Organization 71 (2) (2009) 75–88.
 [10] Y. Raviv, New evidence on price anomalies in sequential auctions: Used cars in new jersey, Journal of Business & Economic Statistics 24 (2006) 301–312.
 [11] R. Subramaniam, R. Venkatesh, Optimal bundling strategies in multiobject auctions of complements or substitutes, Marketing Science 28 (2009) 264–273. doi:10.1287/mksc.1080.0394.
 [12] C. Pitchik, Budgetconstrained sequential auctions with incomplete information, Games and Economic Behavior 66 (2) (2009) 928–949.
 [13] E. Elkind, S. Fatima, Maximizing revenue in sequential auctions, in: Proceedings of the 3rd international conference on Internet and network economics, WINE’07, SpringerVerlag, Berlin, Heidelberg, 2007, pp. 491–502.
 [14] D. R. Jones, M. Schonlau, W. J. Welch, Efficient global optimization of expensive blackbox functions, Journal of Global optimization 13 (4) (1998) 455–492.
 [15] S. Shan, G. G. Wang, Survey of modeling and optimization strategies to solve highdimensional design problems with computationallyexpensive blackbox functions, Structural and Multidisciplinary Optimization 41 (2) (2010) 219–241.
 [16] M. R. Garey, D. S. Johnson, Computers and intractability – a guide to the theory of NPcompleteness, W.H. Freeman and company, 1979.
 [17] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, Inc., New York, NY, USA, 1994.
 [18] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, CA, 1984.
 [19] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B 58 (1994) 267–288.
 [20] scikitlearn: machine learning in Python, http://scikitlearn.org/.
 [21] S. Verwer, Y. Zhang, Revenue prediction in budgetconstrained sequential auctions with complementarities, AAMAS ’12, 2012, pp. 1399–1400.
 [22] W. Vickrey, Counterspeculation, auctions, and competitive sealed tenders, The Journal of Finance 16 (1) (1961) 8–37.
 [23] I. A. Vetsikas, Sequential auctions with budgetconstrained bidders, in: IEEE 10th International Conference on eBusiness Engineering, IEEE, 2013, pp. 17–24.
 [24] IBM ILOG CPLEX Optimizer, http://www01.ibm.com/software/integration/optimization/cplexoptimizer/.
 [25] K. P. Bennett, E. ParradoHernández, The interplay of optimization and machine learning research, Journal of Machine Learning Research 7 (2006) 1265–1281.
 [26] S. Meisel, D. Mattfeld, Synergies of operations research and data mining, European Journal of Operational Research 206 (1) (2010) 1–10.
 [27] D. Corne, C. Dhaenens, L. Jourdan, Synergies between operations research and data mining: The emerging use of multiobjective approaches, European Journal of Operational Research 221 (3) (2012) 469 – 479.
 [28] T. Brijs, G. Swinnen, K. Vanhoof, G. Wets, Building an association rules framework to improve product assortment decisions, Data Mining and Knowledge Discovery 8 (1) (2004) 7–23.
 [29] M. Chang, L. Ratinov, D. Roth, Structured learning with constrained conditional models, Machine Learning 88 (3) (2012) 399–431.
 [30] B. Defourny, D. Ernst, L. Wehenkel, Scenario trees and policy selection for multistage stochastic programming using machine learning, Journal on ComputingPublished online before print.
 [31] L. M. Rios, N. V. Sahinidis, Derivativefree optimization: A review of algorithms and comparison of software implementations, Journal of Global Optimization 56 (3) (2013) 1247–1293.
 [32] S. Koziel, D. E. Ciaurri, L. Leifsson, Surrogatebased methods, in: Computational Optimization, Methods and Algorithms, Springer, 2011, pp. 33–59.
 [33] K. P. Bennett, O. L. Mangasarian, Bilinear separation of two sets in nspace, COMPUTATIONAL OPTIMIZATION AND APPLICATIONS 2 (1993) 207–227.
 [34] F. Uney, M. Turkay, A mixedinteger programming approach to multiclass data classification problem, European Journal of Operational Research 173 (3) (2006) 910–920.
 [35] S. Sra, S. Nowozin, S. J. Wright, Optimization for machine learning, Mit Press, 2012.
 [36] E. Carrizosa, D. Romero Morales, Review: Supervised classification and mathematical optimization, Computers and Operations Research 40 (1) (2013) 150–165.
 [37] L. D. Raedt, T. Guns, S. Nijssen, Constraint programming for data mining and machine learning, in: AAAI, 2010, pp. 1671–1675.
 [38] M. J. Heule, S. Verwer, Exact DFA identification using SAT solvers, in: Grammatical Inference: Theoretical Results and Applications, Vol. 6339 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2010, pp. 66–79.
 [39] C. de la Higuera, Grammatical Inference: Learning Automata and Grammars, Cambridge University Press, New York, NY, USA, 2010.

[40]
C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), SpringerVerlag New York, Inc., Secaucus, NJ, USA, 2006.
 [41] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, Vol. 414, John Wiley & Sons, 2009.
 [42] J.P. Benoit, V. Krishna, Multipleobject auctions with budget constrained bidders, Review of Economic Studies 68 (1) (2001) 155–79.
 [43] I. A. Vetsikas, N. R. Jennings, Sequential auctions with partially substitutable goods., Vol. 59 of Lecture Notes in Business Information Processing, 2009, pp. 242–258.
 [44] C. Boutilier, M. Goldszmidt, B. Sabata, Sequential auctions for the allocation of resources with complementarities, in: Proceedings of the 16th international joint conference on Artifical intelligence  Volume 1, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999, pp. 527–534.
 [45] P. B. Goes, G. G. Karuga, A. K. Tripathi, Understanding willingnesstopay formation of repeat bidders in sequential online auctions, Information Systems Research 21 (2010) 907–924.
 [46] P. Cramton, Y. Shoham, R. Steinberg, Combinatorial auctions, MIT Press, 2006.
 [47] A. BenTal, L. El Ghaoui, A. Nemirovski, Robust Optimization, Princeton Series in Applied Mathematics, Princeton University Press, 2009.

[48]
D. Verbeeck, F. Maes, K. De Grave, H. Blockeel, Multiobjective optimization with surrogate trees, in: Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference, ACM, 2013, pp. 679–686.
Appendix A Hardness of auction design using learned predictors
We show that using predictive models instead of agents with utility functions does not reduce the complexity of the problem: it remains NPcomplete for both regression trees and linear regression predictors.
Lemma 1
Using regression trees, the problem of whether there exists an ordering that has a total predicted value of at least is NPcomplete.
Proof 2
The proof follows from the fact that we can use simple regression trees to model the preferences of the two agents from Theorem 1, and evaluating an ordering using these trees can be done in polynomial time. The regression tree for every item type is shown in Figure 10.
Lemma 2
Using linear regression predictors, the problem of whether there exists an ordering that has a total predicted value of at least is NPcomplete.
Proof 3
We prove the lemma using a construction for computing the value of a quadratic function using only linear functions, the ordering problem, and our feature values. The maximum value of this quadratic function is then forced to coincide with the solution of a partition problem instance: Given a set of integers , is dividable into two sets and such that ?
From the partition instance, let , we construct the following items and linear regression predictors (functions ):

items of type , with , and

item of type , with .
The objective is to maximize . The corresponding decision problem is to ask whether there exists an ordering that achieves a value of .
(
) Let be a partition of such that , and let and be the corresponding items of type . The following ordering then gives a value of :
In this ordering, by definition of and since before item is auctioned. Consequently, , giving . thus already obtains the objective value of , and therefore should be equal to . By definition of , . Since , , and thus , proving that the ordering obtains a value of .
(
) To prove the other direction, let us further analyse the relation between the objective function and the auction ordering. The only term in the predictors that depends on the ordering is , all other terms are constants. This term is equal to zero for the items auctioned before , and equal to for the items auctioned after . Similar to the () part, let denote the items before , those after , and the corresponding partition of items in . The objective value is then given by . We analyse these three parts in turn.

since for these items.

.

, since , this becomes .
Since , the overall objective function is given by:
which is maximized when (for ) with value . This exact value of is obtained when . The sets and thus give a partition of .
Remarks. We proved the NPcompleteness for the general case of Lemma 2. However, we do not know whether the complexity holds for more realistic valuation functions that bidders have.
Comments
There are no comments yet.