
A Survey of Online Auction Mechanism Design Using Deep Learning Approaches

by Zhanhao Zhang, et al.
Columbia University

Online auctions have become widespread in recent years. Platform administrators are working hard to refine their auction mechanisms so that they generate high profits while maintaining fair resource allocation. With advances in computing technology and bottlenecks in theoretical frameworks, researchers are shifting towards designing online auctions with deep learning approaches. In this article, we summarize common deep learning infrastructures adopted in auction mechanism design and show how these architectures are evolving. We also discuss how researchers are tackling the constraints and concerns that arise in large and dynamic industrial settings. Finally, we point out several unresolved issues as future directions.




1 Introduction

Auctions have been used to negotiate the exchange of goods and commodities for centuries. Traditionally, the generalized second-price (GSP) auction and the Vickrey–Clarke–Groves (VCG) auction are widely used. However, GSP is no longer truthful when more than one item (for example, more than one ad slot) is sold. VCG generalizes the sealed-bid second-price auction: each winner is charged the reduction in the social welfare of the other participants caused by their participation. Nevertheless, VCG can generate low seller revenue, and the seller's revenue is not monotonic in the set of bidders or in the amounts bid. Moreover, although it is truthful for each individual bidder, it is vulnerable to shill bidding (multiple bids submitted by the same person under different identities) and to collusion among losing bidders Ausubel and Milgrom (2006).

Auction mechanisms with the properties of incentive compatibility (IC) and individual rationality (IR) are highly desirable. If an auction is IC, then all bidders will truthfully reveal their private valuations of the items. Platform administrators therefore do not have to model bidders' strategic behavior and can build a reliable and predictable system. If an auction is IR, all agents are guaranteed non-negative utility. This is a very important property, because it allows the system to retain its customers in the long run.

The groundbreaking work of Myerson Myerson (1981) characterized the optimal strategyproof auction for selling a single item, but limited progress has been made in characterizing strategyproof, revenue-maximizing auctions beyond this setting Kuo et al. (2020). The dynamic nature of online auction platforms Cai et al. (2018); Tang (2017); Zhang et al. (2021) makes the problem even more challenging, as the bidders, the items, and the platform's objectives change over time. At the same time, multiple performance metrics have to be taken into consideration to make the auction system attractive to bidders, sellers, and the platform Liu et al. (2021); Tang (2017); Zhang et al. (2021).

With the advancement of technology, researchers are shifting towards deep learning approaches for designing auction mechanisms. To the best of our knowledge, existing deep learning architectures are built on top of one of four infrastructures: the hybrid multiple linear perceptron infrastructure, the RegretNet infrastructure Dütting et al. (2020), the reinforcement learning infrastructure, or the DeepSet infrastructure Zaheer et al. (2017). With modern computing power, researchers can maximize revenue. They can also deal with data sparsity and high-dimensional data, optimize multiple performance metrics, prevent fraud, and enhance fairness.

This article is organized as follows. We first introduce the four common infrastructures of deep neural networks for auction mechanism design. Then, we discuss how researchers are tackling constraints and concerns other than maximizing total revenue. Lastly, we point out some unresolved issues and potential future directions.

2 Hybrid Multiple Linear Perceptron (MLP) Infrastructure

Many neural network structures are built by stacking several MLP structures into a cohesive one Shen et al. (2021); Zhou et al. (2018); Shin et al. (2019); Luong et al. (2017). Such a network usually consists of more than one component, where each component is a fully-connected feedforward neural network.

One of the most elegant architectures is MenuNet Shen et al. (2021), which consists of a mechanism network and a buyer network. The mechanism network takes the constant 1 as a one-dimensional input and outputs an allocation matrix and a payment vector. The allocation matrix contains the allocation of all items for every menu option and is obtained by a fully-connected layer followed by a sigmoid activation function. The payment vector is obtained simply by multiplying the constant 1 by a vector of weights, and it represents the prices of the different menu options. The buyer network does not require any training. It takes the outputs of the mechanism network and computes the buyer's utility for each option based on the buyer's value profile. Training MenuNet is very fast because the network structure is very simple. MenuNet is built upon the taxation principle Vohra (2010), which states that simply letting the buyer select her favorite option from a menu yields an IC mechanism. The network therefore does not need to model the buyer's utility function: it only outputs the menu, and the buyer's choice determines the outcome. It makes no assumptions about the buyer's valuation and does not require any additional constraints (such as IC and IR) to be enforced on the network. Theoretical proofs show that MenuNet always returns the revenue-optimal IC mechanism for menus of size 2 or 3.
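The forward pass of a MenuNet-style mechanism can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume a single buyer with additive values and a fixed-size menu whose parameters are free weights (`W_alloc` and `w_price` are hypothetical names). We also add an explicit zero-price "buy nothing" option so the buyer can always opt out. Training is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mechanism_network(W_alloc, w_price):
    """Map the constant input 1 to a menu of options.

    W_alloc: (n_options, n_items) weights of the allocation layer (hypothetical name).
    w_price: (n_options,) weights that scale the constant 1 into option prices.
    """
    one = 1.0                                # the network's only input
    allocations = sigmoid(W_alloc * one)     # allocation probabilities in (0, 1)
    prices = w_price * one                   # one price per menu option
    return allocations, prices

def buyer_network(allocations, prices, values):
    """Untrained buyer: choose the menu option with the highest utility."""
    utilities = allocations @ values - prices      # additive valuation assumed
    utilities = np.append(utilities, 0.0)          # assumed "buy nothing" option
    choice = int(np.argmax(utilities))
    payment = prices[choice] if choice < len(prices) else 0.0
    return choice, float(utilities[choice]), float(payment)

def expected_revenue(W_alloc, w_price, value_samples):
    """Average payment over sampled buyer value profiles."""
    allocations, prices = mechanism_network(W_alloc, w_price)
    return float(np.mean([buyer_network(allocations, prices, v)[2] for v in value_samples]))

# One item, one option priced at 0.5 that allocates the item almost surely.
W_alloc, w_price = np.array([[20.0]]), np.array([0.5])
alloc, prices = mechanism_network(W_alloc, w_price)
print(buyer_network(alloc, prices, np.array([0.8])))   # buys the option, pays 0.5
print(buyer_network(alloc, prices, np.array([0.2])))   # opts out, pays 0
```

Because the buyer simply picks her best option, truthfulness comes from the menu structure rather than from a constraint in the loss. This is the point of the taxation principle.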

Zhou et al. proposed the Deep Interest Network (DIN) Zhou et al. (2018) for click-through rate prediction. It is built on the MLP structure but aims to overcome the bottleneck caused by the fixed-length representation vector used in traditional MLPs, which limits their ability to capture a user's diverse interests from rich historical behaviors. The authors designed a local activation unit that adaptively learns the representation of user interests from historical behaviors with respect to a given advertisement. The data-adaptive activation function they adopted is Dice (Equation 1), a generalization of PReLU He et al. (2015).


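$$f(s) = p(s) \cdot s + \left(1 - p(s)\right) \cdot \alpha s, \qquad p(s) = \frac{1}{1 + e^{-\frac{s - E[s]}{\sqrt{Var[s] + \epsilon}}}} \tag{1}$$

In the training phase, $E[s]$ and $Var[s]$ are the mean and variance of the input in each mini-batch, while in the testing phase, $E[s]$ and $Var[s]$ are moving averages of the mini-batch means and variances over the data. $\epsilon$ is a small constant, set to $10^{-8}$ by the authors.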
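Dice can be sketched as a small NumPy class. Statistics are computed per feature over the mini-batch during training, and stored moving averages are used at test time. In DIN, $\alpha$ is learned. Here it is a fixed constructor argument, and the moving-average rate `momentum` is a hypothetical choice.

```python
import numpy as np

class Dice:
    """Dice activation: p(s) * s + (1 - p(s)) * alpha * s, with p a normalized sigmoid."""

    def __init__(self, alpha=0.25, eps=1e-8, momentum=0.99):
        self.alpha = alpha          # learnable in DIN; fixed here for simplicity
        self.eps = eps
        self.momentum = momentum    # hypothetical moving-average rate
        self.running_mean = None
        self.running_var = None

    def __call__(self, s, training=True):
        # s: (batch, features); statistics are computed per feature
        if training:
            mean, var = s.mean(axis=0), s.var(axis=0)
            if self.running_mean is None:
                self.running_mean, self.running_var = mean, var
            else:
                m = self.momentum
                self.running_mean = m * self.running_mean + (1 - m) * mean
                self.running_var = m * self.running_var + (1 - m) * var
        else:
            mean, var = self.running_mean, self.running_var
        p = 1.0 / (1.0 + np.exp(-(s - mean) / np.sqrt(var + self.eps)))
        return p * s + (1.0 - p) * self.alpha * s

s = np.array([[-1.0], [1.0], [3.0]])   # a mini-batch of 3 scalar pre-activations
dice = Dice(alpha=0.25)
train_out = dice(s, training=True)       # uses mini-batch mean and variance
test_out = dice(s, training=False)       # uses the stored moving averages
```

With `alpha=1` the two branches coincide and Dice is the identity. When the mean and variance are both zero, $p$ becomes a step function and Dice degenerates to PReLU. In Dice, the rectification point adapts to the data distribution instead of being fixed at zero.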

They adopt a different representation vector of user interests for each ad. In the embedding layer, a one-hot feature group is mapped to a single embedding vector, while a multi-hot feature group (such as a user's behavior history) is mapped to a list of embedding vectors, and the two are used in combination. Pooling and concatenation layers then transform the list of embedding vectors into a fixed-length vector, so that the network can handle users with different numbers of behaviors.
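The local activation unit and the pooling step can be sketched as follows. This is a hypothetical illustration. The weights are random and untrained. We feed the unit the behavior embedding, the ad embedding, their element-wise product, and their difference. That input choice is our assumption; DIN's own unit also uses an interaction of the two embeddings. Following DIN, the activation weights are not softmax-normalized before the weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 16                                   # embedding size, activation-unit width
W1 = rng.normal(scale=0.1, size=(hidden, 4 * d))    # hypothetical, untrained weights
w2 = rng.normal(scale=0.1, size=hidden)

def activation_weight(e, v_ad):
    """Local activation unit: score one behavior embedding against the candidate ad."""
    x = np.concatenate([e, v_ad, e * v_ad, e - v_ad])
    return float(w2 @ np.maximum(W1 @ x, 0.0))      # ReLU hidden layer

def user_interest(behaviors, v_ad):
    """Weighted sum pooling: a fixed-length vector for any number of behaviors."""
    weights = np.array([activation_weight(e, v_ad) for e in behaviors])
    return weights @ behaviors                       # no softmax, following DIN

v_ad = rng.normal(size=d)
short_history = rng.normal(size=(3, d))
long_history = rng.normal(size=(40, d))
print(user_interest(short_history, v_ad).shape, user_interest(long_history, v_ad).shape)
```

Histories of 3 and 40 behaviors both map to a vector of length `d`. The same user therefore gets a different interest vector for each candidate ad, while the network sees fixed-length inputs.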

Shin et al. transformed the charging scheduling problem of multi-drone networks into an auction problem solved with a deep learning framework Shin et al. (2019). The design is based on the Myerson auction Myerson (1981), which is the revenue-optimal single-item auction. One of the most challenging issues in charging scheduling is the lack of prior knowledge about the distribution of the number of bidders. The auction approach is useful when there is no accurate information about the buyers' true valuations and buyers are not aware of the private true values of other buyers. Buyers' values are represented by the urgency of the drones, while the seller's revenue is generated from the payments for resource allocation. Myerson's auction requires full knowledge of the bid distribution to compute the expected payment. Shin et al. therefore used a deep neural network to parametrize the virtual valuation function, the allocation rule, and the payment rule.

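The network begins with a monotone network that transforms each drone's bid $b_i$ into a transformed bid $\bar{b}_i$ using the virtual valuation function $\phi_i$ parametrized by the network. Assuming the MyersonNet-style construction of Dütting et al. (2020), $\phi_i$ is a min-max of linear functions with positive slopes. The transformation either uses different weights for each bid (Equation 2) or computes all outcomes with the same shared weights (Equation 3):

$$\bar{b}_i = \phi_i(b_i) = \min_{q \in [Q]} \max_{z \in [Z]} \left( w^{i}_{qz} b_i + \beta^{i}_{qz} \right), \quad w^{i}_{qz} > 0 \tag{2}$$

$$\bar{b}_i = \phi(b_i) = \min_{q \in [Q]} \max_{z \in [Z]} \left( w_{qz} b_i + \beta_{qz} \right), \quad w_{qz} > 0 \tag{3}$$

The payment rule network takes the output of the monotone network and returns the second-price payment with zero reserve in the transformed space, $p^{0}_i$, according to Equation 4. This payment is then mapped back to the bid space using the inverse of $\phi_i$ (Equations 5 and 6). Finally, the allocation rule network assigns the highest winning probability to the highest bidder with a positive transformed bid.

$$p^{0}_i = \max\left\{ \max_{j \neq i} \bar{b}_j, \; 0 \right\} \tag{4}$$

$$p_i = \phi_i^{-1}\left(p^{0}_i\right) \tag{5}$$

$$\phi_i^{-1}(y) = \max_{q \in [Q]} \min_{z \in [Z]} \left(w^{i}_{qz}\right)^{-1} \left(y - \beta^{i}_{qz}\right) \tag{6}$$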
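The transform, SPA-0 allocation, and inverse-transform payment can be sketched in NumPy. We assume MyersonNet-style min-max units with positive (exponentiated) slopes. The weights `W` and `B` are hypothetical and randomly initialised. The allocation is a hard argmax. A trainable version would relax it, for example with a softmax.

```python
import numpy as np

def phi(b, w, beta):
    """Monotone transform: min over groups q of max over units z of w[q, z] * b + beta[q, z]."""
    return np.min(np.max(w * b + beta, axis=1))

def phi_inv(y, w, beta):
    """Exact inverse of phi for w > 0: max_q min_z (y - beta[q, z]) / w[q, z]."""
    return np.max(np.min((y - beta) / w, axis=1))

def spa0_auction(bids, W, B):
    """Second-price auction with zero reserve on transformed bids (SPA-0)."""
    n = len(bids)
    bbar = np.array([phi(bids[i], W[i], B[i]) for i in range(n)])
    alloc, pay = np.zeros(n), np.zeros(n)
    winner = int(np.argmax(bbar))
    if bbar[winner] > 0:                                    # only a positive transformed bid wins
        alloc[winner] = 1.0
        others = np.delete(bbar, winner)
        p0 = max(others.max(), 0.0) if n > 1 else 0.0       # threshold in transformed space
        pay[winner] = phi_inv(p0, W[winner], B[winner])     # map back to the bid space
    return alloc, pay

rng = np.random.default_rng(0)
n_drones, Q, Z = 3, 2, 4
W = np.exp(rng.normal(size=(n_drones, Q, Z)))   # exp keeps every slope positive
B = rng.normal(size=(n_drones, Q, Z))
bids = np.array([0.3, 0.9, 0.5])                # e.g. urgency-based bids
print(spa0_auction(bids, W, B))
```

Each $\phi_i$ is strictly increasing, so the winner's payment $\phi_i^{-1}(p^0_i)$ never exceeds her own bid. With random weights, however, the payment is not guaranteed to be non-negative.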


Luong et al. constructed a neural network Luong et al. (2017) for edge computing resource management based on Myerson's analytical solution Myerson (1981), which guarantees revenue maximization while ensuring IC and IR. The network has the same three key components as in Shin et al. (2019):

- monotone transformation functions, parametrized by neural networks, that map bids into transformed bids;
- an allocation rule that maps the transformed bids to a vector of assignment probabilities;
- a conditional payment rule based on the maximum non-negative transformed bid.

The allocation and payment rules are derived from SPA-0, the second-price auction with zero reserve price. The reserve price is the minimum price a seller is willing to accept from a buyer.

RochetNet Dütting et al. (2020), proposed by Dütting et al., is another application of the MLP. RochetNet is a single-layer neural network that takes the bids as input and outputs the maximum non-negative transformed value. It models a non-negative, monotone, convex, and Lipschitz utility function as the maximum of linear functions with non-negative coefficients. RochetNet easily extends to a single bidder with a unit-demand valuation, in which the value of a subset is the maximum individual valuation within that subset. Each linear function in RochetNet corresponds to an option on the menu. Its slope encodes the allocation probabilities and its intercept encodes the payment.
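A RochetNet-style menu for one bidder can be sketched as a max over linear functions. This sketch assumes additive values. The induced utility is $\max\{0, \max_k (\alpha_k \cdot b + \beta_k)\}$, the chosen slope gives the allocation, and the negated intercept gives the price. The hard max is used for clarity. Gradient-based training would typically replace it with a softmax relaxation.

```python
import numpy as np

def rochetnet(bids, alphas, betas):
    """Forward pass of a RochetNet-style menu for one bidder (sketch).

    bids:   (m,) reported values for m items (additive valuation assumed)
    alphas: (K, m) slopes = allocation probabilities of the K menu options, in [0, 1]
    betas:  (K,) intercepts = negated prices of the options
    """
    scores = alphas @ bids + betas        # utility of each menu option
    k = int(np.argmax(scores))
    if scores[k] <= 0:                    # the zero function: buy nothing, pay nothing
        return 0.0, np.zeros_like(bids), 0.0
    return float(scores[k]), alphas[k], float(-betas[k])

alphas = np.array([[1.0, 0.0], [1.0, 1.0]])   # "item 1 only" or "both items"
betas = np.array([-0.5, -1.2])                # prices 0.5 and 1.2
print(rochetnet(np.array([0.8, 0.1]), alphas, betas))   # picks option 0
print(rochetnet(np.array([0.8, 0.9]), alphas, betas))   # picks option 1
```

The bidder's utility is the pointwise maximum of affine functions, so it is automatically convex and monotone in the bids. Rochet's characterization identifies such utilities with IC mechanisms.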

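MoulinNet Golowich et al. (2018), proposed by Golowich et al., also adopts the MLP structure and is used to determine the optimal facility locations preferred by agents. MoulinNet is a monotone feed-forward neural network that learns the generalized median rules of Moulin (1980). For single-facility mechanisms, the generalized median rule in Equation 7 is strategy-proof. In its simplest form, it selects the median of the agents' most preferred locations (the agents' peaks):

$$f(x_1, \dots, x_n) = \min_{S \subseteq N} \max\left\{ a_S, \; \max_{i \in S} x_i \right\} \tag{7}$$

Here $x_i$ is the peak of agent $i$ and $N$ is the set of agents. The value $a_S$ is non-increasing in $S$, that is, $S \subseteq T$ implies $a_T \le a_S$. The inputs of the network are binary-encoded vectors that represent which agents belong to the subset $S$. The network weights are the parameters of MoulinNet. Each agent $i$ evaluates the outcome through her utility function $u_i$, which depends on the distance between the facility and her peak $x_i$. The output of the network is the selection rule, optimized with respect to these utilities.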
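The generalized median rule can be evaluated by brute force over all subsets of agents. In MoulinNet, the values $a_S$ come from a monotone network applied to the binary encoding of $S$. Here we plug in a hypothetical hand-picked `a` that recovers the ordinary median for an odd number of agents. The loop is exponential in $n$ and is meant only for small examples.

```python
import itertools
import numpy as np

def generalized_median(peaks, a):
    """f(x) = min over subsets S of max(a(S), max_{i in S} x_i).

    a(mask) must be non-increasing in S (adding agents never raises it); MoulinNet
    learns this function from the binary encoding `mask` of S.
    """
    n = len(peaks)
    best = np.inf
    for size in range(n + 1):
        for S in itertools.combinations(range(n), size):
            mask = np.zeros(n)
            mask[list(S)] = 1.0                                 # binary encoding of S
            inner = max((peaks[i] for i in S), default=-np.inf)
            best = min(best, max(a(mask), inner))
    return best

def median_phantoms(mask, low=0.0, high=1.0):
    """Hypothetical a(S): `low` once S contains a majority of agents, else `high`."""
    return low if mask.sum() >= (len(mask) + 1) // 2 else high

print(generalized_median([0.2, 0.9, 0.5], median_phantoms))   # 0.5, the median peak
```

In the usage line, agent 0 (peak 0.2) cannot pull the facility closer to her peak by misreporting. Reporting a lower peak leaves the median at 0.5, and reporting a higher one can only move it further away. This is the strategy-proofness property.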


3 RegretNet Infrastructure

RegretNet Dütting et al. (2020), proposed by Dütting et al., consists of an allocation network and a payment network. Both are built upon the MLP infrastructure, and RegretNet has been adopted and extended in auction designs for various settings Feng et al. (2018); Golowich et al. (2018); Peri et al. (2021); Kuo et al. (2020).

RegretNet is mainly studied under two valuation assumptions: additive valuations and unit-demand valuations. Under an additive valuation, an agent's valuation for a subset of items is the sum of the individual items' valuations. Both the allocation network and the payment network take the bids as inputs and feed them into MLP-structured networks with separate parameters. Each bidder's payment is computed from the outputs of both networks, so the two networks are trained together. The payment network uses a sigmoidal unit to output, for each bidder, a fraction in [0, 1]. The payment is this fraction multiplied by the bidder's reported value for her allocation. This enforces the IR constraint: bidders are never charged more than their (reported) value for the allocation.
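A forward pass for the additive setting can be sketched with one hidden layer per network. The weights and the names in `params` are hypothetical and untrained. For each item, the allocation network applies a softmax over the bidders plus a dummy "unallocated" row, so no item is over-allocated. The payment network's sigmoid output is the fraction of each bidder's reported value that she is charged.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def init_params(n, m, hidden, rng):
    """Hypothetical one-hidden-layer weights for both networks."""
    return {
        "Wa1": rng.normal(scale=0.5, size=(hidden, n * m)),
        "Wa2": rng.normal(scale=0.5, size=((n + 1) * m, hidden)),
        "Wp1": rng.normal(scale=0.5, size=(hidden, n * m)),
        "Wp2": rng.normal(scale=0.5, size=(n, hidden)),
    }

def regretnet_forward(bids, params):
    """bids: (n bidders, m items). Returns allocations z (n, m) and payments (n,)."""
    n, m = bids.shape
    x = bids.reshape(-1)
    # Allocation network: per item, softmax over n bidders plus a dummy "unallocated" row
    scores = (params["Wa2"] @ np.tanh(params["Wa1"] @ x)).reshape(n + 1, m)
    z = softmax(scores, axis=0)[:n]
    # Payment network: sigmoid gives the fraction of each bidder's reported value to charge
    frac = 1.0 / (1.0 + np.exp(-(params["Wp2"] @ np.tanh(params["Wp1"] @ x))))
    value_of_allocation = np.sum(z * bids, axis=1)      # additive valuation
    return z, frac * value_of_allocation

rng = np.random.default_rng(0)
bids = rng.uniform(size=(2, 3))
z, payments = regretnet_forward(bids, init_params(2, 3, 16, rng))
```

Multiplying the sigmoid fraction by the reported value of the allocation bounds each payment by construction. IR is therefore enforced by the architecture, while IC has to be learned through the regret constraint.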

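The objective is to minimize the empirical loss (negated revenue) subject to the IC and IR constraints. The IC constraint is enforced through the notion of ex post regret. A bidder's ex post regret is the maximum increase in her utility over all possible non-truthful bids. Let $v^{(1)}, \dots, v^{(L)}$ be $L$ sampled valuation profiles, $w$ the network parameters, and $u_i^w$ bidder $i$'s utility. The ex post regret is estimated by the empirical regret

$$\widehat{rgt}_i(w) = \frac{1}{L} \sum_{\ell=1}^{L} \max_{v_i' \in V_i} \left[ u_i^w\left(v_i^{(\ell)}; \left(v_i', v_{-i}^{(\ell)}\right)\right) - u_i^w\left(v_i^{(\ell)}; v^{(\ell)}\right) \right].$$

Therefore, the objective function becomes (Equation 8):

$$\min_{w} \; -\frac{1}{L} \sum_{\ell=1}^{L} \sum_{i=1}^{n} p_i^w\left(v^{(\ell)}\right) \quad \text{s.t.} \quad \widehat{rgt}_i(w) = 0, \; \forall i \in N \tag{8}$$

The optimization uses the method of Lagrange multipliers, augmented with a quadratic penalty term for violating the constraints (Equation 9). Here $\lambda$ is the vector of multipliers and $\rho > 0$ weights the quadratic penalty. The solver alternates between minimizing $C_\rho$ over $w$ and updating $\lambda_i \leftarrow \lambda_i + \rho \, \widehat{rgt}_i(w)$:

$$C_\rho(w; \lambda) = -\frac{1}{L} \sum_{\ell=1}^{L} \sum_{i=1}^{n} p_i^w\left(v^{(\ell)}\right) + \sum_{i=1}^{n} \lambda_i \, \widehat{rgt}_i(w) + \frac{\rho}{2} \sum_{i=1}^{n} \left(\widehat{rgt}_i(w)\right)^2 \tag{9}$$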
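The augmented Lagrangian procedure can be illustrated on a toy problem with a single parameter $\theta$. The toy mechanism is hypothetical: a report $r$ wins with probability $r$ and pays $\theta r^2$, so truth-telling is optimal only when $\theta = 0.5$. Misreports are searched on a grid. A grid search over $\theta$ stands in for the gradient-based inner minimization used with real networks. As the multiplier grows, the solution should be pushed from revenue-greedy values toward the zero-regret $\theta = 0.5$.

```python
import numpy as np

# Toy single-bidder mechanism (hypothetical): a report r wins with probability r
# and pays theta * r**2. Truthful reporting is optimal only when theta = 0.5.
def utility(report, value, theta):
    return value * report - theta * report ** 2

def empirical_regret(theta, values, grid):
    """Mean over samples of the best utility gain from any misreport on the grid."""
    gains = [max(0.0, np.max(utility(grid, v, theta)) - utility(v, v, theta)) for v in values]
    return float(np.mean(gains))

def negated_revenue(theta, values):
    return -float(np.mean(theta * values ** 2))         # payments under truthful reports

def lagrangian(theta, lam, rho, values, grid):
    rgt = empirical_regret(theta, values, grid)
    return negated_revenue(theta, values) + lam * rgt + 0.5 * rho * rgt ** 2

values = np.linspace(0.05, 1.0, 20)     # stand-in for sampled valuation profiles
grid = np.linspace(0.0, 1.0, 201)       # candidate misreports
thetas = np.linspace(0.3, 1.0, 71)      # grid search replaces SGD in this sketch
lam, rho = 0.0, 100.0
for _ in range(20):
    theta = thetas[np.argmin([lagrangian(t, lam, rho, values, grid) for t in thetas])]
    lam += rho * empirical_regret(theta, values, grid)   # multiplier update
print(theta, lam)
```

Without the regret terms, the minimizer would simply be the largest $\theta$, because revenue grows with $\theta$. The linear and quadratic penalties trade that revenue against incentive violations.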


Feng et al. constructed a neural network Feng et al. (2018) built upon the structure of RegretNet, which consists of an allocation network and a payment network. It extends the RegretNet infrastructure by incorporating budget constraints and by handling Bayesian incentive compatibility (BIC) and conditional IC constraints. Under BIC, truth-telling is the optimal strategy for a bidder in expectation with respect to the types of the others, given that the other bidders report truthfully. Dütting et al. enforce IC by requiring the empirical ex post regret to be zero. Feng et al. instead handle more general forms of IC by constructing an appropriate notion of regret.

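Consider an auction with allocation and payment rules $(a, p)$. To handle BIC, Feng et al. constrain the empirical interim regret (Equation 10) to zero. To handle conditional IC/BIC, they constrain the empirical conditional regret to zero. They also incorporate individual rationality (IR) (Equation 11) and the budget constraint (BC) (Equation 12) as penalties. Let $L$ be the number of sampled profiles and $u_i$ bidder $i$'s utility. Let $\hat{u}_i(v_i'; v_i)$ be her expected utility from reporting $v_i'$ when her value is $v_i$, averaged over the other bidders' sampled reports, and let $B_i$ be her budget. These terms take the form

$$\widehat{rgt}^{\,\mathrm{int}}_i = \frac{1}{L} \sum_{\ell=1}^{L} \max_{v_i' \in V_i} \left[ \hat{u}_i\left(v_i'; v_i^{(\ell)}\right) - \hat{u}_i\left(v_i^{(\ell)}; v_i^{(\ell)}\right) \right] \tag{10}$$

$$\widehat{irp}_i = \frac{1}{L} \sum_{\ell=1}^{L} \max\left\{0, \; -u_i\left(v^{(\ell)}\right)\right\} \tag{11}$$

$$\widehat{bcp}_i = \frac{1}{L} \sum_{\ell=1}^{L} \max\left\{0, \; p_i\left(v^{(\ell)}\right) - B_i^{(\ell)}\right\} \tag{12}$$

The loss function is the negated expected revenue $-\mathbb{E}_{v}\left[\sum_{i} p_i(v)\right]$. Let $w_a$ denote the parameters of the allocation network, with induced allocation rule $a^{w_a}$. Let $w_p$ denote the parameters of the payment network, with induced payment rule $p^{w_p}$. The objective function is then given in Equation 13:

$$\min_{w_a, w_p} \; -\mathbb{E}_{v}\left[\sum_{i=1}^{n} p_i^{w_p}(v)\right] \quad \text{s.t.} \quad \widehat{rgt}^{\,\mathrm{int}}_i = 0, \; \widehat{irp}_i = 0, \; \widehat{bcp}_i = 0, \; \forall i \tag{13}$$

It is trained with the augmented Lagrangian solver of Dütting et al., adding a quadratic penalty term for each constraint.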
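The IR and budget penalties can be sketched as hinge-style averages over samples, and then combined in an augmented-Lagrangian-style loss. The hinge form, the additive utility, and all function names are our assumptions for illustration, not the authors' code.

```python
import numpy as np

def ir_penalty(values, alloc, pay):
    """Average IR violation per bidder, max(0, -utility), over L samples.

    values, alloc: (L, n, m) values and allocation probabilities; pay: (L, n).
    """
    utility = np.sum(values * alloc, axis=2) - pay        # additive valuation assumed
    return np.mean(np.maximum(0.0, -utility), axis=0)

def budget_penalty(pay, budgets):
    """Average budget violation per bidder, max(0, payment - budget); budgets: (L, n)."""
    return np.mean(np.maximum(0.0, pay - budgets), axis=0)

def penalized_loss(revenue, penalties, lambdas, rho):
    """-revenue plus a linear and a quadratic term for every constraint's penalty vector."""
    return -revenue + sum(lam @ g + 0.5 * rho * g @ g for lam, g in zip(lambdas, penalties))

values = np.array([[[1.0]], [[0.5]]])    # L = 2 samples, n = 1 bidder, m = 1 item
alloc = np.ones((2, 1, 1))
pay = np.array([[0.8], [0.7]])
budgets = np.array([[1.0], [0.5]])
irp, bcp = ir_penalty(values, alloc, pay), budget_penalty(pay, budgets)
loss = penalized_loss(1.0, [irp, bcp], [np.zeros(1), np.zeros(1)], rho=2.0)
```

In the example, the second sample overcharges the bidder by 0.2 relative to both her value and her budget, so each penalty averages to 0.1.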


Golowich et al. proposed RegretNet-nm Golowich et al. (2018), which can learn general mechanisms that are not limited by existing characterization results for multi-facility location problems. The notion of regret is extended to facility location mechanisms as the maximum expected utility gain agents can achieve by misreporting their preferences.

The network structure is the same as RegretNet, except that the inputs are the agents' peaks. The misreported peaks are sampled uniformly within the location domain at a fixed granularity. The ex post regret is integrated into the objective function using the augmented Lagrangian solver, which adds a quadratic penalty term. RegretNet-nm opens the door to neural-network-based mechanism design for settings without money, such as matching and allocation problems.

PreferenceNet Peri et al. (2021) is another extension of RegretNet that encodes human preferences in auction design. The network consists of RegretNet and a 3-layer MLP, trained in an alternating, EM-like manner. First, the MLP is trained on a uniformly drawn sample of allocations and optimized with a binary cross-entropy loss against ground-truth labels. Then, RegretNet is trained using the augmented Lagrangian solver.

The loss function for the entire PreferenceNet is defined in Equation 14, in which the preference term is given by the output of the trained MLP. Finally, allocations and payments are periodically sampled (every fixed number of epochs) from the partially trained RegretNet. These samples augment the MLP's training set, so that the MLP adapts to the distributional shift in allocations during training.


Peri et al. proposed the Preference Classification Accuracy (PCA) metric to evaluate how well a learned auction model satisfies an arbitrary constraint. PCA is the fraction of test bids that satisfy the ground-truth constraint. The authors use pairwise comparisons between allocations to elicit preferences. Each input set of allocations is compared against $n$ other allocations on their preference scores. It is labeled a positive exemplar if its preference score is higher than those of the majority of the others, and a negative exemplar otherwise.
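The pairwise labeling and the PCA metric can be sketched as below. The preference scores, the constraint function, and the helper names are hypothetical stand-ins. In PreferenceNet, the scores come from a preference function and the constraint from the ground truth being encoded. "Majority" is interpreted here as beating strictly more than half of the sampled comparisons.

```python
import numpy as np

def preference_labels(scores, n_compare, rng):
    """Label each allocation positive (1) if its preference score beats the majority
    of n_compare other randomly chosen allocations, else negative (0)."""
    N = len(scores)
    labels = np.zeros(N, dtype=int)
    for i in range(N):
        others = rng.choice(np.delete(np.arange(N), i), size=n_compare, replace=False)
        wins = np.sum(scores[i] > scores[others])
        labels[i] = int(wins > n_compare / 2)
    return labels

def preference_classification_accuracy(satisfies_constraint, test_outcomes):
    """Fraction of test outcomes that satisfy the ground-truth constraint."""
    return float(np.mean([satisfies_constraint(o) for o in test_outcomes]))

rng = np.random.default_rng(0)
scores = np.array([0.1, 0.5, 0.9])
labels = preference_labels(scores, n_compare=2, rng=rng)
pca = preference_classification_accuracy(lambda a: a.max() <= 0.6,
                                         [np.array([0.5, 0.5]), np.array([0.9, 0.1])])
```

The labels produced this way are the binary targets on which the MLP is trained with cross-entropy.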

ProportionNet Kuo et al. (2020), proposed by Kuo et al., is also based on the RegretNet infrastructure. Like most other neural networks for auction mechanism design, it does not work with combinatorial valuations. Under the assumption of additive or unit-demand valuations, the input space of each bidder's valuations reduces from $\mathbb{R}^{2^m}$ (one value per bundle) to $\mathbb{R}^m$, where $m$ is the number of items.


Kuo et al. follow the core idea of RegretNet: in the Bayesian auction setting, the valuation distribution is known, and samples can be drawn from it. Since both the allocation and payment rules are functions, they can be parametrized with neural networks. Strategyproofness can then be enforced by adding constraints that are solved with the augmented Lagrangian optimizer.

ProportionNet adopts the same neural network architecture as RegretNet but adds an unfairness constraint to the loss function (Equation 15). This mitigates discriminatory ad allocations among different demographic groups. The regret term is consistent with the definition in RegretNet. The unfairness term quantifies the unfairness and discrimination in the auction system and is described in more detail in Section 6.4.

4 Reinforcement Learning (RL) Infrastructure

As the online auction is more often a dynamic system whose users and platform objectives are evolving over time, researchers are more inclined to use dynamically trained models to adapt to the current status quo, leveraging reinforcement learning infrastructures Cai et al. (2018); Tang (2017); Zhang et al. (2021).

As in the RegretNet infrastructure, Cai et al. rely on a notion of regret, here in the form of no-regret learning. Agents only need to reason about their own strategies and their interaction with the environment. They do not need to know the values of their competitors or compute payoff-maximizing strategies over a long sequence of rounds. Reasoning about the strategies of other parties usually requires strong cognitive assumptions and heavy computation, which most agents do not have access to.

Based on well-known bandit algorithms, Cai et al. identified four possible strategies for sellers.

  1. $\epsilon$-Greedy Watkins (1989): In each round, with probability $\epsilon$, each seller selects a strategy uniformly at random. With probability $1 - \epsilon$, the strategy with the best observed empirical mean payoff is selected.

  2. $\epsilon$-First: For a horizon of $T$ rounds, the seller picks a strategy uniformly at random for the first $\epsilon T$ rounds. For all subsequent rounds, it picks the strategy that maximizes the empirical mean of the observed rewards.

  3. Exponential-weight algorithm for Exploration and Exploitation (Exp3) Auer et al. (1995, 2003): In short, Exp3 selects a price according to a weighted distribution and then adjusts the weights based on payoffs. More precisely, suppose there are $K$ possible prices. The probability distribution $\pi(t)$ over those prices at round $t$ is defined in Equation 16, where $w_k(t)$ is the current weight of price $k$ at round $t$ and $\gamma$ is a real number in $(0, 1]$.

    $$\pi_k(t) = (1 - \gamma) \frac{w_k(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K}, \quad k = 1, \dots, K \tag{16}$$

    The seller selects a price $k_t$ according to this distribution and observes its payoff $r_{k_t}(t) \in [0, 1]$. The weight of price $k_t$ is then updated according to Equation 17, while the weights of all other prices remain unchanged.

    $$w_{k_t}(t+1) = w_{k_t}(t) \exp\left( \frac{\gamma \, r_{k_t}(t)}{K \, \pi_{k_t}(t)} \right) \tag{17}$$

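  4. Upper Confidence Bound algorithm (UCB1) Agrawal (1995); Auer et al. (2002): In the first $K$ rounds, the seller selects each price that has not been used before. In the subsequent rounds, it selects the price with the maximum upper confidence bound.

    Initialize the empirical mean payoff $\bar{r}_k$ and the play count $n_k$ of every price to 0. For any round $t \le K$, the seller chooses a price $k$ with $n_k = 0$ and computes the utility $r_k(t)$. It then sets $\bar{r}_k = r_k(t)$ and $n_k = 1$, keeping the values for all other prices unchanged.

    For any round $t > K$, the seller chooses the price according to Equation 18, then updates $\bar{r}_k$ and $n_k$ for the chosen price.

    $$k_t = \arg\max_{k \in \{1, \dots, K\}} \left( \bar{r}_k + \sqrt{\frac{2 \ln t}{n_k}} \right) \tag{18}$$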


$\epsilon$-First and $\epsilon$-Greedy make a clear distinction between exploration and exploitation and belong to the class of semi-uniform strategies. Exp3 makes no distributional assumptions about the rewards and is designed for the adversarial bandit feedback model Auer et al. (2003). UCB1 maintains a certain level of optimism towards less frequently played actions and uses the empirical means of the observed rewards to choose the action in the next round. It is best suited to scenarios in which rewards follow some unknown but fixed distributions Cai et al. (2018).
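The four seller strategies can be sketched behind a shared `select`/`update` interface. Each of the $K$ arms stands for one candidate price. The payoff model in `simulate` is hypothetical: a fixed mean per price plus small uniform noise, kept in [0, 1]. Exp3 is implemented with log-weights to avoid overflow, which leaves its sampling distribution unchanged.

```python
import math
import numpy as np

class EpsGreedy:
    def __init__(self, K, eps, rng):
        self.K, self.eps, self.rng = K, eps, rng
        self.sums, self.counts = np.zeros(K), np.zeros(K)
    def select(self, t):
        if self.counts.sum() == 0 or self.rng.random() < self.eps:
            return int(self.rng.integers(self.K))                       # explore
        return int(np.argmax(self.sums / np.maximum(self.counts, 1)))   # exploit
    def update(self, k, r):
        self.sums[k] += r
        self.counts[k] += 1

class EpsFirst(EpsGreedy):
    def __init__(self, K, eps, T, rng):
        super().__init__(K, eps, rng)
        self.explore_rounds = int(eps * T)
    def select(self, t):
        if t < self.explore_rounds:                                     # pure exploration phase
            return int(self.rng.integers(self.K))
        return int(np.argmax(self.sums / np.maximum(self.counts, 1)))

class Exp3:
    def __init__(self, K, gamma, rng):
        self.K, self.gamma, self.rng = K, gamma, rng
        self.log_w = np.zeros(K)                     # log-weights avoid overflow
    def probs(self):
        w = np.exp(self.log_w - self.log_w.max())
        return (1 - self.gamma) * w / w.sum() + self.gamma / self.K
    def select(self, t):
        return int(self.rng.choice(self.K, p=self.probs()))
    def update(self, k, r):
        r_hat = r / self.probs()[k]                  # importance-weighted payoff estimate
        self.log_w[k] += self.gamma * r_hat / self.K

class UCB1(EpsGreedy):
    def __init__(self, K):
        super().__init__(K, 0.0, None)
    def select(self, t):
        if t < self.K:
            return t                                 # play every price once
        means = self.sums / self.counts
        return int(np.argmax(means + np.sqrt(2 * math.log(t + 1) / self.counts)))

def simulate(seller, means, T, rng):
    """Play T rounds with hypothetical noisy payoffs; return how often each price was used."""
    plays = np.zeros(len(means), dtype=int)
    for t in range(T):
        k = seller.select(t)
        r = float(np.clip(means[k] + rng.uniform(-0.05, 0.05), 0.0, 1.0))
        seller.update(k, r)
        plays[k] += 1
    return plays

rng = np.random.default_rng(0)
means = [0.1, 0.3, 0.8, 0.5]     # hypothetical expected payoff of each of K = 4 prices
for seller in [EpsGreedy(4, 0.1, rng), EpsFirst(4, 0.1, 2000, rng), Exp3(4, 0.1, rng), UCB1(4)]:
    print(type(seller).__name__, simulate(seller, means, 2000, rng))
```

Because all four share one interface, they can be swapped in as sellers of differing sophistication within the same simulated market.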

These models can represent sellers with different degrees of sophistication or different pricing philosophies. This is consistent with the recent literature on algorithmic mechanism design, in terms of modeling agent rationality in complex dynamic environments.

Researchers have already proposed several variants of reinforcement learning models. A disadvantage of the Deep Q-Network (DQN) Mnih et al. (2015) is that it cannot directly handle continuous or high-dimensional action spaces. Acting greedily requires maximizing the Q-function over all actions at every step, and stochastic actor-critic alternatives are hard to converge. The Deterministic Policy Gradient (DPG) algorithm Silver et al. (2014) was developed to train a deterministic policy with parameter vector $\theta$. DPG consists of a critic and an actor: the critic approximates the action-value function, while the actor adjusts the parameters of the deterministic policy. DPG suffers from the high degree of temporal correlation between consecutive samples, which introduces high variance. Deep Deterministic Policy Gradient (DDPG) Lillicrap et al. (2019) was then proposed to address this. DDPG stores the agent's experience at each time step in a replay buffer and samples mini-batches from it uniformly at random for learning, which eliminates the temporal correlation. DDPG also regularizes the learning algorithm with target networks, whose parameters are updated at a slower rate.

However, the size of the action space blows up very sharply as the number of sellers increases, so a direct application of DDPG fails to converge. In addition, DDPG cannot handle variability in the set of sellers. The algorithm uses a two-layer fully connected network in which the position of each seller in the input plays an important role Cai et al. (2018).
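The two DDPG stabilizers described above can be sketched without any deep learning library: a replay buffer sampled uniformly at random, and a soft update that lets target parameters slowly track the learned ones. Parameters are represented as a dictionary of NumPy arrays. The dictionary representation and the default `tau` are assumptions of this sketch.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Stores (state, action, reward, next_state) transitions; sampling uniformly at
    random breaks the temporal correlation between consecutive steps."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped first
        self.rng = random.Random(seed)
    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))
    def sample(self, batch_size):
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = map(np.array, zip(*batch))
        return states, actions, rewards, next_states
    def __len__(self):
        return len(self.buffer)

def soft_update(target, source, tau=0.005):
    """Target parameters slowly track the learned ones: theta' <- tau*theta + (1-tau)*theta'."""
    for name in target:
        target[name] = tau * source[name] + (1.0 - tau) * target[name]

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(np.array([t]), np.array([0.1 * t]), float(t), np.array([t + 1]))
states, actions, rewards, next_states = buf.sample(2)
```

With capacity 3, the two oldest transitions are evicted, so only rewards 2, 3, and 4 can be sampled.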

Cai et al. proposed the IA(GRU) algorithm Cai et al. (2018) to mitigate these problems of DDPG. It adopts the DDPG framework and maintains a sub-actor network and a sub-critic network. In each training step, it uses a background network to perform a permutation transformation that orders the sellers according to certain metrics, which maintains permutation invariance. At the same time, it applies a recurrent neural network (RNN) to the history of the sellers. The outputs of the permutation transformation and of the RNN on the histories are then combined as inputs to the sub-actor and sub-critic networks.
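The two ingredients of IA(GRU) can be sketched as follows. The first is a permutation transformation that sorts sellers by a metric. The second is a GRU that summarizes a seller's history. Sorting by column 0 is a hypothetical choice of metric. The GRU weights are random, and the gate equations follow the standard GRU formulation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def canonical_order(seller_features, metric_column=0):
    """Permutation transformation: sort sellers by one metric, descending.
    Which metric to use is a hypothetical choice (column 0 here)."""
    order = np.argsort(-seller_features[:, metric_column], kind="stable")
    return seller_features[order]

def gru_step(x, h, P):
    """One GRU step with update gate z and reset gate r."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1.0 - z) * h + z * h_tilde

def encode_history(history, P, hidden):
    """Run the GRU over a (time steps, features) history and return the last state."""
    h = np.zeros(hidden)
    for x in history:
        h = gru_step(x, h, P)
    return h

rng = np.random.default_rng(0)
n_features, hidden = 3, 4
P = {name: rng.normal(scale=0.3, size=(hidden, n_features if name[0] == "W" else hidden))
     for name in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
sellers = rng.normal(size=(5, n_features))
state_summary = encode_history(rng.normal(size=(6, n_features)), P, hidden)
```

Whatever order the sellers arrive in, `canonical_order` returns the same matrix, so the downstream actor and critic no longer depend on input positions.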

In reality, participants in online auctions face both informational and computational constraints, so they are not fully rational. In addition, historical data may be limited to data generated by mechanisms defined by only a few sets of parameters, so the past data do not provide enough exploration. Both participants and auction system designers are affected by multiple complicated factors, so their decisions change over time. To overcome these difficulties, Tang models each player as an independent local Markov decision process Tang (2017). A local state encodes the part of the historical actions and outcomes that the player has observed so far.

Tang uses the DDPG infrastructure to handle the continuous action space. To handle the huge number of states, the original neural network is decomposed into a set of sub-networks, one for each seller. This relies on the assumptions that sellers are independent and that Q-values are additive across sellers. An LSTM adaptively learns from past bidding data and feedback to predict the future bid distribution, without explicitly modeling each advertiser's bidding strategy. To optimize the designer's Markov decision process, the action space is discretized, and Monte-Carlo tree search (MCTS) Shen et al. (2017) is used to speed up the forward-looking search. Experiments and case studies show that the dynamic pricing scheme proposed by Tang outperforms all static schemes by large margins.
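The additivity assumption is what makes the per-seller decomposition tractable. If the joint Q-value is a sum of per-seller terms, maximizing it separately for each seller gives the same result as a brute-force search over the joint discretized action space. The per-seller Q-values below are random stand-ins for sub-network outputs, and the sizes are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_sellers, n_actions = 4, 5           # e.g. 5 discretized reserve-price levels per seller
# Hypothetical per-seller Q-values Q_j(s_j, a_j) produced by the sub-networks
Q_sub = rng.normal(size=(n_sellers, n_actions))

def joint_q(actions):
    """Additivity assumption: the joint Q-value is the sum of per-seller Q-values."""
    return sum(Q_sub[j, a] for j, a in enumerate(actions))

# Decomposed maximization: n_sellers * n_actions evaluations
greedy = tuple(int(a) for a in np.argmax(Q_sub, axis=1))
# Brute force over the joint space: n_actions ** n_sellers evaluations
brute = max(itertools.product(range(n_actions), repeat=n_sellers), key=joint_q)
print(greedy == brute)                # prints True
```

The decomposed search costs 20 evaluations here versus 625 for brute force. The gap widens exponentially with the number of sellers.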

Another challenge in dynamic online auction systems is the variety of performance metrics that users and designers consider when making decisions. Most state-of-the-art auction mechanisms, by contrast, optimize only a single performance metric, such as revenue or social welfare. Zhang et al. identified a list of performance metrics Zhang et al. (2021) that can be considered by users, advertisers, and the ad platform.

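  1. Revenue Per Mille (RPM): $\mathrm{RPM} = \frac{\sum \mathrm{click} \times \mathrm{PPC}}{\sum \mathrm{impression}} \times 1000$, where PPC is the payment for winning ads.

  2. Click-Through Rate (CTR): $\mathrm{CTR} = \frac{\sum \mathrm{click}}{\sum \mathrm{impression}}$.

  3. Add-to-Cart Rate (ACR): $\mathrm{ACR} = \frac{\sum \mathrm{add\text{-}to\text{-}cart}}{\sum \mathrm{impression}}$.

  4. Conversion Rate (CVR): $\mathrm{CVR} = \frac{\sum \mathrm{order}}{\sum \mathrm{click}}$.

  5. GMV Per Mille (GPM): $\mathrm{GPM} = \frac{\sum \mathrm{merchandise\ volume}}{\sum \mathrm{impression}} \times 1000$.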
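These metrics can be computed from aggregated logs. The sketch assumes the definitions commonly used with Deep GSP. RPM and GPM are per thousand impressions, CTR and ACR are per impression, and CVR is orders per click. The argument names are hypothetical field names for logged totals.

```python
def ad_metrics(impressions, clicks, add_to_carts, orders, payments, gmv):
    """Aggregate Deep GSP-style metrics from logged totals (hypothetical field names).

    payments: total amount charged for clicked winning ads (sum of click * PPC).
    gmv: total merchandise volume of the resulting orders.
    """
    return {
        "RPM": payments / impressions * 1000,   # revenue per thousand impressions
        "CTR": clicks / impressions,
        "ACR": add_to_carts / impressions,
        "CVR": orders / clicks,                 # conversions per click
        "GPM": gmv / impressions * 1000,        # merchandise volume per thousand impressions
    }

m = ad_metrics(impressions=10_000, clicks=200, add_to_carts=50, orders=20,
               payments=300.0, gmv=5_000.0)
```

A platform that weights these metrics differently will prefer different rankings of the same ads. This is the motivation for optimizing a weighted combination of them.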

Zhang et al. proposed Deep GSP Zhang et al. (2021), which can optimize multiple performance metrics as in Equation 19. Here $\mathbf{b}$ is the bid vector of the advertisers, $\mathcal{M}$ is the auction mechanism, $f_j$ is the $j$-th performance metric function, and $w_j$ is the weight of that metric. Auction platform administrators can adjust these weights from time to time.

$$\max_{\mathcal{M}} \; \mathbb{E}_{\mathbf{b}} \left[ \sum_{j=1}^{J} w_j \, f_j(\mathbf{b}; \mathcal{M}) \right] \tag{19}$$


The Deep GSP auction is built upon the classical generalized second-price auction. Its inputs are the features of items (e.g., category, historical click-through rate), the user profile (e.g., gender, age, income), and advertiser preferences (e.g., budget, marketing demands). A deep neural network combines these features $x_i$ with the bid $b_i$ into an input vector and maps it to a rank score $r_i = R_\theta(b_i, x_i)$, where $R_\theta$ is the mapping function. Bidders are then sorted in non-increasing order of their rank scores, and the top-K bidders win the auction. Each winner's payment is determined by the next highest bidder: it is the minimum bid that would still keep the winner ranked above that bidder.
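The ranking and payment rule can be sketched with a stand-in mapping. We use the classical quality-weighted score $r = q \cdot b$ in place of the learned network $R_\theta$. Its inverse in the bid gives each winner's payment from the next highest rank score. The `quality` values are hypothetical.

```python
import numpy as np

def rank_score(bid, quality):
    """Hypothetical monotone mapping R(b, x); Deep GSP learns this with a DNN."""
    return quality * bid

def inverse_rank_score(score, quality):
    """Smallest bid that still achieves `score` (inverse of rank_score in b)."""
    return score / quality

def gsp_top_k(bids, quality, k):
    scores = rank_score(bids, quality)
    order = np.argsort(-scores, kind="stable")              # non-increasing rank scores
    winners = order[:k]
    payments = np.zeros(k)
    for slot, i in enumerate(winners):
        nxt = scores[order[slot + 1]] if slot + 1 < len(order) else 0.0
        payments[slot] = inverse_rank_score(nxt, quality[i])  # pay for the next score
    return winners, payments

bids = np.array([2.0, 3.0, 1.0])
quality = np.array([0.5, 0.2, 0.9])         # e.g. predicted CTR per ad
winners, payments = gsp_top_k(bids, quality, k=2)
```

Ad 1 bids the most but has the lowest quality and loses. Each winner pays less than its bid, because its payment only has to match the next rank score.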

The mapping function needs to be monotone with respect to the bid $b_i$ in order to satisfy the game equilibrium constraint. Some previous work enforced monotonicity by designing specific neural network architectures, but this increases the computational complexity of training. Zhang et al. therefore incorporate the monotonicity constraint directly by adding a point-wise monotonicity penalty term (Equation 20) to the loss function. In this term, $R_\theta$ is a non-linear function of the bid, parametrized by a deep neural network.
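A point-wise monotonicity penalty can be sketched as the average of $\max(0, -\partial R / \partial b)$ over a batch. The hinge form is our assumption about the general shape of such a penalty, not a transcription of Equation 20. The derivative is estimated by a forward difference, whereas a real implementation would use automatic differentiation.

```python
import numpy as np

def monotonicity_penalty(rank_fn, bids, features, delta=1e-4):
    """Mean of max(0, -dR/db) over a batch, with dR/db estimated by a forward difference.

    rank_fn(bids, features) -> rank scores; the penalty is zero wherever R is
    non-decreasing in the bid.
    """
    slope = (rank_fn(bids + delta, features) - rank_fn(bids, features)) / delta
    return float(np.mean(np.maximum(0.0, -slope)))

def increasing(b, x):
    return x * b + np.tanh(b)                   # monotone in b: no penalty

def bumpy(b, x):
    return x * b - 2.0 * np.sin(3.0 * b)        # decreases in places: penalized

rng = np.random.default_rng(0)
bids = rng.uniform(0.1, 2.0, size=100)
features = rng.uniform(0.5, 1.5, size=100)
print(monotonicity_penalty(increasing, bids, features), monotonicity_penalty(bumpy, bids, features))
```

Adding this term to the loss with a weight only discourages violations at the sampled bids; it does not guarantee monotonicity everywhere. This is the price of avoiding a constrained architecture.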


The smooth transition constraint is imposed in Equation 21, which ensures that the advertiser’s utility would not fluctuate too much when the auction mechanism is switched towards another objective.


5 DeepSet Infrastructure

Most deep neural networks implicitly depend on the position of each input, while in reality the items in an auction have no inherent ordering. Cai et al. use a permutation transformation to mitigate this position effect by ordering the sellers based on some metrics. Liu et al. took a step further by removing the ordering effect completely. They introduced the Deep Neural Auction (DNA) Liu et al. (2021), which is built on the DeepSets Zaheer et al. (2017) architecture.

The set encoder of DNA is composed of two groups of fully connected layers. Each instance (ad) is first mapped to a high-dimensional latent space by the shared fully connected layers of the first group, followed by the Exponential Linear Unit (ELU) Clevert et al. (2016) activation function. The resulting hidden states are then aggregated with a symmetric pooling operation (e.g., average pooling). The pooled result passes through the second group of fully connected layers to build the final set embedding for each ad. The entire procedure is described in Equation 22, in which the embedding of each ad $i$ pools the hidden states of all bidders except $i$.
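A DeepSets-style encoder in the spirit of DNA's can be sketched with one shared layer plus ELU, mean pooling over the other ads, and a second linear layer. The single-layer groups, the omission of a final activation, and the random weights are simplifying assumptions. The key property is that permuting the ads only permutes the output rows.

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def set_encoder(X, W1, W2):
    """X: (n_ads, d_in) with n_ads >= 2. Returns one context embedding per ad.

    The shared layer W1 + ELU maps each ad to a latent space; mean pooling over the
    *other* ads is symmetric, so the result does not depend on their order.
    """
    H = elu(X @ W1.T)                                               # (n_ads, d_hidden)
    n = len(X)
    pooled_others = (H.sum(axis=0, keepdims=True) - H) / (n - 1)    # avgpool over j != i
    return pooled_others @ W2.T                                     # second layer group

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 ads with 3 features each
W1 = rng.normal(size=(8, 3))       # hypothetical, untrained weights
W2 = rng.normal(size=(4, 8))
context = set_encoder(X, W1, W2)
```

Excluding ad $i$ from its own pool means each embedding describes the competition that ad faces rather than the ad itself.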


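DNA computes a context-aware rank score with a neural network that is strictly monotone with respect to the bid and supports an efficient inverse transform given the next highest rank score. The rank score is obtained with Equation 23 and the price with Equation 24. Here $w_{qz}$, $w'_{qz}$, and $\alpha_{qz}$ are weights of the neural network, $h_i$ is the context (set) embedding of ad $i$, and $r_{i+1}$ is the next highest rank score:

$$r_i = \min_{q \in [Q]} \max_{z \in [Z]} \left( e^{w_{qz}} b_i + w'_{qz} \cdot h_i + \alpha_{qz} \right) \tag{23}$$

$$p_i = \max_{q \in [Q]} \min_{z \in [Z]} e^{-w_{qz}} \left( r_{i+1} - w'_{qz} \cdot h_i - \alpha_{qz} \right) \tag{24}$$

This partially monotone MIN-MAX neural network (Equation 23) has been proved to be able to approximate any partially monotone function Daniels and Velikova (2010).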
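The partially monotone MIN-MAX rank score and its inverse can be sketched in NumPy. The bid enters every linear unit with a positive slope $e^{w_{qz}}$, while the context $h$ enters with unconstrained weights. The score is therefore strictly increasing in the bid for any context, and the price is its exact inverse evaluated at the next highest score. All weights here are random and hypothetical.

```python
import numpy as np

def rank_score(b, h, w, w_ctx, alpha):
    """min_q max_z (exp(w[q, z]) * b + w_ctx[q, z] . h + alpha[q, z]); increasing in b."""
    return np.min(np.max(np.exp(w) * b + w_ctx @ h + alpha, axis=1))

def price(r_next, h, w, w_ctx, alpha):
    """Inverse in b: the smallest bid whose rank score equals r_next."""
    return np.max(np.min(np.exp(-w) * (r_next - w_ctx @ h - alpha), axis=1))

rng = np.random.default_rng(0)
Q, Z, d = 2, 3, 4
w, alpha = rng.normal(size=(Q, Z)), rng.normal(size=(Q, Z))
w_ctx = rng.normal(size=(Q, Z, d))     # context weights: any sign allowed
h = rng.normal(size=d)                 # context embedding of the ad
r = rank_score(0.7, h, w, w_ctx, alpha)
```

Calling `price` on an ad's own rank score recovers its bid, so charging the inverse of the next highest score means a winner never pays more than it bid.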


DNA also models the whole process of allocation and payment inside the neural network framework, because treating allocation and payment as an agnostic environment can limit what deep learning achieves. One challenge is that both allocation and payment are built on a basic sorting operation, which is not differentiable. Liu et al. overcome this issue by proposing a differentiable sorting engine that caters to top-K selection in multi-slot auctions, leveraging NeuralSort Grover et al. (2019).

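Equation 25 defines the relaxed permutation matrix $\hat{P}_{\mathrm{sort}(\mathbf{r})}$, where $\mathbf{r}$ is the vector of the $n$ rank scores, $A_{\mathbf{r}}[i, j] = |r_i - r_j|$, $\tau > 0$ is a temperature, and $\mathbf{1}$ denotes the column vector of all ones. Row $i$ of this matrix can be intuitively interpreted as the choice probabilities over all elements for being the $i$-th highest item. The top-K payments can therefore be recovered by a simple matrix multiplication with the vector of payments $\mathbf{p}$ (Equation 26).

$$\hat{P}_{\mathrm{sort}(\mathbf{r})}[i, :] = \mathrm{softmax}\left( \frac{(n + 1 - 2i)\, \mathbf{r} - A_{\mathbf{r}} \mathbf{1}}{\tau} \right) \tag{25}$$

$$\hat{\mathbf{p}}_{1:K} = \hat{P}_{\mathrm{sort}(\mathbf{r})}[1{:}K, :] \; \mathbf{p} \tag{26}$$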
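The NeuralSort relaxation of Grover et al. and the top-K payment extraction can be sketched in NumPy. The rank scores, payments, and temperature below are hypothetical. A small `tau` makes each row close to one-hot. A larger `tau` spreads probability across neighbours in the ranking, which keeps gradients informative during training.

```python
import numpy as np

def neural_sort(s, tau):
    """Relaxed permutation matrix P_hat (n x n); row i softly selects the element
    ranked i-th in descending order."""
    s = np.asarray(s, dtype=float).reshape(-1, 1)              # column vector
    n = s.shape[0]
    A = np.abs(s - s.T)                                         # A[i, j] = |s_i - s_j|
    B = A @ np.ones((n, 1))                                     # A_s 1
    coeff = (n + 1 - 2 * np.arange(1, n + 1)).reshape(-1, 1)    # (n + 1 - 2i)
    logits = (coeff @ s.T - B.T) / tau                          # row i: ((n+1-2i) s - A_s 1) / tau
    logits -= logits.max(axis=1, keepdims=True)                 # numerically stable softmax
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

scores = np.array([0.3, 2.0, 1.0])        # rank scores of 3 ads
payments = np.array([10.0, 20.0, 30.0])   # hypothetical payment of each ad
P = neural_sort(scores, tau=0.01)
top2_payments = P[:2] @ payments          # soft payments of the top-2 slots
```

With `tau=0.01` the rows select ads 1, 2, and 0 in turn, so `top2_payments` is close to `[20, 30]`. The whole computation stays differentiable in `scores`.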


6 Constraints and Concerns

An online auction system is a complex game among bidders, sellers, and auction system administrators, and considering only the revenue or social welfare of one party is often sub-optimal. To the best of our knowledge, administrators face four major categories of constraints and concerns when designing auction systems: IC & IR, data sparsity & high dimensionality, multiple performance metrics and objectives, and fairness & fraud prevention. Below, we explain why these constraints matter and summarize how current research tackles them.

6.1 Incentive Compatibility (IC) & Individual Rationality (IR)

Auction platform designers are often interested in maximizing long-term objectives Tang (2017). Therefore, building a reliable system that can adapt to a dynamic and complicated environment is crucial for success.

The IC property ensures that every agent achieves the best outcome by reporting her values truthfully. Auction participants come from many different backgrounds, so informational, cognitive, and computational constraints limit their rationality to different extents. If IC is not satisfied, the stability and reliability of the system are much harder to maintain, as designers and agents must take all potential strategic behaviors of all other agents into account. Some researchers achieve IC by adopting theoretical frameworks, such as GSP in DNA Liu et al. (2021) and the taxation principle Vohra (2010) in MenuNet Shen et al. (2021). Others Dütting et al. (2020); Feng et al. (2018); Golowich et al. (2018); Kuo et al. (2020); Peri et al. (2021) achieve IC by constraining the empirical ex post regret Dütting et al. (2020) to zero.

The IR property is also important, as it ensures that all agents receive non-negative payoffs. IR can be ensured by building upon the theoretical results of Myerson's auction Myerson (1981). It can also be enforced by adding a constraint to the objective function and solving the problem with an augmented Lagrangian solver Dütting et al. (2020); Feng et al. (2018).

6.2 Data Sparsity & High Dimensionality

The action space can grow very quickly as the number of agents increases, and this high dimensionality imposes a severe computational burden. To mitigate it, Tang and Cai et al. decomposed the neural network into sub-networks, one for each seller, and discretized the action spaces Tang (2017); Cai et al. (2018).
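As a back-of-the-envelope illustration (assuming, hypothetically, $n$ sellers that each choose among $k$ discretized actions), a single network that scores the joint action space faces $k^n$ outcomes, whereas $n$ per-seller sub-networks together score only $nk$:

$$|\mathcal{A}_{\text{joint}}| = k^{n} \qquad \text{vs.} \qquad \sum_{s=1}^{n} |\mathcal{A}_{s}| = nk, \qquad \text{e.g. } k = 10,\ n = 20:\ 10^{20} \text{ vs. } 200.$$

This counts outputs only; it says nothing about how much coordination between sellers the decomposition gives up.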

In addition, many features are high-dimensional one-hot vectors, so the data can be very sparse. Conventional regularization involves the entire parameter vector in every update, so the regularization gradient is non-zero for most positions even when only a few features are active in a batch, which drastically increases computation time on sparse data. Zhou et al. proposed mini-batch aware regularization Zhou et al. (2018), in which only the parameters of features that appear in the current mini-batch participate in the regularization computation.


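$$w_j \leftarrow w_j - \eta \left[ \frac{1}{|B_m|} \sum_{(x,y) \in B_m} \frac{\partial L(p(x), y)}{\partial w_j} + \lambda \frac{\alpha_{mj}}{n_j} w_j \right], \qquad \alpha_{mj} = \max_{(x,y) \in B_m} \mathbb{I}(x_j \neq 0) \tag{27}$$

The mini-batch aware regularization update is shown in Equation 27, where $\eta$ is the learning rate, $B_m$ is the $m$-th mini-batch, $w_j$ is the weight (embedding vector) of feature $j$, $n_j$ is the number of occurrences of feature $j$ across all samples, and $\lambda$ is the regularization coefficient. The indicator $\alpha_{mj}$ in the numerator equals 1 if at least one instance in the mini-batch has feature $j$, and 0 otherwise.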
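A minimal sketch of one mini-batch aware update step in plain Python. The names are hypothetical: embedding rows live in a dict keyed by feature id, the data-loss gradient is assumed to be supplied already averaged over the mini-batch, `counts[j]` plays the role of $n_j$, and `lam` is the regularization coefficient. Only rows of features that occur in the batch (those with $\alpha_{mj} = 1$) are touched, which is where the savings on sparse data come from.

```python
def mba_step(W, data_grad, batch, counts, lr=0.1, lam=0.01):
    """One SGD step with mini-batch aware L2 regularization.

    W:         dict feature_id -> list[float], weight/embedding vector w_j
    data_grad: dict feature_id -> list[float], data-loss gradient averaged over the batch
    batch:     list of instances, each a list of active feature ids
    counts:    dict feature_id -> n_j, occurrences of feature j in the training data
    """
    # alpha_mj = 1 exactly for features that appear in at least one instance
    active = {j for instance in batch for j in instance}
    for j in active:
        grad = data_grad.get(j, [0.0] * len(W[j]))
        # Regularization term is scaled by 1 / n_j; absent features are never visited.
        W[j] = [w - lr * (g + lam * w / counts[j]) for w, g in zip(W[j], grad)]
    return W


W = {j: [1.0, 1.0] for j in range(5)}
counts = {0: 1, 1: 2, 2: 4, 3: 1, 4: 1}
batch = [[0, 2], [2]]  # features 1, 3 and 4 do not occur in this mini-batch
mba_step(W, data_grad={}, batch=batch, counts=counts, lr=1.0, lam=0.5)
print(W[0], W[2], W[1])  # [0.5, 0.5] [0.875, 0.875] [1.0, 1.0]
```

Dividing by $n_j$ regularizes frequent features less per occurrence, so rare features, which are most prone to overfitting, receive relatively stronger shrinkage.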

6.3 Multiple Performance Metrics and Objectives

The most intuitive objective for most auction designers is to maximize profit. However, to adapt to the dynamic and complex nature of today's online auction systems, designers may be better off considering multiple performance metrics. Zhang et al. listed several commonly used performance metrics (see 4). Zhang et al. and Liu et al. optimize a linear combination of functions of those metrics, where both the linear weights and the functions can be specified by the auction designers Zhang et al. (2021); Liu et al. (2021). Over time, designers are free to adjust the weights and functions as their objectives change.
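The linear combination described above can be sketched as follows. The metric names, values, weights, and transforms are hypothetical placeholders chosen by an imagined designer, not the metrics or settings used by Zhang et al. or Liu et al.

```python
def combined_objective(metrics, weights, transforms):
    """Designer-specified objective: sum over k of weights[k] * transforms[k](metrics[k])."""
    return sum(weights[k] * transforms[k](metrics[k]) for k in weights)


def identity(x):
    return x


# Hypothetical per-auction metrics, weights, and transforms chosen by a designer.
metrics = {"revenue": 120.0, "clicks": 40.0, "gmv": 900.0}
weights = {"revenue": 1.0, "clicks": 0.5, "gmv": 0.25}
transforms = {"revenue": identity, "clicks": identity, "gmv": lambda x: x / 100.0}

print(combined_objective(metrics, weights, transforms))  # 142.25

# Shifting priorities later only changes the weights; the form of the objective stays the same.
weights["clicks"] = 2.0
print(combined_objective(metrics, weights, transforms))  # 202.25
```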

6.4 Fairness & Fraud Prevention

Because of biases in the training data, many online platforms produce discriminatory ad allocations across demographic groups Kuo et al. (2020). One of the major social problems associated with online advertising is its use in the job market, where unfair allocation can seriously harm equality and the protection of underrepresented groups.

To mitigate unfairness, PreferenceNet Peri et al. (2021) integrated three definitions of fairness into the model, each of which maps an allocation to a single fairness value. In all these equations (Equation 28, Equation 29, and Equation 30), $j$ refers to an item while $i$ and $i'$ refer to the $i$-th and $i'$-th agents.

  1. Total Variation Fairness (Equation 28): the distance between two agents' allocations cannot be larger than the distance between the two agents themselves.

  2. Entropy (Equation 29): the allocation for an agent is encouraged to be more uniformly distributed.

  3. Quota (Equation 30): the smallest allocation to any agent should be greater than some threshold.
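Since Equations 28 to 30 are not reproduced here, the sketch below uses assumed formalizations of the three notions rather than PreferenceNet's exact definitions: total variation is the sum over items of absolute allocation differences between two agents, entropy is computed (as an assumption) on one item's allocation across agents after normalization, and the quota checks the smallest entry of the allocation matrix `z`, where `z[i][j]` is the probability that agent `i` receives item `j`.

```python
import math


def total_variation(z, i, k):
    """Sum over items j of |z[i][j] - z[k][j]| for agents i and k."""
    return sum(abs(a - b) for a, b in zip(z[i], z[k]))


def tv_fair(z, i, k, distance):
    """Total variation fairness: allocations may differ by at most the agents' distance."""
    return total_variation(z, i, k) <= distance


def entropy(weights):
    """Shannon entropy (natural log) of a non-negative vector after normalization."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def meets_quota(z, threshold):
    """Quota: the smallest allocation to any agent (for any item) is at least `threshold`."""
    return min(min(row) for row in z) >= threshold


# Illustrative allocation: 2 agents (rows) by 2 items (columns).
z = [[0.5, 0.2],
     [0.4, 0.3]]
print(tv_fair(z, 0, 1, distance=0.25))            # True (total variation is about 0.2)
print(entropy([row[0] for row in z]))             # entropy of item 0's allocation across agents
print(meets_quota(z, 0.1), meets_quota(z, 0.25))  # True False
```

Higher entropy means a more even spread; a perfectly even split between two agents reaches the maximum, ln 2.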


ProportionNet Kuo et al. (2020) also adopted the notion of total variation fairness, so that the allocations to similar users cannot differ by too much. It converts Equation 28 into an unfairness constraint (Equation 31) that can be fed into the augmented Lagrangian solver, which quantifies the unfairness of the auction outcome for every user involved.
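One plausible way to turn total variation fairness into a quantity an augmented Lagrangian solver can drive to zero is to sum, for each agent, the positive part of every pairwise violation. This is an assumed form for illustration; ProportionNet's Equation 31 (including any grouping of items) may differ. `d[i][k]` is a hypothetical distance between agents `i` and `k`.

```python
def unfairness(z, d):
    """Per-agent unfairness: for each agent i, the sum over other agents k of
    max(0, TV(z_i, z_k) - d[i][k]), i.e. how far total variation fairness is violated."""
    n = len(z)
    scores = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            if k != i:
                tv = sum(abs(a - b) for a, b in zip(z[i], z[k]))
                total += max(0.0, tv - d[i][k])
        scores.append(total)
    return scores


z = [[1.0, 0.0],
     [0.0, 1.0]]  # two agents, each receives a different item
print(unfairness(z, [[0.0, 0.5], [0.5, 0.0]]))  # [1.5, 1.5]: similar agents treated very differently
print(unfairness(z, [[0.0, 2.0], [2.0, 0.0]]))  # [0.0, 0.0]: a large distance permits the difference
```

Because each score is non-negative and is zero exactly when every pairwise constraint holds, the vector can take the place of the constraint term (like regret) in the augmented Lagrangian objective.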


While unfairness can be introduced by biases in the auction mechanism, it can also be induced by shill bidding. Sellers can adopt a variety of shill bidding strategies to inflate the final selling price of an item. Four common shill bidding strategies have been identified by Sangwan and Arora (2019):

  1. Evaluator: a single bid placed early with a high amount.

  2. Sniping: a single bid placed at the last moment, leaving nobody else an opportunity to outbid it.

  3. Unmasking: multiple bids placed within a short span of time, apparently intended to expose the maximum bid or the highest bidder.

  4. Skeptic: multiple bids, each placed at the lowest possible amount.
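The four patterns above can be turned into simple rules over one bidder's bid history. The sketch below is a hypothetical rule-based heuristic: the `Bid` record, the thresholds (`early`, `late`, `high`, `short_span`), and the order in which rules are checked are all illustrative assumptions, not the supervised approach of Sangwan and Arora (2019).

```python
from dataclasses import dataclass


@dataclass
class Bid:
    time: float           # seconds since the auction opened
    amount: float         # amount of this bid
    standing_high: float  # highest bid just before this one was placed


def classify_bidder(bids, duration, final_price, min_increment,
                    early=0.2, late=0.98, high=0.8, short_span=0.05):
    """Label one bidder's (non-empty) bid history with a hypothetical heuristic.

    Thresholds are fractions: `early`/`late`/`short_span` of the auction
    duration and `high` of the final selling price.
    """
    if len(bids) == 1:
        bid = bids[0]
        t = bid.time / duration
        if t >= late:
            return "sniping"    # single bid at the last moment
        if t <= early and bid.amount >= high * final_price:
            return "evaluator"  # single early, high bid
        return "other"
    # Multiple bids: check minimal raises first, then time clustering.
    if all(abs((b.amount - b.standing_high) - min_increment) <= 1e-9 for b in bids):
        return "skeptic"        # always the lowest possible bid
    times = [b.time for b in bids]
    if (max(times) - min(times)) / duration <= short_span:
        return "unmasking"      # burst of bids probing the maximum bid
    return "other"


history = [Bid(50.0, 60.0, 55.0), Bid(51.0, 70.0, 65.0), Bid(52.0, 80.0, 75.0)]
print(classify_bidder(history, duration=100.0, final_price=100.0, min_increment=1.0))  # unmasking
```

Rules like these are brittle; in practice they would more plausibly serve as engineered features or a baseline for a learned classifier.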

7 Conclusion

In this article, we have traced the rough evolution of deep learning based online auction systems. Mechanisms designed with MLP architectures are usually built upon theoretical results, and the MLP is used to represent the functions given by the theory. A more sophisticated structure arrived with RegretNet, which parametrizes the allocation rule and the payment rule with separate networks. Many researchers have extended RegretNet by integrating more constraints into the objective function or slightly adjusting the network structure, while still keeping the allocation and payment networks separate. The dynamic nature of online auctions has encouraged researchers to adopt deep reinforcement learning, which is often model-free, requires fewer assumptions about the data, and keeps adapting as time progresses. Because most traditional neural networks are sensitive to the positions of their inputs, they implicitly build an ordering of auction participants into model training, even though there is no inherent ordering among them. As a result, deep learning approaches based on the DeepSet architecture have emerged, which remove the effect of input positions entirely. We have also discussed the constraints and concerns faced by auction designers and pointed out how researchers have attempted to address them.
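The permutation invariance that motivates DeepSet-style architectures comes from applying the same encoder φ to every participant and pooling with a symmetric operation (here a sum) before a decoder ρ. In the minimal sketch below, φ and ρ are hypothetical fixed functions standing in for trained networks, contrasted with a plain weighted sum over input slots, which depends on bidder order.

```python
def phi(bid):
    """Hypothetical per-bidder encoder (stands in for a shared MLP)."""
    return [bid, bid * bid]


def rho(pooled):
    """Hypothetical decoder applied to the pooled representation."""
    return 2.0 * pooled[0] - pooled[1]


def deep_set(bids):
    """rho(sum_i phi(bid_i)): sum-pooling makes the output independent of bidder order."""
    pooled = [sum(vals) for vals in zip(*(phi(b) for b in bids))]
    return rho(pooled)


def position_sensitive(bids, weights=(1.0, 2.0, 3.0)):
    """A weighted sum over input slots: reordering the bidders changes the output."""
    return sum(w * b for w, b in zip(weights, bids))


bids = [0.25, 0.5, 0.75]
print(deep_set(bids), deep_set(bids[::-1]))                      # 2.125 2.125
print(position_sensitive(bids), position_sensitive(bids[::-1]))  # 3.5 2.5
```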

Although researchers are making rapid progress toward online auction mechanisms that are reliable and profitable in the long term, many issues remain unresolved for future researchers to investigate. Because online auction systems usually have enormous user bases, computational constraints and convergence problems with high-dimensional data remain non-negligible. Researchers either map the data to a lower dimension or shrink the action space by discretizing it, but it remains unclear how much information is lost in doing so. In addition, most models assume unit-demand or additive valuations, which might not hold in the real world. Last but not least, because most mechanism frameworks assume that auction participants act independently of each other, their IC and IR constraints are computed at the individual level. Consequently, these mechanisms might not be robust to non-truthful behavior by participants acting in collusion.


  • [1] R. Agrawal (1995) Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27 (4), pp. 1054–1078.
  • [2] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire (1995) Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pp. 322–331.
  • [3] P. Auer, N. Cesa-Bianchi, and P. Fischer (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 (2–3), pp. 235–256.
  • [4] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire (2003) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), pp. 48–77.
  • [5] L. M. Ausubel and P. Milgrom (2006) The lovely but lonely Vickrey auction. In Combinatorial Auctions, P. Cramton, Y. Shoham, and R. Steinberg (Eds.), pp. 17–40.
  • [6] Q. Cai, A. Filos-Ratsikas, P. Tang, and Y. Zhang (2018) Reinforcement mechanism design for e-commerce. arXiv:1708.07607.
  • [7] D. Clevert, T. Unterthiner, and S. Hochreiter (2016) Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289.
  • [8] H. Daniels and M. Velikova (2010) Monotone and partially monotone neural networks. IEEE Transactions on Neural Networks 21 (6), pp. 906–917.
  • [9] P. Dütting, Z. Feng, H. Narasimhan, D. C. Parkes, and S. S. Ravindranath (2020) Optimal auctions through deep learning. arXiv:1706.03459.
  • [10] Z. Feng, H. Narasimhan, and D. C. Parkes (2018) Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, Richland, SC, pp. 354–362.
  • [11] N. Golowich, H. Narasimhan, and D. C. Parkes (2018) Deep learning for multi-facility location mechanism design. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 261–267.
  • [12] A. Grover, E. Wang, A. Zweig, and S. Ermon (2019) Stochastic optimization of sorting networks via continuous relaxations. arXiv:1903.08850.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034.
  • [14] K. Kuo, A. Ostuni, E. Horishny, M. J. Curry, S. Dooley, P. Chiang, T. Goldstein, and J. P. Dickerson (2020) ProportionNet: balancing fairness and revenue for auction design with deep learning. arXiv:2010.06398.
  • [15] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (2019) Continuous control with deep reinforcement learning. arXiv:1509.02971.
  • [16] X. Liu, C. Yu, Z. Zhang, Z. Zheng, Y. Rong, H. Lv, D. Huo, Y. Wang, D. Chen, J. Xu, F. Wu, G. Chen, and X. Zhu (2021) Neural auction: end-to-end learning of auction mechanisms for e-commerce advertising. arXiv:2106.03593.
  • [17] N. C. Luong, Z. Xiong, P. Wang, and D. Niyato (2017) Optimal auction for edge computing resource management in mobile blockchain networks: a deep learning approach. arXiv:1711.02844.
  • [18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533.
  • [19] H. Moulin (1980) On strategy-proofness and single peakedness. Public Choice 35 (4), pp. 437–455.
  • [20] R. B. Myerson (1981) Optimal auction design. Mathematics of Operations Research 6 (1), pp. 58–73.
  • [21] N. Peri, M. J. Curry, S. Dooley, and J. P. Dickerson (2021) PreferenceNet: encoding human preferences in auction design with deep learning. arXiv:2106.03215.
  • [22] S. R. Sangwan and A. Arora (2019) Supervised machine learning based buyer’s bidding behaviour detection in online auction. Social Science Research Network.
  • [23] W. Shen, B. Peng, H. Liu, M. Zhang, R. Qian, Y. Hong, Z. Guo, Z. Ding, P. Lu, and P. Tang (2017) Reinforcement mechanism design, with applications to dynamic pricing in sponsored search auctions. arXiv:1711.10279.
  • [24] W. Shen, P. Tang, and S. Zuo (2021) Automated mechanism design via neural networks. arXiv:1805.03382.
  • [25] M. Shin, J. Kim, and M. Levorato (2019) Auction-based charging scheduling with deep learning framework for multi-drone networks. IEEE Transactions on Vehicular Technology 68 (5), pp. 4235–4248.
  • [26] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller (2014) Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning, ICML’14, Vol. 32, pp. I–387–I–395.
  • [27] P. Tang (2017) Reinforcement mechanism design. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 5146–5150.
  • [28] R. Vohra (2010) Mechanism design: a linear programming approach.
  • [29] C. J. C. H. Watkins (1989) Learning from delayed rewards. Ph.D. Thesis, King’s College, Cambridge, UK.
  • [30] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and A. J. Smola (2017) Deep sets. CoRR abs/1703.06114.
  • [31] Z. Zhang, X. Liu, Z. Zheng, C. Zhang, M. Xu, J. Pan, C. Yu, F. Wu, J. Xu, and K. Gai (2021) Optimizing multiple performance metrics with deep GSP auctions for e-commerce advertising. arXiv:2012.02930.
  • [32] G. Zhou, C. Song, X. Zhu, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai (2018) Deep interest network for click-through rate prediction. arXiv:1706.06978.