Neural Auction: End-to-End Learning of Auction Mechanisms for E-Commerce Advertising

06/07/2021 ∙ by Xiangyu Liu, et al. ∙ Shanghai Jiao Tong University

In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal because their fixed allocation rules optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from auction outcomes to optimize multiple performance metrics, have attracted increasing research interest. However, auction mechanisms involve various discrete operations, which makes them difficult to combine with the continuous optimization pipelines of machine learning. In this paper, we design Deep Neural Auctions (DNAs) to enable end-to-end auction learning by proposing a differentiable model to relax the discrete sorting operation, a key component in auctions. We optimize the performance metrics by developing deep models to efficiently extract contexts from auctions, providing rich features for auction design. We further integrate game-theoretical conditions into the model design to guarantee the stability of the auctions. DNAs have been successfully deployed in the e-commerce advertising system at Taobao. Experimental results on both a large-scale data set and online A/B tests demonstrate that DNAs significantly outperform other mechanisms widely adopted in industry.




1. Introduction

In online e-commerce, the advertising platform is an intermediary that helps advertisers deliver their products to interested users (Goldfarb and Tucker, 2011). Auction mechanisms, such as the Vickrey-Clarke-Groves (VCG) auction (Vickrey, 1961), the Myerson auction (Myerson, 1981) and the generalized second-price (GSP) auction (Edelman et al., 2007), have been used to enable efficient ad allocation in various advertising scenarios. When designing auction mechanisms for e-commerce advertising, we need to jointly consider and optimize multiple performance metrics from three major stakeholders, i.e., users, advertisers, and the ad platform. Users look for good shopping experiences, advertisers want to accomplish their ad marketing objectives, and the ad platform would like to extract large revenue while also providing satisfying services to both users and advertisers (Bachrach et al., 2014; Zhang et al., 2021). Furthermore, the ad platform may balance and adjust the importance of these metrics to match the business strategies for users and advertisers in different e-commerce scenarios. High-quality user experiences and advertising services support the long-term prosperity of e-commerce advertising.

The traditional auction mechanisms are suboptimal for e-commerce advertising with multiple performance metrics in dynamic environments. The VCG auction and the Myerson auction focus on optimizing either social welfare or revenue, and their procedures are also too complex to explain to advertisers. Although the GSP auction has a nice interpretation and is easy to deploy in industry, its fixed allocation rule limits its capability to optimize multiple performance metrics in dynamic environments. To overcome these limitations, we turn our attention to data-driven auction mechanisms, inspired by the recent increase of interest in leveraging modern machine learning, and in particular deep learning, for auction design (Dütting et al., 2019; Feng et al., 2018; Rahme et al., 2021; Shen et al., 2019). Data-driven auction mechanisms enable us to exploit rich information, such as the context of the auction environment and the performance feedback from auction outcomes, to guide the design of flexible allocation rules for optimizing multiple performance metrics, which significantly enlarges the design space of auction mechanisms.

Figure 1. Comparison between traditional auctions and the Deep Neural Auction. The set encoder and the context-aware rank score network are applied to extract auction features, which improves the representation space and flexibility of rank scores compared with the fixed rank score in traditional auctions. Furthermore, the differentiable sorting engine makes the auction, including allocation and pricing, continuous and differentiable w.r.t. the inputs, thereby supporting end-to-end back-propagation training.

However, how to make full use of deep learning in designing data-driven auction mechanisms for industrial e-commerce advertising remains an open question for both academia and industry. We consider two critical challenges in this direction. The first comes from the mismatch in design principles between auction mechanisms and deep learning. Auction mechanisms, including allocation and pricing, usually involve discrete optimization operations, e.g., the top-k ad selection in the GSP auction (Varian, 2007), while deep learning follows an end-to-end pipeline of continuous optimization. This mismatch prevents the performance feedback underlying the auction outcomes from being integrated into back-propagation training. When designing learning models for data-driven auctions, we also need to take game-theoretical properties, such as Incentive Compatibility (Vickrey, 1961), into account, which further complicates the application of deep learning to auction design. Although some recent works have proposed deep neural network architectures for learning-based auction mechanisms (Dütting et al., 2019; Feng et al., 2018; Shen et al., 2019), they focus on theoretical auction settings, either complex combinatorial auctions (Dütting et al., 2019) or simple single-bidder auctions (Shen et al., 2019), and lack insights from industrial deployment. Thus, further effort is needed to integrate deep learning into the design and deployment of end-to-end auction mechanisms in practical industrial settings.

The second challenge is data efficiency. The current learning-based approaches (Zhang et al., 2021; Golrezaei et al., 2017) usually require a large number of samples to learn the optimal auction due to an ambiguity issue we observed in the data from auctions. It is a common case that an advertiser with the same feature profile can result in different outcomes in distinct auctions, e.g., wins in one auction but loses in another, due to the change of the auction context, e.g., the competition from the other advertisers. From the machine learning perspective, this may cause an ambiguity issue (Hüllermeier and Beringer, 2006), introducing the one-to-many relation in samples, i.e., the same feature (advertiser feature profile) but contradictory labels (winning or losing). Naive neural network models, which do not incorporate much inductive bias (Battaglia et al., 2018), may not fully handle the ambiguity phenomenon on data samples from auctions, resulting in inefficient learning for the end-to-end auction design.

In this paper, we aim to develop a data-efficient end-to-end Deep Neural Auction mechanism, namely DNA, to optimize multiple performance metrics for e-commerce advertising. Because of their easy interpretation and deployment in industry, we keep the rank score-based allocation and second-price payment procedures inherited from the GSP auction. As shown in Figure 1, DNA contains a set encoder, a family of context-aware rank score functions and a differentiable sorting engine, with which the process of auction design can be integrated into an end-to-end learning pipeline. Specifically, the set encoder and a carefully designed neural network extract and compress the rich features of the auction, such as auction context, bid, advertiser's profile, predicted auction outcomes, etc., into a context-aware rank score, greatly increasing the representation space and flexibility of rank scores in the GSP auction. We further propose a module to relax the sorting in auctions as a differentiable operation, which makes the whole process of the GSP auction, including allocation and pricing, continuous and fully differentiable with respect to the inputs, and thus supports end-to-end back-propagation training. When designing the learning models for these three modules, we introduce several constraints over the network structures, such as the monotonicity of the neural network in terms of bid, to preserve the game-theoretical properties of the GSP auction for DNA. Our contributions in this paper can be summarized as follows:

We conduct an in-depth study on leveraging the power of deep learning to design data-driven auctions for industrial e-commerce advertising. The proposed end-to-end Deep Neural Auction mechanism, namely DNA, enables us to optimize multiple performance metrics using the real feedback from historical auction outcomes. The newly designed rank scores also largely enhance the flexibility of ad allocation, making it possible to adjust the auction mechanism for various performance metrics in dynamic environments.

We employ three deep learning models to facilitate the design of data-efficient, end-to-end learned auction mechanisms with guaranteed game-theoretical properties. A set encoder model and a monotone neural network model are proposed to encode various features of the auction into the context-aware rank score. With the proposed differentiable sorting engine, we can formulate the design of data-driven auctions as a continuous optimization problem, which can be integrated into an end-to-end learning pipeline.

We have deployed the DNA mechanism in the advertising system at Taobao, one of the world's leading e-commerce advertising platforms. Experimental results on both a large-scale industrial data set and online A/B tests show that, in optimizing multiple performance metrics, the DNA mechanism significantly outperforms other widely used industrial auction mechanisms, such as Utility-based GSP (Bachrach et al., 2014) and Deep GSP (Zhang et al., 2021).

2. Preliminaries

2.1. Ad Auction Model

We describe a typical ad platform architecture in e-commerce advertising. Formally, $N$ advertisers compete for $K$ ad slots, which are incurred by a PV (page view) request from the user. Each advertiser $i$ submits a bid $b_i$ based on her private information, which could be the probability of the user's behaviors (e.g., pCTR, pCVR, etc.) over the ad, obtained by a learning-based prediction module (Cheng et al., 2016; Zhou et al., 2018). We use the vector $\mathbf{b} = (b_i, \mathbf{b}_{-i})$ to represent the bids of all advertisers, where $\mathbf{b}_{-i}$ are the bids from all advertisers except $i$. We represent the ad auction mechanism by $\mathcal{M}\langle \mathcal{R}, \mathcal{P} \rangle$, where $\mathcal{R}$ is the ad allocation scheme and $\mathcal{P}$ is the payment rule. The ad allocation scheme jointly considers the bids and the quality (e.g., pCTR and pCVR) of the ads in a principled manner. We use $\mathcal{R}_i(b_i, \mathbf{b}_{-i}) = k$ to denote that advertiser $i$ wins the $k$-th ad slot, while $\mathcal{R}_i(b_i, \mathbf{b}_{-i}) = 0$ represents that advertiser $i$ loses the auction. The winning ads are displayed to the user. The auction mechanism module further calculates the payments for the winning ads with a rule $\mathcal{P}$, which is carefully designed to guarantee the economic properties and the revenue of the auction mechanism.

2.2. Problem Formulation

Following (Zhang et al., 2021), we formulate the problem as multiple performance metrics optimization in competitive advertising environments. Given the bid vector $\mathbf{b}$ from all the advertisers and $L$ ad performance metric functions $\{f_1(\mathbf{b}; \mathcal{M}), \dots, f_L(\mathbf{b}; \mathcal{M})\}$ (such as Revenue, CTR, CVR, etc.), we aim to design an auction mechanism $\mathcal{M}\langle \mathcal{R}, \mathcal{P} \rangle$ such that

$$
\begin{aligned}
\max_{\mathcal{M}} \quad & \mathbb{E}_{\mathbf{b} \sim \mathcal{D}} \left[ F(\mathbf{b}; \mathcal{M}) \right] \\
\text{s.t.} \quad & \text{Incentive Compatibility (IC) constraint,} \\
& \text{Individual Rationality (IR) constraint,}
\end{aligned}
\tag{1}
$$

where $\mathcal{D}$ is the advertisers' bid distribution from which bid vectors $\mathbf{b}$ are drawn. We define $F(\mathbf{b}; \mathcal{M}) = \lambda_1 f_1(\mathbf{b}; \mathcal{M}) + \dots + \lambda_L f_L(\mathbf{b}; \mathcal{M})$, so the objective is to maximize a linear combination of the multiple performance metrics $f_l$ with preference parameters $\lambda_l$. The parameters $\lambda_l$ are inputs of our problem. The IC and IR constraints guarantee that advertisers truthfully report their bids and are not charged more than their maximum willing-to-pay for the allocation. Both are important for the stability of the ad auction and are discussed in detail in Section 2.3.

In this work, we keep the design rationale of the classical GSP auction mechanism (Lahaie and Pennock, 2007; Bachrach et al., 2014), where the allocation scheme ranks advertisers by their rank scores in non-increasing order, and the pricing rule charges each winning advertiser the minimum bid required to maintain the same ad slot. We study a learning-based GSP auction framework, leveraging the power of deep neural networks to design a new rank score function and integrate it into GSP. We use $r_i(b_i, \mathbf{x}_i)$ to denote this new rank score function, where $(b_i, \mathbf{x}_i)$ denotes all available information in the auction, including the bid $b_i$ and other features $\mathbf{x}_i$ related to advertiser $i$'s ad. For ease of presentation, we also write it as $r_i(b_i)$ if there is no ambiguity. The training of this non-linear model is guided by the optimization objective in (1). With this new rank score, the allocation scheme and payment rule can be summarized as follows:

Allocation Scheme $\mathcal{R}$: Advertisers are sorted in non-increasing order of the new rank score $r_i(b_i)$. Without loss of generality, let

$$
r_1(b_1) \ge r_2(b_2) \ge \dots \ge r_N(b_N), \tag{2}
$$

then the advertisers with the top-$K$ scores win the corresponding ad slots, with ties broken randomly.

Payment Rule $\mathcal{P}$: The payment for the winning advertiser $i$ is calculated by the formula:

$$
p_i = r_i^{-1}\left( r_{i+1}(b_{i+1}) \right), \tag{3}
$$

where $r_{i+1}(b_{i+1})$ is the rank score of the next highest advertiser, and $r_i^{-1}(\cdot)$ is the inverse function of $r_i(\cdot)$.
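The allocation and payment rules above can be illustrated with a minimal sketch. It assumes a hypothetical linear rank score, bid times a quality term such as pCTR, whose inverse is a simple division; the learned rank score of this paper is non-linear. The function name `gsp_auction` and the `quality` argument are illustrative and not taken from the paper.

```python
def gsp_auction(bids, quality, num_slots):
    """Rank-score GSP sketch: sort by r_i = b_i * q_i, charge the critical bid.

    Assumes every quality value is strictly positive so the score is invertible.
    """
    scores = [b * q for b, q in zip(bids, quality)]
    # Non-increasing order of rank score. Ties are broken by index here,
    # whereas the text breaks ties randomly.
    order = sorted(range(len(bids)), key=lambda i: -scores[i])
    winners, payments = [], []
    for pos in range(min(num_slots, len(order))):
        i = order[pos]
        # Rank score of the next-highest advertiser (0 if nobody is below).
        next_score = scores[order[pos + 1]] if pos + 1 < len(order) else 0.0
        # Inverse of r_i(b) = b * q_i evaluated at the next-highest score.
        payments.append(next_score / quality[i])
        winners.append(i)
    return winners, payments


bids = [2.0, 1.5, 1.0]
quality = [0.10, 0.20, 0.05]
winners, payments = gsp_auction(bids, quality, num_slots=2)
```

In this toy instance the second advertiser takes the first slot despite a lower bid, because her rank score is higher, and each winner pays the smallest bid that would still keep her slot.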

2.3. Economic Properties

In auction mechanism design, one cannot simply assume that an advertiser will truthfully reveal her maximum willing-to-pay price $m_i$ in the auction, since advertisers have incentives to misreport these prices in their own interest (Edelman and Ostrovsky, 2007). (Footnote 2: $m_i$ is not necessarily equal to the value of the PV request. For example, there may be budget constraints.) Misreporting may seriously harm the stability of the advertising platform. Therefore, we need to guarantee the property of incentive compatibility (IC) from mechanism design. This property removes the computational burden of bidding strategy optimization from advertisers, and also leads to reliable and predictable inputs for the auction mechanism.

Definition 2.1 (Incentive Compatibility (Vickrey, 1961)).

An auction mechanism $\mathcal{M}\langle \mathcal{R}, \mathcal{P} \rangle$ is IC if it is in the best interest of each advertiser to truthfully reveal her maximum willing-to-pay price, i.e., $b_i = m_i$.

In traditional auction theory, the celebrated IC auction mechanisms, such as the VCG auction (Vickrey, 1961) and the Myerson auction (Myerson, 1981), typically build upon the assumption that advertisers are utility maximizers, that is, the goal of each advertiser is to optimize her quasi-linear utility, defined as the difference between her expected value $v_i$ and the payment $p_i$, i.e., $u_i = v_i - p_i$. However, we observe from the industrial e-commerce platform that this model does not fully capture the behavior pattern of advertisers. For example, on the Taobao advertising platform, there are two representative types of advertisers: Optimized Cost Per Click (OCPC) advertisers with upper bounds on bids, and Multi-variable Constrained Bidding (MCB) advertisers with constraints over budgets and average costs, such as pay-per-click (PPC) and pay-per-acquisition (PPA). The goal of both types of advertisers is to optimize the overall realized value of advertising, such as the quantity of conversions and clicks, under certain constraints over the payments. These advertisers calculate and report a maximum willing-to-pay price for each PV request based on the current status of the ad campaign, with the help of auto-bidding services (Zhu et al., 2017; Yang et al., 2019). This behavior pattern is well captured by the model of the value maximizer (Wilkens et al., 2017), which is defined as follows:

Definition 2.2 (Value maximizer (Wilkens et al., 2017)).

A value maximizer $i$ optimizes value $v_i$ while keeping payment $p_i$ below her maximum willing-to-pay $m_i$; when value is equal, a lower $p_i$ is preferred.

In auctions with multiple ad slots, a value maximizer prefers the outcome with a higher slot as long as the payment is below the maximum willing-to-pay price, and among outcomes with equal value she prefers a smaller payment. The strategic behavior pattern of value maximizers is quite different from that of traditional utility maximizers. It has been proved that an auction mechanism is IC for value maximizers as long as the following two conditions are satisfied (Aggarwal et al., 2009; Wilkens et al., 2017):

Monotonicity: An advertiser would win the same or a higher slot if she reports a higher bid;

Critical price: The payment for the winning advertiser is the minimum bid that she needs to report to maintain the same slot.

We note that IR is also guaranteed under these two IC conditions. Given the monotonicity condition, the critical price is no higher than the bid, and hence no higher than the maximum willing-to-pay price, i.e., $p_i \le m_i$. It can be easily verified that GSP satisfies these conditions and hence is IC and IR for value maximizers. In this work, we design learning-based auction mechanisms that follow the above conditions.

3. Deep Neural Auction

In this section, we present the details of Deep Neural Auction (DNA) mechanism for optimizing multiple performance metrics under the multi-slot setting for e-commerce advertising.

Figure 2. (a) Deep Neural Auction architecture: the overall architecture. (b) Set encoder (Deep Set architecture): the Deep Set based set encoder receives the whole set of ad features and outputs a set embedding. (c) Context-aware rank score function: a partially monotone MIN-MAX neural network. The solid lines represent connections with non-negative weights, whereas the dashed lines represent unconstrained connections. (d) Differentiable sorting engine (NeuralSort module): it takes in the generated rank scores and outputs a row-stochastic permutation matrix of argsort as well as the corresponding allocation and payments, by using NeuralSort.

3.1. Overall Architecture

As illustrated in Fig. 2(a), DNA consists of three modules: a set encoder, a context-aware rank score function, and a differentiable sorting engine. The set encoder learns a set embedding from the features of candidate ads, which encodes the context of the auction. This set embedding is attached as a complementary feature for each ad. Next, a MIN-MAX neural network shared across advertisers generates a context-aware rank score from each advertiser's features. This neural network is partially monotonic with respect to bids, which is critical to guaranteeing the IC property. Another advantage of the designed neural network is a closed-form expression for the inverse transform, which enables an easy payment calculation. Then, the differentiable sorting engine applies a continuous relaxation of the sorting operator in auctions, and outputs a row-stochastic permutation matrix. We can use this row-stochastic permutation matrix to express the expected revenue as well as other predicted performance metrics, from which training losses based on the real feedback underlying the auction outcome can be constructed. With these components, the whole DNA mechanism is fully differentiable with respect to its inputs, and can be integrated into an end-to-end learning pipeline as in deep learning. It is worth noting that both the set encoder and the context-aware rank score function are parameterized neural network models, while the differentiable sorting engine is a non-parameterized operator. We next introduce the details of these three modules.

3.2. Auction Context Encoding

As the final auction outcome is jointly determined by all the candidate ads, we design a set encoder to automatically extract the feature of the auction context from all the candidate ads, instead of the individual ad. We attach this auction context feature as an augmented feature for each ad to overcome the ambiguity issue discussed in Section 1.

The set encoder receives the whole set of ad features as input. As there is no inherent ordering among the advertisers in the set, the feature set is permutation-equivariant (Zaheer et al., 2017) with respect to the auction outcome, i.e., the auction outcome does not rely on the ordering of the features. Learning models that do not take this set structure into account (such as MLPs or RNNs) suffer from discontinuities (Zhang et al., 2019). Inspired by the recent progress on learning on sets (Zhang et al., 2019; Zaheer et al., 2017), we implement the set encoder as a deep neural network that uses a DeepSet (Zaheer et al., 2017) architecture to aggregate individual ad features into a representation of the auction context. The main idea is that, by stacking equivariant layers and a final symmetric layer, the DeepSet can learn to aggregate the information of all features in a permutation-equivariant manner. The generated embedding vector can be trained to predict statistics of interest about the whole set.

Concretely, as illustrated in Fig. 2(b), the set encoder is composed of two groups of layers, $\phi_1$ and $\phi_2$. Given the features of $N$ candidate ads $\{\mathbf{x}_i\}_{i=1}^{N}$, each instance $\mathbf{x}_i$ is first mapped to a high-dimensional latent space through shared fully connected layers $\phi_1$, resulting in a set of intermediate hidden states $\{\mathbf{h}_i\}_{i=1}^{N}$:

$$
\mathbf{h}_i = \sigma\left( \phi_1(\mathbf{x}_i) \right), \tag{4}
$$

where $\sigma$ is the Exponential Linear Unit (ELU) activation function (Clevert et al., 2016). Then, this set of hidden states is processed with symmetric aggregation pooling (such as average pooling) to build the final set embedding $\mathbf{h}'_i$ for each ad with another fully connected layer $\phi_2$:

$$
\mathbf{h}'_i = \sigma\left( \phi_2\left( \mathrm{avgpool}(\mathbf{h}_{-i}) \right) \right), \tag{5}
$$

where $\mathbf{h}_{-i}$ represents the hidden states from all advertisers except $i$. This set encoder is built by composing permutation-equivariant operations (the shared nonlinear transformation) with a symmetric aggregation operation (average pooling) at the end. Since symmetric operations are commutative in their input items, the output is the same regardless of the order of the items. The set encoder learns to extract the context of the auction from the whole set of candidate ads (Zaheer et al., 2017; Edwards and Storkey, 2017), driven by the downstream training signals in an end-to-end manner. The output set embedding is sent to the downstream rank score function as an augmented feature for each ad, which helps to infer each ad's rank score within the current set of candidate ads. It should be noted that the set encoder does not take the bids of the candidate ads as input, as shown in Fig. 2(b). This design is chosen mainly to guarantee the IC property, so that an advertiser's bid influences her rank score only through the bid input of the rank score function, as elaborated in Section 3.3.
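The two-stage encoder described above (a shared per-ad layer, then pooling over the other ads and a second layer) can be sketched in plain Python. This is only an illustration under stated assumptions: single-layer `dense` transforms stand in for the two groups of layers, the weights are random rather than trained, and all names (`set_encoder`, `random_matrix`, the layer sizes) are hypothetical.

```python
import math
import random


def elu(x):
    """Exponential Linear Unit activation."""
    return x if x > 0 else math.exp(x) - 1.0


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def set_encoder(features, w1, b1, w2, b2):
    """Return one context embedding per ad, pooled over the other ads.

    Needs at least two ads, since the pooling leaves the ad itself out.
    """
    hidden = [[elu(a) for a in dense(x, w1, b1)] for x in features]
    n, dim = len(hidden), len(hidden[0])
    totals = [sum(h[d] for h in hidden) for d in range(dim)]
    embeddings = []
    for h in hidden:
        # Leave-one-out average pooling over the hidden states of the other ads.
        pooled = [(totals[d] - h[d]) / (n - 1) for d in range(dim)]
        embeddings.append([elu(a) for a in dense(pooled, w2, b2)])
    return embeddings


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w1, b1 = random_matrix(rng, 4, 3), [0.0] * 4  # first layer: 3 features -> 4 hidden units
w2, b2 = random_matrix(rng, 2, 4), [0.0] * 2  # second layer: 4 hidden units -> 2-dim embedding
ads = random_matrix(rng, 5, 3)                # 5 candidate ads, bids deliberately excluded
emb = set_encoder(ads, w1, b1, w2, b2)
```

Reordering the input ads only reorders the output embeddings, which is the permutation-equivariance property the section relies on.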

3.3. Context-Aware Rank Score

We design a deep neural network to transform each advertiser's augmented feature into a context-aware rank score. We use $r_i = R_\theta(b_i, \mathbf{x}'_i)$ to denote this rank score, where $\mathbf{x}'_i$ represents the augmented features except bid, i.e., $\mathbf{x}'_i = (\mathbf{x}_i, \mathbf{h}'_i)$. From Section 2.3, we need to satisfy two conditions to guarantee the IC property for value maximizers: monotone allocation and critical price. Thus, we aim to design a neural network that is strictly monotone with respect to bid and supports an efficient inverse transform given the next highest rank score.

We incorporate the aforementioned constraints within the network architecture design, and restrict the search space to a family of partially monotone parameterized neural networks. We model the rank score function as a two-layer feed-forward network with min and max operations over linear functions (Daniels and Velikova, 2010), as shown in Fig. 2(c). For $Q$ groups of $Z$ linear functions, we associate strictly positive weights $e^{w_{qz}}$ with the bid $b$ and other unconstrained weights $\mathbf{w}'_{qz}$ with $\mathbf{x}'$, as well as intercepts $\alpha_{qz}$, where $q = 1, \dots, Q$ and $z = 1, \dots, Z$. For simplicity, we denote $R_\theta(b_i, \mathbf{x}'_i)$ as $r_i$, and assume $r_1 \ge r_2 \ge \dots \ge r_N$ without loss of generality. We can define:

$$
r_i = \min_{q \in [Q]} \max_{z \in [Z]} \left( e^{w_{qz}} \, b_i + \mathbf{w}'_{qz} \cdot \mathbf{x}'_i + \alpha_{qz} \right). \tag{6}
$$

Since each linear function in Eq. (6) is strictly increasing in $b_i$, so is $r_i$. This partially monotone MIN-MAX neural network has been proved capable of approximating any partially monotone function (Daniels and Velikova, 2010). Another particular advantage of this representation is that the inverse transform can be obtained directly from the parameters of the forward transform in a closed-form expression. For example, given the next highest rank score $r_{i+1}$, the payment for advertiser $i$ can be formulated as follows:

$$
p_i = \max_{q \in [Q]} \min_{z \in [Z]} \; e^{-w_{qz}} \left( r_{i+1} - \alpha_{qz} - \mathbf{w}'_{qz} \cdot \mathbf{x}'_i \right). \tag{7}
$$
With the MIN-MAX neural network designed above, the two conditions for the IC property are satisfied, under the assumption that the bids affect the rank scores only through the bid input. However, in the industrial advertising environment, some engineered features derived from domain knowledge may depend on the bids in complex ways. The bids may therefore affect the rank scores, and hence the allocation, through these features as well, which may violate this assumption. From our observations on industrial data sets, in a large-scale advertising platform the effect of a change in one's bid on the rank scores via this route is quite small. To investigate the influence of this issue on the IC property, we conduct comprehensive experiments in Section 4.2.3 to calculate the data-driven IC metric of DNA for value maximizers. We leave the design of a strictly IC DNA mechanism as an interesting open problem for future work.
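The forward MIN-MAX score and its closed-form inverse can be checked numerically with a short sketch. The inverse follows from the fact that the score reaches a target exactly when, in every group, at least one linear function reaches it, which gives a max over groups of a min over the functions in each group. The parameter values below are arbitrary illustrative numbers, not learned weights, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_score(bid, x, w, w_prime, alpha):
    """min over groups q of max over z of exp(w[q][z]) * bid + w_prime[q][z] . x + alpha[q][z]."""
    return min(
        max(math.exp(w[q][z]) * bid + dot(w_prime[q][z], x) + alpha[q][z]
            for z in range(len(w[q])))
        for q in range(len(w))
    )


def critical_bid(target, x, w, w_prime, alpha):
    """Smallest bid whose rank score reaches `target` (closed-form inverse)."""
    return max(
        min(math.exp(-w[q][z]) * (target - alpha[q][z] - dot(w_prime[q][z], x))
            for z in range(len(w[q])))
        for q in range(len(w))
    )


# Q = 2 groups of Z = 2 linear functions over a 2-dimensional feature vector.
w = [[0.0, 0.5], [-0.3, 0.2]]                      # exp(w) > 0 enforces monotonicity in bid
w_prime = [[[0.1, -0.2], [0.3, 0.0]], [[-0.1, 0.4], [0.2, 0.2]]]
alpha = [[0.0, -0.5], [0.1, 0.3]]
x = [1.0, 2.0]

test_bids = [0.1, 0.5, 1.0, 2.0, 5.0]
scores = [rank_score(b, x, w, w_prime, alpha) for b in test_bids]
recovered = [critical_bid(s, x, w, w_prime, alpha) for s in scores]
```

Feeding a score back through `critical_bid` recovers the bid that produced it, which is what makes the payment in the pricing rule cheap to compute.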

3.4. Differentiable Sorting Engine

After calculating the rank scores of all ads, the mechanism determines the allocation and payment, following Eq. (2) and (3). However, treating allocation and payment outside the model learning (i.e., as an agnostic environment) is in some sense poorly suited to deep learning. That is, the processes of allocation and payment (in essence, the sorting operation) are not natively differentiable, and the gradients must all be evaluated via finite difference or likelihood ratio methods (such as the policy search used in Deep GSP (Zhang et al., 2021)), which brings additional issues of convergence stability and data efficiency. In another line of related work, model-based reinforcement learning (RL) has achieved some notable successes (Moerland et al., 2020). Some recent works used a general neural network to learn a differentiable dynamics model (Kurutach et al., 2018) and argued that model-based approaches are often superior to and more data-efficient than model-free RL methods for many tasks (Ha and Schmidhuber, 2018; de Avila Belbute-Peres et al., 2018). These insights motivate us to model the whole process of allocation and payment inside the Deep Neural Auction framework.

In various types of auction mechanisms, both the allocation and payment are built on a basic sorting operation. Sorting operation outputs two vectors, neither of which is differentiable. On the one hand, the vector of sorted values is piecewise linear. On the other hand, the sorting permutation (more specifically, the vector of ranks via argsort operator) also has no differentiable properties as it is integer-valued.

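To overcome this issue, we propose a differentiable sorting engine that caters to the top-$K$ selection in multi-slot auctions. We present a novel use of a differentiable sorting operator, i.e., NeuralSort (Grover et al., 2019), to derive a differentiable top-$K$ permutation matrix, which can be used to generate the various expected outcomes of the auctions. Given a set of unsorted rank scores $\mathbf{r} = [r_1, r_2, \dots, r_N]^T$, we are concerned with the argsort operator, where $\mathrm{argsort}(\mathbf{r})$ returns the permutation that sorts $\mathbf{r}$ in decreasing order. Formally, we define the argsort operator as the mapping from $N$-dimensional real vectors $\mathbf{r} \in \mathbb{R}^N$ to the permutations over $N$ elements, where the permutation matrix $M_{\mathbf{r}}$ is expressed as

$$
M_{\mathbf{r}}[k, i] =
\begin{cases}
1 & \text{if } i = \mathrm{argsort}(\mathbf{r})[k], \\
0 & \text{otherwise.}
\end{cases}
\tag{8}
$$

Here $M_{\mathbf{r}}[k, i]$ indicates whether $r_i$ is the $k$-th largest rank score in $\mathbf{r}$. The results from (Grover et al., 2019) showed the identity:

$$
M_{\mathbf{r}}[k, i] =
\begin{cases}
1 & \text{if } i = \arg\max(\mathbf{c}_k), \\
0 & \text{otherwise,}
\end{cases}
\tag{9}
$$

where $\mathbf{c}_k = (N + 1 - 2k)\,\mathbf{r} - A_{\mathbf{r}} \mathbf{1}$, $A_{\mathbf{r}}$ denotes the absolute pairwise differences of elements in $\mathbf{r}$ such that $A_{\mathbf{r}}[i, j] = |r_i - r_j|$, and $\mathbf{1}$ denotes the column vector of all ones. Then, by relaxing the argmax operator in Eq. (9) to a row-wise softmax, we arrive at the following continuous relaxation of the argsort operator $M_{\mathbf{r}}$, which is called NeuralSort in (Grover et al., 2019):

$$
\hat{M}_{\mathbf{r}}[k, :] = \mathrm{softmax}\left( \frac{\mathbf{c}_k}{\tau} \right), \tag{10}
$$

where $\tau > 0$ is a temperature parameter that controls the degree of the approximation, and as $\tau \to 0$, $\hat{M}_{\mathbf{r}} \to M_{\mathbf{r}}$. Intuitively, the $k$-th row of $\hat{M}_{\mathbf{r}}$ can be interpreted as the "choice" probabilities over all elements in $\mathbf{r}$ for getting the $k$-th highest item.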

This row-stochastic permutation matrix $\hat{M}_{\mathbf{r}}$ can be used as a basic operator to construct task-specific sorting procedures that follow the order of the generated rank scores in a differentiable manner. For instance, if we let $\mathbf{p} = [p_1, p_2, \dots, p_N]^T$ denote the payments calculated by Eq. (7) for the $N$ advertisers in a PV request, then the top-$K$ payments, sorted by their corresponding rank scores, can be recovered by a simple matrix multiplication:

$$
\hat{M}_{\mathbf{r}}[1{:}K, :] \cdot \mathbf{p}. \tag{11}
$$
This row-stochastic permutation matrix acts as a differentiable sorting engine that makes the discrete sorting procedure compatible with differentiability.
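A compact pure-Python sketch of the NeuralSort relaxation and the top-K multiplication is given below for illustration. It follows the identity from (Grover et al., 2019) quoted in this section; the function names, the example scores and the payments are made up, and a real implementation would use a tensor library with automatic differentiation.

```python
import math


def neural_sort(scores, tau=1.0):
    """Row-stochastic relaxation of the descending argsort permutation matrix."""
    n = len(scores)
    # Row sums of the pairwise absolute-difference matrix A[i][j] = |r_i - r_j|.
    abs_diff_sums = [sum(abs(ri - rj) for rj in scores) for ri in scores]
    matrix = []
    for k in range(1, n + 1):
        logits = [((n + 1 - 2 * k) * scores[i] - abs_diff_sums[i]) / tau
                  for i in range(n)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


def top_k_expected(matrix, values, k):
    """First k rows of the relaxed permutation matrix applied to `values`."""
    return [sum(m * v for m, v in zip(row, values)) for row in matrix[:k]]


scores = [0.3, 1.2, 0.7]
payments = [0.2, 0.9, 0.5]
soft = neural_sort(scores, tau=0.01)     # low temperature: close to a hard sort
top2 = top_k_expected(soft, payments, k=2)
smooth = neural_sort(scores, tau=5.0)    # high temperature: rows spread their mass
```

With a small temperature the two expected payments are numerically close to the payments of the two highest-scoring ads, while a large temperature blends all ads in every row.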

3.5. End-to-End Model Training

3.5.1. Data for Training

All data sets we used were generated under the GSP auction, which is IC for value-maximizing e-commerce advertisers (Wilkens et al., 2017). The data contains all advertisers' bids, the estimated values (e.g., pCTR, pCVR), ad information (e.g., category, price of product), user features (e.g., gender, age, income level) as well as context information (e.g., the source of traffic). This information constitutes the input features of the DNA architecture. The data also contains the real feedback information (e.g., click, conversion or transaction) from users.

3.5.2. Training Loss

As the training data contains the user feedback for each ad exposure, we can directly use the row-stochastic permutation matrix to compute the $K$-slot expected performance metrics via $\hat{M}_{\mathbf{r}}[1{:}K, :] \cdot \mathbf{F}_{all}$, where $\mathbf{F}_{all}$ represents the vector of aggregated performance metrics for all $N$ ads from real feedback:

$$
\mathbf{F}_{all} = \left[ \sum_{l=1}^{L} \lambda_l f_l^{1}, \; \dots, \; \sum_{l=1}^{L} \lambda_l f_l^{N} \right]^T, \tag{12}
$$

with $f_l^{i}$ standing for the $l$-th performance metric of the $i$-th ad in a PV request. Therefore, we can formulate the learning problem as minimizing the sum of the top-$K$ expected negated performance metrics for each PV request:

$$
\mathcal{L}_{tgt} = - \sum_{k=1}^{K} \hat{M}_{\mathbf{r}}[k, :] \cdot \mathbf{F}_{all}. \tag{13}
$$

One exception is the calculation of revenue. Because the allocation order changes, the payment for each ad differs from the one logged in the data. Thus we use the generated payments, defined in Eq. (11), to replace those that appear in the training data.

We set an auxiliary task to help train the DNA mechanism. With the benefit of hindsight from real feedback, we can access the optimal allocation that maximizes the performance metrics in each PV request. Thus we add a multiclass prediction task, whose loss is the row-wise cross-entropy (CE) between the ground-truth and the predicted row-stochastic permutation matrix:

$$
\mathcal{L}_{ce} = - \frac{1}{N} \sum_{k=1}^{N} \sum_{i=1}^{N} \mathbb{1}\left( M_{y}[k, i] = 1 \right) \log \hat{M}_{\mathbf{r}}[k, i], \tag{14}
$$

where $M_{y}$ is the ground-truth permutation matrix, calculated by sorting the ads by their real feedback. We found that this auxiliary task helped to stabilize the training process in our experiments. We use a hyper-parameter to balance the target loss $\mathcal{L}_{tgt}$ and the cross-entropy term $\mathcal{L}_{ce}$.
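The two loss terms described in this subsection can be sketched as follows. This is a simplified illustration, not the training code of the paper: the per-ad feedback values are made up, the weight `ce_weight` is a hypothetical name for the balancing hyper-parameter, the revenue exception is ignored, and gradients would in practice come from an automatic differentiation framework.

```python
import math


def neural_sort(scores, tau):
    """Relaxed descending-sort permutation matrix (NeuralSort)."""
    n = len(scores)
    sums = [sum(abs(ri - rj) for rj in scores) for ri in scores]
    rows = []
    for k in range(1, n + 1):
        logits = [((n + 1 - 2 * k) * scores[i] - sums[i]) / tau for i in range(n)]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def target_loss(m_hat, f_all, k):
    """Negated expected aggregated metric over the top-k relaxed slots."""
    return -sum(sum(m * f for m, f in zip(row, f_all)) for row in m_hat[:k])


def cross_entropy_loss(m_hat, f_all):
    """Row-wise cross-entropy against the hindsight-optimal ordering."""
    n = len(f_all)
    truth = sorted(range(n), key=lambda i: -f_all[i])  # best feedback first
    # Clamp probabilities to avoid log(0) after softmax underflow.
    return -sum(math.log(max(m_hat[k][truth[k]], 1e-12)) for k in range(n)) / n


def total_loss(scores, f_all, k, tau=1.0, ce_weight=0.5):
    m_hat = neural_sort(scores, tau)
    return target_loss(m_hat, f_all, k) + ce_weight * cross_entropy_loss(m_hat, f_all)


f_all = [1.0, 0.5, 0.1]                        # aggregated feedback per ad (made up)
loss_aligned = total_loss([2.0, 1.0, 0.0], f_all, k=2)
loss_reversed = total_loss([0.0, 1.0, 2.0], f_all, k=2)
```

Rank scores that order the ads the same way as their realized feedback yield a lower combined loss than scores that reverse the order, which is the signal that drives the rank score network during training.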

However, user feedback, especially conversion behavior, is scarce in industrial e-commerce advertising, e.g., users typically decide to purchase a product after seeing dozens of ads. To alleviate this problem, we replace the sparse user behaviors (typically one-hot) in the data with dense values from the prediction model (such as pCTR and pCVR), and debias these predicted values with real user behaviors using calibration techniques (Borisov et al., 2018; Deng et al., 2021). We also eliminate the deviation of CTR between different slots by debiasing with the posterior inherent CTR of each slot.

4. Experimental Evaluations

4.1. Experiment Setup

4.1.1. Evaluation Metrics

We consider the following metrics in our offline and online experiments, which reflect the platform revenue, the user experience, and the advertisers' utility in e-commerce advertising. For all experiments in this paper, metrics are normalized to the same scale.

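1) Revenue Per Mille (RPM). $\mathrm{RPM} = \frac{\sum \text{click} \times \text{PPC}}{\sum \text{impression}} \times 1000$, where PPC is the pay per click.

2) Click-Through Rate (CTR). $\mathrm{CTR} = \frac{\sum \text{click}}{\sum \text{impression}}$.

3) Conversion Rate (CVR). $\mathrm{CVR} = \frac{\sum \text{order}}{\sum \text{click}}$.

4) GMV Per Mille (GPM). $\mathrm{GPM} = \frac{\sum \text{merchandise volume}}{\sum \text{impression}} \times 1000$.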
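As a rough illustration, the four indicators can be computed from impression logs as below. The record fields (`click`, `ppc`, `order`, `gmv`) are hypothetical names, and the sketch assumes the standard per-thousand-impression definitions; the normalization mentioned above is not applied.

```python
def auction_metrics(impressions):
    """Compute RPM, CTR, CVR and GPM from a list of impression records.

    Each record is a dict with hypothetical keys:
    click (0/1), ppc (pay per click), order (0/1), gmv (merchandise volume).
    """
    n = len(impressions)
    clicks = sum(r["click"] for r in impressions)
    revenue = sum(r["click"] * r["ppc"] for r in impressions)
    orders = sum(r["order"] for r in impressions)
    gmv = sum(r["gmv"] for r in impressions)
    return {
        "RPM": 1000.0 * revenue / n,
        "CTR": clicks / n,
        "CVR": orders / clicks if clicks else 0.0,
        "GPM": 1000.0 * gmv / n,
    }

logs = [
    {"click": 1, "ppc": 0.5, "order": 1, "gmv": 20.0},
    {"click": 1, "ppc": 1.5, "order": 0, "gmv": 0.0},
    {"click": 0, "ppc": 0.0, "order": 0, "gmv": 0.0},
    {"click": 0, "ppc": 0.0, "order": 0, "gmv": 0.0},
]
result = auction_metrics(logs)
```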

Apart from the advertising performance indicators, we also evaluate the effectiveness of our designed learning-based auction mechanisms on the property of IC.

5) IC Metric. We propose a new data-driven metric of IC to represent the ex-post regret of value maximizers, similar to the data-driven IC for utility maximizers (Feng et al., 2019). The metric consists of the regret on value and the regret on payment, which denote, through bid perturbation, the maximum percentage of value increase under the payment constraint and the maximum percentage of payment decrease with identical allocation, respectively. Since our training data comes from a vanilla GSP mechanism, which is IC for value maximizers, we directly take each advertiser's bid as her maximum willing-to-pay price.

Concretely, Eq. (15) compares, for each advertiser, the slot allocated when bidding truthfully with the slot allocated when bidding a perturbed value, using the click-through rate of each slot and the payment of the advertiser under each bid. The regret on value is the relative increase in the click-through rate of the allocated slot, counted only when the payment under the perturbed bid does not exceed the maximum willing-to-pay price. The regret on payment is the relative decrease in payment, counted only when the allocated slot is unchanged. It should be noted that the value of a click for the advertiser cancels out of the fraction in Eq. (15). Intuitively, the metric measures to what extent a value-maximizing advertiser could be better off by manipulating her bid. A larger regret on value indicates that an advertiser could obtain more extra value under the constraint of the maximum willing-to-pay price. Similarly, a larger regret on payment indicates that an advertiser could be undercharged more while obtaining the same ad slot. For example, as GSP is IC for value maximizers, its regrets on value and on payment are both 0.
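The regret computation can be sketched on a toy auction. Everything below is an illustrative assumption rather than the paper's evaluation code: the slot click-through rates, the ranking by bid times pCTR, the restriction to advertisers who win a slot when truthful, and the choice to report the maximum over advertisers.

```python
SLOT_CTR = [0.3, 0.2, 0.1]  # hypothetical click-through rates of three slots

def run_auction(bids, pctrs, first_price=False):
    """Rank by bid * pCTR; return {ad: (slot, per-click payment)} for winners."""
    order = sorted(range(len(bids)), key=lambda i: bids[i] * pctrs[i], reverse=True)
    result = {}
    for slot, ad in enumerate(order[:len(SLOT_CTR)]):
        if first_price:
            pay = bids[ad]
        elif slot + 1 < len(order):
            nxt = order[slot + 1]
            # Second-price rule: minimum bid that keeps the slot.
            pay = bids[nxt] * pctrs[nxt] / pctrs[ad]
        else:
            pay = 0.0
        result[ad] = (slot, pay)
    return result

def ic_regret(values, pctrs, factors, first_price=False):
    """Return (regret on value, regret on payment), maximized over ads and perturbations."""
    truthful = run_auction(values, pctrs, first_price)
    worst_value, worst_payment = 0.0, 0.0
    for ad, (slot, pay) in truthful.items():
        for f in factors:
            bids = list(values)
            bids[ad] = f * values[ad]
            new_slot, new_pay = run_auction(bids, pctrs, first_price).get(ad, (None, 0.0))
            new_ctr = SLOT_CTR[new_slot] if new_slot is not None else 0.0
            # Value gain counts only if the payment stays within the true value.
            if new_pay <= values[ad]:
                worst_value = max(worst_value, (new_ctr - SLOT_CTR[slot]) / SLOT_CTR[slot])
            # Payment saving counts only if the slot is unchanged.
            if new_slot == slot and pay > 0:
                worst_payment = max(worst_payment, (pay - new_pay) / pay)
    return worst_value, worst_payment

values = [4.0, 6.0, 2.0, 2.0]
pctrs = [0.5, 0.25, 0.5, 0.25]
factors = [0.5, 0.8, 1.2, 1.5]
gsp_regret = ic_regret(values, pctrs, factors)
first_price_regret = ic_regret(values, pctrs, factors, first_price=True)
```

On this toy instance the second-price rule yields zero for both regrets, whereas the first-price rule leaves the regret on value at zero but gives a positive regret on payment, because a winner can shade her bid and keep her slot.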

4.1.2. Baseline Methods

We compare DNA with the widely used mechanisms in the industrial ad platform.

1) Generalized Second Price auction (GSP). The rank score in the classical GSP is simply the bid multiplied by the predicted CTR (pCTR), namely the effective Cost Per Mille (eCPM). The payment rule is the value of the minimum bid required to retain the same slot. Lahaie and Pennock (2007) suggested that incorporating a squashing exponent into the rank score function, i.e., raising the predicted CTR to an adjustable exponent, could improve the performance, where the exponent can be adjusted to trade off revenue against CTR. We refer to this exponential-form extension as GSP in the experiments.

2) Utility-based Generalized Second Price auction (uGSP). uGSP extends the conventional GSP by taking the rank score as a linear combination of multiple performance metrics using estimated values: the weighted eCPM plus a term that represents other utilities, such as a weighted combination of the predicted CTR and CVR. The payment of uGSP follows the principle from GSP: each winning advertiser pays the minimum bid required to retain her slot. uGSP is widely used in industry to optimize multiple performance metrics (Bachrach et al., 2014).
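A minimal sketch of the two rule-based baselines follows. The function names, the parameter names (`sigma`, `lam`, `others`), and the clipping of negative payments at zero are assumptions made for illustration; the payment is derived from the stated principle that a winner pays the minimum bid that retains her slot.

```python
def squashed_gsp(bids, pctrs, sigma, num_slots):
    """GSP with a squashing exponent: rank by bid * pCTR**sigma."""
    scores = [b * c ** sigma for b, c in zip(bids, pctrs)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    winners = []
    for slot, ad in enumerate(order[:num_slots]):
        next_score = scores[order[slot + 1]] if slot + 1 < len(order) else 0.0
        # Minimum bid with which the ad still matches the next score.
        winners.append((ad, next_score / pctrs[ad] ** sigma))
    return winners

def ugsp(bids, pctrs, others, lam, num_slots):
    """uGSP: rank by lam * bid * pCTR + other utilities."""
    scores = [lam * b * c + o for b, c, o in zip(bids, pctrs, others)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    winners = []
    for slot, ad in enumerate(order[:num_slots]):
        next_score = scores[order[slot + 1]] if slot + 1 < len(order) else 0.0
        # Solve lam * p * pCTR + other = next_score for p (clipped at zero, assumed).
        pay = max(0.0, (next_score - others[ad]) / (lam * pctrs[ad]))
        winners.append((ad, pay))
    return winners

bids = [2.0, 1.0, 2.0]
pctrs = [0.5, 0.5, 0.25]
gsp_result = squashed_gsp(bids, pctrs, sigma=1.0, num_slots=1)
bid_only_result = squashed_gsp(bids, pctrs, sigma=0.0, num_slots=1)
ugsp_result = ugsp(bids, pctrs, others=[0.0, 0.75, 0.0], lam=1.0, num_slots=2)
```

Setting the exponent to 1 recovers eCPM ranking, and setting it to 0 ranks by bid alone, which is why the exponent can shift the balance between revenue and CTR.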

3) Deep GSP (Zhang et al., 2021). Deep GSP uses a deep neural network to map an ad's related features to a new rank score within the GSP auction. This new rank score function is optimized using model-free reinforcement learning to maximize the performance metrics of interest.

4.2. Offline Experiments

4.2.1. Datasets

The data sets we used for experiments come from Taobao, a leading e-commerce advertising system in China. We randomly select 5 million records of data logged under GSP auctions on July 4, 2020 as training data, and 870k records logged on July 5, 2020 as test data. Unless stated otherwise, all experiments are conducted under the setting of top-3 ads displayed (i.e., 3-slot auctions) in each PV request. Other details about the model configurations and training procedure are in Appendix A.

4.2.2. Performance in Offline Simulations

Figure 3. The positive correlations between learned rank scores and the targeted performance metrics. Each blue dot represents an ad in the candidate set.
Figure 4. The performance of DNA and other baseline mechanisms in the offline experiments.

We conduct experiments to compare the performance of DNA and other baseline mechanisms. In order to facilitate intuitive comparisons, we set only two performance metrics with the form $\lambda \times \mathrm{RPM} + (1-\lambda) \times X$, where $X$ is one of the metrics selected from {CTR, CVR, GPM}. For uGSP, we set the rank score function with . For GSP, we tune the variable in the interval . For both Deep GSP and DNA, we directly set the objective by selecting the values of uniformly from the interval .

We show the relation between the learned rank scores and the targeted performance metrics, and illustrate some results in Fig. 3. We calculate the Pearson’s Correlation Coefficient (), which is on the test data set, together with the p-value less than . This result indicates the strong positive correlation between the learned rank scores and the performance metrics, implying that the ads with higher targeted objectives also have higher rank scores. This would encourage advertisers to optimize their ads’ quality (such as CTR) to enhance their competitiveness in the auctions.
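For reference, the Pearson correlation coefficient between rank scores and a targeted metric can be computed as in the following sketch. The sample values are made up for illustration, and the p-value computation is omitted.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical rank scores and targeted metric values for four ads.
rank_scores = [1.0, 2.0, 3.0, 4.0]
objectives = [2.0, 4.0, 6.0, 8.0]
rho = pearson(rank_scores, objectives)
```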

As some performance metrics may conflict, we next plot the Pareto front for the different baselines in Fig. 4. We observe that the curves of all the learning-based methods (both Deep GSP and DNA) lie above those of the other baselines. The flexible learning-based rank score models have the ability to perform automatic feature extraction from raw data. Learning-based methods can alleviate, to some extent, the problem of inaccurate predicted values (such as pCTR) used in GSP and uGSP, and learn ad auctions directly from the real feedback. We also note that the performance of GSP is poor when considering CVR and GPM, as GSP does not model the effect of these indicators explicitly in its rank score formulation.
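A Pareto front over two metrics can be extracted from a set of operating points as sketched below. The points are hypothetical (for example, one point per weight setting of a mechanism), and both coordinates are assumed to be maximized.

```python
def pareto_front(points):
    """Return the points that no other point dominates (both coordinates maximized)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (RPM, CTR) pairs obtained under different weight settings.
points = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.5, 0.5)]
front = pareto_front(points)
```

A mechanism whose front lies above another's achieves a better value of one metric at every level of the other.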

DNA outperforms the Deep GSP baseline by a clear margin. The rank score function of Deep GSP is conditioned only on each ad's private information. In DNA, by contrast, the rank score also takes a set embedding that explicitly models the auction context from the candidate ad set, making it more competitive than the rank scores in the Deep GSP auctions. We analyze this set embedding in more detail in Appendix B. The upgraded training method also contributes to this improvement. Deep GSP treats the whole allocation process as the environment and uses exploration-based algorithms (i.e., policy search) to optimize the rank score functions. In comparison, DNA directly differentiates through the sorting procedure in allocation, which is more data-efficient.

4.2.3. Evaluation on IC Property

We present the IC property of DNA, i.e., the regret metric, in Table 1. We compare only with the regret of truthful bidding in the Utility-based Generalized First Price (uGFP) auction, as the regrets of the other mechanisms (GSP and uGSP) are all 0. The uGFP mechanism allocates the ad slots in the same way as uGSP, while the payment of a winning advertiser is simply her bid. One can observe from Table 1 that the regret on value of both DNA and uGFP is 0, which indicates that advertisers could not win higher slots under the payment constraints. For the regret on payment, DNA outperforms uGFP significantly under all experiment settings. For example, in the first row (1.0RPM), an advertiser could be undercharged at most payment in DNA with the same allocation result, which is in uGFP.

1.0RPM 0.042%
0.5RPM+0.5CTR 0.059%
0.5RPM+0.5CVR 0.118%
0.5RPM+0.5GPM 0.028%
Table 1. IC Metric () experiments under four tasks with 1000 PV requests randomly selected from the test data. For each bidder, we randomly generate 100 perturbations (ranging from to times) to her value.

4.3. Online Experiments

We present the online experiments obtained by deploying the proposed DNA in the Taobao advertising system. As we only use the trained model for inference in the online system, we set the temperature parameter in the differentiable sorting engine to output the winning ads as well as the corresponding payments to the online advertising platform. Other deployment details are described in Appendix C.

% Improved Deep GSP DNA
CTR +6.43% +11.58%
CVR +6.38% +31.26%
GPM +2.77% +16.17%
Table 2. Online A/B test compared with uGSP on promoting different performance metrics, keeping the same RPM level.
% Improved +5.68% +18.93% +14.68% +14.53%
Table 3. Online A/B test (nearly two months) compared with GSP on promoting all performance metrics.

To demonstrate the performance of DNA, we conduct online A/B tests with 1% of the whole production traffic from Jan 25, 2021 to Feb 8, 2021 (about one billion auctions). We again consider the RPM, CTR, CVR and GPM metrics and conduct the online experiments as in the offline experiments, i.e., setting only two performance metrics at a time. In order to make fair and efficient comparisons between different baselines in production traffic, we fix the weights in uGSP and tune the weights of both Deep GSP and DNA until the observed RPM performance reaches the same level as that of uGSP. Then we record the relative improvements of the other metrics of Deep GSP and DNA compared with uGSP, which are shown in Table 2. From the results, we can see that the DNA mechanism achieves the largest improvement in CTR, CVR and GPM. To verify the performance stability of DNA, we conduct a relatively long-term experiment (nearly two months) to compare with the GSP auction. Table 3 shows that all the performance metrics related to users, advertisers, and the advertising platform are improved. Considering the massive number of advertisers and users, this improvement in marketing performance verifies the effectiveness of our proposed DNA framework.

5. Related Work

A plethora of works have used learning-based techniques for revenue-maximizing mechanism design in theoretical auction settings. Conitzer and Sandholm (Conitzer and Sandholm, 2002) proposed the paradigm of automated mechanism design (AMD) and laid the groundwork for this direction. Since then, many works (Golrezaei et al., 2017; Lahaie, 2011) have adopted learning approaches for mechanism design problems. Recently, Dütting et al. (Dütting et al., 2019) first leveraged deep neural networks for the automated design of optimal auctions. Several other works extended this study for various scenarios (Feng et al., 2018; Golowich et al., 2018; Rahme et al., 2021).

A particular stream of research has focused on mechanism design in online advertising. The GSP and VCG auctions have been widely adopted and investigated in various advertising systems (Edelman et al., 2007; Lahaie and Pennock, 2007). Building on the success of deep reinforcement learning, Tang et al. (Tang, 2017; Cai et al., 2018) proposed reinforcement mechanism design for the optimization of reserve prices in online advertising. A recently proposed learning-based ad auction mechanism, called Deep GSP (Zhang et al., 2021), leveraged deep learning techniques to optimize multiple performance metrics in e-commerce advertising.

6. Conclusion

In this paper, we have proposed a Deep Neural Auction mechanism, towards learning data-efficient and end-to-end auction mechanisms for e-commerce advertising. We have deployed the DNA mechanism on one of the leading e-commerce advertising platforms, Taobao. The offline experiments as well as the online A/B tests showed that the DNA mechanism significantly outperformed other existing auction mechanism baselines on optimizing multiple performance metrics.


This work was supported in part by Science and Technology Innovation 2030 –“New Generation Artificial Intelligence” Major Project No. 2018AAA0100905, in part by China NSF grant No. 62025204, 62072303, 61902248, and 61972254, and in part by Alibaba Group through Alibaba Innovation Research Program, and in part by Shanghai Science and Technology fund 20PJ1407900. The authors would like to thank Rui Du, Haiping Huang, Haiyang He and Guan Wang who did the really hard work for online system implementation. The opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies or the government.


  • G. Aggarwal, S. Muthukrishnan, D. Pál, and M. Pál (2009) General auction mechanism for search advertising. In WWW, pp. 241–250.
  • Y. Bachrach, S. Ceppi, I. A. Kash, P. Key, and D. Kurokawa (2014) Optimising trade-offs among stakeholders in ad auctions. In EC, pp. 75–92.
  • P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. (2018) Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.
  • A. Borisov, J. Kiseleva, I. Markov, and M. de Rijke (2018) Calibration: a simple way to improve click models. In CIKM, pp. 1503–1506.
  • Q. Cai, A. Filos-Ratsikas, P. Tang, and Y. Zhang (2018) Reinforcement mechanism design for e-commerce. In WWW, pp. 1339–1348.
  • H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. (2016) Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems, pp. 7–10.
  • D. Clevert, T. Unterthiner, and S. Hochreiter (2016) Fast and accurate deep network learning by exponential linear units (elus). In ICLR.
  • V. Conitzer and T. Sandholm (2002) Complexity of mechanism design. In UAI, pp. 103–110.
  • H. Daniels and M. Velikova (2010) Monotone and partially monotone neural networks. IEEE Transactions on Neural Networks 21 (6), pp. 906–917.
  • F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z. Kolter (2018) End-to-end differentiable physics for learning and control. NeurIPS 31, pp. 7178–7189.
  • C. Deng, H. Wang, Q. Tan, J. Xu, and K. Gai (2021) Calibrating user response predictions in online advertising. In ECML PKDD 2020, pp. 208–223.
  • P. Dütting, Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath (2019) Optimal auctions through deep learning. In ICML, pp. 1706–1715.
  • B. Edelman, M. Ostrovsky, and M. Schwarz (2007) Internet advertising and the generalized second-price auction: selling billions of dollars worth of keywords. American economic review 97 (1), pp. 242–259.
  • B. Edelman and M. Ostrovsky (2007) Strategic bidder behavior in sponsored search auctions. Decision support systems 43 (1), pp. 192–198.
  • H. Edwards and A. J. Storkey (2017) Towards a neural statistician. In ICLR.
  • Z. Feng, H. Narasimhan, and D. C. Parkes (2018) Deep learning for revenue-optimal auctions with budgets. In AAMAS, pp. 354–362.
  • Z. Feng, O. Schrijvers, and E. Sodomka (2019) Online learning for measuring incentive compatibility in ad auctions. In WWW, pp. 2729–2735.
  • A. Goldfarb and C. Tucker (2011) Online display advertising: targeting and obtrusiveness. Marketing Science 30 (3), pp. 389–404.
  • N. Golowich, H. Narasimhan, and D. C. Parkes (2018) Deep learning for multi-facility location mechanism design. In IJCAI, pp. 261–267.
  • N. Golrezaei, M. Lin, V. Mirrokni, and H. Nazerzadeh (2017) Boosted second price auctions: revenue optimization for heterogeneous bidders. Available at SSRN 3016465.
  • A. Grover, E. Wang, A. Zweig, and S. Ermon (2019) Stochastic optimization of sorting networks via continuous relaxations. In ICLR.
  • D. Ha and J. Schmidhuber (2018) World models. arXiv preprint arXiv:1803.10122.
  • E. Hüllermeier and J. Beringer (2006) Learning from ambiguously labeled examples. Intelligent Data Analysis 10 (5), pp. 419–439.
  • T. Kurutach, I. Clavera, Y. Duan, A. Tamar, and P. Abbeel (2018) Model-ensemble trust-region policy optimization. In ICLR.
  • S. Lahaie and D. M. Pennock (2007) Revenue analysis of a family of ranking rules for keyword auctions. In EC, pp. 50–56.
  • S. Lahaie (2011) A kernel-based iterative combinatorial auction. In AAAI, Vol. 25.
  • T. M. Moerland, J. Broekens, and C. M. Jonker (2020) Model-based reinforcement learning: a survey. arXiv preprint arXiv:2006.16712.
  • R. B. Myerson (1981) Optimal auction design. Mathematics of operations research 6 (1), pp. 58–73.
  • J. Rahme, S. Jelassi, and S. M. Weinberg (2021) Auction learning as a two-player game. In ICLR.
  • W. Shen, P. Tang, and S. Zuo (2019) Automated mechanism design via neural networks. In AAMAS, pp. 215–223.
  • P. Tang (2017) Reinforcement mechanism design. In IJCAI, pp. 5146–5150.
  • L. Van der Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (11).
  • H. R. Varian (2007) Position auctions. international Journal of industrial Organization 25 (6), pp. 1163–1178.
  • W. Vickrey (1961) Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance 16 (1), pp. 8–37.
  • C. A. Wilkens, R. Cavallo, and R. Niazadeh (2017) GSP: the cinderella of mechanism design. In WWW, pp. 25–32.
  • X. Yang, Y. Li, H. Wang, D. Wu, Q. Tan, J. Xu, and K. Gai (2019) Bid optimization by multivariable control in display advertising. In KDD, pp. 1966–1974.
  • M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola (2017) Deep sets. In NIPS, pp. 3391–3401.
  • Y. Zhang, J. Hare, and A. Prügel-Bennett (2019) FSPool: learning set representations with featurewise sort pooling. In ICLR.
  • Z. Zhang, X. Liu, Z. Zheng, C. Zhang, M. Xu, J. Pan, C. Yu, F. Wu, J. Xu, and K. Gai (2021) Optimizing multiple performance metrics with deep gsp auctions for e-commerce advertising. In WSDM, pp. 993–1001.
  • G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, and K. Gai (2018) Deep interest network for click-through rate prediction. In KDD, pp. 1059–1068.
  • H. Zhu, J. Jin, C. Tan, F. Pan, Y. Zeng, H. Li, and K. Gai (2017) Optimized cost per click in taobao display advertising. In KDD, pp. 2191–2200.

Appendix A Model Configurations and Offline Training Procedure

In the set encoder module (Fig. (b)), the first network consists of two fully connected layers with 128 and 32 neurons respectively, and the second network is a single fully connected layer with 16 neurons, thus the final size of the set embeddings is 16. The ELU non-linearity is applied to the output of every layer. In the context-aware rank score module (Fig. (c)), we use 5 groups of 20 linear functions in the partially monotone MIN-MAX neural network.
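The layer sizes above can be made concrete with a small pure-Python sketch. Several details are assumptions rather than facts from the text: the set embedding for an ad is built by average-pooling the encodings of the other candidate ads, the MIN-MAX network takes the minimum over groups of the maximum over linear functions, and the bid weight is kept positive through an exponential. The random initialization, feature dimension, and all function names are hypothetical.

```python
import math
import random

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def init_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Fully connected layer followed by ELU."""
    weights, biases = layer
    return [elu(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def set_embedding(features, i, first_net, second_net):
    """Context embedding for ad i from the other ads (assumed pooling scheme)."""
    encoded = []
    for j, x in enumerate(features):
        if j == i:
            continue
        h = x
        for layer in first_net:
            h = dense(h, layer)
        encoded.append(h)
    pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
    return dense(pooled, second_net)

def min_max_score(bid, context, groups):
    """Partially monotone MIN-MAX network: strictly increasing in the bid."""
    return min(
        max(math.exp(w_bid) * bid + sum(w * c for w, c in zip(w_ctx, context)) + bias
            for w_bid, w_ctx, bias in group)
        for group in groups
    )

rng = random.Random(0)
first_net = [init_layer(rng, 4, 128), init_layer(rng, 128, 32)]
second_net = init_layer(rng, 32, 16)
groups = [[(rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(16)], rng.uniform(-1, 1))
           for _ in range(20)] for _ in range(5)]

features = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.9, 0.1]]
embedding = set_embedding(features, 0, first_net, second_net)
low = min_max_score(1.0, embedding, groups)
high = min_max_score(2.0, embedding, groups)
```

Because every linear function has a positive bid weight, the maximum within a group and the minimum across groups both remain increasing in the bid, which is the monotonicity the rank score needs.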

We use the Adam optimizer with . Each batch contains 128 PV requests in total. We also apply temperature annealing schedules, such as polynomial decay and exponential decay, to adjust the temperature in the differentiable sorting engine during the training process, but we did not observe significant performance differences between these schedules. The offline training procedure of DNA is as follows:

Input:  Online log data with user behaviors, temperature annealing schedule in the differentiable sorting engine
1:  Data preprocessing: feature construction, ground-truth label generation
2:  Initialize the neural network parameters of the set encoder and the context-aware rank score function, initialize the temperature in the differentiable sorting engine
3:  while not converged do
4:     Sample a random minibatch from training data
5:     Compute the target loss and the cross-entropy loss with Eq. (13)(14), given the generated rank scores and payments with Eq. (4)(5)(6)(7)
6:     Update the network parameters using a stochastic gradient descent optimizer (i.e., Adam)
7:     Decrease the temperature by one step
8:  end while
Algorithm 1 Offline Training Procedure of DNA
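The temperature update in step 7 can follow schedules like the two named in this appendix. The sketch below shows one common form of each; the parameter names, the default power, and the example values are assumptions, not the settings used in the experiments.

```python
def exponential_decay(tau_start, rate, step):
    """Multiply the temperature by a constant rate at every step."""
    return tau_start * rate ** step

def polynomial_decay(tau_start, tau_end, step, total_steps, power=2.0):
    """Interpolate from tau_start to tau_end along a polynomial curve."""
    fraction = min(step, total_steps) / total_steps
    return (tau_start - tau_end) * (1.0 - fraction) ** power + tau_end

# Hypothetical schedule: start at 1.0 and anneal over 10 steps.
exp_taus = [exponential_decay(1.0, 0.5, s) for s in range(4)]
poly_taus = [polynomial_decay(1.0, 0.1, s, 10) for s in range(11)]
```

Both schedules start from a smooth relaxation and move toward a sharper, more sort-like permutation matrix as training proceeds.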

Appendix B Analysis of the Set Embeddings

We next provide empirical evidence suggesting the meaningfulness of set embedding learned by the set encoder. We conducted experiments on training DNA mechanism without the set encoder. The learning curves on four different tasks are plotted in Fig. 5. We find that the learning performance degrades when disabling the set encoder, indicating that the context-aware rank scores are beneficial to promote the performance. To qualitatively study the latent set embeddings from the set encoder, we randomly select some top-10 and last-10 ads from the test data set and generate their corresponding set embeddings using the trained set encoder. The t-SNE (Van der Maaten and Hinton, 2008) plots can be seen in Fig. 6, where each point represents an ad. It is interesting to find that the top-ranked ads are clustered together, and the “weak” ads are separated from the cluster. This indicates that the set embedding may carry the “competitiveness” information from the other candidate ads, assisting the subsequent rank score module to learn rank scores towards optimizing the overall performance.

Figure 5. Learning curves on DNA and DNA w/o set encoder in four training tasks.
Figure 6. Distribution of the set embedding for sampled top-10 and last-10 ads in latent space using t-SNE.

Appendix C Deployment in E-commerce Advertising System

The Deep Neural Auction mechanism is deployed under “Advertisement Intelligent Decision-mAking system” (AIDA) in Taobao display advertising system. The online inference procedure can be formulated as follows:

Input:  Online auction data (all candidate ads in a PV request), the trained DNA
1:  Data preprocessing: feature construction
2:  Generate rank scores of DNA with Eq. (4)(5)(6)(7)
3:  Obtain the top- winning ads and their corresponding payments with differentiable sorting engine by setting
Algorithm 2 Online Auction Service of DNA
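At inference time the allocation reduces to a hard sort, and a second-price style payment can be recovered for any rank score that is monotone in the bid. The sketch below is an assumption-laden illustration rather than the deployed service: it finds the minimum bid that keeps a slot by bisection, and the function names and the toy score functions (bid times a fixed pCTR) are hypothetical stand-ins for the learned rank score.

```python
def min_bid_for_score(score_fn, target, upper, tol=1e-9):
    """Smallest bid in [0, upper] whose score reaches target (score_fn is increasing)."""
    lower = 0.0
    while upper - lower > tol:
        mid = (lower + upper) / 2.0
        if score_fn(mid) >= target:
            upper = mid
        else:
            lower = mid
    return upper

def allocate_and_price(bids, score_fns, num_slots):
    """Hard-sort ads by rank score and charge the minimum bid that keeps each slot."""
    scores = [fn(b) for fn, b in zip(score_fns, bids)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    results = []
    for slot, ad in enumerate(order[:num_slots]):
        next_score = scores[order[slot + 1]] if slot + 1 < len(order) else 0.0
        results.append((ad, min_bid_for_score(score_fns[ad], next_score, bids[ad])))
    return results

pctrs = [0.5, 0.25, 0.5]
score_fns = [lambda b, c=c: b * c for c in pctrs]  # toy monotone rank scores
winners = allocate_and_price([2.0, 6.0, 1.0], score_fns, num_slots=2)
```

With the toy linear scores the result matches the classical GSP payment, i.e., the next rank score divided by the winner's own pCTR.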
Figure 7. The Pipeline of Training System.

Fig. 7 shows the overall architecture of the training system pipeline, including the online ad platform and the offline trainer. First, the online ad platform receives a page view (PV) request from a user. The relevant candidate ads are selected, together with the generated user response predictions (e.g., pCTR, pCVR). Then, the auction mechanism is run and the top winning ads are selected and displayed to the user. Once the user finishes interacting with an ad, e.g., clicks it or places an order, these behaviors are recorded as log data and sent to the real-time data processing module, where hundreds of thousands of log records are processed per second. After that, the training procedure is performed via a parameter-server (PS) framework, and model evaluation is carried out periodically to monitor the training process. When the convergence condition is satisfied, the model checkpoint is pushed to the model center module. The model checkpoint can be delivered from offline to online in several minutes. We also design a model version management tool inside the model center, endowing the training system with a rollback ability in response to a training crash or service breakdown. The whole pipeline from real-time data processing to model checkpoint transmission can be completed in less than 20 minutes.

Figure 8. Online Auction Service.

The online auction service, as illustrated in Fig. 8, consists of three main components: a feature extraction module, a model inference module, and an allocation & pricing module. Given the candidate ad set with the available information, the feature extraction module conducts feature engineering, which extracts useful information from the raw user and ad data. The trained model checkpoint is then loaded and executed to generate rank scores for all ads in the model inference module. Finally, the allocation and payments are determined using the differentiable sorting engine in DNA (with its temperature parameter set for inference). We mainly focus on response time (RT) for the online auction service. The online auction service processes tens of thousands of ad auctions per second, and the RT is 6 ms on average with dozens of 32-CPU-core servers. In the last Double Eleven shopping festival of Taobao, the online auction service handled nearly 1 million ad auctions per second during the peak, keeping all services in working order.