DLF
Deep learning for flexible market price modeling (landscape forecasting) in real-time bidding advertising. An implementation of our KDD 2019 paper.
The emergence of real-time auctions in online advertising has drawn great attention to modeling the market competition, i.e., bid landscape forecasting. The problem is formulated as forecasting the probability distribution of the market price for each ad auction. Considering the censorship issue caused by the second-price auction mechanism, many researchers have approached bid landscape forecasting by incorporating survival analysis from the medical research field. However, most existing solutions focus on either counting-based statistics over segmented sample clusters or learning a parameterized model based on heuristic assumptions about the distribution form. Moreover, they do not consider the sequential patterns of the features over the price space. To capture more sophisticated yet flexible patterns at a fine-grained level of the data, we propose a Deep Landscape Forecasting (DLF) model which combines deep learning for probability distribution forecasting and survival analysis for censorship handling. Specifically, we utilize a recurrent neural network to flexibly model the conditional winning probability w.r.t. each bid price. We then conduct bid landscape forecasting through the probability chain rule with strict mathematical derivations. In an end-to-end manner, we optimize the model by minimizing two negative likelihood losses with comprehensive motivations. Without any specific assumption about the distribution form of the bid landscape, our model shows great advantages over previous works in fitting various sophisticated market price distributions. In experiments over two large-scale real-world datasets, our model significantly outperforms the state-of-the-art solutions under various metrics.
Having emerged in 2009 (Muthukrishnan, 2009), real-time bidding (RTB) has become one of the most important media buying mechanisms in online advertising. In RTB, advertisers propose bid prices in real time according to their own bidding strategies and the auction side information (Zhang et al., 2014a); the ad exchange then decides the winner of the auction, i.e., the advertiser with the highest bid price (Yuan et al., 2013). From the advertiser’s view, the bid price is decided according to the estimated utility (Ren et al., 2016) and the cost of the given auction request (Ren et al., 2018, 2016; Wu et al., 2015). On one hand, the utility generally represents the positive user response probability, such as click-through rate (CTR) or conversion rate (CVR). On the other hand, the cost is the price which the advertiser would probably pay for the given auction.
Note that the true charge for the winner of the auction is the highest bid price from her competitors, which is defined as the market price in the second-price auction mechanism. (The terms ‘market price’ and ‘winning (bid) price’ are used interchangeably in the related literature (Amin et al., 2012; Cui et al., 2011; Wu et al., 2015); in this paper, we use ‘market price’.) Thus, from an advertiser’s perspective, predicting the market price is a crucial but challenging problem, since the highest bid from hundreds or even thousands of advertisers for a specific ad impression is highly dynamic and almost impossible to predict by modeling each advertiser’s strategy (Amin et al., 2012). Moreover, only with the market price distribution can the advertiser estimate the corresponding winning probability given an arbitrary bid price, which supports the subsequent bid decision making (Zhang et al., 2014a; Ren et al., 2018). For example, Lin et al. (2016) adopted a bidding strategy that proposes the bid price according to the estimated market price, and Ren et al. (2018) presented a method of learning the bidding strategy through optimizing advertisers’ profits, which requires the forecasted bid landscape for each bid request sample. Thus, it is more practical to model the market price as a stochastic variable (Wu et al., 2015, 2018) and predict its distribution given each ad request feature, which is named bid landscape forecasting (Wang et al., 2016; Cui et al., 2011) and illustrated in Figure 1.
Previous works on bid landscape forecasting can be divided into two streams. The first stream is mainly based on statistical counting over segmented samples, for example counting per campaign (Zhang et al., 2016b) or by some particular attribute combinations (Wang et al., 2016). Different samples in the same segment share the same market price distribution, which is too coarse-grained and often results in low prediction performance. The second stream is based on predefining a parameterized distribution form, such as the log-normal (Cui et al., 2011), Gaussian (Wu et al., 2015, 2018) or Gamma distribution (Zhu et al., 2017), and then learning the distribution parameters from the observed data. However, as is discussed in (Yuan et al., 2014), these assumptions are often too restrictive and are rejected by statistical tests, and thus lack generalization.
Yet there is another challenge of bid landscape forecasting, which is the censorship issue. Since RTB adopts the one-slot second-price auction mechanism (Yuan et al., 2013), only the winner, who submits the highest bid price, will know the market price, i.e., the charged price, while the others only know that the market price is no lower than their bids, which is called right-censored data (Wu et al., 2015; Wang et al., 2016). To handle this censorship, many works (Zhang et al., 2016b; Wang et al., 2016; Wu et al., 2015, 2018) borrow the idea of survival analysis in medical data science and take the losing logs into consideration to better model the true market price distribution. However, these methods rely on only the losing logs and do not take a comprehensive view of considering both winning logs and losing logs for censorship handling.
In this paper, we propose a deep neural network methodology named the Deep Landscape Forecasting (DLF) model, without any presumed heuristic form of the market price distribution, to better capture the sophisticated patterns of each auction in bid landscape forecasting. Specifically, we utilize a recurrent neural network to model the conditional probability of the winning event given a bid. The model then forecasts the distribution of the market price by the probability chain rule and naturally derives the winning probability distribution over arbitrary bid prices for the given auction. We not only train the model by maximizing the log-likelihood of the true market price in the winning logs, but also adopt a comprehensive loss function over both winning logs and losing logs to handle the censorship.
The novelty of our methodology is threefold.
No assumption of distribution forms: Based on the novel modeling methodology, our model manages to generate flexible forecasting results for each ad request without making any prior assumptions about the market price distribution, which will be illustrated in the experiment.
Novel Censorship loss function: We adopt a comprehensive loss function for censorship handling and make a step further upon the traditional survival analysis methodologies (Wu et al., 2015) to better model the market price distribution.
To our knowledge, this is the first work that proposes an end-to-end deep learning model without any distributional assumptions for bid landscape forecasting. To deal with the right-censored problem, we use both the observed winning data and censored losing data to derive an unbiased learning. In addition, the experimental results over two real-world datasets show the significant improvement of our model over strong baselines under various evaluation metrics.
Bid Landscape Forecasting. As is discussed in the above section, bid landscape forecasting has become an important component in RTB advertising and drawn much attention in recent works (Ren et al., 2016, 2018; Lin et al., 2016).
In the view of distribution modeling methods, there are two phases. In the early phase, researchers proposed several heuristic forms of functions to model the market price distribution. In (Zhang et al., 2014a; Ren et al., 2018, 2016), the authors provided some analytic forms of winning probability w.r.t. the bid price applied on the campaign level, which is based on the observation of the winning logs. Later in the recent researches, some well-studied distributions are applied in market price modeling. Cui et al. (2011) presented a log-normal distribution to model the market price ground truth. Wu et al. (2015) proposed a regression model based on Gaussian distribution to fit the market price. Recently, Gamma distribution for market price modeling has also been studied in the work (Zhu et al., 2017). The main drawback of these distributional methods is that these restricted empirical preassumptions may lose the effectiveness of handling various dynamic data and they even ignore the sophisticated real data divergence as we show in Figure 2.
In the view of forecasting, the goal is to predict the true market price distribution, i.e., the bid landscape, for the given auction sample. A template-based method was presented in (Cui et al., 2011) to fetch the corresponding market price distribution w.r.t. the given auction request. Wang et al. (2016) proposed a clustering-based tree model to automatically segment the data samples and built a non-parametric estimator for each leaf node to predict the market price distribution. These methods can only perform coarse-grained bid landscape forecasting based on each data segment, which misses the individual patterns of each ad request. The authors in (Wu et al., 2015) proposed a linear regression method to model the market price w.r.t. auction features. Nevertheless, those methods pay little attention to the non-linear patterns in the real data, such as the co-occurrence and correlations between features (Qu et al., 2016) and the similarity or distinction among data segments, which may result in poor forecasting performance on different ad campaigns. Recently, Wu et al. (2018) proposed a method for winning price prediction with deep models. However, they rely on several assumptions about the form of the market price distribution, which is not flexible in practice. Moreover, as is stated in their paper, the goal of their method is to directly predict the winning price rather than to forecast the bid landscape; thus their method does not outperform the tree-based model (Wang et al., 2016) on the log-likelihood metric, which is one of the main evaluation metrics.
In this paper, we focus on fine-grained bid landscape forecasting at the impression level, and utilize a deep recurrent neural network to flexibly model the non-linear and sequential patterns in the market price distribution without any prior assumption about the distribution form.
Learning over Censored Data. Data censorship is another challenge for bid landscape forecasting. In the online advertising field, many models based on survival analysis have been studied so far. Wu et al. (2015, 2018) proposed a censored regression model using the lost auction data to alleviate the data bias problem. Nevertheless, the Gaussian distribution or other distributional assumptions (Zhu et al., 2017) turn out to be too restrictive and lack flexibility for modeling sophisticated yet practical distributions. Another problem is that these regression models (Wu et al., 2015, 2018; Zhu et al., 2017) can only provide a point estimation, i.e., the expectation of the market price without standard deviation, which fails to provide winning probability estimation given an arbitrary bid price to support the subsequent bidding decision (Ren et al., 2018). Amin et al. (2012) implemented the Kaplan-Meier estimator (Kaplan and Meier, 1958) for handling the data censorship in sponsored search. The Kaplan-Meier estimator is a classic method in survival analysis which deals with right-censored data in medical research (Gordon and Olshen, 1985; Ranganath et al., 2016). The authors of (Wang et al., 2016; Zhang et al., 2016b) also utilized this non-parametric estimator to predict the winning probability. However, the Kaplan-Meier estimator merely performs statistical counting on the segmented data samples, and thus fails to provide a fine-grained estimation, i.e., a prediction at the level of a single ad auction.
Another school of survival analysis methods is the Cox proportional hazard model (Cox, 1992). This method commonly assumes that the instant hazard rate of event occurrence (i.e., auction winning in our case) is based on a base distribution multiplied by an exponential tuning factor. Recent works including (Katzman et al., 2016; Luck et al., 2017; Park and Hastie, 2007) all used the Cox model with a predefined base function to model the hazard rate of each sample, such as the Weibull distribution, log-normal distribution or log-logistic distribution (Lee and Wang, 2003). However, the problem is that the strong assumption about the data distribution may result in poor generalization on real-world data.
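To make the counting nature of the Kaplan-Meier estimator concrete, the following is a minimal sketch over right-censored bid logs, not the implementation used by any of the cited works. It assumes integer prices, represents each log as a hypothetical `(bid, market_price)` pair with `market_price = None` for a lost auction, and treats a lost auction with bid b as informative only for prices below b. The function name and the toy values are illustrative assumptions.

```python
def kaplan_meier_win_prob(logs, max_price):
    """Estimate W[b] = Pr(z < b) for integer bids b = 0..max_price.

    logs: list of (bid, market_price) pairs; market_price is None for a lost
    (right-censored) auction, where we only know market_price >= bid.
    """
    survive = 1.0     # running estimate of Pr(z > t)
    win_prob = [0.0]  # W[0] = Pr(z < 0) = 0
    for t in range(max_price):
        # d_t: won auctions whose observed market price is exactly t
        d_t = sum(1 for b, z in logs if z is not None and z == t)
        # n_t: auctions known to have market price >= t whose outcome at
        # price t is observed (wins with z >= t, losses with bid > t)
        n_t = sum(1 for b, z in logs
                  if (z is not None and z >= t) or (z is None and b > t))
        if n_t > 0:
            survive *= 1.0 - d_t / n_t
        win_prob.append(1.0 - survive)  # W[t + 1] = Pr(z <= t)
    return win_prob


# Toy logs (hypothetical values): three wins and two losses.
toy_logs = [(5, 3), (5, 2), (2, None), (4, None), (6, 5)]
km_curve = kaplan_meier_win_prob(toy_logs, max_price=6)
```

Every auction in the same segment receives the same curve `km_curve`, which is the coarse-grained behavior criticized above.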
Considering all the limitations above, we propose our deep landscape forecasting method which models the conditional winning probability given the sequential price patterns through recurrent neural network, and maximizes the partial likelihood over the historic winning and losing data. Note that our model takes end-to-end learning in a unified learning objective without making any distributional assumptions, which is capable of fitting various bid landscape data and providing fine-grained prediction for each ad impression.
Recurrent Neural Network. Due to its adequate model capacity and the support of big data, deep learning, a.k.a. deep neural networks, has drawn great attention. Among deep models, the recurrent neural network (RNN), whose idea was first developed two decades ago, shows satisfying performance for sequential pattern mining. Its variants like long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) employ memory structures to better capture dynamic sequential patterns. In this paper we borrow the idea of RNN for modeling the conditional winning probability. In the experiments, our model shows superior performance against state-of-the-art baselines including survival analysis methods (Katzman et al., 2016; Lee et al., 2018) and recent bid landscape forecasting models (Wang et al., 2016; Wu et al., 2018).
In this section, we first present the preliminaries used in our discussion in Section 3.1 and formulate the problem in Section 3.2. Then in Section 3.3 we discuss our bid landscape forecasting model in detail. We conduct a deep analysis of the model in Section 3.4.
In RTB scenario, the advertiser is asked to propose a bid price after receiving a bid request for auction. The bid request contains three parts of the auction information including user (e.g., location, browser label, etc.), publisher (e.g., web URL and ad slot size, etc.) and the ad content (e.g., product type, time and creative content). The goal of the advertiser is to propose an appropriate bid price and win the auction in a cost-effective manner (Ren et al., 2018; Yuan et al., 2013).
One of the challenges is that it is infeasible to model the bidding strategy of each competing bidder, since the participating advertisers do not interact with each other (Wang et al., 2016). It is natural to model the market as a whole and regard the market price as a variable (Wang et al., 2016; Lin et al., 2016; Zhang et al., 2014a). Recall that the market price is the second highest bid price among all the bidders in the second-price auction, i.e., the highest bid price from the competitors in the view of the auction winner. We denote the market price as $z$ and its probability density function (P.D.F.) as $p(z)$, $z > 0$.
Now that we have the P.D.F. of the market price $z$, we can derive the winning probability of proposing the bid price $b$ as
$$W(b) = \Pr(z < b) = \int_0^b p(z)\,dz, \tag{1}$$
which is the probability that our bid price is larger than the market price, i.e., winning the auction. Then the straightforward definition of the “losing” function is
$$S(b) = \Pr(z \ge b) = 1 - W(b) = \int_b^{\infty} p(z)\,dz, \tag{2}$$
which represents the losing probability of proposing the bid price $b$. Note that in survival analysis (Katzman et al., 2016; Li et al., 2016), the market price is regarded as the patient’s underlying survival period and the bid price is the investigation period; thus winning and losing the ad auction correspond to the “death” and “survival” status of one patient (Zhang et al., 2016b).
The data of the bidding logs are represented as a set of $N$ triples $D = \{(x^i, b^i, z^i)\}_{i=1}^{N}$, where $x^i$ is the feature of the bid request and $b^i$ is the proposed bid price in that auction. Here $z^i$ is the observed market price if the advertiser won this auction and thus already knows the true market price, but it is unknown (and marked as null) for the losing auctions.
The main problem of bid landscape forecasting is to estimate the probability distribution $p(z \mid x)$ of the market price $z$ with regard to the bid request feature $x$. Formally speaking, the derived model is a “mapping” function $T$ which learns the patterns within the data and predicts the market price distribution of each auction as
$$p(z \mid x) = T(x). \tag{3}$$
The general goal has been illustrated in Figure 1.
In this part, we formulate our deep learning method with censorship handling for bid landscape forecasting, which we call the Deep Landscape Forecasting (DLF) model.
First we transform the modeling from continuous space to discrete space. Since all the prices in real-time bidding advertising are discrete, it is natural to propose a discrete price model and derive the probability functions in the discrete price schema.
In the discrete context, a set of $L$ prices $0 < b_1 < b_2 < \dots < b_L$ is obtained, which arises from the finite precision of price determinations. Analogously we may also consider the grouping of continuous prices into $L$ uniformly divided disjoint intervals $V_l = (b_{l-1}, b_l]$, $l = 1, 2, \dots, L$, where $b_0 = 0$ and $b_l$ is the last observation interval boundary for the given sample, i.e., the proposed bid price in the auction. $V_L$ is the largest price interval in the whole price space. This setting suits our task and has been widely used in medical research (Li et al., 2016) and in the bid landscape forecasting field (Zhang et al., 2016b; Wang et al., 2016), where the price is always an integer (Zhang et al., 2014b); thus we set $b_l - b_{l-1} = 1$.
As such, our winning function and losing function over the discrete price space are
$$W(b_l) = \Pr(z < b_l) = \sum_{j < l} \Pr(z \in V_j), \qquad S(b_l) = \Pr(z \ge b_l) = \sum_{j \ge l} \Pr(z \in V_j), \tag{4}$$
where the input to the two functions is the bid price $b_l$ from the advertiser. The discrete market price probability function at the $l$-th price interval is
$$p_l = \Pr(z \in V_l) = W(b_{l+1}) - W(b_l) = [1 - S(b_{l+1})] - [1 - S(b_l)] = S(b_l) - S(b_{l+1}). \tag{5}$$
We define the conditional winning probability given the price $b_l$ as
$$h_l = \Pr(z \in V_l \mid z > b_{l-1}) = \frac{\Pr(z \in V_l)}{\Pr(z > b_{l-1})} = \frac{p_l}{S(b_l)}, \tag{6}$$
which means the probability that the market price $z$ lies in the interval $V_l$ given the condition that $z$ is larger than the bid prices which are smaller than $b_l$; on the unit-spaced integer price grid, $\Pr(z > b_{l-1}) = \Pr(z \ge b_l) = S(b_l)$. The meaning of $h_l$ is the conditional probability of just winning the auction by proposing the bid price at the $l$-th price interval.
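The relations among the discrete winning function, the losing function and the conditional winning probability can be checked numerically. The sketch below is illustrative only: it assumes integer prices with one price level per interval, and the function name `discrete_landscape` and the four-level toy distribution are hypothetical. Position `l-1` of each list stores the quantity for the `l`-th price level.

```python
def discrete_landscape(p):
    """Given p[l-1] = Pr(z in V_l) for l = 1..L, return (W, S, h) as lists
    with W[l-1] = W(b_l), S[l-1] = S(b_l) and h[l-1] = h_l."""
    W, S, h = [], [], []
    below = 0.0  # probability mass of the intervals before V_l
    for p_l in p:
        W.append(below)        # winning probability: mass strictly below b_l
        S.append(1.0 - below)  # losing probability: mass at or above b_l
        # conditional winning probability: mass of V_l over the remaining mass
        h.append(p_l / S[-1] if S[-1] > 0 else 0.0)
        below += p_l
    return W, S, h


toy_pmf = [0.1, 0.2, 0.3, 0.4]  # hypothetical 4-level price distribution
W_toy, S_toy, h_toy = discrete_landscape(toy_pmf)
```

In this toy case the last conditional probability is 1 (up to rounding), because all remaining mass sits in the final interval.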
So far, we have presented the discrete price model and discussed the winning and losing probabilities over the discrete price space. We now propose our DLF model based on a recurrent neural network $f_\theta$ with parameter $\theta$, which captures the sequential patterns for the conditional probability $h_l^i$ at every price interval $V_l$ for the $i$-th sample.
The detailed structure of the DLF network is illustrated in Figure 3. At each price interval $V_l$, the $l$-th RNN cell predicts the conditional winning probability $h_l^i$ given the bid request feature $x^i$, conditioned upon the previous events, as
$$h_l^i = \Pr(z \in V_l \mid z > b_{l-1}, x^i; \theta) = f_\theta(x^i, b_l \mid r_{l-1}), \tag{7}$$
where $f_\theta$ is the RNN function taking $(x^i, b_l)$ as input and $h_l^i$ as output, and $r_{l-1}$ is the hidden vector calculated from the last RNN cell. In our paper we implement the RNN function as a standard LSTM unit (Hochreiter and Schmidhuber, 1997), which has been widely used in sequence data modeling. We describe the implementation details of $f_\theta$ in the appendix.
From Eqs. (4), (6) and (7), we can easily derive the losing probability function $S(b)$ and the winning probability function $W(b)$ with the bidding price $b$ for the $i$-th individual sample as
$$S(b \mid x^i; \theta) = \Pr(z \ge b \mid x^i; \theta) = \prod_{l:\, l < l_b} (1 - h_l^i), \qquad W(b \mid x^i; \theta) = 1 - S(b \mid x^i; \theta) = 1 - \prod_{l:\, l < l_b} (1 - h_l^i), \tag{8}$$
where $l_b$ is the price interval index for $b$. Here we use the probability chain rule to calculate the joint losing probability at the given bid price $b$ through multiplying the conditional losing probabilities $1 - h_l^i$, i.e., the complements of the conditional winning probabilities.
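The recurrence and the chain rule described above can be sketched as follows. This is not the LSTM-based network of the paper: `toy_recurrent_hazards` is a hypothetical stand-in that uses a vanilla tanh recurrent cell with random, untrained weights, only to produce one conditional winning probability per price level. `chain_rule` then turns those probabilities into losing probabilities, winning probabilities and a discrete price distribution, assuming integer prices with one level per interval.

```python
import math
import random


def toy_recurrent_hazards(x, L, hidden_size=4, seed=0):
    """Hypothetical stand-in for the recurrent function: an untrained vanilla
    tanh cell that emits one conditional winning probability per price level."""
    rng = random.Random(seed)
    n_in = len(x) + 1  # request features plus the (scaled) price level
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    r = [0.0] * hidden_size  # hidden vector passed between cells
    hazards = []
    for l in range(1, L + 1):
        inp = list(x) + [l / L]
        r = [math.tanh(sum(a * v for a, v in zip(w_in[k], inp)) +
                       sum(a * v for a, v in zip(w_rec[k], r)))
             for k in range(hidden_size)]
        logit = sum(a * v for a, v in zip(w_out, r))
        hazards.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid output
    return hazards


def chain_rule(hazards):
    """Return (S, W, p), each indexed by l-1 for price level l: the losing
    probability is the product of (1 - h_j) over the earlier levels j < l."""
    S, W, p = [], [], []
    surv = 1.0
    for h in hazards:
        S.append(surv)
        W.append(1.0 - surv)
        p.append(h * surv)  # mass of level l: h_l times the losing probability
        surv *= 1.0 - h
    return S, W, p


haz = toy_recurrent_hazards([0.3, -1.2, 0.7], L=5)  # hypothetical features
S_c, W_c, p_c = chain_rule(haz)
```

Because the weights are random, the resulting landscape carries no meaning; the point is only that any sequence of per-level probabilities in (0, 1) yields a valid, assumption-free distribution over prices.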
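Since there is no ground truth of either the market price distribution or the winning probability, we maximize the log-likelihood over the empirical data distribution to learn our deep model. We consider the loss function from two aspects.
The first loss is based on the P.D.F. and it aims to minimize the negative log-likelihood of the market price over the winning logs $D_{\text{win}}$ as
$$L_1 = -\log \prod_{(x^i, z^i) \in D_{\text{win}}} \Pr(z^i \in V_{l_z} \mid x^i; \theta) = -\log \prod_{(x^i, z^i) \in D_{\text{win}}} h_{l_z}^i \prod_{l:\, l < l_z} (1 - h_l^i) = -\sum_{(x^i, z^i) \in D_{\text{win}}} \Big[ \log h_{l_z}^i + \sum_{l:\, l < l_z} \log(1 - h_l^i) \Big], \tag{10}$$
where $l_z$ is the interval index of the true market price $z^i$ given the feature vector $x^i$, and the second equality uses $p_l = h_l\, S(b_l)$ from Eq. (6).
The second loss is based on the C.D.F., i.e., the winning probability distribution. Recall that there are winning cases and losing cases in the dataset. As is shown in Figure 4, the left subfigure is the winning case where $z^i$ is known and $z^i < b^i$; the right subfigure is the losing case where $z^i$ is unknown (censored) and we only know that $z^i \ge b^i$. Thus, there are two motivations for the second loss.
For the winning cases as in the left part of Figure 4, we need to “push down” the winning probability over the price range $[0, z^i]$, while “pulling up” the winning probability over the price range $(z^i, +\infty)$, especially for the winning probability in $(z^i, b^i]$. Thus, on one hand, we adopt the loss over the winning cases
$$L_{\text{win}} = -\log \prod_{(x^i, b^i) \in D_{\text{win}}} \Pr(z^i < b^i \mid x^i; \theta) = -\log \prod_{(x^i, b^i) \in D_{\text{win}}} W(b^i \mid x^i; \theta) = -\sum_{(x^i, b^i) \in D_{\text{win}}} \log \Big[ 1 - \prod_{l:\, l < l_b} (1 - h_l^i) \Big]. \tag{11}$$
As for the losing cases in the right part of Figure 4, we just need to “push down” the winning probability, since we have no idea about the true market price and only know that $z^i \ge b^i$. Thus, on the other hand, we adopt the loss over the losing auctions $D_{\text{lose}}$
$$L_{\text{lose}} = -\log \prod_{(x^i, b^i) \in D_{\text{lose}}} \Pr(z^i \ge b^i \mid x^i; \theta) = -\log \prod_{(x^i, b^i) \in D_{\text{lose}}} S(b^i \mid x^i; \theta) = -\sum_{(x^i, b^i) \in D_{\text{lose}}} \sum_{l:\, l < l_b} \log(1 - h_l^i). \tag{12}$$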
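As an illustration of the three loss terms, the sketch below evaluates them for a single sample from a given list of conditional winning probabilities. It is a plain-Python illustration rather than the training code of the paper; it assumes integer prices so that the interval index of a price equals the price itself, and the names `log_survival`, `sample_losses` and the toy values are hypothetical.

```python
import math


def log_survival(hazards, index):
    """Log losing probability at price level `index`: the sum of
    log(1 - h_l) over the earlier levels l < index (h_l = hazards[l-1])."""
    return sum(math.log(1.0 - h) for h in hazards[:index - 1])


def sample_losses(hazards, bid, market_price=None):
    """Per-sample loss terms on an integer price grid.

    A won auction (market_price known, market_price < bid) contributes to the
    P.D.F. loss and the winning loss; a lost auction (market_price is None)
    contributes to the losing loss only.
    """
    if market_price is not None:
        l_z = market_price
        loss_1 = -(math.log(hazards[l_z - 1]) + log_survival(hazards, l_z))
        loss_win = -math.log(1.0 - math.exp(log_survival(hazards, bid)))
        return {"L1": loss_1, "Lwin": loss_win}
    return {"Llose": -log_survival(hazards, bid)}


toy_hazards = [0.1, 0.2, 0.5, 0.5]  # hypothetical h_1..h_4
won = sample_losses(toy_hazards, bid=4, market_price=2)
lost = sample_losses(toy_hazards, bid=3)
```

For the won sample, the P.D.F. term rewards mass at the observed price 2 while the winning term rewards a high winning probability at the bid 4; for the lost sample, only the probability of the market price reaching at least the bid 3 is rewarded.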
In this section, we unveil some intrinsic properties of our deep model and analyze the model efficiency.
Properties of Loss Function. First of all, we take the view of winning prediction of our methodology. There is a winning status, i.e., an indicator of winning the auction, for each sample:
$$w^i = \begin{cases} 1, & \text{if } b^i > z^i, \\ 0, & \text{otherwise, i.e., } b^i \le z^i. \end{cases} \tag{13}$$
For the winning logs, each sample is uncensored (i.e., $z^i$ is known) and $w^i = 1$. For the losing logs, the true market price is unknown and the advertiser only knows that $z^i \ge b^i$, so that $w^i = 0$.
Therefore, taking Eqs. (11) and (12) together, we find that the combination of $L_{\text{win}}$ and $L_{\text{lose}}$ describes the classification of winning the auction as
$$L_2 = L_{\text{win}} + L_{\text{lose}} = -\sum_{(x^i, b^i) \in D} \Big\{ w^i \log W(b^i \mid x^i; \theta) + (1 - w^i) \log\big[1 - W(b^i \mid x^i; \theta)\big] \Big\}, \tag{14}$$
which is the cross entropy loss for predicting the winning probability when bidding $b^i$ given $x^i$ over all the data $D = D_{\text{win}} \cup D_{\text{lose}}$.
Combining all the objective functions, our goal is to minimize the negative log-likelihood over all the data samples, including both winning logs and losing logs, as
$$\arg\min_{\theta} \; \alpha L_1 + (1 - \alpha) L_2, \tag{15}$$
where the hyperparameter $\alpha$ keeps the gradients from the two losses at the same order of magnitude to stabilize the model training.
In the traditional survival analysis methods (Cox, 1992; Katzman et al., 2018) and the related works for bid landscape forecasting (Wu et al., 2015; Zhu et al., 2017), usually only $L_1$ based on the P.D.F. and $L_{\text{lose}}$ for censorship handling are adopted. We propose a comprehensive loss function which learns from both winning logs and losing logs. From the discussion above, $L_{\text{win}}$ and $L_{\text{lose}}$ collaboratively learn the data distribution from the C.D.F. view.
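The combined objective can be sketched for a small batch as follows. The code assumes integer prices (interval index equal to price), represents each sample as a hypothetical `(hazards, bid, market_price)` tuple with `market_price = None` for a lost auction, and uses an arbitrary weight `alpha = 0.25` that is not taken from the paper. The second term is written per sample as the log of the winning probability for wins and the log of the losing probability for losses, which is exactly the cross entropy with the winning indicator as label.

```python
import math


def win_prob(hazards, bid):
    """Winning probability at `bid`: one minus the product of (1 - h_l)
    over the price levels l < bid."""
    surv = 1.0
    for h in hazards[:bid - 1]:
        surv *= 1.0 - h
    return 1.0 - surv


def objective(samples, alpha):
    """Return (alpha * L1 + (1 - alpha) * L2, L1, L2) over a list of samples."""
    l1 = l2 = 0.0
    for hazards, bid, z in samples:
        w_b = win_prob(hazards, bid)
        if z is not None:  # winning log: market price z is observed
            p_z = hazards[z - 1] * (1.0 - win_prob(hazards, z))
            l1 -= math.log(p_z)
            l2 -= math.log(w_b)        # indicator w = 1
        else:              # losing log: censored market price
            l2 -= math.log(1.0 - w_b)  # indicator w = 0
    return alpha * l1 + (1.0 - alpha) * l2, l1, l2


batch = [([0.1, 0.2, 0.5, 0.5], 4, 2),     # hypothetical won auction
         ([0.1, 0.2, 0.5, 0.5], 3, None)]  # hypothetical lost auction
total, l1_val, l2_val = objective(batch, alpha=0.25)
```

In a real training loop the per-level probabilities would come from the recurrent network and the objective would be minimized by gradient descent; here they are fixed numbers so that the arithmetic can be followed by hand.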
Model Efficiency. Here we analyze the computational complexity of our DLF model. As is shown in Eq. (7), each recurrent unit $f_\theta$ takes $(x, b_l, r_{l-1})$ as input and outputs the probability scalar $h_l$ and the hidden vector $r_l$ to the next unit. Recall that the maximal price interval is $L$, so the recurrent units run at most $L$ times. We assume the average-case time cost of a recurrent unit is $O(C)$, which depends on the implementation of the unit (Zhang et al., 2016a), e.g., recurrent depth and recurrent skip coefficients, yet can be parallelized on a GPU. The subsequent calculation multiplies $h_l$ or $(1 - h_l)$ to obtain $p_l$ and $W(b)$, as in Figure 3, whose complexity is $O(L)$. Thus the overall time complexity of the DLF model is $O(CL) + O(L) = O(CL)$, which is the same as that of the original recurrent neural network model.
In many literatures, recurrent neural networks have been deployed in recommender system (Wu et al., 2016a), online advertising platform (Zhou et al., 2019) and machine translation system (Wu et al., 2016b), each of which shows promising time efficiency in large scale online systems and, to some extent, guarantees online inference efficiency for our DLF model. We may also optimize the implementation of the RNN unit through other techniques, such as Quasi-RNN (Bradbury et al., 2017) and sliced-RNN (Yu and Liu, 2018). Moreover, the landscape forecasting module could be parallelly executed with the utility estimation in RTB scenario, e.g., click-through rate prediction model, and jointly feed the results for final bid decision making. In our experiments, under the recommended settings, we evaluate our model and it achieved 22 milliseconds for averaged inference time given one sample, satisfying the 100 milliseconds requirement of bid decision in the RTB scenario (Wang et al., 2017).
In this section, we present the experimental setup and the corresponding results under various evaluation metrics with significance tests. Furthermore, we look deeper into our model and analyze some insights from the experimental results. Moreover, we have published the implementation code for reproducible experiments (reproducible code: https://github.com/rk2900/DLF).
We use two real-world RTB datasets in our experiment. iPinYou RTB dataset, which has been published in (Liao et al., 2014), contains 64.7M bidding records, 19.5M impressions, 14.79K clicks and 16.0K CNY expense on 9 campaigns from different advertisers during 10 days in 2013. Each bidding log has 16 attributes, including weekday, hour, user agent, region, ad slot ID, etc. The auctions during the last 3 days are set as test data while the rest as training data. The other bidding dataset is YOYI dataset which was published in (Ren et al., 2016). It includes 402M impressions, 500K clicks and 428K CNY expense during 8 days in Jan. 2016. More details of the two datasets have been provided in (Liao et al., 2014) and (Ren et al., 2016) respectively.
Data Preparation. To simulate the real bidding market in an online fashion and show the advantages of our deep survival model, we take the original impression log data as full-volume auction data and perform a truthful bidding strategy (Lee et al., 2012) to simulate the bidding process, which produces the winning bid dataset $D_{\text{win}}$ and the losing bid dataset $D_{\text{lose}}$ respectively. For each data sample $(x, b, z) \in D_{\text{win}}$, the real market price $z$ is known to the advertiser, while for each $(x, b, z) \in D_{\text{lose}}$ the corresponding market price $z$ is hidden. This guarantees a situation similar to that faced by all the advertisers in the real-world marketplace. This simulation and data processing method has been widely used in the bid landscape forecasting literature (Wu et al., 2015; Zhu et al., 2017; Zhang et al., 2016b; Wang et al., 2016).
After data preparation, we compute some statistics over the resulting datasets. As is illustrated in Table 1, the averaged market price in $D_{\text{win}}$ is much lower than that of $D_{\text{lose}}$, which is reasonable because of the second-price auction mechanism and also reflects the bias of a model trained without the losing (censored) logs.
In these datasets, since all the prices are integer values, we bucketize the discrete price intervals as $V_l = (b_{l-1}, b_l]$ with $b_l - b_{l-1} = 1$, and the maximal price interval number $L$ is equal to the largest integer price in the dataset.
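The effect of this split on the averaged market price can be reproduced with a small simulation. The sketch below is not the data pipeline of the paper: instead of a truthful bidding strategy over real logs, it draws both market prices and bids uniformly at random (an assumption made purely for illustration), hides the market price of every lost auction, and compares the averages of the two resulting sets. All names and the sample size are hypothetical.

```python
import random


def simulate_censorship(market_prices, bids):
    """Split full-volume auctions into winning logs (market price observed)
    and losing logs (market price hidden). The hidden prices are returned
    separately, only to compute offline statistics."""
    d_win, d_lose, hidden = [], [], []
    for z, b in zip(market_prices, bids):
        if b > z:                     # strictly higher bid wins
            d_win.append((b, z))
        else:                         # lost: only z >= b is known
            d_lose.append((b, None))
            hidden.append(z)
    return d_win, d_lose, hidden


rng = random.Random(0)
true_prices = [rng.randint(1, 300) for _ in range(10000)]
sim_bids = [rng.randint(1, 300) for _ in range(10000)]
d_win, d_lose, hidden_prices = simulate_censorship(true_prices, sim_bids)

amp_win = sum(z for _, z in d_win) / len(d_win)        # averaged over wins
amp_lose = sum(hidden_prices) / len(hidden_prices)     # averaged over losses
```

Even with independent random bids, `amp_win` comes out lower than `amp_lose`, which mirrors the gap between the winning and losing columns of Table 1 and shows why a model fitted on winning logs alone underestimates the market price.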
Campaign | Total # | Winning # | WR | AMP | AMP ($D_{\text{win}}$) | AMP ($D_{\text{lose}}$) |
1458 | 3,697,694 | 1,116,644 | 0.3020 | 69.6696 | 27.4265 | 87.9452 |
2259 | 1,252,753 | 396,283 | 0.3163 | 96.7888 | 27.1986 | 128.9877 |
2261 | 1,031,479 | 321,931 | 0.3121 | 87.6479 | 18.9000 | 118.8396 |
2821 | 1,984,525 | 228,833 | 0.1153 | 93.8962 | 13.2118 | 104.4125 |
2997 | 468,500 | 70,747 | 0.1510 | 60.4188 | 7.2762 | 69.8711 |
3358 | 2,043,032 | 315,010 | 0.1542 | 95.4967 | 21.2540 | 109.0308 |
3386 | 3,393,223 | 819,447 | 0.2415 | 78.0327 | 23.8983 | 95.2682 |
3427 | 3,130,560 | 654,989 | 0.2092 | 81.9650 | 25.2118 | 96.9808 |
3476 | 2,494,208 | 723,847 | 0.2902 | 80.0719 | 31.2218 | 100.0453 |
Overall | 19,495,974 | 4,647,731 | 0.2384 | 82.0744 | 25.0484 | 99.9244 |
YOYI | 401,617,064 | 202,214,191 | 0.5035 | 55.7444 | 24.4488 | 87.4842 |
Evaluation Phase. In this phase, the corresponding market price distribution with the true market price of each sample in the test data is estimated by all of the compared models respectively. The corresponding winning function and losing function can be easily obtained through the forecasted market price distribution as that in Eqs. (1), (2) and (4). We assess the performance of different settings in several measurements, as listed in the next subsection.
In our experiments, we evaluate all the models under two metrics and conduct the significance test between our model and the other baselines. Note that, there are two goals for bid landscape forecasting, i.e., forecasting of market price distribution (P.D.F.) and the corresponding winning probability (C.D.F.) estimation given arbitrary bid prices.
First we use average negative log probability (ANLP), as in (Wang et al., 2016), to evaluate the performance of forecasting the market price distribution. Specifically, ANLP assesses the likelihood of the co-occurrence of the test bid requests with the corresponding market prices, which is calculated as
$$\bar{P} = -\frac{1}{|D_{\text{test}}|} \sum_{(x^i, z^i) \in D_{\text{test}}} \log p_z(z^i \mid x^i), \tag{16}$$
where $p_z(z \mid x)$ is the learned bid landscape forecasting function of each model.
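A minimal sketch of the ANLP computation is given below, assuming each forecast is a list of probabilities over integer prices 1..L and that the true market price of every test sample is available. The function name `anlp` and the small floor `eps`, which only guards against taking the logarithm of zero, are illustrative assumptions.

```python
import math


def anlp(forecasts, market_prices, eps=1e-20):
    """Average negative log probability of the true market prices.

    forecasts[i][z - 1] is the forecasted probability that the market price
    of test sample i equals the integer price z.
    """
    total = 0.0
    for pmf, z in zip(forecasts, market_prices):
        total -= math.log(max(pmf[z - 1], eps))
    return total / len(market_prices)


# A uniform forecast over 300 price levels, scored on three test samples.
uniform = [1.0 / 300] * 300
uniform_anlp = anlp([uniform] * 3, [12, 87, 250])
```

A uniform forecast over 300 price levels scores ln 300 ≈ 5.70 regardless of the true prices, which gives a sense of scale for the ANLP values reported in the result tables.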
The second evaluation metric is the concordance index (C-index), which is the most common evaluation used in survival analysis (Harrell et al., 1984; Li et al., 2016; Luck et al., 2017) and measures how well a model predicts the ordering of samples according to their market prices. That is, given the bid price $b$, two auction samples $d_1$ with large market price and $d_2$ with small market price should be ordered as $d_1 \prec d_2$, where $d_1$ is placed before $d_2$. This evaluation is the same as the area under the ROC curve (AUC) metric in classification tasks (Qu et al., 2016; Ren et al., 2016) when there is only one event of interest (i.e., winning in our task) (Li et al., 2016). From the classification view of auction winning probability estimation by proposing $b$, C-index assesses the ordering performance among all the winning and losing pairs at $b$ in the test data, and thus illustrates the performance of winning probability estimation.
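Under the classification view described above, the C-index at a fixed bid price reduces to a pairwise comparison between won and lost auctions. The sketch below is a quadratic-time illustration, not an optimized or official implementation: it counts, over all (won, lost) pairs, how often the won auction received the higher predicted winning probability, with ties counted as one half. The function name and the toy scores are hypothetical.

```python
def c_index(win_probs, won):
    """Fraction of (won, lost) pairs ranked correctly by predicted winning
    probability; ties count as half. Equivalent to AUC for a single event."""
    concordant = 0.0
    pairs = 0
    for p_i, w_i in zip(win_probs, won):
        if not w_i:
            continue
        for p_j, w_j in zip(win_probs, won):
            if w_j:
                continue
            pairs += 1
            if p_i > p_j:
                concordant += 1.0
            elif p_i == p_j:
                concordant += 0.5
    return concordant / pairs if pairs else float("nan")


# Hypothetical predicted winning probabilities and actual outcomes.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_outcomes = [True, False, True, False, False]
toy_c = c_index(toy_scores, toy_outcomes)
```

In the toy case, the won auction scored 0.9 beats all three lost auctions while the one scored 0.3 beats only one, so four of the six pairs are concordant.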
Finally, we conduct significance tests to verify the statistical significance of the performance improvement of our model w.r.t. the baseline models. Specifically, we deploy a Mann-Whitney U test (Mason and Graham, 2002) under the C-index metric, and a t-test (Bhattacharya and Habtzghi, 2002) under the ANLP metric.
We compare our proposed DLF model with nine baseline models including the traditional Cox proportional hazard function model, the survival tree model, a multi-task learning method and other deep learning models.
DeepSurv is a Cox proportional hazard model with deep neural network (Katzman et al., 2018)
for feature extraction upon the sample covariates. The loss function is the negative partial likelihood of the winning and losing outcomes.
Gamma is the gamma distribution based regression model (Zhu et al., 2017). The winning price of each bid request is modeled by a unique gamma distribution with respect to its features.
MM is the mixture regression model. This model uses both linear regression and censored regression, and combines two models and predicts as a mixture manner (Wu et al., 2015).
MTLSA is the recently proposed multi-task learning model (Li et al., 2016). It transforms the original survival analysis problem into a series of binary classification problems, and uses a multi-task learning method. The original model predicts the death rate of a patient, we change it to predict the wining rate of bidding in an auction.
STM is the survival tree model. This model combines Kaplan-Meier estimator and decision trees with bi-clustering to predict bid landscape. This model is proposed in (Wang et al., 2016) and, to our knowledge, achieved state-of-the-art performance in bid landscape forecasting.
DeepHit is a deep neural network model (Lee et al., 2018) which predicts the probability of each bidding price from the minimum price to the maximum price.
DWPP (Wu et al., 2018) is a deep winning price prediction method that uses a neural network to directly predict the market price, under the assumption that the market price follows a Gaussian distribution.
RNN is based on our DLF model. However, it only optimizes over the winning logs without considering the censored information, whose loss function is only . This model is used to illustrate the power of partial likelihood loss over the censored losing data.
DLF is our deep landscape forecasting model which has been described in Section 3.
The details of the experimental configurations, such as hardware and training procedure, have been included in the appendix.
In this part, we present the detailed performance of all the compared models over the evaluation metrics.
ANLP (lower is better)
Dataset | KM | Lasso-Cox | DeepSurv | Gamma | MM | MTLSA | STM | DeepHit | DWPP | RNN | DLF |
iPinYou 1458 | 10.532 | 38.608 | 38.652 | 5.956 | 5.788 | 9.791 | 4.761 | 5.510 | 29.204 | 9.506 | 4.088* |
iPinYou 2259 | 14.671 | 28.234 | 29.658 | 6.069 | 7.328 | 10.248 | 5.471 | 5.586 | 39.263 | 9.625 | 5.244* |
iPinYou 2261 | 14.665 | 39.129 | 39.390 | 5.986 | 7.020 | 10.261 | 4.818 | 5.442 | 32.805 | 9.417 | 4.632* |
iPinYou 2821 | 19.582 | 43.099 | 43.072 | 7.838 | 7.262 | 9.895 | 5.572 | 5.614 | 40.537 | 23.099 | 5.428* |
iPinYou 2997 | 16.203 | 32.849 | 33.052 | 5.999 | 6.702 | 9.167 | 5.083 | 5.470 | 34.940 | 16.639 | 4.504* |
iPinYou 3358 | 19.253 | 44.769 | 44.885 | 6.736 | 7.177 | 9.484 | 5.539 | 5.616 | 40.958 | 13.806 | 5.281* |
iPinYou 3386 | 15.973 | 39.781 | 41.943 | 6.488 | 6.141 | 8.834 | 5.228 | 5.549 | 32.550 | 10.743 | 4.940* |
iPinYou 3427 | 16.902 | 41.558 | 41.698 | 6.002 | 6.185 | 9.090 | 5.321 | 5.552 | 33.387 | 9.565 | 4.836* |
iPinYou 3476 | 10.507 | 39.551 | 39.518 | 5.710 | 6.022 | 10.240 | 4.537 | 5.554 | 31.609 | 7.891 | 4.012* |
iPinYou Overall | 15.366 | 38.620 | 39.096 | 6.310 | 6.552 | 9.668 | 5.148 | 5.544 | 35.028 | 12.255 | 4.774* |
YOYI | 7.907 | 30.946 | 27.897 | 6.475 | 5.652 | 10.286 | 4.503 | 5.567 | 29.108 | 5.885 | 4.453* |
We first analyze the probability density estimation performance of bid landscape forecasting. Although there is no ground truth for the market price distribution, we can still use the negative log-likelihood over the test data to evaluate the performance. Table 2 lists the ANLP performance of the compared models. From the table, we find that our DLF model achieves significant improvements over the other baselines, including the state-of-the-art model STM, on both the iPinYou and YOYI datasets.
We also find from the table that (i) the survival tree model STM achieves relatively better performance than the other baselines, which may be the result of its effective clustering methodology and non-parametric survival analysis. (ii) All of the models with survival analysis, i.e., DeepSurv, Gamma, MM, STM, MTLSA and DeepHit, perform much better than the RNN model, which does not use censored data in model training. (iii) The DeepHit model gets worse results than the DLF model, probably because it does not model the price-level sequential dependency as our method does. This in turn suggests that the conditional sequential modeling of the DLF model in Eq. (7) has significantly improved the forecasting performance in market price distribution modeling. (iv) Though DWPP utilizes a deep model for feature extraction, it performs poorly under the ANLP metric, which has also been reported in their paper (Wu et al., 2018). The reason may be that the assumed Gaussian form of the market price distribution lacks generalization in practical applications.
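To make the ANLP numbers easier to read, the following minimal sketch computes an average negative log probability, assuming each forecast is a discrete distribution indexed by integer price and that only samples with an observed market price are scored. The function name `anlp`, the `eps` floor, and the toy forecasts are illustrative assumptions, not the paper's code.

```python
import math


def anlp(pred_dists, true_prices, eps=1e-12):
    """Average negative log probability assigned to the true market prices.

    pred_dists[i][z] is the forecast probability that sample i has market
    price z; eps (an assumption) guards against log(0).
    """
    total = 0.0
    for dist, z in zip(pred_dists, true_prices):
        total += -math.log(max(dist[z], eps))
    return total / len(true_prices)


# Toy forecasts over three price levels (assumed data).
dists = [[0.1, 0.7, 0.2], [0.25, 0.25, 0.5]]
prices = [1, 2]
value = anlp(dists, prices)

# A uniform forecast over K price levels scores exactly ln(K).
uniform = anlp([[1.0 / 4] * 4], [3])
```

Under this definition a lower value means more probability mass was placed on the observed market price, and a uniform forecast over K price levels gives ln K as a natural reference point.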
C-index (higher is better)
Dataset | KM | Lasso-Cox | DeepSurv | Gamma | MM | MTLSA | STM | DeepHit | DWPP | RNN | DLF |
iPinYou 1458 | 0.698 | 0.820 | 0.835 | 0.612 | 0.698 | 0.505 | 0.764 | 0.861 | 0.866 | 0.894 | 0.904* |
iPinYou 2259 | 0.685 | 0.775 | 0.791 | 0.584 | 0.685 | 0.505 | 0.768 | 0.785 | 0.729 | 0.791 | 0.876* |
iPinYou 2261 | 0.666 | 0.847 | 0.890 | 0.564 | 0.666 | 0.508 | 0.812 | 0.838 | 0.807 | 0.874 | 0.929* |
iPinYou 2821 | 0.677 | 0.741 | 0.714 | 0.563 | 0.678 | 0.507 | 0.790 | 0.810 | 0.746 | 0.737 | 0.881* |
iPinYou 2997 | 0.734 | 0.910 | 0.852 | 0.641 | 0.734 | 0.517 | 0.835 | 0.907 | 0.885 | 0.762 | 0.919* |
iPinYou 3358 | 0.704 | 0.866 | 0.896 | 0.601 | 0.706 | 0.542 | 0.811 | 0.888 | 0.744 | 0.819 | 0.944* |
iPinYou 3386 | 0.716 | 0.845 | 0.854 | 0.569 | 0.719 | 0.512 | 0.849 | 0.881 | 0.833 | 0.800 | 0.923* |
iPinYou 3427 | 0.724 | 0.830 | 0.845 | 0.586 | 0.742 | 0.508 | 0.798 | 0.873 | 0.796 | 0.804 | 0.901* |
iPinYou 3476 | 0.692 | 0.865 | 0.877 | 0.676 | 0.692 | 0.505 | 0.830 | 0.879 | 0.861 | 0.917 | 0.922* |
iPinYou Overall | 0.700 | 0.834 | 0.840 | 0.600 | 0.703 | 0.513 | 0.807 | 0.858 | 0.807 | 0.823 | 0.911* |
YOYI | 0.791 | 0.847 | 0.862 | 0.528 | 0.791 | 0.510 | 0.886 | 0.878 | 0.856 | 0.898 | 0.924* |
In this part we evaluate the winning prediction under the given bid price of each sample. As discussed before, this can be regarded as a binary classification problem, so we present the C-index performance in Table 3. From the table we can observe that DLF achieves the best C-index value among all the compared models on both the iPinYou and YOYI datasets, which shows the classification effectiveness of our model. In particular, our model gains over 12.9% average improvement against the state-of-the-art STM model.
We can also draw the following findings from the table. (i) All the deep models, including DeepSurv, DeepHit, DWPP, RNN and DLF, have relatively better C-index performance than the non-deep baselines. Even the RNN model without any censorship handling achieves satisfactory prediction performance, which again reflects the advantage of our novel modeling perspective with sequential pattern mining. (ii) MM and Gamma do not perform well, which may be because these models adopt restrictive assumptions about the base distribution of the probability density function. This phenomenon supports our analysis in Sections 1 and 2 and reveals the importance of modeling without distributional assumptions. (iii) DWPP does not perform well, which is reasonable because it is optimized for market price regression rather than winning probability estimation. However, please note that the forecasting of the market price distribution and the corresponding winning probability estimation are more general for RTB advertising.
To illustrate the model training and convergence of the DLF model, we plot the learning curves and the evaluation results on iPinYou Campaign 3476 and the YOYI dataset in Figure 5. Recall that our model optimizes over two loss functions, i.e., the ANLP loss and the cross entropy loss. We apply a method (Cao et al., 2018) whose idea is to feed each batch of training data under one of the two losses alternately. From the learning curves we find that (i) DLF converges quickly, and the values of both losses drop to stable convergence at about the first complete iteration over the whole training dataset. (ii) The two losses are optimized alternately and facilitate each other during training, which indicates the learning stability of our model.
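The alternating schedule can be pictured with the toy sketch below, which switches the loss used for the gradient step on every batch. It is only an illustration under strong assumptions: a single scalar parameter and two quadratic stand-in losses replace the network and the two actual DLF losses, and all names (`alternate_training`, `grad_mean_loss`, `grad_max_loss`) are hypothetical.

```python
def alternate_training(batches, grad_fns, theta=0.0, lr=0.1):
    """Apply one gradient step per batch, cycling through the loss gradients."""
    history = []
    for step, batch in enumerate(batches):
        grad_fn = grad_fns[step % len(grad_fns)]  # switch loss every batch
        theta -= lr * grad_fn(theta, batch)
        history.append(theta)
    return theta, history


# Toy stand-ins for the two losses (assumptions, not the DLF losses):
# squared distance to the batch mean and squared distance to the batch max.
def grad_mean_loss(theta, batch):
    return 2.0 * (theta - sum(batch) / len(batch))


def grad_max_loss(theta, batch):
    return 2.0 * (theta - max(batch))


batches = [[1.0, 2.0, 3.0]] * 200
theta, history = alternate_training(batches, [grad_mean_loss, grad_max_loss])
```

In this toy setting the parameter settles between the minimizers of the two losses (2.0 and 3.0), which mirrors the idea that both objectives shape the shared parameters even though each batch only sees one of them.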
In this section, we illustrate some comprehensive analysis of the prediction results of different models. As is illustrated in Figure 6, we plot the winning probability (C.D.F.) and the corresponding market price distribution (P.D.F.) of an example sample, calculated from all the compared models. The true market price of this sample is .
Accurate Bid Landscape Forecasting. As is illustrated in the figure, when the value of the market price distribution is high, the corresponding winning probability increases rapidly. Thus accurately predicting the market price distribution makes the winning probability estimation reasonable. From the figure, we find that our DLF model accurately places the highest probability density on the true market price, while the other models fail to produce reasonable forecasts. This finding reflects the advantage of our DLF model under the ANLP metric, since ANLP directly measures the accuracy of bid landscape forecasting.
Flexibility of Forecasted Distribution. As has been discussed before, we do not make any assumptions about the distribution form of either the market price distribution or the winning probability function, thus our DLF model can model sophisticated distribution forms as illustrated in the figure. In contrast, the models that assume specific distribution forms, i.e., Lasso-Cox, Gamma, DeepSurv and DWPP, cannot reasonably model this practical case, since their strong assumptions lack generalization in real-world applications. Specifically, Lasso-Cox, DeepSurv and DWPP show similar P.D.F. and C.D.F. results, which may be attributed to the same Gaussian distribution form adopted by these models. This further shows the disadvantage of assuming distribution forms for bid landscape forecasting.
Effective Censorship Handling. Comparing the forecast of DLF with that of RNN, which lacks censorship handling, we find that although the RNN model predicts a market price distribution and a winning probability with a shape similar to those of our DLF model, it over-estimates the winning probability and places the probability density of the market price inaccurately. This is easy to explain: without censorship handling, the model may produce biased landscape forecasting results, as also shown in (Wang et al., 2016).
In this paper, we analyzed the challenges of bid landscape forecasting and the drawbacks of state-of-the-art methods. To tackle these problems, we proposed a fine-grained bid landscape forecasting model based on a deep recurrent neural network and survival analysis modeling. Note that we do not assume any distribution form for bid landscape forecasting. Our DLF model not only captures complex patterns in the bid landscape of each bid request, but also considers prediction of the winning status over both winning logs and losing (censored) logs. The comprehensive experiment results have shown the significant advantages of our model compared with other strong baselines. A deep investigation has been performed to verify the robustness and correctness of our model.
For future work, we plan to incorporate the proposed bid landscape forecasting model into bid optimization for profit maximization (Ren et al., 2018; Lin et al., 2016; Diemert et al., 2017) in real-time bidding advertising.
Acknowledgments. The corresponding author Weinan Zhang thanks the support of National Natural Science Foundation of China (61702327, 61772333, 61632017) and Shanghai Sailing Program (17YF1428200).
Bidding Machine: Learning to Bid for Directly Optimizing Profits in Display Advertising. TKDE (2018).
In this section, we describe the details of the implemented architecture of the proposed model (DLF).
Recall that each sample is a triple consisting of the feature vector of the sample, an integer giving the true event time, and an integer giving the proposed bid price.
As is illustrated in Figure 3, the input to our DLF model is the triple . For each recurrent unit, we feed the input as where is an integer representing the price interval of the current unit.
Note that the feature is a multi-hot encoded vector including a series of one-hot encoded features, and we first put it through an embedding layer as
After getting the embedding vector , we concatenate the input vector and the bid as
Specifically, we implement Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) as the recurrent unit as
(17) | ||||
Here is the hidden state vector of the th recurrent unit. After we get the output of each unit, a fully connected layer with sigmoid activation function predicts the hazard rate as described in Eq. (7) of our main paper.
Then we may calculate the market price probability, the winning rate and the losing rate at the proposed bid price following Eqs. (8) and (9) of our main paper. More details can be found in our published code; the code repository with datasets is available at https://github.com/rk2900/DLF.
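As a minimal sketch of how per-interval hazard rates turn into a forecast, the snippet below assumes the standard discrete-time survival relations: the density of an interval is its hazard times the probability of surviving all earlier intervals, and the winning rate is one minus the survival probability. The exact indexing used in Eqs. (8) and (9) of the main paper may differ, and `landscape_from_hazards` is an illustrative name rather than a function from the released code.

```python
def landscape_from_hazards(hazards):
    """Convert per-interval hazard rates into a price P.D.F. and winning C.D.F.

    hazards[l] is the assumed conditional probability that the market price
    falls in interval l given that it is not in any earlier interval.
    """
    pdf, win = [], []
    survive = 1.0  # probability that the market price exceeds all intervals so far
    for h in hazards:
        pdf.append(h * survive)     # chain rule: lose earlier, then hit this interval
        survive *= (1.0 - h)        # update the losing (survival) probability
        win.append(1.0 - survive)   # winning rate when bidding up to this interval
    return pdf, win


# Toy hazard rates for three price intervals (assumed values).
hazards = [0.1, 0.5, 0.5]
pdf, win = landscape_from_hazards(hazards)
```

With these toy hazards the density is roughly [0.1, 0.45, 0.225] and the winning rate roughly [0.1, 0.55, 0.775]; the cumulative sum of the density matches the winning rate at each interval, which is the consistency the chain rule is meant to provide.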
All the models are trained until convergence and we consider learning rate from . The value of is tuned to 0.25. The batch size is fixed at 128 and the embedding dimension is 32. All the deep learning models take the input features and feed them through an embedding layer for the subsequent feedforward calculation. The hyperparameters of each model are tuned and the best performances are reported.
The models are trained under the same hardware settings with an Intel(R) Core(TM) i7-6900K CPU, an NVIDIA GeForce GTX 1080Ti GPU and 128 GB memory. The training time of each compared model is less than ten hours (as reported for the slowest model, MTLSA) on each dataset.