
Auction Throttling and Causal Inference of Online Advertising Effects

Causally identifying the effect of digital advertising is challenging, because experimentation is expensive, and observational data lacks random variation. This paper identifies a pervasive source of naturally occurring, quasi-experimental variation in user-level ad-exposure in digital advertising campaigns. It shows how this variation can be utilized by ad-publishers to identify the causal effect of advertising campaigns. The variation pertains to auction throttling, a probabilistic method of budget pacing that is widely used to spread an ad-campaign's budget over its deployed duration, so that the campaign's budget is not exceeded or overly concentrated in any one period. The throttling mechanism is implemented by computing a participation probability based on the campaign's budget spending rate and then including the campaign in a random subset of available ad-auctions each period according to this probability. We show that access to logged-participation probabilities enables identifying the local average treatment effect (LATE) in the ad-campaign. We present a new estimator that leverages this identification strategy and outline a bootstrap procedure for quantifying its variability. We apply our method to real-world ad-campaign data from an e-commerce advertising platform, which uses such throttling for budget pacing. We show our estimate is statistically different from estimates derived using other standard observational methods such as OLS and two-stage least squares estimators. Our estimated conversion lift is 110 using naive observational methods.


1 Introduction

Measuring the causal impact of online advertising is important for both advertisers and digital advertising platforms. For advertisers, understanding the impact of advertising is essential for evaluating past campaign performance and optimizing future spending. Comparing the causal effect to the cost of the campaign helps the advertiser assess the incremental return on investment from the campaign, and is a critical determinant of whether or not the campaign should be continued (e.g. blake_consumer_2015 for a well known example with eBay). The causal effects are also important determinants of how much advertisers should bid in real-time bid (RTB) auctions, because they represent the valuation of the advertiser for the average impression acquired in the campaign. As such, the ability to measure incremental ad-effects is now considered a key driver of the success of a firm’s digital advertising strategy (e.g., gordon_comparison_2019). For platforms, causally measuring ad-effects is important for valuing their own advertising inventory, for improving their ad-selling mechanisms, and for creating automation solutions that help advertisers do better bidding and budgeting of their campaigns on the platform (e.g., borgs2007dynamics; chakrabarty2007budget; feldman2007budget; muthukrishnan2007stochastic; muthukrishnan2010stochastic; amin2012budget; zhang_joint_2012; nuara2018combinatorial; geng_automated_2021).

However, measuring the causal impact of digital advertising is challenging. Typical observational ad data lack the variation required to measure causal effects because ad exposure is not randomly assigned, and significant biases remain even after matching exposed and unexposed users on observable dimensions (see gordon_comparison_2019). Although running experiments seems like a viable alternative, it is often expensive. It is well known that the causal effects of digital ads are small in magnitude, and large sample sizes are required to measure them with statistical precision (e.g., Sahni2015; lewis_unfavorable_2015). Large-scale experimentation can be economically unfavorable for advertisers because under randomization, ads are withheld from users in the control group, some of whom would have responded favorably to the ads had they been targeted. Large-scale randomization may also hurt ad platforms because it requires their users to see random ads that may not match their preferences. The problem is more challenging if advertisers are interested in optimizing within-campaign expenditure: this requires determining whether to spend more on certain subpopulations and less on others. Solving this requires measuring heterogeneous treatment effects of advertising, which requires more data than measuring average effects. Given the high costs of experimentation, it is valuable to identify alternative sources of variation in observational digital advertising data that can help pin down causal ad-effects. This is the goal of this paper.

We identify a pervasive source of naturally occurring, quasi-experimental variation in user-level ad-exposure in digital advertising campaigns. We show how this variation can be utilized by ad-platforms and ad-publishers to identify the causal effect of advertising campaigns. Our proposal leverages an ad-serving strategy referred to as "auction throttling," which is widely used by major ad publishers such as Google (karande_optimizing_2013 and balseiro_budget_2017), LinkedIn (agarwal_budget_2014) and others, and which has been well studied theoretically in the literature (e.g., charles2013budget).

Auction throttling is a budget smoothing mechanism motivated by the fact that advertising campaigns typically have limited budgets. The budget limitation prevents advertisers from participating in all eligible auctions, i.e. auctions for impressions that match the campaign’s targeting criteria. The concern is that if advertisers participate in every eligible auction early in the campaign, they will exhaust their budget too early and lose the opportunity to target customers that arrive later in the campaign. Hence, it may be in the advertiser’s interest to smooth their budget over the campaign duration. This smoothing process is referred to as “pacing.” Pacing budgets is also helpful for the ad-platform because in its absence, auction pressure on the platform becomes highly variable. For instance, if most ad campaign budgets are spent early in the day, auction pressure (and consequently ad-prices) become high early in the day and low later in the day. Understanding this, advertisers may shift budgets to later in the day leading to complex spending and pricing variability and auction dynamics on the platform. Pacing the budget helps the platform manage this variability.

The throttling mechanism paces the spending for a given advertising campaign by calculating a participation probability for the campaign for each auctioned impression. This probability is calculated by the platform's ad-server using real-time bidding information, such as the recent spending rate, remaining budget, and remaining campaign time. The specifics of the implementation vary, but broadly speaking, the participation probability is high when the remaining budget is high or the recent spending rate is low, and is low when the remaining budget is low or the recent spending rate is high. Throttling is achieved by including campaigns randomly into auctions with this participation probability. Conditional on this, auctions proceed as usual from that point onwards with the campaign's bids included (when it is throttled in), or without the campaign's bids (when it is throttled out). Dynamically adjusting the participation probability over time helps pace the campaign. The participation probabilities are usually logged by platforms.[1]

[1] An alternative mechanism to pace the budget is to lower the bid when the budget is low. balseiro_budget_2017 has compared the revenue implications of these different mechanisms. Our paper can be understood as demonstrating an extra benefit of probabilistic throttling: providing a source of pervasive identifying variation in the data that does not require running experiments.

Our interest is in leveraging the participation probabilities for causal inference. We emphasize that our interest here is not in analyzing the pacing algorithm or its design or implications. We show that typical implementations of such pacing mechanisms serve as a source of naturally occurring, quasi-experimental variation in auction participation, which induces random variation in user-level ad exposure. Typically, auction participation cannot be regarded as exogenous to post-auction outcomes because it depends on many factors that correlate with outcomes, such as ad targeting criteria, as well as remaining campaign budgets (which are affected by the arrival of past users). Nevertheless, we show that auction participation is conditionally independent of the potential outcomes of interest after conditioning on this participation probability. The implication is that ad-platforms and publishers can use the participation induced by the throttling mechanism as a conditional instrumental variable for identifying the impact of an advertising campaign, as well as its heterogeneous treatment effects over different sub-populations.

We articulate the key requirements of a throttling algorithm to facilitate such inference, and make precise the specific causal estimand that can be estimated leveraging this variation. We present an estimator that learns this estimand from typical throttled ad-campaign data. We propose an analytical standard error that provably works under a fixed throttling regime and a bootstrap standard error that works generally when we know the throttling algorithm. We use both simulation and empirical data from an e-commerce advertising platform to demonstrate the procedure and to benchmark it against alternative approaches. To the best of our knowledge, the value of this variation for inference has not been articulated in the literature, and therefore, we believe our proposal is new.

Our procedure requires access to the logged participation probabilities and knowledge of the throttling rule. Therefore, it should be seen as a way to implement causal inference of RTB ads from the publisher/platform’s perspective. For platforms and publishers who implement budget pacing methods already via probabilistic throttling, our procedure provides a way to automatically deliver inference on the ad campaigns on the platforms without the need for additional experimentation. Due to this reason, we believe the method presented here has practical value and appeal.

The remainder of the paper is organized as follows: Section 2 reviews the literature. Section 3 presents the setup and key assumptions related to the throttling mechanism. Section 4 describes the estimation procedure that leverages data generated under the throttling mechanism. Section 5 demonstrates the value of the data and the corresponding estimation procedure using simulation. Section 6 discusses the inference and standard errors. Section 7 applies the method to a dataset from an e-commerce advertising platform and demonstrates the usefulness of this variation. Section 8 concludes.

2 Literature Review

This paper contributes to the literature on measuring the effects of online advertising by identifying a pervasive source of quasi-experimental variation in observational data that allows ad publishers to measure advertising effects without running large-scale experiments. The quasi-experimental variation is induced by the limited budget and the probabilistic throttling algorithm that is widely used by many ad publishers.

There is now a large stream of literature (e.g., see gordon2021inefficiencies and papers cited there) that has proposed experimental approaches for causal measurement of digital ad effects. Several of these experimental designs are from the advertisers' perspective: e.g., geo-level randomization (e.g., blake_consumer_2015) or inducing randomization of ad-intensities by manipulating ad campaign frequency caps on DSPs (e.g., sahni2019retargeted). Unlike these papers, our design does not require additional randomization, leverages existing ad-serving mechanisms (i.e., throttling), and is developed from the publisher/platform's perspective, rather than the advertiser's. Our method is related to johnson_ghost_2017, which enables ad publishers to measure advertising effects more efficiently by counterfactual ad logging. Similar to their setting, our paper focuses on measuring advertising effects for ad publishers, who often have direct access to the algorithm that generates advertising exposure. In contrast, we do not require additional experiments, because the probabilistic nature of the throttling algorithm generates random variation.[2]

[2] shapiro_positive_2018; shapiro2021tv and stephens2017super; hartmann2018super also leverage quasi-experimental variation to measure the effects of advertising. To assess the impact of TV advertising, the former two papers exploit the discontinuity induced in advertising along the borders of TV markets, and the latter two exploit the variation induced in TV viewership due to changes in the identity of Super Bowl teams. In contrast, we focus on the variation induced by limited budgets and the randomization induced by pacing algorithms. Our method can be applied in RTB digital advertising, while the above methods apply mostly to TV advertising.

Our estimation method is closely related to the literature on instrumental variables (IV) and the estimation of local average treatment effects (LATE). As in the IV model in ImbensAngrist1994; AngristImbensRubin1996 and the local IV model in Abadie2003; Frolich2007; Hong2010, identification comes from a binary instrument, i.e. auction participation, that induces exogenous selection into treatment, i.e. ad exposure, for some subset of the population. However, one difference is that the IV identification argument works in our setting only conditionally given the participation probability, which motivates different estimators. Our weighted average IV estimator is constructed by first estimating the local average treatment effect (LATE) for each subgroup defined by the same participation probability and then taking a weighted average of these LATE estimates with weights determined by the number of compliers within each subgroup. This estimator coincides with the nonparametric imputation estimator of conditional LATE in Frolich2007 with participation probability as the key conditioning variable. However, in contrast, the large population random sampling assumption of Frolich2007 does not apply to online advertising data, so we propose new inference procedures based on design-based uncertainty (discussed below).

Our inference procedure is related to the work of AbadieAtheyImbensWooldridge2020 and others, which illustrates the difference between sampling-based and design-based inference. Design-based inference focuses on causal parameters defined over the sample, accounts only for the uncertainty from treatment assignment, and treats the units in the data as fixed. Sampling-based inference focuses on causal parameters defined over a hypothetical super-population and accounts for both the uncertainty in treatment assignment and the uncertainty in the sampling of units. Because the super-population of the potentially non-stationary online traffic is not well-defined, and the quantities of primary business interest are causal effects for the realized traffic rather than for some hypothetical super-population, we take a design-based inference perspective to quantify the uncertainty of the (sample) local average treatment effects we measure. Following AbadieAtheyImbensWooldridge2020, this is precisely an example of a scenario where the perspective of design-based uncertainty is valuable.

Finally, our inference task is related to hong_inference_2020 which measures the uncertainty of local average treatment effects with a stratified experiment. Our setting can be understood as a complex, sequential stratified experiment, where the assignment probability is not a priori fixed, and is driven by the throttling algorithm. We account for the uncertainty of assignment probability through a novel bootstrap procedure we introduce that incorporates the knowledge of the throttling algorithm. We show that this bootstrap procedure generates a conservative estimate of the standard error.

3 Setup

Consider an advertiser running a campaign on a platform that serves ads through auctions. The advertiser specifies the targeting criteria, campaign duration, bid per impression, and a limited budget. Due to this limited budget, the advertiser often cannot reach all eligible impressions that meet the targeting criteria. To help the advertiser manage the budget, the advertising platform implements a throttling algorithm that reduces the campaign's probability of participating in auctions for impressions when the budget runs low.

Refer to the $i$th ad auction that meets the campaign's targeting criteria, counting from the beginning of the ad campaign, as auction $i$. For each auction, we observe

$$(p_i, Z_i, D_i, Y_i, X_i, W_i),$$

in which $p_i$ is the campaign's participation probability in auction $i$; $Z_i$ is the binary indicator of whether the advertiser participated in this auction; $D_i$, referred to as ad exposure or auction outcome, is the binary indicator of whether the advertiser won the auction and thus displayed its ad to the customer; $Y_i$ is the outcome of interest, e.g. click or conversion;[3] $X_i$ contains auction-specific pretreatment side information such as ad quality score and bid; and $W_i$ contains additional post-treatment auction information such as auction expenditure. The throttling algorithm $f$ generates a participation probability for auction $i$ based on the available logged information $H_i$, which contains auction $i$'s pretreatment side information and all historical data up to auction $i$, $H_i = (X_i, \{(p_j, Z_j, D_j, Y_j, X_j, W_j)\}_{j < i})$. Formally, we have

$$p_i = f(H_i).$$

[3] One challenge is that sometimes advertisers have multiple opportunities to display ads to customers, such that multiple ad impressions are associated with a conversion. For illustration purposes, we first abstract away from the case of multiple ad exposures, and consider the case when advertisers have a single opportunity to display ads to users. In Section 9 we discuss how our method handles situations when advertisers have multiple opportunities to serve ads to the same user.

To formally discuss the causal inference problem we are interested in, we use the following potential outcome notation. Denote the potential ad exposure under auction participation as $D_i(1)$. The potential ad exposure under non-participation is always equal to zero. In other words, we have one-sided compliance. The observed ad exposure is $D_i = Z_i D_i(1)$. Similarly, we denote the post-treatment side information under participation and non-participation as $W_i(1)$ and $W_i(0)$. The observed post-treatment side information is $W_i = Z_i W_i(1) + (1 - Z_i) W_i(0)$. Denote the potential outcome of interest under ad exposure and under no ad exposure as $Y_i(1)$ and $Y_i(0)$. The observed outcome of interest is $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$. In summary, auction $i$ has the following potential variables,

$$\big(D_i(1),\, W_i(1),\, W_i(0),\, Y_i(1),\, Y_i(0)\big).$$

All these potential variables are treated as pretreatment characteristics invariant to auction participation.

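Let $\mathcal{I}$ be the set of all auctions that satisfy the campaign's targeting criteria. We show that the platform can measure a local average treatment effect (LATE) of ad exposure defined as

$$\tau = \frac{\sum_{i \in \mathcal{I}} D_i(1)\,\big(Y_i(1) - Y_i(0)\big)}{\sum_{i \in \mathcal{I}} D_i(1)}. \tag{1}$$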

Interpretation

The LATE $\tau$ is the average treatment effect of ad exposure on the outcome of interest over all compliers, i.e., all auctions that the advertiser would win if she participated. This quantity is numerically equivalent to the average treatment effect on the treated (ATT) in a standard advertising experiment that randomly assigns auction participation with equal probability.

LATE is different from ATT in our setting, because different auctions have different participation probabilities due to throttling. This LATE parameter informs the advertiser what the ROI and treatment effect would be if the campaign had sufficient budget to participate in all auctions. This parameter is structural in the sense that it is invariant to the realized participation probability and the actual budget of the campaign. Therefore, if future customers are similar to the current customers, this LATE parameter can better inform the advertiser's future campaign budget allocation. For example, consider a campaign that has spent 90% of its budget in the morning, and cannot reach all the afternoon customers. If the morning customers have a low ROI of 1% and the afternoon customers have a high ROI of 50%, then focusing on ATT, which gives the morning customers a weight of 90%, may underestimate the potential ROI of the campaign. The advertiser may arrive at the wrong conclusion that they should not allocate budget to run such a campaign in the future. In contrast, focusing on LATE weighs the morning and afternoon customers not based on how many were actually treated in the past campaign, but based on how many could potentially be treated. Therefore, in the presence of budget limitations, LATE could better inform advertisers' future budget allocation decisions when future customers are similar to past customers.
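The morning/afternoon example can be made concrete with a few lines of arithmetic. The 90%/10% treated shares and the 1%/50% ROIs come from the example above; the 50%/50% split of potentially treatable (complier) auctions is purely an illustrative assumption, not a number from the paper.

```python
# ROI per segment, taken from the example in the text.
roi = {"morning": 0.01, "afternoon": 0.50}

# Share of actually treated auctions per segment (ATT weights, from the text).
treated_share = {"morning": 0.9, "afternoon": 0.1}

# Share of auctions that COULD be treated with unlimited budget (LATE weights).
# ASSUMPTION: an even split, chosen only for illustration.
complier_share = {"morning": 0.5, "afternoon": 0.5}

att_roi = sum(roi[s] * treated_share[s] for s in roi)
late_roi = sum(roi[s] * complier_share[s] for s in roi)

print(f"ATT-weighted ROI:  {att_roi:.3f}")
print(f"LATE-weighted ROI: {late_roi:.3f}")
```

Under these assumed shares, the ATT-weighted figure is dominated by the low-ROI morning traffic, while the LATE-weighted figure reflects both segments in proportion to how many auctions could be won.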

3.1 Algorithm Assumptions

We outline a class of throttling algorithms that enable identification of the LATE. These algorithms satisfy the following assumptions:

Assumption 1 (Probabilistic Throttling and Conditional Random Assignment of Auction Participation).

The platform uses a throttling algorithm $f$ and logged information $H_i$ to randomly assign the focal advertiser into participating in auction $i$ with probability $p_i$. Formally,

$$p_i = f(H_i) = \Pr(Z_i = 1 \mid H_i) = \Pr\big(Z_i = 1 \mid H_i,\, D_i(1),\, W_i(1),\, W_i(0),\, Y_i(1),\, Y_i(0)\big).$$

The second equality defines the auction participation probability. The first equality says the participation probability is generated from the algorithm $f$ and logged information $H_i$. The third equality says auction participation is conditionally randomly assigned given $H_i$. In particular, $Z_i$ is conditionally independent of the potential variables.

As a simple example, the vanilla probability throttling algorithm in karande_optimizing_2013, which sets the participation probability equal to the ratio of the remaining budget and the remaining maximum spending rate for the rest of the campaign, satisfies assumption 1. More intelligent throttling algorithms setting the participation probability adaptively using history and side information also satisfy this assumption.
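A minimal sketch of this kind of ratio-based throttling is shown below. The function names and the reading of "remaining maximum spending rate" as the maximum amount the campaign could still spend if it entered every remaining auction are assumptions made for illustration; they are not taken from karande_optimizing_2013. The point of the sketch is that the probability is computed from logged state only, the participation draw is a fresh Bernoulli draw, and the probability is logged next to the draw.

```python
import random

def vanilla_participation_probability(remaining_budget, max_remaining_spend):
    """Hypothetical ratio rule, clipped to [0, 1].

    max_remaining_spend: spend if the campaign entered every remaining
    auction (an assumed interpretation, in the same currency units).
    """
    if max_remaining_spend <= 0:
        return 1.0  # nothing left to spend on, so no need to throttle
    return max(0.0, min(1.0, remaining_budget / max_remaining_spend))

def draw_participation(p, rng):
    """Bernoulli(p) draw that depends on nothing but p and fresh randomness."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)
p = vanilla_participation_probability(remaining_budget=50.0,
                                      max_remaining_spend=200.0)

# Log the probability together with the realized participation indicator.
log = [{"auction": i, "p": p, "z": draw_participation(p, rng)} for i in range(5)]
for row in log:
    print(row)
```

Because the draw uses only `p` and independent randomness, participation is independent of anything else about the auction once `p` is known, which is the property Assumption 1 requires.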

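One natural implication of Assumption 1 is

$$\Pr\big(Z_i = 1 \mid p_i,\, D_i(1),\, W_i(1),\, W_i(0),\, Y_i(1),\, Y_i(0)\big) = p_i.$$

Namely, $Z_i$ is conditionally unconfounded given the participation probability $p_i$. From a super-population perspective, this implication can be written in the Dawid conditional independence notation:

Implication (Conditional Random Assignment of Auction Participation).

$$Z_i \perp\!\!\!\perp \big(D_i(1),\, W_i(1),\, W_i(0),\, Y_i(1),\, Y_i(0)\big) \mid p_i. \tag{2}$$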

We consider condition (2) a novel but pervasive condition under which we can measure the effect of real-time bidding (RTB) advertising. The key is to leverage the induced variation in auction participation. In standard observational settings, auction participation is not random marginally and is not a valid (marginal) instrumental variable, because the advertiser's targeting criteria often lead to positive correlation between auction participation and the treatment effect. In our setting, participation is not random marginally either, because the participation probability $p_i$ is a function of the logged information $H_i$, which may be correlated with the potential outcomes. For example, if customer arrivals are serially correlated, then the logged information $H_i$ may both affect the current participation probability and be correlated with the current customer's potential outcomes. Therefore, participation is not a valid instrumental variable marginally. However, (2) says that after conditioning on the participation probability, participation becomes independent of the potential outcomes due to random assignment.

Assumption 2 (Overlap).

There exists some $\epsilon > 0$ such that for any auction $i$, the participation probability generated by the algorithm satisfies $\epsilon \le p_i \le 1 - \epsilon$.

This assumption implies that the throttling algorithm ensures the participation probabilities are bounded away from zero and one, which allows us to observe sufficient variation in auction participation.

An alternative approach is the following. If we do not assume that Assumption 2 holds, we can prespecify an $\epsilon$ and focus on estimating the causal effects over the subpopulation whose observed participation probability lies in $[\epsilon, 1 - \epsilon]$. In this sense, Assumption 2 is not essential and is mainly meant to simplify our exposition.

Assumption 3 (Finite Support).

The participation probabilities $p_i$ are discrete variables with finite support.

As suggested by Assumption 1, participation is only unconfounded within the set of auctions with the same participation probability. Assumption 3 ensures that enough auctions share the same participation probability. Similar to the propensity score matching literature (abadie_matching_2016), this finite support assumption can be relaxed to allow for continuous support. If we do not assume that Assumption 3 holds, we can instead partition the participation probabilities into multiple discrete buckets and then construct estimators using the discretized probabilities. In this sense, Assumption 3 is also not essential and is meant to simplify our exposition. Moreover, algorithms in practice often update participation probabilities only at discrete time intervals and therefore satisfy Assumption 3.
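The two relaxations mentioned for Assumptions 2 and 3 (trimming to a prespecified range and bucketing continuous probabilities) can be sketched as follows. The equal-width buckets, the midpoint labels, and the default trimming threshold are illustrative assumptions; the text does not prescribe a particular discretization. Note that after bucketing, unconfoundedness holds only approximately within a bucket, so narrower buckets trade bias for fewer auctions per group.

```python
def bucket_probability(p, n_buckets=10):
    """Map p in [0, 1] to the midpoint of one of n_buckets equal-width buckets."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the last bucket
    return (idx + 0.5) / n_buckets

def trim_and_bucket(probabilities, eps=0.05, n_buckets=10):
    """Keep auctions with eps <= p <= 1 - eps and return (index, bucket) pairs."""
    return [
        (i, bucket_probability(p, n_buckets))
        for i, p in enumerate(probabilities)
        if eps <= p <= 1.0 - eps
    ]

logged = [0.02, 0.37, 0.41, 0.88, 0.99]
print(trim_and_bucket(logged))
```

The returned bucket labels can then play the role of the discrete participation probability when partitioning auctions.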

3.2 Connection to Instrumental Variable

Auction participation is not a valid instrumental variable marginally. However, after conditioning on the participation probability, participation becomes independent of the potential outcomes due to random assignment. Therefore, the auction participation can be thought of as a conditional instrumental variable for estimating the advertising effects:

Implication (Conditional IV).

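For a subpopulation $\mathcal{I}_p = \{i \in \mathcal{I} : p_i = p\}$ defined by a participation probability $p$ with $0 < p < 1$, the auction participation $Z_i$ is an instrumental variable for estimating the LATE over this subpopulation,

$$\tau_p = \frac{\sum_{i \in \mathcal{I}_p} D_i(1)\,\big(Y_i(1) - Y_i(0)\big)}{\sum_{i \in \mathcal{I}_p} D_i(1)}. \tag{3}$$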

Assumption 1 ensures that $Z_i$ satisfies conditional unconfoundedness. Because the advertiser cannot win auctions without participation, the monotonicity condition is also satisfied. Moreover, the exclusion restriction is automatically satisfied, since the outcome of interest does not depend on auction participation given the auction outcome.

3.3 Connection to Propensity Scores

The auction participation probability is similar to but different from a propensity score. The key difference is that the propensity score, $\Pr(D_i = 1 \mid X_i)$, summarizes the probability of treatment received, but the auction participation probability, $p_i = \Pr(Z_i = 1 \mid H_i)$, is the probability of the intent-to-treat. The advertiser may not be explicitly interested in measuring the treatment effect of participating in auctions, but directly interested in the effect of winning auctions to show ads. Because participating in auctions is not equivalent to winning auctions, we need to account for the imperfect compliance when measuring the effect of advertising.

4 Estimation

Under Assumptions 1-3, we show that $\tau$ can be estimated using the following procedure:

  1. Partition the set of auctions based on the participation probability $p_i$.

  2. Estimate the LATE $\tau_p$ of each partition by treating the participation as an instrumental variable.

  3. Estimate the number of compliers in each partition.

  4. Average the estimates $\hat{\tau}_p$ of all partitions based on the estimated number of compliers in each partition.

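Let $\mathcal{I}_p^1$ and $\mathcal{I}_p^0$ be the subsets of auctions in $\mathcal{I}_p$ with participation and with nonparticipation, let $N_p$, $N_p^1$, $N_p^0$ be the numbers of auctions in $\mathcal{I}_p$, $\mathcal{I}_p^1$, $\mathcal{I}_p^0$, and let $\mathcal{P}$ be the set of all possible values of $p$. The $\tau_p$ defined in equation (3) can be estimated by the standard LATE estimator (imbens_causal_2015):

$$\hat{\tau}_p = \frac{\widehat{\mathrm{ITT}}_{Y,p}}{\widehat{\mathrm{ITT}}_{D,p}}, \tag{4}$$

in which

$$\widehat{\mathrm{ITT}}_{Y,p} = \frac{1}{N_p^1} \sum_{i \in \mathcal{I}_p^1} Y_i - \frac{1}{N_p^0} \sum_{i \in \mathcal{I}_p^0} Y_i, \qquad \widehat{\mathrm{ITT}}_{D,p} = \frac{1}{N_p^1} \sum_{i \in \mathcal{I}_p^1} D_i - \frac{1}{N_p^0} \sum_{i \in \mathcal{I}_p^0} D_i$$

are the estimates of the intent-to-treat effect on the outcome $Y_i$ and on the treatment $D_i$.

The number of compliers in each partition, $N_{c,p} = \sum_{i \in \mathcal{I}_p} D_i(1)$, can be estimated by

$$\hat{N}_{c,p} = N_p \, \widehat{\mathrm{ITT}}_{D,p}.$$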

Proposition 1.

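The total LATE $\tau$ defined in equation (1) can be estimated as a weighted average of the $\hat{\tau}_p$,

$$\hat{\tau} = \sum_{p \in \mathcal{P}} \hat{w}_p \, \hat{\tau}_p, \tag{5}$$

where the weight,

$$\hat{w}_p = \frac{\hat{N}_{c,p}}{\sum_{p' \in \mathcal{P}} \hat{N}_{c,p'}},$$

is the ratio between the estimated number of compliers in partition $\mathcal{I}_p$ and the estimated number of compliers in the whole population $\mathcal{I}$.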

Proof.

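Rewrite equation (1) as follows,

$$\tau = \sum_{p \in \mathcal{P}} \frac{\sum_{i \in \mathcal{I}_p} D_i(1)}{\sum_{i \in \mathcal{I}} D_i(1)} \cdot \frac{\sum_{i \in \mathcal{I}_p} D_i(1)\,\big(Y_i(1) - Y_i(0)\big)}{\sum_{i \in \mathcal{I}_p} D_i(1)} = \sum_{p \in \mathcal{P}} w_p \, \tau_p,$$

where the weight $w_p$ is the number of compliers in group $\mathcal{I}_p$ divided by the total number of compliers in the whole population $\mathcal{I}$. Replacing $w_p$ and $\tau_p$ by their estimates $\hat{w}_p$ and $\hat{\tau}_p$ gives us $\hat{\tau}$. ∎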

To build intuition, we can rewrite the above estimator equivalently as,

$$\hat{\tau} = \frac{\sum_{p \in \mathcal{P}} N_p \, \widehat{\mathrm{ITT}}_{Y,p}}{\sum_{p \in \mathcal{P}} N_p \, \widehat{\mathrm{ITT}}_{D,p}}. \tag{6}$$

Intuitively, the numerator in the equation above estimates the total intent-to-treat effect if the advertiser were to participate in all the auctions. The denominator estimates the total number of compliers in the sample, which can be interpreted as the total number of auctions that the advertiser could have won if the advertiser were to participate in all the auctions.
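The four-step procedure and the equivalence between the weighted-average form and the ratio form can be checked with a small sketch. The record layout `(p, z, d, y)` (logged participation probability, participation indicator, ad exposure, outcome) and the function name are assumptions made for illustration, and the toy data are fabricated solely to exercise the arithmetic.

```python
from collections import defaultdict

def throttling_late(records):
    """Estimate the LATE from (p, z, d, y) tuples, one per auction.

    Returns (ratio_form, weighted_form); the two agree whenever every
    group has a nonzero estimated first stage.
    """
    # Per group p and arm z: [count, sum of d, sum of y].
    sums = defaultdict(lambda: {0: [0, 0.0, 0.0], 1: [0, 0.0, 0.0]})
    for p, z, d, y in records:
        cell = sums[p][z]
        cell[0] += 1
        cell[1] += d
        cell[2] += y

    total_itt_y = 0.0      # numerator of the ratio form
    total_compliers = 0.0  # denominator of both forms
    weighted_sum = 0.0     # sum over groups of compliers * group LATE
    for p, cells in sums.items():
        (n0, d0, y0), (n1, d1, y1) = cells[0], cells[1]
        if n0 == 0 or n1 == 0:
            raise ValueError(f"no variation in participation for p={p}")
        n = n0 + n1
        itt_d = d1 / n1 - d0 / n0   # first stage (d0 is 0 under one-sided compliance)
        itt_y = y1 / n1 - y0 / n0   # intent-to-treat effect on the outcome
        compliers = n * itt_d       # estimated number of compliers in the group
        total_itt_y += n * itt_y
        total_compliers += compliers
        if itt_d != 0:
            weighted_sum += compliers * (itt_y / itt_d)
    return total_itt_y / total_compliers, weighted_sum / total_compliers

# Toy data: two groups of four auctions each (fabricated for illustration).
toy = [
    (0.50, 1, 1, 1), (0.50, 1, 0, 0), (0.50, 0, 0, 0), (0.50, 0, 0, 0),
    (0.25, 1, 1, 1), (0.25, 0, 0, 0), (0.25, 0, 0, 1), (0.25, 0, 0, 0),
]
ratio_form, weighted_form = throttling_late(toy)
print(ratio_form, weighted_form)
```

In the toy data the first group has an estimated LATE of 1 with 2 estimated compliers and the second has 2/3 with 4, so both forms reduce to (2 + 8/3) / 6 = 7/9.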

5 Simulation

To demonstrate how our estimator works, we consider a throttling algorithm that updates the participation probability for the focal advertiser every $m$ minutes in a second-price auction. After every $m$-minute interval $t$, the platform calculates the average expenditure per participation in the most recent interval, $\bar{c}_t$, the remaining budget, $B_t$, and the number of expected remaining auctions, $R_t$. The platform calculates a score using these numbers:

$$s_t = \frac{B_t / \bar{c}_t}{R_t},$$

where $B_t / \bar{c}_t$ is the expected number of auctions in which the campaign can participate without violating the campaign budget constraint. Intuitively, when this number is low relative to the number of remaining auctions, the participation probability in the next interval should be low. Assume the participation probability to be a discrete function of such score:

Consider an advertiser that runs a 24-hour campaign with a budget of and a bid of per impression. Customer arrivals follow a Poisson process at a rate of per day, with two unobserved types, high (H) and low (L). Amongst H types, will purchase even without ads, and an additional will purchase with ads. H types are also attractive to competing advertisers, whose highest competing bid is drawn from a uniform distribution of . Amongst L types, will purchase without ads, and an additional will purchase after seeing the ads. They are less attractive to competing advertisers, whose highest competing bid is drawn from a uniform distribution of . Customer types have a serial correlation of , which is meant to model typical online traffic where similar customers tend to arrive around similar times.
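The interval-level update described earlier in this section (a score comparing affordable participations with expected remaining auctions, mapped to a discrete probability) might look like the sketch below. The score formula follows the verbal description; the probability levels and the step rule are hypothetical stand-ins, since the actual discrete function is not recoverable here.

```python
def throttle_score(remaining_budget, avg_cost_per_participation,
                   expected_remaining_auctions):
    """Affordable participations divided by expected remaining auctions."""
    if avg_cost_per_participation <= 0 or expected_remaining_auctions <= 0:
        return float("inf")  # no spend observed or nothing left: no constraint
    affordable = remaining_budget / avg_cost_per_participation
    return affordable / expected_remaining_auctions

def step_probability(score, levels=(0.2, 0.4, 0.6, 0.8)):
    """HYPOTHETICAL step rule: the largest level not exceeding the score,
    floored at the smallest level so the probability stays away from 0."""
    eligible = [lv for lv in levels if lv <= score]
    return max(eligible) if eligible else min(levels)

score = throttle_score(remaining_budget=100.0,
                       avg_cost_per_participation=0.5,
                       expected_remaining_auctions=400)
print(score, step_probability(score))
```

Keeping the levels strictly inside (0, 1) and finite in number is what lets such a rule satisfy the overlap and finite-support assumptions.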

We simulate such a campaign for times, and calculate the difference between the true LATE $\tau$ and the estimates obtained using our estimator in Equation (5), as well as standard OLS and IV estimators:

Table 1 demonstrates the bias of these estimators. Figure 1 shows the distribution of such differences.

Method          Estimator Mean    Bias       RMSE
True LATE       0.2245            0.0000     0.0000
Equation (5)    0.2251            0.0006     0.0255
OLS             0.1530           -0.0714     0.0718
IV              0.2924            0.0680     0.0703

Table 1: Method Comparison

Figure 1: Distribution of the difference between the true LATE and its estimates: Equation (5) vs OLS vs IV

The OLS estimator is biased because it incorrectly assumes winning the auction to be random. Winning the auction is endogenous because the winning probability depends on the customer type, which affects the bid of the competitor. The IV estimator is also biased, because it incorrectly assumes the participation to be random. The participation is endogenous because of serial correlation: past customer types are not only correlated with current customer types but also affect current participation. Appendix A provides a more detailed description of the simulation process.
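The following is a deliberately simplified simulation in the same spirit, not a reproduction of the design above: every numeric parameter, the block-level regime process, and the win-rate-based throttling rule are hypothetical. It also assumes that "OLS" means the difference in mean outcomes between exposed and unexposed auctions and that "IV" means the pooled Wald estimator that ignores the logged probabilities.

```python
import random

def simulate(n_blocks=400, block_size=500, seed=0):
    """Return logged rows (p, z, d, y) and the in-sample true LATE."""
    rng = random.Random(seed)
    heavy_h = rng.random() < 0.5  # is the current block dominated by H types?
    p = 0.8                       # hypothetical initial participation probability
    rows, num, den = [], 0.0, 0.0
    for _ in range(n_blocks):
        if rng.random() > 0.9:    # regime persists w.p. 0.9 (serial correlation)
            heavy_h = not heavy_h
        wins = entered = 0
        for _ in range(block_size):
            is_h = rng.random() < (0.9 if heavy_h else 0.1)
            # (baseline purchase rate, ad lift, win rate): all hypothetical.
            base, lift, win = (0.40, 0.20, 0.30) if is_h else (0.05, 0.05, 0.80)
            u = rng.random()
            y0, y1 = int(u < base), int(u < base + lift)  # potential outcomes
            d1 = int(rng.random() < win)                  # would win if entered
            z = int(rng.random() < p)                     # throttling draw
            d = z * d1                                    # one-sided compliance
            rows.append((p, z, d, y1 if d else y0))
            num += d1 * (y1 - y0)
            den += d1
            entered += z
            wins += d
        # Hypothetical pacing rule: throttle down after a high-spend block.
        win_rate = wins / entered if entered else 0.0
        p = 0.3 if win_rate > 0.55 else 0.8
    return rows, num / den

def mean(values):
    return sum(values) / len(values)

def itt(rows, col):
    """Mean of column col among z=1 minus its mean among z=0."""
    return (mean([r[col] for r in rows if r[1] == 1])
            - mean([r[col] for r in rows if r[1] == 0]))

def ols(rows):
    return (mean([r[3] for r in rows if r[2] == 1])
            - mean([r[3] for r in rows if r[2] == 0]))

def pooled_iv(rows):
    return itt(rows, 3) / itt(rows, 2)

def throttling_late(rows):
    num = den = 0.0
    for p in {r[0] for r in rows}:
        group = [r for r in rows if r[0] == p]
        num += len(group) * itt(group, 3)
        den += len(group) * itt(group, 2)
    return num / den

rows, true_late = simulate()
print("true LATE      ", round(true_late, 4))
print("throttling LATE", round(throttling_late(rows), 4))
print("OLS            ", round(ols(rows), 4))
print("pooled IV      ", round(pooled_iv(rows), 4))
```

In this toy design the pooled IV estimate is pulled upward because participation is more likely in blocks dominated by H types, whose baseline purchase rate is higher; conditioning on the logged probability removes that channel.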

6 Inference

Quantifying the uncertainty of our estimate is challenging because customer arrivals and advertiser auction participations can all be serially correlated and even nonstationary. Since it is hard to retrospectively characterize the super-population from which impressions are drawn and in-sample treatment effect is of primary business interest, we will treat the sample in the data as fixed (equivalently conditional on the data) and take a design-based inference perspective to quantify uncertainty of our estimated LATE following AbadieAtheyImbensWooldridge2020: given the set of units in the sample, how would the estimated LATE be different from the actual LATE, if the auction participations were assigned differently in a thought experiment. Formally, this design-based uncertainty is defined as:

where is the observed sample for which we are interested in evaluating the advertising effectiveness.

Because auction participations are assigned by the throttling algorithm, the distribution of the estimator and its inference also depends on the algorithm. We focus on a class of throttling algorithms that update the participation probability for the focal advertiser every minutes. We demonstrate how to conduct design-based inference through bootstrap when researchers have access to the exact throttling algorithm, which is the situation for platforms that have perfect information over how the participation probability is generated.

To formally define the throttling algorithm, index the time intervals by , where each interval lasts minutes. Let be the set of auctions that belong to interval , and be the participation probability for all auctions in . The throttling algorithm maps the past history to the participation probability for the next interval:

where is history observed by the platform after the end of interval . Figure 2 gives an example of the DAG of the data generating process.

Figure 2: Data generating process for the throttling algorithm
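To make the mapping from history to participation probability concrete, here is a minimal sketch of one possible throttling rule. It assumes the score is the ratio of affordable auctions to remaining auctions and that the probability takes values on a small grid; the function names, the grid, and the floor at the smallest grid value are all hypothetical and are not taken from the platform's algorithm.

```python
def pacing_score(remaining_budget, expected_cost_per_auction, remaining_auctions):
    """Ratio of auctions the budget can still afford to auctions still to come.

    Hypothetical score: a low value means the budget is being spent too fast.
    """
    if remaining_auctions <= 0:
        return 0.0
    affordable = remaining_budget / expected_cost_per_auction
    return affordable / remaining_auctions


def participation_probability(score, steps=(0.25, 0.5, 0.75, 1.0)):
    """Discrete step function: the largest grid value not exceeding the score.

    Scores below the smallest grid value are floored at that value (an assumption).
    """
    eligible = [s for s in steps if s <= score]
    return max(eligible) if eligible else min(steps)
```

For example, a campaign that can afford 50 of the 100 remaining auctions has a score of 0.5 and would participate in the next interval with probability 0.5 under this rule.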

6.1 Bootstrap Inference

Our inference procedure first generates bootstrap samples that could have been generated under the throttling algorithm. A bootstrap sample of the campaign can be simulated using the bootstrap sampler below. Each bootstrap sample contains an entire sequence of samples from to . We use and to denote data in period with participation and with nonparticipation. We use to denote uniform random sample from dataset with replacement.

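input : Data for each period, split into auctions with participation and auctions with nonparticipation
output : Bootstrap sample of the entire campaign
for each period, from the first to the last, do
       1. Draw the total number of participations based on the current participation probability;
       2. Draw bootstrap samples with participation, uniformly with replacement from the period's data with participation;
       3. Draw bootstrap samples with nonparticipation, uniformly with replacement from the period's data with nonparticipation;
       4. Combine the two sets of draws as all bootstrap samples in the period;
       5. Combine the previous history and the period's bootstrap samples as the history at the end of the period;
       6. Calculate the participation probability in the next period by applying the throttling algorithm to this history
end for
Algorithm 1 Bootstrap sampler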
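The sampler can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact procedure: the data layout (a list of per-period dictionaries), the names `bootstrap_campaign` and `throttle`, and the choice to keep the number of auctions per period fixed at its observed value are all hypothetical.

```python
import random


def _resample(pool, k, rng):
    """Uniform draw of k records from pool, with replacement."""
    if k == 0:
        return []
    if not pool:
        raise ValueError("cannot resample from an empty arm")
    return rng.choices(pool, k=k)


def bootstrap_campaign(periods, throttle, p_initial, rng):
    """Simulate one bootstrap campaign, period by period.

    periods:  list of dicts with keys 'participated' and 'not_participated',
              each holding the logged records of that period (assumed layout).
    throttle: function(history) -> participation probability for the next period.
    """
    history = []
    p = p_initial
    for data in periods:
        # Assumption: the number of auctions per period is held at its observed value.
        n = len(data["participated"]) + len(data["not_participated"])
        # Draw the total number of participations based on p.
        k = sum(rng.random() < p for _ in range(n))
        # Resample each arm from its own subpopulation.
        part = _resample(data["participated"], k, rng)
        nonpart = _resample(data["not_participated"], n - k, rng)
        # Extend the history, then let the throttling rule set the next probability.
        history.append({"p": p, "participated": part, "not_participated": nonpart})
        p = throttle(history)
    return history
```

Calling `bootstrap_campaign` repeatedly with different random states and re-estimating the effect on each returned history yields the bootstrap distribution of the estimate.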

One key difference from the standard bootstrap procedure is that participation cannot be directly simulated from the empirical distribution by sampling with replacement. Instead, it has to be simulated based on the participation probability that is an output of the throttling algorithm. Knowledge of the algorithm is key for conducting inference, because how the participation probability is generated cannot be directly derived from the empirical data.

Another key difference from the standard bootstrap procedure is that in this causal inference set-up, some of the potential outcomes are missing, i.e. we only observe the potential outcome under auction participation or under auction nonparticipation but not both for each impression. We address this missing data problem by drawing bootstrap samples with participation and with nonparticipation using separate subpopulations with participation and with nonparticipation respectively.

After simulating many such bootstrap samples and generating the corresponding bootstrap versions of the estimate, we can construct an estimate of the variance as:

This variance estimate is expected to be conservative because 1) bootstrap is generally known to be valid for asymptotic inference under the superpopulation model hong_inference_2020, and 2) the standard error estimated under a superpopulation model can be used as a conservative standard error for our sample LATE under a finite population model (imbens_nonparametric_2003 and AbadieAtheyImbensWooldridge2020).

Given the bootstrap resamples, we can also construct a bootstrap confidence interval using a percentile method:

in which and are -percentile and -percentile of the empirical distribution .
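Given a list of bootstrap estimates, the variance and the percentile interval can be computed as in the sketch below. It assumes the variance is centered at the mean of the bootstrap estimates and uses a simple nearest-rank rule for the percentiles; both choices, and the function names, are illustrative and may differ from the exact formulas used here.

```python
import math
import statistics


def bootstrap_variance(estimates):
    """Variance of the bootstrap estimates around their own mean (assumed centering)."""
    return statistics.pvariance(estimates)


def percentile_interval(estimates, alpha=0.05):
    """Percentile-method interval using nearest ranks, widened outward."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    lower = ordered[math.floor((alpha / 2) * last)]
    upper = ordered[math.ceil((1 - alpha / 2) * last)]
    return lower, upper
```

Rounding the lower rank down and the upper rank up makes the interval slightly wider rather than narrower when the number of bootstrap draws is small.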

6.2 Inference Based on Asymptotic Normal Approximation

The bootstrap inference method proposed above works well in general but requires access to the throttling algorithm. Here we propose an alternative inference method based on large-sample asymptotic normal approximation. This method does not require access to the throttling algorithm and is provably valid under the additional assumption that participation probabilities are a priori fixed. Though there is no formal guarantee that the confidence interval based on this method has the correct coverage when the participation probabilities are adaptive, our simulation suggests the method often works well empirically for large campaigns. As a result, we believe this is a reasonable inference method in practice.

Our large-sample approximation is based on the following representation:

Suppose participation probabilities are fixed. Since all the s and s are sample means or differences of sample means, each of them is asymptotically normal. As a smooth function of sample means, is also asymptotically normal. More precisely,

where is the asymptotic variance. Proposition 2 in Appendix B makes this statement rigorous and presents a proof outline based on the delta method and the central limit theorem. Since the variance of the estimator can be approximated by

intuitively one can estimate the variance by a plug-in estimator

Equation (7) in Appendix B presents a full specification of the variance estimator.
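As a generic illustration of the delta-method step, and not the exact expression in Appendix B, suppose an estimator is a ratio of two jointly asymptotically normal sample quantities, as is typical of IV-type estimators. The symbols $a$, $b$, $\sigma_a^2$, $\sigma_b^2$, and $\sigma_{ab}$ below are illustrative and are not the paper's notation.

```latex
\hat{\tau} = g(\hat{a}, \hat{b}) = \frac{\hat{a}}{\hat{b}},
\qquad
\nabla g(a, b) = \left( \frac{1}{b},\; -\frac{a}{b^{2}} \right)^{\top},
\qquad
\operatorname{Var}(\hat{\tau}) \approx
\nabla g^{\top} \Sigma \, \nabla g
= \frac{\sigma_{a}^{2}}{b^{2}}
- \frac{2 a \, \sigma_{ab}}{b^{3}}
+ \frac{a^{2} \sigma_{b}^{2}}{b^{4}}
```

A plug-in estimator then replaces $a$, $b$, and the entries of $\Sigma$ with their sample counterparts.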

This variance estimator may slightly overstate the variance, because the standard sampling-based variance estimators of intent-to-treat effects tend to be conservative for finite-sample design-based inference AbadieAtheyImbensWooldridge2020.

Whether the confidence interval based on this estimated variance and asymptotic normality works depends on how participation probabilities vary in practice and on whether we have large samples for each . We find that this confidence interval often works particularly well for large advertising campaigns. The large sample size and relatively smooth change in participation probabilities of large campaigns fit the assumptions of this method well.

Table 2 shows the coverage using this asymptotic-normality-based approach and the bootstrap approaches. Given our throttling algorithm in Section 5, both methods provide a conservative estimate of the true variance.

Standard Error 0.0195 0.0205 0.0245
Coverage
Table 2: Simulation Results With Coverage For Nominal Confidence Intervals

The magnitude of under/over coverage of the asymptotic-normality-based inference method depends on the exact throttling algorithm and the serial correlation of customer types. Appendix B provides some intuition on why there may be over coverage in our simulation exercise.

7 Empirical Application

We apply our method to a campaign run on a large e-commerce advertising platform. The platform sells ads through auction and uses probabilistic throttling as a budget control mechanism. The campaign usually has low budget near the end of the day, triggering the throttling mechanism. We analyze the campaign data from pm-pm of each day, for a total of 9 days. Figure 3 shows how the participation probability is different at pm across different days.

Figure 3: Participation Probability Vs Time

Our outcome of interest is whether the customer has visited the product page related to the advertiser, regardless of whether the visit came through an ad or organic search. For each hour, we observe all auctions in which the focal campaign participated, as well as a sample of auctions in which it did not. This nonparticipation is primarily driven by either throttling or not meeting the targeting criteria. Ideally, the platform would have logged whether each auction meets the targeting criteria, so that we could observe all auctions in which the focal campaign could have participated in the absence of throttling. Unfortunately, this information is usually not logged, because its value and how it can be applied had not been clearly documented prior to our paper. To demonstrate the value of our approach, we impute these throttling-induced non-participations based on whether the focal campaign’s top-20 competitors have participated and how much these top competitors bid (footnote 4). If a set of auctions has exactly the same top competitors and bids, but the focal campaign participated in only some of them, then we label the remaining auctions as throttling-induced non-participation (footnote 5). In our empirical setting, we cannot find all of these nonparticipated units, because we observe only a sub-sample of the auctions in which the focal campaign did not participate. To ensure our estimate is valid, we weight the matched non-participated units so that their weights add up to . We then apply our estimation method to this subset of auctions, where we can match participated auctions with non-participated auctions. In practice, to avoid this imputation step, which may introduce additional uncertainty, we recommend that the platform directly log the auctions from which advertisers were throttled out.

Footnote 4: The top competitors are measured by how frequently they appear in the auctions in which the focal advertiser participated.

Footnote 5: This prediction procedure is most similar to matching, but there are several differences: 1) the source of variation is directly known by researchers because of throttling; 2) instead of predicting units that could have received the treatment, the ad exposure, we are predicting the units that could have received the instrument, the auction participation; 3) the expected number of nonparticipated units to be matched to the participated units is directly informed by the throttling algorithm: .
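The labeling rule described above can be sketched as follows. This is a simplified illustration that assumes each logged auction carries the list of top-competitor bids and a participation flag; the record layout, the field names, and the function `label_throttled` are hypothetical, and the weighting step is omitted.

```python
from collections import defaultdict


def label_throttled(auctions):
    """Flag non-participated auctions that share a competitor signature with a participated one.

    auctions: list of dicts with
        'competitor_bids': iterable of (competitor_id, bid) pairs for the top competitors
        'participated':    True if the focal campaign took part in the auction
    Returns only auctions whose signature group contains at least one participated auction.
    """
    groups = defaultdict(list)
    for auction in auctions:
        # The signature is the exact set of top competitors and their bids.
        signature = tuple(sorted(auction["competitor_bids"]))
        groups[signature].append(auction)

    labeled = []
    for members in groups.values():
        if any(m["participated"] for m in members):
            for m in members:
                labeled.append({**m, "throttled": not m["participated"]})
    return labeled
```

Auctions whose signature never coincides with a participated auction are dropped, which mirrors restricting the analysis to the matched subset.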

Table 3 compares our method with standard observational methods that do not account for the reasons behind nonparticipation. Column (1) shows the OLS estimate that incorrectly assumes the ad exposure to be random. Column (2) shows the IV estimate that incorrectly assumes the auction participation to be a valid IV. Column (3) shows our method in Section 4 that leverages the knowledge of the throttling algorithm, which allows us to use participation as a conditional IV and excludes the auctions that the advertiser would never have participated in without the budget constraint.

Dependent variable: Product Page View
                    OLS          IV           Our Method
                    (1)          (2)          (3)
Treatment Effect    0.019        0.044        0.011
                    (0.0004)     (0.0021)     (0.0027)
Observations        1,257,793    1,257,793    995,311
Note: *p<0.1; **p<0.05; ***p<0.01
Table 3: Estimates of LATE

Compared to our estimate, the OLS and the incorrect IV both overestimate the treatment effect, by 72% and 300% respectively. This result is consistent with related literature that compares standard observational methods with experimental data (blake_consumer_2015, gordon_comparison_2019, and gordon_close_2022), which finds that standard observational methods tend to overstate the benefit of advertising. To benchmark the degree of overestimation, we follow gordon_close_2022 and report the conversion lift of the estimated treatment effect, defined as

Both and can be estimated from the data; we report the estimated baseline conversion rate and the conversion lift in Table 4.

                       OLS      IV        Our Method
Treatment effect       0.019    0.044     0.011
Baseline conversion    0.003    -0.022    0.010
Conversion lift        666%     NA        110%
Table 4: Method Comparison

The conversion lift for the IV estimate cannot be calculated because the estimated baseline conversion is negative, which is impossible because the minimum conversion rate is 0. This negative baseline conversion directly rejects the hypothesis that participation is a valid instrument. The conversion lift estimated using OLS, 600%, is much higher than the conversion lift estimated using our method, 110%, suggesting that OLS may underestimate the baseline conversion rate. This underestimation arises because ad exposure is not random and may be positively correlated with the potential outcome, such that .
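The lift calculation can be sketched as below, assuming the conversion lift is the treatment effect divided by the baseline conversion rate, which is consistent with the "Our Method" column of Table 4. The function name is hypothetical, and returning `None` for a non-positive baseline is a convention chosen here to mirror the NA entry.

```python
def conversion_lift(treatment_effect, baseline_conversion):
    """Treatment effect relative to the baseline conversion rate.

    Returns None when the baseline is not positive, since the lift is then undefined.
    """
    if baseline_conversion <= 0:
        return None
    return treatment_effect / baseline_conversion


print(conversion_lift(0.011, 0.010))   # our method, roughly 1.1 (110%)
print(conversion_lift(0.044, -0.022))  # IV, None
```

Because the table entries are rounded to three decimals, recomputing lifts from them will not necessarily reproduce the reported percentages.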

8 Conclusion

We demonstrate that probabilistic auction throttling, a budget control mechanism widely used by advertising platforms, induces pervasive quasi-experimental variation in auction participation. To leverage this variation for measuring the effectiveness of digital advertising, we propose a weighted IV estimator, where auction participation is a conditional instrumental variable. To account for the uncertainty in this novel setting, we also develop a bootstrap procedure that leverages the knowledge of the throttling algorithm to account for design-based uncertainty.

We use simulation to show that our method causally identifies the LATE. We then apply our method to an advertising campaign from an e-commerce advertising platform that used a probabilistic throttling algorithm. We show that our LATE estimate is significantly different from estimates obtained using other standard observational methods, including OLS and IV. Our estimated conversion lift is 110%, a more plausible number than 600%, the conversion lift estimated using naive observational methods. These results suggest that standard observational methods generate biases in our context and highlight the need to leverage the exogenous variation generated by the throttling algorithm.

9 Future Extensions: Multiple Ad Exposure

Because our primary goal is to illustrate the value of the variation induced by probabilistic throttling, we mainly demonstrated the case where the campaign has one advertising opportunity for each user. By an “advertising opportunity,” we mean an auction that the advertiser could potentially participate in, henceforth referred to as a “potential auction.” The procedure we have outlined can be extended easily to allow for cases when the campaign has multiple opportunities to advertise to each user.

The idea is to first divide users into distinct groups based on the number of potential auctions associated with each user who meets the focal advertiser’s targeting criteria. For example, group 1 would comprise users with exactly 1 potential auction, group 2 would comprise users with exactly 2 potential auctions, and so forth. The key is to stratify users based on the potential, rather than the actual, number of auctions in which the advertiser could have participated for that user. This stratification can be thought of as a specific form of principal stratification (frangakis_principal_2002), except that the potential number of auctions may be directly observed in the data. For example, if the platform has logged all eligible auctions that meet the advertiser’s targeting criteria, then we directly observe the set of all potential auctions, and the number of potential auctions for a given user can be calculated by counting the potential auctions associated with that user. After this stratification, we can apply our estimator separately to each group to estimate a group-specific LATE. The advertiser could then use these group-specific LATEs as is, or combine them by taking a weighted average to generate an overall LATE across all groups. Intuitively, this procedure allows advertisers to compare users who are associated with the same number of advertising opportunities and participation probabilities, but who are exposed to different numbers and types of ads due to throttling and budget limitations. This has important managerial implications for advertisers, because they can better understand the ROI of different types of ads, how they interact with each other, and the optimal intensity of ads.
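The stratify-then-combine procedure can be sketched as follows. The sketch assumes the number of potential auctions per user is already known and that group-specific LATEs are combined with weights proportional to group size; the text does not fix the weights, so this choice and the function names are assumptions.

```python
from collections import defaultdict


def stratify_users(potential_auctions_by_user):
    """Group users by their number of potential auctions."""
    groups = defaultdict(list)
    for user, count in potential_auctions_by_user.items():
        groups[count].append(user)
    return dict(groups)


def combine_group_lates(group_lates, group_sizes):
    """Weighted average of group-specific LATEs, weighted by group size (assumed weights)."""
    total = sum(group_sizes[g] for g in group_lates)
    return sum(group_lates[g] * group_sizes[g] for g in group_lates) / total
```

In use, the estimator would be applied separately to the users in each group returned by `stratify_users`, and the resulting estimates passed to `combine_group_lates` together with the group sizes.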

References

Appendix A Simulation Detail

Figure 4 shows the arrival of consumers. Because of auto-correlation, similar consumers tend to arrive at similar times.

Figure 4: Customer Type Vs Arrival Time

Figure 5 shows how the budget evolves over time. Because of the throttling algorithm, the budget is not exhausted until the end of the campaign.

Figure 5: Remaining Budget Vs Arrival Time

Figure 6 shows how the participation probability changes over time due to the change in expenditure rate in the last interval.

Figure 6: Remaining Budget Vs Arrival Time

This change is mainly driven by the type of customers, as illustrated in Figure 7.

Figure 7: Remaining Budget Vs Arrival Time

Appendix B Inference Assuming Fixed Participation Probability

The inference would be easy if the participation probability were a priori fixed and the realized participation were the only source of variation. Under the assumption that participation probabilities are fixed, our statistical inference problem reduces to a stratified experiment with partial compliance, in which strata are defined by participation probabilities.

Proposition 2.

Suppose the participation probabilities are a priori fixed. Let be the total sample size. As and s go to infinity at the same rate, the estimator is asymptotically normal:

in which the asymptotic variance is

where , , .

Proof.

We outline the proof as follows. Under proper regularity conditions for a central limit theorem, we have

in which

Apply delta method to the following expression

We get

in which the asymptotic variance is

where the gradient is

More explicitly,

To draw inference based on this asymptotic normality result, we need to estimate the asymptotic variance. Define the following estimator

(7)

where , , and are defined as

The terms on the right-hand side of the equations are

The variance estimator presented in the main text and reported in the simulation is

The confidence interval used in the simulation is the following

(8)

The assumption of non-stochastic participation probabilities is plausible when the algorithm dictates the schedule of the participation probability before the campaign starts: for example, the participation probability is scheduled to be before and after , regardless of the remaining budget.

Though the assumption of non-stochastic participation probability does not hold for throttling algorithms that adaptively adjust participation probabilities, the inference method will still work well when the unpredictable variation in participation probabilities is only moderate. In particular, this inference method often works well for large advertising campaigns, whose participation probabilities follow a predictable temporal pattern.

In our simulation exercise, because the sample size is large and our algorithm uses average expenditure as the main input, the additional design-based uncertainty induced by the variation of the participation probability is small. Because some of the variance estimators for s are conservative when there is no variation in the participation probability, the confidence interval is conservative when the variation in participation probability is small.