Online advertisements are the main source of revenue for social-networking sites and search engines such as Google. Ad exchange platforms allow advertisers to select the target audience for their ad by specifying desired user demographics, interests and browsing histories. Every time a user loads a webpage or enters a search term, bids are collected from relevant advertisers, and an auction is conducted to determine which ad is shown and how much the advertiser is charged [18, 31]. As it is not practical for advertisers to place individual bids for every user, the advertiser instead gives some high-level preferences about their budget and target audience, and the platform places bids on their behalf.

More formally, consider a set of advertisers and a set of user types. Each advertiser specifies their target demographic, average bid, and budget to the platform, which then decides, for each advertiser and each user type, a distribution of that advertiser's bids for that type. These distributions represent the value of the user to the advertiser, and ensure that the advertiser only bids for users in their target demographic, with the expected bid not exceeding the amount specified by the advertiser. At each time step, a user visits a web page (e.g., Facebook or Twitter), the user’s type is observed, and a bid is drawn for each advertiser from that advertiser's distribution for the observed type. Receiving these bids as input, the mechanism decides an allocation and price for the advertisement slot.

Overall, such targeted advertising leads to higher utilities for the advertisers who show content to relevant audiences, for the users who view related advertisements, and for the platform which can benefit from selling targeted advertisements [8, 30, 9, 10]. However, targeted advertising can also lead to discriminatory practices. For instance, searches with “black-sounding” names were much more likely to be shown ads suggestive of an arrest record. Another study found that women were shown fewer advertisements for high paying jobs than men with similar profiles.
In fact, recent experiments demonstrated that ads can be inadvertently discriminatory; they found that STEM job ads, specifically designed to be unbiased by the advertisers, were shown to more men than women across all major platforms (Facebook Ads, Google Ads, Instagram and Twitter). On Facebook, a platform with women  the advertisement was shown to more men than women. They suggest that this is a result of competitive spillovers among advertisers, and is neither a pure reflection of pre-existing cultural bias, nor a result of human input to the algorithm. Such (inadvertent) discrimination has led to two recent cases filed against Facebook, which will potentially lead to civil lawsuits alleging employment and housing discrimination [14, 25, 21].

To gain intuition, consider the setting in which there are two advertisers with similar bids/budgets, but one advertiser specifically targets women (which is allowed for certain types of ads, e.g., related to clothing), while the second advertiser does not target based on gender (e.g., because they are advertising a job). The first advertiser creates an imbalance on the platform by taking up ad slots for women and, as a consequence, the second advertiser ends up advertising to disproportionately fewer women and is inadvertently discriminatory. Currently, online advertising platforms have no mechanism to check this type of discrimination. In fact, the only way around this would be for the advertiser to set up separate campaigns for different user types and ensure that each one reached a similar number of the sub-target audience. However, doing so would violate discrimination rules as, in itself, each sub-advertisement would be discriminatorily selecting for a specific demographic.
Our main contribution is an optimization-based framework which maximizes the revenue of the platform subject to constraints that prevent the emergence of inadvertent discrimination as described above. The constraints can be formulated as any one of a wide class of “fairness” constraints presented in prior work. The framework allows for intersectionality, allowing constraints across multiple sensitive attributes (e.g., gender, race, geography and economic class), and allows for restricting different advertisers to different constraints.

Formally, building on Myerson’s seminal work, we characterize the truthful revenue-optimal mechanism which satisfies the given constraints (Theorem 3.1). The user types, as defined by their sensitive attributes, are taken as input along with the type-specific bid distributions for each advertiser, and we assume that bids are drawn from these distributions independently. Our mechanism is parameterized by constant “shifts” which it applies to bids for each advertiser-type pair. Finding the parameters of this optimal mechanism, however, is an optimization problem that is non-convex in both its objective and its constraints. Towards solving this, we first propose a novel reformulation of the objective as a composition of a convex function constrained on a polytope, and an unconstrained non-convex function (Theorem 3.2). Interestingly, the non-convex function is reasonably well behaved, with no saddle-points or local-maxima. This allows us to develop a gradient descent based scheme (Algorithm 1) to solve the reformulated program, which under mild assumptions has a fast convergence rate (Theorem 3.3).

We evaluate our approach empirically by studying the effect of the constraints on the revenue of the platform and the advertisers using the Yahoo! Search Marketing Advertising Bidding Data.
We find that our mechanism can obtain uniform coverage across different user types for each advertiser while losing less than 5% of the revenue (Figure 3(b)). Further, we observe that the total-variation distance between the fair and unconstrained distributions of total advertisements an advertiser shows on the platform is less than (Figure 4). To the best of our knowledge, we are the first to give a framework to prevent inadvertent discrimination in online auctions.
2 Our Model
We refer the reader to the excellent treatise on mechanism design for a detailed discussion of the preliminaries.
A mechanism is defined by its allocation rule and its payment rule.
Among these, truthful mechanisms are those in which revealing the true valuation is optimal for all bidders.
The allocation rule of any truthful mechanism is monotone in each bidder's bid.
Further, it can be shown that for any mechanism there is a truthful mechanism which offers the same revenue to the seller, and the same utility to each bidder.
As such, we restrict ourselves to truthful mechanisms.
It is a well-known fact that for any truthful mechanism the payment rule is uniquely defined by its allocation rule.
Hence, for any truthful mechanism our only concern is the allocation rule.
Let $F$ be the distribution of the valuation $v$ of a bidder, with probability density function $f$ and cumulative distribution function also written $F$; then we define the virtual valuation $\phi(v)$ as $\phi(v) := v - \frac{1 - F(v)}{f(v)}$. We say $F$ is regular iff $\phi(v)$ is non-decreasing in $v$. Likewise, we say $F$ is strictly regular iff $\phi(v)$ is strictly increasing in $v$.
Myerson’s Optimal Mechanism.
Myerson’s mechanism is defined as the VCG mechanism [3, 13, 28] in which each bidder's virtual valuation is submitted as that bidder's bid. If the valuations, and therefore the virtual valuations, are independent, then for any truthful mechanism the virtual surplus is equal to the revenue in expectation. Since VCG is surplus maximizing, if Myerson’s mechanism is truthful then it maximizes the revenue.
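As a minimal illustrative sketch (not taken from the original text), the snippet below computes virtual valuations for two textbook distributions and applies the allocation step described above: the slot goes to the bidder with the highest virtual valuation. The function names, the choice of distributions, and the non-negativity check (the usual reserve behavior of Myerson's auction) are assumptions made for illustration.

```python
import math

def virtual_valuation(v, pdf, cdf):
    """phi(v) = v - (1 - F(v)) / f(v) for a valuation v with density f and CDF F."""
    return v - (1.0 - cdf(v)) / pdf(v)

# Uniform[0, 1]: f(v) = 1 and F(v) = v, so phi(v) = 2v - 1 (strictly increasing).
def uniform_pdf(v):
    return 1.0

def uniform_cdf(v):
    return v

# Exponential(rate): f(v) = rate * exp(-rate * v), F(v) = 1 - exp(-rate * v),
# so phi(v) = v - 1 / rate (strictly increasing).
def exp_pdf(rate):
    return lambda v: rate * math.exp(-rate * v)

def exp_cdf(rate):
    return lambda v: 1.0 - math.exp(-rate * v)

def myerson_winner(valuations, pdfs, cdfs):
    """Index of the bidder with the highest virtual valuation, or None if it is negative.

    The None case is an assumption of this sketch: it models the standard
    reserve behavior, where the item stays unsold if no virtual valuation
    is non-negative.
    """
    phis = [virtual_valuation(v, f, F) for v, f, F in zip(valuations, pdfs, cdfs)]
    best = max(range(len(phis)), key=lambda i: phis[i])
    return best if phis[best] >= 0 else None
```

For two Uniform[0, 1] bidders with valuations 0.9 and 0.6, the virtual valuations are 0.8 and 0.2, so the first bidder wins; with valuations 0.4 and 0.3 both virtual valuations are negative and, under the assumed reserve behavior, nobody wins.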
We represent the distribution of valuations of advertiser for user type by . Let be the virtual valuation of advertiser for user type , be its probability density function, and be its cumulative distribution function. We denote the joint virtual valuation of all advertisers for user type , by , and its joint probability density function by . The user types , are distributed according to the known distribution , and the mechanism’s allocation rule for user type , is denoted by .
2.1 Fairness Constraints
We would like to guarantee that advertisers have a fair coverage across user types. We do so by placing constraints on the coverage of an advertiser. Formally, we define an advertiser ’s coverage of user type as the probability that advertiser wins the auction conditioned on the user being of type .
where is the -th component of .
Towards ensuring that an advertiser has a fair coverage of different user types, we consider the proportional coverage of the advertiser on each user type. Given vectors, , we define -fairness constraints for each advertiser and user type , as a lower bound , and an upper bound , on the proportion of users of type the advertiser shows ads to, i.e., we impose the following constraints,
|(-fairness constraints, 2)|
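To make the constraint concrete, here is a small sketch under stated assumptions: coverage is stored as a matrix `coverage[i][j]` (probability that advertiser `i` wins given user type `j`), `type_probs[j]` is the probability of type `j`, and the proportion of type-`j` users among those shown advertiser `i`'s ad is obtained by Bayes' rule. The function names and the data layout are hypothetical, and the sketch assumes every advertiser wins with positive probability.

```python
def proportional_coverage(coverage, type_probs):
    """share[i][j] = P(user type j | advertiser i wins), computed via Bayes' rule.

    coverage[i][j] = P(advertiser i wins | user type j)
    type_probs[j]  = P(user type j)
    Assumes each advertiser has a positive overall win probability.
    """
    shares = []
    for row in coverage:
        total = sum(p * q for p, q in zip(type_probs, row))
        shares.append([p * q / total for p, q in zip(type_probs, row)])
    return shares

def satisfies_fairness(coverage, type_probs, lower, upper, tol=1e-12):
    """Check lower[i][j] <= share[i][j] <= upper[i][j] for every advertiser and type."""
    shares = proportional_coverage(coverage, type_probs)
    return all(
        lower[i][j] - tol <= share <= upper[i][j] + tol
        for i, row in enumerate(shares)
        for j, share in enumerate(row)
    )
```

With two equally likely user types and coverage `[[0.6, 0.2], [0.4, 0.8]]`, the first advertiser shows 75% of its ads to the first type, so bounds of 0.3 and 0.7 are violated while bounds of 0.2 and 0.8 are met.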
2.2 Discussion of Fairness Constraints
Returning to the example presented in the introduction, we can ensure that the advertiser shows % of total ads to women, by choosing a lower bound of for this advertiser on women. More generally, for user types, moderately placed lower bounds and upper bounds ( and ), for some subset of advertisers, ensure this subset has a uniform coverage across all user types, while allowing other advertisers to target specific user types. Importantly, while ensuring fairness across multiple user types our constraints allow for targeting within any single user type. This is vital as the advertiser may not derive the same utility from each user, and could be willing to pay a higher amount for more relevant users in the same user type. For example, if the advertiser is displaying job ads, then a user already looking for job opportunities may be of a higher value to the advertiser than one who is not. For a detailed discussion on how such constraints can encapsulate other popular metrics, such as risk-difference, we refer the reader to .
2.3 Optimization Problem
We would like to develop a mechanism which maximizes the revenue while satisfying the upper and lower bound constraints in Equation (2). Towards formally stating our problem, we define the revenue of mechanism , with an allocation rule , for user type as
where , and are the -th component of , and respectively. Thus, we can express our optimization problem with respect to functions , or as an infinite dimensional optimization problem as follows. (Infinite-dimensional fair advertising problem). For all user types , find the optimal allocation rule for,
where (5) encodes the lower bound constraints, (6) encodes the upper bound constraints, and (7) ensures that only one ad is allocated. In the above problem, we are looking for a collection of optimal continuous functions . To be able to solve this problem, we need, at the least, a finite dimensional formulation of the fair online advertisement problem.
3 Theoretical Results
Our first result is structural, and gives a characterization of the optimal solution , to the infinite-dimensional fair advertising problem, in terms of a matrix , making it a finite-dimensional optimization problem with respect to .
Theorem 3.1 (Characterization of an optimal allocation rule). There exists an , such that, if for all is strictly regular and independent, then , the set of allocation rules , defined below, is optimal for the infinite-dimensional fair advertising problem.
|(-shifted mechanism, 8)|
where we break ties randomly, if any (this is equivalent to the allocation rule of the VCG mechanism).
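The allocation rule can be sketched as follows, assuming (as the later discussion states) that the ad slot goes to the advertiser with the highest shifted virtual valuation for the observed user type. The helper names, the Monte Carlo coverage estimate, and the sampler interface are hypothetical and only meant to illustrate how a shift changes coverage.

```python
import random

def shifted_winner(virtual_vals, shifts, rng=random):
    """Advertiser with the largest shifted virtual valuation; ties broken uniformly at random."""
    scores = [phi + a for phi, a in zip(virtual_vals, shifts)]
    top = max(scores)
    winners = [i for i, s in enumerate(scores) if s == top]
    return rng.choice(winners)

def estimate_coverage(samplers, shifts, n_samples=20000, seed=0):
    """Monte Carlo estimate of each advertiser's win probability for one user type.

    samplers[i](rng) draws one virtual valuation for advertiser i (an assumed interface).
    """
    rng = random.Random(seed)
    wins = [0] * len(samplers)
    for _ in range(n_samples):
        phis = [draw(rng) for draw in samplers]
        wins[shifted_winner(phis, shifts, rng)] += 1
    return [w / n_samples for w in wins]

# Virtual valuation of a Uniform[0, 1] valuation v is 2v - 1.
def uniform_virtual(rng):
    return 2.0 * rng.random() - 1.0
```

For two identically distributed advertisers, `estimate_coverage([uniform_virtual, uniform_virtual], [0.0, 0.0])` is close to an even split, while a shift of `[0.5, 0.0]` moves coverage toward the first advertiser. This is the lever the mechanism uses to meet coverage constraints.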
We present the proof of Theorem 3.1 in Section 6.1. In the proof, we analyze the dual problem of the infinite-dimensional fair advertising problem. We reduce the dual problem to one Lagrangian variable, by fixing the Lagrangian variables corresponding to the lower bound (5) and upper bound (6) constraints to their optimal values. The resulting problem turns out to be the dual of the unconstrained revenue maximizing problem, for which Myerson’s mechanism is the optimal solution. We interpret the fixed Lagrangian variables as shifting the original virtual valuations, . It then follows that for some , the -shifted mechanism (8) is the optimal solution to the infinite-dimensional fair advertising problem.

Now, our task is reduced from finding an optimal allocation rule, to finding an characterizing the optimal allocation rule. Towards this, let us define the revenue, , and coverage , as functions of .
|(Revenue -shifted mechanism, 9)|
|(Coverage -shifted mechanism, 10)|
Depending on the nature of the distribution, the gradients and may not be monotone functions of the (e.g., consider the exponential distribution). Therefore, in general neither is a concave, nor is a convex function of (see Section B for a concrete example). Hence, this optimization problem is non-convex both in its objective and in its constraints. We require further insights to solve the problem efficiently.

Towards this we observe that revenue is a concave function of the coverage. Consider two allocation rules obtaining coverages , and revenues respectively. If we use the first with a probability (and the second otherwise), then we achieve a coverage , and a revenue . Therefore, the optimal allocation rule achieving , has at least revenue. Choosing the allocation rules which maximize the revenue for and respectively, this argument shows that revenue is a concave function of the coverage . Let , be the maximum revenue of the platform as a function of coverage . (Footnote 2: We drop for some and each . This is crucial to calculate , see Remark 4.1.) Consider the following two optimization problems. (Optimal coverage problem). Find the optimal for,
(Optimal shift problem). Given the target coverage , find the optimal for,
Our next result relates the solution of the above two problems with the infinite-dimensional fair advertising problem.
Theorem 3.2. Given a solution to the optimal coverage problem, the solution , to the optimal shift problem with , defines an optimal -shifted mechanism (8) for the infinite-dimensional fair advertising problem.
Adding the all vector, , to for any , does not change the allocation rule of the -shifted mechanism. Thus, it suffices to show that for all , there is a unique , such that, and for all . We first show that for all , there is at least one . In fact, the greedy algorithm which increases all , such that, and finds the required . Consider distinct , such that, and . We can show that , by considering the advertiser and user type pair whose shift changes by the largest magnitude. We can show that , thereby proving that . ∎
The above theorem allows us to find the optimal , by solving the optimal coverage problem and optimal shift problem. Towards this, let us consider the optimal coverage problem. We already know that its objective is a concave function. We can further observe that its constraints are linear in . In particular, the constraints define a polytope, , which we refer to as the constraint-polytope. Therefore, it is a convex program, and a possible direction to solve this program is to use gradient-based methods. The trouble is that we do not have direct access to . A key idea is that, if we let , then we can calculate by solving the following linear-system,
where is the Jacobian of , with respect to . (Footnote 3: represents the vectorization operator.) It turns out that this Jacobian is invertible for all , and therefore the above linear system has an exact solution (see Section 4.1 for the details).

Let us consider the optimal shift problem. The objective of the problem is non-convex (see Figure 8(b) and Section B for an example). Interestingly, is a linear combination of for all and . Since the rows of the Jacobian, are linearly independent, the gradient is never zero unless we are at the global minimum where . This guarantees that the objective function does not have any saddle-points or local-maxima, and that any local-minimum is a global minimum. Using this we can develop efficient algorithms to solve the optimal coverage problem (Lemma 6.2). This brings us to our main algorithmic result, which is an algorithm to find the optimal allocation rule for the infinite-dimensional fair advertising problem.
Theorem 3.3 (An algorithm to solve the infinite-dimensional fair advertising problem). There is an algorithm (Algorithm 1) which outputs , such that, if assumptions (16), (17), (18), and (19) are satisfied, then the -shifted mechanism (8) achieves a revenue -close to the optimal for the infinite-dimensional fair advertising problem in
Where the arithmetic calculations in each step are bounded by calculating once, and hides factors in and .
Roughly, the above algorithm has a convergence rate of , under the assumptions which we list below.
Assumption (16) guarantees that all advertisers have at least an probability of winning on every user type, assumption (17) places lower and upper bounds on the probability density functions of the , assumption (18) guarantees that the probability density functions of the are -Lipschitz continuous, and assumption (19) assumes that the expected is bounded. We expect Assumptions (16) and (19) to hold in any real-world setting. We can drop the lower bound in Assumption (17) by introducing “jumps” in , to avoid ranges where the measure of bids is small. Removing assumption (18) would be an interesting direction for future work.
We inherit the assumption of independent and regular distributions from Myerson. In addition, we require that the distributions of valuations are strictly regular to guarantee that ties between bidders happen with zero probability. (Footnote 4: We can drop this assumption by incorporating specific randomized tie-breaking rules, that retain fairness.) We note that the above allocation rule is monotone and allocates the ad spot to the bidder with the highest shifted valuation for a given user. Thus it defines a unique truthful mechanism and the corresponding payment rule. We refer to as the shift of the mechanism.
4 Our Algorithm
Algorithm 1 solves the optimal coverage problem by performing a projected gradient descent in the constraint polytope, . It starts with an initial coverage, , and the corresponding approximate shift . (Footnote 5: We note that one advertiser’s shift is fixed to , see Remark 4.1.) At the step , it calculates the gradient of , by solving the linear system . In order to solve the above linear system, we need to calculate and . This can be done in steps, if is known (see Section 4.1). Therefore, the algorithm requires a “good” approximation of at each step. It maintains this, by updating the previous approximation, , at the -th iteration. It does so by using another algorithm (Algorithm 2) to approximately solve the optimal shift problem. After calculating the gradient, it takes a gradient step and projects the current iterate on . This takes time (see Section 4.2), where is the fast matrix multiplication coefficient. It continues this process for iterations to get an -accurate . This approximation of determines a solution to the infinite-dimensional fair advertising problem. By ensuring Algorithm 2 is accurate, we can bound the total error introduced by approximations of , and preserve the convergence rate of Algorithm 1. Next we give the details of projecting on , and calculating the gradient .
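The outer loop described above (gradient step, then projection back onto the feasible set) can be illustrated with a deliberately simplified toy. This is not Algorithm 1: the gradient oracle is a user-supplied function rather than the solution of the linear system, a box of per-coordinate bounds stands in for the constraint polytope, and the update is written as ascent because the toy objective is maximized. All names are hypothetical.

```python
def project_box(q, lower, upper):
    """Euclidean projection onto the box {q : lower <= q <= upper}, done coordinate-wise."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(q, lower, upper)]

def projected_gradient_ascent(grad, q0, lower, upper, step=0.1, iters=200):
    """Repeat: take a gradient step on a concave objective, then project onto the box."""
    q = project_box(q0, lower, upper)
    for _ in range(iters):
        g = grad(q)
        q = project_box([x + step * gx for x, gx in zip(q, g)], lower, upper)
    return q

# Toy concave objective: f(q) = -sum((q_i - c_i)^2), with gradient -2 (q - c).
def toy_grad(target):
    return lambda q: [-2.0 * (x - c) for x, c in zip(q, target)]
```

With an unconstrained maximizer at `[0.9, 0.1]` and bounds `[0.3, 0.7]` on each coordinate, the iterates settle at `[0.7, 0.3]`, the feasible point closest to the unconstrained maximizer.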
4.1 Calculating and Bounding
Let be the Jacobian of the vectorized coverage, , with respect to the vectorized shift, . Here, we fix the shift of one advertiser for each user type . (Footnote 6: In fact, we can relax this condition, by removing advertisers who have zero probability of winning.) Therefore, the Jacobian is a matrix.
|(Gradient oracle, 20)|
The Jacobian, , is invertible only when we fix the shift of one advertiser for each user type . Intuitively, if we increase the for all advertisers and one user type, the coverage remains invariant. As such, we cannot hope for the to be invertible without fixing one for each .
(Jacobian is invertible). For all , if all advertisers have non-zero coverage for all user types , then the Jacobian , is invertible.
The coverage remains invariant if the bids of all advertisers are uniformly shifted for any given user type . Therefore, for all we have
Since, increasing the shift , does not increase the coverage for any , we have that
Now, from Equation (21) we have
Further since the -th advertiser has non-zero coverage, i.e., there is non-zero probability that advertiser bids higher than all other advertisers, changing must affect all other advertisers. In other words, for all . Using this we have,
By observing that , on user type , is independent of the , of any user type , such that, , i.e.,
and using Equation (23), we get that the Jacobian, is strictly diagonally dominant. Now, by the properties of strictly diagonally dominant matrices it is invertible. ∎
Since for any , is independent of for any (25), the Jacobian is sparse, and consists of only non-zero elements, which form diagonal matrices of size , along the main diagonal of the Jacobian. This allows us to solve the linear system in Equation (20) in steps, where is the fast matrix multiplication coefficient.
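The two structural facts used here can be sketched in code: strict diagonal dominance is easy to test, and a system whose matrix splits into independent blocks along the diagonal can be solved one block at a time. Reading the Jacobian as block-diagonal with one block per user type is an assumption of this sketch, as are the function names; plain Gaussian elimination is used in place of any fast matrix multiplication routine.

```python
def is_strictly_diagonally_dominant(matrix):
    """True if |a_ii| > sum over k != i of |a_ik| for every row i."""
    return all(
        abs(row[i]) > sum(abs(x) for k, x in enumerate(row) if k != i)
        for i, row in enumerate(matrix)
    )

def solve_dense(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x

def solve_block_diagonal(blocks, rhs_blocks):
    """Solve each diagonal block independently (assumed: one block per user type)."""
    return [solve_dense(block, rhs) for block, rhs in zip(blocks, rhs_blocks)]
```

Solving block by block means the cost scales with the number of blocks times the cost of one small solve, rather than with the cost of inverting the full matrix.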
4.2 Projection on Constraint Polytope ()
Given any point , by determining the constraints it violates, we can express the projection on the constraint polytope, , as a quadratic program with equality constraints. Using this we can construct a projection oracle , which given a point , can project onto in arithmetic operations, where is the fast matrix multiplication coefficient.
5 Empirical Study
[Figure caption, partially recovered: (lower bound). Error bars represent the standard error of the mean.]
We evaluate our approach empirically on the Yahoo! A1 dataset. We vary the strength of the fairness constraint for all advertisers, and find an optimal fair mechanism using Algorithm 1 and compare it against the optimal unconstrained (and hence potentially unfair) mechanism , which is given by Myerson. We first consider the impact of the fairness constraints on the revenue of the platform. Let denote the revenue of mechanism . We report the revenue ratio . Note that the revenue of can be at most that of , as it solves a constrained version of the same problem; thus . We then consider the impact of the fairness constraints on the advertisers. Towards this, we consider the distribution of winners amongst advertisers in an auction given by and an auction given by . We report the total variation distance between the two distributions, as a measure of how much the winning distribution changes due to the fairness constraints. Lastly, we consider the fairness of the resultant mechanism . To this end, we measure selection lift () achieved by , , where , represents perfect fairness among the two user types.
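The three reported quantities can be sketched as below. The total variation distance is the standard half-L1 distance between two distributions. The selection lift formula used here (the smallest ratio between an advertiser's coverage on the two user types, so that 1 means equal coverage) is an assumption made for illustration, since the defining expression is not recoverable from the text; the function names are also hypothetical.

```python
def revenue_ratio(fair_revenue, unconstrained_revenue):
    """Revenue of the fair mechanism relative to the unconstrained one (at most 1)."""
    return fair_revenue / unconstrained_revenue

def total_variation(p, q):
    """Total variation distance between two distributions over the same advertisers."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def selection_lift(coverage_type1, coverage_type2):
    """Assumed definition: min over advertisers of min(q1 / q2, q2 / q1).

    Equals 1.0 when every advertiser has identical coverage on both user types.
    Requires strictly positive coverages.
    """
    return min(min(a / b, b / a) for a, b in zip(coverage_type1, coverage_type2))
```

For example, winner distributions `[0.5, 0.5]` and `[0.8, 0.2]` are at total variation distance 0.3, and coverages `[0.6, 0.4]` versus `[0.2, 0.8]` give a selection lift of one third under the assumed definition.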
For the empirical results we use the Yahoo! A1 dataset, which contains bids placed by advertisers on the top 1000 keywords on Yahoo! Online Auctions between June 15, 2002 and June 14, 2003.
The dataset has 10475 advertisers, and each advertiser places bids on a subset of keywords;
there are approximately bids in the dataset.
For each keyword , let be the set of advertisers that bid on it.
We infer the distribution, , of valuation of an advertiser for a keyword by the bids they place on the keyword.
In order to retain sufficiently rich valuation profiles for each advertiser, we remove advertisers who place fewer than 1000 bids on , whose valuations have variance lower than from , or who win the auction less than of the time. This retains more than bids. The actual keywords in the dataset are anonymized; hence, in order to determine whether two keywords and are related, we consider whether they share more than one advertiser, i.e., . This allows us to identify keywords that are related (see Figure 2(b)), and hence for which spillover effects may be present as described in . Drawing that analogy, one can think of each keyword in the pair as a different type of user for which the same advertisers are competing, and the goal would be for the advertiser to win an equal proportion of each user.
There are such pairs. However, we observe that spillover does not affect all keyword pairs (see Figure 2(a)). To test the effect of imposing fairness constraints in a challenging setting, we consider only the auctions which are not already fair; in particular there are keyword pairs which are less than fair.
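The relatedness test for anonymized keywords (two keywords are related when they share more than one advertiser) can be sketched as follows. The dictionary layout, the identifiers in the example, and the function name are hypothetical; the real dataset's format is not described in the text.

```python
from itertools import combinations

def related_keyword_pairs(advertisers_by_keyword, min_shared=2):
    """Return (keyword_a, keyword_b, shared_advertisers) for pairs sharing >= min_shared advertisers.

    advertisers_by_keyword maps each keyword id to the set of advertiser ids bidding on it.
    min_shared=2 encodes "more than one shared advertiser".
    """
    pairs = []
    for k1, k2 in combinations(sorted(advertisers_by_keyword), 2):
        shared = advertisers_by_keyword[k1] & advertisers_by_keyword[k2]
        if len(shared) >= min_shared:
            pairs.append((k1, k2, shared))
    return pairs
```

Each returned pair can then be treated as one auction with two "user types", with the shared advertisers as the competing bidders.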
5.2 Experimental Setup
As we only consider pairs of keywords in this experiment, a lower bound constraint is equivalent to an upper bound constraint . Hence, it suffices to consider lower bound constraints. We set , and vary uniformly from to , i.e., from the completely unconstrained case (which is equivalent to Myerson’s auction) to the completely constrained case (which requires each advertiser to win each keyword in the pair with exactly the same probability). We report and averaged over all auctions after iterations in Figure 4; error bars represent the standard error of the mean over iterations and 3282 auctions respectively.
5.3 Empirical Results
Since the auctions are unbalanced to begin with, we expect the selection lift to increase with the fairness constraint.
We observe a growing trend in the selection lift, eventually achieving perfect fairness for .
Revenue Ratio. We do not expect to outperform the optimal unconstrained mechanism. However, we observe that even in the perfectly balanced setting with our mechanisms lose less than 5% of the revenue.
Advertiser Displacement. Since the auctions are unbalanced to begin with, we expect TV-distance to grow with the fairness constraint. We observe this growing trend in the TV-distance on lowering the risk-difference. Even for zero risk-difference () our mechanisms obtain a TV-distance smaller than .
6.1 Proof of Theorem 3.1
Let us introduce three Lagrangian multipliers, a vector , a vector , and a continuous function , for the lower bound, upper bound, and single item constraints respectively. Then calculating the Lagrangian function we have,
The second integral is well defined by the continuity of and the monotonic nature of . In order for the supremum of the Lagrangian over to be bounded, the coefficient of must be non-positive. Therefore we require,
Since and are continuous, the former is equivalent to
If this holds, we can express the supremum of as,
Now we can express the dual optimization problem as: (Dual of the infinite-dimensional fair advertising problem). For all , find an optimal and for
Since the primal is linear in , and the constraints are feasible, strong duality holds. Therefore, the dual optimal is primal optimal. For any feasible constraints we have for all and . Therefore the coefficient of , , and that of , . Since and are non-negative, an optimal solution to the dual is finite. Let be an optimal solution to the dual, and be an optimal solution to the primal. Fixing and to their optimal values and in the dual, let us define new virtual valuations , for all and
Then the leftover problem has only one Lagrangian multiplier, . Let be the affine transformation of defined on virtual valuations, i.e., , then the problem can be expressed as follows. (Dual with shifted virtual valuations). For all , find the optimal for
This is the dual of the following unconstrained revenue maximizing problem. Myerson’s mechanism is the revenue maximizing solution to the unconstrained optimization problem. Further, by linearity and feasibility of constraints, strong duality holds. Therefore the -shifted mechanism, for is an optimal fair mechanism. (Unconstrained primal for the infinite-dimensional fair advertising problem). For all , find the optimal allocation rule for,
Further, Myerson’s mechanism is truthful if the distributions of valuations are regular and independent. Since the -shifted mechanism applies a constant shift to all valuations, it follows under the same assumptions that any -shifted mechanism is also truthful, and therefore has a unique payment rule defined by its allocation rule. ∎
6.2 Proof of Theorem 3.3
The next lemma gives an algorithm to solve the optimal shift problem. Its proof is presented in Section 6.9.
(An algorithm to solve the optimal shift problem). There is an algorithm (Algorithm 2) which outputs , such that, if assumptions (16), (17) and (18) are satisfied, then is an -optimal solution for the optimal shift problem, i.e., , in
Where the arithmetic operations in each step are bounded by calculating the one
Proof of Theorem 3.3.
Algorithm 1 starts with an initial coverage , and performs a projected gradient descent on the polytope . Since is convex, the projection doesn’t increase the distance between , and the optimal solution in .
We calculate the shift, using Algorithm 2. This introduces some error, , at each iteration. We fix the optimal value of later in the proof. Let , be the coverage reached by the gradient update at iteration , i.e., , and be the coverage after calculating , i.e., where . Due to the error , deviates from . By definition of , we have the following bound on the deviation.