

## 2 Our model

We refer the reader to the excellent treatise on mechanism design for a detailed discussion of the preliminaries. A mechanism is defined by its allocation rule $x$ and its payment rule $p$. Truthful mechanisms are those in which revealing the true valuation is optimal for every bidder. The allocation rule $x_i$ of a truthful mechanism is monotone in the bid of bidder $i$, for all $i$. Further, it can be shown that for any mechanism there is a truthful mechanism which offers the same revenue to the seller and the same utility to each bidder. As such, we restrict ourselves to truthful mechanisms. It is a well-known fact that for any truthful mechanism the payment rule $p$ is uniquely defined by its allocation rule $x$. Hence, for any truthful mechanism our only concern is the allocation rule $x$.

Let $v$ denote the valuation of a bidder, $f$ its probability density function, and $F$ its cumulative distribution function. We define the virtual valuation as $\phi(v) := v - \frac{1 - F(v)}{f(v)}$. We say the distribution is regular iff $\phi(v)$ is non-decreasing in $v$. Likewise, we say it is strictly regular iff $\phi(v)$ is strictly increasing in $v$.
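As a small illustration of the definition above, the following sketch evaluates the virtual valuation $v - (1 - F(v))/f(v)$ on a grid and checks regularity numerically. The helper names (`virtual_valuation`, `is_regular`) and the choice of the uniform distribution on $[0, 1]$ are assumptions made for this example only.

```python
import numpy as np

def virtual_valuation(v, pdf, cdf):
    """phi(v) = v - (1 - F(v)) / f(v) for density f (pdf) and CDF F (cdf)."""
    return v - (1.0 - cdf(v)) / pdf(v)

def is_regular(pdf, cdf, grid, strict=False):
    """Check on a grid that phi is non-decreasing (or strictly increasing)."""
    diffs = np.diff(virtual_valuation(grid, pdf, cdf))
    return bool(np.all(diffs > 0)) if strict else bool(np.all(diffs >= 0))

# Illustrative choice: Uniform[0, 1], where f(v) = 1 and F(v) = v,
# so phi(v) = 2v - 1, which is strictly increasing (strictly regular).
uniform_pdf = lambda v: np.ones_like(v, dtype=float)
uniform_cdf = lambda v: np.asarray(v, dtype=float)

grid = np.linspace(0.0, 1.0, 101)
phi_uniform = virtual_valuation(grid, uniform_pdf, uniform_cdf)
uniform_is_strictly_regular = is_regular(uniform_pdf, uniform_cdf, grid, strict=True)
```

A grid check like this can only suggest regularity on the sampled points; it is not a proof for a general distribution.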

##### Myerson’s Optimal Mechanism.

Myerson’s mechanism is defined as the VCG mechanism [3, 13, 28] where the virtual valuation $\phi_i(v_i)$ is submitted as the bid $b_i$ for each bidder $i$. If the valuations $v_i$, and therefore the virtual valuations $\phi_i(v_i)$, are independent, then for any truthful mechanism the virtual surplus, $\sum_i \phi_i(v_i)\, x_i(v)$, is equal to the revenue, $\sum_i p_i(v)$, in expectation. Since VCG is surplus maximizing, if Myerson’s mechanism is truthful then it maximizes the revenue.

##### Notation.

We consider $n$ advertisers and $m$ user types, and associate with each advertiser $i \in [n]$ and user type $j \in [m]$ a distribution of valuations. Let $\phi_{ij}$ be the virtual valuation of advertiser $i$ for user type $j$, $f_{ij}$ be its probability density function, and $F_{ij}$ be its cumulative distribution function. We denote the joint virtual valuation of all advertisers for user type $j$ by $\phi_j$, and its joint probability density function by $f_j$. The user types $j \in [m]$ are distributed according to the known distribution $\mathcal{U}$, and the mechanism’s allocation rule for user type $j$ is denoted by $x_j$.

### 2.1 Fairness Constraints

We would like to guarantee that advertisers have a fair coverage across user types. We do so by placing constraints on the coverage of an advertiser. Formally, we define advertiser $i$’s coverage of user type $j$ as the probability that advertiser $i$ wins the auction conditioned on the user being of type $j$,

$$q_{ij}(x_j) := \Pr_{\mathcal{U}}[j] \int_{\operatorname{supp}(\phi_j)} x_{ij}(\phi_j)\, df_j(\phi_j), \tag{Coverage, 1}$$

where $x_{ij}$ is the $i$-th component of $x_j$. Towards ensuring that an advertiser has a fair coverage of different user types, we consider the proportional coverage of the advertiser on each user type. Given vectors $\ell$ and $u$, we define $(\ell, u)$-fairness constraints for each advertiser $i$ and user type $j$ as a lower bound $\ell_{ij}$ and an upper bound $u_{ij}$ on the proportion of users of type $j$ the advertiser shows ads to, i.e., we impose the following constraints:

$$\ell_{ij} \le \frac{q_{ij}}{\sum_{t \in [m]} q_{it}} \le u_{ij} \quad \forall\ i \in [n] \text{ and } j \in [m]. \tag{$(\ell, u)$-fairness constraints, 2}$$
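The constraints in Equation (2) can be checked directly for a given coverage matrix. The sketch below is a minimal illustration; the function name `satisfies_fairness`, the tolerance, and the example numbers are assumptions, and it presumes every advertiser has strictly positive total coverage.

```python
import numpy as np

def satisfies_fairness(q, lower, upper, tol=1e-9):
    """Check l_ij <= q_ij / sum_t q_it <= u_ij for an (n x m) coverage matrix q.

    `lower` and `upper` may be scalars or (n x m) arrays.
    Assumes each row of q has a strictly positive sum.
    """
    q = np.asarray(q, dtype=float)
    totals = q.sum(axis=1, keepdims=True)   # sum over user types, per advertiser
    share = q / totals                      # proportional coverage
    return bool(np.all(share >= lower - tol) and np.all(share <= upper + tol))

# Hypothetical coverage of 2 advertisers over 2 user types.
q_example = [[0.3, 0.1],
             [0.2, 0.4]]
loose_ok = satisfies_fairness(q_example, lower=0.25, upper=0.75)
tight_ok = satisfies_fairness(q_example, lower=0.30, upper=0.70)
```

In this example the first advertiser's shares are 0.75 and 0.25, so the looser bounds hold while the tighter ones do not.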

### 2.3 Optimization Problem

We would like to develop a mechanism which maximizes the revenue while satisfying the upper and lower bound constraints in Equation (2). Towards formally stating our problem, we define the revenue of mechanism $\mathcal{M}$, with allocation rule $x_j$ for each user type $j$, as

$$rev_{\mathcal{M}} := \sum_{i \in [n],\, j \in [m]} \Pr_{\mathcal{U}}[j] \int_{\operatorname{supp}(\phi_j)} \phi_{ij}\, x_{ij}(\phi_j)\, d\phi_j, \tag{Revenue, 3}$$

where $\phi_{ij}$ and $x_{ij}$ are the $i$-th components of $\phi_j$ and $x_j$ respectively. Thus, we can express our optimization problem with respect to the functions $x_{ij}(\cdot)$, i.e., as an infinite-dimensional optimization problem, as follows. (Infinite-dimensional fair advertising problem). For all user types $j \in [m]$, find the optimal allocation rule $x_j$ for,

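$$\max_{x_{ij}(\cdot) \ge 0}\ rev_{\mathcal{M}}(x_1, x_2, \dots, x_m) \tag{4}$$

$$\text{s.t.} \quad q_{ij}(x_j) \ge \ell_{ij} \sum_{t \in [m]} q_{it}(x_t) \quad \forall\ i \in [n],\ j \in [m], \tag{5}$$

$$q_{ij}(x_j) \le u_{ij} \sum_{t \in [m]} q_{it}(x_t) \quad \forall\ i \in [n],\ j \in [m], \tag{6}$$

$$\sum_{i \in [n]} x_{ij}(\phi_j) \le 1 \quad \forall\ j \in [m],\ \phi_j, \tag{7}$$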

where (5) encodes the lower bound constraints, (6) encodes the upper bound constraints, and (7) ensures that only one ad is allocated. In the above problem, we are looking for a collection of optimal continuous functions $x_{ij}(\cdot)$. To be able to solve this problem, we need, at the least, a finite-dimensional formulation of the fair online advertisement problem.

## 3 Theoretical Results

Our first result is structural, and gives a characterization of the optimal solution $x_{ij}(\cdot)$ to the infinite-dimensional fair advertising problem in terms of a matrix $\alpha \in \mathbb{R}^{n \times m}$, making it a finite-dimensional optimization problem with respect to $\alpha$.

###### Theorem 3.1.

(Characterization of an optimal allocation rule). There exists an $\alpha \in \mathbb{R}^{n \times m}$ such that, if for all $i \in [n]$ and $j \in [m]$ the distribution of valuations is strictly regular and independent, then the set of allocation rules $\{x_j(\cdot, \alpha_j)\}_{j \in [m]}$ defined below is optimal for the infinite-dimensional fair advertising problem.

$$x_j(v_j, \alpha_j) := \operatorname*{argmax}_{i \in [n]} \left(\phi_{ij} + \alpha_{ij}\right) \tag{$\alpha$-shifted mechanism, 8}$$

where ties, if any, are broken at random (this is equivalent to the allocation rule of the VCG mechanism).
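The allocation rule in Equation (8) can be sketched for a single user as follows. This is only an illustration: the function name `alpha_shifted_winner` is hypothetical, the inputs are the virtual valuations and shifts for one user type, and no reserve price is applied because Equation (8) as written does not include one.

```python
import numpy as np

def alpha_shifted_winner(phi_j, alpha_j, rng):
    """Return the index of the advertiser maximizing phi_ij + alpha_ij.

    Ties are broken uniformly at random using the supplied NumPy Generator.
    """
    shifted = np.asarray(phi_j, dtype=float) + np.asarray(alpha_j, dtype=float)
    best = np.flatnonzero(shifted == shifted.max())  # all maximizers
    return int(rng.choice(best))

rng = np.random.default_rng(0)
# Without a shift, advertiser 0 wins; a large enough shift flips the winner.
winner_unshifted = alpha_shifted_winner([1.0, 0.5], [0.0, 0.0], rng)
winner_shifted = alpha_shifted_winner([1.0, 0.5], [0.0, 1.0], rng)
winner_tie = alpha_shifted_winner([1.0, 1.0], [0.0, 0.0], rng)
```

Adding the same constant to every entry of `alpha_j` leaves the winner unchanged, which is why only the differences between shifts matter.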

We present the proof of Theorem 3.1 in Section 6.1. In the proof, we analyze the dual problem of the infinite-dimensional fair advertising problem. We reduce the dual problem to one Lagrangian variable by fixing the Lagrangian variables corresponding to the lower bound (5) and upper bound (6) constraints to their optimal values. The resulting problem turns out to be the dual of the unconstrained revenue maximizing problem, for which Myerson’s mechanism is the optimal solution. We interpret the fixed Lagrangian variables as shifting the original virtual valuations, $\phi_{ij}$. It then follows that for some $\alpha$, the $\alpha$-shifted mechanism (8) is the optimal solution to the infinite-dimensional fair advertising problem. Now, our task is reduced from finding an optimal allocation rule to finding an $\alpha$ characterizing the optimal allocation rule. Towards this, let us define the revenue, $rev_{shift}(\cdot)$, and coverage, $q(\cdot)$, as functions of $\alpha$.

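$$rev_{shift}(\alpha) := \sum_{i \in [n],\, j \in [m]} \Pr_{\mathcal{U}}[j] \int_{\operatorname{supp}(f_{ij})} y\, f_{ij}(y) \prod_{k \in [n] \setminus \{i\}} F_{kj}(y + \alpha_{ij} - \alpha_{kj})\, dy \tag{Revenue $\alpha$-shifted mechanism, 9}$$

$$q_{ij}(\alpha) := \Pr_{\mathcal{U}}[j] \int_{\operatorname{supp}(f_{ij})} f_{ij}(y) \prod_{k \in [n] \setminus \{i\}} F_{kj}(y + \alpha_{ij} - \alpha_{kj})\, dy \tag{Coverage $\alpha$-shifted mechanism, 10}$$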
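The quantities in (9) and (10) can also be approximated by simulation, which may help build intuition for how the shift moves coverage. The sketch below is a Monte Carlo stand-in for the integrals, for a single user type; the function name, the uniform virtual valuations on $[-1, 1]$, and the example shifts are all assumptions made for illustration.

```python
import numpy as np

def estimate_coverage_and_revenue(phi_samples, alpha_j, pr_j=1.0):
    """Monte Carlo estimate of coverage and revenue for one user type.

    phi_samples: (num_samples, n) independent draws of virtual valuations.
    alpha_j:     length-n shift vector for this user type.
    pr_j:        probability of this user type.
    """
    shifted = phi_samples + alpha_j                  # broadcast shift over samples
    winners = shifted.argmax(axis=1)                 # alpha-shifted allocation
    n = phi_samples.shape[1]
    coverage = pr_j * np.bincount(winners, minlength=n) / len(winners)
    won_phi = phi_samples[np.arange(len(winners)), winners]
    revenue = pr_j * won_phi.mean()                  # virtual value of the winner
    return coverage, revenue

rng = np.random.default_rng(0)
phi = rng.uniform(-1.0, 1.0, size=(100_000, 3))      # hypothetical distributions
alpha = np.array([0.0, 0.2, -0.1])
q, rev = estimate_coverage_and_revenue(phi, alpha)
# Shifting every advertiser by the same constant leaves the allocation unchanged.
q_same, rev_same = estimate_coverage_and_revenue(phi, alpha + 5.0)
```

With identically distributed advertisers, a larger shift yields a larger coverage, and the coverages for the user type sum to `pr_j`.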

Depending on the nature of the distribution, the gradients $\nabla rev_{shift}(\alpha)$ and $\nabla q_{ij}(\alpha)$ may not be monotone functions of $\alpha$ (e.g., consider the exponential distribution). Therefore, in general neither is $rev_{shift}(\cdot)$ a concave, nor is $q_{ij}(\cdot)$ a convex function of $\alpha$ (see Section B for a concrete example). Hence, this optimization problem is non-convex both in its objective and in its constraints. We require further insights to solve the problem efficiently.

Towards this we observe that revenue is a concave function of the coverage $q$. Consider two allocation rules obtaining coverages $q_1, q_2$ and revenues $R_1, R_2$ respectively. If we use the first with probability $\lambda \in [0, 1]$ and the second otherwise, then we achieve a coverage $\lambda q_1 + (1 - \lambda) q_2$ and a revenue $\lambda R_1 + (1 - \lambda) R_2$. Therefore, the optimal allocation rule achieving coverage $\lambda q_1 + (1 - \lambda) q_2$ has at least $\lambda R_1 + (1 - \lambda) R_2$ revenue. Choosing the allocation rules which maximize the revenue for $q_1$ and $q_2$ respectively, this argument shows that revenue is a concave function of the coverage $q$. Let $rev(q)$ be the maximum revenue of the platform as a function of the coverage $q$. (Here we drop $q_{ij}$ for some $i$ and each $j$; this is crucial to calculate $\nabla rev(\cdot)$, see Remark 4.1.) Consider the following two optimization problems. (Optimal coverage problem). Find the optimal $q$ for,

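$$\max_{q \in [0,1]^{n \times m}}\ rev(q) \tag{11}$$

$$\text{s.t.} \quad q_{ij} \ge \ell_{ij} \sum_{t \in [m]} q_{it} \quad \forall\ i \in [n],\ j \in [m], \tag{12}$$

$$q_{ij} \le u_{ij} \sum_{t \in [m]} q_{it} \quad \forall\ i \in [n],\ j \in [m], \tag{13}$$

$$\sum_{i \in [n]} q_{ij} \le \Pr_{\mathcal{U}}[j] \quad \forall\ j \in [m]. \tag{14}$$

(Optimal shift problem). Given the target coverage $\delta$, find the optimal $\alpha$ for,

$$\min_{\alpha \in \mathbb{R}^{n \times m}}\ \mathcal{L}(\alpha) := \lVert \delta - q(\alpha) \rVert_F^2 \tag{15}$$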

Our next result relates the solution of the above two problems with the infinite-dimensional fair advertising problem.

###### Theorem 3.2.

Given a solution $q^\star$ to the optimal coverage problem, the solution $\alpha^\star$ to the optimal shift problem with $\delta = q^\star$ defines an optimal $\alpha^\star$-shifted mechanism (8) for the infinite-dimensional fair advertising problem.

###### Proof.

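Adding a multiple of the all-ones vector to $\alpha_j$, for any $j \in [m]$, does not change the allocation rule of the $\alpha$-shifted mechanism. Thus, it suffices to show that for every target coverage $\delta$ there is a unique $\alpha$, with the shift of one advertiser fixed for every user type, such that $q(\alpha) = \delta$. We first show that for every $\delta$ there is at least one such $\alpha$. In fact, the greedy algorithm which increases every $\alpha_{ij}$ whose coverage $q_{ij}(\alpha)$ is below $\delta_{ij}$ finds the required $\alpha$. To show uniqueness, consider distinct $\alpha$ and $\alpha'$ such that $q(\alpha) = \delta$ and $q(\alpha') = \delta$. Consider the advertiser and user type pair whose shift changes by the largest magnitude between $\alpha$ and $\alpha'$. We can show that the coverage of this pair differs under $\alpha$ and $\alpha'$, which contradicts $q(\alpha) = q(\alpha')$, thereby proving that $\alpha = \alpha'$. ∎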

The above theorem allows us to find the optimal $\alpha$ by solving the optimal coverage problem and the optimal shift problem. Towards this, let us consider the optimal coverage problem. We already know that its objective is a concave function. We can further observe that its constraints are linear in $q$. In particular, the constraints define a polytope, $\mathcal{Q}$, which we refer to as the constraint polytope. Therefore, it is a convex program, and a possible direction to solve this program is to use a gradient-based method. The trouble is that we do not have direct access to $\nabla rev(\cdot)$. A key idea is that, if we let $\delta = q(\alpha)$, then we can calculate $\nabla rev(\delta)$ by solving the following linear system,

$$(J_q(\alpha))^\top \nabla rev(\delta) = \nabla rev_{shift}(\alpha),$$

where $J_q(\alpha)$ is the Jacobian of $\operatorname{vec}(q(\alpha))$ with respect to $\operatorname{vec}(\alpha)$ (here $\operatorname{vec}(\cdot)$ represents the vectorization operator). It turns out that this Jacobian is invertible for all $\alpha$, and therefore the above linear system has an exact solution (see Section 4.1 for the details).

Let us consider the optimal shift problem. The objective of the problem is non-convex (see Figure 8(b) and Section B for an example). Interestingly, $\nabla \mathcal{L}(\alpha)$ is a linear combination of $\nabla q_{ij}(\alpha)$ for all $i$ and $j$. Since the rows of the Jacobian $J_q(\alpha)$ are linearly independent, the gradient is never zero unless we are at the global minimum, where $q(\alpha) = \delta$. This guarantees that the objective function does not have any saddle points or local maxima, and that any local minimum is a global minimum. Using this we can develop efficient algorithms to solve the optimal shift problem (Lemma 6.2). This brings us to our main algorithmic result, which is an algorithm to find the optimal allocation rule for the infinite-dimensional fair advertising problem.

###### Theorem 3.3.

(An algorithm to solve the infinite-dimensional fair advertising problem). There is an algorithm (Algorithm 1) which outputs $\alpha \in \mathbb{R}^{n \times m}$ such that, if assumptions (16), (17), (18), and (19) are satisfied, then the $\alpha$-shifted mechanism (8) achieves a revenue $\varepsilon$-close to the optimal for the infinite-dimensional fair advertising problem in

$$\tilde{O}\left(\frac{n^7 \log m}{\varepsilon^2} \cdot \frac{(\mu_{\max}\rho)^2}{(\mu_{\min}\eta)^4} \cdot \left(L + n^2 \mu_{\max}^2\right)\right) \text{ steps.}$$

Where the arithmetic calculations in each step are bounded by calculating once, and hides factors in and .

Roughly, the above algorithm has a convergence rate of , under the assumptions which we list below.

Assumptions.

- ($\eta$-coverage, 16)
- (Distributed distribution, 17)
- (Lipschitz distribution, 18)
- (Bounded bid, 19)

Assumption (16) guarantees that all advertisers have at least an $\eta$ probability of winning on every user type, assumption (17) places lower and upper bounds ($\mu_{\min}$ and $\mu_{\max}$) on the probability density functions of the $\phi_{ij}$, assumption (18) guarantees that the probability density functions of the $\phi_{ij}$ are $L$-Lipschitz continuous, and assumption (19) assumes that the expected $\phi_{ij}$ is bounded. We expect Assumptions (16) and (19) to hold in any real-world setting. We can drop the lower bound in Assumption (17) by introducing “jumps” in $\alpha$, to avoid ranges where the measure of bids is small. Removing assumption (18) would be an interesting direction for future work.

### Some remarks

We inherit the assumption of independent and regular distributions from Myerson. In addition, we require that the distributions of valuations are strictly regular to guarantee that ties between bidders happen with zero probability. (We can drop this assumption by incorporating specific randomized tie-breaking rules that retain fairness.) We note that the above allocation rule is monotone and allocates the ad spot to the bidder with the highest shifted valuation for a given user. Thus it defines a unique truthful mechanism and the corresponding payment rule. We refer to $\alpha$ as the shift of the mechanism.

## 4 Our Algorithm

Algorithm 1 solves the optimal coverage problem by performing a projected gradient descent in the constraint polytope, $\mathcal{Q}$. It starts with an initial coverage and the corresponding approximate shift (one advertiser’s shift is fixed, see Remark 4.1). At step $k$, it calculates the gradient of $rev(\cdot)$ at the current coverage by solving the linear system $(J_q(\alpha))^\top \nabla rev(\delta) = \nabla rev_{shift}(\alpha)$. In order to solve this linear system, we need to calculate $J_q(\alpha)$ and $\nabla rev_{shift}(\alpha)$. This can be done if $\alpha$ is known (see Section 4.1). Therefore, the algorithm requires a “good” approximation of $\alpha$ at each step. It maintains this by updating the previous approximation at each iteration. It does so by using another algorithm (Algorithm 2) to approximately solve the optimal shift problem. After calculating the gradient, it takes a gradient step and projects the current iterate onto $\mathcal{Q}$; the cost of this projection depends on the fast matrix multiplication coefficient (see Section 4.2). It continues this process until it obtains an $\varepsilon$-accurate $\alpha$. This approximation of $\alpha$ determines a solution to the infinite-dimensional fair advertising problem. By ensuring that Algorithm 2 is sufficiently accurate, we can bound the total error introduced by the approximations of $\alpha$, and preserve the convergence rate of Algorithm 1. Next we give the details of projecting onto $\mathcal{Q}$ and calculating the gradient $\nabla rev(\cdot)$.
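The outer loop described above has the general shape of a projected gradient method. The sketch below shows only that generic shape and is not the paper's Algorithm 1: the gradient oracle and the projection are passed in as callables, and the toy objective, the box projection, the step size, and the iteration count are all assumptions chosen so the example is self-contained. Since the objective is maximized, the update is written as an ascent step.

```python
import numpy as np

def projected_gradient_ascent(q0, grad, project, step_size, num_steps):
    """Generic projected gradient loop: step along the gradient, then project.

    grad(q):    returns the gradient of the (concave) objective at q.
    project(q): returns the closest point to q in the feasible set.
    """
    q = project(np.asarray(q0, dtype=float))
    for _ in range(num_steps):
        q = project(q + step_size * grad(q))
    return q

# Toy stand-ins (hypothetical): maximize -||q - target||^2 over the box [0, 1]^2.
target = np.array([1.5, 0.25])
toy_grad = lambda q: -2.0 * (q - target)
toy_project = lambda q: np.clip(q, 0.0, 1.0)

q_star = projected_gradient_ascent([0.0, 0.0], toy_grad, toy_project,
                                   step_size=0.1, num_steps=200)
```

In the setting of this section, `grad` would be replaced by the gradient obtained from the linear system of Section 4.1 and `project` by the projection oracle of Section 4.2.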

### 4.1 Calculating and Bounding ∇rev(⋅)

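Let $J_q(\alpha)$ be the Jacobian of the vectorized coverage, $\operatorname{vec}(q(\alpha))$, with respect to the vectorized shift, $\operatorname{vec}(\alpha)$. Here, we fix the shift of one advertiser for each user type $j$. (In fact, we can relax this condition by removing advertisers who have zero probability of winning.) Therefore, the Jacobian is a $(n-1)m \times (n-1)m$ matrix.

$$J_q(\alpha) = \begin{bmatrix}
\frac{\partial q_{11}(\alpha)}{\partial \alpha_{11}} & \dots & \frac{\partial q_{11}(\alpha)}{\partial \alpha_{(n-1)1}} & \dots & \frac{\partial q_{11}(\alpha)}{\partial \alpha_{(n-1)m}} \\
\frac{\partial q_{21}(\alpha)}{\partial \alpha_{11}} & \dots & \frac{\partial q_{21}(\alpha)}{\partial \alpha_{(n-1)1}} & \dots & \frac{\partial q_{21}(\alpha)}{\partial \alpha_{(n-1)m}} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{\partial q_{(n-1)m}(\alpha)}{\partial \alpha_{11}} & \dots & \frac{\partial q_{(n-1)m}(\alpha)}{\partial \alpha_{(n-1)1}} & \dots & \frac{\partial q_{(n-1)m}(\alpha)}{\partial \alpha_{(n-1)m}}
\end{bmatrix}$$

To obtain $\nabla rev(\delta)$, we use the fact that $J_q(\alpha)$ is always invertible (Lemma 4.2). Then, if we know $\alpha$ for some $\delta = q(\alpha)$, we can obtain $\nabla rev(\delta)$ by solving the following linear system.

$$(J_q(\alpha))^\top \nabla rev(\delta) = \nabla rev_{shift}(\alpha) \tag{20}$$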
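Numerically, recovering $\nabla rev(\delta)$ from the relation $(J_q(\alpha))^\top \nabla rev(\delta) = \nabla rev_{shift}(\alpha)$ is a single dense solve. The sketch below assumes the Jacobian and the gradient of the shifted revenue have already been computed; the function name and the small $2 \times 2$ numbers are hypothetical and serve only to show the call.

```python
import numpy as np

def grad_rev_from_shift(jacobian_q, grad_rev_shift):
    """Solve (J_q(alpha))^T g = grad rev_shift(alpha) for g = grad rev(delta)."""
    return np.linalg.solve(jacobian_q.T, grad_rev_shift)

# Hypothetical values: a strictly diagonally dominant Jacobian and a gradient.
J = np.array([[0.5, -0.2],
              [-0.1, 0.4]])
r = np.array([0.3, -0.1])
g = grad_rev_from_shift(J, r)
```

`np.linalg.solve` raises `LinAlgError` for a singular matrix, so the invertibility established in Lemma 4.2 is what makes this step well defined.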

###### Remark 4.1.

The Jacobian, $J_q(\alpha)$, is invertible only when we fix the shift of one advertiser for each user type $j$. Intuitively, if we increase $\alpha_{ij}$ by the same amount for all advertisers and one user type, the coverage remains invariant. As such, we cannot hope for $J_q(\alpha)$ to be invertible without fixing one $\alpha_{ij}$ for each $j$.

###### Lemma 4.2.

(Jacobian is invertible). For all $\alpha$, if all advertisers have non-zero coverage for all user types $j \in [m]$, then the Jacobian $J_q(\alpha)$ is invertible.

###### Proof.

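The coverage remains invariant if the bids of all advertisers are uniformly shifted for any given user type $j$. Therefore, for all $i \in [n]$ and $j \in [m]$ we have

$$\sum_{t \in [n]} \frac{\partial q_{ij}}{\partial \alpha_{tj}} = 0. \tag{21}$$

Since increasing the shift $\alpha_{ij}$ does not increase the coverage $q_{kj}$ for any $k \ne i$, we have that

$$\frac{\partial q_{kj}}{\partial \alpha_{ij}} \le 0 \quad \text{and} \quad \frac{\partial q_{ij}}{\partial \alpha_{ij}} \ge 0. \tag{22}$$

Now, from Equation (21) we have

$$\forall\ i \in [n],\ j \in [m], \quad \frac{\partial q_{ij}}{\partial \alpha_{ij}} = \sum_{t \in [n] \setminus \{i\}} \left| \frac{\partial q_{ij}}{\partial \alpha_{tj}} \right|. \tag{23}$$

Further, since the $n$-th advertiser has non-zero coverage, i.e., there is a non-zero probability that advertiser $n$ bids higher than all other advertisers, changing $\alpha_{nj}$ must affect all other advertisers. In other words, $\left| \frac{\partial q_{ij}}{\partial \alpha_{nj}} \right| > 0$ for all $i \in [n-1]$. Using this we have,

$$\forall\ i \in [n],\ j \in [m], \quad \frac{\partial q_{ij}}{\partial \alpha_{ij}} > \sum_{t \in [n-1] \setminus \{i\}} \left| \frac{\partial q_{ij}}{\partial \alpha_{tj}} \right|. \tag{24}$$

By observing that $q_{ij}$, on user type $j$, is independent of the shift $\alpha_{st}$ of any user type $t$ such that $t \ne j$, i.e.,

$$\forall\ i, s \in [n],\ j, t \in [m] \text{ s.t. } j \ne t, \quad \frac{\partial q_{ij}}{\partial \alpha_{st}} = 0, \tag{25}$$

and using Equation (24), we get that the Jacobian $J_q(\alpha)$ is strictly diagonally dominant. Now, by the properties of strictly diagonally dominant matrices, it is invertible. ∎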
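The last step of the proof relies on the standard fact that a strictly diagonally dominant matrix is nonsingular. A small numerical check of that property is sketched below; the function name and the block-structured example matrix (two user types, two free advertisers each) are hypothetical and chosen only to mirror the structure described in the proof.

```python
import numpy as np

def is_strictly_diagonally_dominant(a):
    """True if |a_ii| > sum_{k != i} |a_ik| holds for every row i."""
    a = np.asarray(a, dtype=float)
    diag = np.abs(np.diag(a))
    off_diag = np.abs(a).sum(axis=1) - diag
    return bool(np.all(diag > off_diag))

# Hypothetical Jacobian: positive diagonal, non-positive off-diagonal entries
# within a user type, and zeros across user types (cf. Equations (22) and (25)).
J_example = np.array([
    [0.5, -0.2, 0.0, 0.0],
    [-0.1, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.3, -0.1],
    [0.0, 0.0, -0.2, 0.6],
])
dominant = is_strictly_diagonally_dominant(J_example)
determinant = np.linalg.det(J_example)
```

For this example the determinant is the product of the two block determinants, $0.18 \times 0.16$, which is non-zero as expected.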

###### Remark 4.3.

Since, for any $j$, $q_{ij}$ is independent of $\alpha_{st}$ for any $t \ne j$ (Equation (25)), the Jacobian is sparse: its only potentially non-zero elements form $m$ blocks of size $(n-1) \times (n-1)$ along the main diagonal of the Jacobian. This allows us to solve the linear system in Equation (20) block by block, with a cost governed by the fast matrix multiplication coefficient.

### 4.2 Projection on Constraint Polytope (Q)

Given any point $q$, by determining the constraints it violates, we can express the projection onto the constraint polytope, $\mathcal{Q}$, as a quadratic program with equality constraints. Using this we can construct a projection oracle which, given a point $q$, projects it onto $\mathcal{Q}$ in a number of arithmetic operations governed by the fast matrix multiplication coefficient.

## 5 Empirical Study

Figure: (a) Implicit Fairness of Keyword Pairs. The x-axis represents the fairness constraint ℓ (lower bound). We report the number of keyword pairs satisfying each fairness level. We observe that 3282 auctions do not satisfy the ℓ=0.3 fairness constraint.

Figure: (a) Fairness. We report the selection lift slift(F) achieved by our fair (F) mechanism for varying levels of fairness.

We evaluate our approach empirically on the Yahoo! A1 dataset. We vary the strength of the fairness constraint $\ell$ for all advertisers, find an optimal fair mechanism $\mathcal{F}$ using Algorithm 1, and compare it against the optimal unconstrained (and hence potentially unfair) mechanism $\mathcal{M}$, which is given by Myerson. We first consider the impact of the fairness constraints on the revenue of the platform. Let $rev_{\mathcal{N}}$ denote the revenue of mechanism $\mathcal{N}$. We report the revenue ratio $rev_{\mathcal{F}} / rev_{\mathcal{M}}$. Note that the revenue of $\mathcal{F}$ can be at most that of $\mathcal{M}$, as it solves a constrained version of the same problem; thus the revenue ratio is at most 1. We then consider the impact of the fairness constraints on the advertisers. Towards this, we consider the distribution of winners amongst advertisers in an auction given by $\mathcal{F}$ and an auction given by $\mathcal{M}$. We report the total variation distance $d_{TV}(\mathcal{M}, \mathcal{F})$ between the two distributions, as a measure of how much the winning distribution changes due to the fairness constraints. Lastly, we consider the fairness of the resultant mechanism $\mathcal{F}$. To this end, we measure the selection lift, $slift(\mathcal{F})$, achieved by $\mathcal{F}$, where $slift(\mathcal{F}) = 1$ represents perfect fairness among the two user types.
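Two of the reported quantities are simple to compute once the revenues and the winner distributions are available. The sketch below uses the standard definition of total variation distance between discrete distributions, $\frac{1}{2}\sum_i |p_i - q_i|$; the function names and the example numbers are hypothetical and are not results from the experiments.

```python
import numpy as np

def revenue_ratio(rev_fair, rev_unconstrained):
    """Revenue of the fair mechanism relative to the unconstrained optimum."""
    return rev_fair / rev_unconstrained

def total_variation_distance(p, q):
    """d_TV(p, q) = 0.5 * sum_i |p_i - q_i| for two winner distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Hypothetical winner distributions over two advertisers.
winners_unconstrained = [0.8, 0.2]
winners_fair = [0.5, 0.5]
d_tv = total_variation_distance(winners_unconstrained, winners_fair)
ratio = revenue_ratio(95.0, 100.0)
```

A distance of 0 means the fairness constraints did not change who wins, and a distance of 1 means the two winner distributions have disjoint support.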

### 5.1 Dataset

For the empirical results we use the Yahoo! A1 dataset, which contains bids placed by advertisers on the top 1000 keywords on Yahoo! Online Auctions between June 15, 2002 and June 14, 2003. The dataset has 10475 advertisers, and each advertiser places bids on a subset of keywords. For each keyword $k$, consider the set of advertisers that bid on it. We infer the distribution of valuations of an advertiser for a keyword from the bids they place on that keyword. In order to retain sufficiently rich valuation profiles for each advertiser, we remove advertisers who place fewer than 1000 bids on the keyword, whose valuations have variance below a fixed threshold, or who win the auction less than a fixed fraction of the time. A large number of bids remain after this filtering. The actual keywords in the dataset are anonymized; hence, in order to determine whether two keywords are related, we consider whether they share more than one advertiser. This allows us to identify keywords that are related (see Figure 2(b)), and hence for which spillover effects may be present. Drawing that analogy, one can think of each keyword in the pair as a different type of user for which the same advertisers are competing, and the goal would be for the advertiser to win an equal proportion of each user. However, we observe that spillover does not affect all keyword pairs (see Figure 2(a)). To test the effect of imposing fairness constraints in a challenging setting, we consider only the auctions which are not already fair; in particular, there are 3282 keyword pairs which are less than $\ell = 0.3$ fair.

Figure 4: Effect of fairness on advertisers. The x-axis represents the fairness constraint ℓ (lower bound). We report the dTV(M,F) of the distribution of ads allocated by the fair (F) and unconstrained (M) mechanisms. Error bars represent the standard error of the mean.

### 5.2 Experimental Setup

As we only consider pairs of keywords in this experiment, a lower bound constraint $\ell$ is equivalent to an upper bound constraint $1 - \ell$. Hence, it suffices to consider lower bound constraints. We set $u = 1$, and vary $\ell$ uniformly from $0$ to $0.5$, i.e., from the completely unconstrained case (which is equivalent to Myerson’s auction) to the completely constrained case (which requires each advertiser to win each keyword in the pair with exactly the same probability). We report the resulting metrics averaged over all auctions (see Figure 4); error bars represent the standard error of the mean over the iterations and the 3282 auctions respectively.
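The equivalence between lower and upper bounds for keyword pairs follows from a one-line calculation. As a sketch, write $s_{i1}$ and $s_{i2}$ (notation introduced here for illustration) for advertiser $i$'s proportional coverage on the two keywords, and assume the same lower bound $\ell$ applies to both:

```latex
s_{i1} := \frac{q_{i1}}{q_{i1} + q_{i2}}, \qquad
s_{i2} := \frac{q_{i2}}{q_{i1} + q_{i2}} = 1 - s_{i1},
\qquad\text{so}\qquad
\bigl(s_{i1} \ge \ell \ \text{ and } \ s_{i2} \ge \ell\bigr)
\iff \ell \le s_{i1} \le 1 - \ell .
```

In particular, $\ell = 0.5$ forces $s_{i1} = s_{i2} = 0.5$, while $\ell = 0$ imposes no restriction.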

### 5.3 Empirical Results

Fairness. Since the auctions are unbalanced to begin with, we expect the selection lift to increase with the fairness constraint. We observe a growing trend in the selection lift, eventually achieving perfect fairness for $\ell = 0.5$.

Revenue Ratio. We do not expect to outperform the optimal unconstrained mechanism. However, we observe that even in the perfectly balanced setting with $\ell = 0.5$ our mechanisms lose less than 5% of the revenue.

Advertiser Displacement. Since the auctions are unbalanced to begin with, we expect the TV-distance to grow with the fairness constraint. We observe this growing trend in the TV-distance on lowering the risk-difference. Even for zero risk-difference ($\ell = 0.5$), the TV-distance obtained by our mechanisms stays within the range reported in Figure 4.

## 6 Proofs

### 6.1 Proof of Theorem 3.1

###### Proof.

Let us introduce three Lagrangian multipliers, a vector $\alpha$, a vector $\beta$, and a continuous function $\gamma_j(\cdot)$, for the lower bound, upper bound, and single item constraints respectively. Then calculating the Lagrangian function we have,

The second integral is well defined by the continuity of $\gamma_j$ and the monotonic nature of $x_{ij}$. In order for the supremum of the Lagrangian over $x_{ij}(\cdot) \ge 0$ to be bounded, the coefficient of $x_{ij}(\cdot)$ must be non-positive. Therefore we require,

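$$\int_{g} \Bigl[ \alpha_{ij} - \beta_{ij} + \Pr_{\mathcal{U}}[j]\, \phi_{ij} - \sum_{t \in [m]} \left( \alpha_{it} \ell_{it} - \beta_{it} u_{it} \right) - \gamma_j(\phi_j) \Bigr]\, df_j(\phi_j) \le 0 \quad \forall\ g \subseteq \operatorname{supp}(\phi_j),\ i \in [n],\ j \in [m].$$

Since $f_j$ and $\gamma_j$ are continuous, the former is equivalent to

$$\alpha_{ij} - \beta_{ij} + \Pr_{\mathcal{U}}[j]\, \phi_{ij} - \sum_{t \in [m]} \left( \alpha_{it} \ell_{it} - \beta_{it} u_{it} \right) \le \gamma_j(\phi_j) \quad \forall\ i \in [n],\ j \in [m],\ \phi_j.$$

If this holds, we can express the supremum of $L$ as,

$$\sup_{x_{ij}(\cdot) \ge 0} L = \sum_{j \in [m]} \int_{\operatorname{supp}(\phi_j)} \gamma_j(\phi_j)\, df_j(\phi_j).$$

Now we can express the dual optimization problem as: (Dual of the infinite-dimensional fair advertising problem). For all $j \in [m]$, find an optimal $\alpha_j$, $\beta_j$ and $\gamma_j(\cdot)$ for

$$\min_{\alpha_j,\, \beta_j \ge 0,\ \gamma_j(\cdot) \ge 0}\ \sum_{j \in [m]} \int_{\operatorname{supp}(\phi_j)} \gamma_j(\phi_j)\, df_j(\phi_j) \tag{26}$$

$$\text{s.t.} \quad \alpha_{ij} - \beta_{ij} + \Pr_{\mathcal{U}}[j]\, \phi_{ij} - \sum_{t \in [m]} \left( \alpha_{it} \ell_{it} - \beta_{it} u_{it} \right) \le \gamma_j(\phi_j) \quad \forall\ i \in [n],\ j \in [m],\ \phi_j. \tag{27}$$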

Since the primal is linear in , and the constraints are feasible strong duality holds. Therefore, the dual optimal is primal optimal. For any feasible constraints we have for all and . Therefore the coefficient of , , and that of , . Since and are non-negative, a optimal solution to the dual is finite. Let be a optimal solutions to the dual, and be a optimal solution to the primal. Fixing and to their optimal values and in the dual, let us define new virtual valuations , for all and

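$$\phi'_{ij} := \phi_{ij} + \frac{1}{\Pr_{\mathcal{U}}[j]} \Bigl( \alpha^\star_{ij} - \beta^\star_{ij} - \sum_{t \in [m]} \left( \alpha^\star_{it} \ell_{it} - \beta^\star_{it} u_{it} \right) \Bigr).$$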

Then the leftover problem has only one Lagrangian multiplier, . Let be the affine transformation of defined on virtual valuations, i.e., , then the problem can be expressed as follows. (Dual with shifted virtual valuations). For all , find the optimal for

$$\min_{\gamma_j(\cdot) \ge 0}\ \sum_{j \in [m]} \int_{\operatorname{supp}(\phi_j)} \gamma_j(\phi'_j)\, df_j(\phi'_j) \tag{28}$$

$$\text{s.t.} \quad \Pr_{\mathcal{U}}[j]\, \phi'_{ij} \le \gamma_j(\phi'_j) \quad \forall\ i \in [n],\ j \in [m],\ \phi'. \tag{29}$$

This is the dual of the unconstrained revenue maximizing problem stated below, applied to the shifted virtual valuations. Myerson’s mechanism is the revenue maximizing solution to the unconstrained optimization problem. Further, by linearity and feasibility of the constraints, strong duality holds. Therefore the $\alpha$-shifted mechanism, for the shift $\alpha_{ij} = \phi'_{ij} - \phi_{ij}$, is an optimal fair mechanism. (Unconstrained primal for the infinite-dimensional fair advertising problem). For all $j \in [m]$, find the optimal allocation rule $x_j$ for,

$$\max_{x_{ij}(\cdot) \ge 0}\ rev_{\mathcal{M}}(x_1, x_2, \dots, x_m) \quad \text{s.t.} \quad \sum_{i \in [n]} x_{ij}(\phi_j) \le 1 \quad \forall\ j \in [m],\ \phi_j \in \operatorname{supp}(\phi_j).$$

Further, Myerson’s mechanism is truthful if the distributions of valuations are regular and independent. Since the $\alpha$-shifted mechanism applies a constant shift to all valuations, it follows under the same assumptions that any $\alpha$-shifted mechanism is also truthful, and therefore has a unique payment rule defined by its allocation rule. ∎

### 6.2 Proof of Theorem 3.3

##### Supporting Lemmas.

Towards the proof of Theorem 3.3 we require the following two lemmas. The first lemma shows that $rev(\cdot)$ is Lipschitz continuous. Its proof is presented in Section 6.3.

###### Lemma 6.1.

(Revenue is Lipschitz). For all coverages $q_1, q_2 \in \mathcal{Q}$, if assumptions (16), (17) and (19) are satisfied, then

$$|rev(q_1) - rev(q_2)| \le \left( \frac{\mu_{\max}\, \rho}{\mu_{\min}\, \eta} \right) n^2\, \lVert q_1 - q_2 \rVert_F. \tag{30}$$

The next lemma is an algorithm to solve the optimal shift problem. Its proof is presented in Section 6.9.

###### Lemma 6.2.

(An algorithm to solve the optimal shift problem). There is an algorithm (Algorithm 2) which outputs , such that, if assumptions (16), (17) and (18) are satisfied, then is an -optimal solution for the optimal shift problem, i.e., , in

Where the arithmetic operations in each step are bounded by calculating the one

###### Proof of Theorem 3.3.

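Algorithm 1 starts with an initial coverage in $\mathcal{Q}$, and performs a projected gradient descent on the polytope $\mathcal{Q}$. Since $\mathcal{Q}$ is convex, the projection does not increase the distance between a point $q$ and the optimal solution $q^\star$ in $\mathcal{Q}$.

$$\forall\ q, \quad \lVert \operatorname{proj}_{\mathcal{Q}}(q) - q^\star \rVert_2 \le \lVert q - q^\star \rVert_2 \tag{31}$$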

We calculate the shift, using Algorithm 2. This introduces some error, , at each iteration. We fix the optimal value of later in the proof. Let , be the coverage reached by the gradient update at iteration , i.e., , and be the coverage after calculating , i.e., where . Due to the error , deviates from . By definition of , we have the following bound on the deviation.

 (Error from Algorithm