Online advertising has become a key revenue source for many businesses on the Internet. Sponsored search is a major type of online advertising, which displays paid advertisements (ads) alongside organic search results. The Generalized Second Price (GSP) auction is one of the most commonly used auction mechanisms in sponsored search, and it works as follows. When a web user issues a query, the search engine ranks all the ads bidding on this query (or on keywords related to the query) according to their bid prices, and charges the owner of a clicked ad the minimum bid price he/she would need to maintain the current rank position. (In practice, the predicted click-through rate is also used in the ranking and pricing rules. However, it can be safely absorbed into weighted bid prices without affecting the theoretical analysis of GSP auctions.)
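To make the ranking and pricing rule concrete, here is a minimal Python sketch of a GSP auction on a single query. It assumes that click-through rates have already been folded into the bids (as the note above permits), that there is no reserve price, and that ties are broken by advertiser name; the function name `gsp` is ours, not the paper's.

```python
from typing import Dict, List, Tuple


def gsp(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Run a GSP auction on per-click bids.

    Ads are ranked by bid (highest first); the ad in each slot pays, per click,
    the bid of the ad ranked right below it (0.0 if there is none).
    Ties are broken by advertiser name so the outcome is deterministic.
    """
    ranked = sorted(
        ((bid, adv) for adv, bid in bids.items() if bid > 0),
        key=lambda t: (-t[0], t[1]),
    )
    outcome = []
    for pos in range(min(num_slots, len(ranked))):
        _, adv = ranked[pos]
        # Price = next-highest bid: the minimum bid that keeps this position.
        price = ranked[pos + 1][0] if pos + 1 < len(ranked) else 0.0
        outcome.append((adv, price))
    return outcome


if __name__ == "__main__":
    print(gsp({"A": 5.0, "B": 3.0, "C": 4.0}, num_slots=2))
    # [('A', 4.0), ('C', 3.0)]
```

Advertiser B is ranked third and receives no slot, which is why C pays B's bid of 3.0.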
If only the ads that bid exactly on the query are included in the auction, we call the corresponding mechanism an exact-match mechanism. The GSP auction in this setting has been well studied in the literature [2, 29, 23, 37, 1, 8, 18] and has been shown to have a number of nice theoretical properties: (1) it possesses an efficient (welfare-maximizing) Nash equilibrium; (2) its social welfare in equilibrium is fairly good even in the worst case: the pure price of anarchy (PoA) is bounded by 1.282 and the Bayes-Nash PoA is bounded by 2.927; (3) in the Bayesian setting, the GSP auction paired with the Myerson reserve price generates at least a constant fraction of the optimal revenue in its Bayes-Nash equilibria for MHR (monotone hazard rate) distributions.
Despite these fruitful and positive results, the exact-match mechanism cannot meet the practical requirements of commercial search engines. First, the query space is extremely large (billions of queries are issued by web users every day), so it is practically impossible for advertisers to bid on every query related to their ads. Second, even if advertisers could bid on the huge number of related queries, the search engine might not be able to afford it due to scalability and latency constraints. For these reasons, commercial search engines usually use a broad-match mechanism to enhance the GSP auction. A broad-match mechanism allows each advertiser to bid on at most a fixed number of keywords instead of an arbitrary number of queries, and matches keywords to queries using a query-keyword bipartite graph (in which the number of keywords is significantly smaller than the number of queries). The broad-match mechanism is friendly to advertisers, since they only need to consider a relatively small number of keywords to reach a large number of related queries. It is also friendly to the search engine, since it restricts the complexity of the bidding language and therefore that of the auction system.
Today, most search engines implement the broad-match mechanism in a straightforward manner: when a query is issued, all the ads bidding on keywords that can be matched to the query on the query-keyword bipartite graph are pooled together for the GSP auction, and for every advertiser, the bids on the matched keywords are transformed into a bid on the query using some pre-defined heuristic (e.g., the maximum bid on the matched keywords). For ease of reference, we call this mechanism the Standard Broad-Match GSP mechanism, or SBM-GSP for short.
Although this mechanism effectively addresses the problems of the exact-match mechanism, to the best of our knowledge it has several theoretical drawbacks.
The social welfare of the SBM-GSP mechanism has been studied in prior work only for the single-slot case and the full-information setting. Using the notion of homogeneity to measure the diversity of an advertiser's valuations over the different queries that can be matched to a keyword, that work derived an almost-tight pure PoA bound that grows with the homogeneity. Since the homogeneity is usually large in practice, the social welfare of the SBM-GSP mechanism can be rather poor in its worst equilibrium.
A complete picture of the theoretical properties of the SBM-GSP mechanism is still missing: no results are available for the multi-slot case (which is practically more important, since most search engines sell multiple ad slots per query), and even for the single-slot case, the social welfare and revenue in the Bayesian setting remain unknown.
Given the aforementioned limitations of the SBM-GSP mechanism, a natural question to ask is whether we can design a broad-match mechanism with better guarantees on its performance, in terms of both social welfare and revenue, for both single-slot and multi-slot cases, and in both full-information and Bayesian settings. This is exactly the focus of our work.
In this paper, we propose a new broad-match mechanism, which we call the Probabilistic Broad-Match mechanism. Its basic idea is as follows. For each query, the mechanism assigns a matching probability to every keyword that can be matched to this query on the query-keyword bipartite graph. When a user issues the query, the mechanism randomly chooses a keyword according to this matching probability distribution and runs the GSP auction only among the ads that bid on the chosen keyword. For simplicity, we refer to this mechanism as PBM-GSP.
We perform a comprehensive study of the social welfare in equilibrium of the PBM-GSP mechanism, for both the single-slot and multi-slot cases, and in both the full-information and Bayesian settings. We also derive a revenue bound for the PBM-GSP mechanism for both the single-slot and multi-slot cases in the Bayesian setting. To the best of our knowledge, this is the first work on broad-match mechanisms that goes beyond the single-slot case and the full-information setting.
The contributions of our work can be summarized as follows.
(Section 3) We propose a novel broad-match mechanism (i.e., the PBM mechanism) for multi-slot sponsored search auctions.
(Section 4) We analyze the social welfare in equilibrium of the PBM-GSP mechanism in both the full-information and Bayesian settings. We define a new concept, called keyword-level expressiveness, which characterizes the expressiveness of the bidding language in the PBM-GSP mechanism better than the concept of expressiveness proposed in previous work.
(Section 4.1) We extend the concept of homogeneity defined in prior work to the Bayesian setting, and prove that the Bayes-Nash PoA of PBM-GSP is at most in the multi-slot case. The bound can be further optimized to in the single-slot case.
(Section 4.2) We prove that in the full-information setting, the pure PoA of PBM-GSP is at most when there are multiple slots to display ads. And the bound can be improved to in the single-slot case (which is tight with respect to each factor). Furthermore, we show that the pure PoA bound of PBM-GSP is better than that of SBM-GSP in the same setting under mild conditions.
(Section 5) We analyze the revenue of PBM-GSP in the Bayesian setting. We prove that by applying the Myerson reserve price on each keyword, PBM-GSP achieves a revenue of at least of the optimal social welfare for MHR distributions, where is the maximum derivative of the virtual value function.
In this section, we introduce the basics about broad-match auctions, and some preliminary concepts that will be used in our theoretical analysis.
2.1 Broad-Match Auctions
According to [21, 7, 20, 14], a broad-match mechanism can be defined on a query-keyword bipartite graph. Denote by $Q$ the query space and by $\pi$ a probability distribution over $Q$, where $\pi(q)$ is the probability that query $q$ is issued by users. Denote by $K$ the keyword space; in practice, $|Q|$ is much larger than $|K|$. Denote by $G = (Q, K, E)$ an undirected bipartite graph between queries and keywords, in which $(q, k) \in E$ if and only if query $q$ can be matched to keyword $k$ (equivalently, $k$ can be matched to $q$). Denote by $N(x)$ the neighborhood of vertex $x$: for any query $q$, $N(q)$ is the set of keywords that can be matched to the query, and for any keyword $k$, $N(k)$ is the set of queries that can be matched to the keyword. Without loss of generality, we assume $N(q) \neq \emptyset$ for all $q \in Q$ and $N(k) \neq \emptyset$ for all $k \in K$.
Assume there are $n$ advertisers and $m$ slots. Denote by $\beta_j$ the click probability associated with the $j$-th ad slot, where $\beta_j \ge \beta_{j'}$ whenever $j \le j'$. (In the real world, the number of slots is usually bounded by a constant. In this case, we can define without loss of generality.) We assume that advertiser $i$ has a private valuation $v_{i,q}$ for query $q$ if his/her ad is clicked by a user. Denote by $\mathbf{v} = (\mathbf{v}_1, \dots, \mathbf{v}_n)$ the valuation profile of the advertisers, in which $\mathbf{v}_i = (v_{i,q})_{q \in Q}$ is the vector of the $i$-th advertiser's valuations for all queries, and by $\mathbf{v}_{-i}$ the valuations of the other advertisers. We assume that for any query $q$, at least one advertiser values it positively. Define $Q_i = \{q \in Q : v_{i,q} > 0\}$ as the set of queries that advertiser $i$ values positively. For ease of reference, in the rest of the paper, we call the queries (keywords) that an advertiser values positively his/her positive queries (keywords).
Denote by $\mathbf{b} = (\mathbf{b}_1, \dots, \mathbf{b}_n)$ the advertisers' bid profile, where $\mathbf{b}_i$ is the vector of the $i$-th advertiser's bid prices on all keywords in $K$, and by $\mathbf{b}_{-i}$ the bids of all advertisers other than $i$. Following industry practice, we assume that each advertiser can bid on at most $L$ keywords. As a result, each $\mathbf{b}_i$ has at most $L$ positive entries. Denote by $b_{i,k}$ the bid price of advertiser $i$ on keyword $k$ and by $\mathbf{b}^k = (b_{1,k}, \dots, b_{n,k})$ all the advertisers' bids on keyword $k$.
Based on the notation above, SBM-GSP can be described as follows. When a query $q$ is issued, the SBM-GSP mechanism first finds all the keywords $N(q)$ that can be matched to the query. Second, it includes in the auction all the ads that bid on these keywords, and transforms the keyword bids of each advertiser $i$ into a query-level bid equal to his/her maximum bid on the matched keywords: $b_{i,q} = \max_{k \in N(q)} b_{i,k}$. Finally, the GSP auction is run on the ads with their query-level bids, i.e., all the ads are ranked by their bids, and the payment of a clicked ad equals the bid of the ad ranked right below it.
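The SBM-GSP pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation from the paper: keyword bids are stored as nested dictionaries, the matched-keyword sets play the role of $N(q)$, the max rule above converts keyword bids to query bids, and all names (`sbm_gsp`, the example keywords and advertisers) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def gsp(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """GSP: rank by bid, each winner pays the next-highest bid (0.0 if none)."""
    ranked = sorted(((b, a) for a, b in bids.items() if b > 0),
                    key=lambda t: (-t[0], t[1]))
    return [(ranked[j][1], ranked[j + 1][0] if j + 1 < len(ranked) else 0.0)
            for j in range(min(num_slots, len(ranked)))]


def sbm_gsp(query: str,
            neighbors: Dict[str, Set[str]],
            keyword_bids: Dict[str, Dict[str, float]],
            num_slots: int) -> List[Tuple[str, float]]:
    """Standard broad match: pool every ad bidding on any keyword matched to
    `query`, use each advertiser's maximum matched bid as the query-level bid,
    then run a single GSP auction."""
    matched = neighbors[query]  # N(q): keywords matched to the query
    query_bids = {}
    for adv, bids in keyword_bids.items():
        relevant = [bids[k] for k in matched if k in bids]
        if relevant:
            query_bids[adv] = max(relevant)  # b_{i,q} = max_{k in N(q)} b_{i,k}
    return gsp(query_bids, num_slots)


neighbors = {"running shoes": {"shoes", "running"}}
keyword_bids = {
    "A": {"shoes": 2.0, "running": 3.0},
    "B": {"shoes": 2.5},
    "C": {"hats": 9.0},  # C's keyword is not matched, so C is excluded
}
print(sbm_gsp("running shoes", neighbors, keyword_bids, num_slots=2))
# [('A', 2.5), ('B', 0.0)]
```

Note how advertiser A's bids on two different keywords collapse into one query-level bid; this mixing of keyword bids inside one auction is what PBM-GSP avoids.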
2.2 Solution Concepts
In this paper, we consider rational behavior under various assumptions on the information available to the advertisers. In general, the advertisers are engaged as players in a game defined by the auction mechanism (in the remainder of the paper, we use “advertiser” and “player” interchangeably). Every advertiser aims to select a bidding strategy that maximizes his/her utility. Depending on the availability of information, we distinguish between the Bayesian (partial-information) setting and the full-information setting.
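In the Bayesian setting, we assume that the valuation (type) profile $\mathbf{v}$ is drawn from a publicly known distribution $F$. A strategy for player $i$ is a (possibly randomized) mapping $s_i$ from his/her type $\mathbf{v}_i$ to a bid vector $\mathbf{b}_i$. We use $\mathbf{s}(\mathbf{v})$ to denote the bid profile obtained when $\mathbf{s} = (s_1, \dots, s_n)$ is applied to $\mathbf{v}$. Denote by $u_i$ the utility function of advertiser $i$. We say that $\mathbf{s}$ is a Bayes-Nash equilibrium for distribution $F$ if for all $i$, all $\mathbf{v}_i$, and all alternative strategies $s_i'$,
$$\mathbb{E}_{\mathbf{v}_{-i} \mid \mathbf{v}_i}\big[u_i(\mathbf{s}(\mathbf{v}); \mathbf{v}_i)\big] \ge \mathbb{E}_{\mathbf{v}_{-i} \mid \mathbf{v}_i}\big[u_i(s_i'(\mathbf{v}_i), \mathbf{s}_{-i}(\mathbf{v}_{-i}); \mathbf{v}_i)\big].$$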
In other words, in a Bayes-Nash equilibrium, each player maximizes his/her expected utility by playing his/her equilibrium strategy, assuming that the others bid according to their equilibrium strategies.
In the full-information setting, the valuation profile $\mathbf{v}$ is known and fixed. In this setting, a pure strategy of any advertiser $i$ corresponds to a bid vector $\mathbf{b}_i$. We say that a bid profile $\mathbf{b}$ is a (pure) Nash equilibrium if no player can be better off by deviating unilaterally, i.e., for every advertiser $i$ and every $\mathbf{b}_i'$,
$$u_i(\mathbf{b}) \ge u_i(\mathbf{b}_i', \mathbf{b}_{-i}).$$
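The pure Nash condition can be checked numerically for the GSP auction on a single keyword. The sketch below is only an approximation of the definition: it tests unilateral deviations on a finite bid grid rather than over all bids, and it ignores the limit on the number of bid keywords. The valuations, click probabilities, grid, and function names are hypothetical.

```python
from typing import Dict, List


def gsp_utilities(bids: Dict[str, float], values: Dict[str, float],
                  ctr: List[float]) -> Dict[str, float]:
    """Utility ctr[j] * (value - price) of each bidder in one GSP auction."""
    ranked = sorted(((b, a) for a, b in bids.items() if b > 0),
                    key=lambda t: (-t[0], t[1]))  # ties broken by name
    util = {a: 0.0 for a in bids}
    for j in range(min(len(ctr), len(ranked))):
        adv = ranked[j][1]
        price = ranked[j + 1][0] if j + 1 < len(ranked) else 0.0
        util[adv] = ctr[j] * (values[adv] - price)
    return util


def is_pure_nash(bids: Dict[str, float], values: Dict[str, float],
                 ctr: List[float], grid: List[float]) -> bool:
    """True if no bidder gains by a unilateral deviation to any bid in `grid`."""
    base = gsp_utilities(bids, values, ctr)
    for adv in bids:
        for dev in grid:
            trial = dict(bids)
            trial[adv] = dev
            if gsp_utilities(trial, values, ctr)[adv] > base[adv] + 1e-12:
                return False
    return True


values = {"A": 10.0, "B": 6.0, "C": 2.0}
ctr = [1.0, 0.5]
grid = [x / 2 for x in range(0, 21)]  # candidate bids 0.0, 0.5, ..., 10.0
print(is_pure_nash({"A": 7.0, "B": 4.0, "C": 2.0}, values, ctr, grid))  # True
print(is_pure_nash({"A": 3.0, "B": 4.0, "C": 2.0}, values, ctr, grid))  # False
```

In the second profile, A can raise its bid to 4.5, take the top slot at price 4, and improve its utility from 4 to 6, so the profile is not an equilibrium.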
3 Probabilistic Broad-Match Mechanism
As discussed in the introduction, the SBM-GSP mechanism has several drawbacks from a theoretical perspective. In this paper, we develop a new broad-match mechanism with better theoretical guarantees, which we call the Probabilistic Broad-Match (PBM-GSP) mechanism. The details of the PBM-GSP mechanism are given in Algorithm 1 and explained below.
Given the query-keyword bipartite graph $G$, for each query $q$ we impose a matching probability distribution $p_q$ whose support is $N(q)$, i.e., $p_q(k) > 0$ if and only if $k \in N(q)$, and $\sum_{k \in N(q)} p_q(k) = 1$. With this matching probability distribution, for any issued query $q$, the mechanism randomly samples a keyword $k \sim p_q$ and selects the ads bidding on keyword $k$ into the auction. For each selected ad, the bid price on keyword $k$ is directly used as the bid price on query $q$ in this round of the auction, i.e., $b_{i,q} = b_{i,k}$ where $k \sim p_q$, and a GSP auction is then run to determine the ad allocation and prices. (Due to the probabilistic sampling, an advertiser who keeps bidding on the same set of keywords as under SBM-GSP can only access a fraction of the whole query volume. Therefore, some advertisers may have to bid on more keywords to maintain the same visibility of their ads. Fortunately, since the number of keywords is always significantly smaller than the number of queries, the situation is not as serious as with the exact-match mechanism.)
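A single round of PBM-GSP can be sketched as follows: validate that $p_q$ is a distribution supported on $N(q)$, sample one keyword, and run GSP only among the ads that bid on it. This is a minimal illustration assuming dictionaries for $p_q$ and for keyword bids, Python's standard `random.Random.choices` for sampling, and hypothetical names and data; it is not the paper's Algorithm 1.

```python
import math
import random
from typing import Dict, List, Tuple


def gsp(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """GSP: rank by bid, each winner pays the next-highest bid (0.0 if none)."""
    ranked = sorted(((b, a) for a, b in bids.items() if b > 0),
                    key=lambda t: (-t[0], t[1]))
    return [(ranked[j][1], ranked[j + 1][0] if j + 1 < len(ranked) else 0.0)
            for j in range(min(num_slots, len(ranked)))]


def pbm_gsp(query: str,
            match_prob: Dict[str, Dict[str, float]],
            keyword_bids: Dict[str, Dict[str, float]],
            num_slots: int,
            rng: random.Random) -> Tuple[str, List[Tuple[str, float]]]:
    """Probabilistic broad match: sample one keyword k ~ p_q, then run GSP
    only among the ads that bid on k, using their bids on k as query bids."""
    dist = match_prob[query]  # p_q(k) for k in N(q)
    if any(p <= 0 for p in dist.values()) or not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("p_q must be a probability distribution with support N(q)")
    keywords = list(dist)
    k = rng.choices(keywords, weights=[dist[kw] for kw in keywords], k=1)[0]
    bids_on_k = {adv: b[k] for adv, b in keyword_bids.items() if b.get(k, 0.0) > 0}
    return k, gsp(bids_on_k, num_slots)


match_prob = {"running shoes": {"shoes": 0.7, "running": 0.3}}
keyword_bids = {"A": {"shoes": 2.0, "running": 3.0}, "B": {"shoes": 2.5}}
rng = random.Random(0)
counts = {"shoes": 0, "running": 0}
for _ in range(10_000):
    k, outcome = pbm_gsp("running shoes", match_prob, keyword_bids, 2, rng)
    counts[k] += 1
print(counts)  # roughly 7000 "shoes" vs 3000 "running"
```

Unlike the SBM-GSP sketch, A's bid on "running" never competes with B's bid on "shoes"; each round involves the bids on one keyword only.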
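For ease of description, for any keyword $k$ and bid profile $\mathbf{b}$, we define $\sigma_k(j; \mathbf{b})$ as the advertiser ranked at position $j$ and $\mathrm{pos}_k(i; \mathbf{b})$ as the ranking position of advertiser $i$. For rigor, positions that receive no positive bid on keyword $k$ are assigned to a dummy advertiser with zero value and zero bid, and an advertiser who does not bid on keyword $k$ is treated as unranked, with click probability zero. Define $c_{i,k}(\mathbf{b})$ as the price charged to player $i$ when keyword $k$ is sampled and a user clicks on his/her ad; for PBM-GSP, if advertiser $i'$ is ranked right below advertiser $i$ on keyword $k$, then $c_{i,k}(\mathbf{b}) = b_{i',k}$. With this notation, the expected utility of advertiser $i$ can be written as
$$u_i(\mathbf{b}) = \sum_{q \in Q} \pi(q) \sum_{k \in N(q)} p_q(k)\, \beta_{\mathrm{pos}_k(i;\mathbf{b})} \big(v_{i,q} - c_{i,k}(\mathbf{b})\big).$$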
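The expected-utility formula can be evaluated exactly on a toy instance by enumerating every (query, sampled keyword) pair. In this sketch, which assumes dictionaries for $\pi$, $p_q$, bids, and valuations and uses made-up numbers, note that the value depends on the issued query while the price depends only on the sampled keyword.

```python
import math
from typing import Dict, List, Tuple


def gsp(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """GSP: rank by bid, each winner pays the next-highest bid (0.0 if none)."""
    ranked = sorted(((b, a) for a, b in bids.items() if b > 0),
                    key=lambda t: (-t[0], t[1]))
    return [(ranked[j][1], ranked[j + 1][0] if j + 1 < len(ranked) else 0.0)
            for j in range(min(num_slots, len(ranked)))]


def expected_utility(adv: str,
                     values: Dict[str, Dict[str, float]],
                     query_prob: Dict[str, float],
                     match_prob: Dict[str, Dict[str, float]],
                     keyword_bids: Dict[str, Dict[str, float]],
                     ctr: List[float]) -> float:
    """u_i(b) = sum_q pi(q) sum_k p_q(k) * beta_pos * (v_{i,q} - c_{i,k}(b))."""
    total = 0.0
    for q, pi_q in query_prob.items():
        for k, p_qk in match_prob[q].items():
            bids_on_k = {a: b[k] for a, b in keyword_bids.items() if b.get(k, 0.0) > 0}
            for pos, (winner, price) in enumerate(gsp(bids_on_k, len(ctr))):
                if winner == adv:
                    # value of the issued query q minus the price set on keyword k
                    total += pi_q * p_qk * ctr[pos] * (values[adv].get(q, 0.0) - price)
    return total


query_prob = {"q1": 0.6, "q2": 0.4}
match_prob = {"q1": {"k1": 1.0}, "q2": {"k1": 0.5, "k2": 0.5}}
keyword_bids = {"A": {"k1": 3.0, "k2": 1.0}, "B": {"k1": 1.5}}
values = {"A": {"q1": 4.0, "q2": 2.0}, "B": {"q1": 1.0, "q2": 3.0}}
ctr = [0.5]  # a single slot with click probability 0.5
u_a = expected_utility("A", values, query_prob, match_prob, keyword_bids, ctr)
print(math.isclose(u_a, 1.0))  # True: 0.75 (q1, k1) + 0.05 (q2, k1) + 0.2 (q2, k2)
```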
As is common practice for ruling out unnatural equilibria [10, 32, 8, 14], we only consider conservative bidders in the theoretical analysis. It is easy to show that, for any advertiser $i$ and any keyword $k$, a bid $b_{i,k} > \bar v_{i,k}$ is always weakly dominated by the bid $\bar v_{i,k}$ (see Lemma 1), where $\bar v_{i,k}$ is the expected value of keyword $k$ for advertiser $i$, defined as
$$\bar v_{i,k} = \frac{\sum_{q \in N(k)} \pi(q)\, p_q(k)\, v_{i,q}}{\sum_{q \in N(k)} \pi(q)\, p_q(k)}.$$
Lemma 1 (Conservative bidder). For any advertiser $i$, a bid $b_{i,k} > \bar v_{i,k}$ on keyword $k$ is always weakly dominated by $\bar v_{i,k}$.
Proof. With the PBM-GSP mechanism, advertisers do not compete across keywords. For advertiser $i$, denote by $u_{i,k}(\mathbf{b})$ his/her utility obtained from keyword $k$; it is easy to see that $u_i(\mathbf{b}) = \sum_{k \in K} u_{i,k}(\mathbf{b})$, and, by the definition of $\bar v_{i,k}$, $u_{i,k}(\mathbf{b}) = \big(\sum_{q \in N(k)} \pi(q)\, p_q(k)\big)\, \beta_{\mathrm{pos}_k(i;\mathbf{b})} \big(\bar v_{i,k} - c_{i,k}(\mathbf{b})\big)$. Fix the bids of the other advertisers. If advertiser $i$ bids more than $\bar v_{i,k}$ on keyword $k$ and obtains the same position as when bidding $\bar v_{i,k}$, he/she pays the same price, so changing the bid to $\bar v_{i,k}$ does not hurt his/her total utility. If he/she obtains a better position with the larger bid, the ad ranked right below him/her bid at least $\bar v_{i,k}$, so he/she pays at least $\bar v_{i,k}$ per click; his/her expected utility on keyword $k$ is then non-positive, which is no more than the non-negative utility obtained by bidding $\bar v_{i,k}$. The lemma follows.
Because bids on different keywords are never mixed in the same auction under PBM-GSP, it is easier for advertisers to evaluate their payoffs on each keyword. As a result, they can develop bidding strategies that more accurately reflect their valuations on each keyword. For example, when there is only one slot to display ads, bidding truthfully, i.e., $b_{i,k} = \bar v_{i,k}$, is a weakly dominant strategy on every keyword $k$ for any advertiser $i$.
In the next sections, we show that this probabilistic matching improves the performance guarantees of the auction system.
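The expected keyword value and the conservative cap can be computed directly. The sketch below assumes the weighted-average form of $\bar v_{i,k}$ given above and reuses the toy instance from the utility example (all data hypothetical). It also checks numerically that the per-keyword utility equals the keyword weight times a GSP utility with true value $\bar v_{i,k}$, which is the decomposition used in the proof of Lemma 1.

```python
import math
from typing import Dict

Values = Dict[str, Dict[str, float]]  # advertiser -> query -> value per click


def keyword_weight(keyword: str, query_prob: Dict[str, float],
                   match_prob: Dict[str, Dict[str, float]]) -> float:
    """sum_q pi(q) * p_q(k): probability that keyword k is the one sampled."""
    return sum(pi_q * match_prob[q].get(keyword, 0.0) for q, pi_q in query_prob.items())


def expected_keyword_value(adv: str, keyword: str, values: Values,
                           query_prob: Dict[str, float],
                           match_prob: Dict[str, Dict[str, float]]) -> float:
    """bar v_{i,k}: the advertiser's click value averaged over the queries routed to k."""
    w = keyword_weight(keyword, query_prob, match_prob)
    if w == 0:
        return 0.0
    num = sum(pi_q * match_prob[q].get(keyword, 0.0) * values[adv].get(q, 0.0)
              for q, pi_q in query_prob.items())
    return num / w


def conservative_bid(bid: float, adv: str, keyword: str, values: Values,
                     query_prob: Dict[str, float],
                     match_prob: Dict[str, Dict[str, float]]) -> float:
    """Cap a bid at bar v_{i,k}; by Lemma 1 this never lowers the advertiser's utility."""
    return min(bid, expected_keyword_value(adv, keyword, values, query_prob, match_prob))


query_prob = {"q1": 0.6, "q2": 0.4}
match_prob = {"q1": {"k1": 1.0}, "q2": {"k1": 0.5, "k2": 0.5}}
values = {"A": {"q1": 4.0, "q2": 2.0}, "B": {"q1": 1.0, "q2": 3.0}}

v_a_k1 = expected_keyword_value("A", "k1", values, query_prob, match_prob)  # approx. (2.4 + 0.4) / 0.8
print(v_a_k1, conservative_bid(5.0, "A", "k1", values, query_prob, match_prob))

# Per-keyword utility two ways, for A in the single slot (beta = 0.5) of k1 at price 1.5:
beta, price = 0.5, 1.5
direct = sum(pi_q * match_prob[q].get("k1", 0.0) * beta * (values["A"][q] - price)
             for q, pi_q in query_prob.items())
scaled = keyword_weight("k1", query_prob, match_prob) * beta * (v_a_k1 - price)
print(math.isclose(direct, scaled))  # True
```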
4 Social Welfare Analysis
In this section, we present our theoretical results on the social welfare (efficiency) of the proposed PBM-GSP mechanism. Specifically, we study the ratio between the optimal social welfare and the worst-case welfare in equilibrium, which is also known as the Price of Anarchy (PoA) [27, 22, 12, 4]:
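Bayes-Nash PoA: In the Bayesian setting, we assume that every advertiser privately knows his/her own valuation vector for the queries and only knows a prior distribution over the other advertisers' valuation vectors. Assuming that the valuation profile $\mathbf{v}$ is drawn from a public distribution $F$, the Bayes-Nash PoA is defined as
$$\mathrm{PoA}_{\mathrm{BN}} = \frac{\mathbb{E}_{\mathbf{v} \sim F}\big[\mathrm{SW}(\mathrm{OPT}(\mathbf{v}))\big]}{\inf_{\mathbf{s}:\ \text{Bayes-Nash equilibrium}} \mathbb{E}_{\mathbf{v} \sim F}\big[\mathrm{SW}(\mathbf{s}(\mathbf{v}))\big]},$$
where $\mathrm{SW}(\mathrm{OPT}(\mathbf{v}))$ is the social welfare of the optimal allocation, which allocates slot $j$ of any query to the player with the $j$-th largest value, i.e.,
$$\mathrm{SW}(\mathrm{OPT}(\mathbf{v})) = \sum_{q \in Q} \pi(q) \sum_{j=1}^{m} \beta_j\, v_q^{(j)},$$
where $v_q^{(j)}$ is the $j$-th largest value among the valuations of query $q$. Similarly, $\mathrm{SW}(\mathbf{b})$ is the social welfare of the PBM-GSP mechanism with bid profile $\mathbf{b}$, i.e.,
$$\mathrm{SW}(\mathbf{b}) = \sum_{q \in Q} \pi(q) \sum_{k \in N(q)} p_q(k) \sum_{j=1}^{m} \beta_j\, v_{\sigma_k(j;\mathbf{b}),q},$$
where positions without a positive bid contribute zero.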
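Both welfare quantities, and hence the welfare ratio for one fixed bid profile, can be computed on a toy instance. This sketch assumes dictionaries for the inputs, a single slot, and hypothetical bids (here conservative: each bid is at most the bidder's expected keyword value); it evaluates one profile only, whereas the PoA takes the worst case over equilibria.

```python
import math
from typing import Dict, List

Values = Dict[str, Dict[str, float]]


def ranking(bids_on_k: Dict[str, float]) -> List[str]:
    """Advertisers with a positive bid, highest bid first (ties broken by name)."""
    ranked = sorted(((b, a) for a, b in bids_on_k.items() if b > 0),
                    key=lambda t: (-t[0], t[1]))
    return [a for _, a in ranked]


def optimal_welfare(values: Values, query_prob: Dict[str, float], ctr: List[float]) -> float:
    """SW(OPT): slot j of each query goes to the j-th highest value for that query."""
    total = 0.0
    for q, pi_q in query_prob.items():
        top = sorted((v.get(q, 0.0) for v in values.values()), reverse=True)
        total += pi_q * sum(beta * val for beta, val in zip(ctr, top))
    return total


def pbm_welfare(values: Values, query_prob: Dict[str, float],
                match_prob: Dict[str, Dict[str, float]],
                keyword_bids: Dict[str, Dict[str, float]], ctr: List[float]) -> float:
    """SW(b) under PBM-GSP: expected welfare of the GSP ranking on the sampled keyword."""
    total = 0.0
    for q, pi_q in query_prob.items():
        for k, p_qk in match_prob[q].items():
            winners = ranking({a: b.get(k, 0.0) for a, b in keyword_bids.items()})
            # zip stops at the shorter list, so empty slots contribute zero
            total += pi_q * p_qk * sum(beta * values[a].get(q, 0.0)
                                       for beta, a in zip(ctr, winners))
    return total


query_prob = {"q1": 0.6, "q2": 0.4}
match_prob = {"q1": {"k1": 1.0}, "q2": {"k1": 0.5, "k2": 0.5}}
keyword_bids = {"A": {"k1": 3.0, "k2": 1.0}, "B": {"k1": 1.5}}
values = {"A": {"q1": 4.0, "q2": 2.0}, "B": {"q1": 1.0, "q2": 3.0}}
ctr = [1.0]

opt = optimal_welfare(values, query_prob, ctr)                        # 0.6*4 + 0.4*3
sw = pbm_welfare(values, query_prob, match_prob, keyword_bids, ctr)   # 0.6*4 + 0.4*2
print(math.isclose(opt / sw, 1.125))  # True
```

The loss comes from query q2: B values it most, but q2 is always auctioned on a keyword where A ranks first.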
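Pure PoA: In the full-information setting, the valuation of each advertiser on each query is fixed, and the pure PoA is defined as
$$\mathrm{PoA} = \frac{\mathrm{SW}(\mathrm{OPT}(\mathbf{v}))}{\inf_{\mathbf{b}:\ \text{pure Nash equilibrium}} \mathrm{SW}(\mathbf{b})}.$$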
To characterize the influence of the maximum number $L$ of bid keywords, we use expressiveness to measure the capacity of the bidding language. The concept of expressiveness has been widely used in the auction theory literature [36, 13, 28, 5], and its theoretical foundation has been established in prior work. In this paper, we use a new notion of expressiveness, which we call keyword-level (KL) expressiveness. As will be seen in later sections, KL-expressiveness affects both the social welfare and the search engine revenue of the PBM-GSP mechanism. Its formal definition is as follows.
(Keyword-Level Expressiveness) Given a valuation profile $\mathbf{v}$, we call the auction system $\rho$-KL-expressive if, for any advertiser $i$, $L$ keywords can cover at least a $\rho$ fraction of his/her positive keywords $K_i$, i.e., $\min(L, |K_i|) \ge \rho\, |K_i|$. We call an auction system $\rho$-KL-expressive in the Bayesian setting if, for any valuation profile sampled from $F$, the auction system is $\rho$-KL-expressive. When $\rho = 1$, we say the auction system is fully KL-expressive. (In real sponsored search systems, the number of keywords that an advertiser can bid on is usually large enough to satisfy most of his/her needs. For example, in Google Adwords, advertisers are allowed to bid on up to 3 million keywords, which can be regarded as quite a large number. In this case, we can consider the system fully KL-expressive.)
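KL-expressiveness for a fixed valuation profile can be computed as the smallest coverage fraction over advertisers. The sketch below rests on two assumptions of ours: an advertiser's positive keywords are taken to be the keywords matched to at least one of his/her positive queries, and the coverage condition is read as $\min(L, |K_i|) \ge \rho |K_i|$. All names and data are hypothetical.

```python
from typing import Dict, Set


def positive_keywords(adv: str, values: Dict[str, Dict[str, float]],
                      neighbors: Dict[str, Set[str]]) -> Set[str]:
    """K_i (assumed): keywords matched to at least one positively valued query."""
    return {k for q, v in values[adv].items() if v > 0 for k in neighbors[q]}


def kl_expressiveness(values: Dict[str, Dict[str, float]],
                      neighbors: Dict[str, Set[str]], max_keywords: int) -> float:
    """Largest rho with min(L, |K_i|) >= rho * |K_i| for every advertiser i."""
    rho = 1.0
    for adv in values:
        k_i = positive_keywords(adv, values, neighbors)
        if k_i:
            rho = min(rho, min(1.0, max_keywords / len(k_i)))
    return rho


neighbors = {"q1": {"k1"}, "q2": {"k1", "k2"}}  # N(q) for each query
values = {
    "A": {"q1": 4.0, "q2": 2.0},  # K_A = {k1, k2}
    "B": {"q1": 1.0, "q2": 3.0},  # K_B = {k1, k2}
    "C": {"q1": 5.0},             # K_C = {k1}
}
print(kl_expressiveness(values, neighbors, max_keywords=1))  # 0.5
print(kl_expressiveness(values, neighbors, max_keywords=2))  # 1.0 (fully KL-expressive)
```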
4.1 Bayes-Nash Price of Anarchy
In this subsection, we analyze the Bayes-Nash PoA of the PBM-GSP mechanism. We first extend the concept of homogeneity proposed in prior work to the Bayesian setting. We call the extended concept expected homogeneity; it measures, in an expectation sense, the diversity of advertisers' valuations on the queries matched to the same keyword. For completeness, we list the definitions of both homogeneity and expected homogeneity below (in the full-information setting, expected homogeneity trivially reduces to homogeneity).
(Homogeneity) A keyword $k$ is $\alpha$-homogeneous if for every advertiser $i$ and any two queries $q, q' \in N(k)$, $v_{i,q} \le \alpha\, v_{i,q'}$. The auction system is $\alpha$-homogeneous if every keyword is $\alpha$-homogeneous.
(Expected Homogeneity) A keyword $k$ is $\alpha$-expected-homogeneous if, for any advertiser $i$ and any two queries $q, q' \in N(k)$, the homogeneity condition holds in expectation over $\mathbf{v} \sim F$. The auction system is $\alpha$-expected-homogeneous if every keyword is $\alpha$-expected-homogeneous.
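In the full-information setting, the smallest homogeneity factor of an auction system is the largest ratio between an advertiser's values on two queries sharing a keyword. The sketch below computes it for hypothetical data; following the definition literally, it returns infinity when an advertiser values one query of a keyword positively and another one at zero.

```python
import math
from typing import Dict, Set


def homogeneity(values: Dict[str, Dict[str, float]],
                keyword_queries: Dict[str, Set[str]]) -> float:
    """Smallest alpha with v_{i,q} <= alpha * v_{i,q'} for every advertiser i,
    keyword k and pair q, q' in N(k); math.inf if some v_{i,q'} = 0 < v_{i,q}."""
    alpha = 1.0
    for queries in keyword_queries.values():
        for vals in values.values():
            vs = [vals.get(q, 0.0) for q in queries]
            hi, lo = max(vs), min(vs)
            if hi == 0:
                continue  # the advertiser values none of this keyword's queries
            if lo == 0:
                return math.inf
            alpha = max(alpha, hi / lo)
    return alpha


keyword_queries = {"k1": {"q1", "q2"}, "k2": {"q2"}}  # N(k) for each keyword
values = {"A": {"q1": 4.0, "q2": 2.0}, "B": {"q1": 1.0, "q2": 3.0}}
print(homogeneity(values, keyword_queries))  # 3.0, driven by B on k1
print(homogeneity({**values, "C": {"q1": 5.0}}, keyword_queries))  # inf
```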
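We leverage the semi-smoothness technique developed in prior work for analyzing the PoA of the GSP auction.
(Semi-smoothness) We say that a game is $(\lambda, \mu)$-semi-smooth if for each player $i$ there exists some (possibly randomized) strategy $s_i'$, depending only on the type $\mathbf{v}_i$ of the player, such that
$$\sum_i \mathbb{E}\big[u_i(s_i'(\mathbf{v}_i), \mathbf{b}_{-i}; \mathbf{v}_i)\big] \ge \lambda\, \mathrm{SW}(\mathrm{OPT}(\mathbf{v})) - \mu\, \mathrm{SW}(\mathbf{b})$$
holds for every pure strategy profile $\mathbf{b}$ and every (fixed) type vector $\mathbf{v}$ (the expectation is taken over the random bits of $s_i'$).
(Lemma) If a game is $(\lambda, \mu)$-semi-smooth and its social welfare is at least the sum of the players' utilities, then the price of anarchy with uncertainty is at most $(\mu + 1)/\lambda$.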
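For intuition, the standard argument behind this lemma takes only a few lines. The following is a sketch in our notation (the text above does not reproduce it), where $\mathbf{s}$ is any Bayes-Nash equilibrium, $s_i'$ are the deviation strategies guaranteed by semi-smoothness, and $\lambda, \mu$ are the semi-smoothness constants.

```latex
% s: any Bayes-Nash equilibrium; s_i': deviations from the semi-smoothness definition.
\begin{align*}
\mathbb{E}_{\mathbf{v}}\Big[\textstyle\sum_i u_i(\mathbf{s}(\mathbf{v});\mathbf{v}_i)\Big]
 &\ge \mathbb{E}_{\mathbf{v}}\Big[\textstyle\sum_i u_i\big(s_i'(\mathbf{v}_i),\mathbf{s}_{-i}(\mathbf{v}_{-i});\mathbf{v}_i\big)\Big]
 && \text{(equilibrium; } s_i' \text{ depends only on } \mathbf{v}_i)\\
 &\ge \lambda\,\mathbb{E}_{\mathbf{v}}\big[\mathrm{SW}(\mathrm{OPT}(\mathbf{v}))\big] - \mu\,\mathbb{E}_{\mathbf{v}}\big[\mathrm{SW}(\mathbf{s}(\mathbf{v}))\big]
 && \text{(semi-smoothness, pointwise in } \mathbf{v}).
\end{align*}
% Social welfare is at least the sum of utilities, so the left-hand side is at most
% E[SW(s(v))]. Moving the mu-term to the left and dividing by lambda gives
\[
\frac{\mathbb{E}_{\mathbf{v}}\big[\mathrm{SW}(\mathrm{OPT}(\mathbf{v}))\big]}
     {\mathbb{E}_{\mathbf{v}}\big[\mathrm{SW}(\mathbf{s}(\mathbf{v}))\big]}
\le \frac{1+\mu}{\lambda}.
\]
```

The first step only needs each $s_i'$ to be a valid unilateral deviation, which is why semi-smoothness requires $s_i'$ to depend on player $i$'s own type alone.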
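With the above definitions and lemma, we give an upper bound on the Bayes-Nash PoA of the PBM-GSP mechanism.
Theorem 4.1. If the auction system is $\rho$-KL-expressive and $\alpha$-expected-homogeneous, and the GSP auction is a $(\lambda, \mu)$-semi-smooth game, the Bayes-Nash PoA of the PBM-GSP mechanism is at most .
To prove the theorem, we use the welfare generated by a truthful bidding profile to connect the optimal welfare and the welfare in any Bayes-Nash equilibrium. Here, the truthful bidding profile $\bar{\mathbf{b}}$ denotes the situation in which all advertisers bid their expected values on every keyword and there is no constraint on the number of bid keywords, i.e., $\bar b_{i,k} = \bar v_{i,k}$ for all $i$ and $k$. In this situation, $\mathrm{SW}(\bar{\mathbf{b}})$ equals $\sum_{k \in K} \big(\sum_{q \in N(k)} \pi(q)\, p_q(k)\big) \sum_{j=1}^{m} \beta_j\, \bar v_k^{(j)}$, where $\bar v_k^{(j)}$ is the $j$-th largest value among all the expected valuations on keyword $k$. We prove the theorem in two steps: first, we bound the ratio between the expected welfare of the truthful profile and the expected welfare in any Bayes-Nash equilibrium; then, we bound the ratio between the expected optimal welfare and the expected welfare of the truthful profile. The details of the two steps are given below.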
For the first step, we show that if the GSP auction is a $(\lambda, \mu)$-semi-smooth game, then for any Bayes-Nash equilibrium of the PBM-GSP mechanism, the following bound holds:
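Note that with the PBM-GSP mechanism, advertisers do not compete across keywords. For each advertiser $i$, define the utility on any positive keyword $k$ as $u_{i,k}(\mathbf{b}) = \sum_{q \in N(k)} \pi(q)\, p_q(k)\, \beta_{\mathrm{pos}_k(i;\mathbf{b})} \big(v_{i,q} - c_{i,k}(\mathbf{b})\big)$. By the definition of $\bar v_{i,k}$, this utility can be rewritten as $u_{i,k}(\mathbf{b}) = w_k\, \beta_{\mathrm{pos}_k(i;\mathbf{b})} \big(\bar v_{i,k} - c_{i,k}(\mathbf{b})\big)$, where $w_k = \sum_{q \in N(k)} \pi(q)\, p_q(k)$. Thus, on this particular keyword, the advertiser's utility is exactly that of a GSP auction with true value $\bar v_{i,k}$, scaled by the constant $w_k$. Denote by $K_i$ the set of positive keywords of advertiser $i$. Since the game on a given keyword is $(\lambda, \mu)$-semi-smooth, there must exist a (randomized) strategy $s_{i,k}'$ on keyword $k$ such that, for every pure strategy profile $\mathbf{b}$,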
where is the welfare generated from keyword , i.e., , and .
On this basis, we design a randomized strategy $s_i'$ for advertiser $i$ as follows. The strategy first samples $\min(L, |K_i|)$ keywords uniformly at random from $K_i$, and plays $s_{i,k}'$ on keyword $k$ if $k$ is sampled. Since the auction system is $\rho$-KL-expressive, each keyword in $K_i$ is sampled with probability at least $\rho$.
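The sampling step can be checked directly: drawing $\min(L, |K_i|)$ keywords uniformly without replacement includes any fixed keyword with probability $\min(L, |K_i|)/|K_i|$, which is at least $\rho$ under the KL-expressiveness reading $\min(L, |K_i|) \ge \rho |K_i|$. This sketch, assuming uniform sampling via the standard `random.Random.sample` and hypothetical keyword names, computes that probability exactly by enumerating all equally likely subsets.

```python
import itertools
import random
from fractions import Fraction
from typing import List, Sequence


def sample_keywords(positive_keywords: Sequence[str], max_keywords: int,
                    rng: random.Random) -> List[str]:
    """Pick min(L, |K_i|) of the advertiser's positive keywords uniformly at random."""
    n = min(max_keywords, len(positive_keywords))
    return rng.sample(list(positive_keywords), n)


def inclusion_probability(num_keywords: int, max_keywords: int) -> Fraction:
    """Exact probability that a fixed keyword is sampled, by enumerating subsets."""
    n = min(max_keywords, num_keywords)
    subsets = list(itertools.combinations(range(num_keywords), n))
    hits = sum(1 for s in subsets if 0 in s)  # subsets containing keyword 0
    return Fraction(hits, len(subsets))


print(inclusion_probability(4, 2))  # 1/2, i.e. L / |K_i|
print(inclusion_probability(3, 5))  # 1, since |K_i| <= L
```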
It is then straightforward to obtain
Since the social welfare is at least the total utility of all players, for any Bayes-Nash equilibrium $\mathbf{s}$ we have
Then inequality (3) follows.
For the second step, we show that . Considering
it suffices to prove for any keyword and any query , . Since the auction system is -expected-homogeneous, the following result holds with probability one,
Without loss of generality, we assume that for keyword . Then we have
Applying (8) to (6) proves the claim of the second step, and the theorem follows by combining the two steps.
It has been shown in prior work that the GSP auction is -semi-smooth. Furthermore, it is straightforward to show that the GSP auction in the single-slot case is a -semi-smooth game. Therefore, we obtain the following two corollaries.
If the auction system is $\rho$-KL-expressive and $\alpha$-expected-homogeneous, the Bayes-Nash PoA of the PBM-GSP mechanism is at most .
If the auction system is $\rho$-KL-expressive and $\alpha$-expected-homogeneous and there is only one slot to display ads, the Bayes-Nash PoA of the PBM-GSP mechanism is at most .
4.2 Pure Price of Anarchy in Full-Information Setting
In this subsection, we analyze the pure PoA of the PBM-GSP mechanism. In particular, based on the notions of KL-expressiveness and homogeneity, we derive the following pure PoA bound.
Theorem 4.2. If the auction system is $\rho$-KL-expressive and $\alpha$-homogeneous, the pure PoA of the PBM-GSP mechanism for the multi-slot case is at most .
Similar to Theorem 4.1, the proof of Theorem 4.2 consists of two steps. For the first step, we prove .
Denote as the set of (keyword, position) pairs that advertiser wins when all advertisers bid truthfully, i.e., , and denote as the set of keywords in . Given any bid profile , denote as the set of (keyword, position) pairs that advertiser actually bids on and wins, i.e., , and denote as the set of keywords in , whose size is no larger than .
We divide advertisers into three categories, : (1) advertisers in bid on keywords and ; (2) advertisers in bid on keywords and ; (3) advertisers in bid on fewer than keywords. We apply the equilibrium conditions to the three categories respectively.
1. For any advertiser in category , by definition, advertiser wins a position in any keyword in .
So, first, advertiser will not increase his/her payoff by changing his/her strategy from bidding a keyword with position to any keyword with position , where . Considering all advertisers are conservative, we have
Summing up both sides over all advertisers , where and , we have
Second, advertiser will not increase his/her payoff by changing his/her strategy from bidding on keyword with position to bidding the same keyword with position , where , , and . Similar to (9), we have
Summing up both sides over all advertisers , and where , we have
Considering that , and , we obtain
2. For advertiser in category and , since is a Nash equilibrium, it is clear that , and for , , , the following holds,
By summing over all advertisers , and , we have