1 Introduction
Online advertising has become a key revenue source for many businesses on the Internet. Sponsored search is a major type of online advertising, which displays paid advertisements (ads) along with organic search results. The Generalized Second Price (GSP) auction is one of the most commonly used auction mechanisms in sponsored search, and works as follows. When a query is issued by a web user, the search engine ranks all the ads bidding on this query (or on keywords related to the query) according to their bid prices, and charges the owner of a clicked ad the minimum bid price he/she would need to maintain the current rank position.^{1}
^{1} In practice, the predicted click-through rate is also used in the ranking and pricing rules. However, it can be safely absorbed into the weighted bid prices without influencing the theoretical analysis of GSP auctions.
If only the ads that bid exactly on the query are included in the auction, we call the corresponding mechanism an exact-match mechanism. The GSP auction in this specific setting has been well studied in the literature [2, 29, 23, 37, 1, 8, 18], and has been shown to have a number of nice theoretical properties: (1) it possesses an efficient (welfare-maximizing) Nash equilibrium; (2) its social welfare in equilibrium is fairly good even in the worst case: the pure price of anarchy (PoA) is bounded by 1.282 and the Bayes-Nash PoA is bounded by 2.927; (3) in the Bayesian setting, the GSP auction paired with the Myerson reserve price generates at least a constant fraction of the optimal revenue in its Bayes-Nash equilibria for MHR (monotone hazard rate) distributions.
Despite these fruitful and positive results, the exact-match mechanism is not sufficient for the practical requirements of commercial search engines. First, the query space is extremely large (billions of queries are issued by web users every day), so it is practically impossible for advertisers to bid on every query related to their ads. Second, even if advertisers were able to bid on the huge number of related queries, the search engine might not be able to afford it because of scalability and latency constraints. For these reasons, commercial search engines usually use a broad-match mechanism to enhance the GSP auction. A broad-match mechanism requires advertisers to bid on at most a fixed number of keywords instead of an arbitrary number of queries, and matches the keywords to queries using a query-keyword bipartite graph (in which the number of keywords is significantly smaller than the number of queries). The broad-match mechanism is friendly to advertisers, since they only need to consider a relatively small number of keywords in order to reach a large number of related queries. The mechanism is also friendly to the search engine, since it restricts the complexity of the bidding language and therefore that of the auction system.
Today, most search engines implement the broad-match mechanism in a straightforward manner. That is, when a query is issued, all the ads bidding on keywords that can be matched to the query on the query-keyword bipartite graph are put together for the GSP auction, and for every advertiser, the bids on the matched keywords are transformed into a bid on the query using some predefined heuristic (e.g., the maximum bid on the matched keywords). For ease of reference, we call the broad-match mechanism described above the Standard Broad-Match GSP mechanism, or SBMGSP for short. Although this mechanism effectively addresses the problems with the exact-match mechanism, as far as we know, it has several theoretical drawbacks.

The social welfare of the SBMGSP mechanism was studied in [14], for the single-slot case and the full-information setting only. By using the notion of homogeneity to measure the diversity of an advertiser’s valuations over the different queries that can be matched to a keyword, an almost-tight pure PoA bound was derived in terms of the homogeneity parameter. Considering that this parameter is usually large in practice, it can be concluded that the social welfare of the SBMGSP mechanism can be rather bad in its worst equilibrium.

A complete picture of the theoretical properties of the SBMGSP mechanism is still missing: no results are available for the multi-slot case (which is, however, more important in practice, since most search engines sell multiple ad slots per query), and even for the single-slot case, the social welfare and revenue in the Bayesian setting are not clear.
Given the aforementioned limitations of the SBMGSP mechanism, a natural question to ask is whether we can design a broad-match mechanism with better guarantees on its performance, in terms of both social welfare and revenue, for both single-slot and multi-slot cases, and in both full-information and Bayesian settings. This is exactly the focus of our work.
In this paper, we propose a new broad-match mechanism, which we call the Probabilistic Broad-Match mechanism. Its basic idea is as follows. For each query, our mechanism assigns a matching probability to every keyword that can be matched to this query on the query-keyword bipartite graph. When the query is issued by a user, the mechanism randomly chooses a keyword according to the matching probability distribution and runs the GSP auction only upon those ads that bid on the chosen keyword. For simplicity, we also use PBMGSP to refer to the above mechanism.
We perform a comprehensive study of the social welfare in equilibrium of the PBMGSP mechanism, for both single-slot and multi-slot cases, and in both full-information and Bayesian settings. We also derive a revenue bound for the PBMGSP mechanism for both single-slot and multi-slot cases in the Bayesian setting. To the best of our knowledge, this is the first work on broad-match mechanisms that goes beyond the single-slot case and the full-information setting.
Our Results
The contributions of our work can be summarized as follows.

(Section 3) We propose a novel broad-match mechanism (i.e., the PBM mechanism) for multi-slot sponsored search auctions.

(Section 4) We analyze the social welfare in equilibrium of the PBMGSP mechanism in both full-information and Bayesian settings. We define a new concept, called keyword-level expressiveness, which characterizes the expressiveness of the bidding language in the PBMGSP mechanism better than the concept of expressiveness proposed in previous work [14].

(Section 4.1) We extend the concept of homogeneity defined in [14] to the Bayesian setting, and prove an upper bound on the Bayes-Nash PoA of PBMGSP in the multi-slot case. The bound can be further tightened in the single-slot case.

(Section 4.2) We prove an upper bound on the pure PoA of PBMGSP in the full-information setting when there are multiple slots to display ads. The bound can be improved in the single-slot case (where it is tight with respect to each factor). Furthermore, we show that the pure PoA bound of PBMGSP is better than that of SBMGSP in the same setting under mild conditions.


(Section 5) We analyze the revenue bound of PBMGSP in the Bayesian setting. We prove that, by applying the Myerson reserve price to each keyword, PBMGSP achieves a revenue that is at least a guaranteed fraction of the optimal social welfare under MHR distributions, where the fraction depends on the maximum derivative of the virtual value function.
2 Preliminaries
In this section, we introduce the basics of broad-match auctions and some preliminary concepts that will be used in our theoretical analysis.
2.1 Broad-Match Auctions
According to [21, 7, 20, 14], a broad-match mechanism can be defined on a query-keyword bipartite graph. The model consists of a query space, equipped with a probability distribution that gives the probability that each query is issued by users, and a keyword space. In practice, the query space is much larger than the keyword space. The two spaces are connected by an undirected bipartite graph in which a query and a keyword are joined by an edge if and only if the query can be matched to the keyword (or, equivalently, the keyword can be matched to the query). The neighborhood of a vertex is the set of vertices adjacent to it: for any query, it is the set of keywords that can be matched to the query, and for any keyword, it is the set of queries that can be matched to the keyword. Without loss of generality, we assume that every query and every keyword has a nonempty neighborhood.
Assume there are multiple advertisers and multiple ad slots. Each ad slot has an associated click probability,^{2} and a higher slot has a larger click probability. Each advertiser has a private valuation for each query, which is realized if his/her ad is clicked by the users. The valuation profile of the advertisers consists of one vector per advertiser, giving that advertiser’s valuations for all the queries; the valuations of the other advertisers are defined analogously. We assume that for any query there is at least one advertiser that positively values it. For each advertiser, we also define the set of queries on which he/she has positive values. For ease of reference, in the rest of the paper we call the queries (keywords) that an advertiser positively values positive queries (keywords).
^{2} In the real world, the number of slots is usually bounded by a constant, which can be accommodated in this notation without loss of generality.
The advertisers’ bid profile consists of one vector per advertiser, giving that advertiser’s bid prices on all the keywords in the keyword space; the bids of the advertisers other than a given advertiser are defined analogously. In line with industry practice, we assume that each advertiser can bid on only a bounded number of keywords. As a result, each advertiser’s bid vector has at most that many positive entries. We also use dedicated notation for the bid price of an advertiser on a given keyword and for all the advertisers’ bids on a given keyword.
Based on the notation above, SBMGSP can be described as follows. When a query is issued, the SBMGSP mechanism first finds all the keywords that can be matched to the query. Second, it includes all the ads that bid on these keywords in the auction and transforms each advertiser’s bid prices on the matched keywords into his/her bid price on the query using a predefined rule, such as the maximum bid over the matched keywords. Finally, the GSP auction is run on the ads with their query-level bids, i.e., all the ads are ranked by their bids, and the payment of a clicked ad equals the bid of the ad ranked right below it.
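The SBMGSP procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the mechanism as deployed by any search engine: the function names, the dictionary-based data layout, and the use of the maximum matched bid as the query-level bid are all assumptions made for this sketch, and click-through weights are ignored.

```python
from typing import Dict, List, Set, Tuple


def gsp_auction(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Rank ads by bid; each winner's per-click price is the next-ranked bid.

    The lowest-ranked ad with nobody below it pays 0 (no reserve price here).
    Ties are broken by advertiser name, an arbitrary choice for this sketch.
    """
    ranked = sorted((a for a in bids if bids[a] > 0), key=lambda a: (-bids[a], a))
    results = []
    for pos, adv in enumerate(ranked[:num_slots]):
        price = bids[ranked[pos + 1]] if pos + 1 < len(ranked) else 0.0
        results.append((adv, price))
    return results


def sbmgsp(
    query: str,
    graph: Dict[str, Set[str]],                 # query -> matched keywords
    keyword_bids: Dict[str, Dict[str, float]],  # advertiser -> keyword -> bid
    num_slots: int,
) -> List[Tuple[str, float]]:
    """Pool all ads bidding on any matched keyword, then run one GSP auction."""
    matched = graph.get(query, set())
    query_bids: Dict[str, float] = {}
    for adv, kw_bids in keyword_bids.items():
        matched_bids = [b for kw, b in kw_bids.items() if kw in matched and b > 0]
        if matched_bids:
            # Assumed transformation rule: the maximum bid over matched keywords.
            query_bids[adv] = max(matched_bids)
    return gsp_auction(query_bids, num_slots)
```

For example, with `graph = {"q1": {"k1", "k2"}}` and bids `{"a": {"k1": 3.0, "k2": 5.0}, "b": {"k2": 4.0}}`, advertiser `a` enters the auction for `q1` with a query-level bid of 5.0 and pays 4.0 per click.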
2.2 Solution Concepts
In this paper, we consider rational behavior under various assumptions on the availability of information to the advertisers. In general, the advertisers are engaged as players in a game defined by the auction mechanism (in the remainder of the paper, we use “advertiser” and “player” interchangeably). Every advertiser aims at selecting a bidding strategy that maximizes his/her utility. According to the availability of information, we distinguish the Bayesian setting (partial-information setting) from the full-information setting.
In the Bayesian setting, we assume that the valuation (type) profile is drawn from a publicly known distribution. A strategy for a player is a (possibly randomized) mapping from his/her type to a bid vector, and applying the strategies of all players to a valuation profile yields the corresponding bid profile. Each advertiser has a utility function. We say that a strategy profile is a Bayes-Nash equilibrium for the distribution if, for every player, every type of that player, and every alternative strategy, the player’s expected utility under the equilibrium strategy is at least his/her expected utility under the alternative strategy.
In other words, in a Bayes-Nash equilibrium, each player maximizes his/her expected utility using his/her own equilibrium strategy, assuming that the others bid according to their equilibrium strategies.
In the full-information setting, the valuation profile is known and fixed. In this setting, a pure strategy of any advertiser corresponds to a bid vector. We say that a bid profile is a (pure) Nash equilibrium if there is no deviation from which a player can be better off, i.e., no advertiser can increase his/her utility by unilaterally switching to another bid vector.
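In symbols, the two equilibrium conditions take the standard form below. The notation is assumed for this sketch and is not taken from the text: $s_i$ denotes player $i$'s strategy, $v_i$ his/her type, $u_i$ his/her utility, and $(b_i, b_{-i})$ a bid profile.

```latex
% Bayes-Nash equilibrium: for every player i, every type v_i,
% and every alternative strategy s_i' (expectations also cover any
% randomness in the strategies themselves)
\mathbb{E}_{v_{-i} \mid v_i}\!\left[ u_i\bigl(s_i(v_i),\, s_{-i}(v_{-i})\bigr) \right]
\;\ge\;
\mathbb{E}_{v_{-i} \mid v_i}\!\left[ u_i\bigl(s_i'(v_i),\, s_{-i}(v_{-i})\bigr) \right]

% Pure Nash equilibrium: for every player i and every alternative bid vector b_i'
u_i(b_i,\, b_{-i}) \;\ge\; u_i(b_i',\, b_{-i})
```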
3 Probabilistic Broad-Match Mechanism
As discussed in the introduction, the SBMGSP mechanism has several drawbacks from a theoretical perspective. In this paper, we develop a new broad-match mechanism with better theoretical guarantees, which we call the Probabilistic Broad-Match (PBMGSP) mechanism. The details of the PBMGSP mechanism are described in Algorithm 1 and explained below.
Given the query-keyword bipartite graph, for each query we impose a matching probability distribution whose support is the set of keywords that can be matched to the query, i.e., a keyword has positive matching probability if and only if it can be matched to the query, and the matching probabilities of each query sum to one. With this matching probability distribution, for any issued query the mechanism randomly samples a keyword and selects the ads bidding on that keyword into the auction. For each selected ad, the bid price on the sampled keyword is directly used as the bid price on the query during this round of auction,^{3} and then a GSP auction is run to determine the ad allocations and prices.
^{3} One may have noticed that, due to the probabilistic sampling, an advertiser can only get access to a fraction of the whole query volume if he/she keeps bidding on the same set of keywords as he/she does with SBMGSP. Therefore, some advertisers may have to bid on more keywords so as to maintain the same visibility of their ads to the users. Fortunately, since the number of keywords is always significantly smaller than the number of queries, the situation will not be as serious as in the exact-match mechanism.
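The sampling step described above can be sketched as follows. This is an illustrative sketch, not a transcription of Algorithm 1: the function names, the dictionary layout, and the optional `rng` parameter are hypothetical, and click-through weights and reserve prices are omitted.

```python
import random
from typing import Dict, List, Optional, Tuple


def run_gsp(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Rank ads by bid; each winner pays the bid of the ad ranked right below."""
    ranked = sorted((a for a in bids if bids[a] > 0), key=lambda a: (-bids[a], a))
    results = []
    for pos, adv in enumerate(ranked[:num_slots]):
        price = bids[ranked[pos + 1]] if pos + 1 < len(ranked) else 0.0
        results.append((adv, price))
    return results


def pbmgsp(
    query: str,
    match_prob: Dict[str, Dict[str, float]],    # query -> keyword -> probability
    keyword_bids: Dict[str, Dict[str, float]],  # advertiser -> keyword -> bid
    num_slots: int,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Sample one matched keyword, then run GSP only among ads bidding on it."""
    rng = rng or random.Random()
    dist = match_prob[query]
    keywords = sorted(dist)  # fixed order so that a seeded rng is reproducible
    chosen = rng.choices(keywords, weights=[dist[k] for k in keywords], k=1)[0]
    # The bid on the sampled keyword is used directly as the query-level bid.
    bids = {
        adv: kw_bids[chosen]
        for adv, kw_bids in keyword_bids.items()
        if kw_bids.get(chosen, 0.0) > 0
    }
    return chosen, run_gsp(bids, num_slots)
```

Unlike the pooled auction of SBMGSP, bids placed on different keywords never meet in the same auction here, which is the property the later analysis relies on.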
For ease of description, for any keyword and bid profile we define notation for the advertiser who is ranked at a given position and for the ranking position of a given advertiser. For the sake of rigor, we also fix conventions for the degenerate cases, namely when there are fewer positive bids on a keyword than the position in question and when an advertiser does not bid on a keyword. Finally, we define the price charged to a player when a keyword is sampled and a user clicks on his/her ad: for PBMGSP, this price equals the bid, on that keyword, of the advertiser ranked right below him/her. With the aforementioned notation, the expected utility of an advertiser is his/her expected click-weighted surplus (valuation minus price), where the expectation is taken over the issued query and the sampled keyword.
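One way to write this expected utility explicitly is given below. All symbols are assumptions of this sketch, not notation from the text: $\pi_q$ is the probability that query $q$ is issued, $N(q)$ the keywords matched to $q$, $p_{q,k}$ the matching probability, $\sigma_k(i;\mathbf{b})$ the rank of advertiser $i$ on keyword $k$, $\theta_j$ the click probability of slot $j$ (zero if the advertiser wins no slot), $v_{i,q}$ the valuation, and $c_{i,k}(\mathbf{b})$ the per-click price.

```latex
u_i(\mathbf{b})
  \;=\; \sum_{q} \pi_q \sum_{k \in N(q)} p_{q,k}\;
        \theta_{\sigma_k(i;\mathbf{b})}
        \bigl( v_{i,q} - c_{i,k}(\mathbf{b}) \bigr)
```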
As a common way to rule out unnatural equilibria [10, 32, 8, 14], we only consider conservative bidders in the theoretical analysis. It is easy to show that, for any advertiser and any keyword, a bid price above the advertiser’s expected value for the keyword is always weakly dominated by bidding exactly that expected value (see Lemma 1).
Lemma 1 (Conservative bidder). For any advertiser, a bid price on a keyword that exceeds his/her expected value for that keyword is always weakly dominated by bidding the expected value.
Proof. Note that with the PBMGSP mechanism, advertisers do not compete across keywords, so an advertiser’s total utility is the sum of the utilities he/she obtains from the individual keywords. For any bidding profile, if the advertiser bids above the expected value on a keyword and gets the same position as when bidding the expected value, changing his/her bid to the expected value does not hurt his/her total utility. If he/she bids a larger value and thereby obtains a better position, he/she pays more than the expected value whenever his/her ad is clicked, so his/her expected utility on that keyword is lower than that obtained by bidding the expected value, and the lemma follows.
In the PBM mechanism, bids for different keywords are never mixed in the same auction, so it is easier for advertisers to evaluate their payoffs on each keyword. As a result, they can develop more accurate bidding strategies that reflect their valuations on each keyword. For example, it can easily be shown that in the single-slot setting, the dominant strategy for an advertiser is to truthfully report the expected valuation on each keyword that he/she chooses to bid on. Formally, when there is only one slot to display ads, the weakly dominant strategy of any advertiser on a keyword is to bid his/her expected value for that keyword. In the next sections, we show that this probabilistic matching can improve the performance of the auction system.
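The expected keyword value used in the conservative-bidder argument can be illustrated with a short sketch. It assumes that the expected value of a keyword is the advertiser's valuation averaged over the queries that lead to that keyword, weighted by the probability that the query is issued and the keyword is then sampled; this conditional-expectation form, the function names, and the data layout are assumptions of the sketch.

```python
from typing import Dict


def expected_keyword_value(
    keyword: str,
    query_prob: Dict[str, float],             # query -> probability of being issued
    match_prob: Dict[str, Dict[str, float]],  # query -> keyword -> matching probability
    valuations: Dict[str, float],             # query -> this advertiser's valuation
) -> float:
    """Assumed form: E[valuation | the given keyword is the one sampled]."""
    weight_total = 0.0
    value_total = 0.0
    for query, pi_q in query_prob.items():
        weight = pi_q * match_prob.get(query, {}).get(keyword, 0.0)
        weight_total += weight
        value_total += weight * valuations.get(query, 0.0)
    return value_total / weight_total if weight_total > 0 else 0.0


def conservative_bid(bid: float, keyword_value: float) -> float:
    """Cap a bid at the expected keyword value, as a conservative bidder would."""
    return min(bid, keyword_value)
```

With two equally likely queries valued at 4.0 and 1.0, where the first always maps to the keyword and the second maps to it half of the time, the weights are 0.5 and 0.25, and the expected keyword value is 3.0.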
4 Social Welfare Analysis
In this section, we present our theoretical results on the social welfare (efficiency) of the proposed PBMGSP mechanism. Specifically, we study the ratio between the optimal social welfare and the worst-case welfare in equilibrium, which is also known as the Price of Anarchy (PoA) [27, 22, 12, 4]:

Bayes-Nash PoA: In the Bayesian setting, we assume every advertiser privately knows his/her own valuation vector for the queries, and only knows a prior distribution of the other advertisers’ valuation vectors. Assume the valuation profile is drawn from a public distribution. The Bayes-Nash PoA is defined as the worst-case ratio, over Bayes-Nash equilibria, between the expected optimal social welfare and the expected social welfare in equilibrium.
Here the optimal social welfare, given by Equation (1), is that of the allocation that assigns the j-th slot of every query to the player with the j-th largest valuation for that query. Similarly, the social welfare of the PBMGSP mechanism under a given bidding profile, given by Equation (2), is the expected click-weighted value of the ads that the mechanism displays.

Pure PoA: In the full-information setting, the valuation of each advertiser on each query is fixed, and the pure PoA is defined as the worst-case ratio between the optimal social welfare and the social welfare of a pure Nash equilibrium.
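These welfare quantities and ratios can be written out as below. The notation is assumed for this sketch: $\pi_q$ is the query probability, $p_{q,k}$ the matching probability, $\theta_j$ the click probability of slot $j$ out of $m$ slots, $v_q^{(j)}$ the $j$-th largest valuation for query $q$, and $\sigma_k^{-1}(j;\mathbf{b})$ the advertiser ranked at position $j$ on keyword $k$. Whether the suprema also range over distributions or instances is likewise an assumption.

```latex
% (1) optimal social welfare
\mathrm{OPT}(\mathbf{v}) \;=\; \sum_{q} \pi_q \sum_{j=1}^{m} \theta_j \, v_q^{(j)}

% (2) social welfare of PBMGSP under bid profile b
\mathrm{SW}(\mathbf{b}, \mathbf{v}) \;=\;
  \sum_{q} \pi_q \sum_{k \in N(q)} p_{q,k} \sum_{j=1}^{m}
  \theta_j \, v_{\sigma_k^{-1}(j;\mathbf{b}),\, q}

% Bayes-Nash PoA (s ranges over Bayes-Nash equilibria)
\mathrm{PoA}_{\mathrm{BN}} \;=\; \sup_{\mathbf{s}}
  \frac{\mathbb{E}_{\mathbf{v}}\bigl[\mathrm{OPT}(\mathbf{v})\bigr]}
       {\mathbb{E}_{\mathbf{v}}\bigl[\mathrm{SW}(\mathbf{s}(\mathbf{v}), \mathbf{v})\bigr]}

% Pure PoA (b ranges over pure Nash equilibria)
\mathrm{PoA}_{\mathrm{pure}} \;=\; \sup_{\mathbf{b}}
  \frac{\mathrm{OPT}(\mathbf{v})}{\mathrm{SW}(\mathbf{b}, \mathbf{v})}
```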
In order to characterize the influence of the maximum number of keywords that an advertiser can bid on, we use expressiveness to measure the capacity of the bidding language. The concept of expressiveness has been widely used in the literature on auction theory [36, 13, 28, 5], and its theoretical foundation has been established in [3]. In this paper, we use a new notion of expressiveness, which we call keyword-level (KL) expressiveness. As will be seen in later sections, KL-expressiveness affects both the social welfare and the search engine revenue of the PBMGSP mechanism. The formal definition of KL-expressiveness is given below.
Definition (Keyword-Level Expressiveness). Given a valuation profile, we call the auction system KL-expressive with a given factor if, for any advertiser, the maximum number of keywords he/she can bid on covers at least that fraction of his/her positive keywords. We call an auction system KL-expressive (in the Bayesian setting) if, for any valuation profile sampled from the distribution, the auction system is KL-expressive. When the fraction equals one, we say the auction system is fully KL-expressive.^{4}
^{4} In real sponsored search systems, the number of keywords that an advertiser can bid on is usually large enough to satisfy most of his/her needs. For example, in Google Adwords, advertisers are allowed to bid on up to 3 million keywords, which can be regarded as quite a large number. In this case, we can consider the system as fully KL-expressive.
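A small sketch can make the definition concrete. It assumes that the expressiveness factor of a system is the smallest, over advertisers, of the ratio between the keyword limit and the number of the advertiser's positive keywords, capped at one; the function name and data layout are hypothetical.

```python
from typing import Dict, Set


def kl_expressiveness(max_keywords: int, positive_keywords: Dict[str, Set[str]]) -> float:
    """Largest fraction of positive keywords that every advertiser can cover.

    max_keywords is the (assumed) per-advertiser limit on bid keywords;
    positive_keywords maps each advertiser to his/her positive keywords.
    """
    factor = 1.0
    for keywords in positive_keywords.values():
        if keywords:  # advertisers without positive keywords impose no constraint
            factor = min(factor, max_keywords / len(keywords))
    return factor
```

A returned value of 1.0 corresponds to the fully KL-expressive case, in which every advertiser can bid on all of his/her positive keywords.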
4.1 Bayes-Nash Price of Anarchy
In this subsection, we analyze the Bayes-Nash PoA for the PBMGSP mechanism. We first extend the concept of homogeneity proposed in [14] to the Bayesian setting. We call the extended concept expected homogeneity, which measures the diversity of advertisers’ valuations on the queries matched to the same keyword in an expectation sense. For completeness, we list the definitions of both homogeneity and expected homogeneity below (in the full-information setting, expected homogeneity trivially reduces to homogeneity).
Definition (Homogeneity) [14]. A keyword is homogeneous with a given factor if, for every advertiser, his/her valuations of any two queries that can be matched to the keyword differ by at most that factor. The auction system is homogeneous if every keyword is homogeneous.
Definition (Expected Homogeneity). A keyword is expected-homogeneous with a given factor if, for any advertiser and any two queries that can be matched to the keyword, the analogous condition holds in an expectation sense. The auction system is expected-homogeneous if every keyword is expected-homogeneous.
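The homogeneity factor of a keyword can be computed directly in the full-information case. The sketch assumes the condition is a ratio bound, i.e., that the factor is the largest ratio between any two valuations of the same advertiser over the queries matched to the keyword; the exact inequality, the function name, and the data layout are assumptions.

```python
import math
from typing import Dict, Iterable


def homogeneity_factor(
    valuations: Dict[str, Dict[str, float]],  # advertiser -> query -> valuation
    matched_queries: Iterable[str],           # queries matched to the keyword
) -> float:
    """Smallest c >= 1 with v[i][q] <= c * v[i][q'] for all advertisers i
    and all matched queries q, q' (assumed form of the definition)."""
    queries = list(matched_queries)
    factor = 1.0
    for per_query in valuations.values():
        values = [per_query.get(q, 0.0) for q in queries]
        if not values:
            continue
        highest, lowest = max(values), min(values)
        if highest == 0:
            continue  # all-zero valuations satisfy the bound for any c
        if lowest == 0:
            return math.inf  # a zero next to a positive value admits no finite c
        factor = max(factor, highest / lowest)
    return factor
```

A factor of 1.0 means every advertiser values all queries matched to the keyword equally, while large factors correspond to the diverse valuations for which the worst-case welfare bounds degrade.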
We leverage the technique developed in [10], which is used to analyze the PoA bound for the GSP auction.
Definition (Semi-smoothness) [10]. We say that a game is semi-smooth if for each player there exists some (possibly randomized) strategy, depending only on the type of the player, such that the semi-smoothness inequality holds for every pure strategy profile and every (fixed) type vector, where the expectation is taken over the random bits of the strategy.
Lemma [10]. If a game is semi-smooth and its social welfare is at least the sum of the players’ utilities, then the price of anarchy with uncertainty is bounded by a constant determined by the semi-smoothness parameters.
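For reference, the semi-smoothness inequality and the resulting bound are commonly stated in the following form in the smoothness literature. The parameter names $\lambda$ and $\mu$ and the remaining notation are assumptions of this sketch: $s_i'$ is the deviating strategy of player $i$, $\mathbf{s}$ an arbitrary pure strategy profile, and $\mathbf{v}$ a fixed type vector.

```latex
% (\lambda, \mu)-semi-smoothness
\sum_{i} \mathbb{E}\bigl[ u_i\bigl(s_i'(v_i),\, s_{-i};\, v_i\bigr) \bigr]
  \;\ge\; \lambda \cdot \mathrm{OPT}(\mathbf{v}) \;-\; \mu \cdot \mathrm{SW}(\mathbf{s};\, \mathbf{v})

% resulting bound on the price of anarchy with uncertainty
\mathrm{PoA} \;\le\; \frac{\mu + 1}{\lambda}
```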
With the above definitions and lemmas, we give an upper bound for the Bayes-Nash PoA of the PBMGSP mechanism.
Theorem 4.1. If the auction system is KL-expressive and expected-homogeneous, and the GSP auction is a semi-smooth game, then the Bayes-Nash PoA for the PBMGSP mechanism is bounded from above by a quantity that depends on the expressiveness, homogeneity, and semi-smoothness parameters.
To prove the theorem, we use the welfare generated from a truthful bidding profile to connect the optimal welfare and the welfare in any Bayes-Nash equilibrium. Here the truthful bidding profile denotes the situation in which all advertisers bid their expected values on every keyword and there is no constraint on the number of keywords they can bid on. In this situation, the j-th slot of each keyword goes to the advertiser with the j-th largest expected valuation on that keyword, which determines the truthful welfare. We prove the theorem in two steps. First, we bound the ratio between the truthful welfare and the welfare in equilibrium, and then we bound the ratio between the optimal welfare and the truthful welfare. The proof details of the two steps are given below.
For the first step, we show that if the GSP auction is a semi-smooth game, then for any Bayes-Nash equilibrium of the PBMGSP mechanism, the following bound holds,
(3) 
Note that with the PBMGSP mechanism, advertisers do not compete across keywords. For each advertiser, consider the utility he/she obtains on any positive keyword. By the definition of the expected keyword value, this utility can be rewritten in terms of that expected value. Thus, for this particular keyword, the advertiser’s utility is exactly that of a GSP auction in which his/her true value is the expected keyword value. Consider the set of positive keywords of the advertiser. Since the game within a given keyword is semi-smooth, there must exist a (randomized) strategy on the keyword satisfying, for every pure strategy,
(4) 
where is the welfare generated from keyword , i.e., , and .
On this basis, we design a randomized strategy for each advertiser as follows. The randomized strategy first randomly samples as many keywords as the bidding limit allows from the advertiser’s positive keywords, and plays the per-keyword strategy above on every sampled keyword. Since the auction system is KL-expressive, the probability that any given positive keyword is sampled by the strategy is no smaller than the expressiveness factor.
Then it is straightforward to attain
(5)  
Given the fact that the social welfare is at least the total utility of all the players, for any Bayes-Nash equilibrium, we have
Then inequality (3) follows.
For the second step, we show that . Considering
(6) 
it suffices to prove the corresponding inequality for any keyword and any query that can be matched to it. Since the auction system is expected-homogeneous, the following result holds with probability one,
(7)  
Without loss of generality, we assume that for keyword . Then we have
(8)  
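Applying (8) to (6) yields the bound claimed in the second step. The theorem then follows by combining the two steps.
In [10], it is shown that the GSP auction is semi-smooth. Furthermore, it is trivial to obtain that the GSP auction in the single-slot case is a semi-smooth game. Therefore, we can obtain the following two corollaries.
Corollary (multi-slot case). If the auction system is KL-expressive and expected-homogeneous, the Bayes-Nash PoA for the PBMGSP mechanism is bounded by the quantity of Theorem 4.1 evaluated at the semi-smoothness parameters of the GSP auction from [10].
Corollary (single-slot case). If the auction system is KL-expressive and expected-homogeneous and there is only one slot to display ads, the Bayes-Nash PoA for the PBMGSP mechanism is bounded by the same quantity evaluated at the semi-smoothness parameters of the single-slot GSP auction.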
4.2 Pure Price of Anarchy in Full-Information Setting
In this subsection, we analyze the pure PoA for the PBMGSP mechanism. In particular, based on the notions of KL-expressiveness and homogeneity, we derive the following pure PoA bound.
Theorem 4.2. If the auction system is KL-expressive and homogeneous, the pure PoA of the PBMGSP mechanism for the multi-slot case is bounded from above by a quantity that depends on the expressiveness and homogeneity parameters.
Similar to that of Theorem 4.1, the proof of Theorem 4.2 contains two steps. For the first step, we bound the truthful welfare in terms of the welfare in equilibrium.
For each advertiser, consider the set of (keyword, position) pairs that he/she wins when all advertisers bid truthfully, together with the set of keywords appearing in these pairs. Given any bid profile, consider also the set of (keyword, position) pairs that the advertiser actually bids on and wins, together with the set of keywords appearing in those pairs, whose size is no larger than the bidding limit.
We divide advertisers into three categories, : (1) advertisers in bid on keywords and ; (2) advertisers in bid on keywords and ; (3) advertisers in bid on fewer than keywords. We apply the equilibrium conditions to the three categories respectively.
1. For any advertiser in category , by definition, advertiser wins a position in any keyword in .
So, first, advertiser will not increase his/her payoff by changing his/her strategy from bidding a keyword with position to any keyword with position , where . Considering all advertisers are conservative, we have
(9)  
Summing up both sides over all advertisers , where and , we have
(10)  
Second, advertiser will not increase his/her payoff by changing his/her strategy from bidding on keyword with position to bidding the same keyword with position , where , , and . Similar to (9), we have
(11) 
Summing up both sides over all advertisers , and where , we have
(12)  
Summing up (10) and a multiple of (12), we have
(13)  
Considering that , and , we obtain
(14)  
2. For advertiser in category and , since is a Nash equilibrium, it is clear that , and for , , , the following holds,
(15) 
By summing over all advertisers , and , we have