Generalization Analysis for Game-Theoretic Machine Learning

10/09/2014 · by Haifang Li, et al.

For Internet applications like sponsored search, caution needs to be taken when using machine learning to optimize their mechanisms (e.g., auctions), since self-interested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms. To tackle this problem, a framework called game-theoretic machine learning (GTML) was recently proposed, which first learns a Markov behavior model to characterize agents' behaviors, and then learns the optimal mechanism by simulating agents' behavior changes in response to the mechanism. While GTML has demonstrated practical success, its generalization analysis is challenging because the behavior data are non-i.i.d. and dependent on the mechanism. To address this challenge, first, we decompose the generalization error for GTML into the behavior learning error and the mechanism learning error; second, for the behavior learning error, we obtain novel non-asymptotic error bounds for both parametric and non-parametric behavior learning methods; third, for the mechanism learning error, we derive a uniform convergence bound based on a new concept called the nested covering number of the mechanism space and on generalization analysis techniques developed for mixing sequences. To the best of our knowledge, this is the first work on the generalization analysis of GTML, and we believe it has general implications for the theoretical analysis of other complicated machine learning problems.


1 Introduction

Many Internet applications, such as sponsored search and crowdsourcing, can be regarded as dynamic systems that involve multi-party interactions. Specifically, users arrive at the system at random with their particular needs; agents provide products or services that could potentially satisfy users' needs; and the platform employs a mechanism to match agents with users. Afterwards, users may give feedback to the platform about their satisfaction; the platform extracts revenue and may provide agents with some signals as performance indicators. Since both the information reported by the agents and the mechanism affect the payoffs of the agents, self-interested agents may strategically adjust their behaviors (e.g., strategically report the information about their services or products) in response to the mechanism (or, more accurately, to the signals they receive, since the mechanism is invisible to them). Take sponsored search as an example. When a user submits a query to the search engine (the platform), the search engine runs an auction to determine a ranked list of ads based on the bid prices reported by the advertisers (the agents). If the user clicks on (gives feedback to) an ad, the search engine charges the corresponding advertiser a certain amount of money. After a few rounds of auctions, the search engine provides the advertisers with some signals on the auction outcome, e.g., the average rank positions of their ads, the numbers of clicks, and the total payments. Based on such signals, the advertisers may adjust their bidding behaviors to be better off in the future.

It is clear that the mechanism plays a central role in the aforementioned dynamic system. It determines the satisfaction of the users, the payoffs of the agents, and the revenue of the platform. Therefore, how to optimize the mechanism becomes an important research topic. In recent years, a number of research works [Lahaie and Pennock2007, Radlinski et al.2008, Zhu et al.2009a, Zhu et al.2009b, Medina and Mohri2014, He et al.2014, Tian et al.2014] have used machine learning to optimize the mechanism. These works could be categorized into three types.

  • Some researchers assume that the agents are fully rational and investigate the Nash (or dominant-strategy) equilibrium of the mechanism. For example, [Medina and Mohri2014] proposes a machine learning framework to optimize the second-price auction in sponsored search in the single-slot setting. In this case, the dominant strategy for fully rational advertisers is to truthfully reveal their valuations through the bid prices and therefore their bidding behaviors have no dynamics.

  • Some researchers assume that the behaviors of the agents are i.i.d. and independent of the mechanism, and optimize the mechanisms based on historical behavior data. For example, [Zhu et al.2009a] and [Zhu et al.2009b] apply machine learning algorithms to optimize the first-price auction based on the advertisers’ historical bidding data.

  • Some other researchers believe that the behaviors of the agents are neither fully rational nor i.i.d.; instead, they depend on the mechanism through a data-driven Markov model. For example, [He et al.2013] and [Tian et al.2014] assume that the agents' behavior change is Markovian, i.e., dependent on their historical behaviors and the signals received in previous time periods.

Please note that the assumption in the third type of works is more general and covers the other two types as special cases. According to [Fudenberg1998], Nash (and also dominant-strategy) equilibria in many games can be achieved by best-response behaviors, with which an agent determines his/her next action by maximizing his/her payoff based on the current action profile and mechanism. Such best-response behaviors are clearly Markovian. Furthermore, i.i.d. behaviors are special cases of Markov behaviors, in which the Markov transition probability reduces to a fixed distribution independent of the signals and the previous behaviors.
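To make the last reduction concrete, here is a minimal Python sketch (our own illustration, not from the paper) showing that an i.i.d. behavior model is just a Markov chain whose transition matrix has identical rows, so the distribution of the next behavior ignores the current one:

```python
import numpy as np

# An i.i.d. behavior model as a degenerate Markov chain: every row of the
# transition matrix is the same fixed distribution, so the next behavior's
# distribution is independent of the current behavior (and of any signal).
fixed_dist = np.array([0.5, 0.3, 0.2])   # distribution over 3 behaviors
P_iid = np.tile(fixed_dist, (3, 1))      # identical rows

# The stationary distribution of this chain is the fixed distribution itself.
print(np.allclose(fixed_dist @ P_iid, fixed_dist))  # True
```

Sampling from this chain row-by-row is indistinguishable from drawing i.i.d. samples from the fixed distribution.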

Based on the Markov assumption on agent behaviors, [He et al.2014] propose a new framework for mechanism optimization, called game-theoretic machine learning (GTML). The GTML framework involves a bi-level empirical risk minimization (ERM): it first learns a Markov model to characterize how agents change their behaviors, and then optimizes the mechanism by simulating agents' behavior changes in response to the mechanism based on the learned Markov model. The GTML framework has demonstrated promising empirical results; however, its generalization analysis is missing in the literature (in [Tian et al.2014], the authors studied only the generalization ability of behavior learning; furthermore, their definition of the behavior learning error differs from ours and cannot be applied to the generalization analysis of GTML). This is actually a very challenging task, because conventional machine learning assumes that data are generated i.i.d. from an unknown but fixed distribution [Devroye1996, Vidyasagar2003], whereas in GTML, agents' behavior data exhibit time dependency and may dynamically change in response to the mechanism. As a result, conventional generalization analysis techniques cannot be directly applied.

In this paper, we present a formal analysis of the generalization ability of GTML. Specifically, utilizing the stability property of the stationary distribution of a Markov chain [Mitrophanov2005], we decompose the generalization error for GTML into the behavior learning error and the mechanism learning error. The former relates to the process of learning a Markov behavior model from data, and the latter relates to the process of learning the optimal mechanism based on the learned behavior model. For the behavior learning error, we offer novel non-asymptotic error bounds for both parametric and non-parametric behavior learning methods: for the parametric method, we upper bound the behavior learning error by the parameter learning error; for the non-parametric method, we derive a new upper bound for the gap between the transition frequency and the transition probability of a Markov chain. After that, we apply the Hoeffding inequality for Markov chains to both upper bounds, and obtain error bounds for both the parametric and non-parametric behavior learning methods. For the mechanism learning error, we make use of a new concept called the nested covering number of the mechanism space. Specifically, we first partition the mechanism space into subspaces (i.e., a cover) according to the similarity between the stationary distributions of the data induced by the mechanisms. Within each subspace, the data distributions are similar, and therefore one can substitute the data sample associated with each mechanism by a common sample without affecting the expected risk by much. Second, for each mechanism subspace, we derive a uniform convergence bound based on its covering number [Anthony and Bartlett2009], using the generalization analysis techniques developed for mixing sequences. At the end of this paper, we apply our generalization analysis of GTML to sponsored search, and give a theoretical guarantee for GTML in this scenario.

To the best of our knowledge, this is the first work that performs a formal generalization analysis of GTML, and we believe the methodologies we use have general implications for the theoretical analysis of other complicated machine learning problems as well.

2 GTML Framework

In this section, we briefly introduce the game-theoretic machine learning (GTML) framework. For ease of reference, we summarize related notations in Table 1.

2.1 Mechanisms in Internet Applications

Internet applications such as sponsored search and crowdsourcing can be regarded as dynamic systems involving interactions between multiple parties, e.g., users, agents, and the platform. For example, in sponsored search, the search engine (platform) ranks and shows ads to users, and charges the advertisers (agents) if their ads are clicked by users, based on the relevance degrees of the ads and the bid prices reported by the advertisers. A similar multi-party relationship can also be found in crowdsourcing, where the platform corresponds to Mechanical Turk and the agents correspond to employers. While we can assume the behaviors of the users to be i.i.d., the behaviors of the agents usually are not. This is because agents usually have clear utilities in mind, and they may change their behaviors in order to maximize their utilities given their understanding of the mechanism used by the platform. As a result, the agents' behaviors might be dependent on the mechanism.

Mathematically, we denote the space of mechanisms as , and assume it to be bounded with distance . We denote the space of user need/feedback, the space of agent behaviors, and the space of the signals, as , , and respectively. We assume and are both finite, with size and , since the behaviors and signals are usually discrete and bounded. For a mechanism , at the -th time period, agents’ behavior profile is , and a user arrives at the system. The platform matches the agents to the user and charges them, according to mechanism . After that, the platform will provide some signals (e.g., the number of clicks on the ads ) to the agents as an indication of their performances. Since may be affected by agents’ behavior profile , mechanism , and user data , we denote where is a function generating the signals for agents. After observing , agents will change their behavior to to be better off in the future.
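The interaction loop described above can be sketched in a few lines of Python. All names and update rules below (`toy_mechanism`, `toy_signal`, `toy_behavior_update`) are illustrative assumptions of ours, not the paper's model:

```python
import numpy as np

# One toy round of the dynamic system: a user arrives, the mechanism matches
# agents to the user, a signal is emitted, and agents update their behaviors.
rng = np.random.default_rng(1)

def toy_mechanism(bids, user):
    """Rank agents by bid; the user 'clicks' the top agent with prob. user."""
    winner = int(np.argmax(bids))
    clicked = rng.random() < user
    return winner, clicked

def toy_signal(winner, clicked, n_agents):
    """Per-agent click counts serve as the signal."""
    s = np.zeros(n_agents)
    if clicked:
        s[winner] = 1.0
    return s

def toy_behavior_update(bids, signal):
    """Agents who got no click lower their bids slightly (a Markovian update:
    the next behavior depends only on the current behavior and the signal)."""
    return np.where(signal > 0, bids, np.maximum(bids - 0.1, 0.0))

bids = np.array([1.0, 0.8, 0.5])
for t in range(3):
    user = rng.random()                        # a user arrives at random
    winner, clicked = toy_mechanism(bids, user)
    signal = toy_signal(winner, clicked, len(bids))
    bids = toy_behavior_update(bids, signal)
print(bids)
```

The key point is the feedback loop: the mechanism influences the signals, which influence the behavior data that the mechanism is later evaluated on.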

2.2 Markov Agent Behavior Model

In order to describe how agents change their behaviors, the authors of [He et al.2013] and [Tian et al.2014] proposed a Markov behavior model. The key assumption made by the Markov model is that any agent has only a limited memory, and his/her behavior change depends only on his/her behaviors and signals in a finite number of previous time periods. To ease the discussion, and without losing too much generality, they assume the behavior model to be first-order Markovian. Formally, given the signal , the distribution of the agents' next behavior profile can be written as follows,

where is the transition probability matrix of the behavior profile, given the signals .

As mentioned in the introduction, the Markov behavior model is very general and can cover other types of behavior models studied in the literature, such as the best-response behaviors and the i.i.d. behaviors.

2.3 Bi-Level Empirical Risk Minimization

In [He et al.2014], a bi-level empirical risk minimization (ERM) algorithm is proposed to solve the GTML problem. The first-level ERM corresponds to behavior learning, i.e., learning the Markov behavior model (the transition probability matrices ) from training data containing signals and the corresponding behavior changes. The second-level ERM corresponds to mechanism learning, i.e., learning the mechanism with the minimum empirical risk, defined with both the behavior model learned at the first level and the training data containing users' needs/feedback.

For behavior learning, suppose we have samples of historical behaviors and signals . The goal is to learn the transition matrix from these data. In [He et al.2014] and [Tian et al.2014], both parametric and non-parametric approaches were adopted for behavior learning. With the parametric approach, one assumes the transition probability to take a certain mathematical form, e.g., , where denotes the inner product of two vectors, and the parameter is learned by maximum likelihood estimation. With the non-parametric approach, one directly estimates each entry by counting the frequency of the event out of the event given signal . No matter which approach is used, we denote the learned behavior model as for ease of reference.
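The non-parametric counting estimator can be sketched as follows (the array shapes and variable names are our own illustration, not the paper's notation):

```python
import numpy as np

# Estimate a signal-conditioned transition matrix by counting how often
# behavior b is followed by behavior b' under signal s, then normalizing
# each row into a conditional frequency.
def estimate_transition(behaviors, signals, n_behaviors, n_signals):
    counts = np.zeros((n_signals, n_behaviors, n_behaviors))
    for t in range(len(behaviors) - 1):
        counts[signals[t], behaviors[t], behaviors[t + 1]] += 1
    row_sums = counts.sum(axis=2, keepdims=True)
    # Leave unseen (signal, behavior) pairs as NaN instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.full_like(counts, np.nan),
                     where=row_sums > 0)

behaviors = np.array([0, 1, 1, 0, 1, 0, 0, 1])
signals   = np.array([0, 0, 1, 0, 1, 0, 1, 0])
P_hat = estimate_transition(behaviors, signals, n_behaviors=2, n_signals=2)
print(P_hat[0, 0])   # estimated next-behavior distribution given b=0, s=0
```

Each estimated row is a conditional frequency, which Theorem 3.3 below relates to the true conditional transition probability.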

For mechanism learning, suppose we have samples of user data and a Markov behavior model , learned as above. The goal is to learn an optimal mechanism that minimizes the empirical risk (e.g., minus the empirical revenue/social welfare) on the user data, denoted as , where . For this purpose, for an arbitrary mechanism , one generates samples of behavior data in a sequential manner using the Markov model and the samples of user data. With the samples of behavior data and user data, the empirical risk of mechanism can be computed. To improve the computational efficiency of mechanism learning, the authors of [He et al.2014] introduce a technique called -sample sharing. Specifically, given , in the optimization process, if the distance between a new mechanism and another mechanism whose behavior data have already been generated is smaller than (i.e., ), then one does not generate behavior data for anymore, but instead reuses the behavior data previously generated for mechanism . Therefore, we denote the sample for mechanism as , where is equal to itself or another mechanism satisfying . Consequently, the empirical risk of mechanism is defined as below,

By minimizing , one can obtain an empirically optimal mechanism:

While GTML and the bi-level ERM algorithm have demonstrated their practical success [He et al.2013], their theoretical properties are not yet well understood. In particular, given that GTML is more complicated than conventional machine learning (in GTML, the behavior data are time-dependent and mechanism-dependent), conventional generalization analysis techniques cannot be directly applied, and new methodologies need to be developed.
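The sample-sharing technique from Section 2.3 can be sketched as a simple cache. The scalar mechanism representation, the absolute-difference distance, and the threshold name `eps` below are illustrative assumptions of ours:

```python
# Before generating behavior data for a new mechanism, check whether an
# already-simulated mechanism lies within threshold `eps` of it; if so,
# reuse that mechanism's behavior sample instead of simulating afresh.
def get_or_share_sample(mechanism, cache, eps, simulate):
    for cached_mech, sample in cache:
        if abs(mechanism - cached_mech) < eps:   # close enough: share
            return sample
    sample = simulate(mechanism)                 # otherwise simulate afresh
    cache.append((mechanism, sample))
    return sample

calls = []
def simulate(mech):
    calls.append(mech)          # track how many expensive simulations run
    return [mech] * 3           # stand-in for generated behavior data

cache = []
s1 = get_or_share_sample(0.50, cache, eps=0.05, simulate=simulate)
s2 = get_or_share_sample(0.52, cache, eps=0.05, simulate=simulate)  # shared
s3 = get_or_share_sample(0.70, cache, eps=0.05, simulate=simulate)  # new
print(len(calls))   # 2 simulations serve 3 mechanisms
```

As Remark 2 in Section 3.2 discusses, this sharing turns out to matter for generalization, not just for efficiency.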

3 Generalization Analysis for GTML

In this section, we first give a formal definition of the generalization error of the bi-level ERM algorithm for GTML, and then discuss how to derive a meaningful upper bound for this generalization error. Finally, we apply our generalization analysis of GTML to sponsored search, and show that GTML in this scenario has good generalization ability.

According to [Tian et al.2014], for a behavior model (such as the true Markov behavior model and the model obtained by the behavior learning algorithm), under some mild conditions (e.g., is irreducible and aperiodic), the process is a uniformly ergodic Markov chain for an arbitrary mechanism . Then, given a mechanism and a behavior model , there exists a stationary distribution for , which we denote as . For simplicity, we assume the process is stationary (our results hold similarly without this assumption). We define the risk for each mechanism as the expected loss with respect to the stationary distribution of this mechanism under the true behavior model , i.e.,

The optimal mechanism minimizing this risk is denoted as , i.e.,

We consider the gap between the risk of the mechanism learned by the bi-level ERM algorithm and the risk of the optimal mechanism , i.e., . We call this gap the generalization error for the bi-level ERM algorithm, or simply the generalization error for GTML.

To ease the analysis, we utilize the stability property of the stationary distribution of uniformly ergodic Markov chains and decompose the generalization error for GTML into two parts, as shown in the following theorem. Due to space restrictions, we leave all proofs to the supplemental materials.

Theorem 3.1.

The generalization error of the bi-level ERM algorithm for GTML can be bounded as:

(1)

where is an upper bound for loss , and is a non-negative constant depending on .

For ease of reference, we call the first term on the right-hand side of inequality (1) the behavior learning error and the second term the mechanism learning error. We will derive upper bounds for both errors in the following subsections.

3.1 Error Bound for Behavior Learning

In this subsection, we derive error bounds for both parametric and non-parametric behavior learning methods. Since the behavior space and the signal space are both finite, it is shown in [Tian et al.2014] that forms a time-homogeneous Markov chain. Furthermore, under regular conditions, the Markov chain is uniformly ergodic, i.e., there exists such that the elements in the -step transition probability matrix of are all positive. For ease of reference, we denote the minimum element in this matrix as . Since the mechanism is fixed in the process of behavior learning, we omit all the superscripts in when no confusion arises. Please note that, in [Tian et al.2014], although the authors studied the generalization of behavior learning, their definition of the behavior learning error is different from ours and cannot be applied to the generalization analysis of GTML. Specifically, they measure the behavior learning error by the expected behavior prediction loss of the learned behavior model with respect to the stationary distribution under the true behavior model, while we measure it in a stricter way, by the infinity distance between the learned model and the true model.

Parametric Behavior Learning

With the parametric approach [He et al.2014], the transition probability is proportional to a truncated Gaussian function, i.e., , where is bounded. The parameter is obtained by maximizing the likelihood. We first bound the behavior learning error by the gap between the learned parameter and the parameter of the true model, utilizing the properties of the maximum likelihood method; we then apply the Hoeffding inequality for uniformly ergodic Markov chains [Glynn and Ormoneit2002] and finally obtain the error bound for the parametric behavior learning method, as shown in the following theorem.
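As an illustration of the parametric approach, the following sketch fits a one-parameter, Gaussian-shaped transition family by maximum likelihood over a grid. The functional form, the parameter grid, and the data-generating values are our own assumptions, not the paper's exact model:

```python
import numpy as np

# A toy Gaussian-shaped transition family over discrete behaviors:
# P(b' | b; theta) ∝ exp(-(b' - theta * b)^2 / 2), normalized over b'
# (normalization plays the role of truncation to the finite behavior space).
def transition_row(b, theta, n_behaviors):
    scores = np.exp(-0.5 * (np.arange(n_behaviors) - theta * b) ** 2)
    return scores / scores.sum()

def neg_log_likelihood(theta, pairs, n_behaviors):
    return -sum(np.log(transition_row(b, theta, n_behaviors)[b_next])
                for b, b_next in pairs)

# Simulate a behavior chain under a known parameter, then re-estimate it.
rng = np.random.default_rng(2)
true_theta, n_behaviors = 1.0, 5
pairs, b = [], 2
for _ in range(500):
    b_next = rng.choice(n_behaviors, p=transition_row(b, true_theta, n_behaviors))
    pairs.append((b, b_next))
    b = b_next

grid = np.linspace(0.5, 1.5, 101)
theta_hat = grid[np.argmin([neg_log_likelihood(t, pairs, n_behaviors)
                            for t in grid])]
print(theta_hat)
```

The gap between `theta_hat` and `true_theta` is the parameter learning error that, in the analysis above, upper bounds the behavior learning error.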

Theorem 3.2.

For any , we have, for ,

where are positive constants.

Non-parametric Behavior Learning

In non-parametric behavior learning, we estimate the transition probability by the conditional frequency of the event given that and , i.e.,

The difficulty in analyzing the error of the above estimation comes from the sum of random variables in the denominator of the conditional frequency. To tackle this challenge, we first derive an upper bound for the gap between the conditional transition frequency and the conditional transition probability that does not involve such a sum of random variables, and then apply the Hoeffding inequality for uniformly ergodic Markov chains [Glynn and Ormoneit2002] to this upper bound. In this way, we obtain a behavior learning error bound, as shown in the following theorem.

Theorem 3.3.

For any , we have for ,

where are positive constants.

3.2 Error Bound for Mechanism Learning

In this section, we bound the mechanism learning error by using a new concept called nested covering number for the mechanism space. We first give its definition, and then prove a uniform convergence bound for mechanism learning on its basis.

Nested Covering Number of Mechanism Space

The nested cover contains two layers of covers: the first-layer cover is defined for the entire mechanism space, based on the distance between the stationary distributions induced by the mechanisms; the second-layer cover is defined for each partition (subspace) obtained in the first layer, based on the distance between the losses of the mechanisms projected onto finite common data samples.

First, we construct the first-layer cover for the mechanism space . In mechanism learning, the learned Markov behavior model is used to generate the behavior data for different mechanisms. For simplicity, we denote the stationary distribution of the generated data as (or for short) and the set of stationary distributions for as . We define the (induced) total variation distance on as the total variation distance on , i.e., for , . For , the smallest -cover of w.r.t. the total variation distance is , where . That is, , where are the -balls of with respect to the (induced) total variation distance. We define the first-layer covering number as the cardinality of , denoted as . Based on , we can obtain a partition of , denoted as , where is an -partition of . When the mapping from a mechanism to its stationary distribution is uniformly Lipschitz continuous, we have : for , and belong to the same -partition of , so, considering that is bounded, we have .
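The first-layer cover can be illustrated with a greedy covering procedure over a toy family of stationary distributions; the distributions, the covering radius `eps`, and the greedy construction below are illustrative choices of ours:

```python
import numpy as np

# Greedily pick cover centers so that every mechanism's stationary
# distribution is within total variation distance eps of some center.
def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

def greedy_cover(dists, eps):
    centers, assignment = [], []
    for p in dists:
        for j, c in enumerate(centers):
            if tv_distance(p, c) <= eps:   # covered by an existing center
                assignment.append(j)
                break
        else:                              # no center is close: open one
            centers.append(p)
            assignment.append(len(centers) - 1)
    return centers, assignment

# Toy stationary distributions induced by 50 mechanisms indexed by theta.
thetas = np.linspace(0.0, 1.0, 50)
dists = [np.array([t, (1 - t) / 2, (1 - t) / 2]) for t in thetas]
centers, assignment = greedy_cover(dists, eps=0.1)
print(len(centers))   # covering number of this toy family
```

The number of centers returned plays the role of the first-layer covering number; mechanisms assigned to the same center form one cell of the induced partition.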

Second, we consider the loss functions for each mechanism subspace , and define its covering number w.r.t. the common samples , where and are generated by mechanism . Again, we define the second-layer cover as the smallest -cover of under the distance, i.e., , and define the second-layer covering number as its maximum cardinality with respect to the sample .

In summary, the nested covering numbers for a mechanism space are defined as follows:

Definition 3.4.

Suppose is a mechanism space. We define its nested covering numbers as .

Uniform Convergence Bound for Mechanism Learning

In this subsection, we derive a uniform convergence bound for the ERM algorithm for mechanism learning. We first relate the uniform convergence bound for the entire mechanism space to those for the subspaces constructed according to the first-layer cover. Then, considering that uniformly ergodic Markov chains are -mixing [Doob1990], we make use of the independent-block technique for mixing sequences [Yu1994] to transform the original problem, which is based on dependent samples, into one based on independent blocks. Finally, we apply the symmetrization technique and the Hoeffding inequality to obtain the desired bound.
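The independent-block construction of [Yu1994] used in this step can be sketched as follows; the block length and the toy sequence are illustrative:

```python
# Split a dependent sequence into alternating blocks of length a: for a
# beta-mixing chain, the odd blocks behave almost like independent draws,
# at the price of discarding the even blocks, which act as buffers that
# break the dependence between consecutive kept blocks.
def independent_blocks(sequence, a):
    blocks = [sequence[i:i + a] for i in range(0, len(sequence), a)]
    odd_blocks = blocks[0::2]     # kept: treated as (nearly) independent
    even_blocks = blocks[1::2]    # discarded buffers
    return odd_blocks, even_blocks

seq = list(range(12))
odd, even = independent_blocks(seq, a=3)
print(odd)    # [[0, 1, 2], [6, 7, 8]]
print(even)   # [[3, 4, 5], [9, 10, 11]]
```

The block length trades off against the mixing rate: longer buffers make the kept blocks closer to independent but leave fewer effective samples, which is where the mixing-rate assumption in Theorem 3.5 enters.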

Theorem 3.5.

Suppose that the mapping from to is uniformly Lipschitz continuous, and that the -mixing rate of the Markov chain (denoted as ) is algebraic (i.e., , where ). For any , we have

where denotes the ceiling function, and .

Remark 1: Due to space restrictions, we present only the bound for a specific mixing rate, which is simpler and easier to understand. Without the assumption on the mixing rate, we can obtain a similar bound, which can be found in Theorem C.1 in the supplemental materials.

Remark 2: Although we have to leave the proofs to the supplemental materials due to space restrictions, we would like to point out one particular discovery from our proofs. While the -sample sharing technique was originally proposed to improve efficiency, according to our proofs it also plays an important role in generalization ability. A natural question is then whether this technique is necessary for generalization. Our answer is yes if is infinite. Consider a special case in which and , i.e., the behavior model does not rely on the signals. If -sample sharing is not used, then for finite ,

This implies that mechanism learning without -sample sharing does not have generalization ability.

Remark 3: An assumption made in our analysis is that the mapping from to is uniformly Lipschitz continuous. However, this assumption might not always hold. In this case, we propose a modification to the original -sample sharing technique. The modification comes from the observation that the first-layer cover is constructed based on the total variation distance between the stationary distributions of mechanisms. Therefore, in order to ensure a meaningful cover, we can let two mechanisms share the same data sample if the estimates of their induced stationary distributions (instead of their parameters) are similar. Please refer to the supplemental materials for the details of this modification and a proof showing how it bypasses the discontinuity challenge. Note that the modified -sample sharing technique no longer has an efficiency advantage, since it involves generating behavior data for every mechanism examined during the training process; however, it ensures the generalization ability of the mechanism learning algorithm, which is desirable from the theoretical perspective.

3.3 The Total Error Bound

By combining Theorems 3.1, 3.2, 3.3, and 3.5, we obtain the total error bound for GTML, as shown in the following theorem.

Theorem 3.6.

Under the same assumptions as in Theorem 3.5, for the bi-level ERM algorithm in GTML and for any , we have the following generalization error bound (please refer to Theorem C.3 for the total error bound without the assumption on the mixing rate):

where , and .

From the above theorem, we have the following observations: 1) The error bound converges to zero as the sizes of the agent behavior data and the user data approach infinity. 2) The convergence rate w.r.t. is faster than that w.r.t. , indicating that one needs more user data than agent behavior data for training. 3) The mechanism space affects the generalization ability through both its first-layer covering number (which is finite) and its second-layer covering number.

3.4 Application to Sponsored Search Auctions

In this subsection, we apply our generalization analysis of GTML to sponsored search auctions. In sponsored search, GSP auctions with a query-dependent reserve price are widely used [Edelman, Ostrovsky, and Schwarz2005, Easley and Kleinberg2010, Medina and Mohri2014].

When a reserve price is used, the GSP auction runs in the following manner. First, the search engine ranks the ads according to their bid prices (here we follow the common practice of absorbing the click-through rate of an ad into its bid price, to ease the notation), and shows to the users those ads whose bid prices are higher than the reserve price. If the ad in the -th position (denoted as ) is clicked by a user, the search engine charges the corresponding advertiser the maximum of the bid price of and the reserve price . For the sake of simplicity and without loss of generality, we consider only two ad slots. Let the binary vector indicate whether and are clicked by users. Then the user data include two components, i.e., , where is the query issued by the user and records the user's click feedback. Denote the bid profile of the shown ads as (for simplicity we sometimes omit in the notation). We consider a query-dependent reserve price, i.e., the auction family is . For a mechanism , the revenue of the search engine can be represented as:

and the loss is .
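The two-slot GSP revenue just described can be sketched as follows; the function name and the convention of charging `max(next shown bid, r)` when no lower shown ad exists are our assumptions:

```python
# Two-slot GSP with reserve price r: an ad is shown only if its bid clears
# r; a clicked ad in position i pays the maximum of the next shown ad's bid
# and r (just r when no lower ad is shown).
def gsp_revenue_two_slots(bids, r, clicks):
    """bids: bid prices; clicks: (c1, c2) click indicators for the slots."""
    shown = [b for b in sorted(bids, reverse=True) if b >= r][:2]
    revenue = 0.0
    for i, clicked in enumerate(clicks[:len(shown)]):
        if clicked:
            next_bid = shown[i + 1] if i + 1 < len(shown) else 0.0
            revenue += max(next_bid, r)
    return revenue

# Three bidders, reserve 0.4; both shown ads are clicked.
print(gsp_revenue_two_slots([1.0, 0.7, 0.3], r=0.4, clicks=(1, 1)))  # 1.1
```

Here the first slot pays the second bid (0.7), and the second slot pays the reserve (0.4), since the third bid falls below it.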

Since the first-layer covering number is always finite and independent of the user data size (i.e., ), we just need to bound the second-layer covering number for the space of GSP auctions with reserve price, as shown below.

Theorem 3.7.

For GSP auctions with reserve price, the second-layer covering number can be bounded by the pseudo-dimension (P-dim) of the reserve price function class. Specifically, we have:

Combining Theorem 3.6 and Theorem 3.7, we obtain a total error bound for GTML applied to GSP auctions with reserve price, stated in the following corollary, which gives the first generalization guarantee for GTML in sponsored search.

Corollary 3.8.

Under the same assumptions as in Theorem 3.5, for any , for GTML applied to GSP auctions with reserve price, we have the following generalization error bound:

where , and .

4 Conclusion and Future Work

In this paper, we have given a formal generalization analysis of the game-theoretic machine learning (GTML) framework, which involves a bi-level ERM learning process (i.e., behavior learning and mechanism learning). The challenge of generalization analysis for GTML lies in the dependency of the behavior data on the mechanism. To tackle this challenge, we first bound the error of behavior learning by leveraging the Hoeffding inequality for Markov chains, and then introduce a new notion called the nested covering number and bound the error of mechanism learning on its basis. Our theoretical analysis not only enriches the understanding of machine learning algorithms in complicated dynamic systems with multi-party interactions, but also provides practical algorithmic guidance for mechanism design in these systems. As for future work, we would like to extend the idea of -sample sharing and apply it to improve the mechanisms in other real-world applications, such as mobile apps and social networks.

Notation Meaning
Spaces of user need/feedback, agent behaviors, and signals
mechanism space and the distance on it
at the -th time period, under mechanism , the user's need/feedback, the agents' behavior, and the signal
transition probability matrix of the agents' behavior under signal
the learned behavior model and the true behavior model
the learned mechanism and the optimal mechanism
loss function
a mechanism that is equal to or another mechanism satisfying
stationary distribution of the process with behavior model
empirical risk of mechanism with behavior model by -sample sharing technique
expected risk of mechanism with the true behavior model
(induced) total variation distance on mechanism space
covering number of mechanism space under distance and (induced) total variation distance
the -partition of mechanism space according to its first layer cover
the loss function class in each partition
covering number for the function class under distance
beta mixing rate of Markov chain
Table 1: Notations

References

  • [Anthony and Bartlett2009] Anthony, M., and Bartlett, P. L. 2009. Neural Network Learning: Theoretical Foundations. Cambridge University Press.
  • [Devroye1996] Devroye, L. 1996. A Probabilistic Theory of Pattern Recognition, volume 31. Springer.
  • [Doob1990] Doob, J. 1990. Stochastic processes. Wiley publications in statistics. Wiley.
  • [Easley and Kleinberg2010] Easley, D., and Kleinberg, J. 2010. Networks, crowds, and markets. Cambridge Univ Press.
  • [Edelman, Ostrovsky, and Schwarz2005] Edelman, B.; Ostrovsky, M.; and Schwarz, M. 2005. Internet advertising and the generalized second price auction: Selling billions of dollars worth of keywords. Technical report, National Bureau of Economic Research.
  • [Fudenberg1998] Fudenberg, D. 1998. The theory of learning in games, volume 2. MIT press.
  • [Glynn and Ormoneit2002] Glynn, P. W., and Ormoneit, D. 2002. Hoeffding’s inequality for uniformly ergodic markov chains. Statistics & probability letters 56(2):143–146.
  • [He et al.2013] He, D.; Chen, W.; Wang, L.; and Liu, T.-Y. 2013. A game-theoretic machine learning approach for revenue maximization in sponsored search. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.
  • [He et al.2014] He, D.; Chen, W.; Wang, L.; and Liu, T.-Y. 2014. A game-theoretic machine learning approach for revenue maximization in sponsored search. CoRR abs/1406.0728.
  • [Lahaie and Pennock2007] Lahaie, S., and Pennock, D. M. 2007. Revenue analysis of a family of ranking rules for keyword auctions. In Proceedings of the 8th ACM Conference on Electronic Commerce, EC ’07, 50–56. New York, NY, USA: ACM.
  • [Medina and Mohri2014] Medina, A. M., and Mohri, M. 2014. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the Thirty-First International Conference on Machine Learning, 262–270.
  • [Mitrophanov2005] Mitrophanov, A. Y. 2005. Sensitivity and convergence of uniformly ergodic markov chains. Journal of Applied Probability 1003–1014.
  • [Radlinski et al.2008] Radlinski, F.; Broder, A.; Ciccolo, P.; Gabrilovich, E.; Josifovski, V.; and Riedel, L. 2008. Optimizing relevance and revenue in ad search: a query substitution approach. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, 403–410. ACM.
  • [Tian et al.2014] Tian, F.; Li, H.; Chen, W.; Qin, T.; Chen, E.; and Liu, T.-Y. 2014. Agent behavior prediction and its generalization analysis. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.
  • [Vidyasagar2003] Vidyasagar, M. 2003. Learning and generalisation: with applications to neural networks. Springer.
  • [Yu1994] Yu, B. 1994. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability 94–116.
  • [Zhu et al.2009a] Zhu, Y.; Wang, G.; Yang, J.; Wang, D.; Yan, J.; and Chen, Z. 2009a. Revenue optimization with relevance constraint in sponsored search. In Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, 55–60. ACM.
  • [Zhu et al.2009b] Zhu, Y.; Wang, G.; Yang, J.; Wang, D.; Yan, J.; Hu, J.; and Chen, Z. 2009b. Optimizing search engine revenue in sponsored search. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 588–595. ACM.