Machine learning is often used in competitive scenarios: participants learn and fit static models, and those models compete on a shared platform. The common assumption is that in order to win a competition one has to have the best predictive model, i.e., the model with the smallest out-of-sample error. Is that necessarily true? Does the best theoretical predictive model for a target always yield the best reward in a competition? If not, can one take the best model and purposefully change it into a theoretically inferior model which in practice results in a higher competitive edge? What does that modification look like? And finally, if all participants modify their prediction models towards the best practical performance, who benefits the most: players with inferior models, or those with theoretical superiority? The main theme of this paper is to raise these important questions and propose a theoretical model to answer them. We consider a case study in which two linear predictive models compete over a shared target. The model with the closest estimate collects the whole reward, which is equal to the absolute value of the target. We characterize the reward function of each model and, using a basic game-theoretic approach, demonstrate that the inferior competitor can significantly improve his performance by choosing optimal model coefficients that are different from the best theoretical prediction. This is a preliminary study emphasizing the fact that in many applications where predictive machine learning is at the service of competition, much can be gained from practical (back-testing) optimization of the model compared to static prediction improvement.
Machine learning is often used in competitive scenarios: participants learn and fit static models, and those models compete on a shared platform. The common assumption is that in order to win a competition one has to have the best learning model, i.e., the model with the smallest out-of-sample error. Is that necessarily true? The main theme of this paper is to raise this important question and propose a theoretical model to answer it.
In competitive machine learning, every player has a predictive model for the target, and the closest, fastest, or most reliable model wins. What distinguishes this from static learning problems is that in a competition, targets and rewards are shared among all players, and the distribution of rewards is not necessarily the same as the distribution of the competitors' model merits. For instance, the rules may dictate that absolute winners take all or most of the rewards, while everyone else gets nothing [1, 2]. In other words, a learning model that is really good, but not quite the best, can earn as little as a very poor model. The best example of this is a one-time bidding game: the closest bid to the fair value of an asset wins the auction, and the reward is the underlying transaction [3] (which may or may not be an appealing reward after all). The merit of a prediction can also be assessed in the time domain rather than in the space of target values, or in some combination of both. This is, for example, the case in high-frequency electronic trading: the trader who has the fastest realization of a fair quote, and consequently the shortest reaction time, has the highest fill ratio and is able to secure a profit above the transaction cost [4]. Meanwhile, the slow trader is doomed to missed fills or adverse selection [5, 6]. The extent of competitive machine learning is vast. Bidding and auction models are used in various on-line platforms where big data and AI play important roles, including e-commerce, travel reservation, on-line advertising, and many more [7, 8]. Many marketing businesses revolve around competing over a shared pool of targeted customers' attention (reward) by designing smarter recommendation algorithms (prediction models) [9, 10]. Ride-sharing rivals use machine learning algorithms to predict surge times and provide recommendations or incentives to drivers for more rides and lower pickup times [11].
Machine learning is now an essential part of air traffic control, which is another example of an interactive platform of rewards (faster routes) and losses (delays) [12]. In fact, with the exception of applications in the natural sciences and bio-informatics, there is an element of competition in almost all practical machine learning problems.
Key Questions. Noting the essence of competitive machine learning, we now ask the previous question again. Does the best theoretical predictive model for a target always yield the best reward in a competition? If not, can one take the best model and purposefully change it into a theoretically inferior model which in practice results in a higher competitive edge? What does that modification look like? And finally, if all participants gear their prediction models towards the best practical performance, who benefits the most: players with inferior models, or those with theoretical superiority? These are important questions that, to the best of the authors' knowledge, have never been formally stated and studied.
Our Contributions. We provide a theoretical framework to answer the questions raised above. We consider a simple case where two participants, A and B, compete over a shared target. Each participant has a prediction model for the target value, and the player with the closest prediction collects the entire reward, which is equal to the absolute value of the target. The inferior player earns nothing. The predictions are based on underlying machine learning models that each player has independently built and trained. To make the study feasible, we focus on a simple case of linear prediction models: the target is the linear sum of $n$ underlying independent factors (features). Each player has access to a limited subset of those factors ($S_A$ and $S_B$, of sizes $n_A$ and $n_B$), and unlimited training data. We also make the assumption of a "linear knowledge" regime, which means the fractions $g_1 = n_A/n$, $g_2 = n_B/n$, and $g_{12} = |S_A \cap S_B|/n$ are all constants. Each player can obtain his best theoretical prediction for the target. We study this problem in a game-theoretic framework. A "strategy" for each player is defined as a linear prediction model using the available features with a particular set of coefficients. We show that this is a constant-sum game where the expected reward of every player can be fully characterized and computed for any pair of selected strategies. We focus on a class of strategies where the models' coefficients are "symmetric" with respect to both feature sets. We show that for symmetric pure strategies, player A's reward is $c\sqrt{n}$, where $c$ is a computable constant. When both players use their theoretical models, the reward is $c_0\sqrt{n}$ for some other constant $c_0$. We further demonstrate that each player can secure a max-min reward over all symmetric strategies. This allows us to prove that the guaranteed reward for player A is $\underline{c}\,\sqrt{n}$ for some other constant $\underline{c}$. We numerically demonstrate that in most cases $\underline{c} > c_0$ if A is the inferior player (i.e., $n_A < n_B$). In other words, the inferior player can gain additional reward from practical (e.g. back-testing or real-world) optimization of his model coefficients, and the gain $\underline{c}/c_0$ can be as high as 1.8. This is a fascinating, counterintuitive result. It basically means that the inferior player can purposefully tweak his theoretical prediction model into another model that performs up to 1.8 times better in competitive learning, despite having a higher mean-square prediction error. Additionally, by looking into the chosen coefficients, we observe that the model tweaking (on both shared and unique features) always consists of magnifying the coefficients, though we cannot mathematically prove this latter point at present.
Suppose there are two players A and B who are competing to predict a target value $Y$ and collect rewards as a result of their predictions. We model $Y$ as the linear sum of $n$ features $X_1, \dots, X_n$:

$$Y = \sum_{i=1}^{n} X_i, \qquad X_i \sim \mathcal{N}(0, 1), \tag{1}$$

where the $X_i$s are independent from each other. The players each make an estimation of $Y$ by using a subset of the features. In other words, A has access to a subset of features indexed in $S_A$, and B has access to another subset of features indexed in $S_B$. Without loss of generality, we assume that $n_A = |S_A| \le |S_B| = n_B$, so player A is at a theoretical disadvantage. Let us call the estimations of the two players $\hat{Y}_A$ and $\hat{Y}_B$. The game rules dictate that whoever has the closest estimation wins all of the reward, which in this case is equal to the absolute resource value $|Y|$. Therefore, for a particular instance, the rewards of the two players are as follows:

$$r_A = |Y|\,\mathbb{1}\{|e_A| < |e_B|\}, \qquad r_B = |Y|\,\mathbb{1}\{|e_B| < |e_A|\}, \tag{2}$$

where $e_A = \hat{Y}_A - Y$ and $e_B = \hat{Y}_B - Y$ are the prediction errors. The competition is only meaningful in a statistical sense, therefore the average rewards should be considered:

$$\bar{r}_A = \mathbb{E}[r_A], \qquad \bar{r}_B = \mathbb{E}[r_B]. \tag{3}$$

Note that based on this definition $\bar{r}_A + \bar{r}_B = \mathbb{E}|Y|$, therefore we are dealing with a constant-sum game. The theoretical best estimations for A and B are the maximum likelihood estimations:

$$\hat{Y}_A^{*} = \sum_{i \in S_A} X_i, \qquad \hat{Y}_B^{*} = \sum_{i \in S_B} X_i. \tag{4}$$

Assuming the players have had sufficiently large sets of training data, a linear regression will yield the above theoretical models. However, we consider the possibility that each player chooses a different prediction model. We limit the study to the set of all linear prediction models. A pair of strategies for A and B can be represented by a pair of coefficient vectors $\mathbf{a} = (a_i)_{i \in S_A}$ and $\mathbf{b} = (b_i)_{i \in S_B}$, where the corresponding prediction models are:

$$\hat{Y}_A = \sum_{i \in S_A} a_i X_i, \qquad \hat{Y}_B = \sum_{i \in S_B} b_i X_i. \tag{5}$$
We also consider the linear knowledge regime, where the following holds for some constants $g_1, g_2, g_{12}$:

$$n_A = g_1 n, \qquad n_B = g_2 n, \qquad |S_A \cap S_B| = g_{12}\, n. \tag{6}$$
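As a sanity check on this setup, the expected rewards (3) can be estimated by straightforward Monte Carlo simulation. The sketch below assumes the model above (unit target coefficients, standard-normal features); the function name and the particular subsets in the usage line are our own illustrative choices:

```python
import numpy as np

def simulate_rewards(n, S_A, S_B, a, b, trials=200_000, seed=0):
    """Monte Carlo estimate of the expected rewards (3).

    Assumes Y = sum_i X_i with X_i ~ N(0, 1) i.i.d.; a and b are the
    players' coefficient vectors over their feature subsets S_A, S_B.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((trials, n))
    Y = X.sum(axis=1)
    e_A = X[:, S_A] @ np.asarray(a) - Y        # prediction errors
    e_B = X[:, S_B] @ np.asarray(b) - Y
    win_A = np.abs(e_A) < np.abs(e_B)          # closest prediction takes all
    win_B = np.abs(e_B) < np.abs(e_A)
    return np.mean(np.abs(Y) * win_A), np.mean(np.abs(Y) * win_B)

# Theoretical models: unit coefficients on the observed features.
r_A, r_B = simulate_rewards(4, [0, 1], [0, 1, 2], [1.0, 1.0], [1.0, 1.0, 1.0])
```

Since ties have probability zero under continuous features, the two estimates sum to approximately $\mathbb{E}|Y| = \sqrt{2n/\pi}$, which makes the constant-sum property easy to verify numerically.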
Consider the following simple example consisting of four variables:

$$Y = X_1 + X_2 + X_3 + X_4. \tag{7}$$
For further simplicity, let us also assume that the $X_i$s can randomly take values in $\{0, 1\}$ with equal probabilities, instead of following a normal distribution. Player A has an inferior prediction model due to his lower number of available features. It is possible to list out the error values and rewards for all 16 possible combinations of the $X_i$s; Table 2.1 shows the combinations along with the corresponding target values.

| $x_1 x_2 x_3 x_4$ | $Y$ |
| --- | --- |
| 0000 | 0 |
| 0001 | 1 |
| 0010 | 1 |
| 0011 | 2 |
| 0100 | 1 |
| 0101 | 2 |
| 0110 | 2 |
| 0111 | 3 |
| 1000 | 1 |
| 1001 | 2 |
| 1010 | 2 |
| 1011 | 3 |
| 1100 | 2 |
| 1101 | 3 |
| 1110 | 3 |
| 1111 | 4 |
From there, the expected reward of A is computable for every setting of the coefficients, essentially by averaging A's reward column of Table 2.1. The total expected reward in this case is $\mathbb{E}|Y| = 2$. When both players use their respective theoretical models, A's expected reward is considerably lower than B's. For further simplicity, let us now assume that the players are only allowed to choose from strategies with equal coefficients, i.e., $a_i = a$ for all $i \in S_A$ and $b_i = b$ for all $i \in S_B$. If player A unilaterally chooses a prediction model different from the theoretical one ($a = 1$), then the reward profile looks as in Figure 1 for different values of $a$. The reward peak is very close to a fair split. To achieve this, A must use an exaggerated model with coefficients larger than the theoretical values. An intuitive justification is that by choosing larger coefficients, A's chance of conceding the smaller error value becomes lower; however, this also increases the (conditional) chance of winning tail events where target values are larger.
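The averaging over the 16 combinations can be reproduced by direct enumeration. In the sketch below, the feature subsets are our own illustrative choice (A observes $X_1$; B observes $X_2$ and $X_3$), and each player scales all of his coefficients by a single factor:

```python
from itertools import product

def expected_rewards(a, b, S_A=(0,), S_B=(1, 2)):
    """Exact expected rewards for the 4-feature example with X_i in {0, 1}.

    S_A and S_B are illustrative assumptions; a and b scale all of each
    player's coefficients (a = b = 1 gives the theoretical models).
    """
    r_A = r_B = 0.0
    for x in product((0, 1), repeat=4):      # all 16 combinations
        y = sum(x)                           # target value
        e_A = a * sum(x[i] for i in S_A) - y
        e_B = b * sum(x[i] for i in S_B) - y
        if abs(e_A) < abs(e_B):
            r_A += y / 16                    # winner takes |Y|
        elif abs(e_B) < abs(e_A):
            r_B += y / 16                    # ties pay neither player
    return r_A, r_B

base = expected_rewards(1, 1)        # both players theoretical
boosted = expected_rewards(2, 1)     # A magnifies his coefficients
```

Even with these assumed subsets, unilaterally magnifying A's coefficient increases A's expected reward, in line with the behavior described for Figure 1.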
Obviously, if B is smart he will not permit a significant loss and will try to adapt his strategy as well. From a game-theoretic perspective, there is a mixed-strategy Nash equilibrium [13], which in this case is the solution of the minmax problem over all mixed strategies [14]. However, our focus here is solely on pure strategies, and from that angle, A can at least guarantee the following maxmin value:
$$\underline{r}_A = \max_{a} \min_{b} \bar{r}_A(a, b). \tag{8}$$

The heat map of $\bar{r}_A(a, b)$ as a function of $(a, b)$ is illustrated in Figure 2. The optimal choice of $a$ leads to a guaranteed (maxmin) reward for player A which is more than 3.5 times what he can earn with the theoretical model.
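The maxmin value (8) can be approximated by a brute-force grid search over the scalar coefficients. A minimal self-contained sketch, reusing the 4-feature example with our assumed subsets (A observes $X_1$; B observes $X_2$ and $X_3$):

```python
from itertools import product

def reward_A(a, b, S_A=(0,), S_B=(1, 2)):
    """Player A's exact expected reward in the 4-feature {0,1} example
    (the feature subsets are illustrative assumptions)."""
    r = 0.0
    for x in product((0, 1), repeat=4):
        y = sum(x)
        e_A = a * sum(x[i] for i in S_A) - y
        e_B = b * sum(x[i] for i in S_B) - y
        if abs(e_A) < abs(e_B):
            r += y / 16
    return r

grid = [i / 20 for i in range(81)]   # candidate scalar coefficients in [0, 4]
# Player A's guaranteed (maxmin) reward over the grid of pure strategies (8):
maxmin = max(min(reward_A(a, b) for b in grid) for a in grid)
best_a = max(grid, key=lambda a: min(reward_A(a, b) for b in grid))
```

The guarantee of the theoretical model is `min(reward_A(1.0, b) for b in grid)`; comparing it against `maxmin` quantifies how much A's worst case improves by distorting the model.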
We now turn our attention back to the generic model of (1) and study it more rigorously. Our first result states that for every strategy pair the reward is computable.

Theorem 1. For every strategy pair $(\mathbf{a}, \mathbf{b})$:

$$\bar{r}_A(\mathbf{a}, \mathbf{b}) = \int_{\mathbb{R}^3} |y|\, \mathbb{1}\{|u| < |v|\}\, f(y, u, v)\, \mathrm{d}y\, \mathrm{d}u\, \mathrm{d}v, \tag{9}$$

where $f$ is the pdf of the 3d-normal distribution with mean zero and covariance $\Sigma$, the covariance matrix of $(Y, e_A, e_B)$, where:

$\sigma_{YY} = n$,
$\sigma_{Y e_A} = \sum_{i \in S_A} (a_i - 1) - (n - n_A)$,
$\sigma_{Y e_B} = \sum_{i \in S_B} (b_i - 1) - (n - n_B)$,
$\sigma_{e_A e_A} = \sum_{i \in S_A} (a_i - 1)^2 + (n - n_A)$,
$\sigma_{e_B e_B} = \sum_{i \in S_B} (b_i - 1)^2 + (n - n_B)$,
$\sigma_{e_A e_B} = \sum_{i \in S_A \cap S_B} (a_i - 1)(b_i - 1) - \sum_{i \in S_A \setminus S_B} (a_i - 1) - \sum_{i \in S_B \setminus S_A} (b_i - 1) + (n - |S_A \cup S_B|)$.
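The covariance of the vector $(Y, e_A, e_B)$ can be computed in closed form under the modeling assumptions ($Y = \sum_i X_i$ with i.i.d. standard-normal features, errors $e = \hat{Y} - Y$). A small sketch, with function name and error-sign convention of our own choosing:

```python
def cov_entries(n, S_A, S_B, a, b):
    """Closed-form covariance entries of (Y, e_A, e_B), assuming
    Y = sum_i X_i with X_i ~ N(0, 1) i.i.d. and e = Yhat - Y.

    Returns (s_YY, s_YA, s_YB, s_AA, s_BB, s_AB).
    """
    A, B = set(S_A), set(S_B)
    da = {i: a[k] - 1.0 for k, i in enumerate(S_A)}   # a_i - 1 on S_A
    db = {i: b[k] - 1.0 for k, i in enumerate(S_B)}   # b_i - 1 on S_B
    s_YY = float(n)
    s_YA = sum(da.values()) - (n - len(A))
    s_YB = sum(db.values()) - (n - len(B))
    s_AA = sum(v * v for v in da.values()) + (n - len(A))
    s_BB = sum(v * v for v in db.values()) + (n - len(B))
    s_AB = (sum(da[i] * db[i] for i in A & B)
            - sum(da[i] for i in A - B)
            - sum(db[i] for i in B - A)
            + (n - len(A | B)))
    return s_YY, s_YA, s_YB, s_AA, s_BB, s_AB
```

These entries can be cross-checked against `np.cov` of simulated $(Y, e_A, e_B)$ samples.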
The integral of (14) can be computed to within any arbitrary precision by using the method of sigma points based on physicists' Hermite polynomials [15]:

$$\int_{\mathbb{R}^3} h(\mathbf{z})\, f(\mathbf{z})\, \mathrm{d}\mathbf{z} \approx \pi^{-3/2} \sum_{j,k,l=1}^{m} w_j w_k w_l\, h\!\left(\sqrt{2}\, L\, (x_j, x_k, x_l)^{\top}\right), \tag{10}$$

where $L$ is a square root of the covariance ($\Sigma = L L^{\top}$), the $x_j$s are the roots of the Hermite polynomial $H_m$, and $w_j = \dfrac{2^{m-1}\, m!\, \sqrt{\pi}}{m^2\, [H_{m-1}(x_j)]^2}$.
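As a concrete one-dimensional illustration of the sigma-point idea, NumPy exposes the physicists' Gauss-Hermite nodes and weights directly; the helper function below is our own:

```python
import numpy as np

def gh_expectation(h, sigma, m=64):
    """Approximate E[h(Z)] for Z ~ N(0, sigma^2) with m Gauss-Hermite
    sigma points (physicists' convention, weight exp(-x^2)).

    The change of variables z = sqrt(2) * sigma * x absorbs the
    Gaussian density into the Hermite weight function.
    """
    x, w = np.polynomial.hermite.hermgauss(m)   # roots of H_m and weights
    return (w @ h(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)

# Y ~ N(0, n) with n = 16: E[Y^2] = 16 exactly; E|Y| = sqrt(2 * 16 / pi).
mean_abs = gh_expectation(np.abs, 4.0)
second_moment = gh_expectation(np.square, 4.0)
```

The rule is exact for polynomials of degree up to $2m - 1$, so the second moment is reproduced to machine precision, while the non-smooth $|z|$ (the kind of integrand appearing in (9) and (14)) converges more slowly in $m$.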
Using the premises of elementary game theory, one can easily deduce that player A can secure the following reward:

$$\underline{r}_A = \max_{\mathbf{a}} \min_{\mathbf{b}} \bar{r}_A(\mathbf{a}, \mathbf{b}). \tag{11}$$

However, despite the computability of the reward function, any optimization over $(\mathbf{a}, \mathbf{b})$ involves $n_A + n_B$ variables. Therefore any attempt to derive explicit or computable relationships for $\underline{r}_A$ is hopeless. To facilitate this computation, we need the following definition:
Definition. A strategy $\mathbf{a}$ for player A is called model-symmetric (or symmetric in brief) if

$$a_i = \begin{cases} \alpha_1 & i \in S_A \cap S_B \\ \alpha_2 & i \in S_A \setminus S_B \end{cases} \tag{12}$$

for some real scalars $\alpha_1, \alpha_2$. We use the notation $\mathbf{a} = (\alpha_1, \alpha_2)$, and denote the set of all model-symmetric strategies for player A with $\mathcal{A}$. A strategy $\mathbf{b}$ for player B is called model-symmetric if

$$b_i = \begin{cases} \beta_1 & i \in S_A \cap S_B \\ \beta_2 & i \in S_B \setminus S_A \end{cases} \tag{13}$$

for some real scalars $\beta_1, \beta_2$. We use the notation $\mathbf{b} = (\beta_1, \beta_2)$, and denote the set of all model-symmetric strategies for player B with $\mathcal{B}$. We also define the symmetric max-min value $\underline{r}_A^{\,s} = \max_{\mathbf{a} \in \mathcal{A}} \min_{\mathbf{b} \in \mathcal{B}} \bar{r}_A(\mathbf{a}, \mathbf{b})$.
The next corollary states that for model-symmetric strategies the reward is $\sqrt{n}\, c$, where $c$ is independent of $n$ and only depends on the parameters of the linear knowledge regime. It follows immediately from Theorem 1 and the definition of symmetric strategies.

Corollary 1.1. If $\mathbf{a} = (\alpha_1, \alpha_2)$ and $\mathbf{b} = (\beta_1, \beta_2)$ are model-symmetric strategies, then $\bar{r}_A(\mathbf{a}, \mathbf{b}) = \sqrt{n}\, c$, where:

$$c = \int_{\mathbb{R}^3} |y|\, \mathbb{1}\{|u| < |v|\}\, \tilde{f}(y, u, v)\, \mathrm{d}y\, \mathrm{d}u\, \mathrm{d}v, \tag{14}$$

where $\tilde{f}$ is the pdf of the 3d-normal distribution with mean zero and the normalized covariance $\tilde{\Sigma} = \Sigma / n$, and:

$\tilde{\sigma}_{YY} = 1$,
$\tilde{\sigma}_{Y e_A} = g_{12}(\alpha_1 - 1) + (g_1 - g_{12})(\alpha_2 - 1) - (1 - g_1)$,
$\tilde{\sigma}_{Y e_B} = g_{12}(\beta_1 - 1) + (g_2 - g_{12})(\beta_2 - 1) - (1 - g_2)$,
$\tilde{\sigma}_{e_A e_A} = g_{12}(\alpha_1 - 1)^2 + (g_1 - g_{12})(\alpha_2 - 1)^2 + (1 - g_1)$,
$\tilde{\sigma}_{e_B e_B} = g_{12}(\beta_1 - 1)^2 + (g_2 - g_{12})(\beta_2 - 1)^2 + (1 - g_2)$,
$\tilde{\sigma}_{e_A e_B} = g_{12}(\alpha_1 - 1)(\beta_1 - 1) - (g_1 - g_{12})(\alpha_2 - 1) - (g_2 - g_{12})(\beta_2 - 1) + (1 - g_1 - g_2 + g_{12})$.
Theorem 2. If $\mathbf{a}$ is a model-symmetric strategy for player A, then $\min_{\mathbf{b}} \bar{r}_A(\mathbf{a}, \mathbf{b})$ is attained by a model-symmetric $\mathbf{b}$.

Proof: We only need to show that for a symmetric $\mathbf{a}$, the minimizer of $\bar{r}_A(\mathbf{a}, \cdot)$ must be symmetric too. If $i$ and $j$ are two indices in $S_A \cap S_B$, then any change in the values of $b_i$ and $b_j$ only affects $\sigma_{Y e_B}$, $\sigma_{e_B e_B}$, and $\sigma_{e_A e_B}$ (recall the notation of the model covariances in Theorem 1). Let $\mathbf{b}^*$ be the minimizer of $\bar{r}_A(\mathbf{a}, \cdot)$, and let $\sigma^*_{e_B e_B}$ and $\sigma^*_{e_A e_B}$ be the resulting values of $\sigma_{e_B e_B}$ and $\sigma_{e_A e_B}$. Define a new vector $\mathbf{b}'$ which is equal to $\mathbf{b}^*$ in every coordinate except $i$ and $j$, where $b'_i = b'_j = (b^*_i + b^*_j)/2$. Since the averaging preserves $b_i + b_j$, and since $\mathbf{a}$ takes the single value $\alpha_1$ on the shared features, we can conclude that:

$$\sigma'_{Y e_B} = \sigma^*_{Y e_B}, \qquad \sigma'_{e_A e_B} = \sigma^*_{e_A e_B}, \tag{15}$$
$$\sigma'_{e_B e_B} \le \sigma^*_{e_B e_B}. \tag{16}$$

This means that $\mathbf{b}'$ results in a model for B which has the same covariance with the target, but smaller or equal error variance, while everything else remains the same. Such a model is equal or superior for player B. Therefore $\bar{r}_A(\mathbf{a}, \mathbf{b}')$ gets smaller or stays equal, which contradicts the choice of $\mathbf{b}^*$ unless $b^*_i = b^*_j$. Similarly, for two indices $i, j$ in $S_B \setminus S_A$, a similar transformation of $\mathbf{b}^*$ results in a model with smaller or equal $\sigma_{e_B e_B}$ and identical $\sigma_{Y e_B}$. In this case, $\sigma_{e_A e_B}$ depends on $b_i$ and $b_j$ only through the sum $b_i + b_j$, so it will remain the same as well. Thus a similar argument holds, and the values of the $i$ and $j$ coordinates in $\mathbf{b}^*$ are identical.

Based on Theorem 2, we can use a computational optimization over four variables $(\alpha_1, \alpha_2, \beta_1, \beta_2)$ to obtain a lower bound for the guaranteed rate of the inferior player A. For simplicity, let us define:
$$c_0 = \frac{\bar{r}_A(\mathbf{a}^*, \mathbf{b}^*)}{\sqrt{n}}, \qquad \underline{c} = \frac{1}{\sqrt{n}} \max_{\mathbf{a} \in \mathcal{A}} \min_{\mathbf{b} \in \mathcal{B}} \bar{r}_A(\mathbf{a}, \mathbf{b}), \tag{17}$$

where $\mathbf{a}^*$ and $\mathbf{b}^*$ denote the theoretical models (all coefficients equal to one).
$c_0$ is the (normalized) reward of player A when both players use their theoretical prediction models, while $\underline{c}$ is the conservative guaranteed reward if player A chooses an optimally distorted model. The ratio $\underline{c}/c_0$ is therefore the game-theoretic notion of "gain" that player A can achieve in this competition by optimizing his model's coefficients. Based on Theorem 2 and Corollary 1.1, a lower bound for this gain is completely computable for every regime vector $\mathbf{g} = (g_1, g_2, g_{12})$. We have computed and will present this for the special case where $g_{12} = g_1 g_2$. This case presents a "typical" subsets intersection for large $n$ if $S_A$ and $S_B$ are selected independently and uniformly. The 2d plots of $c_0$ and the gain $\underline{c}/c_0$ as functions of $(g_1, g_2)$ are depicted in Figures 2(a) and 2(b), respectively. Note that the gain can be as large as 1.8 for player A. In general, the more inferior A is (the smaller $g_1/g_2$), the larger the gain.
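The qualitative effect is easy to reproduce by simulation. The sketch below fixes one regime of our own choosing ($n = 40$, $g_1 = 0.3$, $g_2 = 0.5$, $g_{12} = 0.15$) and compares A's reward under the theoretical models against the best unilateral magnification of A's symmetric coefficients; note this is a weaker, unilateral notion than the maxmin gain $\underline{c}/c_0$:

```python
import numpy as np

def sym_rewards(alpha, beta, n=40, n_sh=6, n_a=6, n_b=14,
                trials=400_000, seed=1):
    """Monte Carlo rewards under model-symmetric strategies.

    Features 0..n_sh-1 are shared, the next n_a are A-only, the next n_b
    are B-only; the remainder is observed by neither player. alpha and
    beta are the (shared, unique) coefficient pairs for A and B. The
    defaults encode the illustrative regime g = (0.3, 0.5, 0.15).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((trials, n))
    Y = X.sum(axis=1)
    shared = X[:, :n_sh].sum(axis=1)
    e_A = alpha[0] * shared + alpha[1] * X[:, n_sh:n_sh + n_a].sum(axis=1) - Y
    e_B = beta[0] * shared + beta[1] * X[:, n_sh + n_a:n_sh + n_a + n_b].sum(axis=1) - Y
    r_A = np.mean(np.abs(Y) * (np.abs(e_A) < np.abs(e_B)))
    r_B = np.mean(np.abs(Y) * (np.abs(e_B) < np.abs(e_A)))
    return r_A, r_B

r0, rB0 = sym_rewards((1.0, 1.0), (1.0, 1.0))   # both theoretical
# Best unilateral magnification of A's coefficients over a small grid:
r_best = max(sym_rewards((t, t), (1.0, 1.0))[0]
             for t in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0))
```

Because the same random draws are reused across coefficient settings, the comparison between `r0` and `r_best` is a paired one, which keeps the Monte Carlo noise from obscuring small differences.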
We introduced a game-theoretic framework for studying the performance of predictive linear models in competitions. We focused on a two-player competition, and demonstrated that if both players put aside the theoretical predictions of the target and optimize their corresponding models based on the practical reward, then the player with the inferior prediction gains extra rewards. The gain can be significant depending on the players' knowledge regimes. This underlines the value of model optimization beyond theoretical prediction. A more rigorous future study shall consider the following further directions: 1) devising an analytical study of the trade-off between the cost of feature (knowledge) expansion and reward enhancement; 2) obtaining theoretical characterizations of the optimal coefficients for each player; 3) studying mixed strategies and the min-max equilibrium, which leads to suggestions for model combination (e.g., alpha combination in trading and investment); 4) studying the problem in a dynamic setup where players adaptively optimize their models, for which one can potentially characterize the convergence of the system and propose efficient learning/optimization algorithms; and 5) studying a multi-player game where feature access has a particular distribution among a large number of players. Understanding the optimization gain of players as a function of where they stand in the spectrum of knowledge is a challenging and important practical problem that conforms to asymptotic cases of real-world competitive applications.
References

Y. Kim, W. N. Street, G. J. Russell, and F. Menczer, "Customer targeting: A neural network approach guided by genetic algorithms," Management Science, vol. 51, no. 2, pp. 264–276, 2005.

Stecha and V. Havlena, "Unscented Kalman filter revisited—Hermite-Gauss quadrature approach," in Proc. 15th International Conference on Information Fusion (FUSION). IEEE, 2012, pp. 495–502.