Game of Coins

We formalize the current practice of strategic mining in multi-cryptocurrency markets as a game, and prove that any better-response learning in such games converges to equilibrium. We then offer a reward design scheme that moves the system configuration from any initial equilibrium to a desired one for any better-response learning of the miners. Our work introduces the first multi-coin strategic attack for adaptive and learning miners, as well as the study of reward design in a multi-agent system of learning agents.



1 Introduction

Cryptocurrencies are an arms race. Hundreds of digital coins have crept into the worldwide market in the last decade MarketState , including more than a dozen with over a billion dollar Market Cap, e.g., buterin2014next ; BitcoinCash ; Litecoin ; Cardano ; Neo . The vast majority of cryptocurrencies are based on the notion of proof of work (PoW) Bitcoin . As a result, the major strategic players in the context of cryptocurrencies are miners who devote their power to solving computational puzzles to find PoWs Bitcoin ; buterin2014next .

The miners for a particular coin usually gain rewards that are proportional to the power they invest in the coin out of the total invested power (in the coin) by all miners. Each coin, therefore, can be viewed as having some weight that reflects the reward it divides among its miners. In practice, a coin’s weight (or reward) depends on its transaction rate, transaction fees, and its fiat exchange rate.

While the above description is not complete, it does capture the fundamental decision faced by the miner: where should I mine? One indication for reward-based coin switching can be found online in websites like whattomine , where miners enter their mining parameters (technology, power, cost, etc.) and get a list of coins they can mine for, ordered by their profitability. Another interesting example happened on November 12, 2017 bitinfocharts , when a dramatic change in the Bitcoin to Bitcoin Cash BitcoinCash (a spin-off from Bitcoin) exchange rate led to a major inrush of miners from Bitcoin to Bitcoin Cash (see Figure 1).

(a) Bitcoin and Bitcoin Cash exchange rates over time.
(b) Hashrate corresponds to the number of miners.
Figure 1: Miners move from Bitcoin to Bitcoin Cash.

All in all, the structure of the cryptocurrency market suggests that we face here a game among miners, where each miner wishes to mine coins of heavy weights while avoiding competition with other miners. In this paper we introduce for the first time the study of the cryptocurrency market as a game, consisting of a set of strategic players (miners) with possibly different mining powers and a set of coins with possibly different rewards (weights). The miners are free to choose to mine for any coin from the set, and we consider general better-response learning of the miners. That is, whenever any miner may benefit from deviating (i.e., changing the coin it mines for), some miner will take a step that improves his payoff; we allow an arbitrary sequence of such individual improvement steps (sometimes called improving path MondererShapley96 ). In our first major result we prove that any such better response learning converges to a (pure) equilibrium regardless of miner powers and coin rewards! This result is obtained by showing an ordinal potential, which according to MondererShapley96 , implies that arbitrary better response learning converges to equilibrium.

Having at hand the above fundamental result, we move to a discussion of strategic manipulation Nisan:2007:AGT:1296179 . While many efforts have been invested in the study of crypto-related manipulations SelfishMining ; OptimalSelfishMining ; StubbornMining ; MinersDilemma , we introduce for the first time the manipulation of the miners’ learning and optimization process. Given that a shift in the weight of a coin may influence miner behavior whattomine , in the cryptocurrency setting, it is quite possible for an interested party to affect this weight, either by creating additional transactions with high fees (sometimes called whale transactions liao2017incentivizing ) or by manipulating the coin exchange rate gandal2018price ; priceMuniplautionm ; whalesMuniplaution ; BitcoinMuniplaution . This way, a miner (or another interested party) can attempt to change the system equilibrium to a better one for them. We show that under broad circumstances, for every equilibrium of such a game, there exists a miner and another equilibrium in which the miner’s payoff is higher. The question is therefore: can one design rewards (i.e., temporarily increase coin weights) in a way that will lead the system from a given equilibrium to a desired one, so that the system will remain in the desired equilibrium after reverting to the original weights? Note that such reward design allows the manipulator to pay a finite cost while gaining an advantage indefinitely.

The above reward design problem is challenging since miners might take any better response step, and may make their moves in any order. Given the (modified) weights, we can use our previous major result to claim that any better response learning will converge to an equilibrium. Notice that the latter may not be the desired one, but now we can modify the rewards again. In the second major result of this paper we show that such desired reward design for learning agents is feasible! Namely, we provide a (multi-step) algorithm for assigning rewards in equilibrium states that moves learning agents from any initial equilibrium to a desired one.

In summary, our contributions are as follows:

  1. We formalize strategic mining in multi-cryptocurrency markets as a game (Section 2).

  2. We prove that any better-response learning in such games, starting from an arbitrary configuration, converges to equilibrium (Section 3).

  3. We show that, in many cases, for every equilibrium there is a miner and another equilibrium in which the miner’s payoff is higher (Section 4).

  4. We offer a reward design scheme that moves the system configuration from any initial equilibrium to a desired one for any better-response learning of the miners (Section 5).

Due to space limitations, the proofs of some of the claims we state here are deferred to Appendices C to F.

1.1 Related work

Results on better response learning convergence to pure equilibrium are rare and are typically restricted to games with exact potential NIPS2017_7216 ; MondererShapley96 , which coincide with congestion games. We show that our game does not have an exact potential (Section 3), and in fact our game belongs to the larger class of ID congestion games, where the payoff of a player depends on the player and the identity of other players who choose a similar resource, rather than on their number only. While there exist extensions of congestion games in which better-response learning converges to equilibrium (e.g., a restricted form of player-specific congestion games Milchtaich96 , which does not include our game), such results are extremely rare in the context of ID congestion games.

Unlike works on learning in games that emphasize adapting specific machine learning algorithms to minimize regret NIPS2017_7216 ; PalaiopanosPP17 ; NIPS2015_5763 ; learninggamesbook , we assume minimal rationality on behalf of the players, i.e., that they follow an arbitrary better response step improving their individual payoffs.

Our work also expands literature on reward design NIPS2010_4146 ; NIPS2017_7253 ; SorgSL10 , and to the best of our knowledge, is the first to introduce reward design for learning agents in a multi-agent setting. While seminal works in reward design assign/modify state rewards in a reinforcement learning context DBLP:books/lib/SuttonB98 , we design rewards for equilibrium states for any better response learning.

Though several previous works presented game theoretical analyses for cryptocurrencies liao2017incentivizing ; SelfishMining ; OptimalSelfishMining ; StubbornMining ; carlsten2016instability ; MiningGames ; Johnson14 ; MinersDilemma ; Schrijvers16 , the vast majority of them deal (in one way or another) with miners’ incentives to follow the coins’ mining protocols. Our work is the first to extend the study to a multi-coin setting and establish fundamental game theoretical results therein.

2 Model

A system in our model is a tuple , where is a finite set of miners (players) and is a finite set of coins (resources). A miner has mining power , which it can invest in one of the coins , i.e., the set of possible actions of is . We denote the set of configurations of a system as and denote by the action of player in configuration . When clear from the context, we omit the subscript indicating the system and simply write . Given and , we denote by the set of miners who mine for in , i.e., , and by their total mining power, i.e., . For we denote by the configuration that is identical to except that is replaced by .

A reward function maps coins to rewards. A game consists of a system and a reward function . Every coin in a game divides its reward among all the players that mine for it, and the miners’ payoffs are defined as follows: For , the revenue per unit (RPU) of coin in is . When clear from the context, we omit the parameter indicating the game. The payoff function of a miner is .
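To make the definitions concrete, the following is a minimal sketch of the payoff model, assuming (as the prose above describes) that a coin's RPU is its reward divided by the total mining power invested in it, and that a miner's payoff is its mining power times the RPU of its coin. The miner names, coin names, numbers, and function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical toy instance; names and numbers are illustrative only.
power = {"p1": Fraction(3), "p2": Fraction(2), "p3": Fraction(1)}  # mining power per miner
reward = {"c1": Fraction(6), "c2": Fraction(3)}                    # reward per coin
config = {"p1": "c1", "p2": "c1", "p3": "c2"}                      # coin chosen by each miner


def total_power(config, power, coin):
    """Total mining power invested in `coin` under `config`."""
    return sum(power[p] for p, c in config.items() if c == coin)


def rpu(config, power, reward, coin):
    """Revenue per unit of `coin`; only called here on coins with at least one miner."""
    return reward[coin] / total_power(config, power, coin)


def payoff(config, power, reward, miner):
    """A miner's payoff: its mining power times the RPU of the coin it mines."""
    return power[miner] * rpu(config, power, reward, config[miner])


payoffs = {p: payoff(config, power, reward, p) for p in power}
```

In this instance both coins are occupied, so each coin's reward is split entirely among its miners and the payoffs add up to the total reward.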

Given a game , a configuration , a miner , and a coin , we say that moves from to in if it changes its action from to . A move from to is a better response step for if . We say that a miner is stable in a configuration in game if has no better response steps in . A configuration is stable or a (pure) equilibrium if every miner is stable in . A better response learning from in is a sequence of configurations resulting from a sequence of better response steps starting from , which is either infinite or ends with a stable configuration. In case it is finite, we say that it converges to its final configuration.
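The definitions of better response steps, stability, and better response learning can be sketched as a small simulation. This is only an illustration under assumptions: the scheduler below picks a uniformly random better response step, which is one arbitrary instance of the "any order, any improving move" behavior the model allows, and all names and numbers are hypothetical.

```python
import random
from fractions import Fraction


def payoff_if(config, power, reward, miner, coin):
    """Payoff `miner` would get from mining `coin` while all other miners stay put."""
    others = sum(power[p] for p, c in config.items() if c == coin and p != miner)
    return power[miner] * reward[coin] / (others + power[miner])


def better_responses(config, power, reward):
    """All (miner, coin) moves that strictly increase the moving miner's payoff."""
    moves = []
    for miner, current in config.items():
        now = payoff_if(config, power, reward, miner, current)
        for coin in reward:
            if coin != current and payoff_if(config, power, reward, miner, coin) > now:
                moves.append((miner, coin))
    return moves


def better_response_learning(config, power, reward, rng):
    """Apply randomly chosen better response steps until the configuration is stable."""
    config = dict(config)
    steps = 0
    while True:
        moves = better_responses(config, power, reward)
        if not moves:               # no miner can improve: stable configuration
            return config, steps
        miner, coin = rng.choice(moves)
        config[miner] = coin
        steps += 1


# Hypothetical instance: four miners all start on the least rewarding coin.
power = {"p1": Fraction(5), "p2": Fraction(3), "p3": Fraction(2), "p4": Fraction(1)}
reward = {"c1": Fraction(7), "c2": Fraction(4), "c3": Fraction(2)}
start = {p: "c3" for p in power}
final, steps = better_response_learning(start, power, reward, random.Random(0))
```

Different seeds may end in different stable configurations; the sketch only relies on the loop stopping exactly when no better response step exists.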

A function is an ordinal potential for a game if for any two configurations s.t. some better response step of a miner leads from to , it holds that . If, in addition, , then is an exact potential. By MondererShapley96 , if a game has an ordinal potential, then every better response learning converges.

3 Better response learning convergence

In this section we prove that although a game has no exact potential, every better-response learning of the miners in game converges to a stable configuration (pure equilibrium) regardless of the sets and and the reward function . To gain intuition, the reader is referred to Appendices A and B, where we show how to construct a particular equilibrium in a game for any and , and give a simple ordinal potential function for the symmetric case in which is a constant function i.e., , , respectively.

No exact potential.

We start by showing that our game does not have an exact potential.

Proposition 1.

The game does not always have an exact potential.


Let be a game where , , , and . Assume by way of contradiction that has an exact potential function , and consider the following four configurations:

  • . Payoffs: , .

  • . Payoffs: , .

  • . Payoffs: , .

  • . Payoffs: , .

Note that . However, by definition of exact potential, we get that . A contradiction.
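The style of argument can be checked numerically. In a game with an exact potential, the deviating miner's payoff changes must sum to zero around any closed cycle of unilateral moves. The sketch below evaluates that sum on a four-configuration cycle for a hypothetical two-miner, two-coin instance; the powers and rewards are assumptions chosen for illustration and are not necessarily those of the proof above.

```python
from fractions import Fraction

# Hypothetical instance: unequal mining powers, equal coin rewards.
power = {"p1": Fraction(2), "p2": Fraction(1)}
reward = {"c1": Fraction(1), "c2": Fraction(1)}


def payoff(config, miner):
    """Mining power times the RPU of the coin the miner mines."""
    coin = config[miner]
    total = sum(power[p] for p, c in config.items() if c == coin)
    return power[miner] * reward[coin] / total


# A closed cycle of unilateral moves; movers[i] is the miner who moves on edge i.
cycle = [
    {"p1": "c1", "p2": "c1"},
    {"p1": "c1", "p2": "c2"},  # p2 moved
    {"p1": "c2", "p2": "c2"},  # p1 moved
    {"p1": "c2", "p2": "c1"},  # p2 moved; the last edge (p1 moves) returns to cycle[0]
]
movers = ["p2", "p1", "p2", "p1"]

cycle_sum = Fraction(0)
for i, mover in enumerate(movers):
    before, after = cycle[i], cycle[(i + 1) % len(cycle)]
    cycle_sum += payoff(after, mover) - payoff(before, mover)

print(cycle_sum)  # 2/3
```

Since the sum is nonzero, no exact potential can exist for this instance. With equal mining powers the same cycle sums to zero, so the asymmetry between miners is what breaks exactness here.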

Ordinal potential.

To show an ordinal potential, we use the following definitions:

For a configuration in a game , we define to be the sequence of pairs in ordered lexicographically from smallest to largest. Denote by the coin (second element of the pair) in the entry in . Consider the ordered set , where is the set of all possible lists in , and is the lexicographical order. The rank of a list , , is the rank of in from smallest to largest.

Note that since and are finite, we know that and are finite. The following two observations establish a connection between better response steps and the of the associated coins.

Observation 1.

Consider a game , , , and s.t. . Then in every better response step of that changes to a coin , it holds that .

Observation 2.

Consider a game . If some better response step from configuration to configuration of a miner changes to , then .

We are now ready to prove that any game has an ordinal potential function.

Theorem 1.

For any finite sets and of miners and coins and reward function , is an ordinal potential in the game .


Consider two configurations s.t. some better response step of a miner leads from configuration to configuration , and let and . We need to show that . Since only the RPUs of and are affected we get that


By Observation 1, we get that , and thus . By Observation 2, we get that , and thus, together with the definition of and Equation 1, we get that . Therefore, none of them “move down” to a position before in , and so


That is, the first elements of are equal to the first elements of . Hence, it suffices to show that the element of is lexicographically larger than the element of . Let . From Equation 2, we know that , so there are two possible cases:

  • First, . The theorem follows from Observation 2.

  • Second, . In this case,

    as needed.
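The statement of Theorem 1 can be sanity-checked by brute force on a small instance. The sketch below enumerates every configuration and every better response step and verifies that the sorted list of (RPU, coin) pairs strictly increases in lexicographic order; since the rank is monotone in that order, this is equivalent to the rank increasing. It rests on two assumptions not taken from the text: the instance is hypothetical, and a coin with no miners is treated as having infinite RPU.

```python
import itertools
import math
from fractions import Fraction

# Hypothetical instance.
power = {"p1": Fraction(5), "p2": Fraction(3), "p3": Fraction(2)}
reward = {"c1": Fraction(7), "c2": Fraction(4), "c3": Fraction(2)}
miners, coins = list(power), list(reward)


def rpu(config, coin):
    total = sum(power[p] for p in miners if config[p] == coin)
    # Assumption of this sketch: an empty coin gets an infinite RPU.
    return reward[coin] / total if total else math.inf


def payoff(config, miner):
    return power[miner] * rpu(config, config[miner])


def sorted_pairs(config):
    """The (RPU, coin) pairs of a configuration, ordered from smallest to largest."""
    return sorted((rpu(config, c), c) for c in coins)


checked = 0      # number of better response steps examined
violations = 0   # steps after which the sorted list did not strictly increase
for choice in itertools.product(coins, repeat=len(miners)):
    s = dict(zip(miners, choice))
    for p in miners:
        for c in coins:
            if c == s[p]:
                continue
            t = {**s, p: c}                      # p moves to coin c
            if payoff(t, p) > payoff(s, p):      # s -> t is a better response step
                checked += 1
                if not sorted_pairs(t) > sorted_pairs(s):
                    violations += 1
```

A check of this kind is no substitute for the proof, but it is a cheap way to catch a mistake in an implementation of the potential.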

4 There is often a better equilibrium

Before moving to our second major result in which we describe a manipulation through dynamic reward design that transitions the system between equilibria, in this section we show that under broad circumstances, in every stable configuration there is at least one miner who has higher payoff in another stable configuration. This means that such a miner will gain from moving the system there. Specifically, we prove this for games that satisfy the following assumptions (note that we use these assumptions only in this section):

Assumption 1 (Never alone).

For a configuration in a game , if there is a coin s.t. , then there is a miner s.t. changing to is a better response step for .

Although this assumption cannot hold when , it often holds in practice since the number of miners must be much larger than the number of coins for the cryptocurrency to be secure (truly decentralized).

Assumption 2 (Generic game).

For any two coins and two sets of players in a game , .

This assumption is common in game theory DBLP:journals/mss/HolzmanL03 , and it makes sense in our game since mining power in practice is measured in billions of operations per hour and coin rewards are coupled with coin fiat exchange rates, so exact equality is unlikely.

The following observation follows from Assumption 1 and the fact that coins that are chosen by at least one miner always divide their entire reward. It stipulates that in every stable configuration, the sum of the payoffs the miners get is equal to the sum of the coins’ rewards.

Observation 3 (All stable configurations are globally optimal).

For every stable configuration in a game under Assumption 1, it holds that .
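The identity behind Observation 3 can be written out in one line. The notation below is an assumption introduced for this sketch, since the original symbols are not recoverable here: $\Pi$ is the set of miners, $C$ the set of coins, $m_p$ the mining power of miner $p$, $F(c)$ the reward of coin $c$, $P_c(s)$ the set of miners mining $c$ in $s$, $M_c(s)$ their total power, and $u_p(s)$ the payoff of $p$.

```latex
% Valid whenever every coin has at least one miner, i.e. P_c(s) is nonempty for all c,
% which Assumption 1 guarantees in every stable configuration s.
\[
\sum_{p \in \Pi} u_p(s)
  = \sum_{c \in C} \sum_{p \in P_c(s)} m_p \cdot \frac{F(c)}{M_c(s)}
  = \sum_{c \in C} \frac{F(c)}{M_c(s)} \sum_{p \in P_c(s)} m_p
  = \sum_{c \in C} F(c).
\]
```

The last step uses $\sum_{p \in P_c(s)} m_p = M_c(s)$. If some coin were empty, its reward would be missing from the left-hand side, which is why the observation relies on Assumption 1.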

From Observation 3 and Assumption 2 it is easy to show the following claim:

Claim 4.

Consider a game under Assumptions 1 and 2. If the game has more than one stable configuration, then for every stable configuration there exist a miner and a stable configuration s.t. .

It remains to show that has more than one stable configuration. Consider s.t. , and , let . We first show that the game has two different configurations in which miners do not share a coin and at most one of them is unstable. Then, we inductively construct two configurations in , , based on the two configurations in , in which all miners in keep their locations and all miners except maybe the one that was unstable in are stable. The construction step is captured by Claim 5, where in the step.

Claim 5.

Let be a reward function. Consider a system , and a configuration . Now consider another system s.t. , , and . Let and consider a configuration s.t. for all and . Then is stable in in game , and every player that is stable in in is also stable in in .

Finally, we show that the two configurations we construct in are stable: Let be the (possibly) unstable miner. By Assumption 1 (note that the assumption refers only to game ), cannot be alone in a coin (otherwise there must be another unstable miner), and thus it shares the coin with a smaller stable miner, which we show implies that is stable.

Our results are captured by the following proposition, which follows from Claim 4 and the inductive construction using Claim 5.

Proposition 2.

Consider a game under Assumptions 1 and 2. Then for every stable configuration in there exist a miner and a stable configuration in which .

5 Reward design: moving between equilibria

In this section we consider a system , where s.t. . For every reward function and every two stable configurations in game we describe a mechanism to move the system from to the desired configuration by temporarily increasing coin rewards. Note that once we lead the system to , we can return to the original rewards (i.e., stop manipulating coin weights) because is stable in . Therefore, a manipulator who gains from moving to a desired stable configuration can do it with a bounded cost.

We first define a reward design function that maps system configurations to reward functions.

Definition 1 (reward design function).

Consider a system . A reward design function for system is a function mapping every configuration to a reward function, i.e., .

Dynamic reward design.

Consider a system and a reward function . A dynamic reward design mechanism for game is an algorithm that for any two stable configurations in moves the system from to by following the protocol in Algorithm 1.

3:     choose a reward design function s.t. for all ,
4:     allow better-response learning in , starting from , to converge to some stable
5:     configuration    ▷ convergence is due to Theorem 1
Algorithm 1 protocol to move a system with reward function from to .

5.1 Reward design algorithm

To describe a dynamic reward design algorithm we need to specify the reward design function for every loop iteration in Algorithm 1. Intuitively, we observe that miners with less mining power are easily moved between coins, meaning that we can increase a coin reward so that a small miner with little mining power will benefit from moving there, but bigger miners with more mining power prefer to stay in their current locations. Therefore, the idea is to evolve the current configuration to in stages, where in stage , we move the miners with the smallest mining powers to the location (coin) of miner in the final configuration (i.e., ) while keeping the remaining miners in their (final) places. To this end, we define intermediate configurations. For , , we define as:


That is, in , miners are in their final locations and miners are in the final location of miner . Note that . Figure 2(a) illustrates the stage transitions in the algorithm.

(a) Configuration
(b) Iteration moving ; is the anchor.
Figure 2: Reward design algorithm: (a) stages; and (b) iteration within stage . Boxes represent coins, discs represent miners. The unlabeled bottom discs represent possible bigger miners who are in their final locations.

Notice that since we allow arbitrary better response learning (in every iteration), choosing a reward design function is a subtle task; miners can move according to any better response step, and we cannot control the order in which miners move. One may attempt to design a reward function so that in the resulting game there is exactly one unstable miner with exactly one better response step in the current configuration. However, even given such a function, after that miner takes its step, other miners might become unstable, which can in turn lead to a learning process that depends on the order in which miners move and on the choices they make (in case they have more than one better response step). Hence, the main challenge is to restrict the set of possible stable configurations reached by the learning phase in each iteration.

In every loop iteration of stage we pick a miner that we want to move from to (as explained shortly) and choose the reward function carefully so that (1) ’s only better response step is , (2) all other miners are stable, and (3) in every stable configuration reached by better response learning after ’s step, is in , all miners are in either or , and all the other (bigger) miners remain in their (final) locations.

Moreover, our proof shows by induction that our reward design function of stage (defined below) guarantees that the set of possible configurations reached by learning in stage is

Notice that the stage starts at . We now explain how we choose the reward design function for stage . First, for every configuration , the index of the miner we want to move from to (called mover) is

Note that for every , . Moreover, and . Let . Intuitively, we use as an anchor in configuration ; we choose a reward function that increases the reward of coin as high as possible without making the anchor unstable. As a result, all the miners in (who are bigger than or equal to the anchor) remain stable, and miner has a unique better response step to move to . Figure 2(b) illustrates and for some configuration .

In order to make sure that miners not in also remain stable, and in order to guarantee that every better response learning after ’s step converges to a configuration in , we choose a reward function that evens out the RPUs of all coins other than . For , let . The reward design function for stage is:


Note that the RPUs of all coins except in the game are equal to . In addition, note that if a miner bigger than or equal to moves to , then ’s RPU becomes no bigger than . However, since , has a unique better response step to move to . Therefore, we get that our reward design function allows us to control the first step of the learning process. In the next section we give more intuition on how it also restricts the stable configuration at the end of any learning process at stage to the set .

As for the first stage, note that we need to move all miners to coin , so intuitively we only need to increase its reward high enough. We therefore choose:


In Algorithm 2 we present our reward design algorithm, and in the next section we outline the proof that every stage eventually completes.
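The intuition for the first stage (raise one coin's reward enough and every miner ends up there) can be illustrated with a small simulation. The threshold used below is a hypothetical sufficient bound chosen for this sketch, not the paper's stage-1 formula: the target reward is set above the total mining power times the largest other reward, divided by the smallest mining power. Under that bound any miner outside the target coin strictly gains by moving to it, so the only stable configuration has all miners on the target. All names and numbers are assumptions.

```python
import random
from fractions import Fraction

# Hypothetical instance.
power = {"p1": Fraction(1), "p2": Fraction(2), "p3": Fraction(4)}
reward = {"c1": Fraction(5), "c2": Fraction(3), "c3": Fraction(1)}


def payoff_if(config, rewards, miner, coin):
    """Payoff `miner` would get from mining `coin` while the others stay put."""
    others = sum(power[p] for p, c in config.items() if c == coin and p != miner)
    return power[miner] * rewards[coin] / (others + power[miner])


def better_responses(config, rewards):
    """All (miner, coin) moves that strictly improve the mover's payoff."""
    return [(p, c) for p, cur in config.items() for c in rewards
            if c != cur and payoff_if(config, rewards, p, c) > payoff_if(config, rewards, p, cur)]


def learn(config, rewards, rng):
    """Random better response learning until a stable configuration is reached."""
    config = dict(config)
    while True:
        moves = better_responses(config, rewards)
        if not moves:
            return config
        miner, coin = rng.choice(moves)
        config[miner] = coin


def boosted_rewards(rewards, target):
    """Hypothetical stage-1 design: lift `target` above a sufficient threshold."""
    others_max = max(r for c, r in rewards.items() if c != target)
    threshold = sum(power.values()) * others_max / min(power.values())
    return {**rewards, target: max(rewards[target], threshold + 1)}


start = {"p1": "c1", "p2": "c2", "p3": "c1"}
target = "c3"
boosted = boosted_rewards(reward, target)
final = learn(start, boosted, random.Random(0))
```

The bound is deliberately generous; a tighter choice would lower the manipulator's temporary cost, and the later stages need the finer anchor-based design described above.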

2:for i=1 …n do
3:     repeat    ▷ the reward design function is defined in Equations 4 and 5
4:         allow better-response learning in , starting from , to converge to some stable
5:          configuration
7:     until    ▷ the intermediate configuration is defined in Equation 3
Algorithm 2 Dynamic reward design algorithm.

5.2 Proof outline

The proof for stage 1 is straightforward so we skip it. Consider stage . We prove in the appendix the following technical lemma about stable configurations in the stage:

Lemma 1.

Consider a configuration . Then every better response learning in the game that starts at converges to a configuration such that:

  1. , .

  2. .

As part of the proof, we show that within stage , all the reached configurations (both stable and unstable) are in . Let and . After moves to according to its only better response step, in the resulting configuration , the RPUs of all coins not in remain . Moreover, , and . Therefore, although , it is still not high enough to drive miners not in (by definition, bigger than ) to move to it. So the only miners that possibly have better response steps at are miners in who wish to move to . Moreover, the total mining power of the miners who actually move to is smaller than , otherwise, ’s RPU will go below . In the proof we use the above intuition to formulate an invariant that captures the lemma statement and prove it by induction on better response steps. The lemma then follows from Theorem 1 (every better response learning converges to a stable configuration).

We next use Lemma 1 to prove that every stage completes in a finite number of loop iterations. To this end, we associate with every configuration a binary vector indicating, for each , whether is in , where it needs to be at the end of the stage. Consider the ordered set , where is the set of all binary vectors of length , and is the lexicographical order. For a configuration , we define to be a vector in such that:

and the function to be the rank of in .

Theorem 2.

Every stage of Algorithm 2 completes in a finite number of loop iterations.


By definitions of stage and set , the first configuration of stage is . By definition of , for all , . By inductively applying Lemma 1, we get that every loop iteration in stage ends in a configuration in . Therefore, for a loop iteration of stage that starts in configuration and ends in configuration , we get by Lemma 1 that and , . Therefore . Now since the set is finite, we get that after a finite number of iterations we reach configuration .

6 Discussion

Our work studies and challenges the cryptocurrency market from a novel angle – the strategic selections by adaptive miners among multiple coins. There are several central followups one may consider. First, our reward design is effective for arbitrary better-response learning, but one may wonder about its speed of convergence under specific markets. In addition, we consider convergence to equilibrium, and one may consider also convergence to a bad (possibly unstable) configuration in which, for example, a particular miner will have a dominant position in a coin, killing (at least for a while) the basic guarantee of non-manipulation (security) for that coin and allowing him to get a bigger portion of the reward. One also may wonder about the asymmetric case where some coins can be mined only by a subset of the miners.


  • [1] Bitcoin cash. Accessed: 2018-04-23.
  • [2] Bitcoin price manipulation: Economists warn just one person may have caused value surge. Accessed: 2018-04-24.
  • [3] Cardano. Accessed: 2018-04-23.
  • [4] Crypto whales and how they manipulate the price. Accessed: 2018-04-24.
  • [5] Cryptocurrency info charts. Accessed: 2018-04-23.
  • [6] Cryptocurrency market state visualization. Accessed: 2018-04-23.
  • [7] Cryptocurrency price manipulation is unavoidable. Accessed: 2018-04-24.
  • [8] Litecoin. Accessed: 2018-04-23.
  • [9] Neo. Accessed: 2018-04-23.
  • [10] What to mine. Accessed: 2018-04-24.
  • [11] Vitalik Buterin et al. A next-generation smart contract and decentralized application platform.
  • [12] Miles Carlsten, Harry Kalodner, S Matthew Weinberg, and Arvind Narayanan. On the instability of bitcoin without the block reward. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, pages 154–167. ACM, 2016.
  • [13] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  • [14] Ittay Eyal. The miner’s dilemma. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 89–103. IEEE, 2015.
  • [15] Ittay Eyal and Emin Gun Sirer. Majority is not enough: Bitcoin mining is vulnerable. In International Conference on Financial Cryptography and Data Security, pages 436–454. Springer, 2014.
  • [16] Neil Gandal, JT Hamrick, Tyler Moore, and Tali Oberman. Price manipulation in the bitcoin ecosystem. Journal of Monetary Economics, 2018.
  • [17] Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J Russell, and Anca Dragan. Inverse reward design. In Advances in Neural Information Processing Systems, pages 6768–6777, 2017.
  • [18] Amélie Heliou, Johanne Cohen, and Panayotis Mertikopoulos. Learning with bandit feedback in potential games. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6369–6378. Curran Associates, Inc., 2017.
  • [19] Ron Holzman and Nissan Law-yone. Network structure and strong equilibrium in route selection games. Mathematical Social Sciences, 46(2):193–205, 2003.
  • [20] Benjamin Johnson, Aron Laszka, Jens Grossklags, Marie Vasek, and Tyler Moore. Game-theoretic analysis of ddos attacks against bitcoin mining pools. In International Conference on Financial Cryptography and Data Security, pages 72–86. Springer, 2014.
  • [21] Aggelos Kiayias, Elias Koutsoupias, Maria Kyropoulou, and Yiannis Tselekounis. Blockchain mining games. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 365–382. ACM, 2016.
  • [22] Kevin Liao and Jonathan Katz. Incentivizing blockchain forks via whale transactions. In International Conference on Financial Cryptography and Data Security, pages 264–279. Springer, 2017.
  • [23] Igal Milchtaich. Congestion games with player-specific payoff functions. Games and Economic Behavior, 13, 1996.
  • [24] D. Monderer and L.S. Shapley. Potential games. Games and Economic Behavior, 14:124–143, 1996.
  • [25] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
  • [26] Kartik Nayak, Srijan Kumar, Andrew Miller, and Elaine Shi. Stubborn mining: Generalizing selfish mining and combining with an eclipse attack. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 305–320. IEEE, 2016.
  • [27] Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani. Algorithmic Game Theory. Cambridge University Press, New York, NY, USA, 2007.
  • [28] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5874–5884, 2017.
  • [29] Ayelet Sapirshtein, Yonatan Sompolinsky, and Aviv Zohar. Optimal selfish mining strategies in bitcoin. In International Conference on Financial Cryptography and Data Security, pages 515–532. Springer, 2016.
  • [30] Okke Schrijvers, Joseph Bonneau, Dan Boneh, and Tim Roughgarden. Incentive compatibility of bitcoin mining pool reward functions. In International Conference on Financial Cryptography and Data Security, pages 477–498. Springer, 2016.
  • [31] Jonathan Sorg, Richard L Lewis, and Satinder P. Singh. Reward design via online gradient ascent. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2190–2198. Curran Associates, Inc., 2010.
  • [32] Jonathan Sorg, Satinder P. Singh, and Richard L. Lewis. Internal rewards mitigate agent boundedness. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 1007–1014, 2010.
  • [33] Richard S. Sutton and Andrew G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine learning. MIT Press, 1998.
  • [34] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E Schapire. Fast convergence of regularized learning in games. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2989–2997. Curran Associates, Inc., 2015.

Appendix A Existence of an equilibrium

We show here how to find a stable configuration (pure equilibrium) in the game for any , and . We do this by induction, selecting coins for miners in descending order of mining power.

Claim 6.

Consider a reward function , a system , and another system s.t. , , and . Then, if the game has a stable configuration, then the game has a stable configuration as well.


Let be a stable configuration in . We use it to build a configuration in in the following way: for all set , and set to . We now show that configuration is stable in .

First, consider . Since we pick to be , we get that , , so is stable in . Next, consider a miner s.t. . Since is stable, we know that , . Now since and for all , we get that . Meaning that is stable in .

Finally, consider s.t. . We need to show that for all ,  . Again, since we pick to be , we know that , , and thus , . Now by the claim assumption, . Therefore, , . Thus, , . Meaning that is stable in .

Proposition 3.

For any set of miners , set of coins , and a reward function , the game has a stable configuration.


Order miners by mining power so that . First we show that the game has a stable configuration. Let , and define a configuration in as follows: . Since , we get that the configuration is stable in . The proposition follows by inductively applying Claim 6.
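The inductive construction can be sketched as a greedy procedure: insert miners in descending order of mining power, and let each new miner pick the coin that maximizes its own payoff given the miners already placed. The function names and the instance below are hypothetical, and ties between coins are broken arbitrarily (here, by dictionary order).

```python
from fractions import Fraction


def greedy_equilibrium(power, reward):
    """Place miners from largest to smallest, each on its currently best coin."""
    config = {}
    load = {c: Fraction(0) for c in reward}   # total mining power placed on each coin so far
    for miner in sorted(power, key=power.get, reverse=True):
        # RPU the miner would face on each coin after joining it.
        best = max(reward, key=lambda c: reward[c] / (load[c] + power[miner]))
        config[miner] = best
        load[best] += power[miner]
    return config


def is_stable(config, power, reward):
    """True if no miner can strictly improve its payoff by switching coins."""
    def payoff_if(miner, coin):
        others = sum(power[p] for p, c in config.items() if c == coin and p != miner)
        return power[miner] * reward[coin] / (others + power[miner])

    return all(
        payoff_if(miner, coin) <= payoff_if(miner, current)
        for miner, current in config.items()
        for coin in reward
    )


# Hypothetical instance.
power = {"p1": Fraction(5), "p2": Fraction(3), "p3": Fraction(2), "p4": Fraction(1)}
reward = {"c1": Fraction(7), "c2": Fraction(4), "c3": Fraction(2)}
equilibrium = greedy_equilibrium(power, reward)
```

The order matters: the argument in Claim 6 needs each newly added miner to be no bigger than the miners already placed, so that its arrival does not make any of them want to leave.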

Appendix B Ordinal potential for the symmetric case

We consider here the symmetric case where all coin rewards are equal, and show that any game in this case has a simple potential function.

Proposition 4.